1
Strober BJ, Tayeb K, Popp J, Qi G, Gordon MG, Perez R, Ye CJ, Battle A. SURGE: uncovering context-specific genetic-regulation of gene expression from single-cell RNA sequencing using latent-factor models. Genome Biol 2024; 25:28. [PMID: 38254214 PMCID: PMC10801966 DOI: 10.1186/s13059-023-03152-z] [Received: 12/22/2022] [Accepted: 12/20/2023] [Indexed: 01/24/2024]
Abstract
Genetic regulation of gene expression is a complex process, with genetic effects known to vary across cellular contexts such as cell types and environmental conditions. We developed SURGE, a method for unsupervised discovery of context-specific expression quantitative trait loci (eQTLs) from single-cell transcriptomic data. This allows discovery of the contexts or cell types modulating genetic regulation without prior knowledge. Applied to peripheral blood single-cell eQTL data, SURGE contexts capture continuous representations of distinct cell types and groupings of biologically related cell types. We demonstrate the disease-relevance of SURGE context-specific eQTLs using colocalization analysis and stratified LD-score regression.
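To make the idea of discovering a context-specific eQTL without labels concrete, here is a minimal sketch. It is not the SURGE implementation. It assumes a single latent context, genotypes simulated independently for every cell (real data share genotypes across cells from one donor), Gaussian noise, and a plain alternating least squares fit standing in for the model's actual inference procedure. All names (`simulate`, `fit_rank1_context`, `u`, `v`, `beta`) are hypothetical.

```python
import numpy as np


def simulate(n_cells=400, n_tests=40, noise_sd=0.5, seed=0):
    """Simulate expression whose genotype effect is modulated by one latent context."""
    rng = np.random.default_rng(seed)
    G = rng.binomial(2, 0.3, size=(n_cells, n_tests)).astype(float)  # genotype dosages
    u = rng.normal(size=n_cells)     # latent context value of each cell
    beta = rng.normal(size=n_tests)  # context-independent eQTL effect per test
    v = rng.normal(size=n_tests)     # genotype-by-context interaction effect per test
    noise = rng.normal(scale=noise_sd, size=G.shape)
    Y = G * beta + (u[:, None] * v[None, :]) * G + noise
    return Y, G, u


def fit_rank1_context(Y, G, n_iter=50, ridge=1e-6, seed=1):
    """Alternating least squares for y_nt = g_nt * (beta_t + u_n * v_t) + noise."""
    rng = np.random.default_rng(seed)
    n_cells, n_tests = Y.shape
    u = rng.normal(size=n_cells)  # random start: no cell-type labels are used
    beta = np.zeros(n_tests)
    v = np.zeros(n_tests)
    for _ in range(n_iter):
        # Fix the scale and offset of the context; both are otherwise unidentified.
        u = (u - u.mean()) / u.std()
        # Step 1: per test, regress expression on genotype and genotype x context.
        for t in range(n_tests):
            X = np.column_stack([G[:, t], G[:, t] * u])
            coef = np.linalg.solve(X.T @ X + ridge * np.eye(2), X.T @ Y[:, t])
            beta[t], v[t] = coef
        # Step 2: per cell, regress the residual on the interaction term v_t * g_nt.
        R = Y - G * beta
        Z = G * v
        u = (R * Z).sum(axis=1) / ((Z * Z).sum(axis=1) + ridge)
    return u, beta, v


if __name__ == "__main__":
    Y, G, u_true = simulate()
    u_hat, beta_hat, v_hat = fit_rank1_context(Y, G)
    print("abs. correlation with true context:",
          abs(np.corrcoef(u_hat, u_true)[0, 1]))
```

The recovered context is defined only up to sign, scale and offset, so it should be compared with the simulated one through absolute correlation rather than by value.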
Affiliation(s)
- Benjamin J Strober
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Karl Tayeb
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Joshua Popp
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Guanghao Qi
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- M Grace Gordon
  - Biological and Medical Informatics Graduate Program, University of California, San Francisco, CA, USA
  - Division of Rheumatology, Department of Medicine, University of California, San Francisco, CA, USA
  - Institute for Human Genetics, University of California, San Francisco, CA, USA
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, USA
- Richard Perez
  - Institute for Human Genetics, University of California, San Francisco, CA, USA
- Chun Jimmie Ye
  - Division of Rheumatology, Department of Medicine, University of California, San Francisco, CA, USA
  - Institute for Human Genetics, University of California, San Francisco, CA, USA
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, USA
  - Institute for Computational Health Sciences, University of California, San Francisco, San Francisco, CA, USA
  - Chan-Zuckerberg Biohub, San Francisco, CA, USA
- Alexis Battle
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
  - Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
2
Fitzgerald T, Jones A, Engelhardt BE. A Poisson reduced-rank regression model for association mapping in sequencing data. BMC Bioinformatics 2022; 23:529. [PMID: 36482321 PMCID: PMC9733401 DOI: 10.1186/s12859-022-05054-6] [Received: 08/31/2022] [Accepted: 11/14/2022] [Indexed: 12/13/2022]
Abstract
BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) technologies allow for the study of gene expression in individual cells. Often, it is of interest to understand how transcriptional activity is associated with cell-specific covariates, such as cell type, genotype, or measures of cell health. Traditional approaches for this type of association mapping assume independence between the outcome variables (or genes), and perform a separate regression for each. However, these methods are computationally costly and ignore the substantial correlation structure of gene expression. Furthermore, count-based scRNA-seq data pose challenges for traditional models based on Gaussian assumptions.
RESULTS: We aim to resolve these issues by developing a reduced-rank regression model that identifies low-dimensional linear associations between a large number of cell-specific covariates and high-dimensional gene expression readouts. Our probabilistic model uses a Poisson likelihood in order to account for the unique structure of scRNA-seq counts. We demonstrate the performance of our model using simulations, and we apply our model to a scRNA-seq dataset, a spatial gene expression dataset, and a bulk RNA-seq dataset to show its behavior in three distinct analyses.
CONCLUSION: We show that our statistical modeling approach, which is based on reduced-rank regression, captures associations between gene expression and cell- and sample-specific covariates by leveraging low-dimensional representations of transcriptional states.
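As a rough illustration of what a reduced-rank regression with a Poisson likelihood looks like, the sketch below factorises the coefficient matrix as a product of two thin matrices and fits it by gradient ascent on the Poisson log-likelihood. It is not the authors' model or code. The log link, the absence of priors and size factors, the fixed step size, and every name (`simulate_counts`, `fit_poisson_rrr`, `A`, `B`) are assumptions made for the example.

```python
import numpy as np


def simulate_counts(n=1000, p=5, q=8, rank=2, seed=0):
    """Simulate counts Y ~ Poisson(exp(X @ A @ B)) with a rank-`rank` coefficient matrix."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))                # covariates (cells x covariates)
    A = rng.normal(scale=0.4, size=(p, rank))  # covariates -> latent factors
    B = rng.normal(scale=0.4, size=(rank, q))  # latent factors -> genes
    Y = rng.poisson(np.exp(X @ A @ B))         # counts (cells x genes)
    return X, Y, A @ B


def loglik_per_cell(X, Y, M):
    """Poisson log-likelihood per cell for coefficients M, dropping the log(Y!) constant."""
    eta = X @ M
    return np.sum(Y * eta - np.exp(eta)) / X.shape[0]


def fit_poisson_rrr(X, Y, rank=2, lr=0.05, n_steps=3000, seed=1):
    """Gradient ascent on the Poisson log-likelihood over the factors A (p x r) and B (r x q)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    q = Y.shape[1]
    # Small non-zero start: A = B = 0 is a stationary point of the factorised objective.
    A = rng.normal(scale=0.1, size=(p, rank))
    B = rng.normal(scale=0.1, size=(rank, q))
    for _ in range(n_steps):
        resid = Y - np.exp(X @ A @ B)  # observed minus expected counts
        grad_M = X.T @ resid / n       # gradient w.r.t. the full p x q coefficient matrix
        grad_A = grad_M @ B.T          # chain rule through M = A @ B
        grad_B = A.T @ grad_M
        A = A + lr * grad_A
        B = B + lr * grad_B
    return A, B


if __name__ == "__main__":
    X, Y, M_true = simulate_counts()
    A_hat, B_hat = fit_poisson_rrr(X, Y)
    M_hat = A_hat @ B_hat
    rel_err = np.linalg.norm(M_hat - M_true) / np.linalg.norm(M_true)
    print("relative error of fitted coefficients:", rel_err)
```

Only the product `A @ B` is identifiable, so the fit is judged on that product rather than on the individual factors. Sharing the `rank` latent directions across all genes is what distinguishes this from running one separate Poisson regression per gene.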
Affiliation(s)
- Tiana Fitzgerald
  - Department of Computer Science, Princeton University, Princeton, NJ USA
- Andrew Jones
  - Department of Computer Science, Princeton University, Princeton, NJ USA
- Barbara E. Engelhardt
  - Department of Computer Science, Princeton University, Princeton, NJ USA
  - Data Science and Biotechnology Institute, Gladstone Institutes, San Francisco, CA USA
  - Department of Biomedical Data Science, Stanford University, Stanford, CA USA