1
Li Y, Li H, Lin Y, Zhang D, Peng D, Liu X, Xie J, Hu P, Chen L, Luo H, Peng X. MetaQ: fast, scalable and accurate metacell inference via single-cell quantization. Nat Commun 2025; 16:1205. [PMID: 39885131 DOI: 10.1038/s41467-025-56424-6] [Received: 07/15/2024] [Accepted: 01/14/2025] [Indexed: 02/01/2025]
Abstract
To overcome the computational barriers of analyzing large-scale single-cell sequencing data, we introduce MetaQ, a metacell algorithm that scales to arbitrarily large datasets with linear runtime and constant memory usage. Inspired by cellular development, MetaQ conceptualizes each metacell as a collective ancestor of biologically similar cells. By quantizing cells into a discrete codebook, where each entry represents a metacell capable of reconstructing the original cells it quantizes, MetaQ identifies homogeneous cell subsets for efficient and accurate metacell inference. This approach reduces computational complexity from exponential to linear while maintaining or surpassing the performance of existing metacell algorithms. Extensive experiments demonstrate that MetaQ excels in downstream tasks such as cell type annotation, developmental trajectory inference, batch integration, and differential expression analysis. Thanks to its superior efficiency and effectiveness, MetaQ makes analyzing datasets with millions of cells practical, offering a powerful solution for single-cell studies in the era of high-throughput profiling.
Affiliation(s)
- Yunfan Li
  - School of Computer Science, Sichuan University, Chengdu, Sichuan, China
- Hancong Li
  - Department of Thyroid and Parathyroid Surgery, Laboratory of Thyroid and Parathyroid Disease, Frontiers Science Center for Disease Related Molecular Network, West China Hospital, Sichuan University, Chengdu, Sichuan, China
  - Sichuan Clinical Research Center for Laboratory Medicine, Chengdu, Sichuan, China
- Yijie Lin
  - School of Computer Science, Sichuan University, Chengdu, Sichuan, China
- Dan Zhang
  - Department of Laboratory Medicine, State Key Laboratory of Biotherapy, West China Second University Hospital, Sichuan University, Chengdu, Sichuan, China
- Dezhong Peng
  - School of Computer Science, Sichuan University, Chengdu, Sichuan, China
- Xiting Liu
  - School of Computer Science, Georgia Institute of Technology, Atlanta, GA, USA
- Jie Xie
  - College of Life Science, Sichuan Normal University, Chengdu, Sichuan, China
- Peng Hu
  - School of Computer Science, Sichuan University, Chengdu, Sichuan, China
- Lu Chen
  - Department of Laboratory Medicine, State Key Laboratory of Biotherapy, West China Second University Hospital, Sichuan University, Chengdu, Sichuan, China
- Han Luo
  - Department of Thyroid and Parathyroid Surgery, Laboratory of Thyroid and Parathyroid Disease, Frontiers Science Center for Disease Related Molecular Network, West China Hospital, Sichuan University, Chengdu, Sichuan, China
  - Sichuan Clinical Research Center for Laboratory Medicine, Chengdu, Sichuan, China
- Xi Peng
  - School of Computer Science, Sichuan University, Chengdu, Sichuan, China
  - State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu, Sichuan, China
2
Klein D, Palla G, Lange M, Klein M, Piran Z, Gander M, Meng-Papaxanthos L, Sterr M, Saber L, Jing C, Bastidas-Ponce A, Cota P, Tarquis-Medina M, Parikh S, Gold I, Lickert H, Bakhti M, Nitzan M, Cuturi M, Theis FJ. Mapping cells through time and space with moscot. Nature 2025:10.1038/s41586-024-08453-2. [PMID: 39843746 DOI: 10.1038/s41586-024-08453-2] [Received: 05/17/2023] [Accepted: 11/25/2024] [Indexed: 01/24/2025]
Abstract
Single-cell genomic technologies enable the multimodal profiling of millions of cells across temporal and spatial dimensions. However, experimental limitations hinder the comprehensive measurement of cells under native temporal dynamics and in their native spatial tissue niche. Optimal transport has emerged as a powerful tool to address these constraints and has facilitated the recovery of the original cellular context [1-4]. Yet, most optimal transport applications are unable to incorporate multimodal information or scale to single-cell atlases. Here we introduce multi-omics single-cell optimal transport (moscot), a scalable framework for optimal transport in single-cell genomics that supports multimodality across all applications. We demonstrate the capability of moscot to efficiently reconstruct developmental trajectories of 1.7 million cells from mouse embryos across 20 time points. To illustrate the capability of moscot in space, we enrich spatial transcriptomic datasets by mapping multimodal information from single-cell profiles in a mouse liver sample and align multiple coronal sections of the mouse brain. We present moscot.spatiotemporal, an approach that leverages gene-expression data across both spatial and temporal dimensions to uncover the spatiotemporal dynamics of mouse embryogenesis. We also resolve endocrine-lineage relationships of delta and epsilon cells in a previously unpublished mouse, time-resolved pancreas development dataset using paired measurements of gene expression and chromatin accessibility. Our findings are confirmed through experimental validation of NEUROD2 as a regulator of epsilon progenitor cells in a model of human induced pluripotent stem cell islet cell differentiation. Moscot is available as open-source software, accompanied by extensive documentation.
Affiliation(s)
- Dominik Klein
  - Institute of Computational Biology, Helmholtz Center, Munich, Germany
  - Department of Mathematics, Technical University of Munich, Garching, Germany
- Giovanni Palla
  - Institute of Computational Biology, Helmholtz Center, Munich, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
- Marius Lange
  - Institute of Computational Biology, Helmholtz Center, Munich, Germany
  - Department of Mathematics, Technical University of Munich, Garching, Germany
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Zoe Piran
  - School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
- Manuel Gander
  - Institute of Computational Biology, Helmholtz Center, Munich, Germany
- Michael Sterr
  - Institute of Diabetes and Regeneration Research, Helmholtz Center, Munich, Germany
  - German Center for Diabetes Research, Neuherberg, Germany
- Lama Saber
  - Institute of Diabetes and Regeneration Research, Helmholtz Center, Munich, Germany
  - German Center for Diabetes Research, Neuherberg, Germany
  - School of Medicine, Technical University of Munich, Munich, Germany
- Changying Jing
  - Institute of Diabetes and Regeneration Research, Helmholtz Center, Munich, Germany
  - German Center for Diabetes Research, Neuherberg, Germany
  - Munich Medical Research School (MMRS), Ludwig Maximilian University (LMU), Munich, Germany
- Aimée Bastidas-Ponce
  - Institute of Diabetes and Regeneration Research, Helmholtz Center, Munich, Germany
  - German Center for Diabetes Research, Neuherberg, Germany
- Perla Cota
  - Institute of Diabetes and Regeneration Research, Helmholtz Center, Munich, Germany
  - German Center for Diabetes Research, Neuherberg, Germany
  - School of Medicine, Technical University of Munich, Munich, Germany
- Marta Tarquis-Medina
  - Institute of Diabetes and Regeneration Research, Helmholtz Center, Munich, Germany
  - German Center for Diabetes Research, Neuherberg, Germany
- Shrey Parikh
  - Institute of Computational Biology, Helmholtz Center, Munich, Germany
- Ilan Gold
  - Institute of Computational Biology, Helmholtz Center, Munich, Germany
- Heiko Lickert
  - Institute of Diabetes and Regeneration Research, Helmholtz Center, Munich, Germany
  - German Center for Diabetes Research, Neuherberg, Germany
  - School of Medicine, Technical University of Munich, Munich, Germany
- Mostafa Bakhti
  - Institute of Diabetes and Regeneration Research, Helmholtz Center, Munich, Germany
  - German Center for Diabetes Research, Neuherberg, Germany
- Mor Nitzan
  - School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
  - Racah Institute of Physics, The Hebrew University of Jerusalem, Jerusalem, Israel
  - Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
- Fabian J Theis
  - Institute of Computational Biology, Helmholtz Center, Munich, Germany
  - Department of Mathematics, Technical University of Munich, Garching, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
3
Metzner E, Southard KM, Norman TM. Multiome Perturb-seq unlocks scalable discovery of integrated perturbation effects on the transcriptome and epigenome. Cell Syst 2025; 16:101161. [PMID: 39689711 PMCID: PMC11738662 DOI: 10.1016/j.cels.2024.12.002] [Received: 07/05/2024] [Revised: 10/14/2024] [Accepted: 12/04/2024] [Indexed: 12/19/2024]
Abstract
Single-cell CRISPR screens link genetic perturbations to transcriptional states, but high-throughput methods connecting these induced changes to their regulatory foundations are limited. Here, we introduce Multiome Perturb-seq, extending single-cell CRISPR screens to simultaneously measure perturbation-induced changes in gene expression and chromatin accessibility. We apply Multiome Perturb-seq in a CRISPRi screen of 13 chromatin remodelers in human RPE-1 cells, achieving efficient assignment of sgRNA identities to single nuclei via an improved method for capturing barcode transcripts from nuclear RNA. We organize expression and accessibility measurements into coherent programs describing the integrated effects of perturbations on cell state, finding that ARID1A and SUZ12 knockdowns induce programs enriched for developmental features. Modeling of perturbation-induced heterogeneity connects accessibility changes to changes in gene expression, highlighting the value of multimodal profiling. Overall, our method provides a scalable and simply implemented system to dissect the regulatory logic underpinning cell state. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- Eli Metzner
  - Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Tri-Institutional Training Program in Computational Biology and Medicine, New York, NY 10065, USA
- Kaden M Southard
  - Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Thomas M Norman
  - Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
4
Miao Z, Wang J, Park K, Kuang D, Kim J. Depth-corrected multi-factor dissection of chromatin accessibility for scATAC-seq data with PACS. Nat Commun 2025; 16:401. [PMID: 39757254 DOI: 10.1038/s41467-024-55580-5] [Received: 02/28/2024] [Accepted: 12/10/2024] [Indexed: 01/07/2025]
Abstract
Single cell ATAC-seq (scATAC-seq) experimental designs have become increasingly complex, with multiple factors that might affect chromatin accessibility, including genotype, cell type, tissue of origin, sample location, batch, etc., whose compound effects are difficult to test by existing methods. In addition, current scATAC-seq data present statistical difficulties due to their sparsity and variations in individual sequence capture. To address these problems, we present a zero-adjusted statistical model, Probability model of Accessible Chromatin of Single cells (PACS), that allows complex hypothesis testing of accessibility-modulating factors while accounting for sparse and incomplete data. For differential accessibility analysis, PACS controls the false positive rate and achieves a 17% to 122% higher power on average than existing tools. We demonstrate the effectiveness of PACS through several analysis tasks, including supervised cell type annotation, compound hypothesis testing, batch effect correction, and spatiotemporal modeling. We apply PACS to datasets from various tissues and show its ability to reveal previously undiscovered insights in scATAC-seq data.
Affiliation(s)
- Zhen Miao
  - Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Biology, University of Pennsylvania, Philadelphia, PA, USA
- Jianqiao Wang
  - Department of Biostatistics, Harvard T.H. Chan School of Health, Boston, MA, USA
  - Department of Statistics and Data Science, Tsinghua University, Beijing, China
- Kernyu Park
  - Department of Biology, University of Pennsylvania, Philadelphia, PA, USA
- Da Kuang
  - Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
- Junhyong Kim
  - Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Biology, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
5
Gao C, Welch JD. Integrating single-cell multimodal epigenomic data using 1D convolutional neural networks. Bioinformatics 2024; 41:btae705. [PMID: 39820306 PMCID: PMC11751632 DOI: 10.1093/bioinformatics/btae705] [Received: 04/17/2024] [Revised: 09/30/2024] [Accepted: 01/14/2025] [Indexed: 01/19/2025]
Abstract
Motivation: Recent experimental developments enable single-cell multimodal epigenomic profiling, which measures multiple histone modifications and chromatin accessibility within the same cell. Such parallel measurements provide exciting new opportunities to investigate how epigenomic modalities vary together across cell types and states. A pivotal step in using these types of data is integrating the epigenomic modalities to learn a unified representation of each cell, but existing approaches are not designed to model the unique nature of this data type. Our key insight is to model single-cell multimodal epigenome data as a multichannel sequential signal.
Results: We developed ConvNet-VAEs, a novel framework that uses one-dimensional (1D) convolutional variational autoencoders (VAEs) for single-cell multimodal epigenomic data integration. We evaluated ConvNet-VAEs on nano-CUT&Tag and single-cell nanobody-tethered transposition followed by sequencing data generated from juvenile mouse brain and human bone marrow. We found that ConvNet-VAEs can perform dimension reduction and batch correction better than previous architectures while using significantly fewer parameters. Furthermore, the performance gap between convolutional and fully connected architectures increases with the number of modalities, and deeper convolutional architectures can increase the performance, while the performance degrades for deeper fully connected architectures. Our results indicate that convolutional autoencoders are a promising method for integrating current and future single-cell multimodal epigenomic datasets.
Availability and implementation: The source code of VAE models and a demo in Jupyter notebook are available at https://github.com/welch-lab/ConvNetVAE.
Affiliation(s)
- Chao Gao
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, United States
- Joshua D Welch
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, United States
  - Department of Computer Science and Engineering, University of Michigan, Ann Arbor, MI 48109, United States
6
Kwok AWC, Shim H, McCarthy DJ. Going beyond cell clustering and feature aggregation: Is there single cell level information in single-cell ATAC-seq data? bioRxiv 2024:2024.12.04.626927. [PMID: 39713401 PMCID: PMC11661094 DOI: 10.1101/2024.12.04.626927] [Indexed: 12/24/2024]
Abstract
Single-cell Assay for Transposase Accessible Chromatin with sequencing (scATAC-seq) has become a widely used method for investigating chromatin accessibility at single-cell resolution. However, the resulting data is highly sparse with most data entries being zeros. As such, currently available computational methods for scATAC-seq feature a range of transformation procedures to extract meaningful information from the sparse data. Most notably, these transformations can be categorized into: 1) feature aggregation with known biological associations, 2) pseudo-bulking cells of similar biology, and 3) binarisation of count data. These strategies beg the question of whether or not scATAC-seq data actually has usable single-cell and single-region information as intended from the assay. If we can go beyond aggregated features and pooled cells, it opens up the possibility of more complex statistical tasks that require that degree of granularity. To reach the finest possible resolution of single-cell, single-region information there are inevitably many computational challenges to overcome. Here, we review the major data analysis challenges lying between raw data readout and biological discovery, and discuss the limitations of current data analysis approaches. Lastly, we conclude that chromatin accessibility profiling at true single-cell resolution is not yet achieved with current technology, but that it may be achieved with promising developments in optimising the efficiency of scATAC-seq assays.
Affiliation(s)
- Aaron Wing Cheung Kwok
  - Bioinformatics and Cellular Genomics, St Vincent's Institute of Medical Research, Fitzroy, VIC 3065, Australia
  - Melbourne Integrative Genomics, University of Melbourne, Parkville, VIC, 3010, Australia
  - School of Mathematics and Statistics, Faculty of Science, University of Melbourne, Parkville, VIC, 3010, Australia
- Heejung Shim
  - Melbourne Integrative Genomics, University of Melbourne, Parkville, VIC, 3010, Australia
  - School of Mathematics and Statistics, Faculty of Science, University of Melbourne, Parkville, VIC, 3010, Australia
- Davis J McCarthy
  - Bioinformatics and Cellular Genomics, St Vincent's Institute of Medical Research, Fitzroy, VIC 3065, Australia
  - Melbourne Integrative Genomics, University of Melbourne, Parkville, VIC, 3010, Australia
  - School of Mathematics and Statistics, Faculty of Science, University of Melbourne, Parkville, VIC, 3010, Australia
  - Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Parkville, VIC, 3010, Australia
7
Boyeau P, Bates S, Ergen C, Jordan MI, Yosef N. VI-VS: calibrated identification of feature dependencies in single-cell multiomics. Genome Biol 2024; 25:294. [PMID: 39548591 PMCID: PMC11566124 DOI: 10.1186/s13059-024-03419-z] [Received: 01/08/2024] [Accepted: 10/08/2024] [Indexed: 11/18/2024]
Abstract
Unveiling functional relationships between various molecular cell phenotypes from data using machine learning models is a key promise of multiomics. Existing methods either use flexible but hard-to-interpret models or simpler, misspecified models. VI-VS (Variational Inference for Variable Selection) balances flexibility and interpretability to identify relevant feature relationships in multiomic data. It uses deep generative models to identify conditionally dependent features, with false discovery rate control. VI-VS is available as an open-source Python package, providing a robust solution to identify features more likely representing genuine causal relationships.
Affiliation(s)
- Pierre Boyeau
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, USA
- Stephen Bates
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, USA
- Can Ergen
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, USA
  - Center for Computational Biology, University of California, Berkeley, USA
- Michael I Jordan
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, USA
  - Department of Statistics, University of California, Berkeley, USA
  - Center for Computational Biology, University of California, Berkeley, USA
  - Inria, Paris, France
- Nir Yosef
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, USA
  - Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel
8
Piran Z, Cohen N, Hoshen Y, Nitzan M. Disentanglement of single-cell data with biolord. Nat Biotechnol 2024; 42:1678-1683. [PMID: 38225466 PMCID: PMC11554562 DOI: 10.1038/s41587-023-02079-x] [Received: 03/05/2023] [Accepted: 11/30/2023] [Indexed: 01/17/2024]
Abstract
Biolord is a deep generative method for disentangling single-cell multi-omic data to known and unknown attributes, including spatial, temporal and disease states, used to reveal the decoupled biological signatures over diverse single-cell modalities and biological systems. By virtually shifting cells across states, biolord generates experimentally inaccessible samples, outperforming state-of-the-art methods in predictions of cellular response to unseen drugs and genetic perturbations. Biolord is available at https://github.com/nitzanlab/biolord.
Affiliation(s)
- Zoe Piran
  - School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel
- Niv Cohen
  - School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel
- Yedid Hoshen
  - School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel
- Mor Nitzan
  - School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel
  - Racah Institute of Physics, The Hebrew University, Jerusalem, Israel
  - Faculty of Medicine, The Hebrew University, Jerusalem, Israel
9
Chow A, Lareau CA. Concepts and new developments in droplet-based single cell multi-omics. Trends Biotechnol 2024; 42:1379-1395. [PMID: 39095258 PMCID: PMC11568944 DOI: 10.1016/j.tibtech.2024.07.006] [Received: 02/01/2024] [Revised: 05/31/2024] [Accepted: 07/12/2024] [Indexed: 08/04/2024]
Abstract
Single cell sequencing technologies have become a fixture in the molecular profiling of cells due to their ease, flexibility, and commercial availability. In particular, partitioning individual cells inside oil droplets via microfluidic reactions enables transcriptomic or multi-omic measurements for thousands of cells in parallel. Complementing the multitude of biological discoveries from genomics analyses, the past decade has brought new capabilities from assay baselines to enable a deeper understanding of the complex data from single cell multi-omics. Here, we highlight four innovations that have improved the reliability and understanding of droplet microfluidic assays. We emphasize new developments that further orient principles of technology development and guidelines for the design, benchmarking, and implementation of new droplet-based methodologies.
Affiliation(s)
- Arthur Chow
  - Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Caleb A Lareau
  - Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
10
Teo AYY, Squair JW, Courtine G, Skinnider MA. Best practices for differential accessibility analysis in single-cell epigenomics. Nat Commun 2024; 15:8805. [PMID: 39394227 PMCID: PMC11470024 DOI: 10.1038/s41467-024-53089-5] [Received: 05/23/2023] [Accepted: 09/24/2024] [Indexed: 10/13/2024]
Abstract
Differential accessibility (DA) analysis of single-cell epigenomics data enables the discovery of regulatory programs that establish cell type identity and steer responses to physiological and pathophysiological perturbations. While many statistical methods to identify DA regions have been developed, the principles that determine the performance of these methods remain unclear. As a result, there is no consensus on the most appropriate statistical methods for DA analysis of single-cell epigenomics data. Here, we present a systematic evaluation of statistical methods that have been applied to identify DA regions in single-cell ATAC-seq (scATAC-seq) data. We leverage a compendium of scATAC-seq experiments with matching bulk ATAC-seq or scRNA-seq in order to assess the accuracy, bias, robustness, and scalability of each statistical method. The structure of our experiments also provides the opportunity to define best practices for the analysis of scATAC-seq data beyond DA itself. We leverage this understanding to develop an R package implementing these best practices.
Affiliation(s)
- Alan Yue Yang Teo
  - Defitech Center for Interventional Neurotherapies (.NeuroRestore), EPFL/CHUV/UNIL, Lausanne, Switzerland
  - NeuroX Institute and Brain Mind Institute, School of Life Sciences, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
- Jordan W Squair
  - Defitech Center for Interventional Neurotherapies (.NeuroRestore), EPFL/CHUV/UNIL, Lausanne, Switzerland
  - NeuroX Institute and Brain Mind Institute, School of Life Sciences, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
  - Department of Clinical Neuroscience, Lausanne University Hospital (CHUV) and University of Lausanne (UNIL), Lausanne, Switzerland
- Gregoire Courtine
  - Defitech Center for Interventional Neurotherapies (.NeuroRestore), EPFL/CHUV/UNIL, Lausanne, Switzerland
  - NeuroX Institute and Brain Mind Institute, School of Life Sciences, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
  - Department of Clinical Neuroscience, Lausanne University Hospital (CHUV) and University of Lausanne (UNIL), Lausanne, Switzerland
- Michael A Skinnider
  - Defitech Center for Interventional Neurotherapies (.NeuroRestore), EPFL/CHUV/UNIL, Lausanne, Switzerland
  - NeuroX Institute and Brain Mind Institute, School of Life Sciences, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
  - Ludwig Institute for Cancer Research, Princeton University, Princeton, NJ, USA
11
Su C, Lee D, Jin P, Zhang J. Cell-type-specific mapping of enhancers and target genes from single-cell multimodal data. bioRxiv 2024:2024.09.24.614814. [PMID: 39386519 PMCID: PMC11463474 DOI: 10.1101/2024.09.24.614814] [Indexed: 10/12/2024]
Abstract
Mapping enhancers and target genes in disease-related cell types has provided critical insights into the functional mechanisms of genetic variants identified by genome-wide association studies (GWAS). However, most existing analyses rely on bulk data or cultured cell lines, which may fail to identify cell-type-specific enhancers and target genes. Recently, single-cell multimodal data measuring both gene expression and chromatin accessibility within the same cells have enabled the inference of enhancer-gene pairs in a cell-type-specific and context-specific manner. However, this task is challenged by the data's high sparsity, sequencing depth variation, and the computational burden of analyzing a large number of enhancer-gene pairs. To address these challenges, we propose scMultiMap, a statistical method that infers enhancer-gene association from sparse multimodal counts using a joint latent-variable model. It adjusts for technical confounding, permits fast moment-based estimation and provides analytically derived p-values. In systematic analyses of blood and brain data, scMultiMap shows appropriate type I error control, high statistical power with greater reproducibility across independent datasets and stronger consistency with orthogonal data modalities. Meanwhile, its computational cost is less than 1% of existing methods. When applied to single-cell multimodal data from postmortem brain samples from Alzheimer's disease (AD) patients and controls, scMultiMap gave the highest heritability enrichment in microglia and revealed new insights into the regulatory mechanisms of AD GWAS variants in microglia.
Affiliation(s)
- Chang Su
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, USA
- Dongsoo Lee
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, USA
- Peng Jin
  - Department of Human Genetics, School of Medicine, Emory University, Atlanta, GA, USA
- Jingfei Zhang
  - Information Systems and Operations Management, Emory University, Atlanta, GA, USA
12
Hingerl JC, Martens LD, Karollus A, Manz T, Buenrostro JD, Theis FJ, Gagneur J. scooby: Modeling multi-modal genomic profiles from DNA sequence at single-cell resolution. bioRxiv 2024:2024.09.19.613754. [PMID: 39345504 PMCID: PMC11429888 DOI: 10.1101/2024.09.19.613754] [Indexed: 10/01/2024]
Abstract
Understanding how regulatory DNA elements shape gene expression across individual cells is a fundamental challenge in genomics. Joint RNA-seq and epigenomic profiling provides opportunities to build unifying models of gene regulation capturing sequence determinants across steps of gene expression. However, current models, developed primarily for bulk omics data, fail to capture the cellular heterogeneity and dynamic processes revealed by single-cell multi-modal technologies. Here, we introduce scooby, the first model to predict scRNA-seq coverage and scATAC-seq insertion profiles along the genome from sequence at single-cell resolution. For this, we leverage the pre-trained multi-omics profile predictor Borzoi as a foundation model, equip it with a cell-specific decoder, and fine-tune its sequence embeddings. Specifically, we condition the decoder on the cell position in a precomputed single-cell embedding resulting in strong generalization capability. Applied to a hematopoiesis dataset, scooby recapitulates cell-specific expression levels of held-out genes and cells, and identifies regulators and their putative target genes through in silico motif deletion. Moreover, accurate variant effect prediction with scooby allows for breaking down bulk eQTL effects into single-cell effects and delineating their impact on chromatin accessibility and gene expression. We anticipate scooby to aid unraveling the complexities of gene regulation at the resolution of individual cells.
Affiliation(s)
- Johannes C Hingerl
  - School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
  - Munich Center for Machine Learning, Munich, Germany
- Laura D Martens
  - School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
  - Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany
- Alexander Karollus
  - School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
  - Munich Center for Machine Learning, Munich, Germany
- Trevor Manz
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Jason D Buenrostro
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142 USA
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- Fabian J Theis
  - School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
  - Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany
- Julien Gagneur
  - School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
  - Munich Center for Machine Learning, Munich, Germany
  - Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany
  - Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany

13
Martin-Rufino JD, Caulier A, Lee S, Castano N, King E, Joubran S, Jones M, Goldman SR, Arora UP, Wahlster L, Lander ES, Sankaran VG. Transcription factor networks disproportionately enrich for heritability of blood cell phenotypes. bioRxiv 2024:2024.09.09.611392. [PMID: 39314298 PMCID: PMC11419094 DOI: 10.1101/2024.09.09.611392] [Indexed: 09/25/2024]
Abstract
Most phenotype-associated genetic variants map to non-coding regulatory regions of the human genome. Moreover, variants associated with blood cell phenotypes are enriched in regulatory regions active during hematopoiesis. To systematically explore the nature of these regions, we developed a highly efficient strategy, Perturb-multiome, that makes it possible to simultaneously profile both chromatin accessibility and gene expression in single cells with CRISPR-mediated perturbation of a range of master transcription factors (TFs). This approach allowed us to examine the connection between TFs, accessible regions, and gene expression across the genome throughout hematopoietic differentiation. We discovered that variants within the TF-sensitive accessible chromatin regions, while representing less than 0.3% of the genome, show a ~100-fold enrichment in heritability across certain blood cell phenotypes; this enrichment is strikingly higher than for other accessible chromatin regions. Our approach facilitates large-scale mechanistic understanding of phenotype-associated genetic variants by connecting key cis-regulatory elements and their target genes within gene regulatory networks.
Affiliation(s)
- Jorge Diego Martin-Rufino
  - Division of Hematology/Oncology, Boston Children’s Hospital and Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Boston, MA, USA
  - Equally contributed to work
- Alexis Caulier
  - Division of Hematology/Oncology, Boston Children’s Hospital and Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Boston, MA, USA
  - Equally contributed to work
- Seayoung Lee
  - Broad Institute of MIT and Harvard, Boston, MA, USA
  - Equally contributed to work
- Nicole Castano
  - Division of Hematology/Oncology, Boston Children’s Hospital and Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Boston, MA, USA
  - Equally contributed to work
- Emily King
  - Division of Hematology/Oncology, Boston Children’s Hospital and Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Boston, MA, USA
- Samantha Joubran
  - Division of Hematology/Oncology, Boston Children’s Hospital and Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Boston, MA, USA
- Marcus Jones
  - Nascent Transcriptomics Core, Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA, USA
- Seth R. Goldman
  - Nascent Transcriptomics Core, Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA, USA
- Uma P. Arora
  - Division of Hematology/Oncology, Boston Children’s Hospital and Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Boston, MA, USA
- Lara Wahlster
  - Division of Hematology/Oncology, Boston Children’s Hospital and Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Boston, MA, USA
- Eric S. Lander
  - Broad Institute of MIT and Harvard, Boston, MA, USA
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Vijay G. Sankaran
  - Division of Hematology/Oncology, Boston Children’s Hospital and Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Boston, MA, USA
  - Howard Hughes Medical Institute, Boston, MA, USA
  - Harvard Stem Cell Institute, Cambridge, MA, USA

14
Metzner E, Southard KM, Norman TM. Multiome Perturb-seq unlocks scalable discovery of integrated perturbation effects on the transcriptome and epigenome. bioRxiv 2024:2024.07.26.605307. [PMID: 39091800 PMCID: PMC11291144 DOI: 10.1101/2024.07.26.605307] [Indexed: 08/04/2024]
Abstract
Single-cell CRISPR screens link genetic perturbations to transcriptional states, but high-throughput methods connecting these induced changes to their regulatory foundations are limited. Here we introduce Multiome Perturb-seq, extending single-cell CRISPR screens to simultaneously measure perturbation-induced changes in gene expression and chromatin accessibility. We apply Multiome Perturb-seq in a CRISPRi screen of 13 chromatin remodelers in human RPE-1 cells, achieving efficient assignment of sgRNA identities to single nuclei via an improved method for capturing barcode transcripts from nuclear RNA. We organize expression and accessibility measurements into coherent programs describing the integrated effects of perturbations on cell state, finding that ARID1A and SUZ12 knockdowns induce programs enriched for developmental features. Pseudotime analysis of perturbations connects accessibility changes to changes in gene expression, highlighting the value of multimodal profiling. Overall, our method provides a scalable and simply implemented system to dissect the regulatory logic underpinning cell state.
Affiliation(s)
- Eli Metzner
  - Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
  - Tri-Institutional Training Program in Computational Biology and Medicine, New York, NY, USA
- Kaden M. Southard
  - Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Thomas M. Norman
  - Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA

15
Feng AC, Thomas BJ, Purbey PK, de Melo FM, Liu X, Daly AE, Sun F, Lo JHH, Cheng L, Carey MF, Scumpia PO, Smale ST. The transcription factor NF-κB orchestrates nucleosome remodeling during the primary response to Toll-like receptor 4 signaling. Immunity 2024; 57:462-477.e9. [PMID: 38430908 PMCID: PMC10984581 DOI: 10.1016/j.immuni.2024.02.004] [Received: 06/25/2023] [Revised: 11/26/2023] [Accepted: 02/07/2024] [Indexed: 03/05/2024]
Abstract
Inducible nucleosome remodeling at hundreds of latent enhancers and several promoters shapes the transcriptional response to Toll-like receptor 4 (TLR4) signaling in macrophages. We aimed to define the identities of the transcription factors that promote TLR-induced remodeling. An analysis strategy based on ATAC-seq and single-cell ATAC-seq that enriched for genomic regions most likely to undergo remodeling revealed that the transcription factor nuclear factor κB (NF-κB) bound to all high-confidence peaks marking remodeling during the primary response to the TLR4 ligand, lipid A. Deletion of NF-κB subunits RelA and c-Rel resulted in the loss of remodeling at high-confidence ATAC-seq peaks, and CRISPR-Cas9 mutagenesis of NF-κB-binding motifs impaired remodeling. Remodeling selectivity at defined regions was conferred by collaboration with other inducible factors, including IRF3- and MAP-kinase-induced factors. Thus, NF-κB is unique among TLR4-activated transcription factors in its broad contribution to inducible nucleosome remodeling, alongside its ability to activate poised enhancers and promoters assembled into open chromatin.
Affiliation(s)
- An-Chieh Feng
  - Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA; Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Brandon J Thomas
  - Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA; Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Psychiatry and Behavioral Science, University of Washington, Seattle, WA 98195, USA
- Prabhat K Purbey
  - Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Filipe Menegatti de Melo
  - Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA; Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA; Howard Hughes Medical Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Xin Liu
  - Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA; Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Allison E Daly
  - Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA; Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Fei Sun
  - Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Jerry Hung-Hao Lo
  - Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA; Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Lijing Cheng
  - Department of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Michael F Carey
  - Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Philip O Scumpia
  - Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Stephen T Smale
  - Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA; Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Howard Hughes Medical Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA

16
Gao C, Welch JD. Integrating single-cell multimodal epigenomic data using 1D-convolutional neural networks. bioRxiv 2024:2024.02.16.580655. [PMID: 38464242 PMCID: PMC10925154 DOI: 10.1101/2024.02.16.580655] [Indexed: 03/12/2024]
Abstract
Recent experimental developments enable single-cell multimodal epigenomic profiling, which measures multiple histone modifications and chromatin accessibility within the same cell. Such parallel measurements provide exciting new opportunities to investigate how epigenomic modalities vary together across cell types and states. A pivotal step in using this type of data is integrating the epigenomic modalities to learn a unified representation of each cell, but existing approaches are not designed to model the unique nature of this data type. Our key insight is to model single-cell multimodal epigenome data as a multi-channel sequential signal. Based on this insight, we developed ConvNet-VAEs, a novel framework that uses 1D-convolutional variational autoencoders (VAEs) for single-cell multimodal epigenomic data integration. We evaluated ConvNet-VAEs on nano-CT and scNTT-seq data generated from juvenile mouse brain and human bone marrow. We found that ConvNet-VAEs can perform dimension reduction and batch correction better than previous architectures while using significantly fewer parameters. Furthermore, the performance gap between convolutional and fully-connected architectures increases with the number of modalities, and deeper convolutional architectures can increase performance while performance degrades for deeper fully-connected architectures. Our results indicate that convolutional autoencoders are a promising method for integrating current and future single-cell multimodal epigenomic datasets.
Affiliation(s)
- Chao Gao
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor MI 48109, USA
- Joshua D Welch
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor MI 48109, USA
  - Department of Computer Science and Engineering, University of Michigan, Ann Arbor MI 48109, USA