1
Song D, Wang Q, Yan G, Liu T, Sun T, Li JJ. scDesign3 generates realistic in silico data for multimodal single-cell and spatial omics. Nat Biotechnol 2024; 42:247-252. [PMID: 37169966 PMCID: PMC11182337 DOI: 10.1038/s41587-023-01772-1] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 09/20/2022] [Accepted: 03/30/2023] [Indexed: 05/13/2023]
Abstract
We present a statistical simulator, scDesign3, to generate realistic single-cell and spatial omics data, including various cell states, experimental designs and feature modalities, by learning interpretable parameters from real data. Using a unified probabilistic model for single-cell and spatial omics data, scDesign3 infers biologically meaningful parameters; assesses the goodness-of-fit of inferred cell clusters, trajectories and spatial locations; and generates in silico negative and positive controls for benchmarking computational tools.
Affiliation(s)
- Dongyuan Song
  - Bioinformatics Interdepartmental Ph.D. Program, University of California, Los Angeles, CA, USA
- Qingyang Wang
  - Department of Statistics, University of California, Los Angeles, CA, USA
- Guanao Yan
  - Department of Statistics, University of California, Los Angeles, CA, USA
- Tianyang Liu
  - Department of Statistics, University of California, Los Angeles, CA, USA
- Tianyi Sun
  - Department of Statistics, University of California, Los Angeles, CA, USA
- Jingyi Jessica Li
  - Bioinformatics Interdepartmental Ph.D. Program, University of California, Los Angeles, CA, USA.
  - Department of Statistics, University of California, Los Angeles, CA, USA.
  - Department of Human Genetics, University of California, Los Angeles, CA, USA.
  - Department of Computational Medicine, University of California, Los Angeles, CA, USA.
  - Department of Biostatistics, University of California, Los Angeles, CA, USA.
  - Radcliffe Institute for Advanced Study, Harvard University, Cambridge, MA, USA.
2
Roth C, Venu V, Job V, Lubbers N, Sanbonmatsu KY, Steadman CR, Starkenburg SR. Improved quality metrics for association and reproducibility in chromatin accessibility data using mutual information. BMC Bioinformatics 2023; 24:441. [PMID: 37990143 PMCID: PMC10664258 DOI: 10.1186/s12859-023-05553-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Accepted: 10/30/2023] [Indexed: 11/23/2023]
Abstract
BACKGROUND Correlation metrics are widely utilized in genomics analysis and often implemented with little regard to assumptions of normality, homoscedasticity, and independence of values. This is especially true when comparing values between replicated sequencing experiments that probe chromatin accessibility, such as assays for transposase-accessible chromatin via sequencing (ATAC-seq). Such data can possess several regions across the human genome with little to no sequencing depth and are thus non-normal with a large portion of zero values. Despite distributed use in the epigenomics field, few studies have evaluated and benchmarked how correlation and association statistics behave across ATAC-seq experiments with known differences or the effects of removing specific outliers from the data. Here, we developed a computational simulation of ATAC-seq data to elucidate the behavior of correlation statistics and to compare their accuracy under set conditions of reproducibility. RESULTS Using these simulations, we monitored the behavior of several correlation statistics, including the Pearson's R and Spearman's $\rho$ coefficients as well as Kendall's $\tau$ and Top-Down correlation. We also test the behavior of association measures, including the coefficient of determination $R^2$, Kendall's W, and normalized mutual information. Our experiments reveal an insensitivity of most statistics, including Spearman's $\rho$, Kendall's $\tau$, and Kendall's W, to increasing differences between simulated ATAC-seq replicates. The removal of co-zeros (regions lacking mapped sequenced reads) between simulated experiments greatly improves the estimates of correlation and association. After removing co-zeros, the $R^2$ coefficient and normalized mutual information display the best performance, having a closer one-to-one relationship with the known portion of shared, enhanced loci between simulated replicates. When comparing values between experimental ATAC-seq data using a random forest model, mutual information best predicts ATAC-seq replicate relationships. CONCLUSIONS Collectively, this study demonstrates how measures of correlation and association can behave in epigenomics experiments. We provide improved strategies for quantifying relationships in these increasingly prevalent and important chromatin accessibility assays.
Affiliation(s)
- Cullen Roth
  - Los Alamos National Laboratory, Genomics and Bioanalytics, Los Alamos, NM, USA.
- Vrinda Venu
  - Los Alamos National Laboratory, Climate, Ecosystems, and Environmental Science, Los Alamos, NM, USA
- Vanessa Job
  - Los Alamos National Laboratory, High Performance Computing and Design, Los Alamos, NM, USA
- Nicholas Lubbers
  - Los Alamos National Laboratory, Information Sciences, Los Alamos, NM, USA
- Karissa Y Sanbonmatsu
  - Los Alamos National Laboratory, Theoretical Biology and Biophysics, Los Alamos, NM, USA
- Christina R Steadman
  - Los Alamos National Laboratory, Climate, Ecosystems, and Environmental Science, Los Alamos, NM, USA
- Shawn R Starkenburg
  - Los Alamos National Laboratory, Genomics and Bioanalytics, Los Alamos, NM, USA
3
Yan G, Song D, Li JJ. scReadSim: a single-cell RNA-seq and ATAC-seq read simulator. Nat Commun 2023; 14:7482. [PMID: 37980428 PMCID: PMC10657386 DOI: 10.1038/s41467-023-43162-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2023] [Accepted: 11/02/2023] [Indexed: 11/20/2023]
Abstract
Benchmarking single-cell RNA-seq (scRNA-seq) and single-cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq) computational tools demands simulators to generate realistic sequencing reads. However, none of the few read simulators aim to mimic real data. To fill this gap, we introduce scReadSim, a single-cell RNA-seq and ATAC-seq read simulator that allows user-specified ground truths and generates synthetic sequencing reads (in a FASTQ or BAM file) by mimicking real data. At both read-sequence and read-count levels, scReadSim mimics real scRNA-seq and scATAC-seq data. Moreover, scReadSim provides ground truths, including unique molecular identifier (UMI) counts for scRNA-seq and open chromatin regions for scATAC-seq. In particular, scReadSim allows users to design cell-type-specific ground-truth open chromatin regions for scATAC-seq data generation. In benchmark applications of scReadSim, we show that UMI-tools achieves the top accuracy in scRNA-seq UMI deduplication, and HMMRATAC and MACS3 achieve the top performance in scATAC-seq peak calling.
Affiliation(s)
- Guanao Yan
  - Department of Statistics, University of California, Los Angeles, CA 90095-1554, USA
- Dongyuan Song
  - Bioinformatics Interdepartmental Ph.D. Program, University of California, Los Angeles, CA, 90095-7246, USA
- Jingyi Jessica Li
  - Department of Statistics, University of California, Los Angeles, CA 90095-1554, USA.
  - Bioinformatics Interdepartmental Ph.D. Program, University of California, Los Angeles, CA, 90095-7246, USA.
  - Department of Human Genetics, University of California, Los Angeles, CA, 90095-7088, USA.
  - Department of Computational Medicine, University of California, Los Angeles, CA, 90095-1766, USA.
  - Department of Biostatistics, University of California, Los Angeles, CA, 90095-1772, USA.
  - Radcliffe Institute for Advanced Study, Harvard University, Cambridge, MA, 02138, USA.
4
Patruno L, Milite S, Bergamin R, Calonaci N, D’Onofrio A, Anselmi F, Antoniotti M, Graudenzi A, Caravagna G. A Bayesian method to infer copy number clones from single-cell RNA and ATAC sequencing. PLoS Comput Biol 2023; 19:e1011557. [PMID: 37917660 PMCID: PMC10645363 DOI: 10.1371/journal.pcbi.1011557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 11/14/2023] [Accepted: 09/30/2023] [Indexed: 11/04/2023]
Abstract
Single-cell RNA and ATAC sequencing technologies enable the examination of gene expression and chromatin accessibility in individual cells, providing insights into cellular phenotypes. In cancer research, it is important to consistently analyze these states within an evolutionary context on genetic clones. Here we present CONGAS+, a Bayesian model to map single-cell RNA and ATAC profiles onto the latent space of copy number clones. CONGAS+ clusters cells into tumour subclones with similar ploidy, making it straightforward to compare their expression and chromatin profiles. The framework, implemented on GPU and tested on real and simulated data, scales seamlessly to analyse thousands of cells, demonstrating better performance than single-molecule models, and supporting new multi-omics assays. In prostate cancer, lymphoma and basal cell carcinoma, CONGAS+ successfully identifies complex subclonal architectures while providing a coherent mapping between ATAC and RNA, facilitating the study of genotype-phenotype maps and their connection to genomic instability.
Affiliation(s)
- Lucrezia Patruno
  - Department of Informatics, Systems and Communication, Università degli Studi di Milano-Bicocca, Milan, Italy
  - Department of Mathematics and Geosciences, Università degli Studi di Trieste, Trieste, Italy
- Salvatore Milite
  - Department of Mathematics and Geosciences, Università degli Studi di Trieste, Trieste, Italy
  - Centre for Computational Biology, Human Technopole, Milan, Italy
- Riccardo Bergamin
  - Department of Mathematics and Geosciences, Università degli Studi di Trieste, Trieste, Italy
- Nicola Calonaci
  - Department of Mathematics and Geosciences, Università degli Studi di Trieste, Trieste, Italy
- Alberto D’Onofrio
  - Department of Mathematics and Geosciences, Università degli Studi di Trieste, Trieste, Italy
- Fabio Anselmi
  - Department of Mathematics and Geosciences, Università degli Studi di Trieste, Trieste, Italy
- Marco Antoniotti
  - Department of Informatics, Systems and Communication, Università degli Studi di Milano-Bicocca, Milan, Italy
  - B4—Bicocca Bioinformatics Biostatistics and Bioimaging Centre, Università degli Studi di Milano-Bicocca, Milan, Italy
- Alex Graudenzi
  - Department of Informatics, Systems and Communication, Università degli Studi di Milano-Bicocca, Milan, Italy
  - B4—Bicocca Bioinformatics Biostatistics and Bioimaging Centre, Università degli Studi di Milano-Bicocca, Milan, Italy
- Giulio Caravagna
  - Department of Mathematics and Geosciences, Università degli Studi di Trieste, Trieste, Italy
5
Zhang W, Jiang R, Chen S, Wang Y. scIBD: a self-supervised iterative-optimizing model for boosting the detection of heterotypic doublets in single-cell chromatin accessibility data. Genome Biol 2023; 24:225. [PMID: 37814314 PMCID: PMC10561408 DOI: 10.1186/s13059-023-03072-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2023] [Accepted: 09/22/2023] [Indexed: 10/11/2023]
Abstract
Application of the widely used droplet-based microfluidic technologies in single-cell sequencing often yields doublets, introducing bias to downstream analyses. In particular, doublet-detection methods for single-cell chromatin accessibility sequencing (scCAS) data face multiple assay-specific challenges. Therefore, we propose scIBD, a self-supervised iterative-optimizing model for boosting heterotypic doublet detection in scCAS data. scIBD introduces an adaptive strategy to simulate high-confidence heterotypic doublets and self-supervises doublet detection in an iteratively optimizing manner. Comprehensive benchmarking on various simulated and real datasets demonstrates the superior performance and robustness of scIBD. Moreover, the downstream biological analyses suggest the efficacy of doublet removal by scIBD.
Affiliation(s)
- Wenhao Zhang
  - Department of Automation, Xiamen University, Xiamen, 361000, Fujian, China
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, 361000, Fujian, China
- Rui Jiang
  - Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, 100084, China
- Shengquan Chen
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China.
- Ying Wang
  - Department of Automation, Xiamen University, Xiamen, 361000, Fujian, China.
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, 361000, Fujian, China.
  - Xiamen Key Laboratory of Big Data Intelligent Analysis and Decision, Xiamen, 361005, Fujian, China.
6
Li H, Zhang Z, Squires M, Chen X, Zhang X. scMultiSim: simulation of single cell multi-omics and spatial data guided by gene regulatory networks and cell-cell interactions. RESEARCH SQUARE 2023:rs.3.rs-3301625. [PMID: 37790516 PMCID: PMC10543280 DOI: 10.21203/rs.3.rs-3301625/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2023]
Abstract
Simulated single-cell data is essential for designing and evaluating computational methods in the absence of experimental ground truth. Existing simulators typically focus on modeling one or two specific biological factors or mechanisms that affect the output data, which limits their capacity to simulate the complexity and multi-modality in real data. Here, we present scMultiSim, an in silico simulator that generates multi-modal single-cell data, including gene expression, chromatin accessibility, RNA velocity, and spatial cell locations while accounting for the relationships between modalities. scMultiSim jointly models various biological factors that affect the output data, including cell identity, within-cell gene regulatory networks (GRNs), cell-cell interactions (CCIs), and chromatin accessibility, while also incorporating technical noises. Moreover, it allows users to adjust each factor's effect easily. We validated scMultiSim's simulated biological effects and demonstrated its applications by benchmarking a wide range of computational tasks, including multi-modal and multi-batch data integration, RNA velocity estimation, GRN inference and CCI inference using spatially resolved gene expression data, many of which were not benchmarked before due to the lack of proper tools. Compared to existing simulators, scMultiSim can benchmark a much broader range of existing computational problems and even new potential tasks.
Affiliation(s)
- Hechen Li
  - Georgia Institute of Technology, Atlanta, USA
- Ziqi Zhang
  - Georgia Institute of Technology, Atlanta, USA
- Xi Chen
  - Southern University of Science and Technology, Shenzhen, China
7
Li C, Chen X, Chen S, Jiang R, Zhang X. simCAS: an embedding-based method for simulating single-cell chromatin accessibility sequencing data. Bioinformatics 2023; 39:btad453. [PMID: 37494428 PMCID: PMC10394124 DOI: 10.1093/bioinformatics/btad453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Revised: 06/25/2023] [Accepted: 07/25/2023] [Indexed: 07/28/2023]
Abstract
MOTIVATION Single-cell chromatin accessibility sequencing (scCAS) technology provides an epigenomic perspective to characterize gene regulatory mechanisms at single-cell resolution. With an increasing number of computational methods proposed for analyzing scCAS data, a powerful simulation framework is desirable for evaluation and validation of these methods. However, existing simulators generate synthetic data by sampling reads from real data or mimicking existing cell states, which is inadequate to provide credible ground-truth labels for method evaluation. RESULTS We present simCAS, an embedding-based simulator, for generating high-fidelity scCAS data from both cell- and peak-wise embeddings. We demonstrate that simCAS outperforms existing simulators in resembling real data and show that simCAS can generate cells of different states with user-defined cell populations and differentiation trajectories. Additionally, simCAS can simulate data from different batches and encode user-specified interactions of chromatin regions in the synthetic data, which provides ground-truth labels beyond cell states. We systematically demonstrate that simCAS facilitates the benchmarking of four core tasks in downstream analysis: cell clustering, trajectory inference, data integration, and cis-regulatory interaction inference. We anticipate simCAS will be a reliable and flexible simulator for evaluating the ongoing computational methods applied to scCAS data. AVAILABILITY AND IMPLEMENTATION simCAS is freely available at https://github.com/Chen-Li-17/simCAS.
Affiliation(s)
- Chen Li
  - MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
- Xiaoyang Chen
  - MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
- Shengquan Chen
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Rui Jiang
  - MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
- Xuegong Zhang
  - MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
  - Center for Synthetic and Systems Biology, School of Life Sciences and School of Medicine, Tsinghua University, Beijing 100084, China
8
Ellis D, Roy A, Datta S. Clustering single-cell multimodal omics data with jrSiCKLSNMF. Front Genet 2023; 14:1179439. [PMID: 37359367 PMCID: PMC10288154 DOI: 10.3389/fgene.2023.1179439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2023] [Accepted: 05/23/2023] [Indexed: 06/28/2023]
Abstract
Introduction: The development of multimodal single-cell omics methods has enabled the collection of data across different omics modalities from the same set of single cells. Each omics modality provides unique information about cell type and function, so the ability to integrate data from different modalities can provide deeper insights into cellular functions. Often, single-cell omics data can prove challenging to model because of high dimensionality, sparsity, and technical noise. Methods: We propose a novel multimodal data analysis method called joint graph-regularized Single-Cell Kullback-Leibler Sparse Non-negative Matrix Factorization (jrSiCKLSNMF, pronounced "junior sickles NMF") that extracts latent factors shared across omics modalities within the same set of single cells. Results: We compare our clustering algorithm to several existing methods on four sets of data simulated from third party software. We also apply our algorithm to a real set of cell line data. Discussion: We show overwhelmingly better clustering performance than several existing methods on the simulated data. On a real multimodal omics dataset, we also find our method to produce scientifically accurate clustering results.
9
Li H, Zhang Z, Squires M, Chen X, Zhang X. scMultiSim: simulation of multi-modality single cell data guided by cell-cell interactions and gene regulatory networks. RESEARCH SQUARE 2023:rs.3.rs-2675530. [PMID: 36993284 PMCID: PMC10055660 DOI: 10.21203/rs.3.rs-2675530/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Simulated single-cell data is essential for designing and evaluating computational methods in the absence of experimental ground truth. Existing simulators typically focus on modeling one or two specific biological factors or mechanisms that affect the output data, which limits their capacity to simulate the complexity and multi-modality in real data. Here, we present scMultiSim, an in silico simulator that generates multi-modal single-cell data, including gene expression, chromatin accessibility, RNA velocity, and spatial cell locations while accounting for the relationships between modalities. scMultiSim jointly models various biological factors that affect the output data, including cell identity, within-cell gene regulatory networks (GRNs), cell-cell interactions (CCIs), and chromatin accessibility, while also incorporating technical noises. Moreover, it allows users to adjust each factor's effect easily. We validated scMultiSim's simulated biological effects and demonstrated its applications by benchmarking a wide range of computational tasks, including cell clustering and trajectory inference, multi-modal and multi-batch data integration, RNA velocity estimation, GRN inference and CCI inference using spatially resolved gene expression data. Compared to existing simulators, scMultiSim can benchmark a much broader range of existing computational problems and even new potential tasks.
Affiliation(s)
- Hechen Li
  - Georgia Institute of Technology, Atlanta, USA
- Ziqi Zhang
  - Georgia Institute of Technology, Atlanta, USA
- Xi Chen
  - Southern University of Science and Technology, China
10
scChIX-seq infers dynamic relationships between histone modifications in single cells. Nat Biotechnol 2023:10.1038/s41587-022-01560-3. [PMID: 36593403 DOI: 10.1038/s41587-022-01560-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 04/27/2021] [Accepted: 10/12/2022] [Indexed: 01/03/2023]
Abstract
Regulation of chromatin states involves the dynamic interplay between different histone modifications to control gene expression. Recent advances have enabled mapping of histone marks in single cells, but most methods are constrained to profile only one histone mark per cell. Here, we present an integrated experimental and computational framework, scChIX-seq (single-cell chromatin immunocleavage and unmixing sequencing), to map several histone marks in single cells. scChIX-seq multiplexes two histone marks together in single cells, then computationally deconvolves the signal using training data from respective histone mark profiles. This framework learns the cell-type-specific correlation structure between histone marks, and therefore does not require a priori assumptions of their genomic distributions. Using scChIX-seq, we demonstrate multimodal analysis of histone marks in single cells across a range of mark combinations. Modeling dynamics of in vitro macrophage differentiation enables integrated analysis of chromatin velocity. Overall, scChIX-seq unlocks systematic interrogation of the interplay between histone modifications in single cells.
11
Chen X, Chen S, Song S, Gao Z, Hou L, Zhang X, Lv H, Jiang R. Cell type annotation of single-cell chromatin accessibility data via supervised Bayesian embedding. NAT MACH INTELL 2022. [DOI: 10.1038/s42256-021-00432-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 12/13/2022]
12
Schwartz GW, Zhou Y, Petrovic J, Pear WS, Faryabi RB. TooManyPeaks identifies drug-resistant-specific regulatory elements from single-cell leukemic epigenomes. Cell Rep 2021; 36:109575. [PMID: 34433064 PMCID: PMC8409102 DOI: 10.1016/j.celrep.2021.109575] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/31/2020] [Revised: 03/30/2021] [Accepted: 07/29/2021] [Indexed: 12/13/2022]
Abstract
Emerging single-cell epigenomic assays are used to investigate the heterogeneity of chromatin activity and its function. However, identifying cells with distinct regulatory elements and clearly visualizing their relationships remains challenging. To this end, we introduce TooManyPeaks to address the need for the simultaneous study of chromatin state heterogeneity in both rare and abundant subpopulations. Our analyses of existing data from three widely used single-cell assays for transposase-accessible chromatin using sequencing (scATAC-seq) show the superior performance of TooManyPeaks in delineating and visualizing pure clusters of rare and abundant subpopulations. Furthermore, the application of TooManyPeaks to new scATAC-seq data from drug-naive and drug-resistant leukemic T cells clearly visualizes relationships among these cells and stratifies a rare "resistant-like" drug-naive sub-clone with distinct cis-regulatory elements.
Affiliation(s)
- Gregory W Schwartz
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA; Penn Epigenetics Institute, University of Pennsylvania, Philadelphia, PA, USA; Abramson Family Cancer Research Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Yeqiao Zhou
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA; Penn Epigenetics Institute, University of Pennsylvania, Philadelphia, PA, USA; Abramson Family Cancer Research Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Jelena Petrovic
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA; Penn Epigenetics Institute, University of Pennsylvania, Philadelphia, PA, USA; Abramson Family Cancer Research Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Warren S Pear
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA; Abramson Family Cancer Research Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Robert B Faryabi
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA; Penn Epigenetics Institute, University of Pennsylvania, Philadelphia, PA, USA; Abramson Family Cancer Research Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.