1
|
Teo AYY, Squair JW, Courtine G, Skinnider MA. Best practices for differential accessibility analysis in single-cell epigenomics. Nat Commun 2024; 15:8805. [PMID: 39394227 PMCID: PMC11470024 DOI: 10.1038/s41467-024-53089-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2023] [Accepted: 09/24/2024] [Indexed: 10/13/2024] Open
Abstract
Differential accessibility (DA) analysis of single-cell epigenomics data enables the discovery of regulatory programs that establish cell type identity and steer responses to physiological and pathophysiological perturbations. While many statistical methods to identify DA regions have been developed, the principles that determine the performance of these methods remain unclear. As a result, there is no consensus on the most appropriate statistical methods for DA analysis of single-cell epigenomics data. Here, we present a systematic evaluation of statistical methods that have been applied to identify DA regions in single-cell ATAC-seq (scATAC-seq) data. We leverage a compendium of scATAC-seq experiments with matching bulk ATAC-seq or scRNA-seq in order to assess the accuracy, bias, robustness, and scalability of each statistical method. The structure of our experiments also provides the opportunity to define best practices for the analysis of scATAC-seq data beyond DA itself. We leverage this understanding to develop an R package implementing these best practices.
Collapse
Affiliation(s)
- Alan Yue Yang Teo
- Defitech Center for Interventional Neurotherapies (.NeuroRestore), EPFL/CHUV/UNIL, Lausanne, Switzerland
- NeuroX Institute and Brain Mind Institute, School of Life Sciences, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
| | - Jordan W Squair
- Defitech Center for Interventional Neurotherapies (.NeuroRestore), EPFL/CHUV/UNIL, Lausanne, Switzerland.
- NeuroX Institute and Brain Mind Institute, School of Life Sciences, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland.
- Department of Clinical Neuroscience, Lausanne University Hospital (CHUV) and University of Lausanne (UNIL), Lausanne, Switzerland.
| | - Gregoire Courtine
- Defitech Center for Interventional Neurotherapies (.NeuroRestore), EPFL/CHUV/UNIL, Lausanne, Switzerland.
- NeuroX Institute and Brain Mind Institute, School of Life Sciences, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland.
- Department of Clinical Neuroscience, Lausanne University Hospital (CHUV) and University of Lausanne (UNIL), Lausanne, Switzerland.
| | - Michael A Skinnider
- Defitech Center for Interventional Neurotherapies (.NeuroRestore), EPFL/CHUV/UNIL, Lausanne, Switzerland.
- NeuroX Institute and Brain Mind Institute, School of Life Sciences, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland.
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA.
- Ludwig Institute for Cancer Research, Princeton University, Princeton, NJ, USA.
| |
Collapse
|
2
|
Subedi S, Sumida TS, Park YP. A scalable approach to topic modelling in single-cell data by approximate pseudobulk projection. Life Sci Alliance 2024; 7:e202402713. [PMID: 39107066 PMCID: PMC11303850 DOI: 10.26508/lsa.202402713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2024] [Revised: 07/29/2024] [Accepted: 07/30/2024] [Indexed: 08/09/2024] Open
Abstract
Probabilistic topic modelling has become essential in many types of single-cell data analysis. Based on probabilistic topic assignments in each cell, we identify the latent representation of cellular states. A dictionary matrix, consisting of topic-specific gene frequency vectors, provides interpretable bases to be compared with known cell type-specific marker genes and other pathway annotations. However, fitting a topic model on a large number of cells would require heavy computational resources-specialized computing units, computing time and memory. Here, we present a scalable approximation method customized for single-cell RNA-seq data analysis, termed ASAP, short for Annotating a Single-cell data matrix by Approximate Pseudobulk estimation. Our approach is more accurate than existing methods but requires orders of magnitude less computing time, leaving much lower memory consumption. We also show that our approach is widely applicable for atlas-scale data analysis; our method seamlessly integrates single-cell and bulk data in joint analysis, not requiring additional preprocessing or feature selection steps.
Collapse
Affiliation(s)
- Sishir Subedi
- https://ror.org/03rmrcq20Bioinformatics Graduate Program, University of British Columbia, Vancouver, Canada
- BC Cancer Research, Vancouver, Canada
| | - Tomokazu S Sumida
- Neurology, Program for Neuroinflammation, Yale School of Medicine, New Haven, CT, USA
| | - Yongjin P Park
- BC Cancer Research, Vancouver, Canada
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
- Department of Statistics, University of British Columbia, Vancouver, Canada
| |
Collapse
|
3
|
Rymuza J, Sun Y, Zheng G, LeRoy N, Murach M, Phan N, Zhang A, Sheffield N. Methods for constructing and evaluating consensus genomic interval sets. Nucleic Acids Res 2024; 52:10119-10131. [PMID: 39180401 PMCID: PMC11417377 DOI: 10.1093/nar/gkae685] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2023] [Revised: 07/05/2024] [Accepted: 07/29/2024] [Indexed: 08/26/2024] Open
Abstract
The amount of genomic region data continues to increase. Integrating across diverse genomic region sets requires consensus regions, which enable comparing regions across experiments, but also by necessity lose precision in region definitions. We require methods to assess this loss of precision and build optimal consensus region sets. Here, we introduce the concept of flexible intervals and propose three novel methods for building consensus region sets, or universes: a coverage cutoff method, a likelihood method, and a Hidden Markov Model. We then propose three novel measures for evaluating how well a proposed universe fits a collection of region sets: a base-level overlap score, a region boundary distance score, and a likelihood score. We apply our methods and evaluation approaches to several collections of region sets and show how these methods can be used to evaluate fit of universes and build optimal universes. We describe scenarios where the common approach of merging regions to create consensus leads to undesirable outcomes and provide principled alternatives that provide interoperability of interval data while minimizing loss of resolution.
Collapse
Affiliation(s)
- Julia Rymuza
- Department of Genome Sciences, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
| | - Yuchen Sun
- Department of Genome Sciences, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- Department of Computer Science, School of Engineering, University of Virginia, Charlottesville, VA 22908, USA
| | - Guangtao Zheng
- Department of Computer Science, School of Engineering, University of Virginia, Charlottesville, VA 22908, USA
| | - Nathan J LeRoy
- Department of Genome Sciences, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- Department of Biomedical Engineering, School of Medicine, University of Virginia, Charlottesville, VA 22904, USA
| | - Maria Murach
- Department of Genome Sciences, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
| | - Neil Phan
- Department of Genome Sciences, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- Department of Computer Science, School of Engineering, University of Virginia, Charlottesville, VA 22908, USA
| | - Aidong Zhang
- Department of Computer Science, School of Engineering, University of Virginia, Charlottesville, VA 22908, USA
- Department of Biomedical Engineering, School of Medicine, University of Virginia, Charlottesville, VA 22904, USA
- School of Data Science, University of Virginia, Charlottesville, VA 22904, USA
| | - Nathan C Sheffield
- Department of Genome Sciences, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- Department of Biomedical Engineering, School of Medicine, University of Virginia, Charlottesville, VA 22904, USA
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- School of Data Science, University of Virginia, Charlottesville, VA 22904, USA
- Department of Public Health Sciences, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- Child Health Research Center, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
| |
Collapse
|
4
|
LeRoy N, Smith J, Zheng G, Rymuza J, Gharavi E, Brown D, Zhang A, Sheffield N. Fast clustering and cell-type annotation of scATAC data using pre-trained embeddings. NAR Genom Bioinform 2024; 6:lqae073. [PMID: 38974799 PMCID: PMC11224678 DOI: 10.1093/nargab/lqae073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2023] [Revised: 04/29/2024] [Accepted: 06/20/2024] [Indexed: 07/09/2024] Open
Abstract
Data from the single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) are now widely available. One major computational challenge is dealing with high dimensionality and inherent sparsity, which is typically addressed by producing lower dimensional representations of single cells for downstream clustering tasks. Current approaches produce such individual cell embeddings directly through a one-step learning process. Here, we propose an alternative approach by building embedding models pre-trained on reference data. We argue that this provides a more flexible analysis workflow that also has computational performance advantages through transfer learning. We implemented our approach in scEmbed, an unsupervised machine-learning framework that learns low-dimensional embeddings of genomic regulatory regions to represent and analyze scATAC-seq data. scEmbed performs well in terms of clustering ability and has the key advantage of learning patterns of region co-occurrence that can be transferred to other, unseen datasets. Moreover, models pre-trained on reference data can be exploited to build fast and accurate cell-type annotation systems without the need for other data modalities. scEmbed is implemented in Python and it is available to download from GitHub. We also make our pre-trained models available on huggingface for public use. scEmbed is open source and available at https://github.com/databio/geniml. Pre-trained models from this work can be obtained on huggingface: https://huggingface.co/databio.
Collapse
Affiliation(s)
- Nathan J LeRoy
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- Department of Biomedical Engineering, School of Medicine, University of Virginia, Charlottesville, VA 22904, USA
| | - Jason P Smith
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- Child Health Research Center, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
| | - Guangtao Zheng
- Department of Computer Science, School of Engineering, University of Virginia, Charlottesville, VA 22908, USA
| | - Julia Rymuza
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
| | - Erfaneh Gharavi
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- School of Data Science, University of Virginia, Charlottesville, VA 22904, USA
| | - Donald E Brown
- School of Data Science, University of Virginia, Charlottesville, VA 22904, USA
- Department of Systems and Information Engineering, University of Virginia, Charlottesville, VA 22908, USA
| | - Aidong Zhang
- Department of Biomedical Engineering, School of Medicine, University of Virginia, Charlottesville, VA 22904, USA
- Department of Computer Science, School of Engineering, University of Virginia, Charlottesville, VA 22908, USA
- School of Data Science, University of Virginia, Charlottesville, VA 22904, USA
| | - Nathan C Sheffield
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- Department of Biomedical Engineering, School of Medicine, University of Virginia, Charlottesville, VA 22904, USA
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- Child Health Research Center, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- Department of Computer Science, School of Engineering, University of Virginia, Charlottesville, VA 22908, USA
- School of Data Science, University of Virginia, Charlottesville, VA 22904, USA
- Department of Public Health Sciences, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
| |
Collapse
|
5
|
Luo S, Germain PL, Robinson MD, von Meyenn F. Benchmarking computational methods for single-cell chromatin data analysis. Genome Biol 2024; 25:225. [PMID: 39152456 PMCID: PMC11328424 DOI: 10.1186/s13059-024-03356-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2023] [Accepted: 07/29/2024] [Indexed: 08/19/2024] Open
Abstract
BACKGROUND Single-cell chromatin accessibility assays, such as scATAC-seq, are increasingly employed in individual and joint multi-omic profiling of single cells. As the accumulation of scATAC-seq and multi-omics datasets continue, challenges in analyzing such sparse, noisy, and high-dimensional data become pressing. Specifically, one challenge relates to optimizing the processing of chromatin-level measurements and efficiently extracting information to discern cellular heterogeneity. This is of critical importance, since the identification of cell types is a fundamental step in current single-cell data analysis practices. RESULTS We benchmark 8 feature engineering pipelines derived from 5 recent methods to assess their ability to discover and discriminate cell types. By using 10 metrics calculated at the cell embedding, shared nearest neighbor graph, or partition levels, we evaluate the performance of each method at different data processing stages. This comprehensive approach allows us to thoroughly understand the strengths and weaknesses of each method and the influence of parameter selection. CONCLUSIONS Our analysis provides guidelines for choosing analysis methods for different datasets. Overall, feature aggregation, SnapATAC, and SnapATAC2 outperform latent semantic indexing-based methods. For datasets with complex cell-type structures, SnapATAC and SnapATAC2 are preferred. With large datasets, SnapATAC2 and ArchR are most scalable.
Collapse
Affiliation(s)
- Siyuan Luo
- Laboratory of Nutrition and Metabolic Epigenetics, Department of Health Sciences and Technology, ETH Zurich, Zurich, Switzerland
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
| | - Pierre-Luc Germain
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
- Institute for Neuroscience, Department of Health Sciences and Technology, ETH Zurich, Zurich, Switzerland
| | - Mark D Robinson
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland.
- SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland.
| | - Ferdinand von Meyenn
- Laboratory of Nutrition and Metabolic Epigenetics, Department of Health Sciences and Technology, ETH Zurich, Zurich, Switzerland.
| |
Collapse
|
6
|
Rachid Zaim S, Pebworth MP, McGrath I, Okada L, Weiss M, Reading J, Czartoski JL, Torgerson TR, McElrath MJ, Bumol TF, Skene PJ, Li XJ. MOCHA's advanced statistical modeling of scATAC-seq data enables functional genomic inference in large human cohorts. Nat Commun 2024; 15:6828. [PMID: 39122670 PMCID: PMC11316085 DOI: 10.1038/s41467-024-50612-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2023] [Accepted: 07/13/2024] [Indexed: 08/12/2024] Open
Abstract
Single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) is being increasingly used to study gene regulation. However, major analytical gaps limit its utility in studying gene regulatory programs in complex diseases. In response, MOCHA (Model-based single cell Open CHromatin Analysis) presents major advances over existing analysis tools, including: 1) improving identification of sample-specific open chromatin, 2) statistical modeling of technical drop-out with zero-inflated methods, 3) mitigation of false positives in single cell analysis, 4) identification of alternative transcription-starting-site regulation, and 5) modules for inferring temporal gene regulatory networks from longitudinal data. These advances, in addition to open chromatin analyses, provide a robust framework after quality control and cell labeling to study gene regulatory programs in human disease. We benchmark MOCHA with four state-of-the-art tools to demonstrate its advances. We also construct cross-sectional and longitudinal gene regulatory networks, identifying potential mechanisms of COVID-19 response. MOCHA provides researchers with a robust analytical tool for functional genomic inference from scATAC-seq data.
Collapse
Affiliation(s)
| | | | | | - Lauren Okada
- Allen Institute for Immunology, Seattle, WA, USA
| | - Morgan Weiss
- Allen Institute for Immunology, Seattle, WA, USA
| | | | - Julie L Czartoski
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | | | - M Juliana McElrath
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | | | | | - Xiao-Jun Li
- Allen Institute for Immunology, Seattle, WA, USA.
| |
Collapse
|
7
|
Liu J, Ma J, Wen J, Zhou X. A Cell Cycle-Aware Network for Data Integration and Label Transferring of Single-Cell RNA-Seq and ATAC-Seq. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2024; 11:e2401815. [PMID: 38887194 PMCID: PMC11336957 DOI: 10.1002/advs.202401815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/20/2024] [Revised: 04/22/2024] [Indexed: 06/20/2024]
Abstract
In recent years, the integration of single-cell multi-omics data has provided a more comprehensive understanding of cell functions and internal regulatory mechanisms from a non-single omics perspective, but it still suffers many challenges, such as omics-variance, sparsity, cell heterogeneity, and confounding factors. As it is known, the cell cycle is regarded as a confounder when analyzing other factors in single-cell RNA-seq data, but it is not clear how it will work on the integrated single-cell multi-omics data. Here, a cell cycle-aware network (CCAN) is developed to remove cell cycle effects from the integrated single-cell multi-omics data while keeping the cell type-specific variations. This is the first computational model to study the cell-cycle effects in the integration of single-cell multi-omics data. Validations on several benchmark datasets show the outstanding performance of CCAN in a variety of downstream analyses and applications, including removing cell cycle effects and batch effects of scRNA-seq datasets from different protocols, integrating paired and unpaired scRNA-seq and scATAC-seq data, accurately transferring cell type labels from scRNA-seq to scATAC-seq data, and characterizing the differentiation process from hematopoietic stem cells to different lineages in the integration of differentiation data.
Collapse
Affiliation(s)
- Jiajia Liu
- Center for Computational Systems MedicineMcWilliams School of Biomedical InformaticsThe University of Texas Health Science Center at HoustonHoustonTX77030USA
| | - Jian Ma
- Department of Electronic Information and Computer EngineeringThe Engineering & Technical College of Chengdu University of TechnologyLeshanSichuan614000China
| | - Jianguo Wen
- Center for Computational Systems MedicineMcWilliams School of Biomedical InformaticsThe University of Texas Health Science Center at HoustonHoustonTX77030USA
| | - Xiaobo Zhou
- Center for Computational Systems MedicineMcWilliams School of Biomedical InformaticsThe University of Texas Health Science Center at HoustonHoustonTX77030USA
- McGovern Medical SchoolThe University of Texas Health Science Center at HoustonHoustonTX77030USA
- School of DentistryThe University of Texas Health Science Center at HoustonHoustonTX77030USA
| |
Collapse
|
8
|
Zhao F, Ma X, Yao B, Lu Q, Chen L. scaDA: A novel statistical method for differential analysis of single-cell chromatin accessibility sequencing data. PLoS Comput Biol 2024; 20:e1011854. [PMID: 39093856 PMCID: PMC11324137 DOI: 10.1371/journal.pcbi.1011854] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2024] [Revised: 08/14/2024] [Accepted: 07/17/2024] [Indexed: 08/04/2024] Open
Abstract
Single-cell ATAC-seq sequencing data (scATAC-seq) has been widely used to investigate chromatin accessibility on the single-cell level. One important application of scATAC-seq data analysis is differential chromatin accessibility (DA) analysis. However, the data characteristics of scATAC-seq such as excessive zeros and large variability of chromatin accessibility across cells impose a unique challenge for DA analysis. Existing statistical methods focus on detecting the mean difference of the chromatin accessible regions while overlooking the distribution difference. Motivated by real data exploration that distribution difference exists among cell types, we introduce a novel composite statistical test named "scaDA", which is based on zero-inflated negative binomial model (ZINB), for performing differential distribution analysis of chromatin accessibility by jointly testing the abundance, prevalence and dispersion simultaneously. Benefiting from both dispersion shrinkage and iterative refinement of mean and prevalence parameter estimates, scaDA demonstrates its superiority to both ZINB-based likelihood ratio tests and published methods by achieving the highest power and best FDR control in a comprehensive simulation study. In addition to demonstrating the highest power in three real sc-multiome data analyses, scaDA successfully identifies differentially accessible regions in microglia from sc-multiome data for an Alzheimer's disease (AD) study that are most enriched in GO terms related to neurogenesis and the clinical phenotype of AD, and AD-associated GWAS SNPs.
Collapse
Affiliation(s)
- Fengdi Zhao
- Department of Biostatistics, University of Florida, Gainesville, Florida, United States of America
| | - Xin Ma
- Department of Biostatistics, University of Florida, Gainesville, Florida, United States of America
| | - Bing Yao
- Department of Human Genetics, Emory University, Atlanta, Georgia, United States of America
| | - Qing Lu
- Department of Biostatistics, University of Florida, Gainesville, Florida, United States of America
| | - Li Chen
- Department of Biostatistics, University of Florida, Gainesville, Florida, United States of America
| |
Collapse
|
9
|
Yang X, Mann KK, Wu H, Ding J. scCross: a deep generative model for unifying single-cell multi-omics with seamless integration, cross-modal generation, and in silico exploration. Genome Biol 2024; 25:198. [PMID: 39075536 PMCID: PMC11285326 DOI: 10.1186/s13059-024-03338-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2023] [Accepted: 07/16/2024] [Indexed: 07/31/2024] Open
Abstract
Single-cell multi-omics data reveal complex cellular states, providing significant insights into cellular dynamics and disease. Yet, integration of multi-omics data presents challenges. Some modalities have not reached the robustness or clarity of established transcriptomics. Coupled with data scarcity for less established modalities and integration intricacies, these challenges limit our ability to maximize single-cell omics benefits. We introduce scCross, a tool leveraging variational autoencoders, generative adversarial networks, and the mutual nearest neighbors (MNN) technique for modality alignment. By enabling single-cell cross-modal data generation, multi-omics data simulation, and in silico cellular perturbations, scCross enhances the utility of single-cell multi-omics studies.
Collapse
Affiliation(s)
- Xiuhui Yang
- School of Software, Shandong University, 1500 Shunhua, Jinan, 250101, Shandong, China
- Meakins-Christie Laboratories, Department of Medicine, McGill University Health Centre, Montreal, H4A 3J1, QC, Canada
- Quantitative Life Sciences, Faculty of Medicine & Health Sciences, McGill University, Montreal, QC, H3G 1Y6, Canada
| | - Koren K Mann
- Department of Pharmacology and Therapeutics, McGill University, Montreal, QC, H3G 1Y6, Canada
| | - Hao Wu
- School of Software, Shandong University, 1500 Shunhua, Jinan, 250101, Shandong, China.
| | - Jun Ding
- Meakins-Christie Laboratories, Department of Medicine, McGill University Health Centre, Montreal, H4A 3J1, QC, Canada.
- Quantitative Life Sciences, Faculty of Medicine & Health Sciences, McGill University, Montreal, QC, H3G 1Y6, Canada.
- Mila-Quebec AI Institute, Montreal, QC, H2S 3H1, Canada.
| |
Collapse
|
10
|
Lin Y, Pan Z, Zeng Y, Yang Y, Dai Z. Detecting novel cell type in single-cell chromatin accessibility data via open-set domain adaptation. Brief Bioinform 2024; 25:bbae370. [PMID: 39073828 DOI: 10.1093/bib/bbae370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2024] [Revised: 06/27/2024] [Accepted: 07/15/2024] [Indexed: 07/30/2024] Open
Abstract
Recent advances in single-cell technologies enable the rapid growth of multi-omics data. Cell type annotation is one common task in analyzing single-cell data. It is a challenge that some cell types in the testing set are not present in the training set (i.e. unknown cell types). Most scATAC-seq cell type annotation methods generally assign each cell in the testing set to one known type in the training set but neglect unknown cell types. Here, we present OVAAnno, an automatic cell types annotation method which utilizes open-set domain adaptation to detect unknown cell types in scATAC-seq data. Comprehensive experiments show that OVAAnno successfully identifies known and unknown cell types. Further experiments demonstrate that OVAAnno also performs well on scRNA-seq data. Our codes are available online at https://github.com/lisaber/OVAAnno/tree/master.
Collapse
Affiliation(s)
- Yuefan Lin
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510006, China
| | - Zixiang Pan
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510006, China
| | - Yuansong Zeng
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510006, China
| | - Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510006, China
| | - Zhiming Dai
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510006, China
| |
Collapse
|
11
|
Loers JU, Vermeirssen V. A single-cell multimodal view on gene regulatory network inference from transcriptomics and chromatin accessibility data. Brief Bioinform 2024; 25:bbae382. [PMID: 39207727 PMCID: PMC11359808 DOI: 10.1093/bib/bbae382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2024] [Revised: 06/27/2024] [Accepted: 07/23/2024] [Indexed: 09/04/2024] Open
Abstract
Eukaryotic gene regulation is a combinatorial, dynamic, and quantitative process that plays a vital role in development and disease and can be modeled at a systems level in gene regulatory networks (GRNs). The wealth of multi-omics data measured on the same samples and even on the same cells has lifted the field of GRN inference to the next stage. Combinations of (single-cell) transcriptomics and chromatin accessibility allow the prediction of fine-grained regulatory programs that go beyond mere correlation of transcription factor and target gene expression, with enhancer GRNs (eGRNs) modeling molecular interactions between transcription factors, regulatory elements, and target genes. In this review, we highlight the key components for successful (e)GRN inference from (sc)RNA-seq and (sc)ATAC-seq data exemplified by state-of-the-art methods as well as open challenges and future developments. Moreover, we address preprocessing strategies, metacell generation and computational omics pairing, transcription factor binding site detection, and linear and three-dimensional approaches to identify chromatin interactions as well as dynamic and causal eGRN inference. We believe that the integration of transcriptomics together with epigenomics data at a single-cell level is the new standard for mechanistic network inference, and that it can be further advanced with integrating additional omics layers and spatiotemporal data, as well as with shifting the focus towards more quantitative and causal modeling strategies.
Collapse
Affiliation(s)
- Jens Uwe Loers
- Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Corneel Heymanslaan 10, 9000 Ghent, Belgium
- Department of Biomedical Molecular Biology, Ghent University, Zwijnaarde-Technologiepark 71, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, 9000 Ghent, Belgium
| | - Vanessa Vermeirssen
- Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Corneel Heymanslaan 10, 9000 Ghent, Belgium
- Department of Biomedical Molecular Biology, Ghent University, Zwijnaarde-Technologiepark 71, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, 9000 Ghent, Belgium
| |
Collapse
|
12
|
Sun F, Li H, Sun D, Fu S, Gu L, Shao X, Wang Q, Dong X, Duan B, Xing F, Wu J, Xiao M, Zhao F, Han JDJ, Liu Q, Fan X, Li C, Wang C, Shi T. Single-cell omics: experimental workflow, data analyses and applications. SCIENCE CHINA. LIFE SCIENCES 2024:10.1007/s11427-023-2561-0. [PMID: 39060615 DOI: 10.1007/s11427-023-2561-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/07/2023] [Accepted: 04/18/2024] [Indexed: 07/28/2024]
Abstract
Cells are the fundamental units of biological systems and exhibit unique development trajectories and molecular features. Our exploration of how the genomes orchestrate the formation and maintenance of each cell, and control the cellular phenotypes of various organismsis, is both captivating and intricate. Since the inception of the first single-cell RNA technology, technologies related to single-cell sequencing have experienced rapid advancements in recent years. These technologies have expanded horizontally to include single-cell genome, epigenome, proteome, and metabolome, while vertically, they have progressed to integrate multiple omics data and incorporate additional information such as spatial scRNA-seq and CRISPR screening. Single-cell omics represent a groundbreaking advancement in the biomedical field, offering profound insights into the understanding of complex diseases, including cancers. Here, we comprehensively summarize recent advances in single-cell omics technologies, with a specific focus on the methodology section. This overview aims to guide researchers in selecting appropriate methods for single-cell sequencing and related data analysis.
Collapse
Affiliation(s)
- Fengying Sun
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China
| | - Haoyan Li
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
| | - Dongqing Sun
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
| | - Shaliu Fu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
| | - Lei Gu
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
| | - Xin Shao
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China
| | - Qinqin Wang
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
| | - Xin Dong
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
| | - Bin Duan
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
| | - Feiyang Xing
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
| | - Jun Wu
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
| | - Minmin Xiao
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China.
| | - Fangqing Zhao
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, 100101, China.
| | - Jing-Dong J Han
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China.
| | - Qi Liu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China.
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China.
| | - Xiaohui Fan
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China.
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China.
- Zhejiang Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310006, China.
| | - Chen Li
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China.
| | - Chenfei Wang
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China.
| | - Tieliu Shi
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China.
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China.
- Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, 200062, China.
| |
Collapse
|
13
|
Missarova A, Dann E, Rosen L, Satija R, Marioni J. Leveraging neighborhood representations of single-cell data to achieve sensitive DE testing with miloDE. Genome Biol 2024; 25:189. [PMID: 39026254 PMCID: PMC11256449 DOI: 10.1186/s13059-024-03334-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2023] [Accepted: 07/10/2024] [Indexed: 07/20/2024] Open
Abstract
Single-cell RNA-sequencing enables testing for differential expression (DE) between conditions at a cell type level. While powerful, one of the limitations of such approaches is that the sensitivity of DE testing is dictated by the sensitivity of clustering, which is often suboptimal. To overcome this, we present miloDE-a cluster-free framework for DE testing (available as an open-source R package). We illustrate the performance of miloDE on both simulated and real data. Using miloDE, we identify a transient hemogenic endothelia-like state in mouse embryos lacking Tal1 and detect distinct programs during macrophage activation in idiopathic pulmonary fibrosis.
Collapse
Affiliation(s)
- Alsu Missarova
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
| | - Emma Dann
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Leah Rosen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
| | - Rahul Satija
- Center for Genomics and Systems Biology, NYU, New York, USA.
- New York Genome Center, New York, USA.
| | - John Marioni
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK.
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK.
| |
Collapse
|
14
|
Wang X, Lian Q, Dong H, Xu S, Su Y, Wu X. Benchmarking Algorithms for Gene Set Scoring of Single-cell ATAC-seq Data. GENOMICS, PROTEOMICS & BIOINFORMATICS 2024; 22:qzae014. [PMID: 39049508 PMCID: PMC11423854 DOI: 10.1093/gpbjnl/qzae014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/01/2022] [Revised: 06/20/2023] [Accepted: 06/25/2023] [Indexed: 07/27/2024]
Abstract
Gene set scoring (GSS) has been routinely conducted for gene expression analysis of bulk or single-cell RNA sequencing (RNA-seq) data, which helps to decipher single-cell heterogeneity and cell type-specific variability by incorporating prior knowledge from functional gene sets. Single-cell assay for transposase accessible chromatin using sequencing (scATAC-seq) is a powerful technique for interrogating single-cell chromatin-based gene regulation, and genes or gene sets with dynamic regulatory potentials can be regarded as cell type-specific markers as if in single-cell RNA-seq (scRNA-seq). However, there are few GSS tools specifically designed for scATAC-seq, and the applicability and performance of RNA-seq GSS tools on scATAC-seq data remain to be investigated. Here, we systematically benchmarked ten GSS tools, including four bulk RNA-seq tools, five scRNA-seq tools, and one scATAC-seq method. First, using matched scATAC-seq and scRNA-seq datasets, we found that the performance of GSS tools on scATAC-seq data was comparable to that on scRNA-seq, suggesting their applicability to scATAC-seq. Then, the performance of different GSS tools was extensively evaluated using up to ten scATAC-seq datasets. Moreover, we evaluated the impact of gene activity conversion, dropout imputation, and gene set collections on the results of GSS. Results show that dropout imputation can significantly promote the performance of almost all GSS tools, while the impact of gene activity conversion methods or gene set collections on GSS performance is more dependent on GSS tools or datasets. Finally, we provided practical guidelines for choosing appropriate preprocessing methods and GSS tools in different application scenarios.
Collapse
Affiliation(s)
- Xi Wang
- Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China
- Department of Automation, Xiamen University, Xiamen 361005, China
| | - Qiwei Lian
- Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China
- Department of Automation, Xiamen University, Xiamen 361005, China
| | - Haoyu Dong
- Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China
| | - Shuo Xu
- Department of Automation, Xiamen University, Xiamen 361005, China
| | - Yaru Su
- College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, China
| | - Xiaohui Wu
- Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China
| |
Collapse
|
15
|
Bilous M, Hérault L, Gabriel AA, Teleman M, Gfeller D. Building and analyzing metacells in single-cell genomics data. Mol Syst Biol 2024; 20:744-766. [PMID: 38811801 PMCID: PMC11220014 DOI: 10.1038/s44320-024-00045-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2024] [Revised: 05/03/2024] [Accepted: 05/08/2024] [Indexed: 05/31/2024] Open
Abstract
The advent of high-throughput single-cell genomics technologies has fundamentally transformed biological sciences. Currently, millions of cells from complex biological tissues can be phenotypically profiled across multiple modalities. The scaling of computational methods to analyze and visualize such data is a constant challenge, and tools need to be regularly updated, if not redesigned, to cope with ever-growing numbers of cells. Over the last few years, metacells have been introduced to reduce the size and complexity of single-cell genomics data while preserving biologically relevant information and improving interpretability. Here, we review recent studies that capitalize on the concept of metacells-and the many variants in nomenclature that have been used. We further outline how and when metacells should (or should not) be used to analyze single-cell genomics data and what should be considered when analyzing such data at the metacell level. To facilitate the exploration of metacells, we provide a comprehensive tutorial on the construction and analysis of metacells from single-cell RNA-seq data ( https://github.com/GfellerLab/MetacellAnalysisTutorial ) as well as a fully integrated pipeline to rapidly build, visualize and evaluate metacells with different methods ( https://github.com/GfellerLab/MetacellAnalysisToolkit ).
Collapse
Affiliation(s)
- Mariia Bilous
- Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, 1011, Lausanne, Switzerland
- Agora Cancer Research Centre, 1011, Lausanne, Switzerland
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), 1015, Lausanne, Switzerland
| | - Léonard Hérault
- Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, 1011, Lausanne, Switzerland
- Agora Cancer Research Centre, 1011, Lausanne, Switzerland
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), 1015, Lausanne, Switzerland
| | - Aurélie Ag Gabriel
- Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, 1011, Lausanne, Switzerland
- Agora Cancer Research Centre, 1011, Lausanne, Switzerland
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), 1015, Lausanne, Switzerland
| | - Matei Teleman
- Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, 1011, Lausanne, Switzerland
- Agora Cancer Research Centre, 1011, Lausanne, Switzerland
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), 1015, Lausanne, Switzerland
| | - David Gfeller
- Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, 1011, Lausanne, Switzerland.
- Agora Cancer Research Centre, 1011, Lausanne, Switzerland.
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland.
- Swiss Institute of Bioinformatics (SIB), 1015, Lausanne, Switzerland.
| |
Collapse
|
16
|
Chen H, Ryu J, Vinyard ME, Lerer A, Pinello L. SIMBA: single-cell embedding along with features. Nat Methods 2024; 21:1003-1013. [PMID: 37248389 PMCID: PMC11166568 DOI: 10.1038/s41592-023-01899-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2022] [Accepted: 04/26/2023] [Indexed: 05/31/2023]
Abstract
Most current single-cell analysis pipelines are limited to cell embeddings and rely heavily on clustering, while lacking the ability to explicitly model interactions between different feature types. Furthermore, these methods are tailored to specific tasks, as distinct single-cell problems are formulated differently. To address these shortcomings, here we present SIMBA, a graph embedding method that jointly embeds single cells and their defining features, such as genes, chromatin-accessible regions and DNA sequences, into a common latent space. By leveraging the co-embedding of cells and features, SIMBA allows for the study of cellular heterogeneity, clustering-free marker discovery, gene regulation inference, batch effect removal and omics data integration. We show that SIMBA provides a single framework that allows diverse single-cell problems to be formulated in a unified way and thus simplifies the development of new analyses and extension to new single-cell modalities. SIMBA is implemented as a comprehensive Python library ( https://simba-bio.readthedocs.io ).
Collapse
Affiliation(s)
- Huidong Chen
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Department of Pathology, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Jayoung Ryu
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Department of Pathology, Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Michael E Vinyard
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Department of Pathology, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA
| | - Adam Lerer
- Facebook AI Research, New York, NY, USA.
| | - Luca Pinello
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA.
- Department of Pathology, Harvard Medical School, Boston, MA, USA.
- Broad Institute of Harvard and MIT, Cambridge, MA, USA.
| |
Collapse
|
17
|
Lyons A, Brown J, Davenport KM. Single-Cell Sequencing Technology in Ruminant Livestock: Challenges and Opportunities. Curr Issues Mol Biol 2024; 46:5291-5306. [PMID: 38920988 PMCID: PMC11202421 DOI: 10.3390/cimb46060316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2024] [Revised: 05/20/2024] [Accepted: 05/25/2024] [Indexed: 06/27/2024] Open
Abstract
Advancements in single-cell sequencing have transformed the genomics field by allowing researchers to delve into the intricate cellular heterogeneity within tissues at greater resolution. While single-cell omics are more widely applied in model organisms and humans, their use in livestock species is just beginning. Studies in cattle, sheep, and goats have already leveraged single-cell and single-nuclei RNA-seq as well as single-cell and single-nuclei ATAC-seq to delineate cellular diversity in tissues, track changes in cell populations and gene expression over developmental stages, and characterize immune cell populations important for disease resistance and resilience. Although challenges exist for the use of this technology in ruminant livestock, such as the precise annotation of unique cell populations and spatial resolution of cells within a tissue, there is vast potential to enhance our understanding of the cellular and molecular mechanisms underpinning traits essential for healthy and productive livestock. This review intends to highlight the insights gained from published single-cell omics studies in cattle, sheep, and goats, particularly those with publicly accessible data. Further, this manuscript will discuss the challenges and opportunities of this technology in ruminant livestock and how it may contribute to enhanced profitability and sustainability of animal agriculture in the future.
Collapse
|
18
|
Wei Z. Discrete latent embeddings illuminate cellular diversity in single-cell epigenomics. NATURE COMPUTATIONAL SCIENCE 2024; 4:316-317. [PMID: 38811822 DOI: 10.1038/s43588-024-00634-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/31/2024]
Affiliation(s)
- Zhi Wei
- Department of Computer Science, Ying Wu College of Computing, New Jersey Institute of Technology, Newark, NJ, USA.
| |
Collapse
|
19
|
Geens B, Goossens S, Li J, Van de Peer Y, Vanden Broeck J. Untangling the gordian knot: The intertwining interactions between developmental hormone signaling and epigenetic mechanisms in insects. Mol Cell Endocrinol 2024; 585:112178. [PMID: 38342134 DOI: 10.1016/j.mce.2024.112178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/15/2023] [Revised: 01/30/2024] [Accepted: 02/04/2024] [Indexed: 02/13/2024]
Abstract
Hormones control developmental and physiological processes, often by regulating the expression of multiple genes simultaneously or sequentially. Crosstalk between hormones and epigenetics is pivotal to dynamically coordinate this process. Hormonal signals can guide the addition and removal of epigenetic marks, steering gene expression. Conversely, DNA methylation, histone modifications and non-coding RNAs can modulate regional chromatin structure and accessibility and regulate the expression of numerous (hormone-related) genes. Here, we provide a review of the interplay between the classical insect hormones, ecdysteroids and juvenile hormones, and epigenetics. We summarize the mode-of-action and roles of these hormones in post-embryonic development, and provide a general overview of epigenetic mechanisms. We then highlight recent advances on the interactions between these hormonal pathways and epigenetics, and their involvement in development. Furthermore, we give an overview of several 'omics techniques employed in the field. Finally, we discuss which questions remain unanswered and possible avenues for future research.
Collapse
Affiliation(s)
- Bart Geens
- Molecular Developmental Physiology and Signal Transduction, KU Leuven, Naamsestraat 59 box 2465, B-3000 Leuven, Belgium.
| | - Stijn Goossens
- Molecular Developmental Physiology and Signal Transduction, KU Leuven, Naamsestraat 59 box 2465, B-3000 Leuven, Belgium.
| | - Jia Li
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium; VIB Center for Plant Systems Biology, VIB, Ghent, Belgium.
| | - Yves Van de Peer
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium; VIB Center for Plant Systems Biology, VIB, Ghent, Belgium.
| | - Jozef Vanden Broeck
- Molecular Developmental Physiology and Signal Transduction, KU Leuven, Naamsestraat 59 box 2465, B-3000 Leuven, Belgium.
| |
Collapse
|
20
|
Cui X, Chen X, Li Z, Gao Z, Chen S, Jiang R. Discrete latent embedding of single-cell chromatin accessibility sequencing data for uncovering cell heterogeneity. NATURE COMPUTATIONAL SCIENCE 2024; 4:346-359. [PMID: 38730185 DOI: 10.1038/s43588-024-00625-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/25/2023] [Accepted: 04/05/2024] [Indexed: 05/12/2024]
Abstract
Single-cell epigenomic data has been growing continuously at an unprecedented pace, but their characteristics such as high dimensionality and sparsity pose substantial challenges to downstream analysis. Although deep learning models-especially variational autoencoders-have been widely used to capture low-dimensional feature embeddings, the prevalent Gaussian assumption somewhat disagrees with real data, and these models tend to struggle to incorporate reference information from abundant cell atlases. Here we propose CASTLE, a deep generative model based on the vector-quantized variational autoencoder framework to extract discrete latent embeddings that interpretably characterize single-cell chromatin accessibility sequencing data. We validate the performance and robustness of CASTLE for accurate cell-type identification and reasonable visualization compared with state-of-the-art methods. We demonstrate the advantages of CASTLE for effective incorporation of existing massive reference datasets in a weakly supervised or supervised manner. We further demonstrate CASTLE's capacity for intuitively distilling cell-type-specific feature spectra that unveil cell heterogeneity and biological implications quantitatively.
Collapse
Affiliation(s)
- Xuejian Cui
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China
| | - Xiaoyang Chen
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China
| | - Zhen Li
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China
| | - Zijing Gao
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China
| | - Shengquan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China.
| | - Rui Jiang
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China.
| |
Collapse
|
21
|
Cao Y, Zhao X, Tang S, Jiang Q, Li S, Li S, Chen S. scButterfly: a versatile single-cell cross-modality translation method via dual-aligned variational autoencoders. Nat Commun 2024; 15:2973. [PMID: 38582890 PMCID: PMC10998864 DOI: 10.1038/s41467-024-47418-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2023] [Accepted: 03/28/2024] [Indexed: 04/08/2024] Open
Abstract
Recent advancements for simultaneously profiling multi-omics modalities within individual cells have enabled the interrogation of cellular heterogeneity and molecular hierarchy. However, technical limitations lead to highly noisy multi-modal data and substantial costs. Although computational methods have been proposed to translate single-cell data across modalities, broad applications of the methods still remain impeded by formidable challenges. Here, we propose scButterfly, a versatile single-cell cross-modality translation method based on dual-aligned variational autoencoders and data augmentation schemes. With comprehensive experiments on multiple datasets, we provide compelling evidence of scButterfly's superiority over baseline methods in preserving cellular heterogeneity while translating datasets of various contexts and in revealing cell type-specific biological insights. Besides, we demonstrate the extensive applications of scButterfly for integrative multi-omics analysis of single-modality data, data enhancement of poor-quality single-cell multi-omics, and automatic cell type annotation of scATAC-seq data. Moreover, scButterfly can be generalized to unpaired data training, perturbation-response analysis, and consecutive translation.
Collapse
Affiliation(s)
- Yichuan Cao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
| | - Xiamiao Zhao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
| | - Songming Tang
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
| | - Qun Jiang
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, 100084, Beijing, China
| | - Sijie Li
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
| | - Siyu Li
- School of Statistics and Data Science, Nankai University, Tianjin, 300071, China
| | - Shengquan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China.
| |
Collapse
|
22
|
Sakaue S, Weinand K, Isaac S, Dey KK, Jagadeesh K, Kanai M, Watts GFM, Zhu Z, Brenner MB, McDavid A, Donlin LT, Wei K, Price AL, Raychaudhuri S. Tissue-specific enhancer-gene maps from multimodal single-cell data identify causal disease alleles. Nat Genet 2024; 56:615-626. [PMID: 38594305 PMCID: PMC11456345 DOI: 10.1038/s41588-024-01682-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2023] [Accepted: 02/07/2024] [Indexed: 04/11/2024]
Abstract
Translating genome-wide association study (GWAS) loci into causal variants and genes requires accurate cell-type-specific enhancer-gene maps from disease-relevant tissues. Building enhancer-gene maps is essential but challenging with current experimental methods in primary human tissues. Here we developed a nonparametric statistical method, SCENT (single-cell enhancer target gene mapping), that models association between enhancer chromatin accessibility and gene expression in single-cell or nucleus multimodal RNA sequencing and ATAC sequencing data. We applied SCENT to 9 multimodal datasets including >120,000 single cells or nuclei and created 23 cell-type-specific enhancer-gene maps. These maps were highly enriched for causal variants in expression quantitative loci and GWAS for 1,143 diseases and traits. We identified likely causal genes for both common and rare diseases and linked somatic mutation hotspots to target genes. We demonstrate that application of SCENT to multimodal data from disease-relevant human tissue enables the scalable construction of accurate cell-type-specific enhancer-gene maps, essential for defining noncoding variant function.
Collapse
Affiliation(s)
- Saori Sakaue
- Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Divisions of Genetics and Rheumatology, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Kathryn Weinand
- Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Divisions of Genetics and Rheumatology, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Shakson Isaac
- Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Divisions of Genetics and Rheumatology, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Kushal K Dey
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA
| | - Karthik Jagadeesh
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA
| | - Masahiro Kanai
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, MA, USA
| | - Gerald F M Watts
- Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Zhu Zhu
- Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Michael B Brenner
- Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Andrew McDavid
- Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, NY, USA
| | - Laura T Donlin
- Hospital for Special Surgery, New York, NY, USA
- Weill Cornell Medicine, New York, NY, USA
| | - Kevin Wei
- Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Alkes L Price
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
| | - Soumya Raychaudhuri
- Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
- Divisions of Genetics and Rheumatology, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
| |
Collapse
|
23
|
Annotating cell types in single-cell ATAC data via the guidance of the underlying DNA sequences. NATURE COMPUTATIONAL SCIENCE 2024; 4:261-262. [PMID: 38671305 DOI: 10.1038/s43588-024-00626-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/28/2024]
|
24
|
Zeng Y, Luo M, Shangguan N, Shi P, Feng J, Xu J, Chen K, Lu Y, Yu W, Yang Y. Deciphering cell types by integrating scATAC-seq data with genome sequences. NATURE COMPUTATIONAL SCIENCE 2024; 4:285-298. [PMID: 38600256 DOI: 10.1038/s43588-024-00622-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/01/2023] [Accepted: 03/18/2024] [Indexed: 04/12/2024]
Abstract
The single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) technology provides insight into gene regulation and epigenetic heterogeneity at single-cell resolution, but cell annotation from scATAC-seq remains challenging due to high dimensionality and extreme sparsity within the data. Existing cell annotation methods mostly focus on the cell peak matrix without fully utilizing the underlying genomic sequence. Here we propose a method, SANGO, for accurate single-cell annotation by integrating genome sequences around the accessibility peaks within scATAC data. The genome sequences of peaks are encoded into low-dimensional embeddings, and then iteratively used to reconstruct the peak statistics of cells through a fully connected network. The learned weights are considered as regulatory modes to represent cells, and utilized to align the query cells and the annotated cells in the reference data through a graph transformer network for cell annotations. SANGO was demonstrated to consistently outperform competing methods on 55 paired scATAC-seq datasets across samples, platforms and tissues. SANGO was also shown to be able to detect unknown tumor cells through attention edge weights learned by the graph transformer. Moreover, from the annotated cells, we found cell-type-specific peaks that provide functional insights/biological signals through expression enrichment analysis, cis-regulatory chromatin interaction analysis and motif enrichment analysis.
Collapse
Affiliation(s)
- Yuansong Zeng
- School of Big Data and Software Engineering, Chongqing University, Chongqing, China
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Mai Luo
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Ningyuan Shangguan
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Peiyu Shi
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Junxi Feng
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Jin Xu
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Ken Chen
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Yutong Lu
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Weijiang Yu
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China.
- Key Laboratory of Machine Intelligence and Advanced Computing (MOE), Guangzhou, China.
| |
Collapse
|
25
|
Li S, Li Y, Sun Y, Li Y, Chen X, Tang S, Chen S. EpiCarousel: memory- and time-efficient identification of metacells for atlas-level single-cell chromatin accessibility data. Bioinformatics 2024; 40:btae191. [PMID: 38588573 PMCID: PMC11037479 DOI: 10.1093/bioinformatics/btae191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2023] [Revised: 03/02/2024] [Accepted: 04/05/2024] [Indexed: 04/10/2024] Open
Abstract
SUMMARY Recent technical advancements in single-cell chromatin accessibility sequencing (scCAS) have brought new insights to the characterization of epigenetic heterogeneity. As single-cell genomics experiments scale up to hundreds of thousands of cells, the demand for computational resources for downstream analysis grows intractably large and exceeds the capabilities of most researchers. Here, we propose EpiCarousel, a tailored Python package based on lazy loading, parallel processing, and community detection for memory- and time-efficient identification of metacells, i.e. the emergence of homogenous cells, in large-scale scCAS data. Through comprehensive experiments on five datasets of various protocols, sample sizes, dimensions, number of cell types, and degrees of cell-type imbalance, EpiCarousel outperformed baseline methods in systematic evaluation of memory usage, computational time, and multiple downstream analyses including cell type identification. Moreover, EpiCarousel executes preprocessing and downstream cell clustering on the atlas-level dataset with 707 043 cells and 1 154 611 peaks within 2 h consuming <75 GB of RAM and provides superior performance for characterizing cell heterogeneity than state-of-the-art methods. AVAILABILITY AND IMPLEMENTATION The EpiCarousel software is well-documented and freely available at https://github.com/biox-nku/epicarousel. It can be seamlessly interoperated with extensive scCAS analysis toolkits.
Collapse
Affiliation(s)
- Sijie Li
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
| | - Yuxi Li
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
| | - Yu Sun
- Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Yaru Li
- Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Xiaoyang Chen
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Songming Tang
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
| | - Shengquan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
| |
Collapse
|
26
|
Zhang W, Cui Y, Liu B, Loza M, Park SJ, Nakai K. HyGAnno: hybrid graph neural network-based cell type annotation for single-cell ATAC sequencing data. Brief Bioinform 2024; 25:bbae152. [PMID: 38581422 PMCID: PMC10998639 DOI: 10.1093/bib/bbae152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2023] [Revised: 02/19/2024] [Accepted: 03/10/2024] [Indexed: 04/08/2024] Open
Abstract
Reliable cell type annotations are crucial for investigating cellular heterogeneity in single-cell omics data. Although various computational approaches have been proposed for single-cell RNA sequencing (scRNA-seq) annotation, high-quality cell labels are still lacking in single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq) data, because of extreme sparsity and inconsistent chromatin accessibility between datasets. Here, we present a novel automated cell annotation method that transfers cell type information from a well-labeled scRNA-seq reference to an unlabeled scATAC-seq target, via a parallel graph neural network, in a semi-supervised manner. Unlike existing methods that utilize only gene expression or gene activity features, HyGAnno leverages genome-wide accessibility peak features to facilitate the training process. In addition, HyGAnno reconstructs a reference-target cell graph to detect cells with low prediction reliability, according to their specific graph connectivity patterns. HyGAnno was assessed across various datasets, showcasing its strengths in precise cell annotation, generating interpretable cell embeddings, robustness to noisy reference data and adaptability to tumor tissues.
Collapse
Affiliation(s)
- Weihang Zhang
- Department of Computational Biology and Medical Sciences, Graduate school of Frontier Sciences, University of Tokyo, Tokyo, Japan
| | - Yang Cui
- Department of Computational Biology and Medical Sciences, Graduate school of Frontier Sciences, University of Tokyo, Tokyo, Japan
| | - Bowen Liu
- Department of Computational Biology and Medical Sciences, Graduate school of Frontier Sciences, University of Tokyo, Tokyo, Japan
| | - Martin Loza
- Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
| | - Sung-Joon Park
- Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
| | - Kenta Nakai
- Department of Computational Biology and Medical Sciences, Graduate school of Frontier Sciences, University of Tokyo, Tokyo, Japan
- Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
| |
Collapse
|
27
|
Khandibharad S, Singh S. Single-cell ATAC sequencing identifies sleepy macrophages during reciprocity of cytokines in L. major infection. Microbiol Spectr 2024; 12:e0347823. [PMID: 38299832 PMCID: PMC10913457 DOI: 10.1128/spectrum.03478-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2023] [Accepted: 12/31/2023] [Indexed: 02/02/2024] Open
Abstract
The hallmark characteristic of macrophages lies in their inherent plasticity, allowing them to adapt to dynamic microenvironments. Leishmania strategically modulates the phenotypic plasticity of macrophages, creating a favorable environment for intracellular survival and persistent infection through regulatory cytokine such as interleukin (IL)-10. Nevertheless, these effector cells can counteract infection by modulating crucial cytokines like IL-12 and key components involved in its production. Using sophisticated tool of single-cell assay for transposase accessible chromatin (ATAC) sequencing, we systematically examined the regulatory axis of IL-10 and IL-12 in a time-dependent manner during Leishmania major infection in macrophages Our analysis revealed the cellular heterogeneity post-infection with the regulators of IL-10 and IL-12, unveiling a reciprocal relationship between these cytokines. Notably, our significant findings highlighted the presence of sleepy macrophages and their pivotal role in mediating reciprocity between IL-10 and IL-12. To summarize, the roles of cytokine expression, transcription factors, cell cycle, and epigenetics of host cell machinery were vital in identification of sleepy macrophages, which is a transient state where transcription factors controlled the epigenetic remodeling and expression of genes involved in pro-inflammatory cytokine expression and recruitment of immune cells.IMPORTANCELeishmaniasis is an endemic affecting 99 countries and territories globally, as outlined in the 2022 World Health Organization report. The disease's severity is compounded by compromised host immune systems, emphasizing the pivotal role of the interplay between parasite and host immune factors in disease regulation. In instances of cutaneous leishmaniasis induced by L. major, macrophages function as sentinel cells. Our findings indicate that the plasticity and phenotype of macrophages can be modulated to express a cytokine profile involving IL-10 and IL-12, mediated by the regulation of transcription factors and their target genes post-L. major infection in macrophages. Employing sophisticated methodologies such as single-cell ATAC sequencing and computational genomics, we have identified a distinctive subset of macrophages termed "sleepy macrophages." These macrophages exhibit downregulated housekeeping genes while expressing a unique set of variable features. This data set constitutes a valuable resource for comprehending the intricate host-parasite interplay during L. major infection.
Collapse
Affiliation(s)
- Shweta Khandibharad
- Systems Medicine Lab, National Centre for Cell Science, SP Pune University Campus, Pune, India
| | - Shailza Singh
- Systems Medicine Lab, National Centre for Cell Science, SP Pune University Campus, Pune, India
| |
Collapse
|
28
|
Peidli S, Green TD, Shen C, Gross T, Min J, Garda S, Yuan B, Schumacher LJ, Taylor-King JP, Marks DS, Luna A, Blüthgen N, Sander C. scPerturb: harmonized single-cell perturbation data. Nat Methods 2024; 21:531-540. [PMID: 38279009 DOI: 10.1038/s41592-023-02144-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2023] [Accepted: 12/04/2023] [Indexed: 01/28/2024]
Abstract
Analysis across a growing number of single-cell perturbation datasets is hampered by poor data interoperability. To facilitate development and benchmarking of computational methods, we collect a set of 44 publicly available single-cell perturbation-response datasets with molecular readouts, including transcriptomics, proteomics and epigenomics. We apply uniform quality control pipelines and harmonize feature annotations. The resulting information resource, scPerturb, enables development and testing of computational methods, and facilitates comparison and integration across datasets. We describe energy statistics (E-statistics) for quantification of perturbation effects and significance testing, and demonstrate E-distance as a general distance measure between sets of single-cell expression profiles. We illustrate the application of E-statistics for quantifying similarity and efficacy of perturbations. The perturbation-response datasets and E-statistics computation software are publicly available at scperturb.org. This work provides an information resource for researchers working with single-cell perturbation data and recommendations for experimental design, including optimal cell counts and read depth.
Collapse
Affiliation(s)
- Stefan Peidli
- Institute of Pathology, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität, Berlin, Germany.
- Institute of Biology, Humboldt-Universität, Berlin, Germany.
| | - Tessa D Green
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
| | - Ciyue Shen
- Departments of Cell Biology and Systems Biology, Harvard Medical School, Boston, MA, USA
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute, Cambridge, MA, USA
| | | | - Joseph Min
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
| | - Samuele Garda
- Institute of Biology, Humboldt-Universität, Berlin, Germany
- Institute for Computer Science, Humboldt-Universität zu Berlin, Berlin, Germany
| | - Bo Yuan
- Departments of Cell Biology and Systems Biology, Harvard Medical School, Boston, MA, USA
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute, Cambridge, MA, USA
| | - Linus J Schumacher
- Centre for Regenerative Medicine, University of Edinburgh, Edinburgh, UK
| | | | - Debora S Marks
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Broad Institute, Cambridge, MA, USA
| | - Augustin Luna
- Departments of Cell Biology and Systems Biology, Harvard Medical School, Boston, MA, USA.
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA.
- Broad Institute, Cambridge, MA, USA.
- Computational Biology Branch, National Library of Medicine and Developmental Therapeutics Branch, National Cancer Institute, Bethesda, MD, USA.
| | - Nils Blüthgen
- Institute of Pathology, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität, Berlin, Germany.
- Institute of Biology, Humboldt-Universität, Berlin, Germany.
| | - Chris Sander
- Departments of Cell Biology and Systems Biology, Harvard Medical School, Boston, MA, USA.
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA.
- Broad Institute, Cambridge, MA, USA.
| |
Collapse
|
29
|
Zhao C, Liu A, Zhang X, Cao X, Ding Z, Sha Q, Shen H, Deng HW, Zhou W. CLCLSA: Cross-omics linked embedding with contrastive learning and self attention for integration with incomplete multi-omics data. Comput Biol Med 2024; 170:108058. [PMID: 38295477 PMCID: PMC10959569 DOI: 10.1016/j.compbiomed.2024.108058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2023] [Revised: 12/30/2023] [Accepted: 01/26/2024] [Indexed: 02/02/2024]
Abstract
Integration of heterogeneous and high-dimensional multi-omics data is becoming increasingly important in understanding etiology of complex genetic diseases. Each omics technique only provides a limited view of the underlying biological process and integrating heterogeneous omics layers simultaneously would lead to a more comprehensive and detailed understanding of diseases and phenotypes. However, one obstacle faced when performing multi-omics data integration is the existence of unpaired multi-omics data due to instrument sensitivity and cost. Studies may fail if certain aspects of the subjects are missing or incomplete. In this paper, we propose a deep learning method for multi-omics integration with incomplete data by Cross-omics Linked unified embedding with Contrastive Learning and Self Attention (CLCLSA). Utilizing complete multi-omics data as supervision, the model employs cross-omics autoencoders to learn the feature representation across different types of biological data. The multi-omics contrastive learning is employed, which maximizes the mutual information between different types of omics. In addition, the feature-level self-attention and omics-level self-attention are employed to dynamically identify the most informative features for multi-omics data integration. Finally, a Softmax classifier is employed to perform multi-omics data classification. Extensive experiments were conducted on four public multi-omics datasets. The experimental results indicate that our proposed CLCLSA produces promising results in multi-omics data classification using both complete and incomplete multi-omics data.
Collapse
Affiliation(s)
- Chen Zhao
- Department of Computer Science, Kennesaw State University, Marietta, GA, 30060, USA
| | - Anqi Liu
- Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA, 70112, USA
| | - Xiao Zhang
- Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA, 70112, USA
| | - Xuewei Cao
- Department of Mathematical Sciences, Michigan Technological University, 1400 Townsend Dr, Houghton, MI, 49931, USA
| | - Zhengming Ding
- Department of Computer Science, Tulane University, New Orleans, LA, 70118, USA
| | - Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, 1400 Townsend Dr, Houghton, MI, 49931, USA
| | - Hui Shen
- Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA, 70112, USA
| | - Hong-Wen Deng
- Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA, 70112, USA.
| | - Weihua Zhou
- Department of Applied Computing, Michigan Technological University, 1400 Townsend Dr, Houghton, MI, 49931, USA; Center for Biocomputing and Digital Health, Institute of Computing and Cybersystems, and Health Research Institute, Michigan Technological University, Houghton, MI, 49931, USA.
| |
Collapse
|
30
|
Tang S, Cui X, Wang R, Li S, Li S, Huang X, Chen S. scCASE: accurate and interpretable enhancement for single-cell chromatin accessibility sequencing data. Nat Commun 2024; 15:1629. [PMID: 38388573 PMCID: PMC10884038 DOI: 10.1038/s41467-024-46045-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2023] [Accepted: 02/12/2024] [Indexed: 02/24/2024] Open
Abstract
Single-cell chromatin accessibility sequencing (scCAS) has emerged as a valuable tool for interrogating and elucidating epigenomic heterogeneity and gene regulation. However, scCAS data inherently suffers from limitations such as high sparsity and dimensionality, which pose significant challenges for downstream analyses. Although several methods are proposed to enhance scCAS data, there are still challenges and limitations that hinder the effectiveness of these methods. Here, we propose scCASE, a scCAS data enhancement method based on non-negative matrix factorization which incorporates an iteratively updating cell-to-cell similarity matrix. Through comprehensive experiments on multiple datasets, we demonstrate the advantages of scCASE over existing methods for scCAS data enhancement. The interpretable cell type-specific peaks identified by scCASE can provide valuable biological insights into cell subpopulations. Moreover, to leverage the large compendia of available omics data as a reference, we further expand scCASE to scCASER, which enables the incorporation of external reference data to improve enhancement performance.
Collapse
Affiliation(s)
- Songming Tang
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
| | - Xuejian Cui
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, 100084, Beijing, China
| | - Rongxiang Wang
- Department of Computer Science, University of Virginia, Charlottesville, VA, 22903, USA
| | - Sijie Li
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
| | - Siyu Li
- School of Statistics and Data Science, Nankai University, Tianjin, 300071, China
| | - Xin Huang
- Beijing Key Laboratory for Radiobiology, Department of Radiation Biology, Beijing Institute of Radiation Medicine, 100850, Beijing, China
| | - Shengquan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China.
| |
Collapse
|
31
|
Martini L, Bardini R, Savino A, Di Carlo S. Cross-Omic Transcription Factor Analysis: An Insight on Transcription Factor Accessibility and Expression Correlation. Genes (Basel) 2024; 15:268. [PMID: 38540327 PMCID: PMC10970009 DOI: 10.3390/genes15030268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2024] [Revised: 02/13/2024] [Accepted: 02/17/2024] [Indexed: 06/15/2024] Open
Abstract
It is well known how sequencing technologies propelled cellular biology research in recent years, providing incredible insight into the basic mechanisms of cells. Single-cell RNA sequencing is at the front in this field, with single-cell ATAC sequencing supporting it and becoming more popular. In this regard, multi-modal technologies play a crucial role, allowing the possibility to simultaneously perform the mentioned sequencing modalities on the same cells. Yet, there still needs to be a clear and dedicated way to analyze these multi-modal data. One of the current methods is to calculate the Gene Activity Matrix (GAM), which summarizes the accessibility of the genes at the genomic level, to have a more direct link with the transcriptomic data. However, this concept is not well defined, and it is unclear how various accessible regions impact the expression of the genes. Moreover, the transcription process is highly regulated by the transcription factors that bind to the different DNA regions. Therefore, this work presents a continuation of the meta-analysis of Genomic-Annotated Gene Activity Matrix (GAGAM) contributions, aiming to investigate the correlation between the TF expression and motif information in the different functional genomic regions to understand the different Transcription Factors (TFs) dynamics involved in different cell types.
Collapse
Affiliation(s)
| | | | | | - Stefano Di Carlo
- Control and Computer Engineering Department, Politecnico di Torino, 10129 Torino, Italy; (L.M.); (R.B.); (A.S.)
| |
Collapse
|
32
|
Gao C, Welch JD. Integrating single-cell multimodal epigenomic data using 1D-convolutional neural networks. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.16.580655. [PMID: 38464242 PMCID: PMC10925154 DOI: 10.1101/2024.02.16.580655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/12/2024]
Abstract
Recent experimental developments enable single-cell multimodal epigenomic profiling, which measures multiple histone modifications and chromatin accessibility within the same cell. Such parallel measurements provide exciting new opportunities to investigate how epigenomic modalities vary together across cell types and states. A pivotal step in using this type of data is integrating the epigenomic modalities to learn a unified representation of each cell, but existing approaches are not designed to model the unique nature of this data type. Our key insight is to model single-cell multimodal epigenome data as a multi-channel sequential signal. Based on this insight, we developed ConvNet-VAEs, a novel framework that uses 1D-convolutional variational autoencoders (VAEs) for single-cell multimodal epigenomic data integration. We evaluated ConvNet-VAEs on nano-CT and scNTT-seq data generated from juvenile mouse brain and human bone marrow. We found that ConvNet-VAEs can perform dimension reduction and batch correction better than previous architectures while using significantly fewer parameters. Furthermore, the performance gap between convolutional and fully-connected architectures increases with the number of modalities, and deeper convolutional architectures can increase performance while performance degrades for deeper fully-connected architectures. Our results indicate that convolutional autoencoders are a promising method for integrating current and future single-cell multimodal epigenomic datasets.
Collapse
Affiliation(s)
- Chao Gao
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor MI 48109, USA
| | - Joshua D Welch
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor MI 48109, USA
- Department of Computer Science and Engineering, University of Michigan, Ann Arbor MI 48109, USA
| |
Collapse
|
33
|
Liu J, Ma J, Wen J, Zhou X. A Cell Cycle-aware Network for Data Integration and Label Transferring of Single-cell RNA-seq and ATAC-seq. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.31.578213. [PMID: 38352302 PMCID: PMC10862874 DOI: 10.1101/2024.01.31.578213] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/02/2024]
Abstract
In recent years, the integration of single-cell multi-omics data has provided a more comprehensive understanding of cell functions and internal regulatory mechanisms from a non-single omics perspective, but it still suffers many challenges, such as omics-variance, sparsity, cell heterogeneity and confounding factors. As we know, cell cycle is regarded as a confounder when analyzing other factors in single-cell RNA-seq data, but it's not clear how it will work on the integrated single-cell multi-omics data. Here, we developed a Cell Cycle-Aware Network (CCAN) to remove cell cycle effects from the integrated single-cell multi-omics data while keeping the cell type-specific variations. This is the first computational model to study the cell-cycle effects in the integration of single-cell multi-omics data. Validations on several benchmark datasets show the out-standing performance of CCAN in a variety of downstream analyses and applications, including removing cell cycle effects and batch effects of scRNA-seq datasets from different protocols, integrating paired and unpaired scRNA-seq and scATAC-seq data, accurately transferring cell type labels from scRNA-seq to scATAC-seq data, and characterizing the differentiation process from hematopoietic stem cells to different lineages in the integration of differentiation data.
Collapse
|
34
|
Zhang K, Zemke NR, Armand EJ, Ren B. A fast, scalable and versatile tool for analysis of single-cell omics data. Nat Methods 2024; 21:217-227. [PMID: 38191932 PMCID: PMC10864184 DOI: 10.1038/s41592-023-02139-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2023] [Accepted: 11/23/2023] [Indexed: 01/10/2024]
Abstract
Single-cell omics technologies have revolutionized the study of gene regulation in complex tissues. A major computational challenge in analyzing these datasets is to project the large-scale and high-dimensional data into low-dimensional space while retaining the relative relationships between cells. This low dimension embedding is necessary to decompose cellular heterogeneity and reconstruct cell-type-specific gene regulatory programs. Traditional dimensionality reduction techniques, however, face challenges in computational efficiency and in comprehensively addressing cellular diversity across varied molecular modalities. Here we introduce a nonlinear dimensionality reduction algorithm, embodied in the Python package SnapATAC2, which not only achieves a more precise capture of single-cell omics data heterogeneities but also ensures efficient runtime and memory usage, scaling linearly with the number of cells. Our algorithm demonstrates exceptional performance, scalability and versatility across diverse single-cell omics datasets, including single-cell assay for transposase-accessible chromatin using sequencing, single-cell RNA sequencing, single-cell Hi-C and single-cell multi-omics datasets, underscoring its utility in advancing single-cell analysis.
Collapse
Affiliation(s)
- Kai Zhang
- Department of Cellular and Molecular Medicine, University of California, San Diego School of Medicine, La Jolla, CA, USA
- Westlake Laboratory of Life Sciences and Biomedicine, School of Life Sciences, Westlake University, Hangzhou, China
| | - Nathan R Zemke
- Department of Cellular and Molecular Medicine, University of California, San Diego School of Medicine, La Jolla, CA, USA
- Center for Epigenomics, University of California, San Diego School of Medicine, La Jolla, CA, USA
| | - Ethan J Armand
- Department of Cellular and Molecular Medicine, University of California, San Diego School of Medicine, La Jolla, CA, USA
- Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA, USA
| | - Bing Ren
- Department of Cellular and Molecular Medicine, University of California, San Diego School of Medicine, La Jolla, CA, USA.
- Center for Epigenomics, University of California, San Diego School of Medicine, La Jolla, CA, USA.
- Ludwig Institute for Cancer Research, La Jolla, CA, USA.
- Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA, USA.
| |
Collapse
|
35
|
Zhao F, Ma X, Yao B, Chen L. scaDA: A Novel Statistical Method for Differential Analysis of Single-Cell Chromatin Accessibility Sequencing Data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.21.576570. [PMID: 38328112 PMCID: PMC10849518 DOI: 10.1101/2024.01.21.576570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/09/2024]
Abstract
Single-cell ATAC-seq sequencing data (scATAC-seq) has been widely used to investigate chromatin accessibility on the single-cell level. One important application of scATAC-seq data analysis is differential chromatin accessibility analysis. However, the data characteristics of scATAC-seq such as excessive zeros and large variability of chromatin accessibility across cells impose a unique challenge for DA analysis. Existing statistical methods focus on detecting the mean difference of the chromatin accessible regions while overlooking the distribution difference. Motivated by real data exploration that distribution difference exists among cell types, we introduce a novel composite statistical test named "scaDA", which is based on zero-inflated negative binomial model (ZINB), for performing differential distribution analysis of chromatin accessibility by jointly testing the abundance, prevalence and dispersion simultaneously. Benefiting from both dispersion shrinkage and iterative refinement of mean and prevalence parameter estimates, scaDA demonstrates its superiority to both ZINB-based likelihood ratio tests and published methods by achieving the highest power and best FDR control in a comprehensive simulation study. In addition to demonstrating the highest power in three real sc-multiome data analyses, scaDA successfully identifies differentially accessible regions in microglia from sc-multiome data for an Alzheimer's disease (AD) study, regions which are most enriched in GO terms related to neurogenesis, the clinical phenotype of AD, and SNPs identified in AD-associated GWAS.
Collapse
Affiliation(s)
- Fengdi Zhao
- Department of Biostatistics, University of Florida, Gainesville, FL, USA
| | - Xin Ma
- Department of Biostatistics, University of Florida, Gainesville, FL, USA
| | - Bing Yao
- Department of Human Genetics, Emory University, Atlanta, GA, USA
| | - Li Chen
- Department of Biostatistics, University of Florida, Gainesville, FL, USA
| |
Collapse
|
36
|
Chen Y, Zheng R, Liu J, Li M. scMLC: an accurate and robust multiplex community detection method for single-cell multi-omics data. Brief Bioinform 2024; 25:bbae101. [PMID: 38493339 PMCID: PMC10944569 DOI: 10.1093/bib/bbae101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2023] [Revised: 01/03/2024] [Accepted: 02/15/2024] [Indexed: 03/18/2024] Open
Abstract
Clustering cells based on single-cell multi-modal sequencing technologies provides an unprecedented opportunity to create high-resolution cell atlas, reveal cellular critical states and study health and diseases. However, effectively integrating different sequencing data for cell clustering remains a challenging task. Motivated by the successful application of Louvain in scRNA-seq data, we propose a single-cell multi-modal Louvain clustering framework, called scMLC, to tackle this problem. scMLC builds multiplex single- and cross-modal cell-to-cell networks to capture modal-specific and consistent information between modalities and then adopts a robust multiplex community detection method to obtain the reliable cell clusters. In comparison with 15 state-of-the-art clustering methods on seven real datasets simultaneously measuring gene expression and chromatin accessibility, scMLC achieves better accuracy and stability in most datasets. Synthetic results also indicate that the cell-network-based integration strategy of multi-omics data is superior to other strategies in terms of generalization. Moreover, scMLC is flexible and can be extended to single-cell sequencing data with more than two modalities.
Collapse
Affiliation(s)
- Yuxuan Chen
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Ruiqing Zheng
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Jin Liu
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Min Li
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| |
Collapse
|
37
|
Jiang Y, Hu Z, Lynch AW, Jiang J, Zhu A, Zhang Y, Xie Y, Li R, Zhou N, Meyer CA, Cejas P, Brown M, Long HW, Qiu X. scATAnno: Automated Cell Type Annotation for single-cell ATAC Sequencing Data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.06.01.543296. [PMID: 37333088 PMCID: PMC10274707 DOI: 10.1101/2023.06.01.543296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/20/2023]
Abstract
The recent advances in single-cell epigenomic techniques have created a growing demand for scATAC-seq analysis. One key task is to determine cell types based on epigenetic profiling. We introduce scATAnno, a workflow designed to automatically annotate scATAC-seq data using large-scale scATAC-seq reference atlases. This workflow can generate scATAC-seq reference atlases from publicly available datasets, and enable accurate cell type annotation by integrating query data with reference atlases, without the aid of scRNA-seq profiling. To enhance annotation accuracy, we have incorporated KNN-based and weighted distance-based uncertainty scores to effectively detect unknown cell populations within the query data. We showcase the utility of scATAnno across multiple datasets, including peripheral blood mononuclear cell (PBMC), basal cell carcinoma (BCC) and Triple Negative Breast Cancer (TNBC), and demonstrate that scATAnno accurately annotates cell types across conditions. Overall, scATAnno is a powerful tool for cell type annotation in scATAC-seq data and can aid in the interpretation of new scATAC-seq datasets in complex biological systems.
Collapse
|
38
|
Eun M, Kim D, Shin SI, Yang HO, Kim KD, Choi SY, Park S, Kim DK, Jeong CW, Moon KC, Lee H, Park J. Chromatin accessibility analysis and architectural profiling of human kidneys reveal key cell types and a regulator of diabetic kidney disease. Kidney Int 2024; 105:150-164. [PMID: 37925023 DOI: 10.1016/j.kint.2023.09.030] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2023] [Revised: 08/22/2023] [Accepted: 09/25/2023] [Indexed: 11/06/2023]
Abstract
Diabetes is the leading cause of kidney disease that progresses to kidney failure. However, the key molecular and cellular pathways involved in diabetic kidney disease (DKD) pathogenesis are largely unknown. Here, we performed a comparative analysis of adult human kidneys by examining cell type-specific chromatin accessibility by single-nucleus ATAC-seq (snATAC-seq) and analyzing three-dimensional chromatin architecture via high-throughput chromosome conformation capture (Hi-C method) of paired samples. We mapped the cell type-specific and DKD-specific open chromatin landscape and found that genetic variants associated with kidney diseases were significantly enriched in the proximal tubule- (PT) and injured PT-specific open chromatin regions in samples from patients with DKD. BACH1 was identified as a core transcription factor of injured PT cells; its binding target genes were highly associated with fibrosis and inflammation, which were also key features of injured PT cells. Additionally, Hi-C analysis revealed global chromatin architectural changes in DKD, accompanied by changes in local open chromatin patterns. Combining the snATAC-seq and Hi-C data identified direct target genes of BACH1, and indicated that BACH1 binding regions showed increased chromatin contact frequency with promoters of their target genes in DKD. Thus, our multi-omics analysis revealed BACH1 target genes in injured PTs and highlighted the role of BACH1 as a novel regulator of tubular inflammation and fibrosis.
Collapse
Affiliation(s)
- Minho Eun
- School of Life Sciences, Gwangju Institute of Science and Technology (GIST), Gwangju, Republic of Korea
| | - Donggun Kim
- School of Life Sciences, Gwangju Institute of Science and Technology (GIST), Gwangju, Republic of Korea
| | - So-I Shin
- School of Life Sciences, Gwangju Institute of Science and Technology (GIST), Gwangju, Republic of Korea
| | - Hyun Oh Yang
- Department of Systems Biotechnology, Chung-Ang University, Anseong, Republic of Korea
| | - Kyoung-Dong Kim
- Department of Systems Biotechnology, Chung-Ang University, Anseong, Republic of Korea
| | - Sin Young Choi
- School of Life Sciences, Gwangju Institute of Science and Technology (GIST), Gwangju, Republic of Korea
| | - Sehoon Park
- Department of Internal Medicine, Seoul National University Hospital, Seoul, Republic of Korea
| | - Dong Ki Kim
- Department of Internal Medicine, Seoul National University Hospital, Seoul, Republic of Korea; Department of Internal Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea
| | - Chang Wook Jeong
- Department of Urology, Seoul National University Hospital, Seoul, Republic of Korea
| | - Kyung Chul Moon
- Department of Pathology, Seoul National University College of Medicine, Seoul, Republic of Korea
| | - Hajeong Lee
- Department of Internal Medicine, Seoul National University Hospital, Seoul, Republic of Korea; Department of Internal Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea.
| | - Jihwan Park
- School of Life Sciences, Gwangju Institute of Science and Technology (GIST), Gwangju, Republic of Korea.
| |
Collapse
|
39
|
Xu C, Prete M, Webb S, Jardine L, Stewart BJ, Hoo R, He P, Meyer KB, Teichmann SA. Automatic cell-type harmonization and integration across Human Cell Atlas datasets. Cell 2023; 186:5876-5891.e20. [PMID: 38134877 DOI: 10.1016/j.cell.2023.11.026] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2023] [Revised: 08/24/2023] [Accepted: 11/23/2023] [Indexed: 12/24/2023]
Abstract
Harmonizing cell types across the single-cell community and assembling them into a common framework is central to building a standardized Human Cell Atlas. Here, we present CellHint, a predictive clustering tree-based tool to resolve cell-type differences in annotation resolution and technical biases across datasets. CellHint accurately quantifies cell-cell transcriptomic similarities and places cell types into a relationship graph that hierarchically defines shared and unique cell subtypes. Application to multiple immune datasets recapitulates expert-curated annotations. CellHint also reveals underexplored relationships between healthy and diseased lung cell states in eight diseases. Furthermore, we present a workflow for fast cross-dataset integration guided by harmonized cell types and cell hierarchy, which uncovers underappreciated cell types in adult human hippocampus. Finally, we apply CellHint to 12 tissues from 38 datasets, providing a deeply curated cross-tissue database with ∼3.7 million cells and various machine learning models for automatic cell annotation across human tissues.
Collapse
Affiliation(s)
- Chuan Xu
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Martin Prete
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Simone Webb
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Biosciences Institute, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
| | - Laura Jardine
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Biosciences Institute, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
| | - Benjamin J Stewart
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Molecular Immunity Unit, Department of Medicine, University of Cambridge, Cambridge CB2 0QQ, UK; Cambridge University Hospitals NHS Foundation Trust and NIHR Cambridge Biomedical Research Centre, Cambridge CB2 0QQ, UK
| | - Regina Hoo
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Peng He
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
| | - Kerstin B Meyer
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Sarah A Teichmann
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Theory of Condensed Matter Group, Department of Physics, Cavendish Laboratory, University of Cambridge, Cambridge CB3 0HE, UK.
| |
Collapse
|
40
|
Amarasinghe SL, Yang P, Voogd O, Yang H, Du MM, Su S, Brown D, Jabbari J, Bowden R, Ritchie M. scPipe: an extended preprocessing pipeline for comprehensive single-cell ATAC-Seq data integration in R/Bioconductor. NAR Genom Bioinform 2023; 5:lqad105. [PMID: 38046273 PMCID: PMC10689045 DOI: 10.1093/nargab/lqad105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2023] [Accepted: 11/23/2023] [Indexed: 12/05/2023] Open
Abstract
scPipe is a flexible R/Bioconductor package originally developed to analyse platform-independent single-cell RNA-Seq data. To expand its preprocessing capability to accommodate new single-cell technologies, we further developed scPipe to handle single-cell ATAC-Seq and multi-modal (RNA-Seq and ATAC-Seq) data. After executing multiple data cleaning steps to remove duplicated reads, low abundance features and cells of poor quality, a SingleCellExperiment object is created that contains a sparse count matrix with features of interest in the rows and cells in the columns. Quality control information (e.g. counts per cell, features per cell, total number of fragments, fraction of fragments per peak) and any relevant feature annotations are stored as metadata. We demonstrate that scPipe can efficiently identify 'true' cells and provides flexibility for the user to fine-tune the quality control thresholds using various feature and cell-based metrics collected during data preprocessing. Researchers can then take advantage of various downstream single-cell tools available in Bioconductor for further analysis of scATAC-Seq data such as dimensionality reduction, clustering, motif enrichment, differential accessibility and cis-regulatory network analysis. The scPipe package enables a complete beginning-to-end pipeline for single-cell ATAC-Seq and RNA-Seq data analysis in R.
Collapse
Affiliation(s)
- Shanika L Amarasinghe
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, 3052, Australia
- The Australian Regenerative Medicine Institute, Monash University, Clayton, Victoria, 3800, Australia
| | - Phil Yang
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, 3052, Australia
| | - Oliver Voogd
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, 3052, Australia
| | - Haoyu Yang
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, 3052, Australia
| | - Mei R M Du
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, 3052, Australia
| | - Shian Su
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, 3052, Australia
| | - Daniel V Brown
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, 3052, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria, 3010, Australia
| | - Jafar S Jabbari
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, 3052, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria, 3010, Australia
| | - Rory Bowden
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, 3052, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria, 3010, Australia
| | - Matthew E Ritchie
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, 3052, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria, 3010, Australia
| |
Collapse
|
41
|
Lareau C. Subtle cell states resolved in single-cell data. Nat Biotechnol 2023; 41:1690-1691. [PMID: 37198441 PMCID: PMC10654257 DOI: 10.1038/s41587-023-01797-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/19/2023]
Affiliation(s)
- Caleb Lareau
- Department of Pathology, Stanford University, Palo Alto, CA, USA.
| |
Collapse
|
42
|
Li K, Chen X, Song S, Hou L, Chen S, Jiang R. Cofea: correlation-based feature selection for single-cell chromatin accessibility data. Brief Bioinform 2023; 25:bbad458. [PMID: 38113078 PMCID: PMC10782922 DOI: 10.1093/bib/bbad458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2023] [Revised: 11/19/2023] [Accepted: 11/20/2023] [Indexed: 12/21/2023] Open
Abstract
Single-cell chromatin accessibility sequencing (scCAS) technologies have enabled characterizing the epigenomic heterogeneity of individual cells. However, the identification of features of scCAS data that are relevant to underlying biological processes remains a significant gap. Here, we introduce a novel method Cofea, to fill this gap. Through comprehensive experiments on 5 simulated and 54 real datasets, Cofea demonstrates its superiority in capturing cellular heterogeneity and facilitating downstream analysis. Applying this method to identification of cell type-specific peaks and candidate enhancers, as well as pathway enrichment analysis and partitioned heritability analysis, we illustrate the potential of Cofea to uncover functional biological process.
Collapse
Affiliation(s)
- Keyi Li
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Xiaoyang Chen
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Shuang Song
- Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing 100084, China
| | - Lin Hou
- Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing 100084, China
| | - Shengquan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
| | - Rui Jiang
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
| |
Collapse
|
43
|
Roth C, Venu V, Job V, Lubbers N, Sanbonmatsu KY, Steadman CR, Starkenburg SR. Improved quality metrics for association and reproducibility in chromatin accessibility data using mutual information. BMC Bioinformatics 2023; 24:441. [PMID: 37990143 PMCID: PMC10664258 DOI: 10.1186/s12859-023-05553-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2023] [Accepted: 10/30/2023] [Indexed: 11/23/2023] Open
Abstract
BACKGROUND Correlation metrics are widely utilized in genomics analysis and often implemented with little regard to assumptions of normality, homoscedasticity, and independence of values. This is especially true when comparing values between replicated sequencing experiments that probe chromatin accessibility, such as assays for transposase-accessible chromatin via sequencing (ATAC-seq). Such data can possess several regions across the human genome with little to no sequencing depth and are thus non-normal with a large portion of zero values. Despite distributed use in the epigenomics field, few studies have evaluated and benchmarked how correlation and association statistics behave across ATAC-seq experiments with known differences or the effects of removing specific outliers from the data. Here, we developed a computational simulation of ATAC-seq data to elucidate the behavior of correlation statistics and to compare their accuracy under set conditions of reproducibility. RESULTS Using these simulations, we monitored the behavior of several correlation statistics, including the Pearson's R and Spearman's [Formula: see text] coefficients as well as Kendall's [Formula: see text] and Top-Down correlation. We also test the behavior of association measures, including the coefficient of determination R[Formula: see text], Kendall's W, and normalized mutual information. Our experiments reveal an insensitivity of most statistics, including Spearman's [Formula: see text], Kendall's [Formula: see text], and Kendall's W, to increasing differences between simulated ATAC-seq replicates. The removal of co-zeros (regions lacking mapped sequenced reads) between simulated experiments greatly improves the estimates of correlation and association. After removing co-zeros, the R[Formula: see text] coefficient and normalized mutual information display the best performance, having a closer one-to-one relationship with the known portion of shared, enhanced loci between simulated replicates. When comparing values between experimental ATAC-seq data using a random forest model, mutual information best predicts ATAC-seq replicate relationships. CONCLUSIONS Collectively, this study demonstrates how measures of correlation and association can behave in epigenomics experiments. We provide improved strategies for quantifying relationships in these increasingly prevalent and important chromatin accessibility assays.
Collapse
Affiliation(s)
- Cullen Roth
- Los Alamos National Laboratory, Genomics and Bioanalytics, Los Alamos, NM, USA.
| | - Vrinda Venu
- Los Alamos National Laboratory, Climate, Ecosystems, and Environmental Science, Los Alamos, NM, USA
| | - Vanessa Job
- Los Alamos National Laboratory, High Performance Computing and Design, Los Alamos, NM, USA
| | - Nicholas Lubbers
- Los Alamos National Laboratory, Information Sciences, Los Alamos, NM, USA
| | - Karissa Y Sanbonmatsu
- Los Alamos National Laboratory, Theoretical Biology and Biophysics, Los Alamos, NM, USA
| | - Christina R Steadman
- Los Alamos National Laboratory, Climate, Ecosystems, and Environmental Science, Los Alamos, NM, USA
| | - Shawn R Starkenburg
- Los Alamos National Laboratory, Genomics and Bioanalytics, Los Alamos, NM, USA
| |
Collapse
|
44
|
Akhtyamov P, Shaheen L, Raevskiy M, Stupnikov A, Medvedeva YA. scATAC-seq preprocessing and imputation evaluation system for visualization, clustering and digital footprinting. Brief Bioinform 2023; 25:bbad447. [PMID: 38084919 PMCID: PMC10714317 DOI: 10.1093/bib/bbad447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2023] [Revised: 10/29/2023] [Accepted: 11/14/2023] [Indexed: 12/18/2023] Open
Abstract
Single-cell ATAC-seq (scATAC-seq) is a recently developed approach that provides means to investigate open chromatin at single cell level, to assess epigenetic regulation and transcription factors binding landscapes. The sparsity of the scATAC-seq data calls for imputation. Similarly, preprocessing (filtering) may be required to reduce computational load due to the large number of open regions. However, optimal strategies for both imputation and preprocessing have not been yet evaluated together. We present SAPIEnS (scATAC-seq Preprocessing and Imputation Evaluation System), a benchmark for scATAC-seq imputation frameworks, a combination of state-of-the-art imputation methods with commonly used preprocessing techniques. We assess different types of scATAC-seq analysis, i.e. clustering, visualization and digital genomic footprinting, and attain optimal preprocessing-imputation strategies. We discuss the benefits of the imputation framework depending on the task and the number of the dataset features (peaks). We conclude that the preprocessing with the Boruta method is beneficial for the majority of tasks, while imputation is helpful mostly for small datasets. We also implement a SAPIEnS database with pre-computed transcription factor footprints based on imputed data with their activity scores in a specific cell type. SAPIEnS is published at: https://github.com/lab-medvedeva/SAPIEnS. SAPIEnS database is available at: https://sapiensdb.com.
Collapse
Affiliation(s)
- Pavel Akhtyamov
- Department of Biomedical Physics, Moscow Institute of Physics and Technology (National Research University), 9 Institutskiy per., 141701, Moscow Region, Russian Federation
- The National Medical Research Center for Endocrinology, Dm. Ulyanova, 11, 117036, Moscow, Russian Federation
| | - Layal Shaheen
- Department of Biomedical Physics, Moscow Institute of Physics and Technology (National Research University), 9 Institutskiy per., 141701, Moscow Region, Russian Federation
- The National Medical Research Center for Endocrinology, Dm. Ulyanova, 11, 117036, Moscow, Russian Federation
| | - Mikhail Raevskiy
- Department, École Polytechnique Fédérale de Lausanne, Rte Cantonale, 1015, Lausanne, Vaud, Switzerland
| | - Alexey Stupnikov
- Department of Biomedical Physics, Moscow Institute of Physics and Technology (National Research University), 9 Institutskiy per., 141701, Moscow Region, Russian Federation
- The National Medical Research Center for Endocrinology, Dm. Ulyanova, 11, 117036, Moscow, Russian Federation
- Institute of Bioengineering, Research Center of Biotechnology, Russian Academy of Science, Leninsky prospect, 33, build. 2, 119071, Moscow, Russian Federation
| | - Yulia A Medvedeva
- Department of Biomedical Physics, Moscow Institute of Physics and Technology (National Research University), 9 Institutskiy per., 141701, Moscow Region, Russian Federation
- The National Medical Research Center for Endocrinology, Dm. Ulyanova, 11, 117036, Moscow, Russian Federation
- Institute of Bioengineering, Research Center of Biotechnology, Russian Academy of Science, Leninsky prospect, 33, build. 2, 119071, Moscow, Russian Federation
| |
Collapse
|
45
|
Bollepogu Raja KK, Yeung K, Shim YK, Li Y, Chen R, Mardon G. A single cell genomics atlas of the Drosophila larval eye reveals distinct photoreceptor developmental timelines. Nat Commun 2023; 14:7205. [PMID: 37938573 PMCID: PMC10632452 DOI: 10.1038/s41467-023-43037-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2023] [Accepted: 10/30/2023] [Indexed: 11/09/2023] Open
Abstract
The Drosophila eye is a powerful model system to study the dynamics of cell differentiation, cell state transitions, cell maturation, and pattern formation. However, a high-resolution single cell genomics resource that accurately profiles all major cell types of the larval eye disc and their spatiotemporal relationships is lacking. Here, we report transcriptomic and chromatin accessibility data for all known cell types in the developing eye. Photoreceptors appear as strands of cells that represent their dynamic developmental timelines. As photoreceptor subtypes mature, they appear to assume a common transcriptomic profile that is dominated by genes involved in axon function. We identify cell type maturation genes, enhancers, and potential regulators, as well as genes with distinct R3 or R4 photoreceptor specific expression. Finally, we observe that the chromatin accessibility between cones and photoreceptors is distinct. These single cell genomics atlases will greatly enhance the power of the Drosophila eye as a model system.
Collapse
Affiliation(s)
- Komal Kumar Bollepogu Raja
- Department of Pathology and Immunology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
| | - Kelvin Yeung
- Department of Pathology and Immunology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
| | - Yoon-Kyung Shim
- Department of Pathology and Immunology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
| | - Yumei Li
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
| | - Rui Chen
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
| | - Graeme Mardon
- Department of Pathology and Immunology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
| |
Collapse
|
46
|
Carraro C, Bonaguro L, Srinivasa R, van Uelft M, Isakzai V, Schulte-Schrepping J, Gambhir P, Elmzzahi T, Montgomery JV, Hayer H, Li Y, Theis H, Kraut M, Mahbubani KT, Aschenbrenner AC, König I, Fava E, Fried HU, De Domenico E, Beyer M, Saglam A, Schultze JL. Chromatin accessibility profiling of targeted cell populations with laser capture microdissection coupled to ATAC-seq. CELL REPORTS METHODS 2023; 3:100598. [PMID: 37776856 PMCID: PMC10626193 DOI: 10.1016/j.crmeth.2023.100598] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/27/2023] [Revised: 08/04/2023] [Accepted: 09/05/2023] [Indexed: 10/02/2023]
Abstract
Spatially resolved omics technologies reveal context-dependent cellular regulatory networks in tissues of interest. Beyond transcriptome analysis, information on epigenetic traits and chromatin accessibility can provide further insights on gene regulation in health and disease. Nevertheless, compared to the enormous advancements in spatial transcriptomics technologies, the field of spatial epigenomics is much younger and still underexplored. In this study, we report laser capture microdissection coupled to ATAC-seq (LCM-ATAC-seq) applied to fresh frozen samples for the spatial characterization of chromatin accessibility. We first demonstrate the efficient use of LCM coupled to in situ tagmentation and evaluate its performance as a function of cell number, microdissected areas, and tissue type. Further, we demonstrate its use for the targeted chromatin accessibility analysis of discrete contiguous or scattered cell populations in tissues via single-nuclei capture based on immunostaining for specific cellular markers.
Collapse
Affiliation(s)
- Caterina Carraro
- Systems Medicine, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) e.V., Bonn, Germany; Genomics and Immunoregulation, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany.
| | - Lorenzo Bonaguro
- Systems Medicine, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) e.V., Bonn, Germany; Genomics and Immunoregulation, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany; PRECISE Platform for Genomics and Epigenomics, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) e.V. and University of Bonn, Bonn, Germany
| | - Rachana Srinivasa
- Systems Medicine, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) e.V., Bonn, Germany; Genomics and Immunoregulation, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
| | - Martina van Uelft
- Systems Medicine, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) e.V., Bonn, Germany; Genomics and Immunoregulation, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
| | - Victoria Isakzai
- Systems Medicine, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) e.V., Bonn, Germany; Genomics and Immunoregulation, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
| | - Jonas Schulte-Schrepping
- Systems Medicine, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) e.V., Bonn, Germany; Genomics and Immunoregulation, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany; PRECISE Platform for Genomics and Epigenomics, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) e.V. and University of Bonn, Bonn, Germany
| | - Prerna Gambhir
- Systems Medicine, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) e.V., Bonn, Germany; Genomics and Immunoregulation, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
| | - Tarek Elmzzahi
- Systems Medicine, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) e.V., Bonn, Germany; Immunogenomics & Neurodegeneration, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) e.V., Bonn, Germany
| | - Jessica V Montgomery
- Systems Medicine, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) e.V., Bonn, Germany
| | - Hannah Hayer
- Systems Medicine, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) e.V., Bonn, Germany; Genomics and Immunoregulation, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
| | - Yuanfang Li
- Systems Medicine, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) e.V., Bonn, Germany; Immunogenomics & Neurodegeneration, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) e.V., Bonn, Germany
| | - Heidi Theis
- PRECISE Platform for Genomics and Epigenomics, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) e.V. and University of Bonn, Bonn, Germany
| | - Michael Kraut
- PRECISE Platform for Genomics and Epigenomics, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) e.V. and University of Bonn, Bonn, Germany
| | - Krishnaa T Mahbubani
- Department of Surgery, University of Cambridge, and Cambridge NIHR Biomedical Research Centre, Cambridge, UK
| | - Anna C Aschenbrenner
- Systems Medicine, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) e.V., Bonn, Germany
| | - Ireen König
- Core Research Facilities and Services, Light Microscope Facility (LMF), Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) e.V., Bonn, Germany
| | - Eugenio Fava
- Core Research Facilities and Services, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) e.V., Bonn, Germany
| | - Hans-Ulrich Fried
- Core Research Facilities and Services, Light Microscope Facility (LMF), Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) e.V., Bonn, Germany
| | - Elena De Domenico
- Systems Medicine, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) e.V., Bonn, Germany; Genomics and Immunoregulation, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany; PRECISE Platform for Genomics and Epigenomics, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) e.V. and University of Bonn, Bonn, Germany
| | - Marc Beyer
- Systems Medicine, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) e.V., Bonn, Germany; PRECISE Platform for Genomics and Epigenomics, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) e.V. and University of Bonn, Bonn, Germany; Immunogenomics & Neurodegeneration, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) e.V., Bonn, Germany
| | - Adem Saglam
- Systems Medicine, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) e.V., Bonn, Germany; PRECISE Platform for Genomics and Epigenomics, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) e.V. and University of Bonn, Bonn, Germany.
| | - Joachim L Schultze
- Systems Medicine, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) e.V., Bonn, Germany; Genomics and Immunoregulation, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany; PRECISE Platform for Genomics and Epigenomics, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE) e.V. and University of Bonn, Bonn, Germany
| |
Collapse
|
47
|
Carbonetto P, Luo K, Sarkar A, Hung A, Tayeb K, Pott S, Stephens M. GoM DE: interpreting structure in sequence count data with differential expression analysis allowing for grades of membership. Genome Biol 2023; 24:236. [PMID: 37858253 PMCID: PMC10588049 DOI: 10.1186/s13059-023-03067-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2023] [Accepted: 09/20/2023] [Indexed: 10/21/2023] Open
Abstract
Parts-based representations, such as non-negative matrix factorization and topic modeling, have been used to identify structure from single-cell sequencing data sets, in particular structure that is not as well captured by clustering or other dimensionality reduction methods. However, interpreting the individual parts remains a challenge. To address this challenge, we extend methods for differential expression analysis by allowing cells to have partial membership to multiple groups. We call this grade of membership differential expression (GoM DE). We illustrate the benefits of GoM DE for annotating topics identified in several single-cell RNA-seq and ATAC-seq data sets.
Collapse
Affiliation(s)
- Peter Carbonetto
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Research Computing Center, University of Chicago, Chicago, IL, USA
| | - Kaixuan Luo
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
| | - Abhishek Sarkar
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Vesalius Therapeutics, Cambridge, MA, USA
| | - Anthony Hung
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA
| | - Karl Tayeb
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Committee on Genetics, Genomics and Systems Biology, University of Chicago, Chicago, IL, USA
| | - Sebastian Pott
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA
| | - Matthew Stephens
- Department of Human Genetics, University of Chicago, Chicago, IL, USA.
- Department of Statistics, University of Chicago, Chicago, IL, USA.
| |
Collapse
|
48
|
Braband KL, Nedwed AS, Helbich SS, Simon M, Beumer N, Brors B, Marini F, Delacher M. Using single-cell chromatin accessibility sequencing to characterize CD4+ T cells from murine tissues. Front Immunol 2023; 14:1232511. [PMID: 37908367 PMCID: PMC10613658 DOI: 10.3389/fimmu.2023.1232511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2023] [Accepted: 08/29/2023] [Indexed: 11/02/2023] Open
Abstract
The Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) is a cutting-edge technology that enables researchers to assess genome-wide chromatin accessibility and to characterize cell type specific gene-regulatory programs. Recent technological progress allows for using this technology also on the single-cell level. In this article, we describe the whole value chain from the isolation of T cells from murine tissues to a complete bioinformatic analysis workflow. We start with methods for isolating scATAC-seq-ready CD4+ T cells from murine tissues such as visceral adipose tissue, skin, colon, and secondary lymphoid tissues such as the spleen. We describe the preparation of nuclei and quality control parameters during library preparation. Based on publicly available sequencing data that was generated using these protocols, we describe a step-by-step bioinformatic analysis pipeline for data pre-processing and downstream analysis. Our analysis workflow will follow the R-based bioinformatics framework ArchR, which is currently well established for scATAC-seq datasets. All in all, this work serves as a one-stop shop for generating and analyzing chromatin accessibility landscapes in T cells.
Collapse
Affiliation(s)
- Kathrin Luise Braband
- Institute of Immunology, University Medical Center Mainz, Mainz, Germany
- Research Center for Immunotherapy (FZI), University Medical Center Mainz, Mainz, Germany
| | - Annekathrin Silvia Nedwed
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center Mainz, Mainz, Germany
| | - Sara Salome Helbich
- Institute of Immunology, University Medical Center Mainz, Mainz, Germany
- Research Center for Immunotherapy (FZI), University Medical Center Mainz, Mainz, Germany
| | - Malte Simon
- Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
- Division of Applied Bioinformatics, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Niklas Beumer
- Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
- Division of Applied Bioinformatics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- DKFZ-Hector Cancer Institute, University Medical Center Mannheim, Mannheim, Germany
- Division of Personalized Medical Oncology (A420), German Cancer Research Center (DKFZ), Heidelberg, Germany
- Department of Personalized Oncology, University Hospital Mannheim, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
| | - Benedikt Brors
- Division of Applied Bioinformatics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- National Center for Tumor Diseases (NCT), Heidelberg, Germany
- German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Federico Marini
- Research Center for Immunotherapy (FZI), University Medical Center Mainz, Mainz, Germany
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center Mainz, Mainz, Germany
| | - Michael Delacher
- Institute of Immunology, University Medical Center Mainz, Mainz, Germany
- Research Center for Immunotherapy (FZI), University Medical Center Mainz, Mainz, Germany
| |
Collapse
|
49
|
Zhang W, Jiang R, Chen S, Wang Y. scIBD: a self-supervised iterative-optimizing model for boosting the detection of heterotypic doublets in single-cell chromatin accessibility data. Genome Biol 2023; 24:225. [PMID: 37814314 PMCID: PMC10561408 DOI: 10.1186/s13059-023-03072-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2023] [Accepted: 09/22/2023] [Indexed: 10/11/2023] Open
Abstract
Application of the widely used droplet-based microfluidic technologies in single-cell sequencing often yields doublets, introducing bias to downstream analyses. Especially, doublet-detection methods for single-cell chromatin accessibility sequencing (scCAS) data have multiple assay-specific challenges. Therefore, we propose scIBD, a self-supervised iterative-optimizing model for boosting heterotypic doublet detection in scCAS data. scIBD introduces an adaptive strategy to simulate high-confident heterotypic doublets and self-supervise for doublet-detection in an iteratively optimizing manner. Comprehensive benchmarking on various simulated and real datasets demonstrates the outperformance and robustness of scIBD. Moreover, the downstream biological analyses suggest the efficacy of doublet-removal by scIBD.
Collapse
Affiliation(s)
- Wenhao Zhang
- Department of Automation, Xiamen University, Xiamen, 361000, Fujian, China
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, 361000, Fujian, China
| | - Rui Jiang
- Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, 100084, China
| | - Shengquan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China.
| | - Ying Wang
- Department of Automation, Xiamen University, Xiamen, 361000, Fujian, China.
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, 361000, Fujian, China.
- Xiamen Key Laboratory of Big Data Intelligent Analysis and Decision, Xiamen, 361005, Fujian, China.
| |
Collapse
|
50
|
Zhang B, Upadhyay R, Hao Y, Samanovic MI, Herati RS, Blair JD, Axelrad J, Mulligan MJ, Littman DR, Satija R. Multimodal single-cell datasets characterize antigen-specific CD8 + T cells across SARS-CoV-2 vaccination and infection. Nat Immunol 2023; 24:1725-1734. [PMID: 37735591 PMCID: PMC10522491 DOI: 10.1038/s41590-023-01608-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2023] [Accepted: 07/31/2023] [Indexed: 09/23/2023]
Abstract
The immune response to SARS-CoV-2 antigen after infection or vaccination is defined by the durable production of antibodies and T cells. Population-based monitoring typically focuses on antibody titer, but there is a need for improved characterization and quantification of T cell responses. Here, we used multimodal sequencing technologies to perform a longitudinal analysis of circulating human leukocytes collected before and after immunization with the mRNA vaccine BNT162b2. Our data indicated distinct subpopulations of CD8+ T cells, which reliably appeared 28 days after prime vaccination. Using a suite of cross-modality integration tools, we defined their transcriptome, accessible chromatin landscape and immunophenotype, and we identified unique biomarkers within each modality. We further showed that this vaccine-induced population was SARS-CoV-2 antigen-specific and capable of rapid clonal expansion. Moreover, we identified these CD8+ T cell populations in scRNA-seq datasets from COVID-19 patients and found that their relative frequency and differentiation outcomes were predictive of subsequent clinical outcomes.
Collapse
Affiliation(s)
- Bingjie Zhang
- New York Genome Center, New York, NY, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Department of Cell Biology, New York University Grossman School of Medicine, New York, NY, USA
| | - Rabi Upadhyay
- Department of Cell Biology, New York University Grossman School of Medicine, New York, NY, USA
- Perlmutter Cancer Center, New York University Langone Health, New York, NY, USA
- Department of Medicine, New York University Grossman School of Medicine, New York, NY, USA
| | - Yuhan Hao
- New York Genome Center, New York, NY, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
| | - Marie I Samanovic
- Department of Medicine, New York University Grossman School of Medicine, New York, NY, USA
- New York University Langone Vaccine Center, New York, NY, USA
| | - Ramin S Herati
- Department of Medicine, New York University Grossman School of Medicine, New York, NY, USA
- New York University Langone Vaccine Center, New York, NY, USA
| | - John D Blair
- New York Genome Center, New York, NY, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
| | - Jordan Axelrad
- Department of Medicine, New York University Grossman School of Medicine, New York, NY, USA
| | - Mark J Mulligan
- Department of Medicine, New York University Grossman School of Medicine, New York, NY, USA
- New York University Langone Vaccine Center, New York, NY, USA
| | - Dan R Littman
- Department of Cell Biology, New York University Grossman School of Medicine, New York, NY, USA.
- Perlmutter Cancer Center, New York University Langone Health, New York, NY, USA.
- Howard Hughes Medical Institute, New York, NY, USA.
| | - Rahul Satija
- New York Genome Center, New York, NY, USA.
- Center for Genomics and Systems Biology, New York University, New York, NY, USA.
| |
Collapse
|