1
Teo AYY, Squair JW, Courtine G, Skinnider MA. Best practices for differential accessibility analysis in single-cell epigenomics. Nat Commun 2024; 15:8805. [PMID: 39394227 PMCID: PMC11470024 DOI: 10.1038/s41467-024-53089-5] [Received: 05/23/2023] [Accepted: 09/24/2024] [Indexed: 10/13/2024]
Abstract
Differential accessibility (DA) analysis of single-cell epigenomics data enables the discovery of regulatory programs that establish cell type identity and steer responses to physiological and pathophysiological perturbations. While many statistical methods to identify DA regions have been developed, the principles that determine the performance of these methods remain unclear. As a result, there is no consensus on the most appropriate statistical methods for DA analysis of single-cell epigenomics data. Here, we present a systematic evaluation of statistical methods that have been applied to identify DA regions in single-cell ATAC-seq (scATAC-seq) data. We leverage a compendium of scATAC-seq experiments with matching bulk ATAC-seq or scRNA-seq in order to assess the accuracy, bias, robustness, and scalability of each statistical method. The structure of our experiments also provides the opportunity to define best practices for the analysis of scATAC-seq data beyond DA itself. We leverage this understanding to develop an R package implementing these best practices.
Affiliation(s)
- Alan Yue Yang Teo
- Defitech Center for Interventional Neurotherapies (.NeuroRestore), EPFL/CHUV/UNIL, Lausanne, Switzerland
- NeuroX Institute and Brain Mind Institute, School of Life Sciences, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
- Jordan W Squair
- Defitech Center for Interventional Neurotherapies (.NeuroRestore), EPFL/CHUV/UNIL, Lausanne, Switzerland.
- NeuroX Institute and Brain Mind Institute, School of Life Sciences, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland.
- Department of Clinical Neuroscience, Lausanne University Hospital (CHUV) and University of Lausanne (UNIL), Lausanne, Switzerland.
- Gregoire Courtine
- Defitech Center for Interventional Neurotherapies (.NeuroRestore), EPFL/CHUV/UNIL, Lausanne, Switzerland.
- NeuroX Institute and Brain Mind Institute, School of Life Sciences, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland.
- Department of Clinical Neuroscience, Lausanne University Hospital (CHUV) and University of Lausanne (UNIL), Lausanne, Switzerland.
- Michael A Skinnider
- Defitech Center for Interventional Neurotherapies (.NeuroRestore), EPFL/CHUV/UNIL, Lausanne, Switzerland.
- NeuroX Institute and Brain Mind Institute, School of Life Sciences, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland.
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA.
- Ludwig Institute for Cancer Research, Princeton University, Princeton, NJ, USA.

2
Wang Z, Luo P, Xiao M, Wang B, Liu T, Sun X. Recover then aggregate: unified cross-modal deep clustering with global structural information for single-cell data. Brief Bioinform 2024; 25:bbae485. [PMID: 39356327 PMCID: PMC11445907 DOI: 10.1093/bib/bbae485] [Received: 07/08/2024] [Revised: 08/24/2024] [Accepted: 09/13/2024] [Indexed: 10/03/2024]
Abstract
Single-cell cross-modal joint clustering has been extensively utilized to investigate the tumor microenvironment. Although numerous approaches have been suggested, accurate clustering remains the main challenge. First, the gene expression matrix frequently contains numerous missing values due to measurement limitations. The majority of existing clustering methods treat it as a typical multi-modal dataset without further processing. Few methods conduct recovery before clustering and do not sufficiently engage with the underlying research, leading to suboptimal outcomes. Additionally, the existing cross-modal information fusion strategy does not ensure consistency of representations across different modes, potentially leading to the integration of conflicting information, which could degrade performance. To address these challenges, we propose the 'Recover then Aggregate' strategy and introduce the Unified Cross-Modal Deep Clustering model. Specifically, we have developed a data augmentation technique based on neighborhood similarity, iteratively imposing rank constraints on the Laplacian matrix, thus updating the similarity matrix and recovering dropout events. Concurrently, we integrate cross-modal features and employ contrastive learning to align modality-specific representations with consistent ones, enhancing the effective integration of diverse modal information. Comprehensive experiments on five real-world multi-modal datasets have demonstrated this method's superior effectiveness in single-cell clustering tasks.
Affiliation(s)
- Ziyi Wang
- Department of Surgical Oncology and General Surgery, First Hospital of China Medical University, Shenyang 110001, PR China
- Section of Esophageal and Mediastinal Oncology, Department of Thoracic Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100730, China
- Department of Thoracic Surgery, The First Hospital of China Medical University, No.155 North Nanjing Street, Shenyang 110001, People’s Republic of China
- Peng Luo
- Department of Thoracic Surgery, Xinqiao Hospital, Army Medical University, Chongqing 400038, China
- Mingming Xiao
- Department of Pathology, People’s Hospital of China Medical University (Liaoning Provincial People’s Hospital), Shenyang, Liaoning Province 110015, People’s Republic of China
- Boyang Wang
- Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, IL 60607, United States
- Tianyu Liu
- Computer Science and Engineering, University of California, Riverside, Riverside, CA 92521, United States
- Xiangyu Sun
- Cancer Hospital of China Medical University, Liaoning Cancer Hospital and Institute, Shenyang 110042, Liaoning, China
- Cancer Hospital of Dalian University of Technology, Shenyang, Liaoning Province 110042, China

3
Sun F, Li H, Sun D, Fu S, Gu L, Shao X, Wang Q, Dong X, Duan B, Xing F, Wu J, Xiao M, Zhao F, Han JDJ, Liu Q, Fan X, Li C, Wang C, Shi T. Single-cell omics: experimental workflow, data analyses and applications. Science China. Life Sciences 2024:10.1007/s11427-023-2561-0. [PMID: 39060615 DOI: 10.1007/s11427-023-2561-0] [Received: 12/07/2023] [Accepted: 04/18/2024] [Indexed: 07/28/2024]
Abstract
Cells are the fundamental units of biological systems and exhibit unique development trajectories and molecular features. Our exploration of how the genomes orchestrate the formation and maintenance of each cell, and control the cellular phenotypes of various organisms, is both captivating and intricate. Since the inception of the first single-cell RNA technology, technologies related to single-cell sequencing have experienced rapid advancements in recent years. These technologies have expanded horizontally to include single-cell genome, epigenome, proteome, and metabolome, while vertically, they have progressed to integrate multiple omics data and incorporate additional information such as spatial scRNA-seq and CRISPR screening. Single-cell omics represent a groundbreaking advancement in the biomedical field, offering profound insights into the understanding of complex diseases, including cancers. Here, we comprehensively summarize recent advances in single-cell omics technologies, with a specific focus on the methodology section. This overview aims to guide researchers in selecting appropriate methods for single-cell sequencing and related data analysis.
Affiliation(s)
- Fengying Sun
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China
- Haoyan Li
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Dongqing Sun
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Shaliu Fu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Lei Gu
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Xin Shao
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China
- Qinqin Wang
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Xin Dong
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Bin Duan
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Feiyang Xing
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Jun Wu
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Minmin Xiao
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China.
- Fangqing Zhao
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, 100101, China.
- Jing-Dong J Han
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China.
- Qi Liu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China.
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China.
- Xiaohui Fan
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China.
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China.
- Zhejiang Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310006, China.
- Chen Li
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China.
- Chenfei Wang
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China.
- Tieliu Shi
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China.
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China.
- Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, 200062, China.

4
Wang X, Lian Q, Dong H, Xu S, Su Y, Wu X. Benchmarking Algorithms for Gene Set Scoring of Single-cell ATAC-seq Data. Genomics, Proteomics & Bioinformatics 2024; 22:qzae014. [PMID: 39049508 PMCID: PMC11423854 DOI: 10.1093/gpbjnl/qzae014] [Received: 05/01/2022] [Revised: 06/20/2023] [Accepted: 06/25/2023] [Indexed: 07/27/2024]
Abstract
Gene set scoring (GSS) has been routinely conducted for gene expression analysis of bulk or single-cell RNA sequencing (RNA-seq) data, which helps to decipher single-cell heterogeneity and cell type-specific variability by incorporating prior knowledge from functional gene sets. Single-cell assay for transposase accessible chromatin using sequencing (scATAC-seq) is a powerful technique for interrogating single-cell chromatin-based gene regulation, and genes or gene sets with dynamic regulatory potentials can be regarded as cell type-specific markers as if in single-cell RNA-seq (scRNA-seq). However, there are few GSS tools specifically designed for scATAC-seq, and the applicability and performance of RNA-seq GSS tools on scATAC-seq data remain to be investigated. Here, we systematically benchmarked ten GSS tools, including four bulk RNA-seq tools, five scRNA-seq tools, and one scATAC-seq method. First, using matched scATAC-seq and scRNA-seq datasets, we found that the performance of GSS tools on scATAC-seq data was comparable to that on scRNA-seq, suggesting their applicability to scATAC-seq. Then, the performance of different GSS tools was extensively evaluated using up to ten scATAC-seq datasets. Moreover, we evaluated the impact of gene activity conversion, dropout imputation, and gene set collections on the results of GSS. Results show that dropout imputation can significantly promote the performance of almost all GSS tools, while the impact of gene activity conversion methods or gene set collections on GSS performance is more dependent on GSS tools or datasets. Finally, we provided practical guidelines for choosing appropriate preprocessing methods and GSS tools in different application scenarios.
Affiliation(s)
- Xi Wang
- Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China
- Department of Automation, Xiamen University, Xiamen 361005, China
- Qiwei Lian
- Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China
- Department of Automation, Xiamen University, Xiamen 361005, China
- Haoyu Dong
- Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China
- Shuo Xu
- Department of Automation, Xiamen University, Xiamen 361005, China
- Yaru Su
- College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, China
- Xiaohui Wu
- Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China

5
Chen A, Yangzom T, Hong Y, Lundberg BC, Sullivan GJ, Tzoulis C, Bindoff LA, Liang KX. Hallmark Molecular and Pathological Features of POLG Disease are Recapitulated in Cerebral Organoids. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2024; 11:e2307136. [PMID: 38445970 PMCID: PMC11095234 DOI: 10.1002/advs.202307136] [Received: 09/27/2023] [Revised: 11/26/2023] [Indexed: 03/07/2024]
Abstract
In this research, a 3D brain organoid model is developed to study POLG-related encephalopathy, a mitochondrial disease stemming from POLG mutations. Induced pluripotent stem cells (iPSCs) derived from patients with these mutations are utilized to generate cortical organoids, which exhibited typical features of the diseases with POLG mutations, such as altered morphology, neuronal loss, and mitochondrial DNA (mtDNA) depletion. Significant dysregulation is also identified in pathways crucial for neuronal development and function, alongside upregulated NOTCH and JAK-STAT signaling pathways. Metformin treatment ameliorated many of these abnormalities, except for the persistent affliction of inhibitory dopamine-glutamate (DA GLU) neurons. This novel model effectively mirrors both the molecular and pathological attributes of diseases with POLG mutations, providing a valuable tool for mechanistic understanding and therapeutic screening for POLG-related disorders and other conditions characterized by compromised neuronal mtDNA maintenance and complex I deficiency.
Affiliation(s)
- Anbin Chen
- Department of Clinical Medicine (K1), University of Bergen, Bergen 5021, Norway
- Department of Neurosurgery, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai 20092, China
- Tsering Yangzom
- Department of Clinical Medicine (K1), University of Bergen, Bergen 5021, Norway
- Centre for International Health, University of Bergen, Bergen 5020, Norway
- Yu Hong
- Department of Clinical Medicine (K1), University of Bergen, Bergen 5021, Norway
- Bjørn Christian Lundberg
- Department of Clinical Medicine (K1), University of Bergen, Bergen 5021, Norway
- Department of Biomedicine, University of Bergen, Bergen 5009, Norway
- Charalampos Tzoulis
- Department of Clinical Medicine (K1), University of Bergen, Bergen 5021, Norway
- Neuro-SysMed Center of Excellence for Clinical Research in Neurological Diseases, Haukeland University Hospital, Bergen 5021, Norway

6
Cui X, Chen X, Li Z, Gao Z, Chen S, Jiang R. Discrete latent embedding of single-cell chromatin accessibility sequencing data for uncovering cell heterogeneity. Nature Computational Science 2024; 4:346-359. [PMID: 38730185 DOI: 10.1038/s43588-024-00625-4] [Received: 09/25/2023] [Accepted: 04/05/2024] [Indexed: 05/12/2024]
Abstract
Single-cell epigenomic data have been growing continuously at an unprecedented pace, but their characteristics such as high dimensionality and sparsity pose substantial challenges to downstream analysis. Although deep learning models, especially variational autoencoders, have been widely used to capture low-dimensional feature embeddings, the prevalent Gaussian assumption somewhat disagrees with real data, and these models tend to struggle to incorporate reference information from abundant cell atlases. Here we propose CASTLE, a deep generative model based on the vector-quantized variational autoencoder framework to extract discrete latent embeddings that interpretably characterize single-cell chromatin accessibility sequencing data. We validate the performance and robustness of CASTLE for accurate cell-type identification and reasonable visualization compared with state-of-the-art methods. We demonstrate the advantages of CASTLE for effective incorporation of existing massive reference datasets in a weakly supervised or supervised manner. We further demonstrate CASTLE's capacity for intuitively distilling cell-type-specific feature spectra that unveil cell heterogeneity and biological implications quantitatively.
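As a rough illustration of what "discrete latent embeddings" means in a vector-quantized autoencoder, the sketch below shows only the generic quantization step: a continuous encoder output is replaced by its nearest entry in a codebook. It is not CASTLE's implementation; the function name `quantize`, the toy codebook, and the use of Euclidean distance are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def quantize(vector: Sequence[float],
             codebook: List[Sequence[float]]) -> Tuple[int, Sequence[float]]:
    """Return the index and value of the codebook entry nearest to `vector`.

    Hypothetical helper: in a vector-quantized autoencoder the index is the
    discrete latent code and the entry is what the decoder receives.
    """
    distances = [math.dist(vector, code) for code in codebook]
    index = min(range(len(codebook)), key=distances.__getitem__)
    return index, codebook[index]


# Toy codebook with three 2-dimensional entries (illustrative values only).
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0)]

index, code = quantize((0.9, 1.2), codebook)
print(index, code)  # 1 (1.0, 1.0)
```

In a trained model the codebook entries are learned jointly with the encoder and decoder; here they are fixed so the lookup can be followed by hand.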
Affiliation(s)
- Xuejian Cui
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China
- Xiaoyang Chen
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China
- Zhen Li
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China
- Zijing Gao
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China
- Shengquan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China.
- Rui Jiang
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China.

7
Zeng Y, Luo M, Shangguan N, Shi P, Feng J, Xu J, Chen K, Lu Y, Yu W, Yang Y. Deciphering cell types by integrating scATAC-seq data with genome sequences. Nature Computational Science 2024; 4:285-298. [PMID: 38600256 DOI: 10.1038/s43588-024-00622-7] [Received: 11/01/2023] [Accepted: 03/18/2024] [Indexed: 04/12/2024]
Abstract
The single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) technology provides insight into gene regulation and epigenetic heterogeneity at single-cell resolution, but cell annotation from scATAC-seq remains challenging due to high dimensionality and extreme sparsity within the data. Existing cell annotation methods mostly focus on the cell peak matrix without fully utilizing the underlying genomic sequence. Here we propose a method, SANGO, for accurate single-cell annotation by integrating genome sequences around the accessibility peaks within scATAC data. The genome sequences of peaks are encoded into low-dimensional embeddings, and then iteratively used to reconstruct the peak statistics of cells through a fully connected network. The learned weights are considered as regulatory modes to represent cells, and utilized to align the query cells and the annotated cells in the reference data through a graph transformer network for cell annotations. SANGO was demonstrated to consistently outperform competing methods on 55 paired scATAC-seq datasets across samples, platforms and tissues. SANGO was also shown to be able to detect unknown tumor cells through attention edge weights learned by the graph transformer. Moreover, from the annotated cells, we found cell-type-specific peaks that provide functional insights/biological signals through expression enrichment analysis, cis-regulatory chromatin interaction analysis and motif enrichment analysis.
Affiliation(s)
- Yuansong Zeng
- School of Big Data and Software Engineering, Chongqing University, Chongqing, China
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Mai Luo
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Ningyuan Shangguan
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Peiyu Shi
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Junxi Feng
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Jin Xu
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Ken Chen
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Yutong Lu
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Weijiang Yu
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China.
- Key Laboratory of Machine Intelligence and Advanced Computing (MOE), Guangzhou, China.

8
Tang S, Cui X, Wang R, Li S, Li S, Huang X, Chen S. scCASE: accurate and interpretable enhancement for single-cell chromatin accessibility sequencing data. Nat Commun 2024; 15:1629. [PMID: 38388573 PMCID: PMC10884038 DOI: 10.1038/s41467-024-46045-w] [Received: 09/21/2023] [Accepted: 02/12/2024] [Indexed: 02/24/2024]
Abstract
Single-cell chromatin accessibility sequencing (scCAS) has emerged as a valuable tool for interrogating and elucidating epigenomic heterogeneity and gene regulation. However, scCAS data inherently suffers from limitations such as high sparsity and dimensionality, which pose significant challenges for downstream analyses. Although several methods have been proposed to enhance scCAS data, there are still challenges and limitations that hinder the effectiveness of these methods. Here, we propose scCASE, a scCAS data enhancement method based on non-negative matrix factorization which incorporates an iteratively updating cell-to-cell similarity matrix. Through comprehensive experiments on multiple datasets, we demonstrate the advantages of scCASE over existing methods for scCAS data enhancement. The interpretable cell type-specific peaks identified by scCASE can provide valuable biological insights into cell subpopulations. Moreover, to leverage the large compendia of available omics data as a reference, we further expand scCASE to scCASER, which enables the incorporation of external reference data to improve enhancement performance.
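For readers unfamiliar with the building block named in this abstract, the following is a minimal sketch of plain non-negative matrix factorization using the standard multiplicative updates for the squared-error objective. It deliberately omits the cell-to-cell similarity term that scCASE adds, so it should be read as background rather than as the published method; the function names, the toy matrix, and the default parameters are all assumptions.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def frobenius_error(x: Matrix, w: Matrix, h: Matrix) -> float:
    """Squared Frobenius norm of X - WH."""
    wh = matmul(w, h)
    return sum((xv - rv) ** 2 for xr, rr in zip(x, wh) for xv, rv in zip(xr, rr))


def nmf(x: Matrix, rank: int, n_iter: int = 200, seed: int = 0, eps: float = 1e-9):
    """Factorize a non-negative matrix X (peaks x cells, say) as W @ H."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[hv * nv / (dv + eps) for hv, nv, dv in zip(hr, nr, dr)]
             for hr, nr, dr in zip(h, num, den)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[wv * nv / (dv + eps) for wv, nv, dv in zip(wr, nr, dr)]
             for wr, nr, dr in zip(w, num, den)]
    return w, h


# Toy binary accessibility matrix with two obvious blocks (illustrative only).
x = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
w, h = nmf(x, rank=2)
print(round(frobenius_error(x, w, h), 4))
```

The product `W @ H` is a dense, smoothed approximation of the sparse input, which is the sense in which a factorization can "enhance" such data.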
Affiliation(s)
- Songming Tang
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
- Xuejian Cui
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, 100084, Beijing, China
- Rongxiang Wang
- Department of Computer Science, University of Virginia, Charlottesville, VA, 22903, USA
- Sijie Li
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
- Siyu Li
- School of Statistics and Data Science, Nankai University, Tianjin, 300071, China
- Xin Huang
- Beijing Key Laboratory for Radiobiology, Department of Radiation Biology, Beijing Institute of Radiation Medicine, 100850, Beijing, China
- Shengquan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China.

9
Chen Y, Zheng R, Liu J, Li M. scMLC: an accurate and robust multiplex community detection method for single-cell multi-omics data. Brief Bioinform 2024; 25:bbae101. [PMID: 38493339 PMCID: PMC10944569 DOI: 10.1093/bib/bbae101] [Received: 02/05/2023] [Revised: 01/03/2024] [Accepted: 02/15/2024] [Indexed: 03/18/2024]
Abstract
Clustering cells based on single-cell multi-modal sequencing technologies provides an unprecedented opportunity to create high-resolution cell atlases, reveal cellular critical states and study health and disease. However, effectively integrating different sequencing data for cell clustering remains a challenging task. Motivated by the successful application of Louvain in scRNA-seq data, we propose a single-cell multi-modal Louvain clustering framework, called scMLC, to tackle this problem. scMLC builds multiplex single- and cross-modal cell-to-cell networks to capture modal-specific and consistent information between modalities and then adopts a robust multiplex community detection method to obtain the reliable cell clusters. In comparison with 15 state-of-the-art clustering methods on seven real datasets simultaneously measuring gene expression and chromatin accessibility, scMLC achieves better accuracy and stability in most datasets. Synthetic results also indicate that the cell-network-based integration strategy of multi-omics data is superior to other strategies in terms of generalization. Moreover, scMLC is flexible and can be extended to single-cell sequencing data with more than two modalities.
Affiliation(s)
- Yuxuan Chen
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Ruiqing Zheng
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Jin Liu
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Min Li
- School of Computer Science and Engineering, Central South University, Changsha 410083, China

10
Miao Z, Kim J. Uniform quantification of single-nucleus ATAC-seq data with Paired-Insertion Counting (PIC) and a model-based insertion rate estimator. Nat Methods 2024; 21:32-36. [PMID: 38049698 PMCID: PMC10776405 DOI: 10.1038/s41592-023-02103-7] [Received: 05/13/2022] [Accepted: 10/25/2023] [Indexed: 12/06/2023]
Abstract
Existing approaches to scoring single-nucleus assay for transposase-accessible chromatin with sequencing (snATAC-seq) feature matrices from sequencing reads are inconsistent, affecting downstream analyses and displaying artifacts. We show that, even with sparse single-cell data, quantitative counts are informative for estimating the regulatory state of a cell, which calls for a consistent treatment. We propose Paired-Insertion Counting as a uniform method for snATAC-seq feature characterization and provide a probability model for inferring latent insertion dynamics from snATAC-seq count matrices.
Affiliation(s)
- Zhen Miao
  - Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Biology, University of Pennsylvania, Philadelphia, PA, USA
- Junhyong Kim
  - Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
  - Department of Biology, University of Pennsylvania, Philadelphia, PA, USA.

11
Li K, Chen X, Song S, Hou L, Chen S, Jiang R. Cofea: correlation-based feature selection for single-cell chromatin accessibility data. Brief Bioinform 2023; 25:bbad458. [PMID: 38113078 PMCID: PMC10782922 DOI: 10.1093/bib/bbad458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 11/19/2023] [Accepted: 11/20/2023] [Indexed: 12/21/2023] Open
Abstract
Single-cell chromatin accessibility sequencing (scCAS) technologies have enabled characterizing the epigenomic heterogeneity of individual cells. However, the identification of features of scCAS data that are relevant to underlying biological processes remains a significant gap. Here, we introduce a novel method Cofea, to fill this gap. Through comprehensive experiments on 5 simulated and 54 real datasets, Cofea demonstrates its superiority in capturing cellular heterogeneity and facilitating downstream analysis. Applying this method to identification of cell type-specific peaks and candidate enhancers, as well as pathway enrichment analysis and partitioned heritability analysis, we illustrate the potential of Cofea to uncover functional biological process.
Affiliation(s)
- Keyi Li
  - Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Xiaoyang Chen
  - Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Shuang Song
  - Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing 100084, China
- Lin Hou
  - Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing 100084, China
- Shengquan Chen
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Rui Jiang
  - Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China

12
Li Y, Zhang D, Yang M, Peng D, Yu J, Liu Y, Lv J, Chen L, Peng X. scBridge embraces cell heterogeneity in single-cell RNA-seq and ATAC-seq data integration. Nat Commun 2023; 14:6045. [PMID: 37770437 PMCID: PMC10539354 DOI: 10.1038/s41467-023-41795-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Accepted: 09/08/2023] [Indexed: 09/30/2023] Open
Abstract
Single-cell multi-omics data integration aims to reduce the omics difference while keeping the cell type difference. However, it is daunting to model and distinguish the two differences due to cell heterogeneity. Namely, even cells of the same omics and type would have various features, making the two differences less significant. In this work, we reveal that instead of being an interference, cell heterogeneity could be exploited to improve data integration. Specifically, we observe that the omics difference varies in cells, and cells with smaller omics differences are easier to be integrated. Hence, unlike most existing works that homogeneously treat and integrate all cells, we propose a multi-omics data integration method (dubbed scBridge) that integrates cells in a heterogeneous manner. In brief, scBridge iterates between i) identifying reliable scATAC-seq cells that have smaller omics differences, and ii) integrating reliable scATAC-seq cells with scRNA-seq data to narrow the omics gap, thus benefiting the integration for the rest cells. Extensive experiments on seven multi-omics datasets demonstrate the superiority of scBridge compared with six representative baselines.
Affiliation(s)
- Yunfan Li
  - School of Computer Science, Sichuan University, Chengdu, Sichuan, China
- Dan Zhang
  - Key Laboratory of Birth Defects and Related Diseases of Women and Children of MOE, Department of Laboratory Medicine, State Key Laboratory of Biotherapy, West China Second University Hospital, Sichuan University, Chengdu, China
- Mouxing Yang
  - School of Computer Science, Sichuan University, Chengdu, Sichuan, China
- Dezhong Peng
  - School of Computer Science, Sichuan University, Chengdu, Sichuan, China
- Jun Yu
  - School of Computer Science, Hangzhou Dianzi University, Hangzhou, Zhejiang, China
- Yu Liu
  - School of Electronic and Information Engineering, Naval Aviation University, Yantai, Shandong, China
- Jiancheng Lv
  - School of Computer Science, Sichuan University, Chengdu, Sichuan, China
- Lu Chen
  - Key Laboratory of Birth Defects and Related Diseases of Women and Children of MOE, Department of Laboratory Medicine, State Key Laboratory of Biotherapy, West China Second University Hospital, Sichuan University, Chengdu, China
- Xi Peng
  - School of Computer Science, Sichuan University, Chengdu, Sichuan, China.

13
Zhou S, Chen B, Fu ES, Yan H. Computer vision meets microfluidics: a label-free method for high-throughput cell analysis. Microsyst Nanoeng 2023; 9:116. [PMID: 37744264 PMCID: PMC10511704 DOI: 10.1038/s41378-023-00562-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/19/2022] [Revised: 03/21/2023] [Accepted: 04/10/2023] [Indexed: 09/26/2023]
Abstract
In this paper, we review the integration of microfluidic chips and computer vision, which has great potential to advance research in the life sciences and biology, particularly in the analysis of cell imaging data. Microfluidic chips enable the generation of large amounts of visual data at the single-cell level, while computer vision techniques can rapidly process and analyze these data to extract valuable information about cellular health and function. One of the key advantages of this integrative approach is that it allows for noninvasive and low-damage cellular characterization, which is important for studying delicate or fragile microbial cells. The use of microfluidic chips provides a highly controlled environment for cell growth and manipulation, minimizes experimental variability and improves the accuracy of data analysis. Computer vision can be used to recognize and analyze target species within heterogeneous microbial populations, which is important for understanding the physiological status of cells in complex biological systems. As hardware and artificial intelligence algorithms continue to improve, computer vision is expected to become an increasingly powerful tool for in situ cell analysis. The use of microelectromechanical devices in combination with microfluidic chips and computer vision could enable the development of label-free, automatic, low-cost, and fast cellular information recognition and the high-throughput analysis of cellular responses to different compounds, for broad applications in fields such as drug discovery, diagnostics, and personalized medicine.
Affiliation(s)
- Shizheng Zhou
  - State Key Laboratory of Marine Resource Utilization in South China Sea, Hainan University, Haikou, 570228 China
- Bingbing Chen
  - State Key Laboratory of Marine Resource Utilization in South China Sea, Hainan University, Haikou, 570228 China
- Edgar S. Fu
  - Graduate School of Computing and Information Science, University of Pittsburgh, Pittsburgh, PA 15260 USA
- Hong Yan
  - State Key Laboratory of Marine Resource Utilization in South China Sea, Hainan University, Haikou, 570228 China

14
Liu T, Lu Y, Zhu B, Zhao H. Clustering high-dimensional data via feature selection. Biometrics 2023; 79:940-950. [PMID: 35338489 PMCID: PMC10119907 DOI: 10.1111/biom.13665] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/27/2021] [Revised: 02/08/2021] [Accepted: 03/14/2022] [Indexed: 01/18/2023]
Abstract
High-dimensional clustering analysis is a challenging problem in statistics and machine learning, with broad applications such as the analysis of microarray data and RNA-seq data. In this paper, we propose a new clustering procedure called spectral clustering with feature selection (SC-FS), where we first obtain an initial estimate of labels via spectral clustering, then select a small fraction of features with the largest R-squared with these labels, that is, the proportion of variation explained by group labels, and conduct clustering again using selected features. Under mild conditions, we prove that the proposed method identifies all informative features with high probability and achieves the minimax optimal clustering error rate for the sparse Gaussian mixture model. Applications of SC-FS to four real-world datasets demonstrate its usefulness in clustering high-dimensional data.
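The feature-scoring step described in this abstract can be illustrated with a minimal sketch. It assumes the initial labels are already available (the spectral clustering step is not shown) and scores each feature by the proportion of its variation explained by those labels, computed as between-group sum of squares over total sum of squares. The function names `r_squared` and `select_features` and the toy data are hypothetical and are not taken from the SC-FS paper or its software.

```python
from collections import defaultdict


def r_squared(values, labels):
    """Proportion of a feature's variation explained by the group labels
    (between-group sum of squares divided by total sum of squares)."""
    n = len(values)
    grand_mean = sum(values) / n
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0.0:
        return 0.0  # a constant feature cannot separate groups
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    between_ss = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in groups.values()
    )
    return between_ss / total_ss


def select_features(matrix, labels, n_keep):
    """Return the indices of the n_keep features with the largest R-squared,
    together with all scores. `matrix` is a list of samples (rows)."""
    n_features = len(matrix[0])
    scores = [
        r_squared([row[j] for row in matrix], labels) for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:n_keep]), scores


# Toy data (hypothetical): feature 0 tracks the labels, feature 1 does not,
# and feature 2 is constant.
matrix = [
    [0.0, 5.0, 1.0],
    [0.2, 3.0, 1.0],
    [4.0, 5.0, 1.0],
    [4.2, 3.0, 1.0],
]
labels = [0, 0, 1, 1]
selected, scores = select_features(matrix, labels, n_keep=1)
```

In the procedure the abstract outlines, the selected features would then be passed to a second round of clustering; that step is omitted here.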
Affiliation(s)
- Tianqi Liu
  - Google Research, New York, New York, USA
- Yu Lu
  - Two Sigma Investments, New York, New York, USA
- Biqing Zhu
  - Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Hongyu Zhao
  - Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
  - Department of Biostatistics, School of Public Health, Yale University, New Haven, Connecticut, USA

15
Ma W, Lu J, Wu H. Cellcano: supervised cell type identification for single cell ATAC-seq data. Nat Commun 2023; 14:1864. [PMID: 37012226 PMCID: PMC10070275 DOI: 10.1038/s41467-023-37439-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 06/20/2022] [Accepted: 03/15/2023] [Indexed: 04/05/2023] Open
Abstract
Computational cell type identification is a fundamental step in single-cell omics data analysis. Supervised celltyping methods have gained increasing popularity in single-cell RNA-seq data because of the superior performance and the availability of high-quality reference datasets. Recent technological advances in profiling chromatin accessibility at single-cell resolution (scATAC-seq) have brought new insights to the understanding of epigenetic heterogeneity. With continuous accumulation of scATAC-seq datasets, supervised celltyping method specifically designed for scATAC-seq is in urgent need. Here we develop Cellcano, a computational method based on a two-round supervised learning algorithm to identify cell types from scATAC-seq data. The method alleviates the distributional shift between reference and target data and improves the prediction performance. After systematically benchmarking Cellcano on 50 well-designed celltyping tasks from various datasets, we show that Cellcano is accurate, robust, and computationally efficient. Cellcano is well-documented and freely available at https://marvinquiet.github.io/Cellcano/ .
Affiliation(s)
- Wenjing Ma
  - Department of Computer Science, Emory University, 400 Dowman Drive, Atlanta, GA, 30322, USA
- Jiaying Lu
  - Department of Computer Science, Emory University, 400 Dowman Drive, Atlanta, GA, 30322, USA
- Hao Wu
  - Faculty of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, 518055, P. R. China.
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, 1518 Clifton Road NE, Atlanta, GA, 30322, USA.

16
Zhang Z, Chen S, Lin Z. RefTM: reference-guided topic modeling of single-cell chromatin accessibility data. Brief Bioinform 2023; 24:6895319. [PMID: 36513377 DOI: 10.1093/bib/bbac540] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/22/2022] [Revised: 10/27/2022] [Accepted: 11/09/2022] [Indexed: 12/15/2022] Open
Abstract
Single-cell analysis is a valuable approach for dissecting the cellular heterogeneity, and single-cell chromatin accessibility sequencing (scCAS) can profile the epigenetic landscapes for thousands of individual cells. It is challenging to analyze scCAS data, because of its high dimensionality and a higher degree of sparsity compared with scRNA-seq data. Topic modeling in single-cell data analysis can lead to robust identification of the cell types and it can provide insight into the regulatory mechanisms. Reference-guided approach may facilitate the analysis of scCAS data by utilizing the information in existing datasets. We present RefTM (Reference-guided Topic Modeling of single-cell chromatin accessibility data), which not only utilizes the information in existing bulk chromatin accessibility and annotated scCAS data, but also takes advantage of topic models for single-cell data analysis. RefTM simultaneously models: (1) the shared biological variation among reference data and the target scCAS data; (2) the unique biological variation in scCAS data; (3) other variations from known covariates in scCAS data.
Affiliation(s)
- Zheng Zhang
  - Department of Statistics in the Chinese University of Hong Kong
- Shengquan Chen
  - School of Mathematical Sciences and LPMC in Nankai university
- Zhixiang Lin
  - Department of Statistics in the Chinese University of Hong Kong

17
Preissl S, Gaulton KJ, Ren B. Characterizing cis-regulatory elements using single-cell epigenomics. Nat Rev Genet 2023; 24:21-43. [PMID: 35840754 PMCID: PMC9771884 DOI: 10.1038/s41576-022-00509-1] [Citation(s) in RCA: 69] [Impact Index Per Article: 69.0] [Accepted: 05/24/2022] [Indexed: 12/24/2022]
Abstract
Cell type-specific gene expression patterns and dynamics during development or in disease are controlled by cis-regulatory elements (CREs), such as promoters and enhancers. Distinct classes of CREs can be characterized by their epigenomic features, including DNA methylation, chromatin accessibility, combinations of histone modifications and conformation of local chromatin. Tremendous progress has been made in cataloguing CREs in the human genome using bulk transcriptomic and epigenomic methods. However, single-cell epigenomic and multi-omic technologies have the potential to provide deeper insight into cell type-specific gene regulatory programmes as well as into how they change during development, in response to environmental cues and through disease pathogenesis. Here, we highlight recent advances in single-cell epigenomic methods and analytical tools and discuss their readiness for human tissue profiling.
Affiliation(s)
- Sebastian Preissl
  - Center for Epigenomics, University of California San Diego, La Jolla, CA, USA.
  - Institute of Experimental and Clinical Pharmacology and Toxicology, Faculty of Medicine, University of Freiburg, Freiburg, Germany.
- Kyle J Gaulton
  - Department of Paediatrics, Paediatric Diabetes Research Center, University of California San Diego, La Jolla, CA, USA.
- Bing Ren
  - Center for Epigenomics, University of California San Diego, La Jolla, CA, USA.
  - Department of Cellular and Molecular Medicine, University of California San Diego, School of Medicine, La Jolla, CA, USA.
  - Ludwig Institute for Cancer Research, La Jolla, CA, USA.

18
O'Neill H, Lee H, Gupta I, Rodger EJ, Chatterjee A. Single-Cell DNA Methylation Analysis in Cancer. Cancers (Basel) 2022; 14:6171. [PMID: 36551655 PMCID: PMC9777108 DOI: 10.3390/cancers14246171] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/26/2022] [Revised: 12/07/2022] [Accepted: 12/10/2022] [Indexed: 12/23/2022] Open
Abstract
Morphological, transcriptomic, and genomic defects are well-explored parameters of cancer biology. In more recent years, the impact of epigenetic influences, such as DNA methylation, is becoming more appreciated. Aberrant DNA methylation has been implicated in many types of cancers, influencing cell type, state, transcriptional regulation, and genomic stability to name a few. Traditionally, large populations of cells from the tissue of interest are coalesced for analysis, producing averaged methylome data. Considering the inherent heterogeneity of cancer, analysing populations of cells as a whole denies the ability to discover novel aberrant methylation patterns, identify subpopulations, and trace cell lineages. Due to recent advancements in technology, it is now possible to obtain methylome data from single cells. This has both research and clinical implications, ranging from the identification of biomarkers to improved diagnostic tools. As with all emerging technologies, distinct experimental, bioinformatic, and practical challenges present themselves. This review begins with exploring the potential impact of single-cell sequencing on understanding cancer biology and how it could eventually benefit a clinical setting. Following this, the techniques and experimental approaches which made this technology possible are explored. Finally, the present challenges currently associated with single-cell DNA methylation sequencing are described.
Affiliation(s)
- Hannah O'Neill
  - Department of Pathology, Dunedin School of Medicine, University of Otago, Dunedin 9016, New Zealand
- Heather Lee
  - School of Biomedical Sciences and Pharmacy, College of Health, Medicine and Wellbeing, The University of Newcastle, Callaghan, NSW 2308, Australia
  - Hunter Medical Research Institute, New Lambton Heights, NSW 2305, Australia
- Ishaan Gupta
  - Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, New Delhi 110016, India
- Euan J Rodger
  - Department of Pathology, Dunedin School of Medicine, University of Otago, Dunedin 9016, New Zealand
- Aniruddha Chatterjee
  - Department of Pathology, Dunedin School of Medicine, University of Otago, Dunedin 9016, New Zealand
  - School of Health Sciences and Technology, University of Petroleum and Energy Studies (UPES), Dehradun 248007, India

19
Duan H, Li F, Shang J, Liu J, Li Y, Liu X. scVAEBGM: Clustering Analysis of Single-Cell ATAC-seq Data Using a Deep Generative Model. Interdiscip Sci 2022; 14:917-928. [PMID: 35939233 DOI: 10.1007/s12539-022-00536-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2022] [Revised: 07/15/2022] [Accepted: 07/20/2022] [Indexed: 06/15/2023]
Abstract
A surge in research has occurred because of current developments in single-cell technologies. Above all, single-cell Assay for Transposase-Accessible Chromatin with high-throughput sequencing (scATAC-seq) is a popular approach for analyzing chromatin accessibility differences at the level of single cells, either within or between groups. As a result, it is critical to examine cell heterogeneity at a previously unseen level and to identify both recognized and unknown cell types. However, with the ever-increasing number of cells engendered by technological development and the characteristics of the data, such as high noise, sparsity and dimension, challenges in distinguishing cell types have emerged. We propose scVAEBGM, which integrates a Variational Autoencoder (VAE) with a Bayesian Gaussian-mixture model (BGM) to process and analyze scATAC-seq data. This method takes advantage of a Bayesian Gaussian mixture model to estimate the number of cell types without fixing the number of clusters beforehand. In other words, the number of clusters is inferred from the data, thus avoiding biases introduced by subjective assessments when it is determined manually. Additionally, the method is more robust to noise and can better represent single-cell data in lower dimensions. We also create a further clustering strategy. Experiments indicate that further clustering based on the already completed clustering can improve the clustering accuracy again. We test on six public datasets, and scVAEBGM outperforms various dimension reduction baselines. In downstream applications, scVAEBGM can reveal biological cell types.
Affiliation(s)
- Hongyu Duan
  - School of Computer Science, Qufu Normal University, Rizhao, 276826, China
- Feng Li
  - School of Computer Science, Qufu Normal University, Rizhao, 276826, China.
- Junliang Shang
  - School of Computer Science, Qufu Normal University, Rizhao, 276826, China
- Jinxing Liu
  - School of Computer Science, Qufu Normal University, Rizhao, 276826, China
- Yan Li
  - Department of Electrical Engineering and Information Technology, Shandong University of Science and Technology, Jinan, 250031, Shandong, China
- Xikui Liu
  - Department of Electrical Engineering and Information Technology, Shandong University of Science and Technology, Jinan, 250031, Shandong, China

20
Zeng P, Ma Y, Lin Z. scAWMV: an adaptively weighted multi-view learning framework for the integrative analysis of parallel scRNA-seq and scATAC-seq data. Bioinformatics 2022; 39:6831091. [PMID: 36383176 PMCID: PMC9805575 DOI: 10.1093/bioinformatics/btac739] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/07/2022] [Revised: 10/16/2022] [Accepted: 11/15/2022] [Indexed: 11/17/2022] Open
Abstract
MOTIVATION: Technological advances have enabled us to profile single-cell multi-omics data from the same cells, providing us with an unprecedented opportunity to understand the cellular phenotype and links to its genotype. The available protocols and multi-omics datasets [including parallel single-cell RNA sequencing (scRNA-seq) and single-cell ATAC sequencing (scATAC-seq) data profiled from the same cell] are growing increasingly. However, such data are highly sparse and tend to have high level of noise, making data analysis challenging. The methods that integrate the multi-omics data can potentially improve the capacity of revealing the cellular heterogeneity.
RESULTS: We propose an adaptively weighted multi-view learning (scAWMV) method for the integrative analysis of parallel scRNA-seq and scATAC-seq data profiled from the same cell. scAWMV considers both the difference in importance across different modalities in multi-omics data and the biological connection of the features in the scRNA-seq and scATAC-seq data. It generates biologically meaningful low-dimensional representations for the transcriptomic and epigenomic profiles via unsupervised learning. Application to four real datasets demonstrates that our framework scAWMV is an efficient method to dissect cellular heterogeneity for single-cell multi-omics data.
AVAILABILITY AND IMPLEMENTATION: The software and datasets are available at https://github.com/pengchengzeng/scAWMV.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pengcheng Zeng
  - Institute of Mathematical Sciences, ShanghaiTech University, Shanghai 201210, China
- Yuanyuan Ma
  - School of Computer and Information Engineering, Anyang Normal University, Henan 455000, China

21
Mukherjee P, Park SH, Pathak N, Patino CA, Bao G, Espinosa HD. Integrating Micro and Nano Technologies for Cell Engineering and Analysis: Toward the Next Generation of Cell Therapy Workflows. ACS Nano 2022; 16:15653-15680. [PMID: 36154011 DOI: 10.1021/acsnano.2c05494] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/16/2023]
Abstract
The emerging field of cell therapy offers the potential to treat and even cure a diverse array of diseases for which existing interventions are inadequate. Recent advances in micro and nanotechnology have added a multitude of single cell analysis methods to our research repertoire. At the same time, techniques have been developed for the precise engineering and manipulation of cells. Together, these methods have aided the understanding of disease pathophysiology, helped formulate corrective interventions at the cellular level, and expanded the spectrum of available cell therapeutic options. This review discusses how micro and nanotechnology have catalyzed the development of cell sorting, cellular engineering, and single cell analysis technologies, which have become essential workflow components in developing cell-based therapeutics. The review focuses on the technologies adopted in research studies and explores the opportunities and challenges in combining the various elements of cell engineering and single cell analysis into the next generation of integrated and automated platforms that can accelerate preclinical studies and translational research.
Affiliation(s)
- Prithvijit Mukherjee
  - Department of Mechanical Engineering, Northwestern University, Evanston, Illinois 60208, United States
  - Theoretical and Applied Mechanics Program, Northwestern University, Evanston, Illinois 60208, United States
- So Hyun Park
  - Department of Bioengineering, Rice University, 6500 Main Street, Houston, Texas 77030, United States
- Nibir Pathak
  - Department of Mechanical Engineering, Northwestern University, Evanston, Illinois 60208, United States
  - Theoretical and Applied Mechanics Program, Northwestern University, Evanston, Illinois 60208, United States
- Cesar A Patino
  - Department of Mechanical Engineering, Northwestern University, Evanston, Illinois 60208, United States
- Gang Bao
  - Department of Bioengineering, Rice University, 6500 Main Street, Houston, Texas 77030, United States
- Horacio D Espinosa
  - Department of Mechanical Engineering, Northwestern University, Evanston, Illinois 60208, United States
  - Theoretical and Applied Mechanics Program, Northwestern University, Evanston, Illinois 60208, United States

22
Shi P, Nie Y, Yang J, Zhang W, Tang Z, Xu J. Fundamental and practical approaches for single-cell ATAC-seq analysis. aBIOTECH 2022; 3:212-223. [PMID: 36313930 PMCID: PMC9590475 DOI: 10.1007/s42994-022-00082-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/06/2022] [Accepted: 09/07/2022] [Indexed: 11/28/2022]
Abstract
Assays for transposase-accessible chromatin through high-throughput sequencing (ATAC-seq) are effective tools in the study of genome-wide chromatin accessibility landscapes. With the rapid development of single-cell technology, open chromatin regions that play essential roles in epigenetic regulation have been measured at the single-cell level using single-cell ATAC-seq approaches. The application of scATAC-seq has become as popular as that of scRNA-seq. However, owing to the nature of scATAC-seq data, which are sparse and noisy, processing the data requires different methodologies and empirical experience. This review presents a practical guide for processing scATAC-seq data, from quality evaluation to downstream analysis, for various applications. In addition to the epigenomic profiling from scATAC-seq, we also discuss recent studies in which the function of non-coding variants has been investigated based on cell type-specific cis-regulatory elements and how to use the by-product genetic information obtained from scATAC-seq to infer single-cell copy number variants and trace cell lineage. We anticipate that this review will assist researchers in designing and implementing scATAC-seq assays to facilitate research in diverse fields.
Affiliation(s)
- Peiyu Shi
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, 510275 China
- Yage Nie
  - Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, 510275 China
- Jiawen Yang
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, 510275 China
- Weixing Zhang
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, 510275 China
- Zhongjie Tang
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, 510275 China
- Jin Xu
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, 510275 China

23
Chen Z, Chen W, Li Y, Moos M, Xiao D, Wang C. Single-nucleus chromatin accessibility and RNA sequencing reveal impaired brain development in prenatally e-cigarette exposed neonatal rats. iScience 2022; 25:104686. [PMID: 35874099 PMCID: PMC9304611 DOI: 10.1016/j.isci.2022.104686] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2022] [Revised: 05/13/2022] [Accepted: 06/24/2022] [Indexed: 11/03/2022] Open
Abstract
Although emerging evidence reveals that vaping alters the function of the central nervous system, the effects of maternal vaping on offspring brain development remain elusive. Using a well-established in utero exposure model, we performed single-nucleus ATAC-seq (snATAC-seq) and RNA sequencing (snRNA-seq) on prenatally e-cigarette-exposed rat brains. We found that maternal vaping distorted neuronal lineage differentiation in the neonatal brain by promoting excitatory neurons and inhibiting lateral ganglionic eminence-derived inhibitory neuronal differentiation. Moreover, maternal vaping disrupted calcium homeostasis, induced microglia cell death, and elevated susceptibility to cerebral ischemic injury in the developing brain of offspring. Our results suggest that the aberrant calcium signaling, diminished microglial population, and impaired microglia-neuron interaction may all contribute to the underlying mechanisms by which prenatal e-cigarette exposure impairs neonatal rat brain development. Our findings raise the concern that maternal vaping may cause adverse long-term brain damage to the offspring.
Affiliation(s)
- Zhong Chen
- Center for Genomics, School of Medicine, Loma Linda University, 11021 Campus St., Loma Linda, CA 92350, USA
| | - Wanqiu Chen
- Center for Genomics, School of Medicine, Loma Linda University, 11021 Campus St., Loma Linda, CA 92350, USA
| | - Yong Li
- Lawrence D. Longo, MD Center for Perinatal Biology, Division of Pharmacology, Department of Basic Sciences, Loma Linda University School of Medicine, Loma Linda, CA 92350, USA
| | - Malcolm Moos
- Center for Biologics Evaluation and Research & Division of Cellular and Gene Therapies, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD 20993, USA
| | - Daliao Xiao
- Lawrence D. Longo, MD Center for Perinatal Biology, Division of Pharmacology, Department of Basic Sciences, Loma Linda University School of Medicine, Loma Linda, CA 92350, USA
| | - Charles Wang
- Center for Genomics, School of Medicine, Loma Linda University, 11021 Campus St., Loma Linda, CA 92350, USA
- Division of Microbiology & Molecular Genetics, Department of Basic Science, School of Medicine, Loma Linda University, 11021 Campus St., Loma Linda, CA 92350, USA
| |
|
24
|
Yin K, Zhao M, Lin L, Chen Y, Huang S, Zhu C, Liang X, Lin F, Wei H, Zeng H, Zhu Z, Song J, Yang C. Well-Paired-Seq: A Size-Exclusion and Locally Quasi-Static Hydrodynamic Microwell Chip for Single-Cell RNA-Seq. SMALL METHODS 2022; 6:e2200341. [PMID: 35521945 DOI: 10.1002/smtd.202200341] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/18/2022] [Revised: 04/14/2022] [Indexed: 06/14/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) is a powerful technology for revealing the heterogeneity of cellular states. However, existing scRNA-seq platforms that utilize bead-based technologies suffer from a large number of empty microreactors and a low cell/bead capture efficiency. Here, Well-paired-seq is presented, which consists of thousands of size-exclusion and quasi-static hydrodynamic dual wells to address these limitations. The size-exclusion principle allows one cell and one bead to be trapped in the bottom well (cell-capture-well) and the top well (bead-capture-well), respectively, while the quasi-static hydrodynamic principle ensures that trapped cells cannot easily escape from the cell-capture-wells, achieving cumulative capture of cells and effective buffer exchange. By integrating the quasi-static hydrodynamic and size-exclusion principles, the dual wells ensure single cell/bead pairing with high density, achieving excellent efficiency of cell capture (≈91%), cell/bead pairing (≈82%), and cell-free RNA removal. The high utilization of microreactors and of single cells/beads enables a high throughput (≈10⁵ cells) with low collision rates. The technical performance of Well-paired-seq is demonstrated by collecting transcriptome data from around 200 000 cells across 21 samples, successfully revealing the heterogeneity of single cells and showing the wide applicability of Well-paired-seq for basic and clinical research.
Affiliation(s)
- Kun Yin
- State Key Laboratory of Physical Chemistry of Solid Surfaces, The MOE Key Laboratory of Spectrochemical Analysis & Instrumentation, Key Laboratory for Chemical Biology of Fujian Province, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, P. R. China
| | - Meijuan Zhao
- State Key Laboratory of Physical Chemistry of Solid Surfaces, The MOE Key Laboratory of Spectrochemical Analysis & Instrumentation, Key Laboratory for Chemical Biology of Fujian Province, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, P. R. China
| | - Li Lin
- State Key Laboratory of Physical Chemistry of Solid Surfaces, The MOE Key Laboratory of Spectrochemical Analysis & Instrumentation, Key Laboratory for Chemical Biology of Fujian Province, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, P. R. China
| | - Yingwen Chen
- State Key Laboratory of Physical Chemistry of Solid Surfaces, The MOE Key Laboratory of Spectrochemical Analysis & Instrumentation, Key Laboratory for Chemical Biology of Fujian Province, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, P. R. China
| | - Shanqing Huang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, The MOE Key Laboratory of Spectrochemical Analysis & Instrumentation, Key Laboratory for Chemical Biology of Fujian Province, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, P. R. China
| | - Chun Zhu
- State Key Laboratory of Physical Chemistry of Solid Surfaces, The MOE Key Laboratory of Spectrochemical Analysis & Instrumentation, Key Laboratory for Chemical Biology of Fujian Province, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, P. R. China
| | - Xuan Liang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, The MOE Key Laboratory of Spectrochemical Analysis & Instrumentation, Key Laboratory for Chemical Biology of Fujian Province, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, P. R. China
| | - Fanghe Lin
- State Key Laboratory of Physical Chemistry of Solid Surfaces, The MOE Key Laboratory of Spectrochemical Analysis & Instrumentation, Key Laboratory for Chemical Biology of Fujian Province, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, P. R. China
| | - Haopai Wei
- State Key Laboratory of Physical Chemistry of Solid Surfaces, The MOE Key Laboratory of Spectrochemical Analysis & Instrumentation, Key Laboratory for Chemical Biology of Fujian Province, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, P. R. China
| | - Huimin Zeng
- State Key Laboratory of Physical Chemistry of Solid Surfaces, The MOE Key Laboratory of Spectrochemical Analysis & Instrumentation, Key Laboratory for Chemical Biology of Fujian Province, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, P. R. China
| | - Zhi Zhu
- State Key Laboratory of Physical Chemistry of Solid Surfaces, The MOE Key Laboratory of Spectrochemical Analysis & Instrumentation, Key Laboratory for Chemical Biology of Fujian Province, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, P. R. China
| | - Jia Song
- Institute of Molecular Medicine, State Key Laboratory of Oncogenes and Related Genes, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200120, China
| | - Chaoyong Yang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, The MOE Key Laboratory of Spectrochemical Analysis & Instrumentation, Key Laboratory for Chemical Biology of Fujian Province, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, P. R. China
- Institute of Molecular Medicine, State Key Laboratory of Oncogenes and Related Genes, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200120, China
| |
|
25
|
Xu J, Zhou S, Xia F, Xu A, Ye J. Research on the lying pattern of grouped pigs using unsupervised clustering and deep learning. Livest Sci 2022. [DOI: 10.1016/j.livsci.2022.104946] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
26
|
Wang M, Song WM, Ming C, Wang Q, Zhou X, Xu P, Krek A, Yoon Y, Ho L, Orr ME, Yuan GC, Zhang B. Guidelines for bioinformatics of single-cell sequencing data analysis in Alzheimer's disease: review, recommendation, implementation and application. Mol Neurodegener 2022; 17:17. [PMID: 35236372 PMCID: PMC8889402 DOI: 10.1186/s13024-022-00517-z] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 07/04/2021] [Accepted: 01/18/2022] [Indexed: 12/13/2022] Open
Abstract
Alzheimer's disease (AD) is the most common form of dementia, characterized by progressive cognitive impairment and neurodegeneration. Extensive clinical and genomic studies have revealed biomarkers, risk factors, pathways, and targets of AD in the past decade. However, the exact molecular basis of AD development and progression remains elusive. The emerging single-cell sequencing technology can potentially provide cell-level insights into the disease. Here we systematically review the state-of-the-art bioinformatics approaches to analyze single-cell sequencing data and their applications to AD in 14 major directions, including 1) quality control and normalization, 2) dimension reduction and feature extraction, 3) cell clustering analysis, 4) cell type inference and annotation, 5) differential expression, 6) trajectory inference, 7) copy number variation analysis, 8) integration of single-cell multi-omics, 9) epigenomic analysis, 10) gene network inference, 11) prioritization of cell subpopulations, 12) integrative analysis of human and mouse scRNA-seq data, 13) spatial transcriptomics, and 14) comparison of single cell AD mouse model studies and single cell human AD studies. We also address challenges in using human postmortem and mouse tissues and outline future developments in single cell sequencing data analysis. Importantly, we have implemented our recommended workflow for each major analytic direction and applied them to a large single nucleus RNA-sequencing (snRNA-seq) dataset in AD. Key analytic results are reported while the scripts and the data are shared with the research community through GitHub. In summary, this comprehensive review provides insights into various approaches to analyze single cell sequencing data and offers specific guidelines for study design and a variety of analytic directions. The review and the accompanying software tools will serve as a valuable resource for studying cellular and molecular mechanisms of AD, other diseases, or biological systems at the single cell level.
Affiliation(s)
- Minghui Wang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| | - Won-min Song
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| | - Chen Ming
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| | - Qian Wang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| | - Xianxiao Zhou
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| | - Peng Xu
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| | - Azra Krek
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029 USA
| | - Yonejung Yoon
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| | - Lap Ho
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| | - Miranda E. Orr
- Department of Internal Medicine, Section of Gerontology and Geriatric Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina USA
- Sticht Center for Healthy Aging and Alzheimer’s Prevention, Wake Forest School of Medicine, Winston-Salem, North Carolina USA
| | - Guo-Cheng Yuan
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029 USA
| | - Bin Zhang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| |
|
27
|
Ni Z, Zheng X, Zheng X, Zou X. scLRTD: A Novel Low Rank Tensor Decomposition Method for Imputing Missing Values in Single-Cell Multi-Omics Sequencing Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1144-1153. [PMID: 32960767 DOI: 10.1109/tcbb.2020.3025804] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]
Abstract
With the successful application of single-cell sequencing technology, a large amount of single-cell multi-omics sequencing (scMO-seq) data has been generated, which enables researchers to study heterogeneity between individual cells. One prominent problem in single-cell data analysis is the prevalence of dropouts, caused by failures in amplification during the experiments. It is necessary to develop effective approaches for imputing the missing values. Unlike general methods that impute a single type of single-cell data, we propose an imputation method called scLRTD, which uses low-rank tensor decomposition based on the nuclear norm to impute scMO-seq data and single-cell RNA-sequencing (scRNA-seq) data from different stages, tissues or conditions. Furthermore, four sets of simulated data and two sets of real scRNA-seq data, from mouse embryonic stem cells and hepatocellular carcinoma, respectively, are used to carry out numerical experiments and to compare scLRTD with six other published methods. Error accuracy and clustering results demonstrate the effectiveness of the proposed method. Moreover, we clearly identify two cell subpopulations after imputing the real scMO-seq data from hepatocellular carcinoma. Further, Gene Ontology analysis identifies 7 genes in the bile secretion pathway, which is related to metabolism in hepatocellular carcinoma. Survival analysis using the TCGA database also shows that the two cell subpopulations identified after imputation have distinct survival rates.
|
28
|
Simultaneous dimensionality reduction and integration for single-cell ATAC-seq data using deep learning. NAT MACH INTELL 2022. [DOI: 10.1038/s42256-022-00443-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/09/2022]
Abstract
Advances in single-cell technologies enable the routine interrogation of chromatin accessibility for tens of thousands of single cells, elucidating gene regulatory processes at an unprecedented resolution. Meanwhile, size, sparsity and high dimensionality of the resulting data continue to pose challenges for its computational analysis, and specifically the integration of data from different sources. We have developed a dedicated computational approach: a variational auto-encoder using a noise model specifically designed for single-cell ATAC-seq (assay for transposase-accessible chromatin with high-throughput sequencing) data, which facilitates simultaneous dimensionality reduction and batch correction via an adversarial learning strategy. We showcase its benefits for detailed cell-type characterization on individual real and simulated datasets as well as for integrating multiple complex datasets.
|
29
|
Chen X, Chen S, Song S, Gao Z, Hou L, Zhang X, Lv H, Jiang R. Cell type annotation of single-cell chromatin accessibility data via supervised Bayesian embedding. NAT MACH INTELL 2022. [DOI: 10.1038/s42256-021-00432-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 12/13/2022]
|
30
|
Li Z, Kuppe C, Ziegler S, Cheng M, Kabgani N, Menzel S, Zenke M, Kramann R, Costa IG. Chromatin-accessibility estimation from single-cell ATAC-seq data with scOpen. Nat Commun 2021; 12:6386. [PMID: 34737275 PMCID: PMC8568974 DOI: 10.1038/s41467-021-26530-2] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Received: 05/17/2021] [Accepted: 10/04/2021] [Indexed: 12/18/2022] Open
Abstract
A major drawback of single-cell ATAC-seq (scATAC-seq) is its sparsity, i.e., open chromatin regions with no reads due to loss of DNA material during the scATAC-seq protocol. Here, we propose scOpen, a computational method based on regularized non-negative matrix factorization for imputing and quantifying the open chromatin status of regulatory regions from sparse scATAC-seq experiments. We show that scOpen improves crucial downstream analysis steps of scATAC-seq data, such as clustering, visualization, inference of cis-regulatory DNA interactions, and delineation of regulatory features. We demonstrate the power of scOpen to dissect regulatory changes in the development of fibrosis in the kidney. This identifies a role for Runx1 and its target genes in promoting fibroblast-to-myofibroblast differentiation, which drives kidney fibrosis.
Affiliation(s)
- Zhijian Li
- Institute for Computational Genomics, Joint Research Center for Computational Biomedicine, RWTH Aachen University Medical School, 52074, Aachen, Germany
| | - Christoph Kuppe
- Institute of Experimental Medicine and Systems Biology, RWTH Aachen University Medical School, 52074, Aachen, Germany
- Division of Nephrology and Clinical Immunology, RWTH Aachen University, 52074, Aachen, Germany
| | - Susanne Ziegler
- Institute of Experimental Medicine and Systems Biology, RWTH Aachen University Medical School, 52074, Aachen, Germany
| | - Mingbo Cheng
- Institute for Computational Genomics, Joint Research Center for Computational Biomedicine, RWTH Aachen University Medical School, 52074, Aachen, Germany
| | - Nazanin Kabgani
- Institute of Experimental Medicine and Systems Biology, RWTH Aachen University Medical School, 52074, Aachen, Germany
| | - Sylvia Menzel
- Institute of Experimental Medicine and Systems Biology, RWTH Aachen University Medical School, 52074, Aachen, Germany
| | - Martin Zenke
- Department of Cell Biology, Institute of Biomedical Engineering, RWTH Aachen University Medical School, 52074, Aachen, Germany
- Helmholtz Institute for Biomedical Engineering, RWTH Aachen University, Aachen, Germany
| | - Rafael Kramann
- Institute of Experimental Medicine and Systems Biology, RWTH Aachen University Medical School, 52074, Aachen, Germany.
- Division of Nephrology and Clinical Immunology, RWTH Aachen University, 52074, Aachen, Germany.
- Department of Internal Medicine, Nephrology and Transplantation, Erasmus Medical Center, 3015GD, Rotterdam, The Netherlands.
| | - Ivan G Costa
- Institute for Computational Genomics, Joint Research Center for Computational Biomedicine, RWTH Aachen University Medical School, 52074, Aachen, Germany.
| |
|
31
|
Liu Y, Zhang J, Wang S, Zeng X, Zhang W. Are dropout imputation methods for scRNA-seq effective for scATAC-seq data? Brief Bioinform 2021; 23:6412397. [PMID: 34718405 DOI: 10.1093/bib/bbab442] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/26/2021] [Revised: 09/08/2021] [Accepted: 09/27/2021] [Indexed: 11/12/2022] Open
Abstract
The tremendous progress of single-cell sequencing technology has given researchers the opportunity to study cell development and differentiation processes at single-cell resolution. Assay of Transposase-Accessible Chromatin by deep sequencing (ATAC-seq) was proposed for genome-wide analysis of chromatin accessibility. Due to technical limitations or other reasons, dropout events are a common occurrence in extremely sparse single-cell ATAC-seq data, leading to confusion in downstream analysis (such as clustering). Although considerable progress has been made in the imputation of scRNA-seq data, there is currently no specific method for the inference of dropout events in single-cell ATAC-seq data. In this paper, we select several recent state-of-the-art scRNA-seq imputation methods (including MAGIC, SAVER, scImpute, deepImpute, PRIME, bayNorm and knn-smoothing) to infer dropout peaks in scATAC-seq data, and perform a systematic evaluation of these methods through several downstream analyses. Specifically, we benchmarked these methods in terms of correlation with meta-cell, clustering, subpopulations distance analysis, imputation performance for corruption datasets, identification of TF motifs and computation time. The experimental results indicated that most of the imputed peaks increased the correlation with the reference meta-cell, while the performance of different methods on different datasets varied greatly across downstream analyses, so these methods should be used with caution. In general, MAGIC performed better than the other methods most consistently across all assessments. Our source code is freely available at https://github.com/yueyueliu/scATAC-master.
Affiliation(s)
- Yue Liu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
| | - Junfeng Zhang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
| | - Shulin Wang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
| | - Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
| | - Wei Zhang
- College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, Hunan 410003, China
| |
|
32
|
Asada K, Takasawa K, Machino H, Takahashi S, Shinkai N, Bolatkan A, Kobayashi K, Komatsu M, Kaneko S, Okamoto K, Hamamoto R. Single-Cell Analysis Using Machine Learning Techniques and Its Application to Medical Research. Biomedicines 2021; 9:biomedicines9111513. [PMID: 34829742 PMCID: PMC8614827 DOI: 10.3390/biomedicines9111513] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/17/2021] [Revised: 10/06/2021] [Accepted: 10/19/2021] [Indexed: 01/14/2023] Open
Abstract
In recent years, the diversity of cancer cells in tumor tissues as a result of intratumor heterogeneity has attracted attention. In particular, the development of single-cell analysis technology has made a significant contribution to the field; technologies that are centered on single-cell RNA sequencing (scRNA-seq) have been reported to analyze cancer constituent cells, identify cell groups responsible for therapeutic resistance, and analyze gene signatures of resistant cell groups. However, although single-cell analysis is a powerful tool, various issues have been reported, including batch effects and transcriptional noise due to gene expression variation and mRNA degradation. To overcome these issues, machine learning techniques are currently being introduced for single-cell analysis, and promising results are being reported. In addition, machine learning has also been used in various ways for single-cell analysis, such as single-cell assay of transposase accessible chromatin sequencing (ATAC-seq), chromatin immunoprecipitation sequencing (ChIP-seq) analysis, and multi-omics analysis; thus, it contributes to a deeper understanding of the characteristics of human diseases, especially cancer, and supports clinical applications. In this review, we present a comprehensive introduction to the implementation of machine learning techniques in medical research for single-cell analysis, and discuss their usefulness and future potential.
Affiliation(s)
- Ken Asada
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan; (K.T.); (H.M.); (S.T.); (N.S.); (A.B.); (M.K.)
- Correspondence: (K.A.); (R.H.); Tel.: +81-3-3547-5271 (R.H.)
| | - Ken Takasawa
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan; (K.T.); (H.M.); (S.T.); (N.S.); (A.B.); (M.K.)
| | - Hidenori Machino
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan; (K.T.); (H.M.); (S.T.); (N.S.); (A.B.); (M.K.)
| | - Satoshi Takahashi
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan; (K.T.); (H.M.); (S.T.); (N.S.); (A.B.); (M.K.)
| | - Norio Shinkai
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan; (K.T.); (H.M.); (S.T.); (N.S.); (A.B.); (M.K.)
- Department of NCC Cancer Science, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, 1-5-45 Yushima, Bunkyo-ku, Tokyo 113-8510, Japan
| | - Amina Bolatkan
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan; (K.T.); (H.M.); (S.T.); (N.S.); (A.B.); (M.K.)
- Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan; (K.K.); (S.K.)
| | - Kazuma Kobayashi
- Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan; (K.K.); (S.K.)
| | - Masaaki Komatsu
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan; (K.T.); (H.M.); (S.T.); (N.S.); (A.B.); (M.K.)
| | - Syuzo Kaneko
- Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan; (K.K.); (S.K.)
| | - Koji Okamoto
- Division of Cancer Differentiation, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan;
| | - Ryuji Hamamoto
- Department of NCC Cancer Science, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, 1-5-45 Yushima, Bunkyo-ku, Tokyo 113-8510, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan; (K.K.); (S.K.)
- Correspondence: (K.A.); (R.H.); Tel.: +81-3-3547-5271 (R.H.)
| |
|
33
|
Danese A, Richter ML, Chaichoompu K, Fischer DS, Theis FJ, Colomé-Tatché M. EpiScanpy: integrated single-cell epigenomic analysis. Nat Commun 2021; 12:5228. [PMID: 34471111 PMCID: PMC8410937 DOI: 10.1038/s41467-021-25131-3] [Citation(s) in RCA: 48] [Impact Index Per Article: 16.0] [Received: 08/30/2020] [Accepted: 07/22/2021] [Indexed: 11/14/2022] Open
Abstract
EpiScanpy is a toolkit for the analysis of single-cell epigenomic data, namely single-cell DNA methylation and single-cell ATAC-seq data. To address the modality-specific challenges of epigenomics data, epiScanpy quantifies the epigenome using multiple feature space constructions and builds a nearest neighbour graph using epigenomic distance between cells. EpiScanpy makes the many existing scRNA-seq workflows from scanpy available to large-scale single-cell data from other -omics modalities, including methods for common clustering, dimension reduction, cell type identification and trajectory learning techniques, as well as an atlas integration tool for scATAC-seq datasets. The toolkit also features numerous useful downstream functions, such as differential methylation and differential openness calling, mapping epigenomic features of interest to their nearest gene, or constructing gene activity matrices using chromatin openness. We successfully benchmark epiScanpy against other scATAC-seq analysis tools and show that it outperforms them at discriminating cell types.
Affiliation(s)
- Anna Danese
- Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
| | - Maria L Richter
- Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
| | - Kridsadakorn Chaichoompu
- Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
| | - David S Fischer
- Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
| | - Fabian J Theis
- Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany.
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany.
- Department of Mathematics, Technical University of Munich, Garching, Germany.
| | - Maria Colomé-Tatché
- Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany.
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany.
- Biomedical Center (BMC), Physiological Chemistry, Faculty of Medicine, LMU Munich, Planegg-Martinsried, Germany.
|
34
|
A Multiple Comprehensive Analysis of scATAC-seq Based on Auto-Encoder and Matrix Decomposition. Symmetry (Basel) 2021. [DOI: 10.3390/sym13081467] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022] Open
Abstract
Single-cell ATAC-seq (scATAC-seq), an extension of ATAC-seq, provides a novel method for probing open chromatin sites. Currently, scATAC-seq research faces the problems of high dimensionality and the inherent sparsity of the generated data. Recently, several works proposed using autoencoders, a symmetric neural network architecture, and non-negative matrix factorization methods to characterize the high-dimensional data. To evaluate the performance of these methods, in this work we compared four kinds of autoencoders and two kinds of matrix factorization methods for characterizing scATAC-seq data. Different sizes of latent features were used to generate the UMAP plots and for further K-means clustering. Using a gold-standard data set, we comprehensively explored how performance varies across the methods and the number of latent features. As a result, the method designed for handling the sparsity outperforms other tools in the generated dataset. Finally, we briefly discuss the underlying difficulties and future directions for scATAC-seq characterization.
|
35
|
Bode D, Cull AH, Rubio-Lara JA, Kent DG. Exploiting Single-Cell Tools in Gene and Cell Therapy. Front Immunol 2021; 12:702636. [PMID: 34322133 PMCID: PMC8312222 DOI: 10.3389/fimmu.2021.702636] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 04/29/2021] [Accepted: 06/28/2021] [Indexed: 12/12/2022] Open
Abstract
Single-cell molecular tools have been developed at an incredible pace over the last five years as sequencing costs continue to drop and numerous molecular assays have been coupled to sequencing readouts. This rapid period of technological development has facilitated the delineation of individual molecular characteristics including the genome, transcriptome, epigenome, and proteome of individual cells, leading to an unprecedented resolution of the molecular networks governing complex biological systems. The immense power of single-cell molecular screens has been particularly highlighted through work in systems where cellular heterogeneity is a key feature, such as stem cell biology, immunology, and tumor cell biology. Single-cell-omics technologies have already contributed to the identification of novel disease biomarkers, cellular subsets, therapeutic targets and diagnostics, many of which would have been undetectable by bulk sequencing approaches. More recently, efforts to integrate single-cell multi-omics with single cell functional output and/or physical location have been challenging but have led to substantial advances. Perhaps most excitingly, there are emerging opportunities to reach beyond the description of static cellular states with recent advances in modulation of cells through CRISPR technology, in particular with the development of base editors which greatly raises the prospect of cell and gene therapies. In this review, we provide a brief overview of emerging single-cell technologies and discuss current developments in integrating single-cell molecular screens and performing single-cell multi-omics for clinical applications. We also discuss how single-cell molecular assays can be usefully combined with functional data to unpick the mechanism of cellular decision-making. Finally, we reflect upon the introduction of spatial transcriptomics and proteomics, its complementary role with single-cell RNA sequencing (scRNA-seq) and potential application in cellular and gene therapy.
Affiliation(s)
- Daniel Bode
- Wellcome Medical Research Council (MRC) Cambridge Stem Cell Institute, University of Cambridge, Cambridge, United Kingdom
- Department of Haematology, University of Cambridge, Cambridge, United Kingdom
| | - Alyssa H. Cull
- York Biomedical Research Institute, Department of Biology, University of York, York, United Kingdom
| | - Juan A. Rubio-Lara
- York Biomedical Research Institute, Department of Biology, University of York, York, United Kingdom
| | - David G. Kent
- York Biomedical Research Institute, Department of Biology, University of York, York, United Kingdom
|
36
|
Yu F, Sankaran VG, Yuan GC. CUT&RUNTools 2.0: a pipeline for single-cell and bulk-level CUT&RUN and CUT&Tag data analysis. Bioinformatics 2021; 38:252-254. [PMID: 34244724 PMCID: PMC8696090 DOI: 10.1093/bioinformatics/btab507] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 02/10/2021] [Revised: 07/01/2021] [Accepted: 07/07/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: Genome-wide profiling of transcription factor binding and chromatin states is a widely used approach for mechanistic understanding of gene regulation. Recent technology development has enabled such profiling at single-cell resolution. However, an end-to-end computational pipeline for analyzing such data is still lacking. RESULTS: Here, we have developed a flexible pipeline for analysis and visualization of single-cell CUT&Tag and CUT&RUN data, which provides functions for sequence alignment, quality control, dimensionality reduction, cell clustering, data aggregation and visualization. Furthermore, it is also seamlessly integrated with the functions in original CUT&RUNTools for population-level analyses. As such, this provides a valuable toolbox for the community. AVAILABILITY AND IMPLEMENTATION: https://github.com/fl-yu/CUT-RUNTools-2.0. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Fulong Yu
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Division of Hematology/Oncology, Boston Children’s Hospital, Boston, MA 02115, USA; Department of Pediatrics, Harvard Medical School, Boston, MA 02115, USA; Program in Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02115, USA
| | - Vijay G Sankaran
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Division of Hematology/Oncology, Boston Children’s Hospital, Boston, MA 02115, USA; Department of Pediatrics, Harvard Medical School, Boston, MA 02115, USA; Program in Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02115, USA
|
37
|
Rudzka DA, Mason S, Neilson M, McGarry L, Kalna G, Hedley A, Blyth K, Olson MF. Selection of established tumour cells through narrow diameter micropores enriches for elevated Ras/Raf/MEK/ERK MAPK signalling and enhanced tumour growth. Small GTPases 2021; 12:294-310. [PMID: 32569510 PMCID: PMC8204978 DOI: 10.1080/21541248.2020.1780108] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/01/2019] [Revised: 04/17/2020] [Accepted: 06/04/2020] [Indexed: 11/10/2022] Open
Abstract
As normal cells become cancer cells, and progress towards malignancy, they become progressively softer. Advantages of this change are that tumour cells become more deformable, and better able to move through narrow constraints. We designed a positive selection strategy that enriched for cells which could move through narrow diameter micropores to identify cell phenotypes that enabled constrained migration. Using human MDA MB 231 breast cancer and MDA MB 435 melanoma cancer cells, we found that micropore selection favoured cells with relatively higher Ras/Raf/MEK/ERK mitogen-activated protein kinase (MAPK) signalling, which affected actin cytoskeleton organization, focal adhesion density and cell elasticity. In this follow-up study, we provide further evidence that selection through micropores enriched for cells with altered cell morphology and adhesion. Additional analysis of RNA sequencing data revealed a set of transcripts associated with small cell size that was independent of constrained migration. Gene set enrichment analysis identified the 'matrisome' as the most significantly altered gene set linked with small size. When grown as orthotopic xenograft tumours in immunocompromised mice, micropore selected cells grew significantly faster than Parent or Flow-Sorted cells. Using mathematical modelling, we determined that there is an interaction between 1) the cell-to-gap size ratio and 2) the bending rigidity of the cell, which together enable movement through narrow gaps. These results extend our previous conclusion that Ras/Raf/MEK/ERK MAPK signalling has a significant role in regulating cell biomechanics by showing that the selective pressure of movement through narrow gaps also enriches for increased tumour growth in vivo.
Affiliation(s)
- Dominika A Rudzka
- Cancer Research UK Beatson Institute, Glasgow, UK
- Institute of Cancer Sciences, University of Glasgow, Glasgow, UK
| | - Susan Mason
- Cancer Research UK Beatson Institute, Glasgow, UK
| | | | - Lynn McGarry
- Cancer Research UK Beatson Institute, Glasgow, UK
| | | | - Ann Hedley
- Cancer Research UK Beatson Institute, Glasgow, UK
| | - Karen Blyth
- Cancer Research UK Beatson Institute, Glasgow, UK
| | - Michael F. Olson
- Cancer Research UK Beatson Institute, Glasgow, UK
- Institute of Cancer Sciences, University of Glasgow, Glasgow, UK
- Department of Chemistry and Biology, Ryerson University, Toronto, ON, Canada
|
38
|
Wangwu J, Sun Z, Lin Z. scAMACE: Model-based approach to the joint analysis of single-cell data on chromatin accessibility, gene expression and methylation. Bioinformatics 2021; 37:3874-3880. [PMID: 34086847 DOI: 10.1093/bioinformatics/btab426] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/29/2021] [Revised: 05/26/2021] [Accepted: 06/03/2021] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: The advancement in technologies and the growth of available single-cell datasets motivate integrative analysis of multiple single-cell genomic datasets. Integrative analysis of multimodal single-cell datasets combines complementary information offered by single-omic datasets and can offer deeper insights into complex biological processes. Clustering methods that identify the unknown cell types are among the first few steps in the analysis of single-cell datasets, and they are important for downstream analysis built upon the identified cell types. RESULTS: We propose scAMACE for the integrative analysis and clustering of single-cell data on chromatin accessibility, gene expression and methylation. We demonstrate that cell types are better identified and characterized through analyzing the three data types jointly. We develop an efficient expectation-maximization (EM) algorithm to perform statistical inference, and evaluate our methods on both simulation studies and real data applications. We also provide a GPU implementation of scAMACE, making it scalable to large datasets. AVAILABILITY: The software and datasets are available at https://github.com/cuhklinlab/scAMACE_py (python implementation) and https://github.com/cuhklinlab/scAMACE (R implementation). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jiaxuan Wangwu
- Department of Statistics, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Zexuan Sun
- Department of Statistics, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Zhixiang Lin
- Department of Statistics, The Chinese University of Hong Kong, Hong Kong SAR, China
|
39
|
coupleCoC+: An information-theoretic co-clustering-based transfer learning framework for the integrative analysis of single-cell genomic data. PLoS Comput Biol 2021; 17:e1009064. [PMID: 34077420 PMCID: PMC8202939 DOI: 10.1371/journal.pcbi.1009064] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/06/2021] [Revised: 06/14/2021] [Accepted: 05/11/2021] [Indexed: 12/02/2022] Open
Abstract
Technological advances have enabled us to profile multiple molecular layers at unprecedented single-cell resolution and the available datasets from multiple samples or domains are growing. These datasets, including scRNA-seq data, scATAC-seq data and sc-methylation data, usually have different powers in identifying the unknown cell types through clustering. So, methods that integrate multiple datasets can potentially lead to a better clustering performance. Here we propose coupleCoC+ for the integrative analysis of single-cell genomic data. coupleCoC+ is a transfer learning method based on the information-theoretic co-clustering framework. In coupleCoC+, we utilize the information in one dataset, the source data, to facilitate the analysis of another dataset, the target data. coupleCoC+ uses the linked features in the two datasets for effective knowledge transfer, and it also uses the information of the features in the target data that are unlinked with the source data. In addition, coupleCoC+ matches similar cell types across the source data and the target data. By applying coupleCoC+ to the integrative clustering of mouse cortex scATAC-seq data and scRNA-seq data, mouse and human scRNA-seq data, mouse cortex sc-methylation and scRNA-seq data, and human blood dendritic cells scRNA-seq data from two batches, we demonstrate that coupleCoC+ improves the overall clustering performance and matches the cell subpopulations across multimodal single-cell genomic datasets. coupleCoC+ has fast convergence and it is computationally efficient. The software is available at https://github.com/cuhklinlab/coupleCoC_plus.
The recent advances in single-cell technologies have enabled multiple biological layers to be probed and provide unprecedented opportunities to assay cellular heterogeneity. To analyze the complex biological processes varying across cells, we need to obtain and integrate different types of genomic features through flexible but rigorous computational methods. The most important challenge for data integration is to link data from different sources in a way that is biologically meaningful. In this work, we have developed a transfer learning method based on the information-theoretic co-clustering framework for the integrative analysis of single-cell genomic data. This method utilizes the information from one dataset to boost the analysis of another dataset, and it also uses the information of the features that are unlinked in the two datasets. We demonstrate that our transfer learning-based clustering method significantly improves clustering performance in single-cell genomic datasets. Our results show that transfer learning is promising for the integrative analysis of single-cell genomic data.
|
40
|
Liu Q, Chen S, Jiang R, Wong WH. Simultaneous deep generative modeling and clustering of single cell genomic data. Nat Mach Intell 2021; 3:536-544. [PMID: 34179690 PMCID: PMC8223760 DOI: 10.1038/s42256-021-00333-y] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Received: 08/14/2020] [Accepted: 03/14/2021] [Indexed: 01/15/2023]
Abstract
Recent advances in single-cell technologies, including single-cell ATAC-seq (scATAC-seq), have enabled large-scale profiling of the chromatin accessibility landscape at the single cell level. However, the characteristics of scATAC-seq data, including high sparsity and high dimensionality, have greatly complicated the computational analysis. Here, we proposed scDEC, a computational tool for single cell ATAC-seq analysis with deep generative neural networks. scDEC is built on a pair of generative adversarial networks (GANs), and is capable of learning the latent representation and inferring the cell labels, simultaneously. In a series of experiments, scDEC demonstrates superior performance over other tools in scATAC-seq analysis across multiple datasets and experimental settings. In downstream applications, we demonstrated that the generative power of scDEC helps to infer the trajectory and intermediate state of cells during differentiation and the latent features learned by scDEC can potentially reveal both biological cell types and within-cell-type variations. We also showed that it is possible to extend scDEC for the integrative analysis of multi-modal single cell data.
Affiliation(s)
- Qiao Liu
- Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Department of Statistics, Stanford University, Stanford, CA 94305, USA
| | - Shengquan Chen
- Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Rui Jiang
- Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Wing Hung Wong
- Department of Statistics, Stanford University, Stanford, CA 94305, USA
- Department of Biomedical Data Science, Bio-X Program, Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA 94305, USA
|
41
|
Zuo C, Dai H, Chen L. Deep cross-omics cycle attention model for joint analysis of single-cell multi-omics data. Bioinformatics 2021; 37:4091-4099. [PMID: 34028557 DOI: 10.1093/bioinformatics/btab403] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 01/18/2021] [Revised: 05/14/2021] [Accepted: 05/22/2021] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Joint profiling of single-cell transcriptomics and epigenomics data enables us to characterize cell states and transcriptomics regulatory programs related to cellular heterogeneity. However, the large differences in sparsity, heterogeneity, and dimensionality between multi-omics data have severely hindered their integrative analysis. RESULTS: We propose the deep cross-omics cycle attention (DCCA) model, a computational tool for joint analysis of single-cell multi-omics data that combines variational autoencoders (VAEs) and attention-transfer. Specifically, we show that DCCA can leverage one omics data to fine-tune the network trained for another omics data, given a dataset of parallel multi-omics data within the same cell. In studies on both simulated and real datasets from various platforms, DCCA demonstrates its superior capability in: (i) dissecting cellular heterogeneity; (ii) denoising and aggregating data; and (iii) constructing the link between multi-omics data, which is used to infer new transcriptional regulatory relations. In our applications, DCCA was demonstrated to have a superior power to generate missing stages or omics in a biologically meaningful manner, which provides a new way to analyze and also understand complicated biological processes. AVAILABILITY AND IMPLEMENTATION: DCCA source code is available at https://github.com/cmzuo11/DCCA, and has been deposited in archived format at https://doi.org/10.5281/zenodo.4762065. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chunman Zuo
- State Key Laboratory of Cell Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
| | - Hao Dai
- State Key Laboratory of Cell Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
| | - Luonan Chen
- State Key Laboratory of Cell Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China; Key Laboratory of Systems Health Science of Zhejiang Province, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China; School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China; Pazhou Lab, Guangzhou 510330, China
|
42
|
Patty BJ, Hainer SJ. Transcription factor chromatin profiling genome-wide using uliCUT&RUN in single cells and individual blastocysts. Nat Protoc 2021; 16:2633-2666. [PMID: 33911257 PMCID: PMC8177051 DOI: 10.1038/s41596-021-00516-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 04/30/2020] [Accepted: 02/04/2021] [Indexed: 02/02/2023]
Abstract
Determining chromatin-associated protein localization across the genome has provided insight into the functions of DNA-binding proteins and their connections to disease. However, established protocols requiring large quantities of cell or tissue samples currently limit applications for clinical and biomedical research in this field. Furthermore, most technologies have been optimized to assess abundant histone protein localization, prohibiting the investigation of nonhistone protein localization in low cell numbers. We recently described a protocol to profile chromatin-associated protein localization in as few as one cell: ultra-low-input cleavage under targets and release using nuclease (uliCUT&RUN). Optimized from chromatin immunocleavage and CUT&RUN, uliCUT&RUN is a tethered enzyme-based protocol that utilizes a combination of recombinant protein, antibody recognition and stringent purification to selectively target proteins of interest and isolate the associated DNA. Performed in native conditions, uliCUT&RUN profiles protein localization to chromatin with low input and high precision. Compared with other profiling technologies, uliCUT&RUN can determine nonhistone protein chromatin occupancies in low cell numbers, permitting the investigation into the molecular functions of a range of DNA-binding proteins within rare samples. From sample preparation to sequencing library submission, the uliCUT&RUN protocol takes <2 d to perform, with the accompanying data analysis timeline dependent on experience level.
Affiliation(s)
- Benjamin J Patty
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, USA
| | - Sarah J Hainer
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, USA.
|
43
|
Chen S, Yan G, Zhang W, Li J, Jiang R, Lin Z. RA3 is a reference-guided approach for epigenetic characterization of single cells. Nat Commun 2021; 12:2177. [PMID: 33846355 PMCID: PMC8041798 DOI: 10.1038/s41467-021-22495-4] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 06/16/2020] [Accepted: 03/18/2021] [Indexed: 12/13/2022] Open
Abstract
The recent advancements in single-cell technologies, including single-cell chromatin accessibility sequencing (scCAS), have enabled profiling the epigenetic landscapes for thousands of individual cells. However, the characteristics of scCAS data, including high dimensionality, high degree of sparsity and high technical variation, make the computational analysis challenging. Reference-guided approaches, which utilize the information in existing datasets, may facilitate the analysis of scCAS data. Here, we present RA3 (Reference-guided Approach for the Analysis of single-cell chromatin Accessibility data), which utilizes the information in massive existing bulk chromatin accessibility and annotated scCAS data. RA3 simultaneously models (1) the shared biological variation among scCAS data and the reference data, and (2) the unique biological variation in scCAS data that identifies distinct subpopulations. We show that RA3 achieves superior performance when used on several scCAS datasets, and on references constructed using various approaches. Altogether, these analyses demonstrate the wide applicability of RA3 in analyzing scCAS data.
Affiliation(s)
- Shengquan Chen
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China
- Department of Statistics, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Guanao Yan
- School of Mathematical Sciences, Zhejiang University, Hangzhou, China
| | - Wenyu Zhang
- Department of Statistics, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Jinzhao Li
- Department of Statistics, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Rui Jiang
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China.
| | - Zhixiang Lin
- Department of Statistics, The Chinese University of Hong Kong, Hong Kong SAR, China.
|
44
|
Rai MF, Wu CL, Capellini TD, Guilak F, Dicks AR, Muthuirulan P, Grandi F, Bhutani N, Westendorf JJ. Single Cell Omics for Musculoskeletal Research. Curr Osteoporos Rep 2021; 19:131-140. [PMID: 33559841 PMCID: PMC8743139 DOI: 10.1007/s11914-021-00662-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Accepted: 01/19/2021] [Indexed: 02/04/2023]
Abstract
PURPOSE OF REVIEW: The ability to analyze the molecular events occurring within individual cells as opposed to populations of cells is revolutionizing our understanding of musculoskeletal tissue development and disease. Single cell studies have great potential to identify cellular subpopulations that work in a synchronized fashion to regenerate and repair damaged tissues during normal homeostasis. In addition, such studies can elucidate how these processes break down in disease as well as identify cellular subpopulations that drive the disease. This review highlights three emerging technologies: single cell RNA sequencing (scRNA-seq), Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq), and Cytometry by Time-Of-Flight (CyTOF) mass cytometry. RECENT FINDINGS: Technological and bioinformatic tools to analyze the transcriptome, epigenome, and proteome at the individual cell level have advanced rapidly making data collection relatively easy; however, understanding how to access and interpret the data remains a challenge for many scientists. It is, therefore, of paramount significance to educate the musculoskeletal community on how single cell technologies can be used to answer research questions and advance translation. This article summarizes talks given during a workshop on "Single Cell Omics" at the 2020 annual meeting of the Orthopedic Research Society. Studies that applied scRNA-seq, ATAC-seq, and CyTOF mass cytometry to cartilage development and osteoarthritis are reviewed. This body of work shows how these cutting-edge tools can advance our understanding of the cellular heterogeneity and trajectories of lineage specification during development and disease.
Affiliation(s)
- Muhammad Farooq Rai
- Department of Orthopaedic Surgery, Washington University, St. Louis, MO, USA
| | - Chia-Lung Wu
- Department of Orthopaedic Surgery, Washington University and Shriners Hospitals for Children, St. Louis, MO, USA
| | - Terence D Capellini
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
| | - Farshid Guilak
- Department of Orthopaedic Surgery, Washington University and Shriners Hospitals for Children, St. Louis, MO, USA
| | - Amanda R Dicks
- Department of Orthopaedic Surgery, Washington University and Shriners Hospitals for Children, St. Louis, MO, USA
| | | | - Fiorella Grandi
- Department of Orthopedic Surgery, Stanford University, Stanford, CA, USA
| | - Nidhi Bhutani
- Department of Orthopedic Surgery, Stanford University, Stanford, CA, USA
|
45
|
Navidi Z, Zhang L, Wang B. simATAC: a single-cell ATAC-seq simulation framework. Genome Biol 2021; 22:74. [PMID: 33663563 PMCID: PMC7934446 DOI: 10.1186/s13059-021-02270-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/27/2020] [Accepted: 01/13/2021] [Indexed: 12/21/2022] Open
Abstract
Single-cell assay for transposase-accessible chromatin sequencing (scATAC-seq) identifies regulated chromatin accessibility modules at the single-cell resolution. Robust evaluation is critical to the development of scATAC-seq pipelines, which calls for reproducible datasets for benchmarking. We hereby present the simATAC framework, an R package that generates scATAC-seq count matrices that highly resemble real scATAC-seq datasets in library size, sparsity, and chromatin accessibility signals. simATAC deploys statistical models derived from analyzing 90 real scATAC-seq cell groups. simATAC provides a robust and systematic approach to generate in silico scATAC-seq samples with known cell labels for assessing analytical pipelines.
Affiliation(s)
- Zeinab Navidi
- Peter Munk Cardiac Centre, University Health Network, Toronto, Canada
| | - Lin Zhang
- Department of Statistical Sciences, University of Toronto, Toronto, Canada
| | - Bo Wang
- Peter Munk Cardiac Centre, University Health Network, Toronto, Canada; Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Canada; Department of Computer Science, University of Toronto, Toronto, Canada; Vector Institute, Toronto, Canada.
| |
|
46
|
Scherer M, Schmidt F, Lazareva O, Walter J, Baumbach J, Schulz MH, List M. Machine learning for deciphering cell heterogeneity and gene regulation. Nat Comput Sci 2021; 1:183-191. [PMID: 38183187 DOI: 10.1038/s43588-021-00038-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/15/2020] [Accepted: 02/08/2021] [Indexed: 12/14/2022]
Abstract
Epigenetics studies inheritable and reversible modifications of DNA that allow cells to control gene expression throughout their development and in response to environmental conditions. In computational epigenomics, machine learning is applied to study various epigenetic mechanisms genome wide. Its aim is to expand our understanding of cell differentiation, that is their specialization, in health and disease. Thus far, most efforts focus on understanding the functional encoding of the genome and on unraveling cell-type heterogeneity. Here, we provide an overview of state-of-the-art computational methods and their underlying statistical concepts, which range from matrix factorization and regularized linear regression to deep learning methods. We further show how the rise of single-cell technology leads to new computational challenges and creates opportunities to further our understanding of epigenetic regulation.
Affiliation(s)
- Michael Scherer
- Department of Genetics/Epigenetics, Saarland University, Saarbrücken, Germany
- Computational Biology Group, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
- Graduate School of Computer Science, Saarland Informatics Campus, Saarbrücken, Germany
| | | | - Olga Lazareva
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
| | - Jörn Walter
- Computational Biology Group, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
| | - Jan Baumbach
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
- Computational BioMedicine Lab, Institute of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Marcel H Schulz
- Institute of Cardiovascular Regeneration, University Hospital and Goethe University Frankfurt, Frankfurt, Germany
| | - Markus List
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany.
| |
|
47
|
Granja JM, Corces MR, Pierce SE, Bagdatli ST, Choudhry H, Chang HY, Greenleaf WJ. ArchR is a scalable software package for integrative single-cell chromatin accessibility analysis. Nat Genet 2021; 53:403-411. [PMID: 33633365 PMCID: PMC8012210 DOI: 10.1038/s41588-021-00790-6] [Citation(s) in RCA: 563] [Impact Index Per Article: 187.7] [Received: 05/27/2020] [Accepted: 01/19/2021] [Indexed: 12/26/2022]
Abstract
The advent of single-cell chromatin accessibility profiling has accelerated the ability to map gene regulatory landscapes but has outpaced the development of scalable software to rapidly extract biological meaning from these data. Here we present a software suite for single-cell analysis of regulatory chromatin in R (ArchR; https://www.archrproject.com/) that enables fast and comprehensive analysis of single-cell chromatin accessibility data. ArchR provides an intuitive, user-focused interface for complex single-cell analyses, including doublet removal, single-cell clustering and cell type identification, unified peak set generation, cellular trajectory identification, DNA element-to-gene linkage, transcription factor footprinting, mRNA expression level prediction from chromatin accessibility and multi-omic integration with single-cell RNA sequencing (scRNA-seq). Enabling the analysis of over 1.2 million single cells within 8 h on a standard Unix laptop, ArchR is a comprehensive software suite for end-to-end analysis of single-cell chromatin accessibility that will accelerate the understanding of gene regulation at the resolution of individual cells. ArchR is a software suite that enables efficient and end-to-end analysis of single-cell chromatin accessibility data (scATAC-seq).
Affiliation(s)
- Jeffrey M Granja
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA; Program in Biophysics, Stanford University, Stanford, CA, USA; Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA.
| | - M Ryan Corces
- Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA; Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA; Gladstone Institute of Neurological Disease, Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA; Department of Neurology, University of California San Francisco, San Francisco, CA, USA
| | - Sarah E Pierce
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA; Program in Cancer Biology, Stanford University School of Medicine, Stanford, CA, USA
| | - S Tansu Bagdatli
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
| | - Hani Choudhry
- Department of Biochemistry, Faculty of Science, Cancer and Mutagenesis Unit, King Fahd Center for Medical Research, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Howard Y Chang
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA; Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA; Howard Hughes Medical Institute, Stanford University, Stanford, CA, USA.
| | - William J Greenleaf
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA; Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA; Department of Applied Physics, Stanford University, Stanford, CA, USA; Chan Zuckerberg Biohub, San Francisco, CA, USA.
| |
|
48
|
Sinha S, Satpathy AT, Zhou W, Ji H, Stratton JA, Jaffer A, Bahlis N, Morrissy S, Biernaskie JA. Profiling Chromatin Accessibility at Single-cell Resolution. Genomics Proteomics Bioinformatics 2021; 19:172-190. [PMID: 33581341 PMCID: PMC8602754 DOI: 10.1016/j.gpb.2020.06.010] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/06/2019] [Revised: 03/04/2020] [Accepted: 08/15/2020] [Indexed: 01/22/2023]
Abstract
How distinct transcriptional programs are enacted to generate cellular heterogeneity and plasticity, and enable complex fate decisions are important open questions. One key regulator is the cell’s epigenome state that drives distinct transcriptional programs by regulating chromatin accessibility. Genome-wide chromatin accessibility measurements can impart insights into regulatory sequences (in)accessible to DNA-binding proteins at a single-cell resolution. This review outlines molecular methods and bioinformatic tools for capturing cell-to-cell chromatin variation using single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) in a scalable fashion. It also covers joint profiling of chromatin with transcriptome/proteome measurements, computational strategies to integrate multi-omic measurements, and predictive bioinformatic tools to infer chromatin accessibility from single-cell transcriptomic datasets. Methodological refinements that increase power for cell discovery through robust chromatin coverage and integrate measurements from multiple modalities will further expand our understanding of gene regulation during homeostasis and disease.
Affiliation(s)
- Sarthak Sinha
- Department of Comparative Biology & Experimental Medicine, Faculty of Veterinary Medicine, University of Calgary, Calgary, AB T2N 4N1, Canada.
| | - Ansuman T Satpathy
- Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Weiqiang Zhou
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA
| | - Hongkai Ji
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA
| | - Jo A Stratton
- Department of Comparative Biology & Experimental Medicine, Faculty of Veterinary Medicine, University of Calgary, Calgary, AB T2N 4N1, Canada; Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada; Hotchkiss Brain Institute, University of Calgary, Calgary, AB T2N 4N1, Canada
| | - Arzina Jaffer
- Department of Comparative Biology & Experimental Medicine, Faculty of Veterinary Medicine, University of Calgary, Calgary, AB T2N 4N1, Canada
| | - Nizar Bahlis
- Arnie Charbonneau Cancer Institute, University of Calgary, Calgary, AB T2N 4Z6, Canada
| | - Sorana Morrissy
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada; Arnie Charbonneau Cancer Institute, University of Calgary, Calgary, AB T2N 4Z6, Canada; Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, AB T2N 4N1, Canada
| | - Jeff A Biernaskie
- Department of Comparative Biology & Experimental Medicine, Faculty of Veterinary Medicine, University of Calgary, Calgary, AB T2N 4N1, Canada; Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada; Hotchkiss Brain Institute, University of Calgary, Calgary, AB T2N 4N1, Canada.
| |
|
49
|
Minnoye L, Marinov GK, Krausgruber T, Pan L, Marand AP, Secchia S, Greenleaf WJ, Furlong EEM, Zhao K, Schmitz RJ, Bock C, Aerts S. Chromatin accessibility profiling methods. Nat Rev Methods Primers 2021; 1:10. [PMID: 38410680 PMCID: PMC10895463 DOI: 10.1038/s43586-020-00008-9] [Citation(s) in RCA: 66] [Impact Index Per Article: 22.0] [Accepted: 12/01/2020] [Indexed: 02/06/2023]
Abstract
Chromatin accessibility, or the physical access to chromatinized DNA, is a widely studied characteristic of the eukaryotic genome. As active regulatory DNA elements are generally 'accessible', the genome-wide profiling of chromatin accessibility can be used to identify candidate regulatory genomic regions in a tissue or cell type. Multiple biochemical methods have been developed to profile chromatin accessibility, both in bulk and at the single-cell level. Depending on the method, enzymatic cleavage, transposition or DNA methyltransferases are used, followed by high-throughput sequencing, providing a view of genome-wide chromatin accessibility. In this Primer, we discuss these biochemical methods, as well as bioinformatics tools for analysing and interpreting the generated data, and insights into the key regulators underlying developmental, evolutionary and disease processes. We outline standards for data quality, reproducibility and deposition used by the genomics community. Although chromatin accessibility profiling is invaluable to study gene regulation, alone it provides only a partial view of this complex process. Orthogonal assays facilitate the interpretation of accessible regions with respect to enhancer-promoter proximity, functional transcription factor binding and regulatory function. We envision that technological improvements including single-molecule, multi-omics and spatial methods will bring further insight into the secrets of genome regulation.
Affiliation(s)
- Liesbeth Minnoye
- Center for Brain & Disease Research, VIB-KU Leuven, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | | | - Thomas Krausgruber
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
| | - Lixia Pan
- Laboratory of Epigenome Biology, Systems Biology Center, Division of Intramural Research, National Heart, Lung and Blood Institute, NIH, Bethesda, MD, USA
| | | | - Stefano Secchia
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | | | - Eileen E M Furlong
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Keji Zhao
- Laboratory of Epigenome Biology, Systems Biology Center, Division of Intramural Research, National Heart, Lung and Blood Institute, NIH, Bethesda, MD, USA
| | | | - Christoph Bock
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Institute of Artificial Intelligence and Decision Support, Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Vienna, Austria
| | - Stein Aerts
- Center for Brain & Disease Research, VIB-KU Leuven, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| |
|
50
|
Chen Z, Zhang J, Liu J, Zhang Z, Zhu J, Lee D, Xu M, Gerstein M. SCAN-ATAC-Sim: a scalable and efficient method for simulating single-cell ATAC-seq data from bulk-tissue experiments. Bioinformatics 2021; 37:1756-1758. [PMID: 33471102 PMCID: PMC8289380 DOI: 10.1093/bioinformatics/btaa1039] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/06/2020] [Revised: 10/17/2020] [Accepted: 11/30/2020] [Indexed: 01/19/2023] Open
Abstract
Summary: scATAC-seq is a powerful approach for characterizing cell-type-specific regulatory landscapes. However, it is difficult to benchmark the performance of various scATAC-seq analysis techniques (such as clustering and deconvolution) without having a priori a known set of gold-standard cell types. To simulate scATAC-seq experiments with known cell-type labels, we introduce an efficient and scalable scATAC-seq simulation method (SCAN-ATAC-Sim) that down-samples bulk ATAC-seq data (e.g., from representative cell lines or tissues). Our protocol uses a consistent but tunable signal-to-noise ratio across cell types in a scATAC-seq simulation for integrating bulk experiments with different levels of background noise, and it independently samples twice without replacement to account for the diploid genome. Because it uses an efficient weighted reservoir sampling algorithm and is highly parallelizable with OpenMP, our implementation in C++ allows millions of cells to be simulated in less than an hour on a laptop computer. Availability: SCAN-ATAC-Sim is available at scan-atac-sim.gersteinlab.org. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhanlin Chen
- Department of Molecular Biophysics and Biochemistry; Department of Computer Science, Yale University, New Haven, CT 06520, USA
| | - Jing Zhang
- Department of Computer Science, University of California, Irvine, CA 92617, USA
| | | | - Zixuan Zhang
- School of Electronic Engineering and Computer Science, Queen Mary University of London, London, E1 4NS, United Kingdom
| | - Jiangqi Zhu
- School of Electronic Engineering and Computer Science, Queen Mary University of London, London, E1 4NS, United Kingdom
| | - Donghoon Lee
- Department of Genetics and Genomic Sciences; Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Min Xu
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Mark Gerstein
- Department of Molecular Biophysics and Biochemistry; Department of Computer Science, Yale University, New Haven, CT 06520, USA
| |
|