1
Jiang Q, Chen S, Chen X, Jiang R. scPRAM accurately predicts single-cell gene expression perturbation response based on attention mechanism. Bioinformatics 2024; 40:btae265. [PMID: 38625746 PMCID: PMC11076148 DOI: 10.1093/bioinformatics/btae265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Revised: 04/06/2024] [Accepted: 04/13/2024] [Indexed: 04/17/2024] [Open access]
Abstract
MOTIVATION: With the rapid advancement of single-cell sequencing technology, it becomes gradually possible to delve into the cellular responses to various external perturbations at the gene expression level. However, obtaining perturbed samples in certain scenarios may be considerably challenging, and the substantial costs associated with sequencing also curtail the feasibility of large-scale experimentation. A repertoire of methodologies has been employed for forecasting perturbative responses in single-cell gene expression. However, existing methods primarily focus on the average response of a specific cell type to perturbation, overlooking the single-cell specificity of perturbation responses and a more comprehensive prediction of the entire perturbation response distribution.

RESULTS: Here, we present scPRAM, a method for predicting perturbation responses in single-cell gene expression based on attention mechanisms. Leveraging variational autoencoders and optimal transport, scPRAM aligns cell states before and after perturbation, followed by accurate prediction of gene expression responses to perturbations for unseen cell types through attention mechanisms. Experiments on multiple real perturbation datasets involving drug treatments and bacterial infections demonstrate that scPRAM attains heightened accuracy in perturbation prediction across cell types, species, and individuals, surpassing existing methodologies. Furthermore, scPRAM demonstrates outstanding capability in identifying differentially expressed genes under perturbation, capturing heterogeneity in perturbation responses across species, and maintaining stability in the presence of data noise and sample size variations.

AVAILABILITY AND IMPLEMENTATION: https://github.com/jiang-q19/scPRAM and https://doi.org/10.5281/zenodo.10935038.
Affiliation(s)
- Qun Jiang
  - MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
- Shengquan Chen
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Xiaoyang Chen
  - MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
- Rui Jiang
  - MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
2
Cui X, Chen X, Li Z, Gao Z, Chen S, Jiang R. Discrete latent embedding of single-cell chromatin accessibility sequencing data for uncovering cell heterogeneity. Nat Comput Sci 2024; 4:346-359. [PMID: 38730185 DOI: 10.1038/s43588-024-00625-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Accepted: 04/05/2024] [Indexed: 05/12/2024]
Abstract
Single-cell epigenomic data have been growing continuously at an unprecedented pace, but their characteristics such as high dimensionality and sparsity pose substantial challenges to downstream analysis. Although deep learning models, especially variational autoencoders, have been widely used to capture low-dimensional feature embeddings, the prevalent Gaussian assumption somewhat disagrees with real data, and these models tend to struggle to incorporate reference information from abundant cell atlases. Here we propose CASTLE, a deep generative model based on the vector-quantized variational autoencoder framework to extract discrete latent embeddings that interpretably characterize single-cell chromatin accessibility sequencing data. We validate the performance and robustness of CASTLE for accurate cell-type identification and reasonable visualization compared with state-of-the-art methods. We demonstrate the advantages of CASTLE for effective incorporation of existing massive reference datasets in a weakly supervised or supervised manner. We further demonstrate CASTLE's capacity for intuitively distilling cell-type-specific feature spectra that unveil cell heterogeneity and biological implications quantitatively.
Affiliation(s)
- Xuejian Cui
  - Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China
- Xiaoyang Chen
  - Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China
- Zhen Li
  - Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China
- Zijing Gao
  - Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China
- Shengquan Chen
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Rui Jiang
  - Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China
3
Cao Y, Zhao X, Tang S, Jiang Q, Li S, Li S, Chen S. scButterfly: a versatile single-cell cross-modality translation method via dual-aligned variational autoencoders. Nat Commun 2024; 15:2973. [PMID: 38582890 PMCID: PMC10998864 DOI: 10.1038/s41467-024-47418-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2023] [Accepted: 03/28/2024] [Indexed: 04/08/2024] [Open access]
Abstract
Recent advancements for simultaneously profiling multi-omics modalities within individual cells have enabled the interrogation of cellular heterogeneity and molecular hierarchy. However, technical limitations lead to highly noisy multi-modal data and substantial costs. Although computational methods have been proposed to translate single-cell data across modalities, broad applications of the methods still remain impeded by formidable challenges. Here, we propose scButterfly, a versatile single-cell cross-modality translation method based on dual-aligned variational autoencoders and data augmentation schemes. With comprehensive experiments on multiple datasets, we provide compelling evidence of scButterfly's superiority over baseline methods in preserving cellular heterogeneity while translating datasets of various contexts and in revealing cell type-specific biological insights. Besides, we demonstrate the extensive applications of scButterfly for integrative multi-omics analysis of single-modality data, data enhancement of poor-quality single-cell multi-omics, and automatic cell type annotation of scATAC-seq data. Moreover, scButterfly can be generalized to unpaired data training, perturbation-response analysis, and consecutive translation.
Affiliation(s)
- Yichuan Cao
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
- Xiamiao Zhao
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
- Songming Tang
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
- Qun Jiang
  - MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, 100084, Beijing, China
- Sijie Li
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
- Siyu Li
  - School of Statistics and Data Science, Nankai University, Tianjin, 300071, China
- Shengquan Chen
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
4
Li K, Chen X, Song S, Hou L, Chen S, Jiang R. Cofea: correlation-based feature selection for single-cell chromatin accessibility data. Brief Bioinform 2023; 25:bbad458. [PMID: 38113078 PMCID: PMC10782922 DOI: 10.1093/bib/bbad458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 11/19/2023] [Accepted: 11/20/2023] [Indexed: 12/21/2023] [Open access]
Abstract
Single-cell chromatin accessibility sequencing (scCAS) technologies have enabled characterizing the epigenomic heterogeneity of individual cells. However, the identification of features of scCAS data that are relevant to underlying biological processes remains a significant gap. Here, we introduce a novel method, Cofea, to fill this gap. Through comprehensive experiments on 5 simulated and 54 real datasets, Cofea demonstrates its superiority in capturing cellular heterogeneity and facilitating downstream analysis. Applying this method to the identification of cell type-specific peaks and candidate enhancers, as well as pathway enrichment analysis and partitioned heritability analysis, we illustrate the potential of Cofea to uncover functional biological processes.
Affiliation(s)
- Keyi Li
  - Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Xiaoyang Chen
  - Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Shuang Song
  - Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing 100084, China
- Lin Hou
  - Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing 100084, China
- Shengquan Chen
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Rui Jiang
  - Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
5
Li Z, Chen X, Zhang X, Jiang R, Chen S. Latent feature extraction with a prior-based self-attention framework for spatial transcriptomics. Genome Res 2023; 33:1757-1773. [PMID: 37903634 PMCID: PMC10691543 DOI: 10.1101/gr.277891.123] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/16/2023] [Accepted: 09/19/2023] [Indexed: 11/01/2023]
Abstract
Rapid advances in spatial transcriptomics (ST) have revolutionized the interrogation of spatial heterogeneity and increased the demand for comprehensive methods to effectively characterize spatial domains. As a prerequisite for ST data analysis, spatial domain characterization is a crucial step for downstream analyses and biological implications. Here we propose a prior-based self-attention framework for spatial transcriptomics (PAST), a variational graph convolutional autoencoder for ST, which effectively integrates prior information via a Bayesian neural network, captures spatial patterns via a self-attention mechanism, and enables scalable application via a ripple walk sampler strategy. Through comprehensive experiments on data sets generated by different technologies, we show that PAST can effectively characterize spatial domains and facilitate various downstream analyses, including ST visualization, spatial trajectory inference and pseudotime analysis. Also, we highlight the advantages of PAST for multislice joint embedding and automatic annotation of spatial domains in newly sequenced ST data. Compared with existing methods, PAST is the first ST method that integrates reference data to analyze ST data. We anticipate that PAST will open up new avenues for researchers to decipher ST data with customized reference data, which expands the applicability of ST technology.
Affiliation(s)
- Zhen Li
  - Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Xiaoyang Chen
  - Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Xuegong Zhang
  - Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Rui Jiang
  - Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Shengquan Chen
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China