1
Meng X, Zhang Y, Xu X, Zhang K, Feng B. scSFCL:Deep clustering of scRNA-seq data with subspace feature confidence learning. Comput Biol Chem 2025; 114:108292. [PMID: 39591807 DOI: 10.1016/j.compbiolchem.2024.108292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2024] [Revised: 11/07/2024] [Accepted: 11/20/2024] [Indexed: 11/28/2024]
Abstract
The rapid development of single-cell RNA sequencing (scRNA-seq) technology has spawned a variety of single-cell clustering methods. These methods combine statistics and bioinformatics to reveal differences in gene expression between cells and the diversity of cell types. Deep exploration of single-cell data is challenging due to the high dimensionality, sparsity and noise of scRNA-seq data. Discriminative attribute information is often difficult to utilise fully, and traditional clustering methods may not accurately capture the diversity of cell types. Therefore, a deep clustering method for scRNA-seq data based on subspace feature confidence learning, called scSFCL, is proposed. Discriminative feature subsets are filtered by dividing the subspace based on kernel density. The feature confidence of each subset is learned by combining a graph convolutional network (GCN) with weighting. In addition, scSFCL facilitates the complementary fusion of generic structural and idiosyncratic information through mutually supervised clustering that integrates the GCN and a denoising variational autoencoder based on the zero-inflated negative binomial distribution (DVAE-ZINB). Validation on multiple scRNA-seq datasets shows that the clustering performance of scSFCL is significantly improved compared with traditional methods, providing an effective solution for deep clustering of scRNA-seq data.
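For readers unfamiliar with the zero-inflated negative binomial (ZINB) distribution named in this abstract, the following is a minimal sketch of its log-probability mass function. The mean/dispersion parameterization and the names `mu`, `theta`, `pi`, `nb_log_pmf` and `zinb_log_pmf` are assumptions chosen for illustration; the abstract does not specify how scSFCL parameterizes or implements its ZINB component.

```python
from math import exp, lgamma, log


def nb_log_pmf(x, mu, theta):
    """Log-pmf of a negative binomial with mean mu > 0 and dispersion theta > 0.

    Assumed parameterization: variance = mu + mu**2 / theta.
    """
    return (
        lgamma(x + theta) - lgamma(theta) - lgamma(x + 1)
        + theta * log(theta / (theta + mu))
        + x * log(mu / (theta + mu))
    )


def zinb_log_pmf(x, mu, theta, pi):
    """Log-pmf of a ZINB: with probability pi (0 <= pi < 1) the count is a
    structural zero (dropout); otherwise it is drawn from the negative binomial."""
    nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        # A zero can come from dropout or from the negative binomial itself.
        return log(pi + (1.0 - pi) * exp(nb))
    return log(1.0 - pi) + nb


# Illustrative values only: mean 2, dispersion 1.5, 30% dropout.
prob_zero = exp(zinb_log_pmf(0, mu=2.0, theta=1.5, pi=0.3))
```

The negative of this log-pmf, summed over all entries of a count matrix, is the kind of reconstruction loss typically minimized by ZINB-based autoencoders; the extra mass at zero is what lets the model account for dropout events in sparse scRNA-seq counts.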
Affiliation(s)
- Xiaokun Meng
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, Shandong 266520, China
- Yuanyuan Zhang
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, Shandong 266520, China
- Xiaoyu Xu
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, Shandong 266520, China
- Kaihao Zhang
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, Shandong 266520, China
- Baoming Feng
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, Shandong 266520, China
2
Wang S, Li H, Zhang K, Wu H, Pang S, Wu W, Ye L, Su J, Zhang Y. scSID: A lightweight algorithm for identifying rare cell types by capturing differential expression from single-cell sequencing data. Comput Struct Biotechnol J 2024; 23:589-600. [PMID: 38274993 PMCID: PMC10809081 DOI: 10.1016/j.csbj.2023.12.043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 12/27/2023] [Accepted: 12/27/2023] [Indexed: 01/27/2024] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) is currently an important technology for identifying cell types and studying diseases at the genetic level. Identifying rare cell types is a biologically important downstream analysis of single-cell RNA sequencing data. Although rare cell identification methods have been developed, most of them suffer from insufficient mining of intercellular similarities, low scalability, and long running times. In this paper, we propose a single-cell similarity division algorithm (scSID) for identifying rare cells. It takes cell-to-cell similarity into consideration by analyzing both inter-cluster and intra-cluster similarities, and discovers rare cell types based on the similarity differences. We show that scSID outperforms other existing methods by benchmarking it on different experimental datasets. Application of scSID to multiple datasets, including 68K PBMC and intestine, highlights its exceptional scalability and remarkable ability to identify rare cell populations.
Affiliation(s)
- Shudong Wang
  - Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, 266580, China
- Hengxiao Li
  - Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, 266580, China
- Kuijie Zhang
  - Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, 266580, China
- Hao Wu
  - College of Information Engineering, Northwest A&F University, 712100, Yangling, China
  - School of Software, Shandong University, 250100, Jinan, China
- Shanchen Pang
  - Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, 266580, China
- Wenhao Wu
  - Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, 266580, China
- Lan Ye
  - Cancer Center, the Second Hospital of Shandong University, Jinan, 250033, China
- Jionglong Su
  - School of AI and Advanced Computing, XJTLU Entrepreneur College (Taicang), Xi'an Jiaotong-Liverpool University, Suzhou, 215123, Jiangsu, China
- Yulin Zhang
  - College of Mathematics and Systems Science, Shandong University of Science and Technology, Qingdao, 266590, China
3
Zhang W, Xu Y, Zheng X, Shen J, Li Y. Identifying cell types by lasso-constraint regularized Gaussian graphical model based on weighted distance penalty. Brief Bioinform 2024; 25:bbae572. [PMID: 39541187 PMCID: PMC11562834 DOI: 10.1093/bib/bbae572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2024] [Revised: 10/10/2024] [Accepted: 10/29/2024] [Indexed: 11/16/2024] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) technology is one of the most cost-effective and efficacious methods for revealing cellular heterogeneity and diversity. Precise identification of cell types is essential for establishing a robust foundation for downstream analyses and is a prerequisite for understanding heterogeneous mechanisms. However, the accuracy of existing methods warrants improvement, and highly accurate methods often impose stringent equipment requirements. Moreover, most unsupervised learning-based approaches are constrained by the need to input the number of cell types a priori, which limits their widespread application. In this paper, we propose a novel algorithm framework named WLGG. Initially, to capture the underlying nonlinear information, we introduce a weighted distance penalty term utilizing the Gaussian kernel function, which maps data from a low-dimensional nonlinear space to a high-dimensional linear space. We subsequently impose a Lasso constraint on the regularized Gaussian graphical model to enhance its ability to capture linear data characteristics. Additionally, we utilize the Eigengap strategy to predict the number of cell types and obtain predicted labels via spectral clustering. The experimental results on 14 test datasets demonstrate the superior clustering accuracy of the WLGG algorithm over 16 alternative methods. Furthermore, downstream analysis, including marker gene identification, pseudotime inference, and functional enrichment analysis based on the similarity matrix and predicted labels from the WLGG algorithm, substantiates the reliability of WLGG and offers valuable insights into dynamic biological processes and regulatory mechanisms.
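The eigengap strategy mentioned in this abstract can be illustrated with a generic sketch. The code below is not the WLGG algorithm: it assumes a plain Gaussian-kernel similarity matrix in place of the one WLGG learns from its regularized Gaussian graphical model, and the function name `estimate_num_clusters` and the parameters `sigma` and `max_k` are hypothetical. It picks the number of clusters as the position of the largest gap among the smallest eigenvalues of the normalized graph Laplacian.

```python
import numpy as np


def estimate_num_clusters(points, sigma=1.0, max_k=10):
    """Eigengap heuristic on a Gaussian-kernel similarity graph (generic sketch)."""
    points = np.asarray(points, dtype=float)
    # Pairwise squared Euclidean distances between all points.
    sq_dists = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    # Gaussian (RBF) kernel similarity.
    similarity = np.exp(-sq_dists / (2.0 * sigma ** 2))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    inv_sqrt_degree = 1.0 / np.sqrt(similarity.sum(axis=1))
    laplacian = np.eye(len(points)) - (
        inv_sqrt_degree[:, None] * similarity * inv_sqrt_degree[None, :]
    )
    eigenvalues = np.linalg.eigvalsh(laplacian)  # returned in ascending order
    max_k = min(max_k, len(eigenvalues) - 1)
    # gaps[i] is the jump from the (i+1)-th to the (i+2)-th smallest eigenvalue.
    gaps = np.diff(eigenvalues[: max_k + 1])
    return int(np.argmax(gaps)) + 1


# Toy data: three tight, well-separated groups of four 2-D points each.
offsets = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
toy_points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]
estimated_k = estimate_num_clusters(toy_points, sigma=1.0, max_k=6)
```

The estimated count would then be passed to spectral clustering as the number of clusters, which removes the need to supply it by hand.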
Affiliation(s)
- Wei Zhang
  - School of Mathematics and Physics, Wuhan Institute of Technology, Wuhan 430205, China
- Yaxin Xu
  - Peng Cheng Laboratory, and School of Microelectronics, Southern University of Science and Technology, Shenzhen 518055, China
- Xiaoying Zheng
  - School of Mathematics and Physics, Wuhan Institute of Technology, Wuhan 430205, China
- Juan Shen
  - School of Mathematics and Physics, Wuhan Institute of Technology, Wuhan 430205, China
- Yuanyuan Li
  - School of Mathematics and Physics, Wuhan Institute of Technology, Wuhan 430205, China
4
Liu T, Jia C, Bi Y, Guo X, Zou Q, Li F. scDFN: enhancing single-cell RNA-seq clustering with deep fusion networks. Brief Bioinform 2024; 25:bbae486. [PMID: 39373051 PMCID: PMC11456827 DOI: 10.1093/bib/bbae486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2024] [Revised: 08/07/2024] [Accepted: 09/17/2024] [Indexed: 10/08/2024] Open
Abstract
Single-cell ribonucleic acid sequencing (scRNA-seq) technology can be used to perform high-resolution analysis of the transcriptomes of individual cells. Therefore, its application has gained popularity for accurately analyzing the ever-increasing content of heterogeneous single-cell datasets. Central to interpreting scRNA-seq data is the clustering of cells to decipher transcriptomic diversity and infer cell behavior patterns. However, its complexity necessitates the application of advanced methodologies capable of resolving the inherent heterogeneity and limited gene expression characteristics of single-cell data. Herein, we introduce a novel deep learning-based algorithm for single-cell clustering, designated scDFN, which can significantly enhance the clustering of scRNA-seq data through a fusion network strategy. The scDFN algorithm applies a dual mechanism involving an autoencoder to extract attribute information and an improved graph autoencoder to capture topological nuances, integrated via a cross-network information fusion mechanism complemented by a triple self-supervision strategy. This fusion is optimized through a holistic consideration of four distinct loss functions. A comparative analysis with five leading scRNA-seq clustering methodologies across multiple datasets revealed the superiority of scDFN, as determined by better Normalized Mutual Information (NMI) and Adjusted Rand Index (ARI) scores. Additionally, scDFN demonstrated robust multi-cluster dataset performance and exceptional resilience to batch effects. Ablation studies highlighted the key roles of the autoencoder and the improved graph autoencoder components, along with the critical contribution of the four joint loss functions to the overall efficacy of the algorithm. Through these advancements, scDFN set a new benchmark in single-cell clustering and can be used as an effective tool for the nuanced analysis of single-cell transcriptomics.
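The two evaluation metrics named in this abstract, NMI and ARI, both compare a predicted clustering with reference labels and are unaffected by how the clusters are numbered. The sketch below computes them from scratch with the standard library only. The function names are illustrative, the NMI uses arithmetic-mean normalization (one of several common conventions, assumed here), and at least two cells are assumed. It is not taken from the scDFN code.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(true_labels, pred_labels):
    """ARI: pair-counting agreement corrected for chance (1.0 = identical)."""
    n = len(true_labels)
    pair_counts = Counter(zip(true_labels, pred_labels))
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (sum_pairs - expected) / (max_index - expected)


def normalized_mutual_info(true_labels, pred_labels):
    """NMI: mutual information divided by the mean of the two label entropies."""
    n = len(true_labels)
    true_counts = Counter(true_labels)
    pred_counts = Counter(pred_labels)
    pair_counts = Counter(zip(true_labels, pred_labels))
    mutual_info = sum(
        (c / n) * log(c * n / (true_counts[t] * pred_counts[p]))
        for (t, p), c in pair_counts.items()
    )
    entropy_true = -sum((c / n) * log(c / n) for c in true_counts.values())
    entropy_pred = -sum((c / n) * log(c / n) for c in pred_counts.values())
    mean_entropy = 0.5 * (entropy_true + entropy_pred)
    return 1.0 if mean_entropy == 0 else mutual_info / mean_entropy


# Same partition with swapped cluster names: both metrics give a perfect score.
ari_same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
nmi_same = normalized_mutual_info([0, 0, 1, 1], [1, 1, 0, 0])
```

Higher values indicate closer agreement with the reference labels. ARI can fall below zero when agreement is worse than expected by chance, whereas NMI stays between 0 and 1.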
Affiliation(s)
- Tianxiang Liu
  - School of Science, Dalian Maritime University, 1 Linghai Road, Dalian 116026, China
- Cangzhi Jia
  - School of Science, Dalian Maritime University, 1 Linghai Road, Dalian 116026, China
- Yue Bi
  - Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Melbourne, VIC 3800, Australia
- Xudong Guo
  - College of Information Engineering, Northwest A&F University, No. 3 Taicheng Road, Yangling, Shaanxi, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, No. 2006, Xiyuan Ave, West Hi-Tech Zone, 611731, Chengdu, Sichuan, China
- Fuyi Li
  - College of Information Engineering, Northwest A&F University, No. 3 Taicheng Road, Yangling, Shaanxi, China
  - South Australian Immunogenomics Cancer Institute, The University of Adelaide, 4 North Terrace, SA 5000, Australia
5
Wang Z, Wang H, Zhao J, Xia J, Zheng C. scVSC: Deep Variational Subspace Clustering for Single-Cell Transcriptome Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1492-1503. [PMID: 38801694 DOI: 10.1109/tcbb.2024.3405731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq) is a potent advancement for analyzing gene expression at the individual cell level, allowing for the identification of cellular heterogeneity and subpopulations. However, it suffers from technical limitations that result in sparse and heterogeneous data. Here, we propose scVSC, an unsupervised clustering algorithm built on deep representation neural networks. The method incorporates variational inference into the subspace model, which imposes regularization constraints on the latent space and further prevents overfitting. In a series of experiments across multiple datasets, scVSC outperforms existing state-of-the-art unsupervised and semi-supervised clustering tools in clustering accuracy and running efficiency. Moreover, the study indicates that scVSC can visually reveal the state of trajectory differentiation, accurately identify differentially expressed genes, and further discover biologically critical pathways.
6
Monnier L, Cournède PH. A novel batch-effect correction method for scRNA-seq data based on Adversarial Information Factorization. PLoS Comput Biol 2024; 20:e1011880. [PMID: 38386700 PMCID: PMC10914288 DOI: 10.1371/journal.pcbi.1011880] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 03/05/2024] [Accepted: 01/30/2024] [Indexed: 02/24/2024] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) technology produces an unprecedented resolution at the level of a unique cell, raising great hopes in medicine. Nevertheless, scRNA-seq data suffer from high variations due to the experimental conditions, called batch effects, preventing any aggregated downstream analysis. Adversarial Information Factorization provides a robust batch-effect correction method that does not rely on prior knowledge of the cell types nor a specific normalization strategy while being adapted to any downstream analysis task. It compares to and even outperforms state-of-the-art methods in several scenarios: low signal-to-noise ratio, batch-specific cell types with few cells, and a multi-batches dataset with imbalanced batches and batch-specific cell types. Moreover, it best preserves the relative gene expression between cell types, yielding superior differential expression analysis results. Finally, in a more complex setting of a Leukemia cohort, our method preserved most of the underlying biological information for each patient while aligning the batches, improving the clustering metrics in the aggregated dataset.
Affiliation(s)
- Lily Monnier
  - Paris-Saclay University, CentraleSupélec, Laboratory of Mathematics and Computer Science (MICS), Gif-sur-Yvette, France
- Paul-Henry Cournède
  - Paris-Saclay University, CentraleSupélec, Laboratory of Mathematics and Computer Science (MICS), Gif-sur-Yvette, France