1
Xu L, Li Z, Ren J, Liu S, Xu Y. Single-cell RNA sequencing data analysis utilizing multi-type graph neural networks. Comput Biol Med 2024; 179:108921. [PMID: 39059210 DOI: 10.1016/j.compbiomed.2024.108921] [Received: 05/05/2024] [Revised: 07/08/2024] [Accepted: 07/16/2024] [Indexed: 07/28/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq) profiles gene expression in individual cells, which facilitates research at the cellular level. However, analyzing scRNA-seq data remains difficult because of the dimensionality reduction required for massive datasets, technical noise in the data, and the challenge of visualizing cell-type clusters. In this paper, we propose scDMG, a new single-cell data analysis model that uses a denoising autoencoder and multi-type graph neural networks to learn cell-cell topology information and a latent representation of scRNA-seq data. scDMG introduces the zero-inflated negative binomial (ZINB) model into a denoising autoencoder (DAE) to perform dimensionality reduction and denoising on the raw data. scDMG then integrates multiple types of graph neural networks as the encoder to further train on the preprocessed data, which better handles various types of scRNA-seq datasets, resolves dropout events in scRNA-seq data, and enables preliminary classification of scRNA-seq data. Applying the t-SNE and PCA algorithms to the trained data and then invoking the Louvain algorithm further improves dimensionality reduction and clustering. Compared with other mainstream scRNA-seq clustering algorithms, scDMG outperforms other state-of-the-art methods on various clustering performance metrics and shows better scalability and shorter runtime.
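The ZINB model mentioned in this abstract can be illustrated with a short sketch. The abstract gives no implementation details, so this is only the standard textbook ZINB likelihood with a mean `mu`, dispersion `theta`, and dropout probability `pi`; the function names are hypothetical, and in an autoencoder setting these three parameters would typically be predicted per gene and cell by the decoder rather than fixed as they are here.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial with mean mu and dispersion theta."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(x, mu, theta, pi):
    """Log-probability of count x under a zero-inflated negative binomial.

    pi is the probability of an extra (dropout) zero, with 0 <= pi < 1.
    """
    nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        # A zero is either a dropout or a genuine zero drawn from the NB component.
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    return math.log(1.0 - pi) + nb


def zinb_nll(counts, mu, theta, pi):
    """Negative log-likelihood of a list of counts that share one parameter set (an assumption of this sketch)."""
    return -sum(zinb_log_pmf(x, mu, theta, pi) for x in counts)
```

Minimizing `zinb_nll` over the parameters is what a "ZINB loss" usually refers to; setting `pi` to 0 recovers the plain negative binomial.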
Affiliation(s)
- Li Xu
  - College of Computer Science and Technology, Harbin Engineering University, Harbin, 150001, Heilongjiang, China
- Zhenpeng Li
  - College of Computer Science and Technology, Harbin Engineering University, Harbin, 150001, Heilongjiang, China
- Jiaxu Ren
  - College of Computer Science and Technology, Harbin Engineering University, Harbin, 150001, Heilongjiang, China
- Shuaipeng Liu
  - College of Computer Science and Technology, Harbin Engineering University, Harbin, 150001, Heilongjiang, China
- Yiming Xu
  - College of Engineering, Tokyo Institute of Technology, Tokyo, 226-0026, Tokyo, Japan
2
Qiao TJ, Li F, Yuan SS, Dai LY, Wang J. A Fusion Learning Model Based on Deep Learning for Single-Cell RNA Sequencing Data Clustering. J Comput Biol 2024; 31:576-588. [PMID: 38758925 DOI: 10.1089/cmb.2024.0512] [Indexed: 05/19/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq) technology provides a means for studying biology from a cellular perspective. The fundamental goal of scRNA-seq data analysis is to discriminate single-cell types using unsupervised clustering. Despite the many recent proposals, few single-cell clustering algorithms take both deep and surface information into account. This article therefore constructs a fusion learning framework based on deep learning, named scGASI. To learn a clustering similarity matrix, scGASI integrates data affinity recovery and deep feature embedding in a unified scheme based on various top feature sets. Next, scGASI learns the low-dimensional latent representation underlying the data using a graph autoencoder to mine the hidden information residing in the data. To efficiently merge the surface information from the raw data with the deeper information from the latent representation, we then construct a fusion learning model based on self-expression. scGASI uses this fusion learning model to learn the similarity matrix of each individual feature set as well as the clustering similarity matrix of all feature sets. Lastly, gene marker identification, visualization, and clustering are accomplished using the clustering similarity matrix. Extensive verification on real data sets demonstrates that scGASI outperforms many widely used clustering techniques in terms of clustering accuracy.
Affiliation(s)
- Tian-Jing Qiao
  - School of Computer Science, Qufu Normal University, Rizhao, China
- Feng Li
  - School of Computer Science, Qufu Normal University, Rizhao, China
- Sha-Sha Yuan
  - School of Computer Science, Qufu Normal University, Rizhao, China
- Ling-Yun Dai
  - School of Computer Science, Qufu Normal University, Rizhao, China
- Juan Wang
  - School of Computer Science, Qufu Normal University, Rizhao, China
3
Ren L, Wang J, Li W, Guo M, Yu G. Single-cell RNA-seq data clustering by deep information fusion. Brief Funct Genomics 2024; 23:128-137. [PMID: 37208992 DOI: 10.1093/bfgp/elad017] [Received: 08/26/2022] [Revised: 02/13/2023] [Indexed: 05/21/2023]
Abstract
Determining cell types from single-cell transcriptomics data is fundamental for downstream analysis. However, cell clustering and data imputation still face computational challenges because of the high dropout rate, sparsity, and dimensionality of single-cell data. Although some deep learning-based solutions have been proposed to handle these challenges, they still cannot sensibly leverage gene attribute information and cell topology to obtain a consistent clustering. In this paper, we present scDeepFC, a deep information fusion-based single-cell data clustering method for cell clustering and data imputation. Specifically, scDeepFC uses a deep auto-encoder (DAE) network and a deep graph convolution network to embed high-dimensional gene attribute information and high-order cell-cell topological information into different low-dimensional representations, and then fuses them to generate a more comprehensive and accurate consensus representation via a deep information fusion network. In addition, scDeepFC integrates the zero-inflated negative binomial (ZINB) model into the DAE to model dropout events. By jointly optimizing the ZINB loss and the cell graph reconstruction loss, scDeepFC generates a salient embedding representation for clustering cells and imputing missing data. Extensive experiments on real single-cell datasets show that scDeepFC outperforms other popular single-cell analysis methods. Both the gene attribute information and the cell topology information can improve cell clustering.
Affiliation(s)
- Liangrui Ren
  - School of Software, Shandong University, 250101 Ji'nan, China
- Jun Wang
  - Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, 250101 Ji'nan, China
- Wei Li
  - School of Control Science and Engineering, Shandong University, 250061 Ji'nan, China
- Maozu Guo
  - College of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, 100044, Bei'jing, China
- Guoxian Yu
  - School of Software, Shandong University, 250101 Ji'nan, China
4
Wang X, Chai Z, Li S, Liu Y, Li C, Jiang Y, Liu Q. CTISL: a dynamic stacking multi-class classification approach for identifying cell types from single-cell RNA-seq data. Bioinformatics 2024; 40:btae063. [PMID: 38317054 PMCID: PMC10873586 DOI: 10.1093/bioinformatics/btae063] [Received: 02/15/2024] [Revised: 02/15/2024] [Accepted: 02/15/2024] [Indexed: 02/07/2024]
Abstract
MOTIVATION Effective identification of cell types is of critical importance in single-cell RNA-sequencing (scRNA-seq) data analysis. To date, many supervised machine learning-based predictors have been implemented to identify cell types from scRNA-seq datasets. Despite the technical advances of these state-of-the-art tools, most existing predictors were single classifiers, of which the performances can still be significantly improved. It is therefore highly desirable to employ the ensemble learning strategy to develop more accurate computational models for robust and comprehensive identification of cell types on scRNA-seq datasets. RESULTS We propose a two-layer stacking model, termed CTISL (Cell Type Identification by Stacking ensemble Learning), which integrates multiple classifiers to identify cell types. In the first layer, given a reference scRNA-seq dataset with known cell types, CTISL dynamically combines multiple cell-type-specific classifiers (i.e. support-vector machine and logistic regression) as the base learners to deliver the outcomes for the input of a meta-classifier in the second layer. We conducted a total of 24 benchmarking experiments on 17 human and mouse scRNA-seq datasets to evaluate and compare the prediction performance of CTISL and other state-of-the-art predictors. The experiment results demonstrate that CTISL achieves superior or competitive performance compared to these state-of-the-art approaches. We anticipate that CTISL can serve as a useful and reliable tool for cost-effective identification of cell types from scRNA-seq datasets. AVAILABILITY AND IMPLEMENTATION The webserver and source code are freely available at http://bigdata.biocie.cn/CTISLweb/home and https://zenodo.org/records/10568906, respectively.
Affiliation(s)
- Xiao Wang
  - Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling 712100, China
- Ziyi Chai
  - Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling 712100, China
- Shaohua Li
  - Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling 712100, China
- Yan Liu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
- Chen Li
  - Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Yu Jiang
  - Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
- Quanzhong Liu
  - Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling 712100, China
  - Shaanxi Engineering Research Center of Agricultural Information Intelligent Perception and Analysis, Northwest A&F University, Yangling 712100, China
5
Zhang DJ, Gao YL, Zhao JX, Zheng CH, Liu JX. A New Graph Autoencoder-Based Consensus-Guided Model for scRNA-seq Cell Type Detection. IEEE Trans Neural Netw Learn Syst 2024; 35:2473-2483. [PMID: 35857730 DOI: 10.1109/tnnls.2022.3190289] [Indexed: 06/15/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) technology is famous for providing a microscopic view to help capture cellular heterogeneity. This characteristic has advanced the field of genomics by enabling the delicate differentiation of cell types. However, the properties of single-cell datasets, such as high dropout events, noise, and high dimensionality, are still a research challenge in the single-cell field. To utilize single-cell data more efficiently and to better explore the heterogeneity among cells, a new graph autoencoder (GAE)-based consensus-guided model (scGAC) is proposed in this article. The data are preprocessed into multiple top-level feature datasets. Then, feature learning is performed by using GAEs to generate new feature matrices, followed by similarity learning based on distance fusion methods. The learned similarity matrices are fed back to the GAEs to guide their feature learning process. Finally, the abovementioned steps are iterated continuously to integrate the final consistent similarity matrix and perform other related downstream analyses. The scGAC model can accurately identify critical features and effectively preserve the internal structure of the data. This can further improve the accuracy of cell type identification.
6
Jiang H, Huang Y, Li Q, Feng B. ScLSTM: single-cell type detection by siamese recurrent network and hierarchical clustering. BMC Bioinformatics 2023; 24:417. [PMID: 37932672 PMCID: PMC10629177 DOI: 10.1186/s12859-023-05494-8] [Received: 03/30/2023] [Accepted: 09/21/2023] [Indexed: 11/08/2023]
Abstract
MOTIVATION Categorizing cells into distinct types can shed light on biological tissue functions and interactions, and uncover specific mechanisms under pathological conditions. Since gene expression throughout a population of cells is averaged out by conventional sequencing techniques, it is challenging to distinguish between different cell types. The accumulation of single-cell RNA sequencing (scRNA-seq) data provides the foundation for a more precise classification of cell types. It is crucial to build a high-accuracy clustering approach to categorize cell types, since the imbalance of cell types and differences in the distribution of scRNA-seq data affect single-cell clustering and visualization outcomes. RESULT To achieve single-cell type detection, we propose a meta-learning-based single-cell clustering model called ScLSTM. Specifically, ScLSTM transforms the single-cell type detection problem into a hierarchical classification problem based on feature extraction by the siamese long short-term memory (LSTM) network. The similarity matrix derived from the improved sigmoid kernel is mapped to the siamese LSTM feature space to analyze the differences between cells. ScLSTM demonstrated superior classification performance on 8 scRNA-seq data sets from different platforms, species, and tissues. Further quantitative analysis and visualization of the human breast cancer data set validated the superiority and capability of ScLSTM in recognizing cell types.
Affiliation(s)
- Hanjing Jiang
  - Key Laboratory of Image Information Processing and Intelligent Control of Education Ministry of China, Institute of Artificial Intelligence, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China
- Yabing Huang
  - Department of Pathology, Renmin Hospital of Wuhan University, Wuhan, 430060, China
- Qianpeng Li
  - Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
- Boyuan Feng
  - Key Laboratory of Image Information Processing and Intelligent Control of Education Ministry of China, Institute of Artificial Intelligence, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China
7
Liu H, Tang T. MAPK signaling pathway-based glioma subtypes, machine-learning risk model, and key hub proteins identification. Sci Rep 2023; 13:19055. [PMID: 37925483 PMCID: PMC10625624 DOI: 10.1038/s41598-023-45774-0] [Received: 04/24/2023] [Accepted: 10/24/2023] [Indexed: 11/06/2023]
Abstract
An early diagnosis and precise prognosis are critical for the treatment of glioma. The mitogen‑activated protein kinase (MAPK) signaling pathway potentially affects glioma, but the exploration of the clinical values of the pathway remains lacking. We accessed data from TCGA, GTEx, CGGA, etc. Up-regulated MAPK signaling pathway genes in glioma were identified and used to cluster the glioma subtypes using consensus clustering. The subtype differences in survival, cancer stemness, and the immune microenvironment were analyzed. A prognostic model was trained with the identified genes using the LASSO method and was validated with three external cohorts. The correlations between the risk model and cancer-associated signatures in cancer were analyzed. Key hub genes of the gene set were identified by hub gene analysis and survival analysis. 47% of the MAPK signaling pathway genes were overexpressed in glioma. Subtypes based on these genes were distinguished in survival, cancer stemness, and the immune microenvironment. A risk model was calculated with high confidence in the prediction of overall survival and was correlated with multiple cancer-associated signatures. 12 hub genes were identified and 8 of them were associated with survival. The MAPK signaling pathway was overexpressed in glioma with prognostic value.
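A LASSO-derived prognostic model of the kind described in this abstract is commonly applied as a linear risk score: each selected gene's expression is weighted by its fitted coefficient and the products are summed. The sketch below illustrates only that scoring step. The gene names, coefficients, and expression values are hypothetical, and the median cutoff used to split patients is an assumption of this sketch, not something the abstract specifies.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Weighted sum of expression values over the genes retained by the model."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def stratify(scores):
    """Label each patient 'high' or 'low' risk relative to the median score (assumed cutoff)."""
    cutoff = median(scores.values())
    return {pid: ("high" if score > cutoff else "low") for pid, score in scores.items()}


# Hypothetical coefficients, standing in for the output of a fitted LASSO Cox model.
coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}

# Hypothetical normalized expression values for four patients.
patients = {
    "P1": {"GENE_A": 2.0, "GENE_B": 4.0},
    "P2": {"GENE_A": 4.0, "GENE_B": 0.0},
    "P3": {"GENE_A": 2.0, "GENE_B": 0.0},
    "P4": {"GENE_A": 0.0, "GENE_B": 4.0},
}

scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
groups = stratify(scores)
```

The resulting groups would then be compared with survival analysis, for example Kaplan-Meier curves, which is outside the scope of this sketch.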
Affiliation(s)
- Hengrui Liu
  - Xinkaiyuan Pharmaceuticals, Beijing, China
  - Guangzhou Regenerative Medicine Research Center, Future Homo Sapiens Institute of Regenerative Medicine Co., Ltd (FHIR), Guangzhou, China
- Tao Tang
  - Department of Molecular Diagnostics, Sun Yat-Sen University Cancer Center, State Key Laboratory of Oncology in South China, Guangzhou, China
  - Collaborative Innovation Center for Cancer Medicine, Guangzhou, China
8
Liu T, Hu A, Chen H, Li Y, Wang Y, Guo Y, Liu T, Zhou J, Li D, Chen Q. Comprehensive analysis identifies DNA damage repair-related gene HCLS1 associated with good prognosis in lung adenocarcinoma. Transl Cancer Res 2023; 12:2613-2628. [PMID: 37969376 PMCID: PMC10643974 DOI: 10.21037/tcr-23-921] [Received: 05/29/2023] [Accepted: 09/21/2023] [Indexed: 11/17/2023]
Abstract
Background Lung cancer is the leading cause of cancer-associated mortality. Lung adenocarcinoma (LUAD) accounts for more than 40% of all lung malignancies. Therefore, developing clinically useful biomarkers for this disease is critical. DNA damage repair (DDR) is a complicated signal transduction process that ensures genomic stability. DDR should be comprehensively analyzed to elucidate its clinical significance and its interactions with the tumor immune microenvironment. Methods In this study, DDR-related genes (DRGs) were selected to investigate their prognostic impact on LUAD. A regression-based prognostic model was established based on The Cancer Genome Atlas (TCGA)-LUAD cohort and three external Gene Expression Omnibus (GEO) validation cohorts (GSE31210, GSE68465, and GSE72094). The robust, established model could independently predict the clinical outcomes in patients. Then, the prognostic performance of risk profiles was assessed using a time-dependent receiver operating characteristic (ROC) curve, Cox regression, nomogram, and Kaplan-Meier analyses. Furthermore, the potential biological functions and infiltration status of DRGs in LUAD were investigated with ESTIMATE and CIBERSORT. Finally, the effects of HCLS1 on the clinical features, prognosis, biological function, immune infiltration, and treatment response in LUAD were systematically analyzed. Results Eleven DRGs were used to categorize patients into high- and low-risk groups. The risk score was an independent predictor of overall survival (OS). HCLS1 expression was downregulated in LUAD samples and linked with clinicopathological features. Multivariate Cox regression analysis using the Kaplan-Meier plotter revealed that low HCLS1 expression was independently associated with poor OS. Moreover, the HCLS1 high-expression group had higher immune-related gene expression and ESTIMATE scores. It was positively correlated with the infiltration of M1 macrophages, activated memory CD4 T cells, CD8 T cells, memory B cells, resting dendritic cells, memory CD4 T cells, Tregs, and neutrophils. Conclusions A new classification system was developed for LUAD according to DDR characteristics. This stratification has important clinical value for prognosis and immunotherapy in patients with LUAD. Moreover, HCLS1 is a potential prognostic biomarker of LUAD that correlates with the extent of immune cell infiltration in the tumor microenvironment (TME).
Affiliation(s)
- Tingjun Liu
  - Center of Animal Laboratory, Xuzhou Medical University, Xuzhou, China
- Ankang Hu
  - Center of Animal Laboratory, Xuzhou Medical University, Xuzhou, China
- Hao Chen
  - Respiratory Department, The Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Yan Li
  - Respiratory Department, The Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Yonghui Wang
  - School of Life Sciences, Xuzhou Medical University, Xuzhou, China
- Yao Guo
  - School of Life Sciences, Xuzhou Medical University, Xuzhou, China
- Tingya Liu
  - Department of Neurology, The Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Jie Zhou
  - The Second Clinical College of Xuzhou Medical University, Xuzhou, China
- Debao Li
  - School of Imaging, Xuzhou Medical University, Xuzhou, China
- Quangang Chen
  - School of Life Sciences, Xuzhou Medical University, Xuzhou, China
9
Lei T, Chen R, Zhang S, Chen Y. Self-supervised deep clustering of single-cell RNA-seq data to hierarchically detect rare cell populations. Brief Bioinform 2023; 24:bbad335. [PMID: 37769630 PMCID: PMC10539043 DOI: 10.1093/bib/bbad335] [Received: 06/09/2023] [Revised: 09/05/2023] [Accepted: 09/06/2023] [Indexed: 10/02/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) is a widely used technique for characterizing individual cells and studying gene expression at the single-cell level. Clustering plays a vital role in grouping similar cells together for various downstream analyses. However, the high sparsity and dimensionality of large scRNA-seq data pose challenges to clustering performance. Although several deep learning-based clustering algorithms have been proposed, most existing clustering methods have limitations in capturing the precise distribution types of the data or fully utilizing the relationships between cells, leaving a considerable scope for improving the clustering performance, particularly in detecting rare cell populations from large scRNA-seq data. We introduce DeepScena, a novel single-cell hierarchical clustering tool that fully incorporates nonlinear dimension reduction, negative binomial-based convolutional autoencoder for data fitting, and a self-supervision model for cell similarity enhancement. In comprehensive evaluation using multiple large-scale scRNA-seq datasets, DeepScena consistently outperformed seven popular clustering tools in terms of accuracy. Notably, DeepScena exhibits high proficiency in identifying rare cell populations within large datasets that contain large numbers of clusters. When applied to scRNA-seq data of multiple myeloma cells, DeepScena successfully identified not only previously labeled large cell types but also subpopulations in CD14 monocytes, T cells and natural killer cells, respectively.
Affiliation(s)
- Tianyuan Lei
  - College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Ruoyu Chen
  - Moorestown High School, Moorestown, NJ 08057, USA
- Shaoqiang Zhang
  - College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Yong Chen
  - Department of Biological and Biomedical Sciences, Rowan University, NJ 08028, USA
10
Li Y, Wu M, Ma S, Wu M. ZINBMM: a general mixture model for simultaneous clustering and gene selection using single-cell transcriptomic data. Genome Biol 2023; 24:208. [PMID: 37697330 PMCID: PMC10496184 DOI: 10.1186/s13059-023-03046-0] [Received: 07/11/2022] [Accepted: 08/22/2023] [Indexed: 09/13/2023]
Abstract
Clustering is a critical component of single-cell RNA sequencing (scRNA-seq) data analysis and can help reveal cell types and infer cell lineages. Despite considerable successes, there are few methods tailored to investigating cluster-specific genes contributing to cell heterogeneity, which can promote biological understanding of cell heterogeneity. In this study, we propose a zero-inflated negative binomial mixture model (ZINBMM) that simultaneously achieves effective scRNA-seq data clustering and gene selection. ZINBMM conducts a systemic analysis on raw counts, accommodating both batch effects and dropout events. Simulations and the analysis of five scRNA-seq datasets demonstrate the practical applicability of ZINBMM.
Affiliation(s)
- Yang Li
  - Center for Applied Statistics and School of Statistics, Renmin University of China, Beijing, China
  - RSS and China-Re Life Joint Lab on Public Health and Risk Management, Renmin University of China, Beijing, China
  - Statistical Consulting Center, Renmin University of China, Beijing, China
- Mingcong Wu
  - Center for Applied Statistics and School of Statistics, Renmin University of China, Beijing, China
  - Statistical Consulting Center, Renmin University of China, Beijing, China
- Shuangge Ma
  - Department of Biostatistics, Yale University, New Haven, USA
- Mengyun Wu
  - School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
11
Wang S, Zhang Y, Zhang Y, Wu W, Ye L, Li Y, Su J, Pang S. scASGC: An adaptive simplified graph convolution model for clustering single-cell RNA-seq data. Comput Biol Med 2023; 163:107152. [PMID: 37364529 DOI: 10.1016/j.compbiomed.2023.107152] [Received: 04/24/2023] [Revised: 05/24/2023] [Accepted: 06/07/2023] [Indexed: 06/28/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) is now a successful technique for identifying cellular heterogeneity, revealing novel cell subpopulations, and forecasting developmental trajectories. A crucial component of the processing of scRNA-seq data is the precise identification of cell subpopulations. Although many unsupervised clustering methods have been developed to cluster cell subpopulations, the performance of these methods is vulnerable to dropouts and high dimensionality. In addition, most existing methods are time-consuming and fail to adequately account for potential associations between cells. In this manuscript, we present an unsupervised clustering method based on an adaptive simplified graph convolution model called scASGC. The proposed method builds plausible cell graphs, aggregates neighbor information using a simplified graph convolution model, and adaptively determines the optimal number of convolution layers for various graphs. Experiments on 12 public datasets show that scASGC outperforms both classical and state-of-the-art clustering methods. In addition, in a study of mouse intestinal muscle containing 15,983 cells, we identified distinct marker genes based on the clustering results of scASGC. The source code of scASGC is available at https://github.com/ZzzOctopus/scASGC.
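The neighbor aggregation step can be illustrated with the generic simplified graph convolution, in which features are multiplied k times by the symmetrically normalized adjacency matrix with self-loops and no nonlinearity is applied between layers. This is a minimal sketch of that generic operation only, not the scASGC code: the function names are hypothetical, k is passed in by hand, and the adaptive choice of the number of layers described in the abstract is not reproduced.

```python
import math


def normalized_adjacency(adj):
    """Return D^(-1/2) (A + I) D^(-1/2) for a non-negative symmetric adjacency matrix A."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Self-loops guarantee every degree is at least 1, so the division below is safe.
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain dense matrix product of two lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def sgc_features(adj, features, k):
    """Propagate cell features over the cell graph k times (k = 0 returns them unchanged)."""
    s = normalized_adjacency(adj)
    out = features
    for _ in range(k):
        out = matmul(s, out)
    return out
```

For two connected cells with features 1 and 3, one propagation step gives both cells the value 2, which shows how neighboring cells are smoothed toward each other. The smoothed features would then be passed to a clustering algorithm.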
Affiliation(s)
- Shudong Wang
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum, Qingdao, 266580, China
- Yu Zhang
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum, Qingdao, 266580, China
- Yulin Zhang
  - College of Mathematics and Systems Science, Shandong University of Science and Technology, Qingdao, 266590, China
- Wenhao Wu
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum, Qingdao, 266580, China
- Lan Ye
  - Cancer Center, the Second Hospital of Shandong University, Jinan, 250033, China
- YunYin Li
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum, Qingdao, 266580, China
- Jionglong Su
  - School of AI and Advanced Computing, XJTLU Entrepreneur College (Taicang), Xi'an Jiaotong-Liverpool University, Suzhou, 215123, China
- Shanchen Pang
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum, Qingdao, 266580, China
12
Nie X, Qin D, Zhou X, Duo H, Hao Y, Li B, Liang G. Clustering ensemble in scRNA-seq data analysis: Methods, applications and challenges. Comput Biol Med 2023; 159:106939. [PMID: 37075602 DOI: 10.1016/j.compbiomed.2023.106939] [Received: 01/20/2023] [Revised: 03/31/2023] [Accepted: 04/14/2023] [Indexed: 04/21/2023]
Abstract
With the rapid development of single-cell RNA-sequencing techniques, various computational methods and tools have been proposed to analyze these high-throughput data, which has accelerated the discovery of potential biological information. As one of the core steps of single-cell transcriptome data analysis, clustering plays a crucial role in identifying cell types and interpreting cellular heterogeneity. However, the results generated by different clustering methods differ, and such unstable partitions can affect the accuracy of the analysis to a certain extent. To overcome this challenge and obtain more accurate results, clustering ensembles are now frequently applied to the cluster analysis of single-cell transcriptome datasets, and the results they generate are generally more reliable than those from most single clustering partitions. In this review, we summarize applications and challenges of the clustering ensemble method in single-cell transcriptome data analysis, and provide constructive thoughts and references for researchers in this field.
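One common way to build a clustering ensemble is through a co-association matrix: for every pair of cells, count the fraction of base partitions that put them in the same cluster, then group cells whose agreement is high. The review abstract does not single out a specific method, so the sketch below is only an illustration of this generic idea; the function names and the 0.5 threshold are assumptions.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of cells shares a cluster label."""
    n = len(partitions[0])
    m = len(partitions)
    return [
        [sum(p[i] == p[j] for p in partitions) / m for j in range(n)]
        for i in range(n)
    ]


def consensus_clusters(partitions, threshold=0.5):
    """Link cells co-clustered in more than `threshold` of the runs and return connected components."""
    co = co_association(partitions)
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first search over the thresholded agreement graph.
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three hypothetical base partitions of four cells; the label values are arbitrary per run.
partitions = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
consensus = consensus_clusters(partitions)
```

Because only pairwise agreement is counted, the first two partitions reinforce each other even though their label values are swapped, and the one disagreeing run is outvoted.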
Affiliation(s)
- Xiner Nie
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400044, China; College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China
- Dan Qin
  - Department of Biology, College of Science, Northeastern University, Boston, MA, 02115, USA
- Xinyi Zhou
  - College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China
- Hongrui Duo
  - College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China
- Youjin Hao
  - College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China
- Bo Li
  - College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China
- Guizhao Liang
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400044, China
13
Su Y, Lin R, Wang J, Tan D, Zheng C. Denoising adaptive deep clustering with self-attention mechanism on single-cell sequencing data. Brief Bioinform 2023; 24:7008799. [PMID: 36715275 DOI: 10.1093/bib/bbad021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2022] [Revised: 12/20/2022] [Accepted: 01/05/2023] [Indexed: 01/31/2023]
Abstract
A large number of studies have used single-cell RNA sequencing (scRNA-seq) to study the diversity and biological functions of cells at the single-cell level. Clustering identifies unknown cell types, which is essential for downstream analysis of scRNA-seq samples. However, the high dimensionality, high noise and pervasive dropout of scRNA-seq samples pose a significant challenge to cluster analysis. Herein, we propose a new adaptive fuzzy clustering model based on a denoising autoencoder and a self-attention mechanism, called scDASFK. It uses comparative learning to integrate cell similarity information into the clustering method and a deep denoising network module to denoise the data. scDASFK also includes a self-attention mechanism for further denoising, together with an adaptive clustering optimization function for iterative clustering. To make the denoised latent features better reflect the cell structure, we introduce a new adaptive feedback mechanism that supervises the denoising process through the clustering results. Experiments on 16 real scRNA-seq datasets show that scDASFK performs well in terms of clustering accuracy, scalability and stability. Overall, scDASFK is an effective clustering model with great potential for the analysis of scRNA-seq samples. The scDASFK code is freely available at https://github.com/LRX2022/scDASFK.
Affiliation(s)
- Yansen Su
  - Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei, 230601, China
- Rongxin Lin
  - School of Computer Science and Technology, Anhui University, Hefei, 230601, China
- Jing Wang
  - School of Computer Science and Technology, Anhui University, Hefei, 230601, China
- Dayu Tan
  - Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, China
- Chunhou Zheng
  - Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei, 230601, China
|
14
|
Liu Q, Zhao X, Wang G. A Clustering Ensemble Method for Cell Type Detection by Multiobjective Particle Optimization. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:1-14. [PMID: 34860653 DOI: 10.1109/tcbb.2021.3132400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) is a new technology that differs from previous sequencing methods, which measure the average expression level of each gene across a large population of cells. Thus, new computational methods are required to reveal cell types among cell populations. We present a clustering ensemble algorithm using optimized multiobjective particle (CEMP). It features several mechanisms: 1) A multi-subspace projection method for mapping the original data to low-dimensional subspaces is applied in order to detect complex data structure at both the gene level and the sample level. 2) The basic partition module in different subspaces is utilized to generate clustering solutions. 3) A transforming representation between clusters and particles is used to bridge the gap between the discrete clustering ensemble optimization problem and the continuous multiobjective optimization algorithm. 4) We propose a clustering ensemble optimization. To guide the multiobjective ensemble optimization process, three cluster metrics are embedded into CEMP as objective functions, against which the final clustering is dynamically evaluated. Experiments on 9 real scRNA-seq datasets indicated that CEMP had superior performance over several other clustering algorithms in clustering accuracy and robustness. The case study conducted on mouse neuronal cells successfully identified the main cell types and cell subtypes.
|
15
|
Sardoo AM, Zhang S, Ferraro TN, Keck TM, Chen Y. Decoding brain memory formation by single-cell RNA sequencing. Brief Bioinform 2022; 23:6713514. [PMID: 36156112 PMCID: PMC9677489 DOI: 10.1093/bib/bbac412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Revised: 07/10/2022] [Accepted: 08/25/2022] [Indexed: 12/14/2022]
Abstract
To understand how distinct memories are formed and stored in the brain is an important and fundamental question in neuroscience and computational biology. A population of neurons, termed engram cells, represents the physiological manifestation of a specific memory trace and is characterized by dynamic changes in gene expression, which in turn alters the synaptic connectivity and excitability of these cells. Recent applications of single-cell RNA sequencing (scRNA-seq) and single-nucleus RNA sequencing (snRNA-seq) are promising approaches for delineating the dynamic expression profiles in these subsets of neurons, and thus understanding memory-specific genes, their combinatorial patterns and regulatory networks. The aim of this article is to review and discuss the experimental and computational procedures of sc/snRNA-seq, new studies of molecular mechanisms of memory aided by sc/snRNA-seq in human brain diseases and related mouse models, and computational challenges in understanding the regulatory mechanisms underlying long-term memory formation.
Affiliation(s)
- Atlas M Sardoo
  - Department of Biological & Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA
- Shaoqiang Zhang
  - College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Thomas N Ferraro
  - Department of Biomedical Sciences, Cooper Medical School of Rowan University, Camden, NJ 08103, USA
- Thomas M Keck
  - Department of Biological & Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA
  - Department of Chemistry & Biochemistry, Rowan University, Glassboro, NJ 08028, USA
- Yong Chen
  - Corresponding author. Yong Chen, Department of Biological and Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA. Tel.: +1 856 256 4500; E-mail:
|
16
|
Zou G, Lin Y, Han T, Ou-Yang L. DEMOC: a deep embedded multi-omics learning approach for clustering single-cell CITE-seq data. Brief Bioinform 2022; 23:6679449. [PMID: 36047285 DOI: 10.1093/bib/bbac347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2022] [Revised: 07/04/2022] [Accepted: 07/26/2022] [Indexed: 11/13/2022]
Abstract
Advances in single-cell RNA sequencing (scRNA-seq) technologies have provided an unprecedented opportunity for cell-type identification. As clustering is an effective strategy for cell-type identification, various computational approaches have been proposed for clustering scRNA-seq data. Recently, with the emergence of cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq), the cell surface expression of specific proteins and the RNA expression of the same cell can be captured together, which provides more comprehensive information for cell analysis. However, existing single-cell clustering algorithms are mainly designed for single-omic data, and have difficulty handling multi-omics data with diverse characteristics efficiently. In this study, we propose a novel deep embedded multi-omics clustering with collaborative training (DEMOC) model to perform joint clustering on CITE-seq data. Our model takes into account the characteristics of transcriptomic and proteomic data, and makes effective use of the consistent and complementary information provided by different data sources. Experimental results on two real CITE-seq datasets demonstrate that our DEMOC model not only outperforms state-of-the-art single-omic clustering methods, but also achieves better and more stable performance than existing multi-omics clustering methods. We also apply our model to three scRNA-seq datasets to assess its performance in rare cell-type identification, novel cell-subtype detection and cellular heterogeneity analysis. Experimental results illustrate the effectiveness of our model in discovering the underlying patterns of data.
Affiliation(s)
- Guanhua Zou
  - Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy(SZ), College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China
- Yilong Lin
  - Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy(SZ), College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China
- Tianyang Han
  - Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy(SZ), College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China
- Le Ou-Yang
  - Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy(SZ), College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China
  - Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, 518129, China
|
17
|
Zhang S, Xie L, Cui Y, Carone BR, Chen Y. Detecting Fear-Memory-Related Genes from Neuronal scRNA-seq Data by Diverse Distributions and Bhattacharyya Distance. Biomolecules 2022; 12:1130. [PMID: 36009024 PMCID: PMC9405875 DOI: 10.3390/biom12081130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2022] [Revised: 08/12/2022] [Accepted: 08/15/2022] [Indexed: 11/16/2022]
Abstract
The detection of differentially expressed genes (DEGs) is one of the most important computational challenges in the analysis of single-cell RNA sequencing (scRNA-seq) data. However, due to the high heterogeneity and dropout noise inherent in scRNA-seq data, detecting DEGs with a single distribution of gene expression levels remains difficult, leaving much room to improve the precision and robustness of current DEG detection methods. Here, we propose a new method, DEGman, which utilizes several possible diverse distributions in combination with the Bhattacharyya distance. DEGman automatically selects the best-fitting distributions of gene expression levels, and then detects DEGs by permutation testing of the Bhattacharyya distances between the selected distributions from two cell groups. Compared with several popular DEG analysis tools on both large-scale simulation data and real scRNA-seq data, DEGman shows an overall improvement in the balance of sensitivity and precision. We applied DEGman to scRNA-seq data of TRAP; Ai14 mouse neurons to detect fear-memory-related genes that are significantly differentially expressed between neurons with and without fear memory. DEGman detected well-known fear-memory-related genes and many novel candidates. Interestingly, we found 25 DEGs shared by five neuron clusters that are functionally enriched for synaptic vesicles, indicating that the coupled dynamics of synaptic vesicles across neurons plays a critical role in remote memory formation. The proposed method leverages the advantage of diverse distributions in DEG analysis, exhibiting better performance in analyzing composite scRNA-seq datasets in real applications.
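The testing step described in this abstract can be sketched roughly as follows. This is a simplified illustration, not DEGman itself: it swaps the fitted best-fitting distributions for plain empirical histograms, and the function names, bin count, and number of permutations are all assumptions.

```python
import math
import random


def histogram(values, lo, width, n_bins):
    """Normalized histogram of `values` on n_bins equal-width bins starting at `lo`."""
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
        counts[idx] += 1
    return [c / len(values) for c in counts]


def bhattacharyya_distance(p, q, eps=1e-12):
    """D_B = -ln(sum_i sqrt(p_i * q_i)) for two discrete distributions."""
    coefficient = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return -math.log(max(coefficient, eps))  # eps avoids log(0) for disjoint supports


def permutation_test(group_a, group_b, n_bins=10, n_perm=1000, seed=0):
    """Return the observed distance and a permutation p-value for one gene."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    lo, hi = min(pooled), max(pooled)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal

    def statistic(a, b):
        return bhattacharyya_distance(
            histogram(a, lo, width, n_bins), histogram(b, lo, width, n_bins)
        )

    observed = statistic(group_a, group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign cells to the two groups at random
        if statistic(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

A gene whose expression is clearly separated between the two cell groups yields a large observed distance and a small p-value, while shuffled group labels rarely reproduce such a distance.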
Affiliation(s)
- Shaoqiang Zhang
  - Department of Computer Science, College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Linjuan Xie
  - Department of Computer Science, College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Yaxuan Cui
  - Department of Computer Science, College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Benjamin R. Carone
  - Department of Biology and Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA
- Yong Chen
  - Department of Biology and Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA
  - Correspondence: ; Tel.: +1-856-256-4500
|
18
|
Ding Q, Yang W, Luo M, Xu C, Xu Z, Pang F, Cai Y, Anashkina AA, Su X, Chen N, Jiang Q. CBLRR: a cauchy-based bounded constraint low-rank representation method to cluster single-cell RNA-seq data. Brief Bioinform 2022; 23:6649282. [DOI: 10.1093/bib/bbac300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2022] [Revised: 06/17/2022] [Accepted: 07/02/2022] [Indexed: 11/14/2022]
Abstract
The rapid development of single-cell RNA sequencing (scRNA-seq) technology provides unprecedented opportunities for exploring biological phenomena at the single-cell level. The discovery of cell types is one of the major applications for researchers exploring the heterogeneity of cells. Some computational methods have been proposed to solve the problem of scRNA-seq data clustering. However, unavoidable technical noise and notorious dropouts reduce the accuracy of clustering methods. Here, we propose the Cauchy-based bounded constraint low-rank representation (CBLRR), a low-rank representation-based method that introduces the Cauchy loss function (CLF) and bounded nuclear norm regulation to alleviate the above issue. Specifically, as an effective loss function, the CLF is proven to enhance the robustness of cell-type identification. Then, we adopt the bounded constraint to keep the entry values of single-cell data within the restricted interval. Finally, the performance of CBLRR is evaluated on 15 scRNA-seq datasets and compared with other state-of-the-art methods. The experimental results demonstrate that CBLRR performs accurately and robustly when clustering scRNA-seq data. Furthermore, CBLRR is an effective tool for clustering cells, and provides great potential for downstream analysis of single-cell data. The source code of CBLRR is available online at https://github.com/Ginnay/CBLRR.
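To illustrate why a Cauchy loss is considered robust to noise and dropouts, the sketch below compares it with the squared loss. It assumes the standard form ρ(r) = (c²/2)·ln(1 + (r/c)²); the exact formulation and scale parameter used in CBLRR are not given in the abstract, so `c` and the function names are hypothetical.

```python
import math


def cauchy_loss(residual, c=1.0):
    """Standard Cauchy (Lorentzian) loss: grows logarithmically for large residuals."""
    return 0.5 * c * c * math.log1p((residual / c) ** 2)


def squared_loss(residual):
    """Ordinary squared loss: grows quadratically, so outliers dominate the objective."""
    return 0.5 * residual ** 2


# Small residuals are penalized almost identically; large ones are strongly damped.
for r in (0.1, 1.0, 10.0, 100.0):
    print(f"r={r:>6}: squared={squared_loss(r):10.3f}  cauchy={cauchy_loss(r):7.3f}")
```

Because the penalty on large residuals grows only logarithmically, a few extreme entries contribute far less to the objective than they would under a squared loss.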
Affiliation(s)
- Qian Ding
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Wenyi Yang
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Meng Luo
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Chang Xu
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Zhaochun Xu
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Fenglan Pang
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Yideng Cai
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Anastasia A Anashkina
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Xi Su
  - Foshan Maternity & Child Healthcare Hospital, Southern Medical University, Foshan, Guangdong, China
- Na Chen
  - Department of Hematology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, Shandong, China
- Qinghua Jiang
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
|
19
|
Liu Q, Luo X, Li J, Wang G. scESI: evolutionary sparse imputation for single-cell transcriptomes from nearest neighbor cells. Brief Bioinform 2022; 23:6580519. [PMID: 35512331 DOI: 10.1093/bib/bbac144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2022] [Revised: 03/14/2022] [Accepted: 03/31/2022] [Indexed: 02/01/2023]
Abstract
The ubiquitous dropout problem in single-cell RNA sequencing technology causes a large amount of noise in gene expression profiles. For this reason, we propose an evolutionary sparse imputation (ESI) algorithm for single-cell transcriptomes, which constructs a sparse representation model based on gene regulation relationships between cells. To solve this model, we design an optimization framework based on nondominated sorting genetics. This framework takes into account the topological relationship between cells and the variety of gene expression to iteratively search for the globally optimal solution, thereby learning the Pareto optimal cell-cell affinity matrix. Finally, we use the learned sparse relationship model between cells to improve data quality and reduce data noise. On simulated datasets, scESI performed significantly better than benchmark methods on various metrics. By applying scESI to real scRNA-seq datasets, we found that scESI can not only further classify cell types and successfully separate cells in visualization, but also improve performance in reconstructing differentiation trajectories and identifying differentially expressed genes. In addition, scESI successfully recovered the expression trends of marker genes in stem cell differentiation and can discover new cell types and putative pathways regulating biological processes.
Affiliation(s)
- Qiaoming Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Ximei Luo
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Jie Li
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Guohua Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
|
20
|
Tian Q, Zou J, Tang J, Liang L, Cao X, Fan S. scMelody: An Enhanced Consensus-Based Clustering Model for Single-Cell Methylation Data by Reconstructing Cell-to-Cell Similarity. Front Bioeng Biotechnol 2022; 10:842019. [PMID: 35284424 PMCID: PMC8905497 DOI: 10.3389/fbioe.2022.842019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2021] [Accepted: 01/24/2022] [Indexed: 11/13/2022]
Abstract
Single-cell DNA methylation sequencing technology has brought new perspectives for investigating epigenetic heterogeneity, creating a need for computational methods that cluster cells based on single-cell methylation profiles. Although several methods have been developed, most of them cluster cells based on a single (dis)similarity measure, failing to capture complete cell heterogeneity and resulting in locally optimal solutions. Here, we present scMelody, which utilizes an enhanced consensus-based clustering model to reconstruct cell-to-cell methylation similarity patterns and identifies cell subpopulations by leveraging information from multiple basic similarity measures. In addition, benefiting from the reconstructed cell-to-cell similarity measure, scMelody can conveniently use clustering validation criteria to determine the optimal number of clusters. Assessments on distinct real datasets showed that scMelody accurately recapitulated methylation subpopulations and outperformed existing methods in terms of both cluster partitions and the number of clusters. Moreover, when the clustering stability of scMelody was benchmarked on a variety of synthetic datasets, it achieved significant clustering performance gains over existing methods and robustly maintained its clustering accuracy over a wide range of cell numbers, cluster numbers and CpG dropout proportions. Finally, real case studies demonstrated the capability of scMelody to assess known cell types and uncover novel cell clusters.
Affiliation(s)
- Qi Tian
  - School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Jianxiao Zou
  - School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China
  - Intelligent Terminal Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu, China
  - Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Shenzhen, China
- Jianxiong Tang
  - School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Liang Liang
  - Cancer Center, Sichuan Provincial People’s Hospital, University of Electronic Science and Technology of China, Chengdu, China
- Xiaohong Cao
  - Department of Geriatric Endocrinology, Sichuan Provincial People’s Hospital, University of Electronic Science and Technology of China, Chengdu, China
- Shicai Fan
  - School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China
  - Intelligent Terminal Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu, China
  - Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Shenzhen, China
  - *Correspondence: Shicai Fan,
|
21
|
Ge J, King JL, Smuts A, Budowle B. Precision DNA Mixture Interpretation with Single-Cell Profiling. Genes (Basel) 2021; 12:1649. [PMID: 34828255 PMCID: PMC8623868 DOI: 10.3390/genes12111649] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/29/2021] [Revised: 10/14/2021] [Accepted: 10/14/2021] [Indexed: 11/16/2022]
Abstract
Wet-lab based studies have exploited emerging single-cell technologies to address the challenges of interpreting forensic mixture evidence. However, little effort has been dedicated to developing a systematic approach to interpreting the single-cell profiles derived from the mixtures. This study is the first attempt to develop a comprehensive interpretation workflow in which single-cell profiles from mixtures are interpreted individually and holistically. In this approach, the genotypes from each cell are assessed, the number of contributors (NOC) of the single-cell profiles is estimated, a consensus profile of each contributor is developed, and finally the consensus profile(s) can be used for a DNA database search or compared with known profiles to determine their potential sources. The potential of this single-cell interpretation workflow was assessed by simulation with various mixture scenarios and empirical allele drop-out and drop-in rates, evaluating the accuracy of estimating the NOC, the accuracy of recovering the true alleles by consensus, and the capability of deconvolving mixtures with related contributors. The results support that single-cell based mixture interpretation can provide a precision that cannot be achieved with current standard CE-STR analyses. A new paradigm for mixture interpretation is available to enhance the interpretation of forensic genetic casework.
Affiliation(s)
- Jianye Ge
  - Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX 76107, USA
  - Department of Microbiology, Immunology and Genetics, University of North Texas Health Science Center, Fort Worth, TX 76107, USA
- Jonathan L. King
  - Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX 76107, USA
- Amy Smuts
  - Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX 76107, USA
- Bruce Budowle
  - Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX 76107, USA
  - Department of Microbiology, Immunology and Genetics, University of North Texas Health Science Center, Fort Worth, TX 76107, USA
|
22
|
Wang M, Gu M, Liu L, Liu Y, Tian L. Single-Cell RNA Sequencing (scRNA-seq) in Cardiac Tissue: Applications and Limitations. Vasc Health Risk Manag 2021; 17:641-657. [PMID: 34629873 PMCID: PMC8495612 DOI: 10.2147/vhrm.s288090] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/08/2021] [Accepted: 09/14/2021] [Indexed: 12/16/2022]
Abstract
Cardiovascular diseases (CVDs) are a group of disorders of the blood vessels and heart, and are considered the leading cause of death worldwide. The pathology of CVDs could be related to the functional abnormalities of multiple cell types in the heart. Single-cell RNA sequencing (scRNA-seq) technology is a powerful method for characterizing individual cells and elucidating molecular mechanisms by providing a high-resolution view of transcriptomic changes at the single-cell level. Specifically, scRNA-seq has provided novel insights into CVDs by identifying rare cardiac cell types, inferring the trajectory tree, estimating RNA velocity, elucidating cell-cell communication, and comparing healthy and pathological heart samples. In this review, we summarize the different scRNA-seq platforms and published single-cell datasets in the cardiovascular field, and describe the utilities and limitations of this technology. Lastly, we discuss future perspectives on the application of scRNA-seq technology in cardiovascular research.
Affiliation(s)
- Mingqiang Wang
  - Stanford Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Mingxia Gu
  - Perinatal Institute, Division of Pulmonary Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, 45229, USA
  - Center for Stem Cell and Organoid Medicine, CuSTOM, Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, 45229, USA
- Ling Liu
  - Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Yu Liu
  - Stanford Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Lei Tian
  - Stanford Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA, 94305, USA
|