1
Zhou Q, Wu J, Bei J, Zhai Z, Chen X, Liang W, Meng J, Liu M. Integration of single-cell sequencing and drug sensitivity profiling reveals an 11-gene prognostic model for liver cancer. Hum Genomics 2024; 18:132. [PMID: 39587687 PMCID: PMC11590408 DOI: 10.1186/s40246-024-00698-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2024] [Accepted: 11/11/2024] [Indexed: 11/27/2024]
Abstract
BACKGROUND Liver cancer has a high global incidence, particularly in East Asia. Difficulty in early detection leads to poor prognosis. Single-cell sequencing precisely identifies gene expression differences in specific cell types, making it valuable in tumor microenvironment research and immune drug development. However, the characteristics of tumor cells themselves are equally important for patient prognosis and treatment.
METHODS We downloaded single-cell sequencing data from GSE189903, grouped cells by cluster markers, and classified epithelial cells into adjacent non-tumor, normal, and tumor cells. Differential gene and survival analyses identified significant differential genes. Using TCGA-LIHC data, we divided 370 patients into test and training sets. We constructed and validated a LASSO model based on these genes in both sets and two external datasets. Functional, immune infiltration, and mutation analyses were performed on high- and low-risk groups. We also used RNA-seq and IC50 data of 15 liver cancer cell lines from GDSC, scoring them with our prognostic model to identify potential drugs for high-risk patients.
RESULTS Dimensionality reduction and clustering of 34 single-cell samples identified five subgroups, with epithelial cells further classified. Differential gene analysis identified 124 significant genes. An 11-gene prognostic model was constructed, effectively stratifying patient prognosis (p < 0.05) and achieving an AUC above 0.6 for 5-year survival prediction in multiple cohorts. Functional analysis revealed that upregulated genes in high-risk groups were enriched in cell adhesion pathways, while downregulated genes were enriched in metabolic pathways. Mutation analysis showed more TP53 mutations in the high-risk group and more CTNNB1 mutations in the low-risk group. Immune infiltration analysis indicated higher immune scores and less CD8+ naive T cell infiltration in the high-risk group. Drug sensitivity analysis identified 14 drugs with lower IC50 in the high-risk group, including clinically approved Sorafenib and Axitinib for treating unresectable HCC.
CONCLUSION We established an 11-gene prognostic model that effectively stratifies liver cancer patients based on differentially expressed genes between tumor and adjacent non-tumor cells clustered by scRNA-seq data. The two risk groups had significantly different molecular characteristics. We identified 14 drugs that might be effective for high-risk HCC patients. Our study provides novel insights into tumor cell characteristics, aiding in research on tumor development and treatment.
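The risk stratification described in this abstract can be illustrated with a minimal sketch. It assumes, as is common for LASSO-based prognostic signatures but not stated in the abstract, that the score is a weighted sum of expression values and that patients are split at the cohort median. The gene names, coefficients, and patient values are hypothetical placeholders, not the 11 genes of the published model.

```python
from statistics import median

# Hypothetical coefficients; the abstract does not list the 11 genes or their weights.
COEFFICIENTS = {"GENE_A": 0.42, "GENE_B": -0.17, "GENE_C": 0.08}


def risk_score(expression, coefficients=COEFFICIENTS):
    """Weighted sum of expression values over the signature genes."""
    return sum(weight * expression[gene] for gene, weight in coefficients.items())


def stratify(cohort, coefficients=COEFFICIENTS):
    """Label each patient 'high' or 'low' risk; the median cutoff is an assumption."""
    scores = {pid: risk_score(expr, coefficients) for pid, expr in cohort.items()}
    cutoff = median(scores.values())
    return {pid: ("high" if score > cutoff else "low") for pid, score in scores.items()}


# Toy cohort with made-up expression values.
cohort = {
    "P1": {"GENE_A": 5.0, "GENE_B": 1.0, "GENE_C": 2.0},
    "P2": {"GENE_A": 1.0, "GENE_B": 4.0, "GENE_C": 1.0},
    "P3": {"GENE_A": 3.0, "GENE_B": 2.0, "GENE_C": 0.5},
    "P4": {"GENE_A": 0.5, "GENE_B": 0.5, "GENE_C": 0.5},
}
groups = stratify(cohort)
```

The same scoring function could in principle be applied to cell-line expression profiles, which is how the abstract describes ranking GDSC cell lines before comparing IC50 values between groups.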
Affiliation(s)
- Qunfang Zhou
  - Department of Interventional Radiology, Chinese PLA General Hospital, Beijing, 100853, China
- Jingqiang Wu
  - Department of Radiology, Guangzhou Chest Hospital, Guangzhou, 510095, Guangdong Province, China
- Jiaxin Bei
  - Key Laboratory of Surveillance of Adverse Reactions Related to CAR T Cell Therapy, Department of Immuno-Oncology, The First Affiliated Hospital of Guangdong Pharmaceutical University, Guangzhou, 510062, Guangdong Province, China
- Zixuan Zhai
  - Department of Radiology, The Second Affiliated Hospital of Guangzhou Medical University, Guangzhou, 510260, Guangdong Province, China
- Xiuzhen Chen
  - Department of Radiology, The Third Affiliated Hospital, Sun Yat-sen University, Guangzhou, 510630, Guangdong Province, China
- Wei Liang
  - Department of Radiology, The First People's Hospital of Foshan, Foshan, 528010, Guangdong Province, China
- Jing Meng
  - Department of Ophthalmology, The First Affiliated Hospital, Jinan University, Guangzhou, 510630, Guangdong Province, China
- Mingyu Liu
  - Department of Interventional Radiology, The Affiliated Shunde Hospital of Jinan University, Foshan, 528306, Guangdong Province, China
2
Rivera Monroy LC, Rist L, Ostalecki C, Bauer A, Vera J, Breininger K, Maier A. Graph neural networks in multi-stained pathological imaging: extended comparative analysis of Radiomic features. Int J Comput Assist Radiol Surg 2024:10.1007/s11548-024-03277-x. [PMID: 39373802 DOI: 10.1007/s11548-024-03277-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Accepted: 09/20/2024] [Indexed: 10/08/2024]
Abstract
PURPOSE This study investigates the application of Radiomic features within graph neural networks (GNNs) for the classification of multiple-epitope-ligand cartography (MELC) pathology samples. It aims to enhance the diagnosis of often misdiagnosed skin diseases such as eczema, lymphoma, and melanoma. The novel contribution lies in integrating Radiomic features with GNNs and comparing their efficacy against traditional multi-stain profiles.
METHODS We utilized GNNs to process multiple pathological slides as cell-level graphs, comparing their performance with XGBoost and Random Forest classifiers. The analysis included two feature types: multi-stain profiles and Radiomic features. Dimensionality reduction techniques such as UMAP and t-SNE were applied to optimize the feature space, and graph connectivity was based on spatial and feature closeness.
RESULTS Integrating Radiomic features into spatially connected graphs significantly improved classification accuracy over traditional models. The application of UMAP further enhanced the performance of GNNs, particularly in classifying diseases with similar pathological features. The GNN model outperformed baseline methods, demonstrating its robustness in handling complex histopathological data.
CONCLUSION Radiomic features processed through GNNs show significant promise for multi-disease classification, improving diagnostic accuracy. This study's findings suggest that integrating advanced imaging analysis with graph-based modeling can lead to better diagnostic tools. Future research should expand these methods to a wider range of diseases to validate their generalizability and effectiveness.
Affiliation(s)
- Luis Carlos Rivera Monroy
  - Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
  - Department of Dermatology, Universitätsklinikum Erlangen, Erlangen, Germany
- Leonhard Rist
  - Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Andreas Bauer
  - Department of Dermatology, Universitätsklinikum Erlangen, Erlangen, Germany
- Julio Vera
  - Department of Dermatology, Universitätsklinikum Erlangen, Erlangen, Germany
- Katharina Breininger
  - Department of Artificial Intelligence in Biomedical Engineering, FAU Erlangen-Nürnberg, Erlangen, Germany
  - Center for AI and Data Science (CAIDAS), Universität Würzburg, Würzburg, Germany
- Andreas Maier
  - Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
3
Wang Z, Wang H, Zhao J, Xia J, Zheng C. scVSC: Deep Variational Subspace Clustering for Single-Cell Transcriptome Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1492-1503. [PMID: 38801694 DOI: 10.1109/tcbb.2024.3405731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq) is a powerful technique for analyzing gene expression at the individual cell level, allowing for the identification of cellular heterogeneity and subpopulations. However, it suffers from technical limitations that result in sparse and heterogeneous data. Here, we propose scVSC, an unsupervised clustering algorithm built on deep representation neural networks. The method incorporates variational inference into the subspace model, which imposes regularization constraints on the latent space and helps prevent overfitting. In a series of experiments across multiple datasets, scVSC outperforms existing state-of-the-art unsupervised and semi-supervised clustering tools in clustering accuracy and running efficiency. Moreover, the study indicates that scVSC can visually reveal the state of trajectory differentiation, accurately identify differentially expressed genes, and further discover biologically critical pathways.
4
Cai X, Zhang W, Zheng X, Xu Y, Li Y. scEM: A New Ensemble Framework for Predicting Cell Type Composition Based on scRNA-Seq Data. Interdiscip Sci 2024; 16:304-317. [PMID: 38368575 DOI: 10.1007/s12539-023-00601-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Revised: 12/22/2023] [Accepted: 12/24/2023] [Indexed: 02/19/2024]
Abstract
With the advent of single-cell RNA sequencing (scRNA-seq) technology, a large volume of scRNA-seq data has become available, providing an unprecedented opportunity to explore cellular composition and heterogeneity. Recently, many computational algorithms for predicting cell type composition have been developed, and these methods are typically evaluated on different datasets and performance metrics using diverse techniques. Consequently, the lack of comprehensive and standardized comparative analysis makes it difficult to gain a clear understanding of the strengths and weaknesses of these methods. To address this gap, we reviewed 20 cutting-edge unsupervised cell type identification methods and evaluated them comprehensively using 24 real scRNA-seq datasets of varying scales. In addition, we proposed a new ensemble cell-type identification method, named scEM, which learns a consensus similarity matrix by applying the entropy weight method to four selected representative methods. The Louvain algorithm is then used to obtain the final classification of individual cells based on the consensus matrix. Extensive evaluation and comparison with 11 other similarity-based methods on real scRNA-seq datasets demonstrate that the newly developed ensemble algorithm scEM is effective in predicting cell type composition.
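The consensus step mentioned in this abstract can be sketched as follows. This is a minimal illustration of the generic entropy weight method, under the assumption that each base method's flattened cell-by-cell similarity matrix is treated as one criterion; the abstract does not say how scEM sets this up, and the function names and toy matrices are hypothetical. The Louvain clustering step that would follow is not shown.

```python
import math


def entropy_weights(columns):
    """Entropy weight method: return one weight per column (criterion).

    Columns whose values are more unevenly distributed have lower entropy
    and receive a larger weight. Assumes non-negative values and that at
    least one column is not perfectly uniform.
    """
    m = len(columns[0])
    k = 1.0 / math.log(m)
    divergences = []
    for col in columns:
        total = sum(col)
        probs = [v / total for v in col]
        entropy = -k * sum(p * math.log(p) for p in probs if p > 0)
        divergences.append(1.0 - entropy)
    norm = sum(divergences)
    return [d / norm for d in divergences]


def consensus_matrix(similarities):
    """Weighted sum of cell-by-cell similarity matrices given as nested lists."""
    n = len(similarities[0])
    # Flatten each matrix so that every base method becomes one criterion.
    columns = [[s[i][j] for i in range(n) for j in range(n)] for s in similarities]
    weights = entropy_weights(columns)
    consensus = [
        [sum(w * s[i][j] for w, s in zip(weights, similarities)) for j in range(n)]
        for i in range(n)
    ]
    return consensus, weights


# Two toy similarity matrices for three cells; s2 separates cell 2 more sharply.
s1 = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
s2 = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]]
consensus, weights = consensus_matrix([s1, s2])
```

In this toy case the second matrix receives the larger weight because its entries are less uniform, so the consensus keeps cells 0 and 1 close while pushing cell 2 away.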
Affiliation(s)
- Xianxian Cai
  - School of Sciences, East China Jiaotong University, Nanchang, 330013, China
- Wei Zhang
  - School of Sciences, East China Jiaotong University, Nanchang, 330013, China
- Xiaoying Zheng
  - Operations Research and Planning Department, Naval University of Engineering, Wuhan, 430033, China
- Yaxin Xu
  - School of Sciences, East China Jiaotong University, Nanchang, 330013, China
- Yuanyuan Li
  - School of Mathematics and Physics, Wuhan Institute of Technology, Wuhan, China
5
Wang HY, Zhao JP, Zheng CH, Su YS. scGMAAE: Gaussian mixture adversarial autoencoders for diversification analysis of scRNA-seq data. Brief Bioinform 2023; 24:6966535. [PMID: 36592058 DOI: 10.1093/bib/bbac585] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/05/2022] [Revised: 11/14/2022] [Accepted: 11/29/2022] [Indexed: 01/03/2023]
Abstract
The progress of single-cell RNA sequencing (scRNA-seq) has produced a large volume of scRNA-seq data, which are widely used in biomedical research. Noise in the raw data and the tens of thousands of genes measured make it challenging to capture the real structure and useful information in scRNA-seq data. Most existing single-cell analysis methods assume that the low-dimensional embedding of the raw data follows a Gaussian distribution or lies in a low-dimensional nonlinear space without any prior information, which greatly limits the flexibility and controllability of the model. In addition, many existing methods have a high computational cost, which makes them difficult to apply to large-scale datasets. Here, we design and develop a deep generative model named Gaussian mixture adversarial autoencoders (scGMAAE), which assumes that the low-dimensional embeddings of different cell types follow different Gaussian distributions and integrates Bayesian variational inference with adversarial training, so as to give an interpretable latent representation of complex data and discover the statistical distribution of different cell types. scGMAAE offers good controllability, interpretability and scalability; it can therefore process large-scale datasets in a short time and give competitive results. scGMAAE outperforms existing methods in several tasks, including dimensionality reduction visualization, cell clustering, differential expression analysis and batch effect removal. Importantly, compared with most deep learning methods, scGMAAE requires fewer iterations to reach its best results.
Affiliation(s)
- Hai-Yun Wang
  - College of Mathematics and System Sciences, Xinjiang University, Urumqi, China
- Jian-Ping Zhao
  - College of Mathematics and System Sciences, Xinjiang University, Urumqi, China
  - Institute of Mathematics and Physics, Xinjiang University, Urumqi, China
- Chun-Hou Zheng
  - College of Mathematics and System Sciences, Xinjiang University, Urumqi, China
  - School of Artificial Intelligence, Anhui University, Hefei, China
- Yan-Sen Su
  - School of Artificial Intelligence, Anhui University, Hefei, China
6
Wang HY, Zhao JP, Su YS, Zheng CH. scCDG: A Method Based on DAE and GCN for scRNA-Seq Data Analysis. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3685-3694. [PMID: 34752401 DOI: 10.1109/tcbb.2021.3126641] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 06/13/2023]
Abstract
Identifying cell types is one of the main goals of single-cell RNA sequencing (scRNA-seq) analysis, and clustering is a common method for this task. However, the massive amount of data and the high noise level pose challenges for single-cell clustering. To address these challenges, we introduce a novel method named single-cell clustering based on denoising autoencoder and graph convolution network (scCDG), which consists of two core models. The first model is a denoising autoencoder (DAE) that fits the data distribution in order to denoise the data. The second model is a graph autoencoder using a graph convolution network (GCN), which projects the data into a compressed low-dimensional space while preserving both the topological structure and the feature information in scRNA-seq data. Extensive analysis on seven real scRNA-seq datasets demonstrates that scCDG outperforms state-of-the-art methods in several tasks, including single-cell clustering, visualization of the transcriptome landscape, and trajectory inference.
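To make the graph-convolution step in this abstract concrete, the sketch below implements one layer of the widely used GCN propagation rule, ReLU(D^{-1/2}(A + I)D^{-1/2} X W). It is an assumption that scCDG uses this standard form; the abstract gives no architectural details, and the toy graph, features, and weights are made up for illustration.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix (nested lists)."""
    n = len(adj)
    # Add self-loops so each cell keeps its own features during propagation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, features, weights):
    """One graph-convolution step: ReLU(A_norm @ X @ W)."""
    propagated = matmul(matmul(normalize_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in propagated]


# Toy cell graph: cells 0 and 1 are neighbors, cell 2 is isolated.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]  # 3 cells x 2 features
weights = [[1.0], [-1.0]]                        # projects 2 features to 1 dimension
embedding = gcn_layer(adj, features, weights)
```

Connected cells 0 and 1 end up with the same embedding because their features are averaged over the shared neighborhood, which is the sense in which such a layer preserves topological structure alongside feature information.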
7
Zhao JP, Hou TS, Su Y, Zheng CH. scSSA: A clustering method for single cell RNA-seq data based on semi-supervised autoencoder. Methods 2022; 208:66-74. [DOI: 10.1016/j.ymeth.2022.10.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Revised: 10/09/2022] [Accepted: 10/21/2022] [Indexed: 11/06/2022]
8
Wang HY, Zhao JP, Zheng CH, Su YS. scCNC: A method based on Capsule Network for Clustering scRNA-seq Data. Bioinformatics 2022; 38:3703-3709. [PMID: 35699473 DOI: 10.1093/bioinformatics/btac393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2021] [Revised: 05/28/2022] [Accepted: 06/11/2022] [Indexed: 11/12/2022]
Abstract
MOTIVATION A large number of studies have shown that clustering is a crucial step in scRNA-seq analysis. Most existing methods are based on unsupervised learning without the prior exploitation of any domain knowledge, which does not utilize available gold-standard labels. When confronted by the high dimensionality and general dropout events of scRNA-seq data, purely unsupervised clustering methods may not produce biologically interpretable clusters, which complicates cell type assignment.
RESULTS In this paper, we propose a semi-supervised clustering method based on a capsule network, named scCNC, that integrates domain knowledge into the clustering step. Significantly, we also propose a Semi-supervised Greedy Iterative Training (SGIT) method used to train the whole network. Experiments on some real scRNA-seq datasets show that scCNC can significantly improve clustering performance and facilitate downstream analyses.
AVAILABILITY The source code of scCNC is freely available at https://github.com/WHY-17/scCNC.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hai-Yun Wang
  - College of Mathematics and System Sciences, Xinjiang University, Urumqi, China
- Jian-Ping Zhao
  - College of Mathematics and System Sciences, Xinjiang University, Urumqi, China
  - Institute of Mathematics and Physics, Xinjiang University, Urumqi, China
- Chun-Hou Zheng
  - College of Mathematics and System Sciences, Xinjiang University, Urumqi, China
  - School of Artificial Intelligence, Anhui University, Hefei, China
- Yan-Sen Su
  - School of Artificial Intelligence, Anhui University, Hefei, China
9
Fuzzy Information Discrimination Measures and Their Application to Low Dimensional Embedding Construction in the UMAP Algorithm. J Imaging 2022; 8:jimaging8040113. [PMID: 35448241 PMCID: PMC9028155 DOI: 10.3390/jimaging8040113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2022] [Revised: 04/08/2022] [Accepted: 04/11/2022] [Indexed: 02/05/2023]
Abstract
Dimensionality reduction techniques are often used by researchers to make high dimensional data easier to interpret visually, as data visualization is only possible in low dimensional spaces. Recent research in nonlinear dimensionality reduction has introduced many effective algorithms, including t-distributed stochastic neighbor embedding (t-SNE), uniform manifold approximation and projection (UMAP), dimensionality reduction based on triplet constraints (TriMAP), and pairwise controlled manifold approximation (PaCMAP), which aim to preserve both the local and global structure of high dimensional data while reducing the dimensionality. The UMAP algorithm has found application in bioinformatics, genetics, and genomics, and has been widely used to improve the accuracy of other machine learning algorithms. In this research, we compare the performance of different fuzzy information discrimination measures used as loss functions in the UMAP algorithm while constructing low dimensional embeddings. To this end, we derive the gradients of the considered losses analytically and employ the Adam algorithm to optimize the loss function. From the experimental studies we conclude that using either the logarithmic fuzzy cross entropy loss without reduced repulsion or the symmetric logarithmic fuzzy cross entropy loss with a sufficiently large neighbor count leads to better global structure preservation of the original multidimensional data than the loss function used in the original UMAP implementation.
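For orientation, the baseline loss that this abstract compares against can be sketched as follows. The code shows the fuzzy set cross entropy from the original UMAP formulation, with the low dimensional membership strength modeled as 1 / (1 + a d^(2b)). The defaults a = b = 1 are placeholders (UMAP normally fits them from its min_dist setting), and the alternative losses studied in the paper are not reproduced here because the abstract does not define them.

```python
import math


def low_dim_similarity(distance, a=1.0, b=1.0):
    """UMAP-style membership strength in the embedding: 1 / (1 + a * d^(2b))."""
    return 1.0 / (1.0 + a * distance ** (2 * b))


def fuzzy_cross_entropy(p, q, eps=1e-12):
    """Fuzzy set cross entropy between edge memberships p (high-dim) and q (low-dim)."""
    total = 0.0
    for p_ij, q_ij in zip(p, q):
        q_ij = min(max(q_ij, eps), 1.0 - eps)  # keep the logarithms finite
        if p_ij > 0.0:
            # Attractive term: penalizes neighbors that are far apart in the embedding.
            total += p_ij * math.log(p_ij / q_ij)
        if p_ij < 1.0:
            # Repulsive term: penalizes non-neighbors that are close in the embedding.
            total += (1.0 - p_ij) * math.log((1.0 - p_ij) / (1.0 - q_ij))
    return total


# Toy example: two edges with high-dimensional memberships p and embedding distances d.
p = [0.9, 0.1]
good_q = [low_dim_similarity(d) for d in (0.3, 3.0)]  # neighbor close, non-neighbor far
bad_q = [low_dim_similarity(d) for d in (3.0, 0.3)]   # neighbor far, non-neighbor close
loss_good = fuzzy_cross_entropy(p, good_q)
loss_bad = fuzzy_cross_entropy(p, bad_q)
```

An embedding that places the strongly connected pair close together yields the smaller loss, which is the quantity an optimizer such as Adam would minimize over the embedding coordinates.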