1
Xiong S, Zhang J, Luo H, Zhang Y, Xiao Q. A heterogeneous graph transformer framework for accurate cancer driver gene prediction and downstream analysis. Methods 2024:S1046-2023(24)00216-0. [PMID: 39426693 DOI: 10.1016/j.ymeth.2024.09.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2024] [Revised: 09/26/2024] [Accepted: 09/29/2024] [Indexed: 10/21/2024] [Open Access]
Abstract
Accurately predicting cancer driver genes remains a formidable challenge amidst the burgeoning volume and intricacy of cancer genomic data. In this investigation, we propose HGTDG, an innovative heterogeneous graph transformer framework tailored for precisely predicting cancer driver genes and exploring downstream tasks. A heterogeneous graph construction module is central to the framework, which assembles a gene-protein heterogeneous network leveraging the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and protein-protein interactions sourced from the STRING (search tool for recurring instances of neighboring genes) database. Moreover, our framework introduces a pioneering heterogeneous graph transformer module, harnessing multi-head attention mechanisms for nuanced node embedding. This transformative module proficiently captures distinct representations for both nodes and edges, thereby enriching the model's predictive capacity. Subsequently, the generated node embeddings are seamlessly integrated into a classification module, facilitating the discrimination between driver and non-driver genes. Our experimental findings evince the superiority of HGTDG over existing methodologies, as evidenced by the enhanced performance metrics, including the area under the receiver operating characteristic curves (AUROC) and the area under the precision-recall curves (AUPRC). Furthermore, the downstream analysis utilizing the newly identified cancer driver genes underscores the efficacy and versatility of our proposed framework.
Affiliation(s)
- Shuwen Xiong
  - School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
- Junming Zhang
  - School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
- Hong Luo
  - School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
- Yongqing Zhang
  - School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
- Qinyin Xiao
  - Sichuan Institute of Computer Sciences, Chengdu, 610041, China
2
Xu L, Li Z, Ren J, Liu S, Xu Y. Single-cell RNA sequencing data analysis utilizing multi-type graph neural networks. Comput Biol Med 2024; 179:108921. [PMID: 39059210 DOI: 10.1016/j.compbiomed.2024.108921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2024] [Revised: 07/08/2024] [Accepted: 07/16/2024] [Indexed: 07/28/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq) is the sequencing technology of a single cell whose expression reflects the overall characteristics of the individual cell, facilitating the research of problems at the cellular level. However, the problems of scRNA-seq such as dimensionality reduction processing of massive data, technical noise in data, and visualization of single-cell type clustering cause great difficulties for analyzing and processing scRNA-seq data. In this paper, we propose a new single-cell data analysis model using denoising autoencoder and multi-type graph neural networks (scDMG), which learns cell-cell topology information and latent representation of scRNA-seq data. scDMG introduces the zero-inflated negative binomial (ZINB) model into a denoising autoencoder (DAE) to perform dimensionality reduction and denoising on the raw data. scDMG integrates multiple-type graph neural networks as the encoder to further train the preprocessed data, which better deals with various types of scRNA-seq datasets, resolves dropout events in scRNA-seq data, and enables preliminary classification of scRNA-seq data. By employing TSNE and PCA algorithms for the trained data and invoking Louvain algorithm, scDMG has better dimensionality reduction and clustering optimization. Compared with other mainstream scRNA-seq clustering algorithms, scDMG outperforms other state-of-the-art methods in various clustering performance metrics and shows better scalability, shorter runtime, and great clustering results.
Affiliation(s)
- Li Xu
  - College of Computer Science and Technology, Harbin Engineering University, Harbin, 150001, Heilongjiang, China
- Zhenpeng Li
  - College of Computer Science and Technology, Harbin Engineering University, Harbin, 150001, Heilongjiang, China
- Jiaxu Ren
  - College of Computer Science and Technology, Harbin Engineering University, Harbin, 150001, Heilongjiang, China
- Shuaipeng Liu
  - College of Computer Science and Technology, Harbin Engineering University, Harbin, 150001, Heilongjiang, China
- Yiming Xu
  - College of Engineering, Tokyo Institute of Technology, Tokyo, 226-0026, Tokyo, Japan
3
Sun Y, Kong L, Huang J, Deng H, Bian X, Li X, Cui F, Dou L, Cao C, Zou Q, Zhang Z. A comprehensive survey of dimensionality reduction and clustering methods for single-cell and spatial transcriptomics data. Brief Funct Genomics 2024:elae023. [PMID: 38860675 DOI: 10.1093/bfgp/elae023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Revised: 02/29/2024] [Accepted: 05/27/2024] [Indexed: 06/12/2024] [Open Access]
Abstract
In recent years, the application of single-cell transcriptomics and spatial transcriptomics analysis techniques has become increasingly widespread. Whether dealing with single-cell transcriptomic or spatial transcriptomic data, dimensionality reduction and clustering are indispensable. Both single-cell and spatial transcriptomic data are often high-dimensional, making the analysis and visualization of such data challenging. Through dimensionality reduction, it becomes possible to visualize the data in a lower-dimensional space, allowing for the observation of relationships and differences between cell subpopulations. Clustering enables the grouping of similar cells into the same cluster, aiding in the identification of distinct cell subpopulations and revealing cellular diversity, providing guidance for downstream analyses. In this review, we systematically summarized the most widely recognized algorithms employed for the dimensionality reduction and clustering analysis of single-cell transcriptomic and spatial transcriptomic data. This endeavor provides valuable insights and ideas that can contribute to the development of novel tools in this rapidly evolving field.
Affiliation(s)
- Yidi Sun
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Lingling Kong
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Jiayi Huang
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Hongyan Deng
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Xinling Bian
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Xingfeng Li
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Feifei Cui
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Lijun Dou
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland, OH 44106, United States
- Chen Cao
  - School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 210029, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Zilong Zhang
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
4
Wang H, Liu Z, Ma X. Learning Consistency and Specificity of Cells From Single-Cell Multi-Omic Data. IEEE J Biomed Health Inform 2024; 28:3134-3145. [PMID: 38709615 DOI: 10.1109/jbhi.2024.3370868] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
Advances in single-cell technologies allow epigenomic and transcriptomic profiles to be measured concomitantly at the cell level, providing opportunities to explore the potential biological mechanisms. Even though significant efforts have been dedicated to them, the integrative analysis of single-cell multi-omic data remains challenging because of the heterogeneity, complicated coupling and interpretability of the data. To handle these issues, we propose a novel self-representation Learning-based Multi-omics data Integrative Clustering algorithm (sLMIC) for the integration of single-cell epigenomic (DNA methylation or scATAC-seq) and transcriptomic (scRNA-seq) profiles, in which the consistent and specific features of cells are explicitly extracted to facilitate cell clustering. Specifically, sLMIC constructs a graph for each type of single-cell data, thereby transforming omics data into multi-layer networks, which effectively removes the heterogeneity of omic data. Then, sLMIC employs low-rank and exclusivity constraints to separate the self-representation of cells into two parts, i.e., the shared and specific features, which explicitly characterize the consistency and diversity of omic data, providing an effective strategy to model the structure of cell types. Feature extraction and cell clustering are jointly formulated as an overall objective function, where latent features of data are obtained under the guidance of cell clustering. Extensive experimental results on 13 single-cell multi-omics datasets from diverse organisms and tissues indicate that sLMIC clearly outperforms advanced algorithms on various measurements.
5
Agerskov RH, Nyeng P. Innervation of the pancreas in development and disease. Development 2024; 151:dev202254. [PMID: 38265192 DOI: 10.1242/dev.202254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/25/2024]
Abstract
The autonomic nervous system innervates the pancreas by sympathetic, parasympathetic and sensory branches during early organogenesis, starting with neural crest cell invasion and formation of an intrinsic neuronal network. Several studies have demonstrated that signals from pancreatic neural crest cells direct pancreatic endocrinogenesis. Likewise, autonomic neurons have been shown to regulate pancreatic islet formation, and have also been implicated in type I diabetes. Here, we provide an overview of recent progress in mapping pancreatic innervation and understanding the interactions between pancreatic neurons, epithelial morphogenesis and cell differentiation. Finally, we discuss pancreas innervation as a factor in the development of diabetes.
Affiliation(s)
- Rikke Hoegsberg Agerskov
  - Roskilde University, Department of Science and Environment, Universitetsvej 1, building 28, Roskilde 4000, Denmark
- Pia Nyeng
  - Roskilde University, Department of Science and Environment, Universitetsvej 1, building 28, Roskilde 4000, Denmark
6
Fang Z, Zheng R, Li M. scMAE: a masked autoencoder for single-cell RNA-seq clustering. Bioinformatics 2024; 40:btae020. [PMID: 38230824 PMCID: PMC10832357 DOI: 10.1093/bioinformatics/btae020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 01/07/2024] [Accepted: 01/12/2024] [Indexed: 01/18/2024] [Open Access]
Abstract
MOTIVATION Single-cell RNA sequencing has emerged as a powerful technology for studying gene expression at the individual cell level. Clustering individual cells into distinct subpopulations is fundamental in scRNA-seq data analysis, facilitating the identification of cell types and exploration of cellular heterogeneity. Despite the recent development of many deep learning-based single-cell clustering methods, few have effectively exploited the correlations among genes, resulting in suboptimal clustering outcomes. RESULTS Here, we propose a novel masked autoencoder-based method, scMAE, for cell clustering. scMAE perturbs gene expression and employs a masked autoencoder to reconstruct the original data, learning robust and informative cell representations. The masked autoencoder introduces a masking predictor, which captures relationships among genes by predicting whether gene expression values are masked. By integrating this masking mechanism, scMAE effectively captures latent structures and dependencies in the data, enhancing clustering performance. We conducted extensive comparative experiments using various clustering evaluation metrics on 15 scRNA-seq datasets from different sequencing platforms. Experimental results indicate that scMAE outperforms other state-of-the-art methods on these datasets. In addition, scMAE accurately identifies rare cell types, which are challenging to detect due to their low abundance. Furthermore, biological analyses confirm the biological significance of the identified cell subpopulations. AVAILABILITY AND IMPLEMENTATION The source code of scMAE is available at: https://zenodo.org/records/10465991.
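The perturb-and-predict idea in this abstract can be illustrated with a minimal sketch. The abstract does not say how expression values are perturbed, so the version below assumes each selected entry is replaced by the same gene's value from a randomly chosen cell. The function name `perturb_expression` and the parameter `mask_rate` are hypothetical and are not taken from the scMAE source code. The returned binary mask is the kind of target a masking predictor would be trained against.

```python
import random


def perturb_expression(matrix, mask_rate=0.3, seed=0):
    """Corrupt a cells-by-genes matrix and return (corrupted, mask).

    Assumption (not from the abstract): a masked entry is overwritten
    with the value of the same gene taken from a random donor cell.
    mask[i][g] == 1 marks entries that were perturbed.
    """
    rng = random.Random(seed)
    n_cells = len(matrix)
    corrupted = [row[:] for row in matrix]       # copy, keep input intact
    mask = [[0] * len(row) for row in matrix]
    for i, row in enumerate(matrix):
        for g in range(len(row)):
            if rng.random() < mask_rate:
                donor = rng.randrange(n_cells)   # may equal i by chance
                corrupted[i][g] = matrix[donor][g]
                mask[i][g] = 1
    return corrupted, mask


# Toy example: 3 cells, 4 genes.
expr = [[0, 5, 2, 0], [1, 0, 0, 7], [3, 3, 0, 0]]
corrupted, mask = perturb_expression(expr, mask_rate=0.5, seed=1)
```

An autoencoder trained on `corrupted` would then be asked both to reconstruct `expr` and to predict `mask`; that training loop is outside the scope of this sketch.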
Affiliation(s)
- Zhaoyu Fang
  - School of Computer Science and Engineering, Central South University, 932 South Lushan Road, Yuelu District, Changsha 410083, China
- Ruiqing Zheng
  - School of Computer Science and Engineering, Central South University, 932 South Lushan Road, Yuelu District, Changsha 410083, China
- Min Li
  - School of Computer Science and Engineering, Central South University, 932 South Lushan Road, Yuelu District, Changsha 410083, China
7
Zheng W, Min W, Wang S. TsImpute: an accurate two-step imputation method for single-cell RNA-seq data. Bioinformatics 2023; 39:btad731. [PMID: 38039139 PMCID: PMC10724850 DOI: 10.1093/bioinformatics/btad731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Revised: 11/22/2023] [Accepted: 11/30/2023] [Indexed: 12/03/2023] [Open Access]
Abstract
MOTIVATION Single-cell RNA sequencing (scRNA-seq) technology has enabled discovering gene expression patterns at single cell resolution. However, due to technical limitations, there are usually excessive zeros, called "dropouts," in scRNA-seq data, which may mislead the downstream analysis. Therefore, it is crucial to impute these dropouts to recover the biological information. RESULTS We propose a two-step imputation method called tsImpute to impute scRNA-seq data. At the first step, tsImpute adopts zero-inflated negative binomial distribution to discriminate dropouts from true zeros and performs initial imputation by calculating the expected expression level. At the second step, it conducts clustering with this modified expression matrix, based on which the final distance weighted imputation is performed. Numerical results based on both simulated and real data show that tsImpute achieves favorable performance in terms of gene expression recovery, cell clustering, and differential expression analysis. AVAILABILITY AND IMPLEMENTATION The R package of tsImpute is available at https://github.com/ZhengWeihuaYNU/tsImpute.
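The first step described above, telling dropouts apart from true zeros with a zero-inflated negative binomial (ZINB), can be sketched with the standard ZINB posterior. This is a generic illustration, not the tsImpute R code: the parameter names `pi` (zero-inflation probability), `mu` (negative binomial mean) and `theta` (dispersion) are assumptions, and the helper `expected_expression_given_zero` is a hypothetical stand-in for the "expected expression level" used in the initial imputation.

```python
def nb_zero_prob(mu: float, theta: float) -> float:
    """P(X = 0) for a negative binomial with mean mu and dispersion theta."""
    return (theta / (theta + mu)) ** theta


def dropout_posterior(pi: float, mu: float, theta: float) -> float:
    """Probability that an observed zero is a dropout under a ZINB model.

    Bayes' rule: pi / (pi + (1 - pi) * P_NB(0)).
    """
    p0 = nb_zero_prob(mu, theta)
    return pi / (pi + (1.0 - pi) * p0)


def expected_expression_given_zero(pi: float, mu: float, theta: float) -> float:
    """Hypothetical initial imputation: NB mean weighted by dropout probability."""
    return dropout_posterior(pi, mu, theta) * mu


# A zero in a highly expressed gene is very likely a dropout,
# while a zero in a gene with mean 0 carries no extra evidence.
high = dropout_posterior(pi=0.5, mu=100.0, theta=1.0)
flat = dropout_posterior(pi=0.5, mu=0.0, theta=1.0)
```

In practice `pi`, `mu` and `theta` would be estimated per gene (or per gene and cell) from the count matrix; the estimation step is not shown here.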
Affiliation(s)
- Weihua Zheng
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming 650504, China
- Wenwen Min
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming 650504, China
  - Yunnan Key Laboratory of Intelligent Systems and Computing, Yunnan University, Kunming 650504, China
- Shunfang Wang
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming 650504, China
  - Yunnan Key Laboratory of Intelligent Systems and Computing, Yunnan University, Kunming 650504, China
8
Liu T, Fang ZY, Li X, Zhang LN, Cao DS, Yin MZ. Graph deep learning enabled spatial domains identification for spatial transcriptomics. Brief Bioinform 2023; 24:7130976. [PMID: 37080761 DOI: 10.1093/bib/bbad146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2023] [Revised: 03/02/2023] [Accepted: 03/27/2023] [Indexed: 04/22/2023] [Open Access]
Abstract
Advancing spatially resolved transcriptomics (ST) technologies help biologists comprehensively understand organ function and tissue microenvironment. Accurate spatial domain identification is the foundation for delineating genome heterogeneity and cellular interaction. Motivated by this perspective, a graph deep learning (GDL) based spatial clustering approach is constructed in this paper. First, the deep graph infomax module embedded with residual gated graph convolutional neural network is leveraged to address the gene expression profiles and spatial positions in ST. Then, the Bayesian Gaussian mixture model is applied to handle the latent embeddings to generate spatial domains. Designed experiments certify that the presented method is superior to other state-of-the-art GDL-enabled techniques on multiple ST datasets. The codes and dataset used in this manuscript are summarized at https://github.com/narutoten520/SCGDL.
Affiliation(s)
- Teng Liu
  - Clinical Research Center (CRC), Clinical Pathology Center (CPC), Chongqing University Three Gorges Hospital, Chongqing University, Wanzhou, Chongqing, P.R. China
- Zhao-Yu Fang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering at Central South University, Hunan, P.R. China
- Xin Li
  - Clinical Research Center (CRC), Clinical Pathology Center (CPC), Chongqing University Three Gorges Hospital, Chongqing University, Wanzhou, Chongqing, P.R. China
- Li-Ning Zhang
  - Clinical Research Center (CRC), Clinical Pathology Center (CPC), Chongqing University Three Gorges Hospital, Chongqing University, Wanzhou, Chongqing, P.R. China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410003, P.R. China
- Ming-Zhu Yin
  - Clinical Research Center (CRC), Clinical Pathology Center (CPC), Chongqing University Three Gorges Hospital, Chongqing University, Wanzhou, Chongqing, P.R. China
  - Translational Medicine Research Center (TMRC), School of Medicine, Chongqing University, Shapingba, Chongqing, P.R. China
9
Lu J, Sheng Y, Qian W, Pan M, Zhao X, Ge Q. scRNA-seq data analysis method to improve analysis performance. IET Nanobiotechnol 2023; 17:246-256. [PMID: 36727937 DOI: 10.1049/nbt2.12115] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/23/2022] [Revised: 12/28/2022] [Accepted: 12/30/2022] [Indexed: 02/03/2023] [Open Access]
Abstract
With the development of single-cell RNA sequencing technology (scRNA-seq), we have the ability to study biological questions at the level of the individual cell transcriptome. Nowadays, many analysis tools, specifically suitable for single-cell RNA sequencing data, have been developed. In this review, the currently commonly used scRNA-seq protocols are discussed. The upstream processing flow pipeline of scRNA-seq data, including goals and popular tools for reads mapping and expression quantification, quality control, normalization, imputation, and batch effect removal is also introduced. Finally, methods to evaluate these tools in both cellular and genetic dimensions, clustering and differential expression analysis are presented.
Affiliation(s)
- Junru Lu
  - State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing, China
- Yuqi Sheng
  - State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing, China
- Weiheng Qian
  - State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing, China
- Min Pan
  - School of Medicine, Southeast University, Nanjing, China
- Xiangwei Zhao
  - State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing, China
- Qinyu Ge
  - State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing, China
10
Zhang J, Liu X, Huang Z, Wu C, Zhang F, Han A, Stalin A, Lu S, Guo S, Huang J, Liu P, Shi R, Zhai Y, Chen M, Zhou W, Bai M, Wu J. T cell-related prognostic risk model and tumor immune environment modulation in lung adenocarcinoma based on single-cell and bulk RNA sequencing. Comput Biol Med 2023; 152:106460. [PMID: 36565482 DOI: 10.1016/j.compbiomed.2022.106460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Revised: 12/06/2022] [Accepted: 12/19/2022] [Indexed: 12/24/2022]
Abstract
BACKGROUND T cells are present in all stages of tumor formation and play an important role in the tumor microenvironment. We aimed to explore the expression profile of T cell marker genes, construct a prognostic risk model based on these genes in lung adenocarcinoma (LUAD), and investigate the link between this risk model and the immunotherapy response. METHODS We obtained single-cell sequencing data of LUAD from the literature and screened out 6 tissue biopsy samples, including 32,108 cells from patients with non-small cell lung cancer, to identify T cell marker genes in LUAD. Combined with the TCGA database, a prognostic risk model based on T cell marker genes was constructed, and data from the GEO database were used for verification. We also investigated the association between this risk model and immunotherapy response. RESULTS Based on scRNA-seq data, 1839 T cell marker genes were identified, after which a risk model consisting of 9 gene signatures for prognosis was constructed in combination with the TCGA dataset. This risk model divided patients into high-risk and low-risk groups based on overall survival. The multivariate analysis demonstrated that the risk model was an independent prognostic factor. Analysis of immune profiles showed that high-risk groups presented discriminative immune-cell infiltrations and immune-suppressive states. Risk scores of the model were closely correlated with linoleic acid metabolism, intestinal immune network for IgA production and drug metabolism cytochrome P450. CONCLUSION Our study proposed a novel prognostic risk model based on T cell marker genes for LUAD patients. The prognostic risk model may accurately predict the survival and treatment outcomes of LUAD patients, and the high-risk population it identifies presents distinct immune cell infiltration and immunosuppressive states.
Affiliation(s)
- Jingyuan Zhang
  - School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, 100029, China
- Xinkui Liu
  - School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, 100029, China
- Zhihong Huang
  - School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, 100029, China
- Chao Wu
  - School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, 100029, China
- Fanqin Zhang
  - School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, 100029, China
- Aiqing Han
  - School of Management, Beijing University of Chinese Medicine, Beijing, 100029, China
- Antony Stalin
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Shan Lu
  - School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, 100029, China
- Siyu Guo
  - School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, 100029, China
- Jiaqi Huang
  - School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, 100029, China
- Pengyun Liu
  - School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, 100029, China
- Rui Shi
  - School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, 100029, China
- Yiyan Zhai
  - School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, 100029, China
- Meilin Chen
  - School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, 100029, China
- Wei Zhou
  - Pharmacy Department, China-Japan Friendship Hospital, Beijing, 100029, China
- Meirong Bai
  - Key Laboratory of Mongolian Medicine Research and Development Engineering, Ministry of Education, Tongliao, 028000, China
- Jiarui Wu
  - School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, 100029, China
11
Rong Z, Liu Z, Song J, Cao L, Yu Y, Qiu M, Hou Y. MCluster-VAEs: An end-to-end variational deep learning-based clustering method for subtype discovery using multi-omics data. Comput Biol Med 2022; 150:106085. [PMID: 36162197 DOI: 10.1016/j.compbiomed.2022.106085] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 06/01/2022] [Revised: 07/30/2022] [Accepted: 09/03/2022] [Indexed: 11/03/2022]
Abstract
The discovery of cancer subtypes based on unsupervised clustering helps in providing a precise diagnosis, guide treatment, and improve patients' prognoses. Instead of single-omics data, multi-omics data can improve the clustering performance because it obtains a comprehensive landscape for understanding biological systems and mechanisms. However, heterogeneous data from multiple sources raises high complexity and different kinds of noise, which are detrimental to the extraction of clustering information. We propose an end-to-end deep learning based method, called Multi-omics Clustering Variational Autoencoders (MCluster-VAEs), that can extract cluster-friendly representations on multi-omics data. First, a unified network architecture with an attention mechanism was developed for accurately modeling multi-omics data. Then, using a novel objective function built from the Variational Bayes technique, the model was trained to effectively obtain the posterior estimation of the clustering assignments. Compared with 12 other state-of-the-art multi-omics clustering methods, MCluster-VAEs achieved an outstanding performance on benchmark datasets from the TCGA database. On the Pan Cancer dataset, MCluster-VAEs achieved an adjusted Rand index of approximately 0.78 for cancer category recognition, an increase of more than 18% compared with other methods. Furthermore, a survival analysis and clinical parameter enrichment tests conducted on 10 cancer datasets demonstrated that MCluster-VAEs provides comparable and even better results than many common integrative approaches. These results demonstrate that MCluster-VAEs are a powerful new tool for dissecting complex multi-omics relationships and providing new insights for cancer subtype discovery.
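The adjusted Rand index (ARI) quoted in this abstract is a standard chance-corrected measure of agreement between two labelings. As a reading aid, here is a minimal pure-Python sketch of the usual contingency-table formula; it is a generic illustration and not the evaluation code used by the MCluster-VAEs authors. Returning 1.0 for degenerate inputs (fewer than two samples, or both labelings consisting of a single cluster) is a convention assumed here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same samples."""
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # assumed convention for fewer than two samples

    # Pairs of samples that share a cluster in both, one, or the other labeling.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_rows * sum_cols / total_pairs   # agreement expected by chance
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # assumed convention for trivially identical partitions
    return (sum_cells - expected) / (max_index - expected)


# Label names do not matter, only the grouping does.
same = adjusted_rand_index([0, 0, 1, 1], ["b", "b", "a", "a"])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

A value of 1 means the two partitions are identical up to relabeling, values near 0 mean agreement no better than chance, and negative values mean agreement below chance.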
Affiliation(s)
- Zhiwei Rong
  - Department of Biostatistics Beijing, Peking University School of Public Health, No. 38 Xueyuan Road, Haidian District, Beijing, 100000, China
- Zhilin Liu
  - Department of Biostatistics Beijing, Peking University School of Public Health, No. 38 Xueyuan Road, Haidian District, Beijing, 100000, China
- Jiali Song
  - Department of Biostatistics Beijing, Peking University School of Public Health, No. 38 Xueyuan Road, Haidian District, Beijing, 100000, China
- Lei Cao
  - Department of Epidemiology and Biostatistics Harbin, Harbin Medical University School of Public Health, Harbin, 150000, Heilongjiang, China
- Yipe Yu
  - Department of Biostatistics Beijing, Peking University School of Public Health, No. 38 Xueyuan Road, Haidian District, Beijing, 100000, China
- Mantang Qiu
  - Department of Thoracic Surgery Beijing, Peking University People's Hospital, Beijing, 100000, China
- Yan Hou
  - Department of Biostatistics Beijing, Peking University School of Public Health, No. 38 Xueyuan Road, Haidian District, Beijing, 100000, China
  - Peking University Clinical Research Center, No. 38 Xueyuan Road, Haidian District, Beijing, 100000, China
12
Liu Q, Luo X, Li J, Wang G. scESI: evolutionary sparse imputation for single-cell transcriptomes from nearest neighbor cells. Brief Bioinform 2022; 23:6580519. [PMID: 35512331 DOI: 10.1093/bib/bbac144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2022] [Revised: 03/14/2022] [Accepted: 03/31/2022] [Indexed: 02/01/2023] [Open Access]
Abstract
The ubiquitous dropout problem in single-cell RNA sequencing technology causes a large amount of data noise in the gene expression profile. For this reason, we propose an evolutionary sparse imputation (ESI) algorithm for single-cell transcriptomes, which constructs a sparse representation model based on gene regulation relationships between cells. To solve this model, we design an optimization framework based on nondominated sorting genetics. This framework takes into account the topological relationship between cells and the variety of gene expression to iteratively search the global optimal solution, thereby learning the Pareto optimal cell-cell affinity matrix. Finally, we use the learned sparse relationship model between cells to improve data quality and reduce data noise. In simulated datasets, scESI performed significantly better than benchmark methods with various metrics. By applying scESI to real scRNA-seq datasets, we discovered scESI can not only further classify the cell types and separate cells in visualization successfully but also improve the performance in reconstructing trajectories differentiation and identifying differentially expressed genes. In addition, scESI successfully recovered the expression trends of marker genes in stem cell differentiation and can discover new cell types and putative pathways regulating biological processes.
Affiliation(s)
- Qiaoming Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Ximei Luo
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Jie Li
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Guohua Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
13
Pan J, You ZH, Li LP, Huang WZ, Guo JX, Yu CQ, Wang LP, Zhao ZY. DWPPI: A Deep Learning Approach for Predicting Protein–Protein Interactions in Plants Based on Multi-Source Information With a Large-Scale Biological Network. Front Bioeng Biotechnol 2022; 10:807522. [PMID: 35387292 PMCID: PMC8978800 DOI: 10.3389/fbioe.2022.807522] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 11/02/2021] [Accepted: 02/25/2022] [Indexed: 12/30/2022] [Open Access]
Abstract
The prediction of protein–protein interactions (PPIs) in plants is vital for probing cell function. Although multiple high-throughput approaches in the biological domain have been developed to identify PPIs, with the increasing complexity of PPI networks, these methods become laborious and time-consuming. Thus, it is essential to develop an effective and feasible computational method for the prediction of PPIs in plants. In this study, we present a network embedding-based method, called DWPPI, for predicting the interactions between different plant proteins based on multi-source information and combined with deep neural networks (DNN). The DWPPI model fuses the protein natural language sequence information (attribute information) and protein behavior information to represent plant proteins as feature vectors and finally sends these features to a deep learning–based classifier for prediction. To validate the prediction performance of DWPPI, we evaluated it on three model plant datasets: Arabidopsis thaliana (A. thaliana), maize (Zea mays), and rice (Oryza sativa). The experimental results with the fivefold cross-validation technique demonstrated that DWPPI obtains great performance with AUC (area under the ROC curve) values of 0.9548, 0.9867, and 0.9213, respectively. To further verify the predictive capacity of DWPPI, we compared it with several state-of-the-art machine learning classifiers. Moreover, case studies were performed with the AC149810.2_FGP003 protein. As a result, 14 of the top 20 PPI pairs identified by DWPPI with the highest scores were confirmed by the literature. These excellent results suggest that the DWPPI model can act as a promising tool for related plant molecular biology.
Affiliation(s)
- Jie Pan
  - School of Information Engineering, Xijing University, Xi’an, China
- Zhu-Hong You
  - School of Information Engineering, Xijing University, Xi’an, China
- Li-Ping Li
  - School of Information Engineering, Xijing University, Xi’an, China
  - College of Grassland and Environment Science, Xinjiang Agricultural University, Urumqi, China
  - *Correspondence: Li-Ping Li; Chang-Qing Yu
- Wen-Zhun Huang
  - School of Information Engineering, Xijing University, Xi’an, China
- Jian-Xin Guo
  - School of Information Engineering, Xijing University, Xi’an, China
- Chang-Qing Yu
  - School of Information Engineering, Xijing University, Xi’an, China
- Li-Ping Wang
  - School of Information Engineering, Xijing University, Xi’an, China
- Zheng-Yang Zhao
  - School of Information Engineering, Xijing University, Xi’an, China