1
Liao X, Li Y, Li S, Wen L, Li X, Yu B. Enhanced Integration of Single-Cell Multi-Omics Data Using Graph Attention Networks. ACS Synth Biol 2025. [PMID: 39888834 DOI: 10.1021/acssynbio.4c00864] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2025]
Abstract
The continuous advancement of single-cell multimodal omics (scMulti-omics) technologies offers unprecedented opportunities to measure various modalities, including RNA expression, protein abundance, gene perturbation, DNA methylation, and chromatin accessibility at single-cell resolution. These advances hold significant potential for breakthroughs by integrating diverse omics modalities. However, the data generated from different omics layers often face challenges due to high dimensionality, heterogeneity, and sparsity, which can adversely impact the accuracy and efficiency of data integration analyses. To address these challenges, we propose a high-precision analysis method called scMGAT (single-cell multiomics data analysis based on multihead graph attention networks). This method effectively coordinates reliable information across multiomics data sets using a multihead attention mechanism, allowing for better management of the heterogeneous characteristics inherent in scMulti-omics data. We evaluated scMGAT's performance on eight sets of real scMulti-omics data, including samples from both human and mouse. The experimental results demonstrate that scMGAT significantly enhances the quality of multiomics data and improves the accuracy of cell-type annotation compared to state-of-the-art methods. scMGAT is now freely accessible at https://github.com/Xingyu-Liao/scMGAT.
Affiliation(s)
- Xingyu Liao
  - School of Computer Science, Northwestern Polytechnical University (NPU), Chang'an Campus, Xi'an, Shaanxi 710072, P.R. China
- Yanyan Li
  - School of Computer Science, Northwestern Polytechnical University (NPU), Chang'an Campus, Xi'an, Shaanxi 710072, P.R. China
- Shuangyi Li
  - School of Data Science, Qingdao University of Science and Technology, Qingdao 266061, P.R. China
- Long Wen
  - School of Computer Science, Northwestern Polytechnical University (NPU), Chang'an Campus, Xi'an, Shaanxi 710072, P.R. China
- Xingyi Li
  - School of Computer Science, Northwestern Polytechnical University (NPU), Chang'an Campus, Xi'an, Shaanxi 710072, P.R. China
- Bin Yu
  - School of Data Science, Qingdao University of Science and Technology, Qingdao 266061, P.R. China
2
Wu CH, Zhou X, Chen M. The curses of performing differential expression analysis using single-cell data. bioRxiv 2024:2024.05.28.596315. [PMID: 38853843 PMCID: PMC11160624 DOI: 10.1101/2024.05.28.596315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
Differential expression (DE) analysis is pivotal in single-cell transcriptomics for unraveling cell-type-specific responses to stimuli. While numerous methods are available to identify differentially expressed genes in single-cell data, recent evaluations of both single-cell-specific methods and methods adapted from bulk studies have revealed significant shortcomings in performance. In this paper, we dissect the four major challenges in single-cell DE analysis: normalization, excessive zeros, donor effects, and cumulative biases. These "curses" underscore the limitations and conceptual pitfalls in existing workflows. In response, we introduce a novel paradigm addressing several of these issues.
3
Li X, Hao J, Li J, Zhao Z, Shang X, Li M. Pathway Activation Analysis for Pan-Cancer Personalized Characterization Based on Riemannian Manifold. Int J Mol Sci 2024; 25:4411. [PMID: 38673997 PMCID: PMC11050713 DOI: 10.3390/ijms25084411] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Revised: 04/08/2024] [Accepted: 04/11/2024] [Indexed: 04/28/2024]
Abstract
The pathogenesis of carcinoma is believed to come from the combined effect of polygenic variation, and the initiation and progression of malignant tumors are closely related to the dysregulation of biological pathways. Quantifying the alteration in pathway activation and identifying coordinated patterns of pathway dysfunction are imperative for understanding the malignancy process and distinguishing different tumor stages or clinical outcomes of individual patients. In this study, we have conducted in silico pathway activation analysis using a Riemannian manifold (RiePath) toward pan-cancer personalized characterization, which is the first attempt to apply Riemannian manifold theory to measure the extent of pathway dysregulation in individual patients on the tangent space of the Riemannian manifold. RiePath effectively integrates pathway and gene expression information, not only generating a relatively low-dimensional and biologically relevant representation, but also identifying a robust panel of biologically meaningful pathway signatures as biomarkers. The pan-cancer analysis across 16 cancer types reveals the capability of RiePath to evaluate pathway activation accurately and identify clinical outcome-related pathways. We believe that RiePath has the potential to provide new prospects in understanding the molecular mechanisms of complex diseases and may find broader applications in predicting biomarkers for other intricate diseases.
Affiliation(s)
- Xingyi Li
  - School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China
- Jun Hao
  - School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China
- Junming Li
  - School of Software, Northwestern Polytechnical University, Xi’an 710072, China
- Zhelin Zhao
  - School of Software, Northwestern Polytechnical University, Xi’an 710072, China
- Xuequn Shang
  - School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China
- Min Li
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
4
Zhang H, Wang Y, Lian B, Wang Y, Li X, Wang T, Shang X, Yang H, Aziz A, Hu J. Scbean: a python library for single-cell multi-omics data analysis. Bioinformatics 2024; 40:btae053. [PMID: 38290765 PMCID: PMC10868338 DOI: 10.1093/bioinformatics/btae053] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/10/2023] [Revised: 01/10/2024] [Accepted: 01/25/2024] [Indexed: 02/01/2024]
Abstract
SUMMARY Single-cell multi-omics technologies provide a unique platform for characterizing cell states and reconstructing developmental process by simultaneously quantifying and integrating molecular signatures across various modalities, including genome, transcriptome, epigenome, and other omics layers. However, there is still an urgent unmet need for novel computational tools in this nascent field, which are critical for both effective and efficient interrogation of functionality across different omics modalities. Scbean represents a user-friendly Python library, designed to seamlessly incorporate a diverse array of models for the examination of single-cell data, encompassing both paired and unpaired multi-omics data. The library offers uniform and straightforward interfaces for tasks, such as dimensionality reduction, batch effect elimination, cell label transfer from well-annotated scRNA-seq data to scATAC-seq data, and the identification of spatially variable genes. Moreover, Scbean's models are engineered to harness the computational power of GPU acceleration through Tensorflow, rendering them capable of effortlessly handling datasets comprising millions of cells. AVAILABILITY AND IMPLEMENTATION Scbean is released on the Python Package Index (PyPI) (https://pypi.org/project/scbean/) and GitHub (https://github.com/jhu99/scbean) under the MIT license. The documentation and example code can be found at https://scbean.readthedocs.io/en/latest/.
Affiliation(s)
- Haohui Zhang
  - School of Computer Science, Northwestern Polytechnical University, 710129 Xi'an, Shaanxi, China
- Yuwei Wang
  - School of Computer Science, Northwestern Polytechnical University, 710129 Xi'an, Shaanxi, China
- Bin Lian
  - School of Computer Science, Northwestern Polytechnical University, 710129 Xi'an, Shaanxi, China
- Yiran Wang
  - School of Computer Science, Northwestern Polytechnical University, 710129 Xi'an, Shaanxi, China
- Xingyi Li
  - School of Computer Science, Northwestern Polytechnical University, 710129 Xi'an, Shaanxi, China
- Tao Wang
  - School of Computer Science, Northwestern Polytechnical University, 710129 Xi'an, Shaanxi, China
- Xuequn Shang
  - School of Computer Science, Northwestern Polytechnical University, 710129 Xi'an, Shaanxi, China
- Hui Yang
  - School of Life Science, Northwestern Polytechnical University, 710072 Xi'an, Shaanxi, China
- Ahmad Aziz
  - Population Health Sciences, German Center for Neurodegenerative Diseases (DZNE), 53127 Bonn, Germany
  - Department of Neurology, Faculty of Medicine, University of Bonn, 53105 Bonn, Germany
- Jialu Hu
  - School of Computer Science, Northwestern Polytechnical University, 710129 Xi'an, Shaanxi, China
  - Population Health Sciences, German Center for Neurodegenerative Diseases (DZNE), 53127 Bonn, Germany
5
Maden SK, Kwon SH, Huuki-Myers LA, Collado-Torres L, Hicks SC, Maynard KR. Challenges and opportunities to computationally deconvolve heterogeneous tissue with varying cell sizes using single-cell RNA-sequencing datasets. Genome Biol 2023; 24:288. [PMID: 38098055 PMCID: PMC10722720 DOI: 10.1186/s13059-023-03123-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 05/11/2023] [Accepted: 11/24/2023] [Indexed: 12/17/2023]
Abstract
Deconvolution of cell mixtures in "bulk" transcriptomic samples from homogenate human tissue is important for understanding disease pathologies. However, several experimental and computational challenges impede transcriptomics-based deconvolution approaches using single-cell/nucleus RNA-seq reference atlases. Cells from the brain and blood have substantially different sizes, total mRNA, and transcriptional activities, and existing approaches may quantify total mRNA instead of cell type proportions. Further, standards are lacking for the use of cell reference atlases and integrative analyses of single-cell and spatial transcriptomics data. We discuss how to approach these key challenges with orthogonal "gold standard" datasets for evaluating deconvolution methods.
Affiliation(s)
- Sean K Maden
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Sang Ho Kwon
  - Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
  - The Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Louise A Huuki-Myers
  - Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
- Leonardo Collado-Torres
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
  - Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
- Stephanie C Hicks
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD, USA
- Kristen R Maynard
  - Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
  - The Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA
  - Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, USA
6
Wang Y, Lian B, Zhang H, Zhong Y, He J, Wu F, Reinert K, Shang X, Yang H, Hu J. A multi-view latent variable model reveals cellular heterogeneity in complex tissues for paired multimodal single-cell data. Bioinformatics 2023; 39:btad005. [PMID: 36622018 PMCID: PMC9857983 DOI: 10.1093/bioinformatics/btad005] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/03/2022] [Revised: 12/27/2022] [Accepted: 01/06/2023] [Indexed: 01/10/2023]
Abstract
MOTIVATION Single-cell multimodal assays allow us to simultaneously measure two different molecular features of the same cell, enabling new insights into cellular heterogeneity, cell development and diseases. However, most existing methods suffer from inaccurate dimensionality reduction for the joint-modality data, hindering their discovery of novel or rare cell subpopulations. RESULTS Here, we present VIMCCA, a computational framework based on variational-assisted multi-view canonical correlation analysis to integrate paired multimodal single-cell data. Our statistical model uses a common latent variable to interpret the common source of variances in two different data modalities. Our approach jointly learns an inference model and two modality-specific non-linear models by leveraging variational inference and deep learning. We perform VIMCCA and compare it with 10 existing state-of-the-art algorithms on four paired multi-modal datasets sequenced by different protocols. Results demonstrate that VIMCCA facilitates integrating various types of joint-modality data, thus leading to more reliable and accurate downstream analysis. VIMCCA improves our ability to identify novel or rare cell subtypes compared to existing widely used methods. Besides, it can also facilitate inferring cell lineage based on joint-modality profiles. AVAILABILITY AND IMPLEMENTATION The VIMCCA algorithm has been implemented in our toolkit package scbean (≥0.5.0), and its code has been archived at https://github.com/jhu99/scbean under MIT license. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yuwei Wang
  - School of Computer Science, Northwestern Polytechnical University, Shaanxi 710129, China
- Bin Lian
  - School of Computer Science, Northwestern Polytechnical University, Shaanxi 710129, China
- Haohui Zhang
  - School of Computer Science, Northwestern Polytechnical University, Shaanxi 710129, China
- Yuanke Zhong
  - School of Computer Science, Northwestern Polytechnical University, Shaanxi 710129, China
- Jie He
  - Department of Biostatistics, School of Public Health, Peking University Health Science Center, Beijing 100191, China
- Fashuai Wu
  - Department of Orthopaedics, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430022, China
- Knut Reinert
  - Institut für Informatik, Freie Universität Berlin, 14195 Berlin, Germany
- Xuequn Shang
  - School of Computer Science, Northwestern Polytechnical University, Shaanxi 710129, China
- Hui Yang
  - School of Life Science, Northwestern Polytechnical University, Shaanxi 710072, China
- Jialu Hu
  - School of Computer Science, Northwestern Polytechnical University, Shaanxi 710129, China
7
Chen Z, Liang B, Wu Y, Zhou H, Wang Y, Wu H. Identifying driver modules based on multi-omics biological networks in prostate cancer. IET Syst Biol 2022; 16:187-200. [PMID: 36039671 PMCID: PMC9675413 DOI: 10.1049/syb2.12050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2022] [Revised: 07/31/2022] [Accepted: 08/13/2022] [Indexed: 01/11/2023]
Abstract
The development of sequencing technology has promoted the expansion of cancer genome data. It is necessary to identify the pathogenesis of cancer at the molecular level and explore reliable treatment methods and precise drug targets in cancer by identifying carcinogenic functional modules in massive multi-omics data. However, there are still limitations to identifying carcinogenic driver modules by utilising genetic characteristics simply. Therefore, this study proposes a computational method, NetAP, to identify driver modules in prostate cancer. Firstly, high mutual exclusivity, high coverage, and high topological similarity between genes are integrated to construct a weight function, which calculates the weight of gene pairs in a biological network. Secondly, the random walk method is utilised to reevaluate the strength of interaction among genes. Finally, the optimal driver modules are identified by utilising the affinity propagation algorithm. According to the results, the authors' method identifies more validated driver genes and driver modules compared with the other previous methods. Thus, the proposed NetAP method can identify carcinogenic driver modules effectively and reliably, and the experimental results provide a powerful basis for cancer diagnosis, treatment and drug targets.
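The second step of the abstract above, re-scoring gene interactions with a random walk, can be illustrated with a small sketch. This is not the NetAP implementation: the abstract does not say which random-walk variant is used, so the code assumes random walk with restart, and the function name `random_walk_with_restart`, the `restart` parameter, and the toy weight matrix are all hypothetical. The weight function and the affinity propagation step are not shown.

```python
def random_walk_with_restart(weights, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities for a walk restarted at `seed`.

    `weights` is a square list-of-lists of non-negative gene-pair weights
    (assumed here to come from some upstream weight function).
    """
    n = len(weights)
    # Column-normalise the weights so each column is a transition distribution.
    col_sums = [sum(weights[i][j] for i in range(n)) for j in range(n)]
    trans = [
        [weights[i][j] / col_sums[j] if col_sums[j] > 0 else 0.0 for j in range(n)]
        for i in range(n)
    ]
    p0 = [1.0 if i == seed else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        # One step: follow an edge with probability (1 - restart), else jump back.
        nxt = [
            (1.0 - restart) * sum(trans[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy network: gene 0 is strongly linked to gene 1, gene 1 weakly to gene 2.
toy_weights = [
    [0.0, 0.9, 0.0],
    [0.9, 0.0, 0.1],
    [0.0, 0.1, 0.0],
]
scores = random_walk_with_restart(toy_weights, seed=0)
```

In this toy case the scores decrease with network distance from the seed gene, which is the sense in which a walk can re-rank interaction strength beyond the raw pairwise weights.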
Affiliation(s)
- Zhongli Chen
  - Tibet Center for Disease Control and Prevention, Lhasa, China
  - School of Software, Shandong University, Jinan, China
  - School of Information Engineering, Northwest A&F University, Yangling, China
- Biting Liang
  - School of Information Engineering, Northwest A&F University, Yangling, China
- Yingfu Wu
  - School of Information Engineering, Northwest A&F University, Yangling, China
- Haoru Zhou
  - School of Information Engineering, Northwest A&F University, Yangling, China
- Yuchen Wang
  - School of Software, Shandong University, Jinan, China
- Hao Wu
  - School of Software, Shandong University, Jinan, China
8
Zhang P, Wu Y, Zhou H, Zhou B, Zhang H, Wu H. CLNN-loop: a deep learning model to predict CTCF-mediated chromatin loops in the different cell lines and CTCF-binding sites (CBS) pair types. Bioinformatics 2022; 38:4497-4504. [PMID: 35997565 DOI: 10.1093/bioinformatics/btac575] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 03/09/2022] [Revised: 06/28/2022] [Accepted: 08/22/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION Three-dimensional (3D) genome organization is of vital importance in gene regulation and disease mechanisms. Previous studies have shown that CTCF-mediated chromatin loops are crucial to studying the 3D structure of cells. Although various experimental techniques have been developed to detect chromatin loops, they have been found to be time-consuming and costly. Nowadays, various sequence-based computational methods can capture significant features of 3D genome organization and help predict chromatin loops. However, these methods have low performance and poor generalization ability in predicting chromatin loops. RESULTS Here, we propose a novel deep learning model, called CLNN-loop, to predict chromatin loops in different cell lines and CTCF-binding sites (CBS) pair types by fusing multiple sequence-based features. The analysis of a series of examinations based on the datasets in the previous study shows that CLNN-loop has satisfactory performance and is superior to the existing methods in terms of predicting chromatin loops. In addition, we apply the SHAP framework to interpret the predictions of different models, and find that CTCF motif and sequence conservation are important signs of chromatin loops in different cell lines and CBS pair types. AVAILABILITY AND IMPLEMENTATION The source code of CLNN-loop is freely available at https://github.com/HaoWuLab-Bioinformatics/CLNN-loop and the webserver of CLNN-loop is freely available at http://hwclnn.sdu.edu.cn. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pengyu Zhang
  - School of Software, Shandong University, Jinan, Shandong 250101, China
  - College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China
- Yingfu Wu
  - College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China
- Haoru Zhou
  - College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China
- Bing Zhou
  - College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China
- Hongming Zhang
  - College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China
- Hao Wu
  - School of Software, Shandong University, Jinan, Shandong 250101, China
9
Qu G, Yan Z, Wu H. Clover: tree structure-based efficient DNA clustering for DNA-based data storage. Brief Bioinform 2022; 23:6668252. [PMID: 35975958 DOI: 10.1093/bib/bbac336] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 05/18/2022] [Revised: 07/21/2022] [Accepted: 07/22/2022] [Indexed: 11/12/2022]
Abstract
Deoxyribonucleic acid (DNA)-based data storage is a promising new storage technology which has the advantage of high storage capacity and long storage time compared with traditional storage media. However, the synthesis and sequencing process of DNA can randomly generate many types of errors, which makes it more difficult to cluster DNA sequences to recover DNA information. Currently, the available DNA clustering algorithms are targeted at DNA sequences in the biological domain, which not only cannot adapt to the characteristics of sequences in DNA storage, but also tend to be unacceptably time-consuming for billions of DNA sequences in DNA storage. In this paper, we propose an efficient DNA clustering method termed Clover for DNA storage with linear computational complexity and low memory. Clover avoids the computation of the Levenshtein distance by using a tree structure for interval-specific retrieval. We argue through theoretical proofs that Clover has standard linear computational complexity, low space complexity, etc. Experiments show that our method can cluster 10 million DNA sequences into 50 000 classes in 10 s and meet an accuracy rate of over 99%. Furthermore, we have successfully completed an unprecedented clustering of 10 billion DNA data on a single home computer and the time consumption still satisfies the linear relationship. Clover is freely available at https://github.com/Guanjinqu/Clover.
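The idea of replacing pairwise Levenshtein comparisons with a tree lookup, as described in the abstract above, can be sketched as follows. This is a deliberately simplified illustration and not Clover itself: it assumes exact matching on one fixed interval of each read and has no tolerance for synthesis or sequencing errors, which the real tool must handle. The names `TrieNode` and `cluster_by_interval`, and the `start` and `length` parameters, are hypothetical.

```python
class TrieNode:
    """One node of a prefix tree over DNA bases."""

    __slots__ = ("children", "cluster_id")

    def __init__(self):
        self.children = {}      # base character -> child TrieNode
        self.cluster_id = None  # set on the node that ends an indexed interval


def cluster_by_interval(reads, start=0, length=8):
    """Group reads whose interval [start, start + length) is identical.

    Each read costs one walk of at most `length` steps down the tree, so the
    total work grows linearly with the number of reads and no pairwise edit
    distance is ever computed.
    """
    root = TrieNode()
    clusters = []
    for read in reads:
        node = root
        for base in read[start:start + length]:
            node = node.children.setdefault(base, TrieNode())
        if node.cluster_id is None:
            # First read with this interval: open a new cluster.
            node.cluster_id = len(clusters)
            clusters.append([])
        clusters[node.cluster_id].append(read)
    return clusters


toy_reads = ["ACGTACGTAA", "ACGTACGTCC", "TTTTGGGGAA"]
toy_clusters = cluster_by_interval(toy_reads)
```

The first two toy reads share their first eight bases and land in the same cluster, while the third opens a cluster of its own. A practical method would also need a fallback for reads whose indexed interval contains an error.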
Affiliation(s)
- Guanjin Qu
  - Center for Applied Mathematics, Tianjin University, Tianjin, 300072, China
- Zihui Yan
  - Center for Applied Mathematics, Tianjin University, Tianjin, 300072, China
- Huaming Wu
  - Center for Applied Mathematics, Tianjin University, Tianjin, 300072, China
10
Liu W, Liao X, Yang Y, Lin H, Yeong J, Zhou X, Shi X, Liu J. Joint dimension reduction and clustering analysis of single-cell RNA-seq and spatial transcriptomics data. Nucleic Acids Res 2022; 50:e72. [PMID: 35349708 PMCID: PMC9262606 DOI: 10.1093/nar/gkac219] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 01/08/2022] [Revised: 02/22/2022] [Accepted: 03/22/2022] [Indexed: 11/13/2022]
Abstract
Dimension reduction and (spatial) clustering is usually performed sequentially; however, the low-dimensional embeddings estimated in the dimension-reduction step may not be relevant to the class labels inferred in the clustering step. We therefore developed a computation method, Dimension-Reduction Spatial-Clustering (DR-SC), that can simultaneously perform dimension reduction and (spatial) clustering within a unified framework. Joint analysis by DR-SC produces accurate (spatial) clustering results and ensures the effective extraction of biologically informative low-dimensional features. DR-SC is applicable to spatial clustering in spatial transcriptomics that characterizes the spatial organization of the tissue by segregating it into multiple tissue structures. Here, DR-SC relies on a latent hidden Markov random field model to encourage the spatial smoothness of the detected spatial cluster boundaries. Underlying DR-SC is an efficient expectation-maximization algorithm based on an iterative conditional mode. As such, DR-SC is scalable to large sample sizes and can optimize the spatial smoothness parameter in a data-driven manner. With comprehensive simulations and real data applications, we show that DR-SC outperforms existing clustering and spatial clustering methods: it extracts more biologically relevant features than conventional dimension reduction methods, improves clustering performance, and offers improved trajectory inference and visualization for downstream trajectory inference analyses.
Affiliation(s)
- Wei Liu
  - Academy of Statistics and Interdisciplinary Sciences, East China Normal University, Shanghai, 200062, China
  - Centre for Quantitative Medicine, Health Services & Systems Research, Duke-NUS Medical School, 169857, Singapore
- Xu Liao
  - Centre for Quantitative Medicine, Health Services & Systems Research, Duke-NUS Medical School, 169857, Singapore
- Yi Yang
  - Centre for Quantitative Medicine, Health Services & Systems Research, Duke-NUS Medical School, 169857, Singapore
- Huazhen Lin
  - Center of Statistical Research and School of Statistics, Southwestern University of Finance and Economics, Chengdu, 611130, China
- Joe Yeong
  - Institute of Molecular and Cell Biology (IMCB), Agency of Science, Technology and Research (A*STAR), 138673, Singapore
  - Department of Anatomical Pathology, Singapore General Hospital, 169856, Singapore
- Xiang Zhou
  - Department of Biostatistics, University of Michigan, Ann Arbor, 48109, USA
- Xingjie Shi
  - Academy of Statistics and Interdisciplinary Sciences, East China Normal University, Shanghai, 200062, China
  - Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, 200062, China
- Jin Liu
  - Centre for Quantitative Medicine, Health Services & Systems Research, Duke-NUS Medical School, 169857, Singapore
11
Li W, Zhang S, Zhao Y, Wang D, Shi Q, Ding Z, Wang Y, Gao B, Yan M. Revealing the Key MSCs Niches and Pathogenic Genes in Influencing CEP Homeostasis: A Conjoint Analysis of Single-Cell and WGCNA. Front Immunol 2022; 13:933721. [PMID: 35833124 PMCID: PMC9271696 DOI: 10.3389/fimmu.2022.933721] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/01/2022] [Accepted: 05/30/2022] [Indexed: 01/24/2023]
Abstract
Degenerative disc disease (DDD), a major contributor to discogenic pain, mainly results from the dysfunction of nucleus pulposus (NP), annulus fibrosus (AF) and cartilage endplate (CEP) cells. Genetic and cellular alterations in the CEP may influence disc homeostasis, but the scarcity of single-cell RNA sequencing (scRNA-seq) reports on the CEP makes it challenging to evaluate its cellular heterogeneity. Here, this study conducted the first conjoint analysis of weighted gene co-expression network analysis (WGCNA) and scRNA-seq in the CEP, systematically analyzing the modules of interest, the immune infiltration status, and the cell niches in the CEP. WGCNA and protein-protein interaction (PPI) network analysis determined a group of gene signatures responsible for degenerative CEP, including BRD4, RAF1, ANGPT1, CHD7 and NOP56; differential immune analysis showed that CD4+ T cells, NK cells and dendritic cells were highly activated in degenerative CEP; the single-cell resolution transcriptomic landscape then identified several mesenchymal stem cell populations and other cellular components in human CEP, yielding a niche atlas of different cell subpopulations in which 8 populations were identified by distinct molecular signatures. Among these, NP progenitor/mesenchymal stem cells (NPMSC), which also served as multipotent stem cells in the CEP, exhibited regenerative and therapeutic potential in promoting bone repair and maintaining bone homeostasis through SPP1- and NRP1-related cascade reactions; regulatory and effector mesenchymal chondrocytes could be further classified into 2 different subtypes, which had potentially opposite effects in maintaining cartilage homeostasis; finally, analysis of the potential functional differences among the mesenchymal stem cell populations and their possible interactions with other cell types revealed that JAG1, SPP1, MIF, PDGF and other factors generated by different cells could regulate CEP homeostasis through bone formation or angiogenesis, and could serve as novel therapeutic targets for degenerative CEP. In brief, this study revealed the complexity and phenotypic characteristics of mesenchymal stem cell populations in the CEP, filling a gap in the knowledge of CEP components and furthering researchers' understanding of the CEP and its cell niche constitution.
Affiliation(s)
- Weihang Li
  - Department of Orthopedic Surgery, Xijing Hospital, Air Force Medical University, Xi’an, China
- Shilei Zhang
  - Department of Orthopedic Surgery, Xijing Hospital, Air Force Medical University, Xi’an, China
- Yingjing Zhao
  - Department of Intensive Care Unit, Nanjing First Hospital, Nanjing Medical University, Nanjing, China
- Dong Wang
  - Department of Orthopedic Surgery, Xijing Hospital, Air Force Medical University, Xi’an, China
  - Department of Orthopaedics, Affiliated Hospital of Yanan University, Yanan, China
- Quan Shi
  - Department of Orthopedic Surgery, Xijing Hospital, Air Force Medical University, Xi’an, China
  - Department of Orthopaedics, Affiliated Hospital of Yanan University, Yanan, China
- Ziyi Ding
  - Department of Orthopedic Surgery, Xijing Hospital, Air Force Medical University, Xi’an, China
- Yongchun Wang
  - Department of Aerospace Medical Training, School of Aerospace Medicine, Air Force Medical University, Xi’an, China
  - Key Lab of Aerospace Medicine, Chinese Ministry of Education, Xi’an, China
  - *Correspondence: Ming Yan; Bo Gao; Yongchun Wang
- Bo Gao
  - Department of Orthopedic Surgery, Xijing Hospital, Air Force Medical University, Xi’an, China
  - *Correspondence: Ming Yan; Bo Gao; Yongchun Wang
- Ming Yan
  - Department of Orthopedic Surgery, Xijing Hospital, Air Force Medical University, Xi’an, China
  - *Correspondence: Ming Yan; Bo Gao; Yongchun Wang
12
Zhao J, Wang G, Ming J, Lin Z, Wang Y, Wu AR, Yang C. Adversarial domain translation networks for integrating large-scale atlas-level single-cell datasets. Nat Comput Sci 2022; 2:317-330. [PMID: 38177826 DOI: 10.1038/s43588-022-00251-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 11/28/2021] [Accepted: 04/20/2022] [Indexed: 01/06/2024]
Abstract
The rapid emergence of large-scale atlas-level single-cell RNA-seq datasets presents remarkable opportunities for broad and deep biological investigations through integrative analyses. However, harmonizing such datasets requires integration approaches to be not only computationally scalable, but also capable of preserving a wide range of fine-grained cell populations. We have created Portal, a unified framework of adversarial domain translation to learn harmonized representations of datasets. When compared to other state-of-the-art methods, Portal achieves better performance for preserving biological variation during integration, while achieving the integration of millions of cells, in minutes, with low memory consumption. We show that Portal is widely applicable to integrating datasets across different samples, platforms and data types. We also apply Portal to the integration of cross-species datasets with limited shared information among them, elucidating biological insights into the similarities and divergences in the spermatogenesis process among mouse, macaque and human.
Affiliation(s)
- Jia Zhao
  - Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Gefei Wang
  - Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Jingsi Ming
  - Academy of Statistics and Interdisciplinary Sciences, KLATASDS-MOE, East China Normal University, Shanghai, China
- Zhixiang Lin
  - Department of Statistics, The Chinese University of Hong Kong, Hong Kong SAR, China
- Yang Wang
  - Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
  - Guangdong-Hong Kong-Macao Joint Laboratory for Data-Driven Fluid Mechanics and Engineering Applications, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Angela Ruohao Wu
  - Division of Life Science, The Hong Kong University of Science and Technology, Hong Kong SAR, China
  - Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR, China
  - Center for Aging Science, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Can Yang
  - Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
  - Guangdong-Hong Kong-Macao Joint Laboratory for Data-Driven Fluid Mechanics and Engineering Applications, The Hong Kong University of Science and Technology, Hong Kong SAR, China