1
Hu Y, Xie M, Li Y, Rao M, Shen W, Luo C, Qin H, Baek J, Zhou XM. Benchmarking clustering, alignment, and integration methods for spatial transcriptomics. Genome Biol 2024; 25:212. [PMID: 39123269] [PMCID: PMC11312151] [DOI: 10.1186/s13059-024-03361-0] [Received: 03/12/2024] [Accepted: 07/30/2024] [Indexed: 08/12/2024]
Abstract
BACKGROUND: Spatial transcriptomics (ST) is advancing our understanding of complex tissues and organisms. However, building a robust clustering algorithm to define spatially coherent regions in a single tissue slice and aligning or integrating multiple tissue slices originating from diverse sources for essential downstream analyses remains challenging. Numerous clustering, alignment, and integration methods have been specifically designed for ST data by leveraging its spatial information. The absence of comprehensive benchmark studies complicates the selection of methods and future method development.
RESULTS: In this study, we systematically benchmark a variety of state-of-the-art algorithms with a wide range of real and simulated datasets of varying sizes, technologies, species, and complexity. We analyze the strengths and weaknesses of each method using diverse quantitative and qualitative metrics and analyses, including eight metrics for spatial clustering accuracy and contiguity, uniform manifold approximation and projection visualization, layer-wise and spot-to-spot alignment accuracy, and 3D reconstruction, which are designed to assess method performance as well as data quality. The code used for evaluation is available on our GitHub. Additionally, we provide online notebook tutorials and documentation to facilitate the reproduction of all benchmarking results and to support the study of new methods and new datasets.
CONCLUSIONS: Our analyses lead to comprehensive recommendations that cover multiple aspects, helping users to select optimal tools for their specific needs and guide future method development.
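Spatial clustering accuracy against manually annotated regions is commonly scored with label-agreement metrics such as the adjusted Rand index (ARI). The minimal pure-Python sketch below computes ARI from the contingency counts of two labelings. It assumes both labelings cover the same spots in the same order; the function name is hypothetical, and this is an illustration of one common metric, not the benchmark's own evaluation code.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat labelings of the same spots."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have equal length")
    n = len(labels_true)
    # Contingency counts n_ij: spots in true cluster i and predicted cluster j.
    pair_counts = Counter(zip(labels_true, labels_pred))
    sum_ij = sum(comb(c, 2) for c in pair_counts.values())
    # Pairs inside each true cluster (a_i) and each predicted cluster (b_j).
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    total = comb(n, 2)
    expected = sum_a * sum_b / total if total else 0.0
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case, e.g. a single spot or identical trivial labelings.
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

ARI ignores how cluster IDs are named, so a perfect clustering with permuted labels still scores 1.0, while a labeling no better than chance scores around 0.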
Affiliation(s)
- Yunfei Hu
  - Department of Computer Science, Vanderbilt University, 37235, Nashville, USA
- Manfei Xie
  - Department of Biomedical Engineering, Vanderbilt University, 37235, Nashville, USA
- Yikang Li
  - Department of Biomedical Engineering, Vanderbilt University, 37235, Nashville, USA
- Mingxing Rao
  - Department of Computer Science, Vanderbilt University, 37235, Nashville, USA
- Wenjun Shen
  - Department of Bioinformatics, Shantou University Medical College, 515041, Shantou, China
- Can Luo
  - Department of Biomedical Engineering, Vanderbilt University, 37235, Nashville, USA
- Haoran Qin
  - Department of Computer Science, Vanderbilt University, 37235, Nashville, USA
- Jihoon Baek
  - Department of Computer Science, Vanderbilt University, 37235, Nashville, USA
- Xin Maizie Zhou
  - Department of Computer Science, Vanderbilt University, 37235, Nashville, USA
  - Department of Biomedical Engineering, Vanderbilt University, 37235, Nashville, USA

2
You Y, Fu Y, Li L, Zhang Z, Jia S, Lu S, Ren W, Liu Y, Xu Y, Liu X, Jiang F, Peng G, Sampath Kumar A, Ritchie ME, Liu X, Tian L. Systematic comparison of sequencing-based spatial transcriptomic methods. Nat Methods 2024:10.1038/s41592-024-02325-3. [PMID: 38965443] [DOI: 10.1038/s41592-024-02325-3] [Received: 03/04/2024] [Accepted: 05/29/2024] [Indexed: 07/06/2024]
Abstract
Recent developments of sequencing-based spatial transcriptomics (sST) have catalyzed important advancements by facilitating transcriptome-scale spatial gene expression measurement. Despite this progress, efforts to comprehensively benchmark different platforms are currently lacking. The extant variability across technologies and datasets poses challenges in formulating standardized evaluation metrics. In this study, we established a collection of reference tissues and regions characterized by well-defined histological architectures, and used them to generate data to compare 11 sST methods. We highlighted molecular diffusion as a variable parameter across different methods and tissues, significantly affecting the effective resolutions. Furthermore, we observed that spatial transcriptomic data demonstrate unique attributes beyond merely adding a spatial axis to single-cell data, including an enhanced ability to capture patterned rare cell states along with specific markers, albeit being influenced by multiple factors including sequencing depth and resolution. Our study assists biologists in sST platform selection, and helps foster a consensus on evaluation standards and establish a framework for future benchmarking efforts that can be used as a gold standard for the development and benchmarking of computational tools for spatial transcriptomic analysis.
Affiliation(s)
- Yue You
  - Guangzhou National Laboratory, Guangzhou, China
- Yuting Fu
  - School of Life Sciences, Westlake University, Hangzhou, China
  - Research Center for Industries of the Future, Westlake University, Hangzhou, China
  - Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, China
  - Westlake Institute for Advanced Study, Hangzhou, China
- Lanxiang Li
  - Guangzhou National Laboratory, Guangzhou, China
- Shikai Jia
  - School of Life Sciences, Westlake University, Hangzhou, China
  - Research Center for Industries of the Future, Westlake University, Hangzhou, China
  - Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, China
  - Westlake Institute for Advanced Study, Hangzhou, China
- Shihong Lu
  - Guangzhou National Laboratory, Guangzhou, China
- Wenle Ren
  - Guangzhou National Laboratory, Guangzhou, China
- Yifang Liu
  - School of Life Sciences, Westlake University, Hangzhou, China
  - Research Center for Industries of the Future, Westlake University, Hangzhou, China
  - Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, China
  - Westlake Institute for Advanced Study, Hangzhou, China
- Yang Xu
  - The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
- Xiaojing Liu
  - School of Life Sciences, Westlake University, Hangzhou, China
  - Research Center for Industries of the Future, Westlake University, Hangzhou, China
  - Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, China
  - Westlake Institute for Advanced Study, Hangzhou, China
- Fuqing Jiang
  - Center for Cell Lineage and Development, CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, GIBH-HKU Guangdong-Hong Kong Stem Cell and Regenerative Medicine Research Centre, University of Chinese Academy of Sciences, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou, China
  - Institute for Stem Cell and Regeneration, Chinese Academy of Sciences, Beijing, China
- Guangdun Peng
  - Center for Cell Lineage and Development, CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, GIBH-HKU Guangdong-Hong Kong Stem Cell and Regenerative Medicine Research Centre, University of Chinese Academy of Sciences, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou, China
  - Institute for Stem Cell and Regeneration, Chinese Academy of Sciences, Beijing, China
- Abhishek Sampath Kumar
  - Department of Stem Cell and Regenerative Biology, Harvard University
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Matthew E Ritchie
  - The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
- Xiaodong Liu
  - School of Life Sciences, Westlake University, Hangzhou, China
  - Research Center for Industries of the Future, Westlake University, Hangzhou, China
  - Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, China
  - Westlake Institute for Advanced Study, Hangzhou, China
- Luyi Tian
  - Guangzhou National Laboratory, Guangzhou, China
  - GMU-GIBH Joint School of Life Sciences, Guangzhou Medical University, Guangzhou, China

3
Andersson A, Behanova A, Avenel C, Windhager J, Malmberg F, Wählby C. Points2Regions: Fast, interactive clustering of imaging-based spatial transcriptomics data. Cytometry A 2024. [PMID: 38958502] [DOI: 10.1002/cyto.a.24884] [Received: 02/19/2024] [Revised: 05/30/2024] [Accepted: 06/13/2024] [Indexed: 07/04/2024]
Abstract
Imaging-based spatial transcriptomics techniques generate data in the form of spatial points belonging to different mRNA classes. A crucial part of analyzing the data involves the identification of regions with similar composition of mRNA classes. These biologically interesting regions can manifest at different spatial scales. For example, the composition of mRNA classes on a cellular scale corresponds to cell types, whereas compositions on a millimeter scale correspond to tissue-level structures. Traditional techniques for identifying such regions often rely on complementary data, such as pre-segmented cells, or lengthy optimization. This limits their applicability to tasks on a particular scale, restricting their capabilities in exploratory analysis. This article introduces "Points2Regions," a computational tool for identifying regions with similar mRNA compositions. The tool's novelty lies in its rapid feature extraction by rasterizing points (representing mRNAs) onto a pyramidal grid and its efficient clustering using a combination of hierarchical and k-means clustering. This enables fast and efficient region discovery across multiple scales without relying on additional data, making it a valuable resource for exploratory analysis. Points2Regions has demonstrated performance similar to state-of-the-art methods on two simulated datasets, without relying on segmented cells, while being several times faster. Experiments on real-world datasets show that regions identified by Points2Regions are similar to those identified in other studies, confirming that Points2Regions can be used to extract biologically relevant regions. The tool is shared as a Python package integrated into TissUUmaps and a Napari plugin, offering interactive clustering and visualization, significantly enhancing user experience in data exploration.
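The rasterize-then-cluster idea can be sketched in a few lines. This is a simplified, hypothetical illustration and not the Points2Regions package API: it uses a single grid scale instead of a pyramid, plain Lloyd's k-means instead of the tool's combination of hierarchical and k-means clustering, and assumes points arrive as `(x, y, mrna_class)` tuples. All function names are made up for the example.

```python
from collections import defaultdict


def rasterize(points, bin_size):
    """Count each mRNA class inside square bins of side `bin_size`."""
    bins = defaultdict(lambda: defaultdict(int))
    for x, y, mrna_class in points:
        bins[(int(x // bin_size), int(y // bin_size))][mrna_class] += 1
    return bins


def composition_vectors(bins, classes):
    """Convert per-bin counts into class-composition vectors that sum to 1."""
    keys = sorted(bins)
    vectors = []
    for key in keys:
        counts = [bins[key].get(c, 0) for c in classes]
        total = sum(counts)  # > 0, because a bin exists only if it holds a point
        vectors.append([v / total for v in counts])
    return keys, vectors


def _nearest(vector, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(vector, centroids[j])),
    )


def kmeans(vectors, k, n_iter=50):
    """Plain Lloyd's k-means with a deterministic, evenly spaced initialization."""
    if not vectors:
        return []
    k = min(k, len(vectors))
    step = len(vectors) // k
    centroids = [list(vectors[i * step]) for i in range(k)]
    labels = None
    for _ in range(n_iter):
        new_labels = [_nearest(v, centroids) for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def points_to_regions(points, bin_size, k):
    """Hypothetical single-scale pipeline: rasterize, normalize, cluster bins."""
    classes = sorted({mrna_class for _, _, mrna_class in points})
    keys, vectors = composition_vectors(rasterize(points, bin_size), classes)
    return dict(zip(keys, kmeans(vectors, k)))
```

Shrinking `bin_size` pushes the regions toward cellular scale and growing it pushes them toward tissue-level structures, which is the multi-scale trade-off a pyramidal grid is meant to cover in one pass.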
Affiliation(s)
- Axel Andersson
  - Department of IT and SciLifeLab BioImage Informatics Facility, Uppsala University, Uppsala, Sweden
- Andrea Behanova
  - Department of IT and SciLifeLab BioImage Informatics Facility, Uppsala University, Uppsala, Sweden
- Christophe Avenel
  - Department of IT and SciLifeLab BioImage Informatics Facility, Uppsala University, Uppsala, Sweden
- Jonas Windhager
  - Department of IT and SciLifeLab BioImage Informatics Facility, Uppsala University, Uppsala, Sweden
- Filip Malmberg
  - Department of IT and SciLifeLab BioImage Informatics Facility, Uppsala University, Uppsala, Sweden
- Carolina Wählby
  - Department of IT and SciLifeLab BioImage Informatics Facility, Uppsala University, Uppsala, Sweden

4
Jin Y, Zuo Y, Li G, Liu W, Pan Y, Fan T, Fu X, Yao X, Peng Y. Advances in spatial transcriptomics and its applications in cancer research. Mol Cancer 2024; 23:129. [PMID: 38902727] [PMCID: PMC11188176] [DOI: 10.1186/s12943-024-02040-9] [Received: 04/28/2024] [Accepted: 06/10/2024] [Indexed: 06/22/2024]
Abstract
Malignant tumors have increasing morbidity and high mortality, and their occurrence and development are complicated processes. The development of sequencing technologies has enabled us to gain a better understanding of the underlying genetic and molecular mechanisms in tumors. In recent years, spatial transcriptomics sequencing technologies have developed rapidly and allow the quantification and illustration of gene expression in the spatial context of tissues. Compared with traditional transcriptomics technologies, spatial transcriptomics technologies not only detect gene expression levels in cells, but also reveal the spatial location of genes within tissues, the cell composition of biological tissues, and the interactions between cells. Here we summarize the development of spatial transcriptomics technologies and spatial transcriptomics tools, and their applications in cancer research. We also discuss the limitations and challenges of current spatial transcriptomics approaches, as well as future development and prospects.
Affiliation(s)
- Yang Jin
  - Laboratory of Molecular Oncology, Frontiers Science Center for Disease-related Molecular Network, State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- Yuanli Zuo
  - Laboratory of Molecular Oncology, Frontiers Science Center for Disease-related Molecular Network, State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- Gang Li
  - Department of Thoracic Surgery, The Public Health Clinical Center of Chengdu, Chengdu, 610061, China
- Wenrong Liu
  - Laboratory of Molecular Oncology, Frontiers Science Center for Disease-related Molecular Network, State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- Yitong Pan
  - Laboratory of Molecular Oncology, Frontiers Science Center for Disease-related Molecular Network, State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- Ting Fan
  - Laboratory of Molecular Oncology, Frontiers Science Center for Disease-related Molecular Network, State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- Xin Fu
  - Laboratory of Molecular Oncology, Frontiers Science Center for Disease-related Molecular Network, State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- Xiaojun Yao
  - Department of Thoracic Surgery, The Public Health Clinical Center of Chengdu, Chengdu, 610061, China
- Yong Peng
  - Laboratory of Molecular Oncology, Frontiers Science Center for Disease-related Molecular Network, State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, 610041, China
  - Frontier Medical Center, Tianfu Jincheng Laboratory, Chengdu, 610212, China

5
Yu S, Li WV. spVC for the detection and interpretation of spatial gene expression variation. Genome Biol 2024; 25:103. [PMID: 38641849] [PMCID: PMC11027374] [DOI: 10.1186/s13059-024-03245-3] [Received: 09/24/2023] [Accepted: 04/10/2024] [Indexed: 04/21/2024]
Abstract
Spatially resolved transcriptomics technologies have opened new avenues for understanding gene expression heterogeneity in spatial contexts. However, existing methods for identifying spatially variable genes often focus solely on statistical significance, limiting their ability to capture continuous expression patterns and integrate spot-level covariates. To address these challenges, we introduce spVC, a statistical method based on a generalized Poisson model. spVC seamlessly integrates constant and spatially varying effects of covariates, facilitating comprehensive exploration of gene expression variability and enhancing interpretability. Simulation and real data applications confirm spVC's accuracy in these tasks, highlighting its versatility in spatial transcriptomics analysis.
Affiliation(s)
- Shan Yu
  - Department of Statistics, University of Virginia, Charlottesville, 22903, VA, USA
- Wei Vivian Li
  - Department of Statistics, University of California, Riverside, 92521, CA, USA

6
Yuan Z, Zhao F, Lin S, Zhao Y, Yao J, Cui Y, Zhang XY, Zhao Y. Benchmarking spatial clustering methods with spatially resolved transcriptomics data. Nat Methods 2024; 21:712-722. [PMID: 38491270] [DOI: 10.1038/s41592-024-02215-8] [Received: 03/13/2023] [Accepted: 02/16/2024] [Indexed: 03/18/2024]
Abstract
Spatial clustering, which shares an analogy with single-cell clustering, has expanded the scope of tissue physiology studies from cell-centroid to structure-centroid with spatially resolved transcriptomics (SRT) data. Computational methods have undergone remarkable development in recent years, but a comprehensive benchmark study is still lacking. Here we present a benchmark study of 13 computational methods on 34 SRT data (7 datasets). The performance was evaluated on the basis of accuracy, spatial continuity, marker gene detection, scalability, and robustness. We found existing methods were complementary in terms of their performance and functionality, and we provide guidance for selecting appropriate methods for given scenarios. On testing 22 additional challenging datasets, we identified challenges in identifying noncontinuous spatial domains and limitations of existing methods, highlighting their inadequacies in handling recent large-scale tasks. Furthermore, with 145 simulated datasets, we examined the robustness of these methods against four different factors, and assessed the impact of pre- and postprocessing approaches. Our study offers a comprehensive evaluation of existing spatial clustering methods with SRT data, paving the way for future advancements in this rapidly evolving field.
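Spatial continuity can be quantified in several ways. One simple option, assumed here for illustration and not necessarily a metric used in this benchmark, is the average fraction of each spot's k nearest neighbours that share its domain label. The function name and the default `k=4` are hypothetical choices.

```python
from math import dist


def neighbor_agreement(coords, labels, k=4):
    """Mean fraction of each spot's k nearest neighbours sharing its label.

    Values near 1 indicate spatially contiguous domains; fragmented,
    salt-and-pepper labelings score much lower.
    """
    n = len(coords)
    if n != len(labels):
        raise ValueError("coords and labels must have the same length")
    if n <= k:
        raise ValueError("need more spots than neighbours")
    total = 0.0
    for i in range(n):
        # Brute-force O(n^2) neighbour search; ties are broken by spot index.
        # A spatial index (e.g. a k-d tree) would be preferable on real data.
        ranked = sorted((dist(coords[i], coords[j]), j) for j in range(n) if j != i)
        neighbours = [j for _, j in ranked[:k]]
        total += sum(labels[j] == labels[i] for j in neighbours) / k
    return total / n
```

On a small grid, splitting the spots into two contiguous halves scores well above a checkerboard labeling with the same two labels, which is the kind of contrast a continuity metric is meant to capture.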
Affiliation(s)
- Zhiyuan Yuan
  - Center for Medical Research and Innovation, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Fudan University, Shanghai, China
  - Institute of Science and Technology for Brain-Inspired Intelligence; MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence; MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
- Fangyuan Zhao
  - Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Senlin Lin
  - Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Yu Zhao
  - Tencent AI Lab, Shenzhen, China
- Yan Cui
  - Center for Medical Research and Innovation, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Fudan University, Shanghai, China
  - Institute of Science and Technology for Brain-Inspired Intelligence; MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence; MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
- Xiao-Yong Zhang
  - Center for Medical Research and Innovation, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Fudan University, Shanghai, China
- Yi Zhao
  - Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China

7
Zahedi R, Ghamsari R, Argha A, Macphillamy C, Beheshti A, Alizadehsani R, Lovell NH, Lotfollahi M, Alinejad-Rokny H. Deep learning in spatially resolved transcriptfomics: a comprehensive technical view. Brief Bioinform 2024; 25:bbae082. [PMID: 38483255] [PMCID: PMC10939360] [DOI: 10.1093/bib/bbae082] [Received: 10/23/2023] [Revised: 12/22/2024] [Accepted: 02/13/2024] [Indexed: 03/17/2024]
Abstract
Spatially resolved transcriptomics (SRT) is a pioneering method for simultaneously studying morphological contexts and gene expression at single-cell precision. Data emerging from SRT are multifaceted, presenting researchers with intricate gene expression matrices, precise spatial details and comprehensive histology visuals. Such rich and intricate datasets, unfortunately, render many conventional methods like traditional machine learning and statistical models ineffective. The unique challenges posed by the specialized nature of SRT data have led the scientific community to explore more sophisticated analytical avenues. Recent trends indicate an increasing reliance on deep learning algorithms, especially in areas such as spatial clustering, identification of spatially variable genes and data alignment tasks. In this manuscript, we provide a rigorous critique of these advanced deep learning methodologies, probing into their merits, limitations and avenues for further refinement. Our in-depth analysis underscores that while the recent innovations in deep learning tailored for SRT have been promising, there remains a substantial potential for enhancement. A crucial area that demands attention is the development of models that can incorporate intricate biological nuances, such as phylogeny-aware processing or in-depth analysis of minuscule histology image segments. Furthermore, addressing challenges like the elimination of batch effects, perfecting data normalization techniques and countering the overdispersion and zero inflation patterns seen in gene expression is pivotal. To support the broader scientific community in their SRT endeavors, we have meticulously assembled a comprehensive directory of readily accessible SRT databases, hoping to serve as a foundation for future research initiatives.
Affiliation(s)
- Roxana Zahedi
  - UNSW BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, 2052, NSW, Australia
- Reza Ghamsari
  - UNSW BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, 2052, NSW, Australia
- Ahmadreza Argha
  - The Graduate School of Biomedical Engineering, UNSW Sydney, 2052, NSW, Australia
  - Tyree Institute of Health Engineering (IHealthE), UNSW Sydney, 2052, NSW, Australia
- Callum Macphillamy
  - School of Animal and Veterinary Sciences, University of Adelaide, Roseworthy, 5371, Australia
- Amin Beheshti
  - School of Computing, Macquarie University, Sydney, 2109, Australia
- Roohallah Alizadehsani
  - Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Waurn Ponds, Melbourne, VIC, 3216, Australia
- Nigel H Lovell
  - The Graduate School of Biomedical Engineering, UNSW Sydney, 2052, NSW, Australia
  - Tyree Institute of Health Engineering (IHealthE), UNSW Sydney, 2052, NSW, Australia
- Mohammad Lotfollahi
  - Computational Health Center, Helmholtz Munich, Germany
  - Wellcome Sanger Institute, Cambridge, UK
- Hamid Alinejad-Rokny
  - UNSW BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, 2052, NSW, Australia
  - Tyree Institute of Health Engineering (IHealthE), UNSW Sydney, 2052, NSW, Australia

8
Liang Y, Shi G, Cai R, Yuan Y, Xie Z, Yu L, Huang Y, Shi Q, Wang L, Li J, Tang Z. PROST: quantitative identification of spatially variable genes and domain detection in spatial transcriptomics. Nat Commun 2024; 15:600. [PMID: 38238417] [PMCID: PMC10796707] [DOI: 10.1038/s41467-024-44835-w] [Received: 04/04/2023] [Accepted: 12/19/2023] [Indexed: 01/22/2024]
Abstract
Computational methods have been proposed to leverage spatially resolved transcriptomic data, pinpointing genes with spatial expression patterns and delineating tissue domains. However, existing approaches fall short in uniformly quantifying spatially variable genes (SVGs). Moreover, from a methodological viewpoint, while SVGs are naturally associated with depicting spatial domains, they are technically dissociated in most methods. Here, we present a framework (PROST) for the quantitative recognition of spatial transcriptomic patterns, consisting of (i) quantitatively characterizing spatial variations in gene expression patterns through the PROST Index; and (ii) unsupervised clustering of spatial domains via a self-attention mechanism. We demonstrate that PROST performs superior SVG identification and domain segmentation with various spatial resolutions, from multicellular to cellular levels. Importantly, PROST Index can be applied to prioritize spatial expression variations, facilitating the exploration of biological insights. Together, our study provides a flexible and robust framework for analyzing diverse spatial transcriptomic data.
Affiliation(s)
- Yuchen Liang
  - School of Geography and Planning, Sun Yat-sen University, Guangzhou, 510275, China
- Guowei Shi
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, China
- Runlin Cai
  - School of Geography and Planning, Sun Yat-sen University, Guangzhou, 510275, China
- Yuchen Yuan
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, China
- Ziying Xie
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, China
- Long Yu
  - School of Geography and Planning, Sun Yat-sen University, Guangzhou, 510275, China
- Yingjian Huang
  - School of Geography and Planning, Sun Yat-sen University, Guangzhou, 510275, China
- Qian Shi
  - School of Geography and Planning, Sun Yat-sen University, Guangzhou, 510275, China
- Lizhe Wang
  - School of Computer Science, China University of Geosciences, Wuhan, 430078, China
- Jun Li
  - School of Computer Science, China University of Geosciences, Wuhan, 430078, China
- Zhonghui Tang
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, China

9
Yuan Z. MENDER: fast and scalable tissue structure identification in spatial omics data. Nat Commun 2024; 15:207. [PMID: 38182575] [PMCID: PMC10770058] [DOI: 10.1038/s41467-023-44367-9] [Received: 06/26/2023] [Accepted: 12/11/2023] [Indexed: 01/07/2024]
Abstract
Tissue structure identification is a crucial task in spatial omics data analysis, for which increasingly complex models, such as Graph Neural Networks and Bayesian networks, are employed. However, whether increased model complexity actually leads to improved performance is an open question in the field. Inspired by the consistent observation of cellular neighborhood structures across various spatial technologies, we propose Multi-range cEll coNtext DEciphereR (MENDER) for tissue structure identification. Applied to datasets from 3 brain regions and a whole-brain atlas, MENDER, with its biology-driven design, offers substantial improvements over modern complex models while automatically aligning labels across slices, and it requires much less running time than the second-fastest method. MENDER's identification power allows the uncovering of previously overlooked spatial domains that exhibit strong associations with brain aging. MENDER's scalability makes it readily applicable to a million-scale brain spatial atlas. MENDER's discriminative power enables the differentiation of breast cancer patient subtypes obscured by single-cell analysis.
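As a rough illustration of a multi-range cell context, the sketch below describes each cell by the counts of cell types in concentric distance rings around it and concatenates the rings into one feature vector. The ring radii, the raw-count features, and the function name are assumptions made for this example; the abstract does not specify MENDER's exact construction, and the downstream clustering step is omitted.

```python
from math import dist


def multi_range_context(coords, cell_types, radii):
    """Per-cell feature: cell-type counts in concentric rings, concatenated.

    Ring r holds neighbours with radii[r-1] < distance <= radii[r]; the first
    ring starts strictly above 0, so the cell itself is never counted.
    Returns the sorted cell-type vocabulary and one vector per cell, laid out
    as [ring 0 counts per type, ring 1 counts per type, ...].
    """
    types = sorted(set(cell_types))
    index = {t: i for i, t in enumerate(types)}
    bounds = [0.0] + sorted(radii)
    n_rings = len(bounds) - 1
    features = []
    for i, ci in enumerate(coords):
        vec = [0] * (len(types) * n_rings)
        for j, cj in enumerate(coords):
            if i == j:
                continue
            d = dist(ci, cj)
            for r in range(n_rings):
                if bounds[r] < d <= bounds[r + 1]:
                    vec[r * len(types) + index[cell_types[j]]] += 1
                    break  # each neighbour falls into at most one ring
        features.append(vec)
    return types, features
```

Cells whose vectors look alike at every range would then be grouped into the same tissue structure by a standard clustering algorithm; the brute-force double loop is only for readability and would be replaced by a spatial index on atlas-scale data.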
Affiliation(s)
- Zhiyuan Yuan
  - Institute of Science and Technology for Brain-Inspired Intelligence, MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, MOE Frontiers Center for Brain Science, Center for Medical Research and Innovation, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Fudan University, Shanghai, 200433, China

10
Fang Z, Liu T, Zheng R, A J, Yin M, Li M. stAA: adversarial graph autoencoder for spatial clustering task of spatially resolved transcriptomics. Brief Bioinform 2023; 25:bbad500. [PMID: 38189544] [PMCID: PMC10772985] [DOI: 10.1093/bib/bbad500] [Received: 10/11/2023] [Revised: 11/22/2023] [Accepted: 12/11/2023] [Indexed: 01/09/2024]
Abstract
With the development of spatially resolved transcriptomics technologies, it is now possible to explore the gene expression profiles of single cells while preserving their spatial context. Spatial clustering plays a key role in spatial transcriptome data analysis. In the past 2 years, several graph neural network-based methods have emerged, which have significantly improved the accuracy of spatial clustering. However, accurately identifying the boundaries of spatial domains remains a challenging task. In this article, we propose stAA, an adversarial variational graph autoencoder, to identify spatial domains. stAA generates cell embeddings by leveraging gene expression and spatial information using graph neural networks and enforces the distribution of cell embeddings to match a prior distribution through the Wasserstein distance. The adversarial training process helps the cell embeddings capture spatial domain information better and makes them more robust. Moreover, stAA incorporates global graph information into cell embeddings using labels generated by pre-clustering. Our experimental results show that stAA outperforms the state-of-the-art methods and achieves better clustering results across different profiling platforms and various resolutions. We also conducted numerous biological analyses and found that stAA can identify fine-grained structures in tissues, recognize different functional subtypes within tumors and accurately identify developmental trajectories.
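In stAA the Wasserstein distance is part of an adversarial training scheme over high-dimensional embeddings. To show only what the quantity itself measures, the sketch below computes the closed-form empirical 1-Wasserstein distance between two equal-size one-dimensional samples. It is not stAA's training objective, and the function name and the Gaussian "embedding" and "prior" samples in the usage block are hypothetical.

```python
import random


def wasserstein_1d(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-size 1D samples.

    In one dimension the optimal transport plan pairs values in sorted
    order, so W1 is the mean absolute difference of the sorted samples.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


if __name__ == "__main__":
    rng = random.Random(0)
    embedding = [rng.gauss(2.0, 1.0) for _ in range(1000)]  # shifted "embedding" values
    prior = [rng.gauss(0.0, 1.0) for _ in range(1000)]      # standard normal prior
    print(wasserstein_1d(embedding, prior))  # close to the mean shift of 2
```

Driving this distance toward zero is what "enforcing the embedding distribution to a prior" means; in higher dimensions no sorting shortcut exists, which is one reason adversarial estimators are used instead.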
Collapse
Affiliation(s)
- Zhaoyu Fang
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Teng Liu
  - Clinical Research Center (CRC), Medical Pathology Center (MPC), Cancer Early Detection and Treatment Center (CEDTC), Chongqing University Three Gorges Hospital, Chongqing University, Chongqing 404031, China
  - Translational Medicine Research Center (TMRC), School of Medicine, Chongqing University, Chongqing 401331, China
- Ruiqing Zheng
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Jin A
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Mingzhu Yin
  - Clinical Research Center (CRC), Medical Pathology Center (MPC), Cancer Early Detection and Treatment Center (CEDTC), Chongqing University Three Gorges Hospital, Chongqing University, Chongqing 404031, China
  - Translational Medicine Research Center (TMRC), School of Medicine, Chongqing University, Chongqing 401331, China
- Min Li
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
11
Peng L, He X, Peng X, Li Z, Zhang L. STGNNks: Identifying cell types in spatial transcriptomics data based on graph neural network, denoising auto-encoder, and k-sums clustering. Comput Biol Med 2023; 166:107440. [PMID: 37738898 DOI: 10.1016/j.compbiomed.2023.107440] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 07/12/2023] [Revised: 08/15/2023] [Accepted: 08/29/2023] [Indexed: 09/24/2023]
Abstract
BACKGROUND Spatial transcriptomics technologies fully utilize spatial location information, tissue morphological features, and transcriptional profiles. Integrating these data can greatly advance our understanding of cell biology in its morphological context. METHODS We developed an innovative spatial clustering method called STGNNks by combining a graph neural network, a denoising auto-encoder, and k-sums clustering. First, spatially resolved transcriptomics data are preprocessed and a hybrid adjacency matrix is constructed. Next, gene expression and spatial context are integrated to learn spot embedding features with a deep graph infomax-based graph convolutional network. Third, the learned features are mapped to a low-dimensional space through a zero-inflated negative binomial (ZINB)-based denoising auto-encoder. Fourth, a k-sums clustering algorithm that combines k-means and ratio-cut clustering is developed to identify spatial domains. Finally, spatial trajectory inference, spatially variable gene identification, and differentially expressed gene detection are performed with the pseudo-space-time method on six 10x Genomics Visium datasets. RESULTS We compared STGNNks with five other spatial clustering methods: CCST, Seurat, stLearn, Scanpy, and SEDR. For the first time, four internal indices from machine learning, namely the silhouette coefficient, the Davies-Bouldin index, the Calinski-Harabasz index, and the S_Dbw index, were used to compare the clustering performance of STGNNks with that of CCST, Seurat, stLearn, Scanpy, and SEDR on five unlabeled spatial transcriptomics datasets: Adult Mouse Brain (FFPE), Adult Mouse Kidney (FFPE), Human Breast Cancer (Block A Section 2), Human Breast Cancer (FFPE), and Human Lymph Node.
Two external indices, the adjusted Rand index (ARI) and normalized mutual information (NMI), were used to evaluate the same six methods on Human Breast Cancer (Block A Section 1), which has ground-truth labels. In these comparisons, STGNNks obtained the smallest Davies-Bouldin and S_Dbw values and the largest silhouette coefficient, Calinski-Harabasz index, ARI, and NMI, significantly outperforming the other five spatial transcriptomics analysis algorithms. Furthermore, we detected the top six spatially variable genes and the top five differentially expressed genes in each cluster of the five unlabeled datasets. The pseudo-space-time tree plot with hierarchical layout showed the progression of Human Breast Cancer (Block A Section 1) in three clades branching from three invasive ductal carcinoma regions to multiple ductal carcinoma in situ sub-clusters. CONCLUSION We anticipate that STGNNks can efficiently improve spatial transcriptomics data analysis and further boost the diagnosis and therapy of related diseases. The code is publicly available at https://github.com/plhhnu/STGNNks.
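The external indices used in this comparison are computed from two labelings of the same spots. The sketch below implements the adjusted Rand index from the standard pair-counting (contingency-table) formula in plain Python; it is a generic textbook version, not code from STGNNks, and the function name is hypothetical. In practice, scikit-learn's `sklearn.metrics` module provides `adjusted_rand_score` and `normalized_mutual_info_score`, as well as the internal indices `silhouette_score`, `davies_bouldin_score`, and `calinski_harabasz_score`.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index via pair counting on the contingency table."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n_pairs = comb(len(labels_true), 2)
    if n_pairs == 0:
        return 1.0  # fewer than two items: the labelings trivially agree
    # Pairs grouped together by both labelings (summed over contingency-table cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs grouped together by each labeling on its own (row and column sums).
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = together_true * together_pred / n_pairs
    max_index = (together_true + together_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both put everything in one cluster, or both all singletons
    return (together_both - expected) / (max_index - expected)


# Cluster IDs are arbitrary: a relabeled but identical partition scores 1.0.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
print(adjusted_rand_index([0, 0, 1, 2], [0, 0, 1, 1]))  # 4/7, about 0.571
```

The adjustment subtracts the agreement expected by chance, so random labelings score near 0 rather than near the raw Rand index, which is why ARI is preferred for comparing spatial domain assignments with annotated layers.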
Affiliation(s)
- Lihong Peng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
  - College of Life Sciences and Chemistry, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
- Xianzhi He
  - School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
- Xinhuai Peng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
- Zejun Li
  - School of Computer Science, Hunan Institute of Technology, Hengyang, 421002, Hunan, China
- Li Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, Jiangsu, China
12
Li Z, Chen X, Zhang X, Jiang R, Chen S. Latent feature extraction with a prior-based self-attention framework for spatial transcriptomics. Genome Res 2023; 33:1757-1773. [PMID: 37903634 PMCID: PMC10691543 DOI: 10.1101/gr.277891.123] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/16/2023] [Accepted: 09/19/2023] [Indexed: 11/01/2023]
Abstract
Rapid advances in spatial transcriptomics (ST) have revolutionized the interrogation of spatial heterogeneity and increased the demand for comprehensive methods that effectively characterize spatial domains. As a prerequisite for ST data analysis, spatial domain characterization is a crucial step for downstream analyses and biological interpretation. Here we propose a prior-based self-attention framework for spatial transcriptomics (PAST), a variational graph convolutional autoencoder for ST that integrates prior information via a Bayesian neural network, captures spatial patterns via a self-attention mechanism, and scales to large data sets via a ripple walk sampler strategy. Through comprehensive experiments on data sets generated by different technologies, we show that PAST can effectively characterize spatial domains and facilitate various downstream analyses, including ST visualization, spatial trajectory inference, and pseudotime analysis. We also highlight the advantages of PAST for multislice joint embedding and automatic annotation of spatial domains in newly sequenced ST data. Unlike existing methods, PAST is the first ST method that integrates reference data to analyze ST data. We anticipate that PAST will open up new avenues for researchers to decipher ST data with customized reference data, which expands the applicability of ST technology.
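PAST's abstract credits its scalability to a ripple walk sampler. The sketch below shows our reading of the general ripple-walk idea: start from a random node and repeatedly absorb a random fraction of the current subgraph's neighbors until a target size is reached, so every sampled subgraph stays connected and can serve as a mini-batch. The adjacency-dict representation, the `expansion_ratio` parameter, the function name, and the grid example are assumptions for illustration; PAST's actual sampler may differ.

```python
import random


def ripple_walk_subgraph(adjacency, target_size, expansion_ratio=0.5, seed=None):
    """Grow a connected node subset from a random start node, ripple-walk style.

    adjacency: dict mapping each node to an iterable of its neighbors.
    At each step a random fraction (expansion_ratio) of the frontier, i.e.
    neighbors of the current subset that are not yet in it, is absorbed.
    """
    rng = random.Random(seed)
    start = rng.choice(sorted(adjacency))
    sampled = {start}
    while len(sampled) < target_size:
        frontier = {nb for node in sampled for nb in adjacency[node]} - sampled
        if not frontier:
            break  # the start node's connected component is exhausted
        k = max(1, int(expansion_ratio * len(frontier)))
        k = min(k, target_size - len(sampled))  # do not overshoot the target size
        sampled.update(rng.sample(sorted(frontier), k))
    return sampled


# Hypothetical spot graph: a 4 x 4 grid where each spot links to its 4-neighbors.
grid = {
    (r, c): [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
             if 0 <= r + dr < 4 and 0 <= c + dc < 4]
    for r in range(4) for c in range(4)
}
batch = ripple_walk_subgraph(grid, target_size=6, seed=0)
print(sorted(batch))
```

Because each added node is a neighbor of the current subset, a graph convolution restricted to the sampled nodes still sees local spatial structure, while memory use is bounded by `target_size` rather than by the full slice.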
Affiliation(s)
- Zhen Li
  - Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Xiaoyang Chen
  - Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Xuegong Zhang
  - Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Rui Jiang
  - Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Shengquan Chen
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
13
Zhang C, Duan ZW, Xu YP, Liu J, Li HD. FEED: a feature selection method based on gene expression decomposition for single cell clustering. Brief Bioinform 2023; 24:bbad389. [PMID: 37935617 DOI: 10.1093/bib/bbad389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Revised: 08/31/2023] [Accepted: 09/22/2023] [Indexed: 11/09/2023]
Abstract
Single-cell clustering is a critical step in downstream biological analysis. Clustering performance can be effectively improved by extracting cell-type-specific genes. State-of-the-art feature selection methods usually calculate the importance of each gene individually, without considering the information contained in its expression distribution. Moreover, these methods ignore the intrinsic expression patterns of genes and the heterogeneity within groups of genes with different mean expression levels. In this work, we present a Feature sElection method based on gene Expression Decomposition (FEED) of scRNA-seq data, which selects informative genes to enhance clustering performance. First, the expression levels of genes are decomposed into multiple Gaussian components. Then, a novel gene correlation measure is proposed to quantify the relationship between genes from the perspective of their distributions. Finally, a permutation-based approach is used to determine the gene-importance threshold and obtain marker gene subsets. Compared with state-of-the-art feature selection methods, FEED yields significant improvements in cell-type identification accuracy when applied to various scRNA-seq datasets, including large ones, followed by different common clustering algorithms. The source code for FEED is freely available at https://github.com/genemine/FEED.
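FEED's first step decomposes each gene's expression levels into Gaussian components. The abstract does not state how the mixture is fitted, so, as an assumption, the sketch below uses the standard expectation-maximization (EM) algorithm for a one-dimensional Gaussian mixture in plain Python. The function name, the initialization at evenly spaced order statistics, the variance floor `min_var`, and the example values are illustrative choices, not FEED's implementation.

```python
import math


def fit_gmm_1d(values, n_components=2, n_iter=100, min_var=1e-6):
    """Fit a 1D Gaussian mixture by expectation-maximization.

    Returns (weights, means, variances), one entry per component.
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("values must be non-empty")
    # Initialise means at evenly spaced order statistics, variances at the overall variance.
    means = [xs[int(i * (n - 1) / max(1, n_components - 1))] for i in range(n_components)]
    mu = sum(xs) / n
    variances = [max(min_var, sum((x - mu) ** 2 for x in xs) / n)] * n_components
    weights = [1.0 / n_components] * n_components

    for _ in range(n_iter):
        # E-step: responsibility of each component for each value (log-sum-exp for stability).
        resp = []
        for x in xs:
            log_p = [math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
                     for w, m, v in zip(weights, means, variances)]
            top = max(log_p)
            unnorm = [math.exp(lp - top) for lp in log_p]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate the weight, mean and variance of each component.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component lost all its mass; keep its previous parameters
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            variances[k] = max(min_var, sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk)
    return weights, means, variances


# Hypothetical log-expression values of one gene with a low and a high mode.
expression = [0.0, 0.1, -0.1, 0.2, -0.2, 5.0, 5.1, 4.9, 5.2, 4.8]
weights, means, variances = fit_gmm_1d(expression)
print(weights, means, variances)
```

A real pipeline would fit one mixture per gene and would likely choose the number of components with a criterion such as BIC; both points are assumptions here rather than details given in the abstract.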
Affiliation(s)
- Chao Zhang
  - School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
- Zhi-Wei Duan
  - School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
- Yun-Pei Xu
  - School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
- Jin Liu
  - School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
- Hong-Dong Li
  - School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China