1
Lin Y, Liang Y, Wang D, Chang Y, Ma Q, Wang Y, He F, Xu D. A contrastive learning approach to integrate spatial transcriptomics and histological images. Comput Struct Biotechnol J 2024; 23:1786-1795. [PMID: 38707535 PMCID: PMC11068546 DOI: 10.1016/j.csbj.2024.04.039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2023] [Revised: 04/14/2024] [Accepted: 04/15/2024] [Indexed: 05/07/2024] Open
Abstract
The rapid growth of spatially resolved transcriptomics technology provides new perspectives on spatial tissue architecture. Deep learning has been widely applied to derive useful representations for spatial transcriptome analysis. However, effectively integrating spatial multi-modal data remains challenging. Here, we present ConGcR, a contrastive learning-based model for integrating gene expression, spatial location, and tissue morphology for data representation and spatial tissue architecture identification. Graph convolution and ResNet were used as encoders for gene expression with spatial location and histological image inputs, respectively. We further enhanced ConGcR with a graph auto-encoder as ConGaR to better model spatially embedded representations. We validated our models using 16 human brains, four chicken hearts, eight breast tumors, and 30 human lung spatial transcriptomics samples. The results showed that our models generated more effective embeddings for obtaining tissue architectures closer to the ground truth than other methods. Overall, our models can not only improve the accuracy of tissue architecture identification but may also provide valuable insights and effective data representation for other tasks in spatial transcriptome analyses.
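The contrastive idea in this abstract, pulling together the expression-side and image-side embeddings of the same spot while pushing apart embeddings of different spots, can be illustrated with an InfoNCE-style loss. This is a minimal sketch under stated assumptions: the abstract does not give ConGcR's actual loss, so the InfoNCE form, the `temperature` value, and the function names are all hypothetical, and the toy vectors stand in for encoder outputs.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(expr_emb, image_emb, temperature=0.1):
    """Mean InfoNCE loss (assumed form, not taken from the paper).

    (expr_emb[i], image_emb[i]) is the positive pair for spot i;
    every other image embedding acts as a negative for that spot.
    """
    losses = []
    for i, z_expr in enumerate(expr_emb):
        logits = [cosine(z_expr, z_img) / temperature for z_img in image_emb]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_denom = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)

# Toy embeddings for three spots (hypothetical values).
expr = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned_images = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
shuffled_images = [[0.0, 1.0], [-1.0, 0.0], [1.0, 0.0]]

print(info_nce(expr, aligned_images))   # close to 0: modalities agree
print(info_nce(expr, shuffled_images))  # much larger: modalities disagree
```

The loss is small when each spot's two views are more similar to each other than to any other spot's view, which is the property a contrastive objective rewards.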
Affiliation(s)
- Yu Lin
- School of Artificial Intelligence, Jilin University, Changchun 130012, China
- Department of Electrical Engineering and Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Yanchun Liang
- School of Computer Science, Zhuhai College of Science and Technology, Zhuhai 519041, China
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Duolin Wang
- Department of Electrical Engineering and Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Yuzhou Chang
- Department of Biomedical Informatics, Ohio State University, Columbus, OH 43210, United States
- Qin Ma
- Department of Biomedical Informatics, Ohio State University, Columbus, OH 43210, United States
- Yan Wang
- School of Artificial Intelligence, Jilin University, Changchun 130012, China
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Fei He
- Department of Electrical Engineering and Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Dong Xu
- Department of Electrical Engineering and Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
2
Li J, Wang Y, Raina MA, Xu C, Su L, Guo Q, Ma Q, Wang J, Xu D. scBSP: A fast and accurate tool for identifying spatially variable genes from spatial transcriptomic data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.06.592851. [PMID: 38765956 PMCID: PMC11100755 DOI: 10.1101/2024.05.06.592851] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Spatially resolved transcriptomics have enabled the inference of gene expression patterns within two- and three-dimensional space, while introducing computational challenges due to growing spatial resolutions and sparse expressions. Here, we introduce scBSP, an open-source, versatile, and user-friendly package designed for identifying spatially variable genes in large-scale spatial transcriptomics. scBSP implements sparse matrix operations to significantly increase the computational efficiency in both computational time and memory usage, processing the high-definition spatial transcriptomics data for 19,950 genes on 181,367 spots within 10 seconds. Applied to diverse sequencing data and simulations, scBSP efficiently identifies spatially variable genes, demonstrating fast computational speed and consistency across various sequencing techniques and spatial resolutions for both two- and three-dimensional data with up to millions of cells. On a sample with hundreds of thousands of spots, scBSP identifies SVGs accurately in seconds on a typical desktop computer.
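Why sparse matrix operations help with mostly-zero expression data can be shown with a compressed sparse row (CSR) layout, where only nonzero counts are stored and touched. The sketch below is illustrative only and is not scBSP's algorithm: the helper names, the toy gene-by-spot matrix, and the `weights` vector (standing in for spatial kernel weights of one neighborhood) are assumptions.

```python
def to_csr(dense):
    """Convert a list-of-lists matrix to CSR arrays (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)
                indices.append(col)
        indptr.append(len(data))  # where the next row starts in `data`
    return data, indices, indptr

def csr_matvec(csr, vec):
    """Multiply a CSR matrix by a dense vector, visiting only nonzeros."""
    data, indices, indptr = csr
    out = []
    for row in range(len(indptr) - 1):
        start, end = indptr[row], indptr[row + 1]
        out.append(sum(data[k] * vec[indices[k]] for k in range(start, end)))
    return out

# Toy gene-by-spot count matrix (3 genes, 4 spots), mostly zeros.
expression = [
    [0, 0, 3, 0],
    [1, 0, 0, 2],
    [0, 0, 0, 0],
]
# Hypothetical spatial weights of the 4 spots within one neighborhood.
weights = [0.5, 0.5, 1.0, 0.0]

csr = to_csr(expression)
print(csr)                       # ([3, 1, 2], [2, 0, 3], [0, 1, 3, 3])
print(csr_matvec(csr, weights))  # [3.0, 0.5, 0]
```

The work done is proportional to the number of nonzero entries rather than to genes times spots, which is the general reason sparse storage reduces both time and memory on sparse expression matrices.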
3
Mallick H, Porwal A, Saha S, Basak P, Svetnik V, Paul E. An integrated Bayesian framework for multi-omics prediction and classification. Stat Med 2024; 43:983-1002. [PMID: 38146838 DOI: 10.1002/sim.9953] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2022] [Revised: 10/06/2023] [Accepted: 10/24/2023] [Indexed: 12/27/2023]
Abstract
With the growing commonality of multi-omics datasets, there is now increasing evidence that integrated omics profiles lead to more efficient discovery of clinically actionable biomarkers that enable better disease outcome prediction and patient stratification. Several methods exist to perform host phenotype prediction from cross-sectional, single-omics data modalities but decentralized frameworks that jointly analyze multiple time-dependent omics data to highlight the integrative and dynamic impact of repeatedly measured biomarkers are currently limited. In this article, we propose a novel Bayesian ensemble method to consolidate prediction by combining information across several longitudinal and cross-sectional omics data layers. Unlike existing frequentist paradigms, our approach enables uncertainty quantification in prediction as well as interval estimation for a variety of quantities of interest based on posterior summaries. We apply our method to four published multi-omics datasets and demonstrate that it recapitulates known biology in addition to providing novel insights while also outperforming existing methods in estimation, prediction, and uncertainty quantification. Our open-source software is publicly available at https://github.com/himelmallick/IntegratedLearner.
Affiliation(s)
- Himel Mallick
- Division of Biostatistics, Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, 10065, New York, USA
- Department of Statistics and Data Science, Cornell University, Ithaca, New York, USA
- Anupreet Porwal
- Department of Statistics, University of Washington, Seattle, Washington, USA
- Satabdi Saha
- Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Piyali Basak
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, New Jersey, USA
- Vladimir Svetnik
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, New Jersey, USA
- Erina Paul
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, New Jersey, USA
4
Yao J, Yu J, Caffo B, Page SC, Martinowich K, Hicks SC. Spatial domain detection using contrastive self-supervised learning for spatial multi-omics technologies. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.02.578662. [PMID: 38352580 PMCID: PMC10862910 DOI: 10.1101/2024.02.02.578662] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/22/2024]
Abstract
Recent advances in spatially-resolved single-omics and multi-omics technologies have led to the emergence of computational tools to detect or predict spatial domains. Additionally, histological images and immunofluorescence (IF) staining of proteins and cell types provide multiple perspectives and a more complete understanding of tissue architecture. Here, we introduce Proust, a scalable tool to predict discrete domains using spatial multi-omics data by combining the low-dimensional representation of biological profiles based on graph-based contrastive self-supervised learning. Our scalable method integrates multiple data modalities, such as RNA, protein, and H&E images, and predicts spatial domains within tissue samples. Through the integration of multiple modalities, Proust consistently demonstrates enhanced accuracy in detecting spatial domains, as evidenced across various benchmark datasets and technological platforms.
Affiliation(s)
- Jianing Yao
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, MD, USA
- Jinglun Yu
- Department of Electrical and Computer Engineering, Johns Hopkins University, MD, USA
- Brian Caffo
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, MD, USA
- Stephanie C. Page
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
- Keri Martinowich
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
- The Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Stephanie C. Hicks
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Malone Center for Engineering in Healthcare, Johns Hopkins University, MD, USA
5
Gianopoulos I, Daskalopoulou SS. Macrophage profiling in atherosclerosis: understanding the unstable plaque. Basic Res Cardiol 2024; 119:35-56. [PMID: 38244055 DOI: 10.1007/s00395-023-01023-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 11/01/2023] [Accepted: 11/01/2023] [Indexed: 01/22/2024]
Abstract
The development and rupture of atherosclerotic plaques is a major contributor to myocardial infarctions and ischemic strokes. The dynamic evolution of the plaque is largely attributed to monocyte/macrophage functions, which respond to various stimuli in the plaque microenvironment. To this end, macrophages play a central role in atherosclerotic lesions through the uptake of oxidized low-density lipoprotein that gets trapped in the artery wall, and the induction of an inflammatory response that can differentially affect the stability of the plaque in men and women. In this environment, macrophages can polarize towards pro-inflammatory M1 or anti-inflammatory M2 phenotypes, which represent the extremes of the polarization spectrum that include Mhem, M(Hb), Mox, and M4 populations. However, this traditional macrophage model paradigm has been redefined to include numerous immune and nonimmune cell clusters based on in-depth unbiased single-cell approaches. The goal of this review is to highlight (1) the phenotypic and functional properties of monocyte subsets in the circulation, and macrophage populations in atherosclerotic plaques, as well as their contribution towards stable or unstable phenotypes in men and women, and (2) single-cell RNA sequencing studies that have advanced our knowledge of immune, particularly macrophage signatures present in the atherosclerotic niche. We discuss the importance of performing high-dimensional approaches to facilitate the development of novel sex-specific immunotherapies that aim to reduce the risk of cardiovascular events.
Affiliation(s)
- Ioanna Gianopoulos
- Division of Experimental Medicine, Department of Medicine, Faculty of Medicine and Health Sciences, Research Institute of the McGill University Health Centre, McGill University, Montreal, Canada
- Stella S Daskalopoulou
- Division of Experimental Medicine, Department of Medicine, Faculty of Medicine and Health Sciences, Research Institute of the McGill University Health Centre, McGill University, Montreal, Canada.
- Division of Internal Medicine, Department of Medicine, Faculty of Medicine and Health Sciences, McGill University Health Centre, McGill University, Montreal, Canada.
- Department of Medicine, Research Institute of the McGill University Health Centre, Glen Site, 1001 Decarie Boulevard, EM1.2210, Montreal, Quebec, H4A 3J1, Canada.
6
Zahedi R, Ghamsari R, Argha A, Macphillamy C, Beheshti A, Alizadehsani R, Lovell NH, Lotfollahi M, Alinejad-Rokny H. Deep learning in spatially resolved transcriptomics: a comprehensive technical view. Brief Bioinform 2024; 25:bbae082. [PMID: 38483255 PMCID: PMC10939360 DOI: 10.1093/bib/bbae082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 12/22/2024] [Accepted: 02/13/2024] [Indexed: 03/17/2024] Open
Abstract
Spatially resolved transcriptomics (SRT) is a pioneering method for simultaneously studying morphological contexts and gene expression at single-cell precision. Data emerging from SRT are multifaceted, presenting researchers with intricate gene expression matrices, precise spatial details and comprehensive histology visuals. Such rich and intricate datasets, unfortunately, render many conventional methods like traditional machine learning and statistical models ineffective. The unique challenges posed by the specialized nature of SRT data have led the scientific community to explore more sophisticated analytical avenues. Recent trends indicate an increasing reliance on deep learning algorithms, especially in areas such as spatial clustering, identification of spatially variable genes and data alignment tasks. In this manuscript, we provide a rigorous critique of these advanced deep learning methodologies, probing into their merits, limitations and avenues for further refinement. Our in-depth analysis underscores that while the recent innovations in deep learning tailored for SRT have been promising, there remains a substantial potential for enhancement. A crucial area that demands attention is the development of models that can incorporate intricate biological nuances, such as phylogeny-aware processing or in-depth analysis of minuscule histology image segments. Furthermore, addressing challenges like the elimination of batch effects, perfecting data normalization techniques and countering the overdispersion and zero inflation patterns seen in gene expression is pivotal. To support the broader scientific community in their SRT endeavors, we have meticulously assembled a comprehensive directory of readily accessible SRT databases, hoping to serve as a foundation for future research initiatives.
Affiliation(s)
- Roxana Zahedi
- UNSW BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, 2052, NSW, Australia
- Reza Ghamsari
- UNSW BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, 2052, NSW, Australia
- Ahmadreza Argha
- The Graduate School of Biomedical Engineering, UNSW Sydney, 2052, NSW, Australia
- Tyree Institute of Health Engineering (IHealthE), UNSW Sydney, 2052, NSW, Australia
- Callum Macphillamy
- School of Animal and Veterinary Sciences, University of Adelaide, Roseworthy, 5371, Australia
- Amin Beheshti
- School of Computing, Macquarie University, Sydney, 2109, Australia
- Roohallah Alizadehsani
- Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Waurn Ponds, Melbourne, VIC, 3216, Australia
- Nigel H Lovell
- The Graduate School of Biomedical Engineering, UNSW Sydney, 2052, NSW, Australia
- Tyree Institute of Health Engineering (IHealthE), UNSW Sydney, 2052, NSW, Australia
- Mohammad Lotfollahi
- Computational Health Center, Helmholtz Munich, Germany
- Wellcome Sanger Institute, Cambridge, UK
- Hamid Alinejad-Rokny
- UNSW BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, 2052, NSW, Australia
- Tyree Institute of Health Engineering (IHealthE), UNSW Sydney, 2052, NSW, Australia
7
Maden SK, Kwon SH, Huuki-Myers LA, Collado-Torres L, Hicks SC, Maynard KR. Challenges and opportunities to computationally deconvolve heterogeneous tissue with varying cell sizes using single-cell RNA-sequencing datasets. Genome Biol 2023; 24:288. [PMID: 38098055 PMCID: PMC10722720 DOI: 10.1186/s13059-023-03123-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Accepted: 11/24/2023] [Indexed: 12/17/2023] Open
Abstract
Deconvolution of cell mixtures in "bulk" transcriptomic samples from homogenate human tissue is important for understanding disease pathologies. However, several experimental and computational challenges impede transcriptomics-based deconvolution approaches using single-cell/nucleus RNA-seq reference atlases. Cells from the brain and blood have substantially different sizes, total mRNA, and transcriptional activities, and existing approaches may quantify total mRNA instead of cell type proportions. Further, standards are lacking for the use of cell reference atlases and integrative analyses of single-cell and spatial transcriptomics data. We discuss how to approach these key challenges with orthogonal "gold standard" datasets for evaluating deconvolution methods.
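The point that an approach "may quantify total mRNA instead of cell type proportions" can be made concrete with a small worked example. The numbers below are hypothetical (a 50/50 mixture in which one cell type carries four times as much mRNA per cell), and the rescaling step is one simple illustrative correction, not a method taken from the paper.

```python
def mrna_fractions(cell_counts, mrna_per_cell):
    """Fraction of total mRNA contributed by each cell type."""
    total = sum(cell_counts[t] * mrna_per_cell[t] for t in cell_counts)
    return {t: cell_counts[t] * mrna_per_cell[t] / total for t in cell_counts}

def rescale_to_cell_proportions(fractions, mrna_per_cell):
    """Divide mRNA fractions by per-cell mRNA content and renormalize."""
    scaled = {t: fractions[t] / mrna_per_cell[t] for t in fractions}
    total = sum(scaled.values())
    return {t: scaled[t] / total for t in scaled}

# Hypothetical tissue: equal numbers of neurons and glia.
cell_counts = {"neuron": 50, "glia": 50}
# Hypothetical relative mRNA content per cell (neurons assumed 4x larger).
mrna_per_cell = {"neuron": 4.0, "glia": 1.0}

by_mrna = mrna_fractions(cell_counts, mrna_per_cell)
by_cells = rescale_to_cell_proportions(by_mrna, mrna_per_cell)

print(by_mrna)   # neuron about 0.8, glia about 0.2
print(by_cells)  # neuron about 0.5, glia about 0.5
```

A method that reports the first dictionary as "cell type proportions" would overstate neurons by 30 percentage points in this toy case, which is why cell size or mRNA content factors matter when interpreting deconvolution output.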
Affiliation(s)
- Sean K Maden
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Sang Ho Kwon
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
- The Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Louise A Huuki-Myers
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
- Leonardo Collado-Torres
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
- Stephanie C Hicks
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA.
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.
- Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD, USA.
- Kristen R Maynard
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA.
- The Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA.
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, USA.
8
Peng L, He X, Peng X, Li Z, Zhang L. STGNNks: Identifying cell types in spatial transcriptomics data based on graph neural network, denoising auto-encoder, and k-sums clustering. Comput Biol Med 2023; 166:107440. [PMID: 37738898 DOI: 10.1016/j.compbiomed.2023.107440] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 07/12/2023] [Revised: 08/15/2023] [Accepted: 08/29/2023] [Indexed: 09/24/2023]
Abstract
BACKGROUND Spatial transcriptomics technologies fully utilize spatial location information, tissue morphological features, and transcriptional profiles. Integrating these data can greatly advance our understanding of cell biology in the morphological background. METHODS We developed an innovative spatial clustering method called STGNNks by combining graph neural network, denoising auto-encoder, and k-sums clustering. First, spatially resolved transcriptomics data are preprocessed and a hybrid adjacency matrix is constructed. Next, gene expressions and spatial context are integrated to learn spots' embedding features by a deep graph infomax-based graph convolutional network. Third, the learned features are mapped to a low-dimensional space through a zero-inflated negative binomial (ZINB)-based denoising auto-encoder. Fourth, a k-sums clustering algorithm is developed to identify spatial domains by combining k-means clustering and the ratio-cut clustering algorithms. Finally, STGNNks performs spatial trajectory inference, spatially variable gene identification, and differentially expressed gene detection based on the pseudo-space-time method on six 10x Genomics Visium datasets. RESULTS We compared our proposed STGNNks method with five other spatial clustering methods: CCST, Seurat, stLearn, Scanpy and SEDR. For the first time, four internal indicators from machine learning, namely the silhouette coefficient, the Davies-Bouldin index, the Calinski-Harabasz index, and the S_Dbw index, were used to compare the clustering performance of STGNNks with CCST, Seurat, stLearn, Scanpy and SEDR on five spatial transcriptomics datasets without labels (i.e., Adult Mouse Brain (FFPE), Adult Mouse Kidney (FFPE), Human Breast Cancer (Block A Section 2), Human Breast Cancer (FFPE), and Human Lymph Node). Two external indicators, the adjusted Rand index (ARI) and normalized mutual information (NMI), were applied to evaluate the performance of the above six methods on Human Breast Cancer (Block A Section 1) with real labels. The comparison experiments showed that STGNNks obtained the smallest Davies-Bouldin and S_Dbw values and the largest silhouette coefficient, Calinski-Harabasz, ARI and NMI values, significantly outperforming the above five spatial transcriptomics analysis algorithms. Furthermore, we detected the top six spatially variable genes and the top five differentially expressed genes in each cluster on the above five unlabeled datasets. The pseudo-space-time tree plot with hierarchical layout demonstrated a flow of Human Breast Cancer (Block A Section 1) progress in three clades branching from three invasive ductal carcinoma regions to multiple ductal carcinoma in situ sub-clusters. CONCLUSION We anticipate that STGNNks can efficiently improve spatial transcriptomics data analysis and further boost the diagnosis and therapy of related diseases. The codes are publicly available at https://github.com/plhhnu/STGNNks.
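Of the evaluation metrics named in this abstract, the adjusted Rand index (ARI) is straightforward to compute from a contingency table of true versus predicted labels. The following is a minimal standard-library sketch of the usual pair-counting formula; it is not the authors' evaluation code, and the toy labels are hypothetical. In practice a library routine such as scikit-learn's `adjusted_rand_score` would normally be used.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index via pair counts in the contingency table."""
    n = len(labels_true)
    # Pairs of spots that share both a true label and a predicted label.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs sharing a true label, and pairs sharing a predicted label.
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Degenerate case, e.g. both partitions put everything in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

truth = [0, 0, 1, 1]
print(adjusted_rand_index(truth, [1, 1, 0, 0]))  # 1.0: same partition, labels renamed
print(adjusted_rand_index(truth, [0, 1, 0, 1]))  # -0.5: worse than chance agreement
```

ARI is invariant to how clusters are numbered and is corrected for chance, so it needs ground-truth labels, which is why the abstract applies it only to the annotated dataset and uses internal indices elsewhere.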
Affiliation(s)
- Lihong Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, Hunan, China; College of Life Sciences and Chemistry, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
- Xianzhi He
- School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
- Xinhuai Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
- Zejun Li
- School of Computer Science, Hunan Institute of Technology, Hengyang, 421002, Hunan, China.
- Li Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, Jiangsu, China.
9
Vahid MR, Brown EL, Steen CB, Zhang W, Jeon HS, Kang M, Gentles AJ, Newman AM. High-resolution alignment of single-cell and spatial transcriptomes with CytoSPACE. Nat Biotechnol 2023; 41:1543-1548. [PMID: 36879008 PMCID: PMC10635828 DOI: 10.1038/s41587-023-01697-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Received: 05/15/2022] [Accepted: 01/25/2023] [Indexed: 03/08/2023]
Abstract
Recent studies have emphasized the importance of single-cell spatial biology, yet available assays for spatial transcriptomics have limited gene recovery or low spatial resolution. Here we introduce CytoSPACE, an optimization method for mapping individual cells from a single-cell RNA sequencing atlas to spatial expression profiles. Across diverse platforms and tissue types, we show that CytoSPACE outperforms previous methods with respect to noise tolerance and accuracy, enabling tissue cartography at single-cell resolution.
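Mapping individual cells to spatial locations by optimization can be pictured as an assignment problem: choose the cell-to-spot pairing that minimizes a total mismatch cost. The sketch below is a toy illustration only. The abstract does not state CytoSPACE's cost function or solver, so the squared-distance cost, the one-cell-per-spot constraint, and the brute-force search are all assumptions; a realistic problem size would need a dedicated solver such as SciPy's `scipy.optimize.linear_sum_assignment`.

```python
from itertools import permutations

def squared_distance(u, v):
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def assign_cells_to_spots(cell_profiles, spot_profiles):
    """Exhaustively find the minimum-cost one-to-one cell-to-spot mapping.

    Returns (assignment, cost), where assignment[i] is the spot index
    chosen for cell i. Only feasible for a handful of cells.
    """
    n_cells = len(cell_profiles)
    cost = [[squared_distance(c, s) for s in spot_profiles] for c in cell_profiles]
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(len(spot_profiles)), n_cells):
        total = sum(cost[i][perm[i]] for i in range(n_cells))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return list(best_perm), best_cost

# Hypothetical two-gene profiles for three cells and three spots.
cells = [[5, 0], [0, 5], [2, 2]]
spots = [[0, 4], [3, 2], [6, 1]]

print(assign_cells_to_spots(cells, spots))  # ([2, 0, 1], 4)
```

Solving for all cells jointly, rather than matching each cell to its nearest spot independently, is what lets a global optimization respect capacity constraints on how many cells each spot can receive.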
Affiliation(s)
- Milad R Vahid
- Institute for Stem Cell Biology and Regenerative Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Erin L Brown
- Institute for Stem Cell Biology and Regenerative Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Chloé B Steen
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Department of Cancer Immunology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
- Department of Medical Genetics, Oslo University Hospital, Oslo, Norway
- Wubing Zhang
- Institute for Stem Cell Biology and Regenerative Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Hyun Soo Jeon
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Minji Kang
- Institute for Stem Cell Biology and Regenerative Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Andrew J Gentles
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Stanford Cancer Institute, Stanford University, Stanford, CA, USA
- Department of Pathology, Stanford University, Stanford, CA, USA
- Department of Medicine, Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
- Aaron M Newman
- Institute for Stem Cell Biology and Regenerative Medicine, Stanford University, Stanford, CA, USA.
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA.
- Stanford Cancer Institute, Stanford University, Stanford, CA, USA.
10
Fatemi MY, Lu Y, Sharma C, Feng E, Azher ZL, Diallo AB, Srinivasan G, Rosner GM, Pointer KB, Christensen BC, Salas LA, Tsongalis GJ, Palisoul SM, Perreard L, Kolling FW, Vaickus LJ, Levy JJ. Feasibility of Inferring Spatial Transcriptomics from Single-Cell Histological Patterns for Studying Colon Cancer Tumor Heterogeneity. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.10.09.23296701. [PMID: 37873186 PMCID: PMC10593064 DOI: 10.1101/2023.10.09.23296701] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Background Spatial transcriptomics involves studying the spatial organization of gene expression within tissues, offering insights into the molecular diversity of tumors. While spatial gene expression is commonly amalgamated from 1-10 cells across 50-micron spots, recent methods have demonstrated the capability to disaggregate this information at subspot resolution by leveraging both expression and histological patterns. However, elucidating such information from histology alone presents a significant challenge but if solved can better permit spatial molecular analysis at cellular resolution for instances where Visium data is not available, reducing study costs. This study explores integrating single-cell histological and transcriptomic data to infer spatial mRNA expression patterns in whole slide images collected from a cohort of stage pT3 colorectal cancer patients. A cell graph neural network algorithm was developed to align histological information extracted from detected cells with single cell RNA patterns through optimal transport methods, facilitating the analysis of cellular groupings and gene relationships. This approach leveraged spot-level expression as an intermediary to co-map histological and transcriptomic information at the single-cell level. Results Our study demonstrated that single-cell transcriptional heterogeneity within a spot could be predicted from histological markers extracted from cells detected within a spot. Furthermore, our model exhibited proficiency in delineating overarching gene expression patterns across whole-slide images. This approach compared favorably to traditional patch-based computer vision methods as well as other methods which did not incorporate single cell expression during the model fitting procedures. Topological nuances of single-cell expression within a Visium spot were preserved using the developed methodology. 
Conclusion This innovative approach augments the resolution of spatial molecular assays utilizing histology as a sole input through synergistic co-mapping of histological and transcriptomic datasets at the single-cell level, anchored by spatial transcriptomics. While initial results are promising, they warrant rigorous validation. This includes collaborating with pathologists for precise spatial identification of distinct cell types and utilizing sophisticated assays, such as Xenium, to attain deeper subcellular insights.
11
Li Z, Chen X, Zhang X, Jiang R, Chen S. Latent feature extraction with a prior-based self-attention framework for spatial transcriptomics. Genome Res 2023; 33:1757-1773. [PMID: 37903634 PMCID: PMC10691543 DOI: 10.1101/gr.277891.123] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/16/2023] [Accepted: 09/19/2023] [Indexed: 11/01/2023]
Abstract
Rapid advances in spatial transcriptomics (ST) have revolutionized the interrogation of spatial heterogeneity and increased the demand for comprehensive methods to effectively characterize spatial domains. As a prerequisite for ST data analysis, spatial domain characterization is a crucial step for downstream analyses and biological implications. Here we propose a prior-based self-attention framework for spatial transcriptomics (PAST), a variational graph convolutional autoencoder for ST, which effectively integrates prior information via a Bayesian neural network, captures spatial patterns via a self-attention mechanism, and enables scalable application via a ripple walk sampler strategy. Through comprehensive experiments on data sets generated by different technologies, we show that PAST can effectively characterize spatial domains and facilitate various downstream analyses, including ST visualization, spatial trajectory inference and pseudotime analysis. Also, we highlight the advantages of PAST for multislice joint embedding and automatic annotation of spatial domains in newly sequenced ST data. Compared with existing methods, PAST is the first ST method that integrates reference data to analyze ST data. We anticipate that PAST will open up new avenues for researchers to decipher ST data with customized reference data, which expands the applicability of ST technology.
Affiliation(s)
- Zhen Li
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Xiaoyang Chen
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Xuegong Zhang
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Rui Jiang
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Shengquan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
12
Shi X, Zhu J, Long Y, Liang C. Identifying spatial domains of spatially resolved transcriptomics via multi-view graph convolutional networks. Brief Bioinform 2023; 24:bbad278. [PMID: 37544658 DOI: 10.1093/bib/bbad278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Revised: 06/27/2023] [Accepted: 07/14/2023] [Indexed: 08/08/2023] Open
Abstract
MOTIVATION Recent advances in spatially resolved transcriptomics (ST) technologies enable the measurement of gene expression profiles while preserving cellular spatial context. Linking gene expression of cells with their spatial distribution is essential for better understanding of tissue microenvironment and biological progress. However, effectively combining gene expression data with spatial information to identify spatial domains remains challenging. RESULTS To deal with the above issue, in this paper, we propose a novel unsupervised learning framework named STMGCN for identifying spatial domains using multi-view graph convolution networks (MGCNs). Specifically, to fully exploit spatial information, we first construct multiple neighbor graphs (views) with different similarity measures based on the spatial coordinates. Then, STMGCN learns multiple view-specific embeddings by combining gene expressions with each neighbor graph through graph convolution networks. Finally, to capture the importance of different graphs, we further introduce an attention mechanism to adaptively fuse view-specific embeddings and thus derive the final spot embedding. STMGCN allows for the effective utilization of spatial context to enhance the expressive power of the latent embeddings with multiple graph convolutions. We apply STMGCN on two simulation datasets and five real spatial transcriptomics datasets with different resolutions across distinct platforms. The experimental results demonstrate that STMGCN obtains competitive results in spatial domain identification compared with five state-of-the-art methods, including spatial and non-spatial alternatives. Besides, STMGCN can detect spatially variable genes with enriched expression patterns in the identified domains. Overall, STMGCN is a powerful and efficient computational framework for identifying spatial domains in spatial transcriptomics data.
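The two ideas named in the STMGCN abstract above, building several neighbor graphs from the same spatial coordinates and fusing view-specific embeddings with attention weights, can be illustrated with a minimal sketch. This is not the STMGCN implementation. The choice of Euclidean and Manhattan distances, the function names (`knn_graph`, `fuse_views`), and the use of fixed per-view scores in place of learned attention parameters are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two spot coordinates."""
    return math.dist(a, b)


def manhattan(a, b):
    """City-block distance between two spot coordinates."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_graph(coords, k, distance):
    """Build one 'view': map each spot index to the set of its k nearest spots.

    Hypothetical helper; a different `distance` yields a different view
    of the same tissue coordinates.
    """
    graph = {}
    for i, ci in enumerate(coords):
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: distance(ci, coords[j]),
        )
        graph[i] = set(others[:k])
    return graph


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_views(view_embeddings, view_scores):
    """Fuse one spot's per-view embeddings as an attention-weighted sum.

    `view_scores` stands in for attention scores that a real model
    would learn; here they are simply supplied by the caller.
    """
    weights = softmax(view_scores)
    dim = len(view_embeddings[0])
    return [
        sum(w * emb[d] for w, emb in zip(weights, view_embeddings))
        for d in range(dim)
    ]
```

With coordinates `[(0, 0), (3, 3), (0, 5)]` and `k=1`, the Euclidean view links the first spot to the second and the Manhattan view links it to the third, which is the kind of disagreement between views that an attention-based fusion step is meant to weigh.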
Affiliation(s)
- Xuejing Shi
- School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China
- Juntong Zhu
- School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China
- Yahui Long
- Singapore Immunology Network (SIgN), Agency for Science, Technology and Research (A*STAR), 8A Biomedical Grove, 138648, Singapore
- Cheng Liang
- School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China
13
Huo Y, Guo Y, Wang J, Xue H, Feng Y, Chen W, Li X. Integrating multi-modal information to detect spatial domains of spatial transcriptomics by graph attention network. J Genet Genomics 2023; 50:720-733. [PMID: 37356752 DOI: 10.1016/j.jgg.2023.06.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/14/2023] [Revised: 06/15/2023] [Accepted: 06/16/2023] [Indexed: 06/27/2023]
Abstract
Recent advances in spatially resolved transcriptomic technologies have enabled unprecedented opportunities to elucidate tissue architecture and function in situ. Spatial transcriptomics can provide multimodal and complementary information simultaneously, including gene expression profiles, spatial locations, and histology images. However, most existing methods have limitations in efficiently utilizing spatial information and matched high-resolution histology images. To fully leverage the multi-modal information, we propose a SPAtially embedded Deep Attentional graph Clustering (SpaDAC) method to identify spatial domains while reconstructing denoised gene expression profiles. This method can efficiently learn the low-dimensional embeddings for spatial transcriptomics data by constructing multi-view graph modules to capture both spatial location connectives and morphological connectives. Benchmark results demonstrate that SpaDAC outperforms other algorithms on several recent spatial transcriptomics datasets. SpaDAC is a valuable tool for spatial domain detection, facilitating the comprehension of tissue architecture and cellular microenvironment. The source code of SpaDAC is freely available at Github (https://github.com/huoyuying/SpaDAC.git).
Affiliation(s)
- Yuying Huo
- School of Software Engineering, Beijing Jiaotong University, Beijing 100044, China
- Yilang Guo
- School of Software Engineering, Beijing Jiaotong University, Beijing 100044, China
- Jiakang Wang
- School of Software Engineering, Beijing Jiaotong University, Beijing 100044, China
- Huijie Xue
- School of Software Engineering, Beijing Jiaotong University, Beijing 100044, China
- Yujuan Feng
- School of Software Engineering, Beijing University of Technology, Beijing 100124, China
- Xiangyu Li
- School of Software Engineering, Beijing Jiaotong University, Beijing 100044, China
14
Weber LM, Saha A, Datta A, Hansen KD, Hicks SC. nnSVG for the scalable identification of spatially variable genes using nearest-neighbor Gaussian processes. Nat Commun 2023; 14:4059. [PMID: 37429865 DOI: 10.1038/s41467-023-39748-z] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 06/15/2022] [Accepted: 06/23/2023] [Indexed: 07/12/2023] Open
Abstract
Feature selection to identify spatially variable genes or other biologically informative genes is a key step during analyses of spatially-resolved transcriptomics data. Here, we propose nnSVG, a scalable approach to identify spatially variable genes based on nearest-neighbor Gaussian processes. Our method (i) identifies genes that vary in expression continuously across the entire tissue or within a priori defined spatial domains, (ii) uses gene-specific estimates of length scale parameters within the Gaussian process models, and (iii) scales linearly with the number of spatial locations. We demonstrate the performance of our method using experimental data from several technological platforms and simulations. A software implementation is available at https://bioconductor.org/packages/nnSVG .
Affiliation(s)
- Lukas M Weber
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Arkajyoti Saha
- Department of Statistics, University of Washington, Seattle, WA, USA
- Abhirup Datta
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Kasper D Hansen
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Stephanie C Hicks
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
15
Zhang Q, Jiang S, Schroeder A, Hu J, Li K, Zhang B, Dai D, Lee EB, Xiao R, Li M. Leveraging spatial transcriptomics data to recover cell locations in single-cell RNA-seq with CeLEry. Nat Commun 2023; 14:4050. [PMID: 37422469 PMCID: PMC10329686 DOI: 10.1038/s41467-023-39895-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/02/2022] [Accepted: 07/03/2023] [Indexed: 07/10/2023] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) has revolutionized our understanding of cellular heterogeneity in health and disease. However, the lack of physical relationships among dissociated cells has limited its applications. To address this issue, we present CeLEry (Cell Location recovEry), a supervised deep learning algorithm that leverages gene expression and spatial location relationships learned from spatial transcriptomics to recover the spatial origins of cells in scRNA-seq. CeLEry has an optional data augmentation procedure via a variational autoencoder, which improves the method's robustness and allows it to overcome noise in scRNA-seq data. We show that CeLEry can infer the spatial origins of cells in scRNA-seq at multiple levels, including 2D location and spatial domain of a cell, while also providing uncertainty estimates for the recovered locations. Our comprehensive benchmarking evaluations on multiple datasets generated from brain and cancer tissues using Visium, MERSCOPE, MERFISH, and Xenium demonstrate that CeLEry can reliably recover the spatial location information for cells using scRNA-seq data.
Affiliation(s)
- Qihuang Zhang
- Department of Epidemiology, Biostatistics and Occupational Health, School of Population and Global Health, McGill University, Montreal, QC, Canada
- Shunzhou Jiang
- Statistical Center for Single-Cell and Spatial Genomics, Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Amelia Schroeder
- Statistical Center for Single-Cell and Spatial Genomics, Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Jian Hu
- Department of Human Genetics, School of Medicine, Emory University, Atlanta, GA, 30322, USA
- Kejie Li
- Research Department, Biogen, Inc., 225 Binney St., Cambridge, MA, 02142, USA
- Baohong Zhang
- Research Department, Biogen, Inc., 225 Binney St., Cambridge, MA, 02142, USA
- David Dai
- Translational Neuropathology Research Laboratory, Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Edward B Lee
- Translational Neuropathology Research Laboratory, Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Rui Xiao
- Statistical Center for Single-Cell and Spatial Genomics, Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Mingyao Li
- Statistical Center for Single-Cell and Spatial Genomics, Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
16
Hu Y, Zhao Y, Schunk CT, Ma Y, Derr T, Zhou XM. ADEPT: Autoencoder with differentially expressed genes and imputation for robust spatial transcriptomics clustering. iScience 2023; 26:106792. [PMID: 37235055 PMCID: PMC10205785 DOI: 10.1016/j.isci.2023.106792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Revised: 04/06/2023] [Accepted: 04/26/2023] [Indexed: 05/28/2023] Open
Abstract
Advancements in spatial transcriptomics (ST) have enabled an in-depth understanding of complex tissues by quantifying gene expression at spatially localized spots. Several notable clustering methods have been introduced to utilize both spatial and transcriptional information in the analysis of ST datasets. However, data quality across different ST sequencing techniques and types of datasets influences the performance of different methods and benchmarks. To harness spatial context and transcriptional profile in ST data, we developed a graph-based, multi-stage framework for robust clustering, called ADEPT. To control and stabilize data quality, ADEPT relies on a graph autoencoder backbone and performs an iterative clustering on imputed, differentially expressed genes-based matrices to minimize the variance of clustering results. ADEPT outperformed other popular methods on ST data generated by different platforms across analyses such as spatial domain identification, visualization, spatial trajectory inference, and data denoising.
Affiliation(s)
- Yunfei Hu
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA
- Yuying Zhao
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA
- Curtis T. Schunk
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, USA
- Yingxiang Ma
- Data Science Institute, Vanderbilt University, Nashville, TN, USA
- Tyler Derr
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA
- Data Science Institute, Vanderbilt University, Nashville, TN, USA
- Xin Maizie Zhou
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, USA
- Data Science Institute, Vanderbilt University, Nashville, TN, USA
17
Van de Sande B, Lee JS, Mutasa-Gottgens E, Naughton B, Bacon W, Manning J, Wang Y, Pollard J, Mendez M, Hill J, Kumar N, Cao X, Chen X, Khaladkar M, Wen J, Leach A, Ferran E. Applications of single-cell RNA sequencing in drug discovery and development. Nat Rev Drug Discov 2023; 22:496-520. [PMID: 37117846 PMCID: PMC10141847 DOI: 10.1038/s41573-023-00688-4] [Citation(s) in RCA: 52] [Impact Index Per Article: 52.0] [Accepted: 03/10/2023] [Indexed: 04/30/2023]
Abstract
Single-cell technologies, particularly single-cell RNA sequencing (scRNA-seq) methods, together with associated computational tools and the growing availability of public data resources, are transforming drug discovery and development. New opportunities are emerging in target identification owing to improved disease understanding through cell subtyping, and highly multiplexed functional genomics screens incorporating scRNA-seq are enhancing target credentialling and prioritization. ScRNA-seq is also aiding the selection of relevant preclinical disease models and providing new insights into drug mechanisms of action. In clinical development, scRNA-seq can inform decision-making via improved biomarker identification for patient stratification and more precise monitoring of drug response and disease progression. Here, we illustrate how scRNA-seq methods are being applied in key steps in drug discovery and development, and discuss ongoing challenges for their implementation in the pharmaceutical industry.
Affiliation(s)
- Bart Naughton
- Computational Neurobiology, Eisai, Cambridge, MA, USA
- Wendi Bacon
- EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
- The Open University, Milton Keynes, UK
- Yong Wang
- Precision Bioinformatics, Prometheus Biosciences, San Diego, CA, USA
- Melissa Mendez
- Genomic Sciences, GlaxoSmithKline, Collegeville, PA, USA
- Jon Hill
- Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharmaceuticals Inc., Ridgefield, CT, USA
- Namit Kumar
- Informatics & Predictive Sciences, Bristol Myers Squibb, San Diego, CA, USA
- Xiaohong Cao
- Genomic Research Center, AbbVie Inc., Cambridge, MA, USA
- Xiao Chen
- Magnet Biomedicine, Cambridge, MA, USA
- Mugdha Khaladkar
- Human Genetics and Computational Biology, GlaxoSmithKline, Collegeville, PA, USA
- Ji Wen
- Oncology Research and Development Unit, Pfizer, La Jolla, CA, USA
18
Iadecola C, Smith EE, Anrather J, Gu C, Mishra A, Misra S, Perez-Pinzon MA, Shih AY, Sorond FA, van Veluw SJ, Wellington CL. The Neurovasculome: Key Roles in Brain Health and Cognitive Impairment: A Scientific Statement From the American Heart Association/American Stroke Association. Stroke 2023; 54:e251-e271. [PMID: 37009740 PMCID: PMC10228567 DOI: 10.1161/str.0000000000000431] [Citation(s) in RCA: 36] [Impact Index Per Article: 36.0] [Indexed: 04/04/2023]
Abstract
BACKGROUND Preservation of brain health has emerged as a leading public health priority for the aging world population. Advances in neurovascular biology have revealed an intricate relationship among brain cells, meninges, and the hematic and lymphatic vasculature (the neurovasculome) that is highly relevant to the maintenance of cognitive function. In this scientific statement, a multidisciplinary team of experts examines these advances, assesses their relevance to brain health and disease, identifies knowledge gaps, and provides future directions. METHODS Authors with relevant expertise were selected in accordance with the American Heart Association conflict-of-interest management policy. They were assigned topics pertaining to their areas of expertise, reviewed the literature, and summarized the available data. RESULTS The neurovasculome, composed of extracranial, intracranial, and meningeal vessels, as well as lymphatics and associated cells, subserves critical homeostatic functions vital for brain health. These include delivering O2 and nutrients through blood flow and regulating immune trafficking, as well as clearing pathogenic proteins through perivascular spaces and dural lymphatics. Single-cell omics technologies have unveiled an unprecedented molecular heterogeneity in the cellular components of the neurovasculome and have identified novel reciprocal interactions with brain cells. The evidence suggests a previously unappreciated diversity of the pathogenic mechanisms by which disruption of the neurovasculome contributes to cognitive dysfunction in neurovascular and neurodegenerative diseases, providing new opportunities for the prevention, recognition, and treatment of these conditions. CONCLUSIONS These advances shed new light on the symbiotic relationship between the brain and its vessels and promise to provide new diagnostic and therapeutic approaches for brain disorders associated with cognitive dysfunction.
19
Geras A, Darvish Shafighi S, Domżał K, Filipiuk I, Rączkowska A, Szymczak P, Toosi H, Kaczmarek L, Koperski Ł, Lagergren J, Nowis D, Szczurek E. Celloscope: a probabilistic model for marker-gene-driven cell type deconvolution in spatial transcriptomics data. Genome Biol 2023; 24:120. [PMID: 37198601 PMCID: PMC10190053 DOI: 10.1186/s13059-023-02951-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2022] [Accepted: 04/21/2023] [Indexed: 05/19/2023] Open
Abstract
Spatial transcriptomics maps gene expression across tissues, posing the challenge of determining the spatial arrangement of different cell types. However, spatial transcriptomics spots contain multiple cells. Therefore, the observed signal comes from mixtures of cells of different types. Here, we propose an innovative probabilistic model, Celloscope, that utilizes established prior knowledge on marker genes for cell type deconvolution from spatial transcriptomics data. Celloscope outperforms other methods on simulated data, successfully indicates known brain structures and spatially distinguishes between inhibitory and excitatory neuron types in mouse brain tissue, and dissects large heterogeneity of immune infiltrate composition in prostate gland tissue.
Affiliation(s)
- Agnieszka Geras
- Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Warsaw, Poland
- Shadi Darvish Shafighi
- Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Warsaw, Poland
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative - UMR, Paris, France
- Kacper Domżał
- Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Warsaw, Poland
- Igor Filipiuk
- Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Warsaw, Poland
- Alicja Rączkowska
- Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Warsaw, Poland
- Paulina Szymczak
- Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Warsaw, Poland
- Hosein Toosi
- KTH Royal Institute of Technology, Stockholm, Sweden
- Leszek Kaczmarek
- BRAINCITY, Nencki Institute of Experimental Biology of the Polish Academy of Sciences, Warsaw, Poland
- Łukasz Koperski
- Department of Pathology, Medical University of Warsaw, Warsaw, Poland
- Dominika Nowis
- Laboratory of Experimental Medicine, Medical University of Warsaw, Warsaw, Poland
- Ewa Szczurek
- Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Warsaw, Poland
20
Coleman K, Hu J, Schroeder A, Lee EB, Li M. SpaDecon: cell-type deconvolution in spatial transcriptomics with semi-supervised learning. Commun Biol 2023; 6:378. [PMID: 37029267 PMCID: PMC10082183 DOI: 10.1038/s42003-023-04761-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2022] [Accepted: 03/24/2023] [Indexed: 04/09/2023] Open
Abstract
Spatially resolved transcriptomics (SRT) has advanced our understanding of the spatial patterns of gene expression, but the lack of single-cell resolution in spatial barcoding-based SRT hinders the inference of specific locations of individual cells. To determine the spatial distribution of cell types in SRT, we present SpaDecon, a semi-supervised learning approach that incorporates gene expression, spatial location, and histology information for cell-type deconvolution. SpaDecon was evaluated through analyses of four real SRT datasets using knowledge of the expected distributions of cell types. Quantitative evaluations were performed for four pseudo-SRT datasets constructed according to benchmark proportions. Using mean squared error and Jensen-Shannon divergence with the benchmark proportions as evaluation criteria, we show that SpaDecon performance surpasses that of published cell-type deconvolution methods. Given the accuracy and computational speed of SpaDecon, we anticipate it will be valuable for SRT data analysis and will facilitate the integration of genomics and digital pathology.
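The two evaluation criteria named in the SpaDecon abstract above, mean squared error and Jensen-Shannon divergence against benchmark proportions, can be computed for a single spot as in the following minimal sketch. It is not taken from the SpaDecon code. The function names, the base-2 logarithm (which bounds the divergence between 0 and 1), and the example proportion vectors are assumptions made for illustration.

```python
import math


def mean_squared_error(estimated, benchmark):
    """Average squared difference between two cell-type proportion vectors."""
    return sum((e - b) ** 2 for e, b in zip(estimated, benchmark)) / len(benchmark)


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute nothing; q_i must be positive
    wherever p_i is positive.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Symmetric divergence between two proportion vectors that each sum to 1."""
    mixture = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mixture) + 0.5 * kl_divergence(q, mixture)


# Hypothetical proportions for one spot over three cell types.
estimated = [0.6, 0.3, 0.1]
benchmark = [0.5, 0.4, 0.1]
print(mean_squared_error(estimated, benchmark))
print(jensen_shannon_divergence(estimated, benchmark))
```

Both quantities are zero when the estimated and benchmark proportions agree exactly, so lower values indicate a closer match. Averaging them over all spots would give a dataset-level score, although the abstract does not state how the per-spot values were aggregated.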
Affiliation(s)
- Kyle Coleman
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Jian Hu
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Amelia Schroeder
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Edward B Lee
- Translational Neuropathology Research Laboratory, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Mingyao Li
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
21
Heydari AA, Sindi SS. Deep learning in spatial transcriptomics: Learning from the next next-generation sequencing. BIOPHYSICS REVIEWS 2023; 4:011306. [PMID: 38505815 PMCID: PMC10903438 DOI: 10.1063/5.0091135] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 03/11/2022] [Accepted: 12/19/2022] [Indexed: 03/21/2024]
Abstract
Spatial transcriptomics (ST) technologies are rapidly becoming the extension of single-cell RNA sequencing (scRNAseq), holding the potential of profiling gene expression at a single-cell resolution while maintaining cellular compositions within a tissue. Having both expression profiles and tissue organization enables researchers to better understand cellular interactions and heterogeneity, providing insight into complex biological processes that would not be possible with traditional sequencing technologies. Data generated by ST technologies are inherently noisy, high-dimensional, sparse, and multi-modal (including histological images, count matrices, etc.), thus requiring specialized computational tools for accurate and robust analysis. However, many ST studies currently utilize traditional scRNAseq tools, which are inadequate for analyzing complex ST datasets. On the other hand, many of the existing ST-specific methods are built upon traditional statistical or machine learning frameworks, which have been shown to be sub-optimal in many applications due to the scale, multi-modality, and limitations of spatially resolved data (such as spatial resolution, sensitivity, and gene coverage). Given these intricacies, researchers have developed deep learning (DL)-based models to alleviate ST-specific challenges. These methods include new state-of-the-art models in alignment, spatial reconstruction, and spatial clustering, among others. However, DL models for ST analysis are nascent and remain largely underexplored. In this review, we provide an overview of existing state-of-the-art tools for analyzing spatially resolved transcriptomics while delving deeper into the DL-based approaches. We discuss the new frontiers and the open questions in this field and highlight domains in which we anticipate transformational DL applications.
22
Zhang L, Badai J, Wang G, Ru X, Song W, You Y, He J, Huang S, Feng H, Chen R, Zhao Y, Chen Y. Discovering hematoma-stimulated circuits for secondary brain injury after intraventricular hemorrhage by spatial transcriptome analysis. Front Immunol 2023; 14:1123652. [PMID: 36825001 PMCID: PMC9941151 DOI: 10.3389/fimmu.2023.1123652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2022] [Accepted: 01/11/2023] [Indexed: 02/09/2023] Open
Abstract
Introduction Central nervous system (CNS) diseases, such as neurodegenerative disorders and brain diseases caused by acute injuries, are important, yet challenging to study due to disease lesion locations and other complexities. Methods Utilizing the powerful method of spatial transcriptome analysis together with novel algorithms we developed for the study, we report here for the first time a 3D trajectory map of gene expression changes in the brain following acute neural injury using a mouse model of intraventricular hemorrhage (IVH). IVH is a common and representative complication after various acute brain injuries with severe mortality and mobility implications. Results Our data identified three main 3D global pseudospace-time trajectory bundles that represent the main neural circuits from the lateral ventricle to the hippocampus and primary cortex affected by experimental IVH stimulation. Further analysis indicated a rapid response in the primary cortex, as well as a direct and integrated effect on the hippocampus after IVH stimulation. Discussion These results are informative for understanding the pathophysiological changes, including the spatial and temporal patterns of gene expression changes, in IVH patients after acute brain injury, strategizing more effective clinical management regimens, and developing novel bioinformatics strategies for the study of other CNS diseases. The algorithm strategies used in this study are searchable via a web service (www.combio-lezhang.online/3dstivh/home).
Affiliation(s)
- Le Zhang
- College of Computer Science, Sichuan University, Chengdu, China
- Innovation Center of Nursing Research, West China Hospital, Sichuan University, Chengdu, China
- Jiayidaer Badai
- College of Computer Science, Sichuan University, Chengdu, China
- Guan Wang
- College of Computer Science, Sichuan University, Chengdu, China
- Innovation Center of Nursing Research, West China Hospital, Sichuan University, Chengdu, China
- Xufang Ru
- Chinese Academy of Sciences (CAS) Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, China
- Department of Neurosurgery and State Key Laboratory of Trauma, Burn and Combined Injury, Southwest Hospital, Army Medical University, Chongqing, China
- Wenkai Song
- College of Computer Science, Sichuan University, Chengdu, China
- Yujie You
- College of Computer Science, Sichuan University, Chengdu, China
- Jiaojiao He
- College of Computer Science, Sichuan University, Chengdu, China
- Suna Huang
- Department of Neurosurgery and State Key Laboratory of Trauma, Burn and Combined Injury, Southwest Hospital, Army Medical University, Chongqing, China
- Hua Feng
- Department of Neurosurgery and State Key Laboratory of Trauma, Burn and Combined Injury, Southwest Hospital, Army Medical University, Chongqing, China
- Runsheng Chen
- College of Computer Science, Sichuan University, Chengdu, China
- Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China
- *Correspondence: Runsheng Chen; Yi Zhao; Yujie Chen
- Yi Zhao
- College of Computer Science, Sichuan University, Chengdu, China
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Yujie Chen
- Chinese Academy of Sciences (CAS) Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, China
- Department of Neurosurgery and State Key Laboratory of Trauma, Burn and Combined Injury, Southwest Hospital, Army Medical University, Chongqing, China
23
|
Ospina O, Soupir A, Fridley BL. A Primer on Preprocessing, Visualization, Clustering, and Phenotyping of Barcode-Based Spatial Transcriptomics Data. Methods Mol Biol 2023; 2629:115-140. [PMID: 36929076 DOI: 10.1007/978-1-0716-2986-4_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
Abstract
Recent developments in spatially resolved transcriptomics (ST) have resulted in a large number of studies characterizing the architecture of tissues, the spatial distribution of cell types, and their interactions. Furthermore, ST promises to enable the discovery of more accurate drug targets while also providing a better understanding of the etiology and evolution of complex diseases. The analysis of ST brings similar challenges as seen in other gene expression assays such as scRNA-seq; however, there is the additional spatial information that warrants the development of suitable algorithms for the quality control, preprocessing, visualization, and other discovery-enabling approaches (e.g., clustering, cell phenotyping). In this chapter, we review some of the existing algorithms to perform these analytical tasks and highlight some of the unmet analytical challenges in the analysis of ST data. Given the diversity of available ST technologies, we focus this chapter on the analysis of barcode-based RNA quantitation techniques.
Affiliation(s)
- Oscar Ospina
  - Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, FL, USA
- Alex Soupir
  - Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, FL, USA
- Brooke L Fridley
  - Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA
|
24
|
Overbey EG, Das S, Cope H, Madrigal P, Andrusivova Z, Frapard S, Klotz R, Bezdan D, Gupta A, Scott RT, Park J, Chirko D, Galazka JM, Costes SV, Mason CE, Herranz R, Szewczyk NJ, Borg J, Giacomello S. Challenges and considerations for single-cell and spatially resolved transcriptomics sample collection during spaceflight. Cell Rep Methods 2022; 2:100325. [PMID: 36452864 PMCID: PMC9701605 DOI: 10.1016/j.crmeth.2022.100325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) and spatially resolved transcriptomics (SRT) have experienced rapid development in recent years. The findings of spaceflight-based scRNA-seq and SRT investigations are likely to improve our understanding of life in space and our comprehension of gene expression in various cell systems and tissue dynamics. However, compared to their Earth-based counterparts, gene expression experiments conducted in spaceflight have not experienced the same pace of development. Out of the hundreds of spaceflight gene expression datasets available, only a few used scRNA-seq and SRT. In this perspective piece, we explore the growing importance of scRNA-seq and SRT in space biology and discuss the challenges and considerations relevant to robust experimental design to enable growth of these methods in the field.
Affiliation(s)
- Eliah G. Overbey
  - Weill Cornell Medicine, New York, NY, USA
  - Institute for Computational Biomedicine, New York, NY, USA
- Saswati Das
  - Department of Biochemistry, Atal Bihari Vajpayee Institute of Medical Sciences & Dr. Ram Manohar Lohia Hospital, New Delhi, India
- Henry Cope
  - School of Medicine, University of Nottingham, Derby DE22 3DT, UK
- Pedro Madrigal
  - European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
- Zaneta Andrusivova
  - Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- Solène Frapard
  - Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- Rebecca Klotz
  - KBR, Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
- Daniela Bezdan
  - Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen 72076, Germany
  - NGS Competence Center Tübingen (NCCT), University of Tübingen, Tübingen, Germany
  - yuri GmbH, Meckenbeuren, Germany
- Ryan T. Scott
  - KBR, Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
- Jonathan M. Galazka
  - Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
- Sylvain V. Costes
  - Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
- Christopher E. Mason
  - Weill Cornell Medicine, New York, NY, USA
  - Institute for Computational Biomedicine, New York, NY, USA
  - The Feil Family Brain and Mind Research Institute, New York, NY, USA
  - The WorldQuant Initiative for Quantitative Prediction, New York, NY, USA
- Raul Herranz
  - Centro de Investigaciones Biológicas Margarita Salas (CSIC), Madrid 28040, Spain
- Nathaniel J. Szewczyk
  - School of Medicine, University of Nottingham, Derby DE22 3DT, UK
  - Department of Biomedical Sciences, Heritage College of Osteopathic Medicine, Ohio University, Athens, OH 45701, USA
- Joseph Borg
  - Department of Applied Biomedical Science, Faculty of Health Sciences, University of Malta, Msida, Malta
- Stefania Giacomello
  - Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
|
25
|
Zuo C, Zhang Y, Cao C, Feng J, Jiao M, Chen L. Elucidating tumor heterogeneity from spatially resolved transcriptomics data by multi-view graph collaborative learning. Nat Commun 2022; 13:5962. [PMID: 36216831 PMCID: PMC9551038 DOI: 10.1038/s41467-022-33619-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 01/22/2022] [Accepted: 09/26/2022] [Indexed: 12/02/2022] Open
Abstract
Spatially resolved transcriptomics (SRT) technology enables us to gain novel insights into tissue architecture and cell development, especially in tumors. However, lacking computational exploitation of biological contexts and multi-view features severely hinders the elucidation of tissue heterogeneity. Here, we propose stMVC, a multi-view graph collaborative-learning model that integrates histology, gene expression, spatial location, and biological contexts in analyzing SRT data by attention. Specifically, stMVC adopting semi-supervised graph attention autoencoder separately learns view-specific representations of histological-similarity-graph or spatial-location-graph, and then simultaneously integrates two-view graphs for robust representations through attention under semi-supervision of biological contexts. stMVC outperforms other tools in detecting tissue structure, inferring trajectory relationships, and denoising on benchmark slices of human cortex. Particularly, stMVC identifies disease-related cell-states and their transition cell-states in breast cancer study, which are further validated by the functional and survival analysis of independent clinical data. Those results demonstrate clinical and prognostic applications from SRT data.
Affiliation(s)
- Chunman Zuo
  - Institute of Artificial Intelligence, Donghua University, Shanghai, 201620, China
  - Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai, 200031, China
- Yijian Zhang
  - Department of General Surgery, Xinhua Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, 200092, China
- Chen Cao
  - School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, 211166, China
- Jinwang Feng
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
- Mingqi Jiao
  - Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou, 310024, China
- Luonan Chen
  - Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai, 200031, China
  - Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou, 310024, China
  - Guangdong Institute of Intelligence Science and Technology, Hengqin, Zhuhai, Guangdong, 519031, China
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China
|
26
|
Tsuchiya T, Hori H, Ozaki H. CCPLS reveals cell-type-specific spatial dependence of transcriptomes in single cells. Bioinformatics 2022; 38:4868-4877. [PMID: 36063454 PMCID: PMC9620831 DOI: 10.1093/bioinformatics/btac599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2022] [Revised: 08/17/2022] [Accepted: 09/04/2022] [Indexed: 11/15/2022] Open
Abstract
MOTIVATION: Cell-cell communications regulate internal cellular states, e.g. gene expression and cell functions, and play pivotal roles in normal development and disease states. Furthermore, single-cell RNA sequencing methods have revealed cell-to-cell expression variability of highly variable genes (HVGs), which is also crucial. Nevertheless, the regulation of cell-to-cell expression variability of HVGs via cell-cell communications is still largely unexplored. The recent advent of spatial transcriptome methods has linked gene expression profiles to the spatial context of single cells, which has provided opportunities to reveal those regulations. The existing computational methods extract genes with expression levels influenced by neighboring cell types. However, limitations remain in the quantitativeness and interpretability: they neither focus on HVGs nor consider the effects of multiple neighboring cell types.
RESULTS: Here, we propose CCPLS (Cell-Cell communications analysis by Partial Least Square regression modeling), which is a statistical framework for identifying cell-cell communications as the effects of multiple neighboring cell types on cell-to-cell expression variability of HVGs, based on the spatial transcriptome data. For each cell type, CCPLS performs PLS regression modeling and reports coefficients as the quantitative index of the cell-cell communications. Evaluation using simulated data showed our method accurately estimated the effects of multiple neighboring cell types on HVGs. Furthermore, applications to the two real datasets demonstrate that CCPLS can extract biologically interpretable insights from the inferred cell-cell communications.
AVAILABILITY AND IMPLEMENTATION: The R package is available at https://github.com/bioinfo-tsukuba/CCPLS. The data are available at https://github.com/bioinfo-tsukuba/CCPLS_paper.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Takaho Tsuchiya
  - Bioinformatics Laboratory, Faculty of Medicine, University of Tsukuba, Tsukuba, Ibaraki 305-8577, Japan
  - Center for Artificial Intelligence Research, University of Tsukuba, Tsukuba, Ibaraki 305-8577, Japan
- Hiroki Hori
  - Bioinformatics Laboratory, Faculty of Medicine, University of Tsukuba, Tsukuba, Ibaraki 305-8577, Japan
  - Doctoral Program in Medical Sciences, Graduate School of Comprehensive Human Sciences, University of Tsukuba, Tsukuba, Ibaraki 305-8577, Japan
|
27
|
Chang Y, He F, Wang J, Chen S, Li J, Liu J, Yu Y, Su L, Ma A, Allen C, Lin Y, Sun S, Liu B, Javier Otero J, Chung D, Fu H, Li Z, Xu D, Ma Q. Define and visualize pathological architectures of human tissues from spatially resolved transcriptomics using deep learning. Comput Struct Biotechnol J 2022; 20:4600-4617. [PMID: 36090815 PMCID: PMC9440291 DOI: 10.1016/j.csbj.2022.08.029] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 05/25/2022] [Revised: 08/11/2022] [Accepted: 08/12/2022] [Indexed: 11/29/2022] Open
Abstract
Spatially resolved transcriptomics provides a new way to define spatial contexts and understand the pathogenesis of complex human diseases. Although some computational frameworks can characterize spatial context via various clustering methods, the detailed spatial architectures and functional zonation often cannot be revealed and localized due to the limited capacities of associating spatial information. We present RESEPT, a deep-learning framework for characterizing and visualizing tissue architecture from spatially resolved transcriptomics. Given inputs such as gene expression or RNA velocity, RESEPT learns a three-dimensional embedding with a spatial retained graph neural network from spatial transcriptomics. The embedding is then visualized by mapping into color channels in an RGB image and segmented with a supervised convolutional neural network model. Based on a benchmark of 10x Genomics Visium spatial transcriptomics datasets on the human and mouse cortex, RESEPT infers and visualizes the tissue architecture accurately. It is noteworthy that, for the in-house AD samples, RESEPT can localize cortex layers and cell types based on pre-defined region- or cell-type-enriched genes and furthermore provide critical insights into the identification of amyloid-beta plaques in Alzheimer's disease. Interestingly, in a glioblastoma sample analysis, RESEPT distinguishes tumor-enriched, non-tumor, and regions of neuropil with infiltrating tumor cells in support of clinical and prognostic cancer applications.
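Note: the abstract above says a three-dimensional embedding is visualized by mapping it into the color channels of an RGB image. The following is a minimal sketch of that single step only, assuming per-channel min-max scaling and integer pixel coordinates for each spot. The helper names `embedding_to_rgb` and `paint_spots` are hypothetical, and this is not the RESEPT implementation.

```python
import numpy as np


def embedding_to_rgb(embedding):
    """Scale each of the 3 embedding dimensions to 0..255 (assumed min-max scaling)."""
    embedding = np.asarray(embedding, dtype=float)
    if embedding.ndim != 2 or embedding.shape[1] != 3:
        raise ValueError("expected an (n_spots, 3) array")
    lo = embedding.min(axis=0)
    span = embedding.max(axis=0) - lo
    span[span == 0] = 1.0  # a constant dimension maps to channel value 0
    scaled = (embedding - lo) / span
    return np.round(scaled * 255).astype(np.uint8)


def paint_spots(rgb, rows, cols, shape):
    """Write one RGB color per spot into an otherwise black image."""
    image = np.zeros((shape[0], shape[1], 3), dtype=np.uint8)
    image[rows, cols] = rgb
    return image


# Toy data: three spots on the diagonal of a 3x3 grid (illustrative values only).
embedding = np.array([[0.0, -1.0, 2.0],
                      [1.0, 0.0, 2.0],
                      [2.0, 1.0, 2.0]])
rgb = embedding_to_rgb(embedding)
image = paint_spots(rgb, rows=[0, 1, 2], cols=[0, 1, 2], shape=(3, 3))
```

Spots with similar embeddings receive similar colors, so regions of the tissue show up as color patches that a segmentation model could then work on.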
Affiliation(s)
- Yuzhou Chang
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA
  - The Pelotonia Institute for Immuno-oncology, The Ohio State University Comprehensive Cancer Center, Columbus, OH 43210, USA
- Fei He
  - School of Information Science and Technology, Northeast Normal University, Changchun, Jilin 130117, China
- Juexin Wang
  - Department of Electrical Engineering and Computer Science, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Shuo Chen
  - Department of Neuroscience, The Ohio State University, Columbus, OH 43210, USA
- Jingyi Li
  - School of Information Science and Technology, Northeast Normal University, Changchun, Jilin 130117, China
- Jixin Liu
  - School of Mathematics, Shandong University, Jinan 250100, China
- Yang Yu
  - School of Information Science and Technology, Northeast Normal University, Changchun, Jilin 130117, China
- Li Su
  - Department of Electrical Engineering and Computer Science, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Anjun Ma
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA
  - The Pelotonia Institute for Immuno-oncology, The Ohio State University Comprehensive Cancer Center, Columbus, OH 43210, USA
- Carter Allen
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA
- Yu Lin
  - School of Artificial Intelligence, Jilin University, Changchun 130012, China
- Shaoli Sun
  - Department of Pathology, The Ohio State University, Columbus, OH 43210, USA
- Bingqiang Liu
  - School of Mathematics, Shandong University, Jinan 250100, China
- José Javier Otero
  - Departments of Neuroscience, Pathology, Neuropathology, The Ohio State University, Columbus, OH 43210, USA
- Dongjun Chung
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA
  - The Pelotonia Institute for Immuno-oncology, The Ohio State University Comprehensive Cancer Center, Columbus, OH 43210, USA
- Hongjun Fu
  - Department of Neuroscience, The Ohio State University, Columbus, OH 43210, USA
- Zihai Li
  - The Pelotonia Institute for Immuno-oncology, The Ohio State University Comprehensive Cancer Center, Columbus, OH 43210, USA
- Dong Xu
  - Department of Electrical Engineering and Computer Science, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Qin Ma
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA
  - The Pelotonia Institute for Immuno-oncology, The Ohio State University Comprehensive Cancer Center, Columbus, OH 43210, USA
|
28
|
Avesani S, Viesi E, Alessandrì L, Motterle G, Bonnici V, Beccuti M, Calogero R, Giugno R. Stardust: improving spatial transcriptomics data analysis through space-aware modularity optimization-based clustering. Gigascience 2022; 11:6659721. [PMID: 35946989 PMCID: PMC9364686 DOI: 10.1093/gigascience/giac075] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/14/2021] [Revised: 04/27/2022] [Accepted: 06/30/2022] [Indexed: 01/24/2023] Open
Abstract
Background: Spatial transcriptomics (ST) combines stained tissue images with spatially resolved high-throughput RNA sequencing. The spatial transcriptomic analysis includes challenging tasks like clustering, where a partition among data points (spots) is defined by means of a similarity measure. Improving clustering results is a key factor as clustering affects subsequent downstream analysis. State-of-the-art approaches group data by taking into account transcriptional similarity and some by exploiting spatial information as well. However, it is not yet clear how much the spatial information combined with transcriptomics improves the clustering result.
Results: We propose a new clustering method, Stardust, that easily exploits the combination of space and transcriptomic information in the clustering procedure through a manual or fully automatic tuning of algorithm parameters. Moreover, a parameter-free version of the method is also provided where the spatial contribution depends dynamically on the expression distances distribution in the space. We evaluated the proposed methods results by analyzing ST data sets available on the 10x Genomics website and comparing clustering performances with state-of-the-art approaches by measuring the spots' stability in the clusters and their biological coherence. Stability is defined by the tendency of each point to remain clustered with the same neighbors when perturbations are applied.
Conclusions: Stardust is an easy-to-use methodology allowing to define how much spatial information should influence clustering on different tissues and achieving more stable results than state-of-the-art approaches.
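Note: the abstract above describes clustering on a tunable combination of spatial and transcriptomic information. One plausible way to express such a combination is a weighted sum of two pairwise distance matrices, sketched below. The rescaling step, the weighting formula, and the names `pairwise_euclidean`, `combined_distance`, and `space_weight` are assumptions made for illustration, not the published Stardust code.

```python
import numpy as np


def pairwise_euclidean(x):
    """Return the (n, n) matrix of Euclidean distances between the rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def combined_distance(expr, coords, space_weight):
    """Blend expression and physical distances between spots.

    space_weight = 0 ignores position; larger values let position matter more.
    The weighting scheme here is an assumption for illustration.
    """
    d_expr = pairwise_euclidean(expr)
    d_space = pairwise_euclidean(coords)
    # Rescale physical distances to the range of the expression distances
    # so that the same weight means roughly the same thing across tissues.
    if d_space.max() > 0:
        d_space = d_space * (d_expr.max() / d_space.max())
    return d_expr + space_weight * d_space


# Toy data: four spots, two expression features, 2-D positions (illustrative).
expr = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
coords = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
d0 = combined_distance(expr, coords, 0.0)  # expression only
d1 = combined_distance(expr, coords, 0.5)  # expression plus half-weighted space
```

The resulting matrix could be passed to any distance-based or graph-based clustering routine; sweeping `space_weight` is one way to see how much the spatial term changes the partition.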
Affiliation(s)
- Simone Avesani
  - Department of Computer Science, University of Verona, Verona 37134, Italy
- Eva Viesi
  - Department of Computer Science, University of Verona, Verona 37134, Italy
- Luca Alessandrì
  - Department of Molecular Biotechnology and Health Sciences, University of Turin, Turin 10126, Italy
- Giovanni Motterle
  - Department of Computer Science, University of Verona, Verona 37134, Italy
- Vincenzo Bonnici
  - Department of Mathematical, Physical and Computer Sciences, University of Parma, Parma 43121, Italy
- Marco Beccuti
  - Department of Computer Science, University of Turin, Turin 10149, Italy
- Raffaele Calogero
  - Department of Molecular Biotechnology and Health Sciences, University of Turin, Turin 10126, Italy
- Rosalba Giugno
  - Department of Computer Science, University of Verona, Verona 37134, Italy
|
29
|
Vadapalli S, Abdelhalim H, Zeeshan S, Ahmed Z. Artificial intelligence and machine learning approaches using gene expression and variant data for personalized medicine. Brief Bioinform 2022; 23:6590150. [PMID: 35595537 DOI: 10.1093/bib/bbac191] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 03/08/2022] [Revised: 04/02/2022] [Accepted: 04/26/2022] [Indexed: 12/16/2022] Open
Abstract
Precision medicine uses genetic, environmental and lifestyle factors to more accurately diagnose and treat disease in specific groups of patients, and it is considered one of the most promising medical efforts of our time. The use of genetics is arguably the most data-rich and complex components of precision medicine. The grand challenge today is the successful assimilation of genetics into precision medicine that translates across different ancestries, diverse diseases and other distinct populations, which will require clever use of artificial intelligence (AI) and machine learning (ML) methods. Our goal here was to review and compare scientific objectives, methodologies, datasets, data sources, ethics and gaps of AI/ML approaches used in genomics and precision medicine. We selected high-quality literature published within the last 5 years that were indexed and available through PubMed Central. Our scope was narrowed to articles that reported application of AI/ML algorithms for statistical and predictive analyses using whole genome and/or whole exome sequencing for gene variants, and RNA-seq and microarrays for gene expression. We did not limit our search to specific diseases or data sources. Based on the scope of our review and comparative analysis criteria, we identified 32 different AI/ML approaches applied in variable genomics studies and report widely adapted AI/ML algorithms for predictive diagnostics across several diseases.
Affiliation(s)
- Sreya Vadapalli
  - Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers University, 112 Paterson St, New Brunswick, NJ, USA
- Habiba Abdelhalim
  - Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers University, 112 Paterson St, New Brunswick, NJ, USA
- Saman Zeeshan
  - Rutgers Cancer Institute of New Jersey, Rutgers University, 195 Little Albany St, New Brunswick, NJ, USA
- Zeeshan Ahmed
  - Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers University, 112 Paterson St, New Brunswick, NJ, USA
  - Department of Medicine, Robert Wood Johnson Medical School, Rutgers Biomedical and Health Sciences, 125 Paterson St, New Brunswick, NJ, USA
|
30
|
Li K, Yan C, Li C, Chen L, Zhao J, Zhang Z, Bao S, Sun J, Zhou M. Computational elucidation of spatial gene expression variation from spatially resolved transcriptomics data. Mol Ther Nucleic Acids 2022; 27:404-411. [PMID: 35036053 PMCID: PMC8728308 DOI: 10.1016/j.omtn.2021.12.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Abstract
Recent advances in spatially resolved transcriptomics (SRT) have revolutionized biological and medical research and enabled unprecedented insight into the functional organization and cell communication of tissues and organs in situ. Identifying and elucidating gene spatial expression variation (SE analysis) is fundamental to elucidate the SRT landscape. There is an urgent need for public repositories and computational techniques of SRT data in SE analysis alongside technological breakthroughs and large-scale data generation. Increasing efforts to use in silico techniques in SE analysis have been made. However, these attempts are widely scattered among a large number of studies that are not easily accessible or comprehensible by both medical and life scientists. This study provides a survey and a summary of public resources on SE analysis in SRT studies. An updated systematic overview of state-of-the-art computational approaches and tools currently available in SE analysis are presented herein, emphasizing recent advances. Finally, the present study explores the future perspectives and challenges of in silico techniques in SE analysis. This study guides medical and life scientists to look for dedicated resources and more competent tools for characterizing spatial patterns of gene expression.
Affiliation(s)
- Ke Li
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Congcong Yan
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Chenghao Li
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Lu Chen
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Jingting Zhao
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Zicheng Zhang
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Siqi Bao
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Jie Sun
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
  - Corresponding author: Jie Sun, School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Meng Zhou
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
  - Corresponding author: Meng Zhou, School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
|
31
|
Liu B, Li Y, Zhang L. Analysis and Visualization of Spatial Transcriptomic Data. Front Genet 2022; 12:785290. [PMID: 35154244 PMCID: PMC8829434 DOI: 10.3389/fgene.2021.785290] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 09/29/2021] [Accepted: 12/24/2021] [Indexed: 12/21/2022] Open
Abstract
Human and animal tissues consist of heterogeneous cell types that organize and interact in highly structured manners. Bulk and single-cell sequencing technologies remove cells from their original microenvironments, resulting in a loss of spatial information. Spatial transcriptomics is a recent technological innovation that measures transcriptomic information while preserving spatial information. Spatial transcriptomic data can be generated in several ways. RNA molecules are measured by in situ sequencing, in situ hybridization, or spatial barcoding to recover original spatial coordinates. The inclusion of spatial information expands the range of possibilities for analysis and visualization, and spurred the development of numerous novel methods. In this review, we summarize the core concepts of spatial genomics technology and provide a comprehensive review of current analysis and visualization methods for spatial transcriptomics.
|
32
|
Yang C, Chowdhury D, Zhang Z, Cheung WK, Lu A, Bian Z, Zhang L. A review of computational tools for generating metagenome-assembled genomes from metagenomic sequencing data. Comput Struct Biotechnol J 2021; 19:6301-6314. [PMID: 34900140 PMCID: PMC8640167 DOI: 10.1016/j.csbj.2021.11.028] [Citation(s) in RCA: 64] [Impact Index Per Article: 21.3] [Received: 08/30/2021] [Revised: 11/17/2021] [Accepted: 11/17/2021] [Indexed: 12/16/2022] Open
Abstract
Metagenomic sequencing provides a culture-independent avenue to investigate the complex microbial communities by constructing metagenome-assembled genomes (MAGs). A MAG represents a microbial genome by a group of sequences from genome assembly with similar characteristics. It enables us to identify novel species and understand their potential functions in a dynamic ecosystem. Many computational tools have been developed to construct and annotate MAGs from metagenomic sequencing, however, there is a prominent gap to comprehensively introduce their background and practical performance. In this paper, we have thoroughly investigated the computational tools designed for both upstream and downstream analyses, including metagenome assembly, metagenome binning, gene prediction, functional annotation, taxonomic classification, and profiling. We have categorized the commonly used tools into unique groups based on their functional background and introduced the underlying core algorithms and associated information to demonstrate a comparative outlook. Furthermore, we have emphasized the computational requisition and offered guidance to the users to select the most efficient tools. Finally, we have indicated current limitations, potential solutions, and future perspectives for further improving the tools of MAG construction and annotation. We believe that our work provides a consolidated resource for the current stage of MAG studies and shed light on the future development of more effective MAG analysis tools on metagenomic sequencing.
Key Words
- CNN, convolutional neural network
- DBG, De Bruijn graph
- GTDB, Genome Taxonomy Database
- Gene functional annotation
- Gene prediction
- Genome assembly
- HMM, Hidden Markov Model
- KEGG, Kyoto Encyclopedia of Genes and Genomes
- LCA, lowest common ancestor
- LPA, label propagation algorithm
- MAGs, metagenome-assembled genomes
- Metagenome binning
- Metagenome-assembled genomes
- Metagenomic sequencing
- Microbial abundance profiling
- OLC, overlap-layout consensus
- ONT, Oxford Nanopore Technologies
- ORFs, open reading frames
- PacBio, Pacific Biosciences
- QC, quality control
- SLR, synthetic long reads
- TNFs, tetranucleotide frequencies
- Taxonomic classification
Affiliation(s)
- Chao Yang
- Department of Computer Science, Hong Kong Baptist University, Hong Kong Special Administrative Region
- Debajyoti Chowdhury
- Computational Medicine Lab, Hong Kong Baptist University, Hong Kong Special Administrative Region
- Institute of Integrated Bioinformedicine and Translational Sciences, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong Special Administrative Region
- Zhenmiao Zhang
- Department of Computer Science, Hong Kong Baptist University, Hong Kong Special Administrative Region
- William K. Cheung
- Department of Computer Science, Hong Kong Baptist University, Hong Kong Special Administrative Region
- Aiping Lu
- Computational Medicine Lab, Hong Kong Baptist University, Hong Kong Special Administrative Region
- Institute of Integrated Bioinformedicine and Translational Sciences, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong Special Administrative Region
- Zhaoxiang Bian
- Institute of Brain and Gut Research, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong Special Administrative Region
- Chinese Medicine Clinical Study Center, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong Special Administrative Region
- Lu Zhang
- Department of Computer Science, Hong Kong Baptist University, Hong Kong Special Administrative Region
- Computational Medicine Lab, Hong Kong Baptist University, Hong Kong Special Administrative Region
|
33
|
Auerbach BJ, Hu J, Reilly MP, Li M. Applications of single-cell genomics and computational strategies to study common disease and population-level variation. Genome Res 2021; 31:1728-1741. [PMID: 34599006 PMCID: PMC8494214 DOI: 10.1101/gr.275430.121] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/13/2022]
Abstract
The advent and rapid development of single-cell technologies have made it possible to study cellular heterogeneity at an unprecedented resolution and scale. Cellular heterogeneity underlies phenotypic differences among individuals, and studying it is an important step toward understanding the molecular mechanisms of disease. Single-cell technologies offer opportunities to characterize cellular heterogeneity from different angles, but linking cellular heterogeneity with disease phenotypes requires careful computational analysis. In this article, we review the current applications of single-cell methods in human disease studies and describe what existing studies have taught us so far about human genetic variation. As single-cell technologies become widely applicable in human disease studies, population-level studies have become a reality. We describe how to pursue and design these studies, particularly how to select study subjects, how to determine the number of cells to sequence per subject, and how to choose the sequencing depth per cell. We also discuss computational strategies for the analysis of single-cell data and describe how single-cell data can be integrated with bulk tissue data and data generated from genome-wide association studies. Finally, we point out open problems and future research directions.
Affiliation(s)
- Benjamin J Auerbach
- Graduate Group in Genomics and Computational Biology, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, USA
- Jian Hu
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, USA
- Muredach P Reilly
- Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, New York 10032, USA
- Mingyao Li
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, USA
|