201
Bergenstråhle L, He B, Bergenstråhle J, Abalo X, Mirzazadeh R, Thrane K, Ji AL, Andersson A, Larsson L, Stakenborg N, Boeckxstaens G, Khavari P, Zou J, Lundeberg J, Maaskola J. Super-resolved spatial transcriptomics by deep data fusion. Nat Biotechnol 2022; 40:476-479. [PMID: 34845373 DOI: 10.1038/s41587-021-01075-3] [Citation(s) in RCA: 40] [Impact Index Per Article: 20.0] [Received: 03/12/2020] [Accepted: 08/27/2021] [Indexed: 02/07/2023]
Abstract
Current methods for spatial transcriptomics are limited by low spatial resolution. Here we introduce a method that integrates spatial gene expression data with histological image data from the same tissue section to infer higher-resolution expression maps. Using a deep generative model, our method characterizes the transcriptome of micrometer-scale anatomical features and can predict spatial gene expression from histology images alone.
Affiliation(s)
- Ludvig Bergenstråhle
  - SciLifeLab, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- Bryan He
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Joseph Bergenstråhle
  - SciLifeLab, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- Xesús Abalo
  - SciLifeLab, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- Reza Mirzazadeh
  - SciLifeLab, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- Kim Thrane
  - SciLifeLab, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- Andrew L Ji
  - Stanford Cancer Institute, Stanford University, Stanford, CA, USA
- Alma Andersson
  - SciLifeLab, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- Ludvig Larsson
  - SciLifeLab, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- Nathalie Stakenborg
  - Department of Chronic Diseases and Metabolism, Katholieke Universiteit te Leuven, Leuven, Belgium
- Guy Boeckxstaens
  - Department of Chronic Diseases and Metabolism, Katholieke Universiteit te Leuven, Leuven, Belgium
- Paul Khavari
  - Stanford Cancer Institute, Stanford University, Stanford, CA, USA
- James Zou
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Joakim Lundeberg
  - SciLifeLab, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- Jonas Maaskola
  - SciLifeLab, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
  - SciLifeLab, Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
202
Abstract
Spatial transcriptomic technologies have been developed rapidly in recent years. The addition of spatial context to expression data holds the potential to revolutionize many fields in biology. However, the lack of computational tools remains a bottleneck that is preventing the broader utilization of these technologies. Recently, we have developed Giotto as a comprehensive, generally applicable, and user-friendly toolbox for spatial transcriptomic data analysis and visualization. Giotto implements a rich set of algorithms to enable robust spatial data analysis. To help users get familiar with the Giotto environment and apply it effectively in analyzing new datasets, we will describe the detailed protocols for applying Giotto without any advanced programming skills. © 2022 Wiley Periodicals LLC.
- Basic Protocol 1: Getting Giotto set up for use
- Basic Protocol 2: Pre-processing
- Basic Protocol 3: Clustering and cell-type identification
- Basic Protocol 4: Cell-type enrichment and deconvolution analyses
- Basic Protocol 5: Spatial structure analysis tools
- Basic Protocol 6: Spatial domain detection by using a hidden Markov random field model
- Support Protocol 1: Spatial proximity-associated cell-cell interactions
- Support Protocol 2: Assembly of a registered 3D Giotto object from 2D slices
Affiliation(s)
- Natalie Del Rossi
  - Department of Genetics and Genomic Sciences. Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
- Jiaji G. Chen
  - Section of Hematology and Medical Oncology, School of Medicine, Boston University, Boston, Massachusetts 02138, USA
- Guo-Cheng Yuan
  - Department of Genetics and Genomic Sciences. Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
  - Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
- Ruben Dries
  - Section of Hematology and Medical Oncology, School of Medicine, Boston University, Boston, Massachusetts 02138, USA
  - Division of Computational Biomedicine, School of Medicine, Boston University, Boston, Massachusetts 02138, USA
203
Zeng Z, Li Y, Li Y, Luo Y. Statistical and machine learning methods for spatially resolved transcriptomics data analysis. Genome Biol 2022; 23:83. [PMID: 35337374 PMCID: PMC8951701 DOI: 10.1186/s13059-022-02653-7] [Citation(s) in RCA: 61] [Impact Index Per Article: 30.5] [Received: 10/16/2021] [Accepted: 03/15/2022] [Indexed: 01/28/2023]
Abstract
The recent advancement in spatial transcriptomics technology has enabled multiplexed profiling of cellular transcriptomes and spatial locations. As the capacity and efficiency of the experimental technologies continue to improve, there is an emerging need for the development of analytical approaches. Furthermore, with the continuous evolution of sequencing protocols, the underlying assumptions of current analytical methods need to be re-evaluated and adjusted to harness the increasing data complexity. To motivate and aid future model development, we herein review the recent development of statistical and machine learning methods in spatial transcriptomics, summarize useful resources, and highlight the challenges and opportunities ahead.
Affiliation(s)
- Zexian Zeng
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100084, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100084, China
  - Department of Data Sciences, Dana Farber Cancer Institute, Harvard T.H. Chan School of Public Health, Boston, MA, 02215, USA
- Yawei Li
  - Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
- Yiming Li
  - Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
- Yuan Luo
  - Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
  - Northwestern University Clinical and Translational Sciences Institute, Chicago, IL, 60611, USA
  - Institute for Augmented Intelligence in Medicine, Northwestern University, Chicago, IL, 60611, USA
  - Center for Health Information Partnerships, Northwestern University, Chicago, IL, 60611, USA
204
Abstract
The function of many biological systems, such as embryos, liver lobules, intestinal villi, and tumors, depends on the spatial organization of their cells. In the past decade, high-throughput technologies have been developed to quantify gene expression in space, and computational methods have been developed that leverage spatial gene expression data to identify genes with spatial patterns and to delineate neighborhoods within tissues. To comprehensively document spatial gene expression technologies and data-analysis methods, we present a curated review of literature on spatial transcriptomics dating back to 1987, along with a thorough analysis of trends in the field, such as usage of experimental techniques, species, tissues studied, and computational approaches used. Our Review places current methods in a historical context, and we derive insights about the field that can guide current research strategies. A companion supplement offers a more detailed look at the technologies and methods analyzed: https://pachterlab.github.io/LP_2021/ .
205
Walker BL, Cang Z, Ren H, Bourgain-Chang E, Nie Q. Deciphering tissue structure and function using spatial transcriptomics. Commun Biol 2022; 5:220. [PMID: 35273328 PMCID: PMC8913632 DOI: 10.1038/s42003-022-03175-5] [Citation(s) in RCA: 40] [Impact Index Per Article: 20.0] [Received: 09/30/2021] [Accepted: 02/16/2022] [Indexed: 01/31/2023]
Abstract
The rapid development of spatial transcriptomics (ST) techniques has allowed the measurement of transcriptional levels across many genes together with the spatial positions of cells. This has led to an explosion of interest in computational methods and techniques for harnessing both spatial and transcriptional information in analysis of ST datasets. The wide diversity of approaches in aim, methodology and technology for ST provides great challenges in dissecting cellular functions in spatial contexts. Here, we synthesize and review the key problems in analysis of ST data and methods that are currently applied, while also expanding on open questions and areas of future development.
Affiliation(s)
- Benjamin L Walker
  - The NSF-Simons Center for Multiscale Cell Fate Research, University of California Irvine, Irvine, CA, USA
  - Department of Mathematics, University of California Irvine, Irvine, CA, USA
- Zixuan Cang
  - The NSF-Simons Center for Multiscale Cell Fate Research, University of California Irvine, Irvine, CA, USA
  - Department of Mathematics, University of California Irvine, Irvine, CA, USA
- Honglei Ren
  - The NSF-Simons Center for Multiscale Cell Fate Research, University of California Irvine, Irvine, CA, USA
  - Department of Mathematics, University of California Irvine, Irvine, CA, USA
- Qing Nie
  - The NSF-Simons Center for Multiscale Cell Fate Research, University of California Irvine, Irvine, CA, USA
  - Department of Mathematics, University of California Irvine, Irvine, CA, USA
  - Department of Developmental and Cell Biology, University of California Irvine, Irvine, CA, USA
206
Wang M, Song WM, Ming C, Wang Q, Zhou X, Xu P, Krek A, Yoon Y, Ho L, Orr ME, Yuan GC, Zhang B. Guidelines for bioinformatics of single-cell sequencing data analysis in Alzheimer's disease: review, recommendation, implementation and application. Mol Neurodegener 2022; 17:17. [PMID: 35236372 PMCID: PMC8889402 DOI: 10.1186/s13024-022-00517-z] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 07/04/2021] [Accepted: 01/18/2022] [Indexed: 12/13/2022]
Abstract
Alzheimer's disease (AD) is the most common form of dementia, characterized by progressive cognitive impairment and neurodegeneration. Extensive clinical and genomic studies have revealed biomarkers, risk factors, pathways, and targets of AD in the past decade. However, the exact molecular basis of AD development and progression remains elusive. The emerging single-cell sequencing technology can potentially provide cell-level insights into the disease. Here we systematically review the state-of-the-art bioinformatics approaches to analyze single-cell sequencing data and their applications to AD in 14 major directions, including 1) quality control and normalization, 2) dimension reduction and feature extraction, 3) cell clustering analysis, 4) cell type inference and annotation, 5) differential expression, 6) trajectory inference, 7) copy number variation analysis, 8) integration of single-cell multi-omics, 9) epigenomic analysis, 10) gene network inference, 11) prioritization of cell subpopulations, 12) integrative analysis of human and mouse scRNA-seq data, 13) spatial transcriptomics, and 14) comparison of single-cell AD mouse model studies and single-cell human AD studies. We also address challenges in using human postmortem and mouse tissues and outline future developments in single-cell sequencing data analysis. Importantly, we have implemented our recommended workflows for each major analytic direction and applied them to a large single-nucleus RNA-sequencing (snRNA-seq) dataset in AD. Key analytic results are reported while the scripts and the data are shared with the research community through GitHub. In summary, this comprehensive review provides insights into various approaches to analyze single-cell sequencing data and offers specific guidelines for study design and a variety of analytic directions. The review and the accompanying software tools will serve as a valuable resource for studying cellular and molecular mechanisms of AD, other diseases, or biological systems at the single-cell level.
Affiliation(s)
- Minghui Wang
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Won-min Song
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Chen Ming
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Qian Wang
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Xianxiao Zhou
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Peng Xu
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Azra Krek
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029 USA
- Yonejung Yoon
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Lap Ho
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Miranda E. Orr
  - Department of Internal Medicine, Section of Gerontology and Geriatric Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina USA
  - Sticht Center for Healthy Aging and Alzheimer’s Prevention, Wake Forest School of Medicine, Winston-Salem, North Carolina USA
- Guo-Cheng Yuan
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029 USA
- Bin Zhang
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
207
Li K, Yan C, Li C, Chen L, Zhao J, Zhang Z, Bao S, Sun J, Zhou M. Computational elucidation of spatial gene expression variation from spatially resolved transcriptomics data. Mol Ther Nucleic Acids 2022; 27:404-411. [PMID: 35036053 PMCID: PMC8728308 DOI: 10.1016/j.omtn.2021.12.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Abstract
Recent advances in spatially resolved transcriptomics (SRT) have revolutionized biological and medical research and enabled unprecedented insight into the functional organization and cell communication of tissues and organs in situ. Identifying and elucidating spatial variation in gene expression (SE analysis) is fundamental to elucidating the SRT landscape. Alongside technological breakthroughs and large-scale data generation, there is an urgent need for public repositories and computational techniques for SE analysis of SRT data. Increasing efforts have been made to use in silico techniques in SE analysis. However, these attempts are widely scattered among a large number of studies that are not easily accessible or comprehensible to medical and life scientists. This study provides a survey and a summary of public resources on SE analysis in SRT studies. An updated systematic overview of the state-of-the-art computational approaches and tools currently available for SE analysis is presented herein, emphasizing recent advances. Finally, the present study explores the future perspectives and challenges of in silico techniques in SE analysis. This study guides medical and life scientists in finding dedicated resources and more competent tools for characterizing spatial patterns of gene expression.
Affiliation(s)
- Ke Li
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Congcong Yan
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Chenghao Li
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Lu Chen
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Jingting Zhao
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Zicheng Zhang
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Siqi Bao
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Jie Sun (corresponding author)
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Meng Zhou (corresponding author)
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
208
Obtaining spatially resolved tumor purity maps using deep multiple instance learning in a pan-cancer study. Patterns (N Y) 2022; 3:100399. [PMID: 35199060 PMCID: PMC8848022 DOI: 10.1016/j.patter.2021.100399] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/03/2021] [Revised: 09/07/2021] [Accepted: 11/03/2021] [Indexed: 02/07/2023]
Abstract
Tumor purity is the percentage of cancer cells within a tissue section. Pathologists estimate tumor purity to select samples for genomic analysis by manually reading hematoxylin-eosin (H&E)-stained slides, which is tedious, time consuming, and prone to inter-observer variability. Besides, pathologists' estimates do not correlate well with genomic tumor purity values, which are inferred from genomic data and accepted as accurate for downstream analysis. We developed a deep multiple instance learning (MIL) model predicting tumor purity from H&E-stained digital histopathology slides. Our model successfully predicted tumor purity in eight The Cancer Genome Atlas (TCGA) cohorts and a local Singapore cohort. The predictions were highly consistent with genomic tumor purity values. Thus, our model can be utilized to select samples for genomic analysis, which will help reduce pathologists' workload and decrease inter-observer variability. Furthermore, our model provided tumor purity maps showing the spatial variation within sections. They can help better understand the tumor microenvironment.
- MIL model successfully predicts a sample's tumor purity from histopathology slides
- MIL model learns to spatially resolve tumor purity from sample-level labels
- Tumor purity varies spatially within a sample
- Pathologists’ region selection is vital for correct percentage tumor nuclei estimation
Given some big data and coarse-level labels, extracting fine-level information is a demanding yet rewarding challenge in data science. This study develops a machine learning model utilizing big data and exploiting coarse-level labels to reveal fine-level details within the data. Although it can be applied to different data science tasks with enormous data and coarse labels, we applied it to a computational histopathology task with gigapixel histopathology slides and sample-level labels. Specifically, the model revealed spatial resolution of tumor purity within histopathology slides using only sample-level genomic tumor purity values during training. This can also be extended to other omics features, providing precious information about cancer biology and promising personalized, precision medicine. Such studies are of great clinical importance in discovering imaging biomarkers and better understanding the tumor microenvironment.
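As a rough illustration of the multiple instance learning idea summarized in this abstract (and not the authors' model), the Python sketch below treats a slide as a bag of patches, assumes each patch already carries a predicted purity score, and pools the scores by a simple mean into a sample-level estimate that can be compared with a genomic purity label. The function names, the choice of mean pooling, and the toy numbers are all assumptions made for illustration.

```python
def sample_purity(patch_scores):
    """Pool patch-level scores into one sample-level estimate.

    Mean pooling is an assumption here; the abstract does not say
    how the published model aggregates patches.
    """
    if not patch_scores:
        raise ValueError("a bag needs at least one patch")
    return sum(patch_scores) / len(patch_scores)


def squared_error(predicted, label):
    """Loss against the coarse, sample-level label (the only supervision)."""
    return (predicted - label) ** 2


# Hypothetical patch scores for one slide. Kept per patch, they play the
# role of a coarse spatial purity map.
patch_scores = [0.75, 0.25, 1.0, 0.0]
genomic_purity = 0.5  # hypothetical sample-level label

prediction = sample_purity(patch_scores)
loss = squared_error(prediction, genomic_purity)
print(prediction, loss)
```

The point of the sketch is only that supervision enters at the bag level, while the per-patch scores remain available afterwards as a spatial map.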
209
Vickovic S, Lötstedt B, Klughammer J, Mages S, Segerstolpe Å, Rozenblatt-Rosen O, Regev A. SM-Omics is an automated platform for high-throughput spatial multi-omics. Nat Commun 2022; 13:795. [PMID: 35145087 PMCID: PMC8831571 DOI: 10.1038/s41467-022-28445-y] [Citation(s) in RCA: 63] [Impact Index Per Article: 31.5] [Received: 10/21/2020] [Accepted: 01/24/2022] [Indexed: 12/12/2022]
Abstract
The spatial organization of cells and molecules plays a key role in tissue function in homeostasis and disease. Spatial transcriptomics has recently emerged as a key technique to capture and positionally barcode RNAs directly in tissues. Here, we advance the application of spatial transcriptomics at scale by presenting Spatial Multi-Omics (SM-Omics) as a fully automated, high-throughput, all-sequencing-based platform for combined and spatially resolved transcriptomics and antibody-based protein measurements. SM-Omics uses DNA-barcoded antibodies, immunofluorescence, or a combination thereof to scale and combine spatial transcriptomics and spatial antibody-based multiplex protein detection. SM-Omics allows processing of up to 64 in situ spatial reactions or up to 96 sequencing-ready libraries of high complexity in a ~2-day process. We demonstrate SM-Omics in the mouse brain, spleen and colorectal cancer model, showing its broad utility as a high-throughput platform for spatial multi-omics.
Affiliation(s)
- S Vickovic
  - Klarman Cell Observatory Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
  - New York Genome Center, New York, NY, USA
  - Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
- B Lötstedt
  - Klarman Cell Observatory Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- J Klughammer
  - Klarman Cell Observatory Broad Institute of MIT and Harvard, Cambridge, MA, USA
- S Mages
  - Klarman Cell Observatory Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Å Segerstolpe
  - Klarman Cell Observatory Broad Institute of MIT and Harvard, Cambridge, MA, USA
- O Rozenblatt-Rosen
  - Klarman Cell Observatory Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Genentech, 1 DNA Way, South San Francisco, CA, USA
- A Regev
  - Klarman Cell Observatory Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Howard Hughes Medical Institute and Koch Institute for Integrative Cancer Research, Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Genentech, 1 DNA Way, South San Francisco, CA, USA
210
Spatial components of molecular tissue biology. Nat Biotechnol 2022; 40:308-318. [PMID: 35132261 DOI: 10.1038/s41587-021-01182-1] [Citation(s) in RCA: 119] [Impact Index Per Article: 59.5] [Received: 12/17/2020] [Accepted: 12/03/2021] [Indexed: 02/06/2023]
Abstract
Methods for profiling RNA and protein expression in a spatially resolved manner are rapidly evolving, making it possible to comprehensively characterize cells and tissues in health and disease. To maximize the biological insights obtained using these techniques, it is critical to both clearly articulate the key biological questions in spatial analysis of tissues and develop the requisite computational tools to address them. Developers of analytical tools need to decide on the intrinsic molecular features of each cell that need to be considered, and how cell shape and morphological features are incorporated into the analysis. Also, optimal ways to compare different tissue samples at various length scales are still being sought. Grouping these biological problems and related computational algorithms into classes across length scales, thus characterizing common issues that need to be addressed, will facilitate further progress in spatial transcriptomics and proteomics.
211
Song T, Markham KK, Li Z, Muller KE, Greenham K, Kuang R. Detecting spatially co-expressed gene clusters with functional coherence by graph-regularized convolutional neural network. Bioinformatics 2022; 38:1344-1352. [PMID: 34864909 DOI: 10.1093/bioinformatics/btab812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2021] [Revised: 10/29/2021] [Accepted: 11/29/2021] [Indexed: 01/05/2023]
Abstract
MOTIVATION: Clustering spatially resolved gene expression is an essential analysis to reveal gene activities in the underlying morphological context by their functional roles. However, conventional clustering analysis considers neither the co-localization of gene expression in tissue, which is needed to detect spatial expression patterns, nor the functional relationships among genes, which are needed for biological interpretation in the spatial context. In this article, we present a convolutional neural network (CNN) regularized by the graph of a protein-protein interaction (PPI) network to cluster spatially resolved gene expression. This method improves the coherence of spatial patterns and provides biological interpretation of the gene clusters in the spatial context by exploiting spatial localization through convolution and gene functional relationships through graph-Laplacian regularization.
RESULTS: In this study, we tested clustering the spatially variable genes or all expressed genes in the transcriptome in 22 Visium spatial transcriptomics datasets of different tissue sections publicly available from 10x Genomics and spatialLIBD. The results demonstrate that the PPI-regularized CNN consistently detects gene clusters that have coherent spatial patterns and are significantly enriched for gene functions, with state-of-the-art performance. Additional case studies on mouse kidney tissue and human breast cancer tissue suggest that the PPI-regularized CNN also detects spatially co-expressed genes that define the corresponding morphological context in the tissue, with valuable insights.
AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/kuanglab/CNN-PReg.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
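To make the phrase "graph-Laplacian regularization" concrete, here is a minimal Python sketch of the generic penalty, which sums w_ij * ||z_i - z_j||^2 over the weighted edges of a graph and so pulls connected nodes toward similar representations. It is a textbook form under stated assumptions, not the CNN-PReg implementation: the gene names, edge weights, and two-dimensional embeddings are hypothetical, and how the published method combines this term with its CNN loss is not described in the abstract.

```python
def laplacian_penalty(edges, embeddings):
    """Return the sum over edges (i, j, w) of w * ||z_i - z_j||^2.

    For an undirected weighted graph this equals trace(Z^T L Z),
    where L is the graph Laplacian and the rows of Z are embeddings.
    """
    total = 0.0
    for i, j, w in edges:
        zi, zj = embeddings[i], embeddings[j]
        total += w * sum((a - b) ** 2 for a, b in zip(zi, zj))
    return total


# Hypothetical toy graph: three genes joined by two PPI edges.
edges = [("geneA", "geneB", 1.0), ("geneB", "geneC", 0.5)]
embeddings = {
    "geneA": [1.0, 0.0],
    "geneB": [1.0, 0.0],  # identical to geneA, so that edge costs nothing
    "geneC": [0.0, 1.0],  # differs from geneB, so that edge is penalized
}

penalty = laplacian_penalty(edges, embeddings)
print(penalty)
```

Adding such a term to a clustering or training objective makes interacting genes cheaper to place together than apart, which is the intuition behind using a PPI graph as a regularizer.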
Affiliation(s)
- Tianci Song
  - Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, MN 55414, USA
- Kathleen K Markham
  - Department of Plant and Microbial Biology, University of Minnesota Twin Cities, Minneapolis, MN 55414, USA
- Zhuliu Li
  - Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, MN 55414, USA
- Kristen E Muller
  - Department of Pathology and Laboratory Medicine, Dartmouth-Hitchcock Medical Center, Lebanon, NH 03756, USA
- Kathleen Greenham
  - Department of Plant and Microbial Biology, University of Minnesota Twin Cities, Minneapolis, MN 55414, USA
- Rui Kuang
  - Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, MN 55414, USA
212
Velten B, Braunger JM, Argelaguet R, Arnol D, Wirbel J, Bredikhin D, Zeller G, Stegle O. Identifying temporal and spatial patterns of variation from multimodal data using MEFISTO. Nat Methods 2022; 19:179-186. [PMID: 35027765 PMCID: PMC8828471 DOI: 10.1038/s41592-021-01343-9] [Citation(s) in RCA: 42] [Impact Index Per Article: 21.0] [Received: 11/03/2020] [Accepted: 11/05/2021] [Indexed: 01/04/2023]
Abstract
Factor analysis is a widely used method for dimensionality reduction in genome biology, with applications from personalized health to single-cell biology. Existing factor analysis models assume independence of the observed samples, an assumption that fails in spatio-temporal profiling studies. Here we present MEFISTO, a flexible and versatile toolbox for modeling high-dimensional data when spatial or temporal dependencies between the samples are known. MEFISTO maintains the established benefits of factor analysis for multimodal data, but enables the performance of spatio-temporally informed dimensionality reduction, interpolation, and separation of smooth from non-smooth patterns of variation. Moreover, MEFISTO can integrate multiple related datasets by simultaneously identifying and aligning the underlying patterns of variation in a data-driven manner. To illustrate MEFISTO, we apply the model to different datasets with spatial or temporal resolution, including an evolutionary atlas of organ development, a longitudinal microbiome study, a single-cell multi-omics atlas of mouse gastrulation and spatially resolved transcriptomics.
Affiliation(s)
- Britta Velten
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany.
- Cellular Genetics Programme, Wellcome Sanger Institute, Cambridge, UK.
| | - Jana M Braunger
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Ricard Argelaguet
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Epigenetics Programme, Babraham Institute, Cambridge, UK
| | - Damien Arnol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
| | - Jakob Wirbel
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, Heidelberg, Germany
| | - Danila Bredikhin
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Collaboration for joint PhD degree between EMBL and Heidelberg University, Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
| | - Georg Zeller
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, Heidelberg, Germany
| | - Oliver Stegle
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany.
- Cellular Genetics Programme, Wellcome Sanger Institute, Cambridge, UK.
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany.
|
213
|
Behanova A, Klemm A, Wählby C. Spatial Statistics for Understanding Tissue Organization. Front Physiol 2022; 13:832417. [PMID: 35153840 PMCID: PMC8837270 DOI: 10.3389/fphys.2022.832417] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/09/2021] [Accepted: 01/06/2022] [Indexed: 11/13/2022] Open
Abstract
Interpreting tissue architecture plays an important role in gaining a better understanding of healthy tissue development and disease. Novel molecular detection and imaging techniques make it possible to locate many different types of objects, such as cells and/or mRNAs, and map their location across the tissue space. In this review, we present several methods that provide quantification and statistical verification of observed patterns in the tissue architecture. We categorize these methods into three main groups: Spatial statistics on a single type of object, two types of objects, and multiple types of objects. We discuss the methods in relation to four hypotheses regarding the methods' capability to distinguish random and non-random distributions of objects across a tissue sample, and present a number of openly available tools where these methods are provided. We also discuss other spatial statistics methods compatible with other types of input data.
|
214
|
Liu B, Li Y, Zhang L. Analysis and Visualization of Spatial Transcriptomic Data. Front Genet 2022; 12:785290. [PMID: 35154244 PMCID: PMC8829434 DOI: 10.3389/fgene.2021.785290] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Received: 09/29/2021] [Accepted: 12/24/2021] [Indexed: 12/21/2022] Open
Abstract
Human and animal tissues consist of heterogeneous cell types that organize and interact in highly structured manners. Bulk and single-cell sequencing technologies remove cells from their original microenvironments, resulting in a loss of spatial information. Spatial transcriptomics is a recent technological innovation that measures transcriptomic information while preserving spatial information. Spatial transcriptomic data can be generated in several ways: RNA molecules are measured by in situ sequencing, in situ hybridization, or spatial barcoding to recover their original spatial coordinates. The inclusion of spatial information expands the range of possibilities for analysis and visualization, and has spurred the development of numerous novel methods. In this review, we summarize the core concepts of spatial genomics technology and provide a comprehensive review of current analysis and visualization methods for spatial transcriptomics.
|
215
|
Heinen T, Secchia S, Reddington JP, Zhao B, Furlong EEM, Stegle O. scDALI: modeling allelic heterogeneity in single cells reveals context-specific genetic regulation. Genome Biol 2022; 23:8. [PMID: 34991671 PMCID: PMC8734213 DOI: 10.1186/s13059-021-02593-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 07/28/2021] [Accepted: 12/27/2021] [Indexed: 01/04/2023] Open
Abstract
While it is established that the functional impact of genetic variation can vary across cell types and states, capturing this diversity remains challenging. Current studies using bulk sequencing either ignore this heterogeneity or use sorted cell populations, reducing discovery and explanatory power. Here, we develop scDALI, a versatile computational framework that integrates information on cellular states with allelic quantifications of single-cell sequencing data to characterize cell-state-specific genetic effects. We apply scDALI to scATAC-seq profiles from developing F1 Drosophila embryos and scRNA-seq from differentiating human iPSCs, uncovering heterogeneous genetic effects in specific lineages, developmental stages, or cell types.
Affiliation(s)
- Tobias Heinen
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Faculty of Mathematics and Computer Science, Heidelberg University, Heidelberg, Germany
| | - Stefano Secchia
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Faculty of Biosciences, Collaboration for Joint PhD Degree between EMBL and Heidelberg University, Heidelberg, Germany
| | - James P Reddington
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Bingqing Zhao
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Eileen E M Furlong
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.
| | - Oliver Stegle
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany.
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.
|
216
|
Wu Y, Cheng Y, Wang X, Fan J, Gao Q. Spatial omics: Navigating to the golden era of cancer research. Clin Transl Med 2022; 12:e696. [PMID: 35040595 PMCID: PMC8764875 DOI: 10.1002/ctm2.696] [Citation(s) in RCA: 56] [Impact Index Per Article: 28.0] [Received: 07/19/2021] [Revised: 12/11/2021] [Accepted: 12/20/2021] [Indexed: 12/15/2022] Open
Abstract
The idea that the tumour microenvironment (TME) is organised in a spatial manner will not surprise many cancer biologists; however, systematically capturing the spatial architecture of the TME was not possible until the past decade. The past five years have witnessed a boom in research on high-throughput spatial techniques and algorithms that delineate the TME at an unprecedented level. Here, we review the technological progress of spatial omics and how advanced computational methods boost multi-modal spatial data analysis. We then discuss the potential clinical translations of spatial omics research in precision oncology, and propose a transfer of spatial ecological principles to cancer biology in spatial data interpretation. Spatial omics is placing us in the golden age of spatial cancer research. Further development and application of spatial omics may lead to a comprehensive decoding of the TME ecosystem and bring current spatiotemporal molecular medical research into an entirely new paradigm.
Affiliation(s)
- Yingcheng Wu
- Center for Tumor Diagnosis & Therapy and Department of Cancer Center, Jinshan Hospital and Jinshan Branch of Zhongshan Hospital, Zhongshan Hospital, Fudan University, Shanghai 200540, China
- Department of Liver Surgery and Transplantation and Key Laboratory of Carcinogenesis and Cancer Invasion (Ministry of Education), Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai, China
| | - Yifei Cheng
- Department of Liver Surgery and Transplantation and Key Laboratory of Carcinogenesis and Cancer Invasion (Ministry of Education), Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai, China
| | - Xiangdong Wang
- Department of Pulmonary and Critical Care Medicine, Zhongshan Hospital Institute for Clinical Science, Shanghai Institute of Clinical Bioinformatics, Shanghai Engineering Research for AI Technology for Cardiopulmonary Diseases, Jinshan Hospital Centre for Tumor Diagnosis and Therapy, Fudan University Shanghai Medical College, Shanghai, China
| | - Jia Fan
- Department of Liver Surgery and Transplantation and Key Laboratory of Carcinogenesis and Cancer Invasion (Ministry of Education), Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai, China
- Key Laboratory of Medical Epigenetics and Metabolism, Institutes of Biomedical Sciences, Fudan University, Shanghai, China
- State Key Laboratory of Genetic Engineering, Fudan University, Shanghai, China
| | - Qiang Gao
- Center for Tumor Diagnosis & Therapy and Department of Cancer Center, Jinshan Hospital and Jinshan Branch of Zhongshan Hospital, Zhongshan Hospital, Fudan University, Shanghai 200540, China
- Department of Liver Surgery and Transplantation and Key Laboratory of Carcinogenesis and Cancer Invasion (Ministry of Education), Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai, China
- Key Laboratory of Medical Epigenetics and Metabolism, Institutes of Biomedical Sciences, Fudan University, Shanghai, China
- State Key Laboratory of Genetic Engineering, Fudan University, Shanghai, China
|
217
|
Sankowski R, Monaco G, Prinz M. Evaluating microglial phenotypes using single-cell technologies. Trends Neurosci 2021; 45:133-144. [PMID: 34872773 DOI: 10.1016/j.tins.2021.11.001] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 07/13/2021] [Revised: 10/25/2021] [Accepted: 11/07/2021] [Indexed: 12/13/2022]
Abstract
Recent single-cell technologies have enabled researchers to simultaneously assess the transcriptomes and other modalities of thousands of cells within their spatial context. Here, we summarize available single-cell methods for dissociated tissues and tissue slides with respect to the specifics of microglial biology, focusing on next-generation sequencing-based technologies. We review the potential of these single-cell sequencing methods and newer multiomics approaches to extend the understanding of microglia function beyond the status quo.
Affiliation(s)
- Roman Sankowski
- Institute of Neuropathology, Faculty of Medicine, University of Freiburg, Freiburg, Germany; Berta-Ottenstein-Programme for Clinician Scientists, Faculty of Medicine, University of Freiburg, Freiburg, Germany; Single-Cell Omics Platform Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
| | - Gianni Monaco
- Institute of Neuropathology, Faculty of Medicine, University of Freiburg, Freiburg, Germany; Single-Cell Omics Platform Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
| | - Marco Prinz
- Institute of Neuropathology, Faculty of Medicine, University of Freiburg, Freiburg, Germany; Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Freiburg, Germany; Center for Basics in NeuroModulation (NeuroModulBasics), Faculty of Medicine, University of Freiburg, Freiburg, Germany.
|
218
|
Yang P, Huang H, Liu C. Feature selection revisited in the single-cell era. Genome Biol 2021; 22:321. [PMID: 34847932 PMCID: PMC8638336 DOI: 10.1186/s13059-021-02544-3] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 07/26/2021] [Accepted: 11/15/2021] [Indexed: 12/13/2022] Open
Abstract
Recent advances in single-cell biotechnologies have resulted in high-dimensional datasets with increased complexity, making feature selection an essential technique for single-cell data analysis. Here, we revisit feature selection techniques and summarise recent developments. We review their application to a range of single-cell data types generated from traditional cytometry and imaging technologies and the latest array of single-cell omics technologies. We highlight some of the challenges and future directions and finally consider their scalability and make general recommendations on each type of feature selection method. We hope this review stimulates future research and application of feature selection in the single-cell era.
Affiliation(s)
- Pengyi Yang
- School of Mathematics and Statistics, University of Sydney, Sydney, NSW, 2006, Australia.
- Computational Systems Biology Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW, 2145, Australia.
- Charles Perkins Centre, University of Sydney, Sydney, NSW, 2006, Australia.
| | - Hao Huang
- School of Mathematics and Statistics, University of Sydney, Sydney, NSW, 2006, Australia
- Computational Systems Biology Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW, 2145, Australia
| | - Chunlei Liu
- Computational Systems Biology Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW, 2145, Australia
|
219
|
Sheng J, Li WV. Selecting gene features for unsupervised analysis of single-cell gene expression data. Brief Bioinform 2021; 22:bbab295. [PMID: 34351383 PMCID: PMC8574996 DOI: 10.1093/bib/bbab295] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 04/07/2021] [Revised: 06/17/2021] [Accepted: 07/12/2021] [Indexed: 11/15/2022] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) technologies facilitate the characterization of transcriptomic landscapes in diverse species, tissues, and cell types with unprecedented molecular resolution. To evaluate various biological hypotheses using high-dimensional single-cell gene expression data, most computational and statistical methods depend on a gene feature selection step to identify genes with high biological variability and to reduce computational complexity. Even though many gene selection methods have been developed for scRNA-seq analysis, a systematic comparison of the assumptions, statistical models, and selection criteria used by these methods is lacking. In this article, we summarize and discuss 17 computational methods for selecting gene features in unsupervised analysis of single-cell gene expression data, with unified notations and statistical frameworks. Our discussion provides a useful summary to help practitioners select appropriate methods based on their assumptions and applicability, and to assist method developers in designing new computational tools for unsupervised learning of scRNA-seq data.
Affiliation(s)
- Jie Sheng
- Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - Wei Vivian Li
- Department of Biostatistics and Epidemiology, Rutgers School of Public Health, Piscataway, NJ 08854, USA
|
220
|
Huo L, Jiao Li J, Chen L, Yu Z, Hutvagner G, Li J. Single-cell multi-omics sequencing: application trends, COVID-19, data analysis issues and prospects. Brief Bioinform 2021; 22:bbab229. [PMID: 34111889 PMCID: PMC8344433 DOI: 10.1093/bib/bbab229] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/20/2021] [Revised: 05/23/2021] [Accepted: 05/25/2021] [Indexed: 01/19/2023] Open
Abstract
Single-cell sequencing is a biotechnology that sequences one layer of genomic information for individual cells in a tissue sample. For example, single-cell DNA sequencing sequences the DNA of every single cell. Increasing in complexity, single-cell multi-omics sequencing, or single-cell multimodal omics sequencing, profiles multiple layers of omics information from a single cell in parallel. In practice, single-cell multi-omics sequencing detects multiple traits such as DNA, RNA, methylation information and/or protein profiles from the same cell, for many individual cells in a tissue sample. Multi-omics sequencing has been widely applied to systematically unravel the interplay mechanisms of key components and pathways in the cell. This survey reviews recent developments in single-cell multi-omics sequencing and their applications to understanding complex diseases, in particular the COVID-19 pandemic. We also summarize machine learning and bioinformatics techniques used in the analysis of the intercorrelated multilayer heterogeneous data. We observe that variational inference and graph-based learning are popular approaches, and that Seurat V3 is a commonly used tool to transfer missing variables and labels. We also discuss two intensively studied issues relating to data consistency and diversity, and comment on issues of current concern surrounding the error correction of data pairs and data imputation methods. The survey concludes with some open questions and opportunities for this extraordinary field.
Affiliation(s)
- Lu Huo
- Data Science Institute, University of Technology Sydney, Ultimo, NSW 2007, Australia
- School of Computer Science, FEIT, University of Technology Sydney, Ultimo, NSW 2007, Australia
| | - Jiao Jiao Li
- School of Biomedical Engineering, FEIT, University of Technology Sydney, Ultimo, NSW 2007, Australia
| | - Ling Chen
- School of Computer Science, FEIT, University of Technology Sydney, Ultimo, NSW 2007, Australia
| | - Zuguo Yu
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Hunan, 411105, P.R. China
| | - Gyorgy Hutvagner
- School of Biomedical Engineering, FEIT, University of Technology Sydney, Ultimo, NSW 2007, Australia
| | - Jinyan Li
- Data Science Institute, University of Technology Sydney, Ultimo, NSW 2007, Australia
|
221
|
BinTayyash N, Georgaka S, John ST, Ahmed S, Boukouvalas A, Hensman J, Rattray M. Non-parametric modelling of temporal and spatial counts data from RNA-seq experiments. Bioinformatics 2021; 37:3788-3795. [PMID: 34213536 PMCID: PMC10186154 DOI: 10.1093/bioinformatics/btab486] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 07/29/2020] [Revised: 06/25/2021] [Accepted: 06/30/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION The negative binomial distribution has been shown to be a good model for counts data from both bulk and single-cell RNA-sequencing (RNA-seq). Gaussian process (GP) regression provides a useful non-parametric approach for modelling temporal or spatial changes in gene expression. However, currently available GP regression methods that implement negative binomial likelihood models do not scale to the increasingly large datasets being produced by single-cell and spatial transcriptomics. RESULTS The GPcounts package implements GP regression methods for modelling counts data using a negative binomial likelihood function. Computational efficiency is achieved through the use of variational Bayesian inference. The GP function models changes in the mean of the negative binomial likelihood through a logarithmic link function, and the dispersion parameter is fitted by maximum likelihood. We validate the method on simulated time course data, showing better performance in identifying changes in over-dispersed counts data than methods based on Gaussian or Poisson likelihoods. To demonstrate temporal inference, we apply GPcounts to single-cell RNA-seq datasets after pseudotime and branching inference. To demonstrate spatial inference, we apply GPcounts to data from the mouse olfactory bulb to identify spatially variable genes and compare to two published GP methods. We also provide the option of modelling additional dropout using a zero-inflated negative binomial. Our results show that GPcounts can be used to model temporal and spatial counts data in cases where simpler Gaussian and Poisson likelihoods are unrealistic. AVAILABILITY AND IMPLEMENTATION GPcounts is implemented using the GPflow library in Python and is available at https://github.com/ManchesterBioinference/GPcounts along with the data, code and notebooks required to reproduce the results presented here. The version used for this paper is archived at https://doi.org/10.5281/zenodo.5027066. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Nuha BinTayyash
- School of Computer Science, University of Manchester, Manchester M13 9PL, UK
| | - Sokratia Georgaka
- Division of Informatics, Imaging and Data Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PL, UK
| | - S T John
- Secondmind, Cambridge CB2 1LA, UK
- Finnish Center for Artificial Intelligence, FCAI, Department of Computer Science, Aalto University, Finland
| | - Sumon Ahmed
- Division of Informatics, Imaging and Data Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PL, UK
- Institute of Information Technology, University of Dhaka, Dhaka 1000, Bangladesh
| | - Magnus Rattray
- Division of Informatics, Imaging and Data Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PL, UK
|
222
|
Hu J, Li X, Coleman K, Schroeder A, Ma N, Irwin DJ, Lee EB, Shinohara RT, Li M. SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat Methods 2021; 18:1342-1351. [PMID: 34711970 DOI: 10.1038/s41592-021-01255-8] [Citation(s) in RCA: 290] [Impact Index Per Article: 96.7] [Received: 12/01/2020] [Accepted: 07/29/2021] [Indexed: 01/24/2023]
Abstract
Recent advances in spatially resolved transcriptomics (SRT) technologies have enabled comprehensive characterization of gene expression patterns in the context of the tissue microenvironment. To elucidate spatial gene expression variation, we present SpaGCN, a graph convolutional network approach that integrates gene expression, spatial location and histology in SRT data analysis. Through graph convolution, SpaGCN aggregates the gene expression of each spot from its neighboring spots, which enables the identification of spatial domains with coherent expression and histology. The subsequent domain-guided differential expression (DE) analysis then detects genes with enriched expression patterns in the identified domains. Analyzing seven SRT datasets using SpaGCN, we show that it can detect genes with much more enriched spatial expression patterns than competing methods. Furthermore, genes detected by SpaGCN are transferable and can be used to study spatial variation of gene expression in other datasets. SpaGCN is computationally fast and platform independent, making it a desirable tool for diverse SRT studies.
Affiliation(s)
- Jian Hu
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
| | - Xiangjie Li
- School of Statistics and Data Science, Nankai University, Tianjin, China
| | - Kyle Coleman
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Amelia Schroeder
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Nan Ma
- Weitzman School of Design, University of Pennsylvania, Philadelphia, PA, USA
| | - David J Irwin
- Department of Neurology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Edward B Lee
- Translational Neuropathology Research Laboratory, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Russell T Shinohara
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Mingyao Li
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
|
223
|
Chen Y, Qian W, Lin L, Cai L, Yin K, Jiang S, Song J, Han RPS, Yang C. Mapping Gene Expression in the Spatial Dimension. Small Methods 2021; 5:e2100722. [PMID: 34927963 DOI: 10.1002/smtd.202100722] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 06/28/2021] [Revised: 08/25/2021] [Indexed: 06/14/2023]
Abstract
The main functions and biological processes of tissues are determined by the combination of gene expression and the spatial organization of their cells. RNA sequencing technologies have primarily interrogated gene expression without preserving the native spatial context of cells. However, the emergence of various spatially-resolved transcriptome analysis methods now makes it possible to map gene expression to specific coordinates within tissues, revealing transcriptional heterogeneity between different regions as well as the localization of specific transcripts and novel spatial markers. Hence, spatially-resolved transcriptome analysis technologies have broad utility in research into human disease and developmental biology. Here, recent advances in spatially-resolved transcriptome analysis methods are summarized, including experimental technologies and computational methods. Strengths, challenges, and potential applications of those methods are highlighted, and perspectives in this field are provided.
Affiliation(s)
- Yingwen Chen
- The MOE Key Laboratory of Spectrochemical Analysis & Instrumentation, the Key Laboratory of Chemical Biology of Fujian Province, State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
| | - Weizhou Qian
- The MOE Key Laboratory of Spectrochemical Analysis & Instrumentation, the Key Laboratory of Chemical Biology of Fujian Province, State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
| | - Li Lin
- The MOE Key Laboratory of Spectrochemical Analysis & Instrumentation, the Key Laboratory of Chemical Biology of Fujian Province, State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
| | - Linfeng Cai
- The MOE Key Laboratory of Spectrochemical Analysis & Instrumentation, the Key Laboratory of Chemical Biology of Fujian Province, State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
| | - Kun Yin
- The MOE Key Laboratory of Spectrochemical Analysis & Instrumentation, the Key Laboratory of Chemical Biology of Fujian Province, State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
| | - Shaowei Jiang
- The MOE Key Laboratory of Spectrochemical Analysis & Instrumentation, the Key Laboratory of Chemical Biology of Fujian Province, State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
| | - Jia Song
- Institute of Molecular Medicine, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200127, China
| | - Ray P S Han
- Jiangxi University of Traditional Chinese Medicine, Nanchang, Jiangxi, 33004, China
| | - Chaoyong Yang
- The MOE Key Laboratory of Spectrochemical Analysis & Instrumentation, the Key Laboratory of Chemical Biology of Fujian Province, State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
- Institute of Molecular Medicine, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200127, China
|
224
|
Lu S, Fürth D, Gillis J. Integrative analysis methods for spatial transcriptomics. Nat Methods 2021; 18:1282-1283. [PMID: 34711969 DOI: 10.1038/s41592-021-01272-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/20/2022]
Affiliation(s)
- Shaina Lu
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
| | - Daniel Fürth
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
| | - Jesse Gillis
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA.
| |
|
225
|
Abstract
Spatial transcriptomics is a rapidly growing field that promises to comprehensively characterize tissue organization and architecture at the single-cell or subcellular resolution. Such information provides a solid foundation for mechanistic understanding of many biological processes in both health and disease that cannot be obtained by using traditional technologies. The development of computational methods plays important roles in extracting biological signals from raw data. Various approaches have been developed to overcome technology-specific limitations such as spatial resolution, gene coverage, sensitivity, and technical biases. Downstream analysis tools formulate spatial organization and cell-cell communications as quantifiable properties, and provide algorithms to derive such properties. Integrative pipelines further assemble multiple tools in one package, allowing biologists to conveniently analyze data from beginning to end. In this review, we summarize the state of the art of spatial transcriptomic data analysis methods and pipelines, and discuss how they operate on different technological platforms.
Affiliation(s)
- Ruben Dries
- Department of Medicine, Boston University School of Medicine, Boston, Massachusetts 02118, USA
- Bioinformatics Graduate Program, Boston University, Boston, Massachusetts 02215, USA
- Section of Computational Biomedicine, Boston University School of Medicine, Boston, Massachusetts 02118, USA
| | - Jiaji Chen
- Department of Medicine, Boston University School of Medicine, Boston, Massachusetts 02118, USA
| | - Natalie Del Rossi
- Department of Genetics and Genomic Sciences, Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
| | - Mohammed Muzamil Khan
- Department of Medicine, Boston University School of Medicine, Boston, Massachusetts 02118, USA
- Bioinformatics Graduate Program, Boston University, Boston, Massachusetts 02215, USA
- Section of Computational Biomedicine, Boston University School of Medicine, Boston, Massachusetts 02118, USA
| | - Adriana Sistig
- Department of Genetics and Genomic Sciences, Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
| | - Guo-Cheng Yuan
- Department of Genetics and Genomic Sciences, Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
- Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
| |
|
226
|
Longo SK, Guo MG, Ji AL, Khavari PA. Integrating single-cell and spatial transcriptomics to elucidate intercellular tissue dynamics. Nat Rev Genet 2021; 22:627-644. [PMID: 34145435 PMCID: PMC9888017 DOI: 10.1038/s41576-021-00370-8] [Citation(s) in RCA: 409] [Impact Index Per Article: 136.3] [Accepted: 04/29/2021] [Indexed: 02/07/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) identifies cell subpopulations within tissue but does not capture their spatial distribution nor reveal local networks of intercellular communication acting in situ. A suite of recently developed techniques that localize RNA within tissue, including multiplexed in situ hybridization and in situ sequencing (here defined as high-plex RNA imaging) and spatial barcoding, can help address this issue. However, no method currently provides as complete a scope of the transcriptome as does scRNA-seq, underscoring the need for approaches to integrate single-cell and spatial data. Here, we review efforts to integrate scRNA-seq with spatial transcriptomics, including emerging integrative computational methods, and propose ways to effectively combine current methodologies.
Affiliation(s)
- Sophia K. Longo
- Program in Epithelial Biology, Stanford University, Stanford, CA, USA; Stanford Cancer Institute, Stanford University, Stanford, CA, USA
| | - Margaret G. Guo
- Program in Epithelial Biology, Stanford University, Stanford, CA, USA; Stanford Cancer Institute, Stanford University, Stanford, CA, USA; Program in Biomedical Informatics, Stanford University, Stanford, CA, USA
| | - Andrew L. Ji
- Program in Epithelial Biology, Stanford University, Stanford, CA, USA; Stanford Cancer Institute, Stanford University, Stanford, CA, USA
| | - Paul A. Khavari
- Program in Epithelial Biology, Stanford University, Stanford, CA, USA; Stanford Cancer Institute, Stanford University, Stanford, CA, USA; Veterans Affairs Palo Alto Healthcare System, Palo Alto, CA, USA
| |
|
227
|
Auerbach BJ, Hu J, Reilly MP, Li M. Applications of single-cell genomics and computational strategies to study common disease and population-level variation. Genome Res 2021; 31:1728-1741. [PMID: 34599006 PMCID: PMC8494214 DOI: 10.1101/gr.275430.121] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/13/2022]
Abstract
The advent and rapid development of single-cell technologies have made it possible to study cellular heterogeneity at an unprecedented resolution and scale. Cellular heterogeneity underlies phenotypic differences among individuals, and studying cellular heterogeneity is an important step toward our understanding of the disease molecular mechanism. Single-cell technologies offer opportunities to characterize cellular heterogeneity from different angles, but how to link cellular heterogeneity with disease phenotypes requires careful computational analysis. In this article, we will review the current applications of single-cell methods in human disease studies and describe what we have learned so far from existing studies about human genetic variation. As single-cell technologies are becoming widely applicable in human disease studies, population-level studies have become a reality. We will describe how we should go about pursuing and designing these studies, particularly how to select study subjects, how to determine the number of cells to sequence per subject, and the needed sequencing depth per cell. We also discuss computational strategies for the analysis of single-cell data and describe how single-cell data can be integrated with bulk tissue data and data generated from genome-wide association studies. Finally, we point out open problems and future research directions.
Affiliation(s)
- Benjamin J Auerbach
- Graduate Group in Genomics and Computational Biology, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, USA
| | - Jian Hu
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, USA
| | - Muredach P Reilly
- Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, New York 10032, USA
| | - Mingyao Li
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, USA
| |
|
228
|
Atta L, Fan J. Computational challenges and opportunities in spatially resolved transcriptomic data analysis. Nat Commun 2021; 12:5283. [PMID: 34489425 PMCID: PMC8421472 DOI: 10.1038/s41467-021-25557-9] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 06/16/2021] [Accepted: 08/18/2021] [Indexed: 12/19/2022] Open
Abstract
Spatially resolved transcriptomic data demand new computational analysis methods to derive biological insights. Here, we comment on these associated computational challenges as well as highlight the opportunities for standardized benchmarking metrics and data-sharing infrastructure in spurring innovation moving forward.
Affiliation(s)
- Lyla Atta
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA
- Medical Scientist Training Program, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Jean Fan
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA.
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
| |
|
229
|
Xu J, Liao K, Yang X, Wu C, Wu W, Han S. Using single-cell sequencing technology to detect circulating tumor cells in solid tumors. Mol Cancer 2021; 20:104. [PMID: 34412644 PMCID: PMC8375060 DOI: 10.1186/s12943-021-01392-w] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Received: 03/27/2021] [Accepted: 07/12/2021] [Indexed: 12/30/2022] Open
Abstract
Circulating tumor cells are tumor cells with high vitality and high metastatic potential that invade and shed into the peripheral blood from primary solid tumors or metastatic foci. Due to the heterogeneity of tumors, it is difficult for high-throughput sequencing analysis of tumor tissues to find the genomic characteristics of low-abundance tumor stem cells. Single-cell sequencing of circulating tumor cells avoids interference from tumor heterogeneity by comparing the differences between single-cell genomes, transcriptomes, and epigenetic groups among circulating tumor cells, primary and metastatic tumors, and metastatic lymph nodes in patients' peripheral blood, providing a new perspective for understanding the biological process of tumors. This article describes the identification, biological characteristics, and single-cell genome-wide variation in circulating tumor cells and summarizes the application of single-cell sequencing technology to tumor typing, metastasis analysis, progression detection, and adjuvant therapy.
Affiliation(s)
- Jiasheng Xu
- Department of Oncology, Huzhou Central Hospital, Affiliated Central Hospital Huzhou University, No.1558, Sanhuan North Road, Wuxing District Zhejiang Province, Huzhou, China.,Department of Vascular Surgery, the Second Affiliated Hospital of Nanchang University, No. 1 Minde Road, Nanchang, 330006, Jiangxi, China
| | - Kaili Liao
- Department of Clinical Laboratory, the Second Affiliated Hospital of Nanchang University, No. 1 Minde Road, Nanchang, 330006, Jiangxi, China
| | - Xi Yang
- Department of Oncology, Huzhou Central Hospital, Affiliated Central Hospital Huzhou University, No.1558, Sanhuan North Road, Wuxing District Zhejiang Province, Huzhou, China
| | - Chengfeng Wu
- Department of Vascular Surgery, the Second Affiliated Hospital of Nanchang University, No. 1 Minde Road, Nanchang, 330006, Jiangxi, China
| | - Wei Wu
- Department of Gastroenterology, Huzhou Central Hospital, Affiliated Central Hospital Huzhou University, No.1558, Sanhuan North Road, Wuxing District Zhejiang Province, 313000, Huzhou, China
| | - Shuwen Han
- Department of Oncology, Huzhou Central Hospital, Affiliated Central Hospital Huzhou University, No.1558, Sanhuan North Road, Wuxing District Zhejiang Province, Huzhou, China.
| |
|
230
|
Zhang S. Automatic estimation of spatial spectra via smoothing splines. Comput Stat 2021. [DOI: 10.1007/s00180-021-01141-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
|
231
|
Gouin KH, Ing N, Plummer JT, Rosser CJ, Ben Cheikh B, Oh C, Chen SS, Chan KS, Furuya H, Tourtellotte WG, Knott SRV, Theodorescu D. An N-Cadherin 2 expressing epithelial cell subpopulation predicts response to surgery, chemotherapy and immunotherapy in bladder cancer. Nat Commun 2021; 12:4906. [PMID: 34385456 PMCID: PMC8361097 DOI: 10.1038/s41467-021-25103-7] [Citation(s) in RCA: 70] [Impact Index Per Article: 23.3] [Received: 11/17/2020] [Accepted: 07/22/2021] [Indexed: 12/20/2022] Open
Abstract
Neoadjuvant chemotherapy (NAC) prior to surgery and immune checkpoint therapy (ICT) have revolutionized bladder cancer management. However, stratification of patients that would benefit most from these modalities remains a major clinical challenge. Here, we combine single nuclei RNA sequencing with spatial transcriptomics and single-cell resolution spatial proteomic analysis of human bladder cancer to identify an epithelial subpopulation with therapeutic response prediction ability. These cells express Cadherin 12 (CDH12, N-Cadherin 2), catenins, and other epithelial markers. CDH12-enriched tumors define patients with poor outcome following surgery with or without NAC. In contrast, CDH12-enriched tumors exhibit superior response to ICT. In all settings, patient stratification by tumor CDH12 enrichment offers better prediction of outcome than currently established bladder cancer subtypes. Molecularly, the CDH12 population resembles an undifferentiated state with inherently aggressive biology including chemoresistance, likely mediated through progenitor-like gene expression and fibroblast activation. CDH12-enriched cells express PD-L1 and PD-L2 and co-localize with exhausted T-cells, possibly mediated through CD49a (ITGA1), providing one explanation for ICT efficacy in these tumors. Altogether, this study describes a cancer cell population with an intriguing diametric response to major bladder cancer therapeutics. Importantly, it also provides a compelling framework for designing biomarker-guided clinical trials.
Affiliation(s)
- Kenneth H Gouin
- Department of Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Center for Bioinformatics and Functional Genomics, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Nathan Ing
- Department of Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Center for Bioinformatics and Functional Genomics, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Jasmine T Plummer
- Department of Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Center for Bioinformatics and Functional Genomics, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Charles J Rosser
- Department of Surgery (Urology), Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Cedars-Sinai Samuel Oschin Comprehensive Cancer Institute, Los Angeles, CA, USA
| | - Bassem Ben Cheikh
- Department of Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Center for Bioinformatics and Functional Genomics, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Catherine Oh
- Department of Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Center for Bioinformatics and Functional Genomics, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Stephanie S Chen
- Department of Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Center for Bioinformatics and Functional Genomics, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Keith Syson Chan
- Cedars-Sinai Samuel Oschin Comprehensive Cancer Institute, Los Angeles, CA, USA
- Department of Pathology and Laboratory Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Hideki Furuya
- Department of Surgery (Urology), Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Cedars-Sinai Samuel Oschin Comprehensive Cancer Institute, Los Angeles, CA, USA
| | - Warren G Tourtellotte
- Department of Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Cedars-Sinai Samuel Oschin Comprehensive Cancer Institute, Los Angeles, CA, USA
- Department of Pathology and Laboratory Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Department of Neurology, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Department of Neurosurgery, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Simon R V Knott
- Department of Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, CA, USA.
- Center for Bioinformatics and Functional Genomics, Cedars-Sinai Medical Center, Los Angeles, CA, USA.
- Cedars-Sinai Samuel Oschin Comprehensive Cancer Institute, Los Angeles, CA, USA.
| | - Dan Theodorescu
- Department of Surgery (Urology), Cedars-Sinai Medical Center, Los Angeles, CA, USA.
- Cedars-Sinai Samuel Oschin Comprehensive Cancer Institute, Los Angeles, CA, USA.
- Department of Pathology and Laboratory Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA.
| |
|
232
|
Xu Y, McCord RP. CoSTA: unsupervised convolutional neural network learning for spatial transcriptomics analysis. BMC Bioinformatics 2021; 22:397. [PMID: 34372758 PMCID: PMC8351440 DOI: 10.1186/s12859-021-04314-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/14/2021] [Accepted: 08/02/2021] [Indexed: 11/17/2022] Open
Abstract
Background: The rise of spatial transcriptomics technologies is leading to new insights about how gene regulation happens in a spatial context. Determining which genes are expressed in similar spatial patterns can reveal gene regulatory relationships across cell types in a tissue. However, many current analysis methods do not take full advantage of the spatial organization of the data, instead treating pixels as independent features. Here, we present CoSTA: a novel approach to learn spatial similarities between gene expression matrices via convolutional neural network (ConvNet) clustering. Results: By analyzing simulated and previously published spatial transcriptomics data, we demonstrate that CoSTA learns spatial relationships between genes in a way that emphasizes broader spatial patterns rather than pixel-level correlation. CoSTA provides a quantitative measure of expression pattern similarity between each pair of genes rather than only classifying genes into categories. We find that CoSTA identifies narrower, but biologically relevant, sets of significantly related genes as compared to other approaches. Conclusions: The deep learning CoSTA approach provides a different angle to spatial transcriptomics analysis by focusing on the shape of expression patterns, using more information about the positions of neighboring pixels than would an overlap or pixel correlation approach. CoSTA can be applied to any spatial transcriptomics data represented in matrix form and may have future applications to datasets such as histology in which images of different genes are from similar but not identical biological sections. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04314-1.
Affiliation(s)
- Yang Xu
- UT-ORNL Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, USA
| | - Rachel Patton McCord
- Department of Biochemistry & Cellular and Molecular Biology, University of Tennessee, Knoxville, TN, USA.
| |
|
233
|
Bioinformatics approach to spatially resolved transcriptomics. Emerg Top Life Sci 2021; 5:669-674. [PMID: 34369559 DOI: 10.1042/etls20210131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2021] [Revised: 07/20/2021] [Accepted: 07/20/2021] [Indexed: 11/17/2022]
Abstract
Spatially resolved transcriptomics encompasses a growing number of methods developed to enable gene expression profiling of individual cells within a tissue. Different technologies are available and they vary with respect to: the method used to define regions of interest, the method used to assess gene expression, and resolution. Since techniques based on next-generation sequencing are the most prevalent, and provide single-cell resolution, many bioinformatics tools for spatially resolved data are shared with single-cell RNA-seq. The analysis pipelines diverge at the level of the quantification matrix, downstream of which spatial techniques require specific tools to answer key biological questions. Those questions include: (i) cell type classification; (ii) detection of genes with specific spatial distribution; (iii) identification of novel tissue regions based on gene expression patterns; (iv) cell-cell interactions. On the other hand, analysis of spatially resolved data is burdened by several specific challenges. Defining regions of interest, e.g. neoplastic tissue, often calls for manual annotation of images, which then poses a bottleneck in the pipeline. Another specific issue is the third spatial dimension and the need to expand the analysis beyond a single slice. Despite the problems, it can be predicted that the popularity of spatial techniques will keep growing until they replace single-cell assays (which will remain limited to specific cases, like blood). As soon as the computational protocols reach maturity (as with bulk RNA-seq), one can foresee the expansion of spatial techniques beyond basic or translational research, even into routine medical diagnostics.
|
234
|
Rao A, Barkley D, França GS, Yanai I. Exploring tissue architecture using spatial transcriptomics. Nature 2021; 596:211-220. [PMID: 34381231 PMCID: PMC8475179 DOI: 10.1038/s41586-021-03634-9] [Citation(s) in RCA: 574] [Impact Index Per Article: 191.3] [Received: 02/03/2021] [Accepted: 05/11/2021] [Indexed: 02/08/2023]
Abstract
Deciphering the principles and mechanisms by which gene activity orchestrates complex cellular arrangements in multicellular organisms has far-reaching implications for research in the life sciences. Recent technological advances in next-generation sequencing- and imaging-based approaches have established the power of spatial transcriptomics to measure expression levels of all or most genes systematically throughout tissue space, and have been adopted to generate biological insights in neuroscience, development and plant biology as well as to investigate a range of disease contexts, including cancer. Similar to datasets made possible by genomic sequencing and population health surveys, the large-scale atlases generated by this technology lend themselves to exploratory data analysis for hypothesis generation. Here we review spatial transcriptomic technologies and describe the repertoire of operations available for paths of analysis of the resulting data. Spatial transcriptomics can also be deployed for hypothesis testing using experimental designs that compare time points or conditions, including genetic or environmental perturbations. Finally, spatial transcriptomic data are naturally amenable to integration with other data modalities, providing an expandable framework for insight into tissue organization.
Affiliation(s)
- Anjali Rao
- Institute for Computational Medicine, NYU Langone Health, New York, NY, USA
| | - Dalia Barkley
- Institute for Computational Medicine, NYU Langone Health, New York, NY, USA
| | - Gustavo S França
- Institute for Computational Medicine, NYU Langone Health, New York, NY, USA
| | - Itai Yanai
- Institute for Computational Medicine, NYU Langone Health, New York, NY, USA.
- Department of Biochemistry and Molecular Pharmacology, NYU Langone Health, New York, NY, USA.
| |
|
235
|
Eide PW, Moosavi SH, Eilertsen IA, Brunsell TH, Langerud J, Berg KCG, Røsok BI, Bjørnbeth BA, Nesbakken A, Lothe RA, Sveen A. Metastatic heterogeneity of the consensus molecular subtypes of colorectal cancer. NPJ Genom Med 2021; 6:59. [PMID: 34262039 PMCID: PMC8280229 DOI: 10.1038/s41525-021-00223-7] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 01/15/2021] [Accepted: 06/22/2021] [Indexed: 02/08/2023] Open
Abstract
Gene expression-based subtypes of colorectal cancer have clinical relevance, but the representativeness of primary tumors and the consensus molecular subtypes (CMS) for metastatic cancers is not well known. We investigated the metastatic heterogeneity of CMS. The best approach to subtype translation was delineated by comparisons of transcriptomic profiles from 317 primary tumors and 295 liver metastases, including multi-metastatic samples from 45 patients and 14 primary-metastasis sets. Associations were validated in an external data set (n = 618). Projection of metastases onto principal components of primary tumors showed that metastases were depleted of CMS1-immune/CMS3-metabolic signals, enriched for CMS4-mesenchymal/stromal signals, and heavily influenced by the microenvironment. The tailored CMS classifier (available in an updated version of the R package CMScaller) therefore implemented an approach to regress out the liver tissue background. The majority of classified metastases were either CMS2 or CMS4. Nonetheless, subtype switching and inter-metastatic CMS heterogeneity were frequent and increased with sampling intensity. Poor-prognostic value of CMS1/3 metastases was consistent in the context of intra-patient tumor heterogeneity.
Affiliation(s)
- Peter W Eide
- Department of Molecular Oncology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway.,K.G. Jebsen Colorectal Cancer Research Centre, Division for Cancer Medicine, Oslo University Hospital, Oslo, Norway
| | - Seyed H Moosavi
- Department of Molecular Oncology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway.,K.G. Jebsen Colorectal Cancer Research Centre, Division for Cancer Medicine, Oslo University Hospital, Oslo, Norway.,Institute for Clinical Medicine, University of Oslo, Oslo, Norway
| | - Ina A Eilertsen
- Department of Molecular Oncology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway.,K.G. Jebsen Colorectal Cancer Research Centre, Division for Cancer Medicine, Oslo University Hospital, Oslo, Norway.,Institute for Clinical Medicine, University of Oslo, Oslo, Norway
| | - Tuva H Brunsell
- Department of Molecular Oncology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway.,K.G. Jebsen Colorectal Cancer Research Centre, Division for Cancer Medicine, Oslo University Hospital, Oslo, Norway.,Institute for Clinical Medicine, University of Oslo, Oslo, Norway.,Department of Gastrointestinal Surgery, Oslo University Hospital, Oslo, Norway
| | - Jonas Langerud
- Department of Molecular Oncology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway.,K.G. Jebsen Colorectal Cancer Research Centre, Division for Cancer Medicine, Oslo University Hospital, Oslo, Norway.,Institute for Clinical Medicine, University of Oslo, Oslo, Norway
| | - Kaja C G Berg
- Department of Molecular Oncology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway.,K.G. Jebsen Colorectal Cancer Research Centre, Division for Cancer Medicine, Oslo University Hospital, Oslo, Norway.,Institute for Clinical Medicine, University of Oslo, Oslo, Norway
| | - Bård I Røsok
- K.G. Jebsen Colorectal Cancer Research Centre, Division for Cancer Medicine, Oslo University Hospital, Oslo, Norway.,Department of Gastrointestinal Surgery, Oslo University Hospital, Oslo, Norway
| | - Bjørn A Bjørnbeth
- K.G. Jebsen Colorectal Cancer Research Centre, Division for Cancer Medicine, Oslo University Hospital, Oslo, Norway.,Department of Gastrointestinal Surgery, Oslo University Hospital, Oslo, Norway
| | - Arild Nesbakken
- K.G. Jebsen Colorectal Cancer Research Centre, Division for Cancer Medicine, Oslo University Hospital, Oslo, Norway.,Institute for Clinical Medicine, University of Oslo, Oslo, Norway.,Department of Gastrointestinal Surgery, Oslo University Hospital, Oslo, Norway
| | - Ragnhild A Lothe
- Department of Molecular Oncology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway.,K.G. Jebsen Colorectal Cancer Research Centre, Division for Cancer Medicine, Oslo University Hospital, Oslo, Norway.,Institute for Clinical Medicine, University of Oslo, Oslo, Norway
| | - Anita Sveen
- Department of Molecular Oncology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway.,K.G. Jebsen Colorectal Cancer Research Centre, Division for Cancer Medicine, Oslo University Hospital, Oslo, Norway.,Institute for Clinical Medicine, University of Oslo, Oslo, Norway
| |
|
236
|
Bae S, Choi H, Lee DS. Discovery of molecular features underlying the morphological landscape by integrating spatial transcriptomic data with deep features of tissue images. Nucleic Acids Res 2021; 49:e55. [PMID: 33619564 PMCID: PMC8191797 DOI: 10.1093/nar/gkab095] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 11/11/2020] [Revised: 01/10/2021] [Accepted: 02/03/2021] [Indexed: 12/26/2022] Open
Abstract
Profiling molecular features associated with the morphological landscape of tissue is crucial for investigating the structural and spatial patterns that underlie the biological function of tissues. In this study, we present a new method, spatial gene expression patterns by deep learning of tissue images (SPADE), to identify important genes associated with morphological contexts by combining spatial transcriptomic data with coregistered images. SPADE incorporates deep learning-derived image patterns with spatially resolved gene expression data to extract morphological context markers. Morphological features that correspond to spatial maps of the transcriptome were extracted by image patches surrounding each spot and were subsequently represented by image latent features. The molecular profiles correlated with the image latent features were identified. The extracted genes could be further analyzed to discover functional terms and exploited to extract clusters maintaining morphological contexts. We apply our approach to spatial transcriptomic data from different tissues, platforms and types of images to demonstrate an unbiased method that is capable of obtaining image-integrated gene expression trends.
Affiliation(s)
- Sungwoo Bae
- Department of Molecular Medicine and Biopharmaceutical Sciences, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, Republic of Korea.,Department of Nuclear Medicine, Seoul National University Hospital, Seoul, Republic of Korea
| | - Hongyoon Choi
- Department of Nuclear Medicine, Seoul National University Hospital, Seoul, Republic of Korea.,Department of Nuclear Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea
| | - Dong Soo Lee
- Department of Molecular Medicine and Biopharmaceutical Sciences, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, Republic of Korea.,Department of Nuclear Medicine, Seoul National University Hospital, Seoul, Republic of Korea.,Department of Nuclear Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea
| |
|
237
|
Coullomb A, Pancaldi V. Tysserand - Fast and accurate reconstruction of spatial networks from bioimages. Bioinformatics 2021; 37:3989-3991. [PMID: 34213523 DOI: 10.1093/bioinformatics/btab490] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/07/2021] [Revised: 05/10/2021] [Accepted: 06/30/2021] [Indexed: 11/12/2022] Open
Abstract
SUMMARY: Networks provide a powerful framework to analyze spatial omics experiments. However, we lack tools that integrate several methods to easily reconstruct networks for further analyses with dedicated libraries. In addition, choosing the appropriate method and parameters can be challenging. We propose tysserand, a Python library to reconstruct spatial networks from spatially resolved omics experiments. It is intended as a common tool to which the bioinformatics community can add new methods to reconstruct networks, choose appropriate parameters, clean resulting networks and pipe data to other libraries. AVAILABILITY AND IMPLEMENTATION: tysserand software and tutorials with a Jupyter notebook to reproduce the results are available at https://github.com/VeraPancaldiLab/tysserand. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
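To make the idea of reconstructing a spatial network concrete, the following is a minimal sketch in plain Python of one common construction, an undirected k-nearest-neighbor graph over cell centroids. It does not use or reproduce the tysserand API; the function name `knn_edges`, the parameter `k` and the toy coordinates are assumptions made purely for illustration.

```python
import math

def knn_edges(coords, k=2):
    """Return sorted undirected edges (i, j), i < j, linking each point to
    its k nearest neighbors (hypothetical helper, not part of tysserand)."""
    edges = set()
    for i, p in enumerate(coords):
        # Distances from point i to every other point, nearest first.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(coords) if j != i)
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)

# Toy centroids: two tight pairs that are far apart (illustrative values only).
cells = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
edges = knn_edges(cells, k=1)
print(edges)
```

With a larger `k`, long edges that bridge distant groups start to appear; removing edges above a distance threshold is one simple way such a network could be cleaned before further analysis.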
Affiliation(s)
- Alexis Coullomb
- Bioinformatics department, Centre de Recherches en Cancérologie de Toulouse, INSERM, 2 Avenue Hubert Curien, 31100, Occitanie, France
| | - Vera Pancaldi
- Bioinformatics department, Centre de Recherches en Cancérologie de Toulouse, INSERM, 2 Avenue Hubert Curien, 31100, Occitanie, France
| |
|
238
|
Hu J, Schroeder A, Coleman K, Chen C, Auerbach BJ, Li M. Statistical and machine learning methods for spatially resolved transcriptomics with histology. Comput Struct Biotechnol J 2021; 19:3829-3841. [PMID: 34285782 PMCID: PMC8273359 DOI: 10.1016/j.csbj.2021.06.052] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 02/13/2021] [Revised: 06/28/2021] [Accepted: 06/30/2021] [Indexed: 01/22/2023] Open
Abstract
Recent developments in spatially resolved transcriptomics (SRT) technologies have enabled scientists to get an integrated understanding of cells in their morphological context. Applications of these technologies in diverse tissues and diseases have transformed our views of transcriptional complexity. Most published studies utilized tools developed for single-cell RNA sequencing (scRNA-seq) for data analysis. However, SRT data exhibit different properties from scRNA-seq. To take full advantage of the added dimension on spatial location information in such data, new methods that are tailored for SRT are needed. Additionally, SRT data often have companion high-resolution histology information available. Incorporating histological features in gene expression analysis is an underexplored area. In this review, we will focus on the statistical and machine learning aspects for SRT data analysis and discuss how spatial location and histology information can be integrated with gene expression to advance our understanding of the transcriptional complexity. We also point out open problems and future research directions in this field.
Affiliation(s)
- Jian Hu
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
| | - Amelia Schroeder
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
| | - Kyle Coleman
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
| | - Chixiang Chen
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
| | - Benjamin J. Auerbach
- Graduate Group in Genomics and Computational Biology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
| | - Mingyao Li
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
| |
|
239
|
Hao M, Hua K, Zhang X. SOMDE: A scalable method for identifying spatially variable genes with self-organizing map. Bioinformatics 2021; 37:4392-4398. [PMID: 34165490 DOI: 10.1093/bioinformatics/btab471] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 12/12/2020] [Revised: 05/22/2021] [Accepted: 06/23/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Recent developments of spatial transcriptomic sequencing technologies provide powerful tools for understanding cells in the physical context of tissue microenvironments. A fundamental task in spatial gene expression analysis is to identify genes with spatially variable expression patterns, or spatially variable genes (SVgenes). Several computational methods have been developed for this task. Their high computational complexity limits their scalability to the latest and future large-scale spatial expression data. RESULTS: We present SOMDE, an efficient method for identifying SVgenes in large-scale spatial expression data. SOMDE uses a self-organizing map (SOM) to cluster neighboring cells into nodes, and then uses a Gaussian process to fit the node-level spatial gene expression to identify SVgenes. Experiments show that SOMDE is about 5-50 times faster than existing methods with comparable results. The adjustable resolution of SOMDE makes it the only method that can give results in ∼5 minutes in large datasets of more than 20,000 sequencing sites. SOMDE is available as a Python package on PyPI at https://pypi.org/project/somde free for academic use. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Minsheng Hao
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Kui Hua
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Xuegong Zhang
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China; School of Life Sciences, Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China
| |
|
240
|
Zhu J, Sun S, Zhou X. SPARK-X: non-parametric modeling enables scalable and robust detection of spatial expression patterns for large spatial transcriptomic studies. Genome Biol 2021; 22:184. [PMID: 34154649 PMCID: PMC8218388 DOI: 10.1186/s13059-021-02404-0] [Citation(s) in RCA: 81] [Impact Index Per Article: 27.0] [Received: 10/22/2020] [Accepted: 06/07/2021] [Indexed: 01/01/2023] Open
Abstract
Spatial transcriptomic studies are becoming increasingly common and large, posing important statistical and computational challenges for many analytic tasks. Here, we present SPARK-X, a non-parametric method for rapid and effective detection of spatially expressed genes in large spatial transcriptomic studies. SPARK-X not only produces effective type I error control and high power but also brings orders of magnitude computational savings. We apply SPARK-X to analyze three large datasets, one of which is only analyzable by SPARK-X. In these data, SPARK-X identifies many spatially expressed genes including those that are spatially expressed within the same cell type, revealing new biological insights.
Affiliation(s)
- Jiaqiang Zhu
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Shiquan Sun
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
- Department of Epidemiology and Biostatistics, Xi'an Jiaotong University, Xi'an, Shaanxi, 710061, P.R. China
| | - Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA.
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, 48109, USA.
| |
|
241
|
Li Q, Zhang M, Xie Y, Xiao G. Bayesian Modeling of Spatial Molecular Profiling Data via Gaussian Process. Bioinformatics 2021; 37:4129-4136. [PMID: 34146105 PMCID: PMC9502169 DOI: 10.1093/bioinformatics/btab455] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 12/07/2020] [Revised: 05/29/2021] [Accepted: 06/16/2021] [Indexed: 12/13/2022] Open
Abstract
MOTIVATION: The location, timing, and abundance of gene expression (both mRNA and proteins) within a tissue define the molecular mechanisms of cell functions. Recent technology breakthroughs in spatial molecular profiling, including imaging-based technologies and sequencing-based technologies, have enabled the comprehensive molecular characterization of single cells while preserving their spatial and morphological contexts. This new bioinformatics scenario calls for effective and robust computational methods to identify genes with spatial patterns. RESULTS: We present a novel Bayesian hierarchical model to analyze spatial transcriptomics data, with several unique characteristics. It models the zero-inflated and over-dispersed counts by deploying a zero-inflated negative binomial model that greatly increases model stability and robustness. Besides, the Bayesian inference framework allows us to borrow strength in parameter estimation in a de novo fashion. As a result, the proposed model shows competitive performances in accuracy and robustness over existing methods in both simulation studies and two real data applications. AVAILABILITY: The related R/C++ source code is available at https://github.com/Minzhe/BOOST-GP. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
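The zero-inflated negative binomial (ZINB) count model mentioned in this abstract can be illustrated with a short sketch. The code below covers only the count distribution, not the Gaussian process or the Bayesian inference, and it is not taken from the BOOST-GP source. The mean/dispersion parameterization, the function names and the parameter values are assumptions chosen for illustration.

```python
import math

def nb_logpmf(y, mu, phi):
    """Log-probability of count y under a negative binomial with mean mu
    and dispersion phi (variance = mu + mu**2 / phi)."""
    return (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
            + phi * math.log(phi / (phi + mu))
            + y * math.log(mu / (phi + mu)))

def zinb_pmf(y, mu, phi, pi):
    """Probability of count y when a fraction pi of observations are
    structural zeros and the rest follow the negative binomial."""
    nb = math.exp(nb_logpmf(y, mu, phi))
    return pi + (1.0 - pi) * nb if y == 0 else (1.0 - pi) * nb

# Illustrative parameter values, not estimates from any dataset.
mu, phi, pi = 3.0, 2.0, 0.3
total = sum(zinb_pmf(y, mu, phi, pi) for y in range(500))  # should be ~1
p_zero_nb = math.exp(nb_logpmf(0, mu, phi))                # zeros without inflation
p_zero_zinb = zinb_pmf(0, mu, phi, pi)                     # zeros with inflation
print(total, p_zero_nb, p_zero_zinb)
```

The extra mass at zero is the point of the model: with these values the plain negative binomial assigns less probability to a zero count than the zero-inflated version does, which is how excess zeros in sparse count data are absorbed.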
Affiliation(s)
- Qiwei Li
- Department of Mathematical Sciences, The University of Texas at Dallas, Richardson, TX 75080, USA
| | - Minzhe Zhang
- Quantitative Biology Research Center, Department of Population and Data Sciences, The University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Yang Xie
- Quantitative Biology Research Center, Department of Population and Data Sciences, The University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Guanghua Xiao
- Quantitative Biology Research Center, Department of Population and Data Sciences, The University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| |
|
242
|
Chen Y, Song J, Ruan Q, Zeng X, Wu L, Cai L, Wang X, Yang C. Single-Cell Sequencing Methodologies: From Transcriptome to Multi-Dimensional Measurement. SMALL METHODS 2021; 5:e2100111. [PMID: 34927917 DOI: 10.1002/smtd.202100111] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 01/29/2021] [Revised: 03/26/2021] [Indexed: 06/14/2023]
Abstract
Cells are the basic building blocks of biological systems, with inherent unique molecular features and development trajectories. The study of single cells facilitates in-depth understanding of cellular diversity, disease processes, and organization of multicellular organisms. Single-cell RNA sequencing (scRNA-seq) technologies have become essential tools for the interrogation of gene expression patterns and the dynamics of single cells, allowing cellular heterogeneity to be dissected at unprecedented resolution. Nevertheless, measuring only at the transcriptome level, or in 1D, is incomplete; cellular heterogeneity is reflected in multiple dimensions, including the genome, epigenome, transcriptome, spatial, and even temporal dimensions. Hence, integrative single-cell analysis is highly desired. In addition, the way sequencing data are interpreted with bioinformatic tools also plays a critical role in revealing differential gene expression. Here, a comprehensive review that summarizes the cutting-edge single-cell transcriptome sequencing methodologies, including scRNA-seq, spatial and temporal transcriptome profiling, multi-omics sequencing and computational methods developed for scRNA-seq data analysis is provided. Finally, the challenges and perspectives of this field are discussed.
Affiliation(s)
- Yingwen Chen
- The MOE Key Laboratory of Spectrochemical Analysis and Instrumentation, The Key Laboratory of Chemical Biology of Fujian Province, State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
| | - Jia Song
- Institute of Molecular Medicine, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200127, China
| | - Qingyu Ruan
- The MOE Key Laboratory of Spectrochemical Analysis and Instrumentation, The Key Laboratory of Chemical Biology of Fujian Province, State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
| | - Xi Zeng
- The MOE Key Laboratory of Spectrochemical Analysis and Instrumentation, The Key Laboratory of Chemical Biology of Fujian Province, State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
| | - Lingling Wu
- Institute of Molecular Medicine, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200127, China
| | - Linfeng Cai
- The MOE Key Laboratory of Spectrochemical Analysis and Instrumentation, The Key Laboratory of Chemical Biology of Fujian Province, State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
| | - Xuanqun Wang
- The MOE Key Laboratory of Spectrochemical Analysis and Instrumentation, The Key Laboratory of Chemical Biology of Fujian Province, State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
| | - Chaoyong Yang
- The MOE Key Laboratory of Spectrochemical Analysis and Instrumentation, The Key Laboratory of Chemical Biology of Fujian Province, State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
- Institute of Molecular Medicine, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200127, China
| |
|
243
|
Nguyen TM, Jeevan JJ, Xu N, Chen JY. Polar Gini Curve: A Technique to Discover Gene Expression Spatial Patterns from Single-cell RNA-seq Data. GENOMICS, PROTEOMICS & BIOINFORMATICS 2021; 19:493-503. [PMID: 34958962 PMCID: PMC8864247 DOI: 10.1016/j.gpb.2020.09.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/05/2020] [Revised: 07/09/2020] [Accepted: 10/29/2020] [Indexed: 12/13/2022]
Abstract
In this work, we describe the development of Polar Gini Curve, a method for characterizing cluster markers by analyzing single-cell RNA sequencing (scRNA-seq) data. Polar Gini Curve combines the gene expression and the 2D coordinates ("spatial") information to detect patterns of uniformity in any clustered cells from scRNA-seq data. We demonstrate that Polar Gini Curve can help users characterize the shape and density distribution of cells in a particular cluster, which can be generated during routine scRNA-seq data analysis. To quantify the extent to which a gene is uniformly distributed in a cell cluster space, we combine two polar Gini curves (PGCs): one drawn upon the cell-points expressing the gene (the "foreground curve") and the other drawn upon all cell-points in the cluster (the "background curve"). We show that genes with highly dissimilar foreground and background curves tend not to be uniformly distributed in the cell cluster, thus having spatially divergent gene expression patterns within the cluster. Genes with similar foreground and background curves tend to be uniformly distributed in the cell cluster, thus having uniform gene expression patterns within the cluster. Such quantitative attributes of PGCs can be applied to sensitively discover biomarkers across clusters from scRNA-seq data. We demonstrate the performance of the Polar Gini Curve framework in several simulation case studies. Using this framework to analyze a real-world neonatal mouse heart cell dataset, the detected biomarkers may characterize novel subtypes of cardiac muscle cells. The source code and data for Polar Gini Curve can be found at http://discovery.informatics.uab.edu/PGC/ or https://figshare.com/projects/Polar_Gini_Curve/76749.
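The comparison of a foreground and a background curve described above can be sketched in a few lines of Python. This is an illustrative reading of the abstract, not the authors' released code: projecting the cell-points onto axes at evenly spaced angles, taking the Gini coefficient of the shifted projections, and comparing the two curves by root-mean-square difference are all assumptions, as are the function names and the toy grid.

```python
import math

def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly even)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1) / n

def polar_gini_curve(points, n_angles=36):
    """Gini coefficient of the points projected onto axes at evenly spaced
    angles in [0, pi); projections are shifted so their minimum is zero."""
    curve = []
    for a in range(n_angles):
        theta = math.pi * a / n_angles
        proj = [x * math.cos(theta) + y * math.sin(theta) for x, y in points]
        low = min(proj)
        curve.append(gini([p - low for p in proj]))
    return curve

def curve_rmsd(c1, c2):
    """Root-mean-square difference between two curves of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)) / len(c1))

# Toy cluster: a 10 x 10 grid of cell-points (illustrative only).
grid = [(float(x), float(y)) for x in range(10) for y in range(10)]
left_half = [(x, y) for x, y in grid if x < 5]   # gene expressed in one half only
background = polar_gini_curve(grid)
rmsd_uniform = curve_rmsd(polar_gini_curve(grid), background)
rmsd_localized = curve_rmsd(polar_gini_curve(left_half), background)
print(rmsd_uniform, rmsd_localized)
```

A gene expressed by every cell reproduces the background curve, so its difference score is zero, while a gene confined to one half of the cluster yields a foreground curve that departs from the background.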
Affiliation(s)
- Thanh Minh Nguyen
- Informatics Institute, the University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Jacob John Jeevan
- Informatics Institute, the University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Nuo Xu
- Collat School of Business, the University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Jake Y Chen
- Informatics Institute, the University of Alabama at Birmingham, Birmingham, AL 35294, USA.
| |
|
244
|
Liu J, Fan Z, Zhao W, Zhou X. Machine Intelligence in Single-Cell Data Analysis: Advances and New Challenges. Front Genet 2021; 12:655536. [PMID: 34135939 PMCID: PMC8203333 DOI: 10.3389/fgene.2021.655536] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 01/19/2021] [Accepted: 04/26/2021] [Indexed: 12/18/2022] Open
Abstract
The rapid development of single-cell technologies allows for dissecting cellular heterogeneity at different omics layers with an unprecedented resolution. In-depth analysis of cellular heterogeneity will boost our understanding of complex biological systems or processes, including cancer, immune system and chronic diseases, thereby providing valuable insights for clinical and translational research. In this review, we will focus on the application of machine learning methods in single-cell multi-omics data analysis. We will start with the pre-processing of single-cell RNA sequencing (scRNA-seq) data, including data imputation, cross-platform batch effect removal, and cell cycle and cell-type identification. Next, we will introduce advanced data analysis tools and methods used for copy number variance estimate, single-cell pseudo-time trajectory analysis, phylogenetic tree inference, cell-cell interaction, regulatory network inference, and integrated analysis of scRNA-seq and spatial transcriptome data. Finally, we will present the latest analytical challenges, such as multi-omics integration and integrated analysis of scRNA-seq data.
Affiliation(s)
- Jiajia Liu
- College of Electronic and Information Engineering, Tongji University, Shanghai, China
- School of Biomedical Informatics, The University of Texas Health Science Centre at Houston, Houston, TX, United States
| | - Zhiwei Fan
- School of Biomedical Informatics, The University of Texas Health Science Centre at Houston, Houston, TX, United States
- West China School of Public Health, West China Fourth Hospital, Sichuan University, Chengdu, China
| | - Weiling Zhao
- School of Biomedical Informatics, The University of Texas Health Science Centre at Houston, Houston, TX, United States
| | - Xiaobo Zhou
- School of Biomedical Informatics, The University of Texas Health Science Centre at Houston, Houston, TX, United States
| |
|
245
|
Miller BF, Bambah-Mukku D, Dulac C, Zhuang X, Fan J. Characterizing spatial gene expression heterogeneity in spatially resolved single-cell transcriptomics data with nonuniform cellular densities. Genome Res 2021; 31:1843-1855. [PMID: 34035045 DOI: 10.1101/gr.271288.120] [Citation(s) in RCA: 68] [Impact Index Per Article: 22.7] [Received: 09/02/2020] [Accepted: 05/13/2021] [Indexed: 11/24/2022]
Abstract
Recent technological advances have enabled spatially resolved measurements of expression profiles for hundreds to thousands of genes in fixed tissues at single-cell resolution. However, scalable computational analysis methods able to take into consideration the inherent 3D spatial organization of cell types and nonuniform cellular densities within tissues are still lacking. To address this, we developed MERINGUE, a computational framework based on spatial auto-correlation and cross-correlation analysis to identify genes with spatially heterogeneous expression patterns, infer putative cell-cell communication, and perform spatially informed cell clustering in 2D and 3D in a density-agnostic manner using spatially resolved transcriptomics data. We applied MERINGUE to a variety of spatially resolved transcriptomics datasets including multiplexed error-robust fluorescence in situ hybridization (MERFISH), spatial transcriptomics, Slide-Seq, and aligned in situ hybridization (ISH) data. We anticipate that such statistical analysis of spatially resolved transcriptomics data will facilitate our understanding of the interplay between cell state and spatial organization in tissue development and disease.
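Spatial autocorrelation, the statistic this abstract builds on, is commonly measured with Moran's I. The sketch below is a simplified illustration and not MERINGUE's implementation: it defines neighbors with a fixed distance `radius`, which is an assumption made for brevity and is not density-agnostic. The function name and the toy data are likewise hypothetical.

```python
import math

def morans_i(values, coords, radius=1.0):
    """Moran's I with binary weights: w_ij = 1 when points i and j (i != j)
    lie within `radius` of each other, otherwise 0."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = 0.0    # sum of w_ij * dev_i * dev_j
    w_sum = 0.0  # sum of all weights
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= radius:
                num += dev[i] * dev[j]
                w_sum += 1.0
    denom = sum(d * d for d in dev)
    return (n / w_sum) * (num / denom)

# Six cells on a line; only adjacent cells are neighbors (toy data).
coords = [(float(x), 0.0) for x in range(6)]
i_clustered = morans_i([1, 1, 1, 0, 0, 0], coords)    # expression in one block
i_alternating = morans_i([1, 0, 1, 0, 1, 0], coords)  # expression alternates
print(i_clustered, i_alternating)
```

Positive values indicate that neighboring cells have similar expression, and negative values indicate that neighbors tend to differ. In practice the observed statistic would be compared against a null distribution, for example by permuting expression values across cells.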
|
246
|
Sun T, Song D, Li WV, Li JJ. scDesign2: a transparent simulator that generates high-fidelity single-cell gene expression count data with gene correlations captured. Genome Biol 2021; 22:163. [PMID: 34034771 PMCID: PMC8147071 DOI: 10.1186/s13059-021-02367-2] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 10/06/2020] [Accepted: 04/27/2021] [Indexed: 12/13/2022] Open
Abstract
A pressing challenge in single-cell transcriptomics is to benchmark experimental protocols and computational methods. A solution is to use computational simulators, but existing simulators cannot simultaneously achieve three goals: preserving genes, capturing gene correlations, and generating any number of cells with varying sequencing depths. To fill this gap, we propose scDesign2, a transparent simulator that achieves all three goals and generates high-fidelity synthetic data for multiple single-cell gene expression count-based technologies. In particular, scDesign2 is advantageous in its transparent use of probabilistic models and its ability to capture gene correlations via copulas.
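How a copula lets a simulator generate correlated counts while keeping each gene's marginal distribution can be shown with a small two-gene sketch. This is not scDesign2 code. The Gaussian copula, the Poisson marginals, the function names and all parameter values are assumptions chosen to keep the example short.

```python
import math
import random
from statistics import NormalDist

def poisson_quantile(u, lam, k_max=1000):
    """Smallest k whose Poisson(lam) CDF is at least u (capped at k_max)."""
    k = 0
    pmf = math.exp(-lam)
    cdf = pmf
    while cdf < u and k < k_max:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k

def simulate_pair(n, rho, lam1, lam2, seed=0):
    """Draw n count pairs: correlated normals -> uniforms -> Poisson counts."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        u1, u2 = std_normal.cdf(z1), std_normal.cdf(z2)  # copula step
        pairs.append((poisson_quantile(u1, lam1), poisson_quantile(u2, lam2)))
    return pairs

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical settings: 2000 cells, two genes with means 5 and 2.
counts = simulate_pair(2000, rho=0.8, lam1=5.0, lam2=2.0)
gene1, gene2 = zip(*counts)
r_correlated = pearson(gene1, gene2)
r_independent = pearson(*zip(*simulate_pair(2000, rho=0.0, lam1=5.0, lam2=2.0)))
mean_gene1 = sum(gene1) / len(gene1)
print(r_correlated, r_independent, mean_gene1)
```

The design point is that the dependence (`rho`) and the marginals (`lam1`, `lam2`) are specified separately, so the number of simulated cells or the marginal means can be changed without refitting the correlation structure.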
Affiliation(s)
- Tianyi Sun
- Department of Statistics, University of California, Los Angeles, 90095-1554, CA, USA
| | - Dongyuan Song
- Interdepartmental Program of Bioinformatics, University of California, Los Angeles, 90095-7246, CA, USA
| | - Wei Vivian Li
- Department of Biostatistics and Epidemiology, Rutgers School of Public Health, Piscataway, 08854, NJ, USA.
| | - Jingyi Jessica Li
- Department of Statistics, University of California, Los Angeles, 90095-1554, CA, USA; Department of Human Genetics, University of California, Los Angeles, 90095-7088, CA, USA; Department of Computational Medicine, University of California, Los Angeles, 90095-1766, CA, USA; Department of Biostatistics, University of California, Los Angeles, 90095-1772, CA, USA
| |
|
247
|
Zhang M, Sheffield T, Zhan X, Li Q, Yang DM, Wang Y, Wang S, Xie Y, Wang T, Xiao G. Spatial molecular profiling: platforms, applications and analysis tools. Brief Bioinform 2021; 22:bbaa145. [PMID: 32770205 PMCID: PMC8138878 DOI: 10.1093/bib/bbaa145] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 03/23/2020] [Revised: 05/26/2020] [Accepted: 06/09/2020] [Indexed: 12/24/2022] Open
Abstract
Molecular profiling technologies, such as genome sequencing and proteomics, have transformed biomedical research, but most such technologies require tissue dissociation, which leads to loss of tissue morphology and spatial information. Recent developments in spatial molecular profiling technologies have enabled the comprehensive molecular characterization of cells while keeping their spatial and morphological contexts intact. Molecular profiling data generate deep characterizations of the genetic, transcriptional and proteomic events of cells, while tissue images capture the spatial locations, organizations and interactions of the cells together with their morphology features. These data, together with cell and tissue imaging data, provide unprecedented opportunities to study tissue heterogeneity and cell spatial organization. This review aims to provide an overview of these recent developments in spatial molecular profiling technologies and the corresponding computational methods developed for analyzing such data.
Affiliation(s)
- Minzhe Zhang
- Department of Population and Data Sciences at University of Texas Southwestern Medical Center
| | - Thomas Sheffield
- Department of Population and Data Sciences at University of Texas Southwestern Medical Center
| | - Xiaowei Zhan
- Department of Population and Data Sciences at University of Texas Southwestern Medical Center
| | - Qiwei Li
- Department of Mathematical Sciences at University of Texas at Dallas
| | - Donghan M Yang
- Department of Population and Data Sciences at University of Texas Southwestern Medical Center
| | - Yunguan Wang
- Department of Population and Data Sciences at University of Texas Southwestern Medical Center
| | - Shidan Wang
- Department of Population and Data Sciences at University of Texas Southwestern Medical Center
| | - Yang Xie
- Quantitative Biomedical Research Center at the University of Texas Southwestern Medical Center
| | - Tao Wang
- Department of Population and Data Sciences at University of Texas Southwestern Medical Center
| | - Guanghua Xiao
- Department of Population and Data Sciences at University of Texas Southwestern Medical Center
| |
|
248
|
Moehlin J, Mollet B, Colombo BM, Mendoza-Parra MA. Inferring biologically relevant molecular tissue substructures by agglomerative clustering of digitized spatial transcriptomes with multilayer. Cell Syst 2021; 12:694-705.e3. [PMID: 34159899 DOI: 10.1016/j.cels.2021.04.008] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 08/25/2020] [Revised: 01/08/2021] [Accepted: 04/13/2021] [Indexed: 01/04/2023]
Abstract
Spatially resolved transcriptomics (SrT) can investigate organ or tissue architecture from the angle of gene programs that define their molecular complexity. However, computational methods to analyze SrT data underexploit their spatial signature. Inspired by contextual pixel classification strategies applied to image analysis, we developed MULTILAYER to stratify maps into functionally relevant molecular substructures. MULTILAYER applies agglomerative clustering within contiguous locally defined transcriptomes (gene expression elements or "gexels") combined with community detection methods for graphical partitioning. MULTILAYER resolves molecular tissue substructures within a variety of SrT data with superior performance to commonly used dimensionality reduction strategies and still detects differentially expressed genes on par with existing methods. MULTILAYER can process high-resolution as well as multiple SrT data in a comparative mode, anticipating future needs in the field. MULTILAYER provides a digital image perspective for SrT analysis and opens the door to contextual gexel classification strategies for developing self-supervised molecular diagnosis solutions. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- Julien Moehlin
- Génomique métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
| | - Bastien Mollet
- Génomique métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France; École Normale Supérieure de Lyon, Université Claude Bernard - Lyon 1, Université de Lyon, 69342 Lyon Cedex 07, France
| | - Bruno Maria Colombo
- Génomique métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
| | - Marco Antonio Mendoza-Parra
- Génomique métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France.
| |
|
249
|
DeTomaso D, Yosef N. Hotspot identifies informative gene modules across modalities of single-cell genomics. Cell Syst 2021; 12:446-456.e9. [PMID: 33951459 DOI: 10.1016/j.cels.2021.04.005] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Received: 03/02/2020] [Revised: 12/22/2020] [Accepted: 04/09/2021] [Indexed: 01/06/2023]
Abstract
Two fundamental aims that emerge when analyzing single-cell RNA-seq data are identifying which genes vary in an informative manner and determining how these genes organize into modules. Here, we propose a general approach to these problems, called "Hotspot," that operates directly on a given metric of cell-cell similarity, allowing for its integration with any method (linear or non-linear) for identifying the primary axes of transcriptional variation between cells. In addition, we show that when using multimodal data, Hotspot can be used to identify genes whose expression reflects alternative notions of similarity between cells, such as physical proximity in a tissue or clonal relatedness in a cell lineage tree. In this manner, we demonstrate that while Hotspot is capable of identifying genes that reflect nuanced transcriptional variability between T helper cells, it can also identify spatially dependent patterns of gene expression in the cerebellum as well as developmentally heritable expression programs during embryogenesis. Hotspot is implemented as an open-source Python package and is available for use at http://www.github.com/yoseflab/hotspot. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- David DeTomaso
- Center for Computational Biology, University of California Berkeley, Berkeley, CA, USA
| | - Nir Yosef
- Department of Electrical Engineering and Computer Science and Center for Computational Biology, University of California Berkeley, Berkeley, CA, USA; Ragon Institute of Massachusetts General Hospital, MIT and Harvard, Cambridge, MA, USA; Chan Zuckerberg Biohub, San Francisco, CA, USA.
| |
|
250
|
Singh R, Hie BL, Narayan A, Berger B. Schema: metric learning enables interpretable synthesis of heterogeneous single-cell modalities. Genome Biol 2021; 22:131. [PMID: 33941239 PMCID: PMC8091541 DOI: 10.1186/s13059-021-02313-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/26/2020] [Accepted: 03/12/2021] [Indexed: 02/08/2023] Open
Abstract
A complete understanding of biological processes requires synthesizing information across heterogeneous modalities, such as age, disease status, or gene expression. Technological advances in single-cell profiling have enabled researchers to assay multiple modalities simultaneously. We present Schema, which uses a principled metric learning strategy that identifies informative features in a modality to synthesize disparate modalities into a single coherent interpretation. We use Schema to infer cell types by integrating gene expression and chromatin accessibility data; demonstrate informative data visualizations that synthesize multiple modalities; perform differential gene expression analysis in the context of spatial variability; and estimate evolutionary pressure on peptide sequences.
Affiliation(s)
- Rohit Singh
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
| | - Brian L Hie
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Ashwin Narayan
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
| |
|