1
Subedi S, Sumida TS, Park YP. A scalable approach to topic modelling in single-cell data by approximate pseudobulk projection. Life Sci Alliance 2024; 7:e202402713. [PMID: 39107066 PMCID: PMC11303850 DOI: 10.26508/lsa.202402713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Revised: 07/29/2024] [Accepted: 07/30/2024] [Indexed: 08/09/2024] Open
Abstract
Probabilistic topic modelling has become essential in many types of single-cell data analysis. Based on probabilistic topic assignments in each cell, we identify the latent representation of cellular states. A dictionary matrix, consisting of topic-specific gene frequency vectors, provides interpretable bases to be compared with known cell type-specific marker genes and other pathway annotations. However, fitting a topic model on a large number of cells requires heavy computational resources: specialized computing units, computing time and memory. Here, we present a scalable approximation method customized for single-cell RNA-seq data analysis, termed ASAP, short for Annotating a Single-cell data matrix by Approximate Pseudobulk estimation. Our approach is more accurate than existing methods yet requires orders of magnitude less computing time and consumes much less memory. We also show that our approach is widely applicable for atlas-scale data analysis; our method seamlessly integrates single-cell and bulk data in joint analysis, without requiring additional preprocessing or feature selection steps.
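As a rough illustration of the approximate pseudobulk idea named in the abstract, the sketch below sorts cells into bins with random projections and sums the counts within each bin. The binning rule, the function name `approximate_pseudobulk`, and its parameters are assumptions made for illustration; the abstract does not give these details, and this is not the ASAP implementation.

```python
import random


def approximate_pseudobulk(cells, n_projections=3, seed=0):
    """Group cells by random-projection sign codes and sum counts per group.

    cells: list of equal-length count vectors, one per cell (hypothetical input format).
    Returns a dict mapping a binary code tuple to the summed count vector of its cells.
    """
    if not cells:
        return {}
    n_genes = len(cells[0])
    rng = random.Random(seed)
    # Assumption: Gaussian random directions; any random projection scheme could be swapped in.
    projections = [
        [rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_projections)
    ]
    # Centre each gene so that the sign of a projection can split the cells.
    means = [sum(cell[g] for cell in cells) / len(cells) for g in range(n_genes)]
    bins = {}
    for cell in cells:
        centred = [value - mean for value, mean in zip(cell, means)]
        code = tuple(
            int(sum(p * v for p, v in zip(proj, centred)) > 0) for proj in projections
        )
        totals = bins.setdefault(code, [0] * n_genes)
        for g, value in enumerate(cell):
            totals[g] += value  # pseudobulk: sum raw counts of all cells in the bin
    return bins
```

Under this reading, a topic model would be fitted on at most `2 ** n_projections` pseudobulk vectors rather than on every cell, which is one plausible source of the reported savings in time and memory.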
Affiliation(s)
- Sishir Subedi
- Bioinformatics Graduate Program, University of British Columbia, Vancouver, Canada (https://ror.org/03rmrcq20)
- BC Cancer Research, Vancouver, Canada
- Tomokazu S Sumida
- Neurology, Program for Neuroinflammation, Yale School of Medicine, New Haven, CT, USA
- Yongjin P Park
- BC Cancer Research, Vancouver, Canada
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
- Department of Statistics, University of British Columbia, Vancouver, Canada
2
Ding Q, Yang W, Xue G, Liu H, Cai Y, Que J, Jin X, Luo M, Pang F, Yang Y, Lin Y, Liu Y, Sun H, Tan R, Wang P, Xu Z, Jiang Q. Dimension reduction, cell clustering, and cell-cell communication inference for single-cell transcriptomics with DcjComm. Genome Biol 2024; 25:241. [PMID: 39252099 PMCID: PMC11382422 DOI: 10.1186/s13059-024-03385-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Accepted: 08/30/2024] [Indexed: 09/11/2024] Open
Abstract
Advances in single-cell transcriptomics provide an unprecedented opportunity to explore complex biological processes. However, computational methods for analyzing single-cell transcriptomics still have room for improvement, especially in dimension reduction, cell clustering, and cell-cell communication inference. Herein, we propose a versatile method, named DcjComm, for comprehensive analysis of single-cell transcriptomics. DcjComm detects functional modules to explore expression patterns and performs dimension reduction and clustering to discover cellular identities using a joint learning model based on non-negative matrix factorization. DcjComm then infers cell-cell communication by integrating ligand-receptor pairs, transcription factors, and target genes. DcjComm demonstrates superior performance compared to state-of-the-art methods.
Affiliation(s)
- Qian Ding
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
- Wenyi Yang
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
- Guangfu Xue
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
- Hongxin Liu
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
- Yideng Cai
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
- Jinhao Que
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
- Xiyun Jin
- School of Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, 150076, China
- Meng Luo
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
- Fenglan Pang
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
- Yuexin Yang
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
- Yi Lin
- School of Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, 150076, China
- Yusong Liu
- School of Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, 150076, China
- Haoxiu Sun
- School of Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, 150076, China
- Renjie Tan
- School of Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, 150076, China
- Pingping Wang
- School of Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, 150076, China.
- Zhaochun Xu
- School of Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, 150076, China.
- Qinghua Jiang
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China.
- School of Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, 150076, China.
- State Key Laboratory of Frigid Zone Cardiovascular Diseases (SKLFZCD), Harbin Medical University, Harbin, 150076, China.
3
Wang L, Wang C, Moriano JA, Chen S, Zuo G, Cebrián-Silla A, Zhang S, Mukhtar T, Wang S, Song M, de Oliveira LG, Bi Q, Augustin JJ, Ge X, Paredes MF, Huang EJ, Alvarez-Buylla A, Duan X, Li J, Kriegstein AR. Molecular and cellular dynamics of the developing human neocortex at single-cell resolution. bioRxiv 2024:2024.01.16.575956. [PMID: 39131371 PMCID: PMC11312442 DOI: 10.1101/2024.01.16.575956] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 08/13/2024]
Abstract
The development of the human neocortex is a highly dynamic process and involves complex cellular trajectories controlled by cell-type-specific gene regulation [1]. Here, we collected paired single-nucleus chromatin accessibility and transcriptome data from 38 human neocortical samples encompassing both the prefrontal cortex and primary visual cortex. These samples span five main developmental stages, ranging from the first trimester to adolescence. In parallel, we performed spatial transcriptomic analysis on a subset of the samples to illustrate spatial organization and intercellular communication. This atlas enables us to catalog cell type-, age-, and area-specific gene regulatory networks underlying neural differentiation. Moreover, combining single-cell profiling, progenitor purification, and lineage-tracing experiments, we have untangled the complex lineage relationships among progenitor subtypes during the transition from neurogenesis to gliogenesis in the human neocortex. We identified a tripotential intermediate progenitor subtype, termed Tri-IPC, responsible for the local production of GABAergic neurons, oligodendrocyte precursor cells, and astrocytes. Remarkably, most glioblastoma cells resemble Tri-IPCs at the transcriptomic level, suggesting that cancer cells hijack developmental processes to enhance growth and heterogeneity. Furthermore, by integrating our atlas data with large-scale GWAS data, we created a disease-risk map highlighting enriched ASD risk in second-trimester intratelencephalic projection neurons. Our study sheds light on the gene regulatory landscape and cellular dynamics of the developing human neocortex.
Affiliation(s)
- Li Wang
- The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California San Francisco; San Francisco, CA 94143, USA
- Department of Neurology, University of California San Francisco; San Francisco, CA 94143, USA
- Cheng Wang
- The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California San Francisco; San Francisco, CA 94143, USA
- Department of Neurology, University of California San Francisco; San Francisco, CA 94143, USA
- Juan A Moriano
- The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California San Francisco; San Francisco, CA 94143, USA
- Department of Neurology, University of California San Francisco; San Francisco, CA 94143, USA
- University of Barcelona Institute of Complex Systems; Barcelona, 08007, Spain
- Songcang Chen
- The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California San Francisco; San Francisco, CA 94143, USA
- Department of Neurology, University of California San Francisco; San Francisco, CA 94143, USA
- Guolong Zuo
- The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California San Francisco; San Francisco, CA 94143, USA
- Department of Neurology, University of California San Francisco; San Francisco, CA 94143, USA
- Arantxa Cebrián-Silla
- The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California San Francisco; San Francisco, CA 94143, USA
- Department of Neurological Surgery, University of California San Francisco; San Francisco, CA 94143, USA
- Shaobo Zhang
- Department of Ophthalmology, University of California San Francisco; San Francisco, CA 94143, USA
- Tanzila Mukhtar
- The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California San Francisco; San Francisco, CA 94143, USA
- Department of Neurology, University of California San Francisco; San Francisco, CA 94143, USA
- Shaohui Wang
- The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California San Francisco; San Francisco, CA 94143, USA
- Department of Neurology, University of California San Francisco; San Francisco, CA 94143, USA
- Mengyi Song
- The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California San Francisco; San Francisco, CA 94143, USA
- Department of Neurology, University of California San Francisco; San Francisco, CA 94143, USA
- Lilian Gomes de Oliveira
- The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California San Francisco; San Francisco, CA 94143, USA
- Neuro-immune Interactions Laboratory, Institute of Biomedical Sciences, Department of Immunology, University of São Paulo; São Paulo, SP 05508-220, Brazil
- Qiuli Bi
- The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California San Francisco; San Francisco, CA 94143, USA
- Department of Neurology, University of California San Francisco; San Francisco, CA 94143, USA
- Jonathan J Augustin
- The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California San Francisco; San Francisco, CA 94143, USA
- Department of Neurology, University of California San Francisco; San Francisco, CA 94143, USA
- Xinxin Ge
- Department of Physiology, University of California San Francisco, San Francisco, CA 94143, USA
- Mercedes F Paredes
- The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California San Francisco; San Francisco, CA 94143, USA
- Department of Neurology, University of California San Francisco; San Francisco, CA 94143, USA
- Eric J Huang
- The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California San Francisco; San Francisco, CA 94143, USA
- Department of Pathology, University of California San Francisco; San Francisco, CA 94143, USA
- Arturo Alvarez-Buylla
- The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California San Francisco; San Francisco, CA 94143, USA
- Department of Neurological Surgery, University of California San Francisco; San Francisco, CA 94143, USA
- Xin Duan
- Department of Ophthalmology, University of California San Francisco; San Francisco, CA 94143, USA
- Department of Physiology, University of California San Francisco, San Francisco, CA 94143, USA
- Jingjing Li
- The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California San Francisco; San Francisco, CA 94143, USA
- Department of Neurology, University of California San Francisco; San Francisco, CA 94143, USA
- Arnold R Kriegstein
- The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California San Francisco; San Francisco, CA 94143, USA
- Department of Neurology, University of California San Francisco; San Francisco, CA 94143, USA
4
Li MM, Huang Y, Sumathipala M, Liang MQ, Valdeolivas A, Ananthakrishnan AN, Liao K, Marbach D, Zitnik M. Contextual AI models for single-cell protein biology. Nat Methods 2024; 21:1546-1557. [PMID: 39039335 PMCID: PMC11310085 DOI: 10.1038/s41592-024-02341-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Accepted: 06/10/2024] [Indexed: 07/24/2024]
Abstract
Understanding protein function and developing molecular therapies require deciphering the cell types in which proteins act as well as the interactions between proteins. However, modeling protein interactions across biological contexts remains challenging for existing algorithms. Here we introduce PINNACLE, a geometric deep learning approach that generates context-aware protein representations. Leveraging a multiorgan single-cell atlas, PINNACLE learns on contextualized protein interaction networks to produce 394,760 protein representations from 156 cell type contexts across 24 tissues. PINNACLE's embedding space reflects cellular and tissue organization, enabling zero-shot retrieval of the tissue hierarchy. Pretrained protein representations can be adapted for downstream tasks: enhancing 3D structure-based representations for resolving immuno-oncological protein interactions, and investigating drugs' effects across cell types. PINNACLE outperforms state-of-the-art models in nominating therapeutic targets for rheumatoid arthritis and inflammatory bowel diseases and pinpoints cell type contexts with higher predictive capability than context-free models. PINNACLE's ability to adjust its outputs on the basis of the context in which it operates paves the way for large-scale context-specific predictions in biology.
Affiliation(s)
- Michelle M Li
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Yepeng Huang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Marissa Sumathipala
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Man Qing Liang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Alberto Valdeolivas
- Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Basel, Switzerland
- Ashwin N Ananthakrishnan
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Gastroenterology, Massachusetts General Hospital, Boston, MA, USA
- Katherine Liao
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Boston, MA, USA
- Daniel Marbach
- Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Basel, Switzerland
- Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
- Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Allston, MA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Harvard Data Science Initiative, Cambridge, MA, USA.
5
Hie BL, Kim S, Rando TA, Bryson B, Berger B. Scanorama: integrating large and diverse single-cell transcriptomic datasets. Nat Protoc 2024; 19:2283-2297. [PMID: 38844552 PMCID: PMC11361826 DOI: 10.1038/s41596-024-00991-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2023] [Accepted: 01/11/2024] [Indexed: 08/07/2024]
Abstract
Merging diverse single-cell RNA sequencing (scRNA-seq) data from numerous experiments, laboratories and technologies can uncover important biological insights. Nonetheless, integrating scRNA-seq data poses particular challenges when the datasets differ in cell type composition. Scanorama offers a robust solution for improving the quality and interpretation of heterogeneous scRNA-seq data by effectively merging information from diverse sources. Scanorama is designed to address the technical variation introduced by differences in sample preparation, sequencing depth and experimental batches that can confound the analysis of multiple scRNA-seq datasets. Here we provide a detailed protocol for using Scanorama within a Scanpy-based single-cell analysis workflow coupled with Google Colaboratory, a free cloud-based Jupyter notebook environment. The protocol involves Scanorama integration, a process that typically takes 0.5-3 h. Scanorama integration requires a basic understanding of cellular biology, transcriptomic technologies and bioinformatics. Our protocol and new Scanorama-Colaboratory resource should make scRNA-seq integration more widely accessible to researchers.
Affiliation(s)
- Brian L Hie
- Department of Chemical Engineering, Stanford University School, Stanford, CA, USA.
- Stanford Data Science, Stanford University, Stanford, CA, USA.
- Arc Institute, Palo Alto, CA, USA.
- Soochi Kim
- Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Stanford, CA, USA
- Paul F. Glenn Center for the Biology of Aging, Stanford University School of Medicine, Stanford, CA, USA
- Thomas A Rando
- Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Stanford, CA, USA
- Paul F. Glenn Center for the Biology of Aging, Stanford University School of Medicine, Stanford, CA, USA
- Department of Neurology, UCLA, Los Angeles, CA, USA
- Eli and Edythe Broad Center for Regenerative Medicine and Stem Cell Research, UCLA, Los Angeles, CA, USA
- Bryan Bryson
- Department of Biological Engineering, MIT, Cambridge, MA, USA.
- Ragon Institute of Mass General, MIT and Harvard, Cambridge, MA, USA.
- Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA.
- Department of Mathematics, MIT, Cambridge, MA, USA.
6
Keenen MM, Yang L, Liang H, Farmer VJ, Singh R, Gladfelter AS, Coyne CB. Comparative analysis of the syncytiotrophoblast in placenta tissue and trophoblast organoids using snRNA sequencing. bioRxiv 2024:2024.07.01.601571. [PMID: 39005304 PMCID: PMC11244908 DOI: 10.1101/2024.07.01.601571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/16/2024]
Abstract
The outer surface of chorionic villi in the human placenta consists of a single multinucleated cell called the syncytiotrophoblast (STB). The unique cellular ultrastructure of the STB presents challenges in deciphering its gene expression signature at the single-cell level, as the STB contains billions of nuclei in a single cell. There are many gaps in understanding the molecular mechanisms and developmental trajectories involved in STB formation and differentiation. To identify the underlying control of the STB, we performed comparative single nucleus (SN) and single cell (SC) RNA sequencing on placental tissue and tissue-derived trophoblast organoids (TOs). We found that SN was essential to capture the STB population from both tissue and TOs. Differential gene expression and pseudotime analysis of TO-derived STB identified three distinct nuclear subtypes reminiscent of those recently identified in vivo. These included a juvenile nuclear population that exhibited both CTB and STB marker expression, a population enriched in genes involved in oxygen sensing, and a fully differentiated subtype. Notably, suspension culture conditions of TOs that restore the native orientation of the STB (STB^out) showed elevated expression of canonical STB markers and pregnancy hormones, along with a greater proportion of the terminally differentiated mature STB subtype, compared to those cultivated with an inverted STB polarity (STB^in). Gene regulatory analysis identified novel markers of STB differentiation conserved in tissue and TOs, including the chromatin remodeler RYBP, that exhibited STB-specific RNA and protein expression. Finally, we compared STB gene expression signatures amongst first trimester tissue, full-term tissue, and TOs, identifying many commonalities but also notable variability across each sample type. This indicates that STB gene expression is responsive to its environmental context.
Our findings emphasize the utility of TOs to accurately model STB differentiation and the distinct nuclear subtypes observed in vivo, offering a versatile platform for unraveling the molecular mechanisms governing STB functions in placental biology and disease.
7
Sethi R, Ang KS, Li M, Long Y, Ling J, Chen J. ezSingleCell: an integrated one-stop single-cell and spatial omics analysis platform for bench scientists. Nat Commun 2024; 15:5600. [PMID: 38961061 PMCID: PMC11222513 DOI: 10.1038/s41467-024-48188-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Accepted: 04/22/2024] [Indexed: 07/05/2024] Open
Abstract
ezSingleCell is an interactive and easy-to-use application for analysing various single-cell and spatial omics data types without requiring prior programming knowledge. It combines the best-performing publicly available methods for in-depth data analysis, integration, and interactive data visualization. ezSingleCell consists of five modules, each designed to be a comprehensive workflow for one data type or task. In addition, ezSingleCell allows crosstalk between different modules within a unified interface. Input data can be in a variety of formats, while the output consists of publication-ready figures and tables. In-depth manuals and video tutorials are available to guide users on the analysis workflows and parameter adjustments to suit their study aims. ezSingleCell's streamlined interface can analyse a standard scRNA-seq dataset of 3000 cells in less than five minutes. ezSingleCell is available in two forms: an installation-free web application (https://immunesinglecell.org/ezsc/) or a software package with a shinyApp interface (https://github.com/JinmiaoChenLab/ezSingleCell2) for offline analysis.
Affiliation(s)
- Raman Sethi
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, Matrix, Singapore, 138671, Singapore
- Kok Siong Ang
- Institute of Molecular and Cell Biology (IMCB), Agency for Science, Technology and Research (A*STAR), 61 Biopolis Drive, Proteos, Singapore, 138673, Singapore
- Mengwei Li
- Institute of Molecular and Cell Biology (IMCB), Agency for Science, Technology and Research (A*STAR), 61 Biopolis Drive, Proteos, Singapore, 138673, Singapore
- Yahui Long
- Institute of Molecular and Cell Biology (IMCB), Agency for Science, Technology and Research (A*STAR), 61 Biopolis Drive, Proteos, Singapore, 138673, Singapore
- Jingjing Ling
- Singapore Immunology Network (SIgN), Agency of Science, Technology and Research (A*STAR), 8A Biomedical Grove, Immunos, Singapore, 138648, Singapore
- Jinmiao Chen
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, Matrix, Singapore, 138671, Singapore.
- Institute of Molecular and Cell Biology (IMCB), Agency for Science, Technology and Research (A*STAR), 61 Biopolis Drive, Proteos, Singapore, 138673, Singapore.
- Immunology Translational Research Program, Department of Microbiology and Immunology, Yong Loo Lin School of Medicine, National University of Singapore (NUS), 5 Science Drive 2, Blk MD4, Level 3, Singapore, 117545, Singapore.
8
Bilous M, Hérault L, Gabriel AA, Teleman M, Gfeller D. Building and analyzing metacells in single-cell genomics data. Mol Syst Biol 2024; 20:744-766. [PMID: 38811801 PMCID: PMC11220014 DOI: 10.1038/s44320-024-00045-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Revised: 05/03/2024] [Accepted: 05/08/2024] [Indexed: 05/31/2024] Open
Abstract
The advent of high-throughput single-cell genomics technologies has fundamentally transformed biological sciences. Currently, millions of cells from complex biological tissues can be phenotypically profiled across multiple modalities. The scaling of computational methods to analyze and visualize such data is a constant challenge, and tools need to be regularly updated, if not redesigned, to cope with ever-growing numbers of cells. Over the last few years, metacells have been introduced to reduce the size and complexity of single-cell genomics data while preserving biologically relevant information and improving interpretability. Here, we review recent studies that capitalize on the concept of metacells, along with the many variants in nomenclature that have been used. We further outline how and when metacells should (or should not) be used to analyze single-cell genomics data and what should be considered when analyzing such data at the metacell level. To facilitate the exploration of metacells, we provide a comprehensive tutorial on the construction and analysis of metacells from single-cell RNA-seq data (https://github.com/GfellerLab/MetacellAnalysisTutorial) as well as a fully integrated pipeline to rapidly build, visualize and evaluate metacells with different methods (https://github.com/GfellerLab/MetacellAnalysisToolkit).
Affiliation(s)
- Mariia Bilous
- Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, 1011, Lausanne, Switzerland
- Agora Cancer Research Centre, 1011, Lausanne, Switzerland
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), 1015, Lausanne, Switzerland
- Léonard Hérault
- Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, 1011, Lausanne, Switzerland
- Agora Cancer Research Centre, 1011, Lausanne, Switzerland
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), 1015, Lausanne, Switzerland
- Aurélie Ag Gabriel
- Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, 1011, Lausanne, Switzerland
- Agora Cancer Research Centre, 1011, Lausanne, Switzerland
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), 1015, Lausanne, Switzerland
- Matei Teleman
- Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, 1011, Lausanne, Switzerland
- Agora Cancer Research Centre, 1011, Lausanne, Switzerland
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), 1015, Lausanne, Switzerland
- David Gfeller
- Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, 1011, Lausanne, Switzerland.
- Agora Cancer Research Centre, 1011, Lausanne, Switzerland.
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland.
- Swiss Institute of Bioinformatics (SIB), 1015, Lausanne, Switzerland.
9
Aihara G, Clifton K, Chen M, Li Z, Atta L, Miller BF, Satija R, Hickey JW, Fan J. SEraster: a rasterization preprocessing framework for scalable spatial omics data analysis. Bioinformatics 2024; 40:btae412. [PMID: 38902953 PMCID: PMC11226864 DOI: 10.1093/bioinformatics/btae412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Revised: 05/15/2024] [Accepted: 06/19/2024] [Indexed: 06/22/2024] Open
Abstract
Motivation: Spatial omics data demand computational analysis but many analysis tools have computational resource requirements that increase with the number of cells analyzed. This presents scalability challenges as researchers use spatial omics technologies to profile millions of cells. Results: To enhance the scalability of spatial omics data analysis, we developed a rasterization preprocessing framework called SEraster that aggregates cellular information into spatial pixels. We apply SEraster to both real and simulated spatial omics data prior to spatial variable gene expression analysis to demonstrate that such preprocessing can reduce computational resource requirements while maintaining high performance, including as compared to other down-sampling approaches. We further integrate SEraster with existing analysis tools to characterize cell-type spatial co-enrichment across length scales. Finally, we apply SEraster to enable analysis of a mouse pup spatial omics dataset with over a million cells to identify tissue-level and cell-type-specific spatially variable genes as well as spatially co-enriched cell types that recapitulate expected organ structures. Availability and implementation: SEraster is implemented as an R package on GitHub (https://github.com/JEFworks-Lab/SEraster) with additional tutorials at https://JEF.works/SEraster.
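The aggregation of cellular information into spatial pixels described in this abstract can be pictured with a small sketch. It is written in Python rather than R, it is not SEraster's API, and the function name `rasterize`, the input layout, and the choice to sum counts per pixel are all assumptions for illustration.

```python
from math import floor


def rasterize(cells, resolution):
    """Aggregate per-cell gene counts into square spatial pixels.

    cells: iterable of (x, y, counts), where counts maps gene name to count
           (hypothetical input format).
    resolution: pixel side length, in the same units as x and y.
    Returns {(col, row): {"n_cells": int, "counts": {gene: summed count}}}.
    """
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    pixels = {}
    for x, y, counts in cells:
        # Each cell falls into exactly one pixel, indexed by its grid column and row.
        key = (floor(x / resolution), floor(y / resolution))
        pixel = pixels.setdefault(key, {"n_cells": 0, "counts": {}})
        pixel["n_cells"] += 1
        for gene, value in counts.items():
            # Assumption: counts are summed; a mean per pixel would be another option.
            pixel["counts"][gene] = pixel["counts"].get(gene, 0) + value
    return pixels
```

Downstream analysis would then run on the pixels, whose number depends on tissue area and resolution rather than on the number of cells.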
Affiliation(s)
- Gohta Aihara
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21211, United States
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, United States
- Kalen Clifton
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21211, United States
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, United States
- Mayling Chen
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21211, United States
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, United States
| | - Zhuoyan Li
- New York Genome Center, New York, NY 10013, United States
| | - Lyla Atta
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21211, United States
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, United States
| | - Brendan F Miller
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21211, United States
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, United States
| | - Rahul Satija
- New York Genome Center, New York, NY 10013, United States
- Center for Genomics and Systems Biology, New York University, New York, NY 10003, United States
| | - John W Hickey
- Department of Biomedical Engineering, Duke University, Durham, NC 27708, United States
| | - Jean Fan
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21211, United States
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, United States
| |
|
10
|
Thirimanne HN, Almiron-Bonnin D, Nuechterlein N, Arora S, Jensen M, Parada CA, Qiu C, Szulzewsky F, English CW, Chen WC, Sievers P, Nassiri F, Wang JZ, Klisch TJ, Aldape KD, Patel AJ, Cimino PJ, Zadeh G, Sahm F, Raleigh DR, Shendure J, Ferreira M, Holland EC. Meningioma transcriptomic landscape demonstrates novel subtypes with regional associated biology and patient outcome. Cell Genom 2024; 4:100566. [PMID: 38788713 PMCID: PMC11228955 DOI: 10.1016/j.xgen.2024.100566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Revised: 04/16/2024] [Accepted: 05/02/2024] [Indexed: 05/26/2024]
Abstract
Meningiomas, although mostly benign, can be recurrent and fatal. World Health Organization (WHO) grading of the tumor does not always identify high-risk meningioma, and better characterizations of their aggressive biology are needed. To approach this problem, we combined 13 bulk RNA sequencing (RNA-seq) datasets to create a dimension-reduced reference landscape of 1,298 meningiomas. The clinical and genomic metadata effectively correlated with landscape regions, which led to the identification of meningioma subtypes with specific biological signatures. The time to recurrence also correlated with the map location. Further, we developed an algorithm that maps new patients onto this landscape, where the nearest neighbors predict outcome. This study highlights the utility of combining bulk transcriptomic datasets to visualize the complexity of tumor populations. Further, we provide an interactive tool for understanding the disease and predicting patient outcomes. This resource is accessible via the online tool Oncoscape, where the scientific community can explore the meningioma landscape.
Affiliation(s)
| | - Damian Almiron-Bonnin
- Department of Pathology, University of California, San Francisco, San Francisco, CA, USA
| | - Nicholas Nuechterlein
- Neuropathology Unit, Surgical Neurology Branch, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Sonali Arora
- Human Biology Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - Matt Jensen
- Human Biology Division, Fred Hutchinson Cancer Center, Seattle, WA, USA; Seattle Translational Tumor Research Center, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - Carolina A Parada
- Department of Neurological Surgery, University of Washington Medical Center, Seattle, WA, USA
| | - Chengxiang Qiu
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Frank Szulzewsky
- Human Biology Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - Collin W English
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
| | - William C Chen
- Departments of Radiation Oncology, Neurological Surgery, and Pathology, University of California, San Francisco, San Francisco, CA, USA
| | - Philipp Sievers
- Department of Neuropathology, Institute of Pathology, University Hospital Heidelberg, Heidelberg, Germany; Clinical Cooperation Unit Neuropathology, German Consortium for Translational Cancer Research (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Farshad Nassiri
- Department of Surgery, Division of Neurosurgery, University of Toronto, Toronto, ON, Canada
| | - Justin Z Wang
- Department of Surgery, Division of Neurosurgery, University of Toronto, Toronto, ON, Canada
| | - Tiemo J Klisch
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
| | - Kenneth D Aldape
- Laboratory of Pathology, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
| | - Akash J Patel
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
| | - Patrick J Cimino
- Neuropathology Unit, Surgical Neurology Branch, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Gelareh Zadeh
- Department of Surgery, Division of Neurosurgery, University of Toronto, Toronto, ON, Canada
| | - Felix Sahm
- Department of Neuropathology, Institute of Pathology, University Hospital Heidelberg, Heidelberg, Germany; Clinical Cooperation Unit Neuropathology, German Consortium for Translational Cancer Research (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - David R Raleigh
- Departments of Radiation Oncology, Neurological Surgery, and Pathology, University of California, San Francisco, San Francisco, CA, USA
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Manuel Ferreira
- Department of Neurological Surgery, University of Washington Medical Center, Seattle, WA, USA
| | - Eric C Holland
- Human Biology Division, Fred Hutchinson Cancer Center, Seattle, WA, USA; Seattle Translational Tumor Research Center, Fred Hutchinson Cancer Center, Seattle, WA, USA.
| |
|
11
|
Singh R, Wu AP, Mudide A, Berger B. Causal gene regulatory analysis with RNA velocity reveals an interplay between slow and fast transcription factors. Cell Syst 2024; 15:462-474.e5. [PMID: 38754366 DOI: 10.1016/j.cels.2024.04.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Revised: 08/25/2023] [Accepted: 04/18/2024] [Indexed: 05/18/2024]
Abstract
Single-cell expression dynamics, from differentiation trajectories or RNA velocity, have the potential to reveal causal links between transcription factors (TFs) and their target genes in gene regulatory networks (GRNs). However, existing methods either overlook these expression dynamics or necessitate that cells be ordered along a linear pseudotemporal axis, which is incompatible with branching trajectories. We introduce Velorama, an approach to causal GRN inference that represents single-cell differentiation dynamics as a directed acyclic graph of cells, constructed from pseudotime or RNA velocity measurements. Additionally, Velorama enables the estimation of the speed at which TFs influence target genes. Applying Velorama, we uncover evidence that the speed of a TF's interactions is tied to its regulatory function. For human corticogenesis, we find that slow TFs are linked to gliomas, while fast TFs are associated with neuropsychiatric diseases. We expect Velorama to become a critical part of the RNA velocity toolkit for investigating the causal drivers of differentiation and disease.
Affiliation(s)
- Rohit Singh
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA.
| | - Alexander P Wu
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA
| | - Anish Mudide
- Phillips Exeter Academy, Exeter, NH 03883, USA; Computer Science and Artificial Intelligence Laboratory and Department of Mathematics, MIT, Cambridge, MA 02139, USA
| | - Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory and Department of Mathematics, MIT, Cambridge, MA 02139, USA.
| |
|
12
|
Kang J, Lee JH, Cha H, An J, Kwon J, Lee S, Kim S, Baykan MY, Kim SY, An D, Kwon AY, An HJ, Lee SH, Choi JK, Park JE. Systematic dissection of tumor-normal single-cell ecosystems across a thousand tumors of 30 cancer types. Nat Commun 2024; 15:4067. [PMID: 38744958 PMCID: PMC11094150 DOI: 10.1038/s41467-024-48310-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Accepted: 04/26/2024] [Indexed: 05/16/2024] Open
Abstract
The complexity of the tumor microenvironment poses significant challenges in cancer therapy. Here, to comprehensively investigate the tumor-normal ecosystems, we perform an integrative analysis of 4.9 million single-cell transcriptomes from 1070 tumor and 493 normal samples in combination with pan-cancer 137 spatial transcriptomics, 8887 TCGA, and 1261 checkpoint inhibitor-treated bulk tumors. We define a myriad of cell states constituting the tumor-normal ecosystems and also identify hallmark gene signatures across different cell types and organs. Our atlas characterizes distinctions between inflammatory fibroblasts marked by AKR1C1 or WNT5A in terms of cellular interactions and spatial co-localization patterns. Co-occurrence analysis reveals interferon-enriched community states including tertiary lymphoid structure (TLS) components, which exhibit differential rewiring between tumor, adjacent normal, and healthy normal tissues. The favorable response of interferon-enriched community states to immunotherapy is validated using immunotherapy-treated cancers (n = 1261) including our lung cancer cohort (n = 497). Deconvolution of spatial transcriptomes discriminates TLS-enriched from non-enriched cell types among immunotherapy-favorable components. Our systematic dissection of tumor-normal ecosystems provides a deeper understanding of inter- and intra-tumoral heterogeneity.
Affiliation(s)
- Junho Kang
- Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
| | - Jun Hyeong Lee
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
| | - Hongui Cha
- Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
| | - Jinhyeon An
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
| | - Joonha Kwon
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
- Division of Cancer Data Science, National Cancer Center, Bioinformatics Branch, Goyang, Republic of Korea
| | - Seongwoo Lee
- Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
| | - Seongryong Kim
- Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
| | - Mert Yakup Baykan
- Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
| | - So Yeon Kim
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
| | - Dohyeon An
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
| | - Ah-Young Kwon
- Department of Pathology, CHA Bundang Medical Center, CHA University, Seongnam-si, Republic of Korea
| | - Hee Jung An
- Department of Pathology, CHA Bundang Medical Center, CHA University, Seongnam-si, Republic of Korea
| | - Se-Hoon Lee
- Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea.
- Department of Health Sciences and Technology, Samsung Advanced Institute of Health Science and Technology, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea.
| | - Jung Kyoon Choi
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea.
- Penta Medix Co., Ltd., Seongnam-si, Gyeonggi-do, Republic of Korea.
| | - Jong-Eun Park
- Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea.
- Biomedical Research Center, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea.
| |
|
13
|
Lotfollahi M, Hao Y, Theis FJ, Satija R. The future of rapid and automated single-cell data analysis using reference mapping. Cell 2024; 187:2343-2358. [PMID: 38729109 PMCID: PMC11184658 DOI: 10.1016/j.cell.2024.03.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Revised: 03/05/2024] [Accepted: 03/08/2024] [Indexed: 05/12/2024]
Abstract
As the number of single-cell datasets continues to grow rapidly, workflows that map new data to well-curated reference atlases offer enormous promise for the biological community. In this perspective, we discuss key computational challenges and opportunities for single-cell reference-mapping algorithms. We discuss how mapping algorithms will enable the integration of diverse datasets across disease states, molecular modalities, genetic perturbations, and diverse species and will eventually replace manual and laborious unsupervised clustering pipelines.
Affiliation(s)
- Mohammad Lotfollahi
- Institute of Computational Biology, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Germany; Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
| | - Yuhan Hao
- Center for Genomics and Systems Biology, New York University, New York, NY, USA; New York Genome Center, New York, NY, USA
| | - Fabian J Theis
- Institute of Computational Biology, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Germany; Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK; Department of Mathematics, Technical University of Munich, Garching, Germany.
| | - Rahul Satija
- Center for Genomics and Systems Biology, New York University, New York, NY, USA; New York Genome Center, New York, NY, USA.
| |
|
14
|
Putri GH, Howitt G, Marsh-Wakefield F, Ashhurst TM, Phipson B. SuperCellCyto: enabling efficient analysis of large scale cytometry datasets. Genome Biol 2024; 25:89. [PMID: 38589921 PMCID: PMC11003185 DOI: 10.1186/s13059-024-03229-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Accepted: 03/27/2024] [Indexed: 04/10/2024] Open
Abstract
Advancements in cytometry technologies have enabled quantification of up to 50 proteins across millions of cells at single cell resolution. Analysis of cytometry data routinely involves tasks such as data integration, clustering, and dimensionality reduction. While numerous tools exist, many require extensive run times when processing large cytometry data containing millions of cells. Existing solutions, such as random subsampling, are inadequate as they risk excluding rare cell subsets. To address this, we propose SuperCellCyto, an R package that builds on the SuperCell tool which groups highly similar cells into supercells. SuperCellCyto is available on GitHub (https://github.com/phipsonlab/SuperCellCyto) and Zenodo (https://doi.org/10.5281/zenodo.10521294).
Affiliation(s)
- Givanna H Putri
- The Walter and Eliza Hall Institute of Medical Research and The Department of Medical Biology, The University of Melbourne, Parkville, VIC, Australia.
| | - George Howitt
- Peter MacCallum Cancer Centre and The Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC, Australia
| | - Felix Marsh-Wakefield
- Centenary Institute of Cancer Medicine and Cell Biology, The University of Sydney, Sydney, NSW, Australia
| | - Thomas M Ashhurst
- Sydney Cytometry Core Research Facility and School of Medical Sciences, The University of Sydney, Sydney, NSW, Australia
| | - Belinda Phipson
- The Walter and Eliza Hall Institute of Medical Research and The Department of Medical Biology, The University of Melbourne, Parkville, VIC, Australia.
| |
|
15
|
Awuah WA, Roy S, Tan JK, Adebusoye FT, Qiang Z, Ferreira T, Ahluwalia A, Shet V, Yee ALW, Abdul‐Rahman T, Papadakis M. Exploring the current landscape of single-cell RNA sequencing applications in gastric cancer research. J Cell Mol Med 2024; 28:e18159. [PMID: 38494861 PMCID: PMC10945075 DOI: 10.1111/jcmm.18159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Revised: 12/22/2023] [Accepted: 01/12/2024] [Indexed: 03/19/2024] Open
Abstract
Gastric cancer (GC) represents a major global health burden and is responsible for a significant number of cancer-related fatalities. Its complex nature, characterized by heterogeneity and aggressive behaviour, poses considerable challenges for effective diagnosis and treatment. Single-cell RNA sequencing (scRNA-seq) has emerged as an important technique, offering unprecedented precision and depth in gene expression profiling at the cellular level. By facilitating the identification of distinct cell populations, rare cells and dynamic transcriptional changes within GC, scRNA-seq has yielded valuable insights into tumour progression and potential therapeutic targets. Moreover, this technology has significantly improved our comprehension of the tumour microenvironment (TME) and its intricate interplay with immune cells, thereby opening avenues for targeted therapeutic strategies. Nonetheless, certain obstacles, including tumour heterogeneity and technical limitations, persist in the field. Current endeavours are dedicated to refining protocols and computational tools to surmount these challenges. In this narrative review, we explore the significance of scRNA-seq in GC, emphasizing its advantages, challenges and potential applications in unravelling tumour heterogeneity and identifying promising therapeutic targets. Additionally, we discuss recent developments, ongoing efforts to overcome these challenges, and future prospects. Although further enhancements are required, scRNA-seq has already provided valuable insights into GC and holds promise for advancing biomedical research and clinical practice.
Affiliation(s)
| | - Sakshi Roy
- School of Medicine, Queen's University Belfast, Belfast, UK
| | | | | | - Zekai Qiang
- Department of Oncology & Metabolism, The University of Sheffield, Sheffield, UK
| | - Tomas Ferreira
- Department of Clinical Neurosciences, School of Clinical Medicine, University of Cambridge, Cambridge, UK
| | | | - Vallabh Shet
- Faculty of Medicine, Bangalore Medical College and Research Institute, Bangalore, Karnataka, India
| | | | | | - Marios Papadakis
- Department of Surgery II, University Hospital Witten‐Herdecke, University of Witten‐Herdecke, Wuppertal, Germany
| |
|
16
|
Jiang A, Snell RG, Lehnert K. ICARUS v3, a massively scalable web server for single-cell RNA-seq analysis of millions of cells. Bioinformatics 2024; 40:btae167. [PMID: 38539041 PMCID: PMC11007236 DOI: 10.1093/bioinformatics/btae167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2023] [Revised: 03/18/2024] [Accepted: 03/25/2024] [Indexed: 04/12/2024] Open
Abstract
MOTIVATION: In recent years, improvements in throughput of single-cell RNA-seq have resulted in a significant increase in the number of cells profiled. The generation of single-cell RNA-seq datasets comprising >1 million cells is becoming increasingly common, giving rise to demands for more efficient computational workflows. RESULTS: We present an update to our single-cell RNA-seq analysis web server application, ICARUS (available at https://launch.icarus-scrnaseq.cloud.edu.au), that allows effective analysis of large-scale single-cell RNA-seq datasets. ICARUS v3 utilizes the geometric cell sketching method to subsample cells from the overall dataset for dimensionality reduction and clustering that can be then projected to the large dataset. We then extend this functionality to select a representative subset of cells for downstream data analysis applications including differential expression analysis, gene co-expression network construction, gene regulatory network construction, trajectory analysis, cell-cell communication inference, and cell cluster associations to GWAS traits. We demonstrate analysis of single-cell RNA-seq datasets using ICARUS v3 of 1.3 million cells completed within the hour. AVAILABILITY AND IMPLEMENTATION: ICARUS is available at https://launch.icarus-scrnaseq.cloud.edu.au.
Affiliation(s)
- Andrew Jiang
- Applied Translational Genetics Group, School of Biological Sciences, The University of Auckland, Auckland 1142, New Zealand
| | - Russell G Snell
- Applied Translational Genetics Group, School of Biological Sciences, The University of Auckland, Auckland 1142, New Zealand
| | - Klaus Lehnert
- Applied Translational Genetics Group, School of Biological Sciences, The University of Auckland, Auckland 1142, New Zealand
| |
|
17
|
Qiu C, Martin BK, Welsh IC, Daza RM, Le TM, Huang X, Nichols EK, Taylor ML, Fulton O, O'Day DR, Gomes AR, Ilcisin S, Srivatsan S, Deng X, Disteche CM, Noble WS, Hamazaki N, Moens CB, Kimelman D, Cao J, Schier AF, Spielmann M, Murray SA, Trapnell C, Shendure J. A single-cell time-lapse of mouse prenatal development from gastrula to birth. Nature 2024; 626:1084-1093. [PMID: 38355799 PMCID: PMC10901739 DOI: 10.1038/s41586-024-07069-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Accepted: 01/15/2024] [Indexed: 02/16/2024]
Abstract
The house mouse (Mus musculus) is an exceptional model system, combining genetic tractability with close evolutionary affinity to humans [1,2]. Mouse gestation lasts only 3 weeks, during which the genome orchestrates the astonishing transformation of a single-cell zygote into a free-living pup composed of more than 500 million cells. Here, to establish a global framework for exploring mammalian development, we applied optimized single-cell combinatorial indexing [3] to profile the transcriptional states of 12.4 million nuclei from 83 embryos, precisely staged at 2- to 6-hour intervals spanning late gastrulation (embryonic day 8) to birth (postnatal day 0). From these data, we annotate hundreds of cell types and explore the ontogenesis of the posterior embryo during somitogenesis and of kidney, mesenchyme, retina and early neurons. We leverage the temporal resolution and sampling depth of these whole-embryo snapshots, together with published data [4-8] from earlier timepoints, to construct a rooted tree of cell-type relationships that spans the entirety of prenatal development, from zygote to birth. Throughout this tree, we systematically nominate genes encoding transcription factors and other proteins as candidate drivers of the in vivo differentiation of hundreds of cell types. Remarkably, the most marked temporal shifts in cell states are observed within one hour of birth and presumably underlie the massive physiological adaptations that must accompany the successful transition of a mammalian fetus to life outside the womb.
Affiliation(s)
- Chengxiang Qiu
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
| | - Beth K Martin
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | | | - Riza M Daza
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Truc-Mai Le
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | - Xingfan Huang
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| | - Eva K Nichols
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Megan L Taylor
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | - Olivia Fulton
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Diana R O'Day
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | | | - Saskia Ilcisin
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | - Sanjay Srivatsan
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Medical Scientist Training Program, University of Washington, Seattle, WA, USA
| | - Xinxian Deng
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
| | - Christine M Disteche
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
- Department of Medicine, University of Washington, Seattle, WA, USA
| | - William Stafford Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| | - Nobuhiko Hamazaki
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, Seattle, WA, USA
| | - Cecilia B Moens
- Division of Basic Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - David Kimelman
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Department of Biochemistry, University of Washington, Seattle, WA, USA
| | - Junyue Cao
- Laboratory of Single-Cell Genomics and Population dynamics, The Rockefeller University, New York, NY, USA
| | - Alexander F Schier
- Biozentrum, University of Basel, Basel, Switzerland
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
| | - Malte Spielmann
- Max Planck Institute for Molecular Genetics, Berlin, Germany
- Institute of Human Genetics, University Hospitals Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Kiel, Germany
- DZHK (German Centre for Cardiovascular Research), Partner Site Hamburg, Lübeck, Kiel, Lübeck, Germany
| | | | - Cole Trapnell
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
- Seattle Hub for Synthetic Biology, Seattle, WA, USA
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA.
- Howard Hughes Medical Institute, Seattle, WA, USA.
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA.
- Seattle Hub for Synthetic Biology, Seattle, WA, USA.
| |
|
18
|
Hao Y, Stuart T, Kowalski MH, Choudhary S, Hoffman P, Hartman A, Srivastava A, Molla G, Madad S, Fernandez-Granda C, Satija R. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat Biotechnol 2024; 42:293-304. [PMID: 37231261 PMCID: PMC10928517 DOI: 10.1038/s41587-023-01767-y] [Citation(s) in RCA: 346] [Impact Index Per Article: 346.0] [Received: 03/08/2022] [Accepted: 03/28/2023] [Indexed: 05/27/2023]
Abstract
Mapping single-cell sequencing profiles to comprehensive reference datasets provides a powerful alternative to unsupervised analysis. However, most reference datasets are constructed from single-cell RNA-sequencing data and cannot be used to annotate datasets that do not measure gene expression. Here we introduce 'bridge integration', a method to integrate single-cell datasets across modalities using a multiomic dataset as a molecular bridge. Each cell in the multiomic dataset constitutes an element in a 'dictionary', which is used to reconstruct unimodal datasets and transform them into a shared space. Our procedure accurately integrates transcriptomic data with independent single-cell measurements of chromatin accessibility, histone modifications, DNA methylation and protein levels. Moreover, we demonstrate how dictionary learning can be combined with sketching techniques to improve computational scalability and harmonize 8.6 million human immune cell profiles from sequencing and mass cytometry experiments. Our approach, implemented in version 5 of our Seurat toolkit (http://www.satijalab.org/seurat), broadens the utility of single-cell reference datasets and facilitates comparisons across diverse molecular modalities.
Affiliation(s)
- Yuhan Hao
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
- New York Genome Center, New York, NY, USA
| | - Tim Stuart
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
- New York Genome Center, New York, NY, USA
| | - Madeline H Kowalski
- New York Genome Center, New York, NY, USA
- Institute for System Genetics, NYU Langone Medical Center, New York, NY, USA
| | - Saket Choudhary
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
- New York Genome Center, New York, NY, USA
| | - Paul Hoffman
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
| | - Austin Hartman
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
| | - Avi Srivastava
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
- New York Genome Center, New York, NY, USA
| | | | - Shaista Madad
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
- New York Genome Center, New York, NY, USA
| | - Carlos Fernandez-Granda
- Center for Data Science, New York University, New York, NY, USA
- Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
| | - Rahul Satija
- Center for Genomics and Systems Biology, New York University, New York, NY, USA.
- New York Genome Center, New York, NY, USA.
| |
|
19
|
Andreatta M, Hérault L, Gueguen P, Gfeller D, Berenstein AJ, Carmona SJ. Semi-supervised integration of single-cell transcriptomics data. Nat Commun 2024; 15:872. [PMID: 38287014 PMCID: PMC10825117 DOI: 10.1038/s41467-024-45240-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 01/16/2024] [Indexed: 01/31/2024] Open
Abstract
Batch effects in single-cell RNA-seq data pose a significant challenge for comparative analyses across samples, individuals, and conditions. Although batch effect correction methods are routinely applied, data integration often leads to overcorrection and can result in the loss of biological variability. In this work we present STACAS, a batch correction method for scRNA-seq that leverages prior knowledge on cell types to preserve biological variability upon integration. Through an open-source benchmark, we show that semi-supervised STACAS outperforms state-of-the-art unsupervised methods, as well as supervised methods such as scANVI and scGen. STACAS scales well to large datasets and is robust to incomplete and imprecise input cell type labels, which are commonly encountered in real-life integration tasks. We argue that the incorporation of prior cell type information should be a common practice in single-cell data integration, and we provide a flexible framework for semi-supervised batch effect correction.
Affiliation(s)
- Massimo Andreatta
- Department of Oncology, Lausanne Branch, Ludwig Institute for Cancer Research, CHUV and University of Lausanne, 1011, Lausanne, Switzerland
- AGORA Cancer Research Center, 1005, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Léonard Hérault
- Department of Oncology, Lausanne Branch, Ludwig Institute for Cancer Research, CHUV and University of Lausanne, 1011, Lausanne, Switzerland
- AGORA Cancer Research Center, 1005, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Paul Gueguen
- Department of Oncology, Lausanne Branch, Ludwig Institute for Cancer Research, CHUV and University of Lausanne, 1011, Lausanne, Switzerland
- AGORA Cancer Research Center, 1005, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - David Gfeller
- Department of Oncology, Lausanne Branch, Ludwig Institute for Cancer Research, CHUV and University of Lausanne, 1011, Lausanne, Switzerland
- AGORA Cancer Research Center, 1005, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Ariel J Berenstein
- Laboratorio de Biología Molecular, División Patología, Instituto Multidisciplinario de Investigaciones en Patologías Pediátricas (IMIPP), CONICET-GCBA, Buenos Aires, C1425EFD, Argentina
| | - Santiago J Carmona
- Department of Oncology, Lausanne Branch, Ludwig Institute for Cancer Research, CHUV and University of Lausanne, 1011, Lausanne, Switzerland.
- AGORA Cancer Research Center, 1005, Lausanne, Switzerland.
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland.
| |
|
20
|
Petersen C, Mucke L, Corces MR. CHOIR improves significance-based detection of cell types and states from single-cell data. bioRxiv 2024:2024.01.18.576317. [PMID: 38328105 PMCID: PMC10849522 DOI: 10.1101/2024.01.18.576317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Clustering is a critical step in the analysis of single-cell data, as it enables the discovery and characterization of putative cell types and states. However, most popular clustering tools do not subject clustering results to statistical inference testing, leading to risks of overclustering or underclustering data and often resulting in ineffective identification of cell types with widely differing prevalence. To address these challenges, we present CHOIR (clustering hierarchy optimization by iterative random forests), which applies a framework of random forest classifiers and permutation tests across a hierarchical clustering tree to statistically determine which clusters represent distinct populations. We demonstrate the enhanced performance of CHOIR through extensive benchmarking against 14 existing clustering methods across 100 simulated and 4 real single-cell RNA-seq, ATAC-seq, spatial transcriptomic, and multi-omic datasets. CHOIR can be applied to any single-cell data type and provides a flexible, scalable, and robust solution to the important challenge of identifying biologically relevant cell groupings within heterogeneous single-cell data.
Affiliation(s)
- Cathrine Petersen
- Gladstone Institute of Neurological Disease, Gladstone Institutes, San Francisco, CA, USA
- Neuroscience Graduate Program, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Lennart Mucke
- Gladstone Institute of Neurological Disease, Gladstone Institutes, San Francisco, CA, USA
- Neuroscience Graduate Program, University of California, San Francisco, San Francisco, CA 94158, USA
- Department of Neurology and Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
| | - M. Ryan Corces
- Gladstone Institute of Neurological Disease, Gladstone Institutes, San Francisco, CA, USA
- Neuroscience Graduate Program, University of California, San Francisco, San Francisco, CA 94158, USA
- Department of Neurology and Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
| |
|
21
|
Neilson LJ, Cartwright D, Risteli M, Jokinen EM, McGarry L, Sandvik T, Nikolatou K, Hodge K, Atkinson S, Vias M, Kay EJ, Brenton JD, Carlin LM, Bryant DM, Salo T, Zanivan S. Omentum-derived matrix enables the study of metastatic ovarian cancer and stromal cell functions in a physiologically relevant environment. Matrix Biol Plus 2023; 19-20:100136. [PMID: 38223308 PMCID: PMC10784634 DOI: 10.1016/j.mbplus.2023.100136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2023] [Revised: 10/20/2023] [Accepted: 11/12/2023] [Indexed: 01/16/2024] Open
Abstract
High-grade serous (HGS) ovarian cancer is the most lethal gynaecological disease in the world and metastasis is a major cause. The omentum is the preferential metastatic site in HGS ovarian cancer patients, and in vitro models that recapitulate the original environment of this organ at the cellular and molecular level are being developed to study basic mechanisms that underpin this disease. The tumour extracellular matrix (ECM) plays active roles in HGS ovarian cancer pathology and response to therapy. However, most of the current in vitro models use matrices of animal origin that do not recapitulate the complexity of the tumour ECM in patients. Here, we have developed omentum gel (OmGel), a matrix made from tumour-associated omental tissue of HGS ovarian cancer patients that has unprecedented similarity to the ECM of HGS omental tumours and is simple to prepare. When used in 2D and 3D in vitro assays to assess cancer cell functions relevant to metastatic ovarian cancer, OmGel performs as well as or better than the widely used Matrigel and does not induce additional phenotypic changes to ovarian cancer cells. Surprisingly, OmGel promotes pronounced morphological changes in cancer-associated fibroblasts (CAFs). These changes were associated with the upregulation of proteins that define subsets of CAFs in tumour patient samples, highlighting the importance of using clinically and physiologically relevant matrices for in vitro studies. Hence, OmGel provides a step forward to study the biology of HGS omental metastasis. Metastases in the omentum are also typical of other cancer types, particularly gastric cancer, implying the relevance of OmGel to study the biology of other highly lethal cancers.
Affiliation(s)
| | - Douglas Cartwright
- Cancer Research UK Scotland Institute, Glasgow, UK
- School of Cancer Sciences, University of Glasgow, Glasgow, UK
| | - Maija Risteli
- Research Unit of Population Health, Medical Research Center Oulu, University of Oulu and Oulu University Hospital, Oulu, Finland
| | - Elina M. Jokinen
- Department of Bacteriology and Immunology, Translational Immunology Research Program, University of Helsinki, Finland
| | - Lynn McGarry
- Cancer Research UK Scotland Institute, Glasgow, UK
| | - Toni Sandvik
- Research Unit of Population Health, Medical Research Center Oulu, University of Oulu and Oulu University Hospital, Oulu, Finland
| | - Konstantina Nikolatou
- Cancer Research UK Scotland Institute, Glasgow, UK
- School of Cancer Sciences, University of Glasgow, Glasgow, UK
| | - Kelly Hodge
- Cancer Research UK Scotland Institute, Glasgow, UK
| | | | - Maria Vias
- Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, UK
| | - Emily J. Kay
- Cancer Research UK Scotland Institute, Glasgow, UK
| | - James D. Brenton
- Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, UK
| | - Leo M. Carlin
- Cancer Research UK Scotland Institute, Glasgow, UK
- School of Cancer Sciences, University of Glasgow, Glasgow, UK
| | - David M. Bryant
- Cancer Research UK Scotland Institute, Glasgow, UK
- School of Cancer Sciences, University of Glasgow, Glasgow, UK
| | - Tuula Salo
- Research Unit of Population Health, Medical Research Center Oulu, University of Oulu and Oulu University Hospital, Oulu, Finland
- Department of Pathology, University of Helsinki, Helsinki, Finland
- Department of Oral and Maxillofacial Diseases, Clinicum, Faculty of Medicine, University of Helsinki, Helsinki, Finland
| | - Sara Zanivan
- Cancer Research UK Scotland Institute, Glasgow, UK
- School of Cancer Sciences, University of Glasgow, Glasgow, UK
| |
|
22
|
Kwon J, Kang J, Jo A, Seo K, An D, Baykan MY, Lee JH, Kim N, Eum HH, Hwang S, Lee JM, Park WY, An HJ, Lee HO, Park JE, Choi JK. Single-cell mapping of combinatorial target antigens for CAR switches using logic gates. Nat Biotechnol 2023; 41:1593-1605. [PMID: 36797491 DOI: 10.1038/s41587-023-01686-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 05/13/2022] [Accepted: 01/20/2023] [Indexed: 02/18/2023]
Abstract
Identification of optimal target antigens that distinguish cancer cells from normal surrounding tissue cells remains a key challenge in chimeric antigen receptor (CAR) cell therapy for tumors with intratumoral heterogeneity. In this study, we dissected tissue complexity to the level of individual cells through the construction of a single-cell expression atlas that integrates ~1.4 million tumor, tumor-infiltrating normal and reference normal cells from 412 tumors and 12 normal organs. We used a two-step screening method using random forest and convolutional neural networks to select gene pairs that contribute most to discrimination between individual malignant and normal cells. Tumor coverage and specificity are evaluated for the AND, OR and NOT logic gates based on the combinatorial expression pattern of the pairing genes across individual single cells. Single-cell transcriptome-coupled epitope profiling validates the AND, OR and NOT switch targets identified in ovarian cancer and colorectal cancer.
Affiliation(s)
- Joonha Kwon
- Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
| | - Junho Kang
- Graduate School of Medical Science and Engineering, KAIST, Daejeon, Republic of Korea
| | - Areum Jo
- Department of Microbiology, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
- Department of Biomedicine and Health Sciences, Graduate School, The Catholic University of Korea, Seoul, Republic of Korea
| | - Kayoung Seo
- Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
| | - Dohyeon An
- Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
| | - Mert Yakup Baykan
- Graduate School of Medical Science and Engineering, KAIST, Daejeon, Republic of Korea
| | - Jun Hyeong Lee
- Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
| | - Nayoung Kim
- Department of Microbiology, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
- Department of Biomedicine and Health Sciences, Graduate School, The Catholic University of Korea, Seoul, Republic of Korea
| | - Hye Hyeon Eum
- Department of Microbiology, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
- Department of Biomedicine and Health Sciences, Graduate School, The Catholic University of Korea, Seoul, Republic of Korea
| | - Sohyun Hwang
- Department of Pathology, CHA Bundang Medical Center, CHA University, Seongnam-si, Republic of Korea
- Department of Biomedical Science, CHA University, Pocheon-si, Republic of Korea
| | - Ji Min Lee
- CHA Advanced Research Institute, CHA Bundang Medical Center, Seongnam-si, Republic of Korea
| | - Woong-Yang Park
- Samsung Genome Institute, Samsung Medical Center, Seoul, Republic of Korea
| | - Hee Jung An
- Department of Pathology, CHA Bundang Medical Center, CHA University, Seongnam-si, Republic of Korea.
| | - Hae-Ock Lee
- Department of Microbiology, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea.
- Department of Biomedicine and Health Sciences, Graduate School, The Catholic University of Korea, Seoul, Republic of Korea.
| | - Jong-Eun Park
- Graduate School of Medical Science and Engineering, KAIST, Daejeon, Republic of Korea.
| | - Jung Kyoon Choi
- Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea.
- Penta Medix Co., Ltd., Seongnam-si, Republic of Korea.
| |
|
23
|
Puccio S, Grillo G, Alvisi G, Scirgolea C, Galletti G, Mazza EMC, Consiglio A, De Simone G, Licciulli F, Lugli E. CRUSTY: a versatile web platform for the rapid analysis and visualization of high-dimensional flow cytometry data. Nat Commun 2023; 14:5102. [PMID: 37666818 PMCID: PMC10477295 DOI: 10.1038/s41467-023-40790-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/16/2022] [Accepted: 08/10/2023] [Indexed: 09/06/2023] Open
Abstract
Flow cytometry (FCM) can investigate dozens of parameters from millions of cells and hundreds of specimens in a short time and at a reasonable cost, but the amount of data that is generated is considerable. Computational approaches are useful to identify novel subpopulations and molecular biomarkers, but generally require deep expertise in bioinformatics and the use of different platforms. To overcome these limitations, we introduce CRUSTY, an interactive, user-friendly webtool incorporating the most popular algorithms for FCM data analysis, and capable of visualizing graphical and tabular results and automatically generating publication-quality figures within minutes. CRUSTY also hosts an interactive interface for the exploration of results in real time. Thus, CRUSTY enables a large number of users to mine complex datasets and reduce the time required for data exploration and interpretation. CRUSTY is accessible at https://crusty.humanitas.it/.
Affiliation(s)
- Simone Puccio
- Laboratory of Translational Immunology, IRCCS Humanitas Research Hospital, via Manzoni 56, 20089, Rozzano, Milan, Italy.
- Institute of Genetic and Biomedical Research, UoS Milan, National Research Council, via Manzoni 56, 20089, Rozzano, Milan, Italy.
| | - Giorgio Grillo
- Institute for Biomedical Technologies, National Research Council, via Amendola 122/D, 70126, Bari, Italy
| | - Giorgia Alvisi
- Laboratory of Translational Immunology, IRCCS Humanitas Research Hospital, via Manzoni 56, 20089, Rozzano, Milan, Italy
| | - Caterina Scirgolea
- Laboratory of Translational Immunology, IRCCS Humanitas Research Hospital, via Manzoni 56, 20089, Rozzano, Milan, Italy
| | - Giovanni Galletti
- Laboratory of Translational Immunology, IRCCS Humanitas Research Hospital, via Manzoni 56, 20089, Rozzano, Milan, Italy
- School of Biological Sciences, Department of Molecular Biology, University of California San Diego, San Diego, CA, USA
| | - Emilia Maria Cristina Mazza
- Laboratory of Translational Immunology, IRCCS Humanitas Research Hospital, via Manzoni 56, 20089, Rozzano, Milan, Italy
| | - Arianna Consiglio
- Institute for Biomedical Technologies, National Research Council, via Amendola 122/D, 70126, Bari, Italy
| | - Gabriele De Simone
- Flow Cytometry Core, IRCCS Humanitas Research Hospital, via Manzoni 56, 20089, Rozzano, Milan, Italy
| | - Flavio Licciulli
- Institute for Biomedical Technologies, National Research Council, via Amendola 122/D, 70126, Bari, Italy
| | - Enrico Lugli
- Laboratory of Translational Immunology, IRCCS Humanitas Research Hospital, via Manzoni 56, 20089, Rozzano, Milan, Italy.
| |
|
24
|
DeMeo B, Berger B. SCA: recovering single-cell heterogeneity through information-based dimensionality reduction. Genome Biol 2023; 24:195. [PMID: 37626411 PMCID: PMC10464206 DOI: 10.1186/s13059-023-02998-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2022] [Accepted: 06/28/2023] [Indexed: 08/27/2023] Open
Abstract
Dimensionality reduction summarizes the complex transcriptomic landscape of single-cell datasets for downstream analyses. Current approaches favor large cellular populations defined by many genes, at the expense of smaller and more subtly defined populations. Here, we present surprisal component analysis (SCA), a technique that newly leverages the information-theoretic notion of surprisal for dimensionality reduction to promote more meaningful signal extraction. For example, SCA uncovers clinically important cytotoxic T-cell subpopulations that are indistinguishable using existing pipelines. We also demonstrate that SCA substantially improves downstream imputation. SCA's efficient information-theoretic paradigm has broad applications to the study of complex biological tissues in health and disease.
Affiliation(s)
- Benjamin DeMeo
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, 02139, MA, USA
- Department of Biomedical Informatics, Harvard University, Cambridge, 02138, MA, USA
- Department of Mathematics, MIT, Cambridge, 02139, MA, USA
| | - Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, 02139, MA, USA.
- Department of Mathematics, MIT, Cambridge, 02139, MA, USA.
| |
|
25
|
Kolabas ZI, Kuemmerle LB, Perneczky R, Förstera B, Ulukaya S, Ali M, Kapoor S, Bartos LM, Büttner M, Caliskan OS, Rong Z, Mai H, Höher L, Jeridi D, Molbay M, Khalin I, Deligiannis IK, Negwer M, Roberts K, Simats A, Carofiglio O, Todorov MI, Horvath I, Ozturk F, Hummel S, Biechele G, Zatcepin A, Unterrainer M, Gnörich J, Roodselaar J, Shrouder J, Khosravani P, Tast B, Richter L, Díaz-Marugán L, Kaltenecker D, Lux L, Chen Y, Zhao S, Rauchmann BS, Sterr M, Kunze I, Stanic K, Kan VWY, Besson-Girard S, Katzdobler S, Palleis C, Schädler J, Paetzold JC, Liebscher S, Hauser AE, Gokce O, Lickert H, Steinke H, Benakis C, Braun C, Martinez-Jimenez CP, Buerger K, Albert NL, Höglinger G, Levin J, Haass C, Kopczak A, Dichgans M, Havla J, Kümpfel T, Kerschensteiner M, Schifferer M, Simons M, Liesz A, Krahmer N, Bayraktar OA, Franzmeier N, Plesnila N, Erener S, Puelles VG, Delbridge C, Bhatia HS, Hellal F, Elsner M, Bechmann I, Ondruschka B, Brendel M, Theis FJ, Erturk A. Distinct molecular profiles of skull bone marrow in health and neurological disorders. Cell 2023; 186:3706-3725.e29. [PMID: 37562402 PMCID: PMC10443631 DOI: 10.1016/j.cell.2023.07.009] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 12/06/2021] [Revised: 04/24/2023] [Accepted: 07/07/2023] [Indexed: 08/12/2023]
Abstract
The bone marrow in the skull is important for shaping immune responses in the brain and meninges, but its molecular makeup among bones and relevance in human diseases remain unclear. Here, we show that the mouse skull has the most distinct transcriptomic profile compared with other bones in states of health and injury, characterized by a late-stage neutrophil phenotype. In humans, proteome analysis reveals that the skull marrow is the most distinct, with differentially expressed neutrophil-related pathways and a unique synaptic protein signature. 3D imaging demonstrates the structural and cellular details of human skull-meninges connections (SMCs) compared with veins. Last, using translocator protein positron emission tomography (TSPO-PET) imaging, we show that the skull bone marrow reflects inflammatory brain responses with a disease-specific spatial distribution in patients with various neurological disorders. The unique molecular profile and anatomical and functional connections of the skull show its potential as a site for diagnosing, monitoring, and treating brain diseases.
Affiliation(s)
- Zeynep Ilgin Kolabas
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Center, Neuherberg, Munich, Germany; Institute for Stroke and Dementia Research, LMU University Hospital, Ludwig-Maximilians University Munich, Munich, Germany; Graduate School of Systemic Neurosciences (GSN), Munich, Germany
| | - Louis B Kuemmerle
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Center, Neuherberg, Munich, Germany; Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
| | - Robert Perneczky
- Division of Mental Health in Older Adults and Alzheimer Therapy and Research Center, Department of Psychiatry and Psychotherapy, University Hospital, Ludwig Maximilian University Munich, 80336 Munich, Germany; German Center for Neurodegenerative Diseases (DZNE) Munich, Munich, Germany; Ageing Epidemiology (AGE) Research Unit, School of Public Health, Imperial College London, London, UK; Munich Cluster for Systems Neurology (SyNergy), Munich, Germany; Sheffield Institute for Translational Neuroscience, University of Sheffield, Sheffield, UK
| | - Benjamin Förstera
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Center, Neuherberg, Munich, Germany; Institute for Stroke and Dementia Research, LMU University Hospital, Ludwig-Maximilians University Munich, Munich, Germany
| | - Selin Ulukaya
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Center, Neuherberg, Munich, Germany
| | - Mayar Ali
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Center, Neuherberg, Munich, Germany; Graduate School of Systemic Neurosciences (GSN), Munich, Germany; Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
| | - Saketh Kapoor
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Center, Neuherberg, Munich, Germany
| | - Laura M Bartos
- Department of Nuclear Medicine, University Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
| | - Maren Büttner
- Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
| | - Ozum Sehnaz Caliskan
- Institute for Diabetes and Obesity, Helmholtz Center Munich and German Center for Diabetes Research (DZD), 85764 Neuherberg, Germany
| | - Zhouyi Rong
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Center, Neuherberg, Munich, Germany; Institute for Stroke and Dementia Research, LMU University Hospital, Ludwig-Maximilians University Munich, Munich, Germany; Munich Medical Research School (MMRS), 80336 Munich, Germany
| | - Hongcheng Mai
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Center, Neuherberg, Munich, Germany; Institute for Stroke and Dementia Research, LMU University Hospital, Ludwig-Maximilians University Munich, Munich, Germany; Munich Medical Research School (MMRS), 80336 Munich, Germany
| | - Luciano Höher
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Center, Neuherberg, Munich, Germany
| | - Denise Jeridi
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Center, Neuherberg, Munich, Germany
| | - Muge Molbay
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Center, Neuherberg, Munich, Germany
| | - Igor Khalin
- Institute for Stroke and Dementia Research, LMU University Hospital, Ludwig-Maximilians University Munich, Munich, Germany
| | | | - Moritz Negwer
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Center, Neuherberg, Munich, Germany
| | | | - Alba Simats
- Institute for Stroke and Dementia Research, LMU University Hospital, Ludwig-Maximilians University Munich, Munich, Germany
| | - Olga Carofiglio
- Institute for Stroke and Dementia Research, LMU University Hospital, Ludwig-Maximilians University Munich, Munich, Germany
| | - Mihail I Todorov
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Center, Neuherberg, Munich, Germany; Institute for Stroke and Dementia Research, LMU University Hospital, Ludwig-Maximilians University Munich, Munich, Germany
| | - Izabela Horvath
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Center, Neuherberg, Munich, Germany; School of Computation, Information and Technology (CIT), TUM, Boltzmannstr. 3, 85748 Garching, Germany
| | - Furkan Ozturk
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Center, Neuherberg, Munich, Germany
| | - Selina Hummel
- German Center for Neurodegenerative Diseases (DZNE) Munich, Munich, Germany; Department of Nuclear Medicine, University Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
| | - Gloria Biechele
- Department of Nuclear Medicine, University Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
| | - Artem Zatcepin
- German Center for Neurodegenerative Diseases (DZNE) Munich, Munich, Germany; Department of Nuclear Medicine, University Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
| | - Marcus Unterrainer
- Department of Nuclear Medicine, University Hospital, Ludwig-Maximilians-Universität München, Munich, Germany; Department of Radiology, University Hospital, LMU Munich, Munich, Germany
| | - Johannes Gnörich
- Department of Nuclear Medicine, University Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
| | - Jay Roodselaar
- Charité - Universitätsmedizin Berlin, Department of Rheumatology and Clinical Immunology, Berlin, Germany; Immune Dynamics, Deutsches Rheuma-Forschungszentrum (DRFZ), a Leibniz Institute, Berlin, Germany
| | - Joshua Shrouder
- Institute for Stroke and Dementia Research, LMU University Hospital, Ludwig-Maximilians University Munich, Munich, Germany
| | - Pardis Khosravani
- Biomedical Center (BMC), Core Facility Flow Cytometry, Faculty of Medicine, LMU Munich, Munich, Germany
| | - Benjamin Tast
- Biomedical Center (BMC), Core Facility Flow Cytometry, Faculty of Medicine, LMU Munich, Munich, Germany
| | - Lisa Richter
- Biomedical Center (BMC), Core Facility Flow Cytometry, Faculty of Medicine, LMU Munich, Munich, Germany
| | - Laura Díaz-Marugán
- Institute for Stroke and Dementia Research, LMU University Hospital, Ludwig-Maximilians University Munich, Munich, Germany
| | - Doris Kaltenecker
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Center, Neuherberg, Munich, Germany; Institute for Diabetes and Cancer, Helmholtz Munich, Munich, Germany
| | - Laurin Lux
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Center, Neuherberg, Munich, Germany
| | - Ying Chen
- Institute for Stroke and Dementia Research, LMU University Hospital, Ludwig-Maximilians University Munich, Munich, Germany
| | - Shan Zhao
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Center, Neuherberg, Munich, Germany; Institute for Stroke and Dementia Research, LMU University Hospital, Ludwig-Maximilians University Munich, Munich, Germany
| | - Boris-Stephan Rauchmann
- Division of Mental Health in Older Adults and Alzheimer Therapy and Research Center, Department of Psychiatry and Psychotherapy, University Hospital, Ludwig Maximilian University Munich, 80336 Munich, Germany; Sheffield Institute for Translational Neuroscience, University of Sheffield, Sheffield, UK; Institute of Neuroradiology, University Hospital LMU, Munich, Germany
| | - Michael Sterr
- Institute of Diabetes and Regeneration Research, Helmholtz Diabetes Center, Helmholtz Zentrum München, Neuherberg, Germany; Institute of Stem Cell Research, Helmholtz Zentrum München, Neuherberg, Germany
| | - Ines Kunze
- Institute of Diabetes and Regeneration Research, Helmholtz Diabetes Center, Helmholtz Zentrum München, Neuherberg, Germany; Institute of Stem Cell Research, Helmholtz Zentrum München, Neuherberg, Germany
| | - Karen Stanic
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Center, Neuherberg, Munich, Germany; Institute for Stroke and Dementia Research, LMU University Hospital, Ludwig-Maximilians University Munich, Munich, Germany
| | - Vanessa W Y Kan
- Institute of Clinical Neuroimmunology, University Hospital Munich, Ludwig-Maximilians University Munich, Munich, Germany
| | - Simon Besson-Girard
- Institute for Stroke and Dementia Research, LMU University Hospital, Ludwig-Maximilians University Munich, Munich, Germany; Graduate School of Systemic Neurosciences (GSN), Munich, Germany
| | - Sabrina Katzdobler
- German Center for Neurodegenerative Diseases (DZNE) Munich, Munich, Germany; Department of Neurology, Ludwig-Maximilians-Universität München, Munich, Germany
| | - Carla Palleis
- German Center for Neurodegenerative Diseases (DZNE) Munich, Munich, Germany; Department of Neurology, Ludwig-Maximilians-Universität München, Munich, Germany
| | - Julia Schädler
- Institute of Legal Medicine, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Johannes C Paetzold
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Center, Neuherberg, Munich, Germany; Department of Computing, Imperial College London, London, UK
| | - Sabine Liebscher
- Munich Cluster for Systems Neurology (SyNergy), Munich, Germany; Institute of Clinical Neuroimmunology, University Hospital Munich, Ludwig-Maximilians University Munich, Munich, Germany; Biomedical Center (BMC), Medical Faculty, Ludwig-Maximilians Universität Munich, Munich, Germany
| | - Anja E Hauser
- Charité - Universitätsmedizin Berlin, Department of Rheumatology and Clinical Immunology, Berlin, Germany; Immune Dynamics, Deutsches Rheuma-Forschungszentrum (DRFZ), a Leibniz Institute, Berlin, Germany
| | - Ozgun Gokce
- Institute for Stroke and Dementia Research, LMU University Hospital, Ludwig-Maximilians University Munich, Munich, Germany; Munich Cluster for Systems Neurology (SyNergy), Munich, Germany
| | - Heiko Lickert
- Institute of Diabetes and Regeneration Research, Helmholtz Diabetes Center, Helmholtz Zentrum München, Neuherberg, Germany; Institute of Stem Cell Research, Helmholtz Zentrum München, Neuherberg, Germany; TUM School of Medicine, Technical University of Munich, Munich, Germany
| | - Hanno Steinke
- Institute of Anatomy, University of Leipzig, 04109 Leipzig, Germany
| | - Corinne Benakis
- Institute for Stroke and Dementia Research, LMU University Hospital, Ludwig-Maximilians University Munich, Munich, Germany
| | - Christian Braun
- Institute of Legal Medicine, Faculty of Medicine, LMU Munich, Germany
| | - Celia P Martinez-Jimenez
- Helmholtz Pioneer Campus (HPC), Helmholtz Munich, Neuherberg, Germany; TUM School of Medicine, Technical University of Munich, Munich, Germany
| | - Katharina Buerger
- Institute for Stroke and Dementia Research, LMU University Hospital, Ludwig-Maximilians University Munich, Munich, Germany; German Center for Neurodegenerative Diseases (DZNE) Munich, Munich, Germany
| | - Nathalie L Albert
- Department of Nuclear Medicine, University Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
| | - Günter Höglinger
- German Center for Neurodegenerative Diseases (DZNE) Munich, Munich, Germany; Department of Neurology, Ludwig-Maximilians-Universität München, Munich, Germany
| | - Johannes Levin
- German Center for Neurodegenerative Diseases (DZNE) Munich, Munich, Germany; Munich Cluster for Systems Neurology (SyNergy), Munich, Germany; Department of Neurology, Ludwig-Maximilians-Universität München, Munich, Germany
| | - Christian Haass
- German Center for Neurodegenerative Diseases (DZNE) Munich, Munich, Germany; Munich Cluster for Systems Neurology (SyNergy), Munich, Germany; Metabolic Biochemistry, Biomedical Center (BMC), Faculty of Medicine, Ludwig-Maximilians-Universität München, Munich, Germany
| | - Anna Kopczak
- Institute for Stroke and Dementia Research, LMU University Hospital, Ludwig-Maximilians University Munich, Munich, Germany
| | - Martin Dichgans
- Institute for Stroke and Dementia Research, LMU University Hospital, Ludwig-Maximilians University Munich, Munich, Germany; German Center for Neurodegenerative Diseases (DZNE) Munich, Munich, Germany; Munich Cluster for Systems Neurology (SyNergy), Munich, Germany
| | - Joachim Havla
- Institute of Clinical Neuroimmunology, University Hospital Munich, Ludwig-Maximilians University Munich, Munich, Germany; Biomedical Center (BMC), Medical Faculty, Ludwig-Maximilians Universität Munich, Munich, Germany
| | - Tania Kümpfel
- Institute of Clinical Neuroimmunology, University Hospital Munich, Ludwig-Maximilians University Munich, Munich, Germany; Biomedical Center (BMC), Medical Faculty, Ludwig-Maximilians Universität Munich, Munich, Germany
| | - Martin Kerschensteiner
- Munich Cluster for Systems Neurology (SyNergy), Munich, Germany; Institute of Clinical Neuroimmunology, University Hospital Munich, Ludwig-Maximilians University Munich, Munich, Germany; Biomedical Center (BMC), Medical Faculty, Ludwig-Maximilians Universität Munich, Munich, Germany
| | - Martina Schifferer
- German Center for Neurodegenerative Diseases (DZNE) Munich, Munich, Germany; Munich Cluster for Systems Neurology (SyNergy), Munich, Germany
| | - Mikael Simons
- German Center for Neurodegenerative Diseases (DZNE) Munich, Munich, Germany; Munich Cluster for Systems Neurology (SyNergy), Munich, Germany
| | - Arthur Liesz
- Institute for Stroke and Dementia Research, LMU University Hospital, Ludwig-Maximilians University Munich, Munich, Germany; Graduate School of Systemic Neurosciences (GSN), Munich, Germany; Munich Cluster for Systems Neurology (SyNergy), Munich, Germany
| | - Natalie Krahmer
- Institute for Diabetes and Obesity, Helmholtz Center Munich and German Center for Diabetes Research (DZD), 85764 Neuherberg, Germany
| | | | - Nicolai Franzmeier
- Institute for Stroke and Dementia Research, LMU University Hospital, Ludwig-Maximilians University Munich, Munich, Germany
| | - Nikolaus Plesnila
- Institute for Stroke and Dementia Research, LMU University Hospital, Ludwig-Maximilians University Munich, Munich, Germany; Munich Cluster for Systems Neurology (SyNergy), Munich, Germany
| | - Suheda Erener
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Center, Neuherberg, Munich, Germany
| | - Victor G Puelles
- III. Department of Medicine, University Medical Center Hamburg-Eppendorf, Hamburg, Germany; Hamburg Center for Kidney Health (HCKH), University Medical Center Hamburg-Eppendorf, Hamburg, Germany; Department of Clinical Medicine, Aarhus University, Aarhus, Denmark; Department of Pathology, Aarhus University Hospital, Aarhus, Denmark
| | - Claire Delbridge
- Institute of Pathology, Department of Neuropathology, Technical University Munich, TUM School of Medicine, Munich, Germany
| | - Harsharan Singh Bhatia
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Center, Neuherberg, Munich, Germany; Institute for Stroke and Dementia Research, LMU University Hospital, Ludwig-Maximilians University Munich, Munich, Germany
| | - Farida Hellal
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Center, Neuherberg, Munich, Germany; Institute for Stroke and Dementia Research, LMU University Hospital, Ludwig-Maximilians University Munich, Munich, Germany; Munich Cluster for Systems Neurology (SyNergy), Munich, Germany
| | - Markus Elsner
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Center, Neuherberg, Munich, Germany
| | - Ingo Bechmann
- Institute of Anatomy, University of Leipzig, 04109 Leipzig, Germany
| | - Benjamin Ondruschka
- Institute of Legal Medicine, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Matthias Brendel
- German Center for Neurodegenerative Diseases (DZNE) Munich, Munich, Germany; Munich Cluster for Systems Neurology (SyNergy), Munich, Germany; Department of Nuclear Medicine, University Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
| | - Fabian J Theis
- Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany; Department of Mathematics, Technische Universität München, Garching bei München, Germany
| | - Ali Erturk
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Center, Neuherberg, Munich, Germany; Institute for Stroke and Dementia Research, LMU University Hospital, Ludwig-Maximilians University Munich, Munich, Germany; Graduate School of Systemic Neurosciences (GSN), Munich, Germany; Munich Cluster for Systems Neurology (SyNergy), Munich, Germany.
| |
|
26
|
Hepkema J, Lee NK, Stewart BJ, Ruangroengkulrith S, Charoensawan V, Clatworthy MR, Hemberg M. Predicting the impact of sequence motifs on gene regulation using single-cell data. Genome Biol 2023; 24:189. [PMID: 37582793 PMCID: PMC10426127 DOI: 10.1186/s13059-023-03021-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2022] [Accepted: 07/21/2023] [Indexed: 08/17/2023] Open
Abstract
The binding of transcription factors at proximal promoters and distal enhancers is central to gene regulation. Identifying regulatory motifs and quantifying their impact on expression remains challenging. Using a convolutional neural network trained on single-cell data, we infer putative regulatory motifs and cell type-specific importance. Our model, scover, explains 29% of the variance in gene expression in multiple mouse tissues. Applying scover to distal enhancers identified using scATAC-seq from the developing human brain, we identify cell type-specific motif activities in distal enhancers. Scover can identify regulatory motifs and their importance from single-cell data where all parameters and outputs are easily interpretable.
Affiliation(s)
- Jacob Hepkema
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
| | - Nicholas Keone Lee
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- The Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK
| | - Benjamin J Stewart
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Molecular Immunity Unit, Department of Medicine, University of Cambridge, Cambridge, CB2 0QQ, UK
- Cambridge University Hospitals NHS Foundation Trust and NIHR Cambridge Biomedical Research Centre, Cambridge, CB2 0QQ, UK
| | - Siwat Ruangroengkulrith
- Department of Biochemistry, Faculty of Science, Mahidol University, Bangkok, 10400, Thailand
| | - Varodom Charoensawan
- Department of Biochemistry, Faculty of Science, Mahidol University, Bangkok, 10400, Thailand
- Integrative Computational BioScience (ICBS) Center, Mahidol University, Nakhon Pathom, 7310, Thailand
- Systems Biology of Diseases Research Unit, Faculty of Science, Mahidol University, Bangkok, 10400, Thailand
| | - Menna R Clatworthy
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Molecular Immunity Unit, Department of Medicine, University of Cambridge, Cambridge, CB2 0QQ, UK
- Cambridge University Hospitals NHS Foundation Trust and NIHR Cambridge Biomedical Research Centre, Cambridge, CB2 0QQ, UK
| | - Martin Hemberg
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK.
- The Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK.
- Gene Lay Institute of Immunology and Inflammation, Brigham and Women's Hospital, Massachusetts General Hospital, and Harvard Medical School, Boston, MA, 02115, USA.
| |
|
27
|
Woo T, Liang X, Evans DA, Fernandez O, Kretschmer F, Reiter S, Laurent G. The dynamics of pattern matching in camouflaging cuttlefish. Nature 2023:10.1038/s41586-023-06259-2. [PMID: 37380772 PMCID: PMC10322717 DOI: 10.1038/s41586-023-06259-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/29/2021] [Accepted: 05/22/2023] [Indexed: 06/30/2023]
Abstract
Many cephalopods escape detection using camouflage [1]. This behaviour relies on a visual assessment of the surroundings, on an interpretation of visual-texture statistics [2-4] and on matching these statistics using millions of skin chromatophores that are controlled by motoneurons located in the brain [5-7]. Analysis of cuttlefish images proposed that camouflage patterns are low dimensional and categorizable into three pattern classes, built from a small repertoire of components [8-11]. Behavioural experiments also indicated that, although camouflage requires vision, its execution does not require feedback [5,12,13], suggesting that motion within skin-pattern space is stereotyped and lacks the possibility of correction. Here, using quantitative methods [14], we studied camouflage in the cuttlefish Sepia officinalis as behavioural motion towards background matching in skin-pattern space. An analysis of hundreds of thousands of images over natural and artificial backgrounds revealed that the space of skin patterns is high-dimensional and that pattern matching is not stereotyped: each search meanders through skin-pattern space, decelerating and accelerating repeatedly before stabilizing. Chromatophores could be grouped into pattern components on the basis of their covariation during camouflaging. These components varied in shapes and sizes, and overlay one another. However, their identities varied even across transitions between identical skin-pattern pairs, indicating flexibility of implementation and absence of stereotypy. Components could also be differentiated by their sensitivity to spatial frequency. Finally, we compared camouflage to blanching, a skin-lightening reaction to threatening stimuli. Pattern motion during blanching was direct and fast, consistent with open-loop motion in low-dimensional pattern space, in contrast to that observed during camouflage.
Affiliation(s)
- Theodosia Woo
- Max Planck Institute for Brain Research, Frankfurt, Germany
| | - Xitong Liang
- Max Planck Institute for Brain Research, Frankfurt, Germany
- School of Life Sciences, Peking University, Beijing, China
| | | | | | | | - Sam Reiter
- Max Planck Institute for Brain Research, Frankfurt, Germany.
- Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan.
| | - Gilles Laurent
- Max Planck Institute for Brain Research, Frankfurt, Germany.
| |
|
28
|
Berger B, Yu YW. Navigating bottlenecks and trade-offs in genomic data analysis. Nat Rev Genet 2023; 24:235-250. [PMID: 36476810 PMCID: PMC10204111 DOI: 10.1038/s41576-022-00551-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/27/2022] [Indexed: 12/12/2022]
Abstract
Genome sequencing and analysis allow researchers to decode the functional information hidden in DNA sequences as well as to study cell-to-cell variation within a cell population. Traditionally, the primary bottleneck in genomic analysis pipelines has been the sequencing itself, which has been much more expensive than the computational analyses that follow. However, an important consequence of the continued drive to expand the throughput of sequencing platforms at lower cost is that the analytical pipelines often struggle to keep up with the sheer amount of raw data produced. Computational cost and efficiency have thus become of ever-increasing importance. Recent methodological advances, such as data sketching, accelerators and domain-specific libraries/languages, promise to address these modern computational challenges. However, despite being more efficient, these innovations come with a new set of trade-offs, both expected, such as accuracy versus memory and expense versus time, and more subtle, including the human expertise needed to use non-standard programming interfaces and set up complex infrastructure. In this Review, we discuss how to navigate these new methodological advances and their trade-offs.
Affiliation(s)
- Bonnie Berger
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA.
| | - Yun William Yu
- Department of Computer and Mathematical Sciences, University of Toronto Scarborough, Toronto, Ontario, Canada
- Tri-Campus Department of Mathematics, University of Toronto, Toronto, Ontario, Canada
| |
|
29
|
Microglia drive transient insult-induced brain injury by chemotactic recruitment of CD8+ T lymphocytes. Neuron 2023; 111:696-710.e9. [PMID: 36603584 DOI: 10.1016/j.neuron.2022.12.009] [Citation(s) in RCA: 36] [Impact Index Per Article: 36.0] [Received: 03/25/2022] [Revised: 09/03/2022] [Accepted: 12/05/2022] [Indexed: 01/06/2023]
Abstract
The crosstalk between the nervous and immune systems has gained increasing attention for its emerging role in neurological diseases. Radiation-induced brain injury (RIBI) remains the most common medical complication of cranial radiotherapy, and its pathological mechanisms have yet to be elucidated. Here, using single-cell RNA and T cell receptor sequencing, we found infiltration and clonal expansion of CD8+ T lymphocytes in the lesioned brain tissues of RIBI patients. Furthermore, by strategies of genetic or pharmacologic interruption, we identified a chemotactic action of microglia-derived CCL2/CCL8 chemokines in mediating the infiltration of CCR2+/CCR5+ CD8+ T cells and tissue damage in RIBI mice. Such a chemotactic axis also participated in the progression of cerebral infarction in the mouse model of ischemic injury. Our findings therefore highlight the critical role of microglia in mediating the dysregulation of adaptive immune responses and reveal a potential therapeutic strategy for non-infectious brain diseases.
|
30
|
Vlot A, Maghsudi S, Ohler U. Cluster-independent marker feature identification from single-cell omics data using SEMITONES. Nucleic Acids Res 2022; 50:e107. [PMID: 35909238 PMCID: PMC9561473 DOI: 10.1093/nar/gkac639] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/17/2022] [Revised: 06/16/2022] [Accepted: 07/26/2022] [Indexed: 12/19/2022] Open
Abstract
Identification of cell identity markers is an essential step in single-cell omics data analysis. Current marker identification strategies typically rely on cluster assignments of cells. However, cluster assignment, particularly for developmental data, is nontrivial, potentially arbitrary, and commonly relies on prior knowledge. In response, we present SEMITONES, a principled method for cluster-free marker identification. We showcase and evaluate its application for marker gene and regulatory region identification from single-cell data of the human haematopoietic system. Additionally, we illustrate its application to spatial transcriptomics data and show how SEMITONES can be used for the annotation of cells given known marker genes. Using several simulated and curated data sets, we demonstrate that SEMITONES qualitatively and quantitatively outperforms existing methods for the retrieval of cell identity markers from single-cell omics data.
Affiliation(s)
- Anna Hendrika Cornelia Vlot
- The Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, Hannoversche Str. 28, 10115 Berlin, Germany
- Department of Computer Science, Faculty of Mathematics and Natural Sciences, Humboldt Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany
| | - Setareh Maghsudi
- Department of Computer Science, Faculty of Science, University of Tübingen, 72074 Tübingen, Germany
| | - Uwe Ohler
- The Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, Hannoversche Str. 28, 10115 Berlin, Germany
- Department of Computer Science, Faculty of Mathematics and Natural Sciences, Humboldt Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany
- Department of Biology, Faculty of Life Sciences, Humboldt Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany
| |
|
31
|
Freckmann EC, Sandilands E, Cumming E, Neilson M, Román-Fernández A, Nikolatou K, Nacke M, Lannagan TRM, Hedley A, Strachan D, Salji M, Morton JP, McGarry L, Leung HY, Sansom OJ, Miller CJ, Bryant DM. Traject3d allows label-free identification of distinct co-occurring phenotypes within 3D culture by live imaging. Nat Commun 2022; 13:5317. [PMID: 36085324 PMCID: PMC9463449 DOI: 10.1038/s41467-022-32958-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 08/17/2021] [Accepted: 08/25/2022] [Indexed: 11/09/2022] Open
Abstract
Single cell profiling by genetic, proteomic and imaging methods has expanded the ability to identify programmes regulating distinct cell states. The 3-dimensional (3D) culture of cells or tissue fragments provides a system to study how such states contribute to multicellular morphogenesis. Whether cells plated into 3D cultures give rise to a singular phenotype or whether multiple biologically distinct phenotypes arise in parallel is largely unknown due to a lack of tools to detect such heterogeneity. Here we develop Traject3d (Trajectory identification in 3D), a method for identifying heterogeneous states in 3D culture and how these give rise to distinct phenotypes over time, from label-free multi-day time-lapse imaging. We use this to characterise the temporal landscape of morphological states of cancer cell lines, varying in metastatic potential and drug resistance, and use this information to identify drug combinations that inhibit such heterogeneity. Traject3d is therefore an important companion to other single-cell technologies by facilitating real-time identification via live imaging of how distinct states can lead to alternate phenotypes that occur in parallel in 3D culture.
Affiliation(s)
- Eva C Freckmann
- Institute of Cancer Sciences, University of Glasgow, Glasgow, G61 1HQ, United Kingdom
- The CRUK Beatson Institute, Glasgow, G61 1BD, United Kingdom
| | - Emma Sandilands
- Institute of Cancer Sciences, University of Glasgow, Glasgow, G61 1HQ, United Kingdom
- The CRUK Beatson Institute, Glasgow, G61 1BD, United Kingdom
| | - Erin Cumming
- Institute of Cancer Sciences, University of Glasgow, Glasgow, G61 1HQ, United Kingdom
- The CRUK Beatson Institute, Glasgow, G61 1BD, United Kingdom
| | - Matthew Neilson
- The CRUK Beatson Institute, Glasgow, G61 1BD, United Kingdom
| | - Alvaro Román-Fernández
- Institute of Cancer Sciences, University of Glasgow, Glasgow, G61 1HQ, United Kingdom
- The CRUK Beatson Institute, Glasgow, G61 1BD, United Kingdom
| | - Konstantina Nikolatou
- Institute of Cancer Sciences, University of Glasgow, Glasgow, G61 1HQ, United Kingdom
- The CRUK Beatson Institute, Glasgow, G61 1BD, United Kingdom
| | - Marisa Nacke
- Institute of Cancer Sciences, University of Glasgow, Glasgow, G61 1HQ, United Kingdom
- The CRUK Beatson Institute, Glasgow, G61 1BD, United Kingdom
| | | | - Ann Hedley
- The CRUK Beatson Institute, Glasgow, G61 1BD, United Kingdom
| | - David Strachan
- The CRUK Beatson Institute, Glasgow, G61 1BD, United Kingdom
| | - Mark Salji
- The CRUK Beatson Institute, Glasgow, G61 1BD, United Kingdom
| | - Jennifer P Morton
- Institute of Cancer Sciences, University of Glasgow, Glasgow, G61 1HQ, United Kingdom
- The CRUK Beatson Institute, Glasgow, G61 1BD, United Kingdom
| | - Lynn McGarry
- The CRUK Beatson Institute, Glasgow, G61 1BD, United Kingdom
| | - Hing Y Leung
- Institute of Cancer Sciences, University of Glasgow, Glasgow, G61 1HQ, United Kingdom
- The CRUK Beatson Institute, Glasgow, G61 1BD, United Kingdom
| | - Owen J Sansom
- Institute of Cancer Sciences, University of Glasgow, Glasgow, G61 1HQ, United Kingdom
- The CRUK Beatson Institute, Glasgow, G61 1BD, United Kingdom
| | - Crispin J Miller
- Institute of Cancer Sciences, University of Glasgow, Glasgow, G61 1HQ, United Kingdom
- The CRUK Beatson Institute, Glasgow, G61 1BD, United Kingdom
| | - David M Bryant
- Institute of Cancer Sciences, University of Glasgow, Glasgow, G61 1HQ, United Kingdom.
- The CRUK Beatson Institute, Glasgow, G61 1BD, United Kingdom.
| |
|
32
|
Ranek JS, Stanley N, Purvis JE. Integrating temporal single-cell gene expression modalities for trajectory inference and disease prediction. Genome Biol 2022; 23:186. [PMID: 36064614 PMCID: PMC9442962 DOI: 10.1186/s13059-022-02749-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/01/2022] [Accepted: 08/16/2022] [Indexed: 02/04/2023] Open
Abstract
BACKGROUND: Current methods for analyzing single-cell datasets have relied primarily on static gene expression measurements to characterize the molecular state of individual cells. However, capturing temporal changes in cell state is crucial for the interpretation of dynamic phenotypes such as the cell cycle, development, or disease progression. RNA velocity infers the direction and speed of transcriptional changes in individual cells, yet it is unclear how these temporal gene expression modalities may be leveraged for predictive modeling of cellular dynamics. RESULTS: Here, we present the first task-oriented benchmarking study that investigates integration of temporal sequencing modalities for dynamic cell state prediction. We benchmark ten integration approaches on ten datasets spanning different biological contexts, sequencing technologies, and species. We find that integrated data more accurately infers biological trajectories and achieves increased performance on classifying cells according to perturbation and disease states. Furthermore, we show that simple concatenation of spliced and unspliced molecules performs consistently well on classification tasks and can be used over more memory intensive and computationally expensive methods. CONCLUSIONS: This work illustrates how integrated temporal gene expression modalities may be leveraged for predicting cellular trajectories and sample-associated perturbation and disease phenotypes. Additionally, this study provides users with practical recommendations for task-specific integration of single-cell gene expression modalities.
Affiliation(s)
- Jolene S. Ranek
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, USA
- Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, USA
| | - Natalie Stanley
- Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, USA
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, USA
| | - Jeremy E. Purvis
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, USA
- Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, USA
| |
|
33
|
Bilous M, Tran L, Cianciaruso C, Gabriel A, Michel H, Carmona SJ, Pittet MJ, Gfeller D. Metacells untangle large and complex single-cell transcriptome networks. BMC Bioinformatics 2022; 23:336. [PMID: 35963997 PMCID: PMC9375201 DOI: 10.1186/s12859-022-04861-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 06/10/2022] [Accepted: 07/23/2022] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND: Single-cell RNA sequencing (scRNA-seq) technologies offer unique opportunities for exploring heterogeneous cell populations. However, in-depth single-cell transcriptomic characterization of complex tissues often requires profiling tens to hundreds of thousands of cells. Such large numbers of cells represent an important hurdle for downstream analyses, interpretation and visualization. RESULTS: We develop a framework called SuperCell to merge highly similar cells into metacells and perform standard scRNA-seq data analyses at the metacell level. Our systematic benchmarking demonstrates that metacells not only preserve but often improve the results of downstream analyses including visualization, clustering, differential expression, cell type annotation, gene correlation, imputation, RNA velocity and data integration. By capitalizing on the redundancy inherent to scRNA-seq data, metacells significantly facilitate and accelerate the construction and interpretation of single-cell atlases, as demonstrated by the integration of 1.46 million cells from COVID-19 patients in less than two hours on a standard desktop. CONCLUSIONS: SuperCell is a framework to build and analyze metacells in a way that efficiently preserves the results of scRNA-seq data analyses while significantly accelerating and facilitating them.
Affiliation(s)
- Mariia Bilous
- Department of Oncology, Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Loc Tran
- Department of Oncology, Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Chiara Cianciaruso
- Department of Pathology and Immunology, University of Geneva, Geneva, Switzerland
| | - Aurélie Gabriel
- Department of Oncology, Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Hugo Michel
- Department of Oncology, Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
| | - Santiago J Carmona
- Department of Oncology, Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Mikael J Pittet
- Department of Pathology and Immunology, University of Geneva, Geneva, Switzerland
- Department of Oncology, Geneva University Hospitals, Geneva, Switzerland
- Center for Systems Biology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
| | - David Gfeller
- Department of Oncology, Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland.
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland.
| |
|
34
|
Dhapola P, Rodhe J, Olofzon R, Bonald T, Erlandsson E, Soneji S, Karlsson G. Scarf enables a highly memory-efficient analysis of large-scale single-cell genomics data. Nat Commun 2022; 13:4616. [PMID: 35941103 PMCID: PMC9360040 DOI: 10.1038/s41467-022-32097-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 12/14/2021] [Accepted: 07/18/2022] [Indexed: 12/11/2022] Open
Abstract
As the scale of single-cell genomics experiments grows into the millions, the computational requirements to process this data are beyond the reach of many. Herein we present Scarf, a modularly designed Python package that seamlessly interoperates with other single-cell toolkits and allows for memory-efficient single-cell analysis of millions of cells on a laptop or low-cost devices like single-board computers. We demonstrate Scarf’s memory and compute-time efficiency by applying it to the largest existing single-cell RNA-Seq and ATAC-Seq datasets. Scarf wraps memory-efficient implementations of a graph-based t-stochastic neighbour embedding and hierarchical clustering algorithm. Moreover, Scarf performs accurate reference-anchored mapping of datasets while maintaining memory efficiency. By implementing a subsampling algorithm, Scarf additionally has the capacity to generate representative sampling of cells from a given dataset wherein rare cell populations and lineage differentiation trajectories are conserved. Together, Scarf provides a framework wherein any researcher can perform advanced processing, subsampling, reanalysis, and integration of atlas-scale datasets on standard laptop computers. Scarf is available on GitHub: https://github.com/parashardhapola/scarf.
Affiliation(s)
- Parashar Dhapola
- Division of Molecular Hematology, Lund Stem Cell Center, Lund University, Lund, Sweden.
| | - Johan Rodhe
- Division of Molecular Hematology, Lund Stem Cell Center, Lund University, Lund, Sweden
| | - Rasmus Olofzon
- Division of Molecular Hematology, Lund Stem Cell Center, Lund University, Lund, Sweden
| | | | - Eva Erlandsson
- Division of Molecular Hematology, Lund Stem Cell Center, Lund University, Lund, Sweden
| | - Shamit Soneji
- Division of Molecular Hematology, Lund Stem Cell Center, Lund University, Lund, Sweden
| | - Göran Karlsson
- Division of Molecular Hematology, Lund Stem Cell Center, Lund University, Lund, Sweden.
| |
|
35
|
Wang Y, Xu Y, Zang Z, Wu L, Li Z. Panoramic Manifold Projection (Panoramap) for Single-Cell Data Dimensionality Reduction and Visualization. Int J Mol Sci 2022; 23:7775. [PMID: 35887125 PMCID: PMC9316349 DOI: 10.3390/ijms23147775] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/06/2022] [Revised: 07/03/2022] [Accepted: 07/12/2022] [Indexed: 12/22/2022] Open
Abstract
Nonlinear dimensionality reduction (NLDR) methods such as t-Distributed Stochastic Neighbour Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP) have been widely used for biological data exploration, especially in single-cell analysis. However, the existing methods have drawbacks in preserving data's geometric and topological structures. A high-dimensional data analysis method, called Panoramic manifold projection (Panoramap), was developed as an enhanced deep learning framework for structure-preserving NLDR. Panoramap enhances deep neural networks by using cross-layer geometry-preserving constraints. The constraints constitute the loss for deep manifold learning and serve as geometric regularizers for NLDR network training. Therefore, Panoramap has better performance in preserving global structures of the original data. Here, we apply Panoramap to single-cell datasets and show that Panoramap excels at delineating the cell type lineage/hierarchy and can reveal rare cell types. Panoramap can facilitate trajectory inference and has the potential to aid in the early diagnosis of tumors. Panoramap gives improved and more biologically plausible visualization and interpretation of single-cell data. Panoramap can be readily used in single-cell research domains and other research fields that involve high dimensional data analysis.
Affiliation(s)
- Yajuan Wang
- College of Mathematical Medicine, Zhejiang Normal University, Jinhua 321004, China
- School of Engineering, Westlake University, Hangzhou 310024, China; (Y.X.); (Z.Z.); (L.W.); (Z.L.)
| | - Yongjie Xu
- School of Engineering, Westlake University, Hangzhou 310024, China; (Y.X.); (Z.Z.); (L.W.); (Z.L.)
| | - Zelin Zang
- School of Engineering, Westlake University, Hangzhou 310024, China; (Y.X.); (Z.Z.); (L.W.); (Z.L.)
| | - Lirong Wu
- School of Engineering, Westlake University, Hangzhou 310024, China; (Y.X.); (Z.Z.); (L.W.); (Z.L.)
| | - Ziqing Li
- School of Engineering, Westlake University, Hangzhou 310024, China; (Y.X.); (Z.Z.); (L.W.); (Z.L.)
| |
|
36
|
Wang S, Zheng H, Choi JS, Lee JK, Li X, Hu H. A systematic evaluation of the computational tools for ligand-receptor-based cell-cell interaction inference. Brief Funct Genomics 2022; 21:339-356. [PMID: 35822343 PMCID: PMC9479691 DOI: 10.1093/bfgp/elac019] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/02/2022] [Revised: 06/13/2022] [Accepted: 06/16/2022] [Indexed: 11/13/2022] Open
Abstract
Cell-cell interactions (CCIs) are essential for multicellular organisms to coordinate biological processes and functions. One classical type of CCI is between secreted ligands and cell surface receptors, i.e. ligand-receptor (LR) interactions. With the recent development of single-cell technologies, a large amount of single-cell ribonucleic acid (RNA) sequencing (scRNA-Seq) data has become widely available. This data availability motivated the single-cell-resolution study of CCIs, particularly LR-based CCIs. Dozens of computational methods and tools have been developed to predict CCIs by identifying LR-based CCIs. Many of these tools have been theoretically reviewed. However, few studies have examined the performance and running results of current LR-based CCI prediction tools on public scRNA-Seq datasets. In this work, to fill this gap, we tested and compared nine of the most recent computational tools for LR-based CCI prediction. We used 15 well-studied scRNA-Seq samples that correspond to approximately 100K single cells under different experimental conditions for testing and comparison. Besides briefly describing the methodology used in these nine tools, we summarized the similarities and differences of these tools in terms of both LR prediction and CCI inference between cell types. We provided insight into using these tools to make meaningful discoveries in understanding cell communications.
Affiliation(s)
| | | | | | | | - Xiaoman Li
- Corresponding authors: Haiyan Hu, Department of Computer Science, University of Central Florida, Orlando, FL, USA. Tel.: +1-4078820134; Fax: +1-4078235835; E-mail: ; Xiaoman Li, Burnett School of Biomedical Science, University of Central Florida, Orlando, FL, USA. Tel.: +1-4078234811; Fax: +1-4078235835; E-mail:
| | - Haiyan Hu
- Corresponding authors: Haiyan Hu, Department of Computer Science, University of Central Florida, Orlando, FL, USA. Tel.: +1-4078820134; Fax: +1-4078235835; E-mail: ; Xiaoman Li, Burnett School of Biomedical Science, University of Central Florida, Orlando, FL, USA. Tel.: +1-4078234811; Fax: +1-4078235835; E-mail:
| |
|
37
|
Shang M, Hu Y, Cao H, Lin Q, Yi N, Zhang J, Gu Y, Yang Y, He S, Lu M, Peng L, Li L. Concordant and Heterogeneity of Single-Cell Transcriptome in Cardiac Development of Human and Mouse. Front Genet 2022; 13:892766. [PMID: 35832197 PMCID: PMC9271823 DOI: 10.3389/fgene.2022.892766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2022] [Accepted: 05/16/2022] [Indexed: 11/28/2022] Open
Abstract
Normal heart development is vital for maintaining its function, and the development process involves complex interactions between different cell lineages. How mammalian hearts develop differently is still not fully understood. In this study, we identified several major types of cardiac cells, including cardiomyocytes (CMs), fibroblasts (FBs), endothelial cells (ECs), ECs/FBs, epicardial cells (EPs), and immune cells (macrophage/monocyte cluster, MACs/MONOs), based on single-cell transcriptome data from embryonic hearts of both human and mouse. Then, species-shared and species-specific marker genes were determined in the same cell type between the two species, and the genes with consistent and different expression patterns were also selected by constructing the developmental trajectories. Through a comparison of the development stage similarity of CMs, FBs, and ECs/FBs between humans and mice, it is revealed that CMs at e9.5 and e10.5 of mice are most similar to those of humans at 7 W and 9 W, respectively. Mouse FBs at e10.5, e13.5, and e14.5 are correspondingly more like the same human cells at 6, 7, and 9 W. Moreover, the e9.5-ECs/FBs of mice are most similar to those of humans at 10 W. These results provide a resource for understanding cardiac cell types and the crucial markers able to trace developmental trajectories among the species, which is beneficial for finding suitable mouse models to detect human cardiac physiology and related diseases.
Affiliation(s)
- Mengyue Shang
- Key Laboratory of Arrhythmias, Ministry of Education of China, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
- Heart Health Center, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
- Institute of Medical Genetics, Tongji University, Shanghai, China
| | - Yi Hu
- Key Laboratory of Arrhythmias, Ministry of Education of China, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
- Heart Health Center, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
- Institute of Medical Genetics, Tongji University, Shanghai, China
| | - Huaming Cao
- Department of Cardiology, Shanghai Shibei Hospital, Shanghai, China
| | - Qin Lin
- Key Laboratory of Arrhythmias, Ministry of Education of China, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
- Heart Health Center, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
- Institute of Medical Genetics, Tongji University, Shanghai, China
| | - Na Yi
- Key Laboratory of Arrhythmias, Ministry of Education of China, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
- Heart Health Center, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
- Institute of Medical Genetics, Tongji University, Shanghai, China
| | - Junfang Zhang
- Institute of Medical Genetics, Tongji University, Shanghai, China
| | - Yanqiong Gu
- Institute of Medical Genetics, Tongji University, Shanghai, China
| | - Yujie Yang
- Institute of Medical Genetics, Tongji University, Shanghai, China
| | - Siyu He
- Key Laboratory of Arrhythmias, Ministry of Education of China, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
- Heart Health Center, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
- Institute of Medical Genetics, Tongji University, Shanghai, China
| | - Min Lu
- Key Laboratory of Arrhythmias, Ministry of Education of China, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
| | - Luying Peng
- Key Laboratory of Arrhythmias, Ministry of Education of China, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
- Heart Health Center, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
- Institute of Medical Genetics, Tongji University, Shanghai, China
- Department of Medical Genetics, Tongji University School of Medicine, Shanghai, China
- Research Units of Origin and Regulation of Heart Rhythm, Chinese Academy of Medical Sciences, Beijing, China
- *Correspondence: Luying Peng; Li Li
| | - Li Li
- Key Laboratory of Arrhythmias, Ministry of Education of China, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
- Heart Health Center, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
- Institute of Medical Genetics, Tongji University, Shanghai, China
- Department of Medical Genetics, Tongji University School of Medicine, Shanghai, China
- Research Units of Origin and Regulation of Heart Rhythm, Chinese Academy of Medical Sciences, Beijing, China
- *Correspondence: Luying Peng; Li Li
| |
|
38
|
Mashinchian O, De Franceschi F, Nassiri S, Michaud J, Migliavacca E, Aouad P, Metairon S, Pruvost S, Karaz S, Fabre P, Molina T, Stuelsatz P, Hegde N, Le Moal E, Dammone G, Dumont NA, Lutolf MP, Feige JN, Bentzinger CF. An engineered multicellular stem cell niche for the 3D derivation of human myogenic progenitors from iPSCs. EMBO J 2022; 41:e110655. [PMID: 35703167 DOI: 10.15252/embj.2022110655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2022] [Revised: 04/22/2022] [Accepted: 05/11/2022] [Indexed: 11/09/2022] Open
Abstract
Fate decisions in the embryo are controlled by a plethora of microenvironmental interactions in a three-dimensional niche. To investigate whether aspects of this microenvironmental complexity can be engineered to direct myogenic human-induced pluripotent stem cell (hiPSC) differentiation, we here screened murine cell types present in the developmental or adult stem cell niche in heterotypic suspension embryoids. We identified embryonic endothelial cells and fibroblasts as highly permissive for myogenic specification of hiPSCs. After two weeks of sequential Wnt and FGF pathway induction, these three-component embryoids are enriched in Pax7-positive embryonic-like myogenic progenitors that can be isolated by flow cytometry. Myogenic differentiation of hiPSCs in heterotypic embryoids relies on a specialized structural microenvironment and depends on MAPK, PI3K/AKT, and Notch signaling. After transplantation in a mouse model of Duchenne muscular dystrophy, embryonic-like myogenic progenitors repopulate the stem cell niche, reactivate after repeated injury, and, compared to adult human myoblasts, display enhanced fusion and lead to increased muscle function. Altogether, we provide a two-week protocol for efficient and scalable suspension-based 3D derivation of Pax7-positive myogenic progenitors from hiPSCs.
Affiliation(s)
- Omid Mashinchian
- Nestlé Research, Nestlé Institute of Health Sciences, Lausanne, Switzerland.,School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | | | - Sina Nassiri
- Bioinformatics Core Facility, SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Joris Michaud
- Nestlé Research, Nestlé Institute of Health Sciences, Lausanne, Switzerland
| | | | - Patrick Aouad
- School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Sylviane Metairon
- Nestlé Research, Nestlé Institute of Health Sciences, Lausanne, Switzerland
| | - Solenn Pruvost
- Nestlé Research, Nestlé Institute of Health Sciences, Lausanne, Switzerland
| | - Sonia Karaz
- Nestlé Research, Nestlé Institute of Health Sciences, Lausanne, Switzerland
| | - Paul Fabre
- Faculty of Medicine, CHU Sainte-Justine Research Center, School of Rehabilitation, Université de Montréal, Montreal, QC, Canada
| | - Thomas Molina
- Faculty of Medicine, CHU Sainte-Justine Research Center, School of Rehabilitation, Université de Montréal, Montreal, QC, Canada
| | - Pascal Stuelsatz
- Nestlé Research, Nestlé Institute of Health Sciences, Lausanne, Switzerland
| | - Nagabhooshan Hegde
- Nestlé Research, Nestlé Institute of Health Sciences, Lausanne, Switzerland
| | - Emmeran Le Moal
- Département de pharmacologie-physiologie, Faculté de médecine et des sciences de la santé, Centre de Recherche du CHUS, Université de Sherbrooke, Sherbrooke, QC, Canada
| | - Gabriele Dammone
- Nestlé Research, Nestlé Institute of Health Sciences, Lausanne, Switzerland
| | - Nicolas A Dumont
- Faculty of Medicine, CHU Sainte-Justine Research Center, School of Rehabilitation, Université de Montréal, Montreal, QC, Canada
| | - Matthias P Lutolf
- Laboratory of Stem Cell Bioengineering, Institute of Bioengineering, School of Life Sciences and School of Engineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland.,Institute of Chemical Sciences and Engineering, School of Basic Science, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Jerome N Feige
- Nestlé Research, Nestlé Institute of Health Sciences, Lausanne, Switzerland.,School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - C Florian Bentzinger
- Nestlé Research, Nestlé Institute of Health Sciences, Lausanne, Switzerland.,Département de pharmacologie-physiologie, Faculté de médecine et des sciences de la santé, Centre de Recherche du CHUS, Université de Sherbrooke, Sherbrooke, QC, Canada
| |
|
39
|
Song D, Xi NM, Li JJ, Wang L. scSampler: fast diversity-preserving subsampling of large-scale single-cell transcriptomic data. Bioinformatics 2022; 38:3126-3127. [PMID: 35426898 PMCID: PMC9991884 DOI: 10.1093/bioinformatics/btac271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2022] [Revised: 03/02/2022] [Accepted: 04/12/2022] [Indexed: 02/07/2023] Open
Abstract
SUMMARY The number of cells measured in single-cell transcriptomic data has grown fast in recent years. For such large-scale data, subsampling is a powerful and often necessary tool for exploratory data analysis. However, the easiest random subsampling is not ideal from the perspective of preserving rare cell types. Therefore, diversity-preserving subsampling is required for fast exploration of cell types in a large-scale dataset. Here, we propose scSampler, an algorithm for fast diversity-preserving subsampling of single-cell transcriptomic data. AVAILABILITY AND IMPLEMENTATION scSampler is implemented in Python and is published under the MIT source license. It can be installed by "pip install scsampler" and used with the Scanpy pipeline. The code is available on GitHub: https://github.com/SONGDONGYUAN1994/scsampler. An R interface is available at: https://github.com/SONGDONGYUAN1994/rscsampler. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
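To illustrate why diversity-preserving subsampling retains rare cell types better than uniform random subsampling, here is a minimal sketch of a generic greedy farthest-point heuristic. It is not scSampler's algorithm, which the abstract does not describe; the function name `farthest_point_subsample`, its parameters, and the toy data are assumptions made for illustration only.

```python
import math
import random


def farthest_point_subsample(points, n_keep, seed=0):
    """Greedy farthest-point subsampling (hypothetical helper).

    Illustrative only: a generic diversity-preserving heuristic,
    NOT the algorithm implemented in scSampler.
    Returns the indices of the selected points.
    """
    if not 0 < n_keep <= len(points):
        raise ValueError("n_keep must be between 1 and len(points)")
    rng = random.Random(seed)
    chosen = [rng.randrange(len(points))]
    # min_dist[i] = distance from point i to its nearest selected point
    min_dist = [math.dist(p, points[chosen[0]]) for p in points]
    while len(chosen) < n_keep:
        # Select the point that is farthest from everything chosen so far.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return chosen


# Toy data (assumed): 500 "common" cells near the origin, 3 "rare" cells far away.
data_rng = random.Random(1)
common = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(500)]
rare = [(50 + data_rng.gauss(0, 1), 50 + data_rng.gauss(0, 1)) for _ in range(3)]
cells = common + rare

picked = farthest_point_subsample(cells, n_keep=10)
n_rare_picked = sum(1 for i in picked if i >= len(common))
```

In this toy setting a uniform random sample of 10 cells out of 503 would usually miss the rare group, whereas the farthest-point rule reaches it within the first two picks because the rare cells lie far from the bulk of the data.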
Affiliation(s)
- Dongyuan Song
- Bioinformatics Interdepartmental Ph.D. Program, University of California, Los Angeles, CA 90095-7246, USA
| | - Nan Miles Xi
- Department of Mathematics and Statistics, Loyola University Chicago, Chicago, IL 60660-1601, USA
| | - Jingyi Jessica Li
- Department of Statistics, University of California, Los Angeles, CA 90095-1554, USA
| | - Lin Wang
- Department of Statistics, The George Washington University, Washington, DC 20052-0086, USA
| |
|
40
|
Ren J, Zhang Q, Zhou Y, Hu Y, Lyu X, Fang H, Yang J, Yu R, Shi X, Li Q. A downsampling Method Enables Robust Clustering and Integration of Single-Cell Transcriptome Data. J Biomed Inform 2022; 130:104093. [DOI: 10.1016/j.jbi.2022.104093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2021] [Revised: 04/06/2022] [Accepted: 05/03/2022] [Indexed: 11/27/2022]
|
41
|
PathogenTrack and Yeskit: tools for identifying intracellular pathogens from single-cell RNA-sequencing datasets as illustrated by application to COVID-19. Front Med 2022; 16:251-262. [PMID: 35192147 PMCID: PMC8861993 DOI: 10.1007/s11684-021-0915-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/03/2021] [Accepted: 12/20/2021] [Indexed: 12/20/2022]
Abstract
Pathogenic microbes can induce cellular dysfunction, immune response, and cause infectious disease and other diseases including cancers. However, the cellular distributions of pathogens and their impact on host cells remain rarely explored due to limited methods. Taking advantage of single-cell RNA-sequencing (scRNA-seq) analysis, we can assess the transcriptomic features at the single-cell level. Still, the tools used to interpret pathogens (such as viruses, bacteria, and fungi) at the single-cell level remain to be explored. Here, we introduced PathogenTrack, a Python-based computational pipeline that uses unmapped scRNA-seq data to identify intracellular pathogens at the single-cell level. In addition, we established an R package named Yeskit to import, integrate, analyze, and interpret pathogen abundance and transcriptomic features in host cells. Robustness of these tools has been tested on various real and simulated scRNA-seq datasets. PathogenTrack is competitive with state-of-the-art tools such as Viral-Track and is the first tool for identifying bacteria at the single-cell level. Using the raw data of bronchoalveolar lavage fluid samples (BALF) from COVID-19 patients in the SRA database, we found the SARS-CoV-2 virus exists in multiple cell types including epithelial cells and macrophages. SARS-CoV-2-positive neutrophils showed increased expression of genes related to type I interferon pathway and antigen presenting module. Additionally, we observed Haemophilus parahaemolyticus in some macrophages and epithelial cells, indicating a co-infection of the bacterium in some severe cases of COVID-19. The PathogenTrack pipeline and the Yeskit package are publicly available at GitHub.
|
42
|
Zhang R, Zhou T, Ma J. Multiscale and integrative single-cell Hi-C analysis with Higashi. Nat Biotechnol 2022; 40:254-261. [PMID: 34635838 PMCID: PMC8843812 DOI: 10.1038/s41587-021-01034-y] [Citation(s) in RCA: 72] [Impact Index Per Article: 36.0] [Received: 02/12/2021] [Accepted: 07/27/2021] [Indexed: 02/08/2023]
Abstract
Single-cell Hi-C (scHi-C) can identify cell-to-cell variability of three-dimensional (3D) chromatin organization, but the sparseness of measured interactions poses an analysis challenge. Here we report Higashi, an algorithm based on hypergraph representation learning that can incorporate the latent correlations among single cells to enhance overall imputation of contact maps. Higashi outperforms existing methods for embedding and imputation of scHi-C data and is able to identify multiscale 3D genome features in single cells, such as compartmentalization and TAD-like domain boundaries, allowing refined delineation of their cell-to-cell variability. Moreover, Higashi can incorporate epigenomic signals jointly profiled in the same cell into the hypergraph representation learning framework, as compared to separate analysis of two modalities, leading to improved embeddings for single-nucleus methyl-3C data. In an scHi-C dataset from human prefrontal cortex, Higashi identifies connections between 3D genome features and cell-type-specific gene regulation. Higashi can also potentially be extended to analyze single-cell multiway chromatin interactions and other multimodal single-cell omics data.
Affiliation(s)
- Ruochi Zhang
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Tianming Zhou
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Jian Ma
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.
| |
|
43
|
Cao K, Hong Y, Wan L. Manifold alignment for heterogeneous single-cell multi-omics data integration using Pamona. Bioinformatics 2021; 38:211-219. [PMID: 34398192 PMCID: PMC8696097 DOI: 10.1093/bioinformatics/btab594] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 03/21/2021] [Revised: 07/06/2021] [Accepted: 08/13/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Single-cell multi-omics sequencing data can provide a comprehensive molecular view of cells. However, effective approaches for the integrative analysis of such data are challenging. Existing manifold alignment methods demonstrated the state-of-the-art performance on single-cell multi-omics data integration, but they are often limited by requiring that single-cell datasets be derived from the same underlying cellular structure. RESULTS In this study, we present Pamona, a partial Gromov-Wasserstein distance-based manifold alignment framework that integrates heterogeneous single-cell multi-omics datasets with the aim of delineating and representing the shared and dataset-specific cellular structures across modalities. We formulate this task as a partial manifold alignment problem and develop a partial Gromov-Wasserstein optimal transport framework to solve it. Pamona identifies both shared and dataset-specific cells based on the computed probabilistic couplings of cells across datasets, and it aligns cellular modalities in a common low-dimensional space, while simultaneously preserving both shared and dataset-specific structures. Our framework can easily incorporate prior information, such as cell type annotations or cell-cell correspondence, to further improve alignment quality. We evaluated Pamona on a comprehensive set of publicly available benchmark datasets. We demonstrated that Pamona can accurately identify shared and dataset-specific cells, as well as faithfully recover and align cellular structures of heterogeneous single-cell modalities in a common space, outperforming the comparable existing methods. AVAILABILITY AND IMPLEMENTATION Pamona software is available at https://github.com/caokai1073/Pamona. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kai Cao
- LSC, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yiguang Hong
- LSC, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- Department of Control Science and Engineering, Tongji University, Shanghai 200092, China
| | - Lin Wan
- LSC, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| |
|
44
|
Jain MS, Polanski K, Conde CD, Chen X, Park J, Mamanova L, Knights A, Botting RA, Stephenson E, Haniffa M, Lamacraft A, Efremova M, Teichmann SA. MultiMAP: dimensionality reduction and integration of multimodal data. Genome Biol 2021; 22:346. [PMID: 34930412 PMCID: PMC8686224 DOI: 10.1186/s13059-021-02565-y] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 08/26/2021] [Accepted: 12/03/2021] [Indexed: 01/04/2023] Open
Abstract
Multimodal data is rapidly growing in many fields of science and engineering, including single-cell biology. We introduce MultiMAP, a novel algorithm for dimensionality reduction and integration. MultiMAP can integrate any number of datasets, leverages features not present in all datasets, is not restricted to a linear mapping, allows the user to specify the influence of each dataset, and is extremely scalable to large datasets. We apply MultiMAP to single-cell transcriptomics, chromatin accessibility, methylation, and spatial data and show that it outperforms current approaches. On a new thymus dataset, we use MultiMAP to integrate cells along a temporal trajectory. This enables quantitative comparison of transcription factor expression and binding site accessibility over the course of T cell differentiation, revealing patterns of expression versus binding site opening kinetics.
Affiliation(s)
- Mika Sarkin Jain
- Theory of Condensed Matter, Dept Physics, Cavendish Laboratory, University of Cambridge, JJ Thomson Ave, Cambridge, CB3 0HE, UK.
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
| | - Krzysztof Polanski
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | | | - Xi Chen
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Southern University of Science and Technology, 1088 Xueyuan Ave, Nanshan, Shenzhen, 518055, Guangdong Province, China
| | - Jongeun Park
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- KAIST, 291 Daehak-ro, Eoeun-dong, Yuseong-gu, Daejeon, South Korea
| | - Lira Mamanova
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | - Andrew Knights
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | - Rachel A Botting
- Biosciences Institute, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK
| | - Emily Stephenson
- Biosciences Institute, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK
| | - Muzlifah Haniffa
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Biosciences Institute, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK
| | - Austen Lamacraft
- Theory of Condensed Matter, Dept Physics, Cavendish Laboratory, University of Cambridge, JJ Thomson Ave, Cambridge, CB3 0HE, UK
| | - Mirjana Efremova
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
- Barts Cancer Institute, Queen Mary University of London, London, UK.
| | - Sarah A Teichmann
- Theory of Condensed Matter, Dept Physics, Cavendish Laboratory, University of Cambridge, JJ Thomson Ave, Cambridge, CB3 0HE, UK.
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
| |
|
45
|
Baker DN, Dyjack N, Braverman V, Hicks SC, Langmead B. Fast and memory-efficient scRNA-seq k-means clustering with various distances. ACM-BCB ... ... : THE ... ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND BIOMEDICINE. ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND BIOMEDICINE 2021; 2021:24. [PMID: 34778889 PMCID: PMC8586878 DOI: 10.1145/3459930.3469523] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/03/2022]
Abstract
Single-cell RNA-sequencing (scRNA-seq) analyses typically begin by clustering a gene-by-cell expression matrix to empirically define groups of cells with similar expression profiles. We describe new methods and a new open source library, minicore, for efficient k-means++ center finding and k-means clustering of scRNA-seq data. Minicore works with sparse count data, as it emerges from typical scRNA-seq experiments, as well as with dense data from after dimensionality reduction. Minicore's novel vectorized weighted reservoir sampling algorithm allows it to find initial k-means++ centers for a 4-million cell dataset in 1.5 minutes using 20 threads. Minicore can cluster using Euclidean distance, but also supports a wider class of measures like Jensen-Shannon Divergence, Kullback-Leibler Divergence, and the Bhattacharyya distance, which can be directly applied to count data and probability distributions. Further, minicore produces lower-cost centerings more efficiently than scikit-learn for scRNA-seq datasets with millions of cells. With careful handling of priors, minicore implements these distance measures with only minor (<2-fold) speed differences among all distances. We show that a minicore pipeline consisting of k-means++, localsearch++ and mini-batch k-means can cluster a 4-million cell dataset in minutes, using less than 10GiB of RAM. This memory-efficiency enables atlas-scale clustering on laptops and other commodity hardware. Finally, we report findings on which distance measures give clusterings that are most consistent with known cell type labels.
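For readers unfamiliar with k-means++ center finding, the sketch below shows the textbook seeding rule, in which each new center is drawn with probability proportional to its squared distance from the nearest center already chosen. It is a plain, single-threaded version with squared Euclidean distance and does not reproduce minicore's vectorized weighted reservoir sampling or its other distance measures; the names `kmeanspp_centers` and `sq_euclidean` and the toy data are hypothetical.

```python
import random


def sq_euclidean(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_centers(points, k, seed=0):
    """Textbook k-means++ seeding (hypothetical helper, not minicore's code).

    Returns the indices of up to k points chosen as initial centers.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = random.Random(seed)
    centers = [rng.randrange(len(points))]
    # cost[i] = squared distance from point i to its nearest chosen center
    cost = [sq_euclidean(p, points[centers[0]]) for p in points]
    while len(centers) < k:
        if sum(cost) == 0:
            break  # every remaining point coincides with a chosen center
        # D^2 sampling: probability proportional to the current cost.
        nxt = rng.choices(range(len(points)), weights=cost, k=1)[0]
        centers.append(nxt)
        for i, p in enumerate(points):
            cost[i] = min(cost[i], sq_euclidean(p, points[nxt]))
    return centers


# Toy data (assumed): three tight 2-D clusters of 50 points each.
data_rng = random.Random(42)
points = [
    (cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
    for cx, cy in [(0, 0), (10, 0), (0, 10)]
    for _ in range(50)
]
centers = kmeanspp_centers(points, k=3)
```

Because an already chosen point has zero cost, it carries zero sampling weight and is effectively never drawn twice. The same loop works with other dissimilarities by swapping out `sq_euclidean`, though the approximation guarantees of k-means++ are stated for squared Euclidean distance.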
Affiliation(s)
- Daniel N Baker
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Nathan Dyjack
- Department of Biostatistics, Johns Hopkins University, Bloomberg, School of Public Health, Baltimore, MD, USA
| | - Vladimir Braverman
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Stephanie C Hicks
- Department of Biostatistics, Johns Hopkins University, Bloomberg, School of Public Health, Baltimore, MD, USA
| | - Ben Langmead
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| |
|
46
|
Wu AP, Peng J, Berger B, Cho H. Bayesian information sharing enhances detection of regulatory associations in rare cell types. Bioinformatics 2021; 37:i349-i357. [PMID: 34252956 PMCID: PMC8275330 DOI: 10.1093/bioinformatics/btab269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Recent advances in single-cell RNA-sequencing (scRNA-seq) technologies promise to enable the study of gene regulatory associations at unprecedented resolution in diverse cellular contexts. However, identifying unique regulatory associations observed only in specific cell types or conditions remains a key challenge; this is particularly so for rare transcriptional states whose sample sizes are too small for existing gene regulatory network inference methods to be effective. RESULTS We present ShareNet, a Bayesian framework for boosting the accuracy of cell type-specific gene regulatory networks by propagating information across related cell types via an information sharing structure that is adaptively optimized for a given single-cell dataset. The techniques we introduce can be used with a range of general network inference algorithms to enhance the output for each cell type. We demonstrate the enhanced accuracy of our approach on three benchmark scRNA-seq datasets. We find that our inferred cell type-specific networks also uncover key changes in gene associations that underpin the complex rewiring of regulatory networks across cell types, tissues and dynamic biological processes. Our work presents a path toward extracting deeper insights about cell type-specific gene regulation in the rapidly growing compendium of scRNA-seq datasets. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online. AVAILABILITY AND IMPLEMENTATION The code for ShareNet is available at http://sharenet.csail.mit.edu and https://github.com/alexw16/sharenet.
Affiliation(s)
- Alexander P Wu
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA
| | - Jian Peng
- Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL 61801, USA
| | - Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA
- Department of Mathematics, MIT, Cambridge, MA 02139, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Hyunghoon Cho
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| |
|
47
|
Tung LH, Kingsford C. Practical selection of representative sets of RNA-seq samples using a hierarchical approach. Bioinformatics 2021; 37:i334-i341. [PMID: 34252927 PMCID: PMC8275344 DOI: 10.1093/bioinformatics/btab315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/30/2021] [Indexed: 11/26/2022] Open
Abstract
MOTIVATION Despite numerous RNA-seq samples available at large databases, most RNA-seq analysis tools are evaluated on a limited number of RNA-seq samples. This drives a need for methods to select a representative subset from all available RNA-seq samples to facilitate comprehensive, unbiased evaluation of bioinformatics tools. In sequence-based approaches for representative set selection (e.g. a k-mer counting approach that selects a subset based on k-mer similarities between RNA-seq samples), because of the large numbers of available RNA-seq samples and of k-mers/sequences in each sample, computing the full similarity matrix using k-mers/sequences for the entire set of RNA-seq samples in a large database (e.g. the SRA) has memory and runtime challenges; this makes direct representative set selection infeasible with limited computing resources. RESULTS We developed a novel computational method called 'hierarchical representative set selection' to handle this challenge. Hierarchical representative set selection is a divide-and-conquer-like algorithm that breaks representative set selection into sub-selections and hierarchically selects representative samples through multiple levels. We demonstrate that hierarchical representative set selection can achieve summarization quality close to that of direct representative set selection, while largely reducing runtime and memory requirements of computing the full similarity matrix (up to 8.4× runtime reduction and 5.35× memory reduction for 10 000 and 12 000 samples respectively that could be practically run with direct subset selection). We show that hierarchical representative set selection substantially outperforms random sampling on the entire SRA set of RNA-seq samples, making it a practical solution to representative set selection on large databases like the SRA. 
AVAILABILITY AND IMPLEMENTATION The code is available at https://github.com/Kingsford-Group/hierrepsetselection and https://github.com/Kingsford-Group/jellyfishsim. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Laura H Tung
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Carl Kingsford
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| |
|
48
|
Hong F, Meng Q, Zhang W, Zheng R, Li X, Cheng T, Hu D, Gao X. Single-Cell Analysis of the Pan-Cancer Immune Microenvironment and scTIME Portal. Cancer Immunol Res 2021; 9:939-951. [PMID: 34117085 DOI: 10.1158/2326-6066.cir-20-1026] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 12/16/2020] [Revised: 02/17/2021] [Accepted: 06/09/2021] [Indexed: 11/16/2022]
Abstract
Single-cell sequencing opens a new era for the investigation of tumor immune microenvironments (TIME). However, at single-cell resolution, a pan-cancer analysis that addresses the identity and diversity of TIMEs is lacking. Here, we first built a pan-cancer single-cell reference of TIMEs with refined subcell types and recognized new cell type-specific transcription factors. We then presented a pan-cancer view of the common features of the TIME and compared the variation of each immune cell type across patients and tumor types in the aspects of abundance, cell states, and cell communications. We found that the abundance and the cell states of dysfunctional T cells were most variable, whereas those of regulatory T cells were relatively stable. A subset of tumor-associated macrophages (TAM), PLTP+C1QC+ TAMs, may regulate the abundance of dysfunctional T cells through cytokine/chemokine signaling. The ligand-receptor communication network of TIMEs was tumor-type specific and dominated by the tumor-enriched immune cells. We additionally developed the single-cell TIME (scTIME) portal (http://scTIME.sklehabc.com) with the scTIME-specific analysis modules and a unified cell annotation. In addition to the immune cell compositions and correlation analysis using refined cell type classifications, the portal also provides cell-cell interaction and cell type-specific gene signature analysis. Our single-cell pan-cancer analysis and scTIME portal will provide more insights into the features of TIMEs, as well as the molecular and cellular mechanisms underlying immunotherapies.
Affiliation(s)
- Fang Hong
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin, China
| | - Qianqian Meng
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin, China
| | - Weiyu Zhang
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin, China
| | - Ruiqin Zheng
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin, China
| | - Xiaoyun Li
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin, China
| | - Tao Cheng
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin, China.
| | - Deqing Hu
- The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Key Laboratory of Breast Cancer Prevention and Therapy, Ministry of Education, Cancer Institute and Hospital of Tianjin Medical University, Department of Cell Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China.
| | - Xin Gao
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin, China.
| |
|
49
|
Singh R, Hie BL, Narayan A, Berger B. Schema: metric learning enables interpretable synthesis of heterogeneous single-cell modalities. Genome Biol 2021; 22:131. [PMID: 33941239 PMCID: PMC8091541 DOI: 10.1186/s13059-021-02313-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/26/2020] [Accepted: 03/12/2021] [Indexed: 02/08/2023] Open
Abstract
A complete understanding of biological processes requires synthesizing information across heterogeneous modalities, such as age, disease status, or gene expression. Technological advances in single-cell profiling have enabled researchers to assay multiple modalities simultaneously. We present Schema, which uses a principled metric learning strategy that identifies informative features in a modality to synthesize disparate modalities into a single coherent interpretation. We use Schema to infer cell types by integrating gene expression and chromatin accessibility data; demonstrate informative data visualizations that synthesize multiple modalities; perform differential gene expression analysis in the context of spatial variability; and estimate evolutionary pressure on peptide sequences.
Affiliation(s)
- Rohit Singh
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
| | - Brian L Hie
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Ashwin Narayan
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
| |
|
50
|
Abstract
Motivation Single-cell RNA-sequencing has grown massively in scale since its inception, presenting substantial analytic and computational challenges. Even simple downstream analyses, such as dimensionality reduction and clustering, require days of runtime and hundreds of gigabytes of memory for today’s largest datasets. In addition, current methods often favor common cell types, and miss salient biological features captured by small cell populations. Results Here we present Hopper, a single-cell toolkit that both speeds up the analysis of single-cell datasets and highlights their transcriptional diversity by intelligent subsampling, or sketching. Hopper realizes the optimal polynomial-time approximation of the Hausdorff distance between the full and downsampled dataset, ensuring that each cell is well-represented by some cell in the sample. Unlike prior sketching methods, Hopper adds points iteratively and allows for additional sampling from regions of interest, enabling fast and targeted multi-resolution analyses. In a dataset of over 1.3 million mouse brain cells, Hopper detects a cluster of just 64 macrophages expressing inflammatory genes (0.004% of the full dataset) from a Hopper sketch containing just 5000 cells, and several other small but biologically interesting immune cell populations invisible to analysis of the full data. On an even larger dataset consisting of ∼2 million developing mouse organ cells, we show Hopper’s even representation of important cell types in small sketches, in contrast with prior sketching methods. We also introduce Treehopper, which uses spatial partitioning to speed up Hopper by orders of magnitude with minimal loss in performance. By condensing transcriptional information encoded in large datasets, Hopper and Treehopper grant the individual user with a laptop the analytic capabilities of a large consortium. Availability and implementation The code for Hopper is available at https://github.com/bendemeo/hopper. 
In addition, we have provided sketches of many of the largest single-cell datasets, available at http://hopper.csail.mit.edu.
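The iterative sampling described in this abstract can be illustrated with a greedy farthest-first traversal, a standard 2-approximation for the k-center problem. The sketch below is an assumption about the general technique, not the Hopper implementation: the function name `farthest_first_sketch`, its parameters, and the use of Euclidean distance are all hypothetical choices made for illustration.

```python
import math
import random


def farthest_first_sketch(points, k, seed=0):
    """Pick k point indices by greedy farthest-first traversal.

    Hypothetical helper for illustration only; not taken from the Hopper code.
    Returns the chosen indices and the covering radius, i.e. the largest
    distance from any point to its nearest chosen point. Because the sketch
    is a subset of the data, this radius equals the Hausdorff distance
    between the full dataset and the sketch.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = random.Random(seed)
    first = rng.randrange(len(points))
    chosen = [first]
    # min_dist[i] holds the distance from point i to its nearest chosen point.
    min_dist = [math.dist(p, points[first]) for p in points]
    while len(chosen) < k:
        # Add the point that is currently worst represented by the sketch.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        # Update nearest-chosen distances against the newly added point.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return chosen, max(min_dist)
```

Because each step adds the point farthest from the current sketch, a small, well-separated population tends to be sampled early even when it is a tiny fraction of the data, which is the behaviour the abstract attributes to sketching. This naive version costs O(nk) distance computations; the spatial partitioning that the abstract credits to Treehopper is not shown here.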
Affiliation(s)
- Benjamin DeMeo
- Department of Bioinformatics, Harvard University, Cambridge, MA 02138, USA
- Computer Science and Artificial Intelligence Laboratory
| | - Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| |
|