1. Drost F, An Y, Bonafonte-Pardàs I, Dratva LM, Lindeboom RGH, Haniffa M, Teichmann SA, Theis F, Lotfollahi M, Schubert B. Multi-modal generative modeling for joint analysis of single-cell T cell receptor and gene expression data. Nat Commun 2024; 15:5577. [PMID: 38956082 PMCID: PMC11220149 DOI: 10.1038/s41467-024-49806-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Accepted: 05/23/2024] [Indexed: 07/04/2024] [Open access]
Abstract
Recent advances in single-cell immune profiling have enabled the simultaneous measurement of transcriptome and T cell receptor (TCR) sequences, offering great potential for studying immune responses at the cellular level. However, integrating these diverse modalities across datasets is challenging due to their unique data characteristics and technical variations. Here, to address this, we develop the multimodal generative model mvTCR to fuse modality-specific information across transcriptome and TCR into a shared representation. Our analysis demonstrates the added value of multimodal over unimodal approaches to capture antigen specificity. Notably, we use mvTCR to distinguish T cell subpopulations binding to SARS-CoV-2 antigens from bystander cells. Furthermore, when combined with reference mapping approaches, mvTCR can map newly generated datasets to extensive T cell references, facilitating knowledge transfer. In summary, we envision mvTCR to enable a scalable analysis of multimodal immune profiling data and advance our understanding of immune responses.
Affiliation(s)
- Felix Drost
  - Computational Health Center, Helmholtz Munich, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
  - School of Life Sciences Weihenstephan, Technical University of Munich, Alte Akademie 8, 85354, Freising, Germany
- Yang An
  - Computational Health Center, Helmholtz Munich, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
  - School of Computation, Information and Technology, Technical University of Munich, Boltzmannstraße 3, 85748, Garching bei München, Germany
- Irene Bonafonte-Pardàs
  - Computational Health Center, Helmholtz Munich, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
- Lisa M Dratva
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Rik G H Lindeboom
  - The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands
- Muzlifah Haniffa
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
  - Biosciences Institute, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK
- Sarah A Teichmann
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
  - Department of Physics, Cavendish Laboratory, University of Cambridge, 19 JJ Thomson Avenue, Cambridge, UK
- Fabian Theis
  - Computational Health Center, Helmholtz Munich, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
  - School of Life Sciences Weihenstephan, Technical University of Munich, Alte Akademie 8, 85354, Freising, Germany
  - School of Computation, Information and Technology, Technical University of Munich, Boltzmannstraße 3, 85748, Garching bei München, Germany
- Mohammad Lotfollahi
  - Computational Health Center, Helmholtz Munich, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Benjamin Schubert
  - Computational Health Center, Helmholtz Munich, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
  - School of Computation, Information and Technology, Technical University of Munich, Boltzmannstraße 3, 85748, Garching bei München, Germany

2. Semrau S. Neural Network-Based Filter Design for Compressive Raman Classification of Cells. J Chem Inf Model 2024. [PMID: 38959402 DOI: 10.1021/acs.jcim.3c01856] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/05/2024]
Abstract
Cell-based therapies are bound to revolutionize medicine, but significant technical hurdles must be overcome before wider adoption. In particular, nondestructive, label-free methods to characterize cells in real time are needed to optimize the production process and improve quality control. Raman spectroscopy, which provides a fingerprint of a cell's chemical composition, would be an ideal modality but is too slow for high-throughput applications. Compressive Raman techniques, which measure only linear combinations of Raman intensities, can be fast but require careful optimization to deliver high performance. Here, we develop a neural network model to identify optimal parameters for a compressive sensing scheme that reduces measurement time by 2 orders of magnitude. In a data set containing Raman spectra of three different cell types, it achieves up to 90% classification accuracy using only five linear combinations of Raman intensities. Our method thus unlocks the power of Raman spectroscopy for the characterization of cell products.
Affiliation(s)
- Stefan Semrau
  - Leiden Institute of Physics, Leiden University, Leiden 2333CA, The Netherlands

3. Netskar H, Pfefferle A, Goodridge JP, Sohlberg E, Dufva O, Teichmann SA, Brownlie D, Michaëlsson J, Marquardt N, Clancy T, Horowitz A, Malmberg KJ. Pan-cancer profiling of tumor-infiltrating natural killer cells through transcriptional reference mapping. Nat Immunol 2024:10.1038/s41590-024-01884-z. [PMID: 38956379 DOI: 10.1038/s41590-024-01884-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Accepted: 05/30/2024] [Indexed: 07/04/2024]
Abstract
The functional diversity of natural killer (NK) cell repertoires stems from differentiation, homeostatic, receptor-ligand interactions and adaptive-like responses to viral infections. In the present study, we generated a single-cell transcriptional reference map of healthy human blood- and tissue-derived NK cells, with temporal resolution and fate-specific expression of gene-regulatory networks defining NK cell differentiation. Transfer learning facilitated incorporation of tumor-infiltrating NK cell transcriptomes (39 datasets, 7 solid tumors, 427 patients) into the reference map to analyze tumor microenvironment (TME)-induced perturbations. Of the six functionally distinct NK cell states identified, a dysfunctional stressed CD56bright state susceptible to TME-induced immunosuppression and a cytotoxic TME-resistant effector CD56dim state were commonly enriched across tumor types, the ratio of which was predictive of patient outcome in malignant melanoma and osteosarcoma. This resource may inform the design of new NK cell therapies and can be extended through transfer learning to interrogate new datasets from experimental perturbations or disease conditions.
Affiliation(s)
- Herman Netskar
  - Department of Cancer Immunology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
  - Precision Immunotherapy Alliance, University of Oslo, Oslo, Norway
- Aline Pfefferle
  - Center for Infectious Medicine, Department of Medicine Huddinge, Karolinska Institutet, Stockholm, Sweden
- Ebba Sohlberg
  - Center for Infectious Medicine, Department of Medicine Huddinge, Karolinska Institutet, Stockholm, Sweden
- Olli Dufva
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Sarah A Teichmann
  - Wellcome-MRC Cambridge Stem Cell Institute, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, University of Cambridge, Cambridge, UK
  - Department of Medicine, University of Cambridge, Cambridge, UK
- Demi Brownlie
  - Center for Hematology and Regenerative Medicine, Department of Medicine Huddinge, Karolinska Institutet, Huddinge, Sweden
- Jakob Michaëlsson
  - Center for Infectious Medicine, Department of Medicine Huddinge, Karolinska Institutet, Stockholm, Sweden
- Nicole Marquardt
  - Center for Hematology and Regenerative Medicine, Department of Medicine Huddinge, Karolinska Institutet, Huddinge, Sweden
- Trevor Clancy
  - Oslo Cancer Cluster, NEC OncoImmunity AS, Oslo, Norway
  - Department of Vaccine Informatics, Institute for Tropical Medicine, Nagasaki University, Nagasaki, Japan
- Amir Horowitz
  - Department of Immunology & Immunotherapy, Lipschultz Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Karl-Johan Malmberg
  - Department of Cancer Immunology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
  - Precision Immunotherapy Alliance, University of Oslo, Oslo, Norway
  - Center for Infectious Medicine, Department of Medicine Huddinge, Karolinska Institutet, Stockholm, Sweden

4. Trapnell C. Revealing gene function with statistical inference at single-cell resolution. Nat Rev Genet 2024:10.1038/s41576-024-00750-w. [PMID: 38951690 DOI: 10.1038/s41576-024-00750-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/21/2024] [Indexed: 07/03/2024]
Abstract
Single-cell and spatial molecular profiling assays have shown large gains in sensitivity, resolution and throughput. Applying these technologies to specimens from human and model organisms promises to comprehensively catalogue cell types, reveal their lineage origins in development and discern their contributions to disease pathogenesis. Moreover, rapidly dropping costs have made well-controlled perturbation experiments and cohort studies widely accessible, illuminating mechanisms that give rise to phenotypes at the scale of the cell, the tissue and the whole organism. Interpreting the coming flood of single-cell data, much of which will be spatially resolved, will place a tremendous burden on existing computational pipelines. However, statistical concepts, models, tools and algorithms can be repurposed to solve problems now arising in genetic and molecular biology studies of development and disease. Here, I review how the questions that recent technological innovations promise to answer can be addressed by the major classes of statistical tools.
Affiliation(s)
- Cole Trapnell
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA
  - Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
  - Seattle Hub for Synthetic Biology, Seattle, WA, USA

5. Spathopoulou A, Podlesnic M, De Gaetano L, Kirsch EM, Tisch M, Finotello F, Aigner L, Günther K, Edenhofer F. Single-cell Profiling of Reprogrammed Human Neural Stem Cells Unveils High Similarity to Neural Progenitors in the Developing Central Nervous System. Stem Cell Rev Rep 2024; 20:1325-1339. [PMID: 38519702 PMCID: PMC11222274 DOI: 10.1007/s12015-024-10698-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/14/2024] [Indexed: 03/25/2024]
Abstract
BACKGROUND: Similar to induced pluripotent stem cells (iPSCs), induced neural stem cells (iNSCs) can be directly converted from human somatic cells such as dermal fibroblasts and peripheral blood monocytes. While previous studies have demonstrated the resemblance of iNSCs to neural stem cells derived from primary sources and embryonic stem cells, respectively, a comprehensive analysis of the correlation between iNSCs and their physiological counterparts remained to be investigated.
METHODS: Single-cell sequencing technologies now provide unique opportunities for in-depth cellular benchmarking of complex cell populations. Our study involves the comprehensive profiling of converted human iNSCs at a single-cell transcriptomic level, alongside conventional methods such as flow cytometry and immunofluorescence staining.
RESULTS: Our results show that the iNSC conversion yields a homogeneous cell population expressing bona fide neural stem cell markers. Extracting transcriptomic signatures from published single-cell transcriptomic atlas data and comparing them to the iNSC transcriptome reveals resemblance to embryonic neuroepithelial cells of early neurodevelopmental stages observed in vivo at 5 weeks of development.
CONCLUSION: Our data underscore the physiological relevance of directly converted iNSCs, making them a valuable in vitro system for modeling human central nervous system development and establishing translational applications in cell therapy and compound screening.
Affiliation(s)
- Angeliki Spathopoulou
  - Department of Molecular Biology & CMBI, Genomics, Stem Cell & Regenerative Medicine Group, University of Innsbruck, Technikerstraße 25, 6020, Innsbruck, Austria
- Martina Podlesnic
  - Department of Molecular Biology & CMBI, Genomics, Stem Cell & Regenerative Medicine Group, University of Innsbruck, Technikerstraße 25, 6020, Innsbruck, Austria
- Laura De Gaetano
  - Department of Molecular Biology & CMBI, Genomics, Stem Cell & Regenerative Medicine Group, University of Innsbruck, Technikerstraße 25, 6020, Innsbruck, Austria
- Elena Marie Kirsch
  - Institute of Molecular Regenerative Medicine, Paracelsus Medical University, Salzburg, Austria
  - Center for Stroke Research, Charité - Universitätsmedizin Berlin, Berlin, Germany
  - Department of Experimental Neurology, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Marcel Tisch
  - Department of Molecular Biology & CMBI, Genomics, Stem Cell & Regenerative Medicine Group, University of Innsbruck, Technikerstraße 25, 6020, Innsbruck, Austria
- Francesca Finotello
  - Department of Molecular Biology, Digital Science Center (DiSC), University of Innsbruck, Innsbruck, Austria
- Ludwig Aigner
  - Institute of Molecular Regenerative Medicine, Paracelsus Medical University, Salzburg, Austria
- Katharina Günther
  - Department of Molecular Biology & CMBI, Genomics, Stem Cell & Regenerative Medicine Group, University of Innsbruck, Technikerstraße 25, 6020, Innsbruck, Austria
  - Institute of Molecular Regenerative Medicine, Paracelsus Medical University, Salzburg, Austria
- Frank Edenhofer
  - Department of Molecular Biology & CMBI, Genomics, Stem Cell & Regenerative Medicine Group, University of Innsbruck, Technikerstraße 25, 6020, Innsbruck, Austria

6. Gonzalez-Ferrer J, Lehrer J, O'Farrell A, Paten B, Teodorescu M, Haussler D, Jonsson VD, Mostajo-Radji MA. SIMS: A deep-learning label transfer tool for single-cell RNA sequencing analysis. Cell Genomics 2024; 4:100581. [PMID: 38823397 DOI: 10.1016/j.xgen.2024.100581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Revised: 04/02/2024] [Accepted: 05/09/2024] [Indexed: 06/03/2024]
Abstract
Cell atlases serve as vital references for automating cell labeling in new samples, yet existing classification algorithms struggle with accuracy. Here we introduce SIMS (scalable, interpretable machine learning for single cell), a low-code data-efficient pipeline for single-cell RNA classification. We benchmark SIMS against datasets from different tissues and species. We demonstrate SIMS's efficacy in classifying cells in the brain, achieving high accuracy even with small training sets (<3,500 cells) and across different samples. SIMS accurately predicts neuronal subtypes in the developing brain, shedding light on genetic changes during neuronal differentiation and postmitotic fate refinement. Finally, we apply SIMS to single-cell RNA datasets of cortical organoids to predict cell identities and uncover genetic variations between cell lines. SIMS identifies cell-line differences and misannotated cell lineages in human cortical organoids derived from different pluripotent stem cell lines. Altogether, we show that SIMS is a versatile and robust tool for cell-type classification from single-cell datasets.
Affiliation(s)
- Jesus Gonzalez-Ferrer
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Live Cell Biotechnology Discovery Lab, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- Julian Lehrer
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Live Cell Biotechnology Discovery Lab, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Applied Mathematics, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- Ash O'Farrell
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- Benedict Paten
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- Mircea Teodorescu
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Electrical and Computer Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- David Haussler
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- Vanessa D Jonsson
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Applied Mathematics, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- Mohammed A Mostajo-Radji
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Live Cell Biotechnology Discovery Lab, University of California, Santa Cruz, Santa Cruz, CA 95060, USA

7. McLean AK, Reynolds G, Pratt AG. Leveraging Multi-Tissue, Single-Cell Atlases as Tools to Elucidate Shared Mechanisms of Immune-Mediated Inflammatory Diseases. Biomedicines 2024; 12:1297. [PMID: 38927506 PMCID: PMC11201400 DOI: 10.3390/biomedicines12061297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2024] [Revised: 06/05/2024] [Accepted: 06/08/2024] [Indexed: 06/28/2024] [Open access]
Abstract
The observation that certain therapeutic strategies for targeting inflammation benefit patients with distinct immune-mediated inflammatory diseases (IMIDs) is exemplified by the success of TNF blockade in conditions including rheumatoid arthritis, ulcerative colitis, and skin psoriasis, albeit only for subsets of individuals with each condition. This suggests intersecting "nodes" in inflammatory networks at a molecular and cellular level may drive and/or maintain IMIDs, being "shared" between traditionally distinct diagnoses without mapping neatly to a single clinical phenotype. In line with this proposition, integrative tumour tissue analyses in oncology have highlighted novel cell states acting across diverse cancers, with important implications for precision medicine. Drawing upon advances in the oncology field, this narrative review will first summarise learnings from the Human Cell Atlas in health as a platform for interrogating IMID tissues. It will then review cross-disease studies to date that inform this endeavour before considering future directions in the field.
Affiliation(s)
- Anthony K. McLean
  - Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
- Gary Reynolds
  - Center for Immunology and Inflammatory Diseases, Massachusetts General Hospital, Boston, MA 02114, USA
- Arthur G. Pratt
  - Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
  - Musculoskeletal Unit, Newcastle upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne NE7 7DN, UK

8. Curion F, Theis FJ. Machine learning integrative approaches to advance computational immunology. Genome Med 2024; 16:80. [PMID: 38862979 PMCID: PMC11165829 DOI: 10.1186/s13073-024-01350-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Accepted: 05/23/2024] [Indexed: 06/13/2024] [Open access]
Abstract
The study of immunology, traditionally reliant on proteomics to evaluate individual immune cells, has been revolutionized by single-cell RNA sequencing. Computational immunologists play a crucial role in analysing these datasets, moving beyond traditional protein marker identification to encompass a more detailed view of cellular phenotypes and their functional roles. Recent technological advancements allow the simultaneous measurements of multiple cellular components (transcriptome, proteome, chromatin, epigenetic modifications and metabolites) within single cells, including in spatial contexts within tissues. This has led to the generation of complex multiscale datasets that can include multimodal measurements from the same cells or a mix of paired and unpaired modalities. Modern machine learning (ML) techniques allow for the integration of multiple "omics" data without the need for extensive independent modelling of each modality. This review focuses on recent advancements in ML integrative approaches applied to immunological studies. We highlight the importance of these methods in creating a unified representation of multiscale data collections, particularly for single-cell and spatial profiling technologies. Finally, we discuss the challenges of these holistic approaches and how they will be instrumental in the development of a common coordinate framework for multiscale studies, thereby accelerating research and enabling discoveries in the computational immunology field.
Affiliation(s)
- Fabiola Curion
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
  - Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
- Fabian J Theis
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
  - Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
  - School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany

9. Rumberger JL, Greenwald NF, Ranek JS, Boonrat P, Walker C, Franzen J, Varra SR, Kong A, Sowers C, Liu CC, Averbukh I, Piyadasa H, Vanguri R, Nederlof I, Wang XJ, Van Valen D, Kok M, Hollmann TJ, Kainmueller D, Angelo M. Automated classification of cellular expression in multiplexed imaging data with Nimbus. bioRxiv 2024:2024.06.02.597062. [PMID: 38895405 PMCID: PMC11185540 DOI: 10.1101/2024.06.02.597062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Abstract
Multiplexed imaging offers a powerful approach to characterize the spatial topography of tissues in both health and disease. To analyze such data, the specific combination of markers that are present in each cell must be enumerated to enable accurate phenotyping, a process that often relies on unsupervised clustering. We constructed the Pan-Multiplex (Pan-M) dataset containing 197 million distinct annotations of marker expression across 15 different cell types. We used Pan-M to create Nimbus, a deep learning model to predict marker positivity from multiplexed image data. Nimbus is a pre-trained model that uses the underlying images to classify marker expression across distinct cell types, from different tissues, acquired using different microscope platforms, without requiring any retraining. We demonstrate that Nimbus predictions capture the underlying staining patterns of the full diversity of markers present in Pan-M. We then show how Nimbus predictions can be integrated with downstream clustering algorithms to robustly identify cell subtypes in image data. We have open-sourced Nimbus and Pan-M to enable community use at https://github.com/angelolab/Nimbus-Inference.
Affiliation(s)
- J. Lorenz Rumberger
  - Max-Delbruck-Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
  - Humboldt-Universität zu Berlin, Faculty of Mathematics and Natural Sciences, Berlin, Germany
  - Helmholtz Imaging
- Noah F. Greenwald
  - Department of Pathology, Stanford University, Stanford, California, USA
- Jolene S. Ranek
  - Department of Pathology, Stanford University, Stanford, California, USA
- Potchara Boonrat
  - Department of Pathology, Stanford University, Stanford, California, USA
- Cameron Walker
  - Department of Pathology, Stanford University, Stanford, California, USA
- Jannik Franzen
  - Max-Delbruck-Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
  - Helmholtz Imaging
  - Charité University Medicine, Berlin, Germany
- Alex Kong
  - Department of Pathology, Stanford University, Stanford, California, USA
- Cameron Sowers
  - Department of Pathology, Stanford University, Stanford, California, USA
- Candace C. Liu
  - Department of Pathology, Stanford University, Stanford, California, USA
- Inna Averbukh
  - Department of Pathology, Stanford University, Stanford, California, USA
- Hadeesha Piyadasa
  - Department of Pathology, Stanford University, Stanford, California, USA
- Rami Vanguri
  - Division of Precision Medicine, Department of Medicine, NYU Grossman School of Medicine, New York, New York, USA
- Iris Nederlof
  - Division of Tumor Biology & Immunology, The Netherlands Cancer Institute, Amsterdam, the Netherlands
- Xuefei Julie Wang
  - Division of Biology and Biological Engineering, Caltech, Pasadena, CA, USA
- David Van Valen
  - Division of Biology and Biological Engineering, Caltech, Pasadena, CA, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Marleen Kok
  - Division of Tumor Biology & Immunology, The Netherlands Cancer Institute, Amsterdam, the Netherlands
  - Department of Medical Oncology, The Netherlands Cancer Institute, Amsterdam, the Netherlands
- Travis J. Hollmann
  - Division of Precision Medicine, Department of Medicine, NYU Grossman School of Medicine, New York, New York, USA
- Dagmar Kainmueller
  - Max-Delbruck-Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
  - Helmholtz Imaging
  - Potsdam University, Digital Engineering Faculty, Germany
- Michael Angelo
  - Department of Pathology, Stanford University, Stanford, California, USA

10. Rivero-Garcia I, Torres M, Sánchez-Cabo F. Deep generative models in single-cell omics. Comput Biol Med 2024; 176:108561. [PMID: 38749321 DOI: 10.1016/j.compbiomed.2024.108561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Revised: 04/30/2024] [Accepted: 05/05/2024] [Indexed: 05/31/2024]
Abstract
Deep Generative Models (DGMs) are becoming instrumental for inferring probability distributions inherent to complex processes, such as most questions in biomedical research. For many years, there was a lack of mathematical methods that would allow this inference in the scarce data scenario of biomedical research. The advent of single-cell omics has finally squared the so-called "skinny matrix", allowing the application of mathematical methods already extensively used in other areas. Moreover, it is now possible to integrate data at different molecular levels in thousands or even millions of samples, thanks to the number of single-cell atlases being collaboratively generated. Additionally, DGMs have proven useful in other frequent tasks in single-cell analysis pipelines, from dimensionality reduction and cell type annotation to RNA velocity inference. In spite of their promise, DGMs need to be used with caution in biomedical research, paying special attention to their use to answer the right questions and to the definition of appropriate error metrics and validation checkpoints that confirm not only their correct use but also their relevance. All in all, DGMs provide an exciting tool that opens a bright future for the integrative analysis of single-cell omics to understand health and disease.
Affiliation(s)
- Inés Rivero-Garcia
  - Universidad Politécnica de Madrid, Madrid, 28040, Spain; Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, 28029, Spain
- Miguel Torres
  - Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, 28029, Spain
- Fátima Sánchez-Cabo
  - Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, 28029, Spain

11. De Zuani M, Xue H, Park JS, Dentro SC, Seferbekova Z, Tessier J, Curras-Alonso S, Hadjipanayis A, Athanasiadis EI, Gerstung M, Bayraktar O, Cvejic A. Single-cell and spatial transcriptomics analysis of non-small cell lung cancer. Nat Commun 2024; 15:4388. [PMID: 38782901 PMCID: PMC11116453 DOI: 10.1038/s41467-024-48700-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Accepted: 05/08/2024] [Indexed: 05/25/2024] [Open access]
Abstract
Lung cancer is the second most frequently diagnosed cancer and the leading cause of cancer-related mortality worldwide. Tumour ecosystems feature diverse immune cell types. Myeloid cells, in particular, are prevalent and have a well-established role in promoting the disease. In our study, we profile approximately 900,000 cells from 25 treatment-naive patients with adenocarcinoma and squamous-cell carcinoma by single-cell and spatial transcriptomics. We note an inverse relationship between anti-inflammatory macrophages and NK cells/T cells, and with reduced NK cell cytotoxicity within the tumour. While we observe a similar cell type composition in both adenocarcinoma and squamous-cell carcinoma, we detect significant differences in the co-expression of various immune checkpoint inhibitors. Moreover, we reveal evidence of a transcriptional "reprogramming" of macrophages in tumours, shifting them towards cholesterol export and adopting a foetal-like transcriptional signature which promotes iron efflux. Our multi-omic resource offers a high-resolution molecular map of tumour-associated macrophages, enhancing our understanding of their role within the tumour microenvironment.
Collapse
Affiliation(s)
- Marco De Zuani
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
  - OpenTargets, Wellcome Genome Campus, Hinxton, UK
  - Department of Haematology, University of Cambridge, Cambridge, UK
  - Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute, Cambridge, UK
- Haoliang Xue
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
  - OpenTargets, Wellcome Genome Campus, Hinxton, UK
  - Department of Haematology, University of Cambridge, Cambridge, UK
  - Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute, Cambridge, UK
- Jun Sung Park
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
  - OpenTargets, Wellcome Genome Campus, Hinxton, UK
  - European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
- Stefan C Dentro
  - European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
  - Division of Artificial Intelligence in Oncology, DKFZ, Heidelberg, Germany
- Zaira Seferbekova
  - European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
- Julien Tessier
  - Precision Medicine and Computational Biology, Sanofi, Cambridge, MA, USA
- Emmanouil I Athanasiadis
  - OpenTargets, Wellcome Genome Campus, Hinxton, UK
  - Medical Image and Signal Processing Laboratory (MEDISP), Department of Biomedical Engineering, University of West Attica, Athens, Greece
- Moritz Gerstung
  - OpenTargets, Wellcome Genome Campus, Hinxton, UK
  - European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
  - Division of Artificial Intelligence in Oncology, DKFZ, Heidelberg, Germany
- Omer Bayraktar
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
  - OpenTargets, Wellcome Genome Campus, Hinxton, UK
- Ana Cvejic
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
  - OpenTargets, Wellcome Genome Campus, Hinxton, UK
  - Department of Haematology, University of Cambridge, Cambridge, UK
  - Biotech Research & Innovation Centre (BRIC), University of Copenhagen, Copenhagen, Denmark
12
Guo Q, Yuan M, Zhang L, Deng M. scPLAN: a hierarchical computational framework for single transcriptomics data annotation, integration and cell-type label refinement. Brief Bioinform 2024; 25:bbae305. [PMID: 38935069 PMCID: PMC11209730 DOI: 10.1093/bib/bbae305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2024] [Revised: 05/22/2024] [Accepted: 06/11/2024] [Indexed: 06/28/2024] Open
Abstract
Motivation: In the past decade, single-cell RNA sequencing (scRNA-seq) has emerged as a pivotal method for transcriptomic profiling in biomedical research. Precise cell-type identification is crucial for subsequent analysis of single-cell data, and the integration and refinement of annotated data are essential for building comprehensive databases. However, prevailing annotation techniques often overlook the hierarchical organization of cell types, resulting in inconsistent annotations. Meanwhile, most existing integration approaches fail to integrate datasets with different annotation depths, and none of them can enhance the labels of outdated data with lower annotation resolutions using more intricately annotated datasets or novel biological findings.
Results: Here, we introduce scPLAN, a hierarchical computational framework designed for scRNA-seq data analysis. scPLAN excels in annotating unlabeled scRNA-seq data using a reference dataset structured along a hierarchical cell-type tree. It identifies potential novel cell types in a systematic, layer-by-layer manner. Additionally, scPLAN effectively integrates annotated scRNA-seq datasets with varying levels of annotation depth, ensuring consistent refinement of cell-type labels across datasets with lower resolutions. Through extensive annotation and novel cell detection experiments, scPLAN has demonstrated its efficacy. Two case studies showcase how scPLAN integrates datasets with diverse cell-type label resolutions and refines their cell-type labels.
Availability: https://github.com/michaelGuo1204/scPLAN.
Affiliation(s)
- Qirui Guo
  - Center for Quantitative Biology, Peking University, Yiheyuan Road, 100871, Beijing, China
- Musu Yuan
  - Center for Quantitative Biology, Peking University, Yiheyuan Road, 100871, Beijing, China
- Lei Zhang
  - Center for Quantitative Biology, Peking University, Yiheyuan Road, 100871, Beijing, China
  - Beijing International Center for Mathematical Research, Peking University, Yiheyuan Road, 100871, Beijing, China
  - Center for Machine Learning Research, Peking University, Yiheyuan Road, 100871, Beijing, China
- Minghua Deng
  - Center for Quantitative Biology, Peking University, Yiheyuan Road, 100871, Beijing, China
  - School of Mathematical Sciences, Peking University, Yiheyuan Road, 100871, Beijing, China
  - Center for Statistical Science, Peking University, Yiheyuan Road, 100871, Beijing, China
13
Hu Z, Przytycki PF, Pollard KS. CellWalker2: multi-omic discovery of hierarchical cell type relationships and their associations with genomic annotations. bioRxiv 2024:2024.05.17.594770. [PMID: 38798605 PMCID: PMC11118555 DOI: 10.1101/2024.05.17.594770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
CellWalker2 is a graph diffusion-based method for single-cell genomics data integration. It extends the CellWalker model by incorporating hierarchical relationships between cell types, providing estimates of statistical significance, and adding data structures for analyzing multi-omics data so that gene expression and open chromatin can be jointly modeled. Our open-source software enables users to annotate cells using existing ontologies and to probabilistically match cell types between two or more contexts, including across species. CellWalker2 can also map genomic regions to cell ontologies, enabling precise annotation of elements derived from bulk data, such as enhancers, genetic variants, and sequence motifs. Through simulation studies, we show that CellWalker2 performs better than existing methods in cell type annotation and mapping. We then use data from the brain and immune system to demonstrate CellWalker2's ability to discover cell type-specific regulatory programs and both conserved and divergent cell type relationships in complex tissues.
Affiliation(s)
- Zhirui Hu
  - Gladstone Institute of Data Science & Biotechnology, 1650 Owens Street, San Francisco, 94158, CA, USA
- Pawel F Przytycki
  - Gladstone Institute of Data Science & Biotechnology, 1650 Owens Street, San Francisco, 94158, CA, USA
  - Faculty of Computing & Data Sciences, Boston University, 665 Commonwealth Avenue, Boston, 02215, MA, USA
- Katherine S Pollard
  - Gladstone Institute of Data Science & Biotechnology, 1650 Owens Street, San Francisco, 94158, CA, USA
  - Department of Epidemiology & Biostatistics, University of California, 1650 Owens Street, San Francisco, 94158, CA, USA
  - Chan Zuckerberg Biohub SF, 499 Illinois Street, San Francisco, 94158, CA, USA
14
Lotfollahi M, Hao Y, Theis FJ, Satija R. The future of rapid and automated single-cell data analysis using reference mapping. Cell 2024; 187:2343-2358. [PMID: 38729109 PMCID: PMC11184658 DOI: 10.1016/j.cell.2024.03.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Revised: 03/05/2024] [Accepted: 03/08/2024] [Indexed: 05/12/2024]
Abstract
As the number of single-cell datasets continues to grow rapidly, workflows that map new data to well-curated reference atlases offer enormous promise for the biological community. In this perspective, we discuss key computational challenges and opportunities for single-cell reference-mapping algorithms. We discuss how mapping algorithms will enable the integration of diverse datasets across disease states, molecular modalities, genetic perturbations, and diverse species and will eventually replace manual and laborious unsupervised clustering pipelines.
Affiliation(s)
- Mohammad Lotfollahi
  - Institute of Computational Biology, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Germany
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Yuhan Hao
  - Center for Genomics and Systems Biology, New York University, New York, NY, USA
  - New York Genome Center, New York, NY, USA
- Fabian J Theis
  - Institute of Computational Biology, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Germany
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
  - Department of Mathematics, Technical University of Munich, Garching, Germany
- Rahul Satija
  - Center for Genomics and Systems Biology, New York University, New York, NY, USA
  - New York Genome Center, New York, NY, USA
15
Cuevas-Diaz Duran R, Wei H, Wu J. Data normalization for addressing the challenges in the analysis of single-cell transcriptomic datasets. BMC Genomics 2024; 25:444. [PMID: 38711017 PMCID: PMC11073985 DOI: 10.1186/s12864-024-10364-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2023] [Accepted: 04/29/2024] [Indexed: 05/08/2024] Open
Abstract
Background: Normalization is a critical step in the analysis of single-cell RNA-sequencing (scRNA-seq) datasets. Its main goal is to make gene counts comparable within and between cells. To do so, normalization methods must account for technical and biological variability. Numerous normalization methods have been developed addressing different sources of dispersion and making specific assumptions about the count data.
Main body: The selection of a normalization method has a direct impact on downstream analysis, for example differential gene expression and cluster identification. Thus, the objective of this review is to guide the reader in making an informed decision on the most appropriate normalization method to use. To this aim, we first give an overview of the different single cell sequencing platforms and methods commonly used including isolation and library preparation protocols. Next, we discuss the inherent sources of variability of scRNA-seq datasets. We describe the categories of normalization methods and include examples of each. We also delineate imputation and batch-effect correction methods. Furthermore, we describe data-driven metrics commonly used to evaluate the performance of normalization methods. We also discuss common scRNA-seq methods and toolkits used for integrated data analysis.
Conclusions: According to the correction performed, normalization methods can be broadly classified as within and between-sample algorithms. Moreover, with respect to the mathematical model used, normalization methods can further be classified into: global scaling methods, generalized linear models, mixed methods, and machine learning-based methods. Each of these methods has pros and cons and makes different statistical assumptions. However, no single normalization method performs best. Instead, metrics such as silhouette width, K-nearest neighbor batch-effect test, or Highly Variable Genes are recommended to assess the performance of normalization methods.
Affiliation(s)
- Raquel Cuevas-Diaz Duran
  - Tecnologico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Monterrey, Nuevo Leon, 64710, Mexico
- Haichao Wei
  - The Vivian L. Smith Department of Neurosurgery, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
  - Center for Stem Cell and Regenerative Medicine, UT Brown Foundation Institute of Molecular Medicine, Houston, TX, 77030, USA
- Jiaqian Wu
  - The Vivian L. Smith Department of Neurosurgery, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
  - Center for Stem Cell and Regenerative Medicine, UT Brown Foundation Institute of Molecular Medicine, Houston, TX, 77030, USA
  - MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX, 77030, USA
16
Tadi AA, Alhadidi D, Rueda L. PPPCT: Privacy-Preserving framework for Parallel Clustering Transcriptomics data. Comput Biol Med 2024; 173:108351. [PMID: 38520921 DOI: 10.1016/j.compbiomed.2024.108351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 03/18/2024] [Accepted: 03/18/2024] [Indexed: 03/25/2024]
Abstract
Single-cell transcriptomics data provides crucial insights into patients' health, yet poses significant privacy concerns. Genomic data privacy attacks can have deep implications, encompassing not only the patients' health information but also extending widely to compromise their families'. Moreover, the permanence of leaked data exacerbates the challenges, making retraction an impossibility. While extensive efforts have been directed towards clustering single-cell transcriptomics data, addressing critical challenges, especially in the realm of privacy, remains pivotal. This paper introduces an efficient, fast, privacy-preserving approach for clustering single-cell RNA-sequencing (scRNA-seq) datasets. The key contributions include ensuring data privacy, achieving high-quality clustering, accommodating the high dimensionality inherent in the datasets, and maintaining reasonable computation time for big-scale datasets. Our proposed approach utilizes the map-reduce scheme to parallelize clustering, addressing intensive calculation challenges. Intel Software Guard eXtension (SGX) processors are used to ensure the security of sensitive code and data during processing. Additionally, the approach incorporates a logarithm transformation as a preprocessing step, employs non-negative matrix factorization for dimensionality reduction, and utilizes parallel k-means for clustering. The approach fully leverages the computing capabilities of all processing resources within a secure private cloud environment. Experimental results demonstrate the efficacy of our approach in preserving patient privacy while surpassing state-of-the-art methods in both clustering quality and computation time. Our method consistently achieves a minimum of 7% higher Adjusted Rand Index (ARI) than existing approaches, contingent on dataset size. Additionally, due to parallel computations and dimensionality reduction, our approach exhibits efficiency, converging to very good results in less than 10 seconds for a scRNA-seq dataset with 5000 genes and 6000 cells when prioritizing privacy and under two seconds without privacy considerations.
Availability and implementation: Code and datasets are available at https://github.com/University-of-Windsor/PPPCT.
Affiliation(s)
- Ali Abbasi Tadi
  - University of Windsor, 401 Sunset Ave, Windsor, N9B 3P4, Ontario, Canada
- Dima Alhadidi
  - University of Windsor, 401 Sunset Ave, Windsor, N9B 3P4, Ontario, Canada
- Luis Rueda
  - University of Windsor, 401 Sunset Ave, Windsor, N9B 3P4, Ontario, Canada
17
Yabo YA, Heiland DH. Understanding glioblastoma at the single-cell level: Recent advances and future challenges. PLoS Biol 2024; 22:e3002640. [PMID: 38814900 PMCID: PMC11139343 DOI: 10.1371/journal.pbio.3002640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024] Open
Abstract
Glioblastoma, the most aggressive and prevalent form of primary brain tumor, is characterized by rapid growth, diffuse infiltration, and resistance to therapies. Intrinsic heterogeneity and cellular plasticity contribute to its rapid progression under therapy; therefore, there is a need to fully understand these tumors at a single-cell level. Over the past decade, single-cell transcriptomics has enabled the molecular characterization of individual cells within glioblastomas, providing previously unattainable insights into the genetic and molecular features that drive tumorigenesis, disease progression, and therapy resistance. However, despite advances in single-cell technologies, challenges such as high costs, complex data analysis and interpretation, and difficulties in translating findings into clinical practice persist. As single-cell technologies are developed further, more insights into the cellular and molecular heterogeneity of glioblastomas are expected, which will help guide the development of personalized and effective therapies, thereby improving prognosis and quality of life for patients.
Affiliation(s)
- Yahaya A Yabo
  - Translational Neurosurgery, Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
  - Microenvironment and Immunology Research Laboratory, Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
- Dieter Henrik Heiland
  - Translational Neurosurgery, Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
  - Microenvironment and Immunology Research Laboratory, Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
  - Department of Neurosurgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
  - Department of Neurosurgery, Faculty of Medicine, Medical Center University of Freiburg, Freiburg, Germany
  - Department of Neurological Surgery, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America
  - German Cancer Consortium (DKTK) partner site, Freiburg, Germany
18
Zhang K, Kan H, Mao A, Yu F, Geng L, Zhou T, Feng L, Ma X. Integrated Single-Cell Transcriptomic Atlas of Human Kidney Endothelial Cells. J Am Soc Nephrol 2024; 35:578-593. [PMID: 38351505 PMCID: PMC11149048 DOI: 10.1681/asn.0000000000000320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Accepted: 02/09/2024] [Indexed: 03/23/2024] Open
Abstract
Key Points:
- We created a comprehensive reference atlas of normal human kidney endothelial cells.
- We confirmed that endothelial cell types in the human kidney were also highly conserved in the mouse kidney.
Background: Kidney endothelial cells are exposed to different microenvironmental conditions that support specific physiologic processes. However, the heterogeneity of human kidney endothelial cells has not yet been systematically described.
Methods: We reprocessed and integrated seven human kidney control single-cell/single-nucleus RNA sequencing datasets of >200,000 kidney cells in the same process.
Results: We identified five major cell types, 29,992 of which were endothelial cells. Endothelial cell reclustering identified seven subgroups that differed in molecular characteristics and physiologic functions. Mapping new data to a normal kidney endothelial cell atlas allows rapid data annotation and analysis. We confirmed that endothelial cell types in the human kidney were also highly conserved in the mouse kidney and identified endothelial marker genes that were conserved in humans and mice, as well as differentially expressed genes between corresponding subpopulations. Furthermore, combined analysis of single-cell transcriptome data with public genome-wide association study data showed a significant enrichment of endothelial cells, especially arterial endothelial cells, in BP heritability. Finally, we identified M1 and M12 from coexpression networks in endothelial cells that may be deeply involved in BP regulation.
Conclusions: We created a comprehensive reference atlas of normal human kidney endothelial cells that provides the molecular foundation for understanding how the identity and function of kidney endothelial cells are altered in disease, aging, and between species. Finally, we provide a publicly accessible online tool to explore the datasets described in this work (https://vascularmap.jiangnan.edu.cn).
Affiliation(s)
- Ka Zhang
  - Wuxi School of Medicine, Jiangnan University, Wuxi, China
  - School of Food Science and Technology, Jiangnan University, Wuxi, China
- Hao Kan
  - Wuxi School of Medicine, Jiangnan University, Wuxi, China
- Aiqin Mao
  - Wuxi School of Medicine, Jiangnan University, Wuxi, China
- Fan Yu
  - Wuxi School of Medicine, Jiangnan University, Wuxi, China
- Li Geng
  - Wuxi School of Medicine, Jiangnan University, Wuxi, China
- Tingting Zhou
  - Wuxi School of Medicine, Jiangnan University, Wuxi, China
- Lei Feng
  - Wuxi School of Medicine, Jiangnan University, Wuxi, China
- Xin Ma
  - Wuxi School of Medicine, Jiangnan University, Wuxi, China
19
Herbst CJ, Lopez-Rodriguez E, Gluhovic V, Schulz S, Brandt R, Timm S, Abledu J, Falivene J, Pennitz P, Kirsten H, Nouailles G, Witzenrath M, Ochs M, Kuebler WM. Characterization of Commercially Available Human Primary Alveolar Epithelial Cells. Am J Respir Cell Mol Biol 2024; 70:339-350. [PMID: 38207121 DOI: 10.1165/rcmb.2023-0320ma] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Accepted: 01/10/2024] [Indexed: 01/13/2024] Open
Abstract
In vitro lung research requires appropriate cell culture models that adequately mimic in vivo structure and function. Previously, researchers extensively used commercially available and easily expandable A549 and NCI-H441 cells, which replicate some but not all features of alveolar epithelial cells. Specifically, these cells are often restricted by terminally altered expression while lacking important alveolar epithelial characteristics. Of late, human primary alveolar epithelial cells (hPAEpCs) have become commercially available but are so far poorly specified. Here, we applied a comprehensive set of technologies to characterize their morphology, surface marker expression, transcriptomic profile, and functional properties. At optimized seeding numbers of 7,500 cells per square centimeter and growth at a gas-liquid interface, hPAEpCs formed regular monolayers with tight junctions and amiloride-sensitive transepithelial ion transport. Electron microscopy revealed lamellar body and microvilli formation characteristic for alveolar type II cells. Protein and single-cell transcriptomic analyses revealed expression of alveolar type I and type II cell markers; yet, transcriptomic data failed to detect NKX2-1, an important transcriptional regulator of alveolar cell differentiation. With increasing passage number, hPAEpCs transdifferentiated toward alveolar-basal intermediates characterized as SFTPC-, KRT8high, and KRT5- cells. In spite of marked changes in the transcriptome as a function of passaging, Uniform Manifold Approximation and Projection plots did not reveal major shifts in cell clusters, and epithelial permeability was unaffected. The present work delineates optimized culture conditions, cellular characteristics, and functional properties of commercially available hPAEpCs. hPAEpCs may provide a useful model system for studies on drug delivery, barrier function, and transepithelial ion transport in vitro.
Affiliation(s)
- Christopher J Herbst
  - Institute of Physiology
  - German Center for Cardiovascular Research, Deutsches Zentrum für Herz-Kreislauf-Forschung (DZHK), Berlin, Germany
  - German Center for Lung Research, Deutsches Zentrum für Lungenforschung (DZL), Berlin, Germany
- Sara Timm
  - Core Facility Electron Microscopy
- Peter Pennitz
  - Department of Infectious Diseases, Respiratory Medicine and Critical Care, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Holger Kirsten
  - Institute for Medical Informatics, Statistics, and Epidemiology, University of Leipzig, Leipzig, Germany
- Geraldine Nouailles
  - Department of Infectious Diseases, Respiratory Medicine and Critical Care, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Martin Witzenrath
  - Department of Infectious Diseases, Respiratory Medicine and Critical Care, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Matthias Ochs
  - Institute of Functional Anatomy
  - Core Facility Electron Microscopy
  - German Center for Lung Research, Deutsches Zentrum für Lungenforschung (DZL), Berlin, Germany
- Wolfgang M Kuebler
  - Institute of Physiology
  - German Center for Cardiovascular Research, Deutsches Zentrum für Herz-Kreislauf-Forschung (DZHK), Berlin, Germany
  - German Center for Lung Research, Deutsches Zentrum für Lungenforschung (DZL), Berlin, Germany
  - Keenan Research Centre, St. Michael's Hospital
  - Departments of Surgery and Physiology, University of Toronto, Toronto, Ontario, Canada
20
Haviv D, Remšík J, Gatie M, Snopkowski C, Takizawa M, Pereira N, Bashkin J, Jovanovich S, Nawy T, Chaligne R, Boire A, Hadjantonakis AK, Pe'er D. The covariance environment defines cellular niches for spatial inference. Nat Biotechnol 2024:10.1038/s41587-024-02193-4. [PMID: 38565973 DOI: 10.1038/s41587-024-02193-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Accepted: 02/28/2024] [Indexed: 04/04/2024]
Abstract
A key challenge of analyzing data from high-resolution spatial profiling technologies is to suitably represent the features of cellular neighborhoods or niches. Here we introduce the covariance environment (COVET), a representation that leverages the gene-gene covariate structure across cells in the niche to capture the multivariate nature of cellular interactions within it. We define a principled optimal transport-based distance metric between COVET niches that scales to millions of cells. Using COVET to encode spatial context, we developed environmental variational inference (ENVI), a conditional variational autoencoder that jointly embeds spatial and single-cell RNA sequencing data into a latent space. ENVI includes two decoders: one to impute gene expression across the spatial modality and a second to project spatial information onto single-cell data. ENVI can confer spatial context to genomics data from single dissociated cells and outperforms alternatives for imputing gene expression on diverse spatial datasets.
Affiliation(s)
- Doron Haviv
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
  - Tri-Institutional Training Program in Computational Biology and Medicine, Weill Cornell Medicine, New York, NY, USA
- Ján Remšík
  - Human Oncology & Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Mohamed Gatie
  - Developmental Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Catherine Snopkowski
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Meril Takizawa
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Tal Nawy
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Ronan Chaligne
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Adrienne Boire
  - Human Oncology & Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
  - Department of Neurology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
  - Brain Tumor Center, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Anna-Katerina Hadjantonakis
  - Developmental Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Dana Pe'er
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
  - Howard Hughes Medical Institute, New York, NY, USA
21
Koca MB, Sevilgen FE. Integration of single-cell proteomic datasets through distinctive proteins in cell clusters. Proteomics 2024; 24:e2300282. [PMID: 38135888 DOI: 10.1002/pmic.202300282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Revised: 11/01/2023] [Accepted: 12/04/2023] [Indexed: 12/24/2023]
Abstract
The use of mass spectrometry and antibody-based sequencing technologies at the single-cell level has led to an increase in single-cell proteomic datasets. Integrating these datasets is crucial to eliminate the batch effect that often arises due to their limited sequencing molecules. Although methods for horizontally integrating high-dimensional single-cell transcriptomic datasets can also be applied to single-cell proteomic datasets, a specialized approach explicitly tailored for low-dimensional proteomic datasets may enhance the integration process. Here, we introduce SCPRO-HI, an algorithm for the horizontal integration of antibody-based single-cell proteomic datasets. It utilizes a hierarchical cell anchoring technique to match cells based on the similarity of distinctive proteins for constituting cell clusters. A novel variational auto-encoder model is employed for correcting batch effects on the protein abundances, eliminating the need for mapping them into a new domain. Moreover, we propose a technique for extending the algorithm to high-dimensional datasets. The performance of the SCPRO-HI algorithm is evaluated using simulated and real-world single-cell proteomic datasets. The findings demonstrate our algorithm outperforms state-of-the-art methods, achieving a 75% higher silhouette score while preserving HVPs 13% better. Furthermore, the algorithm shows competitive performance in transcriptomic datasets, suggesting potential for integrating high-dimensional mass-spectrometry-based proteomic datasets.
Affiliation(s)
- Mehmet Burak Koca
  - Computer Engineering Department, Gebze Technical University, Kocaeli, Türkiye
- Fatih Erdoğan Sevilgen
  - Institute for Data Science and Artificial Intelligence, Boğaziçi University, İstanbul, Türkiye
22
Xie Y, Chen H, Chellamuthu VR, Lajam ABM, Albani S, Low AHL, Petretto E, Behmoaras J. Comparative Analysis of Single-Cell RNA Sequencing Methods with and without Sample Multiplexing. Int J Mol Sci 2024; 25:3828. [PMID: 38612639 PMCID: PMC11011421 DOI: 10.3390/ijms25073828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Revised: 03/20/2024] [Accepted: 03/21/2024] [Indexed: 04/14/2024] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful technique for investigating biological heterogeneity at the single-cell level in human systems and model organisms. Recent advances in scRNA-seq have enabled the pooling of cells from multiple samples into single libraries, thereby increasing sample throughput while reducing technical batch effects, library preparation time, and the overall cost. However, a comparative analysis of scRNA-seq methods with and without sample multiplexing is lacking. In this study, we benchmarked methods from two representative platforms: Parse Biosciences (Parse; with sample multiplexing) and 10x Genomics (10x; without sample multiplexing). By using peripheral blood mononuclear cells (PBMCs) obtained from two healthy individuals, we demonstrate that demultiplexed scRNA-seq data obtained from Parse showed similar cell type frequencies compared to 10x data where samples were not multiplexed. Despite relatively lower cell capture affecting library preparation, Parse can detect rare cell types (e.g., plasmablasts and dendritic cells) which is likely due to its relatively higher sensitivity in gene detection. Moreover, a comparative analysis of transcript quantification between the two platforms revealed platform-specific distributions of gene length and GC content. These results offer guidance for researchers in designing high-throughput scRNA-seq studies.
Affiliation(s)
- Yi Xie
- Programme in Cardiovascular and Metabolic Disorders and Centre for Computational Biology, Duke-NUS Medical School, 8 College Road, Singapore 169857, Singapore
| | - Huimei Chen
- Programme in Cardiovascular and Metabolic Disorders and Centre for Computational Biology, Duke-NUS Medical School, 8 College Road, Singapore 169857, Singapore
| | - Vasuki Ranjani Chellamuthu
- Translational Immunology Institute, SingHealth/Duke-NUS Academic Medical Centre, Academia, Singapore 169856, Singapore
| | - Ahmad bin Mohamed Lajam
- Translational Immunology Institute, SingHealth/Duke-NUS Academic Medical Centre, Academia, Singapore 169856, Singapore
| | - Salvatore Albani
- Translational Immunology Institute, SingHealth/Duke-NUS Academic Medical Centre, Academia, Singapore 169856, Singapore
| | - Andrea Hsiu Ling Low
- Department of Rheumatology and Immunology, Singapore General Hospital, Academia, Singapore 169856, Singapore
- SingHealth Duke-NUS Medicine Academic Clinical Programme, Duke-NUS Medical School, Singapore 169857, Singapore
| | - Enrico Petretto
- Programme in Cardiovascular and Metabolic Disorders and Centre for Computational Biology, Duke-NUS Medical School, 8 College Road, Singapore 169857, Singapore
- Institute for Big Data and Artificial Intelligence in Medicine, School of Science, China Pharmaceutical University, Nanjing 210009, China
| | - Jacques Behmoaras
- Programme in Cardiovascular and Metabolic Disorders and Centre for Computational Biology, Duke-NUS Medical School, 8 College Road, Singapore 169857, Singapore
- Department of Immunology and Inflammation, Centre for Inflammatory Disease, Imperial College London, Hammersmith Hospital, London W12 0NN, UK
|
23
|
Zhang W, Cui Y, Liu B, Loza M, Park SJ, Nakai K. HyGAnno: hybrid graph neural network-based cell type annotation for single-cell ATAC sequencing data. Brief Bioinform 2024; 25:bbae152. [PMID: 38581422 PMCID: PMC10998639 DOI: 10.1093/bib/bbae152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 02/19/2024] [Accepted: 03/10/2024] [Indexed: 04/08/2024] Open
Abstract
Reliable cell type annotations are crucial for investigating cellular heterogeneity in single-cell omics data. Although various computational approaches have been proposed for single-cell RNA sequencing (scRNA-seq) annotation, high-quality cell labels are still lacking in single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq) data, because of extreme sparsity and inconsistent chromatin accessibility between datasets. Here, we present a novel automated cell annotation method that transfers cell type information from a well-labeled scRNA-seq reference to an unlabeled scATAC-seq target, via a parallel graph neural network, in a semi-supervised manner. Unlike existing methods that utilize only gene expression or gene activity features, HyGAnno leverages genome-wide accessibility peak features to facilitate the training process. In addition, HyGAnno reconstructs a reference-target cell graph to detect cells with low prediction reliability, according to their specific graph connectivity patterns. HyGAnno was assessed across various datasets, showcasing its strengths in precise cell annotation, generating interpretable cell embeddings, robustness to noisy reference data and adaptability to tumor tissues.
Affiliation(s)
- Weihang Zhang
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
| | - Yang Cui
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
| | - Bowen Liu
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
| | - Martin Loza
- Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
| | - Sung-Joon Park
- Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
| | - Kenta Nakai
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
- Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
|
24
|
Zhai Y, Chen L, Deng M. scBOL: a universal cell type identification framework for single-cell and spatial transcriptomics data. Brief Bioinform 2024; 25:bbae188. [PMID: 38678389 PMCID: PMC11056022 DOI: 10.1093/bib/bbae188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Revised: 03/11/2024] [Accepted: 04/14/2024] [Indexed: 04/30/2024] Open
Abstract
MOTIVATION: Over the past decade, single-cell transcriptomic technologies have advanced remarkably, enabling the simultaneous profiling of gene expression across thousands of individual cells. Cell type identification plays an essential role in exploring tissue heterogeneity and characterizing cell state differences. As more well-annotated reference data become available, numerous automatic identification methods have emerged to simplify the annotation of unlabeled target data by transferring cell type knowledge. However, in practice, the target data often include novel cell types that are absent from the reference data. Most existing methods classify these private cells as one generic 'unassigned' group and learn the features of known and novel cell types in a coupled way. They are susceptible to batch effects and fail to explore the fine-grained semantic knowledge of novel cell types, which hurts the model's discrimination ability. Additionally, emerging spatial transcriptomic technologies, such as in situ hybridization, sequencing and multiplexed imaging, present a new challenge to current cell type identification strategies, which predominantly neglect spatial organization. Consequently, it is imperative to develop a versatile method that can proficiently annotate single-cell transcriptomics data, both spatial and non-spatial. RESULTS: To address these issues, we propose a new, challenging yet realistic task called universal cell type identification for single-cell and spatial transcriptomics data. In this task, we aim to give semantic labels to target cells from known cell types and cluster labels to those from novel ones. To tackle this problem, instead of designing a suboptimal two-stage approach, we propose an end-to-end algorithm called scBOL from the perspective of Bipartite prototype alignment. 
Firstly, we identify the mutual nearest clusters in reference and target data as their potential common cell types. On this basis, we mine the cycle-consistent semantic anchor cells to build the intrinsic structural association between the two datasets. Secondly, we design a neighbor-aware prototypical learning paradigm to strengthen the inter-cluster separability and intra-cluster compactness within each dataset, thereby encouraging discriminative feature representations. Thirdly, driven by the semantic-aware prototypical learning framework, we can align the known cell types and separate the private cell types from them across reference and target data. Such an algorithm can be seamlessly applied to various data types modeled by different foundation models that can generate embedding features for cells. Specifically, for non-spatial single-cell transcriptomics data, we use an autoencoder neural network to learn latent low-dimensional cell representations, and for spatial single-cell transcriptomics data, we apply a graph convolution network to capture molecular and spatial similarities of cells jointly. Extensive results on our carefully designed evaluation benchmarks demonstrate the superiority of scBOL over various state-of-the-art cell type identification methods. To our knowledge, we are the first to present this pragmatic annotation task and to devise a comprehensive algorithmic framework for resolving it across varied types of single-cell data. Finally, scBOL is implemented in Python using the PyTorch machine-learning library, and it is freely available at https://github.com/aimeeyaoyao/scBOL.
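Editorial illustration (not part of the abstract): the first step named above, pairing reference and target clusters that are each other's nearest neighbour, can be sketched as follows. This is a minimal sketch and not the scBOL implementation; the function name `mutual_nearest_clusters`, the use of cluster centroids, the Euclidean distance, and the toy coordinates are all assumptions made for illustration.

```python
import math


def mutual_nearest_clusters(ref_centroids, tgt_centroids):
    """Return sorted (ref_label, tgt_label) pairs that are each other's nearest centroid.

    Hypothetical helper: both arguments map a cluster label to a coordinate tuple.
    """
    def nearest(point, candidates):
        # Label of the candidate centroid closest to `point` (Euclidean distance).
        return min(candidates, key=lambda label: math.dist(point, candidates[label]))

    ref_to_tgt = {r: nearest(c, tgt_centroids) for r, c in ref_centroids.items()}
    tgt_to_ref = {t: nearest(c, ref_centroids) for t, c in tgt_centroids.items()}
    # Keep only pairs where the nearest-neighbour relation holds in both directions.
    return sorted((r, t) for r, t in ref_to_tgt.items() if tgt_to_ref[t] == r)


# Toy 2-D centroids (invented values): two reference cell types, three target clusters.
ref = {"T cell": (0.0, 0.0), "B cell": (5.0, 5.0)}
tgt = {"c0": (0.2, -0.1), "c1": (5.1, 4.8), "c2": (10.0, 0.0)}
pairs = mutual_nearest_clusters(ref, tgt)
matched_targets = {t for _, t in pairs}
```

In this toy input, target cluster `c2` has no mutual partner, so it would be a candidate novel ("private") cell type in the sense the abstract describes.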
Affiliation(s)
- Yuyao Zhai
- School of Mathematical Sciences, Peking University, Beijing, China
| | - Liang Chen
- Huawei Technologies Co., Ltd., Beijing, China
| | - Minghua Deng
- School of Mathematical Sciences, Peking University, Beijing, China
- Center for Statistical Science, Peking University, Beijing, China
- Center for Quantitative Biology, Peking University, Beijing, China
|
25
|
Bottomly D, McWeeney S. Just how transformative will AI/ML be for immuno-oncology? J Immunother Cancer 2024; 12:e007841. [PMID: 38531545 DOI: 10.1136/jitc-2023-007841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/15/2024] [Indexed: 03/28/2024] Open
Abstract
Immuno-oncology involves the study of approaches which harness the patient's immune system to fight malignancies. Immuno-oncology, as with every other biomedical and clinical research field as well as clinical operations, is in the midst of technological revolutions, which vastly increase the amount of available data. Recent advances in artificial intelligence and machine learning (AI/ML) have received much attention in terms of their potential to harness available data to improve insights and outcomes in many areas including immuno-oncology. In this review, we discuss important aspects to consider when evaluating the potential impact of AI/ML applications in the clinic. We highlight four clinical/biomedical challenges relevant to immuno-oncology and how they may be addressed by the latest advancements in AI/ML. These challenges include (1) efficiency in clinical workflows, (2) curation of high-quality image data, (3) finding, extracting and synthesizing text knowledge, and (4) addressing small cohort size in immunotherapeutic evaluation cohorts. Finally, we outline how advancements in reinforcement and federated learning, as well as the development of best practices for ethical and unbiased data generation, are likely to drive future innovations.
Affiliation(s)
- Daniel Bottomly
- Knight Cancer Institute, Oregon Health and Science University, Portland, Oregon, USA
| | - Shannon McWeeney
- Knight Cancer Institute, Oregon Health and Science University, Portland, Oregon, USA
|
26
|
He S, Jin Y, Nazaret A, Shi L, Chen X, Rampersaud S, Dhillon BS, Valdez I, Friend LE, Fan JL, Park CY, Mintz RL, Lao YH, Carrera D, Fang KW, Mehdi K, Rohde M, McFaline-Figueroa JL, Blei D, Leong KW, Rudensky AY, Plitas G, Azizi E. Starfysh integrates spatial transcriptomic and histologic data to reveal heterogeneous tumor-immune hubs. Nat Biotechnol 2024:10.1038/s41587-024-02173-8. [PMID: 38514799 DOI: 10.1038/s41587-024-02173-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Accepted: 02/14/2024] [Indexed: 03/23/2024]
Abstract
Spatially resolved gene expression profiling provides insight into tissue organization and cell-cell crosstalk; however, sequencing-based spatial transcriptomics (ST) lacks single-cell resolution. Current ST analysis methods require single-cell RNA sequencing data as a reference for rigorous interpretation of cell states, mostly do not use associated histology images and are not capable of inferring shared neighborhoods across multiple tissues. Here we present Starfysh, a computational toolbox using a deep generative model that incorporates archetypal analysis and any known cell type markers to characterize known or new tissue-specific cell states without a single-cell reference. Starfysh improves the characterization of spatial dynamics in complex tissues using histology images and enables the comparison of niches as spatial hubs across tissues. Integrative analysis of primary estrogen receptor (ER)-positive breast cancer, triple-negative breast cancer (TNBC) and metaplastic breast cancer (MBC) tissues led to the identification of spatial hubs with patient- and disease-specific cell type compositions and revealed metabolic reprogramming shaping immunosuppressive hubs in aggressive MBC.
Affiliation(s)
- Siyu He
- Department of Biomedical Engineering, Columbia University, New York, NY, USA
- Irving Institute for Cancer Dynamics, Columbia University, New York, NY, USA
| | - Yinuo Jin
- Department of Biomedical Engineering, Columbia University, New York, NY, USA
- Irving Institute for Cancer Dynamics, Columbia University, New York, NY, USA
| | - Achille Nazaret
- Irving Institute for Cancer Dynamics, Columbia University, New York, NY, USA
- Department of Computer Science, Columbia University, New York, NY, USA
| | - Lingting Shi
- Irving Institute for Cancer Dynamics, Columbia University, New York, NY, USA
| | - Xueer Chen
- Irving Institute for Cancer Dynamics, Columbia University, New York, NY, USA
| | - Sham Rampersaud
- Pharmaceutical Sciences and Pharmacogenomics Graduate Program, University of California, San Francisco, San Francisco, CA, USA
| | - Bahawar S Dhillon
- Immunology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Izabella Valdez
- The Graduate School of Biomedical Sciences at the Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Lauren E Friend
- Department of Biomedical Engineering, Columbia University, New York, NY, USA
- Irving Institute for Cancer Dynamics, Columbia University, New York, NY, USA
| | - Joy Linyue Fan
- Department of Biomedical Engineering, Columbia University, New York, NY, USA
- Irving Institute for Cancer Dynamics, Columbia University, New York, NY, USA
| | - Cameron Y Park
- Department of Biomedical Engineering, Columbia University, New York, NY, USA
- Irving Institute for Cancer Dynamics, Columbia University, New York, NY, USA
| | - Rachel L Mintz
- Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, MO, USA
| | - Yeh-Hsing Lao
- Department of Biomedical Engineering, Columbia University, New York, NY, USA
- Department of Pharmaceutical Sciences, University at Buffalo, the State University of New York, Buffalo, NY, USA
| | - David Carrera
- Department of Computer Science, Columbia University, New York, NY, USA
| | - Kaylee W Fang
- Department of Biomedical Engineering, Columbia University, New York, NY, USA
- Department of Computer Science, Columbia University, New York, NY, USA
| | - Kaleem Mehdi
- Department of Computer Science, Fordham University, New York, NY, USA
| | | | - José L McFaline-Figueroa
- Department of Biomedical Engineering, Columbia University, New York, NY, USA
- Irving Institute for Cancer Dynamics, Columbia University, New York, NY, USA
- Herbert Irving Comprehensive Cancer Center, Columbia University, New York, NY, USA
| | - David Blei
- Department of Computer Science, Columbia University, New York, NY, USA
- Department of Statistics, Columbia University, New York, NY, USA
| | - Kam W Leong
- Department of Biomedical Engineering, Columbia University, New York, NY, USA
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
| | - Alexander Y Rudensky
- Immunology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Howard Hughes Medical Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Ludwig Center, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - George Plitas
- Immunology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Howard Hughes Medical Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Ludwig Center, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Department of Surgery, Breast Service, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Elham Azizi
- Department of Biomedical Engineering, Columbia University, New York, NY, USA
- Irving Institute for Cancer Dynamics, Columbia University, New York, NY, USA
- Department of Computer Science, Columbia University, New York, NY, USA
- Herbert Irving Comprehensive Cancer Center, Columbia University, New York, NY, USA
- Data Science Institute, Columbia University, New York, NY, USA
|
27
|
Yan K, Liu Q, Huang R, Jiang Y, Bian Z, Li S, Li L, Shen F, Tsuneyama K, Zhang Q, Lian Z, Guan H, Xu B. Spatial transcriptomics reveals prognosis-associated cellular heterogeneity in the papillary thyroid carcinoma microenvironment. Clin Transl Med 2024; 14:e1594. [PMID: 38426403 PMCID: PMC10905537 DOI: 10.1002/ctm2.1594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Revised: 01/28/2024] [Accepted: 02/05/2024] [Indexed: 03/02/2024] Open
Abstract
BACKGROUND: Papillary thyroid carcinoma (PTC) is the most common malignant endocrine tumour, and its incidence and prevalence are increasing considerably. Cellular heterogeneity in the tumour microenvironment is important for PTC prognosis. Spatial transcriptomics is a powerful technique for studying cellular heterogeneity. METHODS: In conjunction with a clinical pathologist identification method, spatial transcriptomics was employed to characterise the spatial location and RNA profiles of PTC-associated cells within the tissue sections. The spatial RNA-clinical signature genes for each cell type were extracted and applied to outline the distribution regions of specific cells on the entire section. The cellular heterogeneity of each cell type was further revealed by ContourPlot analysis, monocle analysis, trajectory analysis, ligand-receptor analysis and Gene Ontology enrichment analysis. RESULTS: The spatial distribution regions of tumour cells, typical and atypical follicular cells (FCs and AFCs) and immune cells were accurately and comprehensively identified in all five PTC tissue sections. AFCs were identified as a transitional state between FCs and tumour cells, exhibiting a higher resemblance to the latter. Three tumour foci were shared among all patients out of the 13 observed. Notably, tumour focus No. 2 displayed elevated expression levels of genes associated with lower relapse-free survival in PTC patients. We discovered key ligand-receptor interactions, including LAMB3-ITGA2, FN1-ITGA3 and FN1-SDC4, involved in the transition of PTC cells from FCs to AFCs and eventually to tumour cells. High expression of these patterns correlated with reduced relapse-free survival. In the tumour immune microenvironment, reduced interaction between myeloid-derived TGFB1 and TGFBR1 in tumour focus No. 2 contributed to tumourigenesis and increased heterogeneity. The spatial RNA-clinical analysis method developed here revealed prognosis-associated cellular heterogeneity in the PTC microenvironment. CONCLUSIONS: The occurrence of tumour focus No. 2 and three enhanced ligand-receptor interactions in the AFC area/tumour foci were associated with reduced relapse-free survival of PTC patients, findings that may lead to improved prognostic strategies and targeted therapies for PTC patients.
Affiliation(s)
- Kai Yan
- Guangdong Cardiovascular Institute, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
| | - Qing‐Zhi Liu
- Chronic Disease Laboratory, Institutes for Life Sciences, South China University of Technology, Guangzhou, China
| | - Rong‐Rong Huang
- Guangdong Cardiovascular Institute, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
| | - Yi‐Hua Jiang
- Guangdong Cardiovascular Institute, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangzhou, China
| | - Zhen‐Hua Bian
- School of Biomedical Sciences and Engineering, South China University of Technology, Guangzhou International Campus, Guangzhou, China
| | - Si‐Jin Li
- Department of Thyroid Surgery, Guangzhou First People's Hospital, South China University of Technology, Guangzhou, China
| | - Liang Li
- Medical Research Institute, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, China
| | - Fei Shen
- Department of Thyroid Surgery, Guangzhou First People's Hospital, South China University of Technology, Guangzhou, China
| | - Koichi Tsuneyama
- Department of Pathology and Laboratory Medicine, Institute of Biomedical Sciences, Tokushima University Graduate School, Tokushima, Japan
| | - Qing‐Ling Zhang
- Department of Pathology, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, China
| | - Zhe‐Xiong Lian
- Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, China
| | - Haixia Guan
- Department of Endocrinology, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, China
| | - Bo Xu
- Department of Thyroid Surgery, Guangzhou First People's Hospital, South China University of Technology, Guangzhou, China
|
28
|
Schäfer PSL, Dimitrov D, Villablanca EJ, Saez-Rodriguez J. Integrating single-cell multi-omics and prior biological knowledge for a functional characterization of the immune system. Nat Immunol 2024; 25:405-417. [PMID: 38413722 DOI: 10.1038/s41590-024-01768-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Accepted: 01/16/2024] [Indexed: 02/29/2024]
Abstract
The immune system comprises diverse specialized cell types that cooperate to defend the host against a wide range of pathogenic threats. Recent advancements in single-cell and spatial multi-omics technologies provide rich information about the molecular state of immune cells. Here, we review how the integration of single-cell and spatial multi-omics data with prior knowledge, gathered from decades of detailed biochemical studies, allows us to obtain functional insights, focusing on gene regulatory processes and cell-cell interactions. We present diverse applications in immunology and critically assess underlying assumptions and limitations. Finally, we offer a perspective on the ongoing technological and algorithmic developments that promise to get us closer to a systemic mechanistic understanding of the immune system.
Affiliation(s)
- Philipp Sven Lars Schäfer
- Institute for Computational Bioscience, Faculty of Medicine and Heidelberg University Hospital, Heidelberg University, Heidelberg, Germany
| | - Daniel Dimitrov
- Institute for Computational Bioscience, Faculty of Medicine and Heidelberg University Hospital, Heidelberg University, Heidelberg, Germany
| | - Eduardo J Villablanca
- Division of Immunology and Allergy, Department of Medicine Solna, Karolinska Institute and Karolinska University Hospital, Stockholm, Sweden
- Center of Molecular Medicine, Stockholm, Sweden
| | - Julio Saez-Rodriguez
- Institute for Computational Bioscience, Faculty of Medicine and Heidelberg University Hospital, Heidelberg University, Heidelberg, Germany
|
29
|
Tur S, Palii CG, Brand M. Cell fate decision in erythropoiesis: Insights from multiomics studies. Exp Hematol 2024; 131:104167. [PMID: 38262486 PMCID: PMC10939800 DOI: 10.1016/j.exphem.2024.104167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 01/10/2024] [Accepted: 01/13/2024] [Indexed: 01/25/2024]
Abstract
Every second, the body produces 2 million red blood cells through a process called erythropoiesis. Erythropoiesis is hierarchical in that it results from a series of cell fate decisions whereby hematopoietic stem cells progress toward the erythroid lineage. Single-cell transcriptomic and proteomic approaches have revolutionized the way we understand erythropoiesis, revealing it to be a gradual process that underlies a progressive restriction of fate potential driven by quantitative changes in lineage-specifying transcription factors. Despite these major advances, we still know very little about what cell fate decision entails at the molecular level. Novel approaches that simultaneously measure additional properties in single cells, including chromatin accessibility, transcription factor binding, and/or cell surface proteins are being developed at a fast pace, providing the means to exciting new advances in the near future. In this review, we briefly summarize the main findings obtained from single-cell studies of erythropoiesis, highlight outstanding questions, and suggest recent technological advances to address them.
Affiliation(s)
- Steven Tur
- Department of Cell and Regenerative Biology, Wisconsin Blood Cancer Research Institute, Wisconsin Institutes for Medical Research, University of Wisconsin School of Medicine and Public Health, Carbone Cancer Center, Madison, WI; Cellular and Molecular Biology Graduate Program, University of Wisconsin School of Medicine and Public Health, Madison, WI
| | - Carmen G Palii
- Department of Cell and Regenerative Biology, Wisconsin Blood Cancer Research Institute, Wisconsin Institutes for Medical Research, University of Wisconsin School of Medicine and Public Health, Carbone Cancer Center, Madison, WI
| | - Marjorie Brand
- Department of Cell and Regenerative Biology, Wisconsin Blood Cancer Research Institute, Wisconsin Institutes for Medical Research, University of Wisconsin School of Medicine and Public Health, Carbone Cancer Center, Madison, WI
|
30
|
Ma Y, Zhou Y, Jiang D, Dai W, Li J, Deng C, Chen C, Zheng G, Zhang Y, Qiu F, Sun H, Xing S, Han H, Qu J, Wu N, Yao Y, Su J. Integration of human organoids single-cell transcriptomic profiles and human genetics repurposes critical cell type-specific drug targets for severe COVID-19. Cell Prolif 2024; 57:e13558. [PMID: 37807299 PMCID: PMC10905359 DOI: 10.1111/cpr.13558] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/31/2023] [Revised: 08/31/2023] [Accepted: 09/18/2023] [Indexed: 10/10/2023] Open
Abstract
Human organoids recapitulate the cell type diversity and function of their primary organs, holding tremendous potential for basic and translational research. Advances in single-cell RNA sequencing (scRNA-seq) technology and genome-wide association studies (GWAS) have accelerated the biological and therapeutic interpretation of trait-relevant cell types or states. Here, we constructed a computational framework to integrate atlas-level organoid scRNA-seq data, GWAS summary statistics, expression quantitative trait loci, and gene-drug interaction data for distinguishing critical cell populations and drug targets relevant to coronavirus disease 2019 (COVID-19) severity. We found that 39 cell types across eight kinds of organoids were significantly associated with COVID-19 outcomes. Notably, a subset of lung mesenchymal stem cells showed increased proximity with fibroblasts predisposed to repair COVID-19-damaged lung tissue. A brain endothelial cell subset exhibited significant associations with severe COVID-19, and this cell subset showed a notable increase in cell-to-cell interactions with other brain cell types, including microglia. We repurposed 33 druggable genes, including IFNAR2, TYK2, and VIPR2, and their interacting drugs for COVID-19 in a cell-type-specific manner. Overall, our results showcase that host genetic determinants make cell-type-specific contributions to COVID-19 severity, and that identification of cell type-specific drug targets may facilitate the development of effective therapeutics for treating severe COVID-19 and its complications.
Affiliation(s)
- Yunlong Ma
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- Department of Biomedical Informatics, Institute of Biomedical Big Data, Wenzhou Medical University, Wenzhou, China
- Oujiang Laboratory, Zhejiang Lab for Regenerative Medicine, Vision and Brain Health, Zhejiang, China
| | - Yijun Zhou
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- Department of Biomedical Informatics, Institute of Biomedical Big Data, Wenzhou Medical University, Wenzhou, China
| | - Dingping Jiang
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- Oujiang Laboratory, Zhejiang Lab for Regenerative Medicine, Vision and Brain Health, Zhejiang, China
| | - Wei Dai
- Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, China
| | - Jingjing Li
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- Department of Biomedical Informatics, Institute of Biomedical Big Data, Wenzhou Medical University, Wenzhou, China
| | - Chunyu Deng
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Cheng Chen
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- Department of Biomedical Informatics, Institute of Biomedical Big Data, Wenzhou Medical University, Wenzhou, China
| | - Gongwei Zheng
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- Department of Biomedical Informatics, Institute of Biomedical Big Data, Wenzhou Medical University, Wenzhou, China
| | - Yaru Zhang
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- Department of Biomedical Informatics, Institute of Biomedical Big Data, Wenzhou Medical University, Wenzhou, China
- Oujiang Laboratory, Zhejiang Lab for Regenerative Medicine, Vision and Brain Health, Zhejiang, China
| | - Fei Qiu
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- Department of Biomedical Informatics, Institute of Biomedical Big Data, Wenzhou Medical University, Wenzhou, China
| | - Haojun Sun
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- Department of Biomedical Informatics, Institute of Biomedical Big Data, Wenzhou Medical University, Wenzhou, China
| | - Shilai Xing
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- Department of Biomedical Informatics, Institute of Biomedical Big Data, Wenzhou Medical University, Wenzhou, China
| | - Haijun Han
- School of Medicine, Hangzhou City University, Hangzhou, China
| | - Jia Qu
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, China
| | - Nan Wu
- Beijing Key Laboratory for Genetic Research of Skeletal Deformity, Key Laboratory of Big Data for Spinal Deformities, Department of Orthopedic Surgery, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing, China
| | - Yinghao Yao
- Oujiang Laboratory, Zhejiang Lab for Regenerative Medicine, Vision and Brain Health, Zhejiang, China
| | - Jianzhong Su
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- Department of Biomedical Informatics, Institute of Biomedical Big Data, Wenzhou Medical University, Wenzhou, China
- Oujiang Laboratory, Zhejiang Lab for Regenerative Medicine, Vision and Brain Health, Zhejiang, China
| |
|
31
|
Liu W, Li W, Zhao Z. Single-Cell Transcriptomics Reveals Pre-existing COVID-19 Vulnerability Factors in Lung Cancer Patients. Mol Cancer Res 2024; 22:240-253. [PMID: 38063850 PMCID: PMC10922768 DOI: 10.1158/1541-7786.mcr-23-0692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2023] [Revised: 11/09/2023] [Accepted: 12/06/2023] [Indexed: 01/07/2024]
Abstract
Coronavirus disease 2019 (COVID-19) and cancer are major health threats, and individuals may develop both simultaneously. Recent studies have indicated that patients with cancer are particularly vulnerable to COVID-19, but the molecular mechanisms underlying the associations remain poorly understood. To address this knowledge gap, we collected single-cell RNA-sequencing data from COVID-19, lung adenocarcinoma, small cell lung carcinoma patients, and normal lungs to perform an integrated analysis. We characterized altered cell populations, gene expression, and dysregulated intercellular communication in diseases. Our analysis identified pathologic conditions shared by COVID-19 and lung cancer, including upregulated TMPRSS2 expression in epithelial cells, stronger inflammatory responses mediated by macrophages, increased T-cell response suppression, and elevated fibrosis risk by pathologic fibroblasts. These pre-existing conditions in patients with lung cancer may lead to more severe inflammation, fibrosis, and weakened adaptive immune response upon COVID-19 infection. Our findings revealed potential molecular mechanisms driving an increased COVID-19 risk in patients with lung cancer and suggested preventive and therapeutic targets for COVID-19 in this population. Implications: Our work reveals the potential molecular mechanisms contributing to the vulnerability to COVID-19 in patients with lung cancer.
Affiliation(s)
- Wendao Liu
- The University of Texas MD Anderson Cancer Center UTHealth Houston Graduate School of Biomedical Sciences, Houston, TX 77030, USA
- Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - Wenbo Li
- The University of Texas MD Anderson Cancer Center UTHealth Houston Graduate School of Biomedical Sciences, Houston, TX 77030, USA
- Department of Biochemistry and Molecular Biology, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - Zhongming Zhao
- The University of Texas MD Anderson Cancer Center UTHealth Houston Graduate School of Biomedical Sciences, Houston, TX 77030, USA
- Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| |
|
32
|
Farah EN, Hu RK, Kern C, Zhang Q, Lu TY, Ma Q, Tran S, Zhang B, Carlin D, Monell A, Blair AP, Wang Z, Eschbach J, Li B, Destici E, Ren B, Evans SM, Chen S, Zhu Q, Chi NC. Spatially organized cellular communities form the developing human heart. Nature 2024; 627:854-864. [PMID: 38480880 PMCID: PMC10972757 DOI: 10.1038/s41586-024-07171-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Accepted: 02/07/2024] [Indexed: 03/18/2024]
Abstract
The heart, which is the first organ to develop, is highly dependent on its form to function [1,2]. However, how diverse cardiac cell types spatially coordinate to create the complex morphological structures that are crucial for heart function remains unclear. Here we integrated single-cell RNA-sequencing with high-resolution multiplexed error-robust fluorescence in situ hybridization to resolve the identity of the cardiac cell types that develop the human heart. This approach also provided a spatial mapping of individual cells that enables illumination of their organization into cellular communities that form distinct cardiac structures. We discovered that many of these cardiac cell types further specified into subpopulations exclusive to specific communities, which support their specialization according to the cellular ecosystem and anatomical region. In particular, ventricular cardiomyocyte subpopulations displayed an unexpected complex laminar organization across the ventricular wall and formed, with other cell subpopulations, several cellular communities. Interrogating cell-cell interactions within these communities using in vivo conditional genetic mouse models and in vitro human pluripotent stem cell systems revealed multicellular signalling pathways that orchestrate the spatial organization of cardiac cell subpopulations during ventricular wall morphogenesis. These detailed findings into the cellular social interactions and specialization of cardiac cell types constructing and remodelling the human heart offer new insights into structural heart diseases and the engineering of complex multicellular tissues for human heart repair.
Affiliation(s)
- Elie N Farah
- Department of Medicine, Division of Cardiology, University of California San Diego, La Jolla, CA, USA
| | - Robert K Hu
- Department of Medicine, Division of Cardiology, University of California San Diego, La Jolla, CA, USA
| | - Colin Kern
- Center for Epigenomics, Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
| | - Qingquan Zhang
- Department of Medicine, Division of Cardiology, University of California San Diego, La Jolla, CA, USA
| | - Ting-Yu Lu
- Materials Science and Engineering Program, University of California San Diego, La Jolla, CA, USA
| | - Qixuan Ma
- Department of Medicine, Division of Cardiology, University of California San Diego, La Jolla, CA, USA
| | - Shaina Tran
- Department of Medicine, Division of Cardiology, University of California San Diego, La Jolla, CA, USA
| | - Bo Zhang
- Department of Medicine, Division of Cardiology, University of California San Diego, La Jolla, CA, USA
- Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
| | - Daniel Carlin
- Department of Medicine, Division of Cardiology, University of California San Diego, La Jolla, CA, USA
| | - Alexander Monell
- Center for Epigenomics, Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
| | - Andrew P Blair
- Department of Medicine, Division of Cardiology, University of California San Diego, La Jolla, CA, USA
| | - Zilu Wang
- Department of Medicine, Division of Cardiology, University of California San Diego, La Jolla, CA, USA
| | - Jacqueline Eschbach
- Center for Epigenomics, Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
| | - Bin Li
- Department of Cellular and Molecular Medicine, School of Medicine, University of California San Diego, La Jolla, CA, USA
| | - Eugin Destici
- Department of Medicine, Division of Cardiology, University of California San Diego, La Jolla, CA, USA
| | - Bing Ren
- Center for Epigenomics, Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
- Department of Cellular and Molecular Medicine, School of Medicine, University of California San Diego, La Jolla, CA, USA
- Ludwig Institute for Cancer Research, La Jolla, CA, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
| | - Sylvia M Evans
- Department of Medicine, Division of Cardiology, University of California San Diego, La Jolla, CA, USA
- Department of Pharmacology, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Shaochen Chen
- Materials Science and Engineering Program, University of California San Diego, La Jolla, CA, USA
- Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Department of NanoEngineering, University of California San Diego, La Jolla, CA, USA
- Institute of Engineering in Medicine, University of California San Diego, La Jolla, CA, USA
| | - Quan Zhu
- Center for Epigenomics, Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA.
| | - Neil C Chi
- Department of Medicine, Division of Cardiology, University of California San Diego, La Jolla, CA, USA.
- Department of Bioengineering, University of California San Diego, La Jolla, CA, USA.
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA.
- Institute of Engineering in Medicine, University of California San Diego, La Jolla, CA, USA.
| |
|
33
|
Gong M, Yu Y, Wang Z, Zhang J, Wang X, Fu C, Zhang Y, Wang X. scAuto as a comprehensive framework for single-cell chromatin accessibility data analysis. Comput Biol Med 2024; 171:108230. [PMID: 38442554 DOI: 10.1016/j.compbiomed.2024.108230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 02/06/2024] [Accepted: 02/25/2024] [Indexed: 03/07/2024]
Abstract
Interpreting single-cell chromatin accessibility data is crucial for understanding intercellular heterogeneity regulation. Despite the progress in computational methods for analyzing this data, there is still a lack of a comprehensive analytical framework and a user-friendly online analysis tool. To fill this gap, we developed a pre-trained deep learning-based framework, single-cell auto-correlation transformers (scAuto), to overcome the challenge. Following DNABERT's methodology of pre-training and fine-tuning, scAuto learns a general understanding of DNA sequence's grammar by being pre-trained on unlabeled human genome via self-supervision; it is then transferred to the single-cell chromatin accessibility analysis task of scATAC-seq data for supervised fine-tuning. We extensively validated scAuto on the Buenrostro2018 dataset, demonstrating its superior performance on chromatin accessibility prediction, single-cell clustering, and data denoising. Based on scAuto, we further developed an interactive web server for single-cell chromatin accessibility data analysis. It integrates tutorial-style interfaces for those with limited programming skills. The platform is accessible at http://zhanglab.icaup.cn. To our knowledge, this work is expected to help analyze single-cell chromatin accessibility data and facilitate the development of precision medicine.
Affiliation(s)
- Meiqin Gong
- Department of Obstetrics and Gynecology, West China Second University Hospital, Sichuan University, Chengdu, 610041, China
| | - Yun Yu
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
| | - Zixuan Wang
- College of Electronics and Information Engineering, Sichuan University, Chengdu, 610065, China
| | - Junming Zhang
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
| | - Xiongyi Wang
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
| | - Cheng Fu
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
| | - Yongqing Zhang
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
| | - Xiaodong Wang
- Department of Obstetrics and Gynecology, West China Second University Hospital, Sichuan University, Chengdu, 610041, China.
| |
|
34
|
Cui H, Wang C, Maan H, Pang K, Luo F, Duan N, Wang B. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nat Methods 2024:10.1038/s41592-024-02201-0. [PMID: 38409223 DOI: 10.1038/s41592-024-02201-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Accepted: 01/30/2024] [Indexed: 02/28/2024]
Abstract
Generative pretrained models have achieved remarkable success in various domains such as language and computer vision. Specifically, the combination of large-scale diverse datasets and pretrained transformers has emerged as a promising approach for developing foundation models. Drawing parallels between language and cellular biology (in which texts comprise words; similarly, cells are defined by genes), our study probes the applicability of foundation models to advance cellular biology and genetic research. Using burgeoning single-cell sequencing data, we have constructed a foundation model for single-cell biology, scGPT, based on a generative pretrained transformer across a repository of over 33 million cells. Our findings illustrate that scGPT effectively distills critical biological insights concerning genes and cells. Through further adaptation of transfer learning, scGPT can be optimized to achieve superior performance across diverse downstream applications. This includes tasks such as cell type annotation, multi-batch integration, multi-omic integration, perturbation response prediction and gene network inference.
Affiliation(s)
- Haotian Cui
- Peter Munk Cardiac Centre, University Health Network, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Vector Institute, Toronto, Ontario, Canada
| | - Chloe Wang
- Peter Munk Cardiac Centre, University Health Network, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Vector Institute, Toronto, Ontario, Canada
| | - Hassaan Maan
- Peter Munk Cardiac Centre, University Health Network, Toronto, Ontario, Canada
- Vector Institute, Toronto, Ontario, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
| | - Kuan Pang
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Vector Institute, Toronto, Ontario, Canada
| | - Fengning Luo
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Vector Institute, Toronto, Ontario, Canada
| | - Nan Duan
- Microsoft Research, Redmond, WA, USA
| | - Bo Wang
- Peter Munk Cardiac Centre, University Health Network, Toronto, Ontario, Canada.
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.
- Vector Institute, Toronto, Ontario, Canada.
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada.
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada.
- AI Hub, University Health Network, Toronto, Ontario, Canada.
| |
|
35
|
Ng MTH, Borst R, Gacaferi H, Davidson S, Ackerman JE, Johnson PA, Machado CC, Reekie I, Attar M, Windell D, Kurowska-Stolarska M, MacDonald L, Alivernini S, Garvilles M, Jansen K, Bhalla A, Lee A, Charlesworth J, Chowdhury R, Klenerman P, Powell K, Hackstein CP, Furniss D, Rees J, Gilroy D, Coles M, Carr AJ, Sansom SN, Buckley CD, Dakin SG. A single cell atlas of frozen shoulder capsule identifies features associated with inflammatory fibrosis resolution. Nat Commun 2024; 15:1394. [PMID: 38374174 PMCID: PMC10876649 DOI: 10.1038/s41467-024-45341-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 01/19/2024] [Indexed: 02/21/2024] Open
Abstract
Frozen shoulder is a spontaneously self-resolving chronic inflammatory fibrotic human disease, which distinguishes the condition from most fibrotic diseases that are progressive and irreversible. Using single-cell analysis, we identify pro-inflammatory MERTK(low)CD48+ macrophages and MERTK+LYVE1+MRC1+ macrophages enriched for negative regulators of inflammation which co-exist in frozen shoulder capsule tissues. Micro-cultures of patient-derived cells identify integrin-mediated cell-matrix interactions between MERTK+ macrophages and pro-resolving DKK3+ and POSTN+ fibroblasts, suggesting that matrix remodelling plays a role in frozen shoulder resolution. Cross-tissue analysis reveals a shared gene expression cassette between shoulder capsule MERTK+ macrophages and a respective population enriched in synovial tissues of rheumatoid arthritis patients in disease remission, supporting the concept that MERTK+ macrophages mediate resolution of inflammation and fibrosis. Single-cell transcriptomic profiling and spatial analysis of human foetal shoulder tissues identify MERTK+LYVE1+MRC1+ macrophages and DKK3+ and POSTN+ fibroblast populations analogous to those in frozen shoulder, suggesting that the template to resolve fibrosis is established during shoulder development. Crosstalk between MerTK+ macrophages and pro-resolving DKK3+ and POSTN+ fibroblasts could facilitate resolution of frozen shoulder, providing a basis for potential therapeutic resolution of persistent fibrotic diseases.
Affiliation(s)
| | | | | | | | | | | | - Caio C Machado
- University of Oxford, Oxford, UK
- University of Sao Paulo, Sao Paulo, Brazil
| | | | | | | | | | - Lucy MacDonald
- Research into Inflammatory Arthritis Centre Versus Arthritis (RACE), University of Glasgow, Glasgow, UK
| | - Stefano Alivernini
- Fondazione Policlinico Universitario Agostino Gemelli - IRCCS, Rome, Italy
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
|
36
|
Massoni-Badosa R, Aguilar-Fernández S, Nieto JC, Soler-Vila P, Elosua-Bayes M, Marchese D, Kulis M, Vilas-Zornoza A, Bühler MM, Rashmi S, Alsinet C, Caratù G, Moutinho C, Ruiz S, Lorden P, Lunazzi G, Colomer D, Frigola G, Blevins W, Romero-Rivero L, Jiménez-Martínez V, Vidal A, Mateos-Jaimez J, Maiques-Diaz A, Ovejero S, Moreaux J, Palomino S, Gomez-Cabrero D, Agirre X, Weniger MA, King HW, Garner LC, Marini F, Cervera-Paz FJ, Baptista PM, Vilaseca I, Rosales C, Ruiz-Gaspà S, Talks B, Sidhpura K, Pascual-Reguant A, Hauser AE, Haniffa M, Prosper F, Küppers R, Gut IG, Campo E, Martin-Subero JI, Heyn H. An atlas of cells in the human tonsil. Immunity 2024; 57:379-399.e18. [PMID: 38301653 PMCID: PMC10869140 DOI: 10.1016/j.immuni.2024.01.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2022] [Revised: 07/07/2023] [Accepted: 01/09/2024] [Indexed: 02/03/2024]
Abstract
Palatine tonsils are secondary lymphoid organs (SLOs) representing the first line of immunological defense against inhaled or ingested pathogens. We generated an atlas of the human tonsil composed of >556,000 cells profiled across five different data modalities, including single-cell transcriptome, epigenome, proteome, and immune repertoire sequencing, as well as spatial transcriptomics. This census identified 121 cell types and states, defined developmental trajectories, and enabled an understanding of the functional units of the tonsil. Exemplarily, we stratified myeloid slan-like subtypes, established a BCL6 enhancer as locally active in follicle-associated T and B cells, and identified SIX5 as putative transcriptional regulator of plasma cell maturation. Analyses of a validation cohort confirmed the presence, annotation, and markers of tonsillar cell types and provided evidence of age-related compositional shifts. We demonstrate the value of this resource by annotating cells from B cell-derived mantle cell lymphomas, linking transcriptional heterogeneity to normal B cell differentiation states of the human tonsil.
Affiliation(s)
| | | | - Juan C Nieto
- Centro Nacional de Análisis Genómico (CNAG), Barcelona, Spain
| | - Paula Soler-Vila
- Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
| | | | | | - Marta Kulis
- Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
| | - Amaia Vilas-Zornoza
- Hemato-Oncology Program, Center for Applied Medical Research (CIMA), University of Navarra, IDISNA, Universidad de Navarra, Pamplona, Spain; Centro de Investigación Biomédica en Red Cáncer (CIBERONC), Madrid, Spain
| | - Marco Matteo Bühler
- Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain; Department of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland; Hematopathology Section, Pathology Department, Hospital Clinic, Barcelona, Spain
| | - Sonal Rashmi
- Centro Nacional de Análisis Genómico (CNAG), Barcelona, Spain
| | - Clara Alsinet
- Centro Nacional de Análisis Genómico (CNAG), Barcelona, Spain
| | - Ginevra Caratù
- Centro Nacional de Análisis Genómico (CNAG), Barcelona, Spain
| | - Catia Moutinho
- Centro Nacional de Análisis Genómico (CNAG), Barcelona, Spain
| | - Sara Ruiz
- Centro Nacional de Análisis Genómico (CNAG), Barcelona, Spain
| | - Patricia Lorden
- Centro Nacional de Análisis Genómico (CNAG), Barcelona, Spain
| | - Giulia Lunazzi
- Centro Nacional de Análisis Genómico (CNAG), Barcelona, Spain
| | - Dolors Colomer
- Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain; Centro de Investigación Biomédica en Red Cáncer (CIBERONC), Madrid, Spain; Hematopathology Section, Pathology Department, Hospital Clinic, Barcelona, Spain; Departament de Fonaments Clínics, Facultat de Medicina, Universitat de Barcelona, Barcelona, Spain
| | - Gerard Frigola
- Hematopathology Section, Pathology Department, Hospital Clinic, Barcelona, Spain
| | - Will Blevins
- Centro Nacional de Análisis Genómico (CNAG), Barcelona, Spain
| | - Lucia Romero-Rivero
- Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
| | | | - Anna Vidal
- Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
| | - Judith Mateos-Jaimez
- Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
| | - Alba Maiques-Diaz
- Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
| | - Sara Ovejero
- Department of Biological Hematology, CHU Montpellier, Montpellier, France; Institute of Human Genetics, UMR 9002 CNRS-UM, Montpellier, France
| | - Jérôme Moreaux
- Department of Biological Hematology, CHU Montpellier, Montpellier, France; Institute of Human Genetics, UMR 9002 CNRS-UM, Montpellier, France; Department of Clinical Hematology, CHU Montpellier, Montpellier, France
| | - Sara Palomino
- Translational Bioinformatics Unit (TransBio), Navarrabiomed, Navarra Health Department (CHN), Public University of Navarra (UPNA), Navarra Institute for Health Research (IdiSNA), Pamplona, Spain
| | - David Gomez-Cabrero
- Translational Bioinformatics Unit (TransBio), Navarrabiomed, Navarra Health Department (CHN), Public University of Navarra (UPNA), Navarra Institute for Health Research (IdiSNA), Pamplona, Spain; Bioscience Program, Biological and Environmental Sciences and Engineering Division (BESE), King Abdullah University of Science and Technology KAUST, Thuwal, Saudi Arabia
| | - Xabier Agirre
- Hemato-Oncology Program, Center for Applied Medical Research (CIMA), University of Navarra, IDISNA, Universidad de Navarra, Pamplona, Spain; Centro de Investigación Biomédica en Red Cáncer (CIBERONC), Madrid, Spain
| | - Marc A Weniger
- Institute of Cell Biology (Cancer Research), Medical Faculty, University of Duisburg-Essen, Essen, Germany
| | - Hamish W King
- Epigenetics and Development Division, Walter and Eliza Hall Institute, Parkville, Australia
| | - Lucy C Garner
- Translational Gastroenterology Unit, Nuffield Department of Medicine, University of Oxford, Oxford, UK
| | - Federico Marini
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany; Center for Thrombosis and Hemostasis (CTH), University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
| | | | - Peter M Baptista
- Department of Otorhinolaryngology, University of Navarra, Pamplona, Spain
| | - Isabel Vilaseca
- Otorhinolaryngology Head-Neck Surgery Department, Hospital Clínic, IDIBAPS Universitat de Barcelona, Barcelona, Spain
| | - Cecilia Rosales
- Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
| | - Silvia Ruiz-Gaspà
- Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
| | - Benjamin Talks
- Biosciences Institute, Newcastle University, Newcastle Upon Tyne, UK; Department of Otolaryngology, Freeman Hospital, Newcastle Hospitals NHS Foundation Trust, Newcastle Upon Tyne, UK
| | - Keval Sidhpura
- Biosciences Institute, Newcastle University, Newcastle Upon Tyne, UK
| | - Anna Pascual-Reguant
- Department of Rheumatology and Clinical Immunology, Charité - Universitätsmedizin Berlin, Berlin, Germany; Immune Dynamics, Deutsches Rheuma-Forschungszentrum (DRFZ), Berlin, Germany
| | - Anja E Hauser
- Department of Rheumatology and Clinical Immunology, Charité - Universitätsmedizin Berlin, Berlin, Germany; Immune Dynamics, Deutsches Rheuma-Forschungszentrum (DRFZ), Berlin, Germany
| | - Muzlifah Haniffa
- Biosciences Institute, Newcastle University, Newcastle Upon Tyne, UK; Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK; Department of Dermatology and NIHR Newcastle Biomedical Research Centre, Newcastle Hospitals NHS Foundation Trust, Newcastle Upon Tyne, UK
| | - Felipe Prosper
- Hemato-Oncology Program, Center for Applied Medical Research (CIMA), University of Navarra, IDISNA, Universidad de Navarra, Pamplona, Spain; Centro de Investigación Biomédica en Red Cáncer (CIBERONC), Madrid, Spain; Departamento de Hematología, Clínica Universidad de Navarra, University of Navarra, Pamplona, Spain
| | - Ralf Küppers
- Institute of Cell Biology (Cancer Research), Medical Faculty, University of Duisburg-Essen, Essen, Germany
| | - Ivo Glynne Gut
- Centro Nacional de Análisis Genómico (CNAG), Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Elias Campo
- Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain; Centro de Investigación Biomédica en Red Cáncer (CIBERONC), Madrid, Spain; Hematopathology Section, Pathology Department, Hospital Clinic, Barcelona, Spain; Departament de Fonaments Clínics, Facultat de Medicina, Universitat de Barcelona, Barcelona, Spain
| | - José Ignacio Martin-Subero
- Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain; Departament de Fonaments Clínics, Facultat de Medicina, Universitat de Barcelona, Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain.
| | - Holger Heyn
- Centro Nacional de Análisis Genómico (CNAG), Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain.
| |
|
37
|
Harkany T, Tretiakov E, Varela L, Jarc J, Rebernik P, Newbold S, Keimpema E, Verkhratsky A, Horvath T, Romanov R. Molecularly stratified hypothalamic astrocytes are cellular foci for obesity. RESEARCH SQUARE 2024:rs.3.rs-3748581. [PMID: 38405925 PMCID: PMC10889077 DOI: 10.21203/rs.3.rs-3748581/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
Astrocytes safeguard the homeostasis of the central nervous system [1,2]. Despite their prominent morphological plasticity under conditions that challenge the brain's adaptive capacity [3-5], the classification of astrocytes, and relating their molecular make-up to spatially devolved neuronal operations that specify behavior or metabolism, remained mostly futile [6,7]. Although it seems unexpected in the era of single-cell biology, the lack of a major advance in stratifying astrocytes under physiological conditions rests on the incompatibility of 'neurocentric' algorithms that rely on stable developmental endpoints, lifelong transcriptional, neurotransmitter, and neuropeptide signatures for classification [6-8] with the dynamic functional states, anatomic allocation, and allostatic plasticity of astrocytes [1]. Simplistically, therefore, astrocytes are still grouped as 'resting' vs. 'reactive', the latter referring to pathological states marked by various inducible genes [3,9,10]. Here, we introduced a machine learning-based feature recognition algorithm that benefits from the cumulative power of published single-cell RNA-seq data on astrocytes as a reference map to stepwise eliminate pleiotropic and inducible cellular features. For the healthy hypothalamus, this walk-back approach revealed gene regulatory networks (GRNs) that specified subsets of astrocytes, and could be used as landmarking tools for their anatomical assignment. The core molecular censuses retained by astrocyte subsets were sufficient to stratify them by allostatic competence, chiefly their signaling and metabolic interplay with neurons. Particularly, we found differentially expressed mitochondrial genes in insulin-sensing astrocytes and demonstrated their reciprocal signaling with neurons that work antagonistically within the food intake circuitry.
As a proof-of-concept, we showed that disrupting Mfn2 expression in astrocytes reduced their ability to support dynamic circuit reorganization, a time-locked feature of satiety in the hypothalamus, thus leading to obesity in mice. Overall, our results suggest that astrocytes in the healthy brain are fundamentally more heterogeneous than previously thought and topologically mirror the specificity of local neurocircuits.
Affiliation(s)
- Tibor Harkany
- Center for Brain Research, Medical University of Vienna
| | | | | | - Jasna Jarc
- Center for Brain Research, Medical University of Vienna
| | | | | | - Erik Keimpema
- Medical University of Vienna, Center for Brain Research
| | | | | | | |
|
38
|
Geuenich MJ, Gong DW, Campbell KR. The impacts of active and self-supervised learning on efficient annotation of single-cell expression data. Nat Commun 2024; 15:1014. [PMID: 38307875 PMCID: PMC10837127 DOI: 10.1038/s41467-024-45198-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Accepted: 01/16/2024] [Indexed: 02/04/2024] Open
Abstract
A crucial step in the analysis of single-cell data is annotating cells to cell types and states. While a myriad of approaches has been proposed, manual labeling of cells to create training datasets remains tedious and time-consuming. In the field of machine learning, active and self-supervised learning methods have been proposed to improve the performance of a classifier while reducing both annotation time and label budget. However, the benefits of such strategies for single-cell annotation have yet to be evaluated in realistic settings. Here, we perform a comprehensive benchmarking of active and self-supervised labeling strategies across a range of single-cell technologies and cell type annotation algorithms. We quantify the benefits of active learning and self-supervised strategies in the presence of cell type imbalance and variable similarity. We introduce adaptive reweighting, a heuristic procedure tailored to single-cell data (including a marker-aware version) that shows competitive performance with existing approaches. In addition, we demonstrate that having prior knowledge of cell type markers improves annotation accuracy. Finally, we summarize our findings into a set of recommendations for those implementing cell type annotation procedures or platforms. An R package implementing the heuristic approaches introduced in this work may be found at https://github.com/camlab-bioml/leader.
Affiliation(s)
- Michael J Geuenich
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, M5G 1X5, Canada.
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada.
| | - Dae-Won Gong
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, M5G 1X5, Canada
| | - Kieran R Campbell
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, M5G 1X5, Canada.
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada.
- Department of Statistical Sciences, University of Toronto, Toronto, ON, M5S 3G3, Canada.
- Department of Computer Science, University of Toronto, Toronto, ON, M5T 3A1, Canada.
- Ontario Institute of Cancer Research, Toronto, ON, M5G 1M1, Canada.
- Vector Institute, Toronto, ON, M5G 1M1, Canada.
| |
|
39
|
Dopp J, Ortega A, Davie K, Poovathingal S, Baz ES, Liu S. Single-cell transcriptomics reveals that glial cells integrate homeostatic and circadian processes to drive sleep-wake cycles. Nat Neurosci 2024; 27:359-372. [PMID: 38263460 PMCID: PMC10849968 DOI: 10.1038/s41593-023-01549-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2023] [Accepted: 12/07/2023] [Indexed: 01/25/2024]
Abstract
The sleep-wake cycle is determined by circadian and sleep homeostatic processes. However, the molecular impact of these processes and their interaction in different brain cell populations are unknown. To fill this gap, we profiled the single-cell transcriptome of adult Drosophila brains across the sleep-wake cycle and four circadian times. We show cell type-specific transcriptomic changes, with glia displaying the largest variation. Glia are also among the few cell types whose gene expression correlates with both sleep homeostat and circadian clock. The sleep-wake cycle and sleep drive level affect the expression of clock gene regulators in glia, and disrupting clock genes specifically in glia impairs homeostatic sleep rebound after sleep deprivation. These findings provide a comprehensive view of the effects of sleep homeostatic and circadian processes on distinct cell types in an entire animal brain and reveal glia as an interaction site of these two processes to determine sleep-wake dynamics.
Affiliation(s)
- Joana Dopp
- Center for Brain & Disease Research, VIB-KU Leuven, Leuven, Belgium
- Department of Neurosciences, KU Leuven, Leuven, Belgium
- Leuven Brain Institute, KU Leuven, Leuven, Belgium
| | - Antonio Ortega
- Center for Brain & Disease Research, VIB-KU Leuven, Leuven, Belgium
- Department of Neurosciences, KU Leuven, Leuven, Belgium
- Leuven Brain Institute, KU Leuven, Leuven, Belgium
| | - Kristofer Davie
- Center for Brain & Disease Research, VIB-KU Leuven, Leuven, Belgium
- Leuven Brain Institute, KU Leuven, Leuven, Belgium
| | - Suresh Poovathingal
- Center for Brain & Disease Research, VIB-KU Leuven, Leuven, Belgium
- Leuven Brain Institute, KU Leuven, Leuven, Belgium
| | - El-Sayed Baz
- Center for Brain & Disease Research, VIB-KU Leuven, Leuven, Belgium
- Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Zoology Department, Faculty of Science, Suez Canal University, Ismailia, Egypt
| | - Sha Liu
- Center for Brain & Disease Research, VIB-KU Leuven, Leuven, Belgium.
- Department of Neurosciences, KU Leuven, Leuven, Belgium.
- Leuven Brain Institute, KU Leuven, Leuven, Belgium.
| |
|
40
|
Kucinski I, Campos J, Barile M, Severi F, Bohin N, Moreira PN, Allen L, Lawson H, Haltalli MLR, Kinston SJ, O'Carroll D, Kranc KR, Göttgens B. A time- and single-cell-resolved model of murine bone marrow hematopoiesis. Cell Stem Cell 2024; 31:244-259.e10. [PMID: 38183977 PMCID: PMC7615671 DOI: 10.1016/j.stem.2023.12.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Revised: 09/25/2023] [Accepted: 12/04/2023] [Indexed: 01/08/2024]
Abstract
The paradigmatic hematopoietic tree model is increasingly recognized to be limited, as it is based on heterogeneous populations largely defined by non-homeostatic assays testing cell fate potentials. Here, we combine persistent labeling with time-series single-cell RNA sequencing to build a real-time, quantitative model of in vivo tissue dynamics for murine bone marrow hematopoiesis. We couple cascading single-cell expression patterns with dynamic changes in differentiation and growth speeds. The resulting explicit linkage between molecular states and cellular behavior reveals widely varying self-renewal and differentiation properties across distinct lineages. Transplanted stem cells show strong acceleration of differentiation at specific stages of erythroid and neutrophil production, illustrating how the model can quantify the impact of perturbations. Our reconstruction of dynamic behavior from snapshot measurements is akin to how a kinetoscope allows sequential images to merge into a movie. We posit that this approach is generally applicable to understanding tissue-scale dynamics at high resolution.
Affiliation(s)
- Iwo Kucinski
- Wellcome-MRC Cambridge Stem Cell Institute, Department of Haematology, Jeffrey Cheah Biomedical Centre, University of Cambridge, Cambridge, UK
| | - Joana Campos
- Centre for Haemato-Oncology, Barts Cancer Institute, Queen Mary University of London, London EC1M 6BQ, UK; Institute of Cancer Research, London SM2 5NG, UK
| | - Melania Barile
- Wellcome-MRC Cambridge Stem Cell Institute, Department of Haematology, Jeffrey Cheah Biomedical Centre, University of Cambridge, Cambridge, UK; Centre for Translational Stem Cell Biology, Hong Kong SAR, China
| | - Francesco Severi
- Centre for Regenerative Medicine, University of Edinburgh, Edinburgh EH16 4UU, UK; Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3BF, UK
| | - Natacha Bohin
- Centre for Haemato-Oncology, Barts Cancer Institute, Queen Mary University of London, London EC1M 6BQ, UK
| | - Pedro N Moreira
- Centre for Regenerative Medicine, University of Edinburgh, Edinburgh EH16 4UU, UK; Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3BF, UK
| | - Lewis Allen
- Centre for Haemato-Oncology, Barts Cancer Institute, Queen Mary University of London, London EC1M 6BQ, UK; Institute of Cancer Research, London SM2 5NG, UK
| | - Hannah Lawson
- Centre for Haemato-Oncology, Barts Cancer Institute, Queen Mary University of London, London EC1M 6BQ, UK; Institute of Cancer Research, London SM2 5NG, UK
| | - Myriam L R Haltalli
- Wellcome-MRC Cambridge Stem Cell Institute, Department of Haematology, Jeffrey Cheah Biomedical Centre, University of Cambridge, Cambridge, UK
| | - Sarah J Kinston
- Wellcome-MRC Cambridge Stem Cell Institute, Department of Haematology, Jeffrey Cheah Biomedical Centre, University of Cambridge, Cambridge, UK
| | - Dónal O'Carroll
- Centre for Regenerative Medicine, University of Edinburgh, Edinburgh EH16 4UU, UK; Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3BF, UK.
| | - Kamil R Kranc
- Centre for Haemato-Oncology, Barts Cancer Institute, Queen Mary University of London, London EC1M 6BQ, UK; Institute of Cancer Research, London SM2 5NG, UK.
| | - Berthold Göttgens
- Wellcome-MRC Cambridge Stem Cell Institute, Department of Haematology, Jeffrey Cheah Biomedical Centre, University of Cambridge, Cambridge, UK.
| |
|
41
|
Hao Y, Stuart T, Kowalski MH, Choudhary S, Hoffman P, Hartman A, Srivastava A, Molla G, Madad S, Fernandez-Granda C, Satija R. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat Biotechnol 2024; 42:293-304. [PMID: 37231261 PMCID: PMC10928517 DOI: 10.1038/s41587-023-01767-y] [Citation(s) in RCA: 168] [Impact Index Per Article: 168.0] [Received: 03/08/2022] [Accepted: 03/28/2023] [Indexed: 05/27/2023]
Abstract
Mapping single-cell sequencing profiles to comprehensive reference datasets provides a powerful alternative to unsupervised analysis. However, most reference datasets are constructed from single-cell RNA-sequencing data and cannot be used to annotate datasets that do not measure gene expression. Here we introduce 'bridge integration', a method to integrate single-cell datasets across modalities using a multiomic dataset as a molecular bridge. Each cell in the multiomic dataset constitutes an element in a 'dictionary', which is used to reconstruct unimodal datasets and transform them into a shared space. Our procedure accurately integrates transcriptomic data with independent single-cell measurements of chromatin accessibility, histone modifications, DNA methylation and protein levels. Moreover, we demonstrate how dictionary learning can be combined with sketching techniques to improve computational scalability and harmonize 8.6 million human immune cell profiles from sequencing and mass cytometry experiments. Our approach, implemented in version 5 of our Seurat toolkit (http://www.satijalab.org/seurat), broadens the utility of single-cell reference datasets and facilitates comparisons across diverse molecular modalities.
Affiliation(s)
- Yuhan Hao
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
- New York Genome Center, New York, NY, USA
| | - Tim Stuart
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
- New York Genome Center, New York, NY, USA
| | - Madeline H Kowalski
- New York Genome Center, New York, NY, USA
- Institute for System Genetics, NYU Langone Medical Center, New York, NY, USA
| | - Saket Choudhary
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
- New York Genome Center, New York, NY, USA
| | - Paul Hoffman
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
| | - Austin Hartman
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
| | - Avi Srivastava
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
- New York Genome Center, New York, NY, USA
| | | | - Shaista Madad
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
- New York Genome Center, New York, NY, USA
| | - Carlos Fernandez-Granda
- Center for Data Science, New York University, New York, NY, USA
- Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
| | - Rahul Satija
- Center for Genomics and Systems Biology, New York University, New York, NY, USA.
- New York Genome Center, New York, NY, USA.
| |
|
42
|
Mihai IS, Chafle S, Henriksson J. Representing and extracting knowledge from single-cell data. Biophys Rev 2024; 16:29-56. [PMID: 38495441 PMCID: PMC10937862 DOI: 10.1007/s12551-023-01091-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/22/2023] [Accepted: 06/28/2023] [Indexed: 03/19/2024] Open
Abstract
Single-cell analysis is currently one of the most high-resolution techniques to study biology. The large complex datasets that have been generated have spurred numerous developments in computational biology, in particular the use of advanced statistics and machine learning. This review attempts to explain the deeper theoretical concepts that underpin current state-of-the-art analysis methods. Single-cell analysis is covered from cell, through instruments, to current and upcoming models. The aim of this review is to spread concepts which are not yet in common use, especially from topology and generative processes, and how new statistical models can be developed to capture more of biology. This opens epistemological questions regarding our ontology and models, and some pointers will be given to how natural language processing (NLP) may help overcome our cognitive limitations for understanding single-cell data.
Affiliation(s)
- Ionut Sebastian Mihai
- The Laboratory for Molecular Infection Medicine Sweden (MIMS), Umeå, Sweden
- Umeå Centre for Microbial Research (UCMR), Department of Molecular Biology, Umeå University, Umeå, Sweden
- Industrial Doctoral School, Umeå University, Umeå, Sweden
| | - Sarang Chafle
- The Laboratory for Molecular Infection Medicine Sweden (MIMS), Umeå, Sweden
- Umeå Centre for Microbial Research (UCMR), Department of Molecular Biology, Umeå University, Umeå, Sweden
| | - Johan Henriksson
- The Laboratory for Molecular Infection Medicine Sweden (MIMS), Umeå, Sweden
- Umeå Centre for Microbial Research (UCMR), Department of Molecular Biology, Umeå University, Umeå, Sweden
| |
|
43
|
Andreatta M, Hérault L, Gueguen P, Gfeller D, Berenstein AJ, Carmona SJ. Semi-supervised integration of single-cell transcriptomics data. Nat Commun 2024; 15:872. [PMID: 38287014 PMCID: PMC10825117 DOI: 10.1038/s41467-024-45240-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 01/16/2024] [Indexed: 01/31/2024] Open
Abstract
Batch effects in single-cell RNA-seq data pose a significant challenge for comparative analyses across samples, individuals, and conditions. Although batch effect correction methods are routinely applied, data integration often leads to overcorrection and can result in the loss of biological variability. In this work we present STACAS, a batch correction method for scRNA-seq that leverages prior knowledge on cell types to preserve biological variability upon integration. Through an open-source benchmark, we show that semi-supervised STACAS outperforms state-of-the-art unsupervised methods, as well as supervised methods such as scANVI and scGen. STACAS scales well to large datasets and is robust to incomplete and imprecise input cell type labels, which are commonly encountered in real-life integration tasks. We argue that the incorporation of prior cell type information should be a common practice in single-cell data integration, and we provide a flexible framework for semi-supervised batch effect correction.
Affiliation(s)
- Massimo Andreatta
- Department of Oncology, Lausanne Branch, Ludwig Institute for Cancer Research, CHUV and University of Lausanne, 1011, Lausanne, Switzerland
- AGORA Cancer Research Center, 1005, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Léonard Hérault
- Department of Oncology, Lausanne Branch, Ludwig Institute for Cancer Research, CHUV and University of Lausanne, 1011, Lausanne, Switzerland
- AGORA Cancer Research Center, 1005, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Paul Gueguen
- Department of Oncology, Lausanne Branch, Ludwig Institute for Cancer Research, CHUV and University of Lausanne, 1011, Lausanne, Switzerland
- AGORA Cancer Research Center, 1005, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - David Gfeller
- Department of Oncology, Lausanne Branch, Ludwig Institute for Cancer Research, CHUV and University of Lausanne, 1011, Lausanne, Switzerland
- AGORA Cancer Research Center, 1005, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Ariel J Berenstein
- Laboratorio de Biología Molecular, División Patología, Instituto Multidisciplinario de Investigaciones en Patologías Pediátricas (IMIPP), CONICET-GCBA, Buenos Aires, C1425EFD, Argentina
| | - Santiago J Carmona
- Department of Oncology, Lausanne Branch, Ludwig Institute for Cancer Research, CHUV and University of Lausanne, 1011, Lausanne, Switzerland.
- AGORA Cancer Research Center, 1005, Lausanne, Switzerland.
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland.
| |
|
44
|
Piran Z, Nitzan M. SiFT: uncovering hidden biological processes by probabilistic filtering of single-cell data. Nat Commun 2024; 15:760. [PMID: 38278815 PMCID: PMC10817921 DOI: 10.1038/s41467-024-44757-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Accepted: 01/03/2024] [Indexed: 01/28/2024] Open
Abstract
Cellular populations simultaneously encode multiple biological attributes, including spatial configuration, temporal trajectories, and cell-cell interactions. Some of these signals may be overshadowed by others and harder to recover, despite the great progress made to computationally reconstruct biological processes from single-cell data. To address this, we present SiFT, a kernel-based projection method for filtering biological signals in single-cell data, thus uncovering underlying biological processes. SiFT applies to a wide range of tasks, from the removal of unwanted variation in the data to revealing hidden biological structures. We demonstrate how SiFT enhances the liver circadian signal by filtering spatial zonation, recovers regenerative cell subpopulations in spatially-resolved liver data, and exposes COVID-19 disease-related cells, pathways, and dynamics by filtering healthy reference signals. SiFT performs the correction at the gene expression level, can scale to large datasets, and compares favorably to state-of-the-art methods.
Affiliation(s)
- Zoe Piran
- School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel
| | - Mor Nitzan
- School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel.
- Racah Institute of Physics, The Hebrew University, Jerusalem, Israel.
- Faculty of Medicine, The Hebrew University, Jerusalem, Israel.
| |
|
45
|
Wang L, Nie R, Miao X, Cai Y, Wang A, Zhang H, Zhang J, Cai J. InClust+: the deep generative framework with mask modules for multimodal data integration, imputation, and cross-modal generation. BMC Bioinformatics 2024; 25:41. [PMID: 38267858 PMCID: PMC10809631 DOI: 10.1186/s12859-024-05656-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2023] [Accepted: 01/15/2024] [Indexed: 01/26/2024] Open
Abstract
BACKGROUND: With the development of single-cell technology, many cell traits can be measured. Furthermore, the multi-omics profiling technology could jointly measure two or more traits in a single cell simultaneously. In order to process the various data accumulated rapidly, computational methods for multimodal data integration are needed. RESULTS: Here, we present inClust+, a deep generative framework for the multi-omics. It's built on previous inClust that is specific for transcriptome data, and augmented with two mask modules designed for multimodal data processing: an input-mask module in front of the encoder and an output-mask module behind the decoder. InClust+ was first used to integrate scRNA-seq and MERFISH data from similar cell populations, and to impute MERFISH data based on scRNA-seq data. Then, inClust+ was shown to have the capability to integrate the multimodal data (e.g. tri-modal data with gene expression, chromatin accessibility and protein abundance) with batch effect. Finally, inClust+ was used to integrate an unlabeled monomodal scRNA-seq dataset and two labeled multimodal CITE-seq datasets, transfer labels from CITE-seq datasets to scRNA-seq dataset, and generate the missing modality of protein abundance in monomodal scRNA-seq data. In the above examples, the performance of inClust+ is better than or comparable to the most recent tools in the corresponding task. CONCLUSIONS: The inClust+ is a suitable framework for handling multimodal data. Meanwhile, the successful implementation of mask in inClust+ means that it can be applied to other deep learning methods with similar encoder-decoder architecture to broaden the application scope of these models.
Affiliation(s)
- Lifei Wang
- Shulan (Hangzhou) Hospital, Affiliated to Zhejiang Shuren University Shulan International Medical College, Hangzhou, China.
| | - Rui Nie
- China National Center for Bioinformation, Beijing, China
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Xuexia Miao
- China National Center for Bioinformation, Beijing, China
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
| | - Yankai Cai
- School of Economic and Management, China University of Geoscience, Wuhan, China
| | - Anqi Wang
- Shulan (Hangzhou) Hospital, Affiliated to Zhejiang Shuren University Shulan International Medical College, Hangzhou, China
| | - Hanwen Zhang
- Shulan (Hangzhou) Hospital, Affiliated to Zhejiang Shuren University Shulan International Medical College, Hangzhou, China
| | - Jiang Zhang
- School of Systems Science, Beijing Normal University, Beijing, 100875, China.
| | - Jun Cai
- China National Center for Bioinformation, Beijing, China.
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China.
- University of Chinese Academy of Sciences, Beijing, 100049, China.
| |
|
46
|
He Z, Hu S, Chen Y, An S, Zhou J, Liu R, Shi J, Wang J, Dong G, Shi J, Zhao J, Ou-Yang L, Zhu Y, Bo X, Ying X. Mosaic integration and knowledge transfer of single-cell multimodal data with MIDAS. Nat Biotechnol 2024:10.1038/s41587-023-02040-y. [PMID: 38263515 DOI: 10.1038/s41587-023-02040-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Accepted: 10/23/2023] [Indexed: 01/25/2024]
Abstract
Integrating single-cell datasets produced by multiple omics technologies is essential for defining cellular heterogeneity. Mosaic integration, in which different datasets share only some of the measured modalities, poses major challenges, particularly regarding modality alignment and batch effect removal. Here, we present a deep probabilistic framework for the mosaic integration and knowledge transfer (MIDAS) of single-cell multimodal data. MIDAS simultaneously achieves dimensionality reduction, imputation and batch correction of mosaic data by using self-supervised modality alignment and information-theoretic latent disentanglement. We demonstrate its superiority to 19 other methods and reliability by evaluating its performance in trimodal and mosaic integration tasks. We also constructed a single-cell trimodal atlas of human peripheral blood mononuclear cells and tailored transfer learning and reciprocal reference mapping schemes to enable flexible and accurate knowledge transfer from the atlas to new data. Applications in mosaic integration, pseudotime analysis and cross-tissue knowledge transfer on bone marrow mosaic datasets demonstrate the versatility and superiority of MIDAS. MIDAS is available at https://github.com/labomics/midas.
Affiliation(s)
- Zhen He
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Shuofeng Hu
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Yaowen Chen
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Sijing An
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Jiahao Zhou
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
| | - Runyan Liu
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Junfeng Shi
- School of Automation, China University of Geosciences, Wuhan, China
| | - Jing Wang
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Guohua Dong
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Jinhui Shi
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Jiaxin Zhao
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Le Ou-Yang
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
| | - Yuan Zhu
- School of Automation, China University of Geosciences, Wuhan, China
| | - Xiaochen Bo
- Institute of Health Service and Transfusion Medicine, Beijing, China.
| | - Xiaomin Ying
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China.
| |
|
47
|
Wan H, Yuan M, Fu Y, Deng M. Continually adapting pre-trained language model to universal annotation of single-cell RNA-seq data. Brief Bioinform 2024; 25:bbae047. [PMID: 38388681 PMCID: PMC10883808 DOI: 10.1093/bib/bbae047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Revised: 12/29/2023] [Accepted: 01/18/2024] [Indexed: 02/24/2024] Open
Abstract
MOTIVATION: Cell-type annotation of single-cell RNA-sequencing (scRNA-seq) data is a hallmark of biomedical research and clinical application. Current annotation tools usually assume the simultaneous acquisition of well-annotated data, but without the ability to expand knowledge from new data. Yet, such tools are inconsistent with the continuous emergence of scRNA-seq data, calling for a continuous cell-type annotation model. In addition, by their powerful ability of information integration and model interpretability, transformer-based pre-trained language models have led to breakthroughs in single-cell biology research. Therefore, the systematic combining of continual learning and pre-trained language models for cell-type annotation tasks is inevitable. RESULTS: We herein propose a universal cell-type annotation tool, called CANAL, that continuously fine-tunes a pre-trained language model trained on a large amount of unlabeled scRNA-seq data, as new well-labeled data emerges. CANAL essentially alleviates the dilemma of catastrophic forgetting, both in terms of model inputs and outputs. For model inputs, we introduce an experience replay schema that repeatedly reviews previous vital examples in current training stages. This is achieved through a dynamic example bank with a fixed buffer size. The example bank is class-balanced and proficient in retaining cell-type-specific information, particularly facilitating the consolidation of patterns associated with rare cell types. For model outputs, we utilize representation knowledge distillation to regularize the divergence between previous and current models, resulting in the preservation of knowledge learned from past training stages. Moreover, our universal annotation framework considers the inclusion of new cell types throughout the fine-tuning and testing stages. We can continuously expand the cell-type annotation library by absorbing new cell types from newly arrived, well-annotated training datasets, as well as automatically identify novel cells in unlabeled datasets. Comprehensive experiments with data streams under various biological scenarios demonstrate the versatility and high model interpretability of CANAL. AVAILABILITY: An implementation of CANAL is available from https://github.com/aster-ww/CANAL-torch. CONTACT: dengmh@pku.edu.cn. SUPPLEMENTARY INFORMATION: Supplementary data are available at Journal Name online.
Affiliation(s)
- Hui Wan
- School of Mathematical Sciences, Peking University, Beijing, China, 100871
| | - Musu Yuan
- Center for Quantitative Biology, Peking University, Beijing, China, 100871
| | - Yiwei Fu
- School of Mathematical Sciences, Peking University, Beijing, China, 100871
| | - Minghua Deng
- School of Mathematical Sciences, Peking University, Beijing, China, 100871
- Center for Quantitative Biology, Peking University, Beijing, China, 100871
- Center for Statistical Science, Peking University, Beijing, China, 100871
| |
|
48
|
Zahedi R, Ghamsari R, Argha A, Macphillamy C, Beheshti A, Alizadehsani R, Lovell NH, Lotfollahi M, Alinejad-Rokny H. Deep learning in spatially resolved transcriptomics: a comprehensive technical view. Brief Bioinform 2024; 25:bbae082. [PMID: 38483255 PMCID: PMC10939360 DOI: 10.1093/bib/bbae082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 12/22/2024] [Accepted: 02/13/2024] [Indexed: 03/17/2024] Open
Abstract
Spatially resolved transcriptomics (SRT) is a pioneering method for simultaneously studying morphological contexts and gene expression at single-cell precision. Data emerging from SRT are multifaceted, presenting researchers with intricate gene expression matrices, precise spatial details and comprehensive histology visuals. Such rich and intricate datasets, unfortunately, render many conventional methods like traditional machine learning and statistical models ineffective. The unique challenges posed by the specialized nature of SRT data have led the scientific community to explore more sophisticated analytical avenues. Recent trends indicate an increasing reliance on deep learning algorithms, especially in areas such as spatial clustering, identification of spatially variable genes and data alignment tasks. In this manuscript, we provide a rigorous critique of these advanced deep learning methodologies, probing into their merits, limitations and avenues for further refinement. Our in-depth analysis underscores that while the recent innovations in deep learning tailored for SRT have been promising, there remains a substantial potential for enhancement. A crucial area that demands attention is the development of models that can incorporate intricate biological nuances, such as phylogeny-aware processing or in-depth analysis of minuscule histology image segments. Furthermore, addressing challenges like the elimination of batch effects, perfecting data normalization techniques and countering the overdispersion and zero inflation patterns seen in gene expression is pivotal. To support the broader scientific community in their SRT endeavors, we have meticulously assembled a comprehensive directory of readily accessible SRT databases, hoping to serve as a foundation for future research initiatives.
Affiliation(s)
- Roxana Zahedi
  - UNSW BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, 2052, NSW, Australia
- Reza Ghamsari
  - UNSW BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, 2052, NSW, Australia
- Ahmadreza Argha
  - The Graduate School of Biomedical Engineering, UNSW Sydney, 2052, NSW, Australia
  - Tyree Institute of Health Engineering (IHealthE), UNSW Sydney, 2052, NSW, Australia
- Callum Macphillamy
  - School of Animal and Veterinary Sciences, University of Adelaide, Roseworthy, 5371, Australia
- Amin Beheshti
  - School of Computing, Macquarie University, Sydney, 2109, Australia
- Roohallah Alizadehsani
  - Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Waurn Ponds, Melbourne, VIC, 3216, Australia
- Nigel H Lovell
  - The Graduate School of Biomedical Engineering, UNSW Sydney, 2052, NSW, Australia
  - Tyree Institute of Health Engineering (IHealthE), UNSW Sydney, 2052, NSW, Australia
- Mohammad Lotfollahi
  - Computational Health Center, Helmholtz Munich, Germany
  - Wellcome Sanger Institute, Cambridge, UK
- Hamid Alinejad-Rokny
  - UNSW BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, 2052, NSW, Australia
  - Tyree Institute of Health Engineering (IHealthE), UNSW Sydney, 2052, NSW, Australia
|
49
|
Zhai Y, Chen L, Deng M. scEVOLVE: cell-type incremental annotation without forgetting for single-cell RNA-seq data. Brief Bioinform 2024; 25:bbae039. [PMID: 38366803 PMCID: PMC10939389 DOI: 10.1093/bib/bbae039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 01/03/2024] [Accepted: 01/09/2024] [Indexed: 02/18/2024]
Abstract
The evolution in single-cell RNA sequencing (scRNA-seq) technology has opened a new avenue for researchers to inspect cellular heterogeneity with single-cell precision. One crucial aspect of this technology is cell-type annotation, which is fundamental for any subsequent analysis in single-cell data mining. Recently, the scientific community has seen a surge in the development of automatic annotation methods aimed at this task. However, these methods generally operate at a steady-state total cell-type capacity, significantly restricting the cell annotation systems' capacity for continuous knowledge acquisition. Furthermore, creating a unified scRNA-seq annotation system remains challenged by the need to progressively expand its understanding of ever-increasing cell-type concepts derived from a continuous data stream. In response to these challenges, this paper presents a novel and challenging setting for annotation, namely cell-type incremental annotation. This concept is designed to perpetually enhance cell-type knowledge, gleaned from continuously incoming data. This task encounters difficulty with data stream samples that can only be observed once, leading to catastrophic forgetting. To address this problem, we introduce our breakthrough methodology termed scEVOLVE, an incremental annotation method. This innovative approach is built upon the methodology of contrastive sample replay combined with the fundamental principle of partition confidence maximization. Specifically, we initially retain and replay sections of the old data in each subsequent training phase, then establish a unique prototypical learning objective to mitigate the cell-type imbalance problem, as an alternative to using cross-entropy. To effectively emulate a model that trains concurrently with complete data, we introduce a cell-type decorrelation strategy that efficiently scatters feature representations of each cell type uniformly.
We constructed the scEVOLVE framework with simplicity and ease of integration into most deep softmax-based single-cell annotation methods. Thorough experiments conducted on a range of meticulously constructed benchmarks consistently show that our methodology can incrementally learn numerous cell types over an extended period, outperforming other strategies that fail quickly. To the best of our knowledge, this is the first attempt to propose and formulate an end-to-end algorithm framework to address this new, practical task. Additionally, scEVOLVE, coded in Python using the PyTorch machine-learning library, is freely accessible at https://github.com/aimeeyaoyao/scEVOLVE.
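The replay idea described in this abstract (retain a portion of old data and mix it into each later training phase) can be illustrated with a small sketch. This is not the scEVOLVE implementation: the `ReplayMemory` class, the `per_type` budget, and the nearest-prototype classifier standing in for the paper's prototypical objective are all hypothetical simplifications, and the contrastive and decorrelation components are omitted.

```python
import numpy as np

class ReplayMemory:
    """Hypothetical buffer keeping a few exemplar cells per cell type."""

    def __init__(self, per_type=2, seed=0):
        self.per_type = per_type          # assumed per-type storage budget
        self.rng = np.random.default_rng(seed)
        self.store = {}                   # cell type -> retained samples

    def update(self, X, y):
        """Retain up to `per_type` randomly chosen cells of each type in (X, y)."""
        y = np.asarray(y)
        for label in np.unique(y):
            rows = X[y == label]
            n_keep = min(self.per_type, len(rows))
            keep = self.rng.choice(len(rows), size=n_keep, replace=False)
            self.store[label] = rows[keep]

    def replay(self):
        """Return all retained cells and their labels for the next phase."""
        X = np.concatenate(list(self.store.values()))
        y = np.concatenate([[label] * len(rows)
                            for label, rows in self.store.items()])
        return X, y


def prototypes(X, y):
    """Mean feature vector (prototype) of each cell type."""
    y = np.asarray(y)
    labels = np.unique(y)
    return labels, np.stack([X[y == label].mean(axis=0) for label in labels])


def predict(X, labels, protos):
    """Assign each cell to the type with the nearest prototype (Euclidean)."""
    d = np.linalg.norm(X[:, None, :] - protos[None, :, :], axis=2)
    return labels[d.argmin(axis=1)]
```

In a two-phase run, one would call `update` on the first batch, then concatenate the output of `replay()` with the second batch before recomputing prototypes, so that cell types seen only in the first phase remain represented.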
Affiliation(s)
- Yuyao Zhai
  - School of Mathematical Sciences, Peking University, Beijing, China
- Liang Chen
  - Huawei Technologies Co., Ltd., Beijing, China
- Minghua Deng
  - School of Mathematical Sciences, Peking University, Beijing, China
  - Center for Statistical Science, Peking University, Beijing, China
  - Center for Quantitative Biology, Peking University, Beijing, China
|
50
|
Jiang Y, Hu Z, Lynch AW, Jiang J, Zhu A, Zhang Y, Xie Y, Li R, Zhou N, Meyer CA, Cejas P, Brown M, Long HW, Qiu X. scATAnno: Automated Cell Type Annotation for single-cell ATAC Sequencing Data. bioRxiv 2024:2023.06.01.543296. [PMID: 37333088 PMCID: PMC10274707 DOI: 10.1101/2023.06.01.543296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2023]
Abstract
The recent advances in single-cell epigenomic techniques have created a growing demand for scATAC-seq analysis. One key task is to determine cell types based on epigenetic profiling. We introduce scATAnno, a workflow designed to automatically annotate scATAC-seq data using large-scale scATAC-seq reference atlases. This workflow can generate scATAC-seq reference atlases from publicly available datasets, and enable accurate cell type annotation by integrating query data with reference atlases, without the aid of scRNA-seq profiling. To enhance annotation accuracy, we have incorporated KNN-based and weighted distance-based uncertainty scores to effectively detect unknown cell populations within the query data. We showcase the utility of scATAnno across multiple datasets, including peripheral blood mononuclear cell (PBMC), basal cell carcinoma (BCC) and Triple Negative Breast Cancer (TNBC), and demonstrate that scATAnno accurately annotates cell types across conditions. Overall, scATAnno is a powerful tool for cell type annotation in scATAC-seq data and can aid in the interpretation of new scATAC-seq datasets in complex biological systems.
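A KNN-based uncertainty score of the kind mentioned in this abstract might look like the sketch below. The abstract does not give the scoring formula, so the definition used here (one minus the fraction of the k nearest reference cells that agree with the majority label) is an assumption, as are the function name `knn_uncertainty` and its parameters; it is not scATAnno's actual code, and the weighted distance-based score is not shown.

```python
import numpy as np

def knn_uncertainty(ref_emb, ref_labels, query_emb, k=5):
    """Label each query cell by majority vote among its k nearest
    reference cells and return (labels, uncertainty).

    Uncertainty is assumed here to be 1 - (votes for winning label) / k,
    so 0 means all neighbours agree and values near 1 mean little agreement.
    """
    ref_labels = np.asarray(ref_labels)
    # Pairwise Euclidean distances, shape (n_query, n_reference).
    d = np.linalg.norm(query_emb[:, None, :] - ref_emb[None, :, :], axis=2)
    nn_idx = np.argsort(d, axis=1)[:, :k]
    preds, scores = [], []
    for row in nn_idx:
        labels, counts = np.unique(ref_labels[row], return_counts=True)
        best = counts.argmax()
        preds.append(labels[best])
        scores.append(1.0 - counts[best] / k)
    return np.array(preds), np.array(scores)
```

Query cells whose score exceeds a user-chosen threshold could then be flagged as candidate unknown populations; the threshold itself would need to be tuned per dataset.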
|