101
Yen A, Chen X, Skinner DD, Leti F, Crosby M, Hoisington-Lopez J, Wu Y, Chen J, Mitra RD, Dougherty JD. MYT1L deficiency impairs excitatory neuron trajectory during cortical development. bioRxiv 2024:2024.03.06.583632. [PMID: 38496654 PMCID: PMC10942489 DOI: 10.1101/2024.03.06.583632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
Mutations that reduce the function of MYT1L, a neuron-specific transcription factor, are associated with a syndromic neurodevelopmental disorder. Furthermore, MYT1L is routinely used as a proneural factor in fibroblast-to-neuron transdifferentiation. MYT1L has been hypothesized to play a role in the trajectory of neuronal specification and subtype specific maturation, but this hypothesis has not been directly tested, nor is it clear which neuron types are most impacted by MYT1L loss. In this study, we profiled 313,335 nuclei from the forebrains of wild-type and MYT1L-deficient mice at two developmental stages: E14 at the peak of neurogenesis and P21, when neurogenesis is complete, to examine the role of MYT1L levels in the trajectory of neuronal development. We found that MYT1L deficiency significantly disrupted the relative proportion of cortical excitatory neurons at E14 and P21. Significant changes in gene expression were largely concentrated in excitatory neurons, suggesting that transcriptional effects of MYT1L deficiency are largely due to disruption of neuronal maturation programs. Most effects on gene expression were cell autonomous and persistent through development. In addition, while MYT1L can both activate and repress gene expression, the repressive effects were most sensitive to haploinsufficiency, and thus more likely mediate MYT1L syndrome. These findings illuminate the intricate role of MYT1L in orchestrating gene expression dynamics during neuronal development, providing insights into the molecular underpinnings of MYT1L syndrome.
Affiliation(s)
- Allen Yen
- Department of Genetics, Washington University School of Medicine, Saint Louis, MO, USA
- Department of Psychiatry, Washington University School of Medicine, Saint Louis, MO, USA
- Xuhua Chen
- Department of Genetics, Washington University School of Medicine, Saint Louis, MO, USA
- Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, Saint Louis, MO, USA
- MariaLynn Crosby
- Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, Saint Louis, MO, USA
- DNA Sequencing and Innovation Lab, Washington University School of Medicine, Saint Louis, MO
- Jessica Hoisington-Lopez
- Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, Saint Louis, MO, USA
- DNA Sequencing and Innovation Lab, Washington University School of Medicine, Saint Louis, MO
- Yizhe Wu
- Department of Genetics, Washington University School of Medicine, Saint Louis, MO, USA
- Department of Psychiatry, Washington University School of Medicine, Saint Louis, MO, USA
- Jiayang Chen
- Department of Genetics, Washington University School of Medicine, Saint Louis, MO, USA
- Department of Psychiatry, Washington University School of Medicine, Saint Louis, MO, USA
- Robi D. Mitra
- Department of Genetics, Washington University School of Medicine, Saint Louis, MO, USA
- Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, Saint Louis, MO, USA
- Joseph D. Dougherty
- Department of Genetics, Washington University School of Medicine, Saint Louis, MO, USA
- Department of Psychiatry, Washington University School of Medicine, Saint Louis, MO, USA
- Intellectual and Developmental Disabilities Research Center, Washington University School of Medicine, Saint Louis, MO, USA
- Lead contact
102
Ma R, Sun ED, Donoho D, Zou J. Principled and interpretable alignability testing and integration of single-cell data. Proc Natl Acad Sci U S A 2024; 121:e2313719121. [PMID: 38416677 DOI: 10.1073/pnas.2313719121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Accepted: 01/23/2024] [Indexed: 03/01/2024] Open
Abstract
Single-cell data integration can provide a comprehensive molecular view of cells, and many algorithms have been developed to remove unwanted technical or biological variations and integrate heterogeneous single-cell datasets. Despite their wide usage, existing methods suffer from several fundamental limitations. In particular, we lack a rigorous statistical test for whether two high-dimensional single-cell datasets are alignable (and therefore should even be aligned). Moreover, popular methods can substantially distort the data during alignment, making the aligned data and downstream analysis difficult to interpret. To overcome these limitations, we present a spectral manifold alignment and inference (SMAI) framework, which enables principled and interpretable alignability testing and structure-preserving integration of single-cell data with the same type of features. SMAI provides a statistical test to robustly assess the alignability between datasets to avoid misleading inference and is justified by high-dimensional statistical theory. On a diverse range of real and simulated benchmark datasets, it outperforms commonly used alignment methods. Moreover, we show that SMAI improves various downstream analyses such as identification of differentially expressed genes and imputation of single-cell spatial transcriptomics, providing further biological insights. SMAI's interpretability also enables quantification and a deeper understanding of the sources of technical confounders in single-cell data.
Affiliation(s)
- Rong Ma
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115
- Eric D Sun
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305
- David Donoho
- Department of Statistics, Stanford University, Stanford, CA 94305
- James Zou
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305
103
Chen Y, Zou J. GenePT: A Simple But Effective Foundation Model for Genes and Cells Built From ChatGPT. bioRxiv 2024:2023.10.16.562533. [PMID: 37905130 PMCID: PMC10614824 DOI: 10.1101/2023.10.16.562533] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 11/02/2023]
Abstract
There has been significant recent progress in leveraging large-scale gene expression data to develop foundation models for single-cell biology. Models such as Geneformer and scGPT implicitly learn gene and cellular functions from the gene expression profiles of millions of cells, which requires extensive data curation and resource-intensive training. Here we explore a much simpler alternative by leveraging ChatGPT embeddings of genes based on literature. Our proposal, GenePT, uses NCBI text descriptions of individual genes with GPT-3.5 to generate gene embeddings. From there, GenePT generates single-cell embeddings in two ways: (i) by averaging the gene embeddings, weighted by each gene's expression level; or (ii) by creating a sentence embedding for each cell, using gene names ordered by the expression level. Without the need for dataset curation and additional pretraining, GenePT is efficient and easy to use. On many downstream tasks used to evaluate recent single-cell foundation models - e.g., classifying gene properties and cell types - GenePT achieves comparable, and often better, performance than Geneformer and other models. GenePT demonstrates that large language model embedding of literature is a simple and effective path for biological foundation models.
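Option (i) in the abstract above, an expression-weighted average of gene embeddings, can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `weighted_cell_embeddings`, the array layout, and the zero-vector handling of empty cells are hypothetical and are not taken from the GenePT code; the gene embeddings are assumed to have been computed beforehand.

```python
import numpy as np


def weighted_cell_embeddings(expression, gene_embeddings):
    """Expression-weighted average of gene embeddings, one row per cell.

    expression:      (n_cells, n_genes) non-negative expression values.
    gene_embeddings: (n_genes, dim) one precomputed embedding per gene.
    Both the name and the layout are assumptions made for this sketch.
    """
    expression = np.asarray(expression, dtype=float)
    gene_embeddings = np.asarray(gene_embeddings, dtype=float)
    totals = expression.sum(axis=1, keepdims=True)
    # Normalize each cell's expression to weights that sum to 1.
    # Cells with no expressed genes get a zero vector instead of NaN
    # (a choice made here, not described in the abstract).
    weights = np.divide(
        expression, totals, out=np.zeros_like(expression), where=totals > 0
    )
    return weights @ gene_embeddings
```

A cell expressing a single gene receives exactly that gene's embedding, and a cell expressing two genes at a 1:3 ratio receives the corresponding 0.25/0.75 mixture.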
Affiliation(s)
- Yiqun Chen
- Department of Biomedical Data Science, Stanford University, Stanford, 94305, CA, USA
- James Zou
- Department of Biomedical Data Science, Stanford University, Stanford, 94305, CA, USA
- Department of Electrical Engineering, Stanford University, Stanford, 94305, CA, USA
- Department of Computer Science, Stanford University, Stanford, 94305, CA, USA
104
Peidli S, Green TD, Shen C, Gross T, Min J, Garda S, Yuan B, Schumacher LJ, Taylor-King JP, Marks DS, Luna A, Blüthgen N, Sander C. scPerturb: harmonized single-cell perturbation data. Nat Methods 2024; 21:531-540. [PMID: 38279009 DOI: 10.1038/s41592-023-02144-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2023] [Accepted: 12/04/2023] [Indexed: 01/28/2024]
Abstract
Analysis across a growing number of single-cell perturbation datasets is hampered by poor data interoperability. To facilitate development and benchmarking of computational methods, we collect a set of 44 publicly available single-cell perturbation-response datasets with molecular readouts, including transcriptomics, proteomics and epigenomics. We apply uniform quality control pipelines and harmonize feature annotations. The resulting information resource, scPerturb, enables development and testing of computational methods, and facilitates comparison and integration across datasets. We describe energy statistics (E-statistics) for quantification of perturbation effects and significance testing, and demonstrate E-distance as a general distance measure between sets of single-cell expression profiles. We illustrate the application of E-statistics for quantifying similarity and efficacy of perturbations. The perturbation-response datasets and E-statistics computation software are publicly available at scperturb.org. This work provides an information resource for researchers working with single-cell perturbation data and recommendations for experimental design, including optimal cell counts and read depth.
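For orientation, the classical energy distance between two samples X and Y is 2·E‖X−Y‖ − E‖X−X′‖ − E‖Y−Y′‖, where the expectations are taken over pairs of points. The sketch below computes that classical quantity with plain Euclidean distances. It is an assumption that this matches the abstract's E-distance in spirit only: the distance function, the embedding space, and any normalization used by scPerturb are not stated above, and the function name `energy_distance` is hypothetical.

```python
import numpy as np


def energy_distance(x, y):
    """Classical energy distance between two sets of profiles.

    x: (n, d) array, y: (m, d) array; rows are cells, columns are features.
    Uses plain Euclidean distance and includes self-pairs in the
    within-set means (both are assumptions of this sketch).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def mean_pairwise(a, b):
        # Mean Euclidean distance over all pairs (a_i, b_j).
        diff = a[:, None, :] - b[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1)).mean()

    return 2.0 * mean_pairwise(x, y) - mean_pairwise(x, x) - mean_pairwise(y, y)
```

With this definition the distance of a set to itself is zero, and it grows as the two point clouds move apart, which is the property that makes it usable as a perturbation effect size.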
Affiliation(s)
- Stefan Peidli
- Institute of Pathology, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität, Berlin, Germany.
- Institute of Biology, Humboldt-Universität, Berlin, Germany.
- Tessa D Green
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Ciyue Shen
- Departments of Cell Biology and Systems Biology, Harvard Medical School, Boston, MA, USA
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute, Cambridge, MA, USA
- Joseph Min
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Samuele Garda
- Institute of Biology, Humboldt-Universität, Berlin, Germany
- Institute for Computer Science, Humboldt-Universität zu Berlin, Berlin, Germany
- Bo Yuan
- Departments of Cell Biology and Systems Biology, Harvard Medical School, Boston, MA, USA
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute, Cambridge, MA, USA
- Linus J Schumacher
- Centre for Regenerative Medicine, University of Edinburgh, Edinburgh, UK
- Debora S Marks
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Broad Institute, Cambridge, MA, USA
- Augustin Luna
- Departments of Cell Biology and Systems Biology, Harvard Medical School, Boston, MA, USA.
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA.
- Broad Institute, Cambridge, MA, USA.
- Computational Biology Branch, National Library of Medicine and Developmental Therapeutics Branch, National Cancer Institute, Bethesda, MD, USA.
- Nils Blüthgen
- Institute of Pathology, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität, Berlin, Germany.
- Institute of Biology, Humboldt-Universität, Berlin, Germany.
- Chris Sander
- Departments of Cell Biology and Systems Biology, Harvard Medical School, Boston, MA, USA.
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA.
- Broad Institute, Cambridge, MA, USA.
105
Imbalanced single-cell data integration leads to loss of biological information. Nat Biotechnol 2024:10.1038/s41587-023-02114-x. [PMID: 38429429 DOI: 10.1038/s41587-023-02114-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/03/2024]
106
Garmire LX, Li Y, Huang Q, Xu C, Teichmann SA, Kaminski N, Pellegrini M, Nguyen Q, Teschendorff AE. Challenges and perspectives in computational deconvolution of genomics data. Nat Methods 2024; 21:391-400. [PMID: 38374264 DOI: 10.1038/s41592-023-02166-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2022] [Accepted: 12/26/2023] [Indexed: 02/21/2024]
Abstract
Deciphering cell-type heterogeneity is crucial for systematically understanding tissue homeostasis and its dysregulation in diseases. Computational deconvolution is an efficient approach for estimating cell-type abundances from a variety of omics data. Despite substantial methodological progress in computational deconvolution in recent years, challenges are still outstanding. Here we list four important challenges related to computational deconvolution: the quality of the reference data, generation of ground truth data, limitations of computational methodologies, and benchmarking design and implementation. Finally, we make recommendations on reference data generation, new directions of computational methodologies, and strategies to promote rigorous benchmarking.
Affiliation(s)
- Lana X Garmire
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
- Yijun Li
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Qianhui Huang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Chuan Xu
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Naftali Kaminski
- Pulmonary, Critical Care & Sleep Medicine, Yale University School of Medicine, New Haven, CT, USA
- Matteo Pellegrini
- Molecular, Cell and Developmental Biology, University of California, Los Angeles, Los Angeles, CA, USA
- Quan Nguyen
- Institute for Molecular Bioscience, The University of Queensland and QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- Andrew E Teschendorff
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- UCL Cancer Institute, University College London, London, UK
107
Bárcenas-Walls JR, Ansaloni F, Hervé B, Strandback E, Nyman T, Castelo-Branco G, Bartošovič M. Nano-CUT&Tag for multimodal chromatin profiling at single-cell resolution. Nat Protoc 2024; 19:791-830. [PMID: 38129675 DOI: 10.1038/s41596-023-00932-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Accepted: 10/19/2023] [Indexed: 12/23/2023]
Abstract
The ability to comprehensively analyze the chromatin state with single-cell resolution is crucial for understanding gene regulatory principles in heterogeneous tissues or during development. Recently, we developed a nanobody-based single-cell CUT&Tag (nano-CT) protocol to simultaneously profile three epigenetic modalities (two histone marks and open chromatin state) from the same single cell. Nano-CT implements a new set of secondary nanobody-Tn5 fusion proteins to direct barcoded tagmentation by Tn5 transposase to genomic targets labeled by primary antibodies raised in different species. Such nanobody-Tn5 fusion proteins are currently not commercially available, and their in-house production and purification can be completed in 3-4 d by following our detailed protocol. The single-cell indexing in nano-CT is performed on a commercially available platform, making it widely accessible to the community. In comparison to other multimodal methods, nano-CT stands out in data complexity, low sample requirements and the flexibility to choose two of the three modalities. In addition, nano-CT works efficiently with fresh brain samples, generating multimodal epigenomic profiles for thousands of brain cells at single-cell resolution. The nano-CT protocol can be completed in just 3 d by users with basic skills in standard molecular biology and bioinformatics, although previous experience with single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) is beneficial for more in-depth data analysis. As a multimodal assay, nano-CT holds immense potential to reveal interactions of various chromatin modalities, to explore epigenetic heterogeneity and to increase our understanding of the role and interplay that chromatin dynamics has in cellular development.
Affiliation(s)
- Federico Ansaloni
- Laboratory of Molecular Neurobiology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
- Bastien Hervé
- Laboratory of Molecular Neurobiology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
- Emilia Strandback
- Protein Science Facility, Department of Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
- Tomas Nyman
- Protein Science Facility, Department of Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
- Gonçalo Castelo-Branco
- Laboratory of Molecular Neurobiology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
- Ming Wai Lau Centre for Reparative Medicine, Stockholm node, Karolinska Institutet, Stockholm, Sweden
- Marek Bartošovič
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden.
108
Maan H, Zhang L, Yu C, Geuenich MJ, Campbell KR, Wang B. Characterizing the impacts of dataset imbalance on single-cell data integration. Nat Biotechnol 2024:10.1038/s41587-023-02097-9. [PMID: 38429430 DOI: 10.1038/s41587-023-02097-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2022] [Accepted: 12/13/2023] [Indexed: 03/03/2024]
Abstract
Computational methods for integrating single-cell transcriptomic data from multiple samples and conditions do not generally account for imbalances in the cell types measured in different datasets. In this study, we examined how differences in the cell types present, the number of cells per cell type and the cell type proportions across samples affect downstream analyses after integration. The Iniquitate pipeline assesses the robustness of integration results after perturbing the degree of imbalance between datasets. Benchmarking of five state-of-the-art single-cell RNA sequencing integration techniques in 2,600 integration experiments indicates that sample imbalance has substantial impacts on downstream analyses and the biological interpretation of integration results. Imbalance perturbation led to statistically significant variation in unsupervised clustering, cell type classification, differential expression and marker gene annotation, query-to-reference mapping and trajectory inference. We quantified the impacts of imbalance through newly introduced properties: aggregate cell type support and minimum cell type center distance. To better characterize and mitigate impacts of imbalance, we introduce balanced clustering metrics and imbalanced integration guidelines for integration method users.
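One simple way to perturb the degree of imbalance between datasets is to thin out, or remove entirely, a single cell type in a single batch before integration. The sketch below illustrates that idea only. It is not the Iniquitate implementation, and the function name `downsample_cell_type`, its parameters, and the rounding rule are hypothetical choices made for this example.

```python
import numpy as np


def downsample_cell_type(labels, batches, cell_type, batch, keep_fraction, seed=0):
    """Return sorted indices of cells kept after thinning one cell type in one batch.

    labels, batches: per-cell annotations of equal length.
    keep_fraction:   share of the targeted cells to keep (0.0 removes them all).
    All names here are assumptions of this sketch, not an existing API.
    """
    labels = np.asarray(labels)
    batches = np.asarray(batches)
    rng = np.random.default_rng(seed)

    targeted = (labels == cell_type) & (batches == batch)
    target_idx = np.flatnonzero(targeted)
    other_idx = np.flatnonzero(~targeted)

    # Randomly keep the requested share of the targeted cells.
    n_keep = int(round(keep_fraction * target_idx.size))
    kept_target = rng.choice(target_idx, size=n_keep, replace=False)
    return np.sort(np.concatenate([other_idx, kept_target]))
```

Running an integration method on the full data and on several such perturbed subsets, then comparing downstream results, gives a rough sense of how sensitive the method is to imbalance.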
Affiliation(s)
- Hassaan Maan
- Peter Munk Cardiac Centre, University Health Network, Toronto, Ontario, Canada.
- Vector Institute, Toronto, Ontario, Canada.
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada.
- Lin Zhang
- Peter Munk Cardiac Centre, University Health Network, Toronto, Ontario, Canada
- Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, British Columbia, Canada
- Chengxin Yu
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Lunenfeld-Tanenbaum Research Institute, Toronto, Ontario, Canada
- Michael J Geuenich
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Lunenfeld-Tanenbaum Research Institute, Toronto, Ontario, Canada
- Kieran R Campbell
- Vector Institute, Toronto, Ontario, Canada.
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada.
- Lunenfeld-Tanenbaum Research Institute, Toronto, Ontario, Canada.
- Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada.
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada.
- Bo Wang
- Peter Munk Cardiac Centre, University Health Network, Toronto, Ontario, Canada.
- Vector Institute, Toronto, Ontario, Canada.
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada.
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada.
109
Liu W, Li W, Zhao Z. Single-Cell Transcriptomics Reveals Pre-existing COVID-19 Vulnerability Factors in Lung Cancer Patients. Mol Cancer Res 2024; 22:240-253. [PMID: 38063850 PMCID: PMC10922768 DOI: 10.1158/1541-7786.mcr-23-0692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2023] [Revised: 11/09/2023] [Accepted: 12/06/2023] [Indexed: 01/07/2024]
Abstract
Coronavirus disease 2019 (COVID-19) and cancer are major health threats, and individuals may develop both simultaneously. Recent studies have indicated that patients with cancer are particularly vulnerable to COVID-19, but the molecular mechanisms underlying the associations remain poorly understood. To address this knowledge gap, we collected single-cell RNA-sequencing data from COVID-19, lung adenocarcinoma, small cell lung carcinoma patients, and normal lungs to perform an integrated analysis. We characterized altered cell populations, gene expression, and dysregulated intercellular communication in diseases. Our analysis identified pathologic conditions shared by COVID-19 and lung cancer, including upregulated TMPRSS2 expression in epithelial cells, stronger inflammatory responses mediated by macrophages, increased T-cell response suppression, and elevated fibrosis risk by pathologic fibroblasts. These pre-existing conditions in patients with lung cancer may lead to more severe inflammation, fibrosis, and weakened adaptive immune response upon COVID-19 infection. Our findings revealed potential molecular mechanisms driving an increased COVID-19 risk in patients with lung cancer and suggested preventive and therapeutic targets for COVID-19 in this population. Implications: Our work reveals the potential molecular mechanisms contributing to the vulnerability to COVID-19 in patients with lung cancer.
Affiliation(s)
- Wendao Liu
- The University of Texas MD Anderson Cancer Center UTHealth Houston Graduate School of Biomedical Sciences, Houston, TX 77030, USA
- Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Wenbo Li
- The University of Texas MD Anderson Cancer Center UTHealth Houston Graduate School of Biomedical Sciences, Houston, TX 77030, USA
- Department of Biochemistry and Molecular Biology, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Zhongming Zhao
- The University of Texas MD Anderson Cancer Center UTHealth Houston Graduate School of Biomedical Sciences, Houston, TX 77030, USA
- Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
110
Arevalo J, Su E, van Dijk R, Carpenter AE, Singh S. Evaluating batch correction methods for image-based cell profiling. bioRxiv 2024:2023.09.15.558001. [PMID: 37745478 PMCID: PMC10516049 DOI: 10.1101/2023.09.15.558001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
High-throughput image-based profiling platforms are powerful technologies capable of collecting data from billions of cells exposed to thousands of perturbations in a time- and cost-effective manner. Therefore, image-based profiling data has been increasingly used for diverse biological applications, such as predicting drug mechanism of action or gene function. However, batch effects pose severe limitations to community-wide efforts to integrate and interpret image-based profiling data collected across different laboratories and equipment. To address this problem, we benchmarked seven high-performing scRNA-seq batch correction techniques, representing diverse approaches, using a newly released Cell Painting dataset, the largest publicly accessible image-based dataset. We focused on five different scenarios with varying complexity, and we found that Harmony, a mixture-model based method, consistently outperformed the other tested methods. Our proposed framework, benchmark, and metrics can additionally be used to assess new batch correction methods in the future. Overall, this work paves the way for improvements that allow the community to make best use of public Cell Painting data for scientific discovery.
Affiliation(s)
- John Arevalo
- Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Ellen Su
- Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Robert van Dijk
- Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Anne E Carpenter
- Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Shantanu Singh
- Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
111
Liu T, Li K, Wang Y, Li H, Zhao H. Evaluating the Utilities of Foundation Models in Single-cell Data Analysis. bioRxiv 2024:2023.09.08.555192. [PMID: 38464157 PMCID: PMC10925156 DOI: 10.1101/2023.09.08.555192] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 03/12/2024]
Abstract
Foundation Models (FMs) have made significant strides in both industrial and scientific domains. In this paper, we evaluate the performance of FMs in single-cell sequencing data analysis through comprehensive experiments across eight downstream tasks pertinent to single-cell data. By comparing ten different single-cell FMs with task-specific methods, we found that single-cell FMs may not consistently outperform task-specific methods in all tasks. However, the emergent abilities and the successful applications of cross-species/cross-modality transfer learning of FMs are promising. In addition, we present a systematic evaluation of the effects of hyper-parameters, initial settings, and stability for training single-cell FMs based on a proposed scEval framework, and provide guidelines for pre-training and fine-tuning. Our work summarizes the current state of single-cell FMs and points to their constraints and avenues for future development.
112
Zito A, Lee JT. Variable expression of MECP2, CDKL5, and FMR1 in the human brain: Implications for gene restorative therapies. Proc Natl Acad Sci U S A 2024; 121:e2312757121. [PMID: 38386709 PMCID: PMC10907246 DOI: 10.1073/pnas.2312757121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 12/28/2023] [Indexed: 02/24/2024] Open
Abstract
MECP2, CDKL5, and FMR1 are three X-linked neurodevelopmental genes associated with Rett, CDKL5-, and fragile-X syndrome, respectively. These syndromes are characterized by distinct constellations of severe cognitive and neurobehavioral anomalies, reflecting the broad but unique expression patterns of each of the genes in the brain. As these disorders are not thought to be neurodegenerative and may be reversible, a major goal has been to restore expression of the functional proteins in the patient's brain. Strategies have included gene therapy, gene editing, and selective Xi-reactivation methodologies. However, tissue penetration and overall delivery to various regions of the brain remain challenging for each strategy. Thus, gaining insights into how much restoration would be required and what regions/cell types in the brain must be targeted for meaningful physiological improvement would be valuable. As a step toward addressing these questions, here we perform a meta-analysis of single-cell transcriptomics data from the human brain across multiple developmental stages, in various brain regions, and in multiple donors. We observe a substantial degree of expression variability for MECP2, CDKL5, and FMR1 not only across cell types but also between donors. The wide range of expression may help define a therapeutic window, with the low end delineating a minimum level required to restore physiological function and the high end informing toxicology margin. Finally, the inter-cellular and inter-individual variability enable identification of co-varying genes and will facilitate future identification of biomarkers.
Affiliation(s)
- Antonino Zito
- Department of Molecular Biology, Massachusetts General Hospital, Boston, MA 02114
- Department of Genetics, The Blavatnik Institute, Harvard Medical School, Boston, MA 02114
- Jeannie T. Lee
- Department of Molecular Biology, Massachusetts General Hospital, Boston, MA 02114
- Department of Genetics, The Blavatnik Institute, Harvard Medical School, Boston, MA 02114
113
Tang S, Cui X, Wang R, Li S, Li S, Huang X, Chen S. scCASE: accurate and interpretable enhancement for single-cell chromatin accessibility sequencing data. Nat Commun 2024; 15:1629. [PMID: 38388573 PMCID: PMC10884038 DOI: 10.1038/s41467-024-46045-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Accepted: 02/12/2024] [Indexed: 02/24/2024] Open
Abstract
Single-cell chromatin accessibility sequencing (scCAS) has emerged as a valuable tool for interrogating and elucidating epigenomic heterogeneity and gene regulation. However, scCAS data inherently suffers from limitations such as high sparsity and dimensionality, which pose significant challenges for downstream analyses. Although several methods are proposed to enhance scCAS data, there are still challenges and limitations that hinder the effectiveness of these methods. Here, we propose scCASE, a scCAS data enhancement method based on non-negative matrix factorization which incorporates an iteratively updating cell-to-cell similarity matrix. Through comprehensive experiments on multiple datasets, we demonstrate the advantages of scCASE over existing methods for scCAS data enhancement. The interpretable cell type-specific peaks identified by scCASE can provide valuable biological insights into cell subpopulations. Moreover, to leverage the large compendia of available omics data as a reference, we further expand scCASE to scCASER, which enables the incorporation of external reference data to improve enhancement performance.
Affiliation(s)
- Songming Tang
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
- Xuejian Cui
  - MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, 100084, Beijing, China
- Rongxiang Wang
  - Department of Computer Science, University of Virginia, Charlottesville, VA, 22903, USA
- Sijie Li
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
- Siyu Li
  - School of Statistics and Data Science, Nankai University, Tianjin, 300071, China
- Xin Huang
  - Beijing Key Laboratory for Radiobiology, Department of Radiation Biology, Beijing Institute of Radiation Medicine, 100850, Beijing, China
- Shengquan Chen
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
|
114
|
Gao C, Welch JD. Integrating single-cell multimodal epigenomic data using 1D-convolutional neural networks. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.16.580655. [PMID: 38464242 PMCID: PMC10925154 DOI: 10.1101/2024.02.16.580655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Recent experimental developments enable single-cell multimodal epigenomic profiling, which measures multiple histone modifications and chromatin accessibility within the same cell. Such parallel measurements provide exciting new opportunities to investigate how epigenomic modalities vary together across cell types and states. A pivotal step in using this type of data is integrating the epigenomic modalities to learn a unified representation of each cell, but existing approaches are not designed to model the unique nature of this data type. Our key insight is to model single-cell multimodal epigenome data as a multi-channel sequential signal. Based on this insight, we developed ConvNet-VAEs, a novel framework that uses 1D-convolutional variational autoencoders (VAEs) for single-cell multimodal epigenomic data integration. We evaluated ConvNet-VAEs on nano-CT and scNTT-seq data generated from juvenile mouse brain and human bone marrow. We found that ConvNet-VAEs can perform dimension reduction and batch correction better than previous architectures while using significantly fewer parameters. Furthermore, the performance gap between convolutional and fully-connected architectures increases with the number of modalities, and deeper convolutional architectures can increase performance while performance degrades for deeper fully-connected architectures. Our results indicate that convolutional autoencoders are a promising method for integrating current and future single-cell multimodal epigenomic datasets.
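One way to see why treating modalities as channels of a 1D signal can save parameters is to count the weights of a single first layer. The sketch below is only back-of-the-envelope arithmetic under assumed sizes (10,000 genomic bins, 32 output units, kernel width 11). None of these numbers come from the paper, and the full ConvNet-VAE architecture contains further layers that are not counted here.

```python
def conv1d_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of one 1D convolutional layer."""
    return out_channels * in_channels * kernel_size + out_channels

def dense_params(n_inputs, n_hidden):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_hidden + n_hidden

n_bins = 10_000  # hypothetical number of genomic bins per modality
for n_modalities in (1, 2, 3):
    # Convolution: each modality is one input channel over the same bins.
    conv = conv1d_params(n_modalities, out_channels=32, kernel_size=11)
    # Fully connected: modalities are concatenated into one long input vector.
    dense = dense_params(n_modalities * n_bins, n_hidden=32)
    print(n_modalities, conv, dense)
```

Under these assumptions, adding a modality adds 352 weights to the convolutional layer but 320,000 to the fully connected one.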
Affiliation(s)
- Chao Gao
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Joshua D Welch
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Computer Science and Engineering, University of Michigan, Ann Arbor, MI 48109, USA
|
115
|
Massoni-Badosa R, Aguilar-Fernández S, Nieto JC, Soler-Vila P, Elosua-Bayes M, Marchese D, Kulis M, Vilas-Zornoza A, Bühler MM, Rashmi S, Alsinet C, Caratù G, Moutinho C, Ruiz S, Lorden P, Lunazzi G, Colomer D, Frigola G, Blevins W, Romero-Rivero L, Jiménez-Martínez V, Vidal A, Mateos-Jaimez J, Maiques-Diaz A, Ovejero S, Moreaux J, Palomino S, Gomez-Cabrero D, Agirre X, Weniger MA, King HW, Garner LC, Marini F, Cervera-Paz FJ, Baptista PM, Vilaseca I, Rosales C, Ruiz-Gaspà S, Talks B, Sidhpura K, Pascual-Reguant A, Hauser AE, Haniffa M, Prosper F, Küppers R, Gut IG, Campo E, Martin-Subero JI, Heyn H. An atlas of cells in the human tonsil. Immunity 2024; 57:379-399.e18. [PMID: 38301653 PMCID: PMC10869140 DOI: 10.1016/j.immuni.2024.01.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2022] [Revised: 07/07/2023] [Accepted: 01/09/2024] [Indexed: 02/03/2024]
Abstract
Palatine tonsils are secondary lymphoid organs (SLOs) representing the first line of immunological defense against inhaled or ingested pathogens. We generated an atlas of the human tonsil composed of >556,000 cells profiled across five different data modalities, including single-cell transcriptome, epigenome, proteome, and immune repertoire sequencing, as well as spatial transcriptomics. This census identified 121 cell types and states, defined developmental trajectories, and enabled an understanding of the functional units of the tonsil. Exemplarily, we stratified myeloid slan-like subtypes, established a BCL6 enhancer as locally active in follicle-associated T and B cells, and identified SIX5 as putative transcriptional regulator of plasma cell maturation. Analyses of a validation cohort confirmed the presence, annotation, and markers of tonsillar cell types and provided evidence of age-related compositional shifts. We demonstrate the value of this resource by annotating cells from B cell-derived mantle cell lymphomas, linking transcriptional heterogeneity to normal B cell differentiation states of the human tonsil.
Affiliation(s)
- Juan C Nieto
  - Centro Nacional de Análisis Genómico (CNAG), Barcelona, Spain
- Paula Soler-Vila
  - Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
- Marta Kulis
  - Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
- Amaia Vilas-Zornoza
  - Hemato-Oncology Program, Center for Applied Medical Research (CIMA), University of Navarra, IDISNA, Universidad de Navarra, Pamplona, Spain; Centro de Investigación Biomédica en Red Cáncer (CIBERONC), Madrid, Spain
- Marco Matteo Bühler
  - Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain; Department of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland; Hematopathology Section, Pathology Department, Hospital Clinic, Barcelona, Spain
- Sonal Rashmi
  - Centro Nacional de Análisis Genómico (CNAG), Barcelona, Spain
- Clara Alsinet
  - Centro Nacional de Análisis Genómico (CNAG), Barcelona, Spain
- Ginevra Caratù
  - Centro Nacional de Análisis Genómico (CNAG), Barcelona, Spain
- Catia Moutinho
  - Centro Nacional de Análisis Genómico (CNAG), Barcelona, Spain
- Sara Ruiz
  - Centro Nacional de Análisis Genómico (CNAG), Barcelona, Spain
- Patricia Lorden
  - Centro Nacional de Análisis Genómico (CNAG), Barcelona, Spain
- Giulia Lunazzi
  - Centro Nacional de Análisis Genómico (CNAG), Barcelona, Spain
- Dolors Colomer
  - Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain; Centro de Investigación Biomédica en Red Cáncer (CIBERONC), Madrid, Spain; Hematopathology Section, Pathology Department, Hospital Clinic, Barcelona, Spain; Departament de Fonaments Clínics, Facultat de Medicina, Universitat de Barcelona, Barcelona, Spain
- Gerard Frigola
  - Hematopathology Section, Pathology Department, Hospital Clinic, Barcelona, Spain
- Will Blevins
  - Centro Nacional de Análisis Genómico (CNAG), Barcelona, Spain
- Lucia Romero-Rivero
  - Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
- Anna Vidal
  - Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
- Judith Mateos-Jaimez
  - Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
- Alba Maiques-Diaz
  - Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
- Sara Ovejero
  - Department of Biological Hematology, CHU Montpellier, Montpellier, France; Institute of Human Genetics, UMR 9002 CNRS-UM, Montpellier, France
- Jérôme Moreaux
  - Department of Biological Hematology, CHU Montpellier, Montpellier, France; Institute of Human Genetics, UMR 9002 CNRS-UM, Montpellier, France; Department of Clinical Hematology, CHU Montpellier, Montpellier, France
- Sara Palomino
  - Translational Bioinformatics Unit (TransBio), Navarrabiomed, Navarra Health Department (CHN), Public University of Navarra (UPNA), Navarra Institute for Health Research (IdiSNA), Pamplona, Spain
- David Gomez-Cabrero
  - Translational Bioinformatics Unit (TransBio), Navarrabiomed, Navarra Health Department (CHN), Public University of Navarra (UPNA), Navarra Institute for Health Research (IdiSNA), Pamplona, Spain; Bioscience Program, Biological and Environmental Sciences and Engineering Division (BESE), King Abdullah University of Science and Technology KAUST, Thuwal, Saudi Arabia
- Xabier Agirre
  - Hemato-Oncology Program, Center for Applied Medical Research (CIMA), University of Navarra, IDISNA, Universidad de Navarra, Pamplona, Spain; Centro de Investigación Biomédica en Red Cáncer (CIBERONC), Madrid, Spain
- Marc A Weniger
  - Institute of Cell Biology (Cancer Research), Medical Faculty, University of Duisburg-Essen, Essen, Germany
- Hamish W King
  - Epigenetics and Development Division, Walter and Eliza Hall Institute, Parkville, Australia
- Lucy C Garner
  - Translational Gastroenterology Unit, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Federico Marini
  - Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany; Center for Thrombosis and Hemostasis (CTH), University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Peter M Baptista
  - Department of Otorhinolaryngology, University of Navarra, Pamplona, Spain
- Isabel Vilaseca
  - Otorhinolaryngology Head-Neck Surgery Department, Hospital Clínic, IDIBAPS Universitat de Barcelona, Barcelona, Spain
- Cecilia Rosales
  - Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
- Silvia Ruiz-Gaspà
  - Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
- Benjamin Talks
  - Biosciences Institute, Newcastle University, Newcastle Upon Tyne, UK; Department of Otolaryngology, Freeman Hospital, Newcastle Hospitals NHS Foundation Trust, Newcastle Upon Tyne, UK
- Keval Sidhpura
  - Biosciences Institute, Newcastle University, Newcastle Upon Tyne, UK
- Anna Pascual-Reguant
  - Department of Rheumatology and Clinical Immunology, Charité - Universitätsmedizin Berlin, Berlin, Germany; Immune Dynamics, Deutsches Rheuma-Forschungszentrum (DRFZ), Berlin, Germany
- Anja E Hauser
  - Department of Rheumatology and Clinical Immunology, Charité - Universitätsmedizin Berlin, Berlin, Germany; Immune Dynamics, Deutsches Rheuma-Forschungszentrum (DRFZ), Berlin, Germany
- Muzlifah Haniffa
  - Biosciences Institute, Newcastle University, Newcastle Upon Tyne, UK; Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK; Department of Dermatology and NIHR Newcastle Biomedical Research Centre, Newcastle Hospitals NHS Foundation Trust, Newcastle Upon Tyne, UK
- Felipe Prosper
  - Hemato-Oncology Program, Center for Applied Medical Research (CIMA), University of Navarra, IDISNA, Universidad de Navarra, Pamplona, Spain; Centro de Investigación Biomédica en Red Cáncer (CIBERONC), Madrid, Spain; Departamento de Hematología, Clínica Universidad de Navarra, University of Navarra, Pamplona, Spain
- Ralf Küppers
  - Institute of Cell Biology (Cancer Research), Medical Faculty, University of Duisburg-Essen, Essen, Germany
- Ivo Glynne Gut
  - Centro Nacional de Análisis Genómico (CNAG), Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Elias Campo
  - Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain; Centro de Investigación Biomédica en Red Cáncer (CIBERONC), Madrid, Spain; Hematopathology Section, Pathology Department, Hospital Clinic, Barcelona, Spain; Departament de Fonaments Clínics, Facultat de Medicina, Universitat de Barcelona, Barcelona, Spain
- José Ignacio Martin-Subero
  - Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain; Departament de Fonaments Clínics, Facultat de Medicina, Universitat de Barcelona, Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Holger Heyn
  - Centro Nacional de Análisis Genómico (CNAG), Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
|
116
|
Hrovatin K, Moinfar AA, Zappia L, Lapuerta AT, Lengerich B, Kellis M, Theis FJ. Integrating single-cell RNA-seq datasets with substantial batch effects. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.11.03.565463. [PMID: 37961672 PMCID: PMC10635119 DOI: 10.1101/2023.11.03.565463] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Integration of single-cell RNA-sequencing (scRNA-seq) datasets has become a standard part of the analysis, with conditional variational autoencoders (cVAE) being among the most popular approaches. Increasingly, researchers are asking to map cells across challenging cases such as cross-organs, species, or organoids and primary tissue, as well as different scRNA-seq protocols, including single-cell and single-nuclei. Current computational methods struggle to harmonize datasets with such substantial differences, driven by technical or biological variation. Here, we propose to address these challenges for the popular cVAE-based approaches by introducing and comparing a series of regularization constraints. The two commonly used strategies for increasing batch correction in cVAEs, that is Kullback-Leibler divergence (KL) regularization strength tuning and adversarial learning, suffer from substantial loss of biological information. Therefore, we adapt, implement, and assess alternative regularization strategies for cVAEs and investigate how they improve batch effect removal or better preserve biological variation, enabling us to propose an optimal cVAE-based integration strategy for complex systems. We show that using a VampPrior instead of the commonly used Gaussian prior not only improves the preservation of biological variation but also unexpectedly batch correction. Moreover, we show that our implementation of cycle-consistency loss leads to significantly better biological preservation than adversarial learning implemented in the previously proposed GLUE model. Additionally, we do not recommend relying only on the KL regularization strength tuning for increasing batch correction, as it removes both biological and batch information without discriminating between the two. Based on our findings, we propose a new model that combines VampPrior and cycle-consistency loss. 
We show that using it for datasets with substantial batch effects improves downstream interpretation of cell states and biological conditions. To ease the use of the newly proposed model, we make it available in the scvi-tools package as an external model named sysVI. Moreover, in the future, these regularization techniques could be added to other established cVAE-based models to improve the integration of datasets with substantial batch effects.
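The regularization terms discussed in this abstract can be written out in standard VAE notation. The formulas below are textbook forms given as a reading aid, not the exact objective of the paper. The symbols are assumptions introduced here: $x$ is a cell's expression profile, $b$ its batch covariate, $z$ the latent code, $\beta$ the KL weight, $d$ the latent dimension, and $u_k$ the $K$ learnable pseudo-inputs of a VampPrior (conditioning of the pseudo-inputs on batch is omitted).

```latex
\begin{align}
\mathcal{L}_\beta(\theta,\phi)
  &= -\,\mathbb{E}_{q_\phi(z \mid x, b)}\big[\log p_\theta(x \mid z, b)\big]
     + \beta\,\mathrm{KL}\big(q_\phi(z \mid x, b)\,\|\,p(z)\big), \\
\mathrm{KL}\big(\mathcal{N}(\mu, \operatorname{diag}(\sigma^2))\,\|\,\mathcal{N}(0, I)\big)
  &= \tfrac{1}{2}\sum_{j=1}^{d}\big(\mu_j^2 + \sigma_j^2 - \log\sigma_j^2 - 1\big), \\
p_{\mathrm{Vamp}}(z)
  &= \frac{1}{K}\sum_{k=1}^{K} q_\phi(z \mid u_k).
\end{align}
```

Raising $\beta$ pulls every latent dimension toward the prior regardless of whether it encodes batch or biology, which is consistent with the abstract's caution against relying on KL strength tuning alone.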
Affiliation(s)
- Karin Hrovatin
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA
  - Broad Institute of MIT and Harvard, Cambridge, MA
- Amir Ali Moinfar
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Luke Zappia
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Alejandro Tejada Lapuerta
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Ben Lengerich
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA
  - Broad Institute of MIT and Harvard, Cambridge, MA
- Manolis Kellis
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA
  - Broad Institute of MIT and Harvard, Cambridge, MA
- Fabian J Theis
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
|
117
|
Haage V, Tuddenham JF, Comandante-Lou N, Bautista A, Monzel A, Chiu R, Fujita M, Garcia FG, Bhattarai P, Patel R, Buonfiglioli A, Idiarte J, Herman M, Rinderspacher A, Mela A, Zhao W, Argenziano MG, Furnari JL, Banu MA, Landry DW, Bruce JN, Canoll P, Zhang Y, Nuriel T, Kizil C, Sproul AA, de Witte LD, Sims PA, Menon V, Picard M, De Jager PL. A pharmacological toolkit for human microglia identifies Topoisomerase I inhibitors as immunomodulators for Alzheimer's disease. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.06.579103. [PMID: 38370689 PMCID: PMC10871172 DOI: 10.1101/2024.02.06.579103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
While efforts to identify microglial subtypes have recently accelerated, the relation of transcriptomically defined states to function has been largely limited to in silico annotations. Here, we characterize a set of pharmacological compounds that have been proposed to polarize human microglia towards two distinct states - one enriched for AD and MS genes and another characterized by increased expression of antigen presentation genes. Using different model systems including HMC3 cells, iPSC-derived microglia and cerebral organoids, we characterize the effect of these compounds in mimicking human microglial subtypes in vitro. We show that the Topoisomerase I inhibitor Camptothecin induces a CD74high/MHChigh microglial subtype which is specialized in amyloid beta phagocytosis. Camptothecin suppressed amyloid toxicity and restored microglia back to their homeostatic state in a zebrafish amyloid model. Our work provides avenues to recapitulate human microglial subtypes in vitro, enabling functional characterization and providing a foundation for modulating human microglia in vivo.
Affiliation(s)
- Verena Haage
  - Center for Translational & Computational Neuroimmunology, Neuroimmunology Division, Department of Neurology and the Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, Columbia University Irving Medical Center, New York, NY 10032, United States
- John F. Tuddenham
  - Center for Translational & Computational Neuroimmunology, Neuroimmunology Division, Department of Neurology and the Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, Columbia University Irving Medical Center, New York, NY 10032, United States
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, United States
- Natacha Comandante-Lou
  - Center for Translational & Computational Neuroimmunology, Neuroimmunology Division, Department of Neurology and the Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, Columbia University Irving Medical Center, New York, NY 10032, United States
- Alex Bautista
  - Center for Translational & Computational Neuroimmunology, Neuroimmunology Division, Department of Neurology and the Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, Columbia University Irving Medical Center, New York, NY 10032, United States
- Anna Monzel
  - Department of Psychiatry, Division of Behavioral Medicine, College of Physicians and Surgeons, Columbia University Irving Medical Center, New York, USA
- Rebecca Chiu
  - Center for Translational & Computational Neuroimmunology, Neuroimmunology Division, Department of Neurology and the Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, Columbia University Irving Medical Center, New York, NY 10032, United States
- Masashi Fujita
  - Center for Translational & Computational Neuroimmunology, Neuroimmunology Division, Department of Neurology and the Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, Columbia University Irving Medical Center, New York, NY 10032, United States
- Frankie G. Garcia
  - Center for Translational & Computational Neuroimmunology, Neuroimmunology Division, Department of Neurology and the Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, Columbia University Irving Medical Center, New York, NY 10032, United States
- Prabesh Bhattarai
  - Department of Neurology and the Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, Columbia University Irving Medical Center, New York, NY 10032, United States
- Ronak Patel
  - Department of Pathology and Cell Biology and the Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, Columbia University Irving Medical Center, New York, NY 10032, United States
- Alice Buonfiglioli
  - Department of Psychiatry, Icahn School of Medicine, 1460 Madison Avenue, New York, NY, 10029, United States
- Juan Idiarte
  - Department of Neurology and the Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, Columbia University Irving Medical Center, New York, NY 10032, United States
- Mathieu Herman
  - Department of Neurology and the Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, Columbia University Irving Medical Center, New York, NY 10032, United States
- Angeliki Mela
  - Department of Pathology and Cell Biology, Columbia University Irving Medical Center, New York, NY, USA
- Wenting Zhao
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, United States
- Michael G. Argenziano
  - Department of Neurological Surgery, Columbia University Irving Medical Center, New York, NY 10032, USA
- Julia L. Furnari
  - Department of Neurological Surgery, Columbia University Irving Medical Center, New York, NY 10032, USA
- Matei A. Banu
  - Department of Neurological Surgery, Columbia University Irving Medical Center, New York, NY 10032, USA
- Donald W. Landry
  - Department of Medicine, Columbia University, New York, NY 10032, United States
- Jeffrey N. Bruce
  - Department of Neurological Surgery, Columbia University Irving Medical Center, New York, NY 10032, USA
- Peter Canoll
  - Department of Pathology and Cell Biology, Columbia University Irving Medical Center, New York, NY, USA
- Ya Zhang
  - Center for Translational & Computational Neuroimmunology, Neuroimmunology Division, Department of Neurology and the Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, Columbia University Irving Medical Center, New York, NY 10032, United States
- Tal Nuriel
  - Department of Neurology and the Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, Columbia University Irving Medical Center, New York, NY 10032, United States
- Caghan Kizil
  - Department of Neurology and the Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, Columbia University Irving Medical Center, New York, NY 10032, United States
- Andrew A. Sproul
  - Department of Pathology and Cell Biology and the Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, Columbia University Irving Medical Center, New York, NY 10032, United States
- Lotje D. de Witte
  - Department of Psychiatry, Icahn School of Medicine, 1460 Madison Avenue, New York, NY, 10029, United States
- Peter A. Sims
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, United States
- Vilas Menon
  - Center for Translational & Computational Neuroimmunology, Neuroimmunology Division, Department of Neurology and the Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, Columbia University Irving Medical Center, New York, NY 10032, United States
- Martin Picard
  - Department of Psychiatry, Division of Behavioral Medicine, College of Physicians and Surgeons, Columbia University Irving Medical Center, New York, USA
  - Department of Neurology, H. Houston Merritt Center, Columbia Translational Neuroscience Initiative, College of Physicians and Surgeons, Columbia University Irving Medical Center, New York, USA
  - New York State Psychiatric Institute, New York, USA
  - Robert N Butler Columbia Aging Center, Columbia University Mailman School of Public Health, New York, NY, USA
- Philip L. De Jager
  - Center for Translational & Computational Neuroimmunology, Neuroimmunology Division, Department of Neurology and the Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, Columbia University Irving Medical Center, New York, NY 10032, United States
|
118
|
Liu J, Ma J, Wen J, Zhou X. A Cell Cycle-aware Network for Data Integration and Label Transferring of Single-cell RNA-seq and ATAC-seq. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.31.578213. [PMID: 38352302 PMCID: PMC10862874 DOI: 10.1101/2024.01.31.578213] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
In recent years, the integration of single-cell multi-omics data has provided a more comprehensive understanding of cell functions and internal regulatory mechanisms from a non-single omics perspective, but it still suffers many challenges, such as omics-variance, sparsity, cell heterogeneity and confounding factors. As we know, cell cycle is regarded as a confounder when analyzing other factors in single-cell RNA-seq data, but it's not clear how it will work on the integrated single-cell multi-omics data. Here, we developed a Cell Cycle-Aware Network (CCAN) to remove cell cycle effects from the integrated single-cell multi-omics data while keeping the cell type-specific variations. This is the first computational model to study the cell-cycle effects in the integration of single-cell multi-omics data. Validations on several benchmark datasets show the outstanding performance of CCAN in a variety of downstream analyses and applications, including removing cell cycle effects and batch effects of scRNA-seq datasets from different protocols, integrating paired and unpaired scRNA-seq and scATAC-seq data, accurately transferring cell type labels from scRNA-seq to scATAC-seq data, and characterizing the differentiation process from hematopoietic stem cells to different lineages in the integration of differentiation data.
|
119
|
Zhang K, Zemke NR, Armand EJ, Ren B. A fast, scalable and versatile tool for analysis of single-cell omics data. Nat Methods 2024; 21:217-227. [PMID: 38191932 PMCID: PMC10864184 DOI: 10.1038/s41592-023-02139-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 06/09/2023] [Accepted: 11/23/2023] [Indexed: 01/10/2024]
Abstract
Single-cell omics technologies have revolutionized the study of gene regulation in complex tissues. A major computational challenge in analyzing these datasets is to project the large-scale and high-dimensional data into low-dimensional space while retaining the relative relationships between cells. This low dimension embedding is necessary to decompose cellular heterogeneity and reconstruct cell-type-specific gene regulatory programs. Traditional dimensionality reduction techniques, however, face challenges in computational efficiency and in comprehensively addressing cellular diversity across varied molecular modalities. Here we introduce a nonlinear dimensionality reduction algorithm, embodied in the Python package SnapATAC2, which not only achieves a more precise capture of single-cell omics data heterogeneities but also ensures efficient runtime and memory usage, scaling linearly with the number of cells. Our algorithm demonstrates exceptional performance, scalability and versatility across diverse single-cell omics datasets, including single-cell assay for transposase-accessible chromatin using sequencing, single-cell RNA sequencing, single-cell Hi-C and single-cell multi-omics datasets, underscoring its utility in advancing single-cell analysis.
Affiliation(s)
- Kai Zhang
  - Department of Cellular and Molecular Medicine, University of California, San Diego School of Medicine, La Jolla, CA, USA
  - Westlake Laboratory of Life Sciences and Biomedicine, School of Life Sciences, Westlake University, Hangzhou, China
- Nathan R Zemke
  - Department of Cellular and Molecular Medicine, University of California, San Diego School of Medicine, La Jolla, CA, USA
  - Center for Epigenomics, University of California, San Diego School of Medicine, La Jolla, CA, USA
- Ethan J Armand
  - Department of Cellular and Molecular Medicine, University of California, San Diego School of Medicine, La Jolla, CA, USA
  - Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA, USA
- Bing Ren
  - Department of Cellular and Molecular Medicine, University of California, San Diego School of Medicine, La Jolla, CA, USA
  - Center for Epigenomics, University of California, San Diego School of Medicine, La Jolla, CA, USA
  - Ludwig Institute for Cancer Research, La Jolla, CA, USA
  - Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA, USA
|
120
|
Sharma D, Worssam MD, Pedroza AJ, Dalal AR, Alemany H, Kim HJ, Kundu R, Fischbein M, Cheng P, Wirka R, Quertermous T. Comprehensive Integration of Multiple Single-Cell Transcriptomic Data Sets Defines Distinct Cell Populations and Their Phenotypic Changes in Murine Atherosclerosis. Arterioscler Thromb Vasc Biol 2024; 44:391-408. [PMID: 38152886 PMCID: PMC11285358 DOI: 10.1161/atvbaha.123.320030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2023] [Accepted: 12/12/2023] [Indexed: 12/29/2023]
Abstract
BACKGROUND: The application of single-cell transcriptomic (single-cell RNA sequencing) analysis to the study of atherosclerosis has provided unique insights into the molecular and genetic mechanisms that mediate disease risk and pathophysiology. However, nonstandardized methodologies and relatively high costs associated with the technique have limited the size and replication of existing data sets and created disparate or contradictory findings that have fostered misunderstanding and controversy. METHODS: To address these uncertainties, we have performed a conservative integration of multiple published single-cell RNA sequencing data sets into a single meta-analysis, performed extended analysis of native resident vascular cells, and used in situ hybridization to map the disease anatomic location of the identified cluster cells. To investigate the transdifferentiation of smooth muscle cells to macrophage phenotype, we have developed a classifying algorithm based on the quantification of reporter transgene expression. RESULTS: The reporter gene expression tool indicates that within the experimental limits of the examined studies, transdifferentiation of smooth muscle cell to the macrophage lineage is extremely rare. Validated transition smooth muscle cell phenotypes were defined by clustering, and the location of these cells was mapped to lesion anatomy with in situ hybridization. We have also characterized 5 endothelial cell phenotypes and linked these cellular species to different vascular structures and functions. Finally, we have identified a transcriptomically unique cellular phenotype that constitutes the aortic valve. CONCLUSIONS: Taken together, these analyses resolve a number of outstanding issues related to differing results reported with vascular disease single-cell RNA sequencing studies, and significantly extend our understanding of the role of resident vascular cells in anatomy and disease.
Affiliation(s)
- Disha Sharma
- Division of Cardiovascular Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305
| | - Matthew DeForest Worssam
- Division of Cardiovascular Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305
| | - Albert J. Pedroza
- Division of Cardiothoracic Surgery, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305
| | - Alex R. Dalal
- Division of Cardiothoracic Surgery, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305
| | - Haizea Alemany
- Division of Cardiovascular Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305
| | - Hyun-Jung Kim
- Division of Cardiovascular Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305
| | | | - Michael Fischbein
- Division of Cardiothoracic Surgery, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305
| | - Paul Cheng
- Division of Cardiovascular Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305
| | - Robert Wirka
- Division of Cardiology, McAllister Heart Institute, UNC School of Medicine, 111 Mason Farm Road, MBRB 3312B, Chapel Hill, NC 27599-7126
| | - Thomas Quertermous
- Division of Cardiovascular Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305
| |
|
121
|
Pascual-Reguant A, Kroh S, Hauser AE. Tissue niches and immunopathology through the lens of spatial tissue profiling techniques. Eur J Immunol 2024; 54:e2350484. [PMID: 37985207 DOI: 10.1002/eji.202350484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 11/13/2023] [Accepted: 11/15/2023] [Indexed: 11/22/2023]
Abstract
Spatial organization plays a fundamental role in biology, influencing the function of biological structures at various levels. The immune system, in particular, relies on the orchestrated interactions of immune cells with their microenvironment to mount protective or pathogenic immune responses. The COVID-19 pandemic has underscored the significance of studying immunity within target organs to understand disease progression and severity. To achieve this, multiplex histology and spatial transcriptomics have proven indispensable in providing a spatial context to protein and gene expression patterns. By combining these techniques, researchers gain a more comprehensive understanding of the complex interactions at the cellular and molecular level in distinct tissue niches, key functional units modulating health and disease. In this review, we discuss recent advances in spatial tissue profiling techniques, highlighting their advantages over traditional histopathology studies. The insights gained from these approaches have the potential to revolutionize the diagnosis and treatment of various diseases including cancer, autoimmune disorders, and infectious diseases. However, we also acknowledge their challenges and limitations. Despite these, spatial tissue profiling offers promising opportunities to improve our understanding of how tissue niches direct regional immunity, and their relevance in tissue immunopathology, as a basis for novel therapeutic strategies and personalized medicine.
Affiliation(s)
- Anna Pascual-Reguant
- Department of Rheumatology and Clinical Immunology, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Immune Dynamics, Deutsches Rheuma-Forschungszentrum (DRFZ), Leibniz Institute, Berlin, Germany
- Spatial Genomics, Centre Nacional d'Anàlisi Genòmica, Barcelona, 08028, Spain
| | - Sandy Kroh
- Immune Dynamics, Deutsches Rheuma-Forschungszentrum (DRFZ), Leibniz Institute, Berlin, Germany
| | - Anja E Hauser
- Department of Rheumatology and Clinical Immunology, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Immune Dynamics, Deutsches Rheuma-Forschungszentrum (DRFZ), Leibniz Institute, Berlin, Germany
| |
|
122
|
Ghazanfar S, Guibentif C, Marioni JC. Stabilized mosaic single-cell data integration using unshared features. Nat Biotechnol 2024; 42:284-292. [PMID: 37231260 PMCID: PMC10869270 DOI: 10.1038/s41587-023-01766-z] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 02/10/2022] [Accepted: 03/28/2023] [Indexed: 05/27/2023]
Abstract
Currently available single-cell omics technologies capture many unique features with different biological information content. Data integration aims to place cells, captured with different technologies, onto a common embedding to facilitate downstream analytical tasks. Current horizontal data integration techniques use a set of common features, thereby ignoring non-overlapping features and losing information. Here we introduce StabMap, a mosaic data integration technique that stabilizes mapping of single-cell data by exploiting the non-overlapping features. StabMap first infers a mosaic data topology based on shared features, then projects all cells onto supervised or unsupervised reference coordinates by traversing shortest paths along the topology. We show that StabMap performs well in various simulation contexts, facilitates 'multi-hop' mosaic data integration where some datasets do not share any features and enables the use of spatial gene expression features for mapping dissociated single-cell data onto a spatial transcriptomic reference.
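The two-step idea in this abstract (infer a topology from shared features, then reach the reference by shortest paths along it) can be illustrated with a toy graph search. The sketch below is not StabMap's implementation: the dataset names, the feature sets, and the helpers `mosaic_topology` and `path_to_reference` are hypothetical, and plain breadth-first search stands in for whatever path-finding and projection the package actually performs.

```python
from collections import deque
from itertools import combinations


def mosaic_topology(features_by_dataset):
    """Connect two datasets whenever they share at least one feature."""
    graph = {name: set() for name in features_by_dataset}
    for a, b in combinations(features_by_dataset, 2):
        if features_by_dataset[a] & features_by_dataset[b]:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def path_to_reference(graph, start, reference):
    """Breadth-first search for a fewest-hops path from start to reference."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == reference:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # no chain of shared features reaches the reference


# Hypothetical toy datasets: "atac" and "rna" have no feature in common.
features = {
    "rna": {"g1", "g2", "g3"},
    "multiome": {"g3", "p1", "p2"},
    "atac": {"p2", "p3"},
}
graph = mosaic_topology(features)
route = path_to_reference(graph, "atac", "rna")
print(route)
```

In this toy case `atac` can only reach the `rna` reference through `multiome`, which is the "multi-hop" situation the abstract mentions.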
Affiliation(s)
- Shila Ghazanfar
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK.
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK.
- School of Mathematics and Statistics, The University of Sydney, Camperdown, New South Wales, Australia.
- Charles Perkins Centre, The University of Sydney, Camperdown, New South Wales, Australia.
| | - Carolina Guibentif
- Sahlgrenska Center for Cancer Research, Inst. Biomedicine, Dept. Microbiology and Immunology, University of Gothenburg, Gothenburg, Sweden
| | - John C Marioni
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK.
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK.
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK.
| |
|
123
|
Zhang Z, Zhao X, Bindra M, Qiu P, Zhang X. scDisInFact: disentangled learning for integration and prediction of multi-batch multi-condition single-cell RNA-sequencing data. Nat Commun 2024; 15:912. [PMID: 38291052 PMCID: PMC10827746 DOI: 10.1038/s41467-024-45227-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Accepted: 01/18/2024] [Indexed: 02/01/2024]
Abstract
Single-cell RNA-sequencing (scRNA-seq) has been widely used for disease studies, where sample batches are collected from donors under different conditions including demographic groups, disease stages, and drug treatments. It is worth noting that the differences among sample batches in such a study are a mixture of technical confounders caused by batch effect and biological variations caused by condition effect. However, current batch effect removal methods often eliminate both technical batch effect and meaningful condition effect, while perturbation prediction methods solely focus on condition effect, resulting in inaccurate gene expression predictions due to unaccounted batch effect. Here we introduce scDisInFact, a deep learning framework that models both batch effect and condition effect in scRNA-seq data. scDisInFact learns latent factors that disentangle condition effect from batch effect, enabling it to simultaneously perform three tasks: batch effect removal, condition-associated key gene detection, and perturbation prediction. We evaluate scDisInFact on both simulated and real datasets, and compare its performance with baseline methods for each task. Our results demonstrate that scDisInFact outperforms existing methods that focus on individual tasks, providing a more comprehensive and accurate approach for integrating and predicting multi-batch multi-condition single-cell RNA-sequencing data.
Affiliation(s)
- Ziqi Zhang
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA
| | - Xinye Zhao
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA
| | - Mehak Bindra
- School of Biological Science, Georgia Institute of Technology, Atlanta, GA, USA
| | - Peng Qiu
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
| | - Xiuwei Zhang
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA.
| |
|
124
|
Andreatta M, Hérault L, Gueguen P, Gfeller D, Berenstein AJ, Carmona SJ. Semi-supervised integration of single-cell transcriptomics data. Nat Commun 2024; 15:872. [PMID: 38287014 PMCID: PMC10825117 DOI: 10.1038/s41467-024-45240-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 01/16/2024] [Indexed: 01/31/2024]
Abstract
Batch effects in single-cell RNA-seq data pose a significant challenge for comparative analyses across samples, individuals, and conditions. Although batch effect correction methods are routinely applied, data integration often leads to overcorrection and can result in the loss of biological variability. In this work we present STACAS, a batch correction method for scRNA-seq that leverages prior knowledge on cell types to preserve biological variability upon integration. Through an open-source benchmark, we show that semi-supervised STACAS outperforms state-of-the-art unsupervised methods, as well as supervised methods such as scANVI and scGen. STACAS scales well to large datasets and is robust to incomplete and imprecise input cell type labels, which are commonly encountered in real-life integration tasks. We argue that the incorporation of prior cell type information should be a common practice in single-cell data integration, and we provide a flexible framework for semi-supervised batch effect correction.
Affiliation(s)
- Massimo Andreatta
- Department of Oncology, Lausanne Branch, Ludwig Institute for Cancer Research, CHUV and University of Lausanne, 1011, Lausanne, Switzerland
- AGORA Cancer Research Center, 1005, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Léonard Hérault
- Department of Oncology, Lausanne Branch, Ludwig Institute for Cancer Research, CHUV and University of Lausanne, 1011, Lausanne, Switzerland
- AGORA Cancer Research Center, 1005, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Paul Gueguen
- Department of Oncology, Lausanne Branch, Ludwig Institute for Cancer Research, CHUV and University of Lausanne, 1011, Lausanne, Switzerland
- AGORA Cancer Research Center, 1005, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - David Gfeller
- Department of Oncology, Lausanne Branch, Ludwig Institute for Cancer Research, CHUV and University of Lausanne, 1011, Lausanne, Switzerland
- AGORA Cancer Research Center, 1005, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Ariel J Berenstein
- Laboratorio de Biología Molecular, División Patología, Instituto Multidisciplinario de Investigaciones en Patologías Pediátricas (IMIPP), CONICET-GCBA, Buenos Aires, C1425EFD, Argentina
| | - Santiago J Carmona
- Department of Oncology, Lausanne Branch, Ludwig Institute for Cancer Research, CHUV and University of Lausanne, 1011, Lausanne, Switzerland.
- AGORA Cancer Research Center, 1005, Lausanne, Switzerland.
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland.
| |
|
125
|
Piran Z, Nitzan M. SiFT: uncovering hidden biological processes by probabilistic filtering of single-cell data. Nat Commun 2024; 15:760. [PMID: 38278815 PMCID: PMC10817921 DOI: 10.1038/s41467-024-44757-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Accepted: 01/03/2024] [Indexed: 01/28/2024]
Abstract
Cellular populations simultaneously encode multiple biological attributes, including spatial configuration, temporal trajectories, and cell-cell interactions. Some of these signals may be overshadowed by others and harder to recover, despite the great progress made to computationally reconstruct biological processes from single-cell data. To address this, we present SiFT, a kernel-based projection method for filtering biological signals in single-cell data, thus uncovering underlying biological processes. SiFT applies to a wide range of tasks, from the removal of unwanted variation in the data to revealing hidden biological structures. We demonstrate how SiFT enhances the liver circadian signal by filtering spatial zonation, recovers regenerative cell subpopulations in spatially-resolved liver data, and exposes COVID-19 disease-related cells, pathways, and dynamics by filtering healthy reference signals. SiFT performs the correction at the gene expression level, can scale to large datasets, and compares favorably to state-of-the-art methods.
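One way to picture a kernel-based projection filter is to build a cell-cell similarity kernel on the signal to be removed and subtract the kernel-weighted average from each cell's expression. The sketch below makes that assumption explicit and is not SiFT's code: the Gaussian kernel, the `bandwidth` parameter, the helper names `rbf_kernel` and `filter_signal`, and the toy values are all hypothetical.

```python
import math


def rbf_kernel(covariate, bandwidth=1.0):
    """Row-normalised Gaussian kernel between cells, built on the nuisance covariate."""
    rows = []
    for a in covariate:
        weights = [math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2)) for b in covariate]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def filter_signal(expression, kernel):
    """Subtract the kernel projection of each gene from the observed expression."""
    n_cells, n_genes = len(expression), len(expression[0])
    filtered = []
    for i in range(n_cells):
        projected = [
            sum(kernel[i][j] * expression[j][g] for j in range(n_cells))
            for g in range(n_genes)
        ]
        filtered.append([expression[i][g] - projected[g] for g in range(n_genes)])
    return filtered


# Hypothetical toy data: 3 cells x 2 genes, plus one covariate value per cell
# (for example a spatial coordinate whose signal should be filtered out).
expression = [[5.0, 1.0], [5.0, 2.0], [5.0, 9.0]]
covariate = [0.0, 0.1, 5.0]

kernel = rbf_kernel(covariate)
filtered = filter_signal(expression, kernel)
```

Because each kernel row sums to one, a gene that is constant across cells (the first column here) is filtered to zero, while variation between cells that sit close together on the covariate is retained.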
Affiliation(s)
- Zoe Piran
- School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel
| | - Mor Nitzan
- School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel.
- Racah Institute of Physics, The Hebrew University, Jerusalem, Israel.
- Faculty of Medicine, The Hebrew University, Jerusalem, Israel.
| |
|
126
|
He Z, Hu S, Chen Y, An S, Zhou J, Liu R, Shi J, Wang J, Dong G, Shi J, Zhao J, Ou-Yang L, Zhu Y, Bo X, Ying X. Mosaic integration and knowledge transfer of single-cell multimodal data with MIDAS. Nat Biotechnol 2024:10.1038/s41587-023-02040-y. [PMID: 38263515 DOI: 10.1038/s41587-023-02040-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Accepted: 10/23/2023] [Indexed: 01/25/2024]
Abstract
Integrating single-cell datasets produced by multiple omics technologies is essential for defining cellular heterogeneity. Mosaic integration, in which different datasets share only some of the measured modalities, poses major challenges, particularly regarding modality alignment and batch effect removal. Here, we present a deep probabilistic framework for the mosaic integration and knowledge transfer (MIDAS) of single-cell multimodal data. MIDAS simultaneously achieves dimensionality reduction, imputation and batch correction of mosaic data by using self-supervised modality alignment and information-theoretic latent disentanglement. We demonstrate its superiority to 19 other methods and reliability by evaluating its performance in trimodal and mosaic integration tasks. We also constructed a single-cell trimodal atlas of human peripheral blood mononuclear cells and tailored transfer learning and reciprocal reference mapping schemes to enable flexible and accurate knowledge transfer from the atlas to new data. Applications in mosaic integration, pseudotime analysis and cross-tissue knowledge transfer on bone marrow mosaic datasets demonstrate the versatility and superiority of MIDAS. MIDAS is available at https://github.com/labomics/midas .
Affiliation(s)
- Zhen He
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Shuofeng Hu
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Yaowen Chen
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Sijing An
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Jiahao Zhou
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
| | - Runyan Liu
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Junfeng Shi
- School of Automation, China University of Geosciences, Wuhan, China
| | - Jing Wang
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Guohua Dong
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Jinhui Shi
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Jiaxin Zhao
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Le Ou-Yang
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
| | - Yuan Zhu
- School of Automation, China University of Geosciences, Wuhan, China
| | - Xiaochen Bo
- Institute of Health Service and Transfusion Medicine, Beijing, China.
| | - Xiaomin Ying
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China.
| |
|
127
|
Wan H, Yuan M, Fu Y, Deng M. Continually adapting pre-trained language model to universal annotation of single-cell RNA-seq data. Brief Bioinform 2024; 25:bbae047. [PMID: 38388681 PMCID: PMC10883808 DOI: 10.1093/bib/bbae047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Revised: 12/29/2023] [Accepted: 01/18/2024] [Indexed: 02/24/2024]
Abstract
MOTIVATION Cell-type annotation of single-cell RNA-sequencing (scRNA-seq) data is a hallmark of biomedical research and clinical application. Current annotation tools usually assume the simultaneous acquisition of well-annotated data, but without the ability to expand knowledge from new data. Yet, such tools are inconsistent with the continuous emergence of scRNA-seq data, calling for a continuous cell-type annotation model. In addition, by their powerful ability of information integration and model interpretability, transformer-based pre-trained language models have led to breakthroughs in single-cell biology research. Therefore, the systematic combining of continual learning and pre-trained language models for cell-type annotation tasks is inevitable. RESULTS We herein propose a universal cell-type annotation tool, called CANAL, that continuously fine-tunes a pre-trained language model trained on a large amount of unlabeled scRNA-seq data, as new well-labeled data emerges. CANAL essentially alleviates the dilemma of catastrophic forgetting, both in terms of model inputs and outputs. For model inputs, we introduce an experience replay schema that repeatedly reviews previous vital examples in current training stages. This is achieved through a dynamic example bank with a fixed buffer size. The example bank is class-balanced and proficient in retaining cell-type-specific information, particularly facilitating the consolidation of patterns associated with rare cell types. For model outputs, we utilize representation knowledge distillation to regularize the divergence between previous and current models, resulting in the preservation of knowledge learned from past training stages. Moreover, our universal annotation framework considers the inclusion of new cell types throughout the fine-tuning and testing stages. 
We can continuously expand the cell-type annotation library by absorbing new cell types from newly arrived, well-annotated training datasets, as well as automatically identify novel cells in unlabeled datasets. Comprehensive experiments with data streams under various biological scenarios demonstrate the versatility and high model interpretability of CANAL. AVAILABILITY An implementation of CANAL is available from https://github.com/aster-ww/CANAL-torch. CONTACT dengmh@pku.edu.cn. SUPPLEMENTARY INFORMATION Supplementary data are available at Journal Name online.
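The example bank described in this abstract (a fixed buffer size, kept class-balanced across training stages) can be sketched as a small replay buffer. This is an illustration under stated assumptions, not CANAL's code: the class `ExampleBank`, its methods, and the cell identifiers are hypothetical, and random sampling stands in for the selection of "vital" examples, which the abstract does not specify.

```python
import random
from collections import defaultdict


class ExampleBank:
    """Fixed-size, class-balanced replay buffer (illustrative only)."""

    def __init__(self, buffer_size, seed=0):
        self.buffer_size = buffer_size
        self.by_class = defaultdict(list)
        self.rng = random.Random(seed)

    def update(self, examples):
        """Add (cell_id, cell_type) pairs, then trim every class to an equal quota."""
        for cell_id, cell_type in examples:
            self.by_class[cell_type].append(cell_id)
        if not self.by_class:
            return
        quota = self.buffer_size // len(self.by_class)
        if quota == 0:
            raise ValueError("buffer_size must be at least the number of cell types")
        for cell_type in list(self.by_class):
            cells = self.by_class[cell_type]
            if len(cells) > quota:
                # Assumption: random choice replaces the paper's example selection.
                self.by_class[cell_type] = self.rng.sample(cells, quota)

    def replay(self):
        """Return the stored examples to mix into the current training stage."""
        return [(c, t) for t, cells in self.by_class.items() for c in cells]


bank = ExampleBank(buffer_size=6)
# Stage 1: an abundant cell type and a rare one.
bank.update([(f"t{i}", "T cell") for i in range(10)] + [("d0", "dendritic cell")])
# Stage 2: a new cell type arrives, so the per-class quota shrinks.
bank.update([(f"b{i}", "B cell") for i in range(5)])
counts = {cell_type: len(cells) for cell_type, cells in bank.by_class.items()}
```

The rare dendritic cell is never evicted because it stays under the per-class quota, which reflects the abstract's point about retaining rare cell types.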
Affiliation(s)
- Hui Wan
- School of Mathematical Sciences, Peking University, Beijing, China, 100871
| | - Musu Yuan
- Center for Quantitative Biology, Peking University, Beijing, China, 100871
| | - Yiwei Fu
- School of Mathematical Sciences, Peking University, Beijing, China, 100871
| | - Minghua Deng
- School of Mathematical Sciences, Peking University, Beijing, China, 100871
- Center for Quantitative Biology, Peking University, Beijing, China, 100871
- Center for Statistical Science, Peking University, Beijing, China, 100871
| |
|
128
|
Xiao C, Chen Y, Meng Q, Wei L, Zhang X. Benchmarking multi-omics integration algorithms across single-cell RNA and ATAC data. Brief Bioinform 2024; 25:bbae095. [PMID: 38493343 PMCID: PMC10944570 DOI: 10.1093/bib/bbae095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 01/30/2024] [Accepted: 02/16/2024] [Indexed: 03/18/2024]
Abstract
Recent advancements in single-cell sequencing technologies have generated extensive omics data in various modalities and revolutionized cell research, especially in the single-cell RNA and ATAC data. The joint analysis across scRNA-seq data and scATAC-seq data has paved the way to comprehending the cellular heterogeneity and complex cellular regulatory networks. Multi-omics integration is gaining attention as an important step in joint analysis, and the number of computational tools in this field is growing rapidly. In this paper, we benchmarked 12 multi-omics integration methods on three integration tasks via qualitative visualization and quantitative metrics, considering six main aspects that matter in multi-omics data analysis. Overall, we found that different methods have their own advantages on different aspects, while some methods outperformed other methods in most aspects. We therefore provided guidelines for selecting appropriate methods for specific scenarios and tasks to help obtain meaningful insights from multi-omics data integration.
Affiliation(s)
- Chuxi Xiao
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Yixin Chen
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Qiuchen Meng
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Lei Wei
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Xuegong Zhang
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
- School of Life Sciences and School of Medicine, Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China
| |
|
129
|
Guo ZH, Wang YB, Wang S, Zhang Q, Huang DS. scCorrector: a robust method for integrating multi-study single-cell data. Brief Bioinform 2024; 25:bbad525. [PMID: 38271483 PMCID: PMC10810333 DOI: 10.1093/bib/bbad525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 11/12/2023] [Accepted: 12/19/2023] [Indexed: 01/27/2024]
Abstract
The advent of single-cell sequencing technologies has revolutionized cell biology studies. However, integrative analyses of diverse single-cell data face serious challenges, including technological noise, sample heterogeneity, and different modalities and species. To address these problems, we propose scCorrector, a variational autoencoder-based model that can integrate single-cell data from different studies and map them into a common space. Specifically, we designed a Study Specific Adaptive Normalization for each study in the decoder to implement these features. scCorrector substantially achieves competitive and robust performance compared with state-of-the-art methods and brings novel insights under various circumstances (e.g. various batches, multi-omics, cross-species, and development stages). In addition, the integration of single-cell data and spatial data makes it possible to transfer information between different studies, which greatly expands the narrow range of genes covered by MERFISH technology. In summary, scCorrector can efficiently integrate multi-study single-cell datasets, thereby providing broad opportunities to tackle challenges emerging from noisy resources.
Affiliation(s)
- Zhen-Hao Guo
- College of Electronics and Information Engineering, Tongji University, Shanghai 200000, China
| | - Yan-Bin Wang
- College of Computer Science and Technology, Zhejiang University 310027, China
| | - Siguo Wang
- Eastern Institute for Advanced Study, Eastern Institute of Technology, Tongxin Road No.568, Ningbo, Zhejiang 315201, China
| | - Qinhu Zhang
- Eastern Institute for Advanced Study, Eastern Institute of Technology, Tongxin Road No.568, Ningbo, Zhejiang 315201, China
| | - De-Shuang Huang
- Eastern Institute for Advanced Study, Eastern Institute of Technology, Tongxin Road No.568, Ningbo, Zhejiang 315201, China
| |
|
130
|
Kousnetsov R, Bourque J, Surnov A, Fallahee I, Hawiger D. Single-cell sequencing analysis within biologically relevant dimensions. Cell Syst 2024; 15:83-103.e11. [PMID: 38198894 DOI: 10.1016/j.cels.2023.12.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Revised: 05/23/2023] [Accepted: 12/14/2023] [Indexed: 01/12/2024]
Abstract
The currently predominant approach to transcriptomic and epigenomic single-cell analysis depends on a rigid perspective constrained by reduced dimensions and algorithmically derived and annotated clusters. Here, we developed Seqtometry (sequencing-to-measurement), a single-cell analytical strategy based on biologically relevant dimensions enabled by advanced scoring with multiple gene sets (signatures) for examination of gene expression and accessibility across various organ systems. By utilizing information only in the form of specific signatures, Seqtometry bypasses unsupervised clustering and individual annotations of clusters. Instead, Seqtometry combines qualitative and quantitative cell-type identification with specific characterization of diverse biological processes under experimental or disease conditions. Comprehensive analysis by Seqtometry of various immune cells as well as other cells from different organs and disease-induced states, including multiple myeloma and Alzheimer's disease, surpasses corresponding cluster-based analytical output. We propose Seqtometry as a single-cell sequencing analysis approach applicable for both basic and clinical research.
Affiliation(s)
- Robert Kousnetsov
- Department of Molecular Microbiology and Immunology, Saint Louis University School of Medicine, St. Louis, MO, USA
| | - Jessica Bourque
- Department of Molecular Microbiology and Immunology, Saint Louis University School of Medicine, St. Louis, MO, USA
| | - Alexey Surnov
- Department of Molecular Microbiology and Immunology, Saint Louis University School of Medicine, St. Louis, MO, USA
| | - Ian Fallahee
- Department of Molecular Microbiology and Immunology, Saint Louis University School of Medicine, St. Louis, MO, USA
| | - Daniel Hawiger
- Department of Molecular Microbiology and Immunology, Saint Louis University School of Medicine, St. Louis, MO, USA.
| |
|
131
|
Shahir JA, Stanley N, Purvis JE. Cellograph: a semi-supervised approach to analyzing multi-condition single-cell RNA-sequencing data using graph neural networks. BMC Bioinformatics 2024; 25:25. [PMID: 38221640 PMCID: PMC10788980 DOI: 10.1186/s12859-024-05641-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 01/04/2024] [Indexed: 01/16/2024]
Abstract
With the growing number of single-cell datasets collected under more complex experimental conditions, there is an opportunity to leverage single-cell variability to reveal deeper insights into how cells respond to perturbations. Many existing approaches rely on discretizing the data into clusters for differential gene expression (DGE), effectively ironing out any information unveiled by the single-cell variability across cell-types. In addition, DGE often assumes a statistical distribution that, if erroneous, can lead to false positive differentially expressed genes. Here, we present Cellograph: a semi-supervised framework that uses graph neural networks to quantify the effects of perturbations at single-cell granularity. Cellograph not only measures how prototypical cells are of each condition but also learns a latent space that is amenable to interpretable data visualization and clustering. The learned gene weight matrix from training reveals pertinent genes driving the differences between conditions. We demonstrate the utility of our approach on publicly-available datasets including cancer drug therapy, stem cell reprogramming, and organoid differentiation. Cellograph outperforms existing methods for quantifying the effects of experimental perturbations and offers a novel framework to analyze single-cell data using deep learning.
Affiliation(s)
- Jamshaid A Shahir
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Natalie Stanley
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Jeremy E Purvis
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
- Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
| |
|
132
|
Gonzalez G, Herath I, Veselkov K, Bronstein M, Zitnik M. Combinatorial prediction of therapeutic perturbations using causally-inspired neural networks. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.03.573985. [PMID: 38260532 PMCID: PMC10802439 DOI: 10.1101/2024.01.03.573985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
As an alternative to target-driven drug discovery, phenotype-driven approaches identify compounds that counteract the overall disease effects by analyzing phenotypic signatures. Our study introduces a novel approach to this field, aiming to expand the search space for new therapeutic agents. We introduce PDGrapher, a causally-inspired graph neural network model designed to predict arbitrary perturbagens - sets of therapeutic targets - capable of reversing disease effects. Unlike existing methods that learn responses to perturbations, PDGrapher solves the inverse problem, which is to infer the perturbagens necessary to achieve a specific response - i.e., directly predicting perturbagens by learning which perturbations elicit a desired response. Experiments across eight datasets of genetic and chemical perturbations show that PDGrapher successfully predicted effective perturbagens in up to 9% additional test samples and ranked therapeutic targets up to 35% higher than competing methods. A key innovation of PDGrapher is its direct prediction capability, which contrasts with the indirect, computationally intensive models traditionally used in phenotype-driven drug discovery that only predict changes in phenotypes due to perturbations. The direct approach enables PDGrapher to train up to 30 times faster, representing a significant leap in efficiency. Our results suggest that PDGrapher can advance phenotype-driven drug discovery, offering a fast and comprehensive approach to identifying therapeutically useful perturbations.
Affiliation(s)
- Guadalupe Gonzalez
- Imperial College London, London, UK
- Prescient Design, Genentech, South San Francisco, CA, USA
- F. Hoffmann-La Roche Ltd, Basel, Switzerland
| | - Isuru Herath
- Merck & Co., South San Francisco, CA, USA
- Cornell University, Ithaca, NY, USA
| | | | | | - Marinka Zitnik
- Harvard Medical School, Boston, MA, USA
- Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Data Science Initiative, Cambridge, MA, USA
| |
|
133
|
Sagar. Unraveling the secrets of γδ T cells with single-cell biology. J Leukoc Biol 2024; 115:47-56. [PMID: 38073484 DOI: 10.1093/jleuko/qiad131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2023] [Revised: 09/16/2023] [Accepted: 09/28/2023] [Indexed: 01/07/2024] Open
Abstract
Recent technological advancements have enabled us to study the molecular features of cellular states at the single-cell level, providing unprecedented resolution for comprehending the identity and function of a cell. By applying these techniques across multiple time frames, tissues, and diseases, we can delve deeper into the mechanisms governing the development and functions of cell lineages. In this review, I focus on γδ T cells, which are a unique and functionally nonredundant T cell lineage categorized under the umbrella of unconventional T cells. I discuss how single-cell biology is providing unique insights into their development and functions. Furthermore, I explore how single-cell methods can be used to answer several key questions about their biology. These investigations will be essential to fully understand their translational potential, including their role in cytotoxicity and tissue repair in cancer and regeneration.
Affiliation(s)
- Sagar
- Department of Medicine II (Gastroenterology, Hepatology, Endocrinology, and Infectious Diseases), University Medical Center Freiburg, Faculty of Medicine, University of Freiburg, Hugstetterstraße 55, Freiburg 79106, Germany
| |
|
134
|
Heryanto YD, Zhang YZ, Imoto S. Predicting cell types with supervised contrastive learning on cells and their types. Sci Rep 2024; 14:430. [PMID: 38172501 PMCID: PMC10764802 DOI: 10.1038/s41598-023-50185-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2023] [Accepted: 12/16/2023] [Indexed: 01/05/2024] Open
Abstract
Single-cell RNA-sequencing (scRNA-seq) is a powerful technique that provides high-resolution expression profiling of individual cells. It significantly advances our understanding of cellular diversity and function. Despite its potential, the analysis of scRNA-seq data poses considerable challenges related to multicollinearity, data imbalance, and batch effect. One of the pivotal tasks in single-cell data analysis is cell type annotation, which classifies cells into discrete types based on their gene expression profiles. In this work, we propose a novel modeling formalism for cell type annotation with a supervised contrastive learning method, named SCLSC (Supervised Contrastive Learning for Single Cell). Unlike previous uses of contrastive learning in single-cell data analysis, we apply contrastive learning to instance-type pairs instead of instance-instance pairs. More specifically, in the cell type annotation task, contrastive learning is applied to learn cell and cell type representations that cluster cells of the same type together in the new embedding space. Through this approach, the knowledge derived from annotated cells is transferred to the feature representation for scRNA-seq data. The whole training process becomes more efficient when conducting contrastive learning for cells and their types. Our experimental results demonstrate that the proposed SCLSC method consistently achieves superior accuracy in predicting cell types compared to five state-of-the-art methods. SCLSC also performs well in identifying cell types in different batch groups. The simplicity of our method allows for scalability, making it suitable for analyzing datasets with a large number of cells. In a real-world application of SCLSC to monitor the dynamics of immune cell subpopulations over time, SCLSC demonstrates a capability to discriminate cell subtypes of CD19+ B cells that were not present in the training dataset.
Affiliation(s)
- Yusri Dwi Heryanto
- The Institute of Medical Science, The University of Tokyo, Tokyo, 108-8639, Japan
| | - Yao-Zhong Zhang
- The Institute of Medical Science, The University of Tokyo, Tokyo, 108-8639, Japan.
| | - Seiya Imoto
- The Institute of Medical Science, The University of Tokyo, Tokyo, 108-8639, Japan.
| |
|
135
|
Zhang C, Liu L, Zhang Y, Li M, Fang S, Kang Q, Chen A, Xu X, Zhang Y, Li Y. spatiAlign: an unsupervised contrastive learning model for data integration of spatially resolved transcriptomics. Gigascience 2024; 13:giae042. [PMID: 39028588 PMCID: PMC11258913 DOI: 10.1093/gigascience/giae042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Revised: 01/08/2024] [Accepted: 06/21/2024] [Indexed: 07/21/2024] Open
Abstract
BACKGROUND: Integrative analysis of spatially resolved transcriptomics datasets empowers a deeper understanding of complex biological systems. However, integrating multiple tissue sections presents challenges for batch effect removal, particularly when the sections are measured by various technologies or collected at different times. FINDINGS: We propose spatiAlign, an unsupervised contrastive learning model that employs the expression of all measured genes and the spatial location of cells, to integrate multiple tissue sections. It enables the joint downstream analysis of multiple datasets not only in low-dimensional embeddings but also in the reconstructed full expression space. CONCLUSIONS: In benchmarking analysis, spatiAlign outperforms state-of-the-art methods in learning joint and discriminative representations for tissue sections, each potentially characterized by complex batch effects or distinct biological characteristics. Furthermore, we demonstrate the benefits of spatiAlign for the integrative analysis of time-series brain sections, including spatial clustering, differential expression analysis, and particularly trajectory inference that requires a corrected gene expression matrix.
Affiliation(s)
| | - Lin Liu
- BGI Research, Shenzhen 518083, China
| | | | - Mei Li
- BGI Research, Shenzhen 518083, China
| | - Shuangsang Fang
- BGI Research, Shenzhen 518083, China
- BGI Research, Beijing 102601, China
| | | | - Ao Chen
- BGI Research, Shenzhen 518083, China
- BGI Research, Chongqing 401329, China
| | - Xun Xu
- BGI Research, Wuhan 430074, China
| | - Yong Zhang
- BGI Research, Shenzhen 518083, China
- BGI Research, Wuhan 430074, China
- Guangdong Bigdata Engineering Technology Research Center for Life Sciences, BGI Research, Shenzhen 518083, China
| | - Yuxiang Li
- BGI Research, Shenzhen 518083, China
- BGI Research, Wuhan 430074, China
- Guangdong Bigdata Engineering Technology Research Center for Life Sciences, BGI Research, Shenzhen 518083, China
| |
|
136
|
Jurado MR, Tombor LS, Arsalan M, Holubec T, Emrich F, Walther T, Abplanalp W, Fischer A, Zeiher AM, Schulz MH, Dimmeler S, John D. Improved integration of single-cell transcriptome data demonstrates common and unique signatures of heart failure in mice and humans. Gigascience 2024; 13:giae011. [PMID: 38573186 PMCID: PMC10993718 DOI: 10.1093/gigascience/giae011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 01/17/2024] [Accepted: 03/06/2024] [Indexed: 04/05/2024] Open
Abstract
BACKGROUND: Cardiovascular research heavily relies on mouse (Mus musculus) models to study disease mechanisms and to test novel biomarkers and medications. Yet, applying these results to patients remains a major challenge and often results in ineffective drugs. Therefore, it is an open challenge of translational science to develop models with high similarities and predictive value. This requires a comparison of disease models in mice with diseased tissue derived from humans. RESULTS: To compare the transcriptional signatures at single-cell resolution, we implemented an integration pipeline called OrthoIntegrate, which uniquely assigns orthologs and thereby merges single-cell RNA sequencing (scRNA-seq) data from different species. The pipeline has been designed to be easy to use and is fully integrable in the standard Seurat workflow. We applied OrthoIntegrate to scRNA-seq data from cardiac tissue of heart failure patients with reduced ejection fraction (HFrEF) and scRNA-seq data from mice after chronic infarction, which is a commonly used mouse model to mimic HFrEF. We discovered shared and distinct regulatory pathways between human HFrEF patients and the corresponding mouse model. Overall, 54% of genes were commonly regulated, including major changes in cardiomyocyte energy metabolism. However, several regulatory pathways (e.g., angiogenesis) were specifically regulated in humans. CONCLUSIONS: The demonstration of unique pathways occurring in humans indicates limitations on the comparability between mouse models and human HFrEF and shows that results from the mouse model should be validated carefully. OrthoIntegrate is publicly accessible (https://github.com/MarianoRuzJurado/OrthoIntegrate) and can be used to integrate other large datasets to provide a general comparison of models with patient data.
Affiliation(s)
- Mariano Ruz Jurado
- Institute of Cardiovascular Regeneration, Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
- German Centre for Cardiovascular Research (DZHK), 60590 Frankfurt am Main, Germany
- Cardio-Pulmonary Institute (CPI), Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
| | - Lukas S Tombor
- Institute of Cardiovascular Regeneration, Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
- German Centre for Cardiovascular Research (DZHK), 60590 Frankfurt am Main, Germany
| | - Mani Arsalan
- Department of Cardiovascular Surgery, Goethe University Hospital, 60590 Frankfurt am Main, Germany
| | - Tomas Holubec
- Department of Cardiovascular Surgery, Goethe University Hospital, 60590 Frankfurt am Main, Germany
| | - Fabian Emrich
- Department of Cardiovascular Surgery, Goethe University Hospital, 60590 Frankfurt am Main, Germany
| | - Thomas Walther
- German Centre for Cardiovascular Research (DZHK), 60590 Frankfurt am Main, Germany
- Cardio-Pulmonary Institute (CPI), Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
- Department of Cardiovascular Surgery, Goethe University Hospital, 60590 Frankfurt am Main, Germany
| | - Wesley Abplanalp
- Institute of Cardiovascular Regeneration, Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
- German Centre for Cardiovascular Research (DZHK), 60590 Frankfurt am Main, Germany
- Cardio-Pulmonary Institute (CPI), Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
| | - Ariane Fischer
- Institute of Cardiovascular Regeneration, Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
| | - Andreas M Zeiher
- Institute of Cardiovascular Regeneration, Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
- German Centre for Cardiovascular Research (DZHK), 60590 Frankfurt am Main, Germany
- Cardio-Pulmonary Institute (CPI), Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
| | - Marcel H Schulz
- Institute of Cardiovascular Regeneration, Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
- German Centre for Cardiovascular Research (DZHK), 60590 Frankfurt am Main, Germany
- Cardio-Pulmonary Institute (CPI), Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
| | - Stefanie Dimmeler
- Institute of Cardiovascular Regeneration, Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
- German Centre for Cardiovascular Research (DZHK), 60590 Frankfurt am Main, Germany
- Cardio-Pulmonary Institute (CPI), Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
| | - David John
- Institute of Cardiovascular Regeneration, Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
- German Centre for Cardiovascular Research (DZHK), 60590 Frankfurt am Main, Germany
- Cardio-Pulmonary Institute (CPI), Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
| |
|
137
|
Varrone M, Tavernari D, Santamaria-Martínez A, Walsh LA, Ciriello G. CellCharter reveals spatial cell niches associated with tissue remodeling and cell plasticity. Nat Genet 2024; 56:74-84. [PMID: 38066188 DOI: 10.1038/s41588-023-01588-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/26/2023] [Accepted: 10/23/2023] [Indexed: 12/20/2023]
Abstract
Tissues are organized in cellular niches, the composition and interactions of which can be investigated using spatial omics technologies. However, systematic analyses of tissue composition are challenged by the scale and diversity of the data. Here we present CellCharter, an algorithmic framework to identify, characterize, and compare cellular niches in spatially resolved datasets. CellCharter outperformed existing approaches and effectively identified cellular niches across datasets generated using different technologies, and comprising hundreds of samples and millions of cells. In multiple human lung cancer cohorts, CellCharter uncovered a cellular niche composed of tumor-associated neutrophil and cancer cells expressing markers of hypoxia and cell migration. This cancer cell state was spatially segregated from more proliferative tumor cell clusters and was associated with tumor-associated neutrophil infiltration and poor prognosis in independent patient cohorts. Overall, CellCharter enables systematic analyses across data types and technologies to decode the link between spatial tissue architectures and cell plasticity.
Affiliation(s)
- Marco Varrone
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Cancer Center Léman, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Daniele Tavernari
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Cancer Center Léman, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Swiss Institute for Experimental Cancer Research, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Albert Santamaria-Martínez
- Swiss Cancer Center Léman, Lausanne, Switzerland
- Swiss Institute for Experimental Cancer Research, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Logan A Walsh
- Rosalind and Morris Goodman Cancer Institute, McGill University, Montreal, Quebec, Canada
- Department of Human Genetics, McGill University, Montreal, Quebec, Canada
| | - Giovanni Ciriello
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.
- Swiss Cancer Center Léman, Lausanne, Switzerland.
- Swiss Institute of Bioinformatics, Lausanne, Switzerland.
| |
|
138
|
Zhang T, Zhao F, Lin Y, Liu M, Zhou H, Cui F, Jin Y, Chen L, Sheng X. Integrated analysis of single-cell and bulk transcriptomics develops a robust neuroendocrine cell-intrinsic signature to predict prostate cancer progression. Theranostics 2024; 14:1065-1080. [PMID: 38250042 PMCID: PMC10797290 DOI: 10.7150/thno.92336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Accepted: 12/26/2023] [Indexed: 01/23/2024] Open
Abstract
Neuroendocrine prostate cancer (NEPC) typically implies severe lethality and limited treatment options. The precise identification of NEPC cells holds paramount significance for both research and clinical applications, yet a valid NEPC biomarker remains to be defined. Methods: Leveraging 11 published NE-related gene sets, 11 single-cell RNA-sequencing (scRNA-seq) cohorts, 15 bulk transcriptomic cohorts, and 13 experimental models of prostate cancer (PCa), we employed multiple advanced algorithms to construct and validate a robust NEPC risk prediction model. Results: Through the compilation of a comprehensive scRNA-seq reference atlas (comprising a total of 210,879 single cells, including 66 tumor samples) from 9 multicenter datasets of PCa, we observed inconsistent and inefficient performance among the 11 published NE gene sets. Therefore, we developed an integrative analysis pipeline, identifying 762 high-quality NE markers. Subsequently, we derived the NE cell-intrinsic gene signature, and developed an R package named NEPAL, to predict NEPC risk scores. When applied to multiple independent validation datasets, NEPAL consistently and accurately assigned NE features and delineated PCa progression. Intriguingly, NEPAL demonstrated predictive capabilities for prognosis and therapy responsiveness, as well as the identification of potential epigenetic drivers of NEPC. Conclusion: The present study furnishes a valuable tool for the identification of NEPC and the monitoring of PCa progression through transcriptomic profiles obtained from both bulk and single-cell sources.
Affiliation(s)
- Tingting Zhang
- Key Laboratory of Environmental Health, Ministry of Education & Ministry of Environmental Protection, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- School of Life and Health Sciences, Hainan University, Haikou, China
| | - Faming Zhao
- Key Laboratory of Environmental Health, Ministry of Education & Ministry of Environmental Protection, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- School of Life and Health Sciences, Hainan University, Haikou, China
| | - Yahang Lin
- Department of Neurology, Wuhan Fourth Hospital/Pu'ai Hospital, Wuhan, China
| | - Mingsheng Liu
- The Second Ward of Urology, Qujing Affiliated Hospital of Kunming Medical University, Qujing, China
| | - Hongqing Zhou
- The Second Ward of Urology, Qujing Affiliated Hospital of Kunming Medical University, Qujing, China
| | - Fengzhen Cui
- Key Laboratory of Environmental Health, Ministry of Education & Ministry of Environmental Protection, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- School of Life and Health Sciences, Hainan University, Haikou, China
| | - Yang Jin
- Institute for Cancer Genetics and Informatics, Oslo University Hospital, Oslo, Norway
| | - Liang Chen
- Department of Urology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Xia Sheng
- Key Laboratory of Environmental Health, Ministry of Education & Ministry of Environmental Protection, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- School of Life and Health Sciences, Hainan University, Haikou, China
| |
|
139
|
Martens LD, Fischer DS, Yépez VA, Theis FJ, Gagneur J. Modeling fragment counts improves single-cell ATAC-seq analysis. Nat Methods 2024; 21:28-31. [PMID: 38049697 PMCID: PMC10776385 DOI: 10.1038/s41592-023-02112-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/03/2022] [Accepted: 10/25/2023] [Indexed: 12/06/2023]
Abstract
Single-cell ATAC sequencing coverage in regulatory regions is typically binarized as an indicator of open chromatin. Here we show that binarization is an unnecessary step that does not improve goodness of fit, clustering, cell type identification, or batch integration. Fragment counts, but not read counts, should instead be modeled, which preserves quantitative regulatory information. These results have immediate implications for single-cell ATAC sequencing analysis.
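The contrast this abstract draws between binarized coverage and fragment counts can be illustrated with a minimal sketch. It is not the authors' implementation. The function names and the toy matrix are hypothetical, and the read-to-fragment conversion assumes paired-end data in which each fragment contributes at most two reads, so an odd read count is rounded up before halving.

```python
def binarize(counts):
    """Collapse every positive count to 1 (open) and keep 0 (closed)."""
    return [[1 if c > 0 else 0 for c in row] for row in counts]


def reads_to_fragments(read_counts):
    """Estimate fragment counts from read counts.

    Assumption (not taken from the abstract): each fragment yields up to
    two reads, so the estimate is ceil(reads / 2).
    """
    return [[(c + 1) // 2 for c in row] for row in read_counts]


# Toy cells-by-peaks read-count matrix (hypothetical values).
reads = [
    [0, 2, 4],
    [1, 0, 6],
]

binary = binarize(reads)
fragments = reads_to_fragments(reads)
print(binary)
print(fragments)
```

In the binarized matrix the peaks with 2, 4, and 6 reads become indistinguishable, whereas the fragment estimate keeps them apart, which is the quantitative information the abstract argues should be preserved.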
Affiliation(s)
- Laura D Martens
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany
- Helmholtz Association, Munich School for Data Science (MUDS), Munich, Germany
| | - David S Fischer
- Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
| | - Vicente A Yépez
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
| | - Fabian J Theis
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany.
- Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany.
- Helmholtz Association, Munich School for Data Science (MUDS), Munich, Germany.
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany.
| | - Julien Gagneur
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany.
- Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany.
- Helmholtz Association, Munich School for Data Science (MUDS), Munich, Germany.
- Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany.
| |
|
140
|
Danino R, Nachman I, Sharan R. Batch correction of single-cell sequencing data via an autoencoder architecture. BIOINFORMATICS ADVANCES 2023; 4:vbad186. [PMID: 38213820 PMCID: PMC10781938 DOI: 10.1093/bioadv/vbad186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 12/09/2023] [Accepted: 12/17/2023] [Indexed: 01/13/2024]
Abstract
Motivation: Technical differences between gene expression sequencing experiments can cause variations in the data in the form of batch effect biases. These do not represent true biological variations between samples and can lead to false conclusions or hinder the ability to integrate multiple datasets. Since there is a growing need for the joint analysis of single-cell sequencing datasets from different sources, there is also a need to correct the resulting batch effects while maintaining the true biological variations in the data. Results: We developed a semi-supervised deep learning architecture called Autoencoder-based Batch Correction (ABC) for integrating single-cell sequencing datasets. Our method removes batch effects through a guided process of data compression using supervised cell type classifier branches for biological signal retention. It aligns the different batches using an adversarial training approach. We comprehensively evaluate the performance of our method using four single-cell sequencing datasets and multiple measures for batch effect removal and biological variation conservation. ABC outperforms 10 state-of-the-art methods for this task including Seurat, scGen, ComBat, scanorama, scVI, scANVI, AutoClass, Harmony, scDREAMER, and CLEAR, correcting various types of batch effects while preserving intricate biological variations.
Affiliation(s)
- Reut Danino
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 6997801, Israel
| | - Iftach Nachman
- School of Neurobiology, Biochemistry and Biophysics, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
| | - Roded Sharan
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 6997801, Israel
| |
|
141
|
Yu D, Li M, Linghu G, Hu Y, Hajdarovic KH, Wang A, Singh R, Webb AE. CellBiAge: Improved single-cell age classification using data binarization. Cell Rep 2023; 42:113500. [PMID: 38032797 PMCID: PMC10791072 DOI: 10.1016/j.celrep.2023.113500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Revised: 10/20/2023] [Accepted: 11/13/2023] [Indexed: 12/02/2023] Open
Abstract
Aging is a major risk factor for many diseases. Accurate methods for predicting age in specific cell types are essential to understand the heterogeneity of aging and to assess rejuvenation strategies. However, classifying organismal age at single-cell resolution using transcriptomics is challenging due to sparsity and noise. Here, we developed CellBiAge, a robust and easy-to-implement machine learning pipeline, to classify the age of single cells in the mouse brain using single-cell transcriptomics. We show that binarization of gene expression values for the top highly variable genes significantly improved test performance across different models, techniques, sexes, and brain regions, with potential age-related genes identified for model prediction. Additionally, we demonstrate CellBiAge's ability to capture exercise-induced rejuvenation in neural stem cells. This study provides a broadly applicable approach for robust classification of organismal age of single cells in the mouse brain, which may aid in understanding the aging process and evaluating rejuvenation methods.
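The preprocessing idea named in this abstract, binarizing expression values for the top highly variable genes, can be sketched as follows. This is an illustration under stated assumptions, not the CellBiAge pipeline. Ranking genes by plain population variance, thresholding at zero, and all function and variable names are hypothetical choices made for the example.

```python
from statistics import pvariance


def top_variable_genes(matrix, k):
    """Return the column indices of the k genes with the highest variance.

    Assumption: genes are ranked by population variance across cells;
    the abstract does not say how highly variable genes are chosen.
    """
    n_genes = len(matrix[0])
    variances = [pvariance([row[g] for row in matrix]) for g in range(n_genes)]
    ranked = sorted(range(n_genes), key=lambda g: variances[g], reverse=True)
    return sorted(ranked[:k])


def binarize_selected(matrix, genes, threshold=0.0):
    """Keep only the selected genes and map each value to 1 if above threshold."""
    return [[1 if row[g] > threshold else 0 for g in genes] for row in matrix]


# Toy cells-by-genes expression matrix (hypothetical values).
expr = [
    [0.0, 5.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 4.0, 1.0, 0.0],
]

genes = top_variable_genes(expr, k=2)
features = binarize_selected(expr, genes)
print(genes)
print(features)
```

The resulting 0/1 feature matrix could then be passed to any standard classifier. The constant genes (columns 0 and 2) are dropped because they carry no variance.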
Affiliation(s)
- Doudou Yu
- Molecular Biology, Cell Biology, and Biochemistry Graduate Program, Brown University, Providence, RI 02912, USA; Data Science Institute, Brown University, Providence, RI 02912, USA
| | - Manlin Li
- Data Science Institute, Brown University, Providence, RI 02912, USA
| | - Guanjie Linghu
- Data Science Institute, Brown University, Providence, RI 02912, USA
| | - Yihuan Hu
- Data Science Institute, Brown University, Providence, RI 02912, USA
| | | | - An Wang
- Department of Applied Mathematics & Statistics, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Ritambhara Singh
- Department of Computer Science, Brown University, Providence, RI 02912, USA; Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA.
| | - Ashley E Webb
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912, USA; Center on the Biology of Aging, Brown University, Providence, RI 02912, USA; Carney Institute for Brain Science, Brown University, Providence, RI 02912, USA; Center for Translational Neuroscience, Brown University, Providence, RI 02912, USA.
| |
|
142
|
Xu C, Prete M, Webb S, Jardine L, Stewart BJ, Hoo R, He P, Meyer KB, Teichmann SA. Automatic cell-type harmonization and integration across Human Cell Atlas datasets. Cell 2023; 186:5876-5891.e20. [PMID: 38134877 DOI: 10.1016/j.cell.2023.11.026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2023] [Revised: 08/24/2023] [Accepted: 11/23/2023] [Indexed: 12/24/2023]
Abstract
Harmonizing cell types across the single-cell community and assembling them into a common framework is central to building a standardized Human Cell Atlas. Here, we present CellHint, a predictive clustering tree-based tool to resolve cell-type differences in annotation resolution and technical biases across datasets. CellHint accurately quantifies cell-cell transcriptomic similarities and places cell types into a relationship graph that hierarchically defines shared and unique cell subtypes. Application to multiple immune datasets recapitulates expert-curated annotations. CellHint also reveals underexplored relationships between healthy and diseased lung cell states in eight diseases. Furthermore, we present a workflow for fast cross-dataset integration guided by harmonized cell types and cell hierarchy, which uncovers underappreciated cell types in adult human hippocampus. Finally, we apply CellHint to 12 tissues from 38 datasets, providing a deeply curated cross-tissue database with ∼3.7 million cells and various machine learning models for automatic cell annotation across human tissues.
Affiliation(s)
- Chuan Xu
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Martin Prete
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Simone Webb
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Biosciences Institute, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
- Laura Jardine
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Biosciences Institute, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
- Benjamin J Stewart
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Molecular Immunity Unit, Department of Medicine, University of Cambridge, Cambridge CB2 0QQ, UK; Cambridge University Hospitals NHS Foundation Trust and NIHR Cambridge Biomedical Research Centre, Cambridge CB2 0QQ, UK
- Regina Hoo
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Peng He
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Kerstin B Meyer
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Sarah A Teichmann
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Theory of Condensed Matter Group, Department of Physics, Cavendish Laboratory, University of Cambridge, Cambridge CB3 0HE, UK.
|
143
|
Møller AF, Madsen JGS. JOINTLY: interpretable joint clustering of single-cell transcriptomes. Nat Commun 2023; 14:8473. [PMID: 38123569 PMCID: PMC10733431 DOI: 10.1038/s41467-023-44279-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 12/06/2023] [Indexed: 12/23/2023] Open
Abstract
Single-cell and single-nucleus RNA-sequencing (sxRNA-seq) is increasingly being used to characterise the transcriptomic state of cell types at homeostasis, during development and in disease. However, this is a challenging task, as biological effects can be masked by technical variation. Here, we present JOINTLY, an algorithm enabling joint clustering of sxRNA-seq datasets across batches. JOINTLY performs on par or better than state-of-the-art batch integration methods in clustering tasks and outperforms other intrinsically interpretable methods. We demonstrate that JOINTLY is robust against over-correction while retaining subtle cell state differences between biological conditions and highlight how the interpretation of JOINTLY can be used to annotate cell types and identify active signalling programs across cell types and pseudo-time. Finally, we use JOINTLY to construct a reference atlas of white adipose tissue (WATLAS), an expandable and comprehensive community resource, in which we describe four adipocyte subpopulations and map compositional changes in obesity and between depots.
Affiliation(s)
- Andreas Fønss Møller
  - Institute of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
  - Sino-Danish College (SDC), University of Chinese Academy of Sciences, Beijing, China
- Jesper Grud Skat Madsen
  - Institute of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark.
  - Institute of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark.
  - Center for Functional Genomics and Tissue Plasticity (ATLAS), Odense M, 5230, Denmark.
  - The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.
|
144
|
Su J, Reynier JB, Fu X, Zhong G, Jiang J, Escalante RS, Wang Y, Aparicio L, Izar B, Knowles DA, Rabadan R. Smoother: a unified and modular framework for incorporating structural dependency in spatial omics data. Genome Biol 2023; 24:291. [PMID: 38110959 PMCID: PMC10726548 DOI: 10.1186/s13059-023-03138-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Accepted: 12/04/2023] [Indexed: 12/20/2023] Open
Abstract
Spatial omics technologies can help identify spatially organized biological processes, but existing computational approaches often overlook structural dependencies in the data. Here, we introduce Smoother, a unified framework that integrates positional information into non-spatial models via modular priors and losses. In simulated and real datasets, Smoother enables accurate data imputation, cell-type deconvolution, and dimensionality reduction with remarkable efficiency. In colorectal cancer, Smoother-guided deconvolution reveals plasma cell and fibroblast subtype localizations linked to tumor microenvironment restructuring. Additionally, joint modeling of spatial and single-cell human prostate data with Smoother allows for spatial mapping of reference populations with significantly reduced ambiguity.
Affiliation(s)
- Jiayu Su
  - Program for Mathematical Genomics, Columbia University, New York, NY, USA.
  - Department of Systems Biology, Columbia University, New York, NY, USA.
  - New York Genome Center, New York, NY, USA.
- Jean-Baptiste Reynier
  - Program for Mathematical Genomics, Columbia University, New York, NY, USA
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Xi Fu
  - Program for Mathematical Genomics, Columbia University, New York, NY, USA
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Guojie Zhong
  - Department of Systems Biology, Columbia University, New York, NY, USA
- Jiahao Jiang
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Yiping Wang
  - Program for Mathematical Genomics, Columbia University, New York, NY, USA
  - Division of Hematology/Oncology, Department of Medicine, Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY, USA
- Luis Aparicio
  - Program for Mathematical Genomics, Columbia University, New York, NY, USA
  - Department of Systems Biology, Columbia University, New York, NY, USA
- Benjamin Izar
  - Program for Mathematical Genomics, Columbia University, New York, NY, USA
  - Division of Hematology/Oncology, Department of Medicine, Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY, USA
- David A Knowles
  - Department of Systems Biology, Columbia University, New York, NY, USA
  - New York Genome Center, New York, NY, USA
  - Department of Computer Science, Columbia University, New York, NY, USA
- Raul Rabadan
  - Program for Mathematical Genomics, Columbia University, New York, NY, USA.
  - Department of Systems Biology, Columbia University, New York, NY, USA.
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA.
|
145
|
Persad S, Choo ZN, Dien C, Sohail N, Masilionis I, Chaligné R, Nawy T, Brown CC, Sharma R, Pe'er I, Setty M, Pe'er D. SEACells infers transcriptional and epigenomic cellular states from single-cell genomics data. Nat Biotechnol 2023; 41:1746-1757. [PMID: 36973557 PMCID: PMC10713451 DOI: 10.1038/s41587-023-01716-9] [Citation(s) in RCA: 27] [Impact Index Per Article: 27.0] [Received: 04/02/2022] [Accepted: 02/20/2023] [Indexed: 03/29/2023]
Abstract
Metacells are cell groupings derived from single-cell sequencing data that represent highly granular, distinct cell states. Here we present single-cell aggregation of cell states (SEACells), an algorithm for identifying metacells that overcome the sparsity of single-cell data while retaining heterogeneity obscured by traditional cell clustering. SEACells outperforms existing algorithms in identifying comprehensive, compact and well-separated metacells in both RNA and assay for transposase-accessible chromatin (ATAC) modalities across datasets with discrete cell types and continuous trajectories. We demonstrate the use of SEACells to improve gene-peak associations, compute ATAC gene scores and infer the activities of critical regulators during differentiation. Metacell-level analysis scales to large datasets and is particularly well suited for patient cohorts, where per-patient aggregation provides more robust units for data integration. We use our metacells to reveal expression dynamics and gradual reconfiguration of the chromatin landscape during hematopoietic differentiation and to uniquely identify CD4 T cell differentiation and activation states associated with disease onset and severity in a Coronavirus Disease 2019 (COVID-19) patient cohort.
Affiliation(s)
- Sitara Persad
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
  - Department of Computer Science, Fu Foundation School of Engineering & Applied Science, Columbia University, New York, NY, USA
- Zi-Ning Choo
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Christine Dien
  - Basic Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
  - Computational Biology Program, Public Health Sciences Division and Translational Data Science IRC, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Noor Sohail
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Ignas Masilionis
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Ronan Chaligné
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Tal Nawy
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Chrysothemis C Brown
  - Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
  - Department of Pediatrics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Roshan Sharma
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Itsik Pe'er
  - Department of Computer Science, Fu Foundation School of Engineering & Applied Science, Columbia University, New York, NY, USA
- Manu Setty
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  - Basic Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, USA.
  - Computational Biology Program, Public Health Sciences Division and Translational Data Science IRC, Fred Hutchinson Cancer Center, Seattle, WA, USA.
- Dana Pe'er
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  - Howard Hughes Medical Institute, New York, NY, USA.
|
146
|
Yao Z, van Velthoven CTJ, Kunst M, Zhang M, McMillen D, Lee C, Jung W, Goldy J, Abdelhak A, Aitken M, Baker K, Baker P, Barkan E, Bertagnolli D, Bhandiwad A, Bielstein C, Bishwakarma P, Campos J, Carey D, Casper T, Chakka AB, Chakrabarty R, Chavan S, Chen M, Clark M, Close J, Crichton K, Daniel S, DiValentin P, Dolbeare T, Ellingwood L, Fiabane E, Fliss T, Gee J, Gerstenberger J, Glandon A, Gloe J, Gould J, Gray J, Guilford N, Guzman J, Hirschstein D, Ho W, Hooper M, Huang M, Hupp M, Jin K, Kroll M, Lathia K, Leon A, Li S, Long B, Madigan Z, Malloy J, Malone J, Maltzer Z, Martin N, McCue R, McGinty R, Mei N, Melchor J, Meyerdierks E, Mollenkopf T, Moonsman S, Nguyen TN, Otto S, Pham T, Rimorin C, Ruiz A, Sanchez R, Sawyer L, Shapovalova N, Shepard N, Slaughterbeck C, Sulc J, Tieu M, Torkelson A, Tung H, Valera Cuevas N, Vance S, Wadhwani K, Ward K, Levi B, Farrell C, Young R, Staats B, Wang MQM, Thompson CL, Mufti S, Pagan CM, Kruse L, Dee N, Sunkin SM, Esposito L, Hawrylycz MJ, Waters J, Ng L, Smith K, Tasic B, Zhuang X, Zeng H. A high-resolution transcriptomic and spatial atlas of cell types in the whole mouse brain. Nature 2023; 624:317-332. [PMID: 38092916 PMCID: PMC10719114 DOI: 10.1038/s41586-023-06812-z] [Citation(s) in RCA: 69] [Impact Index Per Article: 69.0] [Received: 02/17/2023] [Accepted: 10/31/2023] [Indexed: 12/17/2023]
Abstract
The mammalian brain consists of millions to billions of cells that are organized into many cell types with specific spatial distribution patterns and structural and functional properties (refs. 1-3). Here we report a comprehensive and high-resolution transcriptomic and spatial cell-type atlas for the whole adult mouse brain. The cell-type atlas was created by combining a single-cell RNA-sequencing (scRNA-seq) dataset of around 7 million cells profiled (approximately 4.0 million cells passing quality control), and a spatial transcriptomic dataset of approximately 4.3 million cells using multiplexed error-robust fluorescence in situ hybridization (MERFISH). The atlas is hierarchically organized into 4 nested levels of classification: 34 classes, 338 subclasses, 1,201 supertypes and 5,322 clusters. We present an online platform, Allen Brain Cell Atlas, to visualize the mouse whole-brain cell-type atlas along with the single-cell RNA-sequencing and MERFISH datasets. We systematically analysed the neuronal and non-neuronal cell types across the brain and identified a high degree of correspondence between transcriptomic identity and spatial specificity for each cell type. The results reveal unique features of cell-type organization in different brain regions, in particular a dichotomy between the dorsal and ventral parts of the brain. The dorsal part contains relatively fewer yet highly divergent neuronal types, whereas the ventral part contains more numerous neuronal types that are more closely related to each other. Our study also uncovered extraordinary diversity and heterogeneity in neurotransmitter and neuropeptide expression and co-expression patterns in different cell types. Finally, we found that transcription factors are major determinants of cell-type classification and identified a combinatorial transcription factor code that defines cell types across all parts of the brain. The whole mouse brain transcriptomic and spatial cell-type atlas establishes a benchmark reference atlas and a foundational resource for integrative investigations of cellular and circuit function, development and evolution of the mammalian brain.
Affiliation(s)
- Zizhen Yao
  - Allen Institute for Brain Science, Seattle, WA, USA.
- Meng Zhang
  - Howard Hughes Medical Institute, Department of Chemistry and Chemical Biology, Department of Physics, Harvard University, Cambridge, MA, USA
- Changkyu Lee
  - Allen Institute for Brain Science, Seattle, WA, USA
- Won Jung
  - Howard Hughes Medical Institute, Department of Chemistry and Chemical Biology, Department of Physics, Harvard University, Cambridge, MA, USA
- Jeff Goldy
  - Allen Institute for Brain Science, Seattle, WA, USA
- Pamela Baker
  - Allen Institute for Brain Science, Seattle, WA, USA
- Eliza Barkan
  - Allen Institute for Brain Science, Seattle, WA, USA
- Daniel Carey
  - Allen Institute for Brain Science, Seattle, WA, USA
- Min Chen
  - University of Pennsylvania, Philadelphia, PA, USA
- Jennie Close
  - Allen Institute for Brain Science, Seattle, WA, USA
- Scott Daniel
  - Allen Institute for Brain Science, Seattle, WA, USA
- Tim Dolbeare
  - Allen Institute for Brain Science, Seattle, WA, USA
- James Gee
  - University of Pennsylvania, Philadelphia, PA, USA
- Jessica Gloe
  - Allen Institute for Brain Science, Seattle, WA, USA
- James Gray
  - Allen Institute for Brain Science, Seattle, WA, USA
- Windy Ho
  - Allen Institute for Brain Science, Seattle, WA, USA
- Mike Huang
  - Allen Institute for Brain Science, Seattle, WA, USA
- Madie Hupp
  - Allen Institute for Brain Science, Seattle, WA, USA
- Kelly Jin
  - Allen Institute for Brain Science, Seattle, WA, USA
- Kanan Lathia
  - Allen Institute for Brain Science, Seattle, WA, USA
- Arielle Leon
  - Allen Institute for Brain Science, Seattle, WA, USA
- Su Li
  - Allen Institute for Brain Science, Seattle, WA, USA
- Brian Long
  - Allen Institute for Brain Science, Seattle, WA, USA
- Zach Madigan
  - Allen Institute for Brain Science, Seattle, WA, USA
- Zoe Maltzer
  - Allen Institute for Brain Science, Seattle, WA, USA
- Naomi Martin
  - Allen Institute for Brain Science, Seattle, WA, USA
- Rachel McCue
  - Allen Institute for Brain Science, Seattle, WA, USA
- Ryan McGinty
  - Allen Institute for Brain Science, Seattle, WA, USA
- Nicholas Mei
  - Allen Institute for Brain Science, Seattle, WA, USA
- Jose Melchor
  - Allen Institute for Brain Science, Seattle, WA, USA
- Sven Otto
  - Allen Institute for Brain Science, Seattle, WA, USA
- Lane Sawyer
  - Allen Institute for Brain Science, Seattle, WA, USA
- Noah Shepard
  - Allen Institute for Brain Science, Seattle, WA, USA
- Josef Sulc
  - Allen Institute for Brain Science, Seattle, WA, USA
- Michael Tieu
  - Allen Institute for Brain Science, Seattle, WA, USA
- Herman Tung
  - Allen Institute for Brain Science, Seattle, WA, USA
- Shane Vance
  - Allen Institute for Brain Science, Seattle, WA, USA
- Katelyn Ward
  - Allen Institute for Brain Science, Seattle, WA, USA
- Boaz Levi
  - Allen Institute for Brain Science, Seattle, WA, USA
- Rob Young
  - Allen Institute for Brain Science, Seattle, WA, USA
- Brian Staats
  - Allen Institute for Brain Science, Seattle, WA, USA
- Shoaib Mufti
  - Allen Institute for Brain Science, Seattle, WA, USA
- Lauren Kruse
  - Allen Institute for Brain Science, Seattle, WA, USA
- Nick Dee
  - Allen Institute for Brain Science, Seattle, WA, USA
- Jack Waters
  - Allen Institute for Brain Science, Seattle, WA, USA
- Lydia Ng
  - Allen Institute for Brain Science, Seattle, WA, USA
- Xiaowei Zhuang
  - Howard Hughes Medical Institute, Department of Chemistry and Chemical Biology, Department of Physics, Harvard University, Cambridge, MA, USA
- Hongkui Zeng
  - Allen Institute for Brain Science, Seattle, WA, USA.
|
147
|
Okada H, Chung UI, Hojo H. Practical Compass of Single-Cell RNA-Seq Analysis. Curr Osteoporos Rep 2023:10.1007/s11914-023-00840-4. [PMID: 38019344 DOI: 10.1007/s11914-023-00840-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/14/2023] [Indexed: 11/30/2023]
Abstract
PURPOSE OF REVIEW: This review paper provides step-by-step instructions on the fundamental process, from handling fastq datasets to illustrating plots and drawing trajectories. RECENT FINDINGS: The number of studies using single-cell RNA-seq (scRNA-seq) is increasing. scRNA-seq revealed the heterogeneity or diversity of the cellular populations. scRNA-seq also provides insight into the interactions between different cell types. User-friendly scRNA-seq packages for ligand-receptor interactions and trajectory analyses are available. In skeletal biology, osteoclast differentiation, fracture healing, ectopic ossification, human bone development, and the bone marrow niche have been examined using scRNA-seq. scRNA-seq data analysis tools are still being developed, even at the fundamental step of dataset integration. However, updating the latest information is difficult for many researchers. Investigators and reviewers must share their knowledge of in silico scRNA-seq for better biological interpretation. This review article aims to provide a useful guide for complex analytical processes in single-cell RNA-seq data analysis.
Affiliation(s)
- Hiroyuki Okada
  - Center for Disease Biology and Integrative Medicine, Graduate School of Medicine, The University of Tokyo, Bunkyo-Ku, Tokyo, 113-8655, Japan.
  - Department of Orthopaedic Surgery, The University of Tokyo, Tokyo, Japan.
  - Department of Oral Medicine, Infection, and Immunity, Harvard School of Dental Medicine, Boston, MA, 02115, USA.
- Ung-Il Chung
  - Center for Disease Biology and Integrative Medicine, Graduate School of Medicine, The University of Tokyo, Bunkyo-Ku, Tokyo, 113-8655, Japan
  - Department of Bioengineering, Graduate School of Engineering, The University of Tokyo, Tokyo, Japan
- Hironori Hojo
  - Center for Disease Biology and Integrative Medicine, Graduate School of Medicine, The University of Tokyo, Bunkyo-Ku, Tokyo, 113-8655, Japan
  - Department of Bioengineering, Graduate School of Engineering, The University of Tokyo, Tokyo, Japan
|
148
|
Mosquera JV, Auguste G, Wong D, Turner AW, Hodonsky CJ, Alvarez-Yela AC, Song Y, Cheng Q, Lino Cardenas CL, Theofilatos K, Bos M, Kavousi M, Peyser PA, Mayr M, Kovacic JC, Björkegren JLM, Malhotra R, Stukenberg PT, Finn AV, van der Laan SW, Zang C, Sheffield NC, Miller CL. Integrative single-cell meta-analysis reveals disease-relevant vascular cell states and markers in human atherosclerosis. Cell Rep 2023; 42:113380. [PMID: 37950869 DOI: 10.1016/j.celrep.2023.113380] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/16/2023] [Revised: 09/12/2023] [Accepted: 10/20/2023] [Indexed: 11/13/2023] Open
Abstract
Coronary artery disease (CAD) is characterized by atherosclerotic plaque formation in the arterial wall. CAD progression involves complex interactions and phenotypic plasticity among vascular and immune cell lineages. Single-cell RNA-seq (scRNA-seq) studies have highlighted lineage-specific transcriptomic signatures, but human cell phenotypes remain controversial. Here, we perform an integrated meta-analysis of 22 scRNA-seq libraries to generate a comprehensive map of human atherosclerosis with 118,578 cells. Besides characterizing granular cell-type diversity and communication, we leverage this atlas to provide insights into smooth muscle cell (SMC) modulation. We integrate genome-wide association study data and uncover a critical role for modulated SMC phenotypes in CAD, myocardial infarction, and coronary calcification. Finally, we identify fibromyocyte/fibrochondrogenic SMC markers (LTBP1 and CRTAC1) as proxies of atherosclerosis progression and validate these through omics and spatial imaging analyses. Altogether, we create a unified atlas of human atherosclerosis informing cell state-specific mechanistic and translational studies of cardiovascular diseases.
Affiliation(s)
- Jose Verdezoto Mosquera
  - Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22908, USA; Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
- Gaëlle Auguste
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
- Doris Wong
  - Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22908, USA; Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
- Adam W Turner
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
- Chani J Hodonsky
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
- Yipei Song
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA; Department of Computer Engineering, University of Virginia, Charlottesville, VA 22908, USA
- Qi Cheng
  - CVPath Institute, Gaithersburg, MD 20878, USA
- Christian L Lino Cardenas
  - Cardiovascular Research Center, Cardiology Division, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02129, USA
- Maxime Bos
  - Department of Epidemiology, Erasmus University Medical Center, 3000 CA Rotterdam, the Netherlands
- Maryam Kavousi
  - Department of Epidemiology, Erasmus University Medical Center, 3000 CA Rotterdam, the Netherlands
- Patricia A Peyser
  - Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI 48019, USA
- Manuel Mayr
  - King's British Heart Foundation Centre, King's College London, London WC2R 2LS, UK; National Heart and Lung Institute, Imperial College London, London SW3 6LY, UK
- Jason C Kovacic
  - Cardiovascular Research Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Victor Chang Cardiac Research Institute, Darlinghurst, NSW 2010, Australia; St. Vincent's Clinical School, University of New South Wales, Sydney, NSW 2052, Australia
- Johan L M Björkegren
  - Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Medicine, Karolinska Institutet, 141 52 Huddinge, Sweden
- Rajeev Malhotra
  - Cardiovascular Research Center, Cardiology Division, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02129, USA
- P Todd Stukenberg
  - Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22908, USA
- Sander W van der Laan
  - Central Diagnostics Laboratory, Division Laboratories, Pharmacy, and Biomedical Genetics, University Medical Center Utrecht, Utrecht University, 3584 CX Utrecht, the Netherlands
- Chongzhi Zang
  - Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22908, USA; Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA; Department of Biomedical Engineering, University of Virginia, Charlottesville, VA 22908, USA; Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA
- Nathan C Sheffield
  - Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22908, USA; Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA; Department of Biomedical Engineering, University of Virginia, Charlottesville, VA 22908, USA; Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA
- Clint L Miller
  - Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22908, USA; Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA; Department of Biomedical Engineering, University of Virginia, Charlottesville, VA 22908, USA; Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA.
|
149
|
Shree A, Pavan MK, Zafar H. scDREAMER for atlas-level integration of single-cell datasets using deep generative model paired with adversarial classifier. Nat Commun 2023; 14:7781. [PMID: 38012145 PMCID: PMC10682386 DOI: 10.1038/s41467-023-43590-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/13/2022] [Accepted: 11/14/2023] [Indexed: 11/29/2023] Open
Abstract
Integration of heterogeneous single-cell sequencing datasets generated across multiple tissue locations, time, and conditions is essential for a comprehensive understanding of the cellular states and expression programs underlying complex biological systems. Here, we present scDREAMER (https://github.com/Zafar-Lab/scDREAMER), a data-integration framework that employs deep generative models and adversarial training for both unsupervised and supervised (scDREAMER-Sup) integration of multiple batches. Using six real benchmarking datasets, we demonstrate that scDREAMER can overcome critical challenges including skewed cell type distribution among batches, nested batch-effects, large number of batches and conservation of development trajectory across batches. Our experiments also show that scDREAMER and scDREAMER-Sup outperform state-of-the-art unsupervised and supervised integration methods respectively in batch-correction and conservation of biological variation. Using a 1 million cells dataset, we demonstrate that scDREAMER is scalable and can perform atlas-level cross-species (e.g., human and mouse) integration while being faster than other deep-learning-based methods.
Affiliation(s)
- Ajita Shree
  - Department of Computer Science and Engineering, Indian Institute of Technology Kanpur, Kanpur, India
- Musale Krushna Pavan
  - Department of Computer Science and Engineering, Indian Institute of Technology Kanpur, Kanpur, India
- Hamim Zafar
  - Department of Computer Science and Engineering, Indian Institute of Technology Kanpur, Kanpur, India.
  - Department of Biological Sciences and Bioengineering, Indian Institute of Technology Kanpur, Kanpur, India.
  - Mehta Family Centre for Engineering in Medicine, Indian Institute of Technology Kanpur, Kanpur, India.
|
150
|
Huizing GJ, Deutschmann IM, Peyré G, Cantini L. Paired single-cell multi-omics data integration with Mowgli. Nat Commun 2023; 14:7711. [PMID: 38001063 PMCID: PMC10673889 DOI: 10.1038/s41467-023-43019-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Accepted: 10/30/2023] [Indexed: 11/26/2023] Open
Abstract
The profiling of multiple molecular layers from the same set of cells has recently become possible. There is thus a growing need for multi-view learning methods able to jointly analyze these data. We here present Multi-Omics Wasserstein inteGrative anaLysIs (Mowgli), a novel method for the integration of paired multi-omics data with any type and number of omics. Of note, Mowgli combines integrative Nonnegative Matrix Factorization and Optimal Transport, enhancing at the same time the clustering performance and interpretability of integrative Nonnegative Matrix Factorization. We apply Mowgli to multiple paired single-cell multi-omics data profiled with 10X Multiome, CITE-seq, and TEA-seq. Our in-depth benchmark demonstrates that Mowgli's performance is competitive with the state-of-the-art in cell clustering and superior to the state-of-the-art once considering biological interpretability. Mowgli is implemented as a Python package seamlessly integrated within the scverse ecosystem and it is available at http://github.com/cantinilab/mowgli.
Affiliation(s)
- Geert-Jan Huizing
  - Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics Group, F-75015, Paris, France.
  - Institut de Biologie de l'Ecole Normale Supérieure, CNRS, INSERM, Ecole Normale Supérieure, Université PSL, 75005, Paris, France.
- Ina Maria Deutschmann
  - Institut de Biologie de l'Ecole Normale Supérieure, CNRS, INSERM, Ecole Normale Supérieure, Université PSL, 75005, Paris, France
- Gabriel Peyré
  - CNRS and DMA de l'Ecole Normale Supérieure, CNRS, Ecole Normale Supérieure, Université PSL, 75005, Paris, France
- Laura Cantini
  - Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics Group, F-75015, Paris, France.
  - Institut de Biologie de l'Ecole Normale Supérieure, CNRS, INSERM, Ecole Normale Supérieure, Université PSL, 75005, Paris, France.
|