1
Grobecker P, Sakoparnig T, van Nimwegen E. Identifying cell states in single-cell RNA-seq data at statistically maximal resolution. PLoS Comput Biol 2024; 20:e1012224. [PMID: 38995959 DOI: 10.1371/journal.pcbi.1012224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Accepted: 06/04/2024] [Indexed: 07/14/2024] [Open access]
Abstract
Single-cell RNA sequencing (scRNA-seq) has become a popular experimental method to study variation of gene expression within a population of cells. However, obtaining an accurate picture of the diversity of distinct gene expression states that are present in a given dataset is highly challenging because of the sparsity of the scRNA-seq data and its inhomogeneous measurement noise properties. Although a vast number of different methods are applied in the literature for clustering cells into subsets with 'similar' expression profiles, these methods generally lack rigorously specified objectives, involve multiple complex layers of normalization, filtering, feature selection, and dimensionality reduction, employ ad hoc measures of distance or similarity between cells, often ignore the known measurement noise properties of scRNA-seq measurements, and include a large number of tunable parameters. Consequently, it is virtually impossible to assign concrete biophysical meaning to the clusterings that result from these methods. Here we address the following problem: Given raw unique molecule identifier (UMI) counts of an scRNA-seq dataset, partition the cells into subsets such that the gene expression states of the cells in each subset are statistically indistinguishable, and each subset corresponds to a distinct gene expression state. That is, we aim to partition cells so as to maximally reduce the complexity of the dataset without removing any of its meaningful structure. We show that, given the known measurement noise structure of scRNA-seq data, this problem is mathematically well-defined and derive its unique solution from first principles. We have implemented this solution in a tool called Cellstates which operates directly on the raw data and automatically determines the optimal partition and cluster number, with zero tunable parameters. We show that, on synthetic datasets, Cellstates almost perfectly recovers optimal partitions. On real data, Cellstates robustly identifies subtle substructure within groups of cells that are traditionally annotated as a common cell type. Moreover, we show that the diversity of gene expression states that Cellstates identifies systematically depends on the tissue of origin and not on technical features of the experiments such as the total number of cells and total UMI count per cell. In addition to the Cellstates tool we also provide a small toolbox of software to place the identified cellstates into a hierarchical tree of higher-order clusters, to identify the most important differentially expressed genes at each branch of this hierarchy, and to visualize these results.
Affiliation(s)
- Pascal Grobecker
- Biozentrum, University of Basel and Swiss Institute of Bioinformatics, Basel, Switzerland
- Thomas Sakoparnig
- Biozentrum, University of Basel and Swiss Institute of Bioinformatics, Basel, Switzerland
- Erik van Nimwegen
- Biozentrum, University of Basel and Swiss Institute of Bioinformatics, Basel, Switzerland
2
Pan X, Li X, Dong L, Liu T, Zhang M, Zhang L, Zhang X, Huang L, Shi W, Sun H, Fang Z, Sun J, Huang Y, Shao H, Wang Y, Yin M. Tumour vasculature at single-cell resolution. Nature 2024:10.1038/s41586-024-07698-1. [PMID: 38987599 DOI: 10.1038/s41586-024-07698-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Accepted: 06/10/2024] [Indexed: 07/12/2024]
Abstract
Tumours can obtain nutrients and oxygen required to progress and metastasize through the blood supply [1]. Inducing angiogenesis involves the sprouting of established vessel beds and their maturation into an organized network [2,3]. Here we generate a comprehensive atlas of tumour vasculature at single-cell resolution, encompassing approximately 200,000 cells from 372 donors representing 31 cancer types. Trajectory inference suggested that tumour angiogenesis was initiated from venous endothelial cells and extended towards arterial endothelial cells. As neovascularization elongates (through angiogenic stages SI, SII and SIII), APLN+ tip cells at the SI stage (APLN+ TipSI) advanced to TipSIII cells with increased Notch signalling. Meanwhile, stalk cells, following tip cells, transitioned from high chemokine expression to elevated TEK (also known as Tie2) expression. Moreover, APLN+ TipSI cells not only were associated with disease progression and poor prognosis but also hold promise for predicting response to anti-VEGF therapy. Lymphatic endothelial cells demonstrated two distinct differentiation lineages: one responsible for lymphangiogenesis and the other involved in antigen presentation. In pericytes, endoplasmic reticulum stress was associated with the proangiogenic BASP1+ matrix-producing pericytes. Furthermore, intercellular communication analysis showed that neovascular endothelial cells could shape an immunosuppressive microenvironment conducive to angiogenesis. This study depicts the complexity of tumour vasculature and has potential clinical significance for anti-angiogenic therapy.
Affiliation(s)
- Xu Pan
- Clinical Research Center (CRC), Medical Pathology Center (MPC), Cancer Early Detection and Treatment Center (CEDTC) and Translational Medicine Research Center (TMRC), Chongqing University Three Gorges Hospital, Chongqing University, Chongqing, China
- Department of Basic Medical Sciences, School of Medicine, Tsinghua University, Beijing, China
- Department of General Surgery, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Science & Peking Union Medical College, Beijing, China
- Xin Li
- Clinical Research Center (CRC), Medical Pathology Center (MPC), Cancer Early Detection and Treatment Center (CEDTC) and Translational Medicine Research Center (TMRC), Chongqing University Three Gorges Hospital, Chongqing University, Chongqing, China
- Chongqing Technical Innovation Center for Quality Evaluation and Identification of Authentic Medicinal Herbs, Chongqing, China
- School of Medicine, Chongqing University, Chongqing, China
- Liang Dong
- Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha, China
- Teng Liu
- Clinical Research Center (CRC), Medical Pathology Center (MPC), Cancer Early Detection and Treatment Center (CEDTC) and Translational Medicine Research Center (TMRC), Chongqing University Three Gorges Hospital, Chongqing University, Chongqing, China
- Chongqing Technical Innovation Center for Quality Evaluation and Identification of Authentic Medicinal Herbs, Chongqing, China
- School of Medicine, Chongqing University, Chongqing, China
- Min Zhang
- Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha, China
- Lining Zhang
- Clinical Research Center (CRC), Medical Pathology Center (MPC), Cancer Early Detection and Treatment Center (CEDTC) and Translational Medicine Research Center (TMRC), Chongqing University Three Gorges Hospital, Chongqing University, Chongqing, China
- Chongqing Technical Innovation Center for Quality Evaluation and Identification of Authentic Medicinal Herbs, Chongqing, China
- School of Medicine, Chongqing University, Chongqing, China
- Xiyuan Zhang
- Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha, China
- Lingjuan Huang
- Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha, China
- Wensheng Shi
- Department of Urology, Xiangya Hospital, Central South University, Changsha, China
- Hongyin Sun
- Clinical Research Center (CRC), Medical Pathology Center (MPC), Cancer Early Detection and Treatment Center (CEDTC) and Translational Medicine Research Center (TMRC), Chongqing University Three Gorges Hospital, Chongqing University, Chongqing, China
- School of Pharmaceutical Sciences, Southern Medical University, Guangzhou, China
- Zhaoyu Fang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering at Central South University, Changsha, China
- Jie Sun
- Clinical Research Center (CRC), Medical Pathology Center (MPC), Cancer Early Detection and Treatment Center (CEDTC) and Translational Medicine Research Center (TMRC), Chongqing University Three Gorges Hospital, Chongqing University, Chongqing, China
- Chongqing Technical Innovation Center for Quality Evaluation and Identification of Authentic Medicinal Herbs, Chongqing, China
- School of Medicine, Chongqing University, Chongqing, China
- Yaoxuan Huang
- Clinical Research Center (CRC), Medical Pathology Center (MPC), Cancer Early Detection and Treatment Center (CEDTC) and Translational Medicine Research Center (TMRC), Chongqing University Three Gorges Hospital, Chongqing University, Chongqing, China
- Chongqing Technical Innovation Center for Quality Evaluation and Identification of Authentic Medicinal Herbs, Chongqing, China
- School of Medicine, Chongqing University, Chongqing, China
- Hua Shao
- Clinical Research Center (CRC), Medical Pathology Center (MPC), Cancer Early Detection and Treatment Center (CEDTC) and Translational Medicine Research Center (TMRC), Chongqing University Three Gorges Hospital, Chongqing University, Chongqing, China
- Chongqing Technical Innovation Center for Quality Evaluation and Identification of Authentic Medicinal Herbs, Chongqing, China
- School of Medicine, Chongqing University, Chongqing, China
- Yeqi Wang
- Key Laboratory for Biorheological Science and Technology of Ministry of Education, Bioengineering College of Chongqing University, Chongqing, China
- Mingzhu Yin
- Clinical Research Center (CRC), Medical Pathology Center (MPC), Cancer Early Detection and Treatment Center (CEDTC) and Translational Medicine Research Center (TMRC), Chongqing University Three Gorges Hospital, Chongqing University, Chongqing, China.
- Chongqing Technical Innovation Center for Quality Evaluation and Identification of Authentic Medicinal Herbs, Chongqing, China.
- School of Medicine, Chongqing University, Chongqing, China.
- Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha, China.
3
Bilous M, Hérault L, Gabriel AA, Teleman M, Gfeller D. Building and analyzing metacells in single-cell genomics data. Mol Syst Biol 2024; 20:744-766. [PMID: 38811801 PMCID: PMC11220014 DOI: 10.1038/s44320-024-00045-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Revised: 05/03/2024] [Accepted: 05/08/2024] [Indexed: 05/31/2024] [Open access]
Abstract
The advent of high-throughput single-cell genomics technologies has fundamentally transformed biological sciences. Currently, millions of cells from complex biological tissues can be phenotypically profiled across multiple modalities. The scaling of computational methods to analyze and visualize such data is a constant challenge, and tools need to be regularly updated, if not redesigned, to cope with ever-growing numbers of cells. Over the last few years, metacells have been introduced to reduce the size and complexity of single-cell genomics data while preserving biologically relevant information and improving interpretability. Here, we review recent studies that capitalize on the concept of metacells, and the many variants in nomenclature that have been used. We further outline how and when metacells should (or should not) be used to analyze single-cell genomics data and what should be considered when analyzing such data at the metacell level. To facilitate the exploration of metacells, we provide a comprehensive tutorial on the construction and analysis of metacells from single-cell RNA-seq data (https://github.com/GfellerLab/MetacellAnalysisTutorial) as well as a fully integrated pipeline to rapidly build, visualize and evaluate metacells with different methods (https://github.com/GfellerLab/MetacellAnalysisToolkit).
Affiliation(s)
- Mariia Bilous
- Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, 1011, Lausanne, Switzerland
- Agora Cancer Research Centre, 1011, Lausanne, Switzerland
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), 1015, Lausanne, Switzerland
- Léonard Hérault
- Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, 1011, Lausanne, Switzerland
- Agora Cancer Research Centre, 1011, Lausanne, Switzerland
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), 1015, Lausanne, Switzerland
- Aurélie Ag Gabriel
- Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, 1011, Lausanne, Switzerland
- Agora Cancer Research Centre, 1011, Lausanne, Switzerland
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), 1015, Lausanne, Switzerland
- Matei Teleman
- Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, 1011, Lausanne, Switzerland
- Agora Cancer Research Centre, 1011, Lausanne, Switzerland
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), 1015, Lausanne, Switzerland
- David Gfeller
- Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, 1011, Lausanne, Switzerland.
- Agora Cancer Research Centre, 1011, Lausanne, Switzerland.
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland.
- Swiss Institute of Bioinformatics (SIB), 1015, Lausanne, Switzerland.
4
Aihara G, Clifton K, Chen M, Li Z, Atta L, Miller BF, Satija R, Hickey JW, Fan J. SEraster: a rasterization preprocessing framework for scalable spatial omics data analysis. Bioinformatics 2024; 40:btae412. [PMID: 38902953 PMCID: PMC11226864 DOI: 10.1093/bioinformatics/btae412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Revised: 05/15/2024] [Accepted: 06/19/2024] [Indexed: 06/22/2024] [Open access]
Abstract
MOTIVATION: Spatial omics data demand computational analysis but many analysis tools have computational resource requirements that increase with the number of cells analyzed. This presents scalability challenges as researchers use spatial omics technologies to profile millions of cells. RESULTS: To enhance the scalability of spatial omics data analysis, we developed a rasterization preprocessing framework called SEraster that aggregates cellular information into spatial pixels. We apply SEraster to both real and simulated spatial omics data prior to spatial variable gene expression analysis to demonstrate that such preprocessing can reduce computational resource requirements while maintaining high performance, including as compared to other down-sampling approaches. We further integrate SEraster with existing analysis tools to characterize cell-type spatial co-enrichment across length scales. Finally, we apply SEraster to enable analysis of a mouse pup spatial omics dataset with over a million cells to identify tissue-level and cell-type-specific spatially variable genes as well as spatially co-enriched cell types that recapitulate expected organ structures. AVAILABILITY AND IMPLEMENTATION: SEraster is implemented as an R package on GitHub (https://github.com/JEFworks-Lab/SEraster) with additional tutorials at https://JEF.works/SEraster.
Affiliation(s)
- Gohta Aihara
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21211, United States
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, United States
- Kalen Clifton
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21211, United States
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, United States
- Mayling Chen
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21211, United States
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, United States
- Zhuoyan Li
- New York Genome Center, New York, NY 10013, United States
- Lyla Atta
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21211, United States
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, United States
- Brendan F Miller
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21211, United States
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, United States
- Rahul Satija
- New York Genome Center, New York, NY 10013, United States
- Center for Genomics and Systems Biology, New York University, New York, NY 10003, United States
- John W Hickey
- Department of Biomedical Engineering, Duke University, Durham, NC 27708, United States
- Jean Fan
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21211, United States
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, United States
5
Gan D, Zhu Y, Lu X, Li J. SCIPAC: quantitative estimation of cell-phenotype associations. Genome Biol 2024; 25:119. [PMID: 38741183 PMCID: PMC11089691 DOI: 10.1186/s13059-024-03263-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Accepted: 04/30/2024] [Indexed: 05/16/2024] [Open access]
Abstract
Numerous algorithms have been proposed to identify cell types in single-cell RNA sequencing data, yet a fundamental problem remains: determining associations between cells and phenotypes such as cancer. We develop SCIPAC, the first algorithm that quantitatively estimates the association between each cell in single-cell data and a phenotype. SCIPAC also provides a p-value for each association and applies to data with virtually any type of phenotype. We demonstrate SCIPAC's accuracy in simulated data. On four real cancerous or noncancerous datasets, insights from SCIPAC help interpret the data and generate new hypotheses. SCIPAC requires minimal tuning and is computationally very fast.
Affiliation(s)
- Dailin Gan
- Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, 46556, IN, USA
- Yini Zhu
- Department of Biological Sciences, Boler-Parseghian Center for Rare and Neglected Diseases, Harper Cancer Research Institute, Integrated Biomedical Sciences Graduate Program, University of Notre Dame, Notre Dame, 46556, IN, USA
- Xin Lu
- Department of Biological Sciences, Boler-Parseghian Center for Rare and Neglected Diseases, Harper Cancer Research Institute, Integrated Biomedical Sciences Graduate Program, University of Notre Dame, Notre Dame, 46556, IN, USA
- Tumor Microenvironment and Metastasis Program, Indiana University Melvin and Bren Simon Comprehensive Cancer Center, Indianapolis, 46202, IN, USA
- Jun Li
- Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, 46556, IN, USA.
6
Putri GH, Howitt G, Marsh-Wakefield F, Ashhurst TM, Phipson B. SuperCellCyto: enabling efficient analysis of large scale cytometry datasets. Genome Biol 2024; 25:89. [PMID: 38589921 PMCID: PMC11003185 DOI: 10.1186/s13059-024-03229-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Accepted: 03/27/2024] [Indexed: 04/10/2024] [Open access]
Abstract
Advancements in cytometry technologies have enabled quantification of up to 50 proteins across millions of cells at single-cell resolution. Analysis of cytometry data routinely involves tasks such as data integration, clustering, and dimensionality reduction. While numerous tools exist, many require extensive run times when processing large cytometry data containing millions of cells. Existing solutions, such as random subsampling, are inadequate as they risk excluding rare cell subsets. To address this, we propose SuperCellCyto, an R package that builds on the SuperCell tool which groups highly similar cells into supercells. SuperCellCyto is available on GitHub (https://github.com/phipsonlab/SuperCellCyto) and Zenodo (https://doi.org/10.5281/zenodo.10521294).
Affiliation(s)
- Givanna H Putri
- The Walter and Eliza Hall Institute of Medical Research and The Department of Medical Biology, The University of Melbourne, Parkville, VIC, Australia.
- George Howitt
- Peter MacCallum Cancer Centre and The Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC, Australia
- Felix Marsh-Wakefield
- Centenary Institute of Cancer Medicine and Cell Biology, The University of Sydney, Sydney, NSW, Australia
- Thomas M Ashhurst
- Sydney Cytometry Core Research Facility and School of Medical Sciences, The University of Sydney, Sydney, NSW, Australia
- Belinda Phipson
- The Walter and Eliza Hall Institute of Medical Research and The Department of Medical Biology, The University of Melbourne, Parkville, VIC, Australia.
7
Youssef A, Paul I, Crovella M, Emili A. DESP demixes cell-state profiles from dynamic bulk molecular measurements. Cell Rep Methods 2024; 4:100729. [PMID: 38490205 PMCID: PMC10985230 DOI: 10.1016/j.crmeth.2024.100729] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 12/22/2023] [Accepted: 02/16/2024] [Indexed: 03/17/2024]
Abstract
Understanding the dynamic expression of proteins and other key molecules driving phenotypic remodeling in development and pathobiology has garnered widespread interest, yet the exploration of these systems at the foundational resolution of the underlying cell states has been significantly limited by technical constraints. Here, we present DESP, an algorithm designed to leverage independent estimates of cell-state proportions, such as from single-cell RNA sequencing, to resolve the relative contributions of cell states to bulk molecular measurements, most notably quantitative proteomics, recorded in parallel. We applied DESP to an in vitro model of the epithelial-to-mesenchymal transition and demonstrated its ability to accurately reconstruct cell-state signatures from bulk-level measurements of both the proteome and transcriptome, providing insights into transient regulatory mechanisms. DESP provides a generalizable computational framework for modeling the relationship between bulk and single-cell molecular measurements, enabling the study of proteomes and other molecular profiles at the cell-state level using established bulk-level workflows.
Affiliation(s)
- Ahmed Youssef
- Graduate Program in Bioinformatics, Boston University, Boston, MA, USA; Center for Network Systems Biology, Boston University, Boston, MA, USA
- Indranil Paul
- Center for Network Systems Biology, Boston University, Boston, MA, USA
- Mark Crovella
- Graduate Program in Bioinformatics, Boston University, Boston, MA, USA; Computer Science Department, Boston University, Boston, MA, USA; Faculty of Computing and Data Sciences, Boston University, Boston, MA, USA.
- Andrew Emili
- Graduate Program in Bioinformatics, Boston University, Boston, MA, USA; Center for Network Systems Biology, Boston University, Boston, MA, USA; Faculty of Computing and Data Sciences, Boston University, Boston, MA, USA; Knight Cancer Institute, Oregon Health and Science University, Portland, OR, USA.
8
Osorio D, Capasso A, Eckhardt SG, Giri U, Somma A, Pitts TM, Lieu CH, Messersmith WA, Bagby SM, Singh H, Das J, Sahni N, Yi SS, Kuijjer ML. Population-level comparisons of gene regulatory networks modeled on high-throughput single-cell transcriptomics data. Nat Comput Sci 2024; 4:237-250. [PMID: 38438786 DOI: 10.1038/s43588-024-00597-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Accepted: 01/17/2024] [Indexed: 03/06/2024]
Abstract
Single-cell technologies enable high-resolution studies of phenotype-defining molecular mechanisms. However, data sparsity and cellular heterogeneity make modeling biological variability across single-cell samples difficult. Here we present SCORPION, a tool that uses a message-passing algorithm to reconstruct comparable gene regulatory networks from single-cell/nuclei RNA-sequencing data that are suitable for population-level comparisons by leveraging the same baseline priors. Using synthetic data, we found that SCORPION outperformed 12 existing gene regulatory network reconstruction techniques. Using supervised experiments, we show that SCORPION can accurately identify differences in regulatory networks between wild-type and transcription factor-perturbed cells. We demonstrate SCORPION's scalability to population-level analyses using a single-cell RNA-sequencing atlas containing 200,436 cells from colorectal cancer and adjacent healthy tissues. The differences between tumor regions detected by SCORPION are consistent across multiple cohorts as well as with our understanding of disease progression, and elucidate phenotypic regulators that may impact patient survival.
Affiliation(s)
- Daniel Osorio
- Department of Oncology, Livestrong Cancer Institutes, Dell Medical School, The University of Texas at Austin, Austin, TX, USA.
- Anna Capasso
- Department of Oncology, Livestrong Cancer Institutes, Dell Medical School, The University of Texas at Austin, Austin, TX, USA
- S Gail Eckhardt
- Department of Oncology, Livestrong Cancer Institutes, Dell Medical School, The University of Texas at Austin, Austin, TX, USA
- Uma Giri
- Department of Oncology, Livestrong Cancer Institutes, Dell Medical School, The University of Texas at Austin, Austin, TX, USA
- Alexander Somma
- Department of Oncology, Livestrong Cancer Institutes, Dell Medical School, The University of Texas at Austin, Austin, TX, USA
- Todd M Pitts
- Division of Medical Oncology, University of Colorado Cancer Center, School of Medicine, University of Colorado, Aurora, CO, USA
- Christopher H Lieu
- Division of Medical Oncology, University of Colorado Cancer Center, School of Medicine, University of Colorado, Aurora, CO, USA
- Wells A Messersmith
- Division of Medical Oncology, University of Colorado Cancer Center, School of Medicine, University of Colorado, Aurora, CO, USA
- Stacey M Bagby
- Division of Medical Oncology, University of Colorado Cancer Center, School of Medicine, University of Colorado, Aurora, CO, USA
- Harinder Singh
- Department of Immunology, Center for Systems Immunology, University of Pittsburgh, Pittsburgh, PA, USA
- Jishnu Das
- Department of Immunology, Center for Systems Immunology, University of Pittsburgh, Pittsburgh, PA, USA
- Nidhi Sahni
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas, MD Anderson Cancer Center, Houston, TX, USA
- Department of Bioinformatics and Computational Biology, The University of Texas, MD Anderson Cancer Center, Houston, TX, USA
- S Stephen Yi
- Department of Oncology, Livestrong Cancer Institutes, Dell Medical School, The University of Texas at Austin, Austin, TX, USA.
- Interdisciplinary Life Sciences Graduate Programs (ILSGP), College of Natural Sciences, The University of Texas at Austin, Austin, TX, USA.
- Oden Institute for Computational Engineering and Sciences (ICES), The University of Texas at Austin, Austin, TX, USA.
- Department of Biomedical Engineering, Cockrell School of Engineering, The University of Texas at Austin, Austin, TX, USA.
- Marieke L Kuijjer
- Centre for Molecular Medicine Norway (NCMM), University of Oslo, Oslo, Norway.
- Department of Pathology, Leiden University Medical Center (LUMC), Leiden University, Leiden, The Netherlands.
- Leiden Center for Computational Oncology, Leiden University Medical Center (LUMC), Leiden University, Leiden, The Netherlands.
9
Enabling comparative gene regulatory network analysis on single-cell data with SCORPION. Nat Comput Sci 2024; 4:167-168. [PMID: 38459274 DOI: 10.1038/s43588-024-00615-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/10/2024]
10
Andreatta M, Hérault L, Gueguen P, Gfeller D, Berenstein AJ, Carmona SJ. Semi-supervised integration of single-cell transcriptomics data. Nat Commun 2024; 15:872. [PMID: 38287014 PMCID: PMC10825117 DOI: 10.1038/s41467-024-45240-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 01/16/2024] [Indexed: 01/31/2024] [Open access]
Abstract
Batch effects in single-cell RNA-seq data pose a significant challenge for comparative analyses across samples, individuals, and conditions. Although batch effect correction methods are routinely applied, data integration often leads to overcorrection and can result in the loss of biological variability. In this work we present STACAS, a batch correction method for scRNA-seq that leverages prior knowledge on cell types to preserve biological variability upon integration. Through an open-source benchmark, we show that semi-supervised STACAS outperforms state-of-the-art unsupervised methods, as well as supervised methods such as scANVI and scGen. STACAS scales well to large datasets and is robust to incomplete and imprecise input cell type labels, which are commonly encountered in real-life integration tasks. We argue that the incorporation of prior cell type information should be a common practice in single-cell data integration, and we provide a flexible framework for semi-supervised batch effect correction.
Affiliation(s)
- Massimo Andreatta
- Department of Oncology, Lausanne Branch, Ludwig Institute for Cancer Research, CHUV and University of Lausanne, 1011, Lausanne, Switzerland
- AGORA Cancer Research Center, 1005, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Léonard Hérault
- Department of Oncology, Lausanne Branch, Ludwig Institute for Cancer Research, CHUV and University of Lausanne, 1011, Lausanne, Switzerland
- AGORA Cancer Research Center, 1005, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Paul Gueguen
- Department of Oncology, Lausanne Branch, Ludwig Institute for Cancer Research, CHUV and University of Lausanne, 1011, Lausanne, Switzerland
- AGORA Cancer Research Center, 1005, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- David Gfeller
- Department of Oncology, Lausanne Branch, Ludwig Institute for Cancer Research, CHUV and University of Lausanne, 1011, Lausanne, Switzerland
- AGORA Cancer Research Center, 1005, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Ariel J Berenstein
- Laboratorio de Biología Molecular, División Patología, Instituto Multidisciplinario de Investigaciones en Patologías Pediátricas (IMIPP), CONICET-GCBA, Buenos Aires, C1425EFD, Argentina
- Santiago J Carmona
- Department of Oncology, Lausanne Branch, Ludwig Institute for Cancer Research, CHUV and University of Lausanne, 1011, Lausanne, Switzerland.
- AGORA Cancer Research Center, 1005, Lausanne, Switzerland.
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland.
|
11
|
Persad S, Choo ZN, Dien C, Sohail N, Masilionis I, Chaligné R, Nawy T, Brown CC, Sharma R, Pe'er I, Setty M, Pe'er D. SEACells infers transcriptional and epigenomic cellular states from single-cell genomics data. Nat Biotechnol 2023; 41:1746-1757. [PMID: 36973557 PMCID: PMC10713451 DOI: 10.1038/s41587-023-01716-9] [Citation(s) in RCA: 27] [Impact Index Per Article: 27.0] [Received: 04/02/2022] [Accepted: 02/20/2023] [Indexed: 03/29/2023]
Abstract
Metacells are cell groupings derived from single-cell sequencing data that represent highly granular, distinct cell states. Here we present single-cell aggregation of cell states (SEACells), an algorithm for identifying metacells that overcome the sparsity of single-cell data while retaining heterogeneity obscured by traditional cell clustering. SEACells outperforms existing algorithms in identifying comprehensive, compact and well-separated metacells in both RNA and assay for transposase-accessible chromatin (ATAC) modalities across datasets with discrete cell types and continuous trajectories. We demonstrate the use of SEACells to improve gene-peak associations, compute ATAC gene scores and infer the activities of critical regulators during differentiation. Metacell-level analysis scales to large datasets and is particularly well suited for patient cohorts, where per-patient aggregation provides more robust units for data integration. We use our metacells to reveal expression dynamics and gradual reconfiguration of the chromatin landscape during hematopoietic differentiation and to uniquely identify CD4 T cell differentiation and activation states associated with disease onset and severity in a Coronavirus Disease 2019 (COVID-19) patient cohort.
Affiliation(s)
- Sitara Persad
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Department of Computer Science, Fu Foundation School of Engineering & Applied Science, Columbia University, New York, NY, USA
- Zi-Ning Choo
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Christine Dien
- Basic Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Computational Biology Program, Public Health Sciences Division and Translational Data Science IRC, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Noor Sohail
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Ignas Masilionis
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Ronan Chaligné
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Tal Nawy
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Chrysothemis C Brown
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Department of Pediatrics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Roshan Sharma
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Itsik Pe'er
- Department of Computer Science, Fu Foundation School of Engineering & Applied Science, Columbia University, New York, NY, USA
- Manu Setty
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
- Basic Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, USA.
- Computational Biology Program, Public Health Sciences Division and Translational Data Science IRC, Fred Hutchinson Cancer Center, Seattle, WA, USA.
- Dana Pe'er
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
- Howard Hughes Medical Institute, New York, NY, USA.
|
12
|
Lareau C. Subtle cell states resolved in single-cell data. Nat Biotechnol 2023; 41:1690-1691. [PMID: 37198441 PMCID: PMC10654257 DOI: 10.1038/s41587-023-01797-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/19/2023]
Affiliation(s)
- Caleb Lareau
- Department of Pathology, Stanford University, Palo Alto, CA, USA.
|
13
|
Menon R, Otto EA, Barisoni L, Melo Ferreira R, Limonte CP, Godfrey B, Eichinger F, Nair V, Naik AS, Subramanian L, D'Agati V, Henderson JM, Herlitz L, Kiryluk K, Moledina DG, Moeckel GW, Palevsky PM, Parikh CR, Randhawa P, Rosas SE, Rosenberg AZ, Stillman I, Toto R, Torrealba J, Vazquez MA, Waikar SS, Alpers CE, Nelson RG, Eadon MT, Kretzler M, Hodgin JB. Defining the molecular correlate of arteriolar hyalinosis in kidney disease progression by integration of single cell transcriptomic analysis and pathology scoring. medRxiv 2023:2023.06.14.23291150. [PMID: 37398386 PMCID: PMC10312894 DOI: 10.1101/2023.06.14.23291150] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 07/04/2023]
Abstract
Arteriolar hyalinosis in kidneys is an independent predictor of cardiovascular disease, the main cause of mortality in chronic kidney disease (CKD). The underlying molecular mechanisms of protein accumulation in the subendothelial space are not well understood. Using single cell transcriptomic data and whole slide images from kidney biopsies of patients with CKD and acute kidney injury in the Kidney Precision Medicine Project, the molecular signals associated with arteriolar hyalinosis were evaluated. Co-expression network analysis of the endothelial genes yielded three gene set modules as significantly associated with arteriolar hyalinosis. Pathway analysis of these modules showed enrichment of transforming growth factor beta / bone morphogenetic protein (TGFβ / BMP) and vascular endothelial growth factor (VEGF) signaling pathways in the endothelial cell signatures. Ligand-receptor analysis identified multiple integrins and cell adhesion receptors as over-expressed in arteriolar hyalinosis, suggesting a potential role of integrin-mediated TGFβ signaling. Further analysis of arteriolar hyalinosis associated endothelial module genes identified focal segmental glomerular sclerosis as an enriched term. On validation in gene expression profiles from the Nephrotic Syndrome Study Network cohort, one of the three modules was significantly associated with the composite endpoint (> 40% reduction in estimated glomerular filtration rate (eGFR) or kidney failure) independent of age, sex, race, and baseline eGFR, suggesting poor prognosis with elevated expression of genes in this module. Thus, integration of structural and single cell molecular features yielded biologically relevant gene sets, signaling pathways and ligand-receptor interactions, underlying arteriolar hyalinosis and putative targets for therapeutic intervention.
|
14
|
McCalla SG, Fotuhi Siahpirani A, Li J, Pyne S, Stone M, Periyasamy V, Shin J, Roy S. Identifying strengths and weaknesses of methods for computational network inference from single-cell RNA-seq data. G3 (Bethesda) 2023; 13:jkad004. [PMID: 36626328 PMCID: PMC9997554 DOI: 10.1093/g3journal/jkad004] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 11/09/2022] [Revised: 11/09/2022] [Accepted: 12/16/2022] [Indexed: 01/11/2023]
Abstract
Single-cell RNA-sequencing (scRNA-seq) offers unparalleled insight into the transcriptional programs of different cellular states by measuring the transcriptome of thousands of individual cells. An emerging problem in the analysis of scRNA-seq is the inference of transcriptional gene regulatory networks and a number of methods with different learning frameworks have been developed to address this problem. Here, we present an expanded benchmarking study of eleven recent network inference methods on seven published scRNA-seq datasets in human, mouse, and yeast considering different types of gold standard networks and evaluation metrics. We evaluate methods based on their computing requirements as well as on their ability to recover the network structure. We find that, while most methods have a modest recovery of experimentally derived interactions based on global metrics such as Area Under the Precision Recall curve, methods are able to capture targets of regulators that are relevant to the system under study. Among the top performing methods that use only expression were SCENIC, PIDC, MERLIN or Correlation. Addition of prior biological knowledge and the estimation of transcription factor activities resulted in the best overall performance with the Inferelator and MERLIN methods that use prior knowledge outperforming methods that use expression alone. We found that imputation for network inference did not improve network inference accuracy and could be detrimental. Comparisons of inferred networks for comparable bulk conditions showed that the networks inferred from scRNA-seq datasets are often better or at par with the networks inferred from bulk datasets. Our analysis should be beneficial in selecting methods for network inference. At the same time, this highlights the need for improved methods and better gold standards for regulatory network inference from scRNAseq datasets.
Affiliation(s)
- Sunnie Grace McCalla
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
- Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI 53706, USA
- Jiaxin Li
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
- Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI 53706, USA
- Saptarshi Pyne
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
- Matthew Stone
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53792, USA
- Viswesh Periyasamy
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA
- Junha Shin
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
- Sushmita Roy
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53792, USA
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA
|
15
|
Hérault L, Poplineau M, Remy E, Duprez E. Single Cell Transcriptomics to Understand HSC Heterogeneity and Its Evolution upon Aging. Cells 2022; 11:cells11193125. [PMID: 36231086 PMCID: PMC9563410 DOI: 10.3390/cells11193125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Revised: 09/15/2022] [Accepted: 09/27/2022] [Indexed: 11/16/2022] Open
Abstract
Single-cell transcriptomic technologies enable the uncovering and characterization of cellular heterogeneity and pave the way for studies aiming at understanding the origin and consequences of it. The hematopoietic system is in essence a very well adapted model system to benefit from this technological advance because it is characterized by different cellular states. Each cellular state, and its interconnection, may be defined by a specific location in the global transcriptional landscape sustained by a complex regulatory network. This transcriptomic signature is not fixed and evolved over time to give rise to less efficient hematopoietic stem cells (HSC), leading to a well-documented hematopoietic aging. Here, we review the advance of single-cell transcriptomic approaches for the understanding of HSC heterogeneity to grasp HSC deregulations upon aging. We also discuss the new bioinformatics tools developed for the analysis of the resulting large and complex datasets. Finally, since hematopoiesis is driven by fine-tuned and complex networks that must be interconnected to each other, we highlight how mathematical modeling is beneficial for doing such interconnection between multilayered information and to predict how HSC behave while aging.
Affiliation(s)
- Léonard Hérault
- I2M, CNRS, Aix Marseille University, 13009 Marseille, France
- Epigenetic Factors in Normal and Malignant Hematopoiesis Lab., CRCM, CNRS, INSERM, Institut Paoli Calmettes, Aix Marseille University, 13009 Marseille, France
- Mathilde Poplineau
- Epigenetic Factors in Normal and Malignant Hematopoiesis Lab., CRCM, CNRS, INSERM, Institut Paoli Calmettes, Aix Marseille University, 13009 Marseille, France
- Equipe Labellisée Ligue Nationale Contre le Cancer, 75013 Paris, France
- Elisabeth Remy
- I2M, CNRS, Aix Marseille University, 13009 Marseille, France
- Estelle Duprez
- Epigenetic Factors in Normal and Malignant Hematopoiesis Lab., CRCM, CNRS, INSERM, Institut Paoli Calmettes, Aix Marseille University, 13009 Marseille, France
- Equipe Labellisée Ligue Nationale Contre le Cancer, 75013 Paris, France
|