1
Chen R, Zhou J, Chen B. Imputing abundance of over 2,500 surface proteins from single-cell transcriptomes with context-agnostic zero-shot deep ensembles. Cell Syst 2024; 15:869-884.e6. [PMID: 39243755 PMCID: PMC11423933 DOI: 10.1016/j.cels.2024.08.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2023] [Revised: 05/23/2024] [Accepted: 08/15/2024] [Indexed: 09/09/2024]
Abstract
Cell surface proteins serve as primary drug targets and cell identity markers. Techniques such as CITE-seq (cellular indexing of transcriptomes and epitopes by sequencing) have enabled the simultaneous quantification of surface protein abundance and transcript expression within individual cells. The published data have been utilized to train machine learning models for predicting surface protein abundance solely from transcript expression. However, the small scale of proteins predicted and the poor generalization ability of these computational approaches across diverse contexts (e.g., different tissues/disease states) impede their widespread adoption. Here, we propose SPIDER (surface protein prediction using deep ensembles from single-cell RNA sequencing), a context-agnostic zero-shot deep ensemble model, which enables large-scale protein abundance prediction and generalizes better to various contexts. Comprehensive benchmarking shows that SPIDER outperforms other state-of-the-art methods. Using the predicted surface abundance of >2,500 proteins from single-cell transcriptomes, we demonstrate the broad applications of SPIDER, including cell type annotation, biomarker/target identification, and cell-cell interaction analysis in hepatocellular carcinoma and colorectal cancer. A record of this paper's transparent peer review process is included in the supplemental information.
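The abstract above names a deep ensemble but does not describe how member predictions are combined. As a generic illustration only, and not SPIDER's actual implementation, the sketch below averages the outputs of several hypothetical ensemble members for each cell and reports their spread as a rough disagreement measure. The function name `ensemble_predict`, the optional `weights` argument, and the toy numbers are all assumptions introduced for this example.

```python
import numpy as np


def ensemble_predict(member_predictions, weights=None):
    """Combine per-cell predictions from several ensemble members.

    member_predictions: array-like of shape (n_members, n_cells), one row of
        predicted protein abundance per (hypothetical) ensemble member.
    weights: optional per-member weights; uniform if omitted (an assumption,
        the abstract does not say how members are weighted).
    Returns (mean, spread): the weighted mean prediction per cell and the
    weighted standard deviation across members per cell.
    """
    preds = np.asarray(member_predictions, dtype=float)
    n_members = preds.shape[0]
    if weights is None:
        weights = np.full(n_members, 1.0 / n_members)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so weights sum to 1

    mean = weights @ preds  # shape (n_cells,)
    spread = np.sqrt(weights @ (preds - mean) ** 2)  # member disagreement
    return mean, spread


# Toy example: two members, two cells (made-up numbers).
mean, spread = ensemble_predict([[1.0, 2.0], [3.0, 4.0]])
```

A large `spread` for a cell would indicate that the members disagree, which is one common reason for using an ensemble rather than a single network.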
Affiliation(s)
- Ruoqiao Chen
  - Department of Pharmacology and Toxicology, Michigan State University, East Lansing, MI 48824, USA
- Jiayu Zhou
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Bin Chen
  - Department of Pharmacology and Toxicology, Michigan State University, East Lansing, MI 48824, USA
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
  - Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA
2
Zhang L, Sagan A, Qin B, Kim E, Hu B, Osmanbeyoglu HU. STAN, a computational framework for inferring spatially informed transcription factor activity. bioRxiv 2024:2024.06.26.600782. [PMID: 38979296 PMCID: PMC11230390 DOI: 10.1101/2024.06.26.600782] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Transcription factors (TFs) drive significant cellular changes in response to environmental cues and intercellular signaling. Neighboring cells influence TF activity and, consequently, cellular fate and function. Spatial transcriptomics (ST) captures mRNA expression patterns across tissue samples, enabling characterization of the local microenvironment. However, these datasets have not been fully leveraged to systematically estimate TF activity governing cell identity. Here, we present STAN (Spatially informed Transcription factor Activity Network), a linear mixed-effects computational method that predicts spot-specific, spatially informed TF activities by integrating curated TF-target gene priors, mRNA expression, spatial coordinates, and morphological features from corresponding imaging data. We tested STAN using lymph node, breast cancer, and glioblastoma ST datasets to demonstrate its applicability by identifying TFs associated with specific cell types, spatial domains, pathological regions, and ligand-receptor pairs. STAN augments the utility of STs to reveal the intricate interplay between TFs and spatial organization across a spectrum of cellular contexts.
3
Chen R, Zhou J, Chen B. Imputing abundance of over 2500 surface proteins from single-cell transcriptomes with context-agnostic zero-shot deep ensembles. bioRxiv 2024:2024.07.31.605432. [PMID: 39131290 PMCID: PMC11312525 DOI: 10.1101/2024.07.31.605432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/13/2024]
Abstract
Cell surface proteins serve as primary drug targets and cell identity markers. The emergence of techniques like CITE-seq has enabled simultaneous quantification of surface protein abundance and transcript expression for multimodal data analysis within individual cells. The published data have been utilized to train machine learning models for predicting surface protein abundance based solely on transcript expression. However, the small scale of proteins predicted and the poor generalization ability of these computational approaches across diverse contexts, such as different tissues or disease states, impede their widespread adoption. Here we propose SPIDER (surface protein prediction using deep ensembles from single-cell RNA-seq), a context-agnostic zero-shot deep ensemble model, which enables the large-scale prediction of cell surface protein abundance and generalizes better to various contexts. Comprehensive benchmarking shows that SPIDER outperforms other state-of-the-art methods. Using the predicted surface abundance of >2500 proteins from single-cell transcriptomes, we demonstrate the broad applications of SPIDER, including cell type annotation, biomarker/target identification, and cell-cell interaction analysis in hepatocellular carcinoma and colorectal cancer.
Affiliation(s)
- Ruoqiao Chen
  - Department of Pharmacology and Toxicology, Michigan State University, MI, USA
- Jiayu Zhou
  - Department of Computer Science and Engineering, Michigan State University, MI, USA
- Bin Chen
  - Department of Pharmacology and Toxicology, Michigan State University, MI, USA
  - Department of Computer Science and Engineering, Michigan State University, MI, USA
  - Department of Pediatrics and Human Development, Michigan State University, MI, USA
4
Javaid A, Frost HR. STREAK: A supervised cell surface receptor abundance estimation strategy for single cell RNA-sequencing data using feature selection and thresholded gene set scoring. PLoS Comput Biol 2023; 19:e1011413. [PMID: 37603589 PMCID: PMC10470905 DOI: 10.1371/journal.pcbi.1011413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Revised: 08/31/2023] [Accepted: 08/07/2023] [Indexed: 08/23/2023] Open
Abstract
The accurate estimation of cell surface receptor abundance for single cell transcriptomics data is important for the tasks of cell type and phenotype categorization and cell-cell interaction quantification. We previously developed an unsupervised receptor abundance estimation technique named SPECK (Surface Protein abundance Estimation using CKmeans-based clustered thresholding) to address the challenges associated with accurate abundance estimation. In that paper, we concluded that SPECK results in improved concordance with Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-seq) data relative to comparative unsupervised abundance estimation techniques using only single-cell RNA-sequencing (scRNA-seq) data. In this paper, we outline a new supervised receptor abundance estimation method called STREAK (gene Set Testing-based Receptor abundance Estimation using Adjusted distances and cKmeans thresholding) that leverages associations learned from joint scRNA-seq/CITE-seq training data and a thresholded gene set scoring mechanism to estimate receptor abundance for scRNA-seq target data. We evaluate STREAK relative to both unsupervised and supervised receptor abundance estimation techniques using two evaluation approaches on six joint scRNA-seq/CITE-seq datasets that represent four human and mouse tissue types. We conclude that STREAK outperforms other abundance estimation strategies and provides a more biologically interpretable and transparent statistical model.
Affiliation(s)
- Azka Javaid
  - Department of Biomedical Data Science, Dartmouth College, Hanover, New Hampshire, United States of America
- Hildreth Robert Frost
  - Department of Biomedical Data Science, Dartmouth College, Hanover, New Hampshire, United States of America
5
Ramjattun K, Xiaojun M, Shou-Jiang G, Singh H, Osmanbeyoglu HU. COVID-19db linkage maps of cell surface proteins and transcription factors in immune cells. J Med Virol 2023; 95:e28887. [PMID: 37341527 PMCID: PMC10478683 DOI: 10.1002/jmv.28887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Revised: 05/25/2023] [Accepted: 06/08/2023] [Indexed: 06/22/2023]
Abstract
The highly contagious SARS-CoV-2 and its associated disease (COVID-19) are a threat to global public health and economies. To develop effective treatments for COVID-19, we must understand the host cell types, cell states and regulators associated with infection and pathogenesis such as dysregulated transcription factors (TFs) and surface proteins, including signaling receptors. To link cell surface proteins with TFs, we recently developed SPaRTAN (Single-cell Proteomic and RNA-based Transcription factor Activity Network) by integrating parallel single-cell proteomic and transcriptomic data based on Cellular Indexing of Transcriptomes and Epitopes by sequencing (CITE-seq) and gene cis-regulatory information. We apply SPaRTAN to CITE-seq data sets from patients with varying degrees of COVID-19 severity and healthy controls to identify the associations between surface proteins and TFs in host immune cells. Here, we present COVID-19db of Immune Cell States (https://covid19db.streamlit.app/), a web server containing cell surface protein expression, SPaRTAN-inferred TF activities, and their associations with major host immune cell types. The data include four high-quality COVID-19 CITE-seq data sets with a toolset for user-friendly data analysis and visualization. We provide interactive surface protein and TF visualizations across major immune cell types for each data set, allowing comparison between various patient severity groups for the discovery of potential therapeutic targets and diagnostic biomarkers.
Affiliation(s)
- Koushul Ramjattun
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
  - UPMC Hillman Cancer Center, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Ma Xiaojun
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
  - UPMC Hillman Cancer Center, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Gao Shou-Jiang
  - UPMC Hillman Cancer Center, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
  - Department of Microbiology and Molecular Genetics, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, USA
- Harinder Singh
  - Center for Systems Immunology and Department of Immunology, University of Pittsburgh, Pittsburgh, PA, USA
- Hatice Ulku Osmanbeyoglu
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
  - UPMC Hillman Cancer Center, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
  - Department of Bioengineering, University of Pittsburgh School of Engineering, Pittsburgh, USA
  - Department of Biostatistics, University of Pittsburgh School of Public Health, Pittsburgh, PA, USA
6
Sagan A, Ma X, Ramjattun K, Osmanbeyoglu HU. Linking Expression of Cell-Surface Receptors with Transcription Factors by Computational Analysis of Paired Single-Cell Proteomes and Transcriptomes. Methods Mol Biol 2023; 2660:149-169. [PMID: 37191796 DOI: 10.1007/978-1-0716-3163-8_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/17/2023]
Abstract
Complex signaling and transcriptional programs control the development and physiology of specialized cell types. Genetic perturbations in these programs cause human cancers to arise from a diverse set of specialized cell types and developmental states. Understanding these complex systems and their potential to drive cancer is critical for the development of immunotherapies and druggable targets. Pioneering single-cell multi-omics technologies that analyze transcriptional states have been coupled with the expression of cell-surface receptors. This chapter describes SPaRTAN (Single-cell Proteomic and RNA-based Transcription factor Activity Network), a computational framework, to link transcription factors with cell-surface protein expression. SPaRTAN uses CITE-seq (cellular indexing of transcriptomes and epitopes by sequencing) data and cis-regulatory sites to model the effect of interactions between transcription factors and cell-surface receptors on gene expression. We demonstrate the pipeline for SPaRTAN using CITE-seq data from peripheral blood mononuclear cells.
Affiliation(s)
- April Sagan
  - Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, USA
- Xiaojun Ma
  - Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, USA
- Koushul Ramjattun
  - Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, USA
- Hatice Ulku Osmanbeyoglu
  - Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Bioengineering, School of Engineering, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA
7
Isolated BAP1 Genomic Alteration in Malignant Pleural Mesothelioma Predicts Distinct Immunogenicity with Implications for Immunotherapeutic Response. Cancers (Basel) 2022; 14:cancers14225626. [PMID: 36428720 PMCID: PMC9688367 DOI: 10.3390/cancers14225626] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/29/2022] [Revised: 10/30/2022] [Accepted: 11/09/2022] [Indexed: 11/18/2022] Open
Abstract
Malignant pleural mesothelioma (MPM), an aggressive cancer of the mesothelial cells lining the pleural cavity, lacks effective treatments. Multiple somatic mutations and copy number losses in tumor suppressor genes (TSGs) BAP1, CDKN2A/B, and NF2 are frequently associated with MPM. The impact of single versus multiple genomic alterations of TSGs on MPM biology, the immune tumor microenvironment, clinical outcomes, and treatment responses is unknown. Tumors with genomic alterations in BAP1 alone were associated with a longer overall patient survival rate compared to tumors with CDKN2A/B and/or NF2 alterations with or without BAP1 and formed a distinct immunogenic subtype with altered transcription factor and pathway activity patterns. CDKN2A/B genomic alterations consistently contributed to an adverse clinical outcome. Since genomic alterations of BAP1 alone were associated with the PD-1 therapy response signature and higher LAG3 and VISTA gene expression, isolated BAP1 alteration might be a candidate marker for immune checkpoint blockade therapy. Our results on the impact of TSG genotypes on MPM and the correlations between TSG alterations and molecular pathways provide a foundation for developing individualized MPM therapies.
8
Tao Y, Ma X, Palmer D, Schwartz R, Lu X, Osmanbeyoglu H. Interpretable deep learning for chromatin-informed inference of transcriptional programs driven by somatic alterations across cancers. Nucleic Acids Res 2022; 50:10869-10881. [PMID: 36243974 PMCID: PMC9638905 DOI: 10.1093/nar/gkac881] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2021] [Revised: 09/23/2022] [Accepted: 09/29/2022] [Indexed: 11/14/2022] Open
Abstract
Cancer is a disease of gene dysregulation, where cells acquire somatic and epigenetic alterations that drive aberrant cellular signaling. These alterations adversely impact transcriptional programs and cause profound changes in gene expression. Interpreting somatic alterations within context-specific transcriptional programs will facilitate personalized therapeutic decisions but is a monumental task. Toward this goal, we develop a partially interpretable neural network model called Chromatin-informed Inference of Transcriptional Regulators Using Self-attention mechanism (CITRUS). CITRUS models the impact of somatic alterations on transcription factors and downstream transcriptional programs. Our approach employs a self-attention mechanism to model the contextual impact of somatic alterations. Furthermore, CITRUS uses a layer of hidden nodes to explicitly represent the state of transcription factors (TFs) to learn the relationships between TFs and their target genes based on TF binding motifs in the open chromatin regions of tumor samples. We apply CITRUS to genomic, transcriptomic, and epigenomic data from 17 cancer types profiled by The Cancer Genome Atlas. CITRUS predicts patient-specific TF activities and reveals transcriptional program variations between and within tumor types. We show that CITRUS yields biological insights into delineating TFs associated with somatic alterations in individual tumors. Thus, CITRUS is a promising tool for precision oncology.
Affiliation(s)
- Yifeng Tao
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Xiaojun Ma
  - Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, USA
- Drake Palmer
  - UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, USA
- Russell Schwartz
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
- Xinghua Lu
  - Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Pharmaceutical Science, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Hatice Ulku Osmanbeyoglu
  - Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Bioengineering, School of Engineering, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Biostatistics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA