1
Zeng Z, Ma Y, Hu L, Tan B, Liu P, Wang Y, Xing C, Xiong Y, Du H. OmicVerse: a framework for bridging and deepening insights across bulk and single-cell sequencing. Nat Commun 2024; 15:5983. [PMID: 39013860; PMCID: PMC11252408; DOI: 10.1038/s41467-024-50194-3] [Received: 06/19/2023] [Accepted: 06/28/2024] [Indexed: 07/18/2024]
Abstract
Single-cell sequencing is frequently affected by "omission" due to limitations in sequencing throughput, yet bulk RNA-seq may contain these ostensibly "omitted" cells. Here, we introduce the single cell trajectory blending from Bulk RNA-seq (BulkTrajBlend) algorithm, a component of the OmicVerse suite that leverages a Beta-Variational AutoEncoder for data deconvolution and graph neural networks for the discovery of overlapping communities. This approach effectively interpolates and restores the continuity of "omitted" cells within single-cell RNA sequencing datasets. Furthermore, OmicVerse provides an extensive toolkit for both bulk and single cell RNA-seq analysis, offering seamless access to diverse methodologies, streamlining computational processes, fostering exquisite data visualization, and facilitating the extraction of significant biological insights to advance scientific research.
Affiliation(s)
- Zehua Zeng
  - School of Chemistry and Biological Engineering, University of Science and Technology Beijing, Beijing, China
  - Daxing Research Institute, University of Science and Technology Beijing, Beijing, China
- Yuqing Ma
  - Center of Precision Medicine and Healthcare, Tsinghua-Berkeley Shenzhen Institute, Shenzhen, Guangdong Province, China
  - Institute of Biopharmaceutics and Health Engineering, Tsinghua Shenzhen International Graduate School, Shenzhen, Guangdong Province, China
- Lei Hu
  - School of Chemistry and Biological Engineering, University of Science and Technology Beijing, Beijing, China
  - School of Life Sciences, Westlake University, Hangzhou, Zhejiang, China
- Bowen Tan
  - Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
  - School of Mathematics and Physics, University of Science and Technology Beijing, Beijing, China
- Peng Liu
  - School of Chemistry and Biological Engineering, University of Science and Technology Beijing, Beijing, China
- Yixuan Wang
  - School of Chemistry and Biological Engineering, University of Science and Technology Beijing, Beijing, China
- Cencan Xing
  - School of Chemistry and Biological Engineering, University of Science and Technology Beijing, Beijing, China
  - Daxing Research Institute, University of Science and Technology Beijing, Beijing, China
- Yuanyan Xiong
  - Key Laboratory of Gene Engineering of the Ministry of Education, Institute of Healthy Aging Research, School of Life Sciences, Sun-Yat-Sen University, Guangzhou, Guangdong, China
- Hongwu Du
  - School of Chemistry and Biological Engineering, University of Science and Technology Beijing, Beijing, China
  - Daxing Research Institute, University of Science and Technology Beijing, Beijing, China
2
Chari T, Gorin G, Pachter L. Stochastic Modeling of Biophysical Responses to Perturbation. bioRxiv 2024:2024.07.04.602131. [PMID: 39005347; PMCID: PMC11245117; DOI: 10.1101/2024.07.04.602131] [Indexed: 07/16/2024]
Abstract
Recent advances in high-throughput, multi-condition experiments allow for genome-wide investigation of how perturbations affect transcription and translation in the cell across multiple biological entities or modalities, from chromatin and mRNA information to protein production and spatial morphology. This presents an unprecedented opportunity to unravel how the processes of DNA and RNA regulation direct cell fate determination and disease response. Most methods designed for analyzing large-scale perturbation data focus on the observational outcomes, e.g., expression; however, many potential transcriptional mechanisms, such as transcriptional bursting or splicing dynamics, can underlie these complex and noisy observations. In this analysis, we demonstrate how a stochastic biophysical modeling approach to interpreting high-throughput perturbation data enables deeper investigation of the 'how' behind such molecular measurements. Our approach takes advantage of modalities already present in data produced with current technologies, such as nascent and mature mRNA measurements, to illuminate transcriptional dynamics induced by perturbation, predict kinetic behaviors in new perturbation settings, and uncover novel populations of cells with distinct kinetic responses to perturbation.
Affiliation(s)
- Tara Chari
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- Lior Pachter
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
  - Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California
3
Magaña-López G, Calzone L, Zinovyev A, Paulevé L. scBoolSeq: Linking scRNA-seq statistics and Boolean dynamics. PLoS Comput Biol 2024; 20:e1011620. [PMID: 38976751; PMCID: PMC11257695; DOI: 10.1371/journal.pcbi.1011620] [Received: 10/22/2023] [Revised: 07/18/2024] [Accepted: 06/24/2024] [Indexed: 07/10/2024]
Abstract
Boolean networks are widely used to model the qualitative dynamics of cell fate processes by describing how the binary activation states of genes and transcription factors change over time. Being able to bridge such qualitative states with quantitative measurements of gene expression in cells, such as scRNA-seq, is a cornerstone for data-driven model construction and validation. On one hand, scRNA-seq binarisation is a key step for inferring and validating Boolean models. On the other hand, the generation of synthetic scRNA-seq data from baseline Boolean models provides an important asset to benchmark inference methods. However, linking characteristics of scRNA-seq datasets, including dropout events, with Boolean states is a challenging task. We present scBoolSeq, a method for the bidirectional linking of scRNA-seq data and Boolean activation states of genes. Given a reference scRNA-seq dataset, scBoolSeq computes statistical criteria to classify the empirical gene pseudocount distributions as either unimodal, bimodal, or zero-inflated, and fits a probabilistic model of dropouts with gene-dependent parameters. From these learnt distributions, scBoolSeq can both binarise scRNA-seq datasets and generate synthetic scRNA-seq datasets from Boolean traces, such as those produced by Boolean networks, using biased sampling and dropout simulation. We present a case study demonstrating the application of scBoolSeq's binarisation scheme in data-driven model inference. Furthermore, we compare synthetic scRNA-seq data generated by scBoolSeq with data generated by BoolODE for the same Boolean network model. The comparison shows that our method better reproduces the statistics of real scRNA-seq datasets, such as the mean-variance and mean-dropout relationships, while exhibiting clearly defined trajectories in two-dimensional projections of the data.
Affiliation(s)
- Laurence Calzone
  - Institut Curie, Université PSL, Paris, France
  - INSERM, U900, Paris, France
  - Mines ParisTech, Université PSL, Paris, France
- Loïc Paulevé
  - Univ. Bordeaux, CNRS, Bordeaux INP, LaBRI, UMR 5800, Talence, France
4
Paton V, Ramirez Flores RO, Gabor A, Badia-I-Mompel P, Tanevski J, Garrido-Rodriguez M, Saez-Rodriguez J. Assessing the impact of transcriptomics data analysis pipelines on downstream functional enrichment results. Nucleic Acids Res 2024:gkae552. [PMID: 38943333; DOI: 10.1093/nar/gkae552] [Received: 09/27/2023] [Revised: 06/03/2024] [Accepted: 06/19/2024] [Indexed: 07/01/2024]
Abstract
Transcriptomics is widely used to assess the state of biological systems. Many tools exist for the different analysis steps, such as normalization, differential expression, and enrichment. While numerous studies have examined the impact of method choices on differential expression results, little attention has been paid to their effects on further downstream functional analysis, which typically provides the basis for interpretation and follow-up experiments. To address this, we introduce FLOP, a comprehensive Nextflow-based workflow combining methods to perform end-to-end analyses of transcriptomics data. We illustrate FLOP on datasets ranging from end-stage heart failure patients to cancer cell lines. We discovered effects not noticeable at the gene level, and observed that not filtering the data had the highest impact on the correlation between pipelines in the gene set space. Moreover, we performed three benchmarks to evaluate the 12 pipelines included in FLOP, and confirmed that filtering is essential in scenarios of expected moderate-to-low biological signal. Overall, our results underscore the importance of carefully evaluating how the choice of preprocessing methods affects downstream enrichment analyses. We envision FLOP as a valuable tool to measure the robustness of functional analyses, ultimately leading to more reliable and conclusive biological findings.
Affiliation(s)
- Victor Paton
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
- Ricardo Omar Ramirez Flores
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
- Attila Gabor
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
- Pau Badia-I-Mompel
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
- Jovan Tanevski
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
- Martin Garrido-Rodriguez
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
  - Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
- Julio Saez-Rodriguez
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
  - European Bioinformatics Institute, European Molecular Biology Laboratory (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, UK
5
Shaw J, Gounot JS, Chen H, Nagarajan N, Yu YW. Floria: fast and accurate strain haplotyping in metagenomes. Bioinformatics 2024; 40:i30-i38. [PMID: 38940183; PMCID: PMC11211831; DOI: 10.1093/bioinformatics/btae252] [Indexed: 06/29/2024]
Abstract
SUMMARY Shotgun metagenomics allows for direct analysis of microbial community genetics, but scalable computational methods for the recovery of bacterial strain genomes from microbiomes remain a key challenge. We introduce Floria, a novel method designed for rapid and accurate recovery of strain haplotypes from short and long-read metagenome sequencing data, based on minimum error correction (MEC) read clustering and a strain-preserving network flow model. Floria can function as a standalone haplotyping method, outputting alleles and reads that co-occur on the same strain, as well as an end-to-end read-to-assembly pipeline (Floria-PL) for strain-level assembly. Benchmarking evaluations on synthetic metagenomes show that Floria is > 3× faster and recovers 21% more strain content than base-level assembly methods (Strainberry) while being over an order of magnitude faster when only phasing is required. Applying Floria to a set of 109 deeply sequenced nanopore metagenomes took <20 min on average per sample and identified several species that have consistent strain heterogeneity. Applying Floria's short-read haplotyping to a longitudinal gut metagenomics dataset revealed a dynamic multi-strain Anaerostipes hadrus community with frequent strain loss and emergence events over 636 days. With Floria, accurate haplotyping of metagenomic datasets takes mere minutes on standard workstations, paving the way for extensive strain-level metagenomic analyses. AVAILABILITY AND IMPLEMENTATION Floria is available at https://github.com/bluenote-1577/floria, and the Floria-PL pipeline is available at https://github.com/jsgounot/Floria_analysis_workflow along with code for reproducing the benchmarks.
Affiliation(s)
- Jim Shaw
  - Department of Mathematics, University of Toronto, Toronto, Ontario, M5S 2E4, Canada
- Jean-Sebastien Gounot
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
- Hanrong Chen
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
- Niranjan Nagarajan
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
  - Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 117597, Republic of Singapore
- Yun William Yu
  - Department of Mathematics, University of Toronto, Toronto, Ontario, M5S 2E4, Canada
  - Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, 15213, United States
6
Moriel N, Memet E, Nitzan M. Optimal sequencing budget allocation for trajectory reconstruction of single cells. Bioinformatics 2024; 40:i446-i452. [PMID: 38940162; PMCID: PMC11211845; DOI: 10.1093/bioinformatics/btae258] [Indexed: 06/29/2024]
Abstract
BACKGROUND Charting cellular trajectories over gene expression is key to understanding dynamic cellular processes and their underlying mechanisms. While advances in single-cell RNA-sequencing technologies and computational methods have pushed forward the recovery of such trajectories, trajectory inference remains a challenge due to the noisy, sparse, and high-dimensional nature of single-cell data. This challenge can be alleviated by increasing either the number of cells sampled along the trajectory (breadth) or the sequencing depth, i.e. the number of reads captured per cell (depth). Generally, these two factors are coupled due to an inherent breadth-depth tradeoff that arises when the sequencing budget is constrained due to financial or technical limitations. RESULTS Here we study the optimal allocation of a fixed sequencing budget to optimize the recovery of trajectory attributes. Empirical results reveal that reconstruction accuracy of internal cell structure in expression space scales with the logarithm of either the breadth or depth of sequencing. We additionally observe a power law relationship between the optimal number of sampled cells and the corresponding sequencing budget. For linear trajectories, non-monotonicity in trajectory reconstruction across the breadth-depth tradeoff can impact downstream inference, such as expression pattern analysis along the trajectory. We demonstrate these results for five single-cell RNA-sequencing datasets encompassing differentiation of embryonic stem cells, pancreatic beta cells, hepatoblast and multipotent hematopoietic cells, as well as induced reprogramming of embryonic fibroblasts into neurons. By addressing the challenges of single-cell data, our study offers insights into maximizing the efficiency of cellular trajectory analysis through strategic allocation of sequencing resources.
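The breadth-depth tradeoff and the reported power-law scaling can be made concrete with a short sketch. It assumes the budget is counted in total reads (cells × reads per cell), and the helper names `allocations` and `fit_power_law` are hypothetical, not taken from the paper. The power-law fit is an ordinary least-squares line on log-log axes, and the numbers in the usage block are synthetic, not results from the study.

```python
import math


def allocations(total_reads, cell_counts, min_reads_per_cell=1):
    """Enumerate (cells, reads_per_cell) splits of a fixed read budget.

    Assumption (hypothetical model): budget = cells * reads_per_cell;
    any remainder from integer division is simply left unspent.
    """
    pairs = []
    for n_cells in cell_counts:
        depth = total_reads // n_cells  # reads available per cell (depth)
        if depth >= min_reads_per_cell:  # drop splits that are too shallow
            pairs.append((n_cells, depth))
    return pairs


def fit_power_law(budgets, optimal_cells):
    """Fit N_opt = c * B**alpha by least squares on log-log axes.

    Returns (c, alpha). Requires at least two distinct budgets.
    """
    xs = [math.log(b) for b in budgets]
    ys = [math.log(n) for n in optimal_cells]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx  # slope in log-log space is the exponent
    c = math.exp(mean_y - alpha * mean_x)  # intercept gives the prefactor
    return c, alpha


if __name__ == "__main__":
    # 50 million reads split across different numbers of cells
    print(allocations(50_000_000, [1_000, 5_000, 10_000, 50_000], min_reads_per_cell=2_000))
    # → [(1000, 50000), (5000, 10000), (10000, 5000)]
    # Synthetic points generated from N_opt = 2 * B**0.5 (illustrative only)
    budgets = [1e6, 1e7, 1e8]
    print(fit_power_law(budgets, [2 * b ** 0.5 for b in budgets]))  # approximately (2.0, 0.5)
```

Working on log-log axes turns a power law into a straight line, so the slope of an ordinary linear fit directly estimates the exponent.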
Affiliation(s)
- Noa Moriel
  - School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
- Edvin Memet
  - Department of Physics, Harvard University, Cambridge, MA 02138, United States
- Mor Nitzan
  - School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
  - Racah Institute of Physics, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
  - Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem 9112102, Israel
7
Edenhofer FC, Térmeg A, Ohnuki M, Jocher J, Kliesmete Z, Briem E, Hellmann I, Enard W. Generation and characterization of inducible KRAB-dCas9 iPSCs from primates for cross-species CRISPRi. iScience 2024; 27:110090. [PMID: 38947524; PMCID: PMC11214527; DOI: 10.1016/j.isci.2024.110090] [Received: 09/15/2023] [Revised: 03/28/2024] [Accepted: 05/21/2024] [Indexed: 07/02/2024]
Abstract
Comparisons of molecular phenotypes across primates provide unique information to understand human biology and evolution, and single-cell RNA-seq CRISPR interference (CRISPRi) screens are a powerful approach to analyze them. Here, we generate and validate three human, three gorilla, and two cynomolgus iPS cell lines that carry a dox-inducible KRAB-dCas9 construct at the AAVS1 locus. We show that despite variable expression levels of KRAB-dCas9 among lines, comparable downregulation of target genes and comparable phenotypic effects are observed in a single-cell RNA-seq CRISPRi screen. Hence, we provide valuable resources for performing and further extending CRISPRi in human and non-human primates.
Affiliation(s)
- Fiona C. Edenhofer
  - Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians-Universität München, 82152 Planegg, Germany
- Anita Térmeg
  - Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians-Universität München, 82152 Planegg, Germany
- Mari Ohnuki
  - Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians-Universität München, 82152 Planegg, Germany
  - Institute for the Advanced Study of Human Biology, Kyoto University, Kyoto 606-8501, Japan
  - Hakubi Center, Kyoto University, Kyoto 606-8501, Japan
- Jessica Jocher
  - Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians-Universität München, 82152 Planegg, Germany
- Zane Kliesmete
  - Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians-Universität München, 82152 Planegg, Germany
- Eva Briem
  - Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians-Universität München, 82152 Planegg, Germany
- Ines Hellmann
  - Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians-Universität München, 82152 Planegg, Germany
- Wolfgang Enard
  - Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians-Universität München, 82152 Planegg, Germany
8
Jia Y, Ma P, Yao Q. CellMarkerPipe: cell marker identification and evaluation pipeline in single cell transcriptomes. Sci Rep 2024; 14:13151. [PMID: 38849445; PMCID: PMC11161599; DOI: 10.1038/s41598-024-63492-z] [Received: 02/13/2024] [Accepted: 05/29/2024] [Indexed: 06/09/2024]
Abstract
Assessing marker genes from all cell clusters can be time-consuming and lack a systematic strategy. Streamlining this process through a unified computational platform that automates identification and benchmarking will greatly enhance efficiency and ensure a fair evaluation. We therefore developed a novel computational platform, cellMarkerPipe ( https://github.com/yao-laboratory/cellMarkerPipe ), for automated cell-type specific marker gene identification from scRNA-seq data, coupled with a comprehensive evaluation schema. CellMarkerPipe adaptively wraps around a collection of commonly used and state-of-the-art tools, including Seurat, COSG, SC3, SCMarker, COMET, and scGeneFit. Through rigorous testing across diverse samples, we found that SCMarker performs reliably overall in single marker gene selection, with COSG showing commendable speed and comparable efficacy. Furthermore, we demonstrate the pivotal role of our approach in real-world medical datasets. This general and open-source pipeline stands as a significant advancement in streamlining cell marker gene identification and evaluation, with broad applications in cellular biology and medical research.
Affiliation(s)
- Yinglu Jia
  - School of Computing, University of Nebraska Lincoln, 256 Avery Hall, Lincoln, NE, 68588, USA
  - Department of Chemistry, University of Nebraska Lincoln, Hamilton Hall, Lincoln, NE, 68588, USA
- Pengchong Ma
  - School of Computing, University of Nebraska Lincoln, 256 Avery Hall, Lincoln, NE, 68588, USA
- Qiuming Yao
  - School of Computing, University of Nebraska Lincoln, 256 Avery Hall, Lincoln, NE, 68588, USA
  - Nebraska Center for the Prevention of Obesity Diseases, 316C Leverton Hall, Lincoln, NE, 68583, USA
  - Nebraska Center for Virology, University of Nebraska, 4240 Fair St., Lincoln, NE, 68583, USA
9
Duo H, Li Y, Lan Y, Tao J, Yang Q, Xiao Y, Sun J, Li L, Nie X, Zhang X, Liang G, Liu M, Hao Y, Li B. Systematic evaluation with practical guidelines for single-cell and spatially resolved transcriptomics data simulation under multiple scenarios. Genome Biol 2024; 25:145. [PMID: 38831386; PMCID: PMC11149245; DOI: 10.1186/s13059-024-03290-y] [Received: 10/27/2023] [Accepted: 05/28/2024] [Indexed: 06/05/2024]
Abstract
BACKGROUND Single-cell RNA sequencing (scRNA-seq) and spatially resolved transcriptomics (SRT) have led to groundbreaking advancements in life sciences. To develop bioinformatics tools for scRNA-seq and SRT data and perform unbiased benchmarks, data simulation has been widely adopted by providing explicit ground truth and generating customized datasets. However, the performance of simulation methods under multiple scenarios has not been comprehensively assessed, making it challenging to choose suitable methods without practical guidelines. RESULTS We systematically evaluated 49 simulation methods developed for scRNA-seq and/or SRT data in terms of accuracy, functionality, scalability, and usability using 152 reference datasets derived from 24 platforms. SRTsim, scDesign3, ZINB-WaVE, and scDesign2 have the best accuracy performance across various platforms. Unexpectedly, some methods tailored to scRNA-seq data have potential compatibility for simulating SRT data. Lun, SPARSim, and scDesign3-tree outperform other methods under corresponding simulation scenarios. Phenopath, Lun, Simple, and MFA yield high scalability scores but they cannot generate realistic simulated data. Users should consider the trade-offs between method accuracy and scalability (or functionality) when making decisions. Additionally, execution errors are mainly caused by failed parameter estimations and appearance of missing or infinite values in calculations. We provide practical guidelines for method selection, a standard pipeline Simpipe ( https://github.com/duohongrui/simpipe ; https://doi.org/10.5281/zenodo.11178409 ), and an online tool Simsite ( https://www.ciblab.net/software/simshiny/ ) for data simulation. CONCLUSIONS No method performs best on all criteria, thus a good-yet-not-the-best method is recommended if it solves problems effectively and reasonably. 
Our comprehensive work provides crucial insights for developers on modeling gene expression data and facilitates the simulation process for users.
Affiliation(s)
- Hongrui Duo
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Yinghong Li
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, People's Republic of China
- Yang Lan
  - Institute of Pathology and Southwest Cancer Center, Southwest Hospital, Army Medical University, Chongqing, 400038, People's Republic of China
- Jingxin Tao
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Qingxia Yang
  - Zhejiang Provincial Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, People's Republic of China
- Yingxue Xiao
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Jing Sun
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Lei Li
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Xiner Nie
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400044, People's Republic of China
- Xiaoxi Zhang
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Guizhao Liang
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400044, People's Republic of China
- Mingwei Liu
  - Key Laboratory of Clinical Laboratory Diagnostics, College of Laboratory Medicine, Chongqing Medical University, Chongqing, 400016, People's Republic of China
- Youjin Hao
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Bo Li
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
10
Singh A, Khiabanian H. Feature selection followed by a novel residuals-based normalization simplifies and improves single-cell gene expression analysis. bioRxiv 2024:2023.03.02.530891. [PMID: 38328133; PMCID: PMC10849523; DOI: 10.1101/2023.03.02.530891] [Indexed: 02/09/2024]
Abstract
Normalization is a crucial step in the analysis of single-cell RNA-sequencing (scRNA-seq) count data. Its principal objectives are to reduce the systematic biases primarily introduced through technical sources and to transform the data to make it more amenable to established statistical frameworks. In the standard workflows, normalization is followed by feature selection to identify highly variable genes (HVGs) that capture most of the biologically meaningful variation across the cells. Here, we make the case for a revised workflow by proposing a simple feature selection method and showing that we can perform feature selection before normalization by relying on observed counts. We highlight that the feature selection step can be used not only to select HVGs but also to identify stable genes. We further propose a novel residuals-based normalization method, incorporating a variance-stabilizing transformation, that relies on the stable genes to inform the reduction of systematic biases. We demonstrate significant improvements in downstream clustering analyses by applying our proposed methods to datasets with known biological truth as well as to simulated count datasets. We have implemented this novel workflow for analyzing high-throughput scRNA-seq data in an R package called Piccolo.
Affiliation(s)
- Amartya Singh
  - Center for Systems and Computational Biology, Rutgers Cancer Institute of New Jersey, Rutgers University, New Brunswick, New Jersey
- Hossein Khiabanian
  - Center for Systems and Computational Biology, Rutgers Cancer Institute of New Jersey, Rutgers University, New Brunswick, New Jersey
  - Department of Pathology and Laboratory Medicine, Rutgers Robert Wood Johnson Medical School, Rutgers University, New Brunswick, New Jersey
11
Cuevas-Diaz Duran R, Wei H, Wu J. Data normalization for addressing the challenges in the analysis of single-cell transcriptomic datasets. BMC Genomics 2024; 25:444. [PMID: 38711017; PMCID: PMC11073985; DOI: 10.1186/s12864-024-10364-5] [Received: 09/02/2023] [Accepted: 04/29/2024] [Indexed: 05/08/2024]
Abstract
BACKGROUND Normalization is a critical step in the analysis of single-cell RNA-sequencing (scRNA-seq) datasets. Its main goal is to make gene counts comparable within and between cells. To do so, normalization methods must account for technical and biological variability. Numerous normalization methods have been developed addressing different sources of dispersion and making specific assumptions about the count data. MAIN BODY The selection of a normalization method has a direct impact on downstream analysis, for example differential gene expression and cluster identification. Thus, the objective of this review is to guide the reader in making an informed decision on the most appropriate normalization method to use. To this aim, we first give an overview of the different single cell sequencing platforms and methods commonly used, including isolation and library preparation protocols. Next, we discuss the inherent sources of variability of scRNA-seq datasets. We describe the categories of normalization methods and include examples of each. We also delineate imputation and batch-effect correction methods. Furthermore, we describe data-driven metrics commonly used to evaluate the performance of normalization methods. We also discuss common scRNA-seq methods and toolkits used for integrated data analysis. CONCLUSIONS According to the correction performed, normalization methods can be broadly classified as within-sample and between-sample algorithms. Moreover, with respect to the mathematical model used, normalization methods can further be classified into global scaling methods, generalized linear models, mixed methods, and machine learning-based methods. Each of these methods has pros and cons and makes different statistical assumptions. However, no single normalization method performs best in all cases. Instead, metrics such as silhouette width, the K-nearest neighbor batch-effect test, or highly variable genes are recommended to assess the performance of normalization methods.
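As an illustration of the simplest category named above, global scaling, the sketch below divides each cell's counts by its library size, rescales to a common total, and applies log1p. The target total of 10,000 and the function name `global_scale_log` are assumptions for illustration; this is a generic sketch, not a method proposed in the review.

```python
import numpy as np


def global_scale_log(counts, target_sum=1e4):
    """Global-scaling normalization of a cells x genes count matrix.

    Each cell (row) is divided by its library size (total counts),
    multiplied by `target_sum` (an assumed default), and log1p-transformed.
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=1, keepdims=True)
    # Leave all-zero cells as zeros instead of dividing by zero
    lib_size[lib_size == 0] = 1.0
    return np.log1p(counts / lib_size * target_sum)


if __name__ == "__main__":
    # Two cells with the same composition but 10x different sequencing depth
    raw = np.array([[1, 3, 0], [10, 30, 0]])
    norm = global_scale_log(raw)
    print(np.allclose(norm[0], norm[1]))  # True: the depth difference is removed
```

The method is "global" because every gene in a given cell is divided by the same scale factor; it does not model gene-specific dispersion, which is what the generalized linear model category addresses.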
Affiliation(s)
- Raquel Cuevas-Diaz Duran
- Tecnologico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Monterrey, Nuevo Leon, 64710, Mexico.
- Haichao Wei
- The Vivian L. Smith Department of Neurosurgery, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- Center for Stem Cell and Regenerative Medicine, UT Brown Foundation Institute of Molecular Medicine, Houston, TX, 77030, USA
- Jiaqian Wu
- The Vivian L. Smith Department of Neurosurgery, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA.
- Center for Stem Cell and Regenerative Medicine, UT Brown Foundation Institute of Molecular Medicine, Houston, TX, 77030, USA.
- MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX, 77030, USA.
12
Park Y, Hauschild AC. The effect of data transformation on low-dimensional integration of single-cell RNA-seq. BMC Bioinformatics 2024; 25:171. [PMID: 38689234 PMCID: PMC11059821 DOI: 10.1186/s12859-024-05788-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Accepted: 04/16/2024] [Indexed: 05/02/2024] [Open access]
Abstract
BACKGROUND Recent developments in single-cell RNA sequencing have opened up a multitude of possibilities to study tissues at the level of cellular populations. However, the heterogeneity in single-cell sequencing data necessitates appropriate procedures to adjust for technological limitations and various sources of noise when integrating datasets from different studies. While many analysis procedures employ various preprocessing steps, they often overlook the importance of selecting and optimizing the employed data transformation methods. RESULTS This work investigates data transformation approaches used in single-cell clustering analysis tools and their effects on batch integration analysis. In particular, we compare 16 transformations and their impact on the low-dimensional representations, aiming to reduce the batch effect and integrate multiple single-cell sequencing datasets. Our results show that data transformations strongly influence the results of single-cell clustering in low-dimensional data spaces, such as those generated by UMAP or PCA. Moreover, these changes in low-dimensional space also significantly affect trajectory analysis using multiple datasets. However, the performance of the data transformations varies greatly across datasets, and the optimal method was different for each dataset. Additionally, we explored how data transformation impacts the analysis of deep feature encodings using deep neural network-based models, including autoencoder-based models and prototypical networks. Data transformation also strongly affects the outcome of deep neural network models. CONCLUSIONS Our findings suggest that the batch effect and noise in integrative analysis are highly influenced by data transformation. Low-dimensional features can integrate different batches well when proper data transformation is applied. Furthermore, we found that the batch mixing score on low-dimensional space can guide the selection of the optimal data transformation.
In conclusion, data preprocessing is one of the most crucial analysis steps and needs to be cautiously considered in the integrative analysis of multiple scRNA-seq datasets.
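The abstract does not specify how its batch mixing score is computed. As an assumption, the sketch below uses a simple neighborhood proxy: for each cell in a low-dimensional embedding (for example PCA coordinates), the fraction of its k nearest neighbors that come from a different batch, averaged over all cells. The function name `knn_batch_mixing` is hypothetical, and this is not the score used in the study.

```python
import numpy as np

def knn_batch_mixing(embedding, batches, k=5):
    """Mean fraction of each cell's k nearest neighbors that belong to another batch.

    embedding: cells x dims array (e.g. PCA coordinates).
    batches:   length-n array of batch labels.
    Returns a value in [0, 1]; 0 means every neighborhood is batch-pure.
    """
    X = np.asarray(embedding, dtype=float)
    b = np.asarray(batches)
    sq = (X ** 2).sum(axis=1)
    # Pairwise squared Euclidean distances via |x|^2 + |y|^2 - 2 x.y
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)          # a cell is not its own neighbor
    nn = np.argsort(d2, axis=1)[:, :k]    # indices of the k nearest neighbors
    other_batch = b[nn] != b[:, None]     # True where the neighbor is from a different batch
    return float(other_batch.mean())

# Two well-separated batches: neighborhoods are batch-pure.
separated = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
labels = np.array(["a", "a", "a", "b", "b", "b"])
print(knn_batch_mixing(separated, labels, k=2))           # 0.0

# Alternating batches along a line: neighborhoods are mostly mixed.
line = np.array([[0.0], [1.0], [2.0], [3.0]])
print(knn_batch_mixing(line, ["a", "b", "a", "b"], k=2))  # 0.75
```

Under this proxy, one would compute the embedding after each candidate transformation and compare the scores. A high score alone cannot distinguish good integration from over-correction that merges distinct cell types.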
Affiliation(s)
- Youngjun Park
- Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
- International Max Planck Research Schools for Genome Science, Georg-August-Universität Göttingen, Göttingen, Germany
- Anne-Christin Hauschild
- Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany.
- Campus-Institute Data Science (CIDAS), Georg-August-Universität Göttingen, Göttingen, Germany.
13
Kim H, Chang W, Chae SJ, Park JE, Seo M, Kim JK. scLENS: data-driven signal detection for unbiased scRNA-seq data analysis. Nat Commun 2024; 15:3575. [PMID: 38678050 DOI: 10.1038/s41467-024-47884-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Accepted: 04/14/2024] [Indexed: 04/29/2024] [Open access]
Abstract
High dimensionality and noise have limited the new biological insights that can be discovered in scRNA-seq data. While dimensionality reduction tools have been developed to extract biological signals from the data, they often require manual determination of signal dimension, introducing user bias. Furthermore, a common data preprocessing method, log normalization, can unintentionally distort signals in the data. Here, we develop scLENS, a dimensionality reduction tool that circumvents the long-standing issues of signal distortion and manual input. Specifically, we identify the primary cause of signal distortion during log normalization and effectively address it by uniformizing cell vector lengths with L2 normalization. Furthermore, we utilize random matrix theory-based noise filtering and a signal robustness test to enable data-driven determination of the threshold for signal dimensions. Our method outperforms 11 widely used dimensionality reduction tools and performs particularly well for challenging scRNA-seq datasets with high sparsity and variability. To facilitate the use of scLENS, we provide a user-friendly package that automates accurate signal detection of scRNA-seq data without manual time-consuming tuning.
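A minimal sketch of the L2 step described above: after log normalization, each cell vector is rescaled to unit Euclidean length so that all cells share the same vector length. This is not the scLENS package; its random matrix theory-based noise filtering and signal robustness test are omitted, the exact preprocessing order used by scLENS is an assumption here, and `l2_normalize_cells` is a hypothetical helper name.

```python
import numpy as np

def l2_normalize_cells(X, eps=1e-12):
    """Rescale each cell (row) to unit Euclidean (L2) length."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, eps)      # eps leaves all-zero cells at zero

# Hypothetical counts: the second cell has 10x the counts of the first.
counts = np.array([[3, 4, 0],
                   [30, 40, 0],
                   [0, 0, 5]])
logged = np.log1p(counts)                  # simplified log step (no library-size scaling)
print(np.linalg.norm(logged, axis=1))      # lengths differ after log1p
unit = l2_normalize_cells(logged)
print(np.linalg.norm(unit, axis=1))        # every length is 1 after L2 normalization
```

The first two cells do not become identical, because log1p is nonlinear and changes the direction of each vector; the L2 step equalizes only vector length.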
Affiliation(s)
- Hyun Kim
- Biomedical Mathematics Group, Pioneer Research Center for Mathematical and Computational Sciences, Institute for Basic Science, Daejeon, 34126, Republic of Korea
- Won Chang
- Division of Statistics and Data Science, University of Cincinnati, Cincinnati, OH, 45221, USA
- Seok Joo Chae
- Biomedical Mathematics Group, Pioneer Research Center for Mathematical and Computational Sciences, Institute for Basic Science, Daejeon, 34126, Republic of Korea
- Department of Mathematical Sciences, KAIST, Daejeon, 34141, Republic of Korea
- Jong-Eun Park
- Graduate School of Medical Science and Engineering, KAIST, Daejeon, 34141, Republic of Korea
- Minseok Seo
- Department of Computer and Information Science, Korea University, Sejong, 30019, Republic of Korea
- Jae Kyoung Kim
- Biomedical Mathematics Group, Pioneer Research Center for Mathematical and Computational Sciences, Institute for Basic Science, Daejeon, 34126, Republic of Korea.
- Department of Mathematical Sciences, KAIST, Daejeon, 34141, Republic of Korea.
14
Tan CL, Lindner K, Boschert T, Meng Z, Rodriguez Ehrenfried A, De Roia A, Haltenhof G, Faenza A, Imperatore F, Bunse L, Lindner JM, Harbottle RP, Ratliff M, Offringa R, Poschke I, Platten M, Green EW. Prediction of tumor-reactive T cell receptors from scRNA-seq data for personalized T cell therapy. Nat Biotechnol 2024:10.1038/s41587-024-02161-y. [PMID: 38454173 DOI: 10.1038/s41587-024-02161-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Accepted: 02/01/2024] [Indexed: 03/09/2024]
Abstract
The identification of patient-derived, tumor-reactive T cell receptors (TCRs) as a basis for personalized transgenic T cell therapies remains a time- and cost-intensive endeavor. Current approaches to identify tumor-reactive TCRs analyze tumor mutations to predict T cell activating (neo)antigens and use these to either enrich tumor infiltrating lymphocyte (TIL) cultures or validate individual TCRs for transgenic autologous therapies. Here we combined high-throughput TCR cloning and reactivity validation to train predicTCR, a machine learning classifier that identifies individual tumor-reactive TILs in an antigen-agnostic manner based on single-TIL RNA sequencing. PredicTCR identifies tumor-reactive TCRs in TILs from diverse cancers better than previous gene set enrichment-based approaches, increasing specificity and sensitivity (geometric mean) from 0.38 to 0.74. By predicting tumor-reactive TCRs in a matter of days, TCR clonotypes can be prioritized to accelerate the manufacture of personalized T cell therapies.
Affiliation(s)
- C L Tan
- CCU Neuroimmunology and Brain Tumor Immunology, German Cancer Research Center, Heidelberg, Germany
- German Cancer Consortium, Core Center Heidelberg, Heidelberg, Germany
- Department of Neurology, Medical Faculty Mannheim, Mannheim Center for Translational Neuroscience, Heidelberg University, Mannheim, Germany
- Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
- K Lindner
- CCU Neuroimmunology and Brain Tumor Immunology, German Cancer Research Center, Heidelberg, Germany
- German Cancer Consortium, Core Center Heidelberg, Heidelberg, Germany
- Department of Neurology, Medical Faculty Mannheim, Mannheim Center for Translational Neuroscience, Heidelberg University, Mannheim, Germany
- Immune Monitoring Unit, National Center for Tumor Diseases, Heidelberg, Germany
- T Boschert
- CCU Neuroimmunology and Brain Tumor Immunology, German Cancer Research Center, Heidelberg, Germany
- German Cancer Consortium, Core Center Heidelberg, Heidelberg, Germany
- Department of Neurology, Medical Faculty Mannheim, Mannheim Center for Translational Neuroscience, Heidelberg University, Mannheim, Germany
- Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
- Helmholtz Institute for Translational Oncology, Mainz, Germany
- Z Meng
- Department of General, Visceral and Transplantation Surgery, University Hospital Heidelberg, Heidelberg, Germany
- Division of Molecular Oncology of Gastrointestinal Tumors, German Cancer Research Center, Heidelberg, Germany
- Sino-German Laboratory of Personalized Medicine for Pancreatic Cancer, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- A Rodriguez Ehrenfried
- Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
- Helmholtz Institute for Translational Oncology, Mainz, Germany
- Division of Molecular Oncology of Gastrointestinal Tumors, German Cancer Research Center, Heidelberg, Germany
- A De Roia
- Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
- DNA Vector Laboratory, German Cancer Research Center, Heidelberg, Germany
- G Haltenhof
- CCU Neuroimmunology and Brain Tumor Immunology, German Cancer Research Center, Heidelberg, Germany
- Department of Neurology, Medical Faculty Mannheim, Mannheim Center for Translational Neuroscience, Heidelberg University, Mannheim, Germany
- L Bunse
- CCU Neuroimmunology and Brain Tumor Immunology, German Cancer Research Center, Heidelberg, Germany
- German Cancer Consortium, Core Center Heidelberg, Heidelberg, Germany
- Department of Neurology, Medical Faculty Mannheim, Mannheim Center for Translational Neuroscience, Heidelberg University, Mannheim, Germany
- R P Harbottle
- DNA Vector Laboratory, German Cancer Research Center, Heidelberg, Germany
- M Ratliff
- Department of Neurosurgery, University Hospital Mannheim, Mannheim, Germany
- R Offringa
- Department of General, Visceral and Transplantation Surgery, University Hospital Heidelberg, Heidelberg, Germany
- Division of Molecular Oncology of Gastrointestinal Tumors, German Cancer Research Center, Heidelberg, Germany
- Sino-German Laboratory of Personalized Medicine for Pancreatic Cancer, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- I Poschke
- CCU Neuroimmunology and Brain Tumor Immunology, German Cancer Research Center, Heidelberg, Germany
- German Cancer Consortium, Core Center Heidelberg, Heidelberg, Germany
- Immune Monitoring Unit, National Center for Tumor Diseases, Heidelberg, Germany
- M Platten
- CCU Neuroimmunology and Brain Tumor Immunology, German Cancer Research Center, Heidelberg, Germany.
- German Cancer Consortium, Core Center Heidelberg, Heidelberg, Germany.
- Department of Neurology, Medical Faculty Mannheim, Mannheim Center for Translational Neuroscience, Heidelberg University, Mannheim, Germany.
- Immune Monitoring Unit, National Center for Tumor Diseases, Heidelberg, Germany.
- Helmholtz Institute for Translational Oncology, Mainz, Germany.
- German Cancer Research Center-Hector Cancer Institute at the Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany.
- E W Green
- CCU Neuroimmunology and Brain Tumor Immunology, German Cancer Research Center, Heidelberg, Germany.
- German Cancer Consortium, Core Center Heidelberg, Heidelberg, Germany.
- Department of Neurology, Medical Faculty Mannheim, Mannheim Center for Translational Neuroscience, Heidelberg University, Mannheim, Germany.
15
Mihai IS, Chafle S, Henriksson J. Representing and extracting knowledge from single-cell data. Biophys Rev 2024; 16:29-56. [PMID: 38495441 PMCID: PMC10937862 DOI: 10.1007/s12551-023-01091-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/22/2023] [Accepted: 06/28/2023] [Indexed: 03/19/2024] [Open access]
Abstract
Single-cell analysis is currently one of the highest-resolution techniques for studying biology. The large, complex datasets it has generated have spurred numerous developments in computational biology, in particular the use of advanced statistics and machine learning. This review attempts to explain the deeper theoretical concepts that underpin current state-of-the-art analysis methods. Single-cell analysis is covered from the cell, through instruments, to current and upcoming models. The aim of this review is to disseminate concepts that are not yet in common use, especially from topology and generative processes, and to show how new statistical models can be developed to capture more of biology. This raises epistemological questions regarding our ontology and models, and some pointers are given on how natural language processing (NLP) may help overcome our cognitive limitations in understanding single-cell data.
Affiliation(s)
- Ionut Sebastian Mihai
- The Laboratory for Molecular Infection Medicine Sweden (MIMS), Umeå, Sweden
- Umeå Centre for Microbial Research (UCMR), Department of Molecular Biology, Umeå University, Umeå, Sweden
- Industrial Doctoral School, Umeå University, Umeå, Sweden
- Sarang Chafle
- The Laboratory for Molecular Infection Medicine Sweden (MIMS), Umeå, Sweden
- Umeå Centre for Microbial Research (UCMR), Department of Molecular Biology, Umeå University, Umeå, Sweden
- Johan Henriksson
- The Laboratory for Molecular Infection Medicine Sweden (MIMS), Umeå, Sweden
- Umeå Centre for Microbial Research (UCMR), Department of Molecular Biology, Umeå University, Umeå, Sweden
16
Yao Q, Jia Y, Ma P. cellMarkerPipe: Cell Marker Identification and Evaluation Pipeline in Single Cell Transcriptomes. RESEARCH SQUARE 2024:rs.3.rs-3844718. [PMID: 38313296 PMCID: PMC10836098 DOI: 10.21203/rs.3.rs-3844718/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2024]
Abstract
Assessing marker genes from all cell clusters can be time-consuming and lacks a systematic strategy. Streamlining this process through a unified computational platform that automates identification and benchmarking would greatly enhance efficiency and ensure a fair evaluation. We therefore developed a novel computational platform, cellMarkerPipe (https://github.com/yao-laboratory/cellMarkerPipe), for automated cell-type-specific marker gene identification from scRNA-seq data, coupled with a comprehensive evaluation schema. CellMarkerPipe adaptively wraps a collection of commonly used and state-of-the-art tools, including Seurat, COSG, SC3, SCMarker, COMET, and scGeneFit. Rigorous testing across diverse samples showed that SCMarker performs reliably overall in single marker gene selection, with COSG showing commendable speed and comparable efficacy. Furthermore, we demonstrate the pivotal role of our approach in real-world medical datasets. This general, open-source pipeline is a significant advance in streamlining cell marker gene identification and evaluation, with broad applications in cellular biology and medical research.
17
Atamian A, Birtele M, Hosseini N, Nguyen T, Seth A, Del Dosso A, Paul S, Tedeschi N, Taylor R, Coba MP, Samarasinghe R, Lois C, Quadrato G. Human cerebellar organoids with functional Purkinje cells. Cell Stem Cell 2024; 31:39-51.e6. [PMID: 38181749 DOI: 10.1016/j.stem.2023.11.013] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/29/2023] [Revised: 05/30/2023] [Accepted: 11/30/2023] [Indexed: 01/07/2024]
Abstract
Research on human cerebellar development and disease has been hampered by the lack of a human cell-based system that recapitulates the cellular diversity and functional features of the human cerebellum. Here, we report a human organoid model (human cerebellar organoids [hCerOs]) capable of developing the complex cellular diversity of the fetal cerebellum, including a human-specific rhombic lip progenitor population that had never been generated in vitro prior to this study. By 2 months, hCerOs form distinct cytoarchitectural features, including laminar organization, and create functional connections between inhibitory and excitatory neurons that display coordinated network activity. Long-term culture of hCerOs allows healthy survival and maturation of Purkinje cells that display molecular and electrophysiological hallmarks of their in vivo counterparts, addressing a long-standing challenge in the field. This study therefore provides a physiologically relevant, all-human model system to elucidate the cell-type-specific mechanisms governing cerebellar development and disease.
Affiliation(s)
- Alexander Atamian
- Department of Stem Cell Biology and Regenerative Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA; Eli and Edythe Broad CIRM Center for Regenerative Medicine and Stem Cell Research at USC, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Marcella Birtele
- Department of Stem Cell Biology and Regenerative Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA; Eli and Edythe Broad CIRM Center for Regenerative Medicine and Stem Cell Research at USC, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Negar Hosseini
- Department of Stem Cell Biology and Regenerative Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA; Eli and Edythe Broad CIRM Center for Regenerative Medicine and Stem Cell Research at USC, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Tuan Nguyen
- Department of Stem Cell Biology and Regenerative Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA; Eli and Edythe Broad CIRM Center for Regenerative Medicine and Stem Cell Research at USC, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Anoothi Seth
- Department of Stem Cell Biology and Regenerative Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA; Eli and Edythe Broad CIRM Center for Regenerative Medicine and Stem Cell Research at USC, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Ashley Del Dosso
- Department of Stem Cell Biology and Regenerative Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA; Eli and Edythe Broad CIRM Center for Regenerative Medicine and Stem Cell Research at USC, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Sandeep Paul
- Spatial Genomics, 145 Vista Avenue Suite 111, Pasadena, CA 91107, USA
- Neil Tedeschi
- Spatial Genomics, 145 Vista Avenue Suite 111, Pasadena, CA 91107, USA
- Ryan Taylor
- Spatial Genomics, 145 Vista Avenue Suite 111, Pasadena, CA 91107, USA
- Marcelo P Coba
- Department of Psychiatry and Behavioral Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA; Zilkha Neurogenetic Institute, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA; Department of Physiology and Neuroscience, Keck School of Medicine, University of Southern California, 1501 San Pablo Street, Los Angeles, CA 90033, USA
- Ranmal Samarasinghe
- Department of Clinical Neurophysiology and Neurology, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Carlos Lois
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- Giorgia Quadrato
- Department of Stem Cell Biology and Regenerative Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA; Eli and Edythe Broad CIRM Center for Regenerative Medicine and Stem Cell Research at USC, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA.
18
Addala V, Newell F, Pearson JV, Redwood A, Robinson BW, Creaney J, Waddell N. Computational immunogenomic approaches to predict response to cancer immunotherapies. Nat Rev Clin Oncol 2024; 21:28-46. [PMID: 37907723 DOI: 10.1038/s41571-023-00830-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Accepted: 10/03/2023] [Indexed: 11/02/2023]
Abstract
Cancer immunogenomics is an emerging field that bridges genomics and immunology. The establishment of large-scale genomic collaborative efforts, along with the development of new single-cell transcriptomic techniques and multi-omics approaches, has enabled characterization of the mutational and transcriptional profiles of many cancer types and helped to identify clinically actionable alterations as well as predictive and prognostic biomarkers. Researchers have developed computational approaches and machine learning algorithms to accurately obtain clinically useful information from genomic and transcriptomic sequencing data from bulk tissue or single cells and to explore tumours and their microenvironment. The rapid growth in sequencing and computational approaches has created an unmet need to understand their true potential and limitations for improving the management of patients with cancer who are receiving immunotherapies. In this Review, we describe the computational approaches currently available to analyse bulk tissue and single-cell sequencing data from cancer, stromal and immune cells, as well as how best to select the most appropriate tool to address various clinical questions and, ultimately, improve patient outcomes.
Affiliation(s)
- Venkateswar Addala
- Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia.
- Faculty of Medicine, The University of Queensland, Brisbane, Queensland, Australia.
- Felicity Newell
- Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- John V Pearson
- Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- Alec Redwood
- National Centre for Asbestos Related Diseases, University of Western Australia, Perth, Western Australia, Australia
- Institute of Respiratory Health, Perth, Western Australia, Australia
- School of Biomedical Science, University of Western Australia, Perth, Western Australia, Australia
- Bruce W Robinson
- National Centre for Asbestos Related Diseases, University of Western Australia, Perth, Western Australia, Australia
- Institute of Respiratory Health, Perth, Western Australia, Australia
- Department of Respiratory Medicine, Sir Charles Gairdner Hospital, Perth, Western Australia, Australia
- Medical School, University of Western Australia, Perth, Western Australia, Australia
- Jenette Creaney
- National Centre for Asbestos Related Diseases, University of Western Australia, Perth, Western Australia, Australia
- Institute of Respiratory Health, Perth, Western Australia, Australia
- School of Biomedical Science, University of Western Australia, Perth, Western Australia, Australia
- Department of Respiratory Medicine, Sir Charles Gairdner Hospital, Perth, Western Australia, Australia
- Nicola Waddell
- Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia.
- Faculty of Medicine, The University of Queensland, Brisbane, Queensland, Australia.
19
Møller AF, Madsen JGS. JOINTLY: interpretable joint clustering of single-cell transcriptomes. Nat Commun 2023; 14:8473. [PMID: 38123569 PMCID: PMC10733431 DOI: 10.1038/s41467-023-44279-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 12/06/2023] [Indexed: 12/23/2023] [Open access]
Abstract
Single-cell and single-nucleus RNA-sequencing (sxRNA-seq) is increasingly being used to characterise the transcriptomic state of cell types at homeostasis, during development and in disease. However, this is a challenging task, as biological effects can be masked by technical variation. Here, we present JOINTLY, an algorithm enabling joint clustering of sxRNA-seq datasets across batches. JOINTLY performs on par or better than state-of-the-art batch integration methods in clustering tasks and outperforms other intrinsically interpretable methods. We demonstrate that JOINTLY is robust against over-correction while retaining subtle cell state differences between biological conditions and highlight how the interpretation of JOINTLY can be used to annotate cell types and identify active signalling programs across cell types and pseudo-time. Finally, we use JOINTLY to construct a reference atlas of white adipose tissue (WATLAS), an expandable and comprehensive community resource, in which we describe four adipocyte subpopulations and map compositional changes in obesity and between depots.
Affiliation(s)
- Andreas Fønss Møller
- Institute of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
- Sino-Danish College (SDC), University of Chinese Academy of Sciences, Beijing, China
- Jesper Grud Skat Madsen
- Institute of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark.
- Institute of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark.
- Center for Functional Genomics and Tissue Plasticity (ATLAS), Odense M, 5230, Denmark.
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.
20
Leduc A, Harens H, Slavov N. Modeling and interpretation of single-cell proteogenomic data. ARXIV 2023:arXiv:2308.07465v2. [PMID: 37645043 PMCID: PMC10462161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]
Abstract
Biological functions stem from coordinated interactions among proteins, nucleic acids and small molecules. Mass spectrometry technologies for reliable, high-throughput single-cell proteomics will add a new modality to genomics and enable data-driven modeling of the molecular mechanisms coordinating proteins and nucleic acids at single-cell resolution. Realizing this potential requires estimating the reliability of measurements and computational analysis so that models can distinguish biological regulation from technical artifacts. We highlight different measurement modes that can support single-cell proteogenomic analysis and how to estimate their reliability. We then discuss approaches for developing both abstract and mechanistic models that aim to biologically interpret the measured differences across modalities, including specific applications to directed stem cell differentiation and to inferring protein interactions in cancer cells from the buffering of DNA copy-number variations. Single-cell proteogenomic data will support mechanistic models of direct molecular interactions that will provide generalizable and predictive representations of biological systems.
Affiliation(s)
- Andrew Leduc
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, and Barnett Institute, Northeastern University, Boston, MA 02115, USA
- Hannah Harens
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, and Barnett Institute, Northeastern University, Boston, MA 02115, USA
- Nikolai Slavov
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, and Barnett Institute, Northeastern University, Boston, MA 02115, USA
- Parallel Squared Technology Institute, Watertown, MA 02472, USA
21
Gorin G, Yoshida S, Pachter L. Assessing Markovian and Delay Models for Single-Nucleus RNA Sequencing. Bull Math Biol 2023; 85:114. [PMID: 37828255 DOI: 10.1007/s11538-023-01213-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Accepted: 09/11/2023] [Indexed: 10/14/2023]
Abstract
The serial nature of reactions involved in the RNA life-cycle motivates the incorporation of delays in models of transcriptional dynamics. The models couple a transcriptional process to a fairly general set of delayed monomolecular reactions with no feedback. We provide numerical strategies for calculating the RNA copy number distributions induced by these models, and solve several systems with splicing, degradation, and catalysis. An analysis of single-cell and single-nucleus RNA sequencing data using these models reveals that the kinetics of nuclear export do not appear to require invocation of a non-Markovian waiting time.
Affiliation(s)
- Gennady Gorin
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, 91125, USA
- Shawn Yoshida
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, 91125, USA
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA
- Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA.
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, 91125, USA.
22
Chari T, Gorin G, Pachter L. Biophysically Interpretable Inference of Cell Types from Multimodal Sequencing Data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.09.17.558131. [PMID: 37745403 PMCID: PMC10516047 DOI: 10.1101/2023.09.17.558131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
Multimodal, single-cell genomics technologies enable simultaneous capture of multiple facets of DNA and RNA processing in the cell. This creates opportunities for transcriptome-wide, mechanistic studies of cellular processing in heterogeneous cell types, with applications ranging from inferring kinetic differences between cells, to the role of stochasticity in driving heterogeneity. However, current methods for determining cell types or 'clusters' present in multimodal data often rely on ad hoc or independent treatment of modalities, and assumptions ignoring inherent properties of the count data. To enable interpretable and consistent cell cluster determination from multimodal data, we present meK-Means (mechanistic K-Means) which integrates modalities and learns underlying, shared biophysical states through a unifying model of transcription. In particular, we demonstrate how meK-Means can be used to cluster cells from unspliced and spliced mRNA count modalities. By utilizing the causal, physical relationships underlying these modalities, we identify shared transcriptional kinetics across cells, which induce the observed gene expression profiles, and provide an alternative definition for 'clusters' through the governing parameters of cellular processes.
Affiliation(s)
- Tara Chari
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- Gennady Gorin
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California
23
Wong M, Wei Y, Ho YC. Single-cell multiomic understanding of HIV-1 reservoir at epigenetic, transcriptional, and protein levels. Curr Opin HIV AIDS 2023; 18:246-256. [PMID: 37535039 PMCID: PMC10442869 DOI: 10.1097/coh.0000000000000809] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/04/2023]
Abstract
PURPOSE OF REVIEW: The success of HIV-1 eradication strategies relies on in-depth understanding of HIV-1-infected cells. However, HIV-1-infected cells are extremely heterogeneous and rare. Single-cell multiomic approaches resolve the heterogeneity and rarity of HIV-1-infected cells.
RECENT FINDINGS: Advancement in single-cell multiomic approaches enabled HIV-1 reservoir profiling across the epigenetic (ATAC-seq), transcriptional (RNA-seq), and protein levels (CITE-seq). Using HIV-1 RNA as a surrogate, ECCITE-seq identified enrichment of HIV-1-infected cells in clonally expanded cytotoxic CD4+ T cells. Using HIV-1 DNA PCR-activated microfluidic sorting, FIND-seq captured the bulk transcriptome of HIV-1 DNA+ cells. Using targeted HIV-1 DNA amplification, PheP-seq identified surface protein expression of intact versus defective HIV-1-infected cells. Using ATAC-seq to identify HIV-1 DNA, ASAP-seq captured transcription factor activity and surface protein expression of HIV-1 DNA+ cells. Combining HIV-1 mapping by ATAC-seq and HIV-1 RNA mapping by RNA-seq, DOGMA-seq captured the epigenetic, transcriptional, and surface protein expression of latent and transcriptionally active HIV-1-infected cells. To identify reproducible biological insights and authentic HIV-1-infected cells and avoid false-positive discovery of artifacts, we reviewed current practices of single-cell multiomic experimental design and bioinformatic analysis.
SUMMARY: Single-cell multiomic approaches may identify innovative mechanisms of HIV-1 persistence, nominate therapeutic strategies, and accelerate discoveries.
Affiliation(s)
- Michelle Wong
- Department of Microbial Pathogenesis, Yale University School of Medicine, New Haven, Connecticut, USA
24
Lause J, Ziegenhain C, Hartmanis L, Berens P, Kobak D. Compound models and Pearson residuals for normalization of single-cell RNA-seq data without UMIs. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.02.551637. [PMID: 37577688 PMCID: PMC10418209 DOI: 10.1101/2023.08.02.551637] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
Before downstream analysis can reveal biological signals in single-cell RNA sequencing data, normalization and variance stabilization are required to remove technical noise. Recently, Pearson residuals based on negative binomial models have been suggested as an efficient normalization approach. These methods were developed for UMI-based sequencing protocols, where unique molecular identifiers (UMIs) help to remove PCR amplification noise by keeping track of the original molecules. In contrast, full-length protocols such as Smart-seq2 lack UMIs and retain amplification noise, making negative binomial models inapplicable. Here, we extend Pearson residuals to such read count data by modeling them as a compound process: we assume that the captured RNA molecules follow the negative binomial distribution, but are replicated according to an amplification distribution. Based on this model, we introduce compound Pearson residuals and show that they can be analytically obtained without explicit knowledge of the amplification distribution. Further, we demonstrate that compound Pearson residuals lead to a biologically meaningful gene selection and low-dimensional embeddings of complex Smart-seq2 datasets. Finally, we empirically study amplification distributions across several sequencing protocols, and suggest that they can be described by a broken power law. We show that the resulting compound distribution captures overdispersion and zero-inflation patterns characteristic of read count data. In summary, compound Pearson residuals provide an efficient and effective way to normalize read count data based on simple mechanistic assumptions.
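In the UMI-based setting this paper starts from, analytic Pearson residuals under a negative binomial model take only a few lines to compute. This sketch makes three assumptions. The expected count follows the offset model mu_cg = n_c p_g, that is, cell depth times the gene's share of all counts. A single overdispersion theta is shared by all genes, set here to 100, a commonly used default. Residuals are clipped at ±sqrt(number of cells). The sketch does not implement the paper's compound residuals for non-UMI read counts, and the function name is hypothetical.

```python
import numpy as np


def analytic_pearson_residuals(counts, theta=100.0, clip=None):
    """Analytic Pearson residuals for a cells x genes UMI count matrix (sketch).

    Assumes every cell and every gene has at least one count, so mu > 0."""
    x = np.asarray(counts, dtype=float)
    n_cells = x.shape[0]
    # Offset model: mu_cg = (total counts of cell c) * (fraction of all counts in gene g).
    mu = np.outer(x.sum(axis=1), x.sum(axis=0)) / x.sum()
    # Negative binomial variance: mu + mu^2 / theta.
    z = (x - mu) / np.sqrt(mu + mu**2 / theta)
    if clip is None:
        clip = np.sqrt(n_cells)
    return np.clip(z, -clip, clip)


toy = np.array([[1, 2], [3, 4]])
print(analytic_pearson_residuals(toy))
```

If a count matrix is exactly proportional (rank one), every residual is zero, because the offset model then reproduces the data exactly. Amplification noise in read counts inflates the variance beyond mu + mu²/theta, which is the gap the compound model addresses.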
Affiliation(s)
- Jan Lause
- Hertie Institute for AI in Brain Health, University of Tübingen, Germany
- Tübingen AI Center, Tübingen, Germany
- Leonard Hartmanis
- Department of Cell & Molecular Biology, Karolinska Institutet, Sweden
- Philipp Berens
- Hertie Institute for AI in Brain Health, University of Tübingen, Germany
- Tübingen AI Center, Tübingen, Germany
- Dmitry Kobak
- Hertie Institute for AI in Brain Health, University of Tübingen, Germany
- Tübingen AI Center, Tübingen, Germany
25
Abstract
Dimensionality reduction is standard practice for filtering noise and identifying relevant features in large-scale data analyses. In biology, single-cell genomics studies typically begin with reduction to 2 or 3 dimensions to produce "all-in-one" visuals of the data that are amenable to the human eye, and these are subsequently used for qualitative and quantitative exploratory analysis. However, there is little theoretical support for this practice, and we show that extreme dimension reduction, from hundreds or thousands of dimensions to 2, inevitably induces significant distortion of high-dimensional datasets. We therefore examine the practical implications of low-dimensional embedding of single-cell data and find that extensive distortions and inconsistent practices make such embeddings counter-productive for exploratory, biological analyses. In lieu of this, we discuss alternative approaches for conducting targeted embedding and feature exploration to enable hypothesis-driven biological discovery.
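One way to quantify this kind of distortion is to check how many of each cell's k nearest neighbours in the high-dimensional space are still neighbours after reduction to 2 dimensions. The sketch below scores this with the mean Jaccard index of k-nearest-neighbour sets and uses a plain PCA projection. The metric, k = 10 and the synthetic data are illustrative assumptions, not the paper's exact protocol, and the function names are hypothetical.

```python
import numpy as np


def knn_sets(X, k):
    """Indices of the k nearest neighbours (Euclidean, excluding self) of each row."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]


def knn_jaccard(X_high, X_low, k=10):
    """Mean Jaccard overlap between kNN sets before and after embedding."""
    scores = []
    for a, b in zip(knn_sets(X_high, k), knn_sets(X_low, k)):
        sa, sb = set(a.tolist()), set(b.tolist())
        scores.append(len(sa & sb) / len(sa | sb))
    return float(np.mean(scores))


def pca_2d(X):
    """Project centred data onto its top two principal components."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T


rng = np.random.default_rng(0)
X_iso = rng.normal(size=(200, 50))  # no low-dimensional structure
X_flat = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 50))  # intrinsically 2-D
print(knn_jaccard(X_iso, pca_2d(X_iso)), knn_jaccard(X_flat, pca_2d(X_flat)))
```

When the data truly lie on a plane, the 2-D embedding keeps nearly every neighbourhood. When the structure is genuinely high-dimensional, most neighbourhoods are scrambled, which mirrors the distortion argument in the abstract.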
Affiliation(s)
- Tara Chari
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
- Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California, United States of America
26
Janssen P, Kliesmete Z, Vieth B, Adiconis X, Simmons S, Marshall J, McCabe C, Heyn H, Levin JZ, Enard W, Hellmann I. The effect of background noise and its removal on the analysis of single-cell expression data. Genome Biol 2023; 24:140. [PMID: 37337297 DOI: 10.1186/s13059-023-02978-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Accepted: 05/26/2023] [Indexed: 06/21/2023]
Abstract
BACKGROUND: In droplet-based single-cell and single-nucleus RNA-seq experiments, not all reads associated with one cell barcode originate from the encapsulated cell. Such background noise is attributed to spillage from cell-free ambient RNA or to barcode swapping events.
RESULTS: Here, we characterize this background noise using three scRNA-seq and two snRNA-seq replicates of mouse kidneys. For each experiment, cells from two mouse subspecies are pooled, which allows cross-genotype contaminating molecules to be identified and background noise to be profiled. Background noise is highly variable across replicates and cells, making up on average 3-35% of the total counts (UMIs) per cell, and we find that noise levels are directly proportional to the specificity and detectability of marker genes. In search of the source of background noise, we find multiple lines of evidence that the majority of background molecules originate from ambient RNA. Finally, we use our genotype-based estimates to evaluate the performance of three methods (CellBender, DecontX, SoupX) that are designed to quantify and remove background noise. We find that CellBender provides the most precise estimates of background noise levels and also yields the largest improvement in marker gene detection. By contrast, clustering and classification of cells are fairly robust to background noise, and background removal achieves only small improvements that may come at the cost of distortions in fine structure.
CONCLUSIONS: Our findings help to better understand the extent, sources and impact of background noise in single-cell experiments and provide guidance on how to deal with it.
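With a simple correction, the genotype-mixing design yields a per-cell background estimate. This is a minimal sketch, not the estimator used in the paper. It assumes that background molecules come from a pool in which a known share (`other_share`, a hypothetical parameter) of genotype-informative molecules belongs to the other subspecies. Under that assumption, the observed cross-genotype count captures only that share of all background and must be scaled up by 1 / other_share. The function name is hypothetical.

```python
import numpy as np


def background_fraction(cross_umis, informative_umis, other_share):
    """Estimate the per-cell fraction of background (contaminating) molecules.

    cross_umis:       genotype-informative UMIs assigned to the other genotype
    informative_umis: all genotype-informative UMIs of the cell (assumed > 0)
    other_share:      assumed share of the ambient pool from the other genotype
    """
    cross = np.asarray(cross_umis, dtype=float)
    informative = np.asarray(informative_umis, dtype=float)
    # Only other_share of the background is visible as cross-genotype, so scale up.
    rho = cross / (informative * np.asarray(other_share, dtype=float))
    # Sampling noise can push the estimate above 1; clip to a valid fraction.
    return np.clip(rho, 0.0, 1.0)


print(background_fraction([5, 10, 60], [100, 100, 100], 0.5))
```

When the two subspecies are pooled in roughly equal numbers, other_share = 0.5 is a natural first guess. The true ambient composition may still differ, for example if one genotype's cells lyse more readily.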
Affiliation(s)
- Philipp Janssen
- Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians University, Munich, Germany
- Zane Kliesmete
- Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians University, Munich, Germany
- Beate Vieth
- Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians University, Munich, Germany
- Xian Adiconis
- Klarman Cell Observatory, Broad Institute of Harvard and MIT, Cambridge, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, USA
- Sean Simmons
- Klarman Cell Observatory, Broad Institute of Harvard and MIT, Cambridge, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, USA
- Cristin McCabe
- Klarman Cell Observatory, Broad Institute of Harvard and MIT, Cambridge, USA
- Holger Heyn
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain
- Joshua Z Levin
- Klarman Cell Observatory, Broad Institute of Harvard and MIT, Cambridge, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, USA
- Wolfgang Enard
- Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians University, Munich, Germany
- Ines Hellmann
- Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians University, Munich, Germany.
27
Zhang Y, Miller JA, Park J, Lelieveldt BP, Long B, Abdelaal T, Aevermann BD, Biancalani T, Comiter C, Dzyubachyk O, Eggermont J, Langseth CM, Petukhov V, Scalia G, Vaishnav ED, Zhao Y, Lein ES, Scheuermann RH. Reference-based cell type matching of in situ image-based spatial transcriptomics data on primary visual cortex of mouse brain. Sci Rep 2023; 13:9567. [PMID: 37311768 DOI: 10.1038/s41598-023-36638-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/18/2022] [Accepted: 06/07/2023] [Indexed: 06/15/2023]
Abstract
With the advent of multiplex fluorescence in situ hybridization (FISH) and in situ RNA sequencing technologies, spatial transcriptomics analysis is advancing rapidly, providing spatial location and gene expression information about cells in tissue sections at single cell resolution. Cell type classification of these spatially-resolved cells can be inferred by matching the spatial transcriptomics data to reference atlases derived from single cell RNA-sequencing (scRNA-seq) in which cell types are defined by differences in their gene expression profiles. However, robust cell type matching of the spatially-resolved cells to reference scRNA-seq atlases is challenging due to the intrinsic differences in resolution between the spatial and scRNA-seq data. In this study, we systematically evaluated six computational algorithms for cell type matching across four image-based spatial transcriptomics experimental protocols (MERFISH, smFISH, BaristaSeq, and ExSeq) conducted on the same mouse primary visual cortex (VISp) brain region. We find that many cells are assigned as the same type by multiple cell type matching algorithms and are present in spatial patterns previously reported from scRNA-seq studies in VISp. Furthermore, by combining the results of individual matching strategies into consensus cell type assignments, we see even greater alignment with biological expectations. We present two ensemble meta-analysis strategies used in this study and share the consensus cell type matching results in the Cytosplore Viewer ( https://viewer.cytosplore.org ) for interactive visualization and data exploration. The consensus matching can also guide spatial data analysis using SSAM, allowing segmentation-free cell type assignment.
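The abstract does not spell out the two ensemble strategies, so the following is only a plain majority-vote stand-in that shows how per-method assignments can be merged into a consensus. A cell receives a label when at least `min_agree` methods agree and no other label ties with it. The method names, the threshold and the example labels are hypothetical.

```python
from collections import Counter


def consensus_labels(assignments, min_agree=2):
    """Majority-vote consensus over several cell type matching methods (sketch).

    assignments: dict mapping method name -> list of labels, one per cell,
                 all in the same cell order (None = method left cell unassigned).
    Returns one label per cell, or None when fewer than `min_agree` methods
    agree or the top vote is tied.
    """
    per_method = list(assignments.values())
    n_cells = len(per_method[0])
    consensus = []
    for i in range(n_cells):
        votes = Counter(labels[i] for labels in per_method if labels[i] is not None)
        top = votes.most_common(2)
        tied = len(top) == 2 and top[0][1] == top[1][1]
        if not top or top[0][1] < min_agree or tied:
            consensus.append(None)
        else:
            consensus.append(top[0][0])
    return consensus


calls = {  # hypothetical outputs of three matching methods for three cells
    "method_a": ["L2/3 IT", "Pvalb", "Sst"],
    "method_b": ["L2/3 IT", "Sst", "Vip"],
    "method_c": ["L2/3 IT", "Pvalb", "Lamp5"],
}
print(consensus_labels(calls))  # ['L2/3 IT', 'Pvalb', None]
```

Leaving disputed cells unlabelled instead of forcing a call trades coverage for reliability, which fits the goal of aligning consensus assignments with biological expectations.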
Affiliation(s)
- Yun Zhang
- J. Craig Venter Institute, La Jolla, CA, USA
- Jeongbin Park
- School of Biomedical Convergence Engineering, Pusan National University, Busan, Korea
- Boudewijn P Lelieveldt
- LKEB, Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
- Pattern Recognition and Bioinformatics Group, Delft University of Technology, Delft, The Netherlands
- Brian Long
- Allen Institute for Brain Science, Seattle, WA, USA
- Tamim Abdelaal
- LKEB, Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
- Pattern Recognition and Bioinformatics Group, Delft University of Technology, Delft, The Netherlands
- Brian D Aevermann
- J. Craig Venter Institute, La Jolla, CA, USA
- Chan Zuckerberg Initiative, Redwood City, CA, USA
- Tommaso Biancalani
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Genentech, South San Francisco, CA, USA
- Oleh Dzyubachyk
- LKEB, Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
- Jeroen Eggermont
- LKEB, Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
- Viktor Petukhov
- Biotech Research and Innovation Centre, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Gabriele Scalia
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Genentech, South San Francisco, CA, USA
- Yilin Zhao
- Allen Institute for Brain Science, Seattle, WA, USA
- Ed S Lein
- Allen Institute for Brain Science, Seattle, WA, USA
- Richard H Scheuermann
- J. Craig Venter Institute, La Jolla, CA, USA.
- Department of Pathology, University of California, San Diego, CA, USA.
- Division of Vaccine Discovery, La Jolla Institute for Immunology, La Jolla, CA, USA.