1
Defard T, Desrentes A, Fouillade C, Mueller F. Homebuilt Imaging-Based Spatial Transcriptomics: Tertiary Lymphoid Structures as a Case Example. Methods Mol Biol 2025; 2864:77-105. [PMID: 39527218 DOI: 10.1007/978-1-0716-4184-2_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2024]
Abstract
Spatial transcriptomics methods provide insight into the cellular heterogeneity and spatial architecture of complex, multicellular systems. Combining molecular and spatial information provides important clues to study tissue architecture in development and disease. Here, we present a comprehensive do-it-yourself (DIY) guide to perform such experiments at reduced costs leveraging open-source approaches. This guide spans the entire life cycle of a project, from its initial definition to experimental choices, wet lab approaches, instrumentation, and analysis. As a concrete example, we focus on tertiary lymphoid structures (TLS), which we use to develop typical questions that can be addressed by these approaches.
Affiliation(s)
- Thomas Defard
- Institut Pasteur, Université Paris Cité, Photonic Bio-Imaging, Centre de Ressources et Recherches Technologiques (UTechS-PBI, C2RT), Paris, France
- Institut Pasteur, Université Paris Cité, Imaging and Modeling Unit, Paris, France
- Centre for Computational Biology (CBIO), Mines Paris, PSL University, Paris, France
- Institut Curie, PSL University, Paris, France
- INSERM, U900, Paris, France
- Auxence Desrentes
- UMRS1135 Sorbonne University, Paris, France
- INSERM U1135, Paris, France
- Team "Immune Microenvironment and Immunotherapy", Centre for Immunology and Microbial Infections (CIMI), Paris, France
- Charles Fouillade
- Institut Curie, Inserm U1021-CNRS UMR 3347, University Paris-Saclay, PSL Research University, Centre Universitaire, Orsay, France
- Florian Mueller
- Institut Pasteur, Université Paris Cité, Photonic Bio-Imaging, Centre de Ressources et Recherches Technologiques (UTechS-PBI, C2RT), Paris, France.
- Institut Pasteur, Université Paris Cité, Imaging and Modeling Unit, Paris, France.
2
Gulati GS, D'Silva JP, Liu Y, Wang L, Newman AM. Profiling cell identity and tissue architecture with single-cell and spatial transcriptomics. Nat Rev Mol Cell Biol 2025; 26:11-31. [PMID: 39169166 DOI: 10.1038/s41580-024-00768-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/16/2024] [Indexed: 08/23/2024]
Abstract
Single-cell transcriptomics has broadened our understanding of cellular diversity and gene expression dynamics in healthy and diseased tissues. Recently, spatial transcriptomics has emerged as a tool to contextualize single cells in multicellular neighbourhoods and to identify spatially recurrent phenotypes, or ecotypes. These technologies have generated vast datasets with targeted-transcriptome and whole-transcriptome profiles of hundreds to millions of cells. Such data have provided new insights into developmental hierarchies, cellular plasticity and diverse tissue microenvironments, and spurred a burst of innovation in computational methods for single-cell analysis. In this Review, we discuss recent advancements, ongoing challenges and prospects in identifying and characterizing cell states and multicellular neighbourhoods. We discuss recent progress in sample processing, data integration, identification of subtle cell states, trajectory modelling, deconvolution and spatial analysis. Furthermore, we discuss the increasing application of deep learning, including foundation models, in analysing single-cell and spatial transcriptomics data. Finally, we discuss recent applications of these tools in the fields of stem cell biology, immunology, and tumour biology, and the future of single-cell and spatial transcriptomics in biological research and its translation to the clinic.
Affiliation(s)
- Gunsagar S Gulati
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Yunhe Liu
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Linghua Wang
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- The University of Texas MD Anderson Cancer Center UTHealth Houston Graduate School of Biomedical Sciences, Houston, TX, USA
- Aaron M Newman
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA.
- Institute for Stem Cell Biology and Regenerative Medicine, Stanford University, Stanford, CA, USA.
- Stanford Cancer Institute, Stanford University, Stanford, CA, USA.
- Chan Zuckerberg Biohub - San Francisco, San Francisco, CA, USA.
3
Kuemmerle LB, Luecken MD, Firsova AB, Barros de Andrade E Sousa L, Straßer L, Mekki II, Campi F, Heumos L, Shulman M, Beliaeva V, Hediyeh-Zadeh S, Schaar AC, Mahbubani KT, Sountoulidis A, Balassa T, Kovacs F, Horvath P, Piraud M, Ertürk A, Samakovlis C, Theis FJ. Probe set selection for targeted spatial transcriptomics. Nat Methods 2024; 21:2260-2270. [PMID: 39558096 PMCID: PMC11621025 DOI: 10.1038/s41592-024-02496-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Accepted: 09/30/2024] [Indexed: 11/20/2024]
Abstract
Targeted spatial transcriptomic methods capture the topology of cell types and states in tissues at single-cell and subcellular resolution by measuring the expression of a predefined set of genes. The selection of an optimal set of probed genes is crucial for capturing the spatial signals present in a tissue. This requires selecting the most informative, yet minimal, set of genes to profile (gene set selection) for which it is possible to build probes (probe design). However, current selections often rely on marker genes, precluding them from detecting continuous spatial signals or new states. We present Spapros, an end-to-end probe set selection pipeline that optimizes both gene set specificity for cell type identification and within-cell type expression variation to resolve spatially distinct populations while considering prior knowledge as well as probe design and expression constraints. We evaluated Spapros and show that it outperforms other selection approaches in both cell type recovery and recovering expression variation beyond cell types. Furthermore, we used Spapros to design a single-cell resolution in situ hybridization on tissues (SCRINSHOT) experiment of adult lung tissue to demonstrate how probes selected with Spapros identify cell types of interest and detect spatial variation even within cell types.
Affiliation(s)
- Louis B Kuemmerle
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
- Institute for Tissue Engineering and Regenerative Medicine, Helmholtz Zentrum München, Neuherberg, Germany
- School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
- Malte D Luecken
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
- Institute of Lung Health & Immunity, Helmholtz Munich, Member of the German Center for Lung Research (DZL), Munich, Germany
- German Center for Lung Research (DZL), Gießen, Germany
- Alexandra B Firsova
- SciLifeLab and Department of Molecular Biosciences, Stockholm University, Stockholm, Sweden
- Lena Straßer
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
- Francesco Campi
- Helmholtz AI, Helmholtz Zentrum München, Neuherberg, Germany
- Lukas Heumos
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
- School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
- Institute of Lung Biology and Disease and Comprehensive Pneumology Center, Helmholtz Zentrum München, German Center for Lung Research (DZL), Munich, Germany
- Maiia Shulman
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
- Valentina Beliaeva
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
- Soroor Hediyeh-Zadeh
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
- School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
- Anna C Schaar
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
- TUM School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
- Munich Center for Machine Learning, Technical University of Munich, Munich, Germany
- Krishnaa T Mahbubani
- Department of Surgery, University of Cambridge and Cambridge NIHR Biomedical Research Centre, Cambridge, UK
- Tamás Balassa
- Synthetic and Systems Biology Unit, Biological Research Centre, Eötvös Loránd Research Network, Szeged, Hungary
- Peter Horvath
- Synthetic and Systems Biology Unit, Biological Research Centre, Eötvös Loránd Research Network, Szeged, Hungary
- Institute of AI for Health, Helmholtz Zentrum München, Neuherberg, Germany
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Marie Piraud
- Helmholtz AI, Helmholtz Zentrum München, Neuherberg, Germany
- Ali Ertürk
- Institute for Tissue Engineering and Regenerative Medicine, Helmholtz Zentrum München, Neuherberg, Germany
- Institute for Stroke and Dementia Research, Klinikum der Universität München, Ludwig-Maximilians University Munich, Munich, Germany
- Munich Cluster for Systems Neurology (SyNergy), Munich, Germany
- School of Medicine, Koç University, İstanbul, Turkey
- Christos Samakovlis
- SciLifeLab and Department of Molecular Biosciences, Stockholm University, Stockholm, Sweden
- Cardiopulmonary Institute, Justus Liebig University, Giessen, Germany
- Fabian J Theis
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany.
- School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany.
- School of Computation, Information and Technology, Technical University of Munich, Munich, Germany.
4
Borah K, Das HS, Seth S, Mallick K, Rahaman Z, Mallik S. A review on advancements in feature selection and feature extraction for high-dimensional NGS data analysis. Funct Integr Genomics 2024; 24:139. [PMID: 39158621 DOI: 10.1007/s10142-024-01415-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2024] [Revised: 07/30/2024] [Accepted: 08/01/2024] [Indexed: 08/20/2024]
Abstract
Recent advancements in biomedical technologies and the proliferation of high-dimensional Next Generation Sequencing (NGS) datasets have led to significant growth in the bulk and density of data. The NGS high-dimensional data, characterized by a large number of genomics, transcriptomics, proteomics, and metagenomics features relative to the number of biological samples, presents significant challenges for reducing feature dimensionality. The high dimensionality of NGS data poses significant challenges for data analysis, including increased computational burden, potential overfitting, and difficulty in interpreting results. Feature selection and feature extraction are two pivotal techniques employed to address these challenges by reducing the dimensionality of the data, thereby enhancing model performance, interpretability, and computational efficiency. Feature selection and feature extraction can be categorized into statistical and machine learning methods. The present study conducts a comprehensive and comparative review of various statistical, machine learning, and deep learning-based feature selection and extraction techniques specifically tailored for NGS and microarray data interpretation of humankind. A thorough literature search was performed to gather information on these techniques, focusing on array-based and NGS data analysis. Various techniques, including deep learning architectures, machine learning algorithms, and statistical methods, have been explored for the microarray, bulk RNA-Seq, and single-cell RNA-Seq (scRNA-Seq) technology-based datasets surveyed here. The study provides an overview of these techniques, highlighting their applications, advantages, and limitations in the context of high-dimensional NGS data. This review provides better insights for readers to apply feature selection and feature extraction techniques to enhance the performance of predictive models, uncover underlying biological patterns, and gain deeper insights into massive and complex NGS and microarray data.
Affiliation(s)
- Kasmika Borah
- Department of Computer Science and Information Technology, Cotton University, Panbazar, Guwahati, 781001, Assam, India
- Himanish Shekhar Das
- Department of Computer Science and Information Technology, Cotton University, Panbazar, Guwahati, 781001, Assam, India.
- Soumita Seth
- Department of Computer Science and Engineering, Future Institute of Engineering and Management, Narendrapur, Kolkata, 700150, West Bengal, India
- Koushik Mallick
- Department of Computer Science and Engineering, RCC Institute of Information Technology, Canal S Rd, Beleghata, Kolkata, 700015, West Bengal, India
- Saurav Mallik
- Department of Environmental Health, Harvard T H Chan School of Public Health, Boston, MA, 02115, USA.
- Department of Pharmacology & Toxicology, University of Arizona, Tucson, AZ, 85721, USA.
5
Sun F, Li H, Sun D, Fu S, Gu L, Shao X, Wang Q, Dong X, Duan B, Xing F, Wu J, Xiao M, Zhao F, Han JDJ, Liu Q, Fan X, Li C, Wang C, Shi T. Single-cell omics: experimental workflow, data analyses and applications. Sci China Life Sci 2024:10.1007/s11427-023-2561-0. [PMID: 39060615 DOI: 10.1007/s11427-023-2561-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 04/18/2024] [Indexed: 07/28/2024]
Abstract
Cells are the fundamental units of biological systems and exhibit unique development trajectories and molecular features. Our exploration of how the genomes orchestrate the formation and maintenance of each cell, and control the cellular phenotypes of various organisms, is both captivating and intricate. Since the inception of the first single-cell RNA technology, technologies related to single-cell sequencing have experienced rapid advancements in recent years. These technologies have expanded horizontally to include single-cell genome, epigenome, proteome, and metabolome, while vertically, they have progressed to integrate multiple omics data and incorporate additional information such as spatial scRNA-seq and CRISPR screening. Single-cell omics represent a groundbreaking advancement in the biomedical field, offering profound insights into the understanding of complex diseases, including cancers. Here, we comprehensively summarize recent advances in single-cell omics technologies, with a specific focus on the methodology section. This overview aims to guide researchers in selecting appropriate methods for single-cell sequencing and related data analysis.
Affiliation(s)
- Fengying Sun
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China
- Haoyan Li
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Dongqing Sun
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Shaliu Fu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Lei Gu
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Xin Shao
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China
- Qinqin Wang
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Xin Dong
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Bin Duan
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Feiyang Xing
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Jun Wu
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Minmin Xiao
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China.
- Fangqing Zhao
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, 100101, China.
- Jing-Dong J Han
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China.
- Qi Liu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China.
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China.
- Xiaohui Fan
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China.
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China.
- Zhejiang Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310006, China.
- Chen Li
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China.
- Chenfei Wang
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China.
- Tieliu Shi
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China.
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China.
- Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, 200062, China.
6
Chen M. Beyond variability: a novel gene expression stability metric to unveil homeostasis and regulation. bioRxiv 2024:2024.05.28.596283. [PMID: 38854149 PMCID: PMC11160662 DOI: 10.1101/2024.05.28.596283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
The concept of gene expression stability within a homeostatic cell is explored through the gene homeostasis Z-index, a measure that highlights genes under active regulation in response to internal and external stimuli. This index reveals distinct regulatory activities and patterns in different organs, such as enhanced synaptic transmission in pancreatic islets. The research indicates that traditional mean-based methods may miss these nuances, underlining the significance of new metrics in identifying gene regulation specifics in cellular adaptation.
7
Ranek JS, Stallaert W, Milner JJ, Redick M, Wolff SC, Beltran AS, Stanley N, Purvis JE. DELVE: feature selection for preserving biological trajectories in single-cell data. Nat Commun 2024; 15:2765. [PMID: 38553455 PMCID: PMC10980758 DOI: 10.1038/s41467-024-46773-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Accepted: 03/07/2024] [Indexed: 04/02/2024] Open
Abstract
Single-cell technologies can measure the expression of thousands of molecular features in individual cells undergoing dynamic biological processes. While examining cells along a computationally-ordered pseudotime trajectory can reveal how changes in gene or protein expression impact cell fate, identifying such dynamic features is challenging due to the inherent noise in single-cell data. Here, we present DELVE, an unsupervised feature selection method for identifying a representative subset of molecular features which robustly recapitulate cellular trajectories. In contrast to previous work, DELVE uses a bottom-up approach to mitigate the effects of confounding sources of variation, and instead models cell states from dynamic gene or protein modules based on core regulatory complexes. Using simulations, single-cell RNA sequencing, and iterative immunofluorescence imaging data in the context of cell cycle and cellular differentiation, we demonstrate how DELVE selects features that better define cell-types and cell-type transitions. DELVE is available as an open-source python package: https://github.com/jranek/delve.
Affiliation(s)
- Jolene S Ranek
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Wayne Stallaert
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
- J Justin Milner
- Department of Microbiology and Immunology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC, USA
- Margaret Redick
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Samuel C Wolff
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Adriana S Beltran
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Human Pluripotent Cell Core, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC, USA
- Natalie Stanley
- Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
- Jeremy E Purvis
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
- Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
8
Gregory W, Sarwar N, Kevrekidis G, Villar S, Dumitrascu B. MarkerMap: nonlinear marker selection for single-cell studies. NPJ Syst Biol Appl 2024; 10:17. [PMID: 38351188 PMCID: PMC10864304 DOI: 10.1038/s41540-024-00339-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Accepted: 01/17/2024] [Indexed: 02/16/2024] Open
Abstract
Single-cell RNA-seq data allow the quantification of cell type differences across a growing set of biological contexts. However, pinpointing a small subset of genomic features explaining this variability can be ill-defined and computationally intractable. Here we introduce MarkerMap, a generative model for selecting minimal gene sets which are maximally informative of cell type origin and enable whole transcriptome reconstruction. MarkerMap provides a scalable framework for both supervised marker selection, aimed at identifying specific cell type populations, and unsupervised marker selection, aimed at gene expression imputation and reconstruction. We benchmark MarkerMap's competitive performance against previously published approaches on real single cell gene expression data sets. MarkerMap is available as a pip installable package, as a community resource aimed at developing explainable machine learning techniques for enhancing interpretability in single-cell studies.
Affiliation(s)
- Wilson Gregory
- Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, 21218, USA
- Nabeel Sarwar
- Center for Data Science, New York University, New York, NY, 10012, USA
- George Kevrekidis
- Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, 21218, USA
- Soledad Villar
- Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, 21218, USA.
- Mathematical Institute for Data Science, Johns Hopkins University, Baltimore, MD, 21218, USA.
- Bianca Dumitrascu
- Department of Statistics, Columbia University, New York, NY, 10027, USA.
- Irving Institute for Cancer Dynamics, Columbia University, New York, NY, 10027, USA.
9
Zhang Y, Petukhov V, Biederstedt E, Que R, Zhang K, Kharchenko PV. Gene panel selection for targeted spatial transcriptomics. Genome Biol 2024; 25:35. [PMID: 38273415 PMCID: PMC10811939 DOI: 10.1186/s13059-024-03174-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Accepted: 01/12/2024] [Indexed: 01/27/2024] Open
Abstract
Targeted spatial transcriptomics hold particular promise in analyzing complex tissues. Most such methods, however, measure only a limited panel of transcripts, which need to be selected in advance to inform on the cell types or processes being studied. A limitation of existing gene selection methods is their reliance on scRNA-seq data, ignoring platform effects between technologies. Here we describe gpsFISH, a computational method performing gene selection through optimizing detection of known cell types. By modeling and adjusting for platform effects, gpsFISH outperforms other methods. Furthermore, gpsFISH can incorporate cell type hierarchies and custom gene preferences to accommodate diverse design requirements.
Affiliation(s)
- Yida Zhang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Department of Neurobiology, Duke University, Durham, NC, USA
- Viktor Petukhov
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Evan Biederstedt
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Richard Que
- Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Kun Zhang
- Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- San Diego Institute of Science, Altos Labs, San Diego, CA, USA
- Peter V Kharchenko
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
- San Diego Institute of Science, Altos Labs, San Diego, CA, USA.
10
Wang X, Duan M, Li J, Ma A, Xin G, Xu D, Li Z, Liu B, Ma Q. MarsGT: Multi-omics analysis for rare population inference using single-cell graph transformer. Nat Commun 2024; 15:338. [PMID: 38184630 PMCID: PMC10771517 DOI: 10.1038/s41467-023-44570-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Accepted: 12/14/2023] [Indexed: 01/08/2024] Open
Abstract
Rare cell populations are key in neoplastic progression and therapeutic response, offering potential intervention targets. However, their computational identification and analysis often lag behind major cell types. To fill this gap, we introduce MarsGT: Multi-omics Analysis for Rare population inference using a Single-cell Graph Transformer. It identifies rare cell populations using a probability-based heterogeneous graph transformer on single-cell multi-omics data. MarsGT outperforms existing tools in identifying rare cells across 550 simulated and four real human datasets. In mouse retina data, it reveals unique subpopulations of rare bipolar cells and a Müller glia cell subpopulation. In human lymph node data, MarsGT detects an intermediate B cell population potentially acting as lymphoma precursors. In human melanoma data, it identifies a rare MAIT-like population impacted by a high IFN-I response and reveals the mechanism of immunotherapy. Hence, MarsGT offers biological insights and suggests potential strategies for early detection and therapeutic intervention of disease.
Affiliation(s)
- Xiaoying Wang
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA
| | - Maoteng Duan
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
| | - Jingxian Li
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
| | - Anjun Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA
| | - Gang Xin
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA
| | - Dong Xu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, 65211, USA
| | - Zihai Li
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA
| | - Bingqiang Liu
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China.
| | - Qin Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA.
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA.
| |
|
11
|
Liu Y. methylClass: an R package to construct DNA methylation-based classification models. Brief Bioinform 2023; 25:bbad485. [PMID: 38205965 PMCID: PMC10782803 DOI: 10.1093/bib/bbad485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 12/02/2023] [Accepted: 12/05/2023] [Indexed: 01/12/2024] Open
Abstract
DNA methylation profiling is a useful tool to increase the accuracy of a cancer diagnosis. However, a comprehensive R package specially for it is lacking. Hence, we developed the R package methylClass for methylation-based classification. Within it, we provide the eSVM (ensemble-based support vector machine) model to achieve much higher accuracy in methylation data classification than the popular random forest model and overcome the time-consuming problem of the traditional SVM. In addition, some novel feature selection methods are included in the package to improve the classification. Furthermore, because methylation data can be converted to other omics, such as copy number variation data, we also provide functions for multi-omics studies. The testing of this package on four datasets shows the accurate performance of our package, especially eSVM, which can be used in both methylation and multi-omics models and outperforms other methods in both cases. methylClass is available at: https://github.com/yuabrahamliu/methylClass.
Affiliation(s)
- Yu Liu
- Laboratory of Pathology, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20892, USA
| |
|
12
|
Zhang C, Duan ZW, Xu YP, Liu J, Li HD. FEED: a feature selection method based on gene expression decomposition for single cell clustering. Brief Bioinform 2023; 24:bbad389. [PMID: 37935617 DOI: 10.1093/bib/bbad389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Revised: 08/31/2023] [Accepted: 09/22/2023] [Indexed: 11/09/2023] Open
Abstract
Single-cell clustering is a critical step in biological downstream analysis. The clustering performance could be effectively improved by extracting cell-type-specific genes. The state-of-the-art feature selection methods usually calculate the importance of a single gene without considering the information contained in the gene expression distribution. Moreover, these methods ignore the intrinsic expression patterns of genes and heterogeneity within groups of different mean expression levels. In this work, we present a Feature sElection method based on gene Expression Decomposition (FEED) of scRNA-seq data, which selects informative genes to enhance clustering performance. First, the expression levels of genes are decomposed into multiple Gaussian components. Then, a novel gene correlation calculation method is proposed to measure the relationship between genes from the perspective of distribution. Finally, a permutation-based approach is proposed to determine the threshold of gene importance to obtain marker gene subsets. Compared with state-of-the-art feature selection methods, applying FEED on various scRNA-seq datasets including large datasets followed by different common clustering algorithms results in significant improvements in the accuracy of cell-type identification. The source codes for FEED are freely available at https://github.com/genemine/FEED.
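The first step named in this abstract, decomposing a gene's expression levels into Gaussian components, can be illustrated with a small expectation-maximization fit. The sketch below is not the FEED code: the choice of two components, the min/max initialization, the variance floor, and the names `fit_two_gaussians` and `toy_expression` are all assumptions made for illustration.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mean, var):
    """Density of a 1D Gaussian with the given mean and variance."""
    return exp(-((x - mean) ** 2) / (2.0 * var)) / sqrt(2.0 * pi * var)


def fit_two_gaussians(values, n_iter=100, var_floor=1e-6):
    """Fit a two-component 1D Gaussian mixture with plain EM.

    Hypothetical helper: assumes the data are not so extreme that both
    component densities underflow to zero for some value.
    """
    n = len(values)
    overall_mean = sum(values) / n
    overall_var = max(sum((v - overall_mean) ** 2 for v in values) / n, var_floor)

    # Assumed initialization: one component at each end of the value range.
    weights = [0.5, 0.5]
    means = [min(values), max(values)]
    variances = [overall_var, overall_var]

    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            p = [weights[c] * normal_pdf(v, means[c], variances[c]) for c in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total])

        # M-step: re-estimate weight, mean and variance of each component.
        for c in range(2):
            mass = sum(r[c] for r in resp)
            weights[c] = mass / n
            means[c] = sum(r[c] * v for r, v in zip(resp, values)) / mass
            spread = sum(r[c] * (v - means[c]) ** 2 for r, v in zip(resp, values)) / mass
            variances[c] = max(spread, var_floor)

    return weights, means, variances


# Toy expression values for one gene: an "off" group near 0 and an "on" group near 5.
toy_expression = [0.1, -0.2, 0.0, 0.3, -0.1, 5.1, 4.8, 5.0, 5.2, 4.9]
weights, means, variances = fit_two_gaussians(toy_expression)
print(weights, means, variances)
```

On clearly bimodal toy values like these, the two fitted means settle near the centers of the two groups. How FEED chooses the number of components and uses them downstream is described in the paper, not here.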
Affiliation(s)
- Chao Zhang
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
| | - Zhi-Wei Duan
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
| | - Yun-Pei Xu
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
| | - Jin Liu
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
| | - Hong-Dong Li
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
| |
|
13
|
Wang X, Duan M, Li J, Ma A, Xu D, Li Z, Liu B, Ma Q. MarsGT: Multi-omics analysis for rare population inference using single-cell graph transformer. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.15.553454. [PMID: 37645917 PMCID: PMC10462017 DOI: 10.1101/2023.08.15.553454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]
Abstract
Rare cell populations are key in neoplastic progression and therapeutic response, offering potential intervention targets. However, their computational identification and analysis often lag behind major cell types. To fill this gap, we introduced MarsGT: Multi-omics Analysis for Rare population inference using Single-cell Graph Transformer. It identifies rare cell populations using a probability-based heterogeneous graph transformer on single-cell multi-omics data. MarsGT outperformed existing tools in identifying rare cells across 400 simulated and four real human datasets. In mouse retina data, it revealed unique subpopulations of rare bipolar cells and a Müller glia cell subpopulation. In human lymph node data, MarsGT detected an intermediate B cell population potentially acting as lymphoma precursors. In human melanoma data, it identified a rare MAIT-like population impacted by a high IFN-I response and revealed the mechanism of immunotherapy. Hence, MarsGT offers biological insights and suggests potential strategies for early detection and therapeutic intervention of disease.
Affiliation(s)
- Xiaoying Wang
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
| | - Maoteng Duan
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
| | - Jingxian Li
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
| | - Anjun Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
| | - Dong Xu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
| | - Zihai Li
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
| | - Bingqiang Liu
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
| | - Qin Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
| |
|
14
|
Choi J, Li J, Ferdous S, Liang Q, Moffitt JR, Chen R. Spatial organization of the mouse retina at single cell resolution by MERFISH. Nat Commun 2023; 14:4929. [PMID: 37582959 PMCID: PMC10427710 DOI: 10.1038/s41467-023-40674-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/03/2023] [Accepted: 08/07/2023] [Indexed: 08/17/2023] Open
Abstract
The visual signal processing in the retina requires the precise organization of diverse neuronal types working in concert. While single-cell omics studies have identified more than 120 different neuronal subtypes in the mouse retina, little is known about their spatial organization. Here, we generated the single-cell spatial atlas of the mouse retina using multiplexed error-robust fluorescence in situ hybridization (MERFISH). We profiled over 390,000 cells and identified all major cell types and nearly all subtypes through the integration with reference single-cell RNA sequencing (scRNA-seq) data. Our spatial atlas allowed simultaneous examination of nearly all cell subtypes in the retina, revealing 8 previously unknown displaced amacrine cell subtypes and establishing the connection between the molecular classification of many cell subtypes and their spatial arrangement. Furthermore, we identified spatially dependent differential gene expression between subtypes, suggesting the possibility of functional tuning of neuronal types based on location.
Affiliation(s)
- Jongsu Choi
- Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Jin Li
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Salma Ferdous
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Qingnan Liang
- Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Jeffrey R Moffitt
- Program in Cellular and Molecular Medicine, Boston Children's Hospital; Department of Microbiology, Harvard Medical School, Boston, MA, 02115, USA
| | - Rui Chen
- Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX, 77030, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
| |
|
15
|
Li X, Korkut A. Recurrent composite markers of cell types and states. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.17.549344. [PMID: 37503180 PMCID: PMC10370072 DOI: 10.1101/2023.07.17.549344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Determining concise sets of genomic markers that identify cell types and states within tissue ecosystems remains challenging. To address this challenge, we developed Recurrent Composite Markers for Biological Identities with Neighborhood Enrichment (RECOMBINE). Validations of RECOMBINE with simulation and transcriptomics data in bulk, single-cell and spatial resolutions demonstrated the method's ability for unbiased selection of composite markers that characterize biological subpopulations. RECOMBINE captured markers of mouse visual cortex from single-cell RNA sequencing data and provided a gene panel for targeted spatial transcriptomics profiling. RECOMBINE identified composite markers of CD8 T cell states including GZMK+ HAVCR2− effector memory cells associated with anti-PD1 therapy response. The method outperformed differential gene expression analysis in characterizing a rare cell subpopulation within mouse intestine. Using RECOMBINE, we uncovered hierarchical gene programs of inter- and intra-tumoral heterogeneity in breast and skin tumors. In conclusion, RECOMBINE offers a data-driven approach for unbiased selection of composite markers, resulting in improved interpretation, discovery, and validation of cell types and states.
|
16
|
Ferguson C, Zhang Y, Palego C, Cheng X. Recent Approaches to Design and Analysis of Electrical Impedance Systems for Single Cells Using Machine Learning. SENSORS (BASEL, SWITZERLAND) 2023; 23:5990. [PMID: 37447838 DOI: 10.3390/s23135990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Revised: 06/17/2023] [Accepted: 06/26/2023] [Indexed: 07/15/2023]
Abstract
Individual cells have many unique properties that can be quantified to develop a holistic understanding of a population. This can include understanding population characteristics, identifying subpopulations, or elucidating outlier characteristics that may be indicators of disease. Electrical impedance measurements are rapid and label-free for the monitoring of single cells and generate large datasets of many cells at single or multiple frequencies. To increase the accuracy and sensitivity of measurements and define the relationships between impedance and biological features, many electrical measurement systems have incorporated machine learning (ML) paradigms for control and analysis. Considering the difficulty capturing complex relationships using traditional modelling and statistical methods due to population heterogeneity, ML offers an exciting approach to the systemic collection and analysis of electrical properties in a data-driven way. In this work, we discuss incorporation of ML to improve the field of electrical single cell analysis by addressing the design challenges to manipulate single cells and sophisticated analysis of electrical properties that distinguish cellular changes. Looking forward, we emphasize the opportunity to build on integrated systems to address common challenges in data quality and generalizability to save time and resources at every step in electrical measurement of single cells.
Affiliation(s)
- Caroline Ferguson
- Department of Bioengineering, Lehigh University, Bethlehem, PA 18015, USA
| | - Yu Zhang
- Department of Bioengineering, Lehigh University, Bethlehem, PA 18015, USA
| | - Cristiano Palego
- Department of Computer Science and Electronic Engineering, Bangor University, Bangor LL57 2DG, UK
| | - Xuanhong Cheng
- Department of Bioengineering, Lehigh University, Bethlehem, PA 18015, USA
- Department of Materials Science and Engineering, Lehigh University, Bethlehem, PA 18015, USA
| |
|
17
|
Alshammari A. Ensemble recurrent neural network with whale optimization algorithm-based DNA sequence classification for medical applications. Soft comput 2023:1-14. [PMID: 37362270 PMCID: PMC10231859 DOI: 10.1007/s00500-023-08435-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/04/2023] [Indexed: 06/28/2023]
Abstract
The modern data-driven era has facilitated the gathering of large quantities of biomedical and clinical data. The deoxyribonucleic acid gene expression datasets have become a vital focus for the research community because of their capability to detect pathogens via 'biomarkers' or particular modifications in the gene sequence which portray a specific pathogen. Metaheuristic-related feature selection (FS) efficiently filters out only the pertinent genes out of large feature sets to lessen the data storage and computation requirements. This paper embraces the whale optimization algorithm for the FS issue in HD microarray data for the effectual propagation of candidate solutions to reach global optima over sufficient iterations. The chosen data are classified by employing an ensemble recurrent network (ERNN) that retains the amalgamation of long short-term memory, bidirectional long short-term memory, and gated recurrent units. Analysis of this proposed ERNN methodology would be performed by correlating with diverse advanced methodologies, and thus, the ERNN attains 99.59% precision and 99.59% accuracy.
Affiliation(s)
- Abdulaziz Alshammari
- Information Systems Department, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, Saudi Arabia
| |
|
18
|
Ranek JS, Stallaert W, Milner J, Stanley N, Purvis JE. Feature selection for preserving biological trajectories in single-cell data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.09.540043. [PMID: 37214963 PMCID: PMC10197710 DOI: 10.1101/2023.05.09.540043] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/24/2023]
Abstract
Single-cell technologies can readily measure the expression of thousands of molecular features from individual cells undergoing dynamic biological processes, such as cellular differentiation, immune response, and disease progression. While examining cells along a computationally ordered pseudotime offers the potential to study how subtle changes in gene or protein expression impact cell fate decision-making, identifying characteristic features that drive continuous biological processes remains difficult to detect from unenriched and noisy single-cell data. Given that all profiled sources of feature variation contribute to the cell-to-cell distances that define an inferred cellular trajectory, including confounding sources of biological variation (e.g. cell cycle or metabolic state) or noisy and irrelevant features (e.g. measurements with low signal-to-noise ratio) can mask the underlying trajectory of study and hinder inference. Here, we present DELVE (dynamic selection of locally covarying features), an unsupervised feature selection method for identifying a representative subset of dynamically-expressed molecular features that recapitulates cellular trajectories. In contrast to previous work, DELVE uses a bottom-up approach to mitigate the effect of unwanted sources of variation confounding inference, and instead models cell states from dynamic feature modules that constitute core regulatory complexes. Using simulations, single-cell RNA sequencing data, and iterative immunofluorescence imaging data in the context of the cell cycle and cellular differentiation, we demonstrate that DELVE selects features that more accurately characterize cell populations and improve the recovery of cell type transitions. This feature selection framework provides an alternative approach for improving trajectory inference and uncovering co-variation amongst features along a biological trajectory. 
DELVE is implemented as an open-source python package and is publicly available at: https://github.com/jranek/delve.
Affiliation(s)
- Jolene S. Ranek
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Wayne Stallaert
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
| | - Justin Milner
- Department of Microbiology and Immunology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Natalie Stanley
- Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Jeremy E. Purvis
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| |
|
19
|
Zhang Y, Petukhov V, Biederstedt E, Que R, Zhang K, Kharchenko PV. Gene panel selection for targeted spatial transcriptomics. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.02.03.527053. [PMID: 36993340 PMCID: PMC10054990 DOI: 10.1101/2023.02.03.527053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Targeted spatial transcriptomics hold particular promise in analysis of complex tissues. Most such methods, however, measure only a limited panel of transcripts, which need to be selected in advance to inform on the cell types or processes being studied. A limitation of existing gene selection methods is that they rely on scRNA-seq data, ignoring platform effects between technologies. Here we describe gpsFISH, a computational method to perform gene selection through optimizing detection of known cell types. By modeling and adjusting for platform effects, gpsFISH outperforms other methods. Furthermore, gpsFISH can incorporate cell type hierarchies and custom gene preferences to accommodate diverse design requirements.
Affiliation(s)
- Yida Zhang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Department of Neurobiology, Duke University, Durham, NC, USA
| | - Viktor Petukhov
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Evan Biederstedt
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Richard Que
- Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
| | - Kun Zhang
- Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- San Diego Institute of Science, Altos Labs, San Diego, CA, USA
| | - Peter V. Kharchenko
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- San Diego Institute of Science, Altos Labs, San Diego, CA, USA
| |
|
20
|
Li X, Lin Y, Xie C, Li Z, Chen M, Wang P, Zhou J. A Clustering Method Unifying Cell-Type Recognition and Subtype Identification for Tumor Heterogeneity Analysis. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:822-832. [PMID: 36044493 DOI: 10.1109/tcbb.2022.3203185] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/04/2023]
Abstract
The rapid development of single-cell technology has opened up a whole new perspective for identifying cell types in multicellular organisms and understanding the relationships between them. Distinguishing different cell types and subtypes can identify the components of different immune cells and different tumor clones in the tumor microenvironment, which is the basic work of tumor heterogeneity analysis and can help researchers understand the mechanism of tumor immune escape. Existing algorithms treat both cell types and subtypes as populations of cells with specific gene expression patterns, which is not conducive to accurate cell typing. For that, we proposed a cell similarity metric that unifies cell type recognition and subtype identification (UCRSI), with the assumption that selectively expressed genes represent differences in underlying cell type with on/off manner, while differences in expression level represent different cell subtype with more/less manner. Our method calculates these two kinds of differences separately, and then combines them using a consensus adjacency matrix, and finally cell typing is completed using spectral clustering algorithm. The results show that UCRSI can reconstruct expert annotation of single-cell RNA sequencing datasets more robustly than existing methods. And, UCRSI is useful for analyzing tumor heterogeneity and improving visualization of large-scale cell clustering.
|
21
|
Hou J, Liang S, Xu C, Wei Y, Wang Y, Tan Y, Sahni N, McGrail D, Bernatchez C, Davies M, Li Y, Chen R, Yi S, Chen Y, Yee C, Chen K, Peng W. Single-cell CRISPR immune screens reveal immunological roles of tumor intrinsic factors. NAR Cancer 2022; 4:zcac038. [PMID: 36518525 PMCID: PMC9732527 DOI: 10.1093/narcan/zcac038] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/19/2022] [Revised: 10/15/2022] [Accepted: 11/16/2022] [Indexed: 12/14/2022] Open
Abstract
Genetic screens are widely exploited to develop novel therapeutic approaches for cancer treatment. With recent advances in single-cell technology, single-cell CRISPR screen (scCRISPR) platforms provide opportunities for target validation and mechanistic studies in a high-throughput manner. Here, we aim to establish scCRISPR platforms which are suitable for immune-related screens involving multiple cell types. We integrated two scCRISPR platforms, namely Perturb-seq and CROP-seq, with both in vitro and in vivo immune screens. By leveraging previously generated resources, we optimized experimental conditions and data analysis pipelines to achieve better consistency between results from high-throughput and individual validations. Furthermore, we evaluated the performance of scCRISPR immune screens in determining underlying mechanisms of tumor intrinsic immune regulation. Our results showed that scCRISPR platforms can simultaneously characterize gene expression profiles and perturbation effects present in individual cells in different immune screen conditions. Results from scCRISPR immune screens also predict transcriptional phenotype associated with clinical responses to cancer immunotherapy. More importantly, scCRISPR screen platforms reveal the interactive relationship between targeting tumor intrinsic factors and T cell-mediated antitumor immune response which cannot be easily assessed by bulk RNA-seq. Collectively, scCRISPR immune screens provide scalable and reliable platforms to elucidate molecular determinants of tumor immune resistance.
Affiliation(s)
- Jiakai Hou
- Department of Biology and Biochemistry, University of Houston, Houston, TX, USA
| | - Shaoheng Liang
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Chunyu Xu
- Department of Biology and Biochemistry, University of Houston, Houston, TX, USA
| | - Yanjun Wei
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Yunfei Wang
- Department of Melanoma Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Yukun Tan
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Nidhi Sahni
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Daniel J McGrail
- Center for Immunotherapy and Precision Immuno-Oncology, Cleveland Clinic, Cleveland, OH, USA
| | - Chantale Bernatchez
- Department of Melanoma Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Michael Davies
- Department of Melanoma Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Yumei Li
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Rui Chen
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - S Stephen Yi
- Department of Oncology, Livestrong Cancer Institutes, and Department of Biomedical Engineering, The University of Texas at Austin, Austin, TX, USA
- Interdisciplinary Life Sciences Graduate Programs (ILSGP) and Oden Institute for Computational Engineering and Sciences (ICES), The University of Texas at Austin, Austin, TX, USA
| | - Yiwen Chen
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Cassian Yee
- Department of Melanoma Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Department of Immunology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Ken Chen
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Weiyi Peng
- Department of Biology and Biochemistry, University of Houston, Houston, TX, USA
| |
|
22
|
Missarova A, Jain J, Butler A, Ghazanfar S, Stuart T, Brusko M, Wasserfall C, Nick H, Brusko T, Atkinson M, Satija R, Marioni JC. geneBasis: an iterative approach for unsupervised selection of targeted gene panels from scRNA-seq. Genome Biol 2021; 22:333. [PMID: 34872616 PMCID: PMC8650258 DOI: 10.1186/s13059-021-02548-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 07/20/2021] [Accepted: 11/19/2021] [Indexed: 12/13/2022] Open
Abstract
scRNA-seq datasets are increasingly used to identify gene panels that can be probed using alternative technologies, such as spatial transcriptomics, where choosing the best subset of genes is vital. Existing methods are limited by a reliance on pre-existing cell type labels or by difficulties in identifying markers of rare cells. We introduce an iterative approach, geneBasis, for selecting an optimal gene panel, where each newly added gene captures the maximum distance between the true manifold and the manifold constructed using the currently selected gene panel. Our approach outperforms existing strategies and can resolve cell types and subtle cell state differences.
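The iterative idea in this abstract, adding at each step the gene that the current panel represents worst, can be sketched as a greedy forward selection on a toy matrix. This is not the geneBasis implementation: the scoring rule (how much worse a gene is predicted from neighbours found with the current panel than from neighbours found with all genes), the squared-error measure, and the names `select_panel` and `toy_cells` are assumptions chosen for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def neighbours(cells, panel, k):
    """k nearest neighbours of each cell, using only the genes in `panel`.

    With an empty panel nothing is known yet, so every other cell counts
    as a neighbour (an assumption of this sketch).
    """
    n = len(cells)
    if not panel:
        return [[j for j in range(n) if j != i] for i in range(n)]
    coords = [[cell[g] for g in panel] for cell in cells]
    result = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Ties in distance are broken by cell index to keep the sketch deterministic.
        others.sort(key=lambda j: (sq_dist(coords[i], coords[j]), j))
        result.append(others[:k])
    return result


def neighbour_error(cells, gene, nbrs):
    """Mean squared gap between a gene's value and its neighbourhood average."""
    total = 0.0
    for i, idx in enumerate(nbrs):
        avg = sum(cells[j][gene] for j in idx) / len(idx)
        total += (cells[i][gene] - avg) ** 2
    return total / len(cells)


def select_panel(cells, n_genes, k=1):
    """Greedily pick the genes whose structure the current panel misses most."""
    all_genes = list(range(len(cells[0])))
    true_nbrs = neighbours(cells, all_genes, k)
    baseline = {g: neighbour_error(cells, g, true_nbrs) for g in all_genes}
    panel = []
    while len(panel) < n_genes:
        panel_nbrs = neighbours(cells, panel, k)
        candidates = [g for g in all_genes if g not in panel]
        best = max(
            candidates,
            key=lambda g: neighbour_error(cells, g, panel_nbrs) - baseline[g],
        )
        panel.append(best)
    return panel


# Toy data: rows are cells, columns are genes.
# Gene 1 duplicates gene 0, gene 2 marks a subgroup, gene 3 is constant.
toy_cells = [
    [0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 5, 1], [0, 0, 5, 1],
    [10, 10, 0, 1], [10, 10, 0, 1], [10, 10, 5, 1], [10, 10, 5, 1],
]
print(select_panel(toy_cells, 2))
```

In this toy setting the redundant gene 1 is skipped once gene 0 is in the panel, because neighbours found with gene 0 already predict gene 1 perfectly, while gene 2 is still poorly represented.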
Affiliation(s)
- Alsu Missarova
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
  - Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Andrew Butler
  - New York Genome Center, New York, USA
  - Center for Genomics and Systems Biology, NYU, New York, USA
- Shila Ghazanfar
  - Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Tim Stuart
  - New York Genome Center, New York, USA
  - Center for Genomics and Systems Biology, NYU, New York, USA
- Maigan Brusko
  - Department of Pathology, Immunology, and Laboratory Medicine, College of Medicine, University of Florida, Jacksonville, USA
- Clive Wasserfall
  - Department of Pathology, Immunology, and Laboratory Medicine, College of Medicine, University of Florida, Jacksonville, USA
- Harry Nick
  - Department of Neuroscience, College of Medicine, University of Florida, Jacksonville, USA
- Todd Brusko
  - Department of Pathology, Immunology, and Laboratory Medicine, College of Medicine, University of Florida, Jacksonville, USA
- Mark Atkinson
  - Department of Pathology, Immunology, and Laboratory Medicine, College of Medicine, University of Florida, Jacksonville, USA
- Rahul Satija
  - New York Genome Center, New York, USA
  - Center for Genomics and Systems Biology, NYU, New York, USA
- John C Marioni
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
  - Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
|
23
|
Yang P, Huang H, Liu C. Feature selection revisited in the single-cell era. Genome Biol 2021; 22:321. [PMID: 34847932 PMCID: PMC8638336 DOI: 10.1186/s13059-021-02544-3] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 07/26/2021] [Accepted: 11/15/2021] [Indexed: 12/13/2022] Open
Abstract
Recent advances in single-cell biotechnologies have resulted in high-dimensional datasets with increased complexity, making feature selection an essential technique for single-cell data analysis. Here, we revisit feature selection techniques and summarise recent developments. We review their application to a range of single-cell data types generated from traditional cytometry and imaging technologies and the latest array of single-cell omics technologies. We highlight some of the challenges and future directions and finally consider their scalability and make general recommendations on each type of feature selection method. We hope this review stimulates future research and application of feature selection in the single-cell era.
Affiliation(s)
- Pengyi Yang
  - School of Mathematics and Statistics, University of Sydney, Sydney, NSW, 2006, Australia
  - Computational Systems Biology Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW, 2145, Australia
  - Charles Perkins Centre, University of Sydney, Sydney, NSW, 2006, Australia
- Hao Huang
  - School of Mathematics and Statistics, University of Sydney, Sydney, NSW, 2006, Australia
  - Computational Systems Biology Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW, 2145, Australia
- Chunlei Liu
  - Computational Systems Biology Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW, 2145, Australia
|