1
Aplakidou E, Vergoulidis N, Chasapi M, Venetsianou NK, Kokoli M, Panagiotopoulou E, Iliopoulos I, Karatzas E, Pafilis E, Georgakopoulos-Soares I, Kyrpides NC, Pavlopoulos GA, Baltoumas FA. Visualizing metagenomic and metatranscriptomic data: A comprehensive review. Comput Struct Biotechnol J 2024; 23:2011-2033. [PMID: 38765606 PMCID: PMC11101950 DOI: 10.1016/j.csbj.2024.04.060] [Received: 01/27/2024] [Revised: 04/25/2024] [Accepted: 04/25/2024] [Indexed: 05/22/2024] Open
Abstract
The fields of metagenomics and metatranscriptomics involve the examination of complete nucleotide sequences, gene identification, and analysis of potential biological functions within diverse organisms or environmental samples. Despite the vast opportunities for discovery in metagenomics, the sheer volume and complexity of sequence data often present challenges in processing, analysis, and visualization. This article highlights the critical role of advanced visualization tools in enabling effective exploration, querying, and analysis of these complex datasets. Emphasizing the importance of accessibility, the article categorizes various visualizers based on their intended applications and highlights their utility in empowering bioinformaticians and non-bioinformaticians to interpret and derive insights from meta-omics data effectively.
Affiliation(s)
- Eleni Aplakidou
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Department of Informatics and Telecommunications, Data Science and Information Technologies program, University of Athens, 15784 Athens, Greece
- Nikolaos Vergoulidis
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Maria Chasapi
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Department of Informatics and Telecommunications, Data Science and Information Technologies program, University of Athens, 15784 Athens, Greece
- Nefeli K. Venetsianou
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Maria Kokoli
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Eleni Panagiotopoulou
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Department of Informatics and Telecommunications, Data Science and Information Technologies program, University of Athens, 15784 Athens, Greece
- Ioannis Iliopoulos
  - Department of Basic Sciences, School of Medicine, University of Crete, 71003 Heraklion, Greece
- Evangelos Karatzas
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
- Evangelos Pafilis
  - Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Nikos C. Kyrpides
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Georgios A. Pavlopoulos
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Center of New Biotechnologies & Precision Medicine, Department of Medicine, School of Health Sciences, National and Kapodistrian University of Athens, Greece
  - Hellenic Army Academy, 16673 Vari, Greece
- Fotis A. Baltoumas
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
2
Arango AS, Park H, Tajkhorshid E. Topological Learning Approach to Characterizing Biological Membranes. J Chem Inf Model 2024. [PMID: 38912752 DOI: 10.1021/acs.jcim.4c00552] [Indexed: 06/25/2024]
Abstract
Biological membranes play key roles in cellular compartmentalization, structure, and signaling pathways. At varying temperatures, individual membrane lipids sample from different configurations, a process that frequently leads to higher-order phase behavior and phenomena. Here, we present a persistent homology (PH)-based method for quantifying the structural features of individual and bulk lipids, providing local and contextual information on lipid tail organization. Our method leverages the mathematical machinery of algebraic topology and machine learning to infer temperature-dependent structural information on lipids from static coordinates. To train our model, we generated multiple molecular dynamics trajectories of dipalmitoyl-phosphatidylcholine membranes at varying temperatures. A fingerprint was then constructed for each set of lipid coordinates by PH filtration, in which interaction spheres were grown around the lipid atoms while tracking their intersections. The sphere filtration formed a simplicial complex that captures enduring key topological features of the configuration landscape using homology, yielding persistence data. Following fingerprint extraction for physiologically relevant temperatures, the persistence data were used to train an attention-based neural network for assignment of effective temperature values to selected membrane regions. Our persistent homology-based method captures the local structural effects, via effective temperature, of lipids adjacent to other membrane constituents, e.g., sterols and proteins. This topological learning approach can predict lipid effective temperatures from static coordinates across multiple spatial resolutions. The tool, called MembTDA, can be accessed at https://github.com/hyunp2/Memb-TDA.
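The sphere-growing filtration described in this abstract can be illustrated in its simplest form, dimension-0 persistence (connected components), without any topology library. The sketch below is not the MembTDA implementation: the function name `h0_persistence`, the toy coordinates, and the convention that two spheres merge when their common radius reaches half the distance between centers are all assumptions made for illustration.

```python
import itertools
import math


def h0_persistence(points):
    """Return the death radii of connected components (H0) of a point cloud.

    Every point is born at radius 0. Spheres grown around two points
    intersect once the radius reaches half their distance (assumed
    convention); at that moment two components merge and one of them dies.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Visit all pairs in order of increasing distance (Kruskal-style).
    edges = sorted(
        (math.dist(p, q), i, j)
        for (i, p), (j, q) in itertools.combinations(enumerate(points), 2)
    )
    deaths = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(dist / 2.0)  # radius at which the spheres touch
    return deaths  # the last surviving component never dies and is omitted


# Hypothetical atom coordinates in arbitrary units.
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(h0_persistence(atoms))  # [0.5, 1.0]
```

The list of death radii is a crude fingerprint of how tightly the points are packed; a full persistence computation would also track loops and voids in higher dimensions.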
Affiliation(s)
- Andres S Arango
  - Theoretical and Computational Biophysics Group, NIH Resource Center for Macromolecular Modeling and Visualization, Beckman Institute for Advanced Science and Technology, Department of Biochemistry, and Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Hyun Park
  - Theoretical and Computational Biophysics Group, NIH Resource Center for Macromolecular Modeling and Visualization, Beckman Institute for Advanced Science and Technology, Department of Biochemistry, and Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Emad Tajkhorshid
  - Theoretical and Computational Biophysics Group, NIH Resource Center for Macromolecular Modeling and Visualization, Beckman Institute for Advanced Science and Technology, Department of Biochemistry, and Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
3
Vardaman D, Ali MA, Bolding C, Tidwell H, Stephens H, Tyrrell DJ. Development of a Spectral Flow Cytometry Analysis Pipeline for High-Dimensional Immune Cell Characterization. bioRxiv 2024:2024.06.19.599633. [PMID: 38948780 PMCID: PMC11213029 DOI: 10.1101/2024.06.19.599633] [Indexed: 07/02/2024]
Abstract
Flow cytometry is a widely used technique for immune cell analysis, offering insights into cell composition and function. Spectral flow cytometry allows for high-dimensional analysis of immune cells, overcoming limitations of conventional flow cytometry. However, analyzing data from large antibody panels can be challenging using traditional bi-axial gating strategies. Here, we present a novel analysis pipeline designed to improve analysis of spectral flow cytometry. We employ this method to identify rare T cell populations in aging. We isolated splenocytes from young (2-3 months) and aged (18-19 months) female mice, then stained them with a panel of 20 fluorescently labeled antibodies. Spectral flow cytometry was performed, followed by data processing and analysis using Python within a Jupyter Notebook environment to perform batch correction, unsupervised clustering, dimensionality reduction, and differential expression analysis. Our analysis of 3,776,804 T cells from 11 spleens revealed 34 distinct T cell clusters identified by surface marker expression. We observed significant differences between young and aged mice, with certain clusters enriched in one age group over the other. Naïve, effector memory, and central memory CD8+ and CD4+ T cell subsets exhibited age-associated changes in abundance and marker expression. Additionally, γδ T cell clusters showed differential abundance between age groups. By leveraging high-dimensional analysis methods borrowed from single-cell RNA sequencing analysis, we identified age-related differences in T cell subsets, providing insights into the immune aging process. This approach offers a robust, free, and easily implemented analysis pipeline for spectral flow cytometry data that may facilitate the discovery of novel therapeutic targets for age-related immune dysfunction.
4
Lobentanzer S, Rodriguez-Mier P, Bauer S, Saez-Rodriguez J. Molecular causality in the advent of foundation models. Mol Syst Biol 2024:10.1038/s44320-024-00041-w. [PMID: 38890548 DOI: 10.1038/s44320-024-00041-w] [Received: 01/17/2024] [Revised: 03/18/2024] [Accepted: 03/21/2024] [Indexed: 06/20/2024] Open
Abstract
Correlation is not causation: this simple and uncontroversial statement has far-reaching implications. Defining and applying causality in biomedical research has posed significant challenges to the scientific community. In this perspective, we attempt to connect the partly disparate fields of systems biology, causal reasoning, and machine learning to inform future approaches in the field of systems biology and molecular medicine.
Affiliation(s)
- Sebastian Lobentanzer
  - Heidelberg University, Faculty of Medicine and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
- Pablo Rodriguez-Mier
  - Heidelberg University, Faculty of Medicine and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
- Julio Saez-Rodriguez
  - Heidelberg University, Faculty of Medicine and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
5
Jia Y, Ma P, Yao Q. CellMarkerPipe: cell marker identification and evaluation pipeline in single cell transcriptomes. Sci Rep 2024; 14:13151. [PMID: 38849445 PMCID: PMC11161599 DOI: 10.1038/s41598-024-63492-z] [Received: 02/13/2024] [Accepted: 05/29/2024] [Indexed: 06/09/2024] Open
Abstract
Assessing marker genes from all cell clusters can be time-consuming and lack a systematic strategy. Streamlining this process through a unified computational platform that automates identification and benchmarking will greatly enhance efficiency and ensure a fair evaluation. We therefore developed a novel computational platform, cellMarkerPipe (https://github.com/yao-laboratory/cellMarkerPipe), for automated cell-type specific marker gene identification from scRNA-seq data, coupled with a comprehensive evaluation schema. CellMarkerPipe adaptively wraps around a collection of commonly used and state-of-the-art tools, including Seurat, COSG, SC3, SCMarker, COMET, and scGeneFit. Through rigorous testing across diverse samples, we find that SCMarker performs reliably overall in single marker gene selection, while COSG is notably fast with comparable efficacy. Furthermore, we demonstrate the pivotal role of our approach in real-world medical datasets. This general and open-source pipeline stands as a significant advancement in streamlining cell marker gene identification and evaluation, fitting broad applications in the field of cellular biology and medical research.
Affiliation(s)
- Yinglu Jia
  - School of Computing, University of Nebraska Lincoln, 256 Avery Hall, Lincoln, NE, 68588, USA
  - Department of Chemistry, University of Nebraska Lincoln, Hamilton Hall, Lincoln, NE, 68588, USA
- Pengchong Ma
  - School of Computing, University of Nebraska Lincoln, 256 Avery Hall, Lincoln, NE, 68588, USA
- Qiuming Yao
  - School of Computing, University of Nebraska Lincoln, 256 Avery Hall, Lincoln, NE, 68588, USA
  - Nebraska Center for the Prevention of Obesity Diseases, 316C Leverton Hall, Lincoln, NE, 68583, USA
  - Nebraska Center for Virology, University of Nebraska, 4240 Fair St., Lincoln, NE, 68583, USA
6
Marx V. Seeing data as t-SNE and UMAP do. Nat Methods 2024; 21:930-933. [PMID: 38789649 DOI: 10.1038/s41592-024-02301-x] [Indexed: 05/26/2024]
7
Taylor MA, Kandyba E, Halliwill K, Delrosario R, Khoroshkin M, Goodarzi H, Quigley D, Li YR, Wu D, Bollam SR, Mirzoeva OK, Akhurst RJ, Balmain A. Stem-cell states converge in multistage cutaneous squamous cell carcinoma development. Science 2024; 384:eadi7453. [PMID: 38815020 DOI: 10.1126/science.adi7453] [Received: 05/15/2023] [Accepted: 04/05/2024] [Indexed: 06/01/2024]
Abstract
Stem cells play a critical role in cancer development by contributing to cell heterogeneity, lineage plasticity, and drug resistance. We created gene expression networks from hundreds of mouse tissue samples (both normal and tumor) and integrated these with lineage tracing and single-cell RNA-seq, to identify convergence of cell states in premalignant tumor cells expressing markers of lineage plasticity and drug resistance. Two of these cell states representing multilineage plasticity or proliferation were inversely correlated, suggesting a mutually exclusive relationship. Treatment of carcinomas in vivo with chemotherapy repressed the proliferative state and activated multilineage plasticity whereas inhibition of differentiation repressed plasticity and potentiated responses to cell cycle inhibitors. Manipulation of this cell state transition point may provide a source of potential combinatorial targets for cancer therapy.
Affiliation(s)
- Mark A Taylor
  - Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
  - Clinical Research Centre, Medical University of Bialystok, Bialystok 15-089, Poland
- Eve Kandyba
  - Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
- Kyle Halliwill
  - Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
  - AbbVie, South San Francisco, CA 94080, USA
- Reyno Delrosario
  - Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
- Matvei Khoroshkin
  - Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
- Hani Goodarzi
  - Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
  - Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, CA 94518, USA
  - Department of Urology, University of California San Francisco, San Francisco, CA 94518, USA
  - Arc Institute, Palo Alto, CA 94304, USA
- David Quigley
  - Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
  - Department of Urology, University of California San Francisco, San Francisco, CA 94518, USA
  - Department of Epidemiology & Biostatistics, University of California San Francisco, San Francisco, CA 94518, USA
- Yun Rose Li
  - Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
  - Department of Radiation Oncology, City of Hope National Medical Center, Duarte, CA 91010, USA
  - Department of Cancer Genetics & Epigenetics, City of Hope National Medical Center, Duarte, CA 91010, USA
  - Division of Quantitative Medicine & Systems Biology, Translational Genomics Research Institute, Phoenix, AZ 85004, USA
- Di Wu
  - Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
- Saumya R Bollam
  - Biomedical Sciences Graduate Program, University of California San Francisco, San Francisco, CA 94518, USA
- Olga K Mirzoeva
  - Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
- Rosemary J Akhurst
  - Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
  - Department of Anatomy, University of California San Francisco, San Francisco, CA 94518, USA
- Allan Balmain
  - Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
  - Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, CA 94518, USA
8
Rafelski SM, Theriot JA. Establishing a conceptual framework for holistic cell states and state transitions. Cell 2024; 187:2633-2651. [PMID: 38788687 DOI: 10.1016/j.cell.2024.04.035] [Received: 03/13/2024] [Revised: 04/10/2024] [Accepted: 04/24/2024] [Indexed: 05/26/2024]
Abstract
Cell states were traditionally defined by how they looked, where they were located, and what functions they performed. In this post-genomic era, the field is largely focused on a molecular view of cell state. Moving forward, we anticipate that the observables used to define cell states will evolve again as single-cell imaging and analytics are advancing at a breakneck pace via the collection of large-scale, systematic cell image datasets and the application of quantitative image-based data science methods. This is, therefore, a key moment in the arc of cell biological research to develop approaches that integrate the spatiotemporal observables of the physical structure and organization of the cell with molecular observables toward the concept of a holistic cell state. In this perspective, we propose a conceptual framework for holistic cell states and state transitions that is data-driven, practical, and useful to enable integrative analyses and modeling across many data types.
Affiliation(s)
- Susanne M Rafelski
  - Allen Institute for Cell Science, 615 Westlake Avenue N, Seattle, WA 98125, USA
- Julie A Theriot
  - Department of Biology and Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
9
Mulè MP, Martins AJ, Cheung F, Farmer R, Sellers BA, Quiel JA, Jain A, Kotliarov Y, Bansal N, Chen J, Schwartzberg PL, Tsang JS. Integrating population and single-cell variations in vaccine responses identifies a naturally adjuvanted human immune setpoint. Immunity 2024; 57:1160-1176.e7. [PMID: 38697118 DOI: 10.1016/j.immuni.2024.04.009] [Received: 03/28/2023] [Revised: 01/21/2024] [Accepted: 04/12/2024] [Indexed: 05/04/2024]
Abstract
Multimodal single-cell profiling methods can capture immune cell variations unfolding over time at the molecular, cellular, and population levels. Transforming these data into biological insights remains challenging. Here, we introduce a framework to integrate variations at the human population and single-cell levels in vaccination responses. Comparing responses following AS03-adjuvanted versus unadjuvanted influenza vaccines with CITE-seq revealed AS03-specific early (day 1) response phenotypes, including a B cell signature of elevated germinal center competition. A correlated network of cell-type-specific transcriptional states defined the baseline immune status associated with high antibody responders to the unadjuvanted vaccine. Certain innate subsets in the network appeared "naturally adjuvanted," with transcriptional states resembling those induced uniquely by AS03-adjuvanted vaccination. Consistently, CD14+ monocytes from high responders at baseline had elevated phospho-signaling responses to lipopolysaccharide stimulation. Our findings link baseline immune setpoints to early vaccine responses, with positive implications for adjuvant development and immune response engineering.
Affiliation(s)
- Matthew P Mulè
  - Multiscale Systems Biology Section, Laboratory of Immune System Biology, NIAID, NIH, Bethesda, MD, USA
  - NIH-Oxford-Cambridge Scholars Program, Department of Medicine, University of Cambridge, Cambridge, UK
- Andrew J Martins
  - Multiscale Systems Biology Section, Laboratory of Immune System Biology, NIAID, NIH, Bethesda, MD, USA
- Foo Cheung
  - NIH Center for Human Immunology, NIAID, NIH, Bethesda, MD, USA
- Rohit Farmer
  - NIH Center for Human Immunology, NIAID, NIH, Bethesda, MD, USA
- Brian A Sellers
  - NIH Center for Human Immunology, NIAID, NIH, Bethesda, MD, USA
- Juan A Quiel
  - NIH Center for Human Immunology, NIAID, NIH, Bethesda, MD, USA
- Arjun Jain
  - Multiscale Systems Biology Section, Laboratory of Immune System Biology, NIAID, NIH, Bethesda, MD, USA
- Yuri Kotliarov
  - NIH Center for Human Immunology, NIAID, NIH, Bethesda, MD, USA
- Neha Bansal
  - Multiscale Systems Biology Section, Laboratory of Immune System Biology, NIAID, NIH, Bethesda, MD, USA
- Jinguo Chen
  - NIH Center for Human Immunology, NIAID, NIH, Bethesda, MD, USA
- Pamela L Schwartzberg
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
  - Cell Signaling and Immunity Section, NIAID, NIH, Bethesda, MD, USA
- John S Tsang
  - Multiscale Systems Biology Section, Laboratory of Immune System Biology, NIAID, NIH, Bethesda, MD, USA
  - NIH Center for Human Immunology, NIAID, NIH, Bethesda, MD, USA
10
Miles CE, McKinley SA, Ding F, Lehoucq RB. Inferring Stochastic Rates from Heterogeneous Snapshots of Particle Positions. Bull Math Biol 2024; 86:74. [PMID: 38740619 DOI: 10.1007/s11538-024-01301-4] [Received: 11/09/2023] [Accepted: 04/20/2024] [Indexed: 05/16/2024]
Abstract
Many imaging techniques for biological systems, like fixation of cells coupled with fluorescence microscopy, provide sharp spatial resolution in reporting locations of individuals at a single moment in time but also destroy the dynamics they intend to capture. These snapshot observations contain no information about individual trajectories, but still encode information about movement and demographic dynamics, especially when combined with a well-motivated biophysical model. The relationship between spatially evolving populations and single-moment representations of their collective locations is well-established with partial differential equations (PDEs) and their inverse problems. However, experimental data is commonly a set of locations whose number is insufficient to approximate a continuous-in-space PDE solution. Here, motivated by popular subcellular imaging data of gene expression, we embrace the stochastic nature of the data and investigate the mathematical foundations of parametrically inferring demographic rates from snapshots of particles undergoing birth, diffusion, and death in a nuclear or cellular domain. Toward inference, we rigorously derive a connection between individual particle paths and their presentation as a Poisson spatial process. Using this framework, we investigate the properties of the resulting inverse problem and study factors that affect quality of inference. One pervasive feature of this experimental regime is the presence of cell-to-cell heterogeneity. Rather than being a hindrance, we show that cell-to-cell geometric heterogeneity can increase the quality of inference on dynamics for certain parameter regimes. Altogether, the results serve as a basis for more detailed investigations of subcellular spatial patterns of RNA molecules and other stochastically evolving populations that can only be observed for single instants in their time evolution.
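The link between individual particle histories and a Poisson-distributed snapshot can be checked numerically in a deliberately simplified setting. The sketch below is not the authors' model: it assumes a single point source at the origin, free diffusion on an unbounded 1D line instead of a nuclear or cellular domain, and hypothetical parameter values. Under those assumptions the number of particles alive in a late snapshot should be approximately Poisson with mean (birth rate)/(death rate), so the sample mean and variance should nearly agree.

```python
import math
import random


def snapshot(birth_rate, death_rate, diffusivity, t_end, rng):
    """Simulate the particle positions observed in one snapshot at t_end.

    Particles are born at the origin as a Poisson process, diffuse freely
    in 1D (no boundary; a simplifying assumption), and die at a constant
    per-particle rate. Only particles still alive at t_end are returned.
    """
    positions = []
    t = rng.expovariate(birth_rate)  # time of the first birth
    while t < t_end:
        age = t_end - t
        # A particle of this age survives with probability exp(-death_rate * age).
        if rng.random() < math.exp(-death_rate * age):
            # Free diffusion: displacement ~ Normal(0, 2 * D * age).
            positions.append(rng.gauss(0.0, math.sqrt(2.0 * diffusivity * age)))
        t += rng.expovariate(birth_rate)  # waiting time to the next birth
    return positions


rng = random.Random(0)
counts = [len(snapshot(10.0, 1.0, 0.5, 20.0, rng)) for _ in range(2000)]
mean_count = sum(counts) / len(counts)
var_count = sum((c - mean_count) ** 2 for c in counts) / (len(counts) - 1)
print(f"mean = {mean_count:.2f}, variance = {var_count:.2f}, theory = {10.0 / 1.0:.1f}")
```

Because each snapshot discards trajectories, only the count and the spatial spread of survivors carry information about the rates, which is what makes the inverse problem in the abstract non-trivial.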
Affiliation(s)
- Scott A McKinley
  - Department of Mathematics, Tulane University, New Orleans, LA, USA
- Fangyuan Ding
  - Departments of Biomedical Engineering, Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Richard B Lehoucq
  - Discrete Math and Optimization, Sandia National Laboratories, Albuquerque, NM, USA
11
Schmidt M, Avagyan S, Reiche K, Binder H, Loeffler-Wirth H. A Spatial Transcriptomics Browser for Discovering Gene Expression Landscapes across Microscopic Tissue Sections. Curr Issues Mol Biol 2024; 46:4701-4720. [PMID: 38785552 PMCID: PMC11119626 DOI: 10.3390/cimb46050284] [Received: 03/25/2024] [Revised: 04/30/2024] [Accepted: 05/03/2024] [Indexed: 05/25/2024] Open
Abstract
A crucial feature of life is its spatial organization and compartmentalization on the molecular, cellular, and tissue levels. Spatial transcriptomics (ST) technology has opened a new chapter of the sequencing revolution, emerging rapidly with transformative effects across biology. This technique produces extensive and complex sequencing data, raising the need for computational methods for their comprehensive analysis and interpretation. We developed the ST browser web tool for the interactive discovery of ST images, focusing on different functional aspects such as single gene expression, the expression of functional gene sets, as well as the inspection of the spatial patterns of cell-cell interactions. As a unique feature, our tool applies self-organizing map (SOM) machine learning to the ST data. Our SOM data portrayal method generates individual gene expression landscapes for each spot in the ST image, enabling its downstream analysis with high resolution. The performance of the spatial browser is demonstrated by disentangling the intra-tumoral heterogeneity of melanoma and the microarchitecture of the mouse brain. The integration of machine-learning-based SOM portrayal into an interactive ST analysis environment opens novel perspectives for the comprehensive knowledge mining of the organization and interactions of cellular ecosystems.
Affiliation(s)
- Maria Schmidt
  - Interdisciplinary Centre for Bioinformatics (IZBI), Leipzig University, Härtelstr. 16-18, 04107 Leipzig, Germany
- Susanna Avagyan
  - Armenian Bioinformatics Institute, 3/6 Nelson Stepanyan Str., Yerevan 0062, Armenia
- Kristin Reiche
  - Department of Diagnostics, Fraunhofer Institute for Cell Therapy and Immunology (IZI), Perlickstrasse 1, 04103 Leipzig, Germany
  - Institute for Clinical Immunology, University Hospital of Leipzig, 04103 Leipzig, Germany
- Hans Binder
  - Interdisciplinary Centre for Bioinformatics (IZBI), Leipzig University, Härtelstr. 16-18, 04107 Leipzig, Germany
  - Armenian Bioinformatics Institute, 3/6 Nelson Stepanyan Str., Yerevan 0062, Armenia
- Henry Loeffler-Wirth
  - Interdisciplinary Centre for Bioinformatics (IZBI), Leipzig University, Härtelstr. 16-18, 04107 Leipzig, Germany
12
Cai L, Anastassiou D. CASCC: a co-expression-assisted single-cell RNA-seq data clustering method. Bioinformatics 2024; 40:btae283. [PMID: 38662553 DOI: 10.1093/bioinformatics/btae283] [Received: 10/11/2023] [Revised: 03/28/2024] [Accepted: 04/23/2024] [Indexed: 05/15/2024]
Abstract
SUMMARY: Existing clustering methods for characterizing cell populations from single-cell RNA sequencing are constrained by several limitations stemming from the fact that clusters often cannot be homogeneous, particularly for transitioning populations. On the other hand, dominant cell populations within samples can be identified independently by their strong gene co-expression signatures using methods unrelated to partitioning. Here, we introduce a clustering method, CASCC (co-expression-assisted single-cell clustering), designed to improve biological accuracy using gene co-expression features identified using an unsupervised adaptive attractor algorithm. CASCC outperformed other methods as evidenced by multiple evaluation metrics, and our results suggest that CASCC can improve the analysis of single-cell transcriptomics, enabling potential new discoveries related to underlying biological mechanisms. AVAILABILITY AND IMPLEMENTATION: The CASCC R package is publicly available at https://github.com/LingyiC/CASCC and https://zenodo.org/doi/10.5281/zenodo.10648327.
Affiliation(s)
- Lingyi Cai
  - Department of Systems Biology, Columbia University, New York, NY 10032, United States
  - Department of Electrical Engineering, Columbia University, New York, NY 10027, United States
- Dimitris Anastassiou
  - Department of Systems Biology, Columbia University, New York, NY 10032, United States
  - Department of Electrical Engineering, Columbia University, New York, NY 10027, United States
  - Irving Comprehensive Cancer Center, Columbia University, New York, NY 10032, United States
13
Shah M, Guo L, Xu X, Deng L, Lu K, Dong J, Zhao C, Xu J. eLIMS: Ensemble Learning-Based Spatial Segmentation of Mass Spectrometry Imaging to Explore Metabolic Heterogeneity. J Proteome Res 2024. [PMID: 38690713 DOI: 10.1021/acs.jproteome.3c00764] [Indexed: 05/02/2024]
Abstract
Spatial segmentation is an essential processing method for image analysis aiming to identify the characteristic suborgans or microregions from mass spectrometry imaging (MSI) data, which is critical for understanding the spatial heterogeneity of biological information and function and the underlying molecular signatures. Due to the intrinsic characteristics of MSI data, including spectral nonlinearity, high dimensionality, and large data size, the common segmentation methods lack the capability for capturing the accurate microregions associated with biological functions. Here we propose an ensemble learning-based spatial segmentation strategy, named eLIMS, that combines a randomized uniform manifold approximation and projection (r-UMAP) dimensionality reduction module for extracting significant features and an ensemble pixel clustering module for aggregating the clustering maps from r-UMAP. Three MSI datasets are used to evaluate the performance of eLIMS, including mouse fetus, human adenocarcinoma, and mouse brain. Experimental results demonstrate that the proposed method has potential in partitioning the heterogeneous tissues into several subregions associated with anatomical structure; for example, the suborgans of the brain region in mouse fetus data are identified as dorsal pallium, midbrain, and brainstem. Furthermore, it effectively discovers critical microregions related to physiological and pathological variations, offering new insight into metabolic heterogeneity.
Affiliation(s)
- Mudassir Shah
- Department of Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen University, Xiamen 361005, China
| | - Lei Guo
- Interdisciplinary Institute of Medical Engineering, Fuzhou University, Fuzhou 350108, China
| | - Xiangnan Xu
- School of Business and Economics, Humboldt-Universität zu Berlin, Berlin 10099, Germany
| | - Lingli Deng
- Department of Information Engineering, East China University of Technology, Nanchang 330013, China
| | - Keyi Lu
- Department of Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen University, Xiamen 361005, China
| | - Jiyang Dong
- Department of Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen University, Xiamen 361005, China
| | - Chao Zhao
- Bionic Sensing and Intelligence Center, Institute of Biomedical and Health Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China
| | - Jingjing Xu
- Department of Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen University, Xiamen 361005, China
| |
|
14
|
Park Y, Hauschild AC. The effect of data transformation on low-dimensional integration of single-cell RNA-seq. BMC Bioinformatics 2024; 25:171. [PMID: 38689234 PMCID: PMC11059821 DOI: 10.1186/s12859-024-05788-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Accepted: 04/16/2024] [Indexed: 05/02/2024] Open
Abstract
BACKGROUND Recent developments in single-cell RNA sequencing have opened up a multitude of possibilities to study tissues at the level of cellular populations. However, the heterogeneity in single-cell sequencing data necessitates appropriate procedures to adjust for technological limitations and various sources of noise when integrating datasets from different studies. While many analysis procedures employ various preprocessing steps, they often overlook the importance of selecting and optimizing the employed data transformation methods. RESULTS This work investigates data transformation approaches used in single-cell clustering analysis tools and their effects on batch integration analysis. In particular, we compare 16 transformations and their impact on the low-dimensional representations, aiming to reduce the batch effect and integrate multiple single-cell sequencing datasets. Our results show that data transformations strongly influence the results of single-cell clustering in low-dimensional spaces, such as those generated by UMAP or PCA. Moreover, these changes in low-dimensional space also significantly affect trajectory analysis using multiple datasets. However, the performance of the data transformations varies greatly across datasets, and the optimal method was different for each dataset. Additionally, we explored how data transformation impacts the analysis of deep feature encodings using deep neural network-based models, including autoencoder-based models and prototypical networks. Data transformation also strongly affects the outcome of deep neural network models. CONCLUSIONS Our findings suggest that the batch effect and noise in integrative analysis are highly influenced by data transformation. Low-dimensional features can integrate different batches well when proper data transformation is applied. Furthermore, we found that the batch mixing score in low-dimensional space can guide the selection of the optimal data transformation. In conclusion, data preprocessing is one of the most crucial analysis steps and needs to be cautiously considered in the integrative analysis of multiple scRNA-seq datasets.
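To make concrete why the choice of transformation matters, the sketch below contrasts two widely used options on toy counts. It does not reproduce any of the 16 transformations compared in the study. The function names and the example cells are assumptions for illustration only.

```python
import math


def cpm(counts):
    """Scale one cell's raw counts to counts per million."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]


def log1p_cpm(counts):
    """Library-size normalization followed by log(1 + x)."""
    return [math.log1p(x) for x in cpm(counts)]


def sqrt_transform(counts):
    """Square-root transform, a simple variance-stabilizing choice for counts."""
    return [math.sqrt(c) for c in counts]


# Two hypothetical cells with the same composition but a 10x depth difference.
shallow = [1, 0, 9]
deep = [10, 0, 90]

# Depth-normalized values agree between the two cells.
print(all(math.isclose(a, b) for a, b in zip(log1p_cpm(shallow), log1p_cpm(deep))))
# Without depth normalization, the two cells look different.
print(sqrt_transform(shallow) == sqrt_transform(deep))
```

A sequencing-depth difference between batches is removed by the first transformation and retained by the second, so any downstream embedding would see the two cells as identical in one case and distinct in the other.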
Affiliation(s)
- Youngjun Park
- Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
- International Max Planck Research Schools for Genome Science, Georg-August-Universität Göttingen, Göttingen, Germany
| | - Anne-Christin Hauschild
- Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
- Campus-Institute Data Science (CIDAS), Georg-August-Universität Göttingen, Göttingen, Germany
| |
|
15
|
Wang Y, Chen X, Tang N, Guo M, Ai D. Boosting Clear Cell Renal Carcinoma-Specific Drug Discovery Using a Deep Learning Algorithm and Single-Cell Analysis. Int J Mol Sci 2024; 25:4134. [PMID: 38612943 PMCID: PMC11012314 DOI: 10.3390/ijms25074134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2024] [Revised: 03/26/2024] [Accepted: 04/03/2024] [Indexed: 04/14/2024] Open
Abstract
Clear cell renal carcinoma (ccRCC), the most common subtype of renal cell carcinoma, is characterized by a highly heterogeneous and complex tumor microenvironment. Existing clinical intervention strategies, such as targeted therapy and immunotherapy, have failed to achieve good therapeutic effects. In this article, single-cell transcriptome sequencing (scRNA-seq) data from six patients, downloaded from the GEO database, were used to describe the tumor microenvironment (TME) of ccRCC, including its T cells, tumor-associated macrophages (TAMs), endothelial cells (ECs), and cancer-associated fibroblasts (CAFs). Based on the differential typing of the TME, we identified tumor cell-specific regulatory programs that are mediated by three key transcription factors (TFs), whilst the TF EPAS1/HIF-2α was identified via drug virtual screening through our analysis of ccRCC's protein structure. Then, a combined deep graph neural network and machine learning algorithm was used to select anti-ccRCC compounds from bioactive compound libraries, including the FDA-approved drug library, natural product library, and human endogenous metabolite compound library. Finally, five compounds were obtained, including two FDA-approved drugs (flufenamic acid and fludarabine), one endogenous metabolite, one immunology/inflammation-related compound, and one inhibitor of DNA methyltransferase (N4-methylcytidine, a cytosine nucleoside analogue that, like zebularine, inhibits DNA methyltransferase). Based on the tumor microenvironment characteristics of ccRCC, five ccRCC-specific compounds were identified, which could guide the clinical treatment of ccRCC patients.
Affiliation(s)
| | | | | | | | - Dongmei Ai
- School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China; (Y.W.); (X.C.); (N.T.); (M.G.)
| |
|
16
|
Adema K, Schon MA, Nodine MD, Kohlen W. Lost in space: what single-cell RNA sequencing cannot tell you. TRENDS IN PLANT SCIENCE 2024:S1360-1385(24)00066-9. [PMID: 38570278 DOI: 10.1016/j.tplants.2024.03.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2023] [Revised: 02/21/2024] [Accepted: 03/11/2024] [Indexed: 04/05/2024]
Abstract
Plant scientists are rapidly integrating single-cell RNA sequencing (scRNA-seq) into their workflows. Maximizing the potential of scRNA-seq requires a proper understanding of the spatiotemporal context of cells. However, positional information is inherently lost during scRNA-seq, limiting its potential to characterize complex biological systems. In this review we highlight how current single-cell analysis pipelines cannot completely recover spatial information, which confounds biological interpretation. Various strategies exist to identify the location of RNA, from classical RNA in situ hybridization to spatial transcriptomics. Herein we discuss the possibility of utilizing this spatial information to supervise single-cell analyses. An integrative approach will maximize the potential of each technology, and lead to insights which go beyond the capability of each individual technology.
Affiliation(s)
- Kelvin Adema
- Laboratory of Cell and Developmental Biology, Cluster of Plant Developmental Biology, Department of Plant Sciences, Wageningen University, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
| | - Michael A Schon
- Laboratory of Cell and Developmental Biology, Cluster of Plant Developmental Biology, Department of Plant Sciences, Wageningen University, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands; Laboratory of Molecular Biology, Cluster of Plant Developmental Biology, Department of Plant Sciences, Wageningen University, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
| | - Michael D Nodine
- Laboratory of Molecular Biology, Cluster of Plant Developmental Biology, Department of Plant Sciences, Wageningen University, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
| | - Wouter Kohlen
- Laboratory of Cell and Developmental Biology, Cluster of Plant Developmental Biology, Department of Plant Sciences, Wageningen University, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands; Laboratory of Molecular Biology, Cluster of Plant Developmental Biology, Department of Plant Sciences, Wageningen University, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
| |
|
17
|
Snyder KT, Creanza N. Birds convey complex signals in simple songs. Nature 2024; 628:37-39. [PMID: 38509289 DOI: 10.1038/d41586-024-00677-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/22/2024]
|
18
|
Grones C, Eekhout T, Shi D, Neumann M, Berg LS, Ke Y, Shahan R, Cox KL, Gomez-Cano F, Nelissen H, Lohmann JU, Giacomello S, Martin OC, Cole B, Wang JW, Kaufmann K, Raissig MT, Palfalvi G, Greb T, Libault M, De Rybel B. Best practices for the execution, analysis, and data storage of plant single-cell/nucleus transcriptomics. THE PLANT CELL 2024; 36:812-828. [PMID: 38231860 PMCID: PMC10980355 DOI: 10.1093/plcell/koae003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Revised: 10/17/2023] [Accepted: 10/24/2023] [Indexed: 01/19/2024]
Abstract
Single-cell and single-nucleus RNA-sequencing technologies capture the expression of plant genes at an unprecedented resolution. Therefore, these technologies are gaining traction in plant molecular and developmental biology for elucidating the transcriptional changes across cell types in a specific tissue or organ, upon treatments, in response to biotic and abiotic stresses, or between genotypes. Despite the rapidly accelerating use of these technologies, collective and standardized experimental and analytical procedures to support the acquisition of high-quality data sets are still missing. In this commentary, we discuss common challenges associated with the use of single-cell transcriptomics in plants and propose general guidelines to improve reproducibility, quality, comparability, and interpretation and to make the data readily available to the community in this fast-developing field of research.
Affiliation(s)
- Carolin Grones
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
- VIB Centre for Plant Systems Biology, Ghent 9052, Belgium
| | - Thomas Eekhout
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
- VIB Centre for Plant Systems Biology, Ghent 9052, Belgium
- VIB Single Cell Core Facility, Ghent 9052, Belgium
| | - Dongbo Shi
- Centre for Organismal Studies, Heidelberg University, 69120 Heidelberg, Germany
- Institute of Biochemistry and Biology, University of Potsdam, 14476 Potsdam, Germany
| | - Manuel Neumann
- Institute of Biology, Humboldt-Universität zu Berlin, 10115 Berlin, Germany
| | - Lea S Berg
- Institute of Plant Sciences, University of Bern, 3012 Bern, Switzerland
| | - Yuji Ke
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
- VIB Centre for Plant Systems Biology, Ghent 9052, Belgium
| | - Rachel Shahan
- Department of Biology, Duke University, Durham, NC 27708, USA
- Howard Hughes Medical Institute, Duke University, Durham, NC 27708, USA
| | - Kevin L Cox
- Donald Danforth Plant Science Center, St. Louis, MO 63132, USA
| | - Fabio Gomez-Cano
- Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, MI 48109, USA
| | - Hilde Nelissen
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
- VIB Centre for Plant Systems Biology, Ghent 9052, Belgium
| | - Jan U Lohmann
- Centre for Organismal Studies, Heidelberg University, 69120 Heidelberg, Germany
| | - Stefania Giacomello
- SciLifeLab, Department of Gene Technology, KTH Royal Institute of Technology, 17165 Solna, Sweden
| | - Olivier C Martin
- Universities of Paris-Saclay, Paris-Cité and Evry, CNRS, INRAE, Institute of Plant Sciences Paris-Saclay, Gif-sur-Yvette 91192, France
| | - Benjamin Cole
- DOE-Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Jia-Wei Wang
- National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences (CEMPS), Institute of Plant Physiology and Ecology (SIPPE), Chinese Academy of Sciences (CAS), Shanghai 200032, China
| | - Kerstin Kaufmann
- Institute of Biology, Humboldt-Universität zu Berlin, 10115 Berlin, Germany
| | - Michael T Raissig
- Institute of Plant Sciences, University of Bern, 3012 Bern, Switzerland
| | - Gergo Palfalvi
- Department of Comparative Development and Genetics, Max Planck Institute for Plant Breeding Research, 50829 Cologne, Germany
| | - Thomas Greb
- Centre for Organismal Studies, Heidelberg University, 69120 Heidelberg, Germany
| | - Marc Libault
- Division of Plant Science and Technology, Interdisciplinary Plant Group, College of Agriculture, Food, and Natural Resources, University of Missouri-Columbia, Columbia, MO 65201, USA
| | - Bert De Rybel
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
- VIB Centre for Plant Systems Biology, Ghent 9052, Belgium
| |
|
19
|
Lause J, Berens P, Kobak D. The art of seeing the elephant in the room: 2D embeddings of single-cell data do make sense. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.26.586728. [PMID: 38585748 PMCID: PMC10996625 DOI: 10.1101/2024.03.26.586728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
A recent paper in PLOS Computational Biology (Chari and Pachter, 2023) claimed that t-SNE and UMAP embeddings of single-cell datasets fail to capture true biological structure. The authors argued that such embeddings are as arbitrary and as misleading as forcing the data into an elephant shape. Here we show that this conclusion was based on inadequate and limited metrics of embedding quality. More appropriate metrics quantifying neighborhood and class preservation reveal the elephant in the room: while t-SNE and UMAP embeddings of single-cell data do not represent high-dimensional distances, they can nevertheless provide biologically relevant information.
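One common way to quantify class preservation in an embedding is leave-one-out k-nearest-neighbor classification accuracy. The sketch below is a minimal illustration of that idea, not the authors' evaluation code. The function name, the choice of k, and the toy embedding are assumptions.

```python
import math
from collections import Counter


def knn_accuracy(points, labels, k=3):
    """Leave-one-out kNN classification accuracy in the embedding space."""
    correct = 0
    for i, p in enumerate(points):
        # Sort all other points by Euclidean distance to point i.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        votes = Counter(labels[j] for j in others[:k])
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / len(points)


# Hypothetical 2D embedding: two well-separated cell types.
embedding = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
             (5.0, 5.0), (5.1, 5.2), (5.2, 5.1), (5.15, 5.15)]
cell_types = ["A", "A", "A", "A", "B", "B", "B", "B"]

print(knn_accuracy(embedding, cell_types, k=3))
```

An accuracy close to 1 indicates that cells of the same annotated type stay together in the embedding, even though pairwise distances between distant groups need not be preserved.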
Affiliation(s)
- Jan Lause
- Hertie Institute for AI in Brain Health, University of Tübingen, Germany
- Tübingen AI Center, Tübingen, Germany
| | - Philipp Berens
- Hertie Institute for AI in Brain Health, University of Tübingen, Germany
- Tübingen AI Center, Tübingen, Germany
| | - Dmitry Kobak
- Hertie Institute for AI in Brain Health, University of Tübingen, Germany
- Tübingen AI Center, Tübingen, Germany
- IWR, Heidelberg University, Germany
| |
|
20
|
Dong X, Leary JR, Yang C, Brusko MA, Brusko TM, Bacher R. Data-driven selection of analysis decisions in single-cell RNA-seq trajectory inference. Brief Bioinform 2024; 25:bbae216. [PMID: 38725155 PMCID: PMC11082074 DOI: 10.1093/bib/bbae216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Revised: 03/01/2024] [Accepted: 04/25/2024] [Indexed: 05/13/2024] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) experiments have become instrumental in developmental and differentiation studies, enabling the profiling of cells at a single or multiple time-points to uncover subtle variations in expression profiles reflecting underlying biological processes. Benchmarking studies have compared many of the computational methods used to reconstruct cellular dynamics; however, researchers still encounter challenges in their analysis due to uncertainty with respect to selecting the most appropriate methods and parameters. Even among universal data processing steps used by trajectory inference methods such as feature selection and dimension reduction, trajectory methods' performances are highly dataset-specific. To address these challenges, we developed Escort, a novel framework for evaluating a dataset's suitability for trajectory inference and quantifying trajectory properties influenced by analysis decisions. Escort evaluates the suitability of trajectory analysis and the combined effects of processing choices using trajectory-specific metrics. Escort navigates single-cell trajectory analysis through these data-driven assessments, reducing uncertainty and much of the decision burden inherent to trajectory inference analyses. Escort is implemented in an accessible R package and R/Shiny application, providing researchers with the necessary tools to make informed decisions during trajectory analysis and enabling new insights into dynamic biological processes at single-cell resolution.
Affiliation(s)
- Xiaoru Dong
- Department of Biostatistics, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, United States
| | - Jack R Leary
- Department of Biostatistics, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, United States
| | - Chuanhao Yang
- Department of Biostatistics, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, United States
| | - Maigan A Brusko
- Diabetes Institute, University of Florida, Gainesville, FL 32610, United States
- Department of Pathology, Immunology, and Laboratory Medicine, College of Medicine, University of Florida, Gainesville, FL 32610, United States
| | - Todd M Brusko
- Diabetes Institute, University of Florida, Gainesville, FL 32610, United States
- Department of Pathology, Immunology, and Laboratory Medicine, College of Medicine, University of Florida, Gainesville, FL 32610, United States
- Department of Pediatrics, College of Medicine, University of Florida, Gainesville, FL 32610, United States
| | - Rhonda Bacher
- Department of Biostatistics, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, United States
- Diabetes Institute, University of Florida, Gainesville, FL 32610, United States
| |
|
21
|
Chen K, Zhou Y, Ding M, Wang Y, Ren Z, Yang Y. Self-supervised learning on millions of primary RNA sequences from 72 vertebrates improves sequence-based RNA splicing prediction. Brief Bioinform 2024; 25:bbae163. [PMID: 38605640 PMCID: PMC11009468 DOI: 10.1093/bib/bbae163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Revised: 02/22/2024] [Accepted: 03/19/2024] [Indexed: 04/13/2024] Open
Abstract
Language models pretrained by self-supervised learning (SSL) have been widely utilized to study protein sequences, while few models have been developed for genomic sequences, and those were limited to single species. Due to the lack of genomes from different species, these models cannot effectively leverage evolutionary information. In this study, we have developed SpliceBERT, a language model pretrained on primary ribonucleic acid (RNA) sequences from 72 vertebrates by masked language modeling, and applied it to sequence-based modeling of RNA splicing. Pretraining SpliceBERT on diverse species enables effective identification of evolutionarily conserved elements. Meanwhile, the learned hidden states and attention weights can characterize the biological properties of splice sites. As a result, SpliceBERT was shown to be effective on several downstream tasks: zero-shot prediction of variant effects on splicing, prediction of branchpoints in humans, and cross-species prediction of splice sites. Our study highlighted the importance of pretraining genomic language models on a diverse range of species and suggested that SSL is a promising approach to enhance our understanding of the regulatory logic underlying genomic sequences.
Affiliation(s)
- Ken Chen
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Yue Zhou
- Peng Cheng Laboratory, Shenzhen, China
| | - Maolin Ding
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Yu Wang
- Peng Cheng Laboratory, Shenzhen, China
| | | | - Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-sen University), Ministry of Education, China
| |
|
22
|
Jing Z, Zhu Q, Li L, Xie Y, Wu X, Fang Q, Yang B, Dai B, Xu X, Pan H, Bai Y. Spaco: A comprehensive tool for coloring spatial data at single-cell resolution. PATTERNS (NEW YORK, N.Y.) 2024; 5:100915. [PMID: 38487801 PMCID: PMC10935509 DOI: 10.1016/j.patter.2023.100915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Revised: 12/11/2023] [Accepted: 12/18/2023] [Indexed: 03/17/2024]
Abstract
Understanding tissue architecture and niche-specific microenvironments in spatially resolved transcriptomics (SRT) requires in situ annotation and labeling of cells. Effective spatial visualization of these data demands appropriate colorization of numerous cell types. However, current colorization frameworks often inadequately account for the spatial relationships between cell types. This results in perceptual ambiguity between neighboring cells of biologically distinct types, particularly in complex environments such as brain or tumor tissue. To address this, we introduce Spaco, a potent tool for spatially aware colorization. Spaco utilizes the Degree of Interlacement metric to construct a weighted graph that evaluates the spatial relationships among different cell types, refining color assignments. Furthermore, Spaco incorporates an adaptive palette selection approach to amplify chromatic distinctions. When benchmarked on four diverse datasets, Spaco outperforms existing solutions, capturing complex spatial relationships and boosting visual clarity. Spaco ensures broad accessibility by accommodating color vision deficiency and offering openly accessible code in both Python and R.
Affiliation(s)
- Zehua Jing
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- BGI Research, Hangzhou 310012, China
| | | | - Linxuan Li
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- BGI Research, Shenzhen 518083, China
| | - Yue Xie
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- BGI Research, Shenzhen 518083, China
| | - Xinchao Wu
- BGI Research, Hangzhou 310012, China
- School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
| | - Qi Fang
- BGI Research, Shenzhen 518083, China
| | - Bolin Yang
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- BGI Research, Hangzhou 310012, China
| | - Baojun Dai
- BGI Research, Hangzhou 310012, China
- School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
| | - Xun Xu
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Guangdong Provincial Key Laboratory of Genome Read and Write, BGI Research, Shenzhen 518083, China
- BGI Research, Shenzhen 518083, China
| | - Hailin Pan
- BGI Research, Hangzhou 310012, China
- BGI Research, Shenzhen 518083, China
| | - Yinqi Bai
- BGI Research, Hangzhou 310012, China
- BGI Research, Shenzhen 518083, China
| |
|
23
|
Ma R, Sun ED, Donoho D, Zou J. Principled and interpretable alignability testing and integration of single-cell data. Proc Natl Acad Sci U S A 2024; 121:e2313719121. [PMID: 38416677 DOI: 10.1073/pnas.2313719121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Accepted: 01/23/2024] [Indexed: 03/01/2024] Open
Abstract
Single-cell data integration can provide a comprehensive molecular view of cells, and many algorithms have been developed to remove unwanted technical or biological variations and integrate heterogeneous single-cell datasets. Despite their wide usage, existing methods suffer from several fundamental limitations. In particular, we lack a rigorous statistical test for whether two high-dimensional single-cell datasets are alignable (and therefore should even be aligned). Moreover, popular methods can substantially distort the data during alignment, making the aligned data and downstream analysis difficult to interpret. To overcome these limitations, we present a spectral manifold alignment and inference (SMAI) framework, which enables principled and interpretable alignability testing and structure-preserving integration of single-cell data with the same type of features. SMAI provides a statistical test to robustly assess the alignability between datasets to avoid misleading inference and is justified by high-dimensional statistical theory. On a diverse range of real and simulated benchmark datasets, it outperforms commonly used alignment methods. Moreover, we show that SMAI improves various downstream analyses such as identification of differentially expressed genes and imputation of single-cell spatial transcriptomics, providing further biological insights. SMAI's interpretability also enables quantification and a deeper understanding of the sources of technical confounders in single-cell data.
Affiliation(s)
- Rong Ma
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115
| | - Eric D Sun
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305
| | - David Donoho
- Department of Statistics, Stanford University, Stanford, CA 94305
| | - James Zou
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305
| |
|
24
|
Woodruff MC, Faliti CE, Sanz I. Systems biology of B cells in COVID-19. Semin Immunol 2024; 72:101875. [PMID: 38489999 DOI: 10.1016/j.smim.2024.101875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 03/04/2024] [Accepted: 03/04/2024] [Indexed: 03/17/2024]
Abstract
The integration of multi-'omic datasets into complex systems-wide assessments has become a mainstay in immunologic investigation. This focus on high-dimensional data collection and analysis was on full display in the investigation of COVID-19, the respiratory illness resulting from infection by the novel coronavirus SARS-CoV-2. Particularly in the area of B cell biology, tremendous efforts in both cellular and serologic investigation have resulted in an increasingly detailed mapping of the coordinated effector, memory, and antibody secreting cell responses that underpin the development of humoral immunity in response to primary viral infection. Further, the rapid development and deployment of effective vaccines has allowed for the assessment of developing memory responses across a wide variety of immune contexts, including in patients with compromised immune function. The result has been a period of rapid gains in the understanding of B cell biology unrestricted to the study of COVID-19. Here, we outline the systems-level technologies that have been routinely implemented in these investigations throughout the pandemic, and discuss how their use has led to clear and applicable gains in pursuance of the amelioration of human infectious disease and beyond.
Affiliation(s)
- Matthew C Woodruff
- Department of Medicine, Division of Rheumatology, Lowance Center for Human Immunology, Emory University, Atlanta, GA, USA; Emory Autoimmunity Center of Excellence, Emory University, Atlanta, GA, USA.
| | - Caterina E Faliti
- Department of Medicine, Division of Rheumatology, Lowance Center for Human Immunology, Emory University, Atlanta, GA, USA; Emory Autoimmunity Center of Excellence, Emory University, Atlanta, GA, USA.
| | - Ignacio Sanz
- Department of Medicine, Division of Rheumatology, Lowance Center for Human Immunology, Emory University, Atlanta, GA, USA; Emory Autoimmunity Center of Excellence, Emory University, Atlanta, GA, USA
| |
|
25
|
Xia L, Lee C, Li JJ. Statistical method scDEED for detecting dubious 2D single-cell embeddings and optimizing t-SNE and UMAP hyperparameters. Nat Commun 2024; 15:1753. [PMID: 38409103 PMCID: PMC10897166 DOI: 10.1038/s41467-024-45891-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Accepted: 02/06/2024] [Indexed: 02/28/2024] Open
Abstract
Two-dimensional (2D) embedding methods are crucial for single-cell data visualization. Popular methods such as t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) are commonly used for visualizing cell clusters; however, it is well known that t-SNE and UMAP's 2D embeddings might not reliably inform the similarities among cell clusters. Motivated by this challenge, we present a statistical method, scDEED, for detecting dubious cell embeddings output by a 2D-embedding method. By calculating a reliability score for every cell embedding based on the similarity between the cell's 2D-embedding neighbors and pre-embedding neighbors, scDEED identifies the cell embeddings with low reliability scores as dubious and those with high reliability scores as trustworthy. Moreover, by minimizing the number of dubious cell embeddings, scDEED provides intuitive guidance for optimizing the hyperparameters of an embedding method. We show the effectiveness of scDEED on multiple datasets for detecting dubious cell embeddings and optimizing the hyperparameters of t-SNE and UMAP.
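The comparison of each cell's neighbors before and after embedding can be sketched as follows. This is a simplified stand-in and not the scDEED statistic itself: it scores each cell by the Jaccard overlap of its two neighbor sets and flags cells below a fixed cutoff, whereas scDEED derives its own reliability score and thresholds. All function names, the value of k, the cutoff, and the toy data are assumptions.

```python
import math


def neighbors(points, i, k):
    """Indices of the k nearest neighbors of point i (Euclidean distance)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return set(others[:k])


def reliability_scores(high_dim, embedding, k=2):
    """Per-cell Jaccard overlap between pre-embedding and 2D neighbor sets."""
    scores = []
    for i in range(len(high_dim)):
        before = neighbors(high_dim, i, k)
        after = neighbors(embedding, i, k)
        scores.append(len(before & after) / len(before | after))
    return scores


def dubious_cells(high_dim, embedding, k=2, cutoff=0.5):
    """Cells whose score falls below a hypothetical fixed cutoff."""
    scores = reliability_scores(high_dim, embedding, k)
    return [i for i, s in enumerate(scores) if s < cutoff]


# Hypothetical data: cell 5 belongs with cells 3 and 4 in the original space
# but is embedded closer to cells 0-2.
high_dim = [(0, 0, 0), (1, 0, 0), (0, 1, 0),
            (10, 10, 10), (11, 10, 10), (10, 11, 10)]
embedding = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (3, 3)]

print(dubious_cells(high_dim, embedding))
```

Counting the flagged cells for each candidate hyperparameter setting and keeping the setting with the fewest would mirror, in spirit, the tuning strategy the abstract describes.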
Affiliation(s)
- Lucy Xia
- Department of ISOM, School of Business and Management, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
| | - Christy Lee
- Department of Statistics and Data Science, University of California, Los Angeles, Los Angeles, CA, USA
| | - Jingyi Jessica Li
- Department of Statistics and Data Science, University of California, Los Angeles, Los Angeles, CA, USA.
- Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA, USA.
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, USA.
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA.
- Radcliffe Institute of Advanced Study, Harvard University, Cambridge, MA, USA.
| |
|
26
|
Walsh JR, Sun G, Balan J, Hardcastle J, Vollenweider J, Jerde C, Rumilla K, Koellner C, Koleilat A, Hasadsri L, Kipp B, Jenkinson G, Klee E. A supervised learning method for classifying methylation disorders. BMC Bioinformatics 2024; 25:66. [PMID: 38347515 PMCID: PMC10863277 DOI: 10.1186/s12859-024-05673-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 01/24/2024] [Indexed: 02/15/2024]
Abstract
BACKGROUND: DNA methylation is one of the most stable and well-characterized epigenetic alterations in humans. Accordingly, it has already found clinical utility as a molecular biomarker in a variety of disease contexts. Existing methods for clinical diagnosis of methylation-related disorders focus on outlier detection in a small number of CpG sites using standardized cutoffs which differentiate healthy from abnormal methylation levels. The standardized cutoff values used in these methods do not take into account methylation patterns which are known to differ between the sexes and with age. RESULTS: Here we profile genome-wide DNA methylation from blood samples drawn from within a cohort composed of healthy controls of different age and sex alongside patients with Prader-Willi syndrome (PWS), Beckwith-Wiedemann syndrome, Fragile-X syndrome, Angelman syndrome, and Silver-Russell syndrome. We propose a Generalized Additive Model to perform age- and sex-adjusted outlier analysis of around 700,000 CpG sites throughout the human genome. Utilizing z-scores among the cohort for each site, we deployed an ensemble-based machine learning pipeline and achieved a combined prediction accuracy of 0.96 (Binomial 95% Confidence Interval 0.868–0.995). CONCLUSION: We demonstrate a method for age- and sex-adjusted outlier detection of differentially methylated loci based on a large cohort of healthy individuals. We present a custom machine learning pipeline utilizing this outlier analysis to classify samples for potential methylation-associated congenital disorders. These methods are able to achieve high accuracy when used with machine learning methods to classify abnormal methylation patterns.
Affiliation(s)
| | - Alaa Koleilat
- Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, OR, USA
| |
|
27
|
Qiu C, Martin BK, Welsh IC, Daza RM, Le TM, Huang X, Nichols EK, Taylor ML, Fulton O, O'Day DR, Gomes AR, Ilcisin S, Srivatsan S, Deng X, Disteche CM, Noble WS, Hamazaki N, Moens CB, Kimelman D, Cao J, Schier AF, Spielmann M, Murray SA, Trapnell C, Shendure J. A single-cell time-lapse of mouse prenatal development from gastrula to birth. Nature 2024; 626:1084-1093. [PMID: 38355799 PMCID: PMC10901739 DOI: 10.1038/s41586-024-07069-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Accepted: 01/15/2024] [Indexed: 02/16/2024]
Abstract
The house mouse (Mus musculus) is an exceptional model system, combining genetic tractability with close evolutionary affinity to humans [1,2]. Mouse gestation lasts only 3 weeks, during which the genome orchestrates the astonishing transformation of a single-cell zygote into a free-living pup composed of more than 500 million cells. Here, to establish a global framework for exploring mammalian development, we applied optimized single-cell combinatorial indexing [3] to profile the transcriptional states of 12.4 million nuclei from 83 embryos, precisely staged at 2- to 6-hour intervals spanning late gastrulation (embryonic day 8) to birth (postnatal day 0). From these data, we annotate hundreds of cell types and explore the ontogenesis of the posterior embryo during somitogenesis and of kidney, mesenchyme, retina and early neurons. We leverage the temporal resolution and sampling depth of these whole-embryo snapshots, together with published data [4-8] from earlier timepoints, to construct a rooted tree of cell-type relationships that spans the entirety of prenatal development, from zygote to birth. Throughout this tree, we systematically nominate genes encoding transcription factors and other proteins as candidate drivers of the in vivo differentiation of hundreds of cell types. Remarkably, the most marked temporal shifts in cell states are observed within one hour of birth and presumably underlie the massive physiological adaptations that must accompany the successful transition of a mammalian fetus to life outside the womb.
Affiliation(s)
- Chengxiang Qiu
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
| | - Beth K Martin
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Riza M Daza
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Truc-Mai Le
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | - Xingfan Huang
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| | - Eva K Nichols
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Megan L Taylor
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | - Olivia Fulton
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Diana R O'Day
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | - Saskia Ilcisin
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | - Sanjay Srivatsan
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Medical Scientist Training Program, University of Washington, Seattle, WA, USA
| | - Xinxian Deng
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
| | - Christine M Disteche
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
- Department of Medicine, University of Washington, Seattle, WA, USA
| | - William Stafford Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| | - Nobuhiko Hamazaki
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, Seattle, WA, USA
| | - Cecilia B Moens
- Division of Basic Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - David Kimelman
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Department of Biochemistry, University of Washington, Seattle, WA, USA
| | - Junyue Cao
- Laboratory of Single-Cell Genomics and Population Dynamics, The Rockefeller University, New York, NY, USA
| | - Alexander F Schier
- Biozentrum, University of Basel, Basel, Switzerland
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
| | - Malte Spielmann
- Max Planck Institute for Molecular Genetics, Berlin, Germany
- Institute of Human Genetics, University Hospitals Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Kiel, Germany
- DZHK (German Centre for Cardiovascular Research), Partner Site Hamburg, Lübeck, Kiel, Lübeck, Germany
| | - Cole Trapnell
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
- Seattle Hub for Synthetic Biology, Seattle, WA, USA
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA.
- Howard Hughes Medical Institute, Seattle, WA, USA.
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA.
- Seattle Hub for Synthetic Biology, Seattle, WA, USA.
| |
|
28
|
Zheng L, Shi S, Lu M, Fang P, Pan Z, Zhang H, Zhou Z, Zhang H, Mou M, Huang S, Tao L, Xia W, Li H, Zeng Z, Zhang S, Chen Y, Li Z, Zhu F. AnnoPRO: a strategy for protein function annotation based on multi-scale protein representation and a hybrid deep learning of dual-path encoding. Genome Biol 2024; 25:41. [PMID: 38303023 PMCID: PMC10832132 DOI: 10.1186/s13059-024-03166-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2023] [Accepted: 01/05/2024] [Indexed: 02/03/2024]
Abstract
Protein function annotation has been one of the longstanding issues in biological sciences, and various computational methods have been developed. However, the existing methods suffer from a serious long-tail problem, with a large number of GO families containing few annotated proteins. Herein, an innovative strategy named AnnoPRO was therefore constructed by enabling sequence-based multi-scale protein representation, dual-path protein encoding using pre-training, and function annotation by long short-term memory-based decoding. A variety of case studies based on different benchmarks were conducted, which confirmed the superior performance of AnnoPRO among available methods. Source code and models have been made freely available at: https://github.com/idrblab/AnnoPRO and https://zenodo.org/records/10012272.
Affiliation(s)
- Lingyan Zheng
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou, 330110, China
| | - Shuiyang Shi
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
| | - Mingkun Lu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
| | - Pan Fang
- Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou, 330110, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
| | - Ziqi Pan
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
| | - Hongning Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
| | - Zhimeng Zhou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
| | - Hanyu Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
| | - Minjie Mou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
| | - Shijie Huang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
| | - Lin Tao
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, Engineering Laboratory of Development and Application of Traditional Chinese Medicines, Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
| | - Weiqi Xia
- Pharmaceutical Department, Zhejiang Provincial People's Hospital, Hangzhou, 310014, China
| | - Honglin Li
- School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
| | - Zhenyu Zeng
- Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou, 330110, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
| | - Shun Zhang
- Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou, 330110, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
| | - Yuzong Chen
- State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, The Graduate School at Shenzhen, Tsinghua University, Shenzhen, 518055, China
| | - Zhaorong Li
- Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou, 330110, China.
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China.
| | - Feng Zhu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China.
- Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou, 330110, China.
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China.
| |
|
29
|
Tyler SR, Lozano-Ojalvo D, Guccione E, Schadt EE. Anti-correlated feature selection prevents false discovery of subpopulations in scRNAseq. Nat Commun 2024; 15:699. [PMID: 38267438 PMCID: PMC10808220 DOI: 10.1038/s41467-023-43406-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Accepted: 11/07/2023] [Indexed: 01/26/2024]
Abstract
While sub-clustering cell-populations has become popular in single cell-omics, negative controls for this process are lacking. Popular feature-selection/clustering algorithms fail the null-dataset problem, allowing erroneous subdivisions of homogenous clusters until nearly each cell is called its own cluster. Using real and synthetic datasets, we find that anti-correlated gene selection reduces or eliminates erroneous subdivisions, increases marker-gene selection efficacy, and efficiently scales to millions of cells.
Affiliation(s)
- Scott R Tyler
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
| | - Daniel Lozano-Ojalvo
- Department of Dermatology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Ernesto Guccione
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Therapeutics Discovery, Department of Oncological Sciences and Pharmacological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Bioinformatics for Next Generation Sequencing (BiNGS) Shared Resource Facility, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Eric E Schadt
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
| |
|
30
|
Lederer AR, Leonardi M, Talamanca L, Herrera A, Droin C, Khven I, Carvalho HJF, Valente A, Mantes AD, Arabí PM, Pinello L, Naef F, Manno GL. Statistical inference with a manifold-constrained RNA velocity model uncovers cell cycle speed modulations. bioRxiv 2024:2024.01.18.576093. [PMID: 38328127 PMCID: PMC10849531 DOI: 10.1101/2024.01.18.576093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Across a range of biological processes, cells undergo coordinated changes in gene expression, resulting in transcriptome dynamics that unfold within a low-dimensional manifold. Single-cell RNA-sequencing (scRNA-seq) only measures temporal snapshots of gene expression. However, information on the underlying low-dimensional dynamics can be extracted using RNA velocity, which models unspliced and spliced RNA abundances to estimate the rate of change of gene expression. Available RNA velocity algorithms can be fragile and rely on heuristics that lack statistical control. Moreover, the estimated vector field is not dynamically consistent with the traversed gene expression manifold. Here, we develop a generative model of RNA velocity and a Bayesian inference approach that solves these problems. Our model couples velocity field and manifold estimation in a reformulated, unified framework, so as to coherently identify the parameters of an autonomous dynamical system. Focusing on the cell cycle, we implemented VeloCycle to study gene regulation dynamics on one-dimensional periodic manifolds and validated using live-imaging its ability to infer actual cell cycle periods. We benchmarked RNA velocity inference with sensitivity analyses and demonstrated one- and multiple-sample testing. We also conducted Markov chain Monte Carlo inference on the model, uncovering key relationships between gene-specific kinetics and our gene-independent velocity estimate. Finally, we applied VeloCycle to in vivo samples and in vitro genome-wide Perturb-seq, revealing regionally-defined proliferation modes in neural progenitors and the effect of gene knockdowns on cell cycle speed. Ultimately, VeloCycle expands the scRNA-seq analysis toolkit with a modular and statistically rigorous RNA velocity inference framework.
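For orientation, the conventional RNA velocity formulation that this abstract builds on can be written as a pair of rate equations for the unspliced abundance $u$ and spliced abundance $s$ of a gene. The symbols $\alpha$ (transcription rate), $\beta$ (splicing rate), and $\gamma$ (degradation rate) are the usual textbook names and do not appear in the abstract; VeloCycle's manifold-constrained parameterization may well differ.

```latex
\frac{du}{dt} = \alpha - \beta u,
\qquad
\frac{ds}{dt} = \beta u - \gamma s
```

In this standard picture, the velocity of a gene is the right-hand side of the second equation, i.e. the estimated rate of change of spliced expression.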
|
31
|
Yao Q, Jia Y, Ma P. cellMarkerPipe: Cell Marker Identification and Evaluation Pipeline in Single Cell Transcriptomes. Research Square 2024:rs.3.rs-3844718. [PMID: 38313296 PMCID: PMC10836098 DOI: 10.21203/rs.3.rs-3844718/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2024]
Abstract
Assessing marker genes from all cell clusters can be time-consuming and lack a systematic strategy. Streamlining this process through a unified computational platform that automates identification and benchmarking will greatly enhance efficiency and ensure a fair evaluation. We therefore developed a novel computational platform, cellMarkerPipe (https://github.com/yao-laboratory/cellMarkerPipe), for automated cell-type-specific marker gene identification from scRNA-seq data, coupled with a comprehensive evaluation schema. CellMarkerPipe adaptively wraps around a collection of commonly used and state-of-the-art tools, including Seurat, COSG, SC3, SCMarker, COMET, and scGeneFit. From rigorous testing across diverse samples, we ascertain SCMarker's overall reliable performance in single marker gene selection, with COSG showing commendable speed and comparable efficacy. Furthermore, we demonstrate the pivotal role of our approach in real-world medical datasets. This general and open-source pipeline stands as a significant advancement in streamlining cell marker gene identification and evaluation, fitting broad applications in the field of cellular biology and medical research.
|
32
|
Chrysinas P, Venkatesan S, Ang I, Ghosh V, Chen C, Neelamegham S, Gunawan R. Cell and tissue-specific glycosylation pathways informed by single-cell transcriptomics. bioRxiv 2024:2023.09.26.559616. [PMID: 38260527 PMCID: PMC10802235 DOI: 10.1101/2023.09.26.559616] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
While single cell studies have made significant impacts in various subfields of biology, they lag in the Glycosciences. To address this gap, we analyzed single-cell glycogene expressions in the Tabula Sapiens dataset of human tissues and cell types using a recent glycosylation-specific gene ontology (GlycoEnzOnto). At the median sequencing (count) depth, ~40-50 out of 400 glycogenes were detected in individual cells. Upon increasing the sequencing depth, the number of detectable glycogenes saturates at ~200 glycogenes, suggesting that the average human cell expresses about half of the glycogene repertoire. Hierarchies in glycogene and glycopathway expressions emerged from our analysis: nucleotide-sugar synthesis and transport exhibited the highest gene expressions, followed by genes for core enzymes, glycan modification and extensions, and finally terminal modifications. Interestingly, the same cell types showed variable glycopathway expressions based on their organ or tissue origin, suggesting nuanced cell- and tissue-specific glycosylation patterns. Probing deeper into the transcription factors (TFs) of glycogenes, we identified distinct groupings of TFs controlling different aspects of glycosylation: core biosynthesis, terminal modifications, etc. We present webtools to explore the interconnections across glycogenes, glycopathways, and TFs regulating glycosylation in human cell/tissue types. Overall, the study presents an overview of glycosylation across multiple human organ systems.
Affiliation(s)
- Panagiotis Chrysinas
- Department of Chemical and Biological Engineering, University at Buffalo-SUNY, Buffalo, NY, 14260, USA
| | - Shriramprasad Venkatesan
- Department of Chemical and Biological Engineering, University at Buffalo-SUNY, Buffalo, NY, 14260, USA
| | - Isaac Ang
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL, 61801, USA
| | - Vishnu Ghosh
- Department of Chemical and Biological Engineering, University at Buffalo-SUNY, Buffalo, NY, 14260, USA
| | - Changyou Chen
- Department of Computer Science and Engineering, University at Buffalo-SUNY, Buffalo, NY, 14260, USA
| | - Sriram Neelamegham
- Department of Chemical and Biological Engineering, University at Buffalo-SUNY, Buffalo, NY, 14260, USA
| | - Rudiyanto Gunawan
- Department of Chemical and Biological Engineering, University at Buffalo-SUNY, Buffalo, NY, 14260, USA
| |
|
33
|
Bump P, Lubeck L. Marine Invertebrates One Cell at A Time: Insights from Single-Cell Analysis. Integr Comp Biol 2023; 63:999-1009. [PMID: 37188638 PMCID: PMC10714908 DOI: 10.1093/icb/icad034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Revised: 04/25/2023] [Accepted: 05/05/2023] [Indexed: 05/17/2023]
Abstract
Over the past decade, single-cell RNA-sequencing (scRNA-seq) has made it possible to study the cellular diversity of a broad range of organisms. Technological advances in single-cell isolation and sequencing have expanded rapidly, allowing the transcriptomic profile of individual cells to be captured. As a result, there has been an explosion of cell type atlases created for many different marine invertebrate species from across the tree of life. Our focus in this review is to synthesize current literature on marine invertebrate scRNA-seq. Specifically, we provide perspectives on key insights from scRNA-seq studies, including descriptive studies of cell type composition, how cells respond in dynamic processes such as development and regeneration, and the evolution of new cell types. Despite these tremendous advances, there also lie several challenges ahead. We discuss the important considerations that are essential when making comparisons between experiments, or between datasets from different species. Finally, we address the future of single-cell analyses in marine invertebrates, including combining scRNA-seq data with other 'omics methods to get a fuller understanding of cellular complexities. The full diversity of cell types across marine invertebrates remains unknown and understanding this diversity and evolution will provide rich areas for future study.
Affiliation(s)
- Paul Bump
- Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138, USA
| | - Lauren Lubeck
- Department of Biology, Hopkins Marine Station, Stanford University, Pacific Grove, CA 93950, USA
| |
|
34
|
Wang JH, Tsin D, Engel TA. Predictive variational autoencoder for learning robust representations of time-series data. arXiv 2023:arXiv:2312.06932v1. [PMID: 38168462 PMCID: PMC10760197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
Abstract
Variational autoencoders (VAEs) have been used extensively to discover low-dimensional latent factors governing neural activity and animal behavior. However, without careful model selection, the uncovered latent factors may reflect noise in the data rather than true underlying features, rendering such representations unsuitable for scientific interpretation. Existing solutions to this problem involve introducing additional measured variables or data augmentations specific to a particular data type. We propose a VAE architecture that predicts the next point in time and show that it mitigates the learning of spurious features. In addition, we introduce a model selection metric based on smoothness over time in the latent space. We show that together these two constraints on VAEs to be smooth over time produce robust latent representations and faithfully recover latent factors on synthetic datasets.
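The abstract mentions a model selection metric based on smoothness over time in the latent space but does not define it. The sketch below shows one plausible way such a score could be computed; the function name `temporal_roughness` and its normalization are assumptions made for illustration, not the authors' metric.

```python
import numpy as np

def temporal_roughness(latents):
    """Hypothetical smoothness score for a latent trajectory of shape (time, dims).

    Mean squared step between consecutive time points, divided by twice the
    total variance. Temporally uncorrelated noise scores near 1; a slowly
    varying trajectory scores near 0.
    """
    latents = np.asarray(latents, dtype=float)
    step_sq = np.sum(np.diff(latents, axis=0) ** 2, axis=1).mean()
    total_var = latents.var(axis=0).sum()
    return step_sq / (2.0 * total_var)

# A smooth two-dimensional latent trajectory (two turns around a circle).
t = np.linspace(0.0, 4.0 * np.pi, 500)
smooth_latents = np.column_stack([np.sin(t), np.cos(t)])

# A latent trajectory with no temporal structure at all.
rng = np.random.default_rng(0)
noisy_latents = rng.normal(size=(500, 2))

smooth_score = temporal_roughness(smooth_latents)
noisy_score = temporal_roughness(noisy_latents)
```

Under this assumed definition, candidate models could be ranked by how low their latent trajectories score on held-out sequences.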
Affiliation(s)
- Julia H Wang
- Cold Spring Harbor Laboratory School of Biological Sciences, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
| | - Dexter Tsin
- Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey, USA
| | - Tatiana A Engel
- Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey, USA
| |
|
35
|
Ghaddar B, De S. Hierarchical and automated cell-type annotation and inference of cancer cell of origin with Census. Bioinformatics 2023; 39:btad714. [PMID: 38011649 PMCID: PMC10713118 DOI: 10.1093/bioinformatics/btad714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Revised: 10/26/2023] [Accepted: 11/25/2023] [Indexed: 11/29/2023]
Abstract
MOTIVATION: Cell-type annotation is a time-consuming yet critical first step in the analysis of single-cell RNA-seq data, especially when multiple similar cell subtypes with overlapping marker genes are present. Existing automated annotation methods have a number of limitations, including requiring large reference datasets, high computation time, shallow annotation resolution, and difficulty in identifying cancer cells or their most likely cell of origin. RESULTS: We developed Census, a biologically intuitive and fully automated cell-type identification method for single-cell RNA-seq data that can deeply annotate normal cells in mammalian tissues and identify malignant cells and their likely cell of origin. Motivated by the inherently stratified developmental programs of cellular differentiation, Census infers hierarchical cell-type relationships and uses gradient-boosted decision trees that capitalize on nodal cell-type relationships to achieve high prediction speed and accuracy. When benchmarked on 44 atlas-scale normal and cancer, human and mouse tissues, Census significantly outperforms state-of-the-art methods across multiple metrics and naturally predicts the cell-of-origin of different cancers. Census is pretrained on the Tabula Sapiens to classify 175 cell-types from 24 organs; however, users can seamlessly train their own models for customized applications. AVAILABILITY AND IMPLEMENTATION: Census is available at Zenodo https://zenodo.org/records/7017103 and on our GitHub https://github.com/sjdlabgroup/Census.
Affiliation(s)
- Bassel Ghaddar
- Center for Systems and Computational Biology, Rutgers Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ 08901, United States
| | - Subhajyoti De
- Center for Systems and Computational Biology, Rutgers Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ 08901, United States
| |
|
36
|
Shinn M. Phantom oscillations in principal component analysis. Proc Natl Acad Sci U S A 2023; 120:e2311420120. [PMID: 37988465 PMCID: PMC10691246 DOI: 10.1073/pnas.2311420120] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/10/2023] [Accepted: 10/18/2023] [Indexed: 11/23/2023]
Abstract
Principal component analysis (PCA) is a dimensionality reduction method that is known for being simple and easy to interpret. Principal components are often interpreted as low-dimensional patterns in high-dimensional space. However, this simple interpretation fails for timeseries, spatial maps, and other continuous data. In these cases, nonoscillatory data may have oscillatory principal components. Here, we show that two common properties of data cause oscillatory principal components: smoothness and shifts in time or space. These two properties implicate almost all neuroscience data. We show how the oscillations produced by PCA, which we call "phantom oscillations," impact data analysis. We also show that traditional cross-validation does not detect phantom oscillations, so we suggest procedures that do. Our findings are supported by a collection of mathematical proofs. Collectively, our work demonstrates that patterns which emerge from high-dimensional data analysis may not faithfully represent the underlying data.
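The smoothness effect described in this abstract can be explored with a small numerical experiment. The sketch below is not taken from the paper: it generates non-oscillatory data by smoothing white noise with a Gaussian kernel, runs PCA through an eigendecomposition of the sample covariance, and counts sign changes in the leading components. The kernel width, sample sizes, and helper names are all illustrative assumptions.

```python
import numpy as np

def smooth_noise(n_series, n_time, sigma, rng):
    """Generate smooth, non-oscillatory series by Gaussian-filtering white noise."""
    half = int(3 * sigma)
    offsets = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    # Pad by the kernel half-width so "valid" convolution returns n_time points.
    white = rng.normal(size=(n_series, n_time + 2 * half))
    return np.array([np.convolve(row, kernel, mode="valid") for row in white])

def leading_components(data, n_components):
    """Return the top principal axes (as rows) of the sample covariance over time."""
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / (len(data) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order].T

def sign_changes(vector):
    """Count how many times a vector crosses zero."""
    return int(np.sum(np.diff(np.sign(vector)) != 0))

rng = np.random.default_rng(0)
data = smooth_noise(n_series=4000, n_time=120, sigma=15.0, rng=rng)
components = leading_components(data, 3)
changes = [sign_changes(pc) for pc in components]
```

If the abstract's argument applies to this setup, plotting the rows of `components` against time should show slow, wave-like shapes whose number of zero crossings grows with component rank, even though no individual series in `data` oscillates.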
Affiliation(s)
- Maxwell Shinn
- University College London (UCL) Queen Square Institute of Neurology, University College London, London WC1E 6BT, United Kingdom
| |
|
37
|
Nazaret A, Fan JL, Lavallée VP, Cornish AE, Kiseliovas V, Masilionis I, Chun J, Bowman RL, Eisman SE, Wang J, Shi L, Levine RL, Mazutis L, Blei D, Pe'er D, Azizi E. Deep generative model deciphers derailed trajectories in acute myeloid leukemia. bioRxiv 2023:2023.11.11.566719. [PMID: 38014231 PMCID: PMC10680623 DOI: 10.1101/2023.11.11.566719] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Single-cell genomics has the potential to map cell states and their dynamics in an unbiased way in response to perturbations like disease. However, elucidating the cell-state transitions from healthy to disease requires analyzing data from perturbed samples jointly with unperturbed reference samples. Existing methods for integrating and jointly visualizing single-cell datasets from distinct contexts tend to remove key biological differences or do not correctly harmonize shared mechanisms. We present Decipher, a model that combines variational autoencoders with deep exponential families to reconstruct derailed trajectories (https://github.com/azizilab/decipher). Decipher jointly represents normal and perturbed single-cell RNA-seq datasets, revealing shared and disrupted dynamics. It further introduces a novel approach to visualize data, without the need for methods such as UMAP or t-SNE. We demonstrate Decipher on data from acute myeloid leukemia patient bone marrow specimens, showing that it successfully characterizes the divergence from normal hematopoiesis and identifies transcriptional programs that become disrupted in each patient when they acquire NPM1 driver mutations.
|
38
|
Miles CE, McKinley SA, Ding F, Lehoucq RB. Inferring stochastic rates from heterogeneous snapshots of particle positions. ARXIV 2023:arXiv:2311.04880v1. [PMID: 37986720 PMCID: PMC10659442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2023]
Abstract
Many imaging techniques for biological systems - like fixation of cells coupled with fluorescence microscopy - provide sharp spatial resolution in reporting locations of individuals at a single moment in time but also destroy the dynamics they intend to capture. These snapshot observations contain no information about individual trajectories, but still encode information about movement and demographic dynamics, especially when combined with a well-motivated biophysical model. The relationship between spatially evolving populations and single-moment representations of their collective locations is well-established with partial differential equations (PDEs) and their inverse problems. However, experimental data is commonly a set of locations whose number is insufficient to approximate a continuous-in-space PDE solution. Here, motivated by popular subcellular imaging data of gene expression, we embrace the stochastic nature of the data and investigate the mathematical foundations of parametrically inferring demographic rates from snapshots of particles undergoing birth, diffusion, and death in a nuclear or cellular domain. Toward inference, we rigorously derive a connection between individual particle paths and their presentation as a Poisson spatial process. Using this framework, we investigate the properties of the resulting inverse problem and study factors that affect quality of inference. One pervasive feature of this experimental regime is the presence of cell-to-cell heterogeneity. Rather than being a hindrance, we show that cell-to-cell geometric heterogeneity can increase the quality of inference on dynamics for certain parameter regimes. Altogether, the results serve as a basis for more detailed investigations of subcellular spatial patterns of RNA molecules and other stochastically evolving populations that can only be observed for single instants in their time evolution.
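A toy version of the inference problem in this abstract can be written down in closed form. The sketch below is a simplification and not the authors' setup: it assumes a one-dimensional unbounded domain with a point source at the origin, birth rate λ, diffusivity D, and death rate μ, observed at steady state. Under those assumptions the snapshot is a Poisson process with intensity u(x) = λ / (2√(Dμ)) · exp(−|x|√(μ/D)), so the particle count per cell is Poisson with mean λ/μ and the positions are Laplace-distributed with scale √(D/μ). All numerical values and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ground-truth rates for the toy model (not taken from the paper).
birth, death, diffusion = 40.0, 2.0, 0.5
n_cells = 500

mean_count = birth / death                 # expected particles per snapshot
length_scale = np.sqrt(diffusion / death)  # decay length of the intensity

# Steady state: counts are Poisson, positions are i.i.d. Laplace around the source.
counts = rng.poisson(mean_count, size=n_cells)
positions = rng.laplace(loc=0.0, scale=length_scale, size=counts.sum())

# Maximum-likelihood estimates of the two identifiable rate combinations.
ratio_birth_death = counts.mean()                        # estimates birth / death
ratio_diffusion_death = np.mean(np.abs(positions)) ** 2  # estimates diffusion / death

print("birth/death estimate:", ratio_birth_death)
print("diffusion/death estimate:", ratio_diffusion_death)
```

In this toy setting only the ratios λ/μ and D/μ can be recovered from a snapshot; the absolute rates cannot be separated without additional information, which is one reason a well-motivated biophysical model and extra structure such as domain geometry matter.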
Affiliation(s)
| | | | - Fangyuan Ding
- Department of Biomedical Engineering, University of California, Irvine
| | | |
|
39
|
Falconnier C, Caparros-Roissard A, Decraene C, Lutz PE. Functional genomic mechanisms of opioid action and opioid use disorder: a systematic review of animal models and human studies. Mol Psychiatry 2023; 28:4568-4584. [PMID: 37723284 PMCID: PMC10914629 DOI: 10.1038/s41380-023-02238-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Revised: 08/17/2023] [Accepted: 08/24/2023] [Indexed: 09/20/2023]
Abstract
In the past two decades, over-prescription of opioids for pain management has driven a steep increase in opioid use disorder (OUD) and death by overdose, exerting a dramatic toll on Western countries. OUD is a chronic relapsing disease associated with a lifetime struggle to control drug consumption, suggesting that opioids trigger long-lasting brain adaptations, notably through functional genomic and epigenomic mechanisms. Current understanding of these processes, however, remains scarce, and has not previously been reviewed systematically. The goal of the present work was therefore to synthesize current knowledge on genome-wide transcriptomic and epigenetic mechanisms of opioid action in primate and rodent species. Using a prospectively registered methodology, comprehensive literature searches were completed in PubMed, Embase, and Web of Science. Of the 2709 articles identified, 73 met our inclusion criteria and were considered for qualitative analysis. Focusing on the 5 most studied nervous system structures (nucleus accumbens, frontal cortex, whole striatum, dorsal striatum, spinal cord; 44 articles), we also conducted a quantitative analysis of differentially expressed genes, in an effort to identify a putative core transcriptional signature of opioids. Only one gene, Cdkn1a, was consistently identified in eleven studies, and globally, our results unveil surprisingly low consistency across published work, even when considering the most recent single-cell approaches. Analysis of sources of variability detected significant contributions from species, brain structure, duration of opioid exposure, strain, time-point of analysis, and batch effects, but not type of opioid. To go beyond those limitations, we leveraged threshold-free methods to illustrate how genome-wide comparisons may generate new findings and hypotheses. Finally, we discuss current methodological developments in the field and their implications for future research and, ultimately, better care.
Affiliation(s)
- Camille Falconnier
- Centre National de la Recherche Scientifique, Université de Strasbourg, Institut des Neurosciences Cellulaires et Intégratives UPR 3212, 67000, Strasbourg, France
| | - Alba Caparros-Roissard
- Centre National de la Recherche Scientifique, Université de Strasbourg, Institut des Neurosciences Cellulaires et Intégratives UPR 3212, 67000, Strasbourg, France
| | - Charles Decraene
- Centre National de la Recherche Scientifique, Université de Strasbourg, Institut des Neurosciences Cellulaires et Intégratives UPR 3212, 67000, Strasbourg, France
- Centre National de la Recherche Scientifique, Université de Strasbourg, Laboratoire de Neurosciences Cognitives et Adaptatives UMR 7364, 67000, Strasbourg, France
| | - Pierre-Eric Lutz
- Centre National de la Recherche Scientifique, Université de Strasbourg, Institut des Neurosciences Cellulaires et Intégratives UPR 3212, 67000, Strasbourg, France.
- Douglas Mental Health University Institute, Montreal, QC, Canada.
| |
|
40
|
Yampolskaya M, Herriges MJ, Ikonomou L, Kotton DN, Mehta P. scTOP: physics-inspired order parameters for cellular identification and visualization. Development 2023; 150:dev201873. [PMID: 37756586 PMCID: PMC10629677 DOI: 10.1242/dev.201873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Accepted: 09/11/2023] [Indexed: 09/29/2023]
Abstract
Advances in single-cell RNA sequencing provide an unprecedented window into cellular identity. The abundance of data requires new theoretical and computational frameworks to analyze the dynamics of differentiation and integrate knowledge from cell atlases. We present 'single-cell Type Order Parameters' (scTOP): a statistical, physics-inspired approach for quantifying cell identity given a reference basis of cell types. scTOP can accurately classify cells, visualize developmental trajectories and assess the fidelity of engineered cells. Importantly, scTOP does this without feature selection, statistical fitting or dimensional reduction (e.g. uniform manifold approximation and projection, principal component analysis, etc.). We illustrate the power of scTOP using human and mouse datasets. By reanalyzing mouse lung data, we characterize a transient hybrid alveolar type 1/alveolar type 2 cell population. Visualizations of lineage tracing hematopoiesis data using scTOP confirm that a single clone can give rise to multiple mature cell types. We assess the transcriptional similarity between endogenous and donor-derived cells in the context of murine pulmonary cell transplantation. Our results suggest that physics-inspired order parameters can be an important tool for understanding differentiation and characterizing engineered cells. scTOP is available as an easy-to-use Python package.
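The abstract does not spell out how identity is scored against a reference basis, so the following is only a sketch under a stated assumption: that a sample is scored by its least-squares coordinates in a non-orthogonal basis whose columns are reference cell-type profiles. The function `projection_scores` and the synthetic data are hypothetical and are not the scTOP package API.

```python
import numpy as np


def projection_scores(basis, sample):
    """Least-squares coordinates of `sample` in a (non-orthogonal) reference basis.

    basis:  (n_genes, n_types) array, one column per reference cell type
    sample: (n_genes,) array of expression values on the same gene order
    """
    scores, *_ = np.linalg.lstsq(basis, sample, rcond=None)
    return scores


rng = np.random.default_rng(2)
n_genes, n_types = 500, 4

# Synthetic reference profiles; types 0 and 1 are deliberately correlated.
basis = rng.standard_normal((n_genes, n_types))
basis[:, 1] += 0.8 * basis[:, 0]

pure = basis[:, 1]                              # a cell identical to type 1
hybrid = 0.5 * basis[:, 0] + 0.5 * basis[:, 2]  # an even mix of types 0 and 2

pure_scores = projection_scores(basis, pure)
hybrid_scores = projection_scores(basis, hybrid)
print(np.round(pure_scores, 3))
print(np.round(hybrid_scores, 3))
```

The design point this illustrates is that a least-squares projection, unlike a plain correlation with each reference, does not assign spurious weight to type 0 merely because it is correlated with type 1; a hybrid cell also shows up as partial scores on its two parent types.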
Affiliation(s)
| | - Michael J. Herriges
- Center for Regenerative Medicine of Boston University and Boston Medical Center, Boston, MA 02118, USA
- The Pulmonary Center and Department of Medicine, Boston University School of Medicine, Boston, MA 02118, USA
| | - Laertis Ikonomou
- Department of Oral Biology, University at Buffalo, The State University of New York, Buffalo, NY 14215, USA
- Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, University at Buffalo, The State University of New York, Buffalo, NY 14215, USA
| | - Darrell N. Kotton
- Center for Regenerative Medicine of Boston University and Boston Medical Center, Boston, MA 02118, USA
- The Pulmonary Center and Department of Medicine, Boston University School of Medicine, Boston, MA 02118, USA
| | - Pankaj Mehta
- Department of Physics, Boston University, Boston, MA 02215, USA
- Center for Regenerative Medicine of Boston University and Boston Medical Center, Boston, MA 02118, USA
- Faculty of Computing and Data Science, Boston University, Boston, MA 02215, USA
- Biological Design Center, Boston University, Boston, MA 02215, USA
| |
|
41
|
Carbonetto P, Luo K, Sarkar A, Hung A, Tayeb K, Pott S, Stephens M. GoM DE: interpreting structure in sequence count data with differential expression analysis allowing for grades of membership. Genome Biol 2023; 24:236. [PMID: 37858253 PMCID: PMC10588049 DOI: 10.1186/s13059-023-03067-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Accepted: 09/20/2023] [Indexed: 10/21/2023] Open
Abstract
Parts-based representations, such as non-negative matrix factorization and topic modeling, have been used to identify structure from single-cell sequencing data sets, in particular structure that is not as well captured by clustering or other dimensionality reduction methods. However, interpreting the individual parts remains a challenge. To address this challenge, we extend methods for differential expression analysis by allowing cells to have partial membership to multiple groups. We call this grade of membership differential expression (GoM DE). We illustrate the benefits of GoM DE for annotating topics identified in several single-cell RNA-seq and ATAC-seq data sets.
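The idea of differential expression with partial membership can be illustrated for a single gene. The sketch below is an assumption-laden toy, not the GoM DE implementation: it assumes counts follow a Poisson model whose mean is the cell's size factor times a membership-weighted sum of per-topic expression rates, fits those rates with standard EM (multiplicative) updates while holding memberships fixed, and reports a log2 fold change between two topics. The function `fit_topic_rates` and all simulated values are hypothetical.

```python
import numpy as np


def fit_topic_rates(x, s, L, n_iter=500):
    """EM fit of per-topic rates f for one gene under x_i ~ Poisson(s_i * (L @ f)_i).

    x: (n_cells,) counts; s: (n_cells,) size factors;
    L: (n_cells, n_topics) memberships, each row summing to 1 (held fixed).
    """
    f = np.full(L.shape[1], x.sum() / s.sum())  # start from the pooled rate
    exposure = (s[:, None] * L).sum(axis=0)     # total exposure per topic
    for _ in range(n_iter):
        mu = s * (L @ f)                        # current expected counts
        f = f * ((x / mu)[:, None] * s[:, None] * L).sum(axis=0) / exposure
    return f


rng = np.random.default_rng(3)
n_cells = 2000
L = rng.dirichlet([1.0, 1.0], size=n_cells)  # cells with mixed membership
s = np.full(n_cells, 1000.0)                 # equal size factors, for simplicity
true_rates = np.array([0.002, 0.008])        # simulated log2 fold change of 2
x = rng.poisson(s * (L @ true_rates))

rates = fit_topic_rates(x, s, L)
lfc = np.log2(rates[1] / rates[0])
print("estimated rates:", rates)
print("estimated log2 fold change:", lfc)
```

When every membership row is one-hot, this update reduces to the ordinary per-group mean rate, which is the sense in which partial membership extends a standard group-wise comparison.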
Affiliation(s)
- Peter Carbonetto
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Research Computing Center, University of Chicago, Chicago, IL, USA
| | - Kaixuan Luo
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
| | - Abhishek Sarkar
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Vesalius Therapeutics, Cambridge, MA, USA
| | - Anthony Hung
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA
| | - Karl Tayeb
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Committee on Genetics, Genomics and Systems Biology, University of Chicago, Chicago, IL, USA
| | - Sebastian Pott
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA
| | - Matthew Stephens
- Department of Human Genetics, University of Chicago, Chicago, IL, USA.
- Department of Statistics, University of Chicago, Chicago, IL, USA.
| |
|
42
|
Read JF, Serralha M, Armitage JD, Iqbal MM, Cruickshank MN, Saxena A, Strickland DH, Waithman J, Holt PG, Bosco A. Single cell transcriptomics reveals cell type specific features of developmentally regulated responses to lipopolysaccharide between birth and 5 years. Front Immunol 2023; 14:1275937. [PMID: 37920467 PMCID: PMC10619903 DOI: 10.3389/fimmu.2023.1275937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 10/04/2023] [Indexed: 11/04/2023] Open
Abstract
Background Human perinatal life is characterized by a period of extraordinary change during which newborns encounter abundant environmental stimuli and exposure to potential pathogens. To meet such challenges, the neonatal immune system is equipped with unique functional characteristics that adapt to changing conditions as development progresses across the early years of life, but the molecular characteristics of such adaptations remain poorly understood. The application of single cell genomics to birth cohorts provides an opportunity to investigate changes in gene expression programs elicited downstream of innate immune activation across early life at unprecedented resolution. Methods In this study, we performed single cell RNA-sequencing of mononuclear cells collected from matched birth cord blood and 5-year peripheral blood samples following stimulation (18 h) with two well-characterized innate stimuli: lipopolysaccharide (LPS) and polyinosinic:polycytidylic acid (Poly(I:C)). Results We found that the transcriptional response to LPS was constrained at birth and predominantly partitioned into classical proinflammatory gene upregulation primarily by monocytes and interferon (IFN)-signaling gene upregulation by lymphocytes. Moreover, these responses featured substantial cell-to-cell communication which appeared markedly strengthened between birth and 5 years. In contrast, stimulation with Poly(I:C) induced a robust IFN-signaling response across all cell types identified at birth and 5 years. Analysis of gene regulatory networks revealed IRF1 and STAT1 were key drivers of the LPS-induced IFN-signaling response in lymphocytes with a potential developmental role for IRF7 regulation. Additionally, we observed distinct activation trajectory endpoints for monocytes derived from LPS-treated cord and 5-year blood, which was not apparent among Poly(I:C)-induced monocytes.
Conclusion Taken together, our findings provide new insight into the gene regulatory landscape of immune cell function between birth and 5 years and point to regulatory mechanisms relevant to future investigation of infection susceptibility in early life.
Affiliation(s)
- James F. Read
- Asthma and Airway Disease Research Center, University of Arizona, Tucson, AZ, United States
- Telethon Kids Institute, The University of Western Australia, Perth, WA, Australia
| | - Michael Serralha
- Telethon Kids Institute, The University of Western Australia, Perth, WA, Australia
| | - Jesse D. Armitage
- Telethon Kids Institute, The University of Western Australia, Perth, WA, Australia
- School of Biomedical Sciences, The University of Western Australia, Nedlands, Western Australia, Australia
| | - Muhammad Munir Iqbal
- Genomics WA, Joint Initiative of Telethon Kids Institute, Harry Perkins Institute of Medical Research and The University of Western Australia, Nedlands, WA, Australia
| | - Mark N. Cruickshank
- School of Biomedical Sciences, The University of Western Australia, Nedlands, Western Australia, Australia
| | - Alka Saxena
- Genomics WA, Joint Initiative of Telethon Kids Institute, Harry Perkins Institute of Medical Research and The University of Western Australia, Nedlands, WA, Australia
| | - Deborah H. Strickland
- Telethon Kids Institute, The University of Western Australia, Perth, WA, Australia
- UWA Centre for Child Health Research, The University of Western Australia, Nedlands, WA, Australia
| | - Jason Waithman
- School of Biomedical Sciences, The University of Western Australia, Nedlands, Western Australia, Australia
| | - Patrick G. Holt
- Telethon Kids Institute, The University of Western Australia, Perth, WA, Australia
- UWA Centre for Child Health Research, The University of Western Australia, Nedlands, WA, Australia
| | - Anthony Bosco
- Asthma and Airway Disease Research Center, University of Arizona, Tucson, AZ, United States
- Department of Immunobiology, The University of Arizona College of Medicine, Tucson, AZ, United States
| |
|
43
|
Tseng KC, Crump JG. Craniofacial developmental biology in the single-cell era. Development 2023; 150:dev202077. [PMID: 37812056 PMCID: PMC10617621 DOI: 10.1242/dev.202077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2023]
Abstract
The evolution of a unique craniofacial complex in vertebrates made possible new ways of breathing, eating, communicating and sensing the environment. The head and face develop through interactions of all three germ layers, the endoderm, ectoderm and mesoderm, as well as the so-called fourth germ layer, the cranial neural crest. Over a century of experimental embryology and genetics have revealed an incredible diversity of cell types derived from each germ layer, signaling pathways and genes that coordinate craniofacial development, and how changes to these underlie human disease and vertebrate evolution. Yet for many diseases and congenital anomalies, we have an incomplete picture of the causative genomic changes, in particular how alterations to the non-coding genome might affect craniofacial gene expression. Emerging genomics and single-cell technologies provide an opportunity to obtain a more holistic view of the genes and gene regulatory elements orchestrating craniofacial development across vertebrates. These single-cell studies generate novel hypotheses that can be experimentally validated in vivo. In this Review, we highlight recent advances in single-cell studies of diverse craniofacial structures, as well as potential pitfalls and the need for extensive in vivo validation. We discuss how these studies inform the developmental sources and regulation of head structures, bringing new insights into the etiology of structural birth anomalies that affect the vertebrate head.
Affiliation(s)
- Kuo-Chang Tseng
- Department of Stem Cell Biology and Regenerative Medicine, Keck School of Medicine of University of Southern California, Los Angeles, CA 90033, USA
| | - J. Gage Crump
- Department of Stem Cell Biology and Regenerative Medicine, Keck School of Medicine of University of Southern California, Los Angeles, CA 90033, USA
| |
|
44
|
Steinhart MR, van der Valk WH, Osorio D, Serdy SA, Zhang J, Nist-Lund C, Kim J, Moncada-Reid C, Sun L, Lee J, Koehler KR. Mapping oto-pharyngeal development in a human inner ear organoid model. Development 2023; 150:dev201871. [PMID: 37796037 DOI: 10.1242/dev.201871] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/08/2023] [Accepted: 09/08/2023] [Indexed: 10/06/2023]
Abstract
Inner ear development requires the coordination of cell types from distinct epithelial, mesenchymal and neuronal lineages. Although we have learned much from animal models, many details about human inner ear development remain elusive. We recently developed an in vitro model of human inner ear organogenesis using pluripotent stem cells in a 3D culture, fostering the growth of a sensorineural circuit, including hair cells and neurons. Despite previously characterizing some cell types, many remain undefined. This study aimed to chart the in vitro development timeline of the inner ear organoid to understand the mechanisms at play. Using single-cell RNA sequencing at ten stages during the first 36 days of differentiation, we tracked the evolution from pluripotency to various ear cell types after exposure to specific signaling modulators. Our findings showcase gene expression that influences differentiation, identifying a plethora of ectodermal and mesenchymal cell types. We also discern aspects of the organoid model consistent with in vivo development, while highlighting potential discrepancies. Our study establishes the Inner Ear Organoid Developmental Atlas (IODA), offering deeper insights into human biology and improving inner ear tissue differentiation.
Affiliation(s)
- Matthew R Steinhart
- Department of Otolaryngology, Boston Children's Hospital, Boston, MA 02115, USA
- F. M. Kirby Neurobiology Center, Boston Children's Hospital, Boston, MA 02115, USA
- Department of Otolaryngology-Head and Neck Surgery, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Medical Neuroscience Graduate Program, Indiana University School of Medicine, Indianapolis, IN 46202, USA
| | - Wouter H van der Valk
- Department of Otolaryngology, Boston Children's Hospital, Boston, MA 02115, USA
- F. M. Kirby Neurobiology Center, Boston Children's Hospital, Boston, MA 02115, USA
- Department of Otolaryngology-Head and Neck Surgery, Harvard Medical School, Boston, MA 02115, USA
- OtoBiology Leiden, Department of Otorhinolaryngology and Head & Neck Surgery; Leiden University Medical Center, Leiden 2333 ZA, the Netherlands
- The Novo Nordisk Foundation Center for Stem Cell Medicine (reNEW); Leiden University Medical Center, Leiden, 2333 ZA, the Netherlands
| | - Daniel Osorio
- Research Computing, Department of Information Technology; Boston Children's Hospital, Boston, MA 02115, USA
| | - Sara A Serdy
- Department of Otolaryngology, Boston Children's Hospital, Boston, MA 02115, USA
- F. M. Kirby Neurobiology Center, Boston Children's Hospital, Boston, MA 02115, USA
| | - Jingyuan Zhang
- Department of Otolaryngology, Boston Children's Hospital, Boston, MA 02115, USA
- F. M. Kirby Neurobiology Center, Boston Children's Hospital, Boston, MA 02115, USA
- Department of Otolaryngology-Head and Neck Surgery, Harvard Medical School, Boston, MA 02115, USA
| | - Carl Nist-Lund
- Department of Otolaryngology, Boston Children's Hospital, Boston, MA 02115, USA
- F. M. Kirby Neurobiology Center, Boston Children's Hospital, Boston, MA 02115, USA
- Program in Neuroscience, Harvard Medical School, Boston, MA 02115, USA
| | - Jin Kim
- Department of Otolaryngology, Boston Children's Hospital, Boston, MA 02115, USA
- F. M. Kirby Neurobiology Center, Boston Children's Hospital, Boston, MA 02115, USA
- Department of Otolaryngology-Head and Neck Surgery, Harvard Medical School, Boston, MA 02115, USA
- Department of Plastic and Oral Surgery, Boston Children's Hospital, Boston, MA 02115, USA
| | - Cynthia Moncada-Reid
- Speech and Hearing Bioscience and Technology (SHBT) Graduate Program, Harvard Medical School, Boston, MA 02115, USA
| | - Liang Sun
- Research Computing, Department of Information Technology; Boston Children's Hospital, Boston, MA 02115, USA
| | - Jiyoon Lee
- Department of Otolaryngology, Boston Children's Hospital, Boston, MA 02115, USA
- F. M. Kirby Neurobiology Center, Boston Children's Hospital, Boston, MA 02115, USA
- Department of Otolaryngology-Head and Neck Surgery, Harvard Medical School, Boston, MA 02115, USA
- Department of Plastic and Oral Surgery, Boston Children's Hospital, Boston, MA 02115, USA
| | - Karl R Koehler
- Department of Otolaryngology, Boston Children's Hospital, Boston, MA 02115, USA
- F. M. Kirby Neurobiology Center, Boston Children's Hospital, Boston, MA 02115, USA
- Department of Otolaryngology-Head and Neck Surgery, Harvard Medical School, Boston, MA 02115, USA
- Department of Plastic and Oral Surgery, Boston Children's Hospital, Boston, MA 02115, USA
| |
|
45
|
Chari T, Gorin G, Pachter L. Biophysically Interpretable Inference of Cell Types from Multimodal Sequencing Data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.09.17.558131. [PMID: 37745403 PMCID: PMC10516047 DOI: 10.1101/2023.09.17.558131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
Multimodal, single-cell genomics technologies enable simultaneous capture of multiple facets of DNA and RNA processing in the cell. This creates opportunities for transcriptome-wide, mechanistic studies of cellular processing in heterogeneous cell types, with applications ranging from inferring kinetic differences between cells, to the role of stochasticity in driving heterogeneity. However, current methods for determining cell types or 'clusters' present in multimodal data often rely on ad hoc or independent treatment of modalities, and assumptions ignoring inherent properties of the count data. To enable interpretable and consistent cell cluster determination from multimodal data, we present meK-Means (mechanistic K-Means) which integrates modalities and learns underlying, shared biophysical states through a unifying model of transcription. In particular, we demonstrate how meK-Means can be used to cluster cells from unspliced and spliced mRNA count modalities. By utilizing the causal, physical relationships underlying these modalities, we identify shared transcriptional kinetics across cells, which induce the observed gene expression profiles, and provide an alternative definition for 'clusters' through the governing parameters of cellular processes.
Affiliation(s)
- Tara Chari
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
| | - Gennady Gorin
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
| | - Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California
| |
|