1
Croydon-Veleslavov IA, Stumpf MPH. Repeated Decision Stumping Distils Simple Rules from Single-Cell Data. J Comput Biol 2024; 31:21-40. [PMID: 38170180 DOI: 10.1089/cmb.2021.0613] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024] Open
Abstract
Single-cell data afford unprecedented insights into molecular processes. But the complexity and size of these data sets have proved challenging and given rise to a large armory of statistical and machine learning approaches. The majority of approaches focus on either describing features of these data, or making predictions and classifying unlabeled samples. In this study, we introduce repeated decision stumping (ReDX) as a method to distill simple models from single-cell data. We develop decision trees of depth one (hence "stumps") to identify, in an inductive manner, gene products involved in driving cell fate transitions; in applications to published data we are able to discover the key players involved in these processes in an unbiased manner without prior knowledge. Our algorithm deliberately targets the simplest possible candidate hypotheses that can be extracted from complex high-dimensional data. There are three reasons for this: (1) the predictions become straightforwardly testable hypotheses; (2) the identified candidates form the basis for further mechanistic model development, for example, for engineering and synthetic biology interventions; and (3) this approach complements existing descriptive modeling approaches and frameworks. The approach is computationally efficient, has remarkable predictive power, including in simulation studies where the ground truth is known, and yields robust and statistically stable predictors: the same set of candidates is generated by applying the algorithm to different subsamples of experimental data.
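The depth-one trees mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' ReDX implementation: the function names `fit_stump` and `repeated_stumps`, the toy expression matrix, the misclassification-count criterion, and the choice to exclude each selected gene before refitting are all assumptions made for illustration, since the abstract does not specify the procedure.

```python
# Illustrative sketch only; NOT the published ReDX code.
# Rows of X are cells, columns are genes, y holds binary cell-state labels.

def fit_stump(X, y, skip=frozenset()):
    """Return (errors, gene_index, threshold, label_above) for the best
    depth-one tree, or None if no gene outside `skip` can be split."""
    best = None
    for j in range(len(X[0])):
        if j in skip:
            continue
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for label_above in (0, 1):
                errors = sum(
                    (label_above if row[j] > t else 1 - label_above) != label
                    for row, label in zip(X, y)
                )
                cand = (errors, j, t, label_above)
                if best is None or cand < best:
                    best = cand
    return best


def repeated_stumps(X, y, rounds):
    """Assumed reading of 'repeated': refit after excluding genes already chosen."""
    used, ranking = set(), []
    for _ in range(rounds):
        stump = fit_stump(X, y, skip=used)
        if stump is None:
            break
        ranking.append(stump)
        used.add(stump[1])
    return ranking


# Hypothetical toy data: gene 0 separates the two states perfectly,
# gene 1 only partially, gene 2 poorly.
X = [
    [0.1, 1.0, 5.0],
    [0.2, 2.0, 1.0],
    [0.3, 1.5, 4.0],
    [0.9, 2.5, 2.0],
    [1.1, 1.2, 5.5],
    [1.0, 3.0, 1.5],
]
y = [0, 0, 0, 1, 1, 1]

ranking = repeated_stumps(X, y, rounds=2)
print([gene for _, gene, _, _ in ranking])  # gene indices in order of selection
```

Each stump is a single threshold rule on one gene, which is what makes the resulting candidates directly testable as hypotheses.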
Affiliation(s)
- Ivan A Croydon-Veleslavov
- Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London, United Kingdom
- Michael P H Stumpf
- Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London, United Kingdom
- School of BioSciences, University of Melbourne, Parkville, Australia
- School of Mathematics and Statistics, University of Melbourne, Parkville, Australia
2
Greulich P. Quantitative Modelling in Stem Cell Biology and Beyond: How to Make Best Use of It. Curr Stem Cell Rep 2023; 9:67-76. [PMID: 38145009 PMCID: PMC10739548 DOI: 10.1007/s40778-023-00230-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/17/2023] [Indexed: 12/26/2023]
Abstract
Purpose of Review: This article gives a broad overview of quantitative modelling approaches in biology and provides guidance on how to employ them to boost stem cell research, by helping to answer biological questions and to predict the outcome of biological processes. Recent Findings: The twenty-first century has seen a steady increase in the proportion of cell biology publications employing mathematical modelling to aid experimental research. However, quantitative modelling is often used as a rather decorative element to confirm experimental findings, an approach which often yields only marginal added value, and is in many cases scientifically questionable. Summary: Quantitative modelling can boost biological research in manifold ways, but one has to take some careful considerations before embarking on a modelling campaign, in order to maximise its added value, to avoid pitfalls that may lead to wrong results, and to be aware of its fundamental limitations, imposed by the risks of over-fitting and "universality".
Affiliation(s)
- Philip Greulich
- School of Mathematical Sciences, University of Southampton, Southampton, UK
- Institute for Life Sciences, University of Southampton, Southampton, UK
3
Haj Ali S, Hütt MT. Inferring missing edges in a graph from observed collective patterns. Phys Rev E 2022; 105:064610. [PMID: 35854582 DOI: 10.1103/physreve.105.064610] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2022] [Accepted: 05/26/2022] [Indexed: 06/15/2023]
Abstract
Many real-life networks are incomplete. Dynamical observations can allow estimating missing edges. Such procedures, often summarized under the term 'network inference', typically evaluate the statistical correlations among pairs of nodes to determine connectivity. Here, we offer an alternative approach: completing an incomplete network by observing its collective behavior. We illustrate this approach for the case of patterns emerging in reaction-diffusion systems on graphs, where collective behaviors can be associated with eigenvectors of the network's Laplacian matrix. Our method combines a partial spectral decomposition of the network's Laplacian matrix with eigenvalue assignment by matching the patterns to the eigenvectors of the incomplete graph. We show that knowledge of a few collective patterns can allow the prediction of missing edges and that this result holds across a range of network architectures. We present a numerical case study using activator-inhibitor dynamics and we illustrate that the main requirement for the observed patterns is that they are not confined to subsets of nodes, but involve the whole network.
Affiliation(s)
- Selim Haj Ali
- Department of Life Sciences and Chemistry, Jacobs University Bremen, D-28759 Bremen, Germany
- Marc-Thorsten Hütt
- Department of Life Sciences and Chemistry, Jacobs University Bremen, D-28759 Bremen, Germany
4
Rajagopal V, Arumugam S, Hunter PJ, Khadangi A, Chung J, Pan M. The Cell Physiome: What Do We Need in a Computational Physiology Framework for Predicting Single-Cell Biology? Annu Rev Biomed Data Sci 2022; 5:341-366. [PMID: 35576556 DOI: 10.1146/annurev-biodatasci-072018-021246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Modern biology and biomedicine are undergoing a big data explosion, needing advanced computational algorithms to extract mechanistic insights on the physiological state of living cells. We present the motivation for the Cell Physiome project: a framework and approach for creating, sharing, and using biophysics-based computational models of single-cell physiology. Using examples in calcium signaling, bioenergetics, and endosomal trafficking, we highlight the need for spatially detailed, biophysics-based computational models to uncover new mechanisms underlying cell biology. We review progress and challenges to date toward creating cell physiome models. We then introduce bond graphs as an efficient way to create cell physiome models that integrate chemical, mechanical, electromagnetic, and thermal processes while maintaining mass and energy balance. Bond graphs enhance modularization and reusability of computational models of cells at scale. We conclude with a look forward at steps that will help fully realize this exciting new field of mechanistic biomedical data science. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 5 is August 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Vijay Rajagopal
- Department of Biomedical Engineering, University of Melbourne, Melbourne, Victoria, Australia
- Senthil Arumugam
- Cellular Physiology Lab, Monash Biomedicine Discovery Institute, Faculty of Medicine, Nursing and Health Sciences; European Molecular Biological Laboratory (EMBL) Australia; and Australian Research Council Centre of Excellence in Advanced Molecular Imaging, Monash University, Clayton/Melbourne, Victoria, Australia
- Peter J Hunter
- Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand
- Afshin Khadangi
- Department of Biomedical Engineering, University of Melbourne, Melbourne, Victoria, Australia
- Joshua Chung
- Department of Biomedical Engineering, University of Melbourne, Melbourne, Victoria, Australia
- Michael Pan
- School of Mathematics and Statistics, University of Melbourne, Melbourne, Victoria, Australia
5
Abstract
Network inference is a notoriously challenging problem. Inferred networks are associated with high uncertainty and are likely riddled with false positive and false negative interactions. Especially for biological networks, we do not have good ways of judging the performance of inference methods against real networks, and instead we often rely solely on the performance against simulated data. Gaining confidence in networks inferred from real data therefore requires establishing reliable validation methods. Here, we argue that the expectation of mixing patterns in biological networks such as gene regulatory networks offers a reasonable starting point: interactions are more likely to occur between nodes with similar biological functions. We can quantify this behaviour using the assortativity coefficient, and here we show that the resulting heuristic, functional assortativity, offers a reliable and informative route for comparing different inference algorithms.
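The assortativity coefficient referred to here can be sketched for a categorical node attribute using Newman's standard definition, r = (Σ e_ii − Σ a_i²) / (1 − Σ a_i²) for an undirected graph. The function name `assortativity`, the node labels "A" and "B", and the toy edge lists are hypothetical; how the authors derive functional labels for genes is not stated in the abstract.

```python
from collections import defaultdict

def assortativity(edges, label):
    """Newman's assortativity coefficient for a categorical node attribute
    on an undirected graph. `edges` is a list of (u, v) pairs and `label`
    maps each node to its category (e.g. a functional annotation).
    Undefined (division by zero) if every edge end falls in one category."""
    e = defaultdict(float)
    for u, v in edges:
        # Count each undirected edge in both directions so that the
        # mixing matrix is symmetric.
        e[(label[u], label[v])] += 1
        e[(label[v], label[u])] += 1
    total = sum(e.values())
    cats = {c for pair in e for c in pair}
    trace = sum(e[(c, c)] for c in cats) / total
    # a[c]: fraction of edge ends attached to nodes of category c.
    a = {c: sum(e[(c, d)] for d in cats) / total for c in cats}
    sum_aa = sum(a[c] ** 2 for c in cats)
    return (trace - sum_aa) / (1 - sum_aa)


label = {0: "A", 1: "A", 2: "B", 3: "B"}   # hypothetical functional classes
within = [(0, 1), (2, 3)]                  # edges only inside a class
between = [(0, 2), (1, 3)]                 # edges only across classes

r_within = assortativity(within, label)
r_between = assortativity(between, label)
print(r_within, r_between)  # 1.0 -1.0
```

Values near 1 indicate that edges mostly join nodes of the same class, which is the mixing pattern the abstract proposes as evidence for a plausible inferred network.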
6
Heydari T, Langley MA, Fisher CL, Aguilar-Hidalgo D, Shukla S, Yachie-Kinoshita A, Hughes M, McNagny KM, Zandstra PW. IQCELL: A platform for predicting the effect of gene perturbations on developmental trajectories using single-cell RNA-seq data. PLoS Comput Biol 2022; 18:e1009907. [PMID: 35213533 PMCID: PMC8906617 DOI: 10.1371/journal.pcbi.1009907] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 08/20/2021] [Revised: 03/09/2022] [Accepted: 02/08/2022] [Indexed: 01/03/2023] Open
Abstract
The increasing availability of single-cell RNA-sequencing (scRNA-seq) data from various developmental systems provides the opportunity to infer gene regulatory networks (GRNs) directly from data. Herein we describe IQCELL, a platform to infer, simulate, and study executable logical GRNs directly from scRNA-seq data. Such executable GRNs allow simulation of fundamental hypotheses governing developmental programs and help accelerate the design of strategies to control stem cell fate. We first describe the architecture of IQCELL. Next, we apply IQCELL to scRNA-seq datasets from early mouse T-cell and red blood cell development, and show that the platform can infer over 74% of causal gene interactions previously reported from decades of research. We also show that dynamic simulations of the generated GRN qualitatively recapitulate the effects of known gene perturbations. Finally, we implement an IQCELL gene selection pipeline that allows us to identify candidate genes without prior knowledge. We demonstrate that GRN simulations based on the inferred set yield results similar to the original curated lists. In summary, the IQCELL platform offers a versatile tool to infer, simulate, and study executable GRNs in dynamic biological systems.
Affiliation(s)
- Tiam Heydari
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Matthew A. Langley
- Institute of Biomaterials and Biomedical Engineering, University of Toronto, Toronto, ON, Canada
- Cynthia L. Fisher
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Daniel Aguilar-Hidalgo
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Shreya Shukla
- Institute of Biomaterials and Biomedical Engineering, University of Toronto, Toronto, ON, Canada
- Notch Therapeutics, Vancouver, British Columbia, Canada
- Ayako Yachie-Kinoshita
- Institute of Biomaterials and Biomedical Engineering, University of Toronto, Toronto, ON, Canada
- Michael Hughes
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Kelly M. McNagny
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Peter W. Zandstra
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
7
Waters SL, Schumacher LJ, El Haj AJ. Regenerative medicine meets mathematical modelling: developing symbiotic relationships. NPJ Regen Med 2021; 6:24. [PMID: 33846347 PMCID: PMC8042047 DOI: 10.1038/s41536-021-00134-2] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 04/25/2020] [Accepted: 02/26/2021] [Indexed: 02/01/2023] Open
Abstract
Successful progression from bench to bedside for regenerative medicine products is challenging and requires a multidisciplinary approach. What has not yet been fully recognised is the potential for quantitative data analysis and mathematical modelling approaches to support this process. In this review, we highlight the wealth of opportunities for embedding mathematical and computational approaches within all stages of the regenerative medicine pipeline. We explore how exploiting quantitative mathematical and computational approaches, alongside state-of-the-art regenerative medicine research, can lead to therapies that potentially can be more rapidly translated into the clinic.
Affiliation(s)
- S L Waters
- Oxford Centre for Industrial and Applied Mathematics, Mathematical Institute, Radcliffe Observatory Quarter, University of Oxford, Oxford, UK
- L J Schumacher
- Centre for Regenerative Medicine, The University of Edinburgh, Edinburgh BioQuarter, Edinburgh, UK
- A J El Haj
- Healthcare Technology Institute, Institute of Translational Medicine, School of Chemical Engineering, University of Birmingham, Birmingham, UK
8
Hütt MT, Lesne A. Gene Regulatory Networks: Dissecting Structure and Dynamics. Systems Medicine 2021. [DOI: 10.1016/b978-0-12-801238-3.11467-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022] Open
9
10
Xin H, Lian Q, Jiang Y, Luo J, Wang X, Erb C, Xu Z, Zhang X, Heidrich-O’Hare E, Yan Q, Duerr RH, Chen K, Chen W. GMM-Demux: sample demultiplexing, multiplet detection, experiment planning, and novel cell-type verification in single cell sequencing. Genome Biol 2020; 21:188. [PMID: 32731885 PMCID: PMC7393741 DOI: 10.1186/s13059-020-02084-2] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 12/18/2019] [Accepted: 06/24/2020] [Indexed: 11/10/2022] Open
Abstract
Identifying and removing multiplets are essential to improving the scalability and the reliability of single cell RNA sequencing (scRNA-seq). Multiplets create artificial cell types in the dataset. We propose a Gaussian mixture model-based multiplet identification method, GMM-Demux. GMM-Demux accurately identifies and removes multiplets through sample barcoding, including cell hashing and MULTI-seq. GMM-Demux uses a droplet formation model to authenticate putative cell types discovered from a scRNA-seq dataset. We generate two in-house cell-hashing datasets and compare GMM-Demux against three state-of-the-art sample barcoding classifiers. We show that GMM-Demux is stable and highly accurate and recognizes 9 multiplet-induced fake cell types in a PBMC dataset.
Affiliation(s)
- Hongyi Xin
- University of Michigan-Shanghai Jiao Tong University Joint Institute, Shanghai Jiao Tong University, Shanghai, 200240 China
- Department of Pediatrics, School of Medicine, University of Pittsburgh, Pittsburgh, 15260 USA
- Qiuyu Lian
- Department of Pediatrics, School of Medicine, University of Pittsburgh, Pittsburgh, 15260 USA
- Department of Automation, Tsinghua University, Beijing, 100086 China
- Yale Jiang
- Department of Pediatrics, School of Medicine, University of Pittsburgh, Pittsburgh, 15260 USA
- School of Medicine, Tsinghua University, Beijing, 100086 China
- Jiadi Luo
- Department of Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, 15260 USA
- Xinjun Wang
- Department of Biostatistics, School of Public Health, University of Pittsburgh, Pittsburgh, 15260 USA
- Carla Erb
- Department of Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, 15260 USA
- Zhongli Xu
- Department of Pediatrics, School of Medicine, University of Pittsburgh, Pittsburgh, 15260 USA
- School of Medicine, Tsinghua University, Beijing, 100086 China
- Xiaoyi Zhang
- Department of Pediatrics, School of Medicine, University of Pittsburgh, Pittsburgh, 15260 USA
- Elisa Heidrich-O’Hare
- Department of Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, 15260 USA
- Qi Yan
- Department of Pediatrics, School of Medicine, University of Pittsburgh, Pittsburgh, 15260 USA
- Richard H. Duerr
- Department of Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, 15260 USA
- Kong Chen
- Department of Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, 15260 USA
- Wei Chen
- Department of Pediatrics, School of Medicine, University of Pittsburgh, Pittsburgh, 15260 USA
11
Qiu X, Rahimzamani A, Wang L, Ren B, Mao Q, Durham T, McFaline-Figueroa JL, Saunders L, Trapnell C, Kannan S. Inferring Causal Gene Regulatory Networks from Coupled Single-Cell Expression Dynamics Using Scribe. Cell Syst 2020; 10:265-274.e11. [PMID: 32135093 PMCID: PMC7223477 DOI: 10.1016/j.cels.2020.02.003] [Citation(s) in RCA: 72] [Impact Index Per Article: 18.0] [Received: 09/12/2018] [Revised: 06/08/2019] [Accepted: 02/05/2020] [Indexed: 01/13/2023]
Abstract
Here, we present Scribe (https://github.com/aristoteleo/Scribe-py), a toolkit for detecting and visualizing causal regulatory interactions between genes, and explore the potential for single-cell experiments to power network reconstruction. Scribe employs restricted directed information to determine causality by estimating the strength of information transferred from a potential regulator to its downstream target. We apply Scribe and other leading approaches for causal network reconstruction to several types of single-cell measurements and show that there is a dramatic drop in performance for "pseudotime"-ordered single-cell data compared with true time-series data. We demonstrate that performing causal inference requires temporal coupling between measurements. We show that methods such as "RNA velocity" restore some degree of coupling through an analysis of chromaffin cell fate commitment. These analyses highlight a shortcoming in experimental and computational methods for analyzing gene regulation at single-cell resolution and suggest ways of overcoming it.
Affiliation(s)
- Xiaojie Qiu
- Molecular & Cellular Biology Program, University of Washington, Seattle, WA, USA; Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Arman Rahimzamani
- Department of Electrical Engineering, University of Washington, Seattle, WA, USA
- Li Wang
- Department of Mathematics, University of Texas at Arlington, Arlington, TX, USA
- Bingcheng Ren
- College of Information Science and Engineering, Hunan Normal University, Changsha, China
- Qi Mao
- HERE company, Chicago, IL 60606, USA
- Timothy Durham
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Lauren Saunders
- Molecular & Cellular Biology Program, University of Washington, Seattle, WA, USA; Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Cole Trapnell
- Molecular & Cellular Biology Program, University of Washington, Seattle, WA, USA; Department of Genome Sciences, University of Washington, Seattle, WA, USA; Brotman-Baty Institute for Precision Medicine, Seattle, WA, USA
- Sreeram Kannan
- Department of Electrical Engineering, University of Washington, Seattle, WA, USA
12
Wang S, Karikomi M, MacLean AL, Nie Q. Cell lineage and communication network inference via optimization for single-cell transcriptomics. Nucleic Acids Res 2019; 47:e66. [PMID: 30923815 PMCID: PMC6582411 DOI: 10.1093/nar/gkz204] [Citation(s) in RCA: 87] [Impact Index Per Article: 17.4] [Received: 09/21/2018] [Revised: 03/04/2019] [Accepted: 03/27/2019] [Indexed: 12/20/2022] Open
Abstract
The use of single-cell transcriptomics has become a major approach to delineate cell subpopulations and the transitions between them. While various computational tools using different mathematical methods have been developed to infer clusters, marker genes, and cell lineage, none yet integrate these within a mathematical framework to perform multiple tasks coherently. Such coherence is critical for the inference of cell–cell communication, a major remaining challenge. Here, we present similarity matrix-based optimization for single-cell data analysis (SoptSC), in which unsupervised clustering, pseudotemporal ordering, lineage inference, and marker gene identification are inferred via a structured cell-to-cell similarity matrix. SoptSC then predicts cell–cell communication networks, enabling reconstruction of complex cell lineages that include feedback or feedforward interactions. Application of SoptSC to early embryonic development, epidermal regeneration, and hematopoiesis demonstrates robust identification of subpopulations, lineage relationships, and pseudotime, and prediction of pathway-specific cell communication patterns regulating processes of development and differentiation.
Affiliation(s)
- Shuxiong Wang
- Department of Mathematics, University of California, Irvine, CA 92697, USA
- Matthew Karikomi
- Department of Mathematics, University of California, Irvine, CA 92697, USA
- Adam L MacLean
- Department of Mathematics, University of California, Irvine, CA 92697, USA
- Department of Biological Sciences, University of Southern California, Irvine, CA 90089, USA
- Qing Nie
- Department of Mathematics, University of California, Irvine, CA 92697, USA
- Department of Developmental and Cell Biology, University of California, Irvine, CA 92697, USA
13
Blencowe M, Arneson D, Ding J, Chen YW, Saleem Z, Yang X. Network modeling of single-cell omics data: challenges, opportunities, and progresses. Emerg Top Life Sci 2019; 3:379-398. [PMID: 32270049 PMCID: PMC7141415 DOI: 10.1042/etls20180176] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 04/16/2019] [Revised: 06/07/2019] [Accepted: 06/24/2019] [Indexed: 01/07/2023]
Abstract
Single-cell multi-omics technologies are rapidly evolving, prompting both methodological advances and biological discoveries at an unprecedented speed. Gene regulatory network modeling has been used as a powerful approach to elucidate the complex molecular interactions underlying biological processes and systems, yet its application in single-cell omics data modeling has been met with unique challenges and opportunities. In this review, we discuss these challenges and opportunities, and offer an overview of recent developments in network modeling approaches designed to capture dynamic networks, within-cell networks, and cell-cell interaction or communication networks. Finally, we outline the remaining gaps in single-cell gene network modeling and the outlook for the field moving forward.
Affiliation(s)
- Montgomery Blencowe
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
- Douglas Arneson
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
- Jessica Ding
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
- Yen-Wei Chen
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
- Molecular Toxicology Program, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
- Zara Saleem
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
- Xia Yang
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
- Molecular Toxicology Program, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
- Institute for Quantitative and Computational Biosciences, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
14
Bonnaffoux A, Herbach U, Richard A, Guillemin A, Gonin-Giraud S, Gros PA, Gandrillon O. WASABI: a dynamic iterative framework for gene regulatory network inference. BMC Bioinformatics 2019; 20:220. [PMID: 31046682 PMCID: PMC6498543 DOI: 10.1186/s12859-019-2798-1] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 03/15/2019] [Accepted: 04/09/2019] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND: Inference of gene regulatory networks from gene expression data has been a long-standing and notoriously difficult task in systems biology. Recently, single-cell transcriptomic data have been massively used for gene regulatory network inference, with both successes and limitations. RESULTS: In the present work we propose an iterative algorithm called WASABI, dedicated to inferring a causal dynamical network from time-stamped single-cell data, which tackles some of the limitations associated with current approaches. We first introduce the concept of waves, which posits that the information provided by an external stimulus will affect genes one-by-one through a cascade, like waves spreading through a network. This concept allows us to infer the network one gene at a time, after genes have been ordered regarding their time of regulation. We then demonstrate the ability of WASABI to correctly infer small networks, which have been simulated in silico using a mechanistic model consisting of coupled piecewise-deterministic Markov processes for the proper description of gene expression at the single-cell level. We finally apply WASABI on in vitro generated data on an avian model of erythroid differentiation. The structure of the resulting gene regulatory network sheds new light on the molecular mechanisms controlling this process. In particular, we find no evidence for hub genes and a much more distributed network structure than expected. Interestingly, we find that a majority of genes are under the direct control of the differentiation-inducing stimulus. CONCLUSIONS: Together, these results demonstrate WASABI's versatility and ability to tackle some general gene regulatory network inference issues. It is our hope that WASABI will prove useful in helping biologists to fully exploit the power of time-stamped single-cell data.
Affiliation(s)
- Arnaud Bonnaffoux
- University Lyon, ENS de Lyon, University Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, Lyon, France
- Inria Team Dracula, Inria Center Grenoble Rhône-Alpes, Lyon, France
- Cosmotech, Lyon, France
- Ulysse Herbach
- University Lyon, ENS de Lyon, University Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, Lyon, France
- Inria Team Dracula, Inria Center Grenoble Rhône-Alpes, Lyon, France
- Univ Lyon, Université Claude Bernard Lyon 1, CNRS UMR 5208, Institut Camille Jordan, Villeurbanne, France
- Angélique Richard
- University Lyon, ENS de Lyon, University Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, Lyon, France
- Anissa Guillemin
- University Lyon, ENS de Lyon, University Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, Lyon, France
- Sandrine Gonin-Giraud
- University Lyon, ENS de Lyon, University Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, Lyon, France
- Olivier Gandrillon
- University Lyon, ENS de Lyon, University Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, Lyon, France
- Inria Team Dracula, Inria Center Grenoble Rhône-Alpes, Lyon, France
15
Fernandez-Valverde SL, Aguilera F, Ramos-Díaz RA. Inference of Developmental Gene Regulatory Networks Beyond Classical Model Systems: New Approaches in the Post-genomic Era. Integr Comp Biol 2019; 58:640-653. [PMID: 29917089 DOI: 10.1093/icb/icy061] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/19/2022] Open
Abstract
The advent of high-throughput sequencing (HTS) technologies has revolutionized the way we understand the transformation of genetic information into morphological traits. Elucidating the network of interactions between genes that govern cell differentiation through development is one of the core challenges in genome research. These networks are known as developmental gene regulatory networks (dGRNs) and consist largely of the functional linkage between developmental control genes, cis-regulatory modules, and differentiation genes, which generate spatially and temporally refined patterns of gene expression. Over the last 20 years, great advances have been made in determining these gene interactions mainly in classical model systems, including human, mouse, sea urchin, fruit fly, and worm. This has brought about a radical transformation in the fields of developmental biology and evolutionary biology, allowing the generation of high-resolution gene regulatory maps to analyze cell differentiation during animal development. Such maps have enabled the identification of gene regulatory circuits and have led to the development of network inference methods that can recapitulate the differentiation of specific cell-types or developmental stages. In contrast, dGRN research in non-classical model systems has been limited to the identification of developmental control genes via the candidate gene approach and the characterization of their spatiotemporal expression patterns, as well as to the discovery of cis-regulatory modules via patterns of sequence conservation and/or predicted transcription-factor binding sites. However, thanks to the continuous advances in HTS technologies, this scenario is rapidly changing. Here, we give a historical overview on the architecture and elucidation of the dGRNs. Subsequently, we summarize the approaches available to unravel these regulatory networks, highlighting the vast range of possibilities of integrating multiple technical advances and theoretical approaches to expand our understanding on the global gene regulation during animal development in non-classical model systems. Such new knowledge will not only lead to greater insights into the evolution of molecular mechanisms underlying cell identity and animal body plans, but also into the evolution of morphological key innovations in animals.
Affiliation(s)
- Selene L Fernandez-Valverde
  - CONACYT, Unidad de Genómica Avanzada, Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Centro de Investigación y de Estudios Avanzados del IPN, Irapuato, Guanajuato, Mexico
- Felipe Aguilera
  - Departamento de Bioquímica y Biología Molecular, Facultad de Ciencias Biológicas, Universidad de Concepción, Chile
- René Alexander Ramos-Díaz
  - CONACYT, Unidad de Genómica Avanzada, Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Centro de Investigación y de Estudios Avanzados del IPN, Irapuato, Guanajuato, Mexico
|
16
|
Chan TE, Stumpf MPH, Babtie AC. Gene Regulatory Networks from Single Cell Data for Exploring Cell Fate Decisions. Methods Mol Biol 2019; 1975:211-238. [PMID: 31062312 DOI: 10.1007/978-1-4939-9224-9_10] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 06/09/2023]
Abstract
Single cell experimental techniques now allow us to quantify gene expression in up to thousands of individual cells. These data reveal the changes in transcriptional state that occur as cells progress through development and adopt specialized cell fates. In this chapter we describe in detail how to use our network inference algorithm (PIDC), and the associated software package NetworkInference.jl, to infer functional interactions between genes from the observed gene expression patterns. We exploit the large sample sizes and inherent variability of single cell data to detect statistical dependencies between genes that indicate putative (co-)regulatory relationships, using multivariate information measures that can capture complex statistical relationships. We provide guidelines on how best to combine this analysis with other complementary methods designed to explore single cell data, and how to interpret the resulting gene regulatory network models to gain insight into the processes regulating cell differentiation.
Affiliation(s)
- Thalia E Chan
  - Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London, UK
- Michael P H Stumpf
  - Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London, UK
- Ann C Babtie
  - Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London, UK
|
17
|
Brackston RD, Lakatos E, Stumpf MPH. Transition state characteristics during cell differentiation. PLoS Comput Biol 2018; 14:e1006405. [PMID: 30235202 PMCID: PMC6168170 DOI: 10.1371/journal.pcbi.1006405] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 04/23/2018] [Revised: 10/02/2018] [Accepted: 07/27/2018] [Indexed: 12/11/2022]
Abstract
Models describing the process of stem-cell differentiation are plentiful, and may offer insights into the underlying mechanisms and experimentally observed behaviour. Waddington's epigenetic landscape has been providing a conceptual framework for differentiation processes since its inception. It also allows, however, for detailed mathematical and quantitative analyses, as the landscape can, at least in principle, be related to mathematical models of dynamical systems. Here we focus on a set of dynamical systems features that are intimately linked to cell differentiation, by considering exemplar dynamical models that capture important aspects of stem cell differentiation dynamics. These models allow us to map the paths that cells take through gene expression space as they move from one fate to another, e.g. from a stem-cell to a more specialized cell type. Our analysis highlights the role of the transition state (TS) that separates distinct cell fates, and how the nature of the TS changes as the underlying landscape changes (change that can be induced by e.g. cellular signaling). We demonstrate that models for stem cell differentiation may be interpreted in terms of either a static or transitory landscape. For the static case the TS represents a particular transcriptional profile that all cells approach during differentiation. Alternatively, the TS may refer to the commonly observed period of heterogeneity as cells undergo stochastic transitions.
Affiliation(s)
- Rowan D. Brackston
  - Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, United Kingdom
- Eszter Lakatos
  - Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, United Kingdom
- Michael P. H. Stumpf
  - Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, United Kingdom
  - School of BioScience and School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
|
18
|
Jin S, MacLean AL, Peng T, Nie Q. scEpath: energy landscape-based inference of transition probabilities and cellular trajectories from single-cell transcriptomic data. Bioinformatics 2018; 34:2077-2086. [PMID: 29415263 PMCID: PMC6658715 DOI: 10.1093/bioinformatics/bty058] [Citation(s) in RCA: 61] [Impact Index Per Article: 10.2] [Received: 10/26/2017] [Revised: 01/11/2018] [Accepted: 02/03/2018] [Indexed: 01/18/2023]
Abstract
Motivation: Single-cell RNA-sequencing (scRNA-seq) offers unprecedented resolution for studying cellular decision-making processes. Robust inference of cell state transition paths and probabilities is an important yet challenging step in the analysis of these data. Results: Here we present scEpath, an algorithm that calculates energy landscapes and probabilistic directed graphs in order to reconstruct developmental trajectories. We quantify the energy landscape using 'single-cell energy' and distance-based measures, and find that the combination of these enables robust inference of the transition probabilities and lineage relationships between cell states. We also identify marker genes and gene expression patterns associated with cell state transitions. Our approach produces pseudotemporal orderings that are, in combination, more robust and accurate than current methods, and offers higher resolution dynamics of the cell state transitions, leading to new insight into key transition events during differentiation and development. Moreover, scEpath is robust to variation in the size of the input gene set, and is broadly unsupervised, requiring few parameters to be set by the user. Applications of scEpath led to the identification of a cell-cell communication network implicated in early human embryo development, and novel transcription factors important for myoblast differentiation. scEpath allows us to identify common and specific temporal dynamics and transcriptional factor programs along branched lineages, as well as the transition probabilities that control cell fates. Availability and implementation: A MATLAB package of scEpath is available at https://github.com/sqjin/scEpath. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Suoqin Jin
  - Department of Mathematics and Center for Complex Biological Systems
- Adam L MacLean
  - Department of Mathematics and Center for Complex Biological Systems
- Tao Peng
  - Department of Mathematics and Center for Complex Biological Systems
- Qing Nie
  - Department of Mathematics and Center for Complex Biological Systems
  - Department of Development and Cell Biology, University of California, Irvine, CA, USA
|