1
Gorin G, Carilli M, Chari T, Pachter L. Spectral neural approximations for models of transcriptional dynamics. Biophys J 2024; 123:2892-2901. [PMID: 38715358 DOI: 10.1016/j.bpj.2024.04.034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 03/22/2024] [Accepted: 04/30/2024] [Indexed: 05/18/2024]
Abstract
The advent of high-throughput transcriptomics provides an opportunity to advance mechanistic understanding of transcriptional processes and their connections to cellular function at an unprecedented, genome-wide scale. These transcriptional systems, which involve discrete stochastic events, are naturally modeled using chemical master equations (CMEs), which can be solved for probability distributions to fit biophysical rates that govern system dynamics. While CME models have been used as standards in fluorescence transcriptomics for decades to analyze single-species RNA distributions, there are often no closed-form solutions to CMEs that model multiple species, such as nascent and mature RNA transcript counts. This has prevented the application of standard likelihood-based statistical methods for analyzing high-throughput, multi-species transcriptomic datasets using biophysical models. Inspired by recent work in machine learning to learn solutions to complex dynamical systems, we leverage neural networks and statistical understanding of system distributions to produce accurate approximations to a steady-state bivariate distribution for a model of the RNA life cycle that includes nascent and mature molecules. The steady-state distribution of this simple model has no closed-form solution and requires intensive numerical solving techniques: our approach reduces likelihood evaluation time by several orders of magnitude. We demonstrate two approaches, whereby solutions are approximated by 1) learning the weights of kernel distributions with constrained parameters or 2) learning both weights and scaling factors for parameters of kernel distributions.
We show that our strategies, denoted by kernel weight regression and parameter-scaled kernel weight regression, respectively, enable broad exploration of parameter space and can be used in existing likelihood frameworks to infer transcriptional burst sizes, RNA splicing rates, and mRNA degradation rates from experimental transcriptomic data.
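The abstract notes that the steady-state bivariate (nascent, mature) distribution has no closed form and must be solved numerically. As a rough illustration of what such a solve involves, the minimal Python sketch below truncates the CME state space and solves for the stationary vector. It assumes the simpler constitutive (non-bursty) model, 0 → N at rate k, N → M at rate βn, M → 0 at rate γm, chosen because its stationary law is known to be a product of Poissons with means k/β and k/γ and so provides a check; the bursty model in the paper has no such shortcut. The helper name and truncation sizes are hypothetical.

```python
import numpy as np
from scipy.stats import poisson

def stationary_nascent_mature(k, beta, gamma, n_max=30, m_max=30):
    """Stationary pmf P[n, m] of a truncated CME (hypothetical helper).

    Assumed constitutive model:
      0 -> N at rate k, N -> M at rate beta*n, M -> 0 at rate gamma*m.
    Transitions that would leave the n_max x m_max box are dropped.
    """
    size = n_max * m_max

    def idx(n, m):
        return n * m_max + m                  # flatten (n, m) to one index

    Q = np.zeros((size, size))                # generator: Q[i, j] = rate i -> j
    for n in range(n_max):
        for m in range(m_max):
            i = idx(n, m)
            if n + 1 < n_max:                 # transcription of a nascent RNA
                Q[i, idx(n + 1, m)] += k
            if n > 0 and m + 1 < m_max:       # splicing: nascent -> mature
                Q[i, idx(n - 1, m + 1)] += beta * n
            if m > 0:                         # degradation of a mature RNA
                Q[i, idx(n, m - 1)] += gamma * m
            Q[i, i] = -Q[i].sum()             # rows of a generator sum to zero
    # Solve p Q = 0 subject to sum(p) = 1 by swapping one balance
    # equation for the normalization constraint.
    A = Q.T.copy()
    A[-1, :] = 1.0
    b = np.zeros(size)
    b[-1] = 1.0
    return np.linalg.solve(A, b).reshape(n_max, m_max)

k, beta, gamma = 2.0, 1.0, 0.5
P = stationary_nascent_mature(k, beta, gamma)
n = np.arange(P.shape[0])[:, None]
m = np.arange(P.shape[1])[None, :]
P_exact = poisson.pmf(n, k / beta) * poisson.pmf(m, k / gamma)
print("max abs error:", np.abs(P - P_exact).max())
```

The dense generator has n_max·m_max states, so this direct approach grows expensive as truncations widen, which illustrates the kind of per-likelihood cost that a learned approximation can sidestep.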
Affiliation(s)
- Gennady Gorin
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California
- Maria Carilli
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- Tara Chari
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California; Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California.
2
Dolgov S, Savostyanov D. Tensor product algorithms for inference of contact network from epidemiological data. BMC Bioinformatics 2024; 25:285. [PMID: 39223484 PMCID: PMC11370089 DOI: 10.1186/s12859-024-05910-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Accepted: 08/21/2024] [Indexed: 09/04/2024]
Abstract
We consider the problem of inferring a contact network from nodal states observed during an epidemiological process. In a black-box Bayesian optimisation framework this problem reduces to a discrete likelihood optimisation over the set of possible networks. The cardinality of this set grows combinatorially with the number of network nodes, which makes this optimisation computationally challenging. For each network, its likelihood is the probability for the observed data to appear during the evolution of the epidemiological process on this network. This probability can be very small, particularly if the network is significantly different from the ground truth network, from which the observed data actually appear. A commonly used stochastic simulation algorithm struggles to recover rare events and hence to estimate small probabilities and likelihoods. In this paper we replace the stochastic simulation with solving the chemical master equation for the probabilities of all network states. Since this equation also suffers from the curse of dimensionality, we apply tensor train approximations to overcome it and enable fast and accurate computations. Numerical simulations demonstrate efficient black-box Bayesian inference of the network.
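The tensor train (TT) format mentioned above stores a d-way array as a chain of three-way cores, so storage grows linearly in d when the TT ranks stay small. The following is a minimal NumPy sketch of the standard TT-SVD compression (sequential truncated SVDs, after Oseledets); it only illustrates the format, is not the authors' solver, and the function names `tt_svd` and `tt_full` are hypothetical.

```python
import numpy as np

def tt_svd(tensor, eps=1e-10):
    """Compress a d-way array into TT cores G_k of shape (r_{k-1}, n_k, r_k).

    eps is the target relative accuracy in the Frobenius norm.
    """
    shape = tensor.shape
    d = len(shape)
    # Per-step threshold that keeps the total error <= eps * ||tensor||.
    delta = eps * np.linalg.norm(tensor) / np.sqrt(max(d - 1, 1))
    cores, r_prev = [], 1
    C = tensor.reshape(shape[0], -1)
    for k in range(d - 1):
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        # tail[j] = norm of the singular values discarded at rank j
        tail = np.sqrt(np.cumsum(s[::-1] ** 2))[::-1]
        r = next((j for j in range(1, len(s)) if tail[j] <= delta), len(s))
        cores.append(U[:, :r].reshape(r_prev, shape[k], r))
        C = (s[:r, None] * Vt[:r]).reshape(r * shape[k + 1], -1)
        r_prev = r
    cores.append(C.reshape(r_prev, shape[-1], 1))
    return cores

def tt_full(cores):
    """Contract TT cores back into the full array."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.squeeze(axis=(0, -1))

rng = np.random.default_rng(0)
# Joint pmf of three independent species: every TT rank should be 1.
marginals = [rng.random(n) for n in (4, 5, 6)]
marginals = [q / q.sum() for q in marginals]
P = np.einsum("i,j,k->ijk", *marginals)
cores = tt_svd(P)
print([g.shape for g in cores])   # → [(1, 4, 1), (1, 5, 1), (1, 6, 1)]
```

A product-form distribution needs only 4 + 5 + 6 stored entries here instead of 4·5·6; correlations between nodes raise the ranks, and the format pays off as long as they stay moderate.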
Affiliation(s)
- Sergey Dolgov
- University of Bath, Claverton Down, Bath, BA2 7AY, UK
3
Cao Z, Chen R, Xu L, Zhou X, Fu X, Zhong W, Grima R. Efficient and scalable prediction of stochastic reaction-diffusion processes using graph neural networks. Math Biosci 2024; 375:109248. [PMID: 38986837 DOI: 10.1016/j.mbs.2024.109248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2024] [Revised: 05/07/2024] [Accepted: 07/03/2024] [Indexed: 07/12/2024]
Abstract
The dynamics of locally interacting particles that are distributed in space give rise to a multitude of complex behaviours. However, the simulation of reaction-diffusion processes that model such systems is highly computationally expensive, with the cost increasing rapidly with the size of the space. Here, we devise a graph neural network based approach that uses cheap Monte Carlo simulations of reaction-diffusion processes in a small space to cast predictions of the dynamics of the same processes in a much larger and complex space, including spaces modelled by networks with heterogeneous topology. By applying the method to two biological examples, we show that it leads to accurate results in a small fraction of the computation time of standard stochastic simulation methods. The scalability and accuracy of the method suggest it is a promising approach for studying reaction-diffusion processes in complex spatial domains such as those modelling biochemical reactions, population evolution and epidemic spreading.
Affiliation(s)
- Zhixing Cao
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai 200237, China; Department of Chemical Engineering, Queen's University, Kingston, Canada K7L 3N6.
- Rui Chen
- Shanghai Jiao Tong University School of Medicine, Shanghai 200127, China
- Libin Xu
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai 200237, China
- Xinyi Zhou
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai 200237, China
- Xiaoming Fu
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai 200237, China
- Weimin Zhong
- Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai 200237, China
- Ramon Grima
- School of Biological Sciences, the University of Edinburgh, Max Born Crescent, Edinburgh, EH9 3BF, Scotland, United Kingdom.
4
Fang Z, Gupta A, Kumar S, Khammash M. Advanced methods for gene network identification and noise decomposition from single-cell data. Nat Commun 2024; 15:4911. [PMID: 38851792 PMCID: PMC11162465 DOI: 10.1038/s41467-024-49177-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Accepted: 05/24/2024] [Indexed: 06/10/2024]
Abstract
Central to analyzing noisy gene expression systems is solving the Chemical Master Equation (CME), which characterizes the probability evolution of the reacting species' copy numbers. Solving CMEs for high-dimensional systems suffers from the curse of dimensionality. Here, we propose a computational method for improved scalability through a divide-and-conquer strategy that optimally decomposes the whole system into a leader system and several conditionally independent follower subsystems. The CME is solved by combining Monte Carlo estimation for the leader system with stochastic filtering procedures for the follower subsystems. We demonstrate this method with high-dimensional numerical examples and apply it to identify a yeast transcription system at single-cell resolution, leveraging mRNA time-course experimental data. The identification results enable an accurate examination of the heterogeneity in rate parameters among isogenic cells. To validate this result, we develop a noise decomposition technique exploiting time-course data but requiring no supplementary components, e.g., dual-reporters.
Affiliation(s)
- Zhou Fang
- Department of Biosystems Science and Engineering, ETH Zurich, CH-4056, Basel, Switzerland
- Ankit Gupta
- Department of Biosystems Science and Engineering, ETH Zurich, CH-4056, Basel, Switzerland
- Sant Kumar
- Department of Biosystems Science and Engineering, ETH Zurich, CH-4056, Basel, Switzerland
- Mustafa Khammash
- Department of Biosystems Science and Engineering, ETH Zurich, CH-4056, Basel, Switzerland.
5
Klumpe HE, Lugagne JB, Khalil AS, Dunlop MJ. Deep Neural Networks for Predicting Single-Cell Responses and Probability Landscapes. ACS Synth Biol 2023; 12:2367-2381. [PMID: 37467372 DOI: 10.1021/acssynbio.3c00203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/21/2023]
Abstract
Engineering biology relies on the accurate prediction of cell responses. However, making these predictions is challenging for a variety of reasons, including the stochasticity of biochemical reactions, variability between cells, and incomplete information about underlying biological processes. Machine learning methods, which can model diverse input-output relationships without requiring a priori mechanistic knowledge, are an ideal tool for this task. For example, such approaches can be used to predict gene expression dynamics given time-series data of past expression history. To explore this application, we computationally simulated single-cell responses, incorporating different sources of noise and alternative genetic circuit designs. We showed that deep neural networks trained on these simulated data were able to correctly infer the underlying dynamics of a cell response even in the presence of measurement noise and stochasticity in the biochemical reactions. The training set size and the amount of past data provided as inputs both affected prediction quality, with cascaded genetic circuits that introduce delays requiring more past data. We also tested prediction performance on a bistable auto-activation circuit, finding that our initial method for predicting a single trajectory was fundamentally ill-suited for multimodal dynamics. To address this, we updated the network architecture to predict the entire distribution of future states, showing it could accurately predict bimodal expression distributions. Overall, these methods can be readily applied to the diverse prediction tasks necessary to predict and control a variety of biological circuits, a key aspect of many synthetic biology applications.
Affiliation(s)
- Heidi E Klumpe
- Biomedical Engineering, Boston University, Boston, Massachusetts 02215, United States
- Biological Design Center, Boston University, Boston, Massachusetts 02215, United States
- Jean-Baptiste Lugagne
- Biomedical Engineering, Boston University, Boston, Massachusetts 02215, United States
- Biological Design Center, Boston University, Boston, Massachusetts 02215, United States
- Ahmad S Khalil
- Biomedical Engineering, Boston University, Boston, Massachusetts 02215, United States
- Biological Design Center, Boston University, Boston, Massachusetts 02215, United States
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, Massachusetts 02115, United States
- Mary J Dunlop
- Biomedical Engineering, Boston University, Boston, Massachusetts 02215, United States
- Biological Design Center, Boston University, Boston, Massachusetts 02215, United States
6
Vo HD, Forero-Quintero LS, Aguilera LU, Munsky B. Analysis and design of single-cell experiments to harvest fluctuation information while rejecting measurement noise. Front Cell Dev Biol 2023; 11:1133994. [PMID: 37305680 PMCID: PMC10250612 DOI: 10.3389/fcell.2023.1133994] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/29/2022] [Accepted: 05/10/2023] [Indexed: 06/13/2023]
Abstract
Introduction: Despite continued technological improvements, measurement errors always reduce or distort the information that any real experiment can provide to quantify cellular dynamics. This problem is particularly serious for cell signaling studies to quantify heterogeneity in single-cell gene regulation, where important RNA and protein copy numbers are themselves subject to the inherently random fluctuations of biochemical reactions. Until now, it has not been clear how measurement noise should be managed in addition to other experiment design variables (e.g., sampling size, measurement times, or perturbation levels) to ensure that collected data will provide useful insights on signaling or gene expression mechanisms of interest. Methods: We propose a computational framework that takes explicit consideration of measurement errors to analyze single-cell observations, and we derive Fisher Information Matrix (FIM)-based criteria to quantify the information value of distorted experiments. Results and Discussion: We apply this framework to analyze multiple models in the context of simulated and experimental single-cell data for a reporter gene controlled by an HIV promoter. We show that the proposed approach quantitatively predicts how different types of measurement distortions affect the accuracy and precision of model identification, and we demonstrate that the effects of these distortions can be mitigated through explicit consideration during model inference. We conclude that this reformulation of the FIM could be used effectively to design single-cell experiments to optimally harvest fluctuation information while mitigating the effects of image distortion.
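The effect of measurement distortion on the Fisher information can be illustrated on a toy case. The minimal sketch below is not the authors' framework: it assumes the true mRNA count is Poisson with mean μ (for example, constitutive expression) and that the distortion is binomial detection with efficiency q, a hypothetical distortion model. The observed pmf is obtained by applying a distortion matrix to the true pmf, and the Fisher information about μ is computed numerically; analytically it equals q/μ, so in this toy model distortion scales the information carried by each cell by q.

```python
import numpy as np
from scipy.stats import binom, poisson

def fisher_info_mean(mu, q, n_max=100, h=1e-5):
    """Fisher information about mu from one distorted count (toy model).

    Assumptions: true count x ~ Poisson(mu); observed y ~ Binomial(x, q).
    """
    y = np.arange(n_max)
    x = np.arange(n_max)
    # Distortion matrix D[y, x] = P(observe y | true count x)
    D = binom.pmf(y[:, None], x[None, :], q)

    def observed_pmf(m):
        return D @ poisson.pmf(x, m)

    p = observed_pmf(mu)
    # Central finite difference for dp/dmu
    dp = (observed_pmf(mu + h) - observed_pmf(mu - h)) / (2 * h)
    keep = p > 0
    # FIM = E[(d log p / d mu)^2] = sum over y of (dp/dmu)^2 / p
    return np.sum(dp[keep] ** 2 / p[keep])

mu = 10.0
for q in (0.3, 0.9):
    print(q, fisher_info_mean(mu, q), q / mu)
```

For a multi-parameter model the same recipe gives a matrix, with entries summing products of partial derivatives over the observed pmf; the scalar case above keeps the arithmetic easy to check by hand.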
Affiliation(s)
- Huy D. Vo
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, CO, United States
- Linda S. Forero-Quintero
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, CO, United States
- Luis U. Aguilera
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, CO, United States
- Brian Munsky
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, CO, United States
- School of Biomedical Engineering, Colorado State University, Fort Collins, CO, United States
7
Caringella G, Bandiera L, Menolascina F. Recent advances, opportunities and challenges in cybergenetic identification and control of biomolecular networks. Curr Opin Biotechnol 2023; 80:102893. [PMID: 36706519 DOI: 10.1016/j.copbio.2023.102893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2022] [Revised: 12/13/2022] [Accepted: 12/20/2022] [Indexed: 01/26/2023]
Abstract
Cybergenetics is a new area of research aimed at developing digital and biological controllers for living systems. Synthetic biologists have begun exploiting cybergenetic tools and platforms to both accelerate the development of mathematical models and develop control strategies for complex biological phenomena. Here, we review the state of the art in cybergenetic identification and control. Our aim is to lower the entry barrier to this field and foster the adoption of methods and technologies that will accelerate the pace at which Synthetic Biology progresses toward applications.
Affiliation(s)
- Gianpio Caringella
- School of Engineering, Institute for Bioengineering, The University of Edinburgh, Edinburgh EH9 3DW, UK
- Lucia Bandiera
- School of Engineering, Institute for Bioengineering, The University of Edinburgh, Edinburgh EH9 3DW, UK; Centre for Engineering Biology, The University of Edinburgh, Edinburgh EH9 3BF, UK
- Filippo Menolascina
- School of Engineering, Institute for Bioengineering, The University of Edinburgh, Edinburgh EH9 3DW, UK; Centre for Engineering Biology, The University of Edinburgh, Edinburgh EH9 3BF, UK.
8
Tang Y, Weng J, Zhang P. Neural-network solutions to stochastic reaction networks. Nat Mach Intell 2023. [DOI: 10.1038/s42256-023-00632-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
9
Sukys A, Öcal K, Grima R. Approximating solutions of the Chemical Master equation using neural networks. iScience 2022; 25:105010. [PMID: 36117994 PMCID: PMC9474291 DOI: 10.1016/j.isci.2022.105010] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 05/02/2022] [Revised: 06/13/2022] [Accepted: 08/18/2022] [Indexed: 10/27/2022]
Abstract
The Chemical Master Equation (CME) provides an accurate description of stochastic biochemical reaction networks in well-mixed conditions, but it cannot be solved analytically for most systems of practical interest. Although Monte Carlo methods provide a principled means to probe system dynamics, the large number of simulations typically required can render the estimation of molecule number distributions and other quantities infeasible. In this article, we aim to leverage the representational power of neural networks to approximate the solutions of the CME and propose a framework for the Neural Estimation of Stochastic Simulations for Inference and Exploration (Nessie). Our approach is based on training neural networks to learn the distributions predicted by the CME from relatively few stochastic simulations. We show on biologically relevant examples that simple neural networks with one hidden layer can capture highly complex distributions across parameter space, thereby accelerating computationally intensive tasks such as parameter exploration and inference.
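To make the idea of a network that outputs a whole distribution concrete, here is a minimal NumPy/SciPy sketch of a forward pass: a one-hidden-layer network maps rate parameters θ to a mixture of negative binomial components whose weights, means and dispersions come from the output layer. The mixture-of-negative-binomials output and every name below are assumptions for illustration (the abstract does not specify the output distribution). The weights are random rather than trained; in the approach described above, training would fit them to distributions estimated from stochastic simulations.

```python
import numpy as np
from scipy.stats import nbinom

def softplus(z):
    return np.logaddexp(0.0, z)           # log(1 + e^z), numerically stable

def init_params(n_in, n_hidden, n_comp, rng, scale=0.3):
    """Random (untrained) weights for a hypothetical one-hidden-layer network."""
    return {
        "W1": scale * rng.standard_normal((n_hidden, n_in)),
        "b1": np.zeros(n_hidden),
        "W2": scale * rng.standard_normal((3 * n_comp, n_hidden)),
        "b2": np.zeros(3 * n_comp),
    }

def mixture_pmf(theta, counts, params):
    """Map rate parameters theta to a negative-binomial mixture pmf at counts."""
    h = np.tanh(params["W1"] @ theta + params["b1"])     # hidden layer
    logits, raw_mu, raw_r = np.split(params["W2"] @ h + params["b2"], 3)
    w = np.exp(logits - logits.max())
    w /= w.sum()                                         # softmax mixture weights
    mu = softplus(raw_mu) + 1e-6                         # positive component means
    r = softplus(raw_r) + 1e-6                           # positive dispersions
    # SciPy's nbinom(n, p) has mean n(1-p)/p, so p = r/(r+mu) gives mean mu.
    p = r / (r + mu)
    comps = nbinom.pmf(counts[:, None], r[None, :], p[None, :])
    return comps @ w

rng = np.random.default_rng(1)
params = init_params(n_in=3, n_hidden=16, n_comp=4, rng=rng)
counts = np.arange(5000)
pmf = mixture_pmf(np.array([1.0, 0.5, 2.0]), counts, params)
print(pmf.sum(), (counts * pmf).sum())
```

The softmax and softplus output transforms guarantee valid weights and positive parameters for any network output, so every forward pass yields a proper distribution over counts (up to truncation of the count range).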
Affiliation(s)
- Augustinas Sukys
- School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JH, UK
- The Alan Turing Institute, London NW1 2DB, UK
- Kaan Öcal
- School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JH, UK
- School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, UK
- Ramon Grima
- School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JH, UK