1
Basallo O, Perez L, Lucido A, Sorribas A, Marin-Saguino A, Vilaprinyo E, Perez-Fons L, Albacete A, Martínez-Andújar C, Fraser PD, Christou P, Capell T, Alves R. Changing biosynthesis of terpenoid percursors in rice through synthetic biology. FRONTIERS IN PLANT SCIENCE 2023; 14:1133299. [PMID: 37465386 PMCID: PMC10350630 DOI: 10.3389/fpls.2023.1133299] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/28/2022] [Accepted: 05/30/2023] [Indexed: 07/20/2023]
Abstract
Many highly valued chemicals in the pharmaceutical, biotechnological, cosmetic, and biomedical industries belong to the terpenoid family. Biosynthesis of these chemicals relies on polymerization of isopentenyl diphosphate (IPP) and/or dimethylallyl diphosphate (DMAPP) monomers, which plants synthesize using two alternative pathways: a cytosolic mevalonic acid (MVA) pathway and a plastidic methylerythritol-4-phosphate (MEP) pathway. As such, developing plants for use as a platform to use IPP/DMAPP and produce high-value terpenoids is an important biotechnological goal. Still, IPP/DMAPP are the precursors to many plant developmental hormones. This creates severe challenges in redirecting IPP/DMAPP towards production of non-cognate plant metabolites. A potential solution to this problem is increasing the IPP/DMAPP production flux in planta. Here, we aimed at discovering, understanding, and predicting the effects of increasing IPP/DMAPP production in plants through modelling. We used synthetic biology to create rice lines containing an additional ectopic MVA biosynthetic pathway for producing IPP/DMAPP. The rice lines express three alternative versions of the additional MVA pathway in the plastid, in addition to the normal endogenous pathways. We collected data for changes in macroscopic and molecular phenotypes, gene expression, isoprenoid content, and hormone abundance in those lines. To integrate the molecular and macroscopic data and develop a more in-depth understanding of the effects of engineering the exogenous pathway in the mutant rice lines, we developed and analyzed data-centric, line-specific, multilevel mathematical models. These models connect the effects of variations in hormones and gene expression to changes in macroscopic plant phenotype and metabolite concentrations within the MVA and MEP pathways of WT and mutant rice lines. Our models allow us to predict how an exogenous IPP/DMAPP biosynthetic pathway affects the flux of terpenoid precursors. We also quantify the long-term effect of plant hormones on the dynamic behavior of IPP/DMAPP biosynthetic pathways in seeds, and predict plant characteristics, such as plant height, leaf size, and chlorophyll content, from molecular data. In addition, our models are a tool that can be used in the future to help prioritize re-engineering strategies for the exogenous pathway in order to achieve specific metabolic goals.
Affiliation(s)
- Orio Basallo
  - Systems Biology Group, Department Ciències Mèdiques Bàsiques, Faculty of Medicine, Universitat de Lleida, Lleida, Spain
  - Institut de Recerca Biomedica de Lleida (IRBLleida), Lleida, Spain
- Lucia Perez
  - Applied Plant Biotechnology Group, Department de Producció Vegetal I Ciència Florestal, Escola Tècnica Superior d'Enginyeria Agroalimentària i Forestal i de Veterinària (ETSEAFiV), Universitat de Lleida, Lleida, Spain
  - Agrotecnio Centres de Recerca de Catalunya (CERCA) Center, Lleida, Spain
- Abel Lucido
  - Systems Biology Group, Department Ciències Mèdiques Bàsiques, Faculty of Medicine, Universitat de Lleida, Lleida, Spain
  - Institut de Recerca Biomedica de Lleida (IRBLleida), Lleida, Spain
- Albert Sorribas
  - Systems Biology Group, Department Ciències Mèdiques Bàsiques, Faculty of Medicine, Universitat de Lleida, Lleida, Spain
  - Institut de Recerca Biomedica de Lleida (IRBLleida), Lleida, Spain
- Alberto Marin-Saguino
  - Systems Biology Group, Department Ciències Mèdiques Bàsiques, Faculty of Medicine, Universitat de Lleida, Lleida, Spain
  - Institut de Recerca Biomedica de Lleida (IRBLleida), Lleida, Spain
- Ester Vilaprinyo
  - Systems Biology Group, Department Ciències Mèdiques Bàsiques, Faculty of Medicine, Universitat de Lleida, Lleida, Spain
  - Institut de Recerca Biomedica de Lleida (IRBLleida), Lleida, Spain
- Laura Perez-Fons
  - School of Biological Sciences, Royal Holloway University of London, Egham Hill, United Kingdom
- Alfonso Albacete
  - Department of Plant Nutrition, Center of Edaphology and Applied Biology of the Segura (CEBAS), Consejo Superior de Investigaciones Científicas (CSIC), Universidad de Murcia, Murcia, Spain
  - Department of Plant Production and Agrotechnology, Institute for Agri-Food Research and Development of Murcia, Murcia, Spain
- Cristina Martínez-Andújar
  - Department of Plant Nutrition, Center of Edaphology and Applied Biology of the Segura (CEBAS), Consejo Superior de Investigaciones Científicas (CSIC), Universidad de Murcia, Murcia, Spain
- Paul D. Fraser
  - School of Biological Sciences, Royal Holloway University of London, Egham Hill, United Kingdom
- Paul Christou
  - Applied Plant Biotechnology Group, Department de Producció Vegetal I Ciència Florestal, Escola Tècnica Superior d'Enginyeria Agroalimentària i Forestal i de Veterinària (ETSEAFiV), Universitat de Lleida, Lleida, Spain
  - Agrotecnio Centres de Recerca de Catalunya (CERCA) Center, Lleida, Spain
  - ICREA, Catalan Institute for Research and Advanced Studies, Barcelona, Spain
- Teresa Capell
  - Applied Plant Biotechnology Group, Department de Producció Vegetal I Ciència Florestal, Escola Tècnica Superior d'Enginyeria Agroalimentària i Forestal i de Veterinària (ETSEAFiV), Universitat de Lleida, Lleida, Spain
  - Agrotecnio Centres de Recerca de Catalunya (CERCA) Center, Lleida, Spain
- Rui Alves
  - Systems Biology Group, Department Ciències Mèdiques Bàsiques, Faculty of Medicine, Universitat de Lleida, Lleida, Spain
  - Institut de Recerca Biomedica de Lleida (IRBLleida), Lleida, Spain
2
Goyal P, Benner P. Neural ordinary differential equations with irregular and noisy data. ROYAL SOCIETY OPEN SCIENCE 2023; 10:221475. [PMID: 37476515 PMCID: PMC10354476 DOI: 10.1098/rsos.221475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2022] [Accepted: 06/23/2023] [Indexed: 07/22/2023]
Abstract
Measurement noise is an integral part of collecting data of a physical process. Thus, noise removal is necessary to draw conclusions from these data, and it often becomes essential to construct dynamical models using these data. We discuss a methodology to learn differential equation(s) using noisy and irregularly sampled measurements. In our methodology, the main innovation can be seen in the integration of deep neural networks with the neural ordinary differential equations (ODEs) approach. Precisely, we aim at learning a neural network that provides (approximately) an implicit representation of the data and an additional neural network that models the vector fields of the dependent variables. We combine these two networks by constraints using neural ODEs. The proposed framework to learn a model describing the vector field is highly effective under noisy measurements. The approach can handle scenarios where dependent variables are unavailable at the same temporal grid. Moreover, a particular structure, e.g. second order with respect to time, can easily be incorporated. We demonstrate the effectiveness of the proposed method for learning models using data obtained from various differential equations and present a comparison with the neural ODE method that does not make any special treatment to noise. Additionally, we discuss an ensemble approach to improve the performance of the proposed approach further.
Affiliation(s)
- Pawan Goyal
  - Max Planck Institute for Dynamics of Complex Technical Systems, Standtorstrasse 1, 39106 Magdeburg, Germany
- Peter Benner
  - Max Planck Institute for Dynamics of Complex Technical Systems, Standtorstrasse 1, 39106 Magdeburg, Germany
3
Makrygiorgos G, Berliner AJ, Shi F, Clark DS, Arkin AP, Mesbah A. Data-driven flow-map models for data-efficient discovery of dynamics and fast uncertainty quantification of biological and biochemical systems. Biotechnol Bioeng 2023; 120:803-818. [PMID: 36453664 DOI: 10.1002/bit.28295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2022] [Revised: 07/27/2022] [Accepted: 10/09/2022] [Indexed: 12/05/2022]
Abstract
Computational models are increasingly used to investigate and predict the complex dynamics of biological and biochemical systems. Nevertheless, governing equations of a biochemical system may not be (fully) known, which would necessitate learning the system dynamics directly from, often limited and noisy, observed data. On the other hand, when expensive models are available, systematic and efficient quantification of the effects of model uncertainties on quantities of interest can be an arduous task. This paper leverages the notion of flow-map (de)compositions to present a framework that can address both of these challenges via learning data-driven models useful for capturing the dynamical behavior of biochemical systems. Data-driven flow-map models seek to directly learn the integration operators of the governing differential equations in a black-box manner, irrespective of structure of the underlying equations. As such, they can serve as a flexible approach for deriving fast-to-evaluate surrogates for expensive computational models of system dynamics, or, alternatively, for reconstructing the long-term system dynamics via experimental observations. We present a data-efficient approach to data-driven flow-map modeling based on polynomial chaos Kriging. The approach is demonstrated for discovery of the dynamics of various benchmark systems and a coculture bioreactor subject to external forcing, as well as for uncertainty quantification of a microbial electrosynthesis reactor. Such data-driven models and analyses of dynamical systems can be paramount in the design and optimization of bioprocesses and integrated biomanufacturing systems.
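The idea of learning the integration operator directly, rather than the right-hand side of the differential equation, can be sketched on a toy problem. The example below fits a map from x(t) to x(t + Δt) for logistic growth and then iterates it to predict a long trajectory. It substitutes an ordinary least-squares polynomial for the polynomial chaos Kriging surrogate named in the abstract, and the system, step size, polynomial degree, and all function names are assumptions made for illustration.

```python
import numpy as np

DT = 0.2  # time step covered by one application of the flow map (assumed)

def exact_flow(x, dt=DT):
    """Exact solution of dx/dt = x(1 - x) advanced by dt; stands in for observed data."""
    growth = np.exp(dt)
    return x * growth / (1.0 + x * (growth - 1.0))

# Snapshot pairs (x_k, x_{k+1}) sampled over the state range of interest.
x_now = np.linspace(0.0, 1.2, 200)
x_next = exact_flow(x_now)

# Surrogate flow map: degree-5 polynomial fitted by least squares
# (a simple stand-in, not polynomial chaos Kriging).
coeffs = np.polyfit(x_now, x_next, deg=5)

def learned_flow(x):
    """Apply the fitted one-step map."""
    return np.polyval(coeffs, x)

# Long-horizon prediction by repeatedly applying the learned map.
n_steps = 50
traj = np.empty(n_steps + 1)
traj[0] = 0.1
for k in range(n_steps):
    traj[k + 1] = learned_flow(traj[k])

# Exact trajectory from the same initial condition, for comparison.
times = DT * np.arange(n_steps + 1)
reference = exact_flow(traj[0], dt=times)
max_err = np.max(np.abs(traj - reference))
```

Because the surrogate only ever sees state pairs, nothing in the fit depends on the form of the governing equation; that is the black-box property the abstract refers to.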
Affiliation(s)
- Georgios Makrygiorgos
  - Center for the Utilization of Biological Engineering in Space (CUBES), Berkeley, California, USA
  - Department of Chemical and Biomolecular Engineering, University of California, Berkeley, California, USA
- Aaron J Berliner
  - Center for the Utilization of Biological Engineering in Space (CUBES), Berkeley, California, USA
  - Department of Bioengineering, University of California, Berkeley, California, USA
- Fengzhe Shi
  - Center for the Utilization of Biological Engineering in Space (CUBES), Berkeley, California, USA
  - Department of Chemical and Biomolecular Engineering, University of California, Berkeley, California, USA
- Douglas S Clark
  - Center for the Utilization of Biological Engineering in Space (CUBES), Berkeley, California, USA
  - Department of Chemical and Biomolecular Engineering, University of California, Berkeley, California, USA
- Adam P Arkin
  - Center for the Utilization of Biological Engineering in Space (CUBES), Berkeley, California, USA
  - Department of Bioengineering, University of California, Berkeley, California, USA
- Ali Mesbah
  - Center for the Utilization of Biological Engineering in Space (CUBES), Berkeley, California, USA
  - Department of Chemical and Biomolecular Engineering, University of California, Berkeley, California, USA
4
Chen B, Huang K, Raghupathi S, Chandratreya I, Du Q, Lipson H. Automated discovery of fundamental variables hidden in experimental data. NATURE COMPUTATIONAL SCIENCE 2022; 2:433-442. [PMID: 38177869 DOI: 10.1038/s43588-022-00281-6] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 02/24/2022] [Accepted: 06/21/2022] [Indexed: 01/06/2024]
Abstract
All physical laws are described as mathematical relationships between state variables. These variables give a complete and non-redundant description of the relevant system. However, despite the prevalence of computing power and artificial intelligence, the process of identifying the hidden state variables themselves has resisted automation. Most data-driven methods for modelling physical phenomena still rely on the assumption that the relevant state variables are already known. A longstanding question is whether it is possible to identify state variables from only high-dimensional observational data. Here we propose a principle for determining how many state variables an observed system is likely to have, and what these variables might be. We demonstrate the effectiveness of this approach using video recordings of a variety of physical dynamical systems, ranging from elastic double pendulums to fire flames. Without any prior knowledge of the underlying physics, our algorithm discovers the intrinsic dimension of the observed dynamics and identifies candidate sets of state variables.
Affiliation(s)
- Boyuan Chen
  - Department of Computer Science, Columbia University, New York, USA
- Kuang Huang
  - Department of Applied Physics and Applied Mathematics, Columbia University, New York, USA
- Sunand Raghupathi
  - Department of Applied Physics and Applied Mathematics, Columbia University, New York, USA
- Qiang Du
  - Department of Applied Physics and Applied Mathematics, Columbia University, New York, USA
  - Data Science Institute, Columbia University, New York, USA
- Hod Lipson
  - Data Science Institute, Columbia University, New York, USA
  - Department of Mechanical Engineering, Columbia University, New York, USA
5
Goyal P, Benner P. Discovery of nonlinear dynamical systems using a Runge-Kutta inspired dictionary-based sparse regression approach. Proc Math Phys Eng Sci 2022; 478:20210883. [PMID: 35756880 PMCID: PMC9215218 DOI: 10.1098/rspa.2021.0883] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/17/2021] [Accepted: 05/16/2022] [Indexed: 11/12/2022] Open
Abstract
In this work, we blend machine learning and dictionary-based learning with numerical analysis tools to discover differential equations from noisy and sparsely sampled measurement data of time-dependent processes. We use the fact that, given a dictionary containing a large number of candidate nonlinear functions, dynamical models can often be described by a few appropriately chosen basis functions. As a result, we obtain parsimonious models that can be better interpreted by practitioners, and potentially generalize better beyond the sampling regime than black-box modelling. In this work, we integrate a numerical integration framework with dictionary learning that yields differential equations without requiring or approximating derivative information at any stage. Hence, it is highly effective for corrupted and sparsely sampled data. We discuss its extension to governing equations containing rational nonlinearities that typically appear in biological networks. Moreover, we generalize the method to governing equations subject to parameter variations and externally controlled inputs. We demonstrate the efficiency of the method to discover a number of diverse differential equations using noisy measurements, including a model describing neural dynamics, the chaotic Lorenz model, Michaelis-Menten kinetics and a parameterized Hopf normal form.
Affiliation(s)
- Pawan Goyal
  - Max Planck Institute for Dynamics of Complex Technical Systems, Standtorstraße 1, 39106 Magdeburg, Germany
- Peter Benner
  - Max Planck Institute for Dynamics of Complex Technical Systems, Standtorstraße 1, 39106 Magdeburg, Germany
6
Fasel U, Kutz JN, Brunton BW, Brunton SL. Ensemble-SINDy: Robust sparse model discovery in the low-data, high-noise limit, with active learning and control. Proc Math Phys Eng Sci 2022; 478:20210904. [PMID: 35450025 PMCID: PMC9006119 DOI: 10.1098/rspa.2021.0904] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 11/24/2021] [Accepted: 03/10/2022] [Indexed: 12/17/2022] Open
Abstract
Sparse model identification enables the discovery of nonlinear dynamical systems purely from data; however, this approach is sensitive to noise, especially in the low-data limit. In this work, we leverage the statistical approach of bootstrap aggregating (bagging) to robustify the sparse identification of nonlinear dynamics (SINDy) algorithm. First, an ensemble of SINDy models is identified from subsets of limited and noisy data. The aggregate model statistics are then used to produce inclusion probabilities of the candidate functions, which enables uncertainty quantification and probabilistic forecasts. We apply this ensemble-SINDy (E-SINDy) algorithm to several synthetic and real-world datasets and demonstrate substantial improvements to the accuracy and robustness of model discovery from extremely noisy and limited data. For example, E-SINDy uncovers partial differential equation models from data with more than twice as much measurement noise as has been previously reported. Similarly, E-SINDy learns the Lotka-Volterra dynamics from remarkably limited data of yearly lynx and hare pelts collected from 1900 to 1920. E-SINDy is computationally efficient, with similar scaling as standard SINDy. Finally, we show that ensemble statistics from E-SINDy can be exploited for active learning and improved model predictive control.
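The bagging step summarized in this abstract can be illustrated with a minimal NumPy sketch. It assumes a one-dimensional toy system dx/dt = -2x with directly measured noisy derivatives and a three-term candidate library [1, x, x^2]. The function names (`stlsq`, `ensemble_sindy`), the threshold, the number of bootstrap models, and the median aggregation are illustrative choices, not the authors' implementation.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares, the sparse regression behind SINDy."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0                      # prune small coefficients
        big = ~small
        if not big.any():
            break
        # Refit using only the library terms that survived pruning.
        xi[big] = np.linalg.lstsq(theta[:, big], dxdt, rcond=None)[0]
    return xi

def ensemble_sindy(theta, dxdt, n_models=100, threshold=0.1, seed=0):
    """Fit one sparse model per bootstrap resample and aggregate the results."""
    rng = np.random.default_rng(seed)
    n_samples, n_terms = theta.shape
    coefs = np.empty((n_models, n_terms))
    for m in range(n_models):
        idx = rng.integers(0, n_samples, size=n_samples)  # rows drawn with replacement
        coefs[m] = stlsq(theta[idx], dxdt[idx], threshold)
    inclusion = (coefs != 0).mean(axis=0)    # fraction of models that keep each term
    return inclusion, np.median(coefs, axis=0)

# Toy data (assumed): dx/dt = -2x plus Gaussian measurement noise.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=200)
dxdt = -2.0 * x + 0.05 * rng.standard_normal(200)
theta = np.column_stack([np.ones_like(x), x, x**2])  # library: 1, x, x^2

inclusion, coef = ensemble_sindy(theta, dxdt)
```

Terms whose inclusion probability stays near 1 across resamples are the ones an ensemble approach would retain; the spread of the per-model coefficients could serve as a rough uncertainty estimate.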
Affiliation(s)
- U. Fasel
  - Department of Mechanical Engineering, University of Washington, Seattle, WA, USA
- J. N. Kutz
  - Department of Applied Mathematics, University of Washington, Seattle, WA, USA
- B. W. Brunton
  - Department of Biology, University of Washington, Seattle, WA, USA
- S. L. Brunton
  - Department of Mechanical Engineering, University of Washington, Seattle, WA, USA
7
NMR in Metabolomics: From Conventional Statistics to Machine Learning and Neural Network Approaches. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12062824] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 12/17/2022]
Abstract
NMR measurements combined with chemometrics provide a great amount of information for the identification of potential biomarkers responsible for a precise metabolic pathway. These kinds of data are useful in different fields, ranging from food to biomedical fields, including health science. The investigation of the whole set of metabolites in a sample, representing its fingerprint in the considered condition, is known as metabolomics and may take advantage of different statistical tools. The new frontier is to adopt self-learning techniques to enhance clustering or classification actions that can improve the predictive power over large amounts of data. Although machine learning is already employed in metabolomics, deep learning and artificial neural network approaches have only recently been applied successfully. In this work, we give an overview of the statistical approaches underlying the wide range of opportunities that machine learning and neural networks offer for accurate metabolite assignment and quantification. Various current challenges are discussed, such as proper metabolomics, deep learning architectures and model accuracy.
8
La Cava W, Burlacu B, Virgolin M, Kommenda M, Orzechowski P, de França FO, Jin Y, Moore JH. Contemporary Symbolic Regression Methods and their Relative Performance. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 2021; 2021:1-16. [PMID: 38715933 PMCID: PMC11074949] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/12/2024]
Abstract
Many promising approaches to symbolic regression have been presented in recent years, yet progress in the field continues to suffer from a lack of uniform, robust, and transparent benchmarking standards. We address this shortcoming by introducing an open-source, reproducible benchmarking platform for symbolic regression. We assess 14 symbolic regression methods and 7 machine learning methods on a set of 252 diverse regression problems. Our assessment includes both real-world datasets with no known model form as well as ground-truth benchmark problems. For the real-world datasets, we benchmark the ability of each method to learn models with low error and low complexity relative to state-of-the-art machine learning methods. For the synthetic problems, we assess each method's ability to find exact solutions in the presence of varying levels of noise. Under these controlled experiments, we conclude that the best performing methods for real-world regression combine genetic algorithms with parameter estimation and/or semantic search drivers. When tasked with recovering exact equations in the presence of noise, we find that several approaches perform similarly. We provide a detailed guide to reproducing this experiment and contributing new methods, and encourage other researchers to collaborate with us on a common and living symbolic regression benchmark.
Affiliation(s)
- Bogdan Burlacu
  - Josef Ressel Center for Symbolic Regression, University of Applied Sciences Upper Austria
- Marco Virgolin
  - Life Sciences and Health Group, Centrum Wiskunde & Informatica
- Michael Kommenda
  - Josef Ressel Center for Symbolic Regression, University of Applied Sciences Upper Austria
- Ying Jin
  - Department of Statistics, Stanford University
- Jason H Moore
  - Institute for Biomedical Informatics, University of Pennsylvania
9
Regazzoni F, Chapelle D, Moireau P. Combining data assimilation and machine learning to build data-driven models for unknown long time dynamics-Applications in cardiovascular modeling. INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN BIOMEDICAL ENGINEERING 2021; 37:e3471. [PMID: 33913623 PMCID: PMC8365699 DOI: 10.1002/cnm.3471] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/07/2021] [Revised: 03/12/2021] [Accepted: 04/23/2021] [Indexed: 06/12/2023]
Abstract
We propose a method to discover differential equations describing the long-term dynamics of phenomena featuring a multiscale behavior in time, starting from measurements taken at the fast scale. Our methodology is based on a synergetic combination of data assimilation (DA), used to estimate the parameters associated with the known fast-scale dynamics, and machine learning (ML), used to infer the laws underlying the slow-scale dynamics. Specifically, by exploiting the scale separation between the fast and the slow dynamics, we propose a decoupling of time scales that drastically lowers the computational burden. Then, we propose an ML algorithm that learns a parametric mathematical model from a collection of time series coming from the phenomenon to be modeled. Moreover, we study the interpretability of the data-driven models obtained within the black-box learning framework proposed in this paper. In particular, we show that every model can be rewritten in infinitely many different equivalent ways, which makes the problem of learning a parametric differential equation from time series intrinsically ill-posed. Hence, we propose a strategy that selects a unique representative model in each equivalence class, thus enhancing the interpretability of the results. We demonstrate the effectiveness and noise-robustness of the proposed methods through several test cases, in which we reconstruct several differential models starting from time series generated through the models themselves. Finally, we show the results obtained for a test case in the cardiovascular modeling context, which sheds light on a promising field of application of the proposed methods.
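The non-uniqueness claim in this abstract can be made concrete with a change of variables. The derivation below is a standard illustration, not necessarily the construction used in the paper; the symbols z (hidden state), y (observed output), f, g, and the invertible smooth map φ are assumed notation.

```latex
\begin{aligned}
\dot{z} &= f(z), & y &= g(z) \\
w = \varphi(z) \;\Rightarrow\;
\dot{w} &= D\varphi\big(\varphi^{-1}(w)\big)\, f\big(\varphi^{-1}(w)\big) =: \tilde{f}(w),
& y &= g\big(\varphi^{-1}(w)\big) =: \tilde{g}(w)
\end{aligned}
```

Both pairs (f, g) and (f̃, g̃) generate exactly the same output y(t) from corresponding initial conditions, so time series of y alone cannot distinguish them. Since φ can be any invertible smooth map, there are infinitely many such equivalent models, which is why a rule for picking one representative per equivalence class is needed.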
Affiliation(s)
- Francesco Regazzoni
  - MOX, Mathematics Department, Politecnico di Milano, Milano, Italy
  - M3DISIM, Institut National de Recherche en Informatique et en Automatique, Palaiseau, France
  - LMS, Ecole Polytechnique, CNRS, Institut Polytechnique de Paris, Palaiseau, France
- Dominique Chapelle
  - M3DISIM, Institut National de Recherche en Informatique et en Automatique, Palaiseau, France
  - LMS, Ecole Polytechnique, CNRS, Institut Polytechnique de Paris, Palaiseau, France
- Philippe Moireau
  - M3DISIM, Institut National de Recherche en Informatique et en Automatique, Palaiseau, France
  - LMS, Ecole Polytechnique, CNRS, Institut Polytechnique de Paris, Palaiseau, France
10
Dasgupta P, Hughes JA, Daley M, Sejdić E. Is Human Walking a Network Medicine Problem? An Analysis Using Symbolic Regression Models with Genetic Programming. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2021; 206:106104. [PMID: 33951562 PMCID: PMC8205964 DOI: 10.1016/j.cmpb.2021.106104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2020] [Accepted: 04/05/2021] [Indexed: 06/12/2023]
Abstract
BACKGROUND AND OBJECTIVE Human walking is typically assessed using a sensor placed on the lower back or the hip. Such analyses often ignore that the arms, legs, and body trunk movements all have significant roles during walking; in other words, these body nodes with accelerometers form a body sensor network (BSN). BSN refers to a network of wearable sensors or devices on the human body that collects physiological signals. Our study proposes that human locomotion could be considered as a network of well-connected nodes. METHODS While hypothesizing that accelerometer data can model this BSN, we collected accelerometer signals from six body areas from ten healthy participants performing a cognitive task. Machine learning based on genetic programming was used to produce a collection of non-linear symbolic models of human locomotion. RESULTS With implications in precision medicine, our primary finding was that our BSN models fit the data from the lower back's accelerometer and describe subject-specific data the best compared to all other models. Across subjects, models were less effective due to the diversity of human sizes. CONCLUSIONS A BSN relationship between all six body nodes has been shown to describe the subject-specific data, which indicates that the network-medicine relationship between these nodes is essential in adequately describing human walking. Our gait analyses can be used for several clinical applications such as medical diagnostics as well as creating a baseline for healthy walking with and without a cognitive load.
Affiliation(s)
- Pritika Dasgupta
  - Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15261, USA
- James Alexander Hughes
  - Department of Computer Science, St. Francis Xavier University, Antigonish, Nova Scotia, B2G 2W5, Canada
- Mark Daley
  - Department of Computer Science, Middlesex College, University of Western Ontario, London, Ontario, N6A 3K7, Canada
- Ervin Sejdić
  - Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15261, USA
  - Department of Electrical and Computer Engineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh, PA, 15261, USA
  - Department of Bioengineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh, PA, 15261, USA
11
Shah HA, Liu J, Yang Z, Feng J. Review of Machine Learning Methods for the Prediction and Reconstruction of Metabolic Pathways. Front Mol Biosci 2021; 8:634141. [PMID: 34222327 PMCID: PMC8247443 DOI: 10.3389/fmolb.2021.634141] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/27/2020] [Accepted: 06/01/2021] [Indexed: 11/13/2022] Open
Abstract
Prediction and reconstruction of metabolic pathways play significant roles in many fields, such as genetic engineering, metabolic engineering, and drug discovery, and are becoming some of the most active research topics in synthetic biology. With the increase of related data and the development of machine learning techniques, many machine-learning-based methods have been proposed for the prediction or reconstruction of metabolic pathways. Machine learning techniques are showing state-of-the-art performance in handling the rapidly increasing volume of data in synthetic biology. To support researchers in this field, we briefly review the research progress of metabolic pathway reconstruction and prediction based on machine learning. Some challenging issues in the reconstruction of metabolic pathways are also discussed in this paper.
Affiliation(s)
- Hayat Ali Shah
  - Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, China
- Juan Liu
  - Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, China
- Zhihui Yang
  - Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, China
- Jing Feng
  - Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, China
12
Udrescu SM, Tegmark M. Symbolic pregression: Discovering physical laws from distorted video. Phys Rev E 2021; 103:043307. [PMID: 34005960 DOI: 10.1103/physreve.103.043307] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/11/2020] [Accepted: 03/31/2021] [Indexed: 11/07/2022]
Abstract
We present a method for unsupervised learning of equations of motion for objects in raw and optionally distorted unlabeled synthetic video (or, more generally, for discovering and modeling predictable features in time-series data). We first train an autoencoder that maps each video frame into a low-dimensional latent space where the laws of motion are as simple as possible, by minimizing a combination of nonlinearity, acceleration, and prediction error. Differential equations describing the motion are then discovered using Pareto-optimal symbolic regression. We find that our pre-regression ("pregression") step is able to rediscover Cartesian coordinates of unlabeled moving objects even when the video is distorted by a generalized lens. Using intuition from multidimensional knot theory, we find that the pregression step is facilitated by first adding extra latent space dimensions to avoid topological problems during training and then removing these extra dimensions via principal component analysis. An inertial frame is autodiscovered by minimizing the combined equation complexity for multiple experiments.
Collapse
Affiliation(s)
- Silviu-Marian Udrescu
- Department of Physics, Institute for AI & Fundamental Interactions, and Center for Brains, Minds, & Machines, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Max Tegmark
- Department of Physics, Institute for AI & Fundamental Interactions, and Center for Brains, Minds, & Machines, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
13
Maddu S, Cheeseman BL, Müller CL, Sbalzarini IF. Learning physically consistent differential equation models from data using group sparsity. Phys Rev E 2021; 103:042310. [PMID: 34005966 DOI: 10.1103/physreve.103.042310] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/11/2020] [Accepted: 03/22/2021] [Indexed: 11/07/2022]
Abstract
We propose a statistical learning framework based on group-sparse regression that can be used to (i) enforce conservation laws, (ii) ensure model equivalence, and (iii) guarantee symmetries when learning or inferring differential-equation models from data. Directly learning interpretable mathematical models from data has emerged as a valuable modeling approach. However, in areas such as biology, high noise levels, sensor-induced correlations, and strong intersystem variability can render data-driven models nonsensical or physically inconsistent without additional constraints on the model structure. Hence, it is important to leverage prior knowledge from physical principles to learn biologically plausible and physically consistent models rather than models that simply fit the data best. We present the group iterative hard thresholding algorithm and use stability selection to infer physically consistent models with minimal parameter tuning. We show several applications from systems biology that demonstrate the benefits of enforcing priors in data-driven modeling.
Affiliation(s)
- Suryanarayana Maddu
- Technische Universität Dresden, Faculty of Computer Science, 01069 Dresden, Germany; Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany; Center for Systems Biology Dresden, 01307 Dresden, Germany; Center for Scalable Data Analytics and Artificial Intelligence ScaDS.AI, Dresden/Leipzig, Germany
- Bevan L Cheeseman
- Technische Universität Dresden, Faculty of Computer Science, 01069 Dresden, Germany; Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany; Center for Systems Biology Dresden, 01307 Dresden, Germany
- Christian L Müller
- Center for Computational Mathematics, Flatiron Institute, New York, New York 10010, USA; Department of Statistics, LMU München, 80539 Munich, Germany; Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Ivo F Sbalzarini
- Technische Universität Dresden, Faculty of Computer Science, 01069 Dresden, Germany; Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany; Center for Systems Biology Dresden, 01307 Dresden, Germany; Center for Scalable Data Analytics and Artificial Intelligence ScaDS.AI, Dresden/Leipzig, Germany; Cluster of Excellence Physics of Life, TU Dresden, 01307 Dresden, Germany
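As a rough illustration of the group-sparsity idea in the abstract above, the sketch below shows a generic group hard-thresholding step: coefficients are partitioned into groups, and only the `k` groups with the largest Euclidean norm survive. The function name `group_hard_threshold`, the grouping, and the numbers are assumptions made for illustration. This is not the authors' implementation, and a full iterative scheme would alternate a step like this with gradient updates.

```python
import math


def group_hard_threshold(coef, groups, k):
    """Keep the k groups with the largest Euclidean norm; zero all others.

    coef   -- flat list of model coefficients
    groups -- list of index lists, one per group (hypothetical grouping)
    k      -- number of groups allowed to stay non-zero
    """
    norms = [math.sqrt(sum(coef[i] ** 2 for i in g)) for g in groups]
    # Indices of the k groups with the largest norm.
    keep = sorted(range(len(groups)), key=lambda j: norms[j], reverse=True)[:k]
    out = [0.0] * len(coef)
    for j in keep:
        for i in groups[j]:
            out[i] = coef[i]
    return out


# Toy example: three groups of two coefficients each; keep the two strongest.
coef = [0.9, -1.1, 0.05, 0.02, 0.4, 0.0]
groups = [[0, 1], [2, 3], [4, 5]]
result = group_hard_threshold(coef, groups, k=2)
print(result)
```

Because a whole group is kept or dropped together, terms that must appear jointly (for example, the same flux entering two equations with opposite signs) cannot be selected in one equation and discarded in the other.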
14
Su WH, Chou CS, Xiu D. Deep Learning of Biological Models from Data: Applications to ODE Models. Bull Math Biol 2021; 83:19. [PMID: 33452931 DOI: 10.1007/s11538-020-00851-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/06/2020] [Accepted: 12/21/2020] [Indexed: 10/22/2022]
Abstract
Mathematical equations are often used to model biological processes. However, for many systems, determining analytically the underlying equations is highly challenging due to the complexity and unknown factors involved in the biological processes. In this work, we present a numerical procedure to discover dynamical physical laws behind biological data. The method utilizes deep learning methods based on neural networks, particularly residual networks. It is also based on recently developed mathematical tools of flow-map learning for dynamical systems. We demonstrate that with the proposed method, one can accurately construct numerical biological models for unknown governing equations behind measurement data. Moreover, the deep learning model can also incorporate unknown parameters in the biological process. A successfully trained deep neural network model can then be used as a predictive tool to produce system predictions of different settings and allows one to conduct detailed analysis of the underlying biological process. In this paper, we use three biological models (the SEIR model, the Morris-Lecar model, and the Hodgkin-Huxley model) to show the capability of our proposed method.
Affiliation(s)
- Wei-Hung Su
- Department of Mathematics, The Ohio State University, Columbus, OH, 43221, USA
- Ching-Shan Chou
- Department of Mathematics, The Ohio State University, Columbus, OH, 43221, USA
- Dongbin Xiu
- Department of Mathematics, The Ohio State University, Columbus, OH, 43221, USA
15
Yang Y, Aziz Bhouri M, Perdikaris P. Bayesian differential programming for robust systems identification under uncertainty. Proc Math Phys Eng Sci 2020; 476:20200290. [PMID: 33362409 PMCID: PMC7735302 DOI: 10.1098/rspa.2020.0290] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/17/2020] [Accepted: 10/23/2020] [Indexed: 11/12/2022] Open
Abstract
This paper presents a machine learning framework for Bayesian systems identification from noisy, sparse and irregular observations of nonlinear dynamical systems. The proposed method takes advantage of recent developments in differentiable programming to propagate gradient information through ordinary differential equation solvers and perform Bayesian inference with respect to unknown model parameters using Hamiltonian Monte Carlo sampling. This allows an efficient inference of the posterior distributions over plausible models with quantified uncertainty, while the use of sparsity-promoting priors enables the discovery of interpretable and parsimonious representations for the underlying latent dynamics. A series of numerical studies is presented to demonstrate the effectiveness of the proposed methods, including nonlinear oscillators, predator-prey systems and examples from systems biology. Taken together, our findings put forth a flexible and robust workflow for data-driven model discovery under uncertainty. All codes and data accompanying this article are available at https://bit.ly/34FOJMj.
Affiliation(s)
- Paris Perdikaris
- Department of Mechanical Engineering and Applied Mechanics, University of Pennsylvania, Philadelphia, PA 19104, USA
16
Kaheman K, Kutz JN, Brunton SL. SINDy-PI: a robust algorithm for parallel implicit sparse identification of nonlinear dynamics. Proc Math Phys Eng Sci 2020; 476:20200279. [PMID: 33214760 PMCID: PMC7655768 DOI: 10.1098/rspa.2020.0279] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 04/14/2020] [Accepted: 09/10/2020] [Indexed: 12/15/2022] Open
Abstract
Accurately modelling the nonlinear dynamics of a system from measurement data is a challenging yet vital topic. The sparse identification of nonlinear dynamics (SINDy) algorithm is one approach to discover dynamical systems models from data. Although extensions have been developed to identify implicit dynamics, or dynamics described by rational functions, these extensions are extremely sensitive to noise. In this work, we develop SINDy-PI (parallel, implicit), a robust variant of the SINDy algorithm to identify implicit dynamics and rational nonlinearities. The SINDy-PI framework includes multiple optimization algorithms and a principled approach to model selection. We demonstrate the ability of this algorithm to learn implicit ordinary and partial differential equations and conservation laws from limited and noisy data. In particular, we show that the proposed approach is several orders of magnitude more noise robust than previous approaches, and may be used to identify a class of ODE and PDE dynamics that were previously unattainable with SINDy, including for the double pendulum dynamics and simplified model for the Belousov-Zhabotinsky (BZ) reaction.
Affiliation(s)
- Kadierdan Kaheman
- Department of Mechanical Engineering, University of Washington, Seattle, WA 98195, USA
- J Nathan Kutz
- Department of Applied Mathematics, University of Washington, Seattle, WA 98195, USA
- Steven L Brunton
- Department of Mechanical Engineering, University of Washington, Seattle, WA 98195, USA
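For orientation, here is a minimal sketch of the basic SINDy idea that SINDy-PI builds on: regress measured derivatives onto a library of candidate terms and repeatedly zero small coefficients (sequentially thresholded least squares). This is plain SINDy on a toy system, not SINDy-PI. The test system dx/dt = -2x, the library [1, x, x^2], the threshold, and all function names are assumptions chosen for illustration.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def least_squares(Theta, y, active):
    """Least-squares fit of y on the active library columns (normal equations)."""
    A = [[sum(row[i] * row[j] for row in Theta) for j in active] for i in active]
    b = [sum(row[i] * yk for row, yk in zip(Theta, y)) for i in active]
    return solve_linear(A, b)


def stlsq(Theta, y, threshold=0.1, max_iter=10):
    """Sequentially thresholded least squares: fit, drop small terms, refit."""
    n_terms = len(Theta[0])
    active = list(range(n_terms))
    coef = [0.0] * n_terms
    for _ in range(max_iter):
        sol = least_squares(Theta, y, active)
        coef = [0.0] * n_terms
        for i, c in zip(active, sol):
            coef[i] = c
        kept = [i for i in active if abs(coef[i]) >= threshold]
        if kept == active:
            break
        if not kept:
            return [0.0] * n_terms
        active = kept
    return [c if abs(c) >= threshold else 0.0 for c in coef]


# Toy data: samples of x(t) = exp(-2 t), which satisfies dx/dt = -2 x.
dt = 0.01
xs = [math.exp(-2.0 * k * dt) for k in range(201)]
idx = range(1, 200)
dxdt = [(xs[k + 1] - xs[k - 1]) / (2.0 * dt) for k in idx]  # central differences
Theta = [[1.0, xs[k], xs[k] ** 2] for k in idx]             # library: 1, x, x^2

coef = stlsq(Theta, dxdt)
print(coef)  # only the coefficient on x should remain, close to -2
```

The thresholding is what makes the recovered model sparse: an ordinary least-squares fit would generally leave small spurious weights on the constant and quadratic terms, especially with noisy data.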
17
Udrescu SM, Tegmark M. AI Feynman: A physics-inspired method for symbolic regression. SCIENCE ADVANCES 2020; 6:eaay2631. [PMID: 32426452 PMCID: PMC7159912 DOI: 10.1126/sciadv.aay2631] [Citation(s) in RCA: 104] [Impact Index Per Article: 26.0] [Received: 06/07/2019] [Accepted: 01/03/2020] [Indexed: 05/22/2023]
Abstract
A core challenge for both physics and artificial intelligence (AI) is symbolic regression: finding a symbolic expression that matches data from an unknown function. Although this problem is likely to be NP-hard in principle, functions of practical interest often exhibit symmetries, separability, compositionality, and other simplifying properties. In this spirit, we develop a recursive multidimensional symbolic regression algorithm that combines neural network fitting with a suite of physics-inspired techniques. We apply it to 100 equations from the Feynman Lectures on Physics, and it discovers all of them, while previous publicly available software cracks only 71; for a more difficult physics-based test set, we improve the state-of-the-art success rate from 15 to 90%.
Affiliation(s)
- Silviu-Marian Udrescu
- Department of Physics and Center for Brains, Minds & Machines, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Max Tegmark
- Department of Physics and Center for Brains, Minds & Machines, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Theiss Research, La Jolla, CA 92037, USA
- Corresponding author.
18
Iten R, Metger T, Wilming H, Del Rio L, Renner R. Discovering Physical Concepts with Neural Networks. PHYSICAL REVIEW LETTERS 2020; 124:010508. [PMID: 31976717 DOI: 10.1103/physrevlett.124.010508] [Citation(s) in RCA: 78] [Impact Index Per Article: 19.5] [Received: 07/17/2019] [Indexed: 05/10/2023]
Abstract
Despite the success of neural networks at solving concrete physics problems, their use as a general-purpose tool for scientific discovery is still in its infancy. Here, we approach this problem by modeling a neural network architecture after the human physical reasoning process, which has similarities to representation learning. This allows us to make progress towards the long-term goal of machine-assisted scientific discovery from experimental data without making prior assumptions about the system. We apply this method to toy examples and show that the network finds the physically relevant parameters, exploits conservation laws to make predictions, and can help to gain conceptual insights, e.g., Copernicus' conclusion that the solar system is heliocentric.
Affiliation(s)
- Raban Iten
- ETH Zürich, Wolfgang-Pauli-Strasse 27, 8093 Zürich, Switzerland
- Tony Metger
- ETH Zürich, Wolfgang-Pauli-Strasse 27, 8093 Zürich, Switzerland
- Henrik Wilming
- ETH Zürich, Wolfgang-Pauli-Strasse 27, 8093 Zürich, Switzerland
- Lídia Del Rio
- ETH Zürich, Wolfgang-Pauli-Strasse 27, 8093 Zürich, Switzerland
- Renato Renner
- ETH Zürich, Wolfgang-Pauli-Strasse 27, 8093 Zürich, Switzerland
19
Automated, predictive, and interpretable inference of Caenorhabditis elegans escape dynamics. Proc Natl Acad Sci U S A 2019; 116:7226-7231. [PMID: 30902894 DOI: 10.1073/pnas.1816531116] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 12/13/2022] Open
Abstract
The roundworm Caenorhabditis elegans exhibits robust escape behavior in response to rapidly rising temperature. The behavior lasts for a few seconds, shows history dependence, involves both sensory and motor systems, and is too complicated to model mechanistically using currently available knowledge. Instead we model the process phenomenologically, and we use the Sir Isaac dynamical inference platform to infer the model in a fully automated fashion directly from experimental data. The inferred model requires incorporation of an unobserved dynamical variable and is biologically interpretable. The model makes accurate predictions about the dynamics of the worm behavior, and it can be used to characterize the functional logic of the dynamical system underlying the escape response. This work illustrates the power of modern artificial intelligence to aid in discovery of accurate and interpretable models of complex natural systems.
20
Gasperino D, Baughman T, Hsieh HV, Bell D, Weigl BH. Improving Lateral Flow Assay Performance Using Computational Modeling. ANNUAL REVIEW OF ANALYTICAL CHEMISTRY (PALO ALTO, CALIF.) 2018; 11:219-244. [PMID: 29595992 DOI: 10.1146/annurev-anchem-061417-125737] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Indexed: 05/12/2023]
Abstract
The performance, field utility, and low cost of lateral flow assays (LFAs) have driven a tremendous shift in global health care practices by enabling diagnostic testing in previously unserved settings. This success has motivated the continued improvement of LFAs through increasingly sophisticated materials and reagents. However, our mechanistic understanding of the underlying processes that drive the informed design of these systems has not received commensurate attention. Here, we review the principles underpinning LFAs and the historical evolution of theory to predict their performance. As this theory is integrated into computational models and becomes testable, the criteria for quantifying performance and validating predictive power are critical. The integration of computational design with LFA development offers a promising and coherent framework to choose from an increasing number of novel materials, techniques, and reagents to deliver the low-cost, high-fidelity assays of the future.
Affiliation(s)
- David Gasperino
- Intellectual Ventures Laboratory, Bellevue, Washington 98007, USA
- Ted Baughman
- Intellectual Ventures Laboratory, Bellevue, Washington 98007, USA
- Helen V Hsieh
- Intellectual Ventures Laboratory, Bellevue, Washington 98007, USA
- David Bell
- Intellectual Ventures Laboratory, Bellevue, Washington 98007, USA
- Bernhard H Weigl
- Intellectual Ventures Laboratory, Bellevue, Washington 98007, USA
- Department of Bioengineering, University of Washington, Seattle, Washington 98195, USA
21
Quade M, Abel M, Kutz JN, Brunton SL. Sparse identification of nonlinear dynamics for rapid model recovery. CHAOS (WOODBURY, N.Y.) 2018; 28:063116. [PMID: 29960401 DOI: 10.1063/1.5027470] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Indexed: 06/08/2023]
Abstract
Big data have become a critically enabling component of emerging mathematical methods aimed at the automated discovery of dynamical systems, where first principles modeling may be intractable. However, in many engineering systems, abrupt changes must be rapidly characterized based on limited, incomplete, and noisy data. Many leading automated learning techniques rely on unrealistically large data sets, and it is unclear how to leverage prior knowledge effectively to re-identify a model after an abrupt change. In this work, we propose a conceptual framework to recover parsimonious models of a system in response to abrupt changes in the low-data limit. First, the abrupt change is detected by comparing the estimated Lyapunov time of the data with the model prediction. Next, we apply the sparse identification of nonlinear dynamics (SINDy) regression to update a previously identified model with the fewest changes, either by addition, deletion, or modification of existing model terms. We demonstrate this sparse model recovery on several examples for abrupt system change detection in periodic and chaotic dynamical systems. Our examples show that sparse updates to a previously identified model perform better with less data, have lower runtime complexity, and are less sensitive to noise than identifying an entirely new model. The proposed abrupt-SINDy architecture provides a new paradigm for the rapid and efficient recovery of a system model after abrupt changes.
Affiliation(s)
- Markus Quade
- Institut für Physik und Astronomie, Universität Potsdam, Karl-Liebknecht-Straße 24/25, 14476 Potsdam, Germany
- Markus Abel
- Institut für Physik und Astronomie, Universität Potsdam, Karl-Liebknecht-Straße 24/25, 14476 Potsdam, Germany
- J Nathan Kutz
- Department of Applied Mathematics, University of Washington, Seattle, Washington 98195, USA
- Steven L Brunton
- Department of Mechanical Engineering, University of Washington, Seattle, Washington 98195, USA
22
Identifying models of dielectric breakdown strength from high-throughput data via genetic programming. Sci Rep 2017; 7:17594. [PMID: 29242566 PMCID: PMC5730619 DOI: 10.1038/s41598-017-17535-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 06/23/2017] [Accepted: 11/22/2017] [Indexed: 11/26/2022] Open
Abstract
The identification of models capable of rapidly predicting material properties enables rapid screening of large numbers of materials and facilitates the design of new materials. One of the leading challenges for computational researchers is determining the best ways to analyze large material data sets to identify models that can rapidly predict a given property. In this paper, we demonstrate the use of genetic programming to generate simple models of dielectric breakdown based on 82 representative dielectric materials. We identified the band gap $E_g$ and phonon cut-off frequency $\omega_{\max}$ as the two most relevant features, and new classes of models featuring functions of $E_g$ and $\omega_{\max}$ were uncovered. The genetic programming approach was found to outperform other approaches for generating models, and we discuss some of the advantages of this approach.
23
Cyr KJ, Avaldi OM, Wikswo JP. Circadian hormone control in a human-on-a-chip: In vitro biology's ignored component? Exp Biol Med (Maywood) 2017; 242:1714-1731. [PMID: 29065796 PMCID: PMC5832251 DOI: 10.1177/1535370217732766] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 12/17/2022] Open
Abstract
Organs-on-Chips (OoCs) are poised to reshape dramatically the study of biology by replicating the in vivo function of individual and coupled human organs. Such microphysiological systems (MPS) have already recreated complex physiological responses necessary to simulate human organ function not evident in two-dimensional in vitro biological experiments. OoC researchers hope to streamline pharmaceutical development, accelerate toxicology studies, limit animal testing, and provide new insights beyond the capability of current biological models. However, to develop a physiologically accurate Human-on-a-Chip, i.e., an MPS homunculus that functions as an interconnected, whole-body, model organ system, one must couple individual OoCs with proper fluidic and metabolic scaling. This will enable the study of the effects of organ-organ interactions on the metabolism of drugs and toxins. Critical to these efforts will be the recapitulation of the complex physiological signals that regulate the endocrine, metabolic, and digestive systems. To date, with the exception of research focused on reproductive organs on chips, most OoC research ignores homuncular endocrine regulation, in particular the circadian rhythms that modulate the function of all organ systems. We outline the importance of cyclic endocrine regulation and the role that it may play in the development of MPS homunculi for the pharmacology, toxicology, and systems biology communities. Moreover, we discuss the critical end-organ hormone interactions that are most relevant for a typical coupled-OoC system, and the possible research applications of a missing endocrine system MicroFormulator (MES-µF) that could impose biological rhythms on in vitro models. By linking OoCs together through chemical messenger systems, advanced physiological phenomena relevant to pharmacokinetics and pharmacodynamics studies can be replicated. The concept of a MES-µF could be applied to other standard cell-culture systems such as well plates, thereby extending the concept of circadian hormonal regulation to much of in vitro biology.
Impact statement
Historically, cyclic endocrine modulation has been largely ignored within in vitro cell culture, in part because cultured cells typically have their media changed every day or two, precluding hourly adjustment of hormone concentrations to simulate circadian rhythms. As the Organ-on-Chip (OoC) community strives for greater physiological realism, the contribution of hormonal oscillations toward regulation of organ systems has been examined only in the context of reproductive organs, and circadian variation of the breadth of other hormones on most organs remains unaddressed. We illustrate the importance of cyclic endocrine modulation and the role that it plays within individual organ systems. The study of cyclic endocrine modulation within OoC systems will help advance OoC research to the point where it can reliably replicate in vitro key regulatory components of human physiology. This will help translate OoC work into pharmaceutical applications and connect the OoC community with the greater pharmacology and physiology communities.
Affiliation(s)
- Kevin J. Cyr
- Vanderbilt Institute for Integrative Biosystems Research and Education
- Systems Biology and Bioengineering Undergraduate Research Experience
- Omero M. Avaldi
- Vanderbilt Institute for Integrative Biosystems Research and Education
- Systems Biology and Bioengineering Undergraduate Research Experience
- John P. Wikswo
- Vanderbilt Institute for Integrative Biosystems Research and Education
- Department of Biomedical Engineering
- Department of Molecular Physiology and Biophysics
- Department of Physics and Astronomy, Vanderbilt University, Nashville TN, 37235, USA
24
Rudy SH, Brunton SL, Proctor JL, Kutz JN. Data-driven discovery of partial differential equations. SCIENCE ADVANCES 2017; 3:e1602614. [PMID: 28508044 PMCID: PMC5406137 DOI: 10.1126/sciadv.1602614] [Citation(s) in RCA: 214] [Impact Index Per Article: 30.6] [Received: 10/23/2016] [Accepted: 02/11/2017] [Indexed: 05/18/2023]
Abstract
We propose a sparse regression method capable of discovering the governing partial differential equation(s) of a given system by time series measurements in the spatial domain. The regression framework relies on sparsity-promoting techniques to select the nonlinear and partial derivative terms of the governing equations that most accurately represent the data, bypassing a combinatorially large search through all possible candidate models. The method balances model complexity and regression accuracy by selecting a parsimonious model via Pareto analysis. Time series measurements can be made in an Eulerian framework, where the sensors are fixed spatially, or in a Lagrangian framework, where the sensors move with the dynamics. The method is computationally efficient, robust, and demonstrated to work on a variety of canonical problems spanning a number of scientific domains including Navier-Stokes, the quantum harmonic oscillator, and the diffusion equation. Moreover, the method is capable of disambiguating between potentially nonunique dynamical terms by using multiple time series taken with different initial data. Thus, for a traveling wave, the method can distinguish between a linear wave equation and the Korteweg-de Vries equation, for instance. The method provides a promising new technique for discovering governing equations and physical laws in parameterized spatiotemporal systems, where first-principles derivations are intractable.
Affiliation(s)
- Samuel H. Rudy
- Department of Applied Mathematics, University of Washington, Seattle, WA 98195, USA
- Corresponding author.
- Steven L. Brunton
- Department of Mechanical Engineering, University of Washington, Seattle, WA 98195, USA
- Joshua L. Proctor
- Institute for Disease Modeling, 3150 139th Avenue Southeast, Bellevue, WA 98005, USA
- J. Nathan Kutz
- Department of Applied Mathematics, University of Washington, Seattle, WA 98195, USA
25
Lobo D, Levin M. Computing a Worm: Reverse-Engineering Planarian Regeneration. EMERGENCE, COMPLEXITY AND COMPUTATION 2017. [DOI: 10.1007/978-3-319-33921-4_24] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 12/24/2022]
26
Astola L, Stigter H, Gomez Roldan MV, van Eeuwijk F, Hall RD, Groenenboom M, Molenaar JJ. Parameter estimation in tree graph metabolic networks. PeerJ 2016; 4:e2417. [PMID: 27688960 PMCID: PMC5036115 DOI: 10.7717/peerj.2417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2016] [Accepted: 08/05/2016] [Indexed: 11/21/2022] Open
Abstract
We study the glycosylation processes that convert initially toxic substrates to nutritionally valuable metabolites in the flavonoid biosynthesis pathway of tomato (Solanum lycopersicum) seedlings. To estimate the reaction rates we use ordinary differential equations (ODEs) to model the enzyme kinetics. A popular choice is to use a system of linear ODEs with constant kinetic rates or to use Michaelis–Menten kinetics. In reality, the catalytic rates, which are affected among other factors by kinetic constants and enzyme concentrations, change in time, and the approaches just mentioned cannot describe this phenomenon. Another problem is that, in general, these kinetic coefficients are not always identifiable. A third problem is that it is not precisely known which enzymes are catalyzing the observed glycosylation processes. With several hundred potential gene candidates, experimental validation using purified target proteins is expensive and time consuming. We aim at reducing this task via mathematical modeling to allow for the pre-selection of the most promising gene candidates. In this article we discuss a fast and relatively simple approach to estimate time-varying kinetic rates, with three favorable properties: firstly, it allows for identifiable estimation of time-dependent parameters in networks with a tree-like structure. Secondly, it is relatively fast compared to usually applied methods that estimate the model derivatives together with the network parameters. Thirdly, by combining the metabolite concentration data with corresponding microarray data, it can help in detecting the genes related to the enzymatic processes. By comparing the estimated time dynamics of the catalytic rates with time series gene expression data we may assess potential candidate genes behind enzymatic reactions. As an example, we show how to apply this method to select prominent glycosyltransferase genes in tomato seedlings.
Affiliation(s)
- Laura Astola
- Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, Netherlands
- Hans Stigter
- Biometris, Department for Mathematical and Statistical Methods, Wageningen University and Research Centre, Wageningen, Netherlands
- Fred van Eeuwijk
- Biometris, Department for Mathematical and Statistical Methods, Wageningen University and Research Centre, Wageningen, Netherlands
- Robert D Hall
- Plant Research International-Bioscience, Wageningen University and Research Centre, Wageningen, Netherlands
- Marian Groenenboom
- Biometris, Department for Mathematical and Statistical Methods, Wageningen University and Research Centre, Wageningen, Netherlands
- Jaap J Molenaar
- Biometris, Department for Mathematical and Statistical Methods, Wageningen University and Research Centre, Wageningen, Netherlands
27
Quade M, Abel M, Shafi K, Niven RK, Noack BR. Prediction of dynamical systems by symbolic regression. Phys Rev E 2016; 94:012214. [PMID: 27575130 DOI: 10.1103/physreve.94.012214] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.1] [Received: 02/15/2016] [Indexed: 11/07/2022]
Abstract
We study the modeling and prediction of dynamical systems based on conventional models derived from measurements. Such algorithms are highly desirable in situations where the underlying dynamics are hard to model from physical principles or simplified models need to be found. We focus on symbolic regression methods as a part of machine learning. These algorithms are capable of learning an analytically tractable model from data, a highly valuable property. Symbolic regression methods can be considered as generalized regression methods. We investigate two particular algorithms, the so-called fast function extraction which is a generalized linear regression algorithm, and genetic programming which is a very general method. Both are able to combine functions in a certain way such that a good model for the prediction of the temporal evolution of a dynamical system can be identified. We illustrate the algorithms by finding a prediction for the evolution of a harmonic oscillator based on measurements, by detecting an arriving front in an excitable system, and as a real-world application, the prediction of solar power production based on energy production observations at a given site together with the weather forecast.
Affiliation(s)
- Markus Quade
- Universität Potsdam, Institut für Physik und Astronomie, Karl-Liebknecht-Straße 24/25, 14476 Potsdam, Germany and Ambrosys GmbH, David-Gilly-Straße 1, 14469 Potsdam, Germany
- Markus Abel
- Universität Potsdam, Institut für Physik und Astronomie, Karl-Liebknecht-Straße 24/25, 14476 Potsdam, Germany and Ambrosys GmbH, David-Gilly-Straße 1, 14469 Potsdam, Germany
- Kamran Shafi
- School of Engineering and Information Technology, University of New South Wales, Canberra ACT 2600, Australia
- Robert K Niven
- School of Engineering and Information Technology, University of New South Wales, Canberra ACT 2600, Australia
- Bernd R Noack
- Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur LIMSI-CNRS, BP 133, 91403 Orsay cedex, France and Institut für Strömungsmechanik, Technische Universität Braunschweig, Hermann-Blenk-Straße 37, 38108 Braunschweig, Germany
| |
Collapse
|
28
|
Mellis IA, Raj A. Half dozen of one, six billion of the other: What can small- and large-scale molecular systems biology learn from one another? Genome Res 2015; 25:1466-72. [PMID: 26430156 PMCID: PMC4579331 DOI: 10.1101/gr.190579.115] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 01/07/2023]
Abstract
Small-scale molecular systems biology, by which we mean the understanding of how a few parts work together to control a particular biological process, is predicated on the assumption that cellular regulation is arranged in a circuit-like structure. Results from the omics revolution have upset this vision to varying degrees by revealing a high degree of interconnectivity, making it difficult to develop a simple, circuit-like understanding of regulatory processes. We here outline the limitations of the small-scale systems biology approach with examples from research into genetic algorithms, genetics, transcriptional network analysis, and genomics. We also discuss the difficulties associated with deriving true understanding from the analysis of large data sets and propose that the development of new, intelligent, computational tools may point to a way forward. Throughout, we intentionally oversimplify and talk about things in which we have little expertise, and it is likely that many of our arguments are wrong on one level or another. We do believe, however, that developing a true understanding via molecular systems biology will require a fundamental rethinking of our approach, and our goal is to provoke thought along these lines.
Affiliation(s)
- Ian A Mellis
  - Perelman School of Medicine, Genomics and Computational Biology Graduate Group, University of Pennsylvania, Philadelphia, Pennsylvania 19104-6021, USA
- Arjun Raj
  - Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104-6321, USA
|
29
|
Mangan NM, Brunton SL, Proctor JL, Kutz JN. Inferring Biological Networks by Sparse Identification of Nonlinear Dynamics. IEEE Trans Mol Biol Multi-Scale Commun 2016. [DOI: 10.1109/tmbmc.2016.2633265] [Citation(s) in RCA: 172] [Impact Index Per Article: 21.5] [Indexed: 11/08/2022]
|
30
|
Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc Natl Acad Sci U S A 2016; 113:3932-7. [PMID: 27035946 DOI: 10.1073/pnas.1517384113] [Citation(s) in RCA: 589] [Impact Index Per Article: 73.6] [Indexed: 01/30/2023]
Abstract
Extracting governing equations from data is a central challenge in many diverse areas of science and engineering. Data are abundant whereas models often remain elusive, as in climate science, neuroscience, ecology, finance, and epidemiology, to name only a few examples. In this work, we combine sparsity-promoting techniques and machine learning with nonlinear dynamical systems to discover governing equations from noisy measurement data. The only assumption about the structure of the model is that there are only a few important terms that govern the dynamics, so that the equations are sparse in the space of possible functions; this assumption holds for many physical systems in an appropriate basis. In particular, we use sparse regression to determine the fewest terms in the dynamic governing equations required to accurately represent the data. This results in parsimonious models that balance accuracy with model complexity to avoid overfitting. We demonstrate the algorithm on a wide range of problems, from simple canonical systems, including linear and nonlinear oscillators and the chaotic Lorenz system, to the fluid vortex shedding behind an obstacle. The fluid example illustrates the ability of this method to discover the underlying dynamics of a system that took experts in the community nearly 30 years to resolve. We also show that this method generalizes to parameterized systems and systems that are time-varying or have external forcing.
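The sparse-regression idea in this abstract can be illustrated with a small sketch. It assumes a candidate-function library of polynomials up to degree 2 and uses sequentially thresholded least squares, one simple way to promote sparsity; the function name `stlsq`, the threshold, the test system, and the use of exact (noise-free) derivatives are all illustrative assumptions, not details taken from the abstract.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.05, n_iter=10):
    """Sequentially thresholded least squares (hypothetical helper).

    Solves theta @ xi ~= dxdt, repeatedly zeroing coefficients whose
    magnitude falls below `threshold` and refitting the remaining ones.
    """
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for k in range(dxdt.shape[1]):
            big = ~small[:, k]
            if big.any():
                xi[big, k] = np.linalg.lstsq(theta[:, big], dxdt[:, k], rcond=None)[0]
    return xi

# Assumed test system: a damped linear oscillator
#   dx/dt = -0.1 x + 2 y,   dy/dt = -2 x - 0.1 y
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
A = np.array([[-0.1, 2.0], [-2.0, -0.1]])
dX = X @ A.T  # exact derivatives at the sampled states

# Candidate library: [1, x, y, x^2, x*y, y^2]
x, y = X[:, 0], X[:, 1]
theta = np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])

xi = stlsq(theta, dX)
print(xi)  # rows follow the library order; columns are dx/dt and dy/dt
```

With real measurements the derivatives would have to be estimated numerically and the threshold tuned, so this sketch shows only the regression step.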
|
31
|
|
32
|
Abstract
Crowdsourcing, understood as outsourcing work to a large network of people in the form of an open call, has been utilized successfully many times, including a very interesting concept involving the implementation of computer games with the objective of solving a scientific problem by employing users to play a game—so-called crowdsourced serious games. Our main objective was to verify whether such an approach could be successfully applied to the discovery of mathematical equations that explain experimental data gathered during the observation of a given dynamic system. Moreover, we wanted to compare it with an approach based on artificial intelligence that uses symbolic regression to find such formulae automatically. To achieve this, we designed and implemented an Internet game in which players attempt to design a spaceship representing an equation that models the observed system. The game was designed while considering that it should be easy to use for people without strong mathematical backgrounds. Moreover, we tried to make use of the collective intelligence observed in crowdsourced systems by enabling many players to collaborate on a single solution. The idea was tested on several hundred players playing almost 10,000 games and conducting a user opinion survey. The results prove that the proposed solution has very high potential. The function generated during weeklong tests was almost as precise as the analytical solution of the model of the system and, up to a certain complexity level of the formulae, it explained data better than the solution generated automatically by Eureqa, the leading software application for the implementation of symbolic regression. Moreover, we observed benefits of using crowdsourcing; the chain of consecutive solutions that led to the best solution was obtained by the continuous collaboration of several players.
|
33
|
Automated adaptive inference of phenomenological dynamical models. Nat Commun 2015; 6:8133. [PMID: 26293508 PMCID: PMC4560822 DOI: 10.1038/ncomms9133] [Citation(s) in RCA: 64] [Impact Index Per Article: 7.1] [Received: 02/26/2015] [Accepted: 07/22/2015] [Indexed: 11/17/2022]
Abstract
Dynamics of complex systems is often driven by large and intricate networks of microscopic interactions, whose sheer size obfuscates understanding. With limited experimental data, many parameters of such dynamics are unknown, and thus detailed, mechanistic models risk overfitting and making faulty predictions. At the other extreme, simple ad hoc models often miss defining features of the underlying systems. Here we develop an approach that instead constructs phenomenological, coarse-grained models of network dynamics that automatically adapt their complexity to the available data. Such adaptive models produce accurate predictions even when microscopic details are unknown. The approach is computationally tractable, even for a relatively large number of dynamical variables. Using simulated data, it correctly infers the phase space structure for planetary motion, avoids overfitting in a biological signalling system and produces accurate predictions for yeast glycolysis with tens of data points and over half of the interacting species unobserved. Mechanistic modelling of dynamical phenomena with many degrees of freedom runs the risk of overfitting and making faulty predictions, whereas ad hoc models may miss defining features. Here the authors develop an approach to construct dynamical models that adapt their complexity to the amount of available data.
|
34
|
Inferring regulatory networks from experimental morphological phenotypes: a computational method reverse-engineers planarian regeneration. PLoS Comput Biol 2015; 11:e1004295. [PMID: 26042810 PMCID: PMC4456145 DOI: 10.1371/journal.pcbi.1004295] [Citation(s) in RCA: 57] [Impact Index Per Article: 6.3] [Received: 11/24/2014] [Accepted: 04/21/2015] [Indexed: 01/18/2023]
Abstract
Transformative applications in biomedicine require the discovery of complex regulatory networks that explain the development and regeneration of anatomical structures, and reveal what external signals will trigger desired changes of large-scale pattern. Despite recent advances in bioinformatics, extracting mechanistic pathway models from experimental morphological data is a key open challenge that has resisted automation. The fundamental difficulty of manually predicting emergent behavior of even simple networks has limited the models invented by human scientists to pathway diagrams that show necessary subunit interactions but do not reveal the dynamics that are sufficient for complex, self-regulating pattern to emerge. To finally bridge the gap between high-resolution genetic data and the ability to understand and control patterning, it is critical to develop computational tools to efficiently extract regulatory pathways from the resultant experimental shape phenotypes. For example, planarian regeneration has been studied for over a century, but despite increasing insight into the pathways that control its stem cells, no constructive, mechanistic model has yet been found by human scientists that explains more than one or two key features of its remarkable ability to regenerate its correct anatomical pattern after drastic perturbations. We present a method to infer the molecular products, topology, and spatial and temporal non-linear dynamics of regulatory networks recapitulating in silico the rich dataset of morphological phenotypes resulting from genetic, surgical, and pharmacological experiments. We demonstrated our approach by inferring complete regulatory networks explaining the outcomes of the main functional regeneration experiments in the planarian literature; by analyzing all the datasets together, our system inferred the first systems-biology comprehensive dynamical model explaining patterning in planarian regeneration.
This method provides an automated, highly generalizable framework for identifying the underlying control mechanisms responsible for the dynamic regulation of growth and form. Developmental and regenerative biology experiments are producing a huge number of morphological phenotypes from functional perturbation experiments. However, existing pathway models do not generally explain the dynamic regulation of anatomical shape due to the difficulty of inferring and testing non-linear regulatory networks responsible for appropriate form, shape, and pattern. We present a method that automates the discovery and testing of regulatory networks explaining morphological outcomes directly from the resultant phenotypes, producing network models as testable hypotheses explaining regeneration data. Our system integrates a formalization of the published results in planarian regeneration, an in silico simulator in which the patterning properties of regulatory networks can be quantitatively tested in a regeneration assay, and a machine learning module that evolves networks whose behavior in this assay optimally matches the database of planarian results. We applied our method to explain the key experiments in planarian regeneration, and discovered the first comprehensive model of anterior-posterior patterning in planaria under surgical, pharmacological, and genetic manipulations. Beyond the planarian data, our approach is readily generalizable to facilitate the discovery of testable regulatory networks in developmental biology and biomedicine, and represents the first developmental model discovered de novo from morphological outcomes by an automated system.
|
35
|
Inferring Broad Regulatory Biology from Time Course Data: Have We Reached an Upper Bound under Constraints Typical of In Vivo Studies? PLoS One 2015; 10:e0127364. [PMID: 25984725 PMCID: PMC4435750 DOI: 10.1371/journal.pone.0127364] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/08/2014] [Accepted: 04/13/2015] [Indexed: 12/21/2022]
Abstract
There is a growing appreciation for the network biology that regulates the coordinated expression of molecular and cellular markers; however, questions persist regarding the identifiability of these networks. Here we explore some of the issues relevant to recovering directed regulatory networks from time course data collected under experimental constraints typical of in vivo studies. NetSim simulations of sparsely connected biological networks were used to evaluate two simple feature selection techniques used in the construction of linear Ordinary Differential Equation (ODE) models, namely truncation of terms versus latent vector projection. Performance was compared with ODE-based Time Series Network Identification (TSNI) integral, and the information-theoretic Time-Delay ARACNE (TD-ARACNE). Projection-based techniques and TSNI integral outperformed truncation-based selection and TD-ARACNE on aggregate networks with edge densities of 10-30%, i.e. transcription factor, protein-protein cliques and immune signaling networks. All were more robust to noise than truncation-based feature selection. Performance was comparable on the in silico 10-node DREAM 3 network, a 5-node Yeast synthetic network designed for In vivo Reverse-engineering and Modeling Assessment (IRMA) and a 9-node human HeLa cell cycle network of similar size and edge density. Performance was more sensitive to the number of time courses than to sample frequency and extrapolated better to larger networks by grouping experiments. In all cases performance declined rapidly in larger networks with lower edge density. Limited recovery and high false positive rates obtained overall bring into question our ability to generate informative time course data rather than the design of any particular reverse engineering algorithm.
|
36
|
Daniels BC, Nemenman I. Efficient inference of parsimonious phenomenological models of cellular dynamics using S-systems and alternating regression. PLoS One 2015; 10:e0119821. [PMID: 25806510 PMCID: PMC4373916 DOI: 10.1371/journal.pone.0119821] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 10/01/2014] [Accepted: 01/16/2015] [Indexed: 11/18/2022]
Abstract
The nonlinearity of dynamics in systems biology makes it hard to infer them from experimental data. Simple linear models are computationally efficient, but cannot incorporate these important nonlinearities. An adaptive method based on the S-system formalism, which is a sensible representation of nonlinear mass-action kinetics typically found in cellular dynamics, maintains the efficiency of linear regression. We combine this approach with adaptive model selection to obtain efficient and parsimonious representations of cellular dynamics. The approach is tested by inferring the dynamics of yeast glycolysis from simulated data. With little computing time, it produces dynamical models with high predictive power and with structural complexity adapted to the difficulty of the inference problem.
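One reason a power-law (S-system-style) representation can retain the efficiency of linear regression is that a single power-law rate term becomes linear after taking logarithms. The sketch below shows only that log-linearization step for one term, v = alpha * x1^g1 * x2^g2; it is not the alternating regression procedure named in the title, and the variable names, parameter values, and noise-free synthetic data are illustrative assumptions.

```python
import numpy as np

# Assumed synthetic data: positive concentrations and a noise-free power-law rate
rng = np.random.default_rng(1)
X = rng.uniform(0.5, 2.0, size=(100, 2))   # two species, strictly positive
alpha_true = 3.0                           # hypothetical rate constant
g_true = np.array([0.5, -1.0])             # hypothetical kinetic orders
v = alpha_true * np.prod(X ** g_true, axis=1)

# log v = log(alpha) + g1*log(x1) + g2*log(x2) is linear in the unknowns,
# so ordinary least squares recovers them in a single solve.
design = np.column_stack([np.ones(len(X)), np.log(X)])
coef, *_ = np.linalg.lstsq(design, np.log(v), rcond=None)

alpha_fit = np.exp(coef[0])
g_fit = coef[1:]
print(alpha_fit, g_fit)
```

A full S-system equation is a difference of two such terms (production minus degradation), so the logarithm cannot be applied to the whole right-hand side at once; handling that is where a scheme such as alternating regression would come in.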
Affiliation(s)
- Bryan C. Daniels
  - Center for Complexity and Collective Computation, Wisconsin Institute for Discovery, University of Wisconsin, Madison, WI 53715, USA
- Ilya Nemenman
  - Departments of Physics and Biology, Emory University, Atlanta, GA 30322, USA
|
37
|
Cornforth TW, Lipson H. A hybrid evolutionary algorithm for the symbolic modeling of multiple-time-scale dynamical systems. EVOLUTIONARY INTELLIGENCE 2015. [DOI: 10.1007/s12065-015-0126-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 10/24/2022]
|
38
|
Cakır T, Khatibipour MJ. Metabolic network discovery by top-down and bottom-up approaches and paths for reconciliation. Front Bioeng Biotechnol 2014; 2:62. [PMID: 25520953 PMCID: PMC4253960 DOI: 10.3389/fbioe.2014.00062] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 09/24/2014] [Accepted: 11/14/2014] [Indexed: 11/13/2022]
Abstract
The primary focus in the network-centric analysis of cellular metabolism by systems biology approaches is to identify the active metabolic network for the condition of interest. Two major approaches are available for the discovery of the condition-specific metabolic networks. One approach starts from genome-scale metabolic networks, which cover all possible reactions known to occur in the related organism in a condition-independent manner, and applies methods such as the optimization-based Flux-Balance Analysis to elucidate the active network. The other approach starts from the condition-specific metabolome data, and processes the data with statistical or optimization-based methods to extract information content of the data such that the active network is inferred. These approaches, termed bottom-up and top-down, respectively, are currently employed independently. However, considering that both approaches have the same goal, they can both benefit from each other paving the way for the novel integrative analysis methods of metabolome data- and flux-analysis approaches in the post-genomic era. This study reviews the strengths of constraint-based analysis and network inference methods reported in the metabolic systems biology field; then elaborates on the potential paths to reconcile the two approaches to shed better light on how the metabolism functions.
Affiliation(s)
- Tunahan Cakır
  - Computational Systems Biology Group, Department of Bioengineering, Gebze Technical University (formerly known as Gebze Institute of Technology), Gebze, Turkey
- Mohammad Jafar Khatibipour
  - Computational Systems Biology Group, Department of Bioengineering, Gebze Technical University (formerly known as Gebze Institute of Technology), Gebze, Turkey; Department of Chemical Engineering, Gebze Technical University (formerly known as Gebze Institute of Technology), Gebze, Turkey
|
39
|
Qi Q, Li J, Cheng J. Reconstruction of metabolic pathways by combining probabilistic graphical model-based and knowledge-based methods. BMC Proc 2014; 8:S5. [PMID: 25374614 PMCID: PMC4202177 DOI: 10.1186/1753-6561-8-s6-s5] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 01/08/2023]
Abstract
Automatic reconstruction of metabolic pathways for an organism from genomics and transcriptomics data has been a challenging and important problem in bioinformatics. Traditionally, known reference pathways can be mapped onto organism-specific ones based on the organism's genome annotation and protein homology. However, this simple knowledge-based mapping method might produce incomplete pathways and generally cannot predict unknown new relations and reactions. In contrast, ab initio metabolic network construction methods can predict novel reactions and interactions, but their accuracy tends to be low, leading to many false positives. Here we combine existing pathway knowledge and a new ab initio Bayesian probabilistic graphical model in a novel fashion to improve automatic reconstruction of metabolic networks. Specifically, we built a knowledge database containing known, individual gene/protein interactions and metabolic reactions extracted from existing reference pathways. Known reactions and interactions were then used as constraints for Bayesian network learning methods to predict metabolic pathways. Using individual reactions and interactions extracted from different pathways of many organisms to guide pathway construction is new and improves both the coverage and accuracy of metabolic pathway construction. We applied this probabilistic knowledge-based approach to construct the metabolic networks from yeast gene expression data and compared its results with 62 known metabolic networks in the KEGG database. The experiment showed that the method improved the coverage of metabolic network construction over the traditional reference pathway mapping method and was more accurate than pure ab initio methods.
Affiliation(s)
- Qi Qi
  - Department of Computer Science, University of Missouri, Columbia, MO 65201, USA
- Jilong Li
  - Department of Computer Science, University of Missouri, Columbia, MO 65201, USA
- Jianlin Cheng
  - Department of Computer Science, University of Missouri, Columbia, MO 65201, USA; Informatics Institute, University of Missouri, Columbia, MO 65201, USA
|
40
|
Heinemann J, Noon B, Mohigmi MJ, Mazurie A, Dickensheets DL, Bothner B. Real-time digitization of metabolomics patterns from a living system using mass spectrometry. JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY 2014; 25:1755-62. [PMID: 25001378 PMCID: PMC4163111 DOI: 10.1007/s13361-014-0922-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 03/21/2014] [Revised: 04/27/2014] [Accepted: 04/28/2014] [Indexed: 05/05/2023]
Abstract
The real-time quantification of changes in intracellular metabolic activities has the potential to vastly improve upon traditional transcriptomics and metabolomics assays for the prediction of current and future cellular phenotypes. This is in part because intracellular processes reveal themselves as specific temporal patterns of variation in metabolite abundance that can be detected with existing signal processing algorithms. Although metabolite abundance levels can be quantified by mass spectrometry (MS), large-scale real-time monitoring of metabolite abundance has yet to be realized because of technological limitations for fast extraction of metabolites from cells and biological fluids. To address this issue, we have designed a microfluidic-based inline small molecule extraction system, which allows for continuous metabolomic analysis of living systems using MS. The system requires minimal supervision, and has been successful at real-time monitoring of bacteria and blood. Feature-based pattern analysis of Escherichia coli growth and stress revealed cyclic patterns and forecastable metabolic trajectories. Using these trajectories, future phenotypes could be inferred as they exhibit predictable transitions in both growth and stress related changes. Herein, we describe an interface for tracking metabolic changes directly from blood or cell suspension in real-time.
Affiliation(s)
- Joshua Heinemann
  - Department of chemistry and biochemistry, Montana State University, Bozeman, MT 59717
- Brigit Noon
  - Department of chemistry and biochemistry, Montana State University, Bozeman, MT 59717
- Mohammad J. Mohigmi
  - Electrical & computer engineering department, Montana State University, Bozeman, MT 59717
- Aurélien Mazurie
  - Bioinformatics core facility, Montana State University, Bozeman, MT 59717
- David L. Dickensheets
  - Electrical & computer engineering department, Montana State University, Bozeman, MT 59717
- Brian Bothner
  - Department of chemistry and biochemistry, Montana State University, Bozeman, MT 59717
  - Montana Microfabrication facility, Montana State University, Bozeman, MT 59717
|
41
|
Sunnåker M, Zamora-Sillero E, López García de Lomana A, Rudroff F, Sauer U, Stelling J, Wagner A. Topological augmentation to infer hidden processes in biological systems. Bioinformatics 2014; 30:221-7. [PMID: 24297519 PMCID: PMC3892687 DOI: 10.1093/bioinformatics/btt638] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 05/31/2013] [Revised: 10/28/2013] [Accepted: 10/31/2013] [Indexed: 11/12/2022]
Abstract
MOTIVATION: A common problem in understanding a biochemical system is to infer its correct structure or topology. This topology consists of all relevant state variables (usually molecules) and their interactions. Here we present a method called topological augmentation to infer this structure in a statistically rigorous and systematic way from prior knowledge and experimental data.
RESULTS: Topological augmentation starts from a simple model that is unable to explain the experimental data and augments its topology by adding new terms that capture the experimental behavior. This process is guided by representing the uncertainty in the model topology through stochastic differential equations whose trajectories contain information about missing model parts. We first apply this semiautomatic procedure to a pharmacokinetic model. This example illustrates that a global sampling of the parameter space is critical for inferring a correct model structure. We also use our method to improve our understanding of glutamine transport in yeast. This analysis shows that transport dynamics is determined by glutamine permeases with two different kinds of kinetics. Topological augmentation can not only be applied to biochemical systems, but also to any system that can be described by ordinary differential equations.
AVAILABILITY AND IMPLEMENTATION: Matlab code and examples are available at: http://www.csb.ethz.ch/tools/index
Affiliation(s)
- Mikael Sunnåker
  - Department of Biosystems Science and Engineering/Swiss Institute of Bioinformatics, ETH Zurich, 4058 Basel, Switzerland, Competence Center for Systems Physiology and Metabolic Diseases, ETH Zurich, 8093 Zurich, Switzerland, Institute of Evolutionary Biology and Environmental Studies/Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland, Institute for Molecular Systems Biology, 8093 Zurich, Switzerland and The Santa Fe Institute, Santa Fe, 87501 New Mexico, USA
|
42
|
Gennemark P, Wedelin D. ODEion--a software module for structural identification of ordinary differential equations. J Bioinform Comput Biol 2013; 12:1350015. [PMID: 24467754 DOI: 10.1142/s0219720013500157] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
In the systems biology field, algorithms for structural identification of ordinary differential equations (ODEs) have mainly focused on fixed model spaces like S-systems and/or on methods that require sufficiently good data so that derivatives can be accurately estimated. There is therefore a lack of methods and software that can handle more general models and realistic data. We present ODEion, a software module for structural identification of ODEs. Main characteristic features of the software are:
- The model space is defined by arbitrary user-defined functions that can be nonlinear in both variables and parameters, such as for example chemical rate reactions.
- ODEion implements computationally efficient algorithms that have been shown to efficiently handle sparse and noisy data. It can run a range of realistic problems that previously required a supercomputer.
- ODEion is easy to use and provides SBML output.
We describe the mathematical problem, the ODEion system itself, and provide several examples of how the system can be used. Available at: http://www.odeidentification.org.
Affiliation(s)
- Peter Gennemark
  - Mathematical Sciences, University of Gothenburg, Gothenburg, Sweden; Department of Mathematics, Uppsala University, Uppsala, Sweden
|
43
|
Wikswo JP, Curtis EL, Eagleton ZE, Evans BC, Kole A, Hofmeister LH, Matloff WJ. Scaling and systems biology for integrating multiple organs-on-a-chip. LAB ON A CHIP 2013; 13:3496-511. [PMID: 23828456 PMCID: PMC3818688 DOI: 10.1039/c3lc50243k] [Citation(s) in RCA: 182] [Impact Index Per Article: 16.5] [Indexed: 05/16/2023]
Abstract
Coupled systems of in vitro microfabricated organs-on-a-chip containing small populations of human cells are being developed to address the formidable pharmacological and physiological gaps between monolayer cell cultures, animal models, and humans that severely limit the speed and efficiency of drug development. These gaps present challenges not only in tissue and microfluidic engineering, but also in systems biology: how does one model, test, and learn about the communication and control of biological systems with individual organs-on-chips that are one-thousandth or one-millionth of the size of adult organs, or even smaller, i.e., organs for a milliHuman (mHu) or microHuman (μHu)? Allometric scaling that describes inter-species variation of organ size and properties provides some guidance, but given the desire to utilize these systems to extend and validate human pharmacokinetic and pharmacodynamic (PK/PD) models in support of drug discovery and development, it is more appropriate to scale each organ functionally to ensure that it makes the suitable physiological contribution to the coupled system. The desire to recapitulate the complex organ-organ interactions that result from factors in the blood and lymph places a severe constraint on the total circulating fluid (~5 mL for a mHu and ~5 μL for a μHu) and hence on the pumps, valves, and analytical instruments required to maintain and study these systems. Scaling arguments also provide guidance on the design of a universal cell-culture medium, typically without red blood cells. This review presents several examples of scaling arguments and discusses steps that should ensure the success of this endeavour.
Affiliation(s)
- John P Wikswo
  - Vanderbilt Institute for Integrative Biosystems Research and Education, Vanderbilt University, Nashville, TN 37235, USA.
|
44
|
Wikswo JP, Block FE, Cliffel DE, Goodwin CR, Marasco CC, Markov DA, McLean DL, McLean JA, McKenzie JR, Reiserer RS, Samson PC, Schaffer DK, Seale KT, Sherrod SD. Engineering challenges for instrumenting and controlling integrated organ-on-chip systems. IEEE Trans Biomed Eng 2013; 60:682-90. [PMID: 23380852 PMCID: PMC3696887 DOI: 10.1109/tbme.2013.2244891] [Citation(s) in RCA: 132] [Impact Index Per Article: 12.0] [Indexed: 01/24/2023]
Abstract
The sophistication and success of recently reported microfabricated organs-on-chips and human organ constructs have made it possible to design scaled and interconnected organ systems that may significantly augment the current drug development pipeline and lead to advances in systems biology. Physiologically realistic live microHuman (μHu) and milliHuman (mHu) systems operating for weeks to months present exciting and important engineering challenges such as determining the appropriate size for each organ to ensure appropriate relative organ functional activity, achieving appropriate cell density, providing the requisite universal perfusion media, sensing the breadth of physiological responses, and maintaining stable control of the entire system, while maintaining fluid scaling that consists of ~5 mL for the mHu and ~5 μL for the μHu. We believe that successful mHu and μHu systems for drug development and systems biology will require low-volume microdevices that support chemical signaling, microfabricated pumps, valves and microformulators, automated optical microscopy, electrochemical sensors for rapid metabolic assessment, ion mobility-mass spectrometry for real-time molecular analysis, advanced bioinformatics, and machine learning algorithms for automated model inference and integrated electronic control. Toward this goal, we are building functional prototype components and are working toward top-down system integration.
Affiliation(s)
- John P. Wikswo
- Departments of Biomedical Engineering, Molecular Physiology & Biophysics, and Physics & Astronomy, Vanderbilt University, Nashville, TN 37235-1807 USA
- Frank E. Block
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN 37235-1631 USA
- David E. Cliffel
- Department of Chemistry, Vanderbilt University, Nashville, TN 37235-1822 USA
- Cody R. Goodwin
- Department of Chemistry, Vanderbilt University, Nashville, TN 37235-1822 USA
- Christina C. Marasco
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN 37235-1631 USA
- Dmitry A. Markov
- Department of Cancer Biology, Vanderbilt University, Nashville, TN 37232-6840 USA
- David L. McLean
- Department of Physics & Astronomy, Vanderbilt University, Nashville, TN 37235-1807 USA
- John A. McLean
- Department of Chemistry, Vanderbilt University, Nashville, TN 37235-1822 USA
- Ronald S. Reiserer
- Department of Physics & Astronomy, Vanderbilt University, Nashville, TN 37235-1807 USA
- Philip C. Samson
- Department of Physics & Astronomy, Vanderbilt University, Nashville, TN 37235-1807 USA
- David K. Schaffer
- Department of Physics & Astronomy, Vanderbilt University, Nashville, TN 37235-1807 USA
- Kevin T. Seale
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN 37235-1631 USA
- Stacy D. Sherrod
- Department of Physics & Astronomy, Vanderbilt University, Nashville, TN 37235-1807 USA
|
45
|
Broderick G, Craddock TJA. Systems biology of complex symptom profiles: capturing interactivity across behavior, brain and immune regulation. Brain Behav Immun 2013; 29:1-8. [PMID: 23022717 PMCID: PMC3554865 DOI: 10.1016/j.bbi.2012.09.008] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 07/21/2012] [Revised: 09/13/2012] [Accepted: 09/14/2012] [Indexed: 12/15/2022]
Abstract
As our thinking about the basic principles of biology and medicine continues to evolve, the importance of context and regulatory interaction is becoming increasingly obvious. Biochemical and physiological components do not exist in isolation but instead are part of a tightly integrated network of interacting elements that ensure robustness and support the emergence of complex behavior. This integration permeates all levels of biology, from gene regulation to immune cell signaling to coordinated patterns of neuronal activity and the resulting psychosocial interaction. Systems biology is an emerging branch of science that sits as a translational catalyst at the interface of the life and computational sciences. While there is no universally accepted definition of systems biology, we attempt to provide an overview of some of the basic unifying concepts and current efforts in the field as they apply to illnesses where brain and subsequent behavior are a chief component, for example autism, schizophrenia, depression, and others. Methods in this field currently constitute a broad mosaic that stretches across multiple scales of biology and physiological compartments. While this work by no means constitutes an exhaustive list of all these methods, it highlights the principal sub-disciplines presently driving the field as well as future directions of progress.
Affiliation(s)
- Gordon Broderick
- Department of Medicine, University of Alberta, Edmonton, Canada.
|
46
|
Lecca P, Priami C. Biological network inference for drug discovery. Drug Discov Today 2012; 18:256-64. [PMID: 23147668 DOI: 10.1016/j.drudis.2012.11.001] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 06/15/2012] [Revised: 10/04/2012] [Accepted: 11/05/2012] [Indexed: 12/31/2022]
Abstract
A better understanding of pathophysiology should help deliver drugs whose targets are involved in the causative processes underlying a disease. Biological network inference uses computational methods to deduce, from high-throughput experimental data, the topology and the causal structure of the interactions among drugs and their targets. Therefore, biological network inference can support and contribute to the experimental identification of both the gene and protein networks causing a disease and the biochemical networks of drug metabolism and mechanisms of action. The resulting high-level networks serve as a foundational basis for more detailed mechanistic models and are increasingly used in drug discovery by pharmaceutical and biotechnology companies. We review and compare recent computational technologies for network inference applied to drug discovery.
Affiliation(s)
- Paola Lecca
- The Microsoft Research - University of Trento Centre for Computational and Systems Biology, Piazza Manifattura 1, 38068 Rovereto, Italy.
|