1
|
Zhang H, Chen J, Tian T. Bayesian Inference of Stochastic Dynamic Models Using Early-Rejection Methods Based on Sequential Stochastic Simulations. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1484-1494. [PMID: 33216717 DOI: 10.1109/tcbb.2020.3039490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Stochastic modelling is an important method to investigate the functions of noise in a wide range of biological systems. However, the parameter inference for stochastic models is still a challenging problem partially due to the large computing time required for stochastic simulations. To address this issue, we propose a novel early-rejection method by using sequential stochastic simulations. We first show that a large number of stochastic simulations are required to obtain reliable inference results. Instead of generating a large number of simulations for each parameter sample, we propose to generate these simulations in a number of stages. The simulation process will go to the next stage only if the accuracy of simulations at the current stage satisfies a given error criterion. We propose a formula to determine the error criterion and use a stochastic differential equation model to examine the effects of different criteria. Three biochemical network models are used to evaluate the efficiency and accuracy of the proposed method. Numerical results suggest the proposed early-rejection method achieves substantial improvement in the efficiency for the inference of stochastic models.
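The staged idea described in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the pure-decay test model, the stage sizes, the tolerances, and the function names `simulate_decay_count` and `staged_simulation`. In particular, the fixed per-stage tolerances stand in for the paper's error-criterion formula, which the abstract does not state.

```python
import random

def simulate_decay_count(rate, x0=100, t_end=1.0, rng=random):
    """Exact sample of a pure-decay process X -> 0: molecules left at t_end.

    Each of the x0 molecules has an independent exponential lifetime.
    """
    return sum(rng.expovariate(rate) > t_end for _ in range(x0))

def staged_simulation(rate, observed, stage_sizes, tolerances, rng=random):
    """Run simulations in stages and stop as soon as a stage fails its tolerance.

    Returns (accepted, number_of_simulations_used).
    """
    samples = []
    for size, tol in zip(stage_sizes, tolerances):
        samples.extend(simulate_decay_count(rate, rng=rng) for _ in range(size))
        error = abs(sum(samples) / len(samples) - observed)
        if error > tol:  # hypothetical criterion, not the paper's formula
            return False, len(samples)
    return True, len(samples)

rng = random.Random(0)
observed = 37  # roughly 100 * exp(-1), i.e. data consistent with rate = 1
stages, tols = (10, 40, 150), (15.0, 10.0, 8.0)
good = staged_simulation(1.0, observed, stages, tols, rng)
bad = staged_simulation(5.0, observed, stages, tols, rng)
```

In this sketch a poor parameter sample (rate 5) is discarded after the first 10 simulations, while a plausible one (rate 1) goes through all 200, which is where the computational saving would come from.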
|
2
|
The identifiability of gene regulatory networks: the role of observation data. J Biol Phys 2022; 48:93-110. [PMID: 34988715 PMCID: PMC8866611 DOI: 10.1007/s10867-021-09595-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2020] [Accepted: 11/07/2021] [Indexed: 10/19/2022]
Abstract
Identifying gene regulatory networks (GRNs) from observation data is important for understanding biological systems. Conventional studies focus on improving the performance of identification algorithms. However, besides algorithm performance, GRN identification depends strongly on the observation data. In this work, for three GRN S-system models, three observation data collection schemes are used to perform the identifiability test procedure. A modified genetic algorithm-particle swarm optimization algorithm, which includes a multi-level mutation operation and a velocity limitation strategy, is proposed to implement this task. The results show that, in scheme 1 (starting from a special initial condition), the GRN systems are identifiable given sufficient transient observation data. In scheme 2, the observation data lack sufficient system dynamics, and the GRN systems are not identifiable even though the state trajectories can be reproduced. For a special case of scheme 2, steady-state observation data, an equilibrium point analysis is given to explain why such data are infeasible for GRN identification. In schemes 1 and 2, the observation data are obtained from zero-input GRN systems, which eventually evolve to the steady state. The sufficient transient observation data in scheme 1 can be obtained by changing the experimental conditions. Additionally, valid observation data can also be obtained by adding an impulse excitation signal to the GRN systems (scheme 3). Consequently, the GRN systems are identifiable using scheme 3. Owing to their universality and simplicity, these results provide a guide for biologists to collect valid observation data for identifying GRNs and to further understand GRN dynamics.
|
3
|
Deng Z, Zhang X, Tian T. Inference of Model Parameters Using Particle Filter Algorithm and Copula Distributions. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:1231-1240. [PMID: 30418916 DOI: 10.1109/tcbb.2018.2880974] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 06/09/2023]
Abstract
It is widely accepted that experimental data often include noise because of limitations in experimental conditions. In addition, biological systems inside cells also contain uncertainty due to small molecular copy numbers. To address this issue, it was proposed that experimental data include both the real system state and a noise term whose variance is constant. An additional assumption is that the observation data of different variables are independent of each other. However, recent research showed that noise in experimental data might not be white noise. In addition, the observed values of different variables may be correlated. This work designs a new algorithm to infer the unknown model parameters based on noisy data. The innovation of this method includes a new noise model, in which the variance of the noise depends on the system state, and a copula particle filter algorithm that uses copula density functions to describe the dependence between different variables. The proposed algorithm is evaluated by using two deterministic models for gene networks and a stochastic model. Numerical results show that the accuracy of our proposed method is better than that of the widely used Liu-West filter and copula particle filter algorithms.
|
4
|
Bakhteh S, Ghaffari-Hadigheh A, Chaparzadeh N. Identification of Minimum Set of Master Regulatory Genes in Gene Regulatory Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:999-1009. [PMID: 30334767 DOI: 10.1109/tcbb.2018.2875692] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 06/08/2023]
Abstract
Identification of master regulatory genes is one of the primary challenges in systems biology. The minimum dominating set problem is a powerful paradigm for analyzing such complex networks. In these models, genes are nodes and their interactions are edges. Here, members of a minimal dominating set could be regarded as master genes. As finitely many minimum dominating sets may exist in a network, it is difficult to identify which one represents the most appropriate set of master genes. In this paper, we develop a weighted gene regulatory network problem with two objectives as a version of the dominating set problem. The collective influence of each gene is considered as its weight. The first objective aims to find a set of master regulatory genes with minimum cardinality, and the second objective identifies the one with maximum weight. The model is converted to a single objective using a parameter varying between zero and one. The model is implemented on three human networks, and the results are reported and compared with the existing weighted network model. Parametric programming in linear optimization and logistic regression are also applied to the resulting relaxed problem to provide a deeper understanding of the results. The computational results of the parametric analysis show that, for some ranges of priorities in the objectives, the identified master regulatory genes are invariant, while some genes are identified for all priorities. This indicates that such genes are more likely to be master regulatory genes, especially in noisy networks.
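As a rough illustration of the two-objective dominating set formulation, the sketch below enumerates all dominating sets of a small undirected toy graph and scores them with a single scalarized objective. The scalarization `lam * |S| - (1 - lam) * weight(S)`, the toy graph, the weights, and the function names are assumptions for illustration; the abstract does not give the exact form of the combined objective, and brute-force enumeration is only workable for very small networks.

```python
from itertools import combinations

def is_dominating(nodes, adjacency):
    """True if every node is in `nodes` or adjacent to a member of `nodes`."""
    covered = set(nodes)
    for n in nodes:
        covered |= adjacency[n]
    return covered == set(adjacency)

def best_master_set(adjacency, weight, lam):
    """Brute-force search minimizing lam*|S| - (1-lam)*sum(weights in S).

    lam in [0, 1] trades off small cardinality against large total weight
    (an assumed scalarization, not necessarily the one used in the paper).
    """
    best, best_score = None, float("inf")
    all_nodes = sorted(adjacency)
    for k in range(1, len(all_nodes) + 1):
        for subset in combinations(all_nodes, k):
            if not is_dominating(subset, adjacency):
                continue
            score = lam * len(subset) - (1 - lam) * sum(weight[n] for n in subset)
            if score < best_score:
                best, best_score = set(subset), score
    return best

# Toy path graph a - b - c - d - e with hypothetical influence weights.
adjacency = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
             "d": {"c", "e"}, "e": {"d"}}
weight = {"a": 1, "b": 3, "c": 1, "d": 2, "e": 1}
masters = best_master_set(adjacency, weight, lam=0.9)
```

The path graph has three minimum dominating sets of size two, and the weight term is what breaks the tie between them, which mirrors the motivation given in the abstract.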
|
5
|
Loskot P, Atitey K, Mihaylova L. Comprehensive Review of Models and Methods for Inferences in Bio-Chemical Reaction Networks. Front Genet 2019; 10:549. [PMID: 31258548 PMCID: PMC6588029 DOI: 10.3389/fgene.2019.00549] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 02/08/2019] [Accepted: 05/24/2019] [Indexed: 01/30/2023]
Abstract
The key processes in biological and chemical systems are described by networks of chemical reactions. From molecular biology to biotechnology applications, computational models of reaction networks are used extensively to elucidate their non-linear dynamics. The model dynamics are crucially dependent on the parameter values which are often estimated from observations. Over the past decade, the interest in parameter and state estimation in models of (bio-) chemical reaction networks (BRNs) grew considerably. The related inference problems are also encountered in many other tasks including model calibration, discrimination, identifiability, and checking, and optimum experiment design, sensitivity analysis, and bifurcation analysis. The aim of this review paper is to examine the developments in literature to understand what BRN models are commonly used, and for what inference tasks and inference methods. The initial collection of about 700 documents concerning estimation problems in BRNs excluding books and textbooks in computational biology and chemistry were screened to select over 270 research papers and 20 graduate research theses. The paper selection was facilitated by text mining scripts to automate the search for relevant keywords and terms. The outcomes are presented in tables revealing the levels of interest in different inference tasks and methods for given models in the literature, and the research trends are uncovered. Our findings indicate that many combinations of models, tasks and methods are still relatively unexplored, and there are many new research opportunities to explore combinations that have not been considered, perhaps for good reasons. The most common models of BRNs in literature involve differential equations, Markov processes, mass action kinetics, and state space representations whereas the most common tasks are the parameter inference and model identification. The most common methods in literature are Bayesian analysis, Monte Carlo sampling strategies, and model fitting to data using evolutionary algorithms. The new research problems which cannot be directly deduced from the text mining data are also discussed.
Affiliation(s)
- Pavel Loskot
  - College of Engineering, Swansea University, Swansea, United Kingdom
- Komlan Atitey
  - College of Engineering, Swansea University, Swansea, United Kingdom
- Lyudmila Mihaylova
  - Department of Automatic Control and Systems Engineering, University of Sheffield, Sheffield, United Kingdom
|
6
|
Jeong JE, Qiu P. Quantifying the relative importance of experimental data points in parameter estimation. BMC SYSTEMS BIOLOGY 2018; 12:103. [PMID: 30463558 PMCID: PMC6249737 DOI: 10.1186/s12918-018-0622-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
Abstract
BACKGROUND Ordinary differential equations (ODEs) are often used to understand biological processes. Since ODE-based models usually contain many unknown parameters, parameter estimation is an important step toward deeper understanding of the process. Parameter estimation is often formulated as a least squares optimization problem, where all experimental data points are considered as equally important. However, this equal-weight formulation ignores the possibility of existence of relative importance among different data points, and may lead to misleading parameter estimation results. Therefore, we propose to introduce weights to account for the relative importance of different data points when formulating the least squares optimization problem. Each weight is defined by the uncertainty of one data point given the other data points. If one data point can be accurately inferred given the other data, the uncertainty of this data point is low and the importance of this data point is low. Whereas, if inferring one data point from the other data is almost impossible, it contains a huge uncertainty and carries more information for estimating parameters. RESULTS G1/S transition model with 6 parameters and 12 parameters, and MAPK module with 14 parameters were used to test the weighted formulation. In each case, evenly spaced experimental data points were used. Weights calculated in these models showed similar patterns: high weights for data points in dynamic regions and low weights for data points in flat regions. We developed a sampling algorithm to evaluate the weighted formulation, and demonstrated that the weighted formulation reduced the redundancy in the data. For G1/S transition model with 12 parameters, we examined unevenly spaced experimental data points, strategically sampled to have more measurement points where the weights were relatively high, and fewer measurement points where the weights were relatively low. This analysis showed that the proposed weights can be used for designing measurement time points. CONCLUSIONS Giving a different weight to each data point according to its relative importance compared to other data points is an effective method for improving robustness of parameter estimation by reducing the redundancy in the experimental data.
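The weighted least squares formulation mentioned in this abstract can be sketched as follows. The exponential-decay model, the grid search, and the weight values are assumptions chosen for illustration; the paper derives its weights from the uncertainty of each data point given the others, which is not reproduced here. The hypothetical weights below simply follow the reported pattern of higher values in the dynamic region and lower values in the flat tail.

```python
import math

def weighted_sse(k, times, data, weights):
    """Weighted sum of squared residuals for the model y(t) = exp(-k * t)."""
    return sum(w * (math.exp(-k * t) - y) ** 2
               for t, y, w in zip(times, data, weights))

def fit_rate(times, data, weights, grid):
    """Pick the candidate rate on the grid with the smallest weighted error."""
    return min(grid, key=lambda k: weighted_sse(k, times, data, weights))

times = [0.0, 1.0, 2.0, 3.0, 4.0, 8.0, 9.0, 10.0]
data = [math.exp(-0.5 * t) for t in times]        # noise-free synthetic data
weights = [1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2, 0.2]  # hypothetical weights
grid = [i / 10 for i in range(1, 21)]             # candidate rates 0.1 .. 2.0
best_k = fit_rate(times, data, weights, grid)
```

With noise-free data any positive weights recover the generating rate; the weights matter once the data are noisy or redundant, which is the case the abstract addresses.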
Affiliation(s)
- Jenny E. Jeong
  - Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, 30332 GA USA
- Peng Qiu
  - Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, 30332 GA USA
|
7
|
Zhang YM, Zhang Y, Guo M. Epigenetic game theory and its application in plants: Comment on: "Epigenetic game theory: How to compute the epigenetic control of maternal-to-zygotic transition" by Qian Wang et al. Phys Life Rev 2017; 20:158-160. [PMID: 28094142 DOI: 10.1016/j.plrev.2017.01.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/05/2017] [Accepted: 01/11/2017] [Indexed: 11/19/2022]
Affiliation(s)
- Yuan-Ming Zhang
  - Statistical Genomics Laboratory, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Yinghao Zhang
  - Statistical Genomics Laboratory, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Mingyue Guo
  - Statistical Genomics Laboratory, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
|
8
|
Kumar H, Tichkule S, Raj U, Gupta S, Srivastava S, Varadwaj PK. Effect of STAT3 inhibitor in chronic myeloid leukemia associated signaling pathway: a mathematical modeling, simulation and systems biology study. 3 Biotech 2016; 6:40. [PMID: 28330111 PMCID: PMC4729759 DOI: 10.1007/s13205-015-0357-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/05/2015] [Accepted: 12/29/2015] [Indexed: 10/31/2022]
Abstract
Chronic myeloid leukemia (CML) is a hematopoietic stem-cell disorder which proliferates due to abnormal growth of basophil cells. Several proangiogenic molecules, including the hepatocyte growth factor (HGF), have been reported to be associated with CML progression. However, the detailed mechanism of the cellular distribution and function of HGF in CML is yet to be revealed. The proliferation of hematopoietic cells is regulated by growth factors such as interleukin 3 (IL-3), IL-6, erythropoietin, and thrombopoietin. In this study, the IL-6 pathway, which induces the JAK/STAT and MAPK pathways, is considered in order to decipher the CML progression stages. These pathways are modeled with ordinary differential equations (ODEs), and unknown parameters are estimated with the fminsearch optimization algorithm. Specific components of the pathway, such as STAT3, have been analyzed in detail and their role in CML progression has been elucidated. The role of STAT3 inhibitors in the treatment of CML has been studied and the optimum concentration of the inhibitors has been predicted.
|
9
|
Bayesian Computation Methods for Inferring Regulatory Network Models Using Biomedical Data. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2016; 939:289-307. [DOI: 10.1007/978-981-10-1503-8_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
10
|
Ershov YA. Kinetic models of conjugated metabolic cycles. RUSSIAN JOURNAL OF PHYSICAL CHEMISTRY A 2015. [DOI: 10.1134/s0036024416010088] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/23/2022]
|
11
|
Kumar H, Tichkule S, Raj U, Gupta S, Srivastava S, Varadwaj PK. Parameters Involved in Autophosphorylation in Chronic Myeloid Leukemia: a Systems Biology Approach. Asian Pac J Cancer Prev 2015. [PMID: 26225665 DOI: 10.7314/apjcp.2015.16.13.5273] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
Abstract
BACKGROUND Chronic myeloid leukemia (CML) is a stem cell disorder characterized by the fusion of two oncogenes, BCR and ABL, with their aberrant expression. Autophosphorylation of BCR-ABL oncogenes results in proliferation of CML. The study deals with estimation of the rate constants involved in each step of the cellular autophosphorylation process, which play important roles in the proliferation of cancerous cells. MATERIALS AND METHODS A mathematical model was proposed for autophosphorylation of BCR-ABL oncogenes using ordinary differential equations to describe the rate of change of each system component. The major difficulty in modeling this process is the lack of experimental data, which are needed to estimate unknown model parameters. Initial concentration data of each substrate and product for BCR-ABL systems were collected from the reported literature. All parameters were optimized through time interval simulation using the fminsearch algorithm. RESULTS The rate of change versus time was estimated to indicate the role of each state variable that is crucial for the system. The time-wise change in substrate concentration shows the convergence of each parameter in the autophosphorylation process. CONCLUSIONS The role of each constituent parameter and their relative time-dependent variations in the autophosphorylation process could be inferred.
Affiliation(s)
- Himansu Kumar
  - Department of Bioinformatics, Indian Institute of Information Technology, Allahabad, India
|
12
|
Wu Q, Smith-Miles K, Tian T. Approximate Bayesian computation schemes for parameter inference of discrete stochastic models using simulated likelihood density. BMC Bioinformatics 2014; 15 Suppl 12:S3. [PMID: 25473744 PMCID: PMC4243104 DOI: 10.1186/1471-2105-15-s12-s3] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 11/10/2022]
Abstract
BACKGROUND Mathematical modeling is an important tool in systems biology to study the dynamic property of complex biological systems. However, one of the major challenges in systems biology is how to infer unknown parameters in mathematical models based on the experimental data sets, in particular, when the data are sparse and the regulatory network is stochastic. RESULTS To address this issue, this work proposed a new algorithm to estimate parameters in stochastic models using simulated likelihood density in the framework of approximate Bayesian computation. Two stochastic models were used to demonstrate the efficiency and effectiveness of the proposed method. In addition, we designed another algorithm based on a novel objective function to measure the accuracy of stochastic simulations. CONCLUSIONS Simulation results suggest that the usage of simulated likelihood density improves the accuracy of estimates substantially. When the error is measured at each observation time point individually, the estimated parameters have better accuracy than those obtained by a published method in which the error is measured using simulations over the entire observation time period.
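For orientation, a plain rejection-style approximate Bayesian computation loop, with the error checked at each observation time point individually, might look like the sketch below. This is the textbook rejection scheme rather than the simulated-likelihood-density algorithm proposed in the paper; the pure-decay model, the uniform prior, the tolerance `eps`, and the function names are all assumptions for illustration.

```python
import random

def simulate_counts(rate, times, x0, rng):
    """Exact sample of a pure-decay process X -> 0 observed at `times`.

    Each molecule has an independent exponential lifetime; the count at
    time t is the number of molecules still alive.
    """
    lifetimes = [rng.expovariate(rate) for _ in range(x0)]
    return [sum(life > t for life in lifetimes) for t in times]

def abc_rejection(observed, times, x0, n_draws, eps, rng):
    """Keep prior draws whose simulation is within eps at every time point."""
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(0.1, 3.0)  # assumed uniform prior
        sim = simulate_counts(rate, times, x0, rng)
        if all(abs(s - o) <= eps for s, o in zip(sim, observed)):
            accepted.append(rate)
    return accepted

rng = random.Random(1)
times = [0.5, 1.0, 1.5]
observed = [61, 37, 22]  # roughly 100 * exp(-t), i.e. data for rate = 1
posterior = abc_rejection(observed, times, x0=100, n_draws=500, eps=8, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
```

Checking every time point separately, rather than a single distance summed over the whole trajectory, is the design choice the abstract reports as giving more accurate estimates.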
|