1
|
Zhang W, Zhang F, Zhang J, Wang N. Hierarchical parameter estimation of GRN based on topological analysis. IET Syst Biol 2019; 12:294-303. [PMID: 30472694 DOI: 10.1049/iet-syb.2018.5015] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022] Open
Abstract
Reverse engineering of gene regulatory network (GRN) is an important and challenging task in systems biology. Existing parameter estimation approaches that compute model parameters with the same importance are usually computationally expensive or infeasible, especially in dealing with complex biological networks.In order to improve the efficiency of computational modeling, the paper applies a hierarchical estimation methodology in computational modeling of GRN based on topological analysis. This paper divides nodes in a network into various priority levels using the graph-based measure and genetic algorithm. The nodes in the first level, that correspond to root strongly connected components(SCC) in the digraph of GRN, are given top priority in parameter estimation. The estimated parameters of vertices in the previous priority level ARE used to infer the parameters for nodes in the next priority level. The proposed hierarchical estimation methodology obtains lower error indexes while consuming less computational resources compared with single estimation methodology. Experimental outcomes with insilico networks and a realistic network show that gene networks are decomposed into no more than four levels, which is consistent with the properties of inherent modularity for GRN. In addition, the proposed hierarchical parameter estimation achieves a balance between computational efficiency and accuracy.
Collapse
Affiliation(s)
- Wei Zhang
- Department of Control Science and Engineering, Zhejiang University, Zheda Road 38, Hangzhou, People's Republic of China
| | - Feng Zhang
- Department of Control Science and Engineering, Zhejiang University, Zheda Road 38, Hangzhou, People's Republic of China
| | - Jianming Zhang
- Department of Control Science and Engineering, Zhejiang University, Zheda Road 38, Hangzhou, People's Republic of China.
| | - Ning Wang
- Department of Control Science and Engineering, Zhejiang University, Zheda Road 38, Hangzhou, People's Republic of China
| |
Collapse
|
2
|
Loskot P, Atitey K, Mihaylova L. Comprehensive Review of Models and Methods for Inferences in Bio-Chemical Reaction Networks. Front Genet 2019; 10:549. [PMID: 31258548 PMCID: PMC6588029 DOI: 10.3389/fgene.2019.00549] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2019] [Accepted: 05/24/2019] [Indexed: 01/30/2023] Open
Abstract
The key processes in biological and chemical systems are described by networks of chemical reactions. From molecular biology to biotechnology applications, computational models of reaction networks are used extensively to elucidate their non-linear dynamics. The model dynamics are crucially dependent on the parameter values which are often estimated from observations. Over the past decade, the interest in parameter and state estimation in models of (bio-) chemical reaction networks (BRNs) grew considerably. The related inference problems are also encountered in many other tasks including model calibration, discrimination, identifiability, and checking, and optimum experiment design, sensitivity analysis, and bifurcation analysis. The aim of this review paper is to examine the developments in literature to understand what BRN models are commonly used, and for what inference tasks and inference methods. The initial collection of about 700 documents concerning estimation problems in BRNs excluding books and textbooks in computational biology and chemistry were screened to select over 270 research papers and 20 graduate research theses. The paper selection was facilitated by text mining scripts to automate the search for relevant keywords and terms. The outcomes are presented in tables revealing the levels of interest in different inference tasks and methods for given models in the literature as well as the research trends are uncovered. Our findings indicate that many combinations of models, tasks and methods are still relatively unexplored, and there are many new research opportunities to explore combinations that have not been considered-perhaps for good reasons. The most common models of BRNs in literature involve differential equations, Markov processes, mass action kinetics, and state space representations whereas the most common tasks are the parameter inference and model identification. The most common methods in literature are Bayesian analysis, Monte Carlo sampling strategies, and model fitting to data using evolutionary algorithms. The new research problems which cannot be directly deduced from the text mining data are also discussed.
Collapse
Affiliation(s)
- Pavel Loskot
- College of Engineering, Swansea University, Swansea, United Kingdom
| | - Komlan Atitey
- College of Engineering, Swansea University, Swansea, United Kingdom
| | - Lyudmila Mihaylova
- Department of Automatic Control and Systems Engineering, University of Sheffield, Sheffield, United Kingdom
| |
Collapse
|
3
|
Dai H, Umarov R, Kuwahara H, Li Y, Song L, Gao X. Sequence2Vec: a novel embedding approach for modeling transcription factor binding affinity landscape. Bioinformatics 2017; 33:3575-3583. [PMID: 28961686 PMCID: PMC5870668 DOI: 10.1093/bioinformatics/btx480] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2017] [Revised: 07/07/2017] [Accepted: 07/26/2017] [Indexed: 12/29/2022] Open
Abstract
MOTIVATION An accurate characterization of transcription factor (TF)-DNA affinity landscape is crucial to a quantitative understanding of the molecular mechanisms underpinning endogenous gene regulation. While recent advances in biotechnology have brought the opportunity for building binding affinity prediction methods, the accurate characterization of TF-DNA binding affinity landscape still remains a challenging problem. RESULTS Here we propose a novel sequence embedding approach for modeling the transcription factor binding affinity landscape. Our method represents DNA binding sequences as a hidden Markov model which captures both position specific information and long-range dependency in the sequence. A cornerstone of our method is a novel message passing-like embedding algorithm, called Sequence2Vec, which maps these hidden Markov models into a common nonlinear feature space and uses these embedded features to build a predictive model. Our method is a novel combination of the strength of probabilistic graphical models, feature space embedding and deep learning. We conducted comprehensive experiments on over 90 large-scale TF-DNA datasets which were measured by different high-throughput experimental technologies. Sequence2Vec outperforms alternative machine learning methods as well as the state-of-the-art binding affinity prediction methods. AVAILABILITY AND IMPLEMENTATION Our program is freely available at https://github.com/ramzan1990/sequence2vec. CONTACT xin.gao@kaust.edu.sa or lsong@cc.gatech.edu. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Hanjun Dai
- College of Computing, Georgia Institute of Technology, Atlanta, GA, USA
| | - Ramzan Umarov
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal, Saudi Arabia
| | - Hiroyuki Kuwahara
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal, Saudi Arabia
| | - Yu Li
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal, Saudi Arabia
| | - Le Song
- College of Computing, Georgia Institute of Technology, Atlanta, GA, USA
| | - Xin Gao
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal, Saudi Arabia
| |
Collapse
|
4
|
Fujii C, Kuwahara H, Yu G, Guo L, Gao X. Learning gene regulatory networks from gene expression data using weighted consensus. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2016.02.087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
5
|
Dahlquist KD, Fitzpatrick BG, Camacho ET, Entzminger SD, Wanner NC. Parameter Estimation for Gene Regulatory Networks from Microarray Data: Cold Shock Response in Saccharomyces cerevisiae. Bull Math Biol 2015; 77:1457-92. [PMID: 26420504 PMCID: PMC4636536 DOI: 10.1007/s11538-015-0092-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2015] [Accepted: 07/16/2015] [Indexed: 11/10/2022]
Abstract
We investigated the dynamics of a gene regulatory network controlling the cold shock response in budding yeast, Saccharomyces cerevisiae. The medium-scale network, derived from published genome-wide location data, consists of 21 transcription factors that regulate one another through 31 directed edges. The expression levels of the individual transcription factors were modeled using mass balance ordinary differential equations with a sigmoidal production function. Each equation includes a production rate, a degradation rate, weights that denote the magnitude and type of influence of the connected transcription factors (activation or repression), and a threshold of expression. The inverse problem of determining model parameters from observed data is our primary interest. We fit the differential equation model to published microarray data using a penalized nonlinear least squares approach. Model predictions fit the experimental data well, within the 95 % confidence interval. Tests of the model using randomized initial guesses and model-generated data also lend confidence to the fit. The results have revealed activation and repression relationships between the transcription factors. Sensitivity analysis indicates that the model is most sensitive to changes in the production rate parameters, weights, and thresholds of Yap1, Rox1, and Yap6, which form a densely connected core in the network. The modeling results newly suggest that Rap1, Fhl1, Msn4, Rph1, and Hsf1 play an important role in regulating the early response to cold shock in yeast. Our results demonstrate that estimation for a large number of parameters can be successfully performed for nonlinear dynamic gene regulatory networks using sparse, noisy microarray data.
Collapse
Affiliation(s)
- Kam D Dahlquist
- Department of Biology, Loyola Marymount University, 1 LMU Drive, MS 8888, Los Angeles, CA, 90045, USA.
| | - Ben G Fitzpatrick
- Department of Mathematics, Loyola Marymount University, 1 LMU Drive, UH 2700, Los Angeles, CA, 90045, USA
| | - Erika T Camacho
- School of Mathematical and Natural Sciences, Arizona State University, Mail Code 2352, P.O. Box 37100, Phoenix, AZ, 85069-7100, USA
| | - Stephanie D Entzminger
- Department of Mathematics, Loyola Marymount University, 1 LMU Drive, UH 2700, Los Angeles, CA, 90045, USA
| | - Nathan C Wanner
- Department of Mathematics, Loyola Marymount University, 1 LMU Drive, UH 2700, Los Angeles, CA, 90045, USA
| |
Collapse
|
6
|
Fan M, Kuwahara H, Wang X, Wang S, Gao X. Parameter estimation methods for gene circuit modeling from time-series mRNA data: a comparative study. Brief Bioinform 2015; 16:987-99. [PMID: 25818863 DOI: 10.1093/bib/bbv015] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2014] [Indexed: 11/14/2022] Open
Abstract
Parameter estimation is a challenging computational problem in the reverse engineering of biological systems. Because advances in biotechnology have facilitated wide availability of time-series gene expression data, systematic parameter estimation of gene circuit models from such time-series mRNA data has become an important method for quantitatively dissecting the regulation of gene expression. By focusing on the modeling of gene circuits, we examine here the performance of three types of state-of-the-art parameter estimation methods: population-based methods, online methods and model-decomposition-based methods. Our results show that certain population-based methods are able to generate high-quality parameter solutions. The performance of these methods, however, is heavily dependent on the size of the parameter search space, and their computational requirements substantially increase as the size of the search space increases. In comparison, online methods and model decomposition-based methods are computationally faster alternatives and are less dependent on the size of the search space. Among other things, our results show that a hybrid approach that augments computationally fast methods with local search as a subsequent refinement procedure can substantially increase the quality of their parameter estimates to the level on par with the best solution obtained from the population-based methods while maintaining high computational speed. These suggest that such hybrid methods can be a promising alternative to the more commonly used population-based methods for parameter estimation of gene circuit models when limited prior knowledge about the underlying regulatory mechanisms makes the size of the parameter search space vastly large.
Collapse
|
7
|
Wang X, Kuwahara H, Gao X. Modeling DNA affinity landscape through two-round support vector regression with weighted degree kernels. BMC SYSTEMS BIOLOGY 2014; 8 Suppl 5:S5. [PMID: 25605483 PMCID: PMC4305984 DOI: 10.1186/1752-0509-8-s5-s5] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 01/17/2023]
Abstract
BACKGROUND A quantitative understanding of interactions between transcription factors (TFs) and their DNA binding sites is key to the rational design of gene regulatory networks. Recent advances in high-throughput technologies have enabled high-resolution measurements of protein-DNA binding affinity. Importantly, such experiments revealed the complex nature of TF-DNA interactions, whereby the effects of nucleotide changes on the binding affinity were observed to be context dependent. A systematic method to give high-quality estimates of such complex affinity landscapes is, thus, essential to the control of gene expression and the advance of synthetic biology. RESULTS Here, we propose a two-round prediction method that is based on support vector regression (SVR) with weighted degree (WD) kernels. In the first round, a WD kernel with shifts and mismatches is used with SVR to detect the importance of subsequences with different lengths at different positions. The subsequences identified as important in the first round are then fed into a second WD kernel to fit the experimentally measured affinities. To our knowledge, this is the first attempt to increase the accuracy of the affinity prediction by applying two rounds of string kernels and by identifying a small number of crucial k-mers. The proposed method was tested by predicting the binding affinity landscape of Gcn4p in Saccharomyces cerevisiae using datasets from HiTS-FLIP. Our method explicitly identified important subsequences and showed significant performance improvements when compared with other state-of-the-art methods. Based on the identified important subsequences, we discovered two surprisingly stable 10-mers and one sensitive 10-mer which were not reported before. Further test on four other TFs in S. cerevisiae demonstrated the generality of our method. CONCLUSION We proposed in this paper a two-round method to quantitatively model the DNA binding affinity landscape. Since the ability to modify genetic parts to fine-tune gene expression rates is crucial to the design of biological systems, such a tool may play an important role in the success of synthetic biology going forward.
Collapse
|