1. Ebtekar A, Hutter M. Modeling the Arrows of Time with Causal Multibaker Maps. ENTROPY (BASEL, SWITZERLAND) 2024; 26:776. [PMID: 39330109 DOI: 10.3390/e26090776] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2024] [Revised: 07/22/2024] [Accepted: 09/07/2024] [Indexed: 09/28/2024]
Abstract
Why do we remember the past, and plan the future? We introduce a toy model in which to investigate emergent time asymmetries: the causal multibaker maps. These are reversible discrete-time dynamical systems with configurable causal interactions. Imposing a suitable initial condition or "Past Hypothesis", and then coarse-graining, yields a Pearlean locally causal structure. While it is more common to speculate that the other arrows of time arise from the thermodynamic arrow, our model instead takes the causal arrow as fundamental. From it, we obtain the thermodynamic and epistemic arrows of time. The epistemic arrow concerns records, which we define to be systems that encode the state of another system at another time, regardless of the latter system's dynamics. Such records exist of the past, but not of the future. We close with informal discussions of the evolutionary and agential arrows of time, and their relevance to decision theory.
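The abstract does not spell out the construction of the causal multibaker maps, so the sketch below shows only the classic baker's map on the unit square. This is the textbook reversible, discrete-time map that multibaker constructions build on, and it is offered as background, not as the authors' model. The function names are hypothetical. Exact rational arithmetic (`fractions.Fraction`) is used so that running the map forward and then backward recovers the starting point exactly, which is what "reversible" means here. Floats would gradually lose bits.

```python
import math
from fractions import Fraction


def baker(x, y):
    """One step of the classic baker's map on the unit square [0, 1) x [0, 1).

    Stretch horizontally by 2, squash vertically by 1/2, then cut off the
    right half and stack it on top of the left half.
    """
    bit = math.floor(2 * x)  # 0 if x is in the left half, 1 if in the right half
    return 2 * x - bit, (y + bit) / 2


def baker_inverse(x, y):
    """Exact inverse of `baker`: the bit that was moved into y moves back into x."""
    bit = math.floor(2 * y)  # 0 if y is in the bottom half, 1 if in the top half
    return (x + bit) / 2, 2 * y - bit


# Run a point forward ten steps, then backward ten steps, with exact rationals.
start = (Fraction(3, 7), Fraction(2, 5))
point = start
for _ in range(10):
    point = baker(*point)
for _ in range(10):
    point = baker_inverse(*point)
print(point == start)  # True: no information is lost
```

Each step moves one binary digit of `x` into `y`. A coarse-graining that only looks at, say, the left/right half of the square therefore sees information flow in one direction even though the microscopic map is invertible.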
Affiliation(s)
- Aram Ebtekar
  - Independent Researcher, Vancouver, BC V5Y 3J6, Canada
- Marcus Hutter
  - Google DeepMind, London N1C 4AG, UK
  - School of Computing, Australian National University, Canberra, ACT 2601, Australia
2. Si T, Wang Y, Zhang L, Richmond E, Ahn TH, Gong H. Multivariate Time Series Change-Point Detection with a Novel Pearson-like Scaled Bregman Divergence. STATS 2024; 7:462-480. [PMID: 38827579 PMCID: PMC11138604 DOI: 10.3390/stats7020028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
Change-point detection (CPD) is a challenging problem with applications across many real-world domains. The primary objective of CPD is to identify the time points at which the underlying system transitions between states, each characterized by its own data distribution. Precise identification of change points in time-series omics data can provide insight into the dynamic and temporal characteristics of complex biological systems. Many change-point detection methods have traditionally relied on directly estimating the data distributions, but such approaches become impractical for high-dimensional data. Density-ratio methods have emerged as a promising alternative, since estimating a density ratio is easier than estimating the individual densities. Nevertheless, the divergence measures used in these methods may be numerically unstable. Moreover, the popular α-relative Pearson divergence does not measure the dissimilarity between the two data distributions themselves, but between one distribution and a mixture of the two. To overcome these limitations of existing density-ratio-based methods, we propose a novel Pearson-like scaled-Bregman divergence-based (PLsBD) density-ratio estimation method for change-point detection. We derive an analytical expression for the Pearson-like scaled Bregman divergence using a mixture measure. We integrate the PLsBD with a kernel regression model and apply a random sampling strategy to identify change points in both synthetic data and real-world high-dimensional genomics data of Drosophila. Our PLsBD method outperforms many other change-point detection methods.
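The claim that the α-relative Pearson divergence compares one distribution against a mixture can be made concrete on discrete distributions. The sketch below implements that baseline divergence (the one the authors criticize), not their PLsBD. It assumes the usual relative-density-ratio definition PE_α(P‖Q) = ½ Σ q_α (p/q_α − 1)² with q_α = αp + (1 − α)q; the function name is hypothetical.

```python
import numpy as np


def alpha_relative_pearson(p, q, alpha):
    """alpha-relative Pearson divergence PE_alpha(P || Q) for discrete distributions.

    It is the ordinary Pearson divergence between P and the mixture
    q_alpha = alpha * p + (1 - alpha) * q, not between P and Q themselves.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    q_alpha = alpha * p + (1.0 - alpha) * q   # the mixture actually compared against
    # For 0 < alpha <= 1, q_alpha == 0 forces p == 0, so such points contribute
    # nothing; with alpha == 0 this assumes q > 0 wherever p > 0.
    support = q_alpha > 0
    r_alpha = p[support] / q_alpha[support]   # relative density ratio, at most 1/alpha
    return 0.5 * np.sum(q_alpha[support] * (r_alpha - 1.0) ** 2)


p, q = [0.9, 0.1], [0.1, 0.9]
for alpha in (0.0, 0.5, 1.0):
    print(alpha, alpha_relative_pearson(p, q, alpha))
```

For these two clearly different distributions the value falls from about 3.56 (α = 0, the plain Pearson divergence) to 0.32 (α = 0.5) and to 0 at α = 1, where the "mixture" is P itself. The bounded ratio (at most 1/α) is what makes the relative version numerically stable, at the price of no longer comparing P and Q directly.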
Affiliation(s)
- Tong Si
  - Department of Mathematics and Statistics, Saint Louis University, St. Louis, MO 63103, USA
- Yunge Wang
  - Department of Mathematics and Statistics, Saint Louis University, St. Louis, MO 63103, USA
- Lingling Zhang
  - Department of Mathematics and Statistics, University at Albany SUNY, Albany, NY 12222, USA
- Evan Richmond
  - Department of Mathematics and Statistics, Saint Louis University, St. Louis, MO 63103, USA
- Tae-Hyuk Ahn
  - Department of Computer Science, Saint Louis University, St. Louis, MO 63103, USA
- Haijun Gong
  - Department of Mathematics and Statistics, Saint Louis University, St. Louis, MO 63103, USA
3. Duttweiler L, Thurston SW, Almudevar A. Spectral Bayesian network theory. LINEAR ALGEBRA AND ITS APPLICATIONS 2023; 674:282-303. [PMID: 37520305 PMCID: PMC10373448 DOI: 10.1016/j.laa.2023.06.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/01/2023]
Abstract
A Bayesian Network (BN) is a probabilistic model that represents a set of variables using a directed acyclic graph (DAG). Current algorithms for learning BN structures from data focus on estimating the edges of a specific DAG, and often lead to many 'likely' network structures. In this paper, we lay the groundwork for an approach that focuses on learning global properties of the DAG rather than exact edges. This is done by defining the structural hypergraph of a BN, which is shown to be related to the inverse-covariance matrix of the network. Spectral bounds are derived for the normalized inverse-covariance matrix, which are shown to be closely related to the maximum indegree of the associated BN.
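Why the inverse covariance carries structural information about a BN can be seen in the simplest setting. The sketch below assumes a linear-Gaussian structural equation model X = BX + ε with independent Gaussian noise. This is an assumption for illustration; the paper's structural hypergraph and spectral bounds are not reproduced. In that model Ω = Σ⁻¹ = (I − B)ᵀD⁻¹(I − B), so (generically) Ω_ij ≠ 0 exactly when i and j are adjacent or share a child. The function name is hypothetical.

```python
import numpy as np


def precision_of_linear_gaussian_bn(B, noise_var):
    """Inverse covariance of X = B X + eps with eps ~ N(0, diag(noise_var)).

    B[i, j] is the weight of edge j -> i; B must describe a DAG so that I - B
    is invertible. Then Sigma = (I - B)^-1 D (I - B)^-T and
    Omega = Sigma^-1 = (I - B)^T D^-1 (I - B).
    """
    B = np.asarray(B, dtype=float)
    I_minus_B = np.eye(B.shape[0]) - B
    D_inv = np.diag(1.0 / np.asarray(noise_var, dtype=float))
    return I_minus_B.T @ D_inv @ I_minus_B


# v-structure 0 -> 2 <- 1: nodes 0 and 1 are marginally independent.
B = np.zeros((3, 3))
B[2, 0] = 0.8
B[2, 1] = -0.5
omega = precision_of_linear_gaussian_bn(B, [1.0, 1.0, 1.0])
sigma = np.linalg.inv(omega)
print(sigma[0, 1])  # approximately zero: the two parents are marginally independent
print(omega[0, 1])  # nonzero (about -0.4): co-parents are linked in the precision matrix
```

Because every pair of parents of a node becomes linked in Ω, a node with in-degree k induces a k-clique in the precision-matrix pattern. This is one intuitive route from the inverse covariance to the maximum indegree, not the paper's bound.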
4. Zhang J, Hu C, Zhang Q. Gene regulatory network inference based on a nonhomogeneous dynamic Bayesian network model with an improved Markov Monte Carlo sampling. BMC Bioinformatics 2023; 24:264. [PMID: 37355560 DOI: 10.1186/s12859-023-05381-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2023] [Accepted: 06/07/2023] [Indexed: 06/26/2023]
Abstract
A nonhomogeneous dynamic Bayesian network model, which combines the dynamic Bayesian network with a multi-change point process, partly overcomes the limitations of the dynamic Bayesian network in modeling non-stationary gene expression data. However, problems such as low network reconstruction accuracy and poor model convergence persist. We therefore propose an MD-birth move, based on the Manhattan distance of the data points, to make the multi-change point process more rational. The MD-birth move assumes that a change point moves in the direction in which the Manhattan distance between the variance and mean of the data points on its left and on its right is larger. To account for instability in the data, we also propose, in addition to the multi-change point process, a Markov chain Monte Carlo sampling method based on node-dependent particle filtering. The particle filter pushes candidate parent-node sets that are close to the true state toward high-probability regions and those far from it toward low-probability regions, after which sampling is performed. For reconstructing gene regulatory networks from the Saccharomyces cerevisiae and RAF data, the model proposed in this paper (FC-DBN) achieves better network reconstruction accuracy and faster model convergence than the corresponding competing models.
Affiliation(s)
- Jiayao Zhang
  - College of Artificial Intelligence and Big Data, Hefei University, Hefei, 230031, China
- Chunling Hu
  - College of Artificial Intelligence and Big Data, Hefei University, Hefei, 230031, China
- Qianqian Zhang
  - College of Artificial Intelligence and Big Data, Hefei University, Hefei, 230031, China
5. Ajmal HB, Madden MG. Dynamic Bayesian Network Learning to Infer Sparse Models From Time Series Gene Expression Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:2794-2805. [PMID: 34181549 DOI: 10.1109/tcbb.2021.3092879] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 06/13/2023]
Abstract
One of the key challenges in systems biology is to derive gene regulatory networks (GRNs) from complex, high-dimensional, sparse data. Bayesian networks (BNs) and dynamic Bayesian networks (DBNs) have been widely applied to infer GRNs from gene expression data. GRNs are typically sparse, but traditional approaches to BN structure learning often produce many spurious (false positive) edges. We present two new BN scoring functions that extend the Bayesian Information Criterion (BIC) score with additional penalty terms, and use them with DBN structure search methods to find the graph structure that maximises the proposed scores. Compared with the BIC score, our scoring functions yield networks with fewer spurious edges. The proposed methods are evaluated extensively on autoregressive and DREAM4 benchmarks, where they significantly improve the precision of the learned graphs relative to the BIC score. They are also evaluated on three real time-series gene expression datasets, and the results demonstrate that our algorithms can learn sparse graphs from high-dimensional time-series data. The implementation is open source and is available as an R package on GitHub at https://github.com/HamdaBinteAjmal/DBN4GRN, along with documentation and tutorials.
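For reference, the plain BIC score that the new scoring functions extend can be sketched for a lag-1, linear-Gaussian DBN. The model form, the function names, and the simulated data are assumptions for illustration. The paper's additional penalty terms are not reproduced here (they are implemented in the authors' R package).

```python
import numpy as np


def bic_node_score(target, parents):
    """BIC = max log-likelihood - (k / 2) * log(n) for a linear-Gaussian
    regression of `target` on the columns in `parents` (plus an intercept)."""
    y = np.asarray(target, dtype=float)
    n = y.shape[0]
    X = np.column_stack([np.ones(n)] + [np.asarray(p, dtype=float) for p in parents])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = np.mean((y - X @ beta) ** 2)              # MLE of the noise variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    k = X.shape[1] + 1                                  # coefficients + noise variance
    return loglik - 0.5 * k * np.log(n)


def dbn_bic(expr, gene, parent_genes):
    """Score a lag-1 parent set: `gene` at time t regressed on the parents at t - 1."""
    target = expr[1:, gene]
    parents = [expr[:-1, j] for j in parent_genes]
    return bic_node_score(target, parents)


# Hypothetical data: gene 1 regulates gene 0 with a one-step delay.
rng = np.random.default_rng(0)
expr = rng.normal(size=(60, 3))
for t in range(1, 60):
    expr[t, 0] = 0.9 * expr[t - 1, 1] + 0.3 * rng.normal()

print(dbn_bic(expr, 0, [1]) > dbn_bic(expr, 0, []))  # True: the true parent is preferred
```

Because each node's score depends only on its own parent set, structure search can maximise the total score node by node. Strengthening the log(n) penalty term is the lever for suppressing the spurious edges described above.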
6. The identifiability of gene regulatory networks: the role of observation data. J Biol Phys 2022; 48:93-110. [PMID: 34988715 PMCID: PMC8866611 DOI: 10.1007/s10867-021-09595-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2020] [Accepted: 11/07/2021] [Indexed: 10/19/2022]
Abstract
Identifying gene regulatory networks (GRNs) from observation data is important for understanding biological systems. Conventional studies focus on improving the performance of identification algorithms; however, GRN identification also depends strongly on the observation data. In this work, the identifiability test procedure is carried out for three GRN S-system models under three observation-data collection schemes. A modified genetic algorithm-particle swarm optimization algorithm, including a multi-level mutation operation and a velocity limitation strategy, is proposed to implement this task. The results show that in scheme 1 (starting from a special initial condition), the GRN systems are identifiable from sufficient transient observation data. In scheme 2, the observation data lack sufficient system dynamics, and the GRN systems are not identifiable even though the state trajectories can be reproduced. For a special case of scheme 2, steady-state observation data, an equilibrium-point analysis explains why GRN identification is infeasible. In schemes 1 and 2, the observation data come from zero-input GRN systems, which eventually evolve to a steady state; the sufficient transient observation data of scheme 1 can be obtained by changing the experimental conditions. Valid observation data can also be obtained by adding an impulse excitation signal to the GRN systems (scheme 3), and the GRN systems are identifiable under scheme 3. Owing to their universality and simplicity, these results provide guidance for biologists in collecting valid observation data for identifying GRNs and in further understanding GRN dynamics.
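Why steady-state data cannot identify an S-system can be illustrated with a toy example. The sketch uses the standard power-law S-system form dx_i/dt = α_i Π_j x_j^{g_ij} − β_i Π_j x_j^{h_ij} with a hypothetical two-gene cascade and hand-picked parameters, integrated by forward Euler for simplicity. It is not one of the paper's three models. Scaling every α_i and β_i by the same factor leaves the steady state unchanged, so only the transient distinguishes the two parameter sets.

```python
import numpy as np


def s_system_rhs(x, alpha, beta, G, H):
    """dx_i/dt = alpha_i * prod_j x_j**G[i, j] - beta_i * prod_j x_j**H[i, j]."""
    production = alpha * np.prod(x[None, :] ** G, axis=1)
    degradation = beta * np.prod(x[None, :] ** H, axis=1)
    return production - degradation


def simulate(x0, alpha, beta, G, H, dt=0.01, steps=5000):
    """Forward-Euler integration; returns the whole trajectory (steps + 1 rows)."""
    x = np.array(x0, dtype=float)
    trajectory = [x.copy()]
    for _ in range(steps):
        x = x + dt * s_system_rhs(x, alpha, beta, G, H)
        trajectory.append(x.copy())
    return np.array(trajectory)


# Hypothetical two-gene cascade: gene 0 is produced at a constant rate,
# gene 1 is activated by gene 0; both degrade in proportion to their own level.
G = np.array([[0.0, 0.0], [0.5, 0.0]])
H = np.array([[1.0, 0.0], [0.0, 1.0]])
slow = simulate([0.1, 0.1], np.array([1.0, 1.0]), np.array([1.0, 1.0]), G, H)
fast = simulate([0.1, 0.1], np.array([2.0, 2.0]), np.array([2.0, 2.0]), G, H)

# Same steady state (1, 1) but different transients: steady-state data alone
# cannot tell the two parameter sets apart, whereas transient data can.
print(slow[-1], fast[-1])
print(slow[100], fast[100])
```

This mirrors the abstract's contrast between scheme 1 (informative transients) and the steady-state special case of scheme 2.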
7. Liu C, Chen Y, Chen F, Zhu P, Chen L. Sliding window change point detection based dynamic network model inference framework for airport ground service process. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2021.107701] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
8. Pyne S, Anand A. Rapid Reconstruction of Time-varying Gene Regulatory Networks with Limited Main Memory. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1608-1619. [PMID: 31613774 DOI: 10.1109/tcbb.2019.2946826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Reconstructing the time-varying gene regulatory networks underlying time-series gene expression data is a fundamental challenge in computational systems biology. The challenge grows many-fold when the target networks span hundreds to thousands of genes. There have been constant efforts to design an algorithm that performs the reconstruction correctly and also scales efficiently (in both time and memory) to such large numbers of genes. However, existing algorithms either are not time-efficient or achieve time-efficiency at other costs: memory inefficiency, or imposition of a constraint known as the 'smoothly time-varying assumption'. In this article, two novel algorithms, 'an algorithm for reconstructing Time-varying Gene regulatory networks with Shortlisted candidate regulators - which is Light on memory' (TGS-Lite) and 'TGS-Lite Plus' (TGS-Lite+), are proposed; they are time-efficient and memory-efficient, and they do not impose the smoothly time-varying assumption. Additionally, they offer state-of-the-art reconstruction correctness, as demonstrated on three benchmark datasets. Source code: https://github.com/sap01/TGS-Lite-supplem/tree/master/sourcecode.
9. Shafiee Kamalabad M, Grzegorczyk M. A new Bayesian piecewise linear regression model for dynamic network reconstruction. BMC Bioinformatics 2021; 22:196. [PMID: 33902443 PMCID: PMC8074473 DOI: 10.1186/s12859-021-03998-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/14/2021] [Accepted: 02/05/2021] [Indexed: 11/10/2022]
Abstract
Background: Linear regression models are important tools for learning regulatory networks from gene expression time series. A conventional assumption for non-homogeneous regulatory processes on a short time scale is that the network structure stays constant across time, while the network parameters are time-dependent. The objective is then to learn the network structure along with changepoints that divide the time series into time segments. An uncoupled model learns the parameters separately for each segment, while a coupled model forces the parameters of each segment to stay similar to those of the previous segment. In this paper, we propose a new consensus model that infers, for each individual time segment, whether it is coupled to (or uncoupled from) the previous segment. Results: The new consensus model is superior to the uncoupled and coupled models, as well as to a recently proposed generalized coupled model. Conclusions: The newly proposed model has the uncoupled and coupled models as limiting cases, and it can infer the best trade-off between them from the data. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-03998-9.
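The difference between uncoupled and coupled segments can be sketched with conjugate Bayesian linear regression. The simplifying assumptions here are a known, shared noise variance and a fixed prior scale `lam`. Each segment's prior is centred on zero (uncoupled) or on the previous segment's posterior mean (coupled). The per-segment `couple_flags` argument is a hypothetical stand-in for the consensus idea. The paper infers these flags, the changepoints, and the hyperparameters from data, and none of that is reproduced here.

```python
import numpy as np


def posterior_mean(X, y, prior_mean, lam):
    """Posterior mean of w for y = X w + noise, noise ~ N(0, s2 I), prior w ~ N(prior_mean, lam * s2 I).

    The noise variance s2 cancels: mean = (X^T X + I / lam)^-1 (X^T y + prior_mean / lam).
    """
    d = X.shape[1]
    A = X.T @ X + np.eye(d) / lam
    return np.linalg.solve(A, X.T @ y + prior_mean / lam)


def segment_means(segments, couple_flags, lam=1.0):
    """Posterior means, segment by segment.

    couple_flags[h] is True  -> prior for segment h is centred on segment h-1's posterior mean.
    couple_flags[h] is False -> prior is centred on zero (uncoupled).
    couple_flags[0] is ignored because the first segment has no predecessor.
    """
    d = segments[0][0].shape[1]
    previous = np.zeros(d)
    means = []
    for h, (X, y) in enumerate(segments):
        prior = previous if (h > 0 and couple_flags[h]) else np.zeros(d)
        previous = posterior_mean(X, y, prior, lam)
        means.append(previous)
    return means


# Hypothetical data: the same regression weights in a long and a very short segment.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
X1 = rng.normal(size=(50, 2))
y1 = X1 @ w_true + 0.1 * rng.normal(size=50)
X2 = rng.normal(size=(1, 2))
y2 = X2 @ w_true
coupled = segment_means([(X1, y1), (X2, y2)], [False, True])
uncoupled = segment_means([(X1, y1), (X2, y2)], [False, False])
print(coupled[1], uncoupled[1])
```

With a single observation in the second segment, coupling borrows strength from the first segment, whereas the uncoupled estimate is shrunk towards zero. When the parameters really do change between segments, the uncoupled prior is the safer choice, which is the trade-off the consensus model is designed to learn.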
Affiliation(s)
- Mahdi Shafiee Kamalabad
  - Department of Methodology and Statistics, Tilburg School of Social and Behavioral Sciences, Tilburg University, Prof. Cobbenhagenlaan 225, 5037 DB, Tilburg, The Netherlands
  - Jheronimus Academy of Data Science, Sint Janssingel 92, 5211 DA, 's-Hertogenbosch, The Netherlands
- Marco Grzegorczyk
  - Bernoulli Institute, Groningen University, Nijenborgh 9, 9747 AG, Groningen, The Netherlands
10. Ingham VA, Elg S, Nagi SC, Dondelinger F. Capturing the transcription factor interactome in response to sub-lethal insecticide exposure. CURRENT RESEARCH IN INSECT SCIENCE 2021; 1:100018. [PMID: 34977825 PMCID: PMC8702396 DOI: 10.1016/j.cris.2021.100018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2020] [Revised: 06/15/2021] [Accepted: 07/21/2021] [Indexed: 12/02/2022]
Abstract
The increasing levels of pesticide resistance in agricultural pests and disease vectors represent a threat to both food security and global health. As insecticide resistance intensifies and spreads, the likelihood of a pest encountering a sub-lethal dose of pesticide increases dramatically. Here, we apply dynamic Bayesian networks to a transcriptome time course generated by sub-lethal pyrethroid exposure of a highly resistant Anopheles coluzzii population. The model accounts for circadian rhythm and ageing effects, allowing high-confidence identification of transcription factors with key roles in the pesticide response. The associations generated by this model show high concordance with lab-based validation and identify 44 transcription factors putatively regulating insecticide-responsive transcripts. We identify six key regulators, each displaying different enrichment terms, which demonstrates the complexity of the pesticide response. The considerable overlap of resistance mechanisms in agricultural pests and disease vectors strongly suggests that these findings are relevant to a wide variety of pest species.
11. Ajmal HB, Madden MG. Inferring dynamic gene regulatory networks with low-order conditional independencies – an evaluation of the method. Stat Appl Genet Mol Biol 2020. [DOI: 10.1515/sagmb-2020-0051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Over a decade ago, Lèbre (2009) proposed an inference method, G1DBN, to learn the structure of gene regulatory networks (GRNs) from high-dimensional, sparse time-series gene expression data. The approach is based on the concept of low-order conditional independence graphs, extended to dynamic Bayesian networks (DBNs). Lèbre presented results showing that the method yields better structural accuracy than the related Lasso and Shrinkage methods, particularly when the data are sparse, that is, when the number of time measurements n is much smaller than the number of genes p. This paper challenges these claims through a careful experimental analysis, showing that GRNs reverse-engineered from time-series data with the G1DBN approach are less accurate than claimed by Lèbre (2009). We also show that the Lasso method yields higher structural accuracy than G1DBN for graphs learned from simulated data, particularly when the data are sparse ($n \ll p$). The Lasso method is also better than G1DBN at identifying the transcription factors (TFs) involved in the cell cycle of Saccharomyces cerevisiae.
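The Lasso baseline in the n ≪ p regime can be sketched as a lag-1 regression of one gene's expression on every gene's expression at the previous time point, fitted by cyclic coordinate descent. This is a generic formulation assumed for illustration. The penalty `lam` is hand-picked, the data are simulated, and the function names are hypothetical. The paper's experimental protocol and tuning are not reproduced.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t * |b|: shrink z towards zero by t."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_sweeps=200):
    """Minimise (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    residual = np.array(y, dtype=float)     # residual for b = 0
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            residual += X[:, j] * b[j]      # add back coordinate j's contribution
            rho = X[:, j] @ residual / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            residual -= X[:, j] * b[j]      # remove the updated contribution
    return b


def lagged_lasso(expr, gene, lam):
    """Regress `gene` at time t on all genes at time t - 1 (a lag-1 DBN-style model)."""
    return lasso_cd(expr[:-1, :], expr[1:, gene], lam)


# Hypothetical data: far fewer time points than genes (n << p).
rng = np.random.default_rng(1)
T, p = 16, 40
expr = rng.normal(size=(T, p))
for t in range(1, T):
    expr[t, 0] = 3.0 * expr[t - 1, 3]       # gene 3 regulates gene 0 with lag 1
b = lagged_lasso(expr, 0, lam=0.5)
print(np.flatnonzero(b))                    # indices of selected candidate regulators
```

Repeating the fit for every target gene and keeping the nonzero coefficients gives a sparse lag-1 network. The ℓ1 penalty is what keeps the problem well-posed when there are more candidate regulators than time points.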
Affiliation(s)
- Hamda B. Ajmal
  - School of Computer Science, National University of Ireland, Galway, Ireland
- Michael G. Madden
  - School of Computer Science, National University of Ireland, Galway, Ireland
12. Wolpert DH. Uncertainty Relations and Fluctuation Theorems for Bayes Nets. PHYSICAL REVIEW LETTERS 2020; 125:200602. [PMID: 33258647 DOI: 10.1103/physrevlett.125.200602] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/07/2020] [Revised: 06/30/2020] [Accepted: 09/11/2020] [Indexed: 05/10/2023]
Abstract
Recent research has considered the stochastic thermodynamics of multiple interacting systems, representing the overall system as a Bayes net. I derive fluctuation theorems governing the entropy production (EP) of arbitrary sets of the systems in such a Bayes net. I also derive "conditional" fluctuation theorems, governing the distribution of EP in one set of systems conditioned on the EP of a different set of systems. I then derive thermodynamic uncertainty relations relating the EP of the overall system to the precisions of probability currents within the individual systems.
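As background for the fluctuation theorems mentioned above, the textbook integral fluctuation theorem can be checked numerically on a toy set of trajectories. Trajectory-level entropy production is σ = ln(P_forward / P_reverse), and averaging e^{−σ} over forward trajectories gives Σ P_reverse = 1. This assumes the reverse distribution is normalized and supported where the forward one is. The probabilities are hypothetical, and the paper's Bayes-net and conditional versions are not reproduced.

```python
import numpy as np

# Hypothetical forward and time-reversed path probabilities over three trajectories.
P_forward = np.array([0.5, 0.3, 0.2])
P_reverse = np.array([0.2, 0.3, 0.5])

# Trajectory-level entropy production (in nats): sigma = ln(P_forward / P_reverse).
sigma = np.log(P_forward / P_reverse)

# Integral fluctuation theorem: <exp(-sigma)> over forward paths = sum of P_reverse = 1.
ift_average = np.sum(P_forward * np.exp(-sigma))

# Second law as a corollary (Jensen): <sigma> = KL(P_forward || P_reverse) >= 0,
# even though an individual trajectory (here the third) can have negative sigma.
mean_ep = np.sum(P_forward * sigma)
print(ift_average, mean_ep, sigma.min() < 0)
```

The "conditional" theorems in the abstract replace these unconditional averages with averages conditioned on the entropy production of another set of subsystems.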
Affiliation(s)
- David H Wolpert
  - Santa Fe Institute, Santa Fe, New Mexico 87501, USA
  - Complexity Science Hub, Vienna, Austria
  - Arizona State University, Tempe, Arizona, USA
13. Dong C, Zhang Q. The Cubic Dynamic Uncertain Causality Graph: A Methodology for Temporal Process Modeling and Diagnostic Logic Inference. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:4239-4253. [PMID: 31905150 DOI: 10.1109/tnnls.2019.2953177] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 06/10/2023]
Abstract
To meet the demand for dynamic and highly reliable real-time fault diagnosis for complex systems, we extend the dynamic uncertain causality graph (DUCG) by proposing novel temporal causality modeling and reasoning methods. A new methodology, the Cubic DUCG, is therefore developed. It exploits an efficient scheme for compactly representing and accurately reasoning about the dynamic causalities in the system fault-spreading process. The Cubic DUCG is characterized by: 1) continuous generation of a causality graph that allows for causal connections penetrating among any number of time slices and discards the restrictive assumptions (about the underlying graph structure) upon which the existing research commonly relies; 2) a modeling scheme of complex causalities that includes dynamic negative feedback loops in a natural and intuitive manner; 3) a rigorous and reliable inference algorithm based on complete causalities that reflect real-time fault situations rather than on the cumulative aggregation of static time slices; and 4) some solutions to causality simplification and reduction, graphical transformation, and logical reasoning, for the sake of reducing the reasoning complexity. A series of fault diagnosis experiments on a nuclear power plant simulator verifies the accuracy, robustness, and efficiency of the proposed methodology.
14. Shafiee Kamalabad M, Heberle AM, Thedieck K, Grzegorczyk M. Partially non-homogeneous dynamic Bayesian networks based on Bayesian regression models with partitioned design matrices. Bioinformatics 2020; 35:2108-2117. [PMID: 30395165 PMCID: PMC6581439 DOI: 10.1093/bioinformatics/bty917] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/28/2018] [Revised: 08/29/2018] [Accepted: 11/02/2018] [Indexed: 12/25/2022]
Abstract
Motivation: Non-homogeneous dynamic Bayesian networks (NH-DBNs) are a popular modelling tool for learning cellular networks from time series data. In systems biology, time series are often measured under different experimental conditions, and not infrequently only some network interaction parameters depend on the condition while the other parameters stay constant across conditions. For this situation, we propose a new partially NH-DBN, based on Bayesian hierarchical regression models with partitioned design matrices. With regard to our main application, semi-quantitative (immunoblot) time-course data from mammalian target of rapamycin complex 1 (mTORC1) signalling, we also propose a Gaussian process-based method to handle non-equidistant time series measurements. Results: On synthetic network data and on yeast gene expression data, the new model leads to improved network reconstruction accuracies. We then use the new model to reconstruct the topologies of the circadian clock network in Arabidopsis thaliana and the mTORC1 signalling pathway. The inferred network topologies show features that are consistent with the biological literature. Availability and implementation: All datasets have been made available with earlier publications. Our Matlab code is available upon request. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mahdi Shafiee Kamalabad
  - Department of Mathematics, Bernoulli Institute, Faculty of Science and Engineering, University of Groningen, AG Groningen, The Netherlands
- Alexander Martin Heberle
  - Laboratory of Pediatrics, Section Systems Medicine of Metabolism and Signaling, University of Groningen, University Medical Center Groningen, AV Groningen, The Netherlands
- Kathrin Thedieck
  - Laboratory of Pediatrics, Section Systems Medicine of Metabolism and Signaling, University of Groningen, University Medical Center Groningen, AV Groningen, The Netherlands
  - Department for Neuroscience, School of Medicine and Health Sciences, Carl von Ossietzky University Oldenburg, Oldenburg, Germany
- Marco Grzegorczyk
  - Department of Mathematics, Bernoulli Institute, Faculty of Science and Engineering, University of Groningen, AG Groningen, The Netherlands
15. Pyne S, Kumar AR, Anand A. Rapid Reconstruction of Time-Varying Gene Regulatory Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:278-291. [PMID: 30072338 DOI: 10.1109/tcbb.2018.2861698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Rapid advancements in high-throughput technologies have resulted in genome-scale time series datasets. Uncovering the temporal sequence of gene regulatory events, in the form of time-varying gene regulatory networks (GRNs), demands computationally fast, accurate, and scalable algorithms. Existing algorithms fall into two categories: those that are time-intensive and hence unscalable, and those that impose structural constraints to become scalable. In this paper, a novel algorithm, 'an algorithm for reconstructing Time-varying Gene regulatory networks with Shortlisted candidate regulators' (TGS), is proposed. TGS is time-efficient and does not impose any structural constraints. Moreover, it achieves this flexibility and time-efficiency without sacrificing accuracy. TGS consistently outperforms state-of-the-art algorithms in true positive detection on three benchmark synthetic datasets. However, TGS does not perform as well in false positive rejection. To mitigate this issue, TGS+ is proposed. TGS+ demonstrates competitive false positive rejection power while maintaining the superior speed and true positive detection power of TGS. Nevertheless, the main memory requirements of both TGS variants grow exponentially with the number of genes, which they tackle by restricting the maximum number of regulators for each gene. Relaxing this restriction remains a challenge, as the actual number of regulators is not known a priori.
16. Xu M, Chen X, Wu WB. Estimation of Dynamic Networks for High-Dimensional Nonstationary Time Series. ENTROPY 2019; 22:e22010055. [PMID: 33285830 PMCID: PMC7516486 DOI: 10.3390/e22010055] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/14/2019] [Revised: 12/25/2019] [Accepted: 12/26/2019] [Indexed: 12/01/2022]
Abstract
This paper is concerned with the estimation of time-varying networks for high-dimensional nonstationary time series. Two types of dynamic behaviors are considered: structural breaks (i.e., abrupt change points) and smooth changes. To simultaneously handle these two types of time-varying features, a two-step approach is proposed: multiple change point locations are first identified on the basis of comparing the difference between the localized averages on sample covariance matrices, and then graph supports are recovered on the basis of a kernelized time-varying constrained L1-minimization for inverse matrix estimation (CLIME) estimator on each segment. We derive the rates of convergence for estimating the change points and precision matrices under mild moment and dependence conditions. In particular, we show that this two-step approach is consistent in estimating the change points and the piecewise smooth precision matrix function, under a certain high-dimensional scaling limit. The method is applied to the analysis of network structure of the S&P 500 index between 2003 and 2008.
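The first step of the two-step approach, locating a change point by comparing localized averages of sample covariance matrices, can be sketched for a single change. The assumptions are zero-mean data, a fixed window `h`, and the entrywise max-norm as the distance. The authors' exact statistic, the threshold for multiple change points, and the second (kernelized CLIME) step are not reproduced. The function name is hypothetical.

```python
import numpy as np


def local_covariance_scan(X, h):
    """For each candidate t, the max-norm distance between the average of
    x_s x_s^T over the h points before t and over the h points from t on
    (zero-mean data assumed). Positions without a full window on both sides are NaN."""
    T, p = X.shape
    outer = np.einsum("ti,tj->tij", X, X)   # one p x p second-moment matrix per time point
    scores = np.full(T, np.nan)
    for t in range(h, T - h + 1):
        left = outer[t - h:t].mean(axis=0)
        right = outer[t:t + h].mean(axis=0)
        scores[t] = np.abs(left - right).max()
    return scores


# Hypothetical data: the covariance jumps from I to 9 I at t = 300.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(scale=1.0, size=(300, 4)),   # regime 1: unit variance
    rng.normal(scale=3.0, size=(300, 4)),   # regime 2: variance 9
])
scores = local_covariance_scan(X, h=150)
estimate = int(np.nanargmax(scores))
print(estimate)
```

The score rises roughly linearly as the window pair approaches the true change and peaks there. Multiple change points would be found by thresholding local maxima, after which each segment's precision matrix can be estimated separately.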
Affiliation(s)
- Mengyu Xu
  - Department of Statistics and Data Science, University of Central Florida, 4000 Central Florida Blvd, Orlando, FL 32816, USA
- Xiaohui Chen
  - Department of Statistics, University of Illinois at Urbana-Champaign, S. Wright Street, Champaign, IL 61820, USA
- Wei Biao Wu
  - Department of Statistics, University of Chicago, 5747 S. Ellis Avenue, Jones 311, Chicago, IL 60637, USA
17. Wan X, Wang Z, Han QL, Wu M. A Recursive Approach to Quantized H∞ State Estimation for Genetic Regulatory Networks Under Stochastic Communication Protocols. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2019; 30:2840-2852. [PMID: 30668504 DOI: 10.1109/tnnls.2018.2885723] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 06/09/2023]
Abstract
This paper deals with the finite-horizon quantized H∞ state estimation problem for a class of discrete time-varying genetic regulatory networks with quantization effects under stochastic communication protocols (SCPs). To better reflect the data-driven nature of today's biological research, the network measurements (typically gigabytes in size when produced by high-throughput sequencing technologies) are transmitted to a remote state estimator via two independent communication networks of limited bandwidth. To lighten the communication load and avoid undesired data collisions, the measurement outputs are quantized and then transmitted under two SCPs that schedule the large-scale data transmissions. The purpose of this paper is to design a time-varying state estimator such that the error dynamics of the state estimation satisfy a prescribed H∞ performance requirement over a finite horizon in the presence of nonlinearities, quantization effects, and SCPs. Using the completing-the-square technique, sufficient conditions are derived that ensure the H∞ estimation performance, and the parameters of the state estimator are designed by solving coupled backward recursive Riccati difference equations. A numerical example illustrates the effectiveness of the proposed state estimator design.
18. Messager A, Parisis G, Kiss IZ, Harper R, Tee P, Berthouze L. Inferring Functional Connectivity From Time-Series of Events in Large Scale Network Deployments. IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT 2019. [DOI: 10.1109/tnsm.2019.2932896] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/10/2022]
19. Adabor ES, Acquaah-Mensah GK. Restricted-derestricted dynamic Bayesian Network inference of transcriptional regulatory relationships among genes in cancer. Comput Biol Chem 2019; 79:155-164. [PMID: 30822674 DOI: 10.1016/j.compbiolchem.2019.02.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/28/2018] [Revised: 01/21/2019] [Accepted: 02/20/2019] [Indexed: 01/19/2023]
Abstract
Understanding transcriptional regulatory relationships among genes is important for gaining etiological insights into diseases such as cancer. To this end, high-throughput biological data have been generated through advances in a variety of technologies. Analysing these data relies on computational approaches that discover their underlying structures. Among these computational approaches, Bayesian networks (BNs) stand out because their probabilistic nature enables them to manage randomness both in the dynamics of gene regulation and in experimental data. Feedback loops inherent in networks of regulatory relationships become more tractable when enhancements to BNs are applied. Here, we propose Restricted-Derestricted dynamic BNs with a novel search technique, the Restricted-Derestricted Greedy Method, for such tasks. This approach uses the Restricted-Derestricted Greedy search technique to infer transcriptional regulatory networks in two phases: restricted inference and derestricted inference. Applied to real data sets, the approach performs favourably compared with other well-performing dynamic BN approaches in recovering true relationships among genes. In addition, it balances searching for optimal networks against retaining biologically relevant regulatory interactions among variables.
Affiliation(s)
- Emmanuel S Adabor
- School of Technology, Ghana Institute of Management and Public Administration, Achimota, Accra, Ghana.
| | - George K Acquaah-Mensah
- Pharmaceutical Sciences Department, Massachusetts College of Pharmacy and Health Sciences (MCPHS University), 19 Foster Street, Worcester, MA, USA
| |
|
20
|
Wang Z, Guo Y, Gong H. An Integrative Analysis of Time-varying Regulatory Networks From High-dimensional Data. PROCEEDINGS : ... IEEE INTERNATIONAL CONFERENCE ON BIG DATA. IEEE INTERNATIONAL CONFERENCE ON BIG DATA 2019; 2018:3798-3807. [PMID: 31544173 DOI: 10.1109/bigdata.2018.8622361] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 01/25/2023]
Abstract
Directed networks have been widely used to describe many biological processes and functions. Understanding the structure of biological networks, especially regulatory networks, could help uncover the mechanisms underlying important biological processes and the pathogenesis of diseases. Most network inference methods assume that the network structure is time-invariant, or stationary. However, in some processes the network structure is non-stationary, or time-varying, and stationary network inference methods may not be directly applicable to reconstructing time-varying networks. Some non-stationary network learning methods have been proposed, but the networks they infer are not regulatory networks, which require activation and inhibition information. This work proposes an integrative approach that combines changepoint estimation, weighted network learning and searching, and model checking to reconstruct time-varying regulatory networks from high-dimensional time series data. We apply this approach to study structural changes in Drosophila's regulatory networks over its life cycle.
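The abstract lists changepoint estimation as one component but does not specify the estimator. As a generic illustration only, and not the authors' method, the sketch below locates a single mean shift in a multivariate time series by exhaustive least-squares search; the function name `single_changepoint`, the `min_size` parameter and the simulated data are hypothetical.

```python
import numpy as np


def single_changepoint(x: np.ndarray, min_size: int = 2) -> int:
    """Split a (T x p) multivariate series into x[:tau] and x[tau:] at the tau
    that minimizes the total within-segment sum of squared deviations from
    each segment's mean (a least-squares mean-shift model, one changepoint).

    Illustrative assumption: this generic estimator is not the paper's."""
    T = x.shape[0]
    if T < 2 * min_size:
        raise ValueError("series too short for the requested minimum segment size")
    best_tau, best_cost = min_size, np.inf
    # Both segments must contain at least min_size time points.
    for tau in range(min_size, T - min_size + 1):
        left, right = x[:tau], x[tau:]
        cost = ((left - left.mean(axis=0)) ** 2).sum() + ((right - right.mean(axis=0)) ** 2).sum()
        if cost < best_cost:
            best_tau, best_cost = tau, cost
    return best_tau


# Hypothetical example: 3 "genes" whose mean shifts from 0 to 2 after time point 20.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 0.1, size=(20, 3)), rng.normal(2.0, 0.1, size=(15, 3))])
tau = single_changepoint(x)
print(tau)  # 20
```

Networks could then be learned separately on `x[:tau]` and `x[tau:]`; when several changepoints are expected, binary segmentation applies the same search recursively to each segment.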
Affiliation(s)
- Zi Wang
- Department of Mathematics and Statistics, Saint Louis University, St. Louis, MO 63103, USA
| | - Yun Guo
- Department of Mathematics and Statistics, Saint Louis University, St. Louis, MO 63103, USA
| | - Haijun Gong
- Department of Mathematics and Statistics, Saint Louis University, St. Louis, MO 63103, USA
- Research School of Finance, Actuarial Studies and Statistics, Australian National University, Acton, ACT, 2601 Australia
| |
|
21
|
Dondelinger F, Mukherjee S. Statistical Network Inference for Time-Varying Molecular Data with Dynamic Bayesian Networks. Methods Mol Biol 2019; 1883:25-48. [PMID: 30547395 DOI: 10.1007/978-1-4939-8882-2_2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 03/24/2023]
Abstract
In this chapter, we review the problem of network inference from time-course data, focusing on a class of graphical models known as dynamic Bayesian networks (DBNs). We discuss the relationship of DBNs to models based on ordinary differential equations, and consider extensions to nonlinear time dynamics. We provide an introduction to time-varying DBN models, which allow for changes to the network structure and parameters over time. We also discuss causal perspectives on network inference, including issues around model semantics that can arise due to missing variables. We present a case study of applying time-varying DBNs to gene expression measurements over the life cycle of Drosophila melanogaster. We finish with a discussion of future perspectives, including possible applications of time-varying network inference to single-cell gene expression data.
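The chapter reviews DBN-based network inference from time-course data. The sketch below shows one common textbook variant. It assumes a first-order linear-Gaussian DBN, in which each gene at time t depends only on genes at time t-1, and runs an exhaustive BIC-scored search over parent sets of at most two genes. It is not taken from the chapter; the function names `bic_score` and `learn_dbn_parents`, the parent limit and the simulated data are illustrative assumptions.

```python
from itertools import combinations

import numpy as np


def bic_score(y: np.ndarray, X: np.ndarray) -> float:
    """BIC (up to an additive constant) of a linear-Gaussian regression of y
    on the columns of X plus an intercept; lower is better."""
    n = y.shape[0]
    design = np.column_stack([np.ones(n), X]) if X.size else np.ones((n, 1))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(((y - design @ coef) ** 2).sum())
    k = design.shape[1] + 1  # regression coefficients plus the noise variance
    return n * np.log(rss / n) + k * np.log(n)


def learn_dbn_parents(data: np.ndarray, max_parents: int = 2) -> dict:
    """data: (T x p) array, one column per gene, rows ordered in time.

    Hypothetical helper: for each gene j, score every set of at most
    max_parents genes at time t-1 (including the empty set) as parents of
    gene j at time t, and keep the set with the lowest BIC."""
    past, present = data[:-1], data[1:]
    p = data.shape[1]
    no_parents = np.empty((past.shape[0], 0))
    parents = {}
    for j in range(p):
        best_set, best = (), bic_score(present[:, j], no_parents)
        for size in range(1, max_parents + 1):
            for cand in combinations(range(p), size):
                score = bic_score(present[:, j], past[:, list(cand)])
                if score < best:
                    best_set, best = cand, score
        parents[j] = best_set
    return parents


# Hypothetical simulated example: gene 0 activates gene 1 and gene 1 represses gene 2, each with a one-step lag.
rng = np.random.default_rng(1)
T = 300
data = np.zeros((T, 3))
data[:, 0] = rng.normal(size=T)
for t in range(1, T):
    data[t, 1] = 0.9 * data[t - 1, 0] + 0.1 * rng.normal()
    data[t, 2] = -0.8 * data[t - 1, 1] + 0.1 * rng.normal()

parents = learn_dbn_parents(data)
print(parents)  # maps each gene index to a tuple of parent gene indices at t-1
```

Exhaustive scoring grows combinatorially with the number of genes and the parent limit. Practical DBN methods therefore typically restrict the search space or use greedy or sampling-based search.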
Affiliation(s)
| | - Sach Mukherjee
- German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany
| |
|
22
|
Abstract
Gene regulatory networks are powerful abstractions of biological systems. Since the advent of high-throughput measurement technologies in biology in the late 1990s, reconstructing the structure of such networks has been a central computational problem in systems biology. While the problem is certainly not solved in its entirety, considerable progress has been made in the last two decades, with mature tools now available. This chapter aims to provide an introduction to the basic concepts underpinning network inference tools, attempting a categorization which highlights commonalities and relative strengths. While the chapter is meant to be self-contained, the material presented should provide a useful background to the later, more specialized chapters of this book.
Affiliation(s)
- Vân Anh Huynh-Thu
- Department of Electrical Engineering and Computer Science, University of Liège, Liège, Belgium
| |
|
23
|
Abstract
We consider the problem of modeling conditional independence structures in heterogeneous data in the presence of additional subject-level covariates, a setting termed Graphical Regression. We propose a novel specification of a conditional (in)dependence function of covariates that allows the structure of a directed graph to vary flexibly with the covariates; imposes sparsity in both edge and covariate selection; produces both subject-specific and predictive graphs; and is computationally tractable. We provide theoretical justification for our model in terms of graphical model selection consistency. We demonstrate the performance of our method through rigorous simulation studies. We illustrate our approach in a cancer genomics-based precision medicine paradigm, wherein we explore gene regulatory networks in multiple myeloma, taking prognostic clinical factors into account to obtain both population-level and subject-level gene regulatory networks.
Affiliation(s)
- Yang Ni
- Department of Statistics and Data Sciences, The University of Texas at Austin
- Department of Statistics, Rice University
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center
| | - Francesco C Stingo
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center
- Department of Statistics, Computer Science, Applications "G. Parenti", The University of Florence
| |
|
24
|
Causal Queries from Observational Data in Biological Systems via Bayesian Networks: An Empirical Study in Small Networks. Methods Mol Biol 2018. [PMID: 30547398 DOI: 10.1007/978-1-4939-8882-2_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2023]
Abstract
Biological networks are a very convenient modeling and visualization tool for discovering knowledge from modern high-throughput genomics and post-genomics data sets. Indeed, biological entities are not isolated but are components of complex multilevel systems. We go one step further and advocate for causal representations of the interactions in living systems. We present the causal formalism and apply it to biological networks when the data are observational. We also discuss its ability to decipher the causal information flow observed in gene expression. Finally, we illustrate our exploration with experiments on small simulated networks as well as on a real biological data set.
|
25
|
De Landtsheer S, Lucarelli P, Sauter T. Using Regularization to Infer Cell Line Specificity in Logical Network Models of Signaling Pathways. Front Physiol 2018; 9:550. [PMID: 29872402 PMCID: PMC5972629 DOI: 10.3389/fphys.2018.00550] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/23/2018] [Accepted: 04/30/2018] [Indexed: 11/13/2022]
Abstract
Understanding the functional properties of cells of different origins is a long-standing challenge of personalized medicine. Especially in cancer, the high heterogeneity observed in patients slows down the development of effective cures. The molecular differences between cell types or between healthy and diseased cellular states are usually determined by the wiring of regulatory networks. Understanding these molecular and cellular differences at the systems level would improve patient stratification and facilitate the design of rational intervention strategies. Models of cellular regulatory networks frequently make weak assumptions about the distribution of model parameters across cell types or patients. These assumptions are usually expressed in the form of regularization of the objective function of the optimization problem. We propose a new method of regularization for network models of signaling pathways based on the local density of the inferred parameter values within the parameter space. Our method reduces the complexity of models by creating groups of cell line-specific parameters which can then be optimized together. We demonstrate the use of our method by recovering the correct topology and inferring accurate values of the parameters of a small synthetic model. To show the value of our method in a realistic setting, we re-analyze a recently published phosphoproteomic dataset from a panel of 14 colon cancer cell lines. We conclude that our method efficiently reduces model complexity and helps recover context-specific regulatory information.
Affiliation(s)
- Sébastien De Landtsheer
- Systems Biology Group, Life Sciences Research Unit, University of Luxembourg, Belvaux, Luxembourg
| | - Philippe Lucarelli
- Systems Biology Group, Life Sciences Research Unit, University of Luxembourg, Belvaux, Luxembourg
| | - Thomas Sauter
- Systems Biology Group, Life Sciences Research Unit, University of Luxembourg, Belvaux, Luxembourg
| |
|
26
|
Yu B, Xu JM, Li S, Chen C, Chen RX, Wang L, Zhang Y, Wang MH. Inference of time-delayed gene regulatory networks based on dynamic Bayesian network hybrid learning method. Oncotarget 2017; 8:80373-80392. [PMID: 29113310 PMCID: PMC5655205 DOI: 10.18632/oncotarget.21268] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 07/20/2017] [Accepted: 08/27/2017] [Indexed: 01/31/2023]
Abstract
Research on gene regulatory networks (GRNs) reveals complex life phenomena from the perspective of gene interaction and is an important field in systems biology. Traditional Bayesian networks have high computational complexity, and their network structure scoring model uses a single feature. Information-based approaches cannot identify the direction of regulation. To address the shortcomings of these methods, this paper presents a novel hybrid learning method (DBNCS) based on dynamic Bayesian networks (DBNs) that, for the first time, constructs multiple time-delayed GRNs by combining a comprehensive score (CS) with the DBN model. The DBNCS algorithm first uses the CMI2NI (conditional mutual inclusive information-based network inference) algorithm to learn network structure profiles, that is, to construct the search space. Then redundant regulations are removed using a recursive optimization algorithm (RO), thereby reducing the false positive rate. Next, the network structure profiles are decomposed into a set of cliques without loss, which can significantly reduce the computational complexity. Finally, the DBN model is used to identify the direction of gene regulation within the cliques and to search for the optimal network structure. The performance of the DBNCS algorithm is evaluated on benchmark GRN datasets from the DREAM challenge as well as on the SOS DNA repair network in Escherichia coli, and compared with other state-of-the-art methods. The experimental results show the rationality of the algorithm design and its outstanding performance in reconstructing GRNs.
Affiliation(s)
- Bin Yu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- CAS Key Laboratory of Geospace Environment, Department of Geophysics and Planetary Science, University of Science and Technology of China, Hefei 230026, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Jia-Meng Xu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Shan Li
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Cheng Chen
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Rui-Xin Chen
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Lei Wang
- Key Laboratory of Eco-chemical Engineering, Ministry of Education, Laboratory of Inorganic Synthesis and Applied Chemistry, College of Chemistry and Molecular Engineering, Qingdao University of Science and Technology, Qingdao 266042, China
| | - Yan Zhang
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- College of Electromechanical Engineering, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Ming-Hui Wang
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| |
|
27
|
Intosalmi J, Nousiainen K, Ahlfors H, Lähdesmäki H. Data-driven mechanistic analysis method to reveal dynamically evolving regulatory networks. Bioinformatics 2017; 32:i288-i296. [PMID: 27307629 PMCID: PMC4908358 DOI: 10.1093/bioinformatics/btw274] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 01/03/2023]
Abstract
Motivation: Mechanistic models based on ordinary differential equations provide powerful and accurate means to describe the dynamics of the molecular machinery that orchestrates gene regulation. When combined with appropriate statistical techniques, mechanistic models can be calibrated using experimental data and, in many cases, the model structure can also be inferred from time–course measurements. However, existing mechanistic models are limited in that they rely on the assumption of a static network structure and cannot be applied when transient phenomena affect, or rewire, the network structure. In the context of gene regulatory network inference, network rewiring results from the net impact of possible unobserved transient phenomena, such as changes in signaling pathway activities or in the epigenome, which are generally difficult, but important, to account for.
Results: We introduce a novel method that can be used to infer dynamically evolving regulatory networks from time–course data. Our method is based on the notion that all mechanistic ordinary differential equation models can be coupled with a latent process that approximates the network structure rewiring process. We illustrate the performance of the method using simulated data and, further, we apply the method to study the regulatory interactions during T helper 17 (Th17) cell differentiation using time–course RNA sequencing data. The computational experiments with the real data show that our method is capable of capturing the experimentally verified rewiring effects of the core Th17 regulatory network. We predict Th17 lineage-specific subnetworks that are activated sequentially and control the differentiation process in an overlapping manner.
Availability and Implementation: An implementation of the method is available at http://research.ics.aalto.fi/csb/software/lem/.
Contacts: jukka.intosalmi@aalto.fi or harri.lahdesmaki@aalto.fi
Affiliation(s)
- Jukka Intosalmi
- Department of Computer Science, Aalto University, Aalto, FI-00076, Finland
| | - Kari Nousiainen
- Department of Computer Science, Aalto University, Aalto, FI-00076, Finland
| | - Helena Ahlfors
- Lymphocyte Signalling and Development, The Babraham Institute, Cambridgeshire CB22 3AT, UK
| | - Harri Lähdesmäki
- Department of Computer Science, Aalto University, Aalto, FI-00076, Finland
| |
|
28
|
Liang Y, Kelemen A. Computational dynamic approaches for temporal omics data with applications to systems medicine. BioData Min 2017. [PMID: 28638442 PMCID: PMC5473988 DOI: 10.1186/s13040-017-0140-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Indexed: 12/13/2022]
Abstract
Modeling and predicting dynamic biological systems while simultaneously estimating their kinetic, structural and functional parameters is extremely important in systems and computational biology. This is key to understanding the complexity of human health, drug response, disease susceptibility and pathogenesis in systems medicine. Temporal omics data, which measure dynamic biological systems, are essential for discovering complex biological interactions, clinical mechanisms and causation. However, delineating the possible associations and causal relationships among genes, proteins, metabolites, cells and other biological entities from high-throughput time-course omics data is challenging, and conventional experimental techniques are not suited to this task in the big omics era. In this paper, we present various recently developed dynamic trajectory and causal network approaches for temporal omics data, which are extremely useful for researchers who want to start working in this challenging research area. Moreover, we present applications to various biological systems, health conditions and disease states, together with examples that summarize state-of-the-art performance for different specific mining tasks. We critically discuss the merits, drawbacks and limitations of the approaches, and the main associated challenges for the years ahead. The most recent computing tools and software for analyzing specific problem types, the associated platform resources, and other potential uses of the dynamic trajectory and interaction methods are also presented and discussed in detail.
Affiliation(s)
- Yulan Liang
- Department of Family and Community Health, University of Maryland, Baltimore, MD 21201 USA
| | - Arpad Kelemen
- Department of Organizational Systems and Adult Health, University of Maryland, Baltimore, MD 21201 USA
| |
|
29
|
Wu PP, Julian Caley M, Kendrick GA, McMahon K, Mengersen K. Dynamic Bayesian network inferencing for non‐homogeneous complex systems. J R Stat Soc Ser C Appl Stat 2017. [DOI: 10.1111/rssc.12228] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Affiliation(s)
- Paul P.‐Y. Wu
- Queensland University of Technology, and Australian Research Council Centre of Excellence in Mathematical and Statistical Frontiers Brisbane Australia
| | - M. Julian Caley
- Queensland University of Technology, and Australian Research Council Centre of Excellence in Mathematical and Statistical Frontiers Brisbane Australia
| | - Gary A. Kendrick
- University of Western Australia, Crawley, and Western Australia Marine Science Institution Perth Australia
| | - Kathryn McMahon
- Edith Cowan University, Joondalup, and Western Australia Marine Science Institution Perth Australia
| | - Kerrie Mengersen
- Queensland University of Technology, and Australian Research Council Centre of Excellence in Mathematical and Statistical Frontiers Brisbane Australia
| |
|
30
|
McGoff KA, Guo X, Deckard A, Kelliher CM, Leman AR, Francey LJ, Hogenesch JB, Haase SB, Harer JL. The Local Edge Machine: inference of dynamic models of gene regulation. Genome Biol 2016; 17:214. [PMID: 27760556 PMCID: PMC5072315 DOI: 10.1186/s13059-016-1076-z] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 06/13/2016] [Accepted: 10/03/2016] [Indexed: 12/31/2022]
Abstract
We present a novel approach, the Local Edge Machine, for the inference of regulatory interactions directly from time-series gene expression data. We demonstrate its performance, robustness, and scalability on in silico datasets with varying behaviors, sizes, and degrees of complexity. Moreover, we demonstrate its ability to incorporate biological prior information and make informative predictions on a well-characterized in vivo system using data from budding yeast that have been synchronized in the cell cycle. Finally, we use an atlas of transcription data in a mammalian circadian system to illustrate how the method can be used for discovery in the context of large complex networks.
Affiliation(s)
- Kevin A McGoff
- Department of Mathematics and Statistics, UNC Charlotte, 9201 University City Blvd., Charlotte, 28269, NC, USA.
| | - Xin Guo
- Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong, China
| | - Adam R Leman
- Department of Biology, Duke University, Durham, NC, USA
| | - Lauren J Francey
- Department of Molecular and Cellular Physiology, University of Cincinnati, Cincinnati, OH, USA
| | - John B Hogenesch
- Department of Molecular and Cellular Physiology, University of Cincinnati, Cincinnati, OH, USA
| | - John L Harer
- Department of Mathematics, Duke University, Durham, NC, USA
| |
|
31
|
Shavit Y, Yordanov B, Dunn SJ, Wintersteiger CM, Otani T, Hamadi Y, Livesey FJ, Kugler H. Automated Synthesis and Analysis of Switching Gene Regulatory Networks. Biosystems 2016; 146:26-34. [PMID: 27178783 DOI: 10.1016/j.biosystems.2016.03.012] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 03/01/2016] [Accepted: 03/30/2016] [Indexed: 11/18/2022]
Abstract
Studying the gene regulatory networks (GRNs) that govern how cells change into specific cell types with unique roles throughout development is an active area of experimental research. The fate specification process can be viewed as a biological program prescribing the system dynamics, governed by a network of genetic interactions. To investigate the possibility that GRNs are not fixed but rather change their topology, for example as cells progress through commitment, we introduce the concept of Switching Gene Regulatory Networks (SGRNs) to enable the modelling and analysis of network reconfiguration. We define the synthesis problem of constructing SGRNs that are guaranteed to satisfy a set of constraints representing experimental observations of cell behaviour. We propose a solution to this problem that employs methods based on Satisfiability Modulo Theories (SMT) solvers, and we evaluate the feasibility and scalability of our approach on a set of synthetic benchmarks exhibiting possible biological behaviour of cell development. We outline how our approach applies to a more realistic biological system by considering a simplified network involved in neuron maturation and fate specification in the mammalian cortex.
Affiliation(s)
- Yoli Shavit
- University of Cambridge, UK; Microsoft Research, UK
| | - Hillel Kugler
- Microsoft Research, UK; Bar-Ilan University, Israel.
| |
|
32
|
Bueno MLP, Hommersom A, Lucas PJF, Lappenschaar M, Janzing JGE. Understanding disease processes by partitioned dynamic Bayesian networks. J Biomed Inform 2016; 61:283-97. [PMID: 27182055 DOI: 10.1016/j.jbi.2016.05.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 11/19/2015] [Revised: 04/04/2016] [Accepted: 05/11/2016] [Indexed: 10/21/2022]
Abstract
For many clinical problems, the underlying pathophysiological process in patients changes over time as a result of medical interventions. In model building for such problems, the typical scarcity of data in a clinical setting has often been compensated for by using time-homogeneous models, such as dynamic Bayesian networks. As a consequence, the specificities of the underlying process are lost in the resulting models. In the current work, we propose the new concept of partitioned dynamic Bayesian networks to capture distribution regime changes, i.e. time non-homogeneity, benefiting from an intuitive and compact representation together with the solid theoretical foundation of Bayesian network models. To balance specificity and simplicity in real-world scenarios, we propose a heuristic algorithm to search for and learn these non-homogeneous models, taking into account a preference for less complex models. An extensive set of experiments was run, in which simulation experiments showed that the heuristic algorithm was capable of constructing well-suited solutions, in terms of goodness of fit and statistical distance to the original distributions, consistent with the underlying processes that generated the data, whether these were homogeneous or non-homogeneous. Finally, a case study on psychotic depression was conducted using non-homogeneous models learned by the heuristic, leading to insightful answers to clinically relevant questions concerning the dynamics of this mental disorder.
Affiliation(s)
- Marcos L P Bueno
- Institute for Computing and Information Sciences, Radboud University Nijmegen, The Netherlands.
| | - Arjen Hommersom
- Institute for Computing and Information Sciences, Radboud University Nijmegen, The Netherlands; Faculty of Management, Science and Technology, Open University, The Netherlands.
| | - Peter J F Lucas
- Institute for Computing and Information Sciences, Radboud University Nijmegen, The Netherlands; Leiden Institute of Advanced Computer Science, Leiden University, The Netherlands.
| | - Martijn Lappenschaar
- Institute for Computing and Information Sciences, Radboud University Nijmegen, The Netherlands.
| | - Joost G E Janzing
- Department of Psychiatry, Radboud University Nijmegen Medical Center, The Netherlands.
| |
|
33
|
Yahara K, Furuta Y, Morimoto S, Kikutake C, Komukai S, Matelska D, Dunin-Horkawicz S, Bujnicki JM, Uchiyama I, Kobayashi I. Genome-wide survey of codons under diversifying selection in a highly recombining bacterial species, Helicobacter pylori. DNA Res 2016; 23:135-43. [PMID: 26961370 PMCID: PMC4833421 DOI: 10.1093/dnares/dsw003] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 05/19/2015] [Accepted: 01/23/2016] [Indexed: 01/04/2023]
Abstract
Selection has been a central issue in biology, in eukaryotes as well as prokaryotes. Inferring selection in recombining bacterial species, compared with clonal ones, has been a challenge. It is not known how codons under diversifying selection are distributed along the chromosome or among functional categories, or how frequently such codons are subject to mutual homologous recombination. Here, we explored these questions by analysing genes present in >90% of 29 genomes of Helicobacter pylori, one of the bacterial species with the highest mutation and recombination rates. Using a method for recombining sequences, we identified codons under diversifying selection (dN/dS > 1), which were widely distributed and accounted for ∼0.2% of all the codons of the genome. The codons were enriched in genes for host interaction/cell surface and genome maintenance (DNA replication, recombination, repair, and restriction modification system). The encoded amino acid residues were sometimes found adjacent to critical catalytic/binding residues in protein structures. Furthermore, by estimating the intensity of homologous recombination at a single-nucleotide level, we found that these codons appear to be more frequently subject to recombination. We expect that the present study provides a new approach to population genomics of selection in recombining prokaryotes.
Affiliation(s)
- Koji Yahara
- Biostatistics Center, Kurume University, Kurume, Fukuoka 830-0011, Japan
- Institute of Life Science, College of Medicine, Swansea University, Swansea SA2 8PP, UK
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Minato-ku, Tokyo 108-8639, Japan
| | - Yoshikazu Furuta
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Minato-ku, Tokyo 108-8639, Japan
| | - Shinpei Morimoto
- Division of Biostatistics, Kurume University School of Medicine, Fukuoka 830-0011, Japan
| | - Chie Kikutake
- Division of Biostatistics, Kurume University School of Medicine, Fukuoka 830-0011, Japan
| | - Sho Komukai
- Division of Biostatistics, Kurume University School of Medicine, Fukuoka 830-0011, Japan
| | - Dorota Matelska
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Trodena 4, 02-109 Warsaw, Poland
| | - Stanisław Dunin-Horkawicz
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Trodena 4, 02-109 Warsaw, Poland
| | - Janusz M. Bujnicki
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Trodena 4, 02-109 Warsaw, Poland
- Laboratory of Bioinformatics, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, 61-614 Poznan, Poland
| | - Ikuo Uchiyama
- Laboratory of Genome Informatics, National Institute for Basic Biology, Okazaki, Aichi 444-8585, Japan
| | - Ichizo Kobayashi
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Minato-ku, Tokyo108-8639, Japan
- Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
34
Grzegorczyk M. A non-homogeneous dynamic Bayesian network with a hidden Markov model dependency structure among the temporal data points. Mach Learn 2016. [DOI: 10.1007/s10994-015-5503-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 12/21/2022]
35
Malleshaiah M, Padi M, Rué P, Quackenbush J, Martinez-Arias A, Gunawardena J. Nac1 Coordinates a Sub-network of Pluripotency Factors to Regulate Embryonic Stem Cell Differentiation. Cell Rep 2016; 14:1181-1194. [PMID: 26832399 DOI: 10.1016/j.celrep.2015.12.101] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 06/19/2015] [Revised: 10/19/2015] [Accepted: 12/23/2015] [Indexed: 12/15/2022]
Abstract
Pluripotent cells give rise to distinct cell types during development and are regulated by often self-reinforcing molecular networks. How such networks allow cells to differentiate is less well understood. Here, we use integrative methods to show that external signals induce reorganization of the mouse embryonic stem cell pluripotency network and that a sub-network of four factors, Nac1, Oct4, Tcf3, and Sox2, regulates their differentiation into the alternative mesendodermal and neuroectodermal fates. In the mesendodermal fate, Nac1 and Oct4 were constrained within quantitative windows, whereas Sox2 and Tcf3 were repressed. In contrast, in the neuroectodermal fate, Sox2 and Tcf3 were constrained while Nac1 and Oct4 were repressed. In addition, we show that Nac1 coordinates differentiation by activating Oct4 and inhibiting both Sox2 and Tcf3. Reorganization of progenitor cell networks around shared factors might be a common differentiation strategy, and our integrative approach provides a general methodology for delineating such networks.
Affiliation(s)
- Mohan Malleshaiah
- Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Boston, MA 02115, USA.
- Megha Padi
- Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Avenue, Boston, MA 02215, USA
| | - Pau Rué
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
| | - John Quackenbush
- Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Avenue, Boston, MA 02215, USA
- Jeremy Gunawardena
- Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Boston, MA 02115, USA.
36
Hidden Markov induced Dynamic Bayesian Network for recovering time evolving gene regulatory networks. Sci Rep 2015; 5:17841. [PMID: 26680653 PMCID: PMC4683538 DOI: 10.1038/srep17841] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 05/30/2015] [Accepted: 10/26/2015] [Indexed: 11/08/2022]
Abstract
Dynamic Bayesian Networks (DBNs) have been widely used to recover gene regulatory relationships from time-series data in computational systems biology. Their standard assumption is ‘stationarity’, and several methods have therefore recently been proposed to relax this restriction. However, those methods suffer from three problems: long running time, low accuracy and reliance on parameter settings. To address these problems, we propose a novel non-stationary DBN model that extends each hidden node of a Hidden Markov Model into a DBN (called HMDBN), which properly handles the underlying time-evolving networks. Correspondingly, an improved structural EM algorithm is proposed to learn the HMDBN. It dramatically reduces the search space, thereby substantially improving computational efficiency. Additionally, we derive a novel generalized Bayesian Information Criterion under the non-stationary assumption (called BWBIC), which can significantly improve reconstruction accuracy and largely reduce over-fitting. Moreover, the re-estimation formulas for all parameters of our model are derived, enabling us to avoid reliance on parameter settings. Compared with state-of-the-art methods, experimental evaluation of the proposed method on both synthetic and real biological data demonstrates consistently high prediction accuracy and significantly improved computational efficiency, even without prior knowledge or parameter settings.
37
Lim N, d’Alché-Buc F, Auliac C, Michailidis G. Operator-valued kernel-based vector autoregressive models for network inference. Mach Learn 2014. [DOI: 10.1007/s10994-014-5479-3] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Indexed: 11/29/2022]
38
Oates CJ, Korkola J, Gray JW, Mukherjee S. Joint estimation of multiple related biological networks. Ann Appl Stat 2014. [DOI: 10.1214/14-aoas761] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 11/19/2022]
39
Oates CJ, Dondelinger F, Bayani N, Korkola J, Gray JW, Mukherjee S. Causal network inference using biochemical kinetics. Bioinformatics 2014; 30:i468-74. [PMID: 25161235 PMCID: PMC4147905 DOI: 10.1093/bioinformatics/btu452] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Networks are widely used as structural summaries of biochemical systems. Statistical estimation of networks is usually based on linear or discrete models. However, the dynamics of biochemical systems are generally non-linear, suggesting that suitable non-linear formulations may offer gains with respect to causal network inference and aid in associated prediction problems.
RESULTS: We present a general framework for network inference and dynamical prediction using time course data that is rooted in non-linear biochemical kinetics. This is achieved by considering a dynamical system based on a chemical reaction graph with associated kinetic parameters. Both the graph and kinetic parameters are treated as unknown; inference is carried out within a Bayesian framework. This allows prediction of dynamical behavior even when the underlying reaction graph itself is unknown or uncertain. Results, based on (i) data simulated from a mechanistic model of mitogen-activated protein kinase signaling and (ii) phosphoproteomic data from cancer cell lines, demonstrate that non-linear formulations can yield gains in causal network inference and permit dynamical prediction and uncertainty quantification in the challenging setting where the reaction graph is unknown.
AVAILABILITY AND IMPLEMENTATION: MATLAB R2014a software is available to download from warwick.ac.uk/chrisoates.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chris J Oates
- Department of Statistics, University of Warwick, Coventry, CV4 7AL, MRC Biostatistics Unit, Cambridge, CB2 0SR, UK, Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, Department of Biomedical Engineering, Knight Cancer Institute, Oregon Health and Science University, Portland, OR 97239-3098, USA and School of Clinical Medicine, University of Cambridge, Cambridge, CB2 0SP, UK
- Frank Dondelinger
- Department of Statistics, University of Warwick, Coventry, CV4 7AL, MRC Biostatistics Unit, Cambridge, CB2 0SR, UK, Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, Department of Biomedical Engineering, Knight Cancer Institute, Oregon Health and Science University, Portland, OR 97239-3098, USA and School of Clinical Medicine, University of Cambridge, Cambridge, CB2 0SP, UK
- Nora Bayani
- Department of Statistics, University of Warwick, Coventry, CV4 7AL, MRC Biostatistics Unit, Cambridge, CB2 0SR, UK, Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, Department of Biomedical Engineering, Knight Cancer Institute, Oregon Health and Science University, Portland, OR 97239-3098, USA and School of Clinical Medicine, University of Cambridge, Cambridge, CB2 0SP, UK
- James Korkola
- Department of Statistics, University of Warwick, Coventry, CV4 7AL, MRC Biostatistics Unit, Cambridge, CB2 0SR, UK, Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, Department of Biomedical Engineering, Knight Cancer Institute, Oregon Health and Science University, Portland, OR 97239-3098, USA and School of Clinical Medicine, University of Cambridge, Cambridge, CB2 0SP, UK
- Joe W Gray
- Department of Statistics, University of Warwick, Coventry, CV4 7AL, MRC Biostatistics Unit, Cambridge, CB2 0SR, UK, Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, Department of Biomedical Engineering, Knight Cancer Institute, Oregon Health and Science University, Portland, OR 97239-3098, USA and School of Clinical Medicine, University of Cambridge, Cambridge, CB2 0SP, UK
- Sach Mukherjee
- Department of Statistics, University of Warwick, Coventry, CV4 7AL, MRC Biostatistics Unit, Cambridge, CB2 0SR, UK, Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, Department of Biomedical Engineering, Knight Cancer Institute, Oregon Health and Science University, Portland, OR 97239-3098, USA and School of Clinical Medicine, University of Cambridge, Cambridge, CB2 0SP, UK
40
Khan J, Bouaynaya N, Fathallah-Shaykh HM. Tracking of time-varying genomic regulatory networks with a LASSO-Kalman smoother. EURASIP JOURNAL ON BIOINFORMATICS & SYSTEMS BIOLOGY 2014; 2014:3. [PMID: 24517200 PMCID: PMC3974129 DOI: 10.1186/1687-4153-2014-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 06/20/2013] [Accepted: 11/25/2013] [Indexed: 11/15/2022]
Abstract
It is widely accepted that cellular requirements and environmental conditions dictate the architecture of genetic regulatory networks. Nonetheless, the status quo in regulatory network modeling and analysis assumes a network topology that is invariant over time. In this paper, we refocus on a dynamic perspective of genetic networks, one that can uncover substantial topological changes in network structure during biological processes such as developmental growth. We propose a new approach to inferring time-varying genetic networks from a limited number of noisy observations by formulating network estimation as a target-tracking problem. We overcome the limited number of observations (the small-n, large-p problem) by performing tracking in a compressed domain. Assuming linear dynamics, we derive the LASSO-Kalman smoother, which recursively computes the minimum mean-square sparse estimate of the network connectivity at each time point. The LASSO operator, motivated by the sparsity of genetic regulatory networks, allows simultaneous signal recovery and compression, thereby reducing the number of required observations. Smoothing improves the estimate by incorporating all observations. We track the time-varying networks during the life cycle of Drosophila melanogaster. The recovered networks show that few genes are permanent, whereas most are transient, acting only during specific developmental phases of the organism.
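The filtering recursion underlying a LASSO-Kalman tracker can be illustrated with a much-simplified sketch. It assumes random-walk dynamics for a vector of connection weights and one scalar linear measurement per time step, and it substitutes plain soft-thresholding of the filtered estimate for the paper's LASSO step; it also omits the backward smoothing pass. The names `kalman_step` and `soft_threshold` and the parameters `q`, `r` and `lam` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def soft_threshold(v: Vector, lam: float) -> Vector:
    """Shrink each entry toward zero by lam; entries with |x| <= lam become 0."""
    return [x - lam if x > lam else x + lam if x < -lam else 0.0 for x in v]


def kalman_step(theta: Vector, P: Matrix, h: Vector, y: float,
                q: float, r: float) -> Tuple[Vector, Matrix]:
    """One predict/update step for theta_t = theta_{t-1} + w_t, y_t = h . theta_t + v_t.

    Hypothetical model: w_t ~ N(0, q*I), v_t ~ N(0, r); P is the symmetric state covariance.
    """
    n = len(theta)
    # Predict: the random walk keeps the mean and adds q to the covariance diagonal.
    P_pred = [[P[i][j] + (q if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Update with the scalar measurement y.
    Ph = [sum(P_pred[i][j] * h[j] for j in range(n)) for i in range(n)]
    s = sum(h[i] * Ph[i] for i in range(n)) + r          # innovation variance
    k = [Ph[i] / s for i in range(n)]                      # Kalman gain
    innovation = y - sum(h[i] * theta[i] for i in range(n))
    theta_new = [theta[i] + k[i] * innovation for i in range(n)]
    # P_new = (I - k h^T) P_pred; P_pred is symmetric, so h^T P_pred equals Ph^T.
    P_new = [[P_pred[i][j] - k[i] * Ph[j] for j in range(n)] for i in range(n)]
    return theta_new, P_new


if __name__ == "__main__":
    # Toy example: three hypothetical edge weights, one of which is truly zero.
    theta_true = [2.0, 0.0, -1.5]
    theta = [0.0, 0.0, 0.0]
    P = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for t in range(300):
        h = [1.0 if i == t % 3 else 0.0 for i in range(3)]   # observe one weight per step
        y = sum(hi * ti for hi, ti in zip(h, theta_true))     # noise-free for clarity
        theta, P = kalman_step(theta, P, h, y, q=1e-4, r=1e-2)
    print(soft_threshold(theta, lam=0.01))  # sparse estimate reported at the final step
```

Soft-thresholding is the proximal operator of the L1 penalty, which is why it is a common stand-in for a LASSO step. Keeping the unthresholded estimate inside the filter, and thresholding only the reported value, avoids feeding the shrinkage bias back into later updates; the paper's own formulation may handle this differently.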
Affiliation(s)
- Nidhal Bouaynaya
- Department of Electrical and Computer Engineering, Rowan University, 201 Mullica Hill Rd, Glassboro, NJ 08028, USA.
41
Xiong J, Zhou T. A Kalman-filter based approach to identification of time-varying gene regulatory networks. PLoS One 2013; 8:e74571. [PMID: 24116005 PMCID: PMC3792119 DOI: 10.1371/journal.pone.0074571] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 01/16/2013] [Accepted: 08/04/2013] [Indexed: 11/18/2022]
Abstract
Motivation: Conventional identification methods for gene regulatory networks (GRNs) have overwhelmingly adopted static topology models, which remain unchanged over time, to represent the underlying molecular interactions of a biological system. However, GRNs are dynamic, responding to physiological and environmental changes. Although there is a rich literature on modeling static or temporally invariant networks, how to systematically recover temporally changing networks remains a major and pressing challenge. The purpose of this study is to propose a two-step strategy that recovers time-varying GRNs.
Results: We propose using a switching auto-regressive model to describe the dynamics of time-varying GRNs, together with a two-step strategy to recover their structure. In the first step, change points are detected by a Kalman-filter based method. The observed time series are divided into segments at these change points, and each segment lying between two successive change points is associated with an individual static regulatory network. In the second step, conditional network structure identification methods are used to reconstruct the topology for each time interval. This two-step strategy efficiently decouples the change-point detection problem from the topology inference problem. Simulation results show that the proposed strategy can detect change points precisely and recover each individual topology effectively. Moreover, results on developmental data from Drosophila melanogaster show that the proposed change-point detection procedure also works effectively in real-world applications and that its change-point estimation accuracy exceeds that of other existing approaches, suggesting that the strategy may also help solve practical GRN reconstruction problems.
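A drastically simplified, scalar sketch of the two-step idea may help. It assumes a single time series following a switching AR(1) model x_t = a_k x_{t-1} + v_t, flags change points with a standard Kalman-filter innovation test (normalized innovation squared compared against a chi-square threshold), and then fits one AR coefficient per segment by least squares. This is not the paper's multivariate procedure; the function names `kalman_change_points` and `fit_segments`, the covariance-reset rule and all parameter values are hypothetical choices for illustration.

```python
from typing import List


def kalman_change_points(x: List[float], q: float = 1e-4, r: float = 1e-2,
                         threshold: float = 6.63, p_reset: float = 1.0) -> List[int]:
    """Step 1: flag times where the AR(1) coefficient appears to switch.

    The state a_t follows a random walk a_t = a_{t-1} + w_t, w_t ~ N(0, q); the
    observation is x_t = a_t * x_{t-1} + v_t, v_t ~ N(0, r). Time t is flagged when
    the normalized innovation squared exceeds `threshold` (6.63 is roughly the 99%
    quantile of a chi-square distribution with one degree of freedom).
    """
    a_hat, p = 0.0, p_reset
    change_points: List[int] = []
    for t in range(1, len(x)):
        h = x[t - 1]                         # the previous value acts as the regressor
        p_pred = p + q                       # predict step for the random-walk state
        innovation = x[t] - h * a_hat
        s = h * h * p_pred + r               # innovation variance
        if innovation * innovation / s > threshold:
            change_points.append(t)
            p_pred = p_reset                 # inflate uncertainty so the filter re-adapts
            s = h * h * p_pred + r
        k = p_pred * h / s                   # Kalman gain
        a_hat += k * innovation
        p = (1.0 - k * h) * p_pred
    return change_points


def fit_segments(x: List[float], change_points: List[int]) -> List[float]:
    """Step 2: least-squares AR(1) coefficient for each segment between change points.

    Assumes every segment contains at least one nonzero regressor value x[t-1].
    """
    bounds = [1] + change_points + [len(x)]
    coeffs = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        num = sum(x[t] * x[t - 1] for t in range(start, end))
        den = sum(x[t - 1] ** 2 for t in range(start, end))
        coeffs.append(num / den)
    return coeffs


if __name__ == "__main__":
    # Noise-free toy series: a = 1 for t = 1..10, then a = -1 for t = 11..20.
    x = [1.0]
    for t in range(1, 21):
        x.append((1.0 if t <= 10 else -1.0) * x[-1])
    cps = kalman_change_points(x)
    print(cps, fit_segments(x, cps))
```

In the paper, the second step reconstructs a full network topology per segment; the per-segment least-squares coefficient here merely stands in for that step. Resetting the covariance after each detection is one simple way to let the filter re-converge so that a single switch is not reported repeatedly.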
Affiliation(s)
- Jie Xiong
- Department of Automation, Tsinghua University, Beijing, China
- Tong Zhou
- Department of Automation and Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing, China