1
Rubaiyat AHM, Thai DH, Nichols JM, Hutchinson MN, Wallen SP, Naify CJ, Geib N, Haberman MR, Rohde GK. Data-driven Identification of Parametric Governing Equations of Dynamical Systems Using the Signed Cumulative Distribution Transform. Computer Methods in Applied Mechanics and Engineering 2024; 422:116822. [PMID: 38352168 PMCID: PMC10861186 DOI: 10.1016/j.cma.2024.116822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2024]
Abstract
This paper presents a novel data-driven approach to identify partial differential equation (PDE) parameters of a dynamical system. Specifically, we adopt a mathematical "transport" model for the solution of the dynamical system at specific spatial locations that allows us to accurately estimate the model parameters, including those associated with structural damage. This is accomplished by means of a newly-developed mathematical transform, the signed cumulative distribution transform (SCDT), which is shown to convert the general nonlinear parameter estimation problem into a simple linear regression. This approach has the additional practical advantage of requiring no a priori knowledge of the source of the excitation (or, alternatively, the initial conditions). By using training data, we devise a coarse regression procedure to recover different PDE parameters from the PDE solution measured at a single location. Numerical experiments show that the proposed regression procedure is capable of detecting and estimating PDE parameters with superior accuracy compared to a number of recently developed machine learning methods. Furthermore, a damage identification experiment conducted on a publicly available dataset provides strong evidence of the proposed method's effectiveness in structural health monitoring (SHM) applications. The Python implementation of the proposed system identification technique is integrated as a part of the software package PyTransKit [1].
Affiliation(s)
- Abu Hasnat Mohammad Rubaiyat
  - Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA, 22904, USA
  - U.S. Naval Research Laboratory, Washington, DC, 20375, USA
- Duy H Thai
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, 22908, USA
- Samuel P Wallen
  - Applied Research Laboratories, The University of Texas at Austin, Austin, TX, 78712, USA
- Christina J Naify
  - Applied Research Laboratories, The University of Texas at Austin, Austin, TX, 78712, USA
- Nathan Geib
  - Applied Research Laboratories, The University of Texas at Austin, Austin, TX, 78712, USA
- Michael R Haberman
  - Walker Department of Mechanical Engineering, The University of Texas at Austin, Austin, TX, 78712, USA
  - Applied Research Laboratories, The University of Texas at Austin, Austin, TX, 78712, USA
- Gustavo K Rohde
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, 22908, USA
  - Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA, 22904, USA
2
Shuttleworth JG, Lei CL, Whittaker DG, Windley MJ, Hill AP, Preston SP, Mirams GR. Empirical Quantification of Predictive Uncertainty Due to Model Discrepancy by Training with an Ensemble of Experimental Designs: An Application to Ion Channel Kinetics. Bull Math Biol 2023; 86:2. [PMID: 37999811 PMCID: PMC10673765 DOI: 10.1007/s11538-023-01224-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Accepted: 10/09/2023] [Indexed: 11/25/2023]
Abstract
When using mathematical models to make quantitative predictions for clinical or industrial use, it is important that predictions come with a reliable estimate of their accuracy (uncertainty quantification). Because models of complex biological systems are always large simplifications, model discrepancy arises: models fail to perfectly recapitulate the true data generating process. This presents a particular challenge for making accurate predictions, and especially for accurately quantifying uncertainty in these predictions. Experimentalists and modellers must choose which experimental procedures (protocols) are used to produce data used to train models. We propose to characterise uncertainty owing to model discrepancy with an ensemble of parameter sets, each of which results from training to data from a different protocol. The variability in predictions from this ensemble provides an empirical estimate of predictive uncertainty owing to model discrepancy, even for unseen protocols. We use the example of electrophysiology experiments that investigate the properties of hERG potassium channels. Here, 'information-rich' protocols allow mathematical models to be trained using numerous short experiments performed on the same cell. In this case, we simulate data with one model and fit it with a different (discrepant) one. For any individual experimental protocol, parameter estimates vary little under repeated samples from the assumed additive independent Gaussian noise model. Yet parameter sets arising from the same model applied to different experiments conflict, highlighting model discrepancy. Our methods will help select more suitable ion channel models for future studies, and will be widely applicable to a range of biological modelling problems.
Affiliation(s)
- Joseph G Shuttleworth
  - Centre for Mathematical Medicine & Biology, School of Mathematical Sciences, University of Nottingham, University Park, Nottingham, NG7 2RD, UK
- Chon Lok Lei
  - Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Macau, China
  - Department of Biomedical Sciences, Faculty of Health Sciences, University of Macau, Macau, China
- Dominic G Whittaker
  - Centre for Mathematical Medicine & Biology, School of Mathematical Sciences, University of Nottingham, University Park, Nottingham, NG7 2RD, UK
  - Systems Modeling & Translational Biology, Stevenage, GSK, UK
- Monique J Windley
  - Computational Cardiology Laboratory, Victor Chang Cardiac Research Institute, Darlinghurst, NSW, Australia
  - School of Clinical Medicine, Faculty of Medicine and Health, University of New South Wales, Sydney, NSW, Australia
- Adam P Hill
  - Computational Cardiology Laboratory, Victor Chang Cardiac Research Institute, Darlinghurst, NSW, Australia
  - School of Clinical Medicine, Faculty of Medicine and Health, University of New South Wales, Sydney, NSW, Australia
- Simon P Preston
  - Centre for Mathematical Medicine & Biology, School of Mathematical Sciences, University of Nottingham, University Park, Nottingham, NG7 2RD, UK
- Gary R Mirams
  - Centre for Mathematical Medicine & Biology, School of Mathematical Sciences, University of Nottingham, University Park, Nottingham, NG7 2RD, UK
3
Wang Q, Loh JM, He X, Wang Y. A latent state space model for estimating brain dynamics from electroencephalogram (EEG) data. Biometrics 2023; 79:2444-2457. [PMID: 36004670 PMCID: PMC10894450 DOI: 10.1111/biom.13742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2021] [Accepted: 08/01/2022] [Indexed: 02/03/2023]
Abstract
Modern neuroimaging technologies have substantially advanced the measurement of brain activity. Electroencephalogram (EEG) as a noninvasive neuroimaging technique measures changes in electrical voltage on the scalp induced by brain cortical activity. With its high temporal resolution, EEG has emerged as an increasingly useful tool to study brain connectivity. Challenges with modeling EEG signals of complex brain activity include interactions among unknown sources, low signal-to-noise ratio, and substantial between-subject heterogeneity. In this work, we propose a state space model that jointly analyzes multichannel EEG signals and learns dynamics of different sources corresponding to brain cortical activity. Our model borrows strength from spatially correlated measurements and uses low-dimensional latent states to explain all observed channels. The model can account for patient heterogeneity and quantify the effect of a subject's covariates on the latent space. The EM algorithm, Kalman filtering, and bootstrap resampling are used to fit the state space model and provide comparisons between patient diagnostic groups. We apply the developed approach to a case-control study of alcoholism and reveal significant attenuation of brain activity in response to visual stimuli in alcoholic subjects compared to healthy controls.
Affiliation(s)
- Qinxia Wang
  - Department of Biostatistics, Columbia University, New York, New York, USA
- Ji Meng Loh
  - Department of Statistics, NJIT, Newark, New Jersey, USA
- Xiaofu He
  - Department of Psychiatry, Columbia University, New York, New York, USA
- Yuanjia Wang
  - Department of Biostatistics, Columbia University, New York, New York, USA
  - Department of Psychiatry, Columbia University, New York, New York, USA
4
Zhang J, Hu C, Zhang Q. Gene regulatory network inference based on a nonhomogeneous dynamic Bayesian network model with an improved Markov Monte Carlo sampling. BMC Bioinformatics 2023; 24:264. [PMID: 37355560 DOI: 10.1186/s12859-023-05381-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2023] [Accepted: 06/07/2023] [Indexed: 06/26/2023] Open
Abstract
A nonhomogeneous dynamic Bayesian network model, which combines the dynamic Bayesian network and the multi-change point process, solves the limitations of the dynamic Bayesian network in modeling non-stationary gene expression data to a certain extent. However, certain problems persist, such as the low network reconstruction accuracy and poor model convergence. Therefore, we propose an MD-birth move based on the Manhattan distance of the data points to increase the rationality of the multi-change point process. The underlying concept of the MD-birth move is that the direction of movement of the change point is assumed to have a larger Manhattan distance between the variance and the mean of its left and right data points. Considering the data instability characteristics, we propose a Markov chain Monte Carlo sampling method based on node-dependent particle filtering in addition to the multi-change point process. The candidate parent nodes to be sampled, which are close to the real state, are pushed to the high probability area through the particle filter, and the candidate parent node set to be sampled that is far from the real state is pushed to the low probability area and then sampled. In terms of reconstructing the gene regulatory network, the model proposed in this paper (FC-DBN) has better network reconstruction accuracy and model convergence speed than other corresponding models on the Saccharomyces cerevisiae data and RAF data.
Affiliation(s)
- Jiayao Zhang
  - College of Artificial Intelligence and Big Data, Hefei University, Hefei, 230031, China
- Chunling Hu
  - College of Artificial Intelligence and Big Data, Hefei University, Hefei, 230031, China
- Qianqian Zhang
  - College of Artificial Intelligence and Big Data, Hefei University, Hefei, 230031, China
5
Wang Q, Dong A, Zhao J, Wang C, Griffin C, Gragnoli C, Xue F, Wu R. Vaginal microbiota networks as a mechanistic predictor of aerobic vaginitis. Front Microbiol 2022; 13:998813. [DOI: 10.3389/fmicb.2022.998813] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Accepted: 09/09/2022] [Indexed: 11/13/2022] Open
Abstract
Aerobic vaginitis (AV) is a complex vaginal dysbiosis that is thought to be caused by the micro-ecological change of the vaginal microbiota. While most studies have focused on how changes in the abundance of individual microbes are associated with the emergence of AV, we still do not have a complete mechanistic atlas of the microbe-AV link. Network modeling is central to understanding the structure and function of any microbial community assembly. By encapsulating the abundance of microbes as nodes and ecological interactions among microbes as edges, microbial networks can reveal how each microbe functions and how one microbe cooperates or competes with other microbes to mediate the dynamics of microbial communities. However, existing approaches can only estimate either the strength of a microbe-microbe link or the direction of this link, failing to capture full topological characteristics of a network, especially from high-dimensional microbial data. We combine allometry scaling law and evolutionary game theory to derive a functional graph theory that can characterize bidirectional, signed, and weighted interaction networks from any data domain. We apply our theory to characterize the causal interdependence between microbial interactions and AV. From functional networks arising from different functional modules, we find that, as the only favorable genus from Firmicutes among all identified genera, the role of Lactobacillus in maintaining vaginal microbial symbiosis is enabled by upregulation from other microbes, rather than through any intrinsic capacity. Among Lactobacillus species, the proportion of L. crispatus to L. iners is positively associated with more healthy acid vaginal ecosystems. In a less healthy alkaline ecosystem, L. crispatus establishes a contradictory relationship with other microbes, leading to population decrease relative to L. iners.
We identify topological changes of vaginal microbiota networks when the menstrual cycle of women changes from the follicular to luteal phases. Our network tool provides a mechanistic approach to disentangle the internal workings of the microbiota assembly and predict its causal relationships with human diseases including AV.
6
Dynamical modeling for non-Gaussian data with high-dimensional sparse ordinary differential equations. Comput Stat Data Anal 2022. [DOI: 10.1016/j.csda.2022.107483] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/23/2022]
7
Zhang N, Nanshan M, Cao J. A Joint estimation approach to sparse additive ordinary differential equations. Statistics and Computing 2022; 32:69. [PMID: 36033975 PMCID: PMC9395913 DOI: 10.1007/s11222-022-10117-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2021] [Accepted: 06/13/2022] [Indexed: 06/15/2023]
Abstract
Ordinary differential equations (ODEs) are widely used to characterize the dynamics of complex systems in real applications. In this article, we propose a novel joint estimation approach for generalized sparse additive ODEs where observations are allowed to be non-Gaussian. The new method is unified with existing collocation methods by considering the likelihood, ODE fidelity and sparse regularization simultaneously. We design a block coordinate descent algorithm for optimizing the non-convex and non-differentiable objective function. The global convergence of the algorithm is established. The simulation study and two applications demonstrate the superior performance of the proposed method in estimation and improved performance of identifying the sparse structure.
Affiliation(s)
- Nan Zhang
  - School of Data Science, Fudan University, Shanghai, China
- Muye Nanshan
  - School of Data Science, Fudan University, Shanghai, China
- Jiguo Cao
  - Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, Canada
8
Chen C, Shen B, Ma T, Wang M, Wu R. A statistical framework for recovering pseudo-dynamic networks from static data. Bioinformatics 2022; 38:2481-2487. [PMID: 35218338 PMCID: PMC9991900 DOI: 10.1093/bioinformatics/btac038] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/27/2021] [Revised: 12/06/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: The collection of temporal or perturbed data is often a prerequisite for reconstructing dynamic networks. However, these types of data are seldom available for genomic studies in medicine, thus significantly limiting the use of dynamic networks to characterize the biological principles underlying human health and diseases. RESULTS: We proposed a statistical framework to recover disease risk-associated pseudo-dynamic networks (DRDNet) from steady-state data. We incorporated a varying coefficient model with multiple ordinary differential equations to learn a series of networks. We analyzed the publicly available Genotype-Tissue Expression data to construct networks associated with hypertension risk, and biological findings showed that key genes constituting these networks had pivotal and biologically relevant roles associated with the vascular system. We also provided the selection consistency of the proposed learning procedure and evaluated its utility through extensive simulations. AVAILABILITY AND IMPLEMENTATION: DRDNet is implemented in the R language, and the source codes are available at https://github.com/chencxxy28/DRDnet/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chixiang Chen
  - Division of Biostatistics and Bioinformatics, University of Maryland School of Medicine, Baltimore, MD 21201, USA
  - Department of Neurosurgery, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Biyi Shen
  - Division of Biostatistics and Bioinformatics, College of Medicine, Pennsylvania State University, Hershey, PA 17033, USA
- Tianzhou Ma
  - Department of Epidemiology and Biostatistics, School of Public Health, University of Maryland, College Park, MD 20740, USA
- Ming Wang
  - Division of Biostatistics and Bioinformatics, College of Medicine, Pennsylvania State University, Hershey, PA 17033, USA
- Rongling Wu
  - Division of Biostatistics and Bioinformatics, College of Medicine, Pennsylvania State University, Hershey, PA 17033, USA
9
Liu Y, Li L, Wang X. A nonlinear sparse neural ordinary differential equation model for multiple functional processes. Can J Stat 2022; 50:59-85. [PMID: 35530428 PMCID: PMC9075179 DOI: 10.1002/cjs.11666] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/06/2022]
Abstract
In this article, we propose a new sparse neural ordinary differential equation (ODE) model to characterize flexible relations among multiple functional processes. We characterize the latent states of the functions via a set of ordinary differential equations. We then model the dynamic changes of the latent states using a deep neural network (DNN) with a specially designed architecture and a sparsity-inducing regularization. The new model is able to capture both nonlinear and sparse dependent relations among multivariate functions. We develop an efficient optimization algorithm to estimate the unknown weights for the DNN under the sparsity constraint. We establish both the algorithmic convergence and selection consistency, which constitute the theoretical guarantees of the proposed method. We illustrate the efficacy of the method through simulations and a gene regulatory network example.
Affiliation(s)
- Yijia Liu
  - Department of Statistics, Purdue University
- Lexin Li
  - Department of Biostatistics and Epidemiology, University of California at Berkeley
- Xiao Wang
  - Department of Statistics, Purdue University (author to whom correspondence may be addressed)
10
Shojaie A, Fox EB. Granger Causality: A Review and Recent Advances. Annual Review of Statistics and Its Application 2022; 9:289-319. [PMID: 37840549 PMCID: PMC10571505 DOI: 10.1146/annurev-statistics-040120-010930] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Indexed: 10/17/2023]
Abstract
Introduced more than a half-century ago, Granger causality has become a popular tool for analyzing time series data in many application domains, from economics and finance to genomics and neuroscience. Despite this popularity, the validity of this framework for inferring causal relationships among time series has remained the topic of continuous debate. Moreover, while the original definition was general, limitations in computational tools have constrained the applications of Granger causality to primarily simple bivariate vector autoregressive processes. Starting with a review of early developments and debates, this article discusses recent advances that address various shortcomings of the earlier approaches, from models for high-dimensional time series to more recent developments that account for nonlinear and non-Gaussian observations and allow for subsampled and mixed-frequency time series.
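As a rough illustration of the "simple bivariate vector autoregressive" setting the abstract mentions, the sketch below compares a restricted autoregression of x on its own lags with a full regression that also includes lags of y, and reports the usual F-statistic. The function name `granger_f_stat`, the lag order, and the simulated data are assumptions made for this example; they are not taken from the review.

```python
import numpy as np

def granger_f_stat(x, y, lag=1):
    """F-statistic for 'y Granger-causes x' in a bivariate linear AR model (illustrative)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x) - lag                      # number of usable time points
    target = x[lag:]
    # Lagged design columns x_{t-k} and y_{t-k} for k = 1..lag
    x_lags = np.column_stack([x[lag - k:len(x) - k] for k in range(1, lag + 1)])
    y_lags = np.column_stack([y[lag - k:len(y) - k] for k in range(1, lag + 1)])
    ones = np.ones((n, 1))
    restricted = np.hstack([ones, x_lags])        # own past only
    full = np.hstack([ones, x_lags, y_lags])      # own past plus past of y

    def rss(design):
        # Ordinary least squares residual sum of squares
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_f = rss(restricted), rss(full)
    dof = n - full.shape[1]
    return ((rss_r - rss_f) / lag) / (rss_f / dof)

# Simulated example (assumed data): y drives x with a one-step delay, not vice versa.
rng = np.random.default_rng(0)
T = 500
y = rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + 0.8 * y[t - 1] + 0.1 * rng.standard_normal()

f_y_to_x = granger_f_stat(x, y)   # expected to be large
f_x_to_y = granger_f_stat(y, x)   # expected to be much smaller
```

A large statistic only indicates that past values of y improve linear prediction of x; as the abstract notes, whether this supports a causal interpretation remains debated.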
Affiliation(s)
- Ali Shojaie
  - Department of Biostatistics, University of Washington, Seattle, Washington 98195-4322, USA
- Emily B Fox
  - Department of Statistics, Stanford University, Stanford, California 94305-4020, USA
11
A graph model of combination therapies. Drug Discov Today 2022; 27:1210-1217. [PMID: 35143962 DOI: 10.1016/j.drudis.2022.02.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/09/2021] [Revised: 12/31/2021] [Accepted: 02/02/2022] [Indexed: 11/24/2022]
Abstract
The simultaneous use of multiple medications causes drug-drug interactions (DDI) that impact therapeutic efficacy. Here, we argue that graph theory, in conjunction with game theory and ecosystem theory, can address this issue. We treat the coexistence of multiple drugs as a system in which DDI is modeled by game theory. We develop an ordinary differential equation model to characterize how the concentration of a drug changes as a result of its independent capacity and the dependent influence of other drugs through the metabolic response of the host. We coalesce all drugs into personalized and context-specific networks, which can reveal key DDI determinants of therapeutical efficacy. Our model can quantify drug synergy and antagonism and test the translational success of combination therapies to the clinic.
12
Abstract
Ordinary differential equations (ODEs) are widely used in modeling biological and physical processes in science. In this article, we propose a new reproducing kernel-based approach for estimation and inference of ODEs given noisy observations. We do not assume the functional forms in the ODEs to be known, or restrict them to be linear or additive, and we allow pairwise interactions. We perform sparse estimation to select individual functionals, and construct confidence intervals for the estimated signal trajectories. We establish the estimation optimality and selection consistency of kernel ODE under both the low-dimensional and high-dimensional settings, where the number of unknown functionals can be smaller or larger than the sample size. Our proposal builds upon the smoothing spline analysis of variance (SS-ANOVA) framework, but tackles several important problems that are not yet fully addressed, and thus extends the scope of existing SS-ANOVA as well. We demonstrate the efficacy of our method through numerous ODE examples.
Affiliation(s)
- Xiaowu Dai
  - Department of Economics and Simons Institute for the Theory of Computing, the University of California, Berkeley, Berkeley, CA
- Lexin Li
  - Department of Economics and Simons Institute for the Theory of Computing, the University of California, Berkeley, Berkeley, CA
  - Department of Biostatistics and Epidemiology, the University of California, Berkeley, Berkeley, CA
13
Zhang K, Safikhani A, Tank A, Shojaie A. Penalized estimation of threshold auto-regressive models with many components and thresholds. Electron J Stat 2022; 16:1891-1951. [PMID: 37051046 PMCID: PMC10088520 DOI: 10.1214/22-ejs1982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Abstract
Thanks to their simplicity and interpretable structure, autoregressive processes are widely used to model time series data. However, many real time series data sets exhibit non-linear patterns, requiring nonlinear modeling. The threshold Auto-Regressive (TAR) process provides a family of non-linear auto-regressive time series models in which the process dynamics are specific step functions of a thresholding variable. While estimation and inference for low-dimensional TAR models have been investigated, high-dimensional TAR models have received less attention. In this article, we develop a new framework for estimating high-dimensional TAR models, and propose two different sparsity-inducing penalties. The first penalty corresponds to a natural extension of classical TAR model to high-dimensional settings, where the same threshold is enforced for all model parameters. Our second penalty develops a more flexible TAR model, where different thresholds are allowed for different auto-regressive coefficients. We show that both penalized estimation strategies can be utilized in a three-step procedure that consistently learns both the thresholds and the corresponding auto-regressive coefficients. However, our theoretical and empirical investigations show that the direct extension of the TAR model is not appropriate for high-dimensional settings and is better suited for moderate dimensions. In contrast, the more flexible extension of the TAR model leads to consistent estimation and superior empirical performance in high dimensions.
Affiliation(s)
- Kunhui Zhang
  - University of Washington, Department of Statistics, Padelford Hall, W Stevens Way NE, Seattle, WA 98195
- Abolfazl Safikhani
  - University of Florida, Department of Statistics, 102 Griffin-Floyd Hall, Gainesville, FL 32611
- Alex Tank
  - University of Washington, Department of Statistics, Padelford Hall, W Stevens Way NE, Seattle, WA 98195
- Ali Shojaie
  - University of Washington, Department of Statistics, Padelford Hall, W Stevens Way NE, Seattle, WA 98195
14
Sun BM, Zeng D, Wang Y. Modeling Temporal Biomarkers With Semiparametric Nonlinear Dynamical Systems. Biometrika 2021; 108:199-214. [PMID: 34326552 PMCID: PMC8315107 DOI: 10.1093/biomet/asaa042] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/10/2023] Open
Abstract
Dynamical systems based on differential equations are useful for modeling the temporal evolution of biomarkers. These systems can characterize the temporal patterns of biomarkers and inform the detection of interactions among biomarkers. Existing statistical methods for dynamical systems mostly target single time-course data based on a linear model or generalized additive model. Hence, they cannot adequately capture the complex interactions among biomarkers; neither can they take into account the heterogeneity between systems or subjects. In this work, we propose a semiparametric dynamical system based on multi-index models for multi-subject time-course data. Our model accounts for between-subject heterogeneity by introducing system-level or subject-level covariates to dynamic systems, and it allows for nonlinear relationships and interactions between the combined biomarkers and the temporal rate of each biomarker. For estimation and inference, we consider a two-step procedure based on integral equations from the proposed model. We propose an algorithm that iterates between the estimation of the link function through splines and the estimation of index parameters and that allows for regularization to achieve sparsity. We prove model identifiability and derive the asymptotic properties of the estimated model parameters. A benefit of our approach is to pool information from multiple subjects to identify the interaction among biomarkers. We apply the method to analyze electroencephalogram (EEG) data for patients affected by alcohol dependence. The results reveal new insight into patients' brain activities and demonstrate differential interaction patterns in patients compared to healthy control subjects.
Affiliation(s)
- By Ming Sun
  - Department of Biostatistics, Columbia University, 722 West 168th St. New York, U.S
- Donglin Zeng
  - Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, USA
- Yuanjia Wang
  - Department of Biostatistics, Columbia University, 722 West 168th St. New York, U.S. & Department of Psychiatry, Columbia University Irving Medical Center
15
Stepaniants G, Brunton BW, Kutz JN. Inferring causal networks of dynamical systems through transient dynamics and perturbation. Phys Rev E 2020; 102:042309. [PMID: 33212733 DOI: 10.1103/physreve.102.042309] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/06/2020] [Accepted: 09/25/2020] [Indexed: 12/28/2022]
Abstract
Inferring causal relations from time series measurements is an ill-posed mathematical problem, where typically an infinite number of potential solutions can reproduce the given data. We explore in depth a strategy to disambiguate between possible underlying causal networks by perturbing the network, where the forcings are either targeted or applied at random. The resulting transient dynamics provide the critical information necessary to infer causality. Two methods are shown to provide accurate causal reconstructions: Granger causality (GC) with perturbations, and our proposed perturbation cascade inference (PCI). Perturbed GC is capable of inferring smaller networks under low coupling strength regimes. Our proposed PCI method demonstrated consistently strong performance in inferring causal relations for small (2-5 node) and large (10-20 node) networks, with both linear and nonlinear dynamics. Thus, the ability to apply a large and diverse set of perturbations to the network is critical for successfully and accurately determining causal relations and disambiguating between various viable networks.
Affiliation(s)
- George Stepaniants
  - Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA and Department of Mathematics, University of Washington, Seattle, Washington 98195, USA
- Bingni W Brunton
  - Department of Biology, University of Washington, Seattle, Washington 98195, USA
- J Nathan Kutz
  - Department of Applied Mathematics, University of Washington, Seattle, Washington 98195, USA
16
Safikhani A, Shojaie A. Joint Structural Break Detection and Parameter Estimation in High-Dimensional Non-Stationary VAR Models. J Am Stat Assoc 2020; 117:251-264. [PMID: 38375186 PMCID: PMC10874880 DOI: 10.1080/01621459.2020.1770097] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/18/2018] [Revised: 11/01/2019] [Accepted: 05/11/2020] [Indexed: 10/24/2022]
Abstract
Assuming stationarity is unrealistic in many time series applications. A more realistic alternative is to assume piecewise stationarity, where the model can change at potentially many change points. We propose a three-stage procedure for simultaneous estimation of change points and parameters of high-dimensional piecewise vector autoregressive (VAR) models. In the first step, we reformulate the change point detection problem as a high-dimensional variable selection one, and solve it using a penalized least square estimator with a total variation penalty. We show that the penalized estimation method over-estimates the number of change points, and propose a selection criterion to identify the change points. In the last step of our procedure, we estimate the VAR parameters in each of the segments. We prove that the proposed procedure consistently detects the number and location of change points, and provides consistent estimates of VAR parameters. The performance of the method is illustrated through several simulated and real data examples.
Affiliation(s)
- Ali Shojaie
- Department of Biostatistics, University of Washington
|
17
|
Cardner M, Meyer-Schaller N, Christofori G, Beerenwinkel N. Inferring signalling dynamics by integrating interventional with observational data. Bioinformatics 2020; 35:i577-i585. [PMID: 31510686 PMCID: PMC6612850 DOI: 10.1093/bioinformatics/btz325] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/17/2022] Open
Abstract
Motivation: In order to infer a cell signalling network, we generally need interventional data from perturbation experiments. If the perturbation experiments are time-resolved, then signal progression through the network can be inferred. However, such designs are infeasible for large signalling networks, where it is more common to have steady-state perturbation data on the one hand, and a non-interventional time series on the other. Such was the design in a recent experiment investigating the coordination of epithelial–mesenchymal transition (EMT) in murine mammary gland cells. We aimed to infer the underlying signalling network of transcription factors and microRNAs coordinating EMT, as well as the signal progression during EMT.
Results: In the context of nested effects models, we developed a method for integrating perturbation data with a non-interventional time series. We applied the model to RNA sequencing data obtained from an EMT experiment. Part of the network inferred from RNA interference was validated experimentally using luciferase reporter assays. Our model extension is formulated as an integer linear programme, which can be solved efficiently using heuristic algorithms. This extension allowed us to infer the signal progression through the network during an EMT time course, and thereby assess when each regulator is necessary for EMT to advance.
Availability and implementation: R package at https://github.com/cbg-ethz/timeseriesNEM. The RNA sequencing data and microscopy images can be explored through a Shiny app at https://emt.bsse.ethz.ch.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mathias Cardner
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, Switzerland
|
18
|
Dattner I, Ship H, Voit EO. Separable Nonlinear Least-Squares Parameter Estimation for Complex Dynamic Systems. COMPLEXITY 2020; 2020:6403641. [PMID: 34113070 PMCID: PMC8188859 DOI: 10.1155/2020/6403641] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/10/2023]
Abstract
Nonlinear dynamic models are widely used for characterizing processes that govern complex biological pathway systems. Over the past decade, validation and further development of these models became possible due to data collected via high-throughput experiments using methods from molecular biology. While these data are very beneficial, they are typically incomplete and noisy, which renders the inference of parameter values for complex dynamic models challenging. Fortunately, many biological systems have embedded linear mathematical features, which may be exploited, thereby improving fits and leading to better convergence of optimization algorithms. In this paper, we explore options of inference for dynamic models using a novel method of separable nonlinear least-squares optimization and compare its performance to the traditional nonlinear least-squares method. The numerical results from extensive simulations suggest that the proposed approach is at least as accurate as the traditional nonlinear least-squares, but usually superior, while also enjoying a substantial reduction in computational time.
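The idea of exploiting embedded linear structure can be illustrated with a minimal sketch. It is not the authors' implementation: the toy model y = a*exp(-k*t) + b, the function names, and the grid-plus-golden-section search are all assumptions chosen for illustration. For any fixed nonlinear parameter k, the linear parameters (a, b) follow from ordinary linear least squares, so the outer search runs over k alone.

```python
import numpy as np

def profile_sse(k, t, y):
    """For fixed k, solve for the linear parameters (a, b) and return the SSE."""
    # Hypothetical toy model: y = a * exp(-k * t) + b
    A = np.column_stack([np.exp(-k * t), np.ones_like(t)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid), coef

def fit_separable(t, y, k_grid, tol=1e-8):
    """Search over the nonlinear parameter only; (a, b) are eliminated each time."""
    # Coarse grid to bracket the minimum of the profiled objective.
    sse = [profile_sse(k, t, y)[0] for k in k_grid]
    i = int(np.argmin(sse))
    lo = k_grid[max(i - 1, 0)]
    hi = k_grid[min(i + 1, len(k_grid) - 1)]
    # Golden-section refinement inside the bracket (a simple stand-in optimizer).
    phi = (np.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - phi * (hi - lo)
        d = lo + phi * (hi - lo)
        if profile_sse(c, t, y)[0] < profile_sse(d, t, y)[0]:
            hi = d
        else:
            lo = c
    k = 0.5 * (lo + hi)
    _, coef = profile_sse(k, t, y)
    a, b = coef
    return float(a), float(b), float(k)

# Noise-free synthetic data, used only to exercise the sketch.
t = np.linspace(0.0, 5.0, 50)
y = 2.0 * np.exp(-1.3 * t) + 0.5
a_hat, b_hat, k_hat = fit_separable(t, y, np.linspace(0.05, 5.0, 100))
```

In this sketch the optimizer sees a one-dimensional problem instead of a three-dimensional one, which is the kind of reduction the abstract credits for better convergence and shorter run times.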
Affiliation(s)
- Itai Dattner
- Department of Statistics, University of Haifa, 199 Aba Khoushy Ave., Mount Carmel, Haifa 3498838, Israel
- Harold Ship
- Department of Statistics, University of Haifa, 199 Aba Khoushy Ave., Mount Carmel, Haifa 3498838, Israel
- Eberhard O. Voit
- The Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, 950 Atlantic Drive, Atlanta, GA 30332–2000, USA
|
19
|
Zhang J, Sun WW, Li L. Mixed-Effect Time-Varying Network Model and Application in Brain Connectivity Analysis. J Am Stat Assoc 2020; 115:2022-2036. [PMID: 34321703 DOI: 10.1080/01621459.2019.1677242] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 01/10/2023]
Abstract
Time-varying networks are fast emerging in a wide range of scientific and business applications. Most existing dynamic network models are limited to a single-subject and discrete-time setting. In this article, we propose a mixed-effect network model that characterizes the continuous time-varying behavior of the network at the population level, meanwhile taking into account both the individual subject variability as well as the prior module information. We develop a multistep optimization procedure for a constrained likelihood estimation and derive the associated asymptotic properties. We demonstrate the effectiveness of our method through both simulations and an application to a study of brain development in youth. Supplementary materials for this article are available online.
Affiliation(s)
- Jingfei Zhang
- Department of Management Science, Miami Business School, University of Miami, Miami, FL
- Will Wei Sun
- Krannert School of Management, Purdue University, West Lafayette, IN
- Lexin Li
- Department of Biostatistics and Epidemiology, School of Public Health, University of California at Berkeley, Berkeley, CA
|
20
|
Pfister N, Bauer S, Peters J. Learning stable and predictive structures in kinetic systems. Proc Natl Acad Sci U S A 2019; 116:25405-25411. [PMID: 31776252 PMCID: PMC6925987 DOI: 10.1073/pnas.1905688116] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 11/18/2022] Open
Abstract
Learning kinetic systems from data is one of the core challenges in many fields. Identifying stable models is essential for the generalization capabilities of data-driven inference. We introduce a computationally efficient framework, called CausalKinetiX, that identifies structure from discrete time, noisy observations, generated from heterogeneous experiments. The algorithm assumes the existence of an underlying, invariant kinetic model, a key criterion for reproducible research. Results on both simulated and real-world examples suggest that learning the structure of kinetic systems benefits from a causal perspective. The identified variables and models allow for a concise description of the dynamics across multiple experimental settings and can be used for prediction in unseen experiments. We observe significant improvements compared to well-established approaches focusing solely on predictive performance, especially for out-of-sample generalization.
Affiliation(s)
- Niklas Pfister
- Seminar for Statistics, Eidgenössische Technische Hochschule Zürich, 8092 Zürich, Switzerland
- Stefan Bauer
- Empirical Inference, Max-Planck-Institute for Intelligent Systems, 72076 Tübingen, Germany
- Jonas Peters
- Department of Mathematical Sciences, University of Copenhagen, 2100 Copenhagen, Denmark
|
21
|
Chen C, Jiang L, Fu G, Wang M, Wang Y, Shen B, Liu Z, Wang Z, Hou W, Berceli SA, Wu R. An omnidirectional visualization model of personalized gene regulatory networks. NPJ Syst Biol Appl 2019; 5:38. [PMID: 31632690 PMCID: PMC6789114 DOI: 10.1038/s41540-019-0116-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 02/26/2019] [Accepted: 09/18/2019] [Indexed: 01/09/2023] Open
Abstract
Gene regulatory networks (GRNs) have been widely used as a fundamental tool to reveal the genomic mechanisms that underlie the individual's response to environmental and developmental cues. Standard approaches infer GRNs as holistic graphs of gene co-expression, but such graphs cannot quantify how gene-gene interactions vary among individuals and how they alter structurally across spatiotemporal gradients. Here, we develop a general framework for inferring informative, dynamic, omnidirectional, and personalized networks (idopNetworks) from routine transcriptional experiments. This framework is constructed by a system of quasi-dynamic ordinary differential equations (qdODEs) derived from the combination of ecological and evolutionary theories. We reconstruct idopNetworks using genomic data from a surgical experiment and illustrate how network structure is associated with surgical response to infrainguinal vein bypass grafting and the outcome of grafting. idopNetworks may shed light on genotype-phenotype relationships and provide valuable information for personalized medicine.
Affiliation(s)
- Chixiang Chen
- Center for Statistical Genetics, Departments of Public Health Sciences and Statistics, Pennsylvania State University, Hershey, PA 17033 USA
- Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA 17033 USA
- Libo Jiang
- Center for Computational Biology, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, 100083 China
- Guifang Fu
- Department of Mathematical Sciences, SUNY Binghamton University, Binghamton, NY 13902 USA
- Ming Wang
- Center for Statistical Genetics, Departments of Public Health Sciences and Statistics, Pennsylvania State University, Hershey, PA 17033 USA
- Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA 17033 USA
- Yaqun Wang
- Department of Biostatistics and Epidemiology, Rutgers School of Public Health, Piscataway, NJ 08854 USA
- Biyi Shen
- Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA 17033 USA
- Zhenqiu Liu
- Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA 17033 USA
- Zuoheng Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, CT 06520 USA
- Wei Hou
- Department of Family, Population & Preventive Medicine, Stony Brook School of Medicine, Stony Brook, NY 11794 USA
- Scott A. Berceli
- Malcom Randall VA Medical Center, Gainesville, FL 32610 USA
- Department of Surgery, University of Florida, Box 100128, Gainesville, FL 32610 USA
- Department of Biomedical Engineering, University of Florida, Gainesville, FL 32610 USA
- Rongling Wu
- Center for Statistical Genetics, Departments of Public Health Sciences and Statistics, Pennsylvania State University, Hershey, PA 17033 USA
- Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA 17033 USA
|
22
|
Dale R, Bhat HS. Equations of mind: Data science for inferring nonlinear dynamics of socio-cognitive systems. COGN SYST RES 2018. [DOI: 10.1016/j.cogsys.2018.06.020] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/17/2022]
|
23
|
Inferring a nonlinear biochemical network model from a heterogeneous single-cell time course data. Sci Rep 2018; 8:6790. [PMID: 29717206 PMCID: PMC5931614 DOI: 10.1038/s41598-018-25064-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 10/10/2017] [Accepted: 04/09/2018] [Indexed: 12/30/2022] Open
Abstract
Mathematical modeling and analysis of biochemical reaction networks are key routines in computational systems biology and biophysics; however, it remains difficult to choose the most valid model. Here, we propose a computational framework for data-driven and systematic inference of a nonlinear biochemical network model. The framework is based on the expectation-maximization algorithm combined with particle smoother and sparse regularization techniques. In this method, a “redundant” model consisting of an excessive number of nodes and regulatory paths is iteratively updated by eliminating unnecessary paths, resulting in an inference of the most likely model. Using artificial single-cell time-course data showing heterogeneous oscillatory behaviors, we demonstrated that this algorithm successfully inferred the true network without any prior knowledge of network topology or parameter values. Furthermore, we showed that both the regulatory paths among nodes and the optimal number of nodes in the network could be systematically determined. The method presented in this study provides a general framework for inferring a nonlinear biochemical network model from heterogeneous single-cell time-course data.
|