1. Wu Y, Qian B, Wang A, Dong H, Zhu E, Ma B. iLSGRN: inference of large-scale gene regulatory networks based on multi-model fusion. Bioinformatics 2023; 39:btad619. [PMID: 37851379; PMCID: PMC10589915; DOI: 10.1093/bioinformatics/btad619] [Received: 04/17/2023; Revised: 10/04/2023; Accepted: 10/17/2023]
Abstract
MOTIVATION Gene regulatory networks (GRNs) describe the interactions between genes and help reveal the biological mechanisms at work in the cell. Reconstructing GRNs from gene expression data has been a central computational problem in systems biology. However, because large-scale GRNs are high-dimensional and non-linear, inferring them accurately and efficiently remains challenging. RESULTS In this article, we propose a new approach, iLSGRN, that reconstructs large-scale GRNs from steady-state and time-series gene expression data based on non-linear ordinary differential equations. First, a regulatory gene recognition algorithm calculates the Maximal Information Coefficient between genes and excludes redundant regulatory relationships to reduce dimensionality. Then, a feature fusion algorithm builds a model from the feature importances derived from XGBoost (eXtreme Gradient Boosting) and RF (Random Forest) models; this effectively trains the non-linear ordinary differential equation model of GRNs and improves the accuracy and stability of the inference algorithm. Extensive experiments on datasets of different scales show that our method improves on state-of-the-art methods. Furthermore, we perform cross-validation experiments on real gene datasets to validate the robustness and effectiveness of the proposed method. AVAILABILITY AND IMPLEMENTATION The proposed method is implemented in Python and is available at: https://github.com/lab319/iLSGRN.
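The fusion step can be illustrated with a minimal sketch. It assumes that, for each target gene, importance scores for the candidate regulators have already been obtained from an XGBoost model and an RF model (for example, from the `feature_importances_` attribute that both the `xgboost` and scikit-learn regressors expose), and it combines them by normalizing each set to sum to one and taking a weighted average. The equal weighting, the helper names and the ranking step are illustrative assumptions, not iLSGRN's published fusion rule.

```python
def normalize(scores):
    """Scale non-negative importance scores so they sum to 1 (an all-zero input stays all zero)."""
    total = sum(scores.values())
    if total == 0:
        return {k: 0.0 for k in scores}
    return {k: v / total for k, v in scores.items()}


def fuse_importances(xgb_scores, rf_scores, weight=0.5):
    """Hypothetical fusion rule: weighted average of the two normalized importance maps.

    Both arguments map candidate regulator -> importance for ONE target gene.
    """
    a, b = normalize(xgb_scores), normalize(rf_scores)
    keys = set(a) | set(b)
    return {k: weight * a.get(k, 0.0) + (1 - weight) * b.get(k, 0.0) for k in keys}


def rank_edges(fused_per_target):
    """Flatten {target: {regulator: score}} into (regulator, target, score), highest score first."""
    edges = [
        (reg, tgt, score)
        for tgt, scores in fused_per_target.items()
        for reg, score in scores.items()
    ]
    return sorted(edges, key=lambda e: e[2], reverse=True)
```

Normalizing before averaging keeps either model from dominating simply because its raw importances are on a larger scale; sorting all (regulator, target) pairs by fused score then gives a ranked edge list of the kind usually scored against a gold standard.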
Affiliation(s)
- Yiming Wu
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Bing Qian
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Anqi Wang
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong 999077, China
- Heng Dong
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Enqiang Zhu
  - Institute of Computing Science and Technology, Guangzhou University, Guangzhou 510006, China
- Baoshan Ma
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
2. Dynamical modeling for non-Gaussian data with high-dimensional sparse ordinary differential equations. Comput Stat Data Anal 2022. [DOI: 10.1016/j.csda.2022.107483]
3. Zhang N, Nanshan M, Cao J. A joint estimation approach to sparse additive ordinary differential equations. Stat Comput 2022; 32:69. [PMID: 36033975; PMCID: PMC9395913; DOI: 10.1007/s11222-022-10117-y] [Received: 12/17/2021; Accepted: 06/13/2022]
Abstract
Ordinary differential equations (ODEs) are widely used to characterize the dynamics of complex systems in real applications. In this article, we propose a novel joint estimation approach for generalized sparse additive ODEs in which observations are allowed to be non-Gaussian. The new method is unified with existing collocation methods in that it considers the likelihood, ODE fidelity and sparse regularization simultaneously. We design a block coordinate descent algorithm to optimize the non-convex and non-differentiable objective function, and we establish the global convergence of the algorithm. A simulation study and two applications demonstrate the superior estimation performance of the proposed method and its improved ability to identify the sparse structure.
Affiliation(s)
- Nan Zhang
  - School of Data Science, Fudan University, Shanghai, China
- Muye Nanshan
  - School of Data Science, Fudan University, Shanghai, China
- Jiguo Cao
  - Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, Canada
4. Combining kinetic orders for efficient S-System modelling of gene regulatory network. Biosystems 2022; 220:104736. [PMID: 35863700; DOI: 10.1016/j.biosystems.2022.104736] [Received: 03/08/2022; Revised: 07/10/2022; Accepted: 07/10/2022]
Abstract
S-System models, a class of non-linear differential equation models, are widely used to reconstruct gene regulatory networks from temporal gene expression data. An S-System model involves two terms, generation and degradation, and uses the kinetic parameters g_ij and h_ij to represent the direction, nature and intensity of the genetic interactions. Learning the large number of model parameters is computationally expensive. Previously, we improved the performance of the algorithm by dynamically allocating the maximum in-degree for each gene. Although that method was effective for smaller networks, larger networks still required a large amount of computation. This problem arose mainly from the frequent occurrence of invalid networks during optimization, primarily because the two kinetic parameters (g_ij and h_ij) of the S-System model converge independently. Being independent, these two parameters can converge to values that indicate contradictory gene interactions, one suggesting activation and the other inhibition. In this study, to address this major challenge in S-System modelling, we developed a novel method with two features: a penalty term that penalizes networks with invalid kinetic orders, and a parameter, w_ij, derived by combining the kinetic parameters g_ij and h_ij. The penalty term is used for candidate selection while optimizing with the DRNI (Dynamically Regulated Network Initialization) algorithm. Rather than remaining constant, it is dynamic, with its magnitude depending on the number of invalid interactions in the given network. This approach encourages the generation of valid candidate solutions and eliminates invalid networks systematically.
The previous DRNI method, a two-stage approach that dynamically allocates the maximum in-degree for each gene, was further improved by adding a third stage that applies the proposed w_ij to handle invalid regulations that may remain in the candidate solutions. The method was tested on different gene expression datasets; it reduced the number of iterations and produced more accurate networks. For a 20-gene network, the number of generations required for convergence was reduced by 300, and the F-score improved by 0.05 compared with our previously reported DRNI approach. For the well-known 10-gene networks of the DREAM4 challenge, our method improved the average area under the ROC curve.
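For reference, the standard S-System form is dX_i/dt = alpha_i * prod_j X_j^g_ij - beta_i * prod_j X_j^h_ij. The sketch below evaluates that right-hand side and counts "invalid" interactions under one assumed reading of the abstract: g_ij > 0 (or h_ij < 0) signals activation and g_ij < 0 (or h_ij > 0) signals inhibition, so a pair conflicts when g_ij and h_ij share a sign. The additive penalty in `penalized_fitness`, its coefficient `c`, and all function names are hypothetical; the paper's actual penalty and its combined parameter w_ij are not reproduced here.

```python
from math import prod


def s_system_rhs(x, alpha, beta, g, h):
    """Evaluate dX_i/dt = alpha_i * prod_j X_j**g[i][j] - beta_i * prod_j X_j**h[i][j].

    x must be strictly positive, since kinetic orders may be negative or fractional.
    """
    n = len(x)
    return [
        alpha[i] * prod(x[j] ** g[i][j] for j in range(n))  # generation term
        - beta[i] * prod(x[j] ** h[i][j] for j in range(n))  # degradation term
        for i in range(n)
    ]


def count_invalid(g, h):
    """Count pairs (i, j) whose kinetic orders share a sign (assumed contradiction criterion)."""
    n = len(g)
    return sum(1 for i in range(n) for j in range(n) if g[i][j] * h[i][j] > 0)


def penalized_fitness(error, g, h, c=1.0):
    """Hypothetical dynamic penalty: the fitness worsens with each invalid interaction."""
    return error + c * count_invalid(g, h)
```

Because the penalty depends on `count_invalid`, candidates with fewer contradictory kinetic orders are preferred during selection, which mirrors the abstract's description of a penalty whose magnitude tracks the number of invalid interactions.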
5. Chen C, Shen B, Ma T, Wang M, Wu R. A statistical framework for recovering pseudo-dynamic networks from static data. Bioinformatics 2022; 38:2481-2487. [PMID: 35218338; PMCID: PMC9991900; DOI: 10.1093/bioinformatics/btac038] [Received: 08/27/2021; Revised: 12/06/2021]
Abstract
MOTIVATION Temporal or perturbed data are usually a prerequisite for reconstructing dynamic networks. However, such data are seldom available for genomic studies in medicine, which significantly limits the use of dynamic networks to characterize the biological principles underlying human health and disease. RESULTS We proposed a statistical framework to recover disease risk-associated pseudo-dynamic networks (DRDNet) from steady-state data. We incorporated a varying coefficient model with multiple ordinary differential equations to learn a series of networks. We analyzed the publicly available Genotype-Tissue Expression data to construct networks associated with hypertension risk, and the biological findings showed that key genes constituting these networks had pivotal, biologically relevant roles in the vascular system. We also established the selection consistency of the proposed learning procedure and evaluated its utility through extensive simulations. AVAILABILITY AND IMPLEMENTATION DRDNet is implemented in R, and the source code is available at https://github.com/chencxxy28/DRDnet/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chixiang Chen
  - Division of Biostatistics and Bioinformatics, University of Maryland School of Medicine, Baltimore, MD 21201, USA
  - Department of Neurosurgery, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Biyi Shen
  - Division of Biostatistics and Bioinformatics, College of Medicine, Pennsylvania State University, Hershey, PA 17033, USA
- Tianzhou Ma
  - Department of Epidemiology and Biostatistics, School of Public Health, University of Maryland, College Park, MD 20740, USA
- Ming Wang
  - Division of Biostatistics and Bioinformatics, College of Medicine, Pennsylvania State University, Hershey, PA 17033, USA
- Rongling Wu
  - Division of Biostatistics and Bioinformatics, College of Medicine, Pennsylvania State University, Hershey, PA 17033, USA
6. Liu Y, Li L, Wang X. A nonlinear sparse neural ordinary differential equation model for multiple functional processes. Can J Stat 2022; 50:59-85. [PMID: 35530428; PMCID: PMC9075179; DOI: 10.1002/cjs.11666]
Abstract
In this article, we propose a new sparse neural ordinary differential equation (ODE) model to characterize flexible relations among multiple functional processes. We characterize the latent states of the functions via a set of ordinary differential equations. We then model the dynamic changes of the latent states using a deep neural network (DNN) with a specially designed architecture and a sparsity-inducing regularization. The new model is able to capture both nonlinear and sparse dependent relations among multivariate functions. We develop an efficient optimization algorithm to estimate the unknown weights for the DNN under the sparsity constraint. We establish both the algorithmic convergence and selection consistency, which constitute the theoretical guarantees of the proposed method. We illustrate the efficacy of the method through simulations and a gene regulatory network example.
Affiliation(s)
- Yijia Liu
  - Department of Statistics, Purdue University
- Lexin Li
  - Department of Biostatistics and Epidemiology, University of California at Berkeley
- Xiao Wang
  - Department of Statistics, Purdue University (corresponding author)
7. Shojaie A, Fox EB. Granger Causality: A Review and Recent Advances. Annu Rev Stat Appl 2022; 9:289-319. [PMID: 37840549; PMCID: PMC10571505; DOI: 10.1146/annurev-statistics-040120-010930]
Abstract
Introduced more than half a century ago, Granger causality has become a popular tool for analyzing time series data in many application domains, from economics and finance to genomics and neuroscience. Despite this popularity, the validity of this framework for inferring causal relationships among time series has remained a topic of continuing debate. Moreover, while the original definition was general, limitations in computational tools have constrained applications of Granger causality primarily to simple bivariate vector autoregressive processes. Starting with a review of early developments and debates, this article discusses recent advances that address various shortcomings of the earlier approaches, from models for high-dimensional time series to more recent developments that account for nonlinear and non-Gaussian observations and allow for subsampled and mixed-frequency time series.
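The classical bivariate setting that the review starts from can be made concrete with a small sketch: x is said to Granger-cause y if adding lagged values of x to an autoregression of y significantly reduces the residual sum of squares. The code assumes a single lag, ordinary least squares and the usual F statistic comparing the restricted and full models. It uses only the standard library, so the small normal-equation solver, the function names and the simulated system are illustrative choices, not taken from the review.

```python
import random


def ols_rss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    p = len(X[0])
    # Normal equations (X^T X) b = X^T y
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    c = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        c[k], c[piv] = c[piv], c[k]
        for r in range(k + 1, p):
            f = A[r][k] / A[k][k]
            for j in range(k, p):
                A[r][j] -= f * A[k][j]
            c[r] -= f * c[k]
    # Back substitution
    b = [0.0] * p
    for k in reversed(range(p)):
        b[k] = (c[k] - sum(A[k][j] * b[j] for j in range(k + 1, p))) / A[k][k]
    return sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2 for row, yi in zip(X, y))


def granger_f(x, y):
    """F statistic for 'x Granger-causes y' with one lag.

    Restricted model: y_t ~ 1 + y_{t-1}; full model: y_t ~ 1 + y_{t-1} + x_{t-1}.
    """
    target = y[1:]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    full = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    rss_r, rss_f = ols_rss(restricted, target), ols_rss(full, target)
    n, q, p = len(target), 1, 3  # observations, added lags, full-model parameters
    return ((rss_r - rss_f) / q) / (rss_f / (n - p))


def simulate(n=500, seed=0):
    """Toy system in which x drives y with a one-step delay: y_t = 0.5 y_{t-1} + 0.8 x_{t-1} + noise."""
    rng = random.Random(seed)
    x, y = [rng.gauss(0, 1)], [0.0]
    for _ in range(1, n):
        x.append(rng.gauss(0, 1))
        y.append(0.5 * y[-1] + 0.8 * x[-2] + rng.gauss(0, 0.5))
    return x, y
```

On data from `simulate()`, `granger_f(x, y)` is large while `granger_f(y, x)` stays near the values expected when there is no Granger causality; in practice the statistic would be compared with an F(1, n - 3) critical value. The high-dimensional, nonlinear and mixed-frequency extensions discussed in the review go well beyond this two-variable test.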
Affiliation(s)
- Ali Shojaie
  - Department of Biostatistics, University of Washington, Seattle, Washington 98195-4322, USA
- Emily B Fox
  - Department of Statistics, Stanford University, Stanford, California 94305-4020, USA
8.
Abstract
Ordinary differential equations (ODEs) are widely used to model biological and physical processes in science. In this article, we propose a new reproducing kernel-based approach for estimation and inference of ODEs given noisy observations. We do not assume the functional forms in the ODE to be known, or restrict them to be linear or additive, and we allow pairwise interactions. We perform sparse estimation to select individual functionals, and construct confidence intervals for the estimated signal trajectories. We establish the estimation optimality and selection consistency of kernel ODE in both low-dimensional and high-dimensional settings, where the number of unknown functionals can be smaller or larger than the sample size. Our proposal builds upon the smoothing spline analysis of variance (SS-ANOVA) framework, but tackles several important problems that are not yet fully addressed, and thus also extends the scope of existing SS-ANOVA. We demonstrate the efficacy of our method through numerous ODE examples.
Affiliation(s)
- Xiaowu Dai
  - Department of Economics and Simons Institute for the Theory of Computing, the University of California, Berkeley, Berkeley, CA
- Lexin Li
  - Department of Economics and Simons Institute for the Theory of Computing, the University of California, Berkeley, Berkeley, CA
  - Department of Biostatistics and Epidemiology, the University of California, Berkeley, Berkeley, CA
9. Sun M, Zeng D, Wang Y. Modeling Temporal Biomarkers With Semiparametric Nonlinear Dynamical Systems. Biometrika 2021; 108:199-214. [PMID: 34326552; PMCID: PMC8315107; DOI: 10.1093/biomet/asaa042]
Abstract
Dynamical systems based on differential equations are useful for modeling the temporal evolution of biomarkers. These systems can characterize the temporal patterns of biomarkers and inform the detection of interactions among biomarkers. Existing statistical methods for dynamical systems mostly target single time-course data based on a linear model or generalized additive model. Hence, they cannot adequately capture the complex interactions among biomarkers, nor can they account for heterogeneity between systems or subjects. In this work, we propose a semiparametric dynamical system based on multi-index models for multiple-subject time-course data. Our model accounts for between-subject heterogeneity by introducing system-level or subject-level covariates to dynamic systems, and it allows for nonlinear relationships and interactions between the combined biomarkers and the temporal rate of change of each biomarker. For estimation and inference, we consider a two-step procedure based on integral equations from the proposed model. We propose an algorithm that iterates between estimating the link function through splines and estimating the index parameters, and that allows for regularization to achieve sparsity. We prove model identifiability and derive the asymptotic properties of the estimated model parameters. A benefit of our approach is that it pools information from multiple subjects to identify the interactions among biomarkers. We apply the method to analyze electroencephalogram (EEG) data from patients affected by alcohol dependence. The results reveal new insights into patients' brain activity and demonstrate differential interaction patterns in patients compared with healthy control subjects.
Affiliation(s)
- Ming Sun
  - Department of Biostatistics, Columbia University, 722 West 168th St., New York, USA
- Donglin Zeng
  - Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, USA
- Yuanjia Wang
  - Department of Biostatistics, Columbia University, 722 West 168th St., New York, USA
  - Department of Psychiatry, Columbia University Irving Medical Center
10. Ma B, Fang M, Jiao X. Inference of gene regulatory networks based on nonlinear ordinary differential equations. Bioinformatics 2020; 36:4885-4893. [DOI: 10.1093/bioinformatics/btaa032] [Received: 09/10/2019; Revised: 12/30/2019; Accepted: 01/15/2020]
Abstract
Motivation
Gene regulatory networks (GRNs) capture the regulatory interactions between genes that result from the fundamental biological processes of transcription and translation. In some cases, the topology of a GRN is not known and has to be inferred from gene expression data. Most existing GRN reconstruction algorithms are applied to either time-series data or steady-state data. Although time-series data contain more information about the system dynamics, steady-state data reflect the stability of the underlying regulatory network.
Results
In this article, we propose a method for inferring GRNs jointly from time-series and steady-state data. We use a non-linear ordinary differential equation framework to model dynamic gene regulation and an importance measurement strategy to infer all putative regulatory links efficiently. The proposed method is evaluated extensively on the artificial DREAM4 dataset and on two real gene expression datasets from yeast and Escherichia coli. On public benchmark datasets, the proposed method outperforms other popular inference algorithms in terms of overall score. A comparison of performance across datasets of different scales shows that our method remains robust and accurate at low computational complexity.
Availability and implementation
The proposed method is implemented in Python and is available at: https://github.com/lab319/GRNs_nonlinear_ODEs
Supplementary information
Supplementary data are available at Bioinformatics online.
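A minimal sketch of how time-series and steady-state measurements can feed one regression problem, assuming the dynGENIE3-style model dx_i/dt = f_i(x) - alpha_i * x_i, where f_i is the unknown non-linear regulation function of gene i and alpha_i is a degradation rate. The finite-difference derivative, the fixed `alpha`, and the helper names are assumptions for illustration; the abstract does not give the exact formulation. Each time-series step yields the target (x_i(t_{k+1}) - x_i(t_k)) / (t_{k+1} - t_k) + alpha_i * x_i(t_k); at steady state dx_i/dt = 0, so the target is simply alpha_i * x_i.

```python
def time_series_samples(times, expr, i, alpha):
    """Build (inputs, targets) for gene i from one time series.

    expr[k][j] is the expression of gene j at times[k]; the derivative is a forward difference.
    """
    X, y = [], []
    for k in range(len(times) - 1):
        dt = times[k + 1] - times[k]
        deriv = (expr[k + 1][i] - expr[k][i]) / dt
        X.append(list(expr[k]))            # regulators observed at t_k
        y.append(deriv + alpha * expr[k][i])  # estimate of f_i(x(t_k))
    return X, y


def steady_state_samples(expr, i, alpha):
    """At steady state dx_i/dt = 0, so f_i(x) = alpha * x_i."""
    return [list(row) for row in expr], [alpha * row[i] for row in expr]


def joint_samples(times, ts_expr, ss_expr, i, alpha):
    """Pool both data types into one training set for the regression of f_i."""
    X1, y1 = time_series_samples(times, ts_expr, i, alpha)
    X2, y2 = steady_state_samples(ss_expr, i, alpha)
    return X1 + X2, y1 + y2
```

Any regressor with an importance measure (for example, a tree ensemble) can then be fitted to the pooled samples, and the importance of each input gene used as the score of the edge from that gene to gene i.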
Affiliation(s)
- Baoshan Ma
  - College of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Mingkun Fang
  - College of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Xiangtian Jiao
  - College of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
11. Che D, Guo S, Jiang Q, Chen L. PFBNet: a priori-fused boosting method for gene regulatory network inference. BMC Bioinformatics 2020; 21:308. [PMID: 32664870; PMCID: PMC7362553; DOI: 10.1186/s12859-020-03639-7] [Received: 12/29/2019; Accepted: 07/02/2020]
Abstract
BACKGROUND Inferring gene regulatory networks (GRNs) from gene expression data remains a challenge in systems biology. In the past decade, numerous methods have been developed for GRN inference, but the problem remains challenging because the data are noisy and high-dimensional and there is a large number of potential interactions. RESULTS We present a novel method, the priori-fused boosting network inference method (PFBNet), to infer GRNs from time-series expression data using a non-linear Boosting model and a scheme for fusing prior information (e.g., knockout data). Specifically, PFBNet first calculates the confidences of the regulatory relationships using the boosting-based model, taking into account the accumulated impact of gene expression at previous time points. Then, a newly defined strategy fuses information from the prior data by elevating the confidences of the regulatory relationships from the corresponding regulators. CONCLUSIONS Experiments on benchmark datasets from the DREAM challenge and on E. coli datasets show that PFBNet achieves significantly better performance than other state-of-the-art methods (Jump3, GENIE3-lag, HiDi, iRafNet and BiXGBoost).
Affiliation(s)
- Dandan Che
  - Shenzhen Key Lab for High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518000 China
- Shun Guo
  - Shenzhen Key Lab for High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518000 China
- Qingshan Jiang
  - Shenzhen Key Lab for High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518000 China
- Lifei Chen
  - School of Mathematics and Computer Science, Fujian Normal University, Fujian, 350117 China
12. Shafiee Kamalabad M, Grzegorczyk M. Non-homogeneous dynamic Bayesian networks with edge-wise sequentially coupled parameters. Bioinformatics 2020; 36:1198-1207. [PMID: 31504191; PMCID: PMC7703764; DOI: 10.1093/bioinformatics/btz690] [Received: 02/08/2019; Revised: 08/02/2019; Accepted: 09/02/2019]
Abstract
MOTIVATION Non-homogeneous dynamic Bayesian networks (NH-DBNs) are a popular tool for learning networks with time-varying interaction parameters. A multiple changepoint process is used to divide the data into disjoint segments, and the network interaction parameters are assumed to be segment-specific. The objective is to infer the network structure along with the segmentation and the segment-specific parameters from the data. Conventional (uncoupled) NH-DBNs do not allow information to be exchanged among segments, so the interaction parameters have to be learned separately for each segment. More advanced coupled NH-DBN models allow the interaction parameters to vary but force them to stay similar over time. As this enforced similarity can be counter-productive, we propose a new consensus NH-DBN model that combines features of the uncoupled and the coupled NH-DBN. The new model infers for each individual edge whether its interaction parameter stays similar over time (and should be coupled) or changes from segment to segment (and should stay uncoupled). RESULTS Our new model yields higher network reconstruction accuracies than state-of-the-art models for synthetic and yeast network data. For gene expression data from A. thaliana, our new model infers a plausible network topology and yields hypotheses about the light dependencies of the gene interactions. AVAILABILITY AND IMPLEMENTATION Data are available from earlier publications. Matlab code is available at Bioinformatics online. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mahdi Shafiee Kamalabad
  - Bernoulli Institute, Department of Mathematics, Faculty of Science and Engineering, Groningen University, Groningen 9747 AG, The Netherlands
- Marco Grzegorczyk
  - Bernoulli Institute, Department of Mathematics, Faculty of Science and Engineering, Groningen University, Groningen 9747 AG, The Netherlands
13. Chen C, Jiang L, Fu G, Wang M, Wang Y, Shen B, Liu Z, Wang Z, Hou W, Berceli SA, Wu R. An omnidirectional visualization model of personalized gene regulatory networks. NPJ Syst Biol Appl 2019; 5:38. [PMID: 31632690; PMCID: PMC6789114; DOI: 10.1038/s41540-019-0116-1] [Received: 02/26/2019; Accepted: 09/18/2019]
Abstract
Gene regulatory networks (GRNs) have been widely used as a fundamental tool to reveal the genomic mechanisms that underlie the individual's response to environmental and developmental cues. Standard approaches infer GRNs as holistic graphs of gene co-expression, but such graphs cannot quantify how gene-gene interactions vary among individuals and how they alter structurally across spatiotemporal gradients. Here, we develop a general framework for inferring informative, dynamic, omnidirectional, and personalized networks (idopNetworks) from routine transcriptional experiments. This framework is constructed by a system of quasi-dynamic ordinary differential equations (qdODEs) derived from the combination of ecological and evolutionary theories. We reconstruct idopNetworks using genomic data from a surgical experiment and illustrate how network structure is associated with surgical response to infrainguinal vein bypass grafting and the outcome of grafting. idopNetworks may shed light on genotype-phenotype relationships and provide valuable information for personalized medicine.
Affiliation(s)
- Chixiang Chen
  - Center for Statistical Genetics, Departments of Public Health Sciences and Statistics, Pennsylvania State University, Hershey, PA 17033 USA
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA 17033 USA
- Libo Jiang
  - Center for Computational Biology, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, 100083 China
- Guifang Fu
  - Department of Mathematical Sciences, SUNY Binghamton University, Binghamton, NY 13902 USA
- Ming Wang
  - Center for Statistical Genetics, Departments of Public Health Sciences and Statistics, Pennsylvania State University, Hershey, PA 17033 USA
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA 17033 USA
- Yaqun Wang
  - Department of Biostatistics and Epidemiology, Rutgers School of Public Health, Piscataway, NJ 08854 USA
- Biyi Shen
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA 17033 USA
- Zhenqiu Liu
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA 17033 USA
- Zuoheng Wang
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT 06520 USA
- Wei Hou
  - Department of Family, Population & Preventive Medicine, Stony Brook School of Medicine, Stony Brook, NY 11794 USA
- Scott A. Berceli
  - Malcom Randall VA Medical Center, Gainesville, FL 32610 USA
  - Department of Surgery, University of Florida, Box 100128, Gainesville, FL 32610 USA
  - Department of Biomedical Engineering, University of Florida, Gainesville, FL 32610 USA
- Rongling Wu
  - Center for Statistical Genetics, Departments of Public Health Sciences and Statistics, Pennsylvania State University, Hershey, PA 17033 USA
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA 17033 USA
14. Deng Y, Zenil H, Tegnér J, Kiani NA. HiDi: an efficient reverse engineering schema for large-scale dynamic regulatory network reconstruction using adaptive differentiation. Bioinformatics 2017; 33:3964-3972. [DOI: 10.1093/bioinformatics/btx501] [Received: 07/04/2016; Accepted: 08/05/2017]
Affiliation(s)
- Yue Deng
  - Algorithmic Dynamics Lab, Karolinska Institute, Stockholm, Sweden
  - Unit of Computational Medicine, Center for Molecular Medicine, Department of Medicine, Solna and Science for Life Laboratory (SciLifeLab), Karolinska Institute, Stockholm, Sweden
- Hector Zenil
  - Algorithmic Dynamics Lab, Karolinska Institute, Stockholm, Sweden
  - Unit of Computational Medicine, Center for Molecular Medicine, Department of Medicine, Solna and Science for Life Laboratory (SciLifeLab), Karolinska Institute, Stockholm, Sweden
- Jesper Tegnér
  - Unit of Computational Medicine, Center for Molecular Medicine, Department of Medicine, Solna and Science for Life Laboratory (SciLifeLab), Karolinska Institute, Stockholm, Sweden
  - Biological and Environmental Sciences and Engineering Division, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
- Narsis A Kiani
  - Algorithmic Dynamics Lab, Karolinska Institute, Stockholm, Sweden
  - Unit of Computational Medicine, Center for Molecular Medicine, Department of Medicine, Solna and Science for Life Laboratory (SciLifeLab), Karolinska Institute, Stockholm, Sweden
15. Inference of Gene Regulatory Networks Using Bayesian Nonparametric Regression and Topology Information. Comput Math Methods Med 2017; 2017:8307530. [PMID: 28133490; PMCID: PMC5241943; DOI: 10.1155/2017/8307530] [Received: 08/18/2016; Accepted: 11/24/2016]
Abstract
Gene regulatory networks (GRNs) play an important role in cellular systems and are important for understanding biological processes. Many algorithms have been developed to infer GRNs. However, most algorithms use only gene expression data and ignore topology information during inference, although incorporating this information can partially compensate for the lack of reliable expression data. Here we develop a Bayesian group lasso with spike and slab priors to perform gene selection and estimation for nonparametric models. B-spline basis functions are used to capture nonlinear relationships flexibly, and penalties are used to avoid overfitting. Further, we incorporate the topology information into the Bayesian method as a prior. We apply our method to the DREAM3 and DREAM4 datasets and to two real biological datasets. The results show that our method performs better than existing methods and that the topology information prior improves the results.
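The B-spline expansion mentioned in the abstract can be sketched with the standard Cox-de Boor recursion. Each candidate regulator's expression value is mapped to a row of basis values, and the coefficients belonging to one regulator then form one group in a group-lasso-type model. The knot vector, the cubic degree and the function names below are assumptions for illustration; the Bayesian spike-and-slab machinery and the topology prior are not shown.

```python
def bspline_basis(j, k, t, knots):
    """Value at t of the j-th B-spline of degree k on a non-decreasing knot vector (Cox-de Boor)."""
    if k == 0:
        return 1.0 if knots[j] <= t < knots[j + 1] else 0.0
    left = 0.0
    if knots[j + k] != knots[j]:  # skip terms with zero-width support (0/0 treated as 0)
        left = (t - knots[j]) / (knots[j + k] - knots[j]) * bspline_basis(j, k - 1, t, knots)
    right = 0.0
    if knots[j + k + 1] != knots[j + 1]:
        right = (
            (knots[j + k + 1] - t)
            / (knots[j + k + 1] - knots[j + 1])
            * bspline_basis(j + 1, k - 1, t, knots)
        )
    return left + right


def design_row(t, knots, k=3):
    """All len(knots) - k - 1 basis values at t: one row of a regulator's design-matrix block."""
    n_basis = len(knots) - k - 1
    return [bspline_basis(j, k, t, knots) for j in range(n_basis)]
```

With a clamped knot vector such as `[0, 0, 0, 0, 0.5, 1, 1, 1, 1]`, the basis values at any t in [0, 1) are non-negative and sum to one (partition of unity). This simple recursion returns all zeros at the right end point t = 1 because the degree-0 case uses half-open intervals, so real code usually special-cases that point.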
16. Chen S, Shojaie A, Witten DM. Network Reconstruction From High-Dimensional Ordinary Differential Equations. J Am Stat Assoc 2017; 112:1697-1707. [PMID: 29618851; DOI: 10.1080/01621459.2016.1229197]
Abstract
We consider the task of learning a dynamical system from high-dimensional time-course data. For instance, we might wish to estimate a gene regulatory network from gene expression data measured at discrete time points. We model the dynamical system nonparametrically as a system of additive ordinary differential equations. Most existing methods for parameter estimation in ordinary differential equations estimate the derivatives from noisy observations. This is known to be challenging and inefficient. We propose a novel approach that does not involve derivative estimation. We show that the proposed method can consistently recover the true network structure even in high dimensions, and we demonstrate empirical improvement over competing approaches. Supplementary materials for this article are available online.
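The central idea here is to avoid derivative estimation. Integrating both sides of the ODE gives x_j(t) - x_j(t0) = ∫ f_j(x(s)) ds over [t0, t], so only integrated quantities need to be estimated. The sketch below is a deliberately simplified, linear version of that idea, written under stated assumptions. It is not the authors' additive, group-penalized estimator. It approximates each integral with the trapezoidal rule on the observed time points and fits the coefficients by ordinary least squares. It uses no smoothing, no intercept, and no sparsity penalty. All function names are hypothetical, and `simulate_linear_ode` only produces toy forward-Euler data.

```python
# Simplified linear sketch: fit an ODE from integrated features, with no derivative estimates.

def trapezoid_cumulative(times, values):
    """Cumulative trapezoidal integral of `values` from times[0] to each times[i]."""
    out = [0.0]
    for i in range(1, len(times)):
        out.append(out[-1] + 0.5 * (times[i] - times[i - 1]) * (values[i] + values[i - 1]))
    return out


def solve_linear(a, b):
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def integral_regression(times, traj):
    """Fit x_j(t) - x_j(t0) ~ sum_k theta[j][k] * integral_{t0}^{t} x_k(s) ds.

    traj[g][i] is the observed expression of gene g at times[i]. Both sides of the
    regression are integrated quantities, so no derivative is ever estimated.
    """
    n_genes = len(traj)
    feats = [trapezoid_cumulative(times, traj[k]) for k in range(n_genes)]
    rows = range(1, len(times))  # at t0 both sides are identically zero
    gram = [[sum(feats[a][i] * feats[b][i] for i in rows) for b in range(n_genes)]
            for a in range(n_genes)]
    theta = []
    for j in range(n_genes):
        y = [traj[j][i] - traj[j][0] for i in range(len(times))]
        rhs = [sum(feats[a][i] * y[i] for i in rows) for a in range(n_genes)]
        theta.append(solve_linear(gram, rhs))  # normal equations for gene j
    return theta


def simulate_linear_ode(a, x0, step, n_steps):
    """Forward-Euler trajectories of dx/dt = A x, used here only as toy data."""
    times = [step * i for i in range(n_steps + 1)]
    traj = [[] for _ in x0]
    state = list(x0)
    for _ in times:
        for g, value in enumerate(state):
            traj[g].append(value)
        deriv = [sum(a[j][k] * state[k] for k in range(len(state))) for j in range(len(state))]
        state = [state[j] + step * deriv[j] for j in range(len(state))]
    return times, traj
```

Take A = [[-1, 0], [1, -1]], in which gene 0 activates gene 1 and gene 1 has no effect on gene 0. For that system, `integral_regression(*simulate_linear_ode(A, [1.0, 0.0], 0.01, 300))` returns a coefficient matrix close to A, with the (0, 1) entry essentially zero. A real network would replace least squares with a sparse or group-sparse fit so that absent edges are set exactly to zero.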
Affiliation(s)
- Shizhe Chen
- Department of Biostatistics, University of Washington, WA
- Ali Shojaie
- Departments of Biostatistics and Statistics, University of Washington, WA
- Daniela M Witten
- Departments of Biostatistics and Statistics, University of Washington, WA
17
Palapattu GS. Parsing Multi-omic Data to Understand Urothelial Cell Carcinoma Progression. J Urol 2016; 195:1645. [PMID: 26946160 DOI: 10.1016/j.juro.2016.02.2965] [Accepted: 02/29/2016]
18
Inferring Broad Regulatory Biology from Time Course Data: Have We Reached an Upper Bound under Constraints Typical of In Vivo Studies? PLoS One 2015; 10:e0127364. [PMID: 25984725 PMCID: PMC4435750 DOI: 10.1371/journal.pone.0127364] [Received: 12/08/2014] [Accepted: 04/13/2015]
Abstract
There is a growing appreciation of the network biology that regulates the coordinated expression of molecular and cellular markers; however, questions persist regarding the identifiability of these networks. Here we explore some of the issues relevant to recovering directed regulatory networks from time course data collected under experimental constraints typical of in vivo studies. NetSim simulations of sparsely connected biological networks were used to evaluate two simple feature selection techniques used in the construction of linear Ordinary Differential Equation (ODE) models, namely truncation of terms versus latent vector projection. Performance was compared with the ODE-based Time Series Network Identification (TSNI) integral method and the information-theoretic Time-Delay ARACNE (TD-ARACNE). Projection-based techniques and TSNI integral outperformed truncation-based selection and TD-ARACNE on aggregate networks with edge densities of 10-30%, i.e., transcription factor networks, protein-protein cliques and immune signaling networks. All were more robust to noise than truncation-based feature selection. Performance was comparable on the in silico 10-node DREAM 3 network, a 5-node yeast synthetic network designed for In vivo Reverse-engineering and Modeling Assessment (IRMA) and a 9-node human HeLa cell cycle network of similar size and edge density. Performance was more sensitive to the number of time courses than to sampling frequency, and extrapolated better to larger networks when experiments were grouped. In all cases, performance declined rapidly in larger networks with lower edge density. The limited recovery and high false positive rates obtained overall call into question our ability to generate informative time course data, rather than the design of any particular reverse engineering algorithm.
19
Hasegawa T, Mori T, Yamaguchi R, Shimamura T, Miyano S, Imoto S, Akutsu T. Genomic data assimilation using a higher moment filtering technique for restoration of gene regulatory networks. BMC Syst Biol 2015; 9:14. [PMID: 25890175 PMCID: PMC4371723 DOI: 10.1186/s12918-015-0154-2] [Received: 09/24/2014] [Accepted: 02/20/2015]
Abstract
Background As a result of recent advances in biotechnology, many findings related to intracellular systems have been published, e.g., transcription factor (TF) information. Although we can reproduce biological systems by incorporating such findings and describing their dynamics as mathematical equations, simulation results can be inconsistent with data from biological observations if there are inaccurate or unknown parts in the constructed system. To complete such systems, relationships among genes have been inferred through several computational approaches, which typically apply abstractions such as linearization to handle the heavy computational cost of evaluating biological systems. However, since these approximations can generate false regulations, computational methods that can infer regulatory relationships based on less abstract models incorporating existing knowledge are strongly needed. Results We propose a new data assimilation algorithm that utilizes a simple nonlinear regulatory model and a state space representation to infer gene regulatory networks (GRNs) using time-course observation data. For the estimation of the hidden state variables and the parameter values, we developed a novel method termed a higher moment ensemble particle filter (HMEnPF) that can retain the first four moments of the conditional distributions through filtering steps. Starting from the original model, e.g., one derived from the literature, the proposed algorithm can sequentially evaluate candidate models, which are generated by partially changing the current best model, to find the model that can best predict the data. For the performance evaluation, we generated six synthetic datasets based on two real biological networks and evaluated the effectiveness of the proposed algorithm by improving the networks inferred by previous methods. We then applied the algorithm to time-course observation data of rat skeletal muscle stimulated with corticosteroid.
Since a corticosteroid pharmacogenomic pathway, its kinetics/dynamics and TF candidate genes have been partially elucidated, we incorporated these findings and inferred an extended pathway of rat pharmacogenomics. Conclusions Through the simulation study, the proposed algorithm outperformed previous methods and successfully improved the regulatory structure inferred by the previous methods. Furthermore, the proposed algorithm could extend a partially elucidated corticosteroid-related pathway by incorporating several information sources. Electronic supplementary material The online version of this article (doi:10.1186/s12918-015-0154-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Takanori Hasegawa
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611-0011, Japan.
- Tomoya Mori
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611-0011, Japan.
- Rui Yamaguchi
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan.
- Teppei Shimamura
- Division of Systems Biology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya 466-8550, Japan.
- Satoru Miyano
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan.
- Seiya Imoto
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan.
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611-0011, Japan.
20
Bhaumik P, Ghosal S. Bayesian two-step estimation in differential equation models. Electron J Stat 2015. [DOI: 10.1214/15-ejs1099]
21
Inference of gene regulatory networks incorporating multi-source biological knowledge via a state space model with L1 regularization. PLoS One 2014; 9:e105942. [PMID: 25162401 PMCID: PMC4146587 DOI: 10.1371/journal.pone.0105942] [Received: 03/27/2014] [Accepted: 07/25/2014]
Abstract
Comprehensive understanding of gene regulatory networks (GRNs) is a major challenge in the field of systems biology. Currently, there are two main approaches in GRN analysis using time-course observation data, namely an ordinary differential equation (ODE)-based approach and a statistical model-based approach. The ODE-based approach can generate complex dynamics of GRNs according to biologically validated nonlinear models. However, because of computational difficulties, it cannot simultaneously estimate system dynamics and regulatory relationships for ten or more genes. The statistical model-based approach uses highly abstract models to simply describe biological systems and to infer relationships among several hundreds of genes from the data. However, the high abstraction generates false regulations that are not permitted biologically. Thus, when dealing with several tens of genes whose relationships are partially known, a method is urgently required that can infer regulatory relationships based on a model with low abstraction, emulate the dynamics of ODE-based models, and incorporate prior knowledge. To accomplish this, we propose a method for inference of GRNs using a state space representation of a vector auto-regressive (VAR) model with L1 regularization. This method can estimate the dynamic behavior of genes based on linear time-series modeling constructed from an ODE-based model and can infer the regulatory structure among several tens of genes by maximizing predictive ability for the observational data. Furthermore, the method is capable of incorporating various types of existing biological knowledge, e.g., drug kinetics and literature-recorded pathways. The effectiveness of the proposed method is demonstrated in simulation studies comparing it with several previous methods.
As an application example, we analyzed time-course mRNA expression profiles from rats stimulated with corticosteroid, incorporating corticosteroid kinetics/dynamics, literature-recorded pathways and transcription factor (TF) information.
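The L1-regularized VAR regression at the core of this approach can be illustrated with a rough sketch that fits a fully observed VAR(1) model, x(t+1) ≈ A x(t). It estimates one row of A at a time with a lasso penalty solved by cyclic coordinate descent. This is a simplification under stated assumptions. It omits the paper's state space (hidden state) layer, its ODE-derived model structure, and its specific mechanisms for incorporating drug kinetics and pathway knowledge. The optional per-edge penalty weights are a hypothetical hook for prior knowledge (a smaller weight makes a literature-supported edge cheaper to keep), not the authors' method. All function names are hypothetical.

```python
import random

# Sketch only: sparse VAR(1) by row-wise lasso (coordinate descent); not the paper's state space model.


def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_cd(x, y, lam, weights=None, n_iter=100):
    """Minimize (1/2n)||y - X b||^2 + lam * sum_k w_k |b_k| by cyclic coordinate descent."""
    n, p = len(x), len(x[0])
    w = weights if weights is not None else [1.0] * p
    b = [0.0] * p
    resid = list(y)  # y - X b for the current b (b starts at zero)
    col_sq = [sum(row[k] ** 2 for row in x) / n for k in range(p)]
    for _ in range(n_iter):
        for k in range(p):
            if col_sq[k] == 0.0:
                continue
            # Correlation of column k with the residual that excludes b_k's contribution.
            rho = sum(x[i][k] * (resid[i] + x[i][k] * b[k]) for i in range(n)) / n
            new_bk = soft_threshold(rho, lam * w[k]) / col_sq[k]
            delta = new_bk - b[k]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= x[i][k] * delta
                b[k] = new_bk
    return b


def fit_sparse_var1(series, lam, weights=None):
    """Estimate A in x(t+1) ~ A x(t) row by row with an L1 penalty.

    series[t][g] is the expression of gene g at time t. weights[j][k] (optional,
    hypothetical) scales the penalty on the edge k -> j.
    """
    predictors = series[:-1]
    a_hat = []
    for j in range(len(series[0])):
        target = [row[j] for row in series[1:]]
        row_weights = weights[j] if weights is not None else None
        a_hat.append(lasso_cd(predictors, target, lam, row_weights))
    return a_hat


def simulate_var1(a, n_steps, noise_sd=1.0, burn_in=100, seed=0):
    """Toy data: x(t+1) = A x(t) + Gaussian noise, with an initial burn-in discarded."""
    rng = random.Random(seed)
    state = [0.0] * len(a)
    series = []
    for t in range(burn_in + n_steps):
        state = [sum(a[j][k] * state[k] for k in range(len(a))) + rng.gauss(0.0, noise_sd)
                 for j in range(len(a))]
        if t >= burn_in:
            series.append(state)
    return series
```

As a usage example, data simulated from a sparse, stable 3-gene A with `simulate_var1` and passed to `fit_sparse_var1(series, lam=0.05)` should return large estimates for the true edges and estimates near zero elsewhere. Raising `lam` trades recall for sparsity. Fitting each row separately is possible because, in this simplified model, each gene's next value depends only on the current state vector.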