51
Tsai MJ, Wang JR, Ho SJ, Shu LS, Huang WL, Ho SY. GREMA: modelling of emulated gene regulatory networks with confidence levels based on evolutionary intelligence to cope with the underdetermined problem. Bioinformatics 2020; 36:3833-3840. [DOI: 10.1093/bioinformatics/btaa267] [Received: 04/11/2019] [Revised: 04/14/2020] [Accepted: 05/09/2020]
Abstract
Motivation Non-linear ordinary differential equation (ODE) models that contain numerous parameters are suitable for inferring an emulated gene regulatory network (eGRN). However, the number of experimental measurements is usually far smaller than the number of parameters of the eGRN model, which leads to an underdetermined problem: with insufficient measurements, the inference problem for an eGRN has no unique solution. Results This work proposes an evolutionary modelling algorithm (EMA) that is based on evolutionary intelligence to cope with the underdetermined problem. EMA uses an intelligent genetic algorithm to solve the large-scale parameter optimization problem. An EMA-based method, GREMA, infers a novel type of gene regulatory network with confidence levels for every inferred regulation. The higher the confidence level, the more accurate the inferred regulation. GREMA gradually determines the regulations of an eGRN with confidence levels in descending order using either an S-system or a Hill function-based ODE model. The experimental results showed that regulations with high confidence levels are more accurate and robust than regulations with low confidence levels. Evolutionary intelligence enhanced the mean accuracy of GREMA by 19.2% when using the S-system model with benchmark datasets. An increase in the number of experimental measurements may increase the mean confidence level of the inferred regulations. GREMA performed well compared with existing methods previously applied to the same S-system, DREAM4 challenge and SOS DNA repair benchmark datasets. Availability and implementation All of the datasets used and the GREMA-based tool are freely available at https://nctuiclab.github.io/GREMA. Supplementary information Supplementary data are available at Bioinformatics online.
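The underdetermined setting described above can be made concrete with the standard S-system form, dX_i/dt = α_i ∏_j X_j^(g_ij) − β_i ∏_j X_j^(h_ij), which has 2N(N+1) parameters for N genes (α_i, β_i and two N × N kinetic-order matrices). The Python sketch below evaluates that right-hand side and compares the parameter count with a simple measurement count. The function names are hypothetical, counting one value per gene per time point is a simplifying assumption, and nothing here reproduces GREMA's EMA or its confidence levels.

```python
import math


def s_system_rhs(x, alpha, beta, g, h):
    """Right-hand side of an S-system ODE model for N genes.

    dx_i/dt = alpha_i * prod_j x_j**g[i][j] - beta_i * prod_j x_j**h[i][j]
    A non-zero g[i][j] (or h[i][j]) means gene j affects the synthesis
    (or degradation) of gene i.
    """
    n = len(x)
    return [
        alpha[i] * math.prod(x[j] ** g[i][j] for j in range(n))
        - beta[i] * math.prod(x[j] ** h[i][j] for j in range(n))
        for i in range(n)
    ]


def s_system_param_count(n_genes):
    # N rate constants alpha, N rate constants beta, and two N x N matrices
    return 2 * n_genes * (n_genes + 1)


def is_underdetermined(n_genes, n_timepoints, n_experiments=1):
    # Assumption: each gene at each time point of each experiment is one measurement
    n_measurements = n_genes * n_timepoints * n_experiments
    return n_measurements < s_system_param_count(n_genes)
```

For example, 20 genes give 840 S-system parameters, while a single 10-point time course supplies only 200 values, so `is_underdetermined(20, 10)` is `True`.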
Affiliation(s)
- Ming-Ju Tsai
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 300, Taiwan
- Jyun-Rong Wang
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 300, Taiwan
- Shinn-Jang Ho
- Department of Automation Engineering, National Formosa University, Yunlin 632, Taiwan
- Li-Sun Shu
- Department of Information Management, Overseas Chinese University, Taichung 407, Taiwan
- Wen-Lin Huang
- Department of Industrial Engineering and Management, Minghsin University of Science and Technology, Xinfeng 304, Taiwan
- Shinn-Ying Ho
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 300, Taiwan
- Department of Biological Science and Technology
- Center For Intelligent Drug Systems and Smart Bio-devices (IDS2B), National Chiao Tung University, Hsinchu 300, Taiwan
52
Liu L, Liu J. Reconstructing gene regulatory networks via memetic algorithm and LASSO based on recurrent neural networks. Soft Comput 2020. [DOI: 10.1007/s00500-019-04185-y]
53
54
Pratapa A, Jalihal AP, Law JN, Bharadwaj A, Murali TM. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nat Methods 2020; 17:147-154. [PMID: 31907445 PMCID: PMC7098173 DOI: 10.1038/s41592-019-0690-6] [Received: 06/04/2019] [Accepted: 11/22/2019]
Abstract
We present a systematic evaluation of state-of-the-art algorithms for inferring gene regulatory networks from single-cell transcriptional data. As the ground truth for assessing accuracy, we use synthetic networks with predictable trajectories, literature-curated Boolean models and diverse transcriptional regulatory networks. We develop a strategy to simulate single-cell transcriptional data from synthetic and Boolean networks that avoids pitfalls of previously used methods. Furthermore, we collect networks from multiple experimental single-cell RNA-seq datasets. We develop an evaluation framework called BEELINE. We find that the area under the precision-recall curve and early precision of the algorithms are moderate. The methods are better at recovering interactions in synthetic networks than in Boolean models. The algorithms with the best early precision values for Boolean models also perform well on experimental datasets. Techniques that do not require pseudotime-ordered cells are generally more accurate. Based on these results, we present recommendations to end users. BEELINE will aid the development of gene regulatory network inference algorithms.
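BEELINE scores algorithms by the area under the precision-recall curve (AUPR) and by early precision. A minimal Python sketch follows, assuming early precision is the fraction of true edges among the top-k ranked predictions with k equal to the number of ground-truth edges, and using step-wise average precision as one common AUPR estimate. BEELINE's own implementation may compute these differently (for example in how it treats self-loops or ties), and the function names are hypothetical.

```python
def early_precision(ranked_edges, true_edges):
    """Fraction of ground-truth edges among the top-k predictions, k = |true_edges|."""
    true_set = set(true_edges)
    k = len(true_set)
    top_k = ranked_edges[:k]
    return sum(edge in true_set for edge in top_k) / k


def average_precision(ranked_edges, true_edges):
    """Step-wise estimate of the area under the precision-recall curve."""
    true_set = set(true_edges)
    hits, total = 0, 0.0
    for rank, edge in enumerate(ranked_edges, start=1):
        if edge in true_set:
            hits += 1
            total += hits / rank  # precision at each recall step
    return total / len(true_set)
```

With ground truth {A→B, B→C} and the ranking [A→B, A→C, B→C, C→A], early precision is 1/2 and average precision is (1/1 + 2/3)/2 = 5/6.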
Affiliation(s)
- Aditya Pratapa
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- Amogh P Jalihal
- Genetics, Bioinformatics, and Computational Biology Ph.D. Program, Virginia Tech, Blacksburg, VA, USA
- Jeffrey N Law
- Genetics, Bioinformatics, and Computational Biology Ph.D. Program, Virginia Tech, Blacksburg, VA, USA
- Aditya Bharadwaj
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- T M Murali
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
55
Pyne S, Kumar AR, Anand A. Rapid Reconstruction of Time-Varying Gene Regulatory Networks. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:278-291. [PMID: 30072338 DOI: 10.1109/tcbb.2018.2861698]
Abstract
Rapid advancements in high-throughput technologies have resulted in genome-scale time series datasets. Uncovering the temporal sequence of gene regulatory events, in the form of time-varying gene regulatory networks (GRNs), demands computationally fast, accurate, and scalable algorithms. The existing algorithms can be divided into two categories: ones that are time-intensive and hence unscalable, and others that impose structural constraints to become scalable. In this paper, a novel algorithm, namely 'an algorithm for reconstructing Time-varying Gene regulatory networks with Shortlisted candidate regulators' (TGS), is proposed. TGS is time-efficient and does not impose any structural constraints. Moreover, it achieves this flexibility and time-efficiency without losing accuracy. TGS consistently outperforms the state-of-the-art algorithms in true positive detection on three benchmark synthetic datasets. However, TGS does not perform as well in false positive rejection. To mitigate this issue, TGS+ is proposed. TGS+ demonstrates competitive false positive rejection power while maintaining the superior speed and true positive detection power of TGS. Nevertheless, the main memory requirements of both TGS variants grow exponentially with the number of genes, which they tackle by restricting the maximum number of regulators for each gene. Relaxing this restriction remains a challenge, as the actual number of regulators is not known a priori.
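The memory issue noted for TGS reflects a general combinatorial effect. The sketch below is only an illustration under an assumption not stated in the abstract, namely that cost scales with the number of candidate regulator sets considered per target gene; it counts those sets with and without a cap on the number of regulators. It is not TGS's algorithm, and the function name is hypothetical.

```python
from math import comb


def n_parent_sets(n_genes, max_regulators=None):
    """Count candidate regulator sets for one target gene.

    Without a cap, every subset of the other n_genes - 1 genes is a
    candidate (2 ** (n_genes - 1) sets); with a cap, only subsets of
    size <= max_regulators are enumerated.
    """
    n_candidates = n_genes - 1
    if max_regulators is None:
        return 2 ** n_candidates
    cap = min(max_regulators, n_candidates)
    return sum(comb(n_candidates, k) for k in range(cap + 1))
```

For 21 genes, the uncapped count is 2^20 = 1,048,576 sets per target, while a cap of three regulators leaves 1 + 20 + 190 + 1,140 = 1,351.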
56
Ghanbari M, Lasserre J, Vingron M. The Distance Precision Matrix: computing networks from non-linear relationships. Bioinformatics 2019; 35:1009-1017. [PMID: 30165509 PMCID: PMC6420154 DOI: 10.1093/bioinformatics/bty724] [Received: 05/05/2017] [Revised: 06/12/2018] [Accepted: 08/23/2018]
Abstract
Motivation Full-order partial correlation, a fundamental approach for network reconstruction (e.g. in the context of gene regulation), relies on the precision matrix (the inverse of the covariance matrix) as an indicator of which variables are directly associated. The precision matrix assumes Gaussian linear data, and its entries are zero for pairs of variables that are independent given all other variables. However, there is still very little theory on network reconstruction under the assumption of non-linear interactions among variables. Results We propose Distance Precision Matrix, a network reconstruction method aimed at both linear and non-linear data. Like partial distance correlation, it builds on distance covariance, a measure of possibly non-linear association, and on the idea of full-order partial correlation, which allows indirect associations to be discarded. We provide evidence that the Distance Precision Matrix method can successfully compute networks from linear and non-linear data, and does so consistently across different datasets, even when the sample size is low. The method is fast enough to compute networks on hundreds of nodes. Availability and implementation An R package DPM is available at https://github.molgen.mpg.de/ghanbari/DPM. Supplementary information Supplementary data are available at Bioinformatics online.
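The two ingredients that the Distance Precision Matrix combines can each be sketched in a few lines of NumPy: full-order partial correlations read off the precision matrix P via ρ_ij = −P_ij / √(P_ii P_jj), and the sample distance correlation, which also detects non-linear dependence. This is an illustrative Python sketch with hypothetical function names, not the authors' R package DPM, which according to the abstract builds on distance covariance rather than ordinary covariance.

```python
import numpy as np


def partial_correlations(data):
    """Full-order partial correlations from the precision matrix.

    data has shape (n_samples, n_vars); entry (i, j) of the result is
    -P_ij / sqrt(P_ii * P_jj), where P is the inverse sample covariance.
    """
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def distance_correlation(x, y):
    """Sample distance correlation between two 1-D samples (V-statistic form)."""
    def double_centered(v):
        v = np.asarray(v, dtype=float)
        dist = np.abs(v[:, None] - v[None, :])   # pairwise distances
        return (dist - dist.mean(axis=0)          # subtract column means
                - dist.mean(axis=1)[:, None]      # subtract row means
                + dist.mean())                    # add back the grand mean
    a, b = double_centered(x), double_centered(y)
    dcov2 = max((a * b).mean(), 0.0)              # guard against round-off below zero
    return np.sqrt(dcov2 / np.sqrt((a * a).mean() * (b * b).mean()))
```

For a chain x → y → z, the partial correlation of x and z should be close to zero even though their ordinary correlation is strong; for y = x² on a symmetric interval, Pearson correlation is near zero while distance correlation is clearly positive.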
Affiliation(s)
- Mahsa Ghanbari
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, D-14195 Berlin, Germany
- Julia Lasserre
- Zalando Research, Mühlenstr. 25, D-10243 Berlin, Germany
- Martin Vingron
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, D-14195 Berlin, Germany
57
Chen X, Li M, Zheng R, Wu FX, Wang J. D3GRN: a data driven dynamic network construction method to infer gene regulatory networks. BMC Genomics 2019; 20:929. [PMID: 31881937 PMCID: PMC6933629 DOI: 10.1186/s12864-019-6298-5]
Abstract
BACKGROUND Inferring gene regulatory networks (GRNs) from gene-expression data is still a fundamental and challenging problem in systems biology. Several existing algorithms formulate GRN inference as a regression problem and obtain the network with an ensemble strategy. Recent studies on data-driven dynamic network construction provide a new perspective on solving the regression problem. RESULTS In this study, we propose a data-driven dynamic network construction method to infer gene regulatory networks (D3GRN), which transforms the regulatory relationship of each target gene into a functional decomposition problem and solves each subproblem using the Algorithm for Revealing Network Interactions (ARNI). To remedy ARNI's limitation of constructing networks solely at the unit level, a bootstrapping and area-based scoring method is used to infer the final network. On the DREAM4 and DREAM5 benchmark datasets, D3GRN performs competitively with state-of-the-art algorithms in terms of AUPR. CONCLUSIONS We have proposed a novel data-driven dynamic network construction method that combines ARNI with a bootstrapping and area-based scoring strategy. The proposed method performs well on the benchmark datasets and offers a competitive new approach to inferring gene regulatory networks.
Affiliation(s)
- Xiang Chen
- School of Computer Science and Engineering, Central South University, Changsha, China
- Min Li
- School of Computer Science and Engineering, Central South University, Changsha, China
- Ruiqing Zheng
- School of Computer Science and Engineering, Central South University, Changsha, China
- Fang-Xiang Wu
- Department of Mechanical Engineering and Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
- Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha, China
58
A Stable, Unified Density Controlled Memetic Algorithm for Gene Regulatory Network Reconstruction Based on Sparse Fuzzy Cognitive Maps. Neural Process Lett 2019. [DOI: 10.1007/s11063-019-10056-2]
59
Bhuva DD, Cursons J, Smyth GK, Davis MJ. Differential co-expression-based detection of conditional relationships in transcriptional data: comparative analysis and application to breast cancer. Genome Biol 2019; 20:236. [PMID: 31727119 PMCID: PMC6857226 DOI: 10.1186/s13059-019-1851-8] [Received: 06/04/2019] [Accepted: 10/02/2019]
Abstract
BACKGROUND Elucidation of regulatory networks, including identification of regulatory mechanisms specific to a given biological context, is a key aim in systems biology. This has motivated the move from co-expression to differential co-expression analysis, and numerous methods have since been developed to address this task; however, evaluation of methods and interpretation of the resulting networks have been hindered by the lack of known context-specific regulatory interactions. RESULTS In this study, we develop a simulator based on dynamical systems modelling that is capable of simulating differential co-expression patterns. With the simulator and an evaluation framework, we benchmark and characterise the performance of inference methods. Defining three different levels of "true" networks for each simulation, we show that accurate inference of causation is difficult for all methods, compared with inference of associations. We show that a z-score-based method has the best general performance. Further, analysis of simulation parameters reveals five network and simulation properties that explained the performance of methods. The evaluation framework and inference methods used in this study are available in the dcanr R/Bioconductor package. CONCLUSIONS Our analysis of networks inferred from simulated data shows that hub nodes are more likely to be differentially regulated targets than transcription factors. Based on this observation, we propose an interpretation of the inferred differential network that can reconstruct a putative causal network.
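The abstract does not spell out the z-score method; a widely used form compares a gene pair's correlation between two conditions after Fisher's z-transformation, z = (atanh r₁ − atanh r₂) / √(1/(n₁−3) + 1/(n₂−3)). The Python sketch below assumes that form; whether it matches the dcanr implementation exactly is an assumption, and the function names are hypothetical.

```python
import math


def diff_coexpression_z(r1, n1, r2, n2):
    """z-score for a change in a gene pair's Pearson correlation between conditions.

    r1, r2: correlations in conditions 1 and 2; n1, n2: sample sizes (each > 3).
    """
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # std. error of the z difference
    return (math.atanh(r1) - math.atanh(r2)) / se


def two_sided_p(z):
    """Two-sided p-value under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

For instance, r = 0.8 in one condition and r = 0 in the other, each with 53 samples, gives z = atanh(0.8)/0.2 ≈ 5.49, a strong differential co-expression signal.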
Affiliation(s)
- Dharmesh D Bhuva
- Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, 3052, Australia
- School of Mathematics and Statistics, Faculty of Science, University of Melbourne, Melbourne, VIC, 3010, Australia
- Joseph Cursons
- Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, 3052, Australia
- Department of Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Melbourne, VIC, 3010, Australia
- Gordon K Smyth
- Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, 3052, Australia
- School of Mathematics and Statistics, Faculty of Science, University of Melbourne, Melbourne, VIC, 3010, Australia
- Melissa J Davis
- Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, 3052, Australia
- Department of Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Melbourne, VIC, 3010, Australia
- Department of Clinical Pathology, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Melbourne, VIC, 3010, Australia
60
Zhang L, Wu HC, Ho CH, Chan SC. A Multi-Laplacian Prior and Augmented Lagrangian Approach to the Exploratory Analysis of Time-Varying Gene and Transcriptional Regulatory Networks for Gene Microarray Data. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:1816-1829. [PMID: 29993914 DOI: 10.1109/tcbb.2018.2828810]
Abstract
This paper proposes a novel multi-Laplacian prior (MLP) and augmented Lagrangian method (ALM) approach for identifying gene interactions and putative transcription factors (TFs) from time-course gene microarray data. It employs a non-linear time-varying auto-regressive (N-TVAR) model and the maximum a posteriori probability method to incorporate the multi-Laplacian prior and the continuity constraint. Compared with conventional L1-based penalties, the MLP better preserves the connections to and from a gene, which aids putative TF identification in non-stationary gene regulatory networks. Moreover, the ALM allows the resultant non-smooth L1-based penalties to be decoupled from the remaining smooth terms, so that the former and the latter can be efficiently solved using a low-complexity proximity operator and a smooth optimization technique, respectively. Synthetic and real time-course gene microarray datasets are used to evaluate the performance of the proposed method. Experimental results show that the proposed method gives better accuracy and higher computational speed than our previous work using smoothed approximation. Moreover, its performance without ChIP-chip data is highly comparable to that of other state-of-the-art methods that integrate both ChIP-chip and gene microarray data. This suggests that the proposed method may serve as a useful exploratory tool for putative TF identification with reduced experimental cost.
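The decoupling that the abstract credits to the augmented Lagrangian method can be illustrated on a much simpler problem. The sketch below applies ADMM, one augmented-Lagrangian splitting, to an ordinary lasso: the smooth least-squares term is handled by a linear solve and the non-smooth L1 term by its proximity operator, soft thresholding. It does not implement the paper's multi-Laplacian prior or N-TVAR model; the function names, penalty weight, ρ and iteration count are illustrative assumptions.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximity operator of t * ||v||_1: shrink each entry towards zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_admm(A, b, lam, rho=1.0, n_iter=500):
    """Minimise 0.5 * ||A x - b||^2 + lam * ||z||_1 subject to x = z (scaled ADMM)."""
    n = A.shape[1]
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    lhs = A.T @ A + rho * np.eye(n)  # fixed matrix of the smooth subproblem
    Atb = A.T @ b
    for _ in range(n_iter):
        x = np.linalg.solve(lhs, Atb + rho * (z - u))  # smooth least-squares step
        z = soft_threshold(x + u, lam / rho)           # non-smooth L1 step (prox)
        u = u + x - z                                  # scaled dual update
    return z
```

Splitting x and z this way means neither step has to handle both terms at once, which is the same motivation the abstract gives for using the ALM.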
61
Liu L, Liu J. A sparse and decomposed particle swarm optimization for inferring gene regulatory networks based on fuzzy cognitive maps. J Bioinform Comput Biol 2019; 17:1950023. [PMID: 31617458 DOI: 10.1142/s0219720019500239]
Abstract
Inferring gene regulatory networks (GRNs) is vital for understanding complex cellular processes and revealing the regulatory mechanisms among genes. Although various methods have been developed, more accurate algorithms that can control the sparseness of GRNs are still needed. In this work, we model GRNs with fuzzy cognitive maps (FCMs), in which each node represents a gene. A new sparse and decomposed particle swarm optimization, termed SDPSOFCM-GRN, is then proposed to train FCMs; it employs the least absolute shrinkage and selection operator (Lasso) with a decomposed strategy to control network sparseness. In the experiments, the performance of SDPSOFCM-GRN is validated on synthetic data and on the well-known DREAM3 and DREAM4 benchmarks. The results show that SDPSOFCM-GRN can effectively control the sparseness of GRNs and infer directed GRNs with high accuracy and efficiency.
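In a fuzzy cognitive map, each node's next state is a squashed weighted sum of the current states, and a Lasso term on the weights controls sparseness. The Python sketch below computes the kind of fitness an optimizer such as the paper's particle swarm would minimise. The sigmoid steepness of 5, the mean-squared one-step error and the function names are assumptions, and the PSO and decomposition strategy themselves are not shown.

```python
import math


def fcm_step(state, weights, steepness=5.0):
    """One FCM update: x_i(t+1) = sigmoid(sum_j weights[j][i] * x_j(t)).

    weights[j][i] is the influence of node (gene) j on node i.
    """
    n = len(state)
    return [
        1.0 / (1.0 + math.exp(-steepness * sum(weights[j][i] * state[j] for j in range(n))))
        for i in range(n)
    ]


def lasso_fcm_objective(series, weights, lam, steepness=5.0):
    """Mean squared one-step prediction error plus an L1 (Lasso) penalty on weights."""
    errors = []
    for t in range(len(series) - 1):
        predicted = fcm_step(series[t], weights, steepness)
        errors.extend((p - o) ** 2 for p, o in zip(predicted, series[t + 1]))
    l1 = sum(abs(w) for row in weights for w in row)
    return sum(errors) / len(errors) + lam * l1
```

When a series is generated by the same weights, the error term vanishes and only the penalty, lam times the sum of absolute weights, remains; a sparser weight matrix that fits equally well therefore scores better.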
Affiliation(s)
- Luowen Liu
- Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, Xidian University, Xi'an 710071, P. R. China
- Jing Liu
- Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, Xidian University, Xi'an 710071, P. R. China
62
Muldoon JJ, Yu JS, Fassia MK, Bagheri N. Network inference performance complexity: a consequence of topological, experimental and algorithmic determinants. Bioinformatics 2019; 35:3421-3432. [PMID: 30932143 PMCID: PMC6748731 DOI: 10.1093/bioinformatics/btz105] [Received: 09/12/2018] [Revised: 01/24/2019] [Accepted: 02/11/2019]
Abstract
MOTIVATION Network inference algorithms aim to uncover key regulatory interactions governing cellular decision-making, disease progression and therapeutic interventions. Having an accurate blueprint of this regulation is essential for understanding and controlling cell behavior. However, the utility and impact of these approaches are limited because the ways in which various factors shape inference outcomes remain largely unknown. RESULTS We identify and systematically evaluate determinants of performance, including network properties, experimental design choices and data processing, by developing new metrics that quantify confidence across algorithms in comparable terms. We conducted a multifactorial analysis that demonstrates how stimulus target, regulatory kinetics, induction and resolution dynamics, and noise differentially impact widely used algorithms in significant and previously unrecognized ways. The results show that even when high-quality data are paired with high-performing algorithms, inferred models can sometimes lead to misleading conclusions. Lastly, we validate these findings and the utility of the confidence metrics using realistic in silico gene regulatory networks. This new characterization approach provides a way to more rigorously interpret how algorithms infer regulation from biological datasets. AVAILABILITY AND IMPLEMENTATION Code is available at http://github.com/bagherilab/networkinference/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Joseph J Muldoon
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
- Interdisciplinary Biological Sciences Program, Northwestern University, Evanston, IL, USA
- Jessica S Yu
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
- Mohammad-Kasim Fassia
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
- Department of Biomedical Engineering, Northwestern University, Evanston, IL, USA
- Neda Bagheri
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
- Interdisciplinary Biological Sciences Program, Northwestern University, Evanston, IL, USA
- Center for Synthetic Biology, Northwestern University, Evanston, IL, USA
- Chemistry of Life Processes Institute, Northwestern University, Evanston, IL, USA
- Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, USA
63
Leng S, Xu Z, Ma H. Reconstructing directional causal networks with random forest: Causality meeting machine learning. Chaos 2019; 29:093130. [PMID: 31575149 DOI: 10.1063/1.5120778] [Received: 07/20/2019] [Accepted: 09/08/2019]
Abstract
Inspired by the decision tree algorithm in machine learning, a novel causal network reconstruction framework named Importance Causal Analysis (ICA) is proposed. The ICA framework operates at the network level and fills the gap between traditional mutual causality detection methods and the reconstruction of causal networks. The potential of the method to identify true causal relations in complex networks is validated on both benchmark systems and real-world data sets.
Affiliation(s)
- Siyang Leng
- School of Mathematical Sciences, Fudan University, Shanghai 200433, China
- Ziwei Xu
- School of Mathematical Sciences, Soochow University, Suzhou 215006, China
- Huanfei Ma
- School of Mathematical Sciences, Soochow University, Suzhou 215006, China
64
Li Y, Jann T, Vera-Licona P. Benchmarking time-series data discretization on inference methods. Bioinformatics 2019; 35:3102-3109. [PMID: 30657860 DOI: 10.1093/bioinformatics/btz036] [Received: 07/30/2018] [Revised: 12/10/2018] [Accepted: 01/14/2019]
Abstract
SUMMARY The rapid development in quantitatively measuring DNA, RNA and protein has generated great interest in the development of reverse-engineering methods, that is, data-driven approaches to infer the network structure or dynamical model of the system. Many reverse-engineering methods require discrete quantitative data as input, while many experimental data are continuous. Some studies have started to reveal the impact that the choice of data discretization has on the performance of reverse-engineering methods. However, more comprehensive studies are still greatly needed to systematically and quantitatively understand the impact that discretization methods have on inference methods. Furthermore, there is an urgent need for systematic comparative methods that can help select between discretization methods. In this work, we consider four published intracellular networks inferred with their respective time-series datasets. We discretized the data using different discretization methods. Across all datasets, changing the data discretization to a more appropriate one improved the reverse-engineering methods' performance. We observed no universally best discretization method across different time-series datasets. Thus, we propose DiscreeTest, a two-step evaluation metric for ranking discretization methods for time-series data. The underlying assumption of DiscreeTest is that an optimal discretization method should preserve the dynamic patterns observed in the original data across all variables. We used the same datasets and networks to show that DiscreeTest is able to identify an appropriate discretization among several candidate methods. To our knowledge, this is the first time that a method for benchmarking and selecting an appropriate discretization method for time-series data has been proposed.
AVAILABILITY AND IMPLEMENTATION All the datasets, reverse-engineering methods and source code used in this paper are available in Vera-Licona's lab Github repository: https://github.com/VeraLiconaResearchGroup/Benchmarking_TSDiscretizations. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
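Two common discretization schemes, equal-width and equal-frequency binning, can give quite different discrete dynamics for the same series. The Python sketch below implements both, together with a hypothetical `trend_agreement` score that checks whether each step's direction of change survives discretization. That score is only a simplified stand-in for the idea behind DiscreeTest (preserving the dynamic patterns of the original data); it is not the published metric, and all function names are illustrative.

```python
def equal_width(values, n_bins):
    """Assign each value to one of n_bins equally wide intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency(values, n_bins):
    """Assign bins by rank so that each bin holds roughly the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * n_bins // len(values)
    return labels


def trend_agreement(original, discrete):
    """Fraction of consecutive steps whose direction (up, down, flat) is preserved."""
    def sign(d):
        return (d > 0) - (d < 0)
    steps = len(original) - 1
    same = sum(
        sign(original[t + 1] - original[t]) == sign(discrete[t + 1] - discrete[t])
        for t in range(steps)
    )
    return same / steps
```

For the skewed series [0, 1, 2, 3, 10] with three bins, equal-width binning keeps only one of the four upward steps (agreement 0.25), whereas equal-frequency binning keeps two (0.5).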
Affiliation(s)
- Yuezhe Li
- R.D. Berlin Center for Cell Analysis and Modeling, University of Connecticut School of Medicine, Farmington, CT, USA
- Tiffany Jann
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
- Paola Vera-Licona
- Center for Quantitative Medicine, University of Connecticut School of Medicine, Farmington, CT, USA
- Department of Cell Biology, University of Connecticut School of Medicine, Farmington, CT, USA
- Department of Pediatrics, University of Connecticut School of Medicine, Farmington, CT, USA
- Institute for Systems Genomics, University of Connecticut School of Medicine, Farmington, CT, USA
65
Young WC, Yeung KY, Raftery AE. Identifying Dynamical Time Series Model Parameters from Equilibrium Samples, with Application to Gene Regulatory Networks. Stat Model 2019; 19:444-465. [PMID: 33824624 DOI: 10.1177/1471082x18776577]
Abstract
Gene regulatory network reconstruction is an essential task of genomics in order to further our understanding of how genes interact dynamically with each other. The most readily available data, however, are from steady state observations. These data are not as informative about the relational dynamics between genes as knockout or over-expression experiments, which attempt to control the expression of individual genes. We develop a new framework for network inference using samples from the equilibrium distribution of a vector autoregressive (VAR) time-series model which can be applied to steady state gene expression data. We explore the theoretical aspects of our method and apply the method to synthetic gene expression data generated using GeneNetWeaver.
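Equilibrium samples of a vector autoregressive model still carry information about its dynamics because, for x_t = A x_{t−1} + ε_t with noise covariance Q, the stationary covariance satisfies Σ = A Σ Aᵀ + Q. The NumPy sketch below solves that equation via vec(A Σ Aᵀ) = (A ⊗ A) vec(Σ). It assumes a first-order VAR with stable A, uses a hypothetical function name, and is not the authors' inference framework.

```python
import numpy as np


def var1_stationary_covariance(A, Q):
    """Stationary covariance of x_t = A x_{t-1} + e_t with e_t ~ N(0, Q).

    Solves Sigma = A Sigma A^T + Q through (I - A kron A) vec(Sigma) = vec(Q).
    Every eigenvalue of A must lie strictly inside the unit circle.
    """
    n = A.shape[0]
    if np.max(np.abs(np.linalg.eigvals(A))) >= 1:
        raise ValueError("VAR(1) process is not stationary")
    vec_sigma = np.linalg.solve(np.eye(n * n) - np.kron(A, A), Q.reshape(-1))
    return vec_sigma.reshape(n, n)
```

Note that for a single gene, A = 0.5 and A = −0.5 both give Σ = 1/(1 − 0.25) = 4/3, so the equilibrium covariance alone need not determine A uniquely.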
Affiliation(s)
- William Chad Young
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Ka Yee Yeung
- Institute of Technology, University of Washington, Tacoma, WA, USA
- Adrian E Raftery
- Department of Statistics, University of Washington, Seattle, WA, USA
66
Chen X, Gu J, Wang X, Jung JG, Wang TL, Hilakivi-Clarke L, Clarke R, Xuan J. CRNET: an efficient sampling approach to infer functional regulatory networks by integrating large-scale ChIP-seq and time-course RNA-seq data. Bioinformatics 2018; 34:1733-1740. [PMID: 29280996 DOI: 10.1093/bioinformatics/btx827] [Received: 12/08/2016] [Accepted: 12/20/2017]
Abstract
Motivation NGS techniques have been widely applied in genetic and epigenetic studies. Multiple ChIP-seq and RNA-seq profiles can now be jointly used to infer functional regulatory networks (FRNs). However, existing methods suffer from either oversimplified assumptions about transcription factor (TF) regulation or slow convergence of sampling for FRN inference from large-scale ChIP-seq and time-course RNA-seq data. Results We developed an efficient Bayesian integration method (CRNET) for FRN inference that uses a two-stage Gibbs sampler to iteratively estimate hidden TF activities and the posterior probabilities of binding events. A novel statistical measure that jointly considers regulation strength and regression error enables the sampling process of CRNET to converge quickly, making CRNET very efficient for large-scale FRN inference. Experiments on synthetic and benchmark data showed significantly improved performance of CRNET compared with existing methods. CRNET was applied to breast cancer data to identify FRNs functional at promoter or enhancer regions in breast cancer MCF-7 cells. Transcription factor MYC is predicted as a key functional factor in both promoter and enhancer FRNs. We experimentally validated the regulation effects of MYC on CRNET-predicted target genes using appropriate RNAi approaches in MCF-7 cells. Availability and implementation R scripts of CRNET are available at http://www.cbil.ece.vt.edu/software.htm. Contact xuan@vt.edu. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xi Chen
- Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Jinghua Gu
- Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Xiao Wang
- Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Jin-Gyoung Jung
- Department of Pathology, Johns Hopkins Medical Institutions, Baltimore, MD 21231, USA
- Tian-Li Wang
- Department of Pathology, Johns Hopkins Medical Institutions, Baltimore, MD 21231, USA
- Leena Hilakivi-Clarke
- Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC 20057, USA
- Robert Clarke
- Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC 20057, USA
- Jianhua Xuan
- Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
| |
Collapse
|
67
|
Pirgazi J, Khanteymoori AR, Jalilkhani M. TIGRNCRN: Trustful inference of gene regulatory network using clustering and refining the network. J Bioinform Comput Biol 2019; 17:1950018. [DOI: 10.1142/s0219720019500185] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]
Abstract
In this study, to deal with noise and uncertainty in gene expression data, learning networks, especially Bayesian networks, which can incorporate prior knowledge, were used to infer gene regulatory networks. Learning networks are methods that combine a network structure with a learning process to obtain relationships. Correlation metrics are one way of measuring the relationship between genes, but high correlation between two genes does not necessarily mean that one has a causal effect on the other. Studies of common gene regulatory network inference methods have yet to pay attention to biological importance, and as such, the predictions of these methods are less accurate in terms of biological significance. Hence, in the proposed method, highly correlated genes were grouped into one cluster by clustering, and edges between genes in the same cluster were prohibited. Finally, after Bayesian network modeling, a refining phase based on the knowledge gained from clustering was applied to improve the regulatory interactions using biological correlation. To show its efficiency, the proposed method was compared with several common methods in this area, including GENIE3 and BMALR. The evaluation results indicate that the proposed method recognizes regulatory relations well in the Bayesian modeling process, owing to its use of the biological knowledge hidden in the data, and recognizes gene regulatory networks in line with important methods in this field.
Affiliation(s)
- Jamshid Pirgazi
- Department of Computer Engineering, Engineering Faculty, University of Zanjan, Zanjan, Iran
- Ali Reza Khanteymoori
- Department of Computer Engineering, Engineering Faculty, University of Zanjan, Zanjan, Iran
- School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Maryam Jalilkhani
- Department of Computer Engineering, Engineering Faculty, University of Zanjan, Zanjan, Iran
68
Yang Z, Liu J. Learning fuzzy cognitive maps with convergence using a multi-agent genetic algorithm. Soft Comput 2019. [DOI: 10.1007/s00500-019-04173-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 02/04/2023]
69
Kolahdoozi M, Amirkhani A, Shojaeefard MH, Abraham A. A novel quantum inspired algorithm for sparse fuzzy cognitive maps learning. Appl Intell 2019. [DOI: 10.1007/s10489-019-01476-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/24/2022]
70
Liang Y, Kelemen A. Dynamic modeling and network approaches for omics time course data: overview of computational approaches and applications. Brief Bioinform 2019; 19:1051-1068. [PMID: 28430854 DOI: 10.1093/bib/bbx036] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 10/21/2016] [Indexed: 12/23/2022]
Abstract
Inferring networks and dynamics of genes, proteins, cells and other biological entities from high-throughput biological omics data is a central and challenging issue in computational and systems biology. This is essential for understanding the complexity of human health, disease susceptibility and pathogenesis for Predictive, Preventive, Personalized and Participatory (P4) system and precision medicine. The delineation of the possible interactions of all genes/proteins in a genome/proteome is a task for which conventional experimental techniques are ill suited. Urgently needed are rapid and inexpensive computational and statistical methods that can identify interacting candidate disease genes or drug targets out of thousands, which can then be further investigated or validated experimentally. Moreover, identifying biological dynamic systems while simultaneously estimating important kinetic, structural and functional parameters, which may not be experimentally accessible, could be an important direction for drug-disease-gene network studies. In this article, we present an overview and comparison of recent developments in dynamic modeling and network approaches for time-course omics data, and their applications to various biological systems, health conditions and disease statuses. Moreover, various data reduction and analytical schemes ranging from mathematical to computational to statistical methods are compared, including their merits, drawbacks and limitations. The most recent software, associated web resources and other potentials for the compared methods are also presented and discussed in detail.
Affiliation(s)
- Yulan Liang
- Department of Family and Community Health, University of Maryland, Baltimore, MD, USA
- Arpad Kelemen
- Department of Family and Community Health, University of Maryland, Baltimore, MD, USA
71
Castro JC, Valdés I, Gonzalez-García LN, Danies G, Cañas S, Winck FV, Ñústez CE, Restrepo S, Riaño-Pachón DM. Gene regulatory networks on transfer entropy (GRNTE): a novel approach to reconstruct gene regulatory interactions applied to a case study for the plant pathogen Phytophthora infestans. Theor Biol Med Model 2019; 16:7. [PMID: 30961611 PMCID: PMC6454757 DOI: 10.1186/s12976-019-0103-7] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 08/20/2018] [Accepted: 03/07/2019] [Indexed: 11/10/2022]
Abstract
Background The increasing amounts of genomics data have helped in the understanding of the molecular dynamics of complex systems such as plant and animal diseases. However, transcriptional regulation, although playing a central role in the decision-making process of cellular systems, is still poorly understood. In this study, we linked expression data with mathematical models to infer gene regulatory networks (GRN). We present a simple yet effective method to estimate transcription factors’ GRNs from transcriptional data. Method We defined interactions between pairs of genes (edges in the GRN) as the partial mutual information between these genes that takes into account time and possible lags in time from one gene in relation to another. We call this method Gene Regulatory Networks on Transfer Entropy (GRNTE); it corresponds to Granger causality for Gaussian variables in an autoregressive model. To evaluate the reconstruction accuracy of our method, we generated several sub-networks from the GRN of the eukaryotic yeast model, Saccharomyces cerevisiae. Then, we applied this method to experimental data from the plant pathogen Phytophthora infestans. We evaluated the transcriptional expression levels of 48 transcription factors of P. infestans during its interaction with one moderately resistant and one susceptible cultivar of yellow potato (Solanum tuberosum group Phureja), using RT-qPCR. With these data, we reconstructed the regulatory network of P. infestans during its interaction with these hosts. Results We first evaluated the performance of our method, based on the transfer entropy (GRNTE), on eukaryotic datasets from the GRNs of the yeast S. cerevisiae. Results suggest that GRNTE is comparable with the state-of-the-art methods when the parameters for edge detection are properly tuned. In the case of P. infestans, most of the genes considered in this study showed a significant change in expression from the onset of the interaction (0 hours post inoculation, hpi) to the later time-points post inoculation. Hierarchical clustering of the expression data discriminated two distinct periods during the infection: from 12 to 36 hpi and from 48 to 72 hpi for both the moderately resistant and susceptible cultivars. These distinct periods could be associated with two phases of the life cycle of the pathogen when infecting the host plant: the biotrophic and necrotrophic phases. Conclusions Here we presented an algorithmic solution to the problem of network reconstruction in time series data. This analytical perspective makes use of the dynamic nature of time series data as it relates to intrinsically dynamic processes such as transcription regulation, where multiple elements of the cell (e.g., transcription factors) act simultaneously and change over time. We applied the algorithm to study the regulatory network of P. infestans during its interaction with two hosts which differ in their level of resistance to the pathogen. Although the gene expression analysis did not show differences between the two hosts, the results of the GRN analyses evidenced rewiring of the genes’ interactions according to the resistance level of the host. This suggests that different regulatory processes are activated in response to different environmental cues. Applications of our methodology showed that it could reliably predict where to place edges in the transcriptional networks and sub-networks. The experimental approach used here can help provide insights into the biological role of these interactions in complex processes such as pathogenicity. The code used is available at https://github.com/jccastrog/GRNTE under the GNU General Public License 3.0.
Electronic supplementary material The online version of this article (10.1186/s12976-019-0103-7) contains supplementary material, which is available to authorized users.
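The abstract notes that transfer entropy coincides with Granger causality for Gaussian variables in an autoregressive model. Under that linear-Gaussian assumption, the transfer entropy from a source to a target is half the log ratio of two residual variances: the target's present regressed on its own past, versus on its own past plus the source's past. The pure-Python sketch below uses a single delay; it is not the GRNTE code, which estimates partial mutual information and considers several lags, and the function names are hypothetical.

```python
import math


def _ols_residual_variance(predictors, response):
    """Mean squared residual of an OLS fit of `response` on `predictors` plus an intercept."""
    n = len(response)
    X = [[1.0] + [col[t] for col in predictors] for t in range(n)]
    k = len(X[0])
    # Normal equations (X^T X) beta = X^T y.
    A = [[sum(X[t][i] * X[t][j] for t in range(n)) for j in range(k)] for i in range(k)]
    b = [sum(X[t][i] * response[t] for t in range(n)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for p in range(k):
        pivot = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[pivot] = A[pivot], A[p]
        b[p], b[pivot] = b[pivot], b[p]
        for r in range(p + 1, k):
            f = A[r][p] / A[p][p]
            for c in range(p, k):
                A[r][c] -= f * A[p][c]
            b[r] -= f * b[p]
    # Back substitution.
    beta = [0.0] * k
    for p in reversed(range(k)):
        beta[p] = (b[p] - sum(A[p][c] * beta[c] for c in range(p + 1, k))) / A[p][p]
    resid = [response[t] - sum(beta[i] * X[t][i] for i in range(k)) for t in range(n)]
    return sum(e * e for e in resid) / n


def gaussian_transfer_entropy(source, target, lag=1):
    """Linear-Gaussian TE(source -> target) in nats, using one past value at delay `lag`."""
    now = target[lag:]
    target_past = target[:-lag]
    source_past = source[:-lag]
    var_restricted = _ols_residual_variance([target_past], now)
    var_full = _ols_residual_variance([target_past, source_past], now)
    # Granger causality is ln(var_restricted / var_full); TE is half of it.
    return 0.5 * math.log(var_restricted / var_full)
```

Because the full model nests the restricted one, the in-sample estimate is non-negative; in practice a significance test or permutation threshold would be needed before declaring an edge.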
Affiliation(s)
- Juan Camilo Castro
- Department of Biological Sciences, Universidad de los Andes, Bogotá D.C, Colombia
- Ivan Valdés
- Department of Biological Sciences, Universidad de los Andes, Bogotá D.C, Colombia
- Giovanna Danies
- Department of Design, Universidad de los Andes, Bogotá D.C, Colombia
- Silvia Cañas
- Department of Biological Sciences, Universidad de los Andes, Bogotá D.C, Colombia
- Flavia Vischi Winck
- Regulatory Systems Biology Laboratory, Department of Biochemistry, Institute of Chemistry, Universidade de São Paulo, São Paulo, SP, Brazil
- Carlos Eduardo Ñústez
- School of Agricultural Sciences, Universidad Nacional de Colombia, Bogotá D.C, Colombia
- Silvia Restrepo
- Department of Biological Sciences, Universidad de los Andes, Bogotá D.C, Colombia
- Diego Mauricio Riaño-Pachón
- Computational, Evolutionary and Systems Biology Laboratory, Center for Nuclear Energy in Agriculture, Universidade de São Paulo, Piracicaba, SP, Brazil
72
Computational methods for Gene Regulatory Networks reconstruction and analysis: A review. Artif Intell Med 2019; 95:133-145. [DOI: 10.1016/j.artmed.2018.10.006] [Citation(s) in RCA: 71] [Impact Index Per Article: 14.2] [Received: 06/21/2018] [Revised: 10/23/2018] [Accepted: 10/23/2018] [Indexed: 01/14/2023]
73
Khalid M, Khan S, Ahmad J, Shaheryar M. Identification of self-regulatory network motifs in reverse engineering gene regulatory networks using microarray gene expression data. IET Syst Biol 2019; 13:55-68. [PMID: 33444479 PMCID: PMC8687352 DOI: 10.1049/iet-syb.2018.5001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/13/2018] [Revised: 11/01/2018] [Accepted: 12/10/2018] [Indexed: 11/19/2022]
Abstract
Gene Regulatory Networks (GRNs) are reconstructed from microarray gene expression data through diverse computational approaches. This process results in symmetric and diagonal interactions between gene pairs that cannot be modelled as direct activation, inhibition, and self-regulatory interactions. Gene co-expression values could help in identifying co-regulation among genes. The proposed approach computes the differences in variances of co-expressed genes rather than the differences in mean expression values across experimental conditions. It adopts multivariate covariances using principal component analysis (PCA) to predict an asymmetric and non-diagonal gene interaction matrix, selecting only those gene pair interactions that exhibit the maximum variances in gene regulatory expressions. The asymmetric gene regulatory interactions help identify the controlling regulatory agents, thus lowering the false positive rate by minimizing the connections between previously unlinked network components. The experimental results on real as well as in silico datasets, including time-series RTX therapy, Arabidopsis thaliana, DREAM-3, and DREAM-8 datasets, in comparison with existing state-of-the-art approaches, demonstrated the enhanced performance of the proposed approach for predicting positive and negative feedback loops and self-regulatory interactions. The generated GRNs hold the potential to determine the real nature of gene pair regulatory interactions.
Affiliation(s)
- Mehrosh Khalid
- School of Electrical Engineering and Computer Science, National University of Sciences and Technology, Islamabad, Pakistan
- Sharifullah Khan
- School of Electrical Engineering and Computer Science, National University of Sciences and Technology, Islamabad, Pakistan
- Jamil Ahmad
- Research Centre for Modelling and Simulation, National University of Sciences and Technology, Islamabad, Pakistan
- Muhammad Shaheryar
- Department of Computer Science, Capital University of Science and Technology, Islamabad, Pakistan
74
Foo M, Kim J, Bates DG. Modelling and Control of Gene Regulatory Networks for Perturbation Mitigation. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:583-595. [PMID: 29994499 DOI: 10.1109/tcbb.2017.2771775] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 06/08/2023]
Abstract
Synthetic biologists are increasingly interested in the idea of using synthetic feedback control circuits for the mitigation of perturbations to gene regulatory networks that may arise due to disease and/or environmental disturbances. Models employing Michaelis-Menten kinetics with Hill-type nonlinearities are typically used to represent the dynamics of gene regulatory networks. Here, we identify some fundamental problems with such models from the point of view of control system design, and argue that an alternative formalism, based on so-called S-System models, is more suitable. Using tools from system identification, we show how to build S-System models that capture the key dynamics of an example gene regulatory network, and design a genetic feedback controller with the objective of rejecting an external perturbation. Using a sine sweeping method, we show how the S-System model can be approximated by a linear transfer function, and, based on this transfer function, we design our controller. Simulation results using the full nonlinear S-System model of the network show that the synthetic control circuit is able to mitigate the effect of external perturbations. Our study is the first to highlight the usefulness of the S-System modelling formalism for the design of synthetic control circuits for gene regulatory networks.
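For reference, the standard S-system form writes each rate as a difference of two power-law terms, dX_i/dt = alpha_i * prod_j X_j^g_ij - beta_i * prod_j X_j^h_ij, with non-negative rate constants alpha_i, beta_i and real kinetic orders g_ij, h_ij. The minimal forward-Euler simulation below only illustrates that general form; the two-gene parameters used with it are made up, and it does not reproduce the paper's system identification or controller design. States must stay positive because of the fractional exponents.

```python
def s_system_rhs(x, alpha, beta, g, h):
    """Right-hand side dX_i/dt = alpha_i * prod_j X_j**g_ij - beta_i * prod_j X_j**h_ij."""
    n = len(x)
    dx = []
    for i in range(n):
        production = alpha[i]
        degradation = beta[i]
        for j in range(n):
            production *= x[j] ** g[i][j]
            degradation *= x[j] ** h[i][j]
        dx.append(production - degradation)
    return dx


def simulate(x0, alpha, beta, g, h, dt=0.01, steps=2000):
    """Forward-Euler integration; adequate only for small dt and non-stiff systems."""
    x = list(x0)
    for _ in range(steps):
        dx = s_system_rhs(x, alpha, beta, g, h)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x


# Hypothetical two-gene network: gene 1 relaxes to 1, and gene 1 activates gene 2
# through the kinetic order g[1][0] = 0.5, so gene 2 settles near 2 * 1**0.5 = 2.
final_state = simulate([0.1, 0.1], alpha=[1.0, 2.0], beta=[1.0, 1.0],
                       g=[[0.0, 0.0], [0.5, 0.0]], h=[[1.0, 0.0], [0.0, 1.0]])
```

A real analysis would typically use an adaptive ODE solver rather than fixed-step Euler; the loop form is kept here only to show how the power-law terms are assembled.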
75
Jahagirdar S, Suarez-Diez M, Saccenti E. Simulation and Reconstruction of Metabolite-Metabolite Association Networks Using a Metabolic Dynamic Model and Correlation Based Algorithms. J Proteome Res 2019; 18:1099-1113. [PMID: 30663881 DOI: 10.1021/acs.jproteome.8b00781] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 11/30/2022]
Abstract
Biological networks play a paramount role in our understanding of complex biological phenomena, and metabolite-metabolite association networks are now commonly used in metabolomics applications. In this study we evaluate the performance of several network inference algorithms (PCLRC, MRNET, GENIE3, TIGRESS, and modifications of the MRNET algorithm, together with standard Pearson's and Spearman's correlation), using as a test case data generated with a dynamic metabolic model describing the metabolism of arachidonic acid (consisting of 83 metabolites and 131 reactions) and simulated individual metabolic profiles of 550 subjects. The quality of the reconstructed metabolite-metabolite association networks was assessed against the original metabolic network, taking into account different degrees of association among the metabolites and different sample sizes and noise levels. We found that inference algorithms based on resampling and bootstrapping perform better when correlations are used as indexes to measure the strength of metabolite-metabolite associations. We also advocate for the use of data generated using dynamic models to test the performance of algorithms for network inference, since they produce correlation patterns that are more similar to those observed in real metabolomics data.
Affiliation(s)
- Sanjeevan Jahagirdar
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Stippeneng 4, 6708WE Wageningen, The Netherlands
- Maria Suarez-Diez
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Stippeneng 4, 6708WE Wageningen, The Netherlands
- Edoardo Saccenti
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Stippeneng 4, 6708WE Wageningen, The Netherlands
76
Larsen SJ, Röttger R, Schmidt HH, Baumbach J. E. coli gene regulatory networks are inconsistent with gene expression data. Nucleic Acids Res 2019; 47:85-92. [PMID: 30462289 PMCID: PMC6326786 DOI: 10.1093/nar/gky1176] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 05/28/2018] [Revised: 10/29/2018] [Accepted: 11/05/2018] [Indexed: 12/16/2022]
Abstract
Gene regulatory networks (GRNs) and gene expression data form a core element of systems biology-based phenotyping. Changes in the expression of transcription factors are commonly believed to have a causal effect on the expression of their targets. Here we evaluated, in the best-researched model organism, Escherichia coli, the consistency between a GRN and a large gene expression compendium. Surprisingly, a modest correlation was observed between the expression of transcription factors and their targets and, most noteworthy, both activating and repressing interactions were associated with positive correlation. When evaluated using a sign consistency model, we found the regulatory network was not more consistent with measured expression than random network models. We conclude that, at least in E. coli, one cannot expect a causal relationship between the expression of transcription factors and their targets, and that the current static GRN does not adequately explain transcriptional regulation. The implications of this are profound as they question what we consider established knowledge of the systemic biology of cells and point to methodological limitations with respect to single omics analysis, static networks and temporality.
Affiliation(s)
- Simon J Larsen
- Department of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, 5230 Odense, Denmark
- Richard Röttger
- Department of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, 5230 Odense, Denmark
- Harald H H W Schmidt
- Department of Pharmacology and Personalised Medicine, MaCSBio, Maastricht University, Universiteitssingel 60, 6229 ER, Maastricht, The Netherlands
- Jan Baumbach
- Department of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, 5230 Odense, Denmark
- Chair of Experimental Bioinformatics, Wissenschaftszentrum Weihenstephan, Technical University of Munich, Maximus-von-Imhof-Forum 3, 85354 Freising-Weihenstephan, Germany
77
Yang Z, Liu J. Learning of fuzzy cognitive maps using a niching-based multi-modal multi-agent genetic algorithm. Appl Soft Comput 2019. [DOI: 10.1016/j.asoc.2018.10.038] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 11/17/2022]
78
Angelin-Bonnet O, Biggs PJ, Vignes M. Gene Regulatory Networks: A Primer in Biological Processes and Statistical Modelling. Methods Mol Biol 2019; 1883:347-383. [PMID: 30547408 DOI: 10.1007/978-1-4939-8882-2_15] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 06/09/2023]
Abstract
Modelling gene regulatory networks requires not only a thorough understanding of the biological system depicted, but also the ability to accurately represent this system from a mathematical perspective. Throughout this chapter, we aim to familiarize the reader with the biological processes and molecular factors at play in the process of gene expression regulation. We first describe the different interactions controlling each step of the expression process, from transcription to mRNA and protein decay. In the second section, we provide statistical tools to accurately represent this biological complexity in the form of mathematical models. Among other considerations, we discuss the topological properties of biological networks, the application of deterministic and stochastic frameworks, and the quantitative modelling of regulation. We particularly focus on the use of such models for the simulation of expression data that can serve as a benchmark for the testing of network inference algorithms.
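As a small example of the quantitative modelling of regulation mentioned above, Hill functions are a common way to express activation or repression of a target gene by a transcription factor. The sketch assumes a one-target model, dm/dt = v_max * H(tf) - decay * m, whose steady state is v_max * H(tf) / decay; all names and parameters are illustrative and not taken from the chapter.

```python
def hill_activation(x, k, n):
    """Fraction of maximal expression for an activator at level x (half-maximal at x = k)."""
    return x**n / (k**n + x**n)


def hill_repression(x, k, n):
    """Fraction of maximal expression for a repressor at level x (half-maximal at x = k)."""
    return k**n / (k**n + x**n)


def target_steady_state(tf_level, v_max, k, n, decay, repressor=False):
    """Steady state of dm/dt = v_max * H(tf_level) - decay * m, i.e. v_max * H / decay."""
    h = hill_repression(tf_level, k, n) if repressor else hill_activation(tf_level, k, n)
    return v_max * h / decay
```

Larger Hill coefficients n make the response more switch-like around k, which is one reason such terms appear when simulating benchmark expression data.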
Affiliation(s)
- Olivia Angelin-Bonnet
- Institute of Fundamental Sciences, Palmerston North, New Zealand
- School of Veterinary Science, Massey University, Palmerston North, New Zealand
- Patrick J Biggs
- Institute of Fundamental Sciences, Palmerston North, New Zealand
- School of Veterinary Science, Massey University, Palmerston North, New Zealand
- Matthieu Vignes
- Institute of Fundamental Sciences, Palmerston North, New Zealand
- School of Veterinary Science, Massey University, Palmerston North, New Zealand
79
Nguyen P, Braun R. Time-lagged Ordered Lasso for network inference. BMC Bioinformatics 2018; 19:545. [PMID: 30594121 PMCID: PMC6311035 DOI: 10.1186/s12859-018-2558-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 09/15/2018] [Accepted: 12/04/2018] [Indexed: 12/22/2022]
Abstract
BACKGROUND Accurate gene regulatory networks can be used to explain the emergence of different phenotypes, disease mechanisms, and other biological functions. Many methods have been proposed to infer networks from gene expression data but have been hampered by problems such as low sample size, inaccurate constraints, and incomplete characterizations of regulatory dynamics. Since expression regulation is dynamic, time-course data can be used to infer causality, but these datasets tend to be short or sparsely sampled. In addition, temporal methods typically assume that the expression of a gene at a time point depends on the expression of other genes at only the immediately preceding time point, while other methods include additional time points without any constraints to account for their temporal distance. These limitations can contribute to inaccurate networks with many missing and anomalous links. RESULTS We adapted the time-lagged Ordered Lasso, a regularized regression method with temporal monotonicity constraints, for de novo reconstruction. We also developed a semi-supervised method that embeds prior network information into the Ordered Lasso to discover novel regulatory dependencies in existing pathways. R code is available at https://github.com/pn51/laggedOrderedLassoNetwork . CONCLUSIONS We evaluated these approaches on simulated data for a repressilator, time-course data from past DREAM challenges, and a HeLa cell cycle dataset to show that they can produce accurate networks subject to the dynamics and assumptions of the time-lagged Ordered Lasso regression.
Affiliation(s)
- Phan Nguyen
- Department of Engineering Sciences and Applied Mathematics, Northwestern University, Evanston, IL USA
- Rosemary Braun
- Department of Engineering Sciences and Applied Mathematics, Northwestern University, Evanston, IL USA
- Biostatistics Division, Feinberg School of Medicine, Northwestern University, Chicago, IL USA
80
Kuzmanovski V, Todorovski L, Džeroski S. Extensive evaluation of the generalized relevance network approach to inferring gene regulatory networks. Gigascience 2018; 7:5099470. [PMID: 30239704 PMCID: PMC6420648 DOI: 10.1093/gigascience/giy118] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 09/03/2017] [Accepted: 09/11/2018] [Indexed: 01/15/2023]
Abstract
Background The generalized relevance network approach to network inference reconstructs network links based on the strength of associations between data in individual network nodes. It can reconstruct undirected networks, i.e., relevance networks, sensu stricto, as well as directed networks, referred to as causal relevance networks. The generalized approach allows the use of an arbitrary measure of pairwise association between nodes, an arbitrary scoring scheme that transforms the associations into weights of the network links, and a method for inferring the directions of the links. While this makes the approach powerful and flexible, it introduces the challenge of finding a combination of components that would perform well on a given inference task. Results We address this challenge by performing an extensive empirical analysis of the performance of 114 variants of the generalized relevance network approach on 47 tasks of gene network inference from time-series data and 39 tasks of gene network inference from steady-state data. We compare the different variants in a multi-objective manner, considering their ranking in terms of different performance metrics. The results suggest a set of recommendations that provide guidance for selecting an appropriate variant of the approach in different data settings. Conclusions The association measures based on correlation, combined with a particular scoring scheme of asymmetric weighting, lead to optimal performance of the relevance network approach in the general case. In the two special cases of inference tasks involving short time-series data and/or large networks, association measures based on identifying qualitative trends in the time series are more appropriate.
Affiliation(s)
- Vladimir Kuzmanovski
- Department of Knowledge Technologies, Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia
- Ljupco Todorovski
- Department of Knowledge Technologies, Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia
- Faculty of Public Administration, University of Ljubljana, Gosarjeva ulica 5, 1000 Ljubljana, Slovenia
- Sašo Džeroski
- Department of Knowledge Technologies, Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia
81
Reverse engineering gene regulatory networks by modular response analysis - a benchmark. Essays Biochem 2018; 62:535-547. [PMID: 30315094 DOI: 10.1042/ebc20180012] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/12/2018] [Revised: 08/13/2018] [Accepted: 08/24/2018] [Indexed: 11/17/2022]
Abstract
Gene regulatory networks control the cellular phenotype by changing the RNA and protein composition. Despite its importance, the gene regulatory network in higher organisms is only partly mapped out. Here, we investigate the potential of reverse engineering methods to unravel the structure of these networks. Particularly, we focus on modular response analysis (MRA), a method that can disentangle networks from perturbation data. We benchmark a version of MRA that was previously successfully applied to reconstruct a signalling-driven genetic network, termed MLMSMRA, on test cases mimicking various aspects of gene regulatory networks. We then investigate its performance in comparison with other MRA realisations and related methods. The benchmark shows that MRA has the potential to predict functional interactions, but also shows that successful application of MRA is restricted to small sparse networks and to data with a high signal-to-noise ratio.
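For orientation, classic MRA recovers the local (direct) interaction matrix r from the global steady-state response matrix R, where column j holds the relative responses of all modules to a perturbation of module j: r = -[dg(R^-1)]^-1 R^-1, so that every r_ii = -1. The pure-Python sketch below implements only this textbook relation; it is not the MLMSMRA variant benchmarked in the article and does no noise handling, and the function names are hypothetical.

```python
def invert(m):
    """Gauss-Jordan inverse of a square matrix given as a list of rows."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for p in range(n):
        pivot = max(range(p, n), key=lambda r: abs(a[r][p]))
        a[p], a[pivot] = a[pivot], a[p]
        pv = a[p][p]
        a[p] = [v / pv for v in a[p]]
        for r in range(n):
            if r != p:
                f = a[r][p]
                a[r] = [vr - f * vp for vr, vp in zip(a[r], a[p])]
    return [row[n:] for row in a]


def local_response_matrix(global_response):
    """Classic MRA: r_ij = -(R^-1)_ij / (R^-1)_ii, which fixes r_ii = -1."""
    inv = invert(global_response)
    n = len(inv)
    return [[-inv[i][j] / inv[i][i] for j in range(n)] for i in range(n)]
```

Row-normalizing by the diagonal of R^-1 is what cancels the unknown direct effect of each perturbation on its own module, so only the ratios between modules are identifiable; noisy R makes this inversion unstable, which is consistent with the restriction to small networks noted above.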
82
Liu L, Liu J. Inferring gene regulatory networks with hybrid of multi-agent genetic algorithm and random forests based on fuzzy cognitive maps. Appl Soft Comput 2018. [DOI: 10.1016/j.asoc.2018.05.009] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 10/17/2022]
83
Hung LH, Shi K, Wu M, Young WC, Raftery AE, Yeung KY. fastBMA: scalable network inference and transitive reduction. Gigascience 2018; 6:1-10. [PMID: 29020744 PMCID: PMC5632288 DOI: 10.1093/gigascience/gix078] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/12/2017] [Accepted: 08/10/2017] [Indexed: 11/15/2022]
Abstract
Inferring genetic networks from genome-wide expression data is extremely demanding computationally. We have developed fastBMA, a distributed, parallel, and scalable implementation of Bayesian model averaging (BMA) for this purpose. fastBMA also includes a computationally efficient module for eliminating redundant indirect edges in the network by mapping the transitive reduction to an easily solved shortest-path problem. We evaluated the performance of fastBMA on synthetic data and experimental genome-wide time series yeast and human datasets. When using a single CPU core, fastBMA is up to 100 times faster than the next fastest method, LASSO, with increased accuracy. It is a memory-efficient, parallel, and distributed application that scales to human genome-wide expression data. A 10 000-gene regulation network can be obtained in a matter of hours using a 32-core cloud cluster (2 nodes of 16 cores). fastBMA is a significant improvement over its predecessor ScanBMA. It is more accurate and orders of magnitude faster than other fast network inference methods such as the one based on LASSO. The improved scalability allows it to calculate networks from genome-scale data in a reasonable time frame. The transitive reduction method can improve accuracy in denser networks. fastBMA is available as code (M.I.T. license) from GitHub (https://github.com/lhhunghimself/fastBMA), as part of the updated networkBMA Bioconductor package (https://www.bioconductor.org/packages/release/bioc/html/networkBMA.html) and as ready-to-deploy Docker images (https://hub.docker.com/r/biodepot/fastbma/).
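The abstract does not spell out how fastBMA maps transitive reduction to a shortest-path problem. One plausible reading, used here purely as an assumption, turns each edge's posterior probability p into a length -ln p and drops a direct edge u -> v when some indirect path from u to v is at least as short, i.e. its product of probabilities is at least p. The Dijkstra-based sketch below illustrates that idea; it is not the fastBMA implementation, and the function names are hypothetical.

```python
import heapq
import math


def shortest_path_cost(adj, src, dst, skip_edge):
    """Dijkstra distance from src to dst, ignoring the single edge `skip_edge`."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adj.get(u, {}).items():
            if (u, v) == skip_edge:
                continue
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf


def transitive_reduction(edges):
    """Drop edges explained by an indirect path; edges maps (u, v) -> probability in (0, 1].

    Every decision is made against the original graph; ties remove the direct edge.
    """
    adj = {}
    for (u, v), p in edges.items():
        adj.setdefault(u, {})[v] = -math.log(p)  # high probability -> short edge
    kept = {}
    for (u, v), p in edges.items():
        if shortest_path_cost(adj, u, v, skip_edge=(u, v)) > -math.log(p):
            kept[(u, v)] = p
    return kept
```

With probabilities as lengths, the path "cost" is minus the log of the product of edge probabilities, so a shortest path is the most probable indirect explanation for an edge.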
Affiliation(s)
- Ling-Hong Hung
- Institute of Technology, University of Washington, Tacoma Campus, Box 358426, 1900 Commerce Street, Tacoma, WA 98402-3100, U.S.A
- Kaiyuan Shi
- Institute of Technology, University of Washington, Tacoma Campus, Box 358426, 1900 Commerce Street, Tacoma, WA 98402-3100, U.S.A
- Migao Wu
- Institute of Technology, University of Washington, Tacoma Campus, Box 358426, 1900 Commerce Street, Tacoma, WA 98402-3100, U.S.A
- William Chad Young
- Department of Statistics, University of Washington, Box 354322, Seattle, WA 98195-4322, U.S.A
- Adrian E. Raftery
- Department of Statistics, University of Washington, Box 354322, Seattle, WA 98195-4322, U.S.A
- Ka Yee Yeung
- Institute of Technology, University of Washington, Tacoma Campus, Box 358426, 1900 Commerce Street, Tacoma, WA 98402-3100, U.S.A
- Correspondence address. Ka Yee Yeung, Institute of Technology, University of Washington, Tacoma Campus, Box 358426, 1900 Commerce Street, Tacoma, WA 98402-3100, U.S.A.; Tel: 253-692-4924; Fax: 253-692-5862
84
Mittal V, Hung LH, Keswani J, Kristiyanto D, Lee SB, Yeung KY. GUIdock-VNC: using a graphical desktop sharing system to provide a browser-based interface for containerized software. Gigascience 2018; 6:1-6. [PMID: 28327936 PMCID: PMC5530313 DOI: 10.1093/gigascience/giw013] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 09/09/2016] [Accepted: 12/16/2016] [Indexed: 11/30/2022]
Abstract
Background: Software container technology such as Docker can be used to package and distribute bioinformatics workflows consisting of multiple software implementations and dependencies. However, Docker is a command line–based tool, and many bioinformatics pipelines consist of components that require a graphical user interface. Results: We present a container tool called GUIdock-VNC that uses a graphical desktop sharing system to provide a browser-based interface for containerized software. GUIdock-VNC uses the Virtual Network Computing protocol to render the graphics within most commonly used browsers. We also present a minimal image builder that can add our proposed graphical desktop sharing system to any Docker package, with the end result that any Docker package can be run using a graphical desktop within a browser. In addition, GUIdock-VNC uses the OAuth2 authentication protocol when deployed on the cloud. Conclusions: As a proof of concept, we demonstrated the utility of GUIdock-noVNC in gene network inference. We benchmarked our container implementation on various operating systems and showed that our solution creates minimal overhead.
85
Khan A, Saha G, Pal RK. An approach for reduction of false predictions in reverse engineering of gene regulatory networks. J Theor Biol 2018; 445:9-30. [PMID: 29462626 DOI: 10.1016/j.jtbi.2018.02.015] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/07/2017] [Revised: 07/07/2017] [Accepted: 02/15/2018] [Indexed: 10/18/2022]
Abstract
A gene regulatory network discloses the regulatory interactions among genes under a particular condition of the human body. Accurately reconstructing such networks from time-series gene expression data with computational tools poses a stiff challenge for contemporary computer scientists, and doing so is crucial for understanding how a living organism functions. Unfortunately, computational methods produce many false predictions along with the correct ones. Investigations in the domain therefore aim to identify as many correct regulations as possible when reverse engineering gene regulatory networks, so that the inferred networks are more reliable and biologically relevant. One way to achieve this is to reduce the number of incorrect predictions in the reconstructed networks. In the present investigation, we propose a novel scheme that decreases the number of false predictions by suitably combining several metaheuristic techniques. We have also implemented the scheme using a dataset ensemble approach (i.e. combining multiple datasets). We applied the proposed methodology to real-world experimental datasets of the SOS DNA repair network of Escherichia coli and the IRMA network of Saccharomyces cerevisiae. Subsequently, we experimented on somewhat larger in silico networks, namely the DREAM3 and DREAM4 Challenge networks, and 15-gene and 20-gene networks extracted from the GeneNetWeaver database. To study the effect of multiple datasets on the quality of the inferred networks, we used four datasets in each experiment. The results are encouraging: for larger gene regulatory networks, the proposed methodology reduces the number of false predictions significantly without using any supplementary prior biological information. We also observe that incorporating a small amount of prior biological information further improves the prediction of true positives.
Affiliation(s)
- Abhinandan Khan
- Department of Computer Science and Engineering, University of Calcutta, Acharya Prafulla Chandra Roy Siksha Prangan, JD-2, Sector-III, Saltlake, Kolkata 700106, India.
- Goutam Saha
- Department of Information Technology, North-Eastern Hill University, Shillong 793022, India.
- Rajat Kumar Pal
- Department of Computer Science and Engineering, University of Calcutta, Acharya Prafulla Chandra Roy Siksha Prangan, JD-2, Sector-III, Saltlake, Kolkata 700106, India.
86
Pirayre A, Couprie C, Duval L, Pesquet JC. BRANE Clust: Cluster-Assisted Gene Regulatory Network Inference Refinement. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:850-860. [PMID: 28368827 DOI: 10.1109/tcbb.2017.2688355] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 06/07/2023]
Abstract
Discovering meaningful gene interactions is crucial for the identification of novel regulatory processes in cells. Accurately building the related graphs remains challenging because the available data admit a large number of possible solutions. Nonetheless, enforcing a priori constraints on the graph structure, such as modularity, may reduce network indeterminacy issues. BRANE Clust (Biologically-Related A priori Network Enhancement with Clustering) refines gene regulatory network (GRN) inference using cluster information. It works as a post-processing tool for inference methods (e.g., CLR, GENIE3). In BRANE Clust, the clustering is based on the inversion of a system of linear equations involving a graph-Laplacian matrix that promotes a modular structure. Our approach is validated on the DREAM4 and DREAM5 datasets with objective measures, showing significant comparative improvements. We provide additional insights on the discovery of novel regulatory or co-expressed links in the inferred Escherichia coli network, evaluated using the STRING database. The comparative pertinence of the clustering is discussed computationally (SIMoNe, WGCNA, X-means) and biologically (RegulonDB). BRANE Clust software is available at: http://www-syscom.univ-mlv.fr/~pirayre/Codes-GRN-BRANE-clust.html.
87
Awdeh A, Phenix H, Karn M, Perkins TJ. Dynamics in Epistasis Analysis. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:878-891. [PMID: 28092574 DOI: 10.1109/tcbb.2017.2653110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
Finding regulatory relationships between genes, including the direction and nature of influence between them, is a fundamental challenge in the field of molecular genetics. One classical approach to this problem is epistasis analysis. Broadly speaking, epistasis analysis infers the regulatory relationships between a pair of genes in a genetic pathway by considering the patterns of change in an observable trait resulting from single and double deletion of genes. While classical epistasis analysis has yielded deep insights into numerous genetic pathways, it is not without limitations. Here, we explore the possibility of dynamic epistasis analysis, in which, in addition to performing genetic perturbations of a pathway, we drive the pathway by a time-varying upstream signal. We explore the theoretical power of dynamic epistasis analysis by conducting an identifiability analysis of Boolean models of genetic pathways, comparing static and dynamic approaches. We find that even relatively simple input dynamics greatly increase the power of epistasis analysis to discriminate alternative network structures. Further, we explore the question of experiment design, and show that a subset of short time-varying signals, which we call dynamic primitives, allows maximum discriminative power with a reduced number of experiments.
88
Mall R, Cerulo L, Garofano L, Frattini V, Kunji K, Bensmail H, Sabedot TS, Noushmehr H, Lasorella A, Iavarone A, Ceccarelli M. RGBM: regularized gradient boosting machines for identification of the transcriptional regulators of discrete glioma subtypes. Nucleic Acids Res 2018; 46:e39. [PMID: 29361062 PMCID: PMC6283452 DOI: 10.1093/nar/gky015] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 12/03/2017] [Accepted: 01/06/2018] [Indexed: 01/05/2023]
Abstract
We propose a generic framework for gene regulatory network (GRN) inference approached as a feature selection problem. GRNs obtained using machine learning (ML) techniques are often dense, whereas real GRNs are rather sparse. We use an optimal L-curve criterion, inspired by Tikhonov regularization, that utilizes the edge weight distribution for a given target gene to determine the optimal set of TFs associated with it. Our proposed framework allows the incorporation of a mechanistic active binding network based on cis-regulatory motif analysis. We evaluate our regularization framework in conjunction with two non-linear ML techniques, namely gradient boosting machines (GBM) and random forests (GENIE), resulting in regularized feature selection-based methods called RGBM and RGENIE, respectively. RGBM has been used to identify the main transcription factors that are causally involved as master regulators of the gene expression signature activated in FGFR3-TACC3-positive glioblastoma. Here, we illustrate that RGBM identifies the main regulators of the molecular subtypes of brain tumors. Our analysis reveals the identity and corresponding biological activities of the master regulators characterizing the difference between the G-CIMP-high and G-CIMP-low subtypes and between PA-like and LGm6-GBM, thus providing a clue to the yet undetermined nature of the transcriptional events among these subtypes.
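To illustrate the idea of cutting a target gene's ranked edge weights at a corner of the weight curve, here is a minimal sketch that uses a generic maximum-distance-to-chord "elbow" heuristic as a stand-in. It is not RGBM's actual L-curve criterion, whose exact form the abstract does not give; the rescaling of both axes to [0, 1] and the function name `select_regulators` are assumptions made for this illustration.

```python
from typing import Dict, List


def select_regulators(weights: Dict[str, float]) -> List[str]:
    """Return the TFs ranked above the elbow of the sorted edge-weight curve.

    Ranks and weights are both rescaled to [0, 1]; the elbow is the point
    farthest from the straight chord joining the first and last points.
    """
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    n = len(ranked)
    hi, lo = (ranked[0][1], ranked[-1][1]) if ranked else (0.0, 0.0)
    if n <= 2 or hi == lo:
        return [tf for tf, _ in ranked]
    best_i, best_d = 0, -1.0
    for i, (_, w) in enumerate(ranked):
        x = i / (n - 1)            # rescaled rank
        y = (w - lo) / (hi - lo)   # rescaled weight
        # The chord runs from (0, 1) to (1, 0), so distance is proportional to |x + y - 1|.
        d = abs(x + y - 1.0)
        if d > best_d:
            best_i, best_d = i, d
    # Keep everything strictly before the elbow, but always at least one TF.
    return [tf for tf, _ in ranked[: max(best_i, 1)]]
```

With weights 0.9, 0.8, 0.1, 0.05 and 0.02, the sharp drop after the second TF places the elbow at the third point, so only the two strongest TFs are kept. This matches the sparsity goal the abstract describes.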
Affiliation(s)
- Raghvendra Mall
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Luigi Cerulo
- Department of Science and Technology, University of Sannio, Benevento, Italy
- BIOGEM Istituto di Ricerche Genetiche “G. Salvatore”, Ariano Irpino, Italy
- Luciano Garofano
- Department of Science and Technology, University of Sannio, Benevento, Italy
- BIOGEM Istituto di Ricerche Genetiche “G. Salvatore”, Ariano Irpino, Italy
- Veronique Frattini
- Institute for Cancer Genetics, Columbia University Medical Center, New York, NY 10032, USA
- Khalid Kunji
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Halima Bensmail
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Thais S Sabedot
- Department of Neurosurgery, Brain Tumor Center, Henry Ford Health System, Detroit, MI, USA
- Department of Genetics (CISBi/NAP), Department of Surgery and Anatomy, Ribeirão Preto Medical School, University of Sao Paulo, Monte Alegre, Ribeirao Preto, Brazil
- Houtan Noushmehr
- Department of Neurosurgery, Brain Tumor Center, Henry Ford Health System, Detroit, MI, USA
- Department of Genetics (CISBi/NAP), Department of Surgery and Anatomy, Ribeirão Preto Medical School, University of Sao Paulo, Monte Alegre, Ribeirao Preto, Brazil
- Anna Lasorella
- Institute for Cancer Genetics, Columbia University Medical Center, New York, NY 10032, USA
- Department of Pathology and Cell Biology, Columbia University Medical Center, New York, New York 10032, USA
- Department of Pediatrics, Columbia University Medical Center, New York, New York 10032, USA
- Antonio Iavarone
- Institute for Cancer Genetics, Columbia University Medical Center, New York, NY 10032, USA
- Department of Pathology and Cell Biology, Columbia University Medical Center, New York, New York 10032, USA
- Department of Neurology, Columbia University Medical Center, New York, New York 10032, USA
- Michele Ceccarelli
- Department of Science and Technology, University of Sannio, Benevento, Italy
- BIOGEM Istituto di Ricerche Genetiche “G. Salvatore”, Ariano Irpino, Italy
89
Nguyen P, Braun R. Semi-supervised network inference using simulated gene expression dynamics. Bioinformatics 2018; 34:1148-1156. [PMID: 29186340 DOI: 10.1093/bioinformatics/btx748] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/02/2017] [Accepted: 11/23/2017] [Indexed: 01/21/2023]
Abstract
Motivation Inferring the structure of gene regulatory networks from high-throughput datasets remains an important and unsolved problem. Current methods are hampered by problems such as noise, low sample size, and incomplete characterizations of regulatory dynamics, leading to networks with missing and anomalous links. Integration of prior network information (e.g. from pathway databases) has the potential to improve reconstructions. Results We developed a semi-supervised network reconstruction algorithm that enables the synthesis of information from partially known networks with time course gene expression data. We adapted partial least squares-variable importance in projection (VIP) for time course data and used reference networks to simulate expression data, from which null distributions of VIP scores are generated and used to estimate edge probabilities for the input expression data. By using simulated dynamics to generate reference distributions, this approach incorporates previously known regulatory relationships and links the network to the dynamics, forming a semi-supervised approach that discovers novel and anomalous connections. We applied this approach to data from a sleep deprivation study, with KEGG pathways treated as prior networks, as well as to synthetic data from several DREAM challenges, and found that it is able to recover many of the true edges and identify errors in these networks, suggesting its ability to derive posterior networks that accurately reflect gene expression dynamics. Availability and implementation R code is available at https://github.com/pn51/postPLSR. Contact rbraun@northwestern.edu. Supplementary information Supplementary data are available at Bioinformatics online.
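For reference, the standard variable importance in projection score for an ordinary PLS model with $A$ components and $p$ predictors is given below. Here $\mathbf{w}_a$ is the weight vector, $\mathbf{t}_a$ the score vector and $q_a$ the response loading of component $a$. The time-course adaptation described in the abstract is not reproduced; this is only the generic textbook form.

```latex
\mathrm{VIP}_j = \sqrt{\frac{p \sum_{a=1}^{A} \mathrm{SS}_a \left( w_{ja} / \lVert \mathbf{w}_a \rVert \right)^2}{\sum_{a=1}^{A} \mathrm{SS}_a}},
\qquad
\mathrm{SS}_a = q_a^2 \, \mathbf{t}_a^{\top} \mathbf{t}_a
```

Each normalized weight vector has unit length, so $\sum_j \mathrm{VIP}_j^2 = p$. The mean squared VIP is therefore 1, which is why $\mathrm{VIP}_j > 1$ is a common rule of thumb for an influential predictor.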
Affiliation(s)
- Phan Nguyen
- Department of Engineering Sciences and Applied Mathematics, Northwestern University, Evanston, IL 60208, USA
- Rosemary Braun
- Department of Engineering Sciences and Applied Mathematics, Northwestern University, Evanston, IL 60208, USA; Biostatistics Division, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
90
Omony J, de Jong A, Krawczyk AO, Eijlander RT, Kuipers OP. Dynamic sporulation gene co-expression networks for Bacillus subtilis 168 and the food-borne isolate Bacillus amyloliquefaciens: a transcriptomic model. Microb Genom 2018; 4. [PMID: 29424683 PMCID: PMC5857382 DOI: 10.1099/mgen.0.000157] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 12/19/2022]
Abstract
Sporulation is a survival strategy adopted by bacterial cells in response to harsh environmental adversities. The adaptation potential differs between strains, and the variations may arise from differences in gene regulation. Gene networks are a valuable way of studying such regulation processes and establishing associations between genes. We reconstructed and compared sporulation gene co-expression networks (GCNs) of the model laboratory strain Bacillus subtilis 168 and the food-borne industrial isolate Bacillus amyloliquefaciens. Transcriptome data obtained from samples of six stages during the sporulation process were used for network inference. Subsequently, a gene set enrichment analysis was performed to compare the reconstructed GCNs of B. subtilis 168 and B. amyloliquefaciens with respect to biological functions; it showed enriched modules with coherent functional groups associated with sporulation. On the basis of the GCNs and the time evolution of differentially expressed genes, we identified novel candidate genes strongly associated with sporulation in B. subtilis 168 and B. amyloliquefaciens. The GCNs offer a framework for exploring transcription factors, their targets, and co-expressed genes during sporulation. Furthermore, the methodology described here can conveniently be applied to other species or biological processes.
Affiliation(s)
- Jimmy Omony
- Laboratory of Molecular Genetics, University of Groningen, 9747 AG Groningen, The Netherlands; Top Institute Food and Nutrition (TIFN), Nieuwe Kanaal 9A, 6709 PA Wageningen, The Netherlands
- Anne de Jong
- Laboratory of Molecular Genetics, University of Groningen, 9747 AG Groningen, The Netherlands; Top Institute Food and Nutrition (TIFN), Nieuwe Kanaal 9A, 6709 PA Wageningen, The Netherlands
- Antonina O Krawczyk
- Laboratory of Molecular Genetics, University of Groningen, 9747 AG Groningen, The Netherlands; Top Institute Food and Nutrition (TIFN), Nieuwe Kanaal 9A, 6709 PA Wageningen, The Netherlands
- Robyn T Eijlander
- Laboratory of Molecular Genetics, University of Groningen, 9747 AG Groningen, The Netherlands; Top Institute Food and Nutrition (TIFN), Nieuwe Kanaal 9A, 6709 PA Wageningen, The Netherlands; NIZO Food Research B.V., P.O. Box 20, 6710 BA Ede, The Netherlands
- Oscar P Kuipers
- Laboratory of Molecular Genetics, University of Groningen, 9747 AG Groningen, The Netherlands; Top Institute Food and Nutrition (TIFN), Nieuwe Kanaal 9A, 6709 PA Wageningen, The Netherlands
91
Carlin DE, Paull EO, Graim K, Wong CK, Bivol A, Ryabinin P, Ellrott K, Sokolov A, Stuart JM. Prophetic Granger Causality to infer gene regulatory networks. PLoS One 2017; 12:e0170340. [PMID: 29211761 PMCID: PMC5718405 DOI: 10.1371/journal.pone.0170340] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 07/21/2016] [Accepted: 10/26/2017] [Indexed: 01/09/2023]
Abstract
We introduce a novel method called Prophetic Granger Causality (PGC) for inferring gene regulatory networks (GRNs) from protein-level time series data. The method uses an L1-penalized regression adaptation of Granger Causality to model protein levels as a function of time, stimuli, and other perturbations. When combined with a data-independent network prior, the framework outperformed all other methods submitted to the HPN-DREAM 8 breast cancer network inference challenge. Our investigations reveal that PGC provides complementary information to other approaches, raising the performance of ensemble learners, while on its own it achieves moderate performance. Thus, PGC serves as a valuable new tool in the bioinformatics toolkit for analyzing temporal datasets. We investigate the general and cell-specific interactions predicted by our method and find several novel interactions, demonstrating the utility of the approach in charting new tumor wiring.
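To make "L1-penalized regression adaptation of Granger Causality" concrete, here is a minimal sketch of the generic lagged version only. It assumes one-step lags, centered data, a fixed penalty `lam`, and a plain coordinate-descent lasso. PGC's "prophetic" formulation, its stimulus and perturbation covariates, and the network prior are not reproduced, and all function names are hypothetical.

```python
from typing import List, Sequence


def soft_threshold(z: float, lam: float) -> float:
    """Proximal operator of the L1 penalty."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X: List[List[float]], y: List[float], lam: float, n_iter: int = 1000) -> List[float]:
    """Minimise (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes the columns of X and the vector y are already centred.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # y - X @ beta, with beta = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j itself.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def granger_lasso(series: Sequence[Sequence[float]], target: int, lam: float) -> List[float]:
    """series[t][g] is the level of protein g at time t.

    Regress the target at time t on every protein at time t-1 (including the
    target's own past); a nonzero coefficient for protein g is read as a
    candidate edge g -> target.
    """
    X = [list(series[t - 1]) for t in range(1, len(series))]
    y = [float(series[t][target]) for t in range(1, len(series))]
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    return lasso_cd(Xc, yc, lam)
```

If protein 0 is driven only by protein 1 at the previous time step, the penalty zeroes the other coefficients and shrinks the true one slightly toward zero. Larger values of `lam` yield sparser candidate networks.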
Affiliation(s)
- Daniel E. Carlin
- University of California San Diego, Department of Medicine, La Jolla, CA, United States of America
- Evan O. Paull
- University of California Santa Cruz, Department of Biomolecular Engineering, Santa Cruz, CA, United States of America
- Kiley Graim
- University of California Santa Cruz, Department of Biomolecular Engineering, Santa Cruz, CA, United States of America
- Christopher K. Wong
- University of California Santa Cruz, Department of Biomolecular Engineering, Santa Cruz, CA, United States of America
- Adrian Bivol
- University of California Santa Cruz, Department of Biomolecular Engineering, Santa Cruz, CA, United States of America
- Peter Ryabinin
- University of California Santa Cruz, Department of Biomolecular Engineering, Santa Cruz, CA, United States of America
- Kyle Ellrott
- Oregon Health Sciences University, Department of Biomedical Engineering, Portland, OR, United States of America
- Artem Sokolov
- University of California Santa Cruz, Department of Biomolecular Engineering, Santa Cruz, CA, United States of America
- Corresponding author
- Joshua M. Stuart
- University of California Santa Cruz, Department of Biomolecular Engineering, Santa Cruz, CA, United States of America
- Corresponding author
92
Liang Y, Kelemen A. Bayesian state space models for dynamic genetic network construction across multiple tissues. Stat Appl Genet Mol Biol 2017; 15:273-90. [PMID: 27343475 DOI: 10.1515/sagmb-2014-0055] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022]
Abstract
Constructing gene-gene interaction networks and potential pathways is a challenging and important problem in genomic research on complex diseases, and estimating the dynamic changes of temporal correlations and non-stationarity is key to this process. In this paper, we develop dynamic state space models with hierarchical Bayesian settings to tackle this challenge, inferring the dynamic profiles and genetic networks associated with disease treatments. We treat both the stochastic transition matrix and the observation matrix as time-variant and include temporal correlation structures in the covariance matrix estimations in the multivariate Bayesian state space models. Unevenly spaced short time courses with unseen time points are treated as hidden state variables. Hierarchical Bayesian approaches with various prior and hyper-prior models, together with Markov chain Monte Carlo and Gibbs sampling algorithms, are used to estimate the model parameters and the hidden state variables. We apply the proposed hierarchical Bayesian state space models to Affymetrix time course data sets from multiple tissues (liver, skeletal muscle, and kidney) following corticosteroid (CS) drug administration. Both simulation and real data analysis results show that the proposed models capture well the genomic changes over time and the gene-gene interactions in response to CS treatment. The proposed dynamic hierarchical Bayesian state space modeling approaches could be extended to other large-scale genomic data, such as next-generation sequencing (NGS) data combined with real-time, time-varying electronic health records (EHR), for more comprehensive and robust systematic and network-based analyses that transform big biomedical data into predictions and diagnostics for precision medicine and personalized healthcare, with better decision making and patient outcomes.
93
Yu B, Xu JM, Li S, Chen C, Chen RX, Wang L, Zhang Y, Wang MH. Inference of time-delayed gene regulatory networks based on dynamic Bayesian network hybrid learning method. Oncotarget 2017; 8:80373-80392. [PMID: 29113310 PMCID: PMC5655205 DOI: 10.18632/oncotarget.21268] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 07/20/2017] [Accepted: 08/27/2017] [Indexed: 01/31/2023]
Abstract
Gene regulatory network (GRN) research reveals complex life phenomena from the perspective of gene interaction and is an important research field in systems biology. Traditional Bayesian networks have high computational complexity, and their network structure scoring model relies on a single feature. Information-based approaches cannot identify the direction of regulation. To make up for the shortcomings of these methods, this paper presents a novel hybrid learning method (DBNCS) based on the dynamic Bayesian network (DBN), which combines a comprehensive score (CS) with the DBN model to construct multiple time-delayed GRNs for the first time. The DBNCS algorithm first uses the CMI2NI (conditional mutual inclusive information-based network inference) algorithm to learn network structure profiles, that is, to construct the search space. Redundant regulations are then removed with a recursive optimization algorithm (RO), thereby reducing the false positive rate. Next, the network structure profiles are decomposed into a set of cliques without loss, which can significantly reduce the computational complexity. Finally, the DBN model is used to identify the direction of gene regulation within the cliques and to search for the optimal network structure. The performance of the DBNCS algorithm is evaluated on benchmark GRN datasets from the DREAM challenge as well as the SOS DNA repair network in Escherichia coli, and compared with other state-of-the-art methods. The experimental results show the rationality of the algorithm design and the outstanding performance of DBNCS in inferring GRNs.
Affiliation(s)
- Bin Yu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- CAS Key Laboratory of Geospace Environment, Department of Geophysics and Planetary Science, University of Science and Technology of China, Hefei 230026, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- Jia-Meng Xu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- Shan Li
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- Cheng Chen
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- Rui-Xin Chen
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- Lei Wang
- Key Laboratory of Eco-chemical Engineering, Ministry of Education, Laboratory of Inorganic Synthesis and Applied Chemistry, College of Chemistry and Molecular Engineering, Qingdao University of Science and Technology, Qingdao 266042, China
- Yan Zhang
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- College of Electromechanical Engineering, Qingdao University of Science and Technology, Qingdao 266061, China
- Ming-Hui Wang
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
94
Papili Gao N, Ud-Dean SMM, Gandrillon O, Gunawan R. SINCERITIES: inferring gene regulatory networks from time-stamped single cell transcriptional expression profiles. Bioinformatics 2017; 34:258-266. [PMID: 28968704 PMCID: PMC5860204 DOI: 10.1093/bioinformatics/btx575] [Citation(s) in RCA: 109] [Impact Index Per Article: 15.6] [Received: 12/06/2016] [Revised: 06/12/2017] [Accepted: 09/13/2017] [Indexed: 11/13/2022]
Abstract
Motivation Single cell transcriptional profiling opens up a new avenue for studying the functional role of cell-to-cell variability in physiological processes. The analysis of single cell expression profiles creates new challenges due to the distributional nature of the data and the stochastic dynamics of the gene transcription process. The reconstruction of gene regulatory networks (GRNs) using single cell transcriptional profiles is particularly challenging, especially when directed gene-gene relationships are desired. Results We developed SINCERITIES (SINgle CEll Regularized Inference using TIme-stamped Expression profileS) for the inference of GRNs from single cell transcriptional profiles. We focused on time-stamped cross-sectional expression data, commonly generated from transcriptional profiling of single cells collected at multiple time points after cell stimulation. SINCERITIES recovers directed regulatory relationships among genes by employing regularized linear regression (ridge regression), using temporal changes in the distributions of gene expressions. Meanwhile, the modes of the gene regulations (activation and repression) come from partial correlation analyses between pairs of genes. We demonstrated the efficacy of SINCERITIES in inferring GRNs using in silico time-stamped single cell expression data and single cell transcriptional profiles of THP-1 monocytic human leukemia cells. The case studies showed that SINCERITIES could provide accurate GRN predictions, significantly better than those of other GRN inference algorithms such as TSNI, GENIE3 and JUMP3. Moreover, SINCERITIES has a low computational complexity and is amenable to problems of extremely large dimensionality. Finally, an application of SINCERITIES to single cell expression data of T2EC chicken erythrocytes pointed to BATF as a candidate novel regulator of erythroid development.
Availability and implementation MATLAB and R versions of SINCERITIES are freely available from the following websites: http://www.cabsel.ethz.ch/tools/sincerities.html and https://github.com/CABSEL/SINCERITIES. The single cell THP-1 and T2EC transcriptional profiles are available from the original publications (Kouno et al., 2013; Richard et al., 2016). The in silico single cell data are available on the SINCERITIES websites. Supplementary information Supplementary data are available at Bioinformatics online.
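The regression step described above can be sketched under several assumptions that are not taken from the paper. The change in each gene's expression distribution between consecutive time points is measured with the two-sample Kolmogorov-Smirnov statistic and divided by the time gap. Ridge regression then uses a fixed penalty `lam` instead of a tuned one. The partial-correlation step that assigns activation or repression is omitted. All function names are hypothetical, and this is not the published MATLAB/R implementation.

```python
from typing import List, Sequence

Snapshot = Sequence[Sequence[float]]  # cells x genes at one time point


def ks_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every sample equal to x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def distribution_distances(snapshots: Sequence[Snapshot], times: Sequence[float]) -> List[List[float]]:
    """Row w holds, for each gene, the KS distance between time points w and w+1,
    divided by the time gap (the scaling is an assumption of this sketch)."""
    n_genes = len(snapshots[0][0])
    rows = []
    for w in range(len(snapshots) - 1):
        dt = times[w + 1] - times[w]
        rows.append([
            ks_distance([c[g] for c in snapshots[w]], [c[g] for c in snapshots[w + 1]]) / dt
            for g in range(n_genes)
        ])
    return rows


def ridge(X: List[List[float]], y: List[float], lam: float) -> List[float]:
    """Solve (X^T X + lam I) b = X^T y by Gauss-Jordan elimination with partial pivoting."""
    p = len(X[0])
    A = [[sum(r[a] * r[c] for r in X) + (lam if a == c else 0.0) for c in range(p)] for a in range(p)]
    rhs = [sum(r[a] * v for r, v in zip(X, y)) for a in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        rhs[c], rhs[piv] = rhs[piv], rhs[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                for k in range(c, p):
                    A[r][k] -= f * A[c][k]
                rhs[r] -= f * rhs[c]
    return [rhs[c] / A[c][c] for c in range(p)]


def regulator_weights(dtd: List[List[float]], target: int, lam: float) -> List[float]:
    """Regress the target's distance in window w+1 on all genes' distances in window w."""
    X = dtd[:-1]
    y = [row[target] for row in dtd[1:]]
    return ridge(X, y, lam)
```

A large weight for gene g in `regulator_weights(dtd, target, lam)` is read as a candidate directed edge g → target. A positive `lam` keeps the linear system well posed even when there are fewer time windows than genes.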
Affiliation(s)
- Nan Papili Gao
- Institute for Chemical and Bioengineering, ETH Zurich, Zurich, Switzerland.,Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - S M Minhaz Ud-Dean
- Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, NY, USA
- Olivier Gandrillon
- Laboratory of Biology and Modelling of the Cell, Univ Lyon, ENS de Lyon, Univ Claude Bernard, CNRS UMR, INSERM Lyon, France
- Inria Team Dracula, Inria Center Grenoble Rhône-Alpes, Rhône-Alpes, France
- Rudiyanto Gunawan
- Institute for Chemical and Bioengineering, ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
95
Voigt A, Nowick K, Almaas E. A composite network of conserved and tissue specific gene interactions reveals possible genetic interactions in glioma. PLoS Comput Biol 2017; 13:e1005739. [PMID: 28957313 PMCID: PMC5634634 DOI: 10.1371/journal.pcbi.1005739] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 12/22/2016] [Revised: 10/10/2017] [Accepted: 08/24/2017] [Indexed: 02/08/2023]
Abstract
Differential co-expression network analyses have recently become an important step in investigating cellular differentiation and dysfunctional gene regulation in cell and tissue disease states. The resulting networks have been analyzed to identify and understand pathways associated with disorders, or to infer molecular interactions. However, existing methods for differential co-expression network analysis are unable to distinguish between various forms of differential co-expression. To close this gap, we define three kinds of differential co-expression (conserved, specific, and differentiated) and present a systematic framework, CSD, for differential co-expression network analysis that treats these interactions on an equal footing. In addition, our method includes a subsampling strategy to estimate the variance of co-expression values. Our framework is applicable to a wide variety of cases, such as the study of differential co-expression networks between healthy and disease states, before and after treatments, or between species. Applying the CSD approach to a published gene-expression data set of cerebral cortex and basal ganglia samples from healthy individuals, we find that the resulting CSD network is enriched in genes associated with cognitive function, signaling pathways involving compounds with well-known roles in the central nervous system, and certain neurological diseases. From the CSD analysis, we identify a set of prominent hubs of differential co-expression whose neighborhoods contain a substantial number of genes associated with glioblastoma. The gene sets identified by our CSD analysis also contain many genes that have not so far been recognized as having a role in glioblastoma but are good candidates for further studies. CSD may thus aid in generating hypotheses about functional disease associations.
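The abstract mentions two technical steps: subsampling to estimate the variance of co-expression, and three kinds of differential co-expression. The sketch below illustrates both under two assumptions. First, Pearson correlation is the co-expression measure. Second, the scores take the forms shown, which are our reading of the CSD idea rather than definitions quoted from the paper. Under that reading, C is large when a correlation is strong with the same sign in both conditions, S when its strength differs between conditions, and D when it is strong in both but with opposite signs. Function names are hypothetical.

```python
import numpy as np

def subsampled_correlation(expr, n_sub=200, frac=0.8, seed=0):
    """expr: (samples x genes) for one condition. Returns the mean and the
    variance of the Pearson correlation matrix over random subsamples of
    the samples, drawn without replacement."""
    rng = np.random.default_rng(seed)
    n = expr.shape[0]
    k = max(3, int(frac * n))
    mats = np.array([
        np.corrcoef(expr[rng.choice(n, size=k, replace=False)], rowvar=False)
        for _ in range(n_sub)
    ])
    return mats.mean(axis=0), mats.var(axis=0)

def csd_scores(rho1, var1, rho2, var2, eps=1e-12):
    """Assumed conserved (C), specific (S) and differentiated (D) scores,
    each scaled by the pooled subsampling standard deviation."""
    denom = np.sqrt(var1 + var2 + eps)
    c = np.abs(rho1 + rho2) / denom                                   # same sign, both strong
    s = np.abs(np.abs(rho1) - np.abs(rho2)) / denom                   # strength differs
    d = (np.abs(rho1) + np.abs(rho2) - np.abs(rho1 + rho2)) / denom   # opposite signs
    return c, s, d
```

Because the scores are divided by the subsampling standard deviation, a gene pair whose correlation is unstable across subsamples receives lower scores than one with the same mean correlation but less variability.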
Affiliation(s)
- André Voigt
- Network Systems Biology Group, Department of Biotechnology, NTNU - Norwegian University of Science and Technology, Trondheim, Norway
- Katja Nowick
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
- Bioinformatics, Institute of Animal Science, University of Hohenheim, Stuttgart, Germany
- Human Biology, Institute for Biology, Free University Berlin, Berlin, Germany
- Eivind Almaas
- Network Systems Biology Group, Department of Biotechnology, NTNU - Norwegian University of Science and Technology, Trondheim, Norway
- K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and General Practice, NTNU - Norwegian University of Science and Technology, Trondheim, Norway
96
Predicting gene regulatory networks by combining spatial and temporal gene expression data in Arabidopsis root stem cells. Proc Natl Acad Sci U S A 2017; 114:E7632-E7640. [PMID: 28827319 DOI: 10.1073/pnas.1707566114] [Citation(s) in RCA: 62] [Impact Index Per Article: 8.9] [Indexed: 11/18/2022]
Abstract
Identifying the transcription factors (TFs) and associated networks involved in stem cell regulation is essential for understanding the initiation and growth of plant tissues and organs. Although many TFs have been shown to have a role in the Arabidopsis root stem cells, a comprehensive view of the transcriptional signature of the stem cells is lacking. In this work, we used spatial and temporal transcriptomic data to predict interactions among the genes involved in stem cell regulation. To accomplish this, we transcriptionally profiled several stem cell populations and developed a gene regulatory network inference algorithm that combines clustering with dynamic Bayesian network inference. We leveraged the topology of our networks to infer potential major regulators. Specifically, through mathematical modeling and experimental validation, we identified PERIANTHIA (PAN) as an important molecular regulator of quiescent center function. The results presented in this work show that our combination of molecular biology, computational biology, and mathematical modeling is an efficient approach to identify candidate factors that function in the stem cells.
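The abstract combines clustering with dynamic Bayesian network (DBN) inference. The sketch below is a much-reduced illustration of the DBN half only. It assumes first-order (lag-1) dependencies, tertile discretization, at most one parent per gene, and a BIC score. The clustering step and the authors' actual scoring and search procedure are not reproduced, and all function names are hypothetical.

```python
import numpy as np

def discretize(series, n_bins=3):
    """Map a 1-D expression time series to quantile levels 0..n_bins-1."""
    edges = np.quantile(series, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    return np.digitize(series, edges)

def bic_lag1(parent, child, n_states=3):
    """BIC of the multinomial model P(child[t+1] | parent[t]) fitted by
    maximum likelihood to the discretized series."""
    x, y = parent[:-1], child[1:]
    loglik = 0.0
    for a in range(n_states):
        mask = x == a
        n_a = int(mask.sum())
        if n_a == 0:
            continue
        counts = np.bincount(y[mask], minlength=n_states)
        nz = counts[counts > 0]
        loglik += float(np.sum(nz * np.log(nz / n_a)))
    n_params = n_states * (n_states - 1)  # free parameters of the conditional table
    return loglik - 0.5 * n_params * np.log(len(y))

def best_single_regulators(data):
    """data: (time points x genes), continuous. For each target gene, return
    the single lag-1 parent with the highest BIC score."""
    disc = np.column_stack([discretize(data[:, g]) for g in range(data.shape[1])])
    best = {}
    for target in range(disc.shape[1]):
        scores = {p: bic_lag1(disc[:, p], disc[:, target])
                  for p in range(disc.shape[1]) if p != target}
        best[target] = max(scores, key=scores.get)
    return best
```

Because the BIC score decomposes over target genes, each target's parent can be chosen independently. The time lag is what lets a DBN assign a direction to each edge.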
97
Deng Y, Zenil H, Tegnér J, Kiani NA. HiDi: an efficient reverse engineering schema for large-scale dynamic regulatory network reconstruction using adaptive differentiation. Bioinformatics 2017; 33:3964-3972. [DOI: 10.1093/bioinformatics/btx501] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 07/04/2016] [Accepted: 08/05/2017] [Indexed: 11/14/2022]
Affiliation(s)
- Yue Deng
- Algorithmic Dynamics Lab, Karolinska Institute, Stockholm, Sweden
- Unit of Computational Medicine, Center for Molecular Medicine, Department of Medicine, Solna and Science for Life Laboratory (SciLifeLab), Karolinska Institute, Stockholm, Sweden
- Hector Zenil
- Algorithmic Dynamics Lab, Karolinska Institute, Stockholm, Sweden
- Unit of Computational Medicine, Center for Molecular Medicine, Department of Medicine, Solna and Science for Life Laboratory (SciLifeLab), Karolinska Institute, Stockholm, Sweden
- Jesper Tegnér
- Unit of Computational Medicine, Center for Molecular Medicine, Department of Medicine, Solna and Science for Life Laboratory (SciLifeLab), Karolinska Institute, Stockholm, Sweden
- Biological and Environmental Sciences and Engineering Division, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
- Narsis A Kiani
- Algorithmic Dynamics Lab, Karolinska Institute, Stockholm, Sweden
- Unit of Computational Medicine, Center for Molecular Medicine, Department of Medicine, Solna and Science for Life Laboratory (SciLifeLab), Karolinska Institute, Stockholm, Sweden
98
Rowland MA, Abdelzaher A, Ghosh P, Mayo ML. Crosstalk and the Dynamical Modularity of Feed-Forward Loops in Transcriptional Regulatory Networks. Biophys J 2017; 112:1539-1550. [PMID: 28445746 PMCID: PMC5406374 DOI: 10.1016/j.bpj.2017.02.044] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 09/13/2016] [Revised: 02/08/2017] [Accepted: 02/16/2017] [Indexed: 01/16/2023]
Abstract
Network motifs, such as the feed-forward loop (FFL), introduce a range of complex behaviors to transcriptional regulatory networks, yet these properties are typically determined by studying the motifs in isolation. We characterize the effects of crosstalk on FFL dynamics by modeling the cross regulation between two different FFLs and evaluate the extent to which these patterns occur in vivo. Analytical modeling suggests that crosstalk should overwhelmingly affect individual protein-expression dynamics. Counter to this expectation, we find that entire FFLs are more likely than expected to resist the effects of crosstalk (≈20% for one crosstalk interaction) and remain dynamically modular. The likelihood that cross-linked FFLs are dynamically correlated increases monotonically with additional crosstalk, but is independent of the specific regulation type or connectivity of the interactions. Just one additional regulatory interaction is sufficient to drive the FFL dynamics to a statistically different state. Despite the potential for modularity between sparsely connected network motifs, Escherichia coli (E. coli) appears to favor crosstalk in which at least one of the cross-linked FFLs remains modular. A gene ontology analysis reveals that stress response processes are significantly overrepresented in the cross-linked motifs found within E. coli. Although the daunting complexity of biological networks affects the dynamical properties of individual network motifs, some motifs resist and remain modular, seemingly insulated from extrinsic perturbations. This raises the intriguing possibility that nature can consistently and reliably provide certain network functionalities wherever the need arises.
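To make crosstalk between two FFLs concrete, the sketch below couples two coherent type-1 FFLs with AND logic through one extra regulatory edge. Several choices are assumptions made for illustration: Hill-type kinetics, the parameter values, forward-Euler integration, and a Y1 -> Z2 crosstalk edge. This is not the authors' model, and it does not compute their dynamical-modularity statistic.

```python
import numpy as np

def hill(x, k=0.5, n=2):
    """Activating Hill function."""
    return x**n / (k**n + x**n)

def simulate_two_ffls(crosstalk=0.0, t_end=20.0, dt=0.01, alpha=1.0):
    """Euler integration of two coherent type-1 FFLs with AND logic:
    X1 -> Y1 -> Z1, X1 -> Z1 and X2 -> Y2 -> Z2, X2 -> Z2.
    X1 is held ON and X2 OFF; `crosstalk` is the strength of one
    hypothetical extra activation edge Y1 -> Z2. `alpha` is a shared
    degradation/dilution rate. Returns final Z1, final Z2 and the Z2 trace."""
    x1, x2 = 1.0, 0.0
    y1 = z1 = y2 = z2 = 0.0
    steps = int(round(t_end / dt))
    z2_trace = np.empty(steps)
    for i in range(steps):
        dy1 = hill(x1) - alpha * y1
        dz1 = hill(x1) * hill(y1) - alpha * z1
        dy2 = hill(x2) - alpha * y2
        dz2 = hill(x2) * hill(y2) + crosstalk * hill(y1) - alpha * z2
        y1, z1 = y1 + dt * dy1, z1 + dt * dz1
        y2, z2 = y2 + dt * dy2, z2 + dt * dz2
        z2_trace[i] = z2
    return z1, z2, z2_trace
```

With `crosstalk=0.0`, Z2 stays at zero because its own input X2 is off. With any positive value, the input to the first FFL also drives Z2. This is a minimal example of one motif losing its dynamical independence through a single added interaction.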
Affiliation(s)
- Michael A Rowland
- Oak Ridge Institute for Science and Education, Oak Ridge, Tennessee
- Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, Mississippi
- Ahmed Abdelzaher
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia
- Preetam Ghosh
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia
- Michael L Mayo
- Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, Mississippi
99
Reconstructing Genetic Regulatory Networks Using Two-Step Algorithms with the Differential Equation Models of Neural Networks. Interdiscip Sci 2017; 10:823-835. [PMID: 28748400 DOI: 10.1007/s12539-017-0254-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/25/2016] [Revised: 07/01/2017] [Accepted: 07/14/2017] [Indexed: 10/19/2022]
Abstract
BACKGROUND: The identification of genetic regulatory networks (GRNs) provides insights into complex cellular processes. A class of recurrent neural networks (RNNs) captures the dynamics of GRNs. Algorithms combining the RNN with machine learning schemes have been proposed to reconstruct small-scale GRNs from gene expression time series.
RESULTS: We present new GRN reconstruction methods based on neural networks. The RNN is extended to a class of recurrent multilayer perceptrons (RMLPs) with latent nodes. Our methods contain two steps: an edge rank assignment step and a network construction step. The former assigns ranks to all possible edges by a recursive procedure based on the estimated connection weights of the RNN/RMLP (RERNN/RERMLP); the latter constructs a network consisting of the top-ranked edges under which the optimized RNN simulates the gene expression time series. Particle swarm optimization (PSO) is used to optimize the parameters of the RNNs and RMLPs in the two-step algorithm. The proposed RERNN-RNN and RERMLP-RNN algorithms are tested on synthetic and experimental gene expression time series of small GRNs of about 10 genes. The experimental time series come from studies of yeast cell cycle regulated genes and E. coli DNA repair genes.
CONCLUSION: Unstable estimation of an RNN from experimental time series with few data points can lead to fairly arbitrary predicted GRNs. Our methods incorporate the RNN and RMLP into a two-step structure learning procedure. Results show that RERMLP, which uses an RMLP with a suitable number of latent nodes to reduce the parameter dimension, often yields more accurate edge ranks on short simulated time series than RERNN, which uses a regularized RNN. When the networks derived by RERMLP-RNN with different numbers of latent nodes in step one are combined by a weighted majority voting rule to infer the GRN, the method performs consistently and outperforms published GRN reconstruction algorithms on most benchmark time series. The two-step framework could potentially incorporate other nonlinear differential equation models to reconstruct GRNs.
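The abstract describes RNN-type differential equation models fitted by PSO, with edges ranked from the estimated weights. The sketch below illustrates those pieces under several assumptions. It uses one common form of the RNN model, tau_i dx_i/dt = f(Σ_j w_ij x_j + β_i) - x_i with a logistic f, integrated by forward Euler. It fits the weights with plain global-best PSO and ranks edges by |w_ij| from a single fit. The recursive RERNN/RERMLP ranking, the latent-node RMLP and the weighted majority voting are not reproduced, and all names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def simulate_rnn(w, beta, tau, x0, n_steps, dt=0.1):
    """Euler integration of the assumed model
    tau_i * dx_i/dt = sigmoid(sum_j w_ij * x_j + beta_i) - x_i.
    Returns an (n_steps x genes) trajectory that starts at x0."""
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(n_steps - 1):
        x = x + dt * (sigmoid(w @ x + beta) - x) / tau
        traj.append(x.copy())
    return np.array(traj)

def pso(loss, dim, n_particles=30, iters=200, bounds=(-5.0, 5.0),
        inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimization (minimization)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pos = rng.uniform(lo, hi, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([loss(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([loss(p) for p in pos])
        better = vals < pbest_val
        pbest[better] = pos[better]
        pbest_val[better] = vals[better]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, float(pbest_val.min())

def rank_edges(w):
    """Candidate edges j -> i (self-loops excluded), sorted by |w_ij|, largest first."""
    n = w.shape[0]
    edges = [(j, i, float(abs(w[i, j]))) for i in range(n) for j in range(n) if i != j]
    return sorted(edges, key=lambda e: e[2], reverse=True)

# Usage: fit a 2-gene network to a trajectory simulated from known weights.
true_w = np.array([[0.0, 3.0], [-3.0, 0.0]])
beta, tau, x0 = np.zeros(2), np.ones(2), [0.2, 0.8]
observed = simulate_rnn(true_w, beta, tau, x0, 30)

def mse(params):
    fitted = simulate_rnn(params.reshape(2, 2), beta, tau, x0, 30)
    return float(np.mean((fitted - observed) ** 2))

best_params, best_err = pso(mse, dim=4)
edge_ranking = rank_edges(best_params.reshape(2, 2))
```

With short, noisy series, many weight matrices can fit equally well. This is the instability the abstract warns about, and the reason the authors' two-step procedure does not rely on a single fit's weights.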
100
Reverse engineering highlights potential principles of large gene regulatory network design and learning. NPJ Syst Biol Appl 2017. [PMID: 28649444 PMCID: PMC5481436 DOI: 10.1038/s41540-017-0019-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 11/15/2022]
Abstract
Inferring transcriptional gene regulatory networks from transcriptomic datasets is a key challenge of systems biology, with potential impacts ranging from medicine to agronomy. Several techniques are currently used to experimentally assay transcription factor-to-target relationships, providing important information about the connections of real gene regulatory networks. These techniques include classical ChIP-seq and yeast one-hybrid assays and, more recently, DAP-seq and TARGET technologies. They are usually used to validate algorithm predictions. Here, we developed a reverse engineering approach based on mathematical modeling and computer simulation to evaluate the impact that such prior knowledge of gene regulatory networks may have on training machine learning algorithms. First, we developed a gene regulatory network simulation engine called FRANK (Fast Randomizing Algorithm for Network Knowledge) that can simulate large gene regulatory networks (containing 10⁴ genes) with characteristics of gene regulatory networks observed in vivo. FRANK also generates stable or oscillatory gene expression directly produced by the simulated gene regulatory networks. The development of FRANK leads to important general conclusions concerning the design of large and stable gene regulatory networks harboring scale-free properties (built ex nihilo). Using FRANK in combination with a supervised (prior-knowledge-accepting) support vector machine algorithm, we (i) address biologically oriented questions concerning our capacity to accurately reconstruct gene regulatory networks, and in particular demonstrate that the structure of prior knowledge is crucial for accurate learning, and (ii) draw conclusions to inform the design of experiments that would enable learning algorithms to solve gene regulatory networks in the future.
By demonstrating that our predictions concerning the influence of the prior-knowledge structure on support vector machine learning capacity hold true on real data (Escherichia coli K14 network reconstruction using network and transcriptomic data), we show that the formalism used to build FRANK can, to some extent, be a reasonable model for gene regulatory networks in real cells.
This work by Carré et al. addresses central questions in biology: how are very large gene regulatory networks (GRNs) organized, how do they generate stable gene expression, and can they be learned using machine learning algorithms? The authors developed an algorithm able to simulate large GRNs. From these networks they simulate stable or oscillating gene expression and highlight some mathematical rules controlling such collective behavior (several thousand genes). They discuss the resulting hypotheses concerning the organization of GRNs in real cells. Using this simulation tool, the authors also demonstrate that it is likely possible to learn GRNs computationally from transcriptomic data and prior knowledge of the network (for instance, known connections from yeast one-hybrid or ChIP-seq assays). They particularly highlight the crucial importance of the structure of the prior knowledge for the capacity to learn large GRNs.
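The abstract describes training a supervised support vector machine with prior knowledge of known regulator-target connections. The sketch below shows one way to set that up, and it rests on several assumptions that the abstract does not state. It fits a separate ("local") linear SVM for each transcription factor, uses gene expression profiles as features, and takes known targets and known non-targets as labels. The SVM is trained with the Pegasos stochastic sub-gradient method rather than a library solver. It does not implement FRANK itself, and all names are hypothetical.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic sub-gradient descent for a linear SVM with
    hinge loss. A constant feature is appended so the bias is learned (and
    regularized) together with the weights. Labels y must be +1/-1."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(Xb.shape[0]):
            t += 1
            eta = 1.0 / (lam * t)
            violates = y[i] * (Xb[i] @ w) < 1  # margin check before shrinking
            w *= 1.0 - eta * lam
            if violates:
                w += eta * y[i] * Xb[i]
    return w

def decision_scores(w, X):
    """Linear SVM scores; larger means more likely to be a target."""
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ w

def rank_candidate_targets(expr, known_targets, known_non_targets, **kw):
    """Local model for ONE transcription factor: genes are samples, their
    expression profiles (genes x conditions) are features, and prior
    knowledge supplies the labels. Returns the unlabeled genes ordered from
    most to least likely target."""
    labeled = list(known_targets) + list(known_non_targets)
    labeled_set = set(labeled)
    y = np.array([1.0] * len(list(known_targets)) + [-1.0] * len(list(known_non_targets)))
    w = train_linear_svm(expr[labeled], y, **kw)
    unlabeled = [g for g in range(expr.shape[0]) if g not in labeled_set]
    scores = decision_scores(w, expr[unlabeled])
    return [unlabeled[k] for k in np.argsort(-scores)]
```

In this setup, the "structure" of the prior knowledge corresponds to which genes end up in `known_targets` and `known_non_targets`. If those labels cover only part of the expression space, the learned boundary can misrank genes elsewhere. This is one intuition for why prior-knowledge structure could matter for learning.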