1
Cao X, Zhang L, Islam MK, Zhao M, He C, Zhang K, Liu S, Sha Q, Wei H. TGPred: efficient methods for predicting target genes of a transcription factor by integrating statistics, machine learning and optimization. NAR Genom Bioinform 2023; 5:lqad083. [PMID: 37711605 PMCID: PMC10498345 DOI: 10.1093/nargab/lqad083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2022] [Revised: 05/30/2023] [Accepted: 08/30/2023] [Indexed: 09/16/2023]
Abstract
Four statistical selection methods for inferring transcription factor (TF)-target gene (TG) pairs were developed by coupling a mean squared error (MSE) or Huber loss function with an elastic net (ENET) or least absolute shrinkage and selection operator (Lasso) penalty. Two methods were also developed for inferring pathway gene regulatory networks (GRNs) by combining a Huber or MSE loss function with a network (Net)-based penalty. To solve these regressions, we improved an accelerated proximal gradient descent (APGD) algorithm to optimize the parameter selection process, resulting in an algorithm that is equally effective but much faster than the commonly used convex optimization solver. Synthetic data generated in a general setting were used to test the four TF-TG identification methods; ENET-based methods performed better than Lasso-based methods. Synthetic data generated from two network settings were used to test Huber-Net and MSE-Net, which outperformed all other methods. The TF-TG identification methods were also tested with SND1 and gl3 overexpression transcriptomic data; Huber-ENET and MSE-ENET outperformed all other methods when genome-wide predictions were performed. The TF-TG identification methods fill the gap left by the lack of a method for genome-wide TG prediction for a TF and have potential for validating ChIP/DAP-seq results, while the two Net-based methods are instrumental for predicting pathway GRNs.
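To make the combination of loss, penalty and solver in this abstract concrete, the following is a minimal sketch of a Huber loss with an elastic-net penalty minimized by a generic accelerated proximal gradient (FISTA-style) loop. It is not the TGPred implementation: the function names, the step-size bound and the default values of `lam`, `alpha` and `delta` are assumptions made for this example only.

```python
import math
import random


def soft_threshold(v, t):
    """Proximal operator of t * |v| (the Lasso part of the penalty)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def huber_grad(r, delta):
    """Derivative of the Huber loss with respect to the residual r."""
    return r if abs(r) <= delta else delta * (1.0 if r > 0 else -1.0)


def enet_prox(v, step, lam, alpha):
    """Prox of step * lam * (alpha*|b| + (1 - alpha)/2 * b^2)."""
    return soft_threshold(v, step * lam * alpha) / (1.0 + step * lam * (1.0 - alpha))


def apgd_huber_enet(X, y, lam=0.05, alpha=0.5, delta=1.0, n_iter=500):
    """Accelerated proximal gradient descent for Huber loss + elastic net.

    X is a list of rows (samples x predictors), y a list of responses.
    All defaults are illustrative assumptions, not values from the paper.
    """
    n, p = len(X), len(X[0])
    # Conservative Lipschitz bound for the smooth part: ||X||_F^2 / n.
    lipschitz = sum(x * x for row in X for x in row) / n
    step = 1.0 / lipschitz
    beta = [0.0] * p
    z = beta[:]          # extrapolated point used for the gradient step
    t = 1.0              # momentum parameter
    for _ in range(n_iter):
        resid = [sum(X[i][j] * z[j] for j in range(p)) - y[i] for i in range(n)]
        psi = [huber_grad(r, delta) for r in resid]
        grad = [sum(X[i][j] * psi[i] for i in range(n)) / n for j in range(p)]
        new_beta = [enet_prox(z[j] - step * grad[j], step, lam, alpha) for j in range(p)]
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = [new_beta[j] + ((t - 1.0) / t_new) * (new_beta[j] - beta[j]) for j in range(p)]
        beta, t = new_beta, t_new
    return beta
```

Setting `alpha=1.0` reduces the penalty to Lasso, and replacing `huber_grad` with the raw residual gives the MSE variants, so the four loss/penalty pairings named in the abstract map onto the same loop.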
Affiliation(s)
- Xuewei Cao
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, USA
- Ling Zhang
  - Computational Science and Engineering Program, Michigan Technological University, Houghton, MI 49931, USA
  - College of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI 49931, USA
- Md Khairul Islam
  - Computational Science and Engineering Program, Michigan Technological University, Houghton, MI 49931, USA
  - College of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI 49931, USA
- Mingxia Zhao
  - Department of Plant Pathology, Kansas State University, Manhattan, KS 66506, USA
- Cheng He
  - Department of Plant Pathology, Kansas State University, Manhattan, KS 66506, USA
- Kui Zhang
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, USA
- Sanzhen Liu
  - Department of Plant Pathology, Kansas State University, Manhattan, KS 66506, USA
- Qiuying Sha
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, USA
- Hairong Wei
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, USA
  - Computational Science and Engineering Program, Michigan Technological University, Houghton, MI 49931, USA
  - College of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI 49931, USA
2
Shachaf LI, Roberts E, Cahan P, Xiao J. Gene regulation network inference using k-nearest neighbor-based mutual information estimation: revisiting an old DREAM. BMC Bioinformatics 2023; 24:84. [PMID: 36879188 PMCID: PMC9990267 DOI: 10.1186/s12859-022-05047-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/14/2021] [Accepted: 11/08/2022] [Indexed: 03/08/2023]
Abstract
BACKGROUND: A cell exhibits a variety of responses to internal and external cues. These responses are possible, in part, due to the presence of an elaborate gene regulatory network (GRN) in every single cell. In the past 20 years, many groups have worked on reconstructing the topological structure of GRNs from large-scale gene expression data using a variety of inference algorithms. Insights gained about participating players in GRNs may ultimately lead to therapeutic benefits. Mutual information (MI) is a widely used metric within this inference/reconstruction pipeline as it can detect any correlation (linear and non-linear) between any number of variables (n-dimensions). However, the use of MI with continuous data (for example, normalized fluorescence intensity measurement of gene expression levels) is sensitive to data size, correlation strength and underlying distributions, and often requires laborious and, at times, ad hoc optimization. RESULTS: In this work, we first show that estimating the MI of a bi- and tri-variate Gaussian distribution using k-nearest neighbor (kNN) MI estimation results in significant error reduction as compared to commonly used methods based on fixed binning. Second, we demonstrate that implementing the MI-based kNN Kraskov-Stögbauer-Grassberger (KSG) algorithm leads to a significant improvement in GRN reconstruction for popular inference algorithms, such as Context Likelihood of Relatedness (CLR). Finally, through extensive in-silico benchmarking we show that a new inference algorithm CMIA (Conditional Mutual Information Augmentation), inspired by CLR, in combination with the KSG-MI estimator, outperforms commonly used methods. CONCLUSIONS: Using three canonical datasets containing 15 synthetic networks, the newly developed method for GRN reconstruction, which combines CMIA and the KSG-MI estimator, achieves an improvement of 20-35% in precision-recall measures over the current gold standard in the field. This new method will enable researchers to discover new gene interactions or better choose gene candidates for experimental validations.
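The kNN-based MI estimation referred to in this abstract can be sketched as follows. This is a plain, quadratic-time rendering of the first KSG estimator for two scalar variables, written only to show the idea; it is not the authors' code, and the function names and the default `k=3` are assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma function at a positive integer: psi(n) = -gamma + sum_{k<n} 1/k."""
    return -EULER_GAMMA + sum(1.0 / k for k in range(1, n))


def ksg_mi(xs, ys, k=3):
    """KSG (algorithm 1) estimate of I(X;Y) in nats for paired scalar samples.

    For each point, eps is the max-norm distance to its k-th nearest neighbor
    in the joint space; n_x and n_y count the points that fall strictly
    within eps along each marginal.
    """
    n = len(xs)
    total = 0.0
    for i in range(n):
        joint = sorted(
            max(abs(xs[i] - xs[j]), abs(ys[i] - ys[j]))
            for j in range(n) if j != i
        )
        eps = joint[k - 1]
        n_x = sum(1 for j in range(n) if j != i and abs(xs[i] - xs[j]) < eps)
        n_y = sum(1 for j in range(n) if j != i and abs(ys[i] - ys[j]) < eps)
        total += digamma_int(n_x + 1) + digamma_int(n_y + 1)
    return digamma_int(k) + digamma_int(n) - total / n
```

For a bivariate Gaussian with correlation rho, the exact value is -0.5 * ln(1 - rho^2), which gives a convenient reference when checking the estimator on simulated data.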
Affiliation(s)
- Lior I Shachaf
  - Department of Biophysics, Johns Hopkins University, 3400 N. Charles Street, Baltimore, MD, 21218, USA
- Elijah Roberts
  - Department of Biophysics, Johns Hopkins University, 3400 N. Charles Street, Baltimore, MD, 21218, USA
  - 10x Genomics, 6230 Stoneridge Mall Road, Pleasanton, CA, 94588-3260, USA
- Patrick Cahan
  - Department of Biomedical Engineering, Department of Molecular Biology and Genetics, Institute for Cell Engineering, Johns Hopkins School of Medicine, 733 N. Broadway, Baltimore, MD, 21205, USA
- Jie Xiao
  - Department of Biophysics and Biophysical Chemistry, Johns Hopkins School of Medicine, 725 N. Wolfe Street, WBSB 708, Baltimore, MD, 21205, USA
3
Borthakur D, Busov V, Cao XH, Du Q, Gailing O, Isik F, Ko JH, Li C, Li Q, Niu S, Qu G, Vu THG, Wang XR, Wei Z, Zhang L, Wei H. Current status and trends in forest genomics. Forestry Research 2022; 2:11. [PMID: 39525413 PMCID: PMC11524260 DOI: 10.48130/fr-2022-0011] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/16/2022] [Accepted: 08/19/2022] [Indexed: 11/16/2024]
Abstract
Forests are not only the most predominant of the Earth's terrestrial ecosystems, but are also the core supply for essential products for human use. However, global climate change and the ongoing population explosion severely threaten the health of the forest ecosystem and aggravate deforestation and forest degradation. Forest genomics has great potential for increasing forest productivity and adaptation to the changing climate. In the last two decades, the field of forest genomics has advanced quickly owing to the advent of multiple high-throughput sequencing technologies, single cell RNA-seq, clustered regularly interspaced short palindromic repeats (CRISPR)-mediated genome editing, and spatial transcriptomes, as well as bioinformatics analysis technologies, which have led to the generation of multidimensional, multilayered, and spatiotemporal gene expression data. These technologies, together with basic technologies routinely used in plant biotechnology, enable us to tackle many important or unique issues in forest biology, and provide a panoramic view and an integrative elucidation of molecular regulatory mechanisms underlying phenotypic changes and variations. In this review, we recapitulate the advancement and current status of 12 research branches of forest genomics, and then provide future research directions and focuses for each area. Evidently, a shift from simple biotechnology-based research to advanced and integrative genomics research, and a setup for investigation and interpretation of many spatiotemporal development and differentiation issues in forest genomics, have just begun to emerge.
Affiliation(s)
- Dulal Borthakur
  - Department of Molecular Biosciences and Bioengineering, University of Hawaii at Manoa, 1955 East-West Road, Honolulu, HI 96822, USA
- Victor Busov
  - College of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI 49931, USA
- Xuan Hieu Cao
  - Forest Genetics and Forest Tree Breeding, Faculty for Forest Sciences and Forest Ecology, University of Göttingen, Büsgenweg 2, 37077 Göttingen, Germany
- Qingzhang Du
  - National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, P.R. China
- Oliver Gailing
  - Forest Genetics and Forest Tree Breeding, Faculty for Forest Sciences and Forest Ecology, University of Göttingen, Büsgenweg 2, 37077 Göttingen, Germany
- Fikret Isik
  - Cooperative Tree Improvement Program, North Carolina State University, Raleigh, NC 27695, USA
- Jae-Heung Ko
  - Department of Plant & Environmental New Resources, Kyung Hee University, 1732 Deogyeong-daero, Yongin 17104, Republic of Korea
- Chenghao Li
  - State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin 150040, P.R. China
- Quanzi Li
  - State Key Laboratory of Tree Genetics and Breeding, Chinese Academy of Forestry, Beijing 100093, P.R. China
- Shihui Niu
  - National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, P.R. China
- Guanzheng Qu
  - State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin 150040, P.R. China
- Thi Ha Giang Vu
  - Forest Genetics and Forest Tree Breeding, Faculty for Forest Sciences and Forest Ecology, University of Göttingen, Büsgenweg 2, 37077 Göttingen, Germany
- Xiao-Ru Wang
  - Department of Ecology and Environmental Science, Umeå Plant Science Centre, Umeå University, Umeå 90187, Sweden
- Zhigang Wei
  - College of Life Sciences, Heilongjiang University, Harbin 150080, P.R. China
- Lin Zhang
  - Key Laboratory of Cultivation and Protection for Non-Wood Forest Trees, Ministry of Education, Central South University of Forestry and Technology, Changsha 410004, Hunan Province, P.R. China
- Hairong Wei
  - College of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI 49931, USA
4
Jeong D, Lim S, Lee S, Oh M, Cho C, Seong H, Jung W, Kim S. Construction of Condition-Specific Gene Regulatory Network Using Kernel Canonical Correlation Analysis. Front Genet 2021; 12:652623. [PMID: 34093651 PMCID: PMC8172963 DOI: 10.3389/fgene.2021.652623] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/12/2021] [Accepted: 03/26/2021] [Indexed: 01/01/2023]
Abstract
A gene expression profile, or transcriptome, can represent cellular states; thus, understanding gene regulation mechanisms can help explain how cells respond to external stress. Interaction between a transcription factor (TF) and a target gene (TG) is one of the representative regulatory mechanisms in cells. Regulatory interaction between TFs and TGs is very complex, specifically involving multiple-to-multiple relations. Experimental data from TF Chromatin Immunoprecipitation sequencing is useful but produces one-to-multiple relations between a TF and TGs. On the other hand, co-expression networks of genes can be useful for constructing condition-specific transcriptional networks, but there are many false positive relations in co-expression networks. In this paper, we propose a novel computational method to construct a condition-specific and combinatorial transcriptional network from transcriptome data, applying kernel canonical correlation analysis (kernel CCA) to identify multiple-to-multiple TF-TG relations in a given biological condition. Kernel CCA is a well-established statistical method for computing the correlation of one group of features vs. another group of features. We, therefore, employed kernel CCA to embed TFs and TGs into a new space where the correlation of TFs and TGs is reflected. To demonstrate the usefulness of our network construction method, we used human blood transcriptome data to investigate the response to a high-fat diet and an Arabidopsis data set to investigate the response to cold/heat stress. Our method detected not only important regulatory interactions reported in previous studies but also novel TF-TG relations where a module of TFs is regulating a module of TGs upon specific stress.
Affiliation(s)
- Dabin Jeong
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
- Sangsoo Lim
  - Bioinformatics Institute, Seoul National University, Seoul, South Korea
- Sangseon Lee
  - BK21 FOUR Intelligence Computing, Seoul National University, Seoul, South Korea
- Minsik Oh
  - Department of Computer Science and Engineering, Seoul National University, Seoul, South Korea
- Changyun Cho
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
- Hyeju Seong
  - Department of Crop Science, Konkuk University, Seoul, South Korea
- Woosuk Jung
  - Department of Crop Science, Konkuk University, Seoul, South Korea
- Sun Kim
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
  - Bioinformatics Institute, Seoul National University, Seoul, South Korea
  - Department of Computer Science and Engineering, Institute of Engineering Research, Seoul National University, Seoul, South Korea
5
Mahmoodi SH, Aghdam R, Eslahchi C. An order independent algorithm for inferring gene regulatory network using quantile value for conditional independence tests. Sci Rep 2021; 11:7605. [PMID: 33828122 PMCID: PMC8027014 DOI: 10.1038/s41598-021-87074-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/24/2020] [Accepted: 03/24/2021] [Indexed: 10/31/2022]
Abstract
In recent years, due to the difficulty and inefficiency of experimental methods, numerous computational methods have been introduced for inferring the structure of Gene Regulatory Networks (GRNs). The Path Consistency (PC) algorithm is one of the popular methods to infer the structure of GRNs. However, this group of methods still has limitations and there is potential for improvement in this field. First, the PC-based algorithms are still sensitive to the ordering of nodes, i.e. different node orders result in different network structures. Second, the networks inferred by these methods are highly dependent on the threshold used for independence testing. Third, it is still a challenge to select the set of conditional genes in an optimal way, which affects the performance and computational complexity of the PC-based algorithm. We introduce a novel algorithm, namely the Order Independent PC-based algorithm using Quantile value (OIPCQ), which improves the accuracy of the learning process of GRNs and solves the order dependency issue. Quantile-based thresholds are considered for different orders of conditional mutual information (CMI) tests. For conditional gene selection, we consider the paths between genes with length equal to or greater than 2, while other well-known PC-based methods only consider the paths of length 2. We applied OIPCQ to the various networks of the DREAM3 and DREAM4 in silico challenges. As a real-world case study, we used OIPCQ to reconstruct the SOS DNA network obtained from Escherichia coli and the GRN for acute myeloid leukemia based on the RNA sequencing data from The Cancer Genome Atlas. The results show that OIPCQ produces the same network structure for all the permutations of the genes and improves the resulting GRN through accurately quantifying the causal regulation strength in comparison with other well-known PC-based methods. According to the GRN constructed by OIPCQ for acute myeloid leukemia, two previously reported regulators, BCLAF1 and NRSF, are significantly important. However, the highest degree nodes in this GRN are ZBTB7A and PU1, which play a significant role in cancer, especially in leukemia. OIPCQ is freely accessible at https://github.com/haammim/OIPCQ-and-OIPCQ2.
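The conditional independence test at the heart of PC-style methods can be illustrated with a first-order conditional mutual information computed under a Gaussian assumption, where CMI follows from the partial correlation. This sketch is not OIPCQ: the Gaussian shortcut, the fixed `threshold` argument (OIPCQ derives thresholds from quantiles) and all function names are assumptions for illustration.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def gaussian_mi(x, y):
    """Zero-order test statistic: I(X;Y) = -0.5 * ln(1 - r^2) for Gaussians."""
    r = pearson(x, y)
    return -0.5 * math.log(1.0 - r * r)


def gaussian_cmi(x, y, z):
    """First-order statistic I(X;Y|Z) from the partial correlation r_xy.z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    partial = (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return -0.5 * math.log(1.0 - partial * partial)


def keep_edge(x, y, z, threshold):
    """Keep the X-Y edge only if X and Y stay dependent after conditioning on Z."""
    return gaussian_cmi(x, y, z) > threshold
```

In a regulatory chain X to Z to Y, `gaussian_mi(x, y)` is clearly positive while `gaussian_cmi(x, y, z)` is close to zero, so the indirect X-Y edge is removed and the direct edges are retained.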
Affiliation(s)
- Sayyed Hadi Mahmoodi
  - Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
- Rosa Aghdam
  - Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
  - School of Biological Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Changiz Eslahchi
  - Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
  - School of Biological Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
6
EnGRNT: Inference of gene regulatory networks using ensemble methods and topological feature extraction. Informatics in Medicine Unlocked 2021. [DOI: 10.1016/j.imu.2021.100773] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
7
Hayama Nishida CE, Costa Bianchi RA, Reali Costa AH. A framework to shift basins of attraction of gene regulatory networks through batch reinforcement learning. Artif Intell Med 2020; 107:101853. [PMID: 32828434 DOI: 10.1016/j.artmed.2020.101853] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2019] [Revised: 03/23/2020] [Accepted: 03/31/2020] [Indexed: 11/25/2022]
Abstract
A major challenge in gene regulatory networks (GRNs) of biological systems is to discover when and what interventions should be applied to shift them to healthy phenotypes. A set of gene activity profiles, called a basin of attraction (BOA), takes this network to a specific phenotype; therefore, a healthy BOA leads the GRN to a healthy phenotype. However, without complete observability of the genes, it is not possible to identify whether the current BOA is healthy. In this article we investigate external interventions in a GRN with partial observability, aiming to bring it to healthy BOAs. We propose a new batch reinforcement learning (BRL) method, called mSFQI, to define intervention strategies based on the probabilities of the gene activity profiles being in healthy BOAs, which are calculated from a set of previously observed experiences. BRL uses approximation functions and repeated applications of previous experiences to accelerate learning. Results demonstrate that our proposal can quickly shift a partially observable GRN to healthy BOAs, while reducing the number of interventions. In addition, when observability is poor, mSFQI produces better results when the probabilities for a greater number of previous observations are available.
8
Wei H. Construction of a hierarchical gene regulatory network centered around a transcription factor. Brief Bioinform 2020; 20:1021-1031. [PMID: 29186304 DOI: 10.1093/bib/bbx152] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 06/27/2017] [Revised: 10/11/2017] [Indexed: 12/24/2022]
Abstract
We have modified a multitude of transcription factors (TFs) in numerous plant species and some animal species, and obtained transgenic lines that exhibit phenotypic alterations. Whenever we observe phenotypic changes in a TF's transgenic lines, we are always eager to identify its target genes, collaborative regulators and even upstream high hierarchical regulators. This issue can be addressed by establishing a multilayered hierarchical gene regulatory network (ML-hGRN) centered around a given TF. In this article, a practical approach for constructing an ML-hGRN centered on a TF using a combined approach of top-down and bottom-up network construction methods is described. Strategies for constructing ML-hGRNs are vitally important, as these networks provide key information to advance our understanding of how biological processes are regulated.
Affiliation(s)
- Hairong Wei
  - State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin, Heilongjiang, China
  - School of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI, USA
9
Bucur IG, Claassen T, Heskes T. Large-scale local causal inference of gene regulatory relationships. Int J Approx Reason 2019. [DOI: 10.1016/j.ijar.2019.08.012] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/26/2022]
10
Ramos PIP, Arge LWP, Lima NCB, Fukutani KF, de Queiroz ATL. Leveraging User-Friendly Network Approaches to Extract Knowledge From High-Throughput Omics Datasets. Front Genet 2019; 10:1120. [PMID: 31798629 PMCID: PMC6863976 DOI: 10.3389/fgene.2019.01120] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 03/29/2019] [Accepted: 10/16/2019] [Indexed: 11/13/2022]
Abstract
Recent technological advances for the acquisition of multi-omics data have allowed an unprecedented understanding of the complex intricacies of biological systems. In parallel, a myriad of computational analysis techniques and bioinformatics tools have been developed, with many efforts directed towards the creation and interpretation of networks from this data. In this review, we begin by examining key network concepts and terminology. Then, computational tools that allow for their construction and analysis from high-throughput omics datasets are presented. We focus on the study of functional relationships such as co-expression, protein-protein interactions, and regulatory interactions that are particularly amenable to modeling using the framework of networks. We envisage that many potential users of these analytical strategies may not be completely literate in programming languages and code adaptation, and for this reason, emphasis is given to tools' user-friendliness, including plugins for the widely adopted Cytoscape software, an open-source, cross-platform tool for network analysis, visualization, and data integration.
Affiliation(s)
- Pablo Ivan Pereira Ramos
  - Center for Data and Knowledge Integration for Health (CIDACS), Instituto Gonçalo Moniz, Fundação Oswaldo Cruz, Salvador, Brazil
- Luis Willian Pacheco Arge
  - Laboratório de Genética Molecular e Biotecnologia Vegetal, Centro de Ciências da Saúde, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Kiyoshi F. Fukutani
  - Multinational Organization Network Sponsoring Translational and Epidemiological Research (MONSTER) Initiative, Fundação José Silveira, Salvador, Brazil
- Artur Trancoso L. de Queiroz
  - Center for Data and Knowledge Integration for Health (CIDACS), Instituto Gonçalo Moniz, Fundação Oswaldo Cruz, Salvador, Brazil
11
Muthiah A, Angulo MS, Walker NN, Keller SR, Lee JK. Biologically anchored knowledge expansion approach uncovers KLF4 as a novel insulin signaling regulator. PLoS One 2018; 13:e0204100. [PMID: 30240435 PMCID: PMC6150497 DOI: 10.1371/journal.pone.0204100] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 09/21/2017] [Accepted: 09/04/2018] [Indexed: 12/12/2022]
Abstract
One of the biggest challenges in analyzing high throughput omics data in biological studies is extracting information that is relevant to specific biological mechanisms of interest while simultaneously restricting the number of false positive findings. Due to random chances with numerous candidate targets and mechanisms, computational approaches often yield a large number of false positives that cannot easily be discerned from relevant biological findings without costly, and often infeasible, biological experiments. We here introduce and apply an integrative bioinformatics approach, Biologically Anchored Knowledge Expansion (BAKE), which uses sequential statistical analysis and literature mining to identify highly relevant network genes and effectively removes false positive findings. Applying BAKE to genomic expression data collected from mouse (Mus musculus) adipocytes during insulin resistance progression, we uncovered the transcription factor Krueppel-like Factor 4 (KLF4) as a regulator of early insulin signaling. We experimentally confirmed that KLF4 controls the expression of two key insulin signaling molecules, the Insulin Receptor Substrate 2 (IRS2) and Tuberous Sclerosis Complex 2 (TSC2).
Affiliation(s)
- Annamalai Muthiah
  - Department of Systems and Information Engineering, University of Virginia, Charlottesville, Virginia, United States of America
- Morgan S. Angulo
  - Department of Surgery, University of Virginia Medical Center, University of Virginia, Charlottesville, Virginia, United States of America
- Natalie N. Walker
  - Department of Medicine, Division of Endocrinology and Metabolism, University of Virginia, Charlottesville, Virginia, United States of America
- Susanna R. Keller
  - Department of Medicine, Division of Endocrinology and Metabolism, University of Virginia, Charlottesville, Virginia, United States of America
- Jae K. Lee
  - Department of Systems and Information Engineering, University of Virginia, Charlottesville, Virginia, United States of America
  - Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, Florida, United States of America
  - Department of Public Health Sciences, University of Virginia School of Medicine, University of Virginia, Charlottesville, Virginia, United States of America
12
Tavakkolkhah P, Zimmer R, Küffner R. Detection of network motifs using three-way ANOVA. PLoS One 2018; 13:e0201382. [PMID: 30080876 PMCID: PMC6078297 DOI: 10.1371/journal.pone.0201382] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/14/2017] [Accepted: 07/13/2018] [Indexed: 01/03/2023]
Abstract
Motivation: Gene regulatory networks (GRNs) can be determined via various experimental techniques, and also by computational methods, which infer networks from gene expression data. However, these techniques treat interactions separately such that interdependencies of interactions forming meaningful subnetworks are typically not considered. Methods: For the investigation of network properties and for the classification of different (sub-)networks based on gene expression data, we consider biological network motifs consisting of three genes and up to three interactions, e.g. the cascade chain (CSC), feed-forward loop (FFL), and dense-overlapping regulon (DOR). We examine several conventional methods for the inference of network motifs, which typically consider each interaction individually. In addition, we propose a new method based on three-way ANOVA (ANalysis Of VAriance) (3WA) that analyzes entire subnetworks at once. To demonstrate the advantages of such a more holistic perspective, we compare the ability of 3WA and other methods to detect and categorize network motifs on large real and artificial datasets. Results: We find that conventional methods perform much better on artificial data (AUC up to 80%) than on real E. coli expression datasets (AUC 50%, corresponding to random guessing). To explain this observation, we examine several important properties that differ between datasets and analyze predicted motifs in detail. We find that in the case of real networks our new 3WA method outperforms previous methods (AUC 70% in E. coli) by exploiting the interdependencies in the full motif structure. Because of important differences between current artificial datasets and real measurements, the construction and testing of motif detection methods should focus on real data.
Affiliation(s)
- Pegah Tavakkolkhah
  - Department of Informatics, Ludwig-Maximilians-Universität München, München, Germany
- Ralf Zimmer
  - Department of Informatics, Ludwig-Maximilians-Universität München, München, Germany
- Robert Küffner
  - Department of Informatics, Ludwig-Maximilians-Universität München, München, Germany
  - Icahn School of Medicine at Mount Sinai, New York, NY, United States of America
13
Monneret G, Jaffrézic F, Rau A, Zerjal T, Nuel G. Identification of marginal causal relationships in gene networks from observational and interventional expression data. PLoS One 2017; 12:e0171142. [PMID: 28301504 PMCID: PMC5354375 DOI: 10.1371/journal.pone.0171142] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/17/2016] [Accepted: 01/01/2017] [Indexed: 11/29/2022]
Abstract
Causal network inference is an important methodological challenge in biology as well as other areas of application. Although several causal network inference methods have been proposed in recent years, they are typically applicable for only a small number of genes, due to the large number of parameters to be estimated and the limited number of biological replicates available. In this work, we consider the specific case of transcriptomic studies made up of both observational and interventional data in which a single gene of biological interest is knocked out. We focus on a marginal causal estimation approach, based on the framework of Gaussian directed acyclic graphs, to infer causal relationships between the knocked-out gene and a large set of other genes. In a simulation study, we found that our proposed method accurately differentiates between downstream causal relationships and those that are upstream or simply associative. It also enables an estimation of the total causal effects between the gene of interest and the remaining genes. Our method performed very similarly to a classical differential analysis for experiments with a relatively large number of biological replicates, but has the advantage of providing a formal causal interpretation. Our proposed marginal causal approach is computationally efficient and may be applied to several thousands of genes simultaneously. In addition, it may help highlight subsets of genes of interest for a more thorough subsequent causal network inference. The method is implemented in an R package called MarginalCausality (available on GitHub).
Affiliation(s)
- Gilles Monneret
- UMR GABI, AgroParisTech, INRA, Université Paris-Saclay, 78350 Jouy-en-Josas, France
- LPMA, UMR CNRS 7599, UPMC, Sorbonne Universités, 4 place Jussieu, 75005 Paris, France
| | - Florence Jaffrézic
- UMR GABI, AgroParisTech, INRA, Université Paris-Saclay, 78350 Jouy-en-Josas, France
| | - Andrea Rau
- UMR GABI, AgroParisTech, INRA, Université Paris-Saclay, 78350 Jouy-en-Josas, France
| | - Tatiana Zerjal
- UMR GABI, AgroParisTech, INRA, Université Paris-Saclay, 78350 Jouy-en-Josas, France
| | - Grégory Nuel
- LPMA, UMR CNRS 7599, UPMC, Sorbonne Universités, 4 place Jussieu, 75005 Paris, France
| |
|
14
|
Deng W, Zhang K, Busov V, Wei H. Recursive random forest algorithm for constructing multilayered hierarchical gene regulatory networks that govern biological pathways. PLoS One 2017; 12:e0171532. [PMID: 28158291 PMCID: PMC5291523 DOI: 10.1371/journal.pone.0171532] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 10/09/2016] [Accepted: 01/23/2017] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Present knowledge indicates that a multilayered hierarchical gene regulatory network (ML-hGRN) often operates above a biological pathway. Although the ML-hGRN is very important for understanding how a pathway is regulated, there is almost no computational algorithm for directly constructing ML-hGRNs. RESULTS A backward elimination random forest (BWERF) algorithm was developed for constructing the ML-hGRN operating above a biological pathway. For each pathway gene, BWERF used a random forest model to calculate the importance values of all transcription factors (TFs) to this pathway gene recursively, with a portion (e.g. 1/10) of the least important TFs excluded in each round of modeling. In each round, the importance values of the remaining TFs to the pathway gene were updated and ranked, until only one TF remained in the list. This procedure is termed BWERF. After that, the importance values of a TF to all pathway genes were aggregated and fitted to a Gaussian mixture model to determine which TFs to retain for the regulatory layer immediately above the pathway layer. The acquired TFs at the secondary layer were then set to be the new bottom layer to infer the next upper layer, and this process was repeated until an ML-hGRN with the expected number of layers was obtained. CONCLUSIONS BWERF improved the accuracy of constructing ML-hGRNs because it used backward elimination to exclude the noise genes and aggregated the individual importance values to determine TF retention. We validated BWERF by using it to construct ML-hGRNs operating above the mouse pluripotency maintenance pathway and the Arabidopsis lignocellulosic pathway. Compared to GENIE3, BWERF showed an improvement in recognizing authentic TFs regulating a pathway. Compared to the bottom-up Gaussian graphical model algorithm we developed for constructing ML-hGRNs, BWERF can construct ML-hGRNs with significantly fewer edges, which enables biologists to choose the implicit edges for experimental validation.
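The backward-elimination loop described in this abstract can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the importance score here is the absolute Pearson correlation, used as a stand-in for the random forest importance that BWERF recomputes each round, and the names `backward_elimination`, `tf_expr` and `drop_fraction` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if a variance is zero)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def backward_elimination(tf_expr, target, drop_fraction=0.1):
    """Rank TFs for one pathway gene by repeated scoring and pruning.

    tf_expr: dict mapping TF name -> expression values across samples.
    target:  expression values of the pathway gene across the same samples.
    Returns TF names ordered from most to least important.
    """
    remaining = dict(tf_expr)
    eliminated = []  # filled from least important upward
    while len(remaining) > 1:
        # Stand-in importance (assumption): |Pearson r|.
        # BWERF would refit a random forest on the remaining TFs here.
        scores = {tf: abs(pearson(v, target)) for tf, v in remaining.items()}
        ordered = sorted(remaining, key=lambda tf: scores[tf])
        # Drop a fixed fraction per round, at least one, never the last TF.
        n_drop = max(1, int(len(remaining) * drop_fraction))
        n_drop = min(n_drop, len(remaining) - 1)
        for tf in ordered[:n_drop]:
            eliminated.append(tf)
            del remaining[tf]
    eliminated.extend(remaining)  # the single surviving TF
    return eliminated[::-1]


# Toy data (hypothetical): one pathway gene and three candidate TFs.
target = [1, 2, 3, 4, 5, 6, 7, 8]
tfs = {
    "TF_strong": [1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 7.2, 7.9],
    "TF_weak": [2, 1, 4, 3, 6, 5, 8, 7],
    "TF_noise": [3, 7, 1, 8, 2, 6, 4, 5],
}
ranking = backward_elimination(tfs, target)
print(ranking)
```

With a univariate score such as correlation, the ranking does not change between rounds; the recursion matters for a random forest because its importance values shift as competing TFs are removed.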
Affiliation(s)
- Wenping Deng
- School of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI, United States of America
| | - Kui Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, United States of America
| | - Victor Busov
- School of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI, United States of America
| | - Hairong Wei
- School of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI, United States of America
- Life Science and Technology Institute, Michigan Technological University, Houghton, MI, United States of America
| |
|
15
|
Henriques D, Villaverde AF, Rocha M, Saez-Rodriguez J, Banga JR. Data-driven reverse engineering of signaling pathways using ensembles of dynamic models. PLoS Comput Biol 2017; 13:e1005379. [PMID: 28166222 PMCID: PMC5319798 DOI: 10.1371/journal.pcbi.1005379] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 05/11/2016] [Revised: 02/21/2017] [Accepted: 01/24/2017] [Indexed: 11/19/2022] Open
Abstract
Despite significant efforts and remarkable progress, the inference of signaling networks from experimental data remains very challenging. The problem is particularly difficult when the objective is to obtain a dynamic model capable of predicting the effect of novel perturbations not considered during model training. The problem is ill-posed due to the nonlinear nature of these systems, the fact that only a fraction of the involved proteins and their post-translational modifications can be measured, and limitations on the technologies used for growing cells in vitro, perturbing them, and measuring their variations. As a consequence, there is a pervasive lack of identifiability. To overcome these issues, we present a methodology called SELDOM (enSEmbLe of Dynamic lOgic-based Models), which builds an ensemble of logic-based dynamic models, trains them to experimental data, and combines their individual simulations into an ensemble prediction. It also includes a model reduction step to prune spurious interactions and mitigate overfitting. SELDOM is a data-driven method, in the sense that it does not require any prior knowledge of the system: the interaction networks that act as scaffolds for the dynamic models are inferred from data using mutual information. We have tested SELDOM on a number of experimental and in silico signal transduction case-studies, including the recent HPN-DREAM breast cancer challenge. We found that its performance is highly competitive compared to state-of-the-art methods for the purpose of recovering network topology. More importantly, the utility of SELDOM goes beyond basic network inference (i.e. uncovering static interaction networks): it builds dynamic (based on ordinary differential equation) models, which can be used for mechanistic interpretations and reliable dynamic predictions in new experimental conditions (i.e. not used in the training). 
For this task, SELDOM's ensemble prediction is not only consistently better than predictions from individual models, but also often outperforms the state of the art represented by the methods used in the HPN-DREAM challenge.
Affiliation(s)
- David Henriques
- Bioprocess Engineering Group, Spanish National Research Council, IIM-CSIC, Vigo, Spain
| | - Alejandro F. Villaverde
- Bioprocess Engineering Group, Spanish National Research Council, IIM-CSIC, Vigo, Spain
- Centre of Biological Engineering, University of Minho, Braga, Portugal
| | - Miguel Rocha
- Centre of Biological Engineering, University of Minho, Braga, Portugal
| | - Julio Saez-Rodriguez
- Joint Research Center for Computational Biomedicine, RWTH-Aachen University, Aachen, Germany
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, United Kingdom
| | - Julio R. Banga
- Bioprocess Engineering Group, Spanish National Research Council, IIM-CSIC, Vigo, Spain
| |
|
16
|
Liu W, Zhu W, Liao B, Chen H, Ren S, Cai L. Improving gene regulatory network structure using redundancy reduction in the MRNET algorithm. RSC Adv 2017. [DOI: 10.1039/c7ra01557g] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 11/21/2022] Open
Abstract
Inferring gene regulatory networks from expression data is a central problem in systems biology.
Affiliation(s)
- Wei Liu
- College of Information Science and Engineering, Hunan University, Changsha, China
| | - Wen Zhu
- College of Information Science and Engineering, Hunan University, Changsha, China
| | - Bo Liao
- College of Information Science and Engineering, Hunan University, Changsha, China
| | - Haowen Chen
- College of Information Science and Engineering, Hunan University, Changsha, China
| | - Siqi Ren
- College of Information Science and Engineering, Hunan University, Changsha, China
| | - Lijun Cai
- College of Information Science and Engineering, Hunan University, Changsha, China
| |
|
17
|
Gene Regulatory Network Inferences Using a Maximum-Relevance and Maximum-Significance Strategy. PLoS One 2016; 11:e0166115. [PMID: 27829000 PMCID: PMC5102470 DOI: 10.1371/journal.pone.0166115] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 01/07/2016] [Accepted: 10/24/2016] [Indexed: 12/18/2022] Open
Abstract
Recovering gene regulatory networks from expression data is a challenging problem in systems biology that provides valuable information on the regulatory mechanisms of cells. A number of algorithms based on computational models are currently used to recover network topology. However, most of these algorithms have limitations. For example, many models tend to be complicated because of the "large p, small n" problem. In this paper, we propose a novel regulatory network inference method called the maximum-relevance and maximum-significance network (MRMSn) method, which converts the problem of recovering networks into the problem of selecting the regulator genes for each gene. To solve the latter problem, we present an algorithm that is based on information theory and selects the regulator genes for a specific gene by maximizing the relevance and significance. A first-order incremental search algorithm is used to search for regulator genes. Finally, a strict constraint is adopted to adjust all of the regulatory relationships according to the obtained regulator genes and thus obtain the complete network structure. We applied our method to five different datasets and compared it with five state-of-the-art methods for network inference based on information theory. The results confirm the effectiveness of our method.
|
18
|
Banf M, Rhee SY. Computational inference of gene regulatory networks: Approaches, limitations and opportunities. Biochim Biophys Acta Gene Regul Mech 2016; 1860:41-52. [PMID: 27641093 DOI: 10.1016/j.bbagrm.2016.09.003] [Citation(s) in RCA: 64] [Impact Index Per Article: 7.1] [Received: 04/13/2016] [Revised: 09/08/2016] [Accepted: 09/08/2016] [Indexed: 10/21/2022]
Abstract
Gene regulatory networks lie at the core of cell function control. In E. coli and S. cerevisiae, the study of gene regulatory networks has led to the discovery of regulatory mechanisms responsible for the control of cell growth, differentiation and responses to environmental stimuli. In plants, computational rendering of gene regulatory networks is gaining momentum, thanks to the recent availability of high-quality genomes and transcriptomes and development of computational network inference approaches. Here, we review current techniques, challenges and trends in gene regulatory network inference and highlight challenges and opportunities for plant science. We provide plant-specific application examples to guide researchers in selecting methodologies that suit their particular research questions. Given the interdisciplinary nature of gene regulatory network inference, we tried to cater to both biologists and computer scientists to help them engage in a dialogue about concepts and caveats in network inference. Specifically, we discuss problems and opportunities in heterogeneous data integration for eukaryotic organisms and common caveats to be considered during network model evaluation. This article is part of a Special Issue entitled: Plant Gene Regulatory Mechanisms and Networks, edited by Dr. Erich Grotewold and Dr. Nathan Springer.
Affiliation(s)
- Michael Banf
- Department of Plant Biology, Carnegie Institution for Science, 260 Panama Street, Stanford 93405, United States.
| | - Seung Y Rhee
- Department of Plant Biology, Carnegie Institution for Science, 260 Panama Street, Stanford 93405, United States.
| |
|
19
|
Liu F, Zhang SW, Guo WF, Wei ZG, Chen L. Inference of Gene Regulatory Network Based on Local Bayesian Networks. PLoS Comput Biol 2016; 12:e1005024. [PMID: 27479082 PMCID: PMC4968793 DOI: 10.1371/journal.pcbi.1005024] [Citation(s) in RCA: 79] [Impact Index Per Article: 8.8] [Received: 10/27/2015] [Accepted: 06/20/2016] [Indexed: 11/18/2022] Open
Abstract
The inference of gene regulatory networks (GRNs) from expression data can mine the direct regulations among genes and provide deep insights into biological processes at a network level. During past decades, numerous computational approaches have been introduced for inferring GRNs. However, many of them still suffer from various problems: for example, Bayesian network (BN) methods cannot handle large-scale networks due to their high computational complexity, while information theory-based methods cannot identify the directions of regulatory interactions and also suffer from false positive/negative problems. To overcome these limitations, in this work we present a novel algorithm, namely local Bayesian network (LBN), to infer GRNs from gene expression data by using a network decomposition strategy and a false-positive edge elimination scheme. Specifically, the LBN algorithm first uses conditional mutual information (CMI) to construct an initial network or GRN, which is decomposed into a number of local networks or GRNs. Then, the BN method is employed to generate a series of local BNs by selecting the k-nearest neighbors of each gene as its candidate regulatory genes, which significantly reduces the exponential search space from all possible GRN structures. Integrating these local BNs forms a tentative network or GRN by performing CMI, which reduces redundant regulations in the GRN and thus alleviates the false positive problem. The final network or GRN can be obtained by iteratively performing CMI and local BN on the tentative network. In the iterative process, the false or redundant regulations are gradually removed. When tested on the benchmark GRN datasets from the DREAM challenge as well as the SOS DNA repair network in E. coli, our results suggest that LBN outperforms other state-of-the-art methods (ARACNE, GENIE3 and NARROMI) significantly, with more accurate and robust performance. In particular, the decomposition strategy with local Bayesian networks not only effectively reduces the computational cost of BN, owing to the much smaller sizes of local GRNs, but also identifies the directions of the regulations.
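To make the role of conditional mutual information in pruning indirect edges concrete, here is a minimal sketch of a plug-in CMI estimate for discrete (for example, binarized) expression values. The discrete estimator and the function name `conditional_mutual_information` are assumptions made for illustration; the abstract does not state which estimator LBN uses, and methods for continuous expression data often use a different one.

```python
import math
from collections import Counter


def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X; Y | Z) in bits for discrete sequences."""
    n = len(x)
    n_xyz = Counter(zip(x, y, z))
    n_xz = Counter(zip(x, z))
    n_yz = Counter(zip(y, z))
    n_z = Counter(z)
    cmi = 0.0
    for (a, b, c), count in n_xyz.items():
        # p(x,y,z) * log2[ p(z) p(x,y,z) / (p(x,z) p(y,z)) ];
        # the 1/n factors cancel inside the logarithm.
        ratio = count * n_z[c] / (n_xz[(a, c)] * n_yz[(b, c)])
        cmi += (count / n) * math.log2(ratio)
    return cmi


# Indirect link: X and Y both simply copy Z, so Z explains their association.
z = [0, 0, 1, 1]
cmi_indirect = conditional_mutual_information(z, z, z)

# Direct link: X and Y agree while Z is constant and explains nothing.
xy = [0, 1, 0, 1]
cmi_direct = conditional_mutual_information(xy, xy, [0, 0, 0, 0])

print(cmi_indirect, cmi_direct)
```

An edge whose CMI given a third gene falls near zero is a candidate for removal as redundant, while an edge that keeps a high CMI is retained; the threshold is a choice left to the user in this sketch.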
Affiliation(s)
- Fei Liu
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi’an, China
- Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Science, Baoji, China
| | - Shao-Wu Zhang
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi’an, China
| | - Wei-Feng Guo
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi’an, China
| | - Ze-Gang Wei
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi’an, China
| | - Luonan Chen
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi’an, China
- Key Laboratory of Systems Biology, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai, China
| |
|
20
|
Kumari S, Deng W, Gunasekara C, Chiang V, Chen HS, Ma H, Davis X, Wei H. Bottom-up GGM algorithm for constructing multilayered hierarchical gene regulatory networks that govern biological pathways or processes. BMC Bioinformatics 2016; 17:132. [PMID: 26993098 PMCID: PMC4797117 DOI: 10.1186/s12859-016-0981-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 08/13/2015] [Accepted: 03/09/2016] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND Multilayered hierarchical gene regulatory networks (ML-hGRNs) are very important for understanding the genetic regulation of biological pathways. However, there are currently no computational algorithms available for directly building ML-hGRNs that regulate biological pathways. RESULTS A bottom-up graphic Gaussian model (GGM) algorithm was developed for constructing an ML-hGRN operating above a biological pathway using small- to medium-sized microarray or RNA-seq data sets. The algorithm first placed the genes of a pathway at the bottom layer and began to construct an ML-hGRN by evaluating all gene triples, each consisting of two pathway genes and one regulatory gene. The algorithm retained all triples in which the regulatory gene significantly interfered with the two paired pathway genes. The regulatory genes with the highest interference frequency were kept as the second layer, and the number kept was based on an optimization function. Thereafter, the algorithm was used recursively to build an ML-hGRN layer by layer until the defined number of layers was obtained or the algorithm terminated automatically. CONCLUSIONS We validated the algorithm and demonstrated its high efficiency in constructing ML-hGRNs governing biological pathways. The algorithm is instrumental for biologists to learn the hierarchical regulators associated with a given biological pathway from even small-sized microarray or RNA-seq data sets.
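The triple evaluation in this abstract (two pathway genes plus one candidate regulator) can be illustrated with a first-order partial correlation. This is a sketch under an assumption: that "interference" is read as the drop from the plain correlation of the two pathway genes to their correlation after conditioning on the regulator. The abstract does not give the exact statistic or significance test, and the function name `interference` is hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if a variance is zero)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def interference(gene_a, gene_b, regulator):
    """Return (r_ab, r_ab given regulator, absolute difference) for one triple."""
    r_ab = pearson(gene_a, gene_b)
    r_az = pearson(gene_a, regulator)
    r_bz = pearson(gene_b, regulator)
    # Clamp at zero to guard against tiny negative values from rounding.
    denom = math.sqrt(max(0.0, 1 - r_az ** 2) * max(0.0, 1 - r_bz ** 2))
    if denom == 0:
        raise ValueError("regulator is perfectly correlated with a pathway gene")
    # First-order partial correlation of A and B given the regulator.
    r_partial = (r_ab - r_az * r_bz) / denom
    return r_ab, r_partial, abs(r_ab - r_partial)


# Toy triple (hypothetical): both pathway genes are driven by the regulator
# plus independent, mutually orthogonal noise patterns.
regulator = [1, 1, 1, 1, -1, -1, -1, -1]
gene_a = [3, 3, 1, 1, -1, -1, -3, -3]   # 2 * regulator + noise pattern 1
gene_b = [3, 1, 3, 1, -1, -3, -1, -3]   # 2 * regulator + noise pattern 2
r_ab, r_partial, drop = interference(gene_a, gene_b, regulator)
print(r_ab, r_partial, drop)
```

In this toy case the two pathway genes are strongly correlated, but the correlation vanishes once the regulator is conditioned on, which is the pattern that would mark the regulator as a candidate for the layer above.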
Affiliation(s)
- Sapna Kumari
- School of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI, 49931, USA
| | - Wenping Deng
- School of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI, 49931, USA
| | - Chathura Gunasekara
- School of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI, 49931, USA
| | - Vincent Chiang
- Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, NC, 27695, USA
| | - Huann-Sheng Chen
- Statistical Methodology and Applications Branch, Division of Cancer Control and Population Sciences, National Cancer Institute, National Institutes of Health, Rockville, MD, 20850, USA
| | - Hao Ma
- NCCWA, USDA ARS, Kearneysville, WV, 25430, USA
| | - Xin Davis
- Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, NC, 27695, USA
| | - Hairong Wei
- School of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI, 49931, USA.
| |
|
21
|
He B, Tan K. Understanding transcriptional regulatory networks using computational models. Curr Opin Genet Dev 2016; 37:101-108. [PMID: 26950762 DOI: 10.1016/j.gde.2016.02.002] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 09/15/2015] [Revised: 01/29/2016] [Accepted: 02/08/2016] [Indexed: 01/06/2023]
Abstract
Transcriptional regulatory networks (TRNs) encode instructions for animal development and physiological responses. Recent advances in genomic technologies and computational modeling have revolutionized our ability to construct models of TRNs. Here, we survey current computational methods for inferring TRN models using genome-scale data. We discuss their advantages and limitations. We summarize representative TRNs constructed using genome-scale data in both normal and disease development. We discuss lessons learned about the structure/function relationship of TRNs, based on examining various large-scale TRN models. Finally, we outline some open questions regarding TRNs, including how to improve model accuracy by integrating complementary data types, how to infer condition-specific TRNs, and how to compare TRNs across conditions and species in order to understand their structure/function relationship.
Affiliation(s)
- Bing He
- Interdisciplinary Graduate Program in Genetics, University of Iowa, Iowa City, IA 52242, USA
| | - Kai Tan
- Interdisciplinary Graduate Program in Genetics, University of Iowa, Iowa City, IA 52242, USA; Department of Internal Medicine, University of Iowa, Iowa City, IA 52242, USA.
| |
|
22
|
Information theory in systems biology. Part I: Gene regulatory and metabolic networks. Semin Cell Dev Biol 2015; 51:3-13. [PMID: 26701126 DOI: 10.1016/j.semcdb.2015.12.007] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 12/07/2015] [Accepted: 12/07/2015] [Indexed: 11/22/2022]
Abstract
"A Mathematical Theory of Communication" was published in 1948 by Claude Shannon to establish a framework that is now known as information theory. In recent decades, information theory has gained much attention in the area of systems biology. The aim of this paper is to provide a systematic review of those contributions that have applied information theory to inferring or understanding biological systems. Based on the type of system components and the interactions between them, we classify biological systems into 4 main classes: gene regulatory, metabolic, protein-protein interaction and signaling networks. In the first part of this review, we attempt to introduce most of the existing studies on two types of biological networks, gene regulatory and metabolic networks, which are founded on the concepts of information theory.
|
23
|
Folch-Fortuny A, Villaverde AF, Ferrer A, Banga JR. Enabling network inference methods to handle missing data and outliers. BMC Bioinformatics 2015; 16:283. [PMID: 26335628 PMCID: PMC4559359 DOI: 10.1186/s12859-015-0717-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 02/03/2015] [Accepted: 08/24/2015] [Indexed: 12/20/2022] Open
Abstract
Background The inference of complex networks from data is a challenging problem in biological sciences, as well as in a wide range of disciplines such as chemistry, technology, economics, or sociology. The quantity and quality of the data greatly affect the results. While many methodologies have been developed for this task, they seldom take into account issues such as missing data or outlier detection and correction, which need to be properly addressed before network inference. Results Here we present an approach to (i) handle missing data and (ii) detect and correct outliers based on multivariate projection to latent structures. The method, called trimmed scores regression (TSR), enables network inference methods to analyse incomplete datasets by imputing the missing values coherently with the latent data structure. Furthermore, it substitutes the faulty values in a dataset by proper estimations. We provide an implementation of this approach, and show how it can be integrated with any network inference method as a preliminary data curation step. This functionality is demonstrated with a state of the art network inference method based on mutual information distance and entropy reduction, MIDER. Conclusion The methodology presented here enables network inference methods to analyse a large number of incomplete and faulty datasets that could not be reliably analysed so far. Our comparative studies show the superiority of TSR over other missing data approaches used by practitioners. Furthermore, the method allows for outlier detection and correction.
Affiliation(s)
- Abel Folch-Fortuny
- Departamento de Estadística e Investigación Operativa Aplicadas y Calidad, Universitat Politècnica de València, Camino de Vera s/n, Valencia, 46022, Spain.
| | - Alejandro F Villaverde
- BioProcess Engineering Group, IIM-CSIC, Eduardo Cabello 6, Vigo, 36208, Spain; Centre of Biological Engineering, Universidade do Minho, Campus de Gualtar, Braga, 4710-057, Portugal; Department of Systems and Control Engineering, Universidade de Vigo, Rua Maxwell, Vigo, 36310, Spain
| | - Alberto Ferrer
- Departamento de Estadística e Investigación Operativa Aplicadas y Calidad, Universitat Politècnica de València, Camino de Vera s/n, Valencia, 46022, Spain
| | - Julio R Banga
- BioProcess Engineering Group, IIM-CSIC, Eduardo Cabello 6, Vigo, 36208, Spain
| |
|
24
|
Xiao X, Zhang W, Zou X. A new asynchronous parallel algorithm for inferring large-scale gene regulatory networks. PLoS One 2015; 10:e0119294. [PMID: 25807392 PMCID: PMC4373852 DOI: 10.1371/journal.pone.0119294] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 10/23/2014] [Accepted: 01/29/2015] [Indexed: 11/18/2022] Open
Abstract
The reconstruction of gene regulatory networks (GRNs) from high-throughput experimental data has been considered one of the most important issues in systems biology research. With the development of high-throughput technology and the complexity of biological problems, we need to reconstruct GRNs that contain thousands of genes. However, when many existing algorithms are used to handle these large-scale problems, they will encounter two important issues: low accuracy and high computational cost. To overcome these difficulties, the main goal of this study is to design an effective parallel algorithm to infer large-scale GRNs based on high-performance parallel computing environments. In this study, we proposed a novel asynchronous parallel framework to improve the accuracy and lower the time complexity of large-scale GRN inference by combining splitting technology and ordinary differential equation (ODE)-based optimization. The presented algorithm uses the sparsity and modularity of GRNs to split whole large-scale GRNs into many small-scale modular subnetworks. Through the ODE-based optimization of all subnetworks in parallel and their asynchronous communications, we can easily obtain the parameters of the whole network. To test the performance of the proposed approach, we used well-known benchmark datasets from Dialogue for Reverse Engineering Assessments and Methods challenge (DREAM), experimentally determined GRN of Escherichia coli and one published dataset that contains more than 10 thousand genes to compare the proposed approach with several popular algorithms on the same high-performance computing environments in terms of both accuracy and time complexity. The numerical results demonstrate that our parallel algorithm exhibits obvious superiority in inferring large-scale GRNs.
Affiliation(s)
- Xiangyun Xiao
- School of Mathematics and Statistics, Wuhan University, Wuhan, China
| | - Wei Zhang
- School of Science, East China Jiaotong University, Nanchang, China
| | - Xiufen Zou
- School of Mathematics and Statistics, Wuhan University, Wuhan, China
| |
|
25
|
Dimitrakopoulou K, Vrahatis AG, Bezerianos A. Integromics network meta-analysis on cardiac aging offers robust multi-layer modular signatures and reveals micronome synergism. BMC Genomics 2015; 16:147. [PMID: 25887273 PMCID: PMC4367845 DOI: 10.1186/s12864-015-1256-3] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 11/18/2014] [Accepted: 01/19/2015] [Indexed: 02/02/2023] Open
Abstract
Background The avalanche of integromics and panomics approaches has shifted the deciphering of aging mechanisms from single molecular entities to communities of them. In this direction, we explore cardiac aging mechanisms (a risk factor for multiple cardiovascular diseases) by capturing the micronome synergism and detecting longevity signatures in the form of communities (modules). For this, we developed a meta-analysis scheme that integrates transcriptome expression data from multiple cardiac-specific independent studies in mouse and human along with proteome and micronome interaction data in the form of multiple independent weighted networks. Modularization of each weighted network produced modules, which in turn were further analyzed so as to define consensus modules across datasets that change substantially during lifespan. Also, we established a metric that determines, from the modular perspective, the synergism of microRNA-microRNA interactions as defined by significantly functionally associated targets. Results The meta-analysis provided 40 consensus integromics modules across mouse datasets and revealed microRNA relations with substantial collective action during aging. Three modules were reproducible, based on homology, when mapped against human-derived modules. The respective homologs mainly represent NADH dehydrogenases, ATP synthases, cytochrome oxidases, Ras GTPases and ribosomal proteins. Among various observations, we corroborate the involvement of miR-34a (included in consensus modules) as proposed recently, yet we report that it has no synergistic effect. Moving forward, we determined its age-related neighborhood, in which HCN3, a known heart pacemaker channel, was included. Also, the miR-125a-5p/-351, miR-200c/-429, miR-106b/-17, miR-363/-92b, miR-181b/-181d, miR-19a/-19b, let-7d/-7f, miR-18a/-18b, miR-128/-27b and miR-106a/-291a-3p pairs exhibited significant synergy, and their association with aging and/or cardiovascular diseases is supported in many cases by a disease database and previous studies. In contrast to a recent proposal, we suggest that miR-22 has no substantial impact on heart longevity. Conclusions We revised several proteins and microRNAs recently implicated in cardiac aging and proposed for the first time modules as signatures. The integromics meta-analysis approach can serve as an efficient subvening signature tool for more-oriented better-designed experiments. It can also promote the combinational multi-target microRNA therapy of age-related cardiovascular diseases along the continuum from prevention to detection, diagnosis, treatment and outcome.
Affiliation(s)
| | - Aristidis G Vrahatis
- Department of Medical Physics, School of Medicine, University of Patras, Patras, 26500, Greece; Department of Computer Engineering and Informatics, University of Patras, Patras, 26500, Greece.
| | - Anastasios Bezerianos
- Department of Medical Physics, School of Medicine, University of Patras, Patras, 26500, Greece
- Singapore Institute for Neurotechnology (SINAPSE), Center of Life Sciences, National University of Singapore, Singapore, 117456, Singapore
26
Giorgi FM, Lopez G, Woo JH, Bisikirska B, Califano A, Bansal M. Inferring protein modulation from gene expression data using conditional mutual information. PLoS One 2014; 9:e109569. [PMID: 25314274 PMCID: PMC4196905 DOI: 10.1371/journal.pone.0109569] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 05/19/2014] [Accepted: 09/12/2014] [Indexed: 01/18/2023]
Abstract
Systematic, high-throughput dissection of causal post-translational regulatory dependencies on a genome-wide basis is still one of the great challenges of biology. Due to its complexity, however, only a handful of computational algorithms have been developed for this task. Here we present CINDy (Conditional Inference of Network Dynamics), a novel algorithm for the genome-wide, context-specific inference of regulatory dependencies between signaling proteins and transcription factor activity from gene expression data. The algorithm uses a novel adaptive partitioning methodology to accurately estimate the full Conditional Mutual Information (CMI) between a transcription factor and its targets, given the expression of a signaling protein. We show that CMI analysis is optimally suited to dissecting post-translational dependencies. Indeed, when tested against a gold standard dataset of experimentally validated protein-protein interactions in signal transduction networks, CINDy significantly outperforms previous methods in terms of both sensitivity and precision.
Affiliation(s)
- Federico M. Giorgi
- Department of Systems Biology, Columbia University, New York, New York, United States of America
- Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
- Gonzalo Lopez
- Department of Systems Biology, Columbia University, New York, New York, United States of America
- Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
- Jung H. Woo
- Department of Systems Biology, Columbia University, New York, New York, United States of America
- Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
- Brygida Bisikirska
- Department of Systems Biology, Columbia University, New York, New York, United States of America
- Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
- Andrea Califano
- Department of Systems Biology, Columbia University, New York, New York, United States of America
- Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
- Columbia Genome Center, High Throughput Screening facility, Columbia University, New York, New York, United States of America
- Department of Biomedical Informatics, Columbia University, New York, New York, United States of America
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
- Institute for Cancer Genetics, Columbia University, New York, New York, United States of America
- Herbert Irving Comprehensive Cancer Center, Columbia University, New York, New York, United States of America
- Corresponding author
- Mukesh Bansal
- Department of Systems Biology, Columbia University, New York, New York, United States of America
- Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
- Corresponding author
27
Emmert-Streib F, Dehmer M, Haibe-Kains B. Gene regulatory networks and their applications: understanding biological and medical problems in terms of networks. Front Cell Dev Biol 2014; 2:38. [PMID: 25364745 PMCID: PMC4207011 DOI: 10.3389/fcell.2014.00038] [Citation(s) in RCA: 122] [Impact Index Per Article: 11.1] [Received: 06/07/2014] [Accepted: 07/29/2014] [Indexed: 11/13/2022]
Abstract
In recent years gene regulatory networks (GRNs) have attracted a lot of interest and many methods have been introduced for their statistical inference from gene expression data. However, despite their popularity, GRNs are widely misunderstood. For this reason, we provide in this paper a general discussion and perspective of gene regulatory networks. Specifically, we discuss their meaning, the consistency among different network inference methods, ensemble methods, the assessment of GRNs, the estimated number of existing GRNs and their usage in different application domains. Furthermore, we discuss open questions and necessary steps in order to utilize gene regulatory networks in a clinical context and for personalized medicine.
Affiliation(s)
- Frank Emmert-Streib
- Computational Biology and Machine Learning Laboratory, Faculty of Medicine, Health and Life Sciences, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Belfast, UK
- Matthias Dehmer
- Institute for Bioinformatics and Translational Research, UMIT, Hall in Tyrol, Austria
- Benjamin Haibe-Kains
- Bioinformatics and Computational Genomics Laboratory, Department of Medical Biophysics, Princess Margaret Cancer Centre, University of Toronto, Canada
28
Villaverde AF, Ross J, Morán F, Banga JR. MIDER: network inference with mutual information distance and entropy reduction. PLoS One 2014; 9:e96732. [PMID: 24806471 PMCID: PMC4013075 DOI: 10.1371/journal.pone.0096732] [Citation(s) in RCA: 91] [Impact Index Per Article: 8.3] [Received: 11/05/2013] [Accepted: 04/09/2014] [Indexed: 01/14/2023]
Abstract
The prediction of links among variables from a given dataset is a task referred to as network inference or reverse engineering. It is an open problem in bioinformatics and systems biology, as well as in other areas of science. Information theory, which uses concepts such as mutual information, provides a rigorous framework for addressing it. While a number of information-theoretic methods are already available, most of them focus on a particular type of problem, introducing assumptions that limit their generality. Furthermore, many of these methods lack a publicly available implementation. Here we present MIDER, a method for inferring network structures with information-theoretic concepts. It consists of two steps: first, it provides a representation of the network in which the distance among nodes indicates their statistical closeness. Second, it refines the prediction of the existing links to distinguish between direct and indirect interactions and to assign directionality. The method accepts as input time-series data related to some quantitative features of the network nodes (such as concentrations, if the nodes are chemical species). It takes into account time delays between variables, and allows choosing among several definitions and normalizations of mutual information. It is general purpose: it may be applied to any type of network, cellular or otherwise. A Matlab implementation including source code and data is freely available (http://www.iim.csic.es/~gingproc/mider.html). The performance of MIDER has been evaluated on seven different benchmark problems that cover the main types of cellular networks, including metabolic, gene regulatory, and signaling. Comparisons with state-of-the-art information-theoretic methods have demonstrated the competitive performance of MIDER, as well as its versatility. Its use does not demand any a priori knowledge from the user; the default settings and the adaptive nature of the method provide good results for a wide range of problems without requiring tuning.
Affiliation(s)
- John Ross
- Department of Chemistry, Stanford University, Stanford, California, United States of America
- Federico Morán
- Department of Biochemistry and Molecular Biology, Complutense University, Madrid, Spain
29
Guo X, Zhang Y, Hu W, Tan H, Wang X. Inferring nonlinear gene regulatory networks from gene expression data based on distance correlation. PLoS One 2014; 9:e87446. [PMID: 24551058 PMCID: PMC3925093 DOI: 10.1371/journal.pone.0087446] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 09/26/2013] [Accepted: 12/27/2013] [Indexed: 02/05/2023]
Abstract
Nonlinear dependence is common in the regulatory mechanisms of gene regulatory networks (GRNs). It is vital to properly measure or test nonlinear dependence from real data for reconstructing GRNs and understanding the complex regulatory mechanisms within the cellular system. A recently developed measure called the distance correlation (DC) has been shown to be powerful and computationally efficient for detecting nonlinear dependence in many situations. In this work, we incorporate the DC into inferring GRNs from gene expression data without any underlying distribution assumptions. We propose three DC-based GRN inference algorithms, CLR-DC, MRNET-DC and REL-DC, and then compare them with the mutual information (MI)-based algorithms by analyzing two simulated data sets (benchmark GRNs from the DREAM challenge and GRNs generated by the SynTReN network generator) and an experimentally determined SOS DNA repair network in Escherichia coli. According to both the receiver operating characteristic (ROC) curve and the precision-recall (PR) curve, our proposed algorithms significantly outperform the MI-based algorithms in GRN inference.
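As an illustration of the dependence measure named in this abstract, the following is a minimal sketch of the sample distance correlation between two expression profiles, using the standard double-centered distance matrix definition. The function name `distance_correlation` is an assumption made for this example; the sketch is not the CLR-DC, MRNET-DC or REL-DC code of the cited paper.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation between two samples of equal length.

    Hypothetical helper for illustration only; inputs are 1-D profiles
    or (n_samples, n_features) arrays.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)

    # Pairwise Euclidean distance matrices within each sample.
    a = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    b = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)

    # Double centering: subtract row and column means, add the grand mean.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()

    dcov2 = max((A * B).mean(), 0.0)          # squared distance covariance
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return float(np.sqrt(dcov2 / denom)) if denom > 0 else 0.0


# A symmetric quadratic relation has near-zero Pearson correlation,
# while its distance correlation stays clearly above zero.
grid = np.linspace(-1.0, 1.0, 201)
print(np.corrcoef(grid, grid ** 2)[0, 1], distance_correlation(grid, grid ** 2))
```

In a GRN setting, one would compute this score for every gene pair and hand the resulting matrix to a relevance-network style pruning step; that step is not shown here.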
Affiliation(s)
- Xiaobo Guo
- Department of Statistical Science, School of Mathematics & Computational Science, Sun Yat-Sen University, Guangzhou, China
- Southern China Research Center of Statistical Science, Sun Yat-Sen University, Guangzhou, China
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou, China
- Ye Zhang
- Department of Statistical Science, School of Mathematics & Computational Science, Sun Yat-Sen University, Guangzhou, China
- Southern China Research Center of Statistical Science, Sun Yat-Sen University, Guangzhou, China
- Wenhao Hu
- Department of Statistical Science, School of Mathematics & Computational Science, Sun Yat-Sen University, Guangzhou, China
- Haizhu Tan
- Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, China
- Department of Physics and Informatics, Shantou University Medical College, Shantou, China
- Xueqin Wang
- Department of Statistical Science, School of Mathematics & Computational Science, Sun Yat-Sen University, Guangzhou, China
- Southern China Research Center of Statistical Science, Sun Yat-Sen University, Guangzhou, China
- Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, China
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou, China
30
Li X, Zhao Y, Tian B, Jamaluddin M, Mitra A, Yang J, Rowicka M, Brasier AR, Kudlicki A. Modulation of gene expression regulated by the transcription factor NF-κB/RelA. J Biol Chem 2014; 289:11927-11944. [PMID: 24523406 DOI: 10.1074/jbc.m113.539965] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 01/21/2023]
Abstract
Modulators (Ms) are proteins that modify the activity of transcription factors (TFs) and influence expression of their target genes (TGs). To discover modulators of NF-κB/RelA, we first identified 365 NF-κB/RelA-binding proteins using liquid chromatography-tandem mass spectrometry (LC-MS/MS). We used a probabilistic model to infer 8349 (M, NF-κB/RelA, TG) triplets and their modes of modulatory action from our combined LC-MS/MS and ChIP-Seq (ChIP followed by next-generation sequencing) data, published RelA modulators and TGs, and a compendium of gene expression profiles. Hierarchical clustering of the derived modulatory network revealed functional subnetworks and suggested new pathways modulating RelA transcriptional activity. The modulators with the highest number of TGs and the most non-random distribution of action modes (measured by Shannon entropy) are consistent with published reports. Our results provide a repertoire of testable hypotheses for experimental validation. One of the NF-κB/RelA modulators we identified is STAT1. The inferred (STAT1, NF-κB/RelA, TG) triplets were validated by LC-selected reaction monitoring-MS and by the results of STAT1 deletion in human fibrosarcoma cells. Overall, we have identified 562 NF-κB/RelA modulators, which are potential drug targets, and clarified how NF-κB/RelA achieves its multiple functions through modulators. Our approach can be readily applied to other TFs.
Affiliation(s)
- Xueling Li
- Institute for Translational Sciences, University of Texas Medical Branch, Galveston, Texas 77555; Sealy Center for Molecular Medicine, University of Texas Medical Branch, Galveston, Texas 77555; Departments of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, Texas 77555; Hefei Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei 230031, China
- Yingxin Zhao
- Institute for Translational Sciences, University of Texas Medical Branch, Galveston, Texas 77555; Sealy Center for Molecular Medicine, University of Texas Medical Branch, Galveston, Texas 77555; Center for Clinical Proteomics, University of Texas Medical Branch, Galveston, Texas 77555
- Bing Tian
- Institute for Translational Sciences, University of Texas Medical Branch, Galveston, Texas 77555; Sealy Center for Molecular Medicine, University of Texas Medical Branch, Galveston, Texas 77555; Departments of Internal Medicine, University of Texas Medical Branch, Galveston, Texas 77555
- Mohammad Jamaluddin
- Institute for Translational Sciences, University of Texas Medical Branch, Galveston, Texas 77555; Sealy Center for Molecular Medicine, University of Texas Medical Branch, Galveston, Texas 77555
- Abhishek Mitra
- Institute for Translational Sciences, University of Texas Medical Branch, Galveston, Texas 77555; Sealy Center for Molecular Medicine, University of Texas Medical Branch, Galveston, Texas 77555
- Jun Yang
- Sealy Center for Molecular Medicine, University of Texas Medical Branch, Galveston, Texas 77555; Departments of Internal Medicine, University of Texas Medical Branch, Galveston, Texas 77555
- Maga Rowicka
- Institute for Translational Sciences, University of Texas Medical Branch, Galveston, Texas 77555; Sealy Center for Molecular Medicine, University of Texas Medical Branch, Galveston, Texas 77555; Departments of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, Texas 77555
- Allan R Brasier
- Institute for Translational Sciences, University of Texas Medical Branch, Galveston, Texas 77555; Sealy Center for Molecular Medicine, University of Texas Medical Branch, Galveston, Texas 77555; Center for Clinical Proteomics, University of Texas Medical Branch, Galveston, Texas 77555; Departments of Internal Medicine, University of Texas Medical Branch, Galveston, Texas 77555
- Andrzej Kudlicki
- Institute for Translational Sciences, University of Texas Medical Branch, Galveston, Texas 77555; Sealy Center for Molecular Medicine, University of Texas Medical Branch, Galveston, Texas 77555; Departments of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, Texas 77555.
31
Villaverde AF, Banga JR. Reverse engineering and identification in systems biology: strategies, perspectives and challenges. J R Soc Interface 2014; 11:20130505. [PMID: 24307566 PMCID: PMC3869153 DOI: 10.1098/rsif.2013.0505] [Citation(s) in RCA: 132] [Impact Index Per Article: 12.0] [Received: 06/07/2013] [Accepted: 11/12/2013] [Indexed: 12/17/2022]
Abstract
The interplay of mathematical modelling with experiments is one of the central elements in systems biology. The aim of reverse engineering is to infer, analyse and understand, through this interplay, the functional and regulatory mechanisms of biological systems. Reverse engineering is not exclusive to systems biology and has been studied in different areas, such as inverse problem theory, machine learning, nonlinear physics, (bio)chemical kinetics, control theory and optimization, among others. However, it seems that many of these areas have been relatively closed to outsiders. In this contribution, we aim to compare and highlight the different perspectives and contributions from these fields, with emphasis on two key questions: (i) why are reverse engineering problems so hard to solve, and (ii) what methods are available for the particular problems arising from systems biology?
Affiliation(s)
- Julio R. Banga
- BioProcess Engineering Group, IIM-CSIC, Spanish National Research Council, Vigo 36208, Spain
32
Yu T, Peng H. Hierarchical clustering of high-throughput expression data based on general dependences. IEEE/ACM Trans Comput Biol Bioinform 2013; 10:1080-1085. [PMID: 24334400 PMCID: PMC3905248 DOI: 10.1109/tcbb.2013.99] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 06/03/2023]
Abstract
High-throughput expression technologies, such as gene expression arrays and liquid chromatography-mass spectrometry (LC-MS), measure thousands of features, i.e., genes or metabolites, on a continuous scale. In such data, both linear and nonlinear relations exist between features. Nonlinear relations can reflect critical regulation patterns in the biological system. However, they are not identified and utilized by traditional clustering methods based on linear associations. Clustering based on general dependences, i.e., both linear and nonlinear relations, is hampered by the high dimensionality and high noise level of the data. We developed a sensitive nonparametric measure of general dependence between (groups of) random variables in high dimensions. Based on this dependence measure, we developed a hierarchical clustering method. In simulation studies, the method outperformed correlation- and mutual information (MI)-based hierarchical clustering methods in clustering features with nonlinear dependences. We applied the method to a microarray data set measuring gene expression in a cell-cycle time series to show that it generates biologically relevant results. The R code is available at http://userwww.service.emory.edu/~tyu8/GDHC.
Affiliation(s)
- Tianwei Yu
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA
- Hesen Peng
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA
33
Villaverde AF, Ross J, Banga JR. Reverse engineering cellular networks with information theoretic methods. Cells 2013; 2:306-29. [PMID: 24709703 PMCID: PMC3972682 DOI: 10.3390/cells2020306] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Received: 02/26/2013] [Revised: 04/22/2013] [Accepted: 04/27/2013] [Indexed: 11/16/2022]
Abstract
Building mathematical models of cellular networks lies at the core of systems biology. It involves, among other tasks, the reconstruction of the structure of interactions between molecular components, which is known as network inference or reverse engineering. Information theory can help in the goal of extracting as much information as possible from the available data. A large number of methods founded on these concepts have been proposed in the literature, not only in biology journals, but in a wide range of areas. Their critical comparison is difficult due to the different focuses and the adoption of different terminologies. Here we attempt to review some of the existing information theoretic methodologies for network inference, and clarify their differences. While some of these methods have achieved notable success, many challenges remain, among which we can mention dealing with incomplete measurements, noisy data, counterintuitive behaviour emerging from nonlinear relations or feedback loops, and computational burden of dealing with large data sets.
Affiliation(s)
- John Ross
- Department of Chemistry, Stanford University, Stanford, CA 94305, USA.
- Julio R Banga
- Bioprocess Engineering Group, IIM-CSIC, Eduardo Cabello 6, Vigo 36208, Spain.
34
de Matos Simoes R, Emmert-Streib F. Bagging statistical network inference from large-scale gene expression data. PLoS One 2012; 7:e33624. [PMID: 22479422 PMCID: PMC3316596 DOI: 10.1371/journal.pone.0033624] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.9] [Received: 10/24/2011] [Accepted: 02/14/2012] [Indexed: 11/24/2022]
Abstract
Modern biology and medicine aim at hunting molecular and cellular causes of biological functions and diseases. Gene regulatory networks (GRNs) inferred from gene expression data are considered an important aid for this research by providing a map of molecular interactions. Hence, GRNs have the potential to enable and enhance basic as well as applied research in the life sciences. In this paper, we introduce a new method called BC3NET for inferring causal gene regulatory networks from large-scale gene expression data. BC3NET is an ensemble method that is based on bagging the C3NET algorithm, which means it corresponds to a Bayesian approach with noninformative priors. In this study we demonstrate, for a variety of simulated and biological gene expression data from S. cerevisiae, that BC3NET is an important enhancement over other inference methods and is capable of sensibly capturing biochemical interactions from transcription regulation and protein-protein interaction. An implementation of BC3NET is freely available as an R package from the CRAN repository.
Affiliation(s)
- Frank Emmert-Streib
- Computational Biology and Machine Learning Lab, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Belfast, United Kingdom
35
Emmert-Streib F, Glazko GV, Altay G, de Matos Simoes R. Statistical inference and reverse engineering of gene regulatory networks from observational expression data. Front Genet 2012; 3:8. [PMID: 22408642 PMCID: PMC3271232 DOI: 10.3389/fgene.2012.00008] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.9] [Received: 12/07/2011] [Accepted: 01/10/2012] [Indexed: 01/04/2023]
Abstract
In this paper, we present a systematic and conceptual overview of methods for inferring gene regulatory networks from observational gene expression data. Further, we discuss two classic approaches to infer causal structures and compare them with contemporary methods by providing a conceptual categorization thereof. We complement the above by surveying global and local evaluation measures for assessing the performance of inference algorithms.
Affiliation(s)
- Frank Emmert-Streib
- Computational Biology and Machine Learning Lab, School of Medicine, Dentistry and Biomedical Sciences, Center for Cancer Research and Cell Biology, Queen's University Belfast, Belfast, UK
36
de Matos Simoes R, Emmert-Streib F. Influence of statistical estimators of mutual information and data heterogeneity on the inference of gene regulatory networks. PLoS One 2011; 6:e29279. [PMID: 22242113 PMCID: PMC3248437 DOI: 10.1371/journal.pone.0029279] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Received: 09/21/2011] [Accepted: 11/23/2011] [Indexed: 11/19/2022]
Abstract
The inference of gene regulatory networks from gene expression data is a difficult problem because the performance of the inference algorithms depends on a multitude of different factors. In this paper we study two of these. First, we investigate the influence of discrete mutual information (MI) estimators on the global and local network inference performance of the C3NET algorithm. More precisely, we study different MI estimators (Empirical, Miller-Madow, Shrink and Schürmann-Grassberger) in combination with discretization methods (equal frequency, equal width and global equal width discretization). We observe the best global and local inference performance of C3NET for the Miller-Madow estimator with an equal width discretization. Second, our numerical analysis can be considered as a systems approach because we simulate gene expression data from an underlying gene regulatory network, instead of making a distributional assumption to sample thereof. We demonstrate that despite the popularity of the latter approach, which is the traditional way of studying MI estimators, this is in fact not supported by simulated and biological expression data because of their heterogeneity. Hence, our study provides guidance for an efficient design of a simulation study in the context of network inference, supporting a systems approach.
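The estimator and discretization combination singled out in this abstract can be sketched as follows. This is a minimal illustration that assumes per-variable equal-width binning via `numpy.histogram2d` and the usual Miller-Madow bias correction (adding (m - 1) / (2n) to the plug-in entropy, where m is the number of non-empty bins and n the sample size). The function names and the default of 10 bins are assumptions for this example, not settings taken from the cited study or from C3NET.

```python
import numpy as np


def entropy_miller_madow(counts):
    """Plug-in entropy (in nats) plus the Miller-Madow bias correction."""
    counts = np.asarray(counts, dtype=float)
    counts = counts[counts > 0]               # keep non-empty bins only
    n = counts.sum()
    p = counts / n
    plug_in = -(p * np.log(p)).sum()
    return plug_in + (len(counts) - 1) / (2.0 * n)


def mi_miller_madow(x, y, bins=10):
    """Mutual information from an equal-width 2-D histogram of x and y."""
    # An integer `bins` gives equal-width bins over each variable's own range.
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    h_x = entropy_miller_madow(joint.sum(axis=1))
    h_y = entropy_miller_madow(joint.sum(axis=0))
    h_xy = entropy_miller_madow(joint.ravel())
    return h_x + h_y - h_xy


# Usage: a strongly coupled pair versus an independent pair.
rng = np.random.default_rng(0)
g1 = rng.normal(size=1000)
print(mi_miller_madow(g1, g1 + 0.1 * rng.normal(size=1000)))
print(mi_miller_madow(g1, rng.normal(size=1000)))
```

Because the correction is applied to each entropy term separately, the estimate for independent variables can come out slightly negative; that is a property of this estimator rather than an error in the sketch.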
Affiliation(s)
- Ricardo de Matos Simoes
- Computational Biology and Machine Learning Lab, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Belfast, United Kingdom
- Frank Emmert-Streib
- Computational Biology and Machine Learning Lab, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Belfast, United Kingdom
37
Zhang X, Zhao XM, He K, Lu L, Cao Y, Liu J, Hao JK, Liu ZP, Chen L. Inferring gene regulatory networks from gene expression data by path consistency algorithm based on conditional mutual information. Bioinformatics 2011; 28:98-104. [PMID: 22088843 DOI: 10.1093/bioinformatics/btr626] [Citation(s) in RCA: 167] [Impact Index Per Article: 11.9] [Indexed: 11/15/2022]
Abstract
MOTIVATION Reconstruction of gene regulatory networks (GRNs), which explicitly represent the causality of developmental or regulatory processes, is of utmost interest and has become a challenging computational problem for understanding the complex regulatory mechanisms in cellular systems. However, all existing methods of inferring GRNs from gene expression profiles have their strengths and weaknesses. In particular, many properties of GRNs, such as topological sparseness and non-linear dependence, are general features of regulatory mechanisms but are seldom taken into account simultaneously in one computational method. RESULTS In this work, we present a novel method for inferring GRNs from gene expression data that considers the non-linear dependence and topological structure of GRNs by employing the path consistency algorithm (PCA) based on conditional mutual information (CMI). In this algorithm, the conditional dependence between a pair of genes is represented by the CMI between them. With the general hypothesis of a Gaussian distribution underlying gene expression data, the CMI between a pair of genes is computed by a concise formula involving the covariance matrices of the related gene expression profiles. The method is validated on the benchmark GRNs from the DREAM challenge and the widely used SOS DNA repair network in Escherichia coli. The cross-validation results confirmed the effectiveness of our method (PCA-CMI), which significantly outperforms previous methods. Besides its high accuracy, our method is able to distinguish direct (or causal) interactions from indirect associations. AVAILABILITY All the source data and code are available at: http://csb.shu.edu.cn/subweb/grn.htm. CONTACT lnchen@sibs.ac.cn; zpliu@sibs.ac.cn SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
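The covariance-based computation mentioned in this abstract can be illustrated with the standard identity for jointly Gaussian variables, CMI(X;Y|Z) = 0.5 * log(|C(X,Z)| * |C(Y,Z)| / (|C(Z)| * |C(X,Y,Z)|)), where |C(.)| is the determinant of the covariance matrix of the listed variables. The sketch below assumes this identity is the kind of formula the abstract refers to; the function names and the data layout (samples in rows, genes in columns) are hypothetical, and this is not the authors' released code.

```python
import numpy as np


def _logdet_cov(data, idx):
    """Log-determinant of the covariance of the selected columns."""
    if len(idx) == 0:
        return 0.0                             # empty conditioning set
    cov = np.atleast_2d(np.cov(data[:, idx], rowvar=False))
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def gaussian_cmi(data, i, j, cond=()):
    """CMI (in nats) between columns i and j given the columns in `cond`,
    under a Gaussian assumption. With cond empty this is the ordinary MI."""
    cond = list(cond)
    return 0.5 * (
        _logdet_cov(data, [i] + cond)
        + _logdet_cov(data, [j] + cond)
        - _logdet_cov(data, cond)
        - _logdet_cov(data, [i, j] + cond)
    )


# Usage on a simulated chain g0 -> g1 -> g2: g0 and g2 are associated,
# but conditioning on the intermediate gene g1 removes the dependence.
rng = np.random.default_rng(0)
g0 = rng.normal(size=5000)
g1 = g0 + 0.5 * rng.normal(size=5000)
g2 = g1 + 0.5 * rng.normal(size=5000)
expr = np.column_stack([g0, g1, g2])
print(gaussian_cmi(expr, 0, 2), gaussian_cmi(expr, 0, 2, cond=[1]))
```

In a path-consistency style procedure, an edge between two genes would be deleted once its CMI, given some set of neighbouring genes, falls below a chosen threshold; the threshold and the search over conditioning sets are left out of this sketch.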
Affiliation(s)
- Xiujun Zhang
- Institute of Systems Biology, Shanghai University, Shanghai 200444, China
38
Wang XD, Qi YX, Jiang ZL. Reconstruction of transcriptional network from microarray data using combined mutual information and network-assisted regression. IET Syst Biol 2011; 5:95-102. [PMID: 21405197 DOI: 10.1049/iet-syb.2010.0041] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/20/2022]
Abstract
Many methods have been developed for inferring transcriptional networks from gene expression data. However, it is still necessary to design new methods that disclose more detailed and exact network information. Using network-assisted regression, the authors combined the averaged three-way mutual information (AMI3) and a non-linear ordinary differential equation (ODE) model to infer the transcriptional network and to obtain both the topological structure and the regulatory dynamics. Synthetic and experimental data were used to evaluate the performance of this approach. In comparison with previous methods based on mutual information, AMI3 obtained higher precision at the same sensitivity. To describe the regulatory dynamics between transcription factors and target genes, network-assisted regression and regression without a network were each applied to steady-state and time-series microarray data. The results revealed that, compared with regression without a network, network-assisted regression increased the precision but decreased the goodness of fit. The authors then reconstructed the transcriptional network of Escherichia coli and simulated the regulatory dynamics of genes. Furthermore, the authors' approach identified potential transcription factors regulating the yeast cell cycle. In conclusion, network-assisted regression combining AMI3 and the ODE model inferred the topological structure and the regulatory dynamics of the transcriptional network from microarray data more precisely. [Includes supplementary material].
Affiliation(s)
- X-D Wang
- Shanghai Jiao Tong University, Institute of Mechanobiology and Medical Engineering, Shanghai, People's Republic of China
39
Yu T, Peng H, Sun W. Incorporating Nonlinear Relationships in Microarray Missing Value Imputation. IEEE/ACM Trans Comput Biol Bioinform 2011; 8:723-731. [PMID: 20733236 PMCID: PMC3624752 DOI: 10.1109/tcbb.2010.73] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 05/29/2023]
Abstract
Microarray gene expression data often contain missing values. Accurate estimation of the missing values is important for downstream data analyses that require complete data. Nonlinear relationships between gene expression levels have not been well-utilized in missing value imputation. We propose an imputation scheme based on nonlinear dependencies between genes. By simulations based on real microarray data, we show that incorporating nonlinear relationships could improve the accuracy of missing value imputation, both in terms of normalized root-mean-squared error and in terms of the preservation of the list of significant genes in statistical testing. In addition, we studied the impact of artificial dependencies introduced by data normalization on the simulation results. Our results suggest that methods relying on global correlation structures may yield overly optimistic simulation results when the data have been subjected to row (gene)-wise mean removal.
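The abstract above scores imputation accuracy by normalized root-mean-squared error. As a rough illustration only, the sketch below computes one common variant, assuming normalization by the population standard deviation of the true values; the abstract does not state which normalization the authors used, and the function name `nrmse` is hypothetical.

```python
import math
import statistics


def nrmse(true_values, imputed_values):
    """Root-mean-squared error divided by the spread of the true values.

    Assumption: the normalizer is the population standard deviation of
    the true values. Other normalizers appear in the literature.
    """
    if not true_values or len(true_values) != len(imputed_values):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - i) ** 2 for t, i in zip(true_values, imputed_values)]
    rmse = math.sqrt(sum(squared_errors) / len(squared_errors))
    spread = statistics.pstdev(true_values)
    if spread == 0:
        raise ValueError("true values have zero variance")
    return rmse / spread


# Entries that were masked on purpose, and two candidate imputations.
masked_truth = [1.0, 2.0, 3.0]
perfect = nrmse(masked_truth, [1.0, 2.0, 3.0])
mean_fill = nrmse(masked_truth, [2.0, 2.0, 2.0])
```

Under this normalization, filling every missing entry with the mean of the true values scores exactly 1, so values below 1 indicate an imputation that does better than mean filling.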
Affiliation(s)
- Tianwei Yu
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA (telephone: 404-727-7671)
- Hesen Peng
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
- Wei Sun
- Department of Biostatistics & the Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
|
40
|
Ram R, Chetty M. A Markov-blanket-based model for gene regulatory network inference. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011; 8:353-367. [PMID: 21233520 DOI: 10.1109/tcbb.2009.70] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 05/30/2023]
Abstract
An efficient two-step Markov blanket method for modeling and inferring complex regulatory networks from large-scale microarray data sets is presented. The inferred gene regulatory network (GRN) is based on time series gene expression data that capture the underlying gene interactions. To construct a highly accurate GRN, the proposed method performs: 1) discovery of a gene's Markov Blanket (MB), 2) formulation of a flexible measure to determine the network's quality, 3) efficient searching with the aid of a guided genetic algorithm, and 4) pruning to obtain a minimal set of correct interactions. Investigations are carried out using both synthetic and yeast cell cycle gene expression data sets. The realistic synthetic data sets validate the robustness of the method by varying topology, sample size, time delay, noise, vertex in-degree, and the presence of hidden nodes. It is shown that the proposed approach has excellent inferential capabilities and high accuracy even in the presence of noise. The gene network inferred from yeast cell cycle data is investigated for its biological relevance using well-known interactions, sequence analysis, motif patterns, and GO data. Further, novel interactions are predicted for the unknown genes of the network, and their influence on other genes is also discussed.
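For readers unfamiliar with the term, a node's Markov blanket in a directed acyclic graph consists of its parents, its children, and the other parents of its children. The sketch below reads the blanket off a known graph; it is not the data-driven discovery step the abstract describes, and the function name and the toy network are hypothetical.

```python
def markov_blanket(parents, node):
    """Return the Markov blanket of `node` in a directed acyclic graph.

    `parents` maps every node to the set of its direct parents.
    The blanket is: parents, children, and co-parents of the children.
    """
    children = {child for child, ps in parents.items() if node in ps}
    co_parents = {p for child in children for p in parents[child]} - {node}
    return set(parents.get(node, ())) | children | co_parents


# Toy regulatory graph (illustrative only): A and B regulate C, C regulates D.
toy_parents = {
    "A": set(),
    "B": set(),
    "C": {"A", "B"},
    "D": {"C"},
}
blanket_of_a = markov_blanket(toy_parents, "A")
blanket_of_c = markov_blanket(toy_parents, "C")
```

Here the blanket of A contains C (its child) and B (the co-parent of C), which is why co-regulators of a shared target remain informative about each other once the target's expression is known.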
Affiliation(s)
- Ramesh Ram
- Gippsland School of Information Technology, Monash University, Gippsland Campus, VIC 3842, Australia.
|
41
|
Tian LP, Liu L, Wu FX. Estimating parameters in genetic regulatory networks with SUM logic. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2011; 2011:1371-1374. [PMID: 22254572 DOI: 10.1109/iembs.2011.6090207] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/31/2023]
Abstract
Many methods for inferring genetic regulatory networks have been proposed. However, the inferred networks can hardly be used to analyze the dynamics of genetic regulatory networks. Recently, nonlinear differential equations have been proposed to model genetic regulatory networks, and the stability of such networks has been intensively investigated on the basis of this kind of model. Because parameters in nonlinear models are difficult to estimate, inference of genetic regulatory networks with nonlinear models has received little attention. In this paper, we present a method for estimating parameters in genetic regulatory networks with SUM regulatory logic. In this kind of genetic regulatory network, the regulatory function for each gene is a linear combination of Hill-form functions, which are nonlinear in the parameters and in the system states. To investigate the proposed method, the gene toggle switch network is used as an illustrative example. The simulation results show that the proposed method can accurately estimate parameters in genetic regulatory networks with SUM logic.
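To make the model form concrete, the sketch below writes a SUM-logic rate as a weighted sum of Hill terms minus linear decay and integrates a two-gene toggle switch with a simple Euler scheme. It illustrates only the forward model, not the parameter estimation method of the paper; the function names, parameter values, and integration scheme are all assumptions chosen for illustration.

```python
def hill_repression(x, k, n):
    """Hill-form repression term: k^n / (k^n + x^n)."""
    return k ** n / (k ** n + x ** n)


def sum_logic_rate(state, terms, decay, target):
    """dx/dt of one gene: weighted sum of Hill terms minus linear decay.

    `terms` is a list of (weight, regulator_index, hill_function, k, n).
    """
    production = sum(w * fn(state[j], k, n) for w, j, fn, k, n in terms)
    return production - decay * state[target]


def simulate_toggle(x0, steps=20000, dt=0.01):
    """Euler integration of two mutually repressing genes.

    All parameter values (weight 4, k = 1, n = 2, decay 1) are illustrative.
    """
    network = {
        0: [(4.0, 1, hill_repression, 1.0, 2)],  # gene 1 represses gene 0
        1: [(4.0, 0, hill_repression, 1.0, 2)],  # gene 0 represses gene 1
    }
    state = list(x0)
    for _ in range(steps):
        rates = [sum_logic_rate(state, network[g], 1.0, g) for g in (0, 1)]
        state = [s + dt * r for s, r in zip(state, rates)]
    return state


high_first = simulate_toggle((2.0, 1.0))
high_second = simulate_toggle((1.0, 2.0))
```

With these illustrative values the switch is bistable: whichever gene starts higher ends up dominant, which is the qualitative behavior that makes the toggle switch a convenient test case for parameter estimation.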
Affiliation(s)
- Li-Ping Tian
- School of Information, Beijing Wuzi University, Beijing, PR China.
|
42
|
Yano K. Improved prediction of protein interaction from microarray data using asymmetric correlation. Procedia Computer Science 2011; 4:1072-1081. [DOI: 10.1016/j.procs.2011.04.114] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/03/2025]
|
43
|
Luo W, Woolf PJ. Reconstructing transcriptional regulatory networks using three-way mutual information and Bayesian networks. Methods Mol Biol 2010; 674:401-18. [PMID: 20827604 DOI: 10.1007/978-1-60761-854-6_23] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 03/27/2023]
Abstract
Probabilistic methods such as mutual information and Bayesian networks have become a major category of tools for the reconstruction of regulatory relationships from quantitative biological data. In this chapter, we describe the theoretic framework and the implementation for learning gene regulatory networks using high-order mutual information via the MI3 method (Luo et al. (2008) BMC Bioinformatics 9, 467; Luo (2008) Gene regulatory network reconstruction and pathway inference from high throughput gene expression data. PhD thesis). We also cover the closely related Bayesian network method in detail.
Affiliation(s)
- Weijun Luo
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA.
|
44
|
Hodges AP, Woolf P, He Y. BN+1 Bayesian network expansion for identifying molecular pathway elements. Commun Integr Biol 2010; 3:549-54. [PMID: 21331236 DOI: 10.4161/cib.3.6.12845] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 06/29/2010] [Accepted: 06/29/2010] [Indexed: 01/08/2023] Open
Abstract
A Bayesian network expansion algorithm called BN+1 was developed to identify undocumented gene interactions in a known pathway using microarray gene expression data. In our recent paper, the BN+1 algorithm was successfully used to identify key regulators, including uspE, in the E. coli ROS pathway and biofilm formation [18]. In this report, a synthetic network was designed to further evaluate this algorithm. The BN+1 method was found to identify both linear and nonlinear relationships and to correctly identify variables near the starting network. Using experimentally derived data, the BN+1 method identifies the gene fdhE as a potentially new ROS regulator. Finally, a range of possible score cutoff methods is explored to identify a set of criteria for selecting BN+1 calls.
Affiliation(s)
- Andrew P Hodges
- Center for Computational Medicine and Bioinformatics; University of Michigan Medical School; University of Michigan; Michigan USA
|
45
|
Li Y, Zhu Y, Bai X, Cai H, Ji W, Guo D. ReTRN: A retriever of real transcriptional regulatory network and expression data for evaluating structure learning algorithm. Genomics 2009; 94:349-54. [DOI: 10.1016/j.ygeno.2009.08.009] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 02/19/2009] [Revised: 06/26/2009] [Accepted: 08/18/2009] [Indexed: 11/24/2022]
|
46
|
Lee WP, Tzou WS. Computational methods for discovering gene networks from expression data. Brief Bioinform 2009; 10:408-23. [PMID: 19505889 DOI: 10.1093/bib/bbp028] [Citation(s) in RCA: 87] [Impact Index Per Article: 5.4] [Indexed: 11/12/2022] Open
Abstract
Designing and conducting experiments are routine practices for modern biologists. The real challenge, especially in the post-genome era, usually comes not from acquiring data, but from subsequent activities such as data processing, analysis, knowledge generation and gaining insight into the research question of interest. The approach of inferring gene regulatory networks (GRNs) has been flourishing for many years, and new methods from mathematics, information science, engineering and social sciences have been applied. We review different kinds of computational methods biologists use to infer networks of varying levels of accuracy and complexity. The primary concern of biologists is how to translate the inferred network into hypotheses that can be tested with real-life experiments. Taking the biologists' viewpoint, we scrutinized several methods for predicting GRNs in mammalian cells, and more importantly show how the power of different knowledge databases of different types can be used to identify modules and subnetworks, thereby reducing complexity and facilitating the generation of testable hypotheses.
Affiliation(s)
- Wei-Po Lee
- Department of Information Management, National Sun Yat-sen University, Kaohsiung, Taiwan.
|
47
|
He F, Balling R, Zeng AP. Reverse engineering and verification of gene networks: principles, assumptions, and limitations of present methods and future perspectives. J Biotechnol 2009; 144:190-203. [PMID: 19631244 DOI: 10.1016/j.jbiotec.2009.07.013] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.4] [Received: 03/30/2009] [Revised: 07/13/2009] [Accepted: 07/16/2009] [Indexed: 12/21/2022]
Abstract
Reverse engineering of gene networks aims at revealing the structure of the gene regulation network in a biological system by reasoning backward directly from experimental data. Many methods have recently been proposed for reverse engineering of gene networks by using gene transcript expression data measured by microarray. Whereas the potentials of the methods have been well demonstrated, the assumptions and limitations behind them are often not clearly stated or not well understood. In this review, we first briefly explain the principles of the major methods, identify the assumptions behind them and pinpoint the limitations and possible pitfalls in applying them to real biological questions. With regard to applications, we then discuss challenges in the experimental verification of gene networks generated from reverse engineering methods. We further propose an optimal experimental design for allocating sampling schedule and possible strategies for reducing the limitations of some of the current reverse engineering methods. Finally, we examine the perspectives for the development of reverse engineering and urge the need to move from revealing network structure to the dynamics of biological systems.
Affiliation(s)
- Feng He
- Helmholtz Centre for Infection Research, D-38124 Braunschweig, Germany
|
48
|
Luo W, Friedman MS, Shedden K, Hankenson KD, Woolf PJ. GAGE: generally applicable gene set enrichment for pathway analysis. BMC Bioinformatics 2009; 10:161. [PMID: 19473525 PMCID: PMC2696452 DOI: 10.1186/1471-2105-10-161] [Citation(s) in RCA: 970] [Impact Index Per Article: 60.6] [Received: 06/26/2008] [Accepted: 05/27/2009] [Indexed: 11/11/2022] Open
Abstract
Background: Gene set analysis (GSA) is a widely used strategy for gene expression data analysis based on pathway knowledge. GSA focuses on sets of related genes and has established major advantages over individual gene analyses, including greater robustness, sensitivity and biological relevance. However, previous GSA methods have limited usage as they cannot handle datasets of different sample sizes or experimental designs. Results: To address these limitations, we present a new GSA method called Generally Applicable Gene-set Enrichment (GAGE). We successfully apply GAGE to multiple microarray datasets with different sample sizes, experimental designs and profiling techniques. GAGE shows significantly better results when compared to two other commonly used GSA methods, GSEA and PAGE. We demonstrate this improvement in the following three aspects: (1) consistency across repeated studies/experiments; (2) sensitivity and specificity; (3) biological relevance of the regulatory mechanisms inferred. GAGE reveals novel and relevant regulatory mechanisms from both published and previously unpublished microarray studies. From two published lung cancer data sets, GAGE derived a more cohesive and predictive mechanistic scheme underlying lung cancer progression and metastasis. For a previously unpublished BMP6 study, GAGE predicted novel regulatory mechanisms for BMP6-induced osteoblast differentiation, including the canonical BMP-TGF beta signaling, JAK-STAT signaling, Wnt signaling, and estrogen signaling pathways, all of which are supported by the experimental literature. Conclusion: GAGE is generally applicable to gene expression datasets with different sample sizes and experimental designs. GAGE consistently outperformed the two most frequently used GSA methods and inferred statistically and biologically more relevant regulatory pathways. The GAGE method is implemented in R in the "gage" package, available under the GNU GPL from .
Affiliation(s)
- Weijun Luo
- Department of Biomedical Engineering, University of Michigan, Ann Arbor, MI 48109, USA.
|