1
NetREX-CF integrates incomplete transcription factor data with gene expression to reconstruct gene regulatory networks. Commun Biol 2022; 5:1282. [PMID: 36418514 PMCID: PMC9684490 DOI: 10.1038/s42003-022-04226-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2021] [Accepted: 11/04/2022] [Indexed: 11/25/2022] Open
Abstract
The inference of Gene Regulatory Networks (GRNs) is one of the key challenges in systems biology. Leading algorithms utilize, in addition to gene expression, prior knowledge such as Transcription Factor (TF) DNA binding motifs or results of TF binding experiments. However, such prior knowledge is typically incomplete; therefore, integrating it with gene expression to infer GRNs remains difficult. To address this challenge, we introduce NetREX-CF (Regulatory Network Reconstruction using EXpression and Collaborative Filtering), a GRN reconstruction approach that brings together Collaborative Filtering, to address the incompleteness of the prior knowledge, and a biologically justified model of gene expression (a sparse Network Component Analysis based model). We validated NetREX-CF using yeast data and then used it to construct the GRN for Drosophila Schneider 2 (S2) cells. To corroborate the GRN, we performed a large-scale RNA-Seq analysis following high-throughput RNAi treatment against all 465 expressed TFs in the cell line. Our knockdown results not only extensively validate the GRN we built, but also provide a benchmark that our community can use for evaluating GRNs. Finally, using previously published human data, we demonstrate that NetREX-CF can infer GRNs from single-cell RNA-Seq and outperforms other methods.
2
Yang X, Tipton CM, Woodruff MC, Zhou E, Lee FEH, Sanz I, Qiu P. GLaMST: grow lineages along minimum spanning tree for B cell receptor sequencing data. BMC Genomics 2020; 21:583. [PMID: 32900378 PMCID: PMC7488003 DOI: 10.1186/s12864-020-06936-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/27/2022] Open
Abstract
Background: B cell affinity maturation enables B cells to generate high-affinity antibodies. This process involves somatic hypermutation of B cell immunoglobulin receptor (BCR) genes and selection by their ability to bind antigens. Lineage trees are used to describe this microevolution of B cell immunoglobulin genes. In a lineage tree, each node is one BCR sequence that mutated from the germinal center and each directed edge represents a single base mutation, insertion or deletion. BCR sequencing data contain only a subset of the BCR sequences in this microevolution process. Therefore, reconstructing the lineage tree from experimental data requires algorithms that build the tree from partially observed tree nodes. Results: We developed a new algorithm named Grow Lineages along Minimum Spanning Tree (GLaMST), which efficiently reconstructs the lineage tree given observed BCR sequences that correspond to a subset of the tree nodes. In comparisons using simulated and real data, GLaMST outperforms existing algorithms in simulations with high rates of mutation, insertion and deletion, and generates lineage trees that are smaller and closer to the ground truth according to tree features that are highly correlated with selection pressure. Conclusions: GLaMST outperforms the state of the art in reconstruction of the BCR lineage tree in both efficiency and accuracy. Integrating it into existing BCR sequencing analysis frameworks can significantly improve the lineage tree reconstruction aspect of the analysis.
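The starting idea suggested by the method's name, a minimum spanning tree over pairwise edit distances between observed sequences, can be sketched as follows. This is only an illustration under stated assumptions, not the GLaMST algorithm: it does not infer unobserved intermediate nodes, and the function names, the choice of Prim's algorithm, and the toy sequences are hypothetical.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitution, insertion and deletion each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def minimum_spanning_tree(seqs, root=0):
    """Prim's algorithm over edit distances.

    Returns a list of (parent_index, child_index, distance) edges, grown
    outward from `root` (assumed here to be the germline-like sequence).
    """
    n = len(seqs)
    dist = {(i, j): edit_distance(seqs[i], seqs[j])
            for i, j in combinations(range(n), 2)}

    def dist_between(i, j):
        return dist[(min(i, j), max(i, j))]

    in_tree = {root}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge leaving the current tree; ties broken by index.
        parent, child = min(
            ((i, j) for i in in_tree for j in range(n) if j not in in_tree),
            key=lambda e: (dist_between(*e), e),
        )
        edges.append((parent, child, dist_between(parent, child)))
        in_tree.add(child)
    return edges


if __name__ == "__main__":
    toy_sequences = ["ACGT", "ACGA", "ACTA", "ACGTT"]  # hypothetical reads
    for parent, child, d in minimum_spanning_tree(toy_sequences):
        print(toy_sequences[parent], "->", toy_sequences[child], "distance", d)
```

Edges with distance greater than 1 in such a tree mark places where a lineage method would need to posit unobserved intermediate sequences.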
Affiliation(s)
- Xingyu Yang
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, USA
- Christopher M Tipton
  - Department of Medicine, Division of Rheumatology, Emory University, Atlanta, USA
- Matthew C Woodruff
  - Department of Medicine, Division of Rheumatology, Emory University, Atlanta, USA
- Enlu Zhou
  - School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, USA
- Iñaki Sanz
  - Department of Medicine, Division of Rheumatology, Emory University, Atlanta, USA
- Peng Qiu
  - Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, USA
3
Liu W, Rajapakse JC. Fusing gene expressions and transitive protein-protein interactions for inference of gene regulatory networks. BMC Systems Biology 2019; 13:37. [PMID: 30953534 PMCID: PMC6449891 DOI: 10.1186/s12918-019-0695-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/02/2022]
Abstract
BACKGROUND Systematic fusion of multiple data sources for Gene Regulatory Network (GRN) inference remains a key challenge in systems biology. We incorporate information from protein-protein interaction networks (PPIN) into the process of GRN inference from gene expression (GE) data. However, existing PPIN remain sparse, and transitive protein interactions can help predict missing protein interactions. We therefore propose a systematic probabilistic framework for fusing GE data and transitive protein interaction data to coherently build GRN. RESULTS We use a Gaussian Mixture Model (GMM) to soft-cluster GE data, allowing overlapping cluster memberships. Next, a heuristic method is proposed to extend sparse PPIN by incorporating transitive linkages. We then propose a novel way to score extended protein interactions by combining topological properties of PPIN and correlations of GE. Following this, GE data and extended PPIN are fused using a Gaussian Hidden Markov Model (GHMM) in order to identify gene regulatory pathways and refine interaction scores that are then used to constrain the GRN structure. We employ a Bayesian Gaussian Mixture (BGM) model to refine the GRN derived from GE data by using the structural priors derived from GHMM. Experiments on real yeast regulatory networks demonstrate both the feasibility of the extended PPIN in predicting transitive protein interactions and its effectiveness in improving the coverage and accuracy of the proposed method of fusing PPIN and GE to build GRN. CONCLUSION The GE and PPIN fusion model outperforms both the state-of-the-art single data source models (CLR, GENIE3, TIGRESS) and existing fusion models under various constraints.
Affiliation(s)
- Wenting Liu
  - School of Public Health and Management, Hubei University of Medicine, Shiyan, Hubei, China
  - Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA, USA
- Jagath C. Rajapakse
  - School of Computer Engineering, Nanyang Technological University, Singapore, Singapore
4
Ko Y, Kim J, Rodriguez-Zas SL. Markov chain Monte Carlo simulation of a Bayesian mixture model for gene network inference. Genes Genomics 2019; 41:547-555. [PMID: 30741379 DOI: 10.1007/s13258-019-00789-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/16/2018] [Accepted: 01/21/2019] [Indexed: 12/31/2022]
Abstract
BACKGROUND Simultaneous measurement of expression levels for thousands of genes yields rich information about many different aspects of biological mechanisms. A major computational challenge is to find methods to extract new biological insights from this wealth of data. Complex biological processes are often regulated under various conditions or circumstances, and the associated gene interactions change dynamically depending on the biological context. Thus, inference of such dynamic relationships between genes with consideration of biological conditions is very challenging. METHOD In this study, we propose a comprehensive and integrated approach to infer the dynamic relationships between genes and evaluate this approach on three distinct gene networks. RESULTS This study demonstrates the advantage of integrating Markov chain Monte Carlo (MCMC) simulation into a Bayesian mixture model to overcome the high-dimension, low sample size (HDLSS) problem as well as to identify context-specific biological modules. Such biological modules were identified through the summarization of sampled network structures obtained from MCMC simulation. CONCLUSION This novel approach gives a comprehensive understanding of the dynamically regulated biological modules.
Affiliation(s)
- Younhee Ko
  - Division of Biomedical Engineering, Hankuk University of Foreign Studies, Gyeonggi-do, 17035, South Korea
- Jaebum Kim
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, South Korea
- Sandra L Rodriguez-Zas
  - Department of Animal Sciences, University of Illinois at Urbana-Champaign, Champaign, IL, 61820, USA
  - Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL, 61820, USA
5
Abstract
Gaussian process dynamical systems (GPDS) represent Bayesian nonparametric approaches to inference of nonlinear dynamical systems, and provide a principled framework for the learning of biological networks from multiple perturbed time series measurements of gene or protein expression. Such approaches are able to capture the full richness of complex ODE models, and can be scaled for inference in moderately large systems containing hundreds of genes. Related hierarchical approaches allow for inference from multiple datasets in which the underlying generative networks are assumed to have been rewired, either by context-dependent changes in network structure, evolutionary processes, or synthetic manipulation. These approaches can also be used to leverage experimentally determined network structures from one species into another where the network structure is unknown. Collectively, these methods provide a comprehensive and flexible platform for inference from a diverse range of data, with applications in systems and synthetic biology, as well as spatiotemporal modelling of embryo development. In this chapter we provide an overview of GPDS approaches and highlight their applications in the biological sciences, with accompanying tutorials available as a Jupyter notebook from https://github.com/cap76/GPDS .
Affiliation(s)
- Iulia Gherman
  - Warwick Integrative Synthetic Biology Centre, School of Engineering, University of Warwick, Coventry, UK
- Anastasiya Sybirna
  - Wellcome/CRUK Gurdon Institute, University of Cambridge, Cambridge, UK
  - Wellcome/MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK
  - Physiology, Development and Neuroscience Department, University of Cambridge, Cambridge, UK
- David L Wild
  - Department of Statistics and Systems Biology Centre, University of Warwick, Coventry, UK
6
Dondelinger F, Mukherjee S. Statistical Network Inference for Time-Varying Molecular Data with Dynamic Bayesian Networks. Methods Mol Biol 2019; 1883:25-48. [PMID: 30547395 DOI: 10.1007/978-1-4939-8882-2_2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 03/24/2023]
Abstract
In this chapter, we review the problem of network inference from time-course data, focusing on a class of graphical models known as dynamic Bayesian networks (DBNs). We discuss the relationship of DBNs to models based on ordinary differential equations, and consider extensions to nonlinear time dynamics. We provide an introduction to time-varying DBN models, which allow for changes to the network structure and parameters over time. We also discuss causal perspectives on network inference, including issues around model semantics that can arise due to missing variables. We present a case study of applying time-varying DBNs to gene expression measurements over the life cycle of Drosophila melanogaster. We finish with a discussion of future perspectives, including possible applications of time-varying network inference to single-cell gene expression data.
Affiliation(s)
- Sach Mukherjee
  - German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany
7
Bellazzi R, Engel F, Ferrazzi F. Gene network analysis: from heart development to cardiac therapy. Thromb Haemost 2017; 113:522-31. [DOI: 10.1160/th14-06-0483] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 06/01/2014] [Accepted: 08/14/2014] [Indexed: 12/31/2022]
Abstract
Summary: Networks offer a flexible framework to represent and analyse the complex interactions between components of cellular systems. In particular, gene networks inferred from expression data can support the identification of novel hypotheses on regulatory processes. In this review we focus on the use of gene network analysis in the study of heart development. Understanding heart development will promote the elucidation of the aetiology of congenital heart disease and thus possibly improve diagnostics. Moreover, it will help to establish cardiac therapies. For example, understanding cardiac differentiation during development will help to guide the stem cell differentiation required for cardiac tissue engineering or to enhance endogenous repair mechanisms. We introduce different methodological frameworks to infer networks from expression data, such as Boolean and Bayesian networks. Then we present currently available temporal expression data in heart development and discuss the use of network-based approaches in published studies. Collectively, our literature-based analysis indicates that gene network analysis constitutes a promising opportunity to infer therapy-relevant regulatory processes in heart development. However, the use of network-based approaches has so far been limited by the small number of samples in available datasets. Thus, we propose to acquire high-resolution temporal expression data to improve the mathematical descriptions of regulatory processes obtained with gene network inference methodologies. In particular, probabilistic methods that accommodate the intrinsic variability of biological systems have the potential to contribute to a deeper understanding of heart development.
8
Kannan V, Tegner J. Adaptive input data transformation for improved network reconstruction with information theoretic algorithms. Stat Appl Genet Mol Biol 2016; 15:507-520. [PMID: 27875324 DOI: 10.1515/sagmb-2016-0013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
We propose a novel systematic procedure of non-linear data transformation for an adaptive algorithm in the context of network reverse-engineering using information theoretic methods. Our methodology is rooted in elucidating and correcting for the specific biases in the estimation techniques for mutual information (MI) given a finite sample of data. These are, in turn, tied to the lack of well-defined bounds for numerical estimation of MI for continuous probability distributions from finite data. The nature and properties of the inevitable bias are described, complemented by several examples illustrating their form and variation. We propose an adaptive partitioning scheme for MI estimation that effectively transforms the sample data using parameters determined from its local and global distribution, guaranteeing a more robust and reliable reconstruction algorithm. Together with a normalized measure (Shared Information Metric), we report considerably enhanced performance for both in silico and real-world biological networks. We also find that the recovery of true interactions is better in particular for an intermediate range of false positive rates, suggesting that our algorithm is less vulnerable to spurious signals of association.
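To make the object of the bias discussion concrete, the sketch below computes a plain plug-in MI estimate from discretized samples. It is a minimal illustration under stated assumptions: the equal-frequency binning and all function names are hypothetical choices, and this is neither the paper's adaptive partitioning scheme nor its Shared Information Metric.

```python
import math
from collections import Counter


def equal_frequency_bins(values, n_bins):
    """Assign each value a bin index so that bins hold roughly equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def mutual_information(x, y, n_bins=4):
    """Plug-in (maximum likelihood) MI estimate in nats from binned samples."""
    bx = equal_frequency_bins(x, n_bins)
    by = equal_frequency_bins(y, n_bins)
    n = len(x)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    # Sum over occupied cells of p(a, b) * log(p(a, b) / (p(a) * p(b))).
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


if __name__ == "__main__":
    x = list(range(16))
    print(mutual_information(x, x))                       # fully dependent
    print(mutual_information([i // 4 for i in range(16)],
                             [i % 4 for i in range(16)]))  # independent grid
```

With few samples relative to the number of bins, a plug-in estimate of this kind tends to report spurious dependence, which is the finite-sample bias that motivates adaptive partitioning.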
9
Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W. Learning from Co-expression Networks: Possibilities and Challenges. Frontiers in Plant Science 2016; 7:444. [PMID: 27092161 PMCID: PMC4825623 DOI: 10.3389/fpls.2016.00444] [Citation(s) in RCA: 185] [Impact Index Per Article: 23.1] [Received: 01/14/2016] [Accepted: 03/21/2016] [Indexed: 05/18/2023]
Abstract
Plants are fascinating and complex organisms. A comprehensive understanding of the organization, function and evolution of plant genes is essential to disentangle important biological processes and to advance crop engineering and breeding strategies. The ultimate aim in deciphering complex biological processes is the discovery of causal genes and regulatory mechanisms controlling these processes. The recent surge of omics data has opened the door to a system-wide understanding of the flow of biological information underlying complex traits. However, dealing with the corresponding large data sets represents a challenging endeavor that calls for the development of powerful bioinformatics methods. A popular approach is the construction and analysis of gene networks. Such networks are often used for genome-wide representation of the complex functional organization of biological systems. Networks based on similarity in gene expression are called (gene) co-expression networks. One of the major applications of gene co-expression networks is the functional annotation of unknown genes. Constructing co-expression networks is generally straightforward. In contrast, the resulting network of connected genes can become very complex, which limits its biological interpretation. Several strategies can be employed to enhance the interpretation of the networks. To infer reliable networks, a strategy that is coherent with the biological question addressed needs to be established. Additional benefits can be gained from network-based strategies using prior knowledge and data integration to further enhance the elucidation of gene regulatory relationships. As a result, biological networks provide many more applications beyond the simple visualization of co-expressed genes. In this study we review the different approaches for co-expression network inference in plants. We analyse integrative genomics strategies used in recent studies that successfully identified candidate genes taking advantage of gene co-expression networks. Additionally, we discuss promising bioinformatics approaches that predict networks for specific purposes.
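The basic construction the abstract calls straightforward can be sketched as follows: compute a similarity between every pair of expression profiles and keep the pairs above a cutoff. This is a minimal illustration, assuming Pearson correlation and a hard threshold of 0.8; the function names, the threshold and the toy expression values are hypothetical and are not taken from the review.

```python
import math
from itertools import combinations


def pearson(u, v):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)


def coexpression_edges(expr, threshold=0.8):
    """Return (gene1, gene2, r) for gene pairs with |r| >= threshold.

    `expr` maps a gene name to its expression values, measured across the
    same samples in the same order for every gene.
    """
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


if __name__ == "__main__":
    toy_expression = {  # hypothetical values for three genes, five samples
        "geneA": [1, 2, 3, 4, 5],
        "geneB": [2, 4, 6, 8, 10],
        "geneC": [5, 1, 4, 2, 3],
    }
    print(coexpression_edges(toy_expression))
```

The choice of similarity measure and cutoff largely determines how dense the resulting network is, and so how hard it is to interpret.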
Affiliation(s)
- Elise A. R. Serin
  - Wageningen Seed Lab, Laboratory of Plant Physiology, Wageningen University, Wageningen, Netherlands
- Harm Nijveen
  - Wageningen Seed Lab, Laboratory of Plant Physiology, Wageningen University, Wageningen, Netherlands
  - Laboratory of Bioinformatics, Wageningen University, Wageningen, Netherlands
- Henk W. M. Hilhorst
  - Wageningen Seed Lab, Laboratory of Plant Physiology, Wageningen University, Wageningen, Netherlands
- Wilco Ligterink
  - Wageningen Seed Lab, Laboratory of Plant Physiology, Wageningen University, Wageningen, Netherlands
  - *Correspondence: Wilco Ligterink
10
Penfold CA, Millar JBA, Wild DL. Inferring orthologous gene regulatory networks using interspecies data fusion. Bioinformatics 2015; 31:i97-105. [PMID: 26072515 PMCID: PMC4765882 DOI: 10.1093/bioinformatics/btv267] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 12/31/2022] Open
Abstract
Motivation: The ability to jointly learn gene regulatory networks (GRNs) in, or leverage GRNs between, related species would allow the vast amount of legacy data obtained in model organisms to inform the GRNs of more complex, or economically or medically relevant, counterparts. Examples include transferring information from Arabidopsis thaliana into related crop species for food security purposes, or from mice into humans for medical applications. Here we develop two related Bayesian approaches to network inference that allow GRNs to be jointly inferred in, or leveraged between, several related species: in one framework, network information is directly propagated between species; in the second hierarchical approach, network information is propagated via an unobserved ‘hypernetwork’. In both frameworks, information about network similarity is captured via graph kernels, with the networks additionally informed by species-specific time series gene expression data, when available, using Gaussian processes to model the dynamics of gene expression. Results: Results on in silico benchmarks demonstrate that joint inference, and leveraging of known networks between species, offers better accuracy than standalone inference. The direct propagation of network information via the non-hierarchical framework is more appropriate when there are relatively few species, while the hierarchical approach is better suited when there are many species. Both methods are robust to small amounts of mislabelling of orthologues. Finally, the use of Saccharomyces cerevisiae data and networks to inform inference of networks in the fission yeast Schizosaccharomyces pombe predicts a novel role in cell cycle regulation for Gas1 (SPAC19B12.02c), a 1,3-beta-glucanosyltransferase. Availability and implementation: MATLAB code is available from http://go.warwick.ac.uk/systemsbiology/software/. Contact: d.l.wild@warwick.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Christopher A Penfold
  - Warwick Systems Biology Centre and Biomedical Cell Biology, Warwick Medical School, University of Warwick, Coventry CV4 7AL, UK
- Jonathan B A Millar
  - Warwick Systems Biology Centre and Biomedical Cell Biology, Warwick Medical School, University of Warwick, Coventry CV4 7AL, UK
- David L Wild
  - Warwick Systems Biology Centre and Biomedical Cell Biology, Warwick Medical School, University of Warwick, Coventry CV4 7AL, UK
11
Abstract
Directed acyclic graphs (DAGs) and associated probability models are widely used to model neural connectivity and communication channels. In many experiments, data are collected from multiple subjects whose connectivities may differ but are likely to share many features. In such circumstances, it is natural to leverage similarity among subjects to improve statistical efficiency. The first exact algorithm for estimation of multiple related DAGs was recently proposed by Oates, Smith, Mukherjee, and Cussens ( 2014 ). In this letter we present examples and discuss implications of the methodology as applied to the analysis of fMRI data from a multisubject experiment. Elicitation of tuning parameters requires care, and we illustrate how this may proceed retrospectively based on technical replicate data. In addition to joint learning of subject-specific connectivity, we allow for heterogeneous collections of subjects and simultaneously estimate relationships between the subjects themselves. This letter aims to highlight the potential for exact estimation in the multisubject setting.
Affiliation(s)
- C J Oates
  - Department of Statistics, University of Warwick, Coventry, CV4 7AL, U.K.
12
Maiti A, Reddy R, Mukherjee A. Structural prediction of dynamic Bayesian network with partial prior information. IEEE Trans Nanobioscience 2014; 14:95-103. [PMID: 25314704 DOI: 10.1109/tnb.2014.2361838] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/06/2022]
Abstract
The prediction of the structure of a hidden dynamic Bayesian network (DBN) from a noisy dataset is an important and challenging task. This work presents a generalized framework to infer the DBN structure with partial prior information. In the proposed framework, the partial information about the network structure is provided in the form of a prior. The proposed method makes use of prior information regarding the presence as well as the absence of some of the edges. Using the noisy dataset and the partial prior information, the method is able to infer a nearly accurate network structure. The proposed method is validated using simulated datasets. In addition, two real biological datasets are used to infer hidden biological interaction networks.
13
Oates CJ, Korkola J, Gray JW, Mukherjee S. Joint estimation of multiple related biological networks. Ann Appl Stat 2014. [DOI: 10.1214/14-aoas761] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 11/19/2022]
14
Kiani NA, Kaderali L. Dynamic probabilistic threshold networks to infer signaling pathways from time-course perturbation data. BMC Bioinformatics 2014; 15:250. [PMID: 25047753 PMCID: PMC4133630 DOI: 10.1186/1471-2105-15-250] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 07/09/2013] [Accepted: 07/15/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Network inference deals with the reconstruction of molecular networks from experimental data. Given N molecular species, the challenge is to find the underlying network. Due to data limitations, this typically is an ill-posed problem, and requires the integration of prior biological knowledge or strong regularization. We here focus on the situation when time-resolved measurements of a system's response after systematic perturbations are available. RESULTS We present a novel method to infer signaling networks from time-course perturbation data. We utilize dynamic Bayesian networks with probabilistic Boolean threshold functions to describe protein activation. The model posterior distribution is analyzed using evolutionary MCMC sampling and subsequent clustering, resulting in probability distributions over alternative networks. We evaluate our method on simulated data, and study its performance with respect to data set size and levels of noise. We then use our method to study EGF-mediated signaling in the ERBB pathway. CONCLUSIONS Dynamic Probabilistic Threshold Networks is a new method to infer signaling networks from time-series perturbation data. It exploits the dynamic response of a system after external perturbation for network reconstruction. On simulated data, we show that the approach outperforms current state of the art methods. On the ERBB data, our approach recovers a significant fraction of the known interactions, and predicts novel mechanisms in the ERBB pathway.
Affiliation(s)
- Narsis A Kiani
  - Technische Universität Dresden, Medical Faculty Carl Gustav Carus, Institute for Medical Informatics and Biometry, Fetscherstr, 74, 01307 Dresden, Germany
15
López-Kleine L, Leal L, López C. Biostatistical approaches for the reconstruction of gene co-expression networks based on transcriptomic data. Brief Funct Genomics 2013; 12:457-67. [PMID: 23407269 DOI: 10.1093/bfgp/elt003] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Indexed: 12/23/2022] Open
Abstract
Techniques in molecular biology have permitted the gathering of an extremely large amount of information relating organisms and their genes. The current challenge is assigning a putative function to thousands of genes that have been detected in different organisms. One of the most informative types of genomic data to achieve a better knowledge of protein function is gene expression data. Based on gene expression data and assuming that genes involved in the same function should have a similar or correlated expression pattern, a function can be attributed to those genes with unknown functions when they appear to be linked in a gene co-expression network (GCN). Several tools for the construction of GCNs have been proposed and applied to plant gene expression data. Here, we review recent methodologies used for plant gene expression data and compare the results, advantages and disadvantages in order to help researchers in their choice of a method for the construction of GCNs.
Affiliation(s)
- Liliana López-Kleine
  - Statistical Department, Universidad Nacional de Colombia, Ciudad Universitaria. Cra 30 No 45-03, Colombia
16
Kim H, Gelenbe E. Reconstruction of large-scale gene regulatory networks using Bayesian model averaging. IEEE Trans Nanobioscience 2013; 11:259-65. [PMID: 22987132 DOI: 10.1109/tnb.2012.2214233] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 11/09/2022]
Abstract
Gene regulatory networks provide a systematic view of molecular interactions in a complex living system. However, constructing large-scale gene regulatory networks is one of the most challenging problems in systems biology. In addition, large burst sets of biological data require a proper integration technique for reliable gene regulatory network construction. Here we present a new reverse engineering approach based on Bayesian model averaging, which attempts to combine all the appropriate models describing interactions among genes. This Bayesian approach, with a prior based on the Gibbs distribution, provides an efficient means to integrate multiple sources of biological data. In a simulation study with a maximum of 2000 genes, our method shows better sensitivity than previous elastic-net and Gaussian graphical models, with a fixed specificity of 0.99. The study also shows that the proposed method outperforms the other standard methods for a DREAM dataset generated by nonlinear stochastic models. In brain tumor data analysis, three large-scale networks consisting of 4422 genes were built using the gene expression of non-tumor, low and high grade tumor mRNA expression samples, along with DNA-protein binding affinity information. We found that genes having a large variation of degree distribution among the three tumor networks are the ones that seem most involved in regulatory and developmental processes, which possibly gives a novel insight beyond conventional differentially expressed gene analysis.
Affiliation(s)
- Haseong Kim
  - Intelligent Systems and Networks Group, Department of Electrical and Electronic Engineering, Imperial College London, London SW72AZ, UK
17
Penfold CA, Buchanan-Wollaston V, Denby KJ, Wild DL. Nonparametric Bayesian inference for perturbed and orthologous gene regulatory networks. Bioinformatics 2013; 28:i233-41. [PMID: 22689766 PMCID: PMC3371854 DOI: 10.1093/bioinformatics/bts222] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Indexed: 12/25/2022]
Abstract
Motivation: The generation of time series transcriptomic datasets collected under multiple experimental conditions has proven to be a powerful approach for disentangling complex biological processes, allowing for the reverse engineering of gene regulatory networks (GRNs). Most methods for reverse engineering GRNs from multiple datasets assume that each of the time series was generated from networks with identical topology. In this study, we outline a hierarchical, non-parametric Bayesian approach for reverse engineering GRNs using multiple time series that can be applied in a number of novel situations, including: (i) where different but overlapping sets of transcription factors are expected to bind in the different experimental conditions, that is, where switching events could potentially arise under the different treatments; and (ii) for inference in evolutionarily related species in which orthologous GRNs exist. More generally, the method can be used to identify context-specific regulation by leveraging time series gene expression data alongside methods that can identify putative lists of transcription factors or transcription factor targets. Results: The hierarchical inference outperforms related (but non-hierarchical) approaches when the networks used to generate the data were identical, and performs comparably even when the networks used to generate data were independent. The method was subsequently used alongside yeast one-hybrid and microarray time series data to infer potential transcriptional switches in the Arabidopsis thaliana response to stress. The results confirm previous biological studies and allow for additional insights into gene regulation under various abiotic stresses. Availability: The methods outlined in this article have been implemented in Matlab and are available on request. Contact: d.l.wild@warwick.ac.uk Supplementary Information: Supplementary data is available for this article.
|
18
|
Hill SM, Lu Y, Molina J, Heiser LM, Spellman PT, Speed TP, Gray JW, Mills GB, Mukherjee S. Bayesian inference of signaling network topology in a cancer cell line. Bioinformatics 2012; 28:2804-10. [PMID: 22923301 PMCID: PMC3476330 DOI: 10.1093/bioinformatics/bts514] [Citation(s) in RCA: 80] [Impact Index Per Article: 6.7] [Received: 05/28/2012] [Revised: 07/27/2012] [Accepted: 08/13/2012] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Protein signaling networks play a key role in cellular function, and their dysregulation is central to many diseases, including cancer. Shedding light on signaling network topology in specific contexts, such as cancer, requires interrogation of multiple proteins through time and statistical approaches to make inferences regarding network structure. RESULTS: In this study, we use dynamic Bayesian networks to make inferences regarding network structure and thereby generate testable hypotheses. We incorporate existing biology using informative network priors, weighted objectively by an empirical Bayes approach, and exploit a connection between variable selection and network inference to enable exact calculation of posterior probabilities of interest. The approach is computationally efficient and essentially free of user-set tuning parameters. Results on data where the true, underlying network is known place the approach favorably relative to existing approaches. We apply these methods to reverse-phase protein array time-course data from a breast cancer cell line (MDA-MB-468) to predict signaling links that we independently validate using targeted inhibition. The methods proposed offer a general approach by which to elucidate molecular networks specific to biological context, including, but not limited to, human cancers. AVAILABILITY: http://mukherjeelab.nki.nl/DBN (code and data).
Affiliation(s)
- Steven M Hill
- Department of Biochemistry, The Netherlands Cancer Institute, 1066 CX, Amsterdam, The Netherlands
|
19
|
Werhli AV. Comparing the reconstruction of regulatory pathways with distinct Bayesian networks inference methods. BMC Genomics 2012; 13 Suppl 5:S2. [PMID: 23095805 PMCID: PMC3477004 DOI: 10.1186/1471-2164-13-s5-s2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 12/03/2022] Open
Abstract
Background: Inference of biological networks has become an important tool in Systems Biology. It is becoming clearer that the complexity of organisms is related more to the organization of their components in networks than to the individual behaviour of those components. Among various approaches for inferring networks, Bayesian Networks are very attractive due to their probabilistic nature and flexibility to incorporate interventions and extra sources of information. Recently, various attempts to infer networks with different Bayesian Network approaches have been pursued. The specific interest in this paper is to compare the performance of three different inference approaches: Bayesian Networks without any modification; Bayesian Networks modified to take into account specific interventions produced during data collection; and a probabilistic hierarchical model that allows the inclusion of extra knowledge in the inference of Bayesian Networks. The inference is performed on three different types of data: (i) synthetic data obtained from a Gaussian distribution, (ii) synthetic data simulated with Netbuilder and (iii) real data obtained in flow cytometry experiments. Results: Bayesian Networks with interventions and Bayesian Networks with inclusion of extra knowledge outperform simple Bayesian Networks in all data sets when considering the reconstruction accuracy and taking the edge directions into account. In the real data the increase in accuracy is also observed when not taking the edge directions into account. Conclusions: Although it comes with a small extra computational cost, the use of more refined Bayesian network models is justified. Both the inclusion of extra knowledge and the use of interventions have outperformed the simple Bayesian network model in simulated and real data sets. Also, if the source of extra knowledge used in the inference is not reliable, the inferred network does not deteriorate. If the extra knowledge is in good agreement with the data, there is no significant difference between using Bayesian networks with interventions and Bayesian networks with the extra knowledge.
Affiliation(s)
- Adriano V Werhli
- Centro de Ciências Computacionais - C3, Universidade Federal do Rio Grande - FURG, RS, Brazil.
|
20
|
Dondelinger F, Lèbre S, Husmeier D. Non-homogeneous dynamic Bayesian networks with Bayesian regularization for inferring gene regulatory networks with gradually time-varying structure. Mach Learn 2012. [DOI: 10.1007/s10994-012-5311-x] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.6] [Indexed: 11/28/2022]
|
21
|
Lèbre S, Dondelinger F, Husmeier D. Nonhomogeneous dynamic Bayesian networks in systems biology. Methods Mol Biol 2012; 802:199-213. [PMID: 22130882 DOI: 10.1007/978-1-61779-400-1_13] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/31/2023]
Abstract
Dynamic Bayesian networks (DBNs) have received increasing attention from the computational biology community as models of gene regulatory networks. However, conventional DBNs are based on the homogeneous Markov assumption and cannot deal with inhomogeneity and nonstationarity in temporal processes. The present chapter provides a detailed discussion of how the homogeneity assumption can be relaxed. The improved method is evaluated on simulated data, where the network structure is allowed to change with time, and on gene expression time series during morphogenesis in Drosophila melanogaster.
Affiliation(s)
- Sophie Lèbre
- Université de Strasbourg, LSIIT - UMR 7005, Strasbourg, France
|
22
|
Vignes M, Vandel J, Allouche D, Ramadan-Alban N, Cierco-Ayrolles C, Schiex T, Mangin B, de Givry S. Gene regulatory network reconstruction using Bayesian networks, the Dantzig Selector, the Lasso and their meta-analysis. PLoS One 2011; 6:e29165. [PMID: 22216195 PMCID: PMC3246469 DOI: 10.1371/journal.pone.0029165] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.7] [Received: 07/01/2011] [Accepted: 11/22/2011] [Indexed: 11/18/2022] Open
Abstract
Modern technologies, and especially next generation sequencing facilities, are giving cheaper access to genotype and genomic data measured on the same sample at once. This creates an ideal situation for multifactorial experiments designed to infer gene regulatory networks. The fifth "Dialogue for Reverse Engineering Assessments and Methods" (DREAM5) challenges are aimed at assessing methods and associated algorithms devoted to the inference of biological networks. Challenge 3 on "Systems Genetics" proposed to infer causal gene regulatory networks from different genetical genomics data sets. We investigated a wide panel of methods ranging from Bayesian networks to penalised linear regressions to analyse such data, and proposed a simple yet very powerful meta-analysis, which combines these inference methods. We present results of the Challenge as well as a more in-depth analysis of predicted networks in terms of structure and reliability. The developed meta-analysis was ranked first among the 16 teams participating in Challenge 3A. It paves the way for future extensions of our inference method and more accurate gene network estimates in the context of genetical genomics.
Affiliation(s)
- Matthieu Vignes
- SaAB Team/BIA Unit, INRA Toulouse, Castanet-Tolosan, France.
|
23
|
Nan X, Fu G, Zhao Z, Liu S, Patel RY, Liu H, Daga PR, Doerksen RJ, Dang X, Chen Y, Wilkins D. Leveraging domain information to restructure biological prediction. BMC Bioinformatics 2011; 12 Suppl 10:S22. [PMID: 22166097 PMCID: PMC3236845 DOI: 10.1186/1471-2105-12-s10-s22] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022] Open
Abstract
Background It is commonly believed that including domain knowledge in a prediction model is desirable. However, representing and incorporating domain information in the learning process is, in general, a challenging problem. In this research, we consider domain information encoded by discrete or categorical attributes. A discrete or categorical attribute provides a natural partition of the problem domain, and hence divides the original problem into several non-overlapping sub-problems. In this sense, the domain information is useful if the partition simplifies the learning task. The goal of this research is to develop an algorithm to identify discrete or categorical attributes that maximally simplify the learning task. Results We consider restructuring a supervised learning problem via a partition of the problem space using a discrete or categorical attribute. A naive approach exhaustively searches all the possible restructured problems. It is computationally prohibitive when the number of discrete or categorical attributes is large. We propose a metric to rank attributes according to their potential to reduce the uncertainty of a classification task. It is quantified as a conditional entropy achieved using a set of optimal classifiers, each of which is built for a sub-problem defined by the attribute under consideration. To avoid high computational cost, we approximate the solution by the expected minimum conditional entropy with respect to random projections. This approach is tested on three artificial data sets, three cheminformatics data sets, and two leukemia gene expression data sets. Empirical results demonstrate that our method is capable of selecting a proper discrete or categorical attribute to simplify the problem, i.e., the performance of the classifier built for the restructured problem always beats that of the original problem. 
Conclusions The proposed conditional entropy based metric is effective in identifying good partitions of a classification problem, hence enhancing the prediction performance.
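The ranking idea in this abstract can be illustrated with a much simpler stand-in. The sketch below is an assumption-laden simplification: it ranks categorical attributes by the plain empirical conditional entropy H(label | attribute), not by the classifier-based estimate with random projections that the abstract describes. The function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(attribute, labels):
    """Empirical H(labels | attribute) in bits.

    Simplification: uses raw label frequencies inside each partition,
    not the optimal-classifier estimate described in the abstract.
    """
    groups = defaultdict(list)
    for value, label in zip(attribute, labels):
        groups[value].append(label)
    total = len(labels)
    entropy = 0.0
    for group_labels in groups.values():
        size = len(group_labels)
        counts = Counter(group_labels)
        group_entropy = -sum((c / size) * log2(c / size) for c in counts.values())
        entropy += (size / total) * group_entropy
    return entropy


def rank_attributes(table, labels):
    """Attribute names sorted from most to least uncertainty-reducing."""
    return sorted(table, key=lambda name: conditional_entropy(table[name], labels))


# Hypothetical toy data: "tissue" separates the classes, "batch" does not.
labels = [0, 0, 1, 1]
table = {
    "batch": ["x", "y", "x", "y"],
    "tissue": ["a", "a", "b", "b"],
}
ranking = rank_attributes(table, labels)
```

An attribute with low conditional entropy splits the data into sub-problems in which the class is nearly determined, which is the sense in which a partition "simplifies the learning task".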
Affiliation(s)
- Xiaofei Nan
- Department of Computer and Information Science, University of Mississippi, USA
|
24
|
Chang R, Shoemaker R, Wang W. A novel knowledge-driven systems biology approach for phenotype prediction upon genetic intervention. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011; 8:1170-1182. [PMID: 21282866 PMCID: PMC3211072 DOI: 10.1109/tcbb.2011.18] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Indexed: 05/30/2023]
Abstract
Deciphering the biological networks underlying complex phenotypic traits, e.g., human disease, is undoubtedly crucial to understanding the underlying molecular mechanisms and to developing effective therapeutics. Due to the network complexity and the relatively small number of available experiments, data-driven modeling faces a great challenge in deducing the functions of genes/proteins in the network and in phenotype formation. We propose a novel knowledge-driven systems biology method that utilizes qualitative knowledge to construct a Dynamic Bayesian network (DBN) to represent the biological network underlying a specific phenotype. Edges in this network depict physical interactions between genes and/or proteins. A qualitative knowledge model first translates typical molecular interactions into constraints when resolving the DBN structure and parameters. Therefore, the uncertainty of the network is restricted to a subset of models which are consistent with the qualitative knowledge. All models satisfying the constraints are considered as candidates for the underlying network. These consistent models are used to perform quantitative inference. By in silico inference, we can predict phenotypic traits upon genetic interventions and perturbations in the network. We applied our method to analyze the puzzling mechanism of the breast cancer cell proliferation network, and we accurately predicted cancer cell growth rate upon manipulating (anti)cancerous marker genes/proteins.
|
25
|
|
26
|
Chipman KC, Singh AK. Using stochastic causal trees to augment Bayesian networks for modeling eQTL datasets. BMC Bioinformatics 2011; 12:7. [PMID: 21211042 PMCID: PMC3032670 DOI: 10.1186/1471-2105-12-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 04/08/2010] [Accepted: 01/06/2011] [Indexed: 11/10/2022] Open
Abstract
Background The combination of genotypic and genome-wide expression data arising from segregating populations offers an unprecedented opportunity to model and dissect complex phenotypes. The immense potential offered by these data derives from the fact that genotypic variation is the sole source of perturbation and can therefore be used to reconcile changes in gene expression programs with the parental genotypes. To date, several methodologies have been developed for modeling eQTL data. These methods generally leverage genotypic data to resolve causal relationships among gene pairs implicated as associates in the expression data. In particular, leading studies have augmented Bayesian networks with genotypic data, providing a powerful framework for learning and modeling causal relationships. While these initial efforts have provided promising results, one major drawback associated with these methods is that they are generally limited to resolving causal orderings for transcripts most proximal to the genomic loci. In this manuscript, we present a probabilistic method capable of learning the causal relationships between transcripts at all levels in the network. We use the information provided by our method as a prior for Bayesian network structure learning, resulting in enhanced performance for gene network reconstruction. Results Using established protocols to synthesize eQTL networks and corresponding data, we show that our method achieves improved performance over existing leading methods. For the goal of gene network reconstruction, our method achieves improvements in recall ranging from 20% to 90% across a broad range of precision levels and for datasets of varying sample sizes. Additionally, we show that the learned networks can be utilized for expression quantitative trait loci mapping, resulting in upwards of 10-fold increases in recall over traditional univariate mapping. 
Conclusions Using the information from our method as a prior for Bayesian network structure learning yields large improvements in accuracy for the tasks of gene network reconstruction and expression quantitative trait loci mapping. In particular, our method is effective for establishing causal relationships between transcripts located both proximally and distally from genomic loci.
Affiliation(s)
- Kyle C Chipman
- Biomolecular Science and Engineering Program, UC Santa Barbara, Santa Barbara, CA, USA.
|
27
|
Modelling nonstationary gene regulatory processes. Adv Bioinformatics 2010. [PMID: 20721277 PMCID: PMC2913537 DOI: 10.1155/2010/749848] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 12/18/2009] [Accepted: 04/29/2010] [Indexed: 02/05/2023] Open
Abstract
An important objective in systems biology is to infer gene regulatory networks from postgenomic data, and dynamic Bayesian networks have been widely applied as a popular tool to this end. The standard approach for nondiscretised data is restricted to a linear model and a homogeneous Markov chain. Recently, various generalisations based on changepoint processes and free allocation mixture models have been proposed. The former aim to relax the homogeneity assumption, whereas the latter are more flexible and, in principle, more adequate for modelling nonlinear processes. In our paper, we compare both paradigms and discuss theoretical shortcomings of the latter approach. We show that a model based on the changepoint process yields systematically better results than the free allocation model when inferring nonstationary gene regulatory processes from simulated gene expression time series. We further cross-compare the performance of both models on three biological systems: macrophages challenged with viral infection, circadian regulation in Arabidopsis thaliana, and morphogenesis in Drosophila melanogaster.
|
28
|
Cheng H, Jiang L, Wu M, Liu Q. Inferring Transcriptional Interactions by the Optimal Integration of ChIP-chip and Knock-out Data. Bioinform Biol Insights 2009; 3:129-40. [PMID: 20140075 PMCID: PMC2808186 DOI: 10.4137/bbi.s3445] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/21/2022] Open
Abstract
Combining heterogeneous data sources for reliable prediction of transcriptional regulation is a challenge. Here we present an easy but powerful method to integrate chromatin immunoprecipitation (ChIP)-chip and knock-out data. Since these two types of data provide complementary (physical and functional) information about transcription, a method combining them is expected to achieve high detection rates and very low false positive rates. We seek the optimal integration of these two data types using the hypergeometric distribution. We evaluate our method on yeast data and compare our predictions with YEASTRACT, high-quality ChIP-chip data, and the literature. The results show that even using low-quality ChIP-chip data, our method uncovers more relations than those inferred before from high-quality data. Furthermore, our method achieves a low false positive rate. We find experimental and computational evidence in the literature for most transcription factor (TF)-gene relations uncovered by our method.
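To make the role of the hypergeometric distribution concrete, the sketch below computes an upper-tail probability for the overlap between two gene sets. It is only an illustration of the general test, assuming that the overlap between ChIP-chip-bound genes and knock-out-responsive genes is what gets scored; the abstract does not spell out the exact procedure, and the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_tail(population, bound, responsive, overlap):
    """P(X >= overlap) for a hypergeometric overlap count X.

    population: total number of genes considered
    bound:      genes bound by the TF (e.g. from ChIP-chip)
    responsive: genes whose expression changes in the TF knock-out
    overlap:    genes found in both sets
    """
    upper = min(bound, responsive)
    total = comb(population, responsive)
    favourable = sum(
        comb(bound, k) * comb(population - bound, responsive - k)
        for k in range(max(overlap, 0), upper + 1)
    )
    return favourable / total


# Hypothetical toy numbers: 10 genes, 4 bound, 3 responsive, 2 in common.
p_value = hypergeom_tail(10, 4, 3, 2)
```

A small tail probability indicates that the bound and responsive sets overlap more than expected by chance, which is the kind of evidence that would support a TF-gene relation.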
Affiliation(s)
- Haoyu Cheng
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China.
|
29
|
Christley S, Nie Q, Xie X. Incorporating existing network information into gene network inference. PLoS One 2009; 4:e6799. [PMID: 19710931 PMCID: PMC2729382 DOI: 10.1371/journal.pone.0006799] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Received: 05/06/2009] [Accepted: 07/27/2009] [Indexed: 12/31/2022] Open
Abstract
One methodology that has been successful in inferring gene networks from gene expression data is based upon ordinary differential equations (ODE). However, new types of data continue to be produced, so it is worthwhile to investigate how to integrate these new data types into the inference procedure. One such data type is physical interactions between transcription factors and the genes they regulate, as measured by ChIP-chip or ChIP-seq experiments. These interactions can be incorporated into the gene network inference procedure as a priori network information. In this article, we extend the ODE methodology into a general optimization framework that incorporates existing network information in combination with regularization parameters that encourage network sparsity. We provide theoretical results proving convergence of the estimator for our method and show the corresponding probabilistic interpretation also converges. We demonstrate our method on simulated network data and show that existing network information improves performance, overcomes the lack of observations, and performs well even when some of the existing network information is incorrect. We further apply our method to the core regulatory network of embryonic stem cells utilizing predicted interactions from two studies as existing network information. We show that including the prior network information yields a more closely representative regulatory network than when no information is provided.
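One generic way to combine a sparsity penalty with prior network information is an L1-penalised regression in which edges supported by prior evidence receive a smaller penalty. The sketch below is such a generic formulation solved by coordinate descent; it is not claimed to be the estimator of the article. It assumes a linear model in which the (approximated) time derivative of a target gene, `y`, is regressed on regulator expression levels, `X`. All names and penalty values are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (the proximal map of |w|)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def weighted_lasso(X, y, penalties, sweeps=200):
    """Minimise 0.5 * ||y - Xw||^2 + sum_j penalties[j] * |w[j]|.

    X is a list of rows (samples x regulators); y holds the target gene's
    approximate derivatives. A smaller penalties[j] encodes prior support
    for regulator j.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that ignores regulator j.
            rho = sum(X[i][j] * residual[i] for i in range(n)) + col_sq[j] * w[j]
            new_w = soft_threshold(rho, penalties[j]) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


# Hypothetical toy data: regulator 0 is supported by prior (e.g. ChIP) evidence.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 1.0]
prior_supported = {0}
penalties = [0.1 if j in prior_supported else 2.0 for j in range(2)]
weights = weighted_lasso(X, y, penalties)
```

In this toy case both regulators explain the data equally well, yet only the prior-supported edge keeps a non-zero weight, which shows how prior information can tip the balance when observations alone are uninformative.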
Affiliation(s)
- Scott Christley
- Department of Mathematics, University of California Irvine, Irvine, California, United States of America
- Department of Computer Science, University of California Irvine, Irvine, California, United States of America
- Center for Mathematical and Computational Biology, University of California Irvine, Irvine, California, United States of America
- Center for Complex Biological Systems, University of California Irvine, Irvine, California, United States of America
- * E-mail: (SC); (XX)
- Qing Nie
- Department of Mathematics, University of California Irvine, Irvine, California, United States of America
- Center for Mathematical and Computational Biology, University of California Irvine, Irvine, California, United States of America
- Center for Complex Biological Systems, University of California Irvine, Irvine, California, United States of America
- Xiaohui Xie
- Department of Computer Science, University of California Irvine, Irvine, California, United States of America
- Center for Mathematical and Computational Biology, University of California Irvine, Irvine, California, United States of America
- Center for Complex Biological Systems, University of California Irvine, Irvine, California, United States of America
- Institute for Genomics and Bioinformatics, University of California Irvine, Irvine, California, United States of America
- * E-mail: (SC); (XX)
|
30
|
Inference of gene pathways using mixture Bayesian networks. BMC Systems Biology 2009; 3:54. [PMID: 19454027 PMCID: PMC2701418 DOI: 10.1186/1752-0509-3-54] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 10/21/2008] [Accepted: 05/19/2009] [Indexed: 12/13/2022]
Abstract
BACKGROUND Inference of gene networks typically relies on measurements across a wide range of conditions or treatments. Although one network structure is predicted, the relationship between genes could vary across conditions. A comprehensive approach to infer general and condition-dependent gene networks was evaluated. This approach integrated Bayesian network and Gaussian mixture models to describe continuous microarray gene expression measurements, and three gene networks were predicted. RESULTS The first reconstructions of a circadian rhythm pathway in honey bees and an adherens junction pathway in mouse embryos were obtained. In addition, general and condition-specific gene relationships, some unexpected, were detected in these two pathways and in a yeast cell-cycle pathway. The mixture Bayesian network approach identified all (honey bee circadian rhythm and mouse adherens junction pathways) or the vast majority (yeast cell-cycle pathway) of the gene relationships reported in empirical studies. Findings across the three pathways and data sets indicate that the mixture Bayesian network approach is well-suited to infer gene pathways based on microarray data. Furthermore, the interpretation of model estimates provided a broader understanding of the relationships between genes. The mixture models offered a comprehensive description of the relationships among genes in complex biological processes or across a wide range of conditions. The mixture parameter estimates and corresponding odds that the gene network inferred for a sample pertained to each mixture component allowed the uncovering of both general and condition-dependent gene relationships and patterns of expression. CONCLUSION This study demonstrated the two main benefits of learning gene pathways using mixture Bayesian networks. 
First, the identification of the optimal number of mixture components supported by the data offered a robust approach to infer gene relationships and estimate gene expression profiles. Second, the classification of conditions and observations into groups that support particular mixture components helped to uncover gene relationships that are either unique to or common across conditions. Results from the application of mixture Bayesian networks substantially augmented the understanding of gene networks and demonstrated the added value of this methodology for inferring gene networks.
|