101
|
Ebrahimie E, Nurollah Z, Ebrahimi M, Hemmatzadeh F, Ignjatovic J. Unique ability of pandemic influenza to downregulate the genes involved in neuronal disorders. Mol Biol Rep 2015; 42:1377-90. [DOI: 10.1007/s11033-015-3916-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 11/24/2014] [Accepted: 07/22/2015] [Indexed: 01/01/2023]
|
102
|
Zhu F, Panwar B, Guan Y. Algorithms for modeling global and context-specific functional relationship networks. Brief Bioinform 2015; 17:686-95. [PMID: 26254431 DOI: 10.1093/bib/bbv065] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/26/2015] [Indexed: 02/07/2023]
Abstract
Functional genomics has enormous potential to facilitate our understanding of normal and disease-specific physiology. In the past decade, intensive research efforts have been focused on modeling functional relationship networks, which summarize the probability of gene co-functionality relationships. Such modeling can be based on either expression data only or heterogeneous data integration. Numerous methods have been deployed to infer the functional relationship networks, while most of them target the global (non-context-specific) functional relationship networks. However, it is expected that functional relationships consistently reprogram under different tissues or biological processes. Thus, advanced methods have been developed targeting tissue-specific or developmental stage-specific networks. This article brings together the state-of-the-art functional relationship network modeling methods, emphasizes the need for heterogeneous genomic data integration and context-specific network modeling and outlines future directions for functional relationship networks.
|
103
|
Ha MJ, Baladandayuthapani V, Do KA. DINGO: differential network analysis in genomics. Bioinformatics 2015; 31:3413-20. [PMID: 26148744 DOI: 10.1093/bioinformatics/btv406] [Citation(s) in RCA: 78] [Impact Index Per Article: 8.7] [Received: 07/21/2014] [Accepted: 06/26/2015] [Indexed: 12/22/2022]
Abstract
MOTIVATION Cancer progression and development are initiated by aberrations in various molecular networks through coordinated changes across multiple genes and pathways. It is important to understand how these networks change under different stress conditions and/or patient-specific groups to infer differential patterns of activation and inhibition. Existing methods are limited to correlation networks that are independently estimated from separate group-specific data and without due consideration of relationships that are conserved across multiple groups. METHOD We propose a pathway-based differential network analysis in genomics (DINGO) model for estimating group-specific networks and making inference on the differential networks. DINGO jointly estimates the group-specific conditional dependencies by decomposing them into global and group-specific components. The delineation of these components allows for a more refined picture of the major driver and passenger events in the elucidation of cancer progression and development. RESULTS Simulation studies demonstrate that DINGO provides more accurate group-specific conditional dependencies than achieved by using separate estimation approaches. We apply DINGO to key signaling pathways in glioblastoma to build differential networks for long-term survivors and short-term survivors in The Cancer Genome Atlas. The hub genes found by mRNA expression, DNA copy number, methylation and microRNA expression reveal several important roles in glioblastoma progression. AVAILABILITY AND IMPLEMENTATION R Package at: odin.mdacc.tmc.edu/∼vbaladan. CONTACT veera@mdanderson.org SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
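The contrast the abstract draws between correlation networks and conditional dependencies can be illustrated with a first-order partial correlation. The sketch below is not the DINGO model, which jointly estimates global and group-specific components; it only shows, under made-up correlation values for three hypothetical genes x, y and z, how an edge that looks strong marginally can vanish once a third gene is conditioned on, and how a simple group difference could then be read off.

```python
from math import sqrt


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y given z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical pairwise correlations among genes x, y, z in two patient groups
# (illustrative values, not taken from the paper).
group1 = {"xy": 0.64, "xz": 0.8, "yz": 0.8}  # x-y link fully explained by z
group2 = {"xy": 0.85, "xz": 0.8, "yz": 0.8}  # x-y link persists given z

p1 = partial_corr(group1["xy"], group1["xz"], group1["yz"])
p2 = partial_corr(group2["xy"], group2["xz"], group2["yz"])

# A naive "differential edge" score: the change in conditional dependence.
differential = p2 - p1
```

In this toy setting both groups show a sizeable marginal x-y correlation, but only the second group retains a conditional dependency, which is the kind of difference a differential network analysis aims to detect.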
Affiliation(s)
- Min Jin Ha
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Kim-Anh Do
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
|
104
|
Inferring Broad Regulatory Biology from Time Course Data: Have We Reached an Upper Bound under Constraints Typical of In Vivo Studies? PLoS One 2015; 10:e0127364. [PMID: 25984725 PMCID: PMC4435750 DOI: 10.1371/journal.pone.0127364] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/08/2014] [Accepted: 04/13/2015] [Indexed: 12/21/2022]
Abstract
There is a growing appreciation for the network biology that regulates the coordinated expression of molecular and cellular markers; however, questions persist regarding the identifiability of these networks. Here we explore some of the issues relevant to recovering directed regulatory networks from time course data collected under experimental constraints typical of in vivo studies. NetSim simulations of sparsely connected biological networks were used to evaluate two simple feature selection techniques used in the construction of linear Ordinary Differential Equation (ODE) models, namely truncation of terms versus latent vector projection. Performance was compared with ODE-based Time Series Network Identification (TSNI) integral, and the information-theoretic Time-Delay ARACNE (TD-ARACNE). Projection-based techniques and TSNI integral outperformed truncation-based selection and TD-ARACNE on aggregate networks with edge densities of 10-30%, i.e. transcription factor, protein-protein cliques and immune signaling networks. All were more robust to noise than truncation-based feature selection. Performance was comparable on the in silico 10-node DREAM 3 network, a 5-node Yeast synthetic network designed for In vivo Reverse-engineering and Modeling Assessment (IRMA) and a 9-node human HeLa cell cycle network of similar size and edge density. Performance was more sensitive to the number of time courses than to sample frequency and extrapolated better to larger networks by grouping experiments. In all cases performance declined rapidly in larger networks with lower edge density. The limited recovery and high false positive rates obtained overall bring into question our ability to generate informative time course data rather than the design of any particular reverse engineering algorithm.
|
105
|
Data Integration for Microarrays: Enhanced Inference for Gene Regulatory Networks. Microarrays 2015; 4:255-69. [PMID: 27600224 PMCID: PMC4996389 DOI: 10.3390/microarrays4020255] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/27/2015] [Accepted: 04/30/2015] [Indexed: 01/01/2023]
Abstract
Microarray technologies have been the basis of numerous important findings regarding gene expression in the last few decades. Studies have generated large amounts of data describing various processes, which, due to the existence of public databases, are widely available for further analysis. Given their lower cost and higher maturity compared to newer sequencing technologies, these data continue to be produced, even though data quality has been the subject of some debate. However, given the large volume of data generated, integration can help overcome some issues related, e.g., to noise or reduced time resolution, while providing additional insight on features not directly addressed by sequencing methods. Here, we present an integration test case based on public Drosophila melanogaster datasets (gene expression, binding site affinities, known interactions). Using an evolutionary computation framework, we show how integration can enhance the ability to recover transcriptional gene regulatory networks from these data, and we indicate which data types are more important for quantitative and qualitative network inference. Our results show a clear improvement in performance when multiple datasets are integrated, indicating that microarray data will remain a valuable and viable resource for some time to come.
|
106
|
Nepomuceno JA, Troncoso A, Nepomuceno-Chamorro IA, Aguilar-Ruiz JS. Integrating biological knowledge based on functional annotations for biclustering of gene expression data. Comput Methods Programs Biomed 2015; 119:163-80. [PMID: 25843807 DOI: 10.1016/j.cmpb.2015.02.010] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 07/22/2014] [Revised: 02/17/2015] [Accepted: 02/27/2015] [Indexed: 05/06/2023]
Abstract
Gene expression data analysis is based on the assumption that co-expressed genes imply co-regulated genes. This assumption is being reformulated because the co-expression of a group of genes may be the result of an independent activation with respect to the same experimental condition and not due to the same regulatory regime. For this reason, traditional techniques have recently been improved with the use of prior biological knowledge from open-access repositories together with gene expression data. Biclustering is an unsupervised machine learning technique that searches for patterns in gene expression data matrices. A scatter search-based biclustering algorithm that integrates biological information is proposed in this paper. In addition to the gene expression data matrix, the only input of the algorithm is a direct annotation file that relates each gene to a set of terms from a biological repository where genes are annotated. Two different biological measures, FracGO and SimNTO, are proposed to integrate this information by adding it to the fitness function to be optimized in the scatter search scheme. The measure FracGO is based on biological enrichment, and SimNTO is based on the overlap among GO annotations of pairs of genes. Experimental results evaluate the proposed algorithm on two datasets and show that the algorithm performs better when biological knowledge is integrated. Moreover, an analysis and comparison of the two biological measures is presented, and it is concluded that the differences depend on both the data source and how the annotation file has been built when GO is used. It is also shown that the proposed algorithm obtains a greater number of enriched biclusters than other classical biclustering algorithms typically used as benchmarks, and an analysis of the overlap among biclusters reveals that the biclusters obtained overlap little.
The proposed methodology is a general-purpose algorithm which allows the integration of biological information from several sources and can be extended to other biclustering algorithms based on the optimization of a merit function.
Affiliation(s)
- Juan A Nepomuceno
- Departamento de Lenguajes y Sistemas Informáticos, Universidad de Sevilla, Avd. Reina Mercedes s/n, 41012 Seville, Spain
- Alicia Troncoso
- Department of Computer Engineering, Pablo de Olavide University, Ctra. Utrera km. 1, 41013 Seville, Spain
- Isabel A Nepomuceno-Chamorro
- Departamento de Lenguajes y Sistemas Informáticos, Universidad de Sevilla, Avd. Reina Mercedes s/n, 41012 Seville, Spain
- Jesús S Aguilar-Ruiz
- Department of Computer Engineering, Pablo de Olavide University, Ctra. Utrera km. 1, 41013 Seville, Spain
|
107
|
Siebourg-Polster J, Mudrak D, Emmenlauer M, Rämö P, Dehio C, Greber U, Fröhlich H, Beerenwinkel N. NEMix: single-cell nested effects models for probabilistic pathway stimulation. PLoS Comput Biol 2015; 11:e1004078. [PMID: 25879530 PMCID: PMC4400057 DOI: 10.1371/journal.pcbi.1004078] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 06/26/2014] [Accepted: 12/08/2014] [Indexed: 11/18/2022]
Abstract
Nested effects models have been used successfully for learning subcellular networks from high-dimensional perturbation effects that result from RNA interference (RNAi) experiments. Here, we further develop the basic nested effects model using high-content single-cell imaging data from RNAi screens of cultured cells infected with human rhinovirus. RNAi screens with single-cell readouts are becoming increasingly common, and they often reveal high cell-to-cell variation. As a consequence of this cellular heterogeneity, knock-downs result in variable effects among cells and lead to weak average phenotypes on the cell population level. To address this confounding factor in network inference, we explicitly model the stimulation status of a signaling pathway in individual cells. We extend the framework of nested effects models to probabilistic combinatorial knock-downs and propose NEMix, a nested effects mixture model that accounts for unobserved pathway activation. We analyzed the identifiability of NEMix and developed a parameter inference scheme based on the Expectation Maximization algorithm. In an extensive simulation study, we show that NEMix improves learning of pathway structures over classical NEMs significantly in the presence of hidden pathway stimulation. We applied our model to single-cell imaging data from RNAi screens monitoring human rhinovirus infection, where limited infection efficiency of the assay results in uncertain pathway stimulation. Using a subset of genes with known interactions, we show that the inferred NEMix network has high accuracy and outperforms the classical nested effects model without hidden pathway activity. NEMix is implemented as part of the R/Bioconductor package ‘nem’ and available at www.cbg.ethz.ch/software/NEMix.
Experiments monitoring individual cells show that cells can behave differently even under the same experimental conditions. Summarizing measurements over a population of cells can lead to weak and widely deviating signals, and subsequently applied modeling approaches, like network inference, will suffer from this information loss. Nested effects models, a method tailored to reconstruct signaling networks from high-dimensional read-outs of gene silencing experiments, have so far been applied only at the cell population level. These models assume the pathway under consideration to be activated in all cells. The signal flow is only disrupted when genes are silenced. However, if this assumption is not met, inference results can be incorrect, because observed effects are interpreted wrongly. We extended nested effects models to use the power of single-cell resolution data sets. We introduce a new unobserved factor, which describes the pathway activity of single cells. The pathway activity is learned for each cell during network inference. We apply our model to gene silencing screens, investigating human rhinovirus infection of single cells from microscopy imaging features. Comparing the learned network to the known KEGG pathway of the genes shows that our method recovers networks significantly better than classical nested effects models that do not capture hidden signaling.
Affiliation(s)
- Juliane Siebourg-Polster
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Daria Mudrak
- Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Pauli Rämö
- Biozentrum, University of Basel, Basel, Switzerland
- Urs Greber
- Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Holger Fröhlich
- Algorithmic Bioinformatics, Bonn-Aachen International Center for IT, University of Bonn, Bonn, Germany
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
|
108
|
Han S, Wong RKW, Lee TCM, Shen L, Li SYR, Fan X. A full Bayesian approach for Boolean genetic network inference. PLoS One 2014; 9:e115806. [PMID: 25551820 PMCID: PMC4281059 DOI: 10.1371/journal.pone.0115806] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 05/25/2014] [Accepted: 11/29/2014] [Indexed: 02/03/2023]
Abstract
Boolean networks are a simple but efficient model for describing gene regulatory systems. A number of algorithms have been proposed to infer Boolean networks. However, these methods do not take full consideration of the effects of noise and model uncertainty. In this paper, we propose a full Bayesian approach to infer Boolean genetic networks. Markov chain Monte Carlo algorithms are used to obtain the posterior samples of both the network structure and the related parameters. In addition to regular link addition and removal moves, which can guarantee the irreducibility of the Markov chain for traversing the whole network space, carefully constructed mixture proposals are used to improve the Markov chain Monte Carlo convergence. Both simulations and a real application on cell-cycle data show that our method is more powerful than existing methods for the inference of both the topology and logic relations of the Boolean network from observed data.
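For readers unfamiliar with the model class, a Boolean network can be sketched in a few lines. The example below only simulates a small network under synchronous updates; it does not implement the Bayesian MCMC inference the abstract describes. The three genes and their logic rules are hypothetical and chosen purely for illustration.

```python
def step(state, rules):
    """Synchronously update every gene from the current state."""
    return {gene: rule(state) for gene, rule in rules.items()}


def trajectory(state, rules, n_steps):
    """Return the list of states visited, including the start state."""
    states = [state]
    for _ in range(n_steps):
        state = step(state, rules)
        states.append(state)
    return states


# Hypothetical three-gene network (names and logic are illustrative only).
rules = {
    "A": lambda s: not s["C"],         # C represses A
    "B": lambda s: s["A"],             # A activates B
    "C": lambda s: s["A"] and s["B"],  # A AND B together activate C
}

start = {"A": True, "B": False, "C": False}
path = trajectory(start, rules, n_steps=5)
```

With these rules the trajectory returns to its start state after five updates, so the network settles into a cyclic attractor. Inference methods such as the one in this entry work in the opposite direction: given noisy observed states, they estimate which rules and links produced them.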
Affiliation(s)
- Shengtong Han
- Department of Statistics, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China
- Raymond K. W. Wong
- Department of Statistics, Iowa State University, Ames, IA, United States of America
- Thomas C. M. Lee
- Department of Statistics, University of California Davis, Davis, CA, United States of America
- Linghao Shen
- Department of Information Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China
- Shuo-Yen R. Li
- Department of Information Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China
- University of Electronic Science and Technology of China, Chengdu, China
- Xiaodan Fan
- Department of Statistics, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China
|
109
|
Bois FY, Gayraud G. Probabilistic generation of random networks taking into account information on motifs occurrence. J Comput Biol 2014; 22:25-36. [PMID: 25493547 DOI: 10.1089/cmb.2014.0175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
Abstract
Because of the huge number of graphs possible even with a small number of nodes, inference on network structure is known to be a challenging problem. Generating large random directed graphs with prescribed probabilities of occurrences of some meaningful patterns (motifs) is also difficult. We show how to generate such random graphs according to a formal probabilistic representation, using fast Markov chain Monte Carlo methods to sample them. As an illustration, we generate realistic graphs with several hundred nodes mimicking a gene transcription interaction network in Escherichia coli.
Affiliation(s)
- Frederic Y Bois
- Université de Technologie de Compiègne and Institut National de l'Environnement Industriel et des Risques, France
|
110
|
Qi Q, Li J, Cheng J. Reconstruction of metabolic pathways by combining probabilistic graphical model-based and knowledge-based methods. BMC Proc 2014; 8:S5. [PMID: 25374614 PMCID: PMC4202177 DOI: 10.1186/1753-6561-8-s6-s5] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 01/08/2023]
Abstract
Automatic reconstruction of metabolic pathways for an organism from genomics and transcriptomics data has been a challenging and important problem in bioinformatics. Traditionally, known reference pathways can be mapped onto organism-specific ones based on the organism's genome annotation and protein homology. However, this simple knowledge-based mapping method might produce incomplete pathways and generally cannot predict unknown new relations and reactions. In contrast, ab initio metabolic network construction methods can predict novel reactions and interactions, but their accuracy tends to be low, leading to many false positives. Here we combine existing pathway knowledge and a new ab initio Bayesian probabilistic graphical model in a novel fashion to improve automatic reconstruction of metabolic networks. Specifically, we built a knowledge database containing known, individual gene/protein interactions and metabolic reactions extracted from existing reference pathways. Known reactions and interactions were then used as constraints for Bayesian network learning methods to predict metabolic pathways. Using individual reactions and interactions extracted from different pathways of many organisms to guide pathway construction is new and improves both the coverage and accuracy of metabolic pathway construction. We applied this probabilistic knowledge-based approach to construct the metabolic networks from yeast gene expression data and compared its results with 62 known metabolic networks in the KEGG database. The experiment showed that the method improved the coverage of metabolic network construction over the traditional reference pathway mapping method and was more accurate than pure ab initio methods.
Affiliation(s)
- Qi Qi
- Department of Computer Science, University of Missouri, Columbia, MO 65201, USA
- Jilong Li
- Department of Computer Science, University of Missouri, Columbia, MO 65201, USA
- Jianlin Cheng
- Department of Computer Science, University of Missouri, Columbia, MO 65201, USA; Informatics Institute, University of Missouri, Columbia, MO 65201, USA
|
111
|
Voillet V, SanCristobal M, Lippi Y, Martin PGP, Iannuccelli N, Lascor C, Vignoles F, Billon Y, Canario L, Liaubet L. Muscle transcriptomic investigation of late fetal development identifies candidate genes for piglet maturity. BMC Genomics 2014; 15:797. [PMID: 25226791 PMCID: PMC4287105 DOI: 10.1186/1471-2164-15-797] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 05/12/2014] [Accepted: 09/11/2014] [Indexed: 01/06/2023]
Abstract
Background In pigs, the perinatal period is the most critical time for survival. Piglet maturation, which occurs at the end of gestation, leads to a state of full development after birth. Therefore, maturity is an important determinant of early survival. Skeletal muscle plays a key role in adaptation to extra-uterine life, e.g. glycogen storage and thermoregulation. In this study, we performed microarray analysis to identify the genes and biological processes involved in piglet muscle maturity. Progeny from two breeds with extreme muscle maturity phenotypes were analyzed at two time points during gestation (gestational days 90 and 110). The Large White (LW) breed is a selected breed with an increased rate of mortality at birth, whereas the Meishan (MS) breed produces piglets with extremely low mortality at birth. The impact of the parental genome was analyzed with reciprocal crossed fetuses. Results Microarray analysis identified 12,326 differentially expressed probes for gestational age and genotype. Such a high number reflects an important transcriptomic change that occurs between 90 and 110 days of gestation. 2,000 probes, corresponding to 1,120 unique annotated genes, involved more particularly in the maturation process were further studied. Functional enrichment and graph inference studies underlined genes involved in muscular development around 90 days of gestation, and genes involved in metabolic functions, such as gluconeogenesis, around 110 days of gestation. Moreover, a difference in the expression of key genes, e.g. PCK2, LDHA or PGK1, was detected between MS and LW just before birth. Reciprocal crossing analysis resulted in the identification of 472 genes with an expression preferentially regulated by one parental genome. Most of these genes (366) were regulated by the paternal genome. Among these paternally regulated genes, some known imprinted genes, such as MAGEL2 or IGF2, were identified and could have a key role in the maturation process. 
Conclusion These results reveal the biological mechanisms that regulate muscle maturity in piglets. Maturity is also under the conflicting regulation of the parental genomes. Crucial genes, which could explain the biological differences in maturity observed between LW and MS breeds, were identified. These genes could be excellent candidates for a key role in maturity.
Affiliation(s)
- Laurence Liaubet
- INRA, UMR1388 Génétique, Physiologie et Systèmes d'Elevage, F-31326 Castanet-Tolosan, France
|
112
|
Abstract
A fundamental goal of systems biology is to create models that describe relationships between biological components. Networks are an increasingly popular approach to this problem. However, a scientist interested in modeling biological (e.g., gene expression) data as a network is quickly confounded by the fundamental problem: how to construct the network? It is fairly easy to construct a network, but is it the network for the problem being considered? This is an important problem with three fundamental issues: How to weight edges in the network in order to capture actual biological interactions? What is the effect of the type of biological experiment used to collect the data from which the network is constructed? How to prune the weighted edges (or what cut-off to apply)? Differences in the construction of networks could lead to different biological interpretations. Indeed, we find that there are statistically significant dissimilarities in the functional content and topology between gene co-expression networks constructed using different edge weighting methods, data types, and edge cut-offs. We show that different types of known interactions, such as those found through Affinity Capture-Luminescence or Synthetic Lethality experiments, appear in significantly varying amounts in networks constructed in different ways. Hence, we demonstrate that different biological questions may be answered by the different networks. Consequently, we posit that the approach taken to build a network can be matched to biological questions to get targeted answers. More study is required to understand the implications of different network inference approaches and to draw reliable conclusions from networks used in the field of systems biology.
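The sensitivity to edge weighting and cut-off that this abstract describes can be seen even in a toy co-expression network. The sketch below assumes absolute Pearson correlation as the edge weight and a hard threshold for pruning; both are illustrative choices rather than the methods compared in the paper, and the expression values are made up.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression_edges(expr, cutoff):
    """Keep gene pairs whose absolute correlation reaches the cut-off."""
    edges = {}
    for g1, g2 in combinations(sorted(expr), 2):
        weight = abs(pearson(expr[g1], expr[g2]))
        if weight >= cutoff:
            edges[(g1, g2)] = weight
    return edges


# Hypothetical expression profiles (made-up values, four samples per gene).
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with geneA
    "geneC": [1.0, 3.0, 2.0, 4.0],  # moderately correlated with geneA
    "geneD": [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with geneA
}

loose = coexpression_edges(expr, cutoff=0.5)
strict = coexpression_edges(expr, cutoff=0.9)
```

Here the loose cut-off keeps all six gene pairs while the strict one keeps only three, so geneC is connected in one network and isolated in the other. Taking the absolute value also treats anti-correlation as an edge, which is itself a weighting decision that changes the resulting topology.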
|
113
|
Emmert-Streib F, Dehmer M, Haibe-Kains B. Untangling statistical and biological models to understand network inference: the need for a genomics network ontology. Front Genet 2014; 5:299. [PMID: 25221572 PMCID: PMC4148777 DOI: 10.3389/fgene.2014.00299] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 06/28/2014] [Accepted: 08/12/2014] [Indexed: 12/31/2022]
Abstract
In this paper, we shed light on approaches that are currently used to infer networks from gene expression data with respect to their biological meaning. As we will show, the biological interpretation of these networks depends on the chosen theoretical perspective. For this reason, we distinguish a statistical perspective from a mathematical modeling perspective and elaborate their differences and implications. Our results indicate the imperative need for a genomic network ontology in order to avoid increasing confusion about the biological interpretation of inferred networks, a confusion that can be made even worse by approaches that integrate multiple data sets or data types.
Affiliation(s)
- Frank Emmert-Streib
- Computational Biology and Machine Learning Laboratory, Faculty of Medicine, Health and Life Sciences, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Belfast, UK
- Matthias Dehmer
- Institute for Bioinformatics and Translational Research, UMIT, Hall in Tyrol, Austria
- Benjamin Haibe-Kains
- Bioinformatics and Computational Genomics Laboratory, Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
|
114
|
Emmert-Streib F, Dehmer M, Haibe-Kains B. Gene regulatory networks and their applications: understanding biological and medical problems in terms of networks. Front Cell Dev Biol 2014; 2:38. [PMID: 25364745 PMCID: PMC4207011 DOI: 10.3389/fcell.2014.00038] [Citation(s) in RCA: 112] [Impact Index Per Article: 11.2] [Received: 06/07/2014] [Accepted: 07/29/2014] [Indexed: 11/13/2022]
Abstract
In recent years gene regulatory networks (GRNs) have attracted a lot of interest and many methods have been introduced for their statistical inference from gene expression data. However, despite their popularity, GRNs are widely misunderstood. For this reason, we provide in this paper a general discussion and perspective of gene regulatory networks. Specifically, we discuss their meaning, the consistency among different network inference methods, ensemble methods, the assessment of GRNs, the estimated number of existing GRNs and their usage in different application domains. Furthermore, we discuss open questions and necessary steps in order to utilize gene regulatory networks in a clinical context and for personalized medicine.
Affiliation(s)
- Frank Emmert-Streib
- Computational Biology and Machine Learning Laboratory, Faculty of Medicine, Health and Life Sciences, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Belfast, UK
- Matthias Dehmer
- Institute for Bioinformatics and Translational Research, UMIT, Hall in Tyrol, Austria
- Benjamin Haibe-Kains
- Bioinformatics and Computational Genomics Laboratory, Department of Medical Biophysics, Princess Margaret Cancer Centre, University of Toronto, Canada
|
115
|
Häggström J, Cipriano M, Forshell LP, Persson E, Hammarsten P, Stella N, Fowler CJ. Potential upstream regulators of cannabinoid receptor 1 signaling in prostate cancer: a Bayesian network analysis of data from a tissue microarray. Prostate 2014; 74:1107-17. [PMID: 24913716 PMCID: PMC4145668 DOI: 10.1002/pros.22827] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 02/25/2014] [Accepted: 04/30/2014] [Indexed: 01/12/2023]
Abstract
BACKGROUND The endocannabinoid system regulates cancer cell proliferation, and in prostate cancer a high cannabinoid CB1 receptor expression is associated with a poor prognosis. Downstream mediators of CB1 receptor signaling in prostate cancer are known, but information on potential upstream regulators is lacking. RESULTS Data from a well-characterized tumor tissue microarray were used for a Bayesian network analysis using the max-min hill-climbing method. In non-malignant tissue samples, a directionality of pEGFR (the phosphorylated form of the epidermal growth factor receptor) → CB1 receptors was found regardless of whether the endocannabinoid metabolizing enzyme fatty acid amide hydrolase (FAAH) was included as a parameter. A similar result was found in the tumor tissue, but only when FAAH was included in the analysis. A second regulatory pathway, from the growth factor receptor ErbB2 → FAAH, was also identified in the tumor samples. Transfection of AT1 prostate cancer cells with CB1 receptors induced a sensitivity to the growth-inhibiting effects of the CB receptor agonist CP55,940. The sensitivity was not dependent upon the level of receptor expression. Thus a high CB1 receptor expression alone does not drive the cells towards a survival phenotype in the presence of a CB receptor agonist. CONCLUSIONS The data identify two potential regulators of the endocannabinoid system in prostate cancer and allow the construction of a model of a dysregulated endocannabinoid signaling network in this tumor. Further studies should be designed to test the veracity of the predictions of the network analysis in prostate cancer and other solid tumors.
Affiliation(s)
- Jenny Häggström
- Department of Statistics, Umeå School of Business and Economics, Umeå University, Umeå, Sweden
- Mariateresa Cipriano
- Department of Pharmacology and Clinical Neuroscience, Umeå University, Umeå, Sweden
- Linus Plym Forshell
- Department of Pharmacology and Clinical Neuroscience, Umeå University, Umeå, Sweden
- Emma Persson
- Department of Radiation Sciences, Oncology, Umeå University, Umeå, Sweden
- Peter Hammarsten
- Department of Medical Biosciences, Pathology, Umeå University, Umeå, Sweden
- Nephi Stella
- Department of Pharmacology, Psychiatry and Behavioral Sciences, University of Washington, Seattle, Washington
- Christopher J Fowler
- Department of Pharmacology and Clinical Neuroscience, Umeå University, Umeå, Sweden
- *Correspondence to: Professor Christopher J. Fowler, Department of Pharmacology and Clinical Neuroscience, Umeå University, SE-901 87, Umeå, Sweden. E-mail:
116
Kiani NA, Kaderali L. Dynamic probabilistic threshold networks to infer signaling pathways from time-course perturbation data. BMC Bioinformatics 2014; 15:250. [PMID: 25047753 PMCID: PMC4133630 DOI: 10.1186/1471-2105-15-250] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 07/09/2013] [Accepted: 07/15/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Network inference deals with the reconstruction of molecular networks from experimental data. Given N molecular species, the challenge is to find the underlying network. Due to data limitations, this typically is an ill-posed problem, and requires the integration of prior biological knowledge or strong regularization. We here focus on the situation when time-resolved measurements of a system's response after systematic perturbations are available. RESULTS We present a novel method to infer signaling networks from time-course perturbation data. We utilize dynamic Bayesian networks with probabilistic Boolean threshold functions to describe protein activation. The model posterior distribution is analyzed using evolutionary MCMC sampling and subsequent clustering, resulting in probability distributions over alternative networks. We evaluate our method on simulated data, and study its performance with respect to data set size and levels of noise. We then use our method to study EGF-mediated signaling in the ERBB pathway. CONCLUSIONS Dynamic Probabilistic Threshold Networks is a new method to infer signaling networks from time-series perturbation data. It exploits the dynamic response of a system after external perturbation for network reconstruction. On simulated data, we show that the approach outperforms current state of the art methods. On the ERBB data, our approach recovers a significant fraction of the known interactions, and predicts novel mechanisms in the ERBB pathway.
Affiliation(s)
- Narsis A Kiani
- Technische Universität Dresden, Medical Faculty Carl Gustav Carus, Institute for Medical Informatics and Biometry, Fetscherstr. 74, 01307 Dresden, Germany
117
Astola L, Molenaar J. A New Modified Histogram Matching Normalization for Time Series Microarray Analysis. Microarrays 2014; 3:203-11. [PMID: 27600344 PMCID: PMC4996360 DOI: 10.3390/microarrays3030203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2014] [Revised: 06/19/2014] [Accepted: 06/25/2014] [Indexed: 11/16/2022]
Abstract
Microarray data are often used to infer regulatory networks. Quantile normalization (QN) is a popular method to reduce array-to-array variation. We show that in the context of time series measurements QN may not be the best choice for this task, especially if the inference is based on a continuous-time ODE model. We propose an alternative normalization method that is better suited for network inference from time series data.
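For readers unfamiliar with the baseline the abstract questions, the following is a minimal sketch of standard quantile normalization, assuming a genes-by-arrays matrix and ignoring ties. It is not the alternative normalization proposed in the paper; the function name `quantile_normalize` and the toy matrix are illustrative assumptions.

```python
import numpy as np

def quantile_normalize(X):
    """Quantile-normalize the columns (arrays) of a genes x arrays matrix."""
    X = np.asarray(X, dtype=float)
    order = np.argsort(X, axis=0)                  # per-array sort order
    ranks = np.argsort(order, axis=0)              # rank of each gene within its array
    reference = np.sort(X, axis=0).mean(axis=1)    # mean of the k-th smallest values
    return reference[ranks]                        # substitute each value by its rank's mean

# Hypothetical toy data: 4 genes measured on 3 arrays (or time points).
X = np.array([[5.0, 4.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.5, 6.0],
              [4.0, 2.0, 8.0]])
Xn = quantile_normalize(X)
```

After this step every column holds the same set of values, which forces identical distributions across time points. That property is what makes the method questionable when the distribution is expected to change over time.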
Affiliation(s)
- Laura Astola
- Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven 5612 AZ, The Netherlands.
- Jaap Molenaar
- Biometris, Wageningen University and Research Centre, Wageningen 6708 PB, The Netherlands.
- Wageningen Centre for Systems Biology, Wageningen 6700 AC, The Netherlands.
118
Bazil JN, Stamm KD, Li X, Thiagarajan R, Nelson TJ, Tomita-Mitchell A, Beard DA. The inferred cardiogenic gene regulatory network in the mammalian heart. PLoS One 2014; 9:e100842. [PMID: 24971943 PMCID: PMC4074065 DOI: 10.1371/journal.pone.0100842] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 11/07/2013] [Accepted: 05/31/2014] [Indexed: 12/22/2022] Open
Abstract
Cardiac development is a complex, multiscale process encompassing cell fate adoption, differentiation and morphogenesis. To elucidate pathways underlying this process, a recently developed algorithm to reverse engineer gene regulatory networks was applied to time-course microarray data obtained from the developing mouse heart. Approximately 200 genes of interest were input into the algorithm to generate putative network topologies that are capable of explaining the experimental data via model simulation. To cull specious network interactions, thousands of putative networks are merged and filtered to generate scale-free, hierarchical networks that are statistically significant and biologically relevant. The networks are validated with known gene interactions and used to predict regulatory pathways important for the developing mammalian heart. Area under the precision-recall curve and receiver operator characteristic curve are 9% and 58%, respectively. Of the top 10 ranked predicted interactions, 4 have already been validated. The algorithm is further tested using a network enriched with known interactions and another depleted of them. The inferred networks contained more interactions for the enriched network versus the depleted network. In all test cases, maximum performance of the algorithm was achieved when the purely data-driven method of network inference was combined with a data-independent, functional-based association method. Lastly, the network generated from the list of approximately 200 genes of interest was expanded using gene-profile uniqueness metrics to include approximately 900 additional known mouse genes and to form the most likely cardiogenic gene regulatory network. The resultant network supports known regulatory interactions and contains several novel cardiogenic regulatory interactions. The method outlined herein provides an informative approach to network inference and leads to clear testable hypotheses related to gene regulation.
Affiliation(s)
- Jason N. Bazil
- Department of Molecular and Integrative Physiology, University of Michigan, Ann Arbor, Michigan, United States of America
- Karl D. Stamm
- Biotechnology and Bioengineering Center, Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
- Xing Li
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, United States of America
- Raghuram Thiagarajan
- Department of Molecular and Integrative Physiology, University of Michigan, Ann Arbor, Michigan, United States of America
- Timothy J. Nelson
- Departments of Medicine, Molecular Pharmacology and Experimental Therapeutics, and Mayo Clinic Center for Regenerative Medicine, Rochester, Minnesota, United States of America
- Aoy Tomita-Mitchell
- Biotechnology and Bioengineering Center, Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
- Daniel A. Beard
- Department of Molecular and Integrative Physiology, University of Michigan, Ann Arbor, Michigan, United States of America
- * E-mail:
119
Wang Y, Penfold CA, Hodgson DA, Gifford ML, Burroughs NJ. Correcting for link loss in causal network inference caused by regulator interference. Bioinformatics 2014; 30:2779-86. [PMID: 24947751 PMCID: PMC4173021 DOI: 10.1093/bioinformatics/btu388] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/13/2022]
Abstract
MOTIVATION There are a number of algorithms to infer causal regulatory networks from time series (gene expression) data. Here we analyse the phenomenon of regulator interference, where regulators with similar dynamics mutually suppress both the probability of regulating a target and the associated link strength; for instance, interference between two identical strong regulators reduces link probabilities by ∼50%. RESULTS We construct a robust method to define an interference-corrected causal network based on an analysis of the conditional link probabilities that recovers links lost through interference. On a large real network (Streptomyces coelicolor, phosphate depletion), we demonstrate that significant interference can occur between regulators with a correlation as low as 0.865, losing an estimated 34% of links by interference. However, levels of interference cannot be predicted from the correlation between regulators alone and are data specific. Validating against known networks, we show that high numbers of functional links are lost by regulator interference. Performance against other methods on DREAM4 data is excellent. AVAILABILITY AND IMPLEMENTATION The method is implemented in R and is publicly available as the NIACS package at http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software.
Affiliation(s)
- Ying Wang
- Warwick Systems Biology Centre and School of Life Sciences, University of Warwick, Coventry CV4 7AL, UK
- Christopher A Penfold
- Warwick Systems Biology Centre and School of Life Sciences, University of Warwick, Coventry CV4 7AL, UK
- David A Hodgson
- Warwick Systems Biology Centre and School of Life Sciences, University of Warwick, Coventry CV4 7AL, UK
- Miriam L Gifford
- Warwick Systems Biology Centre and School of Life Sciences, University of Warwick, Coventry CV4 7AL, UK
- Nigel J Burroughs
- Warwick Systems Biology Centre and School of Life Sciences, University of Warwick, Coventry CV4 7AL, UK
120
Abstract
It is often of interest to understand how the structure of a genetic network differs between two conditions. In this paper, each condition-specific network is modeled using the precision matrix of a multivariate normal random vector, and a method is proposed to directly estimate the difference of the precision matrices. In contrast to other approaches, such as separate or joint estimation of the individual matrices, direct estimation does not require those matrices to be sparse, and thus can allow the individual networks to contain hub nodes. Under the assumption that the true differential network is sparse, the direct estimator is shown to be consistent in support recovery and estimation. It is also shown to outperform existing methods in simulations, and its properties are illustrated on gene expression data from late-stage ovarian cancer patients.
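To make the object of interest concrete, the sketch below computes a naive plug-in differential network: it inverts each condition's sample covariance and subtracts. This is the "separate estimation" baseline that the abstract contrasts with, not the direct estimator proposed in the paper, and it only works when both sample covariances are invertible. The simulated covariances, sample sizes, and the helper name `precision` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def precision(samples):
    """Inverse of the sample covariance (rows are samples, columns are genes)."""
    return np.linalg.inv(np.cov(samples, rowvar=False))

# Two hypothetical conditions over 3 genes; only the gene0-gene1 link differs.
cov_a = np.array([[1.0, 0.5, 0.0],
                  [0.5, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
cov_b = np.eye(3)

xa = rng.multivariate_normal(np.zeros(3), cov_a, size=2000)
xb = rng.multivariate_normal(np.zeros(3), cov_b, size=2000)

# Plug-in estimate of the difference of precision matrices.
delta = precision(xa) - precision(xb)
```

Entries of `delta` far from zero mark gene pairs whose conditional dependence changes between conditions. With many genes and few samples the two inversions become unstable, which is the situation that motivates estimating the difference directly.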
Affiliation(s)
- Sihai Dave Zhao
- Department of Biostatistics and Epidemiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, USA
- T Tony Cai
- Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Hongzhe Li
- Department of Biostatistics and Epidemiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, USA
121
Villaverde AF, Ross J, Morán F, Banga JR. MIDER: network inference with mutual information distance and entropy reduction. PLoS One 2014; 9:e96732. [PMID: 24806471 PMCID: PMC4013075 DOI: 10.1371/journal.pone.0096732] [Citation(s) in RCA: 91] [Impact Index Per Article: 9.1] [Received: 11/05/2013] [Accepted: 04/09/2014] [Indexed: 01/14/2023] Open
Abstract
The prediction of links among variables from a given dataset is a task referred to as network inference or reverse engineering. It is an open problem in bioinformatics and systems biology, as well as in other areas of science. Information theory, which uses concepts such as mutual information, provides a rigorous framework for addressing it. While a number of information-theoretic methods are already available, most of them focus on a particular type of problem, introducing assumptions that limit their generality. Furthermore, many of these methods lack a publicly available implementation. Here we present MIDER, a method for inferring network structures with information-theoretic concepts. It consists of two steps: first, it provides a representation of the network in which the distance among nodes indicates their statistical closeness. Second, it refines the prediction of the existing links to distinguish between direct and indirect interactions and to assign directionality. The method accepts as input time-series data related to some quantitative features of the network nodes (such as concentrations, if the nodes are chemical species). It takes into account time delays between variables, and allows choosing among several definitions and normalizations of mutual information. It is general purpose: it may be applied to any type of network, cellular or otherwise. A Matlab implementation including source code and data is freely available (http://www.iim.csic.es/~gingproc/mider.html). The performance of MIDER has been evaluated on seven different benchmark problems that cover the main types of cellular networks, including metabolic, gene regulatory, and signaling. Comparisons with state-of-the-art information-theoretic methods have demonstrated the competitive performance of MIDER, as well as its versatility. Its use does not demand any a priori knowledge from the user; the default settings and the adaptive nature of the method provide good results for a wide range of problems without requiring tuning.
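The building block the abstract relies on, mutual information between time series evaluated at several time delays, can be sketched as follows. This is a simple histogram estimator in Python, not the MIDER Matlab code; the bin count, the function names `mutual_information` and `lagged_mi`, and the simulated series are assumptions made for illustration.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram estimate of the mutual information (in nats) between x and y."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                  # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)        # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)        # marginal of y, shape (1, bins)
    nz = pxy > 0                               # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def lagged_mi(x, y, lag, bins=8):
    """Mutual information between x(t) and y(t + lag), for lag >= 0."""
    if lag == 0:
        return mutual_information(x, y, bins)
    return mutual_information(x[:-lag], y[lag:], bins)

# Hypothetical series: y follows x with a delay of 3 samples plus noise.
rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = np.roll(x, 3) + 0.1 * rng.normal(size=500)

mi_by_lag = {lag: lagged_mi(x, y, lag) for lag in range(6)}
best_lag = max(mi_by_lag, key=mi_by_lag.get)
```

Scanning over lags and keeping the maximum is one plausible way to account for delays. A high score alone does not separate direct from indirect links, which is the role of the second, refinement step described in the abstract.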
Affiliation(s)
- John Ross
- Department of Chemistry, Stanford University, Stanford, California, United States of America
- Federico Morán
- Department of Biochemistry and Molecular Biology, Complutense University, Madrid, Spain
122
Henderson J, Michailidis G. Network reconstruction using nonparametric additive ODE models. PLoS One 2014; 9:e94003. [PMID: 24732037 PMCID: PMC3986056 DOI: 10.1371/journal.pone.0094003] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 12/09/2013] [Accepted: 03/13/2014] [Indexed: 01/05/2023] Open
Abstract
Network representations of biological systems are widespread and reconstructing unknown networks from data is a focal problem for computational biologists. For example, the series of biochemical reactions in a metabolic pathway can be represented as a network, with nodes corresponding to metabolites and edges linking reactants to products. In a different context, regulatory relationships among genes are commonly represented as directed networks with edges pointing from influential genes to their targets. Reconstructing such networks from data is a challenging problem receiving much attention in the literature. There is a particular need for approaches tailored to time-series data and not reliant on direct intervention experiments, as the former are often more readily available. In this paper, we introduce an approach to reconstructing directed networks based on dynamic systems models. Our approach generalizes commonly used ODE models based on linear or nonlinear dynamics by extending the functional class for the functions involved from parametric to nonparametric models. Concomitantly we limit the complexity by imposing an additive structure on the estimated slope functions. Thus the submodel associated with each node is a sum of univariate functions. These univariate component functions form the basis for a novel coupling metric that we define in order to quantify the strength of proposed relationships and hence rank potential edges. We show the utility of the method by reconstructing networks using simulated data from computational models for the glycolytic pathway of Lactococcus lactis and a gene network regulating the pluripotency of mouse embryonic stem cells. For purposes of comparison, we also assess reconstruction performance using gene networks from the DREAM challenges. We compare our method to those that similarly rely on dynamic systems models and use the results to attempt to disentangle the distinct roles of linearity, sparsity, and derivative estimation.
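The additive structure described in the abstract can be written compactly. The notation below is an assumption made for illustration (p nodes with states x_1, ..., x_p and univariate component functions f_ij), not necessarily the notation used in the paper.

```latex
\frac{dx_i(t)}{dt} \;=\; \sum_{j=1}^{p} f_{ij}\bigl(x_j(t)\bigr),
\qquad i = 1, \dots, p
```

Under this reading, the familiar linear ODE model is the special case f_ij(x) = a_ij x, and an edge from node j to node i is supported when the estimated f_ij is far from the zero function.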
Affiliation(s)
- James Henderson
- Department of Statistics, University of Michigan, Ann Arbor, Michigan, United States of America
- George Michailidis
- Department of Statistics, University of Michigan, Ann Arbor, Michigan, United States of America
123
Olsen C, Fleming K, Prendergast N, Rubio R, Emmert-Streib F, Bontempi G, Haibe-Kains B, Quackenbush J. Inference and validation of predictive gene networks from biomedical literature and gene expression data. Genomics 2014; 103:329-36. [PMID: 24691108 DOI: 10.1016/j.ygeno.2014.03.004] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 09/16/2013] [Revised: 01/23/2014] [Accepted: 03/15/2014] [Indexed: 02/04/2023]
Abstract
Although many methods have been developed for inference of biological networks, the validation of the resulting models has largely remained an unsolved problem. Here we present a framework for quantitative assessment of inferred gene interaction networks using knock-down data from cell line experiments. Using this framework we are able to show that network inference based on integration of prior knowledge derived from the biomedical literature with genomic data significantly improves the quality of inferred networks relative to other approaches. Our results also suggest that cell line experiments can be used to quantitatively assess the quality of networks inferred from tumor samples.
Affiliation(s)
- Catharina Olsen
- Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium; Interuniversity Institute of Bioinformatics Brussels, ULB-VUB, La Plaine Campus, Brussels, Belgium
- Kathleen Fleming
- Computational Biology and Functional Genomics Laboratory, Dana-Farber Cancer Institute, Harvard School of Public Health, Boston, MA, USA
- Niall Prendergast
- Computational Biology and Functional Genomics Laboratory, Dana-Farber Cancer Institute, Harvard School of Public Health, Boston, MA, USA
- Renee Rubio
- Computational Biology and Functional Genomics Laboratory, Dana-Farber Cancer Institute, Harvard School of Public Health, Boston, MA, USA
- Frank Emmert-Streib
- Computational Biology and Machine Learning Laboratory, Center for Cancer Research and Cell Biology, Queen's University Belfast, Belfast, UK
- Gianluca Bontempi
- Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium; Interuniversity Institute of Bioinformatics Brussels, ULB-VUB, La Plaine Campus, Brussels, Belgium
- Benjamin Haibe-Kains
- Bioinformatics and Computational Genomics Laboratory, Princess Margaret Cancer Center, University Health Network, Toronto, Ontario, Canada
- John Quackenbush
- Computational Biology and Functional Genomics Laboratory, Dana-Farber Cancer Institute, Harvard School of Public Health, Boston, MA, USA
124
Usie A, Karathia H, Teixidó I, Alves R, Solsona F. Biblio-MetReS for user-friendly mining of genes and biological processes in scientific documents. PeerJ 2014; 2:e276. [PMID: 24688854 PMCID: PMC3940481 DOI: 10.7717/peerj.276] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 12/04/2013] [Accepted: 01/27/2014] [Indexed: 01/18/2023] Open
Abstract
One way to initiate the reconstruction of molecular circuits is by using automated text-mining techniques. Developing more efficient methods for such reconstruction is a topic of active research, and those methods are typically included by bioinformaticians in pipelines used to mine and curate large literature datasets. Nevertheless, experimental biologists have a limited number of available user-friendly tools that use text-mining for network reconstruction and require no programming skills to use. One of these tools is Biblio-MetReS. Originally, this tool permitted an on-the-fly analysis of documents contained in a number of web-based literature databases to identify co-occurrence of proteins/genes. This approach ensured results that were always up-to-date with the latest live version of the databases. However, this 'up-to-dateness' came at the cost of large execution times. Here we report an evolution of the application Biblio-MetReS that permits constructing co-occurrence networks for genes, GO processes, Pathways, or any combination of the three types of entities and graphically represent those entities. We show that the performance of Biblio-MetReS in identifying gene co-occurrence is at least as good as that of other comparable applications (STRING and iHOP). In addition, we also show that the identification of GO processes is on par with that reported in the latest BioCreAtIvE challenge. Finally, we also report the implementation of a new strategy that combines on-the-fly analysis of new documents with preprocessed information from documents that were encountered in previous analyses. This combination simultaneously decreases program run time and maintains 'up-to-dateness' of the results. AVAILABILITY http://metres.udl.cat/index.php/downloads. CONTACT metres.cmb@gmail.com.
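The notion of a co-occurrence network used above can be illustrated with a few lines of Python. This is a deliberately naive sketch, assuming plain substring matching at the document level; it is not how Biblio-MetReS is implemented, and the function name, the term list, and the example sentences are invented for illustration.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_network(documents, terms):
    """Count, for each pair of terms, the number of documents mentioning both."""
    edges = Counter()
    for text in documents:
        lowered = text.lower()
        # Naive case-insensitive substring match; real tools use entity recognition.
        present = sorted(t for t in terms if t.lower() in lowered)
        edges.update(combinations(present, 2))
    return edges

# Hypothetical mini-corpus and gene list.
documents = [
    "TP53 interacts with MDM2.",
    "MDM2 and TP53 regulate apoptosis; BAX is induced.",
    "BAX alone.",
]
edges = cooccurrence_network(documents, ["TP53", "MDM2", "BAX"])
```

Edge weights are document counts, so caching the per-document term lists for documents already seen would be one way to combine preprocessed results with on-the-fly analysis of new documents.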
Affiliation(s)
- Anabel Usie
- Department of Basic Medical Sciences, Edifici Recerca Biomedica I, Universitat de Lleida and IRBLleida, Lleida, Spain
- Department of Computer Science, Escola Politècnica Superior and INSPIRES, Universitat de Lleida, Lleida, Spain
- Hiren Karathia
- Department of Basic Medical Sciences, Edifici Recerca Biomedica I, Universitat de Lleida and IRBLleida, Lleida, Spain
- Ivan Teixidó
- Department of Computer Science, Escola Politècnica Superior and INSPIRES, Universitat de Lleida, Lleida, Spain
- Rui Alves
- Department of Basic Medical Sciences, Edifici Recerca Biomedica I, Universitat de Lleida and IRBLleida, Lleida, Spain
- Francesc Solsona
- Department of Computer Science, Escola Politècnica Superior and INSPIRES, Universitat de Lleida, Lleida, Spain
125
Windhager L, Zierer J, Küffner R. Refining ensembles of predicted gene regulatory networks based on characteristic interaction sets. PLoS One 2014; 9:e84596. [PMID: 24498260 PMCID: PMC3911903 DOI: 10.1371/journal.pone.0084596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2013] [Accepted: 11/14/2013] [Indexed: 11/30/2022] Open
Abstract
Different ensemble voting approaches have been successfully applied for reverse-engineering of gene regulatory networks. They are based on the assumption that a good approximation of true network structure can be derived by considering the frequencies of individual interactions in a large number of predicted networks. Such approximations are typically superior in terms of prediction quality and robustness as compared to considering a single best scoring network only. Nevertheless, ensemble approaches only work well if the predicted gene regulatory networks are sufficiently similar to each other. If the topologies of predicted networks are considerably different, an ensemble of all networks obscures interesting individual characteristics. Instead, networks should be grouped according to local topological similarities and ensemble voting performed for each group separately. We argue that the presence of sets of co-occurring interactions is a suitable indicator for grouping predicted networks. A stepwise bottom-up procedure is proposed, where first mutual dependencies between pairs of interactions are derived from predicted networks. Pairs of co-occurring interactions are subsequently extended to derive characteristic interaction sets that distinguish groups of networks. Finally, ensemble voting is applied separately to the resulting topologically similar groups of networks to create distinct group-ensembles. Ensembles of topologically similar networks constitute distinct hypotheses about the reference network structure. Such group-ensembles are easier to interpret as their characteristic topology becomes clear and dependencies between interactions are known. The availability of distinct hypotheses facilitates the design of further experiments to distinguish between plausible network structures. 
The proposed procedure is a reasonable refinement step for non-deterministic reverse-engineering applications that produce a large number of candidate predictions for a gene regulatory network, e.g. due to probabilistic optimization or a cross-validation procedure.
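The basic ensemble voting step that the abstract builds on, scoring each interaction by its frequency across predicted networks, can be sketched as below. The grouping by characteristic interaction sets proposed in the paper is not implemented here; the function name `ensemble_vote`, the threshold, and the toy networks are assumptions for illustration.

```python
from collections import Counter

def ensemble_vote(networks, min_frequency=0.5):
    """Keep edges that occur in at least `min_frequency` of the predicted networks."""
    counts = Counter(edge for net in networks for edge in set(net))
    n = len(networks)
    return {edge: c / n for edge, c in counts.items() if c / n >= min_frequency}

# Four hypothetical predicted networks, each a set of directed (regulator, target) edges.
networks = [
    {("A", "B"), ("B", "C")},
    {("A", "B")},
    {("A", "B"), ("C", "A")},
    {("B", "C"), ("A", "B")},
]
consensus = ensemble_vote(networks)
```

Applied to all networks at once, voting averages away edges that appear only in a minority of topologies. Running the same function separately on each group of similar networks would yield the group-ensembles the abstract describes.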
Affiliation(s)
- Lukas Windhager
- Institute for Informatics, Ludwig-Maximilians-Universität München, Munich, Germany
- Jonas Zierer
- Institute for Informatics, Ludwig-Maximilians-Universität München, Munich, Germany
- Robert Küffner
- Institute for Informatics, Ludwig-Maximilians-Universität München, Munich, Germany
- * E-mail:
126
Villaverde AF, Banga JR. Reverse engineering and identification in systems biology: strategies, perspectives and challenges. J R Soc Interface 2014; 11:20130505. [PMID: 24307566 PMCID: PMC3869153 DOI: 10.1098/rsif.2013.0505] [Citation(s) in RCA: 163] [Impact Index Per Article: 16.3] [Received: 06/07/2013] [Accepted: 11/12/2013] [Indexed: 12/17/2022] Open
Abstract
The interplay of mathematical modelling with experiments is one of the central elements in systems biology. The aim of reverse engineering is to infer, analyse and understand, through this interplay, the functional and regulatory mechanisms of biological systems. Reverse engineering is not exclusive to systems biology and has been studied in different areas, such as inverse problem theory, machine learning, nonlinear physics, (bio)chemical kinetics, control theory and optimization, among others. However, it seems that many of these areas have been relatively closed to outsiders. In this contribution, we aim to compare and highlight the different perspectives and contributions from these fields, with emphasis on two key questions: (i) why are reverse engineering problems so hard to solve, and (ii) what methods are available for the particular problems arising from systems biology?
Affiliation(s)
- Julio R. Banga
- BioProcess Engineering Group, IIM-CSIC, Spanish National Research Council, Vigo 36208, Spain
127
Abstract
Noise permeates biology on all levels, from the most basic molecular, sub-cellular processes to the dynamics of tissues, organs, organisms and populations. The functional roles of noise in biological processes can vary greatly. Along with standard, entropy-increasing effects of producing random mutations, diversifying phenotypes in isogenic populations, limiting information capacity of signaling relays, it occasionally plays more surprising constructive roles by accelerating the pace of evolution, providing selective advantage in dynamic environments, enhancing intracellular transport of biomolecules and increasing information capacity of signaling pathways. This short review covers the recent progress in understanding mechanisms and effects of fluctuations in biological systems of different scales and the basic approaches to their mathematical modeling.
Affiliation(s)
- Lev S. Tsimring
- BioCircuits Institute, University of California, San Diego, 9500 Gilman Dr., La Jolla, CA 92093-0328, USA
128
Wang G, Yang E, Mandhan I, Brinkmeyer-Langford CL, Cai JJ. Population-level expression variability of mitochondrial DNA-encoded genes in humans. Eur J Hum Genet 2014; 22:1093-9. [PMID: 24398800 DOI: 10.1038/ejhg.2013.293] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 12/17/2012] [Revised: 10/22/2013] [Accepted: 11/09/2013] [Indexed: 12/28/2022] Open
Abstract
Human mitochondria contain multiple copies of a circular genome made up of double-stranded DNA (mtDNA) that encodes proteins involved in cellular respiration. Transcript abundance of mtDNA-encoded genes varies between human individuals, yet the level of variation in the general population has not been systematically assessed. In the present study, we revisited large-scale RNA sequencing data generated from lymphoblastoid cell lines of HapMap samples of European and African ancestry to estimate transcript abundance and quantify expression variation for mtDNA-encoded genes. In both populations, we detected up to over 100-fold difference in mtDNA gene expression between individuals. The marked variation was not due to differences in mtDNA copy number between individuals, but was shaped by the transcription of hundreds of nuclear genes. Many of these nuclear genes were co-expressed with one another, resulting in a module-enriched co-expression network. Significant correlations in expression between genes of the mtDNA and nuclear genomes were used to identify factors involved with the regulation of mitochondrial functions. In conclusion, we determined the baseline amount of variability in mtDNA gene expression in general human populations and cataloged a complete set of nuclear genes whose expression levels are correlated with those of mtDNA-encoded genes. Our findings will enable the integration of information from both mtDNA and nuclear genetic systems, and facilitate the discovery of novel regulatory pathways involving mitochondrial functions.
Affiliation(s)
- Gang Wang
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
- Ence Yang
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
- Ishita Mandhan
- Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
- James J Cai
- [1] Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA [2] Interdisciplinary Program in Genetics, Texas A&M University, College Station, TX, USA
129
Öksüz M, Sadıkoğlu H, Çakır T. Sparsity as cellular objective to infer directed metabolic networks from steady-state metabolome data: a theoretical analysis. PLoS One 2013; 8:e84505. [PMID: 24391961 PMCID: PMC3877278 DOI: 10.1371/journal.pone.0084505] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 07/08/2013] [Accepted: 11/21/2013] [Indexed: 12/11/2022] Open
Abstract
Since metabolome data are derived from the underlying metabolic network, reverse engineering of such data to recover the network topology is of wide interest. The Lyapunov equation constrains the link between data and network by coupling the covariance of the data with the strength of interactions (the Jacobian matrix). This equation, when expressed as a linear set of equations at steady state, constitutes a basis for inferring the network structure given the covariance matrix of the data. The sparse structure of metabolic networks points to reactions that are active based on minimal enzyme production, hinting at sparsity as a cellular objective. Therefore, for a given covariance matrix, we solved the Lyapunov equation for the Jacobian matrix by simultaneously minimizing the Euclidean norm of the residuals and maximizing sparsity (the number of zeros in the Jacobian matrix) as objective functions, to infer directed small-scale networks from three kingdoms of life (bacteria, fungi, mammalian). The inference performance of the approach was found to be promising, with a false positive rate of zero and a true positive rate of almost one. The effect of missing data on the results was additionally analyzed, revealing superiority over similarity-based approaches, which infer undirected networks. Our findings suggest that the covariance of metabolome data implies an underlying network with the sparsest pattern. The theoretical analysis forms a framework for further investigation of sparsity-based inference of metabolic networks from real metabolome data.
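The coupling between covariance and Jacobian described in this abstract can be illustrated with a small numerical check. The sketch below assumes the common steady-state form J C + C J^T + D = 0, where D is a fluctuation matrix; the abstract does not state the scaling of D the authors use, and the function names and toy matrices are hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(a):
    """Return the transpose of a nested-list matrix."""
    return [list(row) for row in zip(*a)]


def lyapunov_residual(jac, cov, fluct):
    """Euclidean (Frobenius) norm of J*C + C*J^T + D (assumed form)."""
    jc = matmul(jac, cov)
    cjt = matmul(cov, transpose(jac))
    n = len(jac)
    return sum((jc[i][j] + cjt[i][j] + fluct[i][j]) ** 2
               for i in range(n) for j in range(n)) ** 0.5


def sparsity(jac, tol=1e-12):
    """Count the entries of the Jacobian that are numerically zero."""
    return sum(1 for row in jac for v in row if abs(v) < tol)


# Toy two-metabolite system with no cross-interactions (hypothetical values).
J = [[-1.0, 0.0], [0.0, -2.0]]
C = [[0.5, 0.0], [0.0, 0.25]]
D = [[1.0, 0.0], [0.0, 1.0]]

residual = lyapunov_residual(J, C, D)
zeros = sparsity(J)
print(residual, zeros)
```

An inference procedure in this spirit would search over candidate Jacobians, preferring those with a small residual and many zero entries; the sketch only evaluates the two objectives for one candidate.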
Affiliation(s)
- Melik Öksüz
- Department of Bioengineering, Gebze Institute of Technology, Gebze, Kocaeli, Turkey
- Department of Chemical Engineering, Gebze Institute of Technology, Gebze, Kocaeli, Turkey
- Hasan Sadıkoğlu
- Department of Chemical Engineering, Gebze Institute of Technology, Gebze, Kocaeli, Turkey
- Tunahan Çakır
- Department of Bioengineering, Gebze Institute of Technology, Gebze, Kocaeli, Turkey
- * E-mail:
|
130
|
Jia B, Wang X. Gene regulatory network inference by point-based Gaussian approximation filters incorporating the prior information. EURASIP J Bioinform Syst Biol 2013; 2013:16. [PMID: 24341668 PMCID: PMC3977693 DOI: 10.1186/1687-4153-2013-16] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/25/2013] [Accepted: 11/11/2013] [Indexed: 11/17/2022]
Abstract
The extended Kalman filter (EKF) has been applied to inferring gene regulatory
networks. However, it is well known that the EKF becomes less accurate when the
system exhibits high nonlinearity. In addition, certain prior information about
the gene regulatory network exists in practice, and no systematic approach has
been developed to incorporate such prior information into the Kalman-type filter
for inferring the structure of the gene regulatory network. In this paper, an
inference framework based on point-based Gaussian approximation filters that can
exploit the prior information is developed to solve the gene regulatory network
inference problem. Different point-based Gaussian approximation filters, including
the unscented Kalman filter (UKF), the third-degree cubature Kalman filter
(CKF3), and the fifth-degree cubature Kalman filter
(CKF5) are employed. Several types of network prior information,
including the existing network structure information, sparsity assumption, and the
range constraint of parameters, are considered, and the corresponding filters
incorporating the prior information are developed. Experiments on a synthetic
network of eight genes and the yeast protein synthesis network of five genes are
carried out to demonstrate the performance of the proposed framework. The results
show that the proposed methods provide more accurate inference results than
existing methods, such as the EKF and the traditional UKF.
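To make the point-based approximation concrete, the sketch below generates the points of a third-degree spherical-radial cubature rule (2n equally weighted points at the mean plus or minus sqrt(n) times each column of a covariance square root) and uses them to approximate the mean of a nonlinear function of a Gaussian variable. It is a generic illustration rather than the authors' filter; the function names and the toy nonlinearity are hypothetical.

```python
import math


def cubature_points(mean, chol):
    """Return the 2n third-degree cubature points for N(mean, chol * chol^T).

    `chol` is a lower-triangular square root of the covariance (nested lists).
    """
    n = len(mean)
    scale = math.sqrt(n)
    points = []
    for i in range(n):
        column = [chol[r][i] for r in range(n)]
        points.append([m + scale * c for m, c in zip(mean, column)])
        points.append([m - scale * c for m, c in zip(mean, column)])
    return points


def propagate_mean(func, mean, chol):
    """Approximate E[func(x)] by averaging func over the cubature points."""
    images = [func(p) for p in cubature_points(mean, chol)]
    dim = len(images[0])
    return [sum(img[k] for img in images) / len(images) for k in range(dim)]


# Toy nonlinearity (hypothetical): f(x) = x0^2 + x1 with x ~ N([1, 2], I).
identity = [[1.0, 0.0], [0.0, 1.0]]
approx = propagate_mean(lambda x: [x[0] ** 2 + x[1]], [1.0, 2.0], identity)
print(approx)  # the exact expectation is 1 + 1 + 2 = 4
```

Because the rule is exact for polynomials up to degree three, the quadratic example is reproduced without linearization error, which is the advantage these filters have over the EKF for strongly nonlinear systems.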
Affiliation(s)
- Xiaodong Wang
- Department of Electrical Engineering, Columbia University, New York, NY 10027, USA.
|
131
|
Martini P, Sales G, Calura E, Brugiolo M, Lanfranchi G, Romualdi C, Cagnin S. Systems biology approach to the dissection of the complexity of regulatory networks in the S. scrofa cardiocirculatory system. Int J Mol Sci 2013; 14:23160-87. [PMID: 24284405 PMCID: PMC3856112 DOI: 10.3390/ijms141123160] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 08/09/2013] [Revised: 10/23/2013] [Accepted: 11/02/2013] [Indexed: 12/23/2022]
Abstract
Genome-wide experiments are routinely used to increase the understanding of the biological processes involved in the development and maintenance of a variety of pathologies. Although the technical feasibility of this type of experiment has improved in recent years, data analysis remains challenging. In this context, gene set analysis has emerged as a fundamental tool for the interpretation of the results. Here, we review strategies used in the gene set approach, and using datasets for the pig cardiocirculatory system as a case study, we demonstrate how the use of a combination of these strategies can enhance the interpretation of results. Gene set analyses are able to distinguish vessels from the heart and arteries from veins in a manner that is consistent with the different cellular composition of smooth muscle cells. By integrating microRNA elements in the regulatory circuits identified, we find that vessel specificity is maintained through specific miRNAs, such as miR-133a and miR-143, which show anti-correlated expression with their mRNA targets.
Affiliation(s)
- Paolo Martini
- Department of Biology, University of Padova, Via G. Colombo 3, Padova 35121, Italy; E-Mails: (P.M.); (G.S.); (E.C.); (G.L.)
- Gabriele Sales
- Department of Biology, University of Padova, Via G. Colombo 3, Padova 35121, Italy; E-Mails: (P.M.); (G.S.); (E.C.); (G.L.)
- Enrica Calura
- Department of Biology, University of Padova, Via G. Colombo 3, Padova 35121, Italy; E-Mails: (P.M.); (G.S.); (E.C.); (G.L.)
- Mattia Brugiolo
- C.R.I.B.I. Biotechnology Centre, University of Padova, Via U. Bassi 58/B, Padova 35121, Italy; E-Mail:
- Gerolamo Lanfranchi
- Department of Biology, University of Padova, Via G. Colombo 3, Padova 35121, Italy; E-Mails: (P.M.); (G.S.); (E.C.); (G.L.)
- C.R.I.B.I. Biotechnology Centre, University of Padova, Via U. Bassi 58/B, Padova 35121, Italy; E-Mail:
- Chiara Romualdi
- Department of Biology, University of Padova, Via G. Colombo 3, Padova 35121, Italy; E-Mails: (P.M.); (G.S.); (E.C.); (G.L.)
- Authors to whom correspondence should be addressed; E-Mails: (C.R.); (S.C.); Tel.: +39-049-827-7401 (C.R.); +39-049-827-6162 (S.C.); Fax: +39-049-827-6159 (C.R. & S.C.)
- Stefano Cagnin
- Department of Biology, University of Padova, Via G. Colombo 3, Padova 35121, Italy; E-Mails: (P.M.); (G.S.); (E.C.); (G.L.)
- C.R.I.B.I. Biotechnology Centre, University of Padova, Via U. Bassi 58/B, Padova 35121, Italy; E-Mail:
- Authors to whom correspondence should be addressed; E-Mails: (C.R.); (S.C.); Tel.: +39-049-827-7401 (C.R.); +39-049-827-6162 (S.C.); Fax: +39-049-827-6159 (C.R. & S.C.)
|
132
|
Misra A, Sriram G. Network component analysis provides quantitative insights on an Arabidopsis transcription factor-gene regulatory network. BMC Syst Biol 2013; 7:126. [PMID: 24228871 PMCID: PMC3843564 DOI: 10.1186/1752-0509-7-126] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 08/24/2013] [Accepted: 11/05/2013] [Indexed: 01/01/2023]
Abstract
Background: Gene regulatory networks (GRNs) are models of molecule-gene interactions instrumental in the coordination of gene expression. Transcription factor (TF)-GRNs are an important subset of GRNs that characterize gene expression as the effect of TFs acting on their target genes. Although such networks can qualitatively summarize TF-gene interactions, it is highly desirable to quantitatively determine the strengths of the interactions in a TF-GRN as well as the magnitudes of TF activities. To our knowledge, such analysis is rare in plant biology. A computational methodology developed for this purpose is network component analysis (NCA), which has been used for studying large-scale microbial TF-GRNs to obtain nontrivial, mechanistic insights. In this work, we employed NCA to quantitatively analyze a plant TF-GRN important in floral development using available regulatory information from AGRIS, by processing previously reported gene expression data from four shoot apical meristem cell types. Results: The NCA model satisfactorily accounted for gene expression measurements in a TF-GRN of seven TFs (LFY, AG, SEPALLATA3 [SEP3], AP2, AGL15, HY5 and AP3/PI) and 55 genes. NCA found strong interactions between certain TF-gene pairs including LFY → MYB17, AG → CRC, AP2 → RD20, AGL15 → RAV2 and HY5 → HLH1, and the direction of the interaction (activation or repression) for some AGL15 targets for which this information was not previously available. The activity trends of four TFs - LFY, AG, HY5 and AP3/PI as deduced by NCA correlated well with the changes in expression levels of the genes encoding these TFs across all four cell types; such a correlation was not observed for SEP3, AP2 and AGL15. Conclusions: For the first time, we have reported the use of NCA to quantitatively analyze a plant TF-GRN important in floral development for obtaining nontrivial information about connectivity strengths between TFs and their target genes as well as TF activity. However, since NCA relies on documented connectivity information about the underlying TF-GRN, it is currently limited in its application to larger plant networks because of the lack of documented connectivities. In the future, the identification of interactions between plant TFs and their target genes on a genome scale would allow the use of NCA to provide quantitative regulatory information about plant TF-GRNs, leading to improved insights on cellular regulatory programs.
Affiliation(s)
- Ganesh Sriram
- Department of Chemical and Biomolecular Engineering, University of Maryland, College Park, MD 20742, USA.
|
133
|
Maria G, Luta I. Structured cell simulator coupled with a fluidized bed bioreactor model to predict the adaptive mercury uptake by E. coli cells. Comput Chem Eng 2013. [DOI: 10.1016/j.compchemeng.2013.06.004] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/17/2022]
|
134
|
Sławek J, Arodź T. ENNET: inferring large gene regulatory networks from expression data using gradient boosting. BMC Syst Biol 2013; 7:106. [PMID: 24148309 PMCID: PMC4015806 DOI: 10.1186/1752-0509-7-106] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 06/24/2013] [Accepted: 10/17/2013] [Indexed: 01/19/2023]
Abstract
Background: The regulation of gene expression by transcription factors is a key determinant of cellular phenotypes. Deciphering genome-wide networks that capture which transcription factors regulate which genes is one of the major efforts towards understanding and accurate modeling of living systems. However, reverse-engineering the network from gene expression profiles remains a challenge, because the data are noisy, high dimensional and sparse, and the regulation is often obscured by indirect connections. Results: We introduce a gene regulatory network inference algorithm ENNET, which reverse-engineers networks of transcriptional regulation from a variety of expression profiles with a superior accuracy compared to the state-of-the-art methods. The proposed method relies on the boosting of regression stumps combined with a relative variable importance measure for the initial scoring of transcription factors with respect to each gene. Then, we propose a technique for using a distribution of the initial scores and information about knockouts to refine the predictions. We evaluated the proposed method on the DREAM3, DREAM4 and DREAM5 data sets and achieved higher accuracy than the winners of those competitions and other established methods. Conclusions: Superior accuracy achieved on the three different benchmark data sets shows that ENNET is a top contender in the task of network inference. It is a versatile method that uses information about which gene was knocked-out in which experiment if it is available, but remains the top performer even without such information. ENNET is available for download from https://github.com/slawekj/ennet under the GNU GPLv3 license.
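The initial scoring step described in this abstract, boosting regression stumps and reading off a relative variable importance for each candidate transcription factor, can be sketched as follows. This is a minimal pure-Python illustration under a squared-error loss, not the ENNET implementation; the function names, learning rate, number of rounds, and toy expression matrix are assumptions made for the example.

```python
def fit_stump(rows, residual):
    """Find the single split that most reduces squared error.

    Returns (feature, threshold, left_mean, right_mean, gain), or None when
    no feature takes more than one distinct value.
    """
    n = len(rows)
    mean = sum(residual) / n
    base_sse = sum((r - mean) ** 2 for r in residual)
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(row[j] for row in rows))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(rows, residual) if row[j] <= thr]
            right = [r for row, r in zip(rows, residual) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            gain = base_sse - sse
            if best is None or gain > best[4]:
                best = (j, thr, lm, rm, gain)
    return best


def boost_importance(rows, target, rounds=20, rate=0.5):
    """Boost stumps on one target gene; return normalized gain per regulator."""
    pred = [sum(target) / len(target)] * len(target)
    importance = [0.0] * len(rows[0])
    for _ in range(rounds):
        residual = [t - p for t, p in zip(target, pred)]
        stump = fit_stump(rows, residual)
        if stump is None:
            break
        j, thr, lm, rm, gain = stump
        importance[j] += gain  # credit the regulator chosen in this round
        pred = [p + rate * (lm if row[j] <= thr else rm)
                for row, p in zip(rows, pred)]
    total = sum(importance) or 1.0
    return [v / total for v in importance]


# Toy data (hypothetical): the target gene tracks regulator 0, not regulator 1.
expression = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0],
              [3.0, 0.0], [4.0, 1.0], [5.0, 0.0]]
target = [2.0 * row[0] for row in expression]
importance = boost_importance(expression, target)
print(importance)
```

Repeating this for every gene yields one importance vector per target, which is the kind of initial score matrix that a refinement step using knockout information could then adjust.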
Affiliation(s)
- Janusz Sławek
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia
- Tomasz Arodź
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia
|
135
|
Roy S, Lagree S, Hou Z, Thomson JA, Stewart R, Gasch AP. Integrated module and gene-specific regulatory inference implicates upstream signaling networks. PLoS Comput Biol 2013; 9:e1003252. [PMID: 24146602 PMCID: PMC3798279 DOI: 10.1371/journal.pcbi.1003252] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.9] [Received: 05/29/2013] [Accepted: 08/17/2013] [Indexed: 11/19/2022]
Abstract
Regulatory networks that control gene expression are important in diverse biological contexts including stress response and development. Each gene's regulatory program is determined by module-level regulation (e.g. co-regulation via the same signaling system), as well as gene-specific determinants that can fine-tune expression. We present a novel approach, Modular regulatory network learning with per gene information (MERLIN), that infers regulatory programs for individual genes while probabilistically constraining these programs to reveal module-level organization of regulatory networks. Using edge-, regulator- and module-based comparisons of simulated networks of known ground truth, we find MERLIN reconstructs regulatory programs of individual genes as well or better than existing approaches of network reconstruction, while additionally identifying modular organization of the regulatory networks. We use MERLIN to dissect global transcriptional behavior in two biological contexts: yeast stress response and human embryonic stem cell differentiation. Regulatory modules inferred by MERLIN capture co-regulatory relationships between signaling proteins and downstream transcription factors thereby revealing the upstream signaling systems controlling transcriptional responses. The inferred networks are enriched for regulators with genetic or physical interactions, supporting the inference, and identify modules of functionally related genes bound by the same transcriptional regulators. Our method combines the strengths of per-gene and per-module methods to reveal new insights into transcriptional regulation in stress and development.
Affiliation(s)
- Sushmita Roy
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Wisconsin Institute for Discovery, Madison, Wisconsin, United States of America
- * E-mail:
- Stephen Lagree
- Department of Computer Science, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Zhonggang Hou
- Morgridge Institute for Research, Madison, Wisconsin, United States of America
- James A. Thomson
- Morgridge Institute for Research, Madison, Wisconsin, United States of America
- Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Department of Molecular, Cellular, and Developmental Biology, University of California Santa Barbara, Santa Barbara, California, United States of America
- Ron Stewart
- Morgridge Institute for Research, Madison, Wisconsin, United States of America
- Audrey P. Gasch
- Department of Genetics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
|
136
|
Network-based approaches in drug discovery and early development. Clin Pharmacol Ther 2013; 94:651-8. [PMID: 24025802 DOI: 10.1038/clpt.2013.176] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.0] [Received: 08/23/2013] [Accepted: 09/03/2013] [Indexed: 12/20/2022]
Abstract
Identification of novel targets is a critical first step in the drug discovery and development process. Most diseases such as cancer, metabolic disorders, and neurological disorders are complex, and their pathogenesis involves multiple genetic and environmental factors. Finding a viable drug target-drug combination with high potential for yielding clinical success within the efficacy-toxicity spectrum is extremely challenging. Many examples are now available in which network-based approaches show potential for the identification of novel targets and for the repositioning of established targets. The objective of this article is to highlight network approaches for identifying novel targets with greater chances of gaining approved drugs with maximal efficacy and minimal side effects. Further enhancement of these approaches may emerge from effectively integrating computational systems biology with pharmacodynamic systems analysis. Coupling genomics, proteomics, and metabolomics databases with systems pharmacology modeling may aid in the development of disease-specific networks that can be further used to build confidence in target identification.
|
137
|
Rahman A, Poirel CL, Badger DJ, Estep C, Murali T. Reverse engineering molecular hypergraphs. IEEE/ACM Trans Comput Biol Bioinform 2013; 10:1113-1124. [PMID: 24384702 PMCID: PMC4051496 DOI: 10.1109/tcbb.2013.71] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/03/2023]
Abstract
Analysis of molecular interaction networks is pervasive in systems biology. This research relies almost entirely on graphs for modeling interactions. However, edges in graphs cannot represent multiway interactions among molecules, which occur very often within cells. Hypergraphs may be better representations for networks having such interactions, since hyperedges can naturally represent relationships among multiple molecules. Here, we propose using hypergraphs to capture the uncertainty inherent in reverse engineering gene-gene networks. Some subsets of nodes may induce highly varying subgraphs across an ensemble of networks inferred by a reverse engineering algorithm. We provide a novel formulation of hyperedges to capture this uncertainty in network topology. We propose a clustering-based approach to discover hyperedges. We show that our approach can recover hyperedges planted in synthetic data sets with high precision and recall, even for moderate amount of noise. We apply our techniques to a data set of pathways inferred from genetic interaction data in S. cerevisiae related to the unfolded protein response. Our approach discovers several hyperedges that capture the uncertain connectivity of genes in relevant protein complexes, suggesting that further experiments may be required to precisely discern their interaction patterns. We also show that these complexes are not discovered by an algorithm that computes frequent and dense subgraphs.
Affiliation(s)
- Ahsanur Rahman
- Department of Computer Science, Virginia Tech, Blacksburg, VA
- David J. Badger
- Department of Computer Science, Virginia Tech, Blacksburg, VA
- Craig Estep
- Department of Computer Science, Virginia Tech, Blacksburg, VA
- T.M. Murali
- Department of Computer Science and the ICTAS Center for Systems Biology of Engineered Tissues, Virginia Tech, Blacksburg, VA
|
138
|
Pinna A, Heise S, Flassig RJ, de la Fuente A, Klamt S. Reconstruction of large-scale regulatory networks based on perturbation graphs and transitive reduction: improved methods and their evaluation. BMC Syst Biol 2013; 7:73. [PMID: 23924435 PMCID: PMC4231426 DOI: 10.1186/1752-0509-7-73] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 03/15/2013] [Accepted: 08/05/2013] [Indexed: 02/08/2023]
Abstract
Background: The data-driven inference of intracellular networks is one of the key challenges of computational and systems biology. As suggested by recent works, a simple yet effective approach for reconstructing regulatory networks comprises the following two steps. First, the observed effects induced by directed perturbations are collected in a signed and directed perturbation graph (PG). In a second step, Transitive Reduction (TR) is used to identify and eliminate those edges in the PG that can be explained by paths and are therefore likely to reflect indirect effects. Results: In this work we introduce novel variants for PG generation and TR, leading to significantly improved performances. The key modifications concern: (i) use of novel statistical criteria for deriving a high-quality PG from experimental data; (ii) the application of local TR which allows only short paths to explain (and remove) a given edge; and (iii) a novel strategy to rank the edges with respect to their confidence. To compare the new methods with existing ones we not only apply them to a recent DREAM network inference challenge but also to a novel and unprecedented synthetic compendium consisting of 30 5000-gene networks simulated with varying biological and measurement error variances resulting in a total of 270 datasets. The benchmarks clearly demonstrate the superior reconstruction performance of the novel PG and TR variants compared to existing approaches. Moreover, the benchmark enabled us to draw some general conclusions. For example, it turns out that local TR restricted to paths with a length of only two is often sufficient or even favorable. We also demonstrate that considering edge weights is highly beneficial for TR whereas consideration of edge signs is of minor importance. We explain these observations from a graph-theoretical perspective and discuss the consequences with respect to a greatly reduced computational demand to conduct TR. Finally, as a realistic application scenario, we use our framework for inferring gene interactions in yeast based on a library of gene expression data measured in mutants with single knockouts of transcription factors. The reconstructed network shows a significant enrichment of known interactions, especially within the 100 most confident (and for experimental validation most relevant) edges. Conclusions: This paper presents several major achievements. The novel methods introduced herein can be seen as state of the art for inference techniques relying on perturbation graphs and transitive reduction. Another key result of the study is the generation of a new and unprecedented large-scale in silico benchmark dataset accounting for different noise levels and providing a solid basis for unbiased testing of network inference methodologies. Finally, applying our approach to Saccharomyces cerevisiae suggested several new gene interactions with high confidence awaiting experimental validation.
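The idea of local transitive reduction restricted to paths of length two can be sketched in a few lines. The removal rule used here (drop an edge when some two-step path has both of its edges weighted more strongly than the edge itself) is an assumption chosen for illustration, edge signs are ignored, and the function name and toy graph are hypothetical; the abstract does not specify the authors' exact criteria.

```python
def local_transitive_reduction(weights):
    """Remove edges explained by a stronger path of length two.

    `weights` maps (source, target) pairs to a confidence weight. Every edge
    is tested against the original graph, so the result does not depend on
    the order in which edges are visited.
    """
    nodes = {node for edge in weights for node in edge}
    kept = {}
    for (src, dst), w in weights.items():
        explained = any(
            mid not in (src, dst)
            and min(weights.get((src, mid), 0.0),
                    weights.get((mid, dst), 0.0)) > w
            for mid in nodes
        )
        if not explained:
            kept[(src, dst)] = w
    return kept


# Toy perturbation graph (hypothetical): A->C is weak and is explained by
# the stronger two-step path A->B->C.
perturbation_graph = {
    ("A", "B"): 0.9,
    ("B", "C"): 0.8,
    ("A", "C"): 0.3,
    ("C", "D"): 0.7,
}
reduced = local_transitive_reduction(perturbation_graph)
print(sorted(reduced))
```

Restricting the search to a single intermediate node keeps the cost at roughly the number of edges times the number of nodes, which is in line with the reduced computational demand the abstract attributes to short-path TR.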
Affiliation(s)
- Andrea Pinna
- Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany.
|
139
|
Rajagopalan P, Kasif S, Murali T. Systems Biology Characterization of Engineered Tissues. Annu Rev Biomed Eng 2013; 15:55-70. [DOI: 10.1146/annurev-bioeng-071811-150120] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 11/09/2022]
Affiliation(s)
- Padmavathy Rajagopalan
- Department of Chemical Engineering, Virginia Tech, Blacksburg, Virginia 24060
- School of Biomedical Engineering and Sciences, Virginia Tech, Blacksburg, Virginia 24060
- ICTAS Center for Systems Biology of Engineered Tissues, Virginia Tech, Blacksburg, Virginia 24060
- Simon Kasif
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts 02215
- T.M. Murali
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia 24060
- ICTAS Center for Systems Biology of Engineered Tissues, Virginia Tech, Blacksburg, Virginia 24060
|
140
|
Santra T, Kolch W, Kholodenko BN. Integrating Bayesian variable selection with Modular Response Analysis to infer biochemical network topology. BMC Syst Biol 2013; 7:57. [PMID: 23829771 PMCID: PMC3726398 DOI: 10.1186/1752-0509-7-57] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 10/23/2012] [Accepted: 06/28/2013] [Indexed: 12/31/2022]
Abstract
Background: Recent advancements in genetics and proteomics have led to the acquisition of large quantitative data sets. However, the use of these data to reverse engineer biochemical networks has remained a challenging problem. Many methods have been proposed to infer biochemical network topologies from different types of biological data. Here, we focus on unraveling network topologies from steady state responses of biochemical networks to successive experimental perturbations. Results: We propose a computational algorithm which combines a deterministic network inference method termed Modular Response Analysis (MRA) and a statistical model selection algorithm called Bayesian Variable Selection, to infer functional interactions in cellular signaling pathways and gene regulatory networks. It can be used to identify interactions among individual molecules involved in a biochemical pathway or reveal how different functional modules of a biological network interact with each other to exchange information. In cases where not all network components are known, our method reveals functional interactions which are not direct but correspond to the interaction routes through unknown elements. Using computer simulated perturbation responses of signaling pathways and gene regulatory networks from the DREAM challenge, we demonstrate that the proposed method is robust against noise and scalable to large networks. We also show that our method can infer network topologies using incomplete perturbation datasets. Consequently, we have used this algorithm to explore the ERBB regulated G1/S transition pathway in certain breast cancer cells to understand the molecular mechanisms which cause these cells to become drug resistant. The algorithm successfully inferred many well characterized interactions of this pathway by analyzing experimentally obtained perturbation data. Additionally, it identified some molecular interactions which promote drug resistance in breast cancer cells. Conclusions: The proposed algorithm provides a robust, scalable and cost-effective solution for inferring network topologies from biological data. It can potentially be applied to explore novel pathways which play important roles in life-threatening diseases such as cancer.
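The deterministic MRA step mentioned in this abstract can be illustrated for a two-module network. The sketch assumes the standard MRA relation from the wider literature, in which the local response matrix is obtained from the measured global response matrix R as r = -[diag(R^-1)]^-1 R^-1, so that every diagonal entry of r equals -1. It omits the Bayesian variable selection that the authors add on top, and the function names and toy numbers are hypothetical.

```python
def invert_2x2(matrix):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def local_responses(global_response):
    """Apply r = -[diag(R^-1)]^-1 R^-1 to a 2x2 global response matrix.

    Entry r[i][j] is read as the direct effect of module j on module i.
    """
    inv = invert_2x2(global_response)
    return [[-inv[i][j] / inv[i][i] for j in range(2)] for i in range(2)]


# Toy global responses (hypothetical): column j holds the steady-state
# response of both modules to a perturbation applied to module j.
R = [[1.0, 0.5], [0.0, 1.0]]
r = local_responses(R)
print(r)
```

In this toy case the recovered matrix indicates that module 2 directly activates module 1 (r[0][1] = 0.5) while module 1 has no direct effect on module 2 (r[1][0] = 0), which is the kind of topology statement a noise-aware selection step would then have to support statistically.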
Affiliation(s)
- Tapesh Santra
- Systems Biology Ireland, Conway Institute of Biomolecular & Biomedical Research, University College Dublin, Belfield, Dublin 4, Ireland.
|
141
|
Villaverde AF, Ross J, Banga JR. Reverse engineering cellular networks with information theoretic methods. Cells 2013; 2:306-29. [PMID: 24709703 PMCID: PMC3972682 DOI: 10.3390/cells2020306] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Received: 02/26/2013] [Revised: 04/22/2013] [Accepted: 04/27/2013] [Indexed: 11/16/2022]
Abstract
Building mathematical models of cellular networks lies at the core of systems biology. It involves, among other tasks, the reconstruction of the structure of interactions between molecular components, which is known as network inference or reverse engineering. Information theory can help in the goal of extracting as much information as possible from the available data. A large number of methods founded on these concepts have been proposed in the literature, not only in biology journals, but in a wide range of areas. Their critical comparison is difficult due to the different focuses and the adoption of different terminologies. Here we attempt to review some of the existing information theoretic methodologies for network inference, and clarify their differences. While some of these methods have achieved notable success, many challenges remain, among which we can mention dealing with incomplete measurements, noisy data, counterintuitive behaviour emerging from nonlinear relations or feedback loops, and computational burden of dealing with large data sets.
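Most information theoretic inference methods of the kind reviewed here start from pairwise mutual information between expression profiles. As a minimal sketch, assuming the profiles have already been discretized into a small number of levels, the plug-in estimate can be computed as below; the function name and toy profiles are hypothetical, and the estimator is the generic textbook one rather than any specific method from the review.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum(
        (count / n) * log2((count / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), count in pxy.items()
    )


# Toy discretized profiles (hypothetical): one perfectly coupled pair and
# one statistically independent pair.
coupled = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
print(coupled, independent)
```

With realistic sample sizes the plug-in estimate is biased upward, so methods typically threshold the scores or apply a correction before declaring an edge.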
Affiliation(s)
- John Ross
- Department of Chemistry, Stanford University, Stanford, CA 94305, USA.
- Julio R Banga
- Bioprocess Engineering Group, IIM-CSIC, Eduardo Cabello 6, Vigo 36208, Spain.
|
142
|
Wu M, Liu L, Hijazi H, Chan C. A multi-layer inference approach to reconstruct condition-specific genes and their regulation. Bioinformatics 2013; 29:1541-52. [PMID: 23610368 DOI: 10.1093/bioinformatics/btt186] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 11/14/2022]
Abstract
An important topic in systems biology is the reverse engineering of regulatory mechanisms through reconstruction of context-dependent gene networks. A major challenge is to identify the genes and the regulations specific to a condition or phenotype, given that regulatory processes are highly connected such that a specific response is typically accompanied by numerous collateral effects. In this study, we design a multi-layer approach that is able to reconstruct condition-specific genes and their regulation through an integrative analysis of large-scale information of gene expression, protein interaction and transcriptional regulation (transcription factor-target gene relationships). We establish the accuracy of our methodology against synthetic datasets, as well as a yeast dataset. We then extend the framework to the application of higher eukaryotic systems, including human breast cancer and Arabidopsis thaliana cold acclimation. Our study identified TACSTD2 (TROP2) as a target gene for human breast cancer and discovered its regulation by transcription factors CREB, as well as NFkB. We also predict that KIF2C is a target gene for ER-/HER2- breast cancer and is positively regulated by E2F1. The predictions were further confirmed through experimental studies. Availability: The implementation and detailed protocol of the layer approach is available at http://www.egr.msu.edu/changroup/Protocols/Three-layer%20approach%20to%20reconstruct%20condition.html.
Affiliation(s)
- Ming Wu
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
|
143
|
Lim N, Senbabaoglu Y, Michailidis G, d'Alché-Buc F. OKVAR-Boost: a novel boosting algorithm to infer nonlinear dynamics and interactions in gene regulatory networks. Bioinformatics 2013; 29:1416-23. [PMID: 23574736 PMCID: PMC3661057 DOI: 10.1093/bioinformatics/btt167] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 01/12/2023]
Abstract
MOTIVATION Reverse engineering of gene regulatory networks remains a central challenge in computational systems biology, despite recent advances facilitated by benchmark in silico challenges that have aided in calibrating their performance. A number of approaches using either perturbation (knock-out) or wild-type time-series data have appeared in the literature addressing this problem, with the latter using linear temporal models. Nonlinear dynamical models are particularly appropriate for this inference task, given the generation mechanism of the time-series data. In this study, we introduce a novel nonlinear autoregressive model based on operator-valued kernels that simultaneously learns the model parameters, as well as the network structure. RESULTS A flexible boosting algorithm (OKVAR-Boost) that shares features from L2-boosting and randomization-based algorithms is developed to perform the tasks of parameter learning and network inference for the proposed model. Specifically, at each boosting iteration, a regularized Operator-valued Kernel-based Vector AutoRegressive model (OKVAR) is trained on a random subnetwork. The final model consists of an ensemble of such models. The empirical estimation of the ensemble model's Jacobian matrix provides an estimation of the network structure. The performance of the proposed algorithm is first evaluated on a number of benchmark datasets from the DREAM3 challenge and then on real datasets related to the In vivo Reverse-Engineering and Modeling Assessment (IRMA) and T-cell networks. The high-quality results obtained strongly indicate that it outperforms existing approaches. AVAILABILITY The OKVAR-Boost Matlab code is available as the archive: http://amis-group.fr/sourcecode-okvar-boost/OKVARBoost-v1.0.zip. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Néhémy Lim
- IBISC EA 4526, Université d'Évry-Val d'Essonne, Évry, France
|
144
|
Villa-Vialaneix N, Liaubet L, Laurent T, Cherel P, Gamot A, SanCristobal M. The structure of a gene co-expression network reveals biological functions underlying eQTLs. PLoS One 2013; 8:e60045. [PMID: 23577081 PMCID: PMC3618335 DOI: 10.1371/journal.pone.0060045] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Received: 11/09/2012] [Accepted: 02/20/2013] [Indexed: 11/18/2022] Open
Abstract
What are the commonalities between genes whose expression level is partially controlled by eQTL, especially with regard to biological functions? Moreover, how are these genes related to a phenotype of interest? These issues are particularly difficult to address when the genome annotation is incomplete, as is the case for mammalian species. Moreover, the direct link between gene expression and a phenotype of interest may be weak, and thus difficult to handle. In this framework, the use of a co-expression network has proven useful: it is a robust approach for modeling a complex system of genetic regulations and for inferring knowledge about as yet unknown genes. In this article, a case study was conducted with a mammalian species. It showed that the use of a co-expression network based on partial correlation, combined with a relevant clustering of nodes, leads to an enrichment of biological functions of around 83%. Moreover, the use of a spatial statistics approach allowed us to superimpose additional information related to a phenotype; this led to highlighting specific genes or gene clusters that are related to the network structure and the phenotype. Three main results are worth noting: first, key genes were highlighted as a potential focus for forthcoming biological experiments; second, a set of biological functions, which support a list of genes under partial eQTL control, was set up by an overview of the global structure of the gene expression network; third, pH was found correlated with gene clusters, and then with related biological functions, as a result of a spatial analysis of the network topology.
|
145
|
Altomare D, Consonni G, La Rocca L. Objective Bayesian search of Gaussian directed acyclic graphical models for ordered variables with non-local priors. Biometrics 2013; 69:478-87. [PMID: 23560520 DOI: 10.1111/biom.12018] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 03/01/2013] [Revised: 03/01/2013] [Accepted: 03/01/2013] [Indexed: 11/30/2022]
Abstract
Directed acyclic graphical (DAG) models are increasingly employed in the study of physical and biological systems to model direct influences between variables. Identifying the graph from data is a challenging endeavor, which can be more reasonably tackled if the variables are assumed to satisfy a given ordering; in this case we simply have to estimate the presence or absence of each potential edge. Working under this assumption, we propose an objective Bayesian method for searching the space of Gaussian DAG models, which provides a rich output from minimal input. We base our analysis on non-local parameter priors, which are especially suited for learning sparse graphs, because they allow a faster learning rate, relative to ordinary local parameter priors, when the true unknown sampling distribution belongs to a simple model. We implement an efficient stochastic search algorithm, which deals effectively with data sets having sample size smaller than the number of variables, and apply our method to a variety of simulated and real data sets. Our approach compares favorably, in terms of the ROC curve for edge hit rate versus false alarm rate, to current state-of-the-art frequentist methods relying on the assumption of ordered variables; under this assumption it exhibits a competitive advantage over the PC-algorithm, which can be considered as a frequentist benchmark for unordered variables. Importantly, we find that our method is still at an advantage for learning the skeleton of the DAG, when the ordering of the variables is only moderately mis-specified. Prospectively, our method could be coupled with a strategy to learn the order of the variables, thus dropping the known ordering assumption.
Affiliation(s)
- Davide Altomare
- Dipartimento di Matematica, Università di Pavia, Via Ferrata 1, 27100 Pavia, Italy
|
146
|
Bartel J, Krumsiek J, Theis FJ. Statistical methods for the analysis of high-throughput metabolomics data. Comput Struct Biotechnol J 2013; 4:e201301009. [PMID: 24688690 PMCID: PMC3962125 DOI: 10.5936/csbj.201301009] [Citation(s) in RCA: 171] [Impact Index Per Article: 15.5] [Received: 10/24/2012] [Revised: 03/05/2013] [Accepted: 03/07/2013] [Indexed: 11/24/2022] Open
Abstract
Metabolomics is a relatively new high-throughput technology that aims at measuring all endogenous metabolites within a biological sample in an unbiased fashion. The resulting metabolic profiles may be regarded as functional signatures of the physiological state, and have been shown to comprise effects of genetic regulation as well as environmental factors. This potential to connect genotypic to phenotypic information promises new insights and biomarkers for different research fields, including biomedical and pharmaceutical research. In the statistical analysis of metabolomics data, many techniques from other omics fields can be reused. Recently, however, a number of tools specific to metabolomics data have been developed as well. The focus of this mini review will be on recent advancements in the analysis of metabolomics data, especially by utilizing Gaussian graphical models and independent component analysis.
Affiliation(s)
- Jörg Bartel
- Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
- Jan Krumsiek
- Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
- Fabian J Theis
- Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany; Department of Mathematics, Technische Universität München, Boltzmannstr. 3, 85747 Garching, Germany
|
147
|
Yates PD, Mukhopadhyay ND. An inferential framework for biological network hypothesis tests. BMC Bioinformatics 2013; 14:94. [PMID: 23496778 PMCID: PMC3621801 DOI: 10.1186/1471-2105-14-94] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 02/06/2012] [Accepted: 03/03/2013] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Networks are ubiquitous in modern cell biology and physiology. A large literature exists for inferring/proposing biological pathways/networks using statistical or machine learning algorithms. Despite these advances a formal testing procedure for analyzing network-level observations is in need of further development. Comparing the behaviour of a pharmacologically altered pathway to its canonical form is an example of a salient one-sample comparison. Locating which pathways differentiate disease from no-disease phenotype may be recast as a two-sample network inference problem. RESULTS We outline an inferential method for performing one- and two-sample hypothesis tests where the sampling unit is a network and the hypotheses are stated via network model(s). We propose a dissimilarity measure that incorporates nearby neighbour information to contrast one or more networks in a statistical test. We demonstrate and explore the utility of our approach with both simulated and microarray data; random graphs and weighted (partial) correlation networks are used to form network models. Using both a well-known diabetes dataset and an ovarian cancer dataset, the methods outlined here could better elucidate co-regulation changes for one or more pathways between two clinically relevant phenotypes. CONCLUSIONS Formal hypothesis tests for gene- or protein-based networks are a logical progression from existing gene-based and gene-set tests for differential expression. Commensurate with the growing appreciation and development of systems biology, the dissimilarity-based testing methods presented here may allow us to improve our understanding of pathways and other complex regulatory systems. The benefit of our method was illustrated under select scenarios.
Affiliation(s)
- Nitai D Mukhopadhyay
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
|
148
|
Kusano M, Fukushima A. Current challenges and future potential of tomato breeding using omics approaches. Breed Sci 2013; 63:31-41. [PMID: 23641179 PMCID: PMC3621443 DOI: 10.1270/jsbbs.63.31] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 07/13/2012] [Accepted: 10/30/2012] [Indexed: 05/16/2023]
Abstract
As tomatoes are one of the most important vegetables in the world, improvements in the quality and yield of tomato are strongly required. For this purpose, omics approaches such as metabolomics and transcriptomics are used not only for basic research to understand relationships between important traits and metabolism but also for the development of next generation breeding strategies of tomato plants, because an increase in the knowledge improves the taste and quality, stress resistance and/or potentially health-beneficial metabolites and is connected to improvements in the biochemical composition of tomatoes. Such omics data can be applied to network analyses to potentially reveal unknown cellular regulatory networks in tomato plants. The high-quality tomato genome that was sequenced in 2012 will likely accelerate the application of omics strategies, including next generation sequencing for tomato breeding. In this review, we highlight the current studies of omics network analyses of tomatoes and other plant species, in particular, a gene coexpression network. Key applications of omics approaches are also presented as case examples to improve economically important traits for tomato breeding.
Affiliation(s)
- Miyako Kusano
- RIKEN Plant Science Center, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
- Kihara Institute for Biological Research, Yokohama City University, 641-12 Maioka, Totsuka, Yokohama, Kanagawa 244-0813, Japan
- Corresponding author
- Atsushi Fukushima
- RIKEN Plant Science Center, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
|
149
|
Schneider HC, Klabunde T. Understanding drugs and diseases by systems biology? Bioorg Med Chem Lett 2013; 23:1168-76. [DOI: 10.1016/j.bmcl.2012.12.031] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 10/17/2012] [Revised: 12/07/2012] [Accepted: 12/11/2012] [Indexed: 10/27/2022]
|
150
|
Dowell KG, Simons AK, Wang ZZ, Yun K, Hibbs MA. Cell-type-specific predictive network yields novel insights into mouse embryonic stem cell self-renewal and cell fate. PLoS One 2013; 8:e56810. [PMID: 23468881 PMCID: PMC3585227 DOI: 10.1371/journal.pone.0056810] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 09/21/2012] [Accepted: 01/14/2013] [Indexed: 01/25/2023] Open
Abstract
Self-renewal, the ability of a stem cell to divide repeatedly while maintaining an undifferentiated state, is a defining characteristic of all stem cells. Here, we clarify the molecular foundations of mouse embryonic stem cell (mESC) self-renewal by applying a proven Bayesian network machine learning approach to integrate high-throughput data for protein function discovery. By focusing on a single stem-cell system, at a specific developmental stage, within the context of well-defined biological processes known to be active in that cell type, we produce a consensus predictive network that reflects biological reality more closely than those made by prior efforts using more generalized, context-independent methods. In addition, we show how machine learning efforts may be misled if the tissue specific role of mammalian proteins is not defined in the training set and circumscribed in the evidential data. For this study, we assembled an extensive compendium of mESC data: ∼2.2 million data points, collected from 60 different studies, under 992 conditions. We then integrated these data into a consensus mESC functional relationship network focused on biological processes associated with embryonic stem cell self-renewal and cell fate determination. Computational evaluations, literature validation, and analyses of predicted functional linkages show that our results are highly accurate and biologically relevant. Our mESC network predicts many novel players involved in self-renewal and serves as the foundation for future pluripotent stem cell studies. This network can be used by stem cell researchers (at http://StemSight.org) to explore hypotheses about gene function in the context of self-renewal and to prioritize genes of interest for experimental validation.
Affiliation(s)
- Karen G. Dowell
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, Maine, United States of America
- Allen K. Simons
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Zack Z. Wang
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, Maine, United States of America
- Johns Hopkins University, Baltimore, Maryland, United States of America
- Kyuson Yun
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, Maine, United States of America
- Matthew A. Hibbs
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, Maine, United States of America
- Trinity University, Department of Computer Science, San Antonio, Texas, United States of America
|