1
Wang X, Rai N, Merchel Piovesan Pereira B, Eetemadi A, Tagkopoulos I. Accelerated knowledge discovery from omics data by optimal experimental design. Nat Commun 2020; 11:5026. [PMID: 33024104 PMCID: PMC7538421 DOI: 10.1038/s41467-020-18785-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 12/31/2019] [Accepted: 08/27/2020] [Indexed: 12/15/2022]
Abstract
How to design experiments that accelerate knowledge discovery on complex biological landscapes remains a tantalizing question. We present an optimal experimental design method (coined OPEX) to identify informative omics experiments using machine learning models for both experimental space exploration and model training. OPEX-guided exploration of Escherichia coli’s populations exposed to biocide and antibiotic combinations leads to more accurate predictive models of gene expression with 44% less data. Analysis of the proposed experiments shows that broad exploration of the experimental space followed by fine-tuning emerges as the optimal strategy. Additionally, analysis of the experimental data reveals 29 cases of cross-stress protection and 4 cases of cross-stress vulnerability. Further validation reveals the central role of chaperones, stress response proteins and transport pumps in cross-stress exposure. This work demonstrates how active learning can be used to guide omics data collection for training predictive models, making evidence-driven decisions and accelerating knowledge discovery in life sciences.
Affiliation(s)
- Xiaokang Wang
- Department of Biomedical Engineering, University of California, Davis, CA, 95616, USA; Genome Center, University of California, Davis, CA, 95616, USA
- Navneet Rai
- Genome Center, University of California, Davis, CA, 95616, USA; Department of Computer Science, University of California, Davis, CA, 95616, USA
- Beatriz Merchel Piovesan Pereira
- Genome Center, University of California, Davis, CA, 95616, USA; Microbiology Graduate Group, University of California, Davis, CA, 95616, USA
- Ameen Eetemadi
- Genome Center, University of California, Davis, CA, 95616, USA; Department of Computer Science, University of California, Davis, CA, 95616, USA
- Ilias Tagkopoulos
- Genome Center, University of California, Davis, CA, 95616, USA; Department of Computer Science, University of California, Davis, CA, 95616, USA
2
Pratapa A, Adames N, Kraikivski P, Franzese N, Tyson JJ, Peccoud J, Murali TM. CrossPlan: systematic planning of genetic crosses to validate mathematical models. Bioinformatics 2019; 34:2237-2244. [PMID: 29432533 DOI: 10.1093/bioinformatics/bty072] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/19/2017] [Accepted: 02/07/2018] [Indexed: 12/27/2022]
Abstract
Motivation: Mathematical models of cellular processes can systematically predict the phenotypes of novel combinations of multi-gene mutations. Searching for informative predictions and prioritizing them for experimental validation is challenging since the number of possible combinations grows exponentially in the number of mutations. Moreover, keeping track of the crosses needed to make new mutants and planning sequences of experiments is unmanageable when the experimenter is deluged by hundreds of potentially informative predictions to test. Results: We present CrossPlan, a novel methodology for systematically planning genetic crosses to make a set of target mutants from a set of source mutants. We base our approach on a generic experimental workflow used in performing genetic crosses in budding yeast. We prove that the CrossPlan problem is NP-complete. We develop an integer-linear-program (ILP) to maximize the number of target mutants that we can make under certain experimental constraints. We apply our method to a comprehensive mathematical model of the protein regulatory network controlling cell division in budding yeast. We also extend our solution to incorporate other experimental conditions such as a delay factor that decides the availability of a mutant and genetic markers to confirm gene deletions. The experimental flow that underlies our work is quite generic and our ILP-based algorithm is easy to modify. Hence, our framework should be relevant in plant and animal systems as well. Availability and implementation: CrossPlan code is freely available under GNU General Public Licence v3.0 at https://github.com/Murali-group/crossplan. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Aditya Pratapa
- Department of Computer Science, Virginia Tech, Blacksburg, USA
- Neil Adames
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, USA
- Pavel Kraikivski
- Department of Biological Sciences, Virginia Tech, Blacksburg, USA
- John J Tyson
- Department of Biological Sciences, Virginia Tech, Blacksburg, USA
- Jean Peccoud
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, USA
- T M Murali
- Department of Computer Science, Virginia Tech, Blacksburg, USA
3
Dougherty ER. A Nonmathematical Review of Optimal Operator and Experimental Design for Uncertain Scientific Models with Application to Genomics. Curr Genomics 2019; 20:16-23. [PMID: 31015788 PMCID: PMC6446484 DOI: 10.2174/1389202919666181213095743] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/26/2018] [Revised: 12/05/2018] [Accepted: 12/10/2018] [Indexed: 11/22/2022]
Abstract
Introduction: The most basic aspect of modern engineering is the design of operators to act on physical systems in an optimal manner relative to a desired objective – for instance, designing a control policy to autonomously direct a system or designing a classifier to make decisions regarding the system. These kinds of problems appear in biomedical science, where physical models are created with the intention of using them to design tools for diagnosis, prognosis, and therapy. Methods: In the classical paradigm, our knowledge regarding the model is certain; however, in practice, especially with complex systems, our knowledge is uncertain and operators must be designed while taking this uncertainty into account. The related concepts of intrinsically Bayesian robust operators and optimal Bayesian operators treat operator design under uncertainty. An objective-based experimental design procedure is naturally related to operator design: We would like to perform an experiment that maximally reduces our uncertainty as it pertains to our objective. Results & Discussion: This paper provides a nonmathematical review of optimal Bayesian operators directed at biomedical scientists. It considers two applications important to genomics, structural intervention in gene regulatory networks and classification. Conclusion: The salient point regarding intrinsically Bayesian operators is that uncertainty is quantified relative to the scientific model, and the prior distribution is on the parameters of this model. Optimization has direct physical (biological) meaning. This is opposed to the common method of placing prior distributions on the parameters of the operator, in which case there is a scientific gap between operator design and the phenomena.
Affiliation(s)
- Edward R Dougherty
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
4
Franks AM, Markowetz F, Airoldi EM. Refining cellular pathway models using an ensemble of heterogeneous data sources. Ann Appl Stat 2018; 12:1361-1384. [PMID: 36506698 PMCID: PMC9733905 DOI: 10.1214/16-aoas915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/15/2022]
Abstract
Improving current models and hypotheses of cellular pathways is one of the major challenges of systems biology and functional genomics. There is a need for methods to build on established expert knowledge and reconcile it with results of new high-throughput studies. Moreover, the available sources of data are heterogeneous, and the data need to be integrated in different ways depending on which part of the pathway they are most informative for. In this paper, we introduce a compartment specific strategy to integrate edge, node and path data for refining a given network hypothesis. To carry out inference, we use a local-move Gibbs sampler for updating the pathway hypothesis from a compendium of heterogeneous data sources, and a new network regression idea for integrating protein attributes. We demonstrate the utility of this approach in a case study of the pheromone response MAPK pathway in the yeast S. cerevisiae.
Affiliation(s)
- Alexander M Franks
- Department of Statistics and Applied Probability, University of California, Santa Barbara, South Hall, Santa Barbara, California 93106, USA
- Florian Markowetz
- Cancer Research UK, Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Robinson Way, Cambridge, CB2 0RE, United Kingdom
- Edoardo M Airoldi
- Fox School of Business, Department of Statistical Science, Temple University, Center for Data Science, 1810 Liacouras Walk, Philadelphia, Pennsylvania 19122, USA
5
Sverchkov Y, Craven M. A review of active learning approaches to experimental design for uncovering biological networks. PLoS Comput Biol 2017; 13:e1005466. [PMID: 28570593 PMCID: PMC5453429 DOI: 10.1371/journal.pcbi.1005466] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Indexed: 11/18/2022]
Abstract
Various types of biological knowledge describe networks of interactions among elementary entities. For example, transcriptional regulatory networks consist of interactions among proteins and genes. Current knowledge about the exact structure of such networks is highly incomplete, and laboratory experiments that manipulate the entities involved are conducted to test hypotheses about these networks. In recent years, various automated approaches to experiment selection have been proposed. Many of these approaches can be characterized as active machine learning algorithms. Active learning is an iterative process in which a model is learned from data, hypotheses are generated from the model to propose informative experiments, and the experiments yield new data that is used to update the model. This review describes the various models, experiment selection strategies, validation techniques, and successful applications described in the literature; highlights common themes and notable distinctions among methods; and identifies likely directions of future research and open problems in the area.
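The iterative loop described in this abstract (learn a model, propose the most informative experiment, run it, update the model) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the review: `run_experiment` is a hypothetical stand-in for a laboratory measurement, the "model" is a 1-nearest-neighbour lookup, and informativeness is approximated by distance to the closest experiment already performed.

```python
def run_experiment(x):
    """Hypothetical stand-in for a laboratory experiment (assumption)."""
    return x * x


def predict(observed, x):
    """1-nearest-neighbour 'model' built from observed (input, outcome) pairs."""
    nearest = min(observed, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


def uncertainty(observed, x):
    """Proxy for model uncertainty: distance to the closest observed input."""
    return min(abs(ox - x) for ox, _ in observed)


def active_learning(candidates, n_rounds, seed_x):
    """Repeatedly run the candidate experiment the model is least sure about."""
    observed = [(seed_x, run_experiment(seed_x))]
    remaining = [x for x in candidates if x != seed_x]
    for _ in range(n_rounds):
        if not remaining:
            break
        # Experiment selection: the candidate with the largest uncertainty proxy.
        chosen = max(remaining, key=lambda x: uncertainty(observed, x))
        remaining.remove(chosen)
        # Running the experiment yields new data, which updates the model.
        observed.append((chosen, run_experiment(chosen)))
    return observed
```

Starting from a single seed experiment, the loop first probes the far end of the candidate range and then the midpoint, which mirrors the explore-then-refine behaviour that selection by uncertainty tends to produce. Methods in the literature replace the distance proxy with model-based criteria such as expected information gain.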
Affiliation(s)
- Yuriy Sverchkov
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Mark Craven
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
6
Korla K, Chandra N. A Systems Perspective of Signalling Networks in Host–Pathogen Interactions. J Indian Inst Sci 2017. [DOI: 10.1007/s41745-016-0017-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
7
Banf M, Rhee SY. Computational inference of gene regulatory networks: Approaches, limitations and opportunities. Biochimica et Biophysica Acta - Gene Regulatory Mechanisms 2016; 1860:41-52. [PMID: 27641093 DOI: 10.1016/j.bbagrm.2016.09.003] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 04/13/2016] [Revised: 09/08/2016] [Accepted: 09/08/2016] [Indexed: 10/21/2022]
Abstract
Gene regulatory networks lie at the core of cell function control. In E. coli and S. cerevisiae, the study of gene regulatory networks has led to the discovery of regulatory mechanisms responsible for the control of cell growth, differentiation and responses to environmental stimuli. In plants, computational rendering of gene regulatory networks is gaining momentum, thanks to the recent availability of high-quality genomes and transcriptomes and development of computational network inference approaches. Here, we review current techniques, challenges and trends in gene regulatory network inference and highlight challenges and opportunities for plant science. We provide plant-specific application examples to guide researchers in selecting methodologies that suit their particular research questions. Given the interdisciplinary nature of gene regulatory network inference, we tried to cater to both biologists and computer scientists to help them engage in a dialogue about concepts and caveats in network inference. Specifically, we discuss problems and opportunities in heterogeneous data integration for eukaryotic organisms and common caveats to be considered during network model evaluation. This article is part of a Special Issue entitled: Plant Gene Regulatory Mechanisms and Networks, edited by Dr. Erich Grotewold and Dr. Nathan Springer.
Affiliation(s)
- Michael Banf
- Department of Plant Biology, Carnegie Institution for Science, 260 Panama Street, Stanford 93405, United States.
- Seung Y Rhee
- Department of Plant Biology, Carnegie Institution for Science, 260 Panama Street, Stanford 93405, United States.
8
Zhao B, Wang J, Li M, Li X, Li Y, Wu FX, Pan Y. A New Method for Predicting Protein Functions From Dynamic Weighted Interactome Networks. IEEE Trans Nanobioscience 2016; 15:131-9. [PMID: 26955047 DOI: 10.1109/tnb.2016.2536161] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Indexed: 11/07/2022]
Abstract
Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of proteins can only be annotated computationally. Under new conditions or stimuli, not only the number and location of proteins would be changed, but also their interactions. This dynamic feature of protein interactions, however, was not considered in the existing function prediction algorithms. Taking the dynamic nature of protein interactions into consideration, we construct a dynamic weighted interactome network (DWIN) by integrating protein-protein interaction (PPI) network and time course gene expression data, as well as proteins' domain information and protein complex information. Then, we propose a new prediction approach that predicts protein functions from the constructed dynamic weighted interactome network. For an unknown protein, the proposed method visits dynamic networks at different time points and scores functions derived from all neighbors. Finally, the method selects top N functions from these ranked candidate functions to annotate the testing protein. Experiments on PPI datasets were conducted to evaluate the effectiveness of the proposed approach in predicting unknown protein functions. The evaluation results demonstrated that the proposed method outperforms other competing methods.
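The neighbour-based scoring step described in this abstract (visit the network at each time point, score the functions of all neighbours, keep the top N) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' algorithm: the function name `predict_functions`, the data layout (one dictionary of weighted edges per time point), and the scoring rule (summing edge weights per function) are all hypothetical.

```python
from collections import defaultdict


def predict_functions(protein, networks, annotations, top_n):
    """Rank candidate functions for `protein` from its annotated neighbours.

    networks:    list with one weighted network per time point, each a dict
                 mapping a protein to a list of (neighbour, weight) pairs.
    annotations: dict mapping a protein to the set of its known functions.
    """
    scores = defaultdict(float)
    for edges in networks:  # visit the network at each time point
        for neighbour, weight in edges.get(protein, []):
            for function in annotations.get(neighbour, ()):
                # Assumed scoring rule: accumulate the edge weight per function.
                scores[function] += weight
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [function for function, _ in ranked[:top_n]]
```

A function supported by neighbours at several time points accumulates a higher score than one seen only once, which is the intuition behind using a dynamic rather than a static network.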
9
Videla S, Konokotina I, Alexopoulos LG, Saez-Rodriguez J, Schaub T, Siegel A, Guziolowski C. Designing Experiments to Discriminate Families of Logic Models. Front Bioeng Biotechnol 2015; 3:131. [PMID: 26389116 PMCID: PMC4560026 DOI: 10.3389/fbioe.2015.00131] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 07/05/2015] [Accepted: 08/17/2015] [Indexed: 11/13/2022]
Abstract
Logic models of signaling pathways are a promising way of building effective in silico functional models of a cell, in particular of signaling pathways. The automated learning of Boolean logic models describing signaling pathways can be achieved by training to phosphoproteomics data, which is particularly useful if it is measured upon different combinations of perturbations in a high-throughput fashion. However, in practice, the number and type of allowed perturbations are not exhaustive. Moreover, experimental data are unavoidably subjected to noise. As a result, the learning process results in a family of feasible logical networks rather than in a single model. This family is composed of logic models implementing different internal wirings for the system and therefore the predictions of experiments from this family may present a significant level of variability, and hence uncertainty. In this paper, we introduce a method based on Answer Set Programming to propose an optimal experimental design that aims to narrow down the variability (in terms of input-output behaviors) within families of logical models learned from experimental data. We study how the fitness with respect to the data can be improved after an optimal selection of signaling perturbations and how we learn optimal logic models with minimal number of experiments. The methods are applied on signaling pathways in human liver cells and phosphoproteomics experimental data. Using 25% of the experiments, we obtained logical models with fitness scores (mean square error) 15% close to the ones obtained using all experiments, illustrating the impact that our approach can have on the design of experiments for efficient model calibration.
Affiliation(s)
- Santiago Videla
- UMR 6074 IRISA, CNRS, Campus de Beaulieu, Rennes, France; Dyliss project, INRIA, Campus de Beaulieu, Rennes, France; Institut für Informatik, Universität Potsdam, Potsdam, Germany; LBSI, Fundación Instituto Leloir, CONICET, Buenos Aires, Argentina
- Irina Konokotina
- IRCCyN UMR CNRS 6597, École Centrale de Nantes, Nantes, France
- Leonidas G Alexopoulos
- Department of Mechanical Engineering, National Technical University of Athens, Athens, Greece
- Julio Saez-Rodriguez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Torsten Schaub
- Institut für Informatik, Universität Potsdam, Potsdam, Germany
- Anne Siegel
- UMR 6074 IRISA, CNRS, Campus de Beaulieu, Rennes, France; Dyliss project, INRIA, Campus de Beaulieu, Rennes, France
10
Dehghannasiri R, Yoon BJ, Dougherty ER. Optimal Experimental Design for Gene Regulatory Networks in the Presence of Uncertainty. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2015; 12:938-50. [PMID: 26357334 DOI: 10.1109/tcbb.2014.2377733] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 05/23/2023]
Abstract
Of major interest to translational genomics is the intervention in gene regulatory networks (GRNs) to affect cell behavior; in particular, to alter pathological phenotypes. Owing to the complexity of GRNs, accurate network inference is practically challenging and GRN models often contain considerable amounts of uncertainty. Considering the cost and time required for conducting biological experiments, it is desirable to have a systematic method for prioritizing potential experiments so that an experiment can be chosen to optimally reduce network uncertainty. Moreover, from a translational perspective it is crucial that GRN uncertainty be quantified and reduced in a manner that pertains to the operational cost that it induces, such as the cost of network intervention. In this work, we utilize the concept of mean objective cost of uncertainty (MOCU) to propose a novel framework for optimal experimental design. In the proposed framework, potential experiments are prioritized based on the MOCU expected to remain after conducting the experiment. Based on this prioritization, one can select an optimal experiment with the largest potential to reduce the pertinent uncertainty present in the current network model. We demonstrate the effectiveness of the proposed method via extensive simulations based on synthetic and real gene regulatory networks.
11
Chasman D, Ho YH, Berry DB, Nemec CM, MacGilvray ME, Hose J, Merrill AE, Lee MV, Will JL, Coon JJ, Ansari AZ, Craven M, Gasch AP. Pathway connectivity and signaling coordination in the yeast stress-activated signaling network. Mol Syst Biol 2014; 10:759. [PMID: 25411400 PMCID: PMC4299600 DOI: 10.15252/msb.20145120] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Indexed: 12/11/2022]
Abstract
Stressed cells coordinate a multi-faceted response spanning many levels of physiology. Yet knowledge of the complete stress-activated regulatory network as well as design principles for signal integration remains incomplete. We developed an experimental and computational approach to integrate available protein interaction data with gene fitness contributions, mutant transcriptome profiles, and phospho-proteome changes in cells responding to salt stress, to infer the salt-responsive signaling network in yeast. The inferred subnetwork presented many novel predictions by implicating new regulators, uncovering unrecognized crosstalk between known pathways, and pointing to previously unknown ‘hubs’ of signal integration. We exploited these predictions to show that Cdc14 phosphatase is a central hub in the network and that modification of RNA polymerase II coordinates induction of stress-defense genes with reduction of growth-related transcripts. We find that the orthologous human network is enriched for cancer-causing genes, underscoring the importance of the subnetwork's predictions in understanding stress biology.
Affiliation(s)
- Deborah Chasman
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA
- Yi-Hsuan Ho
- Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI, USA
- David B Berry
- Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI, USA
- Corey M Nemec
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, USA
- James Hose
- Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI, USA
- Anna E Merrill
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA
- M Violet Lee
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA
- Jessica L Will
- Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI, USA
- Joshua J Coon
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA; Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, WI, USA; Department of Biological Chemistry, University of Wisconsin-Madison, Madison, WI, USA
- Aseem Z Ansari
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, USA; Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, WI, USA
- Mark Craven
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA; Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, WI, USA; Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Audrey P Gasch
- Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI, USA; Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, WI, USA
12
Sintupisut N, Liu PL, Yeang CH. An integrative characterization of recurrent molecular aberrations in glioblastoma genomes. Nucleic Acids Res 2013; 41:8803-21. [PMID: 23907387 PMCID: PMC3799430 DOI: 10.1093/nar/gkt656] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Indexed: 12/22/2022]
Abstract
Glioblastoma multiforme (GBM) is the most common and malignant primary brain tumor in adults. Decades of investigations and the recent effort of the Cancer Genome Atlas (TCGA) project have mapped many molecular alterations in GBM cells. Alterations on DNAs may dysregulate gene expressions and drive malignancy of tumors. It is thus important to uncover causal and statistical dependency between ‘effector’ molecular aberrations and ‘target’ gene expressions in GBMs. A rich collection of prior studies attempted to combine copy number variation (CNV) and mRNA expression data. However, systematic methods to integrate multiple types of cancer genomic data—gene mutations, single nucleotide polymorphisms, CNVs, DNA methylations, mRNA and microRNA expressions and clinical information—are relatively scarce. We proposed an algorithm to build ‘association modules’ linking effector molecular aberrations and target gene expressions and applied the module-finding algorithm to the integrated TCGA GBM data sets. The inferred association modules were validated by six tests using external information and datasets of central nervous system tumors: (i) indication of prognostic effects among patients; (ii) coherence of target gene expressions; (iii) retention of effector–target associations in external data sets; (iv) recurrence of effector molecular aberrations in GBM; (v) functional enrichment of target genes; and (vi) co-citations between effectors and targets. Modules associated with well-known molecular aberrations of GBM—such as chromosome 7 amplifications, chromosome 10 deletions, EGFR and NF1 mutations—passed the majority of the validation tests. Furthermore, several modules associated with less well-reported molecular aberrations—such as chromosome 11 CNVs, CD40, PLXNB1 and GSTM1 methylations, and mir-21 expressions—were also validated by external information. 
In particular, modules constituting trans-acting effects with chromosome 11 CNVs and cis-acting effects with chromosome 10 CNVs manifested strong negative and positive associations with survival times in brain tumors. By aligning the information of association modules with the established GBM subclasses based on transcription or methylation levels, we found each subclass possessed multiple concurrent molecular aberrations. Furthermore, the joint molecular characteristics derived from 16 association modules had prognostic power not explained away by the strong biomarker of CpG island methylator phenotypes. Functional and survival analyses indicated that immune/inflammatory responses and epithelial-mesenchymal transitions were among the most important determining processes of prognosis. Finally, we demonstrated that certain molecular aberrations uniquely recurred in GBM but were relatively rare in non-GBM glioma cells. These results justify the utility of an integrative analysis on cancer genomes and provide testable characterizations of driver aberration events in GBM.
Affiliation(s)
- Nardnisa Sintupisut
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan, ROC; Institute of Information Science, Academia Sinica, Taipei, Taiwan, ROC
13
Lai WKM, Buck MJ. An integrative approach to understanding the combinatorial histone code at functional elements. Bioinformatics 2013; 29:2231-7. [DOI: 10.1093/bioinformatics/btt382] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022]
14
Tennen RI, Haye JE, Wijayatilake HD, Arlow T, Ponzio D, Gammie AE. Cell-cycle and DNA damage regulation of the DNA mismatch repair protein Msh2 occurs at the transcriptional and post-transcriptional level. DNA Repair (Amst) 2013; 12:97-109. [PMID: 23261051 PMCID: PMC3749301 DOI: 10.1016/j.dnarep.2012.11.002] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 02/21/2012] [Revised: 10/03/2012] [Accepted: 11/06/2012] [Indexed: 12/13/2022]
Abstract
DNA mismatch repair during replication is a conserved process essential for maintaining genomic stability. Mismatch repair is also implicated in cell-cycle arrest and apoptosis after DNA damage. Because yeast and human mismatch repair systems are well conserved, we have employed the budding yeast Saccharomyces cerevisiae to understand the regulation and function of the mismatch repair gene MSH2. Using a luciferase-based transcriptional reporter, we defined a 218-bp region upstream of MSH2 that contains cell-cycle and DNA damage responsive elements. The 5' end of the MSH2 transcript was mapped by primer extension and was found to encode a small upstream open reading frame (uORF). Mutagenesis of the uORF start codon or of the uORF stop codon, which creates a continuous reading frame with MSH2, increased Msh2 steady-state protein levels ∼2-fold. Furthermore, we found that the cell-cycle transcription factors Swi6, Swi4, and Mbp1, along with SCB/MCB cell-cycle binding sites upstream of MSH2, are all required for full basal expression of MSH2. Mutagenesis of the cell-cycle boxes resulted in a minor reduction in basal Msh2 levels and a 3-fold defect in mismatch repair. Disruption of the cell-cycle boxes also affected growth in a DNA polymerase-defective strain background where mismatch repair is essential, particularly in the presence of the DNA damaging agent methyl methane sulfonate (MMS). Promoter replacements conferring constitutive expression of MSH2 revealed that the transcriptional induction in response to MMS is required to maintain induced levels of Msh2. Turnover experiments confirmed an elevated rate of degradation in the presence of MMS. Taken together, the data show that the DNA damage regulation of Msh2 occurs at the transcriptional and post-transcriptional levels. The transcriptional and translational control elements identified are conserved in mammalian cells, underscoring the use of yeast as a model system to examine the regulation of MSH2.
Affiliation(s)
- Ruth I. Tennen
- Department of Molecular Biology, Princeton University, Princeton, NJ 08544-1014, United States
- Joanna E. Haye
- Department of Molecular Biology, Princeton University, Princeton, NJ 08544-1014, United States
- Tim Arlow
- Department of Molecular Biology, Princeton University, Princeton, NJ 08544-1014, United States
- Danielle Ponzio
- Department of Molecular Biology, Princeton University, Princeton, NJ 08544-1014, United States
- Alison E. Gammie
- Department of Molecular Biology, Princeton University, Princeton, NJ 08544-1014, United States
15
Hashemikhabir S, Ayaz ES, Kavurucu Y, Can T, Kahveci T. Large-scale signaling network reconstruction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2012; 9:1696-1708. [PMID: 23221085 DOI: 10.1109/tcbb.2012.128] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 06/01/2023]
Abstract
Reconstructing the topology of a signaling network by means of RNA interference (RNAi) technology is an underdetermined problem especially when a single gene in the network is knocked down or observed. In addition, the exponential search space limits the existing methods to small signaling networks of size 10-15 genes. In this paper, we propose integrating RNAi data with a reference physical interaction network. We formulate the problem of signaling network reconstruction as finding the minimum number of edit operations on a given reference network. The edit operations transform the reference network to a network that satisfies the RNAi observations. We show that using a reference network does not simplify the computational complexity of the problem. Therefore, we propose two methods which provide near optimal results and can scale well for reconstructing networks up to hundreds of components. We validate the proposed methods on synthetic and real data sets. Comparison with the state of the art on real signaling networks shows that the proposed methodology can scale better and generates biologically significant results.
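The edit-operation formulation in this abstract can be illustrated with a small sketch. It only counts the edge insertions and deletions that separate a reference network from a candidate network; it does not search for the minimum-edit network that satisfies RNAi observations, which is the hard part the authors address. The function names, the restriction to directed-edge insertions and deletions, and the toy gene names are assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]  # directed edge: (source gene, target gene)


def edit_operations(reference: Iterable[Edge],
                    candidate: Iterable[Edge]) -> Tuple[Set[Edge], Set[Edge]]:
    """Return (insertions, deletions) that turn `reference` into `candidate`."""
    ref, cand = set(reference), set(candidate)
    insertions = cand - ref  # edges missing from the reference network
    deletions = ref - cand   # reference edges absent from the candidate
    return insertions, deletions


def edit_distance(reference: Iterable[Edge], candidate: Iterable[Edge]) -> int:
    """Number of edge edits separating the two networks."""
    insertions, deletions = edit_operations(reference, candidate)
    return len(insertions) + len(deletions)


# Toy networks with hypothetical gene names.
reference_net = [("A", "B"), ("B", "C"), ("C", "D")]
candidate_net = [("A", "B"), ("A", "C"), ("C", "D")]

print(edit_distance(reference_net, candidate_net))  # 2
```

In this toy case one edge is inserted (A to C) and one is deleted (B to C), giving a distance of 2.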
|
16
|
Niederberger T, Etzold S, Lidschreiber M, Maier KC, Martin DE, Fröhlich H, Cramer P, Tresch A. MC EMiNEM maps the interaction landscape of the Mediator. PLoS Comput Biol 2012; 8:e1002568. [PMID: 22737066 PMCID: PMC3380870 DOI: 10.1371/journal.pcbi.1002568] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 03/05/2012] [Accepted: 05/04/2012] [Indexed: 11/18/2022] Open
Abstract
The Mediator is a highly conserved, large multiprotein complex that is involved essentially in the regulation of eukaryotic mRNA transcription. It acts as a general transcription factor by integrating regulatory signals from gene-specific activators or repressors to the RNA Polymerase II. The internal network of interactions between Mediator subunits that conveys these signals is largely unknown. Here, we introduce MC EMiNEM, a novel method for the retrieval of functional dependencies between proteins that have pleiotropic effects on mRNA transcription. MC EMiNEM is based on Nested Effects Models (NEMs), a class of probabilistic graphical models that extends the idea of hierarchical clustering. It combines mode-hopping Monte Carlo (MC) sampling with an Expectation-Maximization (EM) algorithm for NEMs to increase sensitivity compared to existing methods. A meta-analysis of four Mediator perturbation studies in Saccharomyces cerevisiae, three of which are unpublished, provides new insight into the Mediator signaling network. In addition to the known modular organization of the Mediator subunits, MC EMiNEM reveals a hierarchical ordering of its internal information flow, which is putatively transmitted through structural changes within the complex. We identify the N-terminus of Med7 as a peripheral entity, entailing only local structural changes upon perturbation, while the C-terminus of Med7 and Med19 appear to play a central role. MC EMiNEM associates Mediator subunits to most directly affected genes, which, in conjunction with gene set enrichment analysis, allows us to construct an interaction map of Mediator subunits and transcription factors.
Affiliation(s)
- Theresa Niederberger
- Gene Center Munich and Center for integrated Protein Science CiPSM, Department of Biochemistry, Ludwig-Maximilians-University Munich, Munich, Germany
- Stefanie Etzold
- Gene Center Munich and Center for integrated Protein Science CiPSM, Department of Biochemistry, Ludwig-Maximilians-University Munich, Munich, Germany
- Michael Lidschreiber
- Gene Center Munich and Center for integrated Protein Science CiPSM, Department of Biochemistry, Ludwig-Maximilians-University Munich, Munich, Germany
- Kerstin C. Maier
- Gene Center Munich and Center for integrated Protein Science CiPSM, Department of Biochemistry, Ludwig-Maximilians-University Munich, Munich, Germany
- Dietmar E. Martin
- Gene Center Munich and Center for integrated Protein Science CiPSM, Department of Biochemistry, Ludwig-Maximilians-University Munich, Munich, Germany
- Holger Fröhlich
- Bonn-Aachen International Center for IT (B-IT) Algorithmic Bioinformatics, Rheinische Friedrich-Wilhelms-University Bonn, Bonn, Germany
- Patrick Cramer
- Gene Center Munich and Center for integrated Protein Science CiPSM, Department of Biochemistry, Ludwig-Maximilians-University Munich, Munich, Germany
- Achim Tresch
- Gene Center Munich and Center for integrated Protein Science CiPSM, Department of Biochemistry, Ludwig-Maximilians-University Munich, Munich, Germany
- Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Institute for Genetics, University of Cologne, Cologne, Germany
|
17
|
Identifying a small set of marker genes using minimum expected cost of misclassification. Artif Intell Med 2012; 55:51-9. [DOI: 10.1016/j.artmed.2012.01.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 03/05/2011] [Revised: 12/12/2011] [Accepted: 01/29/2012] [Indexed: 11/24/2022]
|
18
|
Abstract
The complexity, diversity, and richness of experimental data on cellular systems are inspiring the development of computational analysis techniques that can directly prioritize and suggest new experiments.
|
19
|
Vinayagam A, Stelzl U, Foulle R, Plassmann S, Zenkner M, Timm J, Assmus HE, Andrade-Navarro MA, Wanker EE. A directed protein interaction network for investigating intracellular signal transduction. Sci Signal 2011; 4:rs8. [PMID: 21900206 DOI: 10.1126/scisignal.2001699] [Citation(s) in RCA: 252] [Impact Index Per Article: 19.4] [Indexed: 11/02/2022]
Abstract
Cellular signal transduction is a complex process involving protein-protein interactions (PPIs) that transmit information. For example, signals from the plasma membrane may be transduced to transcription factors to regulate gene expression. To obtain a global view of cellular signaling and to predict potential signal modulators, we searched for protein interaction partners of more than 450 signaling-related proteins by means of automated yeast two-hybrid interaction mating. The resulting PPI network connected 1126 proteins through 2626 PPIs. After expansion of this interaction map with publicly available PPI data, we generated a directed network resembling the signal transduction flow between proteins with a naïve Bayesian classifier. We exploited information on the shortest PPI paths from membrane receptors to transcription factors to predict input and output relationships between interacting proteins. Integration of directed PPI with time-resolved protein phosphorylation data revealed network structures that dynamically conveyed information from the activated epidermal growth factor and extracellular signal-regulated kinase (EGF/ERK) signaling cascade to directly associated proteins and more distant proteins in the network. From the model network, we predicted 18 previously unknown modulators of EGF/ERK signaling, which we validated in mammalian cell-based assays. This generic experimental and computational approach provides a framework for elucidating causal connections between signaling proteins and facilitates the identification of proteins that modulate the flow of information in signaling networks.
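The shortest-path idea mentioned in this abstract can be sketched as follows. The example finds one shortest path from a membrane receptor to a transcription factor in an undirected PPI graph by breadth-first search and orients the edges along that path. It is only an illustration of the path-based intuition: the authors combine such information in a naïve Bayesian classifier, which is not reproduced here, and all function and protein names below are hypothetical placeholders.

```python
from collections import deque


def build_adjacency(ppis):
    """Build an undirected adjacency map from a list of (protein, protein) pairs."""
    adj = {}
    for a, b in ppis:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def shortest_path(adj, source, target):
    """Breadth-first search; returns one shortest path as a list, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in adj.get(node, ()):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


def orient_along_path(path):
    """Turn a node path into directed edges pointing from source to target."""
    return list(zip(path, path[1:]))


# Toy undirected PPI network with placeholder protein names.
ppis = [("receptor", "adaptor"), ("adaptor", "kinase"),
        ("kinase", "tf"), ("adaptor", "other")]
adjacency = build_adjacency(ppis)
path = shortest_path(adjacency, "receptor", "tf")
print(orient_along_path(path))
# [('receptor', 'adaptor'), ('adaptor', 'kinase'), ('kinase', 'tf')]
```

The interaction with `other` lies on no receptor-to-factor shortest path, so this sketch leaves it unoriented.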
Affiliation(s)
- Arunachalam Vinayagam
- AG Neuroproteomics and Computational Biology and Data Mining Group, Max Delbrück Centrum for Molecular Medicine, Robert-Rössle-Strasse 10, D-13125 Berlin-Buch, Germany
|
20
|
Affiliation(s)
- Nancy Lan Guo
- Mary Babb Randolph Cancer Center/Department of Community Medicine, School of Medicine, West Virginia University, Morgantown, WV 26506-9300
|
21
|
Li SD, Tagami T, Ho YF, Yeang CH. Deciphering causal and statistical relations of molecular aberrations and gene expressions in NCI-60 cell lines. BMC Systems Biology 2011; 5:186. [PMID: 22051105 PMCID: PMC3259106 DOI: 10.1186/1752-0509-5-186] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 06/06/2011] [Accepted: 11/04/2011] [Indexed: 12/02/2022]
Abstract
BACKGROUND Cancer cells harbor a large number of molecular alterations such as mutations, amplifications and deletions on DNA sequences and epigenetic changes on DNA methylations. These aberrations may dysregulate gene expressions, which in turn drive the malignancy of tumors. Deciphering the causal and statistical relations of molecular aberrations and gene expressions is critical for understanding the molecular mechanisms of clinical phenotypes. RESULTS In this work, we proposed a computational method to reconstruct association modules containing driver aberrations, passenger mRNA or microRNA expressions, and putative regulators that mediate the effects from drivers to passengers. By applying the module-finding algorithm to the integrated datasets of NCI-60 cancer cell lines, we found that gene expressions were driven by diverse molecular aberrations including chromosomal segments' copy number variations, gene mutations and DNA methylations, microRNA expressions, and the expressions of transcription factors. In-silico validation indicated that passenger genes were enriched with the regulator binding motifs, functional categories or pathways where the drivers were involved, and co-citations with the driver/regulator genes. Moreover, 6 of 11 predicted MYB targets were down-regulated in an MYB-siRNA treated leukemia cell line. In addition, microRNA expressions were driven by distinct mechanisms from mRNA expressions. CONCLUSIONS The results provide rich mechanistic information regarding molecular aberrations and gene expressions in cancer genomes. This kind of integrative analysis will become an important tool for the diagnosis and treatment of cancer in the era of personalized medicine.
Affiliation(s)
- Shyh-Dar Li
- Ontario Institute for Cancer Research, 101 College Street, Toronto, Canada
- Ying-Fu Ho
- Institute of Statistical Science, Academia Sinica, Academia Road, Sec 2, Taipei, Taiwan
- Chen-Hsiang Yeang
- Institute of Statistical Science, Academia Sinica, Academia Road, Sec 2, Taipei, Taiwan
|
22
|
Sendiña-Nadal I, Ofran Y, Almendral JA, Buldú JM, Leyva I, Li D, Havlin S, Boccaletti S. Unveiling protein functions through the dynamics of the interaction network. PLoS One 2011; 6:e17679. [PMID: 21408013 PMCID: PMC3052369 DOI: 10.1371/journal.pone.0017679] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 12/27/2010] [Accepted: 02/05/2011] [Indexed: 01/02/2023] Open
Abstract
Protein interaction networks have become a tool to study biological processes, either for predicting molecular functions or for designing proper new drugs to regulate the main biological interactions. Furthermore, such networks are known to be organized in sub-networks of proteins contributing to the same cellular function. However, the protein function prediction is not accurate and each protein has traditionally been assigned to only one function by the network formalism. By considering the network of the physical interactions between proteins of the yeast together with a manual and single functional classification scheme, we introduce a method able to reveal important information on protein function, at both micro- and macro-scale. In particular, the inspection of the properties of oscillatory dynamics on top of the protein interaction network leads to the identification of misclassification problems in protein function assignments, as well as to the correct identification of protein functions. We also demonstrate that our approach can give a network representation of the meta-organization of biological processes by unraveling the interactions between different functional classes.
|
23
|
Liu Q, Tan Y, Huang T, Ding G, Tu Z, Liu L, Li Y, Dai H, Xie L. TF-centered downstream gene set enrichment analysis: Inference of causal regulators by integrating TF-DNA interactions and protein post-translational modifications information. BMC Bioinformatics 2010; 11 Suppl 11:S5. [PMID: 21172055 PMCID: PMC3024863 DOI: 10.1186/1471-2105-11-s11-s5] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 12/19/2022] Open
Abstract
Background Inference of causal regulators responsible for gene expression changes under different conditions is of great importance but remains rather challenging. To date, most approaches use direct binding targets of transcription factors (TFs) to associate TFs with expression profiles. However, the low overlap between binding targets of a TF and the affected genes of the TF knockout limits the power of those methods. Results We developed a TF-centered downstream gene set enrichment analysis approach to identify potential causal regulators responsible for expression changes. We constructed hierarchical and multi-layer regulation models to derive possible downstream gene sets of a TF using not only TF-DNA interactions, but also, for the first time, post-translational modifications (PTM) information. We verified our method in one expression dataset of large-scale TF knockout and another dataset involving both TF knockout and TF overexpression. Compared with the flat model using TF-DNA interactions alone, our method correctly identified five more actual perturbed TFs in large-scale TF knockout data and six more perturbed TFs in overexpression data. Potential regulatory pathways downstream of three perturbed regulators (SNF1, AFT1 and SUT1) were given to demonstrate the power of multilayer regulation models integrating TF-DNA interactions and PTM information. Additionally, our method successfully identified known important TFs and inferred some novel potential TFs involved in the transition from fermentative to glycerol-based respiratory growth and in the pheromone response. Downstream regulation pathways of SUT1 and AFT1 were also supported by the mRNA and/or phosphorylation changes of their mediating TFs and/or “modulator” proteins. Conclusions The results suggest that in addition to direct transcription, indirect transcription and post-translational regulation are also responsible for the effects of TF perturbation, especially for TF overexpression. Many TFs inferred by our method are supported by literature. Multiple TF regulation models could lead to new hypotheses for future experiments. Our method provides a valuable framework for analyzing gene expression data to identify causal regulators in the context of TF-DNA interactions and PTM information.
Affiliation(s)
- Qi Liu
- School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China.
|
24
|
Yeang CH. An integrated analysis of molecular aberrations in NCI-60 cell lines. BMC Bioinformatics 2010; 11:495. [PMID: 20925909 PMCID: PMC2984587 DOI: 10.1186/1471-2105-11-495] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 01/03/2010] [Accepted: 10/06/2010] [Indexed: 11/26/2022] Open
Abstract
Background Cancer is a complex disease where various types of molecular aberrations drive the development and progression of malignancies. Large-scale screenings of multiple types of molecular aberrations (e.g., mutations, copy number variations, DNA methylations, gene expressions) become increasingly important in the prognosis and study of cancer. Consequently, a computational model integrating multiple types of information is essential for the analysis of the comprehensive data. Results We propose an integrated modeling framework to identify the statistical and putative causal relations of various molecular aberrations and gene expressions in cancer. To reduce spurious associations among the massive number of probed features, we sequentially applied three layers of logistic regression models with increasing complexity and uncertainty regarding the possible mechanisms connecting molecular aberrations and gene expressions. Layer 1 models associate gene expressions with the molecular aberrations on the same loci. Layer 2 models associate expressions with the aberrations on different loci but have known mechanistic links. Layer 3 models associate expressions with nonlocal aberrations which have unknown mechanistic links. We applied the layered models to the integrated datasets of NCI-60 cancer cell lines and validated the results with large-scale statistical analysis. Furthermore, we discovered/reaffirmed the following prominent links: (1) Protein expressions are generally consistent with mRNA expressions. (2) Several gene expressions are modulated by composite local aberrations. For instance, CDKN2A expressions are repressed by either frame-shift mutations or DNA methylations. (3) Amplification of chromosome 6q in leukemia elevates the expression of MYB, and the downstream targets of MYB on other chromosomes are up-regulated accordingly. (4) Amplification of chromosome 3p and hypo-methylation of PAX3 together elevate MITF expression in melanoma, which up-regulates the downstream targets of MITF. (5) Mutations of TP53 are negatively associated with its direct target genes. Conclusions The analysis results on NCI-60 data justify the utility of the layered models for the incoming flow of cancer genomic data. Experimental validations on selected prominent links and application of the layered modeling framework to other integrated datasets will be carried out subsequently.
|
25
|
Joshi A, Van Parys T, Van de Peer Y, Michoel T. Characterizing regulatory path motifs in integrated networks using perturbational data. Genome Biol 2010; 11:R32. [PMID: 20230615 PMCID: PMC2864572 DOI: 10.1186/gb-2010-11-3-r32] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 06/19/2009] [Revised: 10/01/2009] [Accepted: 03/11/2010] [Indexed: 01/12/2023] Open
Abstract
Pathicular, a Cytoscape plugin for analysing cellular responses to transcription factor perturbations, is presented. We introduce Pathicular (http://bioinformatics.psb.ugent.be/software/details/Pathicular), a Cytoscape plugin for studying the cellular response to perturbations of transcription factors by integrating perturbational expression data with transcriptional, protein-protein and phosphorylation networks. Pathicular searches for 'regulatory path motifs', short paths in the integrated physical networks which occur significantly more often than expected between transcription factors and their targets in the perturbational data. A case study in Saccharomyces cerevisiae identifies eight regulatory path motifs and demonstrates their biological significance.
Affiliation(s)
- Anagha Joshi
- Department of Plant Systems Biology, VIB, Technologiepark 927, Gent, Belgium.
|
26
|
|
27
|
Przytycka TM, Singh M, Slonim DK. Toward the dynamic interactome: it's about time. Brief Bioinform 2010; 11:15-29. [PMID: 20061351 PMCID: PMC2810115 DOI: 10.1093/bib/bbp057] [Citation(s) in RCA: 147] [Impact Index Per Article: 10.5] [Received: 07/20/2009] [Revised: 11/01/2009] [Indexed: 11/14/2022] Open
Abstract
Dynamic molecular interactions play a central role in regulating the functioning of cells and organisms. The availability of experimentally determined large-scale cellular networks, along with other high-throughput experimental data sets that provide snapshots of biological systems at different times and conditions, is increasingly helpful in elucidating interaction dynamics. Here we review the beginnings of a new subfield within computational biology, one focused on the global inference and analysis of the dynamic interactome. This burgeoning research area, which entails a shift from static to dynamic network analysis, promises to be a major step forward in our ability to model and reason about cellular function and behavior.
Affiliation(s)
- Teresa M Przytycka
- National Center of Biotechnology Information, NLM, NIH, 8000 Rockville Pike, Bethesda MD 20814, USA.
|
28
|
Abstract
Regulatory and other networks in the cell change in a highly dynamic way over time and in response to internal and external stimuli. While several different types of high-throughput experimental procedures are available to study systems in the cell, most only measure static properties of such networks. Information derived from sequence data is inherently static, and most interaction data sets are measured in a static way as well. In this chapter we discuss one of the few abundant sources for temporal information, time series expression data. We provide an overview of the methods suggested for clustering this type of data to identify functionally related genes. We also discuss methods for inferring causality and interactions using lagged correlations and regression analysis. Finally, we present methods for combining time series expression data with static data to reconstruct dynamic regulatory networks. We point to software tools implementing the methods discussed in this chapter. As more temporal measurements become available, the importance of analyzing such data and of combining it with other types of data will greatly increase.
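The lagged-correlation idea mentioned in this abstract can be illustrated with a minimal sketch: the expression of a putative regulator at time t is correlated with the expression of a putative target at time t + lag, and the lag with the highest correlation is reported. The function names and the toy time series are assumptions for illustration. A high lagged correlation is only suggestive of a regulatory relationship and does not establish causality.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(regulator, target, lag):
    """Correlate regulator[t] with target[t + lag] for a non-negative lag."""
    if lag == 0:
        return pearson(regulator, target)
    return pearson(regulator[:-lag], target[lag:])


def best_lag(regulator, target, max_lag):
    """Lag in 0..max_lag with the highest lagged correlation."""
    return max(range(max_lag + 1),
               key=lambda k: lagged_correlation(regulator, target, k))


# Toy series: the target repeats the regulator profile two time points later.
regulator = [0.0, 1.0, 3.0, 2.0, 0.5, 0.0, 1.5, 4.0]
target = [0.2, 0.1] + regulator[:6]

print(best_lag(regulator, target, max_lag=3))  # 2
```

With real expression data the series are short and noisy, so the estimated lag should be treated as a hypothesis to be checked against other evidence.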
Affiliation(s)
- Anthony Gitter
- Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, USA.
|
29
|
Elucidating regulatory mechanisms downstream of a signaling pathway using informative experiments. Mol Syst Biol 2009; 5:287. [PMID: 19584836 PMCID: PMC2724975 DOI: 10.1038/msb.2009.45] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Received: 08/15/2008] [Accepted: 05/26/2009] [Indexed: 11/25/2022] Open
Abstract
Signaling cascades are triggered by environmental stimulation and propagate the signal to regulate transcription. Systematic reconstruction of the underlying regulatory mechanisms requires pathway-targeted, informative experimental data. However, practical experimental design approaches are still in their infancy. Here, we propose a framework that iterates design of experiments and identification of regulatory relationships downstream of a given pathway. The experimental design component, called MEED, aims to minimize the amount of laboratory effort required in this process. To avoid ambiguity in the identification of regulatory relationships, the choice of experiments maximizes diversity between expression profiles of genes regulated through different mechanisms. The framework takes advantage of expert knowledge about the pathways under study, formalized in a predictive logical model. By considering model-predicted dependencies between experiments, MEED is able to suggest a whole set of experiments that can be carried out simultaneously. Our framework was applied to investigate interconnected signaling pathways in yeast. In comparison with other approaches, MEED suggested the most informative experiments for unambiguous identification of transcriptional regulation in this system.
|
30
|
Gitter A, Siegfried Z, Klutstein M, Fornes O, Oliva B, Simon I, Bar-Joseph Z. Backup in gene regulatory networks explains differences between binding and knockout results. Mol Syst Biol 2009; 5:276. [PMID: 19536199 PMCID: PMC2710864 DOI: 10.1038/msb.2009.33] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.3] [Received: 11/28/2008] [Accepted: 04/29/2009] [Indexed: 12/15/2022] Open
Abstract
The complementarity of gene expression and protein–DNA interaction data led to several successful models of biological systems. However, recent studies in multiple species raise doubts about the relationship between these two datasets. These studies show that the overwhelming majority of genes bound by a particular transcription factor (TF) are not affected when that factor is knocked out. Here, we show that this surprising result can be partially explained by considering the broader cellular context in which TFs operate. Factors whose functions are not backed up by redundant paralogs show a fourfold increase in the agreement between their bound targets and the expression levels of those targets. In addition, we show that incorporating protein interaction networks provides physical explanations for knockout effects. New double knockout experiments support our conclusions. Our results highlight the robustness provided by redundant TFs and indicate that in the context of diverse cellular systems, binding is still largely functional.
Affiliation(s)
- Anthony Gitter
- Computer Science Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
|
31
|
A factor graph nested effects model to identify networks from genetic perturbations. PLoS Comput Biol 2009; 5:e1000274. [PMID: 19180177 PMCID: PMC2613752 DOI: 10.1371/journal.pcbi.1000274] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Received: 06/30/2008] [Accepted: 12/12/2008] [Indexed: 11/26/2022] Open
Abstract
Complex phenotypes such as the transformation of a normal population of cells into cancerous tissue result from a series of molecular triggers gone awry. We describe a method that searches for a genetic network consistent with expression changes observed under the knock-down of a set of genes that share a common role in the cell, such as a disease phenotype. The method extends the Nested Effects Model of Markowetz et al. (2005) by using a probabilistic factor graph to search for a network representing interactions among these silenced genes. The method also expands the network by attaching new genes at specific downstream points, providing candidates for subsequent perturbations to further characterize the pathway. We investigated an extension provided by the factor graph approach in which the model distinguishes between inhibitory and stimulatory interactions. We found that the extension yielded significant improvements in recovering the structure of simulated and Saccharomyces cerevisiae networks. We applied the approach to discover a signaling network among genes involved in a human colon cancer cell invasiveness pathway. The method predicts several genes with new roles in the invasiveness process. We knocked down two genes identified by our approach and found that both knock-downs produce loss of invasive potential in a colon cancer cell line. Nested effects models may be a powerful tool for inferring regulatory connections and genes that operate in normal and disease-related processes. Biological processes are the result of the actions and interactions of many genes and the proteins that they encode. Our knowledge of interactions for many biological processes is limited, especially for cancer where genomic alterations may create entirely novel pathways not present in normal tissue. Perturbing gene expression (for example, by deleting a gene) has long been used as a tool in molecular biology to elucidate interactions but is very expensive and labor intensive. 
The search for new genes that may participate can be a daunting “fishing expedition.” We have devised a tool that automatically infers interactions using high-throughput gene expression data. When a gene is silenced, it causes other genes to be switched on or off, which provide clues about the pathway(s) in which the gene acts. Our method uses the genomewide on/off states as a fingerprint to detect interactions among a set of silenced genes. We were able to elucidate a network of interactions for several genes implicated in metastatic colon cancer. Genes newly connected to the network were found to operate in cancer cell invasion in human cells, validating the approach. Thus, the method enables an efficient discovery of the networks that underlie biological processes such as carcinogenesis.
|
32
|
Lotito L, Russo A, Bueno S, Chillemi G, Fogli MV, Capranico G. A specific transcriptional response of yeast cells to camptothecin dependent on the Swi4 and Mbp1 factors. Eur J Pharmacol 2008; 603:29-36. [PMID: 19094980 DOI: 10.1016/j.ejphar.2008.12.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 11/18/2008] [Revised: 12/01/2008] [Accepted: 12/03/2008] [Indexed: 10/21/2022]
Abstract
Topoisomerase I (Top1) is the specific target of the anticancer drug camptothecin (CPT) that interferes with enzyme activity promoting Top1-mediated DNA breaks and inhibition of DNA and RNA synthesis. To define the specific transcriptional response to CPT, we have determined the CPT-altered transcription profiles in yeast by using a relatively low concentration of the drug. CPT could alter global expression profiles only if a catalytically active Top1p was expressed in the cell, demonstrating that drug interference with Top1 was the sole trigger of the response. A total of 95 genes showed statistically significant alterations. Gene Ontology term analyses suggested that the cell response was mainly to the inhibition of nucleic acid synthesis and cell cycle progression. Promoter sequence analyses of the 22 up-regulated genes and expression studies in gene-deleted strains showed that the transcription factors Swi4p and Mbp1p mediate, at least partially, the transcriptional response to CPT. The MBP1 gene deletion abrogates a transient cell growth delay caused by CPT whereas the SWI4 gene deletion increases yeast resistance to CPT. Thus, the findings show that yeast cells have a highly selective and sensitive transcriptional response to CPT depending on the SWI4 and MBP1 genes, suggesting a complex regulation of cell cycle progression by the two factors in the presence of CPT.
Affiliation(s)
- Luca Lotito
- G Moruzzi Department of Biochemistry, University of Bologna, Bologna, Italy
|
33
|
Kundaje A, Xin X, Lan C, Lianoglou S, Zhou M, Zhang L, Leslie C. A predictive model of the oxygen and heme regulatory network in yeast. PLoS Comput Biol 2008; 4:e1000224. [PMID: 19008939 PMCID: PMC2573020 DOI: 10.1371/journal.pcbi.1000224] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.4] [Received: 04/07/2008] [Accepted: 10/08/2008] [Indexed: 11/18/2022] Open
Abstract
Deciphering gene regulatory mechanisms through the analysis of high-throughput expression data is a challenging computational problem. Previous computational studies have used large expression datasets in order to resolve fine patterns of coexpression, producing clusters or modules of potentially coregulated genes. These methods typically examine promoter sequence information, such as DNA motifs or transcription factor occupancy data, in a separate step after clustering. We needed an alternative and more integrative approach to study the oxygen regulatory network in Saccharomyces cerevisiae using a small dataset of perturbation experiments. Mechanisms of oxygen sensing and regulation underlie many physiological and pathological processes, and only a handful of oxygen regulators have been identified in previous studies. We used a new machine learning algorithm called MEDUSA to uncover detailed information about the oxygen regulatory network using genome-wide expression changes in response to perturbations in the levels of oxygen, heme, Hap1, and Co2+. MEDUSA integrates mRNA expression, promoter sequence, and ChIP-chip occupancy data to learn a model that accurately predicts the differential expression of target genes in held-out data. We used a novel margin-based score to extract significant condition-specific regulators and assemble a global map of the oxygen sensing and regulatory network. This network includes both known oxygen and heme regulators, such as Hap1, Mga2, Hap4, and Upc2, as well as many new candidate regulators. MEDUSA also identified many DNA motifs that are consistent with previous experimentally identified transcription factor binding sites. Because MEDUSA's regulatory program associates regulators to target genes through their promoter sequences, we directly tested the predicted regulators for OLE1, a gene specifically induced under hypoxia, by experimental analysis of the activity of its promoter. 
In each case, deletion of the candidate regulator resulted in the predicted effect on promoter activity, confirming that several novel regulators identified by MEDUSA are indeed involved in oxygen regulation. MEDUSA can reveal important information from a small dataset and generate testable hypotheses for further experimental analysis. Supplemental data are included. The cell uses complex regulatory networks to modulate the expression of genes in response to changes in cellular and environmental conditions. The transcript level of a gene is directly affected by the binding of transcriptional regulators to DNA motifs in its promoter sequence. Therefore, both expression levels of transcription factors and other regulatory proteins as well as sequence information in the promoters contribute to transcriptional gene regulation. In this study, we describe a new computational strategy for learning gene regulatory programs from gene expression data based on the MEDUSA algorithm. We learn a model that predicts differential expression of target genes from the expression levels of regulators, the presence of DNA motifs in promoter sequences, and binding data for transcription factors. Unlike many previous approaches, we do not assume that genes are regulated in clusters, and we learn DNA motifs de novo from promoter sequences as an integrated part of our algorithm. We use MEDUSA to produce a global map of the yeast oxygen and heme regulatory network. To demonstrate that MEDUSA can reveal detailed information about regulatory mechanisms, we perform biochemical experiments to confirm the predicted regulators for an important hypoxia gene.
Affiliation(s)
- Anshul Kundaje
- Department of Computer Science, Columbia University, New York, New York, United States of America
- Xiantong Xin
- Department of Molecular and Cell Biology, University of Texas at Dallas, Richardson, Texas, United States of America
- Changgui Lan
- Department of Molecular and Cell Biology, University of Texas at Dallas, Richardson, Texas, United States of America
- Steve Lianoglou
- Department of Physiology, Biophysics, and Systems Biology, Weill Medical College of Cornell University, New York, New York, United States of America
- Computational Biology Program, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
- Mei Zhou
- Department of Molecular and Cell Biology, University of Texas at Dallas, Richardson, Texas, United States of America
- Li Zhang
- Department of Molecular and Cell Biology, University of Texas at Dallas, Richardson, Texas, United States of America
- * E-mail: (LZ); (CL)
- Christina Leslie
- Computational Biology Program, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
- * E-mail: (LZ); (CL)
|
34
|
Karlebach G, Shamir R. Modelling and analysis of gene regulatory networks. Nat Rev Mol Cell Biol 2008; 9:770-80. [PMID: 18797474 DOI: 10.1038/nrm2503] [Citation(s) in RCA: 574] [Impact Index Per Article: 35.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022]
Abstract
Gene regulatory networks have an important role in every process of life, including cell differentiation, metabolism, the cell cycle and signal transduction. By understanding the dynamics of these networks we can shed light on the mechanisms of diseases that occur when these cellular processes are dysregulated. Accurate prediction of the behaviour of regulatory networks will also speed up biotechnological projects, as such predictions are quicker and cheaper than lab experiments. Computational methods, both for supporting the development of network models and for the analysis of their functionality, have already proved to be a valuable research tool.
Affiliation(s)
- Guy Karlebach
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
|
35
|
Busch H, Camacho-Trullio D, Rogon Z, Breuhahn K, Angel P, Eils R, Szabowski A. Gene network dynamics controlling keratinocyte migration. Mol Syst Biol 2008; 4:199. [PMID: 18594517 PMCID: PMC2516358 DOI: 10.1038/msb.2008.36] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2008] [Accepted: 05/01/2008] [Indexed: 11/09/2022] Open
Abstract
Translation of large-scale data into a coherent model that allows one to simulate, predict and control cellular behavior is far from being resolved. Assuming that long-term cellular behavior is reflected in the gene expression kinetics, we infer a dynamic gene regulatory network from time-series measurements of DNA microarray data of hepatocyte growth factor-induced migration of primary human keratinocytes. Transferring the obtained interactions to the level of signaling pathways, we predict in silico and verify in vitro the necessary and sufficient time-ordered events that control migration. We show that pulse-like activation of the proto-oncogene receptor Met triggers a responsive state, whereas time sequential activation of EGF-R is required to initiate and maintain migration. Context information for enhancing, delaying or stopping migration is provided by the activity of the protein kinase A signaling pathway. Our study reveals the complex orchestration of multiple pathways controlling cell migration.
Affiliation(s)
- Hauke Busch
- B080 Division of Theoretical Bioinformatics, German Cancer Research Center (DKFZ), Heidelberg, Germany
|
36
|
Abstract
During a decade of proof-of-principle analysis in model organisms, protein networks have been used to further the study of molecular evolution, to gain insight into the robustness of cells to perturbation, and for assignment of new protein functions. Following these analyses, and with the recent rise of protein interaction measurements in mammals, protein networks are increasingly serving as tools to unravel the molecular basis of disease. We review promising applications of protein networks to disease in four major areas: identifying new disease genes; the study of their network properties; identifying disease-related subnetworks; and network-based disease classification. Applications in infectious disease, personalized medicine, and pharmacology are also forthcoming as the available protein network information improves in quality and coverage.
Affiliation(s)
- Trey Ideker
- Department of Bioengineering, University of California at San Diego, La Jolla, California 92093, USA
|
37
|
Veber P, Guziolowski C, Le Borgne M, Radulescu O, Siegel A. Inferring the role of transcription factors in regulatory networks. BMC Bioinformatics 2008; 9:228. [PMID: 18460200 PMCID: PMC2422845 DOI: 10.1186/1471-2105-9-228] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2007] [Accepted: 05/06/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Expression profiles obtained from multiple perturbation experiments are increasingly used to reconstruct transcriptional regulatory networks, from well studied, simple organisms up to higher eukaryotes. Admittedly, a key ingredient in developing a reconstruction method is its ability to integrate heterogeneous sources of information, as well as to comply with practical observability issues: measurements can be scarce or noisy. In this work, we show how to combine a network of genetic regulations with a set of expression profiles, in order to infer the functional effect of the regulations, as inducer or repressor. Our approach is based on a consistency rule between a network and the signs of variation given by expression arrays. RESULTS We evaluate our approach in several settings of increasing complexity. First, we generate artificial expression data on a transcriptional network of E. coli extracted from the literature (1529 nodes and 3802 edges), and we estimate that 30% of the regulations can be annotated with about 30 profiles. We additionally prove that at most 40.8% of the network can be inferred using our approach. Second, we use this network in order to validate the predictions obtained with a compendium of real expression profiles. We describe a filtering algorithm that generates particularly reliable predictions. Finally, we apply our inference approach to the S. cerevisiae transcriptional network (2419 nodes and 4344 interactions), by combining ChIP-chip data and 15 expression profiles. We are able to detect and isolate inconsistencies between the expression profiles and a significant portion of the model (15% of all the interactions). In addition, we report predictions for 14.5% of all interactions. CONCLUSION Our approach requires neither accurate expression levels nor time series.
Nevertheless, we show on both real and artificial data that a relatively small number of perturbation experiments is enough to determine a significant portion of regulatory effects. This is a key practical asset compared to statistical methods for network reconstruction. We demonstrate that our approach is able to provide accurate predictions, even when the network is incomplete and the data are noisy.
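As a rough illustration of what a sign-consistency rule between a network and expression data can look like, the sketch below flags target genes whose observed direction of change cannot be explained by any of their observed regulators. The exact rule used by the authors is not given in the abstract, so the "at least one regulator must explain the change" criterion, the function name `inconsistent_nodes`, and the toy node names are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# (regulator, target, sign): sign is +1 for an inducer, -1 for a repressor.
Edge = Tuple[str, str, int]


def inconsistent_nodes(edges: List[Edge], variation: Dict[str, int]) -> List[str]:
    """Return targets whose observed variation no observed regulator explains.

    Assumed rule (hypothetical): a target is consistent if, for at least one
    incoming edge, regulator_variation * edge_sign equals target_variation.
    """
    explained: Dict[str, bool] = {}
    for regulator, target, sign in edges:
        if regulator not in variation or target not in variation:
            continue  # unobserved nodes impose no constraint in this sketch
        ok = variation[regulator] * sign == variation[target]
        explained[target] = explained.get(target, False) or ok
    return sorted(t for t, ok in explained.items() if not ok)


# Toy network with made-up names; +1 means "went up", -1 means "went down".
edges = [("tfA", "g1", +1), ("tfB", "g1", -1), ("tfA", "g2", +1)]
variation = {"tfA": +1, "tfB": +1, "g1": +1, "g2": -1}
conflicts = inconsistent_nodes(edges, variation)
```

Here `g1` is explained by its inducer `tfA`, while `g2` went down although its only regulator, an inducer, went up, so `g2` is reported as a conflict.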
Affiliation(s)
- Philippe Veber
- Centre INRIA Rennes Bretagne Atlantique, IRISA, Rennes, France.
|
38
|
Yeang CH, McCormick F, Levine A. Combinatorial patterns of somatic gene mutations in cancer. FASEB J 2008; 22:2605-22. [DOI: 10.1096/fj.08-108985] [Citation(s) in RCA: 204] [Impact Index Per Article: 12.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/16/2023]
Affiliation(s)
- Chen-Hsiang Yeang
- Simons Center for Systems Biology, Institute for Advanced Study, Princeton, New Jersey, USA
- Frank McCormick
- Helen Diller Family Comprehensive Cancer Center and Cancer Research Institute, University of California, San Francisco, California, USA
- Arnold Levine
- Simons Center for Systems Biology, Institute for Advanced Study, Princeton, New Jersey, USA
|
39
|
Gohlke JM, Armant O, Parham FM, Smith MV, Zimmer C, Castro DS, Nguyen L, Parker JS, Gradwohl G, Portier CJ, Guillemot F. Characterization of the proneural gene regulatory network during mouse telencephalon development. BMC Biol 2008; 6:15. [PMID: 18377642 PMCID: PMC2330019 DOI: 10.1186/1741-7007-6-15] [Citation(s) in RCA: 78] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/06/2007] [Accepted: 03/31/2008] [Indexed: 12/22/2022] Open
Abstract
Background The proneural proteins Mash1 and Ngn2 are key cell autonomous regulators of neurogenesis in the mammalian central nervous system, yet little is known about the molecular pathways regulated by these transcription factors. Results Here we identify the downstream effectors of proneural genes in the telencephalon using a genomic approach to analyze the transcriptome of mice that are either lacking or overexpressing proneural genes. Novel targets of Ngn2 and/or Mash1 were identified, such as members of the Notch and Wnt pathways, and proteins involved in adhesion and signal transduction. Next, we searched the non-coding sequence surrounding the predicted proneural downstream effector genes for evolutionarily conserved transcription factor binding sites associated with newly defined consensus binding sites for Ngn2 and Mash1. This allowed us to identify potential novel co-factors and co-regulators for proneural proteins, including Creb, Tcf/Lef, Pou-domain containing transcription factors, Sox9, and Mef2a. Finally, a gene regulatory network was delineated using a novel Bayesian-based algorithm that can incorporate information from diverse datasets. Conclusion Together, these data shed light on the molecular pathways regulated by proneural genes and demonstrate that the integration of experimentation with bioinformatics can guide both hypothesis testing and hypothesis generation.
Affiliation(s)
- Julia M Gohlke
- Environmental Systems Biology Group, Laboratory of Molecular Toxicology, National Institute of Environmental Health Sciences, RTP, NC 27709, USA.
|
40
|
Tresch A, Beissbarth T, Sültmann H, Kuner R, Poustka A, Buness A. Discrimination of direct and indirect interactions in a network of regulatory effects. J Comput Biol 2008; 14:1217-28. [PMID: 17990974 DOI: 10.1089/cmb.2007.0085] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
This paper concerns algorithms for discriminating direct from indirect regulatory effects in an interaction graph built from error-prone measurements. Many of these algorithms can be cast as a rule for the removal of a single edge of the graph, such that the remaining graph is still consistent with the data. A set of mild conditions is given under which iterated application of such a rule leads to a unique minimal consistent graph. We show that three of the common methods for direct interaction search fulfill these conditions, thus providing a justification of their use. The main issues a reconstruction algorithm has to deal with are the noise in the data, the presence of regulatory cycles, and the direction of the regulatory effects. We introduce a novel rule that, in contrast to the previously mentioned methods, simultaneously takes into account all these aspects. An efficient algorithm for the computation of the minimal graph is given, whose time complexity is cubic in the number of vertices of the graph. Finally, we demonstrate the utility of our method in a simulation study.
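To make the idea of an iterated single-edge removal rule concrete, here is a minimal sketch of the simplest such rule, transitive reduction: an edge is dropped when its target is still reachable from its source by another path. This is not the novel rule introduced in the paper; it assumes a noise-free, acyclic graph, which are exactly the limitations the abstract says a realistic method has to overcome. The names `reachable` and `prune_indirect` are hypothetical.

```python
from typing import Dict, Set, Tuple

Graph = Dict[str, Set[str]]  # node -> set of direct successors


def reachable(graph: Graph, start: str, goal: str, skip_edge: Tuple[str, str]) -> bool:
    """Depth-first search from start to goal that ignores one given edge."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, set()):
            if (node, nxt) == skip_edge:
                continue
            if nxt == goal:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def prune_indirect(graph: Graph) -> Graph:
    """Remove every edge that an alternative path already explains."""
    pruned = {u: set(vs) for u, vs in graph.items()}
    for u in sorted(pruned):
        for v in sorted(pruned[u]):  # sorted() copies, so discarding is safe
            if reachable(pruned, u, v, skip_edge=(u, v)):
                pruned[u].discard(v)
    return pruned


# a -> c is implied by a -> b -> c, so it is treated as an indirect effect.
observed = {"a": {"b", "c"}, "b": {"c"}, "c": set()}
minimal = prune_indirect(observed)
```

On an acyclic graph the result does not depend on the order in which edges are examined, which mirrors the uniqueness property discussed in the abstract for rules that satisfy its conditions.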
Affiliation(s)
- Achim Tresch
- Institute for Medical Biometry, Epidemiology and Informatics, Mainz, Germany.
|
41
|
Ideker T. Protein Network Comparative Genomics. FASEB J 2008. [DOI: 10.1096/fasebj.22.1_supplement.538.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Affiliation(s)
- Trey Ideker
- Bioengineering, University of California, San Diego, La Jolla, CA
|
42
|
A systems approach to delineate functions of paralogous transcription factors: role of the Yap family in the DNA damage response. Proc Natl Acad Sci U S A 2008; 105:2934-9. [PMID: 18287073 DOI: 10.1073/pnas.0708670105] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Duplication of genes encoding transcription factors plays an essential role in driving phenotypic variation. Because regulation can occur at multiple levels, it is often difficult to discern how each duplicated factor achieves its regulatory specificity. In these cases, a "systems approach" may distinguish the role of each factor by integrating complementary large-scale measurements of the regulatory network. To explore such an approach, we integrate growth phenotypes, promoter binding profiles, and gene expression patterns to model the DNA damage response network controlled by the Yeast-specific AP-1 (YAP) family of transcription factors. This analysis reveals that YAP regulatory specificity is achieved by at least three mechanisms: (i) divergence of DNA-binding sequences into two subfamilies; (ii) condition-specific combinatorial regulation by multiple Yap factors; and (iii) interactions of Yap 1, 4, and 6 with chromatin remodeling proteins. Additional microarray experiments establish that Yap 4 and 6 regulate gene expression through interactions with the histone deacetylase, Hda1. The data further highlight differences among Yap paralogs in terms of their regulatory mode of action (activation vs. repression). This study suggests how other large TF families might be disentangled in the future.
|
43
|
Abstract
Cytoscape is a free software package for visualizing, modeling and analyzing molecular and genetic interaction networks. This protocol explains how to use Cytoscape to analyze the results of mRNA expression profiling, and other functional genomics and proteomics experiments, in the context of an interaction network obtained for genes of interest. Five major steps are described: (i) obtaining a gene or protein network, (ii) displaying the network using layout algorithms, (iii) integrating with gene expression and other functional attributes, (iv) identifying putative complexes and functional modules and (v) identifying enriched Gene Ontology annotations in the network. These steps provide a broad sample of the types of analyses performed by Cytoscape.
|
44
|
GONG Y, ZHANG Z. Alternative Pathway Approach for Automating Analysis and Validation of Cell Perturbation Networks and Design of Perturbation Experiments. Ann N Y Acad Sci 2007; 1115:267-85. [DOI: 10.1196/annals.1407.011] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023]
|
45
|
Abstract
In this review we give an overview of computational and statistical methods to reconstruct cellular networks. Although this area of research is vast and fast developing, we show that most currently used methods can be organized by a few key concepts. The first part of the review deals with conditional independence models including Gaussian graphical models and Bayesian networks. The second part discusses probabilistic and graph-based methods for data from experimental interventions and perturbations.
Affiliation(s)
- Florian Markowetz
- Max Planck Institute for Molecular Genetics, Ihnestrasse 63-73, 14195 Berlin, Germany
- Princeton University, Lewis-Sigler Institute for Integrative Genomics and Dept. of Computer Science, Princeton, NJ 08544, USA
- Rainer Spang
- Max Planck Institute for Molecular Genetics, Ihnestrasse 63-73, 14195 Berlin, Germany
- Present affiliation: University Regensburg, Institute of Functional Genomics, Josef-Engert-Str. 9, 93053 Regensburg, Germany
|
46
|
Beyer A, Bandyopadhyay S, Ideker T. Integrating physical and genetic maps: from genomes to interaction networks. Nat Rev Genet 2007; 8:699-710. [PMID: 17703239 PMCID: PMC2811081 DOI: 10.1038/nrg2144] [Citation(s) in RCA: 161] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]
Abstract
Physical and genetic mapping data have become as important to network biology as they once were to the Human Genome Project. Integrating physical and genetic networks currently faces several challenges: increasing the coverage of each type of network; establishing methods to assemble individual interaction measurements into contiguous pathway models; and annotating these pathways with detailed functional information. A particular challenge involves reconciling the wide variety of interaction types that are currently available. For this purpose, recent studies have sought to classify genetic and physical interactions along several complementary dimensions, such as ordered versus unordered, alleviating versus aggravating, and first versus second degree.
Affiliation(s)
- Andreas Beyer
- Department of Bioengineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA
|
47
|
Ourfali O, Shlomi T, Ideker T, Ruppin E, Sharan R. SPINE: a framework for signaling-regulatory pathway inference from cause-effect experiments. Bioinformatics 2007; 23:i359-66. [PMID: 17646318 DOI: 10.1093/bioinformatics/btm170] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
Abstract
MOTIVATION The complex program of gene expression allows the cell to cope with changing genetic, developmental and environmental conditions. The accumulating large-scale measurements of gene knockout effects and molecular interactions allow us to begin to uncover regulatory and signaling pathways within the cell that connect causal to affected genes on a network of physical interactions. RESULTS We present a novel framework, SPINE, for Signaling-regulatory Pathway INferencE. The framework aims at explaining gene expression experiments in which a gene is knocked out and, as a result, multiple genes change their expression levels. To this end, an integrated network of protein-protein and protein-DNA interactions is constructed, and signaling pathways connecting the causal gene to the affected genes are searched for in this network. The reconstruction problem is translated into that of assigning an activation/repression attribute to each protein so as to explain (in expectation) a maximum number of the knockout effects observed. We provide an integer programming formulation for the latter problem and solve it using a commercial solver. We validate the method by applying it to a yeast subnetwork that is involved in mating. In cross-validation tests, SPINE obtains very high accuracy in predicting knockout effects (99%). Next, we apply SPINE to the entire yeast network to predict protein effects and reconstruct signaling and regulatory pathways. Overall, we are able to infer 861 paths with confidence and assign effects to 183 genes. The predicted effects are found to be in high agreement with current biological knowledge. AVAILABILITY The algorithm and data are available at http://cs.tau.ac.il/~roded/SPINE.html.
Affiliation(s)
- Oved Ourfali
- School of Computer Science, School of Medicine, Tel-Aviv University, Tel-Aviv, Israel
|
48
|
A Three Stage Integrative Pathway Search (TIPS) framework to identify toxicity relevant genes and pathways. BMC Bioinformatics 2007; 8:202. [PMID: 17570844 PMCID: PMC1906836 DOI: 10.1186/1471-2105-8-202] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2007] [Accepted: 06/14/2007] [Indexed: 03/31/2023] Open
Abstract
Background The ability to obtain profiles of gene expression, proteins and metabolites with the advent of high-throughput technologies has advanced the study of pathway and network reconstruction. Genome-wide network reconstruction requires either interaction measurements or a large amount of perturbation data, often not available for mammalian cell systems. To overcome these shortcomings, we developed a Three Stage Integrative Pathway Search (TIPS©) approach to reconstruct context-specific active pathways involved in conferring a specific phenotype, from a limited amount of perturbation data. The approach was tested on human liver cells to identify pathways that confer cytotoxicity. Results This paper presents a systems approach that integrates gene expression and cytotoxicity profiles to identify a network of pathways involved in free fatty acid (FFA) and tumor necrosis factor-α (TNF-α) induced cytotoxicity in human hepatoblastoma cells (HepG2/C3A). Cytotoxicity-relevant genes were first identified and then used to reconstruct a network using Bayesian network (BN) analysis. BN inference was used subsequently to predict the effects of perturbing a gene on the other genes in the network and on the cytotoxicity. These predictions were subsequently confirmed through the published literature and further experiments. Conclusion The TIPS© approach is able to reconstruct active pathways that confer a particular phenotype by integrating gene expression and phenotypic profiles. A web-based version of TIPS© that performs the analysis described herein can be accessed at .
|
49
|
Hu Z, Killion PJ, Iyer VR. Genetic reconstruction of a functional transcriptional regulatory network. Nat Genet 2007; 39:683-7. [PMID: 17417638 DOI: 10.1038/ng2012] [Citation(s) in RCA: 302] [Impact Index Per Article: 17.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/26/2006] [Accepted: 03/01/2007] [Indexed: 11/09/2022]
Abstract
Although global analyses of transcription factor binding provide one view of potential transcriptional regulatory networks, regulation also occurs at levels distinct from transcription factor binding. Here, we use a genetic approach to identify targets of transcription factors in yeast and reconstruct a functional regulatory network. First, we profiled transcriptional responses in S. cerevisiae strains with individual deletions of 263 transcription factors. Then we used directed-weighted graph modeling and regulatory epistasis analysis to identify indirect regulatory relationships between these transcription factors, and from this we reconstructed a functional transcriptional regulatory network. The enrichment of promoter motifs and Gene Ontology annotations provides insight into the biological functions of the transcription factors.
Affiliation(s)
- Zhanzhi Hu
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, Section of Molecular Genetics and Microbiology, University of Texas at Austin, 1 University Station A4800, Austin, Texas 78712, USA
|
50
|
Sharan R, Ulitsky I, Shamir R. Network-based prediction of protein function. Mol Syst Biol 2007; 3:88. [PMID: 17353930 PMCID: PMC1847944 DOI: 10.1038/msb4100129] [Citation(s) in RCA: 620] [Impact Index Per Article: 36.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2006] [Accepted: 01/09/2007] [Indexed: 12/22/2022] Open
Abstract
Functional annotation of proteins is a fundamental problem in the post-genomic era. The recent availability of protein interaction networks for many model species has spurred on the development of computational methods for interpreting such data in order to elucidate protein function. In this review, we describe the current computational approaches for the task, including direct methods, which propagate functional information through the network, and module-assisted methods, which infer functional modules within the network and use those for the annotation task. Although a broad variety of interesting approaches has been developed, further progress in the field will depend on systematic evaluation of the methods and their dissemination in the biological community.
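One of the simplest direct methods of the kind the review describes is a neighbor majority vote: an unannotated protein receives the function that is most common among its annotated interaction partners. The sketch below is a minimal version of that idea under stated assumptions; the function name `predict_function`, the alphabetical tie-break, and the toy protein and function labels are invented for the example and are not taken from the review.

```python
from collections import Counter
from typing import Dict, Optional, Set


def predict_function(
    network: Dict[str, Set[str]],
    annotations: Dict[str, str],
    protein: str,
) -> Optional[str]:
    """Return the most common function among the annotated neighbors.

    Ties are broken alphabetically (an arbitrary choice for this sketch);
    None is returned when no neighbor carries an annotation.
    """
    votes = Counter(
        annotations[n] for n in network.get(protein, set()) if n in annotations
    )
    if not votes:
        return None
    top = max(votes.values())
    return min(label for label, count in votes.items() if count == top)


# Toy undirected network stored as adjacency sets; all labels are made up.
network = {
    "p1": {"p2", "p3", "p4"},
    "p2": {"p1"},
    "p3": {"p1"},
    "p4": {"p1"},
    "p5": set(),
}
annotations = {"p2": "transport", "p3": "transport", "p4": "repair"}
guess = predict_function(network, annotations, "p1")
```

Here `p1` has two neighbors annotated with `transport` and one with `repair`, so the vote assigns `transport`. An isolated protein such as `p5` gets no prediction, which hints at why the review also covers module-assisted methods that use more than the immediate neighborhood.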
Affiliation(s)
- Roded Sharan
- School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Igor Ulitsky
- School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Ron Shamir
- School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel. Tel.: +972 3 6405383; Fax: +972 3 6405384
|