1
Application of Transcriptional Gene Modules to Analysis of Caenorhabditis elegans' Gene Expression Data. G3: Genes, Genomes, Genetics 2020; 10:3623-3638. [PMID: 32759329 PMCID: PMC7534440 DOI: 10.1534/g3.120.401270] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/24/2022]
Abstract
Identification of co-expressed sets of genes (gene modules) is widely used for grouping functionally related genes during transcriptomic data analysis. An organism-wide atlas of high-quality gene modules would provide a powerful tool for unbiased detection of biological signals from gene expression data. Here, using a method based on independent component analysis that we call DEXICA, we have defined and optimized 209 modules that broadly represent the transcriptional wiring of the key experimental organism C. elegans. These modules represent responses to changes in the environment (e.g., starvation, exposure to xenobiotics), genes regulated by transcription factors (e.g., ATFS-1, DAF-16), genes specific to tissues (e.g., neurons, muscle), genes that change during development, and other complex transcriptional responses to genetic, environmental and temporal perturbations. Interrogation of these modules reveals processes that are activated in long-lived mutants in cases where traditional analyses of differentially expressed genes fail to do so. Additionally, we show that modules can inform the strength of the association between a gene and an annotation (e.g., GO term). Analysis of “module-weighted annotations” improves on several aspects of traditional annotation-enrichment tests and can aid in functional interpretation of poorly annotated genes. We provide an online interactive resource with tutorials at http://genemodules.org/, in which users can find detailed information on each module, check genes for module-weighted annotations, and use both of these to analyze their own gene expression data (generated using any platform) or gene sets of interest.
2
Wu WS, Chen PH, Chen TT, Tseng YY. YGMD: a repository for yeast cooperative transcription factor sets and their target gene modules. Database: The Journal of Biological Databases and Curation 2018; 2017:4596568. [PMID: 29220473 PMCID: PMC5691354 DOI: 10.1093/database/bax085] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 07/16/2017] [Accepted: 10/19/2017] [Indexed: 01/02/2023]
Abstract
By organizing the genome into gene modules (GMs), a living cell coordinates the activities of a set of genes to properly respond to environmental changes. The transcriptional regulation of the expression of a GM is usually carried out by a cooperative transcription factor set (CoopTFS) consisting of several cooperative transcription factors (TFs). Therefore, a database which provides CoopTFSs and their target GMs is useful for studying the cellular responses to internal or external stimuli. To address this need, we constructed YGMD (Yeast Gene Module Database) to provide 34120 CoopTFSs, each of which consists of two to five cooperative TFs, and their target GMs. The cooperativity between TFs in a CoopTFS is suggested by physical/genetic interaction evidence and/or predicted by existing algorithms. The target GM regulated by a CoopTFS is defined as the common target genes of all the TFs in that CoopTFS. The regulatory association between any TF in a CoopTFS and any gene in the target GM is supported by experimental evidence in the literature. In YGMD, users can (i) search the GM regulated by a specific CoopTFS of interest or (ii) search all possible CoopTFSs whose target GMs contain a specific gene of interest. The biological relevance of YGMD is shown by a case study which demonstrates that YGMD can provide a GM enriched with genes known to be regulated by the query CoopTFS (Cbf1-Met4-Met32). We believe that YGMD provides a valuable resource for yeast biologists to study the transcriptional regulation of GMs. Database URL: http://cosbi4.ee.ncku.edu.tw/YGMD/, http://cosbi5.ee.ncku.edu.tw/YGMD/ or http://cosbi.ee.ncku.edu.tw/YGMD/
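The definition of a target GM in this abstract (the common target genes of all TFs in a CoopTFS) and the two search modes can be illustrated with a minimal Python sketch. The gene names, the target sets, and both function names are hypothetical and chosen only for illustration; they are not YGMD data or part of any YGMD interface.

```python
# Hypothetical TF -> target-gene sets (illustrative values, not YGMD data).
TF_TARGETS = {
    "Cbf1": {"gene_a", "gene_b", "gene_c", "gene_d"},
    "Met4": {"gene_a", "gene_b", "gene_c", "gene_e"},
    "Met32": {"gene_a", "gene_b", "gene_e"},
}


def target_gene_module(coop_tfs, tf_targets):
    """Return the genes targeted by every TF in the cooperative set."""
    target_sets = [tf_targets[tf] for tf in coop_tfs]
    return set.intersection(*target_sets)


def coop_sets_containing(gene, coop_sets, tf_targets):
    """Return the cooperative TF sets whose target module contains `gene`."""
    return [s for s in coop_sets if gene in target_gene_module(s, tf_targets)]


coop_sets = [("Cbf1", "Met4"), ("Cbf1", "Met4", "Met32")]

# Search mode (i): module regulated by a given cooperative TF set.
print(sorted(target_gene_module(("Cbf1", "Met4", "Met32"), TF_TARGETS)))

# Search mode (ii): cooperative TF sets whose module contains a given gene.
print(coop_sets_containing("gene_c", coop_sets, TF_TARGETS))
```

Adding a TF to a set can only shrink or preserve the intersection, which is why larger cooperative sets tend to have smaller target modules under this definition.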
Affiliation(s)
- Wei-Sheng Wu: Department of Electrical Engineering, National Cheng Kung University, Tainan 70101, Taiwan
- Pin-Han Chen: Department of Electrical Engineering, National Cheng Kung University, Tainan 70101, Taiwan
- Tsung-Te Chen: Department of Electrical Engineering, National Cheng Kung University, Tainan 70101, Taiwan
- Yan-Yuan Tseng: Center for Molecular Medicine and Genetics, Wayne State University, School of Medicine, Detroit, MI 48201, USA
3
Chen BS, Wu WS. Underlying Principles of Natural Selection in Network Evolution: Systems Biology Approach. Evol Bioinform Online 2017. [DOI: 10.1177/117693430700300010] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 12/23/2022]
Abstract
Systems biology is a rapidly expanding field that integrates diverse areas of science such as physics, engineering, computer science, mathematics, and biology toward the goal of elucidating the underlying principles of hierarchical metabolic and regulatory systems in the cell, and ultimately leading to predictive understanding of cellular response to perturbations. Because post-genomics research is taking place throughout the tree of life, comparative approaches offer a way for combining data from many organisms to shed light on the evolution and function of biological networks from the gene to the organismal level. Therefore, systems biology can build on decades of theoretical work in evolutionary biology, and at the same time evolutionary biology can use the systems biology approach to go in new uncharted directions. In this study, we present a review of how the post-genomics era is adopting comparative approaches and dynamic system methods to understand the underlying design principles of network evolution and to shape the nascent field of evolutionary systems biology. Finally, the application of evolutionary systems biology to robust biological network designs is also discussed from the synthetic biology perspective.
Affiliation(s)
- Bor-Sen Chen: Lab of Control and Systems Biology, National Tsing Hua University, Hsinchu, 300, Taiwan
- Wei-Sheng Wu: Lab of Control and Systems Biology, National Tsing Hua University, Hsinchu, 300, Taiwan
4
Hillenbrand P, Maier KC, Cramer P, Gerland U. Inference of gene regulation functions from dynamic transcriptome data. eLife 2016; 5. [PMID: 27652904 PMCID: PMC5072840 DOI: 10.7554/elife.12188] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 10/09/2015] [Accepted: 09/20/2016] [Indexed: 11/17/2022]
Abstract
To quantify gene regulation, a function is required that relates transcription factor binding to DNA (input) to the rate of mRNA synthesis from a target gene (output). Such a ‘gene regulation function’ (GRF) generally cannot be measured because the experimental titration of inputs and simultaneous readout of outputs is difficult. Here we show that GRFs may instead be inferred from natural changes in cellular gene expression, as exemplified for the cell cycle in the yeast S. cerevisiae. We develop this inference approach based on a time series of mRNA synthesis rates from a synchronized population of cells observed over three cell cycles. We first estimate the functional form of how input transcription factors determine mRNA output and then derive GRFs for target genes in the CLB2 gene cluster that are expressed during G2/M phase. Systematic analysis of additional GRFs suggests a network architecture that rationalizes transcriptional cell cycle oscillations. We find that a transcription factor network alone can produce oscillations in mRNA expression, but that additional input from cyclin oscillations is required to arrive at the native behaviour of the cell cycle oscillator. DOI: http://dx.doi.org/10.7554/eLife.12188.001
Living cells rely on networks of genes to control their behavior, including how they grow, develop and respond to stress. Genes encode instructions needed to make proteins and other molecules, and much of the control is exerted at the first stage of protein production, known as transcription. During this process, a gene is copied to make molecules known as transcripts that may later be used as templates to make proteins. Many genes encode proteins that act to regulate transcription. Therefore, an individual gene may receive inputs from other genes, and these inputs affect how much transcript the gene produces, which can be considered as the gene’s output.
While these inputs and outputs can often be wired together to form a network, it is less clear exactly how all the different inputs at a gene interact to determine its output. These interactions are known as “gene regulation functions”, and knowing them would be an important step towards understanding gene networks, which would help us to predict how cells will behave in different situations. Gene regulation functions are difficult to measure directly, so researchers would like to find other ways to assess them indirectly.
A recently developed experimental technique called “dynamic transcriptome analysis” seemed promising as it measures both the inputs and outputs of all genes in a cell over time. Hillenbrand et al. used this technique to infer gene regulation functions with one or two inputs in yeast cells. Comparing these estimates with experimental data from previous studies showed that these inferred gene regulation functions could successfully predict the output of a gene based on its inputs. Hillenbrand et al. then used these estimates to search and model a well-known genetic network that is thought to be part of the molecular clockwork that controls the timing of events that cause a cell to divide.
Currently, the approach used by Hillenbrand et al. treats gene regulation functions like “black boxes”. This means that, while an output can be predicted if the inputs are known, it cannot reveal all of the detailed mechanisms behind it. Gaining insights into the inner workings of these black boxes will require taking more data into account, such as how abundant the proteins that regulate transcription are, where they are located within cells or whether they are active or not. Therefore, the next challenge is to incorporate these kinds of data to gain a fuller picture of how gene networks operate within cells. DOI: http://dx.doi.org/10.7554/eLife.12188.002
Affiliation(s)
- Patrick Hillenbrand: Lehrstuhl für Theorie komplexer Biosysteme, Physik-Department, Technische Universität München, Garching, Germany
- Kerstin C Maier: Max-Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Patrick Cramer: Max-Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Ulrich Gerland: Lehrstuhl für Theorie komplexer Biosysteme, Physik-Department, Technische Universität München, Garching, Germany
5
Wu WS, Hsieh YC, Lai FJ. YCRD: Yeast Combinatorial Regulation Database. PLoS One 2016; 11:e0159213. [PMID: 27392072 PMCID: PMC4938206 DOI: 10.1371/journal.pone.0159213] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 04/12/2016] [Accepted: 06/28/2016] [Indexed: 12/21/2022]
Abstract
In eukaryotes, the precise transcriptional control of gene expression is typically achieved through combinatorial regulation using cooperative transcription factors (TFs). Therefore, a database which provides regulatory associations between cooperative TFs and their target genes is helpful for biologists to study the molecular mechanisms of transcriptional regulation of gene expression. Because no such database exists in the public domain, we constructed the Yeast Combinatorial Regulation Database (YCRD), which contains 434,197 regulatory associations between 2535 cooperative TF pairs and 6243 genes. The comprehensive collection of more than 2500 cooperative TF pairs was retrieved from 17 existing algorithms in the literature. The target genes of a cooperative TF pair (e.g. TF1-TF2) are defined as the common target genes of TF1 and TF2, where a TF’s experimentally validated target genes were downloaded from the YEASTRACT database. In YCRD, users can (i) search the target genes of a cooperative TF pair of interest, (ii) search the cooperative TF pairs which regulate a gene of interest and (iii) identify important cooperative TF pairs which regulate a given set of genes. We believe that YCRD will be a valuable resource for yeast biologists to study combinatorial regulation of gene expression. YCRD is available at http://cosbi.ee.ncku.edu.tw/YCRD/ or http://cosbi2.ee.ncku.edu.tw/YCRD/.
Affiliation(s)
- Wei-Sheng Wu: Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan
- Yen-Chen Hsieh: Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan
- Fu-Jou Lai: Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan
6
Wu WS, Lai FJ, Tu BW, Chang DTH. CoopTFD: a repository for predicted yeast cooperative transcription factor pairs. Database: The Journal of Biological Databases and Curation 2016; 2016:baw092. [PMID: 27242036 PMCID: PMC4885606 DOI: 10.1093/database/baw092] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 01/30/2016] [Accepted: 05/09/2016] [Indexed: 01/22/2023]
Abstract
In eukaryotic cells, transcriptional regulation of gene expression is usually accomplished by cooperative transcription factors (TFs). Therefore, knowing cooperative TFs is helpful for uncovering the mechanisms of transcriptional regulation. In yeast, many cooperative TF pairs have been predicted by various algorithms in the literature. However, no database has so far collected the predicted yeast cooperative TFs from existing algorithms. This prompted us to construct the Cooperative Transcription Factors Database (CoopTFD), which has a comprehensive collection of 2622 predicted cooperative TF pairs (PCTFPs) in yeast from 17 existing algorithms. For each PCTFP, our database also provides five types of validation information: (i) the algorithms which predict this PCTFP, (ii) the publications which experimentally show that this PCTFP has physical or genetic interactions, (iii) the publications which experimentally study the biological roles of both TFs of this PCTFP, (iv) the common Gene Ontology (GO) terms of this PCTFP and (v) the common target genes of this PCTFP. Based on the provided validation information, users can judge the biological plausibility of a PCTFP of interest. We believe that CoopTFD will be a valuable resource for yeast biologists to study the combinatorial regulation of gene expression controlled by cooperative TFs. Database URL: http://cosbi.ee.ncku.edu.tw/CoopTFD/ or http://cosbi2.ee.ncku.edu.tw/CoopTFD/
Affiliation(s)
- Wei-Sheng Wu: Department of Electrical Engineering, National Cheng Kung University, Tainan 70101, Taiwan
- Fu-Jou Lai: Department of Electrical Engineering, National Cheng Kung University, Tainan 70101, Taiwan
- Bor-Wen Tu: Department of Electrical Engineering, National Cheng Kung University, Tainan 70101, Taiwan
- Darby Tien-Hao Chang: Department of Electrical Engineering, National Cheng Kung University, Tainan 70101, Taiwan
7
Wu WS, Lai FJ. Functional redundancy of transcription factors explains why most binding targets of a transcription factor are not affected when the transcription factor is knocked out. BMC Systems Biology 2015; 9 Suppl 6:S2. [PMID: 26678747 PMCID: PMC4674858 DOI: 10.1186/1752-0509-9-s6-s2] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 12/19/2022]
Abstract
Background: Biologists are puzzled by the extremely low percentage (3%) of the binding targets of a yeast transcription factor (TF) affected when the TF is knocked out, a phenomenon observed by comparing the TF binding dataset and TF knockout effect dataset.
Results: This study gives a plausible biological explanation of this counterintuitive phenomenon. Our analyses find that TFs with high functional redundancy show a significantly lower percentage than do TFs with low functional redundancy. This suggests that functional redundancy may lead to one TF compensating for another, thus masking the TF knockout effect on the binding targets of the knocked-out TF. In addition, we show that seven classes of genes (lowly expressed genes, TATA box-less genes, genes containing a nucleosome-free region immediately upstream of the transcriptional start site (TSS), genes with low transcriptional plasticity, genes with a low number of bound TFs, genes with a low number of TFBSs, and genes with a short average distance of TFBSs to the TSS) are insensitive to the knockout of their promoter-binding TFs, providing clues for finding other biological explanations of the surprisingly low percentage of the binding targets of a TF affected when the TF is knocked out.
Conclusions: This study shows that one property of TFs (functional redundancy) and seven properties of genes (expression level, TATA box, nucleosome, transcriptional plasticity, the number of bound TFs, the number of TFBSs, and the average distance of TFBSs to the TSS) may be useful for explaining a counterintuitive phenomenon: most binding targets of a yeast transcription factor are not affected when the transcription factor is knocked out.
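The percentage discussed in this abstract comes from comparing a TF's binding targets with the genes affected by its knockout. A minimal Python sketch of that comparison follows; the gene identifiers, both gene sets, and the function name are hypothetical, and the numbers are chosen only so that the toy result matches the 3% figure quoted above.

```python
def affected_fraction(binding_targets, knockout_affected):
    """Fraction of a TF's binding targets whose expression changes on knockout."""
    if not binding_targets:
        raise ValueError("TF has no binding targets")
    return len(binding_targets & knockout_affected) / len(binding_targets)


# Hypothetical data: 100 bound genes, of which 3 respond to the knockout.
binding = {f"gene{i}" for i in range(100)}
# gene500 responds to the knockout but is not bound, so it does not count.
affected = {"gene3", "gene41", "gene77", "gene500"}

print(f"{affected_fraction(binding, affected):.0%}")
```

Computing this fraction separately for TFs grouped by functional redundancy would be one way to set up the comparison the abstract describes, although the grouping criterion itself is not given here.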
8
Wu WS. A Computational Method for Identifying Yeast Cell Cycle Transcription Factors. Methods Mol Biol 2015; 1342:209-19. [PMID: 26254926 DOI: 10.1007/978-1-4939-2957-3_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2023]
Abstract
The eukaryotic cell cycle is a complex process and is precisely regulated at many levels. Many genes specific to the cell cycle are regulated transcriptionally and are expressed just before they are needed. To understand the cell cycle process, it is important to identify the cell cycle transcription factors (TFs) that regulate the expression of cell cycle-regulated genes. Here, we describe a computational method to identify cell cycle TFs in yeast by integrating current ChIP-chip, mutant, transcription factor-binding site (TFBS), and cell cycle gene expression data. For each identified cell cycle TF, our method also assigned specific cell cycle phases in which the TF functions and identified the time lag for the TF to exert regulatory effects on its target genes. Moreover, our method can identify novel cell cycle-regulated genes as a by-product.
Affiliation(s)
- Wei-Sheng Wu: Department of Electrical Engineering, National Cheng Kung University, No. 1 Daxue Road, East District, Tainan City, 701, Taiwan
9
Eser P, Demel C, Maier KC, Schwalb B, Pirkl N, Martin DE, Cramer P, Tresch A. Periodic mRNA synthesis and degradation co-operate during cell cycle gene expression. Mol Syst Biol 2014; 10:717. [PMID: 24489117 PMCID: PMC4023403 DOI: 10.1002/msb.134886] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Indexed: 11/12/2022]
Abstract
During the cell cycle, the levels of hundreds of mRNAs change in a periodic manner, but how this is achieved by alterations in the rates of mRNA synthesis and degradation has not been studied systematically. Here, we used metabolic RNA labeling and comparative dynamic transcriptome analysis (cDTA) to derive mRNA synthesis and degradation rates every 5 min during three cell cycle periods of the yeast Saccharomyces cerevisiae. A novel statistical model identified 479 genes that show periodic changes in mRNA synthesis and generally also periodic changes in their mRNA degradation rates. Peaks of mRNA degradation generally follow peaks of mRNA synthesis, resulting in sharp and high peaks of mRNA levels at defined times during the cell cycle. Whereas the timing of mRNA synthesis is set by upstream DNA motifs and their associated transcription factors (TFs), the synthesis rate of a periodically expressed gene is apparently set by its core promoter.
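The interplay between synthesis and degradation described in this abstract can be written as a simple rate equation. The form below is a standard first-order kinetic sketch and an assumption on our part, not necessarily the statistical model used in the paper; the symbols $m_g$, $\mu_g$ and $\delta_g$ are introduced here for illustration.

```latex
% m_g(t): mRNA level of gene g; mu_g(t): synthesis rate; delta_g(t): degradation rate
\[
  \frac{\mathrm{d} m_g(t)}{\mathrm{d} t} \;=\; \mu_g(t) \;-\; \delta_g(t)\, m_g(t)
\]
```

Under this assumption, a rise in $\delta_g(t)$ shortly after the peak of $\mu_g(t)$ makes $m_g(t)$ fall faster once synthesis declines, which is consistent with the sharp peaks of mRNA levels reported above.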
Affiliation(s)
- Philipp Eser: Gene Center and Department of Biochemistry, Center for Integrated Protein Science CIPSM Ludwig-Maximilians-Universität München, Munich, Germany
10
Wang H, Chiu CC, Wu YC, Wu WS. Shrinkage regression-based methods for microarray missing value imputation. BMC Systems Biology 2013; 7 Suppl 6:S11. [PMID: 24565159 PMCID: PMC4028886 DOI: 10.1186/1752-0509-7-s6-s11] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 02/04/2023]
Abstract
Background: Missing values commonly occur in microarray data, which usually contain more than 5% missing values with up to 90% of genes affected. Inaccurate missing value estimation reduces the power of downstream microarray data analyses. Many types of methods have been developed to estimate missing values. Among them, the regression-based methods are very popular and have been shown to perform better than the other types of methods on many test microarray datasets.
Results: To further improve the performance of the regression-based methods, we propose shrinkage regression-based methods. Our methods take advantage of the correlation structure in the microarray data and select similar genes for the target gene by Pearson correlation coefficients. In addition, our methods incorporate the least squares principle, utilize a shrinkage estimation approach to adjust the coefficients of the regression model, and then use the new coefficients to estimate missing values. Simulation results show that the proposed methods provide more accurate missing value estimation on six test microarray datasets than the existing regression-based methods do.
Conclusions: Imputation of missing values is a very important aspect of microarray data analyses because most of the downstream analyses require a complete dataset. Therefore, exploring accurate and efficient methods for estimating missing values has become an essential issue. Since our proposed shrinkage regression-based methods can provide accurate missing value estimation, they are competitive alternatives to the existing regression-based methods.
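The steps named in this abstract (select similar genes by Pearson correlation, fit by least squares, shrink the coefficients, then predict) can be sketched in a simplified form. The Python sketch below is not the authors' estimator: it fits one single-predictor least-squares line per neighbour gene, multiplies each slope by a fixed factor, and averages the predictions. The function names, the `k` and `shrink` parameters, and the toy data are all assumptions made for illustration.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def impute(target, candidates, missing_idx, k=2, shrink=0.9):
    """Estimate target[missing_idx] from the k most correlated complete genes."""
    obs = [i for i in range(len(target)) if i != missing_idx]
    y = [target[i] for i in obs]
    # Rank candidate genes by |Pearson r| with the target over observed samples.
    ranked = sorted(
        candidates,
        key=lambda g: abs(pearson([g[i] for i in obs], y)),
        reverse=True,
    )
    estimates = []
    for gene in ranked[:k]:
        x = [gene[i] for i in obs]
        mx, my = mean(x), mean(y)
        # Ordinary least-squares slope for y regressed on x.
        slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
            (a - mx) ** 2 for a in x
        )
        slope *= shrink  # assumed fixed shrinkage of the coefficient toward zero
        estimates.append(my + slope * (gene[missing_idx] - mx))
    return mean(estimates)


# Toy expression profiles; the target gene is missing its value at sample 3.
target = [1.0, 2.0, 3.0, None, 5.0]
candidates = [
    [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with the target
    [5.0, 1.0, 4.0, 2.0, 3.0],   # weakly related
]
print(impute(target, candidates, missing_idx=3, k=1, shrink=1.0))
print(impute(target, candidates, missing_idx=3, k=1, shrink=0.9))
```

With `shrink=1.0` the sketch reduces to plain least squares; values below 1 pull each prediction toward the target gene's observed mean, which is the general idea of shrinkage, though the paper's own shrinkage rule is not stated in the abstract.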
11
Chiu CC, Chan SY, Wang CC, Wu WS. Missing value imputation for microarray data: a comprehensive comparison study and a web tool. BMC Systems Biology 2013; 7 Suppl 6:S12. [PMID: 24565220 PMCID: PMC4028811 DOI: 10.1186/1752-0509-7-s6-s12] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Indexed: 12/18/2022]
Abstract
Background: Microarray data are usually peppered with missing values for various reasons. However, most downstream analyses of microarray data require complete datasets. Therefore, accurate algorithms for missing value estimation are needed to improve the performance of microarray data analyses. Although many algorithms have been developed, the choice of the optimal algorithm is still debated. Studies comparing the performance of different algorithms are not yet comprehensive, especially in the number of benchmark datasets used, the number of algorithms compared, the rounds of simulation conducted, and the performance measures used.
Results: In this paper, we performed a comprehensive comparison by using (I) thirteen datasets, (II) nine algorithms, (III) 110 independent runs of simulation, and (IV) three types of measures to evaluate the performance of each imputation algorithm fairly. First, the effects of different types of microarray datasets on the performance of each imputation algorithm were evaluated. Second, we discussed whether datasets from different species have different impacts on the performance of different algorithms. To assess the performance of each algorithm fairly, all evaluations were performed using three types of measures. Our results indicate that the performance of an imputation algorithm mainly depends on the type of dataset but not on the species from which the samples come. In addition to the statistical measure, two other measures with biological meanings are useful to reflect the impact of missing value imputation on the downstream data analyses. Our study suggests that local-least-squares-based methods are good choices to handle missing values for most microarray datasets.
Conclusions: In this work, we carried out a comprehensive comparison of the algorithms for microarray missing value imputation. Based on such a comprehensive comparison, researchers could choose the optimal algorithm for their datasets easily. Moreover, new imputation algorithms could be compared with the existing algorithms using this comparison strategy as a standard protocol. In addition, to assist researchers in dealing with missing values easily, we built a web-based and easy-to-use imputation tool, MissVIA (http://cosbi.ee.ncku.edu.tw/MissVIA), which supports many imputation algorithms. Once users upload a real microarray dataset and choose the imputation algorithms, MissVIA will determine the optimal algorithm for the users' data through a series of simulations, and then the imputed results can be downloaded for the downstream data analyses.
Affiliation(s)
- Chia-Chun Chiu: Department of Electrical Engineering, National Cheng Kung University, No.1 University Road, 701 Tainan, Taiwan (R. O. C
- Shih-Yao Chan: Department of Electrical Engineering, National Cheng Kung University, No.1 University Road, 701 Tainan, Taiwan (R. O. C
- Chung-Ching Wang: Department of Electrical Engineering, National Cheng Kung University, No.1 University Road, 701 Tainan, Taiwan (R. O. C
- Wei-Sheng Wu: Department of Electrical Engineering, National Cheng Kung University, No.1 University Road, 701 Tainan, Taiwan (R. O. C
12
Diez D, Hutchins AP, Miranda-Saavedra D. Systematic identification of transcriptional regulatory modules from protein-protein interaction networks. Nucleic Acids Res 2013; 42:e6. [PMID: 24137002 PMCID: PMC3874207 DOI: 10.1093/nar/gkt913] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Indexed: 12/31/2022]
Abstract
Transcription factors (TFs) combine with co-factors to form transcriptional regulatory modules (TRMs) that regulate gene expression programs with spatiotemporal specificity. Here we present a novel and generic method (rTRM) for the reconstruction of TRMs that integrates genomic information from TF binding, cell type-specific gene expression and protein–protein interactions. rTRM was applied to reconstruct the TRMs specific for embryonic stem cells (ESC) and hematopoietic stem cells (HSC), neural progenitor cells, trophoblast stem cells and distinct types of terminally differentiated CD4+ T cells. The ESC and HSC TRM predictions were highly precise, yielding 77 and 96 proteins, of which ∼75% have been independently shown to be involved in the regulation of these cell types. Furthermore, rTRM successfully identified a large number of bridging proteins with known roles in ESCs and HSCs, which could not have been identified using genomic approaches alone, as they lack the ability to bind specific DNA sequences. This highlights the advantage of rTRM over other methods that ignore PPI information, as proteins need to interact with other proteins to form complexes and perform specific functions. The prediction and experimental validation of the co-factors that endow master regulatory TFs with the capacity to select specific genomic sites, modulate the local epigenetic profile and integrate multiple signals will provide important mechanistic insights not only into how such TFs operate, but also into abnormal transcriptional states leading to disease.
Affiliation(s)
- Diego Diez: World Premier International (WPI) Immunology Frontier Research Center (IFReC), Osaka University, 3-1 Yamadaoka, Suita 565-0871, Osaka, Japan, South China Institute for Stem Cell Biology and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, 190 Kaiyuan Ave, Guangzhou 510663, China and Fibrosis Laboratories, Institute of Cellular Medicine, Newcastle University Medical School, Framlington Place, Newcastle upon Tyne NE2 4HH, United Kingdom
13
Systems biology as an integrated platform for bioinformatics, systems synthetic biology, and systems metabolic engineering. Cells 2013; 2:635-88. [PMID: 24709875 PMCID: PMC3972654 DOI: 10.3390/cells2040635] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 04/16/2013] [Revised: 09/12/2013] [Accepted: 09/19/2013] [Indexed: 01/11/2023]
Abstract
Systems biology aims at achieving a system-level understanding of living organisms and applying this knowledge to various fields such as synthetic biology, metabolic engineering, and medicine. System-level understanding of living organisms can be derived from insight into: (i) system structure and the mechanism of biological networks such as gene regulation, protein interactions, signaling, and metabolic pathways; (ii) system dynamics of biological networks, which provides an understanding of stability, robustness, and transduction ability through system identification, and through system analysis methods; (iii) system control methods at different levels of biological networks, which provide an understanding of systematic mechanisms to robustly control system states, minimize malfunctions, and provide potential therapeutic targets in disease treatment; (iv) systematic design methods for the modification and construction of biological networks with desired behaviors, which provide system design principles and system simulations for synthetic biology designs and systems metabolic engineering. This review describes current developments in systems biology, systems synthetic biology, and systems metabolic engineering for engineering and biology researchers. We also discuss challenges and future prospects for systems biology and the concept of systems biology as an integrated platform for bioinformatics, systems synthetic biology, and systems metabolic engineering.
14
Tsoy OV, Pyatnitskiy MA, Kazanov MD, Gelfand MS. Evolution of transcriptional regulation in closely related bacteria. BMC Evol Biol 2012; 12:200. [PMID: 23039862 PMCID: PMC3735044 DOI: 10.1186/1471-2148-12-200] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 03/05/2012] [Accepted: 09/26/2012] [Indexed: 01/15/2023]
Abstract
BACKGROUND The exponential growth of the number of fully sequenced genomes at varying taxonomic closeness allows one to characterize transcriptional regulation using comparative-genomics analysis instead of time-consuming experimental methods. A transcriptional regulatory unit consists of a transcription factor, its binding site and a regulated gene. These units constitute a graph which contains so-called "network motifs", subgraphs of a given structure. Here we consider genomes of closely related Enterobacteriales and estimate the fraction of conserved network motifs and sites as well as positions under selection in various types of non-coding regions. RESULTS Using a newly developed technique, we found that the highest fraction of positions under selection, approximately 50%, was observed in synvergon spacers (between consecutive genes from the same strand), followed by ~45% in divergon spacers (common 5'-regions), and ~10% in convergon spacers (common 3'-regions). The fraction of selected positions in functional regions was higher, 60% in transcription factor-binding sites and ~45% in terminators and promoters. Small, but significant differences were observed between Escherichia coli and Salmonella enterica. This fraction is similar to the one observed in eukaryotes. The conservation of binding sites demonstrated some differences between types of regulatory units. In E. coli strains, the interactions of the type "local transcriptional factor gene" turned out to be more conserved in feed-forward loops (FFLs) compared to non-motif interactions. The coherent FFLs tend to be less conserved than the incoherent FFLs. A natural explanation is that the former imply functional redundancy. CONCLUSIONS A naïve hypothesis that FFLs would be highly conserved turned out to be not entirely true: the conservation of an FFL depends on its status in the transcriptional network and also on its usage. The fraction of positions under selection in intergenic regions of bacterial genomes is roughly similar to that of eukaryotes. Known regulatory sites explain 20±5% of selected positions.
Affiliation(s)
- Olga V Tsoy
- Institute for Information Transmission Problems, RAS, Bolshoi Karetny per. 19, Moscow 127994, Russia
|
15
|
Yang TH, Wu WS. Identifying biologically interpretable transcription factor knockout targets by jointly analyzing the transcription factor knockout microarray and the ChIP-chip data. BMC SYSTEMS BIOLOGY 2012; 6:102. [PMID: 22898448 PMCID: PMC3465233 DOI: 10.1186/1752-0509-6-102] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 05/22/2012] [Accepted: 08/02/2012] [Indexed: 12/17/2022]
Abstract
Background Transcription factor knockout microarrays (TFKMs) provide useful information about gene regulation. By using statistical methods for detecting differentially expressed genes between the gene expression microarray data of the mutant and wild type strains, the TF knockout targets of the knocked-out TF can be identified. However, the identified TF knockout targets may contain a certain number of false positives due to the experimental noise inherent in high-throughput microarray technology. Even if the identified TF knockout targets are true, the molecular mechanisms by which a TF regulates its TF knockout targets remain unknown under this kind of statistical approach. Results To solve these two problems, we developed a method to filter out the false positives in the original TF knockout targets (identified by statistical approaches) so that the biologically interpretable TF knockout targets can be extracted. Our method can further generate experimentally testable hypotheses of the molecular mechanisms by which a TF regulates its biologically interpretable TF knockout targets. The details of our method are as follows. First, a TF binding network was constructed using the ChIP-chip data deposited in the YEASTRACT database. Then, each original TF knockout target is said to be biologically interpretable if a path (in the TF binding network) from the knocked-out TF to this target can be identified by our path search algorithm. The identified path explains how the TF may regulate this target either directly by binding to its promoter or indirectly through intermediate TFs. After checking all the original TF knockout targets, the biologically interpretable ones could be extracted and the false positives could be filtered out. We validated the biological significance of our refined (i.e., biologically interpretable) TF knockout targets by assessing their functional enrichment, expression coherence, and the prevalence of protein-protein interactions. Our refined TF knockout targets outperform the original TF knockout targets across all measures. Conclusions By jointly analyzing the TFKM and ChIP-chip data, our method can extract the biologically interpretable TF knockout targets by identifying paths (in the TF binding network) from the knocked-out TF to these targets. The identified paths form experimentally testable hypotheses regarding the molecular mechanisms by which a TF may regulate its knockout targets. About seven hundred hypotheses generated by our method have been experimentally validated in the literature. Our work demonstrates that integrating different data sources is a powerful approach to study complex biological systems.
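The path-based filter described in this abstract can be illustrated with a short sketch. The abstract does not say which path search algorithm the authors used, so the breadth-first search below is an assumption, and the function names and the toy network (`TF_A`, `gene1`, and so on) are hypothetical placeholders rather than YEASTRACT data.

```python
from collections import deque


def find_regulatory_path(binding_network, knocked_out_tf, target):
    """Return a shortest path from the knocked-out TF to the target, or None.

    binding_network maps each TF to the genes whose promoters it binds.
    Only TFs appear as keys, so every intermediate node on a path is a TF.
    Breadth-first search is an assumed choice, not the authors' algorithm.
    """
    queue = deque([[knocked_out_tf]])
    visited = {knocked_out_tf}
    while queue:
        path = queue.popleft()
        for bound_gene in binding_network.get(path[-1], ()):
            if bound_gene == target:
                return path + [bound_gene]
            if bound_gene not in visited:
                visited.add(bound_gene)
                queue.append(path + [bound_gene])
    return None


def filter_interpretable_targets(binding_network, knocked_out_tf, candidates):
    """Keep only the candidate targets reachable from the knocked-out TF."""
    paths = {}
    for gene in candidates:
        path = find_regulatory_path(binding_network, knocked_out_tf, gene)
        if path is not None:
            paths[gene] = path
    return paths


# Hypothetical toy binding network: TF_A binds gene1 and TF_B; TF_B binds gene2.
toy_network = {"TF_A": ["gene1", "TF_B"], "TF_B": ["gene2"], "TF_C": ["gene3"]}
result = filter_interpretable_targets(toy_network, "TF_A", ["gene1", "gene2", "gene3"])
print(result)
```

In this toy case `gene1` is explained by direct binding, `gene2` by an indirect path through `TF_B`, and `gene3` is dropped as a likely false positive because no path reaches it.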
Affiliation(s)
- Tzu-Hsien Yang
- Department of Electrical Engineering, National Cheng Kung University, Tainan 70101, Taiwan
|
16
|
Wang H, Wang YH, Wu WS. Yeast cell cycle transcription factors identification by variable selection criteria. Gene 2011; 485:172-6. [DOI: 10.1016/j.gene.2011.06.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 01/01/2011] [Revised: 05/12/2011] [Accepted: 06/03/2011] [Indexed: 01/12/2023]
|
17
|
Yao CW, Hsu BD, Chen BS. Constructing gene regulatory networks for long term photosynthetic light acclimation in Arabidopsis thaliana. BMC Bioinformatics 2011; 12:335. [PMID: 21834997 PMCID: PMC3162938 DOI: 10.1186/1471-2105-12-335] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 09/22/2010] [Accepted: 08/11/2011] [Indexed: 01/02/2023]
Abstract
BACKGROUND Photosynthetic light acclimation is an important process that allows plants to optimize the efficiency of photosynthesis, which is the core technology for green energy. However, currently little is known about the molecular mechanisms behind the regulation of the photosynthetic light acclimation response. In this study, a systematic method is proposed to investigate this mechanism by constructing gene regulatory networks from microarray data of Arabidopsis thaliana. METHODS The potential TF-gene regulatory pairs of photosynthetic light acclimation have been obtained by data mining of literature and databases. Following the identification of these potential TF-gene pairs, they have been refined using Pearson's correlation, allowing the construction of a rough gene regulatory network. This rough gene regulatory network is then pruned using time series microarray data of Arabidopsis thaliana via the maximum likelihood system identification method and Akaike's system order detection method to approach the real gene regulatory network of photosynthetic light acclimation. RESULTS By comparing the gene regulatory networks under the PSI-to-PSII light shift and the PSII-to-PSI light shift, it is possible to identify important transcription factors for the different light shift conditions. Furthermore, the robustness of the gene network, in particular the hubs and weak linkage points, is also discussed under the different light conditions to gain further insight into the mechanisms of photosynthesis. CONCLUSIONS This study investigates the molecular mechanisms of photosynthetic light acclimation for Arabidopsis thaliana from the physiological level. This has been achieved through the construction of gene regulatory networks from the limited data sources and literature via an efficient computation method. If more whole-genome ChIP-chip data and microarray data with multiple sampling points become available in the future, the proposed method can be improved by constructing the whole-genome gene regulatory network. These advances will greatly improve our understanding of the mechanisms of the photosynthetic system.
Affiliation(s)
- Cheng-Wei Yao
- Lab of Control and Systems Biology, Department of Electrical Engineering, National Tsing Hua University, Hsin-Chu, 300, Taiwan
|
18
|
Uncovering the transcriptional circuitry in skeletal muscle regeneration. Mamm Genome 2011; 22:272-81. [PMID: 21509518 DOI: 10.1007/s00335-011-9322-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/01/2010] [Accepted: 03/07/2011] [Indexed: 02/04/2023]
Abstract
Skeletal muscle has a remarkable ability to regenerate after repeated and complete destruction of the tissue. The healing phases for an injured muscle undergo an activation program controlled by a dynamically inducible transcriptional regulatory network. Mapping a complex mammalian transcriptional network is confronted by significant challenges and requires the integration of multiple experimental data types. In this work we present a system approach to describe the transcriptional circuitry during skeletal muscle regeneration using time-course expression data and motif scanning information. Time-lagged correlation analysis was utilized to evaluate the transcription factor (TF) → target associations. Our analysis identified six TFs that potentially play a central role throughout the regeneration process. Four of them have previously been described to be important for muscle regeneration and differentiation. The remaining two TFs are identified as novel regulators that may have a role in the regeneration process. We hope that our work may provide useful clues to help accelerate the recovery process in injured skeletal muscle.
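Time-lagged correlation, which this abstract uses to score TF → target associations, can be sketched as follows. This is only an illustration under stated assumptions: the abstract gives no formula, so the sketch takes the Pearson correlation between the TF profile and the target profile shifted by each candidate lag and keeps the lag with the largest absolute value. The function names and the toy profiles are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def best_lagged_correlation(tf_profile, target_profile, max_lag):
    """Return (lag, r) maximizing |r| between tf[t] and target[t + lag]."""
    if len(tf_profile) != len(target_profile):
        raise ValueError("profiles must have the same number of time points")
    best_lag, best_r = 0, 0.0
    for lag in range(max_lag + 1):
        x = tf_profile[: len(tf_profile) - lag]
        y = target_profile[lag:]
        if len(x) < 3:  # too few overlapping points to be meaningful
            break
        r = pearson(x, y)
        if abs(r) > abs(best_r):
            best_lag, best_r = lag, r
    return best_lag, best_r


# Hypothetical profiles: the target repeats the TF pattern two time points later.
tf = [0, 1, 3, 2, 0, 1, 4]
target = [9, 9, 0, 1, 3, 2, 0]
lag, r = best_lagged_correlation(tf, target, max_lag=3)
print(lag, round(r, 3))
```

A real analysis would also need a significance threshold for the correlation and, as the abstract notes, motif scanning evidence that the TF can bind the target's regulatory region.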
|
19
|
Wu WS. Different Functional Gene Clusters in Yeast have Different Spatial Distributions of the Transcription Factor Binding Sites. Bioinform Biol Insights 2011; 5:1-11. [PMID: 21423404 PMCID: PMC3045049 DOI: 10.4137/bbi.s6362] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 02/04/2023]
Abstract
Transcription factors control gene expression by binding to short specific DNA sequences, called transcription factor binding sites (TFBSs), in the promoter of a gene. Thus, studying the spatial distribution of TFBSs in the promoters may provide insights into the molecular mechanisms of gene regulation. I developed a method to construct the spatial distribution of TFBSs for any set of genes of interest. I found that different functional gene clusters have different spatial distributions of TFBSs, indicating that gene regulation mechanisms may be very different among different functional gene clusters. I also found that the binding sites for different transcription factors (TFs) may have different spatial distributions: a sharp peak, a plateau or no dominant single peak. The spatial distributions of binding sites for many TFs derived from my analyses are valuable prior information for TFBS prediction algorithms because different regions of a promoter can be assigned different probabilities of TFBS occurrence.
Affiliation(s)
- Wei-Sheng Wu
- Lab of Computational Systems Biology, Department of Electrical Engineering, National Cheng Kung University, Taiwan
|
20
|
Liu Q, Tan Y, Huang T, Ding G, Tu Z, Liu L, Li Y, Dai H, Xie L. TF-centered downstream gene set enrichment analysis: Inference of causal regulators by integrating TF-DNA interactions and protein post-translational modifications information. BMC Bioinformatics 2010; 11 Suppl 11:S5. [PMID: 21172055 PMCID: PMC3024863 DOI: 10.1186/1471-2105-11-s11-s5] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 12/19/2022]
Abstract
Background Inference of causal regulators responsible for gene expression changes under different conditions is of great importance but remains rather challenging. To date, most approaches use direct binding targets of transcription factors (TFs) to associate TFs with expression profiles. However, the low overlap between binding targets of a TF and the affected genes of the TF knockout limits the power of those methods. Results We developed a TF-centered downstream gene set enrichment analysis approach to identify potential causal regulators responsible for expression changes. We constructed hierarchical and multi-layer regulation models to derive possible downstream gene sets of a TF using not only TF-DNA interactions, but also, for the first time, post-translational modification (PTM) information. We verified our method in one expression dataset of large-scale TF knockout and another dataset involving both TF knockout and TF overexpression. Compared with the flat model using TF-DNA interactions alone, our method correctly identified five more actually perturbed TFs in large-scale TF knockout data and six more perturbed TFs in overexpression data. Potential regulatory pathways downstream of three perturbed regulators (SNF1, AFT1 and SUT1) were given to demonstrate the power of multilayer regulation models integrating TF-DNA interactions and PTM information. Additionally, our method successfully identified known important TFs and inferred some novel potential TFs involved in the transition from fermentative to glycerol-based respiratory growth and in the pheromone response. Downstream regulation pathways of SUT1 and AFT1 were also supported by the mRNA and/or phosphorylation changes of their mediating TFs and/or "modulator" proteins. Conclusions The results suggest that in addition to direct transcription, indirect transcription and post-translational regulation are also responsible for the effects of TF perturbation, especially for TF overexpression. Many TFs inferred by our method are supported by literature. Multiple TF regulation models could lead to new hypotheses for future experiments. Our method provides a valuable framework for analyzing gene expression data to identify causal regulators in the context of TF-DNA interactions and PTM information.
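To make the idea of testing a TF's downstream gene set for enrichment concrete, here is a minimal sketch. The abstract does not state which enrichment statistic the authors used, so the one-sided hypergeometric test below is an assumption chosen because it is a common over-representation test; the function name and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, set_size, changed, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    population: number of genes measured
    set_size:   size of the TF's downstream gene set
    changed:    number of differentially expressed genes
    overlap:    differentially expressed genes that fall in the downstream set
    """
    upper = min(set_size, changed)
    total = comb(population, changed)
    tail = sum(
        comb(set_size, k) * comb(population - set_size, changed - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10 genes measured, 4 in the downstream set,
# 3 differentially expressed, 2 of which lie in the downstream set.
p_value = hypergeom_enrichment_p(population=10, set_size=4, changed=3, overlap=2)
print(p_value)
```

Under this assumed test, a multi-layer model differs from the flat model only in how the downstream set is built (direct targets alone versus targets reached through intermediate TFs or PTM links); the enrichment step itself is unchanged.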
Affiliation(s)
- Qi Liu
- School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China.
|
21
|
Lai F, Chang JS, Wu WS. Identifying a Transcription Factor's Regulatory Targets from its Binding Targets. GENE REGULATION AND SYSTEMS BIOLOGY 2010; 4:125-33. [PMID: 21245946 PMCID: PMC3020039 DOI: 10.4137/grsb.s6458] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 12/18/2022]
Abstract
ChIP-chip data, which show binding of transcription factors (TFs) to promoter regions in vivo, are widely used by biologists to identify the regulatory targets of TFs. However, the binding of a TF to a gene does not necessarily imply regulation. Thus, it is important to develop computational methods which can extract a TF’s regulatory targets from its binding targets. We developed a method, called REgulatory Targets Extraction Algorithm (RETEA), which uses partial correlation analysis on gene expression data to extract a TF’s regulatory targets from its binding targets inferred from ChIP-chip data. We applied RETEA to yeast cell cycle microarray data and identified the plausible regulatory targets of eleven known cell cycle TFs. We validated our predictions by checking the enrichments for cell cycle-regulated genes, common cellular processes and common molecular functions. Finally, we showed that RETEA performs better than three published methods (MA-Network, TRIA and Garten et al’s method).
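Partial correlation, the statistic RETEA is built on, measures how strongly two expression profiles co-vary once the influence of a third is removed. The abstract does not describe RETEA's exact procedure, so the sketch below shows only the standard first-order partial correlation formula; the function names and the toy profiles are hypothetical and this is not the RETEA implementation.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def partial_correlation(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    denom = (1 - r_xz ** 2) * (1 - r_yz ** 2)
    if denom <= 0:
        raise ValueError("x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / sqrt(denom)


# Hypothetical profiles: tf and gene each equal a shared driver plus independent
# components, so their raw correlation is entirely explained by the driver.
driver = [1, 1, 1, 1, -1, -1, -1, -1]
tf = [2, 2, 0, 0, 0, 0, -2, -2]
gene = [2, 0, 2, 0, 0, -2, 0, -2]
raw_r = pearson(tf, gene)
partial_r = partial_correlation(tf, gene, driver)
print(round(raw_r, 3), round(partial_r, 3))
```

Here the raw correlation is 0.5 but the partial correlation is essentially zero, which is the kind of case where a bound gene would not be counted as a regulatory target.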
Affiliation(s)
- Fred Lai
- Department of Engineering Science, National Cheng Kung University, Tainan, Taiwan
|
22
|
Ferrezuelo F, Colomina N, Futcher B, Aldea M. The transcriptional network activated by Cln3 cyclin at the G1-to-S transition of the yeast cell cycle. Genome Biol 2010; 11:R67. [PMID: 20573214 PMCID: PMC2911115 DOI: 10.1186/gb-2010-11-6-r67] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.8] [Received: 05/08/2010] [Accepted: 06/23/2010] [Indexed: 12/25/2022]
Abstract
Background The G1-to-S transition of the cell cycle in the yeast Saccharomyces cerevisiae involves an extensive transcriptional program driven by transcription factors SBF (Swi4-Swi6) and MBF (Mbp1-Swi6). Activation of these factors ultimately depends on the G1 cyclin Cln3. Results To determine the transcriptional targets of Cln3 and their dependence on SBF or MBF, we first have used DNA microarrays to interrogate gene expression upon Cln3 overexpression in synchronized cultures of strains lacking components of SBF and/or MBF. Secondly, we have integrated this expression dataset together with other heterogeneous data sources into a single probabilistic model based on Bayesian statistics. Our analysis has produced more than 200 transcription factor-target assignments, validated by ChIP assays and by functional enrichment. Our predictions show higher internal coherence and predictive power than previous classifications. Our results support a model whereby SBF and MBF may be differentially activated by Cln3. Conclusions Integration of heterogeneous genome-wide datasets is key to building accurate transcriptional networks. By such integration, we provide here a reliable transcriptional network at the G1-to-S transition in the budding yeast cell cycle. Our results suggest that to improve the reliability of predictions we need to feed our models with more informative experimental data.
Affiliation(s)
- Francisco Ferrezuelo
- Departament de Ciències Mèdiques Bàsiques, Institut de Recerca Biomèdica de Lleida, Universitat de Lleida, Montserrat Roig 2, 25008 Lleida, Spain.
|
23
|
LIU QJ, WANG ZH, LIU WL, LI D, HE FC, ZHU YP. A Novel Method to Identify The Condition-specific Regulatory Sub-network That Controls The Yeast Cell Cycle Based on Gene Expression Model*. PROG BIOCHEM BIOPHYS 2010. [DOI: 10.3724/sp.j.1206.2009.00581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
24
|
Chen T, Li F, Chen BS. Cross-talks of sensory transcription networks in response to various environmental stresses. Interdiscip Sci 2009; 1:46-54. [PMID: 20640818 DOI: 10.1007/s12539-008-0018-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 08/30/2008] [Revised: 11/28/2008] [Accepted: 11/29/2008] [Indexed: 11/26/2022]
Abstract
For living organisms like Saccharomyces cerevisiae, instinctual response to sudden environmental changes leads to swift establishment of adaptive mechanisms. How TFs sense and adapt to various environmental stresses has not yet been systematically studied. Here we try to elucidate this problem with the assistance of genomic expression patterns from a computational perspective. A dynamic transcriptional regulatory model is employed to uncover significant TF-target regulatory relationships under various environmental stresses. Based on a global microarray dataset that describes how transcriptional regulators significantly respond to one specific stress, we constructed a sensory transcriptional network for the potential specific stress-responsive regulators. Alternatively, we have observed cross-talks among these sensory transcription networks that may shed light on general stress-responsive regulators. Results reveal that our method not only reconstructs the potential global protection mechanisms under various environmental stresses but also presents a set of reported specific stress-responsive regulators (i.e., Aft2, Hsf1, Msn2, Msn4, Skn7 and Yap1) as well as a set of inferred specific/general stress-responsive regulators that may provide new guidance for further experiments on yeast cells' adaptation to environmental stimuli. Though we studied only the yeast S. cerevisiae, our method can be broadly applied to all species.
Affiliation(s)
- Ting Chen
- Department of Electronic Engineering, Fudan University, Shanghai, 200433, China
|
25
|
Reid JE, Ott S, Wernisch L. Transcriptional programs: modelling higher order structure in transcriptional control. BMC Bioinformatics 2009; 10:218. [PMID: 19607663 PMCID: PMC2725141 DOI: 10.1186/1471-2105-10-218] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 11/11/2008] [Accepted: 07/16/2009] [Indexed: 12/15/2022]
Abstract
Background Transcriptional regulation is an important part of regulatory control in eukaryotes. Even if binding motifs for transcription factors are known, the task of finding binding sites by scanning sequences is plagued by false positives. One way to improve the detection of binding sites from motifs is by taking cooperativity of transcription factor binding into account. We propose a non-parametric probabilistic model, similar to a document topic model, for detecting transcriptional programs, groups of cooperative transcription factors and co-regulated genes. The analysis results in transcriptional programs which generalise both transcriptional modules and TF-target gene incidence matrices and provide a higher-level summary of these structures. The method is independent of prior specification of training sets of genes, for example, via gene expression data. The analysis is based on known binding motifs. Results We applied our method to putative regulatory regions of 18,445 Mus musculus genes. We discovered just 68 transcriptional programs that effectively summarised the action of 149 transcription factors on these genes. Several of these programs were significantly enriched for known biological processes and signalling pathways. One transcriptional program has a significant overlap with a reference set of cell cycle specific transcription factors. Conclusion Our method is able to pick out higher order structure from noisy sequence analyses. The transcriptional programs it identifies potentially represent common mechanisms of regulatory control across the genome. It simultaneously predicts which genes are co-regulated and which sets of transcription factors cooperate to achieve this co-regulation. The programs we discovered enable biologists to choose new genes and transcription factors to study in specific transcriptional regulatory systems.
Affiliation(s)
- John E Reid
- MRC Biostatistics Unit, Institute of Public Health, University Forvie Site, Cambridge CB2 0SR, UK.
|
26
|
Bacha J, Brodie JS, Loose MW. myGRN: a database and visualisation system for the storage and analysis of developmental genetic regulatory networks. BMC DEVELOPMENTAL BIOLOGY 2009; 9:33. [PMID: 19500400 PMCID: PMC2702357 DOI: 10.1186/1471-213x-9-33] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 01/29/2009] [Accepted: 06/06/2009] [Indexed: 11/23/2022]
Abstract
Background Biological processes are regulated by complex interactions between transcription factors and signalling molecules, collectively described as Genetic Regulatory Networks (GRNs). The characterisation of these networks to reveal regulatory mechanisms is a long-term goal of many laboratories. However, compiling, visualising and interacting with such networks is non-trivial. Current tools and databases typically focus on GRNs within simple, single celled organisms. However, data is available within the literature describing regulatory interactions in multi-cellular organisms, although not in any systematic form. This is particularly true within the field of developmental biology, where regulatory interactions should also be tagged with information about the time and anatomical location of development in which they occur. Description We have developed myGRN, a web application for storing and interrogating interaction data, with an emphasis on developmental processes. Users can submit interaction and gene expression data, either curated from published sources or derived from their own unpublished data. All interactions associated with publications are publicly visible, and unpublished interactions can only be shared between collaborating labs prior to publication. Users can group interactions into discrete networks based on specific biological processes. Various filters allow dynamic production of network diagrams based on a range of information including tissue location, developmental stage or basic topology. Individual networks can be viewed using myGRV, a tool focused on displaying developmental networks, or exported in a range of formats compatible with third party tools. Networks can also be analysed for the presence of common network motifs. We demonstrate the capabilities of myGRN using a network of zebrafish interactions integrated with expression data from the zebrafish database, ZFIN. Conclusion Here we are launching myGRN as a community-based repository for interaction networks, with a specific focus on developmental networks. We plan to extend its functionality, as well as use it to study networks involved in embryonic development in the future.
Affiliation(s)
- Jamil Bacha
- Institute of Genetics, University of Nottingham, Nottingham, UK.
|
27
|
Michoel T, De Smet R, Joshi A, Marchal K, Van de Peer Y. Reverse-engineering transcriptional modules from gene expression data. Ann N Y Acad Sci 2009; 1158:36-43. [PMID: 19348630 DOI: 10.1111/j.1749-6632.2008.03943.x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
Abstract
"Module networks" are a framework to learn gene regulatory networks from expression data using a probabilistic model in which coregulated genes share the same parameters and conditional distributions. We present a method to infer ensembles of such networks and an averaging procedure to extract the statistically most significant modules and their regulators. We show that the inferred probabilistic models extend beyond the dataset used to learn the models.
Affiliation(s)
- Tom Michoel
- Department of Plant Systems Biology, VIB, Gent, Belgium.
|
28
|
Stuart GR, Copeland WC, Strand MK. Construction and application of a protein and genetic interaction network (yeast interactome). Nucleic Acids Res 2009; 37:e54. [PMID: 19273534 PMCID: PMC2673449 DOI: 10.1093/nar/gkp140] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/25/2022]
Abstract
Cytoscape is a bioinformatic data analysis and visualization platform that is well-suited to the analysis of gene expression data. To facilitate the analysis of yeast microarray data using Cytoscape, we constructed an interaction network (interactome) using the curated interaction data available from the Saccharomyces Genome Database (www.yeastgenome.org) and the database of yeast transcription factors at YEASTRACT (www.yeastract.com). These data were formatted and imported into Cytoscape using semi-automated methods, including Linux-based scripts, that simplified the process while minimizing the introduction of processing errors. The methods described for the construction of this yeast interactome are generally applicable to the construction of any interactome. Using Cytoscape, we illustrate the use of this interactome through the analysis of expression data from a recent yeast diauxic shift experiment. We also report and briefly describe the complex associations among transcription factors that result in the regulation of thousands of genes through coordinated changes in expression of dozens of transcription factors. These cells are thus able to sensitively regulate cellular metabolism in response to changes in genetic or environmental conditions through relatively small changes in the expression of large numbers of genes, affecting the entire yeast metabolome.
Affiliation(s)
- Gregory R Stuart
- Laboratory of Molecular Genetics, National Institute of Environmental Health Sciences and Life Sciences Division, Research Triangle Park, NC 27709, USA
|
29
|
Wu WS, Li WH. Systematic identification of yeast cell cycle transcription factors using multiple data sources. BMC Bioinformatics 2008; 9:522. [PMID: 19061501 PMCID: PMC2613934 DOI: 10.1186/1471-2105-9-522] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Received: 08/04/2008] [Accepted: 12/05/2008] [Indexed: 12/16/2022]
Abstract
Background The eukaryotic cell cycle is a complex process and is precisely regulated at many levels. Many genes specific to the cell cycle are regulated transcriptionally and are expressed just before they are needed. To understand the cell cycle process, it is important to identify the cell cycle transcription factors (TFs) that regulate the expression of cell cycle-regulated genes. Results We developed a method to identify cell cycle TFs in yeast by integrating current ChIP-chip, mutant, transcription factor binding site (TFBS), and cell cycle gene expression data. We identified 17 cell cycle TFs, 12 of which are known cell cycle TFs, while the remaining five (Ash1, Rlm1, Ste12, Stp1, Tec1) are putative novel cell cycle TFs. For each cell cycle TF, we assigned specific cell cycle phases in which the TF functions and identified the time lag for the TF to exert regulatory effects on its target genes. We also identified 178 novel cell cycle-regulated genes, among which 59 have unknown functions, but they may now be annotated as cell cycle-regulated genes. Most of our predictions are supported by previous experimental or computational studies. Furthermore, a high confidence TF-gene regulatory matrix is derived as a byproduct of our method. Each TF-gene regulatory relationship in this matrix is supported by at least three data sources: gene expression, TFBS, and ChIP-chip and/or mutant data. We show that our method performs better than four existing methods for identifying yeast cell cycle TFs. Finally, an application of our method to different cell cycle gene expression datasets suggests that our method is robust. Conclusion Our method is effective for identifying yeast cell cycle TFs and cell cycle-regulated genes. Many of our predictions are validated by the literature. Our study shows that integrating multiple data sources is a powerful approach to studying complex biological systems.
Affiliation(s)
- Wei-Sheng Wu
- Department of Evolution and Ecology, University of Chicago, Chicago, IL 60637, USA.
|
30
|
Chen BS, Yang SK, Lan CY, Chuang YJ. A systems biology approach to construct the gene regulatory network of systemic inflammation via microarray and databases mining. BMC Med Genomics 2008; 1:46. [PMID: 18823570 PMCID: PMC2567339 DOI: 10.1186/1755-8794-1-46] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.4] [Received: 05/22/2008] [Accepted: 09/30/2008] [Indexed: 01/06/2023]
Abstract
BACKGROUND Inflammation is a hallmark of many human diseases. Elucidating the mechanisms underlying systemic inflammation has long been an important topic in basic and clinical research. When the primary pathogenetic events remain unclear due to their immense complexity, construction and analysis of the gene regulatory network of inflammation at times becomes the best way to understand the detrimental effects of disease. However, it is difficult to recognize and evaluate relevant biological processes from the huge quantities of experimental data. It is hence appealing to find an algorithm which can generate a gene regulatory network of systemic inflammation from high-throughput genomic studies of human diseases. Such a network will be essential for us to extract valuable information from the complex and chaotic network under diseased conditions. RESULTS In this study, we construct a gene regulatory network of inflammation using data extracted from the Ensembl and JASPAR databases. We also integrate and apply a number of systematic algorithms, such as a cross-correlation threshold, maximum likelihood estimation, and the Akaike Information Criterion (AIC), to time-lapsed microarray data to refine the genome-wide transcriptional regulatory network in response to bacterial endotoxins in the context of dynamic activated genes, which are regulated by transcription factors (TFs) such as NF-kappaB. This systematic approach is used to investigate the stochastic interaction represented by the dynamic leukocyte gene expression profiles of human subjects exposed to an inflammatory stimulus (bacterial endotoxin). Based on the kinetic parameters of the dynamic gene regulatory network, we identify important properties (such as susceptibility to infection) of the immune system, which may be useful for translational research. Finally, robustness of the inflammatory gene network is also inferred by analyzing the hubs and "weak ties" structures of the gene network.
CONCLUSION In this study, data mining and dynamic network analyses were integrated to examine the gene regulatory network in the inflammatory response system. Compared with previous methodologies reported in the literature, the proposed gene network perturbation method has shown a great improvement in analyzing the systemic inflammation.
Affiliation(s)
- Bor-Sen Chen
- Lab of Control and Systems Biology, Department of Electrical Engineering, National Tsing Hua University, Hsinchu, 300, Taiwan
| | - Shih-Kuang Yang
- Lab of Control and Systems Biology, Department of Electrical Engineering, National Tsing Hua University, Hsinchu, 300, Taiwan
| | - Chung-Yu Lan
- Department of Life Science, National Tsing Hua University, Hsinchu, 300, Taiwan
| | - Yung-Jen Chuang
- Department of Life Science, National Tsing Hua University, Hsinchu, 300, Taiwan
|
31
|
Wu WS, Li WH. Identifying gene regulatory modules of heat shock response in yeast. BMC Genomics 2008; 9:439. [PMID: 18811975 PMCID: PMC2575218 DOI: 10.1186/1471-2164-9-439] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.6] [Received: 04/25/2008] [Accepted: 09/23/2008] [Indexed: 03/27/2023]
Abstract
Background A gene regulatory module (GRM) is a set of genes that is regulated by the same set of transcription factors (TFs). By organizing the genome into GRMs, a living cell can coordinate the activities of many genes in response to various internal and external stimuli. Therefore, identifying GRMs is helpful for understanding gene regulation. Results Integrating transcription factor binding site (TFBS), mutant, ChIP-chip, and heat shock time series gene expression data, we develop a method, called Heat-Inducible Module Identification Algorithm (HIMIA), for reconstructing GRMs of yeast heat shock response. Unlike previous module inference tools which are static statistics-based methods, HIMIA is a dynamic system model-based method that utilizes the dynamic nature of time series gene expression data. HIMIA identifies 29 GRMs, which in total contain 182 heat-inducible genes regulated by 12 heat-responsive TFs. Using various types of published data, we validate the biological relevance of the identified GRMs. Our analysis suggests that different combinations of a fairly small number of heat-responsive TFs regulate a large number of genes involved in heat shock response and that there may exist crosstalk between heat shock response and other cellular processes. Using HIMIA, we identify 68 uncharacterized genes that may be involved in heat shock response and we also identify their plausible heat-responsive regulators. Furthermore, HIMIA is capable of assigning the regulatory roles of the TFs that regulate GRMs and Cst6, Hsf1, Msn2, Msn4, and Yap1 are found to be activators of several GRMs. In addition, HIMIA refines two clusters of genes involved in heat shock response and provides a better understanding of how the complex expression program of heat shock response is regulated. 
Finally, we show that HIMIA outperforms four current module inference tools (GRAM, MOFA, ReMoDiscovery, and SAMBA), and we conduct two randomization tests to show that the output of HIMIA is statistically meaningful. Conclusion HIMIA is effective for reconstructing GRMs of yeast heat shock response. Indeed, many of the reconstructed GRMs are in agreement with previous studies. Further, HIMIA predicts several interesting new modules and novel TF combinations. Our study shows that integrating multiple types of data is a powerful approach to studying complex biological systems.
Affiliation(s)
- Wei-Sheng Wu
- Department of Evolution and Ecology, University of Chicago, Chicago, IL 60637, USA.
|
32
|
Ramsey SA, Klemm SL, Zak DE, Kennedy KA, Thorsson V, Li B, Gilchrist M, Gold ES, Johnson CD, Litvak V, Navarro G, Roach JC, Rosenberger CM, Rust AG, Yudkovsky N, Aderem A, Shmulevich I. Uncovering a macrophage transcriptional program by integrating evidence from motif scanning and expression dynamics. PLoS Comput Biol 2008; 4:e1000021. [PMID: 18369420 PMCID: PMC2265556 DOI: 10.1371/journal.pcbi.1000021] [Citation(s) in RCA: 143] [Impact Index Per Article: 8.9] [Received: 07/27/2007] [Accepted: 02/04/2008] [Indexed: 01/04/2023]
Abstract
Macrophages are versatile immune cells that can detect a variety of pathogen-associated molecular patterns through their Toll-like receptors (TLRs). In response to microbial challenge, the TLR-stimulated macrophage undergoes an activation program controlled by a dynamically inducible transcriptional regulatory network. Mapping a complex mammalian transcriptional network poses significant challenges and requires the integration of multiple experimental data types. In this work, we inferred a transcriptional network underlying TLR-stimulated murine macrophage activation. Microarray-based expression profiling and transcription factor binding site motif scanning were used to infer a network of associations between transcription factor genes and clusters of co-expressed target genes. The time-lagged correlation was used to analyze temporal expression data in order to identify potential causal influences in the network. A novel statistical test was developed to assess the significance of the time-lagged correlation. Several associations in the resulting inferred network were validated using targeted ChIP-on-chip experiments. The network incorporates known regulators and gives insight into the transcriptional control of macrophage activation. Our analysis identified a novel regulator (TGIF1) that may have a role in macrophage activation. Macrophages play a vital role in host defense against infection by recognizing pathogens through pattern recognition receptors, such as the Toll-like receptors (TLRs), and mounting an immune response. Stimulation of TLRs initiates a complex transcriptional program in which induced transcription factor genes dynamically regulate downstream genes. 
Microarray-based transcriptional profiling has proved useful for mapping such transcriptional programs in simpler model organisms; however, mammalian systems present difficulties such as post-translational regulation of transcription factors, combinatorial gene regulation, and a paucity of available gene-knockout expression data. Additional evidence sources, such as DNA sequence-based identification of transcription factor binding sites, are needed. In this work, we computationally inferred a transcriptional network for TLR-stimulated murine macrophages. Our approach combined sequence scanning with time-course expression data in a probabilistic framework. Expression data were analyzed using the time-lagged correlation. A novel, unbiased method was developed to assess the significance of the time-lagged correlation. The inferred network of associations between transcription factor genes and co-expressed gene clusters was validated with targeted ChIP-on-chip experiments, and yielded insights into the macrophage activation program, including a potential novel regulator. Our general approach could be used to analyze other complex mammalian systems for which time-course expression data are available.
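The time-lagged correlation mentioned in this abstract can be illustrated with a minimal sketch. It assumes equally spaced time points and a plain Pearson correlation scanned over integer lags; the function names and the toy profiles are hypothetical, and this is not the authors' implementation or their significance test.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def time_lagged_correlation(tf, target, max_lag):
    """Return (lag, r) with the largest |r| between tf[t] and target[t + lag]."""
    best_lag, best_r = 0, 0.0
    for lag in range(max_lag + 1):
        x = tf[:len(tf) - lag]   # regulator profile, truncated at the end
        y = target[lag:]         # target profile, shifted forward by `lag`
        if len(x) < 3:           # too few overlapping points to correlate
            break
        r = pearson(x, y)
        if abs(r) > abs(best_r):
            best_lag, best_r = lag, r
    return best_lag, best_r

# Toy data (hypothetical): the target repeats the regulator two time points later.
tf_profile = [0.0, 1.0, 3.0, 2.0, 1.0, 0.5, 0.2, 0.1]
target_profile = [0.0, 0.0, 0.0, 1.0, 3.0, 2.0, 1.0, 0.5]
best_lag, best_r = time_lagged_correlation(tf_profile, target_profile, max_lag=3)
```

A positive lag with a strong correlation is only suggestive of a regulator acting on a target; the abstract notes that a dedicated statistical test was needed to judge significance.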
Affiliation(s)
- Stephen A. Ramsey
- Institute for Systems Biology, Seattle, Washington, United States of America
| | - Sandy L. Klemm
- Institute for Systems Biology, Seattle, Washington, United States of America
| | - Daniel E. Zak
- Institute for Systems Biology, Seattle, Washington, United States of America
| | - Kathleen A. Kennedy
- Institute for Systems Biology, Seattle, Washington, United States of America
| | - Vesteinn Thorsson
- Institute for Systems Biology, Seattle, Washington, United States of America
| | - Bin Li
- Institute for Systems Biology, Seattle, Washington, United States of America
| | - Mark Gilchrist
- Institute for Systems Biology, Seattle, Washington, United States of America
| | - Elizabeth S. Gold
- Institute for Systems Biology, Seattle, Washington, United States of America
| | - Carrie D. Johnson
- Institute for Systems Biology, Seattle, Washington, United States of America
| | - Vladimir Litvak
- Institute for Systems Biology, Seattle, Washington, United States of America
| | - Garnet Navarro
- Institute for Systems Biology, Seattle, Washington, United States of America
| | - Jared C. Roach
- Institute for Systems Biology, Seattle, Washington, United States of America
| | | | - Alistair G. Rust
- Institute for Systems Biology, Seattle, Washington, United States of America
| | - Natalya Yudkovsky
- Institute for Systems Biology, Seattle, Washington, United States of America
| | - Alan Aderem
- Institute for Systems Biology, Seattle, Washington, United States of America
| | - Ilya Shmulevich
- Institute for Systems Biology, Seattle, Washington, United States of America
|
33
|
Cheng C, Li LM. Systematic identification of cell cycle regulated transcription factors from microarray time series data. BMC Genomics 2008; 9:116. [PMID: 18315882 PMCID: PMC2315658 DOI: 10.1186/1471-2164-9-116] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Received: 11/15/2007] [Accepted: 03/03/2008] [Indexed: 02/05/2023]
Abstract
Background The cell cycle has long been an important model for studying genome-wide transcriptional regulation. Although several methods have been introduced to identify cell cycle regulated genes from microarray data, they cannot be directly used to investigate cell cycle regulated transcription factors (CCRTFs), because for many transcription factors (TFs) it is their activities instead of expressions that are periodically regulated across the cell cycle. To overcome this problem, it is useful to infer TF activities across the cell cycle by integrating microarray expression data with ChIP-chip data, and then examine the periodicity of the inferred activities. For most species, however, large-scale ChIP-chip data are still not available. Results We propose a two-step method to identify the CCRTFs by integrating microarray cell cycle data with ChIP-chip data or motif discovery data. In S. cerevisiae, we identify 42 CCRTFs, among which 23 have been verified experimentally. The cell cycle related behaviors (e.g. at which cell cycle phase a TF achieves the highest activity) predicted by our method are consistent with the well established knowledge about them. We also find that the periodical activity fluctuation of some TFs can be perturbed by the cell synchronization treatment. Moreover, by integrating expression data with in-silico motif discovery data, we identify 8 cell cycle associated regulatory motifs, among which 7 are binding sites for well-known cell cycle related TFs. Conclusion Our method is effective for identifying CCRTFs by integrating microarray cell cycle data with TF-gene binding information. In S. cerevisiae, the TF-gene binding information is provided by the systematic ChIP-chip experiments. In other species where systematic ChIP-chip data are not available, in-silico motif discovery and analysis provide us with an alternative method. Therefore, our method is ready to be applied to microarray cell cycle data sets from different species.
The C++ program for AC score calculation is available for download from URL .
Affiliation(s)
- Chao Cheng
- Molecular and Computational biology program, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089-2910, USA.
|
34
|
Ashe M, de Bruin RA, Kalashnikova T, McDonald WH, Yates JR, Wittenberg C. The SBF- and MBF-associated Protein Msa1 Is Required for Proper Timing of G1-specific Transcription in Saccharomyces cerevisiae. J Biol Chem 2008; 283:6040-9. [DOI: 10.1074/jbc.m708248200] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Indexed: 11/06/2022]
|
35
|
Wu WS, Li WH, Chen BS. Reconstructing a network of stress-response regulators via dynamic system modeling of gene regulation. Gene Regulation and Systems Biology 2008; 2:53-62. [PMID: 19787074 PMCID: PMC2733084 DOI: 10.4137/grsb.s558] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 01/26/2023]
Abstract
Unicellular organisms such as yeasts have evolved mechanisms to respond to environmental stresses by rapidly reorganizing the gene expression program. Although many stress-response genes in yeast have been discovered by DNA microarrays, the stress-response transcription factors (TFs) that regulate these stress-response genes remain to be investigated. In this study, we use a dynamic system model of gene regulation to describe the mechanism of how TFs may control a gene’s expression. Then, based on the dynamic system model, we develop the Stress Regulator Identification Algorithm (SRIA) to identify stress-response TFs for six kinds of stresses. We identified some general stress-response TFs that respond to various stresses and some specific stress-response TFs that respond to one specific stress. The biological significance of our findings is validated by the literature. We found that a small number of TFs is probably sufficient to control a wide variety of expression patterns in yeast under different stresses. Two implications can be inferred from this observation. First, the response mechanisms to different stresses may have a bow-tie structure. Second, there may be regulatory cross-talks among different stress responses. In conclusion, this study proposes a network of stress-response regulators and the details of their actions.
Affiliation(s)
- Wei-Sheng Wu
- Lab of Control and Systems Biology, Department of Electrical Engineering, National Tsing Hua University, Hsinchu, 300, Taiwan.
|
36
|
Lee HJ, Manke T, Bringas R, Vingron M. Prioritization of gene regulatory interactions from large-scale modules in yeast. BMC Bioinformatics 2008; 9:32. [PMID: 18211684 PMCID: PMC2244593 DOI: 10.1186/1471-2105-9-32] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 05/09/2007] [Accepted: 01/22/2008] [Indexed: 01/10/2023]
Abstract
BACKGROUND The identification of groups of co-regulated genes and their transcription factors, called transcriptional modules, has been a focus of many studies about biological systems. While methods have been developed to derive numerous modules from genome-wide data, individual links between regulatory proteins and target genes still need experimental verification. In this work, we aim to prioritize regulator-target links within transcriptional modules based on three types of large-scale data sources. RESULTS Starting with putative transcriptional modules from ChIP-chip data, we first derive modules in which target genes show both expression and function coherence. The most reliable regulatory links between transcription factors and target genes are established by identifying intersection of target genes in coherent modules for each enriched functional category. Using a combination of genome-wide yeast data in normal growth conditions and two different reference datasets, we show that our method predicts regulatory interactions with significantly higher predictive power than ChIP-chip binding data alone. A comparison with results from other studies highlights that our approach provides a reliable and complementary set of regulatory interactions. Based on our results, we can also identify functionally interacting target genes, for instance, a group of co-regulated proteins related to cell wall synthesis. Furthermore, we report novel conserved binding sites of a glycoprotein-encoding gene, CIS3, regulated by Swi6-Swi4 and Ndd1-Fkh2-Mcm1 complexes. CONCLUSION We provide a simple method to prioritize individual TF-gene interactions from large-scale transcriptional modules. In comparison with other published works, we predict a complementary set of regulatory interactions which yields a similar or higher prediction accuracy at the expense of sensitivity. Therefore, our method can serve as an alternative approach to prioritization for further experimental studies.
Affiliation(s)
- Ho-Joon Lee
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany.
|
37
|
A systematic approach to detecting transcription factors in response to environmental stresses. BMC Bioinformatics 2007; 8:473. [PMID: 18067669 PMCID: PMC2257980 DOI: 10.1186/1471-2105-8-473] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 06/09/2007] [Accepted: 12/08/2007] [Indexed: 11/15/2022]
Abstract
Background Eukaryotic cells have developed mechanisms to respond to external environmental or physiological changes (stresses). In order to increase the activities of stress-protection functions in response to an environmental change, the internal cell mechanisms need to induce certain specific gene expression patterns and pathways by changing the expression levels of specific transcription factors (TFs). The conventional methods to find these specific TFs and their interactivities are slow and laborious. In this study, a novel efficient method is proposed to detect the TFs and their interactivities that regulate yeast genes that respond to any specific environment change. Results For each gene expressed in a specific environmental condition, a dynamic regulatory model is constructed in which the coefficients of the model represent the transcriptional activities and interactivities of the corresponding TFs. The proposed method requires only microarray data and information on all TFs that bind to the gene, but it has higher resolution than the current methods. Our method not only can find stress-specific TFs but also can predict their regulatory strengths and interactivities. Moreover, TFs can be ranked, so that we can identify the major TFs responding to a stress. Similarly, it can rank the interactions between TFs and identify the major cooperative TF pairs. In addition, the cross-talks and interactivities among different stress-induced pathways are specified by the proposed scheme to gain much insight into protective mechanisms of yeast under different environmental stresses. Conclusion In this study, we find significant stress-specific and cell cycle-controlled TFs via constructing a transcriptional dynamic model to regulate the expression profiles of genes under different environmental conditions through microarray data.
We have applied this TF activity and interactivity detection method to many stress conditions, including hyper- and hypo-osmotic shock, heat shock, hydrogen peroxide and cell cycle, because the available expression time profiles for these conditions are long enough. In particular, we find significant TFs and cooperative TFs responding to environmental changes. Our method may also be applicable to other stresses if the gene expression profiles have been examined for a sufficiently long time.
|
38
|
Langfelder P, Horvath S. Eigengene networks for studying the relationships between co-expression modules. BMC Systems Biology 2007. [PMID: 18031580 DOI: 10.1186/1752-0509-1-54] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]
Abstract
BACKGROUND There is evidence that genes and their protein products are organized into functional modules according to cellular processes and pathways. Gene co-expression networks have been used to describe the relationships between gene transcripts. Ample literature exists on how to detect biologically meaningful modules in networks but there is a need for methods that allow one to study the relationships between modules. RESULTS We show that network methods can also be used to describe the relationships between co-expression modules and present the following methodology. First, we describe several methods for detecting modules that are shared by two or more networks (referred to as consensus modules). We represent the gene expression profiles of each module by an eigengene. Second, we propose a method for constructing an eigengene network, where the edges are undirected but maintain information on the sign of the co-expression information. Third, we propose methods for differential eigengene network analysis that allow one to assess the preservation of network properties across different data sets. We illustrate the value of eigengene networks in studying the relationships between consensus modules in human and chimpanzee brains; the relationships between consensus modules in brain, muscle, liver, and adipose mouse tissues; and the relationships between male-female mouse consensus modules and clinical traits. In some applications, we find that module eigengenes can be organized into higher level clusters which we refer to as meta-modules. CONCLUSION Eigengene networks can be effective and biologically meaningful tools for studying the relationships between modules of a gene co-expression network. The proposed methods may reveal a higher order organization of the transcriptome. R software tutorials, the data, and supplementary material can be found at the following webpage: http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/EigengeneNetwork.
Affiliation(s)
- Peter Langfelder
- Department of Human Genetics and Department of Biostatistics, University of California, Los Angeles, CA 90095, USA.
|
39
|
Langfelder P, Horvath S. Eigengene networks for studying the relationships between co-expression modules. BMC Systems Biology 2007; 1:54. [PMID: 18031580 PMCID: PMC2267703 DOI: 10.1186/1752-0509-1-54] [Citation(s) in RCA: 588] [Impact Index Per Article: 34.6] [Received: 05/08/2007] [Accepted: 11/21/2007] [Indexed: 11/25/2022]
Abstract
Background There is evidence that genes and their protein products are organized into functional modules according to cellular processes and pathways. Gene co-expression networks have been used to describe the relationships between gene transcripts. Ample literature exists on how to detect biologically meaningful modules in networks but there is a need for methods that allow one to study the relationships between modules. Results We show that network methods can also be used to describe the relationships between co-expression modules and present the following methodology. First, we describe several methods for detecting modules that are shared by two or more networks (referred to as consensus modules). We represent the gene expression profiles of each module by an eigengene. Second, we propose a method for constructing an eigengene network, where the edges are undirected but maintain information on the sign of the co-expression information. Third, we propose methods for differential eigengene network analysis that allow one to assess the preservation of network properties across different data sets. We illustrate the value of eigengene networks in studying the relationships between consensus modules in human and chimpanzee brains; the relationships between consensus modules in brain, muscle, liver, and adipose mouse tissues; and the relationships between male-female mouse consensus modules and clinical traits. In some applications, we find that module eigengenes can be organized into higher level clusters which we refer to as meta-modules. Conclusion Eigengene networks can be effective and biologically meaningful tools for studying the relationships between modules of a gene co-expression network. The proposed methods may reveal a higher order organization of the transcriptome. R software tutorials, the data, and supplementary material can be found at the following webpage: .
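The idea of summarizing a module by an eigengene can be sketched as follows. This is a minimal illustration, assuming the eigengene is taken as the first principal component of the standardized expression profiles and found by power iteration; the function names and toy data are hypothetical, at least one gene is assumed to vary across samples, and the sketch is not the R software referred to in the abstract.

```python
from math import sqrt

def standardize(profile):
    """Center a gene's profile across samples and scale it to unit variance."""
    n = len(profile)
    mean = sum(profile) / n
    sd = sqrt(sum((v - mean) ** 2 for v in profile) / (n - 1))
    return [(v - mean) / sd if sd > 0 else 0.0 for v in profile]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def module_eigengene(expression, iterations=200):
    """expression: list of gene rows, each a list of per-sample values.
    Returns a unit-length vector over samples (first principal component)."""
    z = [standardize(row) for row in expression]
    n_samples = len(z[0])
    mean_profile = [sum(row[j] for row in z) / len(z) for j in range(n_samples)]
    # Start from the mean profile; fall back to the first gene if it cancels out.
    v = mean_profile if dot(mean_profile, mean_profile) > 0 else z[0]
    for _ in range(iterations):
        # Power iteration step: w = Z^T (Z v), without forming Z^T Z explicitly.
        scores = [dot(row, v) for row in z]
        w = [sum(s * row[j] for s, row in zip(scores, z)) for j in range(n_samples)]
        norm = sqrt(dot(w, w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    # Fix the arbitrary sign so the eigengene follows the average expression.
    if dot(v, mean_profile) < 0:
        v = [-x for x in v]
    return v

# Toy module (hypothetical): three co-expressed genes measured in five samples.
module = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [1.1, 1.9, 3.2, 3.9, 5.1],
]
eigengene = module_eigengene(module)
```

Correlating the eigengenes of different modules with one another then gives the signed edges of an eigengene network in the sense described above.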
Affiliation(s)
- Peter Langfelder
- Department of Human Genetics and Department of Biostatistics, University of California, Los Angeles, CA 90095, USA.
|
40
|
Chen BS, Wu WS. Underlying principles of natural selection in network evolution: systems biology approach. Evol Bioinform Online 2007; 3:245-62. [PMID: 19468310 PMCID: PMC2684126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022]
Abstract
Systems biology is a rapidly expanding field that integrates diverse areas of science such as physics, engineering, computer science, mathematics, and biology toward the goal of elucidating the underlying principles of hierarchical metabolic and regulatory systems in the cell, and ultimately leading to predictive understanding of cellular response to perturbations. Because post-genomics research is taking place throughout the tree of life, comparative approaches offer a way for combining data from many organisms to shed light on the evolution and function of biological networks from the gene to the organismal level. Therefore, systems biology can build on decades of theoretical work in evolutionary biology, and at the same time evolutionary biology can use the systems biology approach to go in new uncharted directions. In this study, we present a review of how the post-genomics era is adopting comparative approaches and dynamic system methods to understand the underlying design principles of network evolution and to shape the nascent field of evolutionary systems biology. Finally, the application of evolutionary systems biology to robust biological network designs is also discussed from the synthetic biology perspective.
Affiliation(s)
- Bor-Sen Chen
- Lab of Control and Systems Biology, National Tsing Hua University, Hsinchu 300, Taiwan.
|
41
|
Wang RS, Wang Y, Zhang XS, Chen L. Inferring transcriptional regulatory networks from high-throughput data. Bioinformatics 2007; 23:3056-64. [PMID: 17890736 DOI: 10.1093/bioinformatics/btm465] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.1] [Indexed: 11/14/2022]
Abstract
MOTIVATION Inferring the relationships between transcription factors (TFs) and their targets is of utmost importance for understanding the complex regulatory mechanisms in cellular systems. However, the transcription factor activities (TFAs) cannot be measured directly by standard microarray experiments owing to various post-translational modifications. In particular, cooperative mechanisms and combinatorial control are common in gene regulation, e.g. TFs usually recruit other proteins cooperatively to facilitate transcriptional reaction processes. RESULTS In this article, we propose a novel method for inferring transcriptional regulatory networks (TRN) from gene expression data based on protein transcription complexes and mass action law. With gene expression data and TFAs estimated from transcription complex information, the inference of TRN is formulated as a linear programming (LP) problem which has a globally optimal solution in terms of L1-norm error. The proposed method not only can easily incorporate ChIP-chip data as prior knowledge, but also can integrate multiple gene expression datasets from different experiments simultaneously. A unique feature of our method is that it takes into account protein cooperation in the transcription process. We tested our method by using both synthetic data and several experimental datasets in yeast. The extensive results illustrate the effectiveness of the proposed method for predicting transcription regulatory relationships between TFs with co-regulators and target genes.
Affiliation(s)
- Rui-Sheng Wang
- School of Information, Renmin University of China, Beijing 100872, China
|
42
|
Bayesian hierarchical model for transcriptional module discovery by jointly modeling gene expression and ChIP-chip data. BMC Bioinformatics 2007; 8:283. [PMID: 17683565 PMCID: PMC1994961 DOI: 10.1186/1471-2105-8-283] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Received: 01/14/2007] [Accepted: 08/03/2007] [Indexed: 02/03/2023]
Abstract
Background: Transcriptional modules (TMs) consist of groups of co-regulated genes and the transcription factors (TFs) regulating their expression. Two high-throughput (HT) experimental technologies, gene expression microarrays and chromatin immunoprecipitation on chip (ChIP-chip), are capable of producing data informative about the mechanisms regulating expression on a genome scale. The optimal approach to joint modeling of data generated by these two complementary biological assays, with the goal of identifying and characterizing TMs, is an important open problem in computational biomedicine. Results: We developed and validated a novel probabilistic model and related computational procedure for identifying TMs by jointly modeling gene expression and ChIP-chip binding data. We demonstrate improved functional coherence of the TMs produced by the new method when compared either to analyzing expression or ChIP-chip data separately or to alternative approaches for joint analysis. We also demonstrate the ability of the new algorithm to identify novel regulatory relationships not revealed by ChIP-chip data alone. The new computational procedure can be used in more or less the same way as simple hierarchical clustering, without performing any special transformation of the data prior to the analysis. The R and C source code implementing our algorithm is incorporated within the R package gimmR, which is freely available at http://eh3.uc.edu/gimm. Conclusion: Our results indicate that, whenever available, ChIP-chip and expression data should be analyzed within the unified probabilistic modeling framework, which will likely result in improved clusters of co-regulated genes and an improved ability to detect meaningful regulatory relationships. Given its good statistical properties and ease of use, the new computational procedure offers a worthy new tool for reconstructing transcriptional regulatory networks.
|
43
|
Wu WS, Li WH, Chen BS. Identifying regulatory targets of cell cycle transcription factors using gene expression and ChIP-chip data. BMC Bioinformatics 2007; 8:188. [PMID: 17559637 PMCID: PMC1906835 DOI: 10.1186/1471-2105-8-188] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Received: 10/21/2006] [Accepted: 06/08/2007] [Indexed: 11/27/2022] Open
Abstract
Background: ChIP-chip data, which indicate binding of transcription factors (TFs) to DNA regions in vivo, are widely used to reconstruct transcriptional regulatory networks. However, the binding of a TF to a gene does not necessarily imply regulation. Thus, it is important to develop methods to identify regulatory targets of TFs from ChIP-chip data. Results: We developed a method, called Temporal Relationship Identification Algorithm (TRIA), which uses gene expression data to identify a TF's regulatory targets among its binding targets inferred from ChIP-chip data. We applied TRIA to yeast cell cycle microarray data and identified many plausible regulatory targets of cell cycle TFs. We validated our predictions by checking for enrichment of functional annotations and known cell cycle genes. Moreover, we showed that TRIA performs better than two published methods (MA-Network and MFA). It is known that co-regulated genes may not be co-expressed. TRIA is able to identify subsets of highly co-expressed genes among the regulatory targets of a TF. Different functional roles are found for different subsets, indicating the diverse functions a TF could have. Finally, as a control, we showed that TRIA also performs well for TFs unrelated to the cell cycle. Conclusion: Finding the regulatory targets of TFs is important for understanding how cells change their transcription program to adapt to environmental stimuli. Our algorithm TRIA is helpful for achieving this purpose.
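The idea of filtering ChIP-chip binding targets by their temporal relationship to the TF can be sketched as follows. This is not the published TRIA procedure: the abstract does not define the temporal relationship, so the sketch assumes a time-lagged Pearson correlation with a fixed cutoff. The function names, the `threshold` and `max_lag` parameters, and the toy profiles are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def best_lagged_correlation(tf_profile, gene_profile, max_lag=2):
    """Return (r, lag) with the largest |r| when the gene trails the TF by `lag` time points."""
    best = (0.0, 0)
    for lag in range(max_lag + 1):
        n = len(tf_profile) - lag
        if n < 3:  # too few overlapping points to correlate
            break
        r = pearson(tf_profile[:n], gene_profile[lag:lag + n])
        if abs(r) > abs(best[0]):
            best = (r, lag)
    return best


def regulatory_targets(tf_profile, bound_genes, threshold=0.9, max_lag=2):
    """Keep only binding targets whose lagged correlation with the TF passes the cutoff.

    `bound_genes` maps gene name -> expression time course, and is assumed to
    contain only genes already called as bound in ChIP-chip data.
    """
    targets = {}
    for gene, profile in bound_genes.items():
        r, lag = best_lagged_correlation(tf_profile, profile, max_lag)
        if abs(r) >= threshold:
            targets[gene] = (r, lag)
    return targets


# Toy time courses (invented for illustration, not real data).
tf = [0, 1, 2, 1, 0, 1, 2, 1]
bound = {
    "geneA": [5, 0, 1, 2, 1, 0, 1, 2],  # repeats the TF profile one time point later
    "geneB": [1, 1, 2, 1, 1, 2, 1, 1],  # bound, but only weakly related to the TF
}
targets = regulatory_targets(tf, bound)
```

In this toy case only `geneA` is retained, at a lag of one time point, which mirrors the abstract's point that binding alone does not imply regulation. A real analysis would replace the fixed cutoff with a significance test and a multiple-testing correction.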
Affiliation(s)
- Wei-Sheng Wu
- Lab of Control and Systems Biology, Department of Electrical Engineering, National Tsing Hua University, Hsinchu, 300, Taiwan
- Wen-Hsiung Li
- Department of Evolution and Ecology, University of Chicago, 1101 East 57th Street, Chicago, IL, 60637, USA
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Bor-Sen Chen
- Lab of Control and Systems Biology, Department of Electrical Engineering, National Tsing Hua University, Hsinchu, 300, Taiwan
|