51
Schmeier S, MacPherson CR, Essack M, Kaur M, Schaefer U, Suzuki H, Hayashizaki Y, Bajic VB. Deciphering the transcriptional circuitry of microRNA genes expressed during human monocytic differentiation. BMC Genomics 2009; 10:595. [PMID: 20003307 PMCID: PMC2797535 DOI: 10.1186/1471-2164-10-595] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.0] [Received: 12/19/2008] [Accepted: 12/10/2009] [Indexed: 12/19/2022]
Abstract
Background Macrophages are immune cells involved in various biological processes, including host defence, homeostasis, differentiation, and organogenesis. Disruption of macrophage biology has been linked to increased pathogen infection, inflammation and malignant diseases. Differential gene expression observed in monocytic differentiation is primarily regulated by interacting transcription factors (TFs). Current research suggests that microRNAs (miRNAs) not only degrade mRNA and repress its translation but may also target genes involved in differentiation. We aim to gain insight into the transcriptional circuitry regulating miRNA genes expressed during monocytic differentiation. Results We computationally analysed the transcriptional circuitry of miRNA genes during monocytic differentiation using in vitro time-course expression data for TFs and miRNAs. A set of TF→miRNA associations was derived from predicted TF binding sites in the promoter regions of miRNA genes. Time-lagged expression correlation analysis was used to evaluate the TF→miRNA associations. Our analysis identified 12 TFs that potentially play a central role in regulating miRNAs throughout the differentiation process. Six of these 12 TFs (ATF2, E2F3, HOXA4, NFE2L1, SP3, and YY1) have not previously been described as important for monocytic differentiation. The remaining six TFs are CEBPB, CREB1, ELK1, NFE2L2, RUNX1, and USF2. For several miRNAs (miR-21, miR-155, miR-424, and miR-17-92), we show how their inferred transcriptional regulation impacts monocytic differentiation. Conclusions The study demonstrates that miRNAs and their transcriptional regulatory control are integral molecular mechanisms during differentiation. Furthermore, it is the first large-scale study of how miRNAs are controlled by TFs during human monocytic differentiation. Through this analysis, we have identified 12 candidate key controllers of miRNAs during this differentiation process.
Affiliation(s)
- Sebastian Schmeier
- South African National Bioinformatics Institute, University of the Western Cape, Modderdam Road, Bellville, South Africa.
52
Kiddle SJ, Windram OPF, McHattie S, Mead A, Beynon J, Buchanan-Wollaston V, Denby KJ, Mukherjee S. Temporal clustering by affinity propagation reveals transcriptional modules in Arabidopsis thaliana. Bioinformatics 2009; 26:355-62. [DOI: 10.1093/bioinformatics/btp673] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.9] [Indexed: 01/30/2023]
53
Wang G, Yin L, Zhao Y, Mao K. Efficiently mining time-delayed gene expression patterns. IEEE Trans Syst Man Cybern B Cybern 2009; 40:400-11. [PMID: 19884096 DOI: 10.1109/tsmcb.2009.2025564] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/08/2022]
Abstract
Unlike pattern-based biclustering methods, which group objects in the same subset of dimensions, this paper proposes a novel model of coherent clustering for time-series gene expression data: the time-delayed cluster (td-cluster). Under this model, objects can be coherent in different subsets of dimensions if they follow a certain time-delayed relationship. Such a cluster can reveal the cycle time of gene expression, which is essential for uncovering gene regulatory networks. This paper is the first attempt to mine time-delayed gene expression patterns from microarray data. A novel algorithm is also presented and implemented to mine all significant td-clusters. Our experiments show that 1) the td-cluster algorithm detects a significant number of clusters missed by previous models, and these clusters are potentially of high biological significance, and 2) the td-cluster model and algorithm can easily be extended to 3-D gene × sample × time data sets to identify 3-D td-clusters.
Affiliation(s)
- Guoren Wang
- School of Information Science and Engineering, and Key Laboratory of Medical Image Computing, Northeastern University, Shenyang 110004, China.
54
Nguyen TT, Nowakowski RS, Androulakis IP. Unsupervised selection of highly coexpressed and noncoexpressed genes using a consensus clustering approach. OMICS 2009; 13:219-37. [PMID: 19445647 DOI: 10.1089/omi.2008.0074] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Indexed: 11/12/2022]
Abstract
In this paper we explore consensus clustering as a way to identify, within a set of differentially expressed genes, a subset of genes that are either highly coexpressed or highly noncoexpressed, on the hypothesis that this subset would serve as a better starting point for further analyses. A number of core clustering methods form the basis for constructing an agreement matrix (AM) that characterizes the level of coexpression between any two probesets. To overcome the limitations of using a single distance metric, we explore different metrics and examine the sensitivity of the AM as a function of the input number of clusters, in order to find a suggestive number of clusters that best describes a particular dataset. This level of analysis yields a systematic framework for eliminating probesets that cannot be clearly characterized as either coexpressed or noncoexpressed with others, thus removing them from further analysis. Subsequently, an agglomerative hierarchical clustering approach is applied to the selected subset, using the agreement metric as the similarity measure. The goal of the proposed methodology is therefore twofold: (1) to identify a more "clusterable" subset of the original set; and (2) to further refine this subset to identify a core of genes that are either coexpressed or noncoexpressed within a certain confidence level. The approach is tested on a number of data sets, both synthetic and real, and is shown to identify subsets of genes and expression profiles that are more clusterable and, we hypothesize, more biologically relevant.
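The agreement matrix can be sketched as the fraction of clustering runs in which two probesets land in the same cluster. The sketch below assumes each run is given as a list of cluster labels over the same probesets; the `low`/`high` cut-offs and the all-pairs filtering rule in `clearly_characterized` are hypothetical illustrations, deliberately simpler than the authors' procedure.

```python
from itertools import combinations


def agreement_matrix(labelings):
    """AM[i][j] = fraction of clusterings that assign items i and j to the same cluster.

    labelings: list of cluster-label lists, one per clustering run, all the same length.
    """
    n = len(labelings[0])
    am = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        together = sum(1 for labels in labelings if labels[i] == labels[j])
        am[i][j] = am[j][i] = together / len(labelings)
    return am


def clearly_characterized(am, low=0.2, high=0.8):
    """Hypothetical filter: keep item i only if every agreement with another item is
    either >= high (coexpressed) or <= low (noncoexpressed)."""
    n = len(am)
    return [
        i for i in range(n)
        if all(am[i][j] >= high or am[i][j] <= low for j in range(n) if j != i)
    ]
```

With three runs `[[0, 0, 1, 1, 2], [0, 0, 1, 1, 1], [1, 1, 0, 0, 0]]`, items 0 and 1 always co-cluster and never join the others, so only they pass the filter; item 4 agrees with items 2 and 3 in two of three runs (2/3), which makes all three ambiguous under this strict rule.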
Affiliation(s)
- Tung T Nguyen
- BioMaPS Institute for Quantitative Biology, Rutgers University, Piscataway, New Jersey 08854, USA
55
Telesca D, Inoue LYT, Neira M, Etzioni R, Gleave M, Nelson C. Differential expression and network inferences through functional data modeling. Biometrics 2009; 65:793-804. [PMID: 19053995 PMCID: PMC2956129 DOI: 10.1111/j.1541-0420.2008.01159.x] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/28/2022]
Abstract
Time course microarray data consist of mRNA expression from a common set of genes collected at different time points. Such data are thought to reflect underlying biological processes developing over time. In this article, we propose a model that allows us to examine differential expression and gene network relationships using time course microarray data. We model each gene-expression profile as a random functional transformation of the scale, amplitude, and phase of a common curve. Inferences about the gene-specific amplitude parameters allow us to examine differential gene expression. Inferences about measures of functional similarity based on estimated time-transformation functions allow us to examine gene networks while accounting for features of the gene-expression profiles. We discuss applications to simulated data as well as to microarray data on prostate cancer progression.
Affiliation(s)
- Donatello Telesca
- University of Texas, M.D. Anderson Cancer Center, Department of Biostatistics, Houston, Texas 77230, USA
56
He F, Balling R, Zeng AP. Reverse engineering and verification of gene networks: principles, assumptions, and limitations of present methods and future perspectives. J Biotechnol 2009; 144:190-203. [PMID: 19631244 DOI: 10.1016/j.jbiotec.2009.07.013] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.6] [Received: 03/30/2009] [Revised: 07/13/2009] [Accepted: 07/16/2009] [Indexed: 12/21/2022]
Abstract
Reverse engineering of gene networks aims to reveal the structure of the gene regulatory network in a biological system by reasoning backward directly from experimental data. Many methods have recently been proposed for reverse engineering gene networks from gene transcript expression data measured by microarray. Whereas the potential of these methods has been well demonstrated, the assumptions and limitations behind them are often not clearly stated or not well understood. In this review, we first briefly explain the principles of the major methods, identify the assumptions behind them, and pinpoint the limitations and possible pitfalls in applying them to real biological questions. With regard to applications, we then discuss challenges in the experimental verification of gene networks generated by reverse engineering methods. We further propose an optimal experimental design for allocating the sampling schedule, and possible strategies for reducing the limitations of some current reverse engineering methods. Finally, we examine the perspectives for the development of reverse engineering and stress the need to move from revealing network structure to understanding the dynamics of biological systems.
Affiliation(s)
- Feng He
- Helmholtz Centre for Infection Research, D-38124 Braunschweig, Germany
57
Ruan J, Deng Y, Perkins EJ, Zhang W. An ensemble learning approach to reverse-engineering transcriptional regulatory networks from time-series gene expression data. BMC Genomics 2009; 10 Suppl 1:S8. [PMID: 19594885 PMCID: PMC2709269 DOI: 10.1186/1471-2164-10-s1-s8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
BACKGROUND One of the most challenging tasks in the post-genomic era is to reconstruct transcriptional regulatory networks. The goal is to reveal, for each gene that responds to a certain biological event, which transcription factors affect its expression, and how a set of transcription factors coordinate to accomplish temporally and spatially specific regulation. RESULTS Here we propose a supervised machine learning approach to address these questions. We focus on the transcriptional regulation of the cell cycle in budding yeast, owing to the large amount of available data and the relatively well-understood biology, although the main ideas of our method can be applied to other data as well. Our method starts by building an ensemble of decision trees for each microarray dataset to capture the association between the expression levels of yeast genes and the binding of transcription factors to gene promoter regions, as determined by chromatin immunoprecipitation microarray (ChIP-chip) experiments. Cross-validation experiments show that the method is more accurate and reliable than the naive decision tree algorithm and several other ensemble learning methods. From the decision tree ensembles, we extract logical rules that explain how a set of transcription factors act in concert to regulate the expression of their targets. We further compute a profile for each rule to show its regulation strength at different time points. We also propose a spline interpolation method to integrate the rule profiles learned from several time-series expression data sets that measure the same biological process. We then combine these rule profiles to build a transcriptional regulatory network for the yeast cell cycle. Compared with the results in the literature, our method correctly identifies all major known yeast cell cycle transcription factors and assigns them to appropriate cell cycle phases.
Our method also identifies many interesting synergistic relationships among these transcription factors; most of them are well known, and many of the others are supported by additional evidence. CONCLUSION The high accuracy of our method indicates that it is valid and robust. As more gene expression and transcription factor binding data become available, we believe that our method will be useful for reconstructing large-scale transcriptional regulatory networks in other species as well.
Affiliation(s)
- Jianhua Ruan
- Department of Computer Science, The University of Texas at San Antonio, San Antonio, TX 78249, USA
- Youping Deng
- SpecPro Inc., Vicksburg, MS 39180, USA
- Department of Biological Sciences, University of Southern Mississippi, Hattiesburg, MS 39406, USA
- Edward J Perkins
- Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS 39180, USA
- Weixiong Zhang
- Department of Computer Science and Engineering, Washington University in St Louis, St Louis, MO 63130, USA
- Department of Genetics, Washington University School of Medicine, St Louis, MO 63110, USA
58
Picciani RG, Diaz A, Lee RK, Bhattacharya SK. Potential for transcriptional upregulation of cochlin in glaucomatous trabecular meshwork: a combinatorial bioinformatic and biochemical analytical approach. Invest Ophthalmol Vis Sci 2009; 50:3106-11. [PMID: 19098315 PMCID: PMC2720616 DOI: 10.1167/iovs.08-3106] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 01/06/2023]
Abstract
PURPOSE To determine whether potential TFs that may bind putative promoter regions and affect cochlin protein expression are relatively more abundant in glaucomatous than in normal trabecular meshwork (TM). METHODS Combinatorial bioinformatic and biochemical analyses, using human glaucomatous and normal donor tissue (n = 4 each). Biochemical analysis included electrophoretic mobility shift assays (EMSAs), filter binding assays (FBAs), coupled in vitro transcription-translation (TNT) assays and promoter mutation analysis. RESULTS Combinatorial bioinformatic and biochemical analyses revealed a higher abundance of TFs in glaucomatous than in normal TM nuclear extracts. The relatively high abundance of TFs, leading to increased cochlin expression, as predicted by bioinformatic and biochemical analyses (EMSA and FBA), was further supported by TNT and promoter mutation TNT assays. CONCLUSIONS These results support the conclusion that the observed increase in cochlin expression in glaucomatous TM is due to a relatively elevated abundance of TFs. The results also demonstrate the utility of combinatorial bioinformatic and biochemical analyses for genes with uncharacterized promoter regions.
Affiliation(s)
- Renata G Picciani
- Bascom Palmer Eye Institute, University of Miami Miller School of Medicine, Miami, Florida 33136, USA
59
Chechik G, Koller D. Timing of gene expression responses to environmental changes. J Comput Biol 2009; 16:279-90. [PMID: 19193146 DOI: 10.1089/cmb.2008.13tt] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.8] [Indexed: 01/22/2023]
Abstract
Cells respond to environmental perturbations with changes in gene expression that are coordinated in magnitude and time. Timing information about individual genes, rather than clusters, provides a refined way to view and analyze responses, but it is hard to estimate accurately. To analyze the response timing of individual genes, we developed a parametric model that captures the typical temporal response: an abrupt early response followed by a second transition to a steady state. This impulse model explicitly represents natural temporal properties such as the onset and offset times, and it can be estimated robustly, as demonstrated by its superior ability to impute missing values in gene expression data. Using the response times of individual genes, we identify relations between gene function and response timing, showing, for example, that cytosolic ribosomal genes are repressed only after the mitochondrial ribosome is activated. We further demonstrate a strong relation between the binding affinity of a transcription factor and the activation timing of its targets, suggesting that graded binding affinities could be a widely used mechanism for controlling expression timing. See the online Supplementary Material at www.liebertonline.com.
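The impulse model can be written down concretely as a product of two sigmoids. The sketch below uses the two-sigmoid parameterization commonly associated with Chechik and colleagues' impulse model (initial level h0, peak level h1, steady-state level h2, onset time t1, offset time t2 and a shared slope beta); treat it as our reading rather than a verified transcription of the paper. Fitting the parameters to data, for example by least squares, is not shown.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def impulse(t, h0, h1, h2, t1, t2, beta):
    """Two-sigmoid impulse response evaluated at time t.

    h0: initial level, h1: peak (or trough) level, h2: steady-state level,
    t1: onset time, t2: offset time, beta: slope of both transitions.
    h1 must be non-zero because the product is divided by it.
    """
    # First transition: h0 -> h1 around the onset time t1.
    rise = h0 + (h1 - h0) * sigmoid(beta * (t - t1))
    # Second transition: h1 -> h2 around the offset time t2.
    settle = h2 + (h1 - h2) * sigmoid(-beta * (t - t2))
    return rise * settle / h1
```

Long before t1 the curve sits near h0, between t1 and t2 (when they are well separated) it sits near h1, and long after t2 it approaches h2; for instance, `impulse(t, 1, 4, 2, 0, 10, 2)` is about 1 at t = -100, about 4 at t = 5 and about 2 at t = 100. This is what lets onset and offset times be read directly from fitted parameters.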
Affiliation(s)
- Gal Chechik
- Computer Science Department, Stanford University, Stanford, CA, USA.
60
Feng J, Yi D, Krishna R, Guo S, Buchanan-Wollaston V. Listen to genes: dealing with microarray data in the frequency domain. PLoS One 2009; 4:e5098. [PMID: 22745650 PMCID: PMC3383793 DOI: 10.1371/journal.pone.0005098] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 11/24/2008] [Accepted: 03/05/2009] [Indexed: 11/18/2022]
Abstract
BACKGROUND We present a novel and systematic approach to analyzing temporal microarray data. The approach includes normalization, clustering and network analysis of genes. METHODOLOGY Genes are normalized using an error-model-based uniform normalization method aimed at identifying and estimating the sources of variation. The model minimizes the correlation among error terms across replicates. The normalized gene expression profiles are then clustered by their power spectral density. The method of complex Granger causality is introduced to reveal interactions between sets of genes. Complex Granger causality, together with partial Granger causality, is applied in both the time and frequency domains, to selected genes as well as to all genes, to reveal networks of interactions. The approach is successfully applied to Arabidopsis leaf microarray data for 31,000 genes observed at 22 time points over 22 days. Three circuits are analyzed in detail: a circadian gene circuit, an ethylene circuit, and a new global circuit whose hierarchical structure helps determine the initiators of leaf senescence. CONCLUSIONS We use a fully data-driven approach to form biological hypotheses. Clustering by power-spectrum analysis helps us identify genes of potential interest. Their dynamics can be captured accurately in the time and frequency domains using complex and partial Granger causality. With the growing availability of temporal microarray data, such methods can be useful tools for uncovering hidden biological interactions. We present our method step by step with the help of toy models as well as a real biological dataset. We also analyse three distinct gene circuits of potential interest to Arabidopsis researchers.
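Clustering by power spectral density starts from a per-gene spectrum. The sketch below computes a plain periodogram with a naive discrete Fourier transform and, as a deliberately crude stand-in for the paper's clustering, groups genes by their dominant frequency. The error-model normalization and the complex and partial Granger causality steps are not reproduced, and all function names are hypothetical.

```python
import cmath
import math


def periodogram(x):
    """Periodogram |X_k|^2 / n for k = 0..n//2 of an evenly sampled, mean-centred series."""
    n = len(x)
    mean = sum(x) / n
    centred = [v - mean for v in x]
    power = []
    for k in range(n // 2 + 1):
        # Naive O(n^2) discrete Fourier transform coefficient for frequency bin k.
        coeff = sum(centred[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(coeff) ** 2 / n)
    return power


def dominant_frequency(x):
    """Frequency bin (cycles per series length) with the most power, excluding k = 0.

    Requires at least two time points.
    """
    power = periodogram(x)
    return max(range(1, len(power)), key=lambda k: power[k])


def group_by_dominant_frequency(profiles):
    """Group gene names by dominant frequency; profiles maps name -> time series."""
    groups = {}
    for name, series in profiles.items():
        groups.setdefault(dominant_frequency(series), []).append(name)
    return groups
```

For a 22-point series (matching the 22 time points above), a sine completing three cycles and a scaled copy of it fall into the same group at bin 3, while a cosine completing five cycles forms its own group at bin 5. A real analysis would compare whole spectra (for example with a distance between normalized periodograms) rather than a single peak.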
Affiliation(s)
- Jianfeng Feng
- Centre for Computational System Biology, Shanghai, Fudan University, Shanghai, People's Republic of China.
61
Scheinine A, Mentzen WI, Fotia G, Pieroni E, Maggio F, Mancosu G, De La Fuente A. Inferring Gene Networks: Dream or Nightmare? Ann N Y Acad Sci 2009; 1158:287-301. [DOI: 10.1111/j.1749-6632.2008.04100.x] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/29/2022]
62
Nam H, Lee K, Lee D. Identification of temporal association rules from time-series microarray data sets. BMC Bioinformatics 2009; 10 Suppl 3:S6. [PMID: 19344482 PMCID: PMC2665054 DOI: 10.1186/1471-2105-10-s3-s6] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 12/28/2022]
Abstract
Background One of the most challenging problems in mining gene expression data is to identify how the expression of a particular gene affects the expression of other genes. To elucidate the relationships between genes, association rule mining (ARM) has been applied to microarray gene expression data. However, conventional ARM methods are limited in extracting temporal dependencies between gene expressions, even though temporal information is indispensable for discovering the underlying regulatory mechanisms in biological pathways. In this paper, we propose a novel method, temporal association rule mining (TARM), which can extract temporal dependencies among related genes. A temporal association rule has the form [gene A↑, gene B↓] → (7 min) [gene C↑], which states that a high expression level of gene A together with significant repression of gene B is followed by significant expression of gene C after 7 minutes. The proposed TARM method is tested on a Saccharomyces cerevisiae cell cycle time-series microarray gene expression data set. Results In the parameter-fitting phase of TARM, the parameter set [threshold = ±0.8, support ≥ 3 transactions, confidence ≥ 90%], which gave the best precision score against the KEGG cell cycle pathway, was chosen for the rule-mining phase. With this parameter set, a number of temporal association rules with five transcriptional time delays (0, 7, 14, 21, 28 minutes) are extracted from the expression data of 799 genes previously identified as relevant to the cell cycle. From the extracted rules, we identify associated genes that play the same role in biological processes within a short transcriptional time delay, as well as temporal dependencies between genes involved in specific biological processes. Conclusion In this work, we proposed TARM, an adaptation of conventional ARM. TARM achieved a higher precision score than dynamic Bayesian networks and Bayesian networks.
The advantages of TARM are that it reveals the size of the transcriptional time delay between associated genes, the activation and inhibition relationships between genes, and sets of co-regulators.
Affiliation(s)
- Hojung Nam
- Department of Bio and Brain Engineering, KAIST, 373-1 Guseong-dong, Yuseong-gu, Daejeon, Korea.
63
Liu Y, Jiang B, Zhang X. Gene-set analysis identifies master transcription factors in developmental courses. Genomics 2009; 94:1-10. [PMID: 19272436 DOI: 10.1016/j.ygeno.2009.02.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 11/07/2008] [Revised: 02/11/2009] [Accepted: 02/26/2009] [Indexed: 11/26/2022]
Abstract
Transcriptional regulation plays key roles in many biological processes, and this regulation is dynamic in time and space. Identifying the transcription factors that play major roles in a developmental time course is therefore important for understanding the regulation, but this cannot be achieved by studying the relations between the expression of individual genes. We developed a gene-set analysis approach to study master regulators and their actively regulated targets during a time course from gene expression data. We applied the method to mouse liver development data and mouse embryonic stem cell (mESC) development data, and identified 14 and 9 transcription factors, respectively, that play major regulatory roles in the two developmental courses. Some of these transcription factors could not be identified as active in the process by studying their correlation with individual targets. The method was also extended to study other regulatory factors or pathways from time-course expression data.
Affiliation(s)
- Ying Liu
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Department of Automation, Tsinghua University, Beijing 100084, China
64
Hu M, Qin ZS. Query large scale microarray compendium datasets using a model-based Bayesian approach with variable selection. PLoS One 2009; 4:e4495. [PMID: 19214232 PMCID: PMC2637418 DOI: 10.1371/journal.pone.0004495] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 05/01/2008] [Accepted: 12/06/2008] [Indexed: 11/19/2022]
Abstract
In microarray gene expression data analysis, it is often of interest to identify genes that share similar expression profiles with a particular gene, such as a key regulatory protein. Multiple studies have used various correlation measures to identify co-expressed genes. While these approaches work well for small datasets, the heterogeneity introduced by increased sample size inevitably reduces their sensitivity and specificity, because most co-expression relationships do not extend to all experimental conditions. With the rapid increase in the size of microarray datasets, identifying functionally related genes from large and diverse microarray gene expression datasets is a key challenge. We develop a model-based gene expression query algorithm built under the Bayesian model selection framework. It can detect co-expression profiles under a subset of samples/experimental conditions. In addition, it allows linearly transformed expression patterns to be recognized and is robust against sporadic outliers in the data. Both features are critically important for increasing the power to identify co-expressed genes in large-scale gene expression datasets. Our simulation studies suggest that this method outperforms existing query tools based on correlation coefficients or mutual information. When we apply this new method to the Escherichia coli microarray compendium data, it identifies a majority of known regulons as well as novel potential target genes of numerous key transcription factors.
Affiliation(s)
- Ming Hu
- Center for Statistical Genetics, Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan, United States of America
- Zhaohui S. Qin
- Center for Statistical Genetics, Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan, United States of America
65
Comparing algorithms for clustering of expression data: how to assess gene clusters. Methods Mol Biol 2009; 541:479-509. [PMID: 19381534 DOI: 10.1007/978-1-59745-243-4_21] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 01/06/2023]
Abstract
Clustering is a popular technique commonly used to search for groups of similarly expressed genes in mRNA expression data. There are many different clustering algorithms, and each one will usually produce different results. Without additional evaluation, it is difficult to determine which solutions are better. In this chapter we discuss methods to assess algorithms for clustering of gene expression data. In particular, we present a new method that uses two elements: an internal index of validity based on the MDL principle and an external index of validity that measures consistency with experimental data. Each one is used to suggest an effective set of models, but only the combination of both can pinpoint the best model overall. Our method can be used to compare different clustering algorithms and pick the one that maximizes the correlation with functional links in gene networks while minimizing the error rate. We test our methods on several popular clustering algorithms as well as on clustering algorithms specially tailored to deal with noisy data. Finally, we propose methods for assessing the significance of individual clusters and study the correspondence between gene clusters and biochemical pathways.
66
Wu WS, Li WH. Systematic identification of yeast cell cycle transcription factors using multiple data sources. BMC Bioinformatics 2008; 9:522. [PMID: 19061501 PMCID: PMC2613934 DOI: 10.1186/1471-2105-9-522] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.2] [Received: 08/04/2008] [Accepted: 12/05/2008] [Indexed: 12/16/2022]
Abstract
Background The eukaryotic cell cycle is a complex process that is precisely regulated at many levels. Many cell cycle-specific genes are regulated transcriptionally and are expressed just before they are needed. To understand the cell cycle process, it is important to identify the cell cycle transcription factors (TFs) that regulate the expression of cell cycle-regulated genes. Results We developed a method to identify cell cycle TFs in yeast by integrating current ChIP-chip, mutant, transcription factor binding site (TFBS), and cell cycle gene expression data. We identified 17 cell cycle TFs, 12 of which are known cell cycle TFs, while the remaining five (Ash1, Rlm1, Ste12, Stp1, Tec1) are putative novel cell cycle TFs. For each cell cycle TF, we assigned the specific cell cycle phases in which the TF functions and identified the time lag for the TF to exert regulatory effects on its target genes. We also identified 178 novel cell cycle-regulated genes, 59 of which have unknown functions but may now be annotated as cell cycle-regulated. Most of our predictions are supported by previous experimental or computational studies. Furthermore, a high-confidence TF-gene regulatory matrix is derived as a byproduct of our method. Each TF-gene regulatory relationship in this matrix is supported by at least three data sources: gene expression, TFBS, and ChIP-chip and/or mutant data. We show that our method performs better than four existing methods for identifying yeast cell cycle TFs. Finally, applying our method to different cell cycle gene expression datasets suggests that it is robust. Conclusion Our method is effective for identifying yeast cell cycle TFs and cell cycle-regulated genes. Many of our predictions are validated by the literature. Our study shows that integrating multiple data sources is a powerful approach to studying complex biological systems.
Affiliation(s)
- Wei-Sheng Wu
- Department of Evolution and Ecology, University of Chicago, Chicago, IL 60637, USA.
67
Farina L, De Santis A, Salvucci S, Morelli G, Ruberti I. Embedding mRNA stability in correlation analysis of time-series gene expression data. PLoS Comput Biol 2008; 4:e1000141. [PMID: 18670596 PMCID: PMC2453326 DOI: 10.1371/journal.pcbi.1000141] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 03/14/2008] [Accepted: 06/24/2008] [Indexed: 12/23/2022]
Abstract
Current methods for identifying putatively co-regulated genes directly from gene expression time profiles are based on the similarity of the time profiles. Such association metrics, despite their central role in gene network inference and machine learning, have largely ignored the impact of dynamics or variation in mRNA stability. Here we introduce a simple but powerful new similarity metric, called lead-lag R², that successfully accounts for the properties of gene dynamics, including varying mRNA degradation and delays. Using yeast cell-cycle time-series gene expression data, we demonstrate that the predictive power of lead-lag R² for identifying co-regulated genes is significantly higher than that of standard similarity measures, thus allowing the selection of a large number of entirely new putatively co-regulated genes. Furthermore, the lead-lag metric can also be used to uncover the relationship between gene expression time series and the dynamics of formation of multiple protein complexes. Remarkably, we found a high lead-lag R² value among genes coding for a transient complex. Microarrays provide snapshots of the transcriptional state of the cell at some point in time. Multiple snapshots can be taken sequentially, thus providing insight into the dynamics of change. Since genome-wide expression data report on the abundance of mRNA, not on the underlying activity of genes, we developed a novel method to relate the expression patterns of genes detected in a time-series experiment, using a similarity measure that incorporates mRNA decay, called lead-lag R². We used the lead-lag R² similarity measure to predict the presence of common transcription factors between gene pairs using an integrated dataset consisting of 13 yeast cell cycles. The method was benchmarked against six well-established similarity measures and obtained the best true positive rate, around 95%.
We believe that lead-lag analysis can also be used successfully to predict the presence of a common mechanism able to modulate the degradation rate of specific transcripts. Finally, we envisage extending our analysis to different experimental conditions and organisms, thus providing a simple off-the-shelf computational tool to support the understanding of the transcriptional and post-transcriptional regulation layer and its role in many diseases, such as cancer.
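The idea of embedding mRNA stability in a similarity score can be illustrated with a first-order production/degradation model, dx/dt = a·y(t) − k·x(t), whose integrated form x(t) = x(t₀) + a∫y − k∫x is linear in the cumulative integrals of the two profiles. The sketch below fits that form by least squares and reports its R². It illustrates the principle under stated assumptions and is not necessarily the published definition of lead-lag R²; the function names and the toy simulation are hypothetical, and the integrals use the trapezoidal rule on the sampled time points.

```python
import math
from typing import List, Sequence, Tuple


def cumtrapz(values: Sequence[float], times: Sequence[float]) -> List[float]:
    """Cumulative trapezoidal integral, starting at 0 at the first time point."""
    out = [0.0]
    for i in range(1, len(values)):
        dt = times[i] - times[i - 1]
        out.append(out[-1] + 0.5 * dt * (values[i] + values[i - 1]))
    return out


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def decay_aware_r2(x: Sequence[float], y: Sequence[float], t: Sequence[float]) -> float:
    """R^2 of x(t) ~ c + a*int(y) + b*int(x); b plays the role of -k (degradation)."""
    iy, ix = cumtrapz(y, t), cumtrapz(x, t)
    rows = [[1.0, iy[i], ix[i]] for i in range(len(t))]
    # Normal equations (R^T R) beta = R^T x for the three coefficients c, a, b.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * xi for r, xi in zip(rows, x)) for i in range(3)]
    beta = solve(ata, atb)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    mean_x = sum(x) / len(x)
    ss_res = sum((xi - fi) ** 2 for xi, fi in zip(x, fitted))
    ss_tot = sum((xi - mean_x) ** 2 for xi in x)
    return 1.0 - ss_res / ss_tot


def simulate_toy(t: Sequence[float], a: float = 2.0, k: float = 0.5,
                 substeps: int = 100) -> Tuple[List[float], List[float]]:
    """Euler-integrate dx/dt = a*y - k*x with y(t) = sin(t) + 1.5 (hypothetical toy data)."""
    y = [math.sin(ti) + 1.5 for ti in t]
    x, cur = [0.0], 0.0
    for i in range(1, len(t)):
        h = (t[i] - t[i - 1]) / substeps
        for s in range(substeps):
            tt = t[i - 1] + h * s
            cur += h * (a * (math.sin(tt) + 1.5) - k * cur)
        x.append(cur)
    return x, y


if __name__ == "__main__":
    t = [0.1 * i for i in range(101)]
    x, y = simulate_toy(t)
    print(round(decay_aware_r2(x, y, t), 4))
```

Because x lags and smooths y in this toy model, a plain same-time correlation between the two profiles understates their relationship, whereas the decay-aware fit recovers it almost perfectly.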
Affiliation(s)
- Lorenzo Farina
- Dipartimento di Informatica e Sistemistica Antonio Ruberti, Sapienza Università di Roma, Rome, Italy.
68
He F, Buer J, Zeng AP, Balling R. Dynamic cumulative activity of transcription factors as a mechanism of quantitative gene regulation. Genome Biol 2007; 8:R181. [PMID: 17784952 PMCID: PMC2375019 DOI: 10.1186/gb-2007-8-9-r181] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 04/24/2007] [Revised: 08/22/2007] [Accepted: 09/04/2007] [Indexed: 12/23/2022]
Abstract
BACKGROUND The regulation of genes in multicellular organisms is generally achieved through the combinatorial activity of different transcription factors. However, the quantitative mechanisms of how a combination of transcription factors controls the expression of their target genes remain unknown. RESULTS By using the information on the yeast transcription network and high-resolution time-series data, the combinatorial expression profiles of regulators that best correlate with the expression of their target genes are identified. We demonstrate that a number of factors, particularly time-shifts among the different regulators as well as conversion efficiencies of transcription factor mRNAs into functional binding regulators, play a key role in the quantification of target gene expression. By quantifying and integrating these factors, we have found a highly significant correlation between the combinatorial time-series expression profile of regulators and their target gene expression in 67.1% of the 161 known yeast three-regulator motifs and in 32.9% of 544 two-regulator motifs. For network motifs involved in the cell cycle, these percentages are much higher. Furthermore, the results have been verified with a high consistency in a second independent set of time-series data. Additional support comes from the finding that a high percentage of motifs again show a significant correlation in time-series data from stress-response studies. CONCLUSION Our data strongly support the concept that dynamic cumulative regulation is a major principle of quantitative transcriptional control. The proposed concept might also apply to other organisms and could be relevant for a wide range of biotechnological applications in which quantitative gene regulation plays a role.
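The combination of time-shifted regulator profiles, weighted by how efficiently each regulator's mRNA is converted into active protein, can be sketched as a weighted sum of lagged regulator profiles whose correlation with the target is maximized over a grid of shifts. This is a simplified illustration under stated assumptions, not the authors' model: shifts are integer numbers of sampling intervals, the efficiency weights are supplied rather than estimated, and all function names and toy profiles are hypothetical.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def combined_activity(regulators: Sequence[Sequence[float]], weights: Sequence[float],
                      shifts: Sequence[int], start: int, length: int) -> List[float]:
    """Weighted sum of regulator profiles, each read `shift` samples in the past."""
    return [sum(w * r[t - s] for r, w, s in zip(regulators, weights, shifts))
            for t in range(start, start + length)]


def best_shifts(target: Sequence[float], regulators: Sequence[Sequence[float]],
                weights: Sequence[float], max_shift: int) -> Tuple[float, Tuple[int, ...]]:
    """Grid-search integer shifts in 0..max_shift that maximize correlation with the target."""
    start = max_shift  # earliest time point for which every shifted value exists
    length = len(target) - max_shift
    best: Tuple[float, Tuple[int, ...]] = (float("-inf"), ())
    for shifts in product(range(max_shift + 1), repeat=len(regulators)):
        activity = combined_activity(regulators, weights, shifts, start, length)
        r = pearson(activity, list(target[start:]))
        if r > best[0]:
            best = (r, shifts)
    return best


if __name__ == "__main__":
    r1 = [math.sin(0.9 * i) for i in range(20)]
    r2 = [math.cos(0.4 * i) + 0.05 * i for i in range(20)]
    # Hypothetical target driven by r1 one step earlier and r2 two steps earlier.
    target = [0.0, 0.0] + [0.7 * r1[t - 1] + 0.3 * r2[t - 2] for t in range(2, 20)]
    print(best_shifts(target, [r1, r2], [0.7, 0.3], max_shift=2))
```

The exhaustive grid grows as (max_shift + 1) to the power of the number of regulators, which stays small for the two- and three-regulator motifs discussed in this abstract.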
Affiliation(s)
- Feng He
- Biological Systems Analysis Group, HZI- Helmholtz Centre for Infection Research, Inhoffenstrasse, D-38124 Braunschweig, Germany
- Jan Buer
- Mucosal Immunity Group, HZI- Helmholtz Centre for Infection Research, Inhoffenstrasse, D-38124 Braunschweig, Germany
- Institute of Medical Microbiology, Hannover Medical School (MHH), D-30625 Hannover, Germany
- An-Ping Zeng
- Systems Biology Group, HZI- Helmholtz Centre for Infection Research, Inhoffenstrasse, D-38124 Braunschweig, Germany
- Institute of Bioprocess and Biosystems Engineering, Hamburg University of Technology, Denickerstrasse, D-21073 Hamburg, Germany
- Rudi Balling
- Biological Systems Analysis Group, HZI- Helmholtz Centre for Infection Research, Inhoffenstrasse, D-38124 Braunschweig, Germany
69
Wang H, Wang Q, Li X, Shen B, Ding M, Shen Z. Towards patterns tree of gene coexpression in eukaryotic species. Bioinformatics 2008; 24:1367-73. [PMID: 18407921 DOI: 10.1093/bioinformatics/btn134] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022]
Abstract
MOTIVATION Cellular pathways exhibit coordinated regulatory activity, and previous studies have also confirmed that genes in the same pathway have similar expression patterns. However, the complexity of biological regulation causes expression relationships between genes to display multiple patterns, such as linear, non-linear, local, global, time-delayed linear, time-delayed non-linear, monotonic and non-monotonic, which should explicitly represent the cell's internal regulatory mechanisms at the mRNA level. To investigate the relationships between these patterns, our work aims to systematically reveal gene-expression relationship patterns in cellular pathways and to check whether a dominant gene-expression pattern exists. Through a large-scale analysis of gene expression in three eukaryotic species, Saccharomyces cerevisiae, Caenorhabditis elegans and human, we constructed gene coexpression pattern trees to systematically and hierarchically illustrate the different patterns and their interrelations. RESULTS The results show that the linear pattern is the dominant expression pattern within the same pathway. The time-shifted pattern is another important relationship pattern. Many genes from different pathways also show coexpression patterns. The non-linear, non-monotonic and time-delayed relationship patterns reflect remote interactions between genes in cellular processes. Gene coexpression within the same pathways differs across species. Genes in S. cerevisiae and C. elegans show strong coexpression relationships; in C. elegans in particular, coexpression is more universal and stronger owing to its special arrangement of genes. In humans, however, gene coexpression is not apparent, and the human genome involves more complicated functional relationships. In conclusion, different patterns corresponding to different coordinating behaviors coexist.
The pattern trees of different species give us comprehensive insight into, and understanding of, gene expression activity in the cellular society.
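The distinction between linear and monotonic but non-linear coexpression patterns can be illustrated by comparing Pearson correlation, which responds to linear relationships, with Spearman rank correlation, which responds to any monotonic relationship. This is a minimal sketch; using Spearman's coefficient as the detector of monotonic patterns is our assumption and not the pattern-tree procedure of the cited work, and the toy profiles are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(a), average_ranks(b))


if __name__ == "__main__":
    x = [float(v) for v in range(1, 11)]
    y = [v ** 3 for v in x]  # monotonic but non-linear toy relationship
    print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```

For the cubic toy pair, Spearman's coefficient is 1 because the ordering is preserved, while Pearson's coefficient falls below 1 because the relationship is not a straight line.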
Affiliation(s)
- Haiyun Wang
- School of Life Science and Technology, Tongji University, Shanghai 200092, China
70
Chuang CL, Jen CH, Chen CM, Shieh GS. A pattern recognition approach to infer time-lagged genetic interactions. Bioinformatics 2008; 24:1183-90. [PMID: 18337258 DOI: 10.1093/bioinformatics/btn098] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 01/19/2023]
Abstract
MOTIVATION For any time-course microarray data in which the gene interactions and the associated paired patterns are dependent, the proposed pattern recognition (PARE) approach can infer time-lagged genetic interactions, a challenging task due to the small number of time points and large number of genes. PARE utilizes a non-linear score to identify subclasses of gene pairs with different time lags. In each subclass, PARE extracts non-linear characteristics of paired gene-expression curves and learns weights of the decision score applying an optimization algorithm to microarray gene-expression data (MGED) of some known interactions, from biological experiments or published literature. Namely, PARE integrates both MGED and existing knowledge via machine learning, and subsequently predicts the other genetic interactions in the subclass. RESULTS PARE, a time-lagged correlation approach and the latest advance in graphical Gaussian models were applied to predict 112 (132) pairs of TC/TD (transcriptional regulatory) interactions. Checked against qRT-PCR results (published literature), their true positive rates are 73% (77%), 46% (51%), and 52% (59%), respectively. The false positive rates of predicting TC and TD (AT and RT) interactions in the yeast genome are bounded by 13 and 10% (10 and 14%), respectively. Several predicted TC/TD interactions are shown to coincide with existing pathways involving Sgs1, Srs2 and Mus81. This reinforces the possibility of applying genetic interactions to predict pathways of protein complexes. Moreover, some experimentally testable gene interactions involving DNA repair are predicted. AVAILABILITY Supplementary data and PARE software are available at http://www.stat.sinica.edu.tw/~gshieh/pare.htm.
Affiliation(s)
- Cheng-Long Chuang
- Institute of Biomedical Engineering, National Taiwan University, Taipei 106, Taiwan
71
Kim CS, Riikonen P, Salakoski T. Detecting biological associations between genes based on the theory of phase synchronization. Biosystems 2008; 92:99-113. [PMID: 18289772 DOI: 10.1016/j.biosystems.2007.12.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 06/19/2007] [Revised: 12/20/2007] [Accepted: 12/22/2007] [Indexed: 11/17/2022]
Abstract
This study presents a novel approach to detect biological associations for gene pairs with cell cycle-specific expression profiles. Previous studies have shown that periodic transcription is commonly regulated by transcription factors that are also periodically transcribed, and there is a growing number of examples in which cell cycle-regulated genes are conserved between yeast and mammalian cells. Some genes show periodic oscillatory activity throughout cell division. These cell cycle-specific oscillations could be explained in terms of efficiency and logical order. In the yeast data used in this study, about 13% of genes behave in this manner, based on a previous yeast study. Microarrays have been applied to determine genome-wide expression patterns during the cell cycle of a number of different cells. Moreover, several previous studies have shown that many pairs of genes with linearly correlated expression profiles have similar cellular roles or physical interactions. Based on this view, traditional clustering methods have focused on similar expression profiles, on the premise that genes with similar expression profiles have similar biological functions or relevant biological interactions. However, a number of previous studies indicate that the expression of some genes may be delayed relative to others due to a time lag in their transcriptional control. Therefore, we propose a novel clustering method, named phase-synchronization clustering, based on the theory of phase synchronization, for detecting biological associations using cell cycle-specific expression profiles. We evaluate phase-synchronization clustering here using Saccharomyces cerevisiae microarray data. Phase-synchronization clustering is able to detect biologically associated gene pairs that have linearly correlated (simultaneous and inverted) as well as time-delayed expression profiles.
The performance of phase-synchronization clustering is compared with that of other conventional clustering methods. The likelihood of finding relevant biological associations is significantly higher with phase-synchronization clustering than with the other clustering methods. Therefore, for analyzing cell cycle-specific expression data, phase-synchronization clustering is more efficient than conventional clustering methods at detecting known biological interactions between gene pairs. Evaluation of the phase-synchronization clustering results also suggests that cellular activities during cell division could be understood as a phenomenon of collective synchronization.
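A common way to quantify phase synchronization between two oscillating profiles is to extract instantaneous phases from the analytic signal (via the Hilbert transform) and compute the phase-locking value, the magnitude of the mean of exp(i(φx − φy)). It is close to 1 for simultaneous, inverted, or constantly delayed oscillations of the same frequency, matching the three kinds of association listed above. This sketch adopts that standard index as an assumption; the cited phase-synchronization clustering may define its synchronization measure differently. It uses a plain O(n²) DFT so that only the standard library is needed, which is adequate for short series; the function names and toy signals are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def analytic_signal(x: Sequence[float]) -> List[complex]:
    """Analytic signal of a real series via a DFT-based Hilbert transform (O(n^2))."""
    n = len(x)
    mean = sum(x) / n
    xs = [v - mean for v in x]  # remove the DC offset so phases circle the origin
    spectrum = [sum(xs[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    # Standard Hilbert weights: keep DC (and Nyquist for even n), double positive
    # frequencies, zero out negative frequencies.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        for k in range(1, n // 2):
            weights[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            weights[k] = 2.0
    spectrum = [s * w for s, w in zip(spectrum, weights)]
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def phase_locking_value(x: Sequence[float], y: Sequence[float]) -> float:
    """|mean(exp(i*(phase_x - phase_y)))|: 1 for a constant phase difference, ~0 for none."""
    phase_x = [cmath.phase(z) for z in analytic_signal(x)]
    phase_y = [cmath.phase(z) for z in analytic_signal(y)]
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phase_x, phase_y))
    return abs(total) / len(phase_x)


if __name__ == "__main__":
    n = 64
    base = [math.cos(2 * math.pi * 4 * t / n) for t in range(n)]
    delayed = [math.cos(2 * math.pi * 4 * t / n - 1.0) for t in range(n)]
    other = [math.cos(2 * math.pi * 11 * t / n) for t in range(n)]
    print(round(phase_locking_value(base, delayed), 3),
          round(phase_locking_value(base, [-v for v in base]), 3),
          round(phase_locking_value(base, other), 3))
```

Unlike a correlation coefficient, this index treats an inverted or delayed copy of an oscillation as fully synchronized, while oscillations at different frequencies score near zero.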
Affiliation(s)
- Chang Sik Kim
- Institute of Animal Resources Research, Kangwon National University, Chuncheon, Republic of Korea.
72
Androulakis IP, Yang E, Almon RR. Analysis of time-series gene expression data: methods, challenges, and opportunities. Annu Rev Biomed Eng 2007; 9:205-28. [PMID: 17341157 PMCID: PMC4181347 DOI: 10.1146/annurev.bioeng.9.060906.151904] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.0] [Indexed: 11/09/2022]
Abstract
Monitoring the change in expression patterns over time provides the distinct possibility of unraveling the mechanistic drivers characterizing cellular responses. Gene arrays measuring the level of mRNA expression of thousands of genes simultaneously provide a method of high-throughput data collection necessary for obtaining the scope of data required for understanding the complexities of living organisms. Unraveling the coherent complex structures of transcriptional dynamics is the goal of a large family of computational methods aiming at upgrading the information content of time-course gene expression data. In this review, we summarize the qualitative characteristics of these approaches, discuss the main challenges that this type of complex data present, and, finally, explore the opportunities in the context of developing mechanistic models of cellular response.
Affiliation(s)
- I P Androulakis
- Biomedical Engineering Department, Rutgers University, Piscataway, New Jersey 08854, USA.
73
Huang Y, Li H, Hu H, Yan X, Waterman MS, Huang H, Zhou XJ. Systematic discovery of functional modules and context-specific functional annotation of human genome. Bioinformatics 2007; 23:i222-9. [PMID: 17646300 DOI: 10.1093/bioinformatics/btm222] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.6] [Indexed: 01/26/2023]
Abstract
MOTIVATION The rapid accumulation of microarray datasets provides unique opportunities to perform systematic functional characterization of the human genome. We designed a graph-based approach to integrate cross-platform microarray data and extract recurrent expression patterns. A series of microarray datasets can be modeled as a series of co-expression networks, in which we search for frequently occurring network patterns. The integrative approach provides three major advantages over commonly used microarray analysis methods: (1) it enhances signal-to-noise separation, (2) it identifies functionally related genes without co-expression, and (3) it provides a way to predict gene functions in a context-specific way. RESULTS We integrate 65 human microarray datasets, comprising 1105 experiments and over 11 million expression measurements. We develop a data mining procedure based on frequent itemset mining and biclustering to systematically discover network patterns that recur in at least five datasets. This resulted in 143,401 potential functional modules. Subsequently, we design a network topology statistic based on graph random walk that effectively captures characteristics of a gene's local functional environment. Function annotations based on this statistic are then assessed using the random forest method, combining six other attributes of the network modules. We assign 1126 functions to 895 genes, 779 known and 116 unknown, with a validation accuracy of 70%. Among our assignments, 20% of genes are assigned multiple functions based on different network environments. AVAILABILITY http://zhoulab.usc.edu/ContextAnnotation.
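The search for network patterns that recur across a series of co-expression networks can be illustrated at its simplest level, single edges: build one thresholded co-expression network per dataset and keep the gene pairs that appear in at least `min_support` of them (five, in the study above). This is a deliberately reduced sketch; it does not implement the frequent itemset mining or biclustering described in the abstract, and the correlation threshold of 0.8, the function names, and the toy dataset are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Edge = Tuple[str, str]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def coexpression_edges(dataset: Dict[str, Sequence[float]], threshold: float) -> Set[Edge]:
    """Gene pairs whose profiles in one dataset have |Pearson r| >= threshold."""
    return {(g1, g2) for g1, g2 in combinations(sorted(dataset), 2)
            if abs(pearson(dataset[g1], dataset[g2])) >= threshold}


def recurrent_edges(datasets: List[Dict[str, Sequence[float]]], threshold: float = 0.8,
                    min_support: int = 5) -> Dict[Edge, int]:
    """Count each edge across per-dataset networks; keep those seen >= min_support times."""
    counts: Counter = Counter()
    for dataset in datasets:
        counts.update(coexpression_edges(dataset, threshold))
    return {edge: c for edge, c in counts.items() if c >= min_support}


if __name__ == "__main__":
    toy = {"A": [1.0, 2.0, 3.0, 4.0], "B": [2.0, 4.0, 6.0, 8.0], "C": [4.0, 1.0, 3.0, 2.0]}
    print(recurrent_edges([toy] * 6))
```

Requiring support across several datasets is what gives the signal-to-noise benefit mentioned above: a spurious correlation in one dataset rarely recurs in five.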
Affiliation(s)
- Yu Huang
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA, USA
74
Wu WS, Li WH, Chen BS. Identifying regulatory targets of cell cycle transcription factors using gene expression and ChIP-chip data. BMC Bioinformatics 2007; 8:188. [PMID: 17559637 PMCID: PMC1906835 DOI: 10.1186/1471-2105-8-188] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.7] [Received: 10/21/2006] [Accepted: 06/08/2007] [Indexed: 11/27/2022]
Abstract
Background ChIP-chip data, which indicate binding of transcription factors (TFs) to DNA regions in vivo, are widely used to reconstruct transcriptional regulatory networks. However, the binding of a TF to a gene does not necessarily imply regulation. Thus, it is important to develop methods to identify regulatory targets of TFs from ChIP-chip data. Results We developed a method, called Temporal Relationship Identification Algorithm (TRIA), which uses gene expression data to identify a TF's regulatory targets among its binding targets inferred from ChIP-chip data. We applied TRIA to yeast cell cycle microarray data and identified many plausible regulatory targets of cell cycle TFs. We validated our predictions by checking the enrichment of functional annotations and known cell cycle genes. Moreover, we showed that TRIA performs better than two published methods (MA-Network and MFA). It is known that co-regulated genes may not be co-expressed. TRIA can identify subsets of highly co-expressed genes among the regulatory targets of a TF. Different functional roles are found for different subsets, indicating the diverse functions a TF can have. Finally, as a control, we showed that TRIA also performs well for TFs unrelated to the cell cycle. Conclusion Finding the regulatory targets of TFs is important for understanding how cells change their transcription program to adapt to environmental stimuli. Our algorithm TRIA is helpful for achieving this purpose.
Affiliation(s)
- Wei-Sheng Wu
- Lab of Control and Systems Biology, Department of Electrical Engineering, National Tsing Hua University, Hsinchu, 300, Taiwan
- Wen-Hsiung Li
- Department of Evolution and Ecology, University of Chicago, 1101 East 57th Street, Chicago, IL, 60637, USA
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Bor-Sen Chen
- Lab of Control and Systems Biology, Department of Electrical Engineering, National Tsing Hua University, Hsinchu, 300, Taiwan
75
Soneji S, Huang S, Loose M, Donaldson IJ, Patient R, Göttgens B, Enver T, May G. Inference, validation, and dynamic modeling of transcription networks in multipotent hematopoietic cells. Ann N Y Acad Sci 2007; 1106:30-40. [PMID: 17442775 DOI: 10.1196/annals.1392.018] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 01/10/2023]
Abstract
Identifying the transcription factor interactions that are responsible for cell-specific gene expression programs is key to understanding the regulation of cell behaviors, such as self-renewal, proliferation, differentiation, and death. The rapidly increasing availability of microarray-derived global gene expression data sets, coupled with genome sequence information from multiple species, has driven the development of computational methods to reverse engineer and dynamically model genetic regulatory networks. An understanding of the architecture and behavior of transcriptional networks should lend insight into how the huge number of potential gene expression programs is constrained and should facilitate efforts to direct or redirect cell fate.
Affiliation(s)
- Shamit Soneji
- Weatherall Institute of Molecular Medicine, Molecular Haematology Unit, University of Oxford, John Radcliffe Hospital, Headington, Oxford OX3 9DS, UK
76
Yu H, Xia Y, Trifonov V, Gerstein M. Design principles of molecular networks revealed by global comparisons and composite motifs. Genome Biol 2006; 7:R55. [PMID: 16859507 PMCID: PMC1779570 DOI: 10.1186/gb-2006-7-7-r55] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Received: 03/16/2006] [Revised: 05/19/2006] [Accepted: 06/20/2006] [Indexed: 02/01/2023]
Abstract
A global comparison of the four basic molecular networks in yeast - regulatory, co-expression, interaction and metabolic - reveals general design principles. Background Molecular networks are of current interest, particularly with the publication of many large-scale datasets. Previous analyses have focused on topologic structures of individual networks. Results Here, we present a global comparison of four basic molecular networks: regulatory, co-expression, interaction, and metabolic. In terms of overall topologic correlation - whether nearby proteins in one network are close in another - we find that the four are quite similar. However, focusing on the occurrence of local features, we introduce the concept of composite hubs, namely hubs shared by more than one network. We find that the three 'action' networks (metabolic, co-expression, and interaction) share the same scaffolding of hubs, whereas the regulatory network uses distinctly different regulator hubs. Finally, we examine the inter-relationship between the regulatory network and the three action networks, focusing on three composite motifs - triangles, trusses, and bridges - involving different degrees of regulation of gene pairs. Our analysis shows that interaction and co-expression networks have short-range relationships, with directly interacting and co-expressed proteins sharing regulators. However, the metabolic network contains many long-distance relationships: far-away enzymes in a pathway often have time-delayed expression relationships, which are well coordinated by bridges connecting their regulators. Conclusion We demonstrate how basic molecular networks are distinct yet connected and well coordinated. Many of our conclusions can be mapped onto structured social networks, providing intuitive comparisons. In particular, the long-distance regulation in metabolic networks agrees with its counterpart in social networks (namely, assembly lines). 
Conversely, the segregation of regulator hubs from other hubs diverges from social intuitions (as managers often are centers of interactions).
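Composite hubs, nodes that are hubs in more than one network, can be found by ranking the nodes of each network by degree and recording which networks each hub belongs to. In this sketch the hub definition (the top 20% of nodes by degree) is an assumption chosen for illustration, not necessarily the cutoff used in the cited study, and the function names and toy networks are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def degrees(edges: Iterable[Edge]) -> Counter:
    """Node degree in an undirected edge list."""
    deg: Counter = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def hubs(edges: Iterable[Edge], top_fraction: float = 0.2) -> Set[str]:
    """Nodes in the top `top_fraction` by degree (at least one node); ties broken by name."""
    deg = degrees(edges)
    ranked = sorted(deg, key=lambda node: (-deg[node], node))
    k = max(1, int(len(ranked) * top_fraction))
    return set(ranked[:k])


def composite_hubs(networks: Dict[str, Iterable[Edge]],
                   top_fraction: float = 0.2) -> Dict[str, Set[str]]:
    """Map each node that is a hub in >= 2 networks to the names of those networks."""
    membership: Dict[str, Set[str]] = {}
    for name, edges in networks.items():
        for node in hubs(list(edges), top_fraction):
            membership.setdefault(node, set()).add(name)
    return {node: nets for node, nets in membership.items() if len(nets) >= 2}


if __name__ == "__main__":
    toy_networks = {
        "interaction": [("P", "a"), ("P", "b"), ("P", "c"), ("P", "d"), ("a", "b")],
        "coexpression": [("P", "x"), ("P", "y"), ("P", "z"), ("x", "y")],
        "regulatory": [("R", "a"), ("R", "b"), ("R", "c")],
    }
    print(composite_hubs(toy_networks))
```

In the toy networks, P is a composite hub shared by the interaction and co-expression networks, while the regulatory hub R is not shared, loosely mirroring the segregation of regulator hubs described in the abstract.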
Affiliation(s)
- Haiyuan Yu
- Department of Molecular Biophysics and Biochemistry, Whitney Avenue, Yale University, New Haven, CT 06520, USA
- Yu Xia
- Department of Molecular Biophysics and Biochemistry, Whitney Avenue, Yale University, New Haven, CT 06520, USA
- Valery Trifonov
- Department of Molecular Biophysics and Biochemistry, Whitney Avenue, Yale University, New Haven, CT 06520, USA
- Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Whitney Avenue, Yale University, New Haven, CT 06520, USA
77
Chiang JH, Chao SY. Modeling human cancer-related regulatory modules by GA-RNN hybrid algorithms. BMC Bioinformatics 2007; 8:91. [PMID: 17359522 PMCID: PMC1838431 DOI: 10.1186/1471-2105-8-91] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Received: 10/31/2006] [Accepted: 03/14/2007] [Indexed: 12/19/2022]
Abstract
BACKGROUND Modeling cancer-related regulatory modules from gene expression profiles of cancer tissues is expected to contribute to our understanding of cancer biology as well as to the development of new diagnostics and therapies. Several mathematical models have been used to explore transcriptional regulatory mechanisms in Saccharomyces cerevisiae. However, the control of feed-forward and feedback loops in transcriptional regulatory mechanisms has not been adequately resolved in Saccharomyces cerevisiae, nor in human cancer cells. RESULTS In this study, we introduce a Genetic Algorithm-Recurrent Neural Network (GA-RNN) hybrid method for finding feed-forward regulated genes, given some transcription factors, to construct cancer-related regulatory modules from human cancer microarray data. This hybrid approach focuses on the construction of various kinds of regulatory modules: the Recurrent Neural Network can handle feed-forward and feedback loops in regulatory modules, and the Genetic Algorithm provides global search for commonly regulated genes. This approach uncovers new feed-forward connections in regulatory models through modified multi-layer RNN architectures. We also validate our approach by showing that most of the connections in our cancer-related regulatory modules have been identified and verified in previously published biological literature. CONCLUSION The major contribution of this approach concerns the chained influences acting sequentially upon a set of genes. In addition, this inverse modeling correctly identifies known oncogenes and their interacting genes in a purely data-driven way.
Affiliation(s)
- Jung-Hsien Chiang
- Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan
- Shih-Yi Chao
- Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan
78
Shi Y, Mitchell T, Bar-Joseph Z. Inferring pairwise regulatory relationships from multiple time series datasets. Bioinformatics 2007; 23:755-63. [PMID: 17237067 DOI: 10.1093/bioinformatics/btl676] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022]
Abstract
MOTIVATION Time series expression experiments have emerged as a popular method for studying a wide range of biological systems under a variety of conditions. One advantage of such data is the ability to infer regulatory relationships using time lag analysis. However, such analysis in a single experiment may result in many false positives due to the small number of time points and the large number of genes. Extending these methods to simultaneously analyze several time series datasets is challenging since under different experimental conditions biological systems may behave faster or slower making it hard to rely on the actual duration of the experiment. RESULTS We present a new computational model and an associated algorithm to address the problem of inferring time-lagged regulatory relationships from multiple time series expression experiments with varying (unknown) time-scales. Our proposed algorithm uses a set of known interacting pairs to compute a temporal transformation between every two datasets. Using this temporal transformation we search for new interacting pairs. As we show, our method achieves a much lower false-positive rate compared to previous methods that use time series expression data for pairwise regulatory relationship discovery. Some of the new predictions made by our method can be verified using other high throughput data sources and functional annotation databases. AVAILABILITY Matlab implementation is available from the supporting website: http://www.cs.cmu.edu/~yanxins/regulation_inference/index.html.
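The single-experiment time-lag analysis that this work builds on can be sketched as a scan over candidate lags, correlating the regulator's profile with the target's profile shifted later in time. This minimal sketch assumes equally spaced samples and integer lags, and it does not implement the temporal transformation between datasets described above; as the abstract notes, with few time points and many genes, such a scan on its own tends to produce many false positives. The function names and toy profiles are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def best_lag(regulator: Sequence[float], target: Sequence[float],
             max_lag: int) -> Tuple[int, float]:
    """Return (lag, r) maximizing |corr(regulator[t], target[t + lag])| for lag in 0..max_lag.

    Assumes both profiles are sampled at the same, equally spaced time points.
    """
    best_lag_found, best_r = 0, 0.0
    for lag in range(max_lag + 1):
        overlap = len(regulator) - lag
        if overlap < 3:  # too few aligned points for a meaningful correlation
            break
        r = pearson(regulator[:overlap], target[lag:lag + overlap])
        if abs(r) > abs(best_r):
            best_lag_found, best_r = lag, r
    return best_lag_found, best_r


if __name__ == "__main__":
    regulator = [math.sin(0.8 * i) for i in range(20)]
    target = [0.0, 0.0] + regulator[:18]  # hypothetical target lagging by two samples
    print(best_lag(regulator, target, max_lag=4))
```

Each additional lag shortens the overlap by one sample, which is one reason short time series give unstable lag estimates.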
Affiliation(s)
- Yanxin Shi
- Machine Learning Department, Language Technologies Institute, Computer Science Department and Department of Biological Sciences, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213, USA
79
Bioinformatics analysis of the early inflammatory response in a rat thermal injury model. BMC Bioinformatics 2007; 8:10. [PMID: 17214898 PMCID: PMC1797813 DOI: 10.1186/1471-2105-8-10] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 07/08/2006] [Accepted: 01/10/2007] [Indexed: 12/25/2022]
Abstract
Background Thermal injury is among the most severe forms of trauma, and its effects are both local and systemic. The response to thermal injury includes cellular protection mechanisms, inflammation, hypermetabolism, prolonged catabolism, organ dysfunction and immunosuppression. It has been hypothesized that gene expression patterns in the liver will change with severe burns, reflecting the role the liver plays in the response to burn injury. Characterizing the molecular fingerprint (i.e., expression profile) of the inflammatory response resulting from burns may help elucidate the activated mechanisms and suggest new therapeutic interventions. In this paper we propose a novel integrated framework for analyzing time-series transcriptional data, with emphasis on the burn-induced response within the context of the rat animal model. Our analysis robustly identifies critical expression motifs indicative of the dynamic evolution of the inflammatory response, and we further propose a putative reconstruction of the associated transcription factor activities. Results Applying our algorithm to data obtained from an animal (rat) burn injury study identified 281 genes corresponding to 4 unique profiles. Enrichment evaluation based on both gene ontologies and transcription factors verifies the inflammation-specific character of the selections and supports the rationalization of the burn-induced inflammatory response. Through transcription network reconstruction and analysis, we identified transcription factors, including AHR, Octamer Binding Proteins, Kruppel-like Factors, and cell cycle regulators, as being highly important to an organism's response to burn injury. These transcription factors are notable for their roles in pathways that contribute to the gross physiological response to burns, such as changes in the immune response and inflammation.
Conclusion Our results indicate that our novel selection/classification algorithm has been successful in selecting out genes with play an important role in thermal injury. Additionally, we have demonstrated the value of an integrative approach in identifying possible points of intervention, namely the activation of certain transcription factors that govern the organism's response.
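The abstract does not spell out how expression motifs are defined. As a rough illustration only, the sketch below groups genes whose time courses share the same discretized up/down/flat pattern and keeps patterns supported by enough genes. The threshold `tol`, the `min_genes` cutoff, and the toy profiles are hypothetical assumptions; this is not the authors' selection/classification algorithm.

```python
from collections import defaultdict

def motif(profile, tol=0.5):
    """Encode a time course as U/D/F steps between consecutive time points."""
    steps = []
    for a, b in zip(profile, profile[1:]):
        delta = b - a
        if delta > tol:
            steps.append("U")   # up-regulated step
        elif delta < -tol:
            steps.append("D")   # down-regulated step
        else:
            steps.append("F")   # change within the tolerance band
    return "".join(steps)

def group_by_motif(expr, tol=0.5, min_genes=2):
    """Group genes by shared motif and drop motifs with too few genes."""
    groups = defaultdict(list)
    for gene, profile in expr.items():
        groups[motif(profile, tol)].append(gene)
    return {m: sorted(g) for m, g in groups.items() if len(g) >= min_genes}

# Toy log-ratio time courses (hypothetical values).
expr = {
    "g1": [0.0, 1.2, 1.3, 0.1],
    "g2": [0.1, 1.0, 1.1, -0.2],
    "g3": [0.0, -0.9, -1.0, 0.0],
    "g4": [0.2, 0.3, 0.1, 0.2],
}
print(group_by_motif(expr))  # {'UFD': ['g1', 'g2']}
```

Discretizing changes makes the grouping insensitive to absolute expression level; a real analysis would also need a significance filter on which genes enter the grouping at all.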
80
Schöner D, Barkow S, Bleuler S, Wille A, Zimmermann P, Bühlmann P, Gruissem W, Zitzler E. Network analysis of systems elements. EXS 2007; 97:331-51. [PMID: 17432274 DOI: 10.1007/978-3-7643-7439-6_14] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/14/2023]
Abstract
A central goal of postgenomic research is to assign a function to every predicted gene. Because genes often cooperate to establish and regulate cellular events, the examination of a gene has also included the search for at least a few interacting genes. This requires a strong hypothesis about possible interaction partners, which has often been derived from what was known about the gene or protein beforehand. Often, though, this prior knowledge has been completely lacking, biased towards favored concepts, or only partial due to the theoretically vast interaction space. With the advent of high-throughput technology and robotics in biological research, it has become possible to study gene function on a global scale, monitoring entire genomes and proteomes at once. These systematic approaches aim at considering all possible dependencies between genes or their products, thereby exploring the interaction space at a systems scale. This chapter provides an introduction to network analysis and illustrates the corresponding concepts on the basis of gene expression data. First, an overview of existing methods for the identification of co-regulated genes is given. Second, the issue of topology inference is discussed and, as an example, a specific inference method is presented. Lastly, the application of these techniques is demonstrated for the Arabidopsis thaliana isoprenoid pathway.
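One common way to identify co-regulated genes, shown here as a minimal sketch rather than the chapter's own inference method, is to link every gene pair whose Pearson correlation across samples reaches a cutoff in absolute value. The cutoff of 0.9 and the gene names are illustrative assumptions.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation; assumes neither profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def coexpression_edges(expr, threshold=0.9):
    """Undirected edges (g1, g2, r) for gene pairs with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges

# Hypothetical expression profiles across four samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 1.0, 3.0],
}
for edge in coexpression_edges(expr):
    print(edge)
```

Such a thresholded graph only records co-expression; it says nothing about edge direction, which is the topology-inference problem the chapter treats separately.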
Affiliation(s)
- Daniel Schöner
- Plant Biotechnology, Institute of Plant Sciences, Rämistr 2, Swiss Federal Institute of Technology, 8092 Zürich, Switzerland
81
Ge H, Player CM, Zou L. Toward a global picture of development: lessons from genome-scale analysis in Caenorhabditis elegans embryonic development. Dev Dyn 2006; 235:2009-17. [PMID: 16779860 DOI: 10.1002/dvdy.20865] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 01/22/2023]
Abstract
Development is the result of complex events, including cascades of transcriptional programs and numerous molecular interactions. Traditionally, research focus has been given to the characterization of individual mutants, regulators, or interactions. With the availability of complete genome sequences and high-throughput (HT) experimental techniques, probing development on a system level has become feasible. Pioneering work initiated in invertebrate model systems such as Caenorhabditis elegans has provided first drafts of catalogs of essential components, transcriptional regulatory diagrams and molecular interaction networks underlying developmental processes. Integrating these drafts approximates a system-level picture of development and provides local models for protein/gene functions. Here we summarize the progress toward elucidating developmental processes on a system level, including the applications of genomic technologies and computational analyses. We discuss C. elegans embryonic development in case studies to illustrate how various HT approaches can be integrated and how biological insights can be gained from these approaches.
Affiliation(s)
- Hui Ge
- Whitehead Institute, Cambridge, Massachusetts 02142, USA.
82
Wu WS, Li WH, Chen BS. Computational reconstruction of transcriptional regulatory modules of the yeast cell cycle. BMC Bioinformatics 2006; 7:421. [PMID: 17010188 PMCID: PMC1637117 DOI: 10.1186/1471-2105-7-421] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.7] [Received: 05/04/2006] [Accepted: 09/29/2006] [Indexed: 01/01/2023]
Abstract
BACKGROUND A transcriptional regulatory module (TRM) is a set of genes that is regulated by a common set of transcription factors (TFs). By organizing the genome into TRMs, a living cell can coordinate the activities of many genes and carry out complex functions. Therefore, identifying TRMs is helpful for understanding gene regulation. RESULTS Integrating gene expression and ChIP-chip data, we develop a method, called MOdule Finding Algorithm (MOFA), for reconstructing TRMs of the yeast cell cycle. MOFA identified 87 TRMs, which together contain 336 distinct genes regulated by 40 TFs. Using various kinds of data, we validated the biological relevance of the identified TRMs. Our analysis shows that different combinations of a fairly small number of TFs are responsible for regulating a large number of genes involved in different cell cycle phases and that there may exist crosstalk between the cell cycle and other cellular processes. MOFA is capable of finding many novel TF-target gene relationships and can determine whether a TF is an activator, a repressor, or both. Finally, MOFA refines some clusters proposed by previous studies and provides a better understanding of how the complex expression program of the cell cycle is regulated. CONCLUSION MOFA was developed to reconstruct TRMs of the yeast cell cycle. Many of these TRMs are in agreement with previous studies. Further, MOFA inferred many interesting modules and novel TF combinations. We believe that computational analysis of multiple types of data will be a powerful approach to studying complex biological systems as more genomic resources, such as genome-wide protein activity data and protein-protein interaction data, become available.
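Given the definition of a TRM as a set of genes regulated by a common set of TFs, a naive first pass is to group genes by the exact set of TFs that bind them. The sketch below assumes a hypothetical TF-to-targets binding table and ignores the expression-coherence criteria that MOFA combines with ChIP-chip data, so it illustrates the module concept rather than MOFA itself.

```python
from collections import defaultdict

def candidate_modules(binding, min_genes=2):
    """Group genes by the exact set of TFs that bind them.

    binding: dict mapping TF name -> set of bound target genes.
    Returns {frozenset of TFs: sorted list of genes} for groups of >= min_genes.
    """
    regulators = defaultdict(set)          # gene -> TFs binding it
    for tf, targets in binding.items():
        for gene in targets:
            regulators[gene].add(tf)
    modules = defaultdict(list)            # TF set -> genes sharing it
    for gene, tfs in regulators.items():
        modules[frozenset(tfs)].append(gene)
    return {tfs: sorted(genes) for tfs, genes in modules.items()
            if len(genes) >= min_genes}

# Hypothetical binding calls, not real ChIP-chip data.
binding = {
    "TF1": {"geneA", "geneB", "geneC"},
    "TF2": {"geneA", "geneB", "geneC", "geneD"},
    "TF3": {"geneD", "geneE"},
}
for tfs, genes in candidate_modules(binding).items():
    print(sorted(tfs), genes)
```

Exact-set grouping is brittle with noisy binding calls, which is one reason methods such as MOFA also require co-expression within a module.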
Affiliation(s)
- Wei-Sheng Wu
- Lab of Control and Systems Biology, Department of Electrical Engineering, National Tsing Hua University, Hsinchu, 300, Taiwan
- Wen-Hsiung Li
- Department of Evolution and Ecology, University of Chicago, 1101 East 57th Street, Chicago, IL, 60637, USA
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Bor-Sen Chen
- Lab of Control and Systems Biology, Department of Electrical Engineering, National Tsing Hua University, Hsinchu, 300, Taiwan
83
Yu H, Gerstein M. Genomic analysis of the hierarchical structure of regulatory networks. Proc Natl Acad Sci U S A 2006; 103:14724-31. [PMID: 17003135 PMCID: PMC1595419 DOI: 10.1073/pnas.0508637103] [Citation(s) in RCA: 221] [Impact Index Per Article: 11.6] [Indexed: 11/18/2022]
Abstract
A fundamental question in biology is how the cell uses transcription factors (TFs) to coordinate the expression of thousands of genes in response to various stimuli. The relationships between TFs and their target genes can be modeled in terms of directed regulatory networks. These relationships, in turn, can be readily compared with commonplace "chain-of-command" structures in social networks, which have characteristic hierarchical layouts. Here, we develop algorithms for identifying generalized hierarchies (allowing for various loop structures) and use these approaches to illuminate extensive pyramid-shaped hierarchical structures existing in the regulatory networks of representative prokaryotes (Escherichia coli) and eukaryotes (Saccharomyces cerevisiae), with most TFs at the bottom levels and only a few master TFs on top. These masters are situated near the center of the protein-protein interaction network, a different type of network from the regulatory one, and they receive most of the input for the whole regulatory hierarchy through protein interactions. Moreover, they have maximal influence over other genes, in terms of affecting expression-level changes. Surprisingly, however, TFs at the bottom of the regulatory hierarchy are more essential to the viability of the cell. Finally, one might think that master TFs achieve their wide influence through directly regulating many targets, but TFs with the most direct targets are in the middle of the hierarchy. We find, in fact, that these midlevel TFs are "control bottlenecks" in the hierarchy, and this great degree of control for "middle managers" has parallels in efficient social structures in various corporate and governmental settings.
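The pyramid picture can be made concrete with a simple leveling rule: TFs that regulate no other TF sit at the bottom (level 0), and every other TF sits one level above the highest TF it regulates. The sketch below assumes an acyclic TF-to-TF network given as hypothetical edge pairs; the generalized hierarchies in the paper also accommodate loops, which this version rejects with an error instead.

```python
def hierarchy_levels(edges):
    """Level of each TF in an acyclic TF -> TF regulatory network.

    Level 0: TFs that regulate no other TF (bottom of the pyramid).
    Otherwise: 1 + the highest level among the TFs it regulates.
    """
    targets = {}
    for src, dst in edges:
        targets.setdefault(src, set()).add(dst)
        targets.setdefault(dst, set())
    levels, visiting = {}, set()

    def level(tf):
        if tf in levels:
            return levels[tf]
        if tf in visiting:
            raise ValueError(f"loop through {tf}; this sketch needs an acyclic network")
        visiting.add(tf)
        if targets[tf]:
            lv = 1 + max(level(t) for t in targets[tf])
        else:
            lv = 0
        visiting.discard(tf)
        levels[tf] = lv
        return lv

    for tf in targets:
        level(tf)
    return levels

# Hypothetical TF -> TF regulatory edges.
edges = [("master", "mid1"), ("master", "mid2"),
         ("mid1", "low1"), ("mid2", "low1"), ("mid2", "low2")]
print(hierarchy_levels(edges))
```

Only TF-to-TF edges are considered here; non-TF target genes do not affect a TF's level under this rule.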
Affiliation(s)
- Haiyuan Yu
- Departments of Molecular Biophysics and Biochemistry and Computer Science and Program in Computational Biology and Bioinformatics, Yale University, 266 Whitney Avenue, P.O. Box 208114, New Haven, CT 06520
- *To whom correspondence may be addressed.
- Mark Gerstein
- Departments of Molecular Biophysics and Biochemistry and Computer Science and Program in Computational Biology and Bioinformatics, Yale University, 266 Whitney Avenue, P.O. Box 208114, New Haven, CT 06520
- *To whom correspondence may be addressed.
84
Inoue LYT, Neira M, Nelson C, Gleave M, Etzioni R. Cluster-based network model for time-course gene expression data. Biostatistics 2006; 8:507-25. [PMID: 16980695 DOI: 10.1093/biostatistics/kxl026] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 11/14/2022]
Abstract
We propose a model-based approach to unify clustering and network modeling using time-course gene expression data. Specifically, our approach uses a mixture model to cluster genes. Genes within the same cluster share a similar expression profile. The network is built over cluster-specific expression profiles using state-space models. We discuss the application of our model to simulated data as well as to time-course gene expression data arising from animal models of prostate cancer progression. The latter application shows that, with a combined statistical and bioinformatics analysis, we are able to extract gene-to-gene relationships supported by the literature as well as new plausible relationships.
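The network step can be illustrated, under strong simplifying assumptions, by fitting a first-order linear model x[t+1] = A x[t] to cluster-mean profiles by ordinary least squares, where a nonzero entry A[i][j] suggests an influence of cluster j on cluster i. This ignores the mixture-model clustering, observation noise, and hidden states of a full state-space model, so it is a sketch of the idea rather than the authors' method; the example matrix is made up.

```python
def solve(M, b):
    """Solve the square system M x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + [rhs] for row, rhs in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * c for a, c in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_transition(profiles):
    """Least-squares estimate of A in x[t+1] = A x[t].

    profiles: k cluster-mean profiles, each a list over the same T time points.
    Returns A as k rows; A[i][j] is the estimated effect of cluster j on cluster i.
    """
    k, T = len(profiles), len(profiles[0])
    states = [[profiles[i][t] for i in range(k)] for t in range(T)]
    prev, nxt = states[:-1], states[1:]
    # Normal equations shared by every row i: G a_i = sum_t x_t * x_{t+1}[i]
    G = [[sum(s[p] * s[q] for s in prev) for q in range(k)] for p in range(k)]
    A = []
    for i in range(k):
        rhs = [sum(s[p] * n[i] for s, n in zip(prev, nxt)) for p in range(k)]
        A.append(solve(G, rhs))
    return A

# Simulate two noise-free cluster profiles from a made-up A, then recover it.
A_true = [[0.9, 0.0], [0.5, 0.8]]
x, states = [1.0, 0.0], []
for _ in range(6):
    states.append(x)
    x = [sum(a * v for a, v in zip(row, x)) for row in A_true]
profiles = [[s[i] for s in states] for i in range(2)]
A_hat = fit_transition(profiles)
print([[round(v, 3) for v in row] for row in A_hat])
```

With real, noisy data the fitted A would need regularization or a proper state-space treatment before its entries could be read as network edges.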
Affiliation(s)
- Lurdes Y T Inoue
- Department of Biostatistics, University of Washington, F-600 Health Sciences Building, Campus Mail Stop 357232, Seattle, WA 98195, USA.
85
Ruan Q, Dutta D, Schwalbach MS, Steele JA, Fuhrman JA, Sun F. Local similarity analysis reveals unique associations among marine bacterioplankton species and environmental factors. Bioinformatics 2006; 22:2532-8. [PMID: 16882654 DOI: 10.1093/bioinformatics/btl417] [Citation(s) in RCA: 191] [Impact Index Per Article: 10.1] [Indexed: 11/13/2022]
Abstract
MOTIVATION Characterizing the diversity of microbial communities and understanding the environmental factors that influence community diversity are central tenets of microbial ecology. The development and application of cultivation-independent molecular tools has allowed for rapid surveying of microbial community composition at unprecedented resolutions and frequencies. There is a growing need to discern robust patterns and relationships within these datasets which provide insight into microbial ecology. Pearson correlation coefficient (PCC) analysis is commonly used for identifying the linear relationship between two species, or species and environmental factors. However, this approach may not be able to capture more complex interactions which occur in situ; thus, alternative analyses were explored. RESULTS In this paper we introduce local similarity analysis (LSA), a technique that can identify more complex dependence associations among species as well as associations between species and environmental factors without requiring significant data reduction. To illustrate its capability of identifying relationships that may not otherwise be identified by PCC, we first applied LSA to simulated data. We then applied LSA to a marine microbial observatory dataset and identified unique, significant associations that were not detected by PCC analysis. LSA results, combined with results from PCC analysis, were used to construct a theoretical ecological network which allows for easy visualization of the most significant associations. Biological implications of the significant associations detected by LSA are discussed. We also identify additional applications where LSA would be beneficial. AVAILABILITY The algorithms are implemented in Splus/R and they are available upon request from the corresponding author.
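The local similarity score is typically computed with a dynamic program that tracks the best running sum of products of the two normalized series, separately for positive and negative association, while allowing a bounded time delay. The sketch below assumes simple z-score normalization and omits any significance testing; the delay bound and example series are illustrative.

```python
import statistics

def zscore(x):
    """Center and scale a series (assumes it is not constant)."""
    m, s = statistics.mean(x), statistics.pstdev(x)
    return [(v - m) / s for v in x]

def local_similarity(x, y, max_delay=1):
    """Return (score, sign) of the best local association within max_delay."""
    x, y = zscore(x), zscore(y)
    n = len(x)
    P = [[0.0] * (n + 1) for _ in range(n + 1)]  # best positive run ending at (i, j)
    N = [[0.0] * (n + 1) for _ in range(n + 1)]  # best negative run ending at (i, j)
    best_p = best_n = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if abs(i - j) > max_delay:
                continue  # outside the allowed time shift
            prod = x[i - 1] * y[j - 1]
            P[i][j] = max(0.0, P[i - 1][j - 1] + prod)
            N[i][j] = max(0.0, N[i - 1][j - 1] - prod)
            best_p = max(best_p, P[i][j])
            best_n = max(best_n, N[i][j])
    # Normalize by series length; the sign says which association dominates.
    return (best_p / n, 1) if best_p >= best_n else (best_n / n, -1)

# Hypothetical abundance series; y lags x by one sampling step.
x = [1, 2, 3, 4, 5, 4, 3, 2]
y = [2, 1, 2, 3, 4, 5, 4, 3]
print(local_similarity(x, y, max_delay=1))
```

Because the running sums reset to zero, a strong association confined to part of the series can still score highly, which a single whole-series PCC would dilute.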
Affiliation(s)
- Quansong Ruan
- Department of Mathematics, University of Southern California 3620 Vermont Avenue, KAP 108, Los Angeles, CA 90089-2532, USA
86
Qin ZS. Clustering microarray gene expression data using weighted Chinese restaurant process. Bioinformatics 2006; 22:1988-97. [PMID: 16766561 DOI: 10.1093/bioinformatics/btl284] [Citation(s) in RCA: 74] [Impact Index Per Article: 3.9] [Indexed: 11/13/2022]
Abstract
MOTIVATION Clustering microarray gene expression data is a powerful tool for elucidating co-regulatory relationships among genes. Many different clustering techniques have been successfully applied and the results are promising. However, substantial fluctuation in microarray data, lack of knowledge of the number of clusters and the complex regulatory mechanisms underlying biological systems make clustering tremendously challenging. RESULTS We devised an improved model-based Bayesian approach to cluster microarray gene expression data. Cluster assignment is carried out by an iterative weighted Chinese restaurant seating scheme such that the optimal number of clusters can be determined simultaneously with cluster assignment. The predictive updating technique was applied to improve the efficiency of the Gibbs sampler. An additional step is added during reassignment to allow genes that display complex correlation relationships, such as time-shifted and/or inverted patterns, to be clustered together. Analysis of a real dataset showed that as much as 30% of significant genes clustered in the same group display complex relationships with the consensus pattern of the cluster. Other notable features include automatic handling of missing data and quantitative measures of cluster strength and assignment confidence. Synthetic and real microarray gene expression datasets were analyzed to demonstrate its performance. AVAILABILITY A computer program named Chinese restaurant cluster (CRC) has been developed based on this algorithm. The program can be downloaded at http://www.sph.umich.edu/csg/qin/CRC/.
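In a Chinese restaurant process, a gene joins an existing cluster k with prior weight proportional to its size n_k, or opens a new cluster with weight proportional to a concentration parameter alpha; in a Gibbs reassignment step these prior weights are multiplied by the gene's likelihood under each cluster. The sketch below computes those normalized seating probabilities from supplied log-likelihoods. The likelihood model, the predictive updating, and the time-shift/inversion step described in the abstract are not reproduced, and the function name is hypothetical.

```python
import math
import random

def crp_seating_probs(cluster_sizes, alpha, loglik_existing, loglik_new):
    """Normalized probabilities of seating one gene at each table.

    cluster_sizes: n_k for existing clusters, with the gene itself removed.
    loglik_existing: log-likelihood of the gene under each existing cluster.
    loglik_new: log-likelihood of the gene under a brand-new cluster.
    The last returned entry is the probability of opening a new cluster.
    """
    logw = [math.log(n) + ll for n, ll in zip(cluster_sizes, loglik_existing)]
    logw.append(math.log(alpha) + loglik_new)
    top = max(logw)                      # subtract the max for numerical stability
    w = [math.exp(v - top) for v in logw]
    total = sum(w)
    return [v / total for v in w]

# With equal likelihoods the CRP prior alone decides: weights 3 : 1 : alpha.
probs = crp_seating_probs([3, 1], alpha=1.0, loglik_existing=[0.0, 0.0], loglik_new=0.0)
print([round(p, 3) for p in probs])  # [0.6, 0.2, 0.2]
# One Gibbs draw of the new assignment (index len(probs) - 1 means "new cluster").
choice = random.choices(range(len(probs)), weights=probs)[0]
```

Because a new table always keeps weight proportional to alpha, the number of clusters is sampled along with the assignments instead of being fixed in advance.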
Affiliation(s)
- Zhaohui S Qin
- Center for Statistical Genetics, Department of Biostatistics, School of Public Health, University of Michigan 1420 Washington Heights, Ann Arbor, MI 48109-2029, USA.
87
Glazko G, Coleman M, Mushegian A. Similarity searches in genome-wide numerical data sets. Biol Direct 2006; 1:13. [PMID: 16734895 PMCID: PMC1489924 DOI: 10.1186/1745-6150-1-13] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 05/23/2006] [Accepted: 05/30/2006] [Indexed: 11/24/2022]
Abstract
We present psi-square, a program for searching the space of gene vectors. The program starts with a gene vector, i.e., the set of measurements associated with a gene, and finds similar vectors, derives a probabilistic model of these vectors, then repeats search using this model as a query, and continues to update the model and search again, until convergence. When applied to three different pathway-discovery problems, psi-square was generally more sensitive and sometimes more specific than the ad hoc methods developed for solving each of these problems before.
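The search loop described in the abstract (search, build a model of the hits, search again with the model, repeat until convergence) can be sketched as follows, assuming Pearson correlation as the similarity and the per-dimension mean of the current hits as a stand-in for psi-square's probabilistic model. The threshold, iteration cap, and toy vectors are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def iterative_search(query, vectors, threshold=0.8, max_iter=10):
    """Search, rebuild the model from the hits, and search again until the hit set is stable."""
    model, hits = list(query), set()
    for _ in range(max_iter):
        new_hits = {name for name, v in vectors.items() if pearson(model, v) >= threshold}
        if not new_hits or new_hits == hits:
            break  # converged (or nothing found)
        hits = new_hits
        # Model update: per-dimension mean of the hit vectors.
        model = [sum(vectors[h][d] for h in hits) / len(hits) for d in range(len(query))]
    return hits

# Hypothetical gene vectors (e.g. measurements across four conditions).
vectors = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [1.1, 2.1, 2.9, 4.2],
    "c": [4.0, 3.0, 2.0, 1.0],
    "d": [1.0, 2.0, 3.0, 3.5],
}
print(sorted(iterative_search([1.0, 2.0, 3.0, 4.0], vectors)))
```

Replacing the single query with a model built from its hits is what lets later rounds pick up members that resemble the group more than the original query.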
Affiliation(s)
- Galina Glazko
- Stowers Institute for Medical Research, 1000 E 50th St., Kansas City MO 64110, USA
- University of Rochester Medical Center, Rochester, NY 14642, USA
- Michael Coleman
- Stowers Institute for Medical Research, 1000 E 50th St., Kansas City MO 64110, USA
- Arcady Mushegian
- Stowers Institute for Medical Research, 1000 E 50th St., Kansas City MO 64110, USA
- Department of Microbiology, Molecular Genetics, and Immunology, University of Kansas Medical Center, Kansas City, KS 66160, USA
88
Cho KH, Kim JR, Baek S, Choi HS, Choo SM. Inferring biomolecular regulatory networks from phase portraits of time-series expression profiles. FEBS Lett 2006; 580:3511-8. [PMID: 16730002 DOI: 10.1016/j.febslet.2006.05.035] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 02/06/2006] [Revised: 05/02/2006] [Accepted: 05/09/2006] [Indexed: 11/28/2022]
Abstract
Reverse engineering of biomolecular regulatory networks such as gene regulatory networks, protein interaction networks, and metabolic networks has received increasing attention as more high-throughput time-series measurements become available. In spite of the various approaches developed from this motivation, it remains challenging to develop a reverse engineering scheme that can effectively uncover the functional interaction structure of a biomolecular network from given time-series expression profiles (TSEPs). We propose a new reverse engineering scheme that makes use of phase portraits constructed by projecting every pair of TSEPs onto its own phase plane. We introduce two measures, a slope index (SI) and a winding index (WI), to quantify the interaction properties embedded in the phase portrait. Based on the SI and WI, we can reconstruct the functional interaction network in a very efficient and systematic way, with better inference results than previous approaches. By using the SI, we can also estimate the time lag associated with the interaction between molecular components of a network.
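The exact definitions of SI and WI are not given in the abstract, so the sketch below only illustrates the ingredients: projecting two profiles onto a phase plane, a hypothetical slope-based score (co-directional minus opposite-direction steps), and a hypothetical winding count (signed turns of the trajectory around its centroid). Neither function should be taken as the paper's SI or WI.

```python
import math

def phase_portrait(x, y):
    """Project two time-series expression profiles onto one phase plane."""
    return list(zip(x, y))

def slope_balance(points):
    """Hypothetical slope score in [-1, 1]: co-directional minus opposite steps."""
    score = 0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        s = (x1 - x0) * (y1 - y0)
        score += (s > 0) - (s < 0)
    return score / (len(points) - 1)

def winding_turns(points):
    """Hypothetical winding score: signed turns of the trajectory around its centroid."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        d = math.atan2(y1 - cy, x1 - cx) - math.atan2(y0 - cy, x0 - cx)
        d = (d + math.pi) % (2 * math.pi) - math.pi  # wrap into [-pi, pi)
        total += d
    return total / (2 * math.pi)

t = range(9)
up = phase_portrait([1, 2, 3], [2, 4, 6])
loop = phase_portrait([math.cos(2 * math.pi * k / 8) for k in t],
                      [math.sin(2 * math.pi * k / 8) for k in t])
print(slope_balance(up), round(winding_turns(loop), 6))
```

A trajectory that climbs along a line (as for a direct, undelayed activation) and one that circles (as can happen when one profile lags the other) look very different in the phase plane, which is the intuition such indices try to capture.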
Affiliation(s)
- Kwang-Hyun Cho
- College of Medicine, Seoul National University, Jongno-gu, Republic of Korea.
89
Wu X, Zhu L, Guo J, Zhang DY, Lin K. Prediction of yeast protein-protein interaction network: insights from the Gene Ontology and annotations. Nucleic Acids Res 2006; 34:2137-50. [PMID: 16641319 PMCID: PMC1449908 DOI: 10.1093/nar/gkl219] [Citation(s) in RCA: 139] [Impact Index Per Article: 7.3] [Indexed: 11/13/2022]
Abstract
A map of protein-protein interactions provides valuable insight into the cellular function and machinery of a proteome. By measuring the similarity between two Gene Ontology (GO) terms with a relative specificity semantic relation, we propose a new method of reconstructing a yeast protein-protein interaction map that is based solely on GO annotations. The effectiveness of the method was validated using high-quality interaction datasets. Based on a Z-score analysis, a positive dataset and a negative dataset for protein-protein interactions were derived. Moreover, a gold standard positive (GSP) dataset with the highest level of confidence that covered 78% of the high-quality interaction dataset and a gold standard negative (GSN) dataset with the lowest level of confidence were derived. In addition, we assessed four high-throughput experimental interaction datasets using the positives and the negatives as well as GSPs and GSNs. Our predicted network reconstructed from GSPs consists of 40,753 interactions among 2259 proteins and forms 16 connected components. We mapped all of the MIPS complexes except for homodimers onto the predicted network. As a result, approximately 35% of complexes were identified as interconnected. For seven complexes, we also identified some nonmember proteins that may be functionally related to the complexes concerned. This analysis is expected to provide a new approach for predicting the protein-protein interaction maps of other completely sequenced genomes with high-quality GO-based annotations.
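The Z-score step for deriving positive and negative pair sets can be sketched as follows, assuming a precomputed GO-based similarity score per protein pair and a symmetric cutoff. The scores, the cutoff of 2.0, and the pair names are hypothetical; the relative-specificity similarity measure itself is not reproduced.

```python
import statistics

def split_by_zscore(pair_scores, z_cut=2.0):
    """Split protein pairs into putative positive / negative sets by Z-score.

    pair_scores: dict mapping (protein1, protein2) -> GO-based similarity score.
    """
    values = list(pair_scores.values())
    mu, sd = statistics.mean(values), statistics.pstdev(values)
    positives, negatives = [], []
    for pair, score in pair_scores.items():
        z = (score - mu) / sd
        if z >= z_cut:
            positives.append(pair)      # unusually similar annotations
        elif z <= -z_cut:
            negatives.append(pair)      # unusually dissimilar annotations
    return sorted(positives), sorted(negatives)

# Hypothetical similarity scores for ten protein pairs.
scores = {("P%d" % i, "Q%d" % i): 0.5 for i in range(8)}
scores[("P8", "Q8")] = 1.0
scores[("P9", "Q9")] = 0.0
print(split_by_zscore(scores))
```

Pairs with intermediate Z-scores are left unlabeled, which keeps both derived sets small but comparatively clean.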
Affiliation(s)
- Kui Lin
- To whom correspondence should be addressed. Tel: +86 10 58805045; Fax: +86 10 58807721
90
Abstract
It is commonly accepted that genes with similar expression profiles are functionally related. However, there are many ways to measure the similarity of expression profiles, and it is not clear a priori which is the most effective. Moreover, so far no clear distinction has been made as to the type of functional link between genes suggested by microarray data. Similarly expressed genes can be part of the same complex as interacting partners; they can participate in the same pathway without interacting directly; they can perform similar functions; or they can simply have similar regulatory sequences. Here we study the notion of a functional link as implied by expression data. We analyze different similarity measures of gene expression profiles and assess their usefulness and robustness in detecting biological relationships by comparing the similarity scores with results obtained from databases of interacting proteins, promoter signals and cellular pathways, as well as through sequence comparisons. We also introduce variations on similarity measures that are based on statistical analysis and better discriminate between genes that are functionally close and those that are functionally distant. Our tools can be used to assess other similarity measures for expression profiles, and are accessible at biozon.org/tools/expression/
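To see why the choice of measure matters, the sketch below computes four common similarity measures for two profiles that differ only by a scale factor: correlation-type and cosine measures report them as identical, while Euclidean distance reports them as far apart. These are standard textbook definitions, not the statistically based variants introduced in the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation; assumes neither profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def ranks(v):
    """1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Two hypothetical profiles with the same shape but a tenfold scale difference.
x, y = [1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]
for name, f in [("pearson", pearson), ("spearman", spearman),
                ("cosine", cosine), ("euclidean", euclidean)]:
    print(name, round(f(x, y), 3))
```

Which behavior is preferable depends on whether absolute expression level carries functional information for the relationship being sought.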
Affiliation(s)
- Golan Yona
- Department of Computer Science, Cornell University, NY, USA.
91
He F, Zeng AP. In search of functional association from time-series microarray data based on the change trend and level of gene expression. BMC Bioinformatics 2006; 7:69. [PMID: 16478547 PMCID: PMC1435774 DOI: 10.1186/1471-2105-7-69] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Received: 06/24/2005] [Accepted: 02/15/2006] [Indexed: 11/10/2022]
Abstract
BACKGROUND The increasing availability of time-series expression data opens up new possibilities to study functional linkages of genes. Present methods used to infer functional linkages between genes from expression data are mainly based on a point-to-point comparison. Change trends between consecutive time points in time-series data have so far not been well explored. RESULTS In this work we present a new method based on extracting the main features of the change trend and level of gene expression between consecutive time points. The method, termed trend correlation (TC), includes two major steps: (1) calculating a maximal local alignment of change trend score by dynamic programming and a change trend correlation coefficient between the maximal matched change levels of each gene pair; (2) inferring relationships of gene pairs based on two statistical extraction procedures. The new method considers time shifts and inverted relationships in a similar way to the local clustering (LC) method, but the latter is based merely on a point-to-point comparison. The TC method is demonstrated with data from the yeast cell cycle and compared with the LC method and the widely used Pearson correlation coefficient (PCC) based clustering method. The biological significance of the gene pairs is examined with several large-scale yeast databases. Although the TC method predicts an overall lower number of gene pairs than the other two methods at the same p-value threshold, the number of additional gene pairs inferred by the TC method is considerable: e.g. 20.5% compared with the LC method and 49.6% compared with the PCC method for a p-value threshold of 2.7E-3. Moreover, the percentage of inferred gene pairs consistent with the databases is generally higher for our method than for the LC method and similar to that of the PCC method.
A significant number of the gene pairs inferred only by the TC method are process-identity or function-similarity pairs or have well-documented biological interactions, including 443 known protein interactions and some known cell cycle related regulatory interactions. It should be emphasized that the overlap among gene pairs detected by the three methods is usually not very high, indicating the need to combine different methods in the search for functional association of genes from time-series data. For a p-value threshold of 1E-5, the percentages of process-identity and function-similarity gene pairs among the part shared by all three methods reach 60.2% and 55.6%, respectively, providing a good basis for further experimental and functional study. Furthermore, the combined use of methods is important to infer more complete regulatory circuits and networks, as exemplified in this study. CONCLUSION The TC method can significantly augment the current major methods to infer functional linkages and biological networks and is well suited for exploring temporal relationships of gene expression in time-series data.
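The first TC step can be approximated, for illustration, by encoding each profile as a sequence of change trends (+1 up, -1 down, 0 flat) and taking the best ungapped local alignment score over small time shifts, also trying the inverted partner. The +1/-1 scoring, the shift limit, and the absence of gaps are assumptions of this sketch; the change-level correlation coefficient and the statistical extraction procedures of the TC method are not included.

```python
def trends(profile, tol=0.0):
    """Change trend between consecutive time points: +1 up, -1 down, 0 flat."""
    out = []
    for a, b in zip(profile, profile[1:]):
        d = b - a
        out.append(1 if d > tol else -1 if d < -tol else 0)
    return out

def best_trend_alignment(x, y, max_shift=1):
    """Best ungapped local alignment of trend sequences over small time shifts.

    Agreeing trends score +1, disagreeing trends -1 (an assumed scheme).
    The inverted partner is also tried so repressor-like pairs can be found.
    Returns (score, shift, inverted).
    """
    tx, ty = trends(x), trends(y)
    best = (0, 0, False)
    for inverted in (False, True):
        ty_used = [-t for t in ty] if inverted else ty
        for shift in range(-max_shift, max_shift + 1):
            run = 0  # Kadane-style local score along one diagonal
            for i, t in enumerate(tx):
                j = i + shift
                if 0 <= j < len(ty_used):
                    run = max(0, run + (1 if t == ty_used[j] else -1))
                    if run > best[0]:
                        best = (run, shift, inverted)
    return best

x = [0, 1, 2, 3, 2, 1, 0]   # hypothetical rise-then-fall profile
y = [3, 2, 1, 0, 1, 2, 3]   # mirror image: fall-then-rise
print(best_trend_alignment(x, y))  # (6, 0, True)
```

Comparing steps rather than levels means two genes can match even when their absolute expression values never coincide.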
Affiliation(s)
- Feng He
- Research Group Systems Biology, GBF – German Research Center for Biotechnology, Mascheroder Weg 1, 38124 Braunschweig, Germany
- An-Ping Zeng
- Research Group Systems Biology, GBF – German Research Center for Biotechnology, Mascheroder Weg 1, 38124 Braunschweig, Germany
92
Rafiq MI, O'Connor MJ, Das AK. Computational method for temporal pattern discovery in biomedical genomic databases. Proceedings. IEEE Computational Systems Bioinformatics Conference 2006:362-5. [PMID: 16447993 DOI: 10.1109/csb.2005.25] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/09/2022]
Abstract
With the rapid growth of biomedical research databases, opportunities for scientific inquiry have expanded quickly and led to a demand for computational methods that can extract biologically relevant patterns among vast amounts of data. A significant challenge is identifying temporal relationships among genotypic and clinical (phenotypic) data. Few software tools are available for such pattern matching, and they are not interoperable with existing databases. We are developing and validating a novel software method for temporal pattern discovery in biomedical genomics. In this paper, we present an efficient and flexible query algorithm (called TEMF) to extract statistical patterns from time-oriented relational databases. We show that TEMF, as an extension to our modular temporal querying application (Chronus II), can express a wide range of complex temporal aggregations without the need for data processing in a statistical software package. We show the expressivity of TEMF using example queries from the Stanford HIV Database.
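TEMF's query syntax is not shown in the abstract. As a point of comparison only, the sketch below expresses one simple temporal aggregation (mean of a lab value within a fixed window after therapy start) as plain SQL on an in-memory SQLite database. The table names, columns, window length, and values are hypothetical and are not taken from the Stanford HIV Database.

```python
import sqlite3

# Hypothetical schema and values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE therapy (patient TEXT, start_day INTEGER);
CREATE TABLE labs (patient TEXT, day INTEGER, viral_load REAL);
INSERT INTO therapy VALUES ('p1', 10), ('p2', 0);
INSERT INTO labs VALUES
  ('p1', 5, 50000), ('p1', 20, 4000), ('p1', 35, 900), ('p1', 60, 300),
  ('p2', 10, 2000), ('p2', 25, 1000), ('p2', 90, 100);
""")

# Mean and count of lab values in the 30 days after each patient's therapy start.
WINDOW_DAYS = 30
rows = conn.execute("""
    SELECT l.patient, AVG(l.viral_load) AS mean_vl, COUNT(*) AS n
    FROM labs AS l JOIN therapy AS t ON l.patient = t.patient
    WHERE l.day BETWEEN t.start_day AND t.start_day + ?
    GROUP BY l.patient
    ORDER BY l.patient
""", (WINDOW_DAYS,)).fetchall()
print(rows)  # [('p1', 2450.0, 2), ('p2', 1500.0, 2)]
```

Even this simple window requires an explicit join against each patient's anchor event; more complex temporal aggregations quickly become hard to write in plain SQL, which is the gap a dedicated temporal query layer aims to fill.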
Affiliation(s)
- Mohammed I Rafiq
- Stanford Medical Informatics, MSOB X233, Stanford, CA 94305, USA.
93
Li X, Rao S, Jiang W, Li C, Xiao Y, Guo Z, Zhang Q, Wang L, Du L, Li J, Li L, Zhang T, Wang QK. Discovery of Time-Delayed Gene Regulatory Networks based on temporal gene expression profiling. BMC Bioinformatics 2006; 7:26. [PMID: 16420705 PMCID: PMC1386718 DOI: 10.1186/1471-2105-7-26] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.7] [Received: 05/25/2005] [Accepted: 01/18/2006] [Indexed: 11/30/2022]
Abstract
Background One of the ultimate goals of modern biological research is to fully elucidate the intricate interplay and regulation of the molecular determinants that propel and characterize the progression of diverse life phenomena, such as the cell cycle, development, aging, and the progressive and recurrent pathogenesis of complex diseases. Large-scale, genome-wide time-resolved data are becoming increasingly available, providing a golden opportunity to tackle the challenging reverse-engineering problem of time-delayed gene regulatory networks. Results In particular, this methodological paper aims to reconstruct regulatory networks from temporal gene expression data by using delayed correlations between genes, i.e., pairwise overlaps of expression levels shifted in time relative to each other. We have thus developed a novel model-free computational toolbox termed TdGRN (Time-delayed Gene Regulatory Network) to address gene regulations that can span any number of time intervals. This bioinformatics toolbox provides a unified approach to uncovering time trends of gene regulations through decision analysis of the newly designed time-delayed gene expression matrix. We have applied the proposed method to the yeast and human HeLa cell cycles and have discovered most of the underlying time-delayed regulations that are supported by multiple lines of experimental evidence and that are remarkably consistent with current knowledge of the phase characteristics of the cell cycle. Conclusion We have established a usable and powerful model-free approach to dissecting high-order dynamic trends of gene-gene interactions. We have carefully validated the proposed algorithm by applying it to two publicly available cell cycle datasets.
In addition to uncovering the time trends of gene regulations in the cell cycle, this unified approach can also be used to study the complex gene regulations related to development, aging and the progressive pathogenesis of complex diseases, where potential dependencies between different experimental units might occur.
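A delayed correlation can be sketched as the Pearson correlation between a regulator's profile and a target's profile shifted later in time, scanned over a small range of lags. The example below returns the lag with the largest absolute correlation; the `max_lag` bound and toy profiles are illustrative, and TdGRN's decision analysis on the time-delayed expression matrix is not reproduced.

```python
import math

def pearson(x, y):
    """Pearson correlation; assumes neither series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def best_delay(regulator, target, max_lag=2):
    """Return (lag, r) maximizing |corr(regulator[t], target[t + lag])|.

    A positive lag means changes in the regulator precede changes in the target.
    Each lag drops `lag` points, so keep max_lag well below the series length.
    """
    best_lag, best_r = 0, 0.0
    for lag in range(max_lag + 1):
        r = pearson(regulator[:len(regulator) - lag], target[lag:])
        if abs(r) > abs(best_r):
            best_lag, best_r = lag, r
    return best_lag, best_r

# Hypothetical profiles: the target repeats the regulator two steps later.
regulator = [0, 1, 3, 2, 0, -1, 0, 2]
target = [0, 0, 0, 1, 3, 2, 0, -1]
lag, r = best_delay(regulator, target)
print(lag, round(r, 3))
```

Larger lags leave fewer overlapping points, so correlations at long lags are noisier and would need a correspondingly stricter significance threshold.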
Affiliation(s)
- Xia Li
- Department of Bioinformatics, Harbin Medical University, Harbin 150086, PR China
- Department of Computer Science, Harbin Institute of Technology, Harbin 150080, PR China
- Biomedical Engineering Institute, Capital University of Medical Sciences, Beijing 100054, PR China
- Shaoqi Rao
- Department of Bioinformatics, Harbin Medical University, Harbin 150086, PR China
- Departments of Cardiovascular Medicine and Molecular Cardiology, The Cleveland Clinic Foundation, 9500 Euclid Avenue, Cleveland, Ohio 44195, USA
- Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, 9500 Euclid Avenue, Cleveland, Ohio 44195, USA
- Wei Jiang
- Department of Bioinformatics, Harbin Medical University, Harbin 150086, PR China
- Chuanxing Li
- Department of Bioinformatics, Harbin Medical University, Harbin 150086, PR China
- Yun Xiao
- Department of Bioinformatics, Harbin Medical University, Harbin 150086, PR China
- Zheng Guo
- Department of Bioinformatics, Harbin Medical University, Harbin 150086, PR China
- Department of Computer Science, Harbin Institute of Technology, Harbin 150080, PR China
- Qingpu Zhang
- Department of Computer Science, Harbin Institute of Technology, Harbin 150080, PR China
- Lihong Wang
- Department of Bioinformatics, Harbin Medical University, Harbin 150086, PR China
- Lei Du
- Department of Bioinformatics, Harbin Medical University, Harbin 150086, PR China
- Jing Li
- Department of Bioinformatics, Harbin Medical University, Harbin 150086, PR China
- Li Li
- Department of Bioinformatics, Harbin Medical University, Harbin 150086, PR China
- Tianwen Zhang
- Department of Computer Science, Harbin Institute of Technology, Harbin 150080, PR China
- Qing K Wang
- Departments of Cardiovascular Medicine and Molecular Cardiology, The Cleveland Clinic Foundation, 9500 Euclid Avenue, Cleveland, Ohio 44195, USA
- Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, 9500 Euclid Avenue, Cleveland, Ohio 44195, USA
94
Yugi K, Nakayama Y, Kojima S, Kitayama T, Tomita M. A microarray data-based semi-kinetic method for predicting quantitative dynamics of genetic networks. BMC Bioinformatics 2005; 6:299. [PMID: 16351711 PMCID: PMC1326213 DOI: 10.1186/1471-2105-6-299] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 05/20/2005] [Accepted: 12/13/2005] [Indexed: 11/19/2022]
Abstract
Background Elucidating the dynamic behaviour of genetic regulatory networks is one of the most significant challenges in systems biology. However, conventional quantitative predictions have been limited to small networks because publicly available transcriptome data have not been extensively applied to dynamic simulation. Results We present a microarray data-based semi-kinetic (MASK) method that facilitates the prediction of the regulatory dynamics of genetic networks composed of recurrently appearing network motifs with reasonable accuracy. The MASK method allows the determination, from time-series microarray data, of model parameters representing the contribution of regulators to the transcription rate. Using a virtual regulatory network and a Saccharomyces cerevisiae ribosomal protein gene module, we confirmed that a MASK model can predict expression profiles for various conditions as accurately as a conventional kinetic model. Conclusion We have demonstrated the MASK method for the construction of dynamic simulation models of genetic networks from time-series microarray data, initial mRNA copy numbers and first-order degradation constants of mRNA. The quantitative accuracy of the MASK models has been confirmed, and the results indicate that this method enables the prediction of quantitative dynamics in genetic networks composed of commonly occurring network motifs, which cover a considerable fraction of the whole network.
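The abstract names the inputs MASK uses (time-series expression, initial mRNA copy numbers, first-order mRNA degradation constants) but not its rate equations. As a hedged illustration only, the sketch below assumes the generic first-order mRNA balance dm/dt = v(t) - k_deg * m, backs out a per-interval transcription rate v from sampled copy numbers, and re-simulates the trajectory from the initial copy number. The function names are hypothetical, and MASK's step of attributing v to individual regulators is not shown.

```python
from typing import List


def transcription_rates(copies: List[float], dt: float, k_deg: float) -> List[float]:
    """Back out a transcription rate per sampling interval from
    dm/dt = v - k_deg * m, using a forward difference for dm/dt and the
    interval midpoint (trapezoid) value for m."""
    rates = []
    for m0, m1 in zip(copies, copies[1:]):
        dmdt = (m1 - m0) / dt
        rates.append(dmdt + k_deg * 0.5 * (m0 + m1))
    return rates


def simulate(m0: float, rates: List[float], dt: float, k_deg: float) -> List[float]:
    """Integrate dm/dt = v - k_deg * m with the same trapezoid scheme,
    holding v constant on each interval."""
    traj = [m0]
    for v in rates:
        m = traj[-1]
        # Solve m_next = m + dt * (v - k_deg * (m + m_next) / 2) for m_next.
        m_next = (m + dt * (v - 0.5 * k_deg * m)) / (1.0 + 0.5 * dt * k_deg)
        traj.append(m_next)
    return traj


# Hypothetical copy numbers sampled every 1 time unit, k_deg = 0.2 per unit.
copies = [10.0, 14.0, 20.0, 18.0, 12.0]
v = transcription_rates(copies, dt=1.0, k_deg=0.2)
print(simulate(copies[0], v, dt=1.0, k_deg=0.2))  # reproduces `copies` up to rounding
```

Because the same discretization is used in both directions, the round trip is exact apart from floating-point rounding; with real, noisy array data the estimated rates would likely need smoothing before use.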
Affiliation(s)
- Katsuyuki Yugi
- Institute for Advanced Biosciences, Keio University, Tsuruoka, 997-0035, Japan
- Yoichi Nakayama
- Institute for Advanced Biosciences, Keio University, Tsuruoka, 997-0035, Japan
- Shigen Kojima
- Institute for Advanced Biosciences, Keio University, Tsuruoka, 997-0035, Japan
- Tomoya Kitayama
- Institute for Advanced Biosciences, Keio University, Tsuruoka, 997-0035, Japan
- Masaru Tomita
- Institute for Advanced Biosciences, Keio University, Tsuruoka, 997-0035, Japan
95
Simon I, Siegfried Z, Ernst J, Bar-Joseph Z. Combined static and dynamic analysis for determining the quality of time-series expression profiles. Nat Biotechnol 2005; 23:1503-8. [PMID: 16333294 DOI: 10.1038/nbt1164] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Indexed: 12/26/2022]
Abstract
Expression profiling of time-series experiments is widely used to study biological systems. However, determining the quality of the resulting profiles remains a fundamental problem. Because of inadequate sampling rates, the effect of arrest-and-release methods and loss of synchronization, the measurements obtained from a series of time points may not accurately represent the underlying expression profiles. To address this, we propose an approach that combines time-series and static (average) expression data analysis: for each gene, we determine whether its temporal expression profile can be reconciled with its static expression levels. We show that by combining synchronized and unsynchronized human cell cycle data, we can identify many cycling genes that are missed when using only time-series data. The algorithm also correctly distinguishes cycling genes from genes that specifically react to an environmental stimulus, even if they share similar temporal expression profiles. Experimental validation of these results shows the utility of this analytical approach for determining the accuracy of gene expression patterns.
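A crude version of the static-versus-dynamic consistency check can be sketched as follows, assuming that a static (unsynchronized) measurement should roughly equal the time-averaged level of the synchronized profile. The trapezoidal averaging, the log2 tolerance of 0.5, and the function names are illustrative assumptions; Simon et al. use their own statistical model, which this abstract does not detail.

```python
import math
from typing import Sequence


def time_average(times: Sequence[float], values: Sequence[float]) -> float:
    """Trapezoidal time average, so unevenly spaced samples are weighted by duration."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need matching times/values with at least two points")
    area = sum((t1 - t0) * (v0 + v1) / 2
               for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:]))
    return area / (times[-1] - times[0])


def reconcilable(times, profile, static_level: float, tol_log2: float = 0.5) -> bool:
    """True if the time-averaged profile lies within tol_log2 (in log2 units)
    of the static measurement; both must be positive expression levels."""
    avg = time_average(times, profile)
    if avg <= 0 or static_level <= 0:
        raise ValueError("expression levels must be positive")
    return abs(math.log2(avg / static_level)) <= tol_log2


print(reconcilable([0, 1, 2, 3], [1.0, 2.0, 3.0, 2.0], static_level=2.0))  # True
print(reconcilable([0, 1, 2, 3], [1.0, 1.0, 1.0, 1.0], static_level=4.0))  # False
```

A gene failing such a check would be a candidate for a time series that misrepresents its true profile, for instance through loss of synchronization.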
Affiliation(s)
- Itamar Simon
- Dept. Molecular Biology, Hebrew University Medical School, Jerusalem, Israel 91120
96
Wei GH, Liu DP, Liang CC. Charting gene regulatory networks: strategies, challenges and perspectives. Biochem J 2004; 381:1-12. [PMID: 15080794 PMCID: PMC1133755 DOI: 10.1042/bj20040311] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.0] [Received: 02/27/2004] [Revised: 04/13/2004] [Accepted: 04/13/2004] [Indexed: 11/17/2022]
Abstract
One of the foremost challenges in the post-genomic era will be to chart the gene regulatory networks of cells, including aspects such as genome annotation, identification of cis-regulatory elements and transcription factors, information on protein-DNA and protein-protein interactions, and data mining and integration. Some of these broad sets of data have already been assembled for building networks of gene regulation. Even though these datasets are still far from comprehensive, and the approach faces many important and difficult challenges, some strategies have begun to make connections between disparate regulatory events and to foster new hypotheses. In this article we review several different genomics and proteomics technologies, and present bioinformatics methods for exploring these data in order to make novel discoveries.
Affiliation(s)
- Gong-Hong Wei
- National Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences (CAMS) and Peking Union Medical College (PUMC), 5 Dong Dan San Tiao, Beijing 100005, P.R. China
- De-Pei Liu
- National Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences (CAMS) and Peking Union Medical College (PUMC), 5 Dong Dan San Tiao, Beijing 100005, P.R. China
- To whom correspondence should be addressed (e-mail )
- Chih-Chuan Liang
- National Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences (CAMS) and Peking Union Medical College (PUMC), 5 Dong Dan San Tiao, Beijing 100005, P.R. China
97
Balasubramaniyan R, Hüllermeier E, Weskamp N, Kämper J. Clustering of gene expression data using a local shape-based similarity measure. Bioinformatics 2004; 21:1069-77. [PMID: 15513997 DOI: 10.1093/bioinformatics/bti095] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.6] [Indexed: 11/15/2022]
Abstract
MOTIVATION Microarray technology enables the study of gene expression on a large scale. Data-analysis methods then allow genes that show a similar expression profile, and are thus likely to be co-regulated, to be grouped together. A relationship among genes at the biological level often presents itself as locally similar and potentially time-shifted patterns in their expression profiles. RESULTS Here, we propose a new method (CLARITY; Clustering with Local shApe-based similaRITY) for the analysis of microarray time-course experiments that uses a local shape-based similarity measure based on Spearman rank correlation. This measure does not require normalization of the expression data and is comparatively robust to noise. It is also able to detect similar and even time-shifted sub-profiles. To this end, we implemented an approach motivated by the BLAST algorithm for sequence alignment. We used CLARITY to cluster the time series of gene expression data during the mitotic cell cycle of the yeast Saccharomyces cerevisiae. The obtained clusters were related to the MIPS functional classification to assess their biological significance. We found that several clusters were significantly enriched with genes that share similar or related functions.
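The core of a local, time-shifted, rank-based similarity can be sketched in Python as below. This is an assumption-laden illustration: it exhaustively compares fixed-length windows of two profiles with Spearman's rank correlation (computed as the Pearson correlation of average ranks) and allows the windows to be offset by up to `max_shift` time points. CLARITY itself uses a BLAST-motivated search and its own scoring, which this sketch does not reproduce; all names are hypothetical.

```python
from typing import List, Sequence, Tuple


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg_rank
        i = j + 1
    return r


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the ranks (handles ties)."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def best_local_match(x, y, window: int, max_shift: int) -> Tuple[int, int, float]:
    """Highest rho over windows x[i:i+window] and y[j:j+window] with |i - j| <= max_shift."""
    best = (0, 0, -2.0)  # rho lies in [-1, 1], so any scored window beats this
    for i in range(len(x) - window + 1):
        for j in range(len(y) - window + 1):
            if abs(i - j) > max_shift:
                continue
            rho = spearman(x[i:i + window], y[j:j + window])
            if rho > best[2]:
                best = (i, j, rho)
    return best


# Hypothetical profiles: y repeats the start of x two time points later.
x = [1, 5, 2, 8, 3, 9, 4, 7]
y = [0, 0] + x[:6]
print(best_local_match(x, y, window=4, max_shift=3))  # (0, 2, 1.0)
```

Because only ranks are compared, multiplying a profile by a positive constant or adding a constant to it leaves rho unchanged, which is consistent with the abstract's point that no normalization is required.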
Affiliation(s)
- Rajarajeswari Balasubramaniyan
- Max-Planck Institute for Terrestrial Microbiology, Department of Organismic Interactions Karl-von-Frisch-Strasse, 35043 Marburg, Germany
98
Xia Y, Yu H, Jansen R, Seringhaus M, Baxter S, Greenbaum D, Zhao H, Gerstein M. Analyzing cellular biochemistry in terms of molecular networks. Annu Rev Biochem 2004; 73:1051-87. [PMID: 15189167 DOI: 10.1146/annurev.biochem.73.011303.073950] [Citation(s) in RCA: 87] [Impact Index Per Article: 4.1] [Indexed: 11/09/2022]
Abstract
One way to understand cells and circumscribe the function of proteins is through molecular networks. These networks take a variety of forms including webs of protein-protein interactions, regulatory circuits linking transcription factors and targets, and complex pathways of metabolic reactions. We first survey experimental techniques for mapping networks (e.g., the yeast two-hybrid screens). We then turn our attention to computational approaches for predicting networks from individual protein features, such as correlating gene expression levels or analyzing sequence coevolution. All the experimental techniques and individual predictions suffer from noise and systematic biases. These problems can be overcome to some degree through statistical integration of different experimental datasets and predictive features (e.g., within a Bayesian formalism). Next, we discuss approaches for characterizing the topology of networks, such as finding hubs and analyzing subnetworks in terms of common motifs. Finally, we close with perspectives on how network analysis represents a preliminary step toward a systems approach for modeling cells.
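Finding hubs, one of the topology analyses this review mentions, reduces in its simplest form to counting node degrees. A minimal sketch follows, assuming an undirected edge list and a degree cutoff chosen purely for illustration; the function names and the cutoff are hypothetical and are not taken from the review.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def degrees(edges: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Degree of each node in an undirected graph; duplicate edges
    (in either orientation) and self-loops are ignored."""
    unique = {frozenset(e) for e in edges if e[0] != e[1]}
    deg: Counter = Counter()
    for edge in unique:
        a, b = tuple(edge)
        deg[a] += 1
        deg[b] += 1
    return dict(deg)


def hubs(edges, min_degree: int) -> List[str]:
    """Nodes whose degree is at least min_degree, sorted by name."""
    return sorted(n for n, d in degrees(edges).items() if d >= min_degree)


# Hypothetical toy network; ("B", "A") duplicates ("A", "B").
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "A")]
print(hubs(edges, min_degree=3))  # ['A']
```

In practice a hub threshold would be set relative to the network's degree distribution rather than fixed in advance.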
Affiliation(s)
- Yu Xia
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA.
99
100
Coulouarn C, Lefebvre G, Derambure C, Lequerre T, Scotte M, Francois A, Cellier D, Daveau M, Salier JP. Altered gene expression in acute systemic inflammation detected by complete coverage of the human liver transcriptome. Hepatology 2004; 39:353-364. [PMID: 14767988 DOI: 10.1002/hep.20052] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.3] [Indexed: 12/07/2022]
Abstract
The goal of the current study was to provide complete coverage of the liver transcriptome with human probes corresponding to every gene expressed in embryonic, adult, and/or cancerous liver. We developed dedicated tools, namely, the Liverpool nylon array of complementary DNA (cDNA) probes for approximately 10,000 nonredundant genes and the LiverTools database. Inflammation-induced transcriptome changes were studied in liver tissue samples from patients with an acute systemic inflammation and from control subjects. One hundred and fifty-four messenger RNAs (mRNAs) correlated statistically with the extent of inflammation. Of these, 134 mRNAs had not previously been associated with an acute-phase (AP) response. The hepatocyte origin and proinflammatory cytokine responsiveness of these mRNAs were confirmed by quantitative reverse-transcription polymerase chain reaction (Q-RT-PCR) in cytokine-challenged hepatoma cells. The corresponding gene promoters were enriched in potential binding sites for inflammation-driven transcription factors in the liver. Some of the corresponding proteins may provide novel blood markers of clinical relevance. The mRNAs whose levels are most correlated with the AP extent (P < .05) were enriched in intracellular signaling molecules, transcription factors, glycosylation enzymes, and up-regulated plasma proteins. In conclusion, the hepatocyte responded to the AP extent by fine-tuning some mRNA levels, controlling most, if not all, intracellular events from early signaling to the final secretion of proteins involved in innate immunity. Supplementary material for this article can be found on the HEPATOLOGY website (http://interscience.wiley.com/jpages/0270-9139/suppmat/index.html).
Affiliation(s)
- Cédric Coulouarn
- INSERM Unité 519 and Faculté de Médecine-Pharmacie, Institut Fédératif de Recherches Multidisciplinaires sur les Peptides, Rouen, France