1. Huang T, Xiao H, Tian Q, He Z, Yuan C, Lin Z, Gao X, Yao M. Identification of upstream transcription factor binding sites in orthologous genes using mixed Student’s t-test statistics. PLoS Comput Biol 2022; 18:e1009773. [PMID: 35671296 PMCID: PMC9205514 DOI: 10.1371/journal.pcbi.1009773] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/17/2021] [Revised: 06/17/2022] [Accepted: 04/30/2022] [Indexed: 11/18/2022] [Open Access]
Abstract
Background Transcription factors (TFs) regulate the transcription of DNA to messenger RNA by binding to upstream sequence motifs. Identifying the locations of known motifs in whole genomes is computationally intensive. Methodology/Principal findings This study presents a computational tool, named “Grit”, for screening TF-binding sites (TFBS) by coordinating transcription factors to their promoter sequences in orthologous genes. This tool employs a newly developed mixed Student’s t-test statistical method that detects high-scoring binding sites utilizing conservation information among species. The program performs sequence scanning at a rate of 3.2 Mbp/s on a quad-core Amazon server and has been benchmarked against well-established ChIP-Seq datasets, putting Grit amongst the top-ranked TFBS predictors. It significantly outperforms the well-known transcription factor motif scanning tools, Pscan (4.8%) and FIMO (17.8%), in analyzing well-documented ChIP-Atlas human genome ChIP-Seq datasets. Significance Grit is a good alternative to currently available motif scanning tools. Locating transcription factor-binding (TF-binding) sites in the genome and identifying their function are fundamental to understanding various biological processes. Improving the performance of prediction tools is important because accurate TF-binding site prediction can save cost and time in wet-lab experiments. Also, genome-wide TF-binding site prediction can provide new insights into transcriptome regulation from a systems biology perspective. This study developed a new TF-binding site prediction tool based on a mixed Student’s t-test statistical method. The tool is amongst the top-ranked TF-binding site predictors; as such, it can help researchers identify TF-binding sites and interpret the transcriptional regulation mechanisms of genes.
Affiliation(s)
- Tinghua Huang
  - College of Animal Science, Yangtze University, Jingzhou, China
- Hong Xiao
  - College of Animal Science, Yangtze University, Jingzhou, China
- Qi Tian
  - College of Animal Science, Yangtze University, Jingzhou, China
- Zhen He
  - College of Animal Science, Yangtze University, Jingzhou, China
- Cheng Yuan
  - College of Animal Science, Yangtze University, Jingzhou, China
- Zezhao Lin
  - College of Animal Science, Yangtze University, Jingzhou, China
- Xuejun Gao
  - College of Animal Science, Yangtze University, Jingzhou, China
  - * E-mail: (XG); (MY)
- Min Yao
  - College of Animal Science, Yangtze University, Jingzhou, China
  - * E-mail: (XG); (MY)
2. Boeva V. Analysis of Genomic Sequence Motifs for Deciphering Transcription Factor Binding and Transcriptional Regulation in Eukaryotic Cells. Front Genet 2016; 7:24. [PMID: 26941778 PMCID: PMC4763482 DOI: 10.3389/fgene.2016.00024] [Citation(s) in RCA: 87] [Impact Index Per Article: 10.9] [Received: 10/30/2015] [Accepted: 02/05/2016] [Indexed: 12/27/2022] [Open Access]
Abstract
Eukaryotic genomes contain a variety of structured patterns: repetitive elements, binding sites of DNA and RNA associated proteins, splice sites, and so on. Often, these structured patterns can be formalized as motifs and described using a proper mathematical model such as position weight matrix and IUPAC consensus. Two key tasks are typically carried out for motifs in the context of the analysis of genomic sequences. These are: identification in a set of DNA regions of over-represented motifs from a particular motif database, and de novo discovery of over-represented motifs. Here we describe existing methodology to perform these two tasks for motifs characterizing transcription factor binding. When applied to the output of ChIP-seq and ChIP-exo experiments, or to promoter regions of co-modulated genes, motif analysis techniques allow for the prediction of transcription factor binding events and enable identification of transcriptional regulators and co-regulators. The usefulness of motif analysis is further exemplified in this review by how motif discovery improves peak calling in ChIP-seq and ChIP-exo experiments and, when coupled with information on gene expression, allows insights into physical mechanisms of transcriptional modulation.
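To make the position weight matrix model mentioned in this abstract concrete, the following is a minimal Python sketch of scoring a sequence against a log-odds matrix. It is an illustration only, not the method of any tool reviewed by Boeva: the function names `build_log_odds` and `scan`, the uniform background of 0.25, the pseudocount of 1, and the toy counts are all assumptions made for this example.

```python
import math

def build_log_odds(counts, background=0.25, pseudocount=1.0):
    """Convert per-position base counts into log2-odds scores (hypothetical helper)."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm

def scan(sequence, pwm):
    """Return (offset, score) for every window as wide as the motif."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits

# Made-up counts for a 4-bp motif that favours "TATA"; not taken from any database.
toy_counts = [
    {"T": 8, "A": 1, "C": 1, "G": 0},
    {"A": 8, "T": 1, "C": 0, "G": 1},
    {"T": 9, "A": 1, "C": 0, "G": 0},
    {"A": 7, "T": 1, "C": 1, "G": 1},
]
toy_pwm = build_log_odds(toy_counts)
all_hits = scan("GGCTATAGGC", toy_pwm)
best_offset, best_score = max(all_hits, key=lambda hit: hit[1])
```

In this toy case the highest-scoring window starts where "TATA" occurs in the sequence. Real scanners typically also score the reverse strand and convert scores to p-values, which this sketch omits.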
Affiliation(s)
- Valentina Boeva
  - Centre de Recherche, Institut Curie, Paris, France; INSERM, U900, Paris, France; Mines ParisTech, Fontainebleau, France; PSL Research University, Paris, France; Department of Development, Reproduction and Cancer, Institut Cochin, Paris, France; INSERM, U1016, Paris, France; Centre National de la Recherche Scientifique UMR 8104, Paris, France; Université Paris Descartes UMR-S1016, Paris, France
3. A novel pairwise comparison method for in silico discovery of statistically significant cis-regulatory elements in eukaryotic promoter regions: application to Arabidopsis. J Theor Biol 2014; 364:364-76. [PMID: 25303887 DOI: 10.1016/j.jtbi.2014.09.038] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 05/18/2014] [Revised: 09/27/2014] [Accepted: 09/29/2014] [Indexed: 11/22/2022]
Abstract
Cis regulatory elements (CREs), located within promoter regions, play a significant role in the blueprint for transcriptional regulation of genes. There is a growing interest to study the combinatorial nature of CREs including presence or absence of CREs, the number of occurrences of each CRE, as well as of their order and location relative to their target genes. Comparative promoter analysis has been shown to be a reliable strategy to test the significance of each component of promoter architecture. However, it remains unclear what level of difference in the number of occurrences of each CRE is of statistical significance in order to explain different expression patterns of two genes. In this study, we present a novel statistical approach for pairwise comparison of promoters of Arabidopsis genes in the context of number of occurrences of each CRE within the promoters. First, using the sample of 1000 Arabidopsis promoters, the results of the goodness of fit test and non-parametric analysis revealed that the number of occurrences of CREs in a promoter sequence is Poisson distributed. As a promoter sequence contained functional and non-functional CREs, we addressed the issue of the statistical distribution of functional CREs by analyzing the ChIP-seq datasets. The results showed that the number of occurrences of functional CREs over the genomic regions was determined as being Poisson distributed. In accordance with the obtained distribution of CREs occurrences, we suggested the Audic and Claverie (AC) test to compare two promoters based on the number of occurrences for the CREs. Superiority of the AC test over Chi-square (2×2) and Fisher's exact tests was also shown, as the AC test was able to detect a higher number of significant CREs. The two case studies on the Arabidopsis genes were performed in order to biologically verify the pairwise test for promoter comparison. 
Consequently, a number of CREs with significantly different occurrences was identified between the promoters. The results of the pairwise comparative analysis together with the expression data for the studied genes revealed the biological significance of the identified CREs.
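The Audic and Claverie (AC) test named in this abstract compares two Poisson-distributed counts. As a rough sketch of how such a comparison can be computed, the Python below evaluates the AC conditional probability of seeing `y` occurrences in one promoter given `x` in the other. The function names, the scaling arguments `n1` and `n2` (for example, promoter lengths), and the two-sided convention of doubling the smaller tail are assumptions for illustration; the abstract does not state how the authors computed their p-values.

```python
import math

def ac_probability(y, x, n1=1.0, n2=1.0):
    """Audic-Claverie P(y | x): chance of y counts in sample 2 given x counts in sample 1."""
    ratio = n2 / n1
    # Work in log space so factorials of large counts do not overflow.
    log_p = (
        y * math.log(ratio)
        + math.lgamma(x + y + 1)
        - math.lgamma(x + 1)
        - math.lgamma(y + 1)
        - (x + y + 1) * math.log(1.0 + ratio)
    )
    return math.exp(log_p)

def ac_p_value(x, y, n1=1.0, n2=1.0):
    """Two-sided p-value as twice the smaller tail, capped at 1 (an assumed convention)."""
    lower = sum(ac_probability(k, x, n1, n2) for k in range(0, y + 1))
    upper = 1.0 - lower + ac_probability(y, x, n1, n2)
    return min(1.0, 2.0 * min(lower, upper))

# Hypothetical counts of one CRE in two promoters of equal length.
p_similar = ac_p_value(5, 5)
p_different = ac_p_value(2, 30)
```

With equal scaling, identical counts give a p-value near 1, while a large gap such as 2 versus 30 occurrences gives a very small one.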
4. Elucidation of regulatory mechanisms revealed by human promoter sequence analysis of genes co-expressed in forskolin-treated theca cells in PCOS. Arch Gynecol Obstet 2012; 287:477-85. [DOI: 10.1007/s00404-012-2580-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/07/2012] [Accepted: 09/20/2012] [Indexed: 10/27/2022]
5. Small interfering RNA against transcription factor STAT6 leads to increased cholesterol synthesis in lung cancer cell lines. PLoS One 2011; 6:e28509. [PMID: 22162773 PMCID: PMC3230611 DOI: 10.1371/journal.pone.0028509] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 06/09/2011] [Accepted: 11/09/2011] [Indexed: 01/31/2023] [Open Access]
Abstract
The STAT6 transcription factor has become a potential molecule for therapeutic intervention because it regulates a broad range of cellular processes in a large variety of cell types. Although some target genes and interacting partners of STAT6 have been identified, its exact mechanism of action needs to be elucidated. In this study, we sought to further characterize the molecular interactions, networks, and functions of STAT6 by profiling the mRNA expression of STAT6-silenced human lung cells (NCI-H460) using microarrays. Our analysis revealed 273 differentially expressed genes after STAT6 silencing. Analysis of the gene expression data with Ingenuity Pathway Analysis (IPA) software revealed Gene expression, Cell death, and Lipid metabolism as the functions associated with the highest-rated network. Cholesterol biosynthesis was among the most enriched pathways in IPA as well as in PANTHER analysis. These results have been validated by real-time PCR and cholesterol assay using scrambled siRNA as a negative control. Similar findings were also observed with human type II pulmonary alveolar epithelial cells, A549. In the present study we have, for the first time, shown the inverse relationship of STAT6 with cholesterol biosynthesis in lung cancer cells. The present findings are potentially significant for advancing the understanding and design of therapeutics for pathological conditions in which both STAT6 and cholesterol biosynthesis are implicated, viz. asthma, atherosclerosis, etc.
6. Transcriptional changes of secreted Wnt antagonists in hindlimb skeletal muscle during the lifetime of the C57BL/6J mouse. Mech Ageing Dev 2011; 132:511-4. [PMID: 21855563 DOI: 10.1016/j.mad.2011.07.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2011] [Revised: 07/12/2011] [Accepted: 07/31/2011] [Indexed: 11/22/2022]
Abstract
The canonical Wnt pathway plays a critical role in myogenesis and age-related inefficient muscle regeneration. To gain insights into changes in Wnt signaling in muscle during the lifetime of a mouse, mRNA levels of secreted Wnt antagonists were investigated. Among 13 analyzed antagonists, seven genes were found to be down-regulated in skeletal muscles of adult and old mice. Epigenetic modifications at the promoter regions of these seven Wnt antagonists were then examined to understand how these correlate with this transcriptional repression. DNA methylation was stably maintained, while chromatin modifications changed to transcriptionally inactive states over the course of a lifetime. Similar patterns of changes in chromatin modifications were observed at the promoters of all of the studied genes. The observations indicated that an upstream factor might regulate the chromatin states and the transcriptional repression of Wnt antagonists. Several bioinformatic analyses revealed that a FOXD3 binding motif is present within promoter regions of the seven antagonists. Furthermore, age-dependent differential FOXD3 binding is observed at the motifs of the seven gene promoters. Our results suggest that FOXD3 as a potential epigenetic regulator may mediate the transcriptional repression of the seven antagonists, possibly through regulation of histone modifications.
7. Li R, Ackerman WE, Summerfield TL, Yu L, Gulati P, Zhang J, Huang K, Romero R, Kniss DA. Inflammatory gene regulatory networks in amnion cells following cytokine stimulation: translational systems approach to modeling human parturition. PLoS One 2011; 6:e20560. [PMID: 21655103 PMCID: PMC3107214 DOI: 10.1371/journal.pone.0020560] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 01/21/2011] [Accepted: 05/05/2011] [Indexed: 11/18/2022] [Open Access]
Abstract
A majority of the studies examining the molecular regulation of human labor have been conducted using single gene approaches. While the technology to produce multi-dimensional datasets is readily available, the means for facile analysis of such data are limited. The objective of this study was to develop a systems approach to infer regulatory mechanisms governing global gene expression in cytokine-challenged cells in vitro, and to apply these methods to predict gene regulatory networks (GRNs) in intrauterine tissues during term parturition. To this end, microarray analysis was applied to human amnion mesenchymal cells (AMCs) stimulated with interleukin-1β, and differentially expressed transcripts were subjected to hierarchical clustering, temporal expression profiling, and motif enrichment analysis, from which a GRN was constructed. These methods were then applied to fetal membrane specimens collected in the absence or presence of spontaneous term labor. Analysis of cytokine-responsive genes in AMCs revealed a sterile immune response signature, with promoters enriched in response elements for several inflammation-associated transcription factors. In comparison to the fetal membrane dataset, there were 34 genes commonly upregulated, many of which were part of an acute inflammation gene expression signature. Binding motifs for nuclear factor-κB were prominent in the gene interaction and regulatory networks for both datasets; however, we found little evidence to support the utilization of pathogen-associated molecular pattern (PAMP) signaling. The tissue specimens were also enriched for transcripts governed by hypoxia-inducible factor. The approach presented here provides an uncomplicated means to infer global relationships among gene clusters involved in cellular responses to labor-associated signals.
Affiliation(s)
- Ruth Li
  - Division of Maternal-Fetal Medicine and Laboratory of Perinatal Research, Department of Obstetrics and Gynecology, The Ohio State University, Columbus, Ohio, United States of America
- William E. Ackerman
  - Division of Maternal-Fetal Medicine and Laboratory of Perinatal Research, Department of Obstetrics and Gynecology, The Ohio State University, Columbus, Ohio, United States of America
- Taryn L. Summerfield
  - Division of Maternal-Fetal Medicine and Laboratory of Perinatal Research, Department of Obstetrics and Gynecology, The Ohio State University, Columbus, Ohio, United States of America
- Lianbo Yu
  - Center for Biostatistics, The Ohio State University, Columbus, Ohio, United States of America
- Parul Gulati
  - Center for Biostatistics, The Ohio State University, Columbus, Ohio, United States of America
- Jie Zhang
  - Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, United States of America
- Kun Huang
  - Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, United States of America
- Roberto Romero
  - Perinatology Research Branch, Intramural Division, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Department of Health and Human Services, Bethesda, Maryland, United States of America
  - Hutzel Women's Hospital, Detroit, Michigan, United States of America
- Douglas A. Kniss
  - Division of Maternal-Fetal Medicine and Laboratory of Perinatal Research, Department of Obstetrics and Gynecology, The Ohio State University, Columbus, Ohio, United States of America
  - Department of Biomedical Engineering, The Ohio State University, Columbus, Ohio, United States of America
  - * E-mail:
8. Tapia A, Vilos C, Marín JC, Croxatto HB, Devoto L. Bioinformatic detection of E47, E2F1 and SREBP1 transcription factors as potential regulators of genes associated to acquisition of endometrial receptivity. Reprod Biol Endocrinol 2011; 9:14. [PMID: 21272326 PMCID: PMC3040129 DOI: 10.1186/1477-7827-9-14] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Received: 11/03/2010] [Accepted: 01/27/2011] [Indexed: 02/06/2023] [Open Access]
Abstract
BACKGROUND The endometrium is a dynamic tissue whose changes are driven by the ovarian steroidal hormones. Its main function is to provide an adequate substrate for embryo implantation. Using microarray technology, several reports have provided the gene expression patterns of human endometrial tissue during the window of implantation. However it is required that biological connections be made across these genomic datasets to take full advantage of them. The objective of this work was to perform a research synthesis of available gene expression profiles related to acquisition of endometrial receptivity for embryo implantation, in order to gain insights into its molecular basis and regulation. METHODS Gene expression datasets were intersected to determine a consensus endometrial receptivity transcript list (CERTL). For this cluster of genes we determined their functional annotations using available web-based databases. In addition, promoter sequences were analyzed to identify putative transcription factor binding sites using bioinformatics tools and determined over-represented features. RESULTS We found 40 up- and 21 down-regulated transcripts in the CERTL. Those more consistently increased were C4BPA, SPP1, APOD, CD55, CFD, CLDN4, DKK1, ID4, IL15 and MAP3K5 whereas the more consistently decreased were OLFM1, CCNB1, CRABP2, EDN3, FGFR1, MSX1 and MSX2. Functional annotation of CERTL showed it was enriched with transcripts related to the immune response, complement activation and cell cycle regulation. Promoter sequence analysis of genes revealed that DNA binding sites for E47, E2F1 and SREBP1 transcription factors were the most consistently over-represented and in both up- and down-regulated genes during the window of implantation. CONCLUSIONS Our research synthesis allowed organizing and mining high throughput data to explore endometrial receptivity and focus future research efforts on specific genes and pathways. 
The discovery of possible new transcription factors orchestrating the CERTL opens new alternatives for understanding gene expression regulation in uterine function.
Affiliation(s)
- Alejandro Tapia
  - Instituto de Investigaciones Materno Infantil (IDIMI), Facultad de Medicina, Universidad de Chile, Santiago, Chile
- Cristian Vilos
  - Facultad de Química y Biología, Universidad de Santiago de Chile, Santiago, Chile
- Horacio B Croxatto
  - Facultad de Química y Biología, Universidad de Santiago de Chile, Santiago, Chile
  - Centro para el Desarrollo de la Nanociencia y la Nanotecnología (CEDENNA), Santiago, Chile
- Luigi Devoto
  - Instituto de Investigaciones Materno Infantil (IDIMI), Facultad de Medicina, Universidad de Chile, Santiago, Chile
  - Centro FONDAP de Estudios Moleculares de la Célula (CEMC), Santiago, Chile
9. Zhu J, Weiss M, Grubman MJ, de los Santos T. Differential gene expression in bovine cells infected with wild type and leaderless foot-and-mouth disease virus. Virology 2010; 404:32-40. [PMID: 20494391 DOI: 10.1016/j.virol.2010.04.021] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 12/29/2009] [Revised: 01/18/2010] [Accepted: 04/22/2010] [Indexed: 10/19/2022]
Abstract
The leader proteinase (L(pro)) of foot-and-mouth disease virus (FMDV) plays a critical role in viral pathogenesis. Molecular studies have demonstrated that L(pro) inhibits translation of host capped mRNAs and transcription of some genes involved in the innate immune response. We have used microarray technology to study the gene expression profile of bovine cells infected with wild type (WT) or leaderless FMDV. Thirty-nine out of approximately 22,000 bovine genes were selectively up-regulated by 2-fold or more in leaderless versus WT virus-infected cells. Most of the up-regulated genes corresponded to IFN-inducible genes, chemokines or transcription factors. Comparison of promoter sequences suggested that host factors NF-kappaB, ISGF3G and IRF1 specifically contributed to the differential expression, with NF-kappaB being primarily responsible for the observed changes. Our results suggest that L(pro) plays a central role in the FMDV evasion of the innate immune response by inhibiting NF-kappaB dependent gene expression.
Affiliation(s)
- James Zhu
  - Plum Island Animal Disease Center, North Atlantic Area, Agricultural Research Service, U.S. Department of Agriculture, Greenport, New York 11944, USA
10. McLeay RC, Bailey TL. Motif Enrichment Analysis: a unified framework and an evaluation on ChIP data. BMC Bioinformatics 2010; 11:165. [PMID: 20356413 PMCID: PMC2868005 DOI: 10.1186/1471-2105-11-165] [Citation(s) in RCA: 449] [Impact Index Per Article: 32.1] [Received: 12/07/2009] [Accepted: 04/01/2010] [Indexed: 01/01/2023] [Open Access]
Abstract
Background A major goal of molecular biology is determining the mechanisms that control the transcription of genes. Motif Enrichment Analysis (MEA) seeks to determine which DNA-binding transcription factors control the transcription of a set of genes by detecting enrichment of known binding motifs in the genes' regulatory regions. Typically, the biologist specifies a set of genes believed to be co-regulated and a library of known DNA-binding models for transcription factors, and MEA determines which (if any) of the factors may be direct regulators of the genes. Since the number of factors with known DNA-binding models is rapidly increasing as a result of high-throughput technologies, MEA is becoming increasingly useful. In this paper, we explore ways to make MEA applicable in more settings, and evaluate the efficacy of a number of MEA approaches. Results We first define a mathematical framework for Motif Enrichment Analysis that relaxes the requirement that the biologist input a selected set of genes. Instead, the input consists of all regulatory regions, each labeled with the level of a biological signal. We then define and implement a number of motif enrichment analysis methods. Some of these methods require a user-specified signal threshold, some identify an optimum threshold in a data-driven way and two of our methods are threshold-free. We evaluate these methods, along with two existing methods (Clover and PASTAA), using yeast ChIP-chip data. Our novel threshold-free method based on linear regression performs best in our evaluation, followed by the data-driven PASTAA algorithm. The Clover algorithm performs as well as PASTAA if the user-specified threshold is chosen optimally. Data-driven methods based on three statistical tests–Fisher Exact Test, rank-sum test, and multi-hypergeometric test—perform poorly, even when the threshold is chosen optimally. These methods (and Clover) perform even worse when unrestricted data-driven threshold determination is used. 
Conclusions Our novel, threshold-free linear regression method works well on ChIP-chip data. Methods using data-driven threshold determination can perform poorly unless the range of thresholds is limited a priori. The limits implemented in PASTAA, however, appear to be well-chosen. Our novel algorithms—AME (Analysis of Motif Enrichment)—are available at http://bioinformatics.org.au/ame/.
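The threshold-free idea described in this abstract, where every regulatory region carries a biological signal level rather than a yes/no label, can be illustrated with an ordinary least-squares fit of signal against motif score. The Python sketch below is only a reading of that idea under stated assumptions: the function name `least_squares`, the per-region motif scores, and the toy numbers are hypothetical, and the abstract does not specify how AME defines its regression or its significance measure.

```python
import math

def least_squares(scores, signals):
    """Fit signals ~ intercept + slope * scores; return (slope, intercept, r)."""
    n = len(scores)
    mean_x = sum(scores) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in scores)
    syy = sum((y - mean_y) ** 2 for y in signals)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, signals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation between score and signal
    return slope, intercept, r

# Made-up data: one motif score and one signal level (e.g. ChIP enrichment) per region.
motif_scores = [0.0, 1.0, 2.0, 3.0]
signal_levels = [1.0, 3.0, 5.0, 7.0]
slope, intercept, r = least_squares(motif_scores, signal_levels)
```

Because no cutoff splits regions into "bound" and "unbound" sets, a fit of this kind uses all regions at once; a strong positive slope and correlation would suggest that regions with better motif matches tend to carry more signal.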
Affiliation(s)
- Robert C McLeay
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland 4072, Australia
11. Pandey AK, Munjal N, Datta M. Gene expression profiling and network analysis reveals lipid and steroid metabolism to be the most favored by TNFalpha in HepG2 cells. PLoS One 2010; 5:e9063. [PMID: 20140224 PMCID: PMC2816217 DOI: 10.1371/journal.pone.0009063] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 05/06/2009] [Accepted: 01/12/2010] [Indexed: 12/11/2022] [Open Access]
Abstract
Background The proinflammatory cytokine, TNFα, is a crucial mediator of the pathogenesis of several diseases, more so in cases involving the liver wherein it is critical in maintaining liver homeostasis since it is a major determiner of hepatocyte life and death. Gene expression profiling serves as an appropriate strategy to unravel the underlying signatures to envisage such varied responses and considering this, gene transcription profiling was examined in control and TNFα treated HepG2 cells. Methods and Findings Microarray experiments between control and TNFα treated HepG2 cells indicated that TNFα could significantly alter the expression profiling of 140 genes; among those up-regulated, several GO (Gene Ontology) terms related to lipid and fat metabolism were significantly (p<0.01) overrepresented indicating a global preference of fat metabolism within the hepatocyte and those within the down-regulated dataset included genes involved in several aspects of the immune response like immunoglobulin receptor activity and IgE binding thereby indicating a compromise in the immune defense mechanism(s). Conserved transcription factor binding sites were identified in identically clustered genes within a common GO term and SREBP-1 and FOXJ2 depicted increased occupation of their respective binding elements in the presence of TNFα. The interacting network of “lipid metabolism, small molecule biochemistry” was derived to be significantly overrepresented that correlated well with the top canonical pathway of “biosynthesis of steroids”. Conclusions TNFα alters the transcriptome profiling within HepG2 cells with an interesting catalog of genes being affected and those involved in lipid and steroid metabolism to be the most favored. 
This study represents a composite analysis of the effects of TNFα in HepG2 cells that encompasses the altered transcriptome profiling, the functional analysis of the up- and down- regulated genes and the identification of conserved transcription factor binding sites. These could possibly determine TNFα mediated alterations mainly the phenotypes of hepatic steatosis and fatty liver associated with several hepatic pathological states.
Affiliation(s)
- Amit K. Pandey
  - Institute of Genomics and Integrative Biology (Council of Scientific and Industrial Research), Delhi, India
- Neha Munjal
  - Institute of Genomics and Integrative Biology (Council of Scientific and Industrial Research), Delhi, India
- Malabika Datta
  - Institute of Genomics and Integrative Biology (Council of Scientific and Industrial Research), Delhi, India
  - * E-mail:
12. TaNF-YC11, one of the light-upregulated NF-YC members in Triticum aestivum, is co-regulated with photosynthesis-related genes. Funct Integr Genomics 2010; 10:265-76. [PMID: 20111976 DOI: 10.1007/s10142-010-0158-3] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.4] [Received: 09/18/2009] [Revised: 12/27/2009] [Accepted: 01/01/2010] [Indexed: 10/19/2022]
Abstract
Nuclear factor Y (NF-Y) is a heterotrimeric transcription factor complex. Each of the NF-Y subunits (NF-YA, NF-YB and NF-YC) in plants is encoded by multiple genes. Quantitative RT-PCR analysis revealed that five wheat NF-YC members (TaNF-YC5, 8, 9, 11 and 12) were upregulated by light in both the leaf and seedling shoot. Co-expression analysis of Affymetrix wheat genome array datasets revealed that transcript levels of a large number of genes were consistently correlated with those of the TaNF-YC11 and TaNF-YC8 genes in three to four separate Affymetrix array datasets. TaNF-YC11-correlated transcripts were significantly enriched with the Gene Ontology term photosynthesis. Sequence analysis in the promoters of TaNF-YC11-correlated genes revealed the presence of putative NF-Y complex binding sites (CCAAT motifs). Quantitative RT-PCR analysis of a subset of potential TaNF-YC11 target genes showed that ten out of the 13 genes were also light-upregulated in both the leaf and seedling shoot and had significantly correlated expression profiles with TaNF-YC11. The potential target genes for TaNF-YC11 include subunit members from all four thylakoid membrane-bound complexes required for the conversion of solar energy into chemical energy and rate-limiting enzymes in the Calvin cycle. These data indicate that TaNF-YC11 is potentially involved in regulation of photosynthesis-related genes.
13. Lee WP, Tzou WS. Computational methods for discovering gene networks from expression data. Brief Bioinform 2009; 10:408-23. [PMID: 19505889 DOI: 10.1093/bib/bbp028] [Citation(s) in RCA: 87] [Impact Index Per Article: 5.8] [Indexed: 11/12/2022] [Open Access]
Abstract
Designing and conducting experiments are routine practices for modern biologists. The real challenge, especially in the post-genome era, usually comes not from acquiring data, but from subsequent activities such as data processing, analysis, knowledge generation and gaining insight into the research question of interest. The approach of inferring gene regulatory networks (GRNs) has been flourishing for many years, and new methods from mathematics, information science, engineering and social sciences have been applied. We review different kinds of computational methods biologists use to infer networks of varying levels of accuracy and complexity. The primary concern of biologists is how to translate the inferred network into hypotheses that can be tested with real-life experiments. Taking the biologists' viewpoint, we scrutinized several methods for predicting GRNs in mammalian cells, and more importantly show how the power of different knowledge databases of different types can be used to identify modules and subnetworks, thereby reducing complexity and facilitating the generation of testable hypotheses.
Affiliation(s)
- Wei-Po Lee
  - Department of Information Management, National Sun Yat-sen University, Kaohsiung, Taiwan
14. Zambelli F, Pesole G, Pavesi G. Pscan: finding over-represented transcription factor binding site motifs in sequences from co-regulated or co-expressed genes. Nucleic Acids Res 2009; 37:W247-52. [PMID: 19487240 PMCID: PMC2703934 DOI: 10.1093/nar/gkp464] [Citation(s) in RCA: 319] [Impact Index Per Article: 21.3] [Indexed: 01/25/2023] [Open Access]
Abstract
The first step in gene expression, transcription, is modulated by the interaction of transcription factors with their corresponding binding sites on the DNA sequence. Pscan is a software tool that scans a set of sequences (e.g. promoters) from co-regulated or co-expressed genes with motifs describing the binding specificity of known transcription factors and assesses which motifs are significantly over- or under-represented, thus providing hints on which transcription factors could be common regulators of the genes studied, together with the location of their candidate binding sites in the sequences. Pscan does not resort to comparisons with orthologous sequences, and experimental results show that it compares favorably to other tools for the same task in terms of false positive predictions and computation time. The website is free and open to all users and there is no login requirement. Address: http://www.beaconlab.it/pscan.
Affiliation(s)
- Federico Zambelli
- Dipartimento di Scienze Biomolecolari e Biotecnologie, University of Milan, Milan, Italy
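The scan-then-test idea in the abstract above can be illustrated with a minimal sketch. It assumes a motif given as a position weight matrix (a list of per-position score dictionaries), takes the best-scoring window in each sequence, and compares the sample mean with a background mean using a z-score. The function names, the toy matrix and the choice of a z-score are illustrative assumptions, not Pscan's documented implementation.

```python
import math

def best_match_score(seq, pwm):
    """Return the highest PWM score over all windows of seq (hypothetical helper)."""
    width = len(pwm)
    best = -math.inf
    for start in range(len(seq) - width + 1):
        # Sum the per-position scores of the window beginning at `start`.
        score = sum(pwm[offset][seq[start + offset]] for offset in range(width))
        best = max(best, score)
    return best

def over_representation_z(sample_scores, background_mean, background_sd):
    """z-score of the sample mean against an assumed background distribution."""
    n = len(sample_scores)
    sample_mean = sum(sample_scores) / n
    return (sample_mean - background_mean) / (background_sd / math.sqrt(n))

# Toy two-column matrix that rewards the dinucleotide "AC" (illustrative values only).
toy_pwm = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 1.0, "G": 0.0, "T": 0.0},
]
scores = [best_match_score(s, toy_pwm) for s in ["GGACGG", "TTACTT", "ACACAC", "CCACGT"]]
z = over_representation_z(scores, background_mean=1.0, background_sd=1.0)
```

A large positive z suggests over-representation and a large negative z suggests under-representation; the background mean and standard deviation would have to be estimated from a reference promoter set.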
|
15
|
Kerr JR. Gene profiling of patients with chronic fatigue syndrome/myalgic encephalomyelitis. Curr Rheumatol Rep 2009; 10:482-91. [PMID: 19007540 DOI: 10.1007/s11926-008-0079-5] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 10/21/2022]
Abstract
Chronic fatigue syndrome/myalgic encephalomyelitis (CFS/ME) is a multisystem disease, the pathogenesis of which remains undetermined. Following two microarray studies, we reported the differential expression of 88 human genes in patients with CFS; 85 of these genes were upregulated and 3 were downregulated. The top functional categories of these 88 genes were hematologic disease and function, immunologic disease and function, cancer, cell death, immune response, and infection. Clustering of quantitative polymerase chain reaction data from CFS/ME patients revealed seven subtypes with distinct differences in Short Form (SF)-36 scores, clinical phenotypes, and severity. Gene signatures in each subtype implicate five human genes as possible targets for specific therapy. Development of a diagnostic test for subtype status is now a priority. The possibility that these subtypes represent individual host responses to particular microbial infections is being investigated and may provide another route to specific therapies for CFS patients.
Affiliation(s)
- Jonathan R Kerr
- St. George's University of London, Cranmer Terrace, London SW17 0RE, United Kingdom.
|
16
|
Ho LHM, Giraud E, Uggalla V, Lister R, Clifton R, Glen A, Thirkettle-Watts D, Van Aken O, Whelan J. Identification of regulatory pathways controlling gene expression of stress-responsive mitochondrial proteins in Arabidopsis. Plant Physiol 2008; 147:1858-73. [PMID: 18567827 PMCID: PMC2492625 DOI: 10.1104/pp.108.121384] [Citation(s) in RCA: 107] [Impact Index Per Article: 6.7] [Received: 04/17/2008] [Accepted: 06/11/2008] [Indexed: 05/17/2023]
Abstract
In this study we analyzed transcript abundance and promoters of genes encoding mitochondrial proteins to identify signaling pathways that regulate stress-induced gene expression. We used Arabidopsis (Arabidopsis thaliana) alternative oxidase AOX1a, external NAD(P)H dehydrogenase NDB2, and two additional highly stress-responsive genes, At2g21640 and BCS1. As a starting point, the promoter region of AOX1a was analyzed and functional analysis identified 10 cis-acting regulatory elements (CAREs), which played a role in response to treatment with H2O2, rotenone, or both. Six of these elements were also functional in the NDB2 promoter. The promoter region of At2g21640, previously defined as a hallmark of oxidative stress, shared two functional CAREs with AOX1a and was responsive to treatment with H2O2 but not rotenone. Microarray analysis further supported that signaling pathways induced by H2O2 and rotenone are not identical. The promoter of BCS1 was not responsive to H2O2 or rotenone, but highly responsive to salicylic acid (SA), whereas the promoters of AOX1a and NDB2 were unresponsive to SA. Analysis of transcript abundance of these genes in a variety of defense signaling mutants confirmed that BCS1 expression is regulated in a different manner compared to AOX1a, NDB2, and At2g21640. These mutants also revealed a pathway associated with programmed cell death that regulated AOX1a in a manner distinct from the other genes. Thus, at least three distinctive pathways regulate mitochondrial stress response at a transcriptional level: an SA-dependent pathway represented by BCS1, a second pathway that represents a convergence point for signals generated by H2O2 and rotenone on multiple CAREs, some of which are shared between responsive genes, and a third pathway that acts via EDS1 and PAD4 regulating only AOX1a. Furthermore, posttranscriptional regulation accounts for changes in transcript abundance by SA treatment for some genes.
Affiliation(s)
- Lois H M Ho
- Australian Research Council Centre of Excellence in Plant Energy Biology, University of Western Australia, Crawley 6009, Western Australia, Australia
|
17
|
Gotea V, Ovcharenko I. DiRE: identifying distant regulatory elements of co-expressed genes. Nucleic Acids Res 2008; 36:W133-9. [PMID: 18487623 PMCID: PMC2447744 DOI: 10.1093/nar/gkn300] [Citation(s) in RCA: 105] [Impact Index Per Article: 6.6] [Received: 12/17/2007] [Revised: 04/23/2008] [Accepted: 04/29/2008] [Indexed: 11/13/2022]
Abstract
Regulation of gene expression in eukaryotic genomes is established through a complex cooperative activity of proximal promoters and distant regulatory elements (REs) such as enhancers, repressors and silencers. We have developed a web server named DiRE, based on the Enhancer Identification (EI) method, for predicting distant regulatory elements in higher eukaryotic genomes, namely for determining their chromosomal location and functional characteristics. The server uses gene co-expression data, comparative genomics and profiles of transcription factor binding sites (TFBSs) to determine TFBS-association signatures that can be used for discriminating specific regulatory functions. DiRE's unique feature is its ability to detect REs outside of proximal promoter regions, as it takes advantage of the full gene locus to conduct the search. DiRE can predict common REs for any set of input genes for which the user has prior knowledge of co-expression, co-function or other biologically meaningful grouping. The server predicts function-specific REs consisting of clusters of specifically-associated TFBSs and it also scores the association of individual transcription factors (TFs) with the biological function shared by the group of input genes. Its integration with the Array2BIO server allows users to start their analysis with raw microarray expression data. The DiRE web server is freely available at http://dire.dcode.org.
Affiliation(s)
- Ivan Ovcharenko
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894
|
18
|
Kerr JR, Petty R, Burke B, Gough J, Fear D, Sinclair LI, Mattey DL, Richards SCM, Montgomery J, Baldwin DA, Kellam P, Harrison TJ, Griffin GE, Main J, Enlander D, Nutt DJ, Holgate ST. Gene expression subtypes in patients with chronic fatigue syndrome/myalgic encephalomyelitis. J Infect Dis 2008; 197:1171-84. [PMID: 18462164 DOI: 10.1086/533453] [Citation(s) in RCA: 90] [Impact Index Per Article: 5.6] [Indexed: 11/03/2022]
Abstract
Chronic fatigue syndrome/myalgic encephalomyelitis (CFS/ME) is a multisystem disease, the pathogenesis of which remains undetermined. We set out to determine the precise abnormalities of gene expression in the blood of patients with CFS/ME. We analyzed gene expression in peripheral blood from 25 patients with CFS/ME diagnosed according to the Centers for Disease Control and Prevention diagnostic criteria and 50 healthy blood donors, using a microarray with a cutoff fold difference of expression of ≥2.5. Genes showing differential expression were further analyzed in 55 patients with CFS/ME and 75 healthy blood donors, using quantitative polymerase chain reaction. Differential expression was confirmed for 88 genes; 85 were upregulated, and 3 were downregulated. Highly represented functions were hematological disease and function, immunological disease and function, cancer, cell death, immune response, and infection. Clustering of quantitative polymerase chain reaction data from patients with CFS/ME revealed 7 subtypes with distinct differences in Medical Outcomes Survey Short Form-36 scores, clinical phenotypes, and severity.
Affiliation(s)
- Jonathan R Kerr
- Department of Cellular & Molecular Medicine, St. George's University of London, London.
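The fold-difference cutoff mentioned in the abstract above can be illustrated with a short sketch. It assumes that each gene has a positive mean expression value for patients and for controls, and that a ratio of at least 2.5 in either direction counts as differential expression. The function name, the gene labels and the symmetric treatment of down-regulation are assumptions for illustration; the abstract does not describe the exact procedure.

```python
def classify_by_fold_difference(expression, cutoff=2.5):
    """Label genes whose patient/control ratio meets the cutoff in either direction.

    `expression` maps gene -> (patient_mean, control_mean); both values are
    assumed to be positive and on a linear (not log) scale.
    """
    calls = {}
    for gene, (patient_mean, control_mean) in expression.items():
        ratio = patient_mean / control_mean
        if ratio >= cutoff:
            calls[gene] = "up"
        elif ratio <= 1.0 / cutoff:
            calls[gene] = "down"
    return calls

# Hypothetical values, not data from the study.
example = {"geneA": (10.0, 2.0), "geneB": (3.0, 3.0), "geneC": (1.0, 4.0)}
calls = classify_by_fold_difference(example)
```

Genes that pass such a filter would still need independent confirmation, which the study performed with quantitative polymerase chain reaction.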
|
19
|
|
20
|
Abstract
The formation of diverse cell types from an invariant set of genes is governed by biochemical and molecular processes that regulate gene activity. A complete understanding of the regulatory mechanisms of gene expression is the major function of genomics. Computational genomics is a rapidly emerging area for deciphering the regulation of metazoan genes as well as interpreting the results of high-throughput screening. The integration of computer science with biology has expedited molecular modelling and processing of large-scale data inputs such as microarrays, analysis of genomes, transcriptomes and proteomes. Many bioinformaticians have developed various algorithms for predicting transcriptional regulatory mechanisms from the sequence, gene expression and interaction data. This review contains compiled information of various computational methods adopted to dissect gene expression pathways.
Affiliation(s)
- Vibha Rani
- Department of Biotechnology, Jaypee Institute of Information Technology University, A-10, Sector 62, Noida 210 307, India.
|
21
|
Sarkar C, Maitra A. Deciphering the cis-regulatory elements of co-expressed genes in PCOS by in silico analysis. Gene 2008; 408:72-84. [DOI: 10.1016/j.gene.2007.10.026] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 07/16/2007] [Revised: 10/11/2007] [Accepted: 10/17/2007] [Indexed: 01/30/2023]
|
22
|
Tabach Y, Brosh R, Buganim Y, Reiner A, Zuk O, Yitzhaky A, Koudritsky M, Rotter V, Domany E. Wide-scale analysis of human functional transcription factor binding reveals a strong bias towards the transcription start site. PLoS One 2007; 2:e807. [PMID: 17726537 PMCID: PMC1950076 DOI: 10.1371/journal.pone.0000807] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.9] [Received: 04/12/2007] [Accepted: 07/24/2007] [Indexed: 01/07/2023]
Abstract
Background Transcription factors (TF) regulate expression by binding to specific DNA sequences. A binding event is functional when it affects gene expression. Functionality of a binding site is reflected in conservation of the binding sequence during evolution and in over-represented binding in gene groups with coherent biological functions. Functionality is governed by several parameters such as the TF-DNA binding strength, distance of the binding site from the transcription start site (TSS), DNA packing, and more. Understanding how these parameters control functionality of different TFs in different biological contexts is a must for identifying functional TF binding sites and for understanding regulation of transcription. Methodology/Principal Findings We introduce a novel method to screen the promoters of a set of genes with shared biological function (obtained from the functional Gene Ontology (GO) classification) against a precompiled library of motifs, and find those motifs which are statistically over-represented in the gene set. More than 8000 human (and 23,000 mouse) genes were assigned to one of 134 GO sets. Their promoters were searched (from 200 bp downstream to 1000 bp upstream of the TSS) for 414 known DNA motifs. We optimized the sequence similarity score threshold, independently for every location window, taking into account nucleotide heterogeneity along the promoters of the target genes. The method, combined with binding sequence and location conservation between human and mouse, identifies with high probability functional binding sites for groups of functionally-related genes. We found many location-sensitive functional binding events and showed that they clustered close to the TSS. Our method and findings were tested experimentally. Conclusions/Significance We reliably identified functional TF binding sites. This is an essential step towards constructing regulatory networks. The promoter region proximal to the TSS is of central importance for regulation of transcription in human and mouse, just as it is in bacteria and yeast.
Affiliation(s)
- Yuval Tabach
- Department of Physics of Complex Systems, The Weizmann Institute of Science, Rehovot, Israel
- Department of Molecular Cell Biology, The Weizmann Institute of Science, Rehovot, Israel
- Ran Brosh
- Department of Molecular Cell Biology, The Weizmann Institute of Science, Rehovot, Israel
- Yossi Buganim
- Department of Molecular Cell Biology, The Weizmann Institute of Science, Rehovot, Israel
- Anat Reiner
- Department of Physics of Complex Systems, The Weizmann Institute of Science, Rehovot, Israel
- Or Zuk
- Department of Physics of Complex Systems, The Weizmann Institute of Science, Rehovot, Israel
- Assif Yitzhaky
- Department of Physics of Complex Systems, The Weizmann Institute of Science, Rehovot, Israel
- Mark Koudritsky
- Department of Physics of Complex Systems, The Weizmann Institute of Science, Rehovot, Israel
- Varda Rotter
- Department of Molecular Cell Biology, The Weizmann Institute of Science, Rehovot, Israel
- Eytan Domany
- Department of Physics of Complex Systems, The Weizmann Institute of Science, Rehovot, Israel
- * To whom correspondence should be addressed. E-mail:
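The promoter window described in the abstract above (1000 bp upstream to 200 bp downstream of the TSS) depends on strand, which is easy to get wrong. The sketch below computes such a window as a 0-based, half-open interval. The function name, the coordinate convention and the choice to count the TSS base as the first downstream base are assumptions made for illustration, not details taken from the paper.

```python
def promoter_interval(tss, strand, upstream=1000, downstream=200):
    """Return (start, end) of the promoter window as a 0-based half-open interval.

    `tss` is the 0-based position of the transcription start site. The TSS base
    itself is counted as the first downstream base (an assumed convention).
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # Mirror the window: upstream lies at higher coordinates on the minus strand.
        start, end = tss - downstream + 1, tss + upstream + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(start, 0), end  # clamp at the chromosome start

plus_window = promoter_interval(5000, "+")
minus_window = promoter_interval(5000, "-")
```

Both windows span 1200 bp; a real pipeline would also clamp the end coordinate to the chromosome length, which is omitted here.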
|
23
|
Karmaker A, Harris SE, Kwek S. Constructing human transcriptional regulatory subnets from crossgenome comparison and gene expression profile analysis. OMICS 2007; 11:397-412. [PMID: 18092911 DOI: 10.1089/omi.2007.0028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2023]
Abstract
With the completion of the Human Genome Project (HGP), comprehensively understanding the complex interaction between trans- and cis-regulatory elements and identifying these potential functional elements are fundamental problems in functional genomics. Although many computational approaches have been developed for lower eukaryotes and prokaryotes, most of them do not generalize to vertebrates. Here, we use a decay function to characterize transcriptional behavior, and analyze correlations on gene expression profiles of human and mouse to construct coregulated gene groups. Using these two closely related species, we perform comparative genome analysis and identify target genes and conserved functional cis-regulatory elements by motif overrepresentation. Moreover, we present experimental evidence (ChIP-chip) for E2F to support our findings.
Affiliation(s)
- Amitava Karmaker
- Department of Computer Science, University of Texas at San Antonio, San Antonio, TX 78249, USA.
|
24
|
Yan B, Lovley DR, Krushkal J. Genome-wide similarity search for transcription factors and their binding sites in a metal-reducing prokaryote Geobacter sulfurreducens. Biosystems 2006; 90:421-41. [PMID: 17184904 DOI: 10.1016/j.biosystems.2006.10.006] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 07/27/2006] [Revised: 09/21/2006] [Accepted: 10/20/2006] [Indexed: 12/26/2022]
Abstract
The knowledge obtained from understanding individual elements involved in gene regulation is important for reconstructing gene regulatory networks, a key for understanding cellular behavior. To study gene regulatory interactions in a model microorganism, Geobacter sulfurreducens, which participates in metal reduction and energy harvesting, we investigated the presence of 59 known Escherichia coli transcription factors and predicted transcription regulatory sites in its genome. The supplementary material, available at http://www.geobacter.org/research/genomescan/, provides the results of similarity comparisons that identified regulatory proteins of G. sulfurreducens and the genome locations of the predicted regulatory sites, including the list of putative regulatory elements in the upstream regions of every predicted operon and singleton open reading frame. Regulatory sequence elements, predicted using genome similarity searches to matrices of established transcription regulatory elements from E. coli, provide an initial insight into regulation of genes and operons in G. sulfurreducens. The predicted regulatory elements were predominantly located in the upstream regions of operons and singleton open reading frames. The validity of the predictions was examined using a permutation approach. Sequence similarity searches indicate that E. coli transcription factors ArgR, CytR, DeoR, FlhCD (both FlhC and FlhD subunits), FruR, GalR, GlpR, H-NS, LacI, MetJ, PurR, TrpR, and Tus are likely missing from G. sulfurreducens. Phylogenetic analysis suggests that one HU subunit is present in G. sulfurreducens as compared to two subunits in E. coli, while each of the two E. coli IHF subunits, HimA and HimD, have two homologs in G. sulfurreducens. The closest homolog of E. coli RpoE in G. sulfurreducens may be more similar to FecI than to RpoE. These findings represent the first step in the understanding of the regulatory relationships in G. sulfurreducens on the genome scale.
Affiliation(s)
- Bin Yan
- Department of Preventive Medicine, University of Tennessee Health Science Center, 66 N. Pauline St., Ste. 633, Memphis, TN 38163, USA
|
25
|
Cheung TH, Kwan YL, Hamady M, Liu X. Unraveling transcriptional control and cis-regulatory codes using the software suite GeneACT. Genome Biol 2006; 7:R97. [PMID: 17064417 PMCID: PMC1794569 DOI: 10.1186/gb-2006-7-10-r97] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 06/16/2006] [Revised: 09/18/2006] [Accepted: 10/25/2006] [Indexed: 01/16/2023]
Abstract
Deciphering gene regulatory networks requires the systematic identification of functional cis-acting regulatory elements. We present a suite of web-based bioinformatics tools, called GeneACT (http://promoter.colorado.edu), that can rapidly detect evolutionarily conserved transcription factor binding sites or microRNA target sites that are either unique or over-represented in differentially expressed genes from DNA microarray data. GeneACT provides graphic visualization and extraction of common regulatory sequence elements in the promoters and 3'-untranslated regions that are conserved across multiple mammalian species.
Affiliation(s)
- Tom Hiu Cheung
- Department of Chemistry and Biochemistry, University of Colorado, 215 UCB, Boulder, Colorado 80309, USA
- Yin Lam Kwan
- Department of Computer Science, University of Colorado, 430 UCB, Boulder, Colorado 80309, USA
- Micah Hamady
- Department of Computer Science, University of Colorado, 430 UCB, Boulder, Colorado 80309, USA
- Xuedong Liu
- Department of Chemistry and Biochemistry, University of Colorado, 215 UCB, Boulder, Colorado 80309, USA
|
26
|
Defrance M, Touzet H. Predicting transcription factor binding sites using local over-representation and comparative genomics. BMC Bioinformatics 2006; 7:396. [PMID: 16945132 PMCID: PMC1570149 DOI: 10.1186/1471-2105-7-396] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.6] [Received: 03/31/2006] [Accepted: 08/31/2006] [Indexed: 12/02/2022]
Abstract
Background Identifying cis-regulatory elements is crucial to understanding gene expression, which highlights the importance of the computational detection of overrepresented transcription factor binding sites (TFBSs) in coexpressed or coregulated genes. However, this is a challenging problem, especially when considering higher eukaryotic organisms. Results We have developed a method, named TFM-Explorer, that searches for locally overrepresented TFBSs in a set of coregulated genes, which are modeled by profiles provided by a database of position weight matrices. The novelty of the method is that it takes advantage of spatial conservation in the sequence and supports multiple species. The efficiency of the underlying algorithm and its robustness to noise allow weak regulatory signals to be detected in large heterogeneous data sets. Conclusion TFM-Explorer provides an efficient way to predict TFBS overrepresentation in related sequences. Promising results were obtained in a variety of examples in human, mouse, and rat genomes. The software is publicly available at .
Affiliation(s)
- Matthieu Defrance
- LIFL, UMR CNRS 8022, Université des Sciences et Technologies de Lille, Villeneuve d'Ascq, France
- Hélène Touzet
- LIFL, UMR CNRS 8022, Université des Sciences et Technologies de Lille, Villeneuve d'Ascq, France
|
27
|
Abstract
Most forms of neuronal plasticity are associated with induction of the transcription factor Zif268 (Egr1/Krox24/NGF-IA). In a genome-wide scan, we obtained evidence for potential modulation of proteasome subunit and regulatory genes by Zif268 in neurons, a finding of significance considering emerging evidence that the proteasome modulates synaptic function. Bioinformatic analysis indicated that the candidate proteasome Zif268 target genes had a rich concentration of putative Zif268 binding sites immediately upstream of the transcriptional start sites. Regulation of the mRNAs encoding the psmb9 (Lmp2) and psme2 (PA28beta) proteasome subunits, along with the proteasome-regulatory kinase serum/glucocorticoid-regulated kinase (SGK) and the proteasome-associated antigen peptide transporter subunit 1 (Tap1), was confirmed after transfection of a neuronal cell line with Zif268. Conversely, these mRNAs were upregulated in cerebral cortex tissue from Zif268 knock-out mice relative to controls, confirming that Zif268 suppresses their expression in the CNS. Transfected Zif268 reduced the activity of psmb9, SGK, and Tap1 promoter-reporter constructs. Altered psmb9, SGK, and Tap1 mRNA levels were also observed in an in vivo model of neuronal plasticity involving Zif268 induction: the effect of haloperidol administration on striatal gene expression. Consistent with these effects on proteasome gene expression, increased Zif268 expression suppressed proteasome activity, whereas Zif268 knock-out mice exhibited elevated cortical proteasome activity. Our findings reveal that Zif268 regulates the expression of proteasome and related genes in neuronal cells and provide new evidence that altered expression of proteasome activity after Zif268 induction may be a key component of long-lasting CNS plasticity.
Affiliation(s)
- Allan B James
- Division of Neuroscience and Biomedical Systems, Institute of Biomedical and Life Sciences, University of Glasgow, Glasgow G12 8QQ, United Kingdom.
|
28
|
Abnizova I, Gilks WR. Studying statistical properties of regulatory DNA sequences, and their use in predicting regulatory regions in the eukaryotic genomes. Brief Bioinform 2006; 7:48-54. [PMID: 16761364 DOI: 10.1093/bib/bbk004] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 01/22/2023]
Abstract
There are no well-known properties in regulatory DNA analogous to those in coding sequences: their spatial location is not regular, the consensus regulatory elements are often degenerate, and there are no understandable rules governing their evolution. This makes it difficult to recognize regulatory regions within a genome. We review developments in the statistical characterization of regulatory regions and methods of their recognition in eukaryotic genomes.
|
29
|
Ho Sui SJ, Mortimer JR, Arenillas DJ, Brumm J, Walsh CJ, Kennedy BP, Wasserman WW. oPOSSUM: identification of over-represented transcription factor binding sites in co-expressed genes. Nucleic Acids Res 2005; 33:3154-64. [PMID: 15933209 PMCID: PMC1142402 DOI: 10.1093/nar/gki624] [Citation(s) in RCA: 310] [Impact Index Per Article: 16.3] [Indexed: 01/06/2023]
Abstract
Targeted transcript profiling studies can identify sets of co-expressed genes; however, identification of the underlying functional mechanism(s) is a significant challenge. Established methods for the analysis of gene annotations, particularly those based on the Gene Ontology, can identify functional linkages between genes. Similar methods for the identification of over-represented transcription factor binding sites (TFBSs) have been successful in yeast, but extension to human genomics has largely proved ineffective. Creation of a system for the efficient identification of common regulatory mechanisms in a subset of co-expressed human genes promises to break a roadblock in functional genomics research. We have developed an integrated system that searches for evidence of co-regulation by one or more transcription factors (TFs). oPOSSUM combines a pre-computed database of conserved TFBSs in human and mouse promoters with statistical methods for identification of sites over-represented in a set of co-expressed genes. The algorithm successfully identified mediating TFs in control sets of tissue-specific genes and in sets of co-expressed genes from three transcript profiling studies. Simulation studies indicate that oPOSSUM produces few false positives using empirically defined thresholds and can tolerate up to 50% noise in a set of co-expressed genes.
Affiliation(s)
- Shannan J. Ho Sui
- Centre for Molecular Medicine and Therapeutics, University of British Columbia, Vancouver, BC, Canada
- Genetics Graduate Program, University of British Columbia, Vancouver, BC, Canada
- David J. Arenillas
- Centre for Molecular Medicine and Therapeutics, University of British Columbia, Vancouver, BC, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Jochen Brumm
- Centre for Molecular Medicine and Therapeutics, University of British Columbia, Vancouver, BC, Canada
- Department of Statistics, University of British Columbia, Vancouver, BC, Canada
- Christopher J. Walsh
- Centre for Molecular Medicine and Therapeutics, University of British Columbia, Vancouver, BC, Canada
- Genetics Graduate Program, University of British Columbia, Vancouver, BC, Canada
- Brian P. Kennedy
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Wyeth W. Wasserman
- Centre for Molecular Medicine and Therapeutics, University of British Columbia, Vancouver, BC, Canada
- Merck Frosst Centre for Therapeutic Research, Kirkland, QC, Canada
- To whom correspondence should be addressed. Tel: +1 604 875 3812; Fax: +1 604 875 3819;
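Over-representation of a binding site in a co-expressed gene set, as discussed in the abstract above, is often assessed with a one-tailed hypergeometric (Fisher-type) test. The sketch below is one such test written with the standard library only. The function name and the example counts are hypothetical, and the abstract itself only says "statistical methods", so this should be read as an illustration of the general idea and not as oPOSSUM's actual scoring.

```python
from math import comb

def hypergeom_upper_tail(hits_in_set, set_size, hits_in_background, background_size):
    """P(X >= hits_in_set) when set_size genes are drawn without replacement from a
    background of background_size genes, hits_in_background of which carry the site."""
    total = comb(background_size, set_size)
    max_hits = min(set_size, hits_in_background)
    favourable = sum(
        comb(hits_in_background, i)
        * comb(background_size - hits_in_background, set_size - i)
        for i in range(hits_in_set, max_hits + 1)
    )
    return favourable / total

# Hypothetical counts: 2 of 2 co-expressed genes carry the site, versus 5 of 10 in the background.
p_value = hypergeom_upper_tail(2, 2, 5, 10)
```

A small tail probability indicates that the site occurs in the gene set more often than expected by chance; with many motifs tested, a multiple-testing correction would also be needed.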
|
30
|
Yap YL, Lam DCL, Luc G, Zhang XW, Hernandez D, Gras R, Wang E, Chiu SW, Chung LP, Lam WK, Smith DK, Minna JD, Danchin A, Wong MP. Conserved transcription factor binding sites of cancer markers derived from primary lung adenocarcinoma microarrays. Nucleic Acids Res 2005; 33:409-21. [PMID: 15653641 PMCID: PMC546166 DOI: 10.1093/nar/gki188] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Indexed: 01/02/2023]
Abstract
Gene transcription in a set of 49 human primary lung adenocarcinomas and 9 normal lung tissue samples was examined using Affymetrix GeneChip technology. A total of 3442 genes, called the set MAD, were found to be either up- or down-regulated by at least 2-fold between the two phenotypes. Genes assigned to a particular gene ontology term were found, in many cases, to be significantly unevenly distributed between the genes in and outside MAD. Terms that were overrepresented in MAD included functions directly implicated in the cancer cell metabolism. Based on their functional roles and expression profiles, genes in MAD were grouped into likely co-regulated gene sets. Highly conserved sequences in the 5 kb region upstream of the genes in these sets were identified with the motif discovery tool, MoDEL. Potential oncogenic transcription factors and their corresponding binding sites were identified in these conserved regions using the TRANSFAC 8.3 database. Several of the transcription factors identified in this study have been shown elsewhere to be involved in oncogenic processes. This study searched beyond phenotypic gene expression profiles in cancer cells, in order to identify the more important regulatory transcription factors that caused these aberrations in gene expression.
Affiliation(s)
- Yee Leng Yap
- HKU-Pasteur Research Centre Dexter H.C. Man Building, 8 Sassoon Road Pokfulam, Hong Kong, China.
|
31
|
Alkema WBL, Lenhard B, Wasserman WW. Regulog analysis: detection of conserved regulatory networks across bacteria: application to Staphylococcus aureus. Genome Res 2004; 14:1362-73. [PMID: 15231752 PMCID: PMC442153 DOI: 10.1101/gr.2242604] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.7] [Indexed: 11/24/2022]
Abstract
A transcriptional regulatory network encompasses sets of genes (regulons) whose expression states are directly altered in response to an activating signal, mediated by trans-acting regulatory proteins and cis-acting regulatory sequences. Enumeration of these network components is an essential step toward the creation of a framework for systems-based analysis of biological processes. Profile-based methods for the detection of cis-regulatory elements are often applied to predict regulon members, but they suffer from poor specificity. In this report we describe Regulogger, a novel computational method that uses comparative genomics to eliminate spurious members of predicted gene regulons. Regulogger produces regulogs, sets of coregulated genes for which the regulatory sequence has been conserved across multiple organisms. The quantitative method assigns a confidence score to each predicted regulog member on the basis of the degree of conservation of protein sequence and regulatory mechanisms. When applied to a reference collection of regulons from Escherichia coli, Regulogger increased the specificity of predictions up to 25-fold over methods that use cis-element detection in isolation. The enhanced specificity was observed across a wide range of biologically meaningful parameter combinations, indicating a robust and broad utility for the method. The power of computational pattern discovery methods coupled with Regulogger to unravel transcriptional networks was demonstrated in an analysis of the genome of Staphylococcus aureus. A total of 125 regulogs were found in this organism, including both well-defined functional groups and a subset with unknown functions.
Affiliation(s)
- Wynand B L Alkema
- Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden
|
32
|
Fu Y, Frith MC, Haverty PM, Weng Z. MotifViz: an analysis and visualization tool for motif discovery. Nucleic Acids Res 2004; 32:W420-3. [PMID: 15215422 PMCID: PMC441564 DOI: 10.1093/nar/gkh426] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.0] [Indexed: 11/12/2022] Open
Abstract
Detecting overrepresented known transcription factor binding motifs in a set of promoter sequences of co-regulated genes has become an important approach to deciphering transcriptional regulatory mechanisms. In this paper, we present an interactive web server, MotifViz, for three motif discovery programs, Clover, Rover and Motifish, covering most available flavors of algorithms for achieving this goal. For comparison, we have also implemented the simple motif-matching program Possum. MotifViz provides uniform and intuitive input and output formats for all four programs. It can be accessed at http://biowulf.bu.edu/MotifViz.
Affiliation(s)
- Yutao Fu
- Bioinformatics Program, Boston University, Boston, MA 02215, USA
|
33
|
Frith MC, Fu Y, Yu L, Chen JF, Hansen U, Weng Z. Detection of functional DNA motifs via statistical over-representation. Nucleic Acids Res 2004; 32:1372-81. [PMID: 14988425 PMCID: PMC390287 DOI: 10.1093/nar/gkh299] [Citation(s) in RCA: 317] [Impact Index Per Article: 15.9] [Indexed: 11/14/2022] Open
Abstract
The interaction of proteins with DNA recognition motifs regulates a number of fundamental biological processes, including transcription. To understand these processes, we need to know which motifs are present in a sequence and which factors bind to them. We describe a method to screen a set of DNA sequences against a precompiled library of motifs, and assess which, if any, of the motifs are statistically over- or under-represented in the sequences. Over-represented motifs are good candidates for playing a functional role in the sequences, while under-representation hints that if the motif were present, it would have a harmful dysregulatory effect. We apply our method (implemented as a computer program called Clover) to dopamine-responsive promoters, sequences flanking binding sites for the transcription factor LSF, sequences that direct transcription in muscle and liver, and Drosophila segmentation enhancers. In each case Clover successfully detects motifs known to function in the sequences, and intriguing and testable hypotheses are made concerning additional motifs. Clover compares favorably with an ab initio motif discovery algorithm based on sequence alignment, when the motif library includes only a homolog of the factor that actually regulates the sequences. It also demonstrates superior performance over two contingency table based over-representation methods. In conclusion, Clover has the potential to greatly accelerate characterization of signals that regulate transcription.
Affiliation(s)
- Martin C Frith
- Bioinformatics Program, Boston University, 44 Cummington Street, Boston, MA 02215, USA
|
34
|
Haverty PM, Hansen U, Weng Z. Computational inference of transcriptional regulatory networks from expression profiling and transcription factor binding site identification. Nucleic Acids Res 2004; 32:179-88. [PMID: 14704355 PMCID: PMC373293 DOI: 10.1093/nar/gkh183] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.2] [Indexed: 11/13/2022] Open
Abstract
We have developed a computational method for transcriptional regulatory network inference, CARRIE (Computational Ascertainment of Regulatory Relationships Inferred from Expression), which combines microarray and promoter sequence analysis. CARRIE uses these data sources to identify the transcription factors (TFs) that regulate gene expression changes in response to a stimulus and generates testable hypotheses about the regulatory network connecting these TFs to the genes they regulate. The promoter analysis component of CARRIE, ROVER (Relative OVER-abundance of cis-elements), is highly accurate at detecting the TFs that regulate the response to a stimulus. ROVER also predicts which genes are regulated by each of these TFs. CARRIE uses these transcriptional interactions to infer a regulatory network. To demonstrate our method, we applied CARRIE to six sets of publicly available DNA microarray experiments on Saccharomyces cerevisiae. The predicted networks were validated with comparisons to literature sources, experimental TF binding data, and gene ontology biological process information.
Affiliation(s)
- Peter M Haverty
- Bioinformatics Program, Boston University, 44 Cummington Street, Boston, MA 02215, USA
|