1
Ibrahim-Alobaide MA, Abdelsalam AG, Alobydi H, Rasul KI, Zhang R, Srivenugopal KS. Characterization of regulatory sequences in alternative promoters of hypermethylated genes associated with tumor resistance to cisplatin. Mol Clin Oncol 2015; 3:408-414. [PMID: 25798277 DOI: 10.3892/mco.2014.468] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/07/2014] [Accepted: 10/23/2014] [Indexed: 01/03/2023] Open
Abstract
The development of cisplatin resistance in human cancers is controlled by multiple genes and leads to therapeutic failure. Hypermethylation of specific gene promoters is a key event in clinical resistance to cisplatin. Although the usage of multiple promoters is frequent in the transcription of human genes, the role of alternative promoters and their regulatory sequences has not yet been investigated in cisplatin resistance genes. In a new approach, we hypothesized that human cancers exploit the specific transcription factor-binding sites (TFBS) and CpG islands (CGIs) located in the alternative promoters of certain genes to acquire platinum drug resistance. To provide a useful resource of regulatory elements associated with cisplatin resistance, we investigated the TFBS and CGIs in 48 alternative promoters of 14 hypermethylated cisplatin resistance genes previously reported. CGIs prone to methylation were identified in 28 alternative promoters of 11 hypermethylated genes. The majority of alternative promoters harboring CGIs (93%) were clustered in one phylogenetic subclass, whereas the ones lacking CGIs were distributed in two unrelated subclasses. Regulatory sequences, initiator and TATA-532 prevailed over TATA-8 and were found in all the promoters. B recognition element (BRE) sequences were present only in alternative promoters harboring CGIs, but CCAAT and TAACC were found in both types of alternative promoters, whereas downstream promoter element sequences were significantly less frequent. Therefore, it was hypothesized that BRE and CGI sequences co-localized in alternative promoters of cisplatin resistance genes may be used to design molecular markers for drug resistance. A more extensive knowledge of alternative promoters and their regulatory elements in clinical resistance to cisplatin is likely to usher in novel avenues for sensitizing human cancers to treatment.
Affiliation(s)
- Mohammed A Ibrahim-Alobaide
- Department of Biomedical Sciences and Cancer Biology Research Center, School of Pharmacy, Texas Tech University Health Sciences Center, Amarillo, TX 79106, USA
- Abdelsalam G Abdelsalam
- Department of Mathematics, Statistics and Physics, College of Arts and Sciences, Qatar University, Doha, Qatar; Department of Statistics, Faculty of Economics and Political Sciences, Cairo University, Giza 12613, Egypt
- Ruiwen Zhang
- Department of Pharmaceutical Sciences and Cancer Biology Research Center, School of Pharmacy, Texas Tech University Health Sciences Center, Amarillo, TX 79106, USA
- Kalkunte S Srivenugopal
- Department of Biomedical Sciences and Cancer Biology Research Center, School of Pharmacy, Texas Tech University Health Sciences Center, Amarillo, TX 79106, USA
2
Navarro C, Lopez FJ, Cano C, Garcia-Alcalde F, Blanco A. CisMiner: genome-wide in-silico cis-regulatory module prediction by fuzzy itemset mining. PLoS One 2014; 9:e108065. [PMID: 25268582 PMCID: PMC4182448 DOI: 10.1371/journal.pone.0108065] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 07/03/2014] [Accepted: 08/25/2014] [Indexed: 01/18/2023] Open
Abstract
Eukaryotic gene control regions are known to be spread throughout non-coding DNA sequences which may appear distant from the gene promoter. Transcription factors are proteins that coordinately bind to these regions at transcription factor binding sites to regulate gene expression. Several tools can detect significant co-occurrences of closely located binding sites (cis-regulatory modules, CRMs). However, these tools present at least one of the following limitations: 1) scope limited to promoter or conserved regions of the genome; 2) inability to identify combinations involving more than two motifs; 3) need for prior information about target motifs. In this work we present CisMiner, a novel methodology to detect putative CRMs by means of a fuzzy itemset mining approach able to operate at genome-wide scale. CisMiner performs a blind search of CRMs without any prior information about target CRMs or any limitation on the number of motifs. CisMiner tackles the combinatorial complexity of genome-wide cis-regulatory module extraction using a natural representation of motif combinations as itemsets and applying the Top-Down Fuzzy Frequent-Pattern Tree algorithm to identify significant itemsets. Fuzzy technology allows CisMiner to better handle the imprecision and noise inherent to regulatory processes. Results obtained for a set of well-known binding sites in the S. cerevisiae genome show that our method yields highly reliable predictions. Furthermore, CisMiner was also applied to putative in-silico predicted transcription factor binding sites to identify significant combinations in S. cerevisiae and D. melanogaster, proving that our approach can be further applied genome-wide to more complex genomes. CisMiner is freely accessible at: http://genome2.ugr.es/cisminer. CisMiner can be queried for the results presented in this work and can also perform a customized cis-regulatory module prediction on a query set of transcription factor binding sites provided by the user.
Affiliation(s)
- Carmen Navarro
- Department of Computer Science and AI, University of Granada, Granada, Spain
- Francisco J. Lopez
- Andalusian Human Genome Sequencing Centre (CASEGH), Medical Genome Project (MGP), Sevilla, Spain
- Carlos Cano
- Department of Computer Science and AI, University of Granada, Granada, Spain
- Armando Blanco
- Department of Computer Science and AI, University of Granada, Granada, Spain
3
Iglesias-Fernández R, Wozny D, Iriondo-de Hond M, Oñate-Sánchez L, Carbonero P, Barrero-Sicilia C. The AtCathB3 gene, encoding a cathepsin B-like protease, is expressed during germination of Arabidopsis thaliana and transcriptionally repressed by the basic leucine zipper protein GBF1. Journal of Experimental Botany 2014; 65:2009-21. [PMID: 24600022 PMCID: PMC3991739 DOI: 10.1093/jxb/eru055] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 05/21/2023]
Abstract
Protein hydrolysis plays an important role during seed germination and post-germination seedling establishment. In Arabidopsis thaliana, cathepsin B-like proteases are encoded by a gene family of three members, but only the AtCathB3 gene is highly induced upon seed germination and at the early post-germination stage. Seeds of a homozygous T-DNA insertion mutant in the AtCathB3 gene have, besides a reduced cathepsin B activity, a slower germination than the wild type. To explore the transcriptional regulation of this gene, we used a combined phylogenetic shadowing approach together with a yeast one-hybrid screening of an arrayed library of approximately 1200 transcription factor open reading frames from Arabidopsis thaliana. We identified a conserved CathB3-element in the promoters of orthologous CathB3 genes within the Brassicaceae species analysed, and, as its DNA-interacting protein, the G-Box Binding Factor1 (GBF1). Transient overexpression of GBF1 together with a PAtCathB3::uidA (β-glucuronidase) construct in tobacco plants revealed a negative effect of GBF1 on expression driven by the AtCathB3 promoter. In stable P35S::GBF1 lines, not only was the expression of the AtCathB3 gene drastically reduced, but significantly slower germination was also observed. In the homozygous knockout mutant for the GBF1 gene, the opposite effect was found. These data indicate that GBF1 is a transcriptional repressor of the AtCathB3 gene and affects the germination kinetics of Arabidopsis thaliana seeds. As AtCathB3 is also expressed during post-germination in the cotyledons, a role for the AtCathB3-like protease in reserve mobilization is also inferred.
Affiliation(s)
- Dorothee Wozny
- * Present address: Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, 50829 Köln, Germany
- Cristina Barrero-Sicilia
- To whom correspondence should be addressed. Present address: Department of Biological Chemistry and Crop Protection, Rothamsted Research, West Common, Harpenden AL5 2JQ, UK. E-mail:
4
Diermeier SD, Németh A, Rehli M, Grummt I, Längst G. Chromatin-specific regulation of mammalian rDNA transcription by clustered TTF-I binding sites. PLoS Genet 2013; 9:e1003786. [PMID: 24068958 PMCID: PMC3772059 DOI: 10.1371/journal.pgen.1003786] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 01/28/2013] [Accepted: 07/26/2013] [Indexed: 12/04/2022] Open
Abstract
Enhancers and promoters often contain multiple binding sites for the same transcription factor, suggesting that homotypic clustering of binding sites may serve a role in transcription regulation. Here we show that clustering of binding sites for the transcription termination factor TTF-I downstream of the pre-rRNA coding region specifies transcription termination, increases the efficiency of transcription initiation and affects the three-dimensional structure of rRNA genes. On chromatin templates, but not on free rDNA, clustered binding sites promote cooperative binding of TTF-I, loading TTF-I to the downstream terminators before it binds to the rDNA promoter. Interaction of TTF-I with target sites upstream and downstream of the rDNA transcription unit connects these distal DNA elements by forming a chromatin loop between the rDNA promoter and the terminators. The results imply that clustered binding sites increase the binding affinity of transcription factors in chromatin, thus influencing the timing and strength of DNA-dependent processes. The sequence-specific binding of proteins to regulatory regions controls gene expression. Binding sites for transcription factors are rather short and present several million times in large genomes. However, only a small number of these binding sites are functionally important. How proteins can discriminate and select their functional regions is not clear, to date. Regulatory loci like gene promoters and enhancers commonly comprise multiple binding sites for either one factor or a combination of several DNA binding proteins, allowing efficient factor recruitment. We studied the cluster of TTF-I binding sites downstream of the rRNA gene and identified that cooperative binding to the multimeric termination sites in combination with low-affinity binding of TTF-I to individual sites upstream of the gene serves multiple regulatory functions. 
Packaging of the clustered sites into chromatin is a prerequisite for high-affinity binding, coordinated activation of transcription and the formation of a chromatin loop between the promoter and the terminator.
Affiliation(s)
- Sarah D. Diermeier
- Biochemistry Centre Regensburg (BCR), University of Regensburg, Regensburg, Germany
- Attila Németh
- Biochemistry Centre Regensburg (BCR), University of Regensburg, Regensburg, Germany
- Michael Rehli
- Department of Hematology, University Hospital Regensburg, Regensburg, Germany
- Ingrid Grummt
- Molecular Biology of the Cell II, German Cancer Research Centre (DKFZ), Heidelberg, Germany
- Gernot Längst
- Biochemistry Centre Regensburg (BCR), University of Regensburg, Regensburg, Germany
- * E-mail:
5
Iglesias-Fernández R, Barrero-Sicilia C, Carrillo-Barral N, Oñate-Sánchez L, Carbonero P. Arabidopsis thaliana bZIP44: a transcription factor affecting seed germination and expression of the mannanase-encoding gene AtMAN7. The Plant Journal: for Cell and Molecular Biology 2013; 74:767-80. [PMID: 23461773 DOI: 10.1111/tpj.12162] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.8] [Received: 12/11/2012] [Accepted: 02/26/2013] [Indexed: 05/19/2023]
Abstract
Endo-β-mannanases (MAN; EC. 3.2.1.78) catalyze the cleavage of β1→4 bonds in mannan polymers and have been associated with the process of weakening the tissues surrounding the embryo during seed germination. In germinating Arabidopsis thaliana seeds, the most highly expressed MAN gene is AtMAN7 and its transcripts are restricted to the micropylar endosperm and to the radicle tip just before radicle emergence. Mutants with a T-DNA insertion in AtMAN7 have a slower germination than the wild type. To gain insight into the transcriptional regulation of the AtMAN7 gene, a bioinformatic search for conserved non-coding cis-elements (phylogenetic shadowing) within the Brassicaceae MAN7 gene promoters has been done, and these conserved motifs have been used as bait to look for their interacting transcription factors (TFs), using as a prey an arrayed yeast library from A. thaliana. The basic-leucine zipper TF AtbZIP44, but not the closely related AtbZIP11, has thus been identified and its transcriptional activation upon AtMAN7 has been validated at the molecular level. In the knock-out lines of AtbZIP44, not only is the expression of the AtMAN7 gene drastically reduced, but these mutants have a significantly slower germination than the wild type, being affected in the two phases of the germination process, both in the rupture of the seed coat and in the breakage of the micropylar endosperm cell walls. In the over-expression lines the opposite phenotype is observed.
Affiliation(s)
- Raquel Iglesias-Fernández
- Centro de Biotecnología y Genómica de Plantas-UPM-INIA, ETSI Agrónomos, Universidad Politécnica de Madrid, Campus de Montegancedo, 28223 Pozuelo de Alarcón, Madrid, Spain.
6
Vernot B, Stergachis AB, Maurano MT, Vierstra J, Neph S, Thurman RE, Stamatoyannopoulos JA, Akey JM. Personal and population genomics of human regulatory variation. Genome Res 2012; 22:1689-97. [PMID: 22955981 PMCID: PMC3431486 DOI: 10.1101/gr.134890.111] [Citation(s) in RCA: 91] [Impact Index Per Article: 8.3] [Indexed: 11/24/2022]
Abstract
The characteristics of and evolutionary forces acting on regulatory variation in humans remain elusive because of the difficulty in defining functionally important noncoding DNA. Here, we combine genome-scale maps of regulatory DNA marked by DNase I hypersensitive sites (DHSs) from 138 cell and tissue types with whole-genome sequences of 53 geographically diverse individuals in order to better delimit the patterns of regulatory variation in humans. We estimate that individuals likely harbor many more functionally important variants in regulatory DNA compared with protein-coding regions, although they are likely to have, on average, smaller effect sizes. Moreover, we demonstrate that there is significant heterogeneity in the level of functional constraint in regulatory DNA among different cell types. We also find marked variability in functional constraint among transcription factor motifs in regulatory DNA, with sequence motifs for major developmental regulators, such as HOX proteins, exhibiting levels of constraint comparable to protein-coding regions. Finally, we perform a genome-wide scan of recent positive selection and identify hundreds of novel substrates of adaptive regulatory evolution that are enriched for biologically interesting pathways such as melanogenesis and adipocytokine signaling. These data and results provide new insights into patterns of regulatory variation in individuals and populations and demonstrate that a large proportion of functionally important variation lies beyond the exome.
Affiliation(s)
- Benjamin Vernot
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
7
Natarajan A, Yardimci GG, Sheffield NC, Crawford GE, Ohler U. Predicting cell-type-specific gene expression from regions of open chromatin. Genome Res 2012; 22:1711-22. [PMID: 22955983 PMCID: PMC3431488 DOI: 10.1101/gr.135129.111] [Citation(s) in RCA: 172] [Impact Index Per Article: 15.6] [Indexed: 01/17/2023]
Abstract
Complex patterns of cell-type-specific gene expression are thought to be achieved by combinatorial binding of transcription factors (TFs) to sequence elements in regulatory regions. Predicting cell-type-specific expression in mammals has been hindered by the oftentimes unknown location of distal regulatory regions. To alleviate this bottleneck, we used DNase-seq data from 19 diverse human cell types to identify proximal and distal regulatory elements at genome-wide scale. Matched expression data allowed us to separate genes into classes of cell-type-specific up-regulated, down-regulated, and constitutively expressed genes. CG dinucleotide content and DNA accessibility in the promoters of these three classes of genes displayed substantial differences, highlighting the importance of including these aspects in modeling gene expression. We associated DNase I hypersensitive sites (DHSs) with genes, and trained classifiers for different expression patterns. TF sequence motif matches in DHSs provided a strong performance improvement in predicting gene expression over the typical baseline approach of using proximal promoter sequences. In particular, we achieved competitive performance when discriminating up-regulated genes from different cell types or genes up- and down-regulated under the same conditions. We identified previously known and new candidate cell-type-specific regulators. The models generated testable predictions of activating or repressive functions of regulators. DNase I footprints for these regulators were indicative of their direct binding to DNA. In summary, we successfully used information of open chromatin obtained by a single assay, DNase-seq, to address the problem of predicting cell-type-specific gene expression in mammalian organisms directly from regulatory sequence.
Affiliation(s)
- Anirudh Natarajan
- Program in Computational Biology and Bioinformatics, Duke University, Durham, North Carolina 27708, USA
8
Spivakov M, Akhtar J, Kheradpour P, Beal K, Girardot C, Koscielny G, Herrero J, Kellis M, Furlong EEM, Birney E. Analysis of variation at transcription factor binding sites in Drosophila and humans. Genome Biol 2012; 13:R49. [PMID: 22950968 PMCID: PMC3491393 DOI: 10.1186/gb-2012-13-9-r49] [Citation(s) in RCA: 77] [Impact Index Per Article: 6.4] [Received: 03/28/2012] [Revised: 05/23/2012] [Accepted: 06/08/2012] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND: Advances in sequencing technology have boosted population genomics and made it possible to map the positions of transcription factor binding sites (TFBSs) with high precision. Here we investigate TFBS variability by combining transcription factor binding maps generated by ENCODE, modENCODE, our previously published data and other sources with genomic variation data for human individuals and Drosophila isogenic lines.
RESULTS: We introduce a metric of TFBS variability that takes into account changes in motif match associated with mutation and makes it possible to investigate TFBS functional constraints instance-by-instance as well as in sets that share common biological properties. We also take advantage of the emerging per-individual transcription factor binding data to show evidence that TFBS mutations, particularly at evolutionarily conserved sites, can be efficiently buffered to ensure coherent levels of transcription factor binding.
CONCLUSIONS: Our analyses provide insights into the relationship between individual and interspecies variation and show evidence for the functional buffering of TFBS mutations in both humans and flies. In a broad perspective, these results demonstrate the potential of combining functional genomics and population genetics approaches for understanding gene regulation.
Affiliation(s)
- Mikhail Spivakov
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK.
9
Liu Y, Nandi S, Martel A, Antoun A, Ioshikhes I, Blais A. Discovery, optimization and validation of an optimal DNA-binding sequence for the Six1 homeodomain transcription factor. Nucleic Acids Res 2012; 40:8227-39. [PMID: 22730291 PMCID: PMC3458543 DOI: 10.1093/nar/gks587] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Indexed: 12/31/2022] Open
Abstract
The Six1 transcription factor is a homeodomain protein involved in controlling gene expression during embryonic development. Six1 establishes gene expression profiles that enable skeletal myogenesis and nephrogenesis, among others. While several homeodomain factors have been extensively characterized with regard to their DNA-binding properties, relatively little is known of the properties of Six1. We have used the genomic binding profile of Six1 during the myogenic differentiation of myoblasts to obtain a better understanding of its preferences for recognizing certain DNA sequences. DNA sequence analyses on our genomic binding dataset, combined with biochemical characterization using binding assays, reveal that Six1 has a much broader DNA-binding sequence spectrum than had been previously determined. Moreover, using a position weight matrix optimization algorithm, we generated a highly sensitive and specific matrix that can be used to predict novel Six1-binding sites with highest accuracy. Furthermore, our results support the idea of a mode of DNA recognition by this factor where Six1 itself is sufficient for sequence discrimination, and where Six1 domains outside of its homeodomain contribute to binding site selection. Together, our results shed new light on the properties of this important transcription factor, and will enable more accurate modeling of Six1 function in bioinformatic studies.
Affiliation(s)
- Yubing Liu
- Ottawa Institute of Systems Biology and Biochemistry, Microbiology and Immunology Department, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
10
Prouse MB, Campbell MM. The interaction between MYB proteins and their target DNA binding sites. Biochimica et Biophysica Acta - Gene Regulatory Mechanisms 2012; 1819:67-77. [DOI: 10.1016/j.bbagrm.2011.10.010] [Citation(s) in RCA: 106] [Impact Index Per Article: 8.8] [Received: 06/20/2011] [Revised: 10/17/2011] [Accepted: 10/18/2011] [Indexed: 02/02/2023]
11
Abstract
Differences in gene regulation are thought to play an important role in speciation and adaptation. Comparative genomic studies of gene expression levels have identified a large number of differentially expressed genes among species, and, in a number of cases, also pointed to connections between interspecies differences in gene regulation and differences in ultimate physiological or morphological phenotypes. The mechanisms underlying changes in gene regulation are also being actively studied using comparative genomic approaches. However, the relative importance of different regulatory mechanisms to interspecies differences in gene expression levels is not yet well understood. In particular, it is often difficult to infer causality between apparent differences in regulatory mechanisms and changes in gene expression levels, a challenge that is compounded by the fact that the link between sequence variation and gene regulation is not clear. Indeed, in certain cases, gene regulation can be conserved even when sequences at associated regulatory elements have changed. In this chapter, I examine different genomic approaches to the study of regulatory evolution and the underlying genetic and epigenetic regulatory mechanisms. I try to distinguish between hypothesis-driven and exploratory studies, and argue that the latter class of studies provides valuable information in its own right as well as necessary context for the former. I discuss issues related to study designs and statistical analyses of genomic studies, and review the evidence for natural selection on gene expression levels and associated regulatory mechanisms. Most of the issues that are discussed pertain to the general nature of multivariate genomic data, and thus are often relevant regardless of the technology that is used to collect high-throughput genomic data (for example, microarrays or massively parallel sequencing).
12
Halfon MS, Zhu Q, Brennan ER, Zhou Y. Erroneous attribution of relevant transcription factor binding sites despite successful prediction of cis-regulatory modules. BMC Genomics 2011; 12:578. [PMID: 22115527 PMCID: PMC3235160 DOI: 10.1186/1471-2164-12-578] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 09/01/2011] [Accepted: 11/25/2011] [Indexed: 12/22/2022] Open
Abstract
Background: Cis-regulatory modules are bound by transcription factors to regulate gene expression. Characterizing these DNA sequences is central to understanding gene regulatory networks and gaining insight into mechanisms of transcriptional regulation, but genome-scale regulatory module discovery remains a challenge. One popular approach is to scan the genome for clusters of transcription factor binding sites, especially those conserved in related species. When such approaches are successful, it is typically assumed that the activity of the modules is mediated by the identified binding sites and their cognate transcription factors. However, the validity of this assumption is often not assessed.
Results: We successfully predicted five new cis-regulatory modules by combining binding site identification with sequence conservation and compared these to unsuccessful predictions from a related approach not utilizing sequence conservation. Despite greatly improved predictive success, the positive set had similar degrees of sequence and binding site conservation as the negative set. We explored the reasons for this by mutagenizing putative binding sites in three cis-regulatory modules. A large proportion of the tested sites had little or no demonstrable role in mediating regulatory element activity. Examination of loss-of-function mutants also showed that some transcription factors supposedly binding to the modules are not required for their function.
Conclusions: Our results raise important questions about interpreting regulatory module predictions obtained by finding clusters of conserved binding sites. Attribution of function to these sites and their cognate transcription factors may be incorrect even when modules are successfully identified. Our study underscores the importance of empirical validation of computational results even when these results are in line with expectation.
Affiliation(s)
- Marc S Halfon
- Department of Biochemistry, State University of New York at Buffalo, Buffalo, NY 14214, USA.
13
Starr MO, Ho MCW, Gunther EJM, Tu YK, Shur AS, Goetz SE, Borok MJ, Kang V, Drewell RA. Molecular dissection of cis-regulatory modules at the Drosophila bithorax complex reveals critical transcription factor signature motifs. Dev Biol 2011; 359:290-302. [PMID: 21821017 PMCID: PMC3202680 DOI: 10.1016/j.ydbio.2011.07.028] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 02/02/2011] [Revised: 07/17/2011] [Accepted: 07/19/2011] [Indexed: 11/17/2022]
Abstract
At the Drosophila melanogaster bithorax complex (BX-C) over 330 kb of intergenic DNA is responsible for directing the transcription of just three homeotic (Hox) genes during embryonic development. A number of distinct enhancer cis-regulatory modules (CRMs) are responsible for controlling the specific expression patterns of the Hox genes in the BX-C. While it has proven possible to identify orthologs of known BX-C CRMs in different Drosophila species using overall sequence conservation, this approach has not proven sufficiently effective for identifying novel CRMs or defining the key functional sequences within enhancer CRMs. Here we demonstrate that the specific spatial clustering of transcription factor (TF) binding sites is important for BX-C enhancer activity. A bioinformatic search for combinations of putative TF binding sites in the BX-C suggests that simple clustering of binding sites is frequently not indicative of enhancer activity. However, through molecular dissection and evolutionary comparison across the Drosophila genus we discovered that specific TF binding site clustering patterns are an important feature of three known BX-C enhancers. Sub-regions of the defined IAB5 and IAB7b enhancers were both found to contain an evolutionarily conserved signature motif of clustered TF binding sites which is critical for the functional activity of the enhancers. Together, these results indicate that the spatial organization of specific activator and repressor binding sites within BX-C enhancers is of greater importance than overall sequence conservation and is indicative of enhancer functional activity.
Affiliation(s)
- Yen-Kuei Tu
- Biology Department, Harvey Mudd College, 301 Platt Boulevard, Claremont, CA 91711, USA
- Andrey S. Shur
- Biology Department, Harvey Mudd College, 301 Platt Boulevard, Claremont, CA 91711, USA
- Sara E. Goetz
- Biology Department, Harvey Mudd College, 301 Platt Boulevard, Claremont, CA 91711, USA
- Matthew J. Borok
- Biology Department, Harvey Mudd College, 301 Platt Boulevard, Claremont, CA 91711, USA
- Victoria Kang
- Biology Department, Harvey Mudd College, 301 Platt Boulevard, Claremont, CA 91711, USA
- Robert A. Drewell
- Biology Department, Harvey Mudd College, 301 Platt Boulevard, Claremont, CA 91711, USA
14
A ChIP-Seq benchmark shows that sequence conservation mainly improves detection of strong transcription factor binding sites. PLoS One 2011; 6:e18430. [PMID: 21533218 PMCID: PMC3077367 DOI: 10.1371/journal.pone.0018430] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 11/01/2010] [Accepted: 03/03/2011] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND: Transcription factors are important controllers of gene expression and mapping transcription factor binding sites (TFBS) is key to inferring transcription factor regulatory networks. Several methods for predicting TFBS exist, but there are no standard genome-wide datasets on which to assess the performance of these prediction methods. Also, it is believed that information about sequence conservation across different genomes can generally improve accuracy of motif-based predictors, but it is not clear under what circumstances use of conservation is most beneficial.
RESULTS: Here we use published ChIP-seq data and an improved peak detection method to create comprehensive benchmark datasets for prediction methods which use known descriptors or binding motifs to detect TFBS in genomic sequences. We use this benchmark to assess the performance of five different prediction methods and find that the methods that use information about sequence conservation generally perform better than simpler motif-scanning methods. The difference is greater on high-affinity peaks and when using short and information-poor motifs. However, if the motifs are specific and information-rich, we find that simple motif-scanning methods can perform better than conservation-based methods.
CONCLUSIONS: Our benchmark provides a comprehensive test that can be used to rank the relative performance of transcription factor binding site prediction methods. Moreover, our results show that, contrary to previous reports, sequence conservation is better suited for predicting strong than weak transcription factor binding sites.
15
When needles look like hay: how to find tissue-specific enhancers in model organism genomes. Dev Biol 2010; 350:239-54. [PMID: 21130761 DOI: 10.1016/j.ydbio.2010.11.026] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Received: 04/14/2010] [Revised: 11/11/2010] [Accepted: 11/22/2010] [Indexed: 01/22/2023]
Abstract
A major prerequisite for the investigation of tissue-specific processes is the identification of cis-regulatory elements. No generally applicable technique is available to distinguish them from any other type of genomic non-coding sequence. Therefore, researchers often have to identify these elements by elaborate in vivo screens, testing individual regions until the right one is found. Here, based on many examples from the literature, we summarize how functional enhancers have been isolated from other elements in the genome and how they have been characterized in transgenic animals. Covering computational and experimental studies, we provide an overview of the global properties of cis-regulatory elements, like their specific interactions with promoters and target gene distances. We describe conserved non-coding elements (CNEs) and their internal structure, nucleotide composition, binding site clustering and overlap, with a special focus on developmental enhancers. Conflicting data and unresolved questions on the nature of these elements are highlighted. Our comprehensive overview of the experimental shortcuts that have been found in the different model organism communities and the new field of high-throughput assays should help during the preparation phase of a screen for enhancers. The review is accompanied by a list of general guidelines for such a project.
16
Cai Y, He Z, Shi X, Kong X, Gu L, Xie L. A novel sequence-based method of predicting protein DNA-binding residues, using a machine learning approach. Mol Cells 2010; 30:99-105. [PMID: 20706794 DOI: 10.1007/s10059-010-0093-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 08/14/2009] [Revised: 04/06/2010] [Accepted: 04/22/2010] [Indexed: 11/29/2022] Open
Abstract
Protein-DNA interactions play an essential role in transcriptional regulation, DNA repair, and many vital biological processes. The mechanism of protein-DNA binding, however, remains unclear. For the study of many diseases, researchers must improve their understanding of the amino acid motifs that recognize DNA. Because identifying these motifs experimentally is expensive and time-consuming, it is necessary to devise an approach for computational prediction. Some in silico methods have been developed, but there are still considerable limitations. In this study, we used a machine learning approach to develop a new sequence-based method of predicting protein-DNA binding residues. To make these predictions, we used the properties of the micro-environment of each amino acid from the AAIndex as well as conservation scores. Testing by the cross-validation method, we obtained an overall accuracy of 94.89%. Our method shows that the amino acid micro-environment is important for DNA binding, and that it is possible to identify the protein-DNA binding sites with it.
Affiliation(s)
- Yudong Cai
- Institute of System Biology, Shanghai University, Shanghai, 200244, People's Republic of China.
17
Adrian J, Farrona S, Reimer JJ, Albani MC, Coupland G, Turck F. cis-Regulatory elements and chromatin state coordinately control temporal and spatial expression of FLOWERING LOCUS T in Arabidopsis. THE PLANT CELL 2010; 22:1425-40. [PMID: 20472817 PMCID: PMC2899882 DOI: 10.1105/tpc.110.074682] [Citation(s) in RCA: 230] [Impact Index Per Article: 16.4] [Received: 02/17/2010] [Revised: 04/16/2010] [Accepted: 05/03/2010] [Indexed: 05/17/2023]
Abstract
Flowering time of summer annual Arabidopsis thaliana accessions is largely determined by the timing of FLOWERING LOCUS T (FT) expression in the leaf vasculature. To understand the complex interplay between activating and repressive inputs controlling flowering through FT, cis-regulatory sequences of FT were identified in this study. A proximal and an approximately 5-kb upstream promoter region containing highly conserved sequence blocks were found to be essential for FT activation by CONSTANS (CO). Chromatin-associated protein complexes add another layer to FT regulation. In plants constitutively overexpressing CO, changes in chromatin status, such as a decrease in binding of LIKE HETEROCHROMATIN PROTEIN1 (LHP1) and increased acetylation of H3K9 and K14, were observed throughout the FT locus, although these changes appear to be a consequence of FT upregulation and not a prerequisite for activation. Binding of LHP1 was required to repress enhancer elements located between the CO-controlled regions. By contrast, the distal and proximal promoter sequences required for FT activation coincide with chromatin that is locally depleted of LHP1 and H3K27me3, indicating that chromatin status facilitates the accessibility of transcription factors to FT. Therefore, distant regulatory regions are required for FT transcription, reflecting the complexity of its control, and differences in chromatin status delimit functionally important cis-regulatory regions.
18
Abstract
Animal growth and development depend on the precise control of gene expression at the level of transcription. A central role in the regulation of developmental transcription is attributed to transcription factors that bind DNA enhancer elements, which are often located far from gene transcription start sites. Here, we review recent studies that have uncovered significant regulatory functions in developmental transcription for the TFIID basal transcription factors and for the DNA core promoter elements that are located close to transcription start sites.
Affiliation(s)
- Uwe Ohler
- Institute for Genome Sciences & Policy, Departments of Biostatistics & Bioinformatics and Computer Science, Duke University, Durham, NC 27708, USA
19
Fu AQ, Adryan B. Scoring overlapping and adjacent signals from genome-wide ChIP and DamID assays. MOLECULAR BIOSYSTEMS 2009; 5:1429-38. [PMID: 19763325 PMCID: PMC3475982 DOI: 10.1039/b906880e] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/21/2022]
Abstract
Much of the research utilising genome-wide ChIP and DamID assays aims to understand the combinatorial feature of transcription factor binding and the chromatin modification code. With these experimental methods becoming more affordable and widespread, the focus of research is shifting to making sense of the data. Amongst the many challenges arising from data analyses, we are concerned with identifying biologically meaningful co-occurrences of transcription factor binding or chromatin modifications, using genome-wide profiles generated from ChIP and DamID assays. Co-occurrences are reflected in overlapping and adjacent signals in multiple ChIP or DamID profiles. We review existing quantitative methods to score overlaps and to cluster binding events in ChIP and DamID profiles. For pairwise comparison, existing methods either are based on a single score at the genome level or take a genomic, region-specific view. To draw inference from many profiles simultaneously, methods exist to cluster regions by their regulatory importance or to infer cis-regulatory modules for a particular region. We provide a simple guide to some of the statistical tools used by these methods.
Affiliation(s)
- Audrey Qiuyan Fu
- Cambridge Systems Biology Centre, University of Cambridge, Tennis Court Road, Cambridge, UK.
20
Unravelling cis-regulatory elements in the genome of the smallest photosynthetic eukaryote: phylogenetic footprinting in Ostreococcus. J Mol Evol 2009; 69:249-59. [PMID: 19693423 DOI: 10.1007/s00239-009-9271-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 05/08/2009] [Revised: 07/17/2009] [Accepted: 07/27/2009] [Indexed: 10/20/2022]
Abstract
We used a phylogenetic footprinting approach, adapted to high levels of divergence, to estimate the level of constraint in intergenic regions of the extremely gene dense Ostreococcus algae genomes (Chlorophyta, Prasinophyceae). We first benchmarked our method against the Saccharomyces sensu stricto genome data and found that the proportion of conserved non-coding sites was consistent with those obtained with methods using calibration by the neutral substitution rate. We then applied our method to the complete genomes of Ostreococcus tauri and O. lucimarinus, which are the most divergent species from the same genus sequenced so far. We found that 77% of intergenic regions in Ostreococcus still contain some phylogenetic footprints, as compared to 88% for Saccharomyces, corresponding to an average rate of constraint on intergenic region of 17% and 30%, respectively. A comparison with some known functional cis-regulatory elements enabled us to investigate whether some transcriptional regulatory pathways were conserved throughout the green lineage. Strikingly, the size of the phylogenetic footprints depends on gene orientation of neighboring genes, and appears to be genus-specific. In Ostreococcus, 5' intergenic regions contain four times more conserved sites than 3' intergenic regions, whereas in yeast a higher frequency of constrained sites in intergenic regions between genes on the same DNA strand suggests a higher frequency of bidirectional regulatory elements. The phylogenetic footprinting approach can be used despite high levels of divergence in the ultrasmall Ostreococcus algae, to decipher structure of constrained regulatory motifs, and identify putative regulatory pathways conserved within the green lineage.
21
Jeziorska DM, Jordan KW, Vance KW. A systems biology approach to understanding cis-regulatory module function. Semin Cell Dev Biol 2009; 20:856-62. [PMID: 19660565 DOI: 10.1016/j.semcdb.2009.07.007] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Received: 07/08/2009] [Accepted: 07/29/2009] [Indexed: 12/27/2022]
Abstract
The genomic instructions used to regulate development are encoded within a set of functional DNA elements called cis-regulatory modules (CRMs). These elements determine the precise patterns of temporal and spatial gene expression. Here we summarize recent progress made towards cataloguing and characterizing the complete repertoire of CRMs. We describe CRMs as genomic information processing devices containing clusters of transcription factor binding sites and we position CRMs as nodes within large gene regulatory networks. We define CRM architecture and describe how these genomic elements process the information they encode to their target genes. Furthermore, we present an overview describing high-throughput techniques to identify CRMs genome wide and experimental methodologies to validate their function on a large scale. This review emphasizes the advantages and power of a systems biology approach which integrates computational and experimental technologies to further our understanding of CRM function.
Affiliation(s)
- Danuta M Jeziorska
- Departments of Systems Biology and Biological Sciences, University of Warwick, Biomedical Research Institute, Gibbet Hill, Coventry CV4 7AL, UK
22
Marco A, Konikoff C, Karr TL, Kumar S. Relationship between gene co-expression and sharing of transcription factor binding sites in Drosophila melanogaster. Bioinformatics 2009; 25:2473-7. [PMID: 19633094 DOI: 10.1093/bioinformatics/btp462] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Indexed: 01/16/2023] Open
Abstract
MOTIVATION: In functional genomics, it is frequently useful to correlate expression levels of genes to identify transcription factor binding sites (TFBS) via the presence of common sequence motifs. The underlying assumption is that co-expressed genes are more likely to contain shared TFBS and, thus, TFBS can be identified computationally. Indeed, gene pairs with a very high expression correlation show a significant excess of shared binding sites in yeast. We have tested this assumption in a more complex organism, Drosophila melanogaster, by using experimentally determined TFBS and microarray expression data. We have also examined the reverse relationship between the expression correlation and the extent of TFBS sharing. RESULTS: Pairs of genes with shared TFBS show, on average, a higher degree of co-expression than those with no common TFBS in Drosophila. However, the reverse does not hold true: gene pairs with high expression correlations do not share significantly larger numbers of TFBS. An exception to this observation arises when comparing expression of genes from the earliest stages of embryonic development. Interestingly, semantic similarity between gene annotations (Biological Process) is much better associated with TFBS sharing than expression correlation is. We discuss these results in light of reverse engineering approaches to computationally predict regulatory sequences by using comparative genomics.
Affiliation(s)
- Antonio Marco
- Center for Evolutionary Functional Genomics, The Biodesign Institute, Arizona State University, Tempe, AZ 85287-5301, USA.
23
Deplancke B. Experimental advances in the characterization of metazoan gene regulatory networks. BRIEFINGS IN FUNCTIONAL GENOMICS AND PROTEOMICS 2009; 8:12-27. [PMID: 19324929 DOI: 10.1093/bfgp/elp001] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 12/26/2022]
Abstract
Gene regulatory networks (GRNs) play a vital role in metazoan development and function, and deregulation of these networks is often implicated in disease. GRNs depict the dynamic interactions between genomic and regulatory state components. The genomic components comprise genes and their associated cis-regulatory elements. The regulatory state components consist primarily of transcriptional complexes that bind the latter elements. With the availability of complete genome sequences, several approaches have recently been developed which promise to significantly enhance our ability to identify either the genomic or regulatory state components, or the interactions between these two. In this review, I provide an in-depth overview of these approaches and detail how each contributes to a more comprehensive understanding of GRN composition and function.
Affiliation(s)
- Bart Deplancke
- Ecole Polytechnique Fédérale de Lausanne, School of Life Sciences, Institute of Bioengineering, Lausanne, Switzerland.
24
Liu R, Hannenhalli S, Bucan M. Motifs and cis-regulatory modules mediating the expression of genes co-expressed in presynaptic neurons. Genome Biol 2009; 10:R72. [PMID: 19570198 PMCID: PMC2728526 DOI: 10.1186/gb-2009-10-7-r72] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Received: 05/08/2009] [Revised: 06/11/2009] [Accepted: 07/01/2009] [Indexed: 12/19/2022] Open
Abstract
An integrative strategy of comparative genomics, experimental and computational approaches reveals aspects of a regulatory network controlling neuronal-specific expression in presynaptic neurons. Background: Hundreds of proteins modulate neurotransmitter release and synaptic plasticity during neuronal development and in response to synaptic activity. The expression of genes in the pre- and post-synaptic neurons is under stringent spatio-temporal control, but the mechanism underlying the neuronal expression of these genes remains largely unknown. Results: Using unbiased in vivo and in vitro screens, we characterized the cis elements regulating the Rab3A gene, which is expressed abundantly in presynaptic neurons. A set of identified regulatory elements of the Rab3A gene corresponded to the defined Rab3A multi-species conserved elements. In order to identify clusters of enriched transcription factor binding sites, for example, cis-regulatory modules, we analyzed intergenic multi-species conserved elements in the vicinity of nine presynaptic genes, including Rab3A, that are highly and specifically expressed in brain regions. Sixteen transcription factor binding motifs were over-represented in these multi-species conserved elements. Based on a combined occurrence for these enriched motifs, multi-species conserved elements in the vicinity of 107 previously identified presynaptic genes were scored and ranked. We then experimentally validated the scoring strategy by showing that 12 of 16 (75%) high-scoring multi-species conserved elements functioned as neuronal enhancers in a cell-based assay. Conclusions: This work introduces an integrative strategy of comparative genomics, experimental, and computational approaches to reveal aspects of a regulatory network controlling neuronal-specific expression of genes in presynaptic neurons.
Affiliation(s)
- Rui Liu
- Department of Genetics and Penn Center for Bioinformatics, University of Pennsylvania, Philadelphia, PA 19104, USA
25
Riesenberg AN, Le TT, Willardsen MI, Blackburn DC, Vetter ML, Brown NL. Pax6 regulation of Math5 during mouse retinal neurogenesis. Genesis 2009; 47:175-87. [PMID: 19208436 DOI: 10.1002/dvg.20479] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.9] [Indexed: 01/17/2023]
Abstract
Activation of the bHLH factor Math5 (Atoh7) is an initiating event for mammalian retinal neurogenesis, as it is critically required for retinal ganglion cell formation. However, the cis-regulatory elements and trans-acting factors that control Math5 expression are largely unknown. Using a combination of transgenic mice and bioinformatics, we identified a phylogenetically conserved regulatory element that is required to activate Math5 transcription during early retinal neurogenesis. This element drives retinal expression in vivo, in a cross-species transgenic assay. Previously, Pax6 was shown to be necessary for the initiation of Math5 mRNA expression. We extend this finding by showing that the Math5 retinal enhancer also requires Pax6 for its activation, via Pax6 binding to a highly conserved binding site. In addition, our data reveal that other retinal factors are required for accurate regulation of Math5 by Pax6.
Affiliation(s)
- Amy N Riesenberg
- Division of Developmental Biology, Department of Pediatrics, University of Cincinnati School of Medicine, Cincinnati, Ohio 45229, USA
26
Evolution of Transcription Factor Binding Sites in Mammalian Gene Regulatory Regions: Handling Counterintuitive Results. J Mol Evol 2009; 68:654-64. [DOI: 10.1007/s00239-009-9238-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 10/04/2007] [Revised: 03/30/2009] [Accepted: 04/15/2009] [Indexed: 01/26/2023]
27
Discovery of transcriptional programs in cerebral ischemia by in silico promoter analysis. Brain Res 2009; 1272:3-13. [DOI: 10.1016/j.brainres.2009.03.046] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 01/09/2009] [Revised: 03/09/2009] [Accepted: 03/19/2009] [Indexed: 12/19/2022]
28
Wang S, Yang S, Yin Y, Guo X, Wang S, Hao D. An in silico strategy identified the target gene candidates regulated by dehydration responsive element binding proteins (DREBs) in Arabidopsis genome. PLANT MOLECULAR BIOLOGY 2009; 69:167-78. [PMID: 18931920 DOI: 10.1007/s11103-008-9414-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 03/20/2008] [Accepted: 10/01/2008] [Indexed: 05/23/2023]
Abstract
Identification of downstream target genes of stress-related transcription factors (TFs) is desirable for understanding cellular responses to various environmental stimuli. However, this has long been a difficult task for both experimental and computational approaches. In this research, we present a novel computational strategy that combines analysis of transcription factor binding site (TFBS) contexts with a machine learning approach. Using this strategy, we conducted a genome-wide investigation into novel direct target genes of dehydration responsive element binding proteins (DREBs), members of the AP2-EREBP transcription factor superfamily, which is reported to be responsive to various abiotic stresses in Arabidopsis. The genome-wide search yielded a total of 474 target gene candidates. With reference to microarray data on abiotic stress-inducible gene expression profiles, 268 of the 474 predicted target gene candidates were induced during 24-h exposure to abiotic stresses, which is about 57% of the predicted targets. Furthermore, GO annotations revealed that these target genes are likely involved in protein amino acid phosphorylation, protein binding and the endomembrane sorting system. The results suggest that the predicted target gene candidates are consistent with the essential biological principles of stress resistance in plants.
Affiliation(s)
- Shichen Wang
- College of Animal Science and Veterinary Medicine, Jilin University, Changchun 130062, People's Republic of China
29
Miura H, Tomaru Y, Nakanishi M, Kondo S, Hayashizaki Y, Suzuki M. Identification of DNA regions and a set of transcriptional regulatory factors involved in transcriptional regulation of several human liver-enriched transcription factor genes. Nucleic Acids Res 2008; 37:778-92. [PMID: 19074951 PMCID: PMC2647325 DOI: 10.1093/nar/gkn978] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 01/06/2023] Open
Abstract
Mammalian tissue- and/or time-specific transcription is primarily regulated in a combinatorial fashion through interactions between a specific set of transcriptional regulatory factors (TRFs) and their cognate cis-regulatory elements located in the regulatory regions. In exploring the DNA regions and TRFs involved in combinatorial transcriptional regulation, we noted that individual knockdown of a set of human liver-enriched TRFs such as HNF1A, HNF3A, HNF3B, HNF3G and HNF4A resulted in perturbation of the expression of several single TRF genes, such as HNF1A, HNF3G and CEBPA genes. We thus searched the potential binding sites for these five TRFs in the highly conserved genomic regions around these three TRF genes and found several putative combinatorial regulatory regions. Chromatin immunoprecipitation analysis revealed that almost all of the putative regulatory DNA regions were bound by the TRFs as well as two coactivators (CBP and p300). The strong transcription-enhancing activity of the putative combinatorial regulatory region located downstream of the CEBPA gene was confirmed. EMSA demonstrated specific bindings of these HNFs to the target DNA region. Finally, co-transfection reporter assays with various combinations of expression vectors for these HNF genes demonstrated the transcriptional activation of the CEBPA gene in a combinatorial manner by these TRFs.
Affiliation(s)
- Hisashi Miura
- RIKEN Omics Science Center, RIKEN Yokohama Institute 1-7-22 Suehiro-Cho, Tsurumi-Ku, Yokohama, Kanagawa 230-0045, Japan
30
Uchikawa M. Enhancer analysis by chicken embryo electroporation with aid of genome comparison. Dev Growth Differ 2008; 50:467-74. [DOI: 10.1111/j.1440-169x.2008.01028.x] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
31
Maston GA, Evans SK, Green MR. Transcriptional regulatory elements in the human genome. Annu Rev Genomics Hum Genet 2008; 7:29-59. [PMID: 16719718 DOI: 10.1146/annurev.genom.7.080505.115623] [Citation(s) in RCA: 546] [Impact Index Per Article: 34.1] [Indexed: 11/09/2022]
Abstract
The faithful execution of biological processes requires a precise and carefully orchestrated set of steps that depend on the proper spatial and temporal expression of genes. Here we review the various classes of transcriptional regulatory elements (core promoters, proximal promoters, distal enhancers, silencers, insulators/boundary elements, and locus control regions) and the molecular machinery (general transcription factors, activators, and coactivators) that interacts with the regulatory elements to mediate precisely controlled patterns of gene expression. The biological importance of transcriptional regulation is highlighted by examples of how alterations in these transcriptional components can lead to disease. Finally, we discuss the methods currently used to identify transcriptional regulatory elements, and the ability of these methods to be scaled up for the purpose of annotating the entire human genome.
Affiliation(s)
- Glenn A Maston
- Howard Hughes Medical Institute, Programs in Gene Function and Expression and Molecular Medicine, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA.
32
Identification of a novel regulatory region in the interleukin-6 gene promoter. Cytokine 2008; 42:256-264. [PMID: 18406623 DOI: 10.1016/j.cyto.2008.02.008] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Received: 09/17/2007] [Revised: 01/31/2008] [Accepted: 02/18/2008] [Indexed: 11/22/2022]
Abstract
Interleukin-6 (IL6) is an important pleiotropic cytokine that is regulated at the transcriptional level. To date, most work on its regulation has focused on a 1.2kb region 5' from the start of transcription, similar to published reports on other cytokine genes. This report demonstrates for the first time that a cytokine gene can be regulated by cis-acting regions much further upstream than previously examined. Comparative genomic analysis showed that a 120 kb region contains blocks of sequence conservation between human and rodent genomes, and that a 15 kb region proximal to the start of transcription contains 10 highly homologous sequence blocks of between 100 and 250 bp. By means of a reporter gene assay, a novel transcriptionally active region located between -5307 and -5202 bp upstream from the start of transcription was identified. Electrophoretic mobility shift assays showed nuclear protein(s) binding to this region, thus raising the possibility that the regulatory activity shown by the reporter gene constructs may be mediated by these proteins. These results suggest that the regulation of IL6 expression involves a much larger upstream region than previously examined and the control of IL6 transcription is likely to be regulated by a complex mechanism of modular cis-regulatory elements.
33
de Candia P, Blekhman R, Chabot AE, Oshlack A, Gilad Y. A combination of genomic approaches reveals the role of FOXO1a in regulating an oxidative stress response pathway. PLoS One 2008; 3:e1670. [PMID: 18301748 PMCID: PMC2244703 DOI: 10.1371/journal.pone.0001670] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.5] [Received: 01/16/2008] [Accepted: 01/30/2008] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND: While many of the phenotypic differences between human and chimpanzee may result from changes in gene regulation, only a handful of functionally important regulatory differences are currently known. As a first step towards identifying transcriptional pathways that have been remodeled in the human lineage, we focused on a transcription factor, FOXO1a, which we had previously found to be up-regulated in the human liver compared to that of three other primate species. We concentrated on this gene because of its known role in the regulation of metabolism and in longevity. METHODOLOGY: Using a combination of expression profiling following siRNA knockdown and chromatin immunoprecipitation in a human liver cell line, we identified eight novel direct transcriptional targets of FOXO1a. This set includes the gene for thioredoxin-interacting protein (TXNIP), the expression of which is directly repressed by FOXO1a. The thioredoxin-interacting protein is known to inhibit the reducing activity of thioredoxin (TRX), thereby hindering the cellular response to oxidative stress and affecting life span. CONCLUSIONS: Our results provide an explanation for the repeated observations that differences in the regulation of FOXO transcription factors affect longevity. Moreover, we found that TXNIP is down-regulated in human compared to chimpanzee, consistent with the up-regulation of its direct repressor FOXO1a in humans, and with differences in longevity between the two species.
Affiliation(s)
- Paola de Candia
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Ran Blekhman
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Adrien E. Chabot
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Alicia Oshlack
- Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- Yoav Gilad
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
34
Wei W, Yu XD. Comparative analysis of regulatory motif discovery tools for transcription factor binding sites. GENOMICS PROTEOMICS & BIOINFORMATICS 2007; 5:131-42. [PMID: 17893078 PMCID: PMC5054109 DOI: 10.1016/s1672-0229(07)60023-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 01/19/2023]
Abstract
In the post-genomic era, identification of specific regulatory motifs or transcription factor binding sites (TFBSs) in non-coding DNA sequences, which is essential to elucidate transcriptional regulatory networks, has emerged as an obstacle that frustrates many researchers. Consequently, numerous motif discovery tools and associated databases have been applied to this problem. However, these existing methods, based on different computational algorithms, show varying motif prediction efficiency in non-coding DNA sequences. Therefore, understanding the similarities and differences of the computational algorithms and enriching the motif discovery literature are important for users to choose the most appropriate of the tools available online. Moreover, credible criteria for assessing motif discovery tools, and guidance for researchers on choosing the best tool for their own projects, are still lacking. Thus, integration of the related resources might be a good approach to improving the accuracy of these applications. Recent studies integrate regulatory motif discovery tools with experimental methods to offer a complementary approach for researchers, and also provide a much-needed model for current research on transcriptional regulatory networks. Here we present a comparative analysis of regulatory motif discovery tools for TFBSs.
|
35
|
Andersson SA, Lagergren J. Motif Yggdrasil: sampling sequence motifs from a tree mixture model. J Comput Biol 2007; 14:682-97. [PMID: 17683268 DOI: 10.1089/cmb.2007.r010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022]
Abstract
In phylogenetic footprinting, putative regulatory elements are found in upstream regions of orthologous genes by searching for common motifs. Motifs in different upstream sequences are subject to mutations along the edges of the corresponding phylogenetic tree; consequently, taking advantage of the tree in the motif search is an appealing idea. We describe the Motif Yggdrasil sampler, the first Gibbs sampler based on a general tree that uses unaligned sequences. Previous tree-based Gibbs samplers have assumed a star-shaped tree or partially aligned upstream regions. We give a probabilistic model (MY model) describing upstream sequences with regulatory elements and build a Gibbs sampler with respect to this model. The model allows toggling, i.e., the restriction of a position to a subset of nucleotides, but requires neither aligned sequences nor edge lengths, which may be difficult to come by. We apply the collapsing technique to eliminate the need to sample nuisance parameters, and give a derivation of the predictive update formula. We show that the MY model improves the modeling of difficult motif instances and that the use of the tree achieves a substantial increase in nucleotide-level correlation coefficient for both synthetic data and 37 bacterial lexA genes. We investigate the sensitivity to errors in the tree and show that, even with random trees, the MY sampler still performs similarly to the original version.
Affiliation(s)
- Samuel A Andersson
- Stockholm Bioinformatics Center and School of Computer Science and Communication, Royal Institute of Technology, Stockholm, Sweden.
|
36
|
Chabot A, Shrit RA, Blekhman R, Gilad Y. Using reporter gene assays to identify cis regulatory differences between humans and chimpanzees. Genetics 2007; 176:2069-76. [PMID: 17565944 PMCID: PMC1950614 DOI: 10.1534/genetics.107.073429] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 12/13/2022]
Abstract
Most phenotypic differences between human and chimpanzee are likely to result from differences in gene regulation, rather than changes to protein-coding regions. To date, however, only a handful of human-chimpanzee nucleotide differences leading to changes in gene regulation have been identified. To home in on differences in regulatory elements between human and chimpanzee, we focused on 10 genes that were previously found to be differentially expressed between the two species. We then designed reporter gene assays for the putative human and chimpanzee promoters of the 10 genes. Of seven promoters that we found to be active in human liver cell lines, human and chimpanzee promoters had significantly different activity in four cases, three of which recapitulated the gene expression difference seen in the microarray experiment. For these three genes, we were therefore able to demonstrate that a change in cis influences expression differences between humans and chimpanzees. Moreover, using site-directed mutagenesis on one construct, the promoter for the DDA3 gene, we were able to identify three nucleotides that together lead to a cis regulatory difference between the species. High-throughput application of this approach can provide a map of regulatory element differences between humans and our close evolutionary relatives.
Affiliation(s)
- Adrien Chabot
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL 60637, USA
|
37
|
Chen N, Mah A, Blacque OE, Chu J, Phgora K, Bakhoum MW, Hunt Newbury CR, Khattra J, Chan S, Go A, Efimenko E, Johnsen R, Phirke P, Swoboda P, Marra M, Moerman DG, Leroux MR, Baillie DL, Stein LD. Identification of ciliary and ciliopathy genes in Caenorhabditis elegans through comparative genomics. Genome Biol 2007; 7:R126. [PMID: 17187676 PMCID: PMC1794439 DOI: 10.1186/gb-2006-7-12-r126] [Citation(s) in RCA: 80] [Impact Index Per Article: 4.7] [Received: 08/08/2006] [Revised: 10/20/2006] [Accepted: 12/22/2006] [Indexed: 01/05/2023]
Abstract
Comparative genomic analysis of three nematode species identifies 93 genes that encode putative components of the ciliated neurons in C. elegans and are subject to the same regulatory control.
Background: The recent availability of genome sequences of multiple related Caenorhabditis species has made it possible to identify, using comparative genomics, similarly transcribed genes in Caenorhabditis elegans and its sister species. Taking this approach, we have identified numerous novel ciliary genes in C. elegans, some of which may be orthologs of unidentified human ciliopathy genes.
Results: By screening for genes possessing canonical X-box sequences in promoters of three Caenorhabditis species, namely C. elegans, C. briggsae and C. remanei, we identified 93 genes (including known X-box regulated genes) that encode putative components of ciliated neurons in C. elegans and are subject to the same regulatory control. For many of these genes, restricted anatomical expression in ciliated cells was confirmed, and control of transcription by the ciliogenic DAF-19 RFX transcription factor was demonstrated by comparative transcriptional profiling of different tissue types and of daf-19(+) and daf-19(-) animals. Finally, we demonstrate that the dye-filling defect of dyf-5(mn400) animals, which is indicative of compromised exposure of cilia to the environment, is caused by a nonsense mutation in the serine/threonine protein kinase gene M04C9.5.
Conclusion: Our comparative genomics-based predictions may be useful for identifying genes involved in human ciliopathies, including Bardet-Biedl Syndrome (BBS), since the C. elegans orthologs of known human BBS genes contain X-box motifs and are required for normal dye filling in C. elegans ciliated neurons.
Affiliation(s)
- Nansheng Chen
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Department of Molecular Biology and Biochemistry, Simon Fraser University, University Drive, Burnaby, British Columbia, Canada V5A 1S6
- Allan Mah
- Department of Molecular Biology and Biochemistry, Simon Fraser University, University Drive, Burnaby, British Columbia, Canada V5A 1S6
- Oliver E Blacque
- Department of Molecular Biology and Biochemistry, Simon Fraser University, University Drive, Burnaby, British Columbia, Canada V5A 1S6
- School of Biomolecular and Biomedical Sciences, Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland
- Jeffrey Chu
- Department of Molecular Biology and Biochemistry, Simon Fraser University, University Drive, Burnaby, British Columbia, Canada V5A 1S6
- Kiran Phgora
- Department of Molecular Biology and Biochemistry, Simon Fraser University, University Drive, Burnaby, British Columbia, Canada V5A 1S6
- Mathieu W Bakhoum
- Department of Molecular Biology and Biochemistry, Simon Fraser University, University Drive, Burnaby, British Columbia, Canada V5A 1S6
- C Rebecca Hunt Newbury
- Department of Zoology, University of British Columbia, West Mall, Vancouver, British Columbia, Canada V6T 1Z4
- Jaswinder Khattra
- Department of Zoology, University of British Columbia, West Mall, Vancouver, British Columbia, Canada V6T 1Z4
- Susanna Chan
- Department of Zoology, University of British Columbia, West Mall, Vancouver, British Columbia, Canada V6T 1Z4
- Anne Go
- Department of Zoology, University of British Columbia, West Mall, Vancouver, British Columbia, Canada V6T 1Z4
- Evgeni Efimenko
- Karolinska Institute, Department of Biosciences and Nutrition, Södertörn University College, School of Life Sciences, S-14189 Huddinge, Sweden
- Robert Johnsen
- Department of Molecular Biology and Biochemistry, Simon Fraser University, University Drive, Burnaby, British Columbia, Canada V5A 1S6
- Prasad Phirke
- Karolinska Institute, Department of Biosciences and Nutrition, Södertörn University College, School of Life Sciences, S-14189 Huddinge, Sweden
- Peter Swoboda
- Karolinska Institute, Department of Biosciences and Nutrition, Södertörn University College, School of Life Sciences, S-14189 Huddinge, Sweden
- Marco Marra
- British Columbia Cancer Agency, Genome Sciences Centre, Vancouver, British Columbia, Canada V5Z 4S6
- Donald G Moerman
- Department of Zoology, University of British Columbia, West Mall, Vancouver, British Columbia, Canada V6T 1Z4
- Michel R Leroux
- Department of Molecular Biology and Biochemistry, Simon Fraser University, University Drive, Burnaby, British Columbia, Canada V5A 1S6
- David L Baillie
- Department of Molecular Biology and Biochemistry, Simon Fraser University, University Drive, Burnaby, British Columbia, Canada V5A 1S6
- Lincoln D Stein
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
|
38
|
Abnizova I, Subhankulova T, Gilks WR. Recent computational approaches to understand gene regulation: mining gene regulation in silico. Curr Genomics 2007; 8:79-91. [PMID: 18660846 PMCID: PMC2435357 DOI: 10.2174/138920207780368150] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 11/30/2006] [Revised: 12/13/2006] [Accepted: 12/15/2006] [Indexed: 01/03/2023]
Abstract
This paper reviews recent computational approaches to the understanding of gene regulation in eukaryotes. Cis-regulation of gene expression by the binding of transcription factors is a critical component of cellular physiology. In eukaryotes, a number of transcription factors often work together in a combinatorial fashion to enable cells to respond to a wide spectrum of environmental and developmental signals. Integration of genome sequences and/or Chromatin Immunoprecipitation on chip data with gene-expression data has facilitated in silico discovery of how the combinatorics and positioning of transcription factor binding sites underlie gene activation in a variety of cellular processes. The process of gene regulation is extremely complex and intriguing; therefore, all possible points of view and related links should be carefully considered. Here we attempt to collect an inventory, not claiming it to be comprehensive and complete, of related computational biological topics covering gene regulation, which may enlighten the process, and briefly review what is currently occurring in these areas. We will consider the following computational areas:
- gene regulatory network construction;
- evolution of regulatory DNA;
- studies of its structural and statistical informational properties;
- and finally, regulatory RNA.
Affiliation(s)
- T Subhankulova
- Wellcome Trust/Cancer Research UK Gurdon Institute of Cancer and Developmental Biology, Cambridge, UK
|
39
|
A comparative genomics approach to identifying the plasticity transcriptome. BMC Neurosci 2007; 8:20. [PMID: 17355637 PMCID: PMC1831778 DOI: 10.1186/1471-2202-8-20] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 10/12/2006] [Accepted: 03/13/2007] [Indexed: 02/04/2023]
Abstract
Background: Neuronal activity regulates gene expression to control learning and memory, homeostasis of neuronal function, and pathological disease states such as epilepsy. A great deal of experimental evidence supports the involvement of two particular transcription factors in shaping the genomic response to neuronal activity and mediating plasticity: CREB and zif268 (egr-1, krox24, NGFI-A). The gene targets of these two transcription factors are of considerable interest, since they may help develop hypotheses about how neural activity is coupled to changes in neural function.
Results: We have developed a computational approach for identifying binding sites for these transcription factors within the promoter regions of annotated genes in the mouse, rat, and human genomes. By combining a robust search algorithm to identify discrete binding sites, a comparison of targets across species, and an analysis of binding site locations within promoter regions, we have defined a group of candidate genes that are strong CREB- or zif268 targets and are thus regulated by neural activity. Our analysis revealed that CREB and zif268 share a disproportionate number of targets in common and that these common targets are dominated by transcription factors.
Conclusion: These observations may enable a more detailed understanding of the regulatory networks that are induced by neural activity and contribute to the plasticity transcriptome. The target genes identified in this study will be a valuable resource for investigators who hope to define the functions of specific genes that underlie activity-dependent changes in neuronal properties.
|
40
|
Lee J, Li Z, Brower-Sinning R, John B. Regulatory circuit of human microRNA biogenesis. PLoS Comput Biol 2007; 3:e67. [PMID: 17447837 PMCID: PMC1853126 DOI: 10.1371/journal.pcbi.0030067] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.5] [Received: 08/24/2006] [Accepted: 02/27/2007] [Indexed: 01/07/2023]
Abstract
miRNAs (microRNAs) are a class of endogenous small RNAs that are thought to negatively regulate protein production. Aberrant expression of many miRNAs is linked to cancer and other diseases. Little is known about the factors that regulate the expression of miRNAs. We have identified numerous regulatory elements upstream of miRNA genes that are likely to be essential to the transcriptional and posttranscriptional regulation of miRNAs. Newly identified regulatory motifs occur frequently and in multiple copies upstream of miRNAs. The motifs are highly enriched in G and C nucleotides, in comparison with the nucleotide composition of miRNA upstream sequences. Although the motifs were predicted using sequences that are upstream of miRNAs, we find that 99% of the top-predicted motifs preferentially occur within the first 500 nucleotides upstream of the transcription start sites of protein-coding genes; the observed preference in location underscores the validity and importance of the motifs identified in this study. Our study also raises the possibility that a considerable number of well-characterized, disease-associated transcription factors (TFs) of protein-coding genes contribute to the abnormal miRNA expression in diseases such as cancer. Further analysis of predicted miRNA–protein interactions led us to hypothesize that TFs that include c-Myb, NF-Y, Sp-1, MTF-1, and AP-2α are master-regulators of miRNA expression. Our predictions are a solid starting point for the systematic elucidation of the causative basis for aberrant expression patterns of disease-related (e.g., cancer) miRNAs. Thus, we point out that focused studies of the TFs that regulate miRNAs will be paramount in developing cures for miRNA-related diseases. The identification of the miRNA regulatory motifs was facilitated by a new computational method, K-Factor. K-Factor predicts regulatory motifs in a set of functionally related sequences, without relying on evolutionary conservation.
microRNAs (miRNAs) are unusually small RNAs that are thought to control the production of proteins in the cell. Recent studies have linked miRNAs to several types of cancers. Several studies strongly suggest that miRNAs could be useful as diagnostic and prognostic markers of various cancers. Thus, although miRNAs appear to have opened up a new chapter in cancer biology, the fundamental question of why miRNAs are strongly associated with diseases such as cancer remains unclear. Here, we endeavored to systematically identify the factors that regulate miRNA biogenesis. We first identified a large number of DNA sequence elements that are characteristic of miRNA genes, using a new computational method named K-Factor. The sequence elements were then used to match known protein binding sites to identify specific proteins (transcription factors, TFs) that regulate miRNA biogenesis. Based on our observations, we put forward the hypothesis that a number of known TFs are primarily responsible for the aberrant regulation of miRNAs in cancer and other diseases.
Affiliation(s)
- Ji Lee
- Department of Computational Biology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, United States of America
- Department of Bioengineering, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Zhihua Li
- Department of Computational Biology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, United States of America
- Rachel Brower-Sinning
- Department of Computational Biology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, United States of America
- Bino John
- Department of Computational Biology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, United States of America
- University of Pittsburgh Cancer Institute, Pittsburgh, Pennsylvania, United States of America
- * To whom correspondence should be addressed.
|
41
|
Davuluri RV. Bioinformatics tools for modeling transcription factor target genes and epigenetic changes. Methods Mol Biol 2007; 408:129-151. [PMID: 18314581 DOI: 10.1007/978-1-59745-547-3_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023]
Abstract
The combinatorial control of gene regulatory switches involves both transcription factor (TF) complexes and associated epigenetic modifications to the chromatin template. Novel high-throughput technologies, such as chromatin immunoprecipitation on chip (ChIP-chip), have enabled genome-wide in vivo identification of TF target regulatory regions and related epigenetic modifications, which has led to the view of highly dynamic TF-DNA interactions in activated or repressed promoters. Consequently, modeling and elucidating the combinatorial interaction of TFs and corresponding cis-regulatory modules in target promoters is of paramount interest. An estimated 5% of the genes in mammalian genomes code for TF proteins, and computational modeling of cis-regulatory logic would rapidly increase the pace of experimental confirmation of TF target promoters at the bench. The purpose of this chapter is to discuss the use of different bioinformatics tools for predicting the target genes of TFs of interest in mammalian genomes, and the application of these methods in the analysis of ChIP-chip experimental data. The author describes the most commonly used databases and prediction programs that are available on the World Wide Web and demonstrates the use of some of these programs with an example. A list of these programs is provided along with their Uniform Resource Locators (URLs), and guidelines for successful application are suggested.
Affiliation(s)
- Ramana V Davuluri
- OSU Comprehensive Cancer Center, Ohio State University, Columbus, USA
|
42
|
Lein ES, Hawrylycz MJ, Ao N, Ayres M, Bensinger A, Bernard A, Boe AF, Boguski MS, Brockway KS, Byrnes EJ, Chen L, Chen L, Chen TM, Chin MC, Chong J, Crook BE, Czaplinska A, Dang CN, Datta S, Dee NR, Desaki AL, Desta T, Diep E, Dolbeare TA, Donelan MJ, Dong HW, Dougherty JG, Duncan BJ, Ebbert AJ, Eichele G, Estin LK, Faber C, Facer BA, Fields R, Fischer SR, Fliss TP, Frensley C, Gates SN, Glattfelder KJ, Halverson KR, Hart MR, Hohmann JG, Howell MP, Jeung DP, Johnson RA, Karr PT, Kawal R, Kidney JM, Knapik RH, Kuan CL, Lake JH, Laramee AR, Larsen KD, Lau C, Lemon TA, Liang AJ, Liu Y, Luong LT, Michaels J, Morgan JJ, Morgan RJ, Mortrud MT, Mosqueda NF, Ng LL, Ng R, Orta GJ, Overly CC, Pak TH, Parry SE, Pathak SD, Pearson OC, Puchalski RB, Riley ZL, Rockett HR, Rowland SA, Royall JJ, Ruiz MJ, Sarno NR, Schaffnit K, Shapovalova NV, Sivisay T, Slaughterbeck CR, Smith SC, Smith KA, Smith BI, Sodt AJ, Stewart NN, Stumpf KR, Sunkin SM, Sutram M, Tam A, Teemer CD, Thaller C, Thompson CL, Varnam LR, Visel A, Whitlock RM, Wohnoutka PE, Wolkey CK, Wong VY, Wood M, Yaylaoglu MB, Young RC, Youngstrom BL, Yuan XF, Zhang B, Zwingman TA, Jones AR. Genome-wide atlas of gene expression in the adult mouse brain. Nature 2006; 445:168-76. [PMID: 17151600 DOI: 10.1038/nature05453] [Citation(s) in RCA: 3885] [Impact Index Per Article: 215.8] [Received: 07/28/2006] [Accepted: 11/15/2006] [Indexed: 11/09/2022]
Abstract
Molecular approaches to understanding the functional circuitry of the nervous system promise new insights into the relationship between genes, brain and behaviour. The cellular diversity of the brain necessitates a cellular resolution approach towards understanding the functional genomics of the nervous system. We describe here an anatomically comprehensive digital atlas containing the expression patterns of approximately 20,000 genes in the adult mouse brain. Data were generated using automated high-throughput procedures for in situ hybridization and data acquisition, and are publicly accessible online. Newly developed image-based informatics tools allow global genome-scale structural analysis and cross-correlation, as well as identification of regionally enriched genes. Unbiased fine-resolution analysis has identified highly specific cellular markers as well as extensive evidence of cellular heterogeneity not evident in classical neuroanatomical atlases. This highly standardized atlas provides an open, primary data resource for a wide variety of further studies concerning brain organization and function.
Affiliation(s)
- Ed S Lein
- Allen Institute for Brain Science, Seattle, Washington 98103, USA.
|
43
|
Elnitski L, Jin VX, Farnham PJ, Jones SJM. Locating mammalian transcription factor binding sites: a survey of computational and experimental techniques. Genome Res 2006; 16:1455-64. [PMID: 17053094 DOI: 10.1101/gr.4140006] [Citation(s) in RCA: 168] [Impact Index Per Article: 9.3] [Indexed: 11/24/2022]
Abstract
Fields such as genomics and systems biology are built on the synergism between computational and experimental techniques. This type of synergism is especially important in accomplishing goals like identifying all functional transcription factor binding sites in vertebrate genomes. Precise detection of these elements is a prerequisite to deciphering the complex regulatory networks that direct tissue specific and lineage specific patterns of gene expression. This review summarizes approaches for in silico, in vitro, and in vivo identification of transcription factor binding sites. A variety of techniques useful for localized- and high-throughput analyses are discussed here, with emphasis on aspects of data generation and verification.
Affiliation(s)
- Laura Elnitski
- Genomic Functional Analysis Section, National Human Genome Research Institute, National Institutes of Health, Rockville, Maryland 20878, USA.
|
44
|
Fu Q, Manolagas SC, O'Brien CA. Parathyroid hormone controls receptor activator of NF-kappaB ligand gene expression via a distant transcriptional enhancer. Mol Cell Biol 2006; 26:6453-68. [PMID: 16914731 PMCID: PMC1592840 DOI: 10.1128/mcb.00356-06] [Citation(s) in RCA: 149] [Impact Index Per Article: 8.3] [Indexed: 11/20/2022]
Abstract
RANKL, a protein essential for osteoclast development and survival, is stimulated by parathyroid hormone (PTH) via a PTH receptor 1/cyclic AMP (cAMP)/protein kinase A (PKA)/CREB cascade, exclusively in osteoblastic cells. We report that a bacterial artificial chromosome-based transcriptional reporter construct containing 120 kb of RANKL 5'-flanking region was stimulated by dibutyryl-cAMP in stromal/osteoblastic cells, but not other cell types. Full cAMP responsiveness was dependent upon a conserved 715-bp region located 76 kb upstream from the transcription start site, which we identified by sequential deletion analysis and by comparison of human and mouse genomic sequences in silico. This region contained conserved consensus sequences which bound CREB and the osteoblast-specific transcription factor Runx2, and when mutated blunted cAMP responsiveness. Overexpression of Runx2 potentiated cAMP responsiveness of the endogenous RANKL gene in a cell-type-specific manner. Lastly, PTH responsiveness of the endogenous RANKL gene was abrogated in mice from which we deleted this conserved upstream region. Thus, PTH responsiveness of the RANKL gene is determined by a distant regulatory region that responds to cAMP in a cell-type-specific manner and Runx2 may contribute to such cell-type specificity.
Affiliation(s)
- Qiang Fu
- University of Arkansas for Medical Sciences, 4301 W. Markham St., Mail Slot 587, Little Rock, AR 72205, USA.
|
45
|
Beltran A, Liu Y, Parikh S, Temple B, Blancafort P. Interrogating genomes with combinatorial artificial transcription factor libraries: asking zinc finger questions. Assay Drug Dev Technol 2006; 4:317-31. [PMID: 16834537 DOI: 10.1089/adt.2006.4.317] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 12/29/2022]
Abstract
Artificial transcription factors (ATFs) are proteins designed to specifically bind and regulate genes. Because of their DNA-binding selectivity and modular organization, arrays of zinc finger (ZF) domains have traditionally been used to build the ATF's DNA-binding domains. ATFs have been designed and constructed to regulate a variety of therapeutic targets. Recently, novel combinatorial technologies have been developed to induce expression of any gene of interest or to modify cellular phenotypes. Large repertoires of ATFs have been generated by recombination of all available sequence-specific ZF lexicons. These libraries comprise millions of ATFs with unique DNA-binding specificities. The ATFs are produced by combinatorial assembly of three- and six-ZF building blocks and are linked to activator or repressor domains. Upon delivery into a cell population, any gene in the human genome can potentially be regulated. ATF library members generate genome-wide, experimental perturbations of gene expression, resulting in a phenotypically diverse population, or cellular library. A variety of phenotypic screenings can be applied to select for cells exhibiting a phenotype of interest. The ATFs are then used as genetic probes to identify the targeted genes responsible for the phenotypic switch. In this review we will summarize several applications of ATF library screenings in gene discovery, biotechnology, and disease therapeutics.
Affiliation(s)
- Adriana Beltran
- Department of Pharmacology, University of North Carolina, Chapel Hill, NC 27599, USA
|
46
|
Podvinec M, Meyer UA. Prediction of cis-regulatory elements for drug-activated transcription factors in the regulation of drug-metabolising enzymes and drug transporters. Expert Opin Drug Metab Toxicol 2006; 2:367-79. [PMID: 16863440 DOI: 10.1517/17425255.2.3.367] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/05/2022]
Abstract
The expression of drug-metabolising enzymes is affected by many endogenous and exogenous factors, including sex, age, diet and exposure to xenobiotics and drugs. To understand fully how the organism metabolises a drug, these alterations in gene expression must be taken into account. The central process, the definition of likely regulatory elements in the genes coding for enzymes and transporters involved in drug disposition, can be vastly accelerated using existing and emerging bioinformatics methods to unravel the regulatory networks causing drug-mediated induction of genes. Here, various approaches to predict transcription factor interactions with regulatory DNA elements are reviewed.
Affiliation(s)
- Michael Podvinec
- Swiss Institute of Bioinformatics and Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland.
|
47
|
Gómez-Skarmeta JL, Lenhard B, Becker TS. New technologies, new findings, and new concepts in the study of vertebrate cis-regulatory sequences. Dev Dyn 2006; 235:870-85. [PMID: 16395688 DOI: 10.1002/dvdy.20659] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.9] [Indexed: 01/05/2023]
Abstract
All vertebrates share a similar early embryonic body plan and use the same regulatory genes for their development. The availability of numerous sequenced vertebrate genomes and significant advances in bioinformatics have resulted in the finding that the genomic regions of many of these developmental regulatory genes also contain highly conserved noncoding sequence. In silico discovery of conserved noncoding regions and of transcription factor binding sites as well as the development of methods for high throughput transgenesis in Xenopus and zebrafish are dramatically increasing the speed with which regulatory elements can be discovered, characterized, and tested in the context of whole live embryos. We review here some of the recent technological developments that will likely lead to a surge in research on how vertebrate genomes encode regulation of transcriptional activity, how regulatory sequences constrain genomic architecture, and ultimately how vertebrate form has evolved.
|
48
|
Abstract
Morphogens act as graded positional cues that control cell fate specification in many developing tissues. This concept, in which a signalling gradient regulates differential gene expression in a concentration-dependent manner, provides a basis for understanding many patterning processes. It also raises several mechanistic issues, such as how responding cells perceive and interpret the concentration-dependent information provided by a morphogen to generate precise patterns of gene expression and cell differentiation in developing tissues. Here, we review recent work on the molecular features of morphogen signalling that facilitate the interpretation of graded signals and attempt to identify some emerging common principles.
Affiliation(s)
- Hilary L Ashe
- Faculty of Life Sciences, The University of Manchester, UK.
|
49
|
Abnizova I, Gilks WR. Studying statistical properties of regulatory DNA sequences, and their use in predicting regulatory regions in the eukaryotic genomes. Brief Bioinform 2006; 7:48-54. [PMID: 16761364 DOI: 10.1093/bib/bbk004] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 01/22/2023]
Abstract
There are no well-known properties in regulatory DNA analogous to those in coding sequences: the spatial location of regulatory elements is not regular, the consensus regulatory elements are often degenerate, and there are no understandable rules governing their evolution. This makes it difficult to recognize regulatory regions within a genome. We review developments in the statistical characterization of regulatory regions and methods for recognizing them in eukaryotic genomes.
|
50
|
Ochoa-Espinosa A, Small S. Developmental mechanisms and cis-regulatory codes. Curr Opin Genet Dev 2006; 16:165-70. [PMID: 16503128 DOI: 10.1016/j.gde.2006.02.014] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Received: 01/23/2006] [Accepted: 02/13/2006] [Indexed: 12/30/2022]
Abstract
Complex networks of transcriptional interactions control the processes of animal development. These networks begin with broad positional information that patterns the cells of the early embryo, and end with precise expression profiles that provide the functions of fully differentiated cells. At the heart of these networks are cis-regulatory modules (CRMs), which contain binding sites for regulatory proteins and control the spatial and temporal expression of genes within the network. Recent studies in several model systems have begun to decipher the 'cis-regulatory codes' of CRMs involved in various developmental processes. These studies suggest that CRMs involved in regulating co-expressed genes share sequence characteristics that can be identified by in silico approaches. They also suggest that CRMs involved in specific types of developmental events have common binding site architectures, which can be linked to their specific functions.
|