51
Shu JJ, Li Y. A statistical fat-tail test of predicting regulatory regions in the Drosophila genome. Comput Biol Med 2012; 42:935-41. [PMID: 22884312 DOI: 10.1016/j.compbiomed.2012.07.007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 09/02/2010] [Revised: 05/29/2012] [Accepted: 07/18/2012] [Indexed: 11/19/2022]
Affiliation(s)
- Jian-Jun Shu
- School of Mechanical & Aerospace Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore.
52
Chen H, Xu Z, Mei C, Yu D, Small S. A system of repressor gradients spatially organizes the boundaries of Bicoid-dependent target genes. Cell 2012; 149:618-29. [PMID: 22541432 DOI: 10.1016/j.cell.2012.03.018] [Citation(s) in RCA: 93] [Impact Index Per Article: 7.8] [Received: 11/18/2011] [Revised: 02/24/2012] [Accepted: 03/16/2012] [Indexed: 12/19/2022]
Abstract
The homeodomain (HD) protein Bicoid (Bcd) is thought to function as a gradient morphogen that positions boundaries of target genes via threshold-dependent activation mechanisms. Here, we analyze 66 Bcd-dependent regulatory elements and show that their boundaries are positioned primarily by repressive gradients that antagonize Bcd-mediated activation. A major repressor is the pair-rule protein Runt (Run), which is expressed in an opposing gradient and is necessary and sufficient for limiting Bcd-dependent activation. Evidence is presented that Run functions with the maternal repressor Capicua and the gap protein Kruppel as the principal components of a repression system that correctly orders boundaries throughout the anterior half of the embryo. These results put conceptual limits on the Bcd morphogen hypothesis and demonstrate how the Bcd gradient functions within the gene network that patterns the embryo.
Affiliation(s)
- Hongtao Chen
- Department of Biology, New York University, 100 Washington Square East, New York, NY 10003, USA
53
Lettice L, Williamson I, Wiltshire J, Peluso S, Devenney P, Hill A, Essafi A, Hagman J, Mort R, Grimes G, DeAngelis C, Hill R. Opposing functions of the ETS factor family define Shh spatial expression in limb buds and underlie polydactyly. Dev Cell 2012; 22:459-67. [PMID: 22340503 PMCID: PMC3314984 DOI: 10.1016/j.devcel.2011.12.010] [Citation(s) in RCA: 107] [Impact Index Per Article: 8.9] [Received: 08/24/2010] [Revised: 09/20/2011] [Accepted: 12/15/2011] [Indexed: 12/11/2022]
Abstract
Sonic hedgehog (Shh) expression during limb development is crucial for specifying the identity and number of digits. The spatial pattern of Shh expression is restricted to a region called the zone of polarizing activity (ZPA), and this expression is controlled from a long distance by the cis-regulator ZRS. Here, members of two groups of ETS transcription factors are shown to act directly at the ZRS mediating a differential effect on Shh, defining its spatial expression pattern. Occupancy at multiple GABPα/ETS1 sites regulates the position of the ZPA boundary, whereas ETV4/ETV5 binding restricts expression outside the ZPA. The ETS gene family is therefore attributed with specifying the boundaries of the classical ZPA. Two point mutations within the ZRS change the profile of ETS binding and activate Shh expression at an ectopic site in the limb bud. These molecular changes define a pathogenetic mechanism that leads to preaxial polydactyly (PPD).
Affiliation(s)
- Laura A. Lettice
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK
- Iain Williamson
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK
- John H. Wiltshire
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK
- Silvia Peluso
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK
- Paul S. Devenney
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK
- Alison E. Hill
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK
- Abdelkader Essafi
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK
- James Hagman
- Integrated Department of Immunology, National Jewish Health, Denver, CO 80206, USA
- Richard Mort
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK
- Graeme Grimes
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK
- Carlo L. DeAngelis
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK
- Robert E. Hill
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK
- Corresponding author
54
van Loo KMJ, Schaub C, Pernhorst K, Yaari Y, Beck H, Schoch S, Becker AJ. Transcriptional regulation of T-type calcium channel CaV3.2: bi-directionality by early growth response 1 (Egr1) and repressor element 1 (RE-1) protein-silencing transcription factor (REST). J Biol Chem 2012; 287:15489-501. [PMID: 22431737 DOI: 10.1074/jbc.m111.310763] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.8] [Indexed: 11/06/2022]
Abstract
The pore-forming Ca²⁺ channel subunit CaV3.2 mediates a low voltage-activated (T-type) Ca²⁺ current (ICaT) that contributes pivotally to neuronal and cardiac pacemaker activity. Despite the importance of tightly regulated CaV3.2 levels, the mechanisms regulating its transcriptional dynamics are not well understood. Here, we have identified two key factors that up- and down-regulate the expression of the gene encoding CaV3.2 (Cacna1h). First, we determined the promoter region and observed several stimulatory and inhibitory clusters. Furthermore, we found binding sites for the transcription factor early growth response 1 (Egr1/Zif268/Krox-24) to be highly overrepresented within the CaV3.2 promoter region. mRNA expression analyses and dual-luciferase promoter assays revealed that the CaV3.2 promoter was strongly activated by Egr1 overexpression in vitro and in vivo. Subsequent chromatin immunoprecipitation assays in NG108-15 cells and mouse hippocampi confirmed specific Egr1 binding to the CaV3.2 promoter. Congruently, whole-cell ICaT values were significantly larger after Egr1 overexpression. Intriguingly, Egr1-induced activation of the CaV3.2 promoter was effectively counteracted by the repressor element 1-silencing transcription factor (REST). Thus, Egr1 and REST can bi-directionally regulate CaV3.2 promoter activity and mRNA expression and, hence, the size of ICaT. This mechanism has critical implications for the regulation of neuronal and cardiac Ca²⁺ homeostasis under physiological conditions and in episodic disorders such as arrhythmias and epilepsy.
Affiliation(s)
- Karen M J van Loo
- Department of Neuropathology, University of Bonn Medical Center, D-53105 Bonn, Germany.
55
Nikulova AA, Favorov AV, Sutormin RA, Makeev VJ, Mironov AA. CORECLUST: identification of the conserved CRM grammar together with prediction of gene regulation. Nucleic Acids Res 2012; 40:e93. [PMID: 22422836 PMCID: PMC3384346 DOI: 10.1093/nar/gks235] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/16/2022]
Abstract
Identification of transcriptional regulatory regions and tracing their internal organization are important for understanding the eukaryotic cell machinery. Cis-regulatory modules (CRMs) of higher eukaryotes are believed to possess a regulatory ‘grammar’, or preferred arrangement of binding sites, that is crucial for proper regulation and thus tends to be evolutionarily conserved. Here, we present a method CORECLUST (COnservative REgulatory CLUster STructure) that predicts CRMs based on a set of positional weight matrices. Given regulatory regions of orthologous and/or co-regulated genes, CORECLUST constructs a CRM model by revealing the conserved rules that describe the relative location of binding sites. The constructed model may be consequently used for the genome-wide prediction of similar CRMs, and thus detection of co-regulated genes, and for the investigation of the regulatory grammar of the system. Compared with related methods, CORECLUST shows better performance at identification of CRMs conferring muscle-specific gene expression in vertebrates and early-developmental CRMs in Drosophila.
Affiliation(s)
- Anna A Nikulova
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 1-73 Leninskie Gory, Moscow 119991, Russia.
56
Nikulova AA, Polishchuk MS, Tumanian VG, Makeev VY, Mironov AA, Favorov AV. Correlations between clusters of protein-DNA binding sites and the binding experimental data allow predicting a structure of regulatory modules. Biophysics (Nagoya-shi) 2012. [DOI: 10.1134/s0006350912020157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
57
He X, Duque TSPC, Sinha S. Evolutionary origins of transcription factor binding site clusters. Mol Biol Evol 2011; 29:1059-70. [PMID: 22075113 DOI: 10.1093/molbev/msr277] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.0] [Indexed: 12/12/2022]
Abstract
Empirical studies have revealed that regulatory DNA sequences such as enhancers or promoters often harbor multiple binding sites for the same transcription factor. Such "homotypic site clustering" has been hypothesized as arising out of functional requirements of the sequences. Here, we propose an alternative explanation of this phenomenon that multisite enhancers are common because they are favored by evolutionary sampling of the genotype-phenotype landscape. To test this hypothesis, we developed a new computational framework specialized for population genetic simulations of enhancer evolution. It uses a thermodynamics-based model of enhancer function, integrating information from strong as well as weak binding sites, to determine the strength of selection. Using this framework, we found that even when simpler genotypes exist for a desired strength of regulation, relatively complex genotypes (enhancers with more sites) are more readily reached by the simulated evolutionary process. We show that there are more ways to "build" a fit genotype with many weak sites than with a few strong sites, and this is why evolution finds complex genotypes more often. Our claims are consistent with an empirical analysis of binding site content in enhancers characterized in Drosophila melanogaster and their orthologs in other Drosophila species. We also characterized a subtle but significant difference between genotypes likely to be sampled by evolution and equally fit genotypes one would obtain by uniform sampling of the fitness landscape, that is, an "evolutionary signature" in enhancer sequences. Finally, we investigated potential effects of other factors, such as rugged fitness landscapes, short local duplications, and noise characteristics of enhancers, on the emergence of homotypic site clustering. Homotypic site clustering is an important contributor to the complexity and function of cis-regulatory sequences. 
This work provides a simple null hypothesis for its origin, against which alternative adaptationist explanations may be evaluated, and cautions against "evolutionary mirages" present in common features of genomic sequence. The quantitative framework we develop here can be used more generally to understand how mechanisms of enhancer action influence their composition and evolution.
Affiliation(s)
- Xin He
- Department of Biochemistry, University of California at San Francisco, CA, USA
58
Struffi P, Corado M, Kaplan L, Yu D, Rushlow C, Small S. Combinatorial activation and concentration-dependent repression of the Drosophila even skipped stripe 3+7 enhancer. Development 2011; 138:4291-9. [PMID: 21865322 DOI: 10.1242/dev.065987] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Indexed: 12/17/2022]
Abstract
Despite years of study, the precise mechanisms that control position-specific gene expression during development are not understood. Here, we analyze an enhancer element from the even skipped (eve) gene, which activates and positions two stripes of expression (stripes 3 and 7) in blastoderm stage Drosophila embryos. Previous genetic studies showed that the JAK-STAT pathway is required for full activation of the enhancer, whereas the gap genes hunchback (hb) and knirps (kni) are required for placement of the boundaries of both stripes. We show that the maternal zinc-finger protein Zelda (Zld) is absolutely required for activation, and present evidence that Zld binds to multiple non-canonical sites. We also use a combination of in vitro binding experiments and bioinformatics analysis to redefine the Kni-binding motif, and mutational analysis and in vivo tests to show that Kni and Hb are dedicated repressors that function by direct DNA binding. These experiments significantly extend our understanding of how the eve enhancer integrates positive and negative transcriptional activities to generate sharp boundaries in the early embryo.
Affiliation(s)
- Paolo Struffi
- Department of Biology, New York University, New York, NY 10003, USA
59
Kulakovskiy IV, Belostotsky AA, Kasianov AS, Esipova NG, Medvedeva YA, Eliseeva IA, Makeev VJ. A deeper look into transcription regulatory code by preferred pair distance templates for transcription factor binding sites. Bioinformatics 2011; 27:2621-4. [PMID: 21852305 DOI: 10.1093/bioinformatics/btr453] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Modern experimental methods provide substantial information on protein-DNA recognition. Studying arrangements of transcription factor binding sites (TFBSs) of interacting transcription factors (TFs) advances understanding of the transcription regulatory code. RESULTS: We constructed binding motifs for TFs forming a complex with HIF-1α at the erythropoietin 3′-enhancer. Corresponding TFBSs were predicted in the segments around transcription start sites (TSSs) of all human genes. Using the genome-wide set of regulatory regions, we observed several strongly preferred distances between the hypoxia-responsive element (HRE) and binding sites of a particular cofactor protein. The set of preferred distances was called a preferred pair distance template (PPDT). The PPDT depended dramatically on the TF and on the orientation of its binding sites relative to the HRE. The PPDT evaluated from the genome-wide set of regulatory sequences was used to detect significant PPDT-consistent binding site pairs in regulatory regions of hypoxia-responsive genes. We believe PPDT can help to reveal the layout of eukaryotic regulatory segments. CONTACT: ivan.kulakovskiy@gmail.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- I V Kulakovskiy
- Laboratory of Bioinformatics and System Biology, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russia.
60
Schindler AJ, Sherwood DR. The transcription factor HLH-2/E/Daughterless regulates anchor cell invasion across basement membrane in C. elegans. Dev Biol 2011; 357:380-91. [PMID: 21784067 DOI: 10.1016/j.ydbio.2011.07.012] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 02/12/2011] [Revised: 06/17/2011] [Accepted: 07/07/2011] [Indexed: 10/18/2022]
Abstract
Cell invasion through basement membrane is a specialized cellular behavior critical for many developmental processes and leukocyte trafficking. Invasive cellular behavior is also inappropriately co-opted during cancer progression. Acquisition of an invasive phenotype is accompanied by changes in gene expression that are thought to coordinate the steps of invasion. The transcription factors responsible for these changes in gene expression, however, are largely unknown. C. elegans anchor cell (AC) invasion is a genetically tractable in vivo model of invasion through basement membrane. AC invasion requires the conserved transcription factor FOS-1A, but other transcription factors are thought to act in parallel to FOS-1A to control invasion. Here we identify the transcription factor HLH-2, the C. elegans ortholog of Drosophila Daughterless and vertebrate E proteins, as a regulator of AC invasion. Reduction of HLH-2 function by RNAi or with a hypomorphic allele causes defects in AC invasion. Genetic analysis indicates that HLH-2 has functions outside of the FOS-1A pathway. Using expression analysis, we identify three genes that are transcriptionally regulated by HLH-2: the protocadherin cdh-3, and two genes encoding secreted extracellular matrix proteins, mig-6/papilin and him-4/hemicentin. Further, we show that reduction of HLH-2 function causes defects in polarization of F-actin to the invasive cell membrane, a process required for the AC to generate protrusions that breach the basement membrane. This work identifies HLH-2 as a regulator of the invasive phenotype in the AC, adding to our understanding of the transcriptional networks that control cell invasion.
61
Papatsenko D, Levine M. The Drosophila gap gene network is composed of two parallel toggle switches. PLoS One 2011; 6:e21145. [PMID: 21747931 PMCID: PMC3128594 DOI: 10.1371/journal.pone.0021145] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Received: 04/11/2011] [Accepted: 05/20/2011] [Indexed: 11/30/2022]
Abstract
Drosophila “gap” genes provide the first response to maternal gradients in the early fly embryo. Gap genes are expressed in a series of broad bands across the embryo during the first hours of development. The gene network controlling the gap gene expression patterns includes inputs from maternal gradients and mutual repression between the gap genes themselves. In this study we propose a modular design for the gap gene network, involving two relatively independent network domains. The core of each network domain includes a toggle switch corresponding to a pair of mutually repressive gap genes, operated in space by maternal inputs. The toggle switches present in the gap network are evocative of the phage lambda switch, but they are operated positionally (in space) by the maternal gradients, so the synthesis rates for the competing components change along the embryo anterior-posterior axis. A dynamic model, constructed on the proposed principle with elements of fractional site occupancy, required 5–7 parameters to fit quantitative spatial expression data for gap gradients. The identified model solutions (parameter combinations) reproduced major dynamic features of the gap gradient system and explained gap expression in a variety of segmentation mutants.
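As a rough illustration of the idea in this abstract (a pair of mutually repressive genes whose synthesis rates vary with position), the following is a minimal sketch and not the authors' model. The Hill-type repression term, the linear opposing input gradients, and all names and parameter values (`toggle_step`, `steady_state`, `profile`, `k`, `n`, `max_rate`) are assumptions chosen for illustration only; the paper itself uses fractional site occupancy with fitted parameters.

```python
def toggle_step(a, b, syn_a, syn_b, k=1.0, n=2, decay=1.0, dt=0.01):
    """One explicit Euler step for two mutually repressing genes A and B.

    Each gene is synthesized at its own rate, reduced by a Hill-type
    repression term from the other gene, and decays linearly.
    """
    da = syn_a / (1.0 + (b / k) ** n) - decay * a
    db = syn_b / (1.0 + (a / k) ** n) - decay * b
    return a + dt * da, b + dt * db


def steady_state(syn_a, syn_b, steps=20000):
    """Integrate from a small symmetric start until (approximately) settled."""
    a, b = 0.1, 0.1
    for _ in range(steps):
        a, b = toggle_step(a, b, syn_a, syn_b)
    return a, b


def profile(n_positions=11, max_rate=4.0):
    """Steady states along a normalized anterior-posterior axis x in [0, 1].

    Hypothetical maternal inputs: synthesis of A falls linearly with x,
    synthesis of B rises linearly with x.
    """
    out = []
    for i in range(n_positions):
        x = i / (n_positions - 1)
        syn_a = max_rate * (1.0 - x)
        syn_b = max_rate * x
        a, b = steady_state(syn_a, syn_b)
        out.append((x, a, b))
    return out


if __name__ == "__main__":
    for x, a, b in profile():
        print(f"x={x:.1f}  A={a:.3f}  B={b:.3f}")
```

In this sketch the switch is "operated in space" only through the position-dependent synthesis rates; the repression logic is identical at every position.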
Affiliation(s)
- Dmitri Papatsenko
- Department of Gene and Cell Medicine, Mount Sinai School of Medicine, Black Family Stem Cell Institute, New York, New York, United States of America.
62
When needles look like hay: how to find tissue-specific enhancers in model organism genomes. Dev Biol 2010; 350:239-54. [PMID: 21130761 DOI: 10.1016/j.ydbio.2010.11.026] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Received: 04/14/2010] [Revised: 11/11/2010] [Accepted: 11/22/2010] [Indexed: 01/22/2023]
Abstract
A major prerequisite for the investigation of tissue-specific processes is the identification of cis-regulatory elements. No generally applicable technique is available to distinguish them from any other type of genomic non-coding sequence. Therefore, researchers often have to identify these elements by elaborate in vivo screens, testing individual regions until the right one is found. Here, based on many examples from the literature, we summarize how functional enhancers have been isolated from other elements in the genome and how they have been characterized in transgenic animals. Covering computational and experimental studies, we provide an overview of the global properties of cis-regulatory elements, like their specific interactions with promoters and target gene distances. We describe conserved non-coding elements (CNEs) and their internal structure, nucleotide composition, binding site clustering and overlap, with a special focus on developmental enhancers. Conflicting data and unresolved questions on the nature of these elements are highlighted. Our comprehensive overview of the experimental shortcuts that have been found in the different model organism communities and the new field of high-throughput assays should help during the preparation phase of a screen for enhancers. The review is accompanied by a list of general guidelines for such a project.
63
Lee AP, Kerk SY, Tan YY, Brenner S, Venkatesh B. Ancient vertebrate conserved noncoding elements have been evolving rapidly in teleost fishes. Mol Biol Evol 2010; 28:1205-15. [PMID: 21081479 DOI: 10.1093/molbev/msq304] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.6] [Indexed: 01/12/2023]
Abstract
Vertebrate genomes contain thousands of conserved noncoding elements (CNEs) that often function as tissue-specific enhancers. In this study, we have identified CNEs in human, dog, chicken, Xenopus, and four teleost fishes (zebrafish, stickleback, medaka, and fugu) using elephant shark, a cartilaginous vertebrate, as the base genome and investigated the evolution of these ancient vertebrate CNEs (aCNEs) in bony vertebrate lineages. Our analysis shows that aCNEs have been evolving at different rates in different bony vertebrate lineages. Although 78-83% of CNEs have diverged beyond recognition ("lost") in different teleost fishes, only 24% and 40% have been lost in the chicken and mammalian lineages, respectively. Relative rate tests of substitution rates in CNEs revealed that the teleost fish CNEs have been evolving at a significantly higher rate than those in other bony vertebrates. In the ray-finned fish lineage, 68% of aCNEs were lost before the divergence of the four teleosts. This implicates the "fish-specific" whole-genome duplication in the accelerated evolution and the loss of a large number of both copies of duplicated CNEs in teleost fishes. The aCNEs are rich in tissue-specific enhancers and thus many of them are likely to be evolutionarily constrained cis-regulatory elements. The rapid evolution of aCNEs might have affected the expression patterns driven by them. Transgenic zebrafish assay of some human CNE enhancers that have been lost in teleosts has indicated instances of conservation or changes in trans-acting factors between mammals and fishes.
Affiliation(s)
- Alison P Lee
- Comparative Genomics Laboratory, Institute of Molecular and Cell Biology, A*STAR (Agency for Science, Technology and Research), Biopolis, Singapore
64
Guo Y, Papachristoudis G, Altshuler RC, Gerber GK, Jaakkola TS, Gifford DK, Mahony S. Discovering homotypic binding events at high spatial resolution. Bioinformatics 2010; 26:3028-34. [PMID: 20966006 DOI: 10.1093/bioinformatics/btq590] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.0] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Clusters of protein-DNA interaction events involving the same transcription factor are known to act as key components of invertebrate and mammalian promoters and enhancers. However, detecting closely spaced homotypic events from ChIP-Seq data is challenging because random variation in the ChIP fragmentation process obscures event locations. RESULTS: The Genome Positioning System (GPS) can predict protein-DNA interaction events at high spatial resolution from ChIP-Seq data, while retaining the ability to resolve closely spaced events that appear as a single cluster of reads. GPS models observed reads using a complexity penalized mixture model and efficiently predicts event locations with a segmented EM algorithm. An optional mode permits GPS to align common events across distinct experiments. GPS detects more joint events in synthetic and actual ChIP-Seq data and has superior spatial resolution when compared with other methods. In addition, the specificity and sensitivity of GPS are superior to or comparable with other methods. AVAILABILITY: http://cgs.csail.mit.edu/gps.
Affiliation(s)
- Yuchun Guo
- MIT Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA
65
Thermodynamics-based models of transcriptional regulation by enhancers: the roles of synergistic activation, cooperative binding and short-range repression. PLoS Comput Biol 2010; 6. [PMID: 20862354 PMCID: PMC2940721 DOI: 10.1371/journal.pcbi.1000935] [Citation(s) in RCA: 123] [Impact Index Per Article: 8.8] [Received: 03/31/2010] [Accepted: 08/17/2010] [Indexed: 01/08/2023]
Abstract
Quantitative models of cis-regulatory activity have the potential to improve our mechanistic understanding of transcriptional regulation. However, the few models available today have been based on simplistic assumptions about the sequences being modeled, or heuristic approximations of the underlying regulatory mechanisms. We have developed a thermodynamics-based model to predict gene expression driven by any DNA sequence, as a function of transcription factor concentrations and their DNA-binding specificities. It uses statistical thermodynamics theory to model not only protein-DNA interaction, but also the effect of DNA-bound activators and repressors on gene expression. In addition, the model incorporates mechanistic features such as synergistic effect of multiple activators, short range repression, and cooperativity in transcription factor-DNA binding, allowing us to systematically evaluate the significance of these features in the context of available expression data. Using this model on segmentation-related enhancers in Drosophila, we find that transcriptional synergy due to simultaneous action of multiple activators helps explain the data beyond what can be explained by cooperative DNA-binding alone. We find clear support for the phenomenon of short-range repression, where repressors do not directly interact with the basal transcriptional machinery. We also find that the binding sites contributing to an enhancer's function may not be conserved during evolution, and a noticeable fraction of these undergo lineage-specific changes. Our implementation of the model, called GEMSTAT, is the first publicly available program for simultaneously modeling the regulatory activities of a given set of sequences. The development of complex multicellular organisms requires genes to be expressed at specific stages and in specific tissues. 
Regulatory DNA sequences, often called cis-regulatory modules, drive the desired gene expression patterns by integrating information about the environment in the form of the activities of transcription factors. The rules by which regulatory sequences read this type of information, however, are unclear. In this work, we developed quantitative models based on physicochemical principles that directly map regulatory sequences to the expression profiles they generate. We evaluated these models on the segmentation network of the model organism Drosophila melanogaster. Our models incorporate mechanistic features that attempt to capture how activating and repressing transcription factors work in the segmentation system. By evaluating the importance of these features, we were able to gain insights on the quantitative regulatory rules. We found that two different mechanisms may contribute to cooperative gene activation and that repressors often have a short range of influence in DNA sequences. Combining the quantitative modeling with comparative sequence analysis, we also found that even functional sequences may be lost during evolution.
66
Carstensen L, Sandelin A, Winther O, Hansen NR. Multivariate Hawkes process models of the occurrence of regulatory elements. BMC Bioinformatics 2010; 11:456. [PMID: 20828413 PMCID: PMC2949889 DOI: 10.1186/1471-2105-11-456] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 03/01/2010] [Accepted: 09/09/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND: A central question in molecular biology is how transcriptional regulatory elements (TREs) act in combination. Recent high-throughput data provide us with the location of multiple regulatory regions for multiple regulators, and thus with the possibility of analyzing the multivariate distribution of the occurrences of these TREs along the genome. RESULTS: We present a model of TRE occurrences known as the Hawkes process. We illustrate the use of this model by analyzing two different publicly available data sets. We are able to model, in detail, how the occurrence of one TRE is affected by the occurrences of others, and we can test a range of natural hypotheses about the dependencies among the TRE occurrences. In contrast to earlier efforts, pre-processing steps such as clustering or binning are not needed, and we thus retain information about the dependencies among the TREs that is otherwise lost. For each of the two data sets we provide two results: first, a qualitative description of the dependencies among the occurrences of the TREs, and second, quantitative results on the favored or avoided distances between the different TREs. CONCLUSIONS: The Hawkes process is a novel way of modeling the joint occurrences of multiple TREs along the genome that is capable of providing new insights into dependencies among elements involved in transcriptional regulation. The method is available as an R package from http://www.math.ku.dk/~richard/ppstat/.
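To make concrete what "the occurrence of one TRE is affected by the occurrences of others" can mean in a Hawkes process, here is a minimal sketch of a multivariate conditional intensity evaluated along a genomic coordinate. It is not the authors' implementation (theirs is the R package named above); the exponential kernel, the function name `hawkes_intensity`, and all parameter values are assumptions made for illustration.

```python
import math


def hawkes_intensity(pos, k, events, mu, alpha, beta):
    """Conditional intensity of element type k at genomic position pos.

    lambda_k(pos) = mu[k] + sum over types j and earlier occurrences p < pos
                    of alpha[k][j] * exp(-beta * (pos - p))

    events: dict mapping element type -> list of occurrence positions
    mu:     dict of baseline rates per type
    alpha:  nested dict, alpha[k][j] = influence of type j on type k
            (assumed exponential kernel; other kernel shapes are possible)
    beta:   decay rate of the influence with distance
    """
    lam = mu[k]
    for j, positions in events.items():
        for p in positions:
            if p < pos:
                lam += alpha[k][j] * math.exp(-beta * (pos - p))
    return lam


# Hypothetical example: one occurrence of type "A" at position 100
# raises the local rate of type "B" downstream of it.
events = {"A": [100.0], "B": []}
mu = {"A": 0.01, "B": 0.02}
alpha = {"A": {"A": 0.0, "B": 0.0}, "B": {"A": 0.5, "B": 0.0}}
beta = 0.1

rate_before = hawkes_intensity(90.0, "B", events, mu, alpha, beta)
rate_after = hawkes_intensity(110.0, "B", events, mu, alpha, beta)
```

With these illustrative values, `rate_before` equals the baseline for "B", while `rate_after` is the baseline plus the decayed contribution of the upstream "A" occurrence. A negative `alpha` entry would describe avoidance, provided the resulting intensity stays non-negative.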
Affiliation(s)
- Lisbeth Carstensen
- Department of Mathematical Sciences, University of Copenhagen, Universitetsparken 5, 2100 Copenhagen Ø, Denmark
|
67
|
Lang M, Juan E. Binding site number variation and high-affinity binding consensus of Myb-SANT-like transcription factor Adf-1 in Drosophilidae. Nucleic Acids Res 2010; 38:6404-17. [PMID: 20542916 PMCID: PMC2965233 DOI: 10.1093/nar/gkq504] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 01/28/2023] Open
Abstract
There is a growing interest in the evolution of transcription factor binding sites and corresponding functional change of transcriptional regulation. In this context, we have examined the structural changes of the ADF-1 binding sites at the Adh promoters of Drosophila funebris and D. virilis. We detected an expanded footprinted region in D. funebris that contains various adjacent binding sites with different binding affinities. ADF-1 was described to direct sequence-specific DNA binding to sites consisting of the multiple trinucleotide repeat . The ADF-1 recognition sites with high binding affinity differ from this trinucleotide repeat consensus sequence and a new consensus sequence is proposed for the high-affinity ADF-1 binding sites. In vitro transcription experiments with the D. funebris and D. virilis ADF-1 binding regions revealed that stronger ADF-1 binding to the expanded D. funebris ADF-1 binding region only moderately led to increased transcriptional activity of the Adh gene. The potential of this regional expansion is discussed in the context of different ADF-1 cellular concentrations and maintenance of the ADF-1 stimulus. Altogether, evolutionary change of ADF-1 binding regions involves both rearrangements of complex binding site clusters and nucleotide substitutions within sites that lead to different binding affinities.
Affiliation(s)
- Michael Lang
- Departament de Genètica, Universitat de Barcelona, 08028 Barcelona, Spain
|
68
|
Jung CH, Makunin IV, Mattick JS. Identification of conserved Drosophila-specific euchromatin-restricted non-coding sequence motifs. Genomics 2010; 96:154-66. [PMID: 20595017 DOI: 10.1016/j.ygeno.2010.05.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 02/23/2010] [Revised: 05/25/2010] [Accepted: 05/26/2010] [Indexed: 01/19/2023]
Abstract
Non-protein-coding DNA comprises the majority of animal genomes but its functions are largely unknown. We identified over 17,000 different tetranucleotide pairs in the Drosophila melanogaster genome that are over-represented at distances up to 100nt in conserved non-exonic sequences. Those exhibiting the highest information content in surrounding nucleotides were classified into five groups: tRNAs, motifs associated with histone genes, Suppressor-of-Hairy-wing binding sites, and two sets of previously unrecognized motifs (DLM3 and DLM4). There are hundreds to thousands of copies of DLM3 and DLM4, respectively, in the genome, located almost exclusively in non-coding regions. They have similar copy numbers among drosophilids, but are largely absent in other insects. DLM3 is likely a cis-regulatory element, whereas DLM4 sequences are capable of forming a short hairpin structure and are expressed as approximately 80nt RNAs. This work reports the existence of Drosophila genus-specific sequence motifs, and suggests that many more novel functional elements may be discovered in genomes using the general approach outlined herein.
Affiliation(s)
- Chol-Hee Jung
- Institute for Molecular Bioscience, The University of Queensland, St Lucia QLD, Australia
|
69
|
Kulakovskiy IV, Makeev VJ. Discovery of DNA motifs recognized by transcription factors through integration of different experimental sources. Biophysics (Nagoya-shi) 2010. [DOI: 10.1134/s0006350909060013] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Indexed: 11/22/2022] Open
|
70
|
Gotea V, Visel A, Westlund JM, Nobrega MA, Pennacchio LA, Ovcharenko I. Homotypic clusters of transcription factor binding sites are a key component of human promoters and enhancers. Genome Res 2010; 20:565-77. [PMID: 20363979 DOI: 10.1101/gr.104471.109] [Citation(s) in RCA: 169] [Impact Index Per Article: 12.1] [Indexed: 11/25/2022]
Abstract
Clustering of multiple transcription factor binding sites (TFBSs) for the same transcription factor (TF) is a common feature of cis-regulatory modules in invertebrate animals, but the occurrence of such homotypic clusters of TFBSs (HCTs) in the human genome has remained largely unknown. To explore whether HCTs are also common in human and other vertebrates, we used known binding motifs for vertebrate TFs and a hidden Markov model-based approach to detect HCTs in the human, mouse, chicken, and fugu genomes, and examined their association with cis-regulatory modules. We found that evolutionarily conserved HCTs occupy nearly 2% of the human genome, with experimental evidence for individual TFs supporting their binding to predicted HCTs. More than half of the promoters of human genes contain HCTs, with a distribution around the transcription start site in agreement with the experimental data from the ENCODE project. In addition, almost half of the 487 experimentally validated developmental enhancers contain them as well, a number more than 25-fold larger than expected by chance. We also found evidence of negative selection acting on TFBSs within HCTs, as the conservation of TFBSs is stronger than the conservation of sequences separating them. The important role of HCTs as components of developmental enhancers is additionally supported by a strong correlation between HCTs and the binding of the enhancer-associated coactivator protein Ep300 (also known as p300). Experimental validation of HCT-containing elements in both zebrafish and mouse suggests that HCTs could be used to predict both the presence of enhancers and their tissue specificity, and are thus a feature that can be effectively used in deciphering the gene regulatory code. In conclusion, our results indicate that HCTs are a pervasive feature of human cis-regulatory modules and suggest that they play an important role in gene regulation in the human and other vertebrate genomes.
Affiliation(s)
- Valer Gotea
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
|
71
|
Noncooperative Interactions between Transcription Factors and Clustered DNA Binding Sites Enable Graded Transcriptional Responses to Environmental Inputs. Mol Cell 2010; 37:418-28. [DOI: 10.1016/j.molcel.2010.01.016] [Citation(s) in RCA: 126] [Impact Index Per Article: 9.0] [Received: 08/05/2009] [Revised: 10/30/2009] [Accepted: 12/23/2009] [Indexed: 02/08/2023]
|
72
|
Goering LM, Hunt PK, Heighington C, Busick C, Pennings PS, Hermisson J, Kumar S, Gibson G. Association of orthodenticle with natural variation for early embryonic patterning in Drosophila melanogaster. Journal of Experimental Zoology Part B: Molecular and Developmental Evolution 2010; 312:841-54. [PMID: 19488993 DOI: 10.1002/jez.b.21299] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 01/11/2023]
Abstract
Although it is well established that cis-acting regulatory variation contributes to morphological evolution between species, few concrete examples of polymorphism affecting developmental patterning within species have been demonstrated. Early embryogenesis in Drosophila is initiated by a gradient of Bicoid morphogen activity that results in differential expression of multiple target genes. In a screen for genetic variation affecting this process, we surveyed 96 wild-type lines of Drosophila melanogaster for polymorphisms in binding sites within 16 Bicoid cis-regulatory response elements. One common polymorphism in the orthodenticle (otd) early head enhancer is associated with a complex series of indels/substitutions that define two distinct haplotypes. The middle region of this enhancer exhibits an unusual pattern of nucleotide diversity that does not easily fit into standard models of selection and demography. Population Gene Expression Maps, generated by extracting binary expression profiles from normalized embryo images, revealed a ventral reduction of otd transcript abundance in one of the haplotypes that was recapitulated in expression of transgenic constructs containing the two alleles. We thus demonstrate that even a process as robust as early developmental patterning is affected by standing genetic variation, intriguingly involving otd, whose morphogenetic function bicoid is thought to have displaced during dipteran evolution.
Affiliation(s)
- Lisa M Goering
- Department of Genetics, North Carolina State University, Raleigh, North Carolina, USA.
|
73
|
Medvedeva YA, Fridman MV, Oparina NJ, Malko DB, Ermakova EO, Kulakovskiy IV, Heinzel A, Makeev VJ. Intergenic, gene terminal, and intragenic CpG islands in the human genome. BMC Genomics 2010; 11:48. [PMID: 20085634 PMCID: PMC2817693 DOI: 10.1186/1471-2164-11-48] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.9] [Received: 02/26/2009] [Accepted: 01/19/2010] [Indexed: 11/10/2022] Open
Abstract
Background Recently, it has been discovered that the human genome contains many transcription start sites for non-coding RNA. Regulatory regions related to transcription of these non-coding RNAs are poorly studied. Some of these regulatory regions may be associated with CpG islands located far from transcription start sites of any protein coding gene. The human genome contains many such CpG islands; however, until now their properties were not systematically studied. Results We studied CpG islands located in different regions of the human genome using methods of bioinformatics and comparative genomics. We have observed that CpG islands have a preference to overlap with exons, including exons located far from transcription start site, but usually extend well into introns. Synonymous substitution rate of CpG-containing codons becomes substantially reduced in regions where CpG islands overlap with protein-coding exons, even if they are located far downstream from transcription start site. CAGE tag analysis displayed frequent transcription start sites in all CpG islands, including those found far from transcription start sites of protein coding genes. Computational prediction and analysis of published ChIP-chip data revealed that CpG islands contain an increased number of sites recognized by Sp1 protein. CpG islands containing more CAGE tags usually also contain more Sp1 binding sites. This is especially relevant for CpG islands located in 3' gene regions. Various examples of transcription, confirmed by mRNAs or ESTs, but with no evidence of protein coding genes, were found in CAGE-enriched CpG islands located far from transcription start site of any known protein coding gene. Conclusions CpG islands located far from transcription start sites of protein coding genes have transcription initiation activity and display Sp1 binding properties. In exons overlapping with these islands, the synonymous substitution rate of CpG-containing codons is decreased. This suggests that these CpG islands are involved in transcription initiation, possibly of some non-coding RNAs.
Affiliation(s)
- Yulia A Medvedeva
- Research Institute for Genetics and Selection of Industrial Microorganisms, Genetika, 1st Dorozhny proezd, 1, Moscow, 117545, Russia.
|
74
|
Narlikar L, Sakabe NJ, Blanski AA, Arimura FE, Westlund JM, Nobrega MA, Ovcharenko I. Genome-wide discovery of human heart enhancers. Genome Res 2010; 20:381-92. [PMID: 20075146 DOI: 10.1101/gr.098657.109] [Citation(s) in RCA: 98] [Impact Index Per Article: 7.0] [Indexed: 11/24/2022]
Abstract
The various organogenic programs deployed during embryonic development rely on the precise expression of a multitude of genes in time and space. Identifying the cis-regulatory elements responsible for this tightly orchestrated regulation of gene expression is an essential step in understanding the genetic pathways involved in development. We describe a strategy to systematically identify tissue-specific cis-regulatory elements that share combinations of sequence motifs. Using heart development as an experimental framework, we employed a combination of Gibbs sampling and linear regression to build a classifier that identifies heart enhancers based on the presence and/or absence of various sequence features, including known and putative transcription factor (TF) binding specificities. In distinguishing heart enhancers from a large pool of random noncoding sequences, the performance of our classifier is vastly superior to four commonly used methods, with an accuracy reaching 92% in cross-validation. Furthermore, most of the binding specificities learned by our method resemble the specificities of TFs widely recognized as key players in heart development and differentiation, such as SRF, MEF2, ETS1, SMAD, and GATA. Using our classifier as a predictor, a genome-wide scan identified over 40,000 novel human heart enhancers. Although the classifier used no gene expression information, these novel enhancers are strongly associated with genes expressed in the heart. Finally, in vivo tests of our predictions in mouse and zebrafish achieved a validation rate of 62%, significantly higher than what is expected by chance. These results support the existence of underlying cis-regulatory codes dictating tissue-specific transcription in mammalian genomes and validate our enhancer classifier strategy as a method to uncover these regulatory codes.
Affiliation(s)
- Leelavati Narlikar
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, Maryland 20894, USA
|
75
|
Zeitlinger J, Stark A. Developmental gene regulation in the era of genomics. Dev Biol 2010; 339:230-9. [PMID: 20045679 DOI: 10.1016/j.ydbio.2009.12.039] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Received: 07/24/2009] [Revised: 12/04/2009] [Accepted: 12/23/2009] [Indexed: 01/30/2023]
Abstract
Genetic experiments over the last few decades have identified many developmental control genes critical for pattern formation and cell fate specification during the development of multicellular organisms. A large fraction of these genes encode transcription factors and signaling molecules, show highly dynamic expression patterns during development, and are deeply evolutionarily conserved and deregulated in various human diseases such as cancer. Because of their importance in development, evolution, and disease, a fundamental question in biology is how these developmental control genes are regulated in such an extensive and precise fashion. Using genomics methods, it has become clear that developmental control genes are a distinct group of genes with special regulatory characteristics. However, a systematic analysis of these characteristics has not been presented. Here we review how developmental control genes were discovered, evaluate their genome-wide regulation and gene structure, discuss emerging evidence for their mode of regulation, and estimate their overall abundance in the genome. Understanding the global regulation of developmental control genes may provide a new perspective on development in the era of genomics.
Affiliation(s)
- Julia Zeitlinger
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA.
|
76
|
Papatsenko D. Stripe formation in the early fly embryo: principles, models, and networks. Bioessays 2009; 31:1172-80. [DOI: 10.1002/bies.200900096] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Indexed: 11/11/2022]
|
77
|
Identifying cis-regulatory sequences by word profile similarity. PLoS One 2009; 4:e6901. [PMID: 19730735 PMCID: PMC2731932 DOI: 10.1371/journal.pone.0006901] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Received: 10/08/2008] [Accepted: 08/07/2009] [Indexed: 12/13/2022] Open
Abstract
Background Recognizing regulatory sequences in genomes is a continuing challenge, despite a wealth of available genomic data and a growing number of experimentally validated examples. Methodology/Principal Findings We discuss here a simple approach to search for regulatory sequences based on the compositional similarity of genomic regions and known cis-regulatory sequences. This method, which is not limited to searching for predefined motifs, recovers sequences known to be under similar regulatory control. The words shared by the recovered sequences often correspond to known binding sites. Furthermore, we show that although local word profile clustering is predictive for the regulatory sequences involved in blastoderm segmentation, local dissimilarity is a more universal feature of known regulatory sequences in Drosophila. Conclusions/Significance Our method leverages sequence motifs within a known regulatory sequence to identify co-regulated sequences without explicitly defining binding sites. We also show that regulatory sequences can be distinguished from surrounding sequences by local sequence dissimilarity, a novel feature in identifying regulatory sequences across a genome. Source code for WPH-finder is available for download at http://rana.lbl.gov/downloads/wph.tar.gz.
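The compositional comparison described in this abstract can be illustrated with a short sketch. It is not the WPH-finder implementation: the word length `k=4`, the use of cosine similarity, and the function names `word_profile` and `profile_similarity` are assumptions chosen for illustration, and the published method may use a different similarity measure.

```python
import math
from collections import Counter


def word_profile(seq, k=4):
    """Count every overlapping word of length k in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def profile_similarity(a, b):
    """Cosine similarity between two word-count profiles (0.0 to 1.0)."""
    shared = a.keys() & b.keys()
    dot = sum(a[w] * b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    known = word_profile("GGATTAGGATTACCGGATTA")
    candidate = word_profile("TTGGATTATTGGATTAGC")
    print(round(profile_similarity(known, candidate), 3))
```

A genome scan in this spirit would slide a window along the sequence, compute each window's profile, and rank windows by similarity to the profile of a known regulatory sequence. No binding motif has to be defined in advance, which matches the point made in the abstract.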
|
78
|
Mongin E, Dewar K, Blanchette M. Long-range regulation is a major driving force in maintaining genome integrity. BMC Evol Biol 2009; 9:203. [PMID: 19682388 PMCID: PMC2741452 DOI: 10.1186/1471-2148-9-203] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 01/30/2009] [Accepted: 08/15/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The availability of newly sequenced vertebrate genomes, along with more efficient and accurate alignment algorithms, has enabled the expansion of the field of comparative genomics. Large-scale genome rearrangement events modify the order of genes and non-coding conserved regions on chromosomes. While certain large genomic regions have remained intact over much of vertebrate evolution, others appear to be hotspots for genomic breakpoints. The cause of the non-uniformity of breakpoints that occurred during vertebrate evolution is poorly understood. RESULTS We describe a machine learning method to distinguish genomic regions where breakpoints would be expected to have deleterious effects (called breakpoint-refractory regions) from those where they are expected to be neutral (called breakpoint-susceptible regions). Our predictor is trained using breakpoints that took place along the human lineage since amniote divergence. Based on our predictions, refractory and susceptible regions have very distinctive features. Refractory regions are significantly enriched for conserved non-coding elements as well as for genes involved in development, whereas susceptible regions are enriched for housekeeping genes, likely to have simpler transcriptional regulation. CONCLUSION We postulate that long-range transcriptional regulation strongly influences chromosome break fixation. In many regions, the fitness cost of altering the spatial association between long-range regulatory regions and their target genes may be so high that rearrangements are not allowed. Consequently, only a limited, identifiable fraction of the genome is susceptible to genome rearrangements.
Affiliation(s)
- Emmanuel Mongin
- McGill Centre for Bioinformatics, McGill University, Montreal, Canada
- Research Institute of McGill University Health Centre, McGill University and Genome Quebec Innovation Centre, Montreal, Canada
- Departments of Human Genetics and Experimental Medicine, McGill University, Montreal, Canada
- Ken Dewar
- Research Institute of McGill University Health Centre, McGill University and Genome Quebec Innovation Centre, Montreal, Canada
- Departments of Human Genetics and Experimental Medicine, McGill University, Montreal, Canada
- Mathieu Blanchette
- McGill Centre for Bioinformatics, McGill University, Montreal, Canada
- School of Computer Science, McGill University, Montreal, Canada
|
79
|
Papatsenko D, Goltsev Y, Levine M. Organization of developmental enhancers in the Drosophila embryo. Nucleic Acids Res 2009; 37:5665-77. [PMID: 19651877 PMCID: PMC2761283 DOI: 10.1093/nar/gkp619] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.1] [Indexed: 01/06/2023] Open
Abstract
Most cell-specific enhancers are thought to lack an inherent organization, with critical binding sites distributed in a more or less random fashion. However, there are examples of fixed arrangements of binding sites, such as helical phasing, that promote the formation of higher-order protein complexes on the enhancer DNA template. Here, we investigate the regulatory ‘grammar’ of nearly 100 characterized enhancers for developmental control genes active in the early Drosophila embryo. The conservation of grammar is examined in seven divergent Drosophila genomes. Linked binding sites are observed for particular combinations of binding motifs, including Bicoid–Bicoid, Hunchback–Hunchback, Bicoid–Dorsal, Bicoid–Caudal and Dorsal–Twist. Direct evidence is presented for the importance of Bicoid–Dorsal linkage in the integration of the anterior–posterior and dorsal–ventral patterning systems. Hunchback–Hunchback interactions help explain unresolved aspects of segmentation, including the differential regulation of the eve stripe 3 + 7 and stripe 4 + 6 enhancers. We also present evidence that there is an under-representation of nucleosome positioning sequences in many enhancers, raising the possibility for a subtle higher-order structure extending across certain enhancers. We conclude that grammar of gene control regions is pervasively used in the patterning of the Drosophila embryo.
Affiliation(s)
- Dmitri Papatsenko
- Department of Molecular Cell Biology, Division of Genetics, Genomics & Development, Center for Integrative Genomics, University of California, Berkeley, CA 94720-200, USA.
|
80
|
Kulakovskiy IV, Favorov AV, Makeev VJ. Motif discovery and motif finding from genome-mapped DNase footprint data. Bioinformatics 2009; 25:2318-25. [PMID: 19605419 DOI: 10.1093/bioinformatics/btp434] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Indexed: 11/13/2022]
Abstract
MOTIVATION Footprint data is an important source of information on transcription factor recognition motifs. However, a footprinting fragment can contain no sequences similar to known protein recognition sites. Inspection of genome fragments nearby can help to identify missing site positions. RESULTS Genome fragments containing footprints were supplied to a pipeline that constructed a position weight matrix (PWM) for different motif lengths and selected the optimal PWM. Fragments were aligned with the SeSiMCMC sampler and a new heuristic algorithm, Bigfoot. Footprints with missing hits were found for approximately 50% of factors. Adding only 2 bp on both sides of a footprinting fragment recovered most hits. We automatically constructed motifs for 41 Drosophila factors. New motifs can recognize footprints with a greater sensitivity at the same false positive rate than existing models. Also we discuss possible overfitting of constructed motifs. AVAILABILITY Software and the collection of regulatory motifs are freely available at http://line.imb.ac.ru/DMMPMM.
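As an illustration of the position weight matrix (PWM) step mentioned in this abstract, here is a minimal sketch. It is not the SeSiMCMC or Bigfoot algorithm: it assumes the sites are already aligned and of equal length, uses a uniform background and a pseudocount of 1, and the names `build_pwm` and `best_hit` are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from aligned, equal-length binding sites."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for i in range(width):
        counts = Counter(s[i] for s in sites)
        pwm.append({
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in ALPHABET
        })
    return pwm


def best_hit(pwm, sequence):
    """Return (score, offset) of the best-scoring window in a sequence.

    The sequence may contain only A, C, G and T in this sketch.
    """
    width = len(pwm)
    hits = [
        (sum(col[sequence[i + j]] for j, col in enumerate(pwm)), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(hits)


if __name__ == "__main__":
    pwm = build_pwm(["ACGT", "ACGT", "ACGA"])
    print(best_hit(pwm, "TTACGTTT"))
```

Scanning a footprint fragment that has been extended by a few base pairs on each side, as the abstract describes, would amount to passing the longer fragment to `best_hit`.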
Affiliation(s)
- Ivan V Kulakovskiy
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia.
|
81
|
Narlikar L, Ovcharenko I. Identifying regulatory elements in eukaryotic genomes. Briefings in Functional Genomics and Proteomics 2009; 8:215-30. [PMID: 19498043 DOI: 10.1093/bfgp/elp014] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.9] [Indexed: 12/30/2022]
Abstract
Proper development and functioning of an organism depends on precise spatial and temporal expression of all its genes. These coordinated expression patterns are maintained primarily through the process of transcriptional regulation. Transcriptional regulation is mediated by proteins binding to regulatory elements on the DNA in a combinatorial manner, where particular combinations of transcription factor binding sites establish specific regulatory codes. In this review, we survey experimental and computational approaches geared towards the identification of proximal and distal gene regulatory elements in the genomes of complex eukaryotes. We highlight available approaches that decipher the genetic structure and function of regulatory elements by exploiting various sources of information, such as gene expression data, chromatin structure, DNA-binding specificities of transcription factors, and cooperativity of transcription factors. We also discuss the relevance of regulatory elements in the context of human health through examples of mutations in some of these regions that lead to misregulation of genes and are strongly associated with human disorders.
Affiliation(s)
- Leelavati Narlikar
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
82
|
Pape UJ, Klein H, Vingron M. Statistical detection of cooperative transcription factors with similarity adjustment. Bioinformatics 2009; 25:2103-9. [PMID: 19286833 PMCID: PMC2722994 DOI: 10.1093/bioinformatics/btp143] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/30/2022] Open
Abstract
Motivation: Statistical assessment of cis-regulatory modules (CRMs) is a crucial task in computational biology. Usually, one concludes from exceptional co-occurrences of DNA motifs that the corresponding transcription factors (TFs) are cooperative. However, similar DNA motifs tend to co-occur in random sequences due to high probability of overlapping occurrences. Therefore, it is important to consider similarity of DNA motifs in the statistical assessment. Results: Based on previous work, we propose to adjust the window size for co-occurrence detection. Using the derived approximation, one obtains different window sizes for different sets of DNA motifs depending on their similarities. This ensures that the probabilities of co-occurrences in random sequences are equal. Applying the approach to selected similar and dissimilar DNA motifs from human TFs shows the necessity of adjustment and confirms the accuracy of the approximation by comparison to simulated data. Furthermore, it becomes clear that approaches ignoring similarities strongly underestimate P-values for cooperativity of TFs with similar DNA motifs. In addition, the approach is extended to deal with overlapping windows. We derive Chen–Stein error bounds for the approximation. Comparing the error bounds for similar and dissimilar DNA motifs shows that the approximation for similar DNA motifs yields large bounds. Hence, one has to be careful using overlapping windows. Based on the error bounds, one can precompute the approximation errors and select an appropriate overlap scheme before running the analysis. Availability: Software to perform the calculation for pairs of position frequency matrices (PFMs) is available at http://mosta.molgen.mpg.de as well as C++ source code for downloading. Contact: utz.pape@molgen.mpg.de
Affiliation(s)
- Utz J Pape
- Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestr. 73 and Mathematics and Computer Science, Free University of Berlin, Takustr. 9, 14195 Berlin, Germany.
|
83
|
He X, Ling X, Sinha S. Alignment and prediction of cis-regulatory modules based on a probabilistic model of evolution. PLoS Comput Biol 2009; 5:e1000299. [PMID: 19293946 PMCID: PMC2657044 DOI: 10.1371/journal.pcbi.1000299] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Received: 07/01/2008] [Accepted: 01/22/2009] [Indexed: 11/30/2022] Open
Abstract
Cross-species comparison has emerged as a powerful paradigm for predicting cis-regulatory modules (CRMs) and understanding their evolution. The comparison requires reliable sequence alignment, which remains a challenging task for less conserved noncoding sequences. Furthermore, the existing models of DNA sequence evolution generally do not explicitly treat the special properties of CRM sequences. To address these limitations, we propose a model of CRM evolution that captures different modes of evolution of functional transcription factor binding sites (TFBSs) and the background sequences. A particularly novel aspect of our work is a probabilistic model of gains and losses of TFBSs, a process being recognized as an important part of regulatory sequence evolution. We present a computational framework that uses this model to solve the problems of CRM alignment and prediction. Our alignment method is similar to existing methods of statistical alignment but uses the conserved binding sites to improve alignment. Our CRM prediction method deals with the inherent uncertainties of binding site annotations and sequence alignment in a probabilistic framework. In simulated as well as real data, we demonstrate that our program is able to improve both alignment and prediction of CRM sequences over several state-of-the-art methods. Finally, we used alignments produced by our program to study binding site conservation in genome-wide binding data of key transcription factors in the Drosophila blastoderm, with two intriguing results: (i) the factor-bound sequences are under strong evolutionary constraints even if their neighboring genes are not expressed in the blastoderm and (ii) binding sites in distal bound sequences (relative to transcription start sites) tend to be more conserved than those in proximal regions. Our approach is implemented as software, EMMA (Evolutionary Model-based cis-regulatory Module Analysis), ready to be applied in a broad biological context. 
Comparison of noncoding DNA sequences across species has the potential to significantly improve our understanding of gene regulation and our ability to annotate regulatory regions of the genome. This potential is evident from recent publications analyzing 12 Drosophila genomes for regulatory annotation. However, because noncoding sequences are much less structured than coding sequences, their interspecies comparison presents technical challenges, such as ambiguity about how to align them and how to predict transcription factor binding sites, which are the fundamental units that make up regulatory sequences. This article describes how to build an integrated probabilistic framework that performs alignment and binding site prediction simultaneously, in the process improving the accuracy of both tasks. It defines a stochastic model for the evolution of entire “cis-regulatory modules,” with its highlight being a novel theoretical treatment of the commonly observed loss and gain of binding sites during evolution. This new evolutionary model forms the backbone of newly developed software for the prediction of new cis-regulatory modules, alignment of known modules to elucidate general principles of cis-regulatory evolution, or both. The new software is demonstrated to provide benefits in performance of these two crucial genomics tasks.
Affiliation(s)
- Xin He
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, United States of America
- Xu Ling
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, United States of America
- Saurabh Sinha
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, United States of America
|
84
|
Wilczynski B, Dojer N, Patelak M, Tiuryn J. Finding evolutionarily conserved cis-regulatory modules with a universal set of motifs. BMC Bioinformatics 2009; 10:82. [PMID: 19284541 PMCID: PMC2669485 DOI: 10.1186/1471-2105-10-82] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 08/26/2008] [Accepted: 03/10/2009] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND Finding functional regulatory elements in DNA sequences is a very important problem in computational biology, and providing a reliable algorithm for this task would be a major step towards understanding regulatory mechanisms on a genome-wide scale. Major obstacles in this respect are that the amount of non-coding DNA is vast and that the methods for predicting functional transcription factor binding sites tend to produce results with a high percentage of false positives. This makes the problem of finding regions significantly enriched in binding sites difficult. RESULTS We develop a novel method for predicting regulatory regions in DNA sequences, which is designed to exploit the evolutionary conservation of regulatory elements between species without assuming that the order of motifs is preserved across species. We have implemented our method and tested its predictive abilities on various datasets from different organisms. CONCLUSION We show that our approach enables us to find a majority of the known CRMs using only sequence information from different species together with currently publicly available motif data. Also, our method is robust enough to perform well in predicting CRMs, despite differences in tissue specificity and even across species, provided that the evolutionary distances between compared species do not change substantially. The complexity of the proposed algorithm is polynomial, and the observed running times show that it may be readily applied.
|
85
|
Kim J, He X, Sinha S. Evolution of regulatory sequences in 12 Drosophila species. PLoS Genet 2009; 5:e1000330. [PMID: 19132088 PMCID: PMC2607023 DOI: 10.1371/journal.pgen.1000330] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.3] [Received: 07/19/2008] [Accepted: 12/05/2008] [Indexed: 01/07/2023] Open
Abstract
Characterization of the evolutionary constraints acting on cis-regulatory sequences is crucial to comparative genomics and provides key insights on the evolution of organismal diversity. We study the relationships among orthologous cis-regulatory modules (CRMs) in 12 Drosophila species, especially with respect to the evolution of transcription factor binding sites, and report statistical evidence in favor of key evolutionary hypotheses. Binding sites are found to have position-specific substitution rates. However, the selective forces at different positions of a site do not act independently, and the evidence suggests that constraints on sites are often based on their exact binding affinities. Binding site loss is seen to conform to a molecular clock hypothesis. The rate of site loss is transcription factor–specific and depends on the strength of binding and, in some cases, the presence of other binding sites in close proximity. Our analysis is based on a novel computational method for aligning orthologous CRMs on a tree, which rigorously accounts for alignment uncertainties and exploits binding site predictions through a unified probabilistic framework. Finally, we report weak purifying selection on short deletions, providing important clues about overall spatial constraints on CRMs. Our results present a complex picture of regulatory sequence evolution, with substantial plasticity that depends on a number of factors. The insights gained in this study will help us to understand the combinatorial control of gene regulation and how it evolves. They will pave the way for theoretical models that are cognizant of the important determinants of regulatory sequence evolution and will be critical in genome-wide identification of non-coding sequences under purifying or positive selection.
The spatial–temporal expression pattern of a gene, which is crucial to its function, is controlled by cis-regulatory DNA sequences. Forming the basic units of regulatory sequences are transcription factor binding sites, often organized into larger modules that determine gene expression in response to combinatorial environmental signals. Understanding the conservation and change of regulatory sequences is critical to our knowledge of the unity as well as diversity of animal development and phenotypes. In this paper, we study the evolution of sequences involved in the regulation of body patterning in the Drosophila embryo. We find that mutations of nucleotides within a binding site are constrained by evolutionary forces to preserve the site's binding affinity to the cognate transcription factor. Functional binding sites are frequently destroyed during evolution and the rate of loss across evolutionary spans is roughly constant. We also find that the evolutionary fate of a site strongly depends on its context; a pair of interacting sites are more likely to survive mutational forces than isolated sites. Together, these findings provide new insights and pose new challenges to our understanding of cis-regulatory sequences and their evolution.
Affiliation(s)
- Jaebum Kim
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Xin He
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Saurabh Sinha
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
|
86
|
Regulatory Motif Analysis. Bioinformatics 2009. [DOI: 10.1007/978-0-387-92738-1_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022] Open
|
87
|
Polishchuk MS, Heinzel A, Favorov AV, Makeev YV. The binding sites of the proteins regulating transcription in the early development of Drosophila melanogaster: A comparative analysis of ChIP-chip data and theoretically predicted clusters. Biophysics (Nagoya-shi) 2008. [DOI: 10.1134/s0006350908050059] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/23/2022] Open
|
88
|
Ranade SS, Yang-Zhou D, Kong SW, McDonald EC, Cook TA, Pignoni F. Analysis of the Otd-dependent transcriptome supports the evolutionary conservation of CRX/OTX/OTD functions in flies and vertebrates. Dev Biol 2008; 315:521-34. [PMID: 18241855 PMCID: PMC2329912 DOI: 10.1016/j.ydbio.2007.12.017] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Received: 08/09/2006] [Revised: 12/04/2007] [Accepted: 12/11/2007] [Indexed: 11/18/2022]
Abstract
Homeobox transcription factors of the vertebrate CRX/OTX family play critical roles in photoreceptor neurons, the rostral brain and circadian processes. In mouse, the three related proteins, CRX, OTX1, and OTX2, fulfill these functions. In Drosophila, the single founding member of this gene family, called orthodenticle (otd), is required during embryonic brain and photoreceptor neuron development. We have used global gene expression analysis in late pupal heads to better characterize the post-embryonic functions of Otd in Drosophila. We have identified 61 genes that are differentially expressed between wild type and a viable eye-specific otd mutant allele. Among them, about one-third represent potentially direct targets of Otd based on their association with evolutionarily conserved Otd-binding sequences. The spectrum of biological functions associated with these gene targets establishes Otd as a critical regulator of photoreceptor morphology and phototransduction, as well as suggests its involvement in circadian processes. Together with the well-documented role of otd in embryonic patterning, this evidence shows that vertebrate and fly genes contribute to analogous biological processes, notwithstanding the significant divergence of the underlying genetic pathways. Our findings underscore the common evolutionary history of photoperception-based functions in vertebrates and invertebrates and support the view that a complex nervous system was already present in the last common ancestor of all bilateria.
Affiliation(s)
- Swati S. Ranade
- Department of Ophthalmology, Harvard Medical School and the Massachusetts Eye and Ear Infirmary, Boston, MA
- Donghui Yang-Zhou
- Department of Ophthalmology, Harvard Medical School and the Massachusetts Eye and Ear Infirmary, Boston, MA
- Sek Won Kong
- Bauer Center for Genomic Research, Harvard University, Cambridge, MA
- Elizabeth C. McDonald
- Division of Developmental Biology and Department of Pediatric Ophthalmology, Cincinnati Children’s Hospital Medical Center, University of Cincinnati School of Medicine, Cincinnati, OH
- Tiffany A. Cook
- Division of Developmental Biology and Department of Pediatric Ophthalmology, Cincinnati Children’s Hospital Medical Center, University of Cincinnati School of Medicine, Cincinnati, OH
- Francesca Pignoni
- Department of Ophthalmology, Harvard Medical School and the Massachusetts Eye and Ear Infirmary, Boston, MA
|
89
|
Noyes MB, Meng X, Wakabayashi A, Sinha S, Brodsky MH, Wolfe SA. A systematic characterization of factors that regulate Drosophila segmentation via a bacterial one-hybrid system. Nucleic Acids Res 2008; 36:2547-60. [PMID: 18332042 PMCID: PMC2377422 DOI: 10.1093/nar/gkn048] [Citation(s) in RCA: 139] [Impact Index Per Article: 8.7] [Indexed: 12/16/2022] Open
Abstract
Specificity data for groups of transcription factors (TFs) in a common regulatory network can be used to computationally identify the location of cis-regulatory modules in a genome. The primary limitation for this type of analysis is the paucity of specificity data that is available for the majority of TFs. We describe an omega-based bacterial one-hybrid system that provides a rapid method for characterizing DNA-binding specificities on a genome-wide scale. Using this system, 35 members of the Drosophila melanogaster segmentation network have been characterized, including representative members of all of the major classes of DNA-binding domains. A suite of web-based tools was created that uses this binding site dataset and phylogenetic comparisons to identify cis-regulatory modules throughout the fly genome. These tools allow specificities for any combination of factors to be used to perform rapid local or genome-wide searches for cis-regulatory modules. The utility of these factor specificities and tools is demonstrated on the well-characterized segmentation network. By incorporating specificity data on an additional 66 factors that we have characterized, our tools utilize ∼14% of the predicted factors within the fly genome and provide an important new community resource for the identification of cis-regulatory modules.
Affiliation(s)
- Marcus B Noyes
- Program in Gene Function and Expression, Department of Biochemistry and Molecular Pharmacology, Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA
|
90
|
Li L, Zhu Q, He X, Sinha S, Halfon MS. Large-scale analysis of transcriptional cis-regulatory modules reveals both common features and distinct subclasses. Genome Biol 2008; 8:R101. [PMID: 17550599 PMCID: PMC2394749 DOI: 10.1186/gb-2007-8-6-r101] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.3] [Received: 04/11/2007] [Revised: 05/23/2007] [Accepted: 06/05/2007] [Indexed: 02/01/2023] Open
Abstract
Analysis of 280 experimentally-verified cis-regulatory modules from Drosophila reveals features both common to all and unique to distinct subclasses of modules. Background Transcriptional cis-regulatory modules (for example, enhancers) play a critical role in regulating gene expression. While many individual regulatory elements have been characterized, they have never been analyzed as a class. Results We have performed the first such large-scale study of cis-regulatory modules in order to determine whether they have common properties that might aid in their identification and contribute to our understanding of the mechanisms by which they function. A total of 280 individual, experimentally verified cis-regulatory modules from Drosophila were analyzed for a range of sequence-level and functional properties. We report here that regulatory modules do indeed share common properties, among them an elevated GC content, an increased level of interspecific sequence conservation, and a tendency to be transcribed into RNA. However, we find that dense clustering of transcription factor binding sites, especially homotypic clustering, which is commonly believed to be a general characteristic of regulatory modules, is rather a feature that belongs chiefly to a specific subclass. This has important implications for current computational approaches, many of which are biased toward this subset. We explore two new strategies to assess binding site clustering and gauge their performances with respect to their ability to detect all 280 modules and various functionally coherent subsets. Conclusion Our findings demonstrate that cis-regulatory modules share common features that help to define them as a class and that may lead to new insights into mechanisms of gene regulation. However, these properties alone may not be sufficient to reliably distinguish regulatory from non-regulatory sequences. We also demonstrate that there are distinct subclasses of cis-regulatory modules that are more amenable to in silico detection than others and that these differences must be taken into account when attempting genome-wide regulatory element discovery.
Affiliation(s)
- Long Li
- Department of Biochemistry, State University of New York at Buffalo, Buffalo, NY 14214, USA
- Qianqian Zhu
- Department of Biochemistry, State University of New York at Buffalo, Buffalo, NY 14214, USA
- Xin He
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Saurabh Sinha
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Marc S Halfon
- Department of Biochemistry, State University of New York at Buffalo, Buffalo, NY 14214, USA
- Department of Biological Sciences, State University of New York at Buffalo, Buffalo, NY 14214, USA
- New York State Center of Excellence in Bioinformatics and the Life Sciences, Buffalo, NY 14203, USA
- Department of Molecular and Cellular Biology, Roswell Park Cancer Institute, Buffalo, NY 14263, USA
|
91
|
Abstract
The regulation of segmentation gene expression is investigated by computational modeling using quantitative expression data. Previous tissue culture assays and transgene analyses raised the possibility that Hunchback (Hb) might function as both an activator and repressor of transcription. At low concentrations, Hb activates gene expression, whereas at high concentrations it mediates repression. Under the same experimental conditions, transcription factors encoded by other gap genes appear to function as dedicated repressors. Models based on dual regulation suggest that the Hb gradient can be sufficient for establishing the initial Kruppel (Kr) expression pattern in central regions of the precellular embryo. The subsequent refinement of the Kr pattern depends on the combination of Hb and the Giant (Gt) repressor. The dual-regulation models developed for Kr also explain some of the properties of the even-skipped (eve) stripe 3+7 enhancer. Computational simulations suggest that repression results from the dimerization of Hb monomers on the DNA template.
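The concentration-dependent switch described in this abstract can be illustrated with a minimal sketch. It assumes a single regulatory unit with three states: empty, monomer-bound (active), and dimer-bound (repressed). The function name, the binding constant `k`, and the cooperativity `omega` are hypothetical choices for illustration and are not taken from the paper.

```python
def expression(c, k=1.0, omega=5.0):
    """Relative expression at regulator concentration c (arbitrary units).

    Assumed statistical weights (illustrative only):
      empty site      -> 1
      monomer bound   -> k * c             (activating state)
      dimer bound     -> omega * (k*c)**2  (repressing state)
    Expression is taken as the probability of the monomer-bound state.
    """
    monomer = k * c
    dimer = omega * monomer ** 2
    return monomer / (1.0 + monomer + dimer)


# Sample the response along a hypothetical concentration gradient.
concentrations = [0.0, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0]
profile = [expression(c) for c in concentrations]

# Index of the concentration giving the strongest expression.
peak = max(range(len(profile)), key=profile.__getitem__)
```

Under these assumptions the response rises at low concentration, peaks near c = 1 / (k * sqrt(omega)), and falls again as the dimer state dominates, which gives a band of expression at intermediate levels of the gradient.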
|
92
|
Morgan XC, Ni S, Miranker DP, Iyer VR. Predicting combinatorial binding of transcription factors to regulatory elements in the human genome by association rule mining. BMC Bioinformatics 2007; 8:445. [PMID: 18005433 PMCID: PMC2211755 DOI: 10.1186/1471-2105-8-445] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 05/09/2007] [Accepted: 11/15/2007] [Indexed: 12/20/2022] Open
Abstract
Background Cis-acting transcriptional regulatory elements in mammalian genomes typically contain specific combinations of binding sites for various transcription factors. Although some cis-regulatory elements have been well studied, the combinations of transcription factors that regulate normal expression levels for the vast majority of the 20,000 genes in the human genome are unknown. We hypothesized that it should be possible to discover transcription factor combinations that regulate gene expression in concert by identifying over-represented combinations of sequence motifs that occur together in the genome. In order to detect combinations of transcription factor binding motifs, we developed a data mining approach based on the use of association rules, which are typically used in market basket analysis. We scored each segment of the genome for the presence or absence of each of 83 transcription factor binding motifs, then used association rule mining algorithms to mine this dataset, thus identifying frequently occurring pairs of distinct motifs within a segment. Results Support for most pairs of transcription factor binding motifs was highly correlated across different chromosomes although pair significance varied. Known true positive motif pairs showed higher association rule support, confidence, and significance than background. Our subsets of high-confidence, high-significance mined pairs of transcription factors showed enrichment for co-citation in PubMed abstracts relative to all pairs, and the predicted associations were often readily verifiable in the literature. Conclusion Functional elements in the genome where transcription factors bind to regulate expression in a combinatorial manner are more likely to be predicted by identifying statistically and biologically significant combinations of transcription factor binding motifs than by simply scanning the genome for the occurrence of binding sites for a single transcription factor.
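The support and confidence measures mentioned in this abstract can be sketched on toy data. The example below assumes each genome segment has already been reduced to the set of motifs present in it. The motif labels `M1` to `M3`, the segments, and the function `mine_pairs` are hypothetical, and the code shows only the generic association-rule definitions, not the authors' pipeline.

```python
from collections import Counter
from itertools import combinations


def mine_pairs(segments, min_support=0.0):
    """Return support and confidence for every co-occurring motif pair.

    segments: iterable of sets, each holding the motifs found in one segment.
    support(a, b)     = fraction of segments containing both a and b
    confidence(a->b)  = fraction of segments with a that also contain b
    """
    segments = list(segments)
    n = len(segments)
    single = Counter()
    pair = Counter()
    for motifs in segments:
        present = sorted(set(motifs))
        single.update(present)
        pair.update(combinations(present, 2))

    rules = {}
    for (a, b), count in pair.items():
        support = count / n
        if support < min_support:
            continue
        rules[(a, b)] = {
            "support": support,
            "confidence_a_to_b": count / single[a],
            "confidence_b_to_a": count / single[b],
        }
    return rules


# Toy input: four segments scored for presence of three hypothetical motifs.
segments = [{"M1", "M2"}, {"M1", "M2", "M3"}, {"M1"}, {"M3"}]
rules = mine_pairs(segments)
```

Pairs are stored with their motif labels in sorted order, so each unordered pair appears once and carries the confidence in both directions.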
Affiliation(s)
- Xochitl C Morgan
- Institute for Cellular and Molecular Biology and Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, Texas 78712-0159, USA.
|
93
|
Aerts S, van Helden J, Sand O, Hassan BA. Fine-tuning enhancer models to predict transcriptional targets across multiple genomes. PLoS One 2007; 2:e1115. [PMID: 17973026 PMCID: PMC2047340 DOI: 10.1371/journal.pone.0001115] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Received: 07/19/2007] [Accepted: 09/02/2007] [Indexed: 01/05/2023] Open
Abstract
Networks of regulatory relations between transcription factors (TF) and their target genes (TG), implemented through TF binding sites (TFBS), are key features of biology. An idealized approach to solving such networks consists of starting from a consensus TFBS or a position weight matrix (PWM) to generate a high accuracy list of candidate TGs for biological validation. Developing and evaluating such approaches remains a formidable challenge in regulatory bioinformatics. We perform a benchmark study on 34 Drosophila TFs to assess existing TFBS and cis-regulatory module (CRM) detection methods, with a strong focus on the use of multiple genomes. Particularly, for CRM-modelling we investigate the addition of orthologous sites to a known PWM to construct phyloPWMs and we assess the added value of phylogenetic footprinting to predict contextual motifs around known TFBSs. For CRM-prediction, we compare motif conservation with network-level conservation approaches across multiple genomes. Choosing the optimal training and scoring strategies strongly enhances the performance of TG prediction for more than half of the tested TFs. Finally, we analyse a 35th TF, namely Eyeless, and find a significant overlap between predicted TGs and candidate TGs identified by microarray expression studies. In summary, we identify several ways to optimize TF-specific TG predictions, some of which can be applied to all TFs, and others that can be applied only to particular TFs. The ability to model known TF-TG relations, together with the use of multiple genomes, results in a significant step forward in solving the architecture of gene regulatory networks.
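As a rough illustration of the PWM-based scanning that this abstract takes as its starting point, the sketch below builds a log-odds matrix from aligned sites and finds the best-scoring window in a sequence. The site list, the uniform background, the pseudocount, and the function names are all assumptions made for the example. Passing known sites plus orthologous sites to `build_pwm` is only a simplified stand-in for the phyloPWM construction the authors describe.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds position weight matrix from equal-length sites."""
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append(
            {base: math.log2(counts[base] / total / background) for base in BASES}
        )
    return pwm


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in sequence."""
    width = len(pwm)
    scored = [
        (sum(col[sequence[start + j]] for j, col in enumerate(pwm)), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(scored)


# Toy binding sites (hypothetical) and a short sequence to scan.
known_sites = ["TAATCC", "TAATCC", "TAAGCC", "TAATCT"]
pwm = build_pwm(known_sites)
score, start = best_hit(pwm, "GGGTAATCCGG")
```

In practice a score threshold would decide which windows count as predicted sites. Choosing that threshold is a separate problem that this sketch does not address.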
Affiliation(s)
- Stein Aerts
- Laboratory of Neurogenetics, Department of Molecular and Developmental Genetics, Vlaams Instituut voor Biotechnologie (VIB), Leuven, Belgium
- Department of Human Genetics, K.U. Leuven School of Medicine, Leuven, Belgium
- * To whom correspondence should be addressed. E-mail: (SA); (BH)
- Jacques van Helden
- Service de Conformation des Macromolécules Biologiques et de Bioinformatique, Département de Biologie Moléculaire, Université Libre de Bruxelles, Bruxelles, Belgium
- Olivier Sand
- Service de Conformation des Macromolécules Biologiques et de Bioinformatique, Département de Biologie Moléculaire, Université Libre de Bruxelles, Bruxelles, Belgium
- Bassem A. Hassan
- Laboratory of Neurogenetics, Department of Molecular and Developmental Genetics, Vlaams Instituut voor Biotechnologie (VIB), Leuven, Belgium
- Department of Human Genetics, K.U. Leuven School of Medicine, Leuven, Belgium
- * To whom correspondence should be addressed. E-mail: (SA); (BH)
|
94
|
Exact p-value calculation for heterotypic clusters of regulatory motifs and its application in computational annotation of cis-regulatory modules. Algorithms Mol Biol 2007; 2:13. [PMID: 17927813 PMCID: PMC2174486 DOI: 10.1186/1748-7188-2-13] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Received: 07/13/2007] [Accepted: 10/10/2007] [Indexed: 11/15/2022] Open
Abstract
Background cis-Regulatory modules (CRMs) of eukaryotic genes often contain multiple binding sites for transcription factors. The phenomenon that binding sites form clusters in CRMs is exploited in many algorithms to locate CRMs in a genome. This gives rise to the problem of calculating the statistical significance of the event that multiple sites, recognized by different factors, would be found simultaneously in a text of a fixed length. The main difficulty comes from overlapping occurrences of motifs. So far, no tools have been developed allowing the computation of p-values for simultaneous occurrences of different motifs which can overlap. Results We developed and implemented an algorithm computing the p-value that s different motifs occur respectively $k_1, \ldots, k_s$ or more times, possibly overlapping, in a random text. Motifs can be represented with a majority of popular motif models, but in all cases, without indels. Zero or first order Markov chains can be adopted as a model for the random text. The computational tool was tested on the set of cis-regulatory modules involved in D. melanogaster early development, for which there exists an annotation of binding sites for transcription factors. Our test allowed us to correctly identify transcription factors cooperatively/competitively binding to DNA. Method The algorithm that precisely computes the probability of simultaneous motif occurrences is inspired by the Aho-Corasick automaton and employs a prefix tree together with a transition function. The algorithm runs with the $O\big(n\,|\Sigma|\,(m\,|\mathcal{H}| + K\,|\sigma|^{K}) \prod_i k_i\big)$ time complexity, where $n$ is the length of the text, $|\Sigma|$ is the alphabet size, $m$ is the maximal motif length, $|\mathcal{H}|$ is the total number of words in motifs, $K$ is the order of Markov model, and $k_i$ is the number of occurrences of the $i$th motif. Conclusion The primary objective of the program is to assess the likelihood that a given DNA segment is CRM regulated with a known set of regulatory factors. In addition, the program can also be used to select the appropriate threshold for PWM scanning. Another application is assessing similarity of different motifs. Availability Project web page, stand-alone version and documentation can be found at
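The automaton-based idea in the Method paragraph can be sketched for a much simpler case: one exact word, a zero-order (independent letters) text model, and the probability of at least k possibly overlapping occurrences. The functions `build_automaton` and `p_value` are hypothetical names. This is a reduced illustration using a single-word prefix automaton, not the multi-motif algorithm of the paper.

```python
def build_automaton(word, alphabet):
    """Prefix automaton: state = length of the longest prefix of `word`
    that is a suffix of the text read so far."""
    m = len(word)
    delta = [dict() for _ in range(m + 1)]
    for state in range(m + 1):
        for ch in alphabet:
            candidate = word[:state] + ch
            k = min(m, len(candidate))
            # Shrink until the suffix of the candidate matches a word prefix.
            while k > 0 and candidate[-k:] != word[:k]:
                k -= 1
            delta[state][ch] = k
    return delta


def p_value(word, n, k, probs):
    """P(at least k occurrences of `word`, overlaps allowed, in a random
    text of length n with independent letters drawn from `probs`)."""
    m = len(word)
    delta = build_automaton(word, list(probs))
    # Distribution over (automaton state, occurrence count capped at k).
    dist = {(0, 0): 1.0}
    for _ in range(n):
        nxt = {}
        for (state, count), p in dist.items():
            for ch, q in probs.items():
                s = delta[state][ch]
                c = min(k, count + (1 if s == m else 0))
                nxt[(s, c)] = nxt.get((s, c), 0.0) + p * q
        dist = nxt
    return sum(p for (_, c), p in dist.items() if c == k)


uniform = {base: 0.25 for base in "ACGT"}
p_single = p_value("A", 2, 1, uniform)     # 1 - (3/4)**2
p_overlap = p_value("AA", 3, 2, uniform)   # only the text "AAA" qualifies
```

Capping the count at k keeps the state space at (m + 1)(k + 1) entries, so the dynamic programme is linear in the text length n.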
|
95
|
Fan X, Zhu J, Schadt EE, Liu JS. Statistical power of phylo-HMM for evolutionarily conserved element detection. BMC Bioinformatics 2007; 8:374. [PMID: 17919331 PMCID: PMC2194792 DOI: 10.1186/1471-2105-8-374] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 05/10/2007] [Accepted: 10/05/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND An important goal of comparative genomics is the identification of functional elements through conservation analysis. Phylo-HMM was recently introduced to detect conserved elements based on multiple genome alignments, but the method has not been rigorously evaluated. RESULTS We report here a simulation study to investigate the power of phylo-HMM. We show that the power of the phylo-HMM approach depends on many factors, the most important being the number of species-specific genomes used and evolutionary distances between pairs of species. This finding is consistent with results reported by other groups for simpler comparative genomics models. In addition, the conservation ratio of conserved elements and the expected length of the conserved elements are also major factors. In contrast, the topology and the nucleotide substitution model have a relatively minor influence. CONCLUSION Our results provide general guidelines on how to select the number of genomes and their evolutionary distance in comparative genomics studies, as well as the level of power we can expect under different parameter settings.
Affiliation(s)
- Xiaodan Fan
- Department of Statistics, Harvard University, Boston, MA, USA.
|
96
|
|
97
|
Abnizova I, Subhankulova T, Gilks WR. Recent computational approaches to understand gene regulation: mining gene regulation in silico. Curr Genomics 2007; 8:79-91. [PMID: 18660846 PMCID: PMC2435357 DOI: 10.2174/138920207780368150] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 11/30/2006] [Revised: 12/13/2006] [Accepted: 12/15/2006] [Indexed: 01/03/2023] Open
Abstract
This paper reviews recent computational approaches to the understanding of gene regulation in eukaryotes. Cis-regulation of gene expression by the binding of transcription factors is a critical component of cellular physiology. In eukaryotes, a number of transcription factors often work together in a combinatorial fashion to enable cells to respond to a wide spectrum of environmental and developmental signals. Integration of genome sequences and/or Chromatin Immunoprecipitation on chip data with gene-expression data has facilitated in silico discovery of how the combinatorics and positioning of transcription factor binding sites underlie gene activation in a variety of cellular processes. The process of gene regulation is extremely complex and intriguing; therefore, all possible points of view and related links should be carefully considered. Here we attempt to collect an inventory, not claiming it to be comprehensive and complete, of related computational biological topics covering gene regulation, which may enlighten the process, and briefly review what is currently occurring in these areas. We will consider the following computational areas: gene regulatory network construction; evolution of regulatory DNA; studies of its structural and statistical informational properties; and, finally, regulatory RNA.
Affiliation(s)
- T Subhankulova
- Wellcome Trust/Cancer Research UK Gurdon Institute of Cancer and Developmental Biology, Cambridge, UK
|
98
|
Zinzen RP, Papatsenko D. Enhancer responses to similarly distributed antagonistic gradients in development. PLoS Comput Biol 2007; 3:e84. [PMID: 17500585 PMCID: PMC1866357 DOI: 10.1371/journal.pcbi.0030084] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.2] [Received: 10/18/2006] [Accepted: 03/28/2007] [Indexed: 01/09/2023] Open
Abstract
Formation of spatial gene expression patterns in development depends on transcriptional responses mediated by gene control regions called enhancers. Here, we explore possible responses of enhancers to overlapping gradients of antagonistic transcriptional regulators in the Drosophila embryo. Using quantitative models based on enhancer structure, we demonstrate how a pair of antagonistic transcription factor gradients with similar or even identical spatial distributions can lead to the formation of distinct gene expression domains along the embryo axes. The described mechanisms are sufficient to explain the formation of the anterior and the posterior knirps expression, the posterior hunchback expression domain, and the lateral stripes of rhomboid expression and of other ventral neurogenic ectodermal genes. The considered principles of interaction between antagonistic gradients at the enhancer level can also be applied to diverse developmental processes, such as domain specification in imaginal discs, or even eyespot pattern formation in the butterfly wing.
The early development of the fruit fly embryo depends on an intricate but well-studied gene regulatory network. In fly eggs, maternally deposited gene products (morphogens) form spatial concentration gradients. The graded distribution of the maternal morphogens initiates a cascade of gene interactions leading to embryo development. Gradients of activators and repressors regulating common target genes may produce different outcomes depending on the molecular mechanisms mediating their function. Here, we describe quantitative mathematical models for the interplay between gradients of positive and negative transcriptional regulators, that is, proteins that activate or repress their target genes by binding the gene's regulatory DNA sequences. We predict possible spatial outcomes of the transcriptional antagonistic interactions in fly development and consider examples where the predicted cases may take place.
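As an illustration of how two identically distributed antagonistic gradients could still produce a bounded expression domain, the following is a minimal sketch and not the authors' model. It assumes a simple Hill-type occupancy for both regulators, an exponential gradient shape, and a multiplicative "activated and not repressed" readout; the function names and all parameter values (`decay`, `k_act`, `k_rep`, `coop`) are hypothetical choices made for the example.

```python
import math

def hill(conc, affinity, n):
    """Hill-type fractional occupancy for n cooperative sites (assumed form)."""
    z = (affinity * conc) ** n
    return z / (1.0 + z)

def expression_profile(n_points=101, decay=0.2, k_act=50.0, k_rep=5.0, coop=4):
    """Enhancer output along a normalized axis x in [0, 1].

    Activator and repressor share one exponential gradient; they differ
    only in their assumed binding affinity for the enhancer.
    """
    xs = [i / (n_points - 1) for i in range(n_points)]
    profile = []
    for x in xs:
        conc = math.exp(-x / decay)          # identical gradient for both regulators
        p_act = hill(conc, k_act, coop)      # probability the activator sites are bound
        p_rep = hill(conc, k_rep, coop)      # probability the repressor sites are bound
        profile.append(p_act * (1.0 - p_rep))  # active only if bound by activator, free of repressor
    return xs, profile

xs, profile = expression_profile()
peak = max(range(len(profile)), key=profile.__getitem__)
print(f"peak expression {profile[peak]:.2f} at x = {xs[peak]:.2f}")
```

Under these assumptions the output is low near the gradient source (repressor bound), low far from it (activator unbound), and high in between, because the higher-affinity activator stays bound over a longer range than the repressor.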
Affiliation(s)
- Robert P Zinzen
- Department of Molecular and Cell Biology, Center for Integrative Genomics, University of California, Berkeley, California, United States of America
- Dmitri Papatsenko
- Department of Molecular and Cell Biology, Center for Integrative Genomics, University of California, Berkeley, California, United States of America
- * To whom correspondence should be addressed. E-mail:
|
99
|
Papatsenko D. ClusterDraw web server: a tool to identify and visualize clusters of binding motifs for transcription factors. Bioinformatics 2007; 23:1032-4. [PMID: 17308342 DOI: 10.1093/bioinformatics/btm047] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Indexed: 11/14/2022]
Abstract
ClusterDraw is a program for identifying binding sites and binding-site clusters. Its major difference from existing tools is its ability to scan a wide range of parameter values and weigh the statistical significance of all possible clusters smaller than a selected size. The program produces graphs along with decorated FASTA files. The ClusterDraw web server is available at the following URL: http://flydev.berkeley.edu/cgi-bin/cld/submit.cgi
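The idea of scanning several cluster sizes and weighing each candidate cluster statistically can be sketched as follows. This is not ClusterDraw's algorithm, which the abstract does not specify; it is a hypothetical illustration that assumes motif hits are already available as sequence positions and that background hits follow a uniform Poisson model. The names `poisson_tail` and `scan_clusters` are invented for the example.

```python
import math

def poisson_tail(k, mu):
    """P(X >= k) for X ~ Poisson(mu)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-mu) * mu ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)

def scan_clusters(hits, seq_len, window_sizes):
    """For each window size, report the most significant window of motif hits.

    Windows are anchored at each hit position. Significance is a Poisson
    tail probability under an assumed uniform background hit density.
    Returns a list of (window_size, start, hit_count, p_value) tuples.
    """
    hits = sorted(hits)
    if not hits:
        return []
    density = len(hits) / seq_len  # expected hits per base (background assumption)
    results = []
    for w in window_sizes:
        best = None
        for i, start in enumerate(hits):
            count = sum(1 for h in hits[i:] if h < start + w)
            p = poisson_tail(count, density * w)
            if best is None or p < best[2]:
                best = (start, count, p)
        results.append((w, *best))
    return results

# Hypothetical hit positions in a 2000 bp sequence
results = scan_clusters([100, 120, 135, 150, 900, 1800], 2000, [50, 100, 200])
```

Scanning a range of window sizes instead of fixing one avoids having to guess the cluster width in advance, at the cost of more comparisons; the p-values here are not corrected for that multiple testing.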
Affiliation(s)
- Dmitri Papatsenko
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA.
|
100
|
Schones DE, Smith AD, Zhang MQ. Statistical significance of cis-regulatory modules. BMC Bioinformatics 2007; 8:19. [PMID: 17241466 PMCID: PMC1796902 DOI: 10.1186/1471-2105-8-19] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.8] [Received: 10/06/2006] [Accepted: 01/22/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND: It is becoming increasingly important for researchers to be able to scan through large genomic regions for transcription factor binding sites or clusters of binding sites forming cis-regulatory modules. Correspondingly, there has been a push to develop algorithms for the rapid detection and assessment of cis-regulatory modules. While various algorithms for this purpose have been introduced, most are not well suited for rapid, genome-scale scanning.
RESULTS: We introduce methods designed for the detection and statistical evaluation of cis-regulatory modules, modeled as either clusters of individual binding sites or as combinations of sites with constrained organization. In order to determine the statistical significance of module sites, we first need a method to determine the statistical significance of single transcription factor binding site matches. We introduce a straightforward method of estimating the statistical significance of single site matches using a database of known promoters to produce data structures that can be used to estimate p-values for binding site matches. We next introduce a technique to calculate the statistical significance of the arrangement of binding sites within a module using a max-gap model. If the module scanned for has defined organizational parameters, the probability of the module is corrected to account for organizational constraints. The statistical significance of single site matches and the architecture of sites within the module can be combined to provide an overall estimation of statistical significance of cis-regulatory module sites.
CONCLUSION: The methods introduced in this paper allow for the detection and statistical evaluation of single transcription factor binding sites and cis-regulatory modules. The features described are implemented in the Search Tool for Occurrences of Regulatory Motifs (STORM) and MODSTORM software.
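The two ingredients named in the abstract, a p-value for a single site match estimated from background data and a max-gap grouping of sites into modules, can be sketched as below. This is a simplified illustration and not the STORM or MODSTORM implementation: it assumes the background is available as a plain sorted list of match scores (standing in for the promoter-derived data structures the abstract mentions), and it omits the significance calculation for the module arrangement itself. The function names are hypothetical.

```python
from bisect import bisect_left

def empirical_pvalue(score, background_sorted):
    """Fraction of background scores >= score, with a +1 pseudocount.

    background_sorted must be sorted in ascending order.
    """
    n = len(background_sorted)
    idx = bisect_left(background_sorted, score)  # first index with value >= score
    return (n - idx + 1) / (n + 1)

def max_gap_modules(positions, max_gap, min_sites=2):
    """Group site positions into modules under a max-gap rule.

    Consecutive sites belong to the same module when they are at most
    max_gap apart; groups with fewer than min_sites sites are dropped.
    """
    modules, current = [], []
    for pos in sorted(positions):
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_sites:
                modules.append(current)
            current = []
        current.append(pos)
    if len(current) >= min_sites:
        modules.append(current)
    return modules

# Hypothetical inputs: nine background scores and six site positions
background = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
p_single = empirical_pvalue(8.0, background)
modules = max_gap_modules([10, 40, 60, 500, 520, 900], max_gap=50)
```

Keeping the background scores sorted makes each p-value lookup a binary search, which is one plausible way to keep single-site evaluation cheap when scanning at genome scale.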
Affiliation(s)
- Dustin E Schones
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- Department of Physics and Astronomy, Stony Brook University, Stony Brook, NY 11790, USA
- Andrew D Smith
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- Michael Q Zhang
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
|