51
Teif VB, Rippe K. Calculating transcription factor binding maps for chromatin. Brief Bioinform 2011; 13:187-201. [PMID: 21737419 DOI: 10.1093/bib/bbr037] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Indexed: 11/12/2022]
Abstract
Current high-throughput experiments already generate enough data for retrieving the DNA sequence-dependent binding affinities of transcription factors (TF) and other chromosomal proteins throughout the complete genome. However, the reverse task of calculating binding maps in a chromatin context for a given set of concentrations and TF affinities appears to be even more challenging and computationally demanding. The problem can be addressed by considering the DNA sequence as a one-dimensional lattice with units of one or more base pairs. To calculate protein occupancies in chromatin, one needs to consider the competition of TF and histone octamers for binding sites as well as the partial unwrapping of nucleosomal DNA. Here, we consider five different classes of algorithms to compute binding maps that include the binary variable, combinatorial, sequence generating function, transfer matrix and dynamic programming approaches. The calculation time of the binary variable algorithm scales exponentially with DNA length, which limits its use to the analysis of very small genomic regions. For regulatory regions with many overlapping binding sites, potentially applicable algorithms reduce either to the transfer matrix or dynamic programming approach. In addition to the recently proposed transfer matrix formalism for TF access to the nucleosomal organized DNA, we develop here a dynamic programming algorithm that accounts for this feature. In the absence of nucleosomes, dynamic programming outperforms the transfer matrix approach, but the latter is faster when nucleosome unwrapping has to be considered. Strategies are discussed that could further facilitate calculations to allow computing genome-wide TF binding maps.
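The lattice formulation described in this abstract can be illustrated with a minimal sketch. The Python code below is not the authors' algorithm. It assumes a single ligand type covering `m` lattice units, with no nucleosomes, no unwrapping, and no cooperativity; the names `binding_map` and `weights` are hypothetical. It uses forward and backward partition-function recursions (a dynamic programming scheme), so the cost grows linearly with lattice length rather than exponentially.

```python
def binding_map(weights, m):
    """Probability that a ligand of length m is bound with its left edge at each start.

    weights[s] is an assumed statistical weight (concentration * binding constant)
    for a ligand whose left edge sits at lattice unit s. Ligands may not overlap.
    """
    n_starts = len(weights)
    n = n_starts + m - 1  # total number of lattice units

    # forward[i]: partition function of the first i lattice units
    forward = [1.0] * (n + 1)
    for i in range(1, n + 1):
        forward[i] = forward[i - 1]  # unit i-1 is left free
        if i >= m:
            # a ligand ends at unit i-1, so its left edge is at i-m
            forward[i] += weights[i - m] * forward[i - m]

    # backward[i]: partition function of lattice units i .. n-1
    backward = [1.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        backward[i] = backward[i + 1]  # unit i is left free
        if i + m <= n:
            # a ligand starts at unit i and covers units i .. i+m-1
            backward[i] += weights[i] * backward[i + m]

    z = forward[n]  # total partition function
    return [forward[s] * weights[s] * backward[s + m] / z for s in range(n_starts)]
```

For two overlapping start positions with equal weights, `binding_map([1.0, 1.0], 2)`, the three allowed states (empty lattice, ligand at 0, ligand at 1) are equally likely, so each start position has probability 1/3.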
Affiliation(s)
- Vladimir B Teif
- BioQuant and German Cancer Research Center (DKFZ), Im Neuenheimer Feld 267, 69120 Heidelberg, Germany.
52
Zhou Q. On weight matrix and free energy models for sequence motif detection. J Comput Biol 2011; 17:1621-38. [PMID: 21128853 DOI: 10.1089/cmb.2009.0142] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022]
Abstract
The problem of motif detection can be formulated as the construction of a discriminant function to separate sequences of a specific pattern from background. In computational biology, motif detection is used to predict DNA binding sites of a transcription factor (TF), mostly based on the weight matrix (WM) model or the Gibbs free energy (FE) model. However, despite the wide applications, theoretical analysis of these two models and their predictions is still lacking. We derive asymptotic error rates of prediction procedures based on these models under different data generation assumptions. This allows a theoretical comparison between the WM-based and the FE-based predictions in terms of asymptotic efficiency. Applications of the theoretical results are demonstrated with empirical studies on ChIP-seq data and protein binding microarray data. We find that, irrespective of underlying data generation mechanisms, the FE approach shows higher or comparable predictive power relative to the WM approach when the number of observed binding sites used for constructing a discriminant decision is not too small.
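The discriminant-function view of weight matrix prediction can be sketched as follows. This is a minimal illustration, not the procedure analyzed in the paper: it assumes a uniform background frequency of 0.25, a pseudocount of 1, and toy training sites, and the function names are hypothetical. The free energy model is not shown.

```python
import math

BASES = "ACGT"


def build_weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Build per-position log2-odds weights from aligned, equal-length sites."""
    length = len(sites[0])
    matrix = []
    for pos in range(length):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append(
            {base: math.log2(counts[base] / total / background) for base in BASES}
        )
    return matrix


def score(matrix, window):
    """Sum the position-specific weights for a window as wide as the matrix."""
    return sum(column[base] for column, base in zip(matrix, window))


def predict_sites(matrix, sequence, threshold):
    """Return start positions whose score reaches the threshold (the discriminant rule)."""
    width = len(matrix)
    return [
        start
        for start in range(len(sequence) - width + 1)
        if score(matrix, sequence[start:start + width]) >= threshold
    ]
```

With the toy sites `["ACG", "ACG", "ACT"]`, scanning `"TTACGTT"` at a threshold of 2.0 bits flags only the window starting at position 2.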
Affiliation(s)
- Qing Zhou
- Department of Statistics, University of California, Los Angeles, California 90095, USA.
53
Wang H, Mayhew D, Chen X, Johnston M, Mitra RD. Calling Cards enable multiplexed identification of the genomic targets of DNA-binding proteins. Genome Res 2011; 21:748-55. [PMID: 21471402 DOI: 10.1101/gr.114850.110] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Indexed: 01/07/2023]
Abstract
Transcription factors direct gene expression, so there is much interest in mapping their genome-wide binding locations. Current methods do not allow for the multiplexed analysis of TF binding, and this limits their throughput. We describe a novel method for determining the genomic target genes of multiple transcription factors simultaneously. DNA-binding proteins are endowed with the ability to direct transposon insertions into the genome near to where they bind. The transposon becomes a "Calling Card" marking the visit of the DNA-binding protein to that location. A unique sequence "barcode" in the transposon matches it to the DNA-binding protein that directed its insertion. The sequences of the DNA flanking the transposon (which reveal where in the genome the transposon landed) and the barcode within the transposon (which identifies the TF that put it there) are determined by massively parallel DNA sequencing. To demonstrate the method's feasibility, we determined the genomic targets of eight transcription factors in a single experiment. The Calling Card method promises to significantly reduce the cost and labor needed to determine the genomic targets of many transcription factors in different environmental conditions and genetic backgrounds.
Affiliation(s)
- Haoyi Wang
- Department of Genetics and Center for Genome Sciences and Systems Biology, Washington University, School of Medicine, St. Louis, Missouri 63108, USA
54
Moyroud E, Minguet EG, Ott F, Yant L, Posé D, Monniaux M, Blanchet S, Bastien O, Thévenon E, Weigel D, Schmid M, Parcy F. Prediction of regulatory interactions from genome sequences using a biophysical model for the Arabidopsis LEAFY transcription factor. Plant Cell 2011; 23:1293-306. [PMID: 21515819 PMCID: PMC3101549 DOI: 10.1105/tpc.111.083329] [Citation(s) in RCA: 127] [Impact Index Per Article: 9.8] [Received: 01/24/2011] [Revised: 03/22/2011] [Accepted: 04/01/2011] [Indexed: 05/18/2023]
Abstract
Despite great advances in sequencing technologies, generating functional information for nonmodel organisms remains a challenge. One solution lies in an improved ability to predict genetic circuits based on primary DNA sequence in combination with detailed knowledge of regulatory proteins that have been characterized in model species. Here, we focus on the LEAFY (LFY) transcription factor, a conserved master regulator of floral development. Starting with biochemical and structural information, we built a biophysical model describing LFY DNA binding specificity in vitro that accurately predicts in vivo LFY binding sites in the Arabidopsis thaliana genome. Applying the model to other plant species, we could follow the evolution of the regulatory relationship between LFY and the AGAMOUS (AG) subfamily of MADS box genes and show that this link predates the divergence between monocots and eudicots. Remarkably, our model succeeds in detecting the connection between LFY and AG homologs despite extensive variation in binding sites. This demonstrates that the cis-element fluidity recently observed in animals also exists in plants, but the challenges it poses can be overcome with predictions grounded in a biophysical model. Therefore, our work opens new avenues to deduce the structure of regulatory networks from mere inspection of genomic sequences.
Affiliation(s)
- Edwige Moyroud
- Laboratoire de Physiologie Cellulaire Végétale, Unité Mixte de Recherche 5168, Centre National de la Recherche Scientifique, Commissariat à l’Énergie Atomique, Institut National de la Recherche Agronomique, Université Joseph Fourier Grenoble I, 38054 Grenoble, France
- Eugenio Gómez Minguet
- Laboratoire de Physiologie Cellulaire Végétale, Unité Mixte de Recherche 5168, Centre National de la Recherche Scientifique, Commissariat à l’Énergie Atomique, Institut National de la Recherche Agronomique, Université Joseph Fourier Grenoble I, 38054 Grenoble, France
- Felix Ott
- Max Planck Institute for Developmental Biology, Department of Molecular Biology, 72076 Tuebingen, Germany
- Levi Yant
- Max Planck Institute for Developmental Biology, Department of Molecular Biology, 72076 Tuebingen, Germany
- David Posé
- Max Planck Institute for Developmental Biology, Department of Molecular Biology, 72076 Tuebingen, Germany
- Marie Monniaux
- Laboratoire de Physiologie Cellulaire Végétale, Unité Mixte de Recherche 5168, Centre National de la Recherche Scientifique, Commissariat à l’Énergie Atomique, Institut National de la Recherche Agronomique, Université Joseph Fourier Grenoble I, 38054 Grenoble, France
- Sandrine Blanchet
- Laboratoire de Physiologie Cellulaire Végétale, Unité Mixte de Recherche 5168, Centre National de la Recherche Scientifique, Commissariat à l’Énergie Atomique, Institut National de la Recherche Agronomique, Université Joseph Fourier Grenoble I, 38054 Grenoble, France
- Olivier Bastien
- Laboratoire de Physiologie Cellulaire Végétale, Unité Mixte de Recherche 5168, Centre National de la Recherche Scientifique, Commissariat à l’Énergie Atomique, Institut National de la Recherche Agronomique, Université Joseph Fourier Grenoble I, 38054 Grenoble, France
- Emmanuel Thévenon
- Laboratoire de Physiologie Cellulaire Végétale, Unité Mixte de Recherche 5168, Centre National de la Recherche Scientifique, Commissariat à l’Énergie Atomique, Institut National de la Recherche Agronomique, Université Joseph Fourier Grenoble I, 38054 Grenoble, France
- Detlef Weigel
- Max Planck Institute for Developmental Biology, Department of Molecular Biology, 72076 Tuebingen, Germany
- Markus Schmid
- Max Planck Institute for Developmental Biology, Department of Molecular Biology, 72076 Tuebingen, Germany
- François Parcy
- Laboratoire de Physiologie Cellulaire Végétale, Unité Mixte de Recherche 5168, Centre National de la Recherche Scientifique, Commissariat à l’Énergie Atomique, Institut National de la Recherche Agronomique, Université Joseph Fourier Grenoble I, 38054 Grenoble, France
55
Kaplan T, Li XY, Sabo PJ, Thomas S, Stamatoyannopoulos JA, Biggin MD, Eisen MB. Quantitative models of the mechanisms that control genome-wide patterns of transcription factor binding during early Drosophila development. PLoS Genet 2011; 7:e1001290. [PMID: 21304941 PMCID: PMC3033374 DOI: 10.1371/journal.pgen.1001290] [Citation(s) in RCA: 139] [Impact Index Per Article: 10.7] [Received: 11/04/2010] [Accepted: 01/01/2011] [Indexed: 01/01/2023]
Abstract
Transcription factors that drive complex patterns of gene expression during animal development bind to thousands of genomic regions, with quantitative differences in binding across bound regions mediating their activity. While we now have tools to characterize the DNA affinities of these proteins and to precisely measure their genome-wide distribution in vivo, our understanding of the forces that determine where, when, and to what extent they bind remains primitive. Here we use a thermodynamic model of transcription factor binding to evaluate the contribution of different biophysical forces to the binding of five regulators of early embryonic anterior-posterior patterning in Drosophila melanogaster. Predictions based on DNA sequence and in vitro protein-DNA affinities alone achieve a correlation of ∼0.4 with experimental measurements of in vivo binding. Incorporating cooperativity and competition among the five factors, and accounting for spatial patterning by modeling binding in every nucleus independently, had little effect on prediction accuracy. A major source of error was the prediction of binding events that do not occur in vivo, which we hypothesized reflected reduced accessibility of chromatin. To test this, we incorporated experimental measurements of genome-wide DNA accessibility into our model, effectively restricting predicted binding to regions of open chromatin. This dramatically improved our predictions to a correlation of 0.6-0.9 for various factors across known target genes. Finally, we used our model to quantify the roles of DNA sequence, accessibility, and binding competition and cooperativity. Our results show that, in regions of open chromatin, binding can be predicted almost exclusively by the sequence specificity of individual factors, with a minimal role for protein interactions. 
We suggest that a combination of experimentally determined chromatin accessibility data and simple computational models of transcription factor binding may be used to predict the binding landscape of any animal transcription factor with significant precision.
Affiliation(s)
- Tommy Kaplan
- Department of Molecular and Cell Biology, California Institute of Quantitative Biosciences, University of California Berkeley, Berkeley, California, United States of America
- Xiao-Yong Li
- Howard Hughes Medical Institute, University of California Berkeley, Berkeley, California, United States of America
- Peter J. Sabo
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Sean Thomas
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Mark D. Biggin
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Michael B. Eisen
- Department of Molecular and Cell Biology, California Institute of Quantitative Biosciences, University of California Berkeley, Berkeley, California, United States of America
- Howard Hughes Medical Institute, University of California Berkeley, Berkeley, California, United States of America
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
56
Teif VB, Ettig R, Rippe K. A lattice model for transcription factor access to nucleosomal DNA. Biophys J 2011; 99:2597-607. [PMID: 20959101 DOI: 10.1016/j.bpj.2010.08.019] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Received: 06/04/2010] [Revised: 08/09/2010] [Accepted: 08/13/2010] [Indexed: 10/18/2022]
Abstract
Nucleosomes, the basic repeating unit of chromatin, consist of 147 basepairs of DNA that are wrapped in almost two turns around a histone protein octamer core. Because ∼3/4 of the human genomic DNA is found within nucleosomes, their position and DNA interaction is an essential determinant for the DNA access of gene-specific transcription factors and other proteins. Here, a DNA lattice model was developed for describing ligand binding in the presence of a nucleosome. The model takes into account intermediate states, in which DNA is partially unwrapped from the histone octamer. This facilitates access of transcription factors to up to 60 DNA basepairs located in the outer turn of nucleosomal DNA, while the inner DNA turn was found to be more resistant to competitive ligand binding. As deduced from quantitative comparisons with recently published experimental data, our model provides a better description than the previously used all-or-none lattice-binding model. Importantly, nucleosome-occupancy maps predicted by the nucleosome-unwrapping model also differed significantly when partial unwrapping of nucleosomal DNA was considered. In addition, large effects on the cooperative binding of transcription factors to multiple binding sites occluded by the nucleosome were apparent. These findings indicate that partial unwrapping of DNA from the histone octamer needs to be taken into account in quantitative models of gene regulation in chromatin.
Affiliation(s)
- Vladimir B Teif
- BioQuant and German Cancer Research Center, Heidelberg, Germany.
57
Deo RC, Wilson JG, Xing C, Lawson K, Kao WHL, Reich D, Tandon A, Akylbekova E, Patterson N, Mosley TH, Boerwinkle E, Taylor HA. Single-nucleotide polymorphisms in LPA explain most of the ancestry-specific variation in Lp(a) levels in African Americans. PLoS One 2011; 6:e14581. [PMID: 21283670 PMCID: PMC3025914 DOI: 10.1371/journal.pone.0014581] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.2] [Received: 10/01/2010] [Accepted: 12/23/2010] [Indexed: 11/27/2022]
Abstract
Lipoprotein(a) (Lp(a)) is an important causal cardiovascular risk factor, with serum Lp(a) levels predicting atherosclerotic heart disease and genetic determinants of Lp(a) levels showing association with myocardial infarction. Lp(a) levels vary widely between populations, with African-derived populations having nearly 2-fold higher Lp(a) levels than European Americans. We investigated the genetic basis of this difference in 4464 African Americans from the Jackson Heart Study (JHS) using a panel of up to 1447 ancestry informative markers, allowing us to accurately estimate the African ancestry proportion of each individual at each position in the genome. In an unbiased genome-wide admixture scan for frequency-differentiated genetic determinants of Lp(a) level, we found a convincing peak (LOD = 13.6) at 6q25.3, which spans the LPA locus. Dense fine-mapping of the LPA locus identified a number of strongly associated, common biallelic SNPs, a subset of which can account for up to 7% of the variation in Lp(a) level, as well as >70% of the African-European population differences in Lp(a) level. We replicated the association of the most strongly associated SNP, rs9457951 (p = 6 × 10^-22, 27% change in Lp(a) per allele, ∼5% of Lp(a) variance explained in JHS), in 1,726 African Americans from the Dallas Heart Study and found an even stronger association after adjustment for the kringle(IV) repeat copy number. Despite the strong association with Lp(a) levels, we find no association of any LPA SNP with incident coronary heart disease in 3,225 African Americans from the Atherosclerosis Risk in Communities Study.
Affiliation(s)
- Rahul C Deo
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America.
58
Abstract
This chapter briefly summarizes the topics in this volume.
59
Beshnova DA, Bereznyak EG, Shestopalova AV, Evstigneev MP. A novel computational approach “BP-STOCH” to study ligand binding to finite lattice. Biopolymers 2010; 95:208-16. [DOI: 10.1002/bip.21562] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
60
Teif VB, Rippe K. Statistical-mechanical lattice models for protein-DNA binding in chromatin. J Phys Condens Matter 2010; 22:414105. [PMID: 21386588 DOI: 10.1088/0953-8984/22/41/414105] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Indexed: 05/30/2023]
Abstract
Statistical-mechanical lattice models for protein-DNA binding are well established as a method to describe complex ligand binding equilibria measured in vitro with purified DNA and protein components. Recently, a new field of applications has opened up for this approach since it has become possible to experimentally quantify genome-wide protein occupancies in relation to the DNA sequence. In particular, the organization of the eukaryotic genome by histone proteins into a nucleoprotein complex termed chromatin has been recognized as a key parameter that controls the access of transcription factors to the DNA sequence. New approaches have to be developed to derive statistical-mechanical lattice descriptions of chromatin-associated protein-DNA interactions. Here, we present the theoretical framework for lattice models of histone-DNA interactions in chromatin and investigate the (competitive) DNA binding of other chromosomal proteins and transcription factors. The results have a number of applications for quantitative models for the regulation of gene expression.
Affiliation(s)
- Vladimir B Teif
- Research Group Genome Organization and Function, Deutsches Krebsforschungszentrum and BioQuant, Im Neuenheimer Feld 280, 69120 Heidelberg, Germany.
61
Su CH, Shih CH, Chang TH, Tsai HK. Genome-wide analysis of the cis-regulatory modules of divergent gene pairs in yeast. Genomics 2010; 96:352-61. [PMID: 20826206 DOI: 10.1016/j.ygeno.2010.08.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 03/22/2010] [Revised: 08/27/2010] [Accepted: 08/27/2010] [Indexed: 01/16/2023]
Abstract
In budding yeast, approximately a quarter of adjacent genes are divergently transcribed (divergent gene pairs). Whether genes in a divergent pair share the same regulatory system is still unknown. By examining transcription factor (TF) knockout experiments, we found that most TF knockout only altered the expression of one gene in a divergent pair. This prompted us to conduct a comprehensive analysis in silico to estimate how many divergent pairs are regulated by common sets of TFs (cis-regulatory modules, CRMs) using TF binding sites and expression data. Analyses of ten expression datasets show that only a limited number of divergent gene pairs share CRMs in any single dataset. However, around half of divergent pairs do share a regulatory system in at least one dataset. Our analysis suggests that genes in a divergent pair tend to be co-regulated in at least one condition; however, in most conditions, they may not be co-regulated.
Affiliation(s)
- Chien-Hao Su
- Institute of Information Science, Academia Sinica, Taipei 115, Taiwan.
62
Condensed DNA: condensing the concepts. Prog Biophys Mol Biol 2010; 105:208-22. [PMID: 20638406 DOI: 10.1016/j.pbiomolbio.2010.07.002] [Citation(s) in RCA: 184] [Impact Index Per Article: 13.1] [Received: 07/05/2010] [Accepted: 07/11/2010] [Indexed: 01/09/2023]
Abstract
DNA is stored in vivo in a highly compact, so-called condensed phase, where gene regulatory processes are governed by the intricate interplay between different states of DNA compaction. These systems often have surprising properties, which one would not predict from classical concepts of dilute solutions. The mechanistic details of DNA packing are essential for its functioning, as revealed by the recent developments coming from biochemistry, electrostatics, statistical mechanics, and molecular and cell biology. Different aspects of condensed DNA behavior are linked to each other, but the links are often hidden in the bulk of experimental and theoretical details. Here we try to condense some of these concepts and provide interconnections between the different fields. After a brief description of main experimental features of DNA condensation inside viruses, bacteria, eukaryotes and the test tube, main theoretical approaches for the description of these systems are presented. We end up with an extended discussion of the role of DNA condensation in the context of gene regulation and mention potential applications of DNA condensation in gene therapy and biotechnology.
63
Kazan H, Ray D, Chan ET, Hughes TR, Morris Q. RNAcontext: a new method for learning the sequence and structure binding preferences of RNA-binding proteins. PLoS Comput Biol 2010; 6:e1000832. [PMID: 20617199 PMCID: PMC2895634 DOI: 10.1371/journal.pcbi.1000832] [Citation(s) in RCA: 172] [Impact Index Per Article: 12.3] [Received: 09/23/2009] [Accepted: 05/25/2010] [Indexed: 12/31/2022]
Abstract
Metazoan genomes encode hundreds of RNA-binding proteins (RBPs). These proteins regulate post-transcriptional gene expression and have critical roles in numerous cellular processes including mRNA splicing, export, stability and translation. Despite their ubiquity and importance, the binding preferences for most RBPs are not well characterized. In vitro and in vivo studies, using affinity selection-based approaches, have successfully identified RNA sequence associated with specific RBPs; however, it is difficult to infer RBP sequence and structural preferences without specifically designed motif finding methods. In this study, we introduce a new motif-finding method, RNAcontext, designed to elucidate RBP-specific sequence and structural preferences with greater accuracy than existing approaches. We evaluated RNAcontext on recently published in vitro and in vivo RNA affinity selected data and demonstrate that RNAcontext identifies known binding preferences for several control proteins including HuR, PTB, and Vts1p and predicts new RNA structure preferences for SF2/ASF, RBM4, FUSIP1 and SLM2. The predicted preferences for SF2/ASF are consistent with its recently reported in vivo binding sites. RNAcontext is an accurate and efficient motif finding method ideally suited for using large-scale RNA-binding affinity datasets to determine the relative binding preferences of RBPs for a wide range of RNA sequences and structures.
Affiliation(s)
- Hilal Kazan
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
64
Kiełbasa SM, Klein H, Roider HG, Vingron M, Blüthgen N. TransFind--predicting transcriptional regulators for gene sets. Nucleic Acids Res 2010; 38:W275-80. [PMID: 20511592 PMCID: PMC2896106 DOI: 10.1093/nar/gkq438] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Indexed: 02/03/2023]
Abstract
The analysis of putative transcription factor binding sites in promoter regions of coregulated genes allows one to infer the transcription factors that underlie observed changes in gene expression. While such analyses constitute a central component of the in-silico characterization of transcriptional regulatory networks, there is still a lack of simple-to-use web servers that combine state-of-the-art prediction methods with phylogenetic analysis and appropriate multiple-testing-corrected statistics, and that return results within a short time. With these aims in mind, we developed TransFind, which is freely available at http://transfind.sys-bio.net/.
Affiliation(s)
- Szymon M Kiełbasa
- Max Planck Institute for Molecular Genetics, Ihnestrasse 73, D-14195 Berlin, Germany.
65
Challenges for modeling global gene regulatory networks during development: Insights from Drosophila. Dev Biol 2010; 340:161-9. [DOI: 10.1016/j.ydbio.2009.10.032] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.5] [Received: 07/21/2009] [Revised: 10/14/2009] [Accepted: 10/21/2009] [Indexed: 12/26/2022]
66
Abstract
We have produced an evolutionary model for promoters, analogous to the commonly used synonymous/nonsynonymous mutation models for protein-coding sequences. Although our model, called Sunflower, relies on some simple assumptions, it captures enough of the biology of transcription factor action to show clear correlation with other biological features. Sunflower predicts a binding profile of transcription factors to DNA sequences, in which different factors compete for the same potential binding sites. The parametrized model simultaneously estimates a continuous measurement of binding occupancy across the genomic sequence for each factor. We can then introduce a localized mutation, rerun the binding model, and record the difference in binding profiles. A single mutation can alter interactions both upstream and downstream of its position due to potential overlapping binding sites, and our statistic captures this domino effect. Over evolutionary time, we observe a clear excess of low-scoring mutations fixed in promoters, consistent with most changes being neutral. However, this is not consistent across all promoters, and some promoters show more rapid divergence. This divergence often occurs in the presence of relatively constant protein-coding divergence. Interestingly, different classes of promoters show different sensitivity to mutations, with phosphorylation-related genes having promoters inherently more sensitive to mutations than immune genes. Although there have previously been a number of models attempting to handle transcription factor binding, Sunflower provides a richer biological model, incorporating weak binding sites and the possibility of competition. The results show the first clear correlations between such a model and evolutionary processes.
Affiliation(s)
- Michael M Hoffman
- EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SD, United Kingdom
67
Towards a rigorous assessment of systems biology models: the DREAM3 challenges. PLoS One 2010; 5:e9202. [PMID: 20186320 PMCID: PMC2826397 DOI: 10.1371/journal.pone.0009202] [Citation(s) in RCA: 298] [Impact Index Per Article: 21.3] [Received: 11/19/2009] [Accepted: 01/19/2010] [Indexed: 11/29/2022]
Abstract
Background: Systems biology has embraced computational modeling in response to the quantitative nature and increasing scale of contemporary data sets. The onslaught of data is accelerating as molecular profiling technology evolves. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) is a community effort to catalyze discussion about the design, application, and assessment of systems biology models through annual reverse-engineering challenges.
Methodology and Principal Findings: We describe our assessments of the four challenges associated with the third DREAM conference which came to be known as the DREAM3 challenges: signaling cascade identification, signaling response prediction, gene expression prediction, and the DREAM3 in silico network challenge. The challenges, based on anonymized data sets, tested participants in network inference and prediction of measurements. Forty teams submitted 413 predicted networks and measurement test sets. Overall, a handful of best-performer teams were identified, while a majority of teams made predictions that were equivalent to random. Counterintuitively, combining the predictions of multiple teams (including the weaker teams) can in some cases improve predictive power beyond that of any single method.
Conclusions: DREAM provides valuable feedback to practitioners of systems biology modeling. Lessons learned from the predictions of the community provide much-needed context for interpreting claims of efficacy of algorithms described in the scientific literature.
|
68
|
Goh WS, Orlov Y, Li J, Clarke ND. Blurring of high-resolution data shows that the effect of intrinsic nucleosome occupancy on transcription factor binding is mostly regional, not local. PLoS Comput Biol 2010; 6:e1000649. [PMID: 20098497 PMCID: PMC2799660 DOI: 10.1371/journal.pcbi.1000649] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Received: 06/18/2009] [Accepted: 12/16/2009] [Indexed: 11/23/2022] Open
Abstract
Genome-wide maps of nucleosome occupancy in yeast have recently been produced through deep sequencing of nuclease-protected DNA. These maps have been obtained from both crosslinked and uncrosslinked chromatin in vivo, and from chromatin assembled from genomic DNA and nucleosomes in vitro. Here, we analyze these maps in combination with existing ChIP-chip data, and with new ChIP-qPCR experiments reported here. We show that the apparent nucleosome density in crosslinked chromatin, when compared to uncrosslinked chromatin, is preferentially increased at transcription factor (TF) binding sites, suggesting a strategy for mapping generic transcription factor binding sites that would not require immunoprecipitation of a particular factor. We also confirm previous conclusions that the intrinsic, sequence-dependent binding of nucleosomes helps determine the localization of TF binding sites. However, we find that the association between low nucleosome occupancy and TF binding is typically greater if occupancy at a site is averaged over a 600 bp window, rather than using the occupancy at the binding site itself. We have also incorporated intrinsic nucleosome binding occupancies as weights in a computational model for TF binding, and by this measure as well we find better prediction if the high-resolution nucleosome occupancy data are averaged over 600 bp. We suggest that the intrinsic DNA binding specificity of nucleosomes plays a role in TF binding site selection not so much through the specification of precise nucleosome positions that permit or occlude binding, but rather through the creation of low-occupancy regions that can accommodate competition from TFs through rearrangement of nucleosomes.
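The regional averaging described in this abstract can be illustrated with a short sketch. This is not the authors' code: the function name `windowed_mean`, the centered window, and the truncation at the sequence ends are assumptions made for illustration; only the 600 bp window size comes from the abstract.

```python
def windowed_mean(occupancy, window=600):
    """Average per-base occupancy over a centered window of `window` positions.

    Assumption: the window is truncated at the sequence ends, so edge
    positions are averaged over fewer values.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    n = len(occupancy)
    half = window // 2
    # Prefix sums let each window mean be computed in constant time.
    prefix = [0.0]
    for value in occupancy:
        prefix.append(prefix[-1] + value)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i - half + window)
        smoothed.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return smoothed


# Toy profile: a single high-occupancy position blurred over 3 positions.
profile = [0.0, 0.0, 3.0, 0.0, 0.0]
blurred = windowed_mean(profile, window=3)
```

With a real per-base occupancy track, calling `windowed_mean(track)` would use the default 600-position window; the smoothed value at a binding site could then be compared with the raw value at the same position.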
Affiliation(s)
- Wee Siong Goh
- Computational and Systems Biology, Genome Institute of Singapore, Singapore
- Yuriy Orlov
- Computational and Systems Biology, Genome Institute of Singapore, Singapore
- Jingmei Li
- Computational and Systems Biology, Genome Institute of Singapore, Singapore
- Neil D. Clarke
- Computational and Systems Biology, Genome Institute of Singapore, Singapore
|
69
|
Abstract
Hundreds of different factors adorn the eukaryotic genome, binding to it in large number. These DNA binding factors (DBFs) include nucleosomes, transcription factors (TFs), and other proteins and protein complexes, such as the origin recognition complex (ORC). DBFs compete with one another for binding along the genome, yet many current models of genome binding do not consider different types of DBFs together simultaneously. Additionally, binding is a stochastic process that results in a continuum of binding probabilities at any position along the genome, but many current models tend to consider positions as being either binding sites or not. Here, we present a model that allows a multitude of DBFs, each at different concentrations, to compete with one another for binding sites along the genome. The result is an "occupancy profile," a probabilistic description of the DNA occupancy of each factor at each position. We implement our model efficiently as the software package COMPETE. We demonstrate genome-wide and at specific loci how modeling nucleosome binding alters TF binding, and vice versa, and illustrate how factor concentration influences binding occupancy. Binding cooperativity between nearby TFs arises implicitly via mutual competition with nucleosomes. Our method applies not only to TFs, but also recapitulates known occupancy profiles of a well-studied replication origin with and without ORC binding. Importantly, the sequence preferences our model takes as input are derived from in vitro experiments. This ensures that the calculated occupancy profiles are the result of the forces of competition represented explicitly in our model and the inherent sequence affinities of the constituent DBFs.
|
70
|
Abstract
Transcriptional regulation is largely enacted by transcription factors (TFs) binding DNA. Large numbers of TF binding motifs have been revealed by ChIP-chip experiments followed by computational DNA motif discovery. However, the success of motif discovery algorithms has been limited when applied to sequences bound in vivo (such as those identified by ChIP-chip) because the observed TF-DNA interactions are not necessarily direct: Some TFs predominantly associate with DNA indirectly through protein partners, while others exhibit both direct and indirect binding. Here, we present the first method for distinguishing between direct and indirect TF-DNA interactions, integrating in vivo TF binding data, in vivo nucleosome occupancy data, and motifs from in vitro protein binding microarray experiments. When applied to yeast ChIP-chip data, our method reveals that only 48% of the data sets can be readily explained by direct binding of the profiled TF, while 16% can be explained by indirect DNA binding. In the remaining 36%, none of the motifs used in our analysis was able to explain the ChIP-chip data, either because the data were too noisy or because the set of motifs was incomplete. As more in vitro TF DNA binding motifs become available, our method could be used to build a complete catalog of direct and indirect TF-DNA interactions. Our method is not restricted to yeast or to ChIP-chip data, but can be applied in any system for which both in vivo binding data and in vitro DNA binding motifs are available.
|
71
|
Abstract
Complex transcriptional behaviours are encoded in the DNA sequences of gene regulatory regions. Advances in our understanding of these behaviours have been recently gained through quantitative models that describe how molecules such as transcription factors and nucleosomes interact with genomic sequences. An emerging view is that every regulatory sequence is associated with a unique binding affinity landscape for each molecule and, consequently, with a unique set of molecule-binding configurations and transcriptional outputs. We present a quantitative framework based on existing methods that unifies these ideas. This framework explains many experimental observations regarding the binding patterns of factors and nucleosomes and the dynamics of transcriptional activation. It can also be used to model more complex phenomena such as transcriptional noise and the evolution of transcriptional regulation.
Affiliation(s)
- Eran Segal
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, 76100, Israel
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, 76100, Israel
- Jonathan Widom
- Department of Biochemistry, Molecular Biology, and Cell Biology, Northwestern University, 2205 Tech Drive, Evanston, IL 60208-3500 USA
|
72
|
Badis G, Berger MF, Philippakis AA, Talukder S, Gehrke AR, Jaeger SA, Chan ET, Metzler G, Vedenko A, Chen X, Kuznetsov H, Wang CF, Coburn D, Newburger DE, Morris Q, Hughes TR, Bulyk ML. Diversity and complexity in DNA recognition by transcription factors. Science 2009; 324:1720-3. [PMID: 19443739 PMCID: PMC2905877 DOI: 10.1126/science.1162327] [Citation(s) in RCA: 740] [Impact Index Per Article: 49.3] [Indexed: 01/22/2023]
Abstract
Sequence preferences of DNA binding proteins are a primary mechanism by which cells interpret the genome. Despite the central importance of these proteins in physiology, development, and evolution, comprehensive DNA binding specificities have been determined experimentally for only a few proteins. Here, we used microarrays containing all 10-base pair sequences to examine the binding specificities of 104 distinct mouse DNA binding proteins representing 22 structural classes. Our results reveal a complex landscape of binding, with virtually every protein analyzed possessing unique preferences. Roughly half of the proteins each recognized multiple distinctly different sequence motifs, challenging our molecular understanding of how proteins interact with their DNA binding sites. This complexity in DNA recognition may be important in gene regulation and in the evolution of transcriptional regulatory networks.
Affiliation(s)
- Gwenael Badis
- Banting and Best Department of Medical Research, University of Toronto, 160 College St., Toronto, ON, Canada M5S 3E1
- Michael F. Berger
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115
- Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, MA 02138
- Anthony A. Philippakis
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115
- Harvard-MIT Division of Health Sciences and Technology (HST); Harvard Medical School, Boston, MA 02115
- Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, MA 02138
- Shaheynoor Talukder
- Banting and Best Department of Medical Research, University of Toronto, 160 College St., Toronto, ON, Canada M5S 3E1
- Department of Molecular Genetics, University of Toronto, 160 College St., Toronto, ON, Canada M5S 3E1
- Andrew R. Gehrke
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115
- Savina A. Jaeger
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115
- Esther T. Chan
- Department of Molecular Genetics, University of Toronto, 160 College St., Toronto, ON, Canada M5S 3E1
- Genita Metzler
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139
- Xiaoyu Chen
- Banting and Best Department of Medical Research, University of Toronto, 160 College St., Toronto, ON, Canada M5S 3E1
- Hanna Kuznetsov
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139
- Chi-Fong Wang
- Department of Physics, Massachusetts Institute of Technology, Cambridge, MA 02139
- David Coburn
- Banting and Best Department of Medical Research, University of Toronto, 160 College St., Toronto, ON, Canada M5S 3E1
- Daniel E. Newburger
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115
- Quaid Morris
- Banting and Best Department of Medical Research, University of Toronto, 160 College St., Toronto, ON, Canada M5S 3E1
- Department of Molecular Genetics, University of Toronto, 160 College St., Toronto, ON, Canada M5S 3E1
- Department of Computer Science, University of Toronto, 160 College St., Toronto, ON, Canada M5S 3E1
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College St., Toronto, ON, Canada M5S 3E1
- Timothy R. Hughes
- Banting and Best Department of Medical Research, University of Toronto, 160 College St., Toronto, ON, Canada M5S 3E1
- Department of Molecular Genetics, University of Toronto, 160 College St., Toronto, ON, Canada M5S 3E1
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College St., Toronto, ON, Canada M5S 3E1
- Martha L. Bulyk
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115
- Department of Pathology, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115
- Harvard-MIT Division of Health Sciences and Technology (HST); Harvard Medical School, Boston, MA 02115
- Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, MA 02138
|
73
|
Vega VB, Woo XY, Hamidi H, Yeo HC, Yeo ZX, Bourque G, Clarke ND. Inferring direct regulatory targets of a transcription factor in the DREAM2 challenge. Ann N Y Acad Sci 2009; 1158:215-23. [PMID: 19348643 DOI: 10.1111/j.1749-6632.2008.03759.x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/29/2022]
Abstract
In the DREAM2 community-wide experiment on regulatory network inference, one of the challenges was to identify which genes, in a list of 200, are direct regulatory targets of the transcription factor BCL6. The organizers of the challenge defined targets based on gene expression and chromatin immunoprecipitation experiments (ChIP-chip). The expression data were publicly available; the ChIP-chip data were not. In order to assess the likelihood that a gene is a BCL6 target, we used three classes of information: expression-level differences, over-representation of sequence motifs in promoter regions, and gene ontology annotations. A weight was attached to each analysis based on how well it identified BCL6-bound genes as defined by publicly available ChIP-chip data. By the organizers' criteria, our group, GenomeSingapore, performed best. However, our retrospective analysis indicates that this success was dominated by a gene expression analysis that was predicated on a regulatory model known to be favored by the organizers. We also noted that the 200-gene test set was enriched only in genes that are upregulated, while genes bound by BCL6 are enriched in both upregulated and downregulated genes. Together, these observations suggest possible model biases in the selection of the gold-standard gene set and imply that our success was attained in part by adhering to the same assumptions. We argue that model biases of this type are unavoidable in the inference of regulatory networks and, for that reason, we suggest that future community-wide experiments of this type should focus on the prediction of data, rather than models.
|
74
|
Yeo ZX, Yeo HC, Yeo JKS, Yeo AL, Li Y, Clarke ND. Inferring transcription factor targets from gene expression changes and predicted promoter occupancy. J Comput Biol 2009; 16:357-68. [PMID: 19193152 DOI: 10.1089/cmb.2008.19tt] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 02/02/2023] Open
Abstract
We have developed a method for inferring condition-specific targets of transcription factors based on ranking genes by gene expression change and ranking genes based on predicted transcription factor occupancy. The average of these two ranks, used as a test statistic, allows target genes to be inferred in a stringent manner. The method complements chromatin immunoprecipitation experiments by predicting targets under many conditions for which ChIP experiments have not been performed. We used the method to predict targets of 102 yeast transcription factors in approximately 1600 expression microarray experiments. The reliability of the method is suggested by the strong enrichment of genes previously shown to be bound, by the validation of binding to novel targets, by the way transcription factors with similar specificities can be functionally distinguished, and by the greater-than-expected number of regulatory network motifs, such as auto-regulatory interactions, that arise from new, predicted interactions. The combination of ChIP data and the targets inferred from this analysis results in a high-confidence regulatory network that includes many novel interactions. Interestingly, we find only a weak association between conditions in which we can infer the activity of a transcription factor and conditions in which the transcription factor gene itself is regulated. Thus, methods that rely on transcription factor regulation to help define regulatory interactions may miss regulatory relationships that are detected by the method reported here.
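The rank-averaging statistic described in this abstract can be sketched as follows. This is an illustrative reading, not the published implementation: the function name `average_rank_scores`, the convention that larger input values mean stronger evidence, and the alphabetical tie-breaking are all assumptions.

```python
def average_rank_scores(expression_change, predicted_occupancy):
    """Return {gene: mean of its two ranks}; rank 1 is the strongest evidence.

    Assumptions: both inputs map gene name -> score, larger is stronger,
    and ties are broken by alphabetical gene order.
    """
    genes = sorted(set(expression_change) & set(predicted_occupancy))

    def ranks(scores):
        # sorted() is stable, so tied genes keep their alphabetical order.
        ordered = sorted(genes, key=lambda g: scores[g], reverse=True)
        return {g: r for r, g in enumerate(ordered, start=1)}

    by_expression = ranks(expression_change)
    by_occupancy = ranks(predicted_occupancy)
    return {g: (by_expression[g] + by_occupancy[g]) / 2 for g in genes}


# Hypothetical scores for three genes.
expression = {"A": 2.0, "B": 1.0, "C": 0.5}
occupancy = {"A": 0.9, "B": 0.1, "C": 0.5}
scores = average_rank_scores(expression, occupancy)
```

Genes with the smallest average rank would be the strongest candidate targets; how a significance threshold is placed on that statistic is not specified in the abstract and is left out here.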
|
75
|
Badis G, Chan ET, van Bakel H, Pena-Castillo L, Tillo D, Tsui K, Carlson CD, Gossett AJ, Hasinoff MJ, Warren CL, Gebbia M, Talukder S, Yang A, Mnaimneh S, Terterov D, Coburn D, Li Yeo A, Yeo ZX, Clarke ND, Lieb JD, Ansari AZ, Nislow C, Hughes TR. A library of yeast transcription factor motifs reveals a widespread function for Rsc3 in targeting nucleosome exclusion at promoters. Mol Cell 2009; 32:878-87. [PMID: 19111667 DOI: 10.1016/j.molcel.2008.11.020] [Citation(s) in RCA: 360] [Impact Index Per Article: 24.0] [Received: 08/13/2008] [Revised: 11/05/2008] [Accepted: 11/26/2008] [Indexed: 01/17/2023]
Abstract
The sequence specificity of DNA-binding proteins is the primary mechanism by which the cell recognizes genomic features. Here, we describe systematic determination of yeast transcription factor DNA-binding specificities. We obtained binding specificities for 112 DNA-binding proteins representing 19 distinct structural classes. One-third of the binding specificities have not been previously reported. Several binding sequences have striking genomic distributions relative to transcription start sites, supporting their biological relevance and suggesting a role in promoter architecture. Among these are Rsc3 binding sequences, containing the core CGCG, which are found preferentially approximately 100 bp upstream of transcription start sites. Mutation of RSC3 results in a dramatic increase in nucleosome occupancy in hundreds of proximal promoters containing a Rsc3 binding element, but has little impact on promoters lacking Rsc3 binding sequences, indicating that Rsc3 plays a broad role in targeting nucleosome exclusion at yeast promoters.
Affiliation(s)
- Gwenael Badis
- Banting and Best Department of Medical Research, University of Toronto, Toronto, ON M5S 3E1, Canada
|
76
|
Tuteja G, Jensen ST, White P, Kaestner KH. Cis-regulatory modules in the mammalian liver: composition depends on strength of Foxa2 consensus site. Nucleic Acids Res 2008; 36:4149-57. [PMID: 18556755 PMCID: PMC2475634 DOI: 10.1093/nar/gkn366] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.4] [Indexed: 12/16/2022] Open
Abstract
Foxa2 is a critical transcription factor that controls liver development and plays an important role in hepatic gluconeogenesis in adult mice. Here, we use genome-wide location analysis for Foxa2 to identify its targets in the adult liver. We then show by computational analyses that Foxa2-containing cis-regulatory modules are not constructed from a random assortment of binding sites for other transcription factors expressed in the liver, but rather that their composition depends on the strength of the Foxa2 consensus site present. Genes containing a cis-regulatory module with a medium or weak Foxa2 consensus site are much more liver-specific than the genes with a strong consensus site. We not only provide a better understanding of the mechanisms of Foxa2 regulation but also introduce a novel method for identification of different cis-regulatory modules involving a single factor.
Affiliation(s)
- Geetu Tuteja
- Department of Genetics, Genomics and Computational Biology Graduate Group, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA
|
77
|
van Oeffelen L, Cornelis P, Van Delm W, De Ridder F, De Moor B, Moreau Y. Detecting cis-regulatory binding sites for cooperatively binding proteins. Nucleic Acids Res 2008; 36:e46. [PMID: 18400778 PMCID: PMC2377448 DOI: 10.1093/nar/gkn140] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022] Open
Abstract
Several methods are available to predict cis-regulatory modules in DNA based on position weight matrices. However, the performance of these methods generally depends on a number of additional parameters that cannot be derived from sequences and are difficult to estimate because they have no physical meaning. As the best way to detect cis-regulatory modules is the way in which the proteins recognize them, we developed a new scoring method that utilizes the underlying physical binding model. This method requires no additional parameter to account for multiple binding sites; and the only necessary parameters to model homotypic cooperative interactions are the distances between adjacent protein binding sites in basepairs, and the corresponding cooperative binding constants. The heterotypic cooperative binding model requires one more parameter per cooperatively binding protein, which is the concentration multiplied by the partition function of this protein. In a case study on the bacterial ferric uptake regulator, we show that our scoring method for homotypic cooperatively binding proteins significantly outperforms other PWM-based methods where biophysical cooperativity is not taken into account.
Affiliation(s)
- Liesbeth van Oeffelen
- Department of Electrical Engineering, ESAT-SCD, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium.
|
78
|
Predicting expression patterns from regulatory sequence in Drosophila segmentation. Nature 2008; 451:535-40. [PMID: 18172436 DOI: 10.1038/nature06496] [Citation(s) in RCA: 352] [Impact Index Per Article: 22.0] [Received: 08/02/2007] [Accepted: 11/20/2007] [Indexed: 01/12/2023]
Abstract
The establishment of complex expression patterns at precise times and locations is key to metazoan development, yet a mechanistic understanding of the underlying transcription control networks is still missing. Here we describe a novel thermodynamic model that computes expression patterns as a function of cis-regulatory sequence and of the binding-site preferences and expression of participating transcription factors. We apply this model to the segmentation gene network of Drosophila melanogaster and find that it predicts expression patterns of cis-regulatory modules with remarkable accuracy, demonstrating that positional information is encoded in the regulatory sequence and input factor distribution. Our analysis reveals that both strong and weaker binding sites contribute, leading to high occupancy of the module DNA, and conferring robustness against mutation; short-range homotypic clustering of weaker sites facilitates cooperative binding, which is necessary to sharpen the patterns. Our computational framework is generally applicable to most protein-DNA interaction systems.
|
79
|
Zeigler RD, Gertz J, Cohen BA. A cis-regulatory logic simulator. BMC Bioinformatics 2007; 8:272. [PMID: 17662143 PMCID: PMC2375358 DOI: 10.1186/1471-2105-8-272] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/17/2006] [Accepted: 07/27/2007] [Indexed: 05/25/2023] Open
Abstract
Background
A major goal of computational studies of gene regulation is to accurately predict the expression of genes based on the cis-regulatory content of their promoters. The development of computational methods to decode the interactions among cis-regulatory elements has been slow, in part, because it is difficult to know, without extensive experimental validation, whether a particular method identifies the correct cis-regulatory interactions that underlie a given set of expression data. There is an urgent need for test expression data in which the interactions among cis-regulatory sites that produce the data are known. The ability to rapidly generate such data sets would facilitate the development and comparison of computational methods that predict gene expression patterns from promoter sequence.
Results
We developed a gene expression simulator which generates expression data using user-defined interactions between cis-regulatory sites. The simulator can incorporate additive, cooperative, competitive, and synergistic interactions between regulatory elements. Constraints on the spacing, distance, and orientation of regulatory elements and their interactions may also be defined and Gaussian noise can be added to the expression values. The simulator allows for a data transformation that simulates the sigmoid shape of expression levels from real promoters. We found good agreement between sets of simulated promoters and predicted regulatory modules from real expression data. We present several data sets that may be useful for testing new methodologies for predicting gene expression from promoter sequence.
Conclusion
We developed a flexible gene expression simulator that rapidly generates large numbers of simulated promoters and their corresponding transcriptional output based on specified interactions between cis-regulatory sites. When appropriate rule sets are used, the data generated by our simulator faithfully reproduces experimentally derived data sets. We anticipate that using simulated gene expression data sets will facilitate the direct comparison of computational strategies to predict gene expression from promoter sequence. The source code is available online and as additional material. The test sets are available as additional material.
|
80
|
Chen X, Hughes TR, Morris Q. RankMotif++: a motif-search algorithm that accounts for relative ranks of K-mers in binding transcription factors. Bioinformatics 2007; 23:i72-9. [PMID: 17646348 DOI: 10.1093/bioinformatics/btm224] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.1] [Indexed: 11/13/2022]
Abstract
Motivation
The sequence specificity of DNA-binding proteins is typically represented as a position weight matrix in which each base position contributes independently to relative affinity. Assessment of the accuracy and broad applicability of this representation has been limited by the lack of extensive DNA-binding data. However, new microarray techniques, in which preferences for all possible K-mers are measured, enable a broad comparison of both motif representation and methods for motif discovery. Here, we consider the problem of accounting for all of the binding data in such experiments, rather than the highest affinity binding data. We introduce RankMotif++, an algorithm designed for finding motifs whenever sequences are associated with a semi-quantitative measure of protein-DNA-binding affinity. RankMotif++ learns motif models by maximizing the likelihood of a set of binding preferences under a probabilistic model of how sequence binding affinity translates into binding preference observations. Because RankMotif++ makes few assumptions about the relationship between binding affinity and the semi-quantitative readout, it is applicable to a wide variety of experimental assays of DNA-binding preference.
Results
By several criteria, RankMotif++ predicts binding affinity better than two widely used motif finding algorithms (MDScan, MatrixREDUCE) or more recently developed algorithms (PREGO, Seed and Wobble), and its performance is comparable to a motif model that separately assigns affinities to 8-mers. Our results validate the PWM model and provide an approximation of the precision and recall that can be expected in a genomic scan.
Availability
RankMotif++ is available upon request.
Supplementary Information
Supplementary data are available at Bioinformatics online.
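The position weight matrix representation mentioned in this abstract, in which each base position contributes independently, can be sketched as below. This shows only generic PWM scoring, not the RankMotif++ learning procedure; the names `pwm_score`, `best_site`, and `toy_pwm`, and the matrix values, are hypothetical.

```python
def pwm_score(pwm, site):
    """Sum independent per-position contributions for one candidate site."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal the PWM width")
    return sum(column[base] for column, base in zip(pwm, site))


def best_site(pwm, sequence):
    """Return (start, score) of the highest-scoring window, leftmost on ties."""
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        score = pwm_score(pwm, sequence[start:start + width])
        if best is None or score > best[1]:
            best = (start, score)
    return best


# Hypothetical 3-column matrix favouring the site "ACG" (log-odds-like values).
toy_pwm = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
]
hit = best_site(toy_pwm, "TTACGT")
```

Because the score is a plain sum over columns, changing one base changes the score by the same amount regardless of the other bases, which is the independence assumption the abstract refers to.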
Affiliation(s)
- Xiaoyu Chen
- Banting and Best Department of Medical Research, University of Toronto, Toronto, ON, Canada
|
81
|
Bussemaker HJ, Foat BC, Ward LD. Predictive modeling of genome-wide mRNA expression: from modules to molecules. Annu Rev Biophys Biomol Struct 2007; 36:329-47. [PMID: 17311525 DOI: 10.1146/annurev.biophys.36.040306.132725] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.6] [Indexed: 11/09/2022]
Abstract
Various algorithms are available for predicting mRNA expression and modeling gene regulatory processes. They differ in whether they rely on the existence of modules of coregulated genes or build a model that applies to all genes, whether they represent regulatory activities as hidden variables or as mRNA levels, and whether they implicitly or explicitly model the complex cis-regulatory logic of multiple interacting transcription factors binding the same DNA. The fact that functional genomics data of different types reflect the same molecular processes provides a natural strategy for integrative computational analysis. One promising avenue toward an accurate and comprehensive model of gene regulation combines biophysical modeling of the interactions among proteins, DNA, and RNA with the use of large-scale functional genomics data to estimate regulatory network connectivity and activity parameters. As the ability of these models to represent complex cis-regulatory logic increases, the need for approaches based on cross-species conservation may diminish.
Affiliation(s)
- Harmen J Bussemaker
- Department of Biological Sciences, Columbia University, New York, New York 10027, USA.
|
82
|
Teif VB. General transfer matrix formalism to calculate DNA-protein-drug binding in gene regulation: application to OR operator of phage lambda. Nucleic Acids Res 2007; 35:e80. [PMID: 17526526 PMCID: PMC1920246 DOI: 10.1093/nar/gkm268] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.4] [Received: 02/02/2007] [Revised: 04/09/2007] [Accepted: 04/09/2007] [Indexed: 11/24/2022] Open
Abstract
The transfer matrix methodology is proposed as a systematic tool for the statistical-mechanical description of DNA-protein-drug binding involved in gene regulation. We show that a genetic system of several cis-regulatory modules is calculable using this method, considering explicitly the site-overlapping, competitive, cooperative binding of regulatory proteins, their multilayer assembly and DNA looping. In the methodological section, the matrix models are solved for the basic types of short- and long-range interactions between DNA-bound proteins, drugs and nucleosomes. We apply the matrix method to gene regulation at the O(R) operator of phage lambda. The transfer matrix formalism allowed the description of the lambda-switch at a single-nucleotide resolution, taking into account the effects of a range of inter-protein distances. Our calculations confirm previously established roles of the contact CI-Cro-RNAP interactions. Concerning long-range interactions, we show that while the DNA loop between the O(R) and O(L) operators is important at the lysogenic CI concentrations, the interference between the adjacent promoters P(R) and P(RM) becomes more important at small CI concentrations. A large change in the expression pattern may arise in this regime due to anticooperative interactions between DNA-bound RNA polymerases. The applicability of the matrix method to more complex systems is discussed.
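The general idea behind a transfer matrix calculation can be shown on the simplest possible case. The sketch below is not the formalism of this paper, which also handles overlapping sites, multilayer assembly and DNA looping; it assumes a ligand that covers exactly one lattice site and a single nearest-neighbour cooperativity factor, and the names `partition_function` and `brute_force` are hypothetical.

```python
from itertools import product


def partition_function(weights, cooperativity):
    """Partition function of a 1D lattice via 2x2 transfer matrices.

    weights[i] is the statistical weight (binding constant times free
    concentration) of a ligand bound at site i; `cooperativity` multiplies
    the weight of every pair of adjacent bound sites.
    """
    if not weights:
        return 1.0
    # State vector: total weight of configurations ending in (free, bound).
    free, bound = 1.0, weights[0]
    for k in weights[1:]:
        # Equivalent to multiplying by the matrix [[1, k], [1, k * w]].
        free, bound = free + bound, k * (free + cooperativity * bound)
    return free + bound


def brute_force(weights, cooperativity):
    """Enumerate all 2**N configurations; only feasible for tiny lattices."""
    total = 0.0
    for state in product((0, 1), repeat=len(weights)):
        w = 1.0
        for i, occupied in enumerate(state):
            if occupied:
                w *= weights[i]
                if i > 0 and state[i - 1]:
                    w *= cooperativity
        total += w
    return total
```

The matrix recursion costs time linear in the number of sites, whereas the enumeration grows exponentially; on small lattices the two can be compared directly as a consistency check.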
Affiliation(s)
- Vladimir B Teif
- Institute of Bioorganic Chemistry, Belarus National Academy of Sciences, Street Kuprevich 5/2, 220141, Minsk, Belarus.
83
Maerkl SJ, Quake SR. A systems approach to measuring the binding energy landscapes of transcription factors. Science 2007; 315:233-7. [PMID: 17218526 DOI: 10.1126/science.1131007] [Citation(s) in RCA: 399] [Impact Index Per Article: 23.5] [Indexed: 01/13/2023]
Abstract
A major goal of systems biology is to predict the function of biological networks. Although network topologies have been successfully determined in many cases, the quantitative parameters governing these networks generally have not. Measuring affinities of molecular interactions in high-throughput format remains problematic, especially for transient and low-affinity interactions. We describe a high-throughput microfluidic platform that measures such properties on the basis of mechanical trapping of molecular interactions. With this platform we characterized DNA binding energy landscapes for four eukaryotic transcription factors; these landscapes were used to test basic assumptions about transcription factor binding and to predict their in vivo function.
Affiliation(s)
- Sebastian J Maerkl
- Biochemistry and Molecular Biophysics Option, California Institute of Technology, 1200 East California Boulevard, Pasadena, CA 91125, USA
84
Kinney JB, Tkačik G, Callan CG. Precise physical models of protein-DNA interaction from high-throughput data. Proc Natl Acad Sci U S A 2006; 104:501-6. [PMID: 17197415 PMCID: PMC1766414 DOI: 10.1073/pnas.0609908104] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.8] [Indexed: 11/18/2022] Open
Abstract
A cell's ability to regulate gene transcription depends in large part on the energy with which transcription factors (TFs) bind their DNA regulatory sites. Obtaining accurate models of this binding energy is therefore an important goal for quantitative biology. In this article, we present a principled likelihood-based approach for inferring physical models of TF-DNA binding energy from the data produced by modern high-throughput binding assays. Central to our analysis is the ability to assess the relative likelihood of different model parameters given experimental observations. We take a unique approach to this problem and show how to compute likelihood without any explicit assumptions about the noise that inevitably corrupts such measurements. Sampling possible choices for model parameters according to this likelihood function, we can then make probabilistic predictions for the identities of binding sites and their physical binding energies. Applying this procedure to previously published data on the Saccharomyces cerevisiae TF Abf1p, we find models of TF binding whose parameters are determined with remarkable precision. Evidence for the accuracy of these models is provided by an astonishing level of phylogenetic conservation in the predicted energies of putative binding sites. Results from in vivo and in vitro experiments also provide highly consistent characterizations of Abf1p, a result that contrasts with a previous analysis of the same data.
Affiliation(s)
- Justin B. Kinney
- Physics Department and Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
- Gašper Tkačik
- Physics Department and Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
- Curtis G. Callan
- Physics Department and Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
- *To whom correspondence should be addressed. E-mail:
85
Roider HG, Kanhere A, Manke T, Vingron M. Predicting transcription factor affinities to DNA from a biophysical model. Bioinformatics 2006; 23:134-41. [PMID: 17098775 DOI: 10.1093/bioinformatics/btl565] [Citation(s) in RCA: 152] [Impact Index Per Article: 8.4] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Theoretical efforts to understand the regulation of gene expression are traditionally centered around the identification of transcription factor binding sites at specific DNA positions. More recently these efforts have been supplemented by experimental data for relative binding affinities of proteins to longer intergenic sequences. The question arises to what extent these two approaches converge. In this paper, we adopt a physical binding model to predict the relative binding affinity of a transcription factor for a given sequence. RESULTS: We find that a significant fraction of genome-wide binding data in yeast can be accounted for by simple count matrices and a physical model with only two parameters. We demonstrate that our approach is both conceptually and practically more powerful than traditional methods, which require selection of a cutoff. Our analysis yields biologically meaningful parameters, suitable for predicting relative binding affinities in the absence of experimental binding data. AVAILABILITY: The C source code for our TRAP program is freely available for non-commercial use at http://www.molgen.mpg.de/~manke/papers/TFaffinities/
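To make the idea of a cutoff-free, two-parameter affinity model concrete, here is a hedged sketch. It is not the TRAP source code. It assumes a common biophysical form: mismatch energies are derived from a count matrix as log ratios against the most frequent base in each column, each window of the sequence is bound with probability `x / (1 + x)` where `x = r0 * exp(-energy / lam)`, and the affinity is the sum of these probabilities over all windows. The names `r0`, `lam`, `pseudo` and both function names are assumptions made for illustration. Scanning of the reverse strand and background base composition are left out.

```python
import math


def mismatch_energies(counts, pseudo=1.0):
    """Per-position mismatch energies from a count matrix (assumed definition).

    counts: list of dicts mapping base -> count, one dict per motif position.
    The most frequent base in each column gets energy 0.
    """
    energies = []
    for column in counts:
        best = max(column.values()) + pseudo
        energies.append(
            {base: math.log(best / (n + pseudo)) for base, n in column.items()}
        )
    return energies


def expected_bound(seq, counts, r0, lam, pseudo=1.0):
    """Expected number of bound factors on seq, summed over all windows.

    r0:  dimensionless factor combining concentration and consensus affinity
    lam: energy scale that converts mismatch scores into binding energies
    No score cutoff is applied; weak sites simply contribute small terms.
    """
    energies = mismatch_energies(counts, pseudo)
    width = len(energies)
    total = 0.0
    for start in range(len(seq) - width + 1):
        energy = sum(energies[j][seq[start + j]] for j in range(width))
        x = r0 * math.exp(-energy / lam)
        total += x / (1.0 + x)
    return total
```

Because every window contributes, many weak sites can add up to a noticeable affinity, which is the practical difference from hit-counting methods that discard everything below a threshold.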
Affiliation(s)
- Helge G Roider
- Max-Planck-Institute for Molecular Genetics Ihnestrasse 73, 14195 Berlin, Germany
86
Liu X, Lee CK, Granek JA, Clarke ND, Lieb JD. Whole-genome comparison of Leu3 binding in vitro and in vivo reveals the importance of nucleosome occupancy in target site selection. Genome Res 2006; 16:1517-28. [PMID: 17053089 PMCID: PMC1665635 DOI: 10.1101/gr.5655606] [Citation(s) in RCA: 109] [Impact Index Per Article: 6.1] [Indexed: 01/27/2023]
Abstract
Sequence motifs that are potentially recognized by DNA-binding proteins occur far more often in genomic DNA than do observed in vivo protein-DNA interactions. To determine how chromatin influences the utilization of particular DNA-binding sites, we compared the in vivo genome-wide binding location of the yeast transcription factor Leu3 to the binding location observed on the same genomic DNA in the absence of any protein cofactors. We found that the DNA-sequence motif recognized by Leu3 in vitro and in vivo was functionally indistinguishable, but Leu3 bound different genomic locations under the two conditions. Accounting for nucleosome occupancy in addition to DNA-sequence motifs significantly improved the prediction of protein-DNA interactions in vivo, but not the prediction of sites bound by purified Leu3 in vitro. Use of histone modification data does not further improve binding predictions, presumably because their effect is already manifest in the global histone distribution. Measurements of nucleosome occupancy in strains that differ in Leu3 genotype show that low nucleosome occupancy at loci bound by Leu3 is not a consequence of Leu3 binding. These results permit quantitation of the epigenetic influence that chromatin exerts on DNA binding-site selection, and provide evidence for an instructive, functionally important role for nucleosome occupancy in determining patterns of regulatory factor targeting genome-wide.
Affiliation(s)
- Xiao Liu
- Department of Biophysics and Biophysical Chemistry, Johns Hopkins School of Medicine, Baltimore, Maryland 21205, USA
- Cheol-Koo Lee
- Department of Biology and the Carolina Center for Genome Sciences, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
- Joshua A. Granek
- Department of Biophysics and Biophysical Chemistry, Johns Hopkins School of Medicine, Baltimore, Maryland 21205, USA
- Neil D. Clarke
- Department of Biophysics and Biophysical Chemistry, Johns Hopkins School of Medicine, Baltimore, Maryland 21205, USA
- Genome Institute of Singapore, #02-01 Genome, Singapore 138672
- Jason D. Lieb
- Department of Biology and the Carolina Center for Genome Sciences, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
- Corresponding author. Fax (919) 962-1625
87
Tang L, Liu X, Clarke ND. Inferring direct regulatory targets from expression and genome location analyses: a comparison of transcription factor deletion and overexpression. BMC Genomics 2006; 7:215. [PMID: 16923194 PMCID: PMC1559704 DOI: 10.1186/1471-2164-7-215] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 05/18/2006] [Accepted: 08/22/2006] [Indexed: 11/15/2022] Open
Abstract
BACKGROUND: Effects on gene expression due to environmental or genetic changes can be easily measured using microarrays. However, indirect effects on expression can be substantial. The indirect effects of a perturbation need to be distinguished from the direct effects if we are to understand the structure and behavior of regulatory networks. RESULTS: The most direct way to perturb a transcriptional network is to alter transcription factor activity. Here, for the first time, we compare expression changes and genomic binding in a simple regulon under conditions of both low and high transcription factor activity. Specifically, we assessed the effects on expression and binding due to deletion of the yeast LEU3 transcription factor gene and effects due to elevation of Leu3 activity. Leu3 activity was elevated through overexpression and the introduction of a mutation that renders the protein constitutively active. Genes that are bound and/or regulated by Leu3 under one or both conditions were characterized in terms of their functional annotations and their predicted potential to be bound by Leu3. We also assessed the evolutionary conservation of the predicted binding potential using a novel alignment-independent method. Both perturbations yield genes that are likely to be direct targets of Leu3, including most of the classically defined targets. Additional direct targets are identified by each of the methods. However, experimental and computational criteria suggest that most genes whose expression is affected by the Leu3 genotype are unlikely to be regulated by binding of the protein. CONCLUSION: Most genes that are differentially expressed by Leu3 are not direct targets despite the exceptional simplicity of the regulon, and the unusually direct nature of the perturbations investigated. These conclusions are reached through computational analyses that support and extend chromatin immunoprecipitation data on the identities of direct targets. These results have implications for the interpretation of expression experiments, especially in cases for which chromatin immunoprecipitation data are unavailable, incomplete, or ambiguous.
Affiliation(s)
- Lin Tang
- Biophysics and Biophysical Chemistry, Johns Hopkins School of Medicine, Baltimore, MD, USA
- AviaraDX Inc., 2715 Locker West, Carlsbad, CA, USA
- Xiao Liu
- Biophysics and Biophysical Chemistry, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Developmental Biology, Stanford University School of Medicine, Palo Alto, CA, USA
- Neil D Clarke
- Biophysics and Biophysical Chemistry, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Genome Institute of Singapore, Singapore
88
Abstract
Major experimental and computational efforts are targeted at the characterization of transcriptional networks on a genomic scale. The ultimate goal of many of these studies is to construct networks associating transcription factors with genes via well-defined binding sites. Weaker regulatory interactions other than those occurring at high-affinity binding sites are largely ignored and are not well understood. Here I show that low-affinity interactions are abundant in vivo and quantifiable from current high-throughput ChIP experiments. I develop algorithms that predict DNA-binding energies from sequences and ChIP data across a wide dynamic range of affinities and use them to reveal widespread functionality of low-affinity transcription factor binding. Evolutionary analysis suggests that binding energies of many transcription factors are conserved even in promoters lacking classical binding sites. Gene expression analysis shows that such promoters can generate significant expression. I estimate that while only a small percentage of the genome is strongly regulated by a typical transcription factor, up to an order of magnitude more may be involved in weaker interactions. Low-affinity transcription factor-DNA interaction may therefore be important both evolutionarily and functionally.
Affiliation(s)
- Amos Tanay
- Center for Studies in Physics and Biology, Rockefeller University, New York, New York 10021, USA.