Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: He X, Chen CC, Hong F, Fang F, Sinha S, Ng HH, Zhong S. A biophysical model for analysis of transcription factor interaction and binding site arrangement from genome-wide binding data. PLoS One 2009;4:e8155. [PMID: 19956545 PMCID: PMC2780727 DOI: 10.1371/journal.pone.0008155] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2009] [Accepted: 11/10/2009] [Indexed: 11/19/2022] Open

For:	He X, Chen CC, Hong F, Fang F, Sinha S, Ng HH, Zhong S. A biophysical model for analysis of transcription factor interaction and binding site arrangement from genome-wide binding data. PLoS One 2009;4:e8155. [PMID: 19956545 PMCID: PMC2780727 DOI: 10.1371/journal.pone.0008155] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2009] [Accepted: 11/10/2009] [Indexed: 11/19/2022] Open

Number

Cited by Other Article(s)

Avsec Ž, Weilert M, Shrikumar A, Krueger S, Alexandari A, Dalal K, Fropf R, McAnany C, Gagneur J, Kundaje A, Zeitlinger J. Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat Genet 2021;53:354-366. [PMID: 33603233 PMCID: PMC8812996 DOI: 10.1038/s41588-021-00782-6] [Citation(s) in RCA: 225] [Impact Index Per Article: 75.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2020] [Accepted: 01/07/2021] [Indexed: 01/30/2023]

Rowland MA, Pilkiewicz KR, Mayo ML. Devil in the details: Mechanistic variations impact information transfer across models of transcriptional cascades. PLoS One 2021;16:e0245094. [PMID: 33439904 PMCID: PMC7806174 DOI: 10.1371/journal.pone.0245094] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2020] [Accepted: 12/22/2020] [Indexed: 11/19/2022] Open

Abstract

The transcriptional network determines a cell’s internal state by regulating protein expression in response to changes in the local environment. Due to the interconnected nature of this network, information encoded in the abundance of various proteins will often propagate across chains of noisy intermediate signaling events. The data-processing inequality (DPI) leads us to expect that this intracellular game of “telephone” should degrade this type of signal, with longer chains losing successively more information to noise. However, a previous modeling effort predicted that because the steps of these signaling cascades do not truly represent independent stages of data processing, the limits of the DPI could seemingly be surpassed, and the amount of transmitted information could actually increase with chain length. What that work did not examine was whether this regime of growing information transmission was attainable by a signaling system constrained by the mechanistic details of more complex protein-binding kinetics. Here we address this knowledge gap through the lens of information theory by examining a model that explicitly accounts for the binding of each transcription factor to DNA. We analyze this model by comparing stochastic simulations of the fully nonlinear kinetics to simulations constrained by the linear response approximations that displayed a regime of growing information. Our simulations show that even when molecular binding is considered, there remains a regime wherein the transmitted information can grow with cascade length, but ends after a critical number of links determined by the kinetic parameter values. This inflection point marks where correlations decay in response to an oversaturation of binding sites, screening informative transcription factor fluctuations from further propagation down the chain where they eventually become indistinguishable from the surrounding levels of noise.

Collapse

Pande A, Makalowski W, Brosius J, Raabe CA. Enhancer occlusion transcripts regulate the activity of human enhancer domains via transcriptional interference: a computational perspective. Nucleic Acids Res 2020;48:3435-3454. [PMID: 32133533 PMCID: PMC7144904 DOI: 10.1093/nar/gkaa026] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2018] [Revised: 12/27/2019] [Accepted: 01/31/2020] [Indexed: 02/05/2023] Open

Peng PC, Khoueiry P, Girardot C, Reddington JP, Garfield DA, Furlong EEM, Sinha S. The Role of Chromatin Accessibility in cis-Regulatory Evolution. Genome Biol Evol 2020;11:1813-1828. [PMID: 31114856 PMCID: PMC6601868 DOI: 10.1093/gbe/evz103] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/13/2019] [Indexed: 02/07/2023] Open

Xie X, Hanson C, Sinha S. Mechanistic interpretation of non-coding variants for discovering transcriptional regulators of drug response. BMC Biol 2019;17:62. [PMID: 31362726 PMCID: PMC6664756 DOI: 10.1186/s12915-019-0679-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2019] [Accepted: 07/09/2019] [Indexed: 12/21/2022] Open

Abstract

BACKGROUND

Identification of functional non-coding variants and their mechanistic interpretation is a major challenge of modern genomics, especially for precision medicine. Transcription factor (TF) binding profiles and epigenomic landscapes in reference samples allow functional annotation of the genome, but do not provide ready answers regarding the effects of non-coding variants on phenotypes. A promising computational approach is to build models that predict TF-DNA binding from sequence, and use such models to score a variant's impact on TF binding strength. Here, we asked if this mechanistic approach to variant interpretation can be combined with information on genotype-phenotype associations to discover transcription factors regulating phenotypic variation among individuals.

RESULTS

We developed a statistical approach that integrates phenotype, genotype, gene expression, TF ChIP-seq, and Hi-C chromatin interaction data to answer this question. Using drug sensitivity of lymphoblastoid cell lines as the phenotype of interest, we tested if non-coding variants statistically linked to the phenotype are enriched for strong predicted impact on DNA binding strength of a TF and thus identified TFs regulating individual differences in the phenotype. Our approach relies on a new method for predicting variant impact on TF-DNA binding that uses a combination of biophysical modeling and machine learning. We report statistical and literature-based support for many of the TFs discovered here as regulators of drug response variation. We show that the use of mechanistically driven variant impact predictors can identify TF-drug associations that would otherwise be missed. We examined in depth one reported association-that of the transcription factor ELF1 with the drug doxorubicin-and identified several genes that may mediate this regulatory relationship.

CONCLUSION

Our work represents initial steps in utilizing predictions of variant impact on TF binding sites for discovery of regulatory mechanisms underlying phenotypic variation. Future advances on this topic will be greatly beneficial to the reconstruction of phenotype-associated gene regulatory networks.

Collapse

Datta V, Hannenhalli S, Siddharthan R. ChIPulate: A comprehensive ChIP-seq simulation pipeline. PLoS Comput Biol 2019;15:e1006921. [PMID: 30897079 PMCID: PMC6445533 DOI: 10.1371/journal.pcbi.1006921] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2018] [Revised: 04/02/2019] [Accepted: 03/04/2019] [Indexed: 12/17/2022] Open

Abstract

ChIP-seq (Chromatin Immunoprecipitation followed by sequencing) is a high-throughput technique to identify genomic regions that are bound in vivo by a particular protein, e.g., a transcription factor (TF). Biological factors, such as chromatin state, indirect and cooperative binding, as well as experimental factors, such as antibody quality, cross-linking, and PCR biases, are known to affect the outcome of ChIP-seq experiments. However, the relative impact of these factors on inferences made from ChIP-seq data is not entirely clear. Here, via a detailed ChIP-seq simulation pipeline, ChIPulate, we assess the impact of various biological and experimental sources of variation on several outcomes of a ChIP-seq experiment, viz., the recoverability of the TF binding motif, accuracy of TF-DNA binding detection, the sensitivity of inferred TF-DNA binding strength, and number of replicates needed to confidently infer binding strength. We find that the TF motif can be recovered despite poor and non-uniform extraction and PCR amplification efficiencies. The recovery of the motif is, however, affected to a larger extent by the fraction of sites that are either cooperatively or indirectly bound. Importantly, our simulations reveal that the number of ChIP-seq replicates needed to accurately measure in vivo occupancy at high-affinity sites is larger than the recommended community standards. Our results establish statistical limits on the accuracy of inferences of protein-DNA binding from ChIP-seq and suggest that increasing the mean extraction efficiency, rather than amplification efficiency, would better improve sensitivity. The source code and instructions for running ChIPulate can be found at https://github.com/vishakad/chipulate.

DNA-binding proteins perform many key roles in biology, such as transcriptional regulation of gene expression and chromatin modification. ChIP-seq (Chromatin immunoprecipitation followed by high-throughput sequencing) is a widely used experimental technique to identify DNA-binding sites of specific proteins of interest, within cells, genome-wide. DNA fragments from genomic regions that are bound by a protein of interest, often a transcription factor (TF), are selectively extracted using specific antibodies, amplified using PCR, and sequenced. The sequences are mapped to the reference genome. Regions where many sequences map, called “peaks”, are used to infer the location of TF-bound loci (peaks), in vivo occupancy at those loci, and the sequence pattern (motif) to which the TF shows a binding affinity. But measurements of TF occupancy and motif inference are vulnerable to several biological and experimental sources of variation that are poorly understood and difficult to assess directly. Here, we simulate key steps of the ChIP-seq protocol with the aim of estimating the relative effects of various sources of variations on motif inference and binding affinity estimations. Besides providing specific insights and recommendations, we provide a general framework to simulate sequence reads in a ChIP-seq experiment, which should considerably aid in the development of software aimed at analyzing ChIP-seq data.

Collapse

Detection of cooperatively bound transcription factor pairs using ChIP-seq peak intensities and expectation maximization. PLoS One 2018;13:e0199771. [PMID: 30016330 PMCID: PMC6049898 DOI: 10.1371/journal.pone.0199771] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2018] [Accepted: 06/13/2018] [Indexed: 11/19/2022] Open

Pande A, Brosius J, Makalowska I, Makalowski W, Raabe CA. Transcriptional interference by small transcripts in proximal promoter regions. Nucleic Acids Res 2018;46:1069-1088. [PMID: 29309647 PMCID: PMC5815073 DOI: 10.1093/nar/gkx1242] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2017] [Revised: 11/27/2017] [Accepted: 11/30/2017] [Indexed: 01/15/2023] Open

Khoueiry P, Girardot C, Ciglar L, Peng PC, Gustafson EH, Sinha S, Furlong EE. Uncoupling evolutionary changes in DNA sequence, transcription factor occupancy and enhancer activity. eLife 2017;6. [PMID: 28792889 PMCID: PMC5550276 DOI: 10.7554/elife.28440] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2017] [Accepted: 07/21/2017] [Indexed: 12/15/2022] Open

Crowdsourcing biomedical research: leveraging communities as innovation engines. Nat Rev Genet 2017;17:470-86. [PMID: 27418159 DOI: 10.1038/nrg.2016.69] [Citation(s) in RCA: 100] [Impact Index Per Article: 14.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/08/2023]

Zhang Y, Wang P, Yan M. An Entropy-Based Position Projection Algorithm for Motif Discovery. BIOMED RESEARCH INTERNATIONAL 2016;2016:9127474. [PMID: 27882329 PMCID: PMC5110948 DOI: 10.1155/2016/9127474] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/17/2016] [Revised: 09/20/2016] [Accepted: 10/05/2016] [Indexed: 12/31/2022]

Combinatorial Cis-regulation in Saccharomyces Species. G3-GENES GENOMES GENETICS 2016;6:653-67. [PMID: 26772747 PMCID: PMC4777128 DOI: 10.1534/g3.115.024331] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/19/2022]

Abstract

Transcriptional control of gene expression requires interactions between the cis-regulatory elements (CREs) controlling gene promoters. We developed a sensitive computational method to identify CRE combinations with conserved spacing that does not require genome alignments. When applied to seven sensu stricto and sensu lato Saccharomyces species, 80% of the predicted interactions displayed some evidence of combinatorial transcriptional behavior in several existing datasets including: (1) chromatin immunoprecipitation data for colocalization of transcription factors, (2) gene expression data for coexpression of predicted regulatory targets, and (3) gene ontology databases for common pathway membership of predicted regulatory targets. We tested several predicted CRE interactions with chromatin immunoprecipitation experiments in a wild-type strain and strains in which a predicted cofactor was deleted. Our experiments confirmed that transcription factor (TF) occupancy at the promoters of the CRE combination target genes depends on the predicted cofactor while occupancy of other promoters is independent of the predicted cofactor. Our method has the additional advantage of identifying regulatory differences between species. By analyzing the S. cerevisiae and S. bayanus genomes, we identified differences in combinatorial cis-regulation between the species and showed that the predicted changes in gene regulation explain several of the species-specific differences seen in gene expression datasets. In some instances, the same CRE combinations appear to regulate genes involved in distinct biological processes in the two different species. The results of this research demonstrate that (1) combinatorial cis-regulation can be inferred by multi-genome analysis and (2) combinatorial cis-regulation can explain differences in gene expression between species.

Collapse

Peng PC, Hassan Samee MA, Sinha S. Incorporating chromatin accessibility data into sequence-to-expression modeling. Biophys J 2016;108:1257-67. [PMID: 25762337 DOI: 10.1016/j.bpj.2014.12.037] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2014] [Revised: 12/01/2014] [Accepted: 12/11/2014] [Indexed: 01/30/2023] Open

Genome-Wide Analysis of Drosophila RBf2 Protein Highlights the Diversity of RB Family Targets and Possible Role in Regulation of Ribosome Biosynthesis. G3-GENES GENOMES GENETICS 2015;5:1503-15. [PMID: 25999584 PMCID: PMC4502384 DOI: 10.1534/g3.115.019166] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]

Thomas D, Finan C, Newport MJ, Jones S. DNA entropy reveals a significant difference in complexity between housekeeping and tissue specific gene promoters. Comput Biol Chem 2015;58:19-24. [PMID: 25988219 DOI: 10.1016/j.compbiolchem.2015.05.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2014] [Revised: 05/01/2015] [Accepted: 05/01/2015] [Indexed: 10/23/2022]

Blatti C, Kazemian M, Wolfe S, Brodsky M, Sinha S. Integrating motif, DNA accessibility and gene expression data to build regulatory maps in an organism. Nucleic Acids Res 2015;43:3998-4012. [PMID: 25791631 PMCID: PMC4417154 DOI: 10.1093/nar/gkv195] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2015] [Accepted: 02/24/2015] [Indexed: 11/17/2022] Open

Zabet NR, Adryan B. Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res 2015;43:84-94. [PMID: 25432957 PMCID: PMC4288167 DOI: 10.1093/nar/gku1269] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2014] [Revised: 10/22/2014] [Accepted: 11/19/2014] [Indexed: 12/20/2022] Open

Cha M, Zhou Q. Detecting clustering and ordering binding patterns among transcription factors via point process models. Bioinformatics 2014;30:2263-71. [PMID: 24790155 DOI: 10.1093/bioinformatics/btu303] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Mapping yeast transcriptional networks. Genetics 2014;195:9-36. [PMID: 24018767 PMCID: PMC3761317 DOI: 10.1534/genetics.113.153262] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/06/2023] Open

Jankowski A, Prabhakar S, Tiuryn J. TACO: a general-purpose tool for predicting cell-type-specific transcription factor dimers. BMC Genomics 2014;15:208. [PMID: 24640962 PMCID: PMC4004051 DOI: 10.1186/1471-2164-15-208] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2013] [Accepted: 03/07/2014] [Indexed: 12/22/2022] Open

Levinson M, Zhou Q. A penalized Bayesian approach to predicting sparse protein-DNA binding landscapes. ACTA ACUST UNITED AC 2014;30:636-43. [PMID: 24115169 DOI: 10.1093/bioinformatics/btt585] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]

Ezer D, Zabet NR, Adryan B. Physical constraints determine the logic of bacterial promoter architectures. Nucleic Acids Res 2014;42:4196-207. [PMID: 24476912 PMCID: PMC3985651 DOI: 10.1093/nar/gku078] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022] Open

Chen CC, Xiao S, Xie D, Cao X, Song CX, Wang T, He C, Zhong S. Understanding variation in transcription factor binding by modeling transcription factor genome-epigenome interactions. PLoS Comput Biol 2013;9:e1003367. [PMID: 24339764 PMCID: PMC3854512 DOI: 10.1371/journal.pcbi.1003367] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2012] [Accepted: 10/15/2013] [Indexed: 12/20/2022] Open

Abstract

Despite explosive growth in genomic datasets, the methods for studying epigenomic mechanisms of gene regulation remain primitive. Here we present a model-based approach to systematically analyze the epigenomic functions in modulating transcription factor-DNA binding. Based on the first principles of statistical mechanics, this model considers the interactions between epigenomic modifications and a cis-regulatory module, which contains multiple binding sites arranged in any configurations. We compiled a comprehensive epigenomic dataset in mouse embryonic stem (mES) cells, including DNA methylation (MeDIP-seq and MRE-seq), DNA hydroxymethylation (5-hmC-seq), and histone modifications (ChIP-seq). We discovered correlations of transcription factors (TFs) for specific combinations of epigenomic modifications, which we term epigenomic motifs. Epigenomic motifs explained why some TFs appeared to have different DNA binding motifs derived from in vivo (ChIP-seq) and in vitro experiments. Theoretical analyses suggested that the epigenome can modulate transcriptional noise and boost the cooperativity of weak TF binding sites. ChIP-seq data suggested that epigenomic boost of binding affinities in weak TF binding sites can function in mES cells. We showed in theory that the epigenome should suppress the TF binding differences on SNP-containing binding sites in two people. Using personal data, we identified strong associations between H3K4me2/H3K9ac and the degree of personal differences in NFκB binding in SNP-containing binding sites, which may explain why some SNPs introduce much smaller personal variations on TF binding than other SNPs. In summary, this model presents a powerful approach to analyze the functions of epigenomic modifications. This model was implemented into an open source program APEG (Affinity Prediction by Epigenome and Genome, http://systemsbio.ucsd.edu/apeg).

Collapse

Zhong S, He X, Bar-Joseph Z. Predicting tissue specific transcription factor binding sites. BMC Genomics 2013;14:796. [PMID: 24238150 PMCID: PMC3898213 DOI: 10.1186/1471-2164-14-796] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2013] [Accepted: 11/06/2013] [Indexed: 12/13/2022] Open

Cheng Q, Kazemian M, Pham H, Blatti C, Celniker SE, Wolfe SA, Brodsky MH, Sinha S. Computational identification of diverse mechanisms underlying transcription factor-DNA occupancy. PLoS Genet 2013;9:e1003571. [PMID: 23935523 PMCID: PMC3731213 DOI: 10.1371/journal.pgen.1003571] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2012] [Accepted: 05/02/2013] [Indexed: 12/13/2022] Open

Abstract

ChIP-based genome-wide assays of transcription factor (TF) occupancy have emerged as a powerful, high-throughput method to understand transcriptional regulation, especially on a global scale. This has led to great interest in the underlying biochemical mechanisms that direct TF-DNA binding, with the ultimate goal of computationally predicting a TF's occupancy profile in any cellular condition. In this study, we examined the influence of various potential determinants of TF-DNA binding on a much larger scale than previously undertaken. We used a thermodynamics-based model of TF-DNA binding, called “STAP,” to analyze 45 TF-ChIP data sets from Drosophila embryonic development. We built a cross-validation framework that compares a baseline model, based on the ChIP'ed (“primary”) TF's motif, to more complex models where binding by secondary TFs is hypothesized to influence the primary TF's occupancy. Candidates interacting TFs were chosen based on RNA-SEQ expression data from the time point of the ChIP experiment. We found widespread evidence of both cooperative and antagonistic effects by secondary TFs, and explicitly quantified these effects. We were able to identify multiple classes of interactions, including (1) long-range interactions between primary and secondary motifs (separated by ≤150 bp), suggestive of indirect effects such as chromatin remodeling, (2) short-range interactions with specific inter-site spacing biases, suggestive of direct physical interactions, and (3) overlapping binding sites suggesting competitive binding. Furthermore, by factoring out the previously reported strong correlation between TF occupancy and DNA accessibility, we were able to categorize the effects into those that are likely to be mediated by the secondary TF's effect on local accessibility and those that utilize accessibility-independent mechanisms. Finally, we conducted in vitro pull-down assays to test model-based predictions of short-range cooperative interactions, and found that seven of the eight TF pairs tested physically interact and that some of these interactions mediate cooperative binding to DNA.

Chromatin Immunoprecipitation (ChIP)-based genome-wide assays of transcription factor (TF) occupancy have emerged as a powerful, high throughput method to understand transcriptional regulation, especially on a global scale. Here, we utilize 45 ChIP-chip and ChIP-SEQ data sets from Drosophila to explore the underlying mechanisms of TF-DNA binding. For this, we employ a biophysically motivated computational model, in conjunction with over 300 TF motifs (binding specificities) as well as gene expression and DNA accessibility data from different developmental stages in Drosophila embryos. Our findings provide robust statistical evidence of the role played by TF-TF interactions in shaping genome-wide TF-DNA binding profiles, and thus in directing gene regulation. Our method allows us to go beyond simply recognizing the existence of such interactions, to quantifying their effects on TF occupancy. We are able to categorize the probable mechanisms of these effects as involving direct physical interactions versus accessibility-mediated indirect interactions, long-range versus short-range interactions, and cooperative versus antagonistic interactions. Our analysis reveals widespread evidence of combinatorial regulation present in recently generated ChIP data sets, and sets the stage for rich integrative models of the future that will predict cell type-specific TF occupancy values from sequence and expression data.

Collapse

Lee Y, Zhou Q. Co-regulation in embryonic stem cells via context-dependent binding of transcription factors. Bioinformatics 2013;29:2162-8. [DOI: 10.1093/bioinformatics/btt365] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Jankowski A, Szczurek E, Jauch R, Tiuryn J, Prabhakar S. Comprehensive prediction in 78 human cell lines reveals rigidity and compactness of transcription factor dimers. Genome Res 2013;23:1307-18. [PMID: 23554463 PMCID: PMC3730104 DOI: 10.1101/gr.154922.113] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/22/2023]

Teif VB, Erdel F, Beshnova DA, Vainshtein Y, Mallm JP, Rippe K. Taking into account nucleosomes for predicting gene expression. Methods 2013;62:26-38. [PMID: 23523656 DOI: 10.1016/j.ymeth.2013.03.011] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2012] [Accepted: 03/10/2013] [Indexed: 01/10/2023] Open

Goode DK, Elgar G. Capturing the regulatory interactions of eukaryote genomes. Brief Funct Genomics 2012;12:142-60. [PMID: 23117864 DOI: 10.1093/bfgp/els041] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/20/2023] Open

Spatiotemporal clustering of the epigenome reveals rules of dynamic gene regulation. Genome Res 2012;23:352-64. [PMID: 23033340 PMCID: PMC3561876 DOI: 10.1101/gr.144949.112] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022]

Bailey TL, Machanick P. Inferring direct DNA binding from ChIP-seq. Nucleic Acids Res 2012;40:e128. [PMID: 22610855 PMCID: PMC3458523 DOI: 10.1093/nar/gks433] [Citation(s) in RCA: 321] [Impact Index Per Article: 26.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2011] [Revised: 04/02/2012] [Accepted: 04/23/2012] [Indexed: 11/14/2022] Open

Guertin MJ, Martins AL, Siepel A, Lis JT. Accurate prediction of inducible transcription factor binding intensities in vivo. PLoS Genet 2012;8:e1002610. [PMID: 22479205 PMCID: PMC3315474 DOI: 10.1371/journal.pgen.1002610] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2011] [Accepted: 02/06/2012] [Indexed: 11/19/2022] Open

Teif VB, Shkrabkou AV, Egorova VP, Krot VI. Nucleosomes in gene regulation: Theoretical approaches. Mol Biol 2012. [DOI: 10.1134/s002689331106015x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]

Kaplan T, Biggin MD. Quantitative models of the mechanisms that control genome-wide patterns of animal transcription factor binding. Methods Cell Biol 2012;110:263-83. [PMID: 22482953 DOI: 10.1016/b978-0-12-388403-9.00011-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]

Thomas-Chollier M, Herrmann C, Defrance M, Sand O, Thieffry D, van Helden J. RSAT peak-motifs: motif analysis in full-size ChIP-seq datasets. Nucleic Acids Res 2011;40:e31. [PMID: 22156162 PMCID: PMC3287167 DOI: 10.1093/nar/gkr1104] [Citation(s) in RCA: 159] [Impact Index Per Article: 12.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

He X, Duque TSPC, Sinha S. Evolutionary origins of transcription factor binding site clusters. Mol Biol Evol 2011;29:1059-70. [PMID: 22075113 DOI: 10.1093/molbev/msr277] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022] Open

Abstract

Empirical studies have revealed that regulatory DNA sequences such as enhancers or promoters often harbor multiple binding sites for the same transcription factor. Such "homotypic site clustering" has been hypothesized as arising out of functional requirements of the sequences. Here, we propose an alternative explanation of this phenomenon that multisite enhancers are common because they are favored by evolutionary sampling of the genotype-phenotype landscape. To test this hypothesis, we developed a new computational framework specialized for population genetic simulations of enhancer evolution. It uses a thermodynamics-based model of enhancer function, integrating information from strong as well as weak binding sites, to determine the strength of selection. Using this framework, we found that even when simpler genotypes exist for a desired strength of regulation, relatively complex genotypes (enhancers with more sites) are more readily reached by the simulated evolutionary process. We show that there are more ways to "build" a fit genotype with many weak sites than with a few strong sites, and this is why evolution finds complex genotypes more often. Our claims are consistent with an empirical analysis of binding site content in enhancers characterized in Drosophila melanogaster and their orthologs in other Drosophila species. We also characterized a subtle but significant difference between genotypes likely to be sampled by evolution and equally fit genotypes one would obtain by uniform sampling of the fitness landscape, that is, an "evolutionary signature" in enhancer sequences. Finally, we investigated potential effects of other factors, such as rugged fitness landscapes, short local duplications, and noise characteristics of enhancers, on the emergence of homotypic site clustering. Homotypic site clustering is an important contributor to the complexity and function of cis-regulatory sequences. This work provides a simple null hypothesis for its origin, against which alternative adaptationist explanations may be evaluated, and cautions against "evolutionary mirages" present in common features of genomic sequence. The quantitative framework we develop here can be used more generally to understand how mechanisms of enhancer action influence their composition and evolution.

Collapse

Pollock DD, de Koning APJ, Kim H, Castoe TA, Churchill MEA, Kechris KJ. Bayesian analysis of high-throughput quantitative measurement of protein-DNA interactions. PLoS One 2011;6:e26105. [PMID: 22069446 PMCID: PMC3206046 DOI: 10.1371/journal.pone.0026105] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2011] [Accepted: 09/19/2011] [Indexed: 11/19/2022] Open

Teif VB, Rippe K. Calculating transcription factor binding maps for chromatin. Brief Bioinform 2011;13:187-201. [PMID: 21737419 DOI: 10.1093/bib/bbr037] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Xie D, Chen CC, He X, Cao X, Zhong S. Towards an evolutionary model of transcription networks. PLoS Comput Biol 2011;7:e1002064. [PMID: 21695281 PMCID: PMC3111474 DOI: 10.1371/journal.pcbi.1002064] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2010] [Accepted: 04/08/2011] [Indexed: 11/18/2022] Open

Abstract

DNA evolution models made invaluable contributions to comparative genomics, although it seemed formidable to include non-genomic features into these models. In order to build an evolutionary model of transcription networks (TNs), we had to forfeit the substitution model used in DNA evolution and to start from modeling the evolution of the regulatory relationships. We present a quantitative evolutionary model of TNs, subjecting the phylogenetic distance and the evolutionary changes of cis-regulatory sequence, gene expression and network structure to one probabilistic framework. Using the genome sequences and gene expression data from multiple species, this model can predict regulatory relationships between a transcription factor (TF) and its target genes in all species, and thus identify TN re-wiring events. Applying this model to analyze the pre-implantation development of three mammalian species, we identified the conserved and re-wired components of the TNs downstream to a set of TFs including Oct4, Gata3/4/6, cMyc and nMyc. Evolutionary events on the DNA sequence that led to turnover of TF binding sites were identified, including a birth of an Oct4 binding site by a 2nt deletion. In contrast to recent reports of large interspecies differences of TF binding sites and gene expression patterns, the interspecies difference in TF-target relationship is much smaller. The data showed increasing conservation levels from genomic sequences to TF-DNA interaction, gene expression, TN, and finally to morphology, suggesting that evolutionary changes are larger at molecular levels and smaller at functional levels. The data also showed that evolutionarily older TFs are more likely to have conserved target genes, whereas younger TFs tend to have larger re-wiring rates.

DNA evolution models made invaluable contributions to comparative genomic studies. Still lacking is an evolutionary model of transcription networks (TNs). To develop such a model, we had to forfeit the substitution model used in DNA evolution and to start from modeling the evolution of the regulatory relationships, and then subject the phylogenetic distance and the multi-species DNA sequence and gene expression data to one probabilistic framework. This model enabled us to infer the evolutionary changes of transcriptional regulatory relationships. Applying this model to analyze three yeast species, we found the anaerobic phenotype in two species was associated with the evolutionary loss of a larger cis-regulatory motif than previously thought. Analyzing three mammalian species, we found increasing conservation levels from genomic sequences to transcription factor-DNA interaction, gene expression, TN, and finally to morphology, suggesting that evolutionary changes are larger at molecular levels and smaller at functional levels. We also found that evolutionarily younger TFs are more likely to regulate different target genes in different species.

Collapse

Kaplan T, Li XY, Sabo PJ, Thomas S, Stamatoyannopoulos JA, Biggin MD, Eisen MB. Quantitative models of the mechanisms that control genome-wide patterns of transcription factor binding during early Drosophila development. PLoS Genet 2011;7:e1001290. [PMID: 21304941 PMCID: PMC3033374 DOI: 10.1371/journal.pgen.1001290] [Citation(s) in RCA: 139] [Impact Index Per Article: 10.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2010] [Accepted: 01/01/2011] [Indexed: 01/01/2023] Open

Abstract

Transcription factors that drive complex patterns of gene expression during animal development bind to thousands of genomic regions, with quantitative differences in binding across bound regions mediating their activity. While we now have tools to characterize the DNA affinities of these proteins and to precisely measure their genome-wide distribution in vivo, our understanding of the forces that determine where, when, and to what extent they bind remains primitive. Here we use a thermodynamic model of transcription factor binding to evaluate the contribution of different biophysical forces to the binding of five regulators of early embryonic anterior-posterior patterning in Drosophila melanogaster. Predictions based on DNA sequence and in vitro protein-DNA affinities alone achieve a correlation of ∼0.4 with experimental measurements of in vivo binding. Incorporating cooperativity and competition among the five factors, and accounting for spatial patterning by modeling binding in every nucleus independently, had little effect on prediction accuracy. A major source of error was the prediction of binding events that do not occur in vivo, which we hypothesized reflected reduced accessibility of chromatin. To test this, we incorporated experimental measurements of genome-wide DNA accessibility into our model, effectively restricting predicted binding to regions of open chromatin. This dramatically improved our predictions to a correlation of 0.6-0.9 for various factors across known target genes. Finally, we used our model to quantify the roles of DNA sequence, accessibility, and binding competition and cooperativity. Our results show that, in regions of open chromatin, binding can be predicted almost exclusively by the sequence specificity of individual factors, with a minimal role for protein interactions. We suggest that a combination of experimentally determined chromatin accessibility data and simple computational models of transcription factor binding may be used to predict the binding landscape of any animal transcription factor with significant precision.

Collapse

Su J, Teichmann SA, Down TA. Assessing computational methods of cis-regulatory module prediction. PLoS Comput Biol 2010;6:e1001020. [PMID: 21152003 PMCID: PMC2996316 DOI: 10.1371/journal.pcbi.1001020] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2010] [Accepted: 10/29/2010] [Indexed: 01/02/2023] Open

Abstract

Computational methods attempting to identify instances of cis-regulatory modules (CRMs) in the genome face a challenging problem of searching for potentially interacting transcription factor binding sites while knowledge of the specific interactions involved remains limited. Without a comprehensive comparison of their performance, the reliability and accuracy of these tools remains unclear. Faced with a large number of different tools that address this problem, we summarized and categorized them based on search strategy and input data requirements. Twelve representative methods were chosen and applied to predict CRMs from the Drosophila CRM database REDfly, and across the human ENCODE regions. Our results show that the optimal choice of method varies depending on species and composition of the sequences in question. When discriminating CRMs from non-coding regions, those methods considering evolutionary conservation have a stronger predictive power than methods designed to be run on a single genome. Different CRM representations and search strategies rely on different CRM properties, and different methods can complement one another. For example, some favour homotypical clusters of binding sites, while others perform best on short CRMs. Furthermore, most methods appear to be sensitive to the composition and structure of the genome to which they are applied. We analyze the principal features that distinguish the methods that performed well, identify weaknesses leading to poor performance, and provide a guide for users. We also propose key considerations for the development and evaluation of future CRM-prediction methods.

Transcriptional regulation involves multiple transcription factors binding to DNA sequences. A limited repertoire of transcription factors performs this complex regulatory step through various spatial and temporal interactions between themselves and their binding sites. These transcription factor binding interactions are clustered as distinct modules: cis-regulatory modules (CRMs). Computational methods attempting to identify instances of CRMs in the genome face a challenging problem because a majority of these interactions between transcription factors remain unknown. To investigate the reliability and accuracy of these methods, we chose twelve representative methods and applied them to predict CRMs on both the fly and human genomes. Our results show that the optimal choice of method varies depending on species and composition of the sequences in question. Different CRM representations and search strategies rely on different CRM properties, and different methods can complement one another. We provide a guide for users and key considerations for developers. We also expect that, along with new technology generating new types of genomic data, future CRM prediction methods will be able to reveal transcription binding interactions in three-dimensional space.

Collapse