Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Cheng Q, Kazemian M, Pham H, Blatti C, Celniker SE, Wolfe SA, Brodsky MH, Sinha S. Computational identification of diverse mechanisms underlying transcription factor-DNA occupancy. PLoS Genet 2013;9:e1003571. [PMID: 23935523 PMCID: PMC3731213 DOI: 10.1371/journal.pgen.1003571] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2012] [Accepted: 05/02/2013] [Indexed: 12/13/2022] Open

For:	Cheng Q, Kazemian M, Pham H, Blatti C, Celniker SE, Wolfe SA, Brodsky MH, Sinha S. Computational identification of diverse mechanisms underlying transcription factor-DNA occupancy. PLoS Genet 2013;9:e1003571. [PMID: 23935523 PMCID: PMC3731213 DOI: 10.1371/journal.pgen.1003571] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2012] [Accepted: 05/02/2013] [Indexed: 12/13/2022] Open

Number

Cited by Other Article(s)

de Boer CG, Taipale J. Hold out the genome: a roadmap to solving the cis-regulatory code. Nature 2024;625:41-50. [PMID: 38093018 DOI: 10.1038/s41586-023-06661-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2023] [Accepted: 09/20/2023] [Indexed: 01/05/2024]

Martinez-Corral R, Park M, Biette KM, Friedrich D, Scholes C, Khalil AS, Gunawardena J, DePace AH. Transcriptional kinetic synergy: A complex landscape revealed by integrating modeling and synthetic biology. Cell Syst 2023;14:324-339.e7. [PMID: 37080164 PMCID: PMC10472254 DOI: 10.1016/j.cels.2023.02.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2021] [Revised: 08/22/2022] [Accepted: 02/10/2023] [Indexed: 04/22/2023]

DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers. Nat Genet 2022;54:613-624. [PMID: 35551305 DOI: 10.1038/s41588-022-01048-5] [Citation(s) in RCA: 69] [Impact Index Per Article: 34.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2021] [Accepted: 03/08/2022] [Indexed: 02/06/2023]

Bhogale S, Sinha S. Thermodynamics-based modeling reveals regulatory effects of indirect transcription factor-DNA binding. iScience 2022;25:104152. [PMID: 35465052 PMCID: PMC9018382 DOI: 10.1016/j.isci.2022.104152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2021] [Revised: 12/28/2021] [Accepted: 03/21/2022] [Indexed: 11/30/2022] Open

Garbuzov FE, Gursky VV. Nonequilibrium model of short-range repression in gene transcription regulation. Phys Rev E 2021;104:014407. [PMID: 34412298 DOI: 10.1103/physreve.104.014407] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2020] [Accepted: 06/24/2021] [Indexed: 11/07/2022]

Ullah F, Ben-Hur A. A self-attention model for inferring cooperativity between regulatory features. Nucleic Acids Res 2021;49:e77. [PMID: 33950192 PMCID: PMC8287919 DOI: 10.1093/nar/gkab349] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2020] [Revised: 04/15/2021] [Accepted: 04/20/2021] [Indexed: 11/14/2022] Open

Jindal GA, Farley EK. Enhancer grammar in development, evolution, and disease: dependencies and interplay. Dev Cell 2021;56:575-587. [PMID: 33689769 PMCID: PMC8462829 DOI: 10.1016/j.devcel.2021.02.016] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2020] [Revised: 02/15/2021] [Accepted: 02/16/2021] [Indexed: 12/19/2022]

Avsec Ž, Weilert M, Shrikumar A, Krueger S, Alexandari A, Dalal K, Fropf R, McAnany C, Gagneur J, Kundaje A, Zeitlinger J. Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat Genet 2021;53:354-366. [PMID: 33603233 PMCID: PMC8812996 DOI: 10.1038/s41588-021-00782-6] [Citation(s) in RCA: 225] [Impact Index Per Article: 75.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2020] [Accepted: 01/07/2021] [Indexed: 01/30/2023]

Chen L, Capra JA. Learning and interpreting the gene regulatory grammar in a deep learning framework. PLoS Comput Biol 2020;16:e1008334. [PMID: 33137083 PMCID: PMC7660921 DOI: 10.1371/journal.pcbi.1008334] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2019] [Revised: 11/12/2020] [Accepted: 09/12/2020] [Indexed: 12/12/2022] Open

Abstract

Deep neural networks (DNNs) have achieved state-of-the-art performance in identifying gene regulatory sequences, but they have provided limited insight into the biology of regulatory elements due to the difficulty of interpreting the complex features they learn. Several models of how combinatorial binding of transcription factors, i.e. the regulatory grammar, drives enhancer activity have been proposed, ranging from the flexible TF billboard model to the stringent enhanceosome model. However, there is limited knowledge of the prevalence of these (or other) sequence architectures across enhancers. Here we perform several hypothesis-driven analyses to explore the ability of DNNs to learn the regulatory grammar of enhancers. We created synthetic datasets based on existing hypotheses about combinatorial transcription factor binding site (TFBS) patterns, including homotypic clusters, heterotypic clusters, and enhanceosomes, from real TF binding motifs from diverse TF families. We then trained deep residual neural networks (ResNets) to model the sequences under a range of scenarios that reflect real-world multi-label regulatory sequence prediction tasks. We developed a gradient-based unsupervised clustering method to extract the patterns learned by the ResNet models. We demonstrated that simulated regulatory grammars are best learned in the penultimate layer of the ResNets, and the proposed method can accurately retrieve the regulatory grammar even when there is heterogeneity in the enhancer categories and a large fraction of TFBS outside of the regulatory grammar. However, we also identify common scenarios where ResNets fail to learn simulated regulatory grammars. Finally, we applied the proposed method to mouse developmental enhancers and were able to identify the components of a known heterotypic TF cluster. Our results provide a framework for interpreting the regulatory rules learned by ResNets, and they demonstrate that the ability and efficiency of ResNets in learning the regulatory grammar depends on the nature of the prediction task.

Collapse

Small S, Arnosti DN. Transcriptional Enhancers in Drosophila. Genetics 2020;216:1-26. [PMID: 32878914 PMCID: PMC7463283 DOI: 10.1534/genetics.120.301370] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2020] [Accepted: 06/20/2020] [Indexed: 12/14/2022] Open

Peng PC, Khoueiry P, Girardot C, Reddington JP, Garfield DA, Furlong EEM, Sinha S. The Role of Chromatin Accessibility in cis-Regulatory Evolution. Genome Biol Evol 2020;11:1813-1828. [PMID: 31114856 PMCID: PMC6601868 DOI: 10.1093/gbe/evz103] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/13/2019] [Indexed: 02/07/2023] Open

Xie X, Hanson C, Sinha S. Mechanistic interpretation of non-coding variants for discovering transcriptional regulators of drug response. BMC Biol 2019;17:62. [PMID: 31362726 PMCID: PMC6664756 DOI: 10.1186/s12915-019-0679-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2019] [Accepted: 07/09/2019] [Indexed: 12/21/2022] Open

Abstract

BACKGROUND

Identification of functional non-coding variants and their mechanistic interpretation is a major challenge of modern genomics, especially for precision medicine. Transcription factor (TF) binding profiles and epigenomic landscapes in reference samples allow functional annotation of the genome, but do not provide ready answers regarding the effects of non-coding variants on phenotypes. A promising computational approach is to build models that predict TF-DNA binding from sequence, and use such models to score a variant's impact on TF binding strength. Here, we asked if this mechanistic approach to variant interpretation can be combined with information on genotype-phenotype associations to discover transcription factors regulating phenotypic variation among individuals.

RESULTS

We developed a statistical approach that integrates phenotype, genotype, gene expression, TF ChIP-seq, and Hi-C chromatin interaction data to answer this question. Using drug sensitivity of lymphoblastoid cell lines as the phenotype of interest, we tested if non-coding variants statistically linked to the phenotype are enriched for strong predicted impact on DNA binding strength of a TF and thus identified TFs regulating individual differences in the phenotype. Our approach relies on a new method for predicting variant impact on TF-DNA binding that uses a combination of biophysical modeling and machine learning. We report statistical and literature-based support for many of the TFs discovered here as regulators of drug response variation. We show that the use of mechanistically driven variant impact predictors can identify TF-drug associations that would otherwise be missed. We examined in depth one reported association-that of the transcription factor ELF1 with the drug doxorubicin-and identified several genes that may mediate this regulatory relationship.

CONCLUSION

Our work represents initial steps in utilizing predictions of variant impact on TF binding sites for discovery of regulatory mechanisms underlying phenotypic variation. Future advances on this topic will be greatly beneficial to the reconstruction of phenotype-associated gene regulatory networks.

Collapse

Datta V, Hannenhalli S, Siddharthan R. ChIPulate: A comprehensive ChIP-seq simulation pipeline. PLoS Comput Biol 2019;15:e1006921. [PMID: 30897079 PMCID: PMC6445533 DOI: 10.1371/journal.pcbi.1006921] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2018] [Revised: 04/02/2019] [Accepted: 03/04/2019] [Indexed: 12/17/2022] Open

Abstract

ChIP-seq (Chromatin Immunoprecipitation followed by sequencing) is a high-throughput technique to identify genomic regions that are bound in vivo by a particular protein, e.g., a transcription factor (TF). Biological factors, such as chromatin state, indirect and cooperative binding, as well as experimental factors, such as antibody quality, cross-linking, and PCR biases, are known to affect the outcome of ChIP-seq experiments. However, the relative impact of these factors on inferences made from ChIP-seq data is not entirely clear. Here, via a detailed ChIP-seq simulation pipeline, ChIPulate, we assess the impact of various biological and experimental sources of variation on several outcomes of a ChIP-seq experiment, viz., the recoverability of the TF binding motif, accuracy of TF-DNA binding detection, the sensitivity of inferred TF-DNA binding strength, and number of replicates needed to confidently infer binding strength. We find that the TF motif can be recovered despite poor and non-uniform extraction and PCR amplification efficiencies. The recovery of the motif is, however, affected to a larger extent by the fraction of sites that are either cooperatively or indirectly bound. Importantly, our simulations reveal that the number of ChIP-seq replicates needed to accurately measure in vivo occupancy at high-affinity sites is larger than the recommended community standards. Our results establish statistical limits on the accuracy of inferences of protein-DNA binding from ChIP-seq and suggest that increasing the mean extraction efficiency, rather than amplification efficiency, would better improve sensitivity. The source code and instructions for running ChIPulate can be found at https://github.com/vishakad/chipulate.

DNA-binding proteins perform many key roles in biology, such as transcriptional regulation of gene expression and chromatin modification. ChIP-seq (Chromatin immunoprecipitation followed by high-throughput sequencing) is a widely used experimental technique to identify DNA-binding sites of specific proteins of interest, within cells, genome-wide. DNA fragments from genomic regions that are bound by a protein of interest, often a transcription factor (TF), are selectively extracted using specific antibodies, amplified using PCR, and sequenced. The sequences are mapped to the reference genome. Regions where many sequences map, called “peaks”, are used to infer the location of TF-bound loci (peaks), in vivo occupancy at those loci, and the sequence pattern (motif) to which the TF shows a binding affinity. But measurements of TF occupancy and motif inference are vulnerable to several biological and experimental sources of variation that are poorly understood and difficult to assess directly. Here, we simulate key steps of the ChIP-seq protocol with the aim of estimating the relative effects of various sources of variations on motif inference and binding affinity estimations. Besides providing specific insights and recommendations, we provide a general framework to simulate sequence reads in a ChIP-seq experiment, which should considerably aid in the development of software aimed at analyzing ChIP-seq data.

Collapse

Asada R, Umeda M, Adachi A, Senmatsu S, Abe T, Iwasaki H, Ohta K, Hoffman CS, Hirota K. Recruitment and delivery of the fission yeast Rst2 transcription factor via a local genome structure counteracts repression by Tup1-family corepressors. Nucleic Acids Res 2017;45:9361-9371. [PMID: 28934464 PMCID: PMC5766161 DOI: 10.1093/nar/gkx555] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2017] [Accepted: 06/14/2017] [Indexed: 12/12/2022] Open

Khoueiry P, Girardot C, Ciglar L, Peng PC, Gustafson EH, Sinha S, Furlong EE. Uncoupling evolutionary changes in DNA sequence, transcription factor occupancy and enhancer activity. eLife 2017;6. [PMID: 28792889 PMCID: PMC5550276 DOI: 10.7554/elife.28440] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2017] [Accepted: 07/21/2017] [Indexed: 12/15/2022] Open

Sex combs reduced (Scr) regulatory region of Drosophila revisited. Mol Genet Genomics 2017;292:773-787. [DOI: 10.1007/s00438-017-1309-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2016] [Accepted: 03/08/2017] [Indexed: 10/19/2022]

Contextual Refinement of Regulatory Targets Reveals Effects on Breast Cancer Prognosis of the Regulome. PLoS Comput Biol 2017;13:e1005340. [PMID: 28103241 PMCID: PMC5289608 DOI: 10.1371/journal.pcbi.1005340] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2016] [Revised: 02/02/2017] [Accepted: 01/03/2017] [Indexed: 01/12/2023] Open

Abstract

Gene expression regulators, such as transcription factors (TFs) and microRNAs (miRNAs), have varying regulatory targets based on the tissue and physiological state (context) within which they are expressed. While the emergence of regulator-characterizing experiments has inferred the target genes of many regulators across many contexts, methods for transferring regulator target genes across contexts are lacking. Further, regulator target gene lists frequently are not curated or have permissive inclusion criteria, impairing their use. Here, we present a method called iterative Contextual Transcriptional Activity Inference of Regulators (icTAIR) to resolve these issues. icTAIR takes a regulator’s previously-identified target gene list and combines it with gene expression data from a context, quantifying that regulator’s activity for that context. It then calculates the correlation between each listed target gene’s expression and the quantitative score of regulatory activity, removes the uncorrelated genes from the list, and iterates the process until it derives a stable list of refined target genes. To validate and demonstrate icTAIR’s power, we use it to refine the MSigDB c3 database of TF, miRNA and unclassified motif target gene lists for breast cancer. We then use its output for survival analysis with clinicopathological multivariable adjustment in 7 independent breast cancer datasets covering 3,430 patients. We uncover many novel prognostic regulators that were obscured prior to refinement, in particular NFY, and offer a detailed look at the composition and relationships among the breast cancer prognostic regulome. We anticipate icTAIR will be of general use in contextually refining regulator target genes for discoveries across many contexts. The icTAIR algorithm can be downloaded from https://github.com/icTAIR.

Gene expression regulators, such as transcription factors and microRNAs, are critical actors in cellular physiology and pathophysiology and act by modulating the expression levels of sets of target genes. Given their significance, numerous experiments have sought to characterize the specific target genes of specific regulators, which in turn has led to regulator target gene list databases. Unfortunately, these lists are plagued by poor curation and validation. Further, all lists suffer from the fundamental issue that regulator targets vary across tissue type and physiological state, or “context”, making them poor for conducting downstream, context-specific analyses. To address this issue, here we present a method called icTAIR that contextually-refines regulator target gene lists. To demonstrate its value, we use icTAIR to take the largest-available database of regulator target gene lists, refine it for the breast cancer context, and use both the pre-refined and refined lists for downstream survival analyses in over 3,400 tumors. We find that icTAIR improves the statistical power of the analyses by multiple orders of magnitude. This in turn lets us map the relational network of breast cancer regulators and identify regulators with prognostic effects even after clinicopathological adjustment. We anticipate icTAIR will be broadly useful in regulator studies.

Collapse

Yang CC, Chen MH, Lin SY, Andrews EH, Cheng C, Liu CC, Chen JJW. Inferring condition-specific targets of human TF-TF complexes using ChIP-seq data. BMC Genomics 2017;18:61. [PMID: 28068916 PMCID: PMC5223348 DOI: 10.1186/s12864-016-3450-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2016] [Accepted: 12/21/2016] [Indexed: 01/18/2023] Open

Crocker J, Tsai A, Stern DL. A Fully Synthetic Transcriptional Platform for a Multicellular Eukaryote. Cell Rep 2017;18:287-296. [DOI: 10.1016/j.celrep.2016.12.025] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2015] [Revised: 12/14/2015] [Accepted: 12/07/2016] [Indexed: 01/12/2023] Open

Peng PC, Sinha S. Quantitative modeling of gene expression using DNA shape features of binding sites. Nucleic Acids Res 2016;44:e120. [PMID: 27257066 PMCID: PMC5291265 DOI: 10.1093/nar/gkw446] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2015] [Revised: 05/06/2016] [Accepted: 05/09/2016] [Indexed: 12/11/2022] Open

Gomes ALC, Wang HH. The Role of Genome Accessibility in Transcription Factor Binding in Bacteria. PLoS Comput Biol 2016;12:e1004891. [PMID: 27104615 PMCID: PMC4841574 DOI: 10.1371/journal.pcbi.1004891] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2015] [Accepted: 03/31/2016] [Indexed: 02/01/2023] Open

Abstract

ChIP-seq enables genome-scale identification of regulatory regions that govern gene expression. However, the biological insights generated from ChIP-seq analysis have been limited to predictions of binding sites and cooperative interactions. Furthermore, ChIP-seq data often poorly correlate with in vitro measurements or predicted motifs, highlighting that binding affinity alone is insufficient to explain transcription factor (TF)-binding in vivo. One possibility is that binding sites are not equally accessible across the genome. A more comprehensive biophysical representation of TF-binding is required to improve our ability to understand, predict, and alter gene expression. Here, we show that genome accessibility is a key parameter that impacts TF-binding in bacteria. We developed a thermodynamic model that parameterizes ChIP-seq coverage in terms of genome accessibility and binding affinity. The role of genome accessibility is validated using a large-scale ChIP-seq dataset of the M. tuberculosis regulatory network. We find that accounting for genome accessibility led to a model that explains 63% of the ChIP-seq profile variance, while a model based in motif score alone explains only 35% of the variance. Moreover, our framework enables de novo ChIP-seq peak prediction and is useful for inferring TF-binding peaks in new experimental conditions by reducing the need for additional experiments. We observe that the genome is more accessible in intergenic regions, and that increased accessibility is positively correlated with gene expression and anti-correlated with distance to the origin of replication. Our biophysically motivated model provides a more comprehensive description of TF-binding in vivo from first principles towards a better representation of gene regulation in silico, with promising applications in systems biology.

Collapse

Bottani S, Veitia RA. Hill function-based models of transcriptional switches: impact of specific, nonspecific, functional and nonfunctional binding. Biol Rev Camb Philos Soc 2016;92:953-963. [PMID: 27061969 DOI: 10.1111/brv.12262] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2015] [Revised: 02/12/2016] [Accepted: 02/16/2016] [Indexed: 12/25/2022]

Peng PC, Hassan Samee MA, Sinha S. Incorporating chromatin accessibility data into sequence-to-expression modeling. Biophys J 2016;108:1257-67. [PMID: 25762337 DOI: 10.1016/j.bpj.2014.12.037] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2014] [Revised: 12/01/2014] [Accepted: 12/11/2014] [Indexed: 01/30/2023] Open

Samee MAH, Lim B, Samper N, Lu H, Rushlow CA, Jiménez G, Shvartsman SY, Sinha S. A Systematic Ensemble Approach to Thermodynamic Modeling of Gene Expression from Sequence Data. Cell Syst 2015;1:396-407. [PMID: 27136354 DOI: 10.1016/j.cels.2015.12.002] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2015] [Revised: 10/19/2015] [Accepted: 12/02/2015] [Indexed: 11/17/2022]

Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast. PLoS Comput Biol 2015;11:e1004418. [PMID: 26291518 PMCID: PMC4546298 DOI: 10.1371/journal.pcbi.1004418] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2014] [Accepted: 06/29/2015] [Indexed: 11/19/2022] Open

Abstract

Transcription factor (TF) binding is determined by the presence of specific sequence motifs (SM) and chromatin accessibility, where the latter is influenced by both chromatin state (CS) and DNA structure (DS) properties. Although SM, CS, and DS have been used to predict TF binding sites, a predictive model that jointly considers CS and DS has not been developed to predict either TF-specific binding or general binding properties of TFs. Using budding yeast as model, we found that machine learning classifiers trained with either CS or DS features alone perform better in predicting TF-specific binding compared to SM-based classifiers. In addition, simultaneously considering CS and DS further improves the accuracy of the TF binding predictions, indicating the highly complementary nature of these two properties. The contributions of SM, CS, and DS features to binding site predictions differ greatly between TFs, allowing TF-specific predictions and potentially reflecting different TF binding mechanisms. In addition, a "TF-agnostic" predictive model based on three DNA “intrinsic properties” (in silico predicted nucleosome occupancy, major groove geometry, and dinucleotide free energy) that can be calculated from genomic sequences alone has performance that rivals the model incorporating experiment-derived data. This intrinsic property model allows prediction of binding regions not only across TFs, but also across DNA-binding domain families with distinct structural folds. Furthermore, these predicted binding regions can help identify TF binding sites that have a significant impact on target gene expression. Because the intrinsic property model allows prediction of binding regions across DNA-binding domain families, it is TF agnostic and likely describes general binding potential of TFs. Thus, our findings suggest that it is feasible to establish a TF agnostic model for identifying functional regulatory regions in potentially any sequenced genome.

Identification of transcription factor binding sites based on sequence motifs is typically accompanied by a high false positive rate. Increasing evidence suggests that there are many other factors besides DNA sequence that may affect the binding and interaction of TFs with DNA. Through the integration of sequence motif, chromatin state, and DNA structure properties, we show that TF binding can be better predicted. Moreover, considering chromatin state and DNA structure properties simultaneously yields a significant improvement. While the binding of some TFs can be readily predicted using either chromatin state information or DNA structure, other TFs need both. Thus, our findings provide insights on how different histone modifications and DNA structure properties may influence the binding of a particular TF and thus how TFs regulate gene expression. These features are referred to as sequence “intrinsic properties” because they can be predicted from sequences alone. These intrinsic properties can be used to build a TF binding prediction model that has a similar performance to considering all features. Moreover, the intrinsic property model allows TFBS predictions not only across TFs, but also across DNA-binding domain families that are present in most eukaryotes, suggesting that the model likely can be used across species.

Collapse

Sebeson A, Xi L, Zhang Q, Sigmund A, Wang JP, Widom J, Wang X. Differential Nucleosome Occupancies across Oct4-Sox2 Binding Sites in Murine Embryonic Stem Cells. PLoS One 2015;10:e0127214. [PMID: 25992972 PMCID: PMC4436218 DOI: 10.1371/journal.pone.0127214] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2014] [Accepted: 04/13/2015] [Indexed: 12/03/2022] Open

Colombo N, Vlassis N. FastMotif: spectral sequence motif discovery. Bioinformatics 2015;31:2623-31. [PMID: 25886979 DOI: 10.1093/bioinformatics/btv208] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2014] [Accepted: 04/09/2015] [Indexed: 11/14/2022] Open

Blatti C, Kazemian M, Wolfe S, Brodsky M, Sinha S. Integrating motif, DNA accessibility and gene expression data to build regulatory maps in an organism. Nucleic Acids Res 2015;43:3998-4012. [PMID: 25791631 PMCID: PMC4417154 DOI: 10.1093/nar/gkv195] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2015] [Accepted: 02/24/2015] [Indexed: 11/17/2022] Open

Zabet NR, Adryan B. Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res 2015;43:84-94. [PMID: 25432957 PMCID: PMC4288167 DOI: 10.1093/nar/gku1269] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2014] [Revised: 10/22/2014] [Accepted: 11/19/2014] [Indexed: 12/20/2022] Open

Xu DJ, Noyes MB. Understanding DNA-binding specificity by bacteria hybrid selection. Brief Funct Genomics 2014;14:3-16. [PMID: 25539837 DOI: 10.1093/bfgp/elu048] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022] Open

Griffon A, Barbier Q, Dalino J, van Helden J, Spicuglia S, Ballester B. Integrative analysis of public ChIP-seq experiments reveals a complex multi-cell regulatory landscape. Nucleic Acids Res 2014;43:e27. [PMID: 25477382 PMCID: PMC4344487 DOI: 10.1093/nar/gku1280] [Citation(s) in RCA: 106] [Impact Index Per Article: 10.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/21/2023] Open

Krebs W, Schmidt SV, Goren A, De Nardo D, Labzin L, Bovier A, Ulas T, Theis H, Kraut M, Latz E, Beyer M, Schultze JL. Optimization of transcription factor binding map accuracy utilizing knockout-mouse models. Nucleic Acids Res 2014;42:13051-60. [PMID: 25378309 PMCID: PMC4245947 DOI: 10.1093/nar/gku1078] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2014] [Revised: 09/26/2014] [Accepted: 10/16/2014] [Indexed: 12/20/2022] Open

Siepel A, Arbiza L. Cis-regulatory elements and human evolution. Curr Opin Genet Dev 2014;29:81-9. [PMID: 25218861 PMCID: PMC4258466 DOI: 10.1016/j.gde.2014.08.011] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2014] [Revised: 08/17/2014] [Accepted: 08/23/2014] [Indexed: 11/20/2022]

Slattery M, Zhou T, Yang L, Dantas Machado AC, Gordân R, Rohs R. Absence of a simple code: how transcription factors read the genome. Trends Biochem Sci 2014;39:381-99. [PMID: 25129887 DOI: 10.1016/j.tibs.2014.07.002] [Citation(s) in RCA: 337] [Impact Index Per Article: 33.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2014] [Revised: 07/11/2014] [Accepted: 07/15/2014] [Indexed: 12/21/2022]

Ezer D, Zabet NR, Adryan B. Homotypic clusters of transcription factor binding sites: A model system for understanding the physical mechanics of gene expression. Comput Struct Biotechnol J 2014;10:63-9. [PMID: 25349675 PMCID: PMC4204428 DOI: 10.1016/j.csbj.2014.07.005] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022] Open

Gupta A, Christensen RG, Bell HA, Goodwin M, Patel RY, Pandey M, Enuameh MS, Rayla AL, Zhu C, Thibodeau-Beganny S, Brodsky MH, Joung JK, Wolfe SA, Stormo GD. An improved predictive recognition model for Cys(2)-His(2) zinc finger proteins. Nucleic Acids Res 2014;42:4800-12. [PMID: 24523353 PMCID: PMC4005693 DOI: 10.1093/nar/gku132] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2013] [Revised: 01/21/2014] [Accepted: 01/22/2014] [Indexed: 11/17/2022] Open

Affiliation(s)

Ankit Gupta Program in Gene Function and Expression, University of Massachusetts Medical School, Worcester, MA 01605, USA, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01605, USA, Department of Genetics, Washington University School of Medicine, St Louis, MO 63108, USA, Department of Biochemistry and Biology and Biotechnology, Worcester Polytechnic Institute, Worcester, MA 01609, USA, Molecular Pathology Unit, Center for Computational and Integrative Biology, and Center for Cancer Research, Massachusetts General Hospital, Charlestown, MA 02129, USA, Department of Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA and Department of Pathology, Harvard Medical School, Boston, MA 02115, USA
Ryan G. Christensen Program in Gene Function and Expression, University of Massachusetts Medical School, Worcester, MA 01605, USA, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01605, USA, Department of Genetics, Washington University School of Medicine, St Louis, MO 63108, USA, Department of Biochemistry and Biology and Biotechnology, Worcester Polytechnic Institute, Worcester, MA 01609, USA, Molecular Pathology Unit, Center for Computational and Integrative Biology, and Center for Cancer Research, Massachusetts General Hospital, Charlestown, MA 02129, USA, Department of Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA and Department of Pathology, Harvard Medical School, Boston, MA 02115, USA
Heather A. Bell Program in Gene Function and Expression, University of Massachusetts Medical School, Worcester, MA 01605, USA, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01605, USA, Department of Genetics, Washington University School of Medicine, St Louis, MO 63108, USA, Department of Biochemistry and Biology and Biotechnology, Worcester Polytechnic Institute, Worcester, MA 01609, USA, Molecular Pathology Unit, Center for Computational and Integrative Biology, and Center for Cancer Research, Massachusetts General Hospital, Charlestown, MA 02129, USA, Department of Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA and Department of Pathology, Harvard Medical School, Boston, MA 02115, USA
Mathew Goodwin Program in Gene Function and Expression, University of Massachusetts Medical School, Worcester, MA 01605, USA, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01605, USA, Department of Genetics, Washington University School of Medicine, St Louis, MO 63108, USA, Department of Biochemistry and Biology and Biotechnology, Worcester Polytechnic Institute, Worcester, MA 01609, USA, Molecular Pathology Unit, Center for Computational and Integrative Biology, and Center for Cancer Research, Massachusetts General Hospital, Charlestown, MA 02129, USA, Department of Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA and Department of Pathology, Harvard Medical School, Boston, MA 02115, USA
Ronak Y. Patel Program in Gene Function and Expression, University of Massachusetts Medical School, Worcester, MA 01605, USA, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01605, USA, Department of Genetics, Washington University School of Medicine, St Louis, MO 63108, USA, Department of Biochemistry and Biology and Biotechnology, Worcester Polytechnic Institute, Worcester, MA 01609, USA, Molecular Pathology Unit, Center for Computational and Integrative Biology, and Center for Cancer Research, Massachusetts General Hospital, Charlestown, MA 02129, USA, Department of Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA and Department of Pathology, Harvard Medical School, Boston, MA 02115, USA
Manishi Pandey Program in Gene Function and Expression, University of Massachusetts Medical School, Worcester, MA 01605, USA, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01605, USA, Department of Genetics, Washington University School of Medicine, St Louis, MO 63108, USA, Department of Biochemistry and Biology and Biotechnology, Worcester Polytechnic Institute, Worcester, MA 01609, USA, Molecular Pathology Unit, Center for Computational and Integrative Biology, and Center for Cancer Research, Massachusetts General Hospital, Charlestown, MA 02129, USA, Department of Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA and Department of Pathology, Harvard Medical School, Boston, MA 02115, USA
Metewo Selase Enuameh Program in Gene Function and Expression, University of Massachusetts Medical School, Worcester, MA 01605, USA, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01605, USA, Department of Genetics, Washington University School of Medicine, St Louis, MO 63108, USA, Department of Biochemistry and Biology and Biotechnology, Worcester Polytechnic Institute, Worcester, MA 01609, USA, Molecular Pathology Unit, Center for Computational and Integrative Biology, and Center for Cancer Research, Massachusetts General Hospital, Charlestown, MA 02129, USA, Department of Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA and Department of Pathology, Harvard Medical School, Boston, MA 02115, USA
Amy L. Rayla Program in Gene Function and Expression, University of Massachusetts Medical School, Worcester, MA 01605, USA, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01605, USA, Department of Genetics, Washington University School of Medicine, St Louis, MO 63108, USA, Department of Biochemistry and Biology and Biotechnology, Worcester Polytechnic Institute, Worcester, MA 01609, USA, Molecular Pathology Unit, Center for Computational and Integrative Biology, and Center for Cancer Research, Massachusetts General Hospital, Charlestown, MA 02129, USA, Department of Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA and Department of Pathology, Harvard Medical School, Boston, MA 02115, USA
Cong Zhu Program in Gene Function and Expression, University of Massachusetts Medical School, Worcester, MA 01605, USA, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01605, USA, Department of Genetics, Washington University School of Medicine, St Louis, MO 63108, USA, Department of Biochemistry and Biology and Biotechnology, Worcester Polytechnic Institute, Worcester, MA 01609, USA, Molecular Pathology Unit, Center for Computational and Integrative Biology, and Center for Cancer Research, Massachusetts General Hospital, Charlestown, MA 02129, USA, Department of Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA and Department of Pathology, Harvard Medical School, Boston, MA 02115, USA
Stacey Thibodeau-Beganny Program in Gene Function and Expression, University of Massachusetts Medical School, Worcester, MA 01605, USA, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01605, USA, Department of Genetics, Washington University School of Medicine, St Louis, MO 63108, USA, Department of Biochemistry and Biology and Biotechnology, Worcester Polytechnic Institute, Worcester, MA 01609, USA, Molecular Pathology Unit, Center for Computational and Integrative Biology, and Center for Cancer Research, Massachusetts General Hospital, Charlestown, MA 02129, USA, Department of Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA and Department of Pathology, Harvard Medical School, Boston, MA 02115, USA
Michael H. Brodsky Program in Gene Function and Expression, University of Massachusetts Medical School, Worcester, MA 01605, USA, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01605, USA, Department of Genetics, Washington University School of Medicine, St Louis, MO 63108, USA, Department of Biochemistry and Biology and Biotechnology, Worcester Polytechnic Institute, Worcester, MA 01609, USA, Molecular Pathology Unit, Center for Computational and Integrative Biology, and Center for Cancer Research, Massachusetts General Hospital, Charlestown, MA 02129, USA, Department of Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA and Department of Pathology, Harvard Medical School, Boston, MA 02115, USA
J. Keith Joung Program in Gene Function and Expression, University of Massachusetts Medical School, Worcester, MA 01605, USA, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01605, USA, Department of Genetics, Washington University School of Medicine, St Louis, MO 63108, USA, Department of Biochemistry and Biology and Biotechnology, Worcester Polytechnic Institute, Worcester, MA 01609, USA, Molecular Pathology Unit, Center for Computational and Integrative Biology, and Center for Cancer Research, Massachusetts General Hospital, Charlestown, MA 02129, USA, Department of Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA and Department of Pathology, Harvard Medical School, Boston, MA 02115, USA
Scot A. Wolfe Program in Gene Function and Expression, University of Massachusetts Medical School, Worcester, MA 01605, USA, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01605, USA, Department of Genetics, Washington University School of Medicine, St Louis, MO 63108, USA, Department of Biochemistry and Biology and Biotechnology, Worcester Polytechnic Institute, Worcester, MA 01609, USA, Molecular Pathology Unit, Center for Computational and Integrative Biology, and Center for Cancer Research, Massachusetts General Hospital, Charlestown, MA 02129, USA, Department of Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA and Department of Pathology, Harvard Medical School, Boston, MA 02115, USA
Gary D. Stormo Program in Gene Function and Expression, University of Massachusetts Medical School, Worcester, MA 01605, USA, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01605, USA, Department of Genetics, Washington University School of Medicine, St Louis, MO 63108, USA, Department of Biochemistry and Biology and Biotechnology, Worcester Polytechnic Institute, Worcester, MA 01609, USA, Molecular Pathology Unit, Center for Computational and Integrative Biology, and Center for Cancer Research, Massachusetts General Hospital, Charlestown, MA 02129, USA, Department of Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA and Department of Pathology, Harvard Medical School, Boston, MA 02115, USA

Collapse

Ezer D, Zabet NR, Adryan B. Physical constraints determine the logic of bacterial promoter architectures. Nucleic Acids Res 2014;42:4196-207. [PMID: 24476912 PMCID: PMC3985651 DOI: 10.1093/nar/gku078] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022] Open

Duque T, Samee MAH, Kazemian M, Pham HN, Brodsky MH, Sinha S. Simulations of enhancer evolution provide mechanistic insights into gene regulation. Mol Biol Evol 2013;31:184-200. [PMID: 24097306 PMCID: PMC3879441 DOI: 10.1093/molbev/mst170] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/14/2023] Open

Abstract

There is growing interest in models of regulatory sequence evolution. However, existing models specifically designed for regulatory sequences consider the independent evolution of individual transcription factor (TF)-binding sites, ignoring that the function and evolution of a binding site depends on its context, typically the cis-regulatory module (CRM) in which the site is located. Moreover, existing models do not account for the gene-specific roles of TF-binding sites, primarily because their roles often are not well understood. We introduce two models of regulatory sequence evolution that address some of the shortcomings of existing models and implement simulation frameworks based on them. One model simulates the evolution of an individual binding site in the context of a CRM, while the other evolves an entire CRM. Both models use a state-of-the art sequence-to-expression model to predict the effects of mutations on the regulatory output of the CRM and determine the strength of selection. We use the new framework to simulate the evolution of TF-binding sites in 37 well-studied CRMs belonging to the anterior-posterior patterning system in Drosophila embryos. We show that these simulations provide accurate fits to evolutionary data from 12 Drosophila genomes, which includes statistics of binding site conservation on relatively short evolutionary scales and site loss across larger divergence times. The new framework allows us, for the first time, to test hypotheses regarding the underlying cis-regulatory code by directly comparing the evolutionary implications of the hypothesis with the observed evolutionary dynamics of binding sites. Using this capability, we find that explicitly modeling self-cooperative DNA binding by the TF Caudal (CAD) provides significantly better fits than an otherwise identical evolutionary simulation that lacks this mechanistic aspect. This hypothesis is further supported by a statistical analysis of the distribution of intersite spacing between adjacent CAD sites. Experimental tests confirm direct homodimeric interaction between CAD molecules as well as self-cooperative DNA binding by CAD. We note that computational modeling of the D. melanogaster CRMs alone did not yield significant evidence to support CAD self-cooperativity. We thus demonstrate how specific mechanistic details encoded in CRMs can be revealed by modeling their evolution and fitting such models to multispecies data.

Collapse