Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Stormo GD, Fields DS. Specificity, free energy and information content in protein-DNA interactions. Trends Biochem Sci 1998;23:109-13. [PMID: 9581503 DOI: 10.1016/s0968-0004(98)01187-6] [Citation(s) in RCA: 237] [Impact Index Per Article: 9.1] [Indexed: 02/07/2023]



Cited by Other Article(s)

Andreani V, South EJ, Dunlop MJ. Generating information-dense promoter sequences with optimal string packing. PLoS Comput Biol 2024;20:e1012276. [PMID: 39047028 PMCID: PMC11268586 DOI: 10.1371/journal.pcbi.1012276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2024] [Accepted: 06/25/2024] [Indexed: 07/27/2024]

Abstract

Dense arrangements of binding sites within nucleotide sequences can collectively influence downstream transcription rates or initiate biomolecular interactions. For example, natural promoter regions can harbor many overlapping transcription factor binding sites that influence the rate of transcription initiation. Despite the prevalence of overlapping binding sites in nature, rapid design of nucleotide sequences with many overlapping sites remains a challenge. Here, we show that this is an NP-hard problem, coined here as the nucleotide String Packing Problem (SPP). We then introduce a computational technique that efficiently assembles sets of DNA-protein binding sites into dense, contiguous stretches of double-stranded DNA. For the efficient design of nucleotide sequences spanning hundreds of base pairs, we reduce the SPP to an Orienteering Problem with integer distances, and then leverage modern integer linear programming solvers. Our method optimally packs sets of 20-100 binding sites into dense nucleotide arrays of 50-300 base pairs in 0.05-10 seconds. Unlike approximation algorithms or meta-heuristics, our approach finds provably optimal solutions. We demonstrate how our method can generate large sets of diverse sequences suitable for library generation, where the frequency of binding site usage across the returned sequences can be controlled by modulating the objective function. As an example, we then show how adding additional constraints, like the inclusion of sequence elements with fixed positions, allows for the design of bacterial promoters. The nucleotide string packing approach we present can accelerate the design of sequences with complex DNA-protein interactions. When used in combination with synthesis and high-throughput screening, this design strategy could help interrogate how complex binding site arrangements impact either gene expression or biomolecular mechanisms in varied cellular contexts.
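The idea of packing overlapping sites into a short sequence can be illustrated with a toy sketch. This is not the article's method: the abstract describes a reduction to an Orienteering Problem solved by integer linear programming, whereas the code below tries every ordering of a handful of sites and merges shared bases. It also ignores the reverse strand. The function names and the example site strings are assumptions chosen for illustration.

```python
from itertools import permutations


def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge_in_order(sites):
    """Concatenate sites in the given order, sharing overlapping bases."""
    seq = sites[0]
    for s in sites[1:]:
        seq += s[overlap(seq, s):]
    return seq


def shortest_packing(sites):
    """Try every ordering and keep the shortest merged sequence.

    Exhaustive search grows factorially, so this is only usable
    for a few sites; it is a toy, not a scalable solver.
    """
    return min((merge_in_order(list(p)) for p in permutations(sites)), key=len)


if __name__ == "__main__":
    # Hypothetical 6-bp sites used only as example strings.
    example_sites = ["TTGACA", "ACAGGT", "GGTATA"]
    packed = shortest_packing(example_sites)
    print(packed, len(packed))
```

Exhaustive search is chosen here only because it is easy to verify by hand; the factorial growth in orderings is one reason an exact solver formulation, as the abstract describes, is needed for realistic site counts.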


Tabe-Bordbar S, Song YJ, Lunt BJ, Alavi Z, Prasanth KV, Sinha S. Mechanistic analysis of enhancer sequences in the estrogen receptor transcriptional program. Commun Biol 2024;7:719. [PMID: 38862711 PMCID: PMC11167054 DOI: 10.1038/s42003-024-06400-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2022] [Accepted: 05/30/2024] [Indexed: 06/13/2024]

Lipps G. Definition of the binding specificity of the T7 bacteriophage primase by analysis of a protein binding microarray using a thermodynamic model. Nucleic Acids Res 2024;52:4818-4829. [PMID: 38597656 PMCID: PMC11109968 DOI: 10.1093/nar/gkae215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 01/26/2024] [Accepted: 03/13/2024] [Indexed: 04/11/2024]

Ishigami Y, Wong MS, Martí-Gómez C, Ayaz A, Kooshkbaghi M, Hanson SM, McCandlish DM, Krainer AR, Kinney JB. Specificity, synergy, and mechanisms of splice-modifying drugs. Nat Commun 2024;15:1880. [PMID: 38424098 PMCID: PMC10904865 DOI: 10.1038/s41467-024-46090-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Accepted: 02/10/2024] [Indexed: 03/02/2024]

Liu S, Gomez-Alcala P, Leemans C, Glassford WJ, Mann RS, Bussemaker HJ. Predicting the DNA binding specificity of mutated transcription factors using family-level biophysically interpretable machine learning. bioRxiv 2024:2024.01.24.577115. [PMID: 38352411 PMCID: PMC10862739 DOI: 10.1101/2024.01.24.577115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]

Recio PS, Mitra NJ, Shively CA, Song D, Jaramillo G, Lewis KS, Chen X, Mitra R. Zinc cluster transcription factors frequently activate target genes using a non-canonical half-site binding mode. Nucleic Acids Res 2023;51:5006-5021. [PMID: 37125648 PMCID: PMC10250231 DOI: 10.1093/nar/gkad320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Revised: 04/11/2023] [Accepted: 04/14/2023] [Indexed: 05/02/2023]

Affiliation(s)

Pamela S Recio, Nikhil J Mitra, Christian A Shively, David Song, Grace Jaramillo, Kristine Shady Lewis, Xuhua Chen, Robi D Mitra: Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, MO 63108, USA; The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, MO 63108, USA
Robi D Mitra (additional affiliation): McDonnell Genome Institute, Washington University School of Medicine in St. Louis, St. Louis, MO 63108, USA


Alexandari AM, Horton CA, Shrikumar A, Shah N, Li E, Weilert M, Pufall MA, Zeitlinger J, Fordyce PM, Kundaje A. De novo distillation of thermodynamic affinity from deep learning regulatory sequence models of in vivo protein-DNA binding. bioRxiv 2023:2023.05.11.540401. [PMID: 37214836 PMCID: PMC10197627 DOI: 10.1101/2023.05.11.540401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]

Abstract

Transcription factors (TF) are proteins that bind DNA in a sequence-specific manner to regulate gene transcription. Despite their unique intrinsic sequence preferences, in vivo genomic occupancy profiles of TFs differ across cellular contexts. Hence, deciphering the sequence determinants of TF binding, both intrinsic and context-specific, is essential to understand gene regulation and the impact of regulatory, non-coding genetic variation. Biophysical models trained on in vitro TF binding assays can estimate intrinsic affinity landscapes and predict occupancy based on TF concentration and affinity. However, these models cannot adequately explain context-specific, in vivo binding profiles. Conversely, deep learning models, trained on in vivo TF binding assays, effectively predict and explain genomic occupancy profiles as a function of complex regulatory sequence syntax, albeit without a clear biophysical interpretation. To reconcile these complementary models of in vitro and in vivo TF binding, we developed Affinity Distillation (AD), a method that extracts thermodynamic affinities de-novo from deep learning models of TF chromatin immunoprecipitation (ChIP) experiments by marginalizing away the influence of genomic sequence context. Applied to neural networks modeling diverse classes of yeast and mammalian TFs, AD predicts energetic impacts of sequence variation within and surrounding motifs on TF binding as measured by diverse in vitro assays with superior dynamic range and accuracy compared to motif-based methods. Furthermore, AD can accurately discern affinities of TF paralogs. Our results highlight thermodynamic affinity as a key determinant of in vivo binding, suggest that deep learning models of in vivo binding implicitly learn high-resolution affinity landscapes, and show that these affinities can be successfully distilled using AD. This new biophysical interpretation of deep learning models enables high-throughput in silico experiments to explore the influence of sequence context and variation on both intrinsic affinity and in vivo occupancy.


Ni P, Wilson D, Su Z. A map of cis-regulatory modules and constituent transcription factor binding sites in 80% of the mouse genome. BMC Genomics 2022;23:714. [PMID: 36261804 PMCID: PMC9583556 DOI: 10.1186/s12864-022-08933-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2022] [Accepted: 10/11/2022] [Indexed: 11/10/2022]

Huang YA, Pan GQ, Wang J, Li JQ, Chen J, Wu YH. Heterogeneous graph embedding model for predicting interactions between TF and target gene. Bioinformatics 2022;38:2554-2560. [PMID: 35266510 DOI: 10.1093/bioinformatics/btac148] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/09/2021] [Revised: 02/13/2022] [Accepted: 03/09/2022] [Indexed: 11/15/2022]

Wang XF, Sun J, Wang XL, Tian JK, Tian ZW, Zhang JL, Jia R. MD investigation on the binding of microphthalmia-associated transcription factor with DNA. J Saudi Chem Soc 2022. [DOI: 10.1016/j.jscs.2022.101420] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/01/2023]

Sun H, Chen W, Chen L, Zheng W. Exploring the molecular basis of UG-rich RNA recognition by the human splicing factor TDP-43 using molecular dynamics simulation and free energy calculation. J Comput Chem 2021;42:1670-1680. [PMID: 34109652 DOI: 10.1002/jcc.26704] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 01/03/2021] [Revised: 04/15/2021] [Accepted: 05/23/2021] [Indexed: 11/12/2022]

Zhang L, Karimzadeh M, Welch M, McIntosh C, Wang B. Analytics methods and tools for integration of biomedical data in medicine. Artif Intell Med 2021. [DOI: 10.1016/b978-0-12-821259-2.00007-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]

Ireland WT, Beeler SM, Flores-Bautista E, McCarty NS, Röschinger T, Belliveau NM, Sweredoski MJ, Moradian A, Kinney JB, Phillips R. Deciphering the regulatory genome of Escherichia coli, one hundred promoters at a time. eLife 2020;9:e55308. [PMID: 32955440 PMCID: PMC7567609 DOI: 10.7554/elife.55308] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 01/20/2020] [Accepted: 09/18/2020] [Indexed: 01/28/2023]

Cencini M, Pigolotti S. Energetic funnel facilitates facilitated diffusion. Nucleic Acids Res 2019;46:558-567. [PMID: 29216364 PMCID: PMC5778461 DOI: 10.1093/nar/gkx1220] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 07/14/2017] [Accepted: 11/24/2017] [Indexed: 01/25/2023]

Kinney JB, McCandlish DM. Massively Parallel Assays and Quantitative Sequence-Function Relationships. Annu Rev Genomics Hum Genet 2019;20:99-127. [PMID: 31091417 DOI: 10.1146/annurev-genom-083118-014845] [Citation(s) in RCA: 76] [Impact Index Per Article: 15.2] [Indexed: 11/09/2022]

Djordjevic M, Rodic A, Graovac S. From biophysics to 'omics and systems biology. Eur Biophys J 2019;48:413-424. [PMID: 30972433 DOI: 10.1007/s00249-019-01366-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 10/12/2018] [Revised: 02/12/2019] [Accepted: 04/03/2019] [Indexed: 01/03/2023]

Li H, Quang D, Guan Y. Anchor: trans-cell type prediction of transcription factor binding sites. Genome Res 2019;29:281-292. [PMID: 30567711 PMCID: PMC6360811 DOI: 10.1101/gr.237156.118] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 03/15/2018] [Accepted: 12/13/2018] [Indexed: 12/16/2022]

Keilwagen J, Posch S, Grau J. Accurate prediction of cell type-specific transcription factor binding. Genome Biol 2019;20:9. [PMID: 30630522 PMCID: PMC6327544 DOI: 10.1186/s13059-018-1614-y] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Received: 07/25/2018] [Accepted: 12/18/2018] [Indexed: 01/11/2023]

Lee NK, Li X, Wang D. A comprehensive survey on genetic algorithms for DNA motif prediction. Inf Sci (N Y) 2018. [DOI: 10.1016/j.ins.2018.07.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 12/25/2022]

Zhang Q, Fan X, Wang Y, Sun MA, Shao J, Guo D. BPP: a sequence-based algorithm for branch point prediction. Bioinformatics 2018. [PMID: 28633445 DOI: 10.1093/bioinformatics/btx401] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Indexed: 01/05/2023]

Käppel S, Melzer R, Rümpler F, Gafert C, Theißen G. The floral homeotic protein SEPALLATA3 recognizes target DNA sequences by shape readout involving a conserved arginine residue in the MADS-domain. Plant J 2018;95:341-357. [PMID: 29744943 DOI: 10.1111/tpj.13954] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 02/05/2018] [Revised: 04/17/2018] [Accepted: 04/23/2018] [Indexed: 05/05/2023]

Comprehensive, high-resolution binding energy landscapes reveal context dependencies of transcription factor binding. Proc Natl Acad Sci U S A 2018;115:E3702-E3711. [PMID: 29588420 PMCID: PMC5910820 DOI: 10.1073/pnas.1715888115] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Indexed: 11/18/2022]

Wei X, Zhang J. Why Phenotype Robustness Promotes Phenotype Evolvability. Genome Biol Evol 2017;9:3509-3515. [PMID: 29228219 PMCID: PMC5751051 DOI: 10.1093/gbe/evx264] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Accepted: 12/07/2017] [Indexed: 12/14/2022]

Djordjevic M, Djordjevic M, Zdobnov E. Scoring Targets of Transcription in Bacteria Rather than Focusing on Individual Binding Sites. Front Microbiol 2017;8:2314. [PMID: 29213263 PMCID: PMC5702782 DOI: 10.3389/fmicb.2017.02314] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/31/2017] [Accepted: 11/09/2017] [Indexed: 11/13/2022]

Yesudhas D, Anwar MA, Panneerselvam S, Kim HK, Choi S. Evaluation of Sox2 binding affinities for distinct DNA patterns using steered molecular dynamics simulation. FEBS Open Bio 2017;7:1750-1767. [PMID: 29123983 PMCID: PMC5666385 DOI: 10.1002/2211-5463.12316] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 07/12/2017] [Revised: 08/14/2017] [Accepted: 09/05/2017] [Indexed: 11/29/2022]

Gursky VV, Kozlov KN, Kulakovskiy IV, Zubair A, Marjoram P, Lawrie DS, Nuzhdin SV, Samsonova MG. Translating natural genetic variation to gene expression in a computational model of the Drosophila gap gene regulatory network. PLoS One 2017;12:e0184657. [PMID: 28898266 PMCID: PMC5595321 DOI: 10.1371/journal.pone.0184657] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 07/05/2017] [Accepted: 08/28/2017] [Indexed: 11/18/2022]

Abstract

Annotating the genotype-phenotype relationship, and developing a proper quantitative description of the relationship, requires understanding the impact of natural genomic variation on gene expression. We apply a sequence-level model of gap gene expression in the early development of Drosophila to analyze single nucleotide polymorphisms (SNPs) in a panel of natural sequenced D. melanogaster lines. Using a thermodynamic modeling framework, we provide both analytical and computational descriptions of how single-nucleotide variants affect gene expression. The analysis reveals that the sequence variants increase (decrease) gene expression if located within binding sites of repressors (activators). We show that the sign of SNP influence (activation or repression) may change in time and space and elucidate the origin of this change in specific examples. The thermodynamic modeling approach predicts non-local and non-linear effects arising from SNPs, and combinations of SNPs, in individual fly genotypes. Simulation of individual fly genotypes using our model reveals that this non-linearity reduces to almost additive inputs from multiple SNPs. Further, we see signatures of the action of purifying selection in the gap gene regulatory regions. To infer the specific targets of purifying selection, we analyze the patterns of polymorphism in the data at two phenotypic levels: the strengths of binding and expression. We find that combinations of SNPs show evidence of being under selective pressure, while individual SNPs do not. The model predicts that SNPs appear to accumulate in the genotypes of the natural population in a way biased towards small increases in activating action on the expression pattern. Taken together, these results provide a systems-level view of how genetic variation translates to the level of gene regulatory networks via combinatorial SNP effects.


Inherent limitations of probabilistic models for protein-DNA binding specificity. PLoS Comput Biol 2017;13:e1005638. [PMID: 28686588 PMCID: PMC5521849 DOI: 10.1371/journal.pcbi.1005638] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 04/12/2017] [Revised: 07/21/2017] [Accepted: 06/21/2017] [Indexed: 01/10/2023]

Abstract

The specificities of transcription factors are most commonly represented with probabilistic models. These models provide a probability for each base occurring at each position within the binding site and the positions are assumed to contribute independently. The model is simple and intuitive and is the basis for many motif discovery algorithms. However, the model also has inherent limitations that prevent it from accurately representing true binding probabilities, especially for the highest affinity sites under conditions of high protein concentration. The limitations are not due to the assumption of independence between positions but rather are caused by the non-linear relationship between binding affinity and binding probability and the fact that independent normalization at each position skews the site probabilities. Generally probabilistic models are reasonably good approximations, but new high-throughput methods allow for biophysical models with increased accuracy that should be used whenever possible.

Transcription factors (TFs), a class of DNA-binding proteins, play a central role in the regulation of gene expression. TFs control the rate of transcription by binding to the genome in a sequence-specific manner. Thus, one important aspect in the study of gene regulation mechanism is to model the binding specificities of TFs, namely the features of the DNA sequences that a TF prefers to bind. Multiple models have been proposed to characterize the binding specificities of TFs, among which the class of probabilistic models is the most popular. In this study, we point out several major limitations of the well-established probabilistic model by comparing it with the biophysical model. Through simulations we demonstrate that the probabilistic model is only an approximation of the biophysical model. The latter has most of the advantages of the former, and is a more accurate representation of binding specificities. We propose a shift from the probabilistic model to the biophysical model in future studies of protein-DNA interactions.
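The non-linear relationship between binding affinity and binding probability mentioned in the abstract can be illustrated with a minimal sketch. It assumes a standard two-state occupancy form, P = 1 / (1 + exp(E - mu)), with energies in units of kT, and compares it with a product of independently normalized per-position Boltzmann weights. The energy matrix, the function names and the chemical potential values are hypothetical and are not taken from the article, and the sketch does not reproduce the article's simulations.

```python
import math

# Hypothetical per-position energies (units of kT) for a 3-bp site.
# An energy of 0 marks the preferred base at each position.
ENERGY = [
    {"A": 0.0, "C": 2.0, "G": 2.0, "T": 2.0},
    {"A": 2.0, "C": 0.0, "G": 2.0, "T": 2.0},
    {"A": 2.0, "C": 2.0, "G": 0.0, "T": 2.0},
]


def site_energy(site):
    """Additive binding energy of a site under the toy matrix."""
    return sum(ENERGY[i][base] for i, base in enumerate(site))


def occupancy(site, mu):
    """Biophysical model: two-state binding probability at chemical potential mu."""
    return 1.0 / (1.0 + math.exp(site_energy(site) - mu))


def probabilistic_score(site):
    """Probabilistic model: product of per-position normalized Boltzmann weights."""
    score = 1.0
    for i, base in enumerate(site):
        z = sum(math.exp(-e) for e in ENERGY[i].values())
        score *= math.exp(-ENERGY[i][base]) / z
    return score


def ratio(model, site_a, site_b):
    """How strongly a model prefers site_a over site_b."""
    return model(site_a) / model(site_b)


if __name__ == "__main__":
    best, mismatch = "ACG", "CCG"
    print("probabilistic:", ratio(probabilistic_score, best, mismatch))
    for mu in (-10.0, 0.0, 5.0):  # higher mu stands in for higher protein concentration
        print("occupancy, mu =", mu, ":", ratio(lambda s: occupancy(s, mu), best, mismatch))
```

In this toy setting the probabilistic score gives the same preference ratio regardless of concentration, while the occupancy ratio shrinks toward 1 as mu rises and both sites approach saturation, which is the regime the abstract singles out.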


López Y, Vandenbon A, Nose A, Nakai K. Modeling the cis-regulatory modules of genes expressed in developmental stages of Drosophila melanogaster. PeerJ 2017;5:e3389. [PMID: 28584716 PMCID: PMC5452948 DOI: 10.7717/peerj.3389] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/20/2016] [Accepted: 05/08/2017] [Indexed: 12/30/2022]

Li L, Wunderlich Z. An Enhancer's Length and Composition Are Shaped by Its Regulatory Task. Front Genet 2017;8:63. [PMID: 28588608 PMCID: PMC5440464 DOI: 10.3389/fgene.2017.00063] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 03/17/2017] [Accepted: 05/08/2017] [Indexed: 12/02/2022]

Guo C, McDowell IC, Nodzenski M, Scholtens DM, Allen AS, Lowe WL, Reddy TE. Transversions have larger regulatory effects than transitions. BMC Genomics 2017;18:394. [PMID: 28525990 PMCID: PMC5438547 DOI: 10.1186/s12864-017-3785-4] [Citation(s) in RCA: 58] [Impact Index Per Article: 8.3] [Received: 08/26/2016] [Accepted: 05/10/2017] [Indexed: 12/30/2022]

Goldshtein M, Lukatsky DB. Specificity-Determining DNA Triplet Code for Positioning of Human Preinitiation Complex. Biophys J 2017;112:2047-2050. [PMID: 28479135 DOI: 10.1016/j.bpj.2017.04.023] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 09/22/2016] [Revised: 03/30/2017] [Accepted: 04/14/2017] [Indexed: 01/23/2023]

Orenstein Y, Shamir R. Modeling protein-DNA binding via high-throughput in vitro technologies. Brief Funct Genomics 2017;16:171-180. [PMID: 27497616 PMCID: PMC5439287 DOI: 10.1093/bfgp/elw030] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 01/10/2023]

Guzina J, Djordjevic M. Mix-and-matching as a promoter recognition mechanism by ECF σ factors. BMC Evol Biol 2017;17:12. [PMID: 28251873 PMCID: PMC5333181 DOI: 10.1186/s12862-016-0865-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 11/10/2022]

Abstract

Background

In bacteria, transcription initiation is directed by different σ factors, most of which fall within the σ⁷⁰ family. This family is diverse, ranging from the housekeeping Group I (RpoDs) to Group IV (ECF) σ factors, which transcribe smaller regulons under more stringent conditions. RpoDs employ a kinetic mix-and-match mechanism, where promoter elements complement each other's binding strengths in achieving sufficient transcription activity. On the other hand, it is assumed that ECF σs, which are the most distant from the housekeeping σ factors, cannot exhibit mix-and-matching. However, mix-and-matching for ECF σ factors has not been quantitatively checked before, and recent results show a much larger flexibility in promoter recognition by the members of this group.

Results

To this end, we quantitatively investigate mix-and-matching in two canonical ECF σ family members (σ^E and σ^W), for which we use a biophysics-based model of transcription initiation. For σ^E, we perform a separate analysis for in-vitro active and in-vitro inactive promoters, which allows us to investigate how mix-and-matching depends on the external factors that may control transcription activity in the in-vitro inactive set. We show that the promoter elements of canonical ECF σs significantly complement each other's strengths, and that in the in-vitro active set such mix-and-matching is even stronger than the correlations observed for the housekeeping σs. This complementation, however, decreases significantly for the in-vitro inactive set, which we propose is due to mix-and-matching with regulatory sequences outside of the canonical promoter elements. In line with this proposition, we show that a conserved spacer element, which appears in the in-vitro inactive promoter set, significantly increases the promoter element complementation. While RpoD promoter elements mix-and-match to achieve sufficient total transcription activity, for σ^E they complement each other to achieve sufficiently strong total binding affinity, which we relate to differences in physiological responses between the two groups of σ factors.

Conclusion

Despite a common notion that smaller σ factor specificity leads to a larger mix-and-matching, we here obtain a larger promoter element complementation for σ^E compared to RpoDs. Finally, to explain this finding, we propose a simple model which relates the size of σ factor regulon with the extent of mix-and-matching, based on an assumption of a selection pressure on promoters that are near the non-specific binding boundary to remain functional.

Electronic supplementary material

The online version of this article (doi:10.1186/s12862-016-0865-z) contains supplementary material, which is available to authorized users.


Austin RS, Hiu S, Waese J, Ierullo M, Pasha A, Wang TT, Fan J, Foong C, Breit R, Desveaux D, Moses A, Provart NJ. New BAR tools for mining expression data and exploring Cis-elements in Arabidopsis thaliana. Plant J 2016;88:490-504. [PMID: 27401965 DOI: 10.1111/tpj.13261] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 04/14/2016] [Revised: 06/23/2016] [Accepted: 07/01/2016] [Indexed: 05/21/2023]

Affiliation(s)

Ryan S Austin, Shu Hiu, Jamie Waese, Matthew Ierullo, Asher Pasha, Ting Ting Wang, Jim Fan, Curtis Foong, Robert Breit, Darrell Desveaux, Alan Moses, Nicholas J Provart: Department of Cell & Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, ON, M5S 3B2, Canada

Chen D, Orenstein Y, Golodnitsky R, Pellach M, Avrahami D, Wachtel C, Ovadia-Shochat A, Shir-Shapira H, Kedmi A, Juven-Gershon T, Shamir R, Gerber D. SELMAP - SELEX affinity landscape MAPping of transcription factor binding sites using integrated microfluidics. Sci Rep 2016;6:33351. [PMID: 27628341 PMCID: PMC5024299 DOI: 10.1038/srep33351] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2015] [Accepted: 08/19/2016] [Indexed: 01/19/2023] Open

Westermark PO. Linking Core Promoter Classes to Circadian Transcription. PLoS Genet 2016;12:e1006231. [PMID: 27504829 PMCID: PMC4978467 DOI: 10.1371/journal.pgen.1006231] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2016] [Accepted: 07/08/2016] [Indexed: 01/09/2023] Open

Promoter Recognition by Extracytoplasmic Function σ Factors: Analyzing DNA and Protein Interaction Motifs. J Bacteriol 2016;198:1927-1938. [PMID: 27137497 DOI: 10.1128/jb.00244-16] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2016] [Accepted: 04/25/2016] [Indexed: 01/25/2023] Open

Abstract

Extracytoplasmic function (ECF) σ factors are the largest and the most diverse group of alternative σ factors, but their mechanisms of transcription are poorly studied. This subfamily is considered to exhibit a rigid promoter structure and an absence of mixing and matching; both -35 and -10 elements are considered necessary for initiating transcription. This paradigm, however, is based on very limited data, which bias the analysis of diverse ECF σ subgroups. Here we investigate DNA and protein recognition motifs involved in ECF σ factor transcription by a computational analysis of canonical ECF subfamily members, much less studied ECF σ subgroups, and the group outliers, obtained from recently sequenced bacteriophages. The analysis identifies an extended -10 element in promoters for phage ECF σ factors; a comparison with bacterial σ factors points to a putative 6-amino-acid motif just C-terminal of domain σ2, which is responsible for the interaction with the identified extension of the -10 element. Interestingly, a similar protein motif is found C-terminal of domain σ2 in canonical ECF σ factors, at a position where it is expected to interact with a conserved motif further upstream of the -10 element. Moreover, the phiEco32 ECF σ factor lacks a recognizable -35 element and σ4 domain, which we identify in a homologous phage, 7-11, indicating that the extended -10 element can compensate for the lack of -35 element interactions. Overall, the results reveal greater flexibility in promoter recognition by ECF σ factors than previously recognized and raise the possibility that mixing and matching also apply to this group, a notion that remains to be biochemically tested.

IMPORTANCE

ECF σ factors are the most numerous group of alternative σ factors but have been little studied. Their promoter recognition mechanisms are obscured by the large diversity within the ECF σ factor group and the limited similarity with the well-studied housekeeping σ factors. Here we extensively compare bacterial and bacteriophage ECF σ factors and their promoters in order to infer DNA and protein recognition motifs involved in transcription initiation. We predict a more flexible promoter structure than is recognized by the current paradigm, which assumes rigidness, and propose that ECF σ promoter elements may complement (mix and match with) each other's strengths. These results warrant the refocusing of research efforts from the well-studied housekeeping σ factors toward the physiologically highly important, but insufficiently understood, alternative σ factors.

Peng PC, Hassan Samee MA, Sinha S. Incorporating chromatin accessibility data into sequence-to-expression modeling. Biophys J 2016;108:1257-67. [PMID: 25762337 DOI: 10.1016/j.bpj.2014.12.037] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2014] [Revised: 12/01/2014] [Accepted: 12/11/2014] [Indexed: 01/30/2023] Open

Tapan S, Wang D. A Further Study on Mining DNA Motifs Using Fuzzy Self-Organizing Maps. IEEE Trans Neural Netw Learn Syst 2016;27:113-124. [PMID: 26068877 DOI: 10.1109/tnnls.2015.2435155] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]

Riley TR, Lazarovici A, Mann RS, Bussemaker HJ. Building accurate sequence-to-affinity models from high-throughput in vitro protein-DNA binding data using FeatureREDUCE. eLife 2015;4:e06397. [PMID: 26701911 PMCID: PMC4758951 DOI: 10.7554/elife.06397] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2015] [Accepted: 12/20/2015] [Indexed: 01/26/2023] Open

Pulkkinen O, Metzler R. Variance-corrected Michaelis-Menten equation predicts transient rates of single-enzyme reactions and response times in bacterial gene-regulation. Sci Rep 2015;5:17820. [PMID: 26635080 PMCID: PMC4669464 DOI: 10.1038/srep17820] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2015] [Accepted: 11/06/2015] [Indexed: 01/07/2023] Open

Tuğrul M, Paixão T, Barton NH, Tkačik G. Dynamics of Transcription Factor Binding Site Evolution. PLoS Genet 2015;11:e1005639. [PMID: 26545200 PMCID: PMC4636380 DOI: 10.1371/journal.pgen.1005639] [Citation(s) in RCA: 68] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2015] [Accepted: 10/09/2015] [Indexed: 11/19/2022] Open

Abstract

Evolution of gene regulation is crucial for our understanding of the phenotypic differences between species, populations and individuals. Sequence-specific binding of transcription factors to the regulatory regions on the DNA is a key regulatory mechanism that determines gene expression and hence heritable phenotypic variation. We use a biophysical model for directional selection on gene expression to estimate the rates of gain and loss of transcription factor binding sites (TFBS) in finite populations under both point and insertion/deletion mutations. Our results show that these rates are typically slow for a single TFBS in an isolated DNA region, unless the selection is extremely strong. These rates decrease drastically with increasing TFBS length or increasingly specific protein-DNA interactions, making the evolution of sites longer than ∼ 10 bp unlikely on typical eukaryotic speciation timescales. Similarly, evolution converges to the stationary distribution of binding sequences very slowly, making the equilibrium assumption questionable. The availability of longer regulatory sequences in which multiple binding sites can evolve simultaneously, the presence of “pre-sites” or partially decayed old sites in the initial sequence, and biophysical cooperativity between transcription factors, can all facilitate gain of TFBS and reconcile theoretical calculations with timescales inferred from comparative genomics.

Evolution has produced a remarkable diversity of living forms that manifests in qualitative differences as well as quantitative traits. An essential factor that underlies this variability is transcription factor binding sites, short pieces of DNA that control gene expression levels. Nevertheless, we lack a thorough theoretical understanding of the evolutionary times required for the appearance and disappearance of these sites. By combining a biophysically realistic model for how cells read out information in transcription factor binding sites with a model for DNA sequence evolution, we explore these timescales and ask what factors crucially affect them. We find that the emergence of binding sites from a random sequence is generically slow under point and insertion/deletion mutational mechanisms. Strong selection, sufficient genomic sequence in which the sites can evolve, the existence of partially decayed old binding sites in the sequence, as well as certain biophysical mechanisms such as cooperativity, can accelerate the binding site gain times and make them consistent with the timescales suggested by comparative analyses of genomic data.

Chen L, Zheng QC, Zhang HX. Insights into the effects of mutations on Cren7-DNA binding using molecular dynamics simulations and free energy calculations. Phys Chem Chem Phys 2015;17:5704-11. [PMID: 25622968 DOI: 10.1039/c4cp05413j] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/03/2023]

Simple Biophysical Model Predicts Faster Accumulation of Hybrid Incompatibilities in Small Populations Under Stabilizing Selection. Genetics 2015;201:1525-37. [PMID: 26434721 PMCID: PMC4676520 DOI: 10.1534/genetics.115.181685] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2015] [Accepted: 09/23/2015] [Indexed: 01/07/2023] Open

Clifford J, Adami C. Discovery and information-theoretic characterization of transcription factor binding sites that act cooperatively. Phys Biol 2015;12:056004. [PMID: 26331781 DOI: 10.1088/1478-3975/12/5/056004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]

A Biophysical Approach to Predicting Protein-DNA Binding Energetics. Genetics 2015;200:1349-61. [PMID: 26081193 DOI: 10.1534/genetics.115.178384] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2014] [Accepted: 06/10/2015] [Indexed: 11/18/2022] Open

An adiabatic quantum algorithm and its application to DNA motif model discovery. Inf Sci (N Y) 2015. [DOI: 10.1016/j.ins.2014.10.057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]

Guzina J, Djordjevic M. Inferring bacteriophage infection strategies from genome sequence: analysis of bacteriophage 7-11 and related phages. BMC Evol Biol 2015;15 Suppl 1:S1. [PMID: 25708710 PMCID: PMC4331800 DOI: 10.1186/1471-2148-15-s1-s1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2023] Open

Abstract

Background

Analyzing the regulation of bacteriophage gene expression historically led to the establishment of major paradigms of molecular biology, and may provide important medical applications in the future. Temporal regulation of bacteriophage transcription is commonly analyzed through a labor-intensive combination of biochemical and bioinformatic approaches and macroarray measurements. Here we investigate to what extent one can understand the gene expression strategies of lytic phages by directly analyzing their genomes through bioinformatic methods. We address this question for the recently sequenced lytic bacteriophage 7-11, which infects the bacterium Salmonella enterica.

Results

We identify novel promoters for the bacteriophage-encoded σ factor, and test the predictions through homology with another bacteriophage (phiEco32) that has been experimentally characterized in detail. Interestingly, the standard approach based on multiple local sequence alignment (MLSA) fails to correctly identify the promoters, but a simpler procedure based on pairwise alignment of intergenic regions identifies the desired motifs. We argue that such a search strategy is more effective for promoters of bacteriophage-encoded σ factors, which are typically well conserved but appear in low copy numbers; we also verify this on two additional bacteriophage genomes. Identifying promoters for the bacteriophage-encoded σ factor, together with a more straightforward identification of promoters for the bacterial-encoded σ factor, allows clustering the genes into putative early, middle and late classes, and consequently predicting the temporal regulation of bacteriophage gene expression, which we demonstrate on phage 7-11.

Conclusions

While MLSA algorithms have proved highly useful in the computational analysis of transcription regulation, we establish here that a simpler procedure is more successful for identifying promoters that are recognized by a bacteriophage-encoded σ factor/RNA polymerase. We used this approach to predict the sequence specificity of a novel (bacteriophage-encoded) σ factor, and consequently to infer the phage 7-11 transcription strategy. Therefore, direct analysis of bacteriophage genome sequences is a plausible first-line approach for efficiently inferring phage transcription strategies, and may provide a wealth of information on transcription initiation by diverse σ factors/RNA polymerases.

Maaskola J, Rajewsky N. Binding site discovery from nucleic acid sequences by discriminative learning of hidden Markov models. Nucleic Acids Res 2014;42:12995-3011. [PMID: 25389269 PMCID: PMC4245949 DOI: 10.1093/nar/gku1083] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

High-resolution specificity from DNA sequencing highlights alternative modes of Lac repressor binding. Genetics 2014;198:1329-43. [PMID: 25209146 DOI: 10.1534/genetics.114.170100] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/29/2022] Open