1
Anashkina AA. Protein-DNA recognition mechanisms and specificity. Biophys Rev 2023; 15:1007-1014. [PMID: 37974977 PMCID: PMC10643805 DOI: 10.1007/s12551-023-01137-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/26/2023] [Accepted: 08/31/2023] [Indexed: 11/19/2023]
Abstract
The accumulated knowledge about the structure of protein-DNA complexes allowed us to understand the mechanisms of protein-DNA recognition and searching for a specific site on DNA. Obviously, the mechanism of specific DNA recognition by a protein must satisfy two requirements. First, the probability of incorrect binding should be very small. Second, the time to find the "correct" binding site should not be too long. If we assume that protein recognition of a precise site on DNA occurs at some distance from DNA and calculate global minima, we can avoid local minima at short distances. The only long-range interaction is the interaction of charges. The location of charges on DNA in three-dimensional space depends on the local conformation of DNA and thus reflects the DNA sequence and sets the spatial pattern for recognition. Various factors such as counter ion concentration, ionic strength, and pH can affect protein recognition of DNA. Nowadays, the theory of long-range interactions makes it possible to calculate the best mutual spatial arrangement of protein and DNA molecules by charged groups and avoid misplaced binding.
Affiliation(s)
- Anastasia A. Anashkina
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
2
Trerotola M, Antolini L, Beni L, Guerra E, Spadaccini M, Verzulli D, Moschella A, Alberti S. A deterministic code for transcription factor-DNA recognition through computation of binding interfaces. NAR Genom Bioinform 2022; 4:lqac008. [PMID: 35261972 PMCID: PMC8896162 DOI: 10.1093/nargab/lqac008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2021] [Revised: 12/05/2021] [Accepted: 02/28/2022] [Indexed: 11/14/2022]
Abstract
The recognition code between transcription factor (TF) amino acids and DNA bases remains poorly understood. Here, the determinants of TF amino acid-DNA base binding selectivity were identified through the analysis of crystals of TF-DNA complexes. Selective, high-frequency interactions were identified for the vast majority of amino acid side chains (‘structural code’). DNA binding specificities were then independently assessed by meta-analysis of random-mutagenesis studies of Zn finger-target DNA sequences. Selective, high-frequency interactions were identified for the majority of mutagenized residues (‘mutagenesis code’). The structural code and the mutagenesis code were shown to match to a striking level of accuracy (P = 3.1 × 10⁻³³), suggesting the identification of fundamental rules of TF binding to DNA bases. Additional insight was gained by showing a geometry-dictated choice among DNA-binding TF residues with overlapping specificity. These findings indicate the existence of a DNA recognition mode whereby the physical-chemical characteristics of the interacting residues play a deterministic role. The discovery of this DNA recognition code advances our knowledge on fundamental features of regulation of gene expression and is expected to pave the way for integration with higher-order complexity approaches.
Affiliation(s)
- Marco Trerotola
- Laboratory of Cancer Pathology, Center for Advanced Studies and Technology (CAST), University “G. D’ Annunzio”, Via L. Polacchi 11, 66100 Chieti, Italy
- Department of Medical, Oral and Biotechnological Sciences, University “G. d’Annunzio”, 66100 Chieti, Italy
- Laura Antolini
- Center for Biostatistics, Department of Clinical Medicine, Prevention and Biotechnology, University of Milano-Bicocca, 20052 Monza, Italy
- Laura Beni
- Laboratory of Cancer Pathology, Center for Advanced Studies and Technology (CAST), University “G. D’ Annunzio”, Via L. Polacchi 11, 66100 Chieti, Italy
- Emanuela Guerra
- Laboratory of Cancer Pathology, Center for Advanced Studies and Technology (CAST), University “G. D’ Annunzio”, Via L. Polacchi 11, 66100 Chieti, Italy
- Department of Medical, Oral and Biotechnological Sciences, University “G. d’Annunzio”, 66100 Chieti, Italy
- Damiano Verzulli
- Unit of Informatics, University “G. d’Annunzio”, 66100 Chieti, Italy
- Antonino Moschella
- Unit of Medical Genetics, Department of Biomedical Sciences - BIOMORF, University of Messina, via Consolare Valeria, 98125 Messina, Italy
- Saverio Alberti
- Laboratory of Cancer Pathology, Center for Advanced Studies and Technology (CAST), University “G. D’ Annunzio”, Via L. Polacchi 11, 66100 Chieti, Italy
- Unit of Medical Genetics, Department of Biomedical Sciences - BIOMORF, University of Messina, via Consolare Valeria, 98125 Messina, Italy
3
AlQuraishi M, Sorger PK. Differentiable biology: using deep learning for biophysics-based and data-driven modeling of molecular mechanisms. Nat Methods 2021; 18:1169-1180. [PMID: 34608321 PMCID: PMC8793939 DOI: 10.1038/s41592-021-01283-4] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 01/13/2021] [Accepted: 08/27/2021] [Indexed: 02/08/2023]
Abstract
Deep learning using neural networks relies on a class of machine-learnable models constructed using 'differentiable programs'. These programs can combine mathematical equations specific to a particular domain of natural science with general-purpose, machine-learnable components trained on experimental data. Such programs are having a growing impact on molecular and cellular biology. In this Perspective, we describe an emerging 'differentiable biology' in which phenomena ranging from the small and specific (for example, one experimental assay) to the broad and complex (for example, protein folding) can be modeled effectively and efficiently, often by exploiting knowledge about basic natural phenomena to overcome the limitations of sparse, incomplete and noisy data. By distilling differentiable biology into a small set of conceptual primitives and illustrative vignettes, we show how it can help to address long-standing challenges in integrating multimodal data from diverse experiments across biological scales. This promises to benefit fields as diverse as biophysics and functional genomics.
Affiliation(s)
- Mohammed AlQuraishi
- Department of Systems Biology, Columbia University, New York, NY, USA.
- Laboratory of Systems Pharmacology, Department of Systems Biology, Harvard Medical School, Boston, MA, USA.
- Peter K Sorger
- Laboratory of Systems Pharmacology, Department of Systems Biology, Harvard Medical School, Boston, MA, USA.
4
Koshla O, Yushchuk O, Ostash I, Dacyuk Y, Myronovskyi M, Jäger G, Süssmuth RD, Luzhetskyy A, Byström A, Kirsebom LA, Ostash B. Gene miaA for post-transcriptional modification of tRNA XXA is important for morphological and metabolic differentiation in Streptomyces. Mol Microbiol 2019; 112:249-265. [PMID: 31017319 DOI: 10.1111/mmi.14266] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Accepted: 04/20/2019] [Indexed: 12/14/2022]
Abstract
Members of actinobacterial genus Streptomyces possess a sophisticated life cycle and are the deepest source of bioactive secondary metabolites. Although morphogenesis and secondary metabolism are subject to transcriptional co-regulation, streptomycetes employ an additional mechanism to initiate the aforementioned processes. This mechanism is based on delayed translation of rare leucyl codon UUA by the only cognate tRNALeu UAA (encoded by bldA). The bldA-based genetic switch is an extensively documented example of translational regulation in Streptomyces. Yet, after five decades since the discovery of bldA, factors that shape its function and peculiar conditionality remained elusive. Here we address the hypothesis that post-transcriptional tRNA modifications play a role in tRNA-based mechanisms of translational control in Streptomyces. Particularly, we studied two Streptomyces albus J1074 genes, XNR_1074 (miaA) and XNR_1078 (miaB), encoding tRNA (adenosine(37)-N6)-dimethylallyltransferase and tRNA (N6-isopentenyl adenosine(37)-C2)-methylthiotransferase respectively. These enzymes produce, in a sequential manner, a hypermodified ms2 i6 A37 residue in most of the A36-A37-containing tRNAs. We show that miaB and especially miaA null mutant of S. albus possess altered morphogenesis and secondary metabolism. We provide genetic evidence that miaA deficiency impacts translational level of gene expression, most likely through impaired decoding of codons UXX and UUA in particular.
Affiliation(s)
- Oksana Koshla
- Department of Genetics and Biotechnology, Ivan Franko National University of Lviv, 4 Hrushevskoho st., Lviv, 79005, Ukraine
- Oleksandr Yushchuk
- Department of Genetics and Biotechnology, Ivan Franko National University of Lviv, 4 Hrushevskoho st., Lviv, 79005, Ukraine
- Iryna Ostash
- Department of Genetics and Biotechnology, Ivan Franko National University of Lviv, 4 Hrushevskoho st., Lviv, 79005, Ukraine
- Yuriy Dacyuk
- Department of Physics of Earth, Ivan Franko National University of Lviv, 4 Hrushevskoho st., Lviv, 79005, Ukraine
- Maksym Myronovskyi
- Helmholtz Institute for Pharmaceutical Research, Saarland Campus, Building C2.3, Saarbrucken, 66123, Germany
- Gunilla Jäger
- Department of Molecular Biology, Umeå University, 6K och 6L, Sjukhusområdet, Umeå, 90197, Sweden
- Roderich D Süssmuth
- Institut für Chemie, Technische Universität Berlin, Straße des 17 Juni 124/TC2, Berlin, 10623, Germany
- Andriy Luzhetskyy
- Helmholtz Institute for Pharmaceutical Research, Saarland Campus, Building C2.3, Saarbrucken, 66123, Germany
- Anders Byström
- Department of Molecular Biology, Umeå University, 6K och 6L, Sjukhusområdet, Umeå, 90197, Sweden
- Leif A Kirsebom
- Uppsala Biomedicinska Centrum BMC, Uppsala University, Husargatan 3, Box 596, Uppsala, 75124, Sweden
- Bohdan Ostash
- Department of Genetics and Biotechnology, Ivan Franko National University of Lviv, 4 Hrushevskoho st., Lviv, 79005, Ukraine
5
Andrabi M, Hutchins AP, Miranda-Saavedra D, Kono H, Nussinov R, Mizuguchi K, Ahmad S. Predicting conformational ensembles and genome-wide transcription factor binding sites from DNA sequences. Sci Rep 2017; 7:4071. [PMID: 28642456 PMCID: PMC5481346 DOI: 10.1038/s41598-017-03199-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 12/28/2016] [Accepted: 04/26/2017] [Indexed: 12/24/2022]
Abstract
DNA shape is emerging as an important determinant of transcription factor binding beyond just the DNA sequence. The only tool for large scale DNA shape estimates, DNAshape was derived from Monte-Carlo simulations and predicts four broad and static DNA shape features, Propeller twist, Helical twist, Minor groove width and Roll. The contributions of other shape features e.g. Shift, Slide and Opening cannot be evaluated using DNAshape. Here, we report a novel method DynaSeq, which predicts molecular dynamics-derived ensembles of a more exhaustive set of DNA shape features. We compared the DNAshape and DynaSeq predictions for the common features and applied both to predict the genome-wide binding sites of 1312 TFs available from protein interaction quantification (PIQ) data. The results indicate a good agreement between the two methods for the common shape features and point to advantages in using DynaSeq. Predictive models employing ensembles from individual conformational parameters revealed that base-pair opening - known to be important in strand separation - was the best predictor of transcription factor-binding sites (TFBS) followed by features employed by DNAshape. Of note, TFBS could be predicted not only from the features at the target motif sites, but also from those as far as 200 nucleotides away from the motif.
Affiliation(s)
- Munazah Andrabi
- National Institutes of Biomedical Innovation Health and Nutrition, 7-6-8, Saito-Asagi, Ibaraki, Osaka, 5670085, Japan
- Faculty of Biology, Medicine and Health, Michael Smith Building, The University of Manchester, Dover Street, Manchester, M13 9PT, UK
- Andrew Paul Hutchins
- Department of Biology, Southern University of Science and Technology of China, Shenzhen, 518055, China
- Diego Miranda-Saavedra
- World Premier International (WPI) Immunology Frontier Research Center (IFReC), Osaka University, 3-1 Yamadaoka, Suita, 565-0871, Osaka, Japan
- Centro de Biología Molecular Severo Ochoa, CSIC/Universidad Autónoma de Madrid, 28049, Madrid, Spain
- Department of Computer Science, University of Oxford Wolfson Building, Parks Road, OXFORD, OX1 3QD, United Kingdom
- Hidetoshi Kono
- Molecular Modeling and Simulation (MMS) Group, National Institutes for Quantum and Radiological Science and Technology, 8-1-7, Umemidai, Kizugawa, Kyoto, 619-0215, Japan
- Ruth Nussinov
- National Cancer Institute, Cancer and Inflammation Program, Leidos Biomedical Research, Inc. Frederick, Maryland, USA
- Department of Biochemistry and Human Genetics, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Kenji Mizuguchi
- National Institutes of Biomedical Innovation Health and Nutrition, 7-6-8, Saito-Asagi, Ibaraki, Osaka, 5670085, Japan
- Shandar Ahmad
- National Institutes of Biomedical Innovation Health and Nutrition, 7-6-8, Saito-Asagi, Ibaraki, Osaka, 5670085, Japan.
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Mehrauli Road, New Delhi, 110067, India.
6
Korostelev YD, Zharov IA, Mironov AA, Rakhmaininova AB, Gelfand MS. Identification of Position-Specific Correlations between DNA-Binding Domains and Their Binding Sites. Application to the MerR Family of Transcription Factors. PLoS One 2016; 11:e0162681. [PMID: 27690309 PMCID: PMC5045206 DOI: 10.1371/journal.pone.0162681] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/11/2015] [Accepted: 08/26/2016] [Indexed: 11/25/2022]
Abstract
The large and increasing volume of genomic data analyzed by comparative methods provides information about transcription factors and their binding sites that, in turn, enables statistical analysis of correlations between factors and sites, uncovering mechanisms and evolution of specific protein-DNA recognition. Here we present an online tool, Prot-DNA-Korr, designed to identify and analyze crucial protein-DNA pairs of positions in a family of transcription factors. Correlations are identified by analysis of mutual information between columns of protein and DNA alignments. The algorithm reduces the effects of common phylogenetic history and of abundance of closely related proteins and binding sites. We apply it to five closely related subfamilies of the MerR family of bacterial transcription factors that regulate heavy metal resistance systems. We validate the approach using known 3D structures of MerR-family proteins in complexes with their cognate DNA binding sites and demonstrate that a significant fraction of correlated positions indeed form specific side-chain-to-base contacts. The joint distribution of amino acids and nucleotides hence may be used to predict changes of specificity for point mutations in transcription factors.
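As a rough illustration of the mutual-information step mentioned in the abstract above, the sketch below computes the mutual information (in bits) between one protein alignment column and one DNA alignment column, paired row by row. It is not the Prot-DNA-Korr implementation: it omits the corrections for shared phylogenetic history and for over-represented close relatives that the abstract describes, and the function name and toy columns are hypothetical.

```python
from collections import Counter
from math import log2


def column_mutual_information(protein_col, dna_col):
    """Mutual information (bits) between two paired alignment columns.

    protein_col[i] and dna_col[i] are assumed to come from the same
    transcription factor / binding site pair (an assumption of this sketch).
    """
    if len(protein_col) != len(dna_col):
        raise ValueError("columns must have the same number of rows")
    n = len(protein_col)
    if n == 0:
        raise ValueError("columns must not be empty")

    joint = Counter(zip(protein_col, dna_col))   # counts of (residue, base) pairs
    residue_counts = Counter(protein_col)        # marginal counts, protein column
    base_counts = Counter(dna_col)               # marginal counts, DNA column

    mi = 0.0
    for (residue, base), count in joint.items():
        p_joint = count / n
        # p(a,b) / (p(a) * p(b)) written with raw counts
        ratio = count * n / (residue_counts[residue] * base_counts[base])
        mi += p_joint * log2(ratio)
    return mi


# Toy columns: residue identity fully determines the base in the first pair,
# and is independent of it in the second pair.
correlated = column_mutual_information("RRKK", "GGAA")
independent = column_mutual_information("RRKK", "GAGA")
```

In practice, raw mutual information is inflated by small samples and by closely related sequences, which is why a tool of this kind needs the corrections the abstract refers to.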
Affiliation(s)
- Yuriy D. Korostelev
- A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, 19-1 Bolshoy Karetny pereulok, Moscow, Russia, 127994
- Department of Bioengineering and Bioinformatics, Moscow State University, 1-73 Vorobievy Gory, Moscow, Russia, 119991
- Ilya A. Zharov
- A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, 19-1 Bolshoy Karetny pereulok, Moscow, Russia, 127994
- Andrey A. Mironov
- A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, 19-1 Bolshoy Karetny pereulok, Moscow, Russia, 127994
- Department of Bioengineering and Bioinformatics, Moscow State University, 1-73 Vorobievy Gory, Moscow, Russia, 119991
- Alexandra B. Rakhmaininova
- A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, 19-1 Bolshoy Karetny pereulok, Moscow, Russia, 127994
- Mikhail S. Gelfand
- A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, 19-1 Bolshoy Karetny pereulok, Moscow, Russia, 127994
- Department of Bioengineering and Bioinformatics, Moscow State University, 1-73 Vorobievy Gory, Moscow, Russia, 119991
7
Meysman P, Zhou C, Cule B, Goethals B, Laukens K. Mining the entire Protein DataBank for frequent spatially cohesive amino acid patterns. BioData Min 2015; 8:4. [PMID: 25657820 PMCID: PMC4318390 DOI: 10.1186/s13040-015-0038-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 07/09/2014] [Accepted: 01/18/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND The three-dimensional structure of a protein is an essential aspect of its functionality. Despite the large diversity in protein structures and functionality, it is known that there are common patterns and preferences in the contacts between amino acid residues, or between residues and other biomolecules, such as DNA. The discovery and characterization of these patterns is an important research topic within structural biology as it can give fundamental insight into protein structures and can aid in the prediction of unknown structures. RESULTS Here we apply an efficient spatial pattern miner to search for sets of amino acids that occur frequently in close spatial proximity in the protein structures of the Protein DataBank. This allowed us to mine for a new class of amino acid patterns, that we term FreSCOs (Frequent Spatially Cohesive Component sets), which feature synergetic combinations. To demonstrate the relevance of these FreSCOs, they were compared in relation to the thermostability of the protein structure and the interaction preferences of DNA-protein complexes. In both cases, the results matched well with prior investigations using more complex methods on smaller data sets. CONCLUSIONS The currently characterized protein structures feature a diverse set of frequent amino acid patterns that can be related to the stability of the protein molecular structure and that are independent from protein function or specific conserved domains.
Affiliation(s)
- Pieter Meysman
- Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
- Cheng Zhou
- Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Boris Cule
- Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Bart Goethals
- Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Kris Laukens
- Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
8
Zabet NR, Adryan B. Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res 2015; 43:84-94. [PMID: 25432957 PMCID: PMC4288167 DOI: 10.1093/nar/gku1269] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 04/22/2014] [Revised: 10/22/2014] [Accepted: 11/19/2014] [Indexed: 12/20/2022]
Abstract
The binding of transcription factors (TFs) is essential for gene expression. One important characteristic is the actual occupancy of a putative binding site in the genome. In this study, we propose an analytical model to predict genomic occupancy that incorporates the preferred target sequence of a TF in the form of a position weight matrix (PWM), DNA accessibility data (in the case of eukaryotes), the number of TF molecules expected to be bound specifically to the DNA and a parameter that modulates the specificity of the TF. Given actual occupancy data in the form of ChIP-seq profiles, we backwards inferred copy number and specificity for five Drosophila TFs during early embryonic development: Bicoid, Caudal, Giant, Hunchback and Kruppel. Our results suggest that these TFs display thousands of molecules that are specifically bound to the DNA and that whilst Bicoid and Caudal display a higher specificity, the other three TFs (Giant, Hunchback and Kruppel) display lower specificity in their binding (despite having PWMs with higher information content). This study gives further weight to earlier investigations into TF copy numbers that suggest a significant proportion of molecules are not bound specifically to the DNA.
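To make the ingredients named in the abstract above more concrete, the following sketch scores DNA windows with a log-odds position weight matrix and converts the scores into relative occupancies using Boltzmann-like weights. It is only an illustration under stated assumptions, not the authors' analytical model: DNA accessibility and TF copy number are left out, and the `scale` parameter is a hypothetical stand-in for a specificity-modulating parameter. All function names and the toy counts are invented for this example.

```python
from math import exp, log2

BASES = "ACGT"


def pwm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM from per-position base counts.

    counts is a list of dicts mapping base -> count, one dict per position.
    A uniform background and a flat pseudocount are assumptions of this sketch.
    """
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def pwm_score(pwm, site):
    """Sum of per-position log-odds for a site of the same width as the PWM."""
    return sum(column[base] for column, base in zip(pwm, site))


def relative_occupancy(pwm, sequence, scale=1.0):
    """Normalized weights exp(score / scale) over all windows of the sequence.

    A smaller scale sharpens the preference for high-scoring windows.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    width = len(pwm)
    weights = [
        exp(pwm_score(pwm, sequence[i:i + width]) / scale)
        for i in range(len(sequence) - width + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Toy two-position motif that strongly prefers "AC".
toy_pwm = pwm_from_counts([{"A": 8}, {"C": 8}])
occupancy = relative_occupancy(toy_pwm, "TTACTT")
best_window = occupancy.index(max(occupancy))
```

Here `best_window` is the start index of the window with the highest relative occupancy; for the toy sequence this is the position of the "AC" match.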
Affiliation(s)
- Nicolae Radu Zabet
- Cambridge Systems Biology Centre, University of Cambridge, Tennis Court Road, Cambridge CB2 1QR, UK
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
- Boris Adryan
- Cambridge Systems Biology Centre, University of Cambridge, Tennis Court Road, Cambridge CB2 1QR, UK
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
9
Controlling gene networks and cell fate with precision-targeted DNA-binding proteins and small-molecule-based genome readers. Biochem J 2014; 462:397-413. [PMID: 25145439 DOI: 10.1042/bj20140400] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 12/30/2022]
Abstract
Transcription factors control the fate of a cell by regulating the expression of genes and regulatory networks. Recent successes in inducing pluripotency in terminally differentiated cells as well as directing differentiation with natural transcription factors has lent credence to the efforts that aim to direct cell fate with rationally designed transcription factors. Because DNA-binding factors are modular in design, they can be engineered to target specific genomic sequences and perform pre-programmed regulatory functions upon binding. Such precision-tailored factors can serve as molecular tools to reprogramme or differentiate cells in a targeted manner. Using different types of engineered DNA binders, both regulatory transcriptional controls of gene networks, as well as permanent alteration of genomic content, can be implemented to study cell fate decisions. In the present review, we describe the current state of the art in artificial transcription factor design and the exciting prospect of employing artificial DNA-binding factors to manipulate the transcriptional networks as well as epigenetic landscapes that govern cell fate.
10
Dynamic conformational change regulates the protein-DNA recognition: an investigation on binding of a Y-family polymerase to its target DNA. PLoS Comput Biol 2014; 10:e1003804. [PMID: 25188490 PMCID: PMC4154647 DOI: 10.1371/journal.pcbi.1003804] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.7] [Received: 05/05/2014] [Accepted: 07/10/2014] [Indexed: 12/02/2022]
Abstract
Protein-DNA recognition is a central biological process that governs the life of cells. A protein will often undergo a conformational transition to form the functional complex with its target DNA. The protein conformational dynamics are expected to contribute to the stability and specificity of DNA recognition and therefore may control the functional activity of the protein-DNA complex. Understanding how the conformational dynamics influences the protein-DNA recognition is still challenging. Here, we developed a two-basin structure-based model to explore functional dynamics in Sulfolobus solfataricus DNA Y-family polymerase IV (DPO4) during its binding to DNA. With explicit consideration of non-specific and specific interactions between DPO4 and DNA, we found that DPO4-DNA recognition is comprised of first 3D diffusion, then a short-range adjustment sliding on DNA and finally specific binding. Interestingly, we found that DPO4 is under a conformational equilibrium between multiple states during the binding process and the distributions of the conformations vary at different binding stages. By modulating the strength of the electrostatic interactions, the flexibility of the linker, and the conformational dynamics in DPO4, we drew a clear picture on how DPO4 dynamically regulates the DNA recognition. We argue that the unique features of flexibility and conformational dynamics in DPO4-DNA recognition have direct implications for low-fidelity translesion DNA synthesis, most of which is found to be accomplished by the Y-family DNA polymerases. Our results help complete the description of the DNA synthesis process for the Y-family polymerases. Furthermore, the methods developed here can be widely applied for future investigations on how various proteins recognize and bind specific DNA substrates.
Protein-DNA recognition is crucial for many key biological processes in cells. Protein often undergoes large-scale conformational change during DNA recognition. However, the physical and global understanding of flexible protein-DNA binding is still challenging. Here, we developed a theoretical approach to investigate binding of a Y-family DNA polymerase to its target DNA during the DNA synthesis process. The results of electrostatic-controlled multi-step DNA binding process accompanied with multi-state conformational transition of protein occurring throughout are in remarkable agreement with experiments. During the process of protein-DNA recognition, the flexibility is found to facilitate both the conformational transition of protein (intra-chain dynamics) and DNA binding (inter-chain dynamics) simultaneously. Therefore, we provided a quantitative description of protein-DNA binding mechanism that flexibility or conformational change regulates DNA recognition dynamically, leading to high efficiency and specificity of function for protein-DNA recognition.
11
Leibovich L, Yakhini Z. Mutual enrichment in ranked lists and the statistical assessment of position weight matrix motifs. Algorithms Mol Biol 2014; 9:11. [PMID: 24708618 PMCID: PMC4021615 DOI: 10.1186/1748-7188-9-11] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 12/01/2013] [Accepted: 03/30/2014] [Indexed: 11/18/2022]
Abstract
Background Statistics in ranked lists is useful in analysing molecular biology measurement data, such as differential expression, resulting in ranked lists of genes, or ChIP-Seq, which yields ranked lists of genomic sequences. State of the art methods study fixed motifs in ranked lists of sequences. More flexible models such as position weight matrix (PWM) motifs are more challenging in this context, partially because it is not clear how to avoid the use of arbitrary thresholds. Results To assess the enrichment of a PWM motif in a ranked list we use a second ranking on the same set of elements induced by the PWM. Possible orders of one ranked list relative to another can be modelled as permutations. Due to sample space complexity, it is difficult to accurately characterize tail distributions in the group of permutations. In this paper we develop tight upper bounds on tail distributions of the size of the intersection of the top parts of two uniformly and independently drawn permutations. We further demonstrate advantages of this approach using our software implementation, mmHG-Finder, which is publicly available, to study PWM motifs in several datasets. In addition to validating known motifs, we found GC-rich strings to be enriched amongst the promoter sequences of long non-coding RNAs that are specifically expressed in thyroid and prostate tissue samples and observed a statistical association with tissue specific CpG hypo-methylation. Conclusions We develop tight bounds that can be calculated in polynomial time. We demonstrate utility of mutual enrichment in motif search and assess performance for synthetic and biological datasets. We suggest that thyroid and prostate-specific long non-coding RNAs are regulated by transcription factors that bind GC-rich sequences, such as EGR1, SP1 and E2F3. We further suggest that this regulation is associated with DNA hypo-methylation.
12
Polozov RV, Sivozhelezov VS, Chirgadze YN, Ivanov VV. Recognition rules for binding of Zn-Cys2His2 transcription factors to operator DNA. J Biomol Struct Dyn 2014; 33:253-66. [PMID: 24460547 DOI: 10.1080/07391102.2013.879074] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 10/25/2022]
Abstract
The molecules of Zn-finger transcription factors consist of several similar small protein units. We analyzed the crystal structures of 46 basic units from 22 complexes of the Zn-Cys2His2 family with fragments of operator DNA. We showed that the recognition of DNA occurs via five protein contacts. The canonical binding positions of the recognizing α-helix were -1, 3, 6, and 7, which make contacts with the tetra-nucleotide sequence ZXYZ of the coding DNA strand, where XYZ is the canonical binding triplet. The non-coding DNA strand forms only one contact at α-helix position 2. We have discovered that there is a single highly conserved contact of His7α with the phosphate group of nucleotide Z, which precedes each triplet XYZ of the coding DNA chain. This particular contact is invariant across the whole Zn-Cys2His2 family, with a high frequency of occurrence of 83%, which we consider an invariant recognition rule. We have also selected a previously unreported Zn-Cys2His2-Arg subfamily of 21 Zn-finger units bound to DNA triplets, which make two invariant contacts between residues Arg6α and His7α and the coding DNA chain. These contacts show frequencies of occurrence of 100 and 90% and constitute an invariant recognition rule. Three other variable protein-DNA contacts are formed mainly with the bases and specify the recognition patterns of individual factor units. The revealed recognition rules are inherent to the Zn-Cys2His2 family and the Zn-Cys2His2-Arg subfamily across different taxonomic groups and can distinguish members of these families from any other family of transcription factors.
Affiliation(s)
- R V Polozov
- Institute of Theoretical and Experimental Biophysics, Russian Academy of Sciences, Pushchino 142290, Moscow Region, Russia
|
13
|
Chang CW, Couñago RM, Williams SJ, Boden M, Kobe B. The distribution of different classes of nuclear localization signals (NLSs) in diverse organisms and the utilization of the minor NLS-binding site in plant nuclear import factor importin-α. Plant Signal Behav 2013; 8:25976. [PMID: 24270630 PMCID: PMC4091121 DOI: 10.4161/psb.25976] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 07/26/2013] [Accepted: 07/31/2013] [Indexed: 05/29/2023]
Abstract
The specific recognition between the import receptor importin-α and the nuclear localization signals (NLSs) is crucial to ensure the selective transport of cargoes into the nucleus. NLSs contain 1 or 2 clusters of positively charged amino acids, which usually bind to the major (monopartite NLSs) or both minor and major NLS-binding sites (bipartite NLSs). In our recent study, we determined the structure of importin-α1a from rice (Oryza sativa), and made 2 observations that suggest an increased utilization of the minor NLS-binding site in this protein. First, unlike the mammalian protein, both the major and minor NLS-binding sites are auto-inhibited in the unliganded rice protein. Second, we showed that NLSs of the "plant-specific" class preferentially bind to the minor NLS-binding site of rice importin-α. Here, we show that a distinct group of "minor site-specific" NLSs also bind to the minor site of the rice protein. We further show a greater enrichment of proteins containing these "plant-specific" and "minor site-specific" NLSs in the rice proteome. However, the analysis of the distribution of different classes of NLSs in diverse eukaryotes shows that in all organisms, the minor site-specific NLSs are much less prevalent than the classical monopartite and bipartite NLSs.
Affiliation(s)
- Chiung-Wen Chang
- School of Chemistry and Molecular Biosciences and Institute for Molecular Bioscience; University of Queensland; Brisbane, QLD Australia
- Australian Infectious Diseases Research Centre; University of Queensland; Brisbane, QLD Australia
- Rafael Miguez Couñago
- School of Chemistry and Molecular Biosciences and Institute for Molecular Bioscience; University of Queensland; Brisbane, QLD Australia
- Australian Infectious Diseases Research Centre; University of Queensland; Brisbane, QLD Australia
- Simon J Williams
- School of Chemistry and Molecular Biosciences and Institute for Molecular Bioscience; University of Queensland; Brisbane, QLD Australia
- Australian Infectious Diseases Research Centre; University of Queensland; Brisbane, QLD Australia
- Mikael Boden
- School of Chemistry and Molecular Biosciences and Institute for Molecular Bioscience; University of Queensland; Brisbane, QLD Australia
- School of Information Technology and Electrical Engineering; University of Queensland; Brisbane, QLD Australia
- Bostjan Kobe
- School of Chemistry and Molecular Biosciences and Institute for Molecular Bioscience; University of Queensland; Brisbane, QLD Australia
- Australian Infectious Diseases Research Centre; University of Queensland; Brisbane, QLD Australia
|
14
|
Jiang B, Liu JS, Bulyk ML. Bayesian hierarchical model of protein-binding microarray k-mer data reduces noise and identifies transcription factor subclasses and preferred k-mers. Bioinformatics 2013; 29:1390-8. [PMID: 23559638 DOI: 10.1093/bioinformatics/btt152] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 01/02/2023]
Abstract
Motivation: Sequence-specific transcription factors (TFs) regulate the expression of their target genes through interactions with specific DNA-binding sites in the genome. Data on TF-DNA binding specificities are essential for understanding how regulatory specificity is achieved. Results: Numerous studies have used universal protein-binding microarray (PBM) technology to determine the in vitro binding specificities of hundreds of TFs for all possible 8 bp sequences (8mers). We have developed a Bayesian analysis of variance (ANOVA) model that decomposes these 8mer data into background noise, TF familywise effects and effects due to the particular TF. Adjusting for background noise improves PBM data quality and concordance with in vivo TF binding data. Moreover, our model provides simultaneous identification of TF subclasses and their shared sequence preferences, and also of 8mers bound preferentially by individual members of TF subclasses. Such results may aid in deciphering cis-regulatory codes and determinants of protein-DNA binding specificity. Availability and implementation: Source code, compiled code and R and Python scripts are available from http://thebrain.bwh.harvard.edu/hierarchicalANOVA. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Bo Jiang
- Department of Statistics, Harvard University, Cambridge, MA 02138, USA.
|
15
|
Christensen RG, Enuameh MS, Noyes MB, Brodsky MH, Wolfe SA, Stormo GD. Recognition models to predict DNA-binding specificities of homeodomain proteins. Bioinformatics 2013; 28:i84-9. [PMID: 22689783 PMCID: PMC3371834 DOI: 10.1093/bioinformatics/bts202] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Indexed: 12/11/2022] Open
Abstract
Motivation: Recognition models for protein-DNA interactions, which allow the prediction of specificity for a DNA-binding domain based only on its sequence or the alteration of specificity through rational design, have long been a goal of computational biology. There has been some progress in constructing useful models, especially for C2H2 zinc finger proteins, but it remains a challenging problem with ample room for improvement. For most families of transcription factors the best available methods utilize k-nearest neighbor (KNN) algorithms to make specificity predictions based on the average of the specificities of the k most similar proteins with defined specificities. Homeodomain (HD) proteins are the second most abundant family of transcription factors, after zinc fingers, in most metazoan genomes, and as a consequence an effective recognition model for this family would facilitate predictive models of many transcriptional regulatory networks within these genomes. Results: Using extensive experimental data, we have tested several machine learning approaches and find that both support vector machines and random forests (RFs) can produce recognition models for HD proteins that are significant improvements over KNN-based methods. Cross-validation analyses show that the resulting models are capable of predicting specificities with high accuracy. We have produced a web-based prediction tool, PreMoTF (Predicted Motifs for Transcription Factors) (http://stormo.wustl.edu/PreMoTF), for predicting position frequency matrices from protein sequence using an RF-based model. Contact: stormo@wustl.edu
Affiliation(s)
- Ryan G Christensen
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63108, USA
|
16
|
Merkulova TI, Ananko EA, Ignatieva EV, Kolchanov NA. Transcription regulatory codes of eukaryotic genomes. Russ J Genet 2013. [DOI: 10.1134/s1022795413010079] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 01/13/2023]
|
17
|
Zabet NR, Adryan B. Computational models for large-scale simulations of facilitated diffusion. Mol Biosyst 2012; 8:2815-27. [PMID: 22892851 PMCID: PMC4007627 DOI: 10.1039/c2mb25201e] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 12/14/2022]
Abstract
The binding of site-specific transcription factors to their genomic target sites is a key step in gene regulation. While the genome is huge, transcription factors belong to the least abundant protein classes in the cell. It is therefore fascinating how short the time frame is that they require to home in on their target sites. The underlying search mechanism is called facilitated diffusion and assumes three-dimensional diffusion in the space around the DNA combined with a one-dimensional random walk along it. In this review, we present the current understanding of the facilitated diffusion mechanism and identify questions that lack a clear or detailed answer. One way to investigate these questions is through stochastic simulation and, in this manuscript, we support the idea that such simulations are able to address them. Finally, we review which biological parameters need to be included in such computational models in order to obtain a detailed representation of the actual process.
Affiliation(s)
- Nicolae Radu Zabet
- Cambridge Systems Biology Centre, University of Cambridge, Tennis Court Road, Cambridge CB2 1QR, UK
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
- Boris Adryan
- Cambridge Systems Biology Centre, University of Cambridge, Tennis Court Road, Cambridge CB2 1QR, UK
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
|
18
|
The crystal structure of the Sox4 HMG domain-DNA complex suggests a mechanism for positional interdependence in DNA recognition. Biochem J 2012; 443:39-47. [PMID: 22181698 DOI: 10.1042/bj20111768] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.5] [Indexed: 01/28/2023]
Abstract
It has recently been proposed that the sequence preferences of DNA-binding TFs (transcription factors) can be well described by models that include the positional interdependence of the nucleotides of the target sites. Such binding models allow for multiple motifs to be invoked, such as principal and secondary motifs differing at two or more nucleotide positions. However, the structural mechanisms underlying the accommodation of such variant motifs by TFs remain elusive. In the present study we examine the crystal structure of the HMG (high-mobility group) domain of Sox4 [Sry (sex-determining region on the Y chromosome)-related HMG box 4] bound to DNA. By comparing this structure with previously solved structures of Sox17 and Sox2, we observed subtle conformational differences at the DNA-binding interface. Furthermore, using quantitative electrophoretic mobility-shift assays we validated the positional interdependence of two nucleotides and the presence of a secondary Sox motif in the affinity landscape of Sox4. These results suggest that a concerted rearrangement of two interface amino acids enables Sox4 to accommodate primary and secondary motifs. The structural adaptations lead to altered dinucleotide preferences that mutually reinforce each other. These analyses underline the complexity of the DNA recognition by TFs and provide an experimental validation for the conceptual framework of positional interdependence and secondary binding motifs.
|
19
|
Chirgadze YN, Sivozhelezov VS, Polozov RV, Stepanenko VA, Ivanov VV. Recognition Rules for Binding of Homeodomains to Operator DNA. J Biomol Struct Dyn 2012; 29:715-31. [DOI: 10.1080/073911012010525019] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/28/2022]
|
20
|
Abstract
Identification of transcription factor binding sites is necessary for deciphering gene regulatory networks. Several new methods provide extensive data about the specificity of transcription factors but most methods for analyzing these data to obtain specificity models are limited in scope by, for example, assuming additive interactions or are inefficient in their exploration of more complex models. This article describes an approach--encoding of DNA sequences as the vertices of a regular simplex--that allows simultaneous direct comparison of simple and complex models, with higher-order parameters fit to the residuals of lower-order models. In addition to providing an efficient assessment of all model parameters, this approach can yield valuable insight into the mechanism of binding by highlighting features that are critical to accurate models.
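The encoding idea mentioned in this abstract can be illustrated with a small sketch. It assumes that the four bases are placed at the vertices of a regular tetrahedron (the regular simplex in three dimensions), so that every pair of bases is equally distant and no artificial ordering is imposed. The particular vertex assignment and the helper names `SIMPLEX`, `encode`, and `pairwise_distances` are illustrative assumptions, not taken from the article, and the fitting of higher-order parameters to residuals is not shown.

```python
import itertools
import math

# Hypothetical assignment of bases to the vertices of a regular tetrahedron.
# Any assignment works as long as the four points are mutually equidistant.
SIMPLEX = {
    "A": (1, -1, -1),
    "C": (-1, 1, -1),
    "G": (-1, -1, 1),
    "T": (1, 1, 1),
}


def encode(seq):
    """Concatenate the 3-D vertex of each base into one flat feature vector."""
    return [coord for base in seq.upper() for coord in SIMPLEX[base]]


def pairwise_distances():
    """Euclidean distance between every pair of base vertices."""
    return {
        (a, b): math.dist(SIMPLEX[a], SIMPLEX[b])
        for a, b in itertools.combinations("ACGT", 2)
    }


if __name__ == "__main__":
    print(encode("ACGT"))
    # All six distances are equal, which is what makes the simplex regular.
    print(pairwise_distances())
```

Because the four vertices sum to the zero vector, a sequence of length n is described by 3n coordinates rather than the 4n of a one-hot encoding, which avoids one redundant parameter per position in a linear model.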
|
21
|
Boryskina OP, Tkachenko MY, Shestopalova AV. Protein-DNA complexes: specificity and DNA readout mechanisms. Biopolym Cell 2011. [DOI: 10.7124/bc.00007c] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/05/2022]
Affiliation(s)
- O. P. Boryskina
- O. Ya. Usikov Institute for Radio Physics and Electronics, National Academy of Sciences of Ukraine
- M. Yu. Tkachenko
- O. Ya. Usikov Institute for Radio Physics and Electronics, National Academy of Sciences of Ukraine
- A. V. Shestopalova
- O. Ya. Usikov Institute for Radio Physics and Electronics, National Academy of Sciences of Ukraine
|
22
|
Lee J, Kim JS, Seok C. Cooperativity and specificity of Cys2His2 zinc finger protein-DNA interactions: a molecular dynamics simulation study. J Phys Chem B 2010; 114:7662-71. [PMID: 20469897 DOI: 10.1021/jp1017289] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Indexed: 11/30/2022]
Abstract
Cys2His2 zinc finger proteins are one of the most frequently observed DNA-binding motifs in eukaryotes. They have been widely used as a framework for designing new DNA-binding proteins. In this work, the binding affinity and conformational change of the Zif268-DNA complex were successfully reproduced with MD simulations and MM-PBSA analysis. The following new discoveries on the zinc finger protein-DNA interactions were obtained by careful energy decomposition analysis. First, a dramatic increase in the binding affinity was observed when the third zinc finger is added, indicating a cooperative nature. This cooperativity is shown to be a consequence of the small but distinctive conformational change of DNA, which enables a tight fit of the protein into the major groove of DNA. Second, specificity of the amino acid-nucleotide recognitions observed in the crystal structure is explained as originating from the ability of specific side chains and bases to take the optimal geometries for favorable interactions between polar groups. The success of the current approach implies that similar methods could be further applied to the study of protein-DNA interactions involving longer polyfingers or different linkers between fingers to provide insights for design of novel zinc finger proteins.
Affiliation(s)
- Juyong Lee
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
|
23
|
Abstract
A long-standing goal of computational protein design is to create proteins similar to those found in Nature. One motivation is to harness the exquisite functional capabilities of proteins for our own purposes. The extent of similarity between designed and natural proteins also reports on how faithfully our models represent the selective pressures that determine protein sequences. As the field of protein design shifts emphasis from reproducing native-like protein structure to function, it has become important that these models treat the notion of specificity in molecular interactions. Although specificity may, in some cases, be achieved by optimization of a desired protein in isolation, methods have been developed to address directly the desire for proteins that exhibit specific functions and interactions.
Affiliation(s)
- James J Havranek
- Department of Genetics, Washington University School of Medicine, St Louis, Missouri 63110, USA.
|
24
|
Zhao Y, Granas D, Stormo GD. Inferring binding energies from selected binding sites. PLoS Comput Biol 2009; 5:e1000590. [PMID: 19997485 PMCID: PMC2777355 DOI: 10.1371/journal.pcbi.1000590] [Citation(s) in RCA: 159] [Impact Index Per Article: 10.6] [Received: 09/22/2009] [Accepted: 11/02/2009] [Indexed: 11/18/2022] Open
Abstract
We employ a biophysical model that accounts for the non-linear relationship between binding energy and the statistics of selected binding sites. The model includes the chemical potential of the transcription factor, non-specific binding affinity of the protein for DNA, as well as sequence-specific parameters that may include non-independent contributions of bases to the interaction. We obtain maximum likelihood estimates for all of the parameters and compare the results to standard probabilistic methods of parameter estimation. On simulated data, where the true energy model is known and samples are generated with a variety of parameter values, we show that our method returns much more accurate estimates of the true parameters and much better predictions of the selected binding site distributions. We also introduce a new high-throughput SELEX (HT-SELEX) procedure to determine the binding specificity of a transcription factor in which the initial randomized library and the selected sites are sequenced with next generation methods that return hundreds of thousands of sites. We show that after a single round of selection our method can estimate binding parameters that give very good fits to the selected site distributions, much better than standard motif identification algorithms.
The DNA binding sites of transcription factors that control gene expression are often predicted based on a collection of known or selected binding sites. The most commonly used methods for inferring the binding site pattern, or sequence motif, assume that the sites are selected in proportion to their affinity for the transcription factor, ignoring the effect of the transcription factor concentration. We have developed a new maximum likelihood approach, in a program called BEEML, that directly takes into account the transcription factor concentration as well as non-specific contributions to the binding affinity, and we show in simulation studies that it gives a much more accurate model of the transcription factor binding sites than previous methods. We also develop a new method for extracting binding sites for a transcription factor from a random pool of DNA sequences, called high-throughput SELEX (HT-SELEX), and we show that after a single round of selection BEEML can obtain an accurate model of the transcription factor binding sites.
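The point about transcription factor concentration can be made concrete with a minimal sketch. It assumes a standard two-state occupancy of the form P = 1 / (1 + exp(E - mu)), with energies and the chemical potential mu in units of kT, and an optional sequence-independent non-specific term. The function name `binding_probability`, its parameters, and the sign convention are assumptions chosen for illustration; this is not the BEEML implementation.

```python
import math


def binding_probability(energy, mu, nonspecific_energy=None):
    """Occupancy of a site under an assumed two-state model.

    energy: sequence-specific binding energy in kT (lower is stronger).
    mu: chemical potential in kT (higher means more free protein).
    nonspecific_energy: optional sequence-independent energy in kT.
    """
    weight = math.exp(-energy)
    if nonspecific_energy is not None:
        # Specific and non-specific binding modes both contribute.
        weight += math.exp(-nonspecific_energy)
    return weight / (weight + math.exp(-mu))


if __name__ == "__main__":
    strong, weak = 0.0, 1.0  # two sites that differ by 1 kT
    for mu in (-10.0, 0.0, 10.0):
        ratio = binding_probability(strong, mu) / binding_probability(weak, mu)
        print(f"mu = {mu:+.0f} kT, occupancy ratio strong/weak = {ratio:.4f}")
```

At low mu the occupancy ratio of the two sites approaches exp(1), so selection is roughly proportional to affinity; at high mu both sites saturate and the ratio approaches 1. This is the non-linearity that a method assuming proportional selection would miss.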
Affiliation(s)
- Yue Zhao
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
- David Granas
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Gary D. Stormo
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
|
25
|
Homsi DSF, Gupta V, Stormo GD. Modeling the quantitative specificity of DNA-binding proteins from example binding sites. PLoS One 2009; 4:e6736. [PMID: 19707584 PMCID: PMC2726951 DOI: 10.1371/journal.pone.0006736] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 05/28/2009] [Accepted: 07/07/2009] [Indexed: 11/18/2022] Open
Abstract
Background: The binding of transcription factors to their respective DNA sites is a key component of every regulatory network. Predictions of transcription factor binding sites are usually based on models for transcription factor specificity. These models, in turn, are often based on examples of known binding sites. Methodology/Principal Findings: Collections of binding sites are obtained in simulation experiments where the true model for the transcription factor is known and various sampling procedures are employed. We compare the accuracies of three different and commonly used methods for predicting the specificity of the transcription factor based on example binding sites. Different methods for constructing the models can lead to significant differences in the accuracy of the predictions and we show that commonly used methods can be positively misleading, even at large sample sizes and using noise-free data. Methods that minimize the number of predicted binding sequences are often significantly more accurate than the other methods tested. Conclusions/Significance: Different methods for generating motifs from example binding sites can have significantly different numbers of false positive and false negative predictions. For many different sampling procedures models based on quadratic programming are the most accurate.
Affiliation(s)
- Dana S. F. Homsi
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Vineet Gupta
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Gary D. Stormo
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
|
26
|
Badis G, Berger MF, Philippakis AA, Talukder S, Gehrke AR, Jaeger SA, Chan ET, Metzler G, Vedenko A, Chen X, Kuznetsov H, Wang CF, Coburn D, Newburger DE, Morris Q, Hughes TR, Bulyk ML. Diversity and complexity in DNA recognition by transcription factors. Science 2009; 324:1720-3. [PMID: 19443739 PMCID: PMC2905877 DOI: 10.1126/science.1162327] [Citation(s) in RCA: 740] [Impact Index Per Article: 49.3] [Indexed: 01/22/2023]
Abstract
Sequence preferences of DNA binding proteins are a primary mechanism by which cells interpret the genome. Despite the central importance of these proteins in physiology, development, and evolution, comprehensive DNA binding specificities have been determined experimentally for only a few proteins. Here, we used microarrays containing all 10-base pair sequences to examine the binding specificities of 104 distinct mouse DNA binding proteins representing 22 structural classes. Our results reveal a complex landscape of binding, with virtually every protein analyzed possessing unique preferences. Roughly half of the proteins each recognized multiple distinctly different sequence motifs, challenging our molecular understanding of how proteins interact with their DNA binding sites. This complexity in DNA recognition may be important in gene regulation and in the evolution of transcriptional regulatory networks.
Affiliation(s)
- Gwenael Badis
- Banting and Best Department of Medical Research, University of Toronto, 160 College St., Toronto, ON, Canada M5S 3E1
- Michael F. Berger
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115
- Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, MA 02138
- Anthony A. Philippakis
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115
- Harvard-MIT Division of Health Sciences and Technology (HST); Harvard Medical School, Boston, MA 02115
- Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, MA 02138
- Shaheynoor Talukder
- Banting and Best Department of Medical Research, University of Toronto, 160 College St., Toronto, ON, Canada M5S 3E1
- Department of Molecular Genetics, University of Toronto, 160 College St., Toronto, ON, Canada M5S 3E1
- Andrew R. Gehrke
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115
- Savina A. Jaeger
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115
- Esther T. Chan
- Department of Molecular Genetics, University of Toronto, 160 College St., Toronto, ON, Canada M5S 3E1
- Genita Metzler
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139
- Xiaoyu Chen
- Banting and Best Department of Medical Research, University of Toronto, 160 College St., Toronto, ON, Canada M5S 3E1
- Hanna Kuznetsov
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139
- Chi-Fong Wang
- Department of Physics, Massachusetts Institute of Technology, Cambridge, MA 02139
- David Coburn
- Banting and Best Department of Medical Research, University of Toronto, 160 College St., Toronto, ON, Canada M5S 3E1
- Daniel E. Newburger
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115
- Quaid Morris
- Banting and Best Department of Medical Research, University of Toronto, 160 College St., Toronto, ON, Canada M5S 3E1
- Department of Molecular Genetics, University of Toronto, 160 College St., Toronto, ON, Canada M5S 3E1
- Department of Computer Science, University of Toronto, 160 College St., Toronto, ON, Canada M5S 3E1
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College St., Toronto, ON, Canada M5S 3E1
- Timothy R. Hughes
- Banting and Best Department of Medical Research, University of Toronto, 160 College St., Toronto, ON, Canada M5S 3E1
- Department of Molecular Genetics, University of Toronto, 160 College St., Toronto, ON, Canada M5S 3E1
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College St., Toronto, ON, Canada M5S 3E1
- Martha L. Bulyk
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115
- Department of Pathology, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115
- Harvard-MIT Division of Health Sciences and Technology (HST); Harvard Medical School, Boston, MA 02115
- Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, MA 02138
|
27
|
Berger MF, Bulyk ML. Universal protein-binding microarrays for the comprehensive characterization of the DNA-binding specificities of transcription factors. Nat Protoc 2009; 4:393-411. [PMID: 19265799 DOI: 10.1038/nprot.2008.195] [Citation(s) in RCA: 268] [Impact Index Per Article: 17.9] [Indexed: 11/09/2022]
Abstract
Protein-binding microarray (PBM) technology provides a rapid, high-throughput means of characterizing the in vitro DNA-binding specificities of transcription factors (TFs). Using high-density, custom-designed microarrays containing all 10-mer sequence variants, one can obtain comprehensive binding-site measurements for any TF, regardless of its structural class or species of origin. Here, we present a protocol for the examination and analysis of TF-binding specificities at high resolution using such 'all 10-mer' universal PBMs. This procedure involves double-stranding a commercially synthesized DNA oligonucleotide array, binding a TF directly to the double-stranded DNA microarray and labeling the protein-bound microarray with a fluorophore-conjugated antibody. We describe how to computationally extract the relative binding preferences of the examined TF for all possible contiguous and gapped 8-mers over the full range of affinities, from highest affinity sites to nonspecific sites. Multiple proteins can be tested in parallel in separate chambers on a single microarray, enabling the processing of a dozen or more TFs in a single day.
Affiliation(s)
- Michael F Berger
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts 02115, USA
|
28
|
Cherstvy AG. Positively charged residues in DNA-binding domains of structural proteins follow sequence-specific positions of DNA phosphate groups. J Phys Chem B 2009; 113:4242-7. [PMID: 19256532 DOI: 10.1021/jp810009s] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.1] [Indexed: 11/29/2022]
Abstract
We study electrostatic charge complementarity along interfaces of DNA-protein complexes. We use the Protein Data Bank atomic coordinates of DNA-protein complexes for some DNA-binding proteins to study the distribution of positively charged protein residues in close contact with DNA. We show that large structural proteins reveal a peculiar nonuniform distribution of Arg, Lys, and His amino acids in the frame of negatively charged DNA phosphate strands. We study nucleosome core particles, DNA complexes with prokaryotic DNA-bending histone analogues, and the basic binding motifs of small DNA-binding proteins. For large DNA-protein complexes, where extensive DNA wrapping around protein cores occurs, we show that positive amino acids on the proteins track sequence-specific positions of individual DNA phosphates. This specificity of electrostatic interactions can contribute to DNA recognition by DNA-binding proteins, which is governed for many DNA-protein complexes primarily by the hydrogen bond formation between protein residues and DNA bases.
Affiliation(s)
- A G Cherstvy
- Institut für Festkörperforschung, Theorie-II, Forschungszentrum Jülich, Germany.
|
29
|
Adams CA, Melikishvili M, Rodgers DW, Rasimas JJ, Pegg AE, Fried MG. Topologies of complexes containing O6-alkylguanine-DNA alkyltransferase and DNA. J Mol Biol 2009; 389:248-63. [PMID: 19358853 DOI: 10.1016/j.jmb.2009.03.067] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Received: 12/16/2008] [Revised: 03/28/2009] [Accepted: 03/31/2009] [Indexed: 11/25/2022]
Abstract
The mutagenic and cytotoxic effects of many alkylating agents are reduced by O6-alkylguanine-DNA alkyltransferase (AGT). In humans, this protein not only protects the integrity of the genome, but also contributes to the resistance of tumors to DNA-alkylating chemotherapeutic agents. Here we describe and test models for cooperative multiprotein complexes of AGT with single-stranded and duplex DNAs that are based on in vitro binding data and the crystal structure of a 1:1 AGT-DNA complex. These models predict that cooperative assemblies contain a three-start helical array of proteins with dominant protein-protein interactions between the amino-terminal face of protein n and the carboxy-terminal face of protein n+3, and they predict that binding duplex DNA does not require large changes in B-form DNA geometry. Experimental tests using protein cross-linking analyzed by mass spectrometry, electrophoretic and analytical ultracentrifugation binding assays, and topological analyses with closed circular DNA show that the properties of multiprotein AGT-DNA complexes are consistent with these predictions.
Affiliation(s)
- Claire A Adams
- Department of Molecular and Cellular Biochemistry and Center for Structural Biology, University of Kentucky, Lexington, KY 40536, USA
|
30
|
Effects of ploidy and recombination on evolution of robustness in a model of the segment polarity network. PLoS Comput Biol 2009; 5:e1000296. [PMID: 19247428 PMCID: PMC2637435 DOI: 10.1371/journal.pcbi.1000296] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 06/09/2008] [Accepted: 01/20/2009] [Indexed: 11/19/2022]
Abstract
Many genetic networks are astonishingly robust to quantitative variation, allowing these networks to continue functioning in the face of mutation and environmental perturbation. However, the evolution of such robustness remains poorly understood for real genetic networks. Here we explore whether and how ploidy and recombination affect the evolution of robustness in a detailed computational model of the segment polarity network. We introduce a novel computational method that predicts the quantitative values of biochemical parameters from bit sequences representing genotype, allowing our model to bridge genotype to phenotype. Using this, we simulate 2,000 generations of evolution in a population of individuals under stabilizing and truncation selection, selecting for individuals that could sharpen the initial pattern of engrailed and wingless expression. Robustness was measured by simulating a mutation in the network and measuring the effect on the engrailed and wingless patterns; higher robustness corresponded to insensitivity of this pattern to perturbation. We compared robustness in diploid and haploid populations, with either asexual or sexual reproduction. In all cases, robustness increased, and the greatest increase was in diploid sexual populations; diploidy and sex synergized to evolve greater robustness than either acting alone. Diploidy conferred increased robustness by allowing most deleterious mutations to be rescued by a working allele. Sex (recombination) conferred a robustness advantage through "survival of the compatible": those alleles that can work with a wide variety of genetically diverse partners persist, and this selects for robust alleles.
|
31
|
Philippakis AA, Qureshi AM, Berger MF, Bulyk ML. Design of compact, universal DNA microarrays for protein binding microarray experiments. J Comput Biol 2008; 15:655-65. [PMID: 18651798 DOI: 10.1089/cmb.2007.0114] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.6] [Indexed: 11/12/2022]
Abstract
Our group has recently developed a compact, universal protein binding microarray (PBM) that can be used to determine the binding preferences of transcription factors (TFs). This design represents all possible sequence variants of a given length k (i.e., all k-mers) on a single array, allowing a complete characterization of the binding specificities of a given TF. Here, we present the mathematical foundations of this design based on de Bruijn sequences generated by linear feedback shift registers. We show that these sequences represent the maximum number of variants for any given set of array dimensions (i.e., number of spots and spot lengths), while also exhibiting desirable pseudo-randomness properties. Moreover, de Bruijn sequences can be selected that represent gapped sequence patterns, further increasing the coverage of the array. This design yields a powerful experimental platform that allows the binding preferences of TFs to be determined with unprecedented resolution.
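As a rough illustration of the coverage property described in this abstract, the following Python sketch builds a de Bruijn sequence over the DNA alphabet and checks that every k-mer occurs exactly once. It uses the standard recursive Lyndon-word construction, not the linear feedback shift registers the authors describe; the function name `de_bruijn` and the choice k = 3 are assumptions made only for this example.

```python
def de_bruijn(alphabet: str, k: int) -> str:
    """Return a cyclic de Bruijn sequence of order k over `alphabet`.

    Standard recursive (Lyndon-word) construction; this is a swapped-in
    technique for illustration, not the LFSR method named in the abstract.
    """
    n = len(alphabet)
    a = [0] * (n * k)
    sequence = []

    def db(t: int, p: int) -> None:
        if t > k:
            # Emit the prefix only when its period divides k.
            if k % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)


k = 3  # hypothetical k-mer length chosen for the example
cyclic = de_bruijn("ACGT", k)

# Append the first k-1 letters so the cyclic sequence can be read linearly.
linear = cyclic + cyclic[:k - 1]
kmers = [linear[i:i + k] for i in range(len(cyclic))]

print(len(cyclic), len(set(kmers)))
```

The cyclic sequence has 4^k letters, so a linear stretch of 4^k + k - 1 letters is enough to present every k-mer once, which is the compactness argument the abstract makes for the array design.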
Affiliation(s)
- Anthony A Philippakis
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
|
32
|
Marabotti A, Spyrakis F, Facchiano A, Cozzini P, Alberti S, Kellogg GE, Mozzarelli A. Energy-based prediction of amino acid-nucleotide base recognition. J Comput Chem 2008; 29:1955-69. [PMID: 18366021 DOI: 10.1002/jcc.20954] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.4] [Indexed: 12/12/2022]
Abstract
Despite decades of investigations, it is not yet clear whether there are rules dictating the specificity of the interaction between amino acids and nucleotide bases. This issue was addressed by determining, in a dataset consisting of 100 high-resolution protein-DNA structures, the frequency and energy of interaction between each amino acid and base, and the energetics of water-mediated interactions. The analysis was carried out using HINT, a non-Newtonian force field encoding both enthalpic and entropic contributions, and Rank, a geometry-based tool for evaluating hydrogen bond interactions. A frequency- and energy-based preferential interaction of Arg and Lys with G, Asp and Glu with C, and Asn and Gln with A was found. Not only favorable, but also unfavorable contacts were found to be conserved. Water-mediated interactions strongly increase the probability of Thr-A, Lys-A, and Lys-C contacts. The frequency, interaction energy, and water enhancement factors associated with each amino acid-base pair were used to predict the base triplet recognized by the helix motif in 45 zinc fingers, which represents an ideal case study for the analysis of one-to-one amino acid-base pair contacts. The model correctly predicted 70.4% of 135 amino acid-base pairs, and, by weighting the energetic relevance of each amino acid-base pair to the overall recognition energy, it yielded a prediction rate of 89.7%.
Affiliation(s)
- Anna Marabotti
- Laboratory for Bioinformatics and Computational Biology, Institute of Food Science, National Research Council, Avellino, Italy.
|
33
|
Omagari K, Yoshimura H, Suzuki T, Takano M, Ohmori M, Sarai A. ΔG-based prediction and experimental confirmation of SYCRP1-binding sites on the Synechocystis genome. FEBS J 2008; 275:4786-95. [DOI: 10.1111/j.1742-4658.2008.06618.x] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/29/2022]
|
34
|
Anashkina AA, Tumanyan VG, Kuznetsov EN, Galkin AV, Esipova NG. Relative occurrence of amino acid-nucleotide contacts assessed by Voronoi-Delaunay tessellation of protein-DNA interfaces. Biophysics (Nagoya-shi) 2008. [DOI: 10.1134/s0006350908030032] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/22/2022]
|
35
|
Babbitt GA, Kim Y. Inferring natural selection on fine-scale chromatin organization in yeast. Mol Biol Evol 2008; 25:1714-27. [PMID: 18515262 DOI: 10.1093/molbev/msn127] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 12/30/2022]
Abstract
Despite its potential role in the evolution of complex phenotypes, the detection of negative (purifying) and positive selection on noncoding regulatory sequence has been elusive because of the inherent difficulty in predicting the functional consequences of mutations on noncoding sequence. Because the functioning of regulatory sequence depends upon both chromatin configuration and cis-regulatory factor binding, we investigate the idea that the functional conservation of regulatory regions should be associated with the conservation of sequence-dependent bending properties of DNA that determine its affinity for the nucleosome. Recent advances in the computational prediction of sequence-dependent affinity to nucleosomes provide an opportunity to distinguish between neutral and nonneutral evolution of fine-scale chromatin organization. Here, a statistical test is presented for detecting evolutionary conservation and/or adaptive evolution of nucleosome affinity from interspecies comparisons of DNA sequences. Local nucleosome affinities of homologous sequences were calculated using 2 recently published methods. A randomization test was applied to sites of mutation to evaluate the similarity of DNA-nucleosome affinity between several closely related species of Saccharomyces yeast. For most of the genes we analyzed, the conservation of local nucleosome affinity was detected at a few distinct locations in the upstream noncoding region. Our results also demonstrate that different patterns of chromatin evolution have shaped DNA-nucleosome interaction at the core promoters of TATA-containing and TATA-less genes and that elevated purifying selection has maintained low affinity for nucleosome in the core promoters of the latter group. Across the entire yeast genome, DNA-nucleosome interaction was also discovered to be significantly more conserved in TATA-less genes compared with TATA-containing genes.
Affiliation(s)
- G A Babbitt
- Center for Evolutionary Functional Genomics, The Biodesign Institute, Arizona State University, USA.
|
36
|
Mrowka R, Blüthgen N, Fähling M. Seed-based systematic discovery of specific transcription factor target genes. FEBS J 2008; 275:3178-92. [PMID: 18485006 DOI: 10.1111/j.1742-4658.2008.06471.x] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/27/2022]
Abstract
Reliable prediction of specific transcription factor target genes is a major challenge in systems biology and functional genomics. Current sequence-based methods yield many false predictions because DNA-binding motifs are short and degenerate. Here, we describe a new systematic genome-wide approach, the seed-distribution-distance method, that searches large-scale genome-wide expression data for genes whose expression resembles that of known targets. This method is used to identify genes that are likely targets, allowing sequence-based methods to focus on a subset of genes, giving rise to fewer false-positive predictions. We show by cross-validation that this method is robust in recovering specific target genes. Furthermore, this method identifies genes with typical functions and binding motifs of the seed. The method is illustrated by predicting novel targets of the transcription factor nuclear factor kappaB (NF-kappaB). Among the new targets is optineurin, which plays a key role in the pathogenesis of acquired blindness caused by adult-onset primary open-angle glaucoma. We show experimentally that the optineurin gene and other predicted genes are targets of NF-kappaB. Thus, our data provide a missing link in the signalling of NF-kappaB and the damping function of optineurin in signalling feedback of NF-kappaB. We present a robust and reliable method to enhance the genome-wide prediction of specific transcription factor target genes that exploits the vast amount of expression information available in public databases today.
Affiliation(s)
- Ralf Mrowka
- Paul-Ehrlich-Zentrum für Experimentelle Medizin, AG Systems Biology-Computational Physiology, Tucholskystrasse 2, Berlin, Germany.
|
37
|
van Oeffelen L, Cornelis P, Van Delm W, De Ridder F, De Moor B, Moreau Y. Detecting cis-regulatory binding sites for cooperatively binding proteins. Nucleic Acids Res 2008; 36:e46. [PMID: 18400778 PMCID: PMC2377448 DOI: 10.1093/nar/gkn140] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022]
Abstract
Several methods are available to predict cis-regulatory modules in DNA based on position weight matrices. However, the performance of these methods generally depends on a number of additional parameters that cannot be derived from sequences and are difficult to estimate because they have no physical meaning. As the best way to detect cis-regulatory modules is the way in which the proteins recognize them, we developed a new scoring method that utilizes the underlying physical binding model. This method requires no additional parameter to account for multiple binding sites; and the only necessary parameters to model homotypic cooperative interactions are the distances between adjacent protein binding sites in basepairs, and the corresponding cooperative binding constants. The heterotypic cooperative binding model requires one more parameter per cooperatively binding protein, which is the concentration multiplied by the partition function of this protein. In a case study on the bacterial ferric uptake regulator, we show that our scoring method for homotypic cooperatively binding proteins significantly outperforms other PWM-based methods where biophysical cooperativity is not taken into account.
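To make the idea of a physical binding model with a cooperative binding constant more concrete, here is a minimal Python sketch of a textbook two-site statistical-thermodynamic model. It is not the authors' scoring method: the function `two_site_occupancy` and the parameters `k1`, `k2` (site association constants), `omega` (cooperativity factor), and `conc` (free protein concentration) are hypothetical names introduced for illustration, and the numerical values are arbitrary.

```python
def two_site_occupancy(k1: float, k2: float, omega: float, conc: float) -> dict:
    """State probabilities for one protein species binding two adjacent sites.

    Generic two-site model (an assumption for illustration):
      statistical weights are 1 (empty), k1*c, k2*c, and omega*k1*k2*c^2,
      and the partition function is their sum.
    """
    w1 = k1 * conc
    w2 = k2 * conc
    w12 = omega * w1 * w2          # both sites occupied, with cooperativity
    z = 1.0 + w1 + w2 + w12        # partition function over the four states
    return {
        "empty": 1.0 / z,
        "site1_only": w1 / z,
        "site2_only": w2 / z,
        "both": w12 / z,
    }


# Arbitrary example values: identical sites, with and without cooperativity.
independent = two_site_occupancy(k1=1.0, k2=1.0, omega=1.0, conc=0.5)
cooperative = two_site_occupancy(k1=1.0, k2=1.0, omega=10.0, conc=0.5)

print(independent["both"], cooperative["both"])
```

With `omega` set to 1 the sites bind independently; raising it shifts probability toward the doubly bound state, which is the effect a cooperativity-aware score is meant to capture.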
Affiliation(s)
- Liesbeth van Oeffelen
- Department of Electrical Engineering, ESAT-SCD, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium.
|
38
|
Laskowski RA, Thornton JM. Understanding the molecular machinery of genetics through 3D structures. Nat Rev Genet 2008; 9:141-51. [PMID: 18160966 DOI: 10.1038/nrg2273] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.4] [Indexed: 12/17/2022]
Abstract
Detailed knowledge of the three-dimensional structures of biological molecules has had an enormous impact on all areas of biological science, including genetics, as structure can reveal the fine details of how molecules perform their biological functions. Here we consider how changes in protein sequence affect the corresponding 3D structure, and describe how structural information about proteins, DNA and chromatin has shed light on gene regulatory mechanisms and the storage and transmission of epigenetic information. Finally, we describe how structure determination is benefiting from the high-throughput technologies of the worldwide structural genomics projects.
Affiliation(s)
- Roman A Laskowski
- European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
|
39
|
Datson NA, Morsink MC, Meijer OC, de Kloet ER. Central corticosteroid actions: Search for gene targets. Eur J Pharmacol 2008; 583:272-89. [PMID: 18295201 DOI: 10.1016/j.ejphar.2007.11.070] [Citation(s) in RCA: 96] [Impact Index Per Article: 6.0] [Received: 09/10/2007] [Revised: 11/12/2007] [Accepted: 11/14/2007] [Indexed: 12/14/2022]
Abstract
Although many of the physiological effects of corticosteroid stress hormones on neuronal function are well recognised, the underlying genomic mechanisms are only starting to be elucidated. Linking physiology and genomics has proven to be a complicated task, despite the emergence of large-scale gene expression profiling technology in the last decade. This is in part due to the complexity of glucocorticoid-signaling, in part due to the complexity of the brain itself. The presence of a binary receptor system for glucocorticoid hormones in limbic brain structures, the coexistence of membrane and intracellular receptors and the highly contextual action of glucocorticoids contribute to this complexity. In addition, the anatomical complexity, extensive cellular heterogeneity of brain and the modest changes in gene expression (mostly in the range of 10-30%) hamper detection of responsive genes, in particular of low abundant transcripts, such as many neurotransmitter receptors and growth factors. Nonetheless, ongoing research into central targets of glucocorticoids has identified many different functional gene classes that underlie the diverse effects of glucocorticoids on brain function. These functional classes include genes involved in energy metabolism, signal transduction, neuronal structure, vesicle dynamics, neurotransmitter catabolism, cell adhesion, genes encoding neurotrophic factors and their receptors and genes involved in regulating glucocorticoid-signalling. The aim of this review is to give an overview of the current status of the field on identification of central corticosteroid targets, discuss the opportunities and pitfalls and highlight new developments in understanding central corticosteroid action.
Affiliation(s)
- Nicole A Datson
- Division of Medical Pharmacology, Leiden/Amsterdam Center for Drug Research & Leiden University Medical Center, The Netherlands.
|
40
|
Jauch R, Ng CKL, Saikatendu KS, Stevens RC, Kolatkar PR. Crystal structure and DNA binding of the homeodomain of the stem cell transcription factor Nanog. J Mol Biol 2007; 376:758-70. [PMID: 18177668 DOI: 10.1016/j.jmb.2007.11.091] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.9] [Received: 10/03/2007] [Revised: 11/27/2007] [Accepted: 11/28/2007] [Indexed: 11/24/2022]
Abstract
The transcription factor Nanog is an upstream regulator in early mammalian development and a key determinant of pluripotency in embryonic stem cells. Nanog binds to promoter elements of hundreds of target genes and regulates their expression by an as yet unknown mechanism. Here, we report the crystal structure of the murine Nanog homeodomain (HD) and analysis of its interaction with a DNA element derived from the Tcf3 promoter. Two Nanog amino acid pairs, unique among HD sequences, appear to affect the mechanism of nonspecific DNA recognition as well as maintain the integrity of the structural scaffold. To assess selective DNA recognition by Nanog, we performed electrophoretic mobility shift assays using a panel of modified DNA binding sites and found that Nanog HD preferentially binds the TAAT(G/T)(G/T) motif. A series of rational mutagenesis experiments probing the role of six variant residues of Nanog on its DNA binding function establish their role in affecting binding affinity but not binding specificity. Together, the structural and functional evidence establish Nanog as a distant member of a Q50-type HD despite having considerable variation at the sequence level.
Affiliation(s)
- Ralf Jauch
- Laboratory of Structural Biochemistry, Genome Institute of Singapore, 60 Biopolis Street, Singapore 138672, Singapore.
|
41
|
Coulocheri SA, Pigis DG, Papavassiliou KA, Papavassiliou AG. Hydrogen bonds in protein–DNA complexes: Where geometry meets plasticity. Biochimie 2007; 89:1291-303. [PMID: 17825469 DOI: 10.1016/j.biochi.2007.07.020] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.5] [Received: 05/05/2007] [Accepted: 07/20/2007] [Indexed: 12/27/2022]
Abstract
Recognition of a DNA sequence by a protein is achieved by interface-coupled chemical and shape complementation. This complementation between the two molecules is clearly directional and is determined by the specific chemical contacts, mainly hydrogen bonds. Directionality is an instrumental property of hydrogen bonding as it influences molecular conformations, which also affects DNA-protein recognition. The prominent elements in the recognition of a particular DNA sequence by a protein are the hydrogen-bond donors and acceptors of the base pairs in the grooves of the DNA, which must interact with complementary moieties of the protein partner. Protein side chains make most of the crucial contacts through bidentate and complex hydrogen-bonding interactions with DNA base edges, hence conferring remarkable specificity.
Affiliation(s)
- Stavroula A Coulocheri
- Department of Biological Chemistry, Medical School, University of Athens, Athens, Greece
|
42
|
Copley RR, Totrov M, Linnell J, Field S, Ragoussis J, Udalova IA. Functional conservation of Rel binding sites in drosophilid genomes. Genome Res 2007; 17:1327-35. [PMID: 17785540 PMCID: PMC1950901 DOI: 10.1101/gr.6490707] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Indexed: 01/23/2023]
Abstract
Evolutionary constraints on gene regulatory elements are poorly understood: Little is known about how the strength of transcription factor binding correlates with DNA sequence conservation, and whether transcription factor binding sites can evolve rapidly while retaining their function. Here we use the model of the NFKB/Rel-dependent gene regulation in divergent Drosophila species to examine the hypothesis that the functional properties of authentic transcription factor binding sites are under stronger evolutionary constraints than the genomic background. Using molecular modeling we compare tertiary structures of the Drosophila Rel family proteins Dorsal, Dif, and Relish and demonstrate that their DNA-binding and protein dimerization domains undergo distinct rates of evolution. The accumulated amino acid changes, however, are unlikely to affect DNA sequence recognition and affinity. We employ our recently developed microarray-based experimental platform and principal coordinates statistical analysis to quantitatively and systematically profile DNA binding affinities of three Drosophila Rel proteins to 10,368 variants of the NFKB recognition sequences. We then correlate the evolutionary divergence of gene regulatory regions with differences in DNA binding affinities. Genome-wide analyses reveal a significant increase in the number of conserved Rel binding sites in promoters of developmental and immune genes. Significantly, the affinity of Rel proteins to these sites was higher than to less conserved sites and was maintained by the conservation of the DNA binding site sequence (static conservation) or in some cases despite significantly diverged sequences (dynamic conservation). We discuss how two types of conservation may contribute to the stabilization and optimization of a functional gene regulatory code in evolution.
Affiliation(s)
- Richard R. Copley
- Wellcome Trust Centre for Human Genetics, Oxford University, Oxford OX3 7BN, United Kingdom
- Corresponding authors: fax 44-208-3834499; fax 44-1865-287664
- Jane Linnell
- Wellcome Trust Centre for Human Genetics, Oxford University, Oxford OX3 7BN, United Kingdom
- Simon Field
- Wellcome Trust Centre for Human Genetics, Oxford University, Oxford OX3 7BN, United Kingdom
- Jiannis Ragoussis
- Wellcome Trust Centre for Human Genetics, Oxford University, Oxford OX3 7BN, United Kingdom
- Irina A. Udalova
- Wellcome Trust Centre for Human Genetics, Oxford University, Oxford OX3 7BN, United Kingdom
- Kennedy Institute of Rheumatology, Imperial College, London W6 8LH, United Kingdom
- Corresponding authors: fax 44-208-3834499; fax 44-1865-287664
|
43
|
Abstract
DNA-protein interactions are fundamental to many biological processes, including the regulation of gene expression. Determining the binding affinities of transcription factors (TFs) to different DNA sequences allows the quantitative modeling of transcriptional regulatory networks and has been a significant technical challenge in molecular biology for many years. A recent paper by Maerkl and Quake [1] demonstrated the use of microfluidic technology for the analysis of DNA-protein interactions. An array of short DNA sequences was spotted onto a glass slide, which was then covered with a microfluidic device so that each spot sat within a chamber into which the flow of materials was controlled by valves. By trapping the DNA-protein complexes on the surface and measuring their concentrations microscopically, they could determine the binding affinity to a large number of DNA sequences that were varied systematically. They studied four TFs from the basic helix-loop-helix family of proteins, all of which bind to E-box sites with the consensus CAnnTG (where "n" can be any base), and showed that variations in affinity for different sites allow each TF to regulate different genes.
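As a small worked example of the E-box consensus mentioned in this abstract, the Python sketch below expands CAnnTG into its concrete sites. The helper name `expand_consensus` is hypothetical and introduced only for illustration; the sketch says nothing about the measured affinities, only about the size of the sequence space that such an experiment would need to cover for the two variable positions.

```python
from itertools import product


def expand_consensus(pattern: str, alphabet: str = "ACGT") -> list[str]:
    """Expand a consensus string in which 'n' stands for any base."""
    # Each position contributes either its fixed letter or the full alphabet.
    slots = [alphabet if ch == "n" else ch for ch in pattern]
    return ["".join(combo) for combo in product(*slots)]


ebox_sites = expand_consensus("CAnnTG")

# Two free positions over four bases give 4 * 4 concrete sites.
print(len(ebox_sites))
print(ebox_sites[:4])
```

Because only the two central positions vary, the consensus covers 16 sequences, so differences in affinity among these sites (and their flanking bases) are what can distinguish one family member's targets from another's.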
Affiliation(s)
- Gary D Stormo
- Department of Genetics, Washington University School of Medicine, St Louis, MO 63110, USA.
|
44
|
Moroni E, Caselle M, Fogolari F. Identification of DNA-binding protein target sequences by physical effective energy functions: free energy analysis of lambda repressor-DNA complexes. BMC Struct Biol 2007; 7:61. [PMID: 17900341 PMCID: PMC2194778 DOI: 10.1186/1472-6807-7-61] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 02/20/2007] [Accepted: 09/27/2007] [Indexed: 11/26/2022]
Abstract
Background: Specific binding of proteins to DNA is one of the most common ways gene expression is controlled. Although general rules for DNA-protein recognition can be derived, the ambiguous and complex nature of this mechanism precludes a simple recognition code; therefore, the prediction of DNA target sequences is not straightforward. DNA-protein interactions can be studied using computational methods, which can complement the current experimental methods and offer some advantages. In the present work we use physical effective potentials to evaluate the DNA-protein binding affinities for the λ repressor-DNA complex, for which structural and thermodynamic experimental data are available.
Results: The binding free energy of two molecules can be expressed as the sum of an intermolecular energy (evaluated using a molecular mechanics forcefield), a solvation free energy term and an entropic term. Different solvation models are used, including distance-dependent dielectric constants, solvent accessible surface tension models and the Generalized Born model. The effect of conformational sampling by Molecular Dynamics simulations on the computed binding energy is assessed; results show that this effect is in general negative and that the reproducibility of the experimental values decreases as the simulation time considered increases. The free energy of binding for non-specific complexes, estimated using the best energetic model, agrees with earlier theoretical suggestions. As a result of these analyses, we propose a protocol for the prediction of DNA-binding target sequences. The possibility of searching for regulatory elements within the bacteriophage λ genome using this protocol is explored. Our analysis shows good prediction capabilities, even in the absence of any thermodynamic data and information on the naturally recognized sequence.
Conclusion: This study supports the conclusion that physics-based methods can offer a completely complementary methodology to sequence-based methods for the identification of DNA-binding protein target sequences.
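The decomposition stated in the Results of this abstract can be written compactly as below. The symbols are assumptions chosen for this note rather than the authors' notation: \Delta E_{\mathrm{MM}} is the intermolecular molecular-mechanics energy, \Delta G_{\mathrm{solv}} the solvation free energy term, and the entropic term is written in the conventional form -T\Delta S.

```latex
\Delta G_{\mathrm{bind}}
  \;=\; \Delta E_{\mathrm{MM}}
  \;+\; \Delta G_{\mathrm{solv}}
  \;-\; T\,\Delta S
```

Under this reading, the different solvation models mentioned in the abstract change only how \Delta G_{\mathrm{solv}} is evaluated, while the other two terms keep the same form.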
Affiliation(s)
- Elisabetta Moroni
- Dipartimento di Fisica Teorica, Università di Torino and INFN, Via P. Giuria 1, 10125 Torino, Italy
- Dipartimento di Fisica G. Occhialini, Università di Milano-Bicocca and INFN, Piazza delle Scienze 3, 20156 Milano, Italy
- Michele Caselle
- Dipartimento di Fisica Teorica, Università di Torino and INFN, Via P. Giuria 1, 10125 Torino, Italy
- Federico Fogolari
- Dipartimento di Scienze e Tecnologie Biomediche, Università di Udine, P.le Kolbe 4, 33100 Udine, Italy
|
45
|
Abstract
Statistical analysis of structures from the PDB has been used to examine the role that the aromatic amino acids play in protein-nucleic acid recognition. In protein-DNA complexes, the residues Phe and His are found to bind selectively to the DNA chain: Phe to A and T, and His to T and G. The preferred binding modes are identified, and the interactions involving Phe are shown to be important in the transcription process. In protein-RNA complexes, Phe is found to occur far less often and is instead replaced by Trp, which binds selectively to C and G, offering a possible mechanism for differentiation between the two nucleic acids. SASA analysis of the two sets of complexes suggests that all of the aromatic amino acids are more heavily involved in binding than would be expected on the balance of probability. Phe and Tyr occur at approximately equal frequencies in both sets of data, whereas the proportions of His and Trp vary considerably, supporting the idea that these residues may be involved in differentiating between the two nucleic acids.
Affiliation(s)
- Christopher M Baker
- Department of Chemistry, Physical and Theoretical Chemistry Laboratory, University of Oxford, South Parks Road, Oxford OX1 3QZ, UK
|
46
|
Marabotti A, Colonna G, Facchiano A. New computational strategy to analyze the interactions of ERalpha and ERbeta with different ERE sequences. J Comput Chem 2007; 28:1031-41. [PMID: 17269124 DOI: 10.1002/jcc.20582] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022]
Abstract
The importance of computational methods for the simulation and analysis of biological systems has increased in recent years. In particular, methods to predict binding energies are being developed not only with the aim of ranking the affinities between two or more complexes, but also to quantify the contribution of different types of interaction. In this work, we present the application of HINT, a non-Newtonian force field, to rank the affinities of complexes formed by estrogen receptors (ER) alpha and beta and different estrogen responsive elements (ERE) near the estrogen-regulated genes. We used the crystallographic coordinates of the DNA binding domain of ERalpha complexed to a consensus ERE as a starting point to simulate several complexes in which some nucleotides in the ERE sequence were mutated. Moreover, we used homology modeling methods to create the structure of the complexes between the DNA binding domain of ERbeta (for which no experimental structures are currently available) and the same ERE sequences. Our results show that HINT is able to rank the affinities of ERalpha and ERbeta for different ERE sequences, and to correctly identify the positions on the DNA sequence that are most important for binding affinity. Moreover, the HINT output gives us the opportunity to identify and quantify the role played by each single atom of amino acids and nucleotides in the binding event, as well as to predict the effect on the binding affinity for other nucleotide mutations.
Affiliation(s)
- Anna Marabotti
- Laboratory of Bioinformatics and Computational Biology, Institute of Food Science, National Research Council, Avellino, Italy.
|
47
|
Lou C, Yang X, Liu X, He B, Ouyang Q. A quantitative study of lambda-phage SWITCH and its components. Biophys J 2007; 92:2685-93. [PMID: 17259278 PMCID: PMC1831702 DOI: 10.1529/biophysj.106.097089] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 11/18/2022]
Abstract
We propose what we believe is a new model to quantitatively describe the lambda-phage SWITCH system. The model incorporates the facilitated transfer mechanism of the transcription factor, which can be simplified into a two-step reaction. We first sequentially obtain two indispensable parameters by fitting our model to experimental data of two simple systems, and then apply them to study the natural lambda-SWITCH system. By incorporating the facilitated transfer mechanism, we find that in a RecA(-) host Escherichia coli, the wild-type lambda-lysogenic state is in a monostable regime rather than in a bistable regime. Furthermore, the model explains the weak role of the Cro protein and probably sheds light on the evolution of the lambda-Cro protein, which is known to be structurally distinct from the other Cros in lambdoid family members.
Affiliation(s)
- Chunbo Lou
- Center for Theoretical Biology and School of Physics, Peking University, Beijing, 100871, China
|
48
|
Energetics of the protein-DNA-water interaction. BMC STRUCTURAL BIOLOGY 2007; 7:4. [PMID: 17214883 PMCID: PMC1781455 DOI: 10.1186/1472-6807-7-4] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.8] [Received: 09/14/2006] [Accepted: 01/10/2007] [Indexed: 11/30/2022]
Abstract
Background: To understand the energetics of the interaction between protein and DNA we analyzed 39 crystallographically characterized complexes with the HINT (Hydropathic INTeractions) computational model. HINT is an empirical free energy force field based on solvent partitioning of small molecules between water and 1-octanol. Our previous studies on protein-ligand complexes demonstrated that free energy predictions were significantly improved by taking into account the energetic contribution of water molecules that form at least one hydrogen bond with each interacting species. Results: An initial correlation between the calculated HINT scores and the experimentally determined binding free energies in the protein-DNA system exhibited a relatively poor r² of 0.21 and a standard error of ± 1.71 kcal mol⁻¹. However, the inclusion of 261 waters that bridge protein and DNA improved the HINT score-free energy correlation to an r² of 0.56 and a standard error of ± 1.28 kcal mol⁻¹. Analysis of the water role and energy contributions indicates that 46% of the bridging waters act as linkers between amino acids and nucleotide bases at the protein-DNA interface, while the remaining 54% are largely involved in screening unfavorable electrostatic contacts. Conclusion: This study quantifies the key energetic role of bridging waters in protein-DNA associations. In addition, the relevant role of hydrophobic interactions and entropy in driving protein-DNA association is indicated by analyses of interaction character showing that, together, the favorable polar and unfavorable polar/hydrophobic-polar interactions (i.e., desolvation) mostly cancel.
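As an illustration of the two statistics quoted in this abstract (r² and the standard error of the fit), the following is a minimal sketch of an ordinary least-squares fit of experimental binding free energies against computed scores. It is not the authors' procedure: the function name `linear_fit_stats` is hypothetical, and the standard error is assumed here to mean the residual standard error with n - 2 degrees of freedom, which the abstract does not specify.

```python
from math import sqrt
from statistics import mean


def linear_fit_stats(scores, energies):
    """Fit energies = intercept + slope * scores by least squares.

    Returns (slope, intercept, r2, std_err), where std_err is the residual
    standard error with n - 2 degrees of freedom (an assumption of this sketch).
    """
    n = len(scores)
    if n != len(energies) or n < 3:
        raise ValueError("need two equally long sequences with at least 3 points")

    mx, my = mean(scores), mean(energies)
    sxx = sum((x - mx) ** 2 for x in scores)
    syy = sum((y - my) ** 2 for y in energies)
    sxy = sum((x - mx) * (y - my) for x, y in zip(scores, energies))
    if sxx == 0 or syy == 0:
        raise ValueError("scores and energies must each vary")

    slope = sxy / sxx
    intercept = my - slope * mx

    # Sum of squared residuals around the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(scores, energies))
    r2 = 1.0 - ss_res / syy
    std_err = sqrt(ss_res / (n - 2))
    return slope, intercept, r2, std_err
```

Called with one computed score and one measured free energy per complex, the function returns the quantities that would be compared before and after adding bridging waters to the model.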
|
49
|
Davuluri RV. Bioinformatics tools for modeling transcription factor target genes and epigenetic changes. Methods Mol Biol 2007; 408:129-151. [PMID: 18314581 DOI: 10.1007/978-1-59745-547-3_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023]
Abstract
The combinatorial control of gene regulatory switches involves both transcription factor (TF) complexes and associated epigenetic modifications to the chromatin template. Novel high-throughput technologies, such as chromatin immunoprecipitation on microarrays (ChIP-chip), have enabled genome-wide in vivo identification of TF target regulatory regions and related epigenetic modifications, which has led to the view that TF-DNA interactions in activated or repressed promoters are highly dynamic. Consequently, modeling and elucidating the combinatorial interaction of TFs and corresponding cis-regulatory modules in target promoters is of paramount interest. An estimated 5% of the genes in mammalian genomes code for TF proteins, and computational modeling of cis-regulatory logic would rapidly increase the pace of experimental confirmation of TF target promoters at the bench. The purpose of this chapter is to discuss the use of different bioinformatics tools for predicting the target genes of TFs of interest in mammalian genomes, and the application of these methods in the analysis of ChIP-chip experimental data. The author describes the most commonly used databases and prediction programs that are available on the World Wide Web and demonstrates the use of some of these programs with an example. A list of these programs is provided along with their Uniform Resource Locators (URLs), and guidelines for successful application are suggested.
Affiliation(s)
- Ramana V Davuluri
- OSU Comprehensive Cancer Center, Ohio State University, Columbus, USA
|
50
|
Qian J, Lin J, Zack DJ. Characterization of binding sites of eukaryotic transcription factors. GENOMICS PROTEOMICS & BIOINFORMATICS 2006; 4:67-79. [PMID: 16970547 PMCID: PMC5054036 DOI: 10.1016/s1672-0229(06)60019-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 02/02/2023]
Abstract
To explore the nature of eukaryotic transcription factor (TF) binding sites and determine how they differ from surrounding DNA sequences, we examined four features associated with DNA binding sites: G+C content, pattern complexity, palindromic structure, and Markov sequence ordering. Our analysis of the regulatory motifs obtained from the TRANSFAC database, using yeast intergenic sequences as background, revealed that these four features show variable enrichment in motif sequences. For example, motif sequences were more likely to have palindromic structure than were background sequences. In addition, these features were tightly localized to the regulatory motifs, indicating that they are a property of the motif sequences themselves and are not shared by the general promoter “environment” in which the regulatory motifs reside. By breaking down the motif sequences according to the TF classes to which they bind, more specific associations were identified. Finally, we found that some correlations, such as G+C content enrichment, were species-specific, while others, such as complexity enrichment, were universal across the species examined. The quantitative analysis provided here should increase our understanding of protein-DNA interactions and also help facilitate the discovery of regulatory motifs through bioinformatics.
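Two of the four features named in this abstract, G+C content and palindromic structure, can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not give the authors' exact definitions, so "palindromic" is taken here to mean that a sequence equals its own reverse complement, and the function names are hypothetical. Only unambiguous A, C, G, T bases are handled.

```python
# Translation table mapping each base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic(seq):
    """True if the sequence reads the same on both strands.

    Assumed definition: the sequence equals its reverse complement.
    """
    seq = seq.upper()
    return seq == reverse_complement(seq)
```

For example, the EcoRI site `GAATTC` is its own reverse complement, so `is_palindromic("GAATTC")` is true, and its G+C content is 2/6. Comparing such values between motif sequences and a background set would be one way to test for enrichment.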
Affiliation(s)
- Jiang Qian
- The Wilmer Institute, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA.
|