51
Zykovich A, Korf I, Segal DJ. Bind-n-Seq: high-throughput analysis of in vitro protein-DNA interactions using massively parallel sequencing. Nucleic Acids Res 2009; 37:e151. [PMID: 19843614 PMCID: PMC2794170 DOI: 10.1093/nar/gkp802] [Citation(s) in RCA: 115] [Impact Index Per Article: 8.2] [Indexed: 12/04/2022]
Abstract
Transcription factor–DNA interactions are some of the most important processes in biology because they directly control hereditary information. The targets of most transcription factors are unknown. In this report, we introduce Bind-n-Seq, a new high-throughput method for analyzing protein–DNA interactions in vitro, with several advantages over current methods. The procedure has three steps: (i) binding proteins to randomized oligonucleotide DNA targets, (ii) sequencing the bound oligonucleotides with massively parallel technology and (iii) finding motifs among the sequences. De novo binding motifs determined by this method for the DNA-binding domains of two well-characterized zinc-finger proteins were similar to those described previously. Furthermore, calculations of the relative affinity of the proteins for specific DNA sequences correlated significantly with previous studies (R² = 0.9). These results present Bind-n-Seq as a highly rapid and parallel method for determining in vitro binding sites and relative affinities.
Affiliation(s)
- Artem Zykovich
- Genome Center, University of California, Davis, CA 95616, USA
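The relative-affinity step described in entry 51 rests on comparing how often a sequence appears among protein-bound reads with how often it appears in the starting randomized library. The sketch below shows one generic way to do that, a pseudocounted k-mer enrichment ratio. It is an illustration under that assumption, not the Bind-n-Seq authors' pipeline; the function names, the forward-strand-only counting and the pseudocount are all hypothetical choices.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every overlapping k-mer across a list of reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def relative_enrichment(bound_reads, background_reads, k, pseudocount=1.0):
    """Ratio of each k-mer's frequency in bound reads to its frequency in the background library.

    Frequencies are normalized by total k-mer counts; the pseudocount avoids division by zero.
    """
    bound = kmer_counts(bound_reads, k)
    background = kmer_counts(background_reads, k)
    kmers = set(bound) | set(background)
    bound_total = sum(bound.values()) + pseudocount * len(kmers)
    background_total = sum(background.values()) + pseudocount * len(kmers)
    return {
        kmer: ((bound[kmer] + pseudocount) / bound_total)
        / ((background[kmer] + pseudocount) / background_total)
        for kmer in kmers
    }


if __name__ == "__main__":
    bound = ["GCGTGGGCG", "AGCGTGGGC"]        # toy "bound" reads
    library = ["ACGTACGTA", "TTGACCATG"]      # toy randomized background
    scores = relative_enrichment(bound, library, k=3)
    for kmer, ratio in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
        print(kmer, round(ratio, 2))
```

A ratio above 1 marks a k-mer that is over-represented among bound sequences; ranking k-mers this way is a crude proxy for relative affinity that a real analysis would refine (for example by counting both strands).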
52
van Hijum SAFT, Medema MH, Kuipers OP. Mechanisms and evolution of control logic in prokaryotic transcriptional regulation. Microbiol Mol Biol Rev 2009; 73:481-509. [PMID: 19721087 PMCID: PMC2738135 DOI: 10.1128/mmbr.00037-08] [Citation(s) in RCA: 96] [Impact Index Per Article: 6.4] [Indexed: 11/20/2022]
Abstract
A major part of organismal complexity and versatility of prokaryotes resides in their ability to fine-tune gene expression to adequately respond to internal and external stimuli. Evolution has been very innovative in creating intricate mechanisms by which different regulatory signals operate and interact at promoters to drive gene expression. The regulation of target gene expression by transcription factors (TFs) is governed by control logic brought about by the interaction of regulators with TF binding sites (TFBSs) in cis-regulatory regions. A factor that in large part determines the strength of the response of a target to a given TF is motif stringency, the extent to which the TFBS fits the optimal TFBS sequence for a given TF. Advances in high-throughput technologies and computational genomics allow reconstruction of transcriptional regulatory networks in silico. To optimize the prediction of transcriptional regulatory networks, i.e., to separate direct regulation from indirect regulation, a thorough understanding of the control logic underlying the regulation of gene expression is required. This review summarizes the state of the art of the elements that determine the functionality of TFBSs by focusing on the molecular biological mechanisms and evolutionary origins of cis-regulatory regions.
Affiliation(s)
- Sacha A F T van Hijum
- Molecular Genetics, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Kerklaan 30, 9751 NN Haren, The Netherlands.
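Entry 52 defines motif stringency as the extent to which a binding site fits the optimal site sequence for a given TF. One common way to put a number on that fit, assumed here rather than taken from the review, is a log-odds position weight matrix (PWM) score divided by the score of the consensus sequence. The counts, the uniform background and the helper names below are hypothetical.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=0.5, background=None):
    """Turn per-position base counts into a log2-odds position weight matrix."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background[b])
            for b in BASES
        })
    return pwm


def site_score(pwm, site):
    """Sum of per-position log-odds scores (positions treated as independent)."""
    if len(site) != len(pwm):
        raise ValueError("site length must match motif length")
    return sum(column[base] for column, base in zip(pwm, site))


def relative_fit(pwm, site):
    """Site score divided by the consensus (maximum) score; 1.0 is a perfect match."""
    best = sum(max(column.values()) for column in pwm)
    return site_score(pwm, site) / best


if __name__ == "__main__":
    # Hypothetical counts from 8 aligned sites of a 3-bp motif "ACG".
    counts = [{"A": 8}, {"C": 8}, {"G": 8}]
    pwm = log_odds_pwm(counts)
    for site in ("ACG", "ACT", "TTT"):
        print(site, round(relative_fit(pwm, site), 3))
```

The normalization assumes the consensus score is positive, which holds for any informative motif; a flat matrix would need a different reference point.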
53
Wunderlich Z, Mirny LA. Using genome-wide measurements for computational prediction of SH2-peptide interactions. Nucleic Acids Res 2009; 37:4629-41. [PMID: 19502496 PMCID: PMC2724268 DOI: 10.1093/nar/gkp394] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 12/25/2022]
Abstract
Peptide-recognition modules (PRMs) are used throughout biology to mediate protein–protein interactions, and many PRMs are members of large protein domain families. Recent genome-wide measurements describe networks of peptide–PRM interactions. In these networks, very similar PRMs recognize distinct sets of peptides, raising the question of how peptide-recognition specificity is achieved using similar protein domains. The analysis of individual protein complex structures often gives answers that are not easily applicable to other members of the same PRM family. Bioinformatics-based approaches, on the other hand, may be difficult to interpret physically. Here we integrate structural information with a large, quantitative data set of SH2 domain–peptide interactions to study the physical origin of domain–peptide specificity. We develop an energy model, inspired by protein folding, based on interactions between the amino-acid positions in the domain and peptide. We use this model to successfully predict which SH2 domains and peptides interact and uncover the positions in each that are important for specificity. The energy model is general enough that it can be applied to other members of the SH2 family or to new peptides, and the cross-validation results suggest that these energy calculations will be useful for predicting binding interactions. It can also be adapted to study other PRM families, predict optimal peptides for a given SH2 domain, or study other biological interactions, e.g. protein–DNA interactions.
Affiliation(s)
- Zeba Wunderlich
- Biophysics Program, Harvard University, Cambridge, MA 02138, USA
54
Narlikar L, Ovcharenko I. Identifying regulatory elements in eukaryotic genomes. Brief Funct Genomic Proteomic 2009; 8:215-30. [PMID: 19498043 DOI: 10.1093/bfgp/elp014] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.9] [Indexed: 12/30/2022]
Abstract
Proper development and functioning of an organism depend on precise spatial and temporal expression of all its genes. These coordinated expression patterns are maintained primarily through the process of transcriptional regulation. Transcriptional regulation is mediated by proteins binding to regulatory elements on the DNA in a combinatorial manner, where particular combinations of transcription factor binding sites establish specific regulatory codes. In this review, we survey experimental and computational approaches geared towards the identification of proximal and distal gene regulatory elements in the genomes of complex eukaryotes. Available approaches that decipher the genetic structure and function of regulatory elements by exploiting various sources of information like gene expression data, chromatin structure, DNA-binding specificities of transcription factors, cooperativity of transcription factors, etc. are highlighted. We also discuss the relevance of regulatory elements to human health, with examples of mutations in some of these regions that cause misregulation of genes and are strongly associated with human disorders.
Affiliation(s)
- Leelavati Narlikar
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
55
Grigoryan G, Reinke AW, Keating AE. Design of protein-interaction specificity gives selective bZIP-binding peptides. Nature 2009; 458:859-64. [PMID: 19370028 PMCID: PMC2748673 DOI: 10.1038/nature07885] [Citation(s) in RCA: 289] [Impact Index Per Article: 19.3] [Received: 08/06/2008] [Accepted: 02/09/2009] [Indexed: 01/14/2023]
Abstract
Interaction specificity is a required feature of biological networks and a necessary characteristic of protein or small-molecule reagents and therapeutics. The ability to alter or inhibit protein interactions selectively would advance basic and applied molecular science. Assessing or modelling interaction specificity requires treating multiple competing complexes, which presents computational and experimental challenges. Here we present a computational framework for designing protein-interaction specificity and use it to identify specific peptide partners for human basic-region leucine zipper (bZIP) transcription factors. Protein microarrays were used to characterize designed, synthetic ligands for all but one of 20 bZIP families. The bZIP proteins share strong sequence and structural similarities and thus are challenging targets to bind specifically. Nevertheless, many of the designs, including examples that bind the oncoproteins c-Jun, c-Fos and c-Maf (also called JUN, FOS and MAF, respectively), were selective for their targets over all 19 other families. Collectively, the designs exhibit a wide range of interaction profiles and demonstrate that human bZIPs have only sparsely sampled the possible interaction space accessible to them. Our computational method provides a way to systematically analyse trade-offs between stability and specificity and is suitable for use with many types of structure-scoring functions; thus, it may prove broadly useful as a tool for protein design.
Affiliation(s)
- Gevorg Grigoryan
- MIT Department of Biology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
56
Temiz NA, Camacho CJ. Experimentally based contact energies decode interactions responsible for protein-DNA affinity and the role of molecular waters at the binding interface. Nucleic Acids Res 2009; 37:4076-88. [PMID: 19429892 PMCID: PMC2709573 DOI: 10.1093/nar/gkp289] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Indexed: 11/23/2022]
Abstract
A major obstacle towards understanding the molecular basis of transcriptional regulation is the lack of a recognition code for protein–DNA interactions. Using high-quality crystal structures and binding data on the promiscuous family of C2H2 zinc fingers (ZF), we decode 10 fundamental specific interactions responsible for protein–DNA recognition. The interactions include five hydrogen bond types, three atomic desolvation penalties, a favorable non-polar energy, and a novel water accessibility factor. We apply this code to three large datasets containing a total of 89 C2H2 transcription factor (TF) mutants on the three ZFs of EGR. Guided by molecular dynamics simulations of individual ZFs, we map the interactions into homology models that embody all feasible intra- and intermolecular bonds, selecting for each sequence the structure with the lowest free energy. These interactions reproduce the change in affinity of 35 mutants of finger I (R² = 0.998), 23 mutants of finger II (R² = 0.96) and 31 finger III human domains (R² = 0.94). Our findings reveal recognition rules that depend on DNA sequence/structure, molecular water at the interface and induced fit of the C2H2 TFs. Collectively, our method provides the first robust framework to decode the molecular basis of TFs binding to DNA.
Affiliation(s)
- N Alpay Temiz
- Department of Computational Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
57
Reeves GA, Talavera D, Thornton JM. Genome and proteome annotation: organization, interpretation and integration. J R Soc Interface 2009; 6:129-47. [PMID: 19019817 DOI: 10.1098/rsif.2008.0341] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Indexed: 11/12/2022]
Abstract
Recent years have seen a huge increase in the generation of genomic and proteomic data. This has been due to improvements in current biological methodologies, the development of new experimental techniques and the use of computers as support tools. All these raw data are useless if they cannot be properly analysed, annotated, stored and displayed. Consequently, a vast number of resources have been created to present the data to the wider community. Annotation tools and databases provide the means to disseminate these data and to comprehend their biological importance. This review examines the various aspects of annotation: type, methodology and availability. Moreover, it places special emphasis on novel annotation fields, such as that of phenotypes, and highlights recent efforts focused on integrating annotations.
Affiliation(s)
- Gabrielle A Reeves
- EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
58
Liu Z, Guo JT, Li T, Xu Y. Structure-based prediction of transcription factor binding sites using a protein-DNA docking approach. Proteins 2008; 72:1114-24. [PMID: 18320590 DOI: 10.1002/prot.22002] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Indexed: 11/11/2022]
Abstract
Accurate identification of transcription factor binding sites is critical to our understanding of transcriptional regulatory networks. To overcome the high false-positive rates that trouble sequence-based prediction techniques, we have developed a structure-based prediction method that takes into consideration the interactions between the amino acids of a transcription factor and the nucleotides of its DNA binding sequence at the structural level, along with an efficient protein-DNA docking algorithm. Docked protein-DNA structures are evaluated using a knowledge-based energy function, in conjunction with van der Waals energy. Our docking algorithm supports quasi-flexible docking, overcoming a number of limiting issues faced by similar docking algorithms. Our rigid-body docking algorithm is tested on a dataset of 141 nonredundant transcription factor-DNA complex structures. The test results show that 63.1% of the 141 complex structures are reconstructed with a root mean square deviation (RMSD) below 1.0 Å and 79.4% of the complexes are predicted with an RMSD below 3.0 Å when using the native DNA structures. Our quasi-flexible docking algorithm, assuming that the DNA structures are not known, is tested on a separate set of 45 transcription factor-DNA complexes, of which 57.8% of the docked complex conformations achieve an RMSD below 1.0 Å while 71.1% of the complexes have RMSDs below 3.0 Å. We have also applied our method to predict the binding motifs of the ferric uptake regulator in E. coli and shown that most of the experimentally identified sites can be predicted accurately.
Affiliation(s)
- Zhijie Liu
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, Georgia 30602, USA
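The accuracy figures in entry 58 are RMSD values between docked and native complexes, summarized as the fraction of complexes below 1.0 Å and 3.0 Å. The sketch below computes both quantities, assuming the two coordinate sets already list matching atoms in the same order and the same reference frame; no superposition step is included, and the function names are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of (x, y, z) points.

    Assumes atoms are already paired by index and share a reference frame (no fitting).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and the same length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def fraction_below(values, cutoff):
    """Fraction of values strictly below a cutoff, e.g. complexes under 1.0 angstrom RMSD."""
    return sum(1 for v in values if v < cutoff) / len(values)


if __name__ == "__main__":
    native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    docked = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(native, docked))                          # every atom shifted by 1.0
    print(fraction_below([0.5, 0.9, 2.0, 4.0], 1.0))
```

In practice docked and native structures are usually superimposed (for example with the Kabsch algorithm) before the RMSD is taken; whether that applies depends on how a given study defines its metric.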
59
Persikov AV, Osada R, Singh M. Predicting DNA recognition by Cys2His2 zinc finger proteins. Bioinformatics 2008; 25:22-9. [PMID: 19008249 DOI: 10.1093/bioinformatics/btn580] [Citation(s) in RCA: 87] [Impact Index Per Article: 5.4] [Indexed: 11/13/2022]
Abstract
MOTIVATION Cys2His2 zinc finger (ZF) proteins represent the largest class of eukaryotic transcription factors. Their modular structure and well-conserved protein-DNA interface allow the development of computational approaches for predicting their DNA-binding preferences even when no binding sites are known for a particular protein. The 'canonical model' for ZF protein-DNA interaction consists of only four amino acid-nucleotide contacts per zinc finger domain. RESULTS We present an approach for predicting ZF binding based on support vector machines (SVMs). While most previous computational approaches have been based solely on examples of known ZF protein-DNA interactions, ours additionally incorporates information about protein-DNA pairs known to bind weakly or not at all. Moreover, SVMs with a linear kernel can naturally incorporate constraints about the relative binding affinities of protein-DNA pairs; this type of information has not been used previously in predicting ZF protein-DNA binding. Here, we build a high-quality literature-derived experimental database of ZF-DNA binding examples and use it to test both linear and polynomial kernels for predicting ZF protein-DNA binding on the basis of the canonical binding model. The polynomial SVM outperforms previously published prediction procedures as well as the linear SVM. This may indicate the presence of dependencies between contacts in the canonical binding model and suggests that modification of the underlying structural model may further improve performance in predicting ZF protein-DNA binding. Overall, this work demonstrates that methods incorporating information about non-binding and relative binding of protein-DNA pairs have great potential for effective prediction of protein-DNA interactions. AVAILABILITY An online tool for predicting ZF DNA binding is available at http://compbio.cs.princeton.edu/zf/.
Affiliation(s)
- Anton V Persikov
- Lewis-Sigler Institute for Integrative Genomics and Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
60
Ding G, Lorenz P, Kreutzer M, Li Y, Thiesen HJ. SysZNF: the C2H2 zinc finger gene database. Nucleic Acids Res 2008; 37:D267-73. [PMID: 18974185 PMCID: PMC2686507 DOI: 10.1093/nar/gkn782] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.9] [Indexed: 11/23/2022]
Abstract
C2H2 zinc finger (C2H2-ZNF) genes are one of the largest and most complex gene super-families in metazoan genomes, with hundreds of members in the human and mouse genomes. The ongoing investigation of this huge gene family requires computational support to catalog genotype-phenotype comparisons of C2H2-ZNF genes between related species and ultimately to extend worldwide knowledge of the evolution of C2H2-ZNF genes in general. Here, we systematically collected all the C2H2-ZNF genes in the human and mouse genomes and constructed a database named SysZNF to deposit available datasets related to these genes. In the database, each C2H2-ZNF gene entry consists of physical location, gene model (including different transcript forms), Affymetrix gene expression probes, protein domain structures, homologs (and synteny between human and mouse), PubMed references as well as links to relevant public databases. The clustered organization of the C2H2-ZNF genes is highlighted. The database can be searched using text strings or sequence information. The data are also available for batch download from the web site. Moreover, the graphical gene model/protein view system, sequence retrieval system and some other tools embedded in SysZNF facilitate research on C2H2-type ZNF genes under an integrative view. The database can be accessed from the URL http://epgd.biosino.org/SysZNF.
Affiliation(s)
- Guohui Ding
- Bioinformatics Center, Key Lab of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai 200031, P. R. China
61
Angarica VE, Pérez AG, Vasconcelos AT, Collado-Vides J, Contreras-Moreira B. Prediction of TF target sites based on atomistic models of protein-DNA complexes. BMC Bioinformatics 2008; 9:436. [PMID: 18922190 PMCID: PMC2585596 DOI: 10.1186/1471-2105-9-436] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.1] [Received: 05/12/2008] [Accepted: 10/16/2008] [Indexed: 11/10/2022]
Abstract
Background The specific recognition of genomic cis-regulatory elements by transcription factors (TFs) plays an essential role in the regulation of coordinated gene expression. Studying the mechanisms determining binding specificity in protein-DNA interactions is thus an important goal. Most current approaches for modeling TF specific recognition rely on the knowledge of large sets of cognate target sites and consider only the information contained in their primary sequence. Results Here we describe a structure-based methodology for predicting sequence motifs starting from the coordinates of a TF-DNA complex. Our algorithm combines information regarding the direct and indirect readout of DNA into an atomistic statistical model, which is used to estimate the interaction potential. We first measure the ability of our method to correctly estimate the binding specificities of eight prokaryotic and eukaryotic TFs that belong to different structural superfamilies. Secondly, the method is applied to two homology models, finding that sampling of interface side-chain rotamers remarkably improves the results. Thirdly, the algorithm is compared with a reference structural method based on contact counts, obtaining comparable predictions for the experimental complexes and more accurate sequence motifs for the homology models. Conclusion Our results demonstrate that atomic-detail structural information can be feasibly used to predict TF binding sites. The computational method presented here is universal and might be applied to other systems involving protein-DNA recognition.
Affiliation(s)
- Vladimir Espinosa Angarica
- Departamento de Bioquímica y Biología Molecular y Celular, Facultad de Ciencias, Universidad de Zaragoza, Pedro Cerbuna 12, 50009 Zaragoza, Spain.
62
Liu J, Stormo GD. Context-dependent DNA recognition code for C2H2 zinc-finger transcription factors. Bioinformatics 2008; 24:1850-7. [PMID: 18586699 DOI: 10.1093/bioinformatics/btn331] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.7] [Indexed: 12/30/2022]
Abstract
MOTIVATION Modeling and identifying the DNA-protein recognition code is one of the most challenging problems in computational biology. Several quantitative methods have been developed to model DNA-protein interactions with specific focus on the C2H2 zinc-finger proteins, the largest transcription factor family in eukaryotic genomes. In many cases, they performed well, but their overall predictive accuracy is still limited. One of the major reasons is that all these methods use weight matrix models to represent DNA-protein interactions, assuming that all base-amino acid contacts contribute independently to the total free energy of binding. RESULTS We present a context-dependent model for DNA-zinc-finger protein interactions that allows us to identify inter-positional dependencies in the DNA recognition code for C2H2 zinc-finger proteins. The degree of non-independence was detected by comparing the linear perceptron model with the non-linear neural net (NN) model for their predictions of DNA-zinc-finger protein interactions. This dependency is supported by the complex base-amino acid contacts observed in DNA-zinc-finger interactions from structural analyses. Using extensive published qualitative and quantitative experimental data, we demonstrated that the context-dependent model developed in this study can significantly improve predictions of DNA binding profiles and free energies of binding for both individual zinc fingers and proteins with multiple zinc fingers, compared with previous position-independent models. This approach can be extended to other protein families with complex base-amino acid residue interactions, which would help to further understand transcriptional regulation in eukaryotic genomes. AVAILABILITY The software is implemented as C programs and is available on request. http://ural.wustl.edu/softwares.html
Affiliation(s)
- Jiajian Liu
- Department of Genetics, Washington University School of Medicine, 660 S Euclid, Box 8232, St. Louis, MO 63110, USA
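Entry 62 contrasts weight matrix models, in which every base contributes independently to the binding energy, with a context-dependent model. The sketch below illustrates that distinction with a deliberately simple stand-in: an additive per-position energy plus optional correction terms for adjacent base pairs. The paper itself compares a linear perceptron with a neural network, so this pairwise-correction form and all weights here are hypothetical illustrations, not the authors' model.

```python
def additive_energy(site, weights):
    """Position-independent (weight matrix) model: each base contributes on its own."""
    return sum(weights[i][base] for i, base in enumerate(site))


def pairwise_energy(site, weights, pair_weights):
    """Additive energy plus a correction for each adjacent base pair (i, i + 1).

    pair_weights maps (position, base_i, base_i_plus_1) to an extra energy term;
    missing keys contribute nothing, so an empty dict reduces to the additive model.
    """
    energy = additive_energy(site, weights)
    for i in range(len(site) - 1):
        energy += pair_weights.get((i, site[i], site[i + 1]), 0.0)
    return energy


if __name__ == "__main__":
    # Hypothetical energies (lower = tighter binding) for a 2-bp site.
    weights = [
        {"A": -1.0, "C": 0.0, "G": 0.0, "T": 0.0},
        {"A": 0.0, "C": -1.0, "G": 0.0, "T": 0.0},
    ]
    pair_weights = {(0, "A", "C"): -0.5}  # "AC" binds better than its parts predict
    for site in ("AC", "AG"):
        print(site, additive_energy(site, weights), pairwise_energy(site, weights, pair_weights))
```

When the pairwise terms matter, no choice of per-position weights alone can reproduce the energies, which is the kind of inter-positional dependency the abstract describes detecting.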
63
Francke C, Kerkhoven R, Wels M, Siezen RJ. A generic approach to identify Transcription Factor-specific operator motifs; Inferences for LacI-family mediated regulation in Lactobacillus plantarum WCFS1. BMC Genomics 2008; 9:145. [PMID: 18371204 PMCID: PMC2329647 DOI: 10.1186/1471-2164-9-145] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.7] [Received: 12/21/2007] [Accepted: 03/27/2008] [Indexed: 12/18/2022]
Abstract
Background A key problem in the sequence-based reconstruction of regulatory networks in bacteria is the lack of specificity in operator predictions. The problem is especially prominent in the identification of transcription factor (TF) specific binding sites. In particular, homologous TFs are abundant and, as they are structurally very similar, it proves difficult to distinguish the related operators by automated means. This also holds for the LacI-family, a family of TFs that is well-studied and has many members that fulfill crucial roles in the control of carbohydrate catabolism in bacteria, including catabolite repression. To overcome the specificity problem, a comprehensive footprinting approach was formulated to identify TF-specific operator motifs and was applied to the LacI-family of TFs in the model Gram-positive organism Lactobacillus plantarum WCFS1. The main premise behind the approach is that only orthologous sequences that share orthologous genomic context will share equivalent regulatory sites. Results When the approach was applied to the 12 LacI-family TFs of the model species, a specific operator motif was identified for each of them. With the TF-specific operator motifs, potential binding sites were found on the genome and putative minimal regulons could be defined. Moreover, specific inducers could in most cases be linked to the TFs through phylogeny, thereby unveiling the biological role of these regulons. The operator predictions indicated that the LacI-family TFs can be separated into two subfamilies with clearly distinct operator motifs. They also established that the operator related to the 'global' regulator CcpA is not inherently distinct from that of other LacI-family members, only more degenerate. Analysis of the chromosomal position of the identified putative binding sites confirmed that the LacI-family TFs are mostly auto-regulatory and relate mainly to carbohydrate uptake and catabolism.
Conclusion Our approach to identify specific operator motifs for different TF-family members is specific and in essence generic. The data imply that, although the specific operator motifs can be used to identify minimal regulons, experimental knowledge of TF activity in particular is essential to determine complete regulons as well as to estimate the overlap between TF affinities.
Affiliation(s)
- Christof Francke
- TI Food and Nutrition, P.O. Box 557, 6700AN Wageningen, The Netherlands.
64
Vindal V, Ashwantha Kumar E, Ranjan A. Identification of operator sites within the upstream region of the putative mce2R gene from mycobacteria. FEBS Lett 2008; 582:1117-22. [DOI: 10.1016/j.febslet.2008.02.074] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 01/29/2008] [Revised: 02/26/2008] [Accepted: 02/29/2008] [Indexed: 10/22/2022]
65
Cho SY, Chung M, Park M, Park S, Lee YS. ZIFIBI: Prediction of DNA binding sites for zinc finger proteins. Biochem Biophys Res Commun 2008; 369:845-8. [PMID: 18325330 DOI: 10.1016/j.bbrc.2008.02.106] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Received: 02/11/2008] [Accepted: 02/22/2008] [Indexed: 11/26/2022]
Abstract
The cis-regulatory regions of target genes are key elements in the transcriptional regulation of gene expression. Many of these cis-regulatory regions have not been identified by either biological experiments or computational methods. Recently, a few additional C2H2 zinc finger transcription factor binding sites have been discovered. The majority of the zinc finger binding sites, however, are still unknown. In this study, we used publicly available data to evaluate possible interaction patterns between nucleotides and the amino acids of zinc finger domains. We calculated the most probable state path of three-nucleotide sequences using a Hidden Markov Model (HMM). We used these computations to predict C2H2 zinc finger transcription factor binding sites in cis-regulatory regions of their target genes (http://bioinfo.hanyang.ac.kr/ZIFIBI/frameset.php).
Affiliation(s)
- Soo Young Cho
- Division of Molecular and Life Sciences, Hanyang University, Sa 3 dong, Ansan, Kyunggodo 425-791, Republic of Korea
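Entry 65 computes the most probable state path through a Hidden Markov Model, which is what the standard Viterbi algorithm does. The abstract does not describe ZIFIBI's states or emission probabilities, so the sketch below runs a generic log-space Viterbi on a hypothetical two-state model ("bg" for background, "site" for a G-rich binding site); every probability shown is invented for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for an observation sequence (log-space Viterbi)."""
    # best[t][s]: log-probability of the best path that ends in state s at position t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]  # back[t][s]: predecessor of state s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: best[t - 1][p] + math.log(trans_p[p][s]))
            best[t][s] = (best[t - 1][prev] + math.log(trans_p[prev][s])
                          + math.log(emit_p[s][observations[t]]))
            back[t][s] = prev
    # Trace back from the highest-scoring final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical two-state model.
STATES = ("bg", "site")
START = {"bg": 0.5, "site": 0.5}
TRANS = {"bg": {"bg": 0.9, "site": 0.1}, "site": {"bg": 0.2, "site": 0.8}}
EMIT = {
    "bg": {b: 0.25 for b in "ACGT"},
    "site": {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
}

if __name__ == "__main__":
    print(viterbi("ATGGGGTA", STATES, START, TRANS, EMIT))
```

Working in log space avoids numerical underflow on long sequences, which is the usual reason Viterbi is implemented with summed log-probabilities rather than multiplied probabilities.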
66
Habib N, Kaplan T, Margalit H, Friedman N. A novel Bayesian DNA motif comparison method for clustering and retrieval. PLoS Comput Biol 2008; 4:e1000010. [PMID: 18463706 PMCID: PMC2265534 DOI: 10.1371/journal.pcbi.1000010] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.4] [Received: 07/19/2007] [Accepted: 01/24/2008] [Indexed: 11/17/2022]
Abstract
Characterizing the DNA-binding specificities of transcription factors is a key problem in computational biology that has been addressed by multiple algorithms. These usually take as input sequences that are putatively bound by the same factor and output one or more DNA motifs. A common practice is to apply several such algorithms simultaneously to improve coverage at the price of redundancy. In interpreting such results, two tasks are crucial: clustering of redundant motifs, and attributing the motifs to transcription factors by retrieval of similar motifs from previously characterized motif libraries. Both tasks inherently involve motif comparison. Here we present a novel method for comparing and merging motifs, based on Bayesian probabilistic principles. This method takes into account both the similarity in positional nucleotide distributions of the two motifs and their dissimilarity to the background distribution. We demonstrate the use of the new comparison method as a basis for motif clustering and retrieval procedures, and compare it to several commonly used alternatives. Our results show that the new method outperforms other available methods in accuracy and sensitivity. We incorporated the resulting motif clustering and retrieval procedures in a large-scale automated pipeline for analyzing DNA motifs. This pipeline integrates the results of various DNA motif discovery algorithms and automatically merges redundant motifs from multiple training sets into a coherent annotated library of motifs. Application of this pipeline to recent genome-wide transcription factor location data in S. cerevisiae successfully identified DNA motifs in a manner that is as good as semi-automated analysis reported in the literature. Moreover, we show how this analysis elucidates the mechanisms of condition-specific preferences of transcription factors. 
Regulation of gene expression plays a central role in the activity of living cells and in their response to internal (e.g., cell division) or external (e.g., stress) stimuli. Key players in determining gene-specific regulation are transcription factors that bind sequence-specific sites on the DNA, modulating the expression of nearby genes. To understand the regulatory program of the cell, we need to identify these transcription factors, when they act, and on which genes. Transcription regulatory maps can be assembled by computational analysis of experimental data, by discovering the DNA recognition sequences (motifs) of transcription factors and their occurrences along the genome. Such an analysis usually results in a large number of overlapping motifs. To reconstruct regulatory maps, it is crucial to combine similar motifs and to relate them to transcription factors. To this end we developed an accurate, fully automated method, termed BLiC, based upon an improved similarity measure for comparing DNA motifs. By applying it to genome-wide data in yeast, we identified the DNA motifs of transcription factors and their putative target genes. Finally, we analyze motifs of transcription factors that alter their target genes under different conditions, and show how cells adjust their regulatory program in response to environmental changes.
Affiliation(s)
- Naomi Habib
- School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel
67
McCord RP, Bulyk ML. Functional trends in structural classes of the DNA binding domains of regulatory transcription factors. Pac Symp Biocomput 2008:441-52. [PMID: 18229706 PMCID: PMC2757920]
Abstract
The DNA-binding domain (DBD) structure of a regulatory transcription factor (TF) is important in determining its DNA sequence specificity, but it is unclear whether a relationship exists between DBD structure and general TF biological function or regulatory mechanism. We observed moderate enrichment of functional annotation terms among TFs of the same structural class in Escherichia coli, Saccharomyces cerevisiae, Drosophila melanogaster, or Mus musculus, suggesting some preference for TFs of similar structures in the regulation of similar processes. In yeast, we also found trends among TF structural classes in phenomena including gene expression coherence, DNA binding site motif similarity, the general or specific nature of TFs' regulatory roles, and the position of a TF in a gene regulatory network. These results suggest that the biophysical constraints of different TF structural classes play a role in their gene regulatory mechanisms.
Affiliation(s)
- Rachel Patton McCord
- Division of Genetics, Department of Medicine, Brigham & Women’s Hospital and Harvard Medical School, Boston, MA 02115
- Harvard University Graduate Biophysics Program, Cambridge, MA 02138
- Martha L. Bulyk
- Division of Genetics, Department of Medicine, Brigham & Women’s Hospital and Harvard Medical School, Boston, MA 02115
- Department of Pathology, Brigham & Women’s Hospital and Harvard Medical School, Boston, MA 02115
- Harvard/MIT Division of Health Sciences & Technology (HST), Harvard Medical School, Boston, MA 02115
- Harvard University Graduate Biophysics Program, Cambridge, MA 02138
68
Abstract
BACKGROUND Unraveling the mechanisms that regulate gene expression is a major challenge in biology. An important task in this challenge is to identify regulatory elements, especially the binding sites in deoxyribonucleic acid (DNA) for transcription factors. These binding sites are short DNA segments that are called motifs. Recent advances in genome sequence availability and in high-throughput gene expression analysis technologies have allowed for the development of computational methods for motif finding. As a result, a large number of motif finding algorithms have been implemented and applied to various motif models over the past decade. This survey reviews the latest developments in DNA motif finding algorithms. RESULTS Earlier algorithms use promoter sequences of coregulated genes from a single genome and search for statistically overrepresented motifs. Recent algorithms are designed to use phylogenetic footprinting or orthologous sequences, or an integrated approach that combines promoter sequences of coregulated genes with phylogenetic footprinting. All the algorithms studied have been reported to correctly detect motifs previously identified by laboratory experiments, and some algorithms were able to find novel motifs. However, most of these motif finding algorithms have been shown to work successfully in yeast and other lower organisms but perform significantly worse in higher organisms. CONCLUSION Despite considerable efforts to date, DNA motif finding remains a complex challenge for biologists and computer scientists. Researchers have taken many different approaches in developing motif discovery tools, and the progress made in this area of research is very encouraging.
Comparing the performance of different motif finding tools and identifying the best ones has proven difficult: the tools are built on diverse and complex algorithms and motif models, and our incomplete understanding of the biology of regulatory mechanisms does not always allow adequate evaluation of the underlying algorithms across motif models.
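The RESULTS section above notes that earlier algorithms search promoter sequences of coregulated genes for statistically overrepresented motifs. A minimal sketch of that idea follows, assuming exact k-mer words and toy hypothetical sequences; it is not the method of any particular tool covered by the survey. It compares each k-mer's pseudocount-smoothed frequency in the coregulated set against a background set and ranks k-mers by the log ratio.

```python
from collections import Counter
from math import log2


def kmer_counts(seqs, k):
    """Count every overlapping k-mer across all sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def enrichment(foreground, background, k, pseudocount=1.0):
    """Rank k-mers seen in the foreground by log2(freq_fg / freq_bg).

    Frequencies are smoothed with a pseudocount over the combined
    vocabulary so that k-mers absent from the background do not
    cause a division by zero.
    """
    fg = kmer_counts(foreground, k)
    bg = kmer_counts(background, k)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    n_words = len(set(fg) | set(bg))
    scores = {}
    for kmer, count in fg.items():
        p_fg = (count + pseudocount) / (fg_total + pseudocount * n_words)
        p_bg = (bg[kmer] + pseudocount) / (bg_total + pseudocount * n_words)
        scores[kmer] = log2(p_fg / p_bg)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: the word CACGTG is planted in every "coregulated" promoter.
coregulated = ["TTCACGTGAA", "GCACGTGTTA", "ACACGTGCCT"]
background = ["TTGACATAAA", "GCATTAGTTA", "ACGGTAGCCT", "TATAAAGGCA"]

top = enrichment(coregulated, background, k=6)
print(top[:3])
```

Exact k-mers keep the example short but cannot represent degenerate binding sites; real tools typically use probabilistic motif models and replace the raw log ratio with a proper statistical test.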
Affiliation(s)
- Modan K Das
- Computer Science Department, Oklahoma State University, Stillwater, Oklahoma 74078, USA
- USDA-ARS, Department of Plant Sciences, University of Arizona, Tucson, Arizona 85721, USA
- Ho-Kwok Dai
- Computer Science Department, Oklahoma State University, Stillwater, Oklahoma 74078, USA
69
Moroni E, Caselle M, Fogolari F. Identification of DNA-binding protein target sequences by physical effective energy functions: free energy analysis of lambda repressor-DNA complexes. BMC Struct Biol 2007; 7:61. [PMID: 17900341 PMCID: PMC2194778 DOI: 10.1186/1472-6807-7-61] [Received: 02/20/2007] [Accepted: 09/27/2007]
Abstract
Background Specific binding of proteins to DNA is one of the most common ways gene expression is controlled. Although general rules for DNA-protein recognition can be derived, the ambiguous and complex nature of this mechanism precludes a simple recognition code; therefore, the prediction of DNA target sequences is not straightforward. DNA-protein interactions can be studied using computational methods, which can complement the current experimental methods and offer some advantages. In the present work we use physical effective potentials to evaluate the DNA-protein binding affinities for the λ repressor-DNA complex, for which structural and thermodynamic experimental data are available. Results The binding free energy of two molecules can be expressed as the sum of an intermolecular energy (evaluated using a molecular mechanics force field), a solvation free energy term and an entropic term. Different solvation models are used, including distance-dependent dielectric constants, solvent accessible surface tension models and the Generalized Born model. The effect of conformational sampling by Molecular Dynamics simulations on the computed binding energy is assessed; results show that this effect is in general negative and that the reproducibility of the experimental values decreases as the simulation time considered increases. The free energy of binding for non-specific complexes, estimated using the best energetic model, agrees with earlier theoretical suggestions. As a result of these analyses, we propose a protocol for the prediction of DNA-binding target sequences. The possibility of searching for regulatory elements within the bacteriophage λ genome using this protocol is explored. Our analysis shows good prediction capabilities, even in the absence of any thermodynamic data and information on the naturally recognized sequence.
Conclusion This study supports the conclusion that physics-based methods can offer a completely complementary methodology to sequence-based methods for the identification of DNA-binding protein target sequences.
Affiliation(s)
- Elisabetta Moroni
- Dipartimento di Fisica Teorica, Università di Torino and INFN, Via P. Giuria 1, 10125 Torino, Italy
- Dipartimento di Fisica G. Occhialini, Università di Milano-Bicocca and INFN, Piazza delle Scienze 3, 20156 Milano, Italy
- Michele Caselle
- Dipartimento di Fisica Teorica, Università di Torino and INFN, Via P. Giuria 1, 10125 Torino, Italy
- Federico Fogolari
- Dipartimento di Scienze e Tecnologie Biomediche, Università di Udine, P.le Kolbe 4, 33100 Udine, Italy
70
Bussemaker HJ, Foat BC, Ward LD. Predictive modeling of genome-wide mRNA expression: from modules to molecules. Annu Rev Biophys Biomol Struct 2007; 36:329-47. [PMID: 17311525 DOI: 10.1146/annurev.biophys.36.040306.132725]
Abstract
Various algorithms are available for predicting mRNA expression and modeling gene regulatory processes. They differ in whether they rely on the existence of modules of coregulated genes or build a model that applies to all genes, whether they represent regulatory activities as hidden variables or as mRNA levels, and whether they implicitly or explicitly model the complex cis-regulatory logic of multiple interacting transcription factors binding the same DNA. The fact that functional genomics data of different types reflect the same molecular processes provides a natural strategy for integrative computational analysis. One promising avenue toward an accurate and comprehensive model of gene regulation combines biophysical modeling of the interactions among proteins, DNA, and RNA with the use of large-scale functional genomics data to estimate regulatory network connectivity and activity parameters. As the ability of these models to represent complex cis-regulatory logic increases, the need for approaches based on cross-species conservation may diminish.
Affiliation(s)
- Harmen J Bussemaker
- Department of Biological Sciences, Columbia University, New York, New York 10027, USA.
71
Morozov AV, Siggia ED. Connecting protein structure with predictions of regulatory sites. Proc Natl Acad Sci U S A 2007; 104:7068-73. [PMID: 17438293 PMCID: PMC1855371 DOI: 10.1073/pnas.0701356104]
Abstract
A common task posed by microarray experiments is to infer the binding site preferences for a known transcription factor from a collection of genes that it regulates and to ascertain whether the factor acts alone or in a complex. The converse problem can also be posed: Given a collection of binding sites, can the regulatory factor or complex of factors be inferred? Both tasks are substantially facilitated by using relatively simple homology models for protein-DNA interactions, as well as the rapidly expanding protein structure database. For budding yeast, we are able to construct reliable structural models for 67 transcription factors and with them redetermine factor binding sites by using a Bayesian Gibbs sampling algorithm and an extensive protein localization data set. For 49 factors in common with a prior analysis of this data set (based largely on phylogenetic conservation), we find that half of the previously predicted binding motifs are in need of some revision. We also solve the inverse problem of ascertaining the factors from the binding sites by assigning a correct protein fold to 25 of the 49 cases from a previous study. Our approach is easily extended to other organisms, including higher eukaryotes. Our study highlights the utility of enlarging current structural genomics projects that exhaustively sample fold structure space to include all factors with significantly different DNA-binding specificities.
Affiliation(s)
- Alexandre V Morozov
- Center for Studies in Physics and Biology, The Rockefeller University, 1230 York Avenue, New York, NY 10021, USA.
72
Langlois RE, Carson MB, Bhardwaj N, Lu H. Learning to translate sequence and structure to function: identifying DNA binding and membrane binding proteins. Ann Biomed Eng 2007; 35:1043-52. [PMID: 17436108 PMCID: PMC2706547 DOI: 10.1007/s10439-007-9312-z] [Received: 09/18/2006] [Accepted: 04/02/2007]
Abstract
A protein's function depends in large part on interactions with other molecules. With an increasing number of protein structures becoming available every year, a corresponding structural annotation approach identifying such interactions grows more expedient. At the same time, machine learning has gained popularity in bioinformatics, providing robust annotation of genes and proteins without sequence homology. Here we have developed a general machine learning protocol to identify proteins that bind DNA and membranes. In general, there is no theory or even rule of thumb for picking the best machine learning algorithm. We therefore systematically compare several classification algorithms known to perform well. The boosted tree classifier is found to give the best performance, achieving 93% and 88% accuracy in discriminating non-homologous proteins that bind membrane and DNA, respectively, significantly outperforming all previously published works. We also attempted to address the importance of the attributes in function prediction and the relationships between relevant attributes. A graphical model based on boosted trees is applied to study the important features in discriminating DNA-binding proteins. In summary, the current protocol identified physical features important in DNA and membrane binding, rather than annotating function through sequence similarity.
Affiliation(s)
- Hui Lu
- Corresponding author: Hui Lu, 851 S Morgan, Rm 218, M/C063, Chicago, IL 60607. Phone: (312) 413-2021; Fax: (312) 413-2018
73
Abnizova I, Subhankulova T, Gilks WR. Recent computational approaches to understand gene regulation: mining gene regulation in silico. Curr Genomics 2007; 8:79-91. [PMID: 18660846 PMCID: PMC2435357 DOI: 10.2174/138920207780368150] [Received: 11/30/2006] [Revised: 12/13/2006] [Accepted: 12/15/2006]
Abstract
This paper reviews recent computational approaches to the understanding of gene regulation in eukaryotes. Cis-regulation of gene expression by the binding of transcription factors is a critical component of cellular physiology. In eukaryotes, a number of transcription factors often work together in a combinatorial fashion to enable cells to respond to a wide spectrum of environmental and developmental signals. Integration of genome sequences and/or Chromatin Immunoprecipitation on chip data with gene-expression data has facilitated in silico discovery of how the combinatorics and positioning of transcription factor binding sites underlie gene activation in a variety of cellular processes. The process of gene regulation is extremely complex and intriguing; therefore, all possible points of view and related links should be carefully considered. Here we attempt to collect an inventory, not claiming it to be comprehensive and complete, of related computational biological topics covering gene regulation, which may enlighten the process, and briefly review what is currently occurring in these areas. We will consider the following computational areas:
- gene regulatory network construction;
- evolution of regulatory DNA;
- studies of its structural and statistical informational properties;
- and finally, regulatory RNA.
Affiliation(s)
- T Subhankulova
- Wellcome Trust/Cancer Research UK Gurdon Institute of Cancer and Developmental Biology, Cambridge, UK
74
Siggers TW, Honig B. Structure-based prediction of C2H2 zinc-finger binding specificity: sensitivity to docking geometry. Nucleic Acids Res 2007; 35:1085-97. [PMID: 17264128 PMCID: PMC1851644 DOI: 10.1093/nar/gkl1155]
Abstract
Predicting the binding specificity of transcription factors is a critical step in the characterization and computational identification of cis-regulatory elements in genomic sequences. Here we use protein–DNA structures to predict binding specificity and consider the possibility of predicting position weight matrices (PWM) for an entire protein family based on the structures of just a few family members. A particular focus is the sensitivity of prediction accuracy to the docking geometry of the structure used. We investigate this issue with the goal of determining how similar two docking geometries must be for binding specificity predictions to be accurate. Docking similarity is quantified using our recently described interface alignment score (IAS). Using a molecular-mechanics force field, we predict high-affinity nucleotide sequences that bind to the second zinc-finger (ZF) domain from the Zif268 protein, using different C2H2 ZF domains as structural templates. We identify a strong relationship between IAS values and prediction accuracy, and define a range of IAS values for which accurate structure-based predictions of binding specificity are to be expected. The implication of our results for large-scale, structure-based prediction of PWMs is discussed.
Affiliation(s)
- Barry Honig
- To whom correspondence should be addressed. Tel: +1 212 851 4651; Fax: +1 212 851 4650
75
Abstract
Computational biology is a rapidly evolving area where methodologies from computer science, mathematics, and statistics are applied to address fundamental problems in biology. The study of gene regulatory information is a central problem in current computational biology. This article reviews recent developments in statistical methods related to this field. Starting from microarray gene selection, we examine methods for finding transcription factor binding motifs and cis-regulatory modules in coregulated genes, and methods for utilizing information from cross-species comparisons and ChIP-chip experiments. The ultimate understanding of cis-regulatory logic in mammalian genomes may require the integration of information collected from all these steps.
Affiliation(s)
- Hongkai Ji
- Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, Massachusetts 02138, USA.
76
Liu LA, Bader JS. Decoding transcriptional regulatory interactions. Physica D 2006; 224:174-181. [PMID: 17364011 PMCID: PMC1827156 DOI: 10.1016/j.physd.2006.09.022]
Abstract
Transcription factor proteins control the temporal and spatial expression of genes by binding specific regulatory elements, or motifs, in DNA. Mapping a transcription factor to its motif is an important step towards defining the structure of transcriptional regulatory networks and understanding their dynamics. The information to map a transcription factor to its DNA binding specificity is in principle contained in the protein sequence. Nevertheless, methods that map directly from protein sequence to target DNA sequence have been lacking, and generation of regulatory maps has required experimental data. Here we describe a purely computational method for predicting transcription factor binding. The method calculates the free energy of binding between a transcription factor and possible target DNA sequences using thermodynamic integration. Approximations of additivity (each DNA basepair contributes independently to the binding energy) and linear response (the DNA-protein and DNA-solvent couplings are linear in an effective reaction coordinate representing the basepair character at a specific position) make the computations feasible and can be verified by more detailed simulations. Results obtained for MAT-alpha2, a yeast homeodomain transcription factor, are in good agreement with known results. This method promises to provide a general, computationally feasible route from a genome sequence to a gene regulatory network.
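The additivity approximation described above (each DNA base pair contributes independently to the binding energy) can be illustrated with a short sketch. It assumes a hypothetical table of per-position energy contributions; the values are invented for illustration and are neither the MAT-alpha2 results nor quantities computed by thermodynamic integration. Under additivity, a site's energy is the sum of per-position terms, and the relative affinity of two sites follows from the standard relation K1/K2 = exp(-ΔΔG/RT).

```python
import math

# Hypothetical per-position binding free-energy contributions (kcal/mol)
# for a 4-bp site; lower is more favorable. Illustrative values only.
ENERGY = [
    {"A": 0.0, "C": 1.2, "G": 1.0, "T": 0.8},
    {"A": 1.1, "C": 0.0, "G": 1.3, "T": 0.9},
    {"A": 0.7, "C": 1.0, "G": 0.0, "T": 1.4},
    {"A": 1.5, "C": 0.9, "G": 1.2, "T": 0.0},
]

RT = 0.593  # kcal/mol at about 298 K


def binding_energy(site):
    """Additive model: sum independent per-base-pair contributions."""
    if len(site) != len(ENERGY):
        raise ValueError("site length must match the energy table")
    return sum(ENERGY[i][base] for i, base in enumerate(site.upper()))


def relative_affinity(site, reference):
    """Ratio of association constants K(site)/K(reference) = exp(-ddG/RT)."""
    ddg = binding_energy(site) - binding_energy(reference)
    return math.exp(-ddg / RT)


print(binding_energy("ACGT"))  # 0.0, the optimal site in this toy table
print(relative_affinity("ACGA", "ACGT"))
```

Because the model is additive, scoring a site costs one table lookup per position, which keeps genome-wide scanning cheap once the per-position terms are known.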
Affiliation(s)
- Joel S. Bader
- URL: www.jhubiomed.org (L. Angela Liu and Joel S. Bader)
77
Itzkovitz S, Tlusty T, Alon U. Coding limits on the number of transcription factors. BMC Genomics 2006; 7:239. [PMID: 16984633 PMCID: PMC1590034 DOI: 10.1186/1471-2164-7-239] [Received: 05/25/2006] [Accepted: 09/19/2006]
Abstract
Background Transcription factor proteins bind specific DNA sequences to control the expression of genes. They contain DNA binding domains which belong to several super-families, each with a specific mechanism of DNA binding. The total number of transcription factors encoded in a genome increases with the number of genes in the genome. Here, we examined the number of transcription factors from each super-family in diverse organisms. Results We find that the number of transcription factors from most super-families appears to be bounded. For example, the number of winged helix factors does not generally exceed 300, even in very large genomes. The magnitude of the maximal number of transcription factors from each super-family seems to correlate with the number of DNA bases effectively recognized by the binding mechanism of that super-family. Coding theory predicts that such upper bounds on the number of transcription factors should exist, in order to minimize cross-binding errors between transcription factors. This theory further predicts that factors with similar binding sequences should tend to have similar biological effects, so that errors based on mis-recognition are minimal. We present evidence that transcription factors with similar binding sequences tend to regulate genes with similar biological functions, supporting this prediction. Conclusion The present study suggests limits on the transcription factor repertoire of cells, and points to coding constraints that might apply more generally to the mapping between binding sites and biological function.
Affiliation(s)
- Shalev Itzkovitz
- Dept. Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
- Dept. Physics of Complex Systems, Weizmann Institute of Science, Rehovot 76100, Israel
- Tsvi Tlusty
- Dept. Physics of Complex Systems, Weizmann Institute of Science, Rehovot 76100, Israel
- Uri Alon
- Dept. Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
- Dept. Physics of Complex Systems, Weizmann Institute of Science, Rehovot 76100, Israel
78
Zhang W, Walker E, Tamplin OJ, Rossant J, Stanford WL, Hughes TR. Zfp206 regulates ES cell gene expression and differentiation. Nucleic Acids Res 2006; 34:4780-90. [PMID: 16971461 PMCID: PMC1635278 DOI: 10.1093/nar/gkl631]
Abstract
Understanding transcriptional regulation in early developmental stages is fundamental to understanding mammalian development and embryonic stem (ES) cell properties. Expression surveys suggest that the putative SCAN-Zinc finger transcription factor Zfp206 is expressed specifically in ES cells [Zhang,W., Morris,Q.D., Chang,R., Shai,O., Bakowski,M.A., Mitsakakis,N., Mohammad,N., Robinson,M.D., Zirngibl,R., Somogyi,E. et al., (2004) J. Biol., 3, 21; Brandenberger,R., Wei,H., Zhang,S., Lei,S., Murage,J., Fisk,G.J., Li,Y., Xu,C., Fang,R., Guegler,K. et al., (2004) Nat. Biotechnol., 22, 707-716]. Here, we confirm this observation, and we show that ZFP206 expression decreases rapidly upon differentiation of cultured mouse ES cells, and during development of mouse embryos. We find that there are at least six isoforms of the ZFP206 transcript, the longest being predominant. Overexpression and depletion experiments show that Zfp206 promotes formation of undifferentiated ES cell clones, and positively regulates abundance of a very small set of transcripts whose expression is also specific to ES cells and the two- to four-cell stages of preimplantation embryos. This set includes members of the Zscan4, Thoc4, Tcstv1 and eIF-1A gene families, none of which have been functionally characterized in vivo but whose members include apparent transcription factors, RNA-binding proteins and translation factors. Together, these data indicate that Zfp206 is a regulator of ES cell differentiation that controls a set of genes expressed very early in development, most of which themselves appear to be regulators.
Affiliation(s)
- Wen Zhang
- Department of Molecular and Medical Genetics, University of Toronto, #4388 Medical Sciences Building, 1 King's College Circle, Toronto, ON M5S 1A8 Canada
- Emily Walker
- Institute for Biomaterials and Biomedical Engineering, 164 College Street Room 407, Toronto, ON M5S 3G9 Canada
- Owen J. Tamplin
- Department of Molecular and Medical Genetics, University of Toronto, #4388 Medical Sciences Building, 1 King's College Circle, Toronto, ON M5S 1A8 Canada
- The Hospital for Sick Children, 101 College Street Room 13-305, Toronto, ON M5G 1L7 Canada
- Janet Rossant
- Department of Molecular and Medical Genetics, University of Toronto, #4388 Medical Sciences Building, 1 King's College Circle, Toronto, ON M5S 1A8 Canada
- The Hospital for Sick Children, 101 College Street Room 13-305, Toronto, ON M5G 1L7 Canada
- William L. Stanford
- Institute for Biomaterials and Biomedical Engineering, 164 College Street Room 407, Toronto, ON M5S 3G9 Canada
- Timothy R. Hughes
- Department of Molecular and Medical Genetics, University of Toronto, #4388 Medical Sciences Building, 1 King's College Circle, Toronto, ON M5S 1A8 Canada
- Banting and Best Department of Medical Research, University of Toronto, 112 College Street, Toronto, ON M5G 1L6 Canada
- To whom correspondence should be addressed. Tel: 416 946 8260; Fax: 416 978 8528
79
Zhang JZ, Gao W, Yang HB, Zhang B, Zhu ZY, Xue YF. Screening for genes essential for mouse embryonic stem cell self-renewal using a subtractive RNA interference library. Stem Cells 2006; 24:2661-8. [PMID: 16960129 DOI: 10.1634/stemcells.2006-0017]
Abstract
The pluripotency of mouse embryonic stem (ES) cells is maintained by self-renewal. To screen for genes essential for this process, we constructed an RNA interference (RNAi) library by inserting subtracted ES cell cDNA fragments into a plasmid containing two opposing cytomegalovirus promoters. ES cells were transfected with individual RNAi plasmids, and levels of the pluripotency marker Oct-4 were monitored 48 hours later by real-time RT-PCR. Of the first 89 RNAi plasmids characterized, 12 downregulated Oct-4 expression to less than 50% of the normal level and 7 upregulated Oct-4 expression to more than 150% of the normal level. To investigate their long-term effect on self-renewal, ES cells were transfected with these 19 RNAi plasmids individually, and G418-resistant colonies were subjected to alkaline phosphatase (AP) staining after 7 days of selection. Except for 4 plasmids that caused cell death, the ratio of AP-positive colonies was repressed to less than 60% of the control group by the other 15 plasmids, and even below 20% by 10 plasmids. The cDNA fragments in these 10 plasmids correspond to eight genes, including Zfp42/Rex-1, which was chosen for further functional analysis. RNAi knockdown of Zfp42 induced ES cells to differentiate into endoderm and mesoderm lineages, and overexpression of Zfp42 also caused ES cells to lose the capacity for self-renewal. Our results indicate that an RNAi screen is a feasible and efficient approach to identify genes involved in ES cell self-renewal. Further functional characterization of these genes will promote our understanding of the complex regulatory networks in ES cells.
Affiliation(s)
- Jun-Zheng Zhang
- Key Laboratory of Cell Proliferation and Differentiation of Ministry of Education, College of Life Sciences, Peking University, Beijing, China
80
Segal DJ, Crotty JW, Bhakta MS, Barbas CF, Horton NC. Structure of Aart, a designed six-finger zinc finger peptide, bound to DNA. J Mol Biol 2006; 363:405-21. [PMID: 16963084 DOI: 10.1016/j.jmb.2006.08.016] [Received: 11/09/2005] [Revised: 08/08/2006] [Accepted: 08/08/2006]
Abstract
Cys2-His2 zinc fingers are one of the most common types of DNA-binding domains. Modifications to zinc-finger binding specificity have recently enabled custom DNA-binding proteins to be designed to a wide array of target sequences. We present here a 1.96 Å structure of Aart, a designed six-zinc finger protein, bound to a consensus DNA target site. This is the first structure of a designed protein with six fingers, and was intended to provide insights into the unusual affinity and specificity characteristics of this protein. Most protein-DNA contacts were found to be consistent with expectations, while others were unanticipated or insufficient to explain specificity. Several were unexpectedly mediated by glycerol, water molecules or amino acid-base stacking interactions. These results challenge some conventional concepts of recognition, particularly the finding that triplets containing 5'A, C, or T are typically not specified by direct interaction with the amino acid in position 6 of the recognition helix.
Affiliation(s)
- David J Segal
- UC Davis Genome Center and Department of Pharmacology, University of California, Davis, CA 95616, USA.
81
GuhaThakurta D. Computational identification of transcriptional regulatory elements in DNA sequence. Nucleic Acids Res 2006; 34:3585-98. [PMID: 16855295 PMCID: PMC1524905 DOI: 10.1093/nar/gkl372]
Abstract
Identification and annotation of all the functional elements in the genome, including genes and the regulatory sequences, is a fundamental challenge in genomics and computational biology. Since regulatory elements are frequently short and variable, their identification and discovery using computational algorithms is difficult. However, significant advances have been made in the computational methods for modeling and detection of DNA regulatory elements. The availability of complete genome sequence from multiple organisms, as well as mRNA profiling and high-throughput experimental methods for mapping protein-binding sites in DNA, have contributed to the development of methods that utilize these auxiliary data to inform the detection of transcriptional regulatory elements. Progress is also being made in the identification of cis-regulatory modules and higher order structures of the regulatory sequences, which is essential to the understanding of transcription regulation in the metazoan genomes. This article reviews the computational approaches for modeling and identification of genomic regulatory elements, with an emphasis on the recent developments, and current challenges.
Affiliation(s)
- Debraj GuhaThakurta
- Research Genetics Division, Rosetta Inpharmatics LLC, Merck & Co., Inc, 401 Terry Avenue North, Seattle, WA 98109, USA.
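One widely used way to model the short, variable elements this review discusses is a position weight matrix (PWM) scored as log-odds against a background distribution. The sketch below is illustrative only: the count matrix, the pseudocount of 1, the uniform background and the function names are assumptions, and it is not a specific method taken from the review. It scans only the strand it is given.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=1.0, background=None):
    """Convert per-position base counts into a log2-odds scoring matrix."""
    background = background or {b: 0.25 for b in BASES}
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background[b])
                    for b in BASES})
    return pwm


def scan(sequence, pwm):
    """Return (offset, score) for every full-length window on the given strand."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        hits.append((i, sum(pwm[j][base] for j, base in enumerate(window))))
    return hits


# Hypothetical counts for a 3-bp motif with consensus "TGA".
counts = [
    {"A": 0, "C": 1, "G": 0, "T": 9},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 8, "C": 1, "G": 1, "T": 0},
]
pwm = log_odds_matrix(counts)
best = max(scan("CCTGACGT", pwm), key=lambda hit: hit[1])
print(best)
```

Positive scores mean a window looks more like the motif than like background; in practice a score threshold, both strands, and a realistic background model would be needed.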
82
Barrett CL, Palsson BO. Iterative reconstruction of transcriptional regulatory networks: an algorithmic approach. PLoS Comput Biol 2006; 2:e52. [PMID: 16710450 PMCID: PMC1463018 DOI: 10.1371/journal.pcbi.0020052] [Received: 07/26/2005] [Accepted: 04/05/2006]
Abstract
The number of complete, publicly available genome sequences is now greater than 200, and this number is expected to rapidly grow in the near future as metagenomic and environmental sequencing efforts escalate and the cost of sequencing drops. In order to make use of this data for understanding particular organisms and for discerning general principles about how organisms function, it will be necessary to reconstruct their various biochemical reaction networks. Principal among these will be transcriptional regulatory networks. Given the physical and logical complexity of these networks, the various sources of (often noisy) data that can be utilized for their elucidation, the monetary costs involved, and the huge number of potential experiments (~10¹²) that can be performed, experiment design algorithms will be necessary for synthesizing the various computational and experimental data to maximize the efficiency of regulatory network reconstruction. This paper presents an algorithm for experimental design to systematically and efficiently reconstruct transcriptional regulatory networks. It is meant to be applied iteratively in conjunction with an experimental laboratory component. The algorithm is presented here in the context of reconstructing transcriptional regulation for metabolism in Escherichia coli, and, through a retrospective analysis with previously performed experiments, we show that the produced experiment designs conform to how a human would design experiments. The algorithm is able to utilize probability estimates based on a wide range of computational and experimental sources to suggest experiments with the highest potential of discovering the greatest amount of new regulatory knowledge.
In recent years, the exploration of life has been bolstered through the advent of whole genome sequencing. This new data source significantly enables the reconstruction of genome-scale metabolic networks.
After a metabolic reconstruction, it will be necessary to discover the genetic control mechanisms that operate within an organism. Transcriptional regulatory network (TRN) reconstruction is costly both in terms of time and money, so it is critical that the reconstruction efforts be made as efficient as possible. Experiments must be designed so that the most new regulatory knowledge is discovered in each experiment. The huge number of possible experiments (~10¹²) and the vast amount of heterogeneous data available for designing experiments overwhelm the human ability to assimilate them. The authors have developed an algorithm that utilizes a mathematical model of a reconstructed metabolic network integrated with a partially reconstructed TRN to identify the experiment designs with the highest potential of yielding the most new regulatory knowledge. The authors show that the produced experiment designs are similar to those a human expert would produce, and that the algorithm has a facility to incorporate any relevant data source to design such experiments.
Affiliation(s)
- Christian L Barrett
- Bioengineering Department, University of California San Diego, La Jolla, California, United States of America
- Bernhard O Palsson
- Bioengineering Department, University of California San Diego, La Jolla, California, United States of America
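The central idea, ranking candidate experiments by how much new regulatory knowledge they are likely to yield given probability estimates, can be illustrated with a deliberately simplified score. This is not the authors' algorithm (which couples a metabolic model to a partial TRN): it assumes each hypothetical experiment would fully resolve a known set of putative regulator-target interactions, treats those interactions as independent with given prior probabilities, and scores an experiment by the total Shannon entropy (in bits) it would remove. The experiment names and probabilities are invented.

```python
import math


def binary_entropy(p):
    """Uncertainty (bits) about one interaction believed present with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def rank_experiments(candidates):
    """Sort experiments by total entropy of the interactions they would resolve.

    Assumes interactions are independent and each experiment fully resolves
    every interaction it tests; both are simplifications.
    """
    scored = {name: sum(binary_entropy(p) for p in probs)
              for name, probs in candidates.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical candidates: experiment -> prior probabilities of the interactions it tests.
candidates = {
    "knockout_regA_glucose": [0.5, 0.5, 0.9],
    "knockout_regB_anaerobic": [0.95, 0.99],
    "overexpress_regC": [0.3, 0.6, 0.4, 0.05],
}
ranking = rank_experiments(candidates)
for name, bits in ranking:
    print(f"{name}: {bits:.2f} bits")
```

Interactions that are already nearly certain (p close to 0 or 1) contribute almost nothing, so the score favors experiments that probe the most uncertain parts of the network; iterating would mean updating the priors after each experiment and re-ranking.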
83
Grigoryan G, Keating AE. Structure-based Prediction of bZIP Partnering Specificity. J Mol Biol 2006; 355:1125-42. [PMID: 16359704 DOI: 10.1016/j.jmb.2005.11.036] [Received: 09/13/2005] [Revised: 11/10/2005] [Accepted: 11/11/2005]
Abstract
Predicting protein interaction specificity from sequence is an important goal in computational biology. We present a model for predicting the interaction preferences of coiled-coil peptides derived from bZIP transcription factors that performs very well when tested against experimental protein microarray data. We used only sequence information to build atomic-resolution structures for 1711 dimeric complexes, and evaluated these with a variety of functions based on physics, learned empirical weights or experimental coupling energies. A purely physical model, similar to those used for protein design studies, gave reasonable performance. The results were improved significantly when helix propensities were used in place of a structurally explicit model to represent the unfolded reference state. Further improvement resulted upon accounting for residue-residue interactions in competing states in a generic way. Purely physical structure-based methods had difficulty capturing core interactions accurately, especially those involving polar residues such as asparagine. When these terms were replaced with weights from a machine-learning approach, the resulting model was able to correctly order the stabilities of over 6000 pairs of complexes with greater than 90% accuracy. The final model is physically interpretable, and suggests specific pairs of residues that are important for bZIP interaction specificity. Our results illustrate the power and potential of structural modeling as a method for predicting protein interactions and highlight obstacles that must be overcome to reach quantitative accuracy using a de novo approach. Our method shows unprecedented performance in predicting protein-protein interaction specificity accurately using structural modeling and suggests that predicting coiled-coil interactions generally may be within reach.
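The headline metric, the fraction of complex pairs whose stability order is predicted correctly, can be stated precisely in a few lines. The sketch below assumes each complex has one predicted and one measured stability value (larger meaning more stable), skips pairs that are tied in the measurement, and counts a predicted tie as wrong; the numbers are invented, and the paper's exact pair-selection rules may differ.

```python
from itertools import combinations


def pairwise_order_accuracy(predicted, measured):
    """Fraction of pairs (i, j) whose predicted order matches the measured order.

    Pairs with equal measured values are skipped; a predicted tie counts as wrong.
    """
    if len(predicted) != len(measured):
        raise ValueError("predicted and measured must have the same length")
    correct = total = 0
    for i, j in combinations(range(len(measured)), 2):
        measured_diff = measured[i] - measured[j]
        if measured_diff == 0:
            continue
        total += 1
        if (predicted[i] - predicted[j]) * measured_diff > 0:
            correct += 1
    return correct / total if total else float("nan")


# Hypothetical stability scores for five dimers.
measured = [3.1, 2.4, 0.8, 1.9, 0.2]
predicted = [2.8, 2.9, 0.5, 1.2, 0.4]
accuracy = pairwise_order_accuracy(predicted, measured)
print(accuracy)
```

Here only the pair (0, 1) is misordered, so 9 of 10 pairs agree. Because the metric depends only on signs of differences, it rewards correct ranking even when predicted values are on a different scale from the measurements.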
84
Morozov AV, Havranek JJ, Baker D, Siggia ED. Protein-DNA binding specificity predictions with structural models. Nucleic Acids Res 2005; 33:5781-98. [PMID: 16246914 PMCID: PMC1270944 DOI: 10.1093/nar/gki875]
Abstract
Protein-DNA interactions play a central role in transcriptional regulation and other biological processes. Investigating the mechanism of binding affinity and specificity in protein-DNA complexes is thus an important goal. Here we develop a simple physical energy function, which uses electrostatics, solvation, hydrogen bonds and atom-packing terms to model direct readout and sequence-specific DNA conformational energy to model indirect readout of DNA sequence by the bound protein. The predictive capability of the model is tested against another model based only on the knowledge of the consensus sequence and the number of contacts between amino acids and DNA bases. Both models are used to carry out predictions of protein-DNA binding affinities which are then compared with experimental measurements. The nearly additive nature of protein-DNA interaction energies in our model allows us to construct position-specific weight matrices by computing base pair probabilities independently for each position in the binding site. Our approach is less data intensive than knowledge-based models of protein-DNA interactions, and is not limited to any specific family of transcription factors. However, native structures of protein-DNA complexes or their close homologs are required as input to the model. Use of homology modeling can significantly increase the extent of our approach, making it a useful tool for studying regulatory pathways in many organisms and cell types.
Affiliation(s)
- Alexandre V Morozov
- Center for Studies in Physics and Biology, The Rockefeller University, 1230 York Avenue, New York, NY 10021, USA.
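Because the model's interaction energies are nearly additive across positions, each position of the site can be treated independently: the probability of a base pair at a position follows from a Boltzmann weight over the four alternatives at that position, p_i(b) = exp(-E_i(b)/RT) / Σ_b' exp(-E_i(b')/RT). A sketch of that conversion follows, assuming energies in kcal/mol (only differences within a position matter), a temperature of 298 K, and made-up energy values; the paper's own energy function, with its electrostatics, solvation, hydrogen-bond, packing and DNA-deformation terms, is far richer.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)
BASES = "ACGT"


def energies_to_pwm(energies, temperature=298.0):
    """Turn per-position binding energies (kcal/mol) into base probabilities.

    Assumes the total energy is a sum over positions, so each position's
    Boltzmann distribution can be computed independently.
    """
    rt = R_KCAL * temperature
    pwm = []
    for position in energies:
        lowest = min(position[b] for b in BASES)  # shift for numerical stability
        weights = {b: math.exp(-(position[b] - lowest) / rt) for b in BASES}
        z = sum(weights.values())
        pwm.append({b: weights[b] / z for b in BASES})
    return pwm


# Hypothetical energies for a 2-bp site; lower energy = tighter binding.
energies = [
    {"A": 0.0, "C": 1.5, "G": 1.2, "T": 2.0},
    {"A": 1.0, "C": 1.0, "G": 0.0, "T": 1.0},
]
pwm = energies_to_pwm(energies)
for i, column in enumerate(pwm):
    print(i, {b: round(p, 3) for b, p in column.items()})
```

Subtracting the lowest energy at each position does not change the probabilities, since the shift cancels between numerator and partition function, but it keeps the exponentials in a safe numeric range.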