1
|
Corynebacterium glutamicum Regulation beyond Transcription: Organizing Principles and Reconstruction of an Extended Regulatory Network Incorporating Regulations Mediated by Small RNA and Protein-Protein Interactions. Microorganisms 2021; 9:microorganisms9071395. [PMID: 34203422 PMCID: PMC8303971 DOI: 10.3390/microorganisms9071395] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/20/2020] [Revised: 01/08/2021] [Accepted: 01/12/2021] [Indexed: 11/16/2022]
Abstract
Corynebacterium glutamicum is a Gram-positive bacterium found in soil, where changing conditions demand plasticity of the regulatory machinery. The study of such machinery at the global scale has been challenged by the lack of data integration. Here, we report three regulatory network models for C. glutamicum: strong (3040 interactions), constructed solely with regulations previously supported by directed experiments; all evidence (4665 interactions), containing the strong network, regulations previously supported by nondirected experiments, and protein-protein interactions with a direct effect on gene transcription; and sRNA (5222 interactions), containing the all evidence network and sRNA-mediated regulations. Compared to the previous version (2018), the strong and all evidence networks increased by 75 and 1225 interactions, respectively. We analyzed the system-level components of the three networks to identify how they differ and compared their structures against those for the networks of more than 40 species. The inclusion of the sRNA-mediated regulations changed the proportions of the system-level components and increased the number of modules but decreased their size. The C. glutamicum regulatory structure contrasted with other bacterial regulatory networks. Finally, we used the strong networks of three model organisms to provide insights and future directions of the C. glutamicum regulatory network characterization.
|
2
|
Suvorova IA, Gelfand MS. Comparative Analysis of the IclR-Family of Bacterial Transcription Factors and Their DNA-Binding Motifs: Structure, Positioning, Co-Evolution, Regulon Content. Front Microbiol 2021; 12:675815. [PMID: 34177859 PMCID: PMC8222616 DOI: 10.3389/fmicb.2021.675815] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/04/2021] [Accepted: 05/14/2021] [Indexed: 11/13/2022]
Abstract
The IclR-family is a large group of transcription factors (TFs) regulating various biological processes in diverse bacteria. Using comparative genomics techniques, we have identified binding motifs of IclR-family TFs, reconstructed regulons and analyzed their content, finding co-occurrences between the regulated COGs (clusters of orthologous genes), useful for future functional characterizations of TFs and their regulated genes. We describe two main types of IclR-family motifs, similar in sequence but different in the arrangement of the half-sites (boxes), with GKTYCRYW3-4RYGRAMC and TGRAACAN1-2TGTTYCA consensuses, and also predict that TFs in 32 orthologous groups have binding sites comprised of three boxes with alternating direction, which implies two possible alternative modes of dimerization of TFs. We identified trends in site positioning relative to the translational gene start, and show that TFs in 94 orthologous groups bind tandem sites with 18-22 nucleotides between their centers. We predict protein-DNA contacts via the correlation analysis of nucleotides in binding sites and amino acids of the DNA-binding domain of TFs, and show that the majority of interacting positions and predicted contacts are similar for both types of motifs and conform well both to available experimental data and to general protein-DNA interaction trends.
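The two consensuses quoted in the abstract use IUPAC degenerate nucleotide codes. As a minimal sketch of how such a consensus could be scanned against a sequence, the snippet below converts it to a regular expression. It assumes that a code followed by a range such as `N1-2` means one to two repeats of that code (the subscripts appear to have been flattened in this listing); the function name `consensus_to_regex` is hypothetical and not taken from the paper.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def consensus_to_regex(consensus: str) -> str:
    """Translate an IUPAC consensus into a regex string.

    Assumption: a code followed by 'm-n' (e.g. 'N1-2') stands for
    m to n consecutive occurrences of that code.
    """
    if not re.fullmatch(r"(?:[ACGTRYSWKMBDHVN](?:\d+-\d+)?)+", consensus):
        raise ValueError(f"not a recognised consensus: {consensus!r}")
    parts = []
    # Each token is one IUPAC code with an optional repeat range.
    for code, low, high in re.findall(r"([A-Z])(?:(\d+)-(\d+))?", consensus):
        repeat = f"{{{low},{high}}}" if low else ""
        parts.append(IUPAC[code] + repeat)
    return "".join(parts)

if __name__ == "__main__":
    pattern = consensus_to_regex("TGRAACAN1-2TGTTYCA")
    print(pattern)
    print(bool(re.fullmatch(pattern, "TGAAACAATGTTCCA")))
```

A plain regex treats every allowed base as equally likely, so it is only a first-pass filter; a position weight matrix would be needed to rank candidate sites.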
Affiliation(s)
- Inna A Suvorova
  - Institute for Information Transmission Problems of Russian Academy of Sciences (The Kharkevich Institute), Moscow, Russia
- Mikhail S Gelfand
  - Institute for Information Transmission Problems of Russian Academy of Sciences (The Kharkevich Institute), Moscow, Russia
  - Skolkovo Institute of Science and Technology, Moscow, Russia
|
3
|
Baumgarten N, Schmidt F, Schulz MH. Improved linking of motifs to their TFs using domain information. Bioinformatics 2020; 36:1655-1662. [PMID: 31742324 PMCID: PMC7703792 DOI: 10.1093/bioinformatics/btz855] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/01/2019] [Revised: 11/08/2019] [Accepted: 11/16/2019] [Indexed: 11/23/2022]
Abstract
Motivation: A central aim of molecular biology is to identify mechanisms of transcriptional regulation. Transcription factors (TFs), which are DNA-binding proteins, are highly involved in these processes, so it is crucial to know where TFs interact with DNA and to be aware of the TFs’ DNA-binding motifs. For that reason, computational tools exist that link DNA-binding motifs to TFs either without sequence information or based on TF-associated sequences, e.g. identified via a chromatin immunoprecipitation followed by sequencing (ChIP-seq) experiment. In this paper, we present MASSIF, a novel method to improve the performance of existing tools that link motifs to TFs relying on TF-associated sequences. MASSIF is based on the idea that a DNA-binding motif, which is correctly linked to a TF, should be assigned to a DNA-binding domain (DBD) similar to that of the mapped TF. Because DNA-binding motifs are in general not linked to DBDs, it is not possible to compare the DBD of a TF and the motif directly. Instead we created a DBD collection, which consists of TFs with a known DBD and an associated motif. This collection enables us to evaluate how likely it is that a linked motif and a TF of interest are associated to the same DBD. We named this similarity measure domain score, and represent it as a P-value. We developed two different ways to improve the performance of existing tools that link motifs to TFs based on TF-associated sequences: (i) using meta-analysis to combine P-values from one or several of these tools with the P-value of the domain score and (ii) filtering unlikely motifs based on the domain score. Results: We demonstrate the functionality of MASSIF on several human ChIP-seq datasets, using either motifs from the HOCOMOCO database or de novo identified ones as input motifs. In addition, we show that both variants of our method significantly improve the performance of tools that link motifs to TFs based on TF-associated sequences, independent of the considered DBD type. Availability and implementation: MASSIF is freely available online at https://github.com/SchulzLab/MASSIF. Supplementary information: Supplementary data are available at Bioinformatics online.
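The two variants described in the abstract, combining P-values by meta-analysis and filtering on the domain score, can be illustrated with a small sketch. The abstract does not say which meta-analysis method MASSIF uses; Fisher's method is shown here only as one common choice, and the function names and the 0.05 cutoff are hypothetical.

```python
import math

def fisher_combine(p_values):
    """Combine independent P-values with Fisher's method.

    The statistic X = -2 * sum(ln p) follows a chi-square distribution
    with 2k degrees of freedom; for even degrees of freedom the upper
    tail has the closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total

def filter_by_domain_score(motif_p_values, cutoff=0.05):
    """Variant (ii): keep motifs whose domain-score P-value passes a cutoff.

    The cutoff value is an assumption chosen for illustration.
    """
    return [motif for motif, p in motif_p_values.items() if p <= cutoff]

if __name__ == "__main__":
    # Hypothetical tool P-value and domain-score P-value for one motif.
    print(fisher_combine([0.05, 0.05]))
    print(filter_by_domain_score({"motif_1": 0.01, "motif_2": 0.40}))
```

Fisher's method assumes the combined P-values are independent, which may not hold when several tools score the same sequences.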
Affiliation(s)
- Nina Baumgarten
  - Institute for Cardiovascular Regeneration, Goethe University, Frankfurt am Main 60590, Germany
  - German Center for Cardiovascular Regeneration, Partner Site Rhein-Main, Frankfurt am Main 60590, Germany
- Florian Schmidt
  - High-throughput Genomics & Systems Biology, Cluster of Excellence MMCI, Saarland University
  - Research Group Computational Biology, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken 66123, Germany
- Marcel H Schulz
  - Institute for Cardiovascular Regeneration, Goethe University, Frankfurt am Main 60590, Germany
  - German Center for Cardiovascular Regeneration, Partner Site Rhein-Main, Frankfurt am Main 60590, Germany
  - High-throughput Genomics & Systems Biology, Cluster of Excellence MMCI, Saarland University
  - Research Group Computational Biology, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken 66123, Germany
|
4
|
Andrabi M, Hutchins AP, Miranda-Saavedra D, Kono H, Nussinov R, Mizuguchi K, Ahmad S. Predicting conformational ensembles and genome-wide transcription factor binding sites from DNA sequences. Sci Rep 2017; 7:4071. [PMID: 28642456 PMCID: PMC5481346 DOI: 10.1038/s41598-017-03199-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 12/28/2016] [Accepted: 04/26/2017] [Indexed: 12/24/2022]
Abstract
DNA shape is emerging as an important determinant of transcription factor binding beyond just the DNA sequence. The only tool for large scale DNA shape estimates, DNAshape, was derived from Monte-Carlo simulations and predicts four broad and static DNA shape features: Propeller twist, Helical twist, Minor groove width and Roll. The contributions of other shape features, e.g. Shift, Slide and Opening, cannot be evaluated using DNAshape. Here, we report a novel method, DynaSeq, which predicts molecular dynamics-derived ensembles of a more exhaustive set of DNA shape features. We compared the DNAshape and DynaSeq predictions for the common features and applied both to predict the genome-wide binding sites of 1312 TFs available from protein interaction quantification (PIQ) data. The results indicate a good agreement between the two methods for the common shape features and point to advantages in using DynaSeq. Predictive models employing ensembles from individual conformational parameters revealed that base-pair opening, known to be important in strand separation, was the best predictor of transcription factor-binding sites (TFBS), followed by features employed by DNAshape. Of note, TFBS could be predicted not only from the features at the target motif sites, but also from those as far as 200 nucleotides away from the motif.
Affiliation(s)
- Munazah Andrabi
  - National Institutes of Biomedical Innovation Health and Nutrition, 7-6-8, Saito-Asagi, Ibaraki, Osaka, 5670085, Japan
  - Faculty of Biology, Medicine and Health, Michael Smith Building, The University of Manchester, Dover Street, Manchester, M13 9PT, UK
- Andrew Paul Hutchins
  - Department of Biology, Southern University of Science and Technology of China, Shenzhen, 518055, China
- Diego Miranda-Saavedra
  - World Premier International (WPI) Immunology Frontier Research Center (IFReC), Osaka University, 3-1 Yamadaoka, Suita, 565-0871, Osaka, Japan
  - Centro de Biología Molecular Severo Ochoa, CSIC/Universidad Autónoma de Madrid, 28049, Madrid, Spain
  - Department of Computer Science, University of Oxford Wolfson Building, Parks Road, Oxford, OX1 3QD, United Kingdom
- Hidetoshi Kono
  - Molecular Modeling and Simulation (MMS) Group, National Institutes for Quantum and Radiological Science and Technology, 8-1-7, Umemidai, Kizugawa, Kyoto, 619-0215, Japan
- Ruth Nussinov
  - National Cancer Institute, Cancer and Inflammation Program, Leidos Biomedical Research, Inc. Frederick, Maryland, USA
  - Department of Biochemistry and Human Genetics, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Kenji Mizuguchi
  - National Institutes of Biomedical Innovation Health and Nutrition, 7-6-8, Saito-Asagi, Ibaraki, Osaka, 5670085, Japan
- Shandar Ahmad
  - National Institutes of Biomedical Innovation Health and Nutrition, 7-6-8, Saito-Asagi, Ibaraki, Osaka, 5670085, Japan
  - School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Mehrauli Road, New Delhi, 110067, India
|
5
|
Validating regulatory predictions from diverse bacteria with mutant fitness data. PLoS One 2017; 12:e0178258. [PMID: 28542589 PMCID: PMC5443562 DOI: 10.1371/journal.pone.0178258] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 12/07/2016] [Accepted: 04/27/2017] [Indexed: 11/26/2022]
Abstract
Although transcriptional regulation is fundamental to understanding bacterial physiology, the targets of most bacterial transcription factors are not known. Comparative genomics has been used to identify likely targets of some of these transcription factors, but these predictions typically lack experimental support. Here, we used mutant fitness data, which measures the importance of each gene for a bacterium’s growth across many conditions, to test regulatory predictions from RegPrecise, a curated collection of comparative genomics predictions. Because characterized transcription factors often have correlated fitness with one of their targets (either positively or negatively), correlated fitness patterns provide support for the comparative genomics predictions. At a false discovery rate of 3%, we identified significant cofitness for at least one target of 158 TFs in 107 ortholog groups and from 24 bacteria. Thus, high-throughput genetics can be used to identify a high-confidence subset of the sequence-based regulatory predictions.
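The core test in this abstract, whether a transcription factor and one of its predicted targets have correlated fitness across conditions, can be sketched as a Pearson correlation between fitness vectors. This is an illustration under stated assumptions, not the paper's pipeline: the function names and the toy fitness values are hypothetical, and the false discovery rate step is omitted.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length fitness vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant vector has no defined correlation")
    return sxy / math.sqrt(sxx * syy)

def best_cofitness(tf_fitness, target_fitness):
    """Return the predicted target with the largest absolute correlation.

    The absolute value is used because the abstract notes that the
    correlation can be positive or negative.
    """
    scores = {gene: pearson(tf_fitness, values)
              for gene, values in target_fitness.items()}
    best = max(scores, key=lambda gene: abs(scores[gene]))
    return best, scores[best]

if __name__ == "__main__":
    # Toy fitness values across four conditions (hypothetical data).
    tf = [0.0, -1.0, -2.0, 0.5]
    targets = {"geneA": [0.0, -2.0, -4.0, 1.0], "geneB": [1.0, 0.0, 1.0, 0.0]}
    print(best_cofitness(tf, targets))
```

In practice the significance of each correlation would have to be assessed against a null distribution before applying a false discovery rate threshold such as the 3% quoted above.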
|
6
|
Oliver P, Peralta-Gil M, Tabche ML, Merino E. Molecular and structural considerations of TF-DNA binding for the generation of biologically meaningful and accurate phylogenetic footprinting analysis: the LysR-type transcriptional regulator family as a study model. BMC Genomics 2016; 17:686. [PMID: 27567672 PMCID: PMC5002191 DOI: 10.1186/s12864-016-3025-3] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 04/18/2016] [Accepted: 08/18/2016] [Indexed: 11/10/2022]
Abstract
BACKGROUND The goal of most programs developed to find transcription factor binding sites (TFBSs) is the identification of discrete sequence motifs that are significantly over-represented in a given set of sequences where a transcription factor (TF) is expected to bind. These programs assume that the nucleotide conservation of a specific motif is indicative of a selective pressure required for the recognition of a TF for its corresponding TFBS. Despite their extensive use, the accuracies reached with these programs remain low. In many cases, true TFBSs are excluded from the identification process, especially when they correspond to low-affinity but important binding sites of regulatory systems. RESULTS We developed a computational protocol based on molecular and structural criteria to perform biologically meaningful and accurate phylogenetic footprinting analyses. Our protocol considers fundamental aspects of the TF-DNA binding process, such as: i) the active homodimeric conformations of TFs that impose symmetric structures on the TFBSs, ii) the cooperative binding of TFs, iii) the effects of the presence or absence of co-inducers, iv) the proximity between two TFBSs or one TFBS and a promoter that leads to very long spurious motifs, v) the presence of AT-rich sequences not recognized by the TF but that are required for DNA flexibility, and vi) the dynamic order in which the different binding events take place to determine a regulatory response (i.e., activation or repression). In our protocol, the abovementioned criteria were used to analyze a profile of consensus motifs generated from canonical Phylogenetic Footprinting Analyses using a set of analysis windows of incremental sizes. To evaluate the performance of our protocol, we analyzed six members of the LysR-type TF family in Gammaproteobacteria. 
CONCLUSIONS The identification of TFBSs based exclusively on the significance of the over-representation of motifs in a set of sequences might lead to inaccurate results. The consideration of different molecular and structural properties of the regulatory systems benefits the identification of TFBSs and enables the development of elaborate, biologically meaningful and precise regulatory models that offer a more integrated view of the dynamics of the regulatory process of transcription.
Affiliation(s)
- Patricia Oliver
  - Departamento de Microbiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Martín Peralta-Gil
  - Escuela Superior de Apan de la Universidad Autónoma del Estado de Hidalgo, Carretera Apan-Calpulalpan, Km 8, Chimalpa Tlalayote s/n, Colonia Chimalpa, Apan, Hidalgo, México
- María-Luisa Tabche
  - Departamento de Microbiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Enrique Merino
  - Departamento de Microbiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
|
7
|
Suvorova IA, Rodionov DA. Comparative genomics of pyridoxal 5'-phosphate-dependent transcription factor regulons in Bacteria. Microb Genom 2016; 2:e000047. [PMID: 28348826 PMCID: PMC5320631 DOI: 10.1099/mgen.0.000047] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 11/02/2015] [Accepted: 12/16/2015] [Indexed: 12/13/2022]
Abstract
The MocR-subfamily transcription factors (MocR-TFs) characterized by the GntR-family DNA-binding domain and aminotransferase-like sensory domain are broadly distributed among certain lineages of Bacteria. Characterized MocR-TFs bind pyridoxal 5'-phosphate (PLP) and control transcription of genes involved in PLP, gamma aminobutyric acid (GABA) and taurine metabolism via binding specific DNA operator sites. To identify putative target genes and DNA binding motifs of MocR-TFs, we performed comparative genomics analysis of over 250 bacterial genomes. The reconstructed regulons for 825 MocR-TFs comprise structural genes from over 200 protein families involved in diverse biological processes. Using the genome context and metabolic subsystem analysis we tentatively assigned functional roles for 38 out of 86 orthologous groups of studied regulators. Most of these MocR-TF regulons are involved in PLP metabolism, as well as utilization of GABA, taurine and ectoine. The remaining studied MocR-TF regulators presumably control genes encoding enzymes involved in reduction/oxidation processes, various transporters and PLP-dependent enzymes, for example aminotransferases. Predicted DNA binding motifs of MocR-TFs are generally similar in each orthologous group and are characterized by two to four repeated sequences. Identified motifs were classified according to their structures. Motifs with direct and/or inverted repeat symmetry constitute the majority of inferred DNA motifs, suggesting preferable TF dimerization in head-to-tail or head-to-head configuration. The obtained genomic collection of in silico reconstructed MocR-TF motifs and regulons in Bacteria provides a basis for future experimental characterization of molecular mechanisms for various regulators in this family.
Affiliation(s)
- Inna A. Suvorova
  - A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Science, Moscow, Russia
- Dmitry A. Rodionov
  - A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Science, Moscow, Russia
  - Sanford-Burnham-Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
  - Correspondence: D. A. Rodionov
|
8
|
Suvorova IA, Korostelev YD, Gelfand MS. GntR Family of Bacterial Transcription Factors and Their DNA Binding Motifs: Structure, Positioning and Co-Evolution. PLoS One 2015; 10:e0132618. [PMID: 26151451 PMCID: PMC4494728 DOI: 10.1371/journal.pone.0132618] [Citation(s) in RCA: 67] [Impact Index Per Article: 7.4] [Received: 03/31/2015] [Accepted: 06/16/2015] [Indexed: 12/03/2022]
Abstract
The GntR family of transcription factors (TFs) is a large group of proteins present in diverse bacteria and regulating various biological processes. Here we use the comparative genomics approach to reconstruct regulons and identify binding motifs of regulators from three subfamilies of the GntR family: FadR, HutC, and YtrA. Using these data, we attempt to predict DNA-protein contacts by analyzing correlations between binding motifs in DNA and amino acid sequences of TFs. We identify pairs of positions with high correlation between amino acids and nucleotides for the FadR, HutC, and YtrA subfamilies and show that most predicted DNA-protein interactions are quite similar in all subfamilies and conform well to the experimentally identified contacts formed by FadR from E. coli and AraR from B. subtilis. The most frequent predicted contacts in the analyzed subfamilies are Arg-G, Asn-A, and Asp-C. We also analyze the divergon structure and preferred site positions relative to regulated genes in the FadR and HutC subfamilies. A single site in a divergon usually regulates both operons and is approximately in the middle of the intergenic area. Double sites are either involved in the co-operative regulation of both operons and then are in the center of the intergenic area, or each site in the pair independently regulates its own operon and tends to be near it. We also identify additional candidate TF-binding boxes near palindromic binding sites of TFs from the FadR, HutC, and YtrA subfamilies, which may play a role in the binding of additional TF subunits.
Affiliation(s)
- Inna A. Suvorova
  - Research and Training Center on Bioinformatics, Institute for Information Transmission Problems RAS (The Kharkevich Institute), Moscow, Russia
- Yuri D. Korostelev
  - Research and Training Center on Bioinformatics, Institute for Information Transmission Problems RAS (The Kharkevich Institute), Moscow, Russia
- Mikhail S. Gelfand
  - Research and Training Center on Bioinformatics, Institute for Information Transmission Problems RAS (The Kharkevich Institute), Moscow, Russia
  - Faculty of Bioengineering and Bioinformatics, Moscow State University, Moscow, Russia
|
9
|
An integrated approach to reconstructing genome-scale transcriptional regulatory networks. PLoS Comput Biol 2015; 11:e1004103. [PMID: 25723545 PMCID: PMC4344238 DOI: 10.1371/journal.pcbi.1004103] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 04/09/2014] [Accepted: 12/23/2014] [Indexed: 11/24/2022]
Abstract
Transcriptional regulatory networks (TRNs) program cells to dynamically alter their gene expression in response to changing internal or environmental conditions. In this study, we develop a novel workflow for generating large-scale TRN models that integrates comparative genomics data, global gene expression analyses, and intrinsic properties of transcription factors (TFs). An assessment of this workflow using benchmark datasets for the well-studied γ-proteobacterium Escherichia coli showed that it outperforms expression-based inference approaches, having a significantly larger area under the precision-recall curve. Further analysis indicated that this integrated workflow captures different aspects of the E. coli TRN than expression-based approaches, potentially making them highly complementary. We leveraged this new workflow and observations to build a large-scale TRN model for the α-Proteobacterium Rhodobacter sphaeroides that comprises 120 gene clusters, 1211 genes (including 93 TFs), 1858 predicted protein-DNA interactions and 76 DNA binding motifs. We found that ~67% of the predicted gene clusters in this TRN are enriched for functions ranging from photosynthesis or central carbon metabolism to environmental stress responses. We also found that members of many of the predicted gene clusters were consistent with prior knowledge in R. sphaeroides and/or other bacteria. Experimental validation of predictions from this R. sphaeroides TRN model showed that high precision and recall was also obtained for TFs involved in photosynthesis (PpsR), carbon metabolism (RSP_0489) and iron homeostasis (RSP_3341). In addition, this integrative approach enabled generation of TRNs with increased information content relative to R. sphaeroides TRN models built via other approaches. We also show how this approach can be used to simultaneously produce TRN models for each related organism used in the comparative genomics analysis. 
Our results highlight the advantages of integrating comparative genomics of closely related organisms with gene expression data to assemble large-scale TRN models with high-quality predictions. The ever growing amount of genomic data enables the assembly of large-scale network models that can provide important new insights into living systems. However, assembly and validation of such large-scale models can be challenging, since we often lack sufficient information to make accurate predictions. This work describes a new approach for constructing large-scale transcriptional regulatory networks of individual cells. We show that the reconstructed network captures a significantly larger fraction of cellular regulatory processes than networks generated by other existing approaches. We predict this approach, with appropriate refinements, will allow reconstruction of large-scale transcriptional network models for a variety of other organisms. As we work towards modeling the function of cells or complex ecosystems, individually reconstructed network models of signaling, information transfer and metabolism, can be integrated to provide high information predictions and insights not otherwise obtainable.
|
10
|
Andrilenas KK, Penvose A, Siggers T. Using protein-binding microarrays to study transcription factor specificity: homologs, isoforms and complexes. Brief Funct Genomics 2014; 14:17-29. [PMID: 25431149 DOI: 10.1093/bfgp/elu046] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 01/14/2023]
Abstract
Protein-DNA binding is central to specificity in gene regulation, and methods for characterizing transcription factor (TF)-DNA binding remain crucial to studies of regulatory specificity. High-throughput (HT) technologies have revolutionized our ability to characterize protein-DNA binding by significantly increasing the number of binding measurements that can be performed. Protein-binding microarrays (PBMs) are a robust and powerful HT platform for studying DNA-binding specificity of TFs. Analysis of PBM-determined DNA-binding profiles has provided new insight into the scope and mechanisms of TF binding diversity. In this review, we focus specifically on the PBM technique and discuss its application to the study of TF specificity, in particular, the binding diversity of TF homologs and multi-protein complexes.
|
11
|
Zhang S, Zhou X, Du C, Su Z. SPIC: a novel similarity metric for comparing transcription factor binding site motifs based on information contents. BMC SYSTEMS BIOLOGY 2013; 7 Suppl 2:S14. [PMID: 24564945 PMCID: PMC3866262 DOI: 10.1186/1752-0509-7-s2-s14] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 11/10/2022]
Abstract
BACKGROUND Discovering transcription factor binding sites (TFBS) is one of the primary challenges in deciphering the complex gene regulatory networks encoded in a genome. A set of short DNA sequences recognized by a transcription factor (TF) is known as a motif, which can be expressed accurately in matrix form such as a position-specific scoring matrix (PSSM) or a position frequency matrix (PFM). Very frequently, we need to query a motif in a database of motifs by seeking its similar motifs, merge similar TFBS motifs possibly recognized by the same TF, separate irrelevant motifs, or filter out spurious motifs. Therefore, a novel metric is required to capture slight differences between irrelevant motifs and highlight the similarity between motifs of the same group in all these applications. While several metrics for motif similarity have already been proposed, their performance is still far from satisfactory for these applications. METHODS A novel metric named SPIC (Similarity with Position Information Contents) is proposed in this paper for measuring the similarity between a column of a motif and a column of another motif. When defining this similarity score, we consider the likelihood that the column of the first motif's PFM can be produced by the column of the second motif's PSSM, and multiply the likelihood by the information content of the column of the second motif's PSSM, and vice versa. We evaluated the performance of SPIC combined with a local or a global alignment method having a function for affine gap penalty, for computing the similarity between two motifs. We also compared SPIC with seven existing state-of-the-art metrics for their capability of clustering motifs from the same group and retrieving motifs from a database on three datasets.
RESULTS When used jointly with the Smith-Waterman local alignment method with an affine gap penalty function (gap open penalty is equal to 1, gap extension penalty is equal to 0.5), SPIC outperforms the seven existing state-of-the-art motif similarity metrics combined with their best alignments for matching motifs in database searches, and clustering the same TF's sub-motifs or distinguishing relevant ones from a miscellaneous group of motifs. CONCLUSIONS We have developed a novel motif similarity metric that can more accurately match motifs in database searches, and more effectively cluster similar motifs and differentiate irrelevant motifs than do the other seven metrics we are aware of.
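The METHODS passage describes a column score built from a likelihood and an information content, applied in both directions. The abstract does not give the exact formula, so the sketch below is only an illustration of that idea and is not the published SPIC definition: it uses the expected log-odds of one column under the other column's PSSM as a stand-in for the likelihood, a uniform background, and a pseudocount of 0.25, all of which are assumptions. Every function name is hypothetical.

```python
import math

BASES = "ACGT"

def column_probs(counts, pseudocount=0.25):
    """Turn one PFM column (base -> count) into smoothed probabilities."""
    total = sum(counts.get(b, 0) for b in BASES) + 4 * pseudocount
    return {b: (counts.get(b, 0) + pseudocount) / total for b in BASES}

def information_content(probs):
    """Information content in bits against a uniform background."""
    return 2.0 + sum(p * math.log2(p) for p in probs.values() if p > 0)

def pssm_column(probs, background=0.25):
    """Log-odds score of each base relative to the background frequency."""
    return {b: math.log2(probs[b] / background) for b in BASES}

def directed_score(probs_a, probs_b):
    """Expected log-odds of column A under B's PSSM, weighted by B's
    information content (illustrative stand-in, not the SPIC formula)."""
    pssm_b = pssm_column(probs_b)
    expected = sum(probs_a[b] * pssm_b[b] for b in BASES)
    return expected * information_content(probs_b)

def column_similarity(counts_a, counts_b):
    """Symmetric score: A scored against B plus B scored against A."""
    pa, pb = column_probs(counts_a), column_probs(counts_b)
    return directed_score(pa, pb) + directed_score(pb, pa)

if __name__ == "__main__":
    conserved_a = {"A": 10}
    conserved_c = {"C": 10}
    print(column_similarity(conserved_a, conserved_a))
    print(column_similarity(conserved_a, conserved_c))
```

Weighting by information content means that agreement at a highly conserved column counts for more than agreement at a nearly uniform one, which matches the motivation stated in the abstract.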
|
12
|
Eichner J, Topf F, Dräger A, Wrzodek C, Wanke D, Zell A. TFpredict and SABINE: sequence-based prediction of structural and functional characteristics of transcription factors. PLoS One 2013; 8:e82238. [PMID: 24349230 PMCID: PMC3861411 DOI: 10.1371/journal.pone.0082238] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 06/13/2013] [Accepted: 10/21/2013] [Indexed: 11/18/2022]
Abstract
One of the key mechanisms of transcriptional control is the specific connection between transcription factors (TFs) and cis-regulatory elements in gene promoters. The elucidation of these specific protein-DNA interactions is crucial to gain insights into the complex regulatory mechanisms and networks underlying the adaptation of organisms to dynamically changing environmental conditions. As experimental techniques for determining TF binding sites are expensive and mostly performed for selected TFs only, accurate computational approaches are needed to analyze transcriptional regulation in eukaryotes on a genome-wide level. We implemented a four-step classification workflow which for a given protein sequence (1) discriminates TFs from other proteins, (2) determines the structural superclass of TFs, (3) identifies the DNA-binding domains of TFs and (4) predicts their cis-acting DNA motif. While existing tools were extended and adapted for performing the latter two prediction steps, the first two steps are based on a novel numeric sequence representation which allows for combining existing knowledge from a BLAST scan with robust machine learning-based classification. By evaluation on a set of experimentally confirmed TFs and non-TFs, we demonstrate that our new protein sequence representation facilitates more reliable identification and structural classification of TFs than previously proposed sequence-derived features. The algorithms underlying our proposed methodology are implemented in the two complementary tools TFpredict and SABINE. The online and stand-alone versions of TFpredict and SABINE are freely available to academics at http://www.cogsys.cs.uni-tuebingen.de/software/TFpredict/ and http://www.cogsys.cs.uni-tuebingen.de/software/SABINE/.
Affiliation(s)
- Johannes Eichner: Center of Bioinformatics Tuebingen (ZBIT), University of Tuebingen, Tübingen, Germany
- Florian Topf: Center of Bioinformatics Tuebingen (ZBIT), University of Tuebingen, Tübingen, Germany
- Andreas Dräger: Center of Bioinformatics Tuebingen (ZBIT), University of Tuebingen, Tübingen, Germany; University of California San Diego, La Jolla, California, United States of America
- Clemens Wrzodek: Center of Bioinformatics Tuebingen (ZBIT), University of Tuebingen, Tübingen, Germany
- Dierk Wanke: Center for Plant Physiology Tuebingen (ZMBP), University of Tuebingen, Tübingen, Germany
- Andreas Zell: Center of Bioinformatics Tuebingen (ZBIT), University of Tuebingen, Tübingen, Germany
|
13
|
Conserved Motifs and Prediction of Regulatory Modules in Caenorhabditis elegans. G3: Genes, Genomes, Genetics 2012; 2:469-81. [PMID: 22540038 PMCID: PMC3337475 DOI: 10.1534/g3.111.001081] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 09/08/2011] [Accepted: 02/06/2012] [Indexed: 01/30/2023]
Abstract
Transcriptional regulation, a primary mechanism for controlling the development of multicellular organisms, is carried out by transcription factors (TFs) that recognize and bind to their cognate binding sites. In Caenorhabditis elegans, our knowledge of which genes are regulated by which TFs, through binding to specific sites, is still very limited. To expand our knowledge about the C. elegans regulatory network, we performed a comprehensive analysis of the C. elegans, Caenorhabditis briggsae, and Caenorhabditis remanei genomes to identify regulatory elements that are conserved in all genomes. Our analysis identified 4959 elements that are significantly conserved across the genomes and that each occur multiple times within each genome, both hallmarks of functional regulatory sites. Our motifs show significant matches to known core promoter elements, TF binding sites, splice sites, and poly-A signals as well as many putative regulatory sites. Many of the motifs are significantly correlated with various types of experimental data, including gene expression patterns, tissue-specific expression patterns, and binding site location analysis as well as enrichment in specific functional classes of genes. Many can also be significantly associated with specific TFs. Combinations of motif occurrences allow us to predict the location of cis-regulatory modules and we show that many of them significantly overlap experimentally determined enhancers. We provide access to the predicted binding sites, their associated motifs, and the predicted cis-regulatory modules across the whole genome through a web-accessible database and as tracks for genome browsers.
|
14
|
Ishihama A. Prokaryotic genome regulation: a revolutionary paradigm. Proc Jpn Acad Ser B Phys Biol Sci 2012; 88:485-508. [PMID: 23138451 PMCID: PMC3511978 DOI: 10.2183/pjab.88.485] [Citation(s) in RCA: 76] [Impact Index Per Article: 6.3] [Received: 05/31/2012] [Accepted: 08/31/2012] [Indexed: 06/01/2023]
Abstract
After determination of the whole genome sequence, the research frontier of bacterial molecular genetics has shifted to revealing how the genome is regulated under stressful conditions in nature. The gene selectivity of RNA polymerase is modulated after interaction with two groups of regulatory proteins, 7 sigma factors and 300 transcription factors. To identify the regulation targets of transcription factors in Escherichia coli, we developed the Genomic SELEX system and used it to screen for the binding sites of these factors on the genome. The number of regulation targets of a single transcription factor was larger than hitherto recognized, ranging up to hundreds of promoters. The number of transcription factors involved in regulation of a single promoter also increased to as many as 30 regulators. The multi-target transcription factors and the multi-factor promoters were assembled into complex networks of transcription regulation. The most complex network was identified in the regulation cascades of transcription of two master regulators for planktonic growth and biofilm formation.
Affiliation(s)
- Akira Ishihama: Department of Frontier Bioscience and Micro-Nano Technology Research Center, Hosei University, Koganei, Tokyo 184-8584, Japan.
|
15
|
Yang S, Yalamanchili HK, Li X, Yao KM, Sham PC, Zhang MQ, Wang J. Correlated evolution of transcription factors and their binding sites. Bioinformatics 2011; 27:2972-8. [PMID: 21896508 DOI: 10.1093/bioinformatics/btr503] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Indexed: 11/13/2022]
Abstract
MOTIVATION: The interaction between transcription factor (TF) and transcription factor binding site (TFBS) is essential for gene regulation. Mutation in either the TF or the TFBS may weaken their interaction and thus result in abnormalities. To maintain such vital interaction, a mutation in one of the interacting partners might be compensated by a corresponding mutation in its binding partner during the course of evolution. Confirming this co-evolutionary relationship will guide us in designing protein sequences to target a specific DNA sequence or in predicting TFBS for poorly studied proteins, or even correcting and rescuing disease mutations in clinical applications. RESULTS: Based on six publicly available, experimentally validated TF-TFBS binding datasets for the basic Helix-Loop-Helix (bHLH) family, Homeo family, High-Mobility Group (HMG) family and Transient Receptor Potential channels (TRP) family, we showed that the evolutions of the TFs and their TFBSs are significantly correlated across eukaryotes. We further developed a mutual information-based method to identify co-evolved protein residues and DNA bases. This research sheds light on the dynamic relationship between TF and TFBS during their evolution. The same principle and strategy can be applied to co-evolutionary studies on protein-DNA interactions in other protein families. AVAILABILITY: All the datasets, scripts and other related files have been made freely available at: http://jjwanglab.org/co-evo. CONTACT: junwen@uw.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
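The abstract above mentions a mutual information-based method for pairing co-evolving protein residues with DNA bases but does not give its details. As a rough illustration only, the sketch below computes plain mutual information between one alignment column of TF residues and one column of binding-site bases. The function name, the toy columns, and the choice of unnormalized MI in bits are assumptions for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def mutual_information(column_a, column_b):
    """Mutual information (in bits) between two equally long symbol columns.

    Hypothetical helper: column_a could hold the residue found at one TF
    alignment position in each species, column_b the base found at one
    binding-site position in the same species, in the same order.
    """
    if len(column_a) != len(column_b):
        raise ValueError("columns must have the same length")
    n = len(column_a)
    count_a = Counter(column_a)
    count_b = Counter(column_b)
    count_ab = Counter(zip(column_a, column_b))
    mi = 0.0
    for (a, b), joint in count_ab.items():
        p_ab = joint / n
        # p_ab / (p_a * p_b), written with raw counts
        ratio = joint * n / (count_a[a] * count_b[b])
        mi += p_ab * math.log2(ratio)
    return mi


# Toy data (invented): the residue and the base change together.
residues = ["R", "R", "K", "K"]
covarying_bases = ["G", "G", "A", "A"]
independent_bases = ["G", "A", "G", "A"]

print(mutual_information(residues, covarying_bases))
print(mutual_information(residues, independent_bases))
```

A high score flags a residue-base pair that varies together across species; a real analysis would also need a significance test and a correction for shared phylogeny, which this sketch omits.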
Affiliation(s)
- Shu Yang: Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
|
16
|
Ishihama A. Prokaryotic genome regulation: multifactor promoters, multitarget regulators and hierarchic networks. FEMS Microbiol Rev 2010; 34:628-45. [DOI: 10.1111/j.1574-6976.2010.00227.x] [Citation(s) in RCA: 170] [Impact Index Per Article: 12.1] [Indexed: 11/29/2022] Open
|
17
|
Sahota G, Stormo GD. Novel sequence-based method for identifying transcription factor binding sites in prokaryotic genomes. Bioinformatics 2010; 26:2672-7. [PMID: 20807838 DOI: 10.1093/bioinformatics/btq501] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Computational techniques for microbial genomic sequence analysis are becoming increasingly important. With next-generation sequencing technology and the human microbiome project underway, current sequencing capacity is significantly greater than the speed at which organisms of interest can be studied experimentally. Most related computational work has been focused on sequence assembly, gene annotation and metabolic network reconstruction. We have developed a method that will primarily use available sequence data in order to determine prokaryotic transcription factor (TF) binding specificities. RESULTS: Specificity determining residues (critical residues) were identified from crystal structures of DNA-protein complexes and TFs with the same critical residues were grouped into specificity classes. The putative binding regions for each class were defined as the set of promoters for each TF itself (autoregulatory) and the immediately upstream and downstream operons. MEME was used to find putative motifs within each separate class. Tests on the LacI and TetR TF families, using RegulonDB annotated sites, showed prediction sensitivities of 86% and 80%, respectively. AVAILABILITY: http://ural.wustl.edu/~gsahota/HTHmotif/
Affiliation(s)
- Gurmukh Sahota: Department of Genetics, Washington University School of Medicine, Saint Louis, MO 63108, USA
|
18
|
Cai Y, He Z, Shi X, Kong X, Gu L, Xie L. A novel sequence-based method of predicting protein DNA-binding residues, using a machine learning approach. Mol Cells 2010; 30:99-105. [PMID: 20706794 DOI: 10.1007/s10059-010-0093-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 08/14/2009] [Revised: 04/06/2010] [Accepted: 04/22/2010] [Indexed: 11/29/2022] Open
Abstract
Protein-DNA interactions play an essential role in transcriptional regulation, DNA repair, and many vital biological processes. The mechanism of protein-DNA binding, however, remains unclear. For the study of many diseases, researchers must improve their understanding of the amino acid motifs that recognize DNA. Because identifying these motifs experimentally is expensive and time-consuming, it is necessary to devise an approach for computational prediction. Some in silico methods have been developed, but there are still considerable limitations. In this study, we used a machine learning approach to develop a new sequence-based method of predicting protein-DNA binding residues. To make these predictions, we used the properties of the micro-environment of each amino acid from the AAIndex as well as conservation scores. Testing by the cross-validation method, we obtained an overall accuracy of 94.89%. Our method shows that the amino acid micro-environment is important for DNA binding, and that it is possible to identify the protein-DNA binding sites with it.
Affiliation(s)
- Yudong Cai: Institute of System Biology, Shanghai University, Shanghai, 200244, People's Republic of China.
|
19
|
Fadda A, Fierro AC, Lemmens K, Monsieurs P, Engelen K, Marchal K. Inferring the transcriptional network of Bacillus subtilis. Mol Biosyst 2009; 5:1840-52. [PMID: 20023724 DOI: 10.1039/b907310h] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.9] [Indexed: 11/21/2022]
Abstract
The adaptation of bacteria to the vigorous environmental changes they undergo is crucial to their survival. They achieve this adaptation partly via intricate regulation of the transcription of their genes. In this study, we infer the transcriptional network of the Gram-positive model organism, Bacillus subtilis. We use a data integration workflow, exploiting both motif and expression data, towards the generation of condition-dependent transcriptional modules. In building the motif data, we rely on both known and predicted information. Known motifs were derived from DBTBS, while predicted motifs were generated by a de novo motif detection method that utilizes comparative genomics. The expression data consists of a compendium of microarrays across different platforms. Our results indicate that a considerable part of the B. subtilis network is yet undiscovered; we could predict 417 new regulatory interactions for known regulators and 453 interactions for yet uncharacterized regulators. The regulators in our network showed a preference for regulating modules in certain environmental conditions. Also, substantial condition-dependent intra-operonic regulation seems to take place. Global regulators seem to require functional flexibility to attain their roles by acting as both activators and repressors.
Affiliation(s)
- Abeer Fadda: Department of Microbial and Molecular Systems, KULeuven, Kasteelpark Arenberg 20, 3001 Heverlee, Belgium
|
20
|
Zhang S, Xu M, Li S, Su Z. Genome-wide de novo prediction of cis-regulatory binding sites in prokaryotes. Nucleic Acids Res 2009; 37:e72. [PMID: 19383880 PMCID: PMC2691844 DOI: 10.1093/nar/gkp248] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Indexed: 01/17/2023] Open
Abstract
Although cis-regulatory binding sites (CRBSs) are at least as important as the coding sequences in a genome, our general understanding of them in most sequenced genomes is very limited due to the lack of efficient and accurate experimental and computational methods for their characterization, which has largely hindered our understanding of many important biological processes. In this article, we describe a novel algorithm for genome-wide de novo prediction of CRBSs with high accuracy. We designed our algorithm to circumvent three identified difficulties for CRBS prediction using comparative genomics principles based on a new method for the selection of reference genomes, a new metric for measuring the similarity of CRBSs, and a new graph clustering procedure. When operon structures are correctly predicted, our algorithm can predict 81% of known individual binding sites belonging to 94% of known cis-regulatory motifs in the Escherichia coli K12 genome, while achieving high prediction specificity. Our algorithm has also achieved similar prediction accuracy in the Bacillus subtilis genome, suggesting that it is very robust, and thus can be applied to any other sequenced prokaryotic genome. When compared with the prior state-of-the-art algorithms, our algorithm outperforms them in both prediction sensitivity and specificity.
Affiliation(s)
- Shaoqiang Zhang: Department of Bioinformatics and Genomics, Bioinformatics Research Center, the University of North Carolina at Charlotte, Charlotte, NC 28223, USA
|
21
|
Abstract
While hundreds of microbial genomes are sequenced, the challenge remains to define their cis-regulatory maps. Here, we present a comparative genomic analysis of the cis-regulatory map of Shewanella oneidensis, an important model organism for bioremediation because of its extraordinary abilities to use a wide variety of metals and organic molecules as electron acceptors in respiration. First, from the experimentally verified transcriptional regulatory networks of Escherichia coli, we inferred 24 DNA motifs that are conserved in S. oneidensis. We then applied a new comparative approach on five Shewanella genomes that allowed us to systematically identify 194 nonredundant palindromic DNA motifs and corresponding regulons in S. oneidensis. Sixty-four percent of the predicted motifs are conserved in at least three of the seven newly sequenced and distantly related Shewanella genomes. In total, we obtained 209 unique DNA motifs in S. oneidensis that cover 849 unique transcription units. Besides conservation in other genomes, 77 of these motifs are supported by at least one additional type of evidence, including matching to known transcription factor binding motifs and significant functional enrichment or expression coherence of the corresponding target genes. Using the same approach on a more focused gene set, 990 differentially expressed genes derived from published microarray data of S. oneidensis during exposure to metal ions, we identified 31 putative cis-regulatory motifs (16 with at least one type of additional supporting evidence) that are potentially involved in the process of metal reduction. The majority (18/31) of those motifs had been found in our whole-genome comparative approach, further demonstrating that such an approach is capable of uncovering a large fraction of the regulatory map of a genome even in the absence of experimental data. 
The integrated computational approach developed in this study provides a useful strategy to identify genome-wide cis-regulatory maps and a novel avenue to explore the regulatory pathways for particular biological processes in bacterial systems.
Affiliation(s)
- Jiajian Liu: Department of Genetics, Washington University School of Medicine, 660 S Euclid, Box 8232, St Louis, MO 63110, USA
|
22
|
Kinkhabwala A, Guet CC. Uncovering cis regulatory codes using synthetic promoter shuffling. PLoS One 2008; 3:e2030. [PMID: 18446205 PMCID: PMC2321153 DOI: 10.1371/journal.pone.0002030] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Received: 10/27/2007] [Accepted: 03/14/2008] [Indexed: 01/22/2023] Open
Abstract
Revealing the spectrum of combinatorial regulation of transcription at individual promoters is essential for understanding the complex structure of biological networks. However, the computations represented by the integration of various molecular signals at complex promoters are difficult to decipher in the absence of simple cis regulatory codes. Here we synthetically shuffle the regulatory architecture — operator sequences binding activators and repressors — of a canonical bacterial promoter. The resulting library of complex promoters allows for rapid exploration of promoter encoded logic regulation. Among all possible logic functions, NOR and ANDN promoter encoded logics predominate. A simple transcriptional cis regulatory code determines both logics, establishing a straightforward map between promoter structure and logic phenotype. The regulatory code is determined solely by the type of transcriptional regulation combinations: two repressors generate a NOR: NOT (a OR b) whereas a repressor and an activator generate an ANDN: a AND NOT b. Three-input versions of both logics, having an additional repressor as an input, are also present in the library. The resulting complex promoters cover a wide dynamic range of transcriptional strengths. Synthetic promoter shuffling represents a fast and efficient method for exploring the spectrum of complex regulatory functions that can be encoded by complex promoters. From an engineering point of view, synthetic promoter shuffling enables the experimental testing of the functional properties of complex promoters that cannot necessarily be inferred ab initio from the known properties of the individual genetic components. Synthetic promoter shuffling may provide a useful experimental tool for studying naturally occurring promoter shuffling.
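The regulatory code stated in the abstract above (two repressors give NOR, a repressor plus an activator give ANDN) can be written as a small truth-table sketch. The rule used here, that a promoter is ON only when every activator input is present and no repressor input is present, is an assumption chosen to reproduce those two logics; the function names and role labels are hypothetical and are not taken from the paper.

```python
from itertools import product


def promoter_output(inputs, roles):
    """Return True (promoter ON) under a simple assumed rule.

    inputs: {regulator_name: bool}, whether each regulator's signal is present
    roles:  {regulator_name: "activator" | "repressor"}
    Assumed rule: any present repressor switches the promoter OFF, and any
    absent activator also leaves it OFF.
    """
    for name, role in roles.items():
        present = inputs[name]
        if role == "repressor" and present:
            return False
        if role == "activator" and not present:
            return False
    return True


def truth_table(roles):
    """Enumerate all input combinations for the regulators in `roles`."""
    names = list(roles)
    return {
        values: promoter_output(dict(zip(names, values)), roles)
        for values in product([False, True], repeat=len(names))
    }


# Two repressors: ON only when neither is present, i.e. NOT (a OR b).
nor_table = truth_table({"a": "repressor", "b": "repressor"})

# Activator a plus repressor b: ON only for a AND NOT b.
andn_table = truth_table({"a": "activator", "b": "repressor"})

for label, table in (("NOR", nor_table), ("ANDN", andn_table)):
    print(label, table)
```

Adding a third entry with the role "repressor" gives the three-input variants the abstract mentions, under the same assumed rule.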
Affiliation(s)
- Ali Kinkhabwala: Laboratory of Living Matter and Center for Studies in Physics and Biology, Rockefeller University, New York, New York, United States of America; Systemic Cell Biology, Max Planck Institute for Molecular Physiology, Dortmund, Germany
- Călin C. Guet: Laboratory of Living Matter and Center for Studies in Physics and Biology, Rockefeller University, New York, New York, United States of America; Institute for Biophysical Dynamics, University of Chicago, Chicago, Illinois, United States of America
|
23
|
Lintner RE, Mishra PK, Srivastava P, Martinez-Vaz BM, Khodursky AB, Blumenthal RM. Limited functional conservation of a global regulator among related bacterial genera: Lrp in Escherichia, Proteus and Vibrio. BMC Microbiol 2008; 8:60. [PMID: 18405378 PMCID: PMC2374795 DOI: 10.1186/1471-2180-8-60] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Received: 11/16/2007] [Accepted: 04/11/2008] [Indexed: 02/03/2023] Open
Abstract
Background: Bacterial genome sequences are being determined rapidly, but few species are physiologically well characterized. Predicting regulation from genome sequences usually involves extrapolation from better-studied bacteria, using the hypothesis that a conserved regulator, conserved target gene, and predicted regulator-binding site in the target promoter imply conserved regulation between the two species. However, many compared organisms are ecologically and physiologically diverse, and the limits of extrapolation have not been well tested. In E. coli K-12 the leucine-responsive regulatory protein (Lrp) affects expression of ~400 genes. Proteus mirabilis and Vibrio cholerae have highly-conserved lrp orthologs (98% and 92% identity to E. coli lrp). The functional equivalence of Lrp from these related species was assessed. Results: Heterologous Lrp regulated gltB, livK and lrp transcriptional fusions in an E. coli background in the same general way as the native Lrp, though with significant differences in extent. Microarray analysis of these strains revealed that the heterologous Lrp proteins significantly influence only about half of the genes affected by native Lrp. In P. mirabilis, heterologous Lrp restored swarming, though with some pattern differences. P. mirabilis produced substantially more Lrp than E. coli or V. cholerae under some conditions. Lrp regulation of target gene orthologs differed among the three native hosts. Strikingly, while Lrp negatively regulates its own gene in E. coli, and was shown to do so even more strongly in P. mirabilis, Lrp appears to activate its own gene in V. cholerae. Conclusion: The overall similarity of regulatory effects of the Lrp orthologs supports the use of extrapolation between related strains for general purposes.
However this study also revealed intrinsic differences even between orthologous regulators sharing >90% overall identity, and 100% identity for the DNA-binding helix-turn-helix motif, as well as differences in the amounts of those regulators. These results suggest that predicting regulation of specific target genes based on genome sequence comparisons alone should be done on a conservative basis.
Affiliation(s)
- Robert E Lintner: Department of Medical Microbiology and Immunology, University of Toledo Health Sciences Center, Toledo, OH 43614-2598, USA.
|
24
|
González-Díaz H, González-Díaz Y, Santana L, Ubeira FM, Uriarte E. Proteomics, networks and connectivity indices. Proteomics 2008; 8:750-78. [DOI: 10.1002/pmic.200700638] [Citation(s) in RCA: 170] [Impact Index Per Article: 10.6] [Indexed: 12/19/2022]
|
25
|
Abstract
Sequence motif discovery algorithms are an important part of the computational biologist's toolkit. The purpose of motif discovery is to discover patterns in biopolymer (nucleotide or protein) sequences in order to better understand the structure and function of the molecules the sequences represent. This chapter provides an overview of the use of sequence motif discovery in biology and a general guide to the use of motif discovery algorithms. The chapter discusses the types of biological features that DNA and protein motifs can represent and their usefulness. It also defines what sequence motifs are, how they are represented, and general techniques for discovering them. The primary focus is on one aspect of motif discovery: discovering motifs in a set of unaligned DNA or protein sequences. Also presented are steps useful for checking the biological validity and investigating the function of sequence motifs using methods such as motif scanning--searching for matches to motifs in a given sequence or a database of sequences. A discussion of some limitations of motif discovery concludes the chapter.
Affiliation(s)
- Timothy L Bailey: ARC Centre of Excellence in Bioinformatics, and Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
|
26
|
McCord RP, Bulyk ML. Functional trends in structural classes of the DNA binding domains of regulatory transcription factors. Pac Symp Biocomput 2008:441-52. [PMID: 18229706 PMCID: PMC2757920] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2023]
Abstract
The DNA-binding domain (DBD) structure of a regulatory transcription factor (TF) is important in determining its DNA sequence specificity, but it is unclear whether a relationship exists between DBD structure and general TF biological function or regulatory mechanism. We observed moderate enrichment of functional annotation terms among TFs of the same structural class in Escherichia coli, Saccharomyces cerevisiae, Drosophila melanogaster, or Mus musculus, suggesting some preference for TFs of similar structures in the regulation of similar processes. In yeast, we also found trends among TF structural classes in phenomena including gene expression coherence, DNA binding site motif similarity, the general or specific nature of TFs' regulatory roles, and the position of a TF in a gene regulatory network. These results suggest that the biophysical constraints of different TF structural classes play a role in their gene regulatory mechanisms.
Affiliation(s)
- Rachel Patton McCord: Division of Genetics, Department of Medicine, Brigham & Women’s Hospital and Harvard Medical School, Boston, MA 02115; Harvard University Graduate Biophysics Program, Cambridge, MA 02138
- Martha L. Bulyk: Division of Genetics, Department of Medicine, Brigham & Women’s Hospital and Harvard Medical School, Boston, MA 02115; Department of Pathology, Brigham & Women’s Hospital and Harvard Medical School, Boston, MA 02115; Harvard/MIT Division of Health Sciences & Technology (HST), Harvard Medical School, Boston, MA 02115; Harvard University Graduate Biophysics Program, Cambridge, MA 02138
|
27
|
Price MN, Dehal PS, Arkin AP. Orthologous transcription factors in bacteria have different functions and regulate different genes. PLoS Comput Biol 2007; 3:1739-50. [PMID: 17845071 PMCID: PMC1971122 DOI: 10.1371/journal.pcbi.0030175] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.2] [Received: 02/02/2007] [Accepted: 07/25/2007] [Indexed: 11/21/2022] Open
Abstract
Transcription factors (TFs) form large paralogous gene families and have complex evolutionary histories. Here, we ask whether putative orthologs of TFs, from bidirectional best BLAST hits (BBHs), are evolutionary orthologs with conserved functions. We show that BBHs of TFs from distantly related bacteria are usually not evolutionary orthologs. Furthermore, the false orthologs usually respond to different signals and regulate distinct pathways, while the few BBHs that are evolutionary orthologs do have conserved functions. To test the conservation of regulatory interactions, we analyze expression patterns. We find that regulatory relationships between TFs and their regulated genes are usually not conserved for BBHs in Escherichia coli K12 and Bacillus subtilis. Even in the much more closely related bacteria Vibrio cholerae and Shewanella oneidensis MR-1, predicting regulation from E. coli BBHs has high error rates. Using gene–regulon correlations, we identify genes whose expression pattern differs between E. coli and S. oneidensis. Using literature searches and sequence analysis, we show that these changes in expression patterns reflect changes in gene regulation, even for evolutionary orthologs. We conclude that the evolution of bacterial regulation should be analyzed with phylogenetic trees, rather than BBHs, and that bacterial regulatory networks evolve more rapidly than previously thought.

Living organisms use transcription factors (TFs) to control the production of proteins. For example, the bacterium E. coli contains a TF that prevents it from making enzymes that degrade lactose when lactose is absent. Bacterial genomes encode a huge diversity of TFs, and except in a few well-studied organisms, the function of these TFs is not known. To predict the function of a TF, biologists often search for a similar TF, from another organism, that has been characterized. It is generally believed that orthologous TFs (TFs that are derived from the organisms' common ancestor) will have conserved functions. The authors show that a commonly used method to identify orthologous TFs gives misleading results when applied to distantly related bacteria: the “orthologous” TFs are evolutionarily distant, they sense different signals, and they regulate different pathways. Biologists often predict, more specifically, that orthologous TFs will regulate orthologous genes. However, the authors show that even in more closely related bacteria, where the orthologous TFs do have conserved functions, these specific predictions are often incorrect. It seems that gene regulation in bacteria evolves rapidly, and it will be difficult to predict regulation in diverse bacteria from our knowledge of a few well-studied bacteria.
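For readers unfamiliar with the bidirectional best hit (BBH) criterion that the abstract above evaluates, the sketch below shows the basic idea: gene a in genome A and gene b in genome B form a BBH when each is the other's top-scoring hit. The input layout (nested dictionaries of bit scores), the function names, and the gene labels are all assumptions made for illustration; producing the scores with BLAST is outside the sketch.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    scores: {query_gene: {subject_gene: bit_score}} (hypothetical layout).
    Queries without any hit are skipped.
    """
    return {
        query: max(hits, key=hits.get)
        for query, hits in scores.items()
        if hits
    }


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted(
        (a, b) for a, b in best_ab.items() if best_ba.get(b) == a
    )


# Toy scores with invented gene names.
a_vs_b = {
    "tfA1": {"tfB1": 310.0, "tfB2": 120.0},
    "tfA2": {"tfB1": 150.0, "tfB2": 95.0},
}
b_vs_a = {
    "tfB1": {"tfA1": 305.0, "tfA2": 148.0},
    "tfB2": {"tfA1": 118.0},
}

print(bidirectional_best_hits(a_vs_b, b_vs_a))
```

Here only the tfA1/tfB1 pair qualifies. As the abstract stresses, such a pair is a candidate ortholog at best, and the authors recommend phylogenetic trees over BBHs for analyzing regulatory evolution.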
Affiliation(s)
- Morgan N Price: Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA.
|
28
|
Janga SC, Collado-Vides J. Structure and evolution of gene regulatory networks in microbial genomes. Res Microbiol 2007; 158:787-94. [PMID: 17996425 DOI: 10.1016/j.resmic.2007.09.001] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.1] [Received: 06/07/2007] [Revised: 08/07/2007] [Accepted: 09/17/2007] [Indexed: 12/24/2022]
Abstract
With the availability of genome sequences for hundreds of microbial genomes, it has become possible to address several questions from a comparative perspective to understand the structure and function of regulatory systems, at least in model organisms. Recent studies have focused on topological properties and the evolution of regulatory networks and their components. Our understanding of natural networks is paving the way to embedding synthetic regulatory systems into organisms, allowing us to expand the natural diversity of living systems to an extent we had never before anticipated.
Affiliation(s)
- Sarath Chandra Janga: Program of Computational Genomics, CCG-UNAM, Apdo Postal 565-A, Cuernavaca, Morelos, 62100 Mexico.
|
29
|
He D, Zhou D, Zhou Y. Identifying transcription factor targets using enhanced Bayesian classifier. Comput Biol Chem 2007; 31:355-60. [PMID: 17890157 DOI: 10.1016/j.compbiolchem.2007.08.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 07/26/2006] [Revised: 08/14/2007] [Accepted: 08/14/2007] [Indexed: 11/23/2022]
Abstract
Mapping transcription factors (TFs) to their target genes (TGs) is the first step toward understanding transcriptional regulatory networks. Here we present a method that uses an enhanced Bayesian classifier to predict TF-TG pairs in time-course expression data. Unlike previous prediction models, the gene expression data are encoded as discrete values and the temporal feature is used in the enhanced Bayesian classifier. The enhanced Bayesian classifier is trained and tested on two groups of positive and negative samples by three-fold cross-validation and compared with other methods. As the prediction result is clearly improved, the enhanced Bayesian classifier represents a new perspective on studying regulatory relationships from gene expression data. Furthermore, a data selection method that focuses on 'active' TFs is proposed, suggesting a new approach to selecting effective time-course expression data.
Affiliation(s)
- Dong He
- Hubei Bioinformatics and Molecular Imaging Key Laboratory, Huazhong University of Science and Technology, Wuhan 430074, China
30
|
Davies SR, Chang LW, Patra D, Xing X, Posey K, Hecht J, Stormo GD, Sandell LJ. Computational identification and functional validation of regulatory motifs in cartilage-expressed genes. Genome Res 2007; 17:1438-47. [PMID: 17785538 PMCID: PMC1987341 DOI: 10.1101/gr.6224007] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Indexed: 12/20/2022]
Abstract
Chondrocyte gene regulation is important for the generation and maintenance of cartilage tissues. Several regulatory factors have been identified that play a role in chondrogenesis, including the positive transacting factors of the SOX family such as SOX9, SOX5, and SOX6, as well as negative transacting factors such as C/EBP and delta EF1. However, a complete understanding of the intricate regulatory network that governs the tissue-specific expression of cartilage genes is not yet available. We have taken a computational approach to identify cis-regulatory, transcription factor (TF) binding motifs in a set of cartilage characteristic genes to better define the transcriptional regulatory networks that regulate chondrogenesis. Our computational methods have identified several TFs, whose binding profiles are available in the TRANSFAC database, as important to chondrogenesis. In addition, a cartilage-specific SOX-binding profile was constructed and used to identify both known, and novel, functional paired SOX-binding motifs in chondrocyte genes. Using DNA pattern-recognition algorithms, we have also identified cis-regulatory elements for unknown TFs. We have validated our computational predictions through mutational analyses in cell transfection experiments. One novel regulatory motif, N1, found at high frequency in the COL2A1 promoter, was found to bind to chondrocyte nuclear proteins. Mutational analyses suggest that this motif binds a repressive factor that regulates basal levels of the COL2A1 promoter.
Affiliation(s)
- Sherri R. Davies
- Department of Orthopaedic Surgery, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Li-Wei Chang
- Department of Biomedical Engineering, Washington University, St. Louis, Missouri 63130, USA
- Debabrata Patra
- Department of Orthopaedic Surgery, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Xiaoyun Xing
- Department of Orthopaedic Surgery, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Karen Posey
- Department of Pediatrics, University of Texas Medical School at Houston, Houston, Texas 77030, USA
- Jacqueline Hecht
- Department of Pediatrics, University of Texas Medical School at Houston, Houston, Texas 77030, USA
- Shriners Hospital for Children, Houston, Texas 77030, USA
- Gary D. Stormo
- Department of Biomedical Engineering, Washington University, St. Louis, Missouri 63130, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Linda J. Sandell
- Department of Orthopaedic Surgery, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Department of Cell Biology and Physiology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Corresponding author. Fax (314) 454-5900
31
|
Kolesov G, Wunderlich Z, Laikova ON, Gelfand MS, Mirny LA. How gene order is influenced by the biophysics of transcription regulation. Proc Natl Acad Sci U S A 2007; 104:13948-53. [PMID: 17709750 PMCID: PMC1955771 DOI: 10.1073/pnas.0700672104] [Citation(s) in RCA: 121] [Impact Index Per Article: 7.1] [Indexed: 11/18/2022] Open
Abstract
What are the forces that shape the structure of prokaryotic genomes: the order of genes, their proximity, and their orientation? Coregulation and coordinated horizontal gene transfer are believed to promote the proximity of functionally related genes and the formation of operons. However, forces that influence the structure of the genome beyond the level of a single operon remain unknown. Here, we show that the biophysical mechanism by which regulatory proteins search for their sites on DNA can impose constraints on genome structure. Using simulations, we demonstrate that rapid and reliable gene regulation requires that the transcription factor (TF) gene be close to the site on DNA the TF has to bind, thus promoting the colocalization of TF genes and their targets on the genome. We use parameters that have been measured in recent experiments to estimate the relevant length and time scales of this process and demonstrate that the search for a cognate site may be prohibitively slow if a TF has a low copy number and is not colocalized. We also analyze TFs and their sites in a number of bacterial genomes, confirm that they are colocalized significantly more often than expected, and show that this observation cannot be attributed to the pressure for coregulation or formation of selfish gene clusters, thus supporting the role of the biophysical constraint in shaping the structure of prokaryotic genomes. Our results demonstrate how spatial organization can influence timing and noise in gene expression.
Affiliation(s)
- Grigory Kolesov
- Harvard–MIT Division of Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, MA 02139
- Olga N. Laikova
- State Scientific Center GosNIIGenetika, Moscow 117545, Russia
- Mikhail S. Gelfand
- Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow 127994, Russia
- Leonid A. Mirny
- Harvard–MIT Division of Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, MA 02139
- To whom correspondence should be addressed at: 77 Massachusetts Avenue, 16-343, Cambridge, MA 02139.
32
|
Affiliation(s)
- Dmitry A Rodionov
- Burnham Institute for Medical Research, La Jolla, California 92037, USA.
33
|
Jothi R, Przytycka TM, Aravind L. Discovering functional linkages and uncharacterized cellular pathways using phylogenetic profile comparisons: a comprehensive assessment. BMC Bioinformatics 2007; 8:173. [PMID: 17521444 PMCID: PMC1904249 DOI: 10.1186/1471-2105-8-173] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.1] [Received: 02/28/2007] [Accepted: 05/23/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND A widely used approach for discovering functional and physical interactions among proteins involves phylogenetic profile comparisons (PPCs). Here, proteins with similar profiles are inferred to be functionally related under the assumption that proteins involved in the same metabolic pathway or cellular system are likely to have been co-inherited during evolution. RESULTS Our experimentation with E. coli and yeast proteins with 16 different carefully composed reference sets of genomes revealed that the phyletic patterns of proteins in prokaryotes alone could be adequate to make reasonably accurate functional linkage predictions. A slight improvement in performance is observed on adding a few eukaryotes to the reference set, but a noticeable drop-off in performance is observed with an increased number of eukaryotes. Inclusion of most parasitic, pathogenic or vertebrate genomes and multiple strains of the same species in the reference set does not necessarily contribute to improved sensitivity or accuracy. Interestingly, we also found that the evolutionary histories of individual pathways have a significant effect on the performance of the PPC approach with respect to a particular reference set. For example, to accurately predict functional links in carbohydrate or lipid metabolism, a reference set solely composed of prokaryotic (or bacterial) genomes performed among the best compared to one composed of genomes from all three super-kingdoms; this is in contrast to predicting functional links in translation, for which a reference set composed of prokaryotic (or bacterial) genomes performed the worst. We also demonstrate that the widely used random null model to quantify the statistical significance of profile similarity is incomplete, which could result in an increased number of false positives.
CONCLUSION Contrary to previous proposals, it is not merely the number of genomes but a careful selection of informative genomes in the reference set that influences the prediction accuracy of the PPC approach. We note that the predictive power of the PPC approach, especially in eukaryotes, is heavily influenced by the primary endosymbiosis and subsequent bacterial contributions. The over-representation of parasitic unicellular eukaryotes and vertebrates additionally makes eukaryotes less useful in the reference sets. Reference sets composed of a highly non-redundant set of genomes from all three super-kingdoms fare better with pathways showing considerable vertical inheritance and strong conservation (e.g. translation apparatus), while reference sets solely composed of prokaryotic genomes fare better for more variable pathways like carbohydrate metabolism. Differential performance of the PPC approach on various pathways, and a weak positive correlation between functional and profile similarities, suggest that caution should be exercised while interpreting functional linkages inferred from genome-wide large-scale profile comparisons using a single reference set.
Affiliation(s)
- Raja Jothi
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Teresa M Przytycka
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- L Aravind
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
34
|
Abstract
Sequence motif discovery algorithms are an important part of the computational biologist's toolkit. The purpose of motif discovery is to discover patterns in biopolymer (nucleotide or protein) sequences to better understand the structure and function of the molecules the sequences represent. This chapter provides an overview of the use of sequence motif discovery in biology and a general guide to the use of motif discovery algorithms. It examines the types of biological features that DNA and protein motifs can represent and their usefulness. It also defines what sequence motifs are, how they are represented, and general techniques for discovering them. The primary focus of the chapter is on one aspect of motif discovery: discovering motifs in a set of unaligned DNA or protein sequences. The chapter also provides steps useful for checking the biological validity and investigating the function of sequence motifs using methods such as motif scanning, that is, searching for matches to motifs in a given sequence or a database of sequences. A discussion of some limitations of motif discovery concludes the chapter.
35
|
36
|
Pérez AG, Angarica VE, Vasconcelos ATR, Collado-Vides J. Tractor_DB (version 2.0): a database of regulatory interactions in gamma-proteobacterial genomes. Nucleic Acids Res 2006; 35:D132-6. [PMID: 17088283 PMCID: PMC1669740 DOI: 10.1093/nar/gkl800] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022] Open
Abstract
Version 2.0 of Tractor_DB is now accessible at its three international mirrors. This database contains a collection of computationally predicted transcription factor binding sites in gamma-proteobacterial genomes. These data should aid researchers in the design of microarray experiments and the interpretation of their results. They should also facilitate comparative genomics studies of the regulatory networks of this group of organisms. In this paper we describe the main improvements incorporated into the database in the past year and a half, which include adding information on the regulatory networks of 13 new gamma-proteobacteria (increasing the total to 30) and developing a new computational strategy to complement the putative sites identified by the original weight matrix-based approach. We have also added dynamically generated navigation tabs to the navigation interfaces. Moreover, we developed a new interface that allows users to directly retrieve information on the conservation of regulatory interactions in the 30 genomes included in the database by navigating a map that represents a core of the known Escherichia coli regulatory network.
Affiliation(s)
- Ana Tereza R. Vasconcelos
- National Laboratory for Scientific Computing, Brazil
- To whom correspondence should be addressed at Bioinformatics Laboratory-LABINFO, National Laboratory of Scientific Computation, Av. Getulio Vargas, 333, Quitandinha, ZC: 25651-075, Petrópolis, Rio de Janeiro, Brazil. Tel: +55 24 2233 6065; Fax: +55 24 2231 5595
37
|
GuhaThakurta D. Computational identification of transcriptional regulatory elements in DNA sequence. Nucleic Acids Res 2006; 34:3585-98. [PMID: 16855295 PMCID: PMC1524905 DOI: 10.1093/nar/gkl372] [Citation(s) in RCA: 98] [Impact Index Per Article: 5.4] [Indexed: 01/13/2023] Open
Abstract
Identification and annotation of all the functional elements in the genome, including genes and the regulatory sequences, is a fundamental challenge in genomics and computational biology. Since regulatory elements are frequently short and variable, their identification and discovery using computational algorithms is difficult. However, significant advances have been made in the computational methods for modeling and detection of DNA regulatory elements. The availability of complete genome sequence from multiple organisms, as well as mRNA profiling and high-throughput experimental methods for mapping protein-binding sites in DNA, have contributed to the development of methods that utilize these auxiliary data to inform the detection of transcriptional regulatory elements. Progress is also being made in the identification of cis-regulatory modules and higher order structures of the regulatory sequences, which is essential to the understanding of transcription regulation in the metazoan genomes. This article reviews the computational approaches for modeling and identification of genomic regulatory elements, with an emphasis on the recent developments, and current challenges.
Affiliation(s)
- Debraj GuhaThakurta
- Research Genetics Division, Rosetta Inpharmatics LLC, Merck & Co., Inc, 401 Terry Avenue North, Seattle, WA 98109, USA.
38
|
Vemuri GN, Aristidou AA. Metabolic engineering in the -omics era: elucidating and modulating regulatory networks. Microbiol Mol Biol Rev 2006; 69:197-216. [PMID: 15944454 PMCID: PMC1197421 DOI: 10.1128/mmbr.69.2.197-216.2005] [Citation(s) in RCA: 91] [Impact Index Per Article: 5.1] [Indexed: 11/20/2022] Open
Abstract
The importance of regulatory control in metabolic processes is widely acknowledged, and several enquiries (both local and global) are being made in understanding regulation at various levels of the metabolic hierarchy. The wealth of biological information has enabled identifying the individual components (genes, proteins, and metabolites) of a biological system, and we are now in a position to understand the interactions between these components. Since phenotype is the net result of these interactions, it is immensely important to elucidate them not only for an integrated understanding of physiology, but also for practical applications of using biological systems as cell factories. We present some of the recent "-omics" approaches that have expanded our understanding of regulation at the gene, protein, and metabolite level, followed by analysis of the impact of this progress on the advancement of metabolic engineering. Although this review is by no means exhaustive, we attempt to convey our ideology that combining global information from various levels of metabolic hierarchy is absolutely essential in understanding and subsequently predicting the relationship between changes in gene expression and the resulting phenotype. The ultimate aim of this review is to provide metabolic engineers with an overview of recent advances in complementary aspects of regulation at the gene, protein, and metabolite level and those involved in fundamental research with potential hurdles in the path to implementing their discoveries in practical applications.
Affiliation(s)
- Goutham N Vemuri
- Center for Molecular BioEngineering, Drifmier Engineering Center, University of Georgia, Athens, 30605, USA
39
|
Macisaac KD, Gordon DB, Nekludova L, Odom DT, Schreiber J, Gifford DK, Young RA, Fraenkel E. A hypothesis-based approach for identifying the binding specificity of regulatory proteins from chromatin immunoprecipitation data. Bioinformatics 2005; 22:423-9. [PMID: 16332710 DOI: 10.1093/bioinformatics/bti815] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.3] [Indexed: 12/17/2022] Open
Abstract
MOTIVATION Genome-wide chromatin-immunoprecipitation (ChIP-chip) detects binding of transcriptional regulators to DNA in vivo at low resolution. Motif discovery algorithms can be used to discover sequence patterns in the bound regions that may be recognized by the immunoprecipitated protein. However, the discovered motifs often do not agree with the binding specificity of the protein, when it is known. RESULTS We present a powerful approach to analyzing ChIP-chip data, called THEME, that tests hypotheses concerning the sequence specificity of a protein. Hypotheses are refined using constrained local optimization. Cross-validation provides a principled standard for selecting the optimal weighting of the hypothesis and the ChIP-chip data and for choosing the best refined hypothesis. We demonstrate how to derive hypotheses for proteins from 36 domain families. Using THEME together with these hypotheses, we analyze ChIP-chip datasets for 14 human and mouse proteins. In all the cases the identified motifs are consistent with the published data with regard to the binding specificity of the proteins.
Affiliation(s)
- Kenzie D Macisaac
- MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar Street, Cambridge, MA 02139, USA
40
|
Conlan S, Lawrence C, McCue LA. Rhodopseudomonas palustris regulons detected by cross-species analysis of alphaproteobacterial genomes. Appl Environ Microbiol 2005; 71:7442-52. [PMID: 16269786 PMCID: PMC1287613 DOI: 10.1128/aem.71.11.7442-7452.2005] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 01/17/2005] [Accepted: 06/14/2005] [Indexed: 11/20/2022] Open
Abstract
Rhodopseudomonas palustris, an alpha-proteobacterium, carries out three of the chemical reactions that support life on this planet: the conversion of sunlight to chemical-potential energy; the absorption of carbon dioxide, which it converts to cellular material; and the fixation of atmospheric nitrogen into ammonia. Insight into the transcription-regulatory network that coordinates these processes is fundamental to understanding the biology of this versatile bacterium. With this goal in mind, we predicted regulatory signals genomewide, using a two-step phylogenetic-footprinting and clustering process that we had developed previously. In the first step, 4,963 putative transcription factor binding sites, upstream of 2,044 genes and operons, were identified using cross-species Gibbs sampling. Bayesian motif clustering was then employed to group the cross-species motifs into regulons. We have identified 101 putative regulons in R. palustris, including 8 that are of particular interest: a photosynthetic regulon, a flagellar regulon, an organic hydroperoxide resistance regulon, the LexA regulon, and four regulons related to nitrogen metabolism (FixK2, NnrR, NtrC, and sigma54). In some cases, clustering allowed us to assign functions to proteins that previously had been annotated with only putative functions; we have identified RPA0828 as the organic hydroperoxide resistance regulator and RPA1026 as a cell cycle methylase. In addition to predicting regulons, we identified a novel inverted repeat that likely forms a highly conserved stem-loop and that occurs downstream of over 100 genes.
Affiliation(s)
- Sean Conlan
- Wadsworth Center, New York State Department of Health, Center for Medical Sciences, 150 New Scotland Ave., Albany, NY 12208, USA
41
|
Leykin I, Kao MCJ, Wong WH. HumanUpstream and MouseUpstream: databases of promoter sequences in the human and mouse genomes. OMICS: A Journal of Integrative Biology 2005; 9:220-4. [PMID: 16209636 DOI: 10.1089/omi.2005.9.220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Large-scale genome annotations, based largely on gene prediction programs, may be inaccurate in their predictions of transcription start sites, so that the identification of promoter regions remains unreliable. Here we focus on the identification of reliable gene promoter regions, critical to the understanding of transcriptional regulation. We report the construction of two databases of upstream sequences, Human Upstream and Mouse Upstream, based on information from both the human and mouse genomes and the database of expressed sequence tags (dbEST). Using the ENSEMBL generic genome annotation system, our approach allows more reliable identification of transcript start sites, and therefore extraction of more reliable promoter regions. The Human Upstream and Mouse Upstream databases are available free of charge.
Affiliation(s)
- Igor Leykin
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts.