51
|
Pérez AG, Angarica VE, Collado-Vides J, Vasconcelos ATR. From sequence to dynamics: the effects of transcription factor and polymerase concentration changes on activated and repressed promoters. BMC Mol Biol 2009; 10:92. [PMID: 19772633 PMCID: PMC2761915 DOI: 10.1186/1471-2199-10-92] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 05/07/2009] [Accepted: 09/22/2009] [Indexed: 11/25/2022] Open
Abstract
Background The fine tuning of two features of the bacterial regulatory machinery is known to contribute to the diversity of gene expression within the same regulon: the sequence of Transcription Factor (TF) binding sites, and their location with respect to promoters. While variations of binding sequences modulate the strength of the interaction between the TF and its binding sites, the distance between binding sites and promoters alters the interaction between the TF and the RNA polymerase (RNAP). Results In this paper we estimated the dissociation constants (Kd) of several E. coli TFs in their interaction with variants of their binding sequences from the scores resulting from aligning them to Positional Weight Matrices. A correlation coefficient of 0.78 was obtained when pooling together sites for different TFs. The theoretically estimated Kd values were then used, together with the dissociation constants of the RNAP-promoter interaction, to analyze activated and repressed promoters. The strength of repressor sites -- i.e., the strength of the interaction between TFs and their binding sites -- is slightly higher than that of activated sites. We explored how different factors such as the variation of binding sequences, the occurrence of more than one binding site, or different RNAP concentrations may influence the promoters' response to the variations of TF concentrations. We found that the occurrence of several regulatory sites bound by the same TF close to a promoter -- if they are bound by the TF in an independent manner -- changes the effect of TF concentrations on promoter occupancy, with respect to individual sites. We also found that the occupancy of a promoter will never be more than half if the RNAP concentration-to-Kp ratio is 1 and the promoter is subject to repression, and never less than half if the promoter is subject to activation. If the ratio falls to 0.1, the upper limit of occupancy probability for repressed promoters drops below 10%; the limits also drop for activated promoters. Conclusion The number of regulatory sites may thus act as a versatility-producing device, in addition to serving as a source of robustness of the transcription machinery. Furthermore, our results show that the effects of TF concentration fluctuations on promoter occupancy are constrained by RNAP concentrations.
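The occupancy limits quoted in this abstract can be reproduced with a simple equilibrium (statistical-weight) model. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes that a bound repressor excludes RNAP, that an activator stabilizes RNAP binding by a cooperativity factor `omega` (a hypothetical parameter), and that multiple repressor sites are bound independently. The function names and the symbols `r` = [RNAP]/Kp and `t` = [TF]/Kd are introduced here for illustration only. Under these assumptions the occupancy without TF is r/(1+r), which is 0.5 for r = 1 and about 0.09 for r = 0.1; repression can only lower it and activation can only raise it.

```python
def repressed_occupancy(r, t, n_sites=1):
    """Probability that RNAP occupies a repressed promoter.

    r: [RNAP]/Kp, t: [TF]/Kd (both dimensionless ratios).
    Assumption: any bound repressor excludes RNAP, and the n_sites
    repressor sites are bound independently, so the RNAP-free states
    have total weight (1 + t) ** n_sites.
    """
    return r / (r + (1.0 + t) ** n_sites)


def activated_occupancy(r, t, omega=10.0):
    """Probability that RNAP occupies an activated promoter.

    Assumption: a single activator site; omega > 1 is a hypothetical
    cooperativity factor for the state with both TF and RNAP bound.
    """
    both = r * t * omega  # weight of the TF + RNAP state
    return (r + both) / (1.0 + t + r + both)


if __name__ == "__main__":
    for r in (1.0, 0.1):
        print("r =", r)
        for t in (0.0, 1.0, 10.0):
            print("  t =", t,
                  "repressed:", round(repressed_occupancy(r, t), 4),
                  "activated:", round(activated_occupancy(r, t), 4))
```

In this sketch, raising `n_sites` steepens the response of a repressed promoter to TF concentration, which is one way to picture the abstract's statement that several independently bound sites change the effect of TF concentration on occupancy.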
|
52
|
Lemmens K, De Bie T, Dhollander T, Monsieurs P, De Moor B, Collado-Vides J, Engelen K, Marchal K. The Condition-Dependent Transcriptional Network in Escherichia coli. Ann N Y Acad Sci 2009; 1158:29-35. [DOI: 10.1111/j.1749-6632.2008.03746.x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 12/01/2022]
|
53
|
Lemmens K, De Bie T, Dhollander T, De Keersmaecker SC, Thijs IM, Schoofs G, De Weerdt A, De Moor B, Vanderleyden J, Collado-Vides J, Engelen K, Marchal K. DISTILLER: a data integration framework to reveal condition dependency of complex regulons in Escherichia coli. Genome Biol 2009; 10:R27. [PMID: 19265557 PMCID: PMC2690998 DOI: 10.1186/gb-2009-10-3-r27] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.2] [Received: 10/16/2008] [Revised: 01/15/2009] [Accepted: 03/06/2009] [Indexed: 11/13/2022] Open
Abstract
DISTILLER, a data integration framework for the inference of transcriptional module networks, is presented and used to investigate the condition dependency and modularity in Escherichia coli networks. We present DISTILLER, a data integration framework for the inference of transcriptional module networks. Experimental validation of predicted targets for the well-studied fumarate nitrate reductase regulator showed the effectiveness of our approach in Escherichia coli. In addition, the condition dependency and modularity of the inferred transcriptional network were studied. Surprisingly, the level of regulatory complexity seemed lower than that which would be expected from RegulonDB, indicating that complex regulatory programs tend to decrease the degree of modularity.
|
54
|
Balleza E, López-Bojorquez LN, Martínez-Antonio A, Resendis-Antonio O, Lozada-Chávez I, Balderas-Martínez YI, Encarnación S, Collado-Vides J. Regulation by transcription factors in bacteria: beyond description. FEMS Microbiol Rev 2009; 33:133-51. [PMID: 19076632 PMCID: PMC2704942 DOI: 10.1111/j.1574-6976.2008.00145.x] [Citation(s) in RCA: 133] [Impact Index Per Article: 8.9] [Indexed: 01/07/2023] Open
Abstract
Transcription is an essential step in gene expression and its understanding has been one of the major interests in molecular and cellular biology. By precisely tuning gene expression, transcriptional regulation determines the molecular machinery for developmental plasticity, homeostasis and adaptation. In this review, we transmit the main ideas or concepts behind regulation by transcription factors and give just enough examples to sustain these main ideas, thus avoiding a classical enumeration of facts. We review recent concepts and developments: cis elements and trans regulatory factors, chromosome organization and structure, transcriptional regulatory networks (TRNs) and transcriptomics. We also summarize new important discoveries that will probably affect the direction of research in gene regulation: epigenetics and stochasticity in transcriptional regulation, synthetic circuits and plasticity and evolution of TRNs. Many of the new discoveries in gene regulation are not extensively tested with wetlab approaches. Consequently, we review this broad area in Inference of TRNs and Dynamical Models of TRNs. Finally, we have stepped backwards to trace the origins of these modern concepts, synthesizing their history in a timeline schema.
|
55
|
Keseler IM, Bonavides-Martínez C, Collado-Vides J, Gama-Castro S, Gunsalus RP, Johnson DA, Krummenacker M, Nolan LM, Paley S, Paulsen IT, Peralta-Gil M, Santos-Zavaleta A, Shearer AG, Karp PD. EcoCyc: a comprehensive view of Escherichia coli biology. Nucleic Acids Res 2008; 37:D464-70. [PMID: 18974181 PMCID: PMC2686493 DOI: 10.1093/nar/gkn751] [Citation(s) in RCA: 256] [Impact Index Per Article: 16.0] [Indexed: 11/14/2022] Open
Abstract
EcoCyc (http://EcoCyc.org) provides a comprehensive encyclopedia of Escherichia coli biology. EcoCyc integrates information about the genome, genes and gene products; the metabolic network; and the regulatory network of E. coli. Recent EcoCyc developments include a new initiative to represent and curate all types of E. coli regulatory processes such as attenuation and regulation by small RNAs. EcoCyc has started to curate Gene Ontology (GO) terms for E. coli and has made a dataset of E. coli GO terms available through the GO Web site. The curation and visualization of electron transfer processes has been significantly improved. Other software and Web site enhancements include the addition of tracks to the EcoCyc genome browser, in particular a type of track designed for the display of ChIP-chip datasets, and the development of a comparative genome browser. A new Genome Omics Viewer enables users to paint omics datasets onto the full E. coli genome for analysis. A new advanced query page guides users in interactively constructing complex database queries against EcoCyc. A Macintosh version of EcoCyc is now available. A series of Webinars is available to instruct users in the use of EcoCyc.
|
56
|
Freyre-González JA, Alonso-Pavón JA, Treviño-Quintanilla LG, Collado-Vides J. Functional architecture of Escherichia coli: new insights provided by a natural decomposition approach. Genome Biol 2008; 9:R154. [PMID: 18954463 PMCID: PMC2760881 DOI: 10.1186/gb-2008-9-10-r154] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.6] [Received: 09/28/2008] [Accepted: 10/27/2008] [Indexed: 11/16/2022] Open
Abstract
The E. coli transcriptional regulatory network is shown to have a nonpyramidal architecture of independent modules governed by transcription factors, whose responses are integrated by intermodular genes. Background Previous studies have used different methods in an effort to extract the modular organization of transcriptional regulatory networks. However, these approaches are not natural, as they try to cluster strongly connected genes into a module or locate known pleiotropic transcription factors in lower hierarchical layers. Here, we unravel the transcriptional regulatory network of Escherichia coli by separating it into its key elements, thus revealing its natural organization. We also present a mathematical criterion, based on the topological features of the transcriptional regulatory network, to classify the network elements into one of two possible classes: hierarchical or modular genes. Results We found that modular genes are clustered into physiologically correlated groups validated by a statistical analysis of the enrichment of the functional classes. Hierarchical genes encode transcription factors responsible for coordinating module responses based on general interest signals. Hierarchical elements correlate highly with the previously studied global regulators, suggesting that this could be the first mathematical method to identify global regulators. We identified a new element in transcriptional regulatory networks never described before: intermodular genes. These are structural genes that integrate, at the promoter level, signals coming from different modules, and therefore from different physiological responses. Using the concept of pleiotropy, we have reconstructed the hierarchy of the network and discuss the role of feedforward motifs in shaping the hierarchical backbone of the transcriptional regulatory network. 
Conclusions This study sheds new light on the design principles underpinning the organization of transcriptional regulatory networks, showing a novel nonpyramidal architecture composed of independent modules globally governed by hierarchical transcription factors, whose responses are integrated by intermodular genes.
|
57
|
Angarica VE, Pérez AG, Vasconcelos AT, Collado-Vides J, Contreras-Moreira B. Prediction of TF target sites based on atomistic models of protein-DNA complexes. BMC Bioinformatics 2008; 9:436. [PMID: 18922190 PMCID: PMC2585596 DOI: 10.1186/1471-2105-9-436] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.1] [Received: 05/12/2008] [Accepted: 10/16/2008] [Indexed: 11/10/2022] Open
Abstract
Background The specific recognition of genomic cis-regulatory elements by transcription factors (TFs) plays an essential role in the regulation of coordinated gene expression. Studying the mechanisms determining binding specificity in protein-DNA interactions is thus an important goal. Most current approaches for modeling TF specific recognition rely on the knowledge of large sets of cognate target sites and consider only the information contained in their primary sequence. Results Here we describe a structure-based methodology for predicting sequence motifs starting from the coordinates of a TF-DNA complex. Our algorithm combines information regarding the direct and indirect readout of DNA into an atomistic statistical model, which is used to estimate the interaction potential. We first measure the ability of our method to correctly estimate the binding specificities of eight prokaryotic and eukaryotic TFs that belong to different structural superfamilies. Secondly, the method is applied to two homology models, finding that sampling of interface side-chain rotamers remarkably improves the results. Thirdly, the algorithm is compared with a reference structural method based on contact counts, obtaining comparable predictions for the experimental complexes and more accurate sequence motifs for the homology models. Conclusion Our results demonstrate that atomic-detail structural information can be feasibly used to predict TF binding sites. The computational method presented here is universal and might be applied to other systems involving protein-DNA recognition.
|
58
|
Lozada-Chávez I, Angarica VE, Collado-Vides J, Contreras-Moreira B. The role of DNA-binding specificity in the evolution of bacterial regulatory networks. J Mol Biol 2008; 379:627-43. [PMID: 18466918 DOI: 10.1016/j.jmb.2008.04.008] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.1] [Received: 02/14/2008] [Accepted: 04/02/2008] [Indexed: 11/25/2022]
Abstract
Understanding the mechanisms by which transcriptional regulatory networks (TRNs) change through evolution is a fundamental problem. Here, we analyze this question using data from Escherichia coli and Bacillus subtilis, and find that paralogy relationships are insufficient to explain the global or local role observed for transcription factors (TFs) within regulatory networks. Our results provide a picture in which DNA-binding specificity, a molecular property that can be measured in different ways, is a predictor of the role of transcription factors. In particular, we observe that global regulators consistently display low levels of binding specificity, while displaying comparatively higher expression values in microarray experiments. In addition, we find a strong negative correlation between binding specificity and the number of co-regulators that help coordinate genetic expression on a genomic scale. A close look at several orthologous TFs, including FNR, a regulator found to be global in E. coli and local in B. subtilis, confirms the diagnostic value of specificity in order to understand their regulatory function, and highlights the importance of evaluating the metabolic and ecological relevance of effectors as another variable in the evolutionary equation of regulatory networks. Finally, a general model is presented that integrates some evolutionary forces and molecular properties, aiming to explain how regulons grow and shrink, as bacteria tune their regulation to increase adaptation.
|
59
|
González Pérez AD, González González E, Espinosa Angarica V, Vasconcelos ATR, Collado-Vides J. Impact of Transcription Units rearrangement on the evolution of the regulatory network of gamma-proteobacteria. BMC Genomics 2008; 9:128. [PMID: 18366643 PMCID: PMC2329645 DOI: 10.1186/1471-2164-9-128] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 10/26/2007] [Accepted: 03/17/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In the past years, several studies have begun to unravel the structure, dynamical properties, and evolution of transcriptional regulatory networks. However, even those comparative studies that focus on a group of closely related organisms are limited by the rather scarce knowledge on regulatory interactions outside a few model organisms, such as E. coli among the prokaryotes. RESULTS In this paper we used the information annotated in Tractor_DB (a database of regulatory networks in gamma-proteobacteria) to calculate a normalized Site Orthology Score (SOS) that quantifies the conservation of a regulatory link across thirty genomes of this subclass. Then we used this SOS to assess how regulatory connections have evolved in this group, and how the variation of basic regulatory connections is reflected in the structure of the chromosome. We found that individual regulatory interactions shift between different organisms, a process that may be described as rewiring the network. At this evolutionary scale (the gamma-proteobacteria subclass) this rewiring process may be an important source of variation of regulatory incoming interactions for individual networks. We also noticed that the regulatory links that form feed forward motifs are conserved in a better correlated manner than triads of random regulatory interactions or pairs of co-regulated genes. Furthermore, the rewiring process that takes place at the most basic level of the regulatory network may be linked to rearrangements of genetic material within bacterial chromosomes, which change the structure of Transcription Units and therefore the regulatory connections between Transcription Factors and structural genes. CONCLUSION The rearrangements that occur in bacterial chromosomes, mostly inversion or horizontal gene transfer events, are important sources of variation of gene regulation at this evolutionary scale.
|
60
|
Gama-Castro S, Jiménez-Jacinto V, Peralta-Gil M, Santos-Zavaleta A, Peñaloza-Spinola MI, Contreras-Moreira B, Segura-Salazar J, Muñiz-Rascado L, Martínez-Flores I, Salgado H, Bonavides-Martínez C, Abreu-Goodger C, Rodríguez-Penagos C, Miranda-Ríos J, Morett E, Merino E, Huerta AM, Treviño-Quintanilla L, Collado-Vides J. RegulonDB (version 6.0): gene regulation model of Escherichia coli K-12 beyond transcription, active (experimental) annotated promoters and Textpresso navigation. Nucleic Acids Res 2007; 36:D120-4. [PMID: 18158297 PMCID: PMC2238961 DOI: 10.1093/nar/gkm994] [Citation(s) in RCA: 349] [Impact Index Per Article: 20.5] [Indexed: 11/14/2022] Open
Abstract
RegulonDB (http://regulondb.ccg.unam.mx/) is the primary reference database offering curated knowledge of the transcriptional regulatory network of Escherichia coli K12, currently the best-known electronically encoded database of the genetic regulatory network of any free-living organism. This paper summarizes the improvements, new biology and new features available in version 6.0. Curation of original literature is, from now on, up to date for every new release. All the objects are supported by their corresponding evidence, now classified as strong or weak. Transcription factors are classified by origin of their effectors and by gene ontology class. We now have computational predictions for σ54 and five different promoter types of the σ70 family, as well as their corresponding −10 and −35 boxes. In addition to those curated from the literature, we added about 300 experimentally mapped promoters coming from our own high-throughput mapping efforts. RegulonDB v.6.0 now expands beyond transcription initiation, including RNA regulatory elements, specifically riboswitches, attenuators and small RNAs, with their known associated targets. The data can be accessed through overviews of correlations about gene regulation. RegulonDB associated original literature, together with more than 4000 curation notes, can now be searched with the Textpresso text mining engine.
|
61
|
Palacios R, Collado-Vides J. Development of genomic sciences in Mexico: a good start and a long way to go. PLoS Comput Biol 2007; 3:1670-3. [PMID: 17907791 PMCID: PMC1994971 DOI: 10.1371/journal.pcbi.0030143] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022] Open
|
62
|
Karp PD, Keseler IM, Shearer A, Latendresse M, Krummenacker M, Paley SM, Paulsen I, Collado-Vides J, Gama-Castro S, Peralta-Gil M, Santos-Zavaleta A, Peñaloza-Spínola MI, Bonavides-Martinez C, Ingraham J. Multidimensional annotation of the Escherichia coli K-12 genome. Nucleic Acids Res 2007; 35:7577-90. [PMID: 17940092 PMCID: PMC2190727 DOI: 10.1093/nar/gkm740] [Citation(s) in RCA: 119] [Impact Index Per Article: 7.0] [Indexed: 11/22/2022] Open
Abstract
The annotation of the Escherichia coli K-12 genome in the EcoCyc database is one of the most accurate, complete and multidimensional genome annotations. Of the 4460 E. coli genes, EcoCyc assigns biochemical functions to 76%, and 66% of all genes had their functions determined experimentally. EcoCyc assigns E. coli genes to Gene Ontology and to MultiFun. Seventy-five percent of gene products contain reviews authored by the EcoCyc project that summarize the experimental literature about the gene product. EcoCyc information was derived from 15 000 publications. The database contains extensive descriptions of E. coli cellular networks, describing its metabolic, transport and transcriptional regulatory processes. A comparison to genome annotations for other model organisms shows that the E. coli genome contains the most experimentally determined gene functions in both relative and absolute terms: 2941 (66%) for E. coli, 2319 (37%) for Saccharomyces cerevisiae, 1816 (5%) for Arabidopsis thaliana, 1456 (4%) for Mus musculus and 614 (4%) for Drosophila melanogaster. Database queries to EcoCyc survey the global properties of E. coli cellular networks and illuminate the extent of information gaps for E. coli, such as dead-end metabolites. EcoCyc provides a genome browser with novel properties, and a novel interactive display of transcriptional regulatory networks.
|
63
|
Janga SC, Collado-Vides J. Structure and evolution of gene regulatory networks in microbial genomes. Res Microbiol 2007; 158:787-94. [PMID: 17996425 DOI: 10.1016/j.resmic.2007.09.001] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.1] [Received: 06/07/2007] [Revised: 08/07/2007] [Accepted: 09/17/2007] [Indexed: 12/24/2022]
Abstract
With the availability of genome sequences for hundreds of microbial genomes, it has become possible to address several questions from a comparative perspective to understand the structure and function of regulatory systems, at least in model organisms. Recent studies have focused on topological properties and the evolution of regulatory networks and their components. Our understanding of natural networks is paving the way to embedding synthetic regulatory systems into organisms, allowing us to expand the natural diversity of living systems to an extent we had never before anticipated.
|
64
|
Janga SC, Salgado H, Martínez-Antonio A, Collado-Vides J. Coordination logic of the sensing machinery in the transcriptional regulatory network of Escherichia coli. Nucleic Acids Res 2007; 35:6963-72. [PMID: 17933780 PMCID: PMC2175315 DOI: 10.1093/nar/gkm743] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Indexed: 01/26/2023] Open
Abstract
The active and inactive state of transcription factors in growing cells is usually directed by allosteric physicochemical signals or metabolites, which are in turn either produced in the cell or obtained from the environment by the activity of the products of effector genes. To understand the regulatory dynamics and to improve our knowledge about how transcription factors (TFs) respond to endogenous and exogenous signals in the bacterial model, Escherichia coli, we previously proposed to classify TFs into external, internal and hybrid sensing classes depending on the source of their allosteric or equivalent metabolite. Here we analyze how a cell uses its topological structures in the context of sensing machinery and show that, while feed forward loops (FFLs) tightly integrate internal and external sensing TFs connecting TFs from different layers of the hierarchical transcriptional regulatory network (TRN), bifan motifs frequently connect TFs belonging to the same sensing class and could act as a bridge between TFs originating from the same level in the hierarchy. We observe that modules identified in the regulatory network of E. coli are heterogeneous in sensing context with a clear combination of internal and external sensing categories depending on the physiological role played by the module. We also note that the propensity of two-component response regulators increases at promoters as the number of TFs regulating a target operon increases. Finally we show that evolutionary families of TFs do not show a tendency to preserve their sensing abilities. Our results provide a detailed panorama of the topological structures of the E. coli TRN and the way the TFs they are composed of sense their surroundings by coordinating responses.
|
65
|
Resendis-Antonio O, Reed JL, Encarnación S, Collado-Vides J, Palsson BØ. Metabolic reconstruction and modeling of nitrogen fixation in Rhizobium etli. PLoS Comput Biol 2007; 3:1887-95. [PMID: 17922569 PMCID: PMC2000972 DOI: 10.1371/journal.pcbi.0030192] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.3] [Received: 02/20/2007] [Accepted: 08/17/2007] [Indexed: 11/19/2022] Open
Abstract
Rhizobiaceae are bacteria that fix nitrogen during symbiosis with plants. This symbiotic relationship is crucial for the nitrogen cycle, and understanding symbiotic mechanisms is a scientific challenge with direct applications in agronomy and plant development. Rhizobium etli is a bacterium which provides legumes with ammonia (among other chemical compounds), thereby stimulating plant growth. A genome-scale approach, integrating the biochemical information available for R. etli, constitutes an important step toward understanding the symbiotic relationship and its possible improvement. In this work we present a genome-scale metabolic reconstruction (iOR363) for R. etli CFN42, which includes 387 metabolic and transport reactions across 26 metabolic pathways. This model was used to analyze the physiological capabilities of R. etli during stages of nitrogen fixation. To study the physiological capacities in silico, an objective function was formulated to simulate symbiotic nitrogen fixation. Flux balance analysis (FBA) was performed, and the predicted active metabolic pathways agreed qualitatively with experimental observations. In addition, predictions for the effects of gene deletions during nitrogen fixation in Rhizobia in silico also agreed with reported experimental data. Overall, we present some evidence supporting that FBA of the reconstructed metabolic network for R. etli provides results that are in agreement with physiological observations. Thus, as for other organisms, the reconstructed genome-scale metabolic network provides an important framework which allows us to compare model predictions with experimental measurements and eventually generate hypotheses on ways to improve nitrogen fixation.
|
66
|
Rodríguez-Penagos C, Salgado H, Martínez-Flores I, Collado-Vides J. Automatic reconstruction of a bacterial regulatory network using Natural Language Processing. BMC Bioinformatics 2007; 8:293. [PMID: 17683642 PMCID: PMC1964768 DOI: 10.1186/1471-2105-8-293] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.2] [Received: 01/12/2007] [Accepted: 08/07/2007] [Indexed: 11/24/2022] Open
Abstract
Background Manual curation of biological databases, an expensive and labor-intensive process, is essential for high quality integrated data. In this paper we report the implementation of a state-of-the-art Natural Language Processing system that creates computer-readable networks of regulatory interactions directly from different collections of abstracts and full-text papers. Our major aim is to understand how automatic annotation using Text-Mining techniques can complement manual curation of biological databases. We implemented a rule-based system to generate networks from different sets of documents dealing with regulation in Escherichia coli K-12. Results Performance evaluation is based on the most comprehensive transcriptional regulation database for any organism, the manually-curated RegulonDB, 45% of which we were able to recreate automatically. From our automated analysis we were also able to find some new interactions from papers not already curated, or that were missed in the manual filtering and review of the literature. We also put forward a novel Regulatory Interaction Markup Language better suited than SBML for simultaneously representing data of interest for biologists and text miners. Conclusion Manual curation of the output of automatic processing of text is a good way to complement a more detailed review of the literature, either for validating the results of what has been already annotated, or for discovering facts and information that might have been overlooked at the triage or curation stages.
|
67
|
Gutierrez-Ríos RM, Freyre-Gonzalez JA, Resendis O, Collado-Vides J, Saier M, Gosset G. Identification of regulatory network topological units coordinating the genome-wide transcriptional response to glucose in Escherichia coli. BMC Microbiol 2007; 7:53. [PMID: 17559662 PMCID: PMC1905917 DOI: 10.1186/1471-2180-7-53] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.9] [Received: 01/15/2007] [Accepted: 06/08/2007] [Indexed: 11/24/2022] Open
Abstract
Background Glucose is the preferred carbon and energy source for Escherichia coli. A complex regulatory network coordinates gene expression, transport and enzyme activities in response to the presence of this sugar. To determine the extent of the cellular response to glucose, we applied an approach combining global transcriptome and regulatory network analyses. Results Transcriptome data from isogenic wild type and crp- strains grown in Luria-Bertani medium (LB) or LB + 4 g/L glucose (LB+G) were analyzed to identify differentially transcribed genes. We detected 180 and 200 genes displaying increased and reduced relative transcript levels in the presence of glucose, respectively. The observed expression pattern in LB was consistent with a gluconeogenic metabolic state including active transport and interconversion of small molecules and macromolecules, induction of protease-encoding genes and a partial heat shock response. In LB+G, catabolic repression was detected for transport and metabolic interconversion activities. We also detected an increased capacity for de novo synthesis of nucleotides, amino acids and proteins. Cluster analysis of a subset of genes revealed that CRP mediates catabolite repression for most of the genes displaying reduced transcript levels in LB+G, whereas Fis participates in the upregulation of genes under this condition. An analysis of the regulatory network, in terms of topological functional units, revealed 8 interconnected modules which again exposed the importance of Fis and CRP as directly responsible for the coordinated response of the cell. This effect was also seen with other, less extensively connected transcription factors such as FruR and PdhR, which showed a consistent response considering media composition. Conclusion This work allowed the identification of eight interconnected regulatory network modules that include CRP, Fis and other transcriptional factors that respond directly or indirectly to the presence of glucose. In most cases, each of these modules includes genes encoding physiologically related functions, thus indicating a connection between regulatory network topology and related cellular functions involved in nutrient sensing and metabolism.
|
68
|
Contreras-Moreira B, Branger PA, Collado-Vides J. TFmodeller: comparative modelling of protein–DNA complexes. Bioinformatics 2007; 23:1694-6. [PMID: 17459960 DOI: 10.1093/bioinformatics/btm148] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Indexed: 11/12/2022] Open
Abstract
Interactions between proteins and DNA molecules lie at the core of fundamental cellular processes such as transcriptional regulation. Some of these interactions have been experimentally described at atomic scale, but the molecular details of many others remain to be discovered. TFmodeller exploits the current knowledge about protein-DNA interfaces contained in the Protein Data Bank and uses it to model similar interfaces related by homology. Results are emailed to the user and include an evolutionary contact matrix, a schematic representation of the putative binding interface and atomic coordinates of the modelled complex. The library of complexes used by TFmodeller is updated on a weekly basis and is available for download. AVAILABILITY TFmodeller and its web service interface are free for academic users at http://www.ccg.unam.mx/tfmodeller.
|
69
|
Janga SC, Salgado H, Collado-Vides J, Martínez-Antonio A. Internal versus external effector and transcription factor gene pairs differ in their relative chromosomal position in Escherichia coli. J Mol Biol 2007; 368:263-72. [PMID: 17321548 DOI: 10.1016/j.jmb.2007.01.019] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 11/07/2006] [Revised: 12/22/2006] [Accepted: 01/04/2007] [Indexed: 11/28/2022]
Abstract
Transcription factors (TFs) play an important role in the genetic regulation of transcription in response to internal and external cellular stimuli. However, little is known about their functional and dynamic aspects on a large scale, even in a well-studied bacterium like Escherichia coli. To understand the regulatory dynamics and to improve our knowledge about how TFs respond to endogenous and exogenous signals in this simple bacterium model, we previously proposed that TFs can be classified into three classes, depending on how they sense their allosteric or equivalent metabolite: external class, internal class, and hybrid sensing class. Classification of these groups was done without considering the relative chromosomal positions of the TFs and their corresponding effector genes. Here, we analyze the genome organization of the genetic components of these sensing systems, using the classification described earlier. We report the chromosomal proximity of transcription factors and their effector genes to sense periplasmic signals or transported metabolites (i.e. transcriptional sensing systems from the external class) in contrast to the components for sensing internally synthesized metabolites, which tend to be distant on the chromosome. We strengthen our finding that external sensing genetic machinery behaves like chromosomal modules of regulation to respond rapidly to variations in external conditions through co-expression of their genetic components, which is corroborated with microarray data for E. coli. Furthermore, we show several lines of evidence supporting the need for the coordinated activity of external sensing systems in contrast to that of internal sensing machinery, which can explain their close chromosomal organization. 
The observed functional correlation between the chromosomal organization and the genetic machinery for environmental sensing should contribute to our understanding of the logical functioning and evolution of the transcriptional regulatory networks in bacteria.
|
70
|
Pérez AG, Angarica VE, Vasconcelos ATR, Collado-Vides J. Tractor_DB (version 2.0): a database of regulatory interactions in gamma-proteobacterial genomes. Nucleic Acids Res 2006; 35:D132-6. [PMID: 17088283 PMCID: PMC1669740 DOI: 10.1093/nar/gkl800] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022] Open
Abstract
Version 2.0 of Tractor_DB is now accessible at its three international mirrors. This database contains a collection of computationally predicted transcription factor binding sites in gamma-proteobacterial genomes. These data should aid researchers in the design of microarray experiments and the interpretation of their results. They should also facilitate comparative genomics studies of the regulatory networks of this group of organisms. In this paper we describe the main improvements incorporated into the database in the past year and a half, which include incorporating information on the regulatory networks of 13 new gamma-proteobacteria (increasing the total to 30) and developing a new computational strategy to complement the putative sites identified by the original weight matrix-based approach. We have also added dynamically generated navigation tabs to the navigation interfaces. Moreover, we developed a new interface that allows users to directly retrieve information on the conservation of regulatory interactions in the 30 genomes included in the database by navigating a map that represents a core of the known Escherichia coli regulatory network.
|
71
|
Abstract
MOTIVATION Comparative modelling is a computational method used to tackle a variety of problems in molecular biology and biotechnology. Traditionally it has been applied to model the structure of proteins on their own or bound to small ligands, although more recently it has also been used to model protein-protein interfaces. This work is the first to systematically analyze whether comparative models of protein-DNA complexes could be built and be useful for predicting DNA binding sites. RESULTS First, we describe the structural and evolutionary conservation of protein-DNA interfaces, and the limits they impose on modelling accuracy. Second, we find that side-chains from contacting residues can be reasonably modeled and therefore used to identify contacting nucleotides. Third, the DNASITE protocol is implemented and different parameters are benchmarked on a set of 85 regulators from Escherichia coli. Results show that comparative footprinting can make useful predictions based solely on structural data, depending primarily on the interface identity with respect to the template used. AVAILABILITY DNASITE code available on request from the authors.
|
72
|
Huerta AM, Francino MP, Morett E, Collado-Vides J. Selection for unequal densities of sigma70 promoter-like signals in different regions of large bacterial genomes. PLoS Genet 2006; 2:e185. [PMID: 17096598 PMCID: PMC1635534 DOI: 10.1371/journal.pgen.0020185] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.3] [Received: 08/12/2005] [Accepted: 09/12/2006] [Indexed: 11/18/2022] Open
Abstract
The evolutionary processes operating in the DNA regions that participate in the regulation of gene expression are poorly understood. In Escherichia coli, we have established a sequence pattern that distinguishes regulatory from nonregulatory regions. The density of promoter-like sequences, that could be recognizable by RNA polymerase and may function as potential promoters, is high within regulatory regions, in contrast to coding regions and regions located between convergently transcribed genes. Moreover, functional promoter sites identified experimentally are often found in the subregions of highest density of promoter-like signals, even when individual sites with higher binding affinity for RNA polymerase exist elsewhere within the regulatory region. In order to see the generality of this pattern, we have analyzed 43 additional genomes belonging to most established bacterial phyla. Differential densities between regulatory and nonregulatory regions are detectable in most of the analyzed genomes, with the exception of those that have evolved toward extreme genome reduction. Thus, presence of this pattern follows that of genes and other genomic features that require weak selection to be effective in order to persist. On this basis, we suggest that the loss of differential densities in the reduced genomes of host-restricted pathogens and symbionts is an outcome of the process of genome degradation resulting from the decreased efficiency of purifying selection in highly structured small populations. This implies that the differential distribution of promoter-like signals between regulatory and nonregulatory regions detected in large bacterial genomes confers a significant, although small, fitness advantage. 
This study paves the way for further identification of the specific types of selective constraints that affect the organization of regulatory regions and the overall distribution of promoter-like signals through more detailed comparative analyses among closely related bacterial genomes. The most important step in the regulation of genetic expression is the initiation of transcription. This process is accomplished by the association or specific binding of RNA polymerase to particular sequence segments present in the DNA, the promoters. Promoters are located in the upstream regions of the transcribed genes. The evolutionary processes operating in the DNA regions that participate in the regulation of gene expression are poorly understood. For a long time, the canonical picture of a σ70 promoter has been a 60 base pair region defined by the transcription start-point (+1) and two conserved hexanucleotide sequences centered 10 and 35 base pairs upstream from the +1. The authors have shown that in Escherichia coli, promoters exist in clusters, as a series of overlapping potentially competing RNAP interaction sites. The E. coli regulatory regions contain high densities of these promoter-like signals, in contrast to coding regions and regions located between convergently transcribed genes. They report that the differential densities between regulatory and nonregulatory regions are detectable in most eubacterial genomes, with the exception of those that have experienced severe genome degradation and size reduction. This suggests that the presence of this pattern in large bacterial genomes confers a significant, although small, fitness advantage.
|
73
|
Lozada-Chávez I, Janga SC, Collado-Vides J. Bacterial regulatory networks are extremely flexible in evolution. Nucleic Acids Res 2006; 34:3434-45. [PMID: 16840530 PMCID: PMC1524901 DOI: 10.1093/nar/gkl423] [Citation(s) in RCA: 139] [Impact Index Per Article: 7.7] [Indexed: 11/29/2022] Open
Abstract
Over millions of years the structure and complexity of the transcriptional regulatory network (TRN) in bacteria has changed, reorganized and enabled them to adapt to almost every environmental niche on earth. In order to understand the plasticity of TRNs in bacteria, we studied the conservation of currently known TRNs of the two model organisms Escherichia coli K12 and Bacillus subtilis across complete genomes including Bacteria, Archaea and Eukarya at three different levels: individual components of the TRN, pairs of interactions and regulons. We found that transcription factors (TFs) evolve much faster than the target genes (TGs) across phyla. We show that global regulators are poorly conserved across the phylogenetic spectrum and hence TFs could be the major players responsible for the plasticity and evolvability of the TRNs. We also found that there is only a small fraction of significantly conserved transcriptional regulatory interactions among different phyla of bacteria and that there is no constraint on the elements of the interaction to co-evolve. Finally, our results suggest that the majority of the regulons in bacteria are rapidly lost, implying a high-order flexibility in the TRNs. We hypothesize that during the divergence of bacteria certain essential cellular processes like the synthesis of arginine, biotin and ribose, transport of amino acids and iron, availability of phosphate, replication process and the SOS response are well conserved in evolution. From our comparative analysis, it is possible to infer that transcriptional regulation is more flexible than the genetic component of the organisms and that its complexity and structure play an important role in the phenotypic adaptation.
|
74
|
Huerta AM, Collado-Vides J, Francino MP. Proceedings of the SMBE Tri-National Young Investigators' Workshop 2005. Positional conservation of clusters of overlapping promoter-like sequences in enterobacterial genomes. Mol Biol Evol 2006; 23:997-1010. [PMID: 16547149 DOI: 10.1093/molbev/msk004] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 01/22/2023] Open
Abstract
The selective mechanisms operating in regulatory regions of bacterial genomes are poorly understood. We have previously shown that, in most bacterial genomes, regulatory regions contain high densities of sigma70 promoter-like signals that are significantly above the densities detected in nonregulatory genomic regions. In order to investigate the molecular evolutionary forces that operate in bacterial regulatory regions and how they affect the observed redundancy of promoter-like signals, we have undertaken a comparative analysis across the completely sequenced genomes of enteric gamma-proteobacteria. This analysis detects significant positional conservation of promoter-like signal clusters across enterics, sometimes in spite of strong primary sequence divergence. This suggests that the conservation of the nature and exact position of specific nucleotides is not necessarily the priority of selection for maintaining the transcriptional function in these bacteria. We have further characterized the structural conservation of the regulatory regions of dnaQ and crp across all enterics. These two regions differ in essentiality and mode of regulation, the regulation of crp being more complex and involving interactions with several transcription factors. This results in substantially different modes of evolution, with the dnaQ region appearing to evolve under stronger purifying selection and the crp region showing the likely effects of stabilizing selection for a complex pattern of gene expression. The higher flexibility of the crp region is consistent with the observed lower conservation of global regulators in evolution. Patterns of regulatory evolution are also found to be markedly different in endosymbiotic bacteria, in a manner consistent with regulatory regions suffering some level of degradation, as has been observed for many other characters in these genomes. 
Therefore, the mode of evolution of bacterial regulatory regions appears to be highly dependent on both the lifestyle of the bacterium and the specific regulatory requirements of different genes. In fact, in many bacteria, the mode of evolution of genes requiring significant physiological adaptability in expression levels may follow patterns similar to those operating in the more complex regulatory regions of eukaryotic genomes.
|
75
|
Salgado H, Gama-Castro S, Peralta-Gil M, Díaz-Peredo E, Sánchez-Solano F, Santos-Zavaleta A, Martínez-Flores I, Jiménez-Jacinto V, Bonavides-Martínez C, Segura-Salazar J, Martínez-Antonio A, Collado-Vides J. RegulonDB (version 5.0): Escherichia coli K-12 transcriptional regulatory network, operon organization, and growth conditions. Nucleic Acids Res 2006; 34:D394-7. [PMID: 16381895 PMCID: PMC1347518 DOI: 10.1093/nar/gkj156] [Citation(s) in RCA: 276] [Impact Index Per Article: 15.3] [Indexed: 12/05/2022] Open
Abstract
RegulonDB is the internationally recognized reference database of Escherichia coli K-12 offering curated knowledge of the regulatory network and operon organization. It is currently the largest electronically-encoded database of the regulatory network of any free-living organism. We present here the recently launched RegulonDB version 5.0, which is radically different in content, interface design and capabilities. Continuous curation of original scientific literature provides the evidence behind every single object and feature. This knowledge is complemented with comprehensive computational predictions across the complete genome. Literature-based and predicted data are clearly distinguished in the database. Starting with this version, RegulonDB public releases are synchronized with those of EcoCyc since our curation supports both databases. The complex biology of regulation is simplified in a navigation scheme based on three major streams: genes, operons and regulons. Regulatory knowledge is directly available in every navigation step. Displays combine graphic and textual information and are organized allowing different levels of detail and biological context. This knowledge is the backbone of an integrated system for the graphic display of the network, graphic and tabular microarray comparisons with curated and predicted objects, as well as predictions across bacterial genomes, and predicted networks of functionally related gene products.
|