1
Kılıç S, Sánchez-Osuna M, Collado-Padilla A, Barbé J, Erill I. Flexible comparative genomics of prokaryotic transcriptional regulatory networks. BMC Genomics 2020; 21:466. [PMID: 33327941 PMCID: PMC7739468 DOI: 10.1186/s12864-020-06838-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/07/2020] [Accepted: 06/16/2020] [Indexed: 11/25/2022]
Abstract
Background Comparative genomics methods enable the reconstruction of bacterial regulatory networks using available experimental data. In spite of their potential for accelerating research into the composition and evolution of bacterial regulons, few comparative genomics suites have been developed for the automated analysis of these regulatory systems. Available solutions typically rely on precomputed databases for operon and ortholog predictions, limiting the scope of analyses to processed complete genomes, and several key issues such as the transfer of experimental information or the integration of regulatory information in a probabilistic setting remain largely unaddressed. Results Here we introduce CGB, a flexible platform for comparative genomics of prokaryotic regulons. CGB has few external dependencies and enables fully customized analyses of newly available genome data. The platform automates the merging of experimental information and uses a gene-centered, Bayesian framework to generate and integrate easily interpretable results. We demonstrate its flexibility and power by analyzing the evolution of type III secretion system regulation in pathogenic Proteobacteria and by characterizing the SOS regulon of a new bacterial phylum, the Balneolaeota. Conclusions Our results demonstrate the applicability of the CGB pipeline in multiple settings. CGB’s ability to automatically integrate experimental information from multiple sources and use complete and draft genomic data, coupled with its non-reliance on precomputed databases and its easily interpretable display of gene-centered posterior probabilities of regulation provide users with an unprecedented level of flexibility in launching comparative genomics analyses of prokaryotic transcriptional regulatory networks. 
The analyses of type III secretion and SOS response regulatory networks illustrate instances of convergent and divergent evolution of these regulatory systems, showcasing the power of formal ancestral state reconstruction at inferring the evolutionary history of regulatory networks.
Affiliation(s)
- Sefa Kılıç
  - University of Maryland Baltimore County, Baltimore, MD, 21250, USA
- Jordi Barbé
  - Universitat Autònoma de Barcelona, 08193, Bellaterra, Spain
- Ivan Erill
  - University of Maryland Baltimore County, Baltimore, MD, 21250, USA
2
Agüero-Chapin G, Galpert D, Molina-Ruiz R, Ancede-Gallardo E, Pérez-Machado G, De la Riva GA, Antunes A. Graph Theory-Based Sequence Descriptors as Remote Homology Predictors. Biomolecules 2019; 10:E26. [PMID: 31878100 PMCID: PMC7022958 DOI: 10.3390/biom10010026] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/19/2019] [Revised: 12/16/2019] [Accepted: 12/18/2019] [Indexed: 12/23/2022]
Abstract
Alignment-free (AF) methodologies have increased in popularity in the last decades as alternative tools to alignment-based (AB) algorithms for performing comparative sequence analyses. They have been especially useful to detect remote homologs within the twilight zone of highly diverse gene/protein families and superfamilies. The most popular alignment-free methodologies, as well as their applications to classification problems, have been described in previous reviews. Although a new set of graph theory-derived sequence/structural descriptors has been gaining relevance in the detection of remote homology, these descriptors have been omitted as AF predictors when the topic is addressed. Here, we first go over the most popular AF approaches used for detecting homology signals within the twilight zone and then present the state-of-the-art tools encoding graph theory-derived sequence/structure descriptors and their success in identifying remote homologs. We also highlight the tendency to integrate AF features/measures with AB ones, either in the same prediction model or by assembling the predictions from different algorithms using voting/weighting strategies, to improve the detection of remote signals. Lastly, we briefly discuss the efforts made to scale up AB and AF features/measures for the comparison of multiple genomes and proteomes. Alongside the experience gained in remote homology detection with both the most popular AF tools and other less known ones, we report our own experience with the graphical-numerical methodologies MARCH-INSIDE, TI2BioP, and ProtDCal. We also present a new Python-based tool (SeqDivA) with a friendly graphical user interface (GUI) for delimiting the twilight zone by using several similar criteria.
Affiliation(s)
- Guillermin Agüero-Chapin
  - CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Porto, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Deborah Galpert
  - Departamento de Ciencia de la Computación, Universidad Central "Marta Abreu" de Las Villas (UCLV), Santa Clara 54830, Cuba
- Reinaldo Molina-Ruiz
  - Centro de Bioactivos Químicos (CBQ), Universidad Central "Marta Abreu" de Las Villas (UCLV), Santa Clara 54830, Cuba
- Evys Ancede-Gallardo
  - Programa de Doctorado en Fisicoquímica Molecular, Facultad de Ciencias Exactas, Universidad Andrés Bello, Av. República 239, Santiago 8370146, Chile
- Gisselle Pérez-Machado
  - EpiDisease S.L. Spin-Off of Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), 46980 Valencia, Spain
- Gustavo A. De la Riva
  - Laboratorio de Biotecnología Aplicada S. de R.L. de C.V., GRECA Inc., Carretera La Piedad-Carapán, km 3.5, La Piedad, Michoacán 59300, Mexico
  - Tecnológico Nacional de México, Instituto Tecnológico de la Piedad, Av. Ricardo Guzmán Romero, Santa Fe, La Piedad de Cavadas, Michoacán 59370, Mexico
- Agostinho Antunes
  - CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Porto, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
3
Mrázek J, Karls AC. In silico simulations of occurrence of transcription factor binding sites in bacterial genomes. BMC Evol Biol 2019; 19:67. [PMID: 30823869 PMCID: PMC6397444 DOI: 10.1186/s12862-019-1381-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/12/2018] [Accepted: 02/01/2019] [Indexed: 11/16/2022]
Abstract
Background Interactions between transcription factors and their specific binding sites are a key component of regulation of gene expression. Until recently, it was generally assumed that most bacterial transcription factor binding sites are located at or near promoters. However, several recent works utilizing high-throughput technology to detect transcription factor binding sites in bacterial genomes found a large number of binding sites in unexpected locations, particularly inside genes, as opposed to known or expected promoter regions. While some of these intragenic binding sites likely have regulatory functions, an alternative scenario is that many of these binding sites arise by chance in the absence of selective constraints. The latter possibility was supported by in silico simulations for σ54 binding sites in Salmonella. Results In this work, we extend these simulations to more than forty transcription factors from E. coli and other bacteria. The results suggest that binding sites for all analyzed transcription factors are likely to arise throughout the genome by random genetic drift and many transcription factor binding sites found in genomes may not have specific regulatory functions. In addition, when comparing observed and expected patterns of occurrence of binding sites in genomes, we observed distinct differences among different transcription factors. Conclusions We speculate that transcription factor binding sites randomly occurring throughout the genome could be beneficial in promoting emergence of new regulatory interactions and thus facilitating evolution of gene regulatory networks. Electronic supplementary material The online version of this article (10.1186/s12862-019-1381-8) contains supplementary material, which is available to authorized users.
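The comparison of observed and chance-expected site counts can be illustrated with a much simpler null model than the drift simulations described in the abstract. The sketch below is a stand-in under stated assumptions: it counts exact matches to a single site string and compares them with counts in composition-preserving shuffles of the same sequence. The function names, the exact-match criterion, and the shuffle null are hypothetical choices for illustration, not the authors' procedure.

```python
import random


def count_hits(seq, site):
    """Count (possibly overlapping) exact occurrences of `site` in `seq`."""
    k = len(site)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == site)


def shuffled_hit_counts(seq, site, n_shuffles=100, seed=0):
    """Hit counts in shuffled copies of `seq` (base composition preserved).

    Assumption: a mononucleotide shuffle is used as the chance model;
    this is NOT the genetic-drift simulation from the paper.
    """
    rng = random.Random(seed)
    bases = list(seq)
    counts = []
    for _ in range(n_shuffles):
        rng.shuffle(bases)
        counts.append(count_hits("".join(bases), site))
    return counts


if __name__ == "__main__":
    genome = "ACGTACGTTTGACAACGT" * 20   # toy sequence, not real data
    observed = count_hits(genome, "TTGACA")
    null = shuffled_hit_counts(genome, "TTGACA", n_shuffles=200)
    mean_null = sum(null) / len(null)
    print(f"observed={observed}, mean under shuffle null={mean_null:.2f}")
```

A real analysis would score sites with a position-specific scoring matrix rather than exact matching, and would need a null model that reflects the evolutionary question being asked.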
Affiliation(s)
- Jan Mrázek
  - Department of Microbiology, University of Georgia, Athens, GA, USA
  - Institute of Bioinformatics, University of Georgia, Athens, GA, USA
- Anna C Karls
  - Department of Microbiology, University of Georgia, Athens, GA, USA
4
AL-barakati HJ, Saigo H, Newman RH, KC DB. RF-GlutarySite: a random forest based predictor for glutarylation sites. Mol Omics 2019; 15:189-204. [DOI: 10.1039/c9mo00028c] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 12/12/2022]
Abstract
Glutarylation, which is a newly identified posttranslational modification that occurs on lysine residues, has recently emerged as an important regulator of several metabolic and mitochondrial processes. Here, we describe the development of RF-GlutarySite, a random forest-based predictor designed to predict glutarylation sites based on protein primary amino acid sequence.
Affiliation(s)
- Hussam J. AL-barakati
  - Department of Computational Science and Engineering, North Carolina Agricultural & Technical State University, Greensboro, USA
- Hiroto Saigo
  - Department of Informatics, Kyushu University, Fukuoka 819-0395, Japan
- Robert H. Newman
  - Department of Biology, North Carolina Agricultural & Technical State University, Greensboro, USA
- Dukka B. KC
  - Department of Computational Science and Engineering, North Carolina Agricultural & Technical State University, Greensboro, USA
5
Tong H, Schliekelman P, Mrázek J. Unsupervised statistical discovery of spaced motifs in prokaryotic genomes. BMC Genomics 2017; 18:27. [PMID: 28056763 PMCID: PMC5217627 DOI: 10.1186/s12864-016-3400-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/02/2016] [Accepted: 12/09/2016] [Indexed: 12/23/2022]
Abstract
BACKGROUND DNA sequences contain repetitive motifs which have various functions in the physiology of the organism. A number of methods have been developed for discovery of such sequence motifs with a primary focus on detection of regulatory motifs and particularly transcription factor binding sites. Most motif-finding methods apply probabilistic models to detect motifs characterized by unusually high number of copies of the motif in the analyzed sequences. RESULTS We present a novel method for detection of pairs of motifs separated by spacers of variable nucleotide sequence but conserved length. Unlike existing methods for motif discovery, the motifs themselves are not required to occur at unusually high frequency but only to exhibit a significant preference to occur at a specific distance from each other. In the present implementation of the method, motifs are represented by pentamers and all pairs of pentamers are evaluated for statistically significant preference for a specific distance. An important step of the algorithm eliminates motif pairs where the spacers separating the two motifs exhibit a high degree of sequence similarity; such motif pairs likely arise from duplications of the whole segment including the motifs and the spacer rather than due to selective constraints indicative of a functional importance of the motif pair. The method was used to scan 569 complete prokaryotic genomes for novel sequence motifs. Some motifs detected were previously known but other motifs found in the search appear to be novel. Selected motif pairs were subjected to further investigation and in some cases their possible biological functions were proposed. CONCLUSIONS We present a new motif-finding technique that is applicable to scanning complete genomes for sequence motifs. 
The results from analysis of 569 genomes suggest that the method detects previously known motifs that are expected to be found as well as new motifs that are unlikely to be discovered by traditional motif-finding methods. We conclude that our approach to detection of significant motif pairs can complement existing motif-finding techniques in discovery of novel functional sequence motifs in complete genomes.
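The counting step behind the spaced-motif idea (pairs of pentamers that prefer a specific distance from each other) can be sketched as follows. This is only a minimal illustration under assumptions: the function names and the `max_spacer` parameter are hypothetical, both k-mers are assumed to have the same length, and the statistical significance test and the spacer-similarity filter described in the abstract are not implemented here.

```python
from collections import Counter


def kmer_positions(seq, k=5):
    """Map each k-mer in `seq` to the list of its start positions."""
    positions = {}
    for i in range(len(seq) - k + 1):
        positions.setdefault(seq[i:i + k], []).append(i)
    return positions


def spacer_length_counts(seq, left, right, max_spacer=30):
    """Count spacer lengths between each `left` k-mer and downstream `right` k-mers.

    Assumes len(left) == len(right). A sharp peak at one spacer length
    would be the raw signal that a significance test is then applied to.
    """
    k = len(left)
    pos = kmer_positions(seq, k)
    right_starts = set(pos.get(right, []))
    counts = Counter()
    for i in pos.get(left, []):
        for spacer in range(max_spacer + 1):
            if i + k + spacer in right_starts:
                counts[spacer] += 1
    return counts


if __name__ == "__main__":
    # Toy sequence: two copies with a 3-nt spacer, one with a 7-nt spacer.
    toy = ("TTGAC" + "GGG" + "TATAA" + "CCCCC" +
           "TTGAC" + "AGA" + "TATAA" + "GG" +
           "TTGAC" + "CCCCCCC" + "TATAA")
    print(spacer_length_counts(toy, "TTGAC", "TATAA", max_spacer=10))
```

Scanning all pentamer pairs in a genome would repeat this count for every pair, which is why an efficient position index such as `kmer_positions` is built once per sequence in practice.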
Affiliation(s)
- Hao Tong
  - Department of Statistics, University of Georgia, Athens, GA, 30602, USA
- Paul Schliekelman
  - Department of Statistics, University of Georgia, Athens, GA, 30602, USA
- Jan Mrázek
  - Department of Microbiology and Institute of Bioinformatics, University of Georgia, Athens, GA, 30602, USA
6
O'Neill PK, Erill I. Parametric bootstrapping for biological sequence motifs. BMC Bioinformatics 2016; 17:406. [PMID: 27716039 PMCID: PMC5052923 DOI: 10.1186/s12859-016-1246-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/20/2016] [Accepted: 09/08/2016] [Indexed: 11/10/2022]
Abstract
Background Biological sequence motifs drive the specific interactions of proteins and nucleic acids. Accordingly, the effective computational discovery and analysis of such motifs is a central theme in bioinformatics. Many practical questions about the properties of motifs can be recast as random sampling problems. In this light, the task is to determine for a given motif whether a certain feature of interest is statistically unusual among relevantly similar alternatives. Despite the generality of this framework, its use has been frustrated by the difficulties of defining an appropriate reference class of motifs for comparison and of sampling from it effectively. Results We define two distributions over the space of all motifs of given dimension. The first is the maximum entropy distribution subject to mean information content, and the second is the truncated uniform distribution over all motifs having information content within a given interval. We derive exact sampling algorithms for each. As a proof of concept, we employ these sampling methods to analyze a broad collection of prokaryotic and eukaryotic transcription factor binding site motifs. In addition to positional information content, we consider the informational Gini coefficient of the motif, a measure of the degree to which information is evenly distributed throughout a motif’s positions. We find that both prokaryotic and eukaryotic motifs tend to exhibit higher informational Gini coefficients (IGC) than would be expected by chance under either reference distribution. As a second application, we apply maximum entropy sampling to the motif p-value problem and use it to give elementary derivations of two new estimators. Conclusions Despite the historical centrality of biological sequence motif analysis, this study constitutes to our knowledge the first use of principled null hypotheses for sequence motifs given information content. 
Through their use, we are able to characterize for the first time differences in global motif statistics between biological motifs and their null distributions. In particular, we observe that biological sequence motifs show an unusual distribution of IGC, presumably due to biochemical constraints on the mechanisms of direct read-out. Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1246-8) contains supplementary material, which is available to authorized users.
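The two motif statistics named in the abstract, positional information content and the informational Gini coefficient (IGC), can be illustrated with a short sketch. It assumes a DNA alphabet with a uniform background, no small-sample correction, and the standard Gini formula applied to per-column information values; the function names are hypothetical and the paper's exact definitions may differ in detail.

```python
import math


def motif_columns(sites):
    """Per-position base counts [A, C, G, T] for equal-length aligned sites."""
    return [[column.count(b) for b in "ACGT"] for column in zip(*sites)]


def column_information(counts, alphabet_size=4):
    """Information content (bits) of one column: log2(alphabet) - entropy."""
    total = sum(counts)
    entropy = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            entropy -= p * math.log2(p)
    return math.log2(alphabet_size) - entropy


def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly even)."""
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    ordered = sorted(values)
    # G = 2 * sum_i(i * x_i) / (n * sum(x)) - (n + 1) / n, with i = 1..n
    weighted = sum((i + 1) * x for i, x in enumerate(ordered))
    return 2 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    sites = ["ACGT", "ACGA", "ACGC", "ACGG"]   # toy aligned binding sites
    ic = [column_information(c) for c in motif_columns(sites)]
    print(ic)        # three fully conserved columns, one uninformative
    print(gini(ic))  # 0.25
```

A higher value means the information is concentrated in fewer positions; comparing it against values from sampled null motifs is the step the paper's sampling algorithms address.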
Affiliation(s)
- Patrick K O'Neill
  - Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, 21250, US
- Ivan Erill
  - Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, 21250, US
7
Peng FY, Hu Z, Yang RC. Bioinformatic prediction of transcription factor binding sites at promoter regions of genes for photoperiod and vernalization responses in model and temperate cereal plants. BMC Genomics 2016; 17:573. [PMID: 27503086 PMCID: PMC4977670 DOI: 10.1186/s12864-016-2916-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 04/20/2016] [Accepted: 07/07/2016] [Indexed: 11/14/2022]
Abstract
Background Many genes involved in responses to photoperiod and vernalization have been characterized or predicted in Arabidopsis (Arabidopsis thaliana), Brachypodium (Brachypodium distachyon), wheat (Triticum aestivum) and barley (Hordeum vulgare). However, little is known about the transcriptional regulation of these genes, especially in the large, complex genomes of wheat and barley. Results We identified 68, 60, 195 and 61 genes that are known or postulated to control pathways of photoperiod (PH), vernalization (VE) and pathway integration (PI) in Arabidopsis, Brachypodium, wheat and barley, respectively, and predicted transcription factor binding sites (TFBSs) in the promoters of these genes using the FIMO motif search tool of the MEME Suite. The initially predicted TFBSs were filtered, yielding final counts of 1066, 1379, 1528, and 789 predicted TFBSs in Arabidopsis, Brachypodium, wheat and barley, respectively. These TFBSs were mapped onto the PH, VE and PI pathways to infer the regulation of gene expression in Arabidopsis and cereal species. The GC contents in promoters, untranslated regions (UTRs), coding sequences and introns were higher in the three cereal species than those in Arabidopsis. The predicted TFBSs were most abundant for two transcription factor (TF) families: MADS-box and CSD (cold shock domain). The analysis of publicly available gene expression data showed that genes with similar numbers of MADS-box and CSD TFBSs exhibited similar expression patterns across several different tissues and developmental stages. The intra-specific Tajima D-statistics of TFBS motif diversity showed different binding specificity among different TF families. The inter-specific Tajima D-statistics suggested faster divergence in TFBSs than in coding sequences and introns. 
Mapping TFBSs onto the PH, VE and PI pathways showed the predominance of MADS-box and CSD TFBSs in most genes of the four species, and the difference in the pathway regulations between Arabidopsis and the three cereal species. Conclusion Our approach to associating the key flowering genes with their potential TFs through prediction of putative TFBSs provides a framework to explore regulatory mechanisms of photoperiod and vernalization responses in flowering plants. The predicted TFBSs in the promoters of the flowering genes provide a basis for molecular characterization of transcription regulation in the large, complex genomes of important crop species, wheat and barley. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-2916-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Fred Y Peng
  - Feed Crops Section, Alberta Agriculture and Forestry, 7000 - 113 Street, Edmonton, AB, T6H 5T6, Canada
- Zhiqiu Hu
  - Department of Agricultural, Food and Nutritional Science, University of Alberta, 410 Agriculture/Forestry Centre, Edmonton, AB, T6G 2P5, Canada
- Rong-Cai Yang
  - Feed Crops Section, Alberta Agriculture and Forestry, 7000 - 113 Street, Edmonton, AB, T6H 5T6, Canada
  - Department of Agricultural, Food and Nutritional Science, University of Alberta, 410 Agriculture/Forestry Centre, Edmonton, AB, T6G 2P5, Canada
8
RF-Phos: A Novel General Phosphorylation Site Prediction Tool Based on Random Forest. BioMed Research International 2016; 2016:3281590. [PMID: 27066500 PMCID: PMC4811047 DOI: 10.1155/2016/3281590] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 10/02/2015] [Revised: 01/13/2016] [Accepted: 01/31/2016] [Indexed: 01/17/2023]
Abstract
Protein phosphorylation is one of the most widespread regulatory mechanisms in eukaryotes. Over the past decade, phosphorylation site prediction has emerged as an important problem in the field of bioinformatics. Here, we report a new method, termed Random Forest-based Phosphosite predictor 2.0 (RF-Phos 2.0), to predict phosphorylation sites given only the primary amino acid sequence of a protein as input. RF-Phos 2.0, which uses random forest with sequence and structural features, is able to identify putative sites of phosphorylation across many protein families. In side-by-side comparisons based on 10-fold cross validation and an independent dataset, RF-Phos 2.0 compares favorably to other popular mammalian phosphosite prediction methods, such as PhosphoSVM, GPS2.1, and Musite.
9
Every Site Counts: Submitting Transcription Factor-Binding Site Information through the CollecTF Portal. J Bacteriol 2015; 197:2454-7. [PMID: 26013488 DOI: 10.1128/jb.00031-15] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/28/2023]
Abstract
Experimentally verified transcription factor-binding sites represent an information-rich and highly applicable data type that aptly summarizes the results of time-consuming experiments and inference processes. Currently, there is no centralized repository for this type of data, which is routinely embedded in articles and extremely hard to mine. CollecTF provides the first standardized resource for submission and deposition of these data into the NCBI RefSeq database, maximizing its accessibility and prompting the community to adopt direct submission policies.
10
Identification and characterization of VpsR and VpsT binding sites in Vibrio cholerae. J Bacteriol 2015; 197:1221-35. [PMID: 25622616 DOI: 10.1128/jb.02439-14] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Indexed: 11/20/2022]
Abstract
The ability to form biofilms is critical for environmental survival and transmission of Vibrio cholerae, a facultative human pathogen responsible for the disease cholera. Biofilm formation is controlled by several transcriptional regulators and alternative sigma factors. In this study, we report that the two main positive regulators of biofilm formation, VpsR and VpsT, bind to nonoverlapping target sequences in the regulatory region of vpsL in vitro. VpsR binds to a proximal site (the R1 box) as well as a distal site (the R2 box) with respect to the transcriptional start site identified upstream of vpsL. The VpsT binding site (the T box) is located between the R1 and R2 boxes. While mutations in the T and R boxes resulted in a decrease in vpsL expression, deletion of the T and R2 boxes resulted in an increase in vpsL expression. Analysis of the role of H-NS in vpsL expression revealed that deletion of hns resulted in enhanced vpsL expression. The level of vpsL expression was higher in an hns vpsT double mutant than in the parental strain but lower than that in an hns mutant. In silico analysis of the regulatory regions of the VpsR and VpsT targets resulted in the identification of conserved recognition motifs for VpsR and VpsT and revealed that operons involved in biofilm formation and vpsT are coregulated by VpsR and VpsT. Furthermore, a comparative genomics analysis revealed substantial variability in the promoter region of the vpsT and vpsL genes among extant V. cholerae isolates, suggesting that regulation of biofilm formation is under active selection. IMPORTANCE Vibrio cholerae causes cholera and is a natural inhabitant of aquatic environments. One critical factor that is important for environmental survival and transmission of V. cholerae is the microbe's ability to form biofilms, which are surface-associated communities encased in a matrix composed of the exopolysaccharide VPS (Vibrio polysaccharide), proteins, and nucleic acids. 
Two proteins, VpsR and VpsT, positively regulate VPS production and biofilm formation. We characterized the structural features of the promoter of the vpsL gene, determined the target sequences recognized by VpsT and VpsR, and analyzed their distribution and conservation patterns in multiple V. cholerae isolates. This work fills a fundamental gap in our understanding of the regulatory mechanisms employed by the master regulators VpsR and VpsT in controlling biofilm matrix production.
11
Johnson MD, Mueller M, Adamowicz-Brice M, Collins MJ, Gellert P, Maratou K, Srivastava PK, Rotival M, Butt S, Game L, Atanur SS, Silver N, Norsworthy PJ, Langley SR, Petretto E, Pravenec M, Aitman TJ. Genetic analysis of the cardiac methylome at single nucleotide resolution in a model of human cardiovascular disease. PLoS Genet 2014; 10:e1004813. [PMID: 25474312 PMCID: PMC4256262 DOI: 10.1371/journal.pgen.1004813] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 12/05/2013] [Accepted: 10/09/2014] [Indexed: 12/03/2022]
Abstract
Epigenetic marks such as cytosine methylation are important determinants of cellular and whole-body phenotypes. However, the extent of, and reasons for inter-individual differences in cytosine methylation, and their association with phenotypic variation are poorly characterised. Here we present the first genome-wide study of cytosine methylation at single-nucleotide resolution in an animal model of human disease. We used whole-genome bisulfite sequencing in the spontaneously hypertensive rat (SHR), a model of cardiovascular disease, and the Brown Norway (BN) control strain, to define the genetic architecture of cytosine methylation in the mammalian heart and to test for association between methylation and pathophysiological phenotypes. Analysis of 10.6 million CpG dinucleotides identified 77,088 CpGs that were differentially methylated between the strains. In F1 hybrids we found 38,152 CpGs showing allele-specific methylation and 145 regions with parent-of-origin effects on methylation. Cis-linkage explained almost 60% of inter-strain variation in methylation at a subset of loci tested for linkage in a panel of recombinant inbred (RI) strains. Methylation analysis in isolated cardiomyocytes showed that in the majority of cases methylation differences in cardiomyocytes and non-cardiomyocytes were strain-dependent, confirming a strong genetic component for cytosine methylation. We observed preferential nucleotide usage associated with increased and decreased methylation that is remarkably conserved across species, suggesting a common mechanism for germline control of inter-individual variation in CpG methylation. In the RI strain panel, we found significant correlation of CpG methylation and levels of serum chromogranin B (CgB), a proposed biomarker of heart failure, which is evidence for a link between germline DNA sequence variation, CpG methylation differences and pathophysiological phenotypes in the SHR strain. 
Together, these results will stimulate further investigation of the molecular basis of locally regulated variation in CpG methylation and provide a starting point for understanding the relationship between the genetic control of CpG methylation and disease phenotypes. Epigenetic marks provide information that is not encoded in the primary DNA sequence itself but in modifications of genomic DNA and of the associated proteins. Methylation of genomic DNA at cytosine residues is an important epigenetic modification that is associated with developmental processes, carcinogenesis and other diseases. Genome-wide extent of, and reasons for inter-individual differences in cytosine methylation, and their association with phenotypic variation are poorly characterised. To address these questions we have determined and compared the genome-wide methylation patterns in heart tissue of two inbred rat strains, the spontaneously hypertensive rat, an animal model of human disease and a control rat strain. Comparison of methylation differences between genetically identical animals from the same strain and differences between animals from different strains allowed us to quantify association of epigenetic and genetic differences. We show that differences in an individual's germline DNA sequence are important determinants of the variability in methylation between individuals. Comparison with previous reports implicates common mechanisms for regulation of cytosine methylation that are highly conserved across species. Finally, we find correlation between a proposed blood biomarker for heart failure and variation in DNA methylation, suggesting a link between germline DNA sequence variation, methylation and a disease-related phenotype.
Affiliation(s)
- Michelle D. Johnson
  - Physiological Genomics and Medicine Group, MRC Clinical Sciences Centre, London, United Kingdom
  - National Heart and Lung Institute, Imperial College, London, United Kingdom
- Michael Mueller
  - Physiological Genomics and Medicine Group, MRC Clinical Sciences Centre, London, United Kingdom
  - National Heart and Lung Institute, Imperial College, London, United Kingdom
- Martyna Adamowicz-Brice
  - Physiological Genomics and Medicine Group, MRC Clinical Sciences Centre, London, United Kingdom
  - National Heart and Lung Institute, Imperial College, London, United Kingdom
- Melissa J. Collins
  - Physiological Genomics and Medicine Group, MRC Clinical Sciences Centre, London, United Kingdom
  - National Heart and Lung Institute, Imperial College, London, United Kingdom
- Pascal Gellert
  - Physiological Genomics and Medicine Group, MRC Clinical Sciences Centre, London, United Kingdom
  - Institute of Clinical Sciences, Imperial College, London, United Kingdom
- Klio Maratou
  - Physiological Genomics and Medicine Group, MRC Clinical Sciences Centre, London, United Kingdom
  - Institute of Clinical Sciences, Imperial College, London, United Kingdom
- Prashant K. Srivastava
  - Physiological Genomics and Medicine Group, MRC Clinical Sciences Centre, London, United Kingdom
- Maxime Rotival
  - Integrative Genomics and Medicine Group, MRC Clinical Sciences Centre, London, United Kingdom
- Shahena Butt
  - Integrative Genomics and Medicine Group, MRC Clinical Sciences Centre, London, United Kingdom
- Laurence Game
  - Genomics Core Laboratory, MRC Clinical Sciences Centre, London, United Kingdom
- Santosh S. Atanur
  - Physiological Genomics and Medicine Group, MRC Clinical Sciences Centre, London, United Kingdom
  - National Heart and Lung Institute, Imperial College, London, United Kingdom
- Nicholas Silver
  - Physiological Genomics and Medicine Group, MRC Clinical Sciences Centre, London, United Kingdom
  - National Heart and Lung Institute, Imperial College, London, United Kingdom
- Penny J. Norsworthy
  - Physiological Genomics and Medicine Group, MRC Clinical Sciences Centre, London, United Kingdom
- Sarah R. Langley
  - Integrative Genomics and Medicine Group, MRC Clinical Sciences Centre, London, United Kingdom
- Enrico Petretto
  - Integrative Genomics and Medicine Group, MRC Clinical Sciences Centre, London, United Kingdom
- Michal Pravenec
  - Institute of Physiology, Academy of Sciences of the Czech Republic, Prague, Czech Republic
  - Institute of Biology and Medical Genetics, 1st Medical Faculty, Charles University, Prague, Czech Republic
- Timothy J. Aitman
  - Physiological Genomics and Medicine Group, MRC Clinical Sciences Centre, London, United Kingdom
  - Institute of Clinical Sciences, Imperial College, London, United Kingdom
Collapse
|
12
|
Vinga S. Information theory applications for biological sequence analysis. Brief Bioinform 2014; 15:376-89. [PMID: 24058049 PMCID: PMC7109941 DOI: 10.1093/bib/bbt068] [Citation(s) in RCA: 67] [Impact Index Per Article: 6.7] [Received: 07/11/2013] [Accepted: 08/17/2013] [Indexed: 01/13/2023] Open
Abstract
Information theory (IT) addresses the analysis of communication systems and has been widely applied in molecular biology. In particular, alignment-free sequence analysis and comparison greatly benefited from concepts derived from IT, such as entropy and mutual information. This review covers several aspects of IT applications, ranging from genome global analysis and comparison, including block-entropy estimation and resolution-free metrics based on iterative maps, to local analysis, comprising the classification of motifs, prediction of transcription factor binding sites and sequence characterization based on linguistic complexity and entropic profiles. IT has also been applied to high-level correlations that combine DNA, RNA or protein features with sequence-independent properties, such as gene mapping and phenotype analysis, and has also provided models based on communication systems theory to describe information transmission channels at the cell level and also during evolutionary processes. While not exhaustive, this review attempts to categorize existing methods and to indicate their relation with broader transversal topics such as genomic signatures, data compression and complexity, time series analysis and phylogenetic classification, providing a resource for future developments in this promising area.
Affiliation(s)
- Susana Vinga
- IDMEC, Instituto Superior Técnico - Universidade de Lisboa (IST-UL), Av. Rovisco Pais, 1049-001 Lisboa, Portugal. Tel.: +351-218419504; Fax: +351-218498097;
|
13
|
The LexA regulated genes of the Clostridium difficile. BMC Microbiol 2014; 14:88. [PMID: 24713082 PMCID: PMC4234289 DOI: 10.1186/1471-2180-14-88] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 12/23/2013] [Accepted: 03/27/2014] [Indexed: 01/05/2023] Open
Abstract
Background The SOS response, which involves two main proteins, LexA and RecA, maintains the integrity of bacterial genomes after DNA damage due to metabolic or environmental assaults. Additionally, derepression of LexA-regulated genes can result in mutations, genetic exchange and expression of virulence factors. Here we provide the first comprehensive in silico description of the LexA regulon in Clostridium difficile, an important human pathogen. Results We grouped thirty C. difficile strains from different ribotypes and toxinotypes into three clusters according to lexA gene/protein variability. We applied in silico analysis coupled to surface plasmon resonance spectroscopy (SPR) and determined 16 LexA binding sites in C. difficile. Our data indicate that strains within the cluster, as defined by LexA variability, harbour several specific LexA regulon genes. In addition to the core SOS genes lexA, recA, ruvCA and uvrBA, we identified a LexA binding site on the pathogenicity locus (PaLoc) and in the putative promoter region of several genes involved in housekeeping, sporulation and antibiotic resistance. Conclusions Results presented here suggest that in C. difficile LexA is not merely a regulator of the DNA damage response genes but also controls the expression of a dozen genes involved in various other biological functions. Our in vitro results indicate that in C. difficile inactivation of the LexA repressor depends on the repressor's dissociation from the operators. We report that the repressor's dissociation rates from operators differ, and thus the determined LexA-DNA dissociation constants have implications for the timing of SOS gene expression in C. difficile.
|
14
|
O'Neill PK, Forder R, Erill I. Informational requirements for transcriptional regulation. J Comput Biol 2014; 21:373-84. [PMID: 24689750 DOI: 10.1089/cmb.2014.0032] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022] Open
Abstract
Transcription factors (TFs) regulate transcription by binding to specific sites in promoter regions. Information theory provides a useful mathematical framework to analyze the binding motifs associated with TFs but imposes several assumptions that limit its applicability to specific regulatory scenarios. Explicit simulations of the co-evolution of TFs and their binding motifs allow the study of the evolution of regulatory networks with a high degree of realism. In this work we analyze the impact of differential regulatory demands on the information content of TF-binding motifs by means of evolutionary simulations. We generalize a predictive index based on information theory, and we validate its applicability to regulatory scenarios in which the TF binds significantly to the genomic background. Our results show a logarithmic dependence of the evolved information content on the occupancy of target sites and indicate that TFs may actively exploit pseudo-sites to modulate their occupancy of target sites. In regulatory networks with differentially regulated targets, we observe that information content in TF-binding motifs is dictated primarily by the fraction of total probability mass that the TF assigns to its target sites, and we provide a predictive index to estimate the amount of information associated with arbitrarily complex regulatory systems. We observe that complex regulatory patterns can exert additional demands on evolved information content, but, given a total occupancy for target sites, we do not find conclusive evidence that this effect is because of the range of required binding affinities.
Affiliation(s)
- Patrick K O'Neill
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, Maryland
|
15
|
Hudson NJ, Porto-Neto LR, Kijas J, McWilliam S, Taft RJ, Reverter A. Information compression exploits patterns of genome composition to discriminate populations and highlight regions of evolutionary interest. BMC Bioinformatics 2014; 15:66. [PMID: 24606587 PMCID: PMC4015654 DOI: 10.1186/1471-2105-15-66] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 12/13/2013] [Accepted: 02/26/2014] [Indexed: 11/20/2022] Open
Abstract
Background Genomic information allows population relatedness to be inferred and selected genes to be identified. Single nucleotide polymorphism microarray (SNP-chip) data, a proxy for genome composition, contains patterns in allele order and proportion. These patterns can be quantified by compression efficiency (CE). In principle, the composition of an entire genome can be represented by a CE number quantifying allele representation and order. Results We applied a compression algorithm (DEFLATE) to genome-wide high-density SNP data from 4,155 human, 1,800 cattle, 1,222 sheep, 81 dog and 49 mouse samples. All human ethnic groups can be clustered by CE and the clusters recover phylogeography based on traditional fixation index (FST) analyses. CE analysis of other mammals results in segregation by breed or species, and is sensitive to admixture and past effective population size. This clustering is a consequence of individual patterns such as runs of homozygosity. Intriguingly, a related approach can also be used to identify genomic loci that show population-specific CE segregation. A high resolution CE 'sliding window' scan across the human genome, organised at the population level, revealed genes known to be under evolutionary pressure. These include SLC24A5 (European and Gujarati Indian skin pigmentation), HERC2 (European eye color), LCT (European and Maasai milk digestion) and EDAR (Asian hair thickness). We also identified a set of previously unidentified loci with high population-specific CE scores including the chromatin remodeler SCMH1 in Africans and EDA2R in Asians. Closer inspection reveals that these prioritised genomic regions do not correspond to simple runs of homozygosity but rather compositionally complex regions that are shared by many individuals of a given population. Unlike FST, CE analyses do not require ab initio population comparisons and are amenable to the hemizygous X chromosome.
Conclusions We conclude with a discussion of the implications of CE for a complex systems science view of genome evolution. CE allows one to clearly visualise the evolution of individual genomes and populations through a formal, mathematically-rigorous information space. Overall, CE makes a set of biological predictions, some of which are unique and await functional validation.
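As a rough illustration of the compression-efficiency idea described in this abstract, the Python sketch below compresses a genotype string with zlib (which wraps a DEFLATE stream) and reports the fraction of bytes saved. The CE formula used here (1 minus compressed size over raw size), the 'A'/'B' allele coding and the toy data are assumptions made for illustration; they are not the authors' definition or pipeline.

```python
import random
import zlib


def compression_efficiency(genotypes: str) -> float:
    """Fraction of bytes saved by DEFLATE (assumed CE definition, for illustration)."""
    raw = genotypes.encode("ascii")
    if not raw:
        raise ValueError("genotype string must not be empty")
    compressed = zlib.compress(raw, 9)  # zlib container around a DEFLATE stream
    return 1.0 - len(compressed) / len(raw)


# Hypothetical toy data: a long run of homozygosity versus randomly mixed alleles.
rng = random.Random(0)
homozygous_run = "A" * 5000
mixed_alleles = "".join(rng.choice("AB") for _ in range(5000))

ce_run = compression_efficiency(homozygous_run)
ce_mixed = compression_efficiency(mixed_alleles)
print(f"CE, homozygous run: {ce_run:.3f}")
print(f"CE, mixed alleles:  {ce_mixed:.3f}")
```

Under this assumed definition, a highly ordered genotype string compresses better and therefore scores a higher CE than a disordered one, which is the kind of pattern the abstract attributes to runs of homozygosity.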
Affiliation(s)
- Ryan J Taft
- Computational and Systems Biology, CSIRO Animal, Food and Health Sciences, St Lucia, Brisbane, QLD 4067, Australia.
|
16
|
Cornish JP, Sanchez-Alberola N, O'Neill PK, O'Keefe R, Gheba J, Erill I. Characterization of the SOS meta-regulon in the human gut microbiome. Bioinformatics 2014; 30:1193-7. [PMID: 24407225 PMCID: PMC3998124 DOI: 10.1093/bioinformatics/btt753] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 12/22/2022]
Abstract
MOTIVATION Data from metagenomics projects remain largely untapped for the analysis of transcriptional regulatory networks. Here, we provide proof-of-concept that metagenomic data can be effectively leveraged to analyze regulatory networks by characterizing the SOS meta-regulon in the human gut microbiome. RESULTS We combine well-established in silico and in vitro techniques to mine the human gut microbiome data and determine the relative composition of the SOS network in a natural setting. Our analysis highlights the importance of translesion synthesis as a primary function of the SOS response. We predict the association of this network with three novel protein clusters involved in cell wall biogenesis, chromosome partitioning and restriction modification, and we confirm binding of the SOS response transcriptional repressor to sites in the promoter of a cell wall biogenesis enzyme, a phage integrase and a death-on-curing protein. We discuss the implications of these findings and the potential for this approach for metagenome analysis.
Affiliation(s)
- Joseph P Cornish
- Department of Biological Sciences, University of Maryland Baltimore County (UMBC), Baltimore, MD 21250, USA
|
17
|
Abstract
Eukaryotic cell development has been optimized by natural selection to obey maximal intracellular flux of messenger proteins. This, in turn, implies maximum Fisher information on angular position about a target nuclear pore complex (NPC). The cell is simply modeled as spherical, with a cell membrane (CM) diameter of 10 micrometers and a concentric nuclear membrane (NM) diameter of 6 micrometers. The NM contains approximately 3000 NPCs. Development requires messenger ligands to travel from the CM through an NPC to DNA target binding sites. Ligands acquire negative charge by phosphorylation, passing through the cytoplasm over Newtonian trajectories toward positively charged NPCs (utilizing positive nuclear localization sequences). The CM-NPC channel obeys maximized mean protein flux F and Fisher information I at the NPC. Therefore the first-order change in I is zero. The second-order change in I is likewise close to zero, indicating significant stability to environmental perturbations. Many predictions are confirmed, including the dominance of protein pathways of 1-4 proteins, a 4 nm size for the EGFR protein and a flux value F of approximately 10^16 proteins/(m^2 s). After entering the nucleus, each protein ultimately delivers its ligand information to a DNA target site with maximum probability, i.e. maximum Kullback-Leibler entropy H(KL). In a smoothness limit H(KL) approaches I(DNA)/2, so that the total CM-NPC-DNA channel obeys maximum Fisher I. It is also shown that such maximum information implies a cell state far from thermodynamic equilibrium, one condition for life.
Affiliation(s)
- B Roy Frieden
- College of Optical Sciences, University of Arizona, Tucson, Arizona 85721, USA.
|
18
|
Cornish JP, Matthews F, Thomas JR, Erill I. Inference of self-regulated transcriptional networks by comparative genomics. Evol Bioinform Online 2012; 8:449-61. [PMID: 23032607 PMCID: PMC3422134 DOI: 10.4137/ebo.s9205] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/05/2022] Open
Abstract
The assumption of basic properties, like self-regulation, in simple transcriptional regulatory networks can be exploited to infer regulatory motifs from the growing amounts of genomic and meta-genomic data. These motifs can in principle be used to elucidate the nature and scope of transcriptional networks through comparative genomics. Here we assess the feasibility of this approach using the SOS regulatory network of Gram-positive bacteria as a test case. Using experimentally validated data, we show that the known regulatory motif can be inferred through the assumption of self-regulation. Furthermore, the inferred motif provides a more robust search pattern for comparative genomics than the experimental motifs defined in reference organisms. We take advantage of this robustness to generate a functional map of the SOS response in Gram-positive bacteria. Our results reveal definite differences in the composition of the LexA regulon between Firmicutes and Actinobacteria, and confirm that regulation of cell-division inhibition is a widespread characteristic of this network among Gram-positive bacteria.
Affiliation(s)
- Joseph P Cornish
- Department of Biological Sciences, University of Maryland Baltimore County
|
19
|
BioWord: a sequence manipulation suite for Microsoft Word. BMC Bioinformatics 2012; 13:124. [PMID: 22676326 PMCID: PMC3546851 DOI: 10.1186/1471-2105-13-124] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 01/03/2012] [Accepted: 05/10/2012] [Indexed: 11/30/2022] Open
Abstract
Background The ability to manipulate, edit and process DNA and protein sequences has rapidly become a necessary skill for practicing biologists across a wide swath of disciplines. In spite of this, most everyday sequence manipulation tools are distributed across several programs and web servers, sometimes requiring installation and typically involving frequent switching between applications. To address this problem, here we have developed BioWord, a macro-enabled self-installing template for Microsoft Word documents that integrates an extensive suite of DNA and protein sequence manipulation tools. Results BioWord is distributed as a single macro-enabled template that self-installs with a single click. After installation, BioWord will open as a tab in the Office ribbon. Biologists can then easily manipulate DNA and protein sequences using a familiar interface and minimize the need to switch between applications. Beyond simple sequence manipulation, BioWord integrates functionality ranging from dyad search and consensus logos to motif discovery and pair-wise alignment. Written in Visual Basic for Applications (VBA) as an open source, object-oriented project, BioWord allows users with varying programming experience to expand and customize the program to better meet their own needs. Conclusions BioWord integrates a powerful set of tools for biological sequence manipulation within a handy, user-friendly tab in a widely used word processing software package. The use of a simple scripting language and an object-oriented scheme facilitates customization by users and provides a very accessible educational platform for introducing students to basic bioinformatics algorithms.
|
20
|
Sanchez-Alberola N, Campoy S, Barbé J, Erill I. Analysis of the SOS response of Vibrio and other bacteria with multiple chromosomes. BMC Genomics 2012; 13:58. [PMID: 22305460 PMCID: PMC3323433 DOI: 10.1186/1471-2164-13-58] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 07/26/2011] [Accepted: 02/03/2012] [Indexed: 12/18/2022] Open
Abstract
Background The SOS response is a well-known regulatory network present in most bacteria and aimed at addressing DNA damage. It has also been linked extensively to stress-induced mutagenesis, virulence and the emergence and dissemination of antibiotic resistance determinants. Recently, the SOS response has been shown to regulate the activity of integrases in the chromosomal superintegrons of the Vibrionaceae, which encompasses a wide range of pathogenic species harboring multiple chromosomes. Here we combine in silico and in vitro techniques to perform a comparative genomics analysis of the SOS regulon in the Vibrionaceae, and we extend the methodology to map this transcriptional network in other bacterial species harboring multiple chromosomes. Results Our analysis provides the first comprehensive description of the SOS response in a family (Vibrionaceae) that includes major human pathogens. It also identifies several previously unreported members of the SOS transcriptional network, including two proteins of unknown function. The analysis of the SOS response in other bacterial species with multiple chromosomes uncovers additional regulon members and reveals that there is a conserved core of SOS genes, and that specialized additions to this basic network take place in different phylogenetic groups. Our results also indicate that across all groups the main elements of the SOS response are always found in the large chromosome, whereas specialized additions are found in the smaller chromosomes and plasmids. Conclusions Our findings confirm that the SOS response of the Vibrionaceae is strongly linked with pathogenicity and dissemination of antibiotic resistance, and suggest that the characterization of the newly identified members of this regulon could provide key insights into the pathogenesis of Vibrio. 
The persistent location of key SOS genes in the large chromosome across several bacterial groups confirms that the SOS response plays an essential role in these organisms and sheds light on the mechanisms of evolution of global transcriptional networks involved in adaptability and rapid response to environmental changes, suggesting that small chromosomes may act as evolutionary test beds for the rewiring of transcriptional networks.
Affiliation(s)
- Neus Sanchez-Alberola
- Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain
|
21
|
Cambray G, Sanchez-Alberola N, Campoy S, Guerin É, Da Re S, González-Zorn B, Ploy MC, Barbé J, Mazel D, Erill I. Prevalence of SOS-mediated control of integron integrase expression as an adaptive trait of chromosomal and mobile integrons. Mob DNA 2011; 2:6. [PMID: 21529368 PMCID: PMC3108266 DOI: 10.1186/1759-8753-2-6] [Citation(s) in RCA: 96] [Impact Index Per Article: 7.4] [Received: 11/09/2010] [Accepted: 04/30/2011] [Indexed: 11/26/2022] Open
Abstract
Background Integrons are found in hundreds of environmental bacterial species, but are mainly known as the agents responsible for the capture and spread of antibiotic-resistance determinants between Gram-negative pathogens. The SOS response is a regulatory network under control of the repressor protein LexA targeted at addressing DNA damage, thus promoting genetic variation in times of stress. We recently reported a direct link between the SOS response and the expression of integron integrases in Vibrio cholerae and a plasmid-borne class 1 mobile integron. SOS regulation enhances cassette swapping and capture in stressful conditions, while freezing the integron in steady environments. We conducted a systematic study of available integron integrase promoter sequences to analyze the extent of this relationship across the Bacteria domain. Results Our results showed that LexA controls the expression of a large fraction of integron integrases by binding to Escherichia coli-like LexA binding sites. In addition, the results provide experimental validation of LexA control of the integrase gene for another Vibrio chromosomal integron and for a multiresistance plasmid harboring two integrons. There was a significant correlation between lack of LexA control and predicted inactivation of integrase genes, even though experimental evidence also indicates that LexA regulation may be lost to enhance expression of integron cassettes. Conclusions Ancestral-state reconstruction on an integron integrase phylogeny led us to conclude that the ancestral integron was already regulated by LexA. The data also indicated that SOS regulation has been actively preserved in mobile integrons and large chromosomal integrons, suggesting that unregulated integrase activity is selected against. Nonetheless, additional adaptations have probably arisen to cope with unregulated integrase activity. 
Identifying them may be fundamental in deciphering the uneven distribution of integrons in the Bacteria domain.
Affiliation(s)
- Guillaume Cambray
- Institut Pasteur, Unité Plasticité du Génome Bactérien, CNRS URA 2171, 75015 Paris, France
- Neus Sanchez-Alberola
- Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore 21228, USA
- Susana Campoy
- Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain
- Émilie Guerin
- Université de Limoges, Faculté de Médecine, EA3175, INSERM, Equipe Avenir, Limoges 87000, France
- Sandra Da Re
- Université de Limoges, Faculté de Médecine, EA3175, INSERM, Equipe Avenir, Limoges 87000, France
- Bruno González-Zorn
- Departamento de Sanidad Animal, Facultad de Veterinaria, and VISAVET, Universidad Complutense de Madrid, 28040 Madrid, Spain
- Marie-Cécile Ploy
- Université de Limoges, Faculté de Médecine, EA3175, INSERM, Equipe Avenir, Limoges 87000, France
- Jordi Barbé
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore 21228, USA
- Didier Mazel
- Institut Pasteur, Unité Plasticité du Génome Bactérien, CNRS URA 2171, 75015 Paris, France
- Ivan Erill
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore 21228, USA
|
22
|
Abstract
In this Perspective, we propose that communication theory--a field of mathematics concerned with the problems of signal transmission, reception and processing--provides a new quantitative lens for investigating multicellular biology, ancient and modern. What underpins the cohesive organisation and collective behaviour of multicellular ecosystems such as microbial colonies and communities (microbiomes) and multicellular organisms such as plants and animals, whether built of simple tissue layers (sponges) or of complex differentiated cells arranged in tissues and organs (members of the 35 or so phyla of the subkingdom Metazoa)? How do mammalian tissues and organs develop, maintain their architecture, become subverted in disease, and decline with age? How did single-celled organisms coalesce to produce many-celled forms that evolved and diversified into the varied multicellular organisms in existence today? Some answers can be found in the blueprints or recipes encoded in (epi)genomes, yet others lie in the generic physical properties of biological matter such as the ability of cell aggregates to attain a certain complexity in size, shape, and pattern. We suggest that Lasswell's maxim "Who says what to whom in what channel with what effect" provides a foundation for understanding not only the emergence and evolution of multicellularity, but also the assembly and sculpting of multicellular ecosystems and many-celled structures, whether of natural or human-engineered origin. We explore how the abstraction of communication theory as an organising principle for multicellular biology could be realised. We highlight the inherent ability of communication theory to be blind to molecular and/or genetic mechanisms. We describe selected applications that analyse the physics of communication and use energy efficiency as a central tenet. 
Whilst communication theory has and could contribute to understanding a myriad of problems in biology, investigations of multicellular biology could, in turn, lead to advances in communication theory, especially in the still immature field of network information theory.
Affiliation(s)
- I S Mian
- Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
|
23
|
Abstract
The idea that we could build molecular communications systems can be advanced by investigating how actual molecules from living organisms function. Information theory provides tools for such an investigation. This review describes how we can compute the average information in the DNA binding sites of any genetic control protein and how this can be extended to analyze its individual sites. A formula equivalent to Claude Shannon's channel capacity can be applied to molecular systems and used to compute the efficiency of protein binding. This efficiency is often 70% and a brief explanation for that is given. The results imply that biological systems have evolved to function at channel capacity, which means that we should be able to build molecular communications that are just as robust as our macroscopic ones.
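The "average information in the DNA binding sites" mentioned in this abstract can be sketched in a few lines of Python. The version below assumes a uniform background (2 bits maximum per DNA position), omits any small-sample correction, and uses made-up aligned sites; it is a simplified illustration rather than the reviewed method in full, and it does not compute the channel-capacity efficiency.

```python
import math
from collections import Counter


def information_per_position(sites):
    """Information (bits) at each column of aligned DNA sites.

    Simplifying assumptions: equiprobable background bases and no
    small-sample correction.
    """
    if not sites:
        raise ValueError("at least one site is required")
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("sites must be aligned and of equal length")

    bits = []
    for column in zip(*sites):
        n = len(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in Counter(column).values())
        bits.append(2.0 - entropy)  # uncertainty removed relative to 2 bits
    return bits


# Hypothetical aligned binding sites: three conserved columns, one random column.
toy_sites = ["ACGT", "ACGA", "ACGC", "ACGG"]
per_position = information_per_position(toy_sites)
total_information = sum(per_position)
print(per_position, total_information)
```

Summing the per-position values gives the total information of the motif; each per-site ("individual") information value would instead score one sequence against the same frequency table.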
Affiliation(s)
- Thomas D. Schneider
- National Institutes of Health, National Cancer Institute at Frederick, P.O. Box B, Frederick, MD 21702-1201, United States
|
24
|
Pan Y, Tsai CJ, Ma B, Nussinov R. Mechanisms of transcription factor selectivity. Trends Genet 2010; 26:75-83. [PMID: 20074831 PMCID: PMC7316385 DOI: 10.1016/j.tig.2009.12.003] [Citation(s) in RCA: 115] [Impact Index Per Article: 8.2] [Received: 11/04/2009] [Revised: 12/08/2009] [Accepted: 12/10/2009] [Indexed: 10/20/2022]
Abstract
The initiation of transcription is regulated by transcription factors (TFs) binding to DNA response elements (REs). How do TFs recognize specific binding sites among the many similar ones available in the genome? Recent research has illustrated that even a single nucleotide substitution can alter the selective binding of TFs to coregulators, that prior binding events can lead to selective DNA binding, and that selectivity is influenced by the availability of binding sites in the genome. Here, we combine structural insights with recent genomics screens to address the problem of TF-DNA interaction specificity. The emerging picture of selective binding site sequence recognition and TF activation involves three major factors: the cellular network, protein and DNA as dynamic conformational ensembles and the tight packing of multiple TFs and coregulators on stretches of regulatory DNA. The classification of TF recognition mechanisms based on these factors impacts our understanding of how transcription initiation is regulated.
Affiliation(s)
- Yongping Pan
- Basic Science Program, SAIC-Frederick, Inc., Center for Cancer Research Nanobiology Program, NCI-Frederick, Frederick, MD 21702, USA
|
25
|
Zhang J, Li E, Olsen GJ. Protein-coding gene promoters in Methanocaldococcus (Methanococcus) jannaschii. Nucleic Acids Res 2009; 37:3588-601. [PMID: 19359364 PMCID: PMC2699501 DOI: 10.1093/nar/gkp213] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/14/2022] Open
Abstract
Although Methanocaldococcus (Methanococcus) jannaschii was the first archaeon to have its genome sequenced, little is known about the promoters of its protein-coding genes. To expand our knowledge, we have experimentally identified 131 promoters for 107 protein-coding genes in this genome by mapping their transcription start sites. Compared to previously identified promoters, more than half of which are from genes for stable RNAs, the protein-coding gene promoters are qualitatively similar in overall sequence pattern, but statistically different at several positions due to greater variation among their sequences. Relative binding affinity for general transcription factors was measured for 12 of these promoters by competition electrophoretic mobility shift assays. These promoters bind the factors less tightly than do most tRNA gene promoters. When a position weight matrix (PWM) was constructed from the protein gene promoters, factor binding affinities correlated with corresponding promoter PWM scores. We show that the PWM based on our data more accurately predicts promoters in the genome and transcription start sites than could be done with the previously available data. We also introduce a PWM logo, which visually displays the implications of observing a given base at a position in a sequence.
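To make the position weight matrix (PWM) idea in this abstract concrete, the Python sketch below builds a log-odds PWM from a handful of aligned sequences and scans a longer sequence for the best-scoring window. The pseudocount, the uniform background and the example sequences are all assumptions chosen for illustration; they are not the promoters or parameters used in the study.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds PWM (bits) from aligned sites; parameters are illustrative."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("sites must be aligned and of equal length")
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(width):
        column = [site[i] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def score(pwm, window):
    """Sum of per-position log-odds for one window of the PWM's width."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    return max(
        (score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


# Hypothetical TATA-like sites and target sequence, for illustration only.
toy_sites = ["TTTATATA", "TTTAAATA", "TTTATATA", "TATATATA"]
toy_pwm = build_pwm(toy_sites)
hit_score, hit_start = best_hit(toy_pwm, "GCGCGCTTTATATAGCGC")
print(hit_score, hit_start)
```

In a setting like the one described, windows with higher PWM scores would be expected to correspond to stronger predicted binding, which is the relationship the abstract reports between PWM scores and measured affinities.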
Affiliation(s)
- Jian Zhang
- Department of Microbiology, University of Illinois at Urbana-Champaign, 601 South Goodwin Avenue, Urbana, IL 61801, USA
|