Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Erill I, O'Neill MC. A reexamination of information theory-based methods for DNA-binding site identification. BMC Bioinformatics 2009;10:57. [PMID: 19210776 PMCID: PMC2680408 DOI: 10.1186/1471-2105-10-57] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Received: 10/14/2008] [Accepted: 02/11/2009] [Indexed: 11/10/2022]

Cited by Other Article(s)

Kılıç S, Sánchez-Osuna M, Collado-Padilla A, Barbé J, Erill I. Flexible comparative genomics of prokaryotic transcriptional regulatory networks. BMC Genomics 2020;21:466. [PMID: 33327941 PMCID: PMC7739468 DOI: 10.1186/s12864-020-06838-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/07/2020] [Accepted: 06/16/2020] [Indexed: 11/25/2022]

Abstract

Background

Comparative genomics methods enable the reconstruction of bacterial regulatory networks using available experimental data. In spite of their potential for accelerating research into the composition and evolution of bacterial regulons, few comparative genomics suites have been developed for the automated analysis of these regulatory systems. Available solutions typically rely on precomputed databases for operon and ortholog predictions, limiting the scope of analyses to processed complete genomes, and several key issues such as the transfer of experimental information or the integration of regulatory information in a probabilistic setting remain largely unaddressed.

Results

Here we introduce CGB, a flexible platform for comparative genomics of prokaryotic regulons. CGB has few external dependencies and enables fully customized analyses of newly available genome data. The platform automates the merging of experimental information and uses a gene-centered, Bayesian framework to generate and integrate easily interpretable results. We demonstrate its flexibility and power by analyzing the evolution of type III secretion system regulation in pathogenic Proteobacteria and by characterizing the SOS regulon of a new bacterial phylum, the Balneolaeota.

Conclusions

Our results demonstrate the applicability of the CGB pipeline in multiple settings. CGB’s ability to automatically integrate experimental information from multiple sources and use complete and draft genomic data, coupled with its non-reliance on precomputed databases and its easily interpretable display of gene-centered posterior probabilities of regulation provide users with an unprecedented level of flexibility in launching comparative genomics analyses of prokaryotic transcriptional regulatory networks. The analyses of type III secretion and SOS response regulatory networks illustrate instances of convergent and divergent evolution of these regulatory systems, showcasing the power of formal ancestral state reconstruction at inferring the evolutionary history of regulatory networks.

Agüero-Chapin G, Galpert D, Molina-Ruiz R, Ancede-Gallardo E, Pérez-Machado G, De la Riva GA, Antunes A. Graph Theory-Based Sequence Descriptors as Remote Homology Predictors. Biomolecules 2019;10:E26. [PMID: 31878100 PMCID: PMC7022958 DOI: 10.3390/biom10010026] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/19/2019] [Revised: 12/16/2019] [Accepted: 12/18/2019] [Indexed: 12/23/2022]

Mrázek J, Karls AC. In silico simulations of occurrence of transcription factor binding sites in bacterial genomes. BMC Evol Biol 2019;19:67. [PMID: 30823869 PMCID: PMC6397444 DOI: 10.1186/s12862-019-1381-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/12/2018] [Accepted: 02/01/2019] [Indexed: 11/16/2022]

AL-barakati HJ, Saigo H, Newman RH, KC DB. RF-GlutarySite: a random forest based predictor for glutarylation sites. Mol Omics 2019;15:189-204. [DOI: 10.1039/c9mo00028c] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 12/12/2022]

Tong H, Schliekelman P, Mrázek J. Unsupervised statistical discovery of spaced motifs in prokaryotic genomes. BMC Genomics 2017;18:27. [PMID: 28056763 PMCID: PMC5217627 DOI: 10.1186/s12864-016-3400-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/02/2016] [Accepted: 12/09/2016] [Indexed: 12/23/2022]

Abstract

BACKGROUND

DNA sequences contain repetitive motifs which have various functions in the physiology of the organism. A number of methods have been developed for discovery of such sequence motifs, with a primary focus on detection of regulatory motifs and particularly transcription factor binding sites. Most motif-finding methods apply probabilistic models to detect motifs characterized by an unusually high number of copies of the motif in the analyzed sequences.

RESULTS

We present a novel method for detection of pairs of motifs separated by spacers of variable nucleotide sequence but conserved length. Unlike existing methods for motif discovery, the motifs themselves are not required to occur at unusually high frequency but only to exhibit a significant preference to occur at a specific distance from each other. In the present implementation of the method, motifs are represented by pentamers and all pairs of pentamers are evaluated for statistically significant preference for a specific distance. An important step of the algorithm eliminates motif pairs where the spacers separating the two motifs exhibit a high degree of sequence similarity; such motif pairs likely arise from duplications of the whole segment including the motifs and the spacer rather than due to selective constraints indicative of a functional importance of the motif pair. The method was used to scan 569 complete prokaryotic genomes for novel sequence motifs. Some motifs detected were previously known but other motifs found in the search appear to be novel. Selected motif pairs were subjected to further investigation and in some cases their possible biological functions were proposed.

CONCLUSIONS

We present a new motif-finding technique that is applicable to scanning complete genomes for sequence motifs. The results from analysis of 569 genomes suggest that the method detects previously known motifs that are expected to be found as well as new motifs that are unlikely to be discovered by traditional motif-finding methods. We conclude that our approach to detection of significant motif pairs can complement existing motif-finding techniques in discovery of novel functional sequence motifs in complete genomes.
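The core idea of the abstract above, that a pair of pentamers may prefer one specific spacer length, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `spacer_counts` and `preferred_spacer`, the `max_spacer` cutoff, and the use of a simple share of occurrences in place of a significance test are all assumptions made for illustration, and the spacer-similarity filter described in the abstract is left out.

```python
from collections import Counter

K = 5  # motif length; the abstract represents motifs as pentamers


def spacer_counts(seq, first, second, max_spacer=20):
    """Count how often `first` is followed by `second` at each spacer length.

    `max_spacer` is a hypothetical cutoff chosen for this sketch.
    """
    last_start = len(seq) - K + 1
    starts_first = [i for i in range(last_start) if seq[i:i + K] == first]
    starts_second = {i for i in range(last_start) if seq[i:i + K] == second}
    counts = Counter()
    for i in starts_first:
        for spacer in range(max_spacer + 1):
            # The second motif starts right after the first motif plus the spacer.
            if i + K + spacer in starts_second:
                counts[spacer] += 1
    return counts


def preferred_spacer(counts):
    """Return (spacer, count, share) for the most frequent spacer, or None."""
    total = sum(counts.values())
    if total == 0:
        return None
    spacer, count = counts.most_common(1)[0]
    return spacer, count, count / total
```

A real scan would replace the share with a statistical test against a background model and would repeat the count for all pentamer pairs.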

O'Neill PK, Erill I. Parametric bootstrapping for biological sequence motifs. BMC Bioinformatics 2016;17:406. [PMID: 27716039 PMCID: PMC5052923 DOI: 10.1186/s12859-016-1246-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/20/2016] [Accepted: 09/08/2016] [Indexed: 11/10/2022]

Abstract

Background

Biological sequence motifs drive the specific interactions of proteins and nucleic acids. Accordingly, the effective computational discovery and analysis of such motifs is a central theme in bioinformatics. Many practical questions about the properties of motifs can be recast as random sampling problems. In this light, the task is to determine for a given motif whether a certain feature of interest is statistically unusual among relevantly similar alternatives. Despite the generality of this framework, its use has been frustrated by the difficulties of defining an appropriate reference class of motifs for comparison and of sampling from it effectively.

Results

We define two distributions over the space of all motifs of given dimension. The first is the maximum entropy distribution subject to mean information content, and the second is the truncated uniform distribution over all motifs having information content within a given interval. We derive exact sampling algorithms for each. As a proof of concept, we employ these sampling methods to analyze a broad collection of prokaryotic and eukaryotic transcription factor binding site motifs. In addition to positional information content, we consider the informational Gini coefficient of the motif, a measure of the degree to which information is evenly distributed throughout a motif’s positions. We find that both prokaryotic and eukaryotic motifs tend to exhibit higher informational Gini coefficients (IGC) than would be expected by chance under either reference distribution. As a second application, we apply maximum entropy sampling to the motif p-value problem and use it to give elementary derivations of two new estimators.

Conclusions

Despite the historical centrality of biological sequence motif analysis, this study constitutes, to our knowledge, the first use of principled null hypotheses for sequence motifs given information content. Through their use, we are able to characterize for the first time differences in global motif statistics between biological motifs and their null distributions. In particular, we observe that biological sequence motifs show an unusual distribution of IGC, presumably due to biochemical constraints on the mechanisms of direct read-out.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-016-1246-8) contains supplementary material, which is available to authorized users.
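The two quantities named in the abstract above, positional information content and the informational Gini coefficient, can be made concrete with a small Python sketch. It rests on assumptions the abstract does not state: information content is computed against a uniform DNA background (2 bits minus the column entropy), and the Gini coefficient uses the standard mean-absolute-difference form applied to the per-position values. The function names are hypothetical, and the paper's exact definitions may differ.

```python
import math


def column_information(column_counts):
    """Information content (bits) of one motif column given A, C, G, T counts.

    Assumes a uniform background, so the maximum is log2(4) = 2 bits.
    """
    total = sum(column_counts)
    entropy = -sum(
        (c / total) * math.log2(c / total) for c in column_counts if c > 0
    )
    return 2.0 - entropy


def gini(values):
    """Standard Gini coefficient of non-negative values (0 = perfectly even)."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0
    # Sum of absolute differences over all ordered pairs.
    diff_sum = sum(abs(a - b) for a in values for b in values)
    return diff_sum / (2 * n * n * mean)


def informational_gini(motif_counts):
    """Gini coefficient of per-position information content for a count matrix."""
    return gini([column_information(col) for col in motif_counts])
```

Under these assumptions, a motif whose information is concentrated in a few positions yields a higher coefficient than one whose information is spread evenly.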

Peng FY, Hu Z, Yang RC. Bioinformatic prediction of transcription factor binding sites at promoter regions of genes for photoperiod and vernalization responses in model and temperate cereal plants. BMC Genomics 2016;17:573. [PMID: 27503086 PMCID: PMC4977670 DOI: 10.1186/s12864-016-2916-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 04/20/2016] [Accepted: 07/07/2016] [Indexed: 11/14/2022]

Abstract

Background

Many genes involved in responses to photoperiod and vernalization have been characterized or predicted in Arabidopsis (Arabidopsis thaliana), Brachypodium (Brachypodium distachyon), wheat (Triticum aestivum) and barley (Hordeum vulgare). However, little is known about the transcription regulation of these genes, especially in the large, complex genomes of wheat and barley.

Results

We identified 68, 60, 195 and 61 genes that are known or postulated to control pathways of photoperiod (PH), vernalization (VE) and pathway integration (PI) in Arabidopsis, Brachypodium, wheat and barley, respectively, and predicted transcription factor binding sites (TFBSs) in the promoters of these genes using the FIMO motif search tool of the MEME Suite. The initially predicted TFBSs were filtered, giving final counts of 1066, 1379, 1528, and 789 predicted TFBSs in Arabidopsis, Brachypodium, wheat and barley, respectively. These TFBSs were mapped onto the PH, VE and PI pathways to infer the regulation of gene expression in Arabidopsis and cereal species. The GC contents in promoters, untranslated regions (UTRs), coding sequences and introns were higher in the three cereal species than those in Arabidopsis. The predicted TFBSs were most abundant for two transcription factor (TF) families: MADS-box and CSD (cold shock domain). The analysis of publicly available gene expression data showed that genes with similar numbers of MADS-box and CSD TFBSs exhibited similar expression patterns across several different tissues and developmental stages. The intra-specific Tajima D-statistics of TFBS motif diversity showed different binding specificity among different TF families. The inter-specific Tajima D-statistics suggested faster divergence in TFBSs than in coding sequences and introns. Mapping TFBSs onto the PH, VE and PI pathways showed the predominance of MADS-box and CSD TFBSs in most genes of the four species, and differences in pathway regulation between Arabidopsis and the three cereal species.

Conclusion

Our approach to associating the key flowering genes with their potential TFs through prediction of putative TFBSs provides a framework to explore regulatory mechanisms of photoperiod and vernalization responses in flowering plants. The predicted TFBSs in the promoters of the flowering genes provide a basis for molecular characterization of transcription regulation in the large, complex genomes of important crop species, wheat and barley.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-016-2916-7) contains supplementary material, which is available to authorized users.

RF-Phos: A Novel General Phosphorylation Site Prediction Tool Based on Random Forest. Biomed Res Int 2016;2016:3281590. [PMID: 27066500 PMCID: PMC4811047 DOI: 10.1155/2016/3281590] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 10/02/2015] [Revised: 01/13/2016] [Accepted: 01/31/2016] [Indexed: 01/17/2023]

Every Site Counts: Submitting Transcription Factor-Binding Site Information through the CollecTF Portal. J Bacteriol 2015;197:2454-7. [PMID: 26013488 DOI: 10.1128/jb.00031-15] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/28/2023]

Identification and characterization of VpsR and VpsT binding sites in Vibrio cholerae. J Bacteriol 2015;197:1221-35. [PMID: 25622616 DOI: 10.1128/jb.02439-14] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Indexed: 11/20/2022]

Abstract

The ability to form biofilms is critical for environmental survival and transmission of Vibrio cholerae, a facultative human pathogen responsible for the disease cholera. Biofilm formation is controlled by several transcriptional regulators and alternative sigma factors. In this study, we report that the two main positive regulators of biofilm formation, VpsR and VpsT, bind to nonoverlapping target sequences in the regulatory region of vpsL in vitro. VpsR binds to a proximal site (the R1 box) as well as a distal site (the R2 box) with respect to the transcriptional start site identified upstream of vpsL. The VpsT binding site (the T box) is located between the R1 and R2 boxes. While mutations in the T and R boxes resulted in a decrease in vpsL expression, deletion of the T and R2 boxes resulted in an increase in vpsL expression. Analysis of the role of H-NS in vpsL expression revealed that deletion of hns resulted in enhanced vpsL expression. The level of vpsL expression was higher in an hns vpsT double mutant than in the parental strain but lower than that in an hns mutant. In silico analysis of the regulatory regions of the VpsR and VpsT targets resulted in the identification of conserved recognition motifs for VpsR and VpsT and revealed that operons involved in biofilm formation and vpsT are coregulated by VpsR and VpsT. Furthermore, a comparative genomics analysis revealed substantial variability in the promoter region of the vpsT and vpsL genes among extant V. cholerae isolates, suggesting that regulation of biofilm formation is under active selection.

IMPORTANCE

Vibrio cholerae causes cholera and is a natural inhabitant of aquatic environments. One critical factor that is important for environmental survival and transmission of V. cholerae is the microbe's ability to form biofilms, which are surface-associated communities encased in a matrix composed of the exopolysaccharide VPS (Vibrio polysaccharide), proteins, and nucleic acids. Two proteins, VpsR and VpsT, positively regulate VPS production and biofilm formation. We characterized the structural features of the promoter of the vpsL gene, determined the target sequences recognized by VpsT and VpsR, and analyzed their distribution and conservation patterns in multiple V. cholerae isolates. This work fills a fundamental gap in our understanding of the regulatory mechanisms employed by the master regulators VpsR and VpsT in controlling biofilm matrix production.

Johnson MD, Mueller M, Adamowicz-Brice M, Collins MJ, Gellert P, Maratou K, Srivastava PK, Rotival M, Butt S, Game L, Atanur SS, Silver N, Norsworthy PJ, Langley SR, Petretto E, Pravenec M, Aitman TJ. Genetic analysis of the cardiac methylome at single nucleotide resolution in a model of human cardiovascular disease. PLoS Genet 2014;10:e1004813. [PMID: 25474312 PMCID: PMC4256262 DOI: 10.1371/journal.pgen.1004813] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 12/05/2013] [Accepted: 10/09/2014] [Indexed: 12/03/2022]

Abstract

Epigenetic marks such as cytosine methylation are important determinants of cellular and whole-body phenotypes. However, the extent of, and reasons for inter-individual differences in cytosine methylation, and their association with phenotypic variation are poorly characterised. Here we present the first genome-wide study of cytosine methylation at single-nucleotide resolution in an animal model of human disease. We used whole-genome bisulfite sequencing in the spontaneously hypertensive rat (SHR), a model of cardiovascular disease, and the Brown Norway (BN) control strain, to define the genetic architecture of cytosine methylation in the mammalian heart and to test for association between methylation and pathophysiological phenotypes. Analysis of 10.6 million CpG dinucleotides identified 77,088 CpGs that were differentially methylated between the strains. In F1 hybrids we found 38,152 CpGs showing allele-specific methylation and 145 regions with parent-of-origin effects on methylation. Cis-linkage explained almost 60% of inter-strain variation in methylation at a subset of loci tested for linkage in a panel of recombinant inbred (RI) strains. Methylation analysis in isolated cardiomyocytes showed that in the majority of cases methylation differences in cardiomyocytes and non-cardiomyocytes were strain-dependent, confirming a strong genetic component for cytosine methylation. We observed preferential nucleotide usage associated with increased and decreased methylation that is remarkably conserved across species, suggesting a common mechanism for germline control of inter-individual variation in CpG methylation. In the RI strain panel, we found significant correlation of CpG methylation and levels of serum chromogranin B (CgB), a proposed biomarker of heart failure, which is evidence for a link between germline DNA sequence variation, CpG methylation differences and pathophysiological phenotypes in the SHR strain. 
Together, these results will stimulate further investigation of the molecular basis of locally regulated variation in CpG methylation and provide a starting point for understanding the relationship between the genetic control of CpG methylation and disease phenotypes.

Epigenetic marks provide information that is not encoded in the primary DNA sequence itself but in modifications of genomic DNA and of the associated proteins. Methylation of genomic DNA at cytosine residues is an important epigenetic modification that is associated with developmental processes, carcinogenesis and other diseases. Genome-wide extent of, and reasons for inter-individual differences in cytosine methylation, and their association with phenotypic variation are poorly characterised. To address these questions we have determined and compared the genome-wide methylation patterns in heart tissue of two inbred rat strains, the spontaneously hypertensive rat, an animal model of human disease and a control rat strain. Comparison of methylation differences between genetically identical animals from the same strain and differences between animals from different strains allowed us to quantify association of epigenetic and genetic differences. We show that differences in an individual's germline DNA sequence are important determinants of the variability in methylation between individuals. Comparison with previous reports implicates common mechanisms for regulation of cytosine methylation that are highly conserved across species. Finally, we find correlation between a proposed blood biomarker for heart failure and variation in DNA methylation, suggesting a link between germline DNA sequence variation, methylation and a disease-related phenotype.

Affiliation(s)

Michelle D. Johnson: Physiological Genomics and Medicine Group, MRC Clinical Sciences Centre, London, United Kingdom; National Heart and Lung Institute, Imperial College, London, United Kingdom
Michael Mueller: Physiological Genomics and Medicine Group, MRC Clinical Sciences Centre, London, United Kingdom; National Heart and Lung Institute, Imperial College, London, United Kingdom
Martyna Adamowicz-Brice: Physiological Genomics and Medicine Group, MRC Clinical Sciences Centre, London, United Kingdom; National Heart and Lung Institute, Imperial College, London, United Kingdom
Melissa J. Collins: Physiological Genomics and Medicine Group, MRC Clinical Sciences Centre, London, United Kingdom; National Heart and Lung Institute, Imperial College, London, United Kingdom
Pascal Gellert: Physiological Genomics and Medicine Group, MRC Clinical Sciences Centre, London, United Kingdom; Institute of Clinical Sciences, Imperial College, London, United Kingdom
Klio Maratou: Physiological Genomics and Medicine Group, MRC Clinical Sciences Centre, London, United Kingdom; Institute of Clinical Sciences, Imperial College, London, United Kingdom
Prashant K. Srivastava: Physiological Genomics and Medicine Group, MRC Clinical Sciences Centre, London, United Kingdom
Maxime Rotival: Integrative Genomics and Medicine Group, MRC Clinical Sciences Centre, London, United Kingdom
Shahena Butt: Integrative Genomics and Medicine Group, MRC Clinical Sciences Centre, London, United Kingdom
Laurence Game: Genomics Core Laboratory, MRC Clinical Sciences Centre, London, United Kingdom
Santosh S. Atanur: Physiological Genomics and Medicine Group, MRC Clinical Sciences Centre, London, United Kingdom; National Heart and Lung Institute, Imperial College, London, United Kingdom
Nicholas Silver: Physiological Genomics and Medicine Group, MRC Clinical Sciences Centre, London, United Kingdom; National Heart and Lung Institute, Imperial College, London, United Kingdom
Penny J. Norsworthy: Physiological Genomics and Medicine Group, MRC Clinical Sciences Centre, London, United Kingdom
Sarah R. Langley: Integrative Genomics and Medicine Group, MRC Clinical Sciences Centre, London, United Kingdom
Enrico Petretto: Integrative Genomics and Medicine Group, MRC Clinical Sciences Centre, London, United Kingdom
Michal Pravenec: Institute of Physiology, Academy of Sciences of the Czech Republic, Prague, Czech Republic; Institute of Biology and Medical Genetics, 1st Medical Faculty, Charles University, Prague, Czech Republic
Timothy J. Aitman: Physiological Genomics and Medicine Group, MRC Clinical Sciences Centre, London, United Kingdom; Institute of Clinical Sciences, Imperial College, London, United Kingdom

Vinga S. Information theory applications for biological sequence analysis. Brief Bioinform 2014;15:376-89. [PMID: 24058049 PMCID: PMC7109941 DOI: 10.1093/bib/bbt068] [Citation(s) in RCA: 67] [Impact Index Per Article: 6.7] [Received: 07/11/2013] [Accepted: 08/17/2013] [Indexed: 01/13/2023]

The LexA regulated genes of the Clostridium difficile. BMC Microbiol 2014;14:88. [PMID: 24713082 PMCID: PMC4234289 DOI: 10.1186/1471-2180-14-88] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 12/23/2013] [Accepted: 03/27/2014] [Indexed: 01/05/2023]

O'Neill PK, Forder R, Erill I. Informational requirements for transcriptional regulation. J Comput Biol 2014;21:373-84. [PMID: 24689750 DOI: 10.1089/cmb.2014.0032] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]

Hudson NJ, Porto-Neto LR, Kijas J, McWilliam S, Taft RJ, Reverter A. Information compression exploits patterns of genome composition to discriminate populations and highlight regions of evolutionary interest. BMC Bioinformatics 2014;15:66. [PMID: 24606587 PMCID: PMC4015654 DOI: 10.1186/1471-2105-15-66] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 12/13/2013] [Accepted: 02/26/2014] [Indexed: 11/20/2022]

Abstract

Background

Genomic information allows population relatedness to be inferred and selected genes to be identified. Single nucleotide polymorphism microarray (SNP-chip) data, a proxy for genome composition, contains patterns in allele order and proportion. These patterns can be quantified by compression efficiency (CE). In principle, the composition of an entire genome can be represented by a CE number quantifying allele representation and order.

Results

We applied a compression algorithm (DEFLATE) to genome-wide high-density SNP data from 4,155 human, 1,800 cattle, 1,222 sheep, 81 dog and 49 mouse samples. All human ethnic groups can be clustered by CE and the clusters recover phylogeography based on traditional fixation index (F_ST) analyses. CE analysis of other mammals results in segregation by breed or species, and is sensitive to admixture and past effective population size. This clustering is a consequence of individual patterns such as runs of homozygosity. Intriguingly, a related approach can also be used to identify genomic loci that show population-specific CE segregation. A high resolution CE ‘sliding window’ scan across the human genome, organised at the population level, revealed genes known to be under evolutionary pressure. These include SLC24A5 (European and Gujarati Indian skin pigmentation), HERC2 (European eye color), LCT (European and Maasai milk digestion) and EDAR (Asian hair thickness). We also identified a set of previously unidentified loci with high population-specific CE scores including the chromatin remodeler SCMH1 in Africans and EDA2R in Asians. Closer inspection reveals that these prioritised genomic regions do not correspond to simple runs of homozygosity but rather to compositionally complex regions that are shared by many individuals of a given population. Unlike F_ST, CE analyses do not require ab initio population comparisons and are amenable to the hemizygous X chromosome.

Conclusions

We conclude with a discussion of the implications of CE for a complex systems science view of genome evolution. CE allows one to clearly visualise the evolution of individual genomes and populations through a formal, mathematically-rigorous information space. Overall, CE makes a set of biological predictions, some of which are unique and await functional validation.

Cornish JP, Sanchez-Alberola N, O'Neill PK, O'Keefe R, Gheba J, Erill I. Characterization of the SOS meta-regulon in the human gut microbiome. Bioinformatics 2014;30:1193-7. [PMID: 24407225 PMCID: PMC3998124 DOI: 10.1093/bioinformatics/btt753] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 12/22/2022]

Frieden BR, Gatenby RA. Cell development obeys maximum Fisher information. Front Biosci (Elite Ed) 2013;5:1017-32. [PMID: 23747917 PMCID: PMC4711766 DOI: 10.2741/e681] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 01/04/2023]

Cornish JP, Matthews F, Thomas JR, Erill I. Inference of self-regulated transcriptional networks by comparative genomics. Evol Bioinform Online 2012;8:449-61. [PMID: 23032607 PMCID: PMC3422134 DOI: 10.4137/ebo.s9205] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/05/2022]

BioWord: a sequence manipulation suite for Microsoft Word. BMC Bioinformatics 2012;13:124. [PMID: 22676326 PMCID: PMC3546851 DOI: 10.1186/1471-2105-13-124] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 01/03/2012] [Accepted: 05/10/2012] [Indexed: 11/30/2022]

Sanchez-Alberola N, Campoy S, Barbé J, Erill I. Analysis of the SOS response of Vibrio and other bacteria with multiple chromosomes. BMC Genomics 2012;13:58. [PMID: 22305460 PMCID: PMC3323433 DOI: 10.1186/1471-2164-13-58] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 07/26/2011] [Accepted: 02/03/2012] [Indexed: 12/18/2022]

Abstract

Background

The SOS response is a well-known regulatory network present in most bacteria and aimed at addressing DNA damage. It has also been linked extensively to stress-induced mutagenesis, virulence and the emergence and dissemination of antibiotic resistance determinants. Recently, the SOS response has been shown to regulate the activity of integrases in the chromosomal superintegrons of the Vibrionaceae, which encompasses a wide range of pathogenic species harboring multiple chromosomes. Here we combine in silico and in vitro techniques to perform a comparative genomics analysis of the SOS regulon in the Vibrionaceae, and we extend the methodology to map this transcriptional network in other bacterial species harboring multiple chromosomes.

Results

Our analysis provides the first comprehensive description of the SOS response in a family (Vibrionaceae) that includes major human pathogens. It also identifies several previously unreported members of the SOS transcriptional network, including two proteins of unknown function. The analysis of the SOS response in other bacterial species with multiple chromosomes uncovers additional regulon members and reveals that there is a conserved core of SOS genes, and that specialized additions to this basic network take place in different phylogenetic groups. Our results also indicate that across all groups the main elements of the SOS response are always found in the large chromosome, whereas specialized additions are found in the smaller chromosomes and plasmids.

Conclusions

Our findings confirm that the SOS response of the Vibrionaceae is strongly linked with pathogenicity and dissemination of antibiotic resistance, and suggest that the characterization of the newly identified members of this regulon could provide key insights into the pathogenesis of Vibrio. The persistent location of key SOS genes in the large chromosome across several bacterial groups confirms that the SOS response plays an essential role in these organisms and sheds light into the mechanisms of evolution of global transcriptional networks involved in adaptability and rapid response to environmental changes, suggesting that small chromosomes may act as evolutionary test beds for the rewiring of transcriptional networks.

Cambray G, Sanchez-Alberola N, Campoy S, Guerin É, Da Re S, González-Zorn B, Ploy MC, Barbé J, Mazel D, Erill I. Prevalence of SOS-mediated control of integron integrase expression as an adaptive trait of chromosomal and mobile integrons. Mob DNA 2011;2:6. [PMID: 21529368 PMCID: PMC3108266 DOI: 10.1186/1759-8753-2-6] [Citation(s) in RCA: 96] [Impact Index Per Article: 7.4] [Received: 11/09/2010] [Accepted: 04/30/2011] [Indexed: 11/26/2022] Open

Abstract

Background

Integrons are found in hundreds of environmental bacterial species, but are mainly known as the agents responsible for the capture and spread of antibiotic-resistance determinants between Gram-negative pathogens. The SOS response is a regulatory network under control of the repressor protein LexA targeted at addressing DNA damage, thus promoting genetic variation in times of stress. We recently reported a direct link between the SOS response and the expression of integron integrases in Vibrio cholerae and a plasmid-borne class 1 mobile integron. SOS regulation enhances cassette swapping and capture in stressful conditions, while freezing the integron in steady environments. We conducted a systematic study of available integron integrase promoter sequences to analyze the extent of this relationship across the Bacteria domain.

Results

Our results showed that LexA controls the expression of a large fraction of integron integrases by binding to Escherichia coli-like LexA binding sites. In addition, the results provide experimental validation of LexA control of the integrase gene for another Vibrio chromosomal integron and for a multiresistance plasmid harboring two integrons. There was a significant correlation between lack of LexA control and predicted inactivation of integrase genes, even though experimental evidence also indicates that LexA regulation may be lost to enhance expression of integron cassettes.

Conclusions

Ancestral-state reconstruction on an integron integrase phylogeny led us to conclude that the ancestral integron was already regulated by LexA. The data also indicated that SOS regulation has been actively preserved in mobile integrons and large chromosomal integrons, suggesting that unregulated integrase activity is selected against. Nonetheless, additional adaptations have probably arisen to cope with unregulated integrase activity. Identifying them may be fundamental in deciphering the uneven distribution of integrons in the Bacteria domain.

Mian IS, Rose C. Communication theory and multicellular biology. Integr Biol (Camb) 2011;3:350-67. [PMID: 21424025 DOI: 10.1039/c0ib00117a] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Indexed: 12/27/2022]

Abstract

In this Perspective, we propose that communication theory--a field of mathematics concerned with the problems of signal transmission, reception and processing--provides a new quantitative lens for investigating multicellular biology, ancient and modern. What underpins the cohesive organisation and collective behaviour of multicellular ecosystems such as microbial colonies and communities (microbiomes) and multicellular organisms such as plants and animals, whether built of simple tissue layers (sponges) or of complex differentiated cells arranged in tissues and organs (members of the 35 or so phyla of the subkingdom Metazoa)? How do mammalian tissues and organs develop, maintain their architecture, become subverted in disease, and decline with age? How did single-celled organisms coalesce to produce many-celled forms that evolved and diversified into the varied multicellular organisms in existence today? Some answers can be found in the blueprints or recipes encoded in (epi)genomes, yet others lie in the generic physical properties of biological matter such as the ability of cell aggregates to attain a certain complexity in size, shape, and pattern. We suggest that Lasswell's maxim "Who says what to whom in what channel with what effect" provides a foundation for understanding not only the emergence and evolution of multicellularity, but also the assembly and sculpting of multicellular ecosystems and many-celled structures, whether of natural or human-engineered origin. We explore how the abstraction of communication theory as an organising principle for multicellular biology could be realised. We highlight the inherent ability of communication theory to be blind to molecular and/or genetic mechanisms. We describe selected applications that analyse the physics of communication and use energy efficiency as a central tenet. Whilst communication theory has and could contribute to understanding a myriad of problems in biology, investigations of multicellular biology could, in turn, lead to advances in communication theory, especially in the still immature field of network information theory.

Schneider TD. A brief review of molecular information theory. Nano Commun Netw 2010;1:173-80. [PMID: 22110566 PMCID: PMC3220916 DOI: 10.1016/j.nancom.2010.09.002] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Indexed: 05/31/2023]

Pan Y, Tsai CJ, Ma B, Nussinov R. Mechanisms of transcription factor selectivity. Trends Genet 2010;26:75-83. [PMID: 20074831 PMCID: PMC7316385 DOI: 10.1016/j.tig.2009.12.003] [Citation(s) in RCA: 115] [Impact Index Per Article: 8.2] [Received: 11/04/2009] [Revised: 12/08/2009] [Accepted: 12/10/2009] [Indexed: 10/20/2022]

Zhang J, Li E, Olsen GJ. Protein-coding gene promoters in Methanocaldococcus (Methanococcus) jannaschii. Nucleic Acids Res 2009;37:3588-601. [PMID: 19359364 PMCID: PMC2699501 DOI: 10.1093/nar/gkp213] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/14/2022] Open