1
Guerrero-Preston R, Valle BL, Jedlicka A, Turaga N, Folawiyo O, Pirini F, Lawson F, Vergura A, Noordhuis M, Dziedzic A, Pérez G, Renehan M, Guerrero-Diaz C, De Jesus Rodríguez E, Diaz-Montes T, Rodríguez Orengo J, Méndez K, Romaguera J, Trock BJ, Florea L, Sidransky D. Molecular Triage of Premalignant Lesions in Liquid-Based Cervical Cytology and Circulating Cell-Free DNA from Urine, Using a Panel of Methylated Human Papilloma Virus and Host Genes. Cancer Prev Res (Phila) 2016; 9:915-924. [DOI: 10.1158/1940-6207.capr-16-0138] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 05/19/2016] [Revised: 08/23/2016] [Accepted: 09/07/2016] [Indexed: 11/16/2022]
2
Abstract
Gold ions are mobilized and disseminated through the environment and enter cells by non-specific uptake. To avoid deleterious effects, which occur even at very low concentrations, bacteria such as Salmonella enterica and Cupriavidus metallidurans use Au-specific MerR-type transcriptional regulators to detect these toxic ions and control the expression of specific resistance factors. In contrast to the related copper sensor CueR, the Au-selective metalloregulatory proteins are able to distinguish Au(I) from Cu(I) or Ag(I). This is achieved by finely tuning a single dithiolate metal coordination with conserved cysteine residues at the metal-binding site, lowering the affinity for Cu(I) relative to the Cu sensors while maintaining or even increasing the affinity for Au(I). In Salmonella, GolS not only favours the binding of Au(I) over Cu(I) or Ag(I), but also distinguishes its target recognition sites in the promoters it regulates, minimizing cross-activation of CueR-controlled operators. In this sense, a selective Au sensory device would allow species harbouring resident Cu-homeostasis systems to eliminate the toxic ion without affecting Cu acquisition in Au-rich environments.
3
Kucerova E, Clifton SW, Xia XQ, Long F, Porwollik S, Fulton L, Fronick C, Minx P, Kyung K, Warren W, Fulton R, Feng D, Wollam A, Shah N, Bhonagiri V, Nash WE, Hallsworth-Pepin K, Wilson RK, McClelland M, Forsythe SJ. Genome sequence of Cronobacter sakazakii BAA-894 and comparative genomic hybridization analysis with other Cronobacter species. PLoS One 2010; 5:e9556. [PMID: 20221447 PMCID: PMC2833190 DOI: 10.1371/journal.pone.0009556] [Citation(s) in RCA: 175] [Impact Index Per Article: 12.5] [Received: 12/06/2009] [Accepted: 02/14/2010] [Indexed: 12/03/2022]
Abstract
BACKGROUND The genus Cronobacter (formerly called Enterobacter sakazakii) is composed of five species: C. sakazakii, C. malonaticus, C. turicensis, C. muytjensii, and C. dublinensis. The genus includes opportunistic human pathogens, and the first three species have been associated with neonatal infections. The most severe diseases occur in neonates and include fatal necrotizing enterocolitis and meningitis. The genetic basis of the diversity within the genus is unknown, and few virulence traits have been identified. METHODOLOGY/PRINCIPAL FINDINGS We report here the first genome sequence of a member of this genus, C. sakazakii strain BAA-894. The genome of C. sakazakii strain BAA-894 comprises a 4.4 Mb chromosome (57% GC content) and two plasmids of 31 kb (51% GC) and 131 kb (56% GC). The genome was used to construct a 387,000-probe oligonucleotide tiling DNA microarray covering the whole genome. Comparative genomic hybridization (CGH) was undertaken on five other C. sakazakii strains and on representatives of the four other Cronobacter species. Among the 4,382 annotated genes inspected in this study, about 55% were common to all C. sakazakii strains and 43% were common to all Cronobacter strains, with 10-17% absence of genes. CONCLUSIONS/SIGNIFICANCE CGH highlighted 15 clusters of genes in C. sakazakii BAA-894 that were divergent or absent in more than half of the tested strains; six of these are of probable prophage origin. Putative virulence factors were identified in these prophages and in other variable regions. A number of genes unique to Cronobacter species associated with neonatal infections (C. sakazakii, C. malonaticus and C. turicensis) were identified. These included a copper and silver resistance system known to be linked to invasion of the blood-brain barrier by neonatal meningitic strains of Escherichia coli. In addition, genes encoding multidrug efflux pumps and adhesins were identified that were unique to C. sakazakii strains from outbreaks in neonatal intensive care units.
Affiliation(s)
- Eva Kucerova
- School of Science and Technology, Nottingham Trent University, Nottingham, United Kingdom
4
Chiapello H, Gendrault A, Caron C, Blum J, Petit MA, El Karoui M. MOSAIC: an online database dedicated to the comparative genomics of bacterial strains at the intra-species level. BMC Bioinformatics 2008; 9:498. [PMID: 19038022 PMCID: PMC2607288 DOI: 10.1186/1471-2105-9-498] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 09/11/2008] [Accepted: 11/27/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND The recent availability of complete sequences for numerous closely related bacterial genomes opens up new challenges in comparative genomics. Several methods have been developed to align complete genomes at the nucleotide level but their use and the biological interpretation of results are not straightforward. It is therefore necessary to develop new resources to access, analyze, and visualize genome comparisons. DESCRIPTION Here we present recent developments on MOSAIC, a generalist comparative bacterial genome database. This database provides the bacteriologist community with easy access to comparisons of complete bacterial genomes at the intra-species level. The strategy we developed for comparison allows us to define two types of regions in bacterial genomes: backbone segments (i.e., regions conserved in all compared strains) and variable segments (i.e., regions that are either specific to or variable in one of the aligned genomes). Definition of these segments at the nucleotide level allows precise comparative and evolutionary analyses of both coding and non-coding regions of bacterial genomes. Such work is easily performed using the MOSAIC Web interface, which allows browsing and graphical visualization of genome comparisons. CONCLUSION The MOSAIC database now includes 493 pairwise comparisons and 35 multiple maximal comparisons representing 78 bacterial species. Genome conserved regions (backbones) and variable segments are presented in various formats for further analysis. A graphical interface allows visualization of aligned genomes and functional annotations. The MOSAIC database is available online at http://genome.jouy.inra.fr/mosaic.
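The backbone/variable distinction can be illustrated with a minimal sketch. It assumes (hypothetically) that each strain's alignment to a chosen reference genome has already been reduced to a list of half-open (start, end) intervals on the reference. The function name `segment_backbone` and this input format are illustrative only. They are not part of MOSAIC, which works from nucleotide-level whole-genome alignments that are not reproduced here. Reference positions aligned in every compared strain are labelled backbone, and all other positions are labelled variable.

```python
def segment_backbone(ref_length, aligned_intervals):
    """Label runs of a reference genome as 'backbone' or 'variable'.

    ref_length: length of the (hypothetical) reference genome.
    aligned_intervals: dict mapping strain name -> list of half-open
        (start, end) intervals on the reference that align in that strain.
    Returns a list of (start, end, label) runs covering the reference.
    """
    n_strains = len(aligned_intervals)
    coverage = [0] * ref_length
    for intervals in aligned_intervals.values():
        # Mark each position at most once per strain, even if intervals overlap.
        covered = [False] * ref_length
        for start, end in intervals:
            for pos in range(max(start, 0), min(end, ref_length)):
                covered[pos] = True
        for pos, hit in enumerate(covered):
            if hit:
                coverage[pos] += 1

    runs = []
    for pos in range(ref_length):
        # Backbone = aligned in all compared strains; anything else is variable.
        label = "backbone" if coverage[pos] == n_strains else "variable"
        if runs and runs[-1][2] == label:
            runs[-1] = (runs[-1][0], pos + 1, label)  # extend the current run
        else:
            runs.append((pos, pos + 1, label))  # start a new run
    return runs


if __name__ == "__main__":
    strains = {"strainA": [(0, 6)], "strainB": [(0, 3), (4, 10)]}
    for run in segment_backbone(10, strains):
        print(run)
```

With the toy input above, the runs are (0, 3, 'backbone'), (3, 4, 'variable'), (4, 6, 'backbone') and (6, 10, 'variable'). Projecting every strain onto a single reference is a simplification. Sequence present only in a non-reference strain never appears in this output.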
Affiliation(s)
- Hélène Chiapello
- INRA UR1077, Unité Mathématique, Informatique & Génome, Domaine de Vilvert, 78352, Jouy-en-Josas, France.
5
Rodionov DA, Li X, Rodionova IA, Yang C, Sorci L, Dervyn E, Martynowski D, Zhang H, Gelfand MS, Osterman AL. Transcriptional regulation of NAD metabolism in bacteria: genomic reconstruction of NiaR (YrxA) regulon. Nucleic Acids Res 2008; 36:2032-46. [PMID: 18276644 PMCID: PMC2330245 DOI: 10.1093/nar/gkn046] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.8] [Indexed: 11/13/2022]
Abstract
A comparative genomic approach was used to reconstruct transcriptional regulation of NAD biosynthesis in bacteria containing orthologs of Bacillus subtilis gene yrxA, a previously identified niacin-responsive repressor of NAD de novo synthesis. Members of the YrxA family (renamed here NiaR) are broadly conserved in the Bacillus/Clostridium group and in the deeply branching Fusobacteria and Thermotogales lineages. We analyzed upstream regions of genes associated with NAD biosynthesis to identify candidate NiaR-binding DNA motifs and assess the NiaR regulon content in these species. Representatives of the two distinct types of candidate NiaR-binding sites, characteristic of the Firmicutes and Thermotogales, were verified by an electrophoretic mobility shift assay. In addition to transcriptional control of the nadABC genes, the NiaR regulon in some species extends to niacin salvage (the pncAB genes) and includes uncharacterized membrane proteins possibly involved in niacin transport. The involvement in niacin uptake proposed for one of these proteins (renamed NiaP), encoded by the B. subtilis gene yceI, was experimentally verified. In addition to bacteria, members of the NiaP family are conserved in multicellular eukaryotes, including humans, pointing to possible NiaP involvement in niacin utilization in these organisms. Overall, the analysis of the NiaR and NrtR regulons (described in the accompanying paper) revealed mechanisms of transcriptional regulation of NAD metabolism in nearly a hundred diverse bacteria.
6
Affiliation(s)
- Dmitry A Rodionov
- Burnham Institute for Medical Research, La Jolla, California 92037, USA.
7
Fremez R, Faraut T, Fichant G, Gouzy J, Quentin Y. Phylogenetic exploration of bacterial genomic rearrangements. Bioinformatics 2007; 23:1172-4. [PMID: 17332021 DOI: 10.1093/bioinformatics/btm070] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/15/2022]
Abstract
We present a graphical tool dedicated to the exploration of bacterial genome rearrangements. The exploration relies on the reconstruction of ancestral genomes at each internal node of a gene-order-based phylogenetic tree. The tool lets the user select an internal node and visualize the rearrangements between the inferred chromosome at that node and its direct descendant on the tree. AVAILABILITY PEGR is available at the Genopole Toulouse Bioinformatics platform.
Affiliation(s)
- Romain Fremez
- Université Paul Sabatier, CNRS-LMGM, UMR 5100, Toulouse Cedex, France
8
Thakur V, Azad RK, Ramaswamy R. Markov models of genome segmentation. Phys Rev E Stat Nonlin Soft Matter Phys 2007; 75:011915. [PMID: 17358192 DOI: 10.1103/physreve.75.011915] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 03/02/2006] [Revised: 06/19/2006] [Indexed: 05/14/2023]
Abstract
We introduce Markov models for the segmentation of symbolic sequences, extending a previously introduced segmentation procedure based on the Jensen-Shannon divergence. Higher-order Markov models are more sensitive to the details of local patterns; in genome analysis, this makes it possible to segment a sequence at biologically meaningful positions. We show the advantage of higher-order Markov-model-based segmentation procedures in detecting compositional inhomogeneity in chimeric DNA sequences constructed from the genomes of diverse species. Applied to the E. coli K12 genome, the procedure accurately identifies the boundaries of genomic islands, cryptic prophages, and horizontally acquired regions.
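The abstract extends a Jensen-Shannon divergence segmentation procedure to higher-order Markov models. As a hedged illustration, the sketch below implements only the zero-order (composition-only) baseline. For each candidate cut point i in a sequence S of length N, it computes D(i) = H(S) - (i/N)·H(left) - ((N-i)/N)·H(right), where H is the Shannon entropy of symbol frequencies. It then takes the maximizing cut and recurses on the two halves. The `threshold` and `min_len` stopping rule is an assumption that stands in for a proper significance test. The Markov extension, which conditions on preceding symbols, is not shown.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a symbol -> count mapping."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def best_split(seq):
    """Return (position, divergence) of the cut maximizing zero-order JS divergence."""
    n = len(seq)
    whole = Counter(seq)
    h_whole = entropy(whole)
    left = Counter()
    best_pos, best_d = None, float("-inf")
    for i in range(1, n):
        left[seq[i - 1]] += 1
        right = whole - left  # Counter subtraction drops symbols with zero count
        d = h_whole - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        if d > best_d:
            best_pos, best_d = i, d
    return best_pos, best_d


def segment(seq, threshold=0.05, min_len=10, offset=0):
    """Recursively segment seq and return sorted boundary positions.

    threshold and min_len are illustrative stopping rules, not a significance test.
    """
    if len(seq) < 2 * min_len:
        return []
    pos, d = best_split(seq)
    if pos is None or d < threshold:
        return []
    return (segment(seq[:pos], threshold, min_len, offset)
            + [offset + pos]
            + segment(seq[pos:], threshold, min_len, offset + pos))


if __name__ == "__main__":
    chimera = "A" * 40 + "C" * 40 + "G" * 40
    print(segment(chimera))  # boundaries of the artificial three-block chimera
```

On the toy chimera this prints [40, 80], the two junctions between blocks. Real genomic sequence would need a principled stopping criterion. Per the abstract, it would also benefit from higher-order models to detect finer local patterns.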
Affiliation(s)
- Vivek Thakur
- Center for Computational Biology and Bioinformatics, School of Information Technology, Jawaharlal Nehru University, New Delhi 110 067, India
9
Gibson DL, White AP, Snyder SD, Martin S, Heiss C, Azadi P, Surette M, Kay WW. Salmonella produces an O-antigen capsule regulated by AgfD and important for environmental persistence. J Bacteriol 2006; 188:7722-30. [PMID: 17079680 PMCID: PMC1636306 DOI: 10.1128/jb.00809-06] [Citation(s) in RCA: 135] [Impact Index Per Article: 7.5] [Received: 06/07/2006] [Accepted: 07/25/2006] [Indexed: 11/20/2022]
Abstract
In this study, we show that Salmonella produces an O-antigen capsule coregulated with the fimbria- and cellulose-associated extracellular matrix. Structural analysis of purified Salmonella extracellular polysaccharides yielded predominantly a repeating oligosaccharide unit similar to that of Salmonella enterica serovar Enteritidis lipopolysaccharide O antigen with some modifications. Putative carbohydrate transport and regulatory operons important for capsule assembly and translocation, designated yihU-yshA and yihVW, were identified by screening a random transposon library with immune serum generated to the capsule. The absence of capsule was confirmed by generating various isogenic Δyih mutants, where yihQ and yihO were shown to be important in capsule assembly and translocation. Luciferase-based expression studies showed that AgfD regulates the yih operons in coordination with extracellular matrix genes coding for thin aggregative fimbriae and cellulose. Although the capsule did not appear to be important for multicellular behavior, we demonstrate that it was important for survival during desiccation stress. Since the yih genes are conserved in salmonellae and the O-antigen capsule was important for environmental persistence, the formation of this surface structure may represent a conserved survival strategy.
Affiliation(s)
- D L Gibson
- Department of Biochemistry and Microbiology, University of Victoria, Victoria, V8W 3P6 British Columbia, Canada
10
Treangen TJ, Messeguer X. M-GCAT: interactively and efficiently constructing large-scale multiple genome comparison frameworks in closely related species. BMC Bioinformatics 2006; 7:433. [PMID: 17022809 PMCID: PMC1629028 DOI: 10.1186/1471-2105-7-433] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.1] [Received: 04/18/2006] [Accepted: 10/05/2006] [Indexed: 11/10/2022]
Abstract
BACKGROUND Due to recent advances in whole genome shotgun sequencing and assembly technologies, the financial cost of decoding an organism's DNA has been drastically reduced, resulting in a recent explosion of genomic sequencing projects. This increase in related genomic data will allow in-depth studies of evolution in closely related species through multiple whole genome comparisons. RESULTS To facilitate such comparisons, we present an interactive multiple genome comparison and alignment tool, M-GCAT, that can efficiently construct multiple genome comparison frameworks in closely related species. M-GCAT is able to compare and identify highly conserved regions in up to 20 closely related bacterial species in minutes on a standard computer, and in as many as 90 (containing 75 cloned genomes from a set of 15 published enterobacterial genomes) in an hour. M-GCAT also incorporates a novel comparative genomics data visualization interface allowing the user to examine the conserved regions and gene annotations both globally and locally. CONCLUSION M-GCAT is an interactive comparative genomics tool well suited for quickly generating multiple genome comparison frameworks and alignments among closely related species. M-GCAT is freely available for download for academic and non-commercial use at: http://alggen.lsi.upc.es/recerca/align/mgcat/intro-mgcat.html.
Affiliation(s)
- Todd J Treangen
- Dept. of Computer Science, Technical University of Catalonia, Barcelona, Spain
- Xavier Messeguer
- Dept. of Computer Science, Technical University of Catalonia, Barcelona, Spain
11
Giel JL, Rodionov D, Liu M, Blattner FR, Kiley PJ. IscR-dependent gene expression links iron-sulphur cluster assembly to the control of O2-regulated genes in Escherichia coli. Mol Microbiol 2006; 60:1058-75. [PMID: 16677314 DOI: 10.1111/j.1365-2958.2006.05160.x] [Citation(s) in RCA: 207] [Impact Index Per Article: 11.5] [Indexed: 01/10/2023]
Abstract
IscR is an iron-sulphur (Fe-S) cluster-containing transcription factor that represses transcription of the operon containing its own gene and the iscSUA-hscBA-fdx genes, whose products are involved in Fe-S cluster biogenesis. In this study, global transcriptional profiling of Escherichia coli IscR(+) and IscR(-) strains grown under aerobic and anaerobic conditions indicated that 40 genes in 20 predicted operons were regulated by IscR. DNase I footprinting and/or in vitro transcription reactions identified seven new promoters under direct IscR control. Among these were genes encoding known or proposed functions in Fe-S cluster biogenesis (sufABCDSE, yadR and yhgI) and Fe-S cluster-containing anaerobic respiratory enzymes (hyaABCDEF, hybOABCDEFG and napFDAGHBC). The finding that IscR repressed expression of the hyaA, hybO and napF promoters specifically under aerobic growth conditions suggests a new mechanism to explain their upregulation under anaerobic growth conditions. Phylogenetic footprinting of the DNase I protected regions of seven promoters implies that there are at least two different classes of IscR binding sites conserved among many bacteria. The findings presented here indicate a more general role of IscR in the regulation of Fe-S cluster biogenesis and that IscR contributes to the O2 regulation of several promoters controlling the expression of anaerobic Fe-S proteins.
Affiliation(s)
- Jennifer L Giel
- Microbiology Doctoral Training Program, Department of Biomolecular Chemistry, University of Wisconsin, Madison, WI 53706, USA
12
White AP, Gibson DL, Kim W, Kay WW, Surette MG. Thin aggregative fimbriae and cellulose enhance long-term survival and persistence of Salmonella. J Bacteriol 2006; 188:3219-27. [PMID: 16621814 PMCID: PMC1447457 DOI: 10.1128/jb.188.9.3219-3227.2006] [Citation(s) in RCA: 212] [Impact Index Per Article: 11.8] [Indexed: 11/20/2022]
Abstract
Salmonella spp. are environmentally persistent pathogens that have served as one of the important models for understanding how bacteria adapt to stressful conditions. However, it remains poorly understood how they survive extreme conditions encountered outside their hosts. Here we show that the rdar morphotype, a multicellular phenotype characterized by fimbria- and cellulose-mediated colony pattern formation, enhances the resistance of Salmonella to desiccation. When colonies were stored on plastic for several months in the absence of exogenous nutrients, survival of wild-type cells was increased compared to mutants deficient in fimbriae and/or cellulose production. Differences between strains were further highlighted upon exposure to sodium hypochlorite, as cellulose-deficient strains were 1,000-fold more susceptible. Measurements of gene expression using luciferase reporters indicated that production of thin aggregative fimbriae (Tafi) may initiate formation of colony surface patterns characteristic of the rdar morphotype. We hypothesize that Tafi play a role in the organization of different components of the extracellular matrix. Conservation of the rdar morphotype among pathogenic S. enterica isolates and the survival advantages that it provides collectively suggest that this phenotype could play a role in the transmission of Salmonella between hosts.
Affiliation(s)
- A P White
- Department of Microbiology and Infectious Diseases, University of Calgary, 3330 Hospital Dr. NW, Calgary, Alberta T2N 4N1, Canada
13
Pritchard L, White JA, Birch PRJ, Toth IK. GenomeDiagram: a python package for the visualization of large-scale genomic data. Bioinformatics 2005; 22:616-7. [PMID: 16377612 DOI: 10.1093/bioinformatics/btk021] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.2] [Indexed: 11/13/2022]
Abstract
We present GenomeDiagram, a flexible, open-source Python module for the visualization of large-scale genomic, comparative genomic and other data with reference to a single chromosome or other biological sequence. GenomeDiagram may be used to generate publication-quality vector graphics, rastered images and in-line streamed graphics for webpages. The package integrates with datatypes from the BioPython project, and is available for Windows, Linux and Mac OS X systems. AVAILABILITY GenomeDiagram is freely available as source code (under GNU Public License) at http://bioinf.scri.ac.uk/lp/programs.html, and requires Python 2.3 or higher, and recent versions of the ReportLab and BioPython packages. SUPPLEMENTARY INFORMATION A user manual, example code and images are available at http://bioinf.scri.ac.uk/lp/programs.html.
Affiliation(s)
- Leighton Pritchard
- Plant Pathogen Programme, Scottish Crop Research Institute, Invergowrie, Dundee DD2 5DA, Scotland, UK.
14
Merkl R. A comparative categorization of protein function encoded in bacterial or archeal genomic islands. J Mol Evol 2005; 62:1-14. [PMID: 16341468 DOI: 10.1007/s00239-004-0311-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Received: 11/01/2004] [Accepted: 06/14/2005] [Indexed: 01/11/2023]
Abstract
Genomes of prokaryotes harbor genomic islands (GIs), which are frequently acquired via horizontal gene transfer (HGT). Here I present an analysis of GIs with respect to gene-encoded functions. GIs were identified by statistical analysis of codon usage and clustering. Genes classified as putatively alien (pA) or putatively native (pN) were categorized according to the COG database. Among pA and pN genes, the distributions of COG functions and classes were studied for different groupings of prokaryotes. Groups were formed according to taxonomic relationship or habitat. In all groups, genes related to class L (replication, recombination, and repair) were statistically significantly overrepresented in GIs. GIs of bacteria and archaea showed a distinct pattern of preferences. In archaeal GIs, genes belonging to COG class M (cell wall/membrane/envelope biogenesis) or Q (secondary metabolites biosynthesis, transport, and catabolism) were more frequent. In bacterial GIs, genes of classes U (intracellular trafficking, secretion, and vesicular transport), N (cell motility), and V (defense mechanisms) were predominant. Underrepresentation was strongest for genes belonging to class J (translation, ribosomal structure, and biogenesis). Among single COG functions, transferases and transporters were overrepresented in GIs. In both superkingdoms, HGT enhances genomic content by meeting demands that are independent of the studied habitats. These findings are in agreement with the complexity theory, which predicts the preferential import of operational genes. However, only specific subsets of operational genes were enriched in GIs. Modification of the cell envelope, cell motility, secretion, and protection of cellular DNA are major issues in HGT.
Affiliation(s)
- Rainer Merkl
- Institut für Biophysik und physikalische Biochemie, Universität Regensburg, D-93040 Regensburg, Germany.
15
Gerasimova AV, Gelfand MS. Evolution of the NadR regulon in Enterobacteriaceae. J Bioinform Comput Biol 2005; 3:1007-19. [PMID: 16078372 DOI: 10.1142/s0219720005001387] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Received: 02/05/2005] [Revised: 02/18/2005] [Accepted: 02/24/2005] [Indexed: 12/23/2022]
Abstract
The NAD biosynthetic pathway and NAD transformations in E. coli and S. typhi are well characterized. Using comparative genomics methods, we describe the NadR regulon in other Enterobacteriaceae, identify new candidate regulon members, and demonstrate that even a very simple regulon covering an essential metabolic pathway can differ between closely related genomes.
Affiliation(s)
- Anna V Gerasimova
- Laboratory of Bioinformatics, State Scientific Center GOSNIIGenetika, 1-iy Dorozhny proezd 1, Moscow, 113545, Russia.
16
Srinivasan BS, Caberoy NB, Suen G, Taylor RG, Shah R, Tengra F, Goldman BS, Garza AG, Welch RD. Functional genome annotation through phylogenomic mapping. Nat Biotechnol 2005; 23:691-8. [PMID: 15940241 DOI: 10.1038/nbt1098] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Indexed: 11/08/2022]
Abstract
Accurate determination of functional interactions among proteins at the genome level remains a challenge for genomic research. Here we introduce a genome-scale approach to functional protein annotation, phylogenomic mapping, that requires only sequence data, can be applied equally well to both finished and unfinished genomes, and can be extended beyond single genomes to annotate multiple genomes simultaneously. We have developed and applied it to more than 200 sequenced bacterial genomes. Proteins with similar evolutionary histories were grouped together, placed on a three-dimensional map and visualized as a topographical landscape. The resulting phylogenomic maps display thousands of proteins clustered in mountains on the basis of coinheritance, a strong indicator of shared function. In addition to systematic computational validation, we have experimentally confirmed the ability of phylogenomic maps to predict both mutant phenotype and gene function in the delta proteobacterium Myxococcus xanthus.
Affiliation(s)
- Balaji S Srinivasan
- Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA, USA
17
Chiapello H, Bourgait I, Sourivong F, Heuclin G, Gendrault-Jacquemard A, Petit MA, El Karoui M. Systematic determination of the mosaic structure of bacterial genomes: species backbone versus strain-specific loops. BMC Bioinformatics 2005; 6:171. [PMID: 16011797 PMCID: PMC1187871 DOI: 10.1186/1471-2105-6-171] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.2] [Received: 04/05/2005] [Accepted: 07/12/2005] [Indexed: 11/23/2022]
Abstract
Background Public databases now contain a multitude of complete bacterial genomes, including several genomes of the same species. The available data offer new opportunities to address questions about bacterial genome evolution, a task that requires reliable fine-grained comparison data for closely related genomes. Recent analyses using pairwise whole genome alignments have shown that it is possible to segment bacterial genomes into a common conserved backbone and strain-specific sequences called loops. Results Here, we generalize this approach and propose a strategy that allows systematic and unbiased genome segmentation based on multiple genome alignments. Segmentation analyses, as applied to 13 different bacterial species, confirmed the feasibility of our approach to discern the 'mosaic' organization of bacterial genomes. Segmentation results are available through a Web interface permitting functional analysis, extraction and visualization of the backbone/loops structure of documented genomes. To illustrate the potential of this approach, we performed a precise analysis of the mosaic organization of three E. coli strains and functional characterization of the loops. Conclusion The segmentation results including the backbone/loops structure of 13 bacterial species genomes are new and available for use by the scientific community at the URL: .
Affiliation(s)
- H Chiapello
- Mathématique, Informatique & Génome, INRA Domaine de Vilvert, 78352 Jouy-en-Josas cedex, France
- I Bourgait
- Mathématique, Informatique & Génome, INRA Domaine de Vilvert, 78352 Jouy-en-Josas cedex, France
- F Sourivong
- Mathématique, Informatique & Génome, INRA Domaine de Vilvert, 78352 Jouy-en-Josas cedex, France
- G Heuclin
- Mathématique, Informatique & Génome, INRA Domaine de Vilvert, 78352 Jouy-en-Josas cedex, France
- A Gendrault-Jacquemard
- Mathématique, Informatique & Génome, INRA Domaine de Vilvert, 78352 Jouy-en-Josas cedex, France
- M-A Petit
- Unité de Recherches Laitières et Génétique Appliquée, INRA Domaine de Vilvert, 78352 Jouy-en-Josas cedex, France
- M El Karoui
- Unité de Recherches Laitières et Génétique Appliquée, INRA Domaine de Vilvert, 78352 Jouy-en-Josas cedex, France
18
Washietl S, Hofacker IL, Stadler PF. Fast and reliable prediction of noncoding RNAs. Proc Natl Acad Sci U S A 2005; 102:2454-9. [PMID: 15665081 PMCID: PMC548974 DOI: 10.1073/pnas.0409169102] [Citation(s) in RCA: 520] [Impact Index Per Article: 27.4] [Received: 11/02/2004] [Indexed: 01/22/2023]
Abstract
We report an efficient method for detecting functional RNAs. The approach, which combines comparative sequence analysis and structure prediction, has already yielded excellent results for a small number of aligned sequences and is suitable for large-scale genomic screens. It consists of two basic components: (i) a measure for RNA secondary structure conservation based on computing a consensus secondary structure, and (ii) a measure for thermodynamic stability, which, in the spirit of a z score, is normalized with respect to both sequence length and base composition but can be calculated without sampling from shuffled sequences. Functional RNA secondary structures can be identified in multiple sequence alignments with high sensitivity and high specificity. We demonstrate that this approach is not only much more accurate than previous methods but also significantly faster. The method is implemented in the program RNAz, which can be downloaded from www.tbi.univie.ac.at/~wash/RNAz. We screened all alignments of length n ≥ 50 in the Comparative Regulatory Genomics database, which compiles conserved noncoding elements in upstream regions of orthologous genes from human, mouse, rat, Fugu, and zebrafish. We recovered all of the known noncoding RNAs and cis-acting elements with high significance and found compelling evidence for many other conserved RNA secondary structures not described so far to our knowledge.
Affiliation(s)
- Stefan Washietl
- Department of Theoretical Chemistry and Structural Biology, University of Vienna, Währingerstrasse 17, A-1090 Wien, Austria
19
Washietl S, Hofacker IL. Consensus folding of aligned sequences as a new measure for the detection of functional RNAs by comparative genomics. J Mol Biol 2004; 342:19-30. [PMID: 15313604 DOI: 10.1016/j.jmb.2004.07.018] [Citation(s) in RCA: 119] [Impact Index Per Article: 6.0] [Received: 04/21/2004] [Revised: 07/05/2004] [Accepted: 07/09/2004] [Indexed: 01/22/2023]
Abstract
Given the ever-growing list of newly discovered classes of functional RNAs, further types of functional RNAs can be expected to remain hidden in recently completed genomes. The computational identification of such RNA genes is, therefore, of major importance. While most known functional RNAs have characteristic secondary structures, their free energies are generally not statistically significant enough to distinguish RNA genes from the genomic background. Additional information is required. Considering the wide availability of new genomic data of closely related species, comparative studies seem to be the most promising approach. Here, we show that prediction of consensus structures of aligned sequences can be a significant measure to detect functional RNAs. We report a new method to test multiple sequence alignments for the existence of an unusually structured and conserved fold. We show for alignments of six types of well-known functional RNA that an energy score consisting of free energy and a covariation term significantly improves sensitivity compared to single sequence predictions. We further test our method on a number of non-coding RNAs from Caenorhabditis elegans/Caenorhabditis briggsae and seven Saccharomyces species. Most RNAs can be detected with high significance. We provide a Perl implementation that can be used readily to score single alignments and discuss how the methods described here can be extended to allow for efficient genome-wide screens.
Affiliation(s)
- Stefan Washietl
- Institut für Theoretische Chemie und Molekulare Strukturbiologie, Universität Wien, Währingerstrasse 17, A-1090, Austria
20
Abstract
Differences in gene repertoire among bacterial genomes are usually ascribed to gene loss or to lateral gene transfer from unrelated cellular organisms. However, most bacteria contain large numbers of ORFans, that is, annotated genes that are restricted to a particular genome and that possess no known homologs. The uniqueness of ORFans within a genome has precluded the use of a comparative approach to examine their function and evolution. However, by identifying sequences unique to monophyletic groups at increasing phylogenetic depths, we can make direct comparisons of the characteristics of ORFans of different ages in the Escherichia coli genome, and establish their functional status and evolutionary rates. Relative to the genes ancestral to gamma-Proteobacteria and to those genes distributed sporadically in other prokaryotic species, ORFans in the E. coli lineage are short, A+T rich, and evolve quickly. Moreover, most encode functional proteins. Based on these features, ORFans are not attributable to errors in gene annotation, limitations of current databases, or to failure of methods for detecting homology. Rather, ORFans in the genomes of free-living microorganisms apparently derive from bacteriophage and occasionally become established by assuming roles in key cellular functions.
Affiliation(s)
- Vincent Daubin
- Department of Biochemistry & Molecular Biophysics, University of Arizona, Tucson, Arizona 85721, USA.