2851
Abstract
Antibiotic resistance in Gram-negative bacteria is often due to the acquisition of resistance genes from a shared pool. In multiresistant isolates these genes, together with associated mobile elements, may be found in complex conglomerations on plasmids or on the chromosome. Analysis of available sequences reveals that these multiresistance regions (MRR) are modular, mosaic structures composed of different combinations of components from a limited set arranged in a limited number of ways. Components common to different MRR provide targets for homologous recombination, allowing these regions to evolve by combinatorial evolution, but our understanding of this process is far from complete. Advances in technology are leading to increasing amounts of sequence data, but currently available automated annotation methods usually focus on identifying ORFs and predicting protein function by homology. In MRR, where the genes are often well characterized, the challenge is to identify precisely which genes are present and to define the boundaries of complete and fragmented mobile elements. This review aims to summarize the types of mobile elements involved in multiresistance in Gram-negative bacteria and their associations with particular resistance genes, to describe common components of MRR and to illustrate methods for detailed analysis of these regions.
Affiliation(s)
- Sally R Partridge
- Centre for Infectious Diseases and Microbiology, The University of Sydney, Westmead Hospital, Sydney, NSW 2145, Australia.
2852
Paten B, Earl D, Nguyen N, Diekhans M, Zerbino D, Haussler D. Cactus: Algorithms for genome multiple sequence alignment. Genome Res 2011; 21:1512-28. [PMID: 21665927 DOI: 10.1101/gr.123356.111] [Citation(s) in RCA: 162] [Impact Index Per Article: 12.5] [Indexed: 11/24/2022]
Abstract
Much attention has been given to the problem of creating reliable multiple sequence alignments in a model incorporating substitutions, insertions, and deletions. Far less attention has been paid to the problem of optimizing alignments in the presence of more general rearrangement and copy number variation. Using Cactus graphs, recently introduced for representing sequence alignments, we describe two complementary algorithms for creating genomic alignments. We have implemented these algorithms in the new "Cactus" alignment program. We test Cactus using the Evolver genome evolution simulator, a comprehensive new tool for simulation, and show using these and existing simulations that Cactus significantly outperforms all of its peers. Finally, we make an empirical assessment of Cactus's ability to properly align genes and find interesting cases of intra-gene duplication within the primates.
Affiliation(s)
- Benedict Paten
- Center for Biomolecular Science and Engineering, University of California-Santa Cruz, CA 95064, USA.
2853
Mancheron A, Uricaru R, Rivals E. An alternative approach to multiple genome comparison. Nucleic Acids Res 2011; 39:e101. [PMID: 21646341 PMCID: PMC3159434 DOI: 10.1093/nar/gkr177] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/05/2022]
Abstract
Genome comparison is now a crucial step for genome annotation and identification of regulatory motifs. Genome comparison aims, for instance, at finding genomic regions either specific to or in one-to-one correspondence between individuals/strains/species. It serves e.g. to pre-annotate a new genome by automatically transferring annotations from a known one. However, efficiency, flexibility and objectives of current methods do not suit the whole spectrum of applications, genome sizes and organizations. Innovative approaches are still needed. Hence, we propose an alternative way of comparing multiple genomes based on segmentation by similarity. In this framework, rather than being formulated as a complex optimization problem, genome comparison is seen as a segmentation question for which a single optimal solution can be found in almost linear time. We apply our method to analyse three strains of a virulent pathogenic bacterium, Ehrlichia ruminantium, and identify 92 new genes. We also find that a substantial number of genes thought to be strain specific have potential orthologs in the other strains. Our solution is implemented in an efficient program, qod, equipped with a user-friendly interface, and enables the automatic transfer of annotations between compared genomes or contigs (Video in Supplementary Data). Because it somehow disregards the relative order of genomic blocks, qod can handle unfinished genomes, which due to the difficulty of sequencing completion may become an interesting characteristic for the future. Availability: http://www.atgc-montpellier.fr/qod.
Affiliation(s)
- Alban Mancheron
- LIRMM - CNRS, Université Montpellier 2 - CC 477, 161, rue Ada, 34095 Montpellier Cedex 5, France
2854
Exploring the common molecular basis for the universal DNA mutation bias: Revival of Löwdin mutation model. Biochem Biophys Res Commun 2011; 409:367-71. [PMID: 21586276 DOI: 10.1016/j.bbrc.2011.05.017] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 04/26/2011] [Accepted: 05/03/2011] [Indexed: 11/23/2022]
2855
Lai AG, Denton-Giles M, Mueller-Roeber B, Schippers JHM, Dijkwel PP. Positional information resolves structural variations and uncovers an evolutionarily divergent genetic locus in accessions of Arabidopsis thaliana. Genome Biol Evol 2011; 3:627-40. [PMID: 21622917 PMCID: PMC3157834 DOI: 10.1093/gbe/evr038] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 12/13/2022]
Abstract
Genome sequencing of closely related individuals has yielded valuable insights that link genome evolution to phenotypic variations. However, advancement in sequencing technology has also led to an escalation in the number of poor quality–drafted genomes assembled based on reference genomes that can have highly divergent or haplotypic regions. The self-fertilizing nature of Arabidopsis thaliana poses an advantage to sequencing projects because its genome is mostly homozygous. To determine the accuracy of an Arabidopsis drafted genome in less conserved regions, we performed a resequencing experiment on a ∼371-kb genomic interval in the Landsberg erecta (Ler-0) accession. We identified novel structural variations (SVs) between Ler-0 and the reference accession Col-0 using a long-range polymerase chain reaction approach to generate an Illumina data set that has positional information, that is, a data set with reads that map to a known location. Positional information is important for accurate genome assembly and the resolution of SVs particularly in highly duplicated or repetitive regions. Sixty-one regions with misassembly signatures were identified from the Ler-0 draft, suggesting the presence of novel SVs that are not represented in the draft sequence. Sixty of those were resolved by iterative mapping using our data set. Fifteen large indels (>100 bp) identified from this study were found to be located either within protein-coding regions or upstream regulatory regions, suggesting the formation of novel alleles or altered regulation of existing genes in Ler-0. We propose future genome-sequencing experiments to follow a clone-based approach that incorporates positional information to ultimately reveal haplotype-specific differences between accessions.
Affiliation(s)
- Alvina G Lai
- Institute of Molecular BioSciences, Massey University, Private Bag 11-222, Palmerston North 4442, New Zealand
2856
Joseph SJ, Didelot X, Gandhi K, Dean D, Read TD. Interplay of recombination and selection in the genomes of Chlamydia trachomatis. Biol Direct 2011; 6:28. [PMID: 21615910 PMCID: PMC3126793 DOI: 10.1186/1745-6150-6-28] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Received: 02/01/2011] [Accepted: 05/26/2011] [Indexed: 02/02/2023]
Abstract
Background Chlamydia trachomatis is an obligate intracellular bacterial parasite, which causes several severe and debilitating diseases in humans. This study uses comparative genomic analyses of 12 complete published C. trachomatis genomes to assess the contribution of recombination and selection in this pathogen and to understand the major evolutionary forces acting on the genome of this bacterium. Results The conserved core genes of C. trachomatis are a large proportion of the pan-genome: we identified 836 core genes in C. trachomatis out of a range of 874-927 total genes in each genome. The ratio of recombination events compared to mutation (ρ/θ) was 0.07 based on ancestral reconstructions using the ClonalFrame tool, but recombination had a significant effect on genetic diversification (r/m = 0.71). The distance-dependent decay of linkage disequilibrium also indicated that C. trachomatis populations behaved intermediately between sexual and clonal extremes. Fifty-five genes were identified as having a history of recombination and 92 were under positive selection based on statistical tests. Twenty-three genes showed evidence of being under both positive selection and recombination, which included genes with a known role in virulence and pathogenicity (e.g., ompA, pmps, tarp). Analysis of inter-clade recombination flux indicated non-uniform currents of recombination between clades, which suggests the possibility of spatial population structure in C. trachomatis infections. Conclusions C. trachomatis is the archetype of a bacterial species where recombination is relatively frequent yet gene gains by horizontal gene transfer (HGT) and losses (by deletion) are rare. Gene conversion occurs at sites across the whole C. trachomatis genome but may be more often fixed in genes that are under diversifying selection.
Furthermore, genome sequencing will reveal patterns of serotype specific gene exchange and selection that will generate important research questions for understanding C. trachomatis pathogenesis. Reviewers This article was reviewed by Dr. Jeremy Selengut, Dr. Lee S. Katz (nominated by Dr. I. King Jordan) and Dr. Arcady Mushegian.
Affiliation(s)
- Sandeep J Joseph
- Department of Medicine, Division of Infectious Diseases, Emory University School of Medicine, 615 Michael Street, Atlanta, GA 30322, USA
2857
Pignatelli M, Moya A. Evaluating the fidelity of de novo short read metagenomic assembly using simulated data. PLoS One 2011; 6:e19984. [PMID: 21625384 PMCID: PMC3100316 DOI: 10.1371/journal.pone.0019984] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.1] [Received: 12/14/2010] [Accepted: 04/22/2011] [Indexed: 02/07/2023]
Abstract
A frequent step in metagenomic data analysis comprises the assembly of the sequenced reads. Many assembly tools targeting data from next-generation sequencing (NGS) technologies have been published in recent years, but these assemblers have not been designed for or tested in the multi-genome scenarios that characterize metagenomic studies. Here we provide a critical assessment of current de novo short-read assembly tools in multi-genome scenarios using complex simulated metagenomic data. With this approach we tested the fidelity of different assemblers in metagenomic studies, demonstrating that even under the simplest compositions the number of chimeric contigs involving different species is noticeable. We further showed that the assembly process reduces the accuracy of the functional classification of the metagenomic data and that these errors can be overcome by raising the coverage of the studied metagenome. The results presented here highlight the particular difficulties that de novo genome assemblers face in multi-genome scenarios, demonstrating that these difficulties, which often compromise the functional classification of the analyzed data, can be overcome with a high sequencing effort.
Affiliation(s)
- Miguel Pignatelli
- Unitat Mixta d'Investigació en Genòmica i Salut, Centre Superior d'Investigació en Salut Pública/UVEG-Institut Cavanilles, Valencia, Spain.
2858
Dikow RB. Genome-level homology and phylogeny of Shewanella (Gammaproteobacteria: Alteromonadales: Shewanellaceae). BMC Genomics 2011; 12:237. [PMID: 21569439 PMCID: PMC3107185 DOI: 10.1186/1471-2164-12-237] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Received: 08/19/2010] [Accepted: 05/12/2011] [Indexed: 11/30/2022]
Abstract
BACKGROUND The explosion in availability of whole genome data provides the opportunity to build phylogenetic hypotheses based on these data as well as the ability to learn more about the genomes themselves. The biological history of genes and genomes can be investigated based on the taxonomic history provided by the phylogeny. A phylogenetic hypothesis based on complete genome data is presented for the genus Shewanella (Gammaproteobacteria: Alteromonadales: Shewanellaceae). Nineteen taxa from Shewanella (16 species and 3 additional strains of one species) as well as three outgroup species representing the genera Aeromonas (Gammaproteobacteria: Aeromonadales: Aeromonadaceae), Alteromonas (Gammaproteobacteria: Alteromonadales: Alteromonadaceae) and Colwellia (Gammaproteobacteria: Alteromonadales: Colwelliaceae) are included for a total of 22 taxa. RESULTS Putatively homologous regions were found across unannotated genomes and tested with a phylogenetic analysis. Two genome-wide data-sets are considered, one including only those genomic regions for which all taxa are represented, which included 3,361,015 aligned nucleotide base-pairs (bp) and a second that additionally includes those regions present in only subsets of taxa, which totaled 12,456,624 aligned bp. Alignment columns in these large data-sets were then randomly sampled to create smaller data-sets. After the phylogenetic hypothesis was generated, genome annotations were projected onto the DNA sequence alignment to compare the historical hypothesis generated by the phylogeny with the functional hypothesis posited by annotation. CONCLUSIONS Individual phylogenetic analyses of the 243 locally co-linear genome regions all failed to recover the genome topology, but the smaller data-sets that were random samplings of the large concatenated alignments all produced the genome topology.
It is shown that there is not a single orthologous copy of 16S rRNA across the taxon sampling included in this study and that the relationships among the multiple copies are consistent with 16S rRNA undergoing concerted evolution. Unannotated whole genome data can provide excellent raw material for generating hypotheses of historical homology, which can be tested with phylogenetic analysis and compared with hypotheses of gene function.
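The random sampling of alignment columns mentioned in this abstract can be illustrated with a minimal Python sketch. This is not the author's pipeline: the function name `sample_columns`, the toy alignment, and the choice to sample without replacement using a fixed seed are all assumptions made for illustration.

```python
import random


def sample_columns(alignment, n_columns, seed=0):
    """Build a smaller alignment from n_columns randomly chosen columns.

    `alignment` maps a taxon name to its aligned sequence (hypothetical format).
    """
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    rng = random.Random(seed)
    # Assumption: sample without replacement, then sort so that the
    # chosen columns keep their original left-to-right order.
    columns = sorted(rng.sample(range(length), n_columns))
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


# Toy alignment with three taxa and ten columns (illustrative data only).
toy = {
    "taxonA": "ACGTACGTAC",
    "taxonB": "ACGTTCGTAC",
    "taxonC": "ACG-ACGTAA",
}
subset = sample_columns(toy, 4)
```

Every taxon receives the same column indices, so the reduced data set is still a valid alignment that could be passed to a phylogenetic analysis.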
Affiliation(s)
- Rebecca B Dikow
- Committee on Evolutionary Biology, The University of Chicago, Chicago, IL, USA.
2859
Hadjifrangiskou M, Kostakioti M, Chen SL, Henderson JP, Greene SE, Hultgren SJ. A central metabolic circuit controlled by QseC in pathogenic Escherichia coli. Mol Microbiol 2011; 80:1516-29. [PMID: 21542868 DOI: 10.1111/j.1365-2958.2011.07660.x] [Citation(s) in RCA: 98] [Impact Index Per Article: 7.5] [Indexed: 12/25/2022]
Abstract
The QseC sensor kinase regulates virulence in multiple Gram-negative pathogens, by controlling the activity of the QseB response regulator. We have previously shown that qseC deletion interferes with dephosphorylation of QseB thus unleashing what appears to be an uncontrolled positive feedback loop stimulating increased QseB levels. Deletion of QseC downregulates virulence gene expression and attenuates enterohaemorrhagic and uropathogenic Escherichia coli (EHEC and UPEC), Salmonella typhimurium, and Francisella tularensis. Given that these pathogens employ different infection strategies and virulence factors, we used genome-wide approaches to better understand the role of the QseBC interplay in pathogenesis. We found that deletion of qseC results in misregulation of nucleotide, amino acid, and carbon metabolism. Comparable metabolic changes are seen in EHEC ΔqseC, suggesting that deletion of qseC confers similar pleiotropic effects in these two different pathogens. Disruption of representative metabolic enzymes phenocopied UPEC ΔqseC in vivo and resulted in virulence factor downregulation. We thus propose that in the absence of QseC, the constitutively active QseB leads to pleiotropic effects, impairing bacterial metabolism, and thereby attenuating virulence. These findings provide a basis for the development of antimicrobials targeting the phosphatase activity of QseC, as a means to attenuate a wide range of QseC-bearing pathogens.
Affiliation(s)
- Maria Hadjifrangiskou
- Department of Molecular Microbiology and Microbial Pathogenesis, Washington University in Saint Louis School of Medicine, 660 S Euclid, St Louis, MO 63110-1010, USA
2860
Complete genome sequence of the commensal Enterococcus faecalis 62, isolated from a healthy Norwegian infant. J Bacteriol 2011; 193:2377-8. [PMID: 21398545 DOI: 10.1128/jb.00183-11] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Indexed: 11/20/2022]
Abstract
The genome of Enterococcus faecalis 62, a commensal isolate from a healthy Norwegian infant, revealed multiple adaptive traits to the gastrointestinal tract (GIT) environment and the milk-containing diet of breast-fed infants. Adaptation to a commensal existence was emphasized by lactose and other carbohydrate metabolism genes within genomic islands, accompanied by the absence of virulence traits.
2861
Substoichiometrically different mitotypes coexist in mitochondrial genomes of Brassica napus L. PLoS One 2011; 6:e17662. [PMID: 21423700 PMCID: PMC3053379 DOI: 10.1371/journal.pone.0017662] [Citation(s) in RCA: 86] [Impact Index Per Article: 6.6] [Received: 11/01/2010] [Accepted: 02/07/2011] [Indexed: 12/20/2022]
Abstract
Cytoplasmic male sterility (CMS) has been identified in numerous plant species. Brassica napus CMS plants, such as Polima (pol), MI, and Shaan 2A, have been identified independently by different researchers with different materials in conventional breeding processes. How this kind of CMS emerges is unclear. Here, we report the mitochondrial genome sequence of the prevalent mitotype in the most widely used pol-CMS line, which has a length of 223,412 bp and encodes 34 proteins, 3 ribosomal RNAs, and 18 tRNAs, including two near identical copies of trnH. Of these 55 genes, 48 were found to be identical to their equivalents in the “nap” cytoplasm. The nap mitotype carries only one copy of trnH, and the sequences of five of the six remaining genes are highly similar to their equivalents in the pol mitotype. Forty-four open reading frames (ORFs) with unknown function were detected, including two unique to the pol mitotype (orf122 and orf132). At least five rearrangement events are required to account for the structural differences between the pol and nap sequences. The CMS-related orf224 neighboring region (∼5 kb) rearranged twice. PCR profiling based on mitotype-specific primer pairs showed that both mitotypes are present in B. napus cultivars. Quantitative PCR showed that the pol cytoplasm consists mainly of the pol mitotype, and the nap mitotype is the main genome of nap cytoplasm. Large variation in the copy number ratio of mitotypes was found, even among cultivars sharing the same cytoplasm. The coexistence of mitochondrial mitotypes and substoichiometric shifting can explain the emergence of CMS in B. napus.
2862
Abstract
Neisseria meningitidis is an obligate human pathogen. While it is a frequent commensal of the upper respiratory tract, in some individuals the bacterium spreads to the bloodstream, causing meningitis and/or sepsis, which are serious conditions with high morbidity and mortality. Here we report the availability of the genome sequence of the widely used serogroup B laboratory strain H44/76.
2863
Evidence of a dominant lineage of Vibrio cholerae-specific lytic bacteriophages shed by cholera patients over a 10-year period in Dhaka, Bangladesh. mBio 2011; 2:e00334-10. [PMID: 21304168 PMCID: PMC3037004 DOI: 10.1128/mbio.00334-10] [Citation(s) in RCA: 75] [Impact Index Per Article: 5.8] [Indexed: 12/21/2022]
Abstract
Lytic bacteriophages are hypothesized to contribute to the seasonality and duration of cholera epidemics in Bangladesh. However, the bacteriophages contributing to this phenomenon have yet to be characterized at a molecular genetic level. In this study, we isolated and sequenced the genomes of 15 bacteriophages from stool samples from cholera patients spanning a 10-year surveillance period in Dhaka, Bangladesh. Our results indicate that a single novel bacteriophage type, designated ICP1 (for the International Centre for Diarrhoeal Disease Research, Bangladesh cholera phage 1) is present in all stool samples from cholera patients, while two other bacteriophage types, one novel (ICP2) and one T7-like (ICP3), are transient. ICP1 is a member of the Myoviridae family and has a 126-kilobase genome comprising 230 open reading frames. Comparative sequence analysis of ICP1 and related isolates from this time period indicates a high level of genetic conservation. The ubiquitous presence of ICP1 in cholera patients and the finding that the O1 antigen of lipopolysaccharide (LPS) serves as the ICP1 receptor suggest that ICP1 is extremely well adapted to predation of human-pathogenic V. cholerae O1.
2864
Munro SA, Zinder SH, Walker LP. ORIGINAL RESEARCH: Comparative constraint-based model development for thermophilic hydrogen production. Ind Biotechnol (New Rochelle N Y) 2011. [DOI: 10.1089/ind.2011.7.063] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/26/2022]
Affiliation(s)
- Sarah A. Munro
- Department of Biological and Environmental Engineering, Cornell University, Ithaca, New York USA 14853
- Stephen H. Zinder
- Department of Microbiology, Cornell University, Ithaca, New York USA 14853
- Larry P. Walker
- Department of Biological and Environmental Engineering, Cornell University, Ithaca, New York USA 14853
2865
Sahl JW, Steinsland H, Redman JC, Angiuoli SV, Nataro JP, Sommerfelt H, Rasko DA. A comparative genomic analysis of diverse clonal types of enterotoxigenic Escherichia coli reveals pathovar-specific conservation. Infect Immun 2011; 79:950-60. [PMID: 21078854 PMCID: PMC3028850 DOI: 10.1128/iai.00932-10] [Citation(s) in RCA: 100] [Impact Index Per Article: 7.7] [Received: 08/25/2010] [Revised: 10/06/2010] [Accepted: 11/01/2010] [Indexed: 11/20/2022]
Abstract
Enterotoxigenic Escherichia coli (ETEC) is a major cause of diarrheal illness in children less than 5 years of age in low- and middle-income nations, whereas it is an emerging enteric pathogen in industrialized nations. Despite being an important cause of diarrhea, little is known about the genomic composition of ETEC. To address this, we sequenced the genomes of five ETEC isolates obtained from children in Guinea-Bissau with diarrhea. These five isolates represent distinct and globally dominant ETEC clonal groups. Comparative genomic analyses utilizing a gene-independent whole-genome alignment method demonstrated that sequenced ETEC strains share approximately 2.7 million bases of genomic sequence. Phylogenetic analysis of this "core genome" confirmed the diverse history of the ETEC pathovar and provides a finer resolution of the E. coli relationships than multilocus sequence typing. No identified genomic regions were conserved exclusively in all ETEC genomes; however, we identified more genomic content conserved among ETEC genomes than among non-ETEC E. coli genomes, suggesting that ETEC isolates share a genomic core. Comparisons of known virulence and of surface-exposed and colonization factor genes across all sequenced ETEC genomes not only identified variability but also indicated that some antigens are restricted to the ETEC pathovar. Overall, the generation of these five genome sequences, in addition to the two previously generated ETEC genomes, highlights the genomic diversity of ETEC. These studies increase our understanding of ETEC evolution, as well as provide insight into virulence factors and conserved proteins, which may be targets for vaccine development.
Affiliation(s)
- Jason W. Sahl, Hans Steinsland, Julia C. Redman, Samuel V. Angiuoli, James P. Nataro, Halvor Sommerfelt, David A. Rasko
- Institute for Genome Sciences, Department of Pediatrics, Center for Vaccine Development, Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, Maryland, Centre for International Health, Department of Biomedicine, University of Bergen, Bergen, Norway, Division of Infectious Disease Control, Norwegian Institute of Public Health, Oslo, Norway
2866
Herbig A, Nieselt K. nocoRNAc: characterization of non-coding RNAs in prokaryotes. BMC Bioinformatics 2011; 12:40. [PMID: 21281482 PMCID: PMC3230914 DOI: 10.1186/1471-2105-12-40] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Received: 08/17/2010] [Accepted: 01/31/2011] [Indexed: 11/10/2022]
Abstract
Background The interest in non-coding RNAs (ncRNAs) constantly rose during the past few years because of the wide spectrum of biological processes in which they are involved. This led to the discovery of numerous ncRNA genes across many species. However, for most organisms the non-coding transcriptome still remains unexplored to a great extent. Various experimental techniques for the identification of ncRNA transcripts are available, but as these methods are costly and time-consuming, there is a need for computational methods that allow the detection of functional RNAs in complete genomes in order to suggest elements for further experiments. Several programs for the genome-wide prediction of functional RNAs have been developed but most of them predict a genomic locus with no indication whether the element is transcribed or not. Results We present NOCORNAc, a program for the genome-wide prediction of ncRNA transcripts in bacteria. NOCORNAc incorporates various procedures for the detection of transcriptional features which are then integrated with functional ncRNA loci to determine the transcript coordinates. We applied RNAz and NOCORNAc to the genome of Streptomyces coelicolor and detected more than 800 putative ncRNA transcripts most of them located antisense to protein-coding regions. Using a custom design microarray we profiled the expression of about 400 of these elements and found more than 300 to be transcribed, 38 of them are predicted novel ncRNA genes in intergenic regions. The expression patterns of many ncRNAs are similarly complex as those of the protein-coding genes, in particular many antisense ncRNAs show a high expression correlation with their protein-coding partner. Conclusions We have developed NOCORNAc, a framework that facilitates the automated characterization of functional ncRNAs. NOCORNAc increases the confidence of predicted ncRNA loci, especially if they contain transcribed ncRNAs. 
NOCORNAc is not restricted to intergenic regions, but it is applicable to the prediction of ncRNA transcripts in whole microbial genomes. The software as well as a user guide and example data is available at http://www.zbit.uni-tuebingen.de/pas/nocornac.htm.
Affiliation(s)
- Alexander Herbig
- Center for Bioinformatics Tübingen, University of Tübingen, Sand 14, 72076 Tübingen, Germany
2867
Abstract
Summary: Easyfig is a Python application for creating linear comparison figures of multiple genomic loci with an easy-to-use graphical user interface. BLAST comparisons between multiple genomic regions, ranging from single genes to whole prokaryote chromosomes, can be generated, visualized and interactively coloured, enabling a rapid transition between analysis and the preparation of publication quality figures. Availability: Easyfig is freely available (under a GPL license) for download (for Mac OS X, Unix and Microsoft Windows) from the SourceForge web site: http://easyfig.sourceforge.net/. Contact: s.beatson@uq.edu.au
Affiliation(s)
- Mitchell J Sullivan
- Australian Infectious Diseases Research Centre, School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD 4072, Australia
|
2868
|
Davie JJ, Earl J, de Vries SPW, Ahmed A, Hu FZ, Bootsma HJ, Stol K, Hermans PWM, Wadowsky RM, Ehrlich GD, Hays JP, Campagnari AA. Comparative analysis and supragenome modeling of twelve Moraxella catarrhalis clinical isolates. BMC Genomics 2011; 12:70. [PMID: 21269504 PMCID: PMC3045334 DOI: 10.1186/1471-2164-12-70] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.7] [Received: 11/04/2010] [Accepted: 01/26/2011] [Indexed: 12/18/2022] Open
Abstract
Background: M. catarrhalis is a Gram-negative gamma-proteobacterium and an opportunistic human pathogen associated with otitis media (OM) and exacerbations of chronic obstructive pulmonary disease (COPD). With direct and indirect costs for treating these conditions exceeding $33 billion annually in the United States alone, and nearly ubiquitous resistance to beta-lactam antibiotics among M. catarrhalis clinical isolates, a greater understanding of this pathogen's genome and its variability among isolates is needed. Results: The genomic sequences of ten geographically and phenotypically diverse clinical isolates of M. catarrhalis were determined and analyzed together with two publicly available genomes. These twelve genomes were subjected to detailed comparative and predictive analyses aimed at characterizing the supragenome and understanding the metabolic and pathogenic potential of this species. A total of 2383 gene clusters were identified, of which 1755 are core, with the remaining 628 clusters unevenly distributed among the twelve isolates. These findings are consistent with the distributed genome hypothesis (DGH), which posits that the species genome possesses a far greater number of genes than any single isolate. Multiple and pair-wise whole-genome alignments highlight limited chromosomal rearrangement. Conclusions: M. catarrhalis gene content and chromosomal organization data, although supportive of the DGH, show modest overall genic diversity. These findings are in stark contrast with the reported heterogeneity of the species as a whole, as well as with other bacterial pathogens mediating OM and COPD, providing important insight into M. catarrhalis pathogenesis that will aid in the development of novel therapeutic regimens.
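The core/distributed split reported above (1755 core + 628 distributed = 2383 clusters) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `partition_clusters`, the cluster and isolate labels, and the toy presence data are all hypothetical, and a cluster is counted as core here only if it occurs in every isolate.

```python
from typing import Dict, Set, Tuple


def partition_clusters(
    presence: Dict[str, Set[str]], n_isolates: int
) -> Tuple[Set[str], Set[str]]:
    """Split gene clusters into core (found in every isolate) and
    distributed (found in only some isolates).

    `presence` maps a cluster name to the set of isolates carrying it.
    """
    core = {c for c, isolates in presence.items() if len(isolates) == n_isolates}
    distributed = set(presence) - core
    return core, distributed


# Hypothetical toy data: four clusters across three isolates.
presence = {
    "clusterA": {"iso1", "iso2", "iso3"},
    "clusterB": {"iso1", "iso3"},
    "clusterC": {"iso2"},
    "clusterD": {"iso1", "iso2", "iso3"},
}

core, distributed = partition_clusters(presence, n_isolates=3)
print(sorted(core))         # ['clusterA', 'clusterD']
print(sorted(distributed))  # ['clusterB', 'clusterC']
```

Published supragenome analyses may use a looser threshold for "core" (for example, presence in nearly all isolates); the strict all-isolates rule is an assumption made here for simplicity.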
Affiliation(s)
- Jeremiah J Davie
- Department of Microbiology and Immunology, University at Buffalo, Buffalo, New York, USA
|
2869
|
Indexing Finite Language Representation of Population Genotypes. Lecture Notes in Computer Science 2011. [DOI: 10.1007/978-3-642-23038-7_23] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 12/29/2022]
|
2870
|
Abstract
Motivation: The relative ease and low cost of current-generation sequencing technologies have led to a dramatic increase in the number of sequenced genomes for species across the tree of life. This increasing volume of data requires tools that can quickly compare multiple whole-genome sequences, millions of base pairs in length, to aid in the study of populations, pan-genomes, and genome evolution. Results: We present a new multiple alignment tool for whole genomes named Mugsy. Mugsy is computationally efficient and can align 31 Streptococcus pneumoniae genomes in less than 2 hours, producing alignments that compare favorably to other tools. Mugsy is also the fastest program evaluated for the multiple alignment of assembled human chromosome sequences from four individuals. Mugsy does not require a reference sequence, can align mixtures of assembled draft and completed genome data, and is robust in identifying a rich complement of genetic variation including duplications, rearrangements, and large-scale gain and loss of sequence. Availability: Mugsy is free, open-source software available from http://mugsy.sf.net. Contact: angiuoli@cs.umd.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Samuel V Angiuoli
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA.
|
2871
|
Didelot X, Lawson D, Darling A, Falush D. Inference of homologous recombination in bacteria using whole-genome sequences. Genetics 2010; 186:1435-49. [PMID: 20923983 PMCID: PMC2998322 DOI: 10.1534/genetics.110.120121] [Citation(s) in RCA: 118] [Impact Index Per Article: 8.4] [Received: 06/22/2010] [Accepted: 10/01/2010] [Indexed: 11/18/2022] Open
Abstract
Bacteria and archaea reproduce clonally, but sporadically import DNA into their chromosomes from other organisms. In many of these events, the imported DNA replaces a homologous segment in the recipient genome. Here we present a new method to reconstruct the history of recombination events that affected a given sample of bacterial genomes. We introduce a mathematical model that represents both the donor and the recipient of each DNA import as an ancestor of the genomes in the sample. The model represents a simplification of the previously described coalescent with gene conversion. We implement a Monte Carlo Markov chain algorithm to perform inference under this model from sequence data alignments and show that inference is feasible for whole-genome alignments through parallelization. Using simulated data, we demonstrate accurate and reliable identification of individual recombination events and global recombination rate parameters. We applied our approach to an alignment of 13 whole genomes from the Bacillus cereus group. We find, as expected from laboratory experiments, that the recombination rate is higher between closely related organisms and also that the genome contains several broad regions of elevated levels of recombination. Application of the method to the genomic data sets that are becoming available should reveal the evolutionary history and private lives of populations of bacteria and archaea. The methods described in this article have been implemented in a computer software package, ClonalOrigin, which is freely available from http://code.google.com/p/clonalorigin/.
Affiliation(s)
- Xavier Didelot
- Department of Statistics, University of Oxford, Oxford, UK.
|
2872
|
Pan-genome sequence analysis using Panseq: an online tool for the rapid analysis of core and accessory genomic regions. BMC Bioinformatics 2010; 11:461. [PMID: 20843356 PMCID: PMC2949892 DOI: 10.1186/1471-2105-11-461] [Citation(s) in RCA: 189] [Impact Index Per Article: 13.5] [Received: 05/19/2010] [Accepted: 09/15/2010] [Indexed: 01/21/2023] Open
Abstract
Background: The pan-genome of a bacterial species consists of a core and an accessory gene pool. The accessory genome is thought to be an important source of genetic variability in bacterial populations and is gained through lateral gene transfer, allowing subpopulations of bacteria to better adapt to specific niches. Low-cost and high-throughput sequencing platforms have created an exponential increase in genome sequence data and an opportunity to study the pan-genomes of many bacterial species. In this study, we describe a new online pan-genome sequence analysis program, Panseq. Results: Panseq was used to identify Escherichia coli O157:H7 and E. coli K-12 genomic islands. Within a population of 60 E. coli O157:H7 strains, the existence of 65 accessory genomic regions identified by Panseq analysis was confirmed by PCR. The accessory genome and binary presence/absence data, and the core genome and single nucleotide polymorphisms (SNPs), of six L. monocytogenes strains were extracted with Panseq, hierarchically clustered and visualized. The nucleotide core and binary accessory data were also used to construct maximum parsimony (MP) trees, which were compared to the MP tree generated by multi-locus sequence typing (MLST). The topology of the accessory and core trees was identical but differed from the tree produced using seven MLST loci. The Loci Selector module found the most variable and discriminatory combinations of four loci within a 100-locus set among 10 strains in 1 s, compared to the 449 s required to exhaustively search all possible combinations; it also found the most discriminatory 20 loci from a 96-locus E. coli O157:H7 SNP dataset. Conclusion: Panseq determines the core and accessory regions among a collection of genomic sequences based on user-defined parameters. It readily extracts regions unique to a genome or group of genomes, identifies SNPs within shared core genomic regions, constructs files for use in phylogeny programs based on both the presence/absence of accessory regions and SNPs within core regions, and produces a graphical overview of the output. Panseq also includes a loci selector that calculates the most variable and discriminatory loci among sets of accessory loci or core gene SNPs. Availability: Panseq is freely available online at http://76.70.11.198/panseq. Panseq is written in Perl.
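To make the loci-selection problem concrete, the following is a minimal sketch of the exhaustive baseline the abstract compares against, not Panseq's Loci Selector itself (whose algorithm is not described here). The scoring rule (number of distinct strain profiles), the function names and the toy matrix are assumptions for illustration. Choosing four loci from 100 means scoring 3,921,225 combinations, which is why brute force is slow.

```python
from itertools import combinations
from math import comb
from typing import Dict, List, Sequence, Tuple


def distinct_profiles(matrix: Dict[str, List[int]], loci: Sequence[int]) -> int:
    """Count how many distinct strain profiles the chosen loci produce."""
    return len({tuple(row[i] for i in loci) for row in matrix.values()})


def best_loci_exhaustive(
    matrix: Dict[str, List[int]], k: int
) -> Tuple[Tuple[int, ...], int]:
    """Score every k-locus combination and return the first one that
    separates the strains into the most distinct profiles."""
    n_loci = len(next(iter(matrix.values())))
    best = max(
        combinations(range(n_loci), k),
        key=lambda loci: distinct_profiles(matrix, loci),
    )
    return best, distinct_profiles(matrix, best)


# Hypothetical binary presence/absence matrix: 4 strains x 4 loci.
matrix = {
    "s1": [1, 0, 1, 0],
    "s2": [1, 0, 0, 1],
    "s3": [1, 1, 1, 0],
    "s4": [1, 1, 0, 1],
}

best, score = best_loci_exhaustive(matrix, k=2)
print(best, score)   # (1, 2) 4
print(comb(100, 4))  # 3921225 combinations for four loci out of 100
```

Locus 0 is present in every strain and so never helps to discriminate; the pair (1, 2) already gives each of the four strains a unique profile.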
|