1
Haroon M, Wang X, Afzal R, Zafar MM, Idrees F, Batool M, Khan AS, Imran M. Novel Plant Breeding Techniques Shake Hands with Cereals to Increase Production. PLANTS (BASEL, SWITZERLAND) 2022; 11:plants11081052. [PMID: 35448780 PMCID: PMC9025237 DOI: 10.3390/plants11081052] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/09/2021] [Revised: 04/07/2022] [Accepted: 04/10/2022] [Indexed: 06/01/2023]
Abstract
Cereals are the main source of human food on our planet. The ever-increasing food demand, continuously changing environment, and diseases of cereal crops have made adequate production a challenging task for feeding the growing population. Plant breeders are working to increase production by adapting conventional breeding methods to the reproductive biology of plants, whether self-pollinating or cross-pollinating. However, traditional approaches require a decade, as well as space and inputs, to make crosses and release improved varieties. Recent advancements in genome editing tools (GETs) have increased the possibility of precise and rapid genome editing. New GETs such as CRISPR/Cas9, CRISPR/Cpf1, prime editing, base editing, dCas9 epigenetic modification, and several other transgene-free genome editing approaches are available to address the limitations of long selection cycles and limited genetic diversity. Over the last few years, these technologies have led to revolutionary developments and researchers have quickly attained remarkable achievements. However, GETs are associated with various bottlenecks that prevent the scaled-up development of new varieties; these bottlenecks can be addressed by integrating GETs with improved conventional breeding methods such as speed breeding, which would take plant breeding to the next level. In this review, we have summarized all these traditional, molecular, and integrated approaches to speed up the breeding procedure of cereals.
Affiliation(s)
- Muhammad Haroon
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Xiukang Wang
  - College of Life Sciences, Yan’an University, Yan’an 716000, China
- Rabail Afzal
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Muhammad Mubashar Zafar
  - State Key Laboratory of Cotton Biology, Key Laboratory of Biological and Genetic Breeding of Cotton, Chinese Academy of Agricultural Science, Anyang 455000, China
- Fahad Idrees
  - College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Maria Batool
  - College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Abdul Saboor Khan
  - Institute of Plant Sciences, University of Cologne, 50667 Cologne, Germany
- Muhammad Imran
  - State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, College of Agriculture, South China Agriculture University, Guangzhou 510642, China
2
Lin M, Xiong Q, Chung M, Daugherty SC, Nagaraj S, Sengamalay N, Ott S, Godinez A, Tallon LJ, Sadzewicz L, Fraser C, Dunning Hotopp JC, Rikihisa Y. Comparative Analysis of Genome of Ehrlichia sp. HF, a Model Bacterium to Study Fatal Human Ehrlichiosis. BMC Genomics 2021; 22:11. [PMID: 33407096 PMCID: PMC7789307 DOI: 10.1186/s12864-020-07309-z] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 06/09/2020] [Accepted: 12/07/2020] [Indexed: 12/16/2022]
Abstract
BACKGROUND The genus Ehrlichia consists of tick-borne obligatory intracellular bacteria that can cause deadly diseases of medical and agricultural importance. Ehrlichia sp. HF, isolated from Ixodes ovatus ticks in Japan [also referred to as I. ovatus Ehrlichia (IOE) agent], causes acute fatal infection in laboratory mice that resembles acute fatal human monocytic ehrlichiosis caused by Ehrlichia chaffeensis. As there is no small laboratory animal model to study fatal human ehrlichiosis, Ehrlichia sp. HF provides a needed disease model. However, the inability to culture Ehrlichia sp. HF and the lack of genomic information have been a barrier to advance this animal model. In addition, Ehrlichia sp. HF has several designations in the literature as it lacks a taxonomically recognized name. RESULTS We stably cultured Ehrlichia sp. HF in canine histiocytic leukemia DH82 cells from the HF strain-infected mice, and determined its complete genome sequence. Ehrlichia sp. HF has a single double-stranded circular chromosome of 1,148,904 bp, which encodes 866 proteins with a similar metabolic potential as E. chaffeensis. Ehrlichia sp. HF encodes homologs of all virulence factors identified in E. chaffeensis, including 23 paralogs of P28/OMP-1 family outer membrane proteins, type IV secretion system apparatus and effector proteins, two-component systems, ankyrin-repeat proteins, and tandem repeat proteins. Ehrlichia sp. HF is a novel species in the genus Ehrlichia, as demonstrated through whole genome comparisons with six representative Ehrlichia species, subspecies, and strains, using average nucleotide identity, digital DNA-DNA hybridization, and core genome alignment sequence identity. CONCLUSIONS The genome of Ehrlichia sp. HF encodes all known virulence factors found in E. chaffeensis, substantiating it as a model Ehrlichia species to study fatal human ehrlichiosis. Comparisons between Ehrlichia sp. HF and E. chaffeensis will enable identification of in vivo virulence factors that are related to host specificity, disease severity, and host inflammatory responses. We propose to name Ehrlichia sp. HF as Ehrlichia japonica sp. nov. (type strain HF), to denote the geographic region where this bacterium was initially isolated.
Affiliation(s)
- Mingqun Lin
  - Department of Veterinary Biosciences, The Ohio State University, 1925 Coffey Road, Columbus, OH, 43210, USA
- Qingming Xiong
  - Department of Veterinary Biosciences, The Ohio State University, 1925 Coffey Road, Columbus, OH, 43210, USA
- Matthew Chung
  - Institute for Genome Sciences, University of Maryland School of Medicine, 801 W. Baltimore St, Baltimore, MD, 21201, USA
- Sean C Daugherty
  - Institute for Genome Sciences, University of Maryland School of Medicine, 801 W. Baltimore St, Baltimore, MD, 21201, USA
- Sushma Nagaraj
  - Institute for Genome Sciences, University of Maryland School of Medicine, 801 W. Baltimore St, Baltimore, MD, 21201, USA
- Naomi Sengamalay
  - Institute for Genome Sciences, University of Maryland School of Medicine, 801 W. Baltimore St, Baltimore, MD, 21201, USA
- Sandra Ott
  - Institute for Genome Sciences, University of Maryland School of Medicine, 801 W. Baltimore St, Baltimore, MD, 21201, USA
- Al Godinez
  - Institute for Genome Sciences, University of Maryland School of Medicine, 801 W. Baltimore St, Baltimore, MD, 21201, USA
- Luke J Tallon
  - Institute for Genome Sciences, University of Maryland School of Medicine, 801 W. Baltimore St, Baltimore, MD, 21201, USA
- Lisa Sadzewicz
  - Institute for Genome Sciences, University of Maryland School of Medicine, 801 W. Baltimore St, Baltimore, MD, 21201, USA
- Claire Fraser
  - Institute for Genome Sciences, University of Maryland School of Medicine, 801 W. Baltimore St, Baltimore, MD, 21201, USA
  - Department of Medicine, University of Maryland School of Medicine, 801 W. Baltimore St, Baltimore, MD, 21201, USA
- Julie C Dunning Hotopp
  - Institute for Genome Sciences, University of Maryland School of Medicine, 801 W. Baltimore St, Baltimore, MD, 21201, USA
  - Department of Microbiology and Immunology, University of Maryland School of Medicine, 801 W. Baltimore St, Baltimore, MD, 21201, USA
  - Greenebaum Cancer Center, University of Maryland School of Medicine, 801 W. Baltimore St, Baltimore, MD, 21201, USA
- Yasuko Rikihisa
  - Department of Veterinary Biosciences, The Ohio State University, 1925 Coffey Road, Columbus, OH, 43210, USA
3
Richardson LJ, Rawlings ND, Salazar GA, Almeida A, Haft DR, Ducq G, Sutton GG, Finn RD. Genome properties in 2019: a new companion database to InterPro for the inference of complete functional attributes. Nucleic Acids Res 2019; 47:D564-D572. [PMID: 30364992 PMCID: PMC6323913 DOI: 10.1093/nar/gky1013] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 09/15/2018] [Revised: 10/09/2018] [Accepted: 10/10/2018] [Indexed: 11/14/2022]
Abstract
Automatic annotation of protein function is routinely applied to newly sequenced genomes. While this provides a fine-grained view of an organism's functional protein repertoire, proteins more commonly function in a coordinated manner, such as in pathways or multimeric complexes. Genome Properties (GPs) define such functional entities as a series of steps, originally described by either TIGRFAMs or Pfam entries. To increase the scope of coverage, we have migrated GPs to function as a companion resource utilizing InterPro entries. Having introduced GPs-specific versioned releases, we provide software and data via a GitHub repository, and have developed a new web interface to GPs (available at https://www.ebi.ac.uk/interpro/genomeproperties). In addition to exploring each of the 1286 GPs, the website contains GPs pre-calculated for a representative set of proteomes; these results can be used to profile GPs phylogenetically via an interactive viewer. Users can upload novel data to the viewer for comparison with the pre-calculated results. Over the last year, we have added ∼700 new GPs, increasing the coverage of eukaryotic systems, as well as increasing general coverage through automatic generation of GPs from related resources. All data are freely available via the website and the GitHub repository.
Affiliation(s)
- Lorna J Richardson
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Neil D Rawlings
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Gustavo A Salazar
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Alexandre Almeida
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- David R Haft
  - J. Craig Venter Institute (JCVI), 9605 Medical Center Drive, Suite 150, Rockville, MD 20850, USA
- Gregory Ducq
  - J. Craig Venter Institute (JCVI), 9605 Medical Center Drive, Suite 150, Rockville, MD 20850, USA
- Granger G Sutton
  - J. Craig Venter Institute (JCVI), 9605 Medical Center Drive, Suite 150, Rockville, MD 20850, USA
- Robert D Finn
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
4
Mercier J, Josso A, Médigue C, Vallenet D. GROOLS: reactive graph reasoning for genome annotation through biological processes. BMC Bioinformatics 2018; 19:132. [PMID: 29642842 PMCID: PMC5896057 DOI: 10.1186/s12859-018-2126-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/06/2017] [Accepted: 03/22/2018] [Indexed: 11/22/2022]
Abstract
BACKGROUND High quality functional annotation is essential for understanding the phenotypic consequences encoded in a genome. Despite improvements in bioinformatics methods, millions of sequences in databanks are not assigned reliable functions. The curation of protein functions in the context of biological processes is a way to evaluate and improve their annotation. RESULTS We developed an expert system using paraconsistent logic, named GROOLS (Genomic Rule Object-Oriented Logic System), that evaluates the completeness and the consistency of predicted functions through biological processes like metabolic pathways. Using a generic and hierarchical representation of knowledge, biological processes are modeled in a graph from which observations (i.e. predictions and expectations) are propagated by rules. At the end of the reasoning, conclusions are assigned to biological process components and highlight uncertainties and inconsistencies. Results on 14 microbial organisms are presented. CONCLUSIONS GROOLS software is designed to evaluate the overall accuracy of functional unit and pathway predictions according to organism experimental data like growth phenotypes. It assists biocurators in the functional annotation of proteins by focusing on missing or contradictory observations.
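The idea of reasoning over possibly contradictory observations can be illustrated with a small sketch. It is not the GROOLS software: it assumes a Belnap-style four-valued logic as one common form of paraconsistent logic, it leaves out the graph and the propagation rules entirely, and the names `Truth`, `combine`, `conclude` and all conclusion labels are hypothetical.

```python
from enum import Enum


class Truth(Enum):
    """Four truth values; BOTH records contradictory evidence instead of failing."""
    UNKNOWN = frozenset()
    TRUE = frozenset({True})
    FALSE = frozenset({False})
    BOTH = frozenset({True, False})


def combine(observations):
    """Merge several observations about one component into a single truth value."""
    merged = frozenset()
    for obs in observations:
        merged = merged | obs.value
    return Truth(merged)


def conclude(prediction, expectation):
    """Compare what was predicted with what was expected (labels are illustrative)."""
    if Truth.BOTH in (prediction, expectation):
        return "contradictory observations"
    if prediction is Truth.UNKNOWN and expectation is Truth.UNKNOWN:
        return "no information"
    if expectation is Truth.UNKNOWN:
        return "predicted only"
    if prediction is Truth.UNKNOWN:
        return "missing prediction"
    if prediction is expectation:
        return "consistent"
    return "inconsistent"


# Two annotation tools disagree about an enzyme, while a growth phenotype expects it.
prediction = combine([Truth.TRUE, Truth.FALSE])
print(conclude(prediction, Truth.TRUE))
```

In this toy setting, missing or contradictory observations are surfaced as explicit labels, which is the kind of output a biocurator could then review.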
Affiliation(s)
- Jonathan Mercier
  - LABGeM, Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Université d’Evry, Université Paris-Saclay, 2 rue Gaston Crémieux, Evry, 91057 France
- Adrien Josso
  - LABGeM, Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Université d’Evry, Université Paris-Saclay, 2 rue Gaston Crémieux, Evry, 91057 France
- Claudine Médigue
  - LABGeM, Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Université d’Evry, Université Paris-Saclay, 2 rue Gaston Crémieux, Evry, 91057 France
- David Vallenet
  - LABGeM, Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Université d’Evry, Université Paris-Saclay, 2 rue Gaston Crémieux, Evry, 91057 France
5
Bhardwaj T, Somvanshi P. Pan-genome analysis of Clostridium botulinum reveals unique targets for drug development. Gene 2017; 623:48-62. [DOI: 10.1016/j.gene.2017.04.019] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 11/02/2016] [Revised: 03/29/2017] [Accepted: 04/12/2017] [Indexed: 10/19/2022]
6
Lin M, Bachman K, Cheng Z, Daugherty SC, Nagaraj S, Sengamalay N, Ott S, Godinez A, Tallon LJ, Sadzewicz L, Fraser C, Dunning Hotopp JC, Rikihisa Y. Analysis of complete genome sequence and major surface antigens of Neorickettsia helminthoeca, causative agent of salmon poisoning disease. Microb Biotechnol 2017; 10:933-957. [PMID: 28585301 PMCID: PMC5481527 DOI: 10.1111/1751-7915.12731] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 11/16/2016] [Revised: 03/09/2017] [Accepted: 04/25/2017] [Indexed: 12/31/2022]
Abstract
Neorickettsia helminthoeca, a type species of the genus Neorickettsia, is an endosymbiont of digenetic trematodes of veterinary importance. Upon ingestion of salmonid fish parasitized with infected trematodes, canids develop salmon poisoning disease (SPD), an acute febrile illness that is particularly severe and often fatal in dogs without adequate treatment. We determined and analysed the complete genome sequence of N. helminthoeca: a single small circular chromosome of 884 232 bp encoding 774 potential proteins. N. helminthoeca is unable to synthesize lipopolysaccharides and most amino acids, but is capable of synthesizing vitamins, cofactors, nucleotides and bacterioferritin. N. helminthoeca is, however, distinct from the majority of the family Anaplasmataceae to which it belongs, as it encodes nearly all enzymes required for peptidoglycan biosynthesis, suggesting its structural hardiness and inflammatory potential. Using sera from dogs that were experimentally infected by feeding with parasitized fish or naturally infected in southern California, Western blot analysis revealed that among five predicted N. helminthoeca outer membrane proteins, P51 and strain‐variable surface antigen were uniformly recognized. Our findings will help in understanding the pathogenesis and prevalence of N. helminthoeca infection among trematodes, canids and potentially other animals in nature, and in developing effective SPD diagnostic and preventive measures. Recent progress in large‐scale genome sequencing has been uncovering a broad distribution of Neorickettsia spp.; comparative genomics will facilitate understanding of the biology and natural history of these elusive environmental bacteria.
Affiliation(s)
- Mingqun Lin
  - Department of Veterinary Biosciences, The Ohio State University, 1925 Coffey Road, Columbus, OH, 43210, USA
- Katherine Bachman
  - Department of Veterinary Biosciences, The Ohio State University, 1925 Coffey Road, Columbus, OH, 43210, USA
- Zhihui Cheng
  - Department of Veterinary Biosciences, The Ohio State University, 1925 Coffey Road, Columbus, OH, 43210, USA
- Sean C Daugherty
  - Institute for Genome Sciences, University of Maryland School of Medicine, 801 W. Baltimore St, Baltimore, MD, 21201, USA
- Sushma Nagaraj
  - Institute for Genome Sciences, University of Maryland School of Medicine, 801 W. Baltimore St, Baltimore, MD, 21201, USA
- Naomi Sengamalay
  - Institute for Genome Sciences, University of Maryland School of Medicine, 801 W. Baltimore St, Baltimore, MD, 21201, USA
- Sandra Ott
  - Institute for Genome Sciences, University of Maryland School of Medicine, 801 W. Baltimore St, Baltimore, MD, 21201, USA
- Al Godinez
  - Institute for Genome Sciences, University of Maryland School of Medicine, 801 W. Baltimore St, Baltimore, MD, 21201, USA
- Luke J Tallon
  - Institute for Genome Sciences, University of Maryland School of Medicine, 801 W. Baltimore St, Baltimore, MD, 21201, USA
- Lisa Sadzewicz
  - Institute for Genome Sciences, University of Maryland School of Medicine, 801 W. Baltimore St, Baltimore, MD, 21201, USA
- Claire Fraser
  - Institute for Genome Sciences, University of Maryland School of Medicine, 801 W. Baltimore St, Baltimore, MD, 21201, USA
  - Department of Medicine, University of Maryland School of Medicine, 801 W. Baltimore St, Baltimore, MD, 21201, USA
- Julie C Dunning Hotopp
  - Institute for Genome Sciences, University of Maryland School of Medicine, 801 W. Baltimore St, Baltimore, MD, 21201, USA
  - Department of Microbiology and Immunology, University of Maryland School of Medicine, 801 W. Baltimore St, Baltimore, MD, 21201, USA
- Yasuko Rikihisa
  - Department of Veterinary Biosciences, The Ohio State University, 1925 Coffey Road, Columbus, OH, 43210, USA
7
Quantifying the Importance of the Rare Biosphere for Microbial Community Response to Organic Pollutants in a Freshwater Ecosystem. Appl Environ Microbiol 2017; 83:AEM.03321-16. [PMID: 28258138 DOI: 10.1128/aem.03321-16] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 12/07/2016] [Accepted: 02/01/2017] [Indexed: 01/01/2023]
Abstract
A single liter of water contains hundreds, if not thousands, of bacterial and archaeal species, each of which typically makes up a very small fraction of the total microbial community (<0.1%), the so-called "rare biosphere." How often, and via what mechanisms, e.g., clonal amplification versus horizontal gene transfer, the rare taxa and genes contribute to microbial community response to environmental perturbations represent important unanswered questions toward better understanding the value and modeling of microbial diversity. We tested whether rare species frequently responded to changing environmental conditions by establishing 20-liter planktonic mesocosms with water from Lake Lanier (Georgia, USA) and perturbing them with organic compounds that are rarely detected in the lake, including 2,4-dichlorophenoxyacetic acid (2,4-D), 4-nitrophenol (4-NP), and caffeine. The populations of the degraders of these compounds were initially below the detection limit of quantitative PCR (qPCR) or metagenomic sequencing methods, but they increased substantially in abundance after perturbation. Sequencing of several degraders (isolates) and time-series metagenomic data sets revealed distinct cooccurring alleles of degradation genes, frequently carried on transmissible plasmids, especially for the 2,4-D mesocosms, and distinct species dominating the post-enrichment microbial communities from each replicated mesocosm. This diversity of species and genes also underlies distinct degradation profiles among replicated mesocosms. 
Collectively, these results supported the hypothesis that the rare biosphere can serve as a genetic reservoir, which can be frequently missed by metagenomics but enables community response to changing environmental conditions caused by organic pollutants, and they provided insights into the size of the pool of rare genes and species.
IMPORTANCE A single liter of water or gram of soil contains hundreds of low-abundance bacterial and archaeal species, the so-called rare biosphere. The value of this astonishing biodiversity for ecosystem functioning remains poorly understood, primarily due to the fact that microbial community analysis frequently focuses on abundant organisms. Using a combination of culture-dependent and culture-independent (metagenomics) techniques, we showed that rare taxa and genes commonly contribute to the microbial community response to organic pollutants. Our findings should have implications for future studies that aim to study the role of rare species in environmental processes, including environmental bioremediation efforts of oil spills or other contaminants.
8
Haft DR, Haft DH. A comprehensive software suite for protein family construction and functional site prediction. PLoS One 2017; 12:e0171758. [PMID: 28182651 PMCID: PMC5300114 DOI: 10.1371/journal.pone.0171758] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 08/29/2016] [Accepted: 01/25/2017] [Indexed: 11/18/2022]
Abstract
In functionally diverse protein families, conservation in short signature regions may outperform full-length sequence comparisons for identifying proteins that belong to a subgroup within which one specific aspect of their function is conserved. The SIMBAL workflow (Sites Inferred by Metabolic Background Assertion Labeling) is a data-mining procedure for finding such signature regions. It begins by using clues from genomic context, such as co-occurrence or conserved gene neighborhoods, to build a useful training set from a large number of uncharacterized but mutually homologous proteins. When training set construction is successful, the YES partition is enriched in proteins that share function with the user’s query sequence, while the NO partition is depleted. A selected query sequence is then mined for short signature regions whose closest matches overwhelmingly favor proteins from the YES partition. High-scoring signature regions typically contain key residues critical to functional specificity, so proteins with the highest sequence similarity across these regions tend to share the same function. The SIMBAL algorithm was described previously, but significant manual effort, expertise, and a supporting software infrastructure were required to prepare the requisite training sets. Here, we describe a new, distributable software suite that speeds up and simplifies the process for using SIMBAL, most notably by providing tools that automate training set construction. These tools have broad utility for comparative genomics, allowing for flexible collection of proteins or protein domains based on genomic context as well as homology, a capability that can greatly assist in protein family construction. Armed with this new software suite, SIMBAL can serve as a fast and powerful in silico alternative to direct experimentation for characterizing proteins and their functional interactions.
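The core idea of mining a query for signature regions can be sketched as a sliding-window scan. This is an illustration only, not the SIMBAL software: the abstract does not describe the actual search and scoring, so the sketch swaps in a simple ungapped identity comparison, and the function names, the `width` and `top_k` parameters, and the toy sequences are all hypothetical.

```python
def best_window_identity(window, seq):
    """Highest fraction of identical residues between `window` and any same-length slice of `seq`."""
    w = len(window)
    best = 0.0
    for i in range(len(seq) - w + 1):
        matches = sum(a == b for a, b in zip(window, seq[i:i + w]))
        best = max(best, matches / w)
    return best


def score_windows(query, yes_seqs, no_seqs, width, top_k):
    """For each query window, report the fraction of its top_k closest training
    sequences that come from the YES partition."""
    labeled = [(s, True) for s in yes_seqs] + [(s, False) for s in no_seqs]
    results = []
    for start in range(len(query) - width + 1):
        window = query[start:start + width]
        ranked = sorted(
            labeled,
            key=lambda item: best_window_identity(window, item[0]),
            reverse=True,
        )
        top = ranked[:top_k]
        yes_fraction = sum(is_yes for _, is_yes in top) / len(top)
        results.append((start, yes_fraction))
    return results


# Toy data: the motif "YCHW" is shared only by the YES partition.
query = "MKTAYCHWGE"
yes_seqs = ["AAAYCHWAAA", "GGGYCHWGGG"]
no_seqs = ["MKTAPPPPPP", "MKTALLLLGE"]
results = score_windows(query, yes_seqs, no_seqs, width=4, top_k=2)
for start, frac in results:
    print(start, query[start:start + 4], frac)
```

Windows whose closest matches come overwhelmingly from the YES partition are the candidate signature regions; a real analysis would also need a statistical test rather than a raw fraction.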
Affiliation(s)
- David Renfrew Haft
  - J. Craig Venter Institute, Rockville, Maryland, United States of America
- Daniel H. Haft
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
9
Brbić M, Piškorec M, Vidulin V, Kriško A, Šmuc T, Supek F. The landscape of microbial phenotypic traits and associated genes. Nucleic Acids Res 2016; 44:10074-10090. [PMID: 27915291 PMCID: PMC5137458 DOI: 10.1093/nar/gkw964] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 06/12/2016] [Revised: 09/21/2016] [Accepted: 10/11/2016] [Indexed: 12/31/2022]
Abstract
Bacteria and Archaea display a variety of phenotypic traits and can adapt to diverse ecological niches. However, systematic annotation of prokaryotic phenotypes is lacking. We have therefore developed ProTraits, a resource containing ∼545 000 novel phenotype inferences, spanning 424 traits assigned to 3046 bacterial and archaeal species. These annotations were assigned by a computational pipeline that associates microbes with phenotypes by text-mining the scientific literature and the broader World Wide Web, while also being able to define novel concepts from unstructured text. Moreover, the ProTraits pipeline assigns phenotypes by drawing extensively on comparative genomics, capturing patterns in gene repertoires, codon usage biases, proteome composition and co-occurrence in metagenomes. Notably, we find that gene synteny is highly predictive of many phenotypes, and highlight examples of gene neighborhoods associated with spore-forming ability. A global analysis of trait interrelatedness outlined clusters in the microbial phenotype network, suggesting common genetic underpinnings. Our extended set of phenotype annotations allows detection of 57 088 high confidence gene-trait links, which recover many known associations involving sporulation, flagella, catalase activity, aerobicity, photosynthesis and other traits. Over 99% of the commonly occurring gene families are involved in genetic interactions conditional on at least one phenotype, suggesting that epistasis has a major role in shaping microbial gene content.
Affiliation(s)
- Maria Brbić
  - Division of Electronics, Ruder Boskovic Institute, 10000 Zagreb, Croatia
- Matija Piškorec
  - Division of Electronics, Ruder Boskovic Institute, 10000 Zagreb, Croatia
- Vedrana Vidulin
  - Division of Electronics, Ruder Boskovic Institute, 10000 Zagreb, Croatia
- Anita Kriško
  - Mediterranean Institute of Life Sciences, 21000 Split, Croatia
- Tomislav Šmuc
  - Division of Electronics, Ruder Boskovic Institute, 10000 Zagreb, Croatia
- Fran Supek
  - Division of Electronics, Ruder Boskovic Institute, 10000 Zagreb, Croatia
  - EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, 08003 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
10
Biofilms on Hospital Shower Hoses: Characterization and Implications for Nosocomial Infections. Appl Environ Microbiol 2016; 82:2872-2883. [PMID: 26969701 DOI: 10.1128/aem.03529-15] [Citation(s) in RCA: 65] [Impact Index Per Article: 8.1] [Received: 10/30/2015] [Accepted: 02/23/2016] [Indexed: 11/20/2022]
Abstract
Although the source of drinking water (DW) used in hospitals is commonly disinfected, biofilms forming on water pipelines are a refuge for bacteria, including possible pathogens that survive different disinfection strategies. These biofilm communities are only beginning to be explored by culture-independent techniques that circumvent the limitations of conventional monitoring efforts. Hence, theories regarding the frequency of opportunistic pathogens in DW biofilms and how biofilm members withstand high doses of disinfectants and/or chlorine residuals in the water supply remain speculative. The aim of this study was to characterize the composition of microbial communities growing on five hospital shower hoses using both 16S rRNA gene sequencing of bacterial isolates and whole-genome shotgun metagenome sequencing. The resulting data revealed a Mycobacterium-like population, closely related to Mycobacterium rhodesiae and Mycobacterium tusciae, to be the predominant taxon in all five samples, and its nearly complete draft genome sequence was recovered. In contrast, the fraction recovered by culture was mostly affiliated with Proteobacteria, including members of the genera Sphingomonas, Blastomonas, and Porphyrobacter. The biofilm community harbored genes related to disinfectant tolerance (2.34% of the total annotated proteins) and a lower abundance of virulence determinants related to colonization and evasion of the host immune system. Additionally, genes potentially conferring resistance to β-lactam, aminoglycoside, amphenicol, and quinolone antibiotics were detected. Collectively, our results underscore the need to understand the microbiome of DW biofilms using metagenomic approaches. This information might lead to more robust management practices that minimize the risks associated with exposure to opportunistic pathogens in hospitals.
11
Haft DH. Using comparative genomics to drive new discoveries in microbiology. Curr Opin Microbiol 2015; 23:189-96. [PMID: 25617609 DOI: 10.1016/j.mib.2014.11.017] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 09/27/2014] [Revised: 11/19/2014] [Accepted: 11/20/2014] [Indexed: 01/17/2023]
Abstract
Bioinformatics looks to many microbiologists like a service industry. In this view, annotation starts with what is known from experiments in the lab, makes reasonable inferences of which genes match other genes in function, builds databases to make all that we know accessible, but creates nothing truly new. Experiments lead, then biocuration and computational biology follow. But the astounding success of genome sequencing is changing the annotation paradigm. Every genome sequenced is an intercepted coded message from the microbial world, and as all cryptographers know, it is easier to decode a thousand messages than a single message. Some biology is best discovered not by phenomenology, but by decoding genome content, forming hypotheses, and doing the first few rounds of validation computationally. Through such reasoning, a role and function may be assigned to a protein with no sequence similarity to any protein yet studied. Experimentation can follow after the discovery to cement and to extend the findings. Unfortunately, this approach remains so unfamiliar to most bench scientists that lab work and comparative genomics typically segregate to different teams working on unconnected projects. This review will discuss several themes in comparative genomics as a discovery method, including highly derived data, use of patterns of design to reason by analogy, and in silico testing of computationally generated hypotheses.
12
Signal correlations in ecological niches can shape the organization and evolution of bacterial gene regulatory networks. Adv Microb Physiol 2013; 61:1-36. [PMID: 23046950 DOI: 10.1016/b978-0-12-394423-8.00001-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 02/06/2023]
Abstract
Transcriptional regulation plays a significant role in the biological response of bacteria to changing environmental conditions. Therefore, mapping transcriptional regulatory networks is an important step not only in understanding how bacteria sense and interpret their environment but also in identifying the functions involved in biological responses to specific conditions. Recent experimental and computational developments have facilitated the characterization of regulatory networks on a genome-wide scale in model organisms. In addition, the multiplication of complete genome sequences has encouraged comparative analyses to detect conserved regulatory elements and infer regulatory networks in other less well-studied organisms. However, transcription regulation appears to evolve rapidly, thus creating challenges for the transfer of knowledge to nonmodel organisms. Nevertheless, the mechanisms and constraints driving the evolution of regulatory networks have been the subjects of numerous analyses, and several models have been proposed. Overall, the contributions of mutations, recombination, and horizontal gene transfer are complex. Finally, the rapid evolution of regulatory networks plays a significant role in the remarkable capacity of bacteria to adapt to new or changing environments. Conversely, the characteristics of environmental niches determine the selective pressures and can shape the structure of regulatory networks accordingly.
|
13
|
Haft DH, Selengut JD, Richter RA, Harkins D, Basu MK, Beck E. TIGRFAMs and Genome Properties in 2013. Nucleic Acids Res 2012. [PMID: 23197656 PMCID: PMC3531188 DOI: 10.1093/nar/gks1234] [Citation(s) in RCA: 387] [Impact Index Per Article: 32.3] [Indexed: 12/01/2022] Open
Abstract
TIGRFAMs, available online at http://www.jcvi.org/tigrfams, is a database of protein family definitions. Each entry features a seed alignment of trusted representative sequences, a hidden Markov model (HMM) built from that alignment, cutoff scores that let automated annotation pipelines decide which proteins are members, and annotations for transfer onto member proteins. Most TIGRFAMs models are designated equivalog, meaning they assign a specific name to proteins conserved in function from a common ancestral sequence. Models describing more functionally heterogeneous families are designated subfamily or domain, and assign less specific but more widely applicable annotations. The Genome Properties database, available at http://www.jcvi.org/genome-properties, specifies how computed evidence, including TIGRFAMs HMM results, should be used to judge whether an enzymatic pathway, a protein complex or another type of molecular subsystem is encoded in a genome. TIGRFAMs and Genome Properties content are developed in concert because subsystems reconstruction for large numbers of genomes guides selection of seed alignment sequences and cutoff values during protein family construction. Both databases specialize heavily in bacterial and archaeal subsystems. At present, 4284 models appear in TIGRFAMs, while 628 systems are described by Genome Properties. Content derives both from subsystem discovery work and from biocuration of the scientific literature.
Affiliation(s)
- Daniel H Haft
- Informatics, J Craig Venter Institute, Rockville, MD 20850, USA.
|
14
|
Ricaldi JN, Fouts DE, Selengut JD, Harkins DM, Patra KP, Moreno A, Lehmann JS, Purushe J, Sanka R, Torres M, Webster NJ, Vinetz JM, Matthias MA. Whole genome analysis of Leptospira licerasiae provides insight into leptospiral evolution and pathogenicity. PLoS Negl Trop Dis 2012; 6:e1853. [PMID: 23145189 PMCID: PMC3493377 DOI: 10.1371/journal.pntd.0001853] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.5] [Received: 04/09/2012] [Accepted: 08/25/2012] [Indexed: 12/25/2022] Open
Abstract
The whole genome analysis of two strains of the first intermediately pathogenic leptospiral species to be sequenced (Leptospira licerasiae strains VAR010 and MMD0835) provides insight into their pathogenic potential and deepens our understanding of leptospiral evolution. Comparative analysis of eight leptospiral genomes shows the existence of a core leptospiral genome comprising 1547 genes and 452 conserved genes restricted to infectious species (including L. licerasiae) that are likely to be pathogenicity-related. Comparisons of the functional content of the genomes suggest that L. licerasiae retains several proteins related to nitrogen, amino acid and carbohydrate metabolism which might help to explain why these Leptospira grow well in artificial media compared with pathogenic species. L. licerasiae strains VAR010T and MMD0835 possess two prophage elements. While one element is circular and shares homology with LE1 of L. biflexa, the second is cryptic and homologous to a previously identified but unnamed region in L. interrogans serovars Copenhageni and Lai. We also report a unique O-antigen locus in L. licerasiae comprised of a 6-gene cluster that is unexpectedly short compared with L. interrogans in which analogous regions may include >90 such genes. Sequence homology searches suggest that these genes were acquired by lateral gene transfer (LGT). Furthermore, seven putative genomic islands ranging in size from 5 to 36 kb are present, also suggestive of antecedent LGT. How Leptospira become naturally competent remains to be determined, but considering the phylogenetic origins of the genes comprising the O-antigen cluster and other putative laterally transferred genes, L. licerasiae must be able to exchange genetic material with non-invasive environmental bacteria. The data presented here demonstrate that L. licerasiae is genetically more closely related to pathogenic than to saprophytic Leptospira and provide insight into the genomic bases for its infectiousness and its unique antigenic characteristics.
Leptospirosis is one of the most common diseases transmitted by animals worldwide and is important because it is a major cause of febrile illness in tropical areas and also occurs in epidemic form associated with natural disasters and flooding. The mechanisms through which Leptospira cause disease are not well understood. In this study we have sequenced the genomes of two strains of Leptospira licerasiae isolated from a person and a marsupial in the Peruvian Amazon. These strains were thought to be able to cause only mild disease in humans. We have compared these genomes with other leptospires that can cause severe illness and death and another leptospire that does not infect humans or animals. These comparisons have allowed us to demonstrate similarities among the disease-causing Leptospira. Studying genes that are common among infectious strains will allow us to identify genetic factors necessary for infecting, causing disease and determining the severity of disease. We have also found that L. licerasiae seems to be able to take up and incorporate genetic information from other bacteria found in the environment. This information will allow us to begin to understand how Leptospira species have evolved.
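The distinction this abstract draws between a core genome and genes restricted to infectious species can be illustrated with plain set operations. The sketch below is only an illustration under stated assumptions: it presumes genes have already been grouped into ortholog families (the abstract does not say how), and the function name, genome labels, and family identifiers are all hypothetical.

```python
def core_and_group_restricted(genomes, group):
    """Return (core, restricted) gene-family sets.

    genomes: dict mapping a genome name to the set of gene-family ids it encodes
             (assumes ortholog clustering was done beforehand).
    group:   names of the genomes of interest, e.g. the infectious species.
    Both arguments are assumed to be non-empty.
    """
    # Core genome: families present in every genome.
    core = set.intersection(*genomes.values())

    # Families shared by every genome in the group of interest ...
    shared_in_group = set.intersection(*(genomes[name] for name in group))
    # ... minus anything that also occurs in a genome outside the group.
    outside = [fams for name, fams in genomes.items() if name not in group]
    seen_outside = set().union(*outside)
    restricted = shared_in_group - seen_outside
    return core, restricted


# Toy data (hypothetical family ids, not taken from the paper).
toy_genomes = {
    "pathogen_A": {"f1", "f2", "f3"},
    "pathogen_B": {"f1", "f2", "f3", "f4"},
    "intermediate": {"f1", "f2", "f5"},
    "saprophyte": {"f1", "f6"},
}
toy_infectious = {"pathogen_A", "pathogen_B", "intermediate"}
toy_core, toy_restricted = core_and_group_restricted(toy_genomes, toy_infectious)
```

In this toy case the core is the single family found in all four genomes, and the restricted set is the one family shared by the three infectious genomes but absent from the saprophyte.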
Affiliation(s)
- Jessica N. Ricaldi
  - Instituto de Medicina Tropical Alexander von Humboldt, Universidad Peruana Cayetano Heredia, Lima, Peru
  - Division of Infectious Diseases, Department of Medicine, University of California San Diego School of Medicine, La Jolla, California, United States of America
- Derrick E. Fouts
  - J. Craig Venter Institute, Rockville, Maryland, United States of America
- Jeremy D. Selengut
  - J. Craig Venter Institute, Rockville, Maryland, United States of America
- Derek M. Harkins
  - J. Craig Venter Institute, Rockville, Maryland, United States of America
- Kailash P. Patra
  - Division of Infectious Diseases, Department of Medicine, University of California San Diego School of Medicine, La Jolla, California, United States of America
- Angelo Moreno
  - Division of Infectious Diseases, Department of Medicine, University of California San Diego School of Medicine, La Jolla, California, United States of America
- Jason S. Lehmann
  - Division of Infectious Diseases, Department of Medicine, University of California San Diego School of Medicine, La Jolla, California, United States of America
- Janaki Purushe
  - J. Craig Venter Institute, Rockville, Maryland, United States of America
- Ravi Sanka
  - J. Craig Venter Institute, Rockville, Maryland, United States of America
- Michael Torres
  - Departamento de Ciencias Celulares y Moleculares, Laboratorio de Investigación y Desarrollo, Facultad de Ciencias, Universidad Peruana Cayetano Heredia, Lima, Peru
- Nicholas J. Webster
  - Department of Medicine, University of California San Diego School of Medicine, La Jolla, California, United States of America
- Joseph M. Vinetz
  - Instituto de Medicina Tropical Alexander von Humboldt, Universidad Peruana Cayetano Heredia, Lima, Peru
  - Division of Infectious Diseases, Department of Medicine, University of California San Diego School of Medicine, La Jolla, California, United States of America
  - Departamento de Ciencias Celulares y Moleculares, Laboratorio de Investigación y Desarrollo, Facultad de Ciencias, Universidad Peruana Cayetano Heredia, Lima, Peru
  - * E-mail: (JMV); (MAM)
- Michael A. Matthias
  - Division of Infectious Diseases, Department of Medicine, University of California San Diego School of Medicine, La Jolla, California, United States of America
  - * E-mail: (JMV); (MAM)
|
15
|
Paralanov V, Lu J, Duffy LB, Crabb DM, Shrivastava S, Methé BA, Inman J, Yooseph S, Xiao L, Cassell GH, Waites KB, Glass JI. Comparative genome analysis of 19 Ureaplasma urealyticum and Ureaplasma parvum strains. BMC Microbiol 2012; 12:88. [PMID: 22646228 PMCID: PMC3511179 DOI: 10.1186/1471-2180-12-88] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.4] [Received: 11/23/2011] [Accepted: 05/02/2012] [Indexed: 11/10/2022] Open
Abstract
Background Ureaplasma urealyticum (UUR) and Ureaplasma parvum (UPA) are sexually transmitted bacteria among humans implicated in a variety of disease states including but not limited to: nongonococcal urethritis, infertility, adverse pregnancy outcomes, chorioamnionitis, and bronchopulmonary dysplasia in neonates. There are 10 distinct serotypes of UUR and 4 of UPA. Efforts to determine whether difference in pathogenic potential exists at the ureaplasma serovar level have been hampered by limitations of antibody-based typing methods, multiple cross-reactions and poor discriminating capacity in clinical samples containing two or more serovars. Results We determined the genome sequences of the American Type Culture Collection (ATCC) type strains of all UUR and UPA serovars as well as four clinical isolates of UUR for which we were not able to determine serovar designation. UPA serovars had 0.75−0.78 Mbp genomes and UUR serovars were 0.84−0.95 Mbp. The original classification of ureaplasma isolates into distinct serovars was largely based on differences in the major ureaplasma surface antigen called the multiple banded antigen (MBA) and reactions of human and animal sera to the organisms. Whole genome analysis of the 14 serovars and the 4 clinical isolates showed the mba gene was part of a large superfamily, which is a phase variable gene system, and that some serovars have identical sets of mba genes. Most of the differences among serovars are hypothetical genes, and in general the two species and 14 serovars are extremely similar at the genome level. Conclusions Comparative genome analysis suggests UUR is more capable of acquiring genes horizontally, which may contribute to its greater virulence for some conditions. 
The overwhelming evidence of extensive horizontal gene transfer among these organisms from our previous studies combined with our comparative analysis indicates that ureaplasmas exist as quasi-species rather than as stable serovars in their native environment. Therefore, differential pathogenicity and clinical outcome of a ureaplasmal infection is most likely not on the serovar level, but rather may be due to the presence or absence of potential pathogenicity factors in an individual ureaplasma clinical isolate and/or patient to patient differences in terms of autoimmunity and microbiome.
Affiliation(s)
- Vanya Paralanov
- J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA
|
16
|
Connecting genotype to phenotype in the era of high-throughput sequencing. Biochim Biophys Acta Gen Subj 2011; 1810:967-77. [PMID: 21421023 DOI: 10.1016/j.bbagen.2011.03.010] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 10/10/2010] [Revised: 02/17/2011] [Accepted: 03/13/2011] [Indexed: 12/25/2022]
|
17
|
Lintner NG, Frankel KA, Tsutakawa SE, Alsbury DL, Copié V, Young MJ, Tainer JA, Lawrence CM. The structure of the CRISPR-associated protein Csa3 provides insight into the regulation of the CRISPR/Cas system. J Mol Biol 2011; 405:939-55. [PMID: 21093452 PMCID: PMC4507800 DOI: 10.1016/j.jmb.2010.11.019] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.9] [Received: 08/26/2010] [Revised: 11/01/2010] [Accepted: 11/09/2010] [Indexed: 01/07/2023]
Abstract
Adaptive immune systems have recently been recognized in prokaryotic organisms where, in response to viral infection, they incorporate short fragments of invader-derived DNA into loci called clustered regularly interspaced short palindromic repeats (CRISPRs). In subsequent infections, the CRISPR loci are transcribed and processed into guide sequences for the neutralization of the invading RNA or DNA. The CRISPR-associated protein machinery (Cas) lies at the heart of this process, yet many of the molecular details of the CRISPR/Cas system remain to be elucidated. Here, we report the first structure of Csa3, a CRISPR-associated protein from Sulfolobus solfataricus (Sso1445), which reveals a dimeric two-domain protein. The N-terminal domain is a unique variation on the dinucleotide binding domain that orchestrates dimer formation. In addition, it utilizes two conserved sequence motifs [Thr-h-Gly-Phe-(Asn/Asp)-Glu-X(4)-Arg and Leu-X(2)-Gly-h-Arg] to construct a 2-fold symmetric pocket on the dimer axis. This pocket is likely to represent a regulatory ligand-binding site. The N-terminal domain is fused to a C-terminal MarR-like winged helix-turn-helix domain that is expected to be involved in DNA recognition. Overall, the unique domain architecture of Csa3 suggests a transcriptional regulator under allosteric control of the N-terminal domain. Alternatively, Csa3 may function in a larger complex, with the conserved cleft participating in protein-protein or protein-nucleic acid interactions. A similar N-terminal domain is also identified in Csx1, a second CRISPR-associated protein family of unknown function.
Affiliation(s)
- Nathanael G. Lintner
  - Thermal Biology Institute, Montana State University, Bozeman, MT 59717, USA
  - Department of Chemistry and Biochemistry, Montana State University, Bozeman, MT 59717, USA
- Kenneth A. Frankel
  - Life Science Division, Lawrence Berkeley National Labs, Berkeley, CA 94720, USA
- Susan E. Tsutakawa
  - Life Science Division, Lawrence Berkeley National Labs, Berkeley, CA 94720, USA
- Donald L. Alsbury
  - Thermal Biology Institute, Montana State University, Bozeman, MT 59717, USA
  - Department of Chemistry and Biochemistry, Montana State University, Bozeman, MT 59717, USA
- Valérie Copié
  - Thermal Biology Institute, Montana State University, Bozeman, MT 59717, USA
  - Department of Chemistry and Biochemistry, Montana State University, Bozeman, MT 59717, USA
- Mark J. Young
  - Thermal Biology Institute, Montana State University, Bozeman, MT 59717, USA
  - Department of Plant Sciences and Plant Pathology, Montana State University, Bozeman, MT 59717, USA
- John A. Tainer
  - Life Science Division, Lawrence Berkeley National Labs, Berkeley, CA 94720, USA
  - Department of Molecular Biology MB4 and the Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- C. Martin Lawrence
  - Thermal Biology Institute, Montana State University, Bozeman, MT 59717, USA
  - Department of Chemistry and Biochemistry, Montana State University, Bozeman, MT 59717, USA
  - Address correspondence to: Martin Lawrence, Department of Chemistry and Biochemistry, 103 CBB, Montana State University, Bozeman, MT 59717; Phone: 1-406-994-5382, Fax: 1-406-994-5407
|
18
|
Haft DH. Bioinformatic evidence for a widely distributed, ribosomally produced electron carrier precursor, its maturation proteins, and its nicotinoprotein redox partners. BMC Genomics 2011; 12:21. [PMID: 21223593 PMCID: PMC3023750 DOI: 10.1186/1471-2164-12-21] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.7] [Received: 07/22/2010] [Accepted: 01/11/2011] [Indexed: 11/10/2022] Open
Abstract
Background Enzymes in the radical SAM (rSAM) domain family serve in a wide variety of biological processes, including RNA modification, enzyme activation, bacteriocin core peptide maturation, and cofactor biosynthesis. Evolutionary pressures and relationships to other cellular constituents impose recognizable grammars on each class of rSAM-containing system, shaping patterns in results obtained through various comparative genomics analyses. Results An uncharacterized gene cluster found in many Actinobacteria and sporadically in Firmicutes, Chloroflexi, Deltaproteobacteria, and one Archaeal plasmid contains a PqqE-like rSAM protein family that includes Rv0693 from Mycobacterium tuberculosis. Members occur clustered with a strikingly well-conserved small polypeptide we designate "mycofactocin," similar in size to bacteriocins and PqqA, precursor of pyrroloquinoline quinone (PQQ). Partial Phylogenetic Profiling (PPP) based on the distribution of these markers identifies the mycofactocin cluster, but also a second tier of high-scoring proteins. This tier, strikingly, is filled with up to thirty-one members per genome from three variant subfamilies that occur, one each, in three unrelated classes of nicotinoproteins. The pattern suggests these variant enzymes require not only NAD(P), but also the novel gene cluster. Further study was conducted using SIMBAL, a PPP-like tool, to search these nicotinoproteins for subsequences best correlated across multiple genomes to the presence of mycofactocin. For both the short chain dehydrogenase/reductase (SDR) and iron-containing dehydrogenase families, aligning SIMBAL's top-scoring sequences to homologous solved crystal structures shows signals centered over NAD(P)-binding sites rather than over substrate-binding or active site residues. 
Previous studies on some of these proteins have revealed a non-exchangeable NAD cofactor, such that enzymatic activity in vitro requires an artificial electron acceptor such as N,N-dimethyl-4-nitrosoaniline (NDMA) for the enzyme to cycle. Conclusions Taken together, these findings suggest that the mycofactocin precursor is modified by the Rv0693 family rSAM protein and other enzymes in its cluster. It becomes an electron carrier molecule that serves in vivo as NDMA and other artificial electron acceptors do in vitro. Subclasses from three different nicotinoprotein families show "only-if" relationships to mycofactocin because they require its presence. This framework suggests a segregated redox pool in which mycofactocin mediates communication among enzymes with non-exchangeable cofactors.
Affiliation(s)
- Daniel H Haft
- J Craig Venter Institute, 9704 Rockville, MD 20850, USA.
|
19
|
|
20
|
Abstract
Improvements in nucleotide sequencing technology have resulted in an ever increasing number of nucleotide and protein sequences being deposited in databases. Unfortunately, the ability to manually classify and annotate these sequences cannot keep pace with their rapid generation, resulting in an increased bias toward unannotated sequence. Automatic annotation tools can help redress the balance. There are a number of different groups working to produce protein signatures that describe protein families, functional domains or conserved sites within related groups of proteins. Protein signature databases include CATH-Gene3D, HAMAP, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY, and TIGRFAMs. Their approaches range from characterising small conserved motifs that can identify members of a family or subfamily, to the use of hidden Markov models that describe the conservation of residues over entire domains or whole proteins. To increase their value as protein classification tools, protein signatures from these 11 databases have been combined into one, powerful annotation tool: the InterPro database (http://www.ebi.ac.uk/interpro/) (Hunter et al., Nucleic Acids Res 37:D211-D215, 2009). InterPro is an open-source protein resource used for the automatic annotation of proteins, and is scalable to the analysis of entire new genomes through the use of a downloadable version of InterProScan, which can be incorporated into an existing local pipeline. InterPro provides structural information from PDB (Kouranov et al., Nucleic Acids Res 34:D302-D305, 2006), its classification in CATH (Cuff et al., Nucleic Acids Res 37:D310-D314, 2009) and SCOP (Andreeva et al., Nucleic Acids Res 36:D419-D425, 2008), as well as homology models from ModBase (Pieper et al., Nucleic Acids Res 37:D347-D354, 2009) and SwissModel (Kiefer et al., Nucleic Acids Res 37:D387-D392, 2009), allowing a direct comparison of the protein signatures with the available structural information. 
This chapter reviews the signature methods found in the InterPro database, and provides an overview of the InterPro resource itself.
|
21
|
Madupu R, Brinkac LM, Harrow J, Wilming LG, Böhme U, Lamesch P, Hannick LI. Meeting report: a workshop on Best Practices in Genome Annotation. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2010; 2010:baq001. [PMID: 20428316 PMCID: PMC2860899 DOI: 10.1093/database/baq001] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 10/12/2009] [Revised: 01/08/2010] [Accepted: 01/11/2010] [Indexed: 01/28/2023]
Abstract
Efforts to annotate the genomes of a wide variety of model organisms are currently carried out by sequencing centers, model organism databases and academic/institutional laboratories around the world. Different annotation methods and tools have been developed over time to meet the needs of biologists faced with the task of annotating biological data. While standardized methods are essential for consistent curation within each annotation group, methods and tools can differ between groups, especially when the groups are curating different organisms. Biocurators from several institutes met at the Third International Biocuration Conference in Berlin, Germany, April 2009 and hosted the ‘Best Practices in Genome Annotation: Inference from Evidence’ workshop to share their strategies, pipelines, standards and tools. This article documents the material presented in the workshop.
Affiliation(s)
- Ramana Madupu
- Informatics, J. Craig Venter Institute, Rockville, MD 20850 USA, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK and The Arabidopsis Information Resource, Carnegie Institution of Washington, Stanford, CA 94305 USA
|
22
|
Selengut JD, Rusch DB, Haft DH. Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL): adapting the Partial Phylogenetic Profiling algorithm to scan sequences for signatures that predict protein function. BMC Bioinformatics 2010; 11:52. [PMID: 20102603 PMCID: PMC3098086 DOI: 10.1186/1471-2105-11-52] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 08/04/2009] [Accepted: 01/26/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Comparative genomics methods such as phylogenetic profiling can mine powerful inferences from inherently noisy biological data sets. We introduce Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL), a method that applies the Partial Phylogenetic Profiling (PPP) approach locally within a protein sequence to discover short sequence signatures associated with functional sites. The approach is based on the basic scoring mechanism employed by PPP, namely the use of binomial distribution statistics to optimize sequence similarity cutoffs during searches of partitioned training sets. RESULTS Here we illustrate and validate the ability of the SIMBAL method to find functionally relevant short sequence signatures by application to two well-characterized protein families. In the first example, we partitioned a family of ABC permeases using a metabolic background property (urea utilization). Thus, the TRUE set for this family comprised members whose genome of origin encoded a urea utilization system. By moving a sliding window across the sequence of a permease, and searching each subsequence in turn against the full set of partitioned proteins, the method found which local sequence signatures best correlated with the urea utilization trait. Mapping of SIMBAL "hot spots" onto crystal structures of homologous permeases reveals that the significant sites are gating determinants on the cytosolic face rather than, say, docking sites for the substrate-binding protein on the extracellular face. In the second example, we partitioned a protein methyltransferase family using gene proximity as a criterion. In this case, the TRUE set comprised those methyltransferases encoded near the gene for the substrate RF-1. SIMBAL identifies sequence regions that map onto the substrate-binding interface while ignoring regions involved in the methyltransferase reaction mechanism in general. 
Neither method for training set construction requires any prior experimental characterization. CONCLUSIONS SIMBAL shows that, in functionally divergent protein families, selected short sequences often significantly outperform their full-length parent sequence for making functional predictions by sequence similarity, suggesting avenues for improved functional classifiers. When combined with structural data, SIMBAL affords the ability to localize and model functional sites.
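The sliding-window idea in this abstract can be sketched in a few lines. This is a toy illustration, not the published SIMBAL implementation: the abstract does not name its sequence search tool, so the sketch swaps in a simple ungapped identity fraction, and every function name and sequence below is hypothetical. For each window it ranks the partitioned proteins by similarity, tries each similarity cutoff, and keeps the smallest binomial tail probability of seeing that many TRUE-set members above the cutoff by chance.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def window_similarity(window, target):
    """Best ungapped identity fraction of `window` against any stretch of `target`.

    Assumption: a stand-in for a real sequence similarity search.
    """
    w = len(window)
    best = 0.0
    for start in range(len(target) - w + 1):
        matches = sum(a == b for a, b in zip(window, target[start:start + w]))
        best = max(best, matches / w)
    return best


def best_cutoff_score(hits, background_rate):
    """Try every similarity cutoff; return the lowest binomial tail probability.

    hits: list of (similarity, is_true) pairs, one per protein in the partitioned set.
    """
    best = 1.0
    for cutoff in sorted({sim for sim, _ in hits}, reverse=True):
        selected = [is_true for sim, is_true in hits if sim >= cutoff]
        best = min(best, binomial_tail(sum(selected), len(selected), background_rate))
    return best


def simbal_scan(query, proteins, window=5):
    """Score each window of `query`; lower scores mean stronger association with TRUE.

    proteins: list of (sequence, is_true) pairs forming the partitioned training set.
    """
    background = sum(is_true for _, is_true in proteins) / len(proteins)
    scores = []
    for start in range(len(query) - window + 1):
        sub = query[start:start + window]
        hits = [(window_similarity(sub, seq), is_true) for seq, is_true in proteins]
        scores.append((start, best_cutoff_score(hits, background)))
    return scores


# Toy partition: TRUE members carry the motif WKDEL, FALSE members carry PQRSN.
toy_true = "MKTLAV" + "WKDEL" + "GHVRFY"
toy_false = "MKTLAV" + "PQRSN" + "GHVRFY"
toy_proteins = [(toy_true, True)] * 3 + [(toy_false, False)] * 3
toy_scores = dict(simbal_scan(toy_true, toy_proteins, window=5))
```

In the toy data, windows over the shared flanks match every protein equally well and score no better than chance, while windows overlapping the motif separate the TRUE set cleanly and receive the lowest score.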
Affiliation(s)
- Jeremy D Selengut
- J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA.
|
23
|
Ho Sui SJ, Fedynak A, Hsiao WWL, Langille MGI, Brinkman FSL. The association of virulence factors with genomic islands. PLoS One 2009; 4:e8094. [PMID: 19956607 PMCID: PMC2779486 DOI: 10.1371/journal.pone.0008094] [Citation(s) in RCA: 96] [Impact Index Per Article: 6.4] [Received: 10/06/2009] [Accepted: 11/07/2009] [Indexed: 01/20/2023] Open
Abstract
BACKGROUND It has been noted that many bacterial virulence factor genes are located within genomic islands (GIs; clusters of genes in a prokaryotic genome of probable horizontal origin). However, such studies have been limited to single genera or isolated observations. We have performed the first large-scale analysis of multiple diverse pathogens to examine this association. We additionally identified genes found predominantly in pathogens, but not non-pathogens, across multiple genera using 631 complete bacterial genomes, and we identified common trends in virulence for genes in GIs. Furthermore, we examined the relationship between GIs and clustered regularly interspaced palindromic repeats (CRISPRs) proposed to confer resistance to phage. METHODOLOGY/PRINCIPAL FINDINGS We show quantitatively that GIs disproportionately contain more virulence factors than the rest of a given genome (p<1E-40 using three GI datasets) and that CRISPRs are also over-represented in GIs. Virulence factors in GIs and pathogen-associated virulence factors are enriched for proteins having more "offensive" functions, e.g. active invasion of the host, and are disproportionately components of type III/IV secretion systems or toxins. Numerous hypothetical pathogen-associated genes were identified, meriting further study. CONCLUSIONS/SIGNIFICANCE This is the first systematic analysis across diverse genera indicating that virulence factors are disproportionately associated with GIs. "Offensive" virulence factors, as opposed to host-interaction factors, may more often be a recently acquired trait (on an evolutionary time scale detected by GI analysis). Newly identified pathogen-associated genes warrant further study. We discuss the implications of these results, which cement the significant role of GIs in the evolution of many pathogens.
Affiliation(s)
- Shannan J. Ho Sui
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, Canada
- Amber Fedynak
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, Canada
- William W. L. Hsiao
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, Canada
- Morgan G. I. Langille
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, Canada
- Fiona S. L. Brinkman
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, Canada
  - * E-mail:
|
24
|
Davidsen T, Beck E, Ganapathy A, Montgomery R, Zafar N, Yang Q, Madupu R, Goetz P, Galinsky K, White O, Sutton G. The comprehensive microbial resource. Nucleic Acids Res 2009; 38:D340-5. [PMID: 19892825 PMCID: PMC2808947 DOI: 10.1093/nar/gkp912] [Citation(s) in RCA: 82] [Impact Index Per Article: 5.5] [Indexed: 11/13/2022] Open
Abstract
The Comprehensive Microbial Resource or CMR (http://cmr.jcvi.org) provides a web-based central resource for the display, search and analysis of the sequence and annotation for complete and publicly available bacterial and archaeal genomes. In addition to displaying the original annotation from GenBank, the CMR makes available secondary automated structural and functional annotation across all genomes to provide consistent data types necessary for effective mining of genomic data. Precomputed homology searches are stored to allow meaningful genome comparisons. The CMR supplies users with over 50 different tools to utilize the sequence and annotation data across one or more of the 571 currently available genomes. At the gene level users can view the gene annotation and underlying evidence. Genome level information includes whole genome graphical displays, biochemical pathway maps and genome summary data. Comparative tools display analysis between genomes with homology and genome alignment tools, and searches across the accessions, annotation, and evidence assigned to all genes/genomes are available. The data and tools on the CMR aid genomic research and analysis, and the CMR is included in over 200 scientific publications. The code underlying the CMR website and the CMR database are freely available for download with no license restrictions.
25
Brinkac LM, Davidsen T, Beck E, Ganapathy A, Caler E, Dodson RJ, Durkin AS, Harkins DM, Lorenzi H, Madupu R, Sebastian Y, Shrivastava S, Thiagarajan M, Orvis J, Sundaram JP, Crabtree J, Galens K, Zhao Y, Inman JM, Montgomery R, Schobel S, Galinsky K, Tanenbaum DM, Resnick A, Zafar N, White O, Sutton G. Pathema: a clade-specific bioinformatics resource center for pathogen research. Nucleic Acids Res 2009; 38:D408-14. [PMID: 19843611 PMCID: PMC2808925 DOI: 10.1093/nar/gkp850] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 11/12/2022] Open
Abstract
Pathema (http://pathema.jcvi.org) is one of the eight Bioinformatics Resource Centers (BRCs) funded by the National Institute of Allergy and Infectious Diseases (NIAID) and designed to serve as a core resource for the bio-defense and infectious disease research community. Pathema strives to support basic research and accelerate scientific progress for understanding, detecting, diagnosing and treating an established set of six target NIAID Category A-C pathogens: the Category A priority pathogens Bacillus anthracis and Clostridium botulinum, and the Category B priority pathogens Burkholderia mallei, Burkholderia pseudomallei, Clostridium perfringens and Entamoeba histolytica. Each target pathogen is represented in one of four distinct clade-specific Pathema web resources and underlying databases developed to target the specific data and analysis needs of each scientific community. All publicly available complete genome projects of phylogenetically related organisms are also represented, providing a comprehensive collection of organisms for comparative analyses. Pathema facilitates the scientific exploration of genomic and related data through its integration with web-based analysis tools, customized to obtain, display, and compute results relevant to ongoing pathogen research. Pathema serves the bio-defense and infectious disease research community by disseminating data resulting from pathogen genome sequencing projects and providing access to the results of inter-genomic comparisons for these organisms.
26
Merhej V, Royer-Carenzi M, Pontarotti P, Raoult D. Massive comparative genomic analysis reveals convergent evolution of specialized bacteria. Biol Direct 2009; 4:13. [PMID: 19361336 PMCID: PMC2688493 DOI: 10.1186/1745-6150-4-13] [Citation(s) in RCA: 169] [Impact Index Per Article: 11.3] [Received: 03/30/2009] [Accepted: 04/10/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Genome size and gene content in bacteria are associated with their lifestyles. Obligate intracellular bacteria (i.e., mutualists and parasites) have small genomes that derived from larger free-living bacterial ancestors; however, the different steps of bacterial specialization from free-living to intracellular lifestyle have not been studied comprehensively. The growing number of available sequenced genomes makes it possible to perform a statistical comparative analysis of 317 genomes from bacteria with different lifestyles. RESULTS: Compared to free-living bacteria, host-dependent bacteria exhibit fewer rRNA genes, more split rRNA operons and fewer transcriptional regulators, linked to slower growth rates. We found a function-dependent and non-random loss of the same 100 orthologous genes in all obligate intracellular bacteria. Thus, we showed that obligate intracellular bacteria from different phyla are converging according to their lifestyle. Their specialization is an irreversible phenomenon characterized by translation modification and massive gene loss, including the loss of transcriptional regulators. Although both mutualists and parasites converge by genome reduction, these obligate intracellular bacteria have lost distinct sets of genes in the context of their specific host associations: mutualists have significantly more genes that enable nutrient provisioning whereas parasites have genes that encode Types II, IV, and VI secretion pathways. CONCLUSION: Our findings suggest that gene loss, rather than acquisition of virulence factors, has been a driving force in the adaptation of parasites to eukaryotic cells. This comparative genomic analysis helps to explore the strategies by which obligate intracellular genomes specialize to particular host-associations and contributes to advance our knowledge about the mechanisms of bacterial evolution.
Affiliation(s)
- Vicky Merhej
- Faculty of Medicine, Unit for Research on Emergent and Tropical Infectious Diseases, CNRS-IRD UMR 6236 IFR48, University of the Mediterranean, Marseilles, France.
27
Kastenmüller G, Schenk ME, Gasteiger J, Mewes HW. Uncovering metabolic pathways relevant to phenotypic traits of microbial genomes. Genome Biol 2009; 10:R28. [PMID: 19284550 PMCID: PMC2690999 DOI: 10.1186/gb-2009-10-3-r28] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Received: 08/25/2008] [Revised: 02/12/2009] [Accepted: 03/10/2009] [Indexed: 01/20/2023] Open
Abstract
Identifying the biochemical basis of microbial phenotypes is a main objective of comparative genomics. Here we present a novel method using multivariate machine learning techniques for comparing automatically derived metabolic reconstructions of sequenced genomes on a large scale. Applying our method to 266 genomes directly led to testable hypotheses such as the link between the potential of microorganisms to cause periodontal disease and their ability to degrade histidine, a link also supported by clinical studies.
Affiliation(s)
- Gabi Kastenmüller
- Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München - German Research Center for Environmental Health, Ingolstädter Landstraße, D-85764 Neuherberg, Germany
- Maria Elisabeth Schenk
- Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München - German Research Center for Environmental Health, Ingolstädter Landstraße, D-85764 Neuherberg, Germany
- Johann Gasteiger
- Computer-Chemie-Centrum, Universität Erlangen-Nürnberg, Nägelsbachstraße, D-91052 Erlangen, Germany
- Molecular Networks GmbH, Henkestraße 91, D-91052 Erlangen, Germany
- Hans-Werner Mewes
- Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München - German Research Center for Environmental Health, Ingolstädter Landstraße, D-85764 Neuherberg, Germany
- Chair for Genome-oriented Bioinformatics, Technische Universität München, Life and Food Science Center Weihenstephan, Am Forum 1, D-85354 Freising-Weihenstephan, Germany
28
Field D, Garrity G, Gray T, Morrison N, Selengut J, Sterk P, Tatusova T, Thomson N, Allen MJ, Angiuoli SV, Ashburner M, Axelrod N, Baldauf S, Ballard S, Boore J, Cochrane G, Cole J, Dawyndt P, De Vos P, DePamphilis C, Edwards R, Faruque N, Feldman R, Gilbert J, Gilna P, Glöckner FO, Goldstein P, Guralnick R, Haft D, Hancock D, Hermjakob H, Hertz-Fowler C, Hugenholtz P, Joint I, Kagan L, Kane M, Kennedy J, Kowalchuk G, Kottmann R, Kolker E, Kravitz S, Kyrpides N, Leebens-Mack J, Lewis SE, Li K, Lister AL, Lord P, Maltsev N, Markowitz V, Martiny J, Methe B, Mizrachi I, Moxon R, Nelson K, Parkhill J, Proctor L, White O, Sansone SA, Spiers A, Stevens R, Swift P, Taylor C, Tateno Y, Tett A, Turner S, Ussery D, Vaughan B, Ward N, Whetzel T, San Gil I, Wilson G, Wipat A. The minimum information about a genome sequence (MIGS) specification. Nat Biotechnol 2008; 26:541-7. [PMID: 18464787 PMCID: PMC2409278 DOI: 10.1038/nbt1360] [Citation(s) in RCA: 969] [Impact Index Per Article: 60.6] [Indexed: 01/01/2023]
Abstract
With the quantity of genomic data increasing at an exponential rate, it is imperative that these data be captured electronically, in a standard format. Standardization activities must proceed within the auspices of open-access and international working bodies. To tackle the issues surrounding the development of better descriptions of genomic investigations, we have formed the Genomic Standards Consortium (GSC). Here, we introduce the minimum information about a genome sequence (MIGS) specification with the intent of promoting participation in its development and discussing the resources that will be required to develop improved mechanisms of metadata capture and exchange. As part of its wider goals, the GSC also supports improving the 'transparency' of the information contained in existing genomic databases.
Affiliation(s)
- Dawn Field
- Natural Environmental Research Council Centre for Ecology and Hydrology, Oxford OX1 3SR, UK.
29
Choi K, Kim S. ComPath: comparative enzyme analysis and annotation in pathway/subsystem contexts. BMC Bioinformatics 2008; 9:145. [PMID: 18325116 PMCID: PMC2277404 DOI: 10.1186/1471-2105-9-145] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 04/08/2007] [Accepted: 03/06/2008] [Indexed: 11/16/2022] Open
Abstract
Background: Once a new genome is sequenced, one of the important questions is to determine the presence and absence of biological pathways. Analysis of biological pathways in a genome is a complicated task since a number of biological entities are involved in pathways and biological pathways in different organisms are not identical. Computational pathway identification and analysis thus involves a number of computational tools and databases and is typically done in comparison with pathways in other organisms. This computational requirement is much beyond the capability of biologists, so information systems for reconstructing, annotating, and analyzing biological pathways are much needed. We introduce a new comparative pathway analysis workbench, ComPath, which integrates various resources and computational tools using an interactive spreadsheet-style web interface for reliable pathway analyses. Results: ComPath allows users to compare biological pathways in multiple genomes using a spreadsheet-style web interface where various sequence-based analyses can be performed to compare enzymes (e.g. sequence clustering) and pathways (e.g. pathway hole identification), to search a genome for de novo prediction of enzymes, or to annotate a genome in comparison with reference genomes of choice. To fill in pathway holes or make de novo enzyme predictions, multiple computational methods such as FASTA, Whole-HMM, CSR-HMM (a method of our own introduced in this paper), and PDB-domain search are integrated in ComPath. Our experiments show that the FASTA and CSR-HMM search methods generally outperform the Whole-HMM and PDB-domain search methods in terms of sensitivity, but FASTA search performs poorly in terms of specificity, detecting more false positives as the E-value cutoff increases. Overall, the CSR-HMM search method performs best in terms of both sensitivity and specificity. Gene neighborhood and pathway neighborhood (global network) visualization tools can be used to get context information that is complementary to the conventional KEGG map representation. Conclusion: ComPath is an interactive workbench for pathway reconstruction, annotation, and analysis where experts can perform various sequence, domain, and context analyses using an intuitive and interactive spreadsheet-style interface.
Affiliation(s)
- Kwangmin Choi
- School of Informatics, Indiana University, Bloomington, IN 47408, USA.
30
Replacement of the Arginine Biosynthesis Operon in Xanthomonadales by Lateral Gene Transfer. J Mol Evol 2008; 66:266-75. [DOI: 10.1007/s00239-008-9082-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 03/24/2007] [Revised: 07/23/2007] [Accepted: 01/25/2008] [Indexed: 11/30/2022]
31
Haft DH, Self WT. Orphan SelD proteins and selenium-dependent molybdenum hydroxylases. Biol Direct 2008; 3:4. [PMID: 18289380 PMCID: PMC2276186 DOI: 10.1186/1745-6150-3-4] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Received: 02/11/2008] [Accepted: 02/20/2008] [Indexed: 11/24/2022] Open
Abstract
Bacterial and Archaeal cells use selenium structurally in selenouridine-modified tRNAs, in proteins translated with selenocysteine, and in the selenium-dependent molybdenum hydroxylases (SDMH). The first two uses both require the selenophosphate synthetase gene, selD. Examining over 500 complete prokaryotic genomes finds selD in exactly two species lacking both the selenocysteine and selenouridine systems, Enterococcus faecalis and Haloarcula marismortui. Surrounding these orphan selD genes, forming bidirectional best hits between species, and detectable by Partial Phylogenetic Profiling vs. selD, are several candidate molybdenum hydroxylase subunits and accessory proteins. We propose that certain accessory proteins, and orphan selD itself, are markers through which new selenium-dependent molybdenum hydroxylases can be found.
Affiliation(s)
- Daniel H Haft
- Department of Bioinformatics, J. Craig Venter Institute, Rockville, MD 20850, USA.
32
Annotation, comparison and databases for hundreds of bacterial genomes. Res Microbiol 2007; 158:724-36. [DOI: 10.1016/j.resmic.2007.09.009] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.7] [Received: 07/29/2007] [Revised: 09/21/2007] [Accepted: 09/26/2007] [Indexed: 11/20/2022]
33
Markowitz VM. Microbial genome data resources. Curr Opin Biotechnol 2007; 18:267-72. [PMID: 17467973 DOI: 10.1016/j.copbio.2007.04.005] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 01/26/2007] [Revised: 03/18/2007] [Accepted: 04/18/2007] [Indexed: 11/17/2022]
Abstract
Studies of the genomes of individual microbial organisms as well as aggregate genomes (metagenomes) of microbial communities are expected to lead to advances in various areas, such as healthcare, environmental cleanup, and alternative energy production. A variety of specialized data resources manage the results of different microbial genome data processing and interpretation stages, and represent different degrees of microbial genome characterization. Scientists studying microbial genomes and metagenomes often need one or several of these resources. Given their diversity, these resources cannot be used effectively without determining the scope and type of individual resources as well as the relationship between their data.
Affiliation(s)
- Victor M Markowitz
- Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Mail Stop 50A-1148, Berkeley CA 94720, USA.
34
Greene JM, Collins F, Lefkowitz EJ, Roos D, Scheuermann RH, Sobral B, Stevens R, White O, Di Francesco V. National Institute of Allergy and Infectious Diseases bioinformatics resource centers: new assets for pathogen informatics. Infect Immun 2007; 75:3212-9. [PMID: 17420237 PMCID: PMC1932942 DOI: 10.1128/iai.00105-07] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.8] [Indexed: 11/20/2022] Open
Affiliation(s)
- John M Greene
- National Institute of Allergy and Infectious Diseases/NIH, 6610 Rockledge Drive, MSC 6603, Bethesda, MD 20850-6603, USA
35
Abstract
The recent completion of the Human Genome Project has made possible a high-throughput "systems approach" for accelerating the elucidation of molecular underpinnings of human diseases, and subsequent derivation of molecular-based strategies to more effectively prevent, diagnose, and treat these diseases. Although altered phenotypes are among the most reliable manifestations of altered gene functions, research using systematic analysis of phenotype relationships to study human biology is still in its infancy. This article focuses on the emerging field of high-throughput phenotyping (HTP) phenomics research, which aims to capitalize on novel high-throughput computation and informatics technology developments to derive genomewide molecular networks of genotype-phenotype associations, or "phenomic associations." The HTP phenomics research field faces the challenge of technological research and development to generate novel tools in computation and informatics that will allow researchers to amass, access, integrate, organize, and manage phenotypic databases across species and enable genomewide analysis to associate phenotypic information with genomic data at different scales of biology. Key state-of-the-art technological advancements critical for HTP phenomics research are covered in this review. In particular, we highlight the power of computational approaches to conduct large-scale phenomics studies.
Affiliation(s)
- Yves A Lussier
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois 60637, USA.
36
Osterman AL, Begley TP. A subsystems-based approach to the identification of drug targets in bacterial pathogens. PROGRESS IN DRUG RESEARCH. FORTSCHRITTE DER ARZNEIMITTELFORSCHUNG. PROGRES DES RECHERCHES PHARMACEUTIQUES 2007; 64:131, 133-70. [PMID: 17195474 DOI: 10.1007/978-3-7643-7567-6_6] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Indexed: 02/10/2023]
Abstract
This chapter describes a three-stage approach to target identification based upon subsystem analysis. Subsystems analysis focuses on related metabolic pathways as a unit and is a biochemically-informed approach to target selection. The process involves three stages of analysis; the first stage, selection of the target subsystem, is guided by information about its essentiality and on the predicted vulnerability of the targeted pathway or enzyme to inhibition. The second stage involves analysis of the target subsystem by means of comparative genomics, including genome context analysis and metabolic reconstruction. The third stage evaluates the selection of the specific target genes within the subsystem by target prioritization and validation. The whole process allows for a careful consideration of spectrum, drugability, biological rationale and the metabolic role of the specific target within the context of an integrated circuit within a specific metabolic pathway.
Affiliation(s)
- Andrei L Osterman
- Burnham Institute for Medical Research, Infectious and Inflammatory Disease Center, La Jolla, California, USA.
37
Abstract
As the molecular adapters between codons and amino acids, transfer-RNAs are pivotal molecules of the genetic code. The coding properties of a tRNA molecule do not reside only in its primary sequence. Posttranscriptional nucleoside modifications, particularly in the anticodon loop, can modify cognate codon recognition, affect aminoacylation properties, or stabilize the codon-anticodon wobble base pairing to prevent ribosomal frameshifting. Despite a wealth of biophysical and structural knowledge of the tRNA modifications themselves, their pathways of biosynthesis had been until recently only partially characterized. This discrepancy was mainly due to the lack of obvious phenotypes for tRNA modification-deficient strains and to the difficulty of the biochemical assays used to detect tRNA modifications. However, the availability of hundreds of whole-genome sequences has allowed the identification of many of these missing tRNA-modification genes. This chapter reviews the methods that were used to identify these genes with a special emphasis on the comparative genomic approaches. Methods that link gene and function but do not rely on sequence homology will be detailed, with examples taken from the tRNA modification field.
38
Selengut JD, Haft DH, Davidsen T, Ganapathy A, Gwinn-Giglio M, Nelson WC, Richter AR, White O. TIGRFAMs and Genome Properties: tools for the assignment of molecular function and biological process in prokaryotic genomes. Nucleic Acids Res 2006; 35:D260-4. [PMID: 17151080 PMCID: PMC1781115 DOI: 10.1093/nar/gkl1043] [Citation(s) in RCA: 229] [Impact Index Per Article: 12.7] [Indexed: 11/13/2022] Open
Abstract
TIGRFAMs is a collection of protein family definitions built to aid in high-throughput annotation of specific protein functions. Each family is based on a hidden Markov model (HMM), where both cutoff scores and membership in the seed alignment are chosen so that the HMMs can classify numerous proteins according to their specific molecular functions. Most TIGRFAMs models describe ‘equivalog’ families, where both orthology and lateral gene transfer may be part of the evolutionary history, but where a single molecular function has been conserved. The Genome Properties system contains a queriable set of metabolic reconstructions, genome metrics and extractions of information from the scientific literature. Its genome-by-genome assertions of whether or not specific structures, pathways or systems are present provide high-level conceptual descriptions of genomic content. These assertions enable comparative genomics, provide a meaningful biological context to aid in manual annotation, support assignments of Gene Ontology (GO) biological process terms and help validate HMM-based predictions of protein function. The Genome Properties system is particularly useful as a generator of phylogenetic profiles, through which new protein family functions may be discovered. The TIGRFAMs and Genome Properties systems can be accessed at and .
Affiliation(s)
- Jeremy D Selengut
- TIGR, Bioinformatics Department, 9712 Medical Center Drive, Rockville, MD 20850, USA.
39
Mattes WB. Cross-species comparative toxicogenomics as an aid to safety assessment. Expert Opin Drug Metab Toxicol 2006; 2:859-74. [PMID: 17125406 DOI: 10.1517/17425255.2.6.859] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 01/28/2023]
Abstract
Cross-species comparative toxicogenomics has the potential for improving the understanding of the different responses of animal models to toxicants at a molecular level. This understanding could then lead to a more accurate extrapolation of the risk posed by these toxicants to humans. Cross-species comparative studies have been carried out at the genomic sequence level and using microarrays to examine changes in global mRNA profiles. However, these studies face considerable bioinformatic challenges in terms of identifying which genes are truly orthologous across species. The resources to analyse such studies, in the context of such orthologues, beg improvement. Finally, the experimental design of such studies needs to be carefully considered to make their results fully interpretable. These issues are discussed, along with the current state-of-the-art cross-species comparative toxicogenomics in this review.
40
Badger JH, Hoover TR, Brun YV, Weiner RM, Laub MT, Alexandre G, Mrázek J, Ren Q, Paulsen IT, Nelson KE, Khouri HM, Radune D, Sosa J, Dodson RJ, Sullivan SA, Rosovitz MJ, Madupu R, Brinkac LM, Durkin AS, Daugherty SC, Kothari SP, Giglio MG, Zhou L, Haft DH, Selengut JD, Davidsen TM, Yang Q, Zafar N, Ward NL. Comparative genomic evidence for a close relationship between the dimorphic prosthecate bacteria Hyphomonas neptunium and Caulobacter crescentus. J Bacteriol 2006; 188:6841-50. [PMID: 16980487 PMCID: PMC1595504 DOI: 10.1128/jb.00111-06] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.4] [Indexed: 11/20/2022] Open
Abstract
The dimorphic prosthecate bacteria (DPB) are alpha-proteobacteria that reproduce in an asymmetric manner rather than by binary fission and are of interest as simple models of development. Prior to this work, the only member of this group for which genome sequence was available was the model freshwater organism Caulobacter crescentus. Here we describe the genome sequence of Hyphomonas neptunium, a marine member of the DPB that differs from C. crescentus in that H. neptunium uses its stalk as a reproductive structure. Genome analysis indicates that this organism shares more genes with C. crescentus than it does with Silicibacter pomeroyi (a closer relative according to 16S rRNA phylogeny), that it relies upon a heterotrophic strategy utilizing a wide range of substrates, that its cell cycle is likely to be regulated in a similar manner to that of C. crescentus, and that the outer membrane complements of H. neptunium and C. crescentus are remarkably similar. H. neptunium swarmer cells are highly motile via a single polar flagellum. With the exception of cheY and cheR, genes required for chemotaxis were absent in the H. neptunium genome. Consistent with this observation, H. neptunium swarmer cells did not respond to any chemotactic stimuli that were tested, which suggests that H. neptunium motility is a random dispersal mechanism for swarmer cells rather than a stimulus-controlled navigation system for locating specific environments. In addition to providing insights into bacterial development, the H. neptunium genome will provide an important resource for the study of other interesting biological processes including chromosome segregation, polar growth, and cell aging.
Affiliation(s)
- Jonathan H Badger
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA.
41
Liu Y, Li J, Sam L, Goh CS, Gerstein M, Lussier YA. An integrative genomic approach to uncover molecular mechanisms of prokaryotic traits. PLoS Comput Biol 2006; 2:e159. [PMID: 17112314 PMCID: PMC1636675 DOI: 10.1371/journal.pcbi.0020159] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 05/05/2006] [Accepted: 10/10/2006] [Indexed: 11/18/2022] Open
Abstract
With mounting availability of genomic and phenotypic databases, data integration and mining become increasingly challenging. While efforts have been put forward to analyze prokaryotic phenotypes, current computational technologies either lack high throughput capacity for genomic scale analysis, or are limited in their capability to integrate and mine data across different scales of biology. Consequently, simultaneous analysis of associations among genomes, phenotypes, and gene functions is prohibited. Here, we developed a high throughput computational approach, and demonstrated for the first time the feasibility of integrating large quantities of prokaryotic phenotypes along with genomic datasets for mining across multiple scales of biology (protein domains, pathways, molecular functions, and cellular processes). Applying this method over 59 fully sequenced prokaryotic species, we identified genetic basis and molecular mechanisms underlying the phenotypes in bacteria. We identified 3,711 significant correlations between 1,499 distinct Pfam and 63 phenotypes, with 2,650 correlations and 1,061 anti-correlations. Manual evaluation of a random sample of these significant correlations showed a minimal precision of 30% (95% confidence interval: 20%-42%; n = 50). We stratified the most significant 478 predictions and subjected 100 to manual evaluation, of which 60 were corroborated in the literature. We furthermore unveiled 10 significant correlations between phenotypes and KEGG pathways, eight of which were corroborated in the evaluation, and 309 significant correlations between phenotypes and 166 GO concepts evaluated using a random sample (minimal precision = 72%; 95% confidence interval: 60%-80%; n = 50). Additionally, we conducted a novel large-scale phenomic visualization analysis to provide insight into the modular nature of common molecular mechanisms spanning multiple biological scales and reused by related phenotypes (metaphenotypes). 
We propose that this method elucidates which classes of molecular mechanisms are associated with phenotypes or metaphenotypes and holds promise in facilitating a computable systems biology approach to genomic and biomedical research.
Affiliation(s)
- Yang Liu
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Center for Biomedical Informatics, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Jianrong Li
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Center for Biomedical Informatics, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Lee Sam
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Center for Biomedical Informatics, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Chern-Sing Goh
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Department of Computer Science, Yale University, New Haven, Connecticut, United States of America
- * To whom correspondence should be addressed (MG, YAL).
- Yves A Lussier
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Center for Biomedical Informatics, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Department of Biomedical Informatics, Columbia University, New York, New York, United States of America
- * To whom correspondence should be addressed (MG, YAL).
42
Seshadri R, Joseph SW, Chopra AK, Sha J, Shaw J, Graf J, Haft D, Wu M, Ren Q, Rosovitz MJ, Madupu R, Tallon L, Kim M, Jin S, Vuong H, Stine OC, Ali A, Horneman AJ, Heidelberg JF. Genome sequence of Aeromonas hydrophila ATCC 7966T: jack of all trades. J Bacteriol 2006; 188:8272-82. [PMID: 16980456 PMCID: PMC1698176 DOI: 10.1128/jb.00621-06] [Citation(s) in RCA: 259] [Impact Index Per Article: 14.4] [Indexed: 11/20/2022] Open
Abstract
The complete genome of Aeromonas hydrophila ATCC 7966(T) was sequenced. Aeromonas, a ubiquitous waterborne bacterium, has been placed by the Environmental Protection Agency on the Contaminant Candidate List because of its potential to cause human disease. The 4.7-Mb genome of this emerging pathogen shows a physiologically adroit organism with broad metabolic capabilities and considerable virulence potential. A large array of virulence genes, including some identified in clinical isolates of Aeromonas spp. or Vibrio spp., may confer upon this organism the ability to infect a wide range of hosts. However, two recognized virulence markers, a type III secretion system and a lateral flagellum, that are reported in other A. hydrophila strains are not identified in the sequenced isolate, ATCC 7966(T). Given the ubiquity and free-living lifestyle of this organism, there is relatively little evidence of fluidity in terms of mobile elements in the genome of this particular strain. Notable aspects of the metabolic repertoire of A. hydrophila include dissimilatory sulfate reduction and resistance mechanisms (such as thiopurine reductase, arsenate reductase, and phosphonate degradation enzymes) against toxic compounds encountered in polluted waters. These enzymes may have bioremediative as well as industrial potential. Thus, the A. hydrophila genome sequence provides valuable insights into its ability to flourish in both aquatic and host environments.
Affiliation(s)
- Rekha Seshadri
- The Institute for Genomic Research, Division of J. Craig Venter Institute, Rockville, MD 20850, USA.
|
43
|
Field D, Wilson G, van der Gast C. How do we compare hundreds of bacterial genomes? Curr Opin Microbiol 2006; 9:499-504. [PMID: 16942900 DOI: 10.1016/j.mib.2006.08.008] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.6] [Received: 07/11/2006] [Accepted: 08/16/2006] [Indexed: 11/26/2022]
Abstract
The genomic revolution is fully upon us in 2006 and the pace of discovery is set to accelerate with the emergence of ultra-high-throughput sequencing technologies. Our complete genome collection of bacteria and archaea continues to grow in number and diversity, as genome sequencing is applied to an array of new problems, from the characterization of the pan-genome to the detection of mutation after experimentation and the exploration of microbial communities in unprecedented detail. The benefits of large-scale comparative genomic analyses are driving the community to think about how to manage our public collections of genomes in novel ways.
Affiliation(s)
- Dawn Field
- Oxford Centre for Ecology and Hydrology, Oxford OX1 3SR, UK.
|
44
|
Haft DH, Paulsen IT, Ward N, Selengut JD. Exopolysaccharide-associated protein sorting in environmental organisms: the PEP-CTERM/EpsH system. Application of a novel phylogenetic profiling heuristic. BMC Biol 2006; 4:29. [PMID: 16930487 PMCID: PMC1569441 DOI: 10.1186/1741-7007-4-29] [Citation(s) in RCA: 70] [Impact Index Per Article: 3.9] [Received: 05/22/2006] [Accepted: 08/24/2006] [Indexed: 11/13/2022] Open
Abstract
Background Protein translocation to the proper cellular destination may be guided by various classes of sorting signals recognizable in the primary sequence. Detection in some genomes, but not others, may reveal sorting system components by comparison of the phylogenetic profile of the class of sorting signal to that of various protein families. Results We describe a short C-terminal homology domain, sporadically distributed in bacteria, with several key characteristics of protein sorting signals. The domain includes a near-invariant motif Pro-Glu-Pro (PEP). This possible recognition or processing site is followed by a predicted transmembrane helix and a cluster rich in basic amino acids. We designate this domain PEP-CTERM. It tends to occur multiple times in a genome if it occurs at all, with a median count of eight instances; Verrucomicrobium spinosum has sixty-five. PEP-CTERM-containing proteins generally contain an N-terminal signal peptide and exhibit high diversity and little homology to known proteins. All bacteria with PEP-CTERM have both an outer membrane and exopolysaccharide (EPS) production genes. By a simple heuristic for screening phylogenetic profiles in the absence of pre-formed protein families, we discovered that a homolog of the membrane protein EpsH (exopolysaccharide locus protein H) occurs in a species when PEP-CTERM domains are found. The EpsH family contains invariant residues consistent with a transpeptidase function. Most PEP-CTERM proteins are encoded by single-gene operons preceded by large intergenic regions. In the Proteobacteria, most of these upstream regions share a DNA sequence, a probable cis-regulatory site that contains a sigma-54 binding motif. The phylogenetic profile for this DNA sequence exactly matches that of three proteins: a sigma-54-interacting response regulator (PrsR), a transmembrane histidine kinase (PrsK), and a TPR protein (PrsT). 
Conclusion These findings are consistent with the hypothesis that PEP-CTERM and EpsH form a protein export sorting system, analogous to the LPXTG/sortase system of Gram-positive bacteria, and correlated to EPS expression. It occurs preferentially in bacteria from sediments, soils, and biofilms. The novel method that led to these findings, partial phylogenetic profiling, requires neither global sequence clustering nor arbitrary similarity cutoffs and appears to be a rapid, effective alternative to other profiling methods.
Affiliation(s)
- Daniel H Haft
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville MD 20850, USA
- Ian T Paulsen
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville MD 20850, USA
- Naomi Ward
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville MD 20850, USA
- Jeremy D Selengut
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville MD 20850, USA
|
45
|
Mormann S, Lömker A, Rückert C, Gaigalat L, Tauch A, Pühler A, Kalinowski J. Random mutagenesis in Corynebacterium glutamicum ATCC 13032 using an IS6100-based transposon vector identified the last unknown gene in the histidine biosynthesis pathway. BMC Genomics 2006; 7:205. [PMID: 16901339 PMCID: PMC1590026 DOI: 10.1186/1471-2164-7-205] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.6] [Received: 05/19/2006] [Accepted: 08/10/2006] [Indexed: 12/02/2022] Open
Abstract
BACKGROUND Corynebacterium glutamicum, a Gram-positive bacterium of the class Actinobacteria, is an industrially relevant producer of amino acids. Several methods for the targeted genetic manipulation of this organism and rational strain improvement have been developed. An efficient transposon mutagenesis system for the completely sequenced type strain ATCC 13032 would significantly advance functional genome analysis in this bacterium. RESULTS A comprehensive transposon mutant library comprising 10,080 independent clones was constructed by electrotransformation of the restriction-deficient derivative of strain ATCC 13032, C. glutamicum RES167, with an IS6100-containing non-replicative plasmid. Transposon mutants had stable cointegrates between the transposon vector and the chromosome. Altogether 172 transposon integration sites have been determined by sequencing of the chromosomal inserts, revealing that each integration occurred at a different locus. Statistical target site analyses revealed an apparent absence of a target site preference. From the library, auxotrophic mutants were obtained with a frequency of 2.9%. By auxanography analyses nearly two thirds of the auxotrophs were further characterized, including mutants with single, double and alternative nutritional requirements. In most cases the nutritional requirement observed could be correlated to the annotation of the mutated gene involved in the biosynthesis of an amino acid, a nucleotide or a vitamin. One notable exception was a clone mutagenized by transposition into the gene cg0910, which exhibited an auxotrophy for histidine. The protein sequence deduced from cg0910 showed high sequence similarities to inositol-1(or 4)-monophosphatases (EC 3.1.3.25). Subsequent genetic deletion of cg0910 delivered the same histidine-auxotrophic phenotype. 
Genetic complementation of the mutants as well as supplementation by histidinol suggests that cg0910 encodes the hitherto unknown essential L-histidinol-phosphate phosphatase (EC 3.1.3.15) in C. glutamicum. The cg0910 gene, renamed hisN, and its encoded enzyme have putative orthologs in almost all Actinobacteria, including mycobacteria and streptomycetes. CONCLUSION The absence of regional and sequence preferences of IS6100 transposition demonstrates that the established system is suitable for efficient genome-scale random mutagenesis in the sequenced type strain C. glutamicum ATCC 13032. The identification of the hisN gene encoding histidinol-phosphate phosphatase in C. glutamicum closed the last gap in histidine synthesis in the Actinobacteria. The system might also be a valuable genetic tool in other bacteria due to the broad host spectrum of IS6100.
Affiliation(s)
- Sascha Mormann
- Institut für Genomforschung, Universität Bielefeld, D-33594 Bielefeld, Germany
- Lehrstuhl für Genetik, Universität Bielefeld, D-33594 Bielefeld, Germany
- Alexander Lömker
- Institut für Genomforschung, Universität Bielefeld, D-33594 Bielefeld, Germany
- Lehrstuhl für Genetik, Universität Bielefeld, D-33594 Bielefeld, Germany
- Christian Rückert
- Institut für Genomforschung, Universität Bielefeld, D-33594 Bielefeld, Germany
- Lehrstuhl für Genetik, Universität Bielefeld, D-33594 Bielefeld, Germany
- Lars Gaigalat
- Institut für Genomforschung, Universität Bielefeld, D-33594 Bielefeld, Germany
- Lehrstuhl für Genetik, Universität Bielefeld, D-33594 Bielefeld, Germany
- Andreas Tauch
- Institut für Genomforschung, Universität Bielefeld, D-33594 Bielefeld, Germany
- Alfred Pühler
- Lehrstuhl für Genetik, Universität Bielefeld, D-33594 Bielefeld, Germany
- Jörn Kalinowski
- Institut für Genomforschung, Universität Bielefeld, D-33594 Bielefeld, Germany
|
46
|
Schmidt T, Frishman D. PROMPT: a protein mapping and comparison tool. BMC Bioinformatics 2006; 7:331. [PMID: 16817977 PMCID: PMC1569443 DOI: 10.1186/1471-2105-7-331] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 04/28/2006] [Accepted: 07/04/2006] [Indexed: 11/12/2022] Open
Abstract
Background Comparison of large protein datasets has become a standard task in bioinformatics. Typically researchers wish to know whether one group of proteins is significantly enriched in certain annotation attributes or sequence properties compared to another group, and whether this enrichment is statistically significant. In order to conduct such comparisons it is often required to integrate molecular sequence data and experimental information from disparate incompatible sources. While many specialized programs exist for comparisons of this kind in individual problem domains, such as expression data analysis, no generic software solution capable of addressing a wide spectrum of routine tasks in comparative proteomics is currently available. Results PROMPT is a comprehensive bioinformatics software environment which enables the user to compare arbitrary protein sequence sets, revealing statistically significant differences in their annotation features. It allows automatic retrieval and integration of data from a multitude of molecular biological databases as well as from a custom XML format. Similarity-based mapping of sequence IDs makes it possible to link experimental information obtained from different sources despite discrepancies in gene identifiers and minor sequence variation. PROMPT provides a full set of statistical procedures to address the following four use cases: i) comparison of the frequencies of categorical annotations between two sets, ii) enrichment of nominal features in one set with respect to another one, iii) comparison of numeric distributions, and iv) correlation of numeric variables. Analysis results can be visualized in the form of plots and spreadsheets and exported in various formats, including Microsoft Excel. Conclusion PROMPT is a versatile, platform-independent, easily expandable, stand-alone application designed to be a practical workhorse in analysing and mining protein sequences and associated annotation. 
The availability of the Java Application Programming Interface and scripting capabilities on one hand, and the intuitive Graphical User Interface with context-sensitive help system on the other, make it equally accessible to professional bioinformaticians and biologically-oriented users. PROMPT is freely available for academic users from .
Affiliation(s)
- Thorsten Schmidt
- Department of Genome Oriented Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany
- Dmitrij Frishman
- Department of Genome Oriented Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany
|
47
|
Gerdes SY, Kurnasov OV, Shatalin K, Polanuyer B, Sloutsky R, Vonstein V, Overbeek R, Osterman AL. Comparative genomics of NAD biosynthesis in cyanobacteria. J Bacteriol 2006; 188:3012-23. [PMID: 16585762 PMCID: PMC1446974 DOI: 10.1128/jb.188.8.3012-3023.2006] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.2] [Received: 10/17/2005] [Accepted: 01/23/2006] [Indexed: 11/20/2022] Open
Abstract
Biosynthesis of NAD(P) cofactors is of special importance for cyanobacteria due to their role in photosynthesis and respiration. Despite significant progress in understanding NAD(P) biosynthetic machinery in some model organisms, relatively little is known about its implementation in cyanobacteria. We addressed this problem by a combination of comparative genome analysis with verification experiments in the model system of Synechocystis sp. strain PCC 6803. A detailed reconstruction of the NAD(P) metabolic subsystem using the SEED genomic platform (http://theseed.uchicago.edu/FIG/index.cgi) helped us accurately annotate respective genes in the entire set of 13 cyanobacterial species with completely sequenced genomes available at the time. Comparative analysis of operational variants implemented in this divergent group allowed us to elucidate both conserved (de novo and universal pathways) and variable (recycling and salvage pathways) aspects of this subsystem. Focused genetic and biochemical experiments confirmed several conjectures about the key aspects of this subsystem. (i) The product of the slr1691 gene, a homolog of Escherichia coli gene nadE containing an additional nitrilase-like N-terminal domain, is a NAD synthetase capable of utilizing glutamine as an amide donor in vitro. (ii) The product of the sll1916 gene, a homolog of E. coli gene nadD, is a nicotinic acid mononucleotide-preferring adenylyltransferase. This gene is essential for survival and cannot be compensated for by an alternative nicotinamide mononucleotide (NMN)-preferring adenylyltransferase (slr0787 gene). (iii) The product of the slr0788 gene is a nicotinamide-preferring phosphoribosyltransferase involved in the first step of the two-step non-deamidating utilization of nicotinamide (NMN shunt). 
(iv) The physiological role of this pathway encoded by a conserved gene cluster, slr0787-slr0788, is likely in the recycling of endogenously generated nicotinamide, as supported by the inability of this organism to utilize exogenously provided niacin. Positional clustering and the co-occurrence profile of the respective genes across a diverse collection of cellular organisms provide evidence of horizontal transfer events in the evolutionary history of this pathway.
Affiliation(s)
- Svetlana Y. Gerdes
- Fellowship for Interpretation of Genomes, Burr Ridge, Illinois 60527, Burnham Institute for Medical Research, La Jolla, California 92037, Department of Biochemistry, New York University School of Medicine, New York, New York 10016, Rohm and Haas Company, Advanced Biosciences Division, Spring House, Pennsylvania 19477, Department of Molecular Virology, Immunology, and Medical Genetics, Ohio State University, Columbus, Ohio 43210
- Oleg V. Kurnasov
- Fellowship for Interpretation of Genomes, Burr Ridge, Illinois 60527, Burnham Institute for Medical Research, La Jolla, California 92037, Department of Biochemistry, New York University School of Medicine, New York, New York 10016, Rohm and Haas Company, Advanced Biosciences Division, Spring House, Pennsylvania 19477, Department of Molecular Virology, Immunology, and Medical Genetics, Ohio State University, Columbus, Ohio 43210
- Konstantin Shatalin
- Fellowship for Interpretation of Genomes, Burr Ridge, Illinois 60527, Burnham Institute for Medical Research, La Jolla, California 92037, Department of Biochemistry, New York University School of Medicine, New York, New York 10016, Rohm and Haas Company, Advanced Biosciences Division, Spring House, Pennsylvania 19477, Department of Molecular Virology, Immunology, and Medical Genetics, Ohio State University, Columbus, Ohio 43210
- Boris Polanuyer
- Fellowship for Interpretation of Genomes, Burr Ridge, Illinois 60527, Burnham Institute for Medical Research, La Jolla, California 92037, Department of Biochemistry, New York University School of Medicine, New York, New York 10016, Rohm and Haas Company, Advanced Biosciences Division, Spring House, Pennsylvania 19477, Department of Molecular Virology, Immunology, and Medical Genetics, Ohio State University, Columbus, Ohio 43210
- Roman Sloutsky
- Fellowship for Interpretation of Genomes, Burr Ridge, Illinois 60527, Burnham Institute for Medical Research, La Jolla, California 92037, Department of Biochemistry, New York University School of Medicine, New York, New York 10016, Rohm and Haas Company, Advanced Biosciences Division, Spring House, Pennsylvania 19477, Department of Molecular Virology, Immunology, and Medical Genetics, Ohio State University, Columbus, Ohio 43210
- Veronika Vonstein
- Fellowship for Interpretation of Genomes, Burr Ridge, Illinois 60527, Burnham Institute for Medical Research, La Jolla, California 92037, Department of Biochemistry, New York University School of Medicine, New York, New York 10016, Rohm and Haas Company, Advanced Biosciences Division, Spring House, Pennsylvania 19477, Department of Molecular Virology, Immunology, and Medical Genetics, Ohio State University, Columbus, Ohio 43210
- Ross Overbeek
- Fellowship for Interpretation of Genomes, Burr Ridge, Illinois 60527, Burnham Institute for Medical Research, La Jolla, California 92037, Department of Biochemistry, New York University School of Medicine, New York, New York 10016, Rohm and Haas Company, Advanced Biosciences Division, Spring House, Pennsylvania 19477, Department of Molecular Virology, Immunology, and Medical Genetics, Ohio State University, Columbus, Ohio 43210
- Andrei L. Osterman
- Fellowship for Interpretation of Genomes, Burr Ridge, Illinois 60527, Burnham Institute for Medical Research, La Jolla, California 92037, Department of Biochemistry, New York University School of Medicine, New York, New York 10016, Rohm and Haas Company, Advanced Biosciences Division, Spring House, Pennsylvania 19477, Department of Molecular Virology, Immunology, and Medical Genetics, Ohio State University, Columbus, Ohio 43210
|
48
|
Liolios K, Tavernarakis N, Hugenholtz P, Kyrpides NC. The Genomes On Line Database (GOLD) v.2: a monitor of genome projects worldwide. Nucleic Acids Res 2006; 34:D332-4. [PMID: 16381880 PMCID: PMC1347507 DOI: 10.1093/nar/gkj145] [Citation(s) in RCA: 196] [Impact Index Per Article: 10.9] [Indexed: 01/11/2023] Open
Abstract
The Genomes On Line Database (GOLD) is a web resource for comprehensive access to information regarding complete and ongoing genome sequencing projects worldwide. The database currently incorporates information on over 1500 sequencing projects, of which 294 have been completed and the data deposited in the public databases. GOLD v.2 has been expanded to provide information related to organism properties such as phenotype, ecotype and disease. Furthermore, project relevance and availability information is now included. GOLD is available at http://www.genomesonline.org. It is also mirrored at the Institute of Molecular Biology and Biotechnology, Crete, Greece at http://gold.imbb.forth.gr/
Affiliation(s)
- Konstantinos Liolios
- Department of Pathology, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Department of Microbiology-Immunology, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Nektarios Tavernarakis
- Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Heraklion, Crete, Greece
- Philip Hugenholtz
- Microbial Ecology Program, Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA
- Nikos C. Kyrpides
- Microbial Genome Analysis Program, Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA
- To whom correspondence should be addressed. Tel: +1 925 296 5718; Fax: +1 925 296 5720;
|
49
|
Wang HC, Susko E, Roger AJ. On the correlation between genomic G+C content and optimal growth temperature in prokaryotes: data quality and confounding factors. Biochem Biophys Res Commun 2006; 342:681-4. [PMID: 16499870 DOI: 10.1016/j.bbrc.2006.02.037] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.3] [Received: 02/01/2006] [Accepted: 02/08/2006] [Indexed: 11/30/2022]
Abstract
The correlation between genomic G+C content and optimal growth temperature in prokaryotes has gained renewed interest after Musto et al. [H. Musto, H. Naya, A. Zavala, H. Romero, F. Alvarez-Valin, G. Bernardi, Correlations between genomic GC levels and optimal growth temperatures in prokaryotes, FEBS Lett. 573 (2004) 73-77] reported that positive correlations exist in 15 families studied. We have reanalyzed their data and found that when genome size and data quality were adjusted for, there was no significant evidence of a relationship between optimal temperature and GC content for two of the families that had previously shown strongly significant correlations. Using updated temperature optima for Halobacteriaceae species, we found that the correlation is not significant in this family. For the family Enterobacteriaceae, when genome size and optimal temperature are included in a multiple linear regression, only genome size is significant as a predictor of GC content. We showed that more profound statistical methods than simple two-factor correlation analysis should be used for analyzing complex intrinsic and extrinsic factors that affect genomic GC content. We further found that a positive correlation between temperature and genomic GC is only evident in free-living species of low optimal growth temperatures.
Affiliation(s)
- Huai-Chun Wang
- Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada B3H 3J5.
|
50
|
Dunning Hotopp JC, Lin M, Madupu R, Crabtree J, Angiuoli SV, Eisen JA, Eisen J, Seshadri R, Ren Q, Wu M, Utterback TR, Smith S, Lewis M, Khouri H, Zhang C, Niu H, Lin Q, Ohashi N, Zhi N, Nelson W, Brinkac LM, Dodson RJ, Rosovitz MJ, Sundaram J, Daugherty SC, Davidsen T, Durkin AS, Gwinn M, Haft DH, Selengut JD, Sullivan SA, Zafar N, Zhou L, Benahmed F, Forberger H, Halpin R, Mulligan S, Robinson J, White O, Rikihisa Y, Tettelin H. Comparative genomics of emerging human ehrlichiosis agents. PLoS Genet 2006; 2:e21. [PMID: 16482227 PMCID: PMC1366493 DOI: 10.1371/journal.pgen.0020021] [Citation(s) in RCA: 341] [Impact Index Per Article: 18.9] [Received: 10/20/2005] [Accepted: 01/09/2006] [Indexed: 11/25/2022] Open
Abstract
Anaplasma (formerly Ehrlichia) phagocytophilum, Ehrlichia chaffeensis, and Neorickettsia (formerly Ehrlichia) sennetsu are intracellular vector-borne pathogens that cause human ehrlichiosis, an emerging infectious disease. We present the complete genome sequences of these organisms along with comparisons to other organisms in the Rickettsiales order. Ehrlichia spp. and Anaplasma spp. display a unique large expansion of immunodominant outer membrane proteins facilitating antigenic variation. All Rickettsiales have a diminished ability to synthesize amino acids compared to their closest free-living relatives. Unlike members of the Rickettsiaceae family, these pathogenic Anaplasmataceae are capable of making all major vitamins, cofactors, and nucleotides, which could confer a beneficial role in the invertebrate vector or the vertebrate host. Further analysis identified proteins potentially involved in vacuole confinement of the Anaplasmataceae, a life cycle involving a hematophagous vector, vertebrate pathogenesis, human pathogenesis, and lack of transovarial transmission. These discoveries provide significant insights into the biology of these obligate intracellular pathogens. Ehrlichiosis is an acute disease that triggers flu-like symptoms in both humans and animals. It is caused by a range of bacteria transmitted by ticks or flukes. Because these bacteria are difficult to culture, however, the organisms are poorly understood. The genomes of three emerging human pathogens causing ehrlichiosis were sequenced. A database was designed to allow the comparison of these three genomes to sixteen other bacteria with similar lifestyles. Analysis from this database reveals new species-specific and disease-specific genes indicating niche adaptations, pathogenic traits, and other features. In particular, one of the organisms contains more than 100 copies of a single gene involved in interactions with the host(s). 
These comparisons also enabled a reconstruction of the metabolic potential of five representative genomes from these bacteria and their close relatives. With this work, scientists can study these emerging pathogens in earnest.
|