1
A Survey of Data Mining and Deep Learning in Bioinformatics. J Med Syst 2018; 42:139. [DOI: 10.1007/s10916-018-1003-9] [Citation(s) in RCA: 81] [Impact Index Per Article: 13.5] [Received: 03/28/2018] [Accepted: 06/21/2018] [Indexed: 12/13/2022]
3
Abstract
Reconstruction of metabolic networks from metabolites, enzymes, and reactions is the foundation of the network-based study of metabolism. In this chapter, we describe a practical method for reconstructing metabolic networks from KEGG. This method uses organism-specific pathway data in the KEGG/PATHWAY database to reconstruct metabolic networks at the pathway level, and the pathway hierarchy data in the KEGG/ORTHOLOGY database to guide network reconstruction at higher levels. By calling the KEGG Web services, this method ensures that the data used in the reconstruction are correct and up to date. Incorporating a local relational database to cache pathway data improves performance and speeds up network reconstruction. Some applications of the reconstructed networks to network alignment and network topology analysis are illustrated, and notes are given at the end.
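The caching step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the table name `pathway_cache`, the `fetch_pathway` callback (a stand-in for a remote KEGG request), and the class name are all hypothetical.

```python
import sqlite3
from typing import Callable


class PathwayCache:
    """Cache pathway records in a local SQLite database (hypothetical schema)."""

    def __init__(self, fetch_pathway: Callable[[str], str], db_path: str = ":memory:"):
        # fetch_pathway is an assumed callback standing in for a remote KEGG request.
        self._fetch = fetch_pathway
        self._conn = sqlite3.connect(db_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS pathway_cache ("
            "pathway_id TEXT PRIMARY KEY, data TEXT NOT NULL)"
        )

    def get(self, pathway_id: str) -> str:
        row = self._conn.execute(
            "SELECT data FROM pathway_cache WHERE pathway_id = ?", (pathway_id,)
        ).fetchone()
        if row is not None:
            return row[0]  # cache hit: no remote call is made
        data = self._fetch(pathway_id)  # cache miss: fetch once, then store
        self._conn.execute(
            "INSERT INTO pathway_cache (pathway_id, data) VALUES (?, ?)",
            (pathway_id, data),
        )
        self._conn.commit()
        return data
```

Repeated calls to `get` with the same identifier trigger only one remote fetch. How long a cached record should remain valid is a policy decision that this sketch leaves out.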
Affiliation(s)
- Tingting Zhou
- Laboratory of Molecular Immunology, Institute of Basic Medical Sciences, Beijing, People's Republic of China.
4
Jung SK, McDonald K. Visual gene developer: a fully programmable bioinformatics software for synthetic gene optimization. BMC Bioinformatics 2011; 12:340. [PMID: 21846353 PMCID: PMC3215308 DOI: 10.1186/1471-2105-12-340] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.5] [Received: 04/26/2011] [Accepted: 08/16/2011] [Indexed: 08/26/2023] Open
Abstract
Background: Direct gene synthesis is becoming more popular owing to decreases in gene synthesis pricing. Compared with using natural genes, gene synthesis provides a good opportunity to optimize gene sequence for specific applications. In order to facilitate gene optimization, we have developed a stand-alone software called Visual Gene Developer. Results: The software not only provides general functions for gene analysis and optimization along with an interactive user-friendly interface, but also includes unique features such as programming capability, dedicated mRNA secondary structure prediction, artificial neural network modeling, network & multi-threaded computing, and user-accessible programming modules. The software allows a user to analyze and optimize a sequence using main menu functions or specialized module windows. Alternatively, gene optimization can be initiated by designing a gene construct and configuring an optimization strategy. A user can choose several predefined or user-defined algorithms to design a complicated strategy. The software provides expandable functionality as platform software supporting module development using popular script languages such as VBScript and JScript in the software programming environment. Conclusion: Visual Gene Developer is useful for both researchers who want to quickly analyze and optimize genes, and those who are interested in developing and testing new algorithms in bioinformatics. The software is available for free download at http://www.visualgenedeveloper.net.
Affiliation(s)
- Sang-Kyu Jung
- Department of Chemical Engineering and Materials Science, University of California, Davis, 1 Shields Ave, Davis, CA 95616, USA
5
ModEnzA: Accurate Identification of Metabolic Enzymes Using Function Specific Profile HMMs with Optimised Discrimination Threshold and Modified Emission Probabilities. Adv Bioinformatics 2011; 2011:743782. [PMID: 21541071 PMCID: PMC3085309 DOI: 10.1155/2011/743782] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 09/30/2010] [Revised: 12/07/2010] [Accepted: 01/27/2011] [Indexed: 01/07/2023] Open
Abstract
Various enzyme identification protocols involving homology transfer by sequence-sequence or profile-sequence comparisons have been devised which utilise Swiss-Prot sequences associated with EC numbers as the training set. A profile HMM constructed for a particular EC number might select sequences which perform a different enzymatic function due to the presence of certain fold-specific residues which are conserved in enzymes sharing a common fold. We describe a protocol, ModEnzA (HMM-ModE Enzyme Annotation), which generates profile HMMs highly specific at a functional level as defined by the EC numbers by incorporating information from negative training sequences. We enrich the training dataset by mining sequences from the NCBI Non-Redundant database for increased sensitivity. We compare our method with other enzyme identification methods, both for assigning EC numbers to a genome as well as identifying protein sequences associated with an enzymatic activity. We report a sensitivity of 88% and specificity of 95% in identifying EC numbers and annotating enzymatic sequences from the E. coli genome which is higher than any other method. With the next-generation sequencing methods producing a huge amount of sequence data, the development and use of fully automated yet accurate protocols such as ModEnzA is warranted for rapid annotation of newly sequenced genomes and metagenomic sequences.
6
Wu XL, Beissinger TM, Bauck S, Woodward B, Rosa GJM, Weigel KA, Gatti NDL, Gianola D. A primer on high-throughput computing for genomic selection. Front Genet 2011; 2:4. [PMID: 22303303 PMCID: PMC3268564 DOI: 10.3389/fgene.2011.00004] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 12/02/2010] [Accepted: 02/07/2011] [Indexed: 12/30/2022] Open
Abstract
High-throughput computing (HTC) uses computer clusters to solve advanced computational problems, with the goal of accomplishing high throughput over relatively long periods of time. In genomic selection, for example, a set of markers covering the entire genome is used to train a model based on known data, and the resulting model is used to predict the genetic merit of selection candidates. Sophisticated models are very computationally demanding and, with several traits to be evaluated sequentially, computing time is long, and output is low. In this paper, we present scenarios and basic principles of how HTC can be used in genomic selection, implemented using various techniques from simple batch processing to pipelining in distributed computer clusters. Various scripting languages, such as shell scripting, Perl, and R, are also very useful to devise pipelines. By pipelining, we can reduce total computing time and consequently increase throughput. In comparison to the traditional data processing pipeline residing on the central processors, performing general-purpose computation on a graphics processing unit provides a new-generation approach to massively parallel computing in genomic selection. While the concept of HTC may still be new to many researchers in animal breeding, plant breeding, and genetics, HTC infrastructures have already been built in many institutions, such as the University of Wisconsin–Madison, which can be leveraged for genomic selection, in terms of central processing unit capacity, network connectivity, storage availability, and middleware connectivity. Exploring existing HTC infrastructures as well as general-purpose computing environments will further expand our capability to meet increasing computing demands posed by unprecedented genomic data that we have today.
We anticipate that HTC will impact genomic selection via better statistical models, faster solutions, and more competitive products (e.g., from design of marker panels to realized genetic gain). Eventually, HTC may change our view of data analysis as well as decision-making in the post-genomic era of selection programs in animals and plants, or in the study of complex diseases in humans.
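The contrast drawn in this abstract between evaluating traits one after another and dispatching them as independent jobs can be sketched as follows. This is only an illustration under stated assumptions: `evaluate_trait` is a hypothetical placeholder for an expensive model fit, the trait names are invented, and a local thread pool stands in for the cluster nodes that a real HTC setup would use.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_trait(trait: str) -> tuple[str, int]:
    """Placeholder for an expensive per-trait model fit (hypothetical)."""
    # A real job would train a genomic prediction model; here we return a toy value.
    return trait, len(trait)


def evaluate_sequentially(traits: list[str]) -> dict[str, int]:
    # Baseline: one trait after another on a single worker.
    return dict(evaluate_trait(t) for t in traits)


def evaluate_concurrently(traits: list[str], workers: int = 4) -> dict[str, int]:
    # Each trait is an independent job, so the jobs can be dispatched side by side.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(evaluate_trait, traits))
```

Both functions return the same results; only the scheduling differs. For CPU-bound Python work, threads would not give a real speed-up, so a process pool or a cluster batch scheduler would be the more realistic choice.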
Affiliation(s)
- Xiao-Lin Wu
- Department of Dairy Science, University of Wisconsin Madison, WI, USA
7
Terzer M, Maynard ND, Covert MW, Stelling J. Genome-scale metabolic networks. Wiley Interdiscip Rev Syst Biol Med 2011; 1:285-297. [PMID: 20835998 DOI: 10.1002/wsbm.37] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.5] [Indexed: 12/14/2022]
Abstract
During the last decade, models have been developed to characterize cellular metabolism at the level of an entire metabolic network. The main concept that underlies whole-network metabolic modeling is the identification and mathematical definition of constraints. Here, we review large-scale metabolic network modeling, in particular, stoichiometric- and constraint-based approaches. Although many such models have been reconstructed, few networks have been extensively validated and tested experimentally, and we focus on these. We describe how metabolic networks can be represented using stoichiometric matrices and well-defined constraints on metabolic fluxes. We then discuss relatively successful approaches, including flux balance analysis (FBA), pathway analysis, and common extensions or modifications to these approaches. Finally, we describe techniques for integrating these approaches with models of other biological processes.
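The stoichiometric representation described in this abstract can be made concrete with a toy example. The three-reaction network below is invented for illustration and does not come from the review. The sketch only checks whether a flux vector satisfies the steady-state constraint S·v = 0 and the flux bounds. Flux balance analysis proper would then optimize a linear objective over this feasible set with a linear programming solver, which is not shown here.

```python
# Toy network (hypothetical): R1: -> A, R2: A -> B, R3: B ->
# Rows are metabolites (A, B); columns are reactions (R1, R2, R3).
S = [
    [1, -1, 0],   # A is produced by R1 and consumed by R2
    [0, 1, -1],   # B is produced by R2 and consumed by R3
]
LOWER = [0.0, 0.0, 0.0]     # assumed lower flux bounds
UPPER = [10.0, 10.0, 10.0]  # assumed upper flux bounds


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def is_feasible(matrix, fluxes, lower, upper, tol=1e-9):
    """True if the fluxes balance every metabolite and respect the bounds."""
    steady = all(abs(net) <= tol for net in mat_vec(matrix, fluxes))
    bounded = all(lo - tol <= v <= hi + tol for v, lo, hi in zip(fluxes, lower, upper))
    return steady and bounded
```

For example, `is_feasible(S, [2, 2, 2], LOWER, UPPER)` holds because each metabolite is produced and consumed at the same rate, whereas `[2, 1, 1]` would let metabolite A accumulate.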
Affiliation(s)
- Marco Terzer
- Department of Biosystems Science and Engineering, ETH Zurich, Switzerland
- Markus W Covert
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Jörg Stelling
- Department of Biosystems Science and Engineering, ETH Zurich, Switzerland
8
Aho T, Almusa H, Matilainen J, Larjo A, Ruusuvuori P, Aho KL, Wilhelm T, Lähdesmäki H, Beyer A, Harju M, Chowdhury S, Leinonen K, Roos C, Yli-Harja O. Reconstruction and validation of RefRec: a global model for the yeast molecular interaction network. PLoS One 2010; 5:e10662. [PMID: 20498836 PMCID: PMC2871048 DOI: 10.1371/journal.pone.0010662] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 10/06/2009] [Accepted: 04/15/2010] [Indexed: 11/26/2022] Open
Abstract
Molecular interaction networks establish all cell biological processes. The networks are under intensive research that is facilitated by new high-throughput measurement techniques for the detection, quantification, and characterization of molecules and their physical interactions. For the common model organism yeast Saccharomyces cerevisiae, public databases store a significant part of the accumulated information and, on the way to better understanding of the cellular processes, there is a need to integrate this information into a consistent reconstruction of the molecular interaction network. This work presents and validates RefRec, the most comprehensive molecular interaction network reconstruction currently available for yeast. The reconstruction integrates protein synthesis pathways, a metabolic network, and a protein-protein interaction network from major biological databases. The core of the reconstruction is based on a reference object approach in which genes, transcripts, and proteins are identified using their primary sequences. This enables their unambiguous identification and non-redundant integration. The obtained total number of different molecular species and their connecting interactions is approximately 67,000. In order to demonstrate the capacity of RefRec for functional predictions, it was used for simulating the gene knockout damage propagation in the molecular interaction network in approximately 590,000 experimentally validated mutant strains. Based on the simulation results, a statistical classifier was subsequently able to correctly predict the viability of most of the strains. The results also showed that the usage of different types of molecular species in the reconstruction is important for accurate phenotype prediction. In general, the findings demonstrate the benefits of global reconstructions of molecular interaction networks. 
With all the molecular species and their physical interactions explicitly modeled, our reconstruction is able to serve as a valuable resource in additional analyses involving objects from multiple molecular -omes. For that purpose, RefRec is freely available in the Systems Biology Markup Language format.
Affiliation(s)
- Tommi Aho
- Department of Signal Processing, Tampere University of Technology, Tampere, Finland.
9
Iyer LM, Abhiman S, de Souza RF, Aravind L. Origin and evolution of peptide-modifying dioxygenases and identification of the wybutosine hydroxylase/hydroperoxidase. Nucleic Acids Res 2010; 38:5261-79. [PMID: 20423905 PMCID: PMC2938197 DOI: 10.1093/nar/gkq265] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.0] [Indexed: 12/12/2022] Open
Abstract
Unlike classical 2-oxoglutarate and iron-dependent dioxygenases, which include several nucleic acid modifiers, the structurally similar jumonji-related dioxygenase superfamily was only known to catalyze peptide modifications. Using comparative genomics methods, we predict that a family of jumonji-related enzymes catalyzes wybutosine hydroxylation/peroxidation at position 37 of eukaryotic tRNAPhe. Identification of this enzyme raised questions regarding the emergence of protein- and nucleic acid-modifying activities among jumonji-related domains. We addressed these with a natural classification of DSBH domains and reconstructed the precursor of the dioxygenases as a sugar-binding domain. This precursor gave rise to sugar epimerases and metal-binding sugar isomerases. The sugar isomerase active site was exapted for catalysis of oxygenation, with a radiation of these enzymes in bacteria, probably due to impetus from the primary oxygenation event in Earth’s history. 2-Oxoglutarate-dependent versions appear to have further expanded with rise of the tricarboxylic acid cycle. We identify previously under-appreciated aspects of their active site and multiple independent innovations of 2-oxoacid-binding basic residues among these superfamilies. We show that double-stranded β-helix dioxygenases diversified extensively in biosynthesis and modification of halogenated siderophores, antibiotics, peptide secondary metabolites and glycine-rich collagen-like proteins in bacteria. Jumonji-related domains diversified into three distinct lineages in bacterial secondary metabolism systems and these were precursors of the three major clades of eukaryotic enzymes. The specificity of wybutosine hydroxylase/peroxidase probably relates to the structural similarity of the modified moiety to the ancestral amino acid substrate of this superfamily.
Affiliation(s)
- Lakshminarayan M Iyer
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
10
Zhao J, Geng C, Tao L, Zhang D, Jiang Y, Tang K, Zhu R, Yu H, Zhang W, He F, Li Y, Cao Z. Reconstruction and Analysis of Human Liver-Specific Metabolic Network Based on CNHLPP Data. J Proteome Res 2010; 9:1648-58. [DOI: 10.1021/pr9006188] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 12/30/2022]
Affiliation(s)
- Jing Zhao, Chao Geng, Lin Tao, Duanfeng Zhang, Ying Jiang, Kailin Tang, Ruixin Zhu, Hong Yu, Weidong Zhang, Fuchu He, Yixue Li, Zhiwei Cao
- Shanghai Center for Bioinformation and Technology, Shanghai, China, School of Life Sciences and Technology, Tongji University, Shanghai, China, Key Laboratory of Arrhythmias, Ministry of Education, China, Department of Genomics and Proteomics, Beijing Institute of Radiation Medicine, Beijing, China, Beijing Proteome Research Center, Beijing, China, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China, Department of Natural Medicinal Chemistry, College of Pharmacy,
11
Karp PD, Paley SM, Krummenacker M, Latendresse M, Dale JM, Lee TJ, Kaipa P, Gilham F, Spaulding A, Popescu L, Altman T, Paulsen I, Keseler IM, Caspi R. Pathway Tools version 13.0: integrated software for pathway/genome informatics and systems biology. Brief Bioinform 2009; 11:40-79. [PMID: 19955237 DOI: 10.1093/bib/bbp043] [Citation(s) in RCA: 326] [Impact Index Per Article: 21.7] [Indexed: 01/12/2023] Open
Abstract
Pathway Tools is a production-quality software environment for creating a type of model-organism database called a Pathway/Genome Database (PGDB). A PGDB such as EcoCyc integrates the evolving understanding of the genes, proteins, metabolic network and regulatory network of an organism. This article provides an overview of Pathway Tools capabilities. The software performs multiple computational inferences including prediction of metabolic pathways, prediction of metabolic pathway hole fillers and prediction of operons. It enables interactive editing of PGDBs by DB curators. It supports web publishing of PGDBs, and provides a large number of query and visualization tools. The software also supports comparative analyses of PGDBs, and provides several systems biology analyses of PGDBs including reachability analysis of metabolic networks, and interactive tracing of metabolites through a metabolic network. More than 800 PGDBs have been created using Pathway Tools by scientists around the world, many of which are curated DBs for important model organisms. Those PGDBs can be exchanged using a peer-to-peer DB sharing system called the PGDB Registry.
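Reachability analysis of a metabolic network, mentioned in this abstract, can be illustrated with one common formulation: starting from a set of seed metabolites, repeatedly fire every reaction whose substrates are all available and add its products. The sketch below is an assumption-laden illustration, not the Pathway Tools algorithm. The function name, the data layout, and the example reactions are hypothetical.

```python
def reachable_metabolites(reactions, seeds):
    """Return every metabolite producible from the seeds.

    reactions: iterable of (substrates, products) pairs, each a set of names.
    seeds: metabolites assumed to be available at the start.
    """
    reached = set(seeds)
    changed = True
    while changed:  # iterate until no reaction adds anything new
        changed = False
        for substrates, products in reactions:
            # A reaction can fire only when all of its substrates are reachable.
            if substrates <= reached and not products <= reached:
                reached |= products
                changed = True
    return reached


# Hypothetical three-step fragment used only to exercise the function.
EXAMPLE_REACTIONS = [
    ({"glucose"}, {"g6p"}),
    ({"g6p"}, {"f6p"}),
    ({"f6p", "atp"}, {"fbp"}),
]
```

With only `glucose` as a seed the third reaction never fires, because `atp` is missing. Adding `atp` to the seeds makes `fbp` reachable as well.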
Affiliation(s)
- Peter D Karp
- Artificial Intelligence Center, SRI International, 333 Ravenswood Ave, AE206, Menlo Park, CA 94025, USA.
12
Pathway projector: web-based zoomable pathway browser using KEGG atlas and Google Maps API. PLoS One 2009; 4:e7710. [PMID: 19907644 PMCID: PMC2770834 DOI: 10.1371/journal.pone.0007710] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.6] [Received: 07/28/2009] [Accepted: 10/11/2009] [Indexed: 11/19/2022] Open
Abstract
Background: Biochemical pathways provide an essential context for understanding comprehensive experimental data and the systematic workings of a cell. Therefore, the availability of online pathway browsers will facilitate post-genomic research, just as genome browsers have contributed to genomics. Many pathway maps have been provided online as part of public pathway databases. Most of these maps, however, function as the gateway interface to a specific database, and the comprehensiveness of their represented entities, data mapping capabilities, and user interfaces are not always sufficient for generic usage. Methodology/Principal Findings: We have identified five central requirements for a pathway browser: (1) availability of large integrated maps showing genes, enzymes, and metabolites; (2) comprehensive search features and data access; (3) data mapping for transcriptomic, proteomic, and metabolomic experiments, as well as the ability to edit and annotate pathway maps; (4) easy exchange of pathway data; and (5) intuitive user experience without the requirement for installation and regular maintenance. According to these requirements, we have evaluated existing pathway databases and tools and implemented a web-based pathway browser named Pathway Projector as a solution. Conclusions/Significance: Pathway Projector provides integrated pathway maps that are based upon the KEGG Atlas, with the addition of nodes for genes and enzymes, and is implemented as a scalable, zoomable map utilizing the Google Maps API. Users can search pathway-related data using keywords, molecular weights, nucleotide sequences, and amino acid sequences, or as possible routes between compounds. In addition, experimental data from transcriptomic, proteomic, and metabolomic analyses can be readily mapped. Pathway Projector is freely available for academic users at http://www.g-language.org/PathwayProjector/.
13
Davidsen T, Beck E, Ganapathy A, Montgomery R, Zafar N, Yang Q, Madupu R, Goetz P, Galinsky K, White O, Sutton G. The comprehensive microbial resource. Nucleic Acids Res 2009; 38:D340-5. [PMID: 19892825 PMCID: PMC2808947 DOI: 10.1093/nar/gkp912] [Citation(s) in RCA: 82] [Impact Index Per Article: 5.5] [Indexed: 11/13/2022] Open
Abstract
The Comprehensive Microbial Resource or CMR (http://cmr.jcvi.org) provides a web-based central resource for the display, search and analysis of the sequence and annotation for complete and publicly available bacterial and archaeal genomes. In addition to displaying the original annotation from GenBank, the CMR makes available secondary automated structural and functional annotation across all genomes to provide consistent data types necessary for effective mining of genomic data. Precomputed homology searches are stored to allow meaningful genome comparisons. The CMR supplies users with over 50 different tools to utilize the sequence and annotation data across one or more of the 571 currently available genomes. At the gene level users can view the gene annotation and underlying evidence. Genome level information includes whole genome graphical displays, biochemical pathway maps and genome summary data. Comparative tools display analysis between genomes with homology and genome alignment tools, and searches across the accessions, annotation, and evidence assigned to all genes/genomes are available. The data and tools on the CMR aid genomic research and analysis, and the CMR is included in over 200 scientific publications. The code underlying the CMR website and the CMR database are freely available for download with no license restrictions.
14
Peregrín-Alvarez JM, Sanford C, Parkinson J. The conservation and evolutionary modularity of metabolism. Genome Biol 2009; 10:R63. [PMID: 19523219 PMCID: PMC2718497 DOI: 10.1186/gb-2009-10-6-r63] [Citation(s) in RCA: 103] [Impact Index Per Article: 6.9] [Received: 02/05/2009] [Revised: 05/27/2009] [Accepted: 06/12/2009] [Indexed: 01/09/2023] Open
Abstract
A novel evolutionary analysis of metabolic networks across 26 taxa reveals a highly conserved but flexible core of metabolic enzymes. Background: Cellular metabolism is a fundamental biological system consisting of myriads of enzymatic reactions that together fulfill the basic requirements of life. The recent availability of vast amounts of sequence data from diverse sets of organisms provides an opportunity to systematically examine metabolism from a comparative perspective. Here we supplement existing genome and protein resources with partial genome datasets derived from 193 eukaryotes to present a comprehensive survey of the conservation of metabolism across 26 taxa representing the three domains of life. Results: In general, metabolic enzymes are highly conserved. However, organizing these enzymes within the context of functional pathways revealed a spectrum of conservation from those that are highly conserved (for example, carbohydrate, energy, amino acid and nucleotide metabolism enzymes) to those specific to individual taxa (for example, those involved in glycan metabolism and secondary metabolite pathways). When a novel co-conservation analysis was applied, KEGG-defined pathways did not generally display evolutionary coherence. Instead, such modularity appears restricted to smaller subsets of enzymes. Expanding analyses to a global metabolic network revealed a highly conserved, but nonetheless flexible, 'core' of enzymes largely involved in multiple reactions across different pathways. Enzymes and pathways associated with the periphery of this network were less well conserved and associated with taxon-specific innovations. Conclusions: These findings point to an emerging picture in which a core of enzyme activities involving amino acid, energy, carbohydrate and lipid metabolism has evolved to provide the basic functions required for life. However, the precise complement of enzymes associated within this core for each species is flexible.
Affiliation(s)
- José M Peregrín-Alvarez
- Program in Molecular Structure and Function, Hospital for Sick Children, College Street, Toronto, ON M5G1L7, Canada.
15
Whitaker JW, McConkey GA, Westhead DR. The transferome of metabolic genes explored: analysis of the horizontal transfer of enzyme encoding genes in unicellular eukaryotes. Genome Biol 2009; 10:R36. [PMID: 19368726 PMCID: PMC2688927 DOI: 10.1186/gb-2009-10-4-r36] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.6] [Received: 12/18/2008] [Revised: 04/06/2009] [Accepted: 04/15/2009] [Indexed: 12/02/2022] Open
Abstract
Metabolic network analysis in multiple eukaryotes identifies how horizontal and endosymbiotic gene transfer of metabolic enzyme-encoding genes leads to functional gene gain during evolution. Background: Metabolic networks are responsible for many essential cellular processes, and exhibit a high level of evolutionary conservation from bacteria to eukaryotes. If genes encoding metabolic enzymes are horizontally transferred and are advantageous, they are likely to become fixed. Horizontal gene transfer (HGT) has played a key role in prokaryotic evolution and its importance in eukaryotes is increasingly evident. High levels of endosymbiotic gene transfer (EGT) accompanied the establishment of plastids and mitochondria, and more recent events have allowed further acquisition of bacterial genes. Here, we present the first comprehensive multi-species analysis of E/HGT of genes encoding metabolic enzymes from bacteria to unicellular eukaryotes. Results: The phylogenetic trees of 2,257 metabolic enzymes were used to make E/HGT assertions in ten groups of unicellular eukaryotes, revealing the sources and metabolic processes of the transferred genes. Analyses revealed a preference for enzymes encoded by genes gained through horizontal and endosymbiotic transfers to be connected in the metabolic network. Enrichment in particular functional classes was particularly revealing: alongside plastid related processes and carbohydrate metabolism, this highlighted a number of pathways in eukaryotic parasites that are rich in enzymes encoded by transferred genes, and potentially key to pathogenicity. The plant parasites Phytophthora were discovered to have a potential pathway for lipopolysaccharide biosynthesis of E/HGT origin not seen before in eukaryotes outside the Plantae. Conclusions: The number of enzymes encoded by genes gained through E/HGT has been established, providing insight into functional gain during the evolution of unicellular eukaryotes.
In eukaryotic parasites, genes encoding enzymes that have been gained through horizontal transfer may be attractive drug targets if they are part of processes not present in the host, or are significantly diverged from equivalent host enzymes.
Affiliation(s)
- John W Whitaker
- Institute of Molecular and Cellular Biology, University of Leeds, Leeds, West Yorkshire, LS2 9JT, UK
|
16
|
Kastenmüller G, Schenk ME, Gasteiger J, Mewes HW. Uncovering metabolic pathways relevant to phenotypic traits of microbial genomes. Genome Biol 2009; 10:R28. [PMID: 19284550 PMCID: PMC2690999 DOI: 10.1186/gb-2009-10-3-r28] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Received: 08/25/2008] [Revised: 02/12/2009] [Accepted: 03/10/2009] [Indexed: 01/20/2023] Open
Abstract
Identifying the biochemical basis of microbial phenotypes is a main objective of comparative genomics. Here we present a novel method using multivariate machine learning techniques for comparing automatically derived metabolic reconstructions of sequenced genomes on a large scale. Applying our method to 266 genomes directly led to testable hypotheses such as the link between the potential of microorganisms to cause periodontal disease and their ability to degrade histidine, a link also supported by clinical studies.
Affiliation(s)
- Gabi Kastenmüller
- Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München - German Research Center for Environmental Health, Ingolstädter Landstraße, D-85764 Neuherberg, Germany
- Maria Elisabeth Schenk
- Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München - German Research Center for Environmental Health, Ingolstädter Landstraße, D-85764 Neuherberg, Germany
- Johann Gasteiger
- Computer-Chemie-Centrum, Universität Erlangen-Nürnberg, Nägelsbachstraße, D-91052 Erlangen, Germany
- Molecular Networks GmbH, Henkestraße 91, D-91052 Erlangen, Germany
- Hans-Werner Mewes
- Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München - German Research Center for Environmental Health, Ingolstädter Landstraße, D-85764 Neuherberg, Germany
- Chair for Genome-oriented Bioinformatics, Technische Universität München, Life and Food Science Center Weihenstephan, Am Forum 1, D-85354 Freising-Weihenstephan, Germany
|
17
|
Abstract
As random shotgun metagenomic projects proliferate and become the dominant source of publicly available sequence data, procedures for the best practices in their execution and analysis become increasingly important. Based on our experience at the Joint Genome Institute, we describe the chain of decisions accompanying a metagenomic project from the viewpoint of the bioinformatic analysis step by step. We guide the reader through a standard workflow for a metagenomic project beginning with presequencing considerations such as community composition and sequence data type that will greatly influence downstream analyses. We proceed with recommendations for sampling and data generation including sample and metadata collection, community profiling, construction of shotgun libraries, and sequencing strategies. We then discuss the application of generic sequence processing steps (read preprocessing, assembly, and gene prediction and annotation) to metagenomic data sets in contrast to genome projects. Different types of data analyses particular to metagenomes are then presented, including binning, dominant population analysis, and gene-centric analysis. Finally, data management issues are presented and discussed. We hope that this review will assist bioinformaticians and biologists in making better-informed decisions on their journey during a metagenomic project.
|
18
|
Abstract
The development of affordable, high-throughput sequencing technology has led to a flood of publicly available bacterial genome-sequence data. The availability of multiple genome sequences presents both an opportunity and a challenge for microbiologists, and new computational approaches are needed to extract the knowledge that is required to address specific biological problems and to analyse genomic data. The field of e-Science is maturing, and Grid-based technologies can help address this challenge.
|
19
|
Senger RS, Papoutsakis ET. Genome-scale model for Clostridium acetobutylicum: Part I. Metabolic network resolution and analysis. Biotechnol Bioeng 2008; 101:1036-52. [PMID: 18767192 DOI: 10.1002/bit.22010] [Citation(s) in RCA: 141] [Impact Index Per Article: 8.8] [Indexed: 12/14/2022]
Abstract
A genome-scale metabolic network reconstruction for Clostridium acetobutylicum (ATCC 824) was carried out using a new semi-automated reverse engineering algorithm. The network consists of 422 intracellular metabolites involved in 552 reactions and includes 80 membrane transport reactions. The metabolic network illustrates the reliance of clostridia on the urea cycle, intracellular L-glutamate solute pools, and the acetylornithine transaminase for amino acid biosynthesis from the 2-oxoglutarate precursor. The semi-automated reverse engineering algorithm identified discrepancies in reaction network databases that are major obstacles for fully automated network-building algorithms. The proposed semi-automated approach allowed for the conservation of unique clostridial metabolic pathways, such as an incomplete TCA cycle. A thermodynamic analysis was used to determine the physiological conditions under which proposed pathways (e.g., reverse partial TCA cycle and reverse arginine biosynthesis pathway) are feasible. The reconstructed metabolic network was used to create a genome-scale model that correctly characterized the butyrate kinase knock-out and the asolventogenic M5 pSOL1 megaplasmid degenerate strains. Systematic gene knock-out simulations were performed to identify a set of genes encoding clostridial enzymes essential for growth in silico.
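The systematic single-gene knock-out screen mentioned in the abstract above can be illustrated with a toy sketch. This is not the authors' genome-scale model: it swaps in a much simpler reachability check (network expansion from a nutrient set) and assumes one gene per reaction. The gene names, metabolites, and the helper functions `can_produce` and `essential_genes` are hypothetical and chosen only for illustration.

```python
def can_produce(reactions, nutrients, target):
    """Network expansion: fire every reaction whose substrates are all available."""
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return target in available


def essential_genes(gene_reactions, nutrients, target):
    """Return genes whose single deletion makes `target` unreachable."""
    essential = []
    for gene in sorted(gene_reactions):
        # Keep the reactions of every gene except the one knocked out.
        remaining = [
            rxn
            for g, rxns in gene_reactions.items()
            if g != gene
            for rxn in rxns
        ]
        if not can_produce(remaining, nutrients, target):
            essential.append(gene)
    return essential


# Hypothetical toy network: geneB and geneC encode redundant reactions.
toy_network = {
    "geneA": [(frozenset({"glucose"}), frozenset({"pyruvate"}))],
    "geneB": [(frozenset({"pyruvate"}), frozenset({"acetyl-CoA"}))],
    "geneC": [(frozenset({"pyruvate"}), frozenset({"acetyl-CoA"}))],
    "geneD": [(frozenset({"acetyl-CoA"}), frozenset({"biomass"}))],
}
essential = essential_genes(toy_network, {"glucose"}, "biomass")
print(essential)
```

In this toy case only the genes without a redundant partner are reported as essential. A constraint-based flux model would additionally account for stoichiometry and growth rate, which this sketch ignores.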
Affiliation(s)
- Ryan S Senger
- Delaware Biotechnology Institute, University of Delaware, 15 Innovation Way Newark, Delaware 19711, USA.
|
20
|
Whitaker JW, Letunic I, McConkey GA, Westhead DR. metaTIGER: a metabolic evolution resource. Nucleic Acids Res 2008; 37:D531-8. [PMID: 18953037 PMCID: PMC2686446 DOI: 10.1093/nar/gkn826] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Indexed: 12/11/2022] Open
Abstract
Metabolic networks are a subject that has received much attention, but existing web resources do not include extensive phylogenetic information. Phylogenomic approaches (phylogenetics on a genomic scale) have been shown to be effective in the study of evolution and processes like horizontal gene transfer (HGT). To address the lack of phylogenomic information relating to eukaryotic metabolism, metaTIGER (www.bioinformatics.leeds.ac.uk/metatiger) has been created, using genomic information from 121 eukaryotes and 404 prokaryotes and sensitive sequence search techniques to predict the presence of metabolic enzymes. These enzyme sequences were used to create a comprehensive database of 2257 maximum-likelihood phylogenetic trees, some containing over 500 organisms. The trees can be viewed using iTOL, an advanced interactive tree viewer, enabling straightforward interpretation of large trees. Complex high-throughput tree analysis is also available through user-defined queries, allowing the rapid identification of trees of interest, e.g. containing putative HGT events. metaTIGER also provides novel and easy-to-use facilities for viewing and comparing the metabolic networks in different organisms via highlighted pathway images and tables. metaTIGER is demonstrated through evolutionary analysis of Plasmodium, including identification of genes horizontally transferred from chlamydia.
Affiliation(s)
- John W Whitaker
- Institute of Molecular and Cellular Biology, Garstang Building, University of Leeds, Leeds, W. Yorks, LS2 9JT, UK
|
21
|
Bertin PN, Médigue C, Normand P. Advances in environmental genomics: towards an integrated view of micro-organisms and ecosystems. MICROBIOLOGY-SGM 2008; 154:347-359. [PMID: 18227239 DOI: 10.1099/mic.0.2007/011791-0] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 12/18/2022]
Abstract
Microbial genome sequencing has, for the first time, made accessible all the components needed for both the elaboration and the functioning of a cell. Associated with other global methods such as protein and mRNA profiling, genomics has considerably extended our knowledge of physiological processes and their diversity not only in human, animal and plant pathogens but also in environmental isolates. At a higher level of complexity, the so-called meta approaches have recently shown great promise in investigating microbial communities, including uncultured micro-organisms. Combined with classical methods of physico-chemistry and microbiology, these endeavours should provide us with an integrated view of how micro-organisms adapt to particular ecological niches and participate in the dynamics of ecosystems.
Affiliation(s)
- Philippe N Bertin
- Génétique Moléculaire, Génomique et Microbiologie, Université Louis Pasteur, UMR7156 CNRS, Strasbourg, France
- Philippe Normand
- Ecologie Microbienne, Université Claude Bernard - Lyon 1, UMR5557 CNRS, Villeurbanne, France
|
22
|
Abstract
Comparative genome analysis is critical for the effective exploration of a rapidly growing number of complete and draft sequences for microbial genomes. The Integrated Microbial Genomes (IMG) system (img.jgi.doe.gov) has been developed as a community resource that provides support for comparative analysis of microbial genomes in an integrated context. IMG allows users to navigate the multidimensional microbial genome data space and focus their analysis on a subset of genes, genomes, and functions of interest. IMG provides graphical viewers, summaries, and occurrence profile tools for comparing genes, pathways, and functions (terms) across specific genomes. Genes can be further examined using gene neighborhoods and compared with sequence alignment tools.
|
23
|
Sulakhe D, Rodriguez A, Wilde M, Foster I, Maltsev N. Interoperability of GADU in Using Heterogeneous Grid Resources for Bioinformatics Applications. IEEE Trans Inf Technol Biomed 2008; 12:241-6. [DOI: 10.1109/titb.2007.897783] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/09/2022]
|
24
|
Yu C, Zavaljevski N, Desai V, Johnson S, Stevens FJ, Reifman J. The development of PIPA: an integrated and automated pipeline for genome-wide protein function annotation. BMC Bioinformatics 2008; 9:52. [PMID: 18221520 PMCID: PMC2259298 DOI: 10.1186/1471-2105-9-52] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 08/22/2007] [Accepted: 01/25/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Automated protein function prediction methods are needed to keep pace with high-throughput sequencing. With the existence of many programs and databases for inferring different protein functions, a pipeline that properly integrates these resources will benefit from the advantages of each method. However, integrated systems usually do not provide mechanisms to generate customized databases to predict particular protein functions. Here, we describe a tool termed PIPA (Pipeline for Protein Annotation) that has these capabilities.
RESULTS: PIPA annotates protein functions by combining the results of multiple programs and databases, such as InterPro and the Conserved Domains Database, into common Gene Ontology (GO) terms. The major algorithms implemented in PIPA are: (1) a profile database generation algorithm, which generates customized profile databases to predict particular protein functions, (2) an automated ontology mapping generation algorithm, which maps various classification schemes into GO, and (3) a consensus algorithm to reconcile annotations from the integrated programs and databases. PIPA's profile generation algorithm is employed to construct the enzyme profile database CatFam, which predicts catalytic functions described by Enzyme Commission (EC) numbers. Validation tests show that CatFam yields average recall and precision larger than 95.0%. CatFam is integrated with PIPA. We use an association rule mining algorithm to automatically generate mappings between terms of two ontologies from annotated sample proteins. Incorporating the ontologies' hierarchical topology into the algorithm increases the number of generated mappings. In particular, it generates 40.0% additional mappings from the Clusters of Orthologous Groups (COG) to EC numbers and a six-fold increase in mappings from COG to GO terms. The mappings to EC numbers show a very high precision (99.8%) and recall (96.6%), while the mappings to GO terms show moderate precision (80.0%) and low recall (33.0%). Our consensus algorithm for GO annotation is based on the computation and propagation of likelihood scores associated with GO terms. The test results suggest that, for a given recall, the application of the consensus algorithm yields higher precision than when consensus is not used.
CONCLUSION: The algorithms implemented in PIPA provide automated genome-wide protein function annotation based on reconciled predictions from multiple resources.
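The ontology-mapping step summarized above (association rules mined from co-annotated proteins, then scored by precision and recall) can be sketched roughly as follows. This is a minimal illustration and not PIPA's implementation: the functions `mine_mappings` and `precision_recall`, the support and confidence thresholds, and the toy annotations are all assumptions, and the hierarchy-aware extension described in the abstract is not modeled.

```python
from collections import Counter, defaultdict


def mine_mappings(proteins, min_confidence=0.8, min_support=2):
    """Derive source-term -> target-term rules from co-annotated proteins.

    proteins: iterable of (source_terms, target_terms) pairs of sets.
    A rule s -> t is kept when at least `min_support` proteins carry both
    terms and count(s and t) / count(s) >= min_confidence.
    """
    source_count = Counter()
    pair_count = Counter()
    for source_terms, target_terms in proteins:
        for s in source_terms:
            source_count[s] += 1
            for t in target_terms:
                pair_count[(s, t)] += 1

    rules = defaultdict(set)
    for (s, t), n in pair_count.items():
        if n >= min_support and n / source_count[s] >= min_confidence:
            rules[s].add(t)
    return dict(rules)


def precision_recall(predicted, reference):
    """Precision and recall over (source, target) mapping pairs."""
    pred = {(s, t) for s, ts in predicted.items() for t in ts}
    ref = {(s, t) for s, ts in reference.items() for t in ts}
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    return precision, recall


# Made-up sample annotations; the COG labels are placeholders, not real IDs.
sample = [
    ({"COG_A"}, {"EC 1.1.1.1"}),
    ({"COG_A"}, {"EC 1.1.1.1"}),
    ({"COG_A"}, {"EC 1.1.1.1"}),
    ({"COG_B"}, {"EC 2.7.1.1"}),
    ({"COG_B"}, {"EC 2.7.1.2"}),
]
reference = {"COG_A": {"EC 1.1.1.1"}, "COG_B": {"EC 2.7.1.1"}}

rules = mine_mappings(sample)
scores = precision_recall(rules, reference)
print(rules, scores)
```

With these toy inputs the ambiguous `COG_B` annotations fall below both thresholds, so no rule is emitted for them. That trade-off, high precision at the cost of recall, is the same kind of behaviour the abstract reports for the COG-to-GO mappings.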
Affiliation(s)
- Chenggang Yu
- Biotechnology HPC Software Applications Institute, Telemedicine and Advanced Technology Research Center, US Army Medical Research and Materiel Command, Ft. Detrick, MD, USA.
|
25
|
Vinatzer BA, Yan S. Mining the genomes of plant pathogenic bacteria: how not to drown in gigabases of sequence. MOLECULAR PLANT PATHOLOGY 2008; 9:105-118. [PMID: 18705888 PMCID: PMC6640517 DOI: 10.1111/j.1364-3703.2007.00438.x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/26/2023]
Abstract
Hundreds of bacterial genomes including the genomes of dozens of plant pathogenic bacteria have been sequenced. These genomes represent an invaluable resource for molecular plant pathologists. In this review, we describe different approaches that can be used for mining bacterial genome sequences and examples of how some of these approaches have been used to analyse plant pathogen genomes so far. We review how genomes can be mined one by one and how comparative genomics of closely related genomes releases the true power of genomics. Databases and tools useful for genome mining that are publicly accessible on the Internet are also described. Finally, the need for new databases and tools to efficiently mine today's plant pathogen genomes and hundreds more in the near future is discussed.
Affiliation(s)
- Boris A Vinatzer
- Department of Plant Pathology, Physiology, and Weed Sciences, Virginia Tech, Blacksburg, VA 24061, USA.
|
26
|
Pinney JW, Papp B, Hyland C, Wambua L, Westhead DR, McConkey GA. Metabolic reconstruction and analysis for parasite genomes. Trends Parasitol 2007; 23:548-54. [PMID: 17950669 DOI: 10.1016/j.pt.2007.08.013] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.9] [Received: 08/20/2007] [Revised: 08/20/2007] [Accepted: 08/20/2007] [Indexed: 01/29/2023]
Abstract
With the completion of sequencing projects for several parasite genomes, efforts are ongoing to make sense of this mass of information in terms of the gene products encoded and their interactions in the growth, development and survival of parasites. The emerging science of systems biology aims to explain the complex relationship between genotype and phenotype by using network models. One area in which this approach has been particularly successful is in the modeling of metabolism. With an accurate picture of the set of metabolic reactions encoded in a genome, it is now possible to identify enzymes or transporters that might be viable targets for new drugs. Because these predictions greatly depend on the quality and completeness of the genome annotation, there are substantial efforts in the scientific community to increase the numbers of metabolic enzymes identified. In this review, we discuss the opportunities for using metabolic reconstruction and analysis tools in parasitology research, and their applications to protozoan parasites.
Affiliation(s)
- John W Pinney
- Faculty of Life Sciences, The University of Manchester, Oxford Road, Manchester, UK
|
27
|
Rodriguez AA, Bompada T, Syed M, Shah PK, Maltsev N. Evolutionary analysis of enzymes using Chisel. Bioinformatics 2007; 23:2961-8. [PMID: 17855417 DOI: 10.1093/bioinformatics/btm421] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Availability of large volumes of genomic and enzymatic data for taxonomically and phenotypically diverse organisms allows for exploration of the adaptive mechanisms that led to diversification of enzymatic functions. We present Chisel, a computational framework and a pipeline for an automated, high-resolution analysis of evolutionary variations of enzymes. Chisel allows automatic as well as interactive identification and characterization of enzymatic sequences. Such knowledge can be utilized for comparative genomics, microbial diagnostics, metabolic engineering, drug design and analysis of metagenomes.
RESULTS: Chisel is a comprehensive resource that contains 8575 clusters and subsequent computational models specific for 939 distinct enzymatic functions and, when data is sufficient, their taxonomic variations. Applications of Chisel to identification of enzymatic sequences in newly sequenced genomes, analysis of organism-specific metabolic networks, 'binning' of metagenomes and other biological problems are presented. We also provide a thorough analysis of Chisel performance with other similar resources and manual annotations on Shewanella oneidensis MR1 genome.
Affiliation(s)
- Alexis A Rodriguez
- Mathematics and Computer Science Division, Argonne National Laboratory, 9700 S. Cass Ave., Argonne, IL 60439, USA
|
28
|
Maier TM, Casey MS, Becker RH, Dorsey CW, Glass EM, Maltsev N, Zahrt TC, Frank DW. Identification of Francisella tularensis Himar1-based transposon mutants defective for replication in macrophages. Infect Immun 2007; 75:5376-89. [PMID: 17682043 PMCID: PMC2168294 DOI: 10.1128/iai.00238-07] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.3] [Indexed: 12/24/2022] Open
Abstract
Francisella tularensis, the etiologic agent of tularemia in humans, is a potential biological threat due to its low infectious dose and multiple routes of entry. F. tularensis replicates within several cell types, eventually causing cell death by inducing apoptosis. In this study, a modified Himar1 transposon (HimarFT) was used to mutagenize F. tularensis LVS. Approximately 7,000 Km(r) clones were screened using J774A.1 macrophages for reduction in cytopathogenicity based on retention of the cell monolayer. A total of 441 candidates with significant host cell retention compared to the parent were identified following screening in a high-throughput format. Retesting at a defined multiplicity of infection followed by in vitro growth analyses resulted in identification of approximately 70 candidates representing 26 unique loci involved in macrophage replication and/or cytotoxicity. Mutants carrying insertions in seven hypothetical genes were screened in a mouse model of infection, and all strains tested appeared to be attenuated, which validated the initial in vitro results obtained with cultured macrophages. Complementation and reverse transcription-PCR experiments suggested that the expression of genes adjacent to the HimarFT insertion may be affected depending on the orientation of the constitutive groEL promoter region used to ensure transcription of the selective marker in the transposon. A hypothetical gene, FTL_0706, postulated to be important for lipopolysaccharide biosynthesis, was confirmed to be a gene involved in O-antigen expression in F. tularensis LVS and Schu S4. These and other studies demonstrate that therapeutic targets, vaccine candidates, or virulence-related genes may be discovered utilizing classical genetic approaches in Francisella.
Affiliation(s)
- Tamara M Maier
- Department of Microbiology and Molecular Genetics, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI 53226, USA
|
29
|
Affiliation(s)
- Dmitrij Frishman
- Department of Genome Oriented Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany
|
30
|
Sun Y, Wipat A, Pocock M, Lee PA, Flanagan K, Worthington JT. Exploring Microbial Genome Sequences to Identify Protein Families on the Grid. IEEE Trans Inf Technol Biomed 2007; 11:435-42. [PMID: 17674626 DOI: 10.1109/titb.2007.892913] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/09/2022]
Abstract
The analysis of microbial genome sequences can identify protein families that provide potential drug targets for new antibiotics. With the rapid accumulation of newly sequenced genomes, this analysis has become a computationally intensive and data-intensive problem. This paper describes the development of a Web-service-enabled, component-based, architecture to support the large-scale comparative analysis of complete microbial genome sequences and the subsequent identification of orthologues and protein families (Microbase). The system is coordinated through the use of Web-service-based notifications and integrates distributed computing resources together with genomic databases to realize all-against-all comparisons for a large volume of genome sequences and to present the data in a computationally amenable format through a Web service interface. We demonstrate the use of the system in searching for orthologues and candidate protein families, which ultimately could lead to the identification of potential therapeutic targets.
Affiliation(s)
- Yudong Sun
- Newcastle University, Newcastle Upon Tyne, NE1 7RU, UK.
|
31
|
Iyer LM, Burroughs AM, Aravind L. The prokaryotic antecedents of the ubiquitin-signaling system and the early evolution of ubiquitin-like beta-grasp domains. Genome Biol 2006; 7:R60. [PMID: 16859499 PMCID: PMC1779556 DOI: 10.1186/gb-2006-7-7-r60] [Citation(s) in RCA: 134] [Impact Index Per Article: 7.9] [Received: 04/11/2006] [Revised: 06/12/2006] [Accepted: 07/06/2006] [Indexed: 11/14/2022] Open
Abstract
A systematic analysis of prokaryotic ubiquitin-related beta-grasp fold proteins provides new insights into the Ubiquitin family functional history.
Background: Ubiquitin (Ub)-mediated signaling is one of the hallmarks of all eukaryotes. Prokaryotic homologs of Ub (ThiS and MoaD) and E1 ligases have been studied in relation to sulfur incorporation reactions in thiamine and molybdenum/tungsten cofactor biosynthesis. However, there is no evidence for entire protein modification systems with Ub-like proteins and deconjugation by deubiquitinating enzymes in prokaryotes. Hence, the evolutionary assembly of the eukaryotic Ub-signaling apparatus remains unclear.
Results: We systematically analyzed prokaryotic Ub-related β-grasp fold proteins using sensitive sequence profile searches and structural analysis. Consequently, we identified novel Ub-related proteins beyond the characterized ThiS, MoaD, TGS, and YukD domains. To understand their functional associations, we sought and recovered several conserved gene neighborhoods and domain architectures. These included novel associations involving diverse sulfur metabolism proteins, siderophore biosynthesis and the gene encoding the transfer mRNA binding protein SmpB, as well as domain fusions between Ub-like domains and PIN-domain related RNAses. Most strikingly, we found conserved gene neighborhoods in phylogenetically diverse bacteria combining genes for JAB domains (the primary de-ubiquitinating isopeptidases of the proteasomal complex), along with E1-like adenylating enzymes and different Ub-related proteins. Further sequence analysis of other conserved genes in these neighborhoods revealed several Ub-conjugating enzyme/E2-ligase related proteins. Genes for an Ub-like protein and a JAB domain peptidase were also found in the tail assembly gene cluster of certain caudate bacteriophages.
Conclusion: These observations imply that members of the Ub family had already formed strong functional associations with E1-like proteins, UBC/E2-related proteins, and JAB peptidases in the bacteria. Several of these Ub-like proteins and the associated protein families are likely to function together in signaling systems just as in eukaryotes.
Affiliation(s)
- Lakshminarayan M Iyer
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- A Maxwell Burroughs
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Bioinformatics Program, Boston University, Cummington Street, Boston, Massachusetts 02215, USA
- L Aravind
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
|
32
|
Markowitz VM. Microbial genome data resources. Curr Opin Biotechnol 2007; 18:267-72. [PMID: 17467973 DOI: 10.1016/j.copbio.2007.04.005] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 01/26/2007] [Revised: 03/18/2007] [Accepted: 04/18/2007] [Indexed: 11/17/2022]
Abstract
Studies of the genomes of individual microbial organisms as well as aggregate genomes (metagenomes) of microbial communities are expected to lead to advances in various areas, such as healthcare, environmental cleanup, and alternative energy production. A variety of specialized data resources manage the results of different microbial genome data processing and interpretation stages, and represent different degrees of microbial genome characterization. Scientists studying microbial genomes and metagenomes often need one or several of these resources. Given their diversity, these resources cannot be used effectively without determining the scope and type of individual resources as well as the relationship between their data.
Affiliation(s)
- Victor M Markowitz
- Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Mail Stop 50A-1148, Berkeley CA 94720, USA.
|
33
|
D'Souza M, Glass EM, Syed MH, Zhang Y, Rodriguez A, Maltsev N, Galperin MY. Sentra: a database of signal transduction proteins for comparative genome analysis. Nucleic Acids Res 2006; 35:D271-3. [PMID: 17135204 PMCID: PMC1751548 DOI: 10.1093/nar/gkl949] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 11/24/2022] Open
Abstract
Sentra, a database of signal transduction proteins encoded in completely sequenced prokaryotic genomes, has been updated to reflect recent advances in understanding signal transduction events on a whole-genome scale. Sentra consists of two principal components, a manually curated list of signal transduction proteins in 202 completely sequenced prokaryotic genomes and an automatically generated listing of predicted signaling proteins in 235 sequenced genomes that are awaiting manual curation. In addition to two-component histidine kinases and response regulators, the database now lists manually curated Ser/Thr/Tyr protein kinases and protein phosphatases, as well as adenylate and diguanylate cyclases and c-di-GMP phosphodiesterases, as defined in several recent reviews. All entries in Sentra are extensively annotated with relevant information from public databases (e.g. UniProt, KEGG, PDB and NCBI). Sentra's infrastructure was redesigned to support interactive cross-genome comparisons of signal transduction capabilities of prokaryotic organisms from a taxonomic and phenotypic perspective and in the framework of signal transduction pathways from KEGG. Sentra leverages the PUMA2 system to support interactive analysis and annotation of signal transduction proteins by the users.
Affiliation(s)
- Mark D'Souza
- Computational Biology Group, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA
- Computation Institute, University of Chicago, Chicago, IL 60637, USA
- To whom correspondence should be addressed. Tel: +1 630 252 5195; Fax: +1 630 252 5986
- Elizabeth M. Glass
- Computational Biology Group, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA
- Computation Institute, University of Chicago, Chicago, IL 60637, USA
- Mustafa H. Syed
- Computational Biology Group, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA
- Yi Zhang
- Computational Biology Group, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA
- Alexis Rodriguez
- Computational Biology Group, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA
- Natalia Maltsev
- Computational Biology Group, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA
- Computation Institute, University of Chicago, Chicago, IL 60637, USA
- Michael Y. Galperin
- National Center for Biotechnology Information, National Library of Medicine, MSC3830, National Institutes of Health, Bethesda, MD 20894, USA
|
34
|
Hyland C, Pinney JW, McConkey GA, Westhead DR. metaSHARK: a WWW platform for interactive exploration of metabolic networks. Nucleic Acids Res 2006; 34:W725-8. [PMID: 16845107 PMCID: PMC1538829 DOI: 10.1093/nar/gkl196] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/27/2022] Open
Abstract
The metaSHARK (metabolic search and reconstruction kit) web server offers users an intuitive, fully interactive way to explore the KEGG metabolic network via a WWW browser. Metabolic reconstruction information for specific organisms, produced by our automated SHARKhunt tool or from other programs or genome annotations, may be uploaded to the website and overlaid on the generic network. Additional data from gene expression experiments can also be incorporated, allowing the visualization of differential gene expression in the context of the predicted metabolic network. metaSHARK is available at .
Affiliation(s)
- John W. Pinney
- Faculty of Life Sciences, University of Manchester, Oxford Road, Manchester M13 9PT, UK
- To whom correspondence should be addressed. Tel: +44 0 161 275 1566; Fax: +44 0 161 275 5082
|
35
|
Stothard P, Wishart DS. Automated bacterial genome analysis and annotation. Curr Opin Microbiol 2006; 9:505-10. [PMID: 16931121 DOI: 10.1016/j.mib.2006.08.002] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Received: 06/05/2006] [Accepted: 08/10/2006] [Indexed: 10/24/2022]
Abstract
More than 300 bacterial genome sequences are publicly available, and many more are scheduled to be completed and released in the near future. Converting this raw sequence information into a better understanding of the biology of bacteria involves the identification and annotation of genes, proteins and pathways. This processing is typically done using sequence annotation pipelines composed of a variety of software modules and, in some cases, human experts. The reference databases, computational methods and knowledge that form the basis of these pipelines are constantly evolving, and thus there is a need to reprocess genome annotations on a regular basis. The combined challenge of revising existing annotations and extracting useful information from the flood of new genome sequences will necessitate more reliance on completely automated systems.
Affiliation(s)
- Paul Stothard
- Departments of Biological Sciences & Computing Science, University of Alberta, Alberta, Canada
|
36
|
Abstract
The NAR Molecular Biology Database Collection is a public online resource that contains links to all databases described in this issue of Nucleic Acids Research. In addition, this collection lists databases that have been featured in previous issues of NAR, as well as selected other databases that are freely available to the public and may be useful to the molecular biologist. The 2006 update includes 858 databases, 139 more than the previous one. The databases come with brief summaries, many of which have been updated recently. Each database is assigned a stable accession number that does not change if the database moves to a new location and its URL, authors' names or the contact person address are updated. The complete database list and summaries are available online at the Nucleic Acids Research website http://nar.oxfordjournals.org/.
Affiliation(s)
- Michael Y Galperin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
|