1
Karp PD, Paley S, Caspi R, Kothari A, Krummenacker M, Midford PE, Moore LR, Subhraveti P, Gama-Castro S, Tierrafria VH, Lara P, Muñiz-Rascado L, Bonavides-Martinez C, Santos-Zavaleta A, Mackie A, Sun G, Ahn-Horst TA, Choi H, Covert MW, Collado-Vides J, Paulsen I. The EcoCyc Database (2023). EcoSal Plus 2023; 11:eesp00022023. [PMID: 37220074 PMCID: PMC10729931 DOI: 10.1128/ecosalplus.esp-0002-2023] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 03/13/2023] [Accepted: 04/04/2023] [Indexed: 01/28/2024]
Abstract
EcoCyc is a bioinformatics database available online at EcoCyc.org that describes the genome and the biochemical machinery of Escherichia coli K-12 MG1655. The long-term goal of the project is to describe the complete molecular catalog of the E. coli cell, as well as the functions of each of its molecular parts, to facilitate a system-level understanding of E. coli. EcoCyc is an electronic reference source for E. coli biologists and for biologists who work with related microorganisms. The database includes information pages on each E. coli gene product, metabolite, reaction, operon, and metabolic pathway. The database also includes information on the regulation of gene expression, E. coli gene essentiality, and nutrient conditions that do or do not support the growth of E. coli. The website and downloadable software contain tools for the analysis of high-throughput data sets. In addition, a steady-state metabolic flux model is generated from each new version of EcoCyc and can be executed online. The model can predict metabolic flux rates, nutrient uptake rates, and growth rates for different gene knockouts and nutrient conditions. Data generated from a whole-cell model that is parameterized from the latest data on EcoCyc are also available. This review outlines the data content of EcoCyc and of the procedures by which this content is generated.
Affiliation(s)
- Peter D. Karp
  - Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Suzanne Paley
  - Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Ron Caspi
  - Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Anamika Kothari
  - Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Markus Krummenacker
  - Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Peter E. Midford
  - Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Lisa R. Moore
  - Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Pallavi Subhraveti
  - Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Socorro Gama-Castro
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Victor H. Tierrafria
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Paloma Lara
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Luis Muñiz-Rascado
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- César Bonavides-Martinez
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Alberto Santos-Zavaleta
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Amanda Mackie
  - Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, New South Wales, Australia
- Gwanggyu Sun
  - Department of Bioengineering, Stanford University, Stanford, California, USA
- Travis A. Ahn-Horst
  - Department of Bioengineering, Stanford University, Stanford, California, USA
- Heejo Choi
  - Department of Bioengineering, Stanford University, Stanford, California, USA
- Markus W. Covert
  - Department of Bioengineering, Stanford University, Stanford, California, USA
- Julio Collado-Vides
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Ian Paulsen
  - School of Natural Sciences, Macquarie University, Sydney, New South Wales, Australia
2
Piedrafita G, Varma SJ, Castro C, Messner CB, Szyrwiel L, Griffin JL, Ralser M. Cysteine and iron accelerate the formation of ribose-5-phosphate, providing insights into the evolutionary origins of the metabolic network structure. PLoS Biol 2021; 19:e3001468. [PMID: 34860829 PMCID: PMC8673631 DOI: 10.1371/journal.pbio.3001468] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/30/2021] [Revised: 12/15/2021] [Accepted: 11/04/2021] [Indexed: 12/21/2022]
Abstract
The structure of the metabolic network is highly conserved, but we know little about its evolutionary origins. Key for explaining the early evolution of metabolism is solving a chicken–egg dilemma, which describes that enzymes are made from the very same molecules they produce. The recent discovery of several nonenzymatic reaction sequences that topologically resemble central metabolism has provided experimental support for a “metabolism first” theory, in which at least part of the extant metabolic network emerged on the basis of nonenzymatic reactions. But how could evolution kick-start on the basis of a metal-catalyzed reaction sequence, and how could the structure of nonenzymatic reaction sequences be imprinted on the metabolic network to remain conserved for billions of years? We performed an in vitro screening in which we added the simplest components of metabolic enzymes, proteinogenic amino acids, to a nonenzymatic, iron-driven reaction network that resembles glycolysis and the pentose phosphate pathway (PPP). We observed that the presence of the amino acids enhanced several of the nonenzymatic reactions. Particular attention was drawn to a reaction that resembles a rate-limiting step in the oxidative PPP. The prebiotically available proteinogenic amino acid cysteine accelerated the formation of the RNA nucleoside precursor ribose-5-phosphate from 6-phosphogluconate. We report that iron and cysteine interact and have additive effects on the reaction rate, so that ribose-5-phosphate forms at high specificity under mild, metabolism-typical temperature and environmental conditions. We speculate that accelerating effects of amino acids on rate-limiting nonenzymatic reactions could have facilitated a stepwise enzymatization of nonenzymatic reaction sequences, imprinting their structure on the evolving metabolic network.
The evolutionary origins of metabolism are largely unknown. This study shows that the prebiotically available proteinogenic amino acid cysteine can promote the metabolism-like rate-limiting formation of ribose-5-phosphate, suggesting that early metabolic pathways could have emerged through the stepwise enzymatization of nonenzymatic reaction sequences.
Affiliation(s)
- Gabriel Piedrafita
  - Department of Biochemistry and Cambridge Systems Biology Centre, University of Cambridge, United Kingdom
- Sreejith J. Varma
  - Department of Biochemistry, Charité Universitätsmedizin Berlin, Berlin, Germany
- Cecilia Castro
  - Department of Biochemistry and Cambridge Systems Biology Centre, University of Cambridge, United Kingdom
- Christoph B. Messner
  - The Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, United Kingdom
- Lukasz Szyrwiel
  - Department of Biochemistry, Charité Universitätsmedizin Berlin, Berlin, Germany
  - The Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, United Kingdom
- Julian L. Griffin
  - Department of Biochemistry and Cambridge Systems Biology Centre, University of Cambridge, United Kingdom
  - The Rowett Institute, The University of Aberdeen, Aberdeen, United Kingdom
- Markus Ralser
  - Department of Biochemistry and Cambridge Systems Biology Centre, University of Cambridge, United Kingdom
  - Department of Biochemistry, Charité Universitätsmedizin Berlin, Berlin, Germany
  - The Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, United Kingdom
3
Swietnicki W, Caspi R. Prediction of Selected Biosynthetic Pathways for the Lipopolysaccharide Components in Porphyromonas gingivalis. Pathogens 2021; 10:pathogens10030374. [PMID: 33804654 PMCID: PMC8003790 DOI: 10.3390/pathogens10030374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2021] [Revised: 03/16/2021] [Accepted: 03/17/2021] [Indexed: 11/16/2022]
Abstract
Porphyromonas gingivalis is an oral human pathogen. The bacterium destroys dental tissue and is a serious health problem worldwide. Experimental data and bioinformatic analysis revealed that the pathogen produces three types of lipopolysaccharides (LPS): normal (O-type), anionic (A-type), and capsular (K-type). The enzymes involved in lipopolysaccharide production have been largely identified for the first two types and partially for the third. In the current work, we use bioinformatics tools to predict biosynthetic pathways for the production of the normal (O-type) lipopolysaccharide in the W50 strain of Porphyromonas gingivalis and compare the pathway with other putative pathways in fully sequenced and completed genomes of other pathogenic strains. Selected enzymes from the pathway have been modeled and putative structures are presented. The pathway for the A-type antigen could not be predicted at this time because two mutually exclusive structures have been proposed in the literature. The pathway for K-type antigen biosynthesis could not be predicted either, owing to the lack of structural data for the antigen. However, pathways for the synthesis of lipid A, its core components, and the O-type antigen ligase reaction have been proposed based on a combination of experimental data and bioinformatic analyses. The predicted pathways are compared with known pathways in other systems and discussed. This is the first report in the literature showing, in detail, predicted pathways for the synthesis of selected LPS components for the model W50 strain of P. gingivalis.
Affiliation(s)
- Wieslaw Swietnicki
  - Department of Immunology of Infectious Diseases, L. Hirszfeld Institute of Immunology and Experimental Therapy of PAS, ul. R. Weigla 12, 53-114 Wroclaw, Poland
- Ron Caspi
  - Artificial Intelligence Center, SRI International, 333 Ravenswood Avenue, Menlo Park, CA 94025-3493, USA
4
Pathway Tools Visualization of Organism-Scale Metabolic Networks. Metabolites 2021; 11:metabo11020064. [PMID: 33499002 PMCID: PMC7911265 DOI: 10.3390/metabo11020064] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/24/2020] [Revised: 01/12/2021] [Accepted: 01/12/2021] [Indexed: 12/20/2022]
Abstract
Metabolomics, synthetic biology, and microbiome research demand information about organism-scale metabolic networks. The convergence of genome sequencing and computational inference of metabolic networks has enabled great progress toward satisfying that demand by generating metabolic reconstructions from the genomes of thousands of sequenced organisms. Visualization of whole metabolic networks is critical for aiding researchers in understanding, analyzing, and exploiting those reconstructions. We have developed bioinformatics software tools that automatically generate a full metabolic-network diagram for an organism, and that enable searching and analyses of the network. The software generates metabolic-network diagrams for unicellular organisms, for multi-cellular organisms, and for pan-genomes and organism communities. Search tools enable users to find genes, metabolites, enzymes, reactions, and pathways within a diagram. The diagrams are zoomable to enable researchers to study local neighborhoods in detail and to see the big picture. The diagrams also serve as tools for comparison of metabolic networks and for interpreting high-throughput datasets, including transcriptomics, metabolomics, and reaction fluxes computed by metabolic models. These data can be overlaid on the metabolic charts to produce animated zoomable displays of metabolic flux and metabolite abundance. The BioCyc.org website contains whole-network diagrams for more than 18,000 sequenced organisms. The ready availability of organism-specific metabolic network diagrams and associated tools for almost any sequenced organism is useful for researchers working to better understand the metabolism of their organism and to interpret high-throughput datasets in a metabolic context.
5
Karp PD, Ong WK, Paley S, Billington R, Caspi R, Fulcher C, Kothari A, Krummenacker M, Latendresse M, Midford PE, Subhraveti P, Gama-Castro S, Muñiz-Rascado L, Bonavides-Martinez C, Santos-Zavaleta A, Mackie A, Collado-Vides J, Keseler IM, Paulsen I. The EcoCyc Database. EcoSal Plus 2018; 8:10.1128/ecosalplus.ESP-0006-2018. [PMID: 30406744 PMCID: PMC6504970 DOI: 10.1128/ecosalplus.esp-0006-2018] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Received: 06/05/2018] [Indexed: 01/28/2023]
Abstract
EcoCyc is a bioinformatics database available at EcoCyc.org that describes the genome and the biochemical machinery of Escherichia coli K-12 MG1655. The long-term goal of the project is to describe the complete molecular catalog of the E. coli cell, as well as the functions of each of its molecular parts, to facilitate a system-level understanding of E. coli. EcoCyc is an electronic reference source for E. coli biologists and for biologists who work with related microorganisms. The database includes information pages on each E. coli gene product, metabolite, reaction, operon, and metabolic pathway. The database also includes information on E. coli gene essentiality and on nutrient conditions that do or do not support the growth of E. coli. The website and downloadable software contain tools for analysis of high-throughput data sets. In addition, a steady-state metabolic flux model is generated from each new version of EcoCyc and can be executed via EcoCyc.org. The model can predict metabolic flux rates, nutrient uptake rates, and growth rates for different gene knockouts and nutrient conditions. This review outlines the data content of EcoCyc and of the procedures by which this content is generated.
Affiliation(s)
- Peter D Karp
  - Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Wai Kit Ong
  - Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Suzanne Paley
  - Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Ron Caspi
  - Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Carol Fulcher
  - Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Anamika Kothari
  - Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Mario Latendresse
  - Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Peter E Midford
  - Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Socorro Gama-Castro
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
- Luis Muñiz-Rascado
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
- César Bonavides-Martinez
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
- Alberto Santos-Zavaleta
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
- Amanda Mackie
  - Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
- Julio Collado-Vides
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
- Ingrid M Keseler
  - Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Ian Paulsen
  - Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
6
Khomtchouk BB, Weitz E, Karp PD, Wahlestedt C. How the strengths of Lisp-family languages facilitate building complex and flexible bioinformatics applications. Brief Bioinform 2018; 19:537-543. [PMID: 28040748 PMCID: PMC5952920 DOI: 10.1093/bib/bbw130] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 08/18/2016] [Revised: 11/16/2016] [Indexed: 11/14/2022]
Abstract
We present a rationale for expanding the presence of the Lisp family of programming languages in bioinformatics and computational biology research. Put simply, Lisp-family languages enable programmers to more quickly write programs that run faster than in other languages. Languages such as Common Lisp, Scheme and Clojure facilitate the creation of powerful and flexible software that is required for complex and rapidly evolving domains like biology. We will point out several important key features that distinguish languages of the Lisp family from other programming languages, and we will explain how these features can aid researchers in becoming more productive and creating better code. We will also show how these features make these languages ideal tools for artificial intelligence and machine learning applications. We will specifically stress the advantages of domain-specific languages (DSLs): languages that are specialized to a particular area, and thus not only facilitate easier research problem formulation, but also aid in the establishment of standards and best programming practices as applied to the specific research field at hand. DSLs are particularly easy to build in Common Lisp, the most comprehensive Lisp dialect, which is commonly referred to as the 'programmable programming language'. We are convinced that Lisp grants programmers unprecedented power to build increasingly sophisticated artificial intelligence systems that may ultimately transform machine learning and artificial intelligence research in bioinformatics and computational biology.
Affiliation(s)
- Bohdan B Khomtchouk
  - Center for Therapeutic Innovation and Department of Psychiatry and Behavioral Sciences, University of Miami Miller School of Medicine, 1120 NW 14th St., Miami, FL, USA
- Edmund Weitz
  - Center for Therapeutic Innovation and Department of Psychiatry and Behavioral Sciences, University of Miami Miller School of Medicine, 1120 NW 14th St., Miami, FL, USA
- Peter D Karp
  - Center for Therapeutic Innovation and Department of Psychiatry and Behavioral Sciences, University of Miami Miller School of Medicine, 1120 NW 14th St., Miami, FL, USA
- Claes Wahlestedt
  - Center for Therapeutic Innovation and Department of Psychiatry and Behavioral Sciences, University of Miami Miller School of Medicine, 1120 NW 14th St., Miami, FL, USA
7
Karp PD, Weaver D, Paley S, Fulcher C, Kubo A, Kothari A, Krummenacker M, Subhraveti P, Weerasinghe D, Gama-Castro S, Huerta AM, Muñiz-Rascado L, Bonavides-Martinez C, Weiss V, Peralta-Gil M, Santos-Zavaleta A, Schröder I, Mackie A, Gunsalus R, Collado-Vides J, Keseler IM, Paulsen I. The EcoCyc Database. EcoSal Plus 2014; 6:10.1128/ecosalplus.ESP-0009-2013. [PMID: 26442933 PMCID: PMC4243172 DOI: 10.1128/ecosalplus.esp-0009-2013] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Received: 01/16/2014] [Indexed: 11/20/2022]
Abstract
EcoCyc is a bioinformatics database available at EcoCyc.org that describes the genome and the biochemical machinery of Escherichia coli K-12 MG1655. The long-term goal of the project is to describe the complete molecular catalog of the E. coli cell, as well as the functions of each of its molecular parts, to facilitate a system-level understanding of E. coli. EcoCyc is an electronic reference source for E. coli biologists and for biologists who work with related microorganisms. The database includes information pages on each E. coli gene, metabolite, reaction, operon, and metabolic pathway. The database also includes information on E. coli gene essentiality and on nutrient conditions that do or do not support the growth of E. coli. The website and downloadable software contain tools for analysis of high-throughput data sets. In addition, a steady-state metabolic flux model is generated from each new version of EcoCyc. The model can predict metabolic flux rates, nutrient uptake rates, and growth rates for different gene knockouts and nutrient conditions. This review provides a detailed description of the data content of EcoCyc and of the procedures by which this content is generated.
Affiliation(s)
- Peter D Karp
  - Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Daniel Weaver
  - Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Suzanne Paley
  - Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Carol Fulcher
  - Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Aya Kubo
  - Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Anamika Kothari
  - Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Socorro Gama-Castro
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
- Araceli M Huerta
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
- Luis Muñiz-Rascado
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
- César Bonavides-Martinez
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
- Verena Weiss
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
- Martin Peralta-Gil
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
- Alberto Santos-Zavaleta
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
- Imke Schröder
  - Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, CA 90095
  - UCLA Institute of Genomics and Proteomics, University of California, Los Angeles, CA 90095
- Amanda Mackie
  - Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
- Robert Gunsalus
  - Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, CA 90095
- Julio Collado-Vides
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
- Ingrid M Keseler
  - Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Ian Paulsen
  - Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
8
Nicoletti MC, Bertini JR, Tanizaki MM, Zangirolami TC, Gonçalves VM, Horta ACL, Giordano RC. On-line prediction of the feeding phase in high-cell density cultivation of rE. coli using constructive neural networks. Computer Methods and Programs in Biomedicine 2013; 111:228-248. [PMID: 23566708 DOI: 10.1016/j.cmpb.2013.03.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2011] [Revised: 12/18/2012] [Accepted: 03/11/2013] [Indexed: 06/02/2023]
Abstract
Streptococcus pneumoniae (pneumococcus) is a bacterium responsible for a wide spectrum of illnesses. The surface of the bacterium consists of three distinctive membranes: plasmatic, cellular and the polysaccharide (PS) capsule. PS capsules may mediate several biological processes, particularly invasive infections of human beings. Prevention of pneumococcal-related illnesses can be provided by vaccines. There is sound investment worldwide in the investigation of a proteic antigen as a possible alternative to pneumococcal vaccines based exclusively on PS. A few proteins that are part of the membrane of the pneumococcus seem to have antigen potential to be part of a vaccine, particularly PspA. A vital aspect in the production of the intended conjugate pneumococcal vaccine is the efficient production (at industrial scale) of both the chosen PS serotypes and the PspA protein. Growing recombinant Escherichia coli (rE. coli) in high-cell density cultures (HCDC) under a fed-batch regime requires refined continuous control over various process variables, where the on-line prediction of the feeding phase is of particular relevance and one of the focuses of this paper. The viability of an on-line monitoring software system, based on constructive neural networks (CoNN), for automatically detecting the time to start the fed-phase of an HCDC of rE. coli that contains a plasmid used for PspA expression is investigated. The paper describes the data and methodology used for training five different types of CoNNs, four of them suitable for classification tasks and one suitable for regression tasks, with the aim of comparing both approaches. Results of software simulations implementing five CoNN algorithms as well as conventional neural networks (FFNN), decision trees (DT) and support vector machines (SVM) are also presented and discussed. A modified CasCor algorithm, implementing a data softening process, has been shown to be an efficient candidate to be part of an on-line HCDC monitoring system for detecting the feeding phase of the HCDC process.
Affiliation(s)
- M C Nicoletti
  - Depto. de Computação, UFSCar, S. Carlos, SP, Brazil.
9
Van Moerkercke A, Fabris M, Pollier J, Baart GJE, Rombauts S, Hasnain G, Rischer H, Memelink J, Oksman-Caldentey KM, Goossens A. CathaCyc, a metabolic pathway database built from Catharanthus roseus RNA-Seq data. Plant & Cell Physiology 2013; 54:673-85. [PMID: 23493402 DOI: 10.1093/pcp/pct039] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.3] [Indexed: 05/08/2023]
Abstract
The medicinal plant Madagascar periwinkle (Catharanthus roseus) synthesizes numerous terpenoid indole alkaloids (TIAs), such as the anticancer drugs vinblastine and vincristine. The TIA pathway operates in a complex metabolic network that steers plant growth and survival. Pathway databases and metabolic networks reconstructed from 'omics' sequence data can help to discover missing enzymes, study metabolic pathway evolution and, ultimately, engineer metabolic pathways. To date, such databases have mainly been built for model plant species with sequenced genomes. Although genome sequence data are not available for most medicinal plant species, next-generation sequencing is now extensively employed to create comprehensive medicinal plant transcriptome sequence resources. Here we report on the construction of CathaCyc, a detailed metabolic pathway database, from C. roseus RNA-Seq data sets. CathaCyc (version 1.0) contains 390 pathways with 1,347 assigned enzymes and spans primary and secondary metabolism. Curation of the pathways linked with the synthesis of TIAs and triterpenoids, their primary metabolic precursors, and their elicitors, the jasmonate hormones, demonstrated that RNA-Seq resources are suitable for the construction of pathway databases. CathaCyc is accessible online (http://www.cathacyc.org) and offers a range of tools for the visualization and analysis of metabolic networks and 'omics' data. Overlay with expression data from publicly available RNA-Seq resources demonstrated that two well-characterized C. roseus terpenoid pathways, those of TIAs and triterpenoids, are subject to distinct regulation by both developmental and environmental cues. We anticipate that databases such as CathaCyc will become key to the study and exploitation of the metabolism of medicinal plants.
10
Kadir TAA, Mannan AA, Kierzek AM, McFadden J, Shimizu K. Modeling and simulation of the main metabolism in Escherichia coli and its several single-gene knockout mutants with experimental verification. Microb Cell Fact 2010; 9:88. [PMID: 21092096 PMCID: PMC2999585 DOI: 10.1186/1475-2859-9-88] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.3] [Received: 07/16/2010] [Accepted: 11/19/2010] [Indexed: 11/30/2022]
Abstract
BACKGROUND: It is important to simulate the metabolic changes of a cell in response to changes in the culture environment and/or specific gene knockouts, particularly for industrial applications. If this could be done, cell design could proceed without exhaustive experiments, and promising candidates could be screened computationally, followed by experimental verification of a select few of particular interest. Although several models have so far been proposed, most of them focus on specific metabolic pathways. It is preferable to model the whole of the main metabolic pathways in Escherichia coli, allowing for the estimation of energy generation and cell synthesis based on intracellular fluxes, which may be used to characterize phenotypic growth. RESULTS: In the present study, we considered the simulation of the main metabolic pathways such as glycolysis, the TCA cycle, the pentose phosphate (PP) pathway, and the anaplerotic pathways using enzymatic reaction models of E. coli. Once intracellular fluxes were computed by this model, the specific ATP production rate, the specific CO₂ production rate, and the specific NADPH production rate could be estimated. The specific ATP production rate thus computed was used for the estimation of the specific growth rate. The CO₂ production rate could be used to estimate cell yield, and the specific NADPH production rate could be used to determine the flux of the oxidative PP pathway. Batch and continuous cultivations were simulated, and the changing patterns of extracellular and intracellular metabolite concentrations were compared with experimental data. Moreover, the effects of the knockout of such pathways as Ppc, Pck and Pyk on the metabolism were simulated. It was shown to be difficult for the cell to grow in the Ppc mutant due to the low concentration of OAA, while the Pck mutant does not necessarily show this phenomenon. The slower growth rate of the Ppc mutant was properly estimated by taking into account the lower specific ATP production rate. In the case of the Pyk mutant, the enzyme-level regulation was made clear: the Pyk knockout caused the PEP concentration to be up-regulated and activated Ppc, which caused an increase in MAL concentration and backed up reduced PYR through Mez, resulting in phenotypic growth characteristics similar to the wild type. CONCLUSIONS: Simulating the main metabolism of E. coli, considering the whole of the main metabolic pathways, was shown to be useful for understanding metabolic changes inside the cell in response to specific pathway gene knockouts. The comparison of the simulation results with the experimental data indicates that the present model can, to some extent, simulate the effect of specific gene knockouts on metabolism.
Affiliation(s)
- Tuty Asmawaty Abdul Kadir
- Dept. of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka 820-8502, Japan
- Ahmad A Mannan
- Fac. of Health and Medical Sciences, AW Building, University of Surrey, Guildford, Surrey GU2 7TE, UK
- Andrzej M Kierzek
- Fac. of Health and Medical Sciences, AW Building, University of Surrey, Guildford, Surrey GU2 7TE, UK
- Johnjoe McFadden
- Fac. of Health and Medical Sciences, AW Building, University of Surrey, Guildford, Surrey GU2 7TE, UK
- Kazuyuki Shimizu
- Dept. of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka 820-8502, Japan
- Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata 997-0017, Japan
|
11
|
Affiliation(s)
- S H Preskorn
- Department of Psychiatry, University of Kansas School of Medicine-Wichita, Wichita, Kansas, USA
|
12
|
Abstract
We describe a graphical editor designed specifically to facilitate analysis and visualization of complex signal-transduction pathways. The editor provides automatic layout of complex regulatory graphs and enables users easily to maintain, edit, and exchange publication-quality images of regulatory networks.
Affiliation(s)
- T Koike
- Columbia Genome Center, Columbia University, New York, NY, USA
|
13
|
Affiliation(s)
- M Kanehisa
- Institute for Chemical Research, Kyoto University, Japan
|
14
|
Tweeddale H, Notley-McRobb L, Ferenci T. Assessing the effect of reactive oxygen species on Escherichia coli using a metabolome approach. Redox Rep 2000; 4:237-41. [PMID: 10731098 DOI: 10.1179/135100099101534954] [Citation(s) in RCA: 42] [Impact Index Per Article: 1.8] [Indexed: 10/31/2022]
Abstract
A two-dimensional thin-layer chromatographic analysis of [14C]-labelled metabolites in Escherichia coli was employed to follow metabolic shifts in response to superoxide stress. Steady-state challenge with paraquat at concentrations inducing SoxRS-controlled genes resulted in several alterations in metabolite pools, including a striking increase in valine concentration. Elevated valine levels, together with increased glutathione and alkylperoxidase, are proposed to constitute an intracellular protection mechanism against reactive oxygen species. As shown by this example of metabolome analysis, novel cellular responses to environmental challenge can be revealed by following the total complement of metabolites in a cell.
Affiliation(s)
- H Tweeddale
- Department of Microbiology, University of Sydney, New South Wales, Australia
|
15
|
Abstract
We developed a data and knowledge base for cellular signal transduction in human cells, to make this rapidly growing information available. The database includes all the biological properties of cellular signal transduction, including biological reactions that transfer cellular signals and molecular attributes characterized by sequences, structures, and functions. Since the database is based on the object-oriented technique, highly flexible methods of data definition and modification are necessary to handle this diverse and complex biological information. The database includes attractive graphical representations of signaling cascades and the three-dimensional structure of molecules. The database is a novel application of ACEDB, which was the database originally developed to store the C. elegans genome. The database can be accessed through the Internet at http://geo.nihs.go.jp/csndb.html.
Affiliation(s)
- T Takai-Igarashi
- Division of Chem-Bio Informatics, National Institute of Health Sciences, Setagaya, Tokyo, Japan.
|
16
|
Abstract
The discovery and characterization of genes specifically induced in vivo upon infection and/or at a specific stage of the infection will be the next phase in studying bacterial virulence at the molecular level. Genes isolated are most likely to encode virulence-associated factors or products essential for survival, bacterial cell division and multiplication in situ. Identification of these genes is expected to provide new means to prevent infection, new targets for antimicrobial therapy, as well as new insights into the infection process. Analysis of genes and their sequences initially discovered as in vivo induced may now be revealed by functional and comparative genomics. The new field of virulence genomics and their clustering as pathogenicity islands makes feasible their in-depth analysis. Application of new technologies such as in vivo expression technologies, signature-tagged mutagenesis, differential fluorescence induction, differential display using polymerase chain reaction coupled to bacterial genomics is expected to provide a strong basis for studying in vivo induced genes, and a better understanding of bacterial pathogenicity in vivo. This review presents technologies for characterization of genes expressed in vivo.
Affiliation(s)
- M Handfield
- Molecular Microbiology and Protein Engineering, Health and Life Sciences Research Center, Quebec, Canada
|
17
|
Karp PD, Riley M, Paley SM, Pellegrini-Toole A, Krummenacker M. EcoCyc: encyclopedia of Escherichia coli genes and metabolism. Nucleic Acids Res 1999; 27:55-8. [PMID: 9847140 PMCID: PMC148095 DOI: 10.1093/nar/27.1.55] [Citation(s) in RCA: 61] [Impact Index Per Article: 2.4] [Indexed: 11/12/2022]
Abstract
The EcoCyc database describes the genome and gene products of Escherichia coli, its metabolic and signal-transduction pathways, and its tRNAs. The database describes 4391 genes of E. coli, 695 enzymes encoded by a subset of these genes, 904 metabolic reactions that occur in E. coli, and the organization of these reactions into 129 metabolic pathways. The EcoCyc graphical user interface allows scientists to query and explore the EcoCyc database using visualization tools such as genomic-map browsers and automatic layouts of metabolic pathways. EcoCyc has many references to the primary literature, and is a (qualitative) computational model of E. coli metabolism. EcoCyc is available at URL http://ecocyc.PangeaSystems.com/ecocyc/
Affiliation(s)
- P D Karp
- Pangea Systems Inc., 4040 Campbell Avenue, Menlo Park, CA 94025, USA and Marine Biological Laboratory, Woods Hole, MA 02543, USA.
|
18
|
Fetrow JS, Godzik A, Skolnick J. Functional analysis of the Escherichia coli genome using the sequence-to-structure-to-function paradigm: identification of proteins exhibiting the glutaredoxin/thioredoxin disulfide oxidoreductase activity. J Mol Biol 1998; 282:703-11. [PMID: 9743619 DOI: 10.1006/jmbi.1998.2061] [Citation(s) in RCA: 80] [Impact Index Per Article: 3.1] [Indexed: 11/22/2022]
Abstract
The application of an automated method for the screening of protein activity based on the sequence-to-structure-to-function paradigm is presented for the complete Escherichia coli genome. First, the structure of the protein is identified from its sequence using a threading algorithm, which aligns the sequences to the best matching structure in a structural database and extends sequence analysis well beyond the limits of local sequence identity. Then, the active site is identified in the resulting sequence-to-structure alignment using a "fuzzy functional form" (FFF), a three-dimensional descriptor of the active site of a protein. Here, this sequence-to-structure-to-function concept is applied to analysis of the complete E. coli genome, i.e. all E. coli open reading frames (ORFs) are screened for the thiol-disulfide oxidoreductase activity of the glutaredoxin/thioredoxin protein family. We show that the method can identify the active sites in ten sequences that are known to or proposed to exhibit this activity. Furthermore, oxidoreductase activity is predicted in two other sequences that have not been identified previously. This method distinguishes protein pairs with similar active sites from protein pairs that are just topological cousins, i.e. those having similar global folds, but not necessarily similar active sites. Thus, this method provides a novel approach for extraction of active site and functional information based on three-dimensional structures, rather than simple sequence analysis. Prediction of protein activity is fully automated and easily extendible to new functions. Finally, it is demonstrated here that the method can be applied to complete genome database analysis.
Affiliation(s)
- J S Fetrow
- Department of Molecular Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037, USA
|
19
|
Tweeddale H, Notley-McRobb L, Ferenci T. Effect of slow growth on metabolism of Escherichia coli, as revealed by global metabolite pool ("metabolome") analysis. J Bacteriol 1998; 180:5109-16. [PMID: 9748443 PMCID: PMC107546 DOI: 10.1128/jb.180.19.5109-5116.1998] [Citation(s) in RCA: 299] [Impact Index Per Article: 11.5] [Received: 05/06/1998] [Accepted: 07/28/1998] [Indexed: 11/20/2022]
Abstract
Escherichia coli growing on glucose in minimal medium controls its metabolite pools in response to environmental conditions. The extent of pool changes was followed through two-dimensional thin-layer chromatography of all 14C-glucose labelled compounds extracted from bacteria. The patterns of metabolites and spot intensities detected by phosphorimaging were found to reproducibly differ depending on culture conditions. Clear trends were apparent in the pool sizes of several of the 70 most abundant metabolites extracted from bacteria growing in glucose-limited chemostats at different growth rates. The pools of glutamate, aspartate, trehalose, and adenosine as well as UDP-sugars and putrescine changed markedly. The data on pools observed by two-dimensional thin-layer chromatography were confirmed for amino acids by independent analysis. Other unidentified metabolites also displayed different spot intensities under various conditions, with four trend patterns depending on growth rate. As RpoS controls a number of metabolic genes in response to nutrient limitation, an rpoS mutant was also analyzed for metabolite pools. The mutant had altered metabolite profiles, but only some of the changes at slow growth rates were ascribable to the known control of metabolic genes by RpoS. These results indicate that total metabolite pool ("metabolome") analysis offers a means of revealing novel aspects of cellular metabolism and global regulation.
Affiliation(s)
- H Tweeddale
- Department of Microbiology, University of Sydney, New South Wales 2006, Australia
|
20
|
Abstract
Sixteen microorganisms, including one eukaryote, four archaeons, and 11 eubacteria, have been completely sequenced and published. More than 50 genomes are scheduled to be completed by the year 2000. This explosive growth of information is forcing change in many scientific disciplines (e.g. bioinformatics and molecular genetics), spawning new fields, and even changing the way scientific information is used and shared. Novel, global genome sequence comparisons seem slow to appear but the infrastructure for these projects is being built, and we expect exciting developments in the near future.
Affiliation(s)
- R A Clayton
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA.
|
21
|
Abstract
Networks of interacting transcription factors, or gene circuits, form an essential part of the metabolic pathways controlling macromolecular synthesis. This paper conveys two new results about gene circuits. We first show how a gene circuit for mutant phenotypes can be constructed from the wild type gene circuit for the same organism. We then present results of computational studies that show that mutant expression patterns can be correctly predicted using gene circuits whose parameters have been determined from wild type data only. Further computational studies demonstrate that this property is insensitive to errors as large as a factor of two in the input data. Together, these results show that gene circuits can be used to identify the regulatory mechanisms governing an entire family of genotypes from a knowledge of the wild type genotype alone. It is argued that this fact forms the basis for a new paradigm in genetics.
Affiliation(s)
- D H Sharp
- Theoretical Division, Los Alamos National Laboratory, NM 87545, USA.
|
22
|
Bono H, Ogata H, Goto S, Kanehisa M. Reconstruction of amino acid biosynthesis pathways from the complete genome sequence. Genome Res 1998; 8:203-10. [PMID: 9521924 DOI: 10.1101/gr.8.3.203] [Citation(s) in RCA: 119] [Impact Index Per Article: 4.6] [Indexed: 02/06/2023]
Abstract
The complete genome sequence of an organism contains information that has not been fully utilized in the current prediction methods of gene functions, which are based on piece-by-piece similarity searches of individual genes. We present here a method that utilizes a higher level information of molecular pathways to reconstruct a complete functional unit from a set of genes. Specifically, a genome-by-genome comparison is first made for identifying enzyme genes and assigning EC numbers, which is followed by the reconstruction of selected portions of the metabolic pathways by use of the reference biochemical knowledge. The completeness of the reconstructed pathway is an indicator of the correctness of the initial gene function assignment. This feature has become possible because of our efforts to computerize the current knowledge of metabolic pathways under the KEGG project. We found that the biosynthesis pathways of all 20 amino acids were completely reconstructed in Escherichia coli, Haemophilus influenzae, and Bacillus subtilis, and probably in Synechocystis and Saccharomyces cerevisiae as well, although it was necessary to assume wider substrate specificity for aspartate aminotransferases.
Affiliation(s)
- H Bono
- Institute for Chemical Research, Kyoto University, Uji, Kyoto 611, Japan
|
23
|
Affiliation(s)
- P D Karp
- Pangea Systems Inc., Menlo Park, CA 94025, USA.
|
24
|
Abstract
Biological sequence databases are currently being re-engineered to make them more efficient and easier to use. This re-engineering is also providing an infrastructure to make it easier to interrogate and integrate data from different sources. The net result of this effort should be a great improvement in the power and availability of bioinformatics resources to the general biology community.
Affiliation(s)
- P G Baker
- School of Biological Sciences, University of Manchester, UK.
|
25
|
Karp PD, Riley M, Paley SM, Pellegrini-Toole A, Krummenacker M. EcoCyc: Encyclopedia of Escherichia coli genes and metabolism. Nucleic Acids Res 1998; 26:50-3. [PMID: 9399798 PMCID: PMC147256 DOI: 10.1093/nar/26.1.50] [Citation(s) in RCA: 42] [Impact Index Per Article: 1.6] [Indexed: 02/05/2023]
Abstract
The encyclopedia of Escherichia coli genes and metabolism (EcoCyc) is a database that combines information about the genome and the intermediary metabolism of E. coli. The database describes 3030 genes of E. coli, 695 enzymes encoded by a subset of these genes, 595 metabolic reactions that occur in E. coli, and the organization of these reactions into 123 metabolic pathways. The EcoCyc graphical user interface allows scientists to query and explore the EcoCyc database using visualization tools such as genomic-map browsers and automatic layouts of metabolic pathways. EcoCyc can be thought of as an electronic review article because of its copious references to the primary literature, and as a (qualitative) computational model of E. coli metabolism. EcoCyc is available at URL http://ecocyc.PangeaSystems.com/ecocyc/
Affiliation(s)
- P D Karp
- Artificial Intelligence Center, SRI International, 333 Ravenswood Avenue, Menlo Park, CA 94025, USA.
|
26
|
Abstract
Recently, molecular biologists have sequenced about a dozen bacterial genomes and the first eukaryotic genome. We can now obtain answers to detailed questions about the complete set of genes of an organism. Bioinformatics methods are increasingly used for attaching biological knowledge to long lists of genes, assigning genes to biological pathways, comparing the gene sets of different species, identifying specificity factors, and describing sets of highly conserved proteins common to all domains of life. Substantial progress has recently been made in the availability of primary and added-value databases, in the development of algorithms and of network information services for genome analysis. The pharmaceutical industry has greatly benefited from the accumulation of sequence data through the identification of targets and candidates for the development of drugs, vaccines, diagnostic markers and therapeutic proteins.
|
27
|
Tomb JF, White O, Kerlavage AR, Clayton RA, Sutton GG, Fleischmann RD, Ketchum KA, Klenk HP, Gill S, Dougherty BA, Nelson K, Quackenbush J, Zhou L, Kirkness EF, Peterson S, Loftus B, Richardson D, Dodson R, Khalak HG, Glodek A, McKenney K, Fitzegerald LM, Lee N, Adams MD, Hickey EK, Berg DE, Gocayne JD, Utterback TR, Peterson JD, Kelley JM, Cotton MD, Weidman JM, Fujii C, Bowman C, Watthey L, Wallin E, Hayes WS, Borodovsky M, Karp PD, Smith HO, Fraser CM, Venter JC. The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature 1997; 388:539-47. [PMID: 9252185 DOI: 10.1038/41483] [Citation(s) in RCA: 2543] [Impact Index Per Article: 94.2] [Indexed: 02/06/2023]
Abstract
Helicobacter pylori, strain 26695, has a circular genome of 1,667,867 base pairs and 1,590 predicted coding sequences. Sequence analysis indicates that H. pylori has well-developed systems for motility, for scavenging iron, and for DNA restriction and modification. Many putative adhesins, lipoproteins and other outer membrane proteins were identified, underscoring the potential complexity of host-pathogen interaction. Based on the large number of sequence-related genes encoding outer membrane proteins and the presence of homopolymeric tracts and dinucleotide repeats in coding sequences, H. pylori, like several other mucosal pathogens, probably uses recombination and slipped-strand mispairing within repeats as mechanisms for antigenic variation and adaptive evolution. Consistent with its restricted niche, H. pylori has a few regulatory networks, and a limited metabolic repertoire and biosynthetic capacity. Its survival in acid conditions depends, in part, on its ability to establish a positive inside-membrane potential in low pH.
Affiliation(s)
- J F Tomb
- Institute for Genomic Research, Rockville, Maryland 20850, USA.
|
28
|
|