51
|
Deckwer WD, Jahn D, Hempel D, Zeng AP. Systems Biology Approaches to Bioprocess Development. Eng Life Sci 2006. [DOI: 10.1002/elsc.200620153] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 11/07/2022] [Open Access]
|
52
|
Eom YH, Lee S, Jeong H. Exploring local structural organization of metabolic networks using subgraph patterns. J Theor Biol 2006; 241:823-9. [PMID: 16504210 DOI: 10.1016/j.jtbi.2006.01.018] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 08/31/2005] [Revised: 01/17/2006] [Accepted: 01/17/2006] [Indexed: 10/25/2022]
Abstract
Metabolic networks of many cellular organisms share global statistical features. Their connectivity distributions follow the long-tailed power law and show the small-world property. In addition, their modular structures are organized in a hierarchical manner. Although the global topological organization of metabolic networks is well understood, their local structural organization is still not clear. Investigating local properties of metabolic networks is necessary to understand the nature of metabolism in living organisms. To identify the local structural organization of metabolic networks, we analysed the subgraphs of metabolic networks of 43 organisms from three domains of life. We first identified the network motifs of metabolic networks and identified the statistically significant subgraph patterns. We then compared metabolic networks from different domains and found that they have similar local structures and that the local structure of each metabolic network has its own taxonomical meaning. Organisms closer in taxonomy showed similar local structures. In addition, the common substrates of 43 metabolic networks were not randomly distributed, but were more likely to be constituents of cohesive subgraph patterns.
Affiliation(s)
- Young-Ho Eom
- Department of Physics, Korea Advanced Institute of Science and Technology, Daejeon 305-701, Korea
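The motif analysis summarized in the abstract above rests on two steps: counting occurrences of a small subgraph pattern, then comparing that count with counts from randomized networks. The sketch below is only an illustration of that idea, not the authors' procedure. It assumes a directed network, uses the three-node feed-forward loop as the example pattern, and takes the randomized counts as an input supplied by the caller. The function names `count_feed_forward_loops` and `z_score` and the toy edge list are hypothetical.

```python
from itertools import permutations
from statistics import mean, stdev


def count_feed_forward_loops(edges):
    """Count ordered node triples (a, b, c) with edges a->b, b->c and a->c.

    Brute force over all triples, so only suitable for small toy graphs.
    """
    edge_set = set(edges)
    nodes = {node for edge in edge_set for node in edge}
    return sum(
        1
        for a, b, c in permutations(nodes, 3)
        if (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set
    )


def z_score(observed, randomized_counts):
    """Standard score of an observed count against counts from randomized networks."""
    spread = stdev(randomized_counts)
    if spread == 0:
        raise ValueError("randomized counts have zero variance")
    return (observed - mean(randomized_counts)) / spread


# Hypothetical toy network: A->B->C plus the shortcut A->C, and a tail C->D.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
print(count_feed_forward_loops(toy_edges))  # → 1
```

Generating the randomized ensemble (for example by degree-preserving edge swaps) is deliberately left out here; the abstract does not say which null model was used.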
|
53
|
Zhao J, Yu H, Luo J, Cao ZW, Li Y. Complex networks theory for analyzing metabolic networks. Chin Sci Bull 2006. [DOI: 10.1007/s11434-006-2015-2] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.3] [Indexed: 11/29/2022]
|
54
|
Heizer EM, Raiford DW, Raymer ML, Doom TE, Miller RV, Krane DE. Amino Acid Cost and Codon-Usage Biases in 6 Prokaryotic Genomes: A Whole-Genome Analysis. Mol Biol Evol 2006; 23:1670-80. [PMID: 16754641 DOI: 10.1093/molbev/msl029] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.3] [Indexed: 11/13/2022] [Open Access]
Abstract
For most prokaryotic organisms, amino acid biosynthesis represents a significant portion of their overall energy budget. The difference in the cost of synthesis between amino acids can be striking, differing by as much as 7-fold. Two prokaryotic organisms, Escherichia coli and Bacillus subtilis, have been shown to preferentially utilize less costly amino acids in highly expressed genes, indicating that parsimony in amino acid selection may confer a selective advantage for prokaryotes. This study confirms those findings and extends them to 4 additional prokaryotic organisms: Chlamydia trachomatis, Chlamydophila pneumoniae AR39, Synechocystis sp. PCC 6803, and Thermus thermophilus HB27. Adherence to codon-usage biases for each of these 6 organisms is inversely correlated with a coding region's average amino acid biosynthetic cost in a fashion that is independent of chemoheterotrophic, photoautotrophic, or thermophilic lifestyle. The obligate parasites C. trachomatis and C. pneumoniae AR39 are incapable of synthesizing many of the 20 common amino acids. Removing auxotrophic amino acids from consideration in these organisms does not alter the overall trend of preferential use of energetically inexpensive amino acids in highly expressed genes.
Affiliation(s)
- Esley M Heizer
- Department of Biological Sciences, Wright State University, USA
|
55
|
Ohtake H, Yamashita S, Kato J. Development of a New Biotechnological Basis for Improving Industrial Sustainability in Japan. Eng Life Sci 2006. [DOI: 10.1002/elsc.200620124] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/11/2022] [Open Access]
|
56
|
Abstract
Scientists seeking to understand the inner workings of cells have access to a multitude of pathway data resources. However, the representations of pathway data within these resources are not consistent or interchangeable. To facilitate easy information retrieval from a wide variety of pathway resources, such as signal transduction, gene regulation, molecular interaction and metabolic pathway databases, a broad effort in the biopathways community called BioPAX was formed. New biological pathway software applications built using the BioPAX standard will be able to integrate knowledge from multiple sources in a coherent and reliable way. This article reports the progress that the BioPAX work-group has made towards building and deploying the BioPAX data-exchange format for biological pathway data.
|
57
|
Zhang Y, Li S, Skogerbø G, Zhang Z, Zhu X, Zhang Z, Sun S, Lu H, Shi B, Chen R. Phylophenetic properties of metabolic pathway topologies as revealed by global analysis. BMC Bioinformatics 2006; 7:252. [PMID: 16684350 PMCID: PMC1483838 DOI: 10.1186/1471-2105-7-252] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Received: 12/15/2005] [Accepted: 05/09/2006] [Indexed: 11/24/2022] [Open Access]
Abstract
Background: As phenotypic features derived from heritable characters, the topologies of metabolic pathways contain both phylogenetic and phenetic components. In the post-genomic era, it is possible to measure the "phylophenetic" contents of different pathways topologies from a global perspective.
Results: We reconstructed phylophenetic trees for all available metabolic pathways based on topological similarities, and compared them to the corresponding 16S rRNA-based trees. Similarity values for each pair of trees ranged from 0.044 to 0.297. Using the quartet method, single pathways trees were merged into a comprehensive tree containing information from a large part of the entire metabolic networks. This tree showed considerably higher similarity (0.386) to the corresponding 16S rRNA-based tree than any tree based on a single pathway, but was, on the other hand, sufficiently distinct to preserve unique phylogenetic information not reflected by the 16S rRNA tree.
Conclusion: We observed that the topology of different metabolic pathways provided different phylogenetic and phenetic information, depicting the compromise between phylogenetic information and varying evolutionary pressures forming metabolic pathway topologies in different organisms. The phylogenetic information content of the comprehensive tree is substantially higher than that of any tree based on a single pathway, which also gave clues to constraints working on the topology of the global metabolic networks, information that is only partly reflected by the topologies of individual metabolic pathways.
Affiliation(s)
- Yong Zhang
- Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Graduate School of the Chinese Academy of Sciences, Beijing, China
- Shaojuan Li
- Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Graduate School of the Chinese Academy of Sciences, Beijing, China
- Geir Skogerbø
- Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Zhihua Zhang
- Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Graduate School of the Chinese Academy of Sciences, Beijing, China
- Xiaopeng Zhu
- Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Graduate School of the Chinese Academy of Sciences, Beijing, China
- Zefeng Zhang
- Bioinformatics Research Group, Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Beijing 100080, China
- Graduate School of the Chinese Academy of Sciences, Beijing, China
- Shiwei Sun
- Bioinformatics Research Group, Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Beijing 100080, China
- Graduate School of the Chinese Academy of Sciences, Beijing, China
- Hongchao Lu
- Bioinformatics Research Group, Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Beijing 100080, China
- Graduate School of the Chinese Academy of Sciences, Beijing, China
- Baochen Shi
- Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Graduate School of the Chinese Academy of Sciences, Beijing, China
- Runsheng Chen
- Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Bioinformatics Research Group, Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Beijing 100080, China
|
58
|
Abstract
Our information about the gene content of organisms continues to grow as more genomes are sequenced and gene products are characterized. Sequence-based annotation efforts have led to a list of cellular components, which can be thought of as a one-dimensional annotation. With growing information about component interactions, facilitated by the advancement of various high-throughput technologies, systemic, or two-dimensional, annotations can be generated. Knowledge about the physical arrangement of chromosomes will lead to a three-dimensional spatial annotation of the genome and a fourth dimension of annotation will arise from the study of changes in genome sequences that occur during adaptive evolution. Here we discuss all four levels of genome annotation, with specific emphasis on two-dimensional annotation methods.
Affiliation(s)
- Jennifer L Reed
- Department of Bioengineering, University of California, San Diego, La Jolla, California, 92093, USA
|
59
|
Grindrod P, Kibble M. Review of uses of network and graph theory concepts within proteomics. Expert Rev Proteomics 2006; 1:229-38. [PMID: 15966817 DOI: 10.1586/14789450.1.2.229] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 11/08/2022]
Abstract
The size and nature of data collected on gene and protein interactions has led to a rapid growth of interest in graph theory and modern techniques for describing, characterizing and comparing networks. Simultaneously, this is a field of growth within mathematics and theoretical physics, where the global properties, and emergent behavior of networks, as a function of the local properties has long been studied. In this review, a number of approaches for exploiting modern network theory to help describe and analyze different data sets and problems associated with proteomic data are considered. This review aims to help biologists find their way towards useful ideas and references, yet may also help scientists from a mathematics and physics background to understand where they may apply their expertise.
Affiliation(s)
- Peter Grindrod
- Magdalen Centre, Lawson Software, The Oxford Science Park, Oxford OX4 4GA, UK.
|
60
|
Mahadevan R, Bond DR, Butler JE, Esteve-Nuñez A, Coppi MV, Palsson BO, Schilling CH, Lovley DR. Characterization of metabolism in the Fe(III)-reducing organism Geobacter sulfurreducens by constraint-based modeling. Appl Environ Microbiol 2006; 72:1558-68. [PMID: 16461711 PMCID: PMC1392927 DOI: 10.1128/aem.72.2.1558-1568.2006] [Citation(s) in RCA: 206] [Impact Index Per Article: 11.4] [Indexed: 11/20/2022] [Open Access]
Abstract
Geobacter sulfurreducens is a well-studied representative of the Geobacteraceae, which play a critical role in organic matter oxidation coupled to Fe(III) reduction, bioremediation of groundwater contaminated with organics or metals, and electricity production from waste organic matter. In order to investigate G. sulfurreducens central metabolism and electron transport, a metabolic model which integrated genome-based predictions with available genetic and physiological data was developed via the constraint-based modeling approach. Evaluation of the rates of proton production and consumption in the extracellular and cytoplasmic compartments revealed that energy conservation with extracellular electron acceptors, such as Fe(III), was limited relative to that associated with intracellular acceptors. This limitation was attributed to lack of cytoplasmic proton consumption during reduction of extracellular electron acceptors. Model-based analysis of the metabolic cost of producing an extracellular electron shuttle to promote electron transfer to insoluble Fe(III) oxides demonstrated why Geobacter species, which do not produce shuttles, have an energetic advantage over shuttle-producing Fe(III) reducers in subsurface environments. In silico analysis also revealed that the metabolic network of G. sulfurreducens could synthesize amino acids more efficiently than that of Escherichia coli due to the presence of a pyruvate-ferredoxin oxidoreductase, which catalyzes synthesis of pyruvate from acetate and carbon dioxide in a single step. In silico phenotypic analysis of deletion mutants demonstrated the capability of the model to explore the flexibility of G. sulfurreducens central metabolism and correctly predict mutant phenotypes. These results demonstrate that iterative modeling coupled with experimentation can accelerate the understanding of the physiology of poorly studied but environmentally relevant organisms and may help optimize their practical applications.
Affiliation(s)
- R Mahadevan
- Genomatica, 5405 Morehouse Dr., Ste. 210, San Diego, CA 92121, USA.
|
61
|
Arakawa K, Yamada Y, Shinoda K, Nakayama Y, Tomita M. GEM System: automatic prototyping of cell-wide metabolic pathway models from genomes. BMC Bioinformatics 2006; 7:168. [PMID: 16553966 PMCID: PMC1435936 DOI: 10.1186/1471-2105-7-168] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.8] [Received: 10/09/2005] [Accepted: 03/23/2006] [Indexed: 11/18/2022] [Open Access]
Abstract
Background: Successful realization of a "systems biology" approach to analyzing cells is a grand challenge for our understanding of life. However, current modeling approaches to cell simulation are labor-intensive, manual affairs, and therefore constitute a major bottleneck in the evolution of computational cell biology.
Results: We developed the Genome-based Modeling (GEM) System for the purpose of automatically prototyping simulation models of cell-wide metabolic pathways from genome sequences and other public biological information. Models generated by the GEM System include an entire Escherichia coli metabolism model comprising 968 reactions of 1195 metabolites, achieving 100% coverage when compared with the KEGG database, 92.38% with the EcoCyc database, and 95.06% with iJR904 genome-scale model.
Conclusion: The GEM System prototypes qualitative models to reduce the labor-intensive tasks required for systems biology research. Models of over 90 bacterial genomes are available at our web site.
Affiliation(s)
- Kazuharu Arakawa
- Institute for Advanced Biosciences, Keio University, Fujisawa 252-8520, Japan
- Yohei Yamada
- Institute for Advanced Biosciences, Keio University, Fujisawa 252-8520, Japan
- Kosaku Shinoda
- Institute for Advanced Biosciences, Keio University, Fujisawa 252-8520, Japan
- Yoichi Nakayama
- Institute for Advanced Biosciences, Keio University, Fujisawa 252-8520, Japan
- Masaru Tomita
- Institute for Advanced Biosciences, Keio University, Fujisawa 252-8520, Japan
|
62
|
Deckwer WD, Hempel D, Zeng AP, Jahn D. Systembiotechnologische Ansätze zur Prozessentwicklung [Systems biotechnology approaches to process development]. Chem Ing Tech 2006. [DOI: 10.1002/cite.200500156] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/05/2022]
|
63
|
Maltsev N, Glass E, Sulakhe D, Rodriguez A, Syed MH, Bompada T, Zhang Y, D'Souza M. PUMA2--grid-based high-throughput analysis of genomes and metabolic pathways. Nucleic Acids Res 2006; 34:D369-72. [PMID: 16381888 PMCID: PMC1347457 DOI: 10.1093/nar/gkj095] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.7] [Indexed: 11/13/2022] [Open Access]
Abstract
The PUMA2 system (available at ) is an interactive, integrated bioinformatics environment for high-throughput genetic sequence analysis and metabolic reconstructions from sequence data. PUMA2 provides a framework for comparative and evolutionary analysis of genomic data and metabolic networks in the context of taxonomic and phenotypic information. Grid infrastructure is used to perform computationally intensive tasks. PUMA2 currently contains precomputed analysis of 213 prokaryotic, 22 eukaryotic, 650 mitochondrial and 1493 viral genomes and automated metabolic reconstructions for >200 organisms. Genomic data is annotated with information integrated from >20 sequence, structural and metabolic databases and ontologies. PUMA2 supports both automated and interactive expert-driven annotation of genomes, using a variety of publicly available bioinformatics tools. It also contains a suite of unique PUMA2 tools for automated assignment of gene function, evolutionary analysis of protein families and comparative analysis of metabolic pathways. PUMA2 allows users to submit batch sequence data for automated functional analysis and construction of metabolic models. The results of these analyses are made available to the users in the PUMA2 environment for further interactive sequence analysis and annotation.
Affiliation(s)
- Natalia Maltsev
- Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA.
|
64
|
Abstract
MOTIVATION: Analysis of metabolic pathways is a central topic in understanding the relationship between genotype and phenotype. The rapid accumulation of biological data provides the possibility of studying metabolic pathways at both the genomic and the metabolic levels. Retrieving metabolic pathways from current biological data sources, reconstructing metabolic pathways from rudimentary pathway components, and aligning metabolic pathways with each other are major tasks. Our motivation was to develop a conceptual framework and computational system that allows the retrieval of metabolic pathway information and the processing of alignments to reveal the similarities between metabolic pathways.
RESULTS: PathAligner extracts metabolic information from biological databases via the Internet and builds metabolic pathways with data sources of genes, sequences, enzymes, metabolites etc. It provides an easy-to-use interface to retrieve, display and manipulate metabolic information. PathAligner also provides an alignment method to compare the similarity between metabolic pathways.
AVAILABILITY: PathAligner is available at http://bibiserv.techfak.uni-bielefeld.de/pathaligner.
Affiliation(s)
- Ming Chen
- Department of Bioinformatics and Medical Informatics, Faculty of Technology, Bielefeld University, Bielefeld, Germany.
|
65
|
Yu GX, Glass EM, Karonis NT, Maltsev N. Knowledge-based voting algorithm for automated protein functional annotation. Proteins 2005; 61:907-17. [PMID: 16252283 DOI: 10.1002/prot.20652] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]
Abstract
Automated annotation of high-throughput genome sequences is one of the earliest steps toward a comprehensive understanding of the dynamic behavior of living organisms. However, the step is often error-prone because of its underlying algorithms, which rely mainly on a simple similarity analysis, and lack of guidance from biological rules. We present herein a knowledge-based protein annotation algorithm. Our objectives are to reduce errors and to improve annotation confidences. This algorithm consists of two major components: a knowledge system, called "RuleMiner," and a voting procedure. The knowledge system, which includes biological rules and functional profiles for each function, provides a platform for seamless integration of multiple sequence analysis tools and guidance for function annotation. The voting procedure, which relies on the knowledge system, is designed to make (possibly) unbiased judgments in functional assignments among complicated, sometimes conflicting, information. We have applied this algorithm to 10 prokaryotic bacterial genomes and observed a significant improvement in annotation confidences. We also discuss the current limitations of the algorithm and the potential for future improvement.
Affiliation(s)
- G X Yu
- Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, USA.
|
66
|
Karp PD, Ouzounis CA, Moore-Kochlacs C, Goldovsky L, Kaipa P, Ahrén D, Tsoka S, Darzentas N, Kunin V, López-Bigas N. Expansion of the BioCyc collection of pathway/genome databases to 160 genomes. Nucleic Acids Res 2005; 33:6083-9. [PMID: 16246909 PMCID: PMC1266070 DOI: 10.1093/nar/gki892] [Citation(s) in RCA: 395] [Impact Index Per Article: 20.8] [Indexed: 12/02/2022] [Open Access]
Abstract
The BioCyc database collection is a set of 160 pathway/genome databases (PGDBs) for most eukaryotic and prokaryotic species whose genomes have been completely sequenced to date. Each PGDB in the BioCyc collection describes the genome and predicted metabolic network of a single organism, inferred from the MetaCyc database, which is a reference source on metabolic pathways from multiple organisms. In addition, each bacterial PGDB includes predicted operons for the corresponding species. The BioCyc collection provides a unique resource for computational systems biology, namely global and comparative analyses of genomes and metabolic networks, and a supplement to the BioCyc resource of curated PGDBs. The Omics viewer available through the BioCyc website allows scientists to visualize combinations of gene expression, proteomics and metabolomics data on the metabolic maps of these organisms. This paper discusses the computational methodology by which the BioCyc collection has been expanded, and presents an aggregate analysis of the collection that includes the range of number of pathways present in these organisms, and the most frequently observed pathways. We seek scientists to adopt and curate individual PGDBs within the BioCyc collection. Only by harnessing the expertise of many scientists we can hope to produce biological databases, which accurately reflect the depth and breadth of knowledge that the biomedical research community is producing.
Affiliation(s)
- Peter D Karp
- Bioinformatics Research Group, SRI International EK207, 333 Ravenswood Avenue, Menlo Park, CA 94025, USA.
|
67
|
Croes D, Couche F, Wodak SJ, van Helden J. Inferring meaningful pathways in weighted metabolic networks. J Mol Biol 2005; 356:222-36. [PMID: 16337962 DOI: 10.1016/j.jmb.2005.09.079] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.6] [Received: 01/16/2005] [Revised: 09/06/2005] [Accepted: 09/27/2005] [Indexed: 10/25/2022]
Abstract
An approach is presented for computing meaningful pathways in the network of small molecule metabolism comprising the chemical reactions characterized in all organisms. The metabolic network is described as a weighted graph in which all the compounds are included, but each compound is assigned a weight equal to the number of reactions in which it participates. Path finding is performed in this graph by searching for one or more paths with lowest weight. Performance is evaluated systematically by computing paths between the first and last reactions in annotated metabolic pathways, and comparing the intermediate reactions in the computed pathways to those in the annotated ones. For the sake of comparison, paths are computed also in the un-weighted raw (all compounds and reactions) and filtered (highly connected pool metabolites removed) metabolic graphs, respectively. The correspondence between the computed and annotated pathways is very poor (<30%) in the raw graph; increasing to approximately 65% in the filtered graph; reaching approximately 85% in the weighted graph. Considering the best-matching path among the five lightest paths increases the correspondence to 92%, on average. We then show that the average distance between pairs of metabolites is significantly larger in the weighted graph than in the raw unfiltered graph, suggesting that the small-world properties previously reported for metabolic networks probably result from irrelevant shortcuts through pool metabolites. In addition, we provide evidence that the length of the shortest path in the weighted graph represents a valid measure of the "metabolic distance" between enzymes. We suggest that the success of our simplistic approach is rooted in the high degree of specificity of the reactions in metabolic pathways, presumably reflecting thermodynamic constraints operating in these pathways. We expect our approach to find useful applications in inferring metabolic pathways in newly sequenced genomes.
Affiliation(s)
- Didier Croes
- SCMBB-Université Libre de Bruxelles, Campus Plaine, CP 263, Boulevard du Triomphe, 1050 Bruxelles, Belgium
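The weighting scheme described in the abstract above (each compound weighted by the number of reactions it participates in, with path finding selecting the lightest path) can be illustrated with a small sketch. This is not the authors' implementation. It assumes an undirected bipartite compound/reaction graph, assigns reaction nodes zero cost, and uses Dijkstra's algorithm. The names `build_graph`, `lightest_path` and the toy reactions are hypothetical.

```python
import heapq
from collections import defaultdict


def build_graph(reactions):
    """Return (neighbours, degree) for a bipartite compound/reaction graph.

    `reactions` maps a reaction id to (substrates, products). Reaction ids and
    compound names are assumed to be distinct strings.
    """
    neighbours = defaultdict(set)
    degree = defaultdict(int)  # compound -> number of reactions it takes part in
    for rid, (substrates, products) in reactions.items():
        for compound in set(substrates) | set(products):
            neighbours[rid].add(compound)
            neighbours[compound].add(rid)
            degree[compound] += 1
    return neighbours, degree


def lightest_path(reactions, source, target, weighted=True):
    """Dijkstra with node costs; returns (total_cost, path) or None.

    weighted=True  -> a compound costs its reaction count (the weighted graph)
    weighted=False -> every compound costs 1 (the raw, un-weighted graph)
    """
    neighbours, degree = build_graph(reactions)

    def cost(node):
        if node not in degree:  # reaction nodes are free (assumption)
            return 0
        return degree[node] if weighted else 1

    best = {source: cost(source)}
    heap = [(best[source], source, [source])]
    while heap:
        total, node, path = heapq.heappop(heap)
        if node == target:
            return total, path
        if total > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt in neighbours[node]:
            new_total = total + cost(nxt)
            if new_total < best.get(nxt, float("inf")):
                best[nxt] = new_total
                heapq.heappush(heap, (new_total, nxt, path + [nxt]))
    return None


# Hypothetical toy network in which ATP/ADP act as pool metabolites.
TOY_REACTIONS = {
    "R1": (["A", "ATP"], ["B", "ADP"]),
    "R2": (["B"], ["C"]),
    "R3": (["C", "ATP"], ["D", "ADP"]),
    "R4": (["E", "ATP"], ["F", "ADP"]),
    "R5": (["G", "ATP"], ["H", "ADP"]),
    "R6": (["I", "ATP"], ["J", "ADP"]),
}
print(lightest_path(TOY_REACTIONS, "R1", "R3"))
print(lightest_path(TOY_REACTIONS, "R1", "R3", weighted=False))
```

In this toy network the weighted search goes through the intermediates B and C, while the un-weighted search takes a two-step shortcut through a pool metabolite, which is the kind of shortcut the abstract describes as irrelevant.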
|
68
|
Alves R, Savageau MA. Evidence of selection for low cognate amino acid bias in amino acid biosynthetic enzymes. Mol Microbiol 2005; 56:1017-34. [PMID: 15853887 PMCID: PMC1839009 DOI: 10.1111/j.1365-2958.2005.04566.x] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.7] [Indexed: 11/30/2022]
Abstract
If the enzymes responsible for biosynthesis of a given amino acid are repressed and the cognate amino acid pool suddenly depleted, then derepression of these enzymes and replenishment of the pool would be problematic, if the enzymes were largely composed of the cognate amino acid. In the proverbial "Catch 22", cells would lack the necessary enzymes to make the amino acid, and they would lack the necessary amino acid to make the needed enzymes. Based on this scenario, we hypothesize that evolution would lead to the selection of amino acid biosynthetic enzymes that have a relatively low content of their cognate amino acid. We call this the "cognate bias hypothesis". Here we test several implications of this hypothesis directly using data from the proteome of Escherichia coli. Several lines of evidence show that low cognate bias is evident in 15 of the 20 amino acid biosynthetic pathways. Comparison with closely related Salmonella typhimurium shows similar results. Comparison with more distantly related Bacillus subtilis shows general similarities as well as significant differences in the detailed profiles of cognate bias. Thus, selection for low cognate bias plays a significant role in shaping the amino acid composition for a large class of cellular proteins.
Affiliation(s)
- Rui Alves
- Biomedical Engineering Department, University of California-Davis, Davis, CA, USA
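One simple way to make the "cognate bias hypothesis" from the abstract above concrete is to compare how often an amino acid occurs in its own biosynthetic enzymes with how often it occurs in a background set of proteins. The sketch below shows that comparison only. The abstract does not state which statistic the authors used, so the ratio computed here is an assumption, and the function names and the short sequences are hypothetical toy data, not real E. coli proteins.

```python
def residue_fraction(sequences, residue):
    """Fraction of all residues in `sequences` that equal `residue` (one-letter code)."""
    total = sum(len(seq) for seq in sequences)
    if total == 0:
        raise ValueError("no residues to count")
    return sum(seq.count(residue) for seq in sequences) / total


def cognate_bias(enzyme_sequences, background_sequences, residue):
    """Ratio of cognate-residue usage in the pathway enzymes to the background.

    Values below 1 mean the enzymes use their cognate amino acid less often
    than the background does (assumed measure, for illustration only).
    """
    background = residue_fraction(background_sequences, residue)
    if background == 0:
        raise ValueError("residue absent from background")
    return residue_fraction(enzyme_sequences, residue) / background


# Hypothetical toy sequences for a histidine ("H") pathway.
his_enzymes = ["MKTAHHGS", "MAHKTGSA"]
background = ["MHHKHTAH", "HMKHTHSA"]
print(cognate_bias(his_enzymes, background, "H"))
```

A real analysis would also need a significance test against the proteome-wide distribution, which is omitted here.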
|
69
|
Popescu L, Yona G. Automation of gene assignments to metabolic pathways using high-throughput expression data. BMC Bioinformatics 2005; 6:217. [PMID: 16135255 PMCID: PMC1239907 DOI: 10.1186/1471-2105-6-217] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 07/18/2005] [Accepted: 08/31/2005] [Indexed: 11/19/2022] [Open Access]
Abstract
BACKGROUND: Accurate assignment of genes to pathways is essential in order to understand the functional role of genes and to map the existing pathways in a given genome. Existing algorithms predict pathways by extrapolating experimental data in one organism to other organisms for which this data is not available. However, current systems classify all genes that belong to a specific EC family to all the pathways that contain the corresponding enzymatic reaction, and thus introduce ambiguity.
RESULTS: Here we describe an algorithm for assignment of genes to cellular pathways that addresses this problem by selectively assigning specific genes to pathways. Our algorithm uses the set of experimentally elucidated metabolic pathways from MetaCyc, together with statistical models of enzyme families and expression data to assign genes to enzyme families and pathways by optimizing correlated co-expression, while minimizing conflicts due to shared assignments among pathways. Our algorithm also identifies alternative ("backup") genes and addresses the multi-domain nature of proteins. We apply our model to assign genes to pathways in the Yeast genome and compare the results for genes that were assigned experimentally. Our assignments are consistent with the experimentally verified assignments and reflect characteristic properties of cellular pathways.
CONCLUSION: We present an algorithm for automatic assignment of genes to metabolic pathways. The algorithm utilizes expression data and reduces the ambiguity that characterizes assignments that are based only on EC numbers.
Affiliation(s)
- Liviu Popescu
- Department of Computer Science, Cornell University, Ithaca, NY
- Golan Yona
- Department of Computer Science, Cornell University, Ithaca, NY
|
70
|
Tsoka S, Simon D, Ouzounis CA. Automated metabolic reconstruction for Methanococcus jannaschii. Archaea 2005; 1:223-9. [PMID: 15810431 PMCID: PMC2685575 DOI: 10.1155/2004/324925] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.5] [Indexed: 11/18/2022]
Abstract
We present the computational prediction and synthesis of the metabolic pathways in Methanococcus jannaschii from its genomic sequence using the PathoLogic software. Metabolic reconstruction is based on a reference knowledge base of metabolic pathways and is performed with minimal manual intervention. We predict the existence of 609 metabolic reactions that are assembled in 113 metabolic pathways and an additional 17 super-pathways consisting of one or more component pathways. These assignments represent significantly improved enzyme and pathway predictions compared with previous metabolic reconstructions, and some key metabolic reactions, previously missing, have been identified. Our results, in the form of enzymatic assignments and metabolic pathway predictions, form a database (MJCyc) that is accessible over the World Wide Web for further dissemination among members of the scientific community.
Affiliation(s)
- Sophia Tsoka
- Computational Genomics Group, The European Bioinformatics Institute, EMBL Cambridge Outstation, Cambridge CB10 1SD, UK.
|
71
|
Freilich S, Spriggs RV, George RA, Al-Lazikani B, Swindells M, Thornton JM. The complement of enzymatic sets in different species. J Mol Biol 2005; 349:745-63. [PMID: 15896806 DOI: 10.1016/j.jmb.2005.04.027] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.7] [Received: 01/19/2005] [Revised: 04/10/2005] [Accepted: 04/12/2005] [Indexed: 11/17/2022]
Abstract
We present here a comprehensive analysis of the complement of enzymes in a large variety of species. As enzymes are a relatively conserved group there are several classification systems available that are common to all species and link a protein sequence to an enzymatic function. Enzymes are therefore an ideal functional group to study the relationship between sequence expansion, functional divergence and phenotypic changes. By using information retrieved from the well annotated SWISS-PROT database together with sequence information from a variety of fully sequenced genomes and information from the EC functional scheme we have aimed here to estimate the fraction of enzymes in genomes, to determine the extent of their functional redundancy in different domains of life and to identify functional innovations and lineage specific expansions in the metazoa lineage. We found that prokaryote and eukaryote species differ both in the fraction of enzymes in their genomes and in the pattern of expansion of their enzymatic sets. We observe an increase in functional redundancy accompanying an increase in species complexity. A quantitative assessment was performed in order to determine the degree of functional redundancy in different species. Finally, we report a massive expansion in the number of mammalian enzymes involved in signalling and degradation.
Affiliation(s)
- Shiri Freilich
- EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
|
72
|
van der Werf MJ, Jellema RH, Hankemeier T. Microbial metabolomics: replacing trial-and-error by the unbiased selection and ranking of targets. J Ind Microbiol Biotechnol 2005; 32:234-52. [PMID: 15895265 DOI: 10.1007/s10295-005-0231-4] [Citation(s) in RCA: 90] [Impact Index Per Article: 4.7] [Received: 07/07/2004] [Accepted: 03/10/2005] [Indexed: 01/01/2023]
Abstract
Microbial production strains are currently improved using a combination of random and targeted approaches. In the case of a targeted approach, potential bottlenecks, feed-back inhibition, and side-routes are removed, and other processes of interest are targeted by overexpressing or knocking-out the gene(s) of interest. To date, the selection of these targets has been based at its best on expert knowledge, but to a large extent also on 'educated guesses' and 'gut feeling'. Therefore, time and thus money is wasted on targets that later prove to be irrelevant or only result in a very minor improvement. Moreover, in current approaches, biological processes that are not known to be involved in the formation of a specific product are overlooked and it is impossible to rank the relative importance of the different targets postulated. Metabolomics, a technology that involves the non-targeted, holistic analysis of the changes in the complete set of metabolites in the cell in response to environmental or cellular changes, in combination with multivariate data analysis (MVDA) tools like principal component discriminant analysis and partial least squares, allow the replacement of current empirical approaches by a scientific approach towards the selection and ranking of targets. In this review, we describe the technological challenges in setting up the novel metabolomics technology and the principle of MVDA algorithms in analyzing biomolecular data sets. In addition to strain improvement, the combined metabolomics and MVDA approach can also be applied to growth medium optimization, predicting the effect of quality differences of different batches of complex media on productivity, the identification of bioactives in complex mixtures, the characterization of mutant strains, the exploration of the production potential of strains, the assignment of functions to orphan genes, the identification of metabolite-dependent regulatory interactions, and many more microbiological issues.
|
73
|
Abstract
New information architectures enable new approaches to publishing and accessing valuable data and programs. So-called service-oriented architectures define standard interfaces and protocols that allow developers to encapsulate information tools as services that clients can access without knowledge of, or control over, their internal workings. Thus, tools formerly accessible only to the specialist can be made available to all; previously manual data-processing and analysis tasks can be automated by having services access services. Such service-oriented approaches to science are already being applied successfully, in some cases at substantial scales, but much more effort is required before these approaches are applied routinely across many disciplines. Grid technologies can accelerate the development and adoption of service-oriented science by enabling a separation of concerns between discipline-specific content and domain-independent software and hardware infrastructure.
Affiliation(s)
- Ian Foster
- Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA, and Department of Computer Science, University of Chicago, Chicago, IL 60637, USA.
|
74
|
Cary MP, Bader GD, Sander C. Pathway information for systems biology. FEBS Lett 2005; 579:1815-20. [PMID: 15763557 DOI: 10.1016/j.febslet.2005.02.005] [Citation(s) in RCA: 91] [Impact Index Per Article: 4.8] [Received: 01/31/2005] [Revised: 02/01/2005] [Accepted: 02/01/2005] [Indexed: 01/03/2023]
Abstract
Pathway information is vital for successful quantitative modeling of biological systems. The almost 170 online pathway databases vary widely in coverage and representation of biological processes, making their use extremely difficult. Future pathway information systems for querying, visualization and analysis must support standard exchange formats to successfully integrate data on a large scale. Such integrated systems will greatly facilitate the constructive cycle of computational model building and experimental verification that lies at the heart of systems biology.
Affiliation(s)
- Michael P Cary
- Computational Biology Center, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, Box 460, New York, NY 10021, USA
|
75
|
Marland E, Prachumwat A, Maltsev N, Gu Z, Li WH. Higher gene duplicabilities for metabolic proteins than for nonmetabolic proteins in yeast and E. coli. J Mol Evol 2005; 59:806-14. [PMID: 15599512 DOI: 10.1007/s00239-004-0068-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Received: 04/02/2004] [Accepted: 06/29/2004] [Indexed: 10/26/2022]
Abstract
Although the evolutionary significance of gene duplication has long been appreciated, it remains unclear what factors determine gene duplicability. In this study we investigated whether metabolism is an important determinant of gene duplicability because cellular metabolism is crucial for the survival and reproduction of an organism. Using genomic data and metabolic pathway data from the yeast (Saccharomyces cerevisiae) and Escherichia coli, we found that metabolic proteins indeed tend to have higher gene duplicability than nonmetabolic proteins. Moreover, a detailed analysis of metabolic pathways in these two organisms revealed that genes in the central metabolic pathways and the catabolic pathways have, on average, higher gene duplicability than do other genes and that most genes in anabolic pathways are single-copy genes.
Affiliation(s)
- Elizabeth Marland
- Mathematics & Computer Science Division, Argonne National Laboratory, 9700 South Cass Avenue, Argonne, IL 60439, USA
|
76
|
Brown SC, Kruppa G, Dasseux JL. Metabolomics applications of FT-ICR mass spectrometry. Mass Spectrom Rev 2005; 24:223-231. [PMID: 15389859 DOI: 10.1002/mas.20011] [Citation(s) in RCA: 170] [Impact Index Per Article: 8.9] [Indexed: 05/24/2023]
Abstract
Metabolomics, also known as Metabolic Profiling, is an emerging discipline under the umbrella concept of systems biology. The goal of metabolomics is to know and understand the concentrations and fluxes of endogenous metabolites within a living biological system under study. General tools are being developed for the rapid measurement of many metabolites in a single experiment, most of which are mass spectrometric methods. FT-ICR has unique advantages, as a mass spectrometric method, in this regard. Applications of FT-ICR to metabolomics analyses will be discussed and reviewed in the context of the single publication currently available.
Affiliation(s)
- Stephen C Brown
- Esperion Therapeutics, Inc., Ann Arbor, Michigan 48108, USA.
|
77
|
Chen M, Hofestädt R. Web-based information retrieval system for the prediction of metabolic pathways. IEEE Trans Nanobioscience 2005; 3:192-9. [PMID: 15473071 DOI: 10.1109/tnb.2004.833691] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/07/2022]
Abstract
Analysis of metabolic pathways is a central topic in understanding the relationship between genotype and phenotype. The rapid accumulation of biological data provides the possibility of studying metabolic pathways both at the genomic and metabolic levels. Our motivation is to develop a conceptual framework and computational system that will allow retrieval of metabolic information and prediction of metabolic pathways. In this paper, we introduce a metabolic pathway prediction framework that extracts metabolic information from biological databases via the Internet, and builds metabolic pathways with data sources of genes, sequences, enzymes, metabolites, etc. It provides an easy-to-use interface to retrieve, display, and manipulate metabolic information. The system has been implemented into PathAligner, available at http://bibiserv.techfak.uni-bielefeld.de/pathaligner/.
Affiliation(s)
- Ming Chen
- Department of Bioinformatics/Medical Informatics, Faculty of Technology, Bielefeld University, Bielefeld D-33501, Germany.
|
78
|
Ahren DG, Ouzounis CA. Robustness of metabolic map reconstruction. J Bioinform Comput Biol 2005; 2:589-93. [PMID: 15359428 DOI: 10.1142/s021972000400079x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/01/2004] [Revised: 06/25/2004] [Accepted: 06/25/2004] [Indexed: 11/18/2022]
Abstract
With the ever increasing amount of genomic data available, the interest for generating biochemical pathways has grown tremendously. So far, mainly complete genomes have been used to reconstruct the biochemical pathways and their associated interactions. However, a large number of low coverage genomes, as well as other sources of partial genomic data, are currently available for many organisms. In order to be able to use incomplete data for metabolic reconstruction, the inherent properties of this procedure need to be investigated. In this short note, we describe the robustness and predictive power of metabolic reconstructions using partial information from Schizosaccharomyces pombe. We also discuss the implications of the results on reference genome projects as well as other large-scale sequencing data.
Affiliation(s)
- Dag G Ahren
- Computational Genomics Group, The European Bioinformatics Institute, EMBL Cambridge Outstation, Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK.
|
79
|
Pharkya P, Burgard AP, Maranas CD. OptStrain: a computational framework for redesign of microbial production systems. Genome Res 2005; 14:2367-76. [PMID: 15520298 PMCID: PMC525696 DOI: 10.1101/gr.2872004] [Citation(s) in RCA: 375] [Impact Index Per Article: 19.7] [Indexed: 11/25/2022]
Abstract
This paper introduces the hierarchical computational framework OptStrain aimed at guiding pathway modifications, through reaction additions and deletions, of microbial networks for the overproduction of targeted compounds. These compounds may range from electrons or hydrogen in biofuel cell and environmental applications to complex drug precursor molecules. A comprehensive database of biotransformations, referred to as the Universal database (with >5700 reactions), is compiled and regularly updated by downloading and curating reactions from multiple biopathway database sources. Combinatorial optimization is then used to elucidate the set(s) of non-native functionalities, extracted from this Universal database, to add to the examined production host for enabling the desired product formation. Subsequently, competing functionalities that divert flux away from the targeted product are identified and removed to ensure higher product yields coupled with growth. This work represents an advancement over earlier efforts by establishing an integrated computational framework capable of constructing stoichiometrically balanced pathways, imposing maximum product yield requirements, pinpointing the optimal substrate(s), and evaluating different microbial hosts. The range and utility of OptStrain are demonstrated by addressing two very different product molecules. The hydrogen case study pinpoints reaction elimination strategies for improving hydrogen yields using two different substrates for three separate production hosts. In contrast, the vanillin study primarily showcases which non-native pathways need to be added into Escherichia coli. In summary, OptStrain provides a useful tool to aid microbial strain design and, more importantly, it establishes an integrated framework to accommodate future modeling developments.
Affiliation(s)
- Priti Pharkya
- Department of Chemical Engineering, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
|
80
|
Song C, Havlin S, Makse HA. Self-similarity of complex networks. Nature 2005; 433:392-5. [PMID: 15674285 DOI: 10.1038/nature03248] [Citation(s) in RCA: 364] [Impact Index Per Article: 19.2] [Received: 08/04/2004] [Accepted: 11/30/2004] [Indexed: 11/08/2022]
Abstract
Complex networks have been studied extensively owing to their relevance to many real systems such as the world-wide web, the Internet, energy landscapes and biological and social networks. A large number of real networks are referred to as 'scale-free' because they show a power-law distribution of the number of links per node. However, it is widely believed that complex networks are not invariant or self-similar under a length-scale transformation. This conclusion originates from the 'small-world' property of these networks, which implies that the number of nodes increases exponentially with the 'diameter' of the network, rather than the power-law relation expected for a self-similar structure. Here we analyse a variety of real complex networks and find that, on the contrary, they consist of self-repeating patterns on all length scales. This result is achieved by the application of a renormalization procedure that coarse-grains the system into boxes containing nodes within a given 'size'. We identify a power-law relation between the number of boxes needed to cover the network and the size of the box, defining a finite self-similar exponent. These fundamental properties help to explain the scale-free nature of complex networks and suggest a common self-organization dynamics.
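The box-counting relation described in the abstract above, a power law between the number of boxes N_B and the box size l_B (written here as N_B ~ l_B^(-d_B), with d_B as an assumed symbol for the self-similar exponent), can be sketched as follows. This is a simple greedy covering for illustration, not necessarily the procedure used in the paper; the function names and the test graph (a 64-node chain, for which the exponent should be close to 1) are assumptions.

```python
from collections import deque
from math import log


def bfs_distances(adj, source):
    """Shortest-path (hop) distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def count_boxes(adj, box_size):
    """Greedily cover the graph with boxes whose nodes are pairwise closer than box_size."""
    dist = {u: bfs_distances(adj, u) for u in adj}
    uncovered = sorted(adj)
    boxes = 0
    while uncovered:
        box = [uncovered[0]]  # seed a new box with the first uncovered node
        for v in uncovered[1:]:
            # admit v only if it is within box_size of every node already in the box
            if all(dist[u].get(v, float("inf")) < box_size for u in box):
                box.append(v)
        members = set(box)
        uncovered = [v for v in uncovered if v not in members]
        boxes += 1
    return boxes


def fit_exponent(sizes, counts):
    """Least-squares slope of log(count) against log(size), returned with its sign flipped."""
    xs = [log(s) for s in sizes]
    ys = [log(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return -slope


# Assumed toy network: a chain of 64 nodes.
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 64] for i in range(64)}
sizes = [1, 2, 4, 8]
counts = [count_boxes(chain, s) for s in sizes]
d_B = fit_exponent(sizes, counts)
print(counts, round(d_B, 3))
```

The greedy covering is not guaranteed to find the minimum number of boxes, so on real networks it gives only an upper estimate of N_B for each box size.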
Affiliation(s)
- Chaoming Song
- Levich Institute and Physics Department, City College of New York, New York, New York 10031, USA
|
81
|
Holford M, Li N, Nadkarni P, Zhao H. VitaPad: visualization tools for the analysis of pathway data. Bioinformatics 2004; 21:1596-602. [PMID: 15564306 DOI: 10.1093/bioinformatics/bti153] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.0] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Packages that support the creation of pathway diagrams are limited by their inability to be readily extended to new classes of pathway-related data. RESULTS VitaPad is a cross-platform application that enables users to create and modify biological pathway diagrams and incorporate microarray data with them. It improves on existing software in the following areas: (i) It can create diagrams dynamically through graph layout algorithms. (ii) It is open-source and uses an open XML format to store data, allowing for easy extension or integration with other tools. (iii) It features a cutting-edge user interface with intuitive controls, high-resolution graphics and fully customizable appearance. AVAILABILITY http://bioinformatics.med.yale.edu CONTACTS matthew.holford@yale.edu; hongyu.zhao@yale.edu.
Affiliation(s)
- Matthew Holford
- Center for Statistical Genomics and Proteomics, Yale University, New Haven, CT 06520, USA.
|
82
|
Bolotin A, Quinquis B, Renault P, Sorokin A, Ehrlich SD, Kulakauskas S, Lapidus A, Goltsman E, Mazur M, Pusch GD, Fonstein M, Overbeek R, Kyprides N, Purnelle B, Prozzi D, Ngui K, Masuy D, Hancy F, Burteau S, Boutry M, Delcour J, Goffeau A, Hols P. Complete sequence and comparative genome analysis of the dairy bacterium Streptococcus thermophilus. Nat Biotechnol 2004; 22:1554-8. [PMID: 15543133 PMCID: PMC7416660 DOI: 10.1038/nbt1034] [Citation(s) in RCA: 357] [Impact Index Per Article: 17.9] [Received: 07/08/2004] [Accepted: 09/21/2004] [Indexed: 02/06/2023]
Abstract
The lactic acid bacterium Streptococcus thermophilus is widely used for the manufacture of yogurt and cheese. This dairy species of major economic importance is phylogenetically close to pathogenic streptococci, raising the possibility that it has a potential for virulence. Here we report the genome sequences of two yogurt strains of S. thermophilus. We found a striking level of gene decay (10% pseudogenes) in both microorganisms. Many genes involved in carbon utilization are nonfunctional, in line with the paucity of carbon sources in milk. Notably, most streptococcal virulence-related genes that are not involved in basic cellular processes are either inactivated or absent in the dairy streptococcus. Adaptation to the constant milk environment appears to have resulted in the stabilization of the genome structure. We conclude that S. thermophilus has evolved mainly through loss-of-function events that remarkably mirror the environment of the dairy niche resulting in a severely diminished pathogenic potential.
Affiliation(s)
- Alexander Bolotin
- Génétique Microbienne. Centre de Recherche de Jouy en Josas, Institut National de la Recherche Agronomique, Jouy en Josas, 78352 Cedex France
- Benoît Quinquis
- Génétique Microbienne. Centre de Recherche de Jouy en Josas, Institut National de la Recherche Agronomique, Jouy en Josas, 78352 Cedex France
- Pierre Renault
- Génétique Microbienne. Centre de Recherche de Jouy en Josas, Institut National de la Recherche Agronomique, Jouy en Josas, 78352 Cedex France
- Alexei Sorokin
- Génétique Microbienne. Centre de Recherche de Jouy en Josas, Institut National de la Recherche Agronomique, Jouy en Josas, 78352 Cedex France
- S Dusko Ehrlich
- Génétique Microbienne. Centre de Recherche de Jouy en Josas, Institut National de la Recherche Agronomique, Jouy en Josas, 78352 Cedex France
- Saulius Kulakauskas
- Unité de Recherche Latière et Génétique Appliquée, Centre de Recherche de Jouy en Josas, Institut National de la Recherche Agronomique, Jouy en Josas, 78352 Cedex France
- Alla Lapidus
- Integrated Genomics, Chicago, Illinois 60612 USA
- Present Address: Microbial Genomics, Department of Energy Joint Genome Institute, 2800 Mitchell Drive, B400, Walnut Creek, California 94598 USA
- Eugene Goltsman
- Integrated Genomics, Chicago, Illinois 60612 USA
- Present Address: Microbial Genomics, Department of Energy Joint Genome Institute, 2800 Mitchell Drive, B400, Walnut Creek, California 94598 USA
- Gordon D Pusch
- Integrated Genomics, Chicago, Illinois 60612 USA
- Present Address: Fellowship for Interpretation of Genomes, 15W155 81st Street, Burr Ridge, Illinois 60527 USA
- Michael Fonstein
- Integrated Genomics, Chicago, Illinois 60612 USA
- Present Address: Cleveland BioLabs, Inc., 10265 Carnegie Ave., Cleveland, Ohio 44106
- Ross Overbeek
- Integrated Genomics, Chicago, Illinois 60612 USA
- Present Address: Fellowship for Interpretation of Genomes, 15W155 81st Street, Burr Ridge, Illinois 60527 USA
- Nikos Kyprides
- Integrated Genomics, Chicago, Illinois 60612 USA
- Present Address: Microbial Genomics, Department of Energy Joint Genome Institute, 2800 Mitchell Drive, B400, Walnut Creek, California 94598 USA
- Bénédicte Purnelle
- Institut des Sciences de la Vie, Université Catholique de Louvain, Louvain-la-Neuve, 1348 Belgium
- Deborah Prozzi
- Institut des Sciences de la Vie, Université Catholique de Louvain, Louvain-la-Neuve, 1348 Belgium
- Katrina Ngui
- Institut des Sciences de la Vie, Université Catholique de Louvain, Louvain-la-Neuve, 1348 Belgium
- Present Address: Department Anatomy and Cell Biology, University of Melbourne, Victoria 3010 Australia
- David Masuy
- Institut des Sciences de la Vie, Université Catholique de Louvain, Louvain-la-Neuve, 1348 Belgium
- Frédéric Hancy
- Institut des Sciences de la Vie, Université Catholique de Louvain, Louvain-la-Neuve, 1348 Belgium
- Sophie Burteau
- Institut des Sciences de la Vie, Université Catholique de Louvain, Louvain-la-Neuve, 1348 Belgium
- Present Address: Unité de Recherche en Biologie Cellulaire, Facultés Universitaires Notre-Dame de la Paix, 61 Rue de Bruxelles, 5000 Namur, Belgium
- Marc Boutry
- Institut des Sciences de la Vie, Université Catholique de Louvain, Louvain-la-Neuve, 1348 Belgium
- Jean Delcour
- Institut des Sciences de la Vie, Université Catholique de Louvain, Louvain-la-Neuve, 1348 Belgium
- André Goffeau
- Institut des Sciences de la Vie, Université Catholique de Louvain, Louvain-la-Neuve, 1348 Belgium
- Pascal Hols
- Institut des Sciences de la Vie, Université Catholique de Louvain, Louvain-la-Neuve, 1348 Belgium
|
83
|
Ma HW, Zeng AP. Phylogenetic comparison of metabolic capacities of organisms at genome level. Mol Phylogenet Evol 2004; 31:204-13. [PMID: 15019620 DOI: 10.1016/j.ympev.2003.08.011] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.8] [Received: 03/17/2003] [Revised: 06/05/2003] [Indexed: 11/22/2022]
Abstract
Horizontal gene transfer (HGT) has been shown to widely spread in organisms by comparative genomic studies. However, its effect on the phylogenetic relationship of organisms, especially at a system level of different cellular functions, is still not well understood. In this work, we have constructed phylogenetic trees based on the enzyme, reaction, and gene contents of metabolic networks reconstructed from annotated genome information of 82 sequenced organisms. Results from different phylogenetic distance definitions and based on three different functional subsystems (i.e., metabolism, cellular processes, information storage and processing) were compared. Results based on the three different functional subsystems give different pictures on the phylogenetic relationship of organisms, reflecting the different extents of HGT in the different functional systems. In general, horizontal transfer is prevailing in genes for metabolism, but less in genes for information processing. Nevertheless, the major results of metabolic network-based phylogenetic trees are in good agreement with the tree based on 16S rRNA and genome trees, confirming the three domain classification and the close relationship between eukaryotes and archaea at the level of metabolic networks. These results strongly support the hypothesis that although HGT is widely distributed, it is nevertheless constrained by certain pre-existing metabolic organization principle(s) during the evolution. Further research is needed to identify the organization principle and constraints of metabolic network on HGT which have large impacts on understanding the evolution of life and in purposefully manipulating cellular metabolism.
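A content-based phylogenetic distance of the kind mentioned in the abstract above can be sketched with sets of enzymes per organism. The abstract does not state which distance definitions were compared, so the Jaccard distance used here is an assumption, and the organism names and EC numbers are hypothetical toy data.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two sets of enzyme (EC) identifiers."""
    union = a | b
    if not union:
        return 0.0  # two empty enzyme sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def distance_matrix(contents):
    """Pairwise distances keyed by alphabetically ordered organism pairs."""
    names = sorted(contents)
    return {
        (x, y): jaccard_distance(contents[x], contents[y])
        for x, y in combinations(names, 2)
    }


# Hypothetical enzyme contents of three organisms.
enzyme_content = {
    "organism_A": {"1.1.1.1", "2.7.1.1", "4.2.1.11", "5.3.1.9"},
    "organism_B": {"1.1.1.1", "2.7.1.1", "4.2.1.11"},
    "organism_C": {"1.1.1.1", "6.3.1.2"},
}
distances = distance_matrix(enzyme_content)
closest_pair = min(distances, key=distances.get)
print(distances)
print(closest_pair)
```

A matrix like this could then be passed to any standard tree-building method (for example neighbour joining); the same functions apply unchanged to reaction or gene content sets.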
Affiliation(s)
- Hong-Wu Ma
- Department of Genome Analysis, GBF-German Research Center for Biotechnology, Mascheroder Weg 1, D-38124 Braunschweig, Germany
|
84
|
|
85
|
Haft DH, Selengut JD, Brinkac LM, Zafar N, White O. Genome Properties: a system for the investigation of prokaryotic genetic content for microbiology, genome annotation and comparative genomics. Bioinformatics 2004; 21:293-306. [PMID: 15347579 DOI: 10.1093/bioinformatics/bti015] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.4] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION The presence or absence of metabolic pathways and structures provide a context that makes protein annotation far more reliable. Compiling such information across microbial genomes improves the functional classification of proteins and provides a valuable resource for comparative genomics. RESULTS We have created a Genome Properties system to present key aspects of prokaryotic biology using standardized computational methods and controlled vocabularies. Properties reflect gene content, phenotype, phylogeny and computational analyses. The results of searches using hidden Markov models allow many properties to be deduced automatically, especially for families of proteins (equivalogs) conserved in function since their last common ancestor. Additional properties are derived from curation, published reports and other forms of evidence. Genome Properties system was applied to 156 complete prokaryotic genomes, and is easily mined to find differences between species, correlations between metabolic features and families of uncharacterized proteins, or relationships among properties. AVAILABILITY Genome Properties can be found at http://www.tigr.org/Genome_Properties SUPPLEMENTARY INFORMATION http://www.tigr.org/tigr-scripts/CMR2/genome_properties_references.spl.
Affiliation(s)
- Daniel H Haft
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA
|
86
|
Ishii N, Robert M, Nakayama Y, Kanai A, Tomita M. Toward large-scale modeling of the microbial cell for computer simulation. J Biotechnol 2004; 113:281-94. [PMID: 15380661 DOI: 10.1016/j.jbiotec.2004.04.038] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.1] [Received: 09/05/2003] [Revised: 03/30/2004] [Accepted: 04/01/2004] [Indexed: 11/26/2022]
Abstract
In the post-genomic era, the large-scale, systematic, and functional analysis of all cellular components using transcriptomics, proteomics, and metabolomics, together with bioinformatics for the analysis of the massive amount of data generated by these "omics" methods are the focus of intensive research activities. As a consequence of these developments, systems biology, whose goal is to comprehend the organism as a complex system arising from interactions between its multiple elements, becomes a more tangible objective. Mathematical modeling of microorganisms and subsequent computer simulations are effective tools for systems biology, which will lead to a better understanding of the microbial cell and will have immense ramifications for biological, medical, environmental sciences, and the pharmaceutical industry. In this review, we describe various types of mathematical models (structured, unstructured, static, dynamic, etc.), of microorganisms that have been in use for a while, and others that are emerging. Several biochemical/cellular simulation platforms to manipulate such models are summarized and the E-Cell system developed in our laboratory is introduced. Finally, our strategy for building a "whole cell metabolism model", including the experimental approach, is presented.
Affiliation(s)
- Nobuyoshi Ishii
- Institute for Advanced Biosciences, Keio University, 403-1 Daihoji, Tsuruoka, Yamagata 997-0017, Japan
|
87
|
Nishio Y, Nakamura Y, Usuda Y, Sugimoto S, Matsui K, Kawarabayasi Y, Kikuchi H, Gojobori T, Ikeo K. Evolutionary Process of Amino Acid Biosynthesis in Corynebacterium at the Whole Genome Level. Mol Biol Evol 2004; 21:1683-91. [PMID: 15163767 DOI: 10.1093/molbev/msh175] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Indexed: 11/13/2022] Open
Abstract
Corynebacterium glutamicum, which is the closest relative of Corynebacterium efficiens, is widely used for the large-scale production of many kinds of amino acids, particularly glutamic acid and lysine, by fermentation. Corynebacterium diphtheriae, which is well known as a human pathogen, is also closely related to these two species of Corynebacteria, but it lacks such productivity of amino acids. It is an important and interesting question to ask how those closely related bacterial species have undergone such significant functional differentiation in amino acid biosynthesis. The main purpose of the present study is to clarify the evolutionary process of functional differentiation among the three species of Corynebacteria by conducting a comparative analysis of genome sequences. When Mycobacterium and Streptomyces were used as outgroups, our comparative study suggested that the common ancestor of Corynebacteria already possessed almost all of the gene sets necessary for amino acid production. However, C. diphtheriae was found to have lost the genes responsible for amino acid production. Moreover, we found that the common ancestor of C. efficiens and C. glutamicum acquired some of the genes responsible for amino acid production by horizontal gene transfer. Thus, we conclude that the evolutionary events of gene loss and horizontal gene transfer must have been responsible for functional differentiation in amino acid biosynthesis of the three species of Corynebacteria.
Affiliation(s)
- Yousuke Nishio
- Institute of Life Sciences, Ajinomoto Co., Inc., Kawasaki, Japan
|
88
|
Sun J, Zeng AP. IdentiCS--identification of coding sequence and in silico reconstruction of the metabolic network directly from unannotated low-coverage bacterial genome sequence. BMC Bioinformatics 2004; 5:112. [PMID: 15312235 PMCID: PMC514700 DOI: 10.1186/1471-2105-5-112] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2004] [Accepted: 08/16/2004] [Indexed: 11/17/2022] Open
Abstract
Background A necessary step for a genome level analysis of the cellular metabolism is the in silico reconstruction of the metabolic network from genome sequences. The available methods are mainly based on the annotation of genome sequences including two successive steps, the prediction of coding sequences (CDS) and their function assignment. The annotation process takes time. The available methods often encounter difficulties when dealing with unfinished error-containing genomic sequence. Results In this work a fast method is proposed to use unannotated genome sequence for predicting CDSs and for an in silico reconstruction of metabolic networks. Instead of using predicted genes or CDSs to query public databases, entries from public DNA or protein databases are used as queries to search a local database of the unannotated genome sequence to predict CDSs. Functions are assigned to the predicted CDSs simultaneously. The well-annotated genome of Salmonella typhimurium LT2 is used as an example to demonstrate the applicability of the method. 97.7% of the CDSs in the original annotation are correctly identified. The use of SWISS-PROT-TrEMBL databases resulted in an identification of 98.9% of CDSs that have EC-numbers in the published annotation. Furthermore, two versions of sequences of the bacterium Klebsiella pneumoniae with different genome coverage (3.9 and 7.9 fold, respectively) are examined. The results suggest that a 3.9-fold coverage of the bacterial genome could be sufficiently used for the in silico reconstruction of the metabolic network. Compared to other gene finding methods such as CRITICA our method is more suitable for exploiting sequences of low genome coverage. Based on the new method, a program called IdentiCS (Identification of Coding Sequences from Unfinished Genome Sequences) is delivered that combines the identification of CDSs with the reconstruction, comparison and visualization of metabolic networks (free to download at ). 
Conclusions The reversed querying process and the program IdentiCS allow a fast and adequate prediction of protein-coding sequences and reconstruction of the potential metabolic network from low-coverage genome sequences of bacteria. The new method can accelerate the use of genomic data for studying cellular metabolism.
Affiliation(s)
- Jibin Sun
- Department of Genome Analysis, GBF-German Research Center for Biotechnology, Mascheroder Weg 1, Braunschweig, 38124, Germany
- An-Ping Zeng
- Department of Genome Analysis, GBF-German Research Center for Biotechnology, Mascheroder Weg 1, Braunschweig, 38124, Germany
|
89
|
Tsoka S, Ouzounis CA. Metabolic database systems for the analysis of genome-wide function. Biotechnol Bioeng 2004; 84:750-5. [PMID: 14708115 DOI: 10.1002/bit.10881] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Genome sequencing projects provide an inventory of molecular components for a wide variety of organisms. Metabolic databases integrate these functional descriptions of individual modules into a higher-level characterization of cellular metabolism. This article reviews efforts related to the development of metabolic databases and discusses how such systems have aided the delineation of genome properties. We illustrate the design features of metabolic databases and discuss the challenges facing metabolic as well as databases of other functional type.
Affiliation(s)
- Sophia Tsoka
- Computational Genomics Group, The European Bioinformatics Institute, EMBL Cambridge Outstation, Cambridge CB10 1SD, UK.
|
90
|
Abstract
Network representations of biological pathways offer a functional view of molecular biology that is different from and complementary to sequence, expression, and structure databases. There is currently available a wide range of digital collections of pathway data, differing in organisms included, functional area covered (e.g., metabolism vs. signaling), detail of modeling, and support for dynamic pathway construction. While it is currently impossible for these databases to communicate with each other, there are several efforts at standardizing a data exchange language for pathway data. Databases that represent pathway data at the level of individual interactions make it possible to combine data from different predefined pathways and to query by network connectivity. Computable representations of pathways provide a basis for various analyses, including detection of broad network patterns, comparison with mRNA or protein abundance, and simulation.
Affiliation(s)
- Carl F Schaefer
- Center for Bioinformatics, National Cancer Institute, National Institutes of Health, 6116 Executive Boulevard, Suite 403, Rockville, MD 20852, USA.
|
91
|
Yeh I, Hanekamp T, Tsoka S, Karp PD, Altman RB. Computational analysis of Plasmodium falciparum metabolism: organizing genomic information to facilitate drug discovery. Genome Res 2004; 14:917-24. [PMID: 15078855 PMCID: PMC479120 DOI: 10.1101/gr.2050304] [Citation(s) in RCA: 177] [Impact Index Per Article: 8.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2003] [Accepted: 01/14/2004] [Indexed: 11/25/2022]
Abstract
Identification of novel targets for the development of more effective antimalarial drugs and vaccines is a primary goal of the Plasmodium genome project. However, deciding which gene products are ideal drug/vaccine targets remains a difficult task. Currently, a systematic disruption of every single gene in Plasmodium is technically challenging. Hence, we have developed a computational approach to prioritize potential targets. A pathway/genome database (PGDB) integrates pathway information with information about the complete genome of an organism. We have constructed PlasmoCyc, a PGDB for Plasmodium falciparum 3D7, using its annotated genomic sequence. In addition to the annotations provided in the genome database, we add 956 additional annotations to proteins annotated as "hypothetical" using the GeneQuiz annotation system. We apply a novel computational algorithm to PlasmoCyc to identify 216 "chokepoint enzymes." All three clinically validated drug targets are chokepoint enzymes. A total of 87.5% of proposed drug targets with biological evidence in the literature are chokepoint reactions. Therefore, identifying chokepoint enzymes represents one systematic way to identify potential metabolic drug targets.
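The abstract does not spell out how chokepoint enzymes are found. As a minimal sketch, assuming the commonly used definition (a chokepoint reaction is the only consumer of some substrate or the only producer of some product), the idea can be illustrated as follows. The function name `find_chokepoints` and the toy reaction set are hypothetical and are not taken from PlasmoCyc.

```python
from collections import defaultdict


def find_chokepoints(reactions):
    """Return reactions that uniquely consume or uniquely produce a metabolite.

    `reactions` maps a reaction name to a (substrates, products) pair of sets.
    This encodes an assumed definition of "chokepoint", not the paper's code.
    """
    consumers = defaultdict(set)  # metabolite -> reactions that consume it
    producers = defaultdict(set)  # metabolite -> reactions that produce it
    for name, (substrates, products) in reactions.items():
        for metabolite in substrates:
            consumers[metabolite].add(name)
        for metabolite in products:
            producers[metabolite].add(name)

    chokepoints = set()
    for reaction_set in list(consumers.values()) + list(producers.values()):
        if len(reaction_set) == 1:  # sole consumer or sole producer
            chokepoints |= reaction_set
    return chokepoints


# Hypothetical toy network: R1 and R4 are redundant routes from A to B.
toy_network = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"A"}, {"C"}),
    "R3": ({"B", "C"}, {"D"}),
    "R4": ({"A"}, {"B"}),
}
print(sorted(find_chokepoints(toy_network)))
```

In this toy network R1 and R4 back each other up, so neither is flagged, whereas R2 (sole producer of C) and R3 (sole consumer of B and C, sole producer of D) are.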
Affiliation(s)
- Iwei Yeh
- Department of Genetics, Stanford University, Stanford, California 94305, USA
|
92
|
Claudel-Renard C, Chevalet C, Faraut T, Kahn D. Enzyme-specific profiles for genome annotation: PRIAM. Nucleic Acids Res 2004; 31:6633-9. [PMID: 14602924 PMCID: PMC275543 DOI: 10.1093/nar/gkg847] [Citation(s) in RCA: 281] [Impact Index Per Article: 14.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
The advent of fully sequenced genomes opens the ground for the reconstruction of metabolic pathways on the basis of the identification of enzyme-coding genes. Here we describe PRIAM, a method for automated enzyme detection in a fully sequenced genome, based on the classification of enzymes in the ENZYME database. PRIAM relies on sets of position-specific scoring matrices ('profiles') automatically tailored for each ENZYME entry. Automatically generated logical rules define which of these profiles is required in order to infer the presence of the corresponding enzyme in an organism. As an example, PRIAM was applied to identify potential metabolic pathways from the complete genome of the nitrogen-fixing bacterium Sinorhizobium meliloti. The results of this automated method were compared with the original genome annotation and visualised on KEGG graphs in order to facilitate the interpretation of metabolic pathways and to highlight potentially missing enzymes.
Affiliation(s)
- Clotilde Claudel-Renard
- Laboratoire de Génétique Cellulaire, INRA, INRA/CNRS, BP27, 31326 Castanet-Tolosan Cedex, France
|
93
|
Brüggemann H, Gottschalk G. Insights in metabolism and toxin production from the complete genome sequence of Clostridium tetani. Anaerobe 2004; 10:53-68. [PMID: 16701501 DOI: 10.1016/j.anaerobe.2003.08.001] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2003] [Accepted: 08/21/2003] [Indexed: 01/01/2023]
Abstract
The decryption of prokaryotic genome sequences progresses rapidly and provides the scientific community with an enormous amount of information. Clostridial genome sequencing projects have been finished only recently, starting with the genome of the solvent-producing Clostridium acetobutylicum in 2001. A lot of attention has been devoted to the genomes of pathogenic clostridia. In 2002, the genome sequence of C. perfringens, the causative agent of gas gangrene, was released. Currently in the finishing stage and prior to publication are the genomes of the foodborne botulism-causing C. botulinum and of C. difficile, the causative agent of a wide spectrum of clinical manifestations such as antibiotic-associated diarrhea. Our team sequenced the genome of neuropathogenic C. tetani, a Gram-positive spore-forming bacterium predominantly found in the soil. In deep wound infections it occasionally causes spastic paralysis in humans and vertebrate animals, known as tetanus disease, by the secretion of a potent neurotoxin, designated tetanus toxin. The toxin blocks the release of neurotransmitters from presynaptic membranes of interneurons of the spinal cord and the brainstem, thus preventing muscle relaxation. Fortunately, this disease is successfully controlled through immunization with tetanus toxoid, a formaldehyde-treated tetanus toxin, but nevertheless, an estimated 400,000 cases still occur each year, mainly of neonatal tetanus. The World Health Organization has stated that neonatal tetanus is the second leading cause of death from vaccine preventable diseases among children worldwide. This minireview focuses on an analysis of the genome sequence of C. tetani E88, a vaccine production strain, which is a toxigenic non-sporulating variant of strain Massachusetts. The genome consists of a 2,799,250 bp chromosome encoding 2618 open reading frames. The tetanus toxin is encoded on a 74,082 bp plasmid, containing 61 genes. Additional virulence-related factors as well as an insight into the metabolic strategy of C. tetani with regard to its pathogenic phenotype will be presented. The information from other clostridial genomes by means of comparative analysis will also be explored.
Affiliation(s)
- Holger Brüggemann
- Göttingen Genomics Laboratory, Institute of Microbiology and Genetics, Georg-August-University, Grisebachstr. 8, D-37077 Göttingen, Germany
|
94
|
|
95
|
Ueda HR, Hayashi S, Matsuyama S, Yomo T, Hashimoto S, Kay SA, Hogenesch JB, Iino M. Universality and flexibility in gene expression from bacteria to human. Proc Natl Acad Sci U S A 2004; 101:3765-9. [PMID: 14999098 PMCID: PMC374318 DOI: 10.1073/pnas.0306244101] [Citation(s) in RCA: 113] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Highly parallel experimental biology is offering opportunities to not just accomplish work more easily, but to explore for underlying governing principles. Recent analysis of the large-scale organization of gene expression has revealed its complex and dynamic nature. However, the underlying dynamics that generate complex gene expression and cellular organization are not yet understood. To comprehensively and quantitatively elucidate these underlying gene expression dynamics, we have analyzed genome-wide gene expression in many experimental conditions in Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana, Drosophila melanogaster, Mus musculus, and Homo sapiens. Here we demonstrate that the gene expression dynamics follows the same and surprisingly simple principle from E. coli to human, where gene expression changes are proportional to their expression levels, and show that this "proportional" dynamics or "rich-travel-more" mechanism can regenerate the observed complex and dynamic organization of the transcriptome. These findings provide a universal principle in the regulation of gene expression, show how complex and dynamic organization can emerge from simple underlying dynamics, and demonstrate the flexibility of transcription across a wide range of expression levels.
Affiliation(s)
- Hiroki R Ueda
- Laboratory for Systems Biology, Center for Developmental Biology, RIKEN, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan.
|
96
|
Rice J, Stolovitzky G. Making the most of it: pathway reconstruction and integrative simulation using the data at hand. ACTA ACUST UNITED AC 2004. [DOI: 10.1016/s1741-8364(04)02399-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
|
97
|
|
98
|
Light S, Kraulis P. Network analysis of metabolic enzyme evolution in Escherichia coli. BMC Bioinformatics 2004; 5:15. [PMID: 15113413 PMCID: PMC394313 DOI: 10.1186/1471-2105-5-15] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2003] [Accepted: 02/18/2004] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The two most common models for the evolution of metabolism are the patchwork evolution model, where enzymes are thought to diverge from broad to narrow substrate specificity, and the retrograde evolution model, according to which enzymes evolve in response to substrate depletion. Analysis of the distribution of homologous enzyme pairs in the metabolic network can shed light on the respective importance of the two models. We here investigate the evolution of the metabolism in E. coli viewed as a single network using EcoCyc. RESULTS Sequence comparison between all enzyme pairs was performed and the minimal path length (MPL) between all enzyme pairs was determined. We find a strong over-representation of homologous enzymes at MPL 1. We show that the functionally similar and functionally undetermined enzyme pairs are responsible for most of the over-representation of homologous enzyme pairs at MPL 1. CONCLUSIONS The retrograde evolution model predicts that homologous enzyme pairs are at short metabolic distances from each other. In general agreement with previous studies we find that homologous enzymes occur close to each other in the network more often than expected by chance, which lends some support to the retrograde evolution model. However, we show that the homologous enzyme pairs which may have evolved through retrograde evolution, namely the pairs that are functionally dissimilar, show a weaker over-representation at MPL 1 than the functionally similar enzyme pairs. Our study indicates that, while the retrograde evolution model may have played a small part, the patchwork evolution model is the predominant process of metabolic enzyme evolution.
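The minimal path length (MPL) used above is a shortest-path distance in the enzyme network. A minimal sketch of how such distances could be computed with breadth-first search is given below. It assumes an unweighted, undirected graph in which two enzymes are adjacent when they share a metabolite; that adjacency rule, the function name `minimal_path_lengths`, and the toy graph are illustrative assumptions, not details taken from the study.

```python
from collections import deque


def minimal_path_lengths(adjacency, source):
    """Breadth-first search: shortest hop count from `source` to each reachable node.

    `adjacency` maps an enzyme to the enzymes it is directly linked to.
    Unreachable enzymes are simply absent from the returned dict.
    """
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:  # first visit gives the shortest path
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


# Hypothetical linear pathway E1 - E2 - E3 - E4.
toy_graph = {
    "E1": ["E2"],
    "E2": ["E1", "E3"],
    "E3": ["E2", "E4"],
    "E4": ["E3"],
}
mpl_from_e1 = minimal_path_lengths(toy_graph, "E1")
print(mpl_from_e1)
```

Running the search once per enzyme yields the all-pairs MPL table; pairs at MPL 1 are the directly adjacent enzymes on which the over-representation analysis focuses.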
Affiliation(s)
- Sara Light
- Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm Center for Physics, Astronomy and Biotechnology, Stockholm University, Stockholm SE-10691, Sweden
- Per Kraulis
- Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm Center for Physics, Astronomy and Biotechnology, Stockholm University, Stockholm SE-10691, Sweden
|
99
|
Bugrim A, Nikolskaya T, Nikolsky Y. Early prediction of drug metabolism and toxicity: systems biology approach and modeling. Drug Discov Today 2004; 9:127-35. [PMID: 14960390 DOI: 10.1016/s1359-6446(03)02971-4] [Citation(s) in RCA: 114] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
Abstract
Many of the drug candidates that fail in clinical trials are withdrawn because of unforeseen effects of human metabolism, such as toxicity and unfavorable pharmacokinetic profiles. Early pre-clinical elimination of such compounds is important but not yet possible. An ideal system would enable researchers to make a confident elimination decision based purely on the structure of a new compound, and incorporate and use multiple pre-clinical experimental data to support such a decision. Currently available resources can be split into three categories: (i) structure-activity relationship (SAR) computational models based on compound structure; (ii) 'pattern' databases of tissue or organ response to drugs, compiled from high-throughput experiments; and (iii) 'systems biology' databases of metabolic pathways, genes and regulatory networks. In this review, we outline the advantages and drawbacks of each of these systems and suggest directions for their integration.
Affiliation(s)
- Andrej Bugrim
- GeneGo, 500 Renaissance Drive, Suite 106, St Joseph, MI 49085, USA.
|
100
|
Lemer C, Antezana E, Couche F, Fays F, Santolaria X, Janky R, Deville Y, Richelle J, Wodak SJ. The aMAZE LightBench: a web interface to a relational database of cellular processes. Nucleic Acids Res 2004; 32:D443-8. [PMID: 14681453 PMCID: PMC308873 DOI: 10.1093/nar/gkh139] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
The aMAZE LightBench (http://www.amaze.ulb.ac.be/) is a web interface to the aMAZE relational database, which contains information on gene expression, catalysed chemical reactions, regulatory interactions, protein assembly, as well as metabolic and signal transduction pathways. It allows the user to browse the information in an intuitive way, which also reflects the underlying data model. Moreover, links are provided to literature references and, whenever appropriate, to external databases.
Affiliation(s)
- Christian Lemer
- The aMAZE Project, SCMBB, Université libre de Bruxelles, boulevard du Triomphe CP 263, B-1050 Bruxelles, Belgium
|