101
|
|
102
|
Desiere F. Towards a systems biology understanding of human health: Interplay between genotype, environment and nutrition. Biotechnology Annual Review 2004; 10:51-84. [PMID: 15504703 DOI: 10.1016/s1387-2656(04)10003-3] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.0] [Indexed: 01/17/2023]
Abstract
Sequencing of the human genome has opened the door to the most exciting new era for the holistic system description of human health. It is now possible to study the underlying mechanisms of human health in relation to diet and other environmental factors such as drugs and toxic pollutants. Technological advances make it feasible to envisage that in the future personalized drug treatment and dietary advice and possibly tailored food products can be used for promoting optimal health on an individual basis, in relation to genotype and lifestyle. Life science research has in the past very much focused on diseases and how to reestablish human health after illness. Today, the role of food and nutrition in human health and especially prevention of illness is gaining recognition. Diseases of modern civilization, such as diabetes, heart disease and cancer, have been shown to be affected by dietary patterns. The risk of disease is often associated with genetic polymorphisms, but the effect is dependent on dietary intake and nutritional status. To understand the link between diet and health, nutritional research must cover a broad range of areas, from the molecular level to whole body studies. Therefore it provides an excellent example of integrative biology requiring a systems biology approach. The current state and implications of systems biology in the understanding of human health are reviewed. It becomes clear that a complete mechanistic description of the human organism is not yet possible. However, recent advances in systems biology provide a trajectory for future research in order to improve health of individuals and populations. Disease prevention through personalized nutrition will become more important as the obvious avenue of research in life sciences and more focus will need to be put upon those natural ways of disease prevention.
In particular, the new discipline of nutrigenomics, which investigates how nutrients interact with humans, taking predetermined genetic factors into account, will mediate new insights into human health that will finally have significant positive impact on our quality of life.
Affiliation(s)
- Frank Desiere
- Nestlé Research Center, P.O. Box 44, 1000 Lausanne 26, Switzerland.
|
103
|
Abstract
The problem of assigning a biochemical function to newly discovered proteins has been traditionally approached by expert enzymological analysis, sequence analysis, and structural modeling. In recent years, the appearance of databases containing protein-ligand interaction data for large numbers of protein classes and chemical compounds has provided new ways of investigating proteins for which the biochemical function is not completely understood. In this work, we introduce a method that utilizes ligand-binding data for functional classification of enzymes. The method makes use of the existing Enzyme Commission (EC) classification scheme and the data on interactions of small molecules with enzymes from the BRENDA database. A set of ligands that binds to an enzyme with unknown biochemical function serves as a query to search a protein-ligand interaction database for enzyme classes that are known to interact with a similar set of ligands. These classes provide hypotheses of the query enzyme's function and complement other computational annotations that take advantage of sequence and structural information. Similarity between sets of ligands is computed using point set similarity measures based upon similarity between individual compounds. We present the statistics of classification of the enzymes in the database by a cross-validation procedure and illustrate the application of the method on several examples.
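The ligand-set query described in this abstract can be illustrated with a small sketch. The abstract does not say which compound similarity or which point set measure was used, so the Tanimoto coefficient and the symmetric mean of best matches below are assumptions chosen for illustration; the feature labels, the class names `class_A` and `class_B`, and all function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation: a compound is a set of structural feature labels.
Fingerprint = FrozenSet[str]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def set_similarity(query: List[Fingerprint], ref: List[Fingerprint]) -> float:
    """Symmetric mean of best-match similarities between two ligand sets.

    This is one possible point set measure (an assumption, not the paper's).
    """
    if not query or not ref:
        return 0.0
    q_to_r = sum(max(tanimoto(q, r) for r in ref) for q in query) / len(query)
    r_to_q = sum(max(tanimoto(r, q) for q in query) for r in ref) / len(ref)
    return (q_to_r + r_to_q) / 2


def rank_classes(
    query: List[Fingerprint], classes: Dict[str, List[Fingerprint]]
) -> List[Tuple[float, str]]:
    """Rank enzyme classes by similarity of their ligand sets to the query."""
    scored = [(set_similarity(query, ligands), name) for name, ligands in classes.items()]
    return sorted(scored, reverse=True)


# Toy data, invented for illustration only.
classes = {
    "class_A": [frozenset({"OH", "C2"}), frozenset({"NAD", "ring"})],
    "class_B": [frozenset({"peptide", "amide"})],
}
query = [frozenset({"OH", "C2"})]
ranking = rank_classes(query, classes)
```

The top-ranked classes would then serve as functional hypotheses for the query enzyme, to be weighed against sequence and structure based annotations.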
Affiliation(s)
- Sergei Izrailev
- Johnson & Johnson Pharmaceutical Research and Development, Cranbury, New Jersey 08512, USA.
|
104
|
Martínez-Antonio A, Salgado H, Gama-Castro S, Gutiérrez-Ríos RM, Jiménez-Jacinto V, Collado-Vides J. Environmental conditions and transcriptional regulation in Escherichia coli: a physiological integrative approach. Biotechnol Bioeng 2003; 84:743-9. [PMID: 14708114 DOI: 10.1002/bit.10846] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Indexed: 11/09/2022]
Abstract
Bacteria develop a number of devices for sensing, responding, and adapting to different environmental conditions. Understanding within a genomic perspective how the transcriptional machinery of bacteria is modulated in response to changing conditions is a major challenge for biologists. Knowledge of which genes are turned on or turned off under specific conditions is essential for our understanding of cell behavior. In this study we describe how the information pertaining to gene expression and associated growth conditions (even with very little knowledge of the associated regulatory mechanisms) is gathered from the literature and incorporated into RegulonDB, a database on transcriptional regulation and operon organization in E. coli. The link between growth conditions, signal transduction, and transcriptional regulation is modeled in the database in a simple format that highlights biologically relevant information. As far as we know, there is no other database that explicitly clarifies the effect of environmental conditions on gene transcription. We discuss how this knowledge constitutes a benchmark that will impact future research aimed at integration of regulatory responses in the cell; for instance, analysis of microarrays, predicting culture behavior in biotechnological processes, and comprehension of dynamics of regulatory networks. This integrated knowledge will contribute to the future goal of modeling the behavior of E. coli as an entire cell. The RegulonDB database can be accessed on the web at the URL: http://www.cifn.unam.mx/Computational_Biology/regulondb/.
|
105
|
Allen TE, Herrgård MJ, Liu M, Qiu Y, Glasner JD, Blattner FR, Palsson BØ. Genome-scale analysis of the uses of the Escherichia coli genome: model-driven analysis of heterogeneous data sets. J Bacteriol 2003; 185:6392-9. [PMID: 14563874 PMCID: PMC219383 DOI: 10.1128/jb.185.21.6392-6399.2003] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.1] [Indexed: 11/20/2022] Open
Abstract
The recent availability of heterogeneous high-throughput data types has increased the need for scalable in silico methods with which to integrate data related to the processes of regulation, protein synthesis, and metabolism. A sequence-based framework for modeling transcription and translation in prokaryotes has been established and has been extended to study the expression state of the entire Escherichia coli genome. The resulting in silico analysis of the expression state highlighted three facets of gene expression in E. coli: (i) the metabolic resources required for genome expression and protein synthesis were found to be relatively invariant under the conditions tested; (ii) effective promoter strengths were estimated at the genome scale by using global mRNA abundance and half-life data, revealing genes subject to regulation under the experimental conditions tested; and (iii) large-scale genome location-dependent expression patterns with approximately 600-kb periodicity were detected in the E. coli genome based on the 49 expression data sets analyzed. These results support the notion that a structured model-driven analysis of expression data yields additional information that can be subjected to commonly used statistical analyses. The integration of heterogeneous genome-scale data (i.e., sequence, expression data, and mRNA half-life data) is readily achieved in the context of an in silico model.
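Point (ii) of this abstract combines mRNA abundance with half-life data. One simple way to see why both are needed is a first-order mRNA balance. This is a textbook steady-state sketch, not necessarily the sequence-based framework the authors used; the symbols m (mRNA abundance), t_{1/2} (half-life), k_s (synthesis rate) and k_d (decay constant) are introduced here for illustration.

```latex
\frac{dm}{dt} = k_s - k_d\, m, \qquad k_d = \frac{\ln 2}{t_{1/2}}
\quad\Longrightarrow\quad
k_s = \frac{\ln 2}{t_{1/2}}\, m
\quad \text{at steady state } \left(\frac{dm}{dt} = 0\right).
```

Under this assumption, two transcripts with equal abundance but different half-lives imply different synthesis rates, which is why abundance alone would not identify promoter strength.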
Affiliation(s)
- Timothy E Allen
- Department of Bioengineering, University of California-San Diego, La Jolla, California 92093-0412, USA
|
106
|
Fong SS, Marciniak JY, Palsson BØ. Description and interpretation of adaptive evolution of Escherichia coli K-12 MG1655 by using a genome-scale in silico metabolic model. J Bacteriol 2003; 185:6400-8. [PMID: 14563875 PMCID: PMC219384 DOI: 10.1128/jb.185.21.6400-6408.2003] [Citation(s) in RCA: 94] [Impact Index Per Article: 4.5] [Indexed: 11/20/2022] Open
Abstract
Genome-scale in silico metabolic networks of Escherichia coli have been reconstructed. By using a constraint-based in silico model of a reconstructed network, the range of phenotypes exhibited by E. coli under different growth conditions can be computed, and optimal growth phenotypes can be predicted. We hypothesized that the end point of adaptive evolution of E. coli could be accurately described a priori by our in silico model since adaptive evolution should lead to an optimal phenotype. Adaptive evolution of E. coli during prolonged exponential growth was performed with M9 minimal medium supplemented with 2 g of alpha-ketoglutarate per liter, 2 g of lactate per liter, or 2 g of pyruvate per liter at both 30 and 37 degrees C, which produced seven distinct strains. The growth rates, substrate uptake rates, oxygen uptake rates, by-product secretion patterns, and growth rates on alternative substrates were measured for each strain as a function of evolutionary time. Three major conclusions were drawn from the experimental results. First, adaptive evolution leads to a phenotype characterized by maximized growth rates that may not correspond to the highest biomass yield. Second, metabolic phenotypes resulting from adaptive evolution can be described and predicted computationally. Third, adaptive evolution on a single substrate leads to changes in growth characteristics on other substrates that could signify parallel or opposing growth objectives. Together, the results show that genome-scale in silico metabolic models can describe the end point of adaptive evolution a priori and can be used to gain insight into the adaptive evolutionary process for E. coli.
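The constraint-based prediction of an optimal growth phenotype mentioned here is commonly posed as a linear program. The formulation below is the generic flux balance form, given as a sketch; the abstract does not spell out the model details, and the symbols S (stoichiometric matrix), v (flux vector), c (objective weights selecting the growth flux) and the bounds are notation assumed here.

```latex
\begin{aligned}
\max_{v}\quad & c^{\mathsf{T}} v \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\min} \le v \le v^{\max}.
\end{aligned}
```

The steady-state constraint S v = 0 and the flux bounds define the range of attainable phenotypes; the optimum of the objective within that range is the predicted end point that the evolved strains are compared against.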
Affiliation(s)
- Stephen S Fong
- Department of Bioengineering, University of California, San Diego, La Jolla, California 92093-0412, USA
|
107
|
Ehrentreich F, Schomburg D. Dynamic generation and qualitative analysis of metabolic pathways by a joint database/graph theoretical approach. Funct Integr Genomics 2003; 3:189-96. [PMID: 14564666 DOI: 10.1007/s10142-003-0091-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2003] [Revised: 06/15/2003] [Accepted: 06/20/2003] [Indexed: 11/26/2022]
Abstract
The dynamic generation and qualitative analysis of metabolic networks relying on continuously growing qualified metabolic data by a joint database/graph theoretical approach is described. The procedure is applied to analyze the connectivity of a metabolic network after enzyme removal and to subsequently perform shortest path analyses. The focus lies on the analysis of the connectivity of the metabolic network depending on model assumptions. Here we analyze the influence of the number of strongly connected components on the assignment of reversibility or irreversibility of the biochemical reactions.
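The kinds of graph queries named in this abstract (connectivity after enzyme removal, shortest paths, and the dependence of strongly connected components on reversibility assumptions) can be sketched on a toy network. Everything below is illustrative: the metabolite and reaction names are invented, each reaction is simplified to a single substrate-product edge, and the brute-force component count is only suitable for very small graphs.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple


def build_graph(
    reactions: Dict[str, Tuple[str, str]], reversible: Set[str] = frozenset()
) -> Dict[str, Set[str]]:
    """Directed metabolite graph; reversible reactions add the reverse edge."""
    graph: Dict[str, Set[str]] = {}
    for name, (src, dst) in reactions.items():
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
        if name in reversible:
            graph[dst].add(src)
    return graph


def shortest_path(graph: Dict[str, Set[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search for a shortest path in number of reaction steps."""
    prev: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in sorted(graph.get(node, ())):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


def reachable(graph: Dict[str, Set[str]], start: str) -> Set[str]:
    """All nodes reachable from start, including start itself."""
    seen = {start}
    stack = [start]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def count_sccs(graph: Dict[str, Set[str]]) -> int:
    """Number of strongly connected components (mutual reachability classes)."""
    reach = {n: reachable(graph, n) for n in graph}
    return len({frozenset(m for m in reach[n] if n in reach[m]) for n in graph})


# Toy network, invented for illustration.
reactions = {"r1": ("A", "B"), "r2": ("B", "C"), "r3": ("C", "A"), "r4": ("C", "D")}
irreversible = build_graph(reactions)
with_r4_reversible = build_graph(reactions, reversible={"r4"})
without_r3 = build_graph({k: v for k, v in reactions.items() if k != "r3"})
```

Comparing `count_sccs` across the three graphs shows how a single reversibility assignment or a single enzyme removal changes the component structure.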
Affiliation(s)
- F Ehrentreich
- Institut für Biochemie, Universität zu Köln, Zülpicher Strasse 47, 50674, Cologne, Germany.
|
108
|
Wiback SJ, Mahadevan R, Palsson BØ. Reconstructing metabolic flux vectors from extreme pathways: defining the alpha-spectrum. J Theor Biol 2003; 224:313-24. [PMID: 12941590 DOI: 10.1016/s0022-5193(03)00168-1] [Citation(s) in RCA: 58] [Impact Index Per Article: 2.8] [Indexed: 11/17/2022]
Abstract
The move towards genome-scale analysis of cellular functions has necessitated the development of analytical (in silico) methods to understand such large and complex biochemical reaction networks. One such method is extreme pathway analysis that uses stoichiometry and thermodynamic irreversibility to define mathematically unique, systemic metabolic pathways. These extreme pathways form the edges of a high-dimensional convex cone in the flux space that contains all the attainable steady state solutions, or flux distributions, for the metabolic network. By definition, any steady state flux distribution can be described as a nonnegative linear combination of the extreme pathways. To date, much effort has been focused on calculating, defining, and understanding these extreme pathways. However, little work has been performed to determine how these extreme pathways contribute to a given steady state flux distribution. This study represents an initial effort aimed at defining how physiological steady state solutions can be reconstructed from a network's extreme pathways. In general, there is not a unique set of nonnegative weightings on the extreme pathways that produce a given steady state flux distribution but rather a range of possible values. This range can be determined using linear optimization to maximize and minimize the weightings of a particular extreme pathway in the reconstruction, resulting in what we have termed the alpha-spectrum. The alpha-spectrum defines which extreme pathways can and cannot be included in the reconstruction of a given steady state flux distribution and to what extent they individually contribute to the reconstruction. It is shown that accounting for transcriptional regulatory constraints can considerably shrink the alpha-spectrum.
The alpha-spectrum is computed and interpreted for two cases: first, optimal states of a skeleton representation of core metabolism that include transcriptional regulation, and second, human red blood cell metabolism under various physiological, non-optimal conditions.
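The optimization described in this abstract can be written compactly. The notation is assumed here rather than taken from the paper: P is the matrix whose columns are the extreme pathways, v is the given steady state flux distribution, and alpha is the vector of pathway weightings.

```latex
v = P\,\alpha, \qquad \alpha \ge 0,
\qquad\text{and for each pathway } i:\qquad
\alpha_i^{\max} = \max_{\alpha}\ \alpha_i, \quad
\alpha_i^{\min} = \min_{\alpha}\ \alpha_i
\quad \text{subject to } P\,\alpha = v,\ \alpha \ge 0 .
```

The collection of intervals [alpha_i^min, alpha_i^max] over all pathways is the alpha-spectrum in this sketch. A pathway with alpha_i^max = 0 cannot take part in reconstructing v, and one with alpha_i^min > 0 must.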
Affiliation(s)
- Sharon J Wiback
- Department of Bioengineering, University of California, 9500 Gilman Drive EBU 1 Room 6607, San Diego, La Jolla, CA 92093, USA
|
109
|
Famili I, Palsson BO. Systemic metabolic reactions are obtained by singular value decomposition of genome-scale stoichiometric matrices. J Theor Biol 2003; 224:87-96. [PMID: 12900206 DOI: 10.1016/s0022-5193(03)00146-2] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.3] [Indexed: 10/27/2022]
Abstract
Genome-scale metabolic networks can be reconstructed. The systemic biochemical properties of these networks can now be studied. Here, genome-scale reconstructed metabolic networks were analysed using singular value decomposition (SVD). All the individual biochemical conversions contained in a reconstructed metabolic network are described by a stoichiometric matrix (S). SVD of S led to the definition of the underlying modes that characterize the overall biochemical conversions that take place in a network and rank-ordered their importance. The modes were shown to correspond to systemic biochemical reactions and they could be used to identify the groups and clusters of individual biochemical reactions that drive them. Comparative analysis of the Escherichia coli, Haemophilus influenzae, and Helicobacter pylori genome-scale metabolic networks showed that the four dominant modes in all three networks correspond to: (1) the conversion of ATP to ADP, (2) redox metabolism of NADP, (3) proton-motive force, and (4) inorganic phosphate metabolism. The sets of individual metabolic reactions deriving these systemic conversions, however, differed among the three organisms. Thus, we can now define systemic metabolic reactions, or eigen-reactions, for the study of systems biology of metabolism and have a basis for comparing the overall properties of genome-specific metabolic networks.
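For reference, the decomposition used in this abstract has the standard form below. Only S appears in the source; the dimensions m (metabolites) and n (reactions), the rank r, and the factor names are the usual linear algebra notation, assumed here.

```latex
S = U\,\Sigma\,V^{\mathsf{T}} = \sum_{k=1}^{r} \sigma_k\, u_k\, v_k^{\mathsf{T}},
\qquad S \in \mathbb{R}^{m \times n},\quad
\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0 .
```

Each term pairs a combination of metabolites (u_k) with a combination of reactions (v_k), and the ordering of the singular values is what allows the modes to be ranked by importance.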
Affiliation(s)
- Iman Famili
- Department of Bioengineering, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0412, USA
|
110
|
Wang W, Sun J, Hartlep M, Deckwer WD, Zeng AP. Combined use of proteomic analysis and enzyme activity assays for metabolic pathway analysis of glycerol fermentation by Klebsiella pneumoniae. Biotechnol Bioeng 2003; 83:525-36. [PMID: 12827694 DOI: 10.1002/bit.10701] [Citation(s) in RCA: 80] [Impact Index Per Article: 3.8] [Indexed: 11/08/2022]
Abstract
The fed-batch fermentation of glycerol to 1,3-propanediol by Klebsiella pneumoniae displayed an unusual dynamic behavior that can be clearly divided into four distinct phases according to cell growth and CO2 evolution rate. Metabolism changed significantly during the different phases as reflected by the varied specific rates of substrate consumption and product formation. An assay of activities of the three initial enzymes of glycerol metabolism, namely glycerol dehydratase (GDHt), glycerol dehydrogenase (GDH), and 1,3-propanediol-oxidoreductase (PDOR), showed apparently different patterns of expression. To understand the culture dynamics and patterns of enzyme formation at a more systemic level we analyzed the expression patterns of intracellular proteins of K. pneumoniae from different phases of the fed-batch fermentation using two-dimensional gel electrophoresis (2DE). Two new enzymes, namely a phosphoenolpyruvate-dependent dihydroxyacetone kinase (DHAK II) and a hypothetical oxidoreductase (HOR), which are directly related to glycerol metabolism and 1,3-propanediol formation, were identified among the highly expressed proteins. The changes in expression of these new enzymes and several other proteins identified from the 2DE analysis helped to understand not only the dynamic behavior of the fed-batch fermentation reported in this work but also some previously insufficiently understood phenomena related to this fermentation process. In particular, we demonstrated the combined use of proteomic analysis and enzyme activity assay data for metabolic pathway analysis and for a better identification of targets for bioprocess improvement.
Affiliation(s)
- Wei Wang
- TU-BCE, German Research Centre for Biotechnology, Mascheroder Weg 1, 38124 Braunschweig, Germany
|
111
|
Abstract
Our intestine is the site of an extraordinarily complex and dynamic environmentally transmitted consortial symbiosis. The molecular foundations of beneficial symbiotic host-bacterial relationships in the gut are being revealed in part from studies of simplified models of this ecosystem, where germ-free mice are colonized with specified members of the microbial community, and in part from comparisons of the genomes of members of the intestinal microbiota. The results emphasize the contributions of symbionts to postnatal gut development and host physiology, as well as the remarkable strategies these microorganisms have evolved to sustain their alliances. These points are illustrated by the human-Bacteroides thetaiotaomicron symbiosis. Interdisciplinary studies of the effects of the intestinal environment on genome structure and function should provide important new insights about how microbes and humans have coevolved mutually beneficial relationships and new perspectives about the foundations of our health.
Affiliation(s)
- Jian Xu
- Department of Molecular Biology and Pharmacology, Washington University School of Medicine, St. Louis, MO 63110, USA
|
112
|
Abstract
MOTIVATION: Automated methods for biochemical pathway inference are becoming increasingly important for understanding biological processes in living and synthetic systems. With the availability of data on complete genomes and increasing information about enzyme-catalyzed biochemistry it is becoming feasible to approach this problem computationally. In this paper we present PathMiner, a system for automatic metabolic pathway inference. PathMiner predicts metabolic routes by reasoning over transformations using chemical and biological information. RESULTS: We build a biochemical state-space using data from known enzyme-catalyzed transformations in Ligand, including 2917 unique transformations between 3890 different compounds. To predict metabolic pathways we explore this state-space by developing an informed search algorithm. For this purpose we develop a chemically motivated heuristic to guide the search. Since the algorithm does not depend on predefined pathways, it can efficiently identify plausible routes using known biochemical transformations.
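An informed search over a state-space of compounds can be sketched as follows. The abstract does not name the algorithm or describe the heuristic, so A* with unit step costs is used here as a stand-in, and the heuristic is a placeholder rather than the chemically motivated one PathMiner uses. Compound names, enzyme names and `find_route` are all hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical state-space: compound -> list of (enzyme, product) transformations.
Transformations = Dict[str, List[Tuple[str, str]]]


def find_route(
    transformations: Transformations,
    start: str,
    goal: str,
    heuristic: Callable[[str, str], float],
) -> Optional[List[Tuple[str, str]]]:
    """A* search; each transformation costs one step.

    Returns the route as a list of (enzyme, product) pairs, or None.
    """
    frontier = [(heuristic(start, goal), 0, start, [])]
    best_cost = {start: 0}
    while frontier:
        _, cost, compound, route = heapq.heappop(frontier)
        if compound == goal:
            return route
        if cost > best_cost.get(compound, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for enzyme, product in transformations.get(compound, []):
            new_cost = cost + 1
            if new_cost < best_cost.get(product, float("inf")):
                best_cost[product] = new_cost
                priority = new_cost + heuristic(product, goal)
                heapq.heappush(
                    frontier, (priority, new_cost, product, route + [(enzyme, product)])
                )
    return None


def placeholder_heuristic(compound: str, goal: str) -> float:
    """Stand-in estimate of remaining steps: 0 at the goal, 1 otherwise."""
    return 0.0 if compound == goal else 1.0


# Toy state-space, invented for illustration.
toy = {
    "A": [("e1", "B"), ("e2", "C")],
    "B": [("e3", "D")],
    "C": [("e4", "D")],
    "D": [("e5", "E")],
}
route = find_route(toy, "A", "E", placeholder_heuristic)
```

Because the search expands compounds from the transformation table alone, it needs no predefined pathway definitions, which mirrors the point made at the end of the abstract.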
Affiliation(s)
- D C McShan
- School of Medicine, University of Colorado, 4200 East Ninth Avenue, C-245 Denver, Colorado 80262, USA
|
113
|
Pawlowski A, Riedel KU, Klipp W, Dreiskemper P, Gross S, Bierhoff H, Drepper T, Masepohl B. Yeast two-hybrid studies on interaction of proteins involved in regulation of nitrogen fixation in the phototrophic bacterium Rhodobacter capsulatus. J Bacteriol 2003; 185:5240-7. [PMID: 12923097 PMCID: PMC181009 DOI: 10.1128/jb.185.17.5240-5247.2003] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.2] [Indexed: 11/20/2022] Open
Abstract
Rhodobacter capsulatus contains two PII-like proteins, GlnB and GlnK, which play central roles in controlling the synthesis and activity of nitrogenase in response to ammonium availability. Here we used the yeast two-hybrid system to probe interactions between these PII-like proteins and proteins known to be involved in regulating nitrogen fixation. Analysis of defined protein pairs demonstrated the following interactions: GlnB-NtrB, GlnB-NifA1, GlnB-NifA2, GlnB-DraT, GlnK-NifA1, GlnK-NifA2, and GlnK-DraT. These results corroborate earlier genetic data and in addition show that PII-dependent ammonium regulation of nitrogen fixation in R. capsulatus does not require additional proteins, like NifL in Klebsiella pneumoniae. In addition, we found interactions for the protein pairs GlnB-GlnB, GlnB-GlnK, NifA1-NifA1, NifA2-NifA2, and NifA1-NifA2, suggesting that fine tuning of the nitrogen fixation process in R. capsulatus may involve the formation of GlnB-GlnK heterotrimers as well as NifA1-NifA2 heterodimers. In order to identify new proteins that interact with GlnB and GlnK, we constructed an R. capsulatus genomic library for use in yeast two-hybrid studies. Screening of this library identified the ATP-dependent helicase PcrA as a new putative protein that interacts with GlnB and the Ras-like protein Era as a new protein that interacts with GlnK.
Affiliation(s)
- Alice Pawlowski
- Lehrstuhl für Biologie der Mikroorganismen, Fakultät für Biologie, Ruhr-Universität Bochum, D-44780 Bochum, Germany
|
114
|
Charlebois RL, Clarke GDP, Beiko RG, St Jean A. Characterization of species-specific genes using a flexible, web-based querying system. FEMS Microbiol Lett 2003; 225:213-20. [PMID: 12951244 DOI: 10.1016/s0378-1097(03)00512-3] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.2] [Indexed: 11/26/2022] Open
Abstract
We describe a query-based web-accessible system (www.neurogadgets.com/bws.php) for facilitating comparative microbial genomics. A variety of query pages are available, each with numerous options, that allow a biologist to pose relevant questions of genomic data. We illustrate with a characterization of species-specific protein-coding genes (so-called "ORFans"), finding that they are on average smaller, faster evolving, and less G+C-rich, and that they encode proteins more basic in their predicted isoelectric point, compared with non-species-specific genes. Using a dual-threshold approach, we conclude that these are characteristics of true species-specific genes, rather than artifacts of mis-annotation.
|
115
|
Heffelfinger GS, Martino A, Gorin A, Xu Y, Rintoul MD, Geist A, Al-Hashimi HM, Davidson GS, Faulon JL, Frink LJ, Haaland DM, Hart WE, Jakobsson E, Lane T, Li M, Locascio P, Olken F, Olman V, Palenik B, Plimpton SJ, Roe DC, Samatova NF, Shah M, Shoshoni A, Strauss CEM, Thomas EV, Timlin JA, Xu D. Carbon sequestration in Synechococcus Sp.: from molecular machines to hierarchical modeling. OMICS: A Journal of Integrative Biology 2003; 6:305-30. [PMID: 12626091 DOI: 10.1089/153623102321112746] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 01/24/2023]
Abstract
The U.S. Department of Energy recently announced the first five grants for the Genomes to Life (GTL) Program. The goal of this program is to "achieve the most far-reaching of all biological goals: a fundamental, comprehensive, and systematic understanding of life." While more information about the program can be found at the GTL website (www.doegenomestolife.org), this paper provides an overview of one of the five GTL projects funded, "Carbon Sequestration in Synechococcus Sp.: From Molecular Machines to Hierarchical Modeling." This project is a combined experimental and computational effort emphasizing developing, prototyping, and applying new computational tools and methods to elucidate the biochemical mechanisms of the carbon sequestration of Synechococcus Sp., an abundant marine cyanobacterium known to play an important role in the global carbon cycle. Understanding, predicting, and perhaps manipulating carbon fixation in the oceans has long been a major focus of biological oceanography and has more recently been of interest to a broader audience of scientists and policy makers. It is clear that the oceanic sinks and sources of CO2 are important terms in the global environmental response to anthropogenic atmospheric inputs of CO2 and that oceanic microorganisms play a key role in this response. However, the relationship between this global phenomenon and the biochemical mechanisms of carbon fixation in these microorganisms is poorly understood. The project includes five subprojects: an experimental investigation, three computational biology efforts, and a fifth which deals with addressing computational infrastructure challenges of relevance to this project and the Genomes to Life program as a whole.
Our experimental effort is designed to provide biology and data to drive the computational efforts and includes significant investment in developing new experimental methods for uncovering protein partners, characterizing protein complexes, and identifying new binding domains. We will also develop and apply new data measurement and statistical methods for analyzing microarray experiments. Our computational efforts include coupling molecular simulation methods with knowledge discovery from diverse biological data sets for high-throughput discovery and characterization of protein-protein complexes and developing a set of novel capabilities for inference of regulatory pathways in microbial genomes across multiple sources of information through the integration of computational and experimental technologies. These capabilities will be applied to Synechococcus regulatory pathways to characterize their interaction map and identify component proteins in these pathways. We will also investigate methods for combining experimental and computational results with visualization and natural language tools to accelerate discovery of regulatory pathways. Furthermore, given that the ultimate goal of this effort is to develop a systems-level understanding of how the Synechococcus genome affects carbon fixation at the global scale, we will develop and apply a set of tools for capturing the carbon fixation behavior of complex of Synechococcus at different levels of resolution. Finally, because the explosion of data being produced by high-throughput experiments requires data analysis and models which are more computationally complex, more heterogeneous, and require coupling to ever increasing amounts of experimentally obtained data in varying formats, we have also established a companion computational infrastructure to support this effort as well as the Genomes to Life program as a whole.
Affiliation(s)
- Grant S Heffelfinger
- Sandia National Laboratories, Building 701/2101, MS-0885, 1515 Eubank SE, Albuquerque, NM 87123, USA.
|
116
|
|
117
|
Ouzounis CA, Coulson RMR, Enright AJ, Kunin V, Pereira-Leal JB. Classification schemes for protein structure and function. Nat Rev Genet 2003; 4:508-19. [PMID: 12838343 DOI: 10.1038/nrg1113] [Citation(s) in RCA: 75] [Impact Index Per Article: 3.6] [Indexed: 11/09/2022]
Abstract
We examine the structural and functional classifications of the protein universe, providing an overview of the existing classification schemes, their features and inter-relationships. We argue that a unified scheme should be based on a natural classification approach and that more comparative analyses of the present schemes are required both to understand their limitations and to help delimit the number of known protein folds and their corresponding functional roles in cells.
Affiliation(s)
- Christos A Ouzounis
- Computational Genomics Group, The European Bioinformatics Institute, EMBL Cambridge Outstation, Cambridge CB10 1SD, UK.
|
118
|
Bader GD, Heilbut A, Andrews B, Tyers M, Hughes T, Boone C. Functional genomics and proteomics: charting a multidimensional map of the yeast cell. Trends Cell Biol 2003; 13:344-56. [PMID: 12837605 DOI: 10.1016/s0962-8924(03)00127-2] [Citation(s) in RCA: 79] [Impact Index Per Article: 3.8] [Indexed: 02/02/2023]
Abstract
The challenge of large-scale functional genomics projects is to build a comprehensive map of the cell including genome sequence and gene expression data, information on protein localization, structure, function and expression, post-translational modifications, molecular and genetic interactions and phenotypic descriptions. Some of this broad set of functional genomics data has been already assembled for the budding yeast. Even though molecular cartography of the yeast cell is still far from comprehensive, functional genomics has begun to forge connections between disparate cellular events and to foster numerous hypotheses. Here we review several different genomics and proteomics technologies and describe bioinformatics methods for exploring these data to make new discoveries.
Affiliation(s)
- Gary D Bader
- Computational Biology Center, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, Box 460, 10021, New York, NY, USA
|
119
|
Abstract
Protein-protein interactions are facilitated by a myriad of residue-residue contacts on the interacting proteins. Identifying the site of interaction in the protein is a key for deciphering its functional mechanisms, and is crucial for drug development. Many studies indicate that the compositions of contacting residues are unique. Here, we describe a neural network that identifies protein-protein interfaces from sequence. For the most strongly predicted sites (in 34 of 333 proteins), 94% of the predictions were confirmed experimentally. When 70% of our predictions were right, we correctly predicted at least one interaction site in 20% of the complexes (66/333). These results indicate that the prediction of some interaction sites from sequence alone is possible. Incorporating evolutionary and predicted structural information may improve our method. However, even at this early stage, our tool might already assist wet-lab biology.
Affiliation(s)
- Yanay Ofran
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA.
|
120
|
Karlsson M, Olson A, Stenlid J. Expressed sequences from the basidiomycetous tree pathogen Heterobasidion annosum during early infection of Scots pine. Fungal Genet Biol 2003; 39:51-9. [PMID: 12742063 DOI: 10.1016/s1087-1845(02)00586-8] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.1] [Indexed: 11/16/2022]
Abstract
The pattern of gene expression of the basidiomycete Heterobasidion annosum, causal agent of the root rot of conifers, was analysed during its interaction with pine roots. A complementary DNA (cDNA) library was constructed from total RNA extracted from H. annosum mycelia challenged with Scots pine seedling roots for 6 and 72h. Single pass sequencing of 1148 randomly selected cDNA clones resulted in 923 expressed sequence tags (ESTs). Contig analysis and sequence comparisons identified 318 unigene sequences, of which 62 were repeatedly sampled. A putative cellular function was assigned to 223 contigs (70%) that showed a moderate to high homology to protein sequences from public databases. Variations in expression levels during the infection process were monitored on a set of 96 unigenes by reverse northern using dot hybridisation. Seven unigenes (7%) were shown to be either up (4) or down (3) regulated during interaction of the fungus with pine roots. Fungal genes differentially expressed during contact with roots include genes encoding mitochondrial proteins, a cytochrome P450 and a vacuolar ATP synthase.
Affiliation(s)
- Magnus Karlsson
- Department of Forest Mycology and Pathology, Swedish University of Agricultural Sciences, Box 7026, SE-750 07, Uppsala, Sweden.
|
121
|
Attwood TK, Miller CJ. Progress in bioinformatics and the importance of being earnest. BIOTECHNOLOGY ANNUAL REVIEW 2003; 8:1-54. [PMID: 12436914 DOI: 10.1016/s1387-2656(02)08003-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 02/27/2023]
Abstract
In silico biology has gathered momentum as, worldwide, scientists have united in a common quest to sequence, store and analyse complete genomes. This year, a pivotal achievement of this cooperative endeavour was realised in the release of a public draft of the human genome, and with it the promises to improve our understanding of diverse aspects of biology and to yield a healthier future with safe personalized medicines. Key to these goals will be the need to elucidate and characterise the genes and gene products encoded not just in the human genome, but in many genomes. These tasks are underpinned by the concepts and processes of genome and gene/protein evolution, regulation of gene expression, mechanisms of protein folding, the manifestation of protein function, and so on, all of which must be understood in the context of complex, dynamic biological systems. Our use of computers to model such concepts and systems must be placed in the context of the current limits of our understanding of them:- it is important to recognise, for example, that we don't have a common understanding either of what constitutes a gene or a protein function; we can't invariably say that a particular sequence or fold has arisen via divergent or convergent evolution; and we don't fully understand the rules of protein folding. Accepting what we can't do in silico is essential in appreciating what we can do. Without this understanding, it is easy to be misled, as notions of what particular computational approaches can achieve are sometimes rather optimistic. There are valuable lessons to be learned here from the field of Artificial Intelligence, principal among which is the realisation that capturing and representing complex knowledge is time consuming, expensive and hard. 
Thus, we argue here that if bioinformatics is to tackle biological complexity in earnest, it would be wise to absorb the experience distilled from decades of artificial intelligence research, and to approach the road ahead with caution, rigour and pragmatism.
Affiliation(s)
- T K Attwood
- School of Biological Sciences, Department of Computer Science, University of Manchester, Oxford Road, Manchester M13 9PT, UK.
|
122
|
Meyer F, Goesmann A, McHardy AC, Bartels D, Bekel T, Clausen J, Kalinowski J, Linke B, Rupp O, Giegerich R, Pühler A. GenDB--an open source genome annotation system for prokaryote genomes. Nucleic Acids Res 2003; 31:2187-95. [PMID: 12682369 PMCID: PMC153740 DOI: 10.1093/nar/gkg312] [Citation(s) in RCA: 575] [Impact Index Per Article: 27.4] [Received: 10/30/2002] [Revised: 02/04/2003] [Accepted: 02/04/2003] [Indexed: 11/12/2022] Open
Abstract
The flood of sequence data resulting from the large number of current genome projects has increased the need for a flexible, open source genome annotation system, which so far has not existed. To account for the individual needs of different projects, such a system should be modular and easily extensible. We present a genome annotation system for prokaryote genomes, which is well tested and readily adaptable to different tasks. The modular system was developed using an object-oriented approach, and it relies on a relational database backend. Using a well defined application programmers interface (API), the system can be linked easily to other systems. GenDB supports manual as well as automatic annotation strategies. The software currently is in use in more than a dozen microbial genome annotation projects. In addition to its use as a production genome annotation system, it can be employed as a flexible framework for the large-scale evaluation of different annotation strategies. The system is open source.
Affiliation(s)
- Folker Meyer
- Center for Genome Research, Department of Biology, Bielefeld University, Bielefeld, Germany.
|
123
|
Bhattacharya S, Chakrabarti S, Nayak A, Bhattacharya SK. Metabolic networks of microbial systems. Microb Cell Fact 2003; 2:3. [PMID: 12740044 PMCID: PMC155636 DOI: 10.1186/1475-2859-2-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 02/23/2003] [Accepted: 04/11/2003] [Indexed: 12/02/2022] Open
Abstract
In contrast to bioreactors the metabolites within the microbial cells are converted in an impure atmosphere, yet the productivity seems to be well regulated and not affected by changes in operation variables. These features are attributed to integral metabolic network within the microorganism. With the advent of neo-integrative proteomic approaches the understanding of integration of metabolic and protein-protein interaction networks has begun. In this article we review the methods employed to determine the protein-protein interaction and their integration to define metabolite networks. We further present a review of current understanding of network properties, and benefit of studying the networks. The predictions using network structure, for example, in silico experiments help illustrate the importance of studying the network properties. The cells are regarded as complex system but their elements unlike complex systems interact selectively and nonlinearly to produce coherent rather than complex behaviors.
Affiliation(s)
- Sumana Bhattacharya
- Environmental Biotechnology Division, ABRD Company LLC, 1555 Wood Road, Cleveland, Ohio 44121, USA
- Subhra Chakrabarti
- Environmental Biotechnology Division, ABRD Company LLC, 1555 Wood Road, Cleveland, Ohio 44121, USA
- Amiya Nayak
- Environmental Biotechnology Division, ABRD Company LLC, 1555 Wood Road, Cleveland, Ohio 44121, USA
- Sanjoy K Bhattacharya
- Dept. of Ophthalmic Research, Cleveland Clinic Foundation, Area I 31, 9500 Euclid Avenue, Cleveland, Ohio 44195, USA
|
124
|
Abstract
In the genomics era, the interactions between proteins are at the center of attention. Genomic-context methods used to predict these interactions have been put on a quantitative basis, revealing that they are at least on an equal footing with genomics experimental data. A survey of experimentally confirmed predictions proves the applicability of these methods, and new concepts to predict protein interactions in eukaryotes have been described. Finally, the interaction networks that can be obtained by combining the predicted pair-wise interactions have enough internal structure to detect higher levels of organization, such as 'functional modules'.
Affiliation(s)
- Martijn A Huynen
- Nijmegen Center for Molecular Life Sciences, Center for Molecular and Biomolecular Informatics, Toernooiveld 1, 6525 ED, Nijmegen, The Netherlands.
|
125
|
Sutormin RA, Rakhmaninova AB, Gelfand MS. BATMAS30: amino acid substitution matrix for alignment of bacterial transporters. Proteins 2003; 51:85-95. [PMID: 12596266 DOI: 10.1002/prot.10308] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 11/05/2022]
Abstract
Aligned amino acid sequences of three functionally independent samples of transmembrane (TM) transport proteins have been analyzed. The concept of TM-kernel is proposed as the most probable transmembrane region of a sequence. The average amino acid composition of TM-kernels differs from the published amino acid composition of transmembrane segments. TM-kernels contain more alanines, glycines, and less polar, charged, and aromatic residues in contrast to non-TM-proteins. There are also differences between TM-kernels of bacterial and eukaryotic proteins. We have constructed amino acid substitution matrices for bacterial TM-kernels, named the BATMAS (BActerial Transmembrane MAtrix of Substitutions) series. In TM-kernels, polar and charged residues, as well as proline and tyrosine, are highly conserved, whereas there are more substitutions within the group of hydrophobic residues, in contrast to non-TM-proteins that have fewer, relatively more conserved, hydrophobic residues. These results demonstrate that alignment of transmembrane proteins should be based on at least two amino acid substitution matrices, one for loops (e.g., the BLOSUM series) and one for TM-segments (the BATMAS series), and the choice of the TM-matrix should be different for eukaryotic and bacterial proteins.
|
127
|
Bader GD, Hogue CWV. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics 2003; 4:2. [PMID: 12525261 PMCID: PMC149346 DOI: 10.1186/1471-2105-4-2] [Citation(s) in RCA: 3536] [Impact Index Per Article: 168.4] [Received: 09/04/2002] [Accepted: 01/13/2003] [Indexed: 12/17/2022] Open
Abstract
BACKGROUND Recent advances in proteomics technologies such as two-hybrid, phage display and mass spectrometry have enabled us to create a detailed map of biomolecular interaction networks. Initial mapping efforts have already produced a wealth of data. As the size of the interaction set increases, databases and computational methods will be required to store, visualize and analyze the information in order to effectively aid in knowledge discovery. RESULTS This paper describes a novel graph theoretic clustering algorithm, "Molecular Complex Detection" (MCODE), that detects densely connected regions in large protein-protein interaction networks that may represent molecular complexes. The method is based on vertex weighting by local neighborhood density and outward traversal from a locally dense seed protein to isolate the dense regions according to given parameters. The algorithm has the advantage over other graph clustering methods of having a directed mode that allows fine-tuning of clusters of interest without considering the rest of the network and allows examination of cluster interconnectivity, which is relevant for protein networks. Protein interaction and complex information from the yeast Saccharomyces cerevisiae was used for evaluation. CONCLUSION Dense regions of protein interaction networks can be found, based solely on connectivity data, many of which correspond to known protein complexes. The algorithm is not affected by a known high rate of false positives in data from high-throughput interaction techniques. The program is available from ftp://ftp.mshri.on.ca/pub/BIND/Tools/MCODE.
Affiliation(s)
- Gary D Bader
- Samuel Lunenfeld Research Institute, Mt. Sinai Hospital, Toronto ON Canada M5G 1X5, Dept. of Biochemistry, University of Toronto, Toronto ON Canada M5S 1A8
- Current address: Memorial Sloan-Kettering Cancer Center 1275 York Avenue, Box 460, New York, NY, 10021, USA
- Christopher WV Hogue
- Samuel Lunenfeld Research Institute, Mt. Sinai Hospital, Toronto ON Canada M5G 1X5, Dept. of Biochemistry, University of Toronto, Toronto ON Canada M5S 1A8
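The two steps named in the MCODE abstract above, weighting each vertex by the density of its local neighborhood and then growing a cluster outward from a dense seed, can be illustrated with a minimal Python sketch. This is a simplified illustration and not the published MCODE algorithm: the weight used here (neighborhood density times degree), the `cutoff` parameter, and the function names are all assumptions made for the example.

```python
from collections import deque
from itertools import combinations


def neighborhood_density(graph, v):
    """Density (0..1) of the subgraph induced by v and its direct neighbours."""
    nodes = graph[v] | {v}
    n = len(nodes)
    if n < 2:
        return 0.0
    edges = sum(1 for a, b in combinations(nodes, 2) if b in graph[a])
    return edges / (n * (n - 1) / 2)


def dense_regions(graph, cutoff=0.2):
    """Grow clusters outward from the highest-weighted unvisited seed.

    `cutoff` is a hypothetical parameter: a neighbour joins a cluster only
    if its weight is at least (1 - cutoff) times the weight of the seed.
    """
    # Assumed weighting for illustration: local density scaled by degree.
    weight = {v: neighborhood_density(graph, v) * len(graph[v]) for v in graph}
    seen, clusters = set(), []
    for seed in sorted(graph, key=lambda v: (-weight[v], v)):
        if seed in seen:
            continue
        seen.add(seed)
        cluster, queue = {seed}, deque([seed])
        while queue:  # breadth-first traversal away from the seed
            u = queue.popleft()
            for w in graph[u]:
                if w not in seen and weight[w] >= (1 - cutoff) * weight[seed]:
                    seen.add(w)
                    cluster.add(w)
                    queue.append(w)
        if len(cluster) > 1:  # ignore singletons
            clusters.append(cluster)
    return clusters


# Toy undirected network: a 4-clique (A-D) with a sparse tail (D-E-F).
example = {
    "A": {"B", "C", "D"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"A", "B", "C", "E"},
    "E": {"D", "F"},
    "F": {"E"},
}
print(dense_regions(example))
```

On the toy network the clique is reported as a single dense region, while the sparsely connected tail is left out because its vertices fall below the seed-relative weight threshold.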
|
128
|
Allen TE, Palsson BØ. Sequence-based analysis of metabolic demands for protein synthesis in prokaryotes. J Theor Biol 2003; 220:1-18. [PMID: 12453446 DOI: 10.1006/jtbi.2003.3087] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.8] [Indexed: 11/22/2022]
Abstract
Constraints-based models for microbial metabolism can currently be constructed on a genome-scale. These models do not account for RNA and protein synthesis. A scalable formalism to describe translation and transcription that can be integrated with the existing metabolic models is thus needed. Here, we developed such a formalism. The fundamental protein synthesis network described by this formalism was analysed via extreme pathway and flux balance analyses. The protein synthesis network exhibited one extreme pathway per messenger RNA synthesized and one extreme pathway per protein synthesized. The key parameters in this network included promoter strengths, messenger RNA half-lives, and the availability of nucleotide triphosphates, amino acids, RNA polymerase, and active ribosomes. Given these parameters, we were able to calculate a cell's material and energy expenditures for protein synthesis using a flux balance approach. The framework provided herein can subsequently be integrated with genome-scale metabolic models, providing a sequence-based accounting of the metabolic demands resulting from RNA and protein polymerization.
Affiliation(s)
- Timothy E Allen
- Department of Bioengineering, University of California, San Diego, La Jolla, CA, 92093-0412, USA
|
129
|
A Method to Identify Essential Enzymes in the Metabolism: Application to Escherichia Coli. COMPUTATIONAL METHODS IN SYSTEMS BIOLOGY 2003. [DOI: 10.1007/3-540-36481-1_12] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 02/04/2023]
|
130
|
Overbeek R, Larsen N, Walunas T, D'Souza M, Pusch G, Selkov E, Liolios K, Joukov V, Kaznadzey D, Anderson I, Bhattacharyya A, Burd H, Gardner W, Hanke P, Kapatral V, Mikhailova N, Vasieva O, Osterman A, Vonstein V, Fonstein M, Ivanova N, Kyrpides N. The ERGO genome analysis and discovery system. Nucleic Acids Res 2003; 31:164-71. [PMID: 12519973 PMCID: PMC165577 DOI: 10.1093/nar/gkg148] [Citation(s) in RCA: 176] [Impact Index Per Article: 8.4] [Indexed: 11/14/2022] Open
Abstract
The ERGO (http://ergo.integratedgenomics.com/ERGO/) genome analysis and discovery suite is an integration of biological data from genomics, biochemistry, high-throughput expression profiling, genetics and peer-reviewed journals to achieve a comprehensive analysis of genes and genomes. Far beyond any conventional systems that facilitate functional assignments, ERGO combines pattern-based analysis with comparative genomics by visualizing genes within the context of regulation, expression profiling, phylogenetic clusters, fusion events, networked cellular pathways and chromosomal neighborhoods of other functionally related genes. The result of this multifaceted approach is to provide an extensively curated database of the largest available integration of genomes, with a vast collection of reconstructed cellular pathways spanning all domains of life. Although access to ERGO is provided only under subscription, it is already widely used by the academic community. The current version of the system integrates 500 genomes from all domains of life in various levels of completion, 403 of which are available for subscription.
Affiliation(s)
- Ross Overbeek
- Integrated Genomics Inc., 2201 West Campbell Park Drive, Chicago, IL 60612, USA.
|
131
|
Abstract
MBGD is a workbench system for comparative analysis of completely sequenced microbial genomes. The central function of MBGD is to create an orthologous gene classification table using precomputed all-against-all similarity relationships among genes in multiple genomes. In MBGD, an automated classification algorithm has been implemented so that users can create their own classification table by specifying a set of organisms and parameters. This feature is especially useful when the user's interest is focused on some taxonomically related organisms. The created classification table is stored into the database and can be explored combining with the data of individual genomes as well as similarity relationships among genomes. Using these data, users can carry out comparative analyses from various points of view, such as phylogenetic pattern analysis, gene order comparison and detailed gene structure comparison. MBGD is accessible at http://mbgd.genome.ad.jp/.
Affiliation(s)
- Ikuo Uchiyama
- Research Center for Computational Science, Okazaki National Research Institutes, Nishigonaka 38, Myodaiji, Okazaki 444-8585, Japan.
|
132
|
van Helden J, Wernisch L, Gilbert D, Wodak SJ. Graph-based analysis of metabolic networks. ERNST SCHERING RESEARCH FOUNDATION WORKSHOP 2002:245-74. [PMID: 12061005 DOI: 10.1007/978-3-662-04747-7_12] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Indexed: 02/11/2023]
Affiliation(s)
- J van Helden
- Unité de Conformation des Macromolécules Biologiques, Université Libre de Bruxelles, CP 160/16, Avenue F.D. Roosevelt, 50, 1050 Bruxelles, Belgium.
|
133
|
Rodionov DA, Vitreschak AG, Mironov AA, Gelfand MS. Comparative genomics of thiamin biosynthesis in procaryotes. New genes and regulatory mechanisms. J Biol Chem 2002; 277:48949-59. [PMID: 12376536 DOI: 10.1074/jbc.m208965200] [Citation(s) in RCA: 251] [Impact Index Per Article: 11.4] [Indexed: 11/06/2022] Open
Abstract
Vitamin B(1) in its active form thiamin pyrophosphate is an essential coenzyme that is synthesized by coupling of pyrimidine (hydroxymethylpyrimidine; HMP) and thiazole (hydroxyethylthiazole) moieties in bacteria. Using comparative analysis of genes, operons, and regulatory elements, we describe the thiamin biosynthetic pathway in available bacterial genomes. The previously detected thiamin-regulatory element, thi box (Miranda-Rios, J., Navarro, M., and Soberon, M. (2001) Proc. Natl. Acad. Sci. U. S. A. 98, 9736-9741), was extended, resulting in a new, highly conserved RNA secondary structure, the THI element, which is widely distributed in eubacteria and also occurs in some archaea. Search for THI elements and analysis of operon structures identified a large number of new candidate thiamin-regulated genes, mostly transporters, in various prokaryotic organisms. In particular, we assign the thiamin transporter function to yuaJ in the Bacillus/Clostridium group and the HMP transporter function to an ABC transporter thiXYZ in some proteobacteria and firmicutes. By analogy to the model of regulation of the riboflavin biosynthesis, we suggest thiamin-mediated regulation based on formation of alternative RNA structures involving the THI element. Either transcriptional or translational attenuation mechanism may operate in different taxonomic groups, dependent on the existence of putative hairpins that either act as transcriptional terminators or sequester translation initiation sites. Based on analysis of co-occurrence of the thiamin biosynthetic genes in complete genomes, we predict that eubacteria, archaea, and eukaryota have different pathways for the HMP and hydroxyethylthiazole biosynthesis.
|
134
|
Gilman A, Arkin AP. Genetic "code": representations and dynamical models of genetic components and networks. Annu Rev Genomics Hum Genet 2002; 3:341-69. [PMID: 12142360 DOI: 10.1146/annurev.genom.3.030502.111004] [Citation(s) in RCA: 69] [Impact Index Per Article: 3.1] [Indexed: 11/09/2022]
Abstract
Dynamical modeling of biological systems is becoming increasingly widespread as people attempt to grasp biological phenomena in their full complexity and make sense of an accelerating stream of experimental data. We review a number of recent modeling studies that focus on systems specifically involving gene expression and regulation. These systems include bacterial metabolic operons and phase-variable piliation, bacteriophages T7 and lambda, and interacting networks of eukaryotic developmental genes. A wide range of conceptual and mathematical representations of genetic components and phenomena appears in these works. We discuss these representations in depth and give an overview of the tools currently available for creating and exploring dynamical models. We argue that for modeling to realize its full potential as a mainstream biological research technique the tools must become more general and flexible, and formal, standardized representations of biological knowledge and data must be developed.
Affiliation(s)
- Alex Gilman
- Howard Hughes Medical Institute, Berkeley, California, USA.
|
135
|
Abstract
As the amount of information available to biologists increases exponentially, data analysis becomes progressively more challenging. Sequence homology has been a traditional tool in the researchers' armamentarium; it is a very versatile instrument and can be employed to assist in numerous tasks, from establishing the function of a gene to determination of the evolutionary development of an organism. Consequently, numerous specialized tools have been established in the public domain (most commonly, the World Wide Web) to help investigators use sequence homology in their research. These homology databases differ both in techniques they use to compare sequences as well as in the size of the unit of analysis, which can be the whole gene, a domain, or a motif. In this paper, we aim to present a systematic review of the inner details of the most commonly used databases as well as to offer guidelines for their use.
Affiliation(s)
- Alexander Turchin
- Department of Medicine, New England Medical Center, Boston 02111, USA.
|
136
|
Ibarra RU, Edwards JS, Palsson BO. Escherichia coli K-12 undergoes adaptive evolution to achieve in silico predicted optimal growth. Nature 2002; 420:186-9. [PMID: 12432395 DOI: 10.1038/nature01149] [Citation(s) in RCA: 582] [Impact Index Per Article: 26.5] [Received: 12/12/2001] [Accepted: 09/02/2002] [Indexed: 11/09/2022]
Abstract
Annotated genome sequences can be used to reconstruct whole-cell metabolic networks. These metabolic networks can be modelled and analysed (computed) to study complex biological functions. In particular, constraints-based in silico models have been used to calculate optimal growth rates on common carbon substrates, and the results were found to be consistent with experimental data under many but not all conditions. Optimal biological functions are acquired through an evolutionary process. Thus, incorrect predictions of in silico models based on optimal performance criteria may be due to incomplete adaptive evolution under the conditions examined. Escherichia coli K-12 MG1655 grows sub-optimally on glycerol as the sole carbon source. Here we show that when placed under growth selection pressure, the growth rate of E. coli on glycerol reproducibly evolved over 40 days, or about 700 generations, from a sub-optimal value to the optimal growth rate predicted from a whole-cell in silico model. These results open the possibility of using adaptive evolution of entire metabolic networks to realize metabolic states that have been determined a priori based on in silico analysis.
Affiliation(s)
- Rafael U Ibarra
- Department of Bioengineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0412, USA
|
137
|
Ayoubi P, Jin X, Leite S, Liu X, Martajaja J, Abduraham A, Wan Q, Yan W, Misawa E, Prade RA. PipeOnline 2.0: automated EST processing and functional data sorting. Nucleic Acids Res 2002; 30:4761-9. [PMID: 12409467 PMCID: PMC135791 DOI: 10.1093/nar/gkf585] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.0] [Indexed: 11/13/2022] Open
Abstract
Expressed sequence tags (ESTs) are generated and deposited in the public domain, as redundant, unannotated, single-pass reactions, with virtually no biological content. PipeOnline automatically analyses and transforms large collections of raw DNA-sequence data from chromatograms or FASTA files by calling the quality of bases, screening and removing vector sequences, assembling and rewriting consensus sequences of redundant input files into a unigene EST data set and finally through translation, amino acid sequence similarity searches, annotation of public databases and functional data. PipeOnline generates an annotated database, retaining the processed unigene sequence, clone/file history, alignments with similar sequences, and proposed functional classification, if available. Functional annotation is automatic and based on a novel method that relies on homology of amino acid sequence multiplicity within GenBank records. Records are examined through a function ordered browser or keyword queries with automated export of results. PipeOnline offers customization for individual projects (MyPipeOnline), automated updating and alert service. PipeOnline is available at http://stress-genomics.org.
Affiliation(s)
- Patricia Ayoubi
- Department of Microbiology and Molecular Genetics and School of Mechanical and Aerospace Engineering, Oklahoma State University, Stillwater, OK 74078, USA
|
138
|
Lin J, Qian J, Greenbaum D, Bertone P, Das R, Echols N, Senes A, Stenger B, Gerstein M. GeneCensus: genome comparisons in terms of metabolic pathway activity and protein family sharing. Nucleic Acids Res 2002; 30:4574-82. [PMID: 12384605 PMCID: PMC137121 DOI: 10.1093/nar/gkf555] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Received: 02/06/2002] [Revised: 08/08/2002] [Accepted: 08/08/2002] [Indexed: 11/15/2022] Open
Abstract
We present a prototype of a new database tool, GeneCensus, which focuses on comparing genomes globally, in terms of the collective properties of many genes, rather than in terms of the attributes of a single gene (e.g. sequence similarity for a particular ortholog). The comparisons are presented in a visual fashion over the web at GeneCensus.org. The system concentrates on two types of comparisons: (i) trees based on the sharing of generalized protein families between genomes, and (ii) whole pathway analysis in terms of activity levels. For the trees, we have developed a module (TreeViewer) that clusters genomes in terms of the folds, superfamilies or orthologs--all can be considered as generalized 'families' or 'protein parts'--they share, and compares the resulting trees side-by-side with those built from sequence similarity of individual genes (e.g. a traditional tree built on ribosomal similarity). We also include comparisons to trees built on whole-genome dinucleotide or codon composition. For pathway comparisons, we have implemented a module (PathwayPainter) that graphically depicts, in selected metabolic pathways, the fluxes or expression levels of the associated enzymes (i.e. generalized 'activities'). One can, consequently, compare organisms (and organism states) in terms of representations of these systemic quantities. Development of this module involved compiling, calculating and standardizing flux and expression information from many different sources. We illustrate pathway analysis for enzymes involved in central metabolism. We are able to show that, to some degree, flux and expression fluctuations have characteristic values in different sections of the central metabolism and that control points in this system (e.g. hexokinase, pyruvate kinase, phosphofructokinase, isocitrate dehydrogenase and citric synthase) tend to be especially variable in flux and expression.
Both the TreeViewer and PathwayPainter modules connect to other information sources related to individual-gene or organism properties (e.g. a single-gene structural annotation viewer).
Affiliation(s)
- J Lin
- Department of Molecular Biophysics and Biochemistry, Yale University, PO Box 208114, New Haven, CT 06520, USA
|
139
|
Drepper T, Raabe K, Giaourakis D, Gendrullis M, Masepohl B, Klipp W. The Hfq-like protein NrfA of the phototrophic purple bacterium Rhodobacter capsulatus controls nitrogen fixation via regulation of nifA and anfA expression. FEMS Microbiol Lett 2002; 215:221-7. [PMID: 12399038 DOI: 10.1111/j.1574-6968.2002.tb11394.x] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.1] [Indexed: 11/29/2022] Open
Abstract
The Rhodobacter capsulatus nrfA gene product exhibits extensive similarity to the nif (nitrogen fixation) regulatory factor NrfA of Azorhizobium caulinodans and the nucleoid-associated protein Hfq of Escherichia coli. Mutational analysis revealed that, in contrast to the situation in A. caulinodans, NrfA is not essential for diazotrophic growth of R. capsulatus, but it is required for maximal growth rates with N(2) as sole nitrogen source via either molybdenum nitrogenase or the alternative nitrogenase. NrfA was shown to control N(2) fixation in R. capsulatus at the level of expression of the regulatory genes nifA1, nifA2 and anfA, encoding the transcriptional activators of all the other nitrogen fixation genes.
Affiliation(s)
- Thomas Drepper
- Ruhr-Universität Bochum, Fakultät für Biologie, Lehrstuhl für Biologie der Mikroorganismen, Germany
|
140
|
Rodionov DA, Mironov AA, Gelfand MS. Conservation of the biotin regulon and the BirA regulatory signal in Eubacteria and Archaea. Genome Res 2002; 12:1507-16. [PMID: 12368242 PMCID: PMC187538 DOI: 10.1101/gr.314502] [Citation(s) in RCA: 159] [Impact Index Per Article: 7.2] [Indexed: 11/25/2022]
Abstract
Biotin is a necessary cofactor of numerous biotin-dependent carboxylases in a variety of microorganisms. The strict control of biotin biosynthesis in Escherichia coli is mediated by the bifunctional BirA protein, which acts both as a biotin-protein ligase and as a transcriptional repressor of the biotin operon. Little is known about regulation of biotin biosynthesis in other bacteria. Using comparative genomics and phylogenetic analysis, we describe the biotin biosynthetic pathway and the BirA regulon in most available bacterial genomes. Existence of an N-terminal DNA-binding domain in BirA strictly correlates with the presence of putative BirA-binding sites upstream of biotin operons. The predicted BirA-binding sites are well conserved among various eubacterial and archaeal genomes. The possible role of the hypothetical genes bioY and yhfS-yhfT, newly identified members of the BirA regulon, in the biotin metabolism is discussed. Based on analysis of co-occurrence of the biotin biosynthetic genes and bioY in complete genomes, we predict involvement of the transmembrane protein BioY in biotin transport. Various nonorthologous substitutes of the bioC-coupled gene bioH from E. coli, observed in several genomes, possibly represent the existence of different pathways for pimeloyl-CoA biosynthesis. Another interesting result of analysis of operon structures and BirA sites is that some biotin-dependent carboxylases from Rhodobacter capsulatus, actinomycetes, and archaea are possibly coregulated with BirA. BirA is the first example of a transcriptional regulator with a conserved binding signal in eubacteria and archaea.
|
141
|
Forst CV. Network genomics--a novel approach for the analysis of biological systems in the post-genomic era. Mol Biol Rep 2002; 29:265-80. [PMID: 12463419 DOI: 10.1023/a:1020437311167] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022]
Abstract
Network Genomics studies the genomic and proteomic foundations of cellular networks in biological systems. It complements systems biology in providing information on elements, their interaction and their functional interplay in cellular networks. The relationship between genomic and proteomic high-throughput technologies and computational methods is described, and several examples of specific network genomic applications are presented.
Affiliation(s)
- Christian V Forst
- Bioscience Division, Mailstop M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
|
142
|
Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabási AL. Hierarchical organization of modularity in metabolic networks. Science 2002; 297:1551-5. [PMID: 12202830 DOI: 10.1126/science.1073374] [Citation(s) in RCA: 1969] [Impact Index Per Article: 89.5] [Indexed: 11/02/2022]
Abstract
Spatially or chemically isolated functional modules composed of several cellular components and carrying discrete functions are considered fundamental building blocks of cellular organization, but their presence in highly integrated biochemical networks lacks quantitative support. Here, we show that the metabolic networks of 43 distinct organisms are organized into many small, highly connected topologic modules that combine in a hierarchical manner into larger, less cohesive units, with their number and degree of clustering following a power law. Within Escherichia coli, the uncovered hierarchical modularity closely overlaps with known metabolic functions. The identified network architecture may be generic to system-level cellular organization.
Affiliation(s)
- E Ravasz
- Department of Physics, University of Notre Dame, Notre Dame, IN 46556, USA
|
143
|
Teplyakov A, Obmolova G, Tordova M, Thanki N, Bonander N, Eisenstein E, Howard AJ, Gilliland GL. Crystal structure of the YjeE protein from Haemophilus influenzae: a putative ATPase involved in cell wall synthesis. Proteins 2002; 48:220-6. [PMID: 12112691 DOI: 10.1002/prot.10114] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.5] [Indexed: 11/11/2022]
Abstract
A hypothetical protein encoded by the gene YjeE of Haemophilus influenzae was selected as part of a structural genomics project for X-ray analysis to assist with the functional assignment. The protein is considered essential to bacteria because the gene is present in virtually all bacterial genomes but not in those of archaea or eukaryotes. The amino acid sequence shows no homology to other proteins except for the presence of the Walker A motif G-X-X-X-X-G-K-T that indicates the possibility of a nucleotide-binding protein. The YjeE protein was cloned, expressed, and the crystal structure determined by the MAD method at 1.7-Å resolution. The protein has a nucleotide-binding fold with a four-stranded parallel beta-sheet flanked by antiparallel beta-strands on each side. The topology of the beta-sheet is unique among P-loop proteins and has features of different families of enzymes. Crystallization of YjeE in the presence of ATP and Mg2+ resulted in the structure with ADP bound in the P-loop. The ATPase activity of YjeE was confirmed by kinetic measurements. The distribution of conserved residues suggests that the protein may work as a "molecular switch" triggered by ATP hydrolysis. The phylogenetic pattern of YjeE suggests its involvement in cell wall biosynthesis.
Affiliation(s)
- Alexey Teplyakov
- Center for Advanced Research in Biotechnology of the University of Maryland Biotechnology Institute, Rockville, Maryland 20850, USA.
|
144
|
Gerdes SY, Scholle MD, D'Souza M, Bernal A, Baev MV, Farrell M, Kurnasov OV, Daugherty MD, Mseeh F, Polanuyer BM, Campbell JW, Anantha S, Shatalin KY, Chowdhury SAK, Fonstein MY, Osterman AL. From genetic footprinting to antimicrobial drug targets: examples in cofactor biosynthetic pathways. J Bacteriol 2002; 184:4555-72. [PMID: 12142426 PMCID: PMC135229 DOI: 10.1128/jb.184.16.4555-4572.2002] [Citation(s) in RCA: 222] [Impact Index Per Article: 10.1] [Indexed: 11/20/2022] Open
Abstract
Novel drug targets are required in order to design new defenses against antibiotic-resistant pathogens. Comparative genomics provides new opportunities for finding optimal targets among previously unexplored cellular functions, based on an understanding of related biological processes in bacterial pathogens and their hosts. We describe an integrated approach to identification and prioritization of broad-spectrum drug targets. Our strategy is based on genetic footprinting in Escherichia coli followed by metabolic context analysis of essential gene orthologs in various species. Genes required for viability of E. coli in rich medium were identified on a whole-genome scale using the genetic footprinting technique. Potential target pathways were deduced from these data and compared with a panel of representative bacterial pathogens by using metabolic reconstructions from genomic data. Conserved and indispensable functions revealed by this analysis potentially represent broad-spectrum antibacterial targets. Further target prioritization involves comparison of the corresponding pathways and individual functions between pathogens and the human host. The most promising targets are validated by direct knockouts in model pathogens. The efficacy of this approach is illustrated using examples from metabolism of adenylate cofactors NAD(P), coenzyme A, and flavin adenine dinucleotide. Several drug targets within these pathways, including three distantly related adenylyltransferases (orthologs of the E. coli genes nadD, coaD, and ribF), are discussed in detail.
|
145
|
Abstract
Several models have been proposed to explain the origin and evolution of enzymes in metabolic pathways. Initially, the retro-evolution model proposed that, as enzymes at the end of pathways depleted their substrates in the primordial soup, there was a pressure for earlier enzymes in pathways to be created, using the later ones as initial template, in order to replenish the pools of depleted metabolites. Later, the recruitment model proposed that initial templates from other pathways could be used as long as those enzymes were similar in chemistry or substrate specificity. These two models have dominated recent studies of enzyme evolution. These studies are constrained by either the small scale of the study or the artificial restrictions imposed by pathway definitions. Here, a network approach is used to study enzyme evolution in fully sequenced genomes, thus removing both constraints. We find that homologous pairs of enzymes are roughly twice as likely to have evolved from enzymes that are less than three steps away from each other in the reaction network than pairs of non-homologous enzymes. These results, together with the conservation of the type of chemical reaction catalyzed by evolutionarily related enzymes, suggest that functional blocks of similar chemistry have evolved within metabolic networks. One possible explanation for these observations is that this local evolution phenomenon is likely to cause less global physiological disruptions in metabolism than evolution of enzymes from other enzymes that are distant from them in the metabolic network.
Affiliation(s)
- Rui Alves
- Department of Biological Sciences, Structural Bioinformatics Group, Biochemistry Building, Imperial College of Science, Technology and Medicine, London SW7 2AZ, UK
|
146
|
Vitreschak AG, Rodionov DA, Mironov AA, Gelfand MS. Regulation of riboflavin biosynthesis and transport genes in bacteria by transcriptional and translational attenuation. Nucleic Acids Res 2002; 30:3141-51. [PMID: 12136096 PMCID: PMC135753 DOI: 10.1093/nar/gkf433] [Citation(s) in RCA: 242] [Impact Index Per Article: 11.0] [Indexed: 11/13/2022] Open
Abstract
The riboflavin biosynthesis in bacteria was analyzed using comparative analysis of genes, operons and regulatory elements. A model for regulation based on formation of alternative RNA structures involving the RFN elements is suggested. In Gram-positive bacteria including actinomycetes, Thermotoga, Thermus and Deinococcus, the riboflavin metabolism and transport genes are predicted to be regulated by transcriptional attenuation, whereas in most Gram-negative bacteria, the riboflavin biosynthesis genes seem to be regulated on the level of translation initiation. Several new candidate riboflavin transporters were identified (impX in Desulfitobacterium hafniense and Fusobacterium nucleatum; pnuX in several actinomycetes, including some Corynebacterium species and Streptomyces coelicolor; rfnT in Rhizobiaceae). Traces of a number of likely horizontal transfer events were found: the complete riboflavin operon with the upstream regulatory element was transferred to Haemophilus influenzae and Actinobacillus pleuropneumoniae from some Gram-positive bacterium; non-regulated riboflavin operon in Pyrococcus furiosus was likely transferred from Thermotoga; and the RFN element was inserted into the riboflavin operon of Pseudomonas aeruginosa from some other Pseudomonas species, where it had regulated the ribH2 gene.
|
147
|
Abstract
Small-molecule metabolism forms the core of the metabolic processes of all living organisms. As early as 1945, possible mechanisms for the evolution of such a complex metabolic system were considered. The problem is to explain the appearance and development of a highly regulated complex network of interacting proteins and substrates from a limited structural and functional repertoire. By permitting the co-analysis of phylogeny and metabolism, the combined exploitation of pathway and structural databases, as well as the use of multiple-sequence alignment search algorithms, sheds light on this problem. Much of the current research suggests a chemistry-driven 'patchwork' model of pathway evolution, but other mechanisms may play a role. In the future, as metabolic structure and sequence space are further explored, it should become easier to trace the finer details of pathway development and understand how complexity has evolved.
Affiliation(s)
- Stuart C G Rison
- Department of Biochemistry and Molecular Biology, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK
|
148
|
Jardine O, Gough J, Chothia C, Teichmann SA. Comparison of the small molecule metabolic enzymes of Escherichia coli and Saccharomyces cerevisiae. Genome Res 2002; 12:916-29. [PMID: 12045145 PMCID: PMC313875 DOI: 10.1101/gr.228002] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.9] [Indexed: 11/25/2022]
Abstract
The comparison of the small molecule metabolism pathways in Escherichia coli and Saccharomyces cerevisiae (yeast) shows that 271 enzymes are common to both organisms. These common enzymes involve 384 gene products in E. coli and 390 in yeast, which are between one half and two thirds of the gene products of small molecule metabolism in E. coli and yeast, respectively. The arrangement and family membership of the domains that form all or part of 374 E. coli sequences and 343 yeast sequences was determined. Of these, 70% consist entirely of homologous domains, and 20% have homologous domains linked to other domains that are unique to E. coli, yeast, or both. Over two thirds of the enzymes common to the two organisms have sequence identities between 30% and 50%. The remaining groups include 13 clear cases of nonorthologous displacement. Our calculations show that at most one half to two thirds of the gene products involved in small molecule metabolism are common to E. coli and yeast. We have shown that the common core of 271 enzymes has been largely conserved since the separation of prokaryotes and eukaryotes, including modifications for regulatory purposes, such as gene fusion and changes in the number of isozymes in one of the two organisms. Only one fifth of the common enzymes have nonhomologous domains between the two organisms. Around the common core very different extensions have been made to small molecule metabolism in the two organisms.
Affiliation(s)
- Oliver Jardine
- Department of Crystallography, Birkbeck College, London WC1E 7HX, United Kingdom
|
149
|
Rison SCG, Teichmann SA, Thornton JM. Homology, pathway distance and chromosomal localization of the small molecule metabolism enzymes in Escherichia coli. J Mol Biol 2002; 318:911-32. [PMID: 12054833 DOI: 10.1016/s0022-2836(02)00140-7] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.0] [Indexed: 11/22/2022]
Abstract
Here, we analyse Escherichia coli enzymes involved in small molecule metabolism (SMM). We introduce the concept of pathway distance as a measure of the number of distinct metabolic steps separating two SMM enzymes, and we consider protein homology (as determined by assigning enzymes to structural and sequence families) and gene interval (the number of genes separating two genes on the E. coli chromosome). The relationships between these three contexts (pathway distance, homology and chromosomal localisation) are investigated extensively. We make use of these relationships to suggest possible SMM evolution mechanisms. Homology between enzyme pairs close in the SMM was higher than expected by chance but was still rare. When observed, homologues usually conserved their reaction mechanism and/or co-factor binding rather than shared substrate binding. The correlation between pathway distance and gene intervals was clear. Enzymes catalysing nearby SMM reactions were usually encoded by genes close by on the E. coli chromosome. We found many co-regulated blocks of three to four genes (usually non-homologous) encoding enzymes occurring within four metabolic steps of one another; nearly all of these blocks formed part of known or predicted operons. The "inline reuse" of enzymes (i.e. the use of the same enzyme to catalyse two or more different steps of a metabolic pathway) is also discussed: of these enzymes, four were multifunctional (i.e. catalysed a different reaction in each instance), nine had multiple substrate specificity (i.e. catalysed the same reaction on different substrates in each instance) and one catalysed the same reaction on the same substrate but as part of two different complexes. We also identified 59 sets of isozymic proteins most commonly duplicated to function under different conditions, or with a different preferred substrate or minor substrate.
In addition to transcriptional units, isozymes and inline reuse of enzymes provide mechanisms for controlling the SMM network. Our data suggest that several pathway evolution mechanisms may occur in concert, although chemistry-driven duplication/recruitment is favoured. SMM exploits regulatory strategies involving chromosomal location, isozymes and the reuse of enzymes.
Affiliation(s)
- Stuart C G Rison
- Department of Biochemistry and Molecular Biology, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK
|
150
|
Jones KM, Haselkorn R. Newly identified cytochrome c oxidase operon in the nitrogen-fixing cyanobacterium Anabaena sp. strain PCC 7120 specifically induced in heterocysts. J Bacteriol 2002; 184:2491-9. [PMID: 11948164 PMCID: PMC134978 DOI: 10.1128/jb.184.9.2491-2499.2002] [Citation(s) in RCA: 39] [Impact Index Per Article: 1.8] [Indexed: 11/20/2022] Open
Abstract
Two operons have been cloned from Anabaena sp. strain PCC 7120 DNA, each of which encodes the three core subunits of distinct mitochondrial-type cytochrome c oxidases. The two operons are only 72 to 85% similar to one another at the nucleotide level in the most conserved subunit. One of these, coxBACII, is induced >20-fold in the middle to late stages of heterocyst differentiation. Analysis of green fluorescent protein reporters indicates that this operon is expressed specifically in proheterocysts and heterocysts. The other operon, coxBACI, is induced only 2.5-fold following nitrogen step-down and is expressed in all cells. Surprisingly, a disruption mutant of coxAII, the gene encoding subunit I of the heterocyst-specific oxidase, grows normally in the absence of combined nitrogen. It is likely that coxBACI and/or two other putative terminal oxidases present in the Anabaena sp. strain PCC 7120 genome are able to compensate for the loss of the heterocyst-specific oxidase in providing ATP for nitrogen fixation and maintaining a low oxygen level in heterocysts.
Affiliation(s)
- Kathryn M Jones
- Department of Molecular Genetics and Cell Biology, University of Chicago, Chicago, Illinois 60637, USA.
|