151
Price ND, Papin JA, Palsson BØ. Determination of redundancy and systems properties of the metabolic network of Helicobacter pylori using genome-scale extreme pathway analysis. Genome Res 2002; 12:760-9. [PMID: 11997342 PMCID: PMC186586 DOI: 10.1101/gr.218002] [Citation(s) in RCA: 63] [Impact Index Per Article: 2.9] [Indexed: 11/25/2022]
Abstract
The capabilities of genome-scale metabolic networks can be described through the determination of a set of systemically independent and unique flux maps called extreme pathways. The first study of genome-scale extreme pathways for the simultaneous formation of all nonessential amino acids or ribonucleotides in Helicobacter pylori is presented. Three key results were obtained. First, the extreme pathways for the production of individual amino acids in H. pylori showed far fewer internal states per external state than previously found in Haemophilus influenzae, indicating a more rigid metabolic network. Second, the degree of pathway redundancy in H. pylori was essentially the same for the production of individual amino acids and linked amino acid sets, but was approximately twice that of the production of the ribonucleotides. Third, the metabolic network of H. pylori was unable to achieve extensive conversion of amino acids consumed to the set of either nonessential amino acids or ribonucleotides and thus diverted a large portion of its nitrogen to ammonia production, a potentially important result for pH regulation in its acidic habitat. Genome-scale extreme pathways elucidate emergent system-wide properties. Extreme pathway analysis is emerging as a potentially important method to analyze the link between the metabolic genotype and its phenotypes.
Affiliation(s)
- Nathan D Price
- Department of Bioengineering, University of California at San Diego, La Jolla, California 92093, USA
152
Akashi H, Gojobori T. Metabolic efficiency and amino acid composition in the proteomes of Escherichia coli and Bacillus subtilis. Proc Natl Acad Sci U S A 2002; 99:3695-700. [PMID: 11904428 PMCID: PMC122586 DOI: 10.1073/pnas.062526999] [Citation(s) in RCA: 456] [Impact Index Per Article: 20.7] [Received: 10/04/2001] [Indexed: 01/11/2023] Open
Abstract
Biosynthesis of an Escherichia coli cell, with organic compounds as sources of energy and carbon, requires approximately 20 to 60 billion high-energy phosphate bonds [Stouthamer, A. H. (1973) Antonie van Leeuwenhoek 39, 545-565]. A substantial fraction of this energy budget is devoted to biosynthesis of amino acids, the building blocks of proteins. The fueling reactions of central metabolism provide precursor metabolites for synthesis of the 20 amino acids incorporated into proteins. Thus, synthesis of an amino acid entails a dual cost: energy is lost by diverting chemical intermediates from fueling reactions and additional energy is required to convert precursor metabolites to amino acids. Among amino acids, costs of synthesis vary from 12 to 74 high-energy phosphate bonds per molecule. The energetic advantage to encoding a less costly amino acid in a highly expressed gene can be greater than 0.025% of the total energy budget. Here, we provide evidence that amino acid composition in the proteomes of E. coli and Bacillus subtilis reflects the action of natural selection to enhance metabolic efficiency. We employ synonymous codon usage bias as a measure of translation rates and show increases in the abundance of less energetically costly amino acids in highly expressed proteins.
Affiliation(s)
- Hiroshi Akashi
- Institute of Molecular Evolutionary Genetics and Department of Biology, 208 Mueller Laboratory, Pennsylvania State University, University Park, PA 16802, USA.
153
Papin JA, Price ND, Edwards JS, Palsson BØ. The genome-scale metabolic extreme pathway structure in Haemophilus influenzae shows significant network redundancy. J Theor Biol 2002; 215:67-82. [PMID: 12051985 DOI: 10.1006/jtbi.2001.2499] [Citation(s) in RCA: 96] [Impact Index Per Article: 4.4] [Indexed: 11/22/2022]
Abstract
Genome-scale metabolic networks can be characterized by a set of systemically independent and unique extreme pathways. These extreme pathways span a convex, high-dimensional space that circumscribes all potential steady-state flux distributions achievable by the defined metabolic network. Genome-scale extreme pathways associated with the production of non-essential amino acids in Haemophilus influenzae were computed. They offer valuable insight into the functioning of its metabolic network. Three key results were obtained. First, there were multiple internal flux maps corresponding to externally indistinguishable states. It was shown that there was an average of 37 internal states per unique exchange flux vector in H. influenzae when the network was used to produce a single amino acid while allowing carbon dioxide and acetate as carbon sinks. With the inclusion of succinate as an additional output, this ratio increased to 52, a 40% increase. Second, an analysis of the carbon fates illustrated that the extreme pathways were non-uniformly distributed across the carbon fate spectrum. In the detailed case study, 45% of the distinct carbon fate values associated with lysine production represented 85% of the extreme pathways. Third, this distribution fell between distinct systemic constraints. For lysine production, the carbon fate values that represented 85% of the pathways described above corresponded to only 2 distinct ratios of 1:1 and 4:1 between carbon dioxide and acetate. The present study analysed single outputs from one organism, and provides a start to genome-scale extreme pathways studies. These emergent system-level characterizations show the significance of metabolic extreme pathway analysis at the genome-scale.
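As a rough illustration of the "internal states per unique exchange flux vector" ratio reported in this abstract, the sketch below groups a handful of made-up pathway vectors by their exchange-flux components and divides the pathway count by the number of distinct groups. The toy vectors, the split between internal and exchange flux positions, and the function name are all assumptions for illustration; none of them come from the H. influenzae model, and real extreme pathways would first need consistent scaling before their exchange fluxes could be compared.

```python
from collections import defaultdict

def internal_states_per_external_state(pathways, exchange_idx):
    """Average number of pathways that share one exchange-flux signature.

    pathways: list of flux vectors (tuples of numbers).
    exchange_idx: positions in each vector that hold exchange fluxes (assumed).
    """
    groups = defaultdict(list)
    for p in pathways:
        signature = tuple(p[i] for i in exchange_idx)
        groups[signature].append(p)
    return len(pathways) / len(groups)

# Hypothetical toy data: positions 0-2 are internal fluxes, 3-4 are exchange fluxes.
toy_pathways = [
    (1, 0, 1, 2, 1),
    (0, 1, 1, 2, 1),  # same exchange fluxes as the first, different internals
    (1, 1, 0, 2, 1),  # same exchange fluxes again
    (1, 0, 0, 1, 0),  # a second, distinct exchange signature
]
ratio = internal_states_per_external_state(toy_pathways, exchange_idx=(3, 4))
print(ratio)  # 4 pathways over 2 unique exchange signatures

# Relative change between the two ratios quoted in the abstract (37 and 52).
increase = (52 - 37) / 37
print(f"{increase:.3f}")  # roughly 0.405, in line with the ~40% stated
```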
Affiliation(s)
- Jason A Papin
- Department of Bioengineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0412, USA
154
Abstract
A number of technological innovations are yielding unprecedented data on the networks of biochemical, genetic, and biophysical reactions that underlie cellular behavior and failure. These networks are composed of hundreds to thousands of chemical species and structures, interacting via nonlinear and possibly stochastic physical processes. A central goal of modern biology is to optimally use the data on these networks to understand how their design leads to the observed cellular behaviors and failures. Ultimately, this knowledge should enable cellular engineers to redesign cellular processes to meet industrial needs (such as optimal natural product synthesis), aid in choosing the most effective targets for pharmaceuticals, and tailor treatment for individual genotypes. The size and complexity of these networks and the inevitable lack of complete data, however, make reaching these goals extremely difficult. If it proves possible to modularize these networks into functional subnetworks, then these smaller networks may be amenable to direct analysis and might serve as regulatory motifs. These motifs, recurring elements of control, may help to deduce the structure and function of partially known networks and form the basis for fulfilling the goals described above. A number of approaches to identifying and analyzing control motifs in intracellular networks are reviewed.
Affiliation(s)
- C V Rao
- Department of Bioengineering, University of California, Berkeley, CA 94720, USA.
155
Abstract
As the number of completed genome sequences increases, there is increasing emphasis on comparative genomic analysis of closely related organisms. Comparison of the similarities and differences between the five publicly available Salmonella genome sequences reveals extensive sequence conservation among the Salmonella serovars. However, horizontal gene transfer has provided each genome with between 10% and 12% of unique DNA. Genome comparisons of the closely related salmonellae emphasize the insights that can be gleaned from sequencing genomes of a single species.
Affiliation(s)
- Robert A Edwards
- University of Tennessee Health Sciences Center, MSB 101 858 Madison Ave, Memphis, TN 38163, USA.
156
Wolf YI, Karev G, Koonin EV. Scale-free networks in biology: new insights into the fundamentals of evolution? Bioessays 2002; 24:105-9. [PMID: 11835273 DOI: 10.1002/bies.10059] [Citation(s) in RCA: 74] [Impact Index Per Article: 3.4] [Indexed: 11/06/2022]
Abstract
Scale-free network models describe many natural and social phenomena. In particular, networks of interacting components of a living cell were shown to possess scale-free properties. A recent study (1) compares the system-level properties of metabolic and information networks in 43 archaeal, bacterial and eukaryal species and claims that the scale-free organization of these networks is more conserved during evolution than their content.
Affiliation(s)
- Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
157
Edwards JS, Ramakrishna R, Palsson BO. Characterizing the metabolic phenotype: a phenotype phase plane analysis. Biotechnol Bioeng 2002; 77:27-36. [PMID: 11745171 DOI: 10.1002/bit.10047] [Citation(s) in RCA: 107] [Impact Index Per Article: 4.9] [Indexed: 11/07/2022]
Abstract
Genome-scale metabolic maps can be reconstructed from annotated genome sequence data, biochemical literature, bioinformatic analysis, and strain-specific information. Flux-balance analysis has been useful for qualitative and quantitative analysis of metabolic reconstructions. In the past, FBA has typically been performed in one growth condition at a time, thus giving a limited view of the metabolic capabilities of a metabolic network. We have broadened the use of FBA to map the optimal metabolic flux distribution onto a single plane, which is defined by the availability of two key substrates. A finite number of qualitatively distinct patterns of metabolic pathway utilization were identified in this plane, dividing it into discrete phases. The characteristics of these distinct phases are interpreted using ratios of shadow prices in the form of isoclines. The isoclines can be used to classify the state of the metabolic network. This methodology gives rise to a "phase plane" analysis of the metabolic genotype-phenotype relation relevant for a range of growth conditions. Phenotype phase planes (PhPPs) were generated for Escherichia coli growth on two carbon sources (acetate and glucose) at all levels of oxygenation, and the resulting optimal metabolic phenotypes were studied. Supplementary information can be downloaded from our website (http://epicurus.che.udel.edu).
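To make the phase-plane idea in this abstract concrete, here is a minimal sketch on a two-pathway toy network, not the E. coli model from the paper. It assumes a "respiration" route (1 carbon + 2 oxygen gives 3 biomass units) and a "fermentation" route (1 carbon gives 1 biomass unit); these yields, the function names, and the finite-difference shadow prices are illustrative assumptions. For a linear program this small the optimum has a closed form, so no LP solver is used.

```python
def optimal_fluxes(carbon_max, oxygen_max):
    """Maximize 3*r + f subject to r + f <= carbon_max, 2*r <= oxygen_max, r, f >= 0.

    Respiration (r) yields more biomass per carbon, so the optimum uses as much
    of it as oxygen allows and sends the remaining carbon to fermentation (f).
    """
    r = min(carbon_max, oxygen_max / 2.0)
    f = carbon_max - r
    return r, f, 3.0 * r + f

def shadow_prices(carbon_max, oxygen_max, eps=1e-6):
    """Finite-difference estimate of d(growth)/d(uptake limit) for each substrate."""
    base = optimal_fluxes(carbon_max, oxygen_max)[2]
    d_carbon = (optimal_fluxes(carbon_max + eps, oxygen_max)[2] - base) / eps
    d_oxygen = (optimal_fluxes(carbon_max, oxygen_max + eps)[2] - base) / eps
    return d_carbon, d_oxygen

def phase(carbon_max, oxygen_max, tol=1e-9):
    """Label a point of the plane by which pathways carry flux at the optimum."""
    r, f, _ = optimal_fluxes(carbon_max, oxygen_max)
    if r <= tol and f <= tol:
        return "no growth"
    if r > tol and f > tol:
        return "respiration + fermentation"
    if f > tol:
        return "fermentation only"
    return "respiration only"

# Scan a coarse grid of the two uptake limits and print the phase at each point.
for oxygen in (0.0, 1.0, 2.0, 4.0):
    row = [phase(carbon, oxygen) for carbon in (1.0, 2.0)]
    print(oxygen, row)
```

In this toy model the two shadow prices stay constant inside each phase and change only at the boundary where oxygen stops being limiting, which is the sense in which shadow-price ratios classify regions of the plane.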
Affiliation(s)
- Jeremy S Edwards
- Department of Chemical Engineering, University of Delaware, Newark, Delaware 19716, USA.
158
159
Lee JM, Zhang S, Saha S, Santa Anna S, Jiang C, Perkins J. RNA expression analysis using an antisense Bacillus subtilis genome array. J Bacteriol 2001; 183:7371-80. [PMID: 11717296 PMCID: PMC95586 DOI: 10.1128/jb.183.24.7371-7380.2001] [Citation(s) in RCA: 84] [Impact Index Per Article: 3.7] [Indexed: 11/20/2022] Open
Abstract
We have developed an antisense oligonucleotide microarray for the study of gene expression and regulation in Bacillus subtilis by using Affymetrix technology. Quality control tests of the B. subtilis GeneChip were performed to ascertain the quality of the array. These tests included optimization of the labeling and hybridization conditions, determination of the linear dynamic range of gene expression levels, and assessment of differential gene expression patterns of known vitamin biosynthetic genes. In minimal medium, we detected transcripts for approximately 70% of the known open reading frames (ORFs). In addition, we were able to monitor the transcript level of known biosynthetic genes regulated by riboflavin, biotin, or thiamine. Moreover, novel transcripts were also detected within intergenic regions and on the opposite coding strand of known ORFs. Several of these novel transcripts were subsequently correlated to new coding regions.
Affiliation(s)
- J M Lee
- Roche Vitamins Inc., Nutley, New Jersey 07110, USA
160
Masepohl B, Führer F, Klipp W. Genetic analysis of a Rhodobacter capsulatus gene region involved in utilization of taurine as a sulfur source. FEMS Microbiol Lett 2001; 205:105-11. [PMID: 11728723 DOI: 10.1111/j.1574-6968.2001.tb10932.x] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.6] [Indexed: 11/26/2022] Open
Abstract
Rhodobacter capsulatus was shown to grow efficiently with taurine as sole source of sulfur. We identified a gene region exhibiting similarity to the Escherichia coli tauABC genes coding for a taurine-specific ABC transporter. The R. capsulatus tauABC genes were flanked by two putative operons (orf459-484-590 and cysE-srpI-nifS2) both reading in opposite direction relative to tauABC. Orf459 shows strong similarity to taurine:pyruvate aminotransferase (Tpa) from Bilophila wadsworthia catalyzing the initial transamination during anaerobic taurine degradation, and Orf590 exhibits clear similarity to sulfoacetaldehyde sulfo-lyase from Desulfonispora thiosulfatigenes probably catalyzing the step following the taurine:pyruvate aminotransferase (Tpa) reaction, whereas nifS2 might code for a putative cysteine desulfurase. Expression of R. capsulatus tauABC and nifS2 was inhibited by sulfate, suggesting that tauABC and nifS2 might belong to the same regulon. In contrast, transcription of orf459 was not inhibited by sulfate but was induced by taurine. A tauAB deletion mutant showed significantly reduced growth compared to the wild-type with taurine as sole sulfur source in the presence of serine as a nitrogen source, whereas normal growth was observed in the presence of taurine and ammonium. Deletion of orf459-484-590 completely abolished growth with taurine/serine. Single mutations in any of the three genes resulted in the same phenotype, indicating that all three genes of this putative operon are essential for taurine sulfur utilization in the presence of serine. A model for anaerobic taurine sulfur assimilation in R. capsulatus is discussed.
Affiliation(s)
- B Masepohl
- Ruhr-Universität Bochum, Fakultät für Biologie, Lehrstuhl für Biologie der Mikroorganismen, D-44780 Bochum, Germany
161
Abstract
A structural survey of the Escherichia coli proteins occurring in metabolic networks in the KEGG database (release 19 of LIGAND) has been carried out. A measure of structural coverage of a network is defined and calculated for each network. Twenty-four networks have 50 % or more of the enzyme steps assigned in E. coli and of these 21 have a structural coverage of 50 % or more. For those proteins that have a region matching a SCOP domain 50 % fall on or below the 30 % sequence identity threshold and represent non-trivial comparative modelling targets highlighting the need for experimental structure determination studies. The survey reveals the predominance of alpha/beta and alpha+beta folds for enzymes involved in metabolic pathways and that this general trend is maintained at the level of each pathway. The most popular superfamilies are coenzyme binding domains and are involved in the supply of energy to reactions. Although a few superfamilies are found in many pathways, in general there is a specificity of a particular superfamily for a particular pathway.
Affiliation(s)
- M A Saqi
- Biomolecular Modelling Laboratory, Imperial Cancer Research Fund, 44 Lincoln's Inn Fields, London WC2A 3PX, UK
162
Osipiuk J, Górnicki P, Maj L, Dementieva I, Laskowski R, Joachimiak A. Streptococcus pneumonia YlxR at 1.35 Å shows a putative new fold. Acta Crystallogr D Biol Crystallogr 2001; 57:1747-51. [PMID: 11679764 PMCID: PMC2792016 DOI: 10.1107/s0907444901014019] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.8] [Received: 05/01/2001] [Accepted: 08/23/2001] [Indexed: 11/10/2022]
Abstract
The structure of the YlxR protein of unknown function from Streptococcus pneumonia was determined to 1.35 Å. YlxR is expressed from the nusA/infB operon in bacteria and belongs to a small protein family (COG2740) that shares a conserved sequence motif GRGA(Y/W). The family shows no significant amino-acid sequence similarity with other proteins. Three-wavelength diffraction MAD data were collected to 1.7 Å from orthorhombic crystals using synchrotron radiation and the structure was determined using a semi-automated approach. The YlxR structure resembles a two-layer alpha/beta sandwich with the overall shape of a cylinder and shows no structural homology to proteins of known structure. Structural analysis revealed that the YlxR structure represents a new protein fold that belongs to the alpha-beta plait superfamily. The distribution of the electrostatic surface potential shows a large positively charged patch on one side of the protein, a feature often found in nucleic acid-binding proteins. Three sulfate ions bind to this positively charged surface. Analysis of potential binding sites uncovered several substantial clefts, with the largest spanning 3/4 of the protein. A similar distribution of binding sites and a large sharply bent cleft are observed in RNA-binding proteins that are unrelated in sequence and structure. It is proposed that YlxR is an RNA-binding protein.
Affiliation(s)
- Jerzy Osipiuk
- Argonne National Laboratory, Structural Biology Center, Biosciences Division, 9700 South Cass Avenue, Argonne, IL 60439, USA
- Piotr Górnicki
- Department of Molecular Genetics and Cell Biology, University of Chicago, 920 East 58th Street, Chicago, IL 60637, USA
- Luke Maj
- Argonne National Laboratory, Structural Biology Center, Biosciences Division, 9700 South Cass Avenue, Argonne, IL 60439, USA
- Irina Dementieva
- Argonne National Laboratory, Structural Biology Center, Biosciences Division, 9700 South Cass Avenue, Argonne, IL 60439, USA
- Roman Laskowski
- The Department of Crystallography, Birkbeck College, Malet Street, London WC1E 7HX, England
- Andrzej Joachimiak
- Argonne National Laboratory, Structural Biology Center, Biosciences Division, 9700 South Cass Avenue, Argonne, IL 60439, USA
163
Abstract
Post-genomics may be defined in different ways depending on how one views the challenges after the discovery of the genome. A traditional view is to follow the concept of the central dogma in molecular biology, namely from genome to transcriptome to proteome. Projects are ongoing to analyse gene expression profiles both at the mRNA and protein levels, and to catalogue protein 3D structure families, which will no doubt help the understanding of the information in the genome. However, once complete, such experimentally determined catalogues of genes, RNAs and proteins only tell us about the building blocks of life. They do not tell us much about how life operates as a system, such as higher order functional behaviours of the cell or the organism. Thus, an alternative view of post-genomics is to go up from the molecular level to the cellular level and eventually to still higher levels, i.e., the biological systems. Bioinformatics provides basic concepts as well as practical methods to integrate this view with the traditional view and to analyse complex interactions among building blocks and with dynamic environments.
Affiliation(s)
- M Kanehisa
- Bioinformatics Centre, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan.
164
Abstract
A pathway database (DB) is a DB that describes biochemical pathways, reactions, and enzymes. The EcoCyc pathway DB (see http://ecocyc.org) describes the metabolic, transport, and genetic-regulatory networks of Escherichia coli. EcoCyc is an example of a computational symbolic theory, which is a DB that structures a scientific theory within a formal ontology so that it is available for computational analysis. It is argued that by encoding scientific theories in symbolic form, we open new realms of analysis and understanding for theories that would otherwise be too large and complex for scientists to reason with effectively.
Affiliation(s)
- P D Karp
- Bioinformatics Research Group, SRI International, EK223, 333 Ravenswood Avenue, Menlo Park, CA 94025, USA.
165
Podani J, Oltvai ZN, Jeong H, Tombor B, Barabási AL, Szathmáry E. Comparable system-level organization of Archaea and Eukaryotes. Nat Genet 2001; 29:54-6. [PMID: 11528391 DOI: 10.1038/ng708] [Citation(s) in RCA: 57] [Impact Index Per Article: 2.5] [Indexed: 11/09/2022]
Abstract
A central and long-standing issue in evolutionary theory is the origin of the biological variation upon which natural selection acts. Some hypotheses suggest that evolutionary change represents an adaptation to the surrounding environment within the constraints of an organism's innate characteristics. Elucidation of the origin and evolutionary relationship of species has been complemented by nucleotide sequence and gene content analyses, with profound implications for recognizing life's major domains. Understanding of evolutionary relationships may be further expanded by comparing systemic higher-level organization among species. Here we employ multivariate analyses to evaluate the biochemical reaction pathways characterizing 43 species. Comparison of the information transfer pathways of Archaea and Eukaryotes indicates a close relationship between these domains. In addition, whereas eukaryotic metabolic enzymes are primarily of bacterial origin, the pathway-level organization of archaeal and eukaryotic metabolic networks is more closely related. Our analyses therefore suggest that during the symbiotic evolution of eukaryotes, incorporation of bacterial metabolic enzymes into the proto-archaeal proteome was constrained by the host's pre-existing metabolic architecture.
Affiliation(s)
- J Podani
- Institute for Advanced Study, Collegium Budapest, H-1014 Budapest, Hungary
166
Nölling J, Breton G, Omelchenko MV, Makarova KS, Zeng Q, Gibson R, Lee HM, Dubois J, Qiu D, Hitti J, Wolf YI, Tatusov RL, Sabathe F, Doucette-Stamm L, Soucaille P, Daly MJ, Bennett GN, Koonin EV, Smith DR. Genome sequence and comparative analysis of the solvent-producing bacterium Clostridium acetobutylicum. J Bacteriol 2001; 183:4823-38. [PMID: 11466286 PMCID: PMC99537 DOI: 10.1128/jb.183.16.4823-4838.2001] [Citation(s) in RCA: 636] [Impact Index Per Article: 27.7] [Indexed: 11/20/2022] Open
Abstract
The genome sequence of the solvent-producing bacterium Clostridium acetobutylicum ATCC 824 has been determined by the shotgun approach. The genome consists of a 3.94-Mb chromosome and a 192-kb megaplasmid that contains the majority of genes responsible for solvent production. Comparison of C. acetobutylicum to Bacillus subtilis reveals significant local conservation of gene order, which has not been seen in comparisons of other genomes with similar, or, in some cases closer, phylogenetic proximity. This conservation allows the prediction of many previously undetected operons in both bacteria. However, the C. acetobutylicum genome also contains a significant number of predicted operons that are shared with distantly related bacteria and archaea but not with B. subtilis. Phylogenetic analysis is compatible with the dissemination of such operons by horizontal transfer. The enzymes of the solventogenesis pathway and of the cellulosome of C. acetobutylicum comprise a new set of metabolic capacities not previously represented in the collection of complete genomes. These enzymes show a complex pattern of evolutionary affinities, emphasizing the role of lateral gene exchange in the evolution of the unique metabolic profile of the bacterium. Many of the sporulation genes identified in B. subtilis are missing in C. acetobutylicum, which suggests major differences in the sporulation process. Thus, comparative analysis reveals both significant conservation of the genome organization and pronounced differences in many systems that reflect unique adaptive strategies of the two gram-positive bacteria.
Affiliation(s)
- J Nölling
- GTC Sequencing Center, Genome Therapeutics Corporation, Waltham, Massachusetts 02453, USA
167
Wilkins JC, Homer KA, Beighton D. Altered protein expression of Streptococcus oralis cultured at low pH revealed by two-dimensional gel electrophoresis. Appl Environ Microbiol 2001; 67:3396-405. [PMID: 11472910 PMCID: PMC93034 DOI: 10.1128/aem.67.8.3396-3405.2001] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.4] [Indexed: 11/20/2022] Open
Abstract
Streptococcus oralis is the predominant aciduric nonmutans streptococcus isolated from the human dentition, but the role of this organism in the initiation and progression of dental caries has yet to be established. To identify proteins that are differentially expressed by S. oralis growing under conditions of low pH, soluble cellular proteins extracted from bacteria grown in batch culture at pH 5.2 or 7.0 were analyzed by two-dimensional (2-D) gel electrophoresis. Thirty-nine proteins had altered expression at low pH; these were excised, digested with trypsin using an in-gel protocol, and further analyzed by peptide mass fingerprinting using matrix-assisted laser desorption ionization mass spectrometry. The resulting fingerprints were compared with the genomic database for Streptococcus pneumoniae, an organism that is phylogenetically closely related to S. oralis, and putative functions for the majority of these proteins were determined on the basis of functional homology. Twenty-eight proteins were up-regulated following growth at pH 5.2; these included enzymes of the glycolytic pathway (glyceraldehyde-3-phosphate dehydrogenase and lactate dehydrogenase), the polypeptide chains comprising ATP synthase, and proteins that are considered to play a role in the general stress response of bacteria, including the 60-kDa chaperone, Hsp33, and superoxide dismutase, and three distinct ABC transporters. These data identify, for the first time, gene products that may be important in the survival and proliferation of nonmutans aciduric S. oralis under conditions of low pH that are likely to be encountered by this organism in vivo.
Affiliation(s)
- J C Wilkins
- Department of Oral Microbiology, GKT Dental Institute, King's College London, London, United Kingdom.
168
169
Orengo CA, Sillitoe I, Reeves G, Pearl FM. Review: what can structural classifications reveal about protein evolution? J Struct Biol 2001; 134:145-65. [PMID: 11551176 DOI: 10.1006/jsbi.2001.4398] [Citation(s) in RCA: 42] [Impact Index Per Article: 1.8] [Indexed: 11/22/2022]
Abstract
In this article we present a review of the methods used for comparing and classifying protein structures. We discuss the hierarchies and populations of fold groups and evolutionary families in some of the major classifications and we consider some of the problems confronting any general analyses of structural evolution in protein families. We also review some more recent analyses that have expanded these classifications by identifying sequence relatives in the genomes and thereby reveal interesting trends in fold usage and recurrence.
Affiliation(s)
- C A Orengo
- Department of Biochemistry and Molecular Biology, University College, Gower Street, London, WC1E 6BT, United Kingdom
170
Mishra P, Park PK, Drueckhammer DG. Identification of yacE (coaE) as the structural gene for dephosphocoenzyme A kinase in Escherichia coli K-12. J Bacteriol 2001; 183:2774-8. [PMID: 11292795 PMCID: PMC99492 DOI: 10.1128/jb.183.9.2774-2778.2001] [Citation(s) in RCA: 58] [Impact Index Per Article: 2.5] [Indexed: 11/20/2022] Open
Abstract
Dephosphocoenzyme A (dephospho-CoA) kinase catalyzes the final step in coenzyme A biosynthesis, the phosphorylation of the 3'-hydroxy group of the ribose sugar moiety. Wild-type dephospho-CoA kinase from Corynebacterium ammoniagenes was purified to homogeneity and subjected to N-terminal sequence analysis. A BLAST search identified a gene from Escherichia coli previously designated yacE encoding a highly homologous protein. Amplification of the gene and overexpression yielded recombinant dephospho-CoA kinase as a 22.6-kDa monomer. Enzyme assay and nuclear magnetic resonance analyses of the product demonstrated that the recombinant enzyme is indeed dephospho-CoA kinase. The activities with adenosine, AMP, and adenosine phosphosulfate were 4 to 8% of the activity with dephospho-CoA. Homologues of the E. coli dephospho-CoA kinase were identified in a diverse range of organisms.
Affiliation(s)
- P Mishra
- Department of Chemistry, State University at Stony Brook, New York 11794-3400, USA
171
Graham DE, Kyrpides N, Anderson IJ, Overbeek R, Whitman WB. Genome of Methanocaldococcus (Methanococcus) jannaschii. Methods Enzymol 2001; 330:40-123. [PMID: 11210518 DOI: 10.1016/s0076-6879(01)30370-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.7] [Indexed: 02/19/2023]
Affiliation(s)
- D E Graham
- Department of Biochemistry, Virginia Polytechnic Institute & State University, Blacksburg, Virginia 24061-0308, USA
172
Karzai AW, Sauer RT. Protein factors associated with the SsrA.SmpB tagging and ribosome rescue complex. Proc Natl Acad Sci U S A 2001; 98:3040-4. [PMID: 11248028 PMCID: PMC30603 DOI: 10.1073/pnas.051628298] [Citation(s) in RCA: 83] [Impact Index Per Article: 3.6] [Indexed: 11/18/2022] Open
Abstract
SsrA RNA acts as a tRNA and mRNA to modify proteins whose synthesis on ribosomes has stalled. Such proteins are marked for degradation by addition of peptide tags to their C termini in a reaction mediated by SsrA RNA and SmpB, a specific SsrA-RNA binding protein. Evidence is presented here for the existence of a larger ribonucleoprotein complex that contains ribosomal protein S1, phosphoribosyl pyrophosphate synthase, RNase R, and YfbG in addition to SsrA RNA and SmpB. Biochemical, genetic, and phylogenetic results suggest potential roles for some of these factors in various stages of the ribosome rescue and tagging process and/or the presence of functional interactions between one or more of these proteins and SsrA.
Affiliation(s)
- A W Karzai
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
|
173
|
Makarova KS, Aravind L, Wolf YI, Tatusov RL, Minton KW, Koonin EV, Daly MJ. Genome of the extremely radiation-resistant bacterium Deinococcus radiodurans viewed from the perspective of comparative genomics. Microbiol Mol Biol Rev 2001; 65:44-79. [PMID: 11238985 PMCID: PMC99018 DOI: 10.1128/mmbr.65.1.44-79.2001] [Citation(s) in RCA: 486] [Impact Index Per Article: 21.1] [Indexed: 11/20/2022]
Abstract
The bacterium Deinococcus radiodurans shows remarkable resistance to a range of damage caused by ionizing radiation, desiccation, UV radiation, oxidizing agents, and electrophilic mutagens. D. radiodurans is best known for its extreme resistance to ionizing radiation; not only can it grow continuously in the presence of chronic radiation (6 kilorads/h), but also it can survive acute exposures to gamma radiation exceeding 1,500 kilorads without dying or undergoing induced mutation. These characteristics were the impetus for sequencing the genome of D. radiodurans and the ongoing development of its use for bioremediation of radioactive wastes. Although it is known that these multiple resistance phenotypes stem from efficient DNA repair processes, the mechanisms underlying these extraordinary repair capabilities remain poorly understood. In this work we present an extensive comparative sequence analysis of the Deinococcus genome. Deinococcus is the first representative with a completely sequenced genome from a distinct bacterial lineage of extremophiles, the Thermus-Deinococcus group. Phylogenetic tree analysis, combined with the identification of several synapomorphies between Thermus and Deinococcus, supports the hypothesis that it is an ancient group with no clear affinities to any of the other known bacterial lineages. Distinctive features of the Deinococcus genome as well as features shared with other free-living bacteria were revealed by comparison of its proteome to the collection of clusters of orthologous groups of proteins. Analysis of paralogs in Deinococcus has revealed several unique protein families. In addition, specific expansions of several other families including phosphatases, proteases, acyltransferases, and Nudix family pyrophosphohydrolases were detected. Genes that potentially affect DNA repair and recombination and stress responses were investigated in detail. 
Some proteins appear to have been horizontally transferred from eukaryotes and are not present in other bacteria. For example, three proteins homologous to plant desiccation resistance proteins were identified, and these are particularly interesting because of the correlation between desiccation and radiation resistance. Compared to other bacteria, the D. radiodurans genome is enriched in repetitive sequences, namely, IS-like transposons and small intergenic repeats. In combination, these observations suggest that several different biological mechanisms contribute to the multiple DNA repair-dependent phenotypes of this organism.
Affiliation(s)
- K S Makarova
- Uniformed Services University of the Health Sciences, Bethesda, Maryland 20814-4799, USA
|
174
|
Prade RA, Ayoubi P, Krishnan S, Macwana S, Russell H. Accumulation of stress and inducer-dependent plant-cell-wall-degrading enzymes during asexual development in Aspergillus nidulans. Genetics 2001; 157:957-67. [PMID: 11238386 PMCID: PMC1461545 DOI: 10.1093/genetics/157.3.957] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.3] [Indexed: 11/12/2022]
Abstract
Determination and interpretation of fungal gene expression profiles based on digital reconstruction of expressed sequence tags (ESTs) are reported. A total of 51,524 DNA sequence files processed with PipeOnline resulted in 9775 single and 5660 contig unique ESTs, 31.2% of a typical fungal transcriptome. Half of the unique ESTs shared homology with genes in public databases, 35.8% of which are functionally defined and 64.2% are unclear or unknown. In Aspergillus nidulans 86% of transcripts associate with intermediate metabolism functions, mainly related to carbohydrate, amino acid, protein, and peptide biosynthesis. During asexual development, A. nidulans unexpectedly accumulates stress response and inducer-dependent transcripts in the absence of an inducer. Stress response genes in A. nidulans ESTs total 1039 transcripts, contrasting with 117 in Neurospora crassa, a 14.3-fold difference. A total of 5.6% of A. nidulans ESTs implicate inducer-dependent cell wall degradation or amino acid acquisition, 3.5-fold higher than in N. crassa. Accumulation of stress response and inducer-dependent transcripts suggests general derepression of cis-regulation during terminal asexual development.
Affiliation(s)
- R A Prade
- Department of Microbiology and Molecular Genetics, Oklahoma State University, Stillwater, Oklahoma 74078-3020, USA.
|
175
|
Covert MW, Schilling CH, Famili I, Edwards JS, Goryanin II, Selkov E, Palsson BO. Metabolic modeling of microbial strains in silico. Trends Biochem Sci 2001; 26:179-86. [PMID: 11246024 DOI: 10.1016/s0968-0004(00)01754-0] [Citation(s) in RCA: 232] [Impact Index Per Article: 10.1] [Indexed: 11/25/2022]
Abstract
The large volume of genome-scale data that is being produced and made available in databases on the World Wide Web is demanding the development of integrated mathematical models of cellular processes. The analysis of reconstructed metabolic networks as systems leads to the development of an in silico or computer representation of collections of cellular metabolic constituents, their interactions and their integrated function as a whole. The use of quantitative analysis methods to generate testable hypotheses and drive experimentation at a whole-genome level signals the advent of a systemic modeling approach to cellular and molecular biology.
Affiliation(s)
- M W Covert
- Dept Bioengineering, University of California, San Diego, La Jolla, CA 92093-0412, USA
|
176
|
Edwards JS, Ibarra RU, Palsson BO. In silico predictions of Escherichia coli metabolic capabilities are consistent with experimental data. Nat Biotechnol 2001; 19:125-30. [PMID: 11175725 DOI: 10.1038/84379] [Citation(s) in RCA: 625] [Impact Index Per Article: 27.2] [Indexed: 11/09/2022]
Abstract
A significant goal in the post-genome era is to relate the annotated genome sequence to the physiological functions of a cell. Working from the annotated genome sequence, as well as biochemical and physiological information, it is possible to reconstruct complete metabolic networks. Furthermore, computational methods have been developed to interpret and predict the optimal performance of a metabolic network under a range of growth conditions. We have tested the hypothesis that Escherichia coli uses its metabolism to grow at a maximal rate using the E. coli MG1655 metabolic reconstruction. Based on this hypothesis, we formulated experiments that describe the quantitative relationship between a primary carbon source (acetate or succinate) uptake rate, oxygen uptake rate, and maximal cellular growth rate. We found that the experimental data were consistent with the stated hypothesis, namely that the E. coli metabolic network is optimized to maximize growth under the experimental conditions considered. This study thus demonstrates how the combination of in silico and experimental biology can be used to obtain a quantitative genotype-phenotype relationship for metabolism in bacterial cells.
Affiliation(s)
- J S Edwards
- Department of Bioengineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0412, USA
|
177
|
Bernal A, Ear U, Kyrpides N. Genomes OnLine Database (GOLD): a monitor of genome projects world-wide. Nucleic Acids Res 2001; 29:126-7. [PMID: 11125068 PMCID: PMC29859 DOI: 10.1093/nar/29.1.126] [Citation(s) in RCA: 198] [Impact Index Per Article: 8.6] [Indexed: 11/13/2022]
Abstract
GOLD is a comprehensive resource for accessing information related to completed and ongoing genome projects world-wide. The database currently provides information on 350 genome projects, of which 48 have been completely sequenced and their analysis published. GOLD was created in 1997 and since April 2000 it has been licensed to Integrated Genomics. The database is freely available through the URL: http://igweb.integratedgenomics.com/GOLD/.
Affiliation(s)
- A Bernal
- Integrated Genomics, Chicago Technology Park, 2201 West Campbell Park Drive, Chicago, IL 60612, USA
|
178
|
Daugherty M, Vonstein V, Overbeek R, Osterman A. Archaeal shikimate kinase, a new member of the GHMP-kinase family. J Bacteriol 2001; 183:292-300. [PMID: 11114929 PMCID: PMC94878 DOI: 10.1128/jb.183.1.292-300.2001] [Citation(s) in RCA: 74] [Impact Index Per Article: 3.2] [Indexed: 11/20/2022]
Abstract
Shikimate kinase (EC 2.7.1.71) is a committed enzyme in the seven-step biosynthesis of chorismate, a major precursor of aromatic amino acids and many other aromatic compounds. Genes for all enzymes of the chorismate pathway except shikimate kinase are found in archaeal genomes by sequence homology to their bacterial counterparts. In this study, a conserved archaeal gene (gi1500322 in Methanococcus jannaschii) was identified as the best candidate for the missing shikimate kinase gene by the analysis of chromosomal clustering of chorismate biosynthetic genes. The encoded hypothetical protein, with no sequence similarity to bacterial and eukaryotic shikimate kinases, is distantly related to homoserine kinases (EC 2.7.1.39) of the GHMP-kinase superfamily. The latter functionality in M. jannaschii is assigned to another gene (gi591748), in agreement with sequence similarity and chromosomal clustering analysis. Both archaeal proteins, overexpressed in Escherichia coli and purified to homogeneity, displayed activity of the predicted type, with steady-state kinetic parameters similar to those of the corresponding bacterial kinases: K(m,shikimate) = 414 ± 33 µM, K(m,ATP) = 48 ± 4 µM, and k(cat) = 57 ± 2 s⁻¹ for the predicted shikimate kinase and K(m,homoserine) = 188 ± 37 µM, K(m,ATP) = 101 ± 7 µM, and k(cat) = 28 ± 1 s⁻¹ for the homoserine kinase. No overlapping activity could be detected between shikimate kinase and homoserine kinase, both revealing a >1,000-fold preference for their own specific substrates. The case of archaeal shikimate kinase illustrates the efficacy of techniques based on reconstruction of metabolism from genomic data and analysis of gene clustering on chromosomes in finding missing genes.
Affiliation(s)
- M Daugherty
- Integrated Genomics Inc., Chicago, Illinois 60612, USA
|
179
|
Apweiler R, Biswas M, Fleischmann W, Kanapin A, Karavidopoulou Y, Kersey P, Kriventseva EV, Mittard V, Mulder N, Phan I, Zdobnov E. Proteome Analysis Database: online application of InterPro and CluSTr for the functional classification of proteins in whole genomes. Nucleic Acids Res 2001; 29:44-8. [PMID: 11125045 PMCID: PMC29822 DOI: 10.1093/nar/29.1.44] [Citation(s) in RCA: 67] [Impact Index Per Article: 2.9] [Received: 08/28/2000] [Revised: 10/23/2000] [Accepted: 10/23/2000] [Indexed: 11/14/2022]
Abstract
The SWISS-PROT group at EBI has developed the Proteome Analysis Database utilising existing resources and providing comparative analysis of the predicted protein coding sequences of the complete genomes of bacteria, archaea and eukaryotes (http://www.ebi.ac.uk/proteome/). The two main projects used, InterPro and CluSTr, give a new perspective on families, domains and sites and cover 31-67% (InterPro statistics) of the proteins from each of the complete genomes. CluSTr covers the three complete eukaryotic genomes and the incomplete human genome data. The Proteome Analysis Database is accompanied by a program that has been designed to carry out InterPro proteome comparisons for any one proteome against any other one or more of the proteomes in the database.
Affiliation(s)
- R Apweiler
- EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
|
180
|
McClelland M, Florea L, Sanderson K, Clifton SW, Parkhill J, Churcher C, Dougan G, Wilson RK, Miller W. Comparison of the Escherichia coli K-12 genome with sampled genomes of a Klebsiella pneumoniae and three Salmonella enterica serovars, Typhimurium, Typhi and Paratyphi. Nucleic Acids Res 2000; 28:4974-86. [PMID: 11121489 PMCID: PMC115240 DOI: 10.1093/nar/28.24.4974] [Citation(s) in RCA: 80] [Impact Index Per Article: 3.3] [Indexed: 11/13/2022]
Abstract
The Escherichia coli K-12 genome (ECO) was compared with the sampled genomes of the sibling species Salmonella enterica serovars Typhimurium, Typhi and Paratyphi A (collectively referred to as SAL) and the genome of the close outgroup Klebsiella pneumoniae (KPN). There are at least 160 locations where sequences of >400 bp are absent from ECO but present in the genomes of all three SAL and 394 locations where sequences are present in ECO but close homologs are absent in all SAL genomes. The 394 sequences in ECO that do not occur in SAL contain 1350 (30.6%) of the 4405 ECO genes. Of these, 1165 are missing from both SAL and KPN. Most of the 1165 genes are concentrated within 28 regions of 10-40 kb, which consist almost exclusively of such genes. Among these regions were six that included previously identified cryptic phage. A hypothetical ancestral state of genomic regions that differ between ECO and SAL can be inferred in some cases by reference to the genome structure in KPN and the more distant relative Yersinia pestis. However, many changes between ECO and SAL are concentrated in regions where all four genera have a different structure. The rate of gene insertion and deletion is sufficiently high in these regions that the ancestral state of the ECO/SAL lineage cannot be inferred from the present data. The sequencing of other closely related genomes, such as S. bongori or Citrobacter, may help in this regard.
Affiliation(s)
- M McClelland
- Sidney Kimmel Cancer Center, 10835 Altman Row, San Diego, CA 92121, USA
|
181
|
Kyrpides NC, Ouzounis CA, Iliopoulos I, Vonstein V, Overbeek R. Analysis of the Thermotoga maritima genome combining a variety of sequence similarity and genome context tools. Nucleic Acids Res 2000; 28:4573-6. [PMID: 11071948 PMCID: PMC113882 DOI: 10.1093/nar/28.22.4573] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.8] [Received: 06/06/2000] [Revised: 10/03/2000] [Accepted: 10/03/2000] [Indexed: 11/12/2022]
Abstract
The proliferation of genome sequence data has led to the development of a number of tools and strategies that facilitate computational analysis. These methods include the identification of motif patterns, membership of the query sequences in family databases, metabolic pathway involvement and gene proximity. We re-examined the completely sequenced genome of Thermotoga maritima by employing the combined use of the above methods. By analyzing all 1877 proteins encoded in this genome, we identified 193 cases of conflicting annotations (10%), of which 164 are new function predictions and 29 are amendments of previously proposed assignments. These results suggest that the combined use of existing computational tools can resolve inconclusive sequence similarities and significantly improve the prediction of protein function from genome sequence.
Affiliation(s)
- N C Kyrpides
- Integrated Genomics Inc., Chicago Technology Park, 2201 West Campbell Park Drive, Chicago, IL 60612, USA. Cambridge CB10 1SD, UK.
|
182
|
Jeong H, Tombor B, Albert R, Oltvai ZN, Barabási AL. The large-scale organization of metabolic networks. Nature 2000; 407:651-4. [PMID: 11034217 DOI: 10.1038/35036627] [Citation(s) in RCA: 2088] [Impact Index Per Article: 87.0] [Indexed: 12/18/2022]
Abstract
In a cell or microorganism, the processes that generate mass, energy, information transfer and cell-fate specification are seamlessly integrated through a complex network of cellular constituents and reactions. However, despite the key role of these networks in sustaining cellular functions, their large-scale structure is essentially unknown. Here we present a systematic comparative mathematical analysis of the metabolic networks of 43 organisms representing all three domains of life. We show that, despite significant variation in their individual constituents and pathways, these metabolic networks have the same topological scaling properties and show striking similarities to the inherent organization of complex non-biological systems. This may indicate that metabolic organization is not only identical for all living organisms, but also complies with the design principles of robust and error-tolerant scale-free networks, and may represent a common blueprint for the large-scale organization of interactions among all cellular constituents.
Affiliation(s)
- H Jeong
- Department of Physics, University of Notre Dame, Indiana 46556, USA
|
183
|
Abstract
Utilizing genome sequence data from bacterial and fungal pathogens for the discovery of new antimicrobial agents has received considerable attention, both practical and critical, from the pharmaceutical and biotechnological communities. Although no new drugs derived from genomics-based discovery have been reported to be in a development pipeline, the utilization of genomics has revolutionized many aspects of drug discovery. The application, utility, opportunity, and challenges afforded by many of these new approaches are discussed.
Affiliation(s)
- T Black
- Department of Chemotherapy and Molecular Genetics, Schering-Plough Research Institute, 2015 Galloping Hill Road, K-15-4700, Kenilworth, NJ 07974-1300, USA.
|
184
|
Florea L, Riemer C, Schwartz S, Zhang Z, Stojanovic N, Miller W, McClelland M. Web-based visualization tools for bacterial genome alignments. Nucleic Acids Res 2000; 28:3486-96. [PMID: 10982867 PMCID: PMC110741 DOI: 10.1093/nar/28.18.3486] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.1] [Indexed: 11/14/2022]
Abstract
With the increase in the flow of sequence data, both in contigs and whole genomes, visual aids for comparison and analysis studies are becoming imperative. We describe three web-based tools for visualizing alignments of bacterial genomes. The first, called Enteric, produces a graphical, hypertext view of pairwise alignments between a reference genome and sequences from each of several related organisms, covering 20 kb around a user-specified position. Insertions, deletions and rearrangements relative to the reference genome are color-coded, which reveals many intriguing differences among genomes. The second, Menteric, computes and displays nucleotide-level multiple alignments of the same sequences, together with annotations of ORFs and regulatory sites, in a 1 kb region surrounding a given address. The third, a Java-based viewer called Maj, combines some features of the previous tools, and adds a zoom-in mechanism. We compare the Escherichia coli K-12 genome with the partially sequenced genomes of Klebsiella pneumoniae, Yersinia pestis, Vibrio cholerae, and the Salmonella enterica serovars Typhimurium, Typhi and Paratyphi A. Examination of the pairwise and multiple alignments in a region allows one to draw inferences about regulatory patterns and functional assignments. For example, these tools revealed that rffH, a gene involved in enterobacterial common antigen (ECA) biosynthesis, is partly deleted in one of the genomes. We used PCR to show that this deletion occurs sporadically in some strains of some serovars of S. enterica subspecies I but not in any strains tested from six other subspecies. The resulting cell surface diversity may be associated with selection by the host immune response.
Affiliation(s)
- L Florea
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA
|
185
|
van Helden J, Naim A, Mancuso R, Eldridge M, Wernisch L, Gilbert D, Wodak SJ. Representing and analysing molecular and cellular function using the computer. Biol Chem 2000; 381:921-35. [PMID: 11076023 DOI: 10.1515/bc.2000.113] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.7] [Indexed: 11/15/2022]
Abstract
Determining the biological function of a myriad of genes, and understanding how they interact to yield a living cell, is the major challenge of the post genome-sequencing era. The complexity of biological systems is such that this cannot be envisaged without the help of powerful computer systems capable of representing and analysing the intricate networks of physical and functional interactions between the different cellular components. In this review we try to provide the reader with an appreciation of where we stand in this regard. We discuss some of the inherent problems in describing the different facets of biological function, give an overview of how information on function is currently represented in the major biological databases, and describe different systems for organising and categorising the functions of gene products. In a second part, we present a new general data model, currently under development, which describes information on molecular function and cellular processes in a rigorous manner. The model is capable of representing a large variety of biochemical processes, including metabolic pathways, regulation of gene expression and signal transduction. It also incorporates taxonomies for categorising molecular entities, interactions and processes, and it offers means of viewing the information at different levels of resolution, and dealing with incomplete knowledge. The data model has been implemented in the database on protein function and cellular processes 'aMAZE' (http://www.ebi.ac.uk/research/pfbp/), which presently covers metabolic pathways and their regulation. Several tools for querying, displaying, and performing analyses on such pathways are briefly described in order to illustrate the practical applications enabled by the model.
Affiliation(s)
- J van Helden
- European Bioinformatics Institute (EBI), Hinxton, Cambridge, UK
|
186
|
Abstract
Computational genomics is a subfield of computational biology that deals with the analysis of entire genome sequences. Transcending the boundaries of classical sequence analysis, computational genomics exploits the inherent properties of entire genomes by modelling them as systems. We review recent developments in the field, discuss in some detail a number of novel approaches that take into account the genomic context and argue that progress will be made by novel knowledge representation and simulation technologies.
Affiliation(s)
- S Tsoka
- Research Programme, The European Bioinformatics Institute, EMBL Cambridge Outstation, UK
|
187
|
Paulsen IT, Nguyen L, Sliwinski MK, Rabus R, Saier MH. Microbial genome analyses: comparative transport capabilities in eighteen prokaryotes. J Mol Biol 2000; 301:75-100. [PMID: 10926494 DOI: 10.1006/jmbi.2000.3961] [Citation(s) in RCA: 222] [Impact Index Per Article: 9.3] [Indexed: 11/22/2022]
Abstract
Here, we present a comprehensive analysis of solute transport systems encoded within the completely sequenced genomes of 18 prokaryotic organisms. These organisms include four Gram-positive bacteria, seven Gram-negative bacteria, two spirochetes, one cyanobacterium and four archaea. Membrane proteins are analyzed in terms of putative membrane topology, and the recognized transport systems are classified into 76 families, including four families of channel proteins, four families of primary carriers, 54 families of secondary carriers, six families of group translocators, and eight unclassified families. These families are analyzed in terms of the paralogous and orthologous relationships of their protein members, the substrate specificities of their constituent transporters and their distributions in each of the 18 organisms studied. The families vary from large superfamilies with hundreds of represented members, to small families with only one or a few members. The mode of transport generally correlates with the primary mechanism of energy generation, and the numbers of secondary transporters relative to primary transporters are roughly proportional to the total numbers of primary H(+) and Na(+) pumps in the cell. The phosphotransferase system is less prevalent in the analyzed bacteria than previously thought (only six of 14 bacteria transport sugars via this system) and is completely lacking in archaea and eukaryotes. Escherichia coli is shown to be exceptionally broad in its transport capabilities and therefore, at a membrane transport level, does not appear representative of the bacteria thus far sequenced. Archaea and spirochetes exhibit fewer proteins with multiple transmembrane segments and fewer net transporters than most bacteria. These results provide insight into the relevance of transport to the overall physiology of prokaryotes.
Affiliation(s)
- I T Paulsen
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA.
|
188
|
Natale DA, Shankavaram UT, Galperin MY, Wolf YI, Aravind L, Koonin EV. Towards understanding the first genome sequence of a crenarchaeon by genome annotation using clusters of orthologous groups of proteins (COGs). Genome Biol 2000; 1:RESEARCH0009. [PMID: 11178258 PMCID: PMC15027 DOI: 10.1186/gb-2000-1-5-research0009] [Citation(s) in RCA: 80] [Impact Index Per Article: 3.3] [Received: 06/19/2000] [Revised: 08/25/2000] [Accepted: 09/21/2000] [Indexed: 11/18/2022]
Abstract
BACKGROUND Standard archival sequence databases have not been designed as tools for genome annotation and are far from being optimal for this purpose. We used the database of Clusters of Orthologous Groups of proteins (COGs) to reannotate the genomes of two archaea, Aeropyrum pernix, the first member of the Crenarchaea to be sequenced, and Pyrococcus abyssi. RESULTS A. pernix and P. abyssi proteins were assigned to COGs using the COGNITOR program; the results were verified on a case-by-case basis and augmented by additional database searches using the PSI-BLAST and TBLASTN programs. Functions were predicted for over 300 proteins from A. pernix, which could not be assigned a function using conventional methods with a conservative sequence similarity threshold, an approximately 50% increase compared to the original annotation. A. pernix shares most of the conserved core of proteins that were previously identified in the Euryarchaeota. Cluster analysis or distance matrix tree construction based on the co-occurrence of genomes in COGs showed that A. pernix forms a distinct group within the archaea, although grouping with the two species of Pyrococci, indicative of similar repertoires of conserved genes, was observed. No indication of a specific relationship between Crenarchaeota and eukaryotes was obtained in these analyses. Several proteins that are conserved in Euryarchaeota and most bacteria are unexpectedly missing in A. pernix, including the entire set of de novo purine biosynthesis enzymes, the GTPase FtsZ (a key component of the bacterial and euryarchaeal cell-division machinery), and the tRNA-specific pseudouridine synthase, previously considered universal. A. pernix is represented in 48 COGs that do not contain any euryarchaeal members. Many of these proteins are TCA cycle and electron transport chain enzymes, reflecting the aerobic lifestyle of A. pernix. 
CONCLUSIONS Special-purpose databases organized on the basis of phylogenetic analysis and carefully curated with respect to known and predicted protein functions provide for a significant improvement in genome annotation. A differential genome display approach helps in a systematic investigation of common and distinct features of gene repertoires and in some cases reveals unexpected connections that may be indicative of functional similarities between phylogenetically distant organisms and of lateral gene exchange.
Affiliation(s)
- Darren A Natale
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Rockville Pike, Bethesda, MD 20894, USA
- Uma T Shankavaram
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Rockville Pike, Bethesda, MD 20894, USA
- Michael Y Galperin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Rockville Pike, Bethesda, MD 20894, USA
- Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Rockville Pike, Bethesda, MD 20894, USA
- L Aravind
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Rockville Pike, Bethesda, MD 20894, USA
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Rockville Pike, Bethesda, MD 20894, USA
|
189
|
Comparative Genome Analysis: Exploiting the Context of Genes to Infer Evolution and Predict Function. COMPARATIVE GENOMICS 2000. [DOI: 10.1007/978-94-011-4309-7_25] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 01/11/2023]
|