1. Herson J, Krummenacker M, Spaulding A, O'Maille P, Karp PD. The Genome Explorer genome browser. mSystems 2024; 9:e0026724. [PMID: 38958457 PMCID: PMC11265445 DOI: 10.1128/msystems.00267-24] [Received: 02/22/2024] [Accepted: 05/28/2024]
Abstract
Are two adjacent genes in the same operon? What are the order and spacing of several transcription factor binding sites? Genome browsers are software data visualization and exploration tools that enable biologists to answer questions such as these. In this paper, we report on a major update to our browser, Genome Explorer, that provides nearly instantaneous scaling and traversing of a genome, enabling users to quickly and easily zoom into an area of interest. The user can rapidly move between scales that depict the entire genome, individual genes, and the sequence; Genome Explorer presents the most relevant detail and context for each scale. By downloading the data for the entire genome to the user's web browser and dynamically generating visualizations locally, we enable fine control of zoom and pan functions and real-time redrawing of the visualization, resulting in smoother and more intuitive exploration of a genome than is possible with other browsers. Further, genome features are presented together, in-line, using familiar graphical depictions. In contrast, many other browsers depict genome features using data tracks, which have low information density and can visually obscure the relative positions of features. Genome Explorer diagrams have a high information density, which allows more genome context and sequence information to be presented on a monitor of a given size than in track-based browsers. Genome Explorer provides optional data tracks for the analysis of large-scale data sets and a unique comparative mode that aligns genomes at orthologous genes with synchronized zooming.
IMPORTANCE
Genome browsers provide graphical depictions of genome information to speed the uptake of complex genome data by scientists. They provide search operations to help scientists find information and zoom operations to enable scientists to view genome features at different resolutions. We introduce the Genome Explorer browser, which provides extremely fast zooming and panning of genome visualizations, and displays with high information density.
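The abstract attributes Genome Explorer's fast zooming and panning to drawing the view locally from genome data already held in the web browser. Its rendering code is not described here, so the following is only a generic sketch of that idea under stated assumptions: a hypothetical `Viewport` class keeps a linear map from genome coordinates (bp) to screen pixels, zooms about the base under the cursor so that base stays fixed on screen, and picks a drawing tier (whole genome, genes, or sequence) from the current scale. The class, its method names, and the tier thresholds are illustrative assumptions, not Genome Explorer's implementation.

```python
from dataclasses import dataclass


@dataclass
class Viewport:
    """Hypothetical linear map from genome coordinates (bp) to screen pixels."""

    start_bp: float   # genome coordinate shown at the left edge (pixel 0)
    bp_per_px: float  # current scale: base pairs covered by one pixel
    width_px: int     # width of the drawing area in pixels

    def to_px(self, bp: float) -> float:
        """Screen x position of a genome coordinate."""
        return (bp - self.start_bp) / self.bp_per_px

    def to_bp(self, px: float) -> float:
        """Genome coordinate under a screen x position."""
        return self.start_bp + px * self.bp_per_px

    def visible_range(self) -> tuple[float, float]:
        """Genome interval currently on screen."""
        return self.start_bp, self.start_bp + self.width_px * self.bp_per_px

    def zoom(self, factor: float, anchor_px: float) -> None:
        """Zoom in (factor > 1) or out (factor < 1), keeping the base under anchor_px fixed."""
        anchor_bp = self.to_bp(anchor_px)
        self.bp_per_px /= factor
        self.start_bp = anchor_bp - anchor_px * self.bp_per_px

    def pan(self, dx_px: float) -> None:
        """Shift the view by dx_px pixels toward higher genome coordinates."""
        self.start_bp += dx_px * self.bp_per_px

    def detail_level(self) -> str:
        """Choose what to draw at this scale (thresholds are illustrative assumptions)."""
        if self.bp_per_px < 0.2:  # at least 5 px per base: room to draw letters
            return "sequence"
        if self.bp_per_px < 500:  # individual genes are wide enough to draw and label
            return "genes"
        return "genome"
```

For example, starting at 1,000 bp per pixel on a 1,000-pixel canvas, `zoom(10, anchor_px=500)` moves to 100 bp per pixel while coordinate 500,000 stays under pixel 500, and `detail_level()` changes from "genome" to "genes". In a design like this, each zoom or pan step is plain arithmetic over locally held data, so no server request is needed per redraw.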
Affiliation(s)
- James Herson
- Advanced Technology and Systems Division, SRI International, Menlo Park, California, USA
- Markus Krummenacker
- Artificial Intelligence Center, SRI International, Menlo Park, California, USA
- Aaron Spaulding
- Artificial Intelligence Center, SRI International, Menlo Park, California, USA
- Paul O'Maille
- BioSciences Division, SRI International, Menlo Park, California, USA
- Peter D. Karp
- Artificial Intelligence Center, SRI International, Menlo Park, California, USA
2. Khomtchouk BB, Weitz E, Karp PD, Wahlestedt C. How the strengths of Lisp-family languages facilitate building complex and flexible bioinformatics applications. Brief Bioinform 2018; 19:537-543. [PMID: 28040748 PMCID: PMC5952920 DOI: 10.1093/bib/bbw130] [Received: 08/18/2016] [Revised: 11/16/2016]
Abstract
We present a rationale for expanding the presence of the Lisp family of programming languages in bioinformatics and computational biology research. Put simply, Lisp-family languages enable programmers to write programs more quickly, and those programs run faster than equivalent programs written in other languages. Languages such as Common Lisp, Scheme and Clojure facilitate the creation of powerful and flexible software that is required for complex and rapidly evolving domains like biology. We will point out several key features that distinguish languages of the Lisp family from other programming languages, and we will explain how these features can aid researchers in becoming more productive and creating better code. We will also show how these features make these languages ideal tools for artificial intelligence and machine learning applications. We will specifically stress the advantages of domain-specific languages (DSLs): languages that are specialized to a particular area, and thus not only facilitate easier research problem formulation, but also aid in the establishment of standards and best programming practices as applied to the specific research field at hand. DSLs are particularly easy to build in Common Lisp, the most comprehensive Lisp dialect, which is commonly referred to as the 'programmable programming language'. We are convinced that Lisp grants programmers unprecedented power to build increasingly sophisticated artificial intelligence systems that may ultimately transform machine learning and artificial intelligence research in bioinformatics and computational biology.
Affiliation(s)
- Bohdan B Khomtchouk
- Center for Therapeutic Innovation and Department of Psychiatry and Behavioral Sciences, University of Miami Miller School of Medicine, 1120 NW 14th St., Miami, FL, USA
- Edmund Weitz
- Center for Therapeutic Innovation and Department of Psychiatry and Behavioral Sciences, University of Miami Miller School of Medicine, 1120 NW 14th St., Miami, FL, USA
- Peter D Karp
- Center for Therapeutic Innovation and Department of Psychiatry and Behavioral Sciences, University of Miami Miller School of Medicine, 1120 NW 14th St., Miami, FL, USA
- Claes Wahlestedt
- Center for Therapeutic Innovation and Department of Psychiatry and Behavioral Sciences, University of Miami Miller School of Medicine, 1120 NW 14th St., Miami, FL, USA
3. Karp PD, Latendresse M, Paley SM, Krummenacker M, Ong QD, Billington R, Kothari A, Weaver D, Lee T, Subhraveti P, Spaulding A, Fulcher C, Keseler IM, Caspi R. Pathway Tools version 19.0 update: software for pathway/genome informatics and systems biology. Brief Bioinform 2015; 17:877-90. [PMID: 26454094 DOI: 10.1093/bib/bbv079] [Received: 05/07/2015]
Abstract
Pathway Tools is a bioinformatics software environment with a broad set of capabilities. The software provides genome-informatics tools such as a genome browser, sequence alignments, a genome-variant analyzer and comparative-genomics operations. It offers metabolic-informatics tools, such as metabolic reconstruction, quantitative metabolic modeling, prediction of reaction atom mappings and metabolic route search. Pathway Tools also provides regulatory-informatics tools, such as the ability to represent and visualize a wide range of regulatory interactions. This article outlines the advances in Pathway Tools in the past 5 years. Major additions include components for metabolic modeling, metabolic route search, computation of atom mappings and estimation of compound Gibbs free energies of formation; addition of editors for signaling pathways, for genome sequences and for cellular architecture; storage of gene essentiality data and phenotype data; display of multiple alignments, and of signaling and electron-transport pathways; and development of Python and web-services application programming interfaces. Scientists around the world have created more than 9800 Pathway/Genome Databases by using Pathway Tools, many of which are curated databases for important model organisms.
4. Hamerly T, Tripet BP, Tigges M, Giannone RJ, Wurch L, Hettich RL, Podar M, Copié V, Bothner B. Untargeted metabolomics studies employing NMR and LC-MS reveal metabolic coupling between Nanoarcheum equitans and its archaeal host Ignicoccus hospitalis. Metabolomics 2015; 11:895-907. [PMID: 26273237 PMCID: PMC4529127 DOI: 10.1007/s11306-014-0747-6]
Abstract
Interspecies interactions are the basis of microbial community formation and infectious diseases. Systems biology enables the construction of complex models describing such interactions, leading to a better understanding of disease states and communities. However, before interactions between complex organisms can be understood, metabolic and energetic implications of simpler real-world host-microbe systems must be worked out. To this effect, untargeted metabolomics experiments were conducted and integrated with proteomics data to characterize key molecular-level interactions between two hyperthermophilic microbial species, both of which have reduced genomes. Metabolic changes and transfer of metabolites between the archaea Ignicoccus hospitalis and Nanoarcheum equitans were investigated using integrated LC-MS and NMR metabolomics. The study of such a system is challenging, as no genetic tools are available, growth in the laboratory is difficult, and the mechanisms by which the two species interact are unknown. Together with information about relative enzyme levels obtained from shotgun proteomics, the metabolomics data provided useful insights into metabolic pathways and cellular networks of I. hospitalis that are impacted by the presence of N. equitans, including arginine, isoleucine, and CTP biosynthesis. On the organismal level, the data indicate that N. equitans exploits metabolites generated by I. hospitalis to satisfy its own metabolic needs. This finding is based on N. equitans's consumption of a significant fraction of the metabolite pool in I. hospitalis that cannot solely be attributed to increased biomass production for N. equitans. Combining LC-MS and NMR metabolomics datasets improved coverage of the metabolome and enhanced the identification and quantitation of cellular metabolites.
Affiliation(s)
- Timothy Hamerly
- Department of Chemistry and Biochemistry, Montana State University, Bozeman, MT 59717
- Brian P. Tripet
- Department of Chemistry and Biochemistry, Montana State University, Bozeman, MT 59717
- Michelle Tigges
- Department of Chemistry and Biochemistry, Montana State University, Bozeman, MT 59717
- Louie Wurch
- Oak Ridge National Laboratory, Oak Ridge, TN 37831
- Department of Microbiology, University of Tennessee, Knoxville, TN 37996
- Mircea Podar
- Oak Ridge National Laboratory, Oak Ridge, TN 37831
- Department of Microbiology, University of Tennessee, Knoxville, TN 37996
- Valerie Copié
- Department of Chemistry and Biochemistry, Montana State University, Bozeman, MT 59717
- Thermal Biology Institute, Montana State University, Bozeman, MT 59717
- Brian Bothner
- Department of Chemistry and Biochemistry, Montana State University, Bozeman, MT 59717
- Thermal Biology Institute, Montana State University, Bozeman, MT 59717
5.
Abstract
Bacterial metabolism is an important source of novel products and processes for everyday life, and strong efforts are being undertaken to discover and exploit new usable substances of microbial origin. Computational modeling and in silico simulations are powerful tools in this context, since they allow the exploration and a deeper understanding of bacterial metabolic circuits. Many approaches exist to quantitatively simulate chemical reaction fluxes within the whole microbial metabolism and, regardless of the technique of choice, metabolic model reconstruction is the first step in every modeling pipeline. Reconstructing a metabolic network consists of drafting the list of the biochemical reactions that an organism can carry out, together with information on cellular boundaries, a biomass assembly reaction, and exchange fluxes with the external environment. Building models able to represent the different functional cellular states is universally recognized as a tricky task that requires intensive manual effort and much additional information besides the genome sequence. In this chapter we present a general protocol for metabolic reconstruction in bacteria and the main challenges encountered during this process.
Affiliation(s)
- Marco Fondi
- Department of Biology, University of Florence, via Madonna del Piano 6, I-50019 Sesto Fiorentino, Florence, Italy
6. Shanmugasundram A, Gonzalez-Galarza FF, Wastling JM, Vasieva O, Jones AR. An integrated approach to understand apicomplexan metabolism from their genomes. BMC Bioinformatics 2014. [PMCID: PMC4071867 DOI: 10.1186/1471-2105-15-s3-a3]
7. Shanmugasundram A, Gonzalez-Galarza FF, Wastling JM, Vasieva O, Jones AR. Library of Apicomplexan Metabolic Pathways: a manually curated database for metabolic pathways of apicomplexan parasites. Nucleic Acids Res 2012. [PMID: 23193253 PMCID: PMC3531055 DOI: 10.1093/nar/gks1139]
Abstract
The Library of Apicomplexan Metabolic Pathways (LAMP, http://www.llamp.net) is a web database that provides near-complete mapping from genes to the central metabolic functions for some of the prominent intracellular parasites of the phylum Apicomplexa. This phylum includes the causative agents of malaria, toxoplasmosis and theileriosis, diseases with a huge economic and social impact. A number of apicomplexan genomes have been sequenced, but the accurate annotation of gene function remains challenging. We have adopted an approach called metabolic reconstruction, in which genes are systematically assigned to functions within pathways/networks for Toxoplasma gondii, Neospora caninum, Cryptosporidium and Theileria species, and Babesia bovis. Several functions missing from pathways have been identified, where the corresponding gene for an essential process appears to be absent from the current genome annotation. For each species, LAMP contains interactive diagrams of each pathway, hyperlinked to external resources and annotated with detailed information, including the sources of evidence used. We have also developed a section to highlight the overall metabolic capabilities of each species, such as the ability to synthesize a particular metabolite or dependence on the host for it. We expect this new database will become a valuable resource for fundamental and applied research on the Apicomplexa.
Affiliation(s)
- Achchuthan Shanmugasundram
- Department of Functional and Comparative Genomics, Institute of Integrative Biology, University of Liverpool, Biosciences Building, Crown Street, Liverpool L69 7ZB, UK.
8. Miled ZB, Webster YW, Liu Y, Li N. An ontology for semantic integration of life science web databases. Int J Coop Inf Syst 2012. [DOI: 10.1142/s0218843003000747]
Abstract
The incompatibilities among complex data formats and the various schemas used by the biological databases that house these data are becoming a bottleneck in biological research. For example, biological data formats range from simple words (e.g. gene names) and numbers (e.g. molecular weights) to sequence strings (e.g. nucleic acid sequences) and even more complex formats such as taxonomy trees. Some information is embedded in narrative text, such as expert comments and publications. Other information is expressed as graphs or images (e.g. pathway networks). The confederation of heterogeneous web databases has become a crucial issue in today's biological research. In other words, interoperability has to be achieved among the biological web databases, and the heterogeneity of the web databases has to be resolved. This paper presents a biological ontology, BAO, and discusses its advantages in supporting the semantic integration of biological web databases.
Affiliation(s)
- Zina Ben Miled
- Electrical and Computer Engineering Department, Purdue School of Engineering & Technology, Indianapolis, Indiana, 46202, USA
- Yue W. Webster
- Electrical and Computer Engineering Department, Purdue School of Engineering & Technology, Indianapolis, Indiana, 46202, USA
- Yang Liu
- Electrical and Computer Engineering Department, Purdue School of Engineering & Technology, Indianapolis, Indiana, 46202, USA
- Nianhua Li
- Computer & Information Science, Purdue School of Science, Indianapolis, Indiana, 46202, USA
9. Karp PD, Paley SM, Krummenacker M, Latendresse M, Dale JM, Lee TJ, Kaipa P, Gilham F, Spaulding A, Popescu L, Altman T, Paulsen I, Keseler IM, Caspi R. Pathway Tools version 13.0: integrated software for pathway/genome informatics and systems biology. Brief Bioinform 2009; 11:40-79. [PMID: 19955237 DOI: 10.1093/bib/bbp043]
Abstract
Pathway Tools is a production-quality software environment for creating a type of model-organism database called a Pathway/Genome Database (PGDB). A PGDB such as EcoCyc integrates the evolving understanding of the genes, proteins, metabolic network and regulatory network of an organism. This article provides an overview of Pathway Tools capabilities. The software performs multiple computational inferences including prediction of metabolic pathways, prediction of metabolic pathway hole fillers and prediction of operons. It enables interactive editing of PGDBs by DB curators. It supports web publishing of PGDBs, and provides a large number of query and visualization tools. The software also supports comparative analyses of PGDBs, and provides several systems biology analyses of PGDBs including reachability analysis of metabolic networks, and interactive tracing of metabolites through a metabolic network. More than 800 PGDBs have been created using Pathway Tools by scientists around the world, many of which are curated DBs for important model organisms. Those PGDBs can be exchanged using a peer-to-peer DB sharing system called the PGDB Registry.
Affiliation(s)
- Peter D Karp
- Artificial Intelligence Center, SRI International, 333 Ravenswood Ave, AE206, Menlo Park, CA 94025, USA.
10.
Abstract
Background
The development of e-Science presents a major set of opportunities and challenges for the future progress of biological and life science research. Major new tools are required, and corresponding demands are placed on the high-throughput data generated and used in these processes. Nowhere is the demand greater than in the semantic integration of these data. Semantic Web tools and technologies afford the chance to achieve this semantic integration. Since pathway knowledge is central to much of the scientific research today, it is a good test-bed for semantic integration. Within the context of biological pathways, the BioPAX initiative, part of a broader movement towards the standardization and integration of life science databases, forms a necessary prerequisite for the successful application of e-Science in health care and life science research. This paper examines whether BioPAX, an effort to overcome the barrier of disparate and heterogeneous pathway data sources, addresses the needs of e-Science.
Results
We demonstrate how BioPAX pathway data can be used to ask and answer some useful biological questions. We find that BioPAX comes close to meeting a broad range of e-Science needs, but certain semantic weaknesses mean that these goals are missed. We make a series of recommendations for re-modeling some aspects of BioPAX to better meet these needs.
Conclusion
Once these semantic weaknesses are addressed, it will be possible to integrate pathway information in a manner that would be useful in e-Science.
Affiliation(s)
- Joanne S Luciano
- Genetics Department, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- School of Computer Science, Manchester University, Oxford Road, Manchester, M13 9PL, UK
- Robert D Stevens
- School of Computer Science, Manchester University, Oxford Road, Manchester, M13 9PL, UK
11.
Abstract
Integrating information in the molecular biosciences involves more than the cross-referencing of sequences or structures. Experimental protocols, results of computational analyses, annotations and links to relevant literature form integral parts of this information, and impart meaning to sequence or structure. In this review, we examine some existing approaches to integrating information in the molecular biosciences. We consider not only technical issues concerning the integration of heterogeneous data sources and the corresponding semantic implications, but also the integration of analytical results. Within the broad range of strategies for integration of data and information, we distinguish between platforms and developments. We discuss two current platforms and six current developments, and identify what we believe to be their strengths and limitations. We identify key unsolved problems in integrating information in the molecular biosciences, and discuss possible strategies for addressing them including semantic integration using ontologies, XML as a data model, and graphical user interfaces as integrative environments.
Affiliation(s)
- Alexander Garcia Castro
- ARC Centre in Bioinformatics, and Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
12. Shannon PT, Reiss DJ, Bonneau R, Baliga NS. The Gaggle: an open-source software system for integrating bioinformatics software and data sources. BMC Bioinformatics 2006; 7:176. [PMID: 16569235 PMCID: PMC1464137 DOI: 10.1186/1471-2105-7-176] [Received: 10/27/2005] [Accepted: 03/28/2006]
Abstract
Background
Systems biologists work with many kinds of data, from many different sources, using a variety of software tools. Each of these tools typically excels at one type of analysis, such as the analysis of microarrays, metabolic networks or predicted protein structures. A crucial challenge is to combine the capabilities of these (and other forthcoming) data resources and tools to create a data exploration and analysis environment that does justice to the variety and complexity of systems biology data sets. A solution to this problem should recognize that data types, formats and software in this high-throughput age of biology are constantly changing.
Results
In this paper we describe the Gaggle, a simple, open-source Java software environment that helps to solve the problem of software and database integration. Guided by the classic software engineering strategy of separation of concerns and a policy of semantic flexibility, it integrates existing popular programs and web resources into a user-friendly, easily extended environment. We demonstrate that four simple data types (names, matrices, networks, and associative arrays) are sufficient to bring together diverse databases and software. We highlight some capabilities of the Gaggle with an exploration of Helicobacter pylori pathogenesis genes, in which we identify a putative ricin-like protein, a discovery made possible by simultaneous data exploration using a wide range of publicly available data and a variety of popular bioinformatics software tools.
Conclusion
We have integrated diverse databases (for example, KEGG, BioCyc, STRING) and software (Cytoscape, DataMatrixViewer, R statistical environment, and TIGR Microarray Expression Viewer). Through this loose coupling of diverse software and databases the Gaggle enables simultaneous exploration of experimental data (mRNA and protein abundance, protein-protein and protein-DNA interactions), functional associations (operon, chromosomal proximity, phylogenetic pattern), metabolic pathways (KEGG) and PubMed abstracts (STRING web resource), creating an exploratory environment useful to 'web browser and spreadsheet biologists', to statistically savvy computational biologists, and to those in between. The Gaggle uses Java RMI and Java Web Start technologies and can be found at .
Affiliation(s)
- Paul T Shannon
- Institute for Systems Biology, 1441 N 34th Street, Seattle, WA 98103, USA
- David J Reiss
- Institute for Systems Biology, 1441 N 34th Street, Seattle, WA 98103, USA
- Richard Bonneau
- Institute for Systems Biology, 1441 N 34th Street, Seattle, WA 98103, USA
- Department of Biology, New York University, 100 Washington Square E, New York, NY 10003, USA
- Nitin S Baliga
- Institute for Systems Biology, 1441 N 34th Street, Seattle, WA 98103, USA
13. Wiback SJ, Mahadevan R, Palsson BØ. Reconstructing metabolic flux vectors from extreme pathways: defining the alpha-spectrum. J Theor Biol 2003; 224:313-24. [PMID: 12941590 DOI: 10.1016/s0022-5193(03)00168-1]
Abstract
The move towards genome-scale analysis of cellular functions has necessitated the development of analytical (in silico) methods to understand such large and complex biochemical reaction networks. One such method is extreme pathway analysis, which uses stoichiometry and thermodynamic irreversibility to define mathematically unique, systemic metabolic pathways. These extreme pathways form the edges of a high-dimensional convex cone in the flux space that contains all the attainable steady state solutions, or flux distributions, for the metabolic network. By definition, any steady state flux distribution can be described as a nonnegative linear combination of the extreme pathways. To date, much effort has been focused on calculating, defining, and understanding these extreme pathways. However, little work has been performed to determine how these extreme pathways contribute to a given steady state flux distribution. This study represents an initial effort aimed at defining how physiological steady state solutions can be reconstructed from a network's extreme pathways. In general, there is not a unique set of nonnegative weightings on the extreme pathways that produces a given steady state flux distribution but rather a range of possible values. This range can be determined using linear optimization to maximize and minimize the weightings of a particular extreme pathway in the reconstruction, resulting in what we have termed the alpha-spectrum. The alpha-spectrum defines which extreme pathways can and cannot be included in the reconstruction of a given steady state flux distribution and to what extent they individually contribute to the reconstruction. It is shown that accounting for transcriptional regulatory constraints can considerably shrink the alpha-spectrum.
The alpha-spectrum is computed and interpreted for two cases: first, optimal states of a skeleton representation of core metabolism that include transcriptional regulation; and second, human red blood cell metabolism under various physiological, non-optimal conditions.
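The linear optimization described in this abstract can be written as a pair of linear programs per extreme pathway. The notation is an assumption for illustration, not taken from the paper: the columns $\mathbf{p}_1, \dots, \mathbf{p}_k$ of a matrix $P$ are the extreme pathways, $\mathbf{v}$ is the given steady state flux distribution, and $\mathbf{w}$ holds the nonnegative weightings.

```latex
\begin{aligned}
\mathbf{v} &= P\mathbf{w} = \sum_{i=1}^{k} w_i\,\mathbf{p}_i, \qquad \mathbf{w} \ge \mathbf{0} \\
\alpha_i^{\min} &= \min_{\mathbf{w}} \; w_i
  \quad \text{s.t.} \quad P\mathbf{w} = \mathbf{v},\; \mathbf{w} \ge \mathbf{0} \\
\alpha_i^{\max} &= \max_{\mathbf{w}} \; w_i
  \quad \text{s.t.} \quad P\mathbf{w} = \mathbf{v},\; \mathbf{w} \ge \mathbf{0} \\
\text{alpha-spectrum} &= \bigl\{ [\alpha_i^{\min}, \alpha_i^{\max}] \;:\; i = 1, \dots, k \bigr\}
\end{aligned}
```

Read this way, pathway $i$ cannot appear in any reconstruction of $\mathbf{v}$ when $\alpha_i^{\max} = 0$, and must appear in every reconstruction when $\alpha_i^{\min} > 0$. If further constraints are added (for example, regulatory constraints expressed as extra linear restrictions on $\mathbf{w}$, which is an assumption about their form), the feasible set can only shrink, so no interval can widen.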
Affiliation(s)
- Sharon J Wiback
- Department of Bioengineering, University of California, 9500 Gilman Drive EBU 1 Room 6607, San Diego, La Jolla, CA 92093, USA
14. van Helden J, Wernisch L, Gilbert D, Wodak SJ. Graph-based analysis of metabolic networks. Ernst Schering Research Foundation Workshop 2002:245-74. [PMID: 12061005 DOI: 10.1007/978-3-662-04747-7_12]
Affiliation(s)
- J van Helden
- Unité de Conformation des Macromolécules Biologiques, Université Libre de Bruxelles, CP 160/16, Avenue F.D. Roosevelt, 50, 1050 Bruxelles, Belgium.
15. Médigue C, Bocs S, Labarre L, Mathé C, Vallenet D. L'annotation in silico des séquences génomiques [In silico annotation of genomic sequences]. Med Sci (Paris) 2002. [DOI: 10.1051/medsci/2002182237]
16. Karp PD, Riley M, Saier M, Paulsen IT, Collado-Vides J, Paley SM, Pellegrini-Toole A, Bonavides C, Gama-Castro S. The EcoCyc Database. Nucleic Acids Res 2002; 30:56-8. [PMID: 11752253 PMCID: PMC99147 DOI: 10.1093/nar/30.1.56]
Abstract
EcoCyc is an organism-specific pathway/genome database that describes the metabolic and signal-transduction pathways of Escherichia coli, its enzymes, its transport proteins and its mechanisms of transcriptional control of gene expression. EcoCyc is queried using the Pathway Tools graphical user interface, which provides a wide variety of query operations and visualization tools. EcoCyc is available at http://ecocyc.org/.
Affiliation(s)
- Peter D Karp
- Bioinformatics Research Group, SRI International, 333 Ravenswood Avenue EK207, Menlo Park, CA 94025, USA.
17.
Abstract
We describe a graphical editor designed specifically to facilitate analysis and visualization of complex signal-transduction pathways. The editor provides automatic layout of complex regulatory graphs and enables users to easily maintain, edit, and exchange publication-quality images of regulatory networks.
Affiliation(s)
- T Koike
- Columbia Genome Center, Columbia University, New York, NY, USA
18.
Abstract
Computational genomics is a subfield of computational biology that deals with the analysis of entire genome sequences. Transcending the boundaries of classical sequence analysis, computational genomics exploits the inherent properties of entire genomes by modelling them as systems. We review recent developments in the field, discuss in some detail a number of novel approaches that take into account the genomic context and argue that progress will be made by novel knowledge representation and simulation technologies.
Affiliation(s)
- S Tsoka
- Research Programme, The European Bioinformatics Institute, EMBL Cambridge Outstation, UK
19. Häring D, Kypr J. Escherichia coli genome is composed of two distinct types of nucleotide sequences. Biochem Biophys Res Commun 2000; 272:571-5. [PMID: 10833453 DOI: 10.1006/bbrc.2000.2825]
Abstract
We calculated correlations of the nucleotide distributions along the E. coli genome. Subsequent cluster analysis of the correlation distributions showed that the genome is composed of two qualitatively different types of nucleotide sequences. The first type exhibited strong correlations between the genomic distributions of A and T and of G and C, and high anticorrelations of A with C and of G with T. In contrast, the second type was characterized by weak or negligible correlations typical of randomized sequences. The two types were almost equally abundant in the E. coli genome, and their segments ranged in length from several hundred nucleotides to about 70 kilobases. The two types did not separate cleanly by (G + C) content, but the high correlations and anticorrelations were mainly characteristic of (A + T)-rich genomic segments. We offer possible explanations for the mosaic structure of the E. coli genome.
Affiliation(s)
- D Häring
- Institute of Biophysics, Academy of Sciences of the Czech Republic, Královopolská 135, Brno, Czech Republic
20
Karp PD, Riley M, Saier M, Paulsen IT, Paley SM, Pellegrini-Toole A. The EcoCyc and MetaCyc databases. Nucleic Acids Res 2000; 28:56-9. [PMID: 10592180 PMCID: PMC102475 DOI: 10.1093/nar/28.1.56] [Citation(s) in RCA: 134] [Impact Index Per Article: 5.6] [Received: 10/01/1999] [Revised: 10/15/1999] [Accepted: 10/15/1999] [Indexed: 11/13/2022]
Abstract
EcoCyc is an organism-specific Pathway/Genome Database that describes the metabolic and signal-transduction pathways of Escherichia coli, its enzymes, and (a new addition) its transport proteins. MetaCyc is a new metabolic-pathway database that describes pathways and enzymes of many different organisms, with a microbial focus. Both databases are queried using the Pathway Tools graphical user interface, which provides a wide variety of query operations and visualization tools. EcoCyc and MetaCyc are available at http://ecocyc.PangeaSystems.com/ecocyc/
Affiliation(s)
- P D Karp
- SRI International, 333 Ravenswood Avenue, EK223, Menlo Park, CA 94025, USA.
21
Karp PD, Krummenacker M, Paley S, Wagg J. Integrated pathway-genome databases and their role in drug discovery. Trends Biotechnol 1999; 17:275-81. [PMID: 10370234 DOI: 10.1016/s0167-7799(99)01316-5] [Citation(s) in RCA: 113] [Impact Index Per Article: 4.5] [Indexed: 11/23/2022]
Abstract
Integrated pathway-genome databases describe the genes and genome of an organism, as well as its predicted pathways, reactions, enzymes and metabolites. In conjunction with visualization and analysis software, these databases provide a framework for improved understanding of microbial physiology and for antimicrobial drug discovery. We describe pathway-based analyses of the genomes of a number of medically relevant microorganisms and a novel software tool that visualizes gene-expression data on a diagram showing the whole metabolic network of the microorganism.
Affiliation(s)
- P D Karp
- Pangea Systems, 4040 Campbell Ave, Menlo Park, CA 94025, USA.
22
Karp PD, Riley M, Paley SM, Pellegrini-Toole A, Krummenacker M. EcoCyc: encyclopedia of Escherichia coli genes and metabolism. Nucleic Acids Res 1999; 27:55-8. [PMID: 9847140 PMCID: PMC148095 DOI: 10.1093/nar/27.1.55] [Citation(s) in RCA: 61] [Impact Index Per Article: 2.4] [Indexed: 11/12/2022]
Abstract
The EcoCyc database describes the genome and gene products of Escherichia coli, its metabolic and signal-transduction pathways, and its tRNAs. The database describes 4391 genes of E. coli, 695 enzymes encoded by a subset of these genes, 904 metabolic reactions that occur in E. coli, and the organization of these reactions into 129 metabolic pathways. The EcoCyc graphical user interface allows scientists to query and explore the EcoCyc database using visualization tools such as genomic-map browsers and automatic layouts of metabolic pathways. EcoCyc has many references to the primary literature and is a (qualitative) computational model of E. coli metabolism. EcoCyc is available at URL http://ecocyc.PangeaSystems.com/ecocyc/
Affiliation(s)
- P D Karp
- Pangea Systems Inc., 4040 Campbell Avenue, Menlo Park, CA 94025, USA and Marine Biological Laboratory, Woods Hole, MA 02543, USA.
23
Paulsen IT, Sliwinski MK, Saier MH. Microbial genome analyses: global comparisons of transport capabilities based on phylogenies, bioenergetics and substrate specificities. J Mol Biol 1998; 277:573-92. [PMID: 9533881 DOI: 10.1006/jmbi.1998.1609] [Citation(s) in RCA: 210] [Impact Index Per Article: 8.1] [Indexed: 11/22/2022]
Abstract
We have conducted genome sequence analyses of seven prokaryotic microorganisms for which completely sequenced genomes are available (Escherichia coli, Haemophilus influenzae, Helicobacter pylori, Bacillus subtilis, Mycoplasma genitalium, Synechocystis PCC6803 and Methanococcus jannaschii). We report the distribution of encoded known and putative polytopic cytoplasmic membrane transport proteins within these genomes. Transport systems for each organism were classified according to (1) putative membrane topology, (2) protein family, (3) bioenergetics, and (4) substrate specificities. The overall transport capabilities of each organism were thereby estimated. Probable function was assigned to greater than 90% of the putative transport proteins identified. The results show the following: (1) Numbers of transport systems in eubacteria are approximately proportional to genome size and correspond to 9.7 to 10.8% of the total encoded genes except for H. pylori (5.4%), Synechocystis (4.7%) and M. jannaschii (3.5%) which exhibit substantially lower proportions. (2) The distribution of topological types is similar in all seven organisms. (3) Transport systems belonging to 67 families were identified within the genomes of these organisms, and about half of these families are also found in eukaryotes. (4) 12% of these families are found exclusively in Gram-negative bacteria, but none is found exclusively in Gram-positive bacteria, cyanobacteria or archaea. (5) Two superfamilies, the ATP-binding cassette (ABC) and major facilitator (MF) superfamilies account for nearly 50% of all transporters in each organism, but the relative representation of these two transporter types varies over a tenfold range, depending on the organism. (6) Secondary, pmf-dependent carriers are 1.5 to threefold more prevalent than primary ATP-dependent carriers in E. coli, H. influenzae, H. pylori and B. subtilis while primary carriers are about twofold more prevalent in M. genitalium and Synechocystis. 
M. jannaschii exhibits a slight preference for secondary carriers. (7) Bioenergetics of transport generally correlate with the primary forms of energy generated via available metabolic pathways, but ecological niche and substrate availability may also be determining factors. (8) All organisms display a similar range of transport specificities, with quantitative differences presumably reflecting disparate ecological niches. (9) M. jannaschii and Synechocystis have a two- to threefold increased proportion of transporters for inorganic ions, with a concomitant decrease in transporters for organic compounds. (10) 6 to 18% of all transporters in these bacteria probably function as drug export systems, showing that these systems are prevalent in non-pathogenic as well as pathogenic organisms. (11) All seven prokaryotes examined encode proteins homologous to known channel proteins, but none of the channel types identified occurs in all of these organisms. (12) The phosphoenolpyruvate:sugar phosphotransferase system is prevalent in the large-genome organisms, E. coli and B. subtilis, and is present in the small-genome organisms, H. influenzae and M. genitalium, but is entirely absent from H. pylori, Synechocystis and M. jannaschii. Details of the information summarized in this article are available on our web sites, and this information will be periodically updated and corrected as new sequence and biochemical data become available.
Affiliation(s)
- I T Paulsen
- Department of Biology, University of California at San Diego, La Jolla, CA, 92093-0116, USA
24
Affiliation(s)
- P D Karp
- Pangea Systems Inc., Menlo Park, CA 94025, USA.
25
Karp PD, Riley M, Paley SM, Pellegrini-Toole A, Krummenacker M. EcoCyc: Encyclopedia of Escherichia coli genes and metabolism. Nucleic Acids Res 1998; 26:50-3. [PMID: 9399798 PMCID: PMC147256 DOI: 10.1093/nar/26.1.50] [Citation(s) in RCA: 42] [Impact Index Per Article: 1.6] [Indexed: 02/05/2023]
Abstract
The encyclopedia of Escherichia coli genes and metabolism (EcoCyc) is a database that combines information about the genome and the intermediary metabolism of E. coli. The database describes 3030 genes of E. coli, 695 enzymes encoded by a subset of these genes, 595 metabolic reactions that occur in E. coli, and the organization of these reactions into 123 metabolic pathways. The EcoCyc graphical user interface allows scientists to query and explore the EcoCyc database using visualization tools such as genomic-map browsers and automatic layouts of metabolic pathways. EcoCyc can be thought of as an electronic review article because of its copious references to the primary literature, and as a (qualitative) computational model of E. coli metabolism. EcoCyc is available at URL http://ecocyc.PangeaSystems.com/ecocyc/
Affiliation(s)
- P D Karp
- Artificial Intelligence Center, SRI International, 333 Ravenswood Avenue, Menlo Park, CA 94025, USA.
26
Karp PD, Riley M, Paley SM, Pellegrini-Toole A, Krummenacker M. EcoCyc: Encyclopedia of Escherichia coli Genes and Metabolism. Nucleic Acids Res 1997; 25:43-51. [PMID: 9016502 PMCID: PMC146379 DOI: 10.1093/nar/25.1.43] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.0] [Indexed: 02/03/2023]
Abstract
The Encyclopedia of E. coli Genes and Metabolism (EcoCyc) is a database that combines information about the genome and the intermediary metabolism of Escherichia coli. It describes 2970 genes of E. coli, 547 enzymes encoded by these genes, 702 metabolic reactions that occur in E. coli, and the organization of these reactions into 107 metabolic pathways. The EcoCyc graphical user interface allows scientists to query and explore the EcoCyc database using visualization tools such as genomic-map browsers and automatic layouts of metabolic pathways. EcoCyc spans sequence through function, allowing scientists to investigate an unusually broad range of questions. EcoCyc can be thought of both as an electronic review article, because of its copious references to the primary literature, and as an in silico model of E. coli metabolism that can be probed and analyzed computationally.
Affiliation(s)
- P D Karp
- Artificial Intelligence Center, SRI International, 333 Ravenswood Avenue, Menlo Park, CA 94025, USA.
27
Abstract
Several techniques are being introduced into the bioinformatics community to permit interoperation between molecular biology databases (DBs). The common factor to these approaches is the creation of links between entities in different DBs. Links can connect pieces of information about a single protein that are partitioned across multiple DBs, and can also encode relationships between different biological entities, such as relationships between an enzyme, its gene and its catalytic activity. This article provides an overview of the DB-interoperation problem, and offers several solutions. It discusses how links are used in molecular biology DBs, and describes the potential stumbling blocks when DB links are created and used.
Affiliation(s)
- P D Karp
- Artificial Intelligence Center, SRI International, Menlo Park, CA 94025, USA.