101
Haq IU, Graupner K, Nazir R, van Elsas JD. The genome of the fungal-interactive soil bacterium Burkholderia terrae BS001-a plethora of outstanding interactive capabilities unveiled. Genome Biol Evol 2014; 6:1652-68. [PMID: 24923325 PMCID: PMC4122924 DOI: 10.1093/gbe/evu126] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Indexed: 12/16/2022]
Abstract
Burkholderia terrae strain BS001, obtained as an inhabitant of the mycosphere of Laccaria proxima (a close relative of Lyophyllum sp. strain Karsten), actively interacts with Lyophyllum sp. strain Karsten. We here summarize the remarkable ecological behavior of B. terrae BS001 in the mycosphere and add key data to this. Moreover, we extensively analyze the approximately 11.5-Mb five-replicon genome of B. terrae BS001 and highlight its remarkable features. Seventy-nine regions of genomic plasticity (RGP), that is, 16.48% of the total genome size, were found. One 70.42-kb RGP, RGP76, revealed a typical conjugal element structure, including a full type 4 secretion system. Comparative analyses across 24 related Burkholderia genomes revealed that 95.66% of the total BS001 genome belongs to the variable part, whereas the remaining 4.34% constitutes the core genome. Genes for biofilm formation and several secretion systems, including a type 3 secretion system (T3SS), were found, which is consistent with the hypothesis that T3SSs play a role in the interaction with Lyophyllum sp. strain Karsten. The high number of predicted metabolic pathways and membrane transporters suggested that strain BS001 can take up and utilize a range of sugars, amino acids and organic acids. In particular, a unique glycerol uptake system was found. The BS001 genome further contains genetic systems for the degradation of complex organic compounds. Moreover, gene clusters encoding nonribosomal peptide synthetases (NRPS) and hybrid polyketide synthases/NRPS were found, highlighting the potential role of secondary metabolites in the ecology of strain BS001. The patchwork of genetic features observed in the genome is consistent with the notion that 1) horizontal gene transfer is a main driver of B. terrae BS001 adaptation and 2) the organism is very flexible in its ecological behavior in soil.
Affiliation(s)
- Irshad Ul Haq
- Department of Microbial Ecology, Center for Ecological and Evolutionary Studies (CEES), University of Groningen, The Netherlands
- Katharina Graupner
- Department of Biomolecular Chemistry, Leibniz Institute for Natural Product Research and Infection Biology, Hans Knöll Institute, Jena, Germany
- Rashid Nazir
- Department of Environmental Sciences, COMSATS Institute of Information Technology, Abbottabad, Pakistan
- Jan Dirk van Elsas
- Department of Microbial Ecology, Center for Ecological and Evolutionary Studies (CEES), University of Groningen, The Netherlands
102
Montague E, Stanberry L, Higdon R, Janko I, Lee E, Anderson N, Choiniere J, Stewart E, Yandl G, Broomall W, Kolker N, Kolker E. MOPED 2.5--an integrated multi-omics resource: multi-omics profiling expression database now includes transcriptomics data. OMICS: A Journal of Integrative Biology 2014; 18:335-43. [PMID: 24910945 PMCID: PMC4048574 DOI: 10.1089/omi.2014.0061] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Indexed: 12/31/2022]
Abstract
Multi-omics data-driven scientific discovery crucially rests on high-throughput technologies and data sharing. Currently, data are scattered across single omics repositories, stored in varying raw and processed formats, and are often accompanied by limited or no metadata. The Multi-Omics Profiling Expression Database (MOPED, http://moped.proteinspire.org) version 2.5 is a freely accessible multi-omics expression database. Continual improvement and expansion of MOPED is driven by feedback from the Life Sciences Community. In order to meet the emergent need for an integrated multi-omics data resource, MOPED 2.5 now includes gene relative expression data in addition to protein absolute and relative expression data from over 250 large-scale experiments. To facilitate accurate integration of experiments and increase reproducibility, MOPED provides extensive metadata through the Data-Enabled Life Sciences Alliance (DELSA Global, http://delsaglobal.org) metadata checklist. MOPED 2.5 has greatly increased the number of proteomics absolute and relative expression records to over 500,000, in addition to adding more than four million transcriptomics relative expression records. MOPED has an intuitive user interface with tabs for querying different types of omics expression data and new tools for data visualization. Summary information including expression data, pathway mappings, and direct connection between proteins and genes can be viewed on Protein and Gene Details pages. These connections in MOPED provide a context for multi-omics expression data exploration. Researchers are encouraged to submit omics data which will be consistently processed into expression summaries. MOPED as a multi-omics data resource is a pivotal public database, interdisciplinary knowledge resource, and platform for multi-omics understanding.
Affiliation(s)
- Elizabeth Montague
- Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Larissa Stanberry
- Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Roger Higdon
- Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Imre Janko
- High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Elaine Lee
- High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Nathaniel Anderson
- Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- John Choiniere
- Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Elizabeth Stewart
- Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Gregory Yandl
- Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- William Broomall
- High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Natali Kolker
- High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Eugene Kolker
- Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Departments of Biomedical Informatics and Medical Education and Pediatrics, University of Washington, Seattle, Washington
103
Kumar S, Shah N, Garg V, Bhatia S. Large scale in-silico identification and characterization of simple sequence repeats (SSRs) from de novo assembled transcriptome of Catharanthus roseus (L.) G. Don. Plant Cell Reports 2014; 33:905-918. [PMID: 24482265 DOI: 10.1007/s00299-014-1569-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 09/25/2013] [Revised: 12/17/2013] [Accepted: 01/09/2014] [Indexed: 06/03/2023]
Abstract
Transcriptomic data of C. roseus offer ample sequence resources for better insight into gene diversity, including a large resource of genic SSR markers to accelerate genomic studies and breeding in Catharanthus. Next-generation sequencing is an efficient system for generating high-throughput complete transcripts/genes and developing molecular markers. We present here the transcriptome sequencing of 26-day-old Catharanthus roseus seedling tissue using the Illumina GAIIx platform, which resulted in a total of 3.37 Gb of nucleotide sequence data comprising 29,964,104 reads that were de novo assembled into 26,581 unigenes. Based on similarity searches, 58 % of the unigenes were annotated, of which 13,580 unique transcripts were assigned 5,016 gene ontology terms. Further, 7,687 of the unigenes were found to have Clusters of Orthologous Groups classifications, and 4,006 were assigned to 289 Kyoto Encyclopedia of Genes and Genomes pathways. Also, 5,221 (19.64 %) of transcripts were distributed to 81 known transcription factor (TF) families. In-silico analysis of the transcriptome resulted in identification of 11,004 SSRs in 26.62 % of transcripts, from which 2,520 SSR markers were designed; these exhibited a non-random pattern of distribution. The most abundant were the trinucleotide repeats (AAG/CTT), followed by the dinucleotide repeats (AG/CT). Location-specific analysis revealed that SSRs were preferentially associated with the 5'-UTRs, with a predicted role in regulation of gene expression. A PCR validation of a set of 48 primers revealed 97.9 % successful amplification, and 76.6 % of them showed polymorphism across different Catharanthus species as well as accessions of C. roseus. In summary, this study provides insight into seedling development and resources for novel gene discovery and SSR development for use in marker-assisted selective breeding in C. roseus.
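As an illustration of what in-silico SSR identification involves, the following is a minimal sketch that scans a nucleotide sequence for perfect di- and trinucleotide repeats with a regular expression. The function name `find_ssrs` and the minimum repeat counts are assumptions chosen for the example; the abstract does not state which tool or thresholds the authors used.

```python
import re

# Minimum number of tandem copies per motif length.
# These thresholds are assumed for illustration, not taken from the paper.
MIN_REPEATS = {2: 6, 3: 5}


def find_ssrs(seq):
    """Return (start, motif, copies) for perfect di-/trinucleotide repeats."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # One captured motif followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                # Skip homopolymer runs such as AAAAAA, which are not
                # genuine di- or trinucleotide repeats.
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)
```

Calling `find_ssrs` on each assembled unigene and tallying the motifs would give a repeat-class frequency table of the kind summarized above; dedicated SSR tools additionally handle compound and imperfect repeats, which this sketch ignores.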
Affiliation(s)
- Santosh Kumar
- National Institute of Plant Genome Research, Aruna Asaf Ali Marg, PO Box 10531, New Delhi, 110067, India
104
Feltes BC, de Faria Poloni J, Nunes IJG, Bonatto D. Fetal alcohol syndrome, chemo-biology and OMICS: ethanol effects on vitamin metabolism during neurodevelopment as measured by systems biology analysis. OMICS: A Journal of Integrative Biology 2014; 18:344-63. [PMID: 24816220 DOI: 10.1089/omi.2013.0144] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022]
Abstract
Fetal alcohol syndrome (FAS) is a prenatal disease characterized by fetal morphological and neurological abnormalities originating from exposure to alcohol. Although FAS is a well-described pathology, the molecular mechanisms underlying its progression are virtually unknown. Moreover, alcohol abuse can affect vitamin metabolism and absorption, although how alcohol impairs such biochemical pathways remains to be elucidated. We employed a variety of systems chemo-biology tools to understand the interplay between ethanol metabolism and vitamins during mouse neurodevelopment. For this purpose, we designed interactomes and employed transcriptomic data analysis approaches to study the neural tissue of Mus musculus exposed to ethanol prenatally and postnatally, simulating conditions that could lead to FAS development at different life stages. Our results showed that FAS can promote early changes in neurotransmitter release and glutamate equilibrium, as well as an abnormal calcium influx that can lead to neuroinflammation and impaired neurodifferentiation, both extensively connected with vitamin action and metabolism. Genes related to retinoic acid, niacin, vitamin D, and folate metabolism were underexpressed during neurodevelopment and appear to contribute to neuroinflammation progression and impaired synapsis. Our results also indicate that genes coding for tubulin, tubulin-associated proteins, synapse plasticity proteins, and proteins related to neurodifferentiation are extensively affected by ethanol exposure. Finally, we developed a molecular model of how ethanol can affect vitamin metabolism and impair neurodevelopment.
Affiliation(s)
- Bruno César Feltes
- Centro de Biotecnologia da Universidade Federal do Rio Grande do Sul , Departamento de Biologia Molecular e Biotecnologia, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
105
Streptococcus pyogenes polymyxin B-resistant mutants display enhanced ExPortal integrity. J Bacteriol 2014; 196:2563-77. [PMID: 24794568 DOI: 10.1128/jb.01596-14] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 01/06/2023]
Abstract
The ExPortal protein secretion organelle in Streptococcus pyogenes is an anionic phospholipid-containing membrane microdomain enriched in Sec translocons and postsecretion protein biogenesis factors. Polymyxin B binds to and disrupts ExPortal integrity, resulting in defective secretion of several toxins. To gain insight into factors that influence ExPortal organization, a genetic screen was conducted to select for spontaneous polymyxin B-resistant mutants displaying enhanced ExPortal integrity. Whole-genome resequencing of 25 resistant mutants revealed from one to four mutations per mutant genome clustered primarily within a core set of 10 gene groups. Construction of mutants with individual deletions or insertions demonstrated that 7 core genes confer resistance and enhanced ExPortal integrity through loss of function, while 3 were likely due to gain of function and/or combinatorial effects. Core resistance genes include a transcriptional regulator of lipid biosynthesis, several genes involved in nutrient acquisition, and a variety of genes involved in stress responses. Two members of the latter class also function as novel regulators of the secreted SpeB cysteine protease. Analysis of the most frequently isolated mutation, a single nucleotide deletion in a tract of 9 consecutive adenine residues in pstS, encoding a component of a high-affinity Pi transporter, suggests that this sequence functions as a molecular switch to facilitate stress adaptation. Together, these data suggest the existence of a membrane stress response that promotes enhanced ExPortal integrity and resistance to cationic antimicrobial peptides.
106
Genomic features of a bumble bee symbiont reflect its host environment. Appl Environ Microbiol 2014; 80:3793-803. [PMID: 24747890 DOI: 10.1128/aem.00322-14] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Indexed: 12/31/2022]
Abstract
Here, we report the genome of one gammaproteobacterial member of the gut microbiota, for which we propose the name "Candidatus Schmidhempelia bombi," that was inadvertently sequenced alongside the genome of its host, the bumble bee, Bombus impatiens. This symbiont is a member of the recently described bacterial order Orbales, which has been collected from the guts of diverse insect species; however, "Ca. Schmidhempelia" has been identified exclusively with bumble bees. Metabolic reconstruction reveals that "Ca. Schmidhempelia" lacks many genes for a functioning NADH dehydrogenase I, all genes for the high-oxygen cytochrome o, and most genes in the tricarboxylic acid (TCA) cycle. "Ca. Schmidhempelia" has retained NADH dehydrogenase II, the low-oxygen specific cytochrome bd, anaerobic nitrate respiration, mixed-acid fermentation pathways, and citrate fermentation, which may be important for survival in low-oxygen or anaerobic environments found in the bee hindgut. Additionally, a type 6 secretion system, a Flp pilus, and many antibiotic/multidrug transporters suggest complex interactions with its host and other gut commensals or pathogens. This genome has signatures of reduction (2.0 megabase pairs) and rearrangement, as previously observed for genomes of host-associated bacteria. A survey of wild and laboratory B. impatiens revealed that "Ca. Schmidhempelia" is present in 90% of individuals and, therefore, may provide benefits to its host.
107
Glass K, Girvan M. Annotation enrichment analysis: an alternative method for evaluating the functional properties of gene sets. Sci Rep 2014; 4:4191. [PMID: 24569707 PMCID: PMC3935204 DOI: 10.1038/srep04191] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.3] [Received: 04/16/2013] [Accepted: 01/28/2014] [Indexed: 12/18/2022]
Abstract
Gene annotation databases (compendiums maintained by the scientific community that describe the biological functions performed by individual genes) are commonly used to evaluate the functional properties of experimentally derived gene sets. Overlap statistics, such as Fisher's exact test (FET), are often employed to assess these associations, but do not account for non-uniformity in the number of genes annotated to individual functions or the number of functions associated with individual genes. We find FET is strongly biased toward over-estimating overlap significance if a gene set has an unusually high number of annotations. To correct for these biases, we develop Annotation Enrichment Analysis (AEA), which properly accounts for the non-uniformity of annotations. We show that AEA is able to identify biologically meaningful functional enrichments that are obscured by numerous false-positive enrichment scores in FET, and we therefore suggest it be used to more accurately assess the biological properties of gene sets.
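For readers unfamiliar with the overlap statistic that the abstract critiques, the following is a minimal sketch of the standard one-sided Fisher's exact test for gene-set overlap, computed as a hypergeometric tail probability. It illustrates FET only, not the authors' AEA method; the function and parameter names are assumptions made for this example.

```python
from math import comb


def fisher_overlap_pvalue(universe, annotated, gene_set, overlap):
    """One-sided enrichment p-value P(X >= overlap).

    universe  -- total number of genes considered
    annotated -- genes annotated to the function of interest
    gene_set  -- size of the experimentally derived gene set
    overlap   -- genes in the set that carry the annotation

    X follows a hypergeometric distribution: the gene set is treated as a
    random draw of `gene_set` genes, each gene equally likely to be chosen.
    """
    upper = min(annotated, gene_set)
    total = comb(universe, gene_set)
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, gene_set - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

The equal-likelihood assumption in the docstring is the point at issue: the test treats every gene as interchangeable, whereas the abstract argues that genes and functions differ widely in how many annotations they carry.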
Affiliation(s)
- Kimberly Glass
- [1] Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA [2] Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA [3] Department of Physics, University of Maryland, College Park, MD, USA
- Michelle Girvan
- [1] Department of Physics, University of Maryland, College Park, MD, USA [2] Institute for Physical Science and Technology, University of Maryland, College Park, MD, USA [3] Santa Fe Institute, Santa Fe, NM
108
Macklin DN, Ruggero NA, Covert MW. The future of whole-cell modeling. Curr Opin Biotechnol 2014; 28:111-5. [PMID: 24556244 DOI: 10.1016/j.copbio.2014.01.012] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Received: 09/30/2013] [Revised: 01/19/2014] [Accepted: 01/20/2014] [Indexed: 12/21/2022]
Abstract
Integrated whole-cell modeling is poised to make a dramatic impact on molecular and systems biology, bioengineering, and medicine, once certain obstacles are overcome. From our group's experience building a whole-cell model of Mycoplasma genitalium, we identified several significant challenges to building models of more complex cells. Here we review and discuss these challenges in seven areas: first, experimental interrogation; second, data curation; third, model building and integration; fourth, accelerated computation; fifth, analysis and visualization; sixth, model validation; and seventh, collaboration and community development. Surmounting these challenges will require the cooperation of an interdisciplinary group of researchers to create increasingly sophisticated whole-cell models and make data, models, and simulations more accessible to the wider community.
Affiliation(s)
- Derek N Macklin
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Nicholas A Ruggero
- Department of Chemical Engineering, Stanford University, Stanford, CA, USA
- Markus W Covert
- Department of Bioengineering, Stanford University, Stanford, CA, USA.
109
Watson E, MacNeil LT, Ritter AD, Yilmaz LS, Rosebrock AP, Caudy AA, Walhout AJM. Interspecies systems biology uncovers metabolites affecting C. elegans gene expression and life history traits. Cell 2014; 156:759-70. [PMID: 24529378 PMCID: PMC4169190 DOI: 10.1016/j.cell.2014.01.047] [Citation(s) in RCA: 143] [Impact Index Per Article: 14.3] [Received: 08/06/2013] [Revised: 10/09/2013] [Accepted: 01/09/2014] [Indexed: 01/07/2023]
Abstract
Diet greatly influences gene expression and physiology. In mammals, elucidating the effects and mechanisms of individual nutrients is challenging due to the complexity of both the animal and its diet. Here, we used an interspecies systems biology approach with Caenorhabditis elegans and two of its bacterial diets, Escherichia coli and Comamonas aquatica, to identify metabolites that affect the animal's gene expression and physiology. We identify vitamin B12 as the major dilutable metabolite provided by Comamonas aq. that regulates gene expression, accelerates development, and reduces fertility but does not affect lifespan. We find that vitamin B12 has a dual role in the animal: it affects development and fertility via the methionine/S-Adenosylmethionine (SAM) cycle and breaks down the short-chain fatty acid propionic acid, preventing its toxic buildup. Our interspecies systems biology approach provides a paradigm for understanding complex interactions between diet and physiology.
Affiliation(s)
- Emma Watson
- Program in Systems Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA; Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA
- Lesley T MacNeil
- Program in Systems Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA; Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA
- Ashlyn D Ritter
- Program in Systems Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA; Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA
- L Safak Yilmaz
- Program in Systems Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA; Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA
- Adam P Rosebrock
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto M5S 3E1, Canada; Department of Molecular Genetics, University of Toronto, Toronto M5S 3E1, Canada
- Amy A Caudy
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto M5S 3E1, Canada; Department of Molecular Genetics, University of Toronto, Toronto M5S 3E1, Canada
- Albertha J M Walhout
- Program in Systems Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA; Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA.
110
Benedict MN, Henriksen JR, Metcalf WW, Whitaker RJ, Price ND. ITEP: an integrated toolkit for exploration of microbial pan-genomes. BMC Genomics 2014; 15:8. [PMID: 24387194 PMCID: PMC3890548 DOI: 10.1186/1471-2164-15-8] [Citation(s) in RCA: 78] [Impact Index Per Article: 7.8] [Received: 08/09/2013] [Accepted: 12/18/2013] [Indexed: 01/31/2023]
Abstract
Background: Comparative genomics is a powerful approach for studying variation in physiological traits as well as the evolution and ecology of microorganisms. Recent technological advances have enabled sequencing large numbers of related genomes in a single project, requiring computational tools for their integrated analysis. In particular, accurate annotations and identification of gene presence and absence are critical for understanding and modeling the cellular physiology of newly sequenced genomes. Although many tools are available to compare the gene contents of related genomes, new tools are necessary to enable close examination and curation of protein families from large numbers of closely related organisms, to integrate curation with the analysis of gain and loss, and to generate metabolic networks linking the annotations to observed phenotypes.
Results: We have developed ITEP, an Integrated Toolkit for Exploration of microbial Pan-genomes, to curate protein families, compute similarities to externally-defined domains, analyze gene gain and loss, and generate draft metabolic networks from one or more curated reference network reconstructions in groups of related microbial species among which the combination of core and variable genes constitutes their "pan-genomes". The ITEP toolkit consists of: (1) a series of modular command-line scripts for identification, comparison, curation, and analysis of protein families and their distribution across many genomes; (2) a set of Python libraries for programmatic access to the same data; and (3) pre-packaged scripts to perform common analysis workflows on a collection of genomes. ITEP's capabilities include de novo protein family prediction, ortholog detection, analysis of functional domains, identification of core and variable genes and gene regions, sequence alignments and tree generation, annotation curation, and the integration of cross-genome analysis and metabolic networks for study of metabolic network evolution.
Conclusions: ITEP is a powerful, flexible toolkit for generation and curation of protein families. ITEP's modular design allows for straightforward extension as analysis methods and tools evolve. By integrating comparative genomics with the development of draft metabolic networks, ITEP harnesses the power of comparative genomics to build confidence in links between genotype and phenotype and helps disambiguate gene annotations when they are evaluated in both evolutionary and metabolic network contexts.
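The core/variable distinction mentioned in this abstract can be sketched with plain set operations. The example below assumes each genome has already been reduced to a set of protein-family identifiers; the function name and input layout are hypothetical and are not part of ITEP's actual scripts or Python libraries.

```python
def split_core_variable(genomes):
    """Split protein families into core and variable sets.

    genomes -- dict mapping a genome name to the set of protein-family
               identifiers found in that genome (hypothetical input format).
    Returns (core, variable): families present in every genome, and
    families present in at least one genome but not in all of them.
    """
    families = list(genomes.values())
    if not families:
        return set(), set()
    pan = set().union(*families)          # pan-genome: every family observed
    core = set.intersection(*families)    # core: shared by all genomes
    return core, pan - core
```

With three toy genomes `{"A": {"f1", "f2", "f3"}, "B": {"f1", "f2"}, "C": {"f1", "f4"}}`, only `f1` is core and the remaining families are variable. Real pipelines spend most of their effort on the step this sketch takes for granted, namely clustering genes into families.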
Affiliation(s)
- Nathan D Price
- Institute for Systems Biology, 401 Terry Ave N, Seattle, WA 98109, USA.
111
Dreher K. Putting The Plant Metabolic Network pathway databases to work: going offline to gain new capabilities. Methods Mol Biol 2014; 1083:151-71. [PMID: 24218215 DOI: 10.1007/978-1-62703-661-0_10] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Indexed: 04/25/2023]
Abstract
Metabolic databases such as The Plant Metabolic Network/MetaCyc and KEGG PATHWAY are publicly accessible resources providing organism-specific information on reactions and metabolites. KEGG PATHWAY depicts metabolic networks as wired, electronic circuit-like maps, whereas the MetaCyc family of databases uses a canonical textbook-like representation. The first MetaCyc-based database for a plant species was AraCyc, which describes metabolism in the model plant Arabidopsis. This database was created over 10 years ago and has since then undergone extensive manual curation to reflect updated information on enzymes and pathways in Arabidopsis. This chapter describes accessing and using AraCyc and its underlying Pathway Tools software. Specifically, methods for (1) navigating Pathway Tools, (2) visualizing omics data and superimposing the data on a metabolic pathway map, and (3) creating pathways and pathway components are discussed.
Affiliation(s)
- Kate Dreher
- Carnegie Institution for Science, Palo Alto, CA, USA
112
Heath BS, Marshall MJ, Laskin J. The characterization of living bacterial colonies using nanospray desorption electrospray ionization mass spectrometry. Methods Mol Biol 2014; 1151:199-208. [PMID: 24838888 DOI: 10.1007/978-1-4939-0554-6_14] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/03/2023]
Abstract
Nanospray desorption electrospray ionization (nano-DESI) coupled with high-resolution mass spectrometry (MS) and tandem mass spectrometry (MS/MS) enables detailed molecular characterization of living bacterial colonies directly from nutrient agar. The ability to detect molecular signatures of living microbial communities is important for investigating metabolic exchange between species without affecting the viability of the colonies. We describe the protocol for bacterial growth, sample preparation, ambient profiling, and data analysis of microbial communities using nano-DESI MS.
Affiliation(s)
- Brandi S Heath
- Physical Sciences Division, Pacific Northwest National Laboratory, 999, MSIN K8-88, Richland, WA, 99352, USA
113
Somerville GA, Powers R. Growth and preparation of Staphylococcus epidermidis for NMR metabolomic analysis. Methods Mol Biol 2014; 1106:71-91. [PMID: 24222456 DOI: 10.1007/978-1-62703-736-5_6] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 01/28/2023]
Abstract
The "omics" era began with transcriptomics and this progressed into proteomics. While useful, these approaches provide only circumstantial information about carbon flow, metabolic status, redox poise, etc. To more directly address these metabolic concerns, researchers have turned to the emerging field of metabolomics. In our laboratories, we frequently use NMR metabolomics to acquire a snapshot of bacterial metabolomes during stressful or transition events. Irrespective of the "omics" method of choice, the experimental outcome depends on the proper cultivation and preparation of bacterial samples. In addition, the integration of these large datasets requires that these cultivation conditions be clearly defined.
Affiliation(s)
- Greg A Somerville
- School of Veterinary Medicine and Biomedical Sciences, University of Nebraska-Lincoln, Lincoln, NE, USA
|
114
|
Oakeson KF, Gil R, Clayton AL, Dunn DM, von Niederhausern AC, Hamil C, Aoyagi A, Duval B, Baca A, Silva FJ, Vallier A, Jackson DG, Latorre A, Weiss RB, Heddi A, Moya A, Dale C. Genome degeneration and adaptation in a nascent stage of symbiosis. Genome Biol Evol 2014; 6:76-93. [PMID: 24407854 PMCID: PMC3914690 DOI: 10.1093/gbe/evt210] [Citation(s) in RCA: 134] [Impact Index Per Article: 13.4] [Indexed: 02/07/2023] Open
Abstract
Symbiotic associations between animals and microbes are ubiquitous in nature, with an estimated 15% of all insect species harboring intracellular bacterial symbionts. Most bacterial symbionts share many genomic features including small genomes, nucleotide composition bias, high coding density, and a paucity of mobile DNA, consistent with long-term host association. In this study, we focus on the early stages of genome degeneration in a recently derived insect-bacterial mutualistic intracellular association. We present the complete genome sequence and annotation of Sitophilus oryzae primary endosymbiont (SOPE). We also present the finished genome sequence and annotation of strain HS, a close free-living relative of SOPE and other insect symbionts of the Sodalis-allied clade, whose gene inventory is expected to closely resemble that of the putative ancestor of this group. Structural, functional, and evolutionary analyses indicate that SOPE has undergone extensive adaptation toward an insect-associated lifestyle in a very short time period. The genome of SOPE is large when compared with those of many ancient bacterial symbionts; however, almost half of the protein-coding genes in SOPE are pseudogenes. There is also evidence for relaxed selection on the remaining intact protein-coding genes. Comparative analyses of the whole-genome sequences of strain HS and SOPE highlight numerous genomic rearrangements, duplications, and deletions facilitated by a recent expansion of insertion sequence elements, some of which appear to have catalyzed adaptive changes. Functional metabolic predictions suggest that SOPE has lost the ability to synthesize several essential amino acids and vitamins. Analyses of the bacterial cell envelope and genes encoding secretion systems suggest that these structures and elements have become simplified in the transition to a mutualistic association.
Affiliation(s)
- Kelly F. Oakeson
- Department of Biology, University of Utah
- *Corresponding author: E-mail:
- Rosario Gil
- Institut Cavanilles de Biodiversitat i Biologia Evolutiva, Universitat de València, Spain
- Cindy Hamil
- Department of Human Genetics, University of Utah
- Alex Aoyagi
- Department of Human Genetics, University of Utah
- Brett Duval
- Department of Human Genetics, University of Utah
- Francisco J. Silva
- Institut Cavanilles de Biodiversitat i Biologia Evolutiva, Universitat de València, Spain
- Agnès Vallier
- INSA-Lyon, INRA, UMR203 BF2I, Biologie Fonctionnelle Insectes et Interactions, Villeurbanne, France
- Amparo Latorre
- Institut Cavanilles de Biodiversitat i Biologia Evolutiva, Universitat de València, Spain
- Área de Genómica y Salud, Fundación para el Fomento de la Investigación Sanitaria y Biomédica de la Comunitat Valenciana FISABIO – Salud Pública, Valencia, Spain
- Abdelaziz Heddi
- INSA-Lyon, INRA, UMR203 BF2I, Biologie Fonctionnelle Insectes et Interactions, Villeurbanne, France
- Andrés Moya
- Institut Cavanilles de Biodiversitat i Biologia Evolutiva, Universitat de València, Spain
- Área de Genómica y Salud, Fundación para el Fomento de la Investigación Sanitaria y Biomédica de la Comunitat Valenciana FISABIO – Salud Pública, Valencia, Spain
- Colin Dale
- Department of Biology, University of Utah
|
115
|
Ngounou Wetie AG, Sokolowska I, Woods AG, Roy U, Deinhardt K, Darie CC. Protein-protein interactions: switch from classical methods to proteomics and bioinformatics-based approaches. Cell Mol Life Sci 2014; 71:205-28. [PMID: 23579629 PMCID: PMC11113707 DOI: 10.1007/s00018-013-1333-1] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.9] [Received: 12/08/2012] [Revised: 03/25/2013] [Accepted: 03/26/2013] [Indexed: 11/28/2022]
Abstract
Following the sequencing of the human genome and many other organisms, research on protein-coding genes and their functions (functional genomics) has intensified. Subsequently, with the observation that proteins are indeed the molecular effectors of most cellular processes, the discipline of proteomics was born. Clearly, proteins do not function as single entities but rather as a dynamic network of team players that have to communicate. Though genetic (yeast two-hybrid Y2H) and biochemical methods (co-immunoprecipitation Co-IP, affinity purification AP) were the methods of choice at the beginning of the study of protein-protein interactions (PPI), in more recent years there has been a shift towards proteomics-based methods and bioinformatics-based approaches. In this review, we first describe in depth PPIs and we make a strong case as to why unraveling the interactome is the next challenge in the field of proteomics. Furthermore, classical methods of investigation of PPIs and structure-based bioinformatics approaches are presented. The greatest emphasis is placed on proteomic methods, especially native techniques that were recently developed and that have been shown to be reliable. Finally, we point out the limitations of these methods and the need to set up a standard for the validation of PPI experiments.
Affiliation(s)
- Armand G. Ngounou Wetie
- Department of Chemistry and Biomolecular Science, Biochemistry and Proteomics Group, Clarkson University, 8 Clarkson Avenue, Potsdam, NY 13699-5810 USA
- Izabela Sokolowska
- Department of Chemistry and Biomolecular Science, Biochemistry and Proteomics Group, Clarkson University, 8 Clarkson Avenue, Potsdam, NY 13699-5810 USA
- Alisa G. Woods
- Department of Chemistry and Biomolecular Science, Biochemistry and Proteomics Group, Clarkson University, 8 Clarkson Avenue, Potsdam, NY 13699-5810 USA
- Urmi Roy
- Department of Chemistry and Biomolecular Science, Biochemistry and Proteomics Group, Clarkson University, 8 Clarkson Avenue, Potsdam, NY 13699-5810 USA
- Katrin Deinhardt
- Centre for Biological Sciences, University of Southampton, Life Sciences Building 85, Southampton, SO17 1BJ UK
- Institute for Life Sciences, University of Southampton, Life Sciences Building 85, Southampton, SO17 1BJ UK
- Costel C. Darie
- Department of Chemistry and Biomolecular Science, Biochemistry and Proteomics Group, Clarkson University, 8 Clarkson Avenue, Potsdam, NY 13699-5810 USA
|
116
|
Medina S, Domínguez-Perles R, Ferreres F, Tomás-Barberán FA, Gil-Izquierdo Á. The effects of the intake of plant foods on the human metabolome. Trends Analyt Chem 2013. [DOI: 10.1016/j.trac.2013.08.002] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 10/26/2022]
|
117
|
Kuhn M, Szklarczyk D, Pletscher-Frankild S, Blicher TH, von Mering C, Jensen LJ, Bork P. STITCH 4: integration of protein-chemical interactions with user data. Nucleic Acids Res 2013; 42:D401-7. [PMID: 24293645 PMCID: PMC3964996 DOI: 10.1093/nar/gkt1207] [Citation(s) in RCA: 309] [Impact Index Per Article: 28.1] [Indexed: 12/27/2022] Open
Abstract
STITCH is a database of protein-chemical interactions that integrates many sources of experimental and manually curated evidence with text-mining information and interaction predictions. Available at http://stitch.embl.de, the resulting interaction network includes 390 000 chemicals and 3.6 million proteins from 1133 organisms. Compared with the previous version, the number of high-confidence protein-chemical interactions in human has increased by 45%, to 367 000. In this version, we added features for users to upload their own data to STITCH in the form of internal identifiers, chemical structures or quantitative data. For example, a user can now upload a spreadsheet with screening hits to easily check which interactions are already known. To increase the coverage of STITCH, we expanded the text mining to include full-text articles and added a prediction method based on chemical structures. We further changed our scheme for transferring interactions between species to rely on orthology rather than protein similarity. This improves the performance within protein families, where scores are now transferred only to orthologous proteins, but not to paralogous proteins. STITCH can be accessed with a web-interface, an API and downloadable files.
Affiliation(s)
- Michael Kuhn
- Biotechnology Center, TU Dresden, 01062 Dresden, Germany, Institute of Molecular Life Sciences, University of Zurich and Swiss Institute of Bioinformatics, Winterthurerstrasse 190, 8057 Zurich, Switzerland, Novo Nordisk Foundation Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany and Max-Delbrück-Centre for Molecular Medicine, Robert-Rössle-Strasse 10, 13092 Berlin, Germany
|
118
|
Hamilton JJ, Reed JL. Software platforms to facilitate reconstructing genome-scale metabolic networks. Environ Microbiol 2013; 16:49-59. [PMID: 24148076 DOI: 10.1111/1462-2920.12312] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.6] [Received: 08/17/2013] [Accepted: 10/12/2013] [Indexed: 12/24/2022]
Abstract
System-level analyses of microbial metabolism are facilitated by genome-scale reconstructions of microbial biochemical networks. A reconstruction provides a structured representation of the biochemical transformations occurring within an organism, as well as the genes necessary to carry out these transformations, as determined by the annotated genome sequence and experimental data. Network reconstructions also serve as platforms for constraint-based computational techniques, which facilitate biological studies in a variety of applications, including evaluation of network properties, metabolic engineering and drug discovery. Bottom-up metabolic network reconstructions have been developed for dozens of organisms, but until recently, the pace of reconstruction has failed to keep up with advances in genome sequencing. To address this problem, a number of software platforms have been developed to automate parts of the reconstruction process, thereby alleviating much of the manual effort previously required. Here, we review four such platforms in the context of established guidelines for network reconstruction. While many steps of the reconstruction process have been successfully automated, some manual evaluation of the results is still required to ensure a high-quality reconstruction. Widespread adoption of these platforms by the scientific community is underway and will be further enabled by exchangeable formats across platforms.
Affiliation(s)
- Joshua J Hamilton
- Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, WI, 53706, USA
|
119
|
Rodrigues A, Formas-Oliveira A, Bandeira V, Alves P, Hu W, Coroadinha A. Metabolic pathways recruited in the production of a recombinant enveloped virus: Mining targets for process and cell engineering. Metab Eng 2013; 20:131-45. [DOI: 10.1016/j.ymben.2013.10.001] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 04/26/2013] [Revised: 07/22/2013] [Accepted: 10/03/2013] [Indexed: 11/27/2022]
|
120
|
|
121
|
Milreu PV, Klein CC, Cottret L, Acuña V, Birmelé E, Borassi M, Junot C, Marchetti-Spaccamela A, Marino A, Stougie L, Jourdan F, Crescenzi P, Lacroix V, Sagot MF. Telling metabolic stories to explore metabolomics data: a case study on the yeast response to cadmium exposure. Bioinformatics 2013; 30:61-70. [PMID: 24167155 PMCID: PMC3866556 DOI: 10.1093/bioinformatics/btt597] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 01/30/2023]
Abstract
Motivation: The increasing availability of metabolomics data makes it possible to better understand the metabolic processes involved in the immediate response of an organism to environmental changes and stress. The data usually come in the form of a list of metabolites whose concentrations significantly changed under some conditions, and are thus not easy to interpret without being able to precisely visualize how such metabolites are interconnected.
Results: We present a method that organizes the data from any metabolomics experiment into metabolic stories. Each story corresponds to a possible scenario explaining the flow of matter between the metabolites of interest. These scenarios may then be ranked in different ways depending on which interpretation one wishes to emphasize for the causal link between two affected metabolites: enzyme activation, enzyme inhibition or domino effect on the concentration changes of substrates and products. Equally probable stories under any selected ranking scheme can be further grouped into a single anthology that summarizes, in a unique subnetwork, all equivalently plausible alternative stories. An anthology is simply a union of such stories. We detail an application of the method to the response of yeast to cadmium exposure. We use this system as a proof of concept for our method, and we show that we are able to find a story that reproduces very well the current knowledge about the yeast response to cadmium. We further show that this response is mostly based on enzyme activation. We also provide a framework for exploring the alternative pathways or side effects this local response is expected to have in the rest of the network. We discuss several interpretations for the changes we see, and we suggest hypotheses that could in principle be experimentally tested. Notably, our method requires simple input data and could be used in a wide variety of applications.
Availability and implementation: The code for the method presented in this article is available at http://gobbolino.gforge.inria.fr. Contact: pvmilreu@gmail.com; vincent.lacroix@univ-lyon1.fr; marie-france.sagot@inria.fr Supplementary information: Supplementary data are available at Bioinformatics online.
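The statement that an anthology is simply a union of stories can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes, for illustration only, that a story can be reduced to a set of directed arcs between metabolites, and the names `Edge`, `Story`, `anthology` and the toy metabolites are hypothetical. Ranking of stories is not modeled; the input is taken to be stories already judged equally plausible.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# Assumption: a directed arc is (source metabolite, target metabolite).
Edge = Tuple[str, str]
# Assumption: a story is reduced here to the set of arcs it contains.
Story = FrozenSet[Edge]


def anthology(stories: Iterable[Story]) -> Set[Edge]:
    """Return the union of the arcs of all given stories."""
    merged: Set[Edge] = set()
    for story in stories:
        merged |= story
    return merged


# Hypothetical toy stories over made-up metabolite names.
story_a: Story = frozenset({("A", "B"), ("B", "C")})
story_b: Story = frozenset({("A", "C"), ("B", "C")})

print(sorted(anthology([story_a, story_b])))
```

Arcs shared by several stories appear once in the result, which is why the anthology can summarize all alternatives in a single subnetwork.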
Affiliation(s)
- Paulo Vieira Milreu
- INRIA Grenoble Rhône-Alpes & Université de Lyon, F-69000 Lyon, Université Lyon 1; CNRS, UMR5558 LBBE, France, Laboratório Nacional de Computação Científica (LNCC), Petrópolis, Brazil, LISBP, UMR CNRS 5504 - INRA 792, Toulouse, France, Mathomics, Center for Mathematical Modeling (UMI-2807 CNRS) and Center for Genome Regulation (Fondap 15090007), University of Chile, Santiago, Chile Lab. Statistique et Génome, CNRS UMR8071 INRA1152, Université d'Évry, France, Scuola Normale Superiore, 56126 Pisa, Italy, Laboratoire d'Etude du Métabolisme des Médicaments, DSV/iBiTecS/SPI, CEA/Saclay, 91191 Gif-sur-Yvette, France, La Sapienza University of Rome, Rome, Dipartimento di Sistemi e Informatica, Università di Firenze, I-50134 Firenze, Italy, VU University and CWI, Amsterdam, The Netherlands and INRA UMR1331 - Toxalim, Toulouse, France
|
122
|
Percudani R, Carnevali D, Puggioni V. Ureidoglycolate hydrolase, amidohydrolase, lyase: how errors in biological databases are incorporated in scientific papers and vice versa. Database (Oxford) 2013; 2013:bat071. [PMID: 24107613 PMCID: PMC3793230 DOI: 10.1093/database/bat071] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 11/24/2022]
Abstract
An opaque biochemical definition, an insufficient functional characterization, an interpolated database description, and a beautiful 3D structure with a wrong reaction. All these are elements of an exemplar case of misannotation in biological databases and confusion in the scientific literature concerning genes and enzymes acting on ureidoglycolate, an intermediate of purine catabolism. Here we show biochemical evidence for the relocation of genes assigned to EC 3.5.3.19 (ureidoglycolate hydrolase, releasing ammonia), such as allA of Escherichia coli or DAL3 of Saccharomyces cerevisiae, to EC 4.3.2.3 (ureidoglycolate lyase, releasing urea). The EC 3.5.3.19 should be more appropriately named ureidoglycolate amidohydrolase and include genes equivalent to UAH of Arabidopsis thaliana. The distinction between ammonia- or urea-releasing activities from ureidoglycolate is relevant for the understanding of nitrogen metabolism in various organisms and of virulence factors in certain pathogens rather than a nomenclature problem. We trace the original fault in database annotation and provide a rationale for its incorporation and persistence in the scientific literature. Notwithstanding the technological distance, yet not surprising for the constancy of human nature, error categories and mechanisms established in the study of the work of amanuensis monks still apply to the modern curation of biological databases.
Affiliation(s)
- Riccardo Percudani
- Department of Life Sciences, Laboratory of Biochemistry, Molecular Biology and Bioinformatics, University of Parma, Italy
|
123
|
Kuperstein I, Cohen DPA, Pook S, Viara E, Calzone L, Barillot E, Zinovyev A. NaviCell: a web-based environment for navigation, curation and maintenance of large molecular interaction maps. BMC Syst Biol 2013; 7:100. [PMID: 24099179 PMCID: PMC3851986 DOI: 10.1186/1752-0509-7-100] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Received: 12/26/2012] [Accepted: 09/20/2013] [Indexed: 11/24/2022]
Abstract
Background: Molecular biology knowledge can be formalized and systematically represented in a computer-readable form as a comprehensive map of molecular interactions. An increasing number of maps of molecular interactions contain detailed, step-wise descriptions of various cell mechanisms. It is difficult to explore these large maps, to organize discussion of their content and to maintain them. Several efforts were recently made to combine these capabilities in one environment, and NaviCell is one of them.
Results: NaviCell is a web-based environment for exploiting large maps of molecular interactions, created in CellDesigner, allowing their easy exploration, curation and maintenance. It is characterized by a combination of three essential features: (1) efficient map browsing based on Google Maps; (2) semantic zooming for viewing different levels of detail or of abstraction of the map; and (3) an integrated web-based blog for collecting community feedback. NaviCell can be easily used by experts in the field of molecular biology for studying molecular entities of interest in the context of signaling pathways and crosstalk between pathways within a global signaling network. NaviCell allows both exploration of detailed molecular mechanisms represented on the map and a more abstract view of the map up to a top-level modular representation. NaviCell greatly facilitates curation, maintenance and updating of the comprehensive maps of molecular interactions in an interactive and user-friendly fashion thanks to an embedded blogging system.
Conclusions: NaviCell provides user-friendly exploration of large-scale maps of molecular interactions, thanks to Google Maps and WordPress interfaces, with which many users are already familiar. Semantic zooming, which is used for navigating geographical maps, is adopted for molecular maps in NaviCell, making any level of visualization readable. In addition, NaviCell provides a framework for community-based curation of maps.
|
124
|
Carbonetto P, Stephens M. Integrated enrichment analysis of variants and pathways in genome-wide association studies indicates central role for IL-2 signaling genes in type 1 diabetes, and cytokine signaling genes in Crohn's disease. PLoS Genet 2013; 9:e1003770. [PMID: 24098138 PMCID: PMC3789883 DOI: 10.1371/journal.pgen.1003770] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.1] [Received: 08/21/2012] [Accepted: 07/22/2013] [Indexed: 12/17/2022] Open
Abstract
Pathway analyses of genome-wide association studies aggregate information over sets of related genes, such as genes in common pathways, to identify gene sets that are enriched for variants associated with disease. We develop a model-based approach to pathway analysis, and apply this approach to data from the Wellcome Trust Case Control Consortium (WTCCC) studies. Our method offers several benefits over existing approaches. First, our method not only interrogates pathways for enrichment of disease associations, but also estimates the level of enrichment, which yields a coherent way to promote variants in enriched pathways, enhancing discovery of genes underlying disease. Second, our approach allows for multiple enriched pathways, a feature that leads to novel findings in two diseases where the major histocompatibility complex (MHC) is a major determinant of disease susceptibility. Third, by modeling disease as the combined effect of multiple markers, our method automatically accounts for linkage disequilibrium among variants. Interrogation of pathways from eight pathway databases yields strong support for enriched pathways, indicating links between Crohn's disease (CD) and cytokine-driven networks that modulate immune responses; between rheumatoid arthritis (RA) and "Measles" pathway genes involved in immune responses triggered by measles infection; and between type 1 diabetes (T1D) and IL2-mediated signaling genes. Prioritizing variants in these enriched pathways yields many additional putative disease associations compared to analyses without enrichment. For CD and RA, 7 of 8 additional non-MHC associations are corroborated by other studies, providing validation for our approach. For T1D, prioritization of IL-2 signaling genes yields strong evidence for 7 additional non-MHC candidate disease loci, as well as suggestive evidence for several more. 
Of the 7 strongest associations, 4 are validated by other studies, and 3 (near IL-2 signaling genes RAF1, MAPK14, and FYN) constitute novel putative T1D loci for further study.
Affiliation(s)
- Peter Carbonetto
- Dept. of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Matthew Stephens
- Dept. of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Dept. of Statistics, University of Chicago, Chicago, Illinois, United States of America
|
125
|
Demir E, Babur Ö, Rodchenkov I, Aksoy BA, Fukuda KI, Gross B, Sümer OS, Bader GD, Sander C. Using biological pathway data with paxtools. PLoS Comput Biol 2013; 9:e1003194. [PMID: 24068901 PMCID: PMC3777916 DOI: 10.1371/journal.pcbi.1003194] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.8] [Received: 01/21/2013] [Accepted: 06/25/2013] [Indexed: 11/18/2022] Open
Abstract
A rapidly growing corpus of formal, computable pathway information can be used to answer important biological questions including finding non-trivial connections between cellular processes, identifying significantly altered portions of the cellular network in a disease state and building predictive models that can be used for precision medicine. Due to its complexity and fragmented nature, however, working with pathway data is still difficult. We present Paxtools, a Java library that contains algorithms, software components and converters for biological pathways represented in the standard BioPAX language. Paxtools allows scientists to focus on their scientific problem by removing technical barriers to accessing and analysing pathway information. Paxtools can run on any platform that has a Java Runtime Environment and was tested on most modern operating systems. Paxtools is open source and is available under the GNU Lesser General Public License (LGPL), which allows users to freely use the code in their software systems with a requirement for attribution. Source code for the current release (4.2.0) can be found in Software S1. A detailed manual for obtaining and using Paxtools can be found in Protocol S1. The latest sources and release bundles can be obtained from biopax.org/paxtools.
Affiliation(s)
- Emek Demir
- Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
- * E-mail:
- Özgün Babur
- Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
- Igor Rodchenkov
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Bülent Arman Aksoy
- Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
- Tri-Institutional Training Program, Computational Biology and Medicine New York, New York, United States of America
- Ken I. Fukuda
- Intelligent Information Infrastructure Division, National Institute of Advanced Industrial Science and Technology, Tsukuba, Japan
- Benjamin Gross
- Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
- Onur Selçuk Sümer
- Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
- Gary D. Bader
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Chris Sander
- Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
|
126
|
Gao L, Du G, Zhou J, Chen J, Liu J. Characterization of a group of pyrroloquinoline quinone-dependent dehydrogenases that are involved in the conversion of L-sorbose to 2-Keto-L-gulonic acid in Ketogulonicigenium vulgare WSH-001. Biotechnol Prog 2013; 29:1398-404. [PMID: 23970495 DOI: 10.1002/btpr.1803] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 04/02/2013] [Revised: 06/02/2013] [Indexed: 11/09/2022]
Abstract
Ketogulonicigenium vulgare WSH-001 is an industrial strain used for vitamin C production. Based on genome sequencing and pathway analysis of the bacterium, some of its potential pyrroloquinoline quinone (PQQ)-dependent dehydrogenases were predicted, including KVU_pmdA_0245, KVU_2142, KVU_2159, KVU_1366, KVU_0203, KVU_0095, and KVU_pmdB_0115. BLAST and function domain searches showed that enzymes encoded by these genes may act as putative PQQ-dependent L-sorbose dehydrogenases (SDH) or L-sorbosone dehydrogenases (SNDH). To test whether these dehydrogenases are PQQ-dependent, the seven putative dehydrogenases were overexpressed in Escherichia coli BL21 (DE3) and purified for characterization. Biochemical and kinetic characterization of the purified proteins has led to the identification of seven enzymes that possess the ability to oxidize L-sorbose or L-sorbosone to varying degrees. In addition, the dehydrogenation of sorbose in K. vulgare was validated to be PQQ dependent, and the identification of these PQQ-dependent dehydrogenases expanded the PQQ-dependent dehydrogenase family. Furthermore, the optimal combination of enzymes that could more efficiently catalyze the conversion of sorbose to gulonic acid was proposed. These findings are important in supporting the development of metabolic engineering strategies and the engineering of efficient strains for one-step production of vitamin C in the future.
Affiliation(s)
- Lili Gao
- School of Biotechnology, Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University, Wuxi, Jiangsu, 214122, China
|
127
|
Network-based approaches in drug discovery and early development. Clin Pharmacol Ther 2013; 94:651-8. [PMID: 24025802 DOI: 10.1038/clpt.2013.176] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.0] [Received: 08/23/2013] [Accepted: 09/03/2013] [Indexed: 12/20/2022]
Abstract
Identification of novel targets is a critical first step in the drug discovery and development process. Most diseases such as cancer, metabolic disorders, and neurological disorders are complex, and their pathogenesis involves multiple genetic and environmental factors. Finding a viable drug target-drug combination with high potential for yielding clinical success within the efficacy-toxicity spectrum is extremely challenging. Many examples are now available in which network-based approaches show potential for the identification of novel targets and for the repositioning of established targets. The objective of this article is to highlight network approaches for identifying novel targets with greater chances of gaining approved drugs with maximal efficacy and minimal side effects. Further enhancement of these approaches may emerge from effectively integrating computational systems biology with pharmacodynamic systems analysis. Coupling genomics, proteomics, and metabolomics databases with systems pharmacology modeling may aid in the development of disease-specific networks that can be further used to build confidence in target identification.
|
128
|
|
129
|
Koskimaki JE, Blazier AS, Clarens AF, Papin JA. Computational Models of Algae Metabolism for Industrial Applications. Ind Biotechnol (New Rochelle N Y) 2013. [DOI: 10.1089/ind.2013.0012] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 01/20/2023] Open
Affiliation(s)
- Jacob E. Koskimaki
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA
- Anna S. Blazier
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA
- Andres F. Clarens
- Department of Civil and Environmental Engineering, University of Virginia, Charlottesville, VA
- Jason A. Papin
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA
|
130
|
Helicobacter pylori salvages purines from extracellular host cell DNA utilizing the outer membrane-associated nuclease NucT. J Bacteriol 2013; 195:4387-98. [PMID: 23893109 DOI: 10.1128/jb.00388-13] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 01/01/2023] Open
Abstract
Helicobacter pylori is a bacterial pathogen that establishes life-long infections in humans, and its presence in the gastric epithelium is strongly associated with gastritis, peptic ulcer disease, and gastric cancer. Having evolved in this specific gastric niche for hundreds of thousands of years, this microbe has become dependent on its human host. Bioinformatic analysis reveals that H. pylori has lost several genes involved in the de novo synthesis of purine nucleotides, and without this pathway present, H. pylori must salvage purines from its environment in order to grow. While the presence and abundance of free purines in various mammalian tissues has been loosely quantified, the concentration of purines present within the gastric mucosa remains unknown. There is evidence, however, that a significant amount of extracellular DNA is present in the human gastric mucosal layer as a result of epithelial cell turnover, and this DNA has the potential to serve as an adequate purine source for gastric purine auxotrophs. In this study, we characterize the ability of H. pylori to grow utilizing only DNA as a purine source. We show that this ability is independent of the ComB DNA uptake system, and that H. pylori utilization of DNA as a purine source is largely influenced by the presence of an outer membrane-associated nuclease (NucT). A ΔnucT mutant exhibits significantly reduced extracellular nuclease activity and is deficient in growth when DNA is provided as the sole purine source in laboratory growth media. These growth defects are also evident when this nuclease mutant is grown in the presence of AGS cells or in purine-free tissue culture medium that has been conditioned by AGS cells in the absence of fetal bovine serum. Taken together, these results indicate that the salvage of purines from exogenous host cell DNA plays an important role in allowing H. pylori to meet its purine requirements for growth.
|
131
|
Poliquin PO, Chen J, Cloutier M, Trudeau LÉ, Jolicoeur M. Metabolomics and in-silico analysis reveal critical energy deregulations in animal models of Parkinson's disease. PLoS One 2013; 8:e69146. [PMID: 23935941 PMCID: PMC3720533 DOI: 10.1371/journal.pone.0069146] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 04/14/2013] [Accepted: 06/04/2013] [Indexed: 11/18/2022]
Abstract
Parkinson's disease (PD) is a multifactorial disease known to result from a variety of factors. Although age is the principal risk factor, other etiological mechanisms have been identified, including gene mutations and exposure to toxins. Deregulation of energy metabolism, mostly through the loss of complex I efficiency, is involved in disease progression in both the genetic and sporadic forms of the disease. In this study, we investigated energy deregulation in the cerebral tissue of animal models (genetic and toxin induced) of PD using an approach that combines metabolomics and mathematical modelling. In a first step, quantitative measurements of energy-related metabolites in mouse brain slices revealed the most affected pathways. A genetic model of PD, the Park2 knockout, was compared to the effect of CCCP, a mitochondrial uncoupler. Model-simulated and experimental results revealed a significant and sustained decrease in ATP after CCCP exposure, but not in the genetic mouse model. In support of the data analysis, a mathematical model of the relevant metabolic pathways was developed and calibrated on experimental data. In this work, we show that a short-term stress response in nucleotide scavenging is most probably induced by the toxin exposure. In turn, the robustness of energy-related pathways in the model explains how genetic perturbations, at least in young animals, are not sufficient to induce significant changes at the metabolite level.
Affiliation(s)
- Pierre O. Poliquin
  - Department of Chemical Engineering, École Polytechnique de Montréal, Montréal, Quebec, Canada
- Jingkui Chen
  - Department of Chemical Engineering, École Polytechnique de Montréal, Montréal, Quebec, Canada
- Mathieu Cloutier
  - GERAD and Department of Chemical Engineering, École Polytechnique de Montréal, Montréal, Quebec, Canada
- Louis-Éric Trudeau
  - Department of Pharmacology, Faculty of Medicine, Université de Montréal, Montréal, Quebec, Canada
- Mario Jolicoeur
  - Department of Chemical Engineering, École Polytechnique de Montréal, Montréal, Quebec, Canada
|
132
|
Chung BKS, Dick T, Lee DY. In silico analyses for the discovery of tuberculosis drug targets. J Antimicrob Chemother 2013; 68:2701-9. [DOI: 10.1093/jac/dkt273] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 11/13/2022]
|
133
|
Identification of drug targets by chemogenomic and metabolomic profiling in yeast. Pharmacogenet Genomics 2013; 22:877-86. [PMID: 23076370 DOI: 10.1097/fpc.0b013e32835aa888] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 01/03/2023]
Abstract
OBJECTIVE: To advance our understanding of disease biology, the characterization of the molecular target for clinically proven or new drugs is very important. Because of its simplicity and the availability of strains with individual deletions in all of its genes, chemogenomic profiling in yeast has been used to identify drug targets. As measurement of drug-induced changes in cellular metabolites can yield considerable information about the effects of a drug, we investigated whether combining chemogenomic and metabolomic profiling in yeast could improve the characterization of drug targets. BASIC METHODS: We used chemogenomic and metabolomic profiling in yeast to characterize the target for five drugs acting on two biologically important pathways. A novel computational method that uses a curated metabolic network was also developed, and it was used to identify the genes that are likely to be responsible for the metabolomic differences found. RESULTS AND CONCLUSION: The combination of metabolomic and chemogenomic profiling, along with data analyses carried out using a novel computational method, could robustly identify the enzymes targeted by five drugs. Moreover, this novel computational method has the potential to identify genes that are causative of metabolomic differences or drug targets.
|
134
|
Predictions of Enzymatic Parameters: A Mini-Review with Focus on Enzymes for Biofuel. Appl Biochem Biotechnol 2013; 171:590-615. [DOI: 10.1007/s12010-013-0328-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 01/09/2013] [Accepted: 06/11/2013] [Indexed: 12/25/2022]
|
135
|
Quantification of endospore-forming firmicutes by quantitative PCR with the functional gene spo0A. Appl Environ Microbiol 2013; 79:5302-12. [PMID: 23811505 DOI: 10.1128/aem.01376-13] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Indexed: 11/20/2022]
Abstract
Bacterial endospores are highly specialized cellular forms that allow endospore-forming Firmicutes (EFF) to tolerate harsh environmental conditions. EFF are considered ubiquitous in natural environments, in particular, those subjected to stress conditions. In addition to natural habitats, EFF are often the cause of contamination problems in anthropogenic environments, such as industrial production plants or hospitals. It is therefore desirable to assess their prevalence in environmental and industrial fields. To this end, a high-sensitivity detection method is still needed. The aim of this study was to develop and evaluate an approach based on quantitative PCR (qPCR). For this, the suitability of functional genes specific for and common to all EFF was evaluated. Seven genes were considered, but only spo0A was retained to identify conserved regions for qPCR primer design. An approach based on multivariate analysis was developed for primer design. Two primer sets were obtained and evaluated with 16 pure cultures, including representatives of the genera Bacillus, Paenibacillus, Brevibacillus, Geobacillus, Alicyclobacillus, Sulfobacillus, Clostridium, and Desulfotomaculum, as well as with environmental samples. The primer sets developed gave a reliable quantification when tested on laboratory strains, with the exception of Sulfobacillus and Desulfotomaculum. A test using sediment samples with a diverse EFF community also gave a reliable quantification compared to 16S rRNA gene pyrosequencing. A detection limit of about 10^4 cells (or spores) per gram of initial material was calculated, indicating this method has a promising potential for the detection of EFF over a wide range of applications.
|
136
|
Sorokina SY, Kuptzov VN, Urban YN, Fokin AV, Pojarkov SV, Ivankov MY, Melnikov AI, Kulikov AM. Databases as instruments for analysis of large-scale data sets of interactions between molecular biological objects. BIOL BULL+ 2013. [DOI: 10.1134/s1062359013030096] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/23/2022]
|
137
|
Sawada Y, Hirai MY. Integrated LC-MS/MS system for plant metabolomics. Comput Struct Biotechnol J 2013; 4:e201301011. [PMID: 24688692 PMCID: PMC3962214 DOI: 10.5936/csbj.201301011] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 02/16/2013] [Revised: 04/01/2013] [Accepted: 04/05/2013] [Indexed: 12/31/2022]
Abstract
Liquid chromatography-tandem mass spectrometry (LC-MS/MS) is highly sensitive, selective, and enables extensive detection of metabolites within a sample. The result allows us to characterize comprehensive metabolite accumulation patterns without dependence on authentic standard compounds and isolation of the individual metabolites. A reference database search is essential for the structural assignment process of un-targeted MS and MS/MS data. Moreover, the characterization of unknown metabolites is challenging, since these cannot be assigned a candidate structure by using a reference database. In this case study, integrated LC-MS/MS based plant metabolomics allows us to detect several hundred metabolites in a sample, and integrated omics analyses, e.g., large-scale reverse genetics, linkage mapping, and association mapping, provide a powerful tool for candidate structure selection or rejection. We also examine emerging technology and applications for LC-MS/MS-based un-targeted plant metabolomics. These activities promote the characterization of massive extended detectable metabolites.
Affiliation(s)
- Yuji Sawada
  - RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan; RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Masami Yokota Hirai
  - RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan; RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan; JST, CREST, 4-1-8 Hon-chou, Kawaguchi, Saitama 332-0012, Japan
|
138
|
Lee DH, Lim JA, Lee J, Roh E, Jung K, Choi M, Oh C, Ryu S, Yun J, Heu S. Characterization of genes required for the pathogenicity of Pectobacterium carotovorum subsp. carotovorum Pcc21 in Chinese cabbage. MICROBIOLOGY-SGM 2013; 159:1487-1496. [PMID: 23676432 PMCID: PMC3749726 DOI: 10.1099/mic.0.067280-0] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Indexed: 12/16/2022]
Abstract
Pectobacterium carotovorum subsp. carotovorum is a well-known plant pathogen that causes severe soft rot disease in various crops, resulting in considerable economic loss. To identify pathogenicity-related factors, Chinese cabbage was inoculated with 5314 transposon mutants of P. carotovorum subsp. carotovorum Pcc21 derived using Tn5 transposon mutagenesis. A total of 35 reduced-virulence or avirulent mutants were isolated, and 14 loci were identified. The 14 loci could be functionally grouped into nutrient utilization (pyrD, purH, purD, leuA and serB), production of plant cell-wall-degrading enzymes (PCWDEs) (expI, expR and PCC21_023220), motility (flgA, fliA and flhB), biofilm formation (expI, expR and qseC), susceptibility to antibacterial plant chemicals (tolC) and unknown function (ECA2640). Among the 14 genes identified, qseC, tolC and PCC21_023220 are novel pathogenicity factors of P. carotovorum subsp. carotovorum involved in biofilm formation, phytochemical resistance and PCWDE production, respectively.
Affiliation(s)
- Dong Hwan Lee
  - Division of Microbial Safety, National Academy of Agricultural Science, Rural Development Administration, Suwon 441-707, Republic of Korea
- Jeong-A Lim
  - Division of Microbial Safety, National Academy of Agricultural Science, Rural Development Administration, Suwon 441-707, Republic of Korea
- Juneok Lee
  - Division of Microbial Safety, National Academy of Agricultural Science, Rural Development Administration, Suwon 441-707, Republic of Korea
- Eunjung Roh
  - Division of Microbial Safety, National Academy of Agricultural Science, Rural Development Administration, Suwon 441-707, Republic of Korea
- Kyusuk Jung
  - Division of Microbial Safety, National Academy of Agricultural Science, Rural Development Administration, Suwon 441-707, Republic of Korea
- Minseon Choi
  - Department of Horticultural Biotechnology and Institute of Life Science & Resources, Kyung Hee University, Yongin 441-701, Republic of Korea
- Changsik Oh
  - Department of Horticultural Biotechnology and Institute of Life Science & Resources, Kyung Hee University, Yongin 441-701, Republic of Korea
- Sangryeol Ryu
  - Department of Agricultural Biotechnology, Center for Agricultural Biomaterials, Research Institute for Agriculture and Life Sciences, Seoul National University, Seoul 151-921, Republic of Korea
- Jongchul Yun
  - Division of Microbial Safety, National Academy of Agricultural Science, Rural Development Administration, Suwon 441-707, Republic of Korea
- Sunggi Heu
  - Division of Microbial Safety, National Academy of Agricultural Science, Rural Development Administration, Suwon 441-707, Republic of Korea
|
139
|
Van Moerkercke A, Fabris M, Pollier J, Baart GJE, Rombauts S, Hasnain G, Rischer H, Memelink J, Oksman-Caldentey KM, Goossens A. CathaCyc, a metabolic pathway database built from Catharanthus roseus RNA-Seq data. PLANT & CELL PHYSIOLOGY 2013; 54:673-85. [PMID: 23493402 DOI: 10.1093/pcp/pct039] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.3] [Indexed: 05/08/2023]
Abstract
The medicinal plant Madagascar periwinkle (Catharanthus roseus) synthesizes numerous terpenoid indole alkaloids (TIAs), such as the anticancer drugs vinblastine and vincristine. The TIA pathway operates in a complex metabolic network that steers plant growth and survival. Pathway databases and metabolic networks reconstructed from 'omics' sequence data can help to discover missing enzymes, study metabolic pathway evolution and, ultimately, engineer metabolic pathways. To date, such databases have mainly been built for model plant species with sequenced genomes. Although genome sequence data are not available for most medicinal plant species, next-generation sequencing is now extensively employed to create comprehensive medicinal plant transcriptome sequence resources. Here we report on the construction of CathaCyc, a detailed metabolic pathway database, from C. roseus RNA-Seq data sets. CathaCyc (version 1.0) contains 390 pathways with 1,347 assigned enzymes and spans primary and secondary metabolism. Curation of the pathways linked with the synthesis of TIAs and triterpenoids, their primary metabolic precursors, and their elicitors, the jasmonate hormones, demonstrated that RNA-Seq resources are suitable for the construction of pathway databases. CathaCyc is accessible online (http://www.cathacyc.org) and offers a range of tools for the visualization and analysis of metabolic networks and 'omics' data. Overlay with expression data from publicly available RNA-Seq resources demonstrated that two well-characterized C. roseus terpenoid pathways, those of TIAs and triterpenoids, are subject to distinct regulation by both developmental and environmental cues. We anticipate that databases such as CathaCyc will become key to the study and exploitation of the metabolism of medicinal plants.
|
140
|
Glez-Peña D, Lourenço A, López-Fernández H, Reboiro-Jato M, Fdez-Riverola F. Web scraping technologies in an API world. Brief Bioinform 2013; 15:788-97. [PMID: 23632294 DOI: 10.1093/bib/bbt026] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Indexed: 11/14/2022]
Abstract
Web services are the de facto standard in biomedical data integration. However, there are data integration scenarios that cannot be fully covered by Web services. A number of Web databases and tools do not support Web services, and existing Web services do not cover all possible user data demands. As a consequence, Web data scraping, one of the oldest techniques for extracting Web contents, is still in a position to offer a valid and valuable service to a wide range of bioinformatics applications, ranging from simple extraction robots to online meta-servers. This article reviews existing scraping frameworks and tools, identifying their strengths and limitations in terms of extraction capabilities. The main focus is set on showing how straightforward it is today to set up a data scraping pipeline, with minimal programming effort, and answer a number of practical needs. For exemplification purposes, we introduce a biomedical data extraction scenario where the desired data sources, well-known in clinical microbiology and similar domains, do not offer programmatic interfaces yet. Moreover, we describe the operation of WhichGenes and PathJam, two bioinformatics meta-servers that use scraping as a means to cope with gene set enrichment analysis.
|
141
|
Inactivation of the Pta-AckA pathway causes cell death in Staphylococcus aureus. J Bacteriol 2013; 195:3035-44. [PMID: 23625849 DOI: 10.1128/jb.00042-13] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.3] [Indexed: 12/11/2022]
Abstract
During growth under conditions of glucose and oxygen excess, Staphylococcus aureus predominantly accumulates acetate in the culture medium, suggesting that the phosphotransacetylase-acetate kinase (Pta-AckA) pathway plays a crucial role in bacterial fitness. Previous studies demonstrated that these conditions also induce the S. aureus CidR regulon involved in the control of cell death. Interestingly, the CidR regulon is comprised of only two operons, both encoding pyruvate catabolic enzymes, suggesting an intimate relationship between pyruvate metabolism and cell death. To examine this relationship, we introduced ackA and pta mutations in S. aureus and tested their effects on bacterial growth, carbon and energy metabolism, cid expression, and cell death. Inactivation of the Pta-AckA pathway showed a drastic inhibitory effect on growth and caused accumulation of dead cells in both pta and ackA mutants. Surprisingly, inactivation of the Pta-AckA pathway did not lead to a decrease in the energy status of bacteria, as the intracellular concentrations of ATP, NAD+, and NADH were higher in the mutants. However, inactivation of this pathway increased the rate of glucose consumption, led to a metabolic block at the pyruvate node, and enhanced carbon flux through both glycolysis and the tricarboxylic acid (TCA) cycle. Intriguingly, disruption of the Pta-AckA pathway also induced the CidR regulon, suggesting that activation of alternative pyruvate catabolic pathways could be an important survival strategy for the mutants. Collectively, the results of this study demonstrate the indispensable role of the Pta-AckA pathway in S. aureus for maintaining energy and metabolic homeostasis during overflow metabolism.
|
142
|
Steeb B, Claudi B, Burton NA, Tienz P, Schmidt A, Farhan H, Mazé A, Bumann D. Parallel exploitation of diverse host nutrients enhances Salmonella virulence. PLoS Pathog 2013; 9:e1003301. [PMID: 23633950 PMCID: PMC3636032 DOI: 10.1371/journal.ppat.1003301] [Citation(s) in RCA: 128] [Impact Index Per Article: 11.6] [Received: 08/20/2012] [Accepted: 02/26/2013] [Indexed: 12/20/2022]
Abstract
Pathogen access to host nutrients in infected tissues is fundamental for pathogen growth and virulence, disease progression, and infection control. However, our understanding of this crucial process is still rather limited because of experimental and conceptual challenges. Here, we used proteomics, microbial genetics, competitive infections, and computational approaches to obtain a comprehensive overview of Salmonella nutrition and growth in a mouse typhoid fever model. The data revealed that Salmonella accessed an unexpectedly diverse set of at least 31 different host nutrients in infected tissues but the individual nutrients were available in only scarce amounts. Salmonella adapted to this situation by expressing versatile catabolic pathways to simultaneously exploit multiple host nutrients. A genome-scale computational model of Salmonella in vivo metabolism based on these data was fully consistent with independent large-scale experimental data on Salmonella enzyme quantities, and correctly predicted 92% of 738 reported experimental mutant virulence phenotypes, suggesting that our analysis provided a comprehensive overview of host nutrient supply, Salmonella metabolism, and Salmonella growth during infection. Comparison of metabolic networks of other pathogens suggested that complex host/pathogen nutritional interfaces are a common feature underlying many infectious diseases.
Affiliation(s)
- Benjamin Steeb
  - Focal Area Infection Biology, Biozentrum, University of Basel, Basel, Switzerland
- Beatrice Claudi
  - Focal Area Infection Biology, Biozentrum, University of Basel, Basel, Switzerland
- Neil A. Burton
  - Focal Area Infection Biology, Biozentrum, University of Basel, Basel, Switzerland
- Petra Tienz
  - Focal Area Infection Biology, Biozentrum, University of Basel, Basel, Switzerland
- Alexander Schmidt
  - Proteomics Core Facility, Biozentrum, University of Basel, Basel, Switzerland
- Hesso Farhan
  - Focal Area Infection Biology, Biozentrum, University of Basel, Basel, Switzerland
- Alain Mazé
  - Focal Area Infection Biology, Biozentrum, University of Basel, Basel, Switzerland
- Dirk Bumann
  - Focal Area Infection Biology, Biozentrum, University of Basel, Basel, Switzerland
|
143
|
Vasco-Cárdenas MF, Baños S, Ramos A, Martín JF, Barreiro C. Proteome response of Corynebacterium glutamicum to high concentration of industrially relevant C₄ and C₅ dicarboxylic acids. J Proteomics 2013; 85:65-88. [PMID: 23624027 DOI: 10.1016/j.jprot.2013.04.019] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 11/10/2012] [Revised: 03/05/2013] [Accepted: 04/09/2013] [Indexed: 12/11/2022]
Abstract
More than fifty years of industrial and scientific development of the amino acid-producing strain Corynebacterium glutamicum have generated extensive knowledge that is highly applicable to the development of new products. Although the production of dicarboxylic acids has already been engineered in C. glutamicum, the effect caused by these acids at competitive industrial levels has not yet been described. Thus, aspartic, fumaric, itaconic, malic and succinic acids have been tested on the growth of C. glutamicum to obtain their minimal inhibitory concentrations, and their intracellular effects were analyzed by 2D-DIGE. This analysis showed the modification of the central metabolism of C. glutamicum, the cross-regulation between malic acid and glucose, as well as the utilization of aspartic acid as a nitrogen source. The analysis of the transcriptional regulators involved in the control of the detected proteins pointed to the ramB gene as a candidate for strain improvement. The analysis of the ΔramB mutant demonstrated its function as an enhancer of the growth speed or resistance level against aspartic, fumaric, itaconic and malic acids in C. glutamicum. BIOLOGICAL SIGNIFICANCE: The effect of adding dicarboxylic acids to the C. glutamicum culture broth has been described. This proteome response is detailed, and the deletion of a global regulator (ramB) has been described as a possible method for improving industrial strains. In addition, the consumption of aspartic acid as a nitrogen source has been described for the first time in C. glutamicum, as well as the cross-regulation between malic acid and glucose through the F0F1 respiratory system.
Affiliation(s)
- María F Vasco-Cárdenas
- Área de Microbiología, Departamento de Biología Molecular, Universidad de León, Campus de Vegazana s/n, 24071 León, Spain
|
144
|
Kremmydas GF, Tampakaki AP, Georgakopoulos DG. Characterization of the biocontrol activity of Pseudomonas fluorescens strain X reveals novel genes regulated by glucose. PLoS One 2013; 8:e61808. [PMID: 23596526 PMCID: PMC3626644 DOI: 10.1371/journal.pone.0061808] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 12/17/2011] [Accepted: 03/18/2013] [Indexed: 11/18/2022]
Abstract
Pseudomonas fluorescens strain X, a bacterial isolate from the rhizosphere of bean seedlings, has the ability to suppress damping-off caused by the oomycete Pythium ultimum. To determine the genes controlling the biocontrol activity of strain X, transposon mutagenesis, sequencing and complementation were performed. Results indicate that the biocontrol ability of this isolate is attributed to the gcd gene encoding glucose dehydrogenase, genes encoding its co-enzyme pyrroloquinoline quinone (PQQ), and two genes (sup5 and sup6) which seem to be organized in a putative operon. This operon (named supX) consists of five genes, one of which encodes a non-ribosomal peptide synthase. A unique binding site for a GntR-type transcriptional factor is localized upstream of the supX putative operon. Synteny comparison of the genes in supX revealed that they are common in the genus Pseudomonas, but with a low degree of similarity. supX shows high similarity only to the mangotoxin operon of Ps. syringae pv. syringae UMAF0158. Quantitative real-time PCR analysis indicated that transcription of supX is strongly reduced in the gcd and PQQ-minus mutants of Ps. fluorescens strain X. On the contrary, transcription of supX in the wild type is enhanced by glucose, and transcription levels appear to be higher during the stationary phase. Gcd, which uses PQQ as a cofactor, catalyses the oxidation of glucose to gluconic acid, which controls the activity of the GntR family of transcriptional factors. The genes in the supX putative operon have not been implicated before in the biocontrol of plant pathogens by pseudomonads. They are involved in the biosynthesis of an antimicrobial compound by Ps. fluorescens strain X and their transcription is controlled by glucose, possibly through the activity of a GntR-type transcriptional factor binding upstream of this putative operon.
Affiliation(s)
- Gerasimos F. Kremmydas
  - Department of Agricultural Biotechnology, Agricultural University of Athens, Athens, Greece
- Anastasia P. Tampakaki
  - Department of Agricultural Biotechnology, Agricultural University of Athens, Athens, Greece
|
145
|
Kang C, Yu H, Yi GS. Finding type 2 diabetes causal single nucleotide polymorphism combinations and functional modules from genome-wide association data. BMC Med Inform Decis Mak 2013; 13 Suppl 1:S3. [PMID: 23566118 PMCID: PMC3618247 DOI: 10.1186/1472-6947-13-s1-s3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 02/07/2023]
Abstract
Background: Due to the low statistical power of individual markers from a genome-wide association study (GWAS), detecting causal single nucleotide polymorphisms (SNPs) for complex diseases is a challenge. SNP combinations are suggested to compensate for the low statistical power of individual markers, but SNP combinations from GWAS generate high computational complexity. Methods: We aim to detect type 2 diabetes (T2D) causal SNP combinations from a GWAS dataset with optimal filtration and to discover the biological meaning of the detected SNP combinations. Optimal filtration can enhance the statistical power of SNP combinations by comparing the error rates of SNP combinations from various Bonferroni thresholds and p-value range-based thresholds combined with linkage disequilibrium (LD) pruning. T2D causal SNP combinations are selected using random forests with variable selection from an optimal SNP dataset. T2D causal SNP combinations and genome-wide SNPs are mapped into functional modules using expanded gene set enrichment analysis (GSEA) considering pathway, transcription factor (TF)-target, miRNA-target, gene ontology, and protein complex functional modules. The prediction error rates are measured for SNP sets from functional module-based filtration that selects SNPs within functional modules from genome-wide SNPs based on expanded GSEA. Results: A T2D causal SNP combination containing 101 SNPs from the Wellcome Trust Case Control Consortium (WTCCC) GWAS dataset is selected using optimal filtration criteria, with an error rate of 10.25%. Matching 101 SNPs with known T2D genes and functional modules reveals the relationships between T2D and SNP combinations. The prediction error rates of SNP sets from functional module-based filtration show no significant difference from the prediction error rates of randomly selected SNP sets and T2D causal SNP combinations from optimal filtration.
Conclusions: We propose a detection method for complex disease causal SNP combinations from an optimal SNP dataset by using random forests with variable selection. Mapping the biological meanings of detected SNP combinations can help uncover complex disease mechanisms.
Affiliation(s)
- Chiyong Kang
- Department of Bio and Brain Engineering, KAIST, Daejeon 305-701, South Korea
|
146
|
Remli MA, Deris S. An Approach for Biological Data Integration and Knowledge Retrieval Based on Ontology, Semantic Web Services Composition, and AI Planning. Bioinformatics 2013. [DOI: 10.4018/978-1-4666-3604-0.ch091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
This chapter describes an approach to two knowledge management processes in biology, namely data integration and knowledge retrieval, based on ontology, Web services, and Artificial Intelligence (AI) planning. For data integration, the Semantic Web combined with ontology offers several promising ways to integrate heterogeneous biological databases. The goal of this work is to construct an integration approach for gram-positive bacteria that combines gene, protein, and pathway data, thus allowing biological questions to be answered. The authors present a new perspective on knowledge retrieval that uses Semantic Web services composition and an AI planning system, Simple Hierarchical Ordered Planner 2 (SHOP2). A Semantic Web service annotated with a domain ontology is used to describe services for biological pathway knowledge retrieval from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. The authors investigate the effectiveness of this approach by applying it to a real-world pathway information retrieval scenario in which a biologist needs to discover the pathway description for a specific gene of interest in an organism. Both processes (data integration and knowledge retrieval) rely on ontology as the key element in achieving the biological goals.
|
147
|
Jacobsen UP, Nielsen HB, Hildebrand F, Raes J, Sicheritz-Ponten T, Kouskoumvekaki I, Panagiotou G. The chemical interactome space between the human host and the genetically defined gut metabotypes. ISME J 2013; 7:730-42. [PMID: 23178670 PMCID: PMC3603391 DOI: 10.1038/ismej.2012.141] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 02/14/2012] [Revised: 07/30/2012] [Accepted: 09/28/2012] [Indexed: 01/07/2023]
Abstract
The bacteria that colonize the gastrointestinal tracts of mammals represent a highly selected microbiome that has a profound influence on human physiology by shaping the host's metabolic and immune system activity. Despite recent advances in the biological principles that underlie microbial symbiosis in the gut of mammals, mechanistic understanding of the contributions of the gut microbiome, and of how variations in the metabotypes are linked to host health, remains obscure. Here, we mapped the entire metabolic potential of the gut microbiome based solely on metagenomics sequencing data derived from fecal samples of 124 Europeans (healthy, obese and with inflammatory bowel disease). Interestingly, three distinct clusters of individuals with high, medium and low metabolic potential were observed. By illustrating these results in the context of the bacterial population, we concluded that the abundance of the Prevotella genus is a key factor indicating a low metabolic potential. These metagenome-based metabolic signatures were used to study the interaction networks between bacteria-specific metabolites and human proteins. We found that thirty-three such metabolites interact with disease-relevant protein complexes, several of which are highly expressed in cells and tissues involved in the signaling and shaping of the adaptive immune system and associated with squamous cell carcinoma and bladder cancer. From this set of metabolites, eighteen are present in DrugBank, providing evidence that we carry a natural pharmacy in our guts. Furthermore, we established connections between the systemic effects of non-antibiotic drugs and the gut microbiome of relevance to drug side effects and health-care solutions.
Affiliation(s)
- Ulrik Plesner Jacobsen
  - Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
- Henrik Bjørn Nielsen
  - Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
  - NNF-Center for Biosustainability, Technical University of Denmark, Horsholm, Denmark
- Falk Hildebrand
  - Research Group of Bioinformatics and (eco-)systems biology, Department of Structural Biology, VIB, Brussels, Belgium
  - Research Group of Bioinformatics and (eco-)systems biology, Microbiology Unit (MICR), Department of Applied Biological Sciences (DBIT), Vrije Universiteit Brussel, Brussels, Belgium
- Jeroen Raes
  - Research Group of Bioinformatics and (eco-)systems biology, Department of Structural Biology, VIB, Brussels, Belgium
  - Research Group of Bioinformatics and (eco-)systems biology, Microbiology Unit (MICR), Department of Applied Biological Sciences (DBIT), Vrije Universiteit Brussel, Brussels, Belgium
- Thomas Sicheritz-Ponten
  - Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
  - NNF-Center for Biosustainability, Technical University of Denmark, Horsholm, Denmark
- Irene Kouskoumvekaki
  - Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
- Gianni Panagiotou
  - Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
  - NNF-Center for Biosustainability, Technical University of Denmark, Horsholm, Denmark
  - School of Biological Sciences, The University of Hong Kong, Hong Kong, China
|
148
|
Foster A, Barnes N, Speight R, Morris PC, Keane MA. Role of amine oxidase expression to maintain putrescine homeostasis in Rhodococcus opacus. Enzyme Microb Technol 2013; 52:286-95. [DOI: 10.1016/j.enzmictec.2013.01.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/24/2012] [Revised: 12/12/2012] [Accepted: 01/07/2013] [Indexed: 10/27/2022]
|
149
|
De Filippo C, Ramazzotti M, Fontana P, Cavalieri D. Bioinformatic approaches for functional annotation and pathway inference in metagenomics data. Brief Bioinform 2013; 13:696-710. [PMID: 23175748 PMCID: PMC3505041 DOI: 10.1093/bib/bbs070] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.5] [Indexed: 02/06/2023] Open
Abstract
Metagenomic approaches are increasingly recognized as a baseline for understanding the ecology and evolution of microbial ecosystems. The development of methods for pathway inference from metagenomics data is of paramount importance to link a phenotype to a cascade of events stemming from a series of connected sets of genes or proteins. Biochemical and regulatory pathways have until recently been thought of and modelled within one cell type, one organism, one species. This vision is being dramatically changed by the advent of whole microbiome sequencing studies, revealing the role of symbiotic microbial populations in fundamental biochemical functions. The new landscape we face requires a clear picture of the potentialities of existing tools and the development of new tools to characterize, reconstruct and model biochemical and regulatory pathways as the result of integration of function in complex symbiotic interactions of ontologically and evolutionarily distinct cell types.
|
150
|
Altman T, Travers M, Kothari A, Caspi R, Karp PD. A systematic comparison of the MetaCyc and KEGG pathway databases. BMC Bioinformatics 2013; 14:112. [PMID: 23530693 PMCID: PMC3665663 DOI: 10.1186/1471-2105-14-112] [Citation(s) in RCA: 89] [Impact Index Per Article: 8.1] [Received: 07/06/2012] [Accepted: 03/04/2013] [Indexed: 01/06/2023] Open
Abstract
BACKGROUND The MetaCyc and KEGG projects have developed large metabolic pathway databases that are used for a variety of applications including genome analysis and metabolic engineering. We present a comparison of the compound, reaction, and pathway content of MetaCyc version 16.0 and a KEGG version downloaded on Feb-27-2012 to increase understanding of their relative sizes, their degree of overlap, and their scope. To assess their overlap, we must know the correspondences between compounds, reactions, and pathways in MetaCyc, and those in KEGG. We devoted significant effort to computational and manual matching of these entities, and we evaluated the accuracy of the correspondences. RESULTS KEGG contains 179 module pathways versus 1,846 base pathways in MetaCyc; KEGG contains 237 map pathways versus 296 super pathways in MetaCyc. KEGG pathways contain 3.3 times as many reactions on average as do MetaCyc pathways, and the databases employ different conceptualizations of metabolic pathways. KEGG contains 8,692 reactions versus 10,262 for MetaCyc. 6,174 KEGG reactions are components of KEGG pathways versus 6,348 for MetaCyc. KEGG contains 16,586 compounds versus 11,991 for MetaCyc. 6,912 KEGG compounds act as substrates in KEGG reactions versus 8,891 for MetaCyc. MetaCyc contains a broader set of database attributes than does KEGG, such as relationships from a compound to enzymes that it regulates, identification of spontaneous reactions, and the expected taxonomic range of metabolic pathways. MetaCyc contains many pathways not found in KEGG, from plants, fungi, metazoa, and actinobacteria; KEGG contains pathways not found in MetaCyc, for xenobiotic degradation, glycan metabolism, and metabolism of terpenoids and polyketides. MetaCyc contains fewer unbalanced reactions, which facilitates metabolic modeling such as using flux-balance analysis. MetaCyc includes generic reactions that may be instantiated computationally. 
CONCLUSIONS KEGG contains significantly more compounds than does MetaCyc, whereas MetaCyc contains significantly more reactions and pathways than does KEGG; in particular, KEGG modules are quite incomplete. The numbers of reactions occurring in pathways in the two databases are quite similar.
Affiliation(s)
- Tomer Altman
  - Bioinformatics Research Group, SRI International, Menlo Park, USA
|