1. Siddharth T, Lewis NE. Predicting pathways for old and new metabolites through clustering. J Theor Biol 2024; 578:111684. [PMID: 38048983 PMCID: PMC11139542 DOI: 10.1016/j.jtbi.2023.111684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Revised: 11/17/2023] [Accepted: 11/29/2023] [Indexed: 12/06/2023]
Abstract
The diverse metabolic pathways are fundamental to all living organisms, as they harvest energy, synthesize biomass components, produce molecules to interact with the microenvironment, and neutralize toxins. While the discovery of new metabolites and pathways continues, the prediction of pathways for new metabolites can be challenging. It can take vast amounts of time to elucidate pathways for new metabolites; thus, according to HMDB (Human Metabolome Database), only 60% of metabolites get assigned to pathways. Here, we present an approach to identify pathways based on metabolite structure. We extracted 201 features from SMILES annotations and identified new metabolites from PubMed abstracts and HMDB. After applying clustering algorithms to both groups of features, we quantified correlations between metabolites, and found the clusters accurately linked 92% of known metabolites to their respective pathways. Thus, this approach could be valuable for predicting metabolic pathways for new metabolites.
Affiliation(s)
- Thiru Siddharth
  - Department of Computer Science and Engineering, Indian Institute of Information Technology, Bhopal, MP 462003, India
- Nathan E Lewis
  - Department of Pediatrics and Bioengineering, University of California San Diego, La Jolla, CA 92093, USA

2. Karp PD, Paley S, Caspi R, Kothari A, Krummenacker M, Midford PE, Moore LR, Subhraveti P, Gama-Castro S, Tierrafria VH, Lara P, Muñiz-Rascado L, Bonavides-Martinez C, Santos-Zavaleta A, Mackie A, Sun G, Ahn-Horst TA, Choi H, Covert MW, Collado-Vides J, Paulsen I. The EcoCyc Database (2023). EcoSal Plus 2023; 11:eesp00022023. [PMID: 37220074 PMCID: PMC10729931 DOI: 10.1128/ecosalplus.esp-0002-2023] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 03/13/2023] [Accepted: 04/04/2023] [Indexed: 01/28/2024]
Abstract
EcoCyc is a bioinformatics database available online at EcoCyc.org that describes the genome and the biochemical machinery of Escherichia coli K-12 MG1655. The long-term goal of the project is to describe the complete molecular catalog of the E. coli cell, as well as the functions of each of its molecular parts, to facilitate a system-level understanding of E. coli. EcoCyc is an electronic reference source for E. coli biologists and for biologists who work with related microorganisms. The database includes information pages on each E. coli gene product, metabolite, reaction, operon, and metabolic pathway. The database also includes information on the regulation of gene expression, E. coli gene essentiality, and nutrient conditions that do or do not support the growth of E. coli. The website and downloadable software contain tools for the analysis of high-throughput data sets. In addition, a steady-state metabolic flux model is generated from each new version of EcoCyc and can be executed online. The model can predict metabolic flux rates, nutrient uptake rates, and growth rates for different gene knockouts and nutrient conditions. Data generated from a whole-cell model that is parameterized from the latest data on EcoCyc are also available. This review outlines the data content of EcoCyc and of the procedures by which this content is generated.
Affiliation(s)
- Peter D. Karp
  - Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Suzanne Paley
  - Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Ron Caspi
  - Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Anamika Kothari
  - Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Markus Krummenacker
  - Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Peter E. Midford
  - Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Lisa R. Moore
  - Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Pallavi Subhraveti
  - Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Socorro Gama-Castro
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Victor H. Tierrafria
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Paloma Lara
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Luis Muñiz-Rascado
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- César Bonavides-Martinez
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Alberto Santos-Zavaleta
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Amanda Mackie
  - Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, New South Wales, Australia
- Gwanggyu Sun
  - Department of Bioengineering, Stanford University, Stanford, California, USA
- Travis A. Ahn-Horst
  - Department of Bioengineering, Stanford University, Stanford, California, USA
- Heejo Choi
  - Department of Bioengineering, Stanford University, Stanford, California, USA
- Markus W. Covert
  - Department of Bioengineering, Stanford University, Stanford, California, USA
- Julio Collado-Vides
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Ian Paulsen
  - School of Natural Sciences, Macquarie University, Sydney, New South Wales, Australia

3. Singh DP, Bisen MS, Shukla R, Prabha R, Maurya S, Reddy YS, Singh PM, Rai N, Chaubey T, Chaturvedi KK, Srivastava S, Farooqi MS, Gupta VK, Sarma BK, Rai A, Behera TK. Metabolomics-Driven Mining of Metabolite Resources: Applications and Prospects for Improving Vegetable Crops. Int J Mol Sci 2022; 23:ijms232012062. [PMID: 36292920 PMCID: PMC9603451 DOI: 10.3390/ijms232012062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Revised: 09/13/2022] [Accepted: 09/23/2022] [Indexed: 11/16/2022]
Abstract
Vegetable crops possess a prominent nutri-metabolite pool that not only contributes to the crop performance in the fields, but also offers nutritional security for humans. In the pursuit of identifying, quantifying and functionally characterizing the cellular metabolome pool, biomolecule separation technologies, data acquisition platforms, chemical libraries, bioinformatics tools, databases and visualization techniques have come to play significant role. High-throughput metabolomics unravels structurally diverse nutrition-rich metabolites and their entangled interactions in vegetable plants. It has helped to link identified phytometabolites with unique phenotypic traits, nutri-functional characters, defense mechanisms and crop productivity. In this study, we explore mining diverse metabolites, localizing cellular metabolic pathways, classifying functional biomolecules and establishing linkages between metabolic fluxes and genomic regulations, using comprehensive metabolomics deciphers of the plant’s performance in the environment. We discuss exemplary reports covering the implications of metabolomics, addressing metabolic changes in vegetable plants during crop domestication, stage-dependent growth, fruit development, nutri-metabolic capabilities, climatic impacts, plant-microbe-pest interactions and anthropogenic activities. Efforts leading to identify biomarker metabolites, candidate proteins and the genes responsible for plant health, defense mechanisms and nutri-rich crop produce are documented. With the insights on metabolite-QTL (mQTL) driven genetic architecture, molecular breeding in vegetable crops can be revolutionized for developing better nutritional capabilities, improved tolerance against diseases/pests and enhanced climate resilience in plants.
Affiliation(s)
- Dhananjaya Pratap Singh (corresponding author)
  - ICAR-Indian Institute of Vegetable Research, Jakhini, Shahanshahpur, Varanasi 221305, India
- Mansi Singh Bisen
  - ICAR-Indian Institute of Vegetable Research, Jakhini, Shahanshahpur, Varanasi 221305, India
- Renu Shukla
  - Indian Council of Agricultural Research (ICAR), Krishi Bhawan, Dr. Rajendra Prasad Road, New Delhi 110001, India
- Ratna Prabha
  - ICAR-Indian Agricultural Statistics Research Institute, Centre for Agricultural Bioinformatics, Library Avenue, Pusa, New Delhi 110012, India
- Sudarshan Maurya
  - ICAR-Indian Institute of Vegetable Research, Jakhini, Shahanshahpur, Varanasi 221305, India
- Yesaru S. Reddy
  - ICAR-Indian Institute of Vegetable Research, Jakhini, Shahanshahpur, Varanasi 221305, India
- Prabhakar Mohan Singh
  - ICAR-Indian Institute of Vegetable Research, Jakhini, Shahanshahpur, Varanasi 221305, India
- Nagendra Rai
  - ICAR-Indian Institute of Vegetable Research, Jakhini, Shahanshahpur, Varanasi 221305, India
- Tribhuwan Chaubey
  - ICAR-Indian Institute of Vegetable Research, Jakhini, Shahanshahpur, Varanasi 221305, India
- Krishna Kumar Chaturvedi
  - ICAR-Indian Agricultural Statistics Research Institute, Centre for Agricultural Bioinformatics, Library Avenue, Pusa, New Delhi 110012, India
- Sudhir Srivastava
  - ICAR-Indian Agricultural Statistics Research Institute, Centre for Agricultural Bioinformatics, Library Avenue, Pusa, New Delhi 110012, India
- Mohammad Samir Farooqi
  - ICAR-Indian Agricultural Statistics Research Institute, Centre for Agricultural Bioinformatics, Library Avenue, Pusa, New Delhi 110012, India
- Vijai Kumar Gupta
  - Biorefining and Advanced Materials Research Centre, Scotland’s Rural College, Kings Buildings, West Mains Road, Edinburgh EH9 3JG, UK
- Birinchi K. Sarma
  - Department of Mycology and Plant Pathology, Institute of Agricultural Sciences, Banaras Hindu University, Varanasi 221005, India
- Anil Rai
  - ICAR-Indian Agricultural Statistics Research Institute, Centre for Agricultural Bioinformatics, Library Avenue, Pusa, New Delhi 110012, India
- Tusar Kanti Behera
  - ICAR-Indian Institute of Vegetable Research, Jakhini, Shahanshahpur, Varanasi 221305, India

4. Chen B, Rupani PF, Azman S, Dewil R, Appels L. A redox-based strategy to enhance propionic and butyric acid production during anaerobic fermentation. Bioresour Technol 2022; 361:127672. [PMID: 35878771 DOI: 10.1016/j.biortech.2022.127672] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/31/2022] [Revised: 07/17/2022] [Accepted: 07/18/2022] [Indexed: 06/15/2023]
Abstract
This study investigated the selective production of volatile fatty acids (VFAs) during anaerobic mixed-culture fermentation. The experiment used chicken manure (CM) as a potential substrate to produce high added-value propionic acid and butyric acid under an alkaline environment. The conversion of CM into selective VFAs depends highly on operational conditions such as pH and redox balance. Therefore, the current experiment is designed to employ amino acid addition and develop a redox balance control method to control the final VFA profile. This study showed that 0.2-5.0 % valine and threonine addition successfully enhanced propionic acid and butyric acid production during alkaline fermentation and hence decreased the proportion of acetic acid from 83 % to approximately 47 %. The oxidation-reduction potential (ORP) and redox cofactor ratio (NADH/NAD+) were measured to support the selective VFA production mechanism. The results obtained in this study bring extra value to the valorization of CM within the circular economy concept for selective value-added VFA production.
Affiliation(s)
- Boyang Chen
  - KU Leuven, Department of Chemical Engineering, Process and Environmental Technology Lab, Jan Pieter De Nayerlaan 5, B-2860 Sint-Katelijne-Waver, Belgium
- Parveen Fatemeh Rupani
  - KU Leuven, Department of Chemical Engineering, Process and Environmental Technology Lab, Jan Pieter De Nayerlaan 5, B-2860 Sint-Katelijne-Waver, Belgium
- Samet Azman
  - Avans University of Applied Sciences, Academy of Life Sciences and Technology, Lovensdijk 61, 4818 AJ Breda, Netherlands
- Raf Dewil
  - KU Leuven, Department of Chemical Engineering, Process and Environmental Technology Lab, Jan Pieter De Nayerlaan 5, B-2860 Sint-Katelijne-Waver, Belgium; University of Oxford, Department of Engineering Science, Parks Road, Oxford OX1 3PJ, United Kingdom
- Lise Appels
  - KU Leuven, Department of Chemical Engineering, Process and Environmental Technology Lab, Jan Pieter De Nayerlaan 5, B-2860 Sint-Katelijne-Waver, Belgium

5. Gasteiger J. Chemistry in Times of Artificial Intelligence. Chemphyschem 2020; 21:2233-2242. [PMID: 32808729 PMCID: PMC7702165 DOI: 10.1002/cphc.202000518] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 06/15/2020] [Revised: 08/14/2020] [Indexed: 11/09/2022]
Abstract
Chemists have to a large extent gained their knowledge by doing experiments and thus gather data. By putting various data together and then analyzing them, chemists have fostered their understanding of chemistry. Since the 1960s, computer methods have been developed to perform this process from data to information to knowledge. Simultaneously, methods were developed for assisting chemists in solving their fundamental questions such as the prediction of chemical, physical, or biological properties, the design of organic syntheses, and the elucidation of the structure of molecules. This eventually led to a discipline of its own: chemoinformatics. Chemoinformatics has found important applications in the fields of drug discovery, analytical chemistry, organic chemistry, agrichemical research, food science, regulatory science, material science, and process control. From its inception, chemoinformatics has utilized methods from artificial intelligence, an approach that has recently gained more momentum.
Affiliation(s)
- Johann Gasteiger
  - Computer-Chemie-Centrum and Institute of Organic Chemistry, University of Erlangen-Nuremberg, Naegelsbachstrasse 25, 91052 Erlangen, Germany

6. Karp PD, Ong WK, Paley S, Billington R, Caspi R, Fulcher C, Kothari A, Krummenacker M, Latendresse M, Midford PE, Subhraveti P, Gama-Castro S, Muñiz-Rascado L, Bonavides-Martinez C, Santos-Zavaleta A, Mackie A, Collado-Vides J, Keseler IM, Paulsen I. The EcoCyc Database. EcoSal Plus 2018; 8:10.1128/ecosalplus.ESP-0006-2018. [PMID: 30406744 PMCID: PMC6504970 DOI: 10.1128/ecosalplus.esp-0006-2018] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Received: 06/05/2018] [Indexed: 01/28/2023]
Abstract
EcoCyc is a bioinformatics database available at EcoCyc.org that describes the genome and the biochemical machinery of Escherichia coli K-12 MG1655. The long-term goal of the project is to describe the complete molecular catalog of the E. coli cell, as well as the functions of each of its molecular parts, to facilitate a system-level understanding of E. coli. EcoCyc is an electronic reference source for E. coli biologists and for biologists who work with related microorganisms. The database includes information pages on each E. coli gene product, metabolite, reaction, operon, and metabolic pathway. The database also includes information on E. coli gene essentiality and on nutrient conditions that do or do not support the growth of E. coli. The website and downloadable software contain tools for analysis of high-throughput data sets. In addition, a steady-state metabolic flux model is generated from each new version of EcoCyc and can be executed via EcoCyc.org. The model can predict metabolic flux rates, nutrient uptake rates, and growth rates for different gene knockouts and nutrient conditions. This review outlines the data content of EcoCyc and of the procedures by which this content is generated.
Affiliation(s)
- Peter D Karp
  - Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Wai Kit Ong
  - Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Suzanne Paley
  - Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Ron Caspi
  - Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Carol Fulcher
  - Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Anamika Kothari
  - Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Mario Latendresse
  - Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Peter E Midford
  - Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Socorro Gama-Castro
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
- Luis Muñiz-Rascado
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
- César Bonavides-Martinez
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
- Alberto Santos-Zavaleta
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
- Amanda Mackie
  - Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
- Julio Collado-Vides
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
- Ingrid M Keseler
  - Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Ian Paulsen
  - Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, NSW 2109, Australia

7. Burns JA, Pittis AA, Kim E. Gene-based predictive models of trophic modes suggest Asgard archaea are not phagocytotic. Nat Ecol Evol 2018; 2:697-704. [DOI: 10.1038/s41559-018-0477-7] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 03/30/2017] [Accepted: 01/11/2018] [Indexed: 12/24/2022]

8. Kaur H, Das C, Mande SS. In Silico Analysis of Putrefaction Pathways in Bacteria and Its Implication in Colorectal Cancer. Front Microbiol 2017; 8:2166. [PMID: 29163445 PMCID: PMC5682003 DOI: 10.3389/fmicb.2017.02166] [Citation(s) in RCA: 53] [Impact Index Per Article: 7.6] [Received: 05/24/2017] [Accepted: 10/23/2017] [Indexed: 12/15/2022]
Abstract
Fermentation of undigested proteins in human gastrointestinal tract (gut) by the resident microbiota, a process called bacterial putrefaction, can sometimes disrupt the gut homeostasis. In this process, essential amino acids (e.g., histidine, tryptophan, etc.) that are required by the host may be utilized by the gut microbes. In addition, some of the products of putrefaction, like ammonia, putrescine, cresol, indole, phenol, etc., have been implicated in the disease pathogenesis of colorectal cancer (CRC). We have investigated bacterial putrefaction pathways that are known to be associated with such metabolites. Results of the comprehensive in silico analysis of the selected putrefaction pathways across bacterial genomes revealed presence of these pathways in limited bacterial groups. Majority of these bacteria are commonly found in human gut. These include Bacillus, Clostridium, Enterobacter, Escherichia, Fusobacterium, Salmonella, etc. Interestingly, while pathogens utilize almost all the analyzed pathways, commensals prefer putrescine and H2S production pathways for metabolizing the undigested proteins. Further, comparison of the putrefaction pathways in the gut microbiomes of healthy, carcinoma and adenoma datasets indicate higher abundances of putrefying bacteria in the carcinoma stage of CRC. The insights obtained from the present study indicate utilization of possible microbiome-based therapies to minimize the adverse effects of gut microbiome in enteric diseases.
Affiliation(s)
- Harrisham Kaur
  - Bio-Sciences R&D Division, TCS Research, Tata Consultancy Services Ltd., Pune, India
- Chandrani Das
  - Bio-Sciences R&D Division, TCS Research, Tata Consultancy Services Ltd., Pune, India
- Sharmila S Mande
  - Bio-Sciences R&D Division, TCS Research, Tata Consultancy Services Ltd., Pune, India

9. From Genomes to Phenotypes: Traitar, the Microbial Trait Analyzer. mSystems 2016; 1:mSystems00101-16. [PMID: 28066816 PMCID: PMC5192078 DOI: 10.1128/msystems.00101-16] [Citation(s) in RCA: 78] [Impact Index Per Article: 9.8] [Received: 07/27/2016] [Accepted: 11/12/2016] [Indexed: 01/17/2023]
Abstract
The number of sequenced genomes is growing exponentially, profoundly shifting the bottleneck from data generation to genome interpretation. Traits are often used to characterize and distinguish bacteria and are likely a driving factor in microbial community composition, yet little is known about the traits of most microbes. We describe Traitar, the microbial trait analyzer, which is a fully automated software package for deriving phenotypes from a genome sequence. Traitar provides phenotype classifiers to predict 67 traits related to the use of various substrates as carbon and energy sources, oxygen requirement, morphology, antibiotic susceptibility, proteolysis, and enzymatic activities. Furthermore, it suggests protein families associated with the presence of particular phenotypes. Our method uses L1-regularized L2-loss support vector machines for phenotype assignments based on phyletic patterns of protein families and their evolutionary histories across a diverse set of microbial species. We demonstrate reliable phenotype assignment for Traitar to bacterial genomes from 572 species of eight phyla, also based on incomplete single-cell genomes and simulated draft genomes. We also showcase its application in metagenomics by verifying and complementing a manual metabolic reconstruction of two novel Clostridiales species based on draft genomes recovered from commercial biogas reactors. Traitar is available at https://github.com/hzi-bifo/traitar.
IMPORTANCE: Bacteria are ubiquitous in our ecosystem and have a major impact on human health, e.g., by supporting digestion in the human gut. Bacterial communities can also aid in biotechnological processes such as wastewater treatment or decontamination of polluted soils. Diverse bacteria contribute with their unique capabilities to the functioning of such ecosystems, but lab experiments to investigate those capabilities are labor-intensive. Major advances in sequencing techniques open up the opportunity to study bacteria by their genome sequences. For this purpose, we have developed Traitar, software that predicts traits of bacteria on the basis of their genomes. It is applicable to studies with tens or hundreds of bacterial genomes. Traitar may help researchers in microbiology to pinpoint the traits of interest, reducing the amount of wet lab work required.
10. Brbić M, Piškorec M, Vidulin V, Kriško A, Šmuc T, Supek F. The landscape of microbial phenotypic traits and associated genes. Nucleic Acids Res 2016; 44:10074-10090. [PMID: 27915291 PMCID: PMC5137458 DOI: 10.1093/nar/gkw964] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 06/12/2016] [Revised: 09/21/2016] [Accepted: 10/11/2016] [Indexed: 12/31/2022]
Abstract
Bacteria and Archaea display a variety of phenotypic traits and can adapt to diverse ecological niches. However, systematic annotation of prokaryotic phenotypes is lacking. We have therefore developed ProTraits, a resource containing ∼545 000 novel phenotype inferences, spanning 424 traits assigned to 3046 bacterial and archaeal species. These annotations were assigned by a computational pipeline that associates microbes with phenotypes by text-mining the scientific literature and the broader World Wide Web, while also being able to define novel concepts from unstructured text. Moreover, the ProTraits pipeline assigns phenotypes by drawing extensively on comparative genomics, capturing patterns in gene repertoires, codon usage biases, proteome composition and co-occurrence in metagenomes. Notably, we find that gene synteny is highly predictive of many phenotypes, and highlight examples of gene neighborhoods associated with spore-forming ability. A global analysis of trait interrelatedness outlined clusters in the microbial phenotype network, suggesting common genetic underpinnings. Our extended set of phenotype annotations allows detection of 57 088 high confidence gene-trait links, which recover many known associations involving sporulation, flagella, catalase activity, aerobicity, photosynthesis and other traits. Over 99% of the commonly occurring gene families are involved in genetic interactions conditional on at least one phenotype, suggesting that epistasis has a major role in shaping microbial gene content.
Affiliation(s)
- Maria Brbić
  - Division of Electronics, Ruder Boskovic Institute, 10000 Zagreb, Croatia
- Matija Piškorec
  - Division of Electronics, Ruder Boskovic Institute, 10000 Zagreb, Croatia
- Vedrana Vidulin
  - Division of Electronics, Ruder Boskovic Institute, 10000 Zagreb, Croatia
- Anita Kriško
  - Mediterranean Institute of Life Sciences, 21000 Split, Croatia
- Tomislav Šmuc
  - Division of Electronics, Ruder Boskovic Institute, 10000 Zagreb, Croatia
- Fran Supek
  - Division of Electronics, Ruder Boskovic Institute, 10000 Zagreb, Croatia; EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain

11. Gasteiger J. Explorations into Chemical Reactions and Biochemical Pathways. Mol Inform 2016; 35:588-592. [DOI: 10.1002/minf.201600038] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/18/2016] [Accepted: 04/25/2016] [Indexed: 11/07/2022]
Affiliation(s)
- Johann Gasteiger
  - Computer-Chemie-Centrum; Universität Erlangen-Nürnberg; Nägelsbachstr. 25, 91052 Erlangen, Germany

12. Tamames J, Sánchez PD, Nikel PI, Pedrós-Alió C. Quantifying the Relative Importance of Phylogeny and Environmental Preferences As Drivers of Gene Content in Prokaryotic Microorganisms. Front Microbiol 2016; 7:433. [PMID: 27065987 PMCID: PMC4814473 DOI: 10.3389/fmicb.2016.00433] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 12/31/2015] [Accepted: 03/17/2016] [Indexed: 01/15/2023]
Abstract
Two complementary forces shape microbial genomes: vertical inheritance of genes by phylogenetic descent, and acquisition of new genes related to adaptation to particular habitats and lifestyles. Quantification of the relative importance of each driving force proved difficult. We determined the contribution of each factor, and identified particular genes or biochemical/cellular processes linked to environmental preferences (i.e., propensity of a taxon to live in particular habitats). Three types of data were confronted: (i) complete genomes, which provide gene content of different taxa; (ii) phylogenetic information, via alignment of 16S rRNA sequences, which allowed determination of the distance between taxa, and (iii) distribution of species in environments via 16S rRNA sampling experiments, reflecting environmental preferences of different taxa. The combination of these three datasets made it possible to describe and quantify the relationships among them. We found that, although phylogenetic descent was responsible for shaping most genomes, a discernible part of the latter was correlated to environmental adaptations. Particular families of genes were identified as environmental markers, as supported by direct studies such as metagenomic sequencing. These genes are likely important for adaptation of bacteria to particular conditions or habitats, such as carbohydrate or glycan metabolism genes being linked to host-associated environments.
Affiliation(s)
- Javier Tamames
  - Departamento de Biología de Sistemas, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas, Madrid, Spain
- Pablo D Sánchez
  - Departamento de Biología de Sistemas, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas, Madrid, Spain
- Pablo I Nikel
  - Departamento de Biología de Sistemas, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas, Madrid, Spain
- Carlos Pedrós-Alió
  - Departamento de Biología de Sistemas, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas, Madrid, Spain; Departament de Biologia Marina i Oceanografia, Institut de Ciències del Mar, Consejo Superior de Investigaciones Científicas, Barcelona, Spain

13
|
Chemoinformatics: Achievements and Challenges, a Personal View. Molecules 2016; 21:151. [PMID: 26828468 PMCID: PMC6273366 DOI: 10.3390/molecules21020151] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Received: 11/20/2015] [Revised: 01/14/2016] [Accepted: 01/20/2016] [Indexed: 11/16/2022] Open
Abstract
Chemoinformatics provides computer methods for learning from chemical data and for modeling tasks a chemist is facing. The field has evolved in the past 50 years and has substantially shaped how chemical research is performed by providing access to chemical information on a scale unattainable by traditional methods. Many physical, chemical and biological data have been predicted from structural data. For the early phases of drug design, methods have been developed that are used in all major pharmaceutical companies. However, all domains of chemistry can benefit from chemoinformatics methods; there are many areas that are not yet well developed but could gain substantially from their use. The quality of data is of crucial importance for successful results. Computer-assisted structure elucidation and computer-assisted synthesis design have been attempted in the early years of chemoinformatics. Because of the importance of these fields to the chemist, new approaches should be made with better hardware and software techniques. Society's concern about the impact of chemicals on human health and the environment could be met by the development of methods for toxicity prediction and risk assessment. In conjunction with bioinformatics, our understanding of the events in living organisms could be deepened and, thus, novel strategies for curing diseases developed. With so many challenging tasks awaiting solutions, the future is bright for chemoinformatics.
|
14
|
Burns JA, Paasch A, Narechania A, Kim E. Comparative Genomics of a Bacterivorous Green Alga Reveals Evolutionary Causalities and Consequences of Phago-Mixotrophic Mode of Nutrition. Genome Biol Evol 2015. [PMID: 26224703 PMCID: PMC5741210 DOI: 10.1093/gbe/evv144] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Indexed: 12/18/2022] Open
Abstract
Cymbomonas tetramitiformis—a marine prasinophyte—is one of only a few green algae that still retain an ancestral particulate-feeding mechanism while harvesting energy through photosynthesis. The genome of the alga is estimated to be 850 Mb–1.2 Gb in size—the bulk of which is filled with repetitive sequences—and is annotated with 37,366 protein-coding gene models. A number of unusual metabolic pathways (for the Chloroplastida) are predicted for C. tetramitiformis, including pathways for Lipid-A and peptidoglycan metabolism. Comparative analyses of the predicted peptides of C. tetramitiformis to sets of other eukaryotes revealed that nonphagocytes are depleted in a number of genes, a proportion of which have known function in feeding. In addition, our analysis suggests that obligatory phagotrophy is associated with the loss of genes that function in biosynthesis of small molecules (e.g., amino acids). Further, C. tetramitiformis and at least one other phago-mixotrophic alga are thus unique, compared with obligatory heterotrophs and nonphagocytes, in that both feeding and small molecule synthesis-related genes are retained in their genomes. These results suggest that early, ancestral host eukaryotes that gave rise to phototrophs had the capacity to assimilate building block molecules from inorganic substances (i.e., prototrophy). The loss of biosynthesis genes, thus, may at least partially explain the apparent lack of instances of permanent incorporation of photosynthetic endosymbionts in later-divergent, auxotrophic eukaryotic lineages, such as metazoans and ciliates.
Affiliation(s)
- John A Burns
- Sackler Institute for Comparative Genomics and Division of Invertebrate Zoology, American Museum of Natural History, New York, NY
- Amber Paasch
- Sackler Institute for Comparative Genomics and Division of Invertebrate Zoology, American Museum of Natural History, New York, NY
- Apurva Narechania
- Sackler Institute for Comparative Genomics and Division of Invertebrate Zoology, American Museum of Natural History, New York, NY
- Eunsoo Kim
- Sackler Institute for Comparative Genomics and Division of Invertebrate Zoology, American Museum of Natural History, New York, NY
|
15
|
Karp PD, Weaver D, Paley S, Fulcher C, Kubo A, Kothari A, Krummenacker M, Subhraveti P, Weerasinghe D, Gama-Castro S, Huerta AM, Muñiz-Rascado L, Bonavides-Martinez C, Weiss V, Peralta-Gil M, Santos-Zavaleta A, Schröder I, Mackie A, Gunsalus R, Collado-Vides J, Keseler IM, Paulsen I. The EcoCyc Database. EcoSal Plus 2014; 6:10.1128/ecosalplus.ESP-0009-2013. [PMID: 26442933 PMCID: PMC4243172 DOI: 10.1128/ecosalplus.esp-0009-2013] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Received: 01/16/2014] [Indexed: 11/20/2022]
Abstract
EcoCyc is a bioinformatics database available at EcoCyc.org that describes the genome and the biochemical machinery of Escherichia coli K-12 MG1655. The long-term goal of the project is to describe the complete molecular catalog of the E. coli cell, as well as the functions of each of its molecular parts, to facilitate a system-level understanding of E. coli. EcoCyc is an electronic reference source for E. coli biologists and for biologists who work with related microorganisms. The database includes information pages on each E. coli gene, metabolite, reaction, operon, and metabolic pathway. The database also includes information on E. coli gene essentiality and on nutrient conditions that do or do not support the growth of E. coli. The website and downloadable software contain tools for analysis of high-throughput data sets. In addition, a steady-state metabolic flux model is generated from each new version of EcoCyc. The model can predict metabolic flux rates, nutrient uptake rates, and growth rates for different gene knockouts and nutrient conditions. This review provides a detailed description of the data content of EcoCyc and of the procedures by which this content is generated.
Affiliation(s)
- Peter D Karp
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Daniel Weaver
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Suzanne Paley
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Carol Fulcher
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Aya Kubo
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Anamika Kothari
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Socorro Gama-Castro
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
- Araceli M Huerta
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
- Luis Muñiz-Rascado
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
- César Bonavides-Martinez
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
- Verena Weiss
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
- Martin Peralta-Gil
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
- Alberto Santos-Zavaleta
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
- Imke Schröder
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, CA 90095
- UCLA Institute of Genomics and Proteomics, University of California, Los Angeles, CA 90095
- Amanda Mackie
- Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
- Robert Gunsalus
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, CA 90095
- Julio Collado-Vides
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
- Ingrid M Keseler
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Ian Paulsen
- Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
|
16
|
Gasteiger J. Some solved and unsolved problems of chemoinformatics. SAR and QSAR in Environmental Research 2014; 25:443-455. [PMID: 24716817 DOI: 10.1080/1062936x.2014.898688] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/03/2023]
Abstract
The field of chemoinformatics has developed from different roots, starting in the 1960s. These branches have now merged into a scientific discipline of its own, exchanging ideas and methods across different areas of chemistry. In the last 40 years chemoinformatics has achieved a lot. Without access to the databases in chemistry developed with chemoinformatics methods, modern chemical research would not be able to work at its present high level of competence. However, there are quite a few challenges, such as drug design and understanding the effect of chemicals on human health and on the environment, as well as furthering our knowledge of chemistry and of biological systems, that can benefit from a more intensive use of chemoinformatics methods. Approaches to meet these challenges will be briefly outlined. All this emphasizes that chemoinformatics has matured into a scientific discipline of its own that reaches out to many other chemical fields and will increase in attractiveness to students and researchers.
Affiliation(s)
- J Gasteiger
- Computer-Chemie-Centrum, University of Erlangen-Nuremberg, Erlangen, Germany
|
17
|
Abstract
The human microbiome plays important roles in health, but when disrupted, these same indigenous microbes can cause disease. The composition of the microbiome changes during the transition from health to disease; however, these changes are often not conserved among patients. Since microbiome-associated diseases like periodontitis cause similar patient symptoms despite interpatient variability in microbial community composition, we hypothesized that human-associated microbial communities undergo conserved changes in metabolism during disease. Here, we used patient-matched healthy and diseased samples to compare gene expression of 160,000 genes in healthy and diseased periodontal communities. We show that health- and disease-associated communities exhibit defined differences in metabolism that are conserved between patients. In contrast, the metabolic gene expression of individual species was highly variable between patients. These results demonstrate that despite high interpatient variability in microbial composition, disease-associated communities display conserved metabolic profiles that are generally accomplished by a patient-specific cohort of microbes. IMPORTANCE The human microbiome project has shown that shifts in our microbiota are associated with many diseases, including obesity, Crohn's disease, diabetes, and periodontitis. While changes in microbial populations are apparent during these diseases, the species associated with each disease can vary from patient to patient. Taking into account this interpatient variability, we hypothesized that specific microbiota-associated diseases would be marked by conserved microbial community behaviors. Here, we use gene expression analyses of patient-matched healthy and diseased human periodontal plaque to show that microbial communities have highly conserved metabolic gene expression profiles, whereas individual species within the community do not. 
Furthermore, disease-associated communities exhibit conserved changes in metabolic and virulence gene expression.
|
18
|
Combining chemoinformatics with bioinformatics: in silico prediction of bacterial flavor-forming pathways by a chemical systems biology approach "reverse pathway engineering". PLoS One 2014; 9:e84769. [PMID: 24416282 PMCID: PMC3885609 DOI: 10.1371/journal.pone.0084769] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 07/09/2013] [Accepted: 11/18/2013] [Indexed: 12/05/2022] Open
Abstract
The incompleteness of genome-scale metabolic models is a major bottleneck for systems biology approaches, which are based on large numbers of metabolites as identified and quantified by metabolomics. Many of the revealed secondary metabolites and/or their derivatives, such as flavor compounds, are non-essential in metabolism, and many of their synthesis pathways are unknown. In this study, we describe a novel approach, Reverse Pathway Engineering (RPE), which combines chemoinformatics and bioinformatics analyses, to predict the “missing links” between compounds of interest and their possible metabolic precursors by providing plausible chemical and/or enzymatic reactions. We demonstrate the added-value of the approach by using flavor-forming pathways in lactic acid bacteria (LAB) as an example. Established metabolic routes leading to the formation of flavor compounds from leucine were successfully replicated. Novel reactions involved in flavor formation, i.e. the conversion of alpha-hydroxy-isocaproate to 3-methylbutanoic acid and the synthesis of dimethyl sulfide, as well as the involved enzymes were successfully predicted. These new insights into the flavor-formation mechanisms in LAB can have a significant impact on improving the control of aroma formation in fermented food products. Since the input reaction databases and compounds are highly flexible, the RPE approach can be easily extended to a broad spectrum of applications, amongst others health/disease biomarker discovery as well as synthetic biology.
|
19
|
Boon E, Meehan CJ, Whidden C, Wong DHJ, Langille MGI, Beiko RG. Interactions in the microbiome: communities of organisms and communities of genes. FEMS Microbiol Rev 2014; 38:90-118. [PMID: 23909933 PMCID: PMC4298764 DOI: 10.1111/1574-6976.12035] [Citation(s) in RCA: 119] [Impact Index Per Article: 11.9] [Received: 04/23/2013] [Revised: 07/02/2013] [Accepted: 07/10/2013] [Indexed: 12/17/2022] Open
Abstract
A central challenge in microbial community ecology is the delineation of appropriate units of biodiversity, which can be taxonomic, phylogenetic, or functional in nature. The term 'community' is applied ambiguously; in some cases, the term refers simply to a set of observed entities, while in other cases, it requires that these entities interact with one another. Microorganisms can rapidly gain and lose genes, potentially decoupling community roles from taxonomic and phylogenetic groupings. Trait-based approaches offer a useful alternative, but many traits can be defined based on gene functions, metabolic modules, and genomic properties, and the optimal set of traits to choose is often not obvious. An analysis that considers taxon assignment and traits in concert may be ideal, with the strengths of each approach offsetting the weaknesses of the other. Individual genes also merit consideration as entities in an ecological analysis, with characteristics such as diversity, turnover, and interactions modeled using genes rather than organisms as entities. We identify some promising avenues of research that are likely to yield a deeper understanding of microbial communities that shift from observation-based questions of 'Who is there?' and 'What are they doing?' to the mechanistically driven question of 'How will they respond?'
Affiliation(s)
- Eva Boon
- Department of Biology, Dalhousie University, Halifax, NS, Canada
|
20
|
Konietzny SGA, Pope PB, Weimann A, McHardy AC. Inference of phenotype-defining functional modules of protein families for microbial plant biomass degraders. Biotechnology for Biofuels 2014; 7:124. [PMID: 25342967 PMCID: PMC4189754 DOI: 10.1186/s13068-014-0124-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 04/19/2014] [Accepted: 08/05/2014] [Indexed: 05/14/2023]
Abstract
BACKGROUND Efficient industrial processes for converting plant lignocellulosic materials into biofuels are a key to global efforts to come up with alternative energy sources to fossil fuels. Novel cellulolytic enzymes have been discovered in microbial genomes and metagenomes of microbial communities. However, the identification of relevant genes without known homologs, and the elucidation of the lignocellulolytic pathways and protein complexes for different microorganisms remain challenging. RESULTS We describe a new computational method for the targeted discovery of functional modules of plant biomass-degrading protein families, based on their co-occurrence patterns across genomes and metagenome datasets, and the strength of association of these modules with the genomes of known degraders. From approximately 6.4 million family annotations for 2,884 microbial genomes, and 332 taxonomic bins from 18 metagenomes, we identified 5 functional modules that are distinctive for plant biomass degraders, which we term "plant biomass degradation modules" (PDMs). These modules incorporate protein families involved in the degradation of cellulose, hemicelluloses, and pectins, structural components of the cellulosome, and additional families with potential functions in plant biomass degradation. The PDMs were linked to 81 gene clusters in genomes of known lignocellulose degraders, including previously described clusters of lignocellulolytic genes. On average, 70% of the families of each PDM were found to map to gene clusters in known degraders, which served as an additional confirmation of their functional relationships. The presence of a PDM in a genome or taxonomic metagenome bin furthermore allowed us to accurately predict the ability of any particular organism to degrade plant biomass. 
For 15 draft genomes of a cow rumen metagenome, we used cross-referencing to confirmed cellulolytic enzymes to validate that the PDMs identified plant biomass degraders within a complex microbial community. CONCLUSIONS Functional modules of protein families that are involved in different aspects of plant cell wall degradation can be inferred from co-occurrence patterns across (meta-)genomes with a probabilistic topic model. PDMs represent a new resource of protein families and candidate genes implicated in microbial plant biomass degradation. They can also be used to predict the plant biomass degradation ability for a genome or taxonomic bin. The method is also suitable for characterizing other microbial phenotypes.
Affiliation(s)
- Sebastian GA Konietzny
- Max-Planck Research Group for Computational Genomics and Epidemiology, Max-Planck Institute for Informatics, University Campus E1 4, Saarbrücken, 66123 Germany
- Department of Algorithmic Bioinformatics, Heinrich Heine University Düsseldorf, Düsseldorf, 40225 Germany
- Phillip B Pope
- Department of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Post Office Box 5003, 1432 Ås, Norway
- Aaron Weimann
- Department of Algorithmic Bioinformatics, Heinrich Heine University Düsseldorf, Düsseldorf, 40225 Germany
- Alice C McHardy
- Max-Planck Research Group for Computational Genomics and Epidemiology, Max-Planck Institute for Informatics, University Campus E1 4, Saarbrücken, 66123 Germany
- Department of Algorithmic Bioinformatics, Heinrich Heine University Düsseldorf, Düsseldorf, 40225 Germany
|
21
|
Psomopoulos FE, Mitkas PA, Ouzounis CA. Detection of genomic idiosyncrasies using fuzzy phylogenetic profiles. PLoS One 2013; 8:e52854. [PMID: 23341912 PMCID: PMC3544837 DOI: 10.1371/journal.pone.0052854] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 02/14/2012] [Accepted: 11/22/2012] [Indexed: 11/18/2022] Open
Abstract
Phylogenetic profiles express the presence or absence of genes and their homologs across a number of reference genomes. They have emerged as an elegant representation framework for comparative genomics and have been used for the genome-wide inference and discovery of functionally linked genes or metabolic pathways. As the number of reference genomes grows, there is an acute need for faster and more accurate methods for phylogenetic profile analysis with increased performance in speed and quality. We propose a novel, efficient method for the detection of genomic idiosyncrasies, i.e. sets of genes found in a specific genome with peculiar phylogenetic properties, such as intra-genome correlations or inter-genome relationships. Our algorithm is a four-step process where genome profiles are first defined as fuzzy vectors, then discretized to binary vectors, followed by a de-noising step, and finally a comparison step to generate intra- and inter-genome distances for each gene profile. The method is validated with a carefully selected benchmark set of five reference genomes, using a range of approaches regarding similarity metrics and pre-processing stages for noise reduction. We demonstrate that the fuzzy profile method consistently identifies the actual phylogenetic relationship and origin of the genes under consideration for the majority of the cases, while the detected outliers are found to be particular genes with peculiar phylogenetic patterns. The proposed method provides a time-efficient and highly scalable approach for phylogenetic stratification, with the detected groups of genes being either similar to their own genome profile or different from it, thus revealing atypical evolutionary histories.
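The four-step process named in this abstract (fuzzy vectors, discretization, de-noising, comparison) can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the 0.5 threshold, the de-noising rule (dropping reference-genome columns that are identical for every gene), the majority-vote genome profile, and the normalized Hamming distance are all hypothetical choices.

```python
from typing import Dict, List

def discretize(profile: List[float], threshold: float = 0.5) -> List[int]:
    """Step 2: turn a fuzzy profile (values in [0, 1]) into a binary one.
    The threshold value is a hypothetical choice."""
    return [1 if value >= threshold else 0 for value in profile]

def denoise(profiles: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Step 3 (hypothetical rule): drop reference-genome columns that hold
    the same bit for every gene, since they cannot separate genes."""
    n_cols = len(next(iter(profiles.values())))
    keep = [j for j in range(n_cols)
            if len({p[j] for p in profiles.values()}) > 1]
    return {gene: [p[j] for j in keep] for gene, p in profiles.items()}

def genome_profile(profiles: Dict[str, List[int]]) -> List[int]:
    """Majority vote per column, used here as a stand-in for the genome's own profile."""
    rows = list(profiles.values())
    return [1 if 2 * sum(col) >= len(rows) else 0 for col in zip(*rows)]

def hamming(a: List[int], b: List[int]) -> float:
    """Normalized Hamming distance; 0.0 for empty profiles."""
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)

def gene_distances(fuzzy: Dict[str, List[float]]) -> Dict[str, float]:
    """Steps 1-4: fuzzy input -> binary -> de-noised -> distance to genome profile."""
    binary = denoise({gene: discretize(p) for gene, p in fuzzy.items()})
    reference = genome_profile(binary)
    return {gene: hamming(p, reference) for gene, p in binary.items()}

# Step 1: toy fuzzy profiles, one value per reference genome (made-up numbers).
example_fuzzy = {
    "geneA": [0.9, 0.8, 0.1, 0.95],
    "geneB": [0.7, 0.9, 0.2, 0.90],
    "geneC": [0.1, 0.2, 0.9, 0.99],
}
print(gene_distances(example_fuzzy))
```

In this toy input, a gene whose distance to the genome-wide profile is large would be the kind of outlier the abstract calls a genomic idiosyncrasy; the real method's metrics and pre-processing are described in the paper itself.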
Affiliation(s)
- Fotis E. Psomopoulos
- Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Pericles A. Mitkas
- Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Christos A. Ouzounis
- Centre for Bioinformatics, Department of Informatics, School of Natural and Mathematical Sciences, King’s College London, Strand, London, United Kingdom
|
22
|
Abstract
Background: Phenotypes exhibited by microorganisms can be useful for several purposes, e.g., ethanol as an alternate fuel. Sometimes, the target phenotype may be required in combination with other phenotypes in order to be useful; e.g., an industrial process may require that the organism survive in an anaerobic, alcohol-rich environment and be able to feed on both hexose and pentose sugars to produce ethanol. This combination of traits may not be available in any existing organism or, if it does exist, the mechanisms involved in the phenotype expression may not be efficient enough to be useful. Thus, it may be required to genetically modify microorganisms. However, before any genetic modification can take place, it is important to identify the underlying cellular subsystems responsible for the expression of the target phenotype. Results: In this paper, we develop a method to identify statistically significant and phenotypically-biased functional modules. The method can compare the organismal network information from hundreds of phenotype-expressing and phenotype non-expressing organisms to identify cellular subsystems that are more prone to occur in phenotype-expressing organisms than in phenotype non-expressing organisms. We have provided literature evidence that the phenotype-biased modules identified for phenotypes such as hydrogen production (dark and light fermentation), respiration, gram-positive, gram-negative and motility, are indeed phenotype-related. Conclusion: Thus we have proposed a methodology to identify phenotype-biased cellular subsystems. We have shown the effectiveness of our methodology by applying it to several target phenotypes. The code and all supplemental files can be downloaded from http://freescience.org/cs/phenotype-biased-biclusters/.
|
23
|
Schmidt MC, Rocha AM, Padmanabhan K, Shpanskaya Y, Banfield J, Scott K, Mihelcic JR, Samatova NF. NIBBS-search for fast and accurate prediction of phenotype-biased metabolic systems. PLoS Comput Biol 2012; 8:e1002490. [PMID: 22589706 PMCID: PMC3349732 DOI: 10.1371/journal.pcbi.1002490] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/02/2011] [Accepted: 03/08/2012] [Indexed: 02/07/2023] Open
Abstract
Understanding of genotype-phenotype associations is important not only for furthering our knowledge on internal cellular processes, but also essential for providing the foundation necessary for genetic engineering of microorganisms for industrial use (e.g., production of bioenergy or biofuels). However, genotype-phenotype associations alone do not provide enough information to alter an organism's genome to either suppress or exhibit a phenotype. It is important to look at the phenotype-related genes in the context of the genome-scale network to understand how the genes interact with other genes in the organism. Identification of metabolic subsystems involved in the expression of the phenotype is one way of placing the phenotype-related genes in the context of the entire network. A metabolic system refers to a metabolic network subgraph; nodes are compounds and edge labels are the enzymes that catalyze the reaction. The metabolic subsystem could be part of a single metabolic pathway or span parts of multiple pathways. Arguably, comparative genome-scale metabolic network analysis is a promising strategy to identify these phenotype-related metabolic subsystems. Network Instance-Based Biased Subgraph Search (NIBBS) is a graph-theoretic method for genome-scale metabolic network comparative analysis that can identify metabolic systems that are statistically biased toward phenotype-expressing organismal networks. We set up experiments with target phenotypes like hydrogen production, TCA expression, and acid-tolerance. We show via extensive literature search that some of the resulting metabolic subsystems are indeed phenotype-related and formulate hypotheses for other systems in terms of their role in phenotype expression. NIBBS is also orders of magnitude faster than MULE, one of the most efficient maximal frequent subgraph mining algorithms that could be adjusted for this problem.
Also, the set of phenotype-biased metabolic systems output by NIBBS comes very close to the set of phenotype-biased subgraphs output by an exact maximally-biased subgraph enumeration algorithm (MBS-Enum). The code (NIBBS and the module to visualize the identified subsystems) is available at http://freescience.org/cs/NIBBS.
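To make the idea of a subgraph that is "biased toward phenotype-expressing organismal networks" concrete, the sketch below represents each organism's metabolic network as a set of labeled edges (compound, compound, enzyme) and scores a candidate subsystem by how much more often it appears in expressing than in non-expressing networks. This is not the NIBBS algorithm or its statistic: the `support` and `bias` functions, the difference-of-frequencies score, and the placeholder compound and enzyme names are all assumptions made for illustration.

```python
from typing import FrozenSet, List, Set, Tuple

# An edge is (substrate compound, product compound, enzyme label).
Edge = Tuple[str, str, str]

def support(subsystem: FrozenSet[Edge], networks: List[Set[Edge]]) -> float:
    """Fraction of networks that contain every edge of the subsystem."""
    if not networks:
        return 0.0
    return sum(subsystem <= net for net in networks) / len(networks)

def bias(subsystem: FrozenSet[Edge],
         expressing: List[Set[Edge]],
         non_expressing: List[Set[Edge]]) -> float:
    """Hypothetical bias score: support among phenotype-expressing networks
    minus support among non-expressing networks (range -1.0 to 1.0)."""
    return support(subsystem, expressing) - support(subsystem, non_expressing)

# Placeholder edges; the names carry no biological meaning.
e1: Edge = ("A", "B", "E1")
e2: Edge = ("B", "C", "E2")
e3: Edge = ("C", "D", "E3")

expressing = [{e1, e2, e3}, {e1, e2}]   # networks of organisms showing the phenotype
non_expressing = [{e1}, {e3}]           # networks of organisms lacking it

print(bias(frozenset({e1, e2}), expressing, non_expressing))
print(bias(frozenset({e1}), expressing, non_expressing))
```

A search method would need to enumerate candidate subsystems efficiently and assess statistical significance, which this scoring sketch deliberately leaves out.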
Affiliation(s)
- Matthew C. Schmidt
- Department of Computer Science, North Carolina State University, Raleigh, North Carolina, United States of America
- Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, United States of America
- Andrea M. Rocha
- Department of Civil and Environmental Engineering, University of South Florida, Tampa, Florida, United States of America
- Kanchana Padmanabhan
- Department of Computer Science, North Carolina State University, Raleigh, North Carolina, United States of America
- Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, United States of America
- Yekaterina Shpanskaya
- Neuroscience Department, Duke University, Durham, North Carolina, United States of America
- Jill Banfield
- Departments of Earth and Planetary Science, University of California, Berkeley, California, United States of America
- Environmental Science, Policy, & Management, University of California, Berkeley, California, United States of America
- Geochemistry Department, Lawrence Berkeley National Laboratory Earth Sciences Division, Berkeley, California, United States of America
- Kathleen Scott
- Department of Integrative Biology, University of South Florida, Tampa, Florida, United States of America
- James R. Mihelcic
- Department of Civil and Environmental Engineering, University of South Florida, Tampa, Florida, United States of America
- Nagiza F. Samatova
- Department of Computer Science, North Carolina State University, Raleigh, North Carolina, United States of America
- Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, United States of America
|
24
|
Use of comparative genomics approaches to characterize interspecies differences in response to environmental chemicals: challenges, opportunities, and research needs. Toxicol Appl Pharmacol 2011; 271:372-85. [PMID: 22142766 DOI: 10.1016/j.taap.2011.11.011] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 06/08/2011] [Revised: 11/11/2011] [Accepted: 11/16/2011] [Indexed: 01/12/2023]
Abstract
A critical challenge for environmental chemical risk assessment is the characterization and reduction of uncertainties introduced when extrapolating inferences from one species to another. The purpose of this article is to explore the challenges, opportunities, and research needs surrounding the issue of how genomics data and computational and systems level approaches can be applied to inform differences in response to environmental chemical exposure across species. We propose that the data, tools, and evolutionary framework of comparative genomics be adapted to inform interspecies differences in chemical mechanisms of action. We compare and contrast existing approaches, from disciplines as varied as evolutionary biology, systems biology, mathematics, and computer science, that can be used, modified, and combined in new ways to discover and characterize interspecies differences in chemical mechanism of action which, in turn, can be explored for application to risk assessment. We consider how genetic, protein, pathway, and network information can be interrogated from an evolutionary biology perspective to effectively characterize variations in biological processes of toxicological relevance among organisms. We conclude that comparative genomics approaches show promise for characterizing interspecies differences in mechanisms of action, and further, for improving our understanding of the uncertainties inherent in extrapolating inferences across species in both ecological and human health risk assessment. To achieve long-term relevance and consistent use in environmental chemical risk assessment, improved bioinformatics tools, computational methods robust to data gaps, and quantitative approaches for conducting extrapolations across species are critically needed. Specific areas ripe for research to address these needs are recommended.
|
25
|
Abstract
The mitis group streptococci (MGS) are widespread in the oral cavity and are traditionally associated with oral health. However, these organisms have many attributes that contribute to the development of pathogenic oral communities. MGS adhere rapidly to saliva-coated tooth surfaces, thereby providing an attachment substratum for more overtly pathogenic organisms such as Porphyromonas gingivalis, and the two species assemble into heterotypic communities. Close physical association facilitates physiologic support, and pathogens such as Aggregatibacter actinomycetemcomitans display resource partitioning to favour carbon sources generated by streptococcal metabolism. MGS exchange information with community members through a number of interspecies signalling systems, including AI-2 and contact-dependent mechanisms. Signal transduction systems induced in P. gingivalis are based on protein dephosphorylation mediated by the tyrosine phosphatase Ltp1, and converge on a LuxR-family transcriptional regulator, CdhR. Phenotypic responses in P. gingivalis include regulation of hemin uptake systems and gingipain activity, processes that are intimately linked to the virulence of the organism. Furthermore, communities of S. gordonii with P. gingivalis or with A. actinomycetemcomitans are more pathogenic in animal models than the constituent species alone. We propose that MGS should be considered accessory pathogens, organisms whose pathogenic potential only becomes evident in the context of a heterotypic microbial community.
Affiliation(s)
- Sarah E Whitmore
- Center for Oral Health and Systemic Disease, School of Dentistry, University of Louisville, Louisville, KY 40202, USA
|
26
|
Lingner T, Mühlhausen S, Gabaldón T, Notredame C, Meinicke P. Predicting phenotypic traits of prokaryotes from protein domain frequencies. BMC Bioinformatics 2010; 11:481. [PMID: 20868492 PMCID: PMC2955703 DOI: 10.1186/1471-2105-11-481] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 06/18/2010] [Accepted: 09/24/2010] [Indexed: 12/03/2022] Open
Abstract
Background: Establishing the relationship between an organism's genome sequence and its phenotype is a fundamental challenge that remains largely unsolved. Accurately predicting microbial phenotypes solely based on genomic features will allow us to infer relevant phenotypic characteristics when the availability of a genome sequence precedes experimental characterization, a scenario that is favored by the advent of novel high-throughput and single cell sequencing techniques. Results: We present a novel approach to predict the phenotype of prokaryotes directly from their protein domain frequencies. Our discriminative machine learning approach provides high prediction accuracy of relevant phenotypes such as motility, oxygen requirement or spore formation. Moreover, the set of discriminative domains provides biological insight into the underlying phenotype-genotype relationship and enables deriving hypotheses on the possible functions of uncharacterized domains. Conclusions: Fast and accurate prediction of microbial phenotypes based on genomic protein domain content is feasible and has the potential to provide novel biological insights. First results of a systematic check for annotation errors indicate that our approach may also be applied to semi-automatic correction and completion of the existing phenotype annotation.
Affiliation(s)
- Thomas Lingner
- Department of Bioinformatics, Institute of Microbiology and Genetics, Georg-August-University Göttingen, Germany.
|
27
|
MacDonald NJ, Beiko RG. Efficient learning of microbial genotype-phenotype association rules. Bioinformatics 2010; 26:1834-40. [PMID: 20529891 DOI: 10.1093/bioinformatics/btq305] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Indexed: 11/13/2022]
Abstract
Motivation: Finding biologically causative genotype-phenotype associations from whole-genome data is difficult due to the large gene feature space to mine, the potential for interactions among genes, and phylogenetic correlations between genomes. Associations within phylogenetically distinct organisms with unusual molecular mechanisms underlying their phenotype may be particularly difficult to assess. Results: We have developed a new genotype-phenotype association approach that uses Classification based on Predictive Association Rules (CPAR), and compare it with NETCAR, a recently published association algorithm. Our implementation of CPAR gave on average slightly higher classification accuracy, with approximately 100 times faster running times. Given the influence of phylogenetic correlations in the extraction of genotype-phenotype association rules, we furthermore propose a novel measure for downweighting the dependence among samples by modeling shared ancestry using conditional mutual information, and demonstrate its complementary nature to traditional mining approaches. Availability: Software implemented for this study is available under the Creative Commons Attribution 3.0 license from the author at http://kiwi.cs.dal.ca/Software/PICA
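The conditional mutual information mentioned in this abstract can be illustrated with a small plug-in estimator. This is a minimal sketch and not the authors' implementation: the function name `conditional_mutual_information`, the use of a discrete clade label as the conditioning variable, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X; Y | Z) in bits from paired discrete samples.

    Hypothetical reading of the inputs (an assumption, not from the paper):
    xs = gene presence/absence, ys = phenotype label, zs = clade label.
    """
    n = len(xs)
    if n == 0 or not (n == len(ys) == len(zs)):
        raise ValueError("xs, ys and zs must be non-empty and of equal length")

    n_xyz = Counter(zip(xs, ys, zs))
    n_xz = Counter(zip(xs, zs))
    n_yz = Counter(zip(ys, zs))
    n_z = Counter(zs)

    total = 0.0
    for (x, y, z), c in n_xyz.items():
        # p(x,y,z) * log2[ p(z) p(x,y,z) / (p(x,z) p(y,z)) ], written with counts
        total += (c / n) * log2((n_z[z] * c) / (n_xz[(x, z)] * n_yz[(y, z)]))
    return total


# Gene and phenotype agree, but both are fully explained by clade membership.
explained_by_clade = conditional_mutual_information(
    [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1]
)

# Gene and phenotype agree within a single clade, so ancestry explains nothing.
within_one_clade = conditional_mutual_information(
    [0, 1, 0, 1], [0, 1, 0, 1], [0, 0, 0, 0]
)
```

In the first toy case the estimate is zero because the clade label already accounts for the association; in the second it is one bit because the association holds inside a single clade.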
Affiliation(s)
- Norman J MacDonald
- Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada
|
28
|
|
29
|
Heinemann M, Sauer U. Systems biology of microbial metabolism. Curr Opin Microbiol 2010; 13:337-43. [PMID: 20219420 DOI: 10.1016/j.mib.2010.02.005] [Citation(s) in RCA: 89] [Impact Index Per Article: 6.4] [Received: 02/01/2010] [Accepted: 02/13/2010] [Indexed: 12/20/2022]
Abstract
One current challenge in metabolic systems biology is to map out the regulation networks that control metabolism. From progress in this area, we conclude that non-transcriptional mechanisms (e.g. metabolite-protein interactions and protein phosphorylation) are highly relevant in actually controlling metabolic function. Furthermore, recent results highlight more functions of enzymes and metabolites than currently appreciated in genome-scale metabolic reconstructions, thereby adding another level of complexity. Combining experimental analyses and modeling efforts we are also beginning to understand how metabolic behavior emerges. Particularly, we recognize that metabolism is not simply a dull workhorse process but rather takes very active control of itself and other cellular processes, rendering true system-level understanding of metabolism possibly more difficult than for other cellular systems.
Affiliation(s)
- Matthias Heinemann
- ETH Zurich, Institute of Molecular Systems Biology, Wolfgang-Pauli-Str. 16, 8093 Zurich, Switzerland.
|
30
|
Dale JM, Popescu L, Karp PD. Machine learning methods for metabolic pathway prediction. BMC Bioinformatics 2010; 11:15. [PMID: 20064214 PMCID: PMC3146072 DOI: 10.1186/1471-2105-11-15] [Citation(s) in RCA: 106] [Impact Index Per Article: 7.6] [Received: 08/03/2009] [Accepted: 01/08/2010] [Indexed: 12/29/2022] Open
Abstract
Background: A key challenge in systems biology is the reconstruction of an organism's metabolic network from its genome sequence. One strategy for addressing this problem is to predict which metabolic pathways, from a reference database of known pathways, are present in the organism, based on the annotated genome of the organism. Results: To quantitatively validate methods for pathway prediction, we developed a large "gold standard" dataset of 5,610 pathway instances known to be present or absent in curated metabolic pathway databases for six organisms. We defined a collection of 123 pathway features, whose information content we evaluated with respect to the gold standard. Feature data were used as input to an extensive collection of machine learning (ML) methods, including naïve Bayes, decision trees, and logistic regression, together with feature selection and ensemble methods. We compared the ML methods to the previous PathoLogic algorithm for pathway prediction using the gold standard dataset. We found that ML-based prediction methods can match the performance of the PathoLogic algorithm. PathoLogic achieved an accuracy of 91% and an F-measure of 0.786. The ML-based prediction methods achieved accuracy as high as 91.2% and F-measure as high as 0.787. The ML-based methods output a probability for each predicted pathway, whereas PathoLogic does not, which provides more information to the user and facilitates filtering of predicted pathways. Conclusions: ML methods for pathway prediction perform as well as existing methods, and have qualitative advantages in terms of extensibility, tunability, and explainability. More advanced prediction methods and/or more sophisticated input features may improve the performance of ML methods. However, pathway prediction performance appears to be limited largely by the ability to correctly match enzymes to the reactions they catalyze based on genome annotations.
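The abstract reports both accuracy and F-measure, and the two can diverge when absent pathways outnumber present ones. The sketch below shows how each is computed from a confusion matrix. The counts are hypothetical values chosen for illustration and are not taken from the paper's gold standard dataset; the function names are likewise assumptions.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all pathway instances labelled correctly."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall (F1); ignores true negatives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: most candidate pathways are truly absent (many true negatives).
tp, fp, tn, fn = 30, 10, 150, 10

acc = accuracy(tp, fp, tn, fn)   # (30 + 150) / 200
f1 = f_measure(tp, fp, fn)       # precision = recall = 30 / 40
```

Because true negatives count toward accuracy but not toward the F-measure, a predictor evaluated on mostly absent pathways can show high accuracy alongside a noticeably lower F-measure, which is the pattern seen in the figures quoted above.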
Affiliation(s)
- Joseph M Dale
- Bioinformatics Research Group, SRI International, 333 Ravenswood Ave, Menlo Park, CA 94025, USA
|