1
Chettri D, Verma AK, Chirania M, Verma AK. Metagenomic approaches in bioremediation of environmental pollutants. Environmental Pollution (Barking, Essex: 1987) 2024; 363:125297. [PMID: 39537082 DOI: 10.1016/j.envpol.2024.125297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2024] [Revised: 10/05/2024] [Accepted: 11/10/2024] [Indexed: 11/16/2024]
Abstract
Metagenomics has emerged as a pivotal tool in bioremediation, providing a deeper understanding of the structure and function of the microbial communities involved in pollutant degradation. By circumventing the limitations of traditional culture-based methods, metagenomics enables comprehensive analysis of microbial ecosystems and facilitates the identification of new genes and metabolic pathways that are critical for bioremediation. Advanced sequencing technologies combined with computational and bioinformatics approaches have greatly enhanced our ability to detect sources of pollution and monitor dynamic changes in microbial communities during the bioremediation process. These tools enable the precise identification of key microbial players and their functional roles, and provide a deeper understanding of complex biodegradation networks. The integration of artificial intelligence (AI) with machine learning algorithms has accelerated the process of discovery of novel genes associated with bioremediation and has optimized metabolic pathway prediction. Novel strategies, including sequencing techniques and AI-assisted analysis, have the potential to revolutionize bioremediation by enabling the development of highly efficient, targeted, and sustainable remediation strategies for various contaminated environments. However, the complexity of microbial interactions, data interpretation, and high cost of these advanced technologies remain challenging. Future research should focus on improving computational tools, reducing costs, and integrating multidisciplinary approaches to overcome these limitations.
Affiliation(s)
- Dixita Chettri
- Department of Microbiology, Sikkim University, Gangtok, 737102, Sikkim, India
| | - Ashwani Kumar Verma
- Department of Biosciences and Bioengineering, Indian Institute of Technology Guwahati, Guwahati, 781039, Assam, India
| | - Manisha Chirania
- Department of Microbiology, Sikkim University, Gangtok, 737102, Sikkim, India
| | - Anil Kumar Verma
- Department of Microbiology, Sikkim University, Gangtok, 737102, Sikkim, India.
2
Cui J, Ju KS. Biosynthesis of Bacillus Phosphonoalamides Reveals Highly Specific Amino Acid Ligation. ACS Chem Biol 2024; 19:1506-1514. [PMID: 38885091 PMCID: PMC11259534 DOI: 10.1021/acschembio.4c00190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2024]
Abstract
Phosphonate natural products have a history of commercial success across numerous industries due to their potent inhibition of metabolic processes. Over the past decade, genome mining approaches have successfully led to the discovery of numerous bioactive phosphonates. However, continued success is dependent upon a greater understanding of phosphonate metabolism, which will enable the prioritization and prediction of biosynthetic gene clusters for targeted isolation. Here, we report the complete biosynthetic pathway for phosphonoalamides E and F, antimicrobial phosphonopeptides with a conserved C-terminal l-phosphonoalanine (PnAla) residue. These peptides, produced by Bacillus, are the direct result of PnAla biosynthesis and serial ligation by two ATP-grasp ligases. A critical step of this pathway was the reversible transamination of phosphonopyruvate to PnAla by a dedicated transaminase with preference for the forward reaction. The dipeptide ligase PnfA was shown to ligate alanine to PnAla to afford phosphonoalamide E, which was subsequently ligated to alanine by PnfB to form phosphonoalamide F. Specificity profiling of both ligases found each to be highly specific, although the limited acceptance of noncanonical substrates by PnfA allowed for in vitro formation of products incorporating alternative pharmacophores. Our findings further establish the transaminative branch of phosphonate metabolism, unveil insights into the specificity of ATP-grasp ligation, and highlight the biocatalytic potential of biosynthetic enzymes.
Affiliation(s)
- Jerry Cui
- Department of Microbiology, The Ohio State University, Columbus, Ohio 43210, United States
| | - Kou-San Ju
- Department of Microbiology, The Ohio State University, Columbus, Ohio 43210, United States
- Division of Medicinal Chemistry and Pharmacognosy, The Ohio State University, Columbus, Ohio 43210, United States
- Center for Applied Plant Sciences, The Ohio State University, Columbus, Ohio 43210, United States
- Infectious Diseases Institute, The Ohio State University, Columbus, Ohio 43210, United States
3
Botas J, Rodríguez Del Río Á, Giner-Lamia J, Huerta-Cepas J. GeCoViz: genomic context visualisation of prokaryotic genes from a functional and evolutionary perspective. Nucleic Acids Res 2022; 50:W352-W357. [PMID: 35639770 PMCID: PMC9252766 DOI: 10.1093/nar/gkac367] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/22/2022] [Revised: 04/14/2022] [Accepted: 05/05/2022] [Indexed: 11/14/2022] [Open access]
Abstract
Synteny conservation analysis is a well-established methodology to investigate the potential functional role of unknown prokaryotic genes. However, bioinformatic tools to reconstruct and visualise genomic contexts usually depend on slow computations, are restricted to narrow taxonomic ranges, and/or do not allow for the functional and interactive exploration of neighbouring genes across different species. Here, we present GeCoViz, an online resource built upon 12 221 reference prokaryotic genomes that provides fast and interactive visualisation of custom genomic regions anchored by any target gene, which can be sought by either name, orthologous group (KEGGs, eggNOGs), protein domain (PFAM) or sequence. To facilitate functional and evolutionary interpretation, GeCoViz allows users to customise the taxonomic scope of each analysis and provides comprehensive annotations of the neighbouring genes. Interactive visualisation options include, among others, scaled representations of gene lengths and genomic distances, and on-the-fly calculation of synteny conservation of neighbouring genes, which can be highlighted based on custom thresholds. The resulting plots can be downloaded as high-quality images for publishing purposes. Overall, GeCoViz offers an easy-to-use, comprehensive, fast and interactive web-based tool for investigating the genomic context of prokaryotic genes, and is freely available at https://gecoviz.cgmlab.org.
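As a rough illustration of what a per-position synteny conservation score with a custom threshold can look like, the sketch below computes, for each position relative to an anchor gene, the fraction of genomes that share the most common orthologous group at that position. The function name, the input layout, and the scoring rule are assumptions made for illustration; the abstract does not describe how GeCoViz performs this calculation.

```python
from collections import Counter


def synteny_conservation(contexts):
    """Score conservation of neighbouring genes around an anchor gene.

    contexts: one dict per genome, mapping a relative position
    (-1 = first gene upstream, 1 = first gene downstream, ...) to the
    orthologous group found there (hypothetical input layout).
    Returns {position: (most_common_group, fraction_of_all_genomes)}.
    """
    n_genomes = len(contexts)
    positions = sorted({pos for ctx in contexts for pos in ctx})
    scores = {}
    for pos in positions:
        counts = Counter(ctx[pos] for ctx in contexts if pos in ctx)
        group, count = counts.most_common(1)[0]
        scores[pos] = (group, count / n_genomes)
    return scores


# Toy neighbourhoods from four genomes (made-up group identifiers).
contexts = [
    {-1: "OG_A", 1: "OG_B", 2: "OG_C"},
    {-1: "OG_A", 1: "OG_B", 2: "OG_D"},
    {-1: "OG_A", 1: "OG_X"},
    {-1: "OG_A", 1: "OG_B", 2: "OG_C"},
]
scores = synteny_conservation(contexts)

# Keep only neighbours at or above a user-chosen conservation threshold.
threshold = 0.75
conserved = {pos: grp for pos, (grp, frac) in scores.items() if frac >= threshold}
```

Dividing by the total number of genomes, rather than by the number of genomes that have a gene at that position, penalises neighbours that are missing in part of the taxonomic scope; either choice is defensible.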
Affiliation(s)
- Jorge Botas
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, Madrid, 28223, Spain
| | - Álvaro Rodríguez Del Río
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, Madrid, 28223, Spain
| | - Joaquín Giner-Lamia
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, Madrid, 28223, Spain; Departamento de Biotecnología-Biología Vegetal, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid (UPM), Madrid, 28040, Spain
| | - Jaime Huerta-Cepas
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, Madrid, 28223, Spain
4
Sinha S, Lynn AM, Desai DK. Implementation of homology based and non-homology based computational methods for the identification and annotation of orphan enzymes: using Mycobacterium tuberculosis H37Rv as a case study. BMC Bioinformatics 2020; 21:466. [PMID: 33076816 PMCID: PMC7574302 DOI: 10.1186/s12859-020-03794-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/23/2019] [Accepted: 10/01/2020] [Indexed: 02/06/2023] [Open access]
Abstract
Background: Homology-based methods are among the most important and widely used approaches for functional annotation of high-throughput microbial genome data. A major limitation of these methods is the absence of well-characterized sequences for certain functions. Non-homology methods, based on the context and interactions of a protein, are very useful for identifying missing metabolic activities and for functional annotation in the absence of significant sequence similarity. In the current work, we employ both homology and context-based methods, incrementally, to identify local holes and chokepoints: enzymes whose presence in the Mycobacterium tuberculosis genome is indicated by their interactions with known proteins in a metabolic network context, but which have not been annotated. We have developed two computational procedures using network theory to identify orphan enzymes (‘Hole finding protocol’), coupled with the identification of candidate proteins for the predicted orphan enzyme (‘Hole filling protocol’). We propose an integrated interaction score, based on scores from the STRING database, to identify candidate protein sequences for the orphan enzymes from M. tuberculosis, as a case study, which are most likely to perform the missing function.
Results: The application of an automated homology-based enzyme identification protocol, ModEnzA, to the M. tuberculosis genome yielded 56 novel enzyme predictions. We further predicted 74 putative local holes, 6 chokepoints, and 3 high-confidence local holes in the genome using the ‘Hole finding protocol’. The ‘Hole filling protocol’ was validated on the E. coli genome using artificial in-silico enzyme knockouts, where our method showed 25% increased accuracy, compared to other methods, in assigning the correct sequence for the knocked-out enzyme amongst the top 10 ranks. The method was further validated on 8 additional genomes.
Conclusions: We have developed methods that can be generalized to augment homology-based annotation, to identify missing enzyme-coding genes, and to predict a candidate protein for them. For pathogens such as M. tuberculosis, this work holds significance in terms of increasing the protein repertoire and thereby the potential for identifying novel drug targets.
Affiliation(s)
- Swati Sinha
- Bioinformatics Institute, Agency for Science, Technology, and Research (A*Star), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Republic of Singapore
| | - Andrew M Lynn
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
| | - Dhwani K Desai
- Department of Biology and Department of Pharmacology, Dalhousie University, Halifax, NS, B3H4R2, Canada; School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
5
Nakamura Y, Hirose S, Taniguchi Y, Moriya Y, Yamada T. Targeted enzyme gene re-positioning: A computational approach for discovering alternative bacterial enzymes for the synthesis of plant-specific secondary metabolites. Metab Eng Commun 2019; 9:e00102. [PMID: 31720217 PMCID: PMC6838473 DOI: 10.1016/j.mec.2019.e00102] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/25/2019] [Revised: 08/19/2019] [Accepted: 09/08/2019] [Indexed: 12/27/2022] [Open access]
Abstract
Plant-biosynthesised secondary metabolites are unique sources of pharmaceuticals, food additives, and flavourings, among other industrial uses. However, industrial production of these metabolites is difficult because of their structural complexity, hazardousness, and harm to the natural environment, so new methods to synthesise them are required. In this study, we developed a novel approach to identifying alternative bacterial enzymes to produce plant-biosynthesised secondary metabolites. Based on the similarity of enzymatic reactions, we searched for candidate bacterial genes encoding enzymes that could potentially replace the enzymes in plant-specific secondary metabolism reactions contained in the KEGG database (enzyme re-positioning). As a result, we discovered candidate bacterial alternative enzyme genes for 447 plant-specific secondary metabolic reactions. To validate our approach, we focused on the ability of an enzyme from Streptomyces coelicolor strain A3(2) to convert valencene to the grapefruit metabolite nootkatone, and confirmed its enzymatic activity by gas chromatography-mass spectrometry. This enzyme re-positioning approach may offer an entirely new way of screening enzymes that cannot be achieved by most other conventional methods; it is applicable to various other metabolites and may enable microbial production of compounds that are currently difficult to produce industrially.
Affiliation(s)
- Yuya Nakamura
- School of Life Science and Technology, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro, Tokyo, 152-8550, Japan
| | - Shuichi Hirose
- NAGASE R&D Center, Nagase & Co., Ltd, Kobe High Tech Park 2-2-3 Murotani, Nishi-ku, Kobe, Hyogo, 651-2241, Japan
| | - Yuko Taniguchi
- NAGASE R&D Center, Nagase & Co., Ltd, Kobe High Tech Park 2-2-3 Murotani, Nishi-ku, Kobe, Hyogo, 651-2241, Japan
| | - Yuki Moriya
- Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Kashiwa, 277-0871, Japan
| | - Takuji Yamada
- School of Life Science and Technology, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro, Tokyo, 152-8550, Japan
- PRESTO, Japan Science and Technology Agency, 4-1-8 Honcho Kawaguchi, Saitama, 332-0012, Japan
- Metabologenomics Inc, 246-2 Kakuganji, Tsuruoka, Yamagata, 997-0052, Japan
6
Carboxylic Ester Hydrolases in Bacteria: Active Site, Structure, Function and Application. Crystals 2019. [DOI: 10.3390/cryst9110597] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Indexed: 12/17/2022]
Abstract
Carboxylic ester hydrolases (CEHs), which catalyze the hydrolysis of carboxylic esters to produce alcohol and acid, are found in all three domains of life. In the Protein Data Bank (PDB), 136 crystal structures of bacterial CEHs (424 PDB codes) from 52 genera and metagenomes have been reported. In this review, we categorize these structures based on catalytic machinery, structure and substrate specificity to provide a comprehensive understanding of the bacterial CEHs. CEHs use Ser, Asp or water as a nucleophile to drive diverse catalytic machinery. The α/β/α sandwich architecture is most frequently found in CEHs, but 3-solenoid, β-barrel, up-down bundle, α/β/β/α 4-layer sandwich, 6 or 7 propeller and α/β barrel architectures are also found in these CEHs. Most are specific to esters with particular types of head group and acyl chain lengths, but some CEHs exhibit peptidase or lactamase activities. CEHs are widely used in industrial applications, and are the objects of research in structure- or mutation-based protein engineering. Structural studies of CEHs are still necessary for understanding their biological roles, identifying their structure-based functions, guiding structure-based engineering, and exploring their potential industrial applications.
7
Heirendt L, Arreckx S, Pfau T, Mendoza SN, Richelle A, Heinken A, Haraldsdóttir HS, Wachowiak J, Keating SM, Vlasov V, Magnusdóttir S, Ng CY, Preciat G, Žagare A, Chan SHJ, Aurich MK, Clancy CM, Modamio J, Sauls JT, Noronha A, Bordbar A, Cousins B, El Assal DC, Valcarcel LV, Apaolaza I, Ghaderi S, Ahookhosh M, Ben Guebila M, Kostromins A, Sompairac N, Le HM, Ma D, Sun Y, Wang L, Yurkovich JT, Oliveira MAP, Vuong PT, El Assal LP, Kuperstein I, Zinovyev A, Hinton HS, Bryant WA, Aragón Artacho FJ, Planes FJ, Stalidzans E, Maass A, Vempala S, Hucka M, Saunders MA, Maranas CD, Lewis NE, Sauter T, Palsson BØ, Thiele I, Fleming RMT. Creation and analysis of biochemical constraint-based models using the COBRA Toolbox v.3.0. Nat Protoc 2019; 14:639-702. [PMID: 30787451 PMCID: PMC6635304 DOI: 10.1038/s41596-018-0098-2] [Citation(s) in RCA: 664] [Impact Index Per Article: 110.7] [Indexed: 12/23/2022]
Abstract
Constraint-based reconstruction and analysis (COBRA) provides a molecular mechanistic framework for integrative analysis of experimental molecular systems biology data and quantitative prediction of physicochemically and biochemically feasible phenotypic states. The COBRA Toolbox is a comprehensive desktop software suite of interoperable COBRA methods. It has found widespread application in biology, biomedicine, and biotechnology because its functions can be flexibly combined to implement tailored COBRA protocols for any biochemical network. This protocol is an update to the COBRA Toolbox v.1.0 and v.2.0. Version 3.0 includes new methods for quality-controlled reconstruction, modeling, topological analysis, strain and experimental design, and network visualization, as well as network integration of chemoinformatic, metabolomic, transcriptomic, proteomic, and thermochemical data. New multi-lingual code integration also enables an expansion in COBRA application scope via high-precision, high-performance, and nonlinear numerical optimization solvers for multi-scale, multi-cellular, and reaction kinetic modeling, respectively. This protocol provides an overview of all these new features and can be adapted to generate and analyze constraint-based models in a wide variety of scenarios. The COBRA Toolbox v.3.0 provides an unparalleled depth of COBRA methods.
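The abstract does not write out the underlying optimisation problem. For orientation, the standard flux balance analysis problem that constraint-based methods build on can be stated as below, where S is the stoichiometric matrix, v the vector of reaction fluxes, c the objective weights, and l and u the lower and upper flux bounds. The symbols are conventional choices, not notation taken from the abstract.

```latex
\begin{aligned}
\max_{v \in \mathbb{R}^{n}} \quad & c^{\top} v \\
\text{subject to} \quad & S v = 0, \\
& l \le v \le u .
\end{aligned}
```

The equality constraint encodes the steady-state assumption (no net accumulation of any metabolite), and the bounds encode reaction directionality and capacity.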
Affiliation(s)
- Laurent Heirendt
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
| | - Sylvain Arreckx
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
| | - Thomas Pfau
- Life Sciences Research Unit, University of Luxembourg, Belvaux, Luxembourg
| | - Sebastián N Mendoza
- Center for Genome Regulation (Fondap 15090007), University of Chile, Santiago, Chile
- Mathomics, Center for Mathematical Modeling, University of Chile, Santiago, Chile
| | - Anne Richelle
- Department of Pediatrics, University of California, San Diego, School of Medicine, La Jolla, CA, USA
| | - Almut Heinken
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
| | - Hulda S Haraldsdóttir
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
| | - Jacek Wachowiak
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
| | - Sarah M Keating
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
| | - Vanja Vlasov
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
| | - Stefania Magnusdóttir
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
| | - Chiam Yu Ng
- Department of Chemical Engineering, The Pennsylvania State University, State College, PA, USA
| | - German Preciat
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
| | - Alise Žagare
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
| | - Siu H J Chan
- Department of Chemical Engineering, The Pennsylvania State University, State College, PA, USA
| | - Maike K Aurich
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
| | - Catherine M Clancy
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
| | - Jennifer Modamio
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
| | - John T Sauls
- Department of Physics, and Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA, USA
| | - Alberto Noronha
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
| | - Benjamin Cousins
- Algorithms and Randomness Center, School of Computer Science, Georgia Institute of Technology, Atlanta, GA, USA
| | - Diana C El Assal
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
| | - Luis V Valcarcel
- Biomedical Engineering and Sciences Department, TECNUN, University of Navarra, San Sebastián, Spain
| | - Iñigo Apaolaza
- Biomedical Engineering and Sciences Department, TECNUN, University of Navarra, San Sebastián, Spain
| | - Susan Ghaderi
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
| | - Masoud Ahookhosh
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
| | - Marouen Ben Guebila
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
| | - Andrejs Kostromins
- Institute of Microbiology and Biotechnology, University of Latvia, Riga, Latvia
| | - Nicolas Sompairac
- Institut Curie, PSL Research University, Mines Paris Tech, Inserm, U900, Paris, France
| | - Hoai M Le
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
| | - Ding Ma
- Department of Management Science and Engineering, Stanford University, Stanford, CA, USA
| | - Yuekai Sun
- Department of Statistics, University of Michigan, Ann Arbor, MI, USA
| | - Lin Wang
- Department of Chemical Engineering, The Pennsylvania State University, State College, PA, USA
| | - James T Yurkovich
- Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
| | - Miguel A P Oliveira
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
| | - Phan T Vuong
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
| | - Lemmer P El Assal
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
| | - Inna Kuperstein
- Institut Curie, PSL Research University, Mines Paris Tech, Inserm, U900, Paris, France
| | - Andrei Zinovyev
- Institut Curie, PSL Research University, Mines Paris Tech, Inserm, U900, Paris, France
| | - H Scott Hinton
- Utah State University Research Foundation, North Logan, UT, USA
| | - William A Bryant
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, UK
| | - Francisco J Planes
- Biomedical Engineering and Sciences Department, TECNUN, University of Navarra, San Sebastián, Spain
| | - Egils Stalidzans
- Institute of Microbiology and Biotechnology, University of Latvia, Riga, Latvia
| | - Alejandro Maass
- Center for Genome Regulation (Fondap 15090007), University of Chile, Santiago, Chile
- Mathomics, Center for Mathematical Modeling, University of Chile, Santiago, Chile
| | - Santosh Vempala
- Algorithms and Randomness Center, School of Computer Science, Georgia Institute of Technology, Atlanta, GA, USA
| | - Michael Hucka
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
| | - Michael A Saunders
- Department of Management Science and Engineering, Stanford University, Stanford, CA, USA
| | - Costas D Maranas
- Department of Chemical Engineering, The Pennsylvania State University, State College, PA, USA
| | - Nathan E Lewis
- Department of Pediatrics, University of California, San Diego, School of Medicine, La Jolla, CA, USA
- Novo Nordisk Foundation Center for Biosustainability, University of California, San Diego, La Jolla, CA, USA
| | - Thomas Sauter
- Life Sciences Research Unit, University of Luxembourg, Belvaux, Luxembourg
| | - Bernhard Ø Palsson
- Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet, Lyngby, Denmark
| | - Ines Thiele
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
| | - Ronan M T Fleming
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg.
- Division of Systems Biomedicine and Pharmacology, Leiden Academic Centre for Drug Research, Faculty of Science, Leiden University, Leiden, The Netherlands.
8
Missing gene identification using functional coherence scores. Sci Rep 2016; 6:31725. [PMID: 27552989 PMCID: PMC4995438 DOI: 10.1038/srep31725] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 03/21/2016] [Accepted: 07/22/2016] [Indexed: 11/18/2022] [Open access]
Abstract
Reconstructing metabolic and signaling pathways is an effective way of interpreting a genome sequence. A challenge in pathway reconstruction is that genes in a pathway often cannot be easily found, reflecting the currently imperfect information about the target organism. In this work, we developed a new method for finding missing genes, which integrates multiple features, including gene expression, phylogenetic profile, and function association scores. In particular, to consider function association between candidate genes and the proteins neighboring the target missing gene in the network, we used the Co-occurrence Association Score (CAS) and the PubMed Association Score (PAS), which are designed to capture functional coherence of proteins. We showed that adding CAS and PAS substantially improves the accuracy of identifying missing genes in the yeast enzyme-enzyme network compared to cases where only the conventional features, gene expression and phylogenetic profile, were used. Finally, we also demonstrated that the accuracy improves when indirect neighbors of the target enzyme position in the network are considered using a proper network-topology-based weighting scheme.
9
Kotera M, Goto S. Metabolic pathway reconstruction strategies for central metabolism and natural product biosynthesis. Biophys Physicobiol 2016; 13:195-205. [PMID: 27924274 PMCID: PMC5042172 DOI: 10.2142/biophysico.13.0_195] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 01/19/2016] [Accepted: 06/20/2016] [Indexed: 12/22/2022] [Open access]
Abstract
Metabolic pathway reconstruction presents a challenge for understanding metabolic pathways in organisms of interest. Different strategies, i.e., reference-based vs. de novo, must be used for pathway reconstruction depending on the availability of well-characterized enzymatic reactions. If at least one enzyme is already known to catalyze a reaction, its amino acid sequence can be used as a reference for identifying homologous enzymes in the genome of an organism of interest. Where there is no known enzyme able to catalyze a corresponding reaction, however, the reaction and the corresponding enzyme must be predicted de novo from chemical transformations of the putative substrate-product pair. This review summarizes studies involving reference-based and de novo metabolic pathway reconstruction and discusses the importance of the classification and structure-function relationships of enzymes.
Affiliation(s)
- Masaaki Kotera
- School of Life Science and Technology, Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan
| | - Susumu Goto
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
10
Moriya Y, Yamada T, Okuda S, Nakagawa Z, Kotera M, Tokimatsu T, Kanehisa M, Goto S. Identification of Enzyme Genes Using Chemical Structure Alignments of Substrate-Product Pairs. J Chem Inf Model 2016; 56:510-6. [PMID: 26822930 DOI: 10.1021/acs.jcim.5b00216] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Indexed: 11/28/2022]
Abstract
Although several databases contain data on many metabolites and reactions in biochemical pathways, there is still a large gap between the numbers of experimentally identified enzymes and metabolites. Many catalytic enzyme genes are presumably still unknown. Although previous studies have estimated the number of candidate enzyme genes, they required additional information beyond the structures of metabolites, such as gene expression and gene order in the genome. In this study, we developed a novel method to identify a candidate enzyme gene for a reaction using the chemical structures of the substrate-product pair (reactant pair). The proposed method is based on a search for similar reactant pairs in a reference database and offers ortholog groups that possibly mediate the given reaction. We applied the proposed method to two experimentally validated reactions. As a result, we confirmed that the histidine transaminase was correctly identified. Although our method could not directly identify the asparagine oxo-acid transaminase, we successfully found the paralog gene most similar to the correct enzyme gene. We also applied our method to infer candidate enzyme genes in the mesaconate pathway. The advantage of our method lies in the prediction of possible genes for orphan enzyme reactions for which no associated gene sequences have yet been determined. We believe that this approach will facilitate experimental identification of genes for orphan enzymes.
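The search idea in this abstract (find reference reactant pairs similar to the query pair, then report the ortholog groups attached to them) can be sketched as follows. This is a simplified stand-in, not the authors' method: the paper uses chemical structure alignments, whereas the sketch swaps in a plain Jaccard similarity over sets of transformation features. The function names, feature labels, and ortholog group identifiers are all made-up placeholders.

```python
def jaccard(a, b):
    """Jaccard similarity between two feature sets (0.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_ortholog_groups(query_features, reference, top_n=3):
    """Rank ortholog groups by their best-matching reference reactant pair.

    reference: list of (ortholog_group, feature_set) tuples, one per known
    reactant pair (hypothetical layout for a reference database).
    Returns up to top_n (ortholog_group, score) tuples, best first.
    """
    best = {}
    for group, features in reference:
        score = jaccard(query_features, features)
        best[group] = max(score, best.get(group, 0.0))
    # Sort by descending score, then by name so ties are deterministic.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Toy reference set; the feature strings are illustrative labels only.
reference = [
    ("OG_transaminase", {"C-NH2>C=O", "C=O>C-NH2", "alpha-carbon"}),
    ("OG_dehydrogenase", {"C-OH>C=O", "NAD"}),
    ("OG_transaminase", {"C-NH2>C=O", "C=O>C-NH2"}),
    ("OG_kinase", {"O-H>O-P", "ATP"}),
]
query = {"C-NH2>C=O", "C=O>C-NH2", "imidazole"}
hits = rank_ortholog_groups(query, reference)
```

Scoring each ortholog group by its single best reference pair keeps one atypical member from diluting an otherwise strong match; averaging over members would be an equally reasonable choice.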
Affiliation(s)
- Yuki Moriya
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
| | - Takuji Yamada
- Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo, 152-8550, Japan
| | - Shujiro Okuda
- Graduate School of Medical and Dental Sciences, Niigata University, 1-757 Asahimachi-dori, Chuo-ku, Niigata 951-8510, Japan
| | - Zenichi Nakagawa
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
| | - Masaaki Kotera
- Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo, 152-8550, Japan
| | - Toshiaki Tokimatsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
| | - Minoru Kanehisa
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
| | - Susumu Goto
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
11
Ponce-de-Leon M, Calle-Espinosa J, Peretó J, Montero F. Consistency Analysis of Genome-Scale Models of Bacterial Metabolism: A Metamodel Approach. PLoS One 2015; 10:e0143626. [PMID: 26629901 PMCID: PMC4668087 DOI: 10.1371/journal.pone.0143626] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 07/02/2015] [Accepted: 11/06/2015] [Indexed: 01/10/2023] [Open access]
Abstract
Genome-scale metabolic models usually contain inconsistencies that manifest as blocked reactions and gap metabolites. To detect recurrent inconsistencies in metabolic models, a large-scale analysis was performed using a previously published dataset of 130 genome-scale models. The results showed that a large number of reactions (~22%) are blocked in all the models where they are present. To unravel the nature of such inconsistencies, a metamodel was constructed by joining the 130 models in a single network. This metamodel was manually curated using the unconnected modules approach and was then used as a reference network to perform gap-filling on each individual genome-scale model. Finally, a set of 36 models that had not been considered during the construction of the metamodel was used, as a proof of concept, to extend the metamodel with new biochemical information and to assess its impact on gap-filling results. The analysis of the metamodel supported four conclusions: 1) the recurrent inconsistencies found in the models were already present in the metabolic database used during the reconstruction process; 2) inconsistencies in a metabolic database can be propagated to the reconstructed models; 3) some reactions that do not appear blocked are active as a consequence of certain classes of artifacts; and 4) the results of automatic gap-filling are highly dependent on the consistency and completeness of the metamodel or metabolic database used as the reference network. In conclusion, consistency analysis should be applied to metabolic databases in order to detect and fill gaps as well as to detect and remove artifacts and redundant information.
Affiliation(s)
- Miguel Ponce-de-Leon
- Departamento de Bioquímica y Biología Molecular I, Facultad de Ciencias Químicas, Universidad Complutense de Madrid, Ciudad Universitaria, Madrid 28045, Spain
- * E-mail:
| | - Jorge Calle-Espinosa
- Departamento de Bioquímica y Biología Molecular I, Facultad de Ciencias Químicas, Universidad Complutense de Madrid, Ciudad Universitaria, Madrid 28045, Spain
| | - Juli Peretó
- Departament de Bioquímica i Biologia Molecular and Institut Cavanilles de Biodiversitat i Biologia Evolutiva, Universitat de València, C/José Beltrán 2, Paterna 46980, Spain
| | - Francisco Montero
- Departamento de Bioquímica y Biología Molecular I, Facultad de Ciencias Químicas, Universidad Complutense de Madrid, Ciudad Universitaria, Madrid 28045, Spain
|
12
|
Ufarté L, Laville É, Duquesne S, Potocki-Veronese G. Metagenomics for the discovery of pollutant degrading enzymes. Biotechnol Adv 2015; 33:1845-54. [DOI: 10.1016/j.biotechadv.2015.10.009] [Citation(s) in RCA: 86] [Impact Index Per Article: 8.6] [Received: 07/17/2015] [Revised: 10/20/2015] [Accepted: 10/22/2015] [Indexed: 11/16/2022]
|
13
|
Sorokina M, Medigue C, Vallenet D. A new network representation of the metabolism to detect chemical transformation modules. BMC Bioinformatics 2015; 16:385. [PMID: 26573681 PMCID: PMC4647279 DOI: 10.1186/s12859-015-0809-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/01/2015] [Accepted: 10/29/2015] [Indexed: 11/10/2022]
Abstract
Background: Metabolism is generally modeled by directed networks where nodes represent reactions and/or metabolites. In order to explore metabolic pathway conservation and divergence among organisms, previous studies were based on graph alignment to find similar pathways. A few years ago, the concept of chemical transformation modules, also called reaction modules, was introduced; these correspond to sequences of chemical transformations that are conserved in metabolism. We propose here a novel graph representation of the metabolic network in which reactions sharing the same chemical transformation type are grouped in Reaction Molecular Signatures (RMS).
Results: RMS were automatically computed for all reactions and encode changes in atoms and bonds. A reaction network containing all available metabolic knowledge was then reduced by an aggregation of reaction nodes and edges to obtain an RMS network. Paths in this network were explored and a substantial number of conserved chemical transformation modules were detected. Furthermore, this graph-based formalism allows us to define several path scores reflecting different biological conservation meanings. These scores are significantly higher for paths corresponding to known metabolic pathways and were used jointly to build association rules intended to predict metabolic pathway types such as biosynthesis or degradation.
Conclusions: This representation of metabolism in an RMS network offers new insights to capture relevant metabolic contexts. Furthermore, along with genomic context methods, it should improve the detection of gene clusters corresponding to new metabolic pathways.
Affiliation(s)
- Maria Sorokina
- Direction des Sciences du Vivant, Commissariat à l'Energie Atomique et aux Energies Alternatives (CEA), Institut de Génomique, Genoscope, Laboratoire d'Analyses Bioinformatiques pour la Génomique et le Métabolisme, 2 rue Gaston Crémieux, Evry, 91057, France.
- CNRS-UMR8030, 2 rue Gaston Crémieux, Evry, 91057, France.
- UEVE, Université d'Evry Val d'Essonne, Boulevard François Mitterrand, Evry, 91057, France.
| | - Claudine Medigue
- Direction des Sciences du Vivant, Commissariat à l'Energie Atomique et aux Energies Alternatives (CEA), Institut de Génomique, Genoscope, Laboratoire d'Analyses Bioinformatiques pour la Génomique et le Métabolisme, 2 rue Gaston Crémieux, Evry, 91057, France.
- CNRS-UMR8030, 2 rue Gaston Crémieux, Evry, 91057, France.
- UEVE, Université d'Evry Val d'Essonne, Boulevard François Mitterrand, Evry, 91057, France.
| | - David Vallenet
- Direction des Sciences du Vivant, Commissariat à l'Energie Atomique et aux Energies Alternatives (CEA), Institut de Génomique, Genoscope, Laboratoire d'Analyses Bioinformatiques pour la Génomique et le Métabolisme, 2 rue Gaston Crémieux, Evry, 91057, France.
- CNRS-UMR8030, 2 rue Gaston Crémieux, Evry, 91057, France.
- UEVE, Université d'Evry Val d'Essonne, Boulevard François Mitterrand, Evry, 91057, France.
|
14
|
Oulas A, Pavloudi C, Polymenakou P, Pavlopoulos GA, Papanikolaou N, Kotoulas G, Arvanitidis C, Iliopoulos I. Metagenomics: tools and insights for analyzing next-generation sequencing data derived from biodiversity studies. Bioinform Biol Insights 2015; 9:75-88. [PMID: 25983555 PMCID: PMC4426941 DOI: 10.4137/bbi.s12462] [Citation(s) in RCA: 177] [Impact Index Per Article: 17.7] [Received: 12/05/2014] [Revised: 03/09/2015] [Accepted: 03/13/2015] [Indexed: 12/14/2022]
Abstract
Advances in next-generation sequencing (NGS) have allowed significant breakthroughs in microbial ecology studies. This has led to the rapid expansion of research in the field and the establishment of "metagenomics", often defined as the analysis of DNA from microbial communities in environmental samples without prior need for culturing. Many metagenomics statistical/computational tools and databases have been developed in order to allow the exploitation of the huge influx of data. In this review article, we provide an overview of the sequencing technologies and how they are uniquely suited to various types of metagenomic studies. We focus on the currently available bioinformatics techniques, tools, and methodologies for performing each individual step of a typical metagenomic dataset analysis. We also provide future trends in the field with respect to tools and technologies currently under development. Moreover, we discuss data management, distribution, and integration tools that are capable of performing comparative metagenomic analyses of multiple datasets using well-established databases, as well as commonly used annotation standards.
Affiliation(s)
- Anastasis Oulas
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
| | - Christina Pavloudi
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
- Department of Biology, University of Ghent, Ghent, Belgium
- Department of Microbial Ecophysiology, University of Bremen, Bremen, Germany
| | - Paraskevi Polymenakou
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
| | - Georgios A Pavlopoulos
- Division of Basic Sciences, University of Crete, Medical School, Heraklion, Crete, Greece
| | - Nikolas Papanikolaou
- Division of Basic Sciences, University of Crete, Medical School, Heraklion, Crete, Greece
| | - Georgios Kotoulas
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
| | - Christos Arvanitidis
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
| | - Ioannis Iliopoulos
- Division of Basic Sciences, University of Crete, Medical School, Heraklion, Crete, Greece
|
15
|
Wang T, Mori H, Zhang C, Kurokawa K, Xing XH, Yamada T. DomSign: a top-down annotation pipeline to enlarge enzyme space in the protein universe. BMC Bioinformatics 2015; 16:96. [PMID: 25888481 PMCID: PMC4389672 DOI: 10.1186/s12859-015-0499-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 10/07/2014] [Accepted: 02/18/2015] [Indexed: 12/27/2022]
Abstract
Background: Computational predictions of catalytic function are vital for in-depth understanding of enzymes. Because several novel approaches performing better than the common BLAST tool are rarely applied in research, we hypothesized that there is a large gap between the number of known annotated enzymes and the actual number in the protein universe, which significantly limits our ability to extract additional biologically relevant functional information from the available sequencing data. To reliably expand the enzyme space, we developed DomSign, a highly accurate domain signature–based enzyme functional prediction tool to assign Enzyme Commission (EC) digits.
Results: DomSign is a top-down prediction engine that yields results comparable, or superior, to those from many benchmark EC number prediction tools, including BLASTP, when a homolog with an identity >30% is not available in the database. Performance tests showed that DomSign is a highly reliable enzyme EC number annotation tool. After multiple tests, the accuracy is thought to be greater than 90%. Thus, DomSign can be applied to large-scale datasets, with the goal of expanding the enzyme space with high fidelity. Using DomSign, we successfully increased the percentage of EC-tagged enzymes from 12% to 30% in UniProt-TrEMBL. In the Kyoto Encyclopedia of Genes and Genomes bacterial database, the percentage of EC-tagged enzymes for each bacterial genome could be increased from 26.0% to 33.2% on average. Metagenomic mining was also efficient, as exemplified by the application of DomSign to the Human Microbiome Project dataset, recovering nearly one million new EC-labeled enzymes.
Conclusions: Our results offer preliminary confirmation of the existence of the hypothesized huge number of “hidden enzymes” in the protein universe, the identification of which could substantially further our understanding of the metabolisms of diverse organisms and also facilitate bioengineering by providing a richer enzyme resource. Furthermore, our results highlight the necessity of using more advanced computational tools than BLAST in protein database annotations to extract additional biologically relevant functional information from the available biological sequences.
Affiliation(s)
- Tianmin Wang
- Department of Biological Information, Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, 2-12-1 M6-3, Ookayama, Meguro-ku, Tokyo, 152-8550, Japan. .,Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China.
| | - Hiroshi Mori
- Department of Biological Information, Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, 2-12-1 M6-3, Ookayama, Meguro-ku, Tokyo, 152-8550, Japan. .,Earth-Life Science Institute, Tokyo Institute of Technology, 2-12-1-E3-10 Ookayama, Meguro-ku, Tokyo, 152-8550, Japan.
| | - Chong Zhang
- Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China.
| | - Ken Kurokawa
- Department of Biological Information, Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, 2-12-1 M6-3, Ookayama, Meguro-ku, Tokyo, 152-8550, Japan. .,Earth-Life Science Institute, Tokyo Institute of Technology, 2-12-1-E3-10 Ookayama, Meguro-ku, Tokyo, 152-8550, Japan.
| | - Xin-Hui Xing
- Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China.
| | - Takuji Yamada
- Department of Biological Information, Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, 2-12-1 M6-3, Ookayama, Meguro-ku, Tokyo, 152-8550, Japan.
|
16
|
Jacobson MP, Kalyanaraman C, Zhao S, Tian B. Leveraging structure for enzyme function prediction: methods, opportunities, and challenges. Trends Biochem Sci 2014; 39:363-71. [PMID: 24998033 DOI: 10.1016/j.tibs.2014.05.006] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 03/31/2014] [Revised: 05/26/2014] [Accepted: 05/29/2014] [Indexed: 02/06/2023]
Abstract
The rapid growth of the number of protein sequences that can be inferred from sequenced genomes presents challenges for function assignment, because only a small fraction (currently <1%) has been experimentally characterized. Bioinformatics tools are commonly used to predict functions of uncharacterized proteins. Recently, there has been significant progress in using protein structures as an additional source of information to infer aspects of enzyme function, which is the focus of this review. Successful application of these approaches has led to the identification of novel metabolites, enzyme activities, and biochemical pathways. We discuss opportunities to elucidate systematically protein domains of unknown function, orphan enzyme activities, dead-end metabolites, and pathways in secondary metabolism.
Affiliation(s)
- Matthew P Jacobson
- Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94158, USA; California Institute for Quantitative Biomedical Research, University of California, San Francisco, CA 94158, USA.
| | - Chakrapani Kalyanaraman
- Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94158, USA; California Institute for Quantitative Biomedical Research, University of California, San Francisco, CA 94158, USA
| | - Suwen Zhao
- Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94158, USA; California Institute for Quantitative Biomedical Research, University of California, San Francisco, CA 94158, USA
| | - Boxue Tian
- Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94158, USA; California Institute for Quantitative Biomedical Research, University of California, San Francisco, CA 94158, USA
|
17
|
Sorokina M, Stam M, Médigue C, Lespinet O, Vallenet D. Profiling the orphan enzymes. Biol Direct 2014; 9:10. [PMID: 24906382 PMCID: PMC4084501 DOI: 10.1186/1745-6150-9-10] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 03/27/2014] [Accepted: 05/29/2014] [Indexed: 11/10/2022]
Abstract
The emergence of Next Generation Sequencing generates an incredible amount of sequence data and great potential for new enzyme discovery. Despite this huge amount of data and the profusion of bioinformatic methods for function prediction, a large part of known enzyme activities still lacks an associated protein sequence. These particular activities are called "orphan enzymes". The present review proposes an update of previous surveys on orphan enzymes by mining the current content of public databases. While the percentage of orphan enzyme activities has decreased from 38% to 22% in ten years, there are still more than 1,000 orphans among the 5,000 entries of the Enzyme Commission (EC) classification. Taking into account all the reactions present in metabolic databases, this proportion increases dramatically to nearly 50% orphans, and many of them are not associated with a known pathway. We extended our survey to "local orphan enzymes", that is, activities that have no representative sequence in a given clade but have at least one in organisms belonging to other clades. We observe an important bias in Archaea and find that in general more than 30% of the EC activities have incomplete sequence information in at least one superkingdom. To estimate whether candidate proteins for local orphans could be retrieved by homology search, we applied a simple strategy based on the PRIAM software and noticed that candidates may be proposed for an important fraction of local orphan enzymes. Finally, by studying the relationship between protein domains and catalyzed activities, it appears that newly discovered enzymes are mostly associated with already known enzyme domains. Thus, the exploration of the promiscuity and the multifunctional aspect of known enzyme families may solve part of the orphan enzyme issue. We conclude this review with a presentation of recent initiatives in finding proteins for orphan enzymes and in extending the enzyme world by the discovery of new activities.
Affiliation(s)
- Maria Sorokina
- Direction des Sciences du Vivant, Commissariat à l'Energie Atomique (CEA), Institut de Génomique, Genoscope, Laboratoire d'Analyses Bioinformatiques pour la Génomique et le Métabolisme, 2 rue Gaston Crémieux, 91057 Evry, France.
|
18
|
Alsop EB, Boyd ES, Raymond J. Merging metagenomics and geochemistry reveals environmental controls on biological diversity and evolution. BMC Ecol 2014; 14:16. [PMID: 24886397 PMCID: PMC4047435 DOI: 10.1186/1472-6785-14-16] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 01/29/2014] [Accepted: 05/16/2014] [Indexed: 11/13/2022]
Abstract
Background: The metabolic strategies employed by microbes inhabiting natural systems are, in large part, dictated by the physical and geochemical properties of the environment. This study sheds light on the complex relationship between biology and environmental geochemistry using forty-three metagenomes collected from geochemically diverse and globally distributed natural systems. It is widely hypothesized that many uncommonly measured geochemical parameters affect community dynamics, and this study leverages the development and application of multidimensional biogeochemical metrics to study correlations between geochemistry and microbial ecology. Analysis techniques such as a Markov cluster-based measure of the evolutionary distance between whole communities and a principal component analysis (PCA) of the geochemical gradients between environments allow for the determination of correlations between microbial community dynamics and environmental geochemistry and provide insight into which geochemical parameters most strongly influence microbial biodiversity.
Results: By progressively building from samples taken along well defined geochemical gradients to samples widely dispersed in geochemical space, this study reveals strong links between the extent of taxonomic and functional diversification of resident communities and environmental geochemistry, and reveals temperature and pH as the primary factors that have shaped the evolution of these communities. Moreover, the inclusion of extensive geochemical data in the analyses reveals new links between geochemical parameters (e.g. oxygen and trace element availability) and the distribution and taxonomic diversification of communities at the functional level. Further, an overall geochemical gradient (from multivariate analyses) between natural systems provides one of the most complete predictions of microbial taxonomic and functional composition.
Conclusions: Clustering based on the frequency with which orthologous proteins occur among metagenomes facilitated accurate prediction of the ordering of community functional composition along geochemical gradients, despite a lack of geochemical input. The consistency in the results obtained from the application of Markov clustering and multivariate methods to distinct natural systems underscores their utility in predicting the functional potential of microbial communities within a natural system based on system geochemistry alone, allowing geochemical measurements to be used to predict purely biological metrics such as microbial community composition and metabolism.
Affiliation(s)
| | | | - Jason Raymond
- School of Earth and Space Exploration, Arizona State University, ISTB4, Room 795, 781 E, Terrace Rd, Tempe, AZ 85287, USA.
|
19
|
Shearer AG, Altman T, Rhee CD. Finding sequences for over 270 orphan enzymes. PLoS One 2014; 9:e97250. [PMID: 24826896 PMCID: PMC4020792 DOI: 10.1371/journal.pone.0097250] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 01/22/2014] [Accepted: 04/16/2014] [Indexed: 01/04/2023]
Abstract
Despite advances in sequencing technology, there are still significant numbers of well-characterized enzymatic activities for which there are no known associated sequences. These 'orphan enzymes' represent glaring holes in our biological understanding, and it is a top priority to reunite them with their coding sequences. Here we report a methodology for resolving orphan enzymes through a combination of database search and literature review. Using this method we were able to reconnect over 270 orphan enzymes with their corresponding sequence. This success points toward how we can systematically eliminate the remaining orphan enzymes and prevent the introduction of future orphan enzymes.
Affiliation(s)
| | - Tomer Altman
- Stanford University, Stanford, California, United States of America
| | - Christine D. Rhee
- Clover Collective, Mountain View, California, United States of America
|
20
|
Klünemann M, Schmid M, Patil KR. Computational tools for modeling xenometabolism of the human gut microbiota. Trends Biotechnol 2014; 32:157-65. [PMID: 24529988 DOI: 10.1016/j.tibtech.2014.01.005] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 10/23/2013] [Revised: 01/09/2014] [Accepted: 01/13/2014] [Indexed: 12/24/2022]
Abstract
The gut microbiota is increasingly being recognized as a key site of metabolism for drugs and other xenobiotic compounds that are relevant to human health. The molecular complexity of the gut microbiota revealed by recent metagenomics studies has highlighted the need as well as the challenges for system-level modeling of xenobiotic metabolism in the gut. Here, we outline the possible pathways through which the gut microbiota can modify xenobiotics and review the available computational tools towards modeling complex xenometabolic processes. We put these diverse computational tools and relevant experimental findings into a unified perspective towards building holistic models of xenobiotic metabolism in the gut.
Affiliation(s)
- Martina Klünemann
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
| | - Melanie Schmid
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
| | - Kiran Raosaheb Patil
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany.
|
21
|
Boon E, Meehan CJ, Whidden C, Wong DHJ, Langille MGI, Beiko RG. Interactions in the microbiome: communities of organisms and communities of genes. FEMS Microbiol Rev 2014; 38:90-118. [PMID: 23909933 PMCID: PMC4298764 DOI: 10.1111/1574-6976.12035] [Citation(s) in RCA: 121] [Impact Index Per Article: 11.0] [Received: 04/23/2013] [Revised: 07/02/2013] [Accepted: 07/10/2013] [Indexed: 12/17/2022]
Abstract
A central challenge in microbial community ecology is the delineation of appropriate units of biodiversity, which can be taxonomic, phylogenetic, or functional in nature. The term 'community' is applied ambiguously; in some cases, the term refers simply to a set of observed entities, while in other cases, it requires that these entities interact with one another. Microorganisms can rapidly gain and lose genes, potentially decoupling community roles from taxonomic and phylogenetic groupings. Trait-based approaches offer a useful alternative, but many traits can be defined based on gene functions, metabolic modules, and genomic properties, and the optimal set of traits to choose is often not obvious. An analysis that considers taxon assignment and traits in concert may be ideal, with the strengths of each approach offsetting the weaknesses of the other. Individual genes also merit consideration as entities in an ecological analysis, with characteristics such as diversity, turnover, and interactions modeled using genes rather than organisms as entities. We identify some promising avenues of research that are likely to yield a deeper understanding of microbial communities that shift from observation-based questions of 'Who is there?' and 'What are they doing?' to the mechanistically driven question of 'How will they respond?'
Affiliation(s)
- Eva Boon
- Department of Biology, Dalhousie University, Halifax, NS, Canada
|
22
|
Ponce-de-León M, Montero F, Peretó J. Solving gap metabolites and blocked reactions in genome-scale models: application to the metabolic network of Blattabacterium cuenoti. BMC SYSTEMS BIOLOGY 2013; 7:114. [PMID: 24176055 PMCID: PMC3819652 DOI: 10.1186/1752-0509-7-114] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 04/22/2013] [Accepted: 10/23/2013] [Indexed: 11/20/2022]
Abstract
Background: Metabolic reconstruction is the computational process that aims to elucidate the network of metabolites interconnected through reactions catalyzed by activities assigned to one or more genes. Reconstructed models may contain inconsistencies that appear as gap metabolites and blocked reactions. Although automatic methods for solving this problem have been previously developed, there are many situations where manual curation is still needed.
Results: We introduce a general definition of gap metabolite that allows its detection in a straightforward manner. Moreover, a method for the detection of Unconnected Modules, defined as isolated sets of blocked reactions connected through gap metabolites, is proposed. The method has been successfully applied to the curation of iCG238, the genome-scale metabolic model for the bacterium Blattabacterium cuenoti, obligate endosymbiont of cockroaches.
Conclusion: We found the proposed approach to be a valuable tool for the curation of genome-scale metabolic models. The outcome of its application to the genome-scale model B. cuenoti iCG238 is a more accurate model version named B. cuenoti iMP240.
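To illustrate the gap-metabolite idea mentioned in this abstract, the following is a minimal Python sketch under stated assumptions. It is not the authors' method or their general definition: it only flags metabolites that no reaction can produce, or that no reaction can consume, based on the signs of stoichiometric coefficients. The function name `find_gap_metabolites`, the reaction encoding, and the toy network are all hypothetical and chosen for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical encoding: reaction id -> (stoichiometry, reversible flag).
# Negative coefficients mark substrates, positive coefficients mark products.
Reaction = Tuple[Dict[str, float], bool]


def find_gap_metabolites(reactions: Dict[str, Reaction]) -> Dict[str, Set[str]]:
    """Flag metabolites that are never produced or never consumed.

    This is a purely structural check (an assumption of this sketch);
    it does not compute fluxes or detect blocked reactions.
    """
    produced: Set[str] = set()
    consumed: Set[str] = set()
    for stoich, reversible in reactions.values():
        for met, coeff in stoich.items():
            # A reversible reaction can both produce and consume its metabolites.
            if coeff > 0 or reversible:
                produced.add(met)
            if coeff < 0 or reversible:
                consumed.add(met)
    all_mets = produced | consumed
    return {
        "never_produced": all_mets - produced,
        "never_consumed": all_mets - consumed,
    }


# Toy network (hypothetical): A is taken up, converted to B, then to C,
# and C is exported. D is required by R3 but nothing produces it.
toy = {
    "EX_A": ({"A": 1.0}, False),
    "R1": ({"A": -1.0, "B": 1.0}, False),
    "R2": ({"B": -1.0, "C": 1.0}, False),
    "R3": ({"D": -1.0, "C": 1.0}, False),
    "EX_C": ({"C": -1.0}, False),
}
gaps = find_gap_metabolites(toy)
```

In this toy network the only flagged metabolite is `D`, so reaction `R3` could never carry flux at steady state. A fuller analysis would propagate such gaps through the network to find blocked reactions, which this sketch does not attempt.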
Affiliation(s)
| | - Francisco Montero
- Departamento de Bioquímica y Biología Molecular I, Facultad de Ciencias Químicas, Universidad Complutense de Madrid, Ciudad Universitaria, Madrid 28045, Spain.
|
23
|
Brochado AR, Typas A. High-throughput approaches to understanding gene function and mapping network architecture in bacteria. Curr Opin Microbiol 2013; 16:199-206. [DOI: 10.1016/j.mib.2013.01.008] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Received: 11/15/2012] [Revised: 01/09/2013] [Accepted: 01/11/2013] [Indexed: 11/24/2022]
|
24
|
Inferring the metabolism of human orphan metabolites from their metabolic network context affirms human gluconokinase activity. Biochem J 2013; 449:427-35. [PMID: 23067238 DOI: 10.1042/bj20120980] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Indexed: 11/17/2022]
Abstract
Metabolic network reconstructions define metabolic information within a target organism and can therefore be used to address incomplete metabolic information. In the present study we used a computational approach to identify human metabolites whose metabolism is incomplete on the basis of their detection in humans but exclusion from the human metabolic network reconstruction RECON 1. Candidate solutions, composed of metabolic reactions capable of explaining the metabolism of these compounds, were then identified computationally from a global biochemical reaction database. Solutions were characterized with respect to how metabolites were incorporated into RECON 1 and their biological relevance. Through detailed case studies we show that biologically plausible non-intuitive hypotheses regarding the metabolism of these compounds can be proposed in a semi-automated manner, in an approach that is similar to de novo network reconstruction. We subsequently experimentally validated one of the proposed hypotheses and report that C9orf103, previously identified as a candidate tumour suppressor gene, encodes a functional human gluconokinase. The results of the present study demonstrate how semi-automatic gap filling can be used to refine and extend metabolic reconstructions, thereby increasing their biological scope. Furthermore, we illustrate how incomplete human metabolic knowledge can be coupled with gene annotation in order to prioritize and confirm gene functions.
|