1
Solano YJ, Kiser PD. Double-duty isomerases: a case study of isomerization-coupled enzymatic catalysis. Trends Biochem Sci 2024:S0968-0004(24)00107-5. [PMID: 38760195 DOI: 10.1016/j.tibs.2024.04.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 04/08/2024] [Accepted: 04/23/2024] [Indexed: 05/19/2024]
Abstract
Enzymes can usually be unambiguously assigned to one of seven classes specifying the basic chemistry of their catalyzed reactions. Less frequently, two or more reaction classes are catalyzed by a single enzyme within one active site. Two examples are an isomerohydrolase and an isomero-oxygenase that catalyze isomerization-coupled reactions crucial for production of vision-supporting 11-cis-retinoids. In these enzymes, isomerization is obligately paired and mechanistically intertwined with a second reaction class. A handful of other enzymes carrying out similarly coupled isomerization reactions have been described, some of which have been subjected to detailed structure-function analyses. Herein we review these rarefied enzymes, focusing on the mechanistic and structural basis of their reaction coupling with the goal of revealing catalytic commonalities.
Affiliation(s)
- Yasmeen J Solano
  - Department of Physiology and Biophysics, University of California Irvine School of Medicine, Irvine, CA 92697, USA
- Philip D Kiser
  - Department of Physiology and Biophysics, University of California Irvine School of Medicine, Irvine, CA 92697, USA
  - Department of Clinical Pharmacy Practice, University of California Irvine School of Pharmacy and Pharmaceutical Sciences, Irvine, CA 92697, USA
  - Department of Ophthalmology, Gavin Herbert Eye Institute - Center for Translational Vision Research, University of California Irvine School of Medicine, Irvine, CA 92697, USA
  - Research Service, VA Long Beach Healthcare System, Long Beach, CA 90822, USA
2
Bartuv R, Berihu M, Medina S, Salim S, Feygenberg O, Faigenboim-Doron A, Zhimo VY, Abdelfattah A, Piombo E, Wisniewski M, Freilich S, Droby S. Functional analysis of the apple fruit microbiome based on shotgun metagenomic sequencing of conventional and organic orchard samples. Environ Microbiol 2023; 25:1728-1746. [PMID: 36807446 DOI: 10.1111/1462-2920.16353] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2022] [Accepted: 02/16/2023] [Indexed: 02/23/2023]
Abstract
Fruits harbour abundant and diverse microbial communities that protect them from post-harvest pathogens. Identification of functional traits associated with a given microbiota can provide a better understanding of their potential influence. Here, we focused on the epiphytic microbiome of apple fruit. We suggest that shotgun metagenomic data can indicate specific functions carried out by different groups and provide information on their potential impact. Samples were collected from the surface of 'Golden Delicious' apples from four orchards that differ in their geographic location and management practice. Approximately 1 million metagenes were predicted based on a high-quality assembly. Functional profiling of the microbiome of fruits from orchards differing in their management practice revealed a functional shift in the microbiota. The organic orchard microbiome was enriched in pathways involved in plant defence activities; the conventional orchard microbiome was enriched in pathways related to the synthesis of antibiotics. The functional significance of the variations was explored using microbial network modelling algorithms to reveal the metabolic role of specific phylogenetic groups. The analysis identified several associations supported by other published studies. For example, the analysis revealed the nutritional dependencies of the Capnodiales group, including the Alternaria pathogen, on aromatic compounds.
Affiliation(s)
- Rotem Bartuv
  - Agricultural Research Organization (A.R.O.), Institute of Plant Sciences, Rishon LeZion/Ramat Yishay, Israel
  - The Robert H. Smith Institute of Plant Sciences and Genetics in Agriculture, Faculty of Agriculture, The Hebrew University of Jerusalem, Rehovot, Israel
  - Department of Postharvest Science, Agricultural Research Organization, The Volcani Institute, Rishon LeZion, Israel
- Maria Berihu
  - Agricultural Research Organization (A.R.O.), Institute of Plant Sciences, Rishon LeZion/Ramat Yishay, Israel
- Shlomit Medina
  - Agricultural Research Organization (A.R.O.), Institute of Plant Sciences, Rishon LeZion/Ramat Yishay, Israel
- Shoshana Salim
  - Department of Postharvest Science, Agricultural Research Organization, The Volcani Institute, Rishon LeZion, Israel
- Oleg Feygenberg
  - Department of Postharvest Science, Agricultural Research Organization, The Volcani Institute, Rishon LeZion, Israel
- Adi Faigenboim-Doron
  - Agricultural Research Organization (A.R.O.), Institute of Plant Sciences, Rishon LeZion/Ramat Yishay, Israel
- V Yeka Zhimo
  - Department of Postharvest Science, Agricultural Research Organization, The Volcani Institute, Rishon LeZion, Israel
- Ahmed Abdelfattah
  - Department of Microbiome Biotechnology, Leibniz Institute for Agricultural Engineering and Bioeconomy, Potsdam, Germany
- Edoardo Piombo
  - Department of Agricultural, Forest and Food Sciences (DISAFA), University of Torino, Grugliasco, Italy
- Michael Wisniewski
  - Department of Biological Sciences, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, USA
- Shiri Freilich
  - Agricultural Research Organization (A.R.O.), Institute of Plant Sciences, Rishon LeZion/Ramat Yishay, Israel
- Samir Droby
  - Department of Postharvest Science, Agricultural Research Organization, The Volcani Institute, Rishon LeZion, Israel
3
Berihu M, Somera TS, Malik A, Medina S, Piombo E, Tal O, Cohen M, Ginatt A, Ofek-Lalzar M, Doron-Faigenboim A, Mazzola M, Freilich S. A framework for the targeted recruitment of crop-beneficial soil taxa based on network analysis of metagenomics data. Microbiome 2023; 11:8. [PMID: 36635724 PMCID: PMC9835355 DOI: 10.1186/s40168-022-01438-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2022] [Accepted: 11/28/2022] [Indexed: 06/17/2023]
Abstract
BACKGROUND: The design of ecologically sustainable and plant-beneficial soil systems is a key goal in actively manipulating root-associated microbiomes. Community engineering efforts commonly seek to harness the potential of the indigenous microbiome through substrate-mediated recruitment of beneficial members. In most sustainable practices, microbial recruitment mechanisms rely on the application of complex organic mixtures where the resources/metabolites that act as direct stimulants of beneficial groups are not characterized. Outcomes of such indirect amendments are unpredictable regarding engineering the microbiome and achieving a plant-beneficial environment. RESULTS: This study applied network analysis of metagenomics data to explore amendment-derived transformations in the soil microbiome, which lead to the suppression of pathogens affecting apple root systems. Shotgun metagenomic analysis was conducted with data from 'sick' vs 'healthy/recovered' rhizosphere soil microbiomes. The data were then converted into community-level metabolic networks. Simulations examined the functional contribution of treatment-associated taxonomic groups and linked them with specific amendment-induced metabolites. This analysis enabled the selection of specific metabolites that were predicted to amplify or diminish the abundance of targeted microbes functional in the healthy soil system. Many of these predictions were corroborated by experimental evidence from the literature. The potential of two of these metabolites (dopamine and vitamin B12) to either stimulate or suppress targeted microbial groups was evaluated in a follow-up set of soil microcosm experiments. The results corroborated the stimulant's potential (but not the suppressor's) to act as a modulator of plant-beneficial bacteria, paving the way for future development of knowledge-based (rather than trial-and-error) metabolically defined amendments. Our pipeline for generating predictions for the selective targeting of microbial groups based on processing assembled and annotated metagenomics data is available at https://github.com/ot483/NetCom2 . CONCLUSIONS: This research demonstrates how genomic-based algorithms can be used to formulate testable hypotheses for strategically engineering the rhizosphere microbiome by identifying specific compounds, which may act as selective modulators of microbial communities. Applying this framework to reduce unpredictable elements in amendment-based solutions promotes the development of ecologically sound methods for re-establishing a functional microbiome in agro and other ecosystems.
Affiliation(s)
- Maria Berihu
  - Agricultural Research Organization (ARO), Institute of Plant Sciences, Rishon LeZion/Ramat Yishay, Israel
- Tracey S. Somera
  - United States Department of Agriculture-Agricultural Research Service Tree Fruit Research Lab, 1104 N. Western Ave, Wenatchee, WA 98801 USA
- Shlomit Medina
  - Agricultural Research Organization (ARO), Institute of Plant Sciences, Rishon LeZion/Ramat Yishay, Israel
- Edoardo Piombo
  - Department of Agricultural, Forest and Food Sciences (DISAFA), University of Torino, Grugliasco, Italy
  - Department of Forest Mycology and Plant Pathology, Uppsala Biocenter, Swedish University of Agricultural Sciences, P.O. Box 7026, 75007 Uppsala, Sweden
- Ofir Tal
  - Agricultural Research Organization (ARO), Institute of Plant Sciences, Rishon LeZion/Ramat Yishay, Israel
  - Kinneret Limnological Laboratory (KLL) Israel Oceanographic and Limnological Research (IOLR), P.O. Box 447, 49500 Migdal, Israel
- Matan Cohen
  - Agricultural Research Organization (ARO), Institute of Plant Sciences, Rishon LeZion/Ramat Yishay, Israel
- Alon Ginatt
  - Agricultural Research Organization (ARO), Institute of Plant Sciences, Rishon LeZion/Ramat Yishay, Israel
- Adi Doron-Faigenboim
  - Agricultural Research Organization (ARO), Institute of Plant Sciences, Rishon LeZion/Ramat Yishay, Israel
- Mark Mazzola
  - United States Department of Agriculture-Agricultural Research Service Tree Fruit Research Lab, 1104 N. Western Ave, Wenatchee, WA 98801 USA
  - Department of Plant Pathology, Stellenbosch University, Private Bag X1, Matieland, 7600 South Africa
- Shiri Freilich
  - Agricultural Research Organization (ARO), Institute of Plant Sciences, Rishon LeZion/Ramat Yishay, Israel
4
de Oliveira Almeida R, Valente GT. Predicting metabolic pathways of plant enzymes without using sequence similarity: Models from machine learning. Plant Genome 2020; 13:e20043. [PMID: 33217216 DOI: 10.1002/tpg2.20043] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/08/2019] [Revised: 06/03/2020] [Accepted: 06/10/2020] [Indexed: 06/11/2023]
Abstract
Most bioinformatics tools for enzyme annotation focus on enzymatic function assignments. Sequence similarity to well-characterized enzymes is often used for functional annotation and to assign metabolic pathways. However, these approaches are not feasible for all sequences, leading to inaccurate annotations or a lack of metabolic pathway information. Here we present mApLe (metabolic pathway predictor of plant enzymes), a high-performance machine learning-based tool with models to label the metabolic pathway of enzymes rather than specifying enzymes' reactions. mApLe uses molecular descriptors of the enzyme sequences to perform predictions without considering sequence similarities with reference sequences. Hence, mApLe can classify a diversity of enzymes, even those without any homolog or with incomplete EC numbers. This tool can be used to improve the quality of genomic annotation of plants or to narrow down the number of candidate genes for metabolic engineering research. The mApLe tool is available online, and the GUI can be locally installed.
Affiliation(s)
- Rodrigo de Oliveira Almeida
  - Instituto Federal de Educação, Ciência e Tecnologia do Sudeste de Minas Gerais, Muriaé, Brazil
  - Department of Bioprocess and Biotechnology, School of Agriculture, São Paulo State University (Unesp), Botucatu, Brazil
- Guilherme Targino Valente
  - Department of Bioprocess and Biotechnology, School of Agriculture, São Paulo State University (Unesp), Botucatu, Brazil
  - Department of Developmental Genetics, Max Planck Institut für Herz- und Lungenforschung, Bad Nauheim, Germany
5
Hidden resources in the Escherichia coli genome restore PLP synthesis and robust growth after deletion of the essential gene pdxB. Proc Natl Acad Sci U S A 2019; 116:24164-24173. [PMID: 31712440 PMCID: PMC6883840 DOI: 10.1073/pnas.1915569116] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 11/28/2022]
Abstract
The evolution of new metabolic pathways has been a driver of diversification from the last universal common ancestor 3.8 billion y ago to the present. Bioinformatic evidence suggests that many pathways were assembled by recruiting promiscuous enzymes to serve new functions. However, the processes by which new pathways have emerged are lost in time. We have little information about the environmental conditions that fostered emergence of new pathways, the genome context in which new pathways emerged, and the types of mutations that elevated flux through inefficient new pathways. Experimental laboratory evolution has allowed us to evolve a new pathway and identify mechanisms by which mutations increase fitness when an inefficient new pathway becomes important for survival. PdxB (erythronate 4-phosphate dehydrogenase) is expected to be required for synthesis of the essential cofactor pyridoxal 5′-phosphate (PLP) in Escherichia coli. Surprisingly, incubation of the ∆pdxB strain in medium containing glucose as a sole carbon source for 10 d resulted in visible turbidity, suggesting that PLP is being produced by some alternative pathway. Continued evolution of parallel lineages for 110 to 150 generations produced several strains that grow robustly in glucose. We identified a 4-step bypass pathway patched together from promiscuous enzymes that restores PLP synthesis in strain JK1. None of the mutations in JK1 occurs in a gene encoding an enzyme in the new pathway. Two mutations indirectly enhance the ability of SerA (3-phosphoglycerate dehydrogenase) to perform a new function in the bypass pathway. Another disrupts a gene encoding a PLP phosphatase, thus preserving PLP levels. These results demonstrate that a functional pathway can be patched together from promiscuous enzymes in the proteome, even without mutations in the genes encoding those enzymes.
6
Katsir L, Zhepu R, Santos Garcia D, Piasezky A, Jiang J, Sela N, Freilich S, Bahar O. Genome Analysis of Haplotype D of Candidatus Liberibacter Solanacearum. Front Microbiol 2018; 9:2933. [PMID: 30619106 PMCID: PMC6295461 DOI: 10.3389/fmicb.2018.02933] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 08/15/2018] [Accepted: 11/14/2018] [Indexed: 11/20/2022]
Abstract
Candidatus Liberibacter solanacearum (Lso) haplotype D (LsoD) is a suspected bacterial pathogen, spread by the phloem-feeding psyllid Bactericera trigonica Hodkinson and found to infect carrot plants throughout the Mediterranean. Haplotype D is one of six haplotypes of Lso that each have specific and overlapping host preferences, disease symptoms, and psyllid vectors. Genotyping of rRNA genes has allowed for tracking the haplotype diversity of Lso and genome sequencing of several haplotypes has been performed to advance a comprehensive understanding of Lso diseases and of the phylogenetic relationships among the haplotypes. To further pursue that aim we have sequenced the genome of LsoD from its psyllid vector and report here its draft genome. Genome-based single nucleotide polymorphism analysis indicates LsoD is most closely related to the A haplotype. Genomic features and the metabolic potential of LsoD are assessed in relation to Lso haplotypes A, B, and C, as well as the facultative strain Liberibacter crescens. We identify genes unique to haplotype D as well as putative secreted effectors that may play a role in disease characteristics specific to this haplotype of Lso.
Affiliation(s)
- Leron Katsir
  - Department of Plant Pathology and Weed Research, Agricultural Research Organization, Volcani Center, Rishon LeZion, Israel
- Ruan Zhepu
  - Newe Ya’ar Research Center, Agricultural Research Organization, Ramat Yishay, Israel
  - Department of Microbiology, College of Life Sciences, Nanjing Agricultural University, Nanjing, China
- Diego Santos Garcia
  - Department of Entomology, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Alon Piasezky
  - Department of Plant Pathology and Weed Research, Agricultural Research Organization, Volcani Center, Rishon LeZion, Israel
  - Department of Plant Pathology and Microbiology, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Jiandong Jiang
  - Department of Microbiology, College of Life Sciences, Nanjing Agricultural University, Nanjing, China
- Noa Sela
  - Department of Plant Pathology and Weed Research, Agricultural Research Organization, Volcani Center, Rishon LeZion, Israel
- Shiri Freilich
  - Newe Ya’ar Research Center, Agricultural Research Organization, Ramat Yishay, Israel
- Ofir Bahar
  - Department of Plant Pathology and Weed Research, Agricultural Research Organization, Volcani Center, Rishon LeZion, Israel
7
Towards a Stochastic Paradigm: From Fuzzy Ensembles to Cellular Functions. Molecules 2018; 23:molecules23113008. [PMID: 30453632 PMCID: PMC6278454 DOI: 10.3390/molecules23113008] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 10/21/2018] [Revised: 11/11/2018] [Accepted: 11/16/2018] [Indexed: 01/03/2023]
Abstract
The deterministic sequence → structure → function relationship is not applicable to describe how proteins dynamically adapt to different cellular conditions. A stochastic model is required to capture functional promiscuity, redundant sequence motifs, dynamic interactions, or conformational heterogeneity, which facilitate the decision-making in regulatory processes, ranging from enzymes to membraneless cellular compartments. The fuzzy set theory offers a quantitative framework to address these problems. The fuzzy formalism allows the simultaneous involvement of proteins in multiple activities, the degree of which is given by the corresponding memberships. Adaptation is described via a fuzzy inference system, which relates heterogeneous conformational ensembles to different biological activities. Sequence redundancies (e.g., tandem motifs) can also be treated by fuzzy sets to characterize structural transitions affecting the heterogeneous interaction patterns (e.g., pathological fibrillization of stress granules). The proposed framework can provide quantitative protein models, under stochastic cellular conditions.
8
Copley SD. Shining a light on enzyme promiscuity. Curr Opin Struct Biol 2017; 47:167-175. [DOI: 10.1016/j.sbi.2017.11.001] [Citation(s) in RCA: 104] [Impact Index Per Article: 14.9] [Received: 02/21/2017] [Revised: 08/14/2017] [Accepted: 11/02/2017] [Indexed: 11/16/2022]
9
Martínez-Núñez MA, Rodríguez-Escamilla Z, Rodríguez-Vázquez K, Pérez-Rueda E. Tracing the Repertoire of Promiscuous Enzymes along the Metabolic Pathways in Archaeal Organisms. Life (Basel) 2017; 7:life7030030. [PMID: 28703743 PMCID: PMC5617955 DOI: 10.3390/life7030030] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 04/12/2017] [Revised: 07/09/2017] [Accepted: 07/10/2017] [Indexed: 01/10/2023]
Abstract
The metabolic pathways that carry out the biochemical transformations sustaining life depend on the efficiency of their associated enzymes. In recent years, it has become clear that promiscuous enzymes have played an important role in the function and evolution of metabolism. In this work we analyze the repertoire of promiscuous enzymes in 89 non-redundant genomes of the Archaea cellular domain. Promiscuous enzymes are defined as those proteins with two or more different Enzyme Commission (E.C.) numbers, according to the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. From this analysis, it was found that the fraction of promiscuous enzymes is lower in Archaea than in Bacteria. A greater diversity of superfamily domains is associated with promiscuous enzymes compared to specialized enzymes, both in Archaea and Bacteria, and there is an enrichment of substrate promiscuity rather than catalytic promiscuity in the archaeal enzymes. Finally, the presence of promiscuous enzymes in the metabolic pathways was found to be heterogeneously distributed at the domain level and in the phyla that make up the Archaea. These analyses increase our understanding of promiscuous enzymes and provide additional clues to the evolution of metabolism in Archaea.
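The working definition in this abstract (a protein counts as promiscuous when it carries two or more different E.C. numbers) can be illustrated with a minimal Python sketch. The function names and the toy annotation table below are hypothetical and are not taken from the paper or from KEGG; the sketch only assumes that annotations are already available as a mapping from protein identifier to a set of E.C. numbers.

```python
from typing import Dict, Set


def promiscuous_enzymes(ec_by_protein: Dict[str, Set[str]]) -> Set[str]:
    """Return proteins annotated with two or more distinct E.C. numbers."""
    return {protein for protein, ecs in ec_by_protein.items() if len(ecs) >= 2}


def promiscuous_fraction(ec_by_protein: Dict[str, Set[str]]) -> float:
    """Fraction of annotated proteins that meet the promiscuity criterion."""
    if not ec_by_protein:
        return 0.0
    return len(promiscuous_enzymes(ec_by_protein)) / len(ec_by_protein)


# Hypothetical toy annotations (illustrative only, not real KEGG data).
annotations = {
    "protA": {"1.1.1.1"},
    "protB": {"2.7.1.1", "2.7.1.2"},
    "protC": {"3.1.3.5", "4.2.1.11"},
    "protD": {"5.3.1.9"},
}

print(sorted(promiscuous_enzymes(annotations)))
print(promiscuous_fraction(annotations))
```

Computing this fraction per genome would be one way to compare groups such as Archaea and Bacteria, although the sketch does not attempt the paper's distinction between substrate and catalytic promiscuity.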
Affiliation(s)
- Mario Alberto Martínez-Núñez
  - Laboratorio de Estudios Ecogenómicos, Facultad de Ciencias, Unidad Académica de Ciencias y Tecnología de la UNAM en Yucatán, Universidad Nacional Autónoma de México, Carretera Sierra Papacal-Chuburna Km. 5, C.P. 97302, Mérida, Yucatán, Mexico.
- Zuemy Rodríguez-Escamilla
  - Departamento de Microbiología, Instituto de Biotecnología, Universidad Nacional Autónoma de México, C.P. 62210, Cuernavaca, Morelos, Mexico.
- Katya Rodríguez-Vázquez
  - Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Ciudad Universitaria, C.P. 04510, Ciudad de México, Mexico.
- Ernesto Pérez-Rueda
  - Departamento de Ingeniería Celular y Biocatálisis, Instituto de Biotecnología, Universidad Nacional Autónoma de México, C.P. 62210, Cuernavaca, Morelos, Mexico.
  - Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Unidad Académica Yucatán, Carretera Sierra Papacal-Chuburna Km. 5, C.P. 97302, Mérida, Yucatán, Mexico.
10
Multiple nucleophilic elbows leading to multiple active sites in a single module esterase from Sorangium cellulosum. J Struct Biol 2015; 190:314-27. [DOI: 10.1016/j.jsb.2015.04.009] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 09/05/2014] [Revised: 03/25/2015] [Accepted: 04/10/2015] [Indexed: 11/17/2022]
11
Martínez Cuesta S, Rahman SA, Furnham N, Thornton JM. The Classification and Evolution of Enzyme Function. Biophys J 2015; 109:1082-6. [PMID: 25986631 DOI: 10.1016/j.bpj.2015.04.020] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.7] [Received: 03/09/2015] [Revised: 04/16/2015] [Accepted: 04/17/2015] [Indexed: 11/30/2022]
Abstract
Enzymes are the proteins responsible for the catalysis of life. Enzymes sharing a common ancestor as defined by sequence and structure similarity are grouped into families and superfamilies. The molecular function of enzymes is defined as their ability to catalyze biochemical reactions; it is manually classified by the Enzyme Commission and robust approaches to quantitatively compare catalytic reactions are just beginning to appear. Here, we present an overview of studies at the interface of the evolution and function of enzymes.
Affiliation(s)
- Sergio Martínez Cuesta
  - European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Syed Asad Rahman
  - European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Nicholas Furnham
  - Department of Pathogen Molecular Biology, London School of Hygiene & Tropical Medicine, London, United Kingdom
- Janet M Thornton
  - European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
12
Zhou F, Toivonen H, King RD. The use of weighted graphs for large-scale genome analysis. PLoS One 2014; 9:e89618. [PMID: 24619061 PMCID: PMC3949676 DOI: 10.1371/journal.pone.0089618] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/06/2013] [Accepted: 01/23/2014] [Indexed: 11/18/2022]
Abstract
There is an acute need for better tools to extract knowledge from the growing flood of sequence data. For example, thousands of complete genomes have been sequenced, and their metabolic networks inferred. Such data should enable a better understanding of evolution. However, most existing network analysis methods are based on pair-wise comparisons, and these do not scale to thousands of genomes. Here we propose the use of weighted graphs as a data structure to enable large-scale phylogenetic analysis of networks. We have developed three types of weighted graph for enzymes: taxonomic (these summarize phylogenetic importance), isoenzymatic (these summarize enzymatic variety/redundancy), and sequence-similarity (these summarize sequence conservation); and we applied these types of weighted graph to survey prokaryotic metabolism. To demonstrate the utility of this approach we have compared and contrasted the large-scale evolution of metabolism in Archaea and Eubacteria. Our results provide evidence for limits to the contingency of evolution.
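To make the scaling argument concrete, here is a small Python sketch of how a per-enzyme weight could be accumulated in a single pass over many genomes instead of through pair-wise comparisons. It is an illustration only: the abstract does not define the weights, so the choice made here (the fraction of genomes in which an enzyme occurs) and the names `taxonomic_weights` and `enzymes_by_genome` are assumptions, not the authors' method.

```python
from collections import Counter
from typing import Dict, Set


def taxonomic_weights(enzymes_by_genome: Dict[str, Set[str]]) -> Dict[str, float]:
    """Weight each enzyme by the fraction of genomes that contain it.

    Assumed weighting for illustration; one pass over the genomes,
    so the cost grows linearly with the number of genomes.
    """
    n_genomes = len(enzymes_by_genome)
    if n_genomes == 0:
        return {}
    counts: Counter = Counter()
    for enzymes in enzymes_by_genome.values():
        counts.update(enzymes)  # each genome contributes at most 1 per enzyme
    return {ec: count / n_genomes for ec, count in counts.items()}


# Hypothetical toy input: genome identifier -> set of E.C. numbers.
genomes = {
    "g1": {"1.1.1.1", "2.7.1.1"},
    "g2": {"1.1.1.1"},
    "g3": {"1.1.1.1", "5.3.1.9"},
    "g4": {"2.7.1.1"},
}

weights = taxonomic_weights(genomes)
print(weights)
```

Such weights could then be attached to the nodes of a shared metabolic network, so that thousands of genomes are summarized in one graph rather than compared two at a time.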
Affiliation(s)
- Fang Zhou
  - Division of Computer Science, The University of Nottingham, Ningbo, China
- Hannu Toivonen
  - Department of Computer Science, University of Helsinki, Finland
- Ross D. King
  - Manchester Institute of Biotechnology, University of Manchester, Manchester, United Kingdom
13
Comparative genomics approaches to understanding and manipulating plant metabolism. Curr Opin Biotechnol 2013; 24:278-84. [DOI: 10.1016/j.copbio.2012.07.005] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 07/18/2012] [Revised: 07/29/2012] [Accepted: 07/30/2012] [Indexed: 12/11/2022]
14
Chen TW, Gan RCR, Wu TH, Huang PJ, Lee CY, Chen YYM, Chen CC, Tang P. FastAnnotator--an efficient transcript annotation web tool. BMC Genomics 2012; 13 Suppl 7:S9. [PMID: 23281853 PMCID: PMC3521244 DOI: 10.1186/1471-2164-13-s7-s9] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Recent developments in high-throughput sequencing (HTS) technologies have made it feasible to sequence the complete transcriptomes of non-model organisms or metatranscriptomes from environmental samples. The challenge after generating hundreds of millions of sequences is to annotate these transcripts and classify them based on their putative functions. Because many biological scientists lack the knowledge to install Linux-based software packages or maintain databases used for transcript annotation, we developed an automatic annotation tool with an easy-to-use interface. METHODS: To elucidate the potential functions of gene transcripts, we integrated well-established annotation tools (Blast2GO, PRIAM and RPS BLAST) in a web-based service, FastAnnotator, which can assign Gene Ontology (GO) terms, Enzyme Commission numbers (EC numbers) and functional domains to query sequences. RESULTS: Using six transcriptome sequence datasets as examples, we demonstrated the ability of FastAnnotator to assign functional annotations. FastAnnotator annotated 88.1% and 81.3% of the transcripts from the well-studied organisms Caenorhabditis elegans and Streptococcus parasanguinis, respectively. Furthermore, FastAnnotator annotated 62.9%, 20.4%, 53.1% and 42.0% of the sequences from the transcriptomes of sweet potato, clam, amoeba, and Trichomonas vaginalis, respectively, which lack reference genomes. We demonstrated that FastAnnotator can complete the annotation process in a reasonable amount of time and is suitable for the annotation of transcriptomes from model organisms or organisms for which annotated reference genomes are not available. CONCLUSIONS: The sequencing process no longer represents the bottleneck in the study of genomics, and automatic annotation tools have become invaluable as the annotation procedure has become the limiting step. We present FastAnnotator, an automated annotation web tool designed to efficiently annotate sequences with their gene functions, enzyme functions or domains. FastAnnotator is useful in transcriptome studies, especially those focusing on non-model organisms or metatranscriptomes. FastAnnotator does not require local installation and is freely available at http://fastannotator.cgu.edu.tw.
Affiliation(s)
- Ting-Wen Chen
  - Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan
15
Klein CC, Cottret L, Kielbassa J, Charles H, Gautier C, Ribeiro de Vasconcelos AT, Lacroix V, Sagot MF. Exploration of the core metabolism of symbiotic bacteria. BMC Genomics 2012; 13:438. [PMID: 22938206 PMCID: PMC3543179 DOI: 10.1186/1471-2164-13-438] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 06/20/2012] [Accepted: 08/18/2012] [Indexed: 12/01/2022]
Abstract
Background A large number of genome-scale metabolic networks is now available for many organisms, mostly bacteria. Previous works on minimal gene sets, when analysing host-dependent bacteria, found small common sets of metabolic genes. When such analyses are restricted to bacteria with similar lifestyles, larger portions of metabolism are expected to be shared and their composition is worth investigating. Here we report a comparative analysis of the small molecule metabolism of symbiotic bacteria, exploring common and variable portions as well as the contribution of different lifestyle groups to the reduction of a common set of metabolic capabilities. Results We found no reaction shared by all the bacteria analysed. Disregarding those with the smallest genomes, we still do not find a reaction core, however we did find a core of biochemical capabilities. While obligate intracellular symbionts have no core of reactions within their group, extracellular and cell-associated symbionts do have a small core composed of disconnected fragments. In agreement with previous findings in Escherichia coli, their cores are enriched in biosynthetic processes whereas the variable metabolisms have similar ratios of biosynthetic and degradation reactions. Conversely, the variable metabolism of obligate intracellular symbionts is enriched in anabolism. Conclusion Even when removing the symbionts with the most reduced genomes, there is no core of reactions common to the analysed symbiotic bacteria. The main reason is the very high specialisation of obligate intracellular symbionts, however, host-dependence alone is not an explanation for such absence. The composition of the metabolism of cell-associated and extracellular bacteria shows that while they have similar needs in terms of the building blocks of their cells, they have to adapt to very distinct environments. 
On the other hand, in obligate intracellular bacteria, catabolism has largely disappeared, whereas synthetic routes appear to have been selected for depending on the nature of the symbiosis. As more genomes are added, we expect, based on our simulations, that the core of cell-associated and extracellular bacteria continues to diminish, converging to approximately 60 reactions.
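The notion of a "reaction core" used in this abstract can be read as the set of reactions present in every organism of a group. The following Python sketch illustrates that reading on toy data; the organism names, reaction identifiers and the `reaction_core` helper are hypothetical and are not taken from the paper.

```python
def reaction_core(networks):
    """Return the reactions shared by every network in the mapping.

    `networks` maps an organism name to the set of reaction IDs
    annotated for that organism.
    """
    reaction_sets = [set(reactions) for reactions in networks.values()]
    if not reaction_sets:
        return set()
    return set.intersection(*reaction_sets)


# Hypothetical toy networks (illustrative only).
networks = {
    "extracellular_A": {"R1", "R2", "R3", "R4"},
    "cell_associated_B": {"R1", "R2", "R5"},
    "obligate_intracellular_C": {"R3", "R6"},
}

# Core over all organisms.
core_all = reaction_core(networks)

# Core after dropping the organism with the most reduced network.
core_without_c = reaction_core(
    {name: rxns for name, rxns in networks.items()
     if name != "obligate_intracellular_C"}
)
```

In this toy case the core over all three organisms is empty, while a small core remains once the most reduced network is excluded, which mirrors the kind of comparison the abstract describes.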
16
Suen S, Lu HHS, Yeang CH. Evolution of domain architectures and catalytic functions of enzymes in metabolic systems. Genome Biol Evol 2012; 4:976-93. [PMID: 22936075 PMCID: PMC3468959 DOI: 10.1093/gbe/evs072] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 12/11/2022]
Abstract
Domain architectures and catalytic functions of enzymes constitute the centerpieces of a metabolic network. These types of information are formulated as a two-layered network consisting of domains, proteins, and reactions: a domain-protein-reaction (DPR) network. We propose an algorithm to reconstruct the evolutionary history of DPR networks across multiple species and categorize the mechanisms of metabolic systems evolution in terms of network changes. The reconstructed history reveals distinct patterns of evolutionary mechanisms between prokaryotic and eukaryotic networks. Although the evolutionary mechanisms in early ancestors of prokaryotes and eukaryotes are quite similar, more novel and duplicated domain compositions with identical catalytic functions arise along the eukaryotic lineage. In contrast, prokaryotic enzymes become more versatile by catalyzing multiple reactions with similar chemical operations. Moreover, different metabolic pathways are enriched with distinct network evolution mechanisms. For instance, although the pathways of steroid biosynthesis, protein kinases, and glycosaminoglycan biosynthesis all constitute prominent features of animal-specific physiology, their evolution of domain architectures and catalytic functions follows distinct patterns. Steroid biosynthesis is enriched with reaction creations but retains a relatively conserved repertoire of domain compositions and proteins. Protein kinases retain conserved reactions but possess many novel domains and proteins. In contrast, glycosaminoglycan biosynthesis has high rates of reaction/protein creations and domain recruitments. Finally, we elicit and validate two general principles underlying the evolution of DPR networks: 1) duplicated enzyme proteins possess similar catalytic functions and 2) the majority of novel domains arise to catalyze novel reactions. These results shed new light on the evolution of metabolic systems.
Affiliation(s)
- Summit Suen
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
17
Seaver SMD, Henry CS, Hanson AD. Frontiers in metabolic reconstruction and modeling of plant genomes. JOURNAL OF EXPERIMENTAL BOTANY 2012; 63:2247-58. [PMID: 22238452 DOI: 10.1093/jxb/err371] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.7] [Indexed: 05/20/2023]
Abstract
A major goal of post-genomic biology is to reconstruct and model in silico the metabolic networks of entire organisms. Work on bacteria is well advanced, and is now under way for plants and other eukaryotes. Genome-scale modelling in plants is much more challenging than in bacteria. The challenges come from features characteristic of higher organisms (subcellular compartmentation, tissue differentiation) and also from the particular severity in plants of a general problem: genome content whose functions remain undiscovered. This problem results in thousands of genes for which no function is known ('undiscovered genome content') and hundreds of enzymatic and transport functions for which no gene is yet identified. The severity of the undiscovered genome content problem in plants reflects their genome size and complexity. To bring the challenges of plant genome-scale modelling into focus, we first summarize the current status of plant genome-scale models. We then highlight the challenges - and ways to address them - in three areas: identifying genes for missing processes, modelling tissues as opposed to single cells, and finding metabolic functions encoded by undiscovered genome content. We also discuss the emerging view that a significant fraction of undiscovered genome content encodes functions that counter damage to metabolites inflicted by spontaneous chemical reactions or enzymatic mistakes.
Affiliation(s)
- Samuel M D Seaver
- Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA
18
Abstract
Large superfamilies of enzymes derived from a common progenitor have emerged by duplication and divergence of genes encoding metabolic enzymes. Division of the functions of early generalist enzymes enhanced catalytic power and control over metabolic fluxes. Later, novel enzymes evolved from inefficient secondary activities in specialized enzymes. Enzymes operate in the context of complex metabolic and regulatory networks. The potential for evolution of a new enzyme depends upon the collection of enzymes in a microbe, the topology of the metabolic network, the environmental conditions, and the net effect of trade-offs between the original and novel activities of the enzyme.
Affiliation(s)
- Shelley D Copley
- Department of Molecular, Cellular and Developmental Biology and Cooperative Institute for Research in Environmental Sciences, University of Colorado at Boulder, Boulder, Colorado 80309.
19
Three serendipitous pathways in E. coli can bypass a block in pyridoxal-5'-phosphate synthesis. Mol Syst Biol 2011; 6:436. [PMID: 21119630 PMCID: PMC3010111 DOI: 10.1038/msb.2010.88] [Citation(s) in RCA: 97] [Impact Index Per Article: 7.5] [Received: 07/02/2010] [Accepted: 09/30/2010] [Indexed: 11/28/2022]
Abstract
Overexpression of seven different genes restores growth of a ΔpdxB strain of E. coli, which cannot make pyridoxal phosphate (PLP), on M9/glucose. None of the enzymes encoded by these genes has a promiscuous 4-phosphoerythronate dehydrogenase activity that can replace the activity of PdxB. Overexpression of these genes restores PLP synthesis by three different serendipitous pathways that feed into the normal PLP synthesis pathway downstream of the blocked step. Reactions in one of these pathways are catalyzed by low-level activities of enzymes of unknown function and a promiscuous activity of an enzyme that normally has a role in another pathway; one reaction appears to be non-enzymatic.
Most metabolic enzymes are prodigious catalysts that have evolved to accelerate chemical reactions with high efficiency and specificity. However, many enzymes have inefficient promiscuous activities, as well, as a result of the assemblage of highly reactive catalytic residues and cofactors in active sites. Although promiscuous activities are generally orders of magnitude less efficient than well-evolved activities (O'Brien and Herschlag, 1998, 2001; Wang et al, 2003; Taylor Ringia et al, 2004), they often enhance reaction rates by orders of magnitude relative to those of uncatalyzed reactions (O'Brien and Herschlag, 1998, 2001). Thus, promiscuous activities provide a reservoir of novel catalytic activities that can be recruited to serve new functions. The evolutionary potential of promiscuous enzymes extends beyond the recruitment of single enzymes to serve new functions. Microbes contain hundreds of enzymes—E. coli contains about 1700 (Freilich et al, 2005)—raising the possibility that promiscuous enzymes can be patched together to generate ‘serendipitous' pathways that are not part of normal metabolism. We distinguish serendipitous pathways from latent or cryptic pathways, which are bona fide pathways involving dedicated enzymes that are produced only under particular environmental circumstances. In contrast, serendipitous pathways are patched together from enzymes that normally serve other functions and are not regulated in a coordinated manner in response to the need to synthesize or degrade a metabolite. In this study, we describe the discovery of three serendipitous pathways that allow synthesis of pyridoxal phosphate (PLP) in a strain of E. coli that lacks 4-phosphoerythronate dehydrogenase (PdxB) when one of the seven different genes is overexpressed. These genes were identified in a multicopy suppression experiment in which a library of E. coli genes (from the ASKA collection) was introduced into a ΔpdxB strain of E. coli that is unable to synthesize PLP. 
Surprisingly, none of the enzymes encoded by these genes has a promiscuous 4-phosphoerythronate (4PE) dehydrogenase activity that can substitute for the missing PdxB. Rather, overproduction of these enzymes appears to facilitate at least three serendipitous pathways that draw material from other metabolic pathways and feed into the normal PLP synthesis pathway downstream of the blocked step (Figure 1). We have characterized one of these pathways in detail (Figure 3). The first step, dephosphorylation of 3-phosphohydroxypyruvate, is catalyzed by YeaB, a predicted NUDIX hydrolase of unknown function. Although catalysis is inefficient (kcat = 5.7×10⁻⁵ s⁻¹ and kcat/KM > 0.028 M⁻¹ s⁻¹), the enzymatic rate is 4×10⁷-fold faster than the rate of the uncatalyzed reaction, and is sufficient to support PLP synthesis when YeaB is overproduced. The second step in the pathway is decarboxylation of 3-hydroxypyruvate (3HP). Although we found two enzymes (1-deoxyxylulose-5-phosphate synthase and the catalytic domain of α-ketoglutarate dehydrogenase) that catalyze this reaction with low but respectable activity in vitro, their involvement in pathway 1 was ruled out by genetic methods. Surprisingly, the non-enzymatic rate of decarboxylation of 3HP appears to be sufficient to support PLP synthesis. The third step in the pathway, condensation of glycolaldehyde and glycine to form 4-hydroxy-L-threonine, is catalyzed by LtaE, a low-specificity threonine aldolase whose physiological role is not known. The final step, phosphorylation of 4-hydroxy-L-threonine, is catalyzed by homoserine kinase (ThrB), which is required for synthesis of threonine. The promiscuous phosphorylation of 4-hydroxy-L-threonine is 80-fold slower than the physiological phosphorylation of homoserine. The involvement of LtaE and ThrB in pathway 1 was confirmed by genetic experiments showing that overexpression of yeaB no longer restores growth of ΔpdxB strains lacking either ltaE or thrB. 
Although pathway 1 is inefficient, it provides the ΔpdxB strain with the ability to grow under conditions in which survival is otherwise impossible. In general, serendipitous assembly of an inefficient pathway from promiscuous activities of available enzymes will be important whenever the pathway provides increased fitness. This might occur when a critical metabolite is no longer available from the environment, and survival depends on assembly of a new biosynthetic pathway. A second circumstance in which an inefficient serendipitous pathway might improve fitness is the appearance of a novel compound in the environment that can be exploited as a source of carbon, nitrogen or phosphorus. Finally, chemotherapeutic agents that block metabolic pathways in bacteria or cancer cells could provide selective pressure for assembly of serendipitous pathways that allow synthesis of the end product of the blocked pathway and thus a previously unappreciated source of drug resistance. In all of these cases, even an inefficient pathway can provide a selective advantage over other cells in a particular environmental niche, allowing survival and subsequent mutations that elevate the efficiency of the pathway. Our work is consistent with the hypothesis that the recognized metabolic network of E. coli is underlain by a denser network of reactions due to promiscuous enzymes that use and generate recognized metabolites, but also unusual metabolites that normally have no physiological role. The findings reported here highlight the abundance of cryptic capabilities in the E. coli proteome that can be drawn on to generate novel pathways. Such pathways could provide a starting place for assembly of more efficient pathways, both in nature and in the hands of metabolic engineers. Bacterial genomes encode hundreds to thousands of enzymes, most of which are specialized for particular functions. 
However, most enzymes have inefficient promiscuous activities, as well, that generally serve no purpose. Promiscuous reactions can be patched together to form multistep metabolic pathways. Mutations that increase expression or activity of enzymes in such serendipitous pathways can elevate flux through the pathway to a physiologically significant level. In this study, we describe the discovery of three serendipitous pathways that allow synthesis of pyridoxal-5′-phosphate (PLP) in a strain of E. coli that lacks 4-phosphoerythronate (4PE) dehydrogenase (PdxB) when one of seven different genes is overexpressed. We have characterized one of these pathways in detail. This pathway diverts material from serine biosynthesis and generates an intermediate in the normal PLP synthesis pathway downstream of the block caused by lack of PdxB. Steps in the pathway are catalyzed by a protein of unknown function, a broad-specificity enzyme whose physiological role is unknown, and a promiscuous activity of an enzyme that normally serves another function. One step in the pathway may be non-enzymatic.
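The rate enhancement quoted for YeaB can be inverted to estimate the uncatalyzed rate constant. The short Python sketch below does this arithmetic; it assumes that the quoted fold-enhancement is the ratio of kcat to a first-order uncatalyzed rate constant, which the text does not state explicitly, and the variable names are illustrative.

```python
# Values quoted in the text for YeaB-catalyzed dephosphorylation
# of 3-phosphohydroxypyruvate.
K_CAT = 5.7e-5           # turnover number, s^-1
RATE_ENHANCEMENT = 4e7   # enzymatic rate relative to the uncatalyzed rate

# Assumption: enhancement = kcat / k_uncat, so k_uncat = kcat / enhancement.
k_uncat = K_CAT / RATE_ENHANCEMENT  # s^-1

print(f"Implied uncatalyzed rate constant: {k_uncat:.1e} s^-1")
```

Under that assumption the implied uncatalyzed rate constant is on the order of 10⁻¹² s⁻¹, which gives a sense of how a very slow enzyme can still be a substantial catalyst.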
20
Loss of genetic redundancy in reductive genome evolution. PLoS Comput Biol 2011; 7:e1001082. [PMID: 21379323 PMCID: PMC3040653 DOI: 10.1371/journal.pcbi.1001082] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.9] [Received: 06/01/2010] [Accepted: 01/12/2011] [Indexed: 01/14/2023]
Abstract
Biological systems evolved to be functionally robust in uncertain environments, but also highly adaptable. Such robustness is partly achieved by genetic redundancy, where the failure of a specific component through mutation or environmental challenge can be compensated by duplicate components capable of performing, to a limited extent, the same function. Highly variable environments require very robust systems. Conversely, predictable environments should not place a high selective value on robustness. Here we test this hypothesis by investigating the evolutionary dynamics of genetic redundancy in extremely reduced genomes, found mostly in intracellular parasites and endosymbionts. By combining data analysis with simulations of genome evolution we show that in the extensive gene loss suffered by reduced genomes there is a selective drive to keep the diversity of protein families while sacrificing paralogy. We show that this is not a by-product of the known drivers of genome reduction and that there is very limited convergence to a common core of families, indicating that the repertoire of protein families in reduced genomes is the result of historical contingency and niche-specific adaptations. We propose that our observations reflect a loss of genetic redundancy due to a decreased selection for robustness in a predictable environment.
21
Sonavane S, Chakrabarti P. Prediction of active site cleft using support vector machines. J Chem Inf Model 2010; 50:2266-73. [PMID: 21080689 DOI: 10.1021/ci1002922] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 01/28/2023]
Abstract
Computational tools are available today for the detection and delineation of the clefts and cavities in protein 3D structure and ranking them on the basis of probable binding site clefts. There is a need to improve the ranking of clefts and accuracy of predicting catalytic site clefts. Our results show that the distance of the clefts from protein centroid and sequence entropy of the lining residues, when used in conjunction with the volume, are valuable descriptors for predicting the catalytic site. We have applied the SVM approach for recognizing and ranking the active site clefts and tested its performance using different combinations of attributes. In both the ligand-bound and the unbound forms of structures, our method correctly predicts the active site clefts in 73% of cases at rank one. If we consider the results at rank 3 (i.e., the correct solution is among one of the top three solutions), the correctly predicted cases are 94% and 90% for the bound and the unbound forms of structures, respectively. Our approach improves the ranking of binding site clefts in comparison with CASTp and is comparable to other existing methods like Fpocket. Although the data set for training the SVM approach is rather small in size, the results are encouraging for the method to be used as complementary to other existing tools.
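The "rank one" and "rank 3" figures in this abstract are top-k accuracies: the fraction of proteins whose true active-site cleft appears among the k best-ranked clefts. A minimal Python sketch of that metric follows; the protein and cleft identifiers are made up for illustration, and the ranking itself (produced by the SVM in the paper) is taken as given.

```python
def top_k_accuracy(ranked_clefts, true_cleft, k):
    """Fraction of proteins whose true cleft is among the top-k ranked clefts.

    `ranked_clefts` maps a protein ID to its cleft IDs, best rank first.
    `true_cleft` maps a protein ID to the ID of its known active-site cleft.
    """
    hits = sum(
        1 for protein, ranking in ranked_clefts.items()
        if true_cleft[protein] in ranking[:k]
    )
    return hits / len(ranked_clefts)


# Hypothetical rankings for four proteins (illustrative only).
ranked_clefts = {
    "p1": ["c1", "c2", "c3"],
    "p2": ["c4", "c2", "c1"],
    "p3": ["c7", "c8", "c9", "c3"],
    "p4": ["c5", "c6"],
}
true_cleft = {"p1": "c1", "p2": "c2", "p3": "c3", "p4": "c5"}

accuracy_at_1 = top_k_accuracy(ranked_clefts, true_cleft, k=1)
accuracy_at_3 = top_k_accuracy(ranked_clefts, true_cleft, k=3)
```

Accuracy at rank 3 can never be lower than accuracy at rank 1, which is why the abstract reports the larger values (94% and 90%) for the more permissive criterion.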
Affiliation(s)
- Shrihari Sonavane
- Department of Biochemistry and Bioinformatics Centre, Bose Institute, P-1/12 CIT Scheme VIIM, Kolkata 700 054, India
22
Glasner ME, Gerlt JA, Babbitt PC. Mechanisms of protein evolution and their application to protein engineering. ADVANCES IN ENZYMOLOGY AND RELATED AREAS OF MOLECULAR BIOLOGY 2010; 75:193-239, xii-xiii. [PMID: 17124868 DOI: 10.1002/9780471224464.ch3] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Indexed: 12/14/2022]
Abstract
Protein engineering holds great promise for the development of new biosensors, diagnostics, therapeutics, and agents for bioremediation. Despite some remarkable successes in experimental and computational protein design, engineered proteins rarely achieve the efficiency or specificity of natural enzymes. Current protein design methods utilize evolutionary concepts, including mutation, recombination, and selection, but the inability to fully recapitulate the success of natural evolution suggests that some evolutionary principles have not been fully exploited. One aspect of protein engineering that has received little attention is how to select the most promising proteins to serve as templates, or scaffolds, for engineering. Two evolutionary concepts that could provide a rational basis for template selection are the conservation of catalytic mechanisms and functional promiscuity. Knowledge of the catalytic motifs responsible for conserved aspects of catalysis in mechanistically diverse superfamilies could be used to identify promising templates for protein engineering. Second, protein evolution often proceeds through promiscuous intermediates, suggesting that templates which are naturally promiscuous for a target reaction could enhance protein engineering strategies. This review explores these ideas and alternative hypotheses concerning protein evolution and engineering. Future research will determine if application of these principles will lead to a protein engineering methodology governed by predictable rules for designing efficient, novel catalysts.
Affiliation(s)
- Margaret E Glasner
- Department of Biopharmaceutical Sciences, University of California-San Francisco, San Francisco, CA 94143, USA
23
Freilich S, Goldovsky L, Gottlieb A, Blanc E, Tsoka S, Ouzounis CA. Stratification of co-evolving genomic groups using ranked phylogenetic profiles. BMC Bioinformatics 2009; 10:355. [PMID: 19860884 PMCID: PMC2775751 DOI: 10.1186/1471-2105-10-355] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 05/18/2009] [Accepted: 10/27/2009] [Indexed: 01/12/2023]
Abstract
BACKGROUND Previous methods of detecting the taxonomic origins of arbitrary sequence collections, with a significant impact to genome analysis and in particular metagenomics, have primarily focused on compositional features of genomes. The evolutionary patterns of phylogenetic distribution of genes or proteins, represented by phylogenetic profiles, provide an alternative approach for the detection of taxonomic origins, but typically suffer from low accuracy. Herein, we present rank-BLAST, a novel approach for the assignment of protein sequences into genomic groups of the same taxonomic origin, based on the ranking order of phylogenetic profiles of target genes or proteins across the reference database. RESULTS The rank-BLAST approach is validated by computing the phylogenetic profiles of all sequences for five distinct microbial species of varying degrees of phylogenetic proximity, against a reference database of 243 fully sequenced genomes. The approach - a combination of sequence searches, statistical estimation and clustering - analyses the degree of sequence divergence between sets of protein sequences and allows the classification of protein sequences according to the species of origin with high accuracy, allowing taxonomic classification of 64% of the proteins studied. In most cases, a main cluster is detected, representing the corresponding species. Secondary, functionally distinct and species-specific clusters exhibit different patterns of phylogenetic distribution, thus flagging gene groups of interest. Detailed analyses of such cases are provided as examples. CONCLUSION Our results indicate that the rank-BLAST approach can capture the taxonomic origins of sequence collections in an accurate and efficient manner. The approach can be useful both for the analysis of genome evolution and the detection of species groups in metagenomics samples.
Affiliation(s)
- Shiri Freilich
- The Blavatnik School of Computer Sciences and School of Medicine, Tel-Aviv University, Tel-Aviv 69978, Israel.
24
Wagner A. Evolutionary constraints permeate large metabolic networks. BMC Evol Biol 2009; 9:231. [PMID: 19747381 PMCID: PMC2753571 DOI: 10.1186/1471-2148-9-231] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 02/17/2009] [Accepted: 09/11/2009] [Indexed: 11/22/2022]
Abstract
BACKGROUND Metabolic networks show great evolutionary plasticity, because they can differ substantially even among closely related prokaryotes. Any one metabolic network can also effectively compensate for the blockage of individual reactions by rerouting metabolic flux through other pathways. These observations, together with the continual discovery of new microbial metabolic pathways and enzymes, raise the possibility that metabolic networks are only weakly constrained in changing their complement of enzymatic reactions. RESULTS To ask whether this is the case, I characterized pairwise and higher-order associations in the co-occurrence of genes encoding metabolic enzymes in more than 200 completely sequenced representatives of prokaryotic genera. The majority of reactions show constrained evolution. Specifically, genes encoding most reactions tend to co-occur with genes encoding other reaction(s). Constrained reaction pairs occur in small sets whose number is substantially greater than expected by chance alone. Most such sets are associated with single biochemical pathways. The respective genes are not always tightly linked, which renders horizontal co-transfer of constrained reaction sets an unlikely sole cause for these patterns of association. CONCLUSION Even a limited number of available genomes suffices to show that metabolic network evolution is highly constrained by reaction combinations that are favored by natural selection. With increasing numbers of completely sequenced genomes, an evolutionary constraint-based approach may enable a detailed characterization of co-evolving metabolic modules.
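Pairwise association in the co-occurrence of reactions across genomes can be illustrated with a simple presence/absence comparison. The Python sketch below uses the Jaccard index as the association score; this is a stand-in chosen for illustration, since the abstract does not specify the statistic used, and the reaction and genome identifiers are hypothetical.

```python
from itertools import combinations


def jaccard(genomes_a, genomes_b):
    """Jaccard index of two sets of genomes (0.0 if both are empty)."""
    union = genomes_a | genomes_b
    return len(genomes_a & genomes_b) / len(union) if union else 0.0


def pairwise_cooccurrence(presence):
    """Score every reaction pair by how similarly it is distributed.

    `presence` maps a reaction ID to the set of genomes encoding it.
    """
    return {
        (r1, r2): jaccard(presence[r1], presence[r2])
        for r1, r2 in combinations(sorted(presence), 2)
    }


# Hypothetical presence/absence data (illustrative only).
presence = {
    "rxnA": {"g1", "g2", "g3"},
    "rxnB": {"g1", "g2", "g3"},
    "rxnC": {"g4"},
}
scores = pairwise_cooccurrence(presence)
```

A score near 1 marks a pair that is gained and lost together, the kind of constrained pair the abstract refers to; judging whether such pairs are more frequent than expected by chance would need a null model, which this sketch does not include.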
Affiliation(s)
- Andreas Wagner
- University of Zurich, Dept. of Biochemistry, CH-8057 Zurich, Switzerland.
25
Abstract
Anthropogenic compounds used as pesticides, solvents and explosives often persist in the environment and can cause toxicity to humans and wildlife. The persistence of anthropogenic compounds is due to their recent introduction into the environment; microbes in soil and water have had relatively little time to evolve efficient mechanisms for degradation of these new compounds. Some anthropogenic compounds are easily degraded, whereas others are degraded very slowly or only partially, leading to accumulation of toxic products. This review examines the factors that affect the ability of microbes to degrade anthropogenic compounds and the mechanisms by which new pathways emerge in nature. New approaches for engineering microbes with enhanced degradative abilities include assembly of pathways using enzymes from multiple organisms, directed evolution of inefficient enzymes, and genome shuffling to improve microbial fitness under the challenging conditions posed by contaminated environments.
26
Arakaki AK, Huang Y, Skolnick J. EFICAz2: enzyme function inference by a combined approach enhanced by machine learning. BMC Bioinformatics 2009; 10:107. [PMID: 19361344 PMCID: PMC2670841 DOI: 10.1186/1471-2105-10-107] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.4] [Received: 11/18/2008] [Accepted: 04/13/2009] [Indexed: 12/22/2022]
Abstract
BACKGROUND We previously developed EFICAz, an enzyme function inference approach that combines predictions from non-completely overlapping component methods. Two of the four components in the original EFICAz are based on the detection of functionally discriminating residues (FDRs). FDRs distinguish between members of an enzyme family that are homofunctional (classified under the EC number of interest) or heterofunctional (annotated with another EC number or lacking enzymatic activity). Each of the two FDR-based components is associated with one of two specific kinds of enzyme families. EFICAz exhibits high precision performance, except when the maximal test to training sequence identity (MTTSI) is lower than 30%. To improve EFICAz's performance in this regime, we: i) increased the number of predictive components and ii) took advantage of consensual information from the different components to make the final EC number assignment. RESULTS We have developed two new EFICAz components, analogous to the two FDR-based components, where the discrimination between homo- and heterofunctional members is based on the evaluation, via Support Vector Machine models, of all the aligned positions between the query sequence and the multiple sequence alignments associated with the enzyme families. Benchmark results indicate that: i) the new SVM-based components outperform their FDR-based counterparts, and ii) both SVM-based and FDR-based components generate unique predictions. We developed classification tree models to optimally combine the results from the six EFICAz components into a final EC number prediction. The new implementation of our approach, EFICAz2, exhibits a highly improved prediction precision at MTTSI < 30% compared to the original EFICAz, with only a slight decrease in prediction recall. 
A comparative analysis of enzyme function annotation of the human proteome by EFICAz2 and KEGG shows that: i) when both sources make EC number assignments for the same protein sequence, the assignments tend to be consistent and ii) EFICAz2 generates considerably more unique assignments than KEGG. CONCLUSION Performance benchmarks and the comparison with KEGG demonstrate that EFICAz2 is a powerful and precise tool for enzyme function annotation, with multiple applications in genome analysis and metabolic pathway reconstruction. The EFICAz2 web service is available at: http://cssb.biology.gatech.edu/skolnick/webservice/EFICAz2/index.html.
Affiliation(s)
- Adrian K Arakaki
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, Georgia, 30318, USA
- Ying Huang
- California Institute for Telecommunications and Information Technology, University of California, San Diego, La Jolla, CA, 92093, USA
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, Georgia, 30318, USA
27
Yu C, Zavaljevski N, Desai V, Reifman J. Genome-wide enzyme annotation with precision control: catalytic families (CatFam) databases. Proteins 2009; 74:449-60. [PMID: 18636476 DOI: 10.1002/prot.22167] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.5] [Indexed: 11/12/2022]
Abstract
In this article, we present a new method termed CatFam (Catalytic Families) to automatically infer the functions of catalytic proteins, which account for 20-40% of all proteins in living organisms and play a critical role in a variety of biological processes. CatFam is a sequence-based method that generates sequence profiles to represent and infer protein catalytic functions. CatFam generates profiles through a stepwise procedure that carefully controls profile quality and employs nonenzymes as negative samples to establish profile-specific thresholds associated with a predefined nominal false-positive rate (FPR) of predictions. The adjustable FPR allows for fine precision control of each profile and enables the generation of profile databases that meet different needs: function annotation with high precision and hypothesis generation with moderate precision but better recall. Multiple tests of CatFam databases (generated with distinct nominal FPRs) against enzyme and nonenzyme datasets show that the method's predictions have consistently high precision and recall. For example, a 1% FPR database predicts protein catalytic functions for a dataset of enzymes and nonenzymes with 98.6% precision and 95.0% recall. Comparisons of CatFam databases against other established profile-based methods for the functional annotation of 13 bacterial genomes indicate that CatFam consistently achieves higher precision and (in most cases) higher recall, and that (on average) CatFam provides 21.9% additional catalytic functions not inferred by the other similarly reliable methods. These results strongly suggest that the proposed method provides a valuable contribution to the automated prediction of protein catalytic functions. The CatFam databases and the database search program are freely available at http://www.bhsai.org/downloads/catfam.tar.gz.
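The idea of a profile-specific threshold tied to a nominal false-positive rate can be sketched as follows: score a set of known nonenzymes against a profile, then choose the cutoff so that at most the nominal fraction of them would pass. This Python sketch is one plausible reading of that procedure, not the CatFam implementation; the function names and the toy scores are assumptions.

```python
def threshold_for_fpr(negative_scores, nominal_fpr):
    """Cutoff such that at most `nominal_fpr` of negatives score above it.

    A sequence is called positive only when its score is strictly
    greater than the returned cutoff.
    """
    ranked = sorted(negative_scores, reverse=True)
    allowed = int(nominal_fpr * len(ranked))  # negatives allowed to pass
    if allowed >= len(ranked):
        return float("-inf")
    return ranked[allowed]


def precision_recall(positive_scores, negative_scores, cutoff):
    """Precision and recall when predicting positive for score > cutoff."""
    tp = sum(1 for s in positive_scores if s > cutoff)
    fp = sum(1 for s in negative_scores if s > cutoff)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / len(positive_scores) if positive_scores else 0.0
    return precision, recall


# Hypothetical profile scores (illustrative only).
nonenzyme_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
enzyme_scores = [12, 15, 10, 5]

cutoff = threshold_for_fpr(nonenzyme_scores, nominal_fpr=0.1)
precision, recall = precision_recall(enzyme_scores, nonenzyme_scores, cutoff)
```

Lowering the nominal rate raises the cutoff, trading recall for precision; this is the trade-off the abstract describes between high-precision annotation databases and moderate-precision, higher-recall databases for hypothesis generation.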
Affiliation(s)
- Chenggang Yu
- Biotechnology HPC Software Applications Institute, Telemedicine and Advanced Technology Research Center, US Army Medical Research and Materiel Command, Fort Detrick, MD 21702-5012, USA
28
Freilich S, Goldovsky L, Ouzounis CA, Thornton JM. Metabolic innovations towards the human lineage. BMC Evol Biol 2008; 8:247. [PMID: 18782449 PMCID: PMC2553087 DOI: 10.1186/1471-2148-8-247] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 01/03/2008] [Accepted: 09/09/2008] [Indexed: 01/09/2023]
Abstract
BACKGROUND We describe a function-driven approach to the analysis of metabolism which takes into account the phylogenetic origin of biochemical reactions to reveal subtle lineage-specific metabolic innovations, undetectable by more traditional methods based on sequence comparison. The origins of reactions and thus entire pathways are inferred using a simple taxonomic classification scheme that describes the evolutionary course of events towards the lineage of interest. We investigate the evolutionary history of the human metabolic network extracted from a metabolic database, construct a network of interconnected pathways and classify this network according to the taxonomic categories representing eukaryotes, metazoa and vertebrates. RESULTS It is demonstrated that lineage-specific innovations correspond to reactions and pathways associated with key phenotypic changes during evolution, such as the emergence of cellular organelles in eukaryotes, cell adhesion cascades in metazoa and the biosynthesis of complex cell-specific biomolecules in vertebrates. CONCLUSION This phylogenetic view of metabolic networks puts gene innovations within an evolutionary context, demonstrating how the emergence of a phenotype in a lineage provides a platform for the development of specialized traits.
Affiliation(s)
- Shiri Freilich
- The European Bioinformatics Institute, EMBL Cambridge Outstation, Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK.
29
Caetano-Anollés G, Yafremava LS, Gee H, Caetano-Anollés D, Kim HS, Mittenthal JE. The origin and evolution of modern metabolism. Int J Biochem Cell Biol 2008; 41:285-97. [PMID: 18790074 DOI: 10.1016/j.biocel.2008.08.022] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.3] [Received: 06/02/2008] [Revised: 08/09/2008] [Accepted: 08/11/2008] [Indexed: 10/21/2022]
Abstract
One fundamental goal of current research is to understand how complex biomolecular networks took the form that we observe today. Cellular metabolism is probably one of the most ancient biological networks and constitutes a good model system for the study of network evolution. While many evolutionary models have been proposed, a substantial body of work suggests metabolic pathways evolve fundamentally by recruitment, in which enzymes are drawn from close or distant regions of the network to perform novel chemistries or use different substrates. Here we review how structural and functional genomics has impacted our knowledge of evolution of modern metabolism and describe some approaches that merge evolutionary and structural genomics with advances in bioinformatics. These include mining the data on structure and function of enzymes for salient patterns of enzyme recruitment. Initial studies suggest modern metabolism originated in enzymes of nucleotide metabolism harboring the P-loop hydrolase fold, probably in pathways linked to the purine metabolic subnetwork. This gateway of recruitment gave rise to pathways related to the synthesis of nucleotides and cofactors for an ancient RNA world. Once the TIM beta/alpha-barrel fold architecture was discovered, it appears metabolic activities were recruited explosively giving rise to subnetworks related to carbohydrate and then amino acid metabolism. Remarkably, recruitment occurred in a layered system reminiscent of Morowitz's prebiotic shells, supporting the notion that modern metabolism represents a palimpsest of ancient metabolic chemistries.
30
Sanjuán R, Nebot MR. A network model for the correlation between epistasis and genomic complexity. PLoS One 2008; 3:e2663. [PMID: 18648534 PMCID: PMC2481279 DOI: 10.1371/journal.pone.0002663] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.1] [Received: 04/01/2008] [Accepted: 06/12/2008] [Indexed: 01/28/2023]
Abstract
The study of genetic interactions (epistasis) is central to the understanding of genome organization and evolution. A general correlation between epistasis and genomic complexity has been recently shown, such that in simpler genomes epistasis is antagonistic on average (mutational effects tend to cancel each other out), whereas a transition towards synergistic epistasis occurs in more complex genomes (mutational effects strengthen each other). Here, we use a simple network model to identify basic features explaining this correlation. We show that, in small networks with multifunctional nodes, lack of redundancy, and absence of alternative pathways, epistasis is antagonistic on average. In contrast, lack of multi-functionality, high connectivity, and redundancy favor synergistic epistasis. Moreover, we confirm the previous finding that epistasis is a covariate of mutational robustness: in less robust networks it tends to be antagonistic whereas in more robust networks it tends to be synergistic. We argue that network features associated with antagonistic epistasis are typically found in simple genomes, such as those of viruses and bacteria, whereas the features associated with synergistic epistasis are more extensively exploited by higher eukaryotes.
Affiliation(s)
- Rafael Sanjuán
- Institut Cavanilles de Biodiversitat i Biologia Evolutiva, Universitat de València, València, Spain.
31
Sun J, Lu X, Rinas U, Zeng AP. Metabolic peculiarities of Aspergillus niger disclosed by comparative metabolic genomics. Genome Biol 2007; 8:R182. [PMID: 17784953 PMCID: PMC2375020 DOI: 10.1186/gb-2007-8-9-r182] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Received: 02/27/2007] [Revised: 07/13/2007] [Accepted: 09/04/2007] [Indexed: 11/10/2022]
Abstract
A genome-scale metabolic network and an in-depth genomic comparison of Aspergillus niger with seven other fungi is presented, revealing more than 1,100 enzyme-coding genes that are unique to A. niger. Background Aspergillus niger is an important industrial microorganism for the production of both metabolites, such as citric acid, and proteins, such as fungal enzymes or heterologous proteins. Despite its extensive industrial applications, the genetic inventory of this fungus is only partially understood. The recently released genome sequence opens a new horizon for both scientific studies and biotechnological applications. Results Here, we present the first genome-scale metabolic network for A. niger and an in-depth genomic comparison of this species to seven other fungi to disclose its metabolic peculiarities. The raw genomic sequences of A. niger ATCC 9029 were first annotated. The reconstructed metabolic network is based on the annotation of two A. niger genomes, CBS 513.88 and ATCC 9029, including enzymes with 988 unique EC numbers, 2,443 reactions and 2,349 metabolites. More than 1,100 enzyme-coding genes are unique to A. niger in comparison to the other seven fungi. For example, we identified additional copies of genes such as those encoding alternative mitochondrial oxidoreductase and citrate synthase in A. niger, which might contribute to the high citric acid production efficiency of this species. Moreover, nine genes were identified as encoding enzymes with EC numbers exclusively found in A. niger, mostly involved in the biosynthesis of complex secondary metabolites and degradation of aromatic compounds. Conclusion The genome-level reconstruction of the metabolic network and genome-based metabolic comparison disclose peculiarities of A. niger highly relevant to its biotechnological applications and should contribute to future rational metabolic design and systems biology studies of this black mold and related species.
Affiliation(s)
- Jibin Sun
- Helmholtz Centre for Infection Research, Inhoffenstr., 38124 Braunschweig, Germany
- Xin Lu
- Helmholtz Centre for Infection Research, Inhoffenstr., 38124 Braunschweig, Germany
- Ursula Rinas
- Helmholtz Centre for Infection Research, Inhoffenstr., 38124 Braunschweig, Germany
- An Ping Zeng
- Helmholtz Centre for Infection Research, Inhoffenstr., 38124 Braunschweig, Germany
- Hamburg University of Technology, Institute of Bioprocess and Biosystems Engineering, Denickestr., 21071 Hamburg, Germany
32
Abstract
Zinc is one of the metal ions essential for life, as it is required for the proper functioning of a large number of proteins. Despite its importance, the annotation of zinc-binding proteins in gene banks or protein domain databases still has significant room for improvement. In the present work, we compiled a list of known zinc-binding protein domains and of known zinc-binding sequence motifs (zinc-binding patterns), and then used them jointly to analyze the proteome of 57 different organisms to obtain an overview of zinc usage by archaeal, bacterial, and eukaryotic organisms. Zinc-binding proteins are an abundant fraction of these proteomes, ranging between 4% and 10%. The number of zinc-binding proteins correlates linearly with the total number of proteins encoded by the genome of an organism, but the proportionality constant of Eukaryota (8.8%) is significantly higher than that observed in Bacteria and Archaea (from 5% to 6%). Most of this enrichment is due to the larger portfolio of regulatory proteins in Eukaryota.
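The "proportionality constant" mentioned above can be read as the slope of a straight line through the origin relating zinc-binding protein counts to proteome size. The sketch below estimates that slope by least squares under this assumption. The function name and the proteome counts are hypothetical and are not taken from the study.

```python
def proportionality_constant(total_proteins, zinc_proteins):
    """Least-squares slope k for the model zinc = k * total (line through the origin)."""
    if len(total_proteins) != len(zinc_proteins):
        raise ValueError("inputs must have the same length")
    sxx = sum(x * x for x in total_proteins)
    if sxx == 0:
        raise ValueError("need at least one nonzero proteome size")
    sxy = sum(x * y for x, y in zip(total_proteins, zinc_proteins))
    return sxy / sxx

# Hypothetical proteome sizes and zinc-binding protein counts (illustrative only).
proteome_sizes = [1000, 2000, 5000]
zinc_counts = [88, 176, 440]
k = proportionality_constant(proteome_sizes, zinc_counts)
print(f"estimated zinc-binding fraction: {k:.1%}")
```

Fitting one such slope per domain of life would give the kind of per-domain constants the abstract compares. Whether the authors forced the fit through the origin is not stated here, so that choice is an assumption.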
Affiliation(s)
- Claudia Andreini
- Magnetic Resonance Center (CERM), University of Florence, Via L. Sacconi 6, 50019 Sesto Fiorentino, Italy
33
High precision multi-genome scale reannotation of enzyme function by EFICAz. BMC Genomics 2006; 7:315. [PMID: 17166279 PMCID: PMC1764738 DOI: 10.1186/1471-2164-7-315] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 08/04/2006] [Accepted: 12/13/2006] [Indexed: 11/10/2022]
Abstract
BACKGROUND The functional annotation of most genes in newly sequenced genomes is inferred from similarity to previously characterized sequences, an annotation strategy that often leads to erroneous assignments. We have performed a reannotation of 245 genomes using an updated version of EFICAz, a highly precise method for enzyme function prediction. RESULTS Based on our three-field EC number predictions, we have obtained lower-bound estimates for the average enzyme content in Archaea (29%), Bacteria (30%) and Eukarya (18%). Most annotations added in KEGG from 2005 to 2006 agree with EFICAz predictions made in 2005. The coverage of EFICAz predictions is significantly higher than that of KEGG, especially for eukaryotes. Thousands of our novel predictions correspond to hypothetical proteins. We have identified a subset of 64 hypothetical proteins with low sequence identity to EFICAz training enzymes, whose biochemical functions have been recently characterized and find that in 96% (84%) of the cases we correctly identified their three-field (four-field) EC numbers. For two of the 64 hypothetical proteins: PA1167 from Pseudomonas aeruginosa, an alginate lyase (EC 4.2.2.3) and Rv1700 of Mycobacterium tuberculosis H37Rv, an ADP-ribose diphosphatase (EC 3.6.1.13), we have detected annotation lag of more than two years in databases. Two examples are presented where EFICAz predictions act as hypothesis generators for understanding the functional roles of hypothetical proteins: FLJ11151, a human protein overexpressed in cancer that EFICAz identifies as an endopolyphosphatase (EC 3.6.1.10), and MW0119, a protein of Staphylococcus aureus strain MW2 that we propose as candidate virulence factor based on its EFICAz predicted activity, sphingomyelin phosphodiesterase (EC 3.1.4.12). CONCLUSION Our results suggest that we have generated enzyme function annotations of high precision and recall. These predictions can be mined and correlated with other information sources to generate biologically significant hypotheses and can be useful for comparative genome analysis and automated metabolic pathway reconstruction.