1
de Crécy-Lagard V, Dias R, Friedberg I, Yuan Y, Swairjo MA. Limitations of Current Machine-Learning Models in Predicting Enzymatic Functions for Uncharacterized Proteins. bioRxiv 2024:2024.07.01.601547. [PMID: 39005379 PMCID: PMC11244979 DOI: 10.1101/2024.07.01.601547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/16/2024]
Abstract
Thirty to seventy percent of proteins in any given genome have no assigned function and have been labeled as the protein "unknome". This large knowledge gap prevents the biological community from fully leveraging the plethora of genomic data that is now available. Machine-learning approaches are showing some promise in propagating functional knowledge from experimentally characterized proteins to the correct set of isofunctional orthologs. However, they largely fail to predict enzymatic functions unseen in the training set, as shown by dissecting the predictions made for over 450 enzymes of unknown function from the model bacterium Escherichia coli using the DeepECTransformer platform. Lessons from these failures can help the community develop machine-learning methods that assist domain experts in making testable functional predictions for more members of the uncharacterized proteome.
Article Summary
Many proteins in any genome, ranging from 30 to 70%, lack an assigned function. This knowledge gap limits the full use of the vast available genomic data. Machine learning has shown promise in transferring functional knowledge from proteins of known functions to similar ones, but largely fails to predict novel functions not seen in its training data. Understanding these failures can guide the development of better machine-learning methods to help experts make accurate functional predictions for uncharacterized proteins.
2
Kim NY, Kim OB. Oxamic transcarbamylase of Escherichia coli is encoded by the three genes allFGH (formerly fdrA, ylbE, and ylbF). Appl Environ Microbiol 2024; 90:e0095724. [PMID: 38888336 PMCID: PMC11326118 DOI: 10.1128/aem.00957-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Accepted: 05/21/2024] [Indexed: 06/20/2024]
Abstract
Escherichia coli uses allantoin as the sole nitrogen source during anaerobic growth. In the final step of allantoin degradation, oxamic transcarbamylase (OXTCase) converts oxalurate to carbamoyl phosphate (CP) and oxamate. The activity of this enzyme was first measured in Streptococcus allantoicus in the 1960s, but no OXTCase enzyme or the encoding gene(s) have been found in any strain. This study discovered that allFGH (fdrA, ylbE, and ylbF) are the genes that encode the global orphan enzyme OXTCase. The three genes form an operon together with allK (ybcF), encoding catabolic carbamate kinase. The allFGHK operon is located directly downstream of the allECD operon that encodes enzymes for the preceding steps of OXTCase. The OXTCase kinetic parameters were analyzed using the purified protein composed of AllF-AllG-AllH (FdrA-YlbE-YlbF); for the substrate CP, KM and Vmax were 1.3 mM and 15.4 U/mg OXTCase, respectively, and for the substrate oxamate, they were 36.9 mM and 27.0 U/mg OXTCase. In addition, the OXTCase encoded by the three genes is a novel transcarbamylase that shows no similarity with known enzymes of the transcarbamylase family such as aspartate transcarbamylase, ornithine transcarbamylase, and YgeW transcarbamylase. The present study elucidated the anaerobic allantoin degradation pathway of E. coli. Therefore, we suggest that the genes fdrA, ylbE, and ylbF are renamed allF, allG, and allH, respectively.
Importance
The anaerobic allantoin degradation pathway of Escherichia coli includes a global orphan enzyme, oxamic transcarbamylase (OXTCase), which converts oxalurate to carbamoyl phosphate and oxamate. This study found that the allFGH (fdrA, ylbE, and ylbF) genes encode OXTCase. The OXTCase activity and kinetics were successfully determined with purified recombinant AllF-AllG-AllH (FdrA-YlbE-YlbF). This OXTCase is a novel transcarbamylase that shows no similarity with known enzymes of the transcarbamylase family such as aspartate transcarbamylase (ATCase), ornithine transcarbamylase (OTCase), and YgeW transcarbamylase (YTCase). In addition, OXTCase activity requires three genes, whereas ATCase is encoded by two genes, and OTCase and YTCase are encoded by a single gene. The current study discovered OXTCase, the last unknown step in allantoin degradation, and this enzyme is a new member of the transcarbamylase group that was previously unknown.
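The KM and Vmax values reported in this abstract can be read through the Michaelis-Menten rate law, v = Vmax·[S] / (KM + [S]). The following is a minimal Python sketch, not the authors' analysis. It assumes simple single-substrate kinetics with the co-substrate held saturating, which the abstract does not state, and the helper name `mm_rate` is hypothetical.

```python
def mm_rate(s_mM: float, vmax: float, km_mM: float) -> float:
    """Michaelis-Menten rate at substrate concentration s_mM.

    The result has the same units as vmax (here U/mg enzyme).
    Assumes single-substrate kinetics; this is an illustrative simplification.
    """
    if s_mM < 0 or km_mM <= 0:
        raise ValueError("concentration must be >= 0 and KM must be > 0")
    return vmax * s_mM / (km_mM + s_mM)


# Parameters quoted in the abstract: substrate -> (KM in mM, Vmax in U/mg OXTCase)
OXTCASE_PARAMS = {
    "carbamoyl phosphate": (1.3, 15.4),
    "oxamate": (36.9, 27.0),
}

for substrate, (km, vmax) in OXTCASE_PARAMS.items():
    # By definition the rate at [S] = KM is half of Vmax.
    half_max = mm_rate(km, vmax, km)
    # Rate at an arbitrary illustrative concentration of 5 mM.
    at_5mM = mm_rate(5.0, vmax, km)
    print(f"{substrate}: v(KM) = {half_max:.2f} U/mg, v(5 mM) = {at_5mM:.2f} U/mg")
```

Under this simplification, the much higher KM for oxamate means that at 5 mM the enzyme runs well below half of Vmax with oxamate, while it is already close to saturation with carbamoyl phosphate.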
Affiliation(s)
- Nam Yeun Kim
  - Division of EcoScience, Department of Life Science, Ewha Womans University, Seoul, Republic of Korea
- Ok Bin Kim
  - Division of EcoScience, Department of Life Science, Ewha Womans University, Seoul, Republic of Korea
3
Wu S, Zhou H, Chen D, Lu Y, Li Y, Qiao J. Multi-omic analysis tools for microbial metabolites prediction. Brief Bioinform 2024; 25:bbae264. [PMID: 38859767 PMCID: PMC11165163 DOI: 10.1093/bib/bbae264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2024] [Revised: 05/08/2024] [Indexed: 06/12/2024]
Abstract
How to resolve the metabolic dark matter of microorganisms has long been a challenging problem in discovering active molecules. Diverse omics tools have been developed to guide the discovery and characterization of various microbial metabolites, which is gradually making it possible to predict the overall metabolites of individual strains. The combination of multi-omic analysis tools effectively compensates for the shortcomings of current studies that focus only on a single omics layer or a broad class of metabolites. In this review, we systematically update, categorize and sort the analysis tools for microbial metabolite prediction developed in the last five years, to make the case for multi-omic combination in understanding the metabolic nature of microbes. First, we provide a general survey of updated prediction databases, webservers, and software based on genomics, transcriptomics, proteomics, and metabolomics, respectively. Then, we discuss the need to integrate multi-omics data to predict the metabolites of different microbial strains and communities, and stress the combination with other techniques, such as systems biology methods and data-driven algorithms. Finally, we identify key challenges and trends in developing multi-omic analysis tools for more comprehensive prediction of the diverse microbial metabolites that contribute to human health and disease treatment.
Affiliation(s)
- Shengbo Wu
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
  - Zhejiang Institute of Tianjin University, Shaoxing, Shaoxing 312300, China
- Haonan Zhou
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
- Danlei Chen
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
  - Zhejiang Institute of Tianjin University, Shaoxing, Shaoxing 312300, China
- Yutong Lu
  - Zhejiang Institute of Tianjin University, Shaoxing, Shaoxing 312300, China
- Yanni Li
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
  - Key Laboratory of Systems Bioengineering, Ministry of Education (Tianjin University), Tianjin 300072, China
- Jianjun Qiao
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
  - Zhejiang Institute of Tianjin University, Shaoxing, Shaoxing 312300, China
  - Key Laboratory of Systems Bioengineering, Ministry of Education (Tianjin University), Tianjin 300072, China
  - Frontiers Science Center for Synthetic Biology (Ministry of Education), Tianjin University, Tianjin 300072, China
4
Rodionova IA, Hosseinnia A, Kim S, Goodacre N, Zhang L, Zhang Z, Palsson B, Uetz P, Babu M, Saier MH. E. coli allantoinase is activated by the downstream metabolic enzyme, glycerate kinase, and stabilizes the putative allantoin transporter by direct binding. Sci Rep 2023; 13:7345. [PMID: 37147430 PMCID: PMC10163214 DOI: 10.1038/s41598-023-31812-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2022] [Accepted: 03/17/2023] [Indexed: 05/07/2023]
Abstract
Allantoin is a good source of ammonium for many organisms, and in Escherichia coli it is utilized under anaerobic conditions. We provide evidence that allantoinase (AllB) is allosterically activated by direct binding of the allantoin catabolic enzyme, glycerate 2-kinase (GlxK), in the presence of glyoxylate. Glyoxylate is known to be an effector of the AllR repressor, which regulates the allantoin utilization operons in E. coli. AllB has low affinity for allantoin, but its activation by GlxK leads to increased affinity for its substrate. We also show that the predicted allantoin transporter YbbW (renamed AllW) is specific for allantoin and interacts directly with AllB. Our results show that the AllB-dependent allantoin degradative pathway is subject to previously unrecognized regulatory mechanisms involving direct protein-protein interactions.
Affiliation(s)
- Irina A Rodionova
  - Department of Molecular Biology, Division of Biological Sciences, University of California at San Diego, La Jolla, CA, 92093, USA
  - Department of Bioengineering, Division of Engineering, University of California at San Diego, La Jolla, CA, 92093-0116, USA
- Ali Hosseinnia
  - Department of Biochemistry, University of Regina, Regina, SK, S4S 0A2, Canada
- Sunyoung Kim
  - Department of Biochemistry, University of Regina, Regina, SK, S4S 0A2, Canada
- Norman Goodacre
  - Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, VA, 23284, USA
- Li Zhang
  - Department of Molecular Biology, Division of Biological Sciences, University of California at San Diego, La Jolla, CA, 92093, USA
  - College of Food Science and Engineering, Ocean University of China, Yushan Road, Shinan District, Qingdao, 266003, China
- Zhongge Zhang
  - Department of Molecular Biology, Division of Biological Sciences, University of California at San Diego, La Jolla, CA, 92093, USA
- Bernhard Palsson
  - Department of Bioengineering, Division of Engineering, University of California at San Diego, La Jolla, CA, 92093-0116, USA
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, 92093, USA
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800, Lyngby, Denmark
- Peter Uetz
  - Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, VA, 23284, USA
- Mohan Babu
  - Department of Biochemistry, University of Regina, Regina, SK, S4S 0A2, Canada
- Milton H Saier
  - Department of Molecular Biology, Division of Biological Sciences, University of California at San Diego, La Jolla, CA, 92093, USA
5
Huynh TN, Stewart V. Purine catabolism by enterobacteria. Adv Microb Physiol 2023; 82:205-266. [PMID: 36948655 DOI: 10.1016/bs.ampbs.2023.01.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/13/2023]
Abstract
Purines are abundant among organic nitrogen sources and have high nitrogen content. Accordingly, microorganisms have evolved different pathways to catabolize purines and their metabolic products such as allantoin. Enterobacteria from the genera Escherichia, Klebsiella and Salmonella have three such pathways. First, the HPX pathway, found in the genus Klebsiella and very close relatives, catabolizes purines during aerobic growth, extracting all four nitrogen atoms in the process. This pathway includes several known or predicted enzymes not previously observed in other purine catabolic pathways. Second, the ALL pathway, found in strains from all three species, catabolizes allantoin during anaerobic growth in a branched pathway that also includes glyoxylate assimilation. This allantoin fermentation pathway originally was characterized in a gram-positive bacterium, and therefore is widespread. Third, the XDH pathway, found in strains from Escherichia and Klebsiella spp., at present is ill-defined but likely includes enzymes to catabolize purines during anaerobic growth. Critically, this pathway may include an enzyme system for anaerobic urate catabolism, a phenomenon not previously described. Documenting such a pathway would overturn the long-held assumption that urate catabolism requires oxygen. Overall, this broad capability for purine catabolism during either aerobic or anaerobic growth suggests that purines and their metabolites contribute to enterobacterial fitness in a variety of environments.
Affiliation(s)
- TuAnh Ngoc Huynh
  - Department of Food Science, University of Wisconsin, Madison, WI, United States
- Valley Stewart
  - Department of Microbiology & Molecular Genetics, University of California, Davis, CA, United States
6
Kim NY, Kim OB. The ybcF Gene of Escherichia coli Encodes a Local Orphan Enzyme, Catabolic Carbamate Kinase. J Microbiol Biotechnol 2022; 32:1527-1536. [PMID: 36384810 PMCID: PMC9843812 DOI: 10.4014/jmb.2210.10037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2022] [Revised: 10/31/2022] [Accepted: 11/01/2022] [Indexed: 11/18/2022]
Abstract
Escherichia coli can use allantoin as its sole nitrogen source under anaerobic conditions. The ureidoglycolate produced by double release of ammonia from allantoin can flow into either the glyoxylate shunt or further catabolic transcarbamoylation. Although the former pathway is well studied, the genes of the latter (catabolic) pathway are not known. In the catabolic pathway, ureidoglycolate is finally converted to carbamoyl phosphate (CP) and oxamate, and then CP is dephosphorylated to carbamate by a catabolic carbamate kinase (CK), whereby ATP is formed. We identified the ybcF gene in a gene cluster containing fdrA-ylbE-ylbF-ybcF that is located downstream of the allDCE-operon. Reverse transcription PCR of total mRNA confirmed that the genes fdrA, ylbE, ylbF, and ybcF are co-transcribed. Deletion of ybcF caused only a slight increase in metabolic flow into the glyoxylate pathway, probably because CP was used to de novo synthesize pyrimidine and arginine. The activity of the catabolic CK was analyzed using purified YbcF protein. The Vmax is 1.82 U/mg YbcF for CP and 1.94 U/mg YbcF for ADP, and the KM value is 0.47 mM for CP and 0.43 mM for ADP. With these results, it was experimentally revealed that the ybcF gene of E. coli encodes catabolic CK, which completes anaerobic allantoin degradation through substrate-level phosphorylation. Therefore, we suggest renaming the ybcF gene as allK.
Affiliation(s)
- Nam Yeun Kim
  - Department of Life Science, Division of EcoScience, Ewha Womans University, Seoul 03760, Republic of Korea
- Ok Bin Kim
  - Department of Life Science, Division of EcoScience, Ewha Womans University, Seoul 03760, Republic of Korea
7
An Escherichia coli FdrA Variant Derived from Syntrophic Coculture with a Methanogen Increases Succinate Production Due to Changes in Allantoin Degradation. mSphere 2021; 6:e0065421. [PMID: 34494882 PMCID: PMC8550087 DOI: 10.1128/msphere.00654-21] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/20/2022]
Abstract
Wild-type Escherichia coli was adapted to syntrophic growth with Methanobacterium formicicum for glycerol fermentation over 44 weeks. Succinate production by E. coli started to increase in the early stages of syntrophic growth. Genetic analysis of the cultured E. coli population by pooled sequencing at eight time points suggests that (i) rapid evolution occurred through repeated emergence of mutators that introduced a large number of nucleotide variants and (ii) many mutators increased to high frequencies but remained polymorphic throughout the continuous cultivation. The evolved E. coli populations exhibited gains both in fitness and succinate production, but only for growth under glycerol fermentation with M. formicicum (the condition for this laboratory evolution) and not under other growth conditions. The mutant alleles of the 69 single nucleotide polymorphisms (SNPs) identified in the adapted E. coli populations were constructed individually in the ancestral wild-type E. coli. We analyzed the phenotypic changes caused by 84 variants, including 15 nonsense variants, and found that FdrA(D296Y) was the most significant variant leading to increased succinate production. Transcription of fdrA was induced under anaerobic allantoin degradation conditions, and FdrA was shown to play a crucial role in oxamate production. The FdrA(D296Y) variant increased glyoxylate conversion to malate by accelerating oxamate production, which promotes carbon flow through the C4 branch, leading to increased succinate production.
Importance
Here, we demonstrate the ability of E. coli to perform glycerol fermentation in coculture with the methanogen M. formicicum to produce succinate. We found that the production of succinate by E. coli significantly increased during successive cocultivation. Genomic DNA sequencing, evaluation of relative fitness, and construction of SNPs were performed, from which FdrA(D296Y) was identified as the most significant variant to enable increased succinate production by E. coli. The function of FdrA is uncertain. In this study, experiments with gene expression assays and metabolic analysis showed for the first time that FdrA could be the “orphan enzyme” oxamate:carbamoyltransferase in anaerobic allantoin degradation. Furthermore, we demonstrate that the anaerobic allantoin degradation pathway is linked to succinate production via the glyoxylate pathway during glycerol fermentation.
8
Role of Bioinformatics in Biological Sciences. Adv Bioinformatics 2021. [DOI: 10.1007/978-981-33-6191-1_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
9
Otero-Muras I, Carbonell P. Automated engineering of synthetic metabolic pathways for efficient biomanufacturing. Metab Eng 2020; 63:61-80. [PMID: 33316374 DOI: 10.1016/j.ymben.2020.11.012] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 08/22/2020] [Revised: 11/15/2020] [Accepted: 11/20/2020] [Indexed: 12/19/2022]
Abstract
Metabolic engineering involves the engineering and optimization of processes from single-cell to fermentation in order to increase production of valuable chemicals for health, food, energy, materials and others. A systems approach to metabolic engineering has gained traction in recent years thanks to advances in strain engineering, leading to an accelerated scaling from rapid prototyping to industrial production. Metabolic engineering is now on track to become a true manufacturing technology, with reduced times from conception to production enabled by automated protocols for DNA assembly of metabolic pathways in engineered producer strains. In this review, we discuss how the success of the metabolic engineering pipeline often relies on retrobiosynthetic protocols able to identify promising production routes and dynamic regulation strategies through automated biodesign algorithms, which are subsequently assembled as embedded integrated genetic circuits in the host strain. Those approaches are orchestrated by an experimental design strategy that provides optimal scheduling of DNA assembly and rapid prototyping and, ultimately, brings forward an accelerated Design-Build-Test-Learn cycle and the overall optimization of the biomanufacturing process. Achieving such a vision will address the increasingly compelling demand in our society for delivering valuable biomolecules in an affordable, inclusive and sustainable bioeconomy.
Affiliation(s)
- Irene Otero-Muras
  - BioProcess Engineering Group, IIM-CSIC, Spanish National Research Council, Vigo, 36208, Spain
- Pablo Carbonell
  - Institute of Industrial Control Systems and Computing (ai2), Universitat Politècnica de València, 46022, Spain
10
Enzyme annotation for orphan and novel reactions using knowledge of substrate reactive sites. Proc Natl Acad Sci U S A 2019; 116:7298-7307. [PMID: 30910961 PMCID: PMC6462048 DOI: 10.1073/pnas.1818877116] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Indexed: 11/18/2022]
Abstract
Recent advances in synthetic biochemistry have resulted in a wealth of novel hypothetical enzymatic reactions that are not matched to protein-encoding genes, deeming them “orphan.” A large number of known metabolic enzymes are also orphan, leaving important gaps in metabolic network maps. Proposing genes for the catalysis of orphan reactions is critical for applications ranging from biotechnology to medicine. In this work, the computational method BridgIT identified potential enzymes of orphan reactions and nearly all theoretically possible biochemical transformations, providing candidate genes to catalyze these reactions to the research community. The BridgIT online tool will allow researchers to fill the knowledge gaps in metabolic networks and will act as a starting point for designing novel enzymes to catalyze nonnatural transformations.
Thousands of biochemical reactions with characterized activities are “orphan,” meaning they cannot be assigned to a specific enzyme, leaving gaps in metabolic pathways. Novel reactions predicted by pathway-generation tools also lack associated sequences, limiting protein engineering applications. Associating orphan and novel reactions with known biochemistry and suggesting enzymes to catalyze them is a daunting problem. We propose the method BridgIT to identify candidate genes and catalyzing proteins for these reactions. This method introduces information about the enzyme binding pocket into reaction-similarity comparisons. BridgIT assesses the similarity of two reactions, one orphan and one well-characterized nonorphan reaction, using their substrate reactive sites, their surrounding structures, and the structures of the generated products to suggest enzymes that catalyze the most-similar nonorphan reactions as candidates for also catalyzing the orphan ones. We performed two large-scale validation studies to test BridgIT predictions against experimental biochemical evidence. For the 234 orphan reactions from the Kyoto Encyclopedia of Genes and Genomes (KEGG) 2011 (a comprehensive enzymatic-reaction database) that became nonorphan in KEGG 2018, BridgIT predicted the exact or a highly related enzyme for 211 of them. Moreover, for 334 of 379 novel reactions in 2014 that were later cataloged in KEGG 2018, BridgIT predicted the exact or highly similar enzymes. BridgIT requires knowledge about only four connecting bonds around the atoms of the reactive sites to correctly annotate proteins for 93% of analyzed enzymatic reactions. Increasing to seven connecting bonds allowed for the accurate identification of a sequence for nearly all known enzymatic reactions.
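The two validation counts quoted in this abstract can be turned into recovery rates with a short calculation. This is only a worked arithmetic sketch using the numbers stated above; the variable and function names are hypothetical and it does not reproduce any part of the BridgIT method itself.

```python
# Validation counts quoted in the abstract: study -> (correctly predicted, total)
VALIDATION_COUNTS = {
    "KEGG 2011 orphan reactions resolved by KEGG 2018": (211, 234),
    "2014 novel reactions cataloged in KEGG 2018": (334, 379),
}


def recovery_rate(hits: int, total: int) -> float:
    """Return the percentage of reactions with an exact or highly related enzyme predicted."""
    if total <= 0 or not 0 <= hits <= total:
        raise ValueError("need 0 <= hits <= total and total > 0")
    return 100.0 * hits / total


rates = {name: recovery_rate(h, t) for name, (h, t) in VALIDATION_COUNTS.items()}

for name, pct in rates.items():
    print(f"{name}: {pct:.1f}%")
```

Both studies therefore correspond to recovery rates of roughly 90% and 88%, respectively.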
11
Danchin A, Ouzounis C, Tokuyasu T, Zucker JD. No wisdom in the crowd: genome annotation in the era of big data - current status and future prospects. Microb Biotechnol 2018; 11:588-605. [PMID: 29806194 PMCID: PMC6011933 DOI: 10.1111/1751-7915.13284] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Indexed: 12/15/2022]
Abstract
Science and engineering rely on the accumulation and dissemination of knowledge to make discoveries and create new designs. Discovery-driven genome research rests on knowledge passed on via gene annotations. In response to the deluge of sequencing big data, standard annotation practice employs automated procedures that rely on majority rules. We argue this hinders progress through the generation and propagation of errors, leading investigators into blind alleys. More subtly, this inductive process discourages the discovery of novelty, which remains essential in biological research and reflects the nature of biology itself. Annotation systems, rather than being repositories of facts, should be tools that support multiple modes of inference. By combining deduction, induction and abduction, investigators can generate hypotheses when accurate knowledge is extracted from model databases. A key stance is to depart from 'the sequence tells the structure tells the function' fallacy, placing function first. We illustrate our approach with examples of critical or unexpected pathways, using MicroScope to demonstrate how tools can be implemented following the principles we advocate. We end with a challenge to the reader.
Affiliation(s)
- Antoine Danchin
  - Integromics, Institute of Cardiometabolism and Nutrition, Hôpital de la Pitié-Salpêtrière, 47 Boulevard de l'Hôpital, 75013, Paris, France
  - School of Biomedical Sciences, Li KaShing Faculty of Medicine, Hong Kong University, 21 Sassoon Road, Pokfulam, Hong Kong
- Christos Ouzounis
  - Biological Computation and Process Laboratory, Centre for Research and Technology Hellas, Chemical Process and Energy Resources Institute, Thessalonica, 57001, Greece
- Taku Tokuyasu
  - Shenzhen Institutes of Advanced Technology, Institute of Synthetic Biology, Shenzhen University Town, 1068 Xueyuan Avenue, Shenzhen, China
- Jean-Daniel Zucker
  - Integromics, Institute of Cardiometabolism and Nutrition, Hôpital de la Pitié-Salpêtrière, 47 Boulevard de l'Hôpital, 75013, Paris, France
12
Campos M, Govers SK, Irnov I, Dobihal GS, Cornet F, Jacobs-Wagner C. Genomewide phenotypic analysis of growth, cell morphogenesis, and cell cycle events in Escherichia coli. Mol Syst Biol 2018; 14:e7573. [PMID: 29941428 PMCID: PMC6018989 DOI: 10.15252/msb.20177573] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3] [Indexed: 12/26/2022]
Abstract
Cell size, cell growth, and cell cycle events are necessarily intertwined to achieve robust bacterial replication. Yet, a comprehensive and integrated view of these fundamental processes is lacking. Here, we describe an image‐based quantitative screen of the single‐gene knockout collection of Escherichia coli and identify many new genes involved in cell morphogenesis, population growth, nucleoid (bulk chromosome) dynamics, and cell division. Functional analyses, together with high‐dimensional classification, unveil new associations of morphological and cell cycle phenotypes with specific functions and pathways. Additionally, correlation analysis across ~4,000 genetic perturbations shows that growth rate is surprisingly not predictive of cell size. Growth rate was also uncorrelated with the relative timings of nucleoid separation and cell constriction. Rather, our analysis identifies scaling relationships between cell size and nucleoid size and between nucleoid size and the relative timings of nucleoid separation and cell division. These connections suggest that the nucleoid links cell morphogenesis to the cell cycle.
Affiliation(s)
- Manuel Campos
  - Microbial Sciences Institute, Yale University, West Haven, CT, USA
  - Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT, USA
  - Howard Hughes Medical Institute, Yale University, New Haven, CT, USA
  - Laboratoire de Microbiologie et Génétique Moléculaires (LMGM; UMR5100), Centre de Biologie Intégrative (CBI), Centre National de la Recherche Scientifique (CNRS), Université de Toulouse, UPS, Toulouse, France
- Sander K Govers
  - Microbial Sciences Institute, Yale University, West Haven, CT, USA
  - Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT, USA
- Irnov Irnov
  - Microbial Sciences Institute, Yale University, West Haven, CT, USA
  - Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT, USA
- Genevieve S Dobihal
  - Microbial Sciences Institute, Yale University, West Haven, CT, USA
  - Howard Hughes Medical Institute, Yale University, New Haven, CT, USA
- François Cornet
  - Laboratoire de Microbiologie et Génétique Moléculaires (LMGM; UMR5100), Centre de Biologie Intégrative (CBI), Centre National de la Recherche Scientifique (CNRS), Université de Toulouse, UPS, Toulouse, France
- Christine Jacobs-Wagner
  - Microbial Sciences Institute, Yale University, West Haven, CT, USA
  - Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT, USA
  - Howard Hughes Medical Institute, Yale University, New Haven, CT, USA
  - Department of Microbial Pathogenesis, Yale School of Medicine, New Haven, CT, USA
13
Abstract
Directed evolution (DE) is a powerful tool for optimizing an enzyme's properties toward a particular objective, such as broader substrate scope, greater thermostability, or increased kcat. A successful DE project requires the generation of genetic diversity and subsequent screening or selection to identify variants with improved fitness. In contrast to random methods (error-prone PCR or DNA shuffling), site-directed mutagenesis enables the rational design of variant libraries and provides control over the nature and frequency of the encoded mutations. Knowledge of protein structure, dynamics, enzyme mechanisms, and natural evolution demonstrates that multiple (combinatorial) mutations are required to discover the most improved variants. To this end, we describe an experimentally straightforward and low-cost method for the preparation of combinatorial variant libraries. Our approach employs a two-step PCR protocol, first producing mutagenic megaprimers, which can then be combined in a "mix-and-match" fashion to generate diverse sets of combinatorial variant libraries both quickly and accurately.
14
Calhoun S, Korczynska M, Wichelecki DJ, San Francisco B, Zhao S, Rodionov DA, Vetting MW, Al-Obaidi NF, Lin H, O'Meara MJ, Scott DA, Morris JH, Russel D, Almo SC, Osterman AL, Gerlt JA, Jacobson MP, Shoichet BK, Sali A. Prediction of enzymatic pathways by integrative pathway mapping. eLife 2018; 7:31097. [PMID: 29377793 PMCID: PMC5788505 DOI: 10.7554/elife.31097] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 08/08/2017] [Accepted: 12/18/2017] [Indexed: 01/17/2023]
Abstract
The functions of most proteins are yet to be determined. The function of an enzyme is often defined by its interacting partners, including its substrate and product, and its role in larger metabolic networks. Here, we describe a computational method that predicts the functions of orphan enzymes by organizing them into a linear metabolic pathway. Given candidate enzyme and metabolite pathway members, this aim is achieved by finding those pathways that satisfy structural and network restraints implied by varied input information, including that from virtual screening, chemoinformatics, genomic context analysis, and ligand-binding experiments. We demonstrate this integrative pathway mapping method by predicting the L-gulonate catabolic pathway in Haemophilus influenzae Rd KW20. The prediction was subsequently validated experimentally by enzymology, crystallography, and metabolomics. Integrative pathway mapping by satisfaction of structural and network restraints is extensible to molecular networks in general and thus formally bridges the gap between structural biology and systems biology.
Affiliation(s)
- Sara Calhoun
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, United States
| | - Magdalena Korczynska
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, United States
| | - Daniel J Wichelecki
- Institute for Genomic Biology, University of Illinois, Urbana, United States.,Department of Biochemistry, University of Illinois, Urbana, United States.,Department of Chemistry, University of Illinois, Urbana, United States
| | - Brian San Francisco
- Institute for Genomic Biology, University of Illinois, Urbana, United States
| | - Suwen Zhao
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, United States
| | - Dmitry A Rodionov
- Sanford Burnham Prebys Medical Discovery Institute, La Jolla, United States.,A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
| | - Matthew W Vetting
- Department of Biochemistry, Albert Einstein College of Medicine, New York, United States
| | - Nawar F Al-Obaidi
- Department of Biochemistry, Albert Einstein College of Medicine, New York, United States
| | - Henry Lin
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, United States
| | - Matthew J O'Meara
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, United States
| | - David A Scott
- Sanford Burnham Prebys Medical Discovery Institute, La Jolla, United States
| | - John H Morris
- Resource for Biocomputing, Visualization and Informatics, Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, United States
| | - Daniel Russel
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, United States
| | - Steven C Almo
- Department of Biochemistry, Albert Einstein College of Medicine, New York, United States
| | - Andrei L Osterman
- Sanford Burnham Prebys Medical Discovery Institute, La Jolla, United States
| | - John A Gerlt
- Institute for Genomic Biology, University of Illinois, Urbana, United States.,Department of Biochemistry, University of Illinois, Urbana, United States.,Department of Chemistry, University of Illinois, Urbana, United States
| | - Matthew P Jacobson
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, United States
| | - Brian K Shoichet
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, United States
| | - Andrej Sali
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, United States.,Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, United States.,California Institute for Quantitative Biosciences, University of California, San Francisco, San Francisco, United States
| |
|
15
|
Carbonell P, Currin A, Jervis AJ, Rattray NJW, Swainston N, Yan C, Takano E, Breitling R. Bioinformatics for the synthetic biology of natural products: integrating across the Design-Build-Test cycle. Nat Prod Rep 2016; 33:925-32. [PMID: 27185383 PMCID: PMC5063057 DOI: 10.1039/c6np00018e] [Citation(s) in RCA: 49] [Impact Index Per Article: 6.1] [Received: 02/05/2016] [Indexed: 12/11/2022]
Abstract
Covering: 2000 to 2016. Progress in synthetic biology is enabled by powerful bioinformatics tools allowing the integration of the design, build and test stages of the biological engineering cycle. In this review we illustrate how this integration can be achieved, with a particular focus on natural products discovery and production. Bioinformatics tools for the DESIGN and BUILD stages include tools for the selection, synthesis, assembly and optimization of parts (enzymes and regulatory elements), devices (pathways) and systems (chassis). TEST tools include those for screening, identification and quantification of metabolites for rapid prototyping. The main advantages and limitations of these tools as well as their interoperability capabilities are highlighted.
Affiliation(s)
- Pablo Carbonell
- Manchester Centre for Fine and Specialty Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, University of Manchester, Manchester M1 7DN, UK
| | - Andrew Currin
- Manchester Centre for Fine and Specialty Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, University of Manchester, Manchester M1 7DN, UK
| | - Adrian J. Jervis
- Manchester Centre for Fine and Specialty Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, University of Manchester, Manchester M1 7DN, UK
| | - Nicholas J. W. Rattray
- Manchester Centre for Fine and Specialty Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, University of Manchester, Manchester M1 7DN, UK
| | - Neil Swainston
- Manchester Centre for Fine and Specialty Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, University of Manchester, Manchester M1 7DN, UK
| | - Cunyu Yan
- Manchester Centre for Fine and Specialty Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, University of Manchester, Manchester M1 7DN, UK
| | - Eriko Takano
- Manchester Centre for Fine and Specialty Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, University of Manchester, Manchester M1 7DN, UK
| | - Rainer Breitling
- Manchester Centre for Fine and Specialty Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, University of Manchester, Manchester M1 7DN, UK
| |
|
16
|
Missing gene identification using functional coherence scores. Sci Rep 2016; 6:31725. [PMID: 27552989 PMCID: PMC4995438 DOI: 10.1038/srep31725] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/21/2016] [Accepted: 07/22/2016] [Indexed: 11/18/2022] Open
Abstract
Reconstructing metabolic and signaling pathways is an effective way of interpreting a genome sequence. A challenge in pathway reconstruction is that genes in a pathway often cannot be easily found, reflecting currently imperfect information about the target organism. In this work, we developed a new method for finding missing genes, which integrates multiple features, including gene expression, phylogenetic profile, and function association scores. In particular, to capture the functional association between candidate genes and the proteins neighboring the target missing gene in the network, we used the Co-occurrence Association Score (CAS) and the PubMed Association Score (PAS), which are designed to capture the functional coherence of proteins. We showed that adding CAS and PAS substantially improves the accuracy of identifying missing genes in the yeast enzyme-enzyme network compared with using only the conventional features, gene expression and phylogenetic profile. Finally, we demonstrated that accuracy improves when indirect neighbors of the target enzyme position in the network are considered using a proper network-topology-based weighting scheme.
|
17
|
Belda E, van Heck RGA, José Lopez-Sanchez M, Cruveiller S, Barbe V, Fraser C, Klenk HP, Petersen J, Morgat A, Nikel PI, Vallenet D, Rouy Z, Sekowska A, Martins dos Santos VAP, de Lorenzo V, Danchin A, Médigue C. The revisited genome of Pseudomonas putida KT2440 enlightens its value as a robust metabolic chassis. Environ Microbiol 2016; 18:3403-3424. [DOI: 10.1111/1462-2920.13230] [Citation(s) in RCA: 217] [Impact Index Per Article: 27.1] [Received: 10/05/2015] [Accepted: 01/16/2016] [Indexed: 01/08/2023]
Affiliation(s)
- Eugeni Belda
- Alternative Energies and Atomic Energy Commission (CEA), Genomic Institute & CNRS-UMR8030 & Evry University, Laboratory of Bioinformatics Analysis in Genomics and Metabolism; 2 rue Gaston Crémieux 91057 Evry France
- Institut Pasteur, Unit of Insect Vector Genetics and Genomics, Department of Parasitology and Mycology; 28, rue du Dr. Roux, Paris, Cedex 15 75724 France
| | - Ruben G. A. van Heck
- Laboratory of Systems and Synthetic Biology, Wageningen University; Dreijenplein 10, Building number 316 6703 HB Wageningen The Netherlands
| | - Maria José Lopez-Sanchez
- Alternative Energies and Atomic Energy Commission (CEA), Genomic Institute & CNRS-UMR8030 & Evry University, Laboratory of Bioinformatics Analysis in Genomics and Metabolism; 2 rue Gaston Crémieux 91057 Evry France
- AMAbiotics SAS, Institut du Cerveau et de la Moëlle Épinière, Hôpital de la Pitié-Salpêtrière; Paris France
| | - Stéphane Cruveiller
- Alternative Energies and Atomic Energy Commission (CEA), Genomic Institute & CNRS-UMR8030 & Evry University, Laboratory of Bioinformatics Analysis in Genomics and Metabolism; 2 rue Gaston Crémieux 91057 Evry France
| | - Valérie Barbe
- Alternative Energies and Atomic Energy Commission (CEA), Genomic Institute, National Sequencing Center; 2 rue Gaston Crémieux 91057 Evry France
| | - Claire Fraser
- Institute for Genome Sciences, Department of Microbiology and Immunology, University of Maryland School of Medicine; Baltimore MD USA
| | - Hans-Peter Klenk
- Leibniz Institute DSMZ - German Collection of Microorganisms and Cell Cultures; Braunschweig Germany
- School of Biology, Newcastle University; Newcastle upon Tyne NE1 7RU UK
| | - Jörn Petersen
- Leibniz Institute DSMZ - German Collection of Microorganisms and Cell Cultures; Braunschweig Germany
| | - Anne Morgat
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics; Geneva CH-1206 Switzerland
| | - Pablo I. Nikel
- Systems and Synthetic Biology Program, Centro Nacional de Biotecnología (CNB-CSIC); C/Darwin 3 28049 Madrid Spain
| | - David Vallenet
- Alternative Energies and Atomic Energy Commission (CEA), Genomic Institute & CNRS-UMR8030 & Evry University, Laboratory of Bioinformatics Analysis in Genomics and Metabolism; 2 rue Gaston Crémieux 91057 Evry France
| | - Zoé Rouy
- Alternative Energies and Atomic Energy Commission (CEA), Genomic Institute & CNRS-UMR8030 & Evry University, Laboratory of Bioinformatics Analysis in Genomics and Metabolism; 2 rue Gaston Crémieux 91057 Evry France
| | - Agnieszka Sekowska
- AMAbiotics SAS, Institut du Cerveau et de la Moëlle Épinière, Hôpital de la Pitié-Salpêtrière; Paris France
| | - Vitor A. P. Martins dos Santos
- Laboratory of Systems and Synthetic Biology, Wageningen University; Dreijenplein 10, Building number 316 6703 HB Wageningen The Netherlands
| | - Víctor de Lorenzo
- Systems and Synthetic Biology Program, Centro Nacional de Biotecnología (CNB-CSIC); C/Darwin 3 28049 Madrid Spain
| | - Antoine Danchin
- AMAbiotics SAS, Institut du Cerveau et de la Moëlle Épinière, Hôpital de la Pitié-Salpêtrière; Paris France
| | - Claudine Médigue
- Alternative Energies and Atomic Energy Commission (CEA), Genomic Institute & CNRS-UMR8030 & Evry University, Laboratory of Bioinformatics Analysis in Genomics and Metabolism; 2 rue Gaston Crémieux 91057 Evry France
| |
|
18
|
Sorokina M, Medigue C, Vallenet D. A new network representation of the metabolism to detect chemical transformation modules. BMC Bioinformatics 2015; 16:385. [PMID: 26573681 PMCID: PMC4647279 DOI: 10.1186/s12859-015-0809-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/01/2015] [Accepted: 10/29/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Metabolism is generally modeled by directed networks where nodes represent reactions and/or metabolites. To explore metabolic pathway conservation and divergence among organisms, previous studies relied on graph alignment to find similar pathways. A few years ago, the concept of chemical transformation modules, also called reaction modules, was introduced; these modules correspond to sequences of chemical transformations that are conserved in metabolism. We propose here a novel graph representation of the metabolic network where reactions sharing the same chemical transformation type are grouped in Reaction Molecular Signatures (RMS). RESULTS: RMS were automatically computed for all reactions and encode changes in atoms and bonds. A reaction network containing all available metabolic knowledge was then reduced by an aggregation of reaction nodes and edges to obtain an RMS network. Paths in this network were explored and a substantial number of conserved chemical transformation modules were detected. Furthermore, this graph-based formalism allows us to define several path scores reflecting different biological conservation meanings. These scores are significantly higher for paths corresponding to known metabolic pathways and were used jointly to build association rules that should predict metabolic pathway types such as biosynthesis or degradation. CONCLUSIONS: This representation of metabolism in an RMS network offers new insights to capture relevant metabolic contexts. Furthermore, along with genomic context methods, it should improve the detection of gene clusters corresponding to new metabolic pathways.
Affiliation(s)
- Maria Sorokina
- Direction des Sciences du Vivant, Commissariat à l'Energie Atomique et aux Energies Alternatives (CEA), Institut de Génomique, Genoscope, Laboratoire d'Analyses Bioinformatiques pour la Génomique et le Métabolisme, 2 rue Gaston Crémieux, Evry, 91057, France.
- CNRS-UMR8030, 2 rue Gaston Crémieux, Evry, 91057, France.
- UEVE, Université d'Evry Val d'Essonne, Boulevard François Mitterrand, Evry, 91057, France.
| | - Claudine Medigue
- Direction des Sciences du Vivant, Commissariat à l'Energie Atomique et aux Energies Alternatives (CEA), Institut de Génomique, Genoscope, Laboratoire d'Analyses Bioinformatiques pour la Génomique et le Métabolisme, 2 rue Gaston Crémieux, Evry, 91057, France.
- CNRS-UMR8030, 2 rue Gaston Crémieux, Evry, 91057, France.
- UEVE, Université d'Evry Val d'Essonne, Boulevard François Mitterrand, Evry, 91057, France.
| | - David Vallenet
- Direction des Sciences du Vivant, Commissariat à l'Energie Atomique et aux Energies Alternatives (CEA), Institut de Génomique, Genoscope, Laboratoire d'Analyses Bioinformatiques pour la Génomique et le Métabolisme, 2 rue Gaston Crémieux, Evry, 91057, France.
- CNRS-UMR8030, 2 rue Gaston Crémieux, Evry, 91057, France.
- UEVE, Université d'Evry Val d'Essonne, Boulevard François Mitterrand, Evry, 91057, France.
| |
|
19
|
Oulas A, Pavloudi C, Polymenakou P, Pavlopoulos GA, Papanikolaou N, Kotoulas G, Arvanitidis C, Iliopoulos I. Metagenomics: tools and insights for analyzing next-generation sequencing data derived from biodiversity studies. Bioinform Biol Insights 2015; 9:75-88. [PMID: 25983555 PMCID: PMC4426941 DOI: 10.4137/bbi.s12462] [Citation(s) in RCA: 177] [Impact Index Per Article: 19.7] [Received: 12/05/2014] [Revised: 03/09/2015] [Accepted: 03/13/2015] [Indexed: 12/14/2022] Open
Abstract
Advances in next-generation sequencing (NGS) have allowed significant breakthroughs in microbial ecology studies. This has led to the rapid expansion of research in the field and the establishment of "metagenomics", often defined as the analysis of DNA from microbial communities in environmental samples without prior need for culturing. Many metagenomics statistical/computational tools and databases have been developed in order to allow the exploitation of the huge influx of data. In this review article, we provide an overview of the sequencing technologies and how they are uniquely suited to various types of metagenomic studies. We focus on the currently available bioinformatics techniques, tools, and methodologies for performing each individual step of a typical metagenomic dataset analysis. We also provide future trends in the field with respect to tools and technologies currently under development. Moreover, we discuss data management, distribution, and integration tools that are capable of performing comparative metagenomic analyses of multiple datasets using well-established databases, as well as commonly used annotation standards.
Affiliation(s)
- Anastasis Oulas
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
| | - Christina Pavloudi
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
- Department of Biology, University of Ghent, Ghent, Belgium
- Department of Microbial Ecophysiology, University of Bremen, Bremen, Germany
| | - Paraskevi Polymenakou
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
| | - Georgios A Pavlopoulos
- Division of Basic Sciences, University of Crete, Medical School, Heraklion, Crete, Greece
| | - Nikolas Papanikolaou
- Division of Basic Sciences, University of Crete, Medical School, Heraklion, Crete, Greece
| | - Georgios Kotoulas
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
| | - Christos Arvanitidis
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
| | - Ioannis Iliopoulos
- Division of Basic Sciences, University of Crete, Medical School, Heraklion, Crete, Greece
| |
|
20
|
Jacobson MP, Kalyanaraman C, Zhao S, Tian B. Leveraging structure for enzyme function prediction: methods, opportunities, and challenges. Trends Biochem Sci 2014; 39:363-71. [PMID: 24998033 DOI: 10.1016/j.tibs.2014.05.006] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 03/31/2014] [Revised: 05/26/2014] [Accepted: 05/29/2014] [Indexed: 02/06/2023]
Abstract
The rapid growth of the number of protein sequences that can be inferred from sequenced genomes presents challenges for function assignment, because only a small fraction (currently <1%) has been experimentally characterized. Bioinformatics tools are commonly used to predict functions of uncharacterized proteins. Recently, there has been significant progress in using protein structures as an additional source of information to infer aspects of enzyme function, which is the focus of this review. Successful application of these approaches has led to the identification of novel metabolites, enzyme activities, and biochemical pathways. We discuss opportunities to elucidate systematically protein domains of unknown function, orphan enzyme activities, dead-end metabolites, and pathways in secondary metabolism.
Affiliation(s)
- Matthew P Jacobson
- Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94158, USA; California Institute for Quantitative Biomedical Research, University of California, San Francisco, CA 94158, USA.
| | - Chakrapani Kalyanaraman
- Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94158, USA; California Institute for Quantitative Biomedical Research, University of California, San Francisco, CA 94158, USA
| | - Suwen Zhao
- Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94158, USA; California Institute for Quantitative Biomedical Research, University of California, San Francisco, CA 94158, USA
| | - Boxue Tian
- Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94158, USA; California Institute for Quantitative Biomedical Research, University of California, San Francisco, CA 94158, USA
| |
|
21
|
El Yacoubi B, de Crécy-Lagard V. Integrative data-mining tools to link gene and function. Methods Mol Biol 2014; 1101:43-66. [PMID: 24233777 DOI: 10.1007/978-1-62703-721-1_4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 02/10/2023]
Abstract
Information derived from genomic and post-genomic data can be efficiently used to link gene and function. Several web-based platforms have been developed to mine these types of data by integrating different tools. This method paper is designed to allow the user to navigate these platforms in order to make functional predictions. The main focus is on phylogenetic distribution and physical clustering tools, but other tools such as pathway reconstruction, gene fusions, and analysis of high-throughput experimental data are also surveyed.
Affiliation(s)
- Basma El Yacoubi
- Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, USA
| | | |
|
22
|
de Crécy-Lagard V. Variations in metabolic pathways create challenges for automated metabolic reconstructions: Examples from the tetrahydrofolate synthesis pathway. Comput Struct Biotechnol J 2014; 10:41-50. [PMID: 25210598 PMCID: PMC4151868 DOI: 10.1016/j.csbj.2014.05.008] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Indexed: 11/03/2022] Open
Abstract
The availability of thousands of sequenced genomes has revealed the diversity of biochemical solutions to similar chemical problems. Even for molecules at the heart of metabolism, such as cofactors, the pathway enzymes first discovered in model organisms like Escherichia coli or Saccharomyces cerevisiae are often not universally conserved. Tetrahydrofolate (THF) (or its close relative tetrahydromethanopterin) is a universal and essential C1-carrier that most microbes and plants synthesize de novo. The THF biosynthesis pathway and enzymes are, however, not universal and alternate solutions are found for most steps, making this pathway a challenge to annotate automatically in many genomes. Comparing THF pathway reconstructions and functional annotations of a chosen set of folate synthesis genes in specific prokaryotes revealed the strengths and weaknesses of different microbial annotation platforms. This analysis revealed that most current platforms fail in metabolic reconstruction of variant pathways. However, all the pieces are in place to quickly correct these deficiencies if the different databases were built on each other's strengths.
Affiliation(s)
- Valérie de Crécy-Lagard
- Department of Microbiology and Cell Science and Genetics Institute, University of Florida, Gainesville, FL, United States
| |
|
23
|
Sorokina M, Stam M, Médigue C, Lespinet O, Vallenet D. Profiling the orphan enzymes. Biol Direct 2014; 9:10. [PMID: 24906382 PMCID: PMC4084501 DOI: 10.1186/1745-6150-9-10] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 03/27/2014] [Accepted: 05/29/2014] [Indexed: 11/10/2022] Open
Abstract
The emergence of Next Generation Sequencing generates an incredible amount of sequence and great potential for new enzyme discovery. Despite this huge amount of data and the profusion of bioinformatic methods for function prediction, a large part of known enzyme activities is still lacking an associated protein sequence. These particular activities are called "orphan enzymes". The present review proposes an update of previous surveys on orphan enzymes by mining the current content of public databases. While the percentage of orphan enzyme activities has decreased from 38% to 22% in ten years, there are still more than 1,000 orphans among the 5,000 entries of the Enzyme Commission (EC) classification. Taking into account all the reactions present in metabolic databases, this proportion dramatically increases to reach nearly 50% of orphans and many of them are not associated to a known pathway. We extended our survey to "local orphan enzymes" that are activities which have no representative sequence in a given clade, but have at least one in organisms belonging to other clades. We observe an important bias in Archaea and find that in general more than 30% of the EC activities have incomplete sequence information in at least one superkingdom. To estimate if candidate proteins for local orphans could be retrieved by homology search, we applied a simple strategy based on the PRIAM software and noticed that candidates may be proposed for an important fraction of local orphan enzymes. Finally, by studying relation between protein domains and catalyzed activities, it appears that newly discovered enzymes are mostly associated with already known enzyme domains. Thus, the exploration of the promiscuity and the multifunctional aspect of known enzyme families may solve part of the orphan enzyme issue. We conclude this review with a presentation of recent initiatives in finding proteins for orphan enzymes and in extending the enzyme world by the discovery of new activities.
Affiliation(s)
- Maria Sorokina
- Direction des Sciences du Vivant, Commissariat à l'Energie Atomique (CEA), Institut de Génomique, Genoscope, Laboratoire d'Analyses Bioinformatiques pour la Génomique et le Métabolisme, 2 rue Gaston Crémieux, 91057 Evry, France.
| | | | | | | | | |
|
24
|
Shearer AG, Altman T, Rhee CD. Finding sequences for over 270 orphan enzymes. PLoS One 2014; 9:e97250. [PMID: 24826896 PMCID: PMC4020792 DOI: 10.1371/journal.pone.0097250] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 01/22/2014] [Accepted: 04/16/2014] [Indexed: 01/04/2023] Open
Abstract
Despite advances in sequencing technology, there are still significant numbers of well-characterized enzymatic activities for which there are no known associated sequences. These 'orphan enzymes' represent glaring holes in our biological understanding, and it is a top priority to reunite them with their coding sequences. Here we report a methodology for resolving orphan enzymes through a combination of database search and literature review. Using this method we were able to reconnect over 270 orphan enzymes with their corresponding sequence. This success points toward how we can systematically eliminate the remaining orphan enzymes and prevent the introduction of future orphan enzymes.
Affiliation(s)
| | - Tomer Altman
- Stanford University, Stanford, California, United States of America
| | - Christine D. Rhee
- Clover Collective, Mountain View, California, United States of America
| |
|
25
|
Fernández-Castané A, Fehér T, Carbonell P, Pauthenier C, Faulon JL. Computer-aided design for metabolic engineering. J Biotechnol 2014; 192 Pt B:302-13. [PMID: 24704607 DOI: 10.1016/j.jbiotec.2014.03.029] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 11/21/2013] [Revised: 03/18/2014] [Accepted: 03/24/2014] [Indexed: 12/20/2022]
Abstract
The development and application of biotechnology-based strategies has had a great socio-economical impact and is likely to play a crucial role in the foundation of more sustainable and efficient industrial processes. Within biotechnology, metabolic engineering aims at the directed improvement of cellular properties, often with the goal of synthesizing a target chemical compound. The use of computer-aided design (CAD) tools, along with continuously emerging advanced genetic engineering techniques, has allowed metabolic engineering to broaden and streamline the process of heterologous compound production. In this work, we review the CAD tools available for metabolic engineering with an emphasis on retrosynthesis methodologies. Recent advances in genetic engineering strategies for pathway implementation and optimization are also reviewed, as well as a range of bioanalytical tools to validate in silico predictions. A case study applying retrosynthesis is presented as an experimental verification of the output from Retropath, the first complete automated computational pipeline applicable to metabolic engineering. Applying this CAD pipeline, together with genetic reassembly and optimization of culture conditions, led to improved production of the plant flavonoid pinocembrin. Coupling CAD tools with advanced genetic engineering strategies and bioprocess optimization is crucial for enhanced product yields and will be of great value for the development of non-natural products through sustainable biotechnological processes.
Affiliation(s)
- Alfred Fernández-Castané
- Institute of Systems and Synthetic Biology, University of Evry-Val-d'Essonne, CNRS FRE3561, Genopole(®) Campus 1, Genavenir 6, 5 rue Henri Desbruères, F-91030 Evry Cedex, France.
| | - Tamás Fehér
- Institute of Systems and Synthetic Biology, University of Evry-Val-d'Essonne, CNRS FRE3561, Genopole(®) Campus 1, Genavenir 6, 5 rue Henri Desbruères, F-91030 Evry Cedex, France.
| | - Pablo Carbonell
- Institute of Systems and Synthetic Biology, University of Evry-Val-d'Essonne, CNRS FRE3561, Genopole(®) Campus 1, Genavenir 6, 5 rue Henri Desbruères, F-91030 Evry Cedex, France.
| | - Cyrille Pauthenier
- Institute of Systems and Synthetic Biology, University of Evry-Val-d'Essonne, CNRS FRE3561, Genopole(®) Campus 1, Genavenir 6, 5 rue Henri Desbruères, F-91030 Evry Cedex, France.
| | - Jean-Loup Faulon
- Institute of Systems and Synthetic Biology, University of Evry-Val-d'Essonne, CNRS FRE3561, Genopole(®) Campus 1, Genavenir 6, 5 rue Henri Desbruères, F-91030 Evry Cedex, France.
| |
|
26
|
Mackie A, Keseler IM, Nolan L, Karp PD, Paulsen IT. Dead end metabolites--defining the known unknowns of the E. coli metabolic network. PLoS One 2013; 8:e75210. [PMID: 24086468 PMCID: PMC3781023 DOI: 10.1371/journal.pone.0075210] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 06/02/2013] [Accepted: 08/12/2013] [Indexed: 12/19/2022] Open
Abstract
The EcoCyc database is an online scientific database which provides an integrated view of the metabolic and regulatory network of the bacterium Escherichia coli K-12 and facilitates computational exploration of this important model organism. We have analysed the occurrence of dead end metabolites within the database – these are metabolites which lack the requisite reactions (either metabolic or transport) that would account for their production or consumption within the metabolic network. 127 dead end metabolites were identified from the 995 compounds that are contained within the EcoCyc metabolic network. Their presence reflects either a deficit in our representation of the network or in our knowledge of E. coli metabolism. Extensive literature searches resulted in the addition of 38 transport reactions and 3 metabolic reactions to the database and led to an improved representation of the pathway for Vitamin B12 salvage. 39 dead end metabolites were identified as components of reactions that are not physiologically relevant to E. coli K-12 – these reactions are properties of purified enzymes in vitro that would not be expected to occur in vivo. Our analysis led to improvements in the software that underpins the database and to the program that finds dead end metabolites within EcoCyc. The remaining dead end metabolites in the EcoCyc database likely represent deficiencies in our knowledge of E. coli metabolism.
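The definition of a dead end metabolite given in this abstract can be illustrated with a minimal sketch. It assumes a toy network in which each reaction is a tuple of substrates, products, and a reversibility flag, and it treats transport reactions as ordinary reactions with an empty side. The function name, the data layout, and the example network are hypothetical and are not taken from EcoCyc or its software, whose actual procedure may differ.

```python
def find_dead_end_metabolites(reactions):
    """Return metabolites that are never produced or never consumed.

    `reactions` maps a reaction name to (substrates, products, reversible).
    This layout is an assumption made for the example only.
    """
    produced = set()
    consumed = set()
    for substrates, products, reversible in reactions.values():
        consumed.update(substrates)
        produced.update(products)
        if reversible:
            # A reversible reaction can run either way, so both sides
            # count as produced and as consumed.
            consumed.update(products)
            produced.update(substrates)
    all_metabolites = produced | consumed
    return sorted(
        m for m in all_metabolites
        if m not in produced or m not in consumed
    )


# Hypothetical toy network: D is made from B but nothing uses or exports it.
toy_network = {
    "uptake_A": ([], ["A"], False),   # transport into the cell
    "r1": (["A"], ["B"], False),
    "r2": (["B"], ["C"], False),
    "r3": (["B"], ["D"], False),
    "export_C": (["C"], [], False),   # transport out of the cell
}

dead_ends = find_dead_end_metabolites(toy_network)
print(dead_ends)
```

In this toy case the only flagged metabolite is D. Adding a consuming or export reaction for D would remove it from the list, which parallels how the authors resolved dead ends by adding transport and metabolic reactions to the database.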
Affiliation(s)
- Amanda Mackie
- Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, New South Wales, Australia
- Laura Nolan
- Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, New South Wales, Australia
- Peter D. Karp
- SRI International, Menlo Park, California, United States of America
- Ian T. Paulsen
- Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, New South Wales, Australia
|
27
|
Belda E, Sekowska A, Le Fèvre F, Morgat A, Mornico D, Ouzounis C, Vallenet D, Médigue C, Danchin A. An updated metabolic view of the Bacillus subtilis 168 genome. Microbiology (Reading) 2013; 159:757-770. [DOI: 10.1099/mic.0.064691-0] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.8] [Indexed: 01/18/2023] Open
Affiliation(s)
- Eugeni Belda
- UEVE, Université d'Evry, boulevard François Mitterrand, 91025 Evry, France
- CNRS-UMR 8030, 2 rue Gaston Crémieux, 91057 Evry, France
- CEA, Institut de Génomique, Génoscope Laboratoire d’Analyse Bioinformatique en Génomique et Métabolisme, 2 rue Gaston Crémieux, 91057 Evry, France
- François Le Fèvre
- UEVE, Université d'Evry, boulevard François Mitterrand, 91025 Evry, France
- CNRS-UMR 8030, 2 rue Gaston Crémieux, 91057 Evry, France
- CEA, Institut de Génomique, Génoscope Laboratoire d’Analyse Bioinformatique en Génomique et Métabolisme, 2 rue Gaston Crémieux, 91057 Evry, France
- Anne Morgat
- Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Genève 4, Switzerland
- Damien Mornico
- UEVE, Université d'Evry, boulevard François Mitterrand, 91025 Evry, France
- CNRS-UMR 8030, 2 rue Gaston Crémieux, 91057 Evry, France
- CEA, Institut de Génomique, Génoscope Laboratoire d’Analyse Bioinformatique en Génomique et Métabolisme, 2 rue Gaston Crémieux, 91057 Evry, France
- Christos Ouzounis
- Department of Biochemistry, Li KaShing Faculty of Medicine, The University of Hong Kong, 21, Sassoon Road, Hong Kong SAR, China
- Institute of Applied Biosciences, Centre for Research and Technology Hellas (CERTH), Thessaloniki, Greece
- David Vallenet
- UEVE, Université d'Evry, boulevard François Mitterrand, 91025 Evry, France
- CNRS-UMR 8030, 2 rue Gaston Crémieux, 91057 Evry, France
- CEA, Institut de Génomique, Génoscope Laboratoire d’Analyse Bioinformatique en Génomique et Métabolisme, 2 rue Gaston Crémieux, 91057 Evry, France
- Claudine Médigue
- UEVE, Université d'Evry, boulevard François Mitterrand, 91025 Evry, France
- CNRS-UMR 8030, 2 rue Gaston Crémieux, 91057 Evry, France
- CEA, Institut de Génomique, Génoscope Laboratoire d’Analyse Bioinformatique en Génomique et Métabolisme, 2 rue Gaston Crémieux, 91057 Evry, France
- Antoine Danchin
- Department of Biochemistry, Li KaShing Faculty of Medicine, The University of Hong Kong, 21, Sassoon Road, Hong Kong SAR, China
- AMAbiotics SAS, Bldg G1, 2 rue Gaston Crémieux, 91000 Evry, France
|
28
|
Vallenet D, Belda E, Calteau A, Cruveiller S, Engelen S, Lajus A, Le Fèvre F, Longin C, Mornico D, Roche D, Rouy Z, Salvignol G, Scarpelli C, Thil Smith AA, Weiman M, Médigue C. MicroScope--an integrated microbial resource for the curation and comparative analysis of genomic and metabolic data. Nucleic Acids Res 2012. [PMID: 23193269 PMCID: PMC3531135 DOI: 10.1093/nar/gks1194] [Citation(s) in RCA: 306] [Impact Index Per Article: 25.5] [Indexed: 12/12/2022] Open
Abstract
MicroScope is an integrated platform dedicated to both the methodical updating of microbial genome annotation and to comparative analysis. The resource provides data from completed and ongoing genome projects (automatic and expert annotations), together with data sources from post-genomic experiments (i.e. transcriptomics, mutant collections) allowing users to perfect and improve the understanding of gene functions. MicroScope (http://www.genoscope.cns.fr/agc/microscope) combines tools and graphical interfaces to analyse genomes and to perform the manual curation of gene annotations in a comparative context. Since its first publication in January 2006, the system (previously named MaGe for Magnifying Genomes) has been continuously extended both in terms of data content and analysis tools. The last update of MicroScope was published in 2009 in the Database journal. Today, the resource contains data for >1600 microbial genomes, of which ∼300 are manually curated and maintained by biologists (1200 personal accounts today). Expert annotations are continuously gathered in the MicroScope database (∼50 000 a year), contributing to the improvement of the quality of microbial genomes annotations. Improved data browsing and searching tools have been added, original tools useful in the context of expert annotation have been developed and integrated and the website has been significantly redesigned to be more user-friendly. Furthermore, in the context of the European project Microme (Framework Program 7 Collaborative Project), MicroScope is becoming a resource providing for the curation and analysis of both genomic and metabolic data. An increasing number of projects are related to the study of environmental bacterial (meta)genomes that are able to metabolize a large variety of chemical compounds that may be of high industrial interest.
Affiliation(s)
- David Vallenet
- CEA, Institut de Génomique, Genoscope, 2 rue Gaston Crémieux, 91057 Evry, France.
|