1
Gricourt G, Meyer P, Duigou T, Faulon JL. Artificial Intelligence Methods and Models for Retro-Biosynthesis: A Scoping Review. ACS Synth Biol 2024. [PMID: 39047143 DOI: 10.1021/acssynbio.4c00091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/27/2024]
Abstract
Retrosynthesis aims to efficiently plan the synthesis of desirable chemicals by strategically breaking down molecules into readily available building block compounds. Having a long history in chemistry, retro-biosynthesis has also been used in the fields of biocatalysis and synthetic biology. Artificial intelligence (AI) is driving us toward new frontiers in synthesis planning and the exploration of chemical spaces, arriving at an opportune moment for promoting bioproduction that would better align with green chemistry, enhancing environmental practices. In this review, we summarize the recent advancements in the application of AI methods and models for retrosynthetic and retro-biosynthetic pathway design. These techniques can be based either on reaction templates or generative models and require scoring functions and planning strategies to navigate through the retrosynthetic graph of possibilities. We finally discuss limitations and promising research directions in this field.
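To make the idea of templates, scoring functions, and planning strategies more concrete, here is a minimal Python sketch of a best-first search over a toy retrosynthetic graph. It is an illustrative assumption, not a method taken from the review: the molecule names, the `TEMPLATES` table, its scores, and the function name are all hypothetical, and a plain best-first search stands in for the AI-guided planners the review surveys.

```python
import heapq
from itertools import count

# Hypothetical toy "templates": each product maps to alternative
# precursor sets with a made-up plausibility score in (0, 1].
TEMPLATES = {
    "target": [(("A", "B"), 0.9), (("C",), 0.5)],
    "A": [(("s1",), 0.8)],
    "B": [(("s2", "s3"), 0.7)],
    "C": [(("s1", "s2"), 0.9)],
}
# Hypothetical set of readily available building blocks.
BUILDING_BLOCKS = {"s1", "s2", "s3"}


def best_first_retrosynthesis(target, max_steps=100):
    """Return (score, route) for the highest-scoring route found, or None.

    A state is the tuple of molecules still to be made; the route score
    is the product of the scores of the steps applied so far.
    """
    tie = count()  # tie-breaker so heapq never has to compare routes
    heap = [(-1.0, next(tie), (target,), [])]
    for _step in range(max_steps):
        if not heap:
            break
        neg_score, _tie, open_mols, route = heapq.heappop(heap)
        pending = [m for m in open_mols if m not in BUILDING_BLOCKS]
        if not pending:
            # Every remaining molecule is a building block: route complete.
            return -neg_score, route
        mol, rest = pending[0], pending[1:]
        for precursors, step_score in TEMPLATES.get(mol, []):
            new_state = tuple(rest) + precursors
            new_route = route + [(mol, precursors)]
            heapq.heappush(
                heap, (neg_score * step_score, next(tie), new_state, new_route)
            )
    return None


result = best_first_retrosynthesis("target")
```

Because every step score is at most 1, the product can only shrink as a route grows, so the first complete route popped from the heap is the best-scoring one in this toy setting. Real planners replace the lookup table with learned template or generative models and the product score with a learned scoring function.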
Affiliation(s)
- Guillaume Gricourt
- Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350 Jouy-en-Josas, France
- Philippe Meyer
- Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350 Jouy-en-Josas, France
- Thomas Duigou
- Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350 Jouy-en-Josas, France
- Jean-Loup Faulon
- Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350 Jouy-en-Josas, France
- The University of Manchester, Manchester Institute of Biotechnology, Manchester M1 7DN, U.K
2
Shi Z, Wang D, Li Y, Deng R, Lin J, Liu C, Li H, Wang R, Zhao M, Mao Z, Yuan Q, Liao X, Ma H. REME: an integrated platform for reaction enzyme mining and evaluation. Nucleic Acids Res 2024; 52:W299-W305. [PMID: 38769057 PMCID: PMC11223788 DOI: 10.1093/nar/gkae405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2024] [Revised: 04/16/2024] [Accepted: 05/01/2024] [Indexed: 05/22/2024] Open
Abstract
A key challenge in pathway design is finding proper enzymes that can be engineered to catalyze a non-natural reaction. Although existing tools can identify potential enzymes based on similar reactions, these tools encounter several issues. Firstly, the calculated similar reactions may not even have the same reaction type. Secondly, the associated enzymes are often numerous and identifying the most promising candidate enzymes is difficult due to the lack of data for evaluation. Thirdly, existing web tools do not provide interactive functions that enable users to fine-tune results based on their expertise. Here, we present REME (https://reme.biodesign.ac.cn/), the first integrated web platform for reaction enzyme mining and evaluation. Combining atom-to-atom mapping, atom type change identification, and reaction similarity calculation enables quick ranking and visualization of reactions similar to an objective non-natural reaction. Additional functionality enables users to filter similar reactions by their specified functional groups and candidate enzymes can be further filtered (e.g. by organisms) or expanded by Enzyme Commission number (EC) or sequence homology. Afterward, enzyme attributes (such as kcat, Km, optimal temperature and pH) can be assessed with deep learning-based methods, facilitating the swift identification of potential enzymes that can catalyze the non-natural reaction.
Affiliation(s)
- Zhenkun Shi
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Dehang Wang
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, PR China
- Yang Li
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- University of Chinese Academy of Sciences, Beijing 101408, PR China
- Rui Deng
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, PR China
- Jiawei Lin
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, PR China
- Cui Liu
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Haoran Li
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Ruoyu Wang
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Muqiang Zhao
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Zhitao Mao
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Qianqian Yuan
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Xiaoping Liao
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Haihe Laboratory of Synthetic Biology, Tianjin 300308, PR China
- Hongwu Ma
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
3
Ferreira S, Balola A, Sveshnikova A, Hatzimanikatis V, Vilaça P, Maia P, Carreira R, Stoney R, Carbonell P, Souza CS, Correia J, Lousa D, Soares CM, Rocha I. Computer-aided design and implementation of efficient biosynthetic pathways to produce high added-value products derived from tyrosine in Escherichia coli. Front Bioeng Biotechnol 2024; 12:1360740. [PMID: 38978715 PMCID: PMC11228882 DOI: 10.3389/fbioe.2024.1360740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2023] [Accepted: 06/03/2024] [Indexed: 07/10/2024] Open
Abstract
Developing efficient bioprocesses requires selecting the best biosynthetic pathways, which can be challenging and time-consuming due to the vast amount of data available in databases and literature. The extension of the shikimate pathway for the biosynthesis of commercially attractive molecules often involves promiscuous enzymes or lacks well-established routes. To address these challenges, we developed a computational workflow integrating enumeration/retrosynthesis algorithms, a toolbox for pathway analysis, enzyme selection tools, and a gene discovery pipeline, supported by manual curation and literature review. Our focus has been on implementing biosynthetic pathways for tyrosine-derived compounds, specifically L-3,4-dihydroxyphenylalanine (L-DOPA) and dopamine, with significant applications in health and nutrition. We selected one pathway to produce L-DOPA and two different pathways for dopamine: one already described in the literature and one novel. Our goal was either to identify the most suitable gene candidates for expression in Escherichia coli for the known pathways or to discover innovative pathways. Although not all implemented pathways resulted in the accumulation of target compounds, in our shake-flask experiments we achieved a maximum L-DOPA titer of 0.71 g/L and dopamine titers of 0.29 and 0.21 g/L for known and novel pathways, respectively. In the case of L-DOPA, we utilized, for the first time, a mutant version of tyrosinase from Ralstonia solanacearum. Production of dopamine via the known biosynthesis route was accomplished by coupling the L-DOPA pathway with the expression of DOPA decarboxylase from Pseudomonas putida, resulting in a unique biosynthetic pathway never reported in literature before. In the context of the novel pathway, dopamine was produced using tyramine as the intermediate compound.
To achieve this, tyrosine was initially converted into tyramine by expressing TDC from Levilactobacillus brevis, which, in turn, was converted into dopamine through the action of the enzyme encoded by ppoMP from Mucuna pruriens. This marks the first time that an alternative biosynthetic pathway for dopamine has been validated in microbes. These findings underscore the effectiveness of our computational workflow in facilitating pathway enumeration and selection, offering the potential to uncover novel biosynthetic routes, thus paving the way for other target compounds of biotechnological interest.
Affiliation(s)
- Sofia Ferreira
- Systems and Synthetic Biology Laboratory, ITQB Nova-Instituto de Tecnologia Química e Biológica António Xavier, Oeiras, Portugal
- Alexandra Balola
- Systems and Synthetic Biology Laboratory, ITQB Nova-Instituto de Tecnologia Química e Biológica António Xavier, Oeiras, Portugal
- Anastasia Sveshnikova
- Laboratory of Computational Systems Biotechnology, École Polytechnique Fédérale de Lausanne, EPFL, Lausanne, Switzerland
- Vassily Hatzimanikatis
- Laboratory of Computational Systems Biotechnology, École Polytechnique Fédérale de Lausanne, EPFL, Lausanne, Switzerland
- Paulo Vilaça
- SilicoLife-Computational Biology Solutions for the Life Sciences, Braga, Portugal
- Paulo Maia
- SilicoLife-Computational Biology Solutions for the Life Sciences, Braga, Portugal
- Rafael Carreira
- SilicoLife-Computational Biology Solutions for the Life Sciences, Braga, Portugal
- Ruth Stoney
- Manchester Institute of Biotechnology, School of Chemistry, Faculty of Science and Engineering, University of Manchester, Manchester, United Kingdom
- Pablo Carbonell
- Institute of Industrial Control Systems and Computing (AI2), Universitat Politècnica de València (UPV), Valencia, Spain
- Institute for Integrative Systems Biology I2SysBio, Universitat de València-CSIC: Consejo Superior de Investigaciones Científicas, Paterna, Spain
- Caio Silva Souza
- Protein Modelling Laboratory, ITQB Nova-Instituto de Tecnologia Química e Biológica António Xavier, Oeiras, Portugal
- João Correia
- Protein Modelling Laboratory, ITQB Nova-Instituto de Tecnologia Química e Biológica António Xavier, Oeiras, Portugal
- Diana Lousa
- Protein Modelling Laboratory, ITQB Nova-Instituto de Tecnologia Química e Biológica António Xavier, Oeiras, Portugal
- Cláudio M Soares
- Protein Modelling Laboratory, ITQB Nova-Instituto de Tecnologia Química e Biológica António Xavier, Oeiras, Portugal
- Isabel Rocha
- Systems and Synthetic Biology Laboratory, ITQB Nova-Instituto de Tecnologia Química e Biológica António Xavier, Oeiras, Portugal
4
Ribeiro AJM, Riziotis IG, Borkakoti N, Thornton JM. Enzyme function and evolution through the lens of bioinformatics. Biochem J 2023; 480:1845-1863. [PMID: 37991346 PMCID: PMC10754289 DOI: 10.1042/bcj20220405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 11/09/2023] [Accepted: 11/14/2023] [Indexed: 11/23/2023]
Abstract
Enzymes have been shaped by evolution over billions of years to catalyse the chemical reactions that support life on earth. Dispersed in the literature, or organised in online databases, knowledge about enzymes can be structured in distinct dimensions, either related to their quality as biological macromolecules, such as their sequence and structure, or related to their chemical functions, such as the catalytic site, kinetics, mechanism, and overall reaction. The evolution of enzymes can only be understood when each of these dimensions is considered. In addition, many of the properties of enzymes only make sense in the light of evolution. We start this review by outlining the main paradigms of enzyme evolution, including gene duplication and divergence, convergent evolution, and evolution by recombination of domains. In the second part, we overview the current collective knowledge about enzymes, as organised by different types of data and collected in several databases. We also highlight some increasingly powerful computational tools that can be used to close gaps in understanding, in particular for types of data that require laborious experimental protocols. We believe that recent advances in protein structure prediction will be a powerful catalyst for the prediction of binding, mechanism, and ultimately, chemical reactions. A comprehensive mapping of enzyme function and evolution may be attainable in the near future.
Affiliation(s)
- Antonio J. M. Ribeiro
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K
- Ioannis G. Riziotis
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K
- Neera Borkakoti
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K
- Janet M. Thornton
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K
5
Probst D. An explainability framework for deep learning on chemical reactions exemplified by enzyme-catalysed reaction classification. J Cheminform 2023; 15:113. [PMID: 37996942 PMCID: PMC10668483 DOI: 10.1186/s13321-023-00784-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Accepted: 11/13/2023] [Indexed: 11/25/2023] Open
Abstract
Assigning or proposing a catalysing enzyme given a chemical or biochemical reaction is of great interest to life sciences and chemistry alike. The exploration and design of metabolic pathways and the challenge of finding more sustainable enzyme-catalysed alternatives to traditional organic reactions are just two examples of tasks that require an association between reaction and enzyme. However, given the lack of large and balanced annotated data sets of enzyme-catalysed reactions, assigning an enzyme to a reaction still relies on expert-curated rules and databases. Here, we present a data-driven explainable human-in-the-loop machine learning approach to support and ultimately automate the association of a catalysing enzyme with a given biochemical reaction. In addition, the proposed method is capable of predicting enzymes as candidate catalysts for organic reactions amenable to biocatalysis. Finally, the introduced explainability and visualisation methods can easily be generalised to support other machine-learning approaches involving chemical and biochemical reactions.
Affiliation(s)
- Daniel Probst
- Signal Processing Laboratory 2, Institute of Electrical and Micro Engineering, School of Engineering, EPFL, Rte Cantonale, 1015, Lausanne, Vaud, Switzerland.
6
Ryu G, Kim GB, Yu T, Lee SY. Deep learning for metabolic pathway design. Metab Eng 2023; 80:130-141. [PMID: 37734652 DOI: 10.1016/j.ymben.2023.09.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2023] [Revised: 09/17/2023] [Accepted: 09/19/2023] [Indexed: 09/23/2023]
Abstract
The establishment of a bio-based circular economy is imperative in tackling the climate crisis and advancing sustainable development. In this realm, the creation of microbial cell factories is central to generating a variety of chemicals and materials. The design of metabolic pathways is crucial in shaping these microbial cell factories, especially when it comes to producing chemicals with yet-to-be-discovered biosynthetic routes. To aid in navigating the complexities of chemical and metabolic domains, computer-supported tools for metabolic pathway design have emerged. In this paper, we evaluate how digital strategies can be employed for pathway prediction and enzyme discovery. Additionally, we touch upon the recent strides made in using deep learning techniques for metabolic pathway prediction. These computational tools and strategies streamline the design of metabolic pathways, facilitating the development of microbial cell factories. Leveraging the capabilities of deep learning in metabolic pathway design is profoundly promising, potentially hastening the advent of a bio-based circular economy.
Affiliation(s)
- Gahyeon Ryu
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 Four), KAIST Institute for BioCentury, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea; Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, KAIST, Daejeon, 34141, Republic of Korea
- Gi Bae Kim
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 Four), KAIST Institute for BioCentury, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea; Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, KAIST, Daejeon, 34141, Republic of Korea
- Taeho Yu
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 Four), KAIST Institute for BioCentury, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea; Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, KAIST, Daejeon, 34141, Republic of Korea
- Sang Yup Lee
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 Four), KAIST Institute for BioCentury, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea; Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, KAIST, Daejeon, 34141, Republic of Korea; BioProcess Engineering Research Center and BioInformatics Research Center, KAIST, Daejeon, 34141, Republic of Korea; Graduate School of Engineering Biology, KAIST, Daejeon, 34141, Republic of Korea.
7
Riziotis IG, Ribeiro AJM, Borkakoti N, Thornton JM. The 3D Modules of Enzyme Catalysis: Deconstructing Active Sites into Distinct Functional Entities. J Mol Biol 2023; 435:168254. [PMID: 37652131 DOI: 10.1016/j.jmb.2023.168254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Revised: 08/20/2023] [Accepted: 08/22/2023] [Indexed: 09/02/2023]
Abstract
Enzyme catalysis is governed by a limited toolkit of residues and organic or inorganic co-factors. Therefore, it is expected that recurring residue arrangements will be found across the enzyme space, which perform a defined catalytic function, are structurally similar and occur in unrelated enzymes. Leveraging the integrated information in the Mechanism and Catalytic Site Atlas (M-CSA) (enzyme structure, sequence, catalytic residue annotations, catalysed reaction, detailed mechanism description), 3D templates were derived to represent compact groups of catalytic residues. A fuzzy template-template search allowed us to identify those recurring motifs, which are conserved or convergent, that we define as the "modules of enzyme catalysis". We show that a large fraction of these modules facilitate binding of metal ions, co-factors and substrates, and are frequently the result of convergent evolution. A smaller number of convergent modules perform a well-defined catalytic role, such as the variants of the catalytic triad (i.e. Ser-His-Asp/Cys-His-Asp) and the saccharide-cleaving Asp/Glu triad. It is also shown that enzymes whose functions have diverged during evolution preserve regions of their active site unaltered, as shown by modules performing similar or identical steps of the catalytic mechanism. We have compiled a comprehensive library of catalytic modules that characterise a broad spectrum of enzymes. These modules can be used as templates in enzyme design and for better understanding catalysis in 3D.
Affiliation(s)
- Ioannis G Riziotis
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, CB10 1SD Cambridge, UK.
- António J M Ribeiro
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, CB10 1SD Cambridge, UK
- Neera Borkakoti
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, CB10 1SD Cambridge, UK
- Janet M Thornton
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, CB10 1SD Cambridge, UK
8
Sarker B, Khare N, Devignes MD, Aridhi S. Improving automatic GO annotation with semantic similarity. BMC Bioinformatics 2022; 23:433. [PMID: 36510133 PMCID: PMC9743508 DOI: 10.1186/s12859-022-04958-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2022] [Accepted: 09/19/2022] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND: Automatic functional annotation of proteins is an open research problem in bioinformatics. The growing number of protein entries in public databases, for example in UniProtKB, poses challenges in manual functional annotation. Manual annotation requires expert human curators to search and read related research articles, interpret the results, and assign the annotations to the proteins. Thus, it is a time-consuming and expensive process. Therefore, designing computational tools to perform automatic annotation leveraging the high quality manual annotations that already exist in UniProtKB/SwissProt is an important research problem.
RESULTS: In this paper, we extend and adapt the GrAPFI (graph-based automatic protein function inference) (Sarker et al. in BMC Bioinform 21, 2020; Sarker et al., in: Proceedings of 7th international conference on complex networks and their applications, Cambridge, 2018) method for automatic annotation of proteins with gene ontology (GO) terms, renaming it GrAPFI-GO. The original GrAPFI method uses label propagation in a similarity graph where proteins are linked through the domains, families, and superfamilies that they share. Here, we also explore various types of similarity measures based on common neighbors in the graph. Moreover, GO terms are arranged in a hierarchical manner according to semantic parent-child relations. Therefore, we propose an efficient pruning and post-processing technique that integrates both semantic similarity and hierarchical relations between the GO terms. We produce experimental results comparing the GrAPFI-GO method with and without considering common neighbors similarity. We also test the performance of GrAPFI-GO and other annotation tools for GO annotation on a benchmark of proteins with and without the proposed pruning and post-processing procedure.
CONCLUSION: Our results show that the proposed semantic hierarchical post-processing potentially improves the performance of GrAPFI-GO and of other annotation tools as well. Thus, GrAPFI-GO provides an original, efficient, and reusable procedure to exploit the semantic relations among the GO terms in order to improve the automatic annotation of protein functions.
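As a rough illustration of the label-propagation idea described in this abstract, the Python sketch below links proteins through shared domains, weights each link by Jaccard similarity, and scores candidate GO terms for an unannotated protein from its annotated neighbors. It is a toy under stated assumptions, not the GrAPFI-GO implementation: the protein names, domain identifiers, GO labels, the choice of Jaccard weighting, and the function names are all hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity between two sets of domain identifiers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def propagate_labels(domains, annotations, query):
    """Score GO terms for `query` from annotated proteins sharing domains.

    Each annotated neighbor votes for its GO terms with a weight equal to
    its domain-set similarity to the query; scores are normalised by the
    total weight so they fall in [0, 1].
    """
    scores = defaultdict(float)
    total = 0.0
    for protein, terms in annotations.items():
        if protein == query:
            continue
        weight = jaccard(domains[query], domains[protein])
        if weight == 0.0:
            continue  # no shared domain, so no edge in the graph
        total += weight
        for term in terms:
            scores[term] += weight
    if total == 0.0:
        return {}
    return {term: score / total for term, score in scores.items()}


# Hypothetical toy data (all identifiers are made up for illustration).
domains = {
    "P1": {"D1", "D2"},
    "P2": {"D1", "D2", "D3"},
    "P3": {"D4"},
    "Q": {"D1", "D2"},
}
annotations = {
    "P1": {"GO:A", "GO:B"},
    "P2": {"GO:A"},
    "P3": {"GO:C"},
}

ranked = sorted(
    propagate_labels(domains, annotations, "Q").items(),
    key=lambda kv: -kv[1],
)
```

In this toy graph the query shares domains with P1 and P2 but not with P3, so only the GO terms of P1 and P2 receive a score. The hierarchical pruning and semantic post-processing that the abstract proposes would operate on such a ranked list and are not sketched here.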
Affiliation(s)
- Bishnu Sarker
- CNRS, Inria, LORIA, University of Lorraine, 54000 Nancy, France
- Khulna University of Engineering and Technology, Khulna, Bangladesh
- School of Applied Computational Sciences, Meharry Medical College, Nashville, TN, USA
- Navya Khare
- CNRS, Inria, LORIA, University of Lorraine, 54000 Nancy, France
- International Institute of Information Technology, Hyderabad, India
- Sabeur Aridhi
- CNRS, Inria, LORIA, University of Lorraine, 54000 Nancy, France
9
The automated Galaxy-SynBioCAD pipeline for synthetic biology design and engineering. Nat Commun 2022; 13:5082. [PMID: 36038542 PMCID: PMC9424320 DOI: 10.1038/s41467-022-32661-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 03/04/2022] [Accepted: 08/11/2022] [Indexed: 11/27/2022] Open
Abstract
Here we introduce the Galaxy-SynBioCAD portal, a toolshed for synthetic biology, metabolic engineering, and industrial biotechnology. The tools and workflows currently shared on the portal enable one to build libraries of strains producing desired chemical targets, covering an end-to-end metabolic pathway design and engineering process from the selection of strains and targets, through the design of DNA parts to be assembled, to the generation of scripts driving liquid handlers for plasmid assembly and strain transformations. Standard formats like SBML and SBOL are used throughout to enforce the compatibility of the tools. In a study carried out at four different sites, we illustrate the link between pathway design and engineering with the building of a library of E. coli lycopene-producing strains. We also benchmark our workflows on literature and expert validated pathways. Overall, we find an 83% success rate in retrieving the validated pathways among the top 10 pathways generated by the workflows.
10
Cho JS, Kim GB, Eun H, Moon CW, Lee SY. Designing Microbial Cell Factories for the Production of Chemicals. JACS AU 2022; 2:1781-1799. [PMID: 36032533 PMCID: PMC9400054 DOI: 10.1021/jacsau.2c00344] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 06/09/2022] [Revised: 07/26/2022] [Accepted: 07/26/2022] [Indexed: 05/24/2023]
Abstract
The sustainable production of chemicals from renewable, nonedible biomass has emerged as an essential alternative to address pressing environmental issues arising from our heavy dependence on fossil resources. Microbial cell factories are engineered microorganisms harboring biosynthetic pathways streamlined to produce chemicals of interest from renewable carbon sources. The biosynthetic pathways for the production of chemicals can be classified into three categories with reference to the microbial host selected for engineering: native-existing pathways, nonnative-existing pathways, and nonnative-created pathways. Recent trends in leveraging native-existing pathways, discovering nonnative-existing pathways, and designing de novo pathways (as nonnative-created pathways) are discussed in this Perspective. We highlight key approaches and successful case studies that exemplify these concepts. Once these pathways are designed and constructed in the microbial cell factory, systems metabolic engineering strategies can be used to improve the performance of the strain to meet industrial production standards. In the second part of the Perspective, current trends in design tools and strategies for systems metabolic engineering are discussed with an eye toward the future. Finally, we survey current and future challenges that need to be addressed to advance microbial cell factories for the sustainable production of chemicals.
Affiliation(s)
- Jae Sung Cho
- Metabolic and Biomolecular Engineering National Research Laboratory and Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea
- KAIST Institute for the BioCentury and KAIST Institute for Artificial Intelligence, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea
- BioProcess Engineering Research Center and BioInformatics Research Center, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea
- Gi Bae Kim
- Metabolic and Biomolecular Engineering National Research Laboratory and Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea
- KAIST Institute for the BioCentury and KAIST Institute for Artificial Intelligence, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea
- Hyunmin Eun
- Metabolic and Biomolecular Engineering National Research Laboratory and Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea
- KAIST Institute for the BioCentury and KAIST Institute for Artificial Intelligence, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea
- Cheon Woo Moon
- Metabolic and Biomolecular Engineering National Research Laboratory and Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea
- KAIST Institute for the BioCentury and KAIST Institute for Artificial Intelligence, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea
- Sang Yup Lee
- Metabolic and Biomolecular Engineering National Research Laboratory and Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea
- KAIST Institute for the BioCentury and KAIST Institute for Artificial Intelligence, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea
- BioProcess Engineering Research Center and BioInformatics Research Center, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea
| |
|
11
|
Warrier T, El Farran C, Zeng Y, Ho B, Bao Q, Zheng Z, Bi X, Ng HH, Ong D, Chu J, Sanyal A, Fullwood MJ, Collins J, Li H, Xu J, Loh YH. SETDB1 acts as a topological accessory to Cohesin via an H3K9me3-independent, genomic shunt for regulating cell fates. Nucleic Acids Res 2022; 50:7326-7349. [PMID: 35776115 PMCID: PMC9303280 DOI: 10.1093/nar/gkac531] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/13/2021] [Revised: 05/30/2022] [Accepted: 06/30/2022] [Indexed: 11/13/2022] Open
Abstract
SETDB1 is a key regulator of lineage-specific genes and endogenous retroviral elements (ERVs) through its deposition of repressive H3K9me3 mark. Apart from its H3K9me3 regulatory role, SETDB1 has seldom been studied in terms of its other potential regulatory roles. To investigate this, a genomic survey of SETDB1 binding in mouse embryonic stem cells across multiple libraries was conducted, leading to the unexpected discovery of regions bereft of common repressive histone marks (H3K9me3, H3K27me3). These regions were enriched with the CTCF motif that is often associated with the topological regulator Cohesin. Further profiling of these non-H3K9me3 regions led to the discovery of a cluster of non-repeat loci that were co-bound by SETDB1 and Cohesin. These regions, which we named DiSCs (domains involving SETDB1 and Cohesin) were seen to be proximal to the gene promoters involved in embryonic stem cell pluripotency and lineage development. Importantly, it was found that SETDB1-Cohesin co-regulate target gene expression and genome topology at these DiSCs. Depletion of SETDB1 led to localized dysregulation of Cohesin binding thereby locally disrupting topological structures. Dysregulated gene expression trends revealed the importance of this cluster in ES cell maintenance as well as at gene 'islands' that drive differentiation to other lineages. The 'unearthing' of the DiSCs thus unravels a unique topological and transcriptional axis of control regulated chiefly by SETDB1.
Affiliation(s)
- Tushar Warrier
- Cell Fate Engineering and Therapeutics Lab, Cell Biology and Therapies Division, A*STAR Institute of Molecular and Cell Biology, Singapore 138673, Singapore
- Department of Biological Sciences, National University of Singapore, Singapore 117543, Singapore
| | - Chadi El Farran
- Cell Fate Engineering and Therapeutics Lab, Cell Biology and Therapies Division, A*STAR Institute of Molecular and Cell Biology, Singapore 138673, Singapore
- Department of Biological Sciences, National University of Singapore, Singapore 117543, Singapore
| | - Yingying Zeng
- Cell Fate Engineering and Therapeutics Lab, Cell Biology and Therapies Division, A*STAR Institute of Molecular and Cell Biology, Singapore 138673, Singapore
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive 637551, Singapore
| | - Benedict Shao Quan Ho
- Cell Fate Engineering and Therapeutics Lab, Cell Biology and Therapies Division, A*STAR Institute of Molecular and Cell Biology, Singapore 138673, Singapore
| | - Qiuye Bao
- Cell Fate Engineering and Therapeutics Lab, Cell Biology and Therapies Division, A*STAR Institute of Molecular and Cell Biology, Singapore 138673, Singapore
| | - Zi Hao Zheng
- Cell Fate Engineering and Therapeutics Lab, Cell Biology and Therapies Division, A*STAR Institute of Molecular and Cell Biology, Singapore 138673, Singapore
- Department of Physiology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117593, Singapore
| | - Xuezhi Bi
- Proteomics Group, Bioprocessing Technology Institute, A*STAR, Singapore 138668, Singapore
| | - Huck Hui Ng
- Gene Regulation Laboratory, Genome Institute of Singapore, Singapore 138672, Singapore
| | - Derrick Sek Tong Ong
- Department of Physiology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117593, Singapore
| | - Justin Jang Hann Chu
- Department of Microbiology and Immunology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117593, Singapore
- Infectious Disease Translational Research Programme, National University of Singapore, Singapore 117597, Singapore
| | - Amartya Sanyal
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive 637551, Singapore
| | - Melissa Jane Fullwood
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive 637551, Singapore
- Cancer Science Institute of Singapore, National University of Singapore, 14 Medical Drive, Singapore 117599, Singapore
| | - James J Collins
- Howard Hughes Medical Institute, Boston, MA 02114, USA
- Institute for Medical Engineering and Science, Department of Biological Engineering, and Synthetic Biology Center, Massachusetts Institute of Technology, Cambridge, MA 02114, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA
| | - Hu Li
- Center for Individualized Medicine, Department of Molecular Pharmacology & Experimental Therapeutics, Mayo Clinic, Rochester, MN 55905, USA
| | - Jian Xu
- Department of Biological Sciences, National University of Singapore, Singapore 117543, Singapore
- Department of Plant Systems Physiology, Radboud Institute for Biological and Environmental Sciences, Radboud University, Heyendaalseweg 135, 6525 AJ, Nijmegen, The Netherlands
| | - Yuin-Han Loh
- Cell Fate Engineering and Therapeutics Lab, Cell Biology and Therapies Division, A*STAR Institute of Molecular and Cell Biology, Singapore 138673, Singapore
- Department of Biological Sciences, National University of Singapore, Singapore 117543, Singapore
- Department of Physiology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117593, Singapore
- NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore, 28 Medical Drive, Singapore 117456, Singapore
| |
|
12
|
Furse S, Watkins AJ, Williams HEL, Snowden SG, Chiarugi D, Koulman A. Paternal nutritional programming of lipid metabolism is propagated through sperm and seminal plasma. Metabolomics 2022; 18:13. [PMID: 35141784 PMCID: PMC8828597 DOI: 10.1007/s11306-022-01869-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/16/2021] [Accepted: 01/04/2022] [Indexed: 12/12/2022]
Abstract
BACKGROUND The paternal diet affects lipid metabolism in offspring for at least two generations through nutritional programming. However, we do not know how this is propagated to the offspring. OBJECTIVES We tested the hypothesis that the changes in lipid metabolism that are driven by paternal diet are propagated through spermatozoa and not seminal plasma. METHODS We applied an updated, purpose-built computational network analysis tool to characterise control of lipid metabolism systemically (Lipid Traffic Analysis v2.3) on a known mouse model of paternal nutritional programming. RESULTS The analysis showed that the two possible routes for programming effects, the sperm (genes) and seminal plasma (influence on the uterine environment), both have a distinct effect on the offspring's lipid metabolism. Further, the programming effects in offspring suggest that changes in lipid distribution are more important than alterations in lipid biosynthesis. CONCLUSIONS These results show how the uterine environment and genes both affect lipid metabolism in offspring, enhancing our understanding of the link between parental diet and metabolism in offspring.
Affiliation(s)
- Samuel Furse
- Core Metabolomics and Lipidomics Laboratory, Wellcome Trust-MRL Institute of Metabolic Science, University of Cambridge, Addenbrooke's Treatment Centre, Keith Day Road, Cambridge, CB2 0QQ, UK.
- Metabolic Disease Unit, Wellcome Trust-MRL Institute of Metabolic Science, University of Cambridge, Addenbrooke's Treatment Centre, Keith Day Road, Cambridge, CB2 0QQ, UK.
- Biological Chemistry Group, Jodrell Laboratory, Royal Botanic Gardens Kew, Richmond, UK.
| | - Adam J Watkins
- Division of Child Health, Obstetrics and Gynaecology, Faculty of Medicine, University of Nottingham, Nottingham, NG7 2UH, UK
| | - Huw E L Williams
- Biodiscovery Institute, University of Nottingham, University Park, Nottingham, NG7 2RD, UK
| | - Stuart G Snowden
- Department of Biological Sciences, Royal Holloway College, University of London, Egham, TW20 0EX, Surrey, UK
| | - Davide Chiarugi
- Bioinformatics and Biostatistics Core, Wellcome Trust-MRL Institute of Metabolic Science, University of Cambridge, Addenbrooke's Treatment Centre, Keith Day Road, Cambridge, CB2 0QQ, UK
| | - Albert Koulman
- Core Metabolomics and Lipidomics Laboratory, Wellcome Trust-MRL Institute of Metabolic Science, University of Cambridge, Addenbrooke's Treatment Centre, Keith Day Road, Cambridge, CB2 0QQ, UK.
- Metabolic Disease Unit, Wellcome Trust-MRL Institute of Metabolic Science, University of Cambridge, Addenbrooke's Treatment Centre, Keith Day Road, Cambridge, CB2 0QQ, UK.
| |
|
13
|
Tao YM, Bu CY, Zou LH, Hu YL, Zheng ZJ, Ouyang J. A comprehensive review on microbial production of 1,2-propanediol: micro-organisms, metabolic pathways, and metabolic engineering. BIOTECHNOLOGY FOR BIOFUELS 2021; 14:216. [PMID: 34794503 PMCID: PMC8600716 DOI: 10.1186/s13068-021-02067-w] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/25/2021] [Accepted: 11/07/2021] [Indexed: 06/13/2023]
Abstract
1,2-Propanediol is an important building block used in the manufacture of unsaturated polyester resin, antifreeze, biofuel, nonionic detergent, etc. Commercial production of 1,2-propanediol through microbial biosynthesis is limited by low efficiency, and chemical production of 1,2-propanediol requires petrochemically derived routes involving wasteful power consumption and high pollution emissions. With the development of various strategies based on metabolic engineering, a series of obstacles are expected to be overcome. This review provides an extensive overview of the progress in the microbial production of 1,2-propanediol, particularly the different micro-organisms used for 1,2-propanediol biosynthesis and microbial production pathways. In addition, outstanding challenges associated with microbial biosynthesis and feasible metabolic engineering strategies, as well as perspectives on the future microbial production of 1,2-propanediol, are discussed.
Affiliation(s)
- Yuan-Ming Tao
- Jiangsu Co-Innovation Center of Efficient Processing and Utilization of Forest Resources, Nanjing Forestry University, Nanjing, 210037, People's Republic of China
- College of Chemical Engineering, Nanjing Forestry University, Nanjing, 210037, People's Republic of China
| | - Chong-Yang Bu
- Jiangsu Co-Innovation Center of Efficient Processing and Utilization of Forest Resources, Nanjing Forestry University, Nanjing, 210037, People's Republic of China
- College of Chemical Engineering, Nanjing Forestry University, Nanjing, 210037, People's Republic of China
| | - Li-Hua Zou
- Jiangsu Co-Innovation Center of Efficient Processing and Utilization of Forest Resources, Nanjing Forestry University, Nanjing, 210037, People's Republic of China
- College of Chemical Engineering, Nanjing Forestry University, Nanjing, 210037, People's Republic of China
| | - Yue-Li Hu
- Jiangsu Co-Innovation Center of Efficient Processing and Utilization of Forest Resources, Nanjing Forestry University, Nanjing, 210037, People's Republic of China
- College of Chemical Engineering, Nanjing Forestry University, Nanjing, 210037, People's Republic of China
| | - Zhao-Juan Zheng
- Jiangsu Co-Innovation Center of Efficient Processing and Utilization of Forest Resources, Nanjing Forestry University, Nanjing, 210037, People's Republic of China
- College of Chemical Engineering, Nanjing Forestry University, Nanjing, 210037, People's Republic of China
| | - Jia Ouyang
- Jiangsu Co-Innovation Center of Efficient Processing and Utilization of Forest Resources, Nanjing Forestry University, Nanjing, 210037, People's Republic of China.
- College of Chemical Engineering, Nanjing Forestry University, Nanjing, 210037, People's Republic of China.
| |
|
14
|
Lin A, Dyubankova N, Madzhidov TI, Nugmanov RI, Verhoeven J, Gimadiev TR, Afonina VA, Ibragimova Z, Rakhimbekova A, Sidorov P, Gedich A, Suleymanov R, Mukhametgaleev R, Wegner J, Ceulemans H, Varnek A. Atom-to-atom Mapping: A Benchmarking Study of Popular Mapping Algorithms and Consensus Strategies. Mol Inform 2021; 41:e2100138. [PMID: 34726834 DOI: 10.1002/minf.202100138] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/12/2021] [Accepted: 10/15/2021] [Indexed: 01/23/2023]
Abstract
In this paper, we compare the most popular Atom-to-Atom Mapping (AAM) tools: ChemAxon,[1] Indigo,[2] RDTool,[3] NameRXN (NextMove),[4] and RXNMapper[5] which implement different AAM algorithms. An open-source RDTool program was optimized, and its modified version ("new RDTool") was considered together with several consensus mapping strategies. The Condensed Graph of Reaction approach was used to calculate chemical distances and develop the "AAM fixer" algorithm for an automatized correction of erroneous mapping. The benchmarking calculations were performed on a Golden dataset containing 1851 manually mapped and curated reactions. The best performing RXNMapper program together with the AAM Fixer was applied to map the USPTO database. The Golden dataset, mapped USPTO and optimized RDTool are available in the GitHub repository https://github.com/Laboratoire-de-Chemoinformatique.
Affiliation(s)
- Arkadii Lin
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal str., 67081, Strasbourg, France
| | | | - Timur I Madzhidov
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
| | - Ramil I Nugmanov
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
| | - Jonas Verhoeven
- Janssen Pharmaceutica, 30, Turnhoutseweg str., 2340, Beerse, Belgium
| | - Timur R Gimadiev
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Sapporo, Kita-ku, 001-0021, Sapporo, Japan
| | - Valentina A Afonina
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
| | - Zarina Ibragimova
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
| | - Assima Rakhimbekova
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
| | - Pavel Sidorov
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Sapporo, Kita-ku, 001-0021, Sapporo, Japan
| | - Andrei Gedich
- Arcadia Inc., 28 k2, Bolshoy Sampsonievskiy pr., St. Petersburg, 194044, Russia
| | - Rail Suleymanov
- Arcadia Inc., 28 k2, Bolshoy Sampsonievskiy pr., St. Petersburg, 194044, Russia
| | - Ravil Mukhametgaleev
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
| | - Joerg Wegner
- Janssen Pharmaceutica, 30, Turnhoutseweg str., 2340, Beerse, Belgium
| | - Hugo Ceulemans
- Janssen Pharmaceutica, 30, Turnhoutseweg str., 2340, Beerse, Belgium
| | - Alexandre Varnek
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal str., 67081, Strasbourg, France
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Sapporo, Kita-ku, 001-0021, Sapporo, Japan
| |
|
15
|
Dong J, Zhao M, Liu Y, Su Y, Zeng X. Deep learning in retrosynthesis planning: datasets, models and tools. Brief Bioinform 2021; 23:6375056. [PMID: 34571535 DOI: 10.1093/bib/bbab391] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 07/06/2021] [Revised: 08/16/2021] [Accepted: 08/30/2021] [Indexed: 12/29/2022] Open
Abstract
In recent years, synthesizing drugs powered by artificial intelligence has brought great convenience to society. Since retrosynthetic analysis occupies an essential position in synthetic chemistry, it has received broad attention from researchers. In this review, we comprehensively summarize the development process of retrosynthesis in the context of deep learning. This review covers all aspects of retrosynthesis, including datasets, models and tools. Specifically, we report representative models from academia, in addition to a detailed description of the available and stable platforms in the industry. We also discuss the disadvantages of the existing models and provide potential future trends, so that more abecedarians will quickly understand and participate in the family of retrosynthesis planning.
Affiliation(s)
- Jingxin Dong
- College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Hunan, China
| | - Mingyi Zhao
- Department of Pediatrics, Third Xiangya Hospital, Central South University, 400013, Hunan, China
| | - Yuansheng Liu
- College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Hunan, China
| | - Yansen Su
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 230601, Hefei, China
| | - Xiangxiang Zeng
- College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Hunan, China
| |
|
16
|
Visani GM, Hughes MC, Hassoun S. Enzyme Promiscuity Prediction Using Hierarchy-Informed Multi-Label Classification. Bioinformatics 2021; 37:2017–2024. [PMID: 33515234 PMCID: PMC8337005 DOI: 10.1093/bioinformatics/btab054] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 07/31/2020] [Revised: 12/30/2020] [Accepted: 01/22/2021] [Indexed: 11/25/2022] Open
Abstract
MOTIVATION As experimental efforts are costly and time consuming, computational characterization of enzyme capabilities is an attractive alternative. We present and evaluate several machine-learning models to predict which of 983 distinct enzymes, as defined via the Enzyme Commission (EC) numbers, are likely to interact with a given query molecule. Our data consists of enzyme-substrate interactions from the BRENDA database. Some interactions are attributed to natural selection and involve the enzyme's natural substrates. The majority of the interactions however involve non-natural substrates, thus reflecting promiscuous enzymatic activities. RESULTS We frame this "enzyme promiscuity prediction" problem as a multi-label classification task. We maximally utilize inhibitor and unlabelled data to train prediction models that can take advantage of known hierarchical relationships between enzyme classes. We report that a hierarchical multi-label neural network, EPP-HMCNF, is the best model for solving this problem, outperforming k-nearest neighbours similarity-based and other machine learning models. We show that inhibitor information during training consistently improves predictive power, particularly for EPP-HMCNF. We also show that all promiscuity prediction models perform worse under a realistic data split when compared to a random data split, and when evaluating performance on non-natural substrates compared to natural substrates. AVAILABILITY AND IMPLEMENTATION We provide Python code for EPP-HMCNF and other models in a repository termed EPP (Enzyme Promiscuity Prediction) at https://github.com/hassounlab/EPP. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gian Marco Visani
- Department of Computer Science, Tufts University, Medford, MA 02155, USA
| | - Michael C Hughes
- Department of Computer Science, Tufts University, Medford, MA 02155, USA
| | - Soha Hassoun
- Department of Computer Science, Tufts University, Medford, MA 02155, USA
- Department of Chemical and Biological Engineering, Tufts University, Medford, MA 02155, USA
| |
|
17
|
|
18
|
Fenner K, Elsner M, Lueders T, McLachlan MS, Wackett LP, Zimmermann M, Drewes JE. Methodological Advances to Study Contaminant Biotransformation: New Prospects for Understanding and Reducing Environmental Persistence? ACS ES&T WATER 2021; 1:1541-1554. [PMID: 34278380 PMCID: PMC8276273 DOI: 10.1021/acsestwater.1c00025] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 01/23/2021] [Revised: 06/11/2021] [Accepted: 06/11/2021] [Indexed: 05/14/2023]
Abstract
Complex microbial communities in environmental systems play a key role in the detoxification of chemical contaminants by transforming them into less active metabolites or by complete mineralization. Biotransformation, i.e., transformation by microbes, is well understood for a number of priority pollutants, but a similar level of understanding is lacking for many emerging contaminants encountered at low concentrations and in complex mixtures across natural and engineered systems. Any advanced approaches aiming to reduce environmental exposure to such contaminants (e.g., novel engineered biological water treatment systems, design of readily degradable chemicals, or improved regulatory assessment strategies to determine contaminant persistence a priori) will depend on understanding the causal links among contaminant removal, the key driving agents of biotransformation at low concentrations (i.e., relevant microbes and their metabolic activities), and how their presence and activity depend on environmental conditions. In this Perspective, we present the current understanding and recent methodological advances that can help to identify such links, even in complex environmental microbiomes and for contaminants present at low concentrations in complex chemical mixtures. We discuss the ensuing insights into contaminant biotransformation across varying environments and conditions and ask how much closer we have come to designing improved approaches to reducing environmental exposure to contaminants.
Affiliation(s)
- Kathrin Fenner
- Eawag, Swiss Federal Institute of Aquatic Science and Technology, 8600 Dübendorf, Switzerland
- Institute of Biogeochemistry and Pollutant Dynamics, ETH Zürich, 8092 Zürich, Switzerland
- Department of Chemistry, University of Zürich, 8057 Zürich, Switzerland
| | - Martin Elsner
- Chair of Analytical Chemistry and Water Chemistry, Technical University of Munich, 85748 Garching, Germany
| | - Tillmann Lueders
- Chair of Ecological Microbiology, Bayreuth Center of Ecology and Environmental Research (BayCEER), University of Bayreuth, 95448 Bayreuth, Germany
| | - Michael S McLachlan
- Department of Environmental Science (ACES), Stockholm University, 106 91 Stockholm, Sweden
| | - Lawrence P Wackett
- Biotechnology Institute, University of Minnesota, Saint Paul, Minnesota 55108, United States
| | - Michael Zimmermann
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
| | - Jörg E Drewes
- Chair of Urban Water Systems Engineering, Technical University of Munich, 85748 Garching, Germany
| |
|
19
|
Habib MAH, Ismail MN. Extraction and identification of biologically important proteins from the medicinal plant God's crown (Phaleria macrocarpa). J Food Biochem 2021; 45:e13817. [PMID: 34137461 DOI: 10.1111/jfbc.13817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2021] [Revised: 05/24/2021] [Accepted: 05/28/2021] [Indexed: 11/30/2022]
Abstract
The fruit and leaf of God's crown (Phaleria macrocarpa) have been traditionally used to treat a wide variety of diseases. However, the proteins of this tropical plant are still heavily understudied. Three protein extraction methods, phenol (Phe), trichloroacetic acid (TCA)-acetone-phenol (TCA-A-Phe), and ultrasonic (Ult), were compared on the fruit and leaf of P. macrocarpa. The Phe extraction method showed the highest percentage of recovered protein after the resolubilization process for both leaf (12.24%) and fruit (30.41%) based on protein yields of the leaf (6.15 mg/g) and fruit (36.98 mg/g). Phe and TCA-A-Phe extraction methods gave well-resolved bands over a wide range of molecular weights through sodium dodecyl sulfate-polyacrylamide gel electrophoresis. Following liquid chromatography-tandem mass spectrometry analysis, proteins identified through the Phe extraction method were 30%-35% enzymatic proteins, including oxidoreductases, transferases, hydrolases, lyases, isomerases, and ligases that possess various biological functions. PRACTICAL APPLICATIONS: Every part of God's crown plant is traditionally consumed to treat various illnesses. While the plant's benefits are well known and have led to a plethora of health products, the proteome remains mostly unknown. This study compares three protein extraction methods for the leaf and fruit of P. macrocarpa and identifies their proteins through LC-MS/MS coupled with PEAKS. These method comparisons can be a guide for work on other plants as well. In addition, the proteomics data from this study may shed light on the functional properties of these plant parts and their products.
Affiliation(s)
- Mohd Afiq Hazlami Habib
- Analytical Biochemistry Research Centre (ABrC), Universiti Sains Malaysia (USM), Bayan Lepas, Penang, Malaysia
| | - Mohd Nazri Ismail
- Analytical Biochemistry Research Centre (ABrC), Universiti Sains Malaysia (USM), Bayan Lepas, Penang, Malaysia
- Institute For Research in Molecular Medicine (INFORMM), Universiti Sains Malaysia (USM), Bayan Lepas, Penang, Malaysia
| |
|
20
|
Jiang J, Liu LP, Hassoun S. Learning graph representations of biochemical networks and its application to enzymatic link prediction. Bioinformatics 2021; 37:793-799. [PMID: 33051674 PMCID: PMC8097755 DOI: 10.1093/bioinformatics/btaa881] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/25/2020] [Revised: 08/01/2020] [Accepted: 09/29/2020] [Indexed: 11/20/2022] Open
Abstract
Motivation The complete characterization of enzymatic activities between molecules remains incomplete, hindering biological engineering and limiting biological discovery. We develop in this work a technique, enzymatic link prediction (ELP), for predicting the likelihood of an enzymatic transformation between two molecules. ELP models enzymatic reactions cataloged in the KEGG database as a graph. ELP is innovative over prior works in using graph embedding to learn molecular representations that capture not only molecular and enzymatic attributes but also graph connectivity. Results We explore transductive (test nodes included in the training graph) and inductive (test nodes not part of the training graph) learning models. We show that ELP achieves high AUC when learning node embeddings using both graph connectivity and node attributes. Further, we show that graph embedding improves link prediction by 30% in area under curve over fingerprint-based similarity approaches and by 8% over support vector machines. We compare ELP against rule-based methods. We also evaluate ELP for predicting links in pathway maps and for reconstruction of edges in reaction networks of four common gut microbiota phyla: actinobacteria, bacteroidetes, firmicutes and proteobacteria. To emphasize the importance of graph embedding in the context of biochemical networks, we illustrate how graph embedding can guide visualization. Availability and implementation The code and datasets are available through https://github.com/HassounLab/ELP.
Affiliation(s)
- Julie Jiang
- Department of Computer Science, Tufts University, Medford 02155, USA
| | - Li-Ping Liu
- Department of Computer Science, Tufts University, Medford 02155, USA
| | - Soha Hassoun
- Department of Computer Science, Tufts University, Medford 02155, USA
- Department of Chemical and Biological Engineering, Tufts University, Medford 02155, USA
| |
|
21
|
Erhardt P, Bachmann K, Birkett D, Boberg M, Bodor N, Gibson G, Hawkins D, Hawksworth G, Hinson J, Koehler D, Kress B, Luniwal A, Masumoto H, Novak R, Portoghese P, Sarver J, Serafini MT, Trabbic C, Vermeulen N, Wrighton S. Glossary and tutorial of xenobiotic metabolism terms used during small molecule drug discovery and development (IUPAC Technical Report). PURE APPL CHEM 2021. [DOI: 10.1515/pac-2018-0208] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/26/2022]
Abstract
This project originated more than 15 years ago with the intent to produce a glossary of drug metabolism terms having definitions especially applicable for use by practicing medicinal chemists. A first-draft version underwent extensive beta-testing that, fortuitously, engaged international audiences in a wide range of disciplines involved in drug discovery and development. It became clear that the inclusion of information to enhance discussions among this mix of participants would be even more valuable. The present version retains a chemical structure theme while expanding tutorial comments that aim to bridge the various perspectives that may arise during interdisciplinary communications about a given term. This glossary is intended to be educational for early stage researchers, as well as useful for investigators at various levels who participate on today’s highly multidisciplinary, collaborative small molecule drug discovery teams.
Affiliation(s)
- Paul Erhardt
- Center for Drug Design and Development, University of Toledo, Toledo, Ohio, USA
| | | | - Donald Birkett
- Department of Clinical Pharmacology, Flinders University, Adelaide, Australia (now Emeritus), (TGM)
| | - Michael Boberg
- Metabolism and Isotope Chemistry, Bayer, AG, Germany (now undetermined), (TGM)
| | - Nicholas Bodor
- Center for Drug Discovery, University of Florida, Belle Glade, FL, USA (now Emeritus Grad Res Prof/CEO Bodor Labs), (TGM)
| | - Gordon Gibson
- School of Biomedical and Life Sciences, University of Surrey, Surrey, UK (now deceased), (TGM)
| | - David Hawkins
- Huntingdon Life Sciences, Huntingdon, UK (now retired), (TGM)
| | - Gabrielle Hawksworth
- Department of Medicine and Therapeutics, University Aberdeen, Aberdeen, UK (now deceased), (TGM)
| | - Jack Hinson
- Division of Toxicology, University Arkansas for Medical Sciences, Little Rock, Arkansas, USA (now Emeritus Dist Prof), (TGM)
| | - Daniel Koehler
- Department of Pharmacology, University of Toledo, Toledo, Ohio, USA, (ST)
| | - Brian Kress
- Department of Medicinal and Biological Chemistry, University of Toledo, Toledo, Ohio, USA, (ST)
| | | | - Hiroshi Masumoto
- Drug Metabolism, Daiichi Pharm. Corp., Ltd., Chuo, Tokyo, Japan (now retired), (TGM)
| | - Raymond Novak
- Institute of Environmental Health Science, Wayne State University, Detroit, Michigan, USA (now undetermined), (TGM)
| | - Phillip Portoghese
- Department of Medicinal Chemistry, University of Minnesota, Minneapolis, Minnesota, USA (now same), (TGM)
| | - Jeffrey Sarver
- Department of Pharmacology, University of Toledo, Toledo, Ohio, USA, (ST)
| | - M. Teresa Serafini
- Department of Pharmacokinetics and Drug Metabolism, Laboratories Dr. Esteve, S.A., Barcelona, Spain (now Head Early ADME), (TGM)
| | | | - Nico Vermeulen
- Department of Pharmacochemistry, Vrije University, Amsterdam, Netherlands (now Emeritus Section Molecular Toxicology), (TGM)
| | - Steven Wrighton
- Eli Lilly, Inc., Indianapolis, Indiana, USA (now retired), (TGM)
22
Hafner J, Payne J, MohammadiPeyhani H, Hatzimanikatis V, Smolke C. A computational workflow for the expansion of heterologous biosynthetic pathways to natural product derivatives. Nat Commun 2021; 12:1760. [PMID: 33741955 PMCID: PMC7979880 DOI: 10.1038/s41467-021-22022-5] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 08/14/2020] [Accepted: 02/24/2021] [Indexed: 01/31/2023] Open
Abstract
Plant natural products (PNPs) and their derivatives are important but underexplored sources of pharmaceutical molecules. To access this untapped potential, the reconstitution of heterologous PNP biosynthesis pathways in engineered microbes provides a valuable starting point to explore and produce novel PNP derivatives. Here, we introduce a computational workflow to systematically screen the biochemical vicinity of a biosynthetic pathway for pharmaceutical compounds that could be produced by derivatizing pathway intermediates. We apply our workflow to the biosynthetic pathway of noscapine, a benzylisoquinoline alkaloid (BIA) with a long history of medicinal use. Our workflow identifies pathways and enzyme candidates for the production of (S)-tetrahydropalmatine, a known analgesic and anxiolytic, and three additional derivatives. We then construct pathways for these compounds in yeast, resulting in platforms for de novo biosynthesis of BIA derivatives and demonstrating the value of cheminformatic tools to predict reactions, pathways, and enzymes in synthetic biology and metabolic engineering.
Affiliation(s)
- Jasmin Hafner
- Laboratory of Computational Systems Biotechnology, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
- James Payne
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Homa MohammadiPeyhani
- Laboratory of Computational Systems Biotechnology, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
- Vassily Hatzimanikatis
- Laboratory of Computational Systems Biotechnology, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
- Christina Smolke
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
23
Lipid Traffic Analysis reveals the impact of high paternal carbohydrate intake on offsprings' lipid metabolism. Commun Biol 2021; 4:163. [PMID: 33547386 PMCID: PMC7864968 DOI: 10.1038/s42003-021-01686-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 08/26/2020] [Accepted: 01/08/2021] [Indexed: 12/12/2022] Open
Abstract
In this paper we present an investigation of parental-diet-driven metabolic programming in offspring using a novel computational network analysis tool. The impact of high paternal carbohydrate intake on offsprings’ phospholipid and triglyceride metabolism in F1 and F2 generations is described. Detailed lipid profiles were acquired from F1 neonate (3 weeks), F1 adult (16 weeks) and F2 neonate offspring in serum, liver, brain, heart and abdominal adipose tissues by MS and NMR. Using a purpose-built computational tool for analysing both phospholipid and fat metabolism as a network, we characterised the number, type and abundance of lipid variables in and between tissues (Lipid Traffic Analysis), finding a variety of reprogrammings associated with paternal diet. These results are important because they describe the long-term metabolic result of dietary intake by fathers. This analytical approach is important because it offers unparalleled insight into possible mechanisms for alterations in lipid metabolism throughout organisms. Furse et al. use a purpose-built computational tool called Lipid Traffic Analysis to determine the spatial distribution of lipids throughout an organism. They use it to show that high paternal carbohydrate intake influences lipid metabolism in offspring two generations hence.
24
Otero-Muras I, Carbonell P. Automated engineering of synthetic metabolic pathways for efficient biomanufacturing. Metab Eng 2020; 63:61-80. [PMID: 33316374 DOI: 10.1016/j.ymben.2020.11.012] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/22/2020] [Revised: 11/15/2020] [Accepted: 11/20/2020] [Indexed: 12/19/2022]
Abstract
Metabolic engineering involves the engineering and optimization of processes from single-cell to fermentation in order to increase production of valuable chemicals for health, food, energy, materials and others. A systems approach to metabolic engineering has gained traction in recent years thanks to advances in strain engineering, leading to an accelerated scaling from rapid prototyping to industrial production. Metabolic engineering is nowadays on track towards a truly manufacturing technology, with reduced times from conception to production enabled by automated protocols for DNA assembly of metabolic pathways in engineered producer strains. In this review, we discuss how the success of the metabolic engineering pipeline often relies on retrobiosynthetic protocols able to identify promising production routes and dynamic regulation strategies through automated biodesign algorithms, which are subsequently assembled as embedded integrated genetic circuits in the host strain. Those approaches are orchestrated by an experimental design strategy that provides optimal scheduling planning of the DNA assembly, rapid prototyping and, ultimately, brings forward an accelerated Design-Build-Test-Learn cycle and the overall optimization of the biomanufacturing process. Achieving such a vision will address the increasingly compelling demand in our society for delivering valuable biomolecules in an affordable, inclusive and sustainable bioeconomy.
Affiliation(s)
- Irene Otero-Muras
- BioProcess Engineering Group, IIM-CSIC, Spanish National Research Council, Vigo, 36208, Spain
- Pablo Carbonell
- Institute of Industrial Control Systems and Computing (ai2), Universitat Politècnica de València, 46022, Spain
25
Carbonell P, Le Feuvre R, Takano E, Scrutton NS. In silico design and automated learning to boost next-generation smart biomanufacturing. Synth Biol (Oxf) 2020; 5:ysaa020. [PMID: 33344778 PMCID: PMC7737007 DOI: 10.1093/synbio/ysaa020] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 05/29/2020] [Revised: 09/08/2020] [Accepted: 09/28/2020] [Indexed: 02/07/2023] Open
Abstract
The increasing demand for bio-based compounds produced from waste or sustainable sources is driving biofoundries to deliver a new generation of prototyping biomanufacturing platforms. Integration and automation of the design, build, test and learn (DBTL) steps in centers like SYNBIOCHEM in Manchester and across the globe (Global Biofoundries Alliance) are helping to reduce the delivery time from initial strain screening and prototyping towards industrial production. Notably, a portfolio of producer strains for a suite of material monomers was recently developed, some approaching industrial titers, in a tour de force by the Manchester Centre that was achieved in less than 90 days. New in silico design tools are providing significant contributions to the front end of the DBTL pipelines. At the same time, the far-reaching initiatives of modern biofoundries are generating a large amount of high-dimensional data and knowledge that can be integrated through automated learning to expedite the DBTL cycle. In this Perspective, the new design tools and the role of the learning component as an enabling technology for the next generation of automated biofoundries are discussed. Future biofoundries will operate under completely automated DBTL cycles driven by in silico optimal experimental planning, full biomanufacturing devices connectivity, virtualization platforms and cloud-based design. The automated generation of robotic build worklists and the integration of machine-learning algorithms will collectively allow high levels of adaptability and rapid design changes toward fully automated smart biomanufacturing.
Affiliation(s)
- Pablo Carbonell
- Manchester Synthetic Biology Research Centre for Fine and Speciality Chemicals (SYNBIOCHEM) and Future Biomanufacturing Research Hub, Manchester Institute of Biotechnology, The University of Manchester, Manchester M1 7DN, UK
- Instituto Universitario de Automática e Informática Industrial, Universitat Politècnica de València, 46022 Valencia, Spain
- Rosalind Le Feuvre
- Manchester Synthetic Biology Research Centre for Fine and Speciality Chemicals (SYNBIOCHEM) and Future Biomanufacturing Research Hub, Manchester Institute of Biotechnology, The University of Manchester, Manchester M1 7DN, UK
- Eriko Takano
- Manchester Synthetic Biology Research Centre for Fine and Speciality Chemicals (SYNBIOCHEM) and Future Biomanufacturing Research Hub, Manchester Institute of Biotechnology, The University of Manchester, Manchester M1 7DN, UK
- Nigel S Scrutton
- Manchester Synthetic Biology Research Centre for Fine and Speciality Chemicals (SYNBIOCHEM) and Future Biomanufacturing Research Hub, Manchester Institute of Biotechnology, The University of Manchester, Manchester M1 7DN, UK
26
Sun D, Cheng X, Tian Y, Ding S, Zhang D, Cai P, Hu QN. EnzyMine: a comprehensive database for enzyme function annotation with enzymatic reaction chemical feature. Database (Oxford) 2020; 2023:baaa065. [PMID: 33002112 PMCID: PMC10755256 DOI: 10.1093/database/baaa065] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/27/2020] [Revised: 07/19/2020] [Accepted: 07/24/2020] [Indexed: 11/14/2022]
Abstract
Addition of chemical structural information in enzymatic reactions has proven to be significant for accurate enzyme function prediction. However, such chemical data lack systematic feature mining and hardly exist in enzyme-related databases. Therefore, global mining of enzymatic reactions will offer a unique landscape for researchers to understand the basic functional mechanisms of natural bioprocesses and facilitate enzyme function annotation. Here, we established a new knowledge base called EnzyMine, through which we propose to elucidate enzymatic reaction features and then link them with sequence and structural annotations. EnzyMine represents an advanced database that extends enzyme knowledge by incorporating reaction chemical feature strategies, strengthening the connectivity between enzyme and metabolic reactions. Therefore, it has the potential to reveal many new metabolic pathways involved with given enzymes, as well as expand enzyme function annotation. Database URL: http://www.rxnfinder.org/enzymine/.
Affiliation(s)
- Dandan Sun
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China
- Xingxiang Cheng
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China
- Yu Tian
- School of Biology and Pharmaceutical Engineering, Wuhan Polytechnic University, Wuhan, Hubei 430023, China
- Shaozhen Ding
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China
- Dachuan Zhang
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China
- Pengli Cai
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, P. R. China
- Qian-nan Hu
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China
27
Chen F, Yuan L, Ding S, Tian Y, Hu QN. Data-driven rational biosynthesis design: from molecules to cell factories. Brief Bioinform 2020; 21:1238-1248. [PMID: 31243440 DOI: 10.1093/bib/bbz065] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/25/2019] [Revised: 04/28/2019] [Accepted: 05/08/2019] [Indexed: 11/12/2022] Open
Abstract
A proliferation of chemical, reaction and enzyme databases, new computational methods and software tools for data-driven rational biosynthesis design have emerged in recent years. With the coming of the era of big data, particularly in the bio-medical field, data-driven rational biosynthesis design could potentially be useful to construct target-oriented chassis organisms. Engineering the complicated metabolic systems of chassis organisms to biosynthesize target molecules from inexpensive biomass is the main goal of cell factory design. The process of data-driven cell factory design could be divided into several parts: (1) target molecule selection; (2) metabolic reaction and pathway design; (3) prediction of novel enzymes based on protein domain and structure transformation of biosynthetic reactions; (4) construction of large-scale DNA for metabolic pathways; and (5) DNA assembly methods and visualization tools. The construction of a one-stop cell factory system could achieve automated design from the molecule level to the chassis level. In this article, we outline data-driven rational biosynthesis design steps and provide an overview of related tools in individual steps.
Affiliation(s)
- Fu Chen
- College of Biotechnology, Tianjin University of Science and Technology, Tianjin, People's Republic of China
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, People's Republic of China
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, People's Republic of China
- Le Yuan
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, People's Republic of China
- University of Chinese Academy of Sciences, Beijing, People's Republic of China
- Shaozhen Ding
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, People's Republic of China
- Yu Tian
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, People's Republic of China
- University of Chinese Academy of Sciences, Beijing, People's Republic of China
- Qian-Nan Hu
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, People's Republic of China
28
Sarker B, Ritchie DW, Aridhi S. GrAPFI: predicting enzymatic function of proteins from domain similarity graphs. BMC Bioinformatics 2020; 21:168. [PMID: 32349654 PMCID: PMC7191693 DOI: 10.1186/s12859-020-3460-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/13/2019] [Accepted: 03/19/2020] [Indexed: 01/20/2023] Open
Abstract
An amendment to this paper has been published and can be accessed via the original article.
Affiliation(s)
- Bishnu Sarker
- University of Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France
- David W Ritchie
- University of Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France
- Sabeur Aridhi
- University of Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France
29
Holliday GL, Brown SD, Mischel D, Polacco BJ, Babbitt PC. A strategy for large-scale comparison of evolutionary- and reaction-based classifications of enzyme function. Database (Oxford) 2020; 2020:baaa034. [PMID: 32449511 PMCID: PMC7246345 DOI: 10.1093/database/baaa034] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/05/2019] [Revised: 03/18/2020] [Accepted: 04/27/2020] [Indexed: 12/12/2022]
Abstract
Determining the molecular function of enzymes discovered by genome sequencing represents a primary foundation for understanding many aspects of biology. Historically, classification of enzyme reactions has used the enzyme nomenclature system developed to describe the overall reactions performed by biochemically characterized enzymes, irrespective of their associated sequences. In contrast, functional classification and assignment for the millions of protein sequences of unknown function now available is largely done in two computational steps, first by similarity-based assignment of newly obtained sequences to homologous groups, followed by transferring to them the known functions of similar biochemically characterized homologs. Due to the fundamental differences in their etiologies and practice, 'how' these chemistry- and evolution-centric functional classification systems relate to each other has been difficult to explore on a large scale. To investigate this issue in a new way, we integrated two published ontologies that had previously described each of these classification systems independently. The resulting infrastructure was then used to compare the functional assignments obtained from each classification system for the well-studied and functionally diverse enolase superfamily. Mapping these function assignments to protein structure and reaction similarity networks shows a profound and complex disconnect between the homology- and chemistry-based classification systems. This conclusion mirrors previous observations suggesting that except for closely related sequences, facile annotation transfer from small numbers of characterized enzymes to the huge number of uncharacterized homologs to which they are related is problematic.
Our extension of these comparisons to large enzyme superfamilies in a computationally intelligent manner provides a foundation for new directions in protein function prediction for the huge proportion of sequences of unknown function represented in major databases. Interactive sequence, reaction, substrate and product similarity networks computed for this work for the enolase and two other superfamilies are freely available for download from the Structure Function Linkage Database Archive (http://sfld.rbvi.ucsf.edu).
Affiliation(s)
- Gemma L Holliday
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, 1700 4th Street, CA 94143, USA
- Present Address: Medicines Discovery Catapult, Mereside, Alderley Park, Alderley Edge SK10 4TG, UK
- Shoshana D Brown
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, 1700 4th Street, CA 94143, USA
- David Mischel
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, 1700 4th Street, CA 94143, USA
- Benjamin J Polacco
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, 1700 4th Street, CA 94143, USA
- Patricia C Babbitt
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, 1700 4th Street, CA 94143, USA
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, 1700 4th Street, CA 94143, USA
- Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, 1700 4th Street, CA 94143, USA
30
Chung NC, Miasojedow B, Startek M, Gambin A. Jaccard/Tanimoto similarity test and estimation methods for biological presence-absence data. BMC Bioinformatics 2019; 20:644. [PMID: 31874610 PMCID: PMC6929325 DOI: 10.1186/s12859-019-3118-5] [Citation(s) in RCA: 73] [Impact Index Per Article: 14.6] [Received: 09/19/2019] [Accepted: 09/27/2019] [Indexed: 11/12/2022] Open
Abstract
Background: A survey of presences and absences of specific species across multiple biogeographic units (or bioregions) are used in a broad area of biological studies from ecology to microbiology. Using binary presence-absence data, we evaluate species co-occurrences that help elucidate relationships among organisms and environments. To summarize similarity between occurrences of species, we routinely use the Jaccard/Tanimoto coefficient, which is the ratio of their intersection to their union. It is natural, then, to identify statistically significant Jaccard/Tanimoto coefficients, which suggest non-random co-occurrences of species. However, statistical hypothesis testing using this similarity coefficient has been seldom used or studied. Results: We introduce a hypothesis test for similarity for biological presence-absence data, using the Jaccard/Tanimoto coefficient. Several key improvements are presented including unbiased estimation of expectation and centered Jaccard/Tanimoto coefficients, that account for occurrence probabilities. The exact and asymptotic solutions are derived. To overcome a computational burden due to high-dimensionality, we propose the bootstrap and measurement concentration algorithms to efficiently estimate statistical significance of binary similarity. Comprehensive simulation studies demonstrate that our proposed methods produce accurate p-values and false discovery rates. The proposed estimation methods are orders of magnitude faster than the exact solution, particularly with an increasing dimensionality. We showcase their applications in evaluating co-occurrences of bird species in 28 islands of Vanuatu and fish species in 3347 freshwater habitats in France. The proposed methods are implemented in an open source R package called jaccard (https://cran.r-project.org/package=jaccard).
Conclusion: We introduce a suite of statistical methods for the Jaccard/Tanimoto similarity coefficient for binary data, that enable straightforward incorporation of probabilistic measures in analysis for species co-occurrences. Due to their generality, the proposed methods and implementations are applicable to a wide range of binary data arising from genomics, biochemistry, and other areas of science.
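The abstract above defines the Jaccard/Tanimoto coefficient as the ratio of intersection to union for presence-absence data. A minimal Python sketch of that definition follows, together with a generic permutation test for significance. The permutation test is a plain stand-in chosen for illustration and is not the exact, asymptotic, bootstrap, or measurement concentration method of the cited paper; the function names, the convention of returning 0 for two all-absent vectors, and the example data are assumptions.

```python
import random


def jaccard(x, y):
    """Jaccard/Tanimoto coefficient of two equal-length 0/1 vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    both = sum(1 for a, b in zip(x, y) if a and b)    # size of the intersection
    either = sum(1 for a, b in zip(x, y) if a or b)   # size of the union
    # Assumed convention: two all-absent vectors get similarity 0.
    return both / either if either else 0.0


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """One-sided p-value from shuffling y (generic permutation test,
    not the estimators described in the cited abstract)."""
    rng = random.Random(seed)
    observed = jaccard(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if jaccard(x, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical presence-absence of two species across eight sites.
species_a = [1, 1, 1, 0, 0, 0, 0, 0]
species_b = [1, 1, 0, 0, 0, 0, 0, 1]
similarity = jaccard(species_a, species_b)
p_value = permutation_p_value(species_a, species_b)
```

Here the two species share two sites out of four occupied by either, so the coefficient is 0.5; the p-value depends on the random shuffles and is best read as a rough estimate.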
Affiliation(s)
- Neo Christopher Chung
- Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Stefana Banacha 2, Warsaw, 02-097, Poland
- Błażej Miasojedow
- Institute of Mathematics, Polish Academy of Sciences, Jana i Jędrzeja Śniadeckich 8, Warsaw, 00-656, Poland
- Michał Startek
- Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Stefana Banacha 2, Warsaw, 02-097, Poland
- Anna Gambin
- Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Stefana Banacha 2, Warsaw, 02-097, Poland
31
Ribeiro AJM, Tyzack JD, Borkakoti N, Holliday GL, Thornton JM. A global analysis of function and conservation of catalytic residues in enzymes. J Biol Chem 2019; 295:314-324. [PMID: 31796628 DOI: 10.1074/jbc.rev119.006289] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Indexed: 11/06/2022] Open
Abstract
The catalytic residues of an enzyme comprise the amino acids located in the active center responsible for accelerating the enzyme-catalyzed reaction. These residues lower the activation energy of reactions by performing several catalytic functions. Decades of enzymology research has established general themes regarding the roles of specific residues in these catalytic reactions, but it has been more difficult to explore these roles in a more systematic way. Here, we review the data on the catalytic residues of 648 enzymes, as annotated in the Mechanism and Catalytic Site Atlas (M-CSA), and compare our results with those in previous studies. We structured this analysis around three key properties of the catalytic residues: amino acid type, catalytic function, and sequence conservation in homologous proteins. As expected, we observed that catalysis is mostly accomplished by a small set of residues performing a limited number of catalytic functions. Catalytic residues are typically highly conserved, but to a smaller degree in homologues that perform different reactions or are nonenzymes (pseudoenzymes). Cross-analysis yielded further insights revealing which residues perform particular functions and how often. We obtained more detailed specificity rules for certain functions by identifying the chemical group upon which the residue acts. Finally, we show the mutation tolerance of the catalytic residues based on their roles. The characterization of the catalytic residues, their functions, and conservation, as presented here, is key to understanding the impact of mutations in evolution, disease, and enzyme design. The tools developed for this analysis are available at the M-CSA website and allow for user specific analysis of the same data.
Affiliation(s)
- António J M Ribeiro
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Jonathan D Tyzack
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Neera Borkakoti
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Gemma L Holliday
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Janet M Thornton
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
32
Carboxylic Ester Hydrolases in Bacteria: Active Site, Structure, Function and Application. Crystals 2019. [DOI: 10.3390/cryst9110597] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 12/17/2022]
Abstract
Carboxylic ester hydrolases (CEHs), which catalyze the hydrolysis of carboxylic esters to produce alcohol and acid, are identified in three domains of life. In the Protein Data Bank (PDB), 136 crystal structures of bacterial CEHs (424 PDB codes) from 52 genera and metagenome have been reported. In this review, we categorize these structures based on catalytic machinery, structure and substrate specificity to provide a comprehensive understanding of the bacterial CEHs. CEHs use Ser, Asp or water as a nucleophile to drive diverse catalytic machinery. The α/β/α sandwich architecture is most frequently found in CEHs, but 3-solenoid, β-barrel, up-down bundle, α/β/β/α 4-layer sandwich, 6 or 7 propeller and α/β barrel architectures are also found in these CEHs. Most are substrate-specific to various esters with types of head group and lengths of the acyl chain, but some CEHs exhibit peptidase or lactamase activities. CEHs are widely used in industrial applications, and are the objects of research in structure- or mutation-based protein engineering. Structural studies of CEHs are still necessary for understanding their biological roles, identifying their structure-based functions and structure-based engineering and their potential industrial applications.
33
Tyzack JD, Ribeiro AJM, Borkakoti N, Thornton JM. Transform-MinER: transforming molecules in enzyme reactions. Bioinformatics 2019; 34:3597-3599. [PMID: 29762650 PMCID: PMC6184704 DOI: 10.1093/bioinformatics/bty394] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 02/27/2018] [Accepted: 05/09/2018] [Indexed: 11/12/2022] Open
Abstract
Motivation: One goal of synthetic biology is to make new enzymes to generate new products, but identifying the starting enzymes for further investigation is often elusive and relies on expert knowledge, intensive literature searching and trial and error. Results: We present Transform Molecules in Enzyme Reactions, an online computational tool that transforms query substrate molecules into products using enzyme reactions. The most similar native enzyme reactions for each transformation are found, highlighting those that may be of most interest for enzyme design and directed evolution approaches. Availability and implementation: https://www.ebi.ac.uk/thornton-srv/transform-miner
|
34
|
Ribeiro AJM, Holliday GL, Furnham N, Tyzack JD, Ferris K, Thornton JM. Mechanism and Catalytic Site Atlas (M-CSA): a database of enzyme reaction mechanisms and active sites. Nucleic Acids Res 2019; 46:D618-D623. [PMID: 29106569 PMCID: PMC5753290 DOI: 10.1093/nar/gkx1012] [Citation(s) in RCA: 111] [Impact Index Per Article: 22.2] [Received: 09/15/2017] [Accepted: 10/13/2017] [Indexed: 12/28/2022]
Abstract
M-CSA (Mechanism and Catalytic Site Atlas) is a database of enzyme active sites and reaction mechanisms that can be accessed at www.ebi.ac.uk/thornton-srv/m-csa. Our objectives with M-CSA are to provide an open data resource for the community to browse known enzyme reaction mechanisms and catalytic sites, and to use the dataset to understand enzyme function and evolution. M-CSA results from the merging of two existing databases, MACiE (Mechanism, Annotation and Classification in Enzymes), a database of enzyme mechanisms, and CSA (Catalytic Site Atlas), a database of catalytic sites of enzymes. We are releasing M-CSA as a new website and underlying database architecture. At the moment, M-CSA contains 961 entries, 423 of these with detailed mechanism information, and 538 with information on the catalytic site residues only. In total, these cover 81% (195/241) of third level EC numbers with a PDB structure, and 30% (840/2793) of fourth level EC numbers with a PDB structure, out of 6028 in total. By searching for close homologues, we are able to extend M-CSA coverage of PDB and UniProtKB to 51 993 structures and to over five million sequences, respectively, of which about 40% and 30% have a conserved active site.
Affiliation(s)
- António J M Ribeiro
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Gemma L Holliday
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Nicholas Furnham
- Department of Pathogen Molecular Biology, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 1HT, UK
| | - Jonathan D Tyzack
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Katherine Ferris
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Janet M Thornton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| |
|
35
|
Automatic mapping of atoms across both simple and complex chemical reactions. Nat Commun 2019; 10:1434. [PMID: 30926819 PMCID: PMC6441094 DOI: 10.1038/s41467-019-09440-2] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 08/12/2018] [Accepted: 03/01/2019] [Indexed: 11/08/2022]
Abstract
Mapping atoms across chemical reactions is important for substructure searches, automatic extraction of reaction rules, identification of metabolic pathways, and more. Unfortunately, the existing mapping algorithms can deal adequately only with relatively simple reactions but not those in which expert chemists would benefit from a computer's help. Here we report how a combination of algorithmics and expert chemical knowledge significantly improves the performance of atom mapping, allowing the machine to deal with even the most mechanistically complex chemical and biochemical transformations. The key feature of our approach is the use of few but judiciously chosen reaction templates that are used to generate plausible "intermediate" atom assignments which then guide a graph-theoretical algorithm towards the chemically correct isomorphic mappings. The algorithm performs significantly better than the available state-of-the-art reaction mappers, suggesting its uses in database curation, mechanism assignments, and, above all, machine extraction of reaction rules underlying modern synthesis-planning programs.
|
36
|
Enzyme annotation for orphan and novel reactions using knowledge of substrate reactive sites. Proc Natl Acad Sci U S A 2019; 116:7298-7307. [PMID: 30910961 PMCID: PMC6462048 DOI: 10.1073/pnas.1818877116] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Indexed: 11/18/2022]
Abstract
Recent advances in synthetic biochemistry have resulted in a wealth of novel hypothetical enzymatic reactions that are not matched to protein-encoding genes, deeming them “orphan.” A large number of known metabolic enzymes are also orphan, leaving important gaps in metabolic network maps. Proposing genes for the catalysis of orphan reactions is critical for applications ranging from biotechnology to medicine. In this work, the computational method BridgIT identified potential enzymes of orphan reactions and nearly all theoretically possible biochemical transformations, providing candidate genes to catalyze these reactions to the research community. The BridgIT online tool will allow researchers to fill the knowledge gaps in metabolic networks and will act as a starting point for designing novel enzymes to catalyze nonnatural transformations.
Thousands of biochemical reactions with characterized activities are “orphan,” meaning they cannot be assigned to a specific enzyme, leaving gaps in metabolic pathways. Novel reactions predicted by pathway-generation tools also lack associated sequences, limiting protein engineering applications. Associating orphan and novel reactions with known biochemistry and suggesting enzymes to catalyze them is a daunting problem. We propose the method BridgIT to identify candidate genes and catalyzing proteins for these reactions. This method introduces information about the enzyme binding pocket into reaction-similarity comparisons. BridgIT assesses the similarity of two reactions, one orphan and one well-characterized nonorphan reaction, using their substrate reactive sites, their surrounding structures, and the structures of the generated products to suggest enzymes that catalyze the most-similar nonorphan reactions as candidates for also catalyzing the orphan ones. We performed two large-scale validation studies to test BridgIT predictions against experimental biochemical evidence. For the 234 orphan reactions from the Kyoto Encyclopedia of Genes and Genomes (KEGG) 2011 (a comprehensive enzymatic-reaction database) that became nonorphan in KEGG 2018, BridgIT predicted the exact or a highly related enzyme for 211 of them. Moreover, for 334 of 379 novel reactions in 2014 that were later cataloged in KEGG 2018, BridgIT predicted the exact or highly similar enzymes. BridgIT requires knowledge about only four connecting bonds around the atoms of the reactive sites to correctly annotate proteins for 93% of analyzed enzymatic reactions. Increasing to seven connecting bonds allowed for the accurate identification of a sequence for nearly all known enzymatic reactions.
|
37
|
|
38
|
Tyzack JD, Furnham N, Sillitoe I, Orengo CM, Thornton JM. Exploring Enzyme Evolution from Changes in Sequence, Structure, and Function. Methods Mol Biol 2019; 1851:263-275. [PMID: 30298402 DOI: 10.1007/978-1-4939-8736-8_14] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 06/08/2023]
Abstract
The goal of our research is to increase our understanding of how biology works at the molecular level, with a particular focus on how enzymes evolve their functions through adaptations to generate new specificities and mechanisms. FunTree (Sillitoe and Furnham, Nucleic Acids Res 44:D317-D323, 2016) is a resource that brings together sequence, structure, phylogenetic, and chemical and mechanistic information for 2340 CATH superfamilies (Sillitoe et al., Nucleic Acids Res 43:D376-D381, 2015) (which all contain at least one enzyme) to allow evolution to be investigated within a structurally defined superfamily. We will give an overview of FunTree's use of sequence and structural alignments to cluster proteins within a superfamily into structurally similar groups (SSGs) and generate phylogenetic trees augmented by ancestral character estimations (ACE). This core information is supplemented with new measures of functional similarity (Rahman et al., Nat Methods 11:171-174, 2014) to compare enzyme reactions based on overall bond changes, reaction centers (the local environment atoms involved in the reaction), and the structural similarities of the metabolites involved in the reaction. These trees are also decorated with taxonomic and Enzyme Commission (EC) code and GO annotations, forming the basis of a comprehensive web interface that can be found at http://www.funtree.info. In this chapter, we will discuss the various analyses and supporting computational tools in more detail, describing the steps required to extract information.
Affiliation(s)
| | | | - Ian Sillitoe
- Institute of Structural and Molecular Biology, University College London, London, UK
| | - Christine M Orengo
- Institute of Structural and Molecular Biology, University College London, London, UK
| | | |
|
39
|
Zhang J, Kwong S, Wong KC. ToBio: Global Pathway Similarity Search Based on Topological and Biological Features. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:336-349. [PMID: 29990160 DOI: 10.1109/tcbb.2017.2769642] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 06/08/2023]
Abstract
Pathway similarity search plays a vital role in the post-genomics era. Unfortunately, pathway similarity search involves the graph isomorphism problem which is NP-complete. Therefore, efficient search algorithms are desirable. In this work, we propose a novel global pathway similarity search approach named ToBio, which considers both topological and biological features for effective global pathway similarity search. Specifically, motivated by nature, ToBio considers various topological and biological features, including subgraph signature similarities, sequence similarities, and gene ontology similarities. Since different features carry different functional importance and dependences, we report three schemes of ToBio using different sets of features. In addition, to enhance the existing search algorithms for rigorous comparisons, post-processing pipelines are also proposed to investigate how different features can contribute to the search performance. ToBio and other state-of-the-art methods are benchmarked on the gold-standard pathway datasets from three species. The results demonstrate the competitive edge of ToBio over state-of-the-art methods, ranging from the topological aspects to the biological aspects. Case studies have been conducted to reveal mechanistic insights into the unique search performance of ToBio.
|
40
|
TAMMiCol: Tool for analysis of the morphology of microbial colonies. PLoS Comput Biol 2018; 14:e1006629. [PMID: 30507938 PMCID: PMC6292648 DOI: 10.1371/journal.pcbi.1006629] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 07/04/2018] [Revised: 12/13/2018] [Accepted: 11/08/2018] [Indexed: 01/21/2023]
Abstract
Many microbes are studied by examining colony morphology via two-dimensional top-down images. The quantification of such images typically requires each pixel to be labelled as belonging to either the colony or background, producing a binary image. While this may be achieved manually for a single colony, this process is infeasible for large datasets containing thousands of images. The software Tool for Analysis of the Morphology of Microbial Colonies (TAMMiCol) has been developed to efficiently and automatically convert colony images to binary. TAMMiCol exploits the structure of the images to choose a thresholding tolerance and produce a binary image of the colony. The images produced are shown to compare favourably with images processed manually, while TAMMiCol is shown to outperform standard segmentation methods. Multiple images may be imported together for batch processing, while the binary data may be exported as a CSV or MATLAB MAT file for quantification, or analysed using statistics built into the software. Using the in-built statistics, it is found that images produced by TAMMiCol yield values close to those computed from binary images processed manually. Analysis of a new large dataset using TAMMiCol shows that colonies of Saccharomyces cerevisiae reach a maximum level of filamentous growth once the concentration of ammonium sulfate is reduced to 200 μM. TAMMiCol is accessed through a graphical user interface, making it easy to use for those without specialist knowledge of image processing, statistical methods or coding.
Many microbes are studied by examining the colony morphology via a two-dimensional top-down image. In order to quantify such images, we typically need to label each pixel as belonging either to the colony or the background, creating a binary image. This task is laborious when performed manually and proves infeasible for large datasets. To overcome this, we have developed the software Tool for Analysis of the Morphology of Microbial Colonies (TAMMiCol), which automatically and efficiently converts colony images to binary. Multiple images may be imported and processed simultaneously, and TAMMiCol exploits the structure of the images to identify an appropriate threshold for the binary conversion of each image. The images produced by TAMMiCol, which take around 20 seconds each to process, compare favourably with images processed manually, which take anywhere up to 15 minutes, while TAMMiCol outperforms several standard image segmentation methods. After processing, the images may be exported as a CSV or MATLAB MAT file for further analysis, or may be quantified by TAMMiCol using the in-built statistics. Using TAMMiCol, we have found that colonies of S. cerevisiae reach a maximum level of filamentous growth once the concentration of ammonium sulfate is reduced to 200 μM.
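The pixel labelling described in this abstract can be illustrated with a minimal sketch. It is not TAMMiCol's algorithm: the abstract does not say how the threshold is chosen, so the sketch assumes a fixed, user-supplied global threshold, and the function names `to_binary` and `colony_area` and the toy image are hypothetical.

```python
# Minimal sketch of binary conversion by global thresholding.
# Assumption: a fixed threshold is supplied by the caller; TAMMiCol itself
# chooses its threshold from the image structure, which is not reproduced here.

def to_binary(image, threshold):
    """Label each pixel 1 (colony) if brighter than threshold, else 0 (background)."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]


def colony_area(binary):
    """Count the pixels labelled as colony."""
    return sum(sum(row) for row in binary)


# Toy 4x4 greyscale image: a bright 2x2 "colony" on a dark background.
image = [
    [10, 12, 11, 9],
    [11, 200, 210, 10],
    [12, 190, 205, 11],
    [9, 10, 12, 10],
]

binary = to_binary(image, threshold=100)
area = colony_area(binary)
print(binary)
print("colony pixels:", area)
```

A single global threshold is the simplest choice and fails on unevenly lit images, which is presumably why a tool in this area adapts the threshold per image.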
|
41
|
Li Y, Wang S, Umarov R, Xie B, Fan M, Li L, Gao X. DEEPre: sequence-based enzyme EC number prediction by deep learning. Bioinformatics 2018; 34:760-769. [PMID: 29069344 PMCID: PMC6030869 DOI: 10.1093/bioinformatics/btx680] [Citation(s) in RCA: 124] [Impact Index Per Article: 20.7] [Received: 07/13/2017] [Accepted: 10/20/2017] [Indexed: 11/15/2022]
Abstract
Motivation: Annotation of enzyme function has a broad range of applications, such as metagenomics, industrial biotechnology, and diagnosis of enzyme deficiency-caused diseases. However, the time and resources required make it prohibitively expensive to experimentally determine the function of every enzyme. Therefore, computational enzyme function prediction has become increasingly important. In this paper, we develop such an approach, determining the enzyme function by predicting the Enzyme Commission number. Results: We propose an end-to-end feature selection and classification model training approach, as well as an automatic and robust feature dimensionality uniformization method, DEEPre, in the field of enzyme function prediction. Instead of extracting manually crafted features from enzyme sequences, our model takes the raw sequence encoding as inputs, extracting convolutional and sequential features from the raw encoding based on the classification result to directly improve the prediction performance. The thorough cross-fold validation experiments conducted on two large-scale datasets show that DEEPre improves the prediction performance over the previous state-of-the-art methods. In addition, our server outperforms five other servers in determining the main class of enzymes on a separate low-homology dataset. Two case studies demonstrate DEEPre’s ability to capture the functional difference of enzyme isoforms. Availability and implementation: The server can be accessed freely at http://www.cbrc.kaust.edu.sa/DEEPre. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yu Li
- Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), Computer, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
| | - Sheng Wang
- Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), Computer, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
| | - Ramzan Umarov
- Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), Computer, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
| | - Bingqing Xie
- Computer Science Department, Illinois Institute of Technology, Chicago, IL 60616, USA
| | - Ming Fan
- Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou 310018, China
| | - Lihua Li
- Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou 310018, China
| | - Xin Gao
- Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), Computer, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
| |
|
42
|
Holliday GL, Akiva E, Meng EC, Brown SD, Calhoun S, Pieper U, Sali A, Booker SJ, Babbitt PC. Atlas of the Radical SAM Superfamily: Divergent Evolution of Function Using a "Plug and Play" Domain. Methods Enzymol 2018; 606:1-71. [PMID: 30097089 DOI: 10.1016/bs.mie.2018.06.004] [Citation(s) in RCA: 86] [Impact Index Per Article: 14.3] [Indexed: 01/02/2023]
Abstract
The radical SAM superfamily contains over 100,000 homologous enzymes that catalyze a remarkably broad range of reactions required for life, including metabolism, nucleic acid modification, and biogenesis of cofactors. While the highly conserved SAM-binding motif responsible for formation of the key 5'-deoxyadenosyl radical intermediate is a key structural feature that simplifies identification of superfamily members, our understanding of their structure-function relationships is complicated by the modular nature of their structures, which exhibit varied and complex domain architectures. To gain new insight about these relationships, we classified the entire set of sequences into similarity-based subgroups that could be visualized using sequence similarity networks. This superfamily-wide analysis reveals important features that had not previously been appreciated from studies focused on one or a few members. Functional information mapped to the networks indicates which members have been experimentally or structurally characterized, their known reaction types, and their phylogenetic distribution. Despite the biological importance of radical SAM chemistry, the vast majority of superfamily members have never been experimentally characterized in any way, suggesting that many new reactions remain to be discovered. In addition to 20 subgroups with at least one known function, we identified additional subgroups made up entirely of sequences of unknown function. Importantly, our results indicate that even general reaction types fail to track well with our sequence similarity-based subgroupings, raising major challenges for function prediction for currently identified and new members that continue to be discovered. Interactive similarity networks and other data from this analysis are available from the Structure-Function Linkage Database.
Affiliation(s)
- Gemma L Holliday
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, United States.
| | - Eyal Akiva
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, United States
| | - Elaine C Meng
- Resource for Biocomputing, Visualization, and Informatics, Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA, United States
| | - Shoshana D Brown
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, United States
| | - Sara Calhoun
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, United States; Graduate Program in Biophysics, University of California, San Francisco, CA, United States
| | - Ursula Pieper
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, United States
| | - Andrej Sali
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, United States; Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, CA, United States; Quantitative Biosciences Institute, University of California, San Francisco, CA, United States
| | - Squire J Booker
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, United States; Department of Chemistry, The Pennsylvania State University, University Park, PA, United States; The Howard Hughes Medical Institute, The Pennsylvania State University, University Park, PA, United States
| | - Patricia C Babbitt
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, United States; Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, CA, United States; Quantitative Biosciences Institute, University of California, San Francisco, CA, United States.
| |
|
43
|
Sivakumar TV, Bhaduri A, Duvvuru Muni RR, Park JH, Kim TY. SimCAL: a flexible tool to compute biochemical reaction similarity. BMC Bioinformatics 2018; 19:254. [PMID: 29969981 PMCID: PMC6029250 DOI: 10.1186/s12859-018-2248-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 08/18/2017] [Accepted: 06/14/2018] [Indexed: 11/29/2022]
Abstract
Background: Computation of reaction similarity is a prerequisite for several bioinformatics applications, including enzyme identification for specific biochemical reactions, enzyme classification and mining for specific inhibitors. Reaction similarity is often assessed at one of two levels: (i) comparison across all the constituent substrates and products of a reaction (reaction-level similarity), or (ii) comparison at the transformation center with various degrees of neighborhood (transformation-level similarity). Existing reaction similarity computation tools are designed for specific applications and use different features and similarity measures. A single system integrating these diverse features enables comparison of the impact of different molecular properties on similarity score computation. Results: To address these requirements, we present SimCAL, an integrated system to calculate reaction similarity with novel features and the capability to perform comparative assessment. SimCAL provides reaction similarity computation at both the whole-reaction level and the transformation level. Novel physicochemical features such as stereochemistry, mass, volume and charge are included in computing the reaction fingerprint. Users can choose from four different fingerprint types and nine molecular similarity measures. Further, a comparative assessment of these features is also enabled. The performance of SimCAL was assessed on 3,688,122 reaction pairs with Enzyme Commission (EC) numbers from MetaCyc, achieving an area under the curve (AUC) of > 0.9. In addition, SimCAL results showed strong correlation with the state-of-the-art EC-BLAST and molecular signature based reaction similarity methods. Conclusions: SimCAL is developed in Java and is available as a standalone tool with an intuitive, user-friendly graphical interface, and also as a console application. With its customizable feature selection and similarity calculations, it is expected to cater to a wide audience interested in studying and analyzing biochemical reactions and metabolic networks. Electronic supplementary material: The online version of this article (10.1186/s12859-018-2248-5) contains supplementary material, which is available to authorized users.
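As a rough illustration of fingerprint-based reaction similarity, the sketch below computes a Tanimoto coefficient between two reaction fingerprints represented as sets of feature labels. It is not SimCAL's implementation: the abstract does not name its fingerprint types or similarity measures, so the choice of Tanimoto, the function name `tanimoto`, and the feature strings are assumptions made for illustration only.

```python
# Minimal sketch: Tanimoto similarity between two set-based fingerprints.
# Assumption: each reaction is summarised as a set of feature labels; the
# labels below are invented examples, not SimCAL's actual features.

def tanimoto(fp_a, fp_b):
    """Return |A ∩ B| / |A ∪ B| for two feature collections."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        # Convention chosen here: two empty fingerprints score 0.0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical reaction-level fingerprints (features pooled over all
# substrates and products of each reaction).
reaction_1 = {"C-O", "C=O", "O-H", "C-C"}
reaction_2 = {"C-O", "C=O", "N-H", "C-C"}

score = tanimoto(reaction_1, reaction_2)
print(f"similarity = {score:.2f}")
```

Restricting the feature sets to atoms and bonds near the transformation center, instead of pooling over whole molecules, would give the transformation-level variant that the abstract contrasts with the reaction-level one.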
Affiliation(s)
| | - Anirban Bhaduri
- Bioinformatics Lab, Samsung Advanced Institute of Technology, Bangalore, 560037, India
| | | | - Jin Hwan Park
- Biomaterials Lab, Materials Center, Samsung Advanced Institute of Technology, Gyeonggi-do, 443803, South Korea
| | - Tae Yong Kim
- Biomaterials Lab, Materials Center, Samsung Advanced Institute of Technology, Gyeonggi-do, 443803, South Korea.
| |
|
44
|
Vazquez-Hernandez C, Loza A, Peguero-Sanchez E, Segovia L, Gutierrez-Rios RM. Identification of reaction organization patterns that naturally cluster enzymatic transformations. BMC Syst Biol 2018; 12:63. [PMID: 29848336 PMCID: PMC5977463 DOI: 10.1186/s12918-018-0583-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/13/2017] [Accepted: 05/09/2018] [Indexed: 11/10/2022]
Abstract
Background: Metabolic reactions are chemical transformations commonly catalyzed by enzymes. In recent years, the explosion of genomic data and individual experimental characterizations have contributed to the construction of databases and methodologies for the analysis of metabolic networks. Some methodologies based on graph theory organize compound networks into metabolic functional categories without preserving biochemical pathways. Other methods based on chemical group exchange and atom flow trace the conversion of substrates into products in detail, which is useful for inferring metabolic pathways. Methods: Here, we present a novel rule-based approach incorporating both methods that decomposes each reaction into architectures of compound pairs and loner compounds that can be organized into tree structures. We compared the tree structure-compound pairs to those reported in the KEGG-RPAIR dataset and obtained a match precision of 81%. The generated tree structures naturally clustered all reactions into general reaction patterns of compounds with similar chemical transformations. The match precision of each cluster was calculated and used to suggest reactant pairs for which manual curation can be avoided, which is the main goal of the method. We evaluated catalytic processes in the clusters based on Enzyme Commission categories, which revealed preferential use of enzyme classes. Conclusions: We demonstrate that the application of simple rules can enable the identification of reaction patterns reflecting metabolic reactions that transform substrates into products and the types of catalysis involved in these transformations. Our rule-based approach can be incorporated as the input in pathfinders or as a tool for the construction of reaction classifiers, indicating its usefulness for predicting enzyme catalysis.
Affiliation(s)
- Carlos Vazquez-Hernandez
- Departamento de Microbiología Molecular, Instituto de Biotecnología Universidad Nacional Autónoma de México, Apdo, Postal 510-3, 62250, Cuernavaca, Morelos, Mexico
| | - Antonio Loza
- Departamento de Microbiología Molecular, Instituto de Biotecnología Universidad Nacional Autónoma de México, Apdo, Postal 510-3, 62250, Cuernavaca, Morelos, Mexico
| | - Esteban Peguero-Sanchez
- Departamento de Microbiología Molecular, Instituto de Biotecnología Universidad Nacional Autónoma de México, Apdo, Postal 510-3, 62250, Cuernavaca, Morelos, Mexico
| | - Lorenzo Segovia
- Departamento de Ingeniería Celular y Biocatálisis, Instituto de Biotecnología Universidad Nacional Autónoma de México, Apdo, Postal 510-3, 62250, Cuernavaca, Morelos, Mexico
| | - Rosa-Maria Gutierrez-Rios
- Departamento de Microbiología Molecular, Instituto de Biotecnología Universidad Nacional Autónoma de México, Apdo, Postal 510-3, 62250, Cuernavaca, Morelos, Mexico.
| |
|
45
|
Abstract
Directed evolution (DE) is a powerful tool for optimizing an enzyme's properties toward a particular objective, such as broader substrate scope, greater thermostability, or increased kcat. A successful DE project requires the generation of genetic diversity and subsequent screening or selection to identify variants with improved fitness. In contrast to random methods (error-prone PCR or DNA shuffling), site-directed mutagenesis enables the rational design of variant libraries and provides control over the nature and frequency of the encoded mutations. Knowledge of protein structure, dynamics, enzyme mechanisms, and natural evolution demonstrates that multiple (combinatorial) mutations are required to discover the most improved variants. To this end, we describe an experimentally straightforward and low-cost method for the preparation of combinatorial variant libraries. Our approach employs a two-step PCR protocol, first producing mutagenic megaprimers, which can then be combined in a "mix-and-match" fashion to generate diverse sets of combinatorial variant libraries both quickly and accurately.
|
46
|
Plehiers PP, Marin GB, Stevens CV, Van Geem KM. Automated reaction database and reaction network analysis: extraction of reaction templates using cheminformatics. J Cheminform 2018. [PMID: 29524042 PMCID: PMC5845084 DOI: 10.1186/s13321-018-0269-8] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Indexed: 11/10/2022]
Abstract
Both the automated generation of reaction networks and the automated prediction of synthetic trees require, in one way or another, the definition of possible transformations a molecule can undergo. One way of doing this is by using reaction templates. In view of the expanding number of known reactions, it has become more and more difficult to envision all possible transformations that could occur in a studied system. Nonetheless, most reaction network generation tools rely on user-defined reaction templates. Not only does this limit the amount of chemistry that can be accounted for in the reaction networks, it also restricts the widespread use of the tools by a broad public. In retrosynthetic analysis, the quality of the analysis depends on what percentage of the known chemistry is accounted for. Using databases to identify templates is therefore crucial in this respect. For this purpose, an algorithm has been developed to extract reaction templates from various types of chemical databases. Some databases, such as the Kyoto Encyclopedia of Genes and Genomes and RMG, do not report an atom-atom mapping (AAM) for the reactions. This makes the extraction of a template non-straightforward. If no mapping is available, it is calculated by the Reaction Decoder Tool (RDT). With a correct AAM, either calculated by RDT or specified, the algorithm consistently extracts a correct template for a wide variety of reactions, both elementary and non-elementary. The developed algorithm is a first step towards data-driven generation of synthetic trees or reaction networks, and a greater accessibility for non-expert users.
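The role of the atom-atom mapping in template extraction can be sketched as follows. This is not the authors' algorithm: it only shows the common first step of locating the reaction center as the set of mapped atoms whose bonds differ between reactants and products. The bond-dictionary representation, the function names, and the hand-written ester hydrolysis example (hydrogens omitted) are assumptions for illustration.

```python
# Minimal sketch: find the reaction center from an atom-mapped reaction.
# Assumption: each side of the reaction is a dict mapping a sorted pair of
# atom-map numbers to a bond order; a missing pair means "no bond".

def changed_bonds(reactant_bonds, product_bonds):
    """Return {bond: (order_before, order_after)} for every bond that changes."""
    changes = {}
    for bond in set(reactant_bonds) | set(product_bonds):
        before = reactant_bonds.get(bond, 0)
        after = product_bonds.get(bond, 0)
        if before != after:
            changes[bond] = (before, after)
    return changes


def reaction_center(reactant_bonds, product_bonds):
    """Return the set of mapped atoms involved in any changed bond."""
    atoms = set()
    for atom_i, atom_j in changed_bonds(reactant_bonds, product_bonds):
        atoms.update((atom_i, atom_j))
    return atoms


# Hypothetical mapping for ester hydrolysis:
# 1 = carbonyl C, 2 = carbonyl O, 3 = ester O, 4 = alkyl C, 5 = water O.
reactants = {(1, 2): 2, (1, 3): 1, (3, 4): 1}  # ester; water has no heavy-atom bonds
products = {(1, 2): 2, (1, 5): 1, (3, 4): 1}   # acid and alcohol

changes = changed_bonds(reactants, products)
center = reaction_center(reactants, products)
print(changes)
print(sorted(center))
```

A template would then be built from this center plus some neighborhood of unchanged atoms; how far that neighborhood extends is a design choice the sketch leaves out.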
Affiliation(s)
- Pieter P Plehiers
- Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Ghent University, Technologiepark 914, 9052, Ghent, Belgium
- Guy B Marin
- Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Ghent University, Technologiepark 914, 9052, Ghent, Belgium
- Christian V Stevens
- SynBioC Research Group, Department of Sustainable Organic Chemistry and Technology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000, Ghent, Belgium
- Kevin M Van Geem
- Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Ghent University, Technologiepark 914, 9052, Ghent, Belgium
|
47
|
Carbonell P, Wong J, Swainston N, Takano E, Turner NJ, Scrutton NS, Kell DB, Breitling R, Faulon JL. Selenzyme: enzyme selection tool for pathway design. Bioinformatics 2018; 34:2153-2154. [PMID: 29425325 PMCID: PMC9881682 DOI: 10.1093/bioinformatics/bty065] [Citation(s) in RCA: 49] [Impact Index Per Article: 8.2] [Received: 09/19/2017] [Accepted: 02/06/2018] [Indexed: 02/02/2023] Open
Abstract
Summary: Synthetic biology applies the principles of engineering to biology in order to create biological functionalities not seen before in nature. One of the most exciting applications of synthetic biology is the design of new organisms with the ability to produce valuable chemicals, including pharmaceuticals and biomaterials, in a greener, sustainable fashion. Selecting the right enzymes to catalyze each reaction step in order to produce a desired target compound is, however, not trivial. Here, we present Selenzyme, a free online enzyme selection tool for metabolic pathway design. The user is guided through several decision steps in order to shortlist the best candidates for a given pathway step. The tool graphically presents key information about enzymes based on existing databases and tools, such as: similarity of sequences and of catalyzed reactions; phylogenetic distance between source organism and intended host species; multiple alignment highlighting conserved regions, predicted catalytic site, and active regions; and relevant properties such as predicted solubility and transmembrane regions. Selenzyme provides bespoke sequence selection for automated workflows in biofoundries. Availability and implementation: The tool is integrated as part of the pathway design stage into the design-build-test-learn SYNBIOCHEM pipeline. The Selenzyme web server is available at http://selenzyme.synbiochem.co.uk. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jerry Wong
- BBSRC/EPSRC Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology
- Neil Swainston
- BBSRC/EPSRC Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology
- Eriko Takano
- BBSRC/EPSRC Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, School of Chemistry, The University of Manchester, Manchester M1 7DN, UK
- Nicholas J Turner
- BBSRC/EPSRC Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, School of Chemistry, The University of Manchester, Manchester M1 7DN, UK
- Nigel S Scrutton
- BBSRC/EPSRC Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, School of Chemistry, The University of Manchester, Manchester M1 7DN, UK
- Douglas B Kell
- BBSRC/EPSRC Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, School of Chemistry, The University of Manchester, Manchester M1 7DN, UK
- Rainer Breitling
- BBSRC/EPSRC Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, School of Chemistry, The University of Manchester, Manchester M1 7DN, UK
- Jean-Loup Faulon
- BBSRC/EPSRC Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, School of Chemistry, The University of Manchester, Manchester M1 7DN, UK; MICALIS, INRA-AgroParisTech, Domaine de Vilvert, 78352 Jouy en Josas Cedex, France
|
48
|
Mallory EK, Acharya A, Rensi SE, Turnbaugh PJ, Bright RA, Altman RB. Chemical reaction vector embeddings: towards predicting drug metabolism in the human gut microbiome. Pacific Symposium on Biocomputing 2018; 23:56-67. [PMID: 29218869 PMCID: PMC5771676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
Bacteria in the human gut have the ability to activate, inactivate, and reactivate drugs with both intended and unintended effects. For example, the drug digoxin is reduced to the inactive metabolite dihydrodigoxin by the gut Actinobacterium E. lenta, and patients colonized with high levels of drug-metabolizing strains may have limited response to the drug. Understanding the complete space of drugs that are metabolized by the human gut microbiome is critical for predicting bacteria-drug relationships and their effects on individual patient response. Discovery and validation of drug metabolism via bacterial enzymes has yielded >50 drugs after nearly a century of experimental research. However, there are limited computational tools for screening drugs for potential metabolism by the gut microbiome. We developed a pipeline for comparing and characterizing chemical transformations using continuous vector representations of molecular structure learned using unsupervised representation learning. We applied this pipeline to chemical reaction data from MetaCyc to characterize the utility of vector representations for chemical reaction transformations. After clustering molecular and reaction vectors, we performed enrichment analyses and queries to characterize the space. We detected enriched enzyme names, Gene Ontology terms, and Enzyme Commission (EC) classes within reaction clusters. In addition, we queried reactions against drug-metabolite transformations known to be metabolized by the human gut microbiome. The top results for these known drug transformations contained similar substructure modifications to the original drug pair. This work enables high-throughput screening of drugs and their resulting metabolites against chemical reactions common to gut bacteria.
Affiliation(s)
- Emily K Mallory
- Biomedical Informatics Training Program, Stanford University, Stanford, CA 94305, USA
|
49
|
Carbonell P, Koch M, Duigou T, Faulon JL. Enzyme Discovery: Enzyme Selection and Pathway Design. Methods Enzymol 2018; 608:3-27. [PMID: 30173766 DOI: 10.1016/bs.mie.2018.04.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/19/2023]
Abstract
In this protocol, we describe in silico design methods that can assist in the engineering of production pathways that are based on enzymatic transformations. The described protocols are the basis for automated processes to be integrated into an iterative Design-Build-Test-Learn cycle in synthetic biology for chemical production. Selecting the right enzyme sequence for a desired biocatalytic activity from the extensive catalogue of sequences available in databases is challenging and can dramatically influence the success of bioproducing chemical compounds. A method for enzyme selection is presented that helps identify candidate enzyme sequences through a scoring approach that considers not only sequence homology but also reaction similarity. Selecting a viable biochemical pathway for compound production requires screening large sets of reactions in a process involving combinatorial complexity. A method for pathway design using retrosynthesis is presented. The protocol allows the discovery of alternative chemical pathways leading to the final product by using reaction rules of selectable degree of specificity. The protocols can be reversed through clustering discovery and product identification processes. The integration of these protocols into a general pipeline provides a toolbox for enhanced automated synthetic biology design and metabolic engineering.
Affiliation(s)
- Pablo Carbonell
- Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, The University of Manchester, Manchester, United Kingdom
- Mathilde Koch
- Micalis Institute, INRA, AgroParisTech, Université Paris-Saclay, Jouy-en-Josas, France
- Thomas Duigou
- Micalis Institute, INRA, AgroParisTech, Université Paris-Saclay, Jouy-en-Josas, France
- Jean-Loup Faulon
- Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, The University of Manchester, Manchester, United Kingdom; Micalis Institute, INRA, AgroParisTech, Université Paris-Saclay, Jouy-en-Josas, France; School of Chemistry, The University of Manchester, Manchester, United Kingdom
|
50
|
Delépine B, Duigou T, Carbonell P, Faulon JL. RetroPath2.0: A retrosynthesis workflow for metabolic engineers. Metab Eng 2018; 45:158-170. [DOI: 10.1016/j.ymben.2017.12.002] [Citation(s) in RCA: 128] [Impact Index Per Article: 21.3] [Received: 04/28/2017] [Revised: 11/03/2017] [Accepted: 12/05/2017] [Indexed: 12/01/2022]
|