1. Harding-Larsen D, Madsen CD, Teze D, Kittilä T, Langhorn MR, Gharabli H, Hobusch M, Otalvaro FM, Kırtel O, Bidart GN, Mazurenko S, Travnik E, Welner DH. GASP: A Pan-Specific Predictor of Family 1 Glycosyltransferase Acceptor Specificity Enabled by a Pipeline for Substrate Feature Generation and Large-Scale Experimental Screening. ACS Omega 2024; 9:27278-27288. [PMID: 38947828 PMCID: PMC11209901 DOI: 10.1021/acsomega.4c01583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Revised: 05/27/2024] [Accepted: 05/29/2024] [Indexed: 07/02/2024]
Abstract
Glycosylation represents a major chemical challenge; while it is one of the most common reactions in Nature, conventional chemistry struggles with stereochemistry, regioselectivity, and solubility issues. In contrast, family 1 glycosyltransferase (GT1) enzymes can glycosylate virtually any given nucleophilic group with perfect control over stereochemistry and regioselectivity. However, the appropriate catalyst for a given reaction needs to be identified among the tens of thousands of available sequences. Here, we present the glycosyltransferase acceptor specificity predictor (GASP) model, a data-driven approach to the identification of reactive GT1:acceptor pairs. We trained a random forest-based acceptor predictor on literature data and validated it on independent in-house generated data on 1001 GT1:acceptor pairs, obtaining an AUROC of 0.79 and a balanced accuracy of 72%. The performance was stable even in the case of completely new GT1s and acceptors not present in the training data set, highlighting the pan-specificity of GASP. Moreover, the model is capable of parsing all known GT1 sequences, as well as all chemicals, the latter through a pipeline for the generation of 153 chemical features for a given molecule taking the CID or SMILES as input (freely available at https://github.com/degnbol/GASP). To investigate the power of GASP, the model prediction probability scores were compared to GT1 substrate conversion yields from a newly published data set, with the top 50% of GASP predictions corresponding to reactions with >50% synthetic yields. The model was also tested in two comparative case studies: glycosylation of the anthelmintic drug niclosamide and the plant defensive compound DIBOA. In the first study, the model achieved an 83% hit rate, outperforming a hit rate of 53% from a random selection assay. In the second case study, the hit rate of GASP was 50%; although this is lower than the 83% hit rate obtained with expert-selected enzymes, it is a reasonable performance for cases where an expert opinion is unavailable. The hierarchical importance of the generated chemical features was investigated by negative feature selection, revealing properties related to cyclization and atom hybridization status to be the most important characteristics for accurate prediction. Our study provides a GT1:acceptor predictor that can be trained on other data sets, enabled by the automated feature generation pipelines. We also release the new in-house generated data set used for testing of GASP to facilitate the future development of GT1 activity predictors and their robust benchmarking.
Affiliation(s)
- David Harding-Larsen
  - DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
- Christian Degnbol Madsen
  - DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
  - The University of Melbourne Faculty of Science, Melbourne Integrative Genomics, University of Melbourne, Building 184, Royal Parade, Parkville 3010, Melbourne, VIC 3052, Australia
- David Teze
  - DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
- Tiia Kittilä
  - DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
- Hani Gharabli
  - DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
- Mandy Hobusch
  - DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
- Felipe Mejia Otalvaro
  - DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
- Onur Kırtel
  - DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
- Gonzalo Nahuel Bidart
  - DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
- Stanislav Mazurenko
  - Department of Experimental Biology and RECETOX, Faculty of Science, Masarykova Univerzita, Kamenice 5/A4, Brno 625 00, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, Brno 656 91, Czech Republic
- Evelyn Travnik
  - DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
- Ditte Hededam Welner
  - DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
2. Wang MQ, You ZN, Yang BY, Xia ZW, Chen Q, Pan J, Li CX, Xu JH. Machine-Learning-Guided Engineering of an NADH-Dependent 7β-Hydroxysteroid Dehydrogenase for Economic Synthesis of Ursodeoxycholic Acid. Journal of Agricultural and Food Chemistry 2023; 71:19672-19681. [PMID: 38016669 DOI: 10.1021/acs.jafc.3c06339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2023]
Abstract
Enzymatic synthesis of ursodeoxycholic acid (UDCA) catalyzed by an NADH-dependent 7β-hydroxysteroid dehydrogenase (7β-HSDH) is more economical than synthesis with an NADPH-dependent 7β-HSDH, because NADP⁺/NADPH costs much more than NAD⁺/NADH. However, the poor catalytic performance of NADH-dependent 7β-HSDH significantly limits its practical applications. Herein, machine-learning-guided protein engineering was performed on an NADH-dependent Rt7β-HSDHM0 from Ruminococcus torques. We combined random forest, Gaussian naïve Bayes classifier, and Gaussian process regression with limited experimental data, resulting in the best variant Rt7β-HSDHM3 (R40I/R41K/F94Y/S196A/Y253F), with improvements in specific activity and half-life (40 °C) of 4.1-fold and 8.3-fold, respectively. The preparative biotransformation using a "two stage in one pot" sequential process coupled with Rt7β-HSDHM3 exhibited a space-time yield (STY) of 192 g L⁻¹ d⁻¹, which is so far the highest productivity for the biosynthesis of UDCA from chenodeoxycholic acid (CDCA) with NAD⁺ as a cofactor. More importantly, the cost of raw materials for the enzymatic production of UDCA employing Rt7β-HSDHM3 decreased by 22% compared with that of Rt7β-HSDHM0, indicating the tremendous potential of the variant Rt7β-HSDHM3 for more efficient and economical production of UDCA.
Affiliation(s)
- Mu-Qiang Wang
  - Laboratory of Biocatalysis and Synthetic Biotechnology, State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
- Zhi-Neng You
  - Laboratory of Biocatalysis and Synthetic Biotechnology, State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
- Bing-Yi Yang
  - Laboratory of Biocatalysis and Synthetic Biotechnology, State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
- Zi-Wei Xia
  - Laboratory of Biocatalysis and Synthetic Biotechnology, State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
- Qi Chen
  - Laboratory of Biocatalysis and Synthetic Biotechnology, State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
  - Shanghai Collaborative Innovation Center for Biomanufacturing, School of Biotechnology, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
- Jiang Pan
  - Laboratory of Biocatalysis and Synthetic Biotechnology, State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
  - Shanghai Collaborative Innovation Center for Biomanufacturing, School of Biotechnology, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
- Chun-Xiu Li
  - Laboratory of Biocatalysis and Synthetic Biotechnology, State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
  - Shanghai Collaborative Innovation Center for Biomanufacturing, School of Biotechnology, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
- Jian-He Xu
  - Laboratory of Biocatalysis and Synthetic Biotechnology, State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
  - Shanghai Collaborative Innovation Center for Biomanufacturing, School of Biotechnology, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
3. Rocha RA, Speight RE, Scott C. Engineering Enzyme Properties for Improved Biocatalytic Processes in Batch and Continuous Flow. Org Process Res Dev 2022. [DOI: 10.1021/acs.oprd.1c00424] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 02/07/2023]
Affiliation(s)
- Raquel A. Rocha
  - School of Biology and Environmental Science, Faculty of Science, Queensland University of Technology, Brisbane, Queensland 4000, Australia
  - CSIRO Synthetic Biology Future Science Platform, CSIRO Land & Water, Black Mountain Science and Innovation Park, Canberra, ACT 2601, Australia
- Robert E. Speight
  - School of Biology and Environmental Science, Faculty of Science, Queensland University of Technology, Brisbane, Queensland 4000, Australia
  - ARC Centre of Excellence in Synthetic Biology, Queensland University of Technology, Brisbane, Queensland 4000, Australia
- Colin Scott
  - CSIRO Synthetic Biology Future Science Platform, CSIRO Land & Water, Black Mountain Science and Innovation Park, Canberra, ACT 2601, Australia