1
Harding-Larsen D, Funk J, Madsen NG, Gharabli H, Acevedo-Rocha CG, Mazurenko S, Welner DH. Protein representations: Encoding biological information for machine learning in biocatalysis. Biotechnol Adv 2024; 77:108459. [PMID: 39366493 DOI: 10.1016/j.biotechadv.2024.108459] [Received: 04/18/2024] [Revised: 09/19/2024] [Accepted: 09/29/2024] [Indexed: 10/06/2024]
Abstract
Enzymes offer a more environmentally friendly and low-impact solution to conventional chemistry, but they often require additional engineering for their application in industrial settings, an endeavour that is challenging and laborious. To address this issue, the power of machine learning can be harnessed to produce predictive models that enable the in silico study and engineering of improved enzymatic properties. Such machine learning models, however, require the conversion of the complex biological information to a numerical input, also called protein representations. These inputs demand special attention to ensure the training of accurate and precise models, and, in this review, we therefore examine the critical step of encoding protein information to numeric representations for use in machine learning. We selected the most important approaches for encoding the three distinct biological protein representations - primary sequence, 3D structure, and dynamics - to explore their requirements for employment and inductive biases. Combined representations of proteins and substrates are also introduced as emergent tools in biocatalysis. We propose the division of fixed representations, a collection of rule-based encoding strategies, and learned representations extracted from the latent spaces of large neural networks. To select the most suitable protein representation, we propose two main factors to consider. The first one is the model setup, which is influenced by the size of the training dataset and the choice of architecture. The second factor is the model objectives such as consideration about the assayed property, the difference between wild-type models and mutant predictors, and requirements for explainability. This review is aimed at serving as a source of information and guidance for properly representing enzymes in future machine learning models for biocatalysis.
Affiliation(s)
- David Harding-Larsen
  - The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Jonathan Funk
  - The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Niklas Gesmar Madsen
  - The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Hani Gharabli
  - The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Carlos G Acevedo-Rocha
  - The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Stanislav Mazurenko
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Ditte Hededam Welner
  - The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
2
Liu L, Liu S, Hu X, Zhou S, Deng Y. Enhancing the activity and succinyl-CoA specificity of 3-ketoacyl-CoA thiolase Tfu_0875 through rational binding pocket engineering. Synth Syst Biotechnol 2024; 9:558-568. [PMID: 38694995 PMCID: PMC11061225 DOI: 10.1016/j.synbio.2024.04.014] [Received: 02/12/2024] [Revised: 04/14/2024] [Accepted: 04/16/2024] [Indexed: 05/04/2024] Open
Abstract
The 3-ketoacyl-CoA thiolase is the rate-limiting enzyme for linear dicarboxylic acid production. However, its promiscuous substrate specificity and suboptimal catalytic performance have restricted its application. Here we present both biochemical and structural analyses of a high-efficiency 3-ketoacyl-CoA thiolase Tfu_0875. Notably, Tfu_0875 displayed heightened activity and substrate specificity for succinyl-CoA, a key precursor in adipic acid production. To enhance its performance, a deep learning approach (DLKcat) was employed to identify effective mutants, and a computational strategy, known as the greedy accumulated strategy for protein engineering (GRAPE), was used to accumulate these effective mutants. Among the mutants, Tfu_0875N249W/L163H/E217L exhibited the highest specific activity (320% of wild-type Tfu_0875), the greatest catalytic efficiency (kcat/KM = 1.00 min⁻¹ mM⁻¹), the highest succinyl-CoA specificity (KM = 0.59 mM, 28.1% of Tfu_0875) and dramatically reduced substrate binding energy (-30.25 kcal mol⁻¹ vs. -15.94 kcal mol⁻¹). A structural comparison between Tfu_0875N249W/L163H/E217L and the wild-type Tfu_0875 revealed that the increased interaction between the enzyme and succinyl-CoA was the primary reason for the enhanced enzyme activity. This interaction facilitated rapid substrate anchoring and stabilization. Furthermore, a reduced binding pocket volume improved substrate specificity by enhancing the complementarity between the binding pocket and the substrate in stereo conformation. Finally, our rationally designed mutant, Tfu_0875N249W/L163H/E217L, increased the adipic acid titer by 1.35-fold compared to the wild-type Tfu_0875 in shake flasks. The demonstrated enzymatic methods provide a promising enzyme variant for adipic acid production. The above substrate binding pocket engineering strategy can be beneficial for the production of other industrially competitive biobased chemicals when applied to other thiolases.
Affiliation(s)
- Lixia Liu
  - National Engineering Research Center of Cereal Fermentation and Food Biomanufacturing, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
  - Jiangsu Provincial Research Center for Bioactive Product Processing Technology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
- Shuang Liu
  - National Engineering Research Center of Cereal Fermentation and Food Biomanufacturing, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
  - Jiangsu Provincial Research Center for Bioactive Product Processing Technology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
- Xiangyang Hu
  - National Engineering Research Center of Cereal Fermentation and Food Biomanufacturing, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
  - Jiangsu Provincial Research Center for Bioactive Product Processing Technology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
- Shenghu Zhou
  - National Engineering Research Center of Cereal Fermentation and Food Biomanufacturing, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
  - Jiangsu Provincial Research Center for Bioactive Product Processing Technology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
- Yu Deng
  - National Engineering Research Center of Cereal Fermentation and Food Biomanufacturing, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
  - Jiangsu Provincial Research Center for Bioactive Product Processing Technology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
3
Harding-Larsen D, Madsen CD, Teze D, Kittilä T, Langhorn MR, Gharabli H, Hobusch M, Otalvaro FM, Kırtel O, Bidart GN, Mazurenko S, Travnik E, Welner DH. GASP: A Pan-Specific Predictor of Family 1 Glycosyltransferase Acceptor Specificity Enabled by a Pipeline for Substrate Feature Generation and Large-Scale Experimental Screening. ACS Omega 2024; 9:27278-27288. [PMID: 38947828 PMCID: PMC11209901 DOI: 10.1021/acsomega.4c01583] [Received: 02/19/2024] [Revised: 05/27/2024] [Accepted: 05/29/2024] [Indexed: 07/02/2024]
Abstract
Glycosylation represents a major chemical challenge; while it is one of the most common reactions in Nature, conventional chemistry struggles with stereochemistry, regioselectivity, and solubility issues. In contrast, family 1 glycosyltransferase (GT1) enzymes can glycosylate virtually any given nucleophilic group with perfect control over stereochemistry and regioselectivity. However, the appropriate catalyst for a given reaction needs to be identified among the tens of thousands of available sequences. Here, we present the glycosyltransferase acceptor specificity predictor (GASP) model, a data-driven approach to the identification of reactive GT1:acceptor pairs. We trained a random forest-based acceptor predictor on literature data and validated it on independent in-house generated data on 1001 GT1:acceptor pairs, obtaining an AUROC of 0.79 and a balanced accuracy of 72%. The performance was stable even in the case of completely new GT1s and acceptors not present in the training data set, highlighting the pan-specificity of GASP. Moreover, the model is capable of parsing all known GT1 sequences, as well as all chemicals, the latter through a pipeline for the generation of 153 chemical features for a given molecule taking the CID or SMILES as input (freely available at https://github.com/degnbol/GASP). To investigate the power of GASP, the model prediction probability scores were compared to GT1 substrate conversion yields from a newly published data set, with the top 50% of GASP predictions corresponding to reactions with >50% synthetic yields. The model was also tested in two comparative case studies: glycosylation of the antihelminth drug niclosamide and the plant defensive compound DIBOA. In the first study, the model achieved an 83% hit rate, outperforming a hit rate of 53% from a random selection assay. 
In the second case study, the hit rate of GASP was 50%, and while being lower than the hit rate of 83% using expert-selected enzymes, it provides a reasonable performance for the cases when an expert opinion is unavailable. The hierarchal importance of the generated chemical features was investigated by negative feature selection, revealing properties related to cyclization and atom hybridization status to be the most important characteristics for accurate prediction. Our study provides a GT1:acceptor predictor which can be trained on other data sets enabled by the automated feature generation pipelines. We also release the new in-house generated data set used for testing of GASP to facilitate the future development of GT1 activity predictors and their robust benchmarking.
Affiliation(s)
- David Harding-Larsen
  - DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
- Christian Degnbol Madsen
  - DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
  - The University of Melbourne Faculty of Science, Melbourne Integrative Genomics, University of Melbourne, Building 184, Royal Parade, Parkville 3010, Melbourne, VIC 3052, Australia
- David Teze
  - DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
- Tiia Kittilä
  - DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
- Hani Gharabli
  - DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
- Mandy Hobusch
  - DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
- Felipe Mejia Otalvaro
  - DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
- Onur Kırtel
  - DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
- Gonzalo Nahuel Bidart
  - DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
- Stanislav Mazurenko
  - Department of Experimental Biology and RECETOX, Faculty of Science, Masarykova Univerzita, Kamenice 5/A4, Brno 625 00, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, Brno 656 91, Czech Republic
- Evelyn Travnik
  - DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
- Ditte Hededam Welner
  - DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
4
Salvatti BA, Chagas MA, Fernandes PO, Ladeira YFX, Bozzi AS, Valadares VS, Valente AP, de Miranda AS, Rocha WR, Maltarollo VG, Moraes AH. Understanding the Enzyme (S)-Norcoclaurine Synthase Promiscuity to Aldehydes and Ketones. J Chem Inf Model 2024; 64:4462-4474. [PMID: 38776464 DOI: 10.1021/acs.jcim.3c01773] [Indexed: 05/25/2024]
Abstract
The (S)-norcoclaurine synthase from Thalictrum flavum (TfNCS) stereoselectively catalyzes the Pictet-Spengler reaction between dopamine and 4-hydroxyphenylacetaldehyde to give (S)-norcoclaurine. TfNCS can catalyze the Pictet-Spengler reaction with various aldehydes and ketones, leading to diverse tetrahydroisoquinolines. This substrate promiscuity positions TfNCS as a highly promising enzyme for synthesizing fine chemicals. Understanding carbonyl-containing substrates' structural and electronic signatures that influence TfNCS activity can help expand its applications in the synthesis of different compounds and aid in protein optimization strategies. In this study, we investigated the influence of the molecular properties of aldehydes and ketones on their reactivity in the TfNCS-catalyzed Pictet-Spengler reaction. Initially, we compiled a library of reactive and unreactive compounds from previous publications. We also performed enzymatic assays using nuclear magnetic resonance to identify some reactive and unreactive carbonyl compounds, which were then included in the library. Subsequently, we employed QSAR and DFT calculations to establish correlations between substrate-candidate structures and reactivity. Our findings highlight correlations of structural and stereoelectronic features, including the electrophilicity of the carbonyl group, to the reactivity of aldehydes and ketones toward the TfNCS-catalyzed Pictet-Spengler reaction. Interestingly, experimental data of seven compounds out of fifty-three did not correlate with the electrophilicity of the carbonyl group. For these seven compounds, we identified unfavorable interactions between them and the TfNCS. Our results demonstrate the applications of in silico techniques in understanding enzyme promiscuity and specificity, with a particular emphasis on machine learning methodologies, DFT electronic structure calculations, and molecular dynamic (MD) simulations.
Affiliation(s)
- Brunno A Salvatti
  - Departamento de Química, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Marcelo A Chagas
  - Departamento de Ciências Exatas, Universidade do Estado de Minas Gerais, João Monlevade, Minas Gerais 35930-314, Brazil
- Phillipe O Fernandes
  - Departamento de Produtos Farmacêuticos, Faculdade de Farmácia, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Yan F X Ladeira
  - Departamento de Química, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Aline S Bozzi
  - Departamento de Química, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Veronica S Valadares
  - Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Ana Paula Valente
  - Centro Nacional de Ressonância Magnética Nuclear, Instituto de Bioquímica Médica Leopoldo de Meis, Centro de Ciências da Saúde, Universidade Federal do Rio de Janeiro, Rio de Janeiro 21.941-902, Brazil
- Amanda S de Miranda
  - Departamento de Química, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Willian R Rocha
  - Departamento de Química, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Vinicius G Maltarollo
  - Departamento de Produtos Farmacêuticos, Faculdade de Farmácia, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Adolfo H Moraes
  - Departamento de Química, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
5
Han Y, Zhang H, Zeng Z, Liu Z, Lu D, Liu Z. Descriptor-augmented machine learning for enzyme-chemical interaction predictions. Synth Syst Biotechnol 2024; 9:259-268. [PMID: 38450325 PMCID: PMC10915406 DOI: 10.1016/j.synbio.2024.02.006] [Received: 11/01/2023] [Revised: 02/21/2024] [Accepted: 02/22/2024] [Indexed: 03/08/2024] Open
Abstract
Descriptors play a pivotal role in enzyme design for the greener synthesis of biochemicals, as they can characterize enzymes and chemicals from physicochemical and evolutionary perspectives. This study examined the effects of various descriptors on the performance of a Random Forest model used to predict enzyme-chemical relationships. We curated activity data for seven specific enzyme families from the literature and developed a pipeline for evaluating machine learning model performance using 10-fold cross-validation. The influence of protein and chemical descriptors was assessed in three scenarios: predicting the activity of unknown relations between known enzymes and known chemicals (new relationship evaluation), predicting the activity of novel enzymes on known chemicals (new enzyme evaluation), and predicting the activity of new chemicals on known enzymes (new chemical evaluation). The results showed that protein descriptors significantly enhanced the classification performance of the model in the new enzyme evaluation in three out of the seven datasets with the greatest number of enzymes, whereas chemical descriptors appeared to have no effect. A variety of sequence-based and structure-based protein descriptors were constructed, among which the ESM-2 descriptor achieved the best results. Using enzyme families as labels showed that descriptors could cluster proteins well, which could explain the contributions of descriptors to the machine learning model. Conversely, in the new chemical evaluation, chemical descriptors yielded significant improvements in four out of the seven datasets, while protein descriptors appeared to have no effect. We attempted to evaluate the generalization ability of the model by correlating the statistics of the datasets with the performance of the models. The results showed that datasets with higher sequence similarity were more likely to achieve better results in the new enzyme evaluation, and datasets with more enzymes were more likely to benefit from the protein descriptor strategy. This work provides guidance for the development of machine learning models for specific enzyme families.
Affiliation(s)
- Yilei Han
  - Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
- Haoye Zhang
  - Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
- Zheni Zeng
  - Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
- Zhiyuan Liu
  - Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
- Diannan Lu
  - Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
- Zheng Liu
  - Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
6
Yu Y, Trottmann NF, Schärer MR, Fenner K, Robinson SL. Substrate promiscuity of xenobiotic-transforming hydrolases from stream biofilms impacted by treated wastewater. Water Research 2024; 256:121593. [PMID: 38631239 DOI: 10.1016/j.watres.2024.121593] [Received: 11/27/2023] [Revised: 04/07/2024] [Accepted: 04/08/2024] [Indexed: 04/19/2024]
Abstract
Organic contaminants enter aquatic ecosystems from various sources, including wastewater treatment plant effluent. Freshwater biofilms play a major role in the removal of organic contaminants from receiving water bodies, but knowledge of the molecular mechanisms driving contaminant biotransformations in complex stream biofilm (periphyton) communities remains limited. Previously, we demonstrated that biofilms in experimental flume systems grown at higher ratios of treated wastewater (WW) to stream water displayed an increased biotransformation potential for a number of organic contaminants. We identified a positive correlation between WW percentage and biofilm biotransformation rates for the widely-used insect repellent, N,N-diethyl-meta-toluamide (DEET) and a number of other wastewater-borne contaminants with hydrolyzable moieties. Here, we conducted deep shotgun sequencing of flume biofilms and identified a positive correlation between WW percentage and metagenomic read abundances of DEET hydrolase (DH) homologs. To test the causality of this association, we constructed a targeted metagenomic library of DH homologs from flume biofilms. We screened our complete metagenomic library for activity with four different substrates, including DEET, and a subset thereof with 183 WW-related organic compounds. The majority of active hydrolases in the metagenomic library preferred aliphatic and aromatic ester substrates while, remarkably, only a single reference enzyme was capable of DEET hydrolysis. Of the 626 total enzyme-substrate combinations tested, approximately 5% were active enzyme-substrate pairs. Metagenomic DH family homologs revealed a broad substrate promiscuity spanning 22 different compounds when summed across all enzymes tested. We biochemically characterized the most promiscuous and active enzymes identified based on metagenomic analysis from uncultivated Rhodospirillaceae and Planctomycetaceae. 
In addition to characterizing new DH family enzymes, we exemplified a framework for linking metagenome-guided hypothesis generation with experimental validation. Overall, this study expands the scope of known enzymatic contaminant biotransformations for metagenomic hydrolases from WW-receiving stream biofilm communities.
Affiliation(s)
- Yaochun Yu
  - Department of Environmental Chemistry, Swiss Federal Institute of Aquatic Science and Technology (Eawag), 8600 Dübendorf, Switzerland
- Niklas Ferenc Trottmann
  - Department of Environmental Microbiology, Swiss Federal Institute of Aquatic Science and Technology (Eawag), 8600 Dübendorf, Switzerland
- Milo R Schärer
  - Department of Environmental Microbiology, Swiss Federal Institute of Aquatic Science and Technology (Eawag), 8600 Dübendorf, Switzerland
- Kathrin Fenner
  - Department of Environmental Chemistry, Swiss Federal Institute of Aquatic Science and Technology (Eawag), 8600 Dübendorf, Switzerland
  - Department of Chemistry, University of Zürich, 8057 Zürich, Switzerland
- Serina L Robinson
  - Department of Environmental Microbiology, Swiss Federal Institute of Aquatic Science and Technology (Eawag), 8600 Dübendorf, Switzerland
7
Tsutsui Y, Yanaka I, Takeda K, Kondo M, Takizawa S, Kojima R, Konishi A, Yasuda M. Selective recognition between aromatics and aliphatics by cage-shaped borates supported by a machine learning approach. Org Biomol Chem 2024; 22:4283-4291. [PMID: 38602393 DOI: 10.1039/d4ob00408f] [Indexed: 04/12/2024]
Abstract
Selective recognition between hydrocarbon moieties is a longstanding issue. Although we developed a π-pocket Lewis acid catalyst with high selectivity for aromatic aldehydes over aliphatic ones, a general strategy for catalyst design remains elusive. As an approach that transfers the molecular recognition based on multiple cooperative non-covalent interactions within the π-pocket to a rational catalyst design, herein, we demonstrate Lewis acid catalysts showing improved selectivity through the support of an ensemble algorithm with random forest, AdaBoost, and XGBoost as a machine learning (ML) approach. Using 7963 explanatory variables extracted from model hetero-Diels-Alder reactions, the ensemble algorithm predicted the chemoselectivity of unlearned catalysts. Experiments confirmed the prediction. The proposed catalyst shows the highest selective recognition, reminiscent of enzymatic catalytic activity. Additionally, a SHapley Additive exPlanations (SHAP) method suggested that the selectivity originates from the polarizability and three-dimensional size of the catalyst. This insight leads to rational design guidelines for Lewis acid catalysts with dispersion forces.
Affiliation(s)
- Yuya Tsutsui
  - Department of Applied Chemistry, Graduate School of Engineering, Osaka University, Suita, 565-0871, Japan
- Issei Yanaka
  - Department of Engineering, Graduate School of Integrated Science and Technology, Shizuoka University, Hamamatsu, 432-8561, Japan
- Kazuhiro Takeda
  - Department of Engineering, Graduate School of Integrated Science and Technology, Shizuoka University, Hamamatsu, 432-8561, Japan
- Masaru Kondo
  - School of Pharmaceutical Sciences, University of Shizuoka, 52-1 Yada, Suruga-ku, Shizuoka 422-8526, Japan
- Ryosuke Kojima
  - Department of Biomedical Data Intelligence, Graduate School of Medicine, Kyoto University, Sakyo-ku, 606-8507, Japan
- Akihito Konishi
  - Department of Applied Chemistry, Graduate School of Engineering, Osaka University, Suita, 565-0871, Japan
  - Innovative Catalysis Science Division, Institute for Open and Transdisciplinary Research Initiatives (ICS-OTRI), Osaka University, Suita, 565-0871, Japan
- Makoto Yasuda
  - Department of Applied Chemistry, Graduate School of Engineering, Osaka University, Suita, 565-0871, Japan
  - Innovative Catalysis Science Division, Institute for Open and Transdisciplinary Research Initiatives (ICS-OTRI), Osaka University, Suita, 565-0871, Japan
8
Wang X, Quinn D, Moody TS, Huang M. ALDELE: All-Purpose Deep Learning Toolkits for Predicting the Biocatalytic Activities of Enzymes. J Chem Inf Model 2024; 64:3123-3139. [PMID: 38573056 PMCID: PMC11040732 DOI: 10.1021/acs.jcim.4c00058] [Received: 01/11/2024] [Revised: 02/15/2024] [Accepted: 03/11/2024] [Indexed: 04/05/2024]
Abstract
Rapidly predicting enzyme properties for catalyzing specific substrates is essential for identifying potential enzymes for industrial transformations. The demand for sustainable production of valuable industry chemicals utilizing biological resources raised a pressing need to speed up biocatalyst screening using machine learning techniques. In this research, we developed an all-purpose deep-learning-based multiple-toolkit (ALDELE) workflow for screening enzyme catalysts. ALDELE incorporates both structural and sequence representations of proteins, alongside representations of ligands by subgraphs and overall physicochemical properties. Comprehensive evaluation demonstrated that ALDELE can predict the catalytic activities of enzymes, and particularly, it identifies residue-based hotspots to guide enzyme engineering and generates substrate heat maps to explore the substrate scope for a given biocatalyst. Moreover, our models notably match empirical data, reinforcing the practicality and reliability of our approach through the alignment with confirmed mutation sites. ALDELE offers a facile and comprehensive solution by integrating different toolkits tailored for different purposes at affordable computational cost and therefore would be valuable to speed up the discovery of new functional enzymes for their exploitation by the industry.
Affiliation(s)
- Xiangwen Wang
  - School of Chemistry and Chemical Engineering, Queen’s University Belfast, Belfast BT9 5AG, Northern Ireland, U.K.
  - Department of Biocatalysis and Isotope Chemistry, Almac Sciences, Craigavon BT63 5QD, Northern Ireland, U.K.
- Derek Quinn
  - Department of Biocatalysis and Isotope Chemistry, Almac Sciences, Craigavon BT63 5QD, Northern Ireland, U.K.
- Thomas S. Moody
  - Department of Biocatalysis and Isotope Chemistry, Almac Sciences, Craigavon BT63 5QD, Northern Ireland, U.K.
  - Arran Chemical Company Limited, Unit 1 Monksland Industrial Estate, Athlone, Co., Roscommon N37 DN24, Ireland
- Meilan Huang
  - School of Chemistry and Chemical Engineering, Queen’s University Belfast, Belfast BT9 5AG, Northern Ireland, U.K.
9
Ao YF, Dörr M, Menke MJ, Born S, Heuson E, Bornscheuer UT. Data-Driven Protein Engineering for Improving Catalytic Activity and Selectivity. Chembiochem 2024; 25:e202300754. [PMID: 38029350 DOI: 10.1002/cbic.202300754] [Received: 11/03/2023] [Revised: 11/28/2023] [Accepted: 11/29/2023] [Indexed: 12/01/2023]
Abstract
Protein engineering is essential for altering the substrate scope, catalytic activity and selectivity of enzymes for applications in biocatalysis. However, traditional approaches, such as directed evolution and rational design, encounter the challenge in dealing with the experimental screening process of a large protein mutation space. Machine learning methods allow the approximation of protein fitness landscapes and the identification of catalytic patterns using limited experimental data, thus providing a new avenue to guide protein engineering campaigns. In this concept article, we review machine learning models that have been developed to assess enzyme-substrate-catalysis performance relationships aiming to improve enzymes through data-driven protein engineering. Furthermore, we prospect the future development of this field to provide additional strategies and tools for achieving desired activities and selectivities.
Affiliation(s)
- Yu-Fei Ao
  - Department of Biotechnology and Enzyme Catalysis, Institute of Biochemistry, University of Greifswald, Felix-Hausdorff-Str. 4, 17487, Greifswald, Germany
  - Beijing National Laboratory for Molecular Sciences, CAS Key Laboratory of Molecular Recognition and Function, Institute of Chemistry, Chinese Academy of Sciences, Zhongguancun North First Street 2, Beijing, 100190, China
  - University of Chinese Academy of Sciences, Yuquan Road 19(A), Beijing, 100049, China
- Mark Dörr
  - Department of Biotechnology and Enzyme Catalysis, Institute of Biochemistry, University of Greifswald, Felix-Hausdorff-Str. 4, 17487, Greifswald, Germany
- Marian J Menke
  - Department of Biotechnology and Enzyme Catalysis, Institute of Biochemistry, University of Greifswald, Felix-Hausdorff-Str. 4, 17487, Greifswald, Germany
- Stefan Born
  - Technische Universität Berlin, Chair of Bioprocess Engineering, Ackerstraße 76, 13355, Berlin, Germany
- Egon Heuson
  - Univ. Lille, CNRS, Centrale Lille, Univ. Artois, UMR 8181 UCCS, Unité de Catalyse et Chimie du Solide, 59000, Lille, France
- Uwe T Bornscheuer
  - Department of Biotechnology and Enzyme Catalysis, Institute of Biochemistry, University of Greifswald, Felix-Hausdorff-Str. 4, 17487, Greifswald, Germany
10
King-Smith E, Faber FA, Reilly U, Sinitskiy AV, Yang Q, Liu B, Hyek D, Lee AA. Predictive Minisci late stage functionalization with transfer learning. Nat Commun 2024; 15:426. [PMID: 38225239 PMCID: PMC10789750 DOI: 10.1038/s41467-023-42145-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 10/01/2023] [Indexed: 01/17/2024] Open
Abstract
Structural diversification of lead molecules is a key component of drug discovery to explore chemical space. Late-stage functionalizations (LSFs) are versatile methodologies capable of installing functional handles on richly decorated intermediates to deliver numerous diverse products in a single reaction. Predicting the regioselectivity of LSF is still an open challenge in the field. Numerous efforts from chemoinformatics and machine learning (ML) groups have made strides in this area. However, it is arduous to isolate and characterize the multitude of LSF products generated, limiting available data and hindering pure ML approaches. We report the development of an approach that combines a message passing neural network and 13C NMR-based transfer learning to predict the atom-wise probabilities of functionalization for Minisci and P450-based functionalizations. We validated our model both retrospectively and with a series of prospective experiments, showing that it accurately predicts the outcomes of Minisci-type and P450 transformations and outperforms the well-established Fukui-based reactivity indices and other machine learning reactivity-based algorithms.
Affiliation(s)
- Emma King-Smith
  - Cavendish Laboratory, University of Cambridge, Cambridge, UK
- Felix A Faber
  - Cavendish Laboratory, University of Cambridge, Cambridge, UK
- Usa Reilly
  - Development & Medical, Pfizer Worldwide Research, Groton, CT, USA
- Anton V Sinitskiy
  - Machine Learning Computational Sciences, Pfizer Worldwide Research, Cambridge, MA, USA
- Qingyi Yang
  - Development & Medical, Pfizer Worldwide Research, Cambridge, MA, USA
- Bo Liu
  - Spectrix Analytic Services, LLC., North Haven, CT, USA
- Dennis Hyek
  - Spectrix Analytic Services, LLC., North Haven, CT, USA
- Alpha A Lee
  - Cavendish Laboratory, University of Cambridge, Cambridge, UK
11
Xi C, Diao J, Moon TS. Advances in ligand-specific biosensing for structurally similar molecules. Cell Syst 2023; 14:1024-1043. [PMID: 38128482 PMCID: PMC10751988 DOI: 10.1016/j.cels.2023.10.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2023] [Revised: 08/23/2023] [Accepted: 10/19/2023] [Indexed: 12/23/2023]
Abstract
The specificity of biological systems makes it possible to develop biosensors targeting specific metabolites, toxins, and pollutants in complex medical or environmental samples without interference from structurally similar compounds. For the last two decades, great efforts have been devoted to creating proteins or nucleic acids with novel properties through synthetic biology strategies. Beyond augmenting biocatalytic activity, expanding target substrate scopes, and enhancing enzymes' enantioselectivity and stability, an increasing research area is the enhancement of molecular specificity for genetically encoded biosensors. Here, we summarize recent advances in the development of highly specific biosensor systems and their essential applications. First, we describe the rational design principles required to create libraries containing potential mutants with less promiscuity or better specificity. Next, we review the emerging high-throughput screening techniques to engineer biosensing specificity for the desired target. Finally, we examine the computer-aided evaluation and prediction methods to facilitate the construction of ligand-specific biosensors.
Affiliation(s)
- Chenggang Xi
  - Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, MO, USA
- Jinjin Diao
  - Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, MO, USA
- Tae Seok Moon
  - Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, MO, USA; Division of Biology and Biomedical Sciences, Washington University in St. Louis, St. Louis, MO, USA
12
Ge F, Chen G, Qian M, Xu C, Liu J, Cao J, Li X, Hu D, Xu Y, Xin Y, Wang D, Zhou J, Shi H, Tan Z. Artificial Intelligence Aided Lipase Production and Engineering for Enzymatic Performance Improvement. J Agric Food Chem 2023; 71:14911-14930. [PMID: 37800676 DOI: 10.1021/acs.jafc.3c05029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/07/2023]
Abstract
With the development of artificial intelligence (AI), tailoring methods for enzyme engineering have been widely expanded. Additional protocols based on optimized network models have been used to predict and optimize lipase production as well as properties, namely, catalytic activity, stability, and substrate specificity. Here, different network models and algorithms for the prediction and reforming of lipase, focusing on its modification methods and cases based on AI, are reviewed in terms of both their advantages and disadvantages. Different neural networks coupled with various algorithms are usually applied to predict the maximum yield of lipase by optimizing the external cultivations for lipase production, while one part is used to predict the molecule variations affecting the properties of lipase. However, few studies have directly utilized AI to engineer lipase by affecting the structure of the enzyme, and a set of research gaps needs to be explored. Additionally, future perspectives of AI application in enzymes, including lipase engineering, are deduced to help the redesign of enzymes and the reform of new functional biocatalysts. This review provides a new horizon for developing effective and innovative AI tools for lipase production and engineering and facilitating lipase applications in the food industry and biomass conversion.
Affiliation(s)
- Feiyin Ge
  - School of Life Science and Food Engineering, Huaiyin Institute of Technology, Huai'an 223003, People's Republic of China
- Gang Chen
  - School of Life Science and Food Engineering, Huaiyin Institute of Technology, Huai'an 223003, People's Republic of China
- Minjing Qian
  - School of Life Science and Food Engineering, Huaiyin Institute of Technology, Huai'an 223003, People's Republic of China
- Cheng Xu
  - School of Life Science and Food Engineering, Huaiyin Institute of Technology, Huai'an 223003, People's Republic of China
- Jiao Liu
  - School of Life Science and Food Engineering, Huaiyin Institute of Technology, Huai'an 223003, People's Republic of China
- Jiaqi Cao
  - School of Life Science and Food Engineering, Huaiyin Institute of Technology, Huai'an 223003, People's Republic of China
- Xinchao Li
  - School of Life Science and Food Engineering, Huaiyin Institute of Technology, Huai'an 223003, People's Republic of China
- Die Hu
  - School of Pharmacy & School of Biological and Food Engineering, Changzhou University, Changzhou 213164, People's Republic of China
- Yangsen Xu
  - Dongtai Hanfangyuan Biotechnology Co. Ltd., Yancheng 224241, People's Republic of China
- Ya Xin
  - School of Life Science and Food Engineering, Huaiyin Institute of Technology, Huai'an 223003, People's Republic of China
- Dianlong Wang
  - School of Life Science and Food Engineering, Huaiyin Institute of Technology, Huai'an 223003, People's Republic of China
- Jia Zhou
  - School of Life Science and Food Engineering, Huaiyin Institute of Technology, Huai'an 223003, People's Republic of China
- Hao Shi
  - School of Life Science and Food Engineering, Huaiyin Institute of Technology, Huai'an 223003, People's Republic of China
- Zhongbiao Tan
  - School of Life Science and Food Engineering, Huaiyin Institute of Technology, Huai'an 223003, People's Republic of China
13
Zhang Q, Zheng W, Song Z, Zhang Q, Yang L, Wu J, Lin J, Xu G, Yu H. Machine Learning Enables Prediction of Pyrrolysyl-tRNA Synthetase Substrate Specificity. ACS Synth Biol 2023; 12:2403-2417. [PMID: 37486975 DOI: 10.1021/acssynbio.3c00225] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 07/26/2023]
Abstract
Knowledge about the substrate scope for a given enzyme is informative for elucidating biochemical pathways and also for expanding applications of the enzyme. However, no general methods are available to accurately predict the substrate specificity of an enzyme. Pyrrolysyl-tRNA synthetase (PylRS) is a powerful tool for incorporating various noncanonical amino acids (NCAAs) into proteins, which enabled us to probe, image, rationally engineer, and evolve protein structure and function. However, the incorporation of a new NCAA typically requires the selection of large libraries of PylRS with randomized mutations at active sites, and this process requires multiple rounds of selection for each new substrate. Therefore, a single aminoacyl-tRNA synthetase with broad substrate promiscuity is ideal to facilitate widespread applications of the genetic NCAA incorporation technique. Herein, machine learning models were developed to predict the substrate specificity of PylRS to accept novel NCAAs that could be incorporated into proteins by three PylRS mutants. The models were built from a training set of 285 unique enzyme-substrate pairs of three PylRS mutants including IFRS, BtaRS, and MFRS against 95 NCAAs. The best BaggingTree (BT) model was then used for virtually screening a NCAAs library containing 1474 phenylalanine, tyrosine, tryptophan, and alanine analogues, and 156 NCAAs were predicted to be accepted by at least one of the three PylRS mutants. Then, 27 NCAAs including 24 positive and 3 negative substrates were experimentally tested for their activities, and 20 of the 24 positive substrates showed weak or strong activity and were accepted by at least one PylRS mutant, among which 11 NCAAs were never reported to be incorporated into proteins before. Three negative substrates did not show any activity. Experimental results suggested that the BT model provides a three-class classification accuracy of 0.69 and a binary classification accuracy of 0.86. This study expanded the substrate scope of three PylRS variants and provided a framework for developing machine learning models to predict substrate specificity of other PylRS variants.
Affiliation(s)
- Qunfeng Zhang
  - Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China
- Wenlong Zheng
  - ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou 311200, Zhejiang, China
- Zhongdi Song
  - Key Laboratory of Pollution Exposure and Health Intervention of Zhejiang Province, Interdisciplinary Research Academy, Zhejiang Shuren University, Hangzhou 310015, China
- Qiang Zhang
  - ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou 311200, Zhejiang, China
  - College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, Zhejiang, China
- Lirong Yang
  - Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China
  - ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou 311200, Zhejiang, China
- Jianping Wu
  - Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China
  - ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou 311200, Zhejiang, China
- Jianping Lin
  - Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China
- Gang Xu
  - Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China
- Haoran Yu
  - Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China
  - ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou 311200, Zhejiang, China
14
Clements HD, Flynn AR, Nicholls BT, Grosheva D, Lefave SJ, Merriman MT, Hyster TK, Sigman MS. Using Data Science for Mechanistic Insights and Selectivity Predictions in a Non-Natural Biocatalytic Reaction. J Am Chem Soc 2023; 145:17656-17664. [PMID: 37530568 PMCID: PMC10602048 DOI: 10.1021/jacs.3c03639] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 08/03/2023]
Abstract
The study of non-natural biocatalytic transformations relies heavily on empirical methods, such as directed evolution, for identifying improved variants. Although exceptionally effective, this approach provides limited insight into the molecular mechanisms behind the transformations and necessitates multiple protein engineering campaigns for new reactants. To address this limitation, we disclose a strategy to explore the biocatalytic reaction space and garner insight into the molecular mechanisms driving enzymatic transformations. Specifically, we explored the selectivity of an "ene"-reductase, GluER-T36A, to create a data-driven toolset that explores reaction space and rationalizes the observed and predicted selectivities of substrate/mutant combinations. The resultant statistical models related structural features of the enzyme and substrate to selectivity and were used to effectively predict selectivity in reactions with out-of-sample substrates and mutants. Our approach provided a deeper understanding of enantioinduction by GluER-T36A and holds the potential to enhance the virtual screening of enzyme mutants.
Affiliation(s)
- Hanna D Clements
  - Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
- Autumn R Flynn
  - Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
- Bryce T Nicholls
  - Department of Chemistry and Chemical Biology, Cornell University, 122 Baker Laboratory, Ithaca, New York 14853, United States
- Daria Grosheva
  - Department of Chemistry and Chemical Biology, Cornell University, 122 Baker Laboratory, Ithaca, New York 14853, United States
- Sarah J Lefave
  - Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
- Morgan T Merriman
  - Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
- Todd K Hyster
  - Department of Chemistry and Chemical Biology, Cornell University, 122 Baker Laboratory, Ithaca, New York 14853, United States
- Matthew S Sigman
  - Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
15
Vasina M, Kovar D, Damborsky J, Ding Y, Yang T, deMello A, Mazurenko S, Stavrakis S, Prokop Z. In-depth analysis of biocatalysts by microfluidics: An emerging source of data for machine learning. Biotechnol Adv 2023; 66:108171. [PMID: 37150331 DOI: 10.1016/j.biotechadv.2023.108171] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/24/2023] [Revised: 05/04/2023] [Accepted: 05/04/2023] [Indexed: 05/09/2023]
Abstract
Nowadays, the vastly increasing demand for novel biotechnological products is supported by the continuous development of biocatalytic applications which provide sustainable green alternatives to chemical processes. The success of a biocatalytic application is critically dependent on how quickly we can identify and characterize enzyme variants fitting the conditions of industrial processes. While miniaturization and parallelization have dramatically increased the throughput of next-generation sequencing systems, the subsequent characterization of the obtained candidates is still a limiting process in identifying the desired biocatalysts. Only a few commercial microfluidic systems for enzyme analysis are currently available, and the transformation of numerous published prototypes into commercial platforms is still to be streamlined. This review presents the state-of-the-art, recent trends, and perspectives in applying microfluidic tools in the functional and structural analysis of biocatalysts. We discuss the advantages and disadvantages of available technologies, their reproducibility and robustness, and readiness for routine laboratory use. We also highlight the unexplored potential of microfluidics to leverage the power of machine learning for biocatalyst development.
Affiliation(s)
- Michal Vasina
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 602 00 Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
- David Kovar
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 602 00 Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
- Jiri Damborsky
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 602 00 Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
- Yun Ding
  - Institute for Chemical and Bioengineering, ETH Zürich, 8093 Zürich, Switzerland
- Tianjin Yang
  - Institute for Chemical and Bioengineering, ETH Zürich, 8093 Zürich, Switzerland; Department of Biochemistry, University of Zurich, 8057 Zurich, Switzerland
- Andrew deMello
  - Institute for Chemical and Bioengineering, ETH Zürich, 8093 Zürich, Switzerland
- Stanislav Mazurenko
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 602 00 Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
- Stavros Stavrakis
  - Institute for Chemical and Bioengineering, ETH Zürich, 8093 Zürich, Switzerland
- Zbynek Prokop
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 602 00 Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
16
Jiang Y, Ran X, Yang ZJ. Data-driven enzyme engineering to identify function-enhancing enzymes. Protein Eng Des Sel 2023; 36:gzac009. [PMID: 36214500 PMCID: PMC10365845 DOI: 10.1093/protein/gzac009] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 03/31/2022] [Revised: 08/08/2022] [Accepted: 09/28/2022] [Indexed: 01/22/2023] Open
Abstract
Identifying function-enhancing enzyme variants is a 'holy grail' challenge in protein science because it will allow researchers to expand the biocatalytic toolbox for late-stage functionalization of drug-like molecules, environmental degradation of plastics and other pollutants, and medical treatment of food allergies. Data-driven strategies, including statistical modeling, machine learning, and deep learning, have largely advanced the understanding of the sequence-structure-function relationships for enzymes. They have also enhanced the capability of predicting and designing new enzymes and enzyme variants for catalyzing the transformation of new-to-nature reactions. Here, we reviewed the recent progresses of data-driven models that were applied in identifying efficiency-enhancing mutants for catalytic reactions. We also discussed existing challenges and obstacles faced by the community. Although the review is by no means comprehensive, we hope that the discussion can inform the readers about the state-of-the-art in data-driven enzyme engineering, inspiring more joint experimental-computational efforts to develop and apply data-driven modeling to innovate biocatalysts for synthetic and pharmaceutical applications.
Affiliation(s)
- Yaoyukun Jiang
  - Department of Chemistry, Vanderbilt University, Nashville, TN 37235, USA
- Xinchun Ran
  - Department of Chemistry, Vanderbilt University, Nashville, TN 37235, USA
- Zhongyue J Yang
  - Department of Chemistry, Vanderbilt University, Nashville, TN 37235, USA
  - Center for Structural Biology, Vanderbilt University, Nashville, TN 37235, USA
  - Vanderbilt Institute of Chemical Biology, Vanderbilt University, Nashville, TN 37235, USA
  - Data Science Institute, Vanderbilt University, Nashville, TN 37235, USA
  - Department of Chemical and Biomolecular Engineering, Vanderbilt University, Nashville, TN 37235, USA
17
Durairaj J, de Ridder D, van Dijk AD. Beyond sequence: Structure-based machine learning. Comput Struct Biotechnol J 2022; 21:630-643. [PMID: 36659927 PMCID: PMC9826903 DOI: 10.1016/j.csbj.2022.12.039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Revised: 12/21/2022] [Accepted: 12/21/2022] [Indexed: 12/31/2022] Open
Abstract
Recent breakthroughs in protein structure prediction demarcate the start of a new era in structural bioinformatics. Combined with various advances in experimental structure determination and the uninterrupted pace at which new structures are published, this promises an age in which protein structure information is as prevalent and ubiquitous as sequence. Machine learning in protein bioinformatics has been dominated by sequence-based methods, but this is now changing to make use of the deluge of rich structural information as input. Machine learning methods making use of structures are scattered across literature and cover a number of different applications and scopes; while some try to address questions and tasks within a single protein family, others aim to capture characteristics across all available proteins. In this review, we look at the variety of structure-based machine learning approaches, how structures can be used as input, and typical applications of these approaches in protein biology. We also discuss current challenges and opportunities in this all-important and increasingly popular field.
Affiliation(s)
- Janani Durairaj
  - Biozentrum, University of Basel, Basel, Switzerland
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
- Dick de Ridder
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
- Aalt D.J. van Dijk
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
18
Kovács SC, Szappanos B, Tengölics R, Notebaart RA, Papp B. Underground metabolism as a rich reservoir for pathway engineering. Bioinformatics 2022; 38:3070-3077. [PMID: 35441658 PMCID: PMC9154287 DOI: 10.1093/bioinformatics/btac282] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/14/2022] [Revised: 04/12/2022] [Accepted: 04/14/2022] [Indexed: 11/25/2022] Open
Abstract
Motivation: Bioproduction of value-added compounds is frequently achieved by utilizing enzymes from other species. However, expression of such heterologous enzymes can be detrimental due to unexpected interactions within the host cell. Recently, an alternative strategy emerged, which relies on recruiting side activities of host enzymes to establish new biosynthetic pathways. Although such low-level ‘underground’ enzyme activities are prevalent, it remains poorly explored whether they may serve as an important reservoir for pathway engineering.
Results: Here, we use genome-scale modeling to estimate the theoretical potential of underground reactions for engineering novel biosynthetic pathways in Escherichia coli. We found that biochemical reactions contributed by underground enzyme activities often enhance the in silico production of compounds with industrial importance, including several cases where underground activities are indispensable for production. Most of these new capabilities can be achieved by the addition of one or two underground reactions to the native network, suggesting that only a few side activities need to be enhanced during implementation. Remarkably, we find that the contribution of underground reactions to the production of value-added compounds is comparable to that of heterologous reactions, underscoring their biotechnological potential. Taken together, our genome-wide study demonstrates that exploiting underground enzyme activities could be a promising addition to the toolbox of industrial strain development.
Availability and implementation: The data and scripts underlying this article are available on GitHub at https://github.com/pappb/Kovacs-et-al-Underground-metabolism.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Szabolcs Cselgő Kovács
  - HCEMM-BRC Metabolic Systems Biology Lab, Szeged, Hungary; Biological Research Centre, Institute of Biochemistry, Synthetic and Systems Biology Unit, Eötvös Loránd Research Network (ELKH), Szeged, Hungary
- Balázs Szappanos
  - HCEMM-BRC Metabolic Systems Biology Lab, Szeged, Hungary; Biological Research Centre, Institute of Biochemistry, Synthetic and Systems Biology Unit, Eötvös Loránd Research Network (ELKH), Szeged, Hungary; Department of Biotechnology, University of Szeged, Szeged, Hungary
- Roland Tengölics
  - HCEMM-BRC Metabolic Systems Biology Lab, Szeged, Hungary; Biological Research Centre, Institute of Biochemistry, Synthetic and Systems Biology Unit, Eötvös Loránd Research Network (ELKH), Szeged, Hungary
- Richard A Notebaart
  - Food Microbiology, Wageningen University & Research, Wageningen, The Netherlands
- Balázs Papp
  - HCEMM-BRC Metabolic Systems Biology Lab, Szeged, Hungary; Biological Research Centre, Institute of Biochemistry, Synthetic and Systems Biology Unit, Eötvös Loránd Research Network (ELKH), Szeged, Hungary
19
Machine learning modeling of family wide enzyme-substrate specificity screens. PLoS Comput Biol 2022; 18:e1009853. [PMID: 35143485 PMCID: PMC8865696 DOI: 10.1371/journal.pcbi.1009853] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 10/30/2021] [Revised: 02/23/2022] [Accepted: 01/21/2022] [Indexed: 11/19/2022] Open
Abstract
Biocatalysis is a promising approach to sustainably synthesize pharmaceuticals, complex natural products, and commodity chemicals at scale. However, the adoption of biocatalysis is limited by our ability to select enzymes that will catalyze their natural chemical transformation on non-natural substrates. While machine learning and in silico directed evolution are well-posed for this predictive modeling challenge, efforts to date have primarily aimed to increase activity against a single known substrate, rather than to identify enzymes capable of acting on new substrates of interest. To address this need, we curate 6 different high-quality enzyme family screens from the literature that each measure multiple enzymes against multiple substrates. We compare machine learning-based compound-protein interaction (CPI) modeling approaches from the literature used for predicting drug-target interactions. Surprisingly, comparing these interaction-based models against collections of independent (single task) enzyme-only or substrate-only models reveals that current CPI approaches are incapable of learning interactions between compounds and proteins in the current family level data regime. We further validate this observation by demonstrating that our no-interaction baseline can outperform CPI-based models from the literature used to guide the discovery of kinase inhibitors. Given the high performance of non-interaction based models, we introduce a new structure-based strategy for pooling residue representations across a protein sequence. Altogether, this work motivates a principled path forward in order to build and evaluate meaningful predictive models for biocatalysis and other drug discovery applications.
Predicting interactions between compounds and proteins represents a long-standing dream of drug discovery and protein engineering. Robust models of enzyme-substrate scope would dramatically advance our ability to design synthetic routes involving enzymatic catalysis. However, the lack of standardization between compound-protein interaction studies makes it difficult to evaluate the generalizability of such models. In this work we take a critical step forward by standardizing high-quality datasets measuring enzyme-substrate interactions, outlining rigorous evaluations, and proposing a new way to integrate structural information into protein representations. In testing previous modeling approaches, we highlight a surprising inability of existing models to effectively leverage compound-protein interactions to improve generalization, which challenges a perception in the literature. This establishes future opportunities for model development and integration of enzyme-substrate scope models into computer-aided synthesis planning software.
20
Dudley QM, Cai YM, Kallam K, Debreyne H, Carrasco Lopez JA, Patron NJ. Biofoundry-assisted expression and characterization of plant proteins. Synth Biol (Oxf) 2021; 6:ysab029. [PMID: 34693026 PMCID: PMC8529701 DOI: 10.1093/synbio/ysab029] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/10/2021] [Revised: 08/25/2021] [Accepted: 09/09/2021] [Indexed: 12/29/2022] Open
Abstract
Many goals in synthetic biology, including the elucidation and refactoring of biosynthetic pathways and the engineering of regulatory circuits and networks, require knowledge of protein function. In plants, the prevalence of large gene families means it can be particularly challenging to link specific functions to individual proteins. However, protein characterization has remained a technical bottleneck, often requiring significant effort to optimize expression and purification protocols. To leverage the ability of biofoundries to accelerate design-build-test-learn cycles, we present a workflow for automated DNA assembly and cell-free expression of plant proteins that accelerates optimization and enables rapid screening of enzyme activity. First, we developed a phytobrick-compatible Golden Gate DNA assembly toolbox containing plasmid acceptors for cell-free expression using Escherichia coli or wheat germ lysates as well as a set of N- and C-terminal tag parts for detection, purification and improved expression/folding. We next optimized automated assembly of miniaturized cell-free reactions using an acoustic liquid handling platform and then compared tag configurations to identify those that increase expression. We additionally developed a luciferase-based system for rapid quantification that requires a minimal 11-amino acid tag and demonstrate facile removal of tags following synthesis. Finally, we show that several functional assays can be performed with cell-free protein synthesis reactions without the need for protein purification. Together, the combination of automated assembly of DNA parts and cell-free expression reactions should significantly increase the throughput of experiments to test and understand plant protein function and enable the direct reuse of DNA parts in downstream plant engineering workflows.
Affiliation(s)
- Quentin M Dudley
  - Engineering Biology, Earlham Institute, Norwich Research Park, Norwich, Norfolk UK
- Yao-Min Cai
  - Engineering Biology, Earlham Institute, Norwich Research Park, Norwich, Norfolk UK
- Kalyani Kallam
  - Engineering Biology, Earlham Institute, Norwich Research Park, Norwich, Norfolk UK
- Hubert Debreyne
  - Engineering Biology, Earlham Institute, Norwich Research Park, Norwich, Norfolk UK
- Nicola J Patron
  - Engineering Biology, Earlham Institute, Norwich Research Park, Norwich, Norfolk UK
|
21
|
Dutta K, Shityakov S, Khalifa I. New Trends in Bioremediation Technologies Toward Environment-Friendly Society: A Mini-Review. Front Bioeng Biotechnol 2021; 9:666858. [PMID: 34409018 PMCID: PMC8365754 DOI: 10.3389/fbioe.2021.666858] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/11/2021] [Accepted: 05/26/2021] [Indexed: 01/29/2023] Open
Abstract
Today's environmental balance has been compromised by the unreasonable and sometimes dangerous actions committed by humans to maintain their dominance over the Earth's natural resources. As a result, oceans are contaminated by different types of plastic trash and by crude oil spilled from mismanaged transport ships, and the air is polluted by the increasing release of greenhouse gases, such as CO2 and CH4, into the atmosphere. The lands, agricultural fields, and groundwater are also contaminated by infamous chemicals, viz. polycyclic aromatic hydrocarbons, pyrethroid pesticides, bisphenol-A, and dioxanes. Therefore, bioremediation might function as a convenient alternative to restore a clean environment. However, at present, the majority of bioremediation reports are limited to the natural capabilities of microbial enzymes. Synthetic biology with uncompromised supervision of ethical standards could help to outsmart nature's engineering, such as the CETCH cycle for improved CO2 fixation. Additionally, a blend of synthetic biology with machine learning algorithms could expand the possibilities of bioengineering. This review summarizes current state-of-the-art knowledge of data-assisted enzyme redesign to actively promote new research on important enzymes to ameliorate the environment.
Affiliation(s)
- Kunal Dutta
  - Department of Human Physiology, Vidyasagar University, Medinipur, India
- Sergey Shityakov
  - Department of Chemoinformatics, Infochemistry Scientific Center, Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University), Saint-Petersburg, Russia
- Ibrahim Khalifa
  - Food Technology Department, Faculty of Agriculture, Benha University, Moshtohor, Egypt
|
22
|
Fenner K, Elsner M, Lueders T, McLachlan MS, Wackett LP, Zimmermann M, Drewes JE. Methodological Advances to Study Contaminant Biotransformation: New Prospects for Understanding and Reducing Environmental Persistence? ACS ES&T WATER 2021; 1:1541-1554. [PMID: 34278380 PMCID: PMC8276273 DOI: 10.1021/acsestwater.1c00025] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 01/23/2021] [Revised: 06/11/2021] [Accepted: 06/11/2021] [Indexed: 05/14/2023]
Abstract
Complex microbial communities in environmental systems play a key role in the detoxification of chemical contaminants by transforming them into less active metabolites or by complete mineralization. Biotransformation, i.e., transformation by microbes, is well understood for a number of priority pollutants, but a similar level of understanding is lacking for many emerging contaminants encountered at low concentrations and in complex mixtures across natural and engineered systems. Any advanced approaches aiming to reduce environmental exposure to such contaminants (e.g., novel engineered biological water treatment systems, design of readily degradable chemicals, or improved regulatory assessment strategies to determine contaminant persistence a priori) will depend on understanding the causal links among contaminant removal, the key driving agents of biotransformation at low concentrations (i.e., relevant microbes and their metabolic activities), and how their presence and activity depend on environmental conditions. In this Perspective, we present the current understanding and recent methodological advances that can help to identify such links, even in complex environmental microbiomes and for contaminants present at low concentrations in complex chemical mixtures. We discuss the ensuing insights into contaminant biotransformation across varying environments and conditions and ask how much closer we have come to designing improved approaches to reducing environmental exposure to contaminants.
Affiliation(s)
- Kathrin Fenner
  - Eawag, Swiss Federal Institute of Aquatic Science and Technology, 8600 Dübendorf, Switzerland
  - Institute of Biogeochemistry and Pollutant Dynamics, ETH Zürich, 8092 Zürich, Switzerland
  - Department of Chemistry, University of Zürich, 8057 Zürich, Switzerland
- Martin Elsner
  - Chair of Analytical Chemistry and Water Chemistry, Technical University of Munich, 85748 Garching, Germany
- Tillmann Lueders
  - Chair of Ecological Microbiology, Bayreuth Center of Ecology and Environmental Research (BayCEER), University of Bayreuth, 95448 Bayreuth, Germany
- Michael S McLachlan
  - Department of Environmental Science (ACES), Stockholm University, 106 91 Stockholm, Sweden
- Lawrence P Wackett
  - Biotechnology Institute, University of Minnesota, Saint Paul, Minnesota 55108, United States
- Michael Zimmermann
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Jörg E Drewes
  - Chair of Urban Water Systems Engineering, Technical University of Munich, 85748 Garching, Germany
|
23
|
Smith MD, Tassoulas LJ, Biernath TA, Richman JE, Aukema KG, Wackett LP. p-Nitrophenyl esters provide new insights and applications for the thiolase enzyme OleA. Comput Struct Biotechnol J 2021; 19:3087-3096. [PMID: 34141132 PMCID: PMC8180931 DOI: 10.1016/j.csbj.2021.05.031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2021] [Revised: 05/18/2021] [Accepted: 05/19/2021] [Indexed: 11/21/2022] Open
Abstract
The OleA enzyme is distinct amongst thiolase enzymes in binding two long (≥C8) acyl chains into structurally-opposed hydrophobic channels, denoted A and B, to carry out a non-decarboxylative Claisen condensation reaction and initiate the biosynthesis of membrane hydrocarbons and β-lactone natural products. OleA has now been identified in hundreds of diverse bacteria via bioinformatics and high-throughput screening using p-nitrophenyl alkanoate esters as surrogate substrates. In the present study, p-nitrophenyl esters were used to probe the reaction mechanism of OleA and shown to be incorporated into Claisen condensation products for the first time. p-Nitrophenyl alkanoate substrates alone were shown not to undergo Claisen condensation, but co-incubation of p-nitrophenyl esters and CoA thioesters produced mixed Claisen products. Mixed product reactions were shown to initiate via acyl group transfer from a p-nitrophenyl carrier to the enzyme active site cysteine, C143. Acyl chains esterified to p-nitrophenol were synthesized and shown to undergo Claisen condensation with an acyl-CoA substrate, showing potential to greatly expand the range of possible Claisen products. Using p-nitrophenyl 1-13C-decanoate, the Channel A bound thioester chain was shown to act as the Claisen nucleophile, representing the first direct evidence for the directionality of the Claisen reaction in any OleA enzyme. These results both provide new insights into OleA catalysis and open a path for making unnatural hydrocarbon and β-lactone natural products for biotechnological applications using cheap and easily synthesized p-nitrophenyl esters.
Affiliation(s)
- Megan D. Smith
  - Biotechnology Institute, University of Minnesota, St Paul, MN, USA
  - Department of Microbiology and Immunology, University of Minnesota, Minneapolis, MN, USA
  - Microbial and Plant Genomics Institute, University of Minnesota, St Paul, MN, USA
- Lambros J. Tassoulas
  - Biotechnology Institute, University of Minnesota, St Paul, MN, USA
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, St Paul, MN, USA
- Troy A. Biernath
  - Biotechnology Institute, University of Minnesota, St Paul, MN, USA
- Jack E. Richman
  - Biotechnology Institute, University of Minnesota, St Paul, MN, USA
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, St Paul, MN, USA
- Kelly G. Aukema
  - Biotechnology Institute, University of Minnesota, St Paul, MN, USA
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, St Paul, MN, USA
- Lawrence P. Wackett
  - Biotechnology Institute, University of Minnesota, St Paul, MN, USA
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, St Paul, MN, USA
  - Microbial and Plant Genomics Institute, University of Minnesota, St Paul, MN, USA
|