1
Alazmi M. Enzyme catalytic efficiency prediction: employing convolutional neural networks and XGBoost. Front Artif Intell 2024; 7:1446063. [PMID: 39498388 PMCID: PMC11532030 DOI: 10.3389/frai.2024.1446063] [Received: 06/08/2024] [Accepted: 10/07/2024] [Indexed: 11/07/2024] Open
Abstract
Introduction: In the intricate realm of enzymology, the precise quantification of enzyme efficiency, epitomized by the turnover number (kcat), is a paramount yet elusive objective. Existing methodologies, though sophisticated, often grapple with the inherent stochasticity and multifaceted nature of enzymatic reactions. Thus, there arises a necessity to explore avant-garde computational paradigms.
Methods: In this context, we introduce "enzyme catalytic efficiency prediction (ECEP)," leveraging advanced deep learning techniques to enhance the previous implementation, TurNuP, for predicting the enzyme catalase kcat. Our approach significantly outperforms prior methodologies, incorporating new features derived from enzyme sequences and chemical reaction dynamics. Through ECEP, we unravel the intricate enzyme-substrate interactions, capturing the nuanced interplay of molecular determinants.
Results: Preliminary assessments, compared against established models like TurNuP and DLKcat, underscore the superior predictive capabilities of ECEP, marking a pivotal shift in in silico enzymatic turnover number estimation. This study enriches the computational toolkit available to enzymologists and lays the groundwork for future explorations in the burgeoning field of bioinformatics. This paper proposes a multi-feature ensemble deep learning-based approach to predict enzyme kinetic parameters using an ensemble convolutional neural network and XGBoost, calculating the weighted average of each feature-based model's output to outperform traditional machine learning methods. The proposed "ECEP" model significantly outperformed existing methodologies, reducing the mean squared error (MSE) by 0.35, from 0.81 to 0.46, and raising the R-squared score from 0.44 to 0.54, thereby demonstrating its superior accuracy and effectiveness in enzyme catalytic efficiency prediction.
Discussion: This improvement underscores the model's potential to enhance the field of bioinformatics, setting a new benchmark for performance.
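Editorial illustration: the abstract describes combining feature-based models through a weighted average of their outputs and reports MSE and R-squared. The following is a minimal sketch of those three calculations only, assuming plain Python lists; the function names and the toy numbers are hypothetical and are not taken from the ECEP code or data.

```python
from typing import Sequence


def weighted_average(predictions: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> list[float]:
    """Combine per-model predictions into one value per sample.

    predictions[m][i] is the output of model m for sample i.
    The weights are normalised by their sum, so they need not add up to 1.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    n_samples = len(predictions[0])
    return [
        sum(w * model[i] for w, model in zip(weights, predictions)) / total
        for i in range(n_samples)
    ]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy example (made-up values): two models, two samples, weights 1 and 3.
combined = weighted_average([[1.0, 2.0], [3.0, 4.0]], [1.0, 3.0])
```

How the weights are chosen is not stated in the abstract; in a sketch like this they would typically be tuned on a validation split rather than fixed by hand.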
Affiliation(s)
- Meshari Alazmi
- College of Computer Science and Engineering, University of Ha’il, Ha’il, Saudi Arabia
2
Harding-Larsen D, Funk J, Madsen NG, Gharabli H, Acevedo-Rocha CG, Mazurenko S, Welner DH. Protein representations: Encoding biological information for machine learning in biocatalysis. Biotechnol Adv 2024; 77:108459. [PMID: 39366493 DOI: 10.1016/j.biotechadv.2024.108459] [Received: 04/18/2024] [Revised: 09/19/2024] [Accepted: 09/29/2024] [Indexed: 10/06/2024]
Abstract
Enzymes offer a more environmentally friendly and low-impact alternative to conventional chemistry, but they often require additional engineering for their application in industrial settings, an endeavour that is challenging and laborious. To address this issue, the power of machine learning can be harnessed to produce predictive models that enable the in silico study and engineering of improved enzymatic properties. Such machine learning models, however, require the conversion of complex biological information into numerical inputs, also called protein representations. These inputs demand special attention to ensure the training of accurate and precise models, and, in this review, we therefore examine the critical step of encoding protein information into numeric representations for use in machine learning. We selected the most important approaches for encoding the three distinct biological protein representations - primary sequence, 3D structure, and dynamics - to explore their requirements for employment and their inductive biases. Combined representations of proteins and substrates are also introduced as emergent tools in biocatalysis. We propose a division into fixed representations, a collection of rule-based encoding strategies, and learned representations extracted from the latent spaces of large neural networks. To select the most suitable protein representation, we propose two main factors to consider. The first is the model setup, which is influenced by the size of the training dataset and the choice of architecture. The second is the model objectives, such as considerations regarding the assayed property, the difference between wild-type models and mutant predictors, and requirements for explainability. This review is aimed at serving as a source of information and guidance for properly representing enzymes in future machine learning models for biocatalysis.
Affiliation(s)
- David Harding-Larsen
- The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Jonathan Funk
- The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Niklas Gesmar Madsen
- The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Hani Gharabli
- The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Carlos G Acevedo-Rocha
- The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Stanislav Mazurenko
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic; International Clinical Research Center, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Ditte Hededam Welner
- The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
3
Wan X, Shahrear S, Chew SW, Vilaplana F, Mäkelä MR. Discovery of alkaline laccases from basidiomycete fungi through machine learning-based approach. BIOTECHNOLOGY FOR BIOFUELS AND BIOPRODUCTS 2024; 17:120. [PMID: 39261970 PMCID: PMC11391777 DOI: 10.1186/s13068-024-02566-6] [Received: 06/04/2024] [Accepted: 09/02/2024] [Indexed: 09/13/2024]
Abstract
BACKGROUND: Laccases can oxidize a broad spectrum of substrates, offering promising applications in various sectors, such as bioremediation, biomass fractionation in future biorefineries, and synthesis of biochemicals and biopolymers. However, the discovery and optimization of laccases with a desirable pH optimum remain a challenge due to the labor-intensive and time-consuming nature of traditional laboratory methods. RESULTS: This study presents a machine learning (ML)-integrated approach for predicting pH optima of basidiomycete fungal laccases, utilizing a small, curated dataset against vast metagenomic data. Comparative computational analyses unveiled the structural and pH-dependent solubility differences between acidic and neutral-alkaline laccases, helping us understand the molecular bases of enzyme pH optimum. The pH profiling of the two ML-predicted alkaline laccase candidates from the basidiomycete fungus Lepista nuda further validated our computational approach, showing the accuracy of this comprehensive method. CONCLUSIONS: This study uncovers the efficacy of ML in the prediction of enzyme pH optimum from minimal datasets, marking a significant step towards harnessing computational tools for systematic screening of enzymes for biotechnology applications.
Affiliation(s)
- Xing Wan
- Department of Microbiology, Faculty of Agriculture and Forestry, University of Helsinki, Biocenter 1, Viikinkaari 9, 00790, Helsinki, Finland.
- Sazzad Shahrear
- Department of Microbiology, Faculty of Agriculture and Forestry, University of Helsinki, Biocenter 1, Viikinkaari 9, 00790, Helsinki, Finland
- Shea Wen Chew
- Department of Microbiology, Faculty of Agriculture and Forestry, University of Helsinki, Biocenter 1, Viikinkaari 9, 00790, Helsinki, Finland
- Francisco Vilaplana
- Division of Glycoscience, Department of Chemistry, School of Engineering Science in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, AlbaNova University Center, Roslagstullbacken 21, 11421, Stockholm, Sweden
- Miia R Mäkelä
- Department of Microbiology, Faculty of Agriculture and Forestry, University of Helsinki, Biocenter 1, Viikinkaari 9, 00790, Helsinki, Finland.
- Department of Bioproducts and Biosystems, Aalto University, Kemistintie 1, 02150, Espoo, Finland.
4
Fordjour E, Liu CL, Yang Y, Bai Z. Recent advances in lycopene and germacrene A biosynthesis and their role as antineoplastic drugs. World J Microbiol Biotechnol 2024; 40:254. [PMID: 38916754 DOI: 10.1007/s11274-024-04057-0] [Received: 04/26/2024] [Accepted: 06/17/2024] [Indexed: 06/26/2024]
Abstract
Sesquiterpenes and tetraterpenes are classes of plant-derived natural products with antineoplastic effects. While plant extraction of the sesquiterpene germacrene A and the tetraterpene lycopene suffers from supply chain deficits and poor yields, chemical synthesis has difficulty separating stereoisomers. This review highlights cutting-edge developments in producing germacrene A and lycopene from microbial cell factories. We then summarize the antineoplastic properties of β-elemene (a thermal product of germacrene A), sesquiterpene lactones (metabolic products of germacrene A), and lycopene. We also elaborate on strategies to optimize microbial-based germacrene A and lycopene production.
Affiliation(s)
- Eric Fordjour
- The Key Laboratory of Industrial Biotechnology, School of Biotechnology, Ministry of Education, Jiangnan University, Wuxi, 214122, China
- National Engineering Research Center of Cereal Fermentation and Food Biomanufacturing, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
- Jiangsu Provincial Research Centre for Bioactive Product Processing Technology, Jiangnan University, Wuxi, 214122, China
- Chun-Li Liu
- The Key Laboratory of Industrial Biotechnology, School of Biotechnology, Ministry of Education, Jiangnan University, Wuxi, 214122, China
- National Engineering Research Center of Cereal Fermentation and Food Biomanufacturing, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
- Jiangsu Provincial Research Centre for Bioactive Product Processing Technology, Jiangnan University, Wuxi, 214122, China
- Yankun Yang
- The Key Laboratory of Industrial Biotechnology, School of Biotechnology, Ministry of Education, Jiangnan University, Wuxi, 214122, China
- National Engineering Research Center of Cereal Fermentation and Food Biomanufacturing, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
- Jiangsu Provincial Research Centre for Bioactive Product Processing Technology, Jiangnan University, Wuxi, 214122, China
- Zhonghu Bai
- The Key Laboratory of Industrial Biotechnology, School of Biotechnology, Ministry of Education, Jiangnan University, Wuxi, 214122, China
- National Engineering Research Center of Cereal Fermentation and Food Biomanufacturing, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
- Jiangsu Provincial Research Centre for Bioactive Product Processing Technology, Jiangnan University, Wuxi, 214122, China
5
Harding-Larsen D, Madsen CD, Teze D, Kittilä T, Langhorn MR, Gharabli H, Hobusch M, Otalvaro FM, Kırtel O, Bidart GN, Mazurenko S, Travnik E, Welner DH. GASP: A Pan-Specific Predictor of Family 1 Glycosyltransferase Acceptor Specificity Enabled by a Pipeline for Substrate Feature Generation and Large-Scale Experimental Screening. ACS OMEGA 2024; 9:27278-27288. [PMID: 38947828 PMCID: PMC11209901 DOI: 10.1021/acsomega.4c01583] [Received: 02/19/2024] [Revised: 05/27/2024] [Accepted: 05/29/2024] [Indexed: 07/02/2024]
Abstract
Glycosylation represents a major chemical challenge; while it is one of the most common reactions in Nature, conventional chemistry struggles with stereochemistry, regioselectivity, and solubility issues. In contrast, family 1 glycosyltransferase (GT1) enzymes can glycosylate virtually any given nucleophilic group with perfect control over stereochemistry and regioselectivity. However, the appropriate catalyst for a given reaction needs to be identified among the tens of thousands of available sequences. Here, we present the glycosyltransferase acceptor specificity predictor (GASP) model, a data-driven approach to the identification of reactive GT1:acceptor pairs. We trained a random forest-based acceptor predictor on literature data and validated it on independent in-house generated data on 1001 GT1:acceptor pairs, obtaining an AUROC of 0.79 and a balanced accuracy of 72%. The performance was stable even in the case of completely new GT1s and acceptors not present in the training data set, highlighting the pan-specificity of GASP. Moreover, the model is capable of parsing all known GT1 sequences, as well as all chemicals, the latter through a pipeline for the generation of 153 chemical features for a given molecule taking the CID or SMILES as input (freely available at https://github.com/degnbol/GASP). To investigate the power of GASP, the model prediction probability scores were compared to GT1 substrate conversion yields from a newly published data set, with the top 50% of GASP predictions corresponding to reactions with >50% synthetic yields. The model was also tested in two comparative case studies: glycosylation of the anthelmintic drug niclosamide and the plant defensive compound DIBOA. In the first study, the model achieved an 83% hit rate, outperforming a hit rate of 53% from a random selection assay. In the second case study, the hit rate of GASP was 50%; although lower than the 83% hit rate obtained with expert-selected enzymes, this is a reasonable performance for cases where an expert opinion is unavailable. The hierarchical importance of the generated chemical features was investigated by negative feature selection, revealing properties related to cyclization and atom hybridization status to be the most important characteristics for accurate prediction. Our study provides a GT1:acceptor predictor which can be trained on other data sets, enabled by the automated feature generation pipelines. We also release the new in-house generated data set used for testing of GASP to facilitate the future development of GT1 activity predictors and their robust benchmarking.
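Editorial illustration: the abstract reports an AUROC and a balanced accuracy for a binary reactive/non-reactive classifier. The sketch below shows how these two metrics are defined, assuming labels coded as 1 (reactive) and 0 (non-reactive); the function names and toy inputs are hypothetical and unrelated to the GASP repository or its data.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (recall on 1s) and specificity (recall on 0s)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 0.5 * (sensitivity + specificity)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outranks a random
    negative (ties count one half). Quadratic in the number of samples,
    which is fine for a small illustration."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): four enzyme:acceptor pairs.
labels = [1, 1, 0, 0]
hard_calls = [1, 0, 0, 0]
probabilities = [0.9, 0.4, 0.5, 0.1]
```

Balanced accuracy is often preferred over plain accuracy when reactive and non-reactive pairs are unevenly represented, because each class contributes equally to the score.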
Affiliation(s)
- David Harding-Larsen
- DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
- Christian Degnbol Madsen
- DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
- The University of Melbourne Faculty of Science, Melbourne Integrative Genomics, University of Melbourne, Building 184, Royal Parade, Parkville 3010, Melbourne, VIC 3052, Australia
- David Teze
- DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
- Tiia Kittilä
- DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
- Hani Gharabli
- DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
- Mandy Hobusch
- DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
- Felipe Mejia Otalvaro
- DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
- Onur Kırtel
- DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
- Gonzalo Nahuel Bidart
- DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
- Stanislav Mazurenko
- Department of Experimental Biology and RECETOX, Faculty of Science, Masarykova Univerzita, Kamenice 5/A4, Brno 625 00, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, Brno 656 91, Czech Republic
- Evelyn Travnik
- DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
- Ditte Hededam Welner
- DTU Biosustain, Technical University of Denmark, Kemitorvet 220, Lyngby, Denmark 2800
6
Han Y, Zhang H, Zeng Z, Liu Z, Lu D, Liu Z. Descriptor-augmented machine learning for enzyme-chemical interaction predictions. Synth Syst Biotechnol 2024; 9:259-268. [PMID: 38450325 PMCID: PMC10915406 DOI: 10.1016/j.synbio.2024.02.006] [Received: 11/01/2023] [Revised: 02/21/2024] [Accepted: 02/22/2024] [Indexed: 03/08/2024] Open
Abstract
Descriptors play a pivotal role in enzyme design for the greener synthesis of biochemicals, as they can characterize enzymes and chemicals from physicochemical and evolutionary perspectives. This study examined the effects of various descriptors on the performance of a Random Forest model used for predicting enzyme-chemical relationships. We curated activity data for seven specific enzyme families from the literature and developed a pipeline for evaluating machine learning model performance using 10-fold cross-validation. The influence of protein and chemical descriptors was assessed in three scenarios: predicting the activity of unknown relations between known enzymes and known chemicals (new relationship evaluation), predicting the activity of novel enzymes on known chemicals (new enzyme evaluation), and predicting the activity of new chemicals on known enzymes (new chemical evaluation). The results showed that protein descriptors significantly enhanced the classification performance of the model in the new enzyme evaluation for three of the seven datasets, those with the greatest number of enzymes, whereas chemical descriptors appeared to have no effect. A variety of sequence-based and structure-based protein descriptors were constructed, among which the esm-2 descriptor achieved the best results. Using enzyme families as labels showed that the descriptors could cluster proteins well, which could explain their contribution to the machine learning model. Conversely, in the new chemical evaluation, chemical descriptors brought significant improvement in four of the seven datasets, while protein descriptors appeared to have no effect. We attempted to evaluate the generalization ability of the model by correlating the statistics of the datasets with the performance of the models. The results showed that datasets with higher sequence similarity were more likely to obtain better results in the new enzyme evaluation, and datasets with more enzymes were more likely to benefit from the protein descriptor strategy. This work provides guidance for the development of machine learning models for specific enzyme families.
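Editorial illustration: the "new enzyme evaluation" scenario requires that no enzyme in a test fold also appears in the training data. One common way to obtain such splits is grouped k-fold cross-validation; the sketch below is a generic stdlib version under that assumption, not the authors' pipeline. The function name `group_k_fold`, the seed, and the toy enzyme/chemical identifiers are hypothetical.

```python
import random
from typing import Hashable, Iterator, Sequence


def group_k_fold(groups: Sequence[Hashable], k: int,
                 seed: int = 0) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_indices, test_indices) so that every group (for
    example an enzyme ID) falls entirely in either train or test."""
    unique = sorted(set(groups), key=str)
    if len(unique) < k:
        raise ValueError("need at least k distinct groups")
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Deal the shuffled groups round-robin into k folds.
    fold_of = {g: i % k for i, g in enumerate(unique)}
    for fold in range(k):
        test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train, test


# Toy enzyme-chemical pairs (made-up identifiers).
pairs = [("E1", "C1"), ("E1", "C2"), ("E2", "C1"),
         ("E3", "C2"), ("E3", "C3"), ("E4", "C1")]
enzymes = [enzyme for enzyme, _ in pairs]
splits = list(group_k_fold(enzymes, k=2))
```

Grouping by the chemical identifier instead of the enzyme gives the analogous "new chemical" split, while an ordinary shuffled split over pairs corresponds to the "new relationship" scenario.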
Affiliation(s)
- Yilei Han
- Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
- Haoye Zhang
- Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
- Zheni Zeng
- Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
- Zhiyuan Liu
- Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
- Diannan Lu
- Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
- Zheng Liu
- Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
7
Kroll A, Ranjan S, Lercher MJ. A multimodal Transformer Network for protein-small molecule interactions enhances predictions of kinase inhibition and enzyme-substrate relationships. PLoS Comput Biol 2024; 20:e1012100. [PMID: 38768223 PMCID: PMC11142704 DOI: 10.1371/journal.pcbi.1012100] [Received: 01/08/2024] [Revised: 05/31/2024] [Accepted: 04/24/2024] [Indexed: 05/22/2024] Open
Abstract
The activities of most enzymes and drugs depend on interactions between proteins and small molecules. Accurate prediction of these interactions could greatly accelerate pharmaceutical and biotechnological research. Current machine learning models designed for this task have a limited ability to generalize beyond the proteins used for training. This limitation is likely due to a lack of information exchange between the protein and the small molecule during the generation of the required numerical representations. Here, we introduce ProSmith, a machine learning framework that employs a multimodal Transformer Network to simultaneously process protein amino acid sequences and small molecule strings in the same input. This approach facilitates the exchange of all relevant information between the two molecule types during the computation of their numerical representations, allowing the model to account for their structural and functional interactions. Our final model combines gradient boosting predictions based on the resulting multimodal Transformer Network with independent predictions based on separate deep learning representations of the proteins and small molecules. The resulting predictions outperform recently published state-of-the-art models for predicting protein-small molecule interactions across three diverse tasks: predicting kinase inhibitions; inferring potential substrates for enzymes; and predicting Michaelis constants KM. The Python code provided can be used to easily implement and improve machine learning predictions involving arbitrary protein-small molecule interactions.
Affiliation(s)
- Alexander Kroll
- Institute for Computer Science and Department of Biology, Heinrich Heine University, Düsseldorf, Germany
- Sahasra Ranjan
- Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Powai, Mumbai, India
- Martin J. Lercher
- Institute for Computer Science and Department of Biology, Heinrich Heine University, Düsseldorf, Germany
8
Wang X, Quinn D, Moody TS, Huang M. ALDELE: All-Purpose Deep Learning Toolkits for Predicting the Biocatalytic Activities of Enzymes. J Chem Inf Model 2024; 64:3123-3139. [PMID: 38573056 PMCID: PMC11040732 DOI: 10.1021/acs.jcim.4c00058] [Received: 01/11/2024] [Revised: 02/15/2024] [Accepted: 03/11/2024] [Indexed: 04/05/2024]
Abstract
Rapidly predicting enzyme properties for catalyzing specific substrates is essential for identifying potential enzymes for industrial transformations. The demand for sustainable production of valuable industrial chemicals from biological resources has raised a pressing need to speed up biocatalyst screening using machine learning techniques. In this research, we developed an all-purpose deep-learning-based multiple-toolkit (ALDELE) workflow for screening enzyme catalysts. ALDELE incorporates both structural and sequence representations of proteins, alongside representations of ligands by subgraphs and overall physicochemical properties. Comprehensive evaluation demonstrated that ALDELE can predict the catalytic activities of enzymes; in particular, it identifies residue-based hotspots to guide enzyme engineering and generates substrate heat maps to explore the substrate scope of a given biocatalyst. Moreover, our models notably match empirical data, reinforcing the practicality and reliability of our approach through alignment with confirmed mutation sites. ALDELE offers a facile and comprehensive solution by integrating different toolkits tailored for different purposes at affordable computational cost and would therefore be valuable for speeding up the discovery of new functional enzymes for exploitation by industry.
Affiliation(s)
- Xiangwen Wang
- School of Chemistry and Chemical Engineering, Queen’s University Belfast, Belfast BT9 5AG, Northern Ireland, U.K.
- Department of Biocatalysis and Isotope Chemistry, Almac Sciences, Craigavon BT63 5QD, Northern Ireland, U.K.
- Derek Quinn
- Department of Biocatalysis and Isotope Chemistry, Almac Sciences, Craigavon BT63 5QD, Northern Ireland, U.K.
- Thomas S. Moody
- Department of Biocatalysis and Isotope Chemistry, Almac Sciences, Craigavon BT63 5QD, Northern Ireland, U.K.
- Arran Chemical Company Limited, Unit 1 Monksland Industrial Estate, Athlone, Co. Roscommon N37 DN24, Ireland
- Meilan Huang
- School of Chemistry and Chemical Engineering, Queen’s University Belfast, Belfast BT9 5AG, Northern Ireland, U.K.
9
Yang J, Li FZ, Arnold FH. Opportunities and Challenges for Machine Learning-Assisted Enzyme Engineering. ACS CENTRAL SCIENCE 2024; 10:226-241. [PMID: 38435522 PMCID: PMC10906252 DOI: 10.1021/acscentsci.3c01275] [Received: 10/17/2023] [Revised: 12/26/2023] [Accepted: 01/16/2024] [Indexed: 03/05/2024]
Abstract
Enzymes can be engineered at the level of their amino acid sequences to optimize key properties such as expression, stability, substrate range, and catalytic efficiency, or even to unlock new catalytic activities not found in nature. Because the search space of possible proteins is vast, enzyme engineering usually involves discovering an enzyme starting point that has some level of the desired activity, followed by directed evolution to improve its "fitness" for a desired application. Recently, machine learning (ML) has emerged as a powerful tool to complement this empirical process. ML models can contribute to (1) starting point discovery by functional annotation of known protein sequences or generating novel protein sequences with desired functions and (2) navigating protein fitness landscapes for fitness optimization by learning mappings between protein sequences and their associated fitness values. In this Outlook, we explain how ML complements enzyme engineering and discuss its future potential to unlock improved engineering outcomes.
Affiliation(s)
- Jason Yang
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Francesca-Zhoufan Li
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Frances H. Arnold
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
10
Ao YF, Dörr M, Menke MJ, Born S, Heuson E, Bornscheuer UT. Data-Driven Protein Engineering for Improving Catalytic Activity and Selectivity. Chembiochem 2024; 25:e202300754. [PMID: 38029350 DOI: 10.1002/cbic.202300754] [Received: 11/03/2023] [Revised: 11/28/2023] [Accepted: 11/29/2023] [Indexed: 12/01/2023]
Abstract
Protein engineering is essential for altering the substrate scope, catalytic activity and selectivity of enzymes for applications in biocatalysis. However, traditional approaches, such as directed evolution and rational design, face the challenge of experimentally screening a large protein mutation space. Machine learning methods allow the approximation of protein fitness landscapes and the identification of catalytic patterns using limited experimental data, thus providing a new avenue to guide protein engineering campaigns. In this concept article, we review machine learning models that have been developed to assess enzyme-substrate-catalysis performance relationships, aiming to improve enzymes through data-driven protein engineering. Furthermore, we discuss prospects for the future development of this field to provide additional strategies and tools for achieving desired activities and selectivities.
Affiliation(s)
- Yu-Fei Ao
- Department of Biotechnology and Enzyme Catalysis, Institute of Biochemistry, University of Greifswald, Felix-Hausdorff-Str. 4, 17487, Greifswald, Germany
- Beijing National Laboratory for Molecular Sciences, CAS Key Laboratory of Molecular Recognition and Function, Institute of Chemistry, Chinese Academy of Sciences, Zhongguancun North First Street 2, Beijing, 100190, China
- University of Chinese Academy of Sciences, Yuquan Road 19(A), Beijing, 100049, China
- Mark Dörr
- Department of Biotechnology and Enzyme Catalysis, Institute of Biochemistry, University of Greifswald, Felix-Hausdorff-Str. 4, 17487, Greifswald, Germany
- Marian J Menke
- Department of Biotechnology and Enzyme Catalysis, Institute of Biochemistry, University of Greifswald, Felix-Hausdorff-Str. 4, 17487, Greifswald, Germany
- Stefan Born
- Technische Universität Berlin, Chair of Bioprocess Engineering, Ackerstraße 76, 13355, Berlin, Germany
- Egon Heuson
- Univ. Lille, CNRS, Centrale Lille, Univ. Artois, UMR 8181 UCCS, Unité de Catalyse et Chimie du Solide, 59000, Lille, France
- Uwe T Bornscheuer
- Department of Biotechnology and Enzyme Catalysis, Institute of Biochemistry, University of Greifswald, Felix-Hausdorff-Str. 4, 17487, Greifswald, Germany
11
Joho Y, Vongsouthi V, Gomez C, Larsen JS, Ardevol A, Jackson CJ. Improving plastic degrading enzymes via directed evolution. Protein Eng Des Sel 2024; 37:gzae009. [PMID: 38713696 PMCID: PMC11091475 DOI: 10.1093/protein/gzae009] [Received: 12/15/2023] [Revised: 04/30/2024] [Accepted: 05/05/2024] [Indexed: 05/09/2024] Open
Abstract
Plastic degrading enzymes have immense potential for use in industrial applications. Protein engineering efforts over the last decade have resulted in considerable enhancement of many properties of these enzymes. Directed evolution, a protein engineering approach that mimics the natural process of evolution in a laboratory, has been particularly useful in overcoming some of the challenges of structure-based protein engineering. For example, directed evolution has been used to improve the catalytic activity and thermostability of polyethylene terephthalate (PET)-degrading enzymes, although its use for the improvement of other desirable properties, such as solvent tolerance, has been less studied. In this review, we aim to identify some of the knowledge gaps and current challenges, and highlight recent studies related to the directed evolution of plastic-degrading enzymes.
Affiliation(s)
- Yvonne Joho
- Manufacturing, Commonwealth Scientific and Industrial Research Organisation, Research Way, Clayton, Victoria 3168, Australia
- Research School of Chemistry, Australian National University, Sullivans Creek Rd, Canberra, ACT 2601, Australia
- CSIRO Advanced Engineering Biology Future Science Platform, GPO Box 1700, Canberra, ACT 2601, Australia
| | - Vanessa Vongsouthi
- Research School of Chemistry, Australian National University, Sullivans Creek Rd, Canberra, ACT 2601, Australia
| | - Chloe Gomez
- Research School of Chemistry, Australian National University, Sullivans Creek Rd, Canberra, ACT 2601, Australia
| | - Joachim S Larsen
- Research School of Chemistry, Australian National University, Sullivans Creek Rd, Canberra, ACT 2601, Australia
- ARC Centre of Excellence for Synthetic Biology, Research School of Chemistry, Australian National University, Sullivans Creek Rd, Canberra, ACT 2601, Australia
| | - Albert Ardevol
- Manufacturing, Commonwealth Scientific and Industrial Research Organisation, Research Way, Clayton, Victoria 3168, Australia
- CSIRO Advanced Engineering Biology Future Science Platform, GPO Box 1700, Canberra, ACT 2601, Australia
| | - Colin J Jackson
- Research School of Chemistry, Australian National University, Sullivans Creek Rd, Canberra, ACT 2601, Australia
- ARC Centre of Excellence for Synthetic Biology, Research School of Chemistry, Australian National University, Sullivans Creek Rd, Canberra, ACT 2601, Australia
- ARC Centre of Excellence for Innovations in Peptide & Protein Science, Research School of Chemistry, Australian National University, Sullivans Creek Rd, Canberra, ACT 2601, Australia
|
12
|
Xing H, Cai P, Liu D, Han M, Liu J, Le Y, Zhang D, Hu QN. High-throughput prediction of enzyme promiscuity based on substrate-product pairs. Brief Bioinform 2024; 25:bbae089. [PMID: 38487850 PMCID: PMC10940840 DOI: 10.1093/bib/bbae089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 01/20/2024] [Accepted: 02/03/2024] [Indexed: 03/18/2024] Open
Abstract
The screening of enzymes for catalyzing specific substrate-product pairs is often constrained in the realms of metabolic engineering and synthetic biology. Existing tools based on substrate and reaction similarity predominantly rely on prior knowledge, demonstrating limited extrapolative capabilities and an inability to incorporate custom candidate-enzyme libraries. Addressing these limitations, we have developed the Substrate-product Pair-based Enzyme Promiscuity Prediction (SPEPP) model. This innovative approach utilizes transfer learning and transformer architecture to predict enzyme promiscuity, thereby elucidating the intricate interplay between enzymes and substrate-product pairs. SPEPP exhibited robust predictive ability, eliminating the need for prior knowledge of reactions and allowing users to define their own candidate-enzyme libraries. It can be seamlessly integrated into various applications, including metabolic engineering, de novo pathway design, and hazardous material degradation. To better assist metabolic engineers in designing and refining biochemical pathways, particularly those without programming skills, we also designed EnzyPick, an easy-to-use web server for enzyme screening based on SPEPP. EnzyPick is accessible at http://www.biosynther.com/enzypick/.
Affiliation(s)
- Huadong Xing
- CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Pengli Cai
- CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Dongliang Liu
- CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Mengying Han
- CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Juan Liu
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan 430072, China
| | - Yingying Le
- CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Dachuan Zhang
- Institute of Environmental Engineering, ETH Zurich, Laura-Hezner-Weg 7, 8093 Zurich, Switzerland
| | - Qian-Nan Hu
- CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
|
13
|
Bitencourt-Ferreira G, Villarreal MA, Quiroga R, Biziukova N, Poroikov V, Tarasova O, de Azevedo Junior WF. Exploring Scoring Function Space: Developing Computational Models for Drug Discovery. Curr Med Chem 2024; 31:2361-2377. [PMID: 36944627 DOI: 10.2174/0929867330666230321103731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 12/15/2022] [Accepted: 12/29/2022] [Indexed: 03/23/2023]
Abstract
BACKGROUND The idea of scoring function space established a systems-level approach to address the development of models to predict the affinity of drug molecules by those interested in drug discovery. OBJECTIVE Our goal here is to review the concept of scoring function space and how to explore it to develop machine learning models to address protein-ligand binding affinity. METHODS We searched the articles available in PubMed related to the scoring function space. We also utilized crystallographic structures found in the protein data bank (PDB) to represent the protein space. RESULTS The application of systems-level approaches to address receptor-drug interactions allows us to have a holistic view of the process of drug discovery. The scoring function space adds flexibility to the process since it makes it possible to see drug discovery as a relationship involving mathematical spaces. CONCLUSION The application of the concept of scoring function space has provided us with an integrated view of drug discovery methods. This concept is useful during drug discovery, where we see the process as a computational search of the scoring function space to find an adequate model to predict receptor-drug binding affinity.
Affiliation(s)
| | - Marcos A Villarreal
- CONICET-Departamento de Matemática y Física, Instituto de Investigaciones en Fisicoquímica de Córdoba (INFIQC), Facultad de Ciencias Químicas, Universidad Nacional de Córdoba, Ciudad Universitaria, Córdoba, Argentina
| | - Rodrigo Quiroga
- CONICET-Departamento de Matemática y Física, Instituto de Investigaciones en Fisicoquímica de Córdoba (INFIQC), Facultad de Ciencias Químicas, Universidad Nacional de Córdoba, Ciudad Universitaria, Córdoba, Argentina
| | - Nadezhda Biziukova
- Institute of Biomedical Chemistry, Pogodinskaya Str., 10/8, Moscow, 119121, Russia
| | - Vladimir Poroikov
- Institute of Biomedical Chemistry, Pogodinskaya Str., 10/8, Moscow, 119121, Russia
| | - Olga Tarasova
- Institute of Biomedical Chemistry, Pogodinskaya Str., 10/8, Moscow, 119121, Russia
| | - Walter F de Azevedo Junior
- Pontifical Catholic University of Rio Grande do Sul - PUCRS, Porto Alegre-RS, Brazil
- Specialization Program in Bioinformatics, The Pontifical Catholic University of Rio Grande do Sul (PUCRS), Av. Ipiranga, 6681 Porto Alegre / RS 90619-900, Brazil
|
14
|
Boob AG, Chen J, Zhao H. Enabling pathway design by multiplex experimentation and machine learning. Metab Eng 2024; 81:70-87. [PMID: 38040110 DOI: 10.1016/j.ymben.2023.11.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 11/01/2023] [Accepted: 11/25/2023] [Indexed: 12/03/2023]
Abstract
The remarkable metabolic diversity observed in nature has provided a foundation for sustainable production of a wide array of valuable molecules. However, transferring the biosynthetic pathway to the desired host often runs into inherent failures that arise from intermediate accumulation and reduced flux resulting from competing pathways within the host cell. Moreover, the conventional trial and error methods utilized in pathway optimization struggle to fully grasp the intricacies of installed pathways, leading to time-consuming and labor-intensive experiments, ultimately resulting in suboptimal yields. Considering these obstacles, there is a pressing need to explore the enzyme expression landscape and identify the optimal pathway configuration for enhanced production of molecules. This review delves into recent advancements in pathway engineering, with a focus on multiplex experimentation and machine learning techniques. These approaches play a pivotal role in overcoming the limitations of traditional methods, enabling exploration of a broader design space and increasing the likelihood of discovering optimal pathway configurations for enhanced production of molecules. We discuss several tools and strategies for pathway design, construction, and optimization for sustainable and cost-effective microbial production of molecules ranging from bulk to fine chemicals. We also highlight major successes in academia and industry through compelling case studies.
Affiliation(s)
- Aashutosh Girish Boob
- Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign, Urbana, IL, 61801, United States; Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL, 61801, United States; DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, United States
| | - Junyu Chen
- Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, IL, 61801, United States; Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL, 61801, United States; DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, United States
| | - Huimin Zhao
- Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign, Urbana, IL, 61801, United States; Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, IL, 61801, United States; Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL, 61801, United States; DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, United States.
|
15
|
Raghavan P, Haas BC, Ruos ME, Schleinitz J, Doyle AG, Reisman SE, Sigman MS, Coley CW. Dataset Design for Building Models of Chemical Reactivity. ACS Cent Sci 2023; 9:2196-2204. [PMID: 38161380 PMCID: PMC10755851 DOI: 10.1021/acscentsci.3c01163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 11/06/2023] [Accepted: 11/15/2023] [Indexed: 01/03/2024]
Abstract
Models can codify our understanding of chemical reactivity and serve a useful purpose in the development of new synthetic processes via, for example, evaluating hypothetical reaction conditions or in silico substrate tolerance. Perhaps the most determining factor is the composition of the training data and whether it is sufficient to train a model that can make accurate predictions over the full domain of interest. Here, we discuss the design of reaction datasets in ways that are conducive to data-driven modeling, emphasizing the idea that training set diversity and model generalizability rely on the choice of molecular or reaction representation. We additionally discuss the experimental constraints associated with generating common types of chemistry datasets and how these considerations should influence dataset design and model building.
Affiliation(s)
- Priyanka Raghavan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Brittany C. Haas
- Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, United States
| | - Madeline E. Ruos
- Department of Chemistry & Biochemistry, University of California, Los Angeles, Los Angeles, California 90095, United States
| | - Jules Schleinitz
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
| | - Abigail G. Doyle
- Department of Chemistry & Biochemistry, University of California, Los Angeles, Los Angeles, California 90095, United States
| | - Sarah E. Reisman
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
| | - Matthew S. Sigman
- Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, United States
| | - Connor W. Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
|
16
|
Xi C, Diao J, Moon TS. Advances in ligand-specific biosensing for structurally similar molecules. Cell Syst 2023; 14:1024-1043. [PMID: 38128482 PMCID: PMC10751988 DOI: 10.1016/j.cels.2023.10.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2023] [Revised: 08/23/2023] [Accepted: 10/19/2023] [Indexed: 12/23/2023]
Abstract
The specificity of biological systems makes it possible to develop biosensors targeting specific metabolites, toxins, and pollutants in complex medical or environmental samples without interference from structurally similar compounds. For the last two decades, great efforts have been devoted to creating proteins or nucleic acids with novel properties through synthetic biology strategies. Beyond augmenting biocatalytic activity, expanding target substrate scopes, and enhancing enzymes' enantioselectivity and stability, an increasing research area is the enhancement of molecular specificity for genetically encoded biosensors. Here, we summarize recent advances in the development of highly specific biosensor systems and their essential applications. First, we describe the rational design principles required to create libraries containing potential mutants with less promiscuity or better specificity. Next, we review the emerging high-throughput screening techniques to engineer biosensing specificity for the desired target. Finally, we examine the computer-aided evaluation and prediction methods to facilitate the construction of ligand-specific biosensors.
Affiliation(s)
- Chenggang Xi
- Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, MO, USA
| | - Jinjin Diao
- Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, MO, USA
| | - Tae Seok Moon
- Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, MO, USA; Division of Biology and Biomedical Sciences, Washington University in St. Louis, St. Louis, MO, USA.
|
17
|
Heid E, Probst D, Green WH, Madsen GKH. EnzymeMap: curation, validation and data-driven prediction of enzymatic reactions. Chem Sci 2023; 14:14229-14242. [PMID: 38098707 PMCID: PMC10718068 DOI: 10.1039/d3sc02048g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Accepted: 11/21/2023] [Indexed: 12/17/2023] Open
Abstract
Enzymatic reactions are an ecofriendly, selective, and versatile addition, sometimes even alternative to organic reactions for the synthesis of chemical compounds such as pharmaceuticals or fine chemicals. To identify suitable reactions, computational models to predict the activity of enzymes on non-native substrates, to perform retrosynthetic pathway searches, or to predict the outcomes of reactions including regio- and stereoselectivity are becoming increasingly important. However, current approaches are substantially hindered by the limited amount of available data, especially if balanced and atom mapped reactions are needed and if the models feature machine learning components. We therefore constructed a high-quality dataset (EnzymeMap) by developing a large set of correction and validation algorithms for recorded reactions in the literature and showcase its significant positive impact on machine learning models of retrosynthesis, forward prediction, and regioselectivity prediction, outperforming previous approaches by a large margin. Our dataset allows for deep learning models of enzymatic reactions with unprecedented accuracy, and is freely available online.
Affiliation(s)
- Esther Heid
- Institute of Materials Chemistry, TU Wien 1060 Vienna Austria
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
| | | | - William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
|
18
|
Barghout RA, Xu Z, Betala S, Mahadevan R. Advances in generative modeling methods and datasets to design novel enzymes for renewable chemicals and fuels. Curr Opin Biotechnol 2023; 84:103007. [PMID: 37931573 DOI: 10.1016/j.copbio.2023.103007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Revised: 09/12/2023] [Accepted: 09/13/2023] [Indexed: 11/08/2023]
Abstract
Biotechnology has revolutionized the development of sustainable energy sources by harnessing biomass as a feedstock for energy production. However, challenges such as recalcitrant feedstocks and inefficient metabolic pathways hinder the large-scale integration of renewable energy systems. Enzyme engineering has emerged as a powerful tool to address these challenges by enhancing enzyme activity, specificity, and stability. Generative machine learning (ML) models have shown great promise in accelerating protein design, allowing for the generation of novel protein sequences with desired properties by navigating vast spaces. This review paper aims to summarize the state of the art in generative models for protein design and how they can be applied to bioenergy applications, including the underlying architectures and training strategies. Additionally, it highlights the importance of high-quality datasets for training and evaluating generative models, organizes available datasets for generative protein design, and discusses the potential of applying generative models to strain design for bioenergy production.
Affiliation(s)
- Rana A Barghout
- Department of Chemical Engineering & Applied Chemistry, University of Toronto, 200 College St, Toronto, ON, Canada.
| | - Zhiqing Xu
- Department of Chemical Engineering & Applied Chemistry, University of Toronto, 200 College St, Toronto, ON, Canada
| | - Siddharth Betala
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
| | - Radhakrishnan Mahadevan
- Department of Chemical Engineering & Applied Chemistry, University of Toronto, 200 College St, Toronto, ON, Canada
|
19
|
Ran X, Jiang Y, Shao Q, Yang ZJ. EnzyKR: a chirality-aware deep learning model for predicting the outcomes of the hydrolase-catalyzed kinetic resolution. Chem Sci 2023; 14:12073-12082. [PMID: 37969577 PMCID: PMC10631226 DOI: 10.1039/d3sc02752j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Accepted: 10/16/2023] [Indexed: 11/17/2023] Open
Abstract
Hydrolase-catalyzed kinetic resolution is a well-established biocatalytic process. However, the computational tools that predict favorable enzyme scaffolds for separating a racemic substrate mixture are underdeveloped. To address this challenge, we trained a deep learning framework, EnzyKR, to automate the selection of hydrolases for stereoselective biocatalysis. EnzyKR adopts a classifier-regressor architecture that first identifies the reactive binding conformer of a substrate-hydrolase complex, and then predicts its activation free energy. A structure-based encoding strategy was used to depict the chiral interactions between hydrolases and enantiomers. Different from existing models trained on protein sequences and substrate SMILES strings, EnzyKR was trained using 204 substrate-hydrolase complexes, which were constructed by docking. EnzyKR was tested using a held-out dataset of 20 complexes on the task of predicting activation free energy. EnzyKR achieved a Pearson correlation coefficient (R) of 0.72, a Spearman rank correlation coefficient (Spearman R) of 0.72, and a mean absolute error (MAE) of 1.54 kcal mol⁻¹ in this task. Furthermore, EnzyKR was tested on the task of predicting enantiomeric excess ratios for 28 hydrolytic kinetic resolution reactions catalyzed by fluoroacetate dehalogenase RPA1163, halohydrin HheC, A. mediolanus epoxide hydrolase, and P. fluorescens esterase. The performance of EnzyKR was compared against that of a recently developed kinetic predictor, DLKcat. EnzyKR correctly predicts the favored enantiomer and outperforms DLKcat in 18 out of 28 reactions, that is, 64% of the test cases. These results demonstrate EnzyKR to be a new approach for prediction of enantiomeric outcomes in hydrolase-catalyzed kinetic resolution reactions.
Affiliation(s)
- Xinchun Ran
- Department of Chemistry, Vanderbilt University Nashville Tennessee 37235 USA
| | - Yaoyukun Jiang
- Department of Chemistry, Vanderbilt University Nashville Tennessee 37235 USA
| | - Qianzhen Shao
- Department of Chemistry, Vanderbilt University Nashville Tennessee 37235 USA
| | - Zhongyue J Yang
- Department of Chemistry, Vanderbilt University Nashville Tennessee 37235 USA
- Center for Structural Biology, Vanderbilt University Nashville Tennessee 37235 USA
- Vanderbilt Institute of Chemical Biology, Vanderbilt University Nashville Tennessee 37235 USA
- Data Science Institute, Vanderbilt University Nashville Tennessee 37235 USA
- Department of Chemical and Biomolecular Engineering, Vanderbilt University Nashville Tennessee 37235 USA
|
20
|
Kouba P, Kohout P, Haddadi F, Bushuiev A, Samusevich R, Sedlar J, Damborsky J, Pluskal T, Sivic J, Mazurenko S. Machine Learning-Guided Protein Engineering. ACS Catal 2023; 13:13863-13895. [PMID: 37942269 PMCID: PMC10629210 DOI: 10.1021/acscatal.3c02743] [Citation(s) in RCA: 26] [Impact Index Per Article: 26.0] [Received: 06/15/2023] [Revised: 09/20/2023] [Indexed: 11/10/2023]
Abstract
Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of promising enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, protein dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in this domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough experimental validation of emerging models before their use for rational protein design. We present our opinions on the fundamental problems and outline the potential directions for future research.
Affiliation(s)
- Petr Kouba
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Faculty of Electrical Engineering, Czech Technical University in Prague, Technicka 2, 166 27 Prague 6, Czech Republic
| | - Pavel Kohout
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
| | - Faraneh Haddadi
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
| | - Anton Bushuiev
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
| | - Raman Samusevich
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
| | - Jiri Sedlar
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
| | - Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
| | - Tomas Pluskal
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
| | - Josef Sivic
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
| | - Stanislav Mazurenko
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
|
21
|
Braun M, Gruber CC, Krassnigg A, Kummer A, Lutz S, Oberdorfer G, Siirola E, Snajdrova R. Accelerating Biocatalysis Discovery with Machine Learning: A Paradigm Shift in Enzyme Engineering, Discovery, and Design. ACS Catal 2023; 13:14454-14469. [PMID: 37942268 PMCID: PMC10629211 DOI: 10.1021/acscatal.3c03417] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 07/25/2023] [Revised: 09/29/2023] [Accepted: 10/03/2023] [Indexed: 11/10/2023]
Abstract
Emerging computational tools promise to revolutionize protein engineering for biocatalytic applications and accelerate the development timelines previously needed to optimize an enzyme to its more efficient variant. For over a decade, the benefits of predictive algorithms have helped scientists and engineers navigate the complexity of functional protein sequence space. More recently, spurred by dramatic advances in underlying computational tools, the promise of faster, cheaper, and more accurate enzyme identification, characterization, and engineering has catapulted terms such as artificial intelligence and machine learning to the must-have vocabulary in the field. This Perspective aims to showcase the current status of applications in pharmaceutical industry and also to discuss and celebrate the innovative approaches in protein science by highlighting their potential in selected recent developments and offering thoughts on future opportunities for biocatalysis. It also critically assesses the technology's limitations, unanswered questions, and unmet challenges.
Affiliation(s)
- Markus Braun
- Department of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010 Graz, Austria
| | - Christian C Gruber
- Enzyme and Drug Discovery, Innophore, 1700 Montgomery Street, San Francisco, California 94111, United States
| | - Andreas Krassnigg
- Enzyme and Drug Discovery, Innophore, 1700 Montgomery Street, San Francisco, California 94111, United States
| | - Arkadij Kummer
- Moderna, Inc., 200 Technology Square, Cambridge, Massachusetts 02139, United States
| | - Stefan Lutz
- Codexis Inc., 200 Penobscot Drive, Redwood City, California 94063, United States
| | - Gustav Oberdorfer
- Department of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010 Graz, Austria
| | - Elina Siirola
- Novartis Institute for Biomedical Research, Global Discovery Chemistry, Basel CH-4108, Switzerland
| | - Radka Snajdrova
- Novartis Institute for Biomedical Research, Global Discovery Chemistry, Basel CH-4108, Switzerland
|
22
|
Zhang Y, Guo J, Gao P, Yan W, Shen J, Luo X, Keasling JD. Development of an efficient yeast platform for cannabigerolic acid biosynthesis. Metab Eng 2023; 80:232-240. [PMID: 37890610 DOI: 10.1016/j.ymben.2023.10.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 08/15/2023] [Revised: 10/16/2023] [Accepted: 10/18/2023] [Indexed: 10/29/2023]
Abstract
Cannabinoids are important therapeutical molecules for human ailments, cancer treatment, and SARS-CoV-2. The central cannabinoid, cannabigerolic acid (CBGA), is generated from geranyl pyrophosphate and olivetolic acid by Cannabis sativa prenyltransferase (CsPT4). Despite efforts to engineer microorganisms such as Saccharomyces cerevisiae (S. cerevisiae) for CBGA production, their titers remain suboptimal because of the low conversion of hexanoate into olivetolic acid and the limited activity and stability of the CsPT4. To address the low hexanoate conversion, we eliminated hexanoate consumption by the beta-oxidation pathway and reduced its incorporation into fatty acids. To address CsPT4 limitations, we expanded the endoplasmic reticulum and fused an auxiliary protein to CsPT4. Consequently, the engineered S. cerevisiae chassis showed a marked improvement of 78.64-fold in CBGA production, reaching a titer of 510.32 ± 10.70 mg l⁻¹ from glucose and hexanoate.
Affiliation(s)
- Yunfeng Zhang
- Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, CAS Key Laboratory of Quantitative Engineering Biology, Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
| | - Jiulong Guo
- Synceres Biosciences (Shenzhen) CO., LTD, China
| | - PeiZhen Gao
- Synceres Biosciences (Shenzhen) CO., LTD, China
| | - Wei Yan
- Synceres Biosciences (Shenzhen) CO., LTD, China
| | - Junfeng Shen
- Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, CAS Key Laboratory of Quantitative Engineering Biology, Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
| | - Xiaozhou Luo
- Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, CAS Key Laboratory of Quantitative Engineering Biology, Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
| | - Jay D Keasling
- Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China; Joint BioEnergy Institute, Emeryville, CA, 94608, USA; Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Department of Chemical and Biomolecular Engineering & Department of Bioengineering, University of California, Berkeley, CA, 94720, USA; Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800, Kgs. Lyngby, Denmark.
|
23
|
Finnigan W, Lubberink M, Hepworth LJ, Citoler J, Mattey AP, Ford GJ, Sangster J, Cosgrove SC, da Costa BZ, Heath RS, Thorpe TW, Yu Y, Flitsch SL, Turner NJ. RetroBioCat Database: A Platform for Collaborative Curation and Automated Meta-Analysis of Biocatalysis Data. ACS Catal 2023; 13:11771-11780. [PMID: 37671181 PMCID: PMC10476152 DOI: 10.1021/acscatal.3c01418] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 03/29/2023] [Revised: 06/26/2023] [Indexed: 09/07/2023]
Abstract
Despite the increasing use of biocatalysis for organic synthesis, there are currently no databases that adequately capture synthetic biotransformations. The lack of a biocatalysis database prevents accelerating biocatalyst characterization efforts from being leveraged to quickly identify candidate enzymes for reactions or cascades, slowing their development. The RetroBioCat Database (available at retrobiocat.com) addresses this gap by capturing information on synthetic biotransformations and providing an analysis platform that allows biocatalysis data to be searched and explored through a range of highly interactive data visualization tools. This database makes it simple to explore available enzymes, their substrate scopes, and how characterized enzymes are related to each other and the wider sequence space. Data entry is facilitated through an openly accessible curation platform, featuring automated tools to accelerate the process. The RetroBioCat Database democratizes biocatalysis knowledge and has the potential to accelerate biocatalytic reaction development, making it a valuable resource for the community.
Affiliation(s)
- William Finnigan
- Department of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester M1 7DN, U.K.
| | | | - Lorna J. Hepworth
- Department of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester M1 7DN, U.K.
| | - Joan Citoler
- Department of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester M1 7DN, U.K.
| | - Ashley P. Mattey
- Department of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester M1 7DN, U.K.
| | - Grayson J. Ford
- Department of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester M1 7DN, U.K.
| | - Jack Sangster
- Department of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester M1 7DN, U.K.
| | | | - Bruna Zucoloto da Costa
- Department of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester M1 7DN, U.K.
| | - Rachel S. Heath
- Department of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester M1 7DN, U.K.
| | | | - Yuqi Yu
- Department of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester M1 7DN, U.K.
| | - Sabine L. Flitsch
- Department of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester M1 7DN, U.K.
| | - Nicholas J. Turner
- Department of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester M1 7DN, U.K.
|
24
|
Clements HD, Flynn AR, Nicholls BT, Grosheva D, Lefave SJ, Merriman MT, Hyster TK, Sigman MS. Using Data Science for Mechanistic Insights and Selectivity Predictions in a Non-Natural Biocatalytic Reaction. J Am Chem Soc 2023; 145:17656-17664. [PMID: 37530568 PMCID: PMC10602048 DOI: 10.1021/jacs.3c03639] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Indexed: 08/03/2023]
Abstract
The study of non-natural biocatalytic transformations relies heavily on empirical methods, such as directed evolution, for identifying improved variants. Although exceptionally effective, this approach provides limited insight into the molecular mechanisms behind the transformations and necessitates multiple protein engineering campaigns for new reactants. To address this limitation, we disclose a strategy to explore the biocatalytic reaction space and garner insight into the molecular mechanisms driving enzymatic transformations. Specifically, we explored the selectivity of an "ene"-reductase, GluER-T36A, to create a data-driven toolset that explores reaction space and rationalizes the observed and predicted selectivities of substrate/mutant combinations. The resultant statistical models related structural features of the enzyme and substrate to selectivity and were used to effectively predict selectivity in reactions with out-of-sample substrates and mutants. Our approach provided a deeper understanding of enantioinduction by GluER-T36A and holds the potential to enhance the virtual screening of enzyme mutants.
Affiliation(s)
- Hanna D Clements
- Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
| | - Autumn R Flynn
- Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
| | - Bryce T Nicholls
- Department of Chemistry and Chemical Biology, Cornell University, 122 Baker Laboratory, Ithaca, New York 14853, United States
| | - Daria Grosheva
- Department of Chemistry and Chemical Biology, Cornell University, 122 Baker Laboratory, Ithaca, New York 14853, United States
| | - Sarah J Lefave
- Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
| | - Morgan T Merriman
- Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
| | - Todd K Hyster
- Department of Chemistry and Chemical Biology, Cornell University, 122 Baker Laboratory, Ithaca, New York 14853, United States
| | - Matthew S Sigman
- Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
|
25
|
Brooks SM, Marsan C, Reed KB, Yuan SF, Nguyen DD, Trivedi A, Altin-Yavuzarslan G, Ballinger N, Nelson A, Alper HS. A tripartite microbial co-culture system for de novo biosynthesis of diverse plant phenylpropanoids. Nat Commun 2023; 14:4448. [PMID: 37488111 PMCID: PMC10366228 DOI: 10.1038/s41467-023-40242-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 02/16/2023] [Accepted: 07/19/2023] [Indexed: 07/26/2023] Open
Abstract
Plant-derived phenylpropanoids, in particular phenylpropenes, have diverse industrial applications ranging from flavors and fragrances to polymers and pharmaceuticals. Heterologous biosynthesis of these products has the potential to address the low, seasonally dependent yields that hinder widespread manufacturing. However, previous efforts have been hindered by the inherent pathway promiscuity and the microbial toxicity of key pathway intermediates. In this study, we establish the propensity of a tripartite microbial co-culture to overcome these limitations and demonstrate, to our knowledge, the first reported de novo phenylpropene production from simple sugar starting materials. After initially designing the system to accumulate eugenol, the platform modularity and downstream enzyme promiscuity were leveraged to quickly create avenues for hydroxychavicol and chavicol production. The consortium was found to be compatible with Engineered Living Material production platforms that allow for reusable, cold-chain-independent distributed manufacturing. This work lays the foundation for further deployment of modular microbial approaches to produce plant secondary metabolites.
Affiliation(s)
- Sierra M Brooks
- McKetta Department of Chemical Engineering, The University of Texas at Austin, Austin, TX, USA
| | - Celeste Marsan
- McKetta Department of Chemical Engineering, The University of Texas at Austin, Austin, TX, USA
| | - Kevin B Reed
- McKetta Department of Chemical Engineering, The University of Texas at Austin, Austin, TX, USA
| | - Shuo-Fu Yuan
- Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX, USA
| | - Dustin-Dat Nguyen
- Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX, USA
| | - Adit Trivedi
- McKetta Department of Chemical Engineering, The University of Texas at Austin, Austin, TX, USA
| | - Gokce Altin-Yavuzarslan
- Molecular Engineering and Sciences Institute, University of Washington, Seattle, WA, 98195, USA
| | - Nathan Ballinger
- Department of Chemistry, University of Washington, Box 351700, Seattle, WA, USA
| | - Alshakim Nelson
- Molecular Engineering and Sciences Institute, University of Washington, Seattle, WA, 98195, USA
- Department of Chemistry, University of Washington, Box 351700, Seattle, WA, USA
| | - Hal S Alper
- McKetta Department of Chemical Engineering, The University of Texas at Austin, Austin, TX, USA.
- Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX, USA.
|
26
|
Kroll A, Rousset Y, Hu XP, Liebrand NA, Lercher MJ. Turnover number predictions for kinetically uncharacterized enzymes using machine and deep learning. Nat Commun 2023; 14:4139. [PMID: 37438349 DOI: 10.1038/s41467-023-39840-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 12/12/2022] [Accepted: 06/27/2023] [Indexed: 07/14/2023] Open
Abstract
The turnover number kcat, a measure of enzyme efficiency, is central to understanding cellular physiology and resource allocation. As experimental kcat estimates are unavailable for the vast majority of enzymatic reactions, the development of accurate computational prediction methods is highly desirable. However, existing machine learning models are limited to a single, well-studied organism, or they provide inaccurate predictions except for enzymes that are highly similar to proteins in the training set. Here, we present TurNuP, a general and organism-independent model that successfully predicts turnover numbers for natural reactions of wild-type enzymes. We constructed model inputs by representing complete chemical reactions through differential reaction fingerprints and by representing enzymes through a modified and re-trained Transformer Network model for protein sequences. TurNuP outperforms previous models and generalizes well even to enzymes that are not similar to proteins in the training set. Parameterizing metabolic models with TurNuP-predicted kcat values leads to improved proteome allocation predictions. To provide a powerful and convenient tool for the study of molecular biochemistry and physiology, we implemented a TurNuP web server.
Affiliation(s)
- Alexander Kroll
- Institute for Computer Science and Department of Biology, Heinrich Heine University, D-40225, Düsseldorf, Germany
| | - Yvan Rousset
- Institute for Computer Science and Department of Biology, Heinrich Heine University, D-40225, Düsseldorf, Germany
| | - Xiao-Pan Hu
- Institute for Computer Science and Department of Biology, Heinrich Heine University, D-40225, Düsseldorf, Germany
| | - Nina A Liebrand
- Institute for Computer Science and Department of Biology, Heinrich Heine University, D-40225, Düsseldorf, Germany
| | - Martin J Lercher
- Institute for Computer Science and Department of Biology, Heinrich Heine University, D-40225, Düsseldorf, Germany.
|
27
|
Upadhyay V, Boorla VS, Maranas CD. Rank-ordering of known enzymes as starting points for re-engineering novel substrate activity using a convolutional neural network. Metab Eng 2023; 78:171-182. [PMID: 37301359 DOI: 10.1016/j.ymben.2023.06.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Revised: 05/19/2023] [Accepted: 06/02/2023] [Indexed: 06/12/2023]
Abstract
Retro-biosynthetic approaches have made significant advances in predicting synthesis routes of target biofuel, bio-renewable or bio-active molecules. The use of only cataloged enzymatic activities limits the discovery of new production routes. Recent retro-biosynthetic algorithms increasingly use novel conversions that require altering the substrate or cofactor specificities of existing enzymes while connecting pathways leading to a target metabolite. However, identifying and re-engineering enzymes for desired novel conversions are currently the bottlenecks in implementing such designed pathways. Herein, we present EnzRank, a convolutional neural network (CNN) based approach, to rank-order existing enzymes in terms of their suitability to undergo successful protein engineering through directed evolution or de novo design towards a desired specific substrate activity. We train the CNN model on 11,800 known active enzyme-substrate pairs from the BRENDA database as positive samples. Negative samples are generated by scrambling these pairs, using the dissimilarity (measured by the Tanimoto similarity score) between an enzyme's native substrate and all other molecules present in the dataset. EnzRank achieves average recovery rates of 80.72% and 73.08% for positive and negative pairs, respectively, on test data after using a 10-fold holdout method for training and cross-validation. We further developed a web-based user interface (available at https://huggingface.co/spaces/vuu10/EnzRank) to predict enzyme-substrate activity using SMILES strings of substrates and enzyme sequence as input to allow convenient and easy-to-use access to EnzRank. In summary, this effort can aid de novo pathway design tools to prioritize starting enzyme re-engineering candidates for novel reactions as well as in predicting the potential secondary activity of enzymes in cell metabolism.
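The Tanimoto-based negative sampling mentioned in the EnzRank abstract can be illustrated with a minimal sketch. This is not the authors' code: the fingerprints are toy sets of on-bit indices, and the function names, molecule names, and the 0.3 threshold are all hypothetical choices made only for illustration.

```python
from typing import AbstractSet, Dict, List


def tanimoto(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def dissimilar_candidates(
    native: AbstractSet[int],
    library: Dict[str, AbstractSet[int]],
    threshold: float = 0.3,  # hypothetical cutoff, not taken from the paper
) -> List[str]:
    """Names of library molecules whose similarity to the native substrate is below the cutoff."""
    return [name for name, fp in library.items() if tanimoto(native, fp) < threshold]


# Toy data: hypothetical fingerprints, not real molecules.
native_fp = {1, 4, 7, 9}
library = {
    "mol_a": {1, 4, 7, 8},     # shares 3 bits with the native substrate
    "mol_b": {2, 3, 5},        # shares no bits
    "mol_c": {9, 10, 11, 12},  # shares 1 bit
}

negatives = dissimilar_candidates(native_fp, library)
print(negatives)
```

Pairing an enzyme with molecules selected this way would give putative negative pairs; how the original study chose its cutoff and fingerprint type is not stated in the abstract.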
Affiliation(s)
- Vikas Upadhyay
- Department of Chemical Engineering, The Pennsylvania State University, University Park, PA, 16802, USA
| | - Veda Sheersh Boorla
- Department of Chemical Engineering, The Pennsylvania State University, University Park, PA, 16802, USA
| | - Costas D Maranas
- Department of Chemical Engineering, The Pennsylvania State University, University Park, PA, 16802, USA.
|
28
|
Singh R, Sledzieski S, Bryson B, Cowen L, Berger B. Contrastive learning in protein language space predicts interactions between drugs and protein targets. Proc Natl Acad Sci U S A 2023; 120:e2220778120. [PMID: 37289807 PMCID: PMC10268324 DOI: 10.1073/pnas.2220778120] [Citation(s) in RCA: 24] [Impact Index Per Article: 24.0] [Received: 12/06/2022] [Accepted: 04/10/2023] [Indexed: 06/10/2023] Open
Abstract
Sequence-based prediction of drug-target interactions has the potential to accelerate drug discovery by complementing experimental screens. Such computational prediction needs to be generalizable and scalable while remaining sensitive to subtle variations in the inputs. However, current computational techniques fail to simultaneously meet these goals, often sacrificing performance of one to achieve the others. We develop a deep learning model, ConPLex, successfully leveraging the advances in pretrained protein language models ("PLex") and employing a protein-anchored contrastive coembedding ("Con") to outperform state-of-the-art approaches. ConPLex achieves high accuracy, broad adaptivity to unseen data, and specificity against decoy compounds. It makes predictions of binding based on the distance between learned representations, enabling predictions at the scale of massive compound libraries and the human proteome. Experimental testing of 19 kinase-drug interaction predictions validated 12 interactions, including four with subnanomolar affinity, plus a strongly binding EPHB1 inhibitor (KD = 1.3 nM). Furthermore, ConPLex embeddings are interpretable, which enables us to visualize the drug-target embedding space and use embeddings to characterize the function of human cell-surface proteins. We anticipate that ConPLex will facilitate efficient drug discovery by making highly sensitive in silico drug screening feasible at the genome scale. ConPLex is available open source at https://ConPLex.csail.mit.edu.
Affiliation(s)
- Rohit Singh
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139
| | - Samuel Sledzieski
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139
| | - Bryan Bryson
- Ragon Institute of MGH, MIT and Harvard, Cambridge, MA 02139
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
| | - Lenore Cowen
- Department of Computer Science, Tufts University, Medford, MA 02155
| | - Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139
|
29
|
Kroll A, Ranjan S, Engqvist MKM, Lercher MJ. A general model to predict small molecule substrates of enzymes based on machine and deep learning. Nat Commun 2023; 14:2787. [PMID: 37188731 DOI: 10.1038/s41467-023-38347-2] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 06/24/2022] [Accepted: 04/21/2023] [Indexed: 05/17/2023] Open
Abstract
For most proteins annotated as enzymes, it is unknown which primary and/or secondary reactions they catalyze. Experimental characterizations of potential substrates are time-consuming and costly. Machine learning predictions could provide an efficient alternative, but are hampered by a lack of information regarding enzyme non-substrates, as available training data comprises mainly positive examples. Here, we present ESP, a general machine-learning model for the prediction of enzyme-substrate pairs with an accuracy of over 91% on independent and diverse test data. ESP can be applied successfully across widely different enzymes and a broad range of metabolites included in the training data, outperforming models designed for individual, well-studied enzyme families. ESP represents enzymes through a modified transformer model, and is trained on data augmented with randomly sampled small molecules assigned as non-substrates. By facilitating easy in silico testing of potential substrates, the ESP web server may support both basic and applied science.
Affiliation(s)
- Alexander Kroll
- Institute for Computer Science and Department of Biology, Heinrich Heine University, D-40225, Düsseldorf, Germany
| | - Sahasra Ranjan
- Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Powai, Mumbai, 400076, India
| | - Martin K M Engqvist
- Department of Biology and Bioengineering, Chalmers University of Technology, SE-412 96, Gothenburg, Sweden
- EnginZyme AB, Tomtebodevägen 6, 17165, Stockholm, Sweden
| | - Martin J Lercher
- Institute for Computer Science and Department of Biology, Heinrich Heine University, D-40225, Düsseldorf, Germany.
|
30
|
Vasina M, Kovar D, Damborsky J, Ding Y, Yang T, deMello A, Mazurenko S, Stavrakis S, Prokop Z. In-depth analysis of biocatalysts by microfluidics: An emerging source of data for machine learning. Biotechnol Adv 2023; 66:108171. [PMID: 37150331 DOI: 10.1016/j.biotechadv.2023.108171] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/24/2023] [Revised: 05/04/2023] [Accepted: 05/04/2023] [Indexed: 05/09/2023]
Abstract
Nowadays, the vastly increasing demand for novel biotechnological products is supported by the continuous development of biocatalytic applications which provide sustainable green alternatives to chemical processes. The success of a biocatalytic application is critically dependent on how quickly we can identify and characterize enzyme variants fitting the conditions of industrial processes. While miniaturization and parallelization have dramatically increased the throughput of next-generation sequencing systems, the subsequent characterization of the obtained candidates is still a limiting process in identifying the desired biocatalysts. Only a few commercial microfluidic systems for enzyme analysis are currently available, and the transformation of numerous published prototypes into commercial platforms is still to be streamlined. This review presents the state-of-the-art, recent trends, and perspectives in applying microfluidic tools in the functional and structural analysis of biocatalysts. We discuss the advantages and disadvantages of available technologies, their reproducibility and robustness, and readiness for routine laboratory use. We also highlight the unexplored potential of microfluidics to leverage the power of machine learning for biocatalyst development.
Affiliation(s)
- Michal Vasina
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 602 00 Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
| | - David Kovar
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 602 00 Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
| | - Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 602 00 Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
| | - Yun Ding
- Institute for Chemical and Bioengineering, ETH Zürich, 8093 Zürich, Switzerland
| | - Tianjin Yang
- Institute for Chemical and Bioengineering, ETH Zürich, 8093 Zürich, Switzerland; Department of Biochemistry, University of Zurich, 8057 Zurich, Switzerland
| | - Andrew deMello
- Institute for Chemical and Bioengineering, ETH Zürich, 8093 Zürich, Switzerland
| | - Stanislav Mazurenko
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 602 00 Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic.
| | - Stavros Stavrakis
- Institute for Chemical and Bioengineering, ETH Zürich, 8093 Zürich, Switzerland.
| | - Zbynek Prokop
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 602 00 Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic.
|
31
|
Yu T, Boob AG, Volk MJ, Liu X, Cui H, Zhao H. Machine learning-enabled retrobiosynthesis of molecules. Nat Catal 2023. [DOI: 10.1038/s41929-022-00909-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023]
|
32
|
Jiang Y, Ran X, Yang ZJ. Data-driven enzyme engineering to identify function-enhancing enzymes. Protein Eng Des Sel 2023; 36:gzac009. [PMID: 36214500 PMCID: PMC10365845 DOI: 10.1093/protein/gzac009] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 03/31/2022] [Revised: 08/08/2022] [Accepted: 09/28/2022] [Indexed: 01/22/2023] Open
Abstract
Identifying function-enhancing enzyme variants is a 'holy grail' challenge in protein science because it will allow researchers to expand the biocatalytic toolbox for late-stage functionalization of drug-like molecules, environmental degradation of plastics and other pollutants, and medical treatment of food allergies. Data-driven strategies, including statistical modeling, machine learning, and deep learning, have greatly advanced the understanding of the sequence-structure-function relationships for enzymes. They have also enhanced the capability of predicting and designing new enzymes and enzyme variants for catalyzing new-to-nature reactions. Here, we review recent progress in data-driven models that have been applied to identify efficiency-enhancing mutants for catalytic reactions. We also discuss existing challenges and obstacles faced by the community. Although the review is by no means comprehensive, we hope that the discussion can inform readers about the state of the art in data-driven enzyme engineering, inspiring more joint experimental-computational efforts to develop and apply data-driven modeling to innovate biocatalysts for synthetic and pharmaceutical applications.
Affiliation(s)
- Yaoyukun Jiang
- Department of Chemistry, Vanderbilt University, Nashville, TN 37235, USA
| | - Xinchun Ran
- Department of Chemistry, Vanderbilt University, Nashville, TN 37235, USA
| | - Zhongyue J Yang
- Department of Chemistry, Vanderbilt University, Nashville, TN 37235, USA
- Center for Structural Biology, Vanderbilt University, Nashville, TN 37235, USA
- Vanderbilt Institute of Chemical Biology, Vanderbilt University, Nashville, TN 37235, USA
- Data Science Institute, Vanderbilt University, Nashville, TN 37235, USA
- Department of Chemical and Biomolecular Engineering, Vanderbilt University, Nashville, TN 37235, USA
|
33
|
Iqbal WA, Lisitsa A, Kapralov MV. Predicting plant Rubisco kinetics from RbcL sequence data using machine learning. J Exp Bot 2023; 74:638-650. [PMID: 36094849 PMCID: PMC9833099 DOI: 10.1093/jxb/erac368] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 04/28/2022] [Accepted: 09/12/2022] [Indexed: 06/15/2023]
Abstract
Ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) is responsible for the conversion of atmospheric CO2 to organic carbon during photosynthesis, and often acts as a rate-limiting step in the latter process. Screening the natural diversity of Rubisco kinetics is the main strategy used to find better Rubisco enzymes for crop engineering efforts. Here, we demonstrate the use of Gaussian processes (GPs), a family of Bayesian models, coupled with protein encoding schemes, for predicting Rubisco kinetics from Rubisco large subunit (RbcL) sequence data. GPs trained on published experimentally obtained Rubisco kinetic datasets were applied to over 9000 sequences encoding RbcL to predict Rubisco kinetic parameters. Notably, our predicted kinetic values were in agreement with known trends, e.g. higher carboxylation turnover rates (kcat) for Rubisco enzymes from C4 or crassulacean acid metabolism (CAM) species, compared with those found in C3 species. This is the first study demonstrating machine learning approaches as a tool for screening and predicting Rubisco kinetics, which could be applied to other enzymes.
Affiliation(s)
- Wasim A Iqbal
- School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU, United Kingdom
| | - Alexei Lisitsa
- Department of Computer Science, University of Liverpool, Liverpool, L69 3BX, United Kingdom
| | | |
|
34
|
Lim PK, Julca I, Mutwil M. Redesigning plant specialized metabolism with supervised machine learning using publicly available reactome data. Comput Struct Biotechnol J 2023; 21:1639-1650. [PMID: 36874159 PMCID: PMC9976193 DOI: 10.1016/j.csbj.2023.01.013] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/27/2022] [Revised: 01/12/2023] [Accepted: 01/12/2023] [Indexed: 01/19/2023] Open
Abstract
The immense structural diversity of products and intermediates of plant specialized metabolism (specialized metabolites) makes them rich sources of therapeutic medicine, nutrients, and other useful materials. With the rapid accumulation of reactome data accessible in biological and chemical databases, along with recent advances in machine learning, this review outlines how supervised machine learning can be used to design new compounds and pathways by exploiting the wealth of such data. We first examine the various sources from which reactome data can be obtained, then explain the different machine learning encoding methods for reactome data. We then discuss current supervised machine learning developments that can be employed to help redesign plant specialized metabolism.
Affiliation(s)
- Peng Ken Lim
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
| | - Irene Julca
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
| | - Marek Mutwil
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
|
35
|
Merging enzymatic and synthetic chemistry with computational synthesis planning. Nat Commun 2022; 13:7747. [PMID: 36517480 PMCID: PMC9750992 DOI: 10.1038/s41467-022-35422-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Accepted: 11/30/2022] [Indexed: 12/15/2022] Open
Abstract
Synthesis planning programs trained on chemical reaction data can design efficient routes to new molecules of interest, but are limited in their ability to leverage rare chemical transformations. This challenge is acute for enzymatic reactions, which are valuable due to their selectivity and sustainability but are few in number. We report a retrosynthetic search algorithm using two neural network models for retrosynthesis (one covering 7984 enzymatic transformations and one covering 163,723 synthetic transformations) that balances the exploration of enzymatic and synthetic reactions to identify hybrid synthesis plans. This approach extends the space of retrosynthetic moves by thousands of uniquely enzymatic one-step transformations, discovers routes to molecules for which synthetic or enzymatic searches find none, and designs shorter routes for others. Application to (-)-Δ9 tetrahydrocannabinol (THC) (dronabinol) and R,R-formoterol (arformoterol) illustrates how our strategy facilitates the replacement of metal catalysis, high step counts, or costly enantiomeric resolution with more elegant hybrid proposals.
36
Xu Z, Wu J, Song YS, Mahadevan R. Enzyme Activity Prediction of Sequence Variants on Novel Substrates using Improved Substrate Encodings and Convolutional Pooling. Proceedings of Machine Learning Research 2022; 165:78-87. [PMID: 36530936 PMCID: PMC9759087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
Protein engineering is currently being revolutionized by deep learning applications, especially through natural language processing (NLP) techniques. It has been shown that state-of-the-art self-supervised language models trained on entire protein databases capture hidden contextual and structural information in amino acid sequences and are capable of improving sequence-to-function predictions. Yet, recent studies have reported that current compound-protein modeling approaches perform poorly on learning interactions between enzymes and substrates of interest within one protein family. We attribute this to low-grade substrate encoding methods and over-compressed sequence representations received by downstream predictive models. In this study, we propose a new substrate-encoding based on Extended Connectivity Fingerprints (ECFPs) and a convolutional-pooling of the sequence embeddings. Through testing on an activity profiling dataset of haloalkanoate dehalogenase superfamily that measures activities of 218 phosphatases against 168 substrates, we show substantial improvements in predictive performances of compound-protein interaction modeling. In addition, we also test the workflow on three other datasets from the halogenase, kinase and aminotransferase families and show that our pipeline achieves good performance on these datasets as well. We further demonstrate the utility of this downstream model architecture by showing that it achieves good performance with six different protein embeddings, including ESM-1b (Rives et al., 2021), TAPE (Rao et al., 2019), ProtBert, ProtAlbert, ProtT5, and ProtXLNet (Elnaggar et al., 2021). This study provides a new workflow for activity prediction on novel substrates that can be used to engineer new enzymes for sustainability applications.
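As a rough illustration of the contrast this abstract draws between over-compressed sequence representations and pooled ones, the sketch below compares a single global mean of per-residue embeddings with pooling over sliding windows. It is not the authors' layer: the abstract does not give kernel sizes or weights, so a fixed, non-learned averaging kernel is substituted here, and the names `window_pool`, `global_mean`, `window`, and `stride` are assumptions made for this example.

```python
from typing import List

Vector = List[float]


def global_mean(embeddings: List[Vector]) -> Vector:
    """Compress a whole sequence into one vector (the over-compressed baseline)."""
    if not embeddings:
        raise ValueError("embeddings must not be empty")
    dim = len(embeddings[0])
    return [sum(vec[d] for vec in embeddings) / len(embeddings) for d in range(dim)]


def window_pool(embeddings: List[Vector], window: int, stride: int) -> List[Vector]:
    """Average per-residue vectors over sliding windows.

    Assumption: a fixed averaging kernel stands in for a learned
    convolutional kernel. Windows that would run past the end of the
    sequence are dropped.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if not embeddings:
        return []
    dim = len(embeddings[0])
    pooled = []
    for start in range(0, len(embeddings) - window + 1, stride):
        chunk = embeddings[start:start + window]
        pooled.append([sum(vec[d] for vec in chunk) / window for d in range(dim)])
    return pooled


# Toy per-residue embeddings: 4 residues, 2 dimensions each (made-up values).
toy = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]
print(global_mean(toy))        # one vector for the whole sequence
print(window_pool(toy, 2, 2))  # one vector per window, position is retained
```

The pooled output keeps one vector per window, so a downstream model can still distinguish regions of the sequence, whereas the global mean discards position entirely.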
37
Xiang R, Fernandez-Lopez L, Robles-Martín A, Ferrer M, Guallar V. EP-Pred: A Machine Learning Tool for Bioprospecting Promiscuous Ester Hydrolases. Biomolecules 2022; 12:1529. [PMID: 36291739 PMCID: PMC9599548 DOI: 10.3390/biom12101529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Revised: 10/11/2022] [Accepted: 10/18/2022] [Indexed: 11/25/2022] Open
Abstract
When bioprospecting for novel industrial enzymes, substrate promiscuity is a desirable property that increases the reusability of the enzyme. Among industrial enzymes, ester hydrolases are highly relevant, and demand for them continues to increase. However, the search for new substrate-promiscuous ester hydrolases is not trivial, since the mechanism behind this property is greatly influenced by the active site's structural and physicochemical characteristics. These characteristics must be computed from the 3D structure, which is rarely available and expensive to measure, hence the need for a method that can predict promiscuity from sequence alone. Here we report such a method, called EP-Pred, an ensemble binary classifier that combines three machine learning algorithms: SVM, KNN, and a linear model. EP-Pred has been evaluated against the Lipase Engineering Database together with a hidden Markov approach, leading to a final set of ten sequences predicted to encode promiscuous esterases. Experimental results confirmed the validity of our method, since all ten proteins were found to exhibit broad substrate ambiguity.
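One common way to combine three binary classifiers into a single ensemble decision is a per-sample majority vote, sketched below. The abstract does not state how the SVM, KNN, and linear-model outputs are combined, so majority voting is an assumption here, and the function name `majority_vote` and the example label lists are hypothetical.

```python
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine binary labels (0 or 1) from several classifiers.

    `predictions` holds one row per classifier and one column per sample.
    A sample is labelled 1 only when strictly more than half of the
    classifiers vote 1, so ties (possible with an even number of
    classifiers) resolve to 0.
    """
    if not predictions:
        raise ValueError("need at least one classifier's predictions")
    n_samples = len(predictions[0])
    if any(len(row) != n_samples for row in predictions):
        raise ValueError("all classifiers must label the same samples")
    combined = []
    for votes in zip(*predictions):
        combined.append(1 if sum(votes) * 2 > len(votes) else 0)
    return combined


# Made-up labels for four sequences (1 = predicted promiscuous).
svm_labels = [1, 0, 1, 0]
knn_labels = [1, 1, 0, 0]
linear_labels = [0, 1, 1, 0]
print(majority_vote([svm_labels, knn_labels, linear_labels]))
```

With three voters a strict majority always exists, which is one practical reason to use an odd number of base classifiers in a voting ensemble.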
Collapse
Affiliation(s)
- Ruite Xiang
  - Department of Life Sciences, Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
- Ana Robles-Martín
  - Department of Life Sciences, Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
- Manuel Ferrer
  - Department of Applied Biocatalysis, ICP, CSIC, 28049 Madrid, Spain
- Victor Guallar
  - Department of Life Sciences, Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
  - Catalan Institution for Research and Advanced Studies (ICREA), 08010 Barcelona, Spain