1. Harding-Larsen D, Funk J, Madsen NG, Gharabli H, Acevedo-Rocha CG, Mazurenko S, Welner DH. Protein representations: Encoding biological information for machine learning in biocatalysis. Biotechnol Adv 2024; 77:108459. [PMID: 39366493 DOI: 10.1016/j.biotechadv.2024.108459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2024] [Revised: 09/19/2024] [Accepted: 09/29/2024] [Indexed: 10/06/2024]
Abstract
Enzymes offer a more environmentally friendly and low-impact solution to conventional chemistry, but they often require additional engineering for their application in industrial settings, an endeavour that is challenging and laborious. To address this issue, the power of machine learning can be harnessed to produce predictive models that enable the in silico study and engineering of improved enzymatic properties. Such machine learning models, however, require the conversion of the complex biological information to a numerical input, also called protein representations. These inputs demand special attention to ensure the training of accurate and precise models, and, in this review, we therefore examine the critical step of encoding protein information to numeric representations for use in machine learning. We selected the most important approaches for encoding the three distinct biological protein representations - primary sequence, 3D structure, and dynamics - to explore their requirements for employment and inductive biases. Combined representations of proteins and substrates are also introduced as emergent tools in biocatalysis. We propose the division of fixed representations, a collection of rule-based encoding strategies, and learned representations extracted from the latent spaces of large neural networks. To select the most suitable protein representation, we propose two main factors to consider. The first one is the model setup, which is influenced by the size of the training dataset and the choice of architecture. The second factor is the model objectives such as consideration about the assayed property, the difference between wild-type models and mutant predictors, and requirements for explainability. This review is aimed at serving as a source of information and guidance for properly representing enzymes in future machine learning models for biocatalysis.
Affiliation(s)
- David Harding-Larsen
  - The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Jonathan Funk
  - The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Niklas Gesmar Madsen
  - The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Hani Gharabli
  - The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Carlos G Acevedo-Rocha
  - The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Stanislav Mazurenko
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic; International Clinical Research Center, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Ditte Hededam Welner
  - The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark.

2. Zhang Z, Li Z, Yang M, Zhao F, Han S. Machine learning-guided multi-site combinatorial mutagenesis enhances the thermostability of pectin lyase. Int J Biol Macromol 2024; 277:134530. [PMID: 39111490 DOI: 10.1016/j.ijbiomac.2024.134530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2024] [Revised: 07/25/2024] [Accepted: 08/04/2024] [Indexed: 08/13/2024]
Abstract
Enhancing the thermostability of enzymes is crucial for industrial applications. Methods such as directed evolution are often limited by the huge sequence space and combinatorial explosion, making it difficult to obtain optimal mutants. In recent years, machine learning (ML)-guided protein engineering has become an attractive tool because of its ability to comprehensively explore the sequence space of enzymes and discover superior mutants. This study employed ML to perform combinatorial mutation design on the pectin lyase PMGL-Ba from Bacillus licheniformis, aiming to improve its thermostability. First, 18 single-point mutants with enhanced thermostability were identified through semi-rational design. Subsequently, the initial library containing a small number of low-order mutants was utilized to construct an ML model to explore the combinatorial sequence space (theoretically 196,608 mutants) of single-point mutants. The results showed that the ML-predicted second library was successfully enriched with highly thermostable combinatorial mutants. After one iteration of learning, the best-performing combinatorial mutant in the third library, P36, showed a 67-fold and 39-fold increase in half-life at 75 °C and 80 °C, respectively, as well as a 2.1-fold increase in activity. Structural analysis and molecular dynamics simulations provided insights into the improved performance of the engineered enzyme.
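The theoretical library size quoted in this abstract is the product of the number of residue choices at each mutated position. The sketch below is only an illustration: the abstract does not say how the 18 single-point mutants are distributed, so the breakdown used here (16 positions with one substitution each and one position with two substitutions, with the wild-type residue counted as an option everywhere) is an assumption chosen because it reproduces 196,608. The names `library_size` and `options_per_position` are hypothetical.

```python
from math import prod


def library_size(options_per_position):
    """Count variants when every position independently takes one of its options."""
    return prod(options_per_position)


# Hypothetical breakdown (an assumption, not given in the abstract):
# 16 positions with 2 options each (wild type or one substitution)
# and 1 position with 3 options (wild type or one of two substitutions).
options_per_position = [2] * 16 + [3]

# Each position contributes (options - 1) single-point mutants.
single_point_mutants = sum(n - 1 for n in options_per_position)
total_variants = library_size(options_per_position)

print(single_point_mutants, total_variants)  # prints: 18 196608
```

Under this assumed layout the count includes the wild-type sequence itself; other distributions of the 18 substitutions would give a different total.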
Affiliation(s)
- Zhihui Zhang
  - Guangdong Key Laboratory of Fermentation and Enzyme Engineering, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Zhixuan Li
  - Guangdong Key Laboratory of Fermentation and Enzyme Engineering, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Manli Yang
  - Guangdong Key Laboratory of Fermentation and Enzyme Engineering, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Fengguang Zhao
  - School of Light Industry and Engineering, South China University of Technology, Guangzhou 510006, China
- Shuangyan Han
  - Guangdong Key Laboratory of Fermentation and Enzyme Engineering, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China.

3. Chong W, Zhang Z, Li Z, Meng S, Nian B, Hu Y. Hook loop dynamics engineering transcended the barrier of activity-stability trade-off and boosted the thermostability of enzymes. Int J Biol Macromol 2024; 278:134953. [PMID: 39181358 DOI: 10.1016/j.ijbiomac.2024.134953] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2024] [Revised: 08/20/2024] [Accepted: 08/20/2024] [Indexed: 08/27/2024]
Abstract
The improvement of enzyme thermostability often accompanies the decreased activity due to the loss of the key regions' flexibility. As a representative structure, unlocking the potential of loop dynamics will not only provide new ideas for stabilization strategies, but also help to deepen the understanding of the relationship between enzyme structural dynamics and function. In this study, a creative "hook loop dynamics engineering" (HLoD) strategy was successfully proposed for simultaneously improving the thermostability and maintaining activity of the model enzyme, Candida antarctica lipase B. A small and smart mutant library involving five key residues located at the "hook loop" was meticulously identified and systematically investigated and thus yielded a five-point multiple mutant M1 (L147S/T244P/S250P/T256D/N292D), demonstrating a remarkable 7.0-fold increase in thermostability at 60 °C compared to the wild-type (WT). Furthermore, the activity of M1 remained comparable to that of WT, effectively transcending the barrier of activity-stability trade-off. Molecular dynamics simulations revealed that the precise regulation of hook loop dynamics via intermolecular interactions, such as salt bridges and hydrogen bonding, curbed the excessive flexibility of the pivotal regions α5 and α10 at high temperatures, thus driving the substantial enhancement of the thermostability of M1. Refining the dynamics of the flexible region via HLoD, which transcended the barrier of activity-stability trade-off, exhibited to be a robust and potentially universal strategy for designing enzymes with outstanding thermostability and activity.
Affiliation(s)
- Wenya Chong
  - State Key Laboratory of Materials-Oriented Chemical Engineering, School of Pharmaceutical Sciences, Nanjing Tech University, Nanjing 210009, Jiangsu Province, People's Republic of China
- Zihan Zhang
  - State Key Laboratory of Materials-Oriented Chemical Engineering, School of Pharmaceutical Sciences, Nanjing Tech University, Nanjing 210009, Jiangsu Province, People's Republic of China
- Zhongyu Li
  - Institute of Biotechnology, RWTH Aachen University, Worringerweg 3, 52074 Aachen, Germany
- Shuaiqi Meng
  - Institute of Biotechnology, RWTH Aachen University, Worringerweg 3, 52074 Aachen, Germany
- Binbin Nian
  - State Key Laboratory of Materials-Oriented Chemical Engineering, School of Pharmaceutical Sciences, Nanjing Tech University, Nanjing 210009, Jiangsu Province, People's Republic of China.
- Yi Hu
  - State Key Laboratory of Materials-Oriented Chemical Engineering, School of Pharmaceutical Sciences, Nanjing Tech University, Nanjing 210009, Jiangsu Province, People's Republic of China.

4. Gantz M, Mathis SV, Nintzel FEH, Lio P, Hollfelder F. On synergy between ultrahigh throughput screening and machine learning in biocatalyst engineering. Faraday Discuss 2024; 252:89-114. [PMID: 39133073 PMCID: PMC11318516 DOI: 10.1039/d4fd00065j] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/25/2024] [Accepted: 04/23/2024] [Indexed: 08/13/2024]
Abstract
Protein design and directed evolution have separately contributed enormously to protein engineering. Without being mutually exclusive, the former relies on computation from first principles, while the latter is a combinatorial approach based on chance. Advances in ultrahigh throughput (uHT) screening, next generation sequencing and machine learning may create alternative routes to engineered proteins, where functional information linked to specific sequences is interpreted and extrapolated in silico. In particular, the miniaturisation of functional tests in water-in-oil emulsion droplets with picoliter volumes and their rapid generation and analysis (>1 kHz) allows screening of >10⁷-membered libraries in a day. Subsequently, decoding the selected clones by short or long-read sequencing methods leads to large sequence-function datasets that may allow extrapolation from experimental directed evolution to further improved mutants beyond the observed hits. In this work, we explore experimental strategies for how to draw up 'fitness landscapes' in sequence space with uHT droplet microfluidics, review the current state of AI/ML in enzyme engineering and discuss how uHT datasets may be combined with AI/ML to make meaningful predictions and accelerate biocatalyst engineering.
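The throughput figures in this abstract can be checked with simple arithmetic: a droplet rate above 1 kHz sustained over a day comfortably exceeds ten million droplets. The sketch below is a back-of-the-envelope illustration only. It assumes continuous operation at exactly 1 kHz and one library member per droplet, neither of which is stated in the abstract, and the function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600


def droplets_screened(rate_hz, hours):
    """Droplets analysed at a constant rate (droplets per second) over a run."""
    return rate_hz * hours * SECONDS_PER_HOUR


def hours_needed(library_size, rate_hz):
    """Run time in hours to analyse one droplet per library member."""
    return library_size / rate_hz / SECONDS_PER_HOUR


# Assumed values: exactly 1 kHz, a full 24 h of uninterrupted screening.
per_day = droplets_screened(rate_hz=1_000, hours=24)
time_for_1e7 = hours_needed(library_size=10**7, rate_hz=1_000)

print(per_day)                 # prints: 86400000
print(round(time_for_1e7, 2))  # prints: 2.78
```

In practice droplet occupancy and oversampling reduce the effective number of distinct variants screened, so these values are upper bounds under the stated assumptions.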
Affiliation(s)
- Maximilian Gantz
  - Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge, CB2 1GA, UK
- Simon V Mathis
  - Department of Computer Science, University of Cambridge, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK
- Friederike E H Nintzel
  - Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge, CB2 1GA, UK
- Pietro Lio
  - Department of Computer Science, University of Cambridge, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK
- Florian Hollfelder
  - Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge, CB2 1GA, UK

5. Gong X, Zhang J, Gan Q, Teng Y, Hou J, Lyu Y, Liu Z, Wu Z, Dai R, Zou Y, Wang X, Zhu D, Zhu H, Liu T, Yan Y. Advancing microbial production through artificial intelligence-aided biology. Biotechnol Adv 2024; 74:108399. [PMID: 38925317 DOI: 10.1016/j.biotechadv.2024.108399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Revised: 05/20/2024] [Accepted: 06/23/2024] [Indexed: 06/28/2024]
Abstract
Microbial cell factories (MCFs) have been leveraged to construct sustainable platforms for value-added compound production. To optimize metabolism and reach optimal productivity, synthetic biology has developed various genetic devices to engineer microbial systems by gene editing, high-throughput protein engineering, and dynamic regulation. However, current synthetic biology methodologies still rely heavily on manual design, laborious testing, and exhaustive analysis. The emerging interdisciplinary field of artificial intelligence (AI) and biology has become pivotal in addressing the remaining challenges. AI-aided microbial production harnesses the power of processing, learning, and predicting vast amounts of biological data within seconds, providing outputs with high probability. With well-trained AI models, the conventional Design-Build-Test (DBT) cycle has been transformed into a multidimensional Design-Build-Test-Learn-Predict (DBTLP) workflow, leading to significantly improved operational efficiency and reduced labor consumption. Here, we comprehensively review the main components and recent advances in AI-aided microbial production, focusing on genome annotation, AI-aided protein engineering, artificial functional protein design, and AI-enabled pathway prediction. Finally, we discuss the challenges of integrating novel AI techniques into biology and propose the potential of large language models (LLMs) in advancing microbial production.
Affiliation(s)
- Xinyu Gong
  - School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
- Jianli Zhang
  - School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
- Qi Gan
  - School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
- Yuxi Teng
  - School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
- Jixin Hou
  - School of ECAM, College of Engineering, University of Georgia, Athens, GA 30602, USA
- Yanjun Lyu
  - Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington 76019, USA
- Zhengliang Liu
  - School of Computing, The University of Georgia, Athens, GA 30602, USA
- Zihao Wu
  - School of Computing, The University of Georgia, Athens, GA 30602, USA
- Runpeng Dai
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Yusong Zou
  - School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
- Xianqiao Wang
  - School of ECAM, College of Engineering, University of Georgia, Athens, GA 30602, USA
- Dajiang Zhu
  - Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington 76019, USA
- Hongtu Zhu
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Tianming Liu
  - School of Computing, The University of Georgia, Athens, GA 30602, USA
- Yajun Yan
  - School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA.

6. Jansen Z, Alameri A, Wei Q, Kulhanek DL, Gilmour AR, Halper S, Schwalm ND, Thyer R. A modular toolkit for environmental Rhodococcus, Gordonia, and Nocardia enables complex metabolic manipulation. Appl Environ Microbiol 2024; 90:e0034024. [PMID: 39082821 PMCID: PMC11337820 DOI: 10.1128/aem.00340-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Accepted: 06/29/2024] [Indexed: 08/22/2024]
Abstract
Soil-dwelling Actinomycetes are a diverse and ubiquitous component of the global microbiome but largely lack genetic tools comparable to those available in model species such as Escherichia coli or Pseudomonas putida, posing a fundamental barrier to their characterization and utilization as hosts for biotechnology. To address this, we have developed a modular plasmid assembly framework, along with a series of genetic control elements for the previously genetically intractable Gram-positive environmental isolate Rhodococcus ruber C208, and demonstrate conserved functionality in 11 additional environmental isolates of Rhodococcus, Nocardia, and Gordonia. This toolkit encompasses five Mycobacteriale origins of replication, five broad-host-range antibiotic resistance markers, transcriptional and translational control elements, fluorescent reporters, a tetracycline-inducible system, and a counter-selectable marker. We use this toolkit to interrogate the carotenoid biosynthesis pathway in Rhodococcus erythropolis N9T-4, a weakly carotenogenic environmental isolate and engineer higher pathway flux toward the keto-carotenoid canthaxanthin. This work establishes several new genetic tools for environmental Mycobacteriales and provides a synthetic biology framework to support the design of complex genetic circuits in these species.
IMPORTANCE: Soil-dwelling Actinomycetes, particularly the Mycobacteriales, include both diverse new hosts for sustainable biomanufacturing and emerging opportunistic pathogens. Rhodococcus, Gordonia, and Nocardia are three abundant genera with particularly flexible metabolisms and untapped potential for natural product discovery. Among these, Rhodococcus ruber C208 was shown to degrade polyethylene; Gordonia paraffinivorans can assimilate carbon from solid hydrocarbons; and Nocardia neocaledoniensis (and many other Nocardia spp.) possesses dual isoprenoid biosynthesis pathways. Many species accumulate high levels of carotenoid pigments, indicative of highly active isoprenoid biosynthesis pathways which may be harnessed for fermentation of terpenes and other commodity isoprenoids. Modular genetic toolkits have proven valuable for both fundamental and applied research in model organisms, but such tools are lacking for most Actinomycetes. Our suite of genetic tools and DNA assembly framework were developed for broad functionality and to facilitate rapid prototyping of genetic constructs in these organisms.
Affiliation(s)
- Zachary Jansen
  - Systems, Synthetic, and Physical Biology, Rice University, Houston, Texas, USA
- Abdulaziz Alameri
  - Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas, USA
- Qiyao Wei
  - Department of Bioengineering, Rice University, Houston, Texas, USA
- Devon L. Kulhanek
  - Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas, USA
- Andrew R. Gilmour
  - Systems, Synthetic, and Physical Biology, Rice University, Houston, Texas, USA
- Sean Halper
  - DEVCOM Army Research Laboratory, Adelphi, Maryland, USA
- Ross Thyer
  - Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas, USA

7. Li G, Zhang N, Dai X, Fan L. EnzyACT: A Novel Deep Learning Method to Predict the Impacts of Single and Multiple Mutations on Enzyme Activity. J Chem Inf Model 2024; 64:5912-5921. [PMID: 39038814 PMCID: PMC11323264 DOI: 10.1021/acs.jcim.4c00920] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Revised: 07/01/2024] [Accepted: 07/09/2024] [Indexed: 07/24/2024]
Abstract
Enzyme engineering involves the customization of enzymes by introducing mutations to expand the application scope of natural enzymes. One limitation of that is the complex interaction between two key properties, activity and stability, where the enhancement of one often leads to the reduction of the other, also called the trade-off mechanism. Although dozens of methods that predict the change of protein stability upon mutations have been developed, the prediction of the effect on activity is still in its early stage. Therefore, developing a fast and accurate method to predict the impact of the mutations on enzyme activity is helpful for enzyme design and understanding of the trade-off mechanism. Here, we introduce a novel approach, EnzyACT, a deep learning method that fuses graph technique and protein embedding to predict activity changes upon single or multiple mutations. Our model combines graph-based techniques and language models to predict the activity changes. Moreover, EnzyACT is trained on a new curated data set including both single- and multiple-point mutations. When benchmarked on multiple independent data sets, it shows uniform performance on problems affected by mutations. This work also provides insights into the impact of distant mutations within activity design, which could also be useful for predicting catalytic residues and developing improved enzyme-engineering strategies.
Affiliation(s)
- Gen Li
  - Production and R&D Center I of LSS, GenScript (Shanghai) Biotech Co., Ltd., Shanghai 200131, China
- Ning Zhang
  - Production and R&D Center I of LSS, GenScript Biotech Corporation, Nanjing 211122, China
- Xiaowen Dai
  - Production and R&D Center I of LSS, GenScript Biotech Corporation, Nanjing 211122, China
- Long Fan
  - Production and R&D Center I of LSS, GenScript (Shanghai) Biotech Co., Ltd., Shanghai 200131, China

8. Zhou J, Huang M. Navigating the landscape of enzyme design: from molecular simulations to machine learning. Chem Soc Rev 2024; 53:8202-8239. [PMID: 38990263 DOI: 10.1039/d4cs00196f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
Global environmental issues and sustainable development call for new technologies for fine chemical synthesis and waste valorization. Biocatalysis has attracted great attention as the alternative to the traditional organic synthesis. However, it is challenging to navigate the vast sequence space to identify those proteins with admirable biocatalytic functions. The recent development of deep-learning based structure prediction methods such as AlphaFold2 reinforced by different computational simulations or multiscale calculations has largely expanded the 3D structure databases and enabled structure-based design. While structure-based approaches shed light on site-specific enzyme engineering, they are not suitable for large-scale screening of potential biocatalysts. Effective utilization of big data using machine learning techniques opens up a new era for accelerated predictions. Here, we review the approaches and applications of structure-based and machine-learning guided enzyme design. We also provide our view on the challenges and perspectives on effectively employing enzyme design approaches integrating traditional molecular simulations and machine learning, and the importance of database construction and algorithm development in attaining predictive ML models to explore the sequence fitness landscape for the design of admirable biocatalysts.
Affiliation(s)
- Jiahui Zhou
  - School of Chemistry and Chemical Engineering, Queen's University, David Keir Building, Stranmillis Road, Belfast BT9 5AG, Northern Ireland, UK.
- Meilan Huang
  - School of Chemistry and Chemical Engineering, Queen's University, David Keir Building, Stranmillis Road, Belfast BT9 5AG, Northern Ireland, UK.

9. Landwehr GM, Vogeli B, Tian C, Singal B, Gupta A, Lion R, Sargent EH, Karim AS, Jewett MC. A synthetic cell-free pathway for biocatalytic upgrading of one-carbon substrates. bioRxiv 2024:2024.08.08.607227. [PMID: 39149402 PMCID: PMC11326285 DOI: 10.1101/2024.08.08.607227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/17/2024]
Abstract
Biotechnological processes hold tremendous potential for the efficient and sustainable conversion of one-carbon (C1) substrates into complex multi-carbon products. However, the development of robust and versatile biocatalytic systems for this purpose remains a significant challenge. In this study, we report a hybrid electrochemical-biochemical cell-free system for the conversion of C1 substrates into the universal biological building block acetyl-CoA. The synthetic reductive formate pathway (ReForm) consists of five core enzymes catalyzing non-natural reactions that were established through a cell-free enzyme engineering platform. We demonstrate that ReForm works in a plug-and-play manner to accept diverse C1 substrates including CO2 equivalents. We anticipate that ReForm will facilitate efforts to build and improve synthetic C1 utilization pathways for a formate-based bioeconomy.
Affiliation(s)
- Grant M. Landwehr
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, 60208, USA
  - Center for Synthetic Biology, Northwestern University, Evanston, IL, 60208, USA
- Bastian Vogeli
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, 60208, USA
  - Center for Synthetic Biology, Northwestern University, Evanston, IL, 60208, USA
- Cong Tian
  - Department of Chemistry, Northwestern University, Evanston, IL, 60208, USA
- Bharti Singal
  - Stanford SLAC CryoEM Initiative, Stanford University; Stanford, CA 94305, USA
- Anika Gupta
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, 60208, USA
  - Center for Synthetic Biology, Northwestern University, Evanston, IL, 60208, USA
- Rebeca Lion
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, 60208, USA
  - Center for Synthetic Biology, Northwestern University, Evanston, IL, 60208, USA
- Edward H. Sargent
  - Department of Chemistry, Northwestern University, Evanston, IL, 60208, USA
- Ashty S. Karim
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, 60208, USA
  - Center for Synthetic Biology, Northwestern University, Evanston, IL, 60208, USA
- Michael C. Jewett
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, 60208, USA
  - Center for Synthetic Biology, Northwestern University, Evanston, IL, 60208, USA
  - Department of Bioengineering, Stanford University; Stanford, CA 94305, USA

10. Diaz DJ, Gong C, Ouyang-Zhang J, Loy JM, Wells J, Yang D, Ellington AD, Dimakis AG, Klivans AR. Stability Oracle: a structure-based graph-transformer framework for identifying stabilizing mutations. Nat Commun 2024; 15:6170. [PMID: 39043654 PMCID: PMC11266546 DOI: 10.1038/s41467-024-49780-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Accepted: 06/14/2024] [Indexed: 07/25/2024]
Abstract
Engineering stabilized proteins is a fundamental challenge in the development of industrial and pharmaceutical biotechnologies. We present Stability Oracle: a structure-based graph-transformer framework that achieves SOTA performance on accurately identifying thermodynamically stabilizing mutations. Our framework introduces several innovations to overcome well-known challenges in data scarcity and bias, generalization, and computation time, such as: Thermodynamic Permutations for data augmentation, structural amino acid embeddings to model a mutation with a single structure, a protein structure-specific attention-bias mechanism that makes transformers a viable alternative to graph neural networks. We provide training/test splits that mitigate data leakage and ensure proper model evaluation. Furthermore, to examine our data engineering contributions, we fine-tune ESM2 representations (Prostata-IFML) and achieve SOTA for sequence-based models. Notably, Stability Oracle outperforms Prostata-IFML even though it was pretrained on 2000X less proteins and has 548X less parameters. Our framework establishes a path for fine-tuning structure-based transformers to virtually any phenotype, a necessary task for accelerating the development of protein-based biotechnologies.
Affiliation(s)
- Daniel J Diaz
  - UT Austin, Department of Computer Science, Austin, TX, 78712, USA.
  - Intelligent Proteins, LLC, Austin, TX, 78712, USA.
  - UT Austin, Department of Chemistry, Austin, TX, 78712, USA.
- Chengyue Gong
  - UT Austin, Department of Computer Science, Austin, TX, 78712, USA
- James M Loy
  - Intelligent Proteins, LLC, Austin, TX, 78712, USA
  - UT Austin, Department of Molecular Biosciences, Austin, TX, 78712, USA
- Jordan Wells
  - UT Austin, McKetta Department of Chemical Engineering, Austin, TX, 78712, USA
- David Yang
  - UT Austin, Department of Molecular Biosciences, Austin, TX, 78712, USA
- Alexandros G Dimakis
  - UT Austin, Chandra Family Department of Electrical and Computer Engineering, Austin, TX, 78712, USA
- Adam R Klivans
  - UT Austin, Department of Computer Science, Austin, TX, 78712, USA

11. Hunter Wilson R, Damodaran AR, Bhagi-Damodaran A. Machine learning guided rational design of a non-heme iron-based lysine dioxygenase improves its total turnover number. bioRxiv 2024:2024.06.04.597480. [PMID: 38895203 PMCID: PMC11185610 DOI: 10.1101/2024.06.04.597480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Abstract
Highly selective C-H functionalization remains an ongoing challenge in organic synthetic methodologies. Biocatalysts are robust tools for achieving these difficult chemical transformations. Biocatalyst engineering has often required directed evolution or structure-based rational design campaigns to improve their activities. In recent years, machine learning has been integrated into these workflows to improve the discovery of beneficial enzyme variants. In this work, we combine a structure-based machine-learning algorithm with classical molecular dynamics simulations to down select mutations for rational design of a non-heme iron-dependent lysine dioxygenase, LDO. This approach consistently resulted in functional LDO mutants and circumvents the need for extensive study of mutational activity before-hand. Our rationally designed single mutants purified with up to 2-fold higher yields than WT and displayed higher total turnover numbers (TTN). Combining five such single mutations into a pentamutant variant, LPNYI LDO, leads to a 40% improvement in the TTN (218±3) as compared to WT LDO (TTN = 160±2). Overall, this work offers a low-barrier approach for those seeking to synergize machine learning algorithms with pre-existing protein engineering strategies.
Affiliation(s)
- R Hunter Wilson
- Department of Chemistry, University of Minnesota, Twin Cities, Minneapolis, MN, 55455
- Anoop R Damodaran
- Department of Chemistry, University of Minnesota, Twin Cities, Minneapolis, MN, 55455
12
Daffern N, Johansson KE, Baumer ZT, Robertson NR, Woojuh J, Bedewitz MA, Davis Z, Wheeldon I, Cutler SR, Lindorff-Larsen K, Whitehead TA. GMMA Can Stabilize Proteins Across Different Functional Constraints. J Mol Biol 2024; 436:168586. [PMID: 38663544 DOI: 10.1016/j.jmb.2024.168586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Revised: 04/16/2024] [Accepted: 04/17/2024] [Indexed: 05/06/2024]
Abstract
Stabilizing proteins without otherwise hampering their function is a central task in protein engineering and design. PYR1 is a plant hormone receptor that has been engineered to bind diverse small molecule ligands. We sought a set of generalized mutations that would provide stability without affecting functionality for PYR1 variants with diverse ligand-binding capabilities. To do this we used a global multi-mutant analysis (GMMA) approach, which can identify substitutions that have stabilizing effects and do not lower function. GMMA has the added benefit of finding substitutions that are stabilizing in different sequence contexts and we hypothesized that applying GMMA to PYR1 with different functionalities would identify this set of generalized mutations. Indeed, conducting FACS and deep sequencing of libraries for PYR1 variants with two different functionalities and applying a GMMA analysis identified 5 substitutions that, when inserted into four PYR1 variants that each bind a unique ligand, provided an increase of 2-6 °C in thermal inactivation temperature and no decrease in functionality.
Affiliation(s)
- Nicolas Daffern
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, CO 80305, USA
- Kristoffer E Johansson
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Zachary T Baumer
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, CO 80305, USA
- Janty Woojuh
- Department of Botany and Plant Sciences, University of California, Riverside, USA
- Matthew A Bedewitz
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, CO 80305, USA
- Zoë Davis
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, CO 80305, USA
- Ian Wheeldon
- Department of Chemical and Environmental Engineering, University of California, Riverside, USA; Institute for Integrative Genome Biology, University of California, Riverside, Riverside, CA, USA
- Sean R Cutler
- Department of Botany and Plant Sciences, University of California, Riverside, USA; Institute for Integrative Genome Biology, University of California, Riverside, Riverside, CA, USA; Center for Plant Cell Biology, University of California, Riverside, Riverside, CA, USA
- Kresten Lindorff-Larsen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Timothy A Whitehead
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, CO 80305, USA
13
Joho Y, Royan S, Caputo AT, Newton S, Peat TS, Newman J, Jackson C, Ardevol A. Enhancing PET Degrading Enzymes: A Combinatory Approach. Chembiochem 2024; 25:e202400084. [PMID: 38584134 DOI: 10.1002/cbic.202400084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Revised: 04/02/2024] [Accepted: 04/04/2024] [Indexed: 04/09/2024]
Abstract
Plastic waste has become a substantial environmental issue. A potential strategy to mitigate this problem is to use enzymatic hydrolysis of plastics to depolymerize post-consumer waste and allow it to be reused. Over the last few decades, the use of PET-degrading enzymes has shown promise as a solution for creating a circular plastic waste economy. PsPETase from Piscinibacter sakaiensis has been identified as an enzyme with tremendous potential for such applications. To improve its efficiency, enzyme engineering has been applied to enhance its thermal stability, enzymatic activity, and ease of production. Here, we combine different strategies such as structure-based rational design, ancestral sequence reconstruction and machine learning to engineer a more active Combi-PETase variant with a melting temperature of 70 °C and optimal performance at 60 °C. Furthermore, this study demonstrates that these approaches, commonly used in other works of enzyme engineering, are most effective when utilized in combination, enabling the improvement of enzymes for industrial applications.
Affiliation(s)
- Yvonne Joho
- Manufacturing, Commonwealth Scientific and Industrial Research Organisation, Clayton, Victoria, 3168, Australia
- Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- CSIRO Advanced Engineering Biology Future Science Platform, GPO Box 1700, Canberra, ACT 2601, Australia
- Santana Royan
- Manufacturing, Commonwealth Scientific and Industrial Research Organisation, Clayton, Victoria, 3168, Australia
- Alessandro T Caputo
- Manufacturing, Commonwealth Scientific and Industrial Research Organisation, Clayton, Victoria, 3168, Australia
- Sophia Newton
- Manufacturing, Commonwealth Scientific and Industrial Research Organisation, Clayton, Victoria, 3168, Australia
- Thomas S Peat
- School of Biotechnology & Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
- Janet Newman
- School of Biotechnology & Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
- Colin Jackson
- Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- ARC Centre of Excellence for Innovations in Peptide & Protein Science, Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- ARC Centre of Excellence for Innovations in Synthetic Biology, Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- Albert Ardevol
- Manufacturing, Commonwealth Scientific and Industrial Research Organisation, Clayton, Victoria, 3168, Australia
- CSIRO Advanced Engineering Biology Future Science Platform, GPO Box 1700, Canberra, ACT 2601, Australia
14
Pu Z, Cao J, Wu W, Song Z, Yang L, Wu J, Yu H. Reconstructing dynamics correlation network to simultaneously improve activity and stability of 2,3-butanediol dehydrogenase by design of distal interchain disulfide bonds. Int J Biol Macromol 2024; 267:131415. [PMID: 38582485 DOI: 10.1016/j.ijbiomac.2024.131415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 04/03/2024] [Accepted: 04/03/2024] [Indexed: 04/08/2024]
Abstract
The complete enzyme catalytic cycle includes substrate binding, chemical reaction and product release, in which different dynamic conformations are adopted. Due to the complex relationship among enzyme activity, stability and dynamics, the directed evolution of enzymes for improved activity or stability commonly leads to a trade-off in stability or activity. It hence remains a challenge to engineer an enzyme to have both enhanced activity and stability. Here, we have attempted to reconstruct the dynamics correlation network involved with the active center to improve both activity and stability of a 2,3-butanediol dehydrogenase (2,3-BDH) by introducing inter-chain disulfide bonds. A computational strategy was first applied to evaluate the effect of introducing an inter-chain disulfide bond on the activity and stability of three 2,3-BDHs, and the N258C mutation of 2,3-BDH from Corynebacterium glutamicum (CgBDH) proved to be effective in improving both activity and stability. CgBDH-N258C showed a different unfolding curve from the wild type, with two melting temperatures (Tm) of 68.3 °C and 50.8 °C, 19.7 °C and 2 °C higher than the 48.6 °C of the wild type. Its half-life was also improved by 14.8-fold compared to the wild type. Catalytic efficiency (kcat/Km) of the mutant was increased by 7.9-fold toward the native substrate diacetyl and 8.8-fold toward the non-native substrate 2,5-hexanedione compared to the wild type. Molecular dynamics simulations revealed that an interaction network formed by Cys258, Arg162, Ala144 and the catalytic residues was reconstructed in the mutant and that the dynamics change caused by the disulfide bond could be propagated through the interaction network. This improved the enzyme stability and activity by decreasing the flexibility and locking a more "reactive" pose, respectively. Further construction of mutations including A144G, which showed a 44-fold improvement in catalytic efficiency toward meso-2,3-BD, confirmed the role of modifying the dynamics correlation network in tuning enzyme activity and selectivity. This study provides important insights into the relationship among dynamics, enzyme catalysis and stability, and will be useful in designing new enzymes with co-evolution of stability, activity and selectivity.
Affiliation(s)
- Zhongji Pu
- Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, Zhejiang 310027, China; ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou, Zhejiang 311200, China; Xianghu Laboratory, Hangzhou 311231, China
- Jiawen Cao
- Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, Zhejiang 310027, China
- Wenhui Wu
- ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou, Zhejiang 311200, China
- Zhongdi Song
- Key Laboratory of Pollution Exposure and Health Intervention of Zhejiang Province, Interdisciplinary Research Academy, Zhejiang Shuren University, Hangzhou 310015, China
- Lirong Yang
- Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, Zhejiang 310027, China; ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou, Zhejiang 311200, China
- Jianping Wu
- Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, Zhejiang 310027, China; ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou, Zhejiang 311200, China
- Haoran Yu
- Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, Zhejiang 310027, China; ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou, Zhejiang 311200, China
15
Tripp A, Braun M, Wieser F, Oberdorfer G, Lechner H. Click, Compute, Create: A Review of Web-based Tools for Enzyme Engineering. Chembiochem 2024:e202400092. [PMID: 38634409 DOI: 10.1002/cbic.202400092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 04/14/2024] [Accepted: 04/15/2024] [Indexed: 04/19/2024]
Abstract
Enzyme engineering, though pivotal across various biotechnological domains, is often plagued by its time-consuming and labor-intensive nature. This review aims to offer an overview of supportive in silico methodologies for this demanding endeavor. Starting from methods to predict protein structures, classify their activity, and even discover new enzymes, we continue by describing tools used to increase the thermostability and production yields of selected targets. Subsequently, we discuss computational methods to modulate both the activity and the selectivity of enzymes. Last, we present recent approaches based on cutting-edge machine learning methods to redesign enzymes. With the exception of the last chapter, there is a strong focus on methods easily accessible via web interfaces or simple Python scripts, and therefore readily usable by a diverse and broad community.
Affiliation(s)
- Adrian Tripp
- Institute of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010, Graz, Austria
- Markus Braun
- Institute of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010, Graz, Austria
- Florian Wieser
- Institute of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010, Graz, Austria
- Gustav Oberdorfer
- Institute of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010, Graz, Austria
- BioTechMed, Graz, Austria
- Horst Lechner
- Institute of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010, Graz, Austria
- BioTechMed, Graz, Austria
16
Lin W, Wells J, Wang Z, Orengo C, Martin ACR. Enhancing missense variant pathogenicity prediction with protein language models using VariPred. Sci Rep 2024; 14:8136. [PMID: 38584172 PMCID: PMC10999449 DOI: 10.1038/s41598-024-51489-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Accepted: 01/05/2024] [Indexed: 04/09/2024] Open
Abstract
Computational approaches for predicting the pathogenicity of genetic variants have advanced in recent years. These methods enable researchers to determine the possible clinical impact of rare and novel variants. Historically these prediction methods used hand-crafted features based on structural, evolutionary, or physicochemical properties of the variant. In this study we propose a novel framework that leverages the power of pre-trained protein language models to predict variant pathogenicity. We show that our approach VariPred (Variant impact Predictor) outperforms current state-of-the-art methods by using an end-to-end model that only requires the protein sequence as input. Using one of the best-performing protein language models (ESM-1b), we establish a robust classifier that requires no calculation of structural features or multiple sequence alignments. We compare the performance of VariPred with other representative models including 3Cnet, Polyphen-2, REVEL, MetaLR, FATHMM and ESM variant. VariPred performs as well as, or in most cases better than, these other predictors using six variant impact prediction benchmarks despite requiring only sequence data and no pre-processing of the data.
Affiliation(s)
- Weining Lin
- Division of Biosciences, Institute of Structural and Molecular Biology, University College London, London, UK
- Jude Wells
- Department of Computer Science, University College London, London, UK
- Zeyuan Wang
- College of Computer Science and Technology, Zhejiang University, Zhejiang, China
- Christine Orengo
- Division of Biosciences, Institute of Structural and Molecular Biology, University College London, London, UK
- Andrew C R Martin
- Division of Biosciences, Institute of Structural and Molecular Biology, University College London, London, UK
17
Liu Y, Bender SG, Sorigue D, Diaz DJ, Ellington AD, Mann G, Allmendinger S, Hyster TK. Asymmetric Synthesis of α-Chloroamides via Photoenzymatic Hydroalkylation of Olefins. J Am Chem Soc 2024; 146:7191-7197. [PMID: 38442365 DOI: 10.1021/jacs.4c00927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/07/2024]
Abstract
Photoenzymatic intermolecular hydroalkylations of olefins are highly enantioselective for chiral centers formed during radical termination but poorly selective for centers set in the C-C bond-forming event. Here, we report the evolution of a flavin-dependent "ene"-reductase to catalyze the coupling of α,α-dichloroamides with alkenes to afford α-chloroamides in good yield with excellent chemo- and stereoselectivity. These products can serve as linchpins in the synthesis of pharmaceutically valuable motifs. Mechanistic studies indicate that radical formation occurs by exciting a charge-transfer complex templated by the protein. Precise control over the orientation of molecules within the charge-transfer complex potentially accounts for the observed stereoselectivity. The work expands the types of motifs that can be prepared using photoenzymatic catalysis.
Affiliation(s)
- Yi Liu
- Department of Chemistry, Princeton University, Princeton, New Jersey 08544, United States
- Department of Chemistry and Chemical Biology, Cornell University, Ithaca, New York 14853, United States
- Sophie G Bender
- Department of Chemistry, Princeton University, Princeton, New Jersey 08544, United States
- Department of Chemistry and Chemical Biology, Cornell University, Ithaca, New York 14853, United States
- Damien Sorigue
- Department of Chemistry, Princeton University, Princeton, New Jersey 08544, United States
- Department of Chemistry and Chemical Biology, Cornell University, Ithaca, New York 14853, United States
- Aix-Marseille University, CEA, CNRS, Institute of Biosciences and Biotechnologies, BIAM Cadarache, 13108 Saint-Paul-lez-Durance, France
- Daniel J Diaz
- Department of Chemistry, University of Texas at Austin, Austin, Texas 78712, United States
- Institute for Foundations of Machine Learning, University of Texas at Austin, Austin, Texas 78712, United States
- Andrew D Ellington
- Department of Molecular Bioscience, University of Texas at Austin, Austin, Texas 78712, United States
- Greg Mann
- Novartis Pharm. AG, Basel 4002, Switzerland
- Todd K Hyster
- Department of Chemistry, Princeton University, Princeton, New Jersey 08544, United States
- Department of Chemistry and Chemical Biology, Cornell University, Ithaca, New York 14853, United States
18
d'Oelsnitz S, Diaz DJ, Kim W, Acosta DJ, Dangerfield TL, Schechter MW, Minus MB, Howard JR, Do H, Loy JM, Alper HS, Zhang YJ, Ellington AD. Biosensor and machine learning-aided engineering of an amaryllidaceae enzyme. Nat Commun 2024; 15:2084. [PMID: 38453941 PMCID: PMC10920890 DOI: 10.1038/s41467-024-46356-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2023] [Accepted: 02/22/2024] [Indexed: 03/09/2024] Open
Abstract
A major challenge to achieving industry-scale biomanufacturing of therapeutic alkaloids is the slow process of biocatalyst engineering. Amaryllidaceae alkaloids, such as the Alzheimer's medication galantamine, are complex plant secondary metabolites with recognized therapeutic value. Due to their difficult synthesis they are regularly sourced by extraction and purification from the low-yielding daffodil Narcissus pseudonarcissus. Here, we propose an efficient biosensor-machine learning technology stack for biocatalyst development, which we apply to engineer an Amaryllidaceae enzyme in Escherichia coli. Directed evolution is used to develop a highly sensitive (EC50 = 20 μM) and specific biosensor for the key Amaryllidaceae alkaloid branchpoint 4'-O-methylnorbelladine. A structure-based residual neural network (MutComputeX) is subsequently developed and used to generate activity-enriched variants of a plant methyltransferase, which are rapidly screened with the biosensor. Functional enzyme variants are identified that yield a 60% improvement in product titer, 2-fold higher catalytic activity, and 3-fold lower off-product regioisomer formation. A solved crystal structure elucidates the mechanism behind key beneficial mutations.
Affiliation(s)
- Simon d'Oelsnitz
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, 78712, USA
- Synthetic Biology HIVE, Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA
- Daniel J Diaz
- Department of Chemistry, University of Texas at Austin, Austin, TX, 78712, USA
- Institute for Foundations of Machine Learning, University of Texas at Austin, Austin, TX, 78712, USA
- Wantae Kim
- McKetta Department of Chemical Engineering, University of Texas at Austin, Austin, TX, 78712, USA
- Daniel J Acosta
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, 78712, USA
- Tyler L Dangerfield
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, 78712, USA
- Mason W Schechter
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, 78712, USA
- Matthew B Minus
- Department of Chemistry, Prairie View A&M University, 100 University Dr, Prairie View, TX, 77446, USA
- James R Howard
- Department of Chemistry, University of Texas at Austin, Austin, TX, 78712, USA
- Hannah Do
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, 78712, USA
- James M Loy
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, 78712, USA
- Hal S Alper
- McKetta Department of Chemical Engineering, University of Texas at Austin, Austin, TX, 78712, USA
- Y Jessie Zhang
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, 78712, USA
- Andrew D Ellington
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, 78712, USA
19
Goshisht MK. Machine Learning and Deep Learning in Synthetic Biology: Key Architectures, Applications, and Challenges. ACS Omega 2024; 9:9921-9945. [PMID: 38463314 PMCID: PMC10918679 DOI: 10.1021/acsomega.3c05913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 01/19/2024] [Accepted: 01/30/2024] [Indexed: 03/12/2024]
Abstract
Machine learning (ML), particularly deep learning (DL), has made rapid and substantial progress in synthetic biology in recent years. Biotechnological applications of biosystems, including pathways, enzymes, and whole cells, are being probed frequently with time. The intricacy and interconnectedness of biosystems make it challenging to design them with the desired properties. ML and DL have a synergy with synthetic biology. Synthetic biology can be employed to produce large data sets for training models (for instance, by utilizing DNA synthesis), and ML/DL models can be employed to inform design (for example, by generating new parts or advising unrivaled experiments to perform). This potential has recently been brought to light by research at the intersection of engineering biology and ML/DL through achievements like the design of novel biological components, best experimental design, automated analysis of microscopy data, protein structure prediction, and biomolecular implementations of ANNs (Artificial Neural Networks). I have divided this review into three sections. In the first section, I describe predictive potential and basics of ML along with myriad applications in synthetic biology, especially in engineering cells, activity of proteins, and metabolic pathways. In the second section, I describe fundamental DL architectures and their applications in synthetic biology. Finally, I describe different challenges causing hurdles in the progress of ML/DL and synthetic biology along with their solutions.
Affiliation(s)
- Manoj Kumar Goshisht
- Department of Chemistry, Natural and Applied Sciences, University of Wisconsin-Green Bay, Green Bay, Wisconsin 54311-7001, United States
20
Yang J, Li FZ, Arnold FH. Opportunities and Challenges for Machine Learning-Assisted Enzyme Engineering. ACS Central Science 2024; 10:226-241. [PMID: 38435522 PMCID: PMC10906252 DOI: 10.1021/acscentsci.3c01275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 12/26/2023] [Accepted: 01/16/2024] [Indexed: 03/05/2024]
Abstract
Enzymes can be engineered at the level of their amino acid sequences to optimize key properties such as expression, stability, substrate range, and catalytic efficiency, or even to unlock new catalytic activities not found in nature. Because the search space of possible proteins is vast, enzyme engineering usually involves discovering an enzyme starting point that has some level of the desired activity followed by directed evolution to improve its "fitness" for a desired application. Recently, machine learning (ML) has emerged as a powerful tool to complement this empirical process. ML models can contribute to (1) starting point discovery by functional annotation of known protein sequences or generating novel protein sequences with desired functions and (2) navigating protein fitness landscapes for fitness optimization by learning mappings between protein sequences and their associated fitness values. In this Outlook, we explain how ML complements enzyme engineering and discuss its future potential to unlock improved engineering outcomes.
Affiliation(s)
- Jason Yang
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Francesca-Zhoufan Li
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Frances H. Arnold
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
21
Pun MN, Ivanov A, Bellamy Q, Montague Z, LaMont C, Bradley P, Otwinowski J, Nourmohammad A. Learning the shape of protein microenvironments with a holographic convolutional neural network. Proc Natl Acad Sci U S A 2024; 121:e2300838121. [PMID: 38300863 PMCID: PMC10861886 DOI: 10.1073/pnas.2300838121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2023] [Accepted: 11/29/2023] [Indexed: 02/03/2024] Open
Abstract
Proteins play a central role in biology from immune recognition to brain activity. While major advances in machine learning have improved our ability to predict protein structure from sequence, determining protein function from its sequence or structure remains a major challenge. Here, we introduce holographic convolutional neural network (H-CNN) for proteins, which is a physically motivated machine learning approach to model amino acid preferences in protein structures. H-CNN reflects physical interactions in a protein structure and recapitulates the functional information stored in evolutionary data. H-CNN accurately predicts the impact of mutations on protein stability and binding of protein complexes. Our interpretable computational model for protein structure-function maps could guide design of novel proteins with desired function.
Affiliation(s)
- Michael N. Pun
- Department of Physics, University of Washington, Seattle, WA 98195
- The Department for Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, Göttingen 37077, Germany
- Andrew Ivanov
- Department of Physics, University of Washington, Seattle, WA 98195
- Quinn Bellamy
- Department of Physics, University of Washington, Seattle, WA 98195
- Zachary Montague
- Department of Physics, University of Washington, Seattle, WA 98195
- The Department for Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, Göttingen 37077, Germany
- Colin LaMont
- The Department for Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, Göttingen 37077, Germany
- Philip Bradley
- Fred Hutchinson Cancer Center, Seattle, WA 98102
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
- Jakub Otwinowski
- The Department for Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, Göttingen 37077, Germany
- Dyno Therapeutics, Watertown, MA 02472
- Armita Nourmohammad
- Department of Physics, University of Washington, Seattle, WA 98195
- The Department for Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, Göttingen 37077, Germany
- Fred Hutchinson Cancer Center, Seattle, WA 98102
- Department of Applied Mathematics, University of Washington, Seattle, WA 98105
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195
22
Ao YF, Dörr M, Menke MJ, Born S, Heuson E, Bornscheuer UT. Data-Driven Protein Engineering for Improving Catalytic Activity and Selectivity. Chembiochem 2024; 25:e202300754. [PMID: 38029350 DOI: 10.1002/cbic.202300754] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 11/28/2023] [Accepted: 11/29/2023] [Indexed: 12/01/2023]
Abstract
Protein engineering is essential for altering the substrate scope, catalytic activity and selectivity of enzymes for applications in biocatalysis. However, traditional approaches, such as directed evolution and rational design, face the challenge of experimentally screening a large protein mutation space. Machine learning methods allow the approximation of protein fitness landscapes and the identification of catalytic patterns using limited experimental data, thus providing a new avenue to guide protein engineering campaigns. In this concept article, we review machine learning models that have been developed to assess enzyme-substrate-catalysis performance relationships, aiming to improve enzymes through data-driven protein engineering. Furthermore, we discuss prospects for the future development of this field to provide additional strategies and tools for achieving desired activities and selectivities.
Affiliation(s)
- Yu-Fei Ao
- Department of Biotechnology and Enzyme Catalysis, Institute of Biochemistry, University of Greifswald, Felix-Hausdorff-Str. 4, 17487, Greifswald, Germany
- Beijing National Laboratory for Molecular Sciences, CAS Key Laboratory of Molecular Recognition and Function, Institute of Chemistry, Chinese Academy of Sciences, Zhongguancun North First Street 2, Beijing, 100190, China
- University of Chinese Academy of Sciences, Yuquan Road 19(A), Beijing, 100049, China
- Mark Dörr
- Department of Biotechnology and Enzyme Catalysis, Institute of Biochemistry, University of Greifswald, Felix-Hausdorff-Str. 4, 17487, Greifswald, Germany
- Marian J Menke
- Department of Biotechnology and Enzyme Catalysis, Institute of Biochemistry, University of Greifswald, Felix-Hausdorff-Str. 4, 17487, Greifswald, Germany
- Stefan Born
- Technische Universität Berlin, Chair of Bioprocess Engineering, Ackerstraße 76, 13355, Berlin, Germany
- Egon Heuson
- Univ. Lille, CNRS, Centrale Lille, Univ. Artois, UMR 8181 UCCS, Unité de Catalyse et Chimie du Solide, 59000, Lille, France
- Uwe T Bornscheuer
- Department of Biotechnology and Enzyme Catalysis, Institute of Biochemistry, University of Greifswald, Felix-Hausdorff-Str. 4, 17487, Greifswald, Germany
| |
|
23
|
Sumida K, Núñez-Franco R, Kalvet I, Pellock SJ, Wicky BIM, Milles LF, Dauparas J, Wang J, Kipnis Y, Jameson N, Kang A, De La Cruz J, Sankaran B, Bera AK, Jiménez-Osés G, Baker D. Improving Protein Expression, Stability, and Function with ProteinMPNN. J Am Chem Soc 2024; 146:2054-2061. [PMID: 38194293 PMCID: PMC10811672 DOI: 10.1021/jacs.3c10941] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 10/04/2023] [Revised: 12/03/2023] [Accepted: 12/05/2023] [Indexed: 01/10/2024]
Abstract
Natural proteins are highly optimized for function but are often difficult to produce at a scale suitable for biotechnological applications due to poor expression in heterologous systems, limited solubility, and sensitivity to temperature. Thus, a general method that improves the physical properties of native proteins while maintaining function could have wide utility for protein-based technologies. Here, we show that the deep neural network ProteinMPNN, together with evolutionary and structural information, provides a route to increasing protein expression, stability, and function. For both myoglobin and tobacco etch virus (TEV) protease, we generated designs with improved expression, elevated melting temperatures, and improved function. For TEV protease, we identified multiple designs with improved catalytic activity as compared to the parent sequence and previously reported TEV variants. Our approach should be broadly useful for improving the expression, stability, and function of biotechnologically important proteins.
Affiliation(s)
- Kiera H. Sumida
- Department of Chemistry, University of Washington, Seattle, Washington 98195, United States
- Institute for Protein Design, University of Washington, Seattle, Washington 98195, United States
| | - Reyes Núñez-Franco
- Center for Cooperative Research in Biosciences, Basque Research and Technology Alliance, Derio 48160, Spain
| | - Indrek Kalvet
- Institute for Protein Design, University of Washington, Seattle, Washington 98195, United States
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, United States
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, United States
| | - Samuel J. Pellock
- Institute for Protein Design, University of Washington, Seattle, Washington 98195, United States
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, United States
| | - Basile I. M. Wicky
- Institute for Protein Design, University of Washington, Seattle, Washington 98195, United States
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, United States
| | - Lukas F. Milles
- Institute for Protein Design, University of Washington, Seattle, Washington 98195, United States
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, United States
| | - Justas Dauparas
- Institute for Protein Design, University of Washington, Seattle, Washington 98195, United States
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, United States
| | - Jue Wang
- Institute for Protein Design, University of Washington, Seattle, Washington 98195, United States
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, United States
| | - Yakov Kipnis
- Institute for Protein Design, University of Washington, Seattle, Washington 98195, United States
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, United States
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, United States
| | - Noel Jameson
- Department of Chemistry, University of Washington, Seattle, Washington 98195, United States
| | - Alex Kang
- Institute for Protein Design, University of Washington, Seattle, Washington 98195, United States
| | - Joshmyn De La Cruz
- Institute for Protein Design, University of Washington, Seattle, Washington 98195, United States
| | - Banumathi Sankaran
- Berkeley Center for Structural Biology, Molecular Biophysics, and Integrated Bioimaging, Lawrence Berkeley Laboratory, Berkeley, California 94720, United States
| | - Asim K. Bera
- Institute for Protein Design, University of Washington, Seattle, Washington 98195, United States
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, United States
| | - Gonzalo Jiménez-Osés
- Center for Cooperative Research in Biosciences, Basque Research and Technology Alliance, Derio 48160, Spain
- Ikerbasque, Basque Foundation for Science, Bilbao 48013, Spain
| | - David Baker
- Institute for Protein Design, University of Washington, Seattle, Washington 98195, United States
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, United States
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, United States
| |
|
24
|
Quezada A, Annapareddy A, Javanmardi K, Cooper J, Finkelstein IJ. Mammalian Antigen Display for Pandemic Countermeasures. Methods Mol Biol 2024; 2762:191-216. [PMID: 38315367 DOI: 10.1007/978-1-0716-3666-4_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/07/2024]
Abstract
Pandemic countermeasures require the rapid design of antigens for vaccines, profiling patient antibody responses, assessing antigen structure-function landscapes, and the surveillance of emerging viral lineages. Cell surface display of a viral antigen or its subdomains can facilitate these goals by coupling the phenotypes of protein variants to their DNA sequence. Screening surface-displayed proteins via flow cytometry also eliminates time-consuming protein purification steps. Prior approaches have primarily relied on yeast as a display chassis. However, yeast often cannot express large viral glycoproteins, requiring their truncation into subdomains. Here, we describe a method to design and express antigens on the surface of mammalian HEK293T cells. We discuss three use cases, including screening of stabilizing mutations, deep mutational scanning, and epitope mapping. The mammalian antigen display platform described herein will accelerate ongoing and future pandemic countermeasures.
Affiliation(s)
- Andrea Quezada
- Department of Molecular BioSciences, University of Texas at Austin, Austin, TX, USA
| | - Ankur Annapareddy
- Department of Molecular BioSciences, University of Texas at Austin, Austin, TX, USA
| | - Kamyab Javanmardi
- Department of Molecular BioSciences, University of Texas at Austin, Austin, TX, USA
| | - John Cooper
- Department of Molecular BioSciences, University of Texas at Austin, Austin, TX, USA
| | - Ilya J Finkelstein
- Department of Molecular BioSciences, University of Texas at Austin, Austin, TX, USA.
- Center for Systems and Synthetic Biology, University of Texas at Austin, Austin, TX, USA.
| |
|
25
|
Radley E, Davidson J, Foster J, Obexer R, Bell EL, Green AP. Engineering Enzymes for Environmental Sustainability. Angewandte Chemie (Weinheim an der Bergstrasse, Germany) 2023; 135:e202309305. [PMID: 38516574 PMCID: PMC10952289 DOI: 10.1002/ange.202309305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Indexed: 03/23/2024]
Abstract
The development and implementation of sustainable catalytic technologies is key to delivering our net-zero targets. Here we review how engineered enzymes, with a focus on those developed using directed evolution, can be deployed to improve the sustainability of numerous processes and help to conserve our environment. Efficient and robust biocatalysts have been engineered to capture carbon dioxide (CO2) and have been embedded into new efficient metabolic CO2 fixation pathways. Enzymes have been refined for bioremediation, enhancing their ability to degrade toxic and harmful pollutants. Biocatalytic recycling is gaining momentum, with engineered cutinases and PETases developed for the depolymerization of the abundant plastic, polyethylene terephthalate (PET). Finally, biocatalytic approaches for accessing petroleum-based feedstocks and chemicals are expanding, using optimized enzymes to convert plant biomass into biofuels or other high value products. Through these examples, we hope to illustrate how enzyme engineering and biocatalysis can contribute to the development of a cleaner and more efficient chemical industry.
Affiliation(s)
- Emily Radley
- Department of Chemistry & Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester M1 7DN, UK
| | - John Davidson
- Department of Chemistry & Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester M1 7DN, UK
| | - Jake Foster
- Department of Chemistry & Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester M1 7DN, UK
| | - Richard Obexer
- Department of Chemistry & Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester M1 7DN, UK
| | - Elizabeth L Bell
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, Golden, CO, USA
- BOTTLE Consortium, Golden, CO, USA
| | - Anthony P Green
- Department of Chemistry & Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester M1 7DN, UK
| |
|
26
|
Radley E, Davidson J, Foster J, Obexer R, Bell EL, Green AP. Engineering Enzymes for Environmental Sustainability. Angew Chem Int Ed Engl 2023; 62:e202309305. [PMID: 37651344 PMCID: PMC10952156 DOI: 10.1002/anie.202309305] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/03/2023] [Revised: 08/23/2023] [Accepted: 08/24/2023] [Indexed: 09/02/2023]
Abstract
The development and implementation of sustainable catalytic technologies is key to delivering our net-zero targets. Here we review how engineered enzymes, with a focus on those developed using directed evolution, can be deployed to improve the sustainability of numerous processes and help to conserve our environment. Efficient and robust biocatalysts have been engineered to capture carbon dioxide (CO2) and have been embedded into new efficient metabolic CO2 fixation pathways. Enzymes have been refined for bioremediation, enhancing their ability to degrade toxic and harmful pollutants. Biocatalytic recycling is gaining momentum, with engineered cutinases and PETases developed for the depolymerization of the abundant plastic, polyethylene terephthalate (PET). Finally, biocatalytic approaches for accessing petroleum-based feedstocks and chemicals are expanding, using optimized enzymes to convert plant biomass into biofuels or other high value products. Through these examples, we hope to illustrate how enzyme engineering and biocatalysis can contribute to the development of a cleaner and more efficient chemical industry.
Affiliation(s)
- Emily Radley
- Department of Chemistry & Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester M1 7DN, UK
| | - John Davidson
- Department of Chemistry & Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester M1 7DN, UK
| | - Jake Foster
- Department of Chemistry & Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester M1 7DN, UK
| | - Richard Obexer
- Department of Chemistry & Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester M1 7DN, UK
| | - Elizabeth L. Bell
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, Golden, CO, USA
- BOTTLE Consortium, Golden, CO, USA
| | - Anthony P. Green
- Department of Chemistry & Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester M1 7DN, UK
| |
|
27
|
Hu YT, Hong XZ, Li HM, Yang JK, Shen W, Wang YW, Liu YH. Modifying the amino acids in conformational motion pathway of the α-amylase of Geobacillus stearothermophilus improved its activity and stability. Front Microbiol 2023; 14:1261245. [PMID: 38143856 PMCID: PMC10740195 DOI: 10.3389/fmicb.2023.1261245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Accepted: 11/21/2023] [Indexed: 12/26/2023] Open
Abstract
Amino acids along the conformational motion pathway of an enzyme molecule are correlated with its flexibility and rigidity. To enhance enzyme activity and thermal stability, the motion pathway of Geobacillus stearothermophilus α-amylase was identified and molecularly modified using the neural relational inference model and a deep learning tool. Significant differences in substrate specificity, enzymatic kinetics, optimal temperature, and thermal stability were observed among the mutants with modified amino acids along the pathway. Mutants, especially P44E, demonstrated higher hydrolytic activity and catalytic efficiency (kcat/KM) than the wild-type enzyme (95.0% and 93.8%, respectively), with the optimum temperature increased to 90°C. This mutation from proline to glutamic acid increased the number of channels and the radius of their bottleneck, which might facilitate the transport of large starch substrates into the enzyme. The mutation could also optimize the hydrogen bonding network of the catalytic center and reduce steric hindrance to substrate entry into and exit from the catalytic center.
Affiliation(s)
- Yu-Ting Hu
- Pilot Base of Food Microbial Resources Utilization of Hubei Province, College of Life Science and Technology, Wuhan Polytechnic University, Wuhan, China
| | - Xi-Zhi Hong
- Pilot Base of Food Microbial Resources Utilization of Hubei Province, College of Life Science and Technology, Wuhan Polytechnic University, Wuhan, China
| | - Hui-Min Li
- Pilot Base of Food Microbial Resources Utilization of Hubei Province, College of Life Science and Technology, Wuhan Polytechnic University, Wuhan, China
| | - Jiang-Ke Yang
- Pilot Base of Food Microbial Resources Utilization of Hubei Province, College of Life Science and Technology, Wuhan Polytechnic University, Wuhan, China
| | - Wei Shen
- Pilot Base of Food Microbial Resources Utilization of Hubei Province, College of Life Science and Technology, Wuhan Polytechnic University, Wuhan, China
| | - Ya-Wei Wang
- Pilot Base of Food Microbial Resources Utilization of Hubei Province, College of Life Science and Technology, Wuhan Polytechnic University, Wuhan, China
| | - Yi-Han Liu
- Key Laboratory of Industrial Fermentation Microbiology, Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, The College of Biotechnology, Tianjin University of Science and Technology, Tianjin, China
| |
|
28
|
Nava A, Roberts J, Haushalter RW, Wang Z, Keasling JD. Module-Based Polyketide Synthase Engineering for de Novo Polyketide Biosynthesis. ACS Synth Biol 2023; 12:3148-3155. [PMID: 37871264 PMCID: PMC10661043 DOI: 10.1021/acssynbio.3c00282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Indexed: 10/25/2023]
Abstract
Polyketide retrobiosynthesis, where the biosynthetic pathway of a given polyketide can be reversibly engineered due to the colinearity of the polyketide synthase (PKS) structure and function, has the potential to produce millions of organic molecules. Mixing and matching modules from natural PKSs is one of the routes to produce many of these molecules. Evolutionary analysis of PKSs suggests that traditionally used module boundaries may not lead to the most productive hybrid PKSs and that new boundaries around and within the ketosynthase domain may be more active when constructing hybrid PKSs. As this is still a nascent area of research, the generality of these design principles based on existing engineering efforts remains inconclusive. Recent advances in structural modeling and synthetic biology present an opportunity to accelerate PKS engineering by re-evaluating insights gained from previous engineering efforts with cutting edge tools.
Affiliation(s)
- Alberto A. Nava
- Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, Emeryville, California 94608, United States
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Department of Chemical and Biomolecular Engineering, University of California, Berkeley, Berkeley, California 94720, United States
| | - Jacob Roberts
- Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, Emeryville, California 94608, United States
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Department of Bioengineering, University of California, Berkeley, Berkeley, California 94720, United States
| | - Robert W. Haushalter
- Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, Emeryville, California 94608, United States
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
| | - Zilong Wang
- Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, Emeryville, California 94608, United States
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
| | - Jay D. Keasling
- Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, Emeryville, California 94608, United States
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Department of Chemical and Biomolecular Engineering, University of California, Berkeley, Berkeley, California 94720, United States
- Department of Bioengineering, University of California, Berkeley, Berkeley, California 94720, United States
- Center for Synthetic Biochemistry, Shenzhen Institutes for Advanced Technologies, Shenzhen 518055, P.R. China
- The Novo Nordisk Foundation Center for Biosustainability, Technical University Denmark, Kemitorvet, Building 220, Kongens Lyngby 2800, Denmark
| |
|
29
|
Tu KJ, Diplas BH, Regal JA, Waitkus MS, Pirozzi CJ, Reitman ZJ. Mining cancer genomes for change-of-metabolic-function mutations. Commun Biol 2023; 6:1143. [PMID: 37950065 PMCID: PMC10638295 DOI: 10.1038/s42003-023-05475-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Accepted: 10/17/2023] [Indexed: 11/12/2023] Open
Abstract
Enzymes with novel functions are needed to enable new organic synthesis techniques. Drawing inspiration from gain-of-function cancer mutations that functionally alter proteins and affect cellular metabolism, we developed METIS (Mutated Enzymes from Tumors In silico Screen). METIS identifies metabolism-altering cancer mutations using mutation recurrence rates and protein structure. We used METIS to screen 298,517 cancer mutations and identify 48 candidate mutations, including those previously identified to alter enzymatic function. Unbiased metabolomic profiling of cells exogenously expressing a candidate mutant (OGDHL p.A400T) supports an altered phenotype that boosts in vitro production of xanthosine, a pharmacologically useful chemical that is currently produced using unsustainable, water-intensive methods. We then applied METIS to 49 million cancer mutations, yielding a refined set of candidates that may impart novel enzymatic functions or contribute to tumor progression. Thus, METIS can be used to identify and catalog potentially useful cancer mutations for green chemistry and therapeutic applications.
Affiliation(s)
- Kevin J Tu
- Department of Radiation Oncology, Duke University, Durham, NC, 27710, USA
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD, 21044, USA
- Cancer Research UK Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Cambridge, CB2 0RE, UK
| | - Bill H Diplas
- Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
| | - Joshua A Regal
- Department of Radiation Oncology, Duke University, Durham, NC, 27710, USA
| | | | | | - Zachary J Reitman
- Department of Radiation Oncology, Duke University, Durham, NC, 27710, USA.
- Department of Neurosurgery, Duke University, Durham, NC, 27710, USA.
- Department of Pathology, Duke University, Durham, NC, 27710, USA.
| |
|
30
|
Kouba P, Kohout P, Haddadi F, Bushuiev A, Samusevich R, Sedlar J, Damborsky J, Pluskal T, Sivic J, Mazurenko S. Machine Learning-Guided Protein Engineering. ACS Catal 2023; 13:13863-13895. [PMID: 37942269 PMCID: PMC10629210 DOI: 10.1021/acscatal.3c02743] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Received: 06/15/2023] [Revised: 09/20/2023] [Indexed: 11/10/2023]
Abstract
Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of promising enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, protein dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in this domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough experimental validation of emerging models before their use for rational protein design. We present our opinions on the fundamental problems and outline the potential directions for future research.
Affiliation(s)
- Petr Kouba
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Faculty of Electrical Engineering, Czech Technical University in Prague, Technicka 2, 166 27 Prague 6, Czech Republic
| | - Pavel Kohout
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
| | - Faraneh Haddadi
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
| | - Anton Bushuiev
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
| | - Raman Samusevich
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
| | - Jiri Sedlar
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
| | - Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
| | - Tomas Pluskal
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
| | - Josef Sivic
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
| | - Stanislav Mazurenko
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
| |
|
31
|
Charest N, Shen Y, Lai YC, Chen IA, Shea JE. Discovering pathways through ribozyme fitness landscapes using information theoretic quantification of epistasis. RNA (New York, N.Y.) 2023; 29:1644-1657. [PMID: 37580126 PMCID: PMC10578471 DOI: 10.1261/rna.079541.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2022] [Accepted: 07/29/2023] [Indexed: 08/16/2023]
Abstract
The identification of catalytic RNAs is typically achieved through primarily experimental means. However, only a small fraction of sequence space can be analyzed even with high-throughput techniques. Methods to extrapolate from a limited data set to predict additional ribozyme sequences, particularly in a human-interpretable fashion, could be useful both for designing new functional RNAs and for generating greater understanding about a ribozyme fitness landscape. Using information theory, we express the effects of epistasis (i.e., deviations from additivity) on a ribozyme. This representation was incorporated into a simple model of the epistatic fitness landscape, which identified potentially exploitable combinations of mutations. We used this model to theoretically predict mutants of high activity for a self-aminoacylating ribozyme, identifying potentially active triple and quadruple mutants beyond the experimental data set of single and double mutants. The predictions were validated experimentally, with nine out of nine sequences being accurately predicted to have high activity. This set of sequences included mutants that form a previously unknown evolutionary "bridge" between two ribozyme families that share a common motif. Individual steps in the method could be examined, understood, and guided by a human, combining interpretability and performance in a simple model to predict ribozyme sequences by extrapolation.
Affiliation(s)
- Nathaniel Charest
- Department of Chemistry and Biochemistry, University of California, Santa Barbara, California 93106, USA
| | - Yuning Shen
- Department of Chemistry and Biochemistry, University of California, Santa Barbara, California 93106, USA
| | - Yei-Chen Lai
- Department of Chemistry, National Chung Hsing University, Taichung City 40227, Taiwan
- Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, California 90095, USA
| | - Irene A Chen
- Department of Chemistry and Biochemistry, University of California, Santa Barbara, California 93106, USA
- Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, California 90095, USA
| | - Joan-Emma Shea
- Department of Chemistry and Biochemistry, University of California, Santa Barbara, California 93106, USA
| |
|
32
|
Kunka A, Marques SM, Havlasek M, Vasina M, Velatova N, Cengelova L, Kovar D, Damborsky J, Marek M, Bednar D, Prokop Z. Advancing Enzyme's Stability and Catalytic Efficiency through Synergy of Force-Field Calculations, Evolutionary Analysis, and Machine Learning. ACS Catal 2023; 13:12506-12518. [PMID: 37822856 PMCID: PMC10563018 DOI: 10.1021/acscatal.3c02575] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 06/06/2023] [Revised: 08/24/2023] [Indexed: 10/13/2023]
Abstract
Thermostability is an essential requirement for the use of enzymes in the bioindustry. Here, we compare different protein stabilization strategies using a challenging target, a stable haloalkane dehalogenase DhaA115. We observe better performance of automated stabilization platforms FireProt and PROSS in designing multiple-point mutations over the introduction of disulfide bonds and strengthening the intra- and the inter-domain contacts by in silico saturation mutagenesis. We reveal that the performance of automated stabilization platforms was still compromised due to the introduction of some destabilizing mutations. Notably, we show that their prediction accuracy can be improved by applying manual curation or machine learning for the removal of potentially destabilizing mutations, yielding highly stable haloalkane dehalogenases with enhanced catalytic properties. A comparison of crystallographic structures revealed that current stabilization rounds were not accompanied by the large backbone re-arrangements previously observed during the stability engineering of DhaA115. Stabilization was achieved by improving local contacts including protein-water interactions. Our study provides guidance for further improvement of automated structure-based computational tools for protein stabilization.
Affiliation(s)
- Antonin Kunka
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Brno 601 77, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital, Brno 601 77, Czech Republic
| | - Sérgio M. Marques
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Brno 601 77, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital, Brno 601 77, Czech Republic
| | - Martin Havlasek
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Brno 601 77, Czech Republic
| | - Michal Vasina
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Brno 601 77, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital, Brno 601 77, Czech Republic
| | - Nikola Velatova
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Brno 601 77, Czech Republic
| | - Lucia Cengelova
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Brno 601 77, Czech Republic
| | - David Kovar
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Brno 601 77, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital, Brno 601 77, Czech Republic
| | - Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Brno 601 77, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital, Brno 601 77, Czech Republic
| | - Martin Marek
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Brno 601 77, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital, Brno 601 77, Czech Republic
| | - David Bednar
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Brno 601 77, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital, Brno 601 77, Czech Republic
| | - Zbynek Prokop
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Brno 601 77, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital, Brno 601 77, Czech Republic
33
Xiao B, Zhang C, Zhou J, Wang S, Meng H, Wu M, Zheng Y, Yu R. Design of SC PEP with enhanced stability against pepsin digestion and increased activity by machine learning and structural parameters modeling. Int J Biol Macromol 2023; 250:125933. [PMID: 37482154 DOI: 10.1016/j.ijbiomac.2023.125933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Revised: 06/20/2023] [Accepted: 07/20/2023] [Indexed: 07/25/2023]
Abstract
Prolyl endopeptidase from Sphingomonas capsulata (SC PEP) has attracted much attention as a promising oral therapy candidate for celiac sprue; however, its low stability in the gastric environment leads to unsatisfactory clinical results. Therefore, improving its stability against pepsin digestion at low pH is crucial for clinical applications, but challenging. In this study, machine learning and a physical parameter model were combined to design SC PEP mutants. After iterations, 20 mutants had higher hydrolysis activity in the stomach environment, up to 14.1-fold that of wild-type SC PEP. Mutant M24, involving stable and active mutations, and pegylated M24 (M24-PEG) had higher activity in hydrolyzing the immunogens in bread than wild-type SC PEP in vitro and in vivo, and the residual immunogens in a simulated gastric environment were only 1/8 and 1/10 of those in the wild-type SC PEP group. The total residual immunogens in the gastrointestinal tract of mice in the M24 and M24-PEG groups were <20 ppm, reaching the standard of non-toxic food. Our results indicate that the combination of M24 (or M24-PEG) with EP-B2 may be a promising candidate for celiac disease, and the strategies developed in this study provide a paradigm for the design of SC PEP stability mutants.
Affiliation(s)
- Bin Xiao
- Department of Biopharmaceutics, West China School of Pharmacy, Sichuan University, Chengdu 610041, PR China; Key Laboratory of Drug-Targeting and Drug Delivery System of the Education Ministry, Sichuan Engineering Laboratory for Plant-Sourced Drug and Sichuan Research Center for Drug Precision Industrial Technology, West China School of Pharmacy Sichuan University, Chengdu 610041, PR China
- Chun Zhang
- Department of Biopharmaceutics, West China School of Pharmacy, Sichuan University, Chengdu 610041, PR China; Key Laboratory of Drug-Targeting and Drug Delivery System of the Education Ministry, Sichuan Engineering Laboratory for Plant-Sourced Drug and Sichuan Research Center for Drug Precision Industrial Technology, West China School of Pharmacy Sichuan University, Chengdu 610041, PR China
- Junxiu Zhou
- Department of Biopharmaceutics, West China School of Pharmacy, Sichuan University, Chengdu 610041, PR China; Key Laboratory of Drug-Targeting and Drug Delivery System of the Education Ministry, Sichuan Engineering Laboratory for Plant-Sourced Drug and Sichuan Research Center for Drug Precision Industrial Technology, West China School of Pharmacy Sichuan University, Chengdu 610041, PR China
- Sa Wang
- Department of Biopharmaceutics, West China School of Pharmacy, Sichuan University, Chengdu 610041, PR China; Key Laboratory of Drug-Targeting and Drug Delivery System of the Education Ministry, Sichuan Engineering Laboratory for Plant-Sourced Drug and Sichuan Research Center for Drug Precision Industrial Technology, West China School of Pharmacy Sichuan University, Chengdu 610041, PR China
- Huan Meng
- Department of Biopharmaceutics, West China School of Pharmacy, Sichuan University, Chengdu 610041, PR China; Key Laboratory of Drug-Targeting and Drug Delivery System of the Education Ministry, Sichuan Engineering Laboratory for Plant-Sourced Drug and Sichuan Research Center for Drug Precision Industrial Technology, West China School of Pharmacy Sichuan University, Chengdu 610041, PR China
- Miao Wu
- Department of Biopharmaceutics, West China School of Pharmacy, Sichuan University, Chengdu 610041, PR China; Key Laboratory of Drug-Targeting and Drug Delivery System of the Education Ministry, Sichuan Engineering Laboratory for Plant-Sourced Drug and Sichuan Research Center for Drug Precision Industrial Technology, West China School of Pharmacy Sichuan University, Chengdu 610041, PR China
- Yongxiang Zheng
- Department of Biopharmaceutics, West China School of Pharmacy, Sichuan University, Chengdu 610041, PR China; Key Laboratory of Drug-Targeting and Drug Delivery System of the Education Ministry, Sichuan Engineering Laboratory for Plant-Sourced Drug and Sichuan Research Center for Drug Precision Industrial Technology, West China School of Pharmacy Sichuan University, Chengdu 610041, PR China.
- Rong Yu
- Department of Biopharmaceutics, West China School of Pharmacy, Sichuan University, Chengdu 610041, PR China; Key Laboratory of Drug-Targeting and Drug Delivery System of the Education Ministry, Sichuan Engineering Laboratory for Plant-Sourced Drug and Sichuan Research Center for Drug Precision Industrial Technology, West China School of Pharmacy Sichuan University, Chengdu 610041, PR China.
34
Liu F, Wang T, Yang W, Zhang Y, Gong Y, Fan X, Wang G, Lu Z, Wang J. Current advances in the structural biology and molecular engineering of PETase. Front Bioeng Biotechnol 2023; 11:1263996. [PMID: 37795175 PMCID: PMC10546322 DOI: 10.3389/fbioe.2023.1263996] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Accepted: 08/31/2023] [Indexed: 10/06/2023]
Abstract
Poly(ethylene terephthalate) (PET) is a highly useful synthetic polyester plastic that is widely used in daily life. However, the increase in postconsumer PET as plastic waste that is recalcitrant to biodegradation in landfills and the natural environment has raised worldwide concern. Currently, traditional PET recycling processes with thermomechanical or chemical methods also result in the deterioration of the mechanical properties of PET. Therefore, it is urgent to develop more efficient and green strategies to address this problem. Recently, a novel mesophilic PET-degrading enzyme (IsPETase) from Ideonella sakaiensis was found to streamline PET biodegradation at 30°C, albeit with a lower PET-degrading activity than chitinase or chitinase-like PET-degrading enzymes. Consequently, the molecular engineering of more efficient PETases is still required for further industrial applications. This review details current knowledge on IsPETase, MHETase, and IsPETase-like hydrolases, including the structures, ligand‒protein interactions, and rational protein engineering for improved PET-degrading performance. In particular, applications of the engineered catalysts are highlighted, including metabolic engineering of the cell factories, enzyme immobilization or cell surface display. The information is expected to provide novel insights for the biodegradation of complex polymers.
Affiliation(s)
- Fei Liu
- School of Biological Science, Jining Medical University, Rizhao, China
- Tao Wang
- School of Biological Science, Jining Medical University, Rizhao, China
- Wentao Yang
- School of Biological Science, Jining Medical University, Rizhao, China
- Yingkang Zhang
- School of Biological Science, Jining Medical University, Rizhao, China
- Yuming Gong
- School of Biological Science, Jining Medical University, Rizhao, China
- Xinxin Fan
- School of Biological Science, Jining Medical University, Rizhao, China
- Guocheng Wang
- School of Biological Science, Jining Medical University, Rizhao, China
- Zhenhua Lu
- College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, Zhejiang, China
- Jianmin Wang
- School of Pharmacy, Jining Medical University, Rizhao, China
35
Blázquez‐Sánchez P, Vargas JA, Furtado AA, Griñen A, Leonardo DA, Sculaccio SA, Pereira HD, Sonnendecker C, Zimmermann W, Díez B, Garratt RC, Ramírez‐Sarmiento CA. Engineering the catalytic activity of an Antarctic PET-degrading enzyme by loop exchange. Protein Sci 2023; 32:e4757. [PMID: 37574805 PMCID: PMC10464292 DOI: 10.1002/pro.4757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 08/09/2023] [Accepted: 08/10/2023] [Indexed: 08/15/2023]
Abstract
Several hydrolases have been described to degrade polyethylene terephthalate (PET) at moderate temperatures ranging from 25°C to 40°C. These mesophilic PET hydrolases (PETases) are less efficient in degrading this plastic polymer than their thermophilic homologs and have, therefore, been the subject of many protein engineering campaigns. However, enhancing their enzymatic activity through rational design or directed evolution poses a formidable challenge due to the need for exploring a large number of mutations. Additionally, evaluating the improvements in both activity and stability requires screening numerous variants, either individually or using high-throughput screening methods. Here, we instead use the design of chimeras as a protein engineering strategy to increase the activity and stability of Mors1, an Antarctic PETase active at 25°C. First, we obtained the crystal structure of Mors1 at 1.6 Å resolution, which we used as a scaffold for structure- and sequence-based chimeric design. Then, we designed a Mors1 chimera via loop exchange of a highly divergent active site loop from the thermophilic leaf-branch compost cutinase (LCC) into the equivalent region in Mors1. After restitution of an active site disulfide bond into this chimera, the enzyme exhibited a shift in optimal temperature for activity to 45°C and a fivefold increase in PET hydrolysis when compared with wild-type Mors1 at 25°C. Our results serve as a proof of concept of the utility of chimeric design to further improve the activity and stability of PETases active at moderate temperatures.
Affiliation(s)
- Paula Blázquez‐Sánchez
- Institute for Biological and Medical Engineering, Schools of Engineering, Medicine and Biological Sciences, Pontificia Universidad Católica de Chile, Santiago, Chile
- ANID - Millennium Science Initiative Program, Millennium Institute for Integrative Biology (iBio), Santiago, Chile
- Institute of Analytical Chemistry, Leipzig University, Leipzig, Germany
- Jhon A. Vargas
- São Carlos Institute of Physics, University of São Paulo, São Carlos, Brazil
- Aransa Griñen
- Institute for Biological and Medical Engineering, Schools of Engineering, Medicine and Biological Sciences, Pontificia Universidad Católica de Chile, Santiago, Chile
- ANID - Millennium Science Initiative Program, Millennium Institute for Integrative Biology (iBio), Santiago, Chile
- Diego A. Leonardo
- São Carlos Institute of Physics, University of São Paulo, São Carlos, Brazil
- Beatriz Díez
- Department of Molecular Genetics and Microbiology, School of Biological Sciences, Pontificia Universidad Católica de Chile, Santiago, Chile
- Center for Climate and Resilience Research (CR), Santiago, Chile
- Millennium Institute Center for Genome Regulation (CGR), Santiago, Chile
- César A. Ramírez‐Sarmiento
- Institute for Biological and Medical Engineering, Schools of Engineering, Medicine and Biological Sciences, Pontificia Universidad Católica de Chile, Santiago, Chile
- ANID - Millennium Science Initiative Program, Millennium Institute for Integrative Biology (iBio), Santiago, Chile
36
Kulikova AV, Diaz DJ, Chen T, Cole TJ, Ellington AD, Wilke CO. Two sequence- and two structure-based ML models have learned different aspects of protein biochemistry. Sci Rep 2023; 13:13280. [PMID: 37587128 PMCID: PMC10432456 DOI: 10.1038/s41598-023-40247-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Accepted: 08/07/2023] [Indexed: 08/18/2023]
Abstract
Deep learning models are seeing increased use as methods to predict mutational effects or allowed mutations in proteins. The models commonly used for these purposes include large language models (LLMs) and 3D Convolutional Neural Networks (CNNs). These two model types have very different architectures and are commonly trained on different representations of proteins. LLMs make use of the transformer architecture and are trained purely on protein sequences whereas 3D CNNs are trained on voxelized representations of local protein structure. While comparable overall prediction accuracies have been reported for both types of models, it is not known to what extent these models make comparable specific predictions and/or generalize protein biochemistry in similar ways. Here, we perform a systematic comparison of two LLMs and two structure-based models (CNNs) and show that the different model types have distinct strengths and weaknesses. The overall prediction accuracies are largely uncorrelated between the sequence- and structure-based models. Overall, the two structure-based models are better at predicting buried aliphatic and hydrophobic residues whereas the two LLMs are better at predicting solvent-exposed polar and charged amino acids. Finally, we find that a combined model that takes the individual model predictions as input can leverage these individual model strengths and results in significantly improved overall prediction accuracy.
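The idea of a combined model that takes the individual model predictions as input can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `combine_predictions`, the toy probabilities, and the plain averaging rule (standing in for a trained combiner) are all assumptions made for illustration.

```python
def combine_predictions(per_model_probs):
    """Stack per-model amino-acid probabilities and form a simple ensemble.

    per_model_probs: list of dicts mapping amino-acid letter -> probability,
    one dict per model, all describing the same residue position.
    """
    # Fixed, sorted alphabet so every model contributes features in the same order.
    letters = sorted(set().union(*per_model_probs))
    # Concatenated feature vector: what a trained combiner would receive as input.
    features = [p.get(a, 0.0) for p in per_model_probs for a in letters]
    # Assumed stand-in for the trained combiner: an unweighted average.
    n_models = len(per_model_probs)
    averaged = {a: sum(p.get(a, 0.0) for p in per_model_probs) / n_models
                for a in letters}
    best = max(letters, key=lambda a: averaged[a])
    return features, averaged, best


# Hypothetical outputs of one sequence-based and one structure-based model.
seq_model = {"A": 0.6, "L": 0.3, "K": 0.1}
struct_model = {"A": 0.2, "L": 0.7, "K": 0.1}
features, averaged, best = combine_predictions([seq_model, struct_model])
```

In practice the `features` vector would be fed to a separately trained model rather than averaged; the average is only a baseline that makes the sketch self-contained.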
Affiliation(s)
- Anastasiya V Kulikova
- Department of Integrative Biology, University of Texas at Austin, Austin, TX, USA
- The Department of Molecular Biosciences, Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, TX, USA
- Daniel J Diaz
- Department of Chemistry, The University of Texas at Austin, Austin, TX, USA
- The Department of Molecular Biosciences, Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, TX, USA
- Institute for Foundations of Machine Learning (IFML), The University of Texas at Austin, Austin, TX, USA
- Tianlong Chen
- Institute for Foundations of Machine Learning (IFML), The University of Texas at Austin, Austin, TX, USA
- Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX, USA
- T Jeffrey Cole
- Department of Integrative Biology, University of Texas at Austin, Austin, TX, USA
- Andrew D Ellington
- The Department of Molecular Biosciences, Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, TX, USA
- Claus O Wilke
- Department of Integrative Biology, University of Texas at Austin, Austin, TX, USA.
37
Kulikova AV, Diaz DJ, Chen T, Jeffrey Cole T, Ellington AD, Wilke CO. Two sequence- and two structure-based ML models have learned different aspects of protein biochemistry. bioRxiv 2023:2023.03.20.533508. [PMID: 36993648 PMCID: PMC10055221 DOI: 10.1101/2023.03.20.533508] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/31/2023]
Abstract
Deep learning models are seeing increased use as methods to predict mutational effects or allowed mutations in proteins. The models commonly used for these purposes include large language models (LLMs) and 3D Convolutional Neural Networks (CNNs). These two model types have very different architectures and are commonly trained on different representations of proteins. LLMs make use of the transformer architecture and are trained purely on protein sequences whereas 3D CNNs are trained on voxelized representations of local protein structure. While comparable overall prediction accuracies have been reported for both types of models, it is not known to what extent these models make comparable specific predictions and/or generalize protein biochemistry in similar ways. Here, we perform a systematic comparison of two LLMs and two structure-based models (CNNs) and show that the different model types have distinct strengths and weaknesses. The overall prediction accuracies are largely uncorrelated between the sequence- and structure-based models. Overall, the two structure-based models are better at predicting buried aliphatic and hydrophobic residues whereas the two LLMs are better at predicting solvent-exposed polar and charged amino acids. Finally, we find that a combined model that takes the individual model predictions as input can leverage these individual model strengths and results in significantly improved overall prediction accuracy.
Affiliation(s)
- Anastasiya V. Kulikova
- Department of Integrative Biology, University of Texas at Austin, Austin, Texas, USA
- Center for Systems and Synthetic Biology, The Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX, USA
- Daniel J. Diaz
- Department of Chemistry, The University of Texas at Austin, Austin, TX, USA
- Center for Systems and Synthetic Biology, The Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX, USA
- Institute for Foundations of Machine Learning (IFML), The University of Texas at Austin, Austin, TX, USA
- Tianlong Chen
- Institute for Foundations of Machine Learning (IFML), The University of Texas at Austin, Austin, TX, USA
- Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX, USA
- T. Jeffrey Cole
- Department of Integrative Biology, University of Texas at Austin, Austin, Texas, USA
- Andrew D. Ellington
- Center for Systems and Synthetic Biology, The Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX, USA
- Claus O. Wilke
- Department of Integrative Biology, University of Texas at Austin, Austin, Texas, USA
38
Ramakrishnan G, Baakman C, Heijl S, Vroling B, van Horck R, Hiraki J, Xue LC, Huynen MA. Understanding structure-guided variant effect predictions using 3D convolutional neural networks. Front Mol Biosci 2023; 10:1204157. [PMID: 37475887 PMCID: PMC10354367 DOI: 10.3389/fmolb.2023.1204157] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 04/11/2023] [Accepted: 06/22/2023] [Indexed: 07/22/2023]
Abstract
Predicting the pathogenicity of missense variants in molecular diagnostics remains a challenge despite the available wealth of data, such as evolutionary information, and the wealth of tools to integrate that data. We describe DeepRank-Mut, a configurable framework designed to extract and learn from physicochemically relevant features of amino acids surrounding missense variants in 3D space. For each variant, various atomic and residue-level features are extracted from its structural environment, including sequence conservation scores of the surrounding amino acids, and stored in multi-channel 3D voxel grids which are then used to train a 3D convolutional neural network (3D-CNN). The resultant model gives a probabilistic estimate of whether a given input variant is disease-causing or benign. We find that the performance of our 3D-CNN model, on independent test datasets, is comparable to other widely used resources which also combine sequence and structural features. Based on the 10-fold cross-validation experiments, we achieve an average accuracy of 0.77 on the independent test datasets. We discuss the contribution of the variant neighborhood to the model's predictive power, in addition to the impact of individual features on the model's performance. Two key features, the evolutionary information of residues in the variant neighborhood and their solvent accessibilities, were observed to influence the predictions. We also highlight how predictions are impacted by the underlying disease mechanisms of missense mutations and offer insights into understanding these to improve pathogenicity predictions. Our study presents aspects to take into consideration when adopting deep learning approaches for protein structure-guided pathogenicity predictions.
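The step of storing features in multi-channel 3D voxel grids can be sketched as follows. This is a minimal illustration, not DeepRank-Mut's code: the function name `voxelize`, the box size, the resolution, the two-channel layout, and the toy coordinates are assumptions, and the sketch counts atoms per voxel rather than storing physicochemical feature values.

```python
from math import floor


def voxelize(atoms, center, box_size=8.0, resolution=2.0, n_channels=2):
    """Bin atoms into a multi-channel cubic grid centred on `center`.

    atoms: iterable of (x, y, z, channel) tuples, coordinates in angstroms.
    Returns a nested list indexed as grid[channel][i][j][k] holding atom counts.
    """
    n = int(round(box_size / resolution))  # voxels per edge
    half = box_size / 2.0
    grid = [[[[0.0] * n for _ in range(n)] for _ in range(n)]
            for _ in range(n_channels)]
    for x, y, z, channel in atoms:
        # Shift into box coordinates, then convert to integer voxel indices.
        idx = [floor((coord - c + half) / resolution)
               for coord, c in zip((x, y, z), center)]
        # Atoms outside the box around the variant site are ignored.
        if all(0 <= i < n for i in idx):
            grid[channel][idx[0]][idx[1]][idx[2]] += 1.0
    return grid


# Hypothetical environment: two atoms inside the box, one far outside it.
atoms = [(0.5, 0.5, 0.5, 0), (-3.9, 0.0, 0.0, 1), (10.0, 0.0, 0.0, 0)]
grid = voxelize(atoms, center=(0.0, 0.0, 0.0))
```

A grid of this shape (channels × depth × height × width) is the usual input layout for a 3D convolutional network; real pipelines typically smooth each atom over neighbouring voxels instead of using hard counts.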
Affiliation(s)
- Gayatri Ramakrishnan
- Department of Medical Biosciences, Radboud University Medical Center, Nijmegen, Netherlands
- Coos Baakman
- Department of Medical Biosciences, Radboud University Medical Center, Nijmegen, Netherlands
- Li C. Xue
- Department of Medical Biosciences, Radboud University Medical Center, Nijmegen, Netherlands
- Martijn A. Huynen
- Department of Medical Biosciences, Radboud University Medical Center, Nijmegen, Netherlands
39
Dürr SL, Levy A, Rothlisberger U. Metal3D: a general deep learning framework for accurate metal ion location prediction in proteins. Nat Commun 2023; 14:2713. [PMID: 37169763 PMCID: PMC10175565 DOI: 10.1038/s41467-023-37870-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/30/2022] [Accepted: 03/29/2023] [Indexed: 05/13/2023]
Abstract
Metal ions are essential cofactors for many proteins and play a crucial role in many applications such as enzyme design or design of protein-protein interactions because they are biologically abundant, tether to the protein using strong interactions, and have favorable catalytic properties. Computational design of metalloproteins is, however, hampered by the complex electronic structure of many biologically relevant metals such as zinc. In this work, we develop two tools, Metal3D (based on 3D convolutional neural networks) and Metal1D (solely based on geometric criteria), to improve the location prediction of zinc ions in protein structures. Comparison with other currently available tools shows that Metal3D is the most accurate zinc ion location predictor to date, with predictions within 0.70 ± 0.64 Å of experimental locations. Metal3D outputs a confidence metric for each predicted site and works on proteins with few homologs in the Protein Data Bank. Metal3D predicts a global zinc density that can be used for annotation of computationally predicted structures and a per-residue zinc density that can be used in protein design workflows. Currently trained on zinc, the framework of Metal3D is readily extensible to other metals by modifying the training data.
Affiliation(s)
- Simon L Dürr
- Laboratory of Computational Chemistry and Biochemistry, Institute of Chemical Sciences and Engineering, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
- Andrea Levy
- Laboratory of Computational Chemistry and Biochemistry, Institute of Chemical Sciences and Engineering, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
- Ursula Rothlisberger
- Laboratory of Computational Chemistry and Biochemistry, Institute of Chemical Sciences and Engineering, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland.
40
Xu G, Wang Q, Ma J. OPUS-Mut: Studying the Effect of Protein Mutation through Side-Chain Modeling. J Chem Theory Comput 2023; 19:1629-1640. [PMID: 36813264 PMCID: PMC10018731 DOI: 10.1021/acs.jctc.2c00847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/24/2023]
Abstract
Predicting the effect of protein mutation is crucial in many applications such as protein design, protein evolution, and genetic disease analysis. Structurally, a mutation is essentially the replacement of the side chain of a particular residue. Therefore, accurate side-chain modeling is useful in studying the effect of mutation. Here, we propose a computational method, namely, OPUS-Mut, which significantly outperforms other backbone-dependent side-chain modeling methods including our previous method OPUS-Rota4. We evaluate OPUS-Mut in four case studies on myoglobin, p53, HIV-1 protease, and T4 lysozyme. The results show that the predicted side-chain structures of different mutants agree well with the experimentally determined results. In addition, when the residues with significant structural shifts upon mutation are considered, the extent of the predicted structural shift of these affected residues correlates reasonably well with the functional changes of the mutant measured by experiments. OPUS-Mut can also help identify harmful and benign mutations and thus may guide the construction of a protein with relatively low sequence homology but a similar structure.
Affiliation(s)
- Gang Xu
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China; Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China; Shanghai AI Laboratory, Shanghai 200030, China
- Qinghua Wang
- Center for Biomolecular Innovation, Harcam Biomedicines, Shanghai 200131, China
- Jianpeng Ma
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China; Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China; Shanghai AI Laboratory, Shanghai 200030, China
41
Li M, Kang L, Xiong Y, Wang YG, Fan G, Tan P, Hong L. SESNet: sequence-structure feature-integrated deep learning method for data-efficient protein engineering. J Cheminform 2023; 15:12. [PMID: 36737798 PMCID: PMC9898993 DOI: 10.1186/s13321-023-00688-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Accepted: 01/23/2023] [Indexed: 02/05/2023]
Abstract
Deep learning has been widely used for protein engineering. However, it is limited by the lack of sufficient experimental data to train an accurate model for predicting the functional fitness of high-order mutants. Here, we develop SESNet, a supervised deep-learning model that predicts the fitness of protein mutants by leveraging both sequence and structure information and exploiting an attention mechanism. Our model integrates the local evolutionary context from homologous sequences, the global evolutionary context encoding rich semantics from the universal protein sequence space, and the structure information accounting for the microenvironment around each residue in a protein. We show that SESNet outperforms state-of-the-art models for predicting the sequence-function relationship on 26 deep mutational scanning datasets. More importantly, we propose a data augmentation strategy that leverages data from unsupervised models to pre-train our model. After that, our model can achieve strikingly high accuracy in predicting the fitness of protein mutants, especially for higher-order variants (> 4 mutation sites), when fine-tuned using only a small number of experimental mutation data (< 50). The proposed strategy is of great practical value, as the required experimental effort, i.e., producing a few tens of experimental mutation data on a given protein, is generally affordable for an ordinary biochemical group and can be applied to almost any protein.
Affiliation(s)
- Mingchen Li
- Shanghai National Center for Applied Mathematics (SJTU Center), & Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, 200240, China
- School of Information Science and Engineering, East China University of Science and Technology, Shanghai, 200240, China
- Liqi Kang
- Shanghai National Center for Applied Mathematics (SJTU Center), & Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, 200240, China
- School of Physics and Astronomy & School of Pharmacy, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yi Xiong
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yu Guang Wang
- Shanghai National Center for Applied Mathematics (SJTU Center), & Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, 200240, China
- Shanghai Artificial Intelligence Laboratory, Shanghai, 200240, China
- Guisheng Fan
- School of Information Science and Engineering, East China University of Science and Technology, Shanghai, 200240, China
- Pan Tan
- Shanghai National Center for Applied Mathematics (SJTU Center), & Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, 200240, China.
- Shanghai Artificial Intelligence Laboratory, Shanghai, 200240, China.
- Liang Hong
- Shanghai National Center for Applied Mathematics (SJTU Center), & Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, 200240, China.
- Shanghai Artificial Intelligence Laboratory, Shanghai, 200240, China.
- School of Physics and Astronomy & School of Pharmacy, Shanghai Jiao Tong University, Shanghai, 200240, China.
42
Diaz DJ, Kulikova AV, Ellington AD, Wilke CO. Using machine learning to predict the effects and consequences of mutations in proteins. Curr Opin Struct Biol 2023; 78:102518. [PMID: 36603229 PMCID: PMC9908841 DOI: 10.1016/j.sbi.2022.102518] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 08/09/2022] [Revised: 11/07/2022] [Accepted: 11/20/2022] [Indexed: 01/05/2023]
Abstract
Machine and deep learning approaches can leverage the increasingly available massive datasets of protein sequences, structures, and mutational effects to predict variants with improved fitness. Many different approaches are being developed, but systematic benchmarking studies indicate that even though the specifics of the machine learning algorithms matter, the more important constraint comes from the data availability and quality utilized during training. In cases where little experimental data are available, unsupervised and self-supervised pre-training with generic protein datasets can still perform well after subsequent refinement via hybrid or transfer learning approaches. Overall, recent progress in this field has been staggering, and machine learning approaches will likely play a major role in future breakthroughs in protein biochemistry and engineering.
Affiliation(s)
- Daniel J Diaz
- Department of Chemistry, The University of Texas at Austin, 105 E 24TH St., Austin, 78712, Texas, USA; Department of Molecular Biosciences, The University of Texas at Austin, 100 East 24th St., Stop A5000, Austin, 78712, Texas, USA. https://twitter.com/aiproteins
| | - Anastasiya V Kulikova
- Department of Integrative Biology, The University of Texas at Austin, 2415 Speedway, Stop C0930, Austin, 78712, Texas, USA
| | - Andrew D Ellington
- Department of Molecular Biosciences, The University of Texas at Austin, 100 East 24th St., Stop A5000, Austin, 78712, Texas, USA. https://twitter.com/CSSBatUT
| | - Claus O Wilke
- Department of Integrative Biology, The University of Texas at Austin, 2415 Speedway, Stop C0930, Austin, 78712, Texas, USA.
| |
|
43
|
Jiang Y, Ran X, Yang ZJ. Data-driven enzyme engineering to identify function-enhancing enzymes. Protein Eng Des Sel 2023; 36:gzac009. [PMID: 36214500 PMCID: PMC10365845 DOI: 10.1093/protein/gzac009] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 03/31/2022] [Revised: 08/08/2022] [Accepted: 09/28/2022] [Indexed: 01/22/2023] Open
Abstract
Identifying function-enhancing enzyme variants is a 'holy grail' challenge in protein science because it will allow researchers to expand the biocatalytic toolbox for late-stage functionalization of drug-like molecules, environmental degradation of plastics and other pollutants, and medical treatment of food allergies. Data-driven strategies, including statistical modeling, machine learning, and deep learning, have largely advanced the understanding of the sequence-structure-function relationships for enzymes. They have also enhanced the capability of predicting and designing new enzymes and enzyme variants for catalyzing the transformation of new-to-nature reactions. Here, we reviewed the recent progress of data-driven models that were applied in identifying efficiency-enhancing mutants for catalytic reactions. We also discussed existing challenges and obstacles faced by the community. Although the review is by no means comprehensive, we hope that the discussion can inform the readers about the state-of-the-art in data-driven enzyme engineering, inspiring more joint experimental-computational efforts to develop and apply data-driven modeling to innovate biocatalysts for synthetic and pharmaceutical applications.
Affiliation(s)
- Yaoyukun Jiang
- Department of Chemistry, Vanderbilt University, Nashville, TN 37235, USA
| | - Xinchun Ran
- Department of Chemistry, Vanderbilt University, Nashville, TN 37235, USA
| | - Zhongyue J Yang
- Department of Chemistry, Vanderbilt University, Nashville, TN 37235, USA
- Center for Structural Biology, Vanderbilt University, Nashville, TN 37235, USA
- Vanderbilt Institute of Chemical Biology, Vanderbilt University, Nashville, TN 37235, USA
- Data Science Institute, Vanderbilt University, Nashville, TN 37235, USA
- Department of Chemical and Biomolecular Engineering, Vanderbilt University, Nashville, TN 37235, USA
| |
|
44
|
Hu R, Fu L, Chen Y, Chen J, Qiao Y, Si T. Protein engineering via Bayesian optimization-guided evolutionary algorithm and robotic experiments. Brief Bioinform 2023; 24:6958505. [PMID: 36562723 DOI: 10.1093/bib/bbac570] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 09/30/2022] [Revised: 11/14/2022] [Accepted: 11/22/2022] [Indexed: 12/24/2022] Open
Abstract
Directed protein evolution applies repeated rounds of genetic mutagenesis and phenotypic screening and is often limited by experimental throughput. Through in silico prioritization of mutant sequences, machine learning has been applied to reduce wet lab burden to a level practical for human researchers. On the other hand, robotics permits large batches and rapid iterations for protein engineering cycles, but such capacities have not been well exploited in existing machine learning-assisted directed evolution approaches. Here, we report a scalable and batched method, Bayesian Optimization-guided EVOlutionary (BO-EVO) algorithm, to guide multiple rounds of robotic experiments to explore protein fitness landscapes of combinatorial mutagenesis libraries. We first examined various design specifications based on an empirical landscape of protein G domain B1. Then, BO-EVO was successfully generalized to another empirical landscape of an Escherichia coli kinase PhoQ, as well as simulated NK landscapes with up to moderate epistasis. This approach was then applied to guide robotic library creation and screening to engineer enzyme specificity of RhlA, a key biosynthetic enzyme for rhamnolipid biosurfactants. A 4.8-fold improvement in producing a target rhamnolipid congener was achieved after examining less than 1% of all possible mutants after four iterations. Overall, BO-EVO proves to be an efficient and general approach to guide combinatorial protein engineering without prior knowledge.
Affiliation(s)
- Ruyun Hu
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Lihao Fu
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; CAS Key Laboratory for Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen 518055, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yongcan Chen
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; CAS Key Laboratory for Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen 518055, China
| | - Junyu Chen
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Yu Qiao
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Tong Si
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; CAS Key Laboratory for Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen 518055, China; University of Chinese Academy of Sciences, Beijing 100049, China
| |
|
45
|
Paik I, Ngo PHT, Shroff R, Diaz DJ, Maranhao AC, Walker DJ, Bhadra S, Ellington AD. Improved Bst DNA Polymerase Variants Derived via a Machine Learning Approach. Biochemistry 2023; 62:410-418. [PMID: 34762799 PMCID: PMC9514386 DOI: 10.1021/acs.biochem.1c00451] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Indexed: 02/02/2023]
Abstract
The DNA polymerase I from Geobacillus stearothermophilus (also known as Bst DNAP) is widely used in isothermal amplification reactions, where its strand displacement ability is prized. More robust versions of this enzyme should be enabled for diagnostic applications, especially for carrying out higher temperature reactions that might proceed more quickly. To this end, we appended a short fusion domain from the actin-binding protein villin that improved both stability and purification of the enzyme. In parallel, we have developed a machine learning algorithm that assesses the relative fit of individual amino acids to their chemical microenvironments at any position in a protein and applied this algorithm to predict sequence substitutions in Bst DNAP. The top predicted variants had greatly improved thermotolerance (heating prior to assay), and upon combination, the mutations showed additive thermostability, with denaturation temperatures up to 2.5 °C higher than the parental enzyme. The increased thermostability of the enzyme allowed faster loop-mediated isothermal amplification assays to be carried out at 73 °C, where both Bst DNAP and its improved commercial counterpart Bst 2.0 are inactivated. Overall, this is one of the first examples of the application of machine learning approaches to the thermostabilization of an enzyme.
Affiliation(s)
- Inyup Paik
- Department of Molecular Biosciences, College of Natural Sciences, the University of Texas at Austin, Austin, Texas 78712, United States; Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, Texas 78712, United States
| | - Phuoc H. T. Ngo
- Department of Molecular Biosciences, College of Natural Sciences, the University of Texas at Austin, Austin, Texas 78712, United States; Center for Systems and Synthetic Biology and Department of Chemistry, College of Natural Sciences, The University of Texas at Austin, Austin, Texas 78712, United States
| | - Raghav Shroff
- Department of Molecular Biosciences, College of Natural Sciences, the University of Texas at Austin, Austin, Texas 78712, United States; Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, Texas 78712, United States; CCDC Army Research Lab-South, Austin, Texas 78712, United States
| | - Daniel J. Diaz
- Center for Systems and Synthetic Biology and Department of Chemistry, College of Natural Sciences, The University of Texas at Austin, Austin, Texas 78712, United States
| | - Andre C. Maranhao
- Department of Molecular Biosciences, College of Natural Sciences, the University of Texas at Austin, Austin, Texas 78712, United States; Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, Texas 78712, United States
| | - David J.F. Walker
- Department of Molecular Biosciences, College of Natural Sciences, the University of Texas at Austin, Austin, Texas 78712, United States; Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, Texas 78712, United States
| | - Sanchita Bhadra
- Department of Molecular Biosciences, College of Natural Sciences, the University of Texas at Austin, Austin, Texas 78712, United States; Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, Texas 78712, United States
| | - Andrew D. Ellington
- Department of Molecular Biosciences, College of Natural Sciences, the University of Texas at Austin, Austin, Texas 78712, United States; Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, Texas 78712, United States
| |
|
46
|
Kosonocky CW, Ellington AD. Evolving to Evolve, Dan Tawfik's Insights into Protein Engineering. Biochemistry 2023; 62:145-147. [PMID: 36647679 DOI: 10.1021/acs.biochem.2c00668] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2023]
|
47
|
Neves RP, Ramos MJ, Fernandes PA. Engineering DszC Mutants from Transition State Macrodipole Considerations and Evolutionary Sequence Analysis. J Chem Inf Model 2023; 63:20-26. [PMID: 36534708 PMCID: PMC9832474 DOI: 10.1021/acs.jcim.2c01337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
Abstract
We describe an approach to identify enzyme mutants with increased turnover using the enzyme DszC as a case study. Our approach is based on recalculating the barriers of alanine mutants through single-point energy calculations at the hybrid QM/MM level in the wild-type reactant and transition state geometries. We analyze the difference in the electron density between the reactant and transition state to identify sites/residues where electrostatic interactions stabilize the transition state over the reactants. We also assess the insertion of a unit probe charge to identify positions in which the introduction of charged residues lowers the barrier.
|
48
|
Sieow BFL, De Sotto R, Seet ZRD, Hwang IY, Chang MW. Synthetic Biology Meets Machine Learning. Methods Mol Biol 2023; 2553:21-39. [PMID: 36227537 DOI: 10.1007/978-1-0716-2617-7_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
This chapter outlines the myriad applications of machine learning (ML) in synthetic biology, specifically in engineering cell and protein activity, and metabolic pathways. Though by no means comprehensive, the chapter highlights several prominent computational tools applied in the field and their potential use cases. The examples detailed reinforce how ML algorithms can enhance synthetic biology research by providing data-driven insights into the behavior of living systems, even without detailed knowledge of their underlying mechanisms. By doing so, ML promises to increase the efficiency of research projects by modeling hypotheses in silico that can then be tested through experiments. While challenges related to training dataset generation and computational costs remain, ongoing improvements in ML tools are paving the way for smarter and more streamlined synthetic biology workflows that can be readily employed to address grand challenges across manufacturing, medicine, engineering, agriculture, and beyond.
Affiliation(s)
- Brendan Fu-Long Sieow
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore, Singapore
- Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- NUS Graduate School for Integrative Sciences and Engineering Programme, National University of Singapore, Singapore, Singapore
| | - Ryan De Sotto
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore, Singapore
- Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Zhi Ren Darren Seet
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore, Singapore
- Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - In Young Hwang
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore, Singapore
- Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Matthew Wook Chang
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore, Singapore.
- Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
| |
|
49
|
Sugiki S, Niide T, Toya Y, Shimizu H. Logistic Regression-Guided Identification of Cofactor Specificity-Contributing Residues in Enzyme with Sequence Datasets Partitioned by Catalytic Properties. ACS Synth Biol 2022; 11:3973-3985. [PMID: 36321539 PMCID: PMC9764414 DOI: 10.1021/acssynbio.2c00315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
Abstract
Changing the substrate/cofactor specificity of an enzyme requires multiple mutations at spatially adjacent positions around the substrate pocket. However, this is challenging when solely based on crystal structure information because enzymes undergo dynamic conformational changes during the reaction process. Herein, we proposed a method for estimating the contribution of each amino acid residue to substrate specificity by deploying a phylogenetic analysis with logistic regression. Since this method can estimate the candidate amino acids for mutation by ranking, it is readable and can be used in protein engineering. We demonstrated our concept using redox cofactor conversion of the Escherichia coli malic enzyme as a model, which still lacks crystal structure elucidation. The use of logistic regression with amino acid sequences classified by cofactor specificity showed that the NADP+-dependent malic enzyme completely switched cofactor specificity to NAD+ dependence without the need for a practical screening step. The model showed that surrounding residues made a greater contribution to cofactor specificity than those in the interior of the substrate pocket. These residues might be difficult to identify from crystal structure observations. We show that a highly accurate and inferential machine learning model was obtained using amino acid sequences of structurally homologous and functionally distinct enzymes as input data.
|
50
|
Wu J, Liu Z, Yang X, Lin Z. Improved compound-protein interaction site and binding affinity prediction using self-supervised protein embeddings. BMC Bioinformatics 2022; 23:543. [PMID: 36526969 PMCID: PMC9756525 DOI: 10.1186/s12859-022-05107-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Accepted: 12/09/2022] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND Compound-protein interaction site and binding affinity predictions are crucial for drug discovery and drug design. In recent years, many deep learning-based methods have been proposed for predictions related to compound-protein interaction. For protein inputs, how protein primary sequence and tertiary structure information is used has an impact on prediction results. RESULTS In this study, we propose a deep learning model based on a multi-objective neural network for compound-protein interaction site and binding affinity prediction. We used several kinds of self-supervised protein embeddings to enrich our protein inputs and used convolutional neural networks to extract features from them. Our results demonstrate that our model had improvements in terms of interaction site prediction and affinity prediction compared to previous models. In a case study, our model could better predict binding sites, which also showed its effectiveness. CONCLUSION These results suggest that our model could be a helpful tool for compound-protein related predictions.
Affiliation(s)
- Jialin Wu
- School of Biology and Biological Engineering, South China University of Technology, 382 East Outer Loop Road, University Park, Guangzhou, 510006 Guangdong, China
| | - Zhe Liu
- School of Biology and Biological Engineering, South China University of Technology, 382 East Outer Loop Road, University Park, Guangzhou, 510006 Guangdong, China
| | - Xiaofeng Yang
- School of Biology and Biological Engineering, South China University of Technology, 382 East Outer Loop Road, University Park, Guangzhou, 510006 Guangdong, China
| | - Zhanglin Lin
- School of Biology and Biological Engineering, South China University of Technology, 382 East Outer Loop Road, University Park, Guangzhou, 510006 Guangdong, China
| |
|