1
Liu Y, He Z, Jia L, Xue Y, Du Y, Tan H, Zhang X, Ji Y, Tong Y, Xu H, Liu L. Predicting Natural Evolution in the RBD Region of the Spike Glycoprotein of SARS-CoV-2 by Machine Learning. Viruses 2024; 16:477. [PMID: 38543841 PMCID: PMC10974066 DOI: 10.3390/v16030477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 03/12/2024] [Accepted: 03/18/2024] [Indexed: 05/23/2024]
Abstract
Machine learning (ML) is a key focus in predicting protein mutations and aiding directed evolution. Research on potential virus variants is crucial for vaccine development. In this study, the machine learning software PyPEF was employed to conduct mutation analysis within the receptor-binding domain (RBD) of the Spike glycoprotein of SARS-CoV-2. Over 48,960,000 variants were predicted. Eight prospective variants that could surface in the future underwent modeling and molecular dynamics simulations. The study forecasts that the latest variant, ISOY2P5O1, may potentially emerge around 17 November 2023, with an approximate window of uncertainty of ±22 days. The ISOY8P5O2 variant displayed an increased binding capacity in the dry assay, with a total predicted binding energy of -110.306 kcal/mol. This represents an 8.25% enhancement in total binding energy compared to the original SARS-CoV-2 strain discovered in Wuhan (-101.892 kcal/mol). Reverse research confirmed the structural significance of mutation sites using ML models, particularly in the context of protein folding. The study validated regression methods (SVR, RF, and PLS) with different data structures. This study investigates the effectiveness of the "ML-Guided Design Correctly Predicts Combinatorial Effects Strategy" compared to the "ML-Guided Design Correctly Predicts Natural Evolution Prediction Strategy". To enhance machine learning, we created a timestamping algorithm and two auxiliary programs using advanced techniques to rapidly process extensive data, surpassing batch sequencing capabilities. This study not only advances machine learning in guiding protein evolution but also holds potential for forecasting future viruses and vaccine development.
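As a quick check on the arithmetic in the abstract above, the relative change in predicted total binding energy can be recomputed from the two reported values. The symbols E_variant and E_Wuhan are labels introduced here for illustration only.

```latex
\[
\frac{|E_{\text{variant}}| - |E_{\text{Wuhan}}|}{|E_{\text{Wuhan}}|}
= \frac{110.306 - 101.892}{101.892}
= \frac{8.414}{101.892}
\approx 0.0826
\]
```

That is roughly 8.3%, which agrees with the reported 8.25% up to rounding.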
Affiliation(s)
- Yiheng Liu
  - College of Life Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Zitong He
  - College of International Education, Beijing University of Chemical Technology, Beijing 100029, China
- Liyiyang Jia
  - College of International Education, Beijing University of Chemical Technology, Beijing 100029, China
- Yiwei Xue
  - College of International Education, Beijing University of Chemical Technology, Beijing 100029, China
- Yuxuan Du
  - College of International Education, Beijing University of Chemical Technology, Beijing 100029, China
- Huiwen Tan
  - College of International Education, Beijing University of Chemical Technology, Beijing 100029, China
- Xianzhi Zhang
  - College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Yu Ji
  - College of Life Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Yigang Tong
  - College of Life Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Haijun Xu
  - College of Mathematics and Physics, Beijing University of Chemical Technology, Beijing 100029, China
- Luo Liu
  - College of Life Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
2
Hemmer S, Siedhoff NE, Werner S, Ölçücü G, Schwaneberg U, Jaeger KE, Davari MD, Krauss U. Machine Learning-Assisted Engineering of Light, Oxygen, Voltage Photoreceptor Adduct Lifetime. JACS AU 2023; 3:3311-3323. [PMID: 38155650 PMCID: PMC10751770 DOI: 10.1021/jacsau.3c00440] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Revised: 11/07/2023] [Accepted: 11/07/2023] [Indexed: 12/30/2023]
Abstract
Naturally occurring and engineered flavin-binding, blue-light-sensing, light, oxygen, voltage (LOV) photoreceptor domains have been used widely to design fluorescent reporters, optogenetic tools, and photosensitizers for the visualization and control of biological processes. In addition, natural LOV photoreceptors with engineered properties were recently employed for optimizing plant biomass production in the framework of a plant-based bioeconomy. Here, the understanding and fine-tuning of LOV photoreceptor (kinetic) properties is instrumental for application. In response to blue-light illumination, LOV domains undergo a cascade of photophysical and photochemical events that yield a transient covalent FMN-cysteine adduct, allowing for signaling. The rate-limiting step of the LOV photocycle is the dark-recovery process, which involves adduct scission and can take between seconds and days. Rational engineering of LOV domains with fine-tuned dark recovery has been challenging due to the lack of a mechanistic model, the long time scale of the process, which hampers atomistic simulations, and a gigantic protein sequence space covering known mutations (combinatorial challenge). To address these issues, we used machine learning (ML) trained on scarce literature data and iteratively generated and implemented experimental data to design LOV variants with faster and slower dark recovery. Over the three prediction-validation cycles, LOV domain variants were successfully predicted, whose adduct-state lifetimes spanned 7 orders of magnitude, yielding optimized tools for synthetic (opto)biology. In summary, our results demonstrate ML as a viable method to guide the design of proteins even with limited experimental data and when no mechanistic model of the underlying physical principles is available.
Affiliation(s)
- Stefanie Hemmer
  - Institute of Molecular Enzyme Technology, Heinrich Heine University Düsseldorf, Wilhelm Johnen Strasse, Jülich 52426, Germany
- Niklas Erik Siedhoff
  - Institute of Biotechnology, RWTH Aachen University, Worringer Weg 3, 52074 Aachen, Germany
  - DWI-Leibniz Institute for Interactive Materials, Forckenbeckstraße 50, 52074 Aachen, Germany
- Sophia Werner
  - Institute of Molecular Enzyme Technology, Heinrich Heine University Düsseldorf, Wilhelm Johnen Strasse, Jülich 52426, Germany
- Gizem Ölçücü
  - Institute of Molecular Enzyme Technology, Heinrich Heine University Düsseldorf, Wilhelm Johnen Strasse, Jülich 52426, Germany
- Ulrich Schwaneberg
  - Institute of Biotechnology, RWTH Aachen University, Worringer Weg 3, 52074 Aachen, Germany
  - DWI-Leibniz Institute for Interactive Materials, Forckenbeckstraße 50, 52074 Aachen, Germany
- Karl-Erich Jaeger
  - Institute of Molecular Enzyme Technology, Heinrich Heine University Düsseldorf, Wilhelm Johnen Strasse, Jülich 52426, Germany
  - Institute of Bio- and Geosciences IBG 1: Biotechnology, Forschungszentrum Jülich GmbH, Wilhelm Johnen Strasse, Jülich 52426, Germany
- Mehdi D. Davari
  - Department of Bioorganic Chemistry, Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120 Halle, Germany
- Ulrich Krauss
  - Institute of Molecular Enzyme Technology, Heinrich Heine University Düsseldorf, Wilhelm Johnen Strasse, Jülich 52426, Germany
  - Institute of Bio- and Geosciences IBG 1: Biotechnology, Forschungszentrum Jülich GmbH, Wilhelm Johnen Strasse, Jülich 52426, Germany
  - Department of Biochemistry, University of Bayreuth, 95447 Bayreuth, Germany
3
Yang L, Liang X, Zhang N, Lu L. STAR: A Web Server for Assisting Directed Protein Evolution with Machine Learning. ACS OMEGA 2023; 8:44751-44756. [PMID: 38046324 PMCID: PMC10688154 DOI: 10.1021/acsomega.3c04832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Revised: 10/10/2023] [Accepted: 10/12/2023] [Indexed: 12/05/2023]
Abstract
Protein engineering has made significant contributions to industries such as agriculture, food, and pharmaceuticals. In recent years, directed evolution combined with artificial intelligence has emerged as a cutting-edge R&D approach. However, the application of machine learning techniques can be challenging for those without relevant experience and coding skills. To address this issue, we have developed a web-based protein sequence recommendation system: STAR (Sequence recommendaTion via ARtificial intelligence). Our system utilizes Bayesian optimization as its backbone and includes a filtering step using a regression model to enhance the success rate of recommended sequences. Additionally, we have incorporated an in silico-directed evolution approach to expand the exploration of the protein space. The Web site can be accessed at https://www.FindProteinStar.com/.
Affiliation(s)
- Likun Yang
  - Asymchem Life Science (Tianjin) Co., Ltd, Tianjin 300457, P. R. China
- Xiaoli Liang
  - Asymchem Life Science (Tianjin) Co., Ltd, Tianjin 300457, P. R. China
- Na Zhang
  - Asymchem Life Science (Tianjin) Co., Ltd, Tianjin 300457, P. R. China
- Lu Lu
  - Asymchem Life Science (Tianjin) Co., Ltd, Tianjin 300457, P. R. China
4
Kouba P, Kohout P, Haddadi F, Bushuiev A, Samusevich R, Sedlar J, Damborsky J, Pluskal T, Sivic J, Mazurenko S. Machine Learning-Guided Protein Engineering. ACS Catal 2023; 13:13863-13895. [PMID: 37942269 PMCID: PMC10629210 DOI: 10.1021/acscatal.3c02743] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 06/15/2023] [Revised: 09/20/2023] [Indexed: 11/10/2023]
Abstract
Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of promising enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, protein dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in this domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough experimental validation of emerging models before their use for rational protein design. We present our opinions on the fundamental problems and outline the potential directions for future research.
Affiliation(s)
- Petr Kouba
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
  - Faculty of Electrical Engineering, Czech Technical University in Prague, Technicka 2, 166 27 Prague 6, Czech Republic
- Pavel Kohout
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Faraneh Haddadi
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Anton Bushuiev
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Raman Samusevich
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
- Jiri Sedlar
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Jiri Damborsky
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Tomas Pluskal
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
- Josef Sivic
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Stanislav Mazurenko
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
5
Boukid F, Ganeshan S, Wang Y, Tülbek MÇ, Nickerson MT. Bioengineered Enzymes and Precision Fermentation in the Food Industry. Int J Mol Sci 2023; 24:10156. [PMID: 37373305 DOI: 10.3390/ijms241210156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2023] [Revised: 06/06/2023] [Accepted: 06/13/2023] [Indexed: 06/29/2023]
Abstract
Enzymes have been used in the food processing industry for many years. However, the use of native enzymes is not conducive to high activity, efficiency, range of substrates, and adaptability to harsh food processing conditions. The advent of enzyme engineering approaches such as rational design, directed evolution, and semi-rational design provided much-needed impetus for tailor-made enzymes with improved or novel catalytic properties. Production of designer enzymes became further refined with the emergence of synthetic biology and gene editing techniques and a plethora of other tools such as artificial intelligence, and computational and bioinformatics analyses which have paved the way for what is referred to as precision fermentation for the production of these designer enzymes more efficiently. With all the technologies available, the bottleneck is now in the scale-up production of these enzymes. There is generally a lack of accessibility thereof of large-scale capabilities and know-how. This review is aimed at highlighting these various enzyme-engineering strategies and the associated scale-up challenges, including safety concerns surrounding genetically modified microorganisms and the use of cell-free systems to circumvent this issue. The use of solid-state fermentation (SSF) is also addressed as a potentially low-cost production system, amenable to customization and employing inexpensive feedstocks as substrate.
Affiliation(s)
- Fatma Boukid
  - ClonBio Group Ltd., 6 Fitzwilliam Pl, D02 XE61 Dublin, Ireland
- Yingxin Wang
  - Saskatchewan Food Industry Development Centre, Saskatoon, SK S7M 5V1, Canada
- Michael T Nickerson
  - Department of Food and Bioproduct Sciences, University of Saskatchewan, Saskatoon, SK S7N 5A8, Canada
6
Vasina M, Kovar D, Damborsky J, Ding Y, Yang T, deMello A, Mazurenko S, Stavrakis S, Prokop Z. In-depth analysis of biocatalysts by microfluidics: An emerging source of data for machine learning. Biotechnol Adv 2023; 66:108171. [PMID: 37150331 DOI: 10.1016/j.biotechadv.2023.108171] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/24/2023] [Revised: 05/04/2023] [Accepted: 05/04/2023] [Indexed: 05/09/2023]
Abstract
Nowadays, the vastly increasing demand for novel biotechnological products is supported by the continuous development of biocatalytic applications which provide sustainable green alternatives to chemical processes. The success of a biocatalytic application is critically dependent on how quickly we can identify and characterize enzyme variants fitting the conditions of industrial processes. While miniaturization and parallelization have dramatically increased the throughput of next-generation sequencing systems, the subsequent characterization of the obtained candidates is still a limiting process in identifying the desired biocatalysts. Only a few commercial microfluidic systems for enzyme analysis are currently available, and the transformation of numerous published prototypes into commercial platforms is still to be streamlined. This review presents the state-of-the-art, recent trends, and perspectives in applying microfluidic tools in the functional and structural analysis of biocatalysts. We discuss the advantages and disadvantages of available technologies, their reproducibility and robustness, and readiness for routine laboratory use. We also highlight the unexplored potential of microfluidics to leverage the power of machine learning for biocatalyst development.
Affiliation(s)
- Michal Vasina
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 602 00 Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
- David Kovar
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 602 00 Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
- Jiri Damborsky
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 602 00 Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
- Yun Ding
  - Institute for Chemical and Bioengineering, ETH Zürich, 8093 Zürich, Switzerland
- Tianjin Yang
  - Institute for Chemical and Bioengineering, ETH Zürich, 8093 Zürich, Switzerland; Department of Biochemistry, University of Zurich, 8057 Zurich, Switzerland
- Andrew deMello
  - Institute for Chemical and Bioengineering, ETH Zürich, 8093 Zürich, Switzerland
- Stanislav Mazurenko
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 602 00 Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
- Stavros Stavrakis
  - Institute for Chemical and Bioengineering, ETH Zürich, 8093 Zürich, Switzerland
- Zbynek Prokop
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 602 00 Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
7
Wonderlick DR, Widom JR, Harms MJ. Disentangling contact and ensemble epistasis in a riboswitch. Biophys J 2023; 122:1600-1612. [PMID: 36710492 PMCID: PMC10183321 DOI: 10.1016/j.bpj.2023.01.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 01/09/2023] [Accepted: 01/24/2023] [Indexed: 01/29/2023]
Abstract
Mutations introduced into macromolecules often exhibit epistasis, where the effect of one mutation alters the effect of another. Knowing the mechanisms that lead to epistasis is important for understanding how macromolecules work and evolve, as well as for effective macromolecular engineering. Here, we investigate the interplay between "contact epistasis" (epistasis arising from physical interactions between mutated residues) and "ensemble epistasis" (epistasis that occurs when a mutation redistributes the conformational ensemble of a macromolecule, thus changing the effect of the second mutation). We argue that the two mechanisms can be distinguished in allosteric macromolecules by measuring epistasis at differing allosteric effector concentrations. Contact epistasis manifests as nonadditivity in the microscopic equilibrium constants describing the conformational ensemble. This epistatic effect is independent of allosteric effector concentration. Ensemble epistasis manifests as nonadditivity in thermodynamic observables, such as ligand binding, that are determined by the distribution of ensemble conformations. This epistatic effect strongly depends on allosteric effector concentration. Using this framework, we experimentally investigated the origins of epistasis in three pairwise mutant cycles introduced into the adenine riboswitch aptamer domain by measuring ligand binding as a function of allosteric effector concentration. We found evidence for both contact and ensemble epistasis in all cycles. Furthermore, we found that the two mechanisms of epistasis could interact with each other. For example, in one mutant cycle we observed 6 kcal/mol of contact epistasis in a microscopic equilibrium constant. In that same cycle, the maximum epistasis in ligand binding was only 1.5 kcal/mol: shifts in the ensemble masked the contribution of contact epistasis. Finally, our work yields simple heuristics for identifying contact and ensemble epistasis based on measurements of a biochemical observable as a function of allosteric effector concentration.
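The nonadditivity described in the abstract above can be illustrated with a pairwise mutant cycle. The sketch below is a generic calculation, not the authors' analysis code: the function name `pairwise_epistasis` and all free-energy values are hypothetical and chosen only to show the bookkeeping.

```python
def pairwise_epistasis(dg_wt, dg_a, dg_b, dg_ab):
    """Return epistasis (kcal/mol) for a pairwise mutant cycle.

    Epistasis is the effect of the double mutant minus the sum of the
    two single-mutant effects, all measured relative to wild type.
    A value of zero means the two mutations combine additively.
    """
    effect_a = dg_a - dg_wt      # effect of mutation A alone
    effect_b = dg_b - dg_wt      # effect of mutation B alone
    effect_ab = dg_ab - dg_wt    # effect of A and B together
    return effect_ab - (effect_a + effect_b)


# Hypothetical free energies (kcal/mol), for illustration only.
additive = pairwise_epistasis(dg_wt=0.0, dg_a=1.0, dg_b=2.0, dg_ab=3.0)
epistatic = pairwise_epistasis(dg_wt=0.0, dg_a=1.0, dg_b=2.0, dg_ab=4.5)
print(additive, epistatic)
```

Under the framework in the abstract, repeating this calculation on an observable measured at several effector concentrations would show whether the epistasis stays constant (suggesting contact epistasis) or varies (suggesting ensemble epistasis).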
Affiliation(s)
- Daria R Wonderlick
  - Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon
- Julia R Widom
  - Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon; Institute for Molecular Biology, University of Oregon, Eugene, Oregon; Oregon Center for Optical, Molecular, & Quantum Science, University of Oregon, Eugene, Oregon
- Michael J Harms
  - Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon; Institute for Molecular Biology, University of Oregon, Eugene, Oregon
8
Spirov AV, Myasnikova EM. Problem of Domain/Building Block Preservation in the Evolution of Biological Macromolecules and Evolutionary Computation. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:1345-1362. [PMID: 35594219 DOI: 10.1109/tcbb.2022.3175908] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Structurally and functionally isolated domains in biological macromolecular evolution, both natural and artificial, are largely similar to "schemata", building blocks (BBs), in evolutionary computation (EC). The problem of preserving in subsequent evolutionary searches the already found domains / BBs is well known and quite relevant in biology as well as in EC. Both biology and EC are seeing parallel and independent development of several approaches to identifying and preserving previously identified domains / BBs. First, we notice the similarity of DNA shuffling methods in synthetic biology and multi-parent recombination algorithms in EC. Furthermore, approaches to computer identification of domains in proteins that are being developed in biology can be aligned with BB identification methods in EC. Finally, approaches to chimeric protein libraries optimization in biology can be compared to evolutionary search methods based on probabilistic models in EC. We propose to validate the prospects of mutual exchange of ideas and transfer of algorithms and approaches between evolutionary systems biology and EC in these three principal directions. A crucial aim of this transfer is the design of new advanced experimental techniques capable of solving more complex problems of in vitro evolution.
9
Wittmund M, Cadet F, Davari MD. Learning Epistasis and Residue Coevolution Patterns: Current Trends and Future Perspectives for Advancing Enzyme Engineering. ACS Catal 2022. [DOI: 10.1021/acscatal.2c01426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Affiliation(s)
- Marcel Wittmund
  - Department of Bioorganic Chemistry, Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120 Halle, Germany
- Frederic Cadet
  - Laboratory of Excellence LABEX GR, DSIMB, Inserm UMR S1134, University of Paris city & University of Reunion, Paris 75014, France
- Mehdi D. Davari
  - Department of Bioorganic Chemistry, Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120 Halle, Germany
10
Medina-Ortiz D, Contreras S, Amado-Hinojosa J, Torres-Almonacid J, Asenjo JA, Navarrete M, Olivera-Nappa Á. Generalized Property-Based Encoders and Digital Signal Processing Facilitate Predictive Tasks in Protein Engineering. Front Mol Biosci 2022; 9:898627. [PMID: 35911960 PMCID: PMC9329607 DOI: 10.3389/fmolb.2022.898627] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2022] [Accepted: 06/23/2022] [Indexed: 11/13/2022]
Abstract
Computational methods in protein engineering often require encoding amino acid sequences, i.e., converting them into numeric arrays. Physicochemical properties are a typical choice to define encoders, where we replace each amino acid by its value for a given property. However, what property (or group thereof) is best for a given predictive task remains an open problem. In this work, we generalize property-based encoding strategies to maximize the performance of predictive models in protein engineering. First, combining text mining and unsupervised learning, we partitioned the AAIndex database into eight semantically-consistent groups of properties. We then applied a non-linear PCA within each group to define a single encoder to represent it. Then, in several case studies, we assess the performance of predictive models for protein and peptide function, folding, and biological activity, trained using the proposed encoders and classical methods (One Hot Encoder and TAPE embeddings). Models trained on datasets encoded with our encoders and converted to signals through the Fast Fourier Transform (FFT) increased their precision and reduced their overfitting substantially, outperforming classical approaches in most cases. Finally, we propose a preliminary methodology to create de novo sequences with desired properties. All these results offer simple ways to increase the performance of general and complex predictive tasks in protein engineering without increasing their complexity.
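The encoding pipeline summarized in the abstract above (replace each residue by a property value, then convert the numeric sequence to a frequency-domain signal) can be sketched as follows. This is a simplified stand-in, not the authors' implementation: it uses a single Kyte-Doolittle hydropathy scale instead of the PCA-derived group encoders, and the function name `encode_fft` and the zero-padding length are assumptions made for illustration.

```python
import numpy as np

# Kyte-Doolittle hydropathy values, used here as one example property;
# the paper derives its encoders from groups of AAIndex properties.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_fft(sequence, length=16):
    """Encode a protein sequence as an FFT magnitude spectrum.

    Each residue is mapped to its property value, the vector is
    truncated or zero-padded to `length` (assumed fixed so all
    sequences yield features of equal size), and the magnitude of
    the real-input FFT is returned.
    """
    values = np.array([HYDROPATHY[aa] for aa in sequence], dtype=float)
    padded = np.zeros(length)
    n = min(len(values), length)
    padded[:n] = values[:n]
    return np.abs(np.fft.rfft(padded))


features = encode_fft("ACDK")
print(features.shape)  # one feature vector per sequence
```

The resulting vectors can be fed to any standard regressor or classifier; the fixed padding length keeps the feature dimension the same across sequences of different lengths.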
Affiliation(s)
- David Medina-Ortiz
  - Centre for Biotechnology and Bioengineering, Universidad de Chile, Santiago, Chile
  - Departamento de Ingeniería en Computación, Universidad de Magallanes, Punta Arenas, Chile
- Sebastian Contreras
  - Max Planck Institute for Dynamics and Self-Organization, Göttingen, Germany
  - *Correspondence: Sebastian Contreras; Álvaro Olivera-Nappa
- Juan Amado-Hinojosa
  - Centre for Biotechnology and Bioengineering, Universidad de Chile, Santiago, Chile
  - Departamento de Ingeniería Química, Biotecnología y Materiales, Facultad de Ciencias Físicas y Matemáticas, Universidad de Chile, Santiago, Chile
- Jorge Torres-Almonacid
  - Departamento de Ingeniería en Computación, Universidad de Magallanes, Punta Arenas, Chile
- Juan A. Asenjo
  - Centre for Biotechnology and Bioengineering, Universidad de Chile, Santiago, Chile
  - Departamento de Ingeniería Química, Biotecnología y Materiales, Facultad de Ciencias Físicas y Matemáticas, Universidad de Chile, Santiago, Chile
- Álvaro Olivera-Nappa
  - Centre for Biotechnology and Bioengineering, Universidad de Chile, Santiago, Chile
  - Departamento de Ingeniería Química, Biotecnología y Materiales, Facultad de Ciencias Físicas y Matemáticas, Universidad de Chile, Santiago, Chile
  - *Correspondence: Sebastian Contreras; Álvaro Olivera-Nappa
11
Machine learning to navigate fitness landscapes for protein engineering. Curr Opin Biotechnol 2022; 75:102713. [DOI: 10.1016/j.copbio.2022.102713] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/19/2021] [Revised: 01/05/2022] [Accepted: 02/28/2022] [Indexed: 11/19/2022]
12
Herrmann KR, Brethauer C, Siedhoff NE, Hofmann I, Eyll J, Davari MD, Schwaneberg U, Ruff AJ. Evolution of E. coli Phytase Toward Improved Hydrolysis of Inositol Tetraphosphate. FRONTIERS IN CHEMICAL ENGINEERING 2022. [DOI: 10.3389/fceng.2022.838056] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/13/2022]
Abstract
Protein engineering campaigns are driven by the demand for superior enzyme performance under non-natural process conditions, such as elevated temperature or non-neutral pH, to achieve utmost efficiency and conserve limited resources. Phytases are industrial relevant feed enzymes that contribute to the overall phosphorus (P) management by catalyzing the stepwise phosphate hydrolysis from phytate, which is the main phosphorus storage in plants. Phosphorus is referred to as a critical disappearing nutrient, emphasizing the urgent need to implement strategies for a sustainable circular use and recovery of P from renewable resources. Engineered phytases already contribute today to an efficient phosphorus mobilization in the feeding industry and might pave the way to a circular P-bioeconomy. To date, a bottleneck in its application is the drastically reduced hydrolysis on lower phosphorylated reaction intermediates (lower inositol phosphates, ≤InsP4) and their subsequent accumulation. Here, we report the first KnowVolution campaign of the E. coli phytase toward improved hydrolysis on InsP4 and InsP3. As a prerequisite prior to evolution, a suitable screening setup was established and three isomers Ins(2,4,5)P3, Ins(2,3,4,5)P4 and Ins(1,2,5,6)P4 were generated through enzymatic hydrolysis of InsP6 and subsequent purification by HPLC. Screening of epPCR libraries identified clones with improved hydrolysis on Ins(1,2,5,6)P4 carrying substitutions involved in substrate binding and orientation. Saturation of seven positions and screening of, in total, 10,000 clones generated a dataset of 46 variants on their activity on all three isomers. This dataset was used for training, testing, and inferring models for machine learning guided recombination. The PyPEF method used allowed the prediction of recombinants from the identified substitutions, which were analyzed by reverse engineering to gain molecular understanding. 
Six variants with improved InsP4 hydrolysis of >2.5 were identified, of which variant T23L/K24S had a 3.7-fold improved relative activity on Ins(2,3,4,5)P4 and concomitantly shows a 2.7-fold improved hydrolysis of Ins(2,4,5)P3. Reported substitutions are the first published Ec phy variants with improved hydrolysis on InsP4 and InsP3.
13
Saito Y, Oikawa M, Sato T, Nakazawa H, Ito T, Kameda T, Tsuda K, Umetsu M. Machine-Learning-Guided Library Design Cycle for Directed Evolution of Enzymes: The Effects of Training Data Composition on Sequence Space Exploration. ACS Catal 2021. [DOI: 10.1021/acscatal.1c03753] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/02/2023]
Affiliation(s)
- Yutaka Saito
  - Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
  - AIST-Waseda University Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
  - Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo-ku, Tokyo 103-0027, Japan
- Misaki Oikawa
  - Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
- Takumi Sato
  - Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
- Hikaru Nakazawa
  - Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
- Tomoyuki Ito
  - Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
- Tomoshi Kameda
  - Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
  - Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo-ku, Tokyo 103-0027, Japan
- Koji Tsuda
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
  - Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo-ku, Tokyo 103-0027, Japan
  - Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, 1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan
- Mitsuo Umetsu
  - Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
  - Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo-ku, Tokyo 103-0027, Japan