1. Wang X, Yu S, Sun R, Xu K, Wang K, Wang R, Zhang J, Tao W, Yu S, Linghu K, Zhao X, Zhou J. Identification of a human type XVII collagen fragment with high capacity for maintaining skin health. Synth Syst Biotechnol 2024; 9:733-741. [PMID: 38911060 PMCID: PMC11192991 DOI: 10.1016/j.synbio.2024.06.001] [Received: 03/05/2024] [Revised: 06/04/2024] [Accepted: 06/05/2024]
Abstract
Collagen XVII (COL17) is a transmembrane protein that mediates skin homeostasis. Because expression of full-length collagen is hard to achieve in microorganisms, collagen fragments with the desired functions must be selected for microbial biosynthesis. Here, COL17 fragments (27-33 amino acids) were extracted and replicated 16 times for recombinant expression in Escherichia coli. Five variants were expressed in soluble form, with the highest yield of 223 mg/L. The fusion tag was removed for biochemical and biophysical characterization. Circular dichroism results suggested that one variant (sample-1707) has a triple-helix structure at >37 °C. Sample-1707 can assemble into nanofibers (width, 5.6 nm) and form a hydrogel at 3 mg/mL. Sample-1707 was shown to induce blood clotting and promote osteoblast differentiation. Furthermore, sample-1707 exhibited a high capacity to induce mouse hair follicle stem cell differentiation and osteoblast migration, demonstrating a high capacity to induce skin cell regeneration and promote wound healing. A strong hydrogel was prepared from a chitosan and sample-1707 complex, with a swelling rate >30% higher than that of chitosan alone. Fed-batch fermentation of sample-1707 in a 5-L bioreactor gave a yield of 600 mg/L. These results support the large-scale production of sample-1707 as a biomaterial for use in the skin care industry.
Affiliation(s)
- Xinglong Wang
- Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
- Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
- Shuyao Yu
- Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
- Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
- Ruoxi Sun
- Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
- Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
- Kangjie Xu
- Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
- Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
- Kun Wang
- Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
- Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
- Ruiyan Wang
- Bloomage Biotechnology Corporation Limited, 678 Tianchen Street, Jinan, Shandong, 250101, China
- Junli Zhang
- Bloomage Biotechnology Corporation Limited, 678 Tianchen Street, Jinan, Shandong, 250101, China
- Wenwen Tao
- Bloomage Biotechnology Corporation Limited, 678 Tianchen Street, Jinan, Shandong, 250101, China
- Shangyang Yu
- Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
- Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
- Kai Linghu
- Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
- Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
- Xinyi Zhao
- Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
- Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
- Jingwen Zhou
- Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
- Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
- Jiangsu Province Engineering Research Center of Food Synthetic Biotechnology, Jiangnan University, Wuxi, 214122, China
- School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, China
2. Asen ND, Udenigwe CC, Aluko RE. Quantitative Structure-Activity Relationship Modeling of Pea Protein-Derived Acetylcholinesterase and Butyrylcholinesterase Inhibitory Peptides. J Agric Food Chem 2023; 71:16323-16330. [PMID: 37856319 DOI: 10.1021/acs.jafc.3c04880]
Abstract
The aim of this work was to determine the structural requirements for peptides that inhibit acetylcholinesterase (AChE) and butyrylcholinesterase (BuChE) activities. The data set consisted of 19 oligopeptides that had been identified through mass spectrometry analysis of enzymatic digests of yellow field pea protein. The structure-function relationship was analyzed by partial least squares (PLS) regression using the 5z scores. A nine-component model was created from 16 peptides for AChE inhibitory peptides (Q² = 67.2% and R² = 0.9974), while three data sets were prepared for BuChE inhibitory peptides to improve the quality of the models (Q² = 26.7-46.4% and R² = 0.9577-0.9958). The most active peptides from the PLS models have threonine, leucine, alanine, or valine at the N-terminus; asparagine, histidine, proline, or arginine at the second position; aspartic acid or serine at the third position; and arginine at the C-terminus.
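The abstract quotes both a fitted R² and a cross-validated Q² for each model. As a rough illustration of how the two statistics differ, the sketch below computes them for a one-descriptor ordinary least-squares line with leave-one-out cross-validation. This is an assumption-laden stand-in: the paper used PLS regression on 5z scores, and its cross-validation scheme is not stated in the abstract; the function names and the toy numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x for a single descriptor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def total_sum_of_squares(ys):
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)


def r_squared(xs, ys):
    """R^2 = 1 - SS_res / SS_tot, with residuals from a fit on all data."""
    a, b = fit_line(xs, ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_res / total_sum_of_squares(ys)


def q_squared(xs, ys):
    """Q^2 = 1 - PRESS / SS_tot, each point predicted by a model fit without it."""
    press = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    return 1.0 - press / total_sum_of_squares(ys)


# Synthetic placeholder data, not values from the paper.
toy_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
toy_y = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0]
print(r_squared(toy_x, toy_y), q_squared(toy_x, toy_y))
```

For a least-squares fit, each held-out residual is at least as large as the corresponding fitted residual, so Q² never exceeds R². A large gap between the two, as in the BuChE models above, suggests the model fits the training peptides much better than it predicts unseen ones.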
Affiliation(s)
- Nancy D Asen
- Department of Food and Human Nutritional Sciences, University of Manitoba, Winnipeg, Manitoba R3T 2N2, Canada
- Chibuike C Udenigwe
- Department of Food and Human Nutritional Sciences, University of Manitoba, Winnipeg, Manitoba R3T 2N2, Canada
- School of Nutrition Sciences, Faculty of Health Sciences, University of Ottawa, Ottawa, Ontario K1H 8M5, Canada
- Rotimi E Aluko
- Department of Food and Human Nutritional Sciences, University of Manitoba, Winnipeg, Manitoba R3T 2N2, Canada
- Richardson Centre for Food Technology and Research, University of Manitoba, Winnipeg, Manitoba R3T 2N2, Canada
3. Emonts J, Buyel J. An overview of descriptors to capture protein properties - Tools and perspectives in the context of QSAR modeling. Comput Struct Biotechnol J 2023; 21:3234-3247. [PMID: 38213891 PMCID: PMC10781719 DOI: 10.1016/j.csbj.2023.05.022] [Received: 03/13/2023] [Revised: 05/23/2023] [Accepted: 05/23/2023]
Abstract
Proteins are important ingredients in food and feed, they are the active components of many pharmaceutical products, and they are necessary, in the form of enzymes, for the success of many technical processes. However, production can be challenging, especially when using heterologous host cells such as bacteria to express and assemble recombinant mammalian proteins. The manufacturability of proteins can be hindered by low solubility, a tendency to aggregate, or inefficient purification. Tools such as in silico protein engineering and models that predict separation criteria can overcome these issues but usually require the complex shape and surface properties of proteins to be represented by a small number of quantitative numeric values known as descriptors, as similarly used to capture the features of small molecules. Here, we review the current status of protein descriptors, especially for application in quantitative structure activity relationship (QSAR) models. First, we describe the complexity of proteins and the properties that descriptors must accommodate. Then we introduce descriptors of shape and surface properties that quantify the global and local features of proteins. Finally, we highlight the current limitations of protein descriptors and propose strategies for the derivation of novel protein descriptors that are more informative.
Affiliation(s)
- J. Emonts
- Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Germany
- J.F. Buyel
- University of Natural Resources and Life Sciences, Vienna (BOKU), Department of Biotechnology (DBT), Institute of Bioprocess Science and Engineering (IBSE), Muthgasse 18, 1190 Vienna, Austria
- Institute for Molecular Biotechnology, Worringerweg 1, RWTH Aachen University, 52074 Aachen, Germany
4. Wittmund M, Cadet F, Davari MD. Learning Epistasis and Residue Coevolution Patterns: Current Trends and Future Perspectives for Advancing Enzyme Engineering. ACS Catal 2022. [DOI: 10.1021/acscatal.2c01426]
Affiliation(s)
- Marcel Wittmund
- Department of Bioorganic Chemistry, Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120 Halle, Germany
- Frederic Cadet
- Laboratory of Excellence LABEX GR, DSIMB, Inserm UMR S1134, University of Paris city & University of Reunion, Paris 75014, France
- Mehdi D. Davari
- Department of Bioorganic Chemistry, Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120 Halle, Germany
5. Kuntz CP, Woods H, McKee AG, Zelt NB, Mendenhall JL, Meiler J, Schlebach JP. Towards generalizable predictions for G protein-coupled receptor variant expression. Biophys J 2022; 121:2712-2720. [PMID: 35715957 DOI: 10.1016/j.bpj.2022.06.018] [Received: 01/04/2022] [Revised: 05/31/2022] [Accepted: 06/13/2022]
Abstract
Missense mutations that compromise the plasma membrane expression (PME) of integral membrane proteins are the root cause of numerous genetic diseases. Differentiation of this class of mutations from those that specifically modify the activity of the folded protein has proven useful for the development and targeting of precision therapeutics. Nevertheless, it remains challenging to predict the effects of mutations on the stability and/or expression of membrane proteins. In this work, we utilize deep mutational scanning data to train a series of artificial neural networks to predict the PME of transmembrane domain variants of G protein-coupled receptors from structural and/or evolutionary features. We show that our best-performing network, which we term the PME predictor, can recapitulate mutagenic trends within rhodopsin and can differentiate pathogenic transmembrane domain variants that cause it to misfold from those that compromise its signaling. This network also generates statistically significant predictions for the relative PME of transmembrane domain variants for another class A G protein-coupled receptor (β2 adrenergic receptor) but not for an unrelated voltage-gated potassium channel (KCNQ1). Notably, our analyses of these networks suggest structural features alone are generally sufficient to recapitulate the observed mutagenic trends. Moreover, our findings imply that networks trained in this manner may be generalizable to proteins that share a common fold. Implications of our findings for the design of mechanistically specific genetic predictors are discussed.
Affiliation(s)
- Charles P Kuntz
- Department of Chemistry, Indiana University, Bloomington, Indiana
- Hope Woods
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee; Chemical and Physical Biology Program, Vanderbilt University, Nashville, Tennessee
- Andrew G McKee
- Department of Chemistry, Indiana University, Bloomington, Indiana
- Nathan B Zelt
- Department of Chemistry, Indiana University, Bloomington, Indiana
- Jeffrey L Mendenhall
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee; Chemical and Physical Biology Program, Vanderbilt University, Nashville, Tennessee
- Jens Meiler
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee; Institute for Drug Discovery, Leipzig University Medical School, Leipzig, Saxony, Germany.
6. Zimmermann MT. Molecular Modeling is an Enabling Approach to Complement and Enhance Channelopathy Research. Compr Physiol 2022; 12:3141-3166. [PMID: 35578963 DOI: 10.1002/cphy.c190047]
Abstract
Hundreds of human membrane proteins form channels that transport necessary ions and compounds, including drugs and metabolites, yet details of their normal function or how function is altered by genetic variants to cause diseases are often unknown. Without this knowledge, researchers are less equipped to develop approaches to diagnose and treat channelopathies. High-resolution computational approaches such as molecular modeling enable researchers to investigate channelopathy protein function, facilitate detailed hypothesis generation, and produce data that is difficult to gather experimentally. Molecular modeling can be tailored to each physiologic context that a protein may act within, some of which may currently be difficult or impossible to assay experimentally. Because many genomic variants are observed in channelopathy proteins from high-throughput sequencing studies, methods with mechanistic value are needed to interpret their effects. The eminent field of structural bioinformatics integrates techniques from multiple disciplines including molecular modeling, computational chemistry, biophysics, and biochemistry, to develop mechanistic hypotheses and enhance the information available for understanding function. Molecular modeling and simulation access 3D and time-dependent information, not currently predictable from sequence. Thus, molecular modeling is valuable for increasing the resolution with which the natural function of protein channels can be investigated, and for interpreting how genomic variants alter them to produce physiologic changes that manifest as channelopathies. © 2022 American Physiological Society. Compr Physiol 12:3141-3166, 2022.
Affiliation(s)
- Michael T Zimmermann
- Bioinformatics Research and Development Laboratory, Genomic Sciences and Precision Medicine Center, Medical College of Wisconsin, Milwaukee, Wisconsin, USA; Clinical and Translational Sciences Institute, Medical College of Wisconsin, Milwaukee, Wisconsin, USA; Department of Biochemistry, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
7. Mckenna A, P N Dubey S. Machine Learning Based Predictive Model for the Analysis of Sequence Activity Relationships Using Protein Spectra and Protein Descriptors. J Biomed Inform 2022; 128:104016. [PMID: 35143999 DOI: 10.1016/j.jbi.2022.104016] [Received: 08/03/2021] [Revised: 12/13/2021] [Accepted: 02/03/2022]
Abstract
Accurately establishing the connection between a protein sequence and its function remains a focal point within the field of protein engineering, especially in the context of predicting the effects of mutations. As a result, there has been a continued drive to build accurate and reliable predictive models via machine learning that allow for the virtual screening of many protein mutant sequences, measuring the relationship between sequence and 'fitness' or 'activity', commonly known as a Sequence-Activity-Relationship (SAR). An important preliminary stage in the building of these predictive models is the encoding of the chosen sequences. This work evaluates a range of encoding strategies using the Amino Acid Index database, where the indices are transformed into their spectral form via Digital Signal Processing (DSP) techniques, as well as numerous protein structural and physiochemical descriptors. The encoding strategies are explored on a dataset curated to measure the thermostability of various mutants from a recombination library, designed from parental cytochrome P450s. In this work it was concluded that the implementation of protein spectra in concatenation with protein descriptors, together with the Partial Least Squares Regression (PLS) algorithm, gave the most noteworthy increase in the quality of the predictive models (as described in Encoding Strategy C), highlighting their utility in identifying an SAR. The accompanying software produced for this paper is termed pySAR (Python Sequence-Activity-Relationship), which allows a user to find the optimal arrangement of structural and/or physiochemical properties to encode their specific mutant library dataset; the source code is available at: https://github.com/amckenna41/pySAR.
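To make the idea of a "protein spectrum" concrete, the sketch below maps a sequence to a numeric signal with one amino acid index (the Kyte-Doolittle hydropathy scale) and takes the power spectrum of a naive discrete Fourier transform. It is only an illustration of the general DSP encoding described in the abstract, not pySAR's API; the function names are hypothetical, and the choice of index, mean-centering, and normalization are assumptions.

```python
import cmath

# Kyte-Doolittle hydropathy values, used here as one example amino acid index.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence):
    """Map each residue of a one-letter sequence to its index value."""
    return [KYTE_DOOLITTLE[residue] for residue in sequence]


def power_spectrum(signal):
    """Power spectrum of the mean-centered signal, frequencies 0..n//2."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [value - mean for value in signal]
    spectrum = []
    for k in range(n // 2 + 1):
        coefficient = sum(
            value * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, value in enumerate(centered)
        )
        spectrum.append(abs(coefficient) ** 2 / n)
    return spectrum


# Hypothetical toy sequence with a strict two-residue periodicity.
spectrum = power_spectrum(encode("ILILILIL"))
print(spectrum)
```

For the alternating toy sequence, all of the power lands in the highest-frequency bin, which is the kind of positional periodicity a spectral encoding exposes. For real libraries an FFT routine would replace the quadratic-time loop used here for clarity.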
Affiliation(s)
- Adam Mckenna
- School of Electronics, Electrical Engineering and Computer Science, Queen's University of Belfast, University Road, BT7 1NN, Belfast, United Kingdom.
- Sandhya P N Dubey
- Department of Data Science and Computer Applications, Manipal Institute of Technology, Manipal Academy of Higher Education (MAHE), Manipal, Karnataka 576104, India.
8. Samukhina YV, Matyushin DD, Grinevich OI, Buryak AK. A Deep Convolutional Neural Network for Prediction of Peptide Collision Cross Sections in Ion Mobility Spectrometry. Biomolecules 2021; 11:1904. [PMID: 34944547 PMCID: PMC8699202 DOI: 10.3390/biom11121904] [Received: 11/25/2021] [Revised: 12/13/2021] [Accepted: 12/17/2021]
Abstract
Most frequently, the identification of peptides in mass spectrometry-based proteomics is carried out using high-resolution tandem mass spectrometry. In order to increase the accuracy of analysis, additional information on the peptides such as chromatographic retention time and collision cross section in ion mobility spectrometry can be used. An accurate prediction of the collision cross section values allows erroneous candidates to be rejected using a comparison of the observed values and the predictions based on the amino acids sequence. Recently, a massive high-quality data set of peptide collision cross sections was released. This opens up an opportunity to apply the most sophisticated deep learning techniques for this task. Previously, it was shown that a recurrent neural network allows for predicting these values accurately. In this work, we present a deep convolutional neural network that enables us to predict these values more accurately compared with previous studies. We use a neural network with complex architecture that contains both convolutional and fully connected layers and comprehensive methods of converting a peptide to multi-channel 1D spatial data and vector. The source code and pre-trained model are available online.
Affiliation(s)
- Dmitriy D. Matyushin
- A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, GSP-1, 119071 Moscow, Russia; (Y.V.S.); (O.I.G.); (A.K.B.)
9. Bell DR, Chen SH. Toward Guided Mutagenesis: Gaussian Process Regression Predicts MHC Class II Antigen Mutant Binding. J Chem Inf Model 2021; 61:4857-4867. [PMID: 34375111 DOI: 10.1021/acs.jcim.1c00458]
Abstract
Antigen-specific immunotherapies (ASI) require successful loading and presentation of antigen peptides into the major histocompatibility complex (MHC) binding cleft. One route of ASI design is to mutate native antigens for either stronger or weaker binding interaction to MHC. Exploring all possible mutations is costly both experimentally and computationally. To reduce experimental and computational expense, here we investigate the minimal amount of prior data required to accurately predict the relative binding affinity of point mutations for peptide-MHC class II (pMHCII) binding. Using data from different residue subsets, we interpolate pMHCII mutant binding affinities by Gaussian process (GP) regression of residue volume and hydrophobicity. We apply GP regression to an experimental data set from the Immune Epitope Database, and theoretical data sets from NetMHCIIpan and Free Energy Perturbation calculations. We find that GP regression can predict binding affinities of nine neutral residues from a six-residue subset with an average R² coefficient of determination value of 0.62 ± 0.04 (±95% CI), average error of 0.09 ± 0.01 kcal/mol (±95% CI), and a receiver operating characteristic (ROC) AUC value of 0.92 for binary classification of enhanced or diminished binding affinity. Similarly, metrics increase to an R² value of 0.69 ± 0.04, average error of 0.07 ± 0.01 kcal/mol, and an ROC AUC value of 0.94 for predicting seven neutral residues from an eight-residue subset. Our work finds that prediction is most accurate for neutral residues at anchor residue sites without register shift. This work holds relevance to predicting pMHCII binding and accelerating ASI design.
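The interpolation step described above can be pictured with a small Gaussian process regressor over two descriptors per residue. The following is a minimal sketch under stated assumptions, not the authors' implementation: the squared-exponential kernel, the length scale, the noise level, the function names, and all descriptor and affinity values are hypothetical placeholders.

```python
import math


def rbf(a, b, length_scale=0.5):
    """Squared-exponential kernel between two descriptor vectors."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * squared_distance / length_scale ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def gp_fit(train_x, train_y, noise=1e-6):
    """Return the constant prior mean and the weights alpha = (K + noise*I)^-1 (y - mean)."""
    mean_y = sum(train_y) / len(train_y)
    kernel = [
        [rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(train_x)]
        for i, a in enumerate(train_x)
    ]
    alpha = solve(kernel, [y - mean_y for y in train_y])
    return mean_y, alpha


def gp_predict(x, train_x, mean_y, alpha):
    """Posterior mean at descriptor vector x."""
    return mean_y + sum(w * rbf(x, t) for w, t in zip(alpha, train_x))


# Placeholder data: (scaled volume, scaled hydrophobicity) per residue and
# relative binding free energies in kcal/mol. None of these are from the paper.
train_x = [(0.0, 0.0), (0.5, 0.2), (1.0, 0.9), (0.3, 0.8)]
train_y = [0.00, 0.05, -0.10, 0.02]
mean_y, alpha = gp_fit(train_x, train_y)
print(gp_predict((0.6, 0.5), train_x, mean_y, alpha))
```

With a small noise term the posterior mean passes almost exactly through the training residues, and far from all of them it falls back to the prior mean. That behavior is why the choice of residue subset matters for how well the remaining residues are interpolated.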
Affiliation(s)
- David R Bell
- Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21701, United States
- Serena H Chen
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37830, United States
10. Siedhoff NE, Illig AM, Schwaneberg U, Davari MD. PyPEF-An Integrated Framework for Data-Driven Protein Engineering. J Chem Inf Model 2021; 61:3463-3476. [PMID: 34260225 DOI: 10.1021/acs.jcim.1c00099]
Abstract
Data-driven strategies are gaining increased attention in protein engineering due to recent advances in access to large experimental databanks of proteins, next-generation sequencing (NGS), high-throughput screening (HTS) methods, and the development of artificial intelligence algorithms. However, the reliable prediction of beneficial amino acid substitutions, their combination, and the effect on functional properties remain the most significant challenges in protein engineering, which is applied to develop proteins and enzymes for biocatalysis, biomedicine, and life sciences. Here, we present a general-purpose framework (PyPEF: pythonic protein engineering framework) for performing data-driven protein engineering using machine learning methods combined with techniques from signal processing and statistical physics. PyPEF guides the identification and selection of beneficial proteins of a defined sequence space by systematically or randomly exploring the fitness of variants and by sampling random evolution pathways. The performance of PyPEF was evaluated concerning its predictive accuracy and throughput on four public protein and enzyme data sets using common regression models. It was proved that the program could efficiently predict the fitness of protein sequences for different target properties (predictive models with coefficient of determination values ranging from 0.58 to 0.92). By combining machine learning and protein evolution, PyPEF enabled the screening of proteins with various functions, reaching a screening capacity of more than 500,000 protein sequence variants in the timeframe of only a few minutes on a personal computer. 
PyPEF displayed significant accuracies on four public data sets (different proteins and properties) and underlined the potential of integrating data-driven technologies for covering different philosophies by either predicting the fitness of the variants to the highest accuracy accounting for epistatic effects or capturing the general trend of introduced mutations on the fitness in directed protein evolution campaigns. In essence, PyPEF can provide a powerful solution to current sequence exploration and combinatorial problems faced in protein engineering through exhaustive in silico screening of the sequence space.
Affiliation(s)
- Niklas E Siedhoff
- Institute of Biotechnology, RWTH Aachen University, Worringer Weg 3, 52074 Aachen, Germany
- Ulrich Schwaneberg
- Institute of Biotechnology, RWTH Aachen University, Worringer Weg 3, 52074 Aachen, Germany; DWI-Leibniz Institute for Interactive Materials, Forckenbeckstraße 50, 52074 Aachen, Germany
- Mehdi D Davari
- Institute of Biotechnology, RWTH Aachen University, Worringer Weg 3, 52074 Aachen, Germany
11. Shwaiki LN, Lynch KM, Arendt EK. Future of antimicrobial peptides derived from plants in food application – A focus on synthetic peptides. Trends Food Sci Technol 2021. [DOI: 10.1016/j.tifs.2021.04.010]
12. Ochoa R, Cossio P. PepFun: Open Source Protocols for Peptide-Related Computational Analysis. Molecules 2021; 26:1664. [PMID: 33809815 PMCID: PMC8002403 DOI: 10.3390/molecules26061664] [Received: 02/15/2021] [Revised: 03/05/2021] [Accepted: 03/15/2021]
Abstract
Peptide research has increased in recent years due to the applications of peptides as biomarkers, therapeutic alternatives, or antigenic subunits in vaccines. The implementation of computational resources has facilitated the identification of novel sequences, the prediction of properties, and the modelling of structures. However, there is still a lack of open source protocols that enable their straightforward analysis. Here, we present PepFun, a compilation of bioinformatics and cheminformatics functionalities that are easy to implement and customize for studying peptides at different levels: sequence, structure, and their interactions with proteins. PepFun enables calculating multiple characteristics for massive sets of peptide sequences, and obtaining different structural observables derived from protein-peptide complexes. In addition, random or guided library design of peptide sequences can be customized for screening campaigns. The package is written in Python and is based on built-in functions and methods available in the open source projects BioPython and RDKit. We present two tutorials where we tested peptide binders of the MHC class II and the Granzyme B protease.
Affiliation(s)
- Rodrigo Ochoa
- Biophysics of Tropical Diseases, Max Planck Tandem Group, University of Antioquia, Medellin 050010, Colombia
- Pilar Cossio
- Biophysics of Tropical Diseases, Max Planck Tandem Group, University of Antioquia, Medellin 050010, Colombia
- Department of Theoretical Biophysics, Max Planck Institute of Biophysics, 60348 Frankfurt am Main, Germany
13. Li G, Qin Y, Fontaine NT, Ng Fuk Chong M, Maria‐Solano MA, Feixas F, Cadet XF, Pandjaitan R, Garcia‐Borràs M, Cadet F, Reetz MT. Machine Learning Enables Selection of Epistatic Enzyme Mutants for Stability Against Unfolding and Detrimental Aggregation. Chembiochem 2021; 22:904-914. [PMID: 33094545 PMCID: PMC7984044 DOI: 10.1002/cbic.202000612] [Received: 08/31/2020] [Revised: 10/22/2020]
Abstract
Machine learning (ML) has pervaded most areas of protein engineering, including stability and stereoselectivity. Using limonene epoxide hydrolase as the model enzyme and innov'SAR as the ML platform, comprising a digital signal process, we achieved high protein robustness that can resist unfolding with concomitant detrimental aggregation. Fourier transform (FT) allows us to take into account the order of the protein sequence and the nonlinear interactions between positions, and thus to grasp epistatic phenomena. The innov'SAR approach is interpolative and extrapolative, and makes outside-the-box predictions not found in other state-of-the-art ML or deep learning approaches. Equally significant is the finding that our approach to ML in the present context, flanked by advanced molecular dynamics simulations, uncovers the connection between epistatic mutational interactions and protein robustness.
Affiliation(s)
- Guangyue Li
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Key Laboratory of Control of Biological Hazard Factors (Plant Origin) for Agri-product Quality and Safety, Ministry of Agriculture, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100081, P. R. China
- Youcai Qin
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Key Laboratory of Control of Biological Hazard Factors (Plant Origin) for Agri-product Quality and Safety, Ministry of Agriculture, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100081, P. R. China
- Nicolas T. Fontaine
- PEACCEL, Artificial Intelligence Department, 6 Square Albin Cachot, Box 42, 75013 Paris, France
- Matthieu Ng Fuk Chong
- PEACCEL, Artificial Intelligence Department, 6 Square Albin Cachot, Box 42, 75013 Paris, France
- Miguel A. Maria‐Solano
- Institut de Química Computacional i Catàlisi and Departament de Química, Universitat de Girona, Campus Montilivi, 17003 Girona, Catalonia, Spain
- Ferran Feixas
- Institut de Química Computacional i Catàlisi and Departament de Química, Universitat de Girona, Campus Montilivi, 17003 Girona, Catalonia, Spain
- Xavier F. Cadet
- PEACCEL, Artificial Intelligence Department, 6 Square Albin Cachot, Box 42, 75013 Paris, France
- Rudy Pandjaitan
- PEACCEL, Artificial Intelligence Department, 6 Square Albin Cachot, Box 42, 75013 Paris, France
- Marc Garcia‐Borràs
- Institut de Química Computacional i Catàlisi and Departament de Química, Universitat de Girona, Campus Montilivi, 17003 Girona, Catalonia, Spain
- Frederic Cadet
- PEACCEL, Artificial Intelligence Department, 6 Square Albin Cachot, Box 42, 75013 Paris, France
- Manfred T. Reetz
- Department of Chemistry, Philipps-Universität, 35032 Marburg, Germany
- Max-Planck-Institut fuer Kohlenforschung, 45470 Mülheim, Germany
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, 32 West 7th Avenue, Tianjin Airport Economic Area, 300308 Tianjin, P. R. China
|
14
|
Vornholt T, Christoffel F, Pellizzoni MM, Panke S, Ward TR, Jeschek M. Systematic engineering of artificial metalloenzymes for new-to-nature reactions. SCIENCE ADVANCES 2021; 7:eabe4208. [PMID: 33523952 DOI: 10.1126/sciadv.abe4208] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 08/22/2020] [Accepted: 12/04/2020] [Indexed: 06/12/2023]
Abstract
Artificial metalloenzymes (ArMs) catalyzing new-to-nature reactions could play an important role in transitioning toward a sustainable economy. While ArMs have been created for various transformations, attempts at their genetic optimization have been case specific and resulted mostly in modest improvements. To realize their full potential, methods to rapidly discover active ArM variants for ideally any reaction of interest are required. Here, we introduce a reaction-independent, automation-compatible platform, which relies on periplasmic compartmentalization in Escherichia coli to rapidly and reliably engineer ArMs based on the biotin-streptavidin technology. We systematically assess 400 ArM mutants for five bioorthogonal transformations involving different metals, reaction mechanisms, and reactants, which include novel ArMs for gold-catalyzed hydroamination and hydroarylation. Activity enhancements up to 15-fold highlight the potential of the systematic approach. Furthermore, we suggest smart screening strategies and build machine learning models that accurately predict ArM activity from sequence, which has crucial implications for future ArM development.
Affiliation(s)
- Tobias Vornholt
- Department of Biosystems Science and Engineering, ETH Zurich, CH-4058 Basel, Switzerland
- National Centre of Competence in Research (NCCR) Molecular Systems Engineering, Basel, Switzerland
| | - Fadri Christoffel
- National Centre of Competence in Research (NCCR) Molecular Systems Engineering, Basel, Switzerland
- Department of Chemistry, University of Basel, Mattenstrasse 24a, BPR 1096, CH-4002 Basel, Switzerland
| | - Michela M Pellizzoni
- National Centre of Competence in Research (NCCR) Molecular Systems Engineering, Basel, Switzerland
- Department of Chemistry, University of Basel, Mattenstrasse 24a, BPR 1096, CH-4002 Basel, Switzerland
| | - Sven Panke
- Department of Biosystems Science and Engineering, ETH Zurich, CH-4058 Basel, Switzerland
- National Centre of Competence in Research (NCCR) Molecular Systems Engineering, Basel, Switzerland
| | - Thomas R Ward
- National Centre of Competence in Research (NCCR) Molecular Systems Engineering, Basel, Switzerland
- Department of Chemistry, University of Basel, Mattenstrasse 24a, BPR 1096, CH-4002 Basel, Switzerland
| | - Markus Jeschek
- Department of Biosystems Science and Engineering, ETH Zurich, CH-4058 Basel, Switzerland.
- National Centre of Competence in Research (NCCR) Molecular Systems Engineering, Basel, Switzerland
| |
|
15
|
Özçelik R, Öztürk H, Özgür A, Ozkirimli E. ChemBoost: A Chemical Language Based Approach for Protein-Ligand Binding Affinity Prediction. Mol Inform 2020; 40:e2000212. [PMID: 33225594 DOI: 10.1002/minf.202000212] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/15/2020] [Accepted: 11/20/2020] [Indexed: 11/07/2022]
Abstract
Identification of high affinity drug-target interactions is a major research question in drug discovery. Proteins are generally represented by their structures or sequences. However, structures are available only for a small subset of biomolecules and sequence similarity is not always correlated with functional similarity. We propose ChemBoost, a chemical language based approach for affinity prediction using SMILES syntax. We hypothesize that SMILES is a codified language and ligands are documents composed of chemical words. These documents can be used to learn chemical word vectors that represent words in similar contexts with similar vectors. In ChemBoost, the ligands are represented via chemical word embeddings, while the proteins are represented through sequence-based features and/or chemical words of their ligands. Our aim is to process the patterns in SMILES as a language to predict protein-ligand affinity, even when we cannot infer the function from the sequence. We used eXtreme Gradient Boosting to predict protein-ligand affinities in KIBA and BindingDB data sets. ChemBoost was able to predict drug-target binding affinity as well as or better than state-of-the-art machine learning systems. When powered with ligand-centric representations, ChemBoost was more robust to the changes in protein sequence similarity and successfully captured the interactions between a protein and a ligand, even if the protein has low sequence similarity to the known targets of the ligand.
Affiliation(s)
- Rıza Özçelik
- Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey
| | - Hakime Öztürk
- Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey
| | - Arzucan Özgür
- Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey
| | - Elif Ozkirimli
- Department of Chemical Engineering, Boğaziçi University, Istanbul, Turkey; Data and Analytics Chapter, Pharma International Informatics, F. Hoffmann-La Roche AG, Switzerland
| |
|
16
|
Monomer structure fingerprints: an extension of the monomer composition version for peptide databases. J Comput Aided Mol Des 2020; 34:1147-1156. [PMID: 32812076 DOI: 10.1007/s10822-020-00336-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2020] [Accepted: 08/12/2020] [Indexed: 10/23/2022]
Abstract
A fingerprint based on the monomer composition (MCFP) of nonribosomal peptides (NRPs) was introduced previously. MCFP is a method for obtaining a representative description of NRP structures from their monomer composition in fingerprint form. It enabled effective screening and prediction of biological activities on the Norine NRP database. In this paper, we present an extension of the MCFP fingerprint. The extension adds a few columns to the fingerprint, representing monomer clusters, 2D structures, peptide categories, and peptide diversity. All these data are extracted from the NRP structure. Experiments with the Norine NRP database showed that the extended MCFP, which can be called the Monomer Structure FingerPrint (MSFP), produced high prediction accuracy (> 95%) together with a high recall rate (86%) when used for prediction and similarity searching. This study indicates that MSFP, built mainly from monomer composition, can be substantially improved by adding columns that carry useful information about the monomer composition and 2D structure of NRPs.
|
17
|
Fontaine NT, Cadet XF, Vetrivel I. Novel Descriptors and Digital Signal Processing-Based Method for Protein Sequence Activity Relationship Study. Int J Mol Sci 2019; 20:ijms20225640. [PMID: 31718061 PMCID: PMC6888668 DOI: 10.3390/ijms20225640] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 10/13/2019] [Revised: 11/04/2019] [Accepted: 11/07/2019] [Indexed: 12/18/2022] Open
Abstract
Work aiming to unravel the correlation between protein sequence and function in the absence of structural information can be highly rewarding. We present a new way of considering descriptors from the amino acids index database for modeling and predicting the fitness value of a polypeptide chain. This approach includes the following steps: (i) calculating Q elementary numerical sequences (Ele_SEQ) depending on the encoding of the amino acid residues, (ii) determining an extended numerical sequence (Ext_SEQ) by concatenating the Q elementary numerical sequences, wherein at least one elementary numerical sequence is a protein spectrum obtained by applying fast Fourier transformation (FFT), and (iii) predicting a value of fitness for polypeptide variants (train and/or validation set). These new descriptors were tested on four sets of proteins of different lengths (GLP-2, TNF alpha, cytochrome P450, and epoxide hydrolase) and activities (cAMP activation, binding affinity, thermostability and enantioselectivity). We show that the use of multiple physicochemical descriptors coupled with the FFT, which takes into account the interactions between amino acid residues within the protein sequence, could lead to a very significant improvement in the quality of models and predictions. The choice of the descriptor, or of the combination of descriptors and/or FFT, depends on the protein/fitness pair. This approach can add value for users of existing mutant libraries in which screening efforts have so far failed to find improved polypeptide mutants for useful applications.
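Steps (i) and (ii) of the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the two residue scales are toy values chosen for illustration (a real study would draw them from the amino acid index database), the function names are hypothetical, every elementary sequence is turned into a spectrum (the abstract only requires at least one), and a plain discrete Fourier transform stands in for an FFT routine. Step (iii), fitting a regression model on the extended sequence, is left out.

```python
import cmath

# Toy per-residue scales, for illustration only (assumption, not from the paper).
HYDROPHOBICITY = {"A": 1.8, "G": -0.4, "L": 3.8, "K": -3.9, "E": -3.5, "S": -0.8}
VOLUME = {"A": 88.6, "G": 60.1, "L": 166.7, "K": 168.6, "E": 138.4, "S": 89.0}


def encode(seq, scale):
    """Step (i): map each residue to a number, giving one elementary sequence."""
    return [scale[aa] for aa in seq]


def spectrum(signal):
    """Magnitude spectrum via a plain O(n^2) discrete Fourier transform."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n)
    ]


def extended_sequence(seq, scales):
    """Step (ii): concatenate the spectra obtained from each encoding."""
    ext = []
    for scale in scales:
        ext.extend(spectrum(encode(seq, scale)))
    return ext


# Hypothetical six-residue peptide encoded with Q = 2 scales.
features = extended_sequence("GALKES", [HYDROPHOBICITY, VOLUME])
```

Each variant in a library would be converted to such a feature vector, paired with its measured fitness, and passed to a regression model of the user's choice.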
Affiliation(s)
- Nicolas T Fontaine
- PEACCEL, Protein Engineering ACCELerator, 6 Square Albin Cachot, box 42, 75013 Paris, France
| | - Xavier F Cadet
- PEACCEL, Protein Engineering ACCELerator, 6 Square Albin Cachot, box 42, 75013 Paris, France
| | - Iyanar Vetrivel
- PEACCEL, Protein Engineering ACCELerator, 6 Square Albin Cachot, box 42, 75013 Paris, France
| |
|
18
|
Abstract
This paper investigates a novel graph embedding procedure based on simplicial complexes. Inherited from algebraic topology, simplicial complexes are collections of increasing-order simplices (e.g., points, lines, triangles, tetrahedrons) which can be interpreted as possibly meaningful substructures (i.e., information granules) on top of which an embedding space can be built by means of symbolic histograms. In the embedding space, any Euclidean pattern recognition system can be used, possibly equipped with feature selection capabilities in order to select the most informative symbols. The selected symbols can be analysed by field experts in order to extract further knowledge about the process to be modelled by the learning system; hence the proposed modelling strategy can be considered a grey-box. The proposed embedding has been tested on thirty benchmark datasets for graph classification. We further propose two real-world applications, namely predicting proteins’ enzymatic function and solubility propensity starting from their 3D structure, as an example of the knowledge discovery phase that the proposed embedding strategy makes possible.
|
19
|
Yang KK, Wu Z, Arnold FH. Machine-learning-guided directed evolution for protein engineering. Nat Methods 2019; 16:687-694. [PMID: 31308553 DOI: 10.1038/s41592-019-0496-6] [Citation(s) in RCA: 431] [Impact Index Per Article: 86.2] [Received: 10/25/2018] [Accepted: 06/17/2019] [Indexed: 02/06/2023]
Abstract
Protein engineering through machine-learning-guided directed evolution enables the optimization of protein functions. Machine-learning approaches predict how sequence maps to function in a data-driven manner without requiring a detailed model of the underlying physics or biological pathways. Such methods accelerate directed evolution by learning from the properties of characterized variants and using that information to select sequences that are likely to exhibit improved properties. Here we introduce the steps required to build machine-learning sequence-function models and to use those models to guide engineering, making recommendations at each stage. This review covers basic concepts relevant to the use of machine learning for protein engineering, as well as the current literature and applications of this engineering paradigm. We illustrate the process with two case studies. Finally, we look to future opportunities for machine learning to enable the discovery of unknown protein functions and uncover the relationship between protein sequence and function.
Affiliation(s)
- Kevin K Yang
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, USA
| | - Zachary Wu
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, USA
| | - Frances H Arnold
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, USA.
| |
|
20
|
“Ideal correlations” for biological activity of peptides. Biosystems 2019; 181:51-57. [DOI: 10.1016/j.biosystems.2019.04.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 12/20/2018] [Revised: 02/18/2019] [Accepted: 04/12/2019] [Indexed: 02/08/2023]
|
21
|
Exploring the Potential of Spherical Harmonics and PCVM for Compounds Activity Prediction. Int J Mol Sci 2019; 20:ijms20092175. [PMID: 31052500 PMCID: PMC6539940 DOI: 10.3390/ijms20092175] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/21/2019] [Revised: 04/14/2019] [Accepted: 04/29/2019] [Indexed: 01/11/2023] Open
Abstract
Biologically active chemical compounds may provide remedies for several diseases. Meanwhile, Machine Learning techniques applied to Drug Discovery, which are cheaper and faster than wet-lab experiments, can identify molecules with the expected pharmacological activity more effectively. It is therefore urgent and essential to develop more representative descriptors and reliable classification methods to accurately predict molecular activity. In this paper, we investigate the potential of a novel representation based on Spherical Harmonics fed into a Probabilistic Classification Vector Machines classifier, namely SHPCVM, for the compound activity prediction task. We make use of representation learning to acquire features that describe the molecules as precisely as possible. To verify the performance of SHPCVM, ten-fold cross-validation tests are performed on twenty-one G protein-coupled receptors (GPCRs). Experimental outcomes (accuracy of 0.86), assessed by classification accuracy, precision, recall, Matthews’ Correlation Coefficient and Cohen’s kappa, reveal that our Spherical Harmonics-based representation, which is relatively short, combined with Probabilistic Classification Vector Machines can achieve very satisfactory performance for GPCRs.
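The evaluation measures named in the abstract above (accuracy, precision, recall, Matthews' Correlation Coefficient and Cohen's kappa) all follow from a binary confusion matrix. The sketch below uses the standard formulas; the function name is hypothetical and the counts are made up purely for illustration and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Standard binary classification measures from confusion-matrix counts.

    No zero-division guards: assumes every row and column total is non-zero.
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Chance agreement: product of the marginals for each class, summed.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "mcc": mcc,
        "kappa": kappa,
    }


# Made-up counts for one hypothetical receptor (not data from the study).
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
```

In a ten-fold cross-validation setting, such a function would typically be applied to each held-out fold and the results averaged.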
|
22
|
Xu J, Cen Y, Singh W, Fan J, Wu L, Lin X, Zhou J, Huang M, Reetz MT, Wu Q. Stereodivergent Protein Engineering of a Lipase To Access All Possible Stereoisomers of Chiral Esters with Two Stereocenters. J Am Chem Soc 2019; 141:7934-7945. [DOI: 10.1021/jacs.9b02709] [Citation(s) in RCA: 75] [Impact Index Per Article: 15.0] [Indexed: 11/28/2022]
Affiliation(s)
- Jian Xu
- Department of Chemistry, Zhejiang University, Hangzhou 310027, PR China
| | - Yixin Cen
- Department of Chemistry, Zhejiang University, Hangzhou 310027, PR China
- State Key Laboratory of Bio-organic and Natural Products Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai 200032, PR China
| | - Warispreet Singh
- School of Chemistry and Chemical Engineering, Queen’s University, David Keir Building, Stranmillis Road, Belfast BT9 5AG, Northern Ireland, U.K
| | - Jiajie Fan
- Department of Chemistry, Zhejiang University, Hangzhou 310027, PR China
| | - Lian Wu
- State Key Laboratory of Bio-organic and Natural Products Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai 200032, PR China
| | - Xianfu Lin
- Department of Chemistry, Zhejiang University, Hangzhou 310027, PR China
| | - Jiahai Zhou
- State Key Laboratory of Bio-organic and Natural Products Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai 200032, PR China
| | - Meilan Huang
- School of Chemistry and Chemical Engineering, Queen’s University, David Keir Building, Stranmillis Road, Belfast BT9 5AG, Northern Ireland, U.K
| | - Manfred T. Reetz
- Max-Planck-Institut für Kohlenforschung, Kaiser-Wilhelm-Platz 1, 45470 Mülheim an der Ruhr, Germany
- Chemistry Department, Philipps-University, Hans-Meerwein-Str. 4, 35032 Marburg, Germany
| | - Qi Wu
- Department of Chemistry, Zhejiang University, Hangzhou 310027, PR China
| |
|
23
|
Li G, Dong Y, Reetz MT. Can Machine Learning Revolutionize Directed Evolution of Selective Enzymes? Adv Synth Catal 2019. [DOI: 10.1002/adsc.201900149] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Indexed: 01/01/2023]
Affiliation(s)
- Guangyue Li
- State Key Laboratory for Biology of Plant Diseases and Insect Pests/Key Laboratory of Control of Biological Hazard Factors (Plant Origin) for Agri-product Quality and Safety, Ministry of Agriculture, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100081, People's Republic of China
| | - Yijie Dong
- State Key Laboratory for Biology of Plant Diseases and Insect Pests/Key Laboratory of Control of Biological Hazard Factors (Plant Origin) for Agri-product Quality and Safety, Ministry of Agriculture, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100081, People's Republic of China
| | - Manfred T. Reetz
- Max-Planck-Institut für Kohlenforschung, Kaiser-Wilhelm-Platz 1, 45470 Mülheim an der Ruhr, Germany
- Fachbereich Chemie der Philipps-Universität, Hans-Meerwein-Strasse, 35032 Marburg, Germany
| |
|
24
|
Machine learning-assisted directed protein evolution with combinatorial libraries. Proc Natl Acad Sci U S A 2019; 116:8852-8858. [PMID: 30979809 DOI: 10.1073/pnas.1901979116] [Citation(s) in RCA: 273] [Impact Index Per Article: 54.6] [Indexed: 11/18/2022] Open
Abstract
To reduce experimental effort associated with directed protein evolution and to explore the sequence space encoded by mutating multiple positions simultaneously, we incorporate machine learning into the directed evolution workflow. Combinatorial sequence space can be quite expensive to sample experimentally, but machine-learning models trained on tested variants provide a fast method for testing sequence space computationally. We validated this approach on a large published empirical fitness landscape for human GB1 binding protein, demonstrating that machine learning-guided directed evolution finds variants with higher fitness than those found by other directed evolution approaches. We then provide an example application in evolving an enzyme to produce each of the two possible product enantiomers (i.e., stereodivergence) of a new-to-nature carbene Si-H insertion reaction. The approach predicted libraries enriched in functional enzymes and fixed seven mutations in two rounds of evolution to identify variants for selective catalysis with 93% and 79% ee (enantiomeric excess). By greatly increasing throughput with in silico modeling, machine learning enhances the quality and diversity of sequence solutions for a protein engineering problem.
|
25
|
Cadet F, Fontaine N, Li G, Sanchis J, Ng Fuk Chong M, Pandjaitan R, Vetrivel I, Offmann B, Reetz MT. A machine learning approach for reliable prediction of amino acid interactions and its application in the directed evolution of enantioselective enzymes. Sci Rep 2018; 8:16757. [PMID: 30425279 PMCID: PMC6233173 DOI: 10.1038/s41598-018-35033-y] [Citation(s) in RCA: 75] [Impact Index Per Article: 12.5] [Received: 08/03/2018] [Accepted: 10/26/2018] [Indexed: 11/09/2022] Open
Abstract
Directed evolution is an important research activity in synthetic biology and biotechnology. Numerous reports describe the application of tedious mutation/screening cycles for the improvement of proteins. Recently, knowledge-based approaches have facilitated the prediction of protein properties and the identification of improved mutants. However, epistatic phenomena constitute an obstacle which can impair the predictions in protein engineering. We present an innovative sequence-activity relationship (innov'SAR) methodology based on digital signal processing combining wet-lab experimentation and computational protein design. In our machine learning approach, a predictive model is developed to find the resulting property of the protein when the n single point mutations are combined (2^n combinations). The originality of our approach is that only sequence information and the fitness of mutants measured in the wet-lab are needed to build models. We illustrate the application of the approach in the case of improving the enantioselectivity of an epoxide hydrolase from Aspergillus niger. n = 9 single point mutants of the enzyme were experimentally assessed for their enantioselectivity and used as a learning dataset to build a model. Based on combinations of the 9 single point mutations (2^9), the enantioselectivity of these 512 variants was predicted, and candidates were experimentally checked: better mutants with higher enantioselectivity were indeed found.
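The size of the search space described in the abstract above follows from treating each of the nine single point mutations as present or absent, which gives 2^9 = 512 variants including the wild type. A minimal sketch of that enumeration follows; the mutation labels are hypothetical placeholders, since the abstract does not list the actual substitutions, and the function name is an assumption.

```python
from itertools import product

# Hypothetical placeholder labels; the real mutations are not given in the abstract.
MUTATIONS = [f"M{i}" for i in range(1, 10)]


def all_combinations(mutations):
    """Yield every subset of the single point mutations (2^n variants)."""
    for mask in product((0, 1), repeat=len(mutations)):
        # Keep the mutations whose bit is set; the all-zero mask is the wild type.
        yield tuple(m for m, bit in zip(mutations, mask) if bit)


variants = list(all_combinations(MUTATIONS))
```

A model trained on the measured single mutants would then be asked to score each of these variants so that only the most promising candidates go back to the wet lab.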
Affiliation(s)
- Frédéric Cadet
- PEACCEL, Protein Engineering Accelerator, Paris, France.
| | | | - Guangyue Li
- Department of Chemistry, Philipps-University, 35032, Marburg, Germany
| | - Joaquin Sanchis
- Faculty of Pharmacy and Pharmaceutical Sciences, Monash University, Parkville, Australia
| | | | | | | | - Bernard Offmann
- UFIP, UMR 6286 CNRS, UFR Sciences et Techniques, Université de Nantes, Nantes, France
| | - Manfred T Reetz
- Department of Chemistry, Philipps-University, 35032, Marburg, Germany
- Max-Planck-Institut fuer Kohlenforschung, 45470, Mülheim, Germany
| |
|
26
|
Saito Y, Oikawa M, Nakazawa H, Niide T, Kameda T, Tsuda K, Umetsu M. Machine-Learning-Guided Mutagenesis for Directed Evolution of Fluorescent Proteins. ACS Synth Biol 2018; 7:2014-2022. [PMID: 30103599 DOI: 10.1021/acssynbio.8b00155] [Citation(s) in RCA: 75] [Impact Index Per Article: 12.5] [Indexed: 01/21/2023]
Abstract
Molecular evolution based on mutagenesis is widely used in protein engineering. However, optimal proteins are often difficult to obtain due to a large sequence space. Here, we propose a novel approach that combines molecular evolution with machine learning. In this approach, we conduct two rounds of mutagenesis where an initial library of protein variants is used to train a machine-learning model to guide mutagenesis for the second-round library. This enables us to prepare a small library suited for screening experiments with high enrichment of functional proteins. We demonstrated a proof-of-concept of our approach by altering the reference green fluorescent protein (GFP) so that its fluorescence is changed into yellow. We successfully obtained a number of proteins showing yellow fluorescence, 12 of which had longer wavelengths than the reference yellow fluorescent protein (YFP). These results show the potential of our approach as a powerful method for directed evolution of fluorescent proteins.
Affiliation(s)
- Yutaka Saito
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
| | - Misaki Oikawa
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
| | - Hikaru Nakazawa
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
| | - Teppei Niide
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
| | - Tomoshi Kameda
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
| | - Koji Tsuda
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
- Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo-ku, Tokyo 103-0027, Japan
- Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, 1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan
| | - Mitsuo Umetsu
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
- Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo-ku, Tokyo 103-0027, Japan
| |
|
27
|
A Quantitative Structure-Property Relationship Model Based on Chaos-Enhanced Accelerated Particle Swarm Optimization Algorithm and Back Propagation Artificial Neural Network. APPLIED SCIENCES-BASEL 2018. [DOI: 10.3390/app8071121] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 12/26/2022]
|