1
Harding-Larsen D, Funk J, Madsen NG, Gharabli H, Acevedo-Rocha CG, Mazurenko S, Welner DH. Protein representations: Encoding biological information for machine learning in biocatalysis. Biotechnol Adv 2024; 77:108459. [PMID: 39366493 DOI: 10.1016/j.biotechadv.2024.108459] [Received: 04/18/2024] [Revised: 09/19/2024] [Accepted: 09/29/2024] [Indexed: 10/06/2024]
Abstract
Enzymes offer an environmentally friendly, low-impact alternative to conventional chemistry, but they often require additional engineering before they can be applied in industrial settings, an endeavour that is challenging and laborious. To address this issue, machine learning can be harnessed to produce predictive models that enable the in silico study and engineering of improved enzymatic properties. Such machine learning models, however, require the complex biological information to be converted into numerical inputs, also called protein representations. Because these inputs demand special attention to ensure the training of accurate and precise models, this review examines the critical step of encoding protein information into numeric representations for use in machine learning. We selected the most important approaches for encoding the three distinct biological protein representations (primary sequence, 3D structure, and dynamics) to explore their requirements for use and their inductive biases. Combined representations of proteins and substrates are also introduced as emerging tools in biocatalysis. We propose dividing representations into fixed representations, a collection of rule-based encoding strategies, and learned representations, which are extracted from the latent spaces of large neural networks. We propose two main factors to consider when selecting the most suitable protein representation. The first is the model setup, which is influenced by the size of the training dataset and the choice of architecture. The second is the model objectives, such as the assayed property, the difference between wild-type models and mutant predictors, and requirements for explainability. This review is intended to serve as a source of information and guidance for properly representing enzymes in future machine learning models for biocatalysis.
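As a minimal illustration of the fixed, rule-based representations described in this abstract, the sketch below one-hot encodes a primary sequence. It is not taken from the review: the alphabet ordering, the `one_hot_encode` and `flatten` helpers, and the choice to reject non-canonical residues are assumptions made for illustration. A learned representation would instead be read out of the latent space of a pretrained neural network, which is not shown here.

```python
# Sketch of a fixed (rule-based) sequence representation: one-hot encoding.
# ASSUMPTION: 20 canonical amino acids in alphabetical one-letter order;
# any other letter raises an error instead of being silently encoded.
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 canonical residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence: str) -> List[List[int]]:
    """Return an L x 20 matrix with exactly one 1 per residue position."""
    matrix = []
    for position, residue in enumerate(sequence.upper()):
        if residue not in AA_INDEX:
            raise ValueError(f"non-canonical residue {residue!r} at position {position}")
        row = [0] * len(AMINO_ACIDS)
        row[AA_INDEX[residue]] = 1
        matrix.append(row)
    return matrix


def flatten(matrix: List[List[int]]) -> List[int]:
    """Concatenate rows into a single feature vector of length 20 * L."""
    return [value for row in matrix for value in row]


if __name__ == "__main__":
    encoded = one_hot_encode("MKV")
    print(len(encoded), len(encoded[0]))  # 3 20
    print(sum(flatten(encoded)))  # 3
```

Because every position maps to a fixed 20-dimensional vector, sequences of equal length (for example, point mutants of one wild type) yield directly comparable feature vectors; sequences of different lengths would need padding or pooling before being fed to a model with a fixed-size input.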
Affiliation(s)
- David Harding-Larsen
- The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Jonathan Funk
- The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Niklas Gesmar Madsen
- The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Hani Gharabli
- The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Carlos G Acevedo-Rocha
- The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Stanislav Mazurenko
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic; International Clinical Research Center, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Ditte Hededam Welner
- The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
2
Jaufer AM, Bouhadana A, Kharrazizadeh A, Zhou M, Colina CM, Fanucci GE. Designing surface exposed sites on Bacillus subtilis lipase A for spin-labeling and hydration studies. Biophys Chem 2024; 308:107203. [PMID: 38382282 DOI: 10.1016/j.bpc.2024.107203] [Received: 01/19/2024] [Revised: 02/09/2024] [Accepted: 02/15/2024] [Indexed: 02/23/2024]
Abstract
Spin-labeling with electron paramagnetic resonance (EPR) spectroscopy is a facile method for interrogating macromolecular flexibility, conformational changes, accessibility, and hydration. Herein we present a computationally based approach for the rational selection of reporter sites in Bacillus subtilis lipase A (BSLA) for substitution to cysteine and subsequent modification with a spin-label, choosing sites that are expected not to significantly perturb the wild-type structure, dynamics, or enzymatic function. Experimental circular dichroism spectroscopy, Michaelis-Menten kinetic parameters, and EPR spectroscopy data validate this approach to computationally selecting reporter sites for future magnetic resonance investigations of hydration, and of hydration changes induced by polymer conjugation, tethering, immobilization, or amino acid substitution in BSLA. Molecular dynamics simulations of the impact of substitutions on secondary structure agree well with experimental findings. We propose that this computationally guided approach for choosing spin-labeled EPR reporter sites, which evaluates relative surface accessibility together with the hydrogen bonding occupancy of amino acids to the catalytic pocket via atomistic simulations, should be readily transferable to other macromolecular systems of interest, including selecting sites for paramagnetic relaxation enhancement NMR studies, other spin-labeling EPR studies, or any tagging method for which it is desirable not to alter enzyme stability or activity.
Affiliation(s)
- Afnan M Jaufer
- Department of Chemistry, University of Florida, PO BOX 117200, Gainesville, FL 32611, USA; George and Josephine Butler Polymer Research Laboratory, University of Florida, Gainesville, FL 32611, USA.
- Adam Bouhadana
- Department of Chemistry, University of Florida, PO BOX 117200, Gainesville, FL 32611, USA.
- Amir Kharrazizadeh
- Department of Chemistry, University of Florida, PO BOX 117200, Gainesville, FL 32611, USA.
- Mingwei Zhou
- Department of Chemistry, University of Florida, PO BOX 117200, Gainesville, FL 32611, USA.
- Coray M Colina
- Department of Chemistry, University of Florida, PO BOX 117200, Gainesville, FL 32611, USA; George and Josephine Butler Polymer Research Laboratory, University of Florida, Gainesville, FL 32611, USA; Department of Materials Science and Engineering, University of Florida, PO BOX 117200, Gainesville, FL 32611, USA.
- Gail E Fanucci
- Department of Chemistry, University of Florida, PO BOX 117200, Gainesville, FL 32611, USA.
3
Jaufer AM, Bouhadana A, Fanucci GE. Hydrophobic Clusters Regulate Surface Hydration Dynamics of Bacillus subtilis Lipase A. J Phys Chem B 2024; 128:3919-3928. [PMID: 38628066 DOI: 10.1021/acs.jpcb.4c00405] [Indexed: 04/26/2024]
Abstract
The surface hydration diffusivity of Bacillus subtilis lipase A (BSLA) has been characterized by low-field Overhauser dynamic nuclear polarization (ODNP) relaxometry using a series of spin-labeled constructs. Sites for spin-label incorporation were previously designed via an atomistic computational approach that screened for surface exposure, so that the sites would reflect surface hydration comparable to that of other proteins studied by this method, and for minimal impact on the function, dynamics, and structure of BSLA, by excluding any surface site that participated in a hydrogen bonding network within BSLA with greater than 30% occupancy. Experimental ODNP relaxometry coupling factors confirm that the overall surface hydration behavior of these spin-labeled BSLA sites is similar to that of other globular proteins. Here, by plotting the ODNP parameters of relative diffusive water versus relative bound water, we introduce an effective "phase-space" analysis, which provides a facile visual comparison of the ODNP parameters of the various biomolecular systems studied to date. We find notable differences when comparing BSLA to other systems, as well as when comparing different clusters on the surface of BSLA. Specifically, we find a grouping of sites whose spin-label surface locations correspond to the two main hydrophobic core clusters of the branched aliphatic amino acids isoleucine, leucine, and valine observed in the BSLA crystal structure. The results imply that hydrophobic clustering may dictate local surface hydration properties, perhaps through modulation of protein conformations and sampling of unfolded states, providing insights into how the dynamics of the hydration shell are coupled to protein motion and fluctuations.
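The site-screening criteria summarized in this abstract (surface exposure plus exclusion of sites whose hydrogen bonding exceeds 30% occupancy) can be sketched as a simple filter over per-residue descriptors. This is a hypothetical sketch, not the authors' pipeline: the `Site` record, the helper names, the 0.30 relative solvent accessibility cutoff, and the toy values in the usage block are assumptions; only the 30% occupancy exclusion comes from the abstract. Occupancy is taken here as the fraction of MD frames in which a hydrogen bond is present.

```python
from dataclasses import dataclass
from typing import List, Sequence

# HBOND_MAX reflects the abstract (sites with >30% occupancy are excluded).
# RSA_MIN is an ASSUMPTION for illustration, not a value reported by the authors.
RSA_MIN = 0.30
HBOND_MAX = 0.30


@dataclass(frozen=True)
class Site:
    label: str              # hypothetical residue label
    rsa: float              # relative solvent accessibility, 0..1
    hbond_occupancy: float  # highest occupancy among the residue's H-bonds, 0..1


def occupancy(present_per_frame: Sequence[bool]) -> float:
    """Fraction of trajectory frames in which a given hydrogen bond is present."""
    if not present_per_frame:
        raise ValueError("need at least one frame")
    return sum(present_per_frame) / len(present_per_frame)


def select_reporter_sites(sites: Sequence[Site], rsa_min: float = RSA_MIN,
                          hbond_max: float = HBOND_MAX) -> List[str]:
    """Keep surface-exposed sites whose H-bond occupancy does not exceed hbond_max."""
    return [s.label for s in sites
            if s.rsa >= rsa_min and s.hbond_occupancy <= hbond_max]


if __name__ == "__main__":
    # Toy values for illustration only; these are not BSLA measurements.
    candidates = [
        Site("site_1", rsa=0.65, hbond_occupancy=occupancy([True, False, False, False])),
        Site("site_2", rsa=0.70, hbond_occupancy=0.45),  # excluded: occupancy > 30%
        Site("site_3", rsa=0.15, hbond_occupancy=0.05),  # excluded: buried
    ]
    print(select_reporter_sites(candidates))  # ['site_1']
```

A site at exactly 30% occupancy is kept, matching the abstract's wording that only sites with greater than 30% occupancy are excluded.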
Affiliation(s)
- Afnan M Jaufer
- Department of Chemistry, University of Florida, P.O. Box 117200, Gainesville, Florida 32611, United States
- George and Josephine Butler Polymer Research Laboratory, University of Florida, Gainesville, Florida 32611, United States
- Adam Bouhadana
- Department of Chemistry, University of Florida, P.O. Box 117200, Gainesville, Florida 32611, United States
- Gail E Fanucci
- Department of Chemistry, University of Florida, P.O. Box 117200, Gainesville, Florida 32611, United States
- George and Josephine Butler Polymer Research Laboratory, University of Florida, Gainesville, Florida 32611, United States