1
Lu C, Lubin JH, Sarma VV, Stentz SZ, Wang G, Wang S, Khare SD. Prediction and design of protease enzyme specificity using a structure-aware graph convolutional network. Proc Natl Acad Sci U S A 2023; 120:e2303590120. [PMID: 37729196 PMCID: PMC10523478 DOI: 10.1073/pnas.2303590120] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/05/2023] [Accepted: 08/14/2023] [Indexed: 09/22/2023]
Abstract
Site-specific proteolysis by the enzymatic cleavage of small linear sequence motifs is a key posttranslational modification involved in physiology and disease. The ability to robustly and rapidly predict protease-substrate specificity would also enable targeted proteolytic cleavage by designed proteases. Current methods for predicting protease specificity are limited to sequence pattern recognition in experimentally derived cleavage data obtained for libraries of potential substrates and generated separately for each protease variant. We reasoned that a more semantically rich and robust model of protease specificity could be developed by incorporating the energetics of molecular interactions between protease and substrates into machine learning workflows. We present Protein Graph Convolutional Network (PGCN), which develops a physically grounded, structure-based molecular interaction graph representation that describes molecular topology and interaction energetics to predict enzyme specificity. We show that PGCN accurately predicts the specificity landscapes of several variants of two model proteases. Node and edge ablation tests identified key graph elements for specificity prediction, some of which are consistent with known biochemical constraints for protease:substrate recognition. We used a pretrained PGCN model to guide the design of protease libraries for cleaving two noncanonical substrates, and found good agreement with experimental cleavage results. Importantly, the model can accurately assess designs featuring diversity at positions not present in the training data. The described methodology should enable the structure-based prediction of specificity landscapes of a wide variety of proteases and the construction of tailor-made protease editors for site-selectively and irreversibly modifying chosen target proteins.
Affiliation(s)
- Changpeng Lu
  - Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Joseph H. Lubin
  - Department of Chemistry and Chemical Biology, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Vidur V. Sarma
  - Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Guanyang Wang
  - Department of Statistics, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Sijian Wang
  - Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
  - Department of Statistics, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Sagar D. Khare
  - Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
  - Department of Chemistry and Chemical Biology, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
2
Tague EP, McMahan JB, Tague N, Dunlop MJ, Ngo JT. Controlled Protein Activities with Viral Proteases, Antiviral Peptides, and Antiviral Drugs. ACS Chem Biol 2023; 18:1228-1236. [PMID: 37140437 PMCID: PMC10501127 DOI: 10.1021/acschembio.3c00138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2023]
Abstract
Chemical control of protein activity is a powerful tool for scientific study, synthetic biology, and cell therapy; however, for broad use, effective chemical inducer systems must minimally crosstalk with endogenous processes and exhibit desirable drug delivery properties. Accordingly, the drug-controllable proteolytic activity of hepatitis C cis-protease NS3 and its associated antiviral drugs have been used to regulate protein activity and gene modulation. These tools advantageously exploit non-eukaryotic and non-prokaryotic proteins and clinically approved inhibitors. Here, we expand the toolkit by utilizing catalytically inactive NS3 protease as a high affinity binder to genetically encoded, antiviral peptides. Through our approach, we create NS3-peptide complexes that can be displaced by FDA-approved drugs to modulate transcription, cell signaling, and split-protein complementation. With our developed system, we invented a new mechanism to allosterically regulate Cre recombinase. Allosteric Cre regulation with NS3 ligands enables orthogonal recombination tools in eukaryotic cells and functions in divergent organisms to control prokaryotic recombinase activity.
Affiliation(s)
- Elliot P Tague
  - Department of Biomedical Engineering and Biological Design Center, Boston University, Boston, Massachusetts 02215, United States
- Jeffrey B McMahan
  - Department of Biomedical Engineering and Biological Design Center, Boston University, Boston, Massachusetts 02215, United States
- Nathan Tague
  - Department of Biomedical Engineering and Biological Design Center, Boston University, Boston, Massachusetts 02215, United States
- Mary J Dunlop
  - Department of Biomedical Engineering and Biological Design Center, Boston University, Boston, Massachusetts 02215, United States
- John T Ngo
  - Department of Biomedical Engineering and Biological Design Center, Boston University, Boston, Massachusetts 02215, United States
3
Tague EP, McMahan JB, Tague N, Dunlop MJ, Ngo JT. Controlled protein activities with viral proteases, antiviral peptides, and antiviral drugs. bioRxiv 2023:2023.02.27.530290. [PMID: 36909459 PMCID: PMC10002686 DOI: 10.1101/2023.02.27.530290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/03/2023]
Abstract
Chemical control of protein activity is a powerful tool for scientific study, synthetic biology, and cell therapy; however, for broad use, effective chemical inducer systems must minimally crosstalk with endogenous processes and exhibit desirable drug delivery properties. Accordingly, the drug-controllable proteolytic activity of hepatitis C cis-protease NS3 and its associated antiviral drugs have been used to regulate protein activity and gene modulation. These tools advantageously exploit non-eukaryotic and non-prokaryotic proteins and clinically approved inhibitors. Here, we expand the toolkit by utilizing catalytically inactive NS3 protease as a high affinity binder to genetically encoded, antiviral peptides. Through our approach, we create NS3-peptide complexes that can be displaced by FDA-approved drugs to modulate transcription, cell signaling, and split-protein complementation. With our developed system, we discover a new mechanism to allosterically regulate Cre recombinase. Allosteric Cre regulation with NS3 ligands enables orthogonal recombination tools in eukaryotic cells and functions in divergent organisms to control prokaryotic recombinase activity.
4
Belyaeva J, Zlobin A, Maslova V, Golovin A. Modern non-polarizable force fields diverge in modeling the enzyme-substrate complex of a canonical serine protease. Phys Chem Chem Phys 2023; 25:6352-6361. [PMID: 36779321 DOI: 10.1039/d2cp05502c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/12/2023]
Abstract
Classical molecular dynamics simulation is a powerful and established method of modern computational chemistry. Obtaining accurate information on molecular behavior is crucial for gaining insights into structure-function relationships that translate into fundamental findings and practical applications. Active sites of enzymes are known to be particularly intricate; therefore, simpler non-polarizable force fields may describe them inaccurately. In this work, we tested this hypothesis for trypsin, a canonical serine-triad protease, in complex with a substrate-mimicking inhibitor. We tested six modern and popular force fields and found that they can give significantly diverging results. Amber FB-15 and OPLS-AA/M modeled the active site incorrectly. Amber ff19sb and ff15ipq demonstrated mixed performance. The best-performing force fields were CHARMM36m and Amber ff99sb-ildn; they are therefore recommended for use with this and related systems. We speculate that a similar lack of cross-force-field convergence may be characteristic of other enzymatic systems. Therefore, we advocate for careful consideration of different force fields in any study within the field of computational enzymology.
Affiliation(s)
- Julia Belyaeva
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119991, Moscow, Russia
  - Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 117997, Moscow, Russia
- Alexander Zlobin
  - Sirius University of Science and Technology, 354340, Sochi, Russia
  - Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, 119991, Moscow, Russia
- Valentina Maslova
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119991, Moscow, Russia
  - Sirius University of Science and Technology, 354340, Sochi, Russia
- Andrey Golovin
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119991, Moscow, Russia
  - Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 117997, Moscow, Russia
  - Sirius University of Science and Technology, 354340, Sochi, Russia
5
Lu C, Lubin JH, Sarma VV, Stentz SZ, Wang G, Wang S, Khare SD. Prediction and Design of Protease Enzyme Specificity Using a Structure-Aware Graph Convolutional Network. bioRxiv 2023:2023.02.16.528728. [PMID: 36824945 PMCID: PMC9949123 DOI: 10.1101/2023.02.16.528728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023]
Abstract
Site-specific proteolysis by the enzymatic cleavage of small linear sequence motifs is a key post-translational modification involved in physiology and disease. The ability to robustly and rapidly predict protease substrate specificity would also enable targeted proteolytic cleavage - editing - of a target protein by designed proteases. Current methods for predicting protease specificity are limited to sequence pattern recognition in experimentally-derived cleavage data obtained for libraries of potential substrates and generated separately for each protease variant. We reasoned that a more semantically rich and robust model of protease specificity could be developed by incorporating the three-dimensional structure and energetics of molecular interactions between protease and substrates into machine learning workflows. We present Protein Graph Convolutional Network (PGCN), which develops a physically-grounded, structure-based molecular interaction graph representation that describes molecular topology and interaction energetics to predict enzyme specificity. We show that PGCN accurately predicts the specificity landscapes of several variants of two model proteases: the NS3/4 protease from the Hepatitis C virus (HCV) and the Tobacco Etch Virus (TEV) proteases. Node and edge ablation tests identified key graph elements for specificity prediction, some of which are consistent with known biochemical constraints for protease:substrate recognition. We used a pre-trained PGCN model to guide the design of TEV protease libraries for cleaving two non-canonical substrates, and found good agreement with experimental cleavage results. Importantly, the model can accurately assess designs featuring diversity at positions not present in the training data. 
The described methodology should enable the structure-based prediction of specificity landscapes of a wide variety of proteases and the construction of tailor-made protease editors for site-selectively and irreversibly modifying chosen target proteins.
Affiliation(s)
- Changpeng Lu
  - Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
- Joseph H. Lubin
  - Department of Chemistry & Chemical Biology, Rutgers - The State University of New Jersey, Piscataway, NJ
- Vidur V. Sarma
  - Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
- Guanyang Wang
  - Department of Statistics, Rutgers - The State University of New Jersey, Piscataway, NJ
- Sijian Wang
  - Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
  - Department of Statistics, Rutgers - The State University of New Jersey, Piscataway, NJ
- Sagar D. Khare
  - Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
  - Department of Chemistry & Chemical Biology, Rutgers - The State University of New Jersey, Piscataway, NJ
6
Veit-Acosta M, de Azevedo Junior WF. Computational Prediction of Binding Affinity for CDK2-ligand Complexes. A Protein Target for Cancer Drug Discovery. Curr Med Chem 2021; 29:2438-2455. [PMID: 34365938 DOI: 10.2174/0929867328666210806105810] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/25/2021] [Revised: 06/15/2021] [Accepted: 06/22/2021] [Indexed: 11/22/2022]
Abstract
BACKGROUND: CDK2 participates in the control of eukaryotic cell-cycle progression. Due to the great interest in CDK2 for drug development and the relative ease of crystallizing this enzyme, over 400 structural studies have focused on this protein target. These structural data are the basis for the development of computational models to estimate CDK2-ligand binding affinity.
OBJECTIVE: This work focuses on the recent developments in the application of supervised machine learning modeling to develop scoring functions to predict the binding affinity of CDK2.
METHOD: We employed the structures available at the Protein Data Bank and the ligand information accessed from BindingDB, Binding MOAD, and PDBbind to evaluate the predictive performance of machine learning techniques combined with physical modeling used to calculate binding affinity. We compared this hybrid methodology with classical scoring functions available in docking programs.
RESULTS: Our comparative analysis of previously published models indicated that a model created using a combination of a mass-spring system and cross-validated Elastic Net to predict the binding affinity of CDK2-inhibitor complexes outperformed classical scoring functions available in AutoDock4 and AutoDock Vina.
CONCLUSION: All studies reviewed here suggest that targeted machine learning models are superior to classical scoring functions for calculating binding affinities. Specifically for CDK2, we see that the combination of physical modeling with supervised machine learning techniques exhibits improved predictive performance in calculating the protein-ligand binding affinity. These results find theoretical support in the application of the concept of scoring function space.
Affiliation(s)
- Martina Veit-Acosta
  - Western Michigan University, 1903 W Michigan Ave, Kalamazoo, MI 49008, United States
7
Mahajan SP, Srinivasan Y, Labonte JW, DeLisa MP, Gray JJ. Structural basis for peptide substrate specificities of glycosyltransferase GalNAc-T2. ACS Catal 2021; 11:2977-2991. [PMID: 34322281 DOI: 10.1021/acscatal.0c04609] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/22/2022]
Abstract
The polypeptide N-acetylgalactosaminyl transferase (GalNAc-T) enzyme family initiates O-linked mucin-type glycosylation. The family constitutes 20 isoenzymes in humans. GalNAc-Ts exhibit both redundancy and finely tuned specificity for a wide range of peptide substrates. In this work, we deciphered the sequence and structural motifs that determine the peptide substrate preferences for the GalNAc-T2 isoform. Our approach involved sampling and characterization of peptide-enzyme conformations obtained from Rosetta Monte Carlo-minimization-based flexible docking. We computationally scanned 19 amino acid residues at positions -1 and +1 of an eight-residue peptide substrate, which comprised a dataset of 361 (19x19) peptides with previously characterized experimental GalNAc-T2 glycosylation efficiencies. The calculations recapitulated experimental specificity data, successfully discriminating between glycosylatable and non-glycosylatable peptides with a probability of 96.5% (ROC-AUC score), a balanced accuracy of 85.5% and a false positive rate of 7.3%. The glycosylatable peptide substrates viz. peptides with proline, serine, threonine, and alanine at the -1 position of the peptide preferentially exhibited cognate sequon-like conformations. The preference for specific residues at the -1 position of the peptide was regulated by enzyme residues R362, K363, Q364, H365 and W331, which modulate the pocket size and specific enzyme-peptide interactions. For the +1 position of the peptide, enzyme residues K281 and K363 formed gating interactions with aromatics and glutamines at the +1 position of the peptide, leading to modes of peptide-binding sub-optimal for catalysis. Overall, our work revealed enzyme features that lead to the finely tuned specificity observed for a broad range of peptide substrates for the GalNAc-T2 enzyme. 
We anticipate that the key sequence and structural motifs can be extended to analyze specificities of other isoforms of the GalNAc-T family and can be used to guide design of variants with tailored specificity.
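For readers unfamiliar with the metrics reported in this abstract, the sketch below shows how balanced accuracy and false positive rate follow from confusion-matrix counts. The function name and the counts are hypothetical, chosen only so that they sum to the 361 (19x19) peptides mentioned above; they are not the study's data.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (balanced_accuracy, false_positive_rate) from confusion-matrix counts.

    tp/fn: glycosylatable peptides predicted correctly / missed.
    tn/fp: non-glycosylatable peptides predicted correctly / wrongly flagged.
    """
    sensitivity = tp / (tp + fn)           # true positive rate
    specificity = tn / (tn + fp)           # true negative rate
    balanced_accuracy = (sensitivity + specificity) / 2
    false_positive_rate = fp / (fp + tn)   # equals 1 - specificity
    return balanced_accuracy, false_positive_rate


# Hypothetical counts (assumption, not taken from the paper).
tp, fn, tn, fp = 90, 10, 240, 21
assert tp + fn + tn + fp == 19 * 19  # 361 peptides in the scanned dataset

balanced_accuracy, false_positive_rate = classification_metrics(tp, fn, tn, fp)
print(f"balanced accuracy: {balanced_accuracy:.3f}")
print(f"false positive rate: {false_positive_rate:.3f}")
```

Balanced accuracy averages the per-class recall, which makes it less sensitive than plain accuracy to an imbalance between glycosylatable and non-glycosylatable peptides.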
Affiliation(s)
- Sai Pooja Mahajan
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland 21218, United States
- Yashes Srinivasan
  - Department of Bioengineering, University of California, Los Angeles, Los Angeles, California 90095, United States
- Jason W. Labonte
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland 21218, United States
  - Department of Chemistry, Franklin & Marshall College, Lancaster, Pennsylvania 17604, United States
- Matthew P. DeLisa
  - Robert Frederick Smith School of Chemical and Biomolecular Engineering, Department of Microbiology, and Nancy E. and Peter C. Meinig School of Biomedical Engineering, Biochemistry, Molecular and Cell Biology, Cornell University, Ithaca, New York 14853, United States
- Jeffrey J. Gray
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland 21218, United States
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, Maryland 21224, United States
8
Denard CA, Paresi C, Yaghi R, McGinnis N, Bennett Z, Yi L, Georgiou G, Iverson BL. YESS 2.0, a Tunable Platform for Enzyme Evolution, Yields Highly Active TEV Protease Variants. ACS Synth Biol 2021; 10:63-71. [PMID: 33401904 DOI: 10.1021/acssynbio.0c00452] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Indexed: 12/13/2022]
Abstract
Here we describe YESS 2.0, a highly versatile version of the yeast endoplasmic sequestration screening (YESS) system suitable for engineering and characterizing protein/peptide modifying enzymes such as proteases with desired new activities. By incorporating features that modulate gene transcription as well as substrate and enzyme spatial sequestration, YESS 2.0 achieves a significantly higher operational and dynamic range compared with the original YESS. To showcase the new advantages of YESS 2.0, we improved an already efficient TEV protease variant (TEV-EAV) to obtain a variant (eTEV) with a 2.25-fold higher catalytic efficiency, derived almost entirely from an increase in turnover rate (kcat). In our analysis, eTEV specifically digests a fusion protein in 2 h at a low 1:200 enzyme to substrate ratio. Structural modeling indicates that the increase in catalytic efficiency of eTEV is likely due to an enhanced interaction between the catalytic Cys151 with the P1 substrate residue (Gln). Furthermore, the modeling showed that the ENLYFQS peptide substrate is buried to a larger extent in the active site of eTEV compared with WT TEV. The new eTEV variant is functionally the fastest TEV variant reported to date and could potentially improve efficiency in any TEV application.
Affiliation(s)
- Carl A. Denard
  - Department of Chemistry, University of Texas at Austin, Austin, Texas 78712, United States
- Chelsea Paresi
  - Department of Chemistry, University of Texas at Austin, Austin, Texas 78712, United States
- Rasha Yaghi
  - Department of Chemistry, University of Texas at Austin, Austin, Texas 78712, United States
- Natalie McGinnis
  - Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, Texas 78712, United States
- Zachary Bennett
  - Department of Biomedical Engineering, University of Texas at Austin, Austin, Texas 78712, United States
- Li Yi
  - Department of Chemistry, University of Texas at Austin, Austin, Texas 78712, United States
- George Georgiou
  - Department of Chemical Engineering, University of Texas at Austin, Austin, Texas 78712, United States
- Brent L. Iverson
  - Department of Chemistry, University of Texas at Austin, Austin, Texas 78712, United States
9
Abstract
Molecular dynamics (MD) simulations have become increasingly useful in the modern drug development process. In this review, we give a broad overview of the current application possibilities of MD in drug discovery and pharmaceutical development. Starting from the target validation step of the drug development process, we give several examples of how MD studies can give important insights into the dynamics and function of identified drug targets such as sirtuins, RAS proteins, or intrinsically disordered proteins. The role of MD in antibody design is also reviewed. In the lead discovery and lead optimization phases, MD facilitates the evaluation of the binding energetics and kinetics of the ligand-receptor interactions, therefore guiding the choice of the best candidate molecules for further development. The importance of considering the biological lipid bilayer environment in the MD simulations of membrane proteins is also discussed, using G-protein coupled receptors and ion channels as well as the drug-metabolizing cytochrome P450 enzymes as relevant examples. Lastly, we discuss the emerging role of MD simulations in facilitating the pharmaceutical formulation development of drugs and candidate drugs. Specifically, we look at how MD can be used in studying the crystalline and amorphous solids, the stability of amorphous drug or drug-polymer formulations, and drug solubility. Moreover, since nanoparticle drug formulations are of great interest in the field of drug delivery research, different applications of nano-particle simulations are also briefly summarized using multiple recent studies as examples. In the future, the role of MD simulations in facilitating the drug development process is likely to grow substantially with the increasing computer power and advancements in the development of force fields and enhanced MD methodologies.
10
Ochoa R, Magnitov M, Laskowski RA, Cossio P, Thornton JM. An automated protocol for modelling peptide substrates to proteases. BMC Bioinformatics 2020; 21:586. [PMID: 33375946 PMCID: PMC7771086 DOI: 10.1186/s12859-020-03931-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/07/2020] [Accepted: 12/09/2020] [Indexed: 11/21/2022]
Abstract
BACKGROUND: Proteases are key drivers in many biological processes, in part due to their specificity towards their substrates. However, depending on the family and molecular function, they can also display substrate promiscuity which can also be essential. Databases compiling specificity matrices derived from experimental assays have provided valuable insights into protease substrate recognition. Despite this, there are still gaps in our knowledge of the structural determinants. Here, we compile a set of protease crystal structures with bound peptide-like ligands to create a protocol for modelling substrates bound to protease structures, and for studying observables associated to the binding recognition.
RESULTS: As an application, we modelled a subset of protease-peptide complexes for which experimental cleavage data are available to compare with informational entropies obtained from protease-specificity matrices. The modelled complexes were subjected to conformational sampling using the Backrub method in Rosetta, and multiple observables from the simulations were calculated and compared per peptide position. We found that some of the calculated structural observables, such as the relative accessible surface area and the interaction energy, can help characterize a protease's substrate recognition, giving insights for the potential prediction of novel substrates by combining additional approaches.
CONCLUSION: Overall, our approach provides a repository of protease structures with annotated data, and an open source computational protocol to reproduce the modelling and dynamic analysis of the protease-peptide complexes.
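The informational entropy of one position in a specificity matrix can be illustrated with a minimal sketch, assuming Shannon entropy in bits computed over per-position residue counts. The function name and the example counts are hypothetical, and the paper's exact entropy definition may differ.

```python
import math


def position_entropy(residue_counts):
    """Shannon entropy (bits) of the residue distribution at one substrate position.

    residue_counts maps one-letter amino acid codes to observed cleavage counts.
    Low entropy indicates a specific position; high entropy a promiscuous one.
    """
    total = sum(residue_counts.values())
    if total <= 0:
        raise ValueError("position has no observations")
    entropy = 0.0
    for count in residue_counts.values():
        if count > 0:  # zero-count residues contribute nothing
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical counts (assumption): a strict position and a permissive one.
strict_position = {"Q": 40}
permissive_position = {"A": 10, "S": 10, "G": 10, "T": 10}

print(position_entropy(strict_position))
print(position_entropy(permissive_position))
```

Comparing such per-position entropies against structural observables like interaction energy is the kind of analysis the abstract describes; the sketch covers only the entropy side.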
Affiliation(s)
- Rodrigo Ochoa
  - Biophysics of Tropical Diseases, Max Planck Tandem Group, University of Antioquia, 050010, Medellín, Colombia
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Mikhail Magnitov
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
  - Department of Biological and Medical Physics, Moscow Institute of Physics and Technology (National Research University), Dolgoprudny, Russia, 141701
- Roman A Laskowski
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Pilar Cossio
  - Biophysics of Tropical Diseases, Max Planck Tandem Group, University of Antioquia, 050010, Medellín, Colombia
  - Department of Theoretical Biophysics, Max Planck Institute of Biophysics, 60438, Frankfurt am Main, Germany
- Janet M Thornton
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
11
Timucin AC. Structure based peptide design, molecular dynamics and MM-PBSA studies for targeting C terminal dimerization of NFAT5 DNA binding domain. J Mol Graph Model 2020; 103:107804. [PMID: 33248341 DOI: 10.1016/j.jmgm.2020.107804] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/14/2020] [Revised: 11/10/2020] [Accepted: 11/10/2020] [Indexed: 11/27/2022]
Abstract
NFAT5, a transcription factor with an established role in the osmotic stress response, has also been shown to be active in numerous settings, including pathological conditions such as diabetic microvascular complications, chronic arthritis and cancer. Despite these links, current strategies for downregulating NFAT5 activity rely only on indirect modulators that do not target NFAT5 itself. In this study, a computational approach was used to explore an original peptide that directly targets the C-terminal dimerization of the NFAT5 RHR, located in its DNA binding domain. First, homodimeric NFAT5 RHR bound to its consensus DNA was used to predict a preliminary peptide sequence. Possible amino acid replacements for this preliminary peptide were then predicted for optimization, followed by addition of a cell-penetrating peptide sequence. These steps yielded a small peptide library, which was further investigated for peptide affinities towards the C terminus of NFAT5 RHR through molecular docking and 50 ns and 250 ns molecular dynamics simulations, followed by estimation of MM-PBSA-based relative binding free energies. Results indicated that, after mutations were introduced into the preliminary peptide sequence for optimization, a unique peptide could target the C-terminal dimerization region of NFAT5 RHR through its cell-penetrating peptide sequence. In conclusion, this is the first study presenting computational evidence for a novel peptide capable of directly targeting NFAT5 dimerization. Future implications of these observations are also discussed in terms of methodology and possible applications.
Affiliation(s)
- Ahmet Can Timucin
  - Department of Chemical Engineering, Faculty of Natural Sciences and Engineering, Üsküdar University, Turkey
  - Neuropsychopharmacology Application and Research Center (NPARC), Üsküdar University, Turkey
12
Wheeler LC, Perkins A, Wong CE, Harms MJ. Learning peptide recognition rules for a low-specificity protein. Protein Sci 2020; 29:2259-2273. [PMID: 32979254 PMCID: PMC7586891 DOI: 10.1002/pro.3958] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/03/2020] [Revised: 09/18/2020] [Accepted: 09/18/2020] [Indexed: 12/18/2022]
Abstract
Many proteins interact with short linear regions of target proteins. For some proteins, however, it is difficult to identify a well-defined sequence motif that defines their target peptides. To overcome this difficulty, we used supervised machine learning to train a model that treats each peptide as a collection of easily calculated biochemical features rather than as an amino acid sequence. As a test case, we dissected the peptide-recognition rules for human S100A5 (hA5), a low-specificity calcium binding protein. We trained a Random Forest model against a recently released, high-throughput phage display dataset collected for hA5. The model identifies hydrophobicity and shape complementarity, rather than polar contacts, as the primary determinants of peptide binding specificity in hA5. We tested this hypothesis by solving a crystal structure of hA5 and through computational docking studies of diverse peptides onto hA5. These structural studies revealed that peptides exhibit multiple binding modes at the hA5 peptide interface, all of which have few polar contacts with hA5. Finally, we used our trained model to predict new, plausible binding targets in the human proteome. This revealed a fragment of the protein α-1-syntrophin that binds to hA5. Our work helps better understand the biochemistry and biology of hA5, as well as demonstrating how high-throughput experiments coupled with machine learning of biochemical features can reveal the determinants of binding specificity in low-specificity proteins.
Affiliation(s)
- Lucas C. Wheeler
  - Institute of Molecular Biology, University of Oregon, Eugene, Oregon, USA
  - Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon, USA
  - Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, Colorado, USA
- Arden Perkins
  - Institute of Molecular Biology, University of Oregon, Eugene, Oregon, USA
  - Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon, USA
- Caitlyn E. Wong
  - Institute of Molecular Biology, University of Oregon, Eugene, Oregon, USA
  - Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon, USA
- Michael J. Harms
  - Institute of Molecular Biology, University of Oregon, Eugene, Oregon, USA
  - Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon, USA
|
13
|
Vihinen M. Functional effects of protein variants. Biochimie 2020; 180:104-120. [PMID: 33164889 DOI: 10.1016/j.biochi.2020.10.009] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 01/30/2020] [Revised: 10/15/2020] [Accepted: 10/19/2020] [Indexed: 12/11/2022]
Abstract
Genetic and other variations frequently affect protein functions. Scientific articles can contain confusing descriptions about which function or property is affected, and in many cases the statements are pure speculation without any experimental evidence. To clarify functional effects of protein variations of genetic or non-genetic origin, a systematic conceptualisation and framework are introduced. This framework describes protein functional effects on abundance, activity, specificity and affinity, along with countermeasures, which allow cells, tissues and organisms to tolerate, avoid, repair, attenuate or resist (TARAR) the effects. Effects on abundance discussed include gene dosage, restricted expression, mis-localisation and degradation. Enzymopathies, effects on kinetics, allostery and regulation of protein activity are subtopics for the effects of variants on activity. Variation outcomes on specificity and affinity comprise promiscuity, specificity, affinity and moonlighting. TARAR mechanisms redress variations with active and passive processes including chaperones, redundancy, robustness, canalisation and metabolic and signalling rewiring. A framework for pragmatic protein function analysis and presentation is introduced. All of the mechanisms and effects are described along with representative examples, most often in relation to diseases. In addition, protein function is discussed from an evolutionary point of view. Application of the presented framework facilitates unambiguous, detailed and specific description of functional effects and their systematic study.
Affiliation(s)
- Mauno Vihinen
  - Department of Experimental Medical Science, BMC B13, Lund University, SE-22 184, Lund, Sweden
|
14
|
Cognitive Framework for HIV-1 Protease Cleavage Site Classification Using Evolutionary Algorithm. Arabian Journal for Science and Engineering 2019. [DOI: 10.1007/s13369-019-03871-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
|
15
|
Fong P, Wong HK. Evaluation of Scoring Function Performance on DNA-ligand Complexes. The Open Medicinal Chemistry Journal 2019. [DOI: 10.2174/1874104501913010040] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 01/23/2023]
Abstract
Background:
DNA has been a pharmacological target for different types of treatment, such as antibiotics and chemotherapy agents, and is still a potential target in many drug discovery processes. However, most docking and scoring approaches were parameterised for protein-ligand interactions; their suitability for modelling DNA-ligand interactions is uncertain.
Objective:
This study investigated the performance of four scoring functions on DNA-ligand complexes.
Material & Methods:
Here, we explored the ability of four docking protocols and scoring functions to discriminate the native pose of 33 DNA-ligand complexes from a compiled set of 200 decoys for each DNA-ligand complex. The four approaches were AutoDock, ASP@GOLD, ChemScore@GOLD and GoldScore@GOLD.
Results:
Our results indicate that AutoDock performed best at predicting binding mode and that ChemScore@GOLD achieved the best discriminative power. Rescoring of AutoDock-generated decoys with ChemScore@GOLD further enhanced their individual discriminative powers. All four approaches had no discriminative power for some DNA-ligand complexes, including both minor groove binders and intercalators.
Conclusion:
This study suggests that the evaluation for each DNA-ligand complex should be performed in order to obtain meaningful results for any drug discovery processes. Rescoring with different scoring functions can improve discriminative power.
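Discriminating a native pose from a set of decoys amounts to asking where the native pose ranks once all poses are scored. The sketch below is a minimal illustration of that evaluation under stated assumptions, not the protocol used in the study: the function names and input layout are hypothetical, and the `lower_is_better` flag exists because AutoDock-style energies are typically lower-is-better while GOLD fitness scores are higher-is-better.

```python
def native_rank(native_score, decoy_scores, lower_is_better=True):
    """1-based rank of the native pose among native + decoys.

    Ties are counted against the native pose (a conservative choice).
    """
    if lower_is_better:
        at_least_as_good = sum(s <= native_score for s in decoy_scores)
    else:
        at_least_as_good = sum(s >= native_score for s in decoy_scores)
    return at_least_as_good + 1


def success_rate(complexes, top_n=1, lower_is_better=True):
    """Fraction of complexes whose native pose ranks within the top_n poses.

    `complexes` is a list of (native_score, decoy_scores) pairs.
    """
    if not complexes:
        raise ValueError("no complexes supplied")
    hits = sum(
        native_rank(native, decoys, lower_is_better) <= top_n
        for native, decoys in complexes
    )
    return hits / len(complexes)
```

Rescoring, as described in the Results, would correspond to generating the decoys with one program and passing scores from a different scoring function into the same ranking step.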
|
16
|
Data-driven supervised learning of a viral protease specificity landscape from deep sequencing and molecular simulations. Proc Natl Acad Sci U S A 2018; 116:168-176. [PMID: 30587591 DOI: 10.1073/pnas.1805256116] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Indexed: 12/28/2022] Open
Abstract
Biophysical interactions between proteins and peptides are key determinants of molecular recognition specificity landscapes. However, an understanding of how molecular structure and residue-level energetics at protein-peptide interfaces shape these landscapes remains elusive. We combine information from yeast-based library screening, next-generation sequencing, and structure-based modeling in a supervised machine learning approach to report the comprehensive sequence-energetics-function mapping of the specificity landscape of the hepatitis C virus (HCV) NS3/4A protease, whose function, site-specific cleavage of the viral polyprotein, is a key determinant of viral fitness. We screened a library of substrates in which five residue positions were randomized and measured cleavability of ∼30,000 substrates (∼1% of the library) using yeast display and fluorescence-activated cell sorting followed by deep sequencing. Structure-based models of a subset of experimentally derived sequences were used in a supervised learning procedure to train a support vector machine to predict the cleavability of 3.2 million substrate variants by the HCV protease. The resulting landscape allows identification of previously unidentified HCV protease substrates, and graph-theoretic analyses reveal extensive clustering of cleavable and uncleavable motifs in sequence space. Specificity landscapes of known drug-resistant variants are similarly clustered. The described approach should enable the elucidation and redesign of specificity landscapes of a wide variety of proteases, including human-origin enzymes. Our results also suggest a possible role for residue-level energetics in shaping plateau-like functional landscapes predicted from viral quasispecies theory.
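The library figures quoted in the abstract are mutually consistent, as a quick check shows. The only assumption in this sketch is that each of the five randomized positions can hold any of the 20 canonical amino acids; the variable names are illustrative.

```python
ALPHABET_SIZE = 20          # assumption: 20 canonical amino acids per position
RANDOMIZED_POSITIONS = 5    # from the abstract
MEASURED_SUBSTRATES = 30_000  # approximate count from the abstract

# Full sequence space of the randomized substrate library.
library_size = ALPHABET_SIZE ** RANDOMIZED_POSITIONS

# Fraction of the library with experimentally measured cleavability.
coverage = MEASURED_SUBSTRATES / library_size

print(f"library size: {library_size:,}")   # 3,200,000
print(f"coverage: {coverage:.2%}")         # 0.94%
```

This matches the 3.2 million variants scored by the trained model and the roughly 1% experimental coverage, which is why a learned model is needed to fill in the remaining ~99% of the landscape.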
|
17
|
Waldner BJ, Kraml J, Kahler U, Spinn A, Schauperl M, Podewitz M, Fuchs JE, Cruciani G, Liedl KR. Electrostatic recognition in substrate binding to serine proteases. J Mol Recognit 2018; 31:e2727. [PMID: 29785722 PMCID: PMC6175425 DOI: 10.1002/jmr.2727] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 11/29/2017] [Revised: 04/11/2018] [Accepted: 04/11/2018] [Indexed: 12/16/2022]
Abstract
Serine proteases of the chymotrypsin family are structurally very similar but have very different substrate preferences. This study investigates a set of 9 proteases of this family, comprising proteases that prefer substrates containing positively charged amino acids, negatively charged amino acids, and uncharged amino acids, with varying degrees of specificity. Here, we show that differences in electrostatic substrate preferences can be predicted reliably by electrostatic molecular interaction fields employing customized GRID probes. Thus, we are able to directly link protease structures to their electrostatic substrate preferences. Additionally, we present a new metric that measures similarities in substrate preferences focusing only on electrostatics. It efficiently compares these electrostatic substrate preferences between different proteases. This new metric can be interpreted as the electrostatic part of our previously developed substrate similarity metric. Consequently, we suggest that electrostatics and shape complementarity are rather orthogonal aspects of substrate recognition. This is in line with a 2-step mechanism of protein-protein recognition suggested in the literature.
Affiliation(s)
- Birgit J Waldner
  - Institute of General, Inorganic and Theoretical Chemistry, and Center for Molecular Biosciences Innsbruck (CMBI), University of Innsbruck, Innsbruck, Austria
- Johannes Kraml
  - Institute of General, Inorganic and Theoretical Chemistry, and Center for Molecular Biosciences Innsbruck (CMBI), University of Innsbruck, Innsbruck, Austria
- Ursula Kahler
  - Institute of General, Inorganic and Theoretical Chemistry, and Center for Molecular Biosciences Innsbruck (CMBI), University of Innsbruck, Innsbruck, Austria
- Alexander Spinn
  - Institute of General, Inorganic and Theoretical Chemistry, and Center for Molecular Biosciences Innsbruck (CMBI), University of Innsbruck, Innsbruck, Austria
- Michael Schauperl
  - Institute of General, Inorganic and Theoretical Chemistry, and Center for Molecular Biosciences Innsbruck (CMBI), University of Innsbruck, Innsbruck, Austria
- Maren Podewitz
  - Institute of General, Inorganic and Theoretical Chemistry, and Center for Molecular Biosciences Innsbruck (CMBI), University of Innsbruck, Innsbruck, Austria
- Julian E Fuchs
  - Institute of General, Inorganic and Theoretical Chemistry, and Center for Molecular Biosciences Innsbruck (CMBI), University of Innsbruck, Innsbruck, Austria
- Gabriele Cruciani
  - Laboratory of Chemometrics, Department of Chemistry, University of Perugia, Perugia, Italy
- Klaus R Liedl
  - Institute of General, Inorganic and Theoretical Chemistry, and Center for Molecular Biosciences Innsbruck (CMBI), University of Innsbruck, Innsbruck, Austria
|
18
|
Wang C, Greene D, Xiao L, Qi R, Luo R. Recent Developments and Applications of the MMPBSA Method. Front Mol Biosci 2018; 4:87. [PMID: 29367919 PMCID: PMC5768160 DOI: 10.3389/fmolb.2017.00087] [Citation(s) in RCA: 325] [Impact Index Per Article: 54.2] [Received: 09/18/2017] [Accepted: 11/30/2017] [Indexed: 12/23/2022] Open
Abstract
The Molecular Mechanics Poisson-Boltzmann Surface Area (MMPBSA) approach has been widely applied as an efficient and reliable free energy simulation method to model molecular recognition, such as for protein-ligand binding interactions. In this review, we focus on recent developments and applications of the MMPBSA method. The methodology review covers solvation terms, the entropy term, extensions to membrane proteins and high-speed screening, and new automation toolkits. Recent applications in various important biomedical and chemical fields are also reviewed. We conclude with a few future directions aimed at making MMPBSA a more robust and efficient method.
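For orientation, the solvation and entropy terms mentioned above enter the binding free energy in the commonly quoted textbook form of MMPBSA sketched below. The notation is a generic assumption and is not taken from the review itself: angle brackets denote averages over simulation snapshots, $E_{\mathrm{MM}}$ is the gas-phase molecular mechanics energy, $G_{\mathrm{PB}}$ the polar (Poisson-Boltzmann) solvation term, $G_{\mathrm{SA}}$ the nonpolar surface-area term, and $TS$ the solute entropy contribution.

```latex
\begin{align}
\Delta G_{\mathrm{bind}} &= \langle G_{\mathrm{complex}} \rangle
  - \langle G_{\mathrm{receptor}} \rangle
  - \langle G_{\mathrm{ligand}} \rangle \\
G_{x} &= E_{\mathrm{MM},x} + G_{\mathrm{PB},x} + G_{\mathrm{SA},x} - T S_{x},
  \qquad x \in \{\mathrm{complex},\ \mathrm{receptor},\ \mathrm{ligand}\}
\end{align}
```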
Affiliation(s)
- Changhao Wang
  - Chemical and Materials Physics Graduate Program, University of California, Irvine, Irvine, CA, United States
  - Department of Molecular Biology and Biochemistry, University of California, Irvine, Irvine, CA, United States
  - Department of Physics and Astronomy, University of California, Irvine, Irvine, CA, United States
- D'Artagnan Greene
  - Department of Molecular Biology and Biochemistry, University of California, Irvine, Irvine, CA, United States
- Li Xiao
  - Department of Molecular Biology and Biochemistry, University of California, Irvine, Irvine, CA, United States
  - Department of Biomedical Engineering, University of California, Irvine, Irvine, CA, United States
- Ruxi Qi
  - Department of Molecular Biology and Biochemistry, University of California, Irvine, Irvine, CA, United States
- Ray Luo
  - Chemical and Materials Physics Graduate Program, University of California, Irvine, Irvine, CA, United States
  - Department of Molecular Biology and Biochemistry, University of California, Irvine, Irvine, CA, United States
  - Department of Biomedical Engineering, University of California, Irvine, Irvine, CA, United States
  - Department of Chemical Engineering and Materials Science, University of California, Irvine, Irvine, CA, United States
|
19
|
Greening DW, Kapp EA, Simpson RJ. The Peptidome Comes of Age: Mass Spectrometry-Based Characterization of the Circulating Cancer Peptidome. Enzymes 2017; 42:27-64. [PMID: 29054270 DOI: 10.1016/bs.enz.2017.08.003] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 02/08/2023]
Abstract
Peptides play a seminal role in most physiological processes acting as neurotransmitters, hormones, antibiotics, and immune regulators. In the context of tumor biology, it is hypothesized that endogenous peptides, hormones, cytokines, growth factors, and aberrant degradation of select protein networks (e.g., enzymatic activities, protein shedding, and extracellular matrix remodeling) are fundamental in mediating cancer progression. Analysis of peptides in biological fluids by mass spectrometry holds promise of providing sensitive and specific diagnostic and prognostic information for cancer and other diseases. The identification of circulating peptides in the context of disease constitutes a hitherto source of new clinical biomarkers. The field of peptidomics can be defined as the identification and comprehensive analysis of physiological and pathological peptides. Like proteomics, peptidomics has been advanced by the development of new separation strategies, analytical detection methods such as mass spectrometry, and bioinformatic technologies. Unlike proteomics, peptidomics is targeted toward identifying endogenous protein and peptide fragments, defining proteolytic enzyme substrate specificity, as well as protease cleavage recognition (degradome). Peptidomics employs "top-down proteomics" strategies where mass spectrometry is applied at the proteoform level to analyze intact proteins and large endogenous peptide fragments. With recent advances in prefractionation workflows for separating peptides, mass spectrometry instrumentation, and informatics, peptidomics is an important field that promises to impact on translational medicine. This review covers the current advances in peptidomics, including top-down and imaging mass spectrometry, comprehensive quantitative peptidome analyses (developments in reproducibility and coverage), peptide prefractionation and enrichment workflows, peptidomic data analyses, and informatic tools. The application of peptidomics in cancer biomarker discovery will be discussed.
Affiliation(s)
- David W Greening
  - La Trobe Institute for Molecular Science (LIMS), La Trobe University, Melbourne, Victoria, Australia
- Eugene A Kapp
  - Systems Biology & Personalised Medicine Division, Walter & Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia; Florey Institute of Neuroscience, Parkville, Victoria, Australia; University of Melbourne, Parkville, Victoria, Australia
- Richard J Simpson
  - La Trobe Institute for Molecular Science (LIMS), La Trobe University, Melbourne, Victoria, Australia
|
20
|
Rubenstein AB, Pethe MA, Khare SD. MFPred: Rapid and accurate prediction of protein-peptide recognition multispecificity using self-consistent mean field theory. PLoS Comput Biol 2017; 13:e1005614. [PMID: 28650961 PMCID: PMC5507473 DOI: 10.1371/journal.pcbi.1005614] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 01/11/2017] [Revised: 07/11/2017] [Accepted: 06/02/2017] [Indexed: 11/24/2022] Open
Abstract
Multispecificity, the ability of a single receptor protein molecule to interact with multiple substrates, is a hallmark of molecular recognition at protein-protein and protein-peptide interfaces, including enzyme-substrate complexes. The ability to perform structure-based prediction of multispecificity would aid in the identification of novel enzyme substrates and protein interaction partners, and enable the design of novel enzymes targeted towards alternative substrates. The relatively slow speed of current biophysical, structure-based methods limits their use for prediction and, especially, design of multispecificity. Here, we develop a rapid, flexible-backbone self-consistent mean field theory-based technique, MFPred, for multispecificity modeling at protein-peptide interfaces. We benchmark our method by predicting experimentally determined peptide specificity profiles for a range of receptors: protease and kinase enzymes, and protein recognition modules including SH2, SH3, MHC Class I and PDZ domains. We observe robust recapitulation of known specificities for all receptor-peptide complexes, and comparison with other methods shows that MFPred results in equivalent or better prediction accuracy with a ~10- to 1000-fold decrease in computational expense. We find that modeling bound peptide backbone flexibility is key to the observed accuracy of the method. We used MFPred to predict with high accuracy the impact of receptor-side mutations on the experimentally determined multispecificity of a protease enzyme. Our approach should enable the design of a wide range of altered receptor proteins with programmed multispecificities.
Affiliation(s)
- Aliza B. Rubenstein
  - Computational Biology & Molecular Biophysics Program, Rutgers, The State University of New Jersey, Piscataway, NJ
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ
- Manasi A. Pethe
  - Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ
  - Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ
- Sagar D. Khare
  - Computational Biology & Molecular Biophysics Program, Rutgers, The State University of New Jersey, Piscataway, NJ
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ
  - Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ
  - Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ
|