1
Childs H, Guerin N, Zhou P, Donald BR. Protocol for Designing De Novo Noncanonical Peptide Binders in OSPREY. J Comput Biol 2024; 31:965-974. [PMID: 39364612 PMCID: PMC11698684 DOI: 10.1089/cmb.2024.0669] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2024]
Abstract
D-peptides, the mirror image of canonical L-peptides, offer numerous biological advantages that make them effective therapeutics. This article details how to use DexDesign, the newest OSPREY-based algorithm, for designing these D-peptides de novo. OSPREY physics-based models precisely mimic energy-equivariant reflection operations, enabling the generation of D-peptide scaffolds from L-peptide templates. Due to the scarcity of D-peptide:L-protein structural data, DexDesign calls a geometric hashing algorithm, Method of Accelerated Search for Tertiary Ensemble Representatives, as a subroutine to produce a synthetic structural dataset. DexDesign enables mixed-chirality designs with a new user interface and also reduces the conformation and sequence search space using three new design techniques: Minimum Flexible Set, Inverse Alanine Scanning, and K*-based Mutational Scanning.
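The reflection idea in this abstract can be illustrated with a minimal sketch. It assumes a toy four-atom residue with made-up coordinates and a hypothetical helper `reflect_across_xy_plane`; it is not OSPREY or DexDesign code. Mirroring through a plane flips the sign of the chiral volume at the alpha carbon while leaving every interatomic distance unchanged, which is why an energy term that depends only on distances would take the same value for the mirrored structure.

```python
import numpy as np

def reflect_across_xy_plane(coords):
    """Mirror 3D coordinates through the z = 0 plane (hypothetical helper)."""
    mirrored = np.array(coords, dtype=float)
    mirrored[:, 2] *= -1.0
    return mirrored

def signed_volume(center, a, b, c):
    """Signed volume of the tetrahedron formed by three substituents around a center.
    Its sign flips under reflection, so it acts as a simple chirality indicator."""
    return float(np.dot(a - center, np.cross(b - center, c - center))) / 6.0

def pairwise_distances(coords):
    """All-against-all Euclidean distances between atoms."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# Toy coordinates (illustrative only, not taken from a real structure).
l_residue = np.array([
    [0.0, 0.0, 0.0],    # CA
    [1.45, 0.0, 0.0],   # N
    [-0.5, 1.4, 0.0],   # C
    [-0.5, -0.7, 1.2],  # CB
])
d_residue = reflect_across_xy_plane(l_residue)

chirality_l = signed_volume(*l_residue)
chirality_d = signed_volume(*d_residue)
distances_preserved = np.allclose(
    pairwise_distances(l_residue), pairwise_distances(d_residue)
)
```

Here `chirality_l` and `chirality_d` have equal magnitude and opposite sign, and `distances_preserved` is true.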
Affiliation(s)
- Henry Childs
  - Department of Chemistry, Duke University, Durham, North Carolina, USA
- Nathan Guerin
  - Department of Computer Science, Duke University, Durham, North Carolina, USA
- Pei Zhou
  - Department of Biochemistry, Duke University School of Medicine, Durham, North Carolina, USA
- Bruce R. Donald
  - Department of Chemistry, Duke University, Durham, North Carolina, USA
  - Department of Computer Science, Duke University, Durham, North Carolina, USA
  - Department of Biochemistry, Duke University School of Medicine, Durham, North Carolina, USA
  - Department of Mathematics, Duke University, Durham, North Carolina, USA
2
Karvelis E, Swanson C, Tidor B. Substrate Turnover Dynamics Guide Ketol-Acid Reductoisomerase Redesign for Increased Specific Activity. ACS Catal 2024; 14:10491-10509. [PMID: 39050899 PMCID: PMC11264209 DOI: 10.1021/acscatal.4c01446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Revised: 05/16/2024] [Accepted: 06/12/2024] [Indexed: 07/27/2024]
Abstract
The task of adapting enzymes for specific applications is often hampered by our incomplete ability to tune and tailor catalytic functions, particularly when seeking increased activity. Here, we develop and demonstrate a rational approach to address this challenge, applied to ketol-acid reductoisomerase (KARI), which has uses in industrial-scale isobutanol production. While traditional structure-based computational enzyme redesign strategies typically focus on the enzyme-bound ground state (GS) and transition state (TS), we postulated that additionally treating the underlying dynamics of complete turnover events that connect and pass through both states could further elucidate the structural properties affecting catalysis and help identify mutations that lead to increased catalytic activity. To examine the dynamics of substrate conversion with atomistic detail, we adapted and applied computational methods based on path sampling techniques to gather thousands of QM/MM simulations of attempted substrate turnover events by KARI: both productive (reactive) and unproductive (nonreactive) attempts. From these data, machine learning models were constructed and used to identify specific conformational features (interatomic distances, angles, and torsions) associated with successful, productive catalysis. Multistate protein redesign techniques were then used to select mutations that stabilized reactive-like structures over nonreactive-like ones while also meeting additional criteria consistent with enhanced specific activity. This procedure resulted in eight high-confidence enzyme mutants with a significant improvement in calculated specific activity relative to wild type (WT), with the fastest variant's increase in calculated kcat being (2 ± 1) × 10^4-fold. Collectively, these results suggest that introducing mutations designed to increase the population of reaction-promoting conformations of the enzyme-substrate complex before it reaches the barrier can provide an effective approach to engineering improved enzyme catalysts.
Affiliation(s)
- Elijah Karvelis
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Chloe Swanson
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Bruce Tidor
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
3
Guerin N, Kaserer T, Donald BR. Protocol for predicting drug-resistant protein mutations to an ERK2 inhibitor using RESISTOR. STAR Protoc 2023; 4:102170. [PMID: 37115667 PMCID: PMC10173857 DOI: 10.1016/j.xpro.2023.102170] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/26/2022] [Revised: 01/11/2023] [Accepted: 02/21/2023] [Indexed: 04/29/2023]
Abstract
Prospective predictions of drug-resistant protein mutants could improve the design of therapeutics less prone to resistance. Here, we describe RESISTOR, an algorithm that uses structure- and sequence-based criteria to predict resistance mutations. We demonstrate the process of using RESISTOR to predict ERK2 mutants that are likely to arise in melanoma and ablate the efficacy of the ERK1/2 inhibitor SCH779284. RESISTOR is included in the free and open-source computational protein design software OSPREY. For complete details on the use and execution of this protocol, please refer to Guerin et al. (1).
Affiliation(s)
- Nathan Guerin
  - Department of Computer Science, Duke University, Durham, NC 27708, USA
- Teresa Kaserer
  - Institute of Pharmacy/Pharmaceutical Chemistry, University of Innsbruck, 6020 Innsbruck, Austria
- Bruce R Donald
  - Department of Computer Science, Duke University, Durham, NC 27708, USA; Department of Biochemistry, Duke University Medical Center, Durham, NC 27710, USA; Department of Chemistry, Duke University, Durham, NC 27708, USA; Department of Mathematics, Duke University, Durham, NC 27708, USA
4
Abstract
In silico prediction of resistance-conferring escape mutations could accelerate the design of therapeutics less prone to resistance. This article describes how to use the Resistor algorithm to predict escape mutations. Resistor employs Pareto optimization on four resistance-conferring criteria (positive design, negative design, mutational probability, and hotspot cardinality) to assign a Pareto rank to each prospective mutant. It also predicts the mechanism of resistance, that is, whether a mutant ablates binding to a drug, strengthens binding to the endogenous ligand, or does both, and provides structural models of the mutants. Resistor is part of the free and open-source computational protein design software OSPREY.
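As a rough illustration of Pareto ranking over several criteria, the sketch below peels off successive non-dominated fronts: rank 1 is the Pareto frontier, rank 2 is the frontier of what remains, and so on. This is one common definition of a Pareto rank, and it is an assumption that it matches Resistor's own. The mutant names, the scores, and the convention that larger values are better on every criterion are all hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b on every criterion and strictly
    better on at least one. Assumes larger values are better."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_ranks(scores):
    """Assign rank 1 to the non-dominated set, remove it, and repeat."""
    remaining = dict(scores)
    ranks = {}
    rank = 1
    while remaining:
        # Collect the whole front before deleting anything from `remaining`.
        front = [
            name for name, s in remaining.items()
            if not any(dominates(t, s)
                       for other, t in remaining.items() if other != name)
        ]
        for name in front:
            ranks[name] = rank
            del remaining[name]
        rank += 1
    return ranks

# Hypothetical mutants scored on four criteria (illustrative numbers only).
mutant_scores = {
    "mutA": (0.9, 0.8, 0.7, 3),
    "mutB": (0.5, 0.9, 0.2, 1),
    "mutC": (0.4, 0.7, 0.1, 1),
    "mutD": (0.3, 0.6, 0.1, 1),
}
ranks = pareto_ranks(mutant_scores)
```

In this toy input, `mutA` and `mutB` each win on at least one criterion, so neither dominates the other and both receive rank 1; `mutC` is dominated by both and receives rank 2, and `mutD` receives rank 3.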
Affiliation(s)
- Nathan Guerin
  - Department of Computer Science, Duke University, Durham, North Carolina, USA
- Teresa Kaserer
  - Institute of Pharmacy/Pharmaceutical Chemistry, University of Innsbruck, Innsbruck, Austria
- Bruce R. Donald
  - Department of Computer Science, Duke University, Durham, North Carolina, USA
  - Department of Biochemistry, Duke University Medical Center, Durham, North Carolina, USA
  - Department of Chemistry, Duke University, Durham, North Carolina, USA
  - Department of Mathematics, Duke University, Durham, North Carolina, USA
5
Guerin N, Feichtner A, Stefan E, Kaserer T, Donald BR. Resistor: An algorithm for predicting resistance mutations via Pareto optimization over multistate protein design and mutational signatures. Cell Syst 2022; 13:830-843.e3. [PMID: 36265469 PMCID: PMC9589925 DOI: 10.1016/j.cels.2022.09.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/01/2022] [Revised: 06/29/2022] [Accepted: 09/13/2022] [Indexed: 01/26/2023]
Abstract
Resistance to pharmacological treatments is a major public health challenge. Here, we introduce Resistor, a structure- and sequence-based algorithm that prospectively predicts resistance mutations for drug design. Resistor computes the Pareto frontier of four resistance-causing criteria: the change in binding affinity (ΔKa) of the (1) drug and (2) endogenous ligand upon a protein's mutation; (3) the probability a mutation will occur based on empirically derived mutational signatures; and (4) the cardinality of mutations comprising a hotspot. For validation, we applied Resistor to EGFR and BRAF kinase inhibitors treating lung adenocarcinoma and melanoma. Resistor correctly identified eight clinically significant EGFR resistance mutations, including the erlotinib and gefitinib "gatekeeper" T790M mutation and five known osimertinib resistance mutations. Furthermore, Resistor predictions are consistent with BRAF inhibitor sensitivity data from both retrospective and prospective experiments using KinCon biosensors. Resistor is available in the open-source protein design software OSPREY.
Affiliation(s)
- Nathan Guerin
  - Department of Computer Science, Duke University, Durham, NC 27708, USA
- Andreas Feichtner
  - Institute of Biochemistry and Center for Molecular Biosciences, University of Innsbruck, Innsbruck, 6020 Tyrol, Austria
- Eduard Stefan
  - Institute of Biochemistry and Center for Molecular Biosciences, University of Innsbruck, Innsbruck, 6020 Tyrol, Austria; Tyrolean Cancer Research Institute, Innsbruck, 6020 Tyrol, Austria
- Teresa Kaserer
  - Institute of Pharmacy/Pharmaceutical Chemistry, University of Innsbruck, Innsbruck, 6020 Tyrol, Austria
- Bruce R Donald
  - Department of Computer Science, Duke University, Durham, NC 27708, USA; Department of Biochemistry, Duke University Medical Center, Durham, NC 27710, USA; Department of Chemistry, Duke University, Durham, NC 27708, USA; Department of Mathematics, Duke University, Durham, NC 27708, USA
6
Talluri S. Algorithms for protein design. Adv Protein Chem Struct Biol 2022; 130:1-38. [PMID: 35534105 DOI: 10.1016/bs.apcsb.2022.01.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
Abstract
Computational Protein Design has the potential to contribute to major advances in enzyme technology, vaccine design, receptor-ligand engineering, biomaterials, nanosensors, and synthetic biology. Although Protein Design is a challenging problem, proteins can be designed by experts in Protein Design, as well as by non-experts whose primary interests are in the applications of Protein Design. The increased accessibility of Protein Design technology is attributable to the accumulated knowledge and experience with Protein Design as well as to the availability of software and online resources. The objective of this review is to serve as a guide to the relevant literature with a focus on the novel methods and algorithms that have been developed or applied for Protein Design, and to assist in the selection of algorithms for Protein Design. Novel algorithms and models that have been introduced to utilize the enormous amount of experimental data and novel computational hardware have the potential for producing substantial increases in the accuracy, reliability and range of applications of designed proteins.
Affiliation(s)
- Sekhar Talluri
  - Department of Biotechnology, GITAM, Visakhapatnam, India
7
Bouchiba Y, Cortés J, Schiex T, Barbe S. Molecular flexibility in computational protein design: an algorithmic perspective. Protein Eng Des Sel 2021; 34:6271252. [PMID: 33959778 DOI: 10.1093/protein/gzab011] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/21/2020] [Revised: 03/12/2021] [Accepted: 03/29/2021] [Indexed: 12/19/2022]
Abstract
Computational protein design (CPD) is a powerful technique for engineering new proteins, with both great fundamental implications and diverse practical interests. However, the approximations usually made for computational efficiency, using a single fixed backbone and a discrete set of side chain rotamers, tend to produce rigid and hyper-stable folds that may lack functionality. These approximations contrast with the demonstrated importance of molecular flexibility and motions in a wide range of protein functions. The integration of backbone flexibility and multiple conformational states in CPD, in order to relieve the inaccuracies resulting from these simplifications and to improve design reliability, is attracting increased attention. However, the greatly increased search space that needs to be explored in these extensions defines extremely challenging computational problems. In this review, we outline the principles of CPD and discuss recent efforts in algorithmic developments for incorporating molecular flexibility in the design process.
Affiliation(s)
- Younes Bouchiba
  - Toulouse Biotechnology Institute, TBI, CNRS, INRAE, INSA, ANITI, Toulouse 31400, France; Laboratoire d'Analyse et d'Architecture des Systèmes, LAAS CNRS, Université de Toulouse, CNRS, Toulouse 31400, France
- Juan Cortés
  - Laboratoire d'Analyse et d'Architecture des Systèmes, LAAS CNRS, Université de Toulouse, CNRS, Toulouse 31400, France
- Thomas Schiex
  - Université de Toulouse, ANITI, INRAE, UR MIAT, F-31320, Castanet-Tolosan, France
- Sophie Barbe
  - Toulouse Biotechnology Institute, TBI, CNRS, INRAE, INSA, ANITI, Toulouse 31400, France
8
Karimi M, Zhu S, Cao Y, Shen Y. De Novo Protein Design for Novel Folds Using Guided Conditional Wasserstein Generative Adversarial Networks. J Chem Inf Model 2020; 60:5667-5681. [PMID: 32945673 PMCID: PMC7775287 DOI: 10.1021/acs.jcim.0c00593] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Indexed: 01/04/2023]
Abstract
Although massive data is quickly accumulating on protein sequence and structure, there is a small and limited number of protein architectural types (or structural folds). This study is addressing the following question: how well could one reveal underlying sequence-structure relationships and design protein sequences for an arbitrary, potentially novel, structural fold? In response to the question, we have developed novel deep generative models, namely, semisupervised gcWGAN (guided, conditional, Wasserstein Generative Adversarial Networks). To overcome training difficulties and improve design qualities, we build our models on conditional Wasserstein GAN (WGAN) that uses Wasserstein distance in the loss function. Our major contributions include (1) constructing a low-dimensional and generalizable representation of the fold space for the conditional input, (2) developing an ultrafast sequence-to-fold predictor (or oracle) and incorporating its feedback into WGAN as a loss to guide model training, and (3) exploiting sequence data with and without paired structures to enable a semisupervised training strategy. Assessed by the oracle over 100 novel folds not in the training set, gcWGAN generates more successful designs and covers 3.5 times more target folds compared to a competing data-driven method (cVAE). Assessed by sequence- and structure-based predictors, gcWGAN designs are physically and biologically sound. Assessed by a structure predictor over representative novel folds, including one not even part of basis folds, gcWGAN designs have comparable or better fold accuracy yet much more sequence diversity and novelty than cVAE. The ultrafast data-driven model is further shown to boost the success of a principle-driven de novo method (RosettaDesign), through generating design seeds and tailoring design space. In conclusion, gcWGAN explores uncharted sequence space to design proteins by learning generalizable principles from current sequence-structure data. 
Data, source codes, and trained models are available at https://github.com/Shen-Lab/gcWGAN.
Affiliation(s)
- Mostafa Karimi
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843, United States
  - TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, Texas 77840, United States
- Shaowen Zhu
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843, United States
- Yue Cao
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843, United States
- Yang Shen
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843, United States
  - TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, Texas 77840, United States
9
Lowegard AU, Frenkel MS, Holt GT, Jou JD, Ojewole AA, Donald BR. Novel, provable algorithms for efficient ensemble-based computational protein design and their application to the redesign of the c-Raf-RBD:KRas protein-protein interface. PLoS Comput Biol 2020; 16:e1007447. [PMID: 32511232 PMCID: PMC7329130 DOI: 10.1371/journal.pcbi.1007447] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/26/2019] [Revised: 07/01/2020] [Accepted: 05/13/2020] [Indexed: 11/25/2022]
Abstract
The K* algorithm provably approximates partition functions for a set of states (e.g., protein, ligand, and protein-ligand complex) to a user-specified accuracy ε. Often, reaching an ε-approximation for a particular set of partition functions takes a prohibitive amount of time and space. To alleviate some of this cost, we introduce two new algorithms into the osprey suite for protein design: fries, a Fast Removal of Inadequately Energied Sequences, and EWAK*, an Energy Window Approximation to K*. fries pre-processes the sequence space to limit a design to only the most stable, energetically favorable sequence possibilities. EWAK* then takes this pruned sequence space as input and, using a user-specified energy window, calculates K* scores using the lowest energy conformations. We expect fries/EWAK* to be most useful in cases where there are many unstable sequences in the design sequence space and when users are satisfied with enumerating the low-energy ensemble of conformations. In combination, these algorithms provably retain calculational accuracy while limiting the input sequence space and the conformations included in each partition function calculation to only the most energetically favorable, effectively reducing runtime while still enriching for desirable sequences. This combined approach led to significant speed-ups compared to the previous state-of-the-art multi-sequence algorithm, BBK*, while maintaining its efficiency and accuracy, which we show across 40 different protein systems and a total of 2,826 protein design problems. Additionally, as a proof of concept, we used these new algorithms to redesign the protein-protein interface (PPI) of the c-Raf-RBD:KRas complex. The Ras-binding domain of the protein kinase c-Raf (c-Raf-RBD) is the tightest known binder of KRas, a protein implicated in difficult-to-treat cancers. fries/EWAK* accurately retrospectively predicted the effect of 41 different sets of mutations in the PPI of the c-Raf-RBD:KRas complex. 
Notably, these mutations include mutations whose effect had previously been incorrectly predicted using other computational methods. Next, we used fries/EWAK* for prospective design and discovered a novel point mutation that improves binding of c-Raf-RBD to KRas in its active, GTP-bound state (KRasGTP). We combined this new mutation with two previously reported mutations (which were highly-ranked by osprey) to create a new variant of c-Raf-RBD, c-Raf-RBD(RKY). fries/EWAK* in osprey computationally predicted that this new variant binds even more tightly than the previous best-binding variant, c-Raf-RBD(RK). We measured the binding affinity of c-Raf-RBD(RKY) using a bio-layer interferometry (BLI) assay, and found that this new variant exhibits single-digit nanomolar affinity for KRasGTP, confirming the computational predictions made with fries/EWAK*. This new variant binds roughly five times more tightly than the previous best known binder and roughly 36 times more tightly than the design starting point (wild-type c-Raf-RBD). This study steps through the advancement and development of computational protein design by presenting theory, new algorithms, accurate retrospective designs, new prospective designs, and biochemical validation. Computational structure-based protein design is an innovative tool for redesigning proteins to introduce a particular or novel function. One such function is improving the binding of one protein to another, which can increase our understanding of important protein systems. Herein we introduce two novel, provable algorithms, fries and EWAK*, for more efficient computational structure-based protein design as well as their application to the redesign of the c-Raf-RBD:KRas protein-protein interface. These new algorithms speed-up computational structure-based protein design while maintaining accurate calculations, allowing for larger, previously infeasible protein designs. 
Additionally, using fries and EWAK* within the osprey suite, we designed the tightest known binder of KRas, a heavily studied cancer target that interacts with a number of different proteins. This previously undiscovered variant of a KRas-binding domain, c-Raf-RBD, has potential to serve as a tool to further probe the protein-protein interface of KRas with its effectors and its discovery alone emphasizes the potential for more successful applications of computational structure-based protein design.
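The role of partition functions and an energy window in a K*-style score can be sketched in a few lines. The K* score is taken here as the ratio of the complex partition function to the product of the unbound protein and ligand partition functions. The conformation energies, the `RT` value, and the function names are hypothetical, and the window is applied as a plain filter on a pre-enumerated list, which is a simplification rather than the provable enumeration performed by OSPREY.

```python
import math

RT = 0.592  # kcal/mol, roughly room temperature (assumed value)

def partition_function(energies, energy_window=None):
    """Boltzmann-weighted sum over conformation energies (kcal/mol).
    If energy_window is given, keep only conformations within that many
    kcal/mol of the lowest-energy conformation."""
    e_min = min(energies)
    if energy_window is not None:
        energies = [e for e in energies if e - e_min <= energy_window]
    return sum(math.exp(-e / RT) for e in energies)

def kstar_score(complex_e, protein_e, ligand_e, energy_window=None):
    """Ratio of bound to unbound partition functions."""
    z_complex = partition_function(complex_e, energy_window)
    z_protein = partition_function(protein_e, energy_window)
    z_ligand = partition_function(ligand_e, energy_window)
    return z_complex / (z_protein * z_ligand)

# Hypothetical conformation energies for one sequence (illustrative only).
complex_energies = [-12.0, -11.5, -8.0]
protein_energies = [-5.0, -4.2]
ligand_energies = [-3.0, -2.9, 0.5]

full_score = kstar_score(complex_energies, protein_energies, ligand_energies)
windowed_score = kstar_score(
    complex_energies, protein_energies, ligand_energies, energy_window=1.0
)
retained_fraction = (
    partition_function(complex_energies, energy_window=1.0)
    / partition_function(complex_energies)
)
```

Because Boltzmann weights fall off exponentially with energy, the conformations inside a modest window carry almost all of the partition function in this toy case, which is the intuition behind restricting the calculation to the low-energy ensemble.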
Affiliation(s)
- Anna U. Lowegard
  - Program in Computational Biology and Bioinformatics, Duke University Medical Center, Durham, North Carolina, United States of America
  - Department of Computer Science, Duke University, Durham, North Carolina, United States of America
- Marcel S. Frenkel
  - Department of Biochemistry, Duke University Medical Center, Durham, North Carolina, United States of America
- Graham T. Holt
  - Program in Computational Biology and Bioinformatics, Duke University Medical Center, Durham, North Carolina, United States of America
  - Department of Computer Science, Duke University, Durham, North Carolina, United States of America
- Jonathan D. Jou
  - Department of Computer Science, Duke University, Durham, North Carolina, United States of America
- Adegoke A. Ojewole
  - Program in Computational Biology and Bioinformatics, Duke University Medical Center, Durham, North Carolina, United States of America
  - Department of Computer Science, Duke University, Durham, North Carolina, United States of America
- Bruce R. Donald
  - Department of Computer Science, Duke University, Durham, North Carolina, United States of America
  - Department of Biochemistry, Duke University Medical Center, Durham, North Carolina, United States of America
10
Adaptive landscape flattening allows the design of both enzyme: Substrate binding and catalytic power. PLoS Comput Biol 2020; 16:e1007600. [PMID: 31917825 PMCID: PMC7041857 DOI: 10.1371/journal.pcbi.1007600] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 09/20/2019] [Revised: 02/25/2020] [Accepted: 12/11/2019] [Indexed: 01/30/2023]
Abstract
Designed enzymes are of fundamental and technological interest. Experimental directed evolution still has significant limitations, and computational approaches are a complementary route. A designed enzyme should satisfy multiple criteria: stability, substrate binding, transition state binding. Such multi-objective design is computationally challenging. Two recent studies used adaptive importance sampling Monte Carlo to redesign proteins for ligand binding. By first flattening the energy landscape of the apo protein, they obtained positive design for the bound state and negative design for the unbound. We have now extended the method to design an enzyme for specific transition state binding, i.e., for its catalytic power. We considered methionyl-tRNA synthetase (MetRS), which attaches methionine (Met) to its cognate tRNA, establishing codon identity. Previously, MetRS and other synthetases have been redesigned by experimental directed evolution to accept noncanonical amino acids as substrates, leading to genetic code expansion. Here, we have redesigned MetRS computationally to bind several ligands: the Met analog azidonorleucine, methionyl-adenylate (MetAMP), and the activated ligands that form the transition state for MetAMP production. Enzyme mutants known to have azidonorleucine activity were recovered by the design calculations, and 17 mutants predicted to bind MetAMP were characterized experimentally and all found to be active. Mutants predicted to have low activation free energies for MetAMP production were found to be active and the predicted reaction rates agreed well with the experimental values. We suggest the present method should become the paradigm for computational enzyme design. Designed enzymes are of major interest. Experimental directed evolution still has significant limitations, and computational approaches are another route. Enzymes must be stable, bind substrates, and be powerful catalysts. It is challenging to design for all these properties. 
A method to design substrate binding was proposed recently. It used an adaptive Monte Carlo method to explore mutations of a few amino acids near the substrate. A bias energy was gradually “learned” such that, in the absence of the ligand, the simulation visited most of the possible protein mutations with comparable probabilities. Remarkably, a simulation of the protein:ligand complex, including the bias, will then preferentially sample tight-binding sequences. We generalized the method to design binding specificity. We tested it for the methionyl-tRNA synthetase enzyme, which has been engineered in order to expand the genetic code. We redesigned the enzyme to obtain variants with low activation free energies for the catalytic step. The variants proposed by the simulations were shown experimentally to be active, and the predicted activation free energies were in reasonable agreement with the experimental values. We expect the new method will become the paradigm for computational enzyme design.
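The flattening argument in this abstract can be checked on a toy example. The sketch assumes three hypothetical sequences with made-up free energies and, instead of learning the bias adaptively during a Monte Carlo run as the abstract describes, sets it directly to the negative of the apo energy. With that ideal bias, the apo state visits all sequences with equal probability, and the biased complex is populated according to the binding free energy (complex minus apo) rather than the raw complex energy.

```python
import math

KT = 0.6  # kcal/mol (assumed value)

def boltzmann_probabilities(energies):
    """Normalized Boltzmann populations for a dict of sequence -> energy."""
    weights = {seq: math.exp(-e / KT) for seq, e in energies.items()}
    total = sum(weights.values())
    return {seq: w / total for seq, w in weights.items()}

# Hypothetical per-sequence free energies in kcal/mol (illustrative only).
apo_energy = {"WT": -10.0, "M1": -12.0, "M2": -8.0}
holo_energy = {"WT": -15.0, "M1": -15.5, "M2": -14.5}

# Ideal bias: cancels the apo landscape exactly. In the published method the
# bias is learned adaptively; computing it in closed form is a shortcut here.
bias = {seq: -apo_energy[seq] for seq in apo_energy}

apo_biased = boltzmann_probabilities(
    {seq: apo_energy[seq] + bias[seq] for seq in apo_energy}
)
holo_biased = boltzmann_probabilities(
    {seq: holo_energy[seq] + bias[seq] for seq in holo_energy}
)
binding_free_energy = {seq: holo_energy[seq] - apo_energy[seq] for seq in apo_energy}
best_binder = max(holo_biased, key=holo_biased.get)
```

In this toy input, `M1` has the lowest complex energy but the weakest binding free energy, so the biased complex simulation favors `M2` instead: the bias supplies the negative design against sequences that are merely stable in the unbound state.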
11
Hallen MA, Donald BR. Protein Design by Provable Algorithms. Commun ACM 2019; 62:76-84. [PMID: 31607753 PMCID: PMC6788629 DOI: 10.1145/3338124] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 06/10/2023]
Abstract
Protein design algorithms can leverage provable guarantees of accuracy to provide new insights and unique optimized molecules.
Affiliation(s)
- Mark A. Hallen
  - Research assistant professor at the Toyota Technological Institute at Chicago, IL, USA
- Bruce R. Donald
  - James B. Duke Professor of Computer Science at Duke University, as well as a professor of chemistry and biochemistry in the Duke University Medical Center, Durham, NC, USA
12
Vucinic J, Simoncini D, Ruffini M, Barbe S, Schiex T. Positive multistate protein design. Bioinformatics 2019; 36:122-130. [DOI: 10.1093/bioinformatics/btz497] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 02/15/2019] [Revised: 05/20/2019] [Accepted: 06/11/2019] [Indexed: 11/12/2022]
Abstract
Motivation
Structure-based computational protein design (CPD) plays a critical role in advancing the field of protein engineering. Using an all-atom energy function, CPD tries to identify amino acid sequences that fold into a target structure and ultimately perform a desired function. The usual approach considers a single rigid backbone as a target, which ignores backbone flexibility. Multistate design (MSD) allows instead to consider several backbone states simultaneously, defining challenging computational problems.
Results
We introduce efficient reductions of positive MSD problems to Cost Function Networks with two different fitness definitions and implement them in the Pompd (Positive Multistate Protein design) software. Pompd is able to identify guaranteed optimal sequences of positive multistate full protein redesign problems and exhaustively enumerate suboptimal sequences close to the MSD optimum. Applied to nuclear magnetic resonance and back-rubbed X-ray structures, we observe that the average energy fitness provides the best sequence recovery. Our method outperforms state-of-the-art guaranteed computational design approaches by orders of magnitudes and can solve MSD problems with sizes previously unreachable with guaranteed algorithms.
Availability and implementation
https://forgemia.inra.fr/thomas.schiex/pompd as documented Open Source.
Supplementary information
Supplementary data are available at Bioinformatics online.
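The average-energy fitness mentioned in the Results of this abstract can be illustrated with a brute-force sketch. It assumes two hypothetical backbone states, a two-position design with three allowed amino acids, and made-up per-position energies with no pairwise terms. Pompd itself reduces the problem to Cost Function Networks; exhaustive enumeration is swapped in here only because the toy search space is tiny.

```python
from itertools import product

AMINO_ACIDS = ("A", "L", "K")

# Hypothetical single-position energies (kcal/mol) for two backbone states.
# Each state maps position index -> amino acid -> energy.
state_energies = [
    {0: {"A": 0.0, "L": -1.0, "K": 0.5}, 1: {"A": -0.2, "L": 0.3, "K": -1.5}},
    {0: {"A": -0.5, "L": -0.8, "K": 0.1}, 1: {"A": 0.4, "L": -0.1, "K": -1.0}},
]

def sequence_energy(sequence, state):
    """Energy of one sequence on one backbone state (no pairwise terms)."""
    return sum(state[pos][aa] for pos, aa in enumerate(sequence))

def average_fitness(sequence):
    """Positive multistate fitness: mean energy over all backbone states."""
    total = sum(sequence_energy(sequence, state) for state in state_energies)
    return total / len(state_energies)

# Exhaustive search over all 3^2 sequences; lower average energy is better.
best_sequence = min(product(AMINO_ACIDS, repeat=2), key=average_fitness)
best_fitness = average_fitness(best_sequence)
```

The selected sequence is the one that is favorable on average across the states, which need not be the optimum for any single backbone considered alone.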
Affiliation(s)
- Jelena Vucinic
  - LISBP, Université de Toulouse, CNRS, INRA, INSA, 31400 Toulouse, France
  - MIAT, Université de Toulouse, INRA, 31326 Castanet-Tolosan Cedex, France
- David Simoncini
  - LISBP, Université de Toulouse, CNRS, INRA, INSA, 31400 Toulouse, France
  - IRIT UMR 5505-CNRS, Université de Toulouse, 31042 Cedex 9, France
- Manon Ruffini
  - LISBP, Université de Toulouse, CNRS, INRA, INSA, 31400 Toulouse, France
  - MIAT, Université de Toulouse, INRA, 31326 Castanet-Tolosan Cedex, France
- Sophie Barbe
  - LISBP, Université de Toulouse, CNRS, INRA, INSA, 31400 Toulouse, France
- Thomas Schiex
  - MIAT, Université de Toulouse, INRA, 31326 Castanet-Tolosan Cedex, France
13
Keedy DA. Journey to the center of the protein: allostery from multitemperature multiconformer X-ray crystallography. Acta Crystallogr D Struct Biol 2019; 75:123-137. [PMID: 30821702 PMCID: PMC6400254 DOI: 10.1107/s2059798318017941] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 04/06/2018] [Accepted: 12/19/2018] [Indexed: 02/08/2023]
Abstract
Proteins inherently fluctuate between conformations to perform functions in the cell. For example, they sample product-binding, transition-state-stabilizing and product-release states during catalysis, and they integrate signals from remote regions of the structure for allosteric regulation. However, there is a lack of understanding of how these dynamic processes occur at the basic atomic level. This gap can be at least partially addressed by combining variable-temperature (instead of traditional cryogenic temperature) X-ray crystallography with algorithms for modeling alternative conformations based on electron-density maps, in an approach called multitemperature multiconformer X-ray crystallography (MMX). Here, the use of MMX to reveal alternative conformations at different sites in a protein structure and to estimate the degree of energetic coupling between them is discussed. These insights can suggest testable hypotheses about allosteric mechanisms. Temperature is an easily manipulated experimental parameter, so the MMX approach is widely applicable to any protein that yields well diffracting crystals. Moreover, the general principles of MMX are extensible to other perturbations such as pH, pressure, ligand concentration etc. Future work will explore strategies for leveraging X-ray data across such perturbation series to more quantitatively measure how different parts of a protein structure are coupled to each other, and the consequences thereof for allostery and other aspects of protein function.
Affiliation(s)
- Daniel A. Keedy
- Structural Biology Initiative, CUNY Advanced Science Research Center, New York, USA
- Department of Chemistry and Biochemistry, City College of New York, New York, USA
- PhD Programs in Chemistry and Biochemistry, The Graduate Center of the City University of New York, New York, USA
| |
|
14
|
Hallen MA, Martin JW, Ojewole A, Jou JD, Lowegard AU, Frenkel MS, Gainza P, Nisonoff HM, Mukund A, Wang S, Holt GT, Zhou D, Dowd E, Donald BR. OSPREY 3.0: Open-source protein redesign for you, with powerful new features. J Comput Chem 2018; 39:2494-2507. [PMID: 30368845 PMCID: PMC6391056 DOI: 10.1002/jcc.25522] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 04/23/2018] [Accepted: 06/14/2018] [Indexed: 12/14/2022]
Abstract
We present OSPREY 3.0, a new and greatly improved release of the OSPREY protein design software. OSPREY 3.0 features a convenient new Python interface, which greatly improves its ease of use. It is over two orders of magnitude faster than previous versions of OSPREY when running the same algorithms on the same hardware. Moreover, OSPREY 3.0 includes several new algorithms, which introduce substantial speedups as well as improved biophysical modeling. It also includes GPU support, which provides an additional speedup of over an order of magnitude. Like previous versions of OSPREY, OSPREY 3.0 offers a unique package of advantages over other design software, including provable design algorithms that account for continuous flexibility during design and model conformational entropy. Finally, we show here empirically that OSPREY 3.0 accurately predicts the effect of mutations on protein-protein binding. OSPREY 3.0 is available at http://www.cs.duke.edu/donaldlab/osprey.php as free and open-source software. © 2018 Wiley Periodicals, Inc.
Affiliation(s)
- Mark A. Hallen
- Department of Computer Science, Duke University, Durham, NC 27708
- Toyota Technological Institute at Chicago, Chicago, IL 60637
| | | | - Adegoke Ojewole
- Program in Computational Biology and Bioinformatics, Duke University Medical Center, Durham, NC 27710
| | - Jonathan D. Jou
- Department of Computer Science, Duke University, Durham, NC 27708
| | - Anna U. Lowegard
- Program in Computational Biology and Bioinformatics, Duke University Medical Center, Durham, NC 27710
| | - Marcel S. Frenkel
- Department of Biochemistry, Duke University Medical Center, Durham, NC 27710
| | - Pablo Gainza
- Department of Computer Science, Duke University, Durham, NC 27708
| | | | - Aditya Mukund
- Department of Computer Science, Duke University, Durham, NC 27708
| | - Siyu Wang
- Program in Computational Biology and Bioinformatics, Duke University Medical Center, Durham, NC 27710
| | - Graham T. Holt
- Program in Computational Biology and Bioinformatics, Duke University Medical Center, Durham, NC 27710
| | - David Zhou
- Department of Computer Science, Duke University, Durham, NC 27708
| | - Elizabeth Dowd
- Department of Computer Science, Duke University, Durham, NC 27708
| | - Bruce R. Donald
- Department of Computer Science, Duke University, Durham, NC 27708
- Department of Chemistry, Duke University, Durham, NC 27708
- Department of Biochemistry, Duke University Medical Center, Durham, NC 27710
| |
|
15
|
Hallen MA. PLUG (Pruning of Local Unrealistic Geometries) removes restrictions on biophysical modeling for protein design. Proteins 2018; 87:62-73. [PMID: 30378699 DOI: 10.1002/prot.25623] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/12/2018] [Revised: 10/10/2018] [Accepted: 10/16/2018] [Indexed: 12/29/2022]
Abstract
Protein design algorithms must search an enormous conformational space to identify favorable conformations. As a result, those that perform this search with guarantees of accuracy generally start with a conformational pruning step, such as dead-end elimination (DEE). However, the mathematical assumptions of DEE-based pruning algorithms have up to now severely restricted the biophysical model that can feasibly be used in protein design. To lift these restrictions, I propose to prune local unrealistic geometries (PLUG) using a linear programming-based method. PLUG's biophysical model consists only of well-known lower bounds on interatomic distances. PLUG is intended as preprocessing for energy-based protein design calculations, whose biophysical model need not support DEE pruning. Based on 96 test cases, PLUG is at least as effective at pruning as DEE for larger protein designs-the type that most require pruning. When combined with the LUTE protein design algorithm, PLUG greatly facilitates designs that account for continuous entropy, large multistate designs with continuous flexibility, and designs with extensive continuous backbone flexibility and advanced nonpairwise energy functions. Many of these designs are tractable only with PLUG, either for empirical reasons (LUTE's machine learning step achieves an accurate fit only after PLUG pruning), or for theoretical reasons (many energy functions are fundamentally incompatible with DEE).
Affiliation(s)
- Mark A Hallen
- Toyota Technological Institute at Chicago, Chicago, Illinois
| |
|
16
|
Lechner H, Ferruz N, Höcker B. Strategies for designing non-natural enzymes and binders. Curr Opin Chem Biol 2018; 47:67-76. [PMID: 30248579 DOI: 10.1016/j.cbpa.2018.07.022] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 05/06/2018] [Revised: 07/16/2018] [Accepted: 07/17/2018] [Indexed: 12/20/2022]
Abstract
The design of tailor-made enzymes is a major goal in biochemical research that can result in wide-range applications and will lead to a better understanding of how proteins fold and function. In this review we highlight recent advances in enzyme and small molecule binder design. A focus is placed on novel strategies for the design of scaffolds, developments in computational methods, and recent applications of these techniques on receptors, sensors, and enzymes. Further, the integration of computational and experimental methodologies is discussed. The outlined examples of designed enzymes and binders for various purposes highlight the importance of this topic and underline the need for tailor-made proteins.
Affiliation(s)
- Horst Lechner
- Department of Biochemistry, University of Bayreuth, 95447 Bayreuth, Germany
| | - Noelia Ferruz
- Department of Biochemistry, University of Bayreuth, 95447 Bayreuth, Germany
| | - Birte Höcker
- Department of Biochemistry, University of Bayreuth, 95447 Bayreuth, Germany.
| |
|
17
|
Abstract
Motivation
Multistate protein design addresses real-world challenges, such as multi-specificity design and backbone flexibility, by considering both positive and negative protein states with an ensemble of substates for each. It also presents an enormous challenge to exact algorithms that guarantee the optimal solutions and enable a direct test of mechanistic hypotheses behind models. However, efficient exact algorithms are lacking for multistate protein design.
Results
We have developed an efficient exact algorithm called interconnected cost function networks (iCFN) for multistate protein design. Its generic formulation allows for a wide array of applications such as stability, affinity and specificity designs while addressing concerns such as global flexibility of protein backbones. iCFN treats each substate design as a weighted constraint satisfaction problem (WCSP) modeled through a CFN; and it solves the coupled WCSPs using novel bounds and a depth-first branch-and-bound search over a tree structure of sequences, substates, and conformations. When iCFN is applied to specificity design of a T-cell receptor, a problem of unprecedented size to exact methods, it drastically reduces search space and running time to make the problem tractable. Moreover, iCFN generates experimentally-agreeing receptor designs with improved accuracy compared with state-of-the-art methods, highlights the importance of modeling backbone flexibility in protein design, and reveals molecular mechanisms underlying binding specificity.
Availability and implementation
https://shen-lab.github.io/software/iCFN.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mostafa Karimi
- Department of Electrical and Computer Engineering and TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, USA
| | - Yang Shen
- Department of Electrical and Computer Engineering and TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, USA
| |
|
18
|
Hallen MA, Donald BR. CATS (Coordinates of Atoms by Taylor Series): protein design with backbone flexibility in all locally feasible directions. Bioinformatics 2018; 33:i5-i12. [PMID: 28882005 PMCID: PMC5870559 DOI: 10.1093/bioinformatics/btx277] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 11/12/2022]
Abstract
Motivation
When proteins mutate or bind to ligands, their backbones often move significantly, especially in loop regions. Computational protein design algorithms must model these motions in order to accurately optimize protein stability and binding affinity. However, methods for backbone conformational search in design have been much more limited than for sidechain conformational search. This is especially true for combinatorial protein design algorithms, which aim to search a large sequence space efficiently and thus cannot rely on temporal simulation of each candidate sequence.
Results
We alleviate this difficulty with a new parameterization of backbone conformational space, which represents all degrees of freedom of a specified segment of protein chain that maintain valid bonding geometry (by maintaining the original bond lengths and angles and ω dihedrals). In order to search this space, we present an efficient algorithm, CATS, for computing atomic coordinates as a function of our new continuous backbone internal coordinates. CATS generalizes the iMinDEE and EPIC protein design algorithms, which model continuous flexibility in sidechain dihedrals, to model continuous, appropriately localized flexibility in the backbone dihedrals ϕ and ψ as well. We show using 81 test cases based on 29 different protein structures that CATS finds sequences and conformations that are significantly lower in energy than methods with less or no backbone flexibility do. In particular, we show that CATS can model the viability of an antibody mutation known experimentally to increase affinity, but that appears sterically infeasible when modeled with less or no backbone flexibility.
Availability and implementation
Our code is available as free software at https://github.com/donaldlab/OSPREY_refactor.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mark A Hallen
- Department of Computer Science, Duke University, Durham, NC, USA
- Toyota Technological Institute at Chicago, Chicago, IL, USA
| | - Bruce R Donald
- Department of Computer Science, Duke University, Durham, NC, USA
- Department of Chemistry, Duke University, Durham, NC, USA
- Department of Biochemistry, Duke University Medical Center, Durham, NC, USA
| |
|
19
|
Ojewole AA, Jou JD, Fowler VG, Donald BR. BBK* (Branch and Bound Over K*): A Provable and Efficient Ensemble-Based Protein Design Algorithm to Optimize Stability and Binding Affinity Over Large Sequence Spaces. J Comput Biol 2018; 25:726-739. [PMID: 29641249 DOI: 10.1089/cmb.2017.0267] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 01/26/2023]
Abstract
Computational protein design (CPD) algorithms that compute binding affinity, Ka, search for sequences with an energetically favorable free energy of binding. Recent work shows that three principles improve the biological accuracy of CPD: ensemble-based design, continuous flexibility of backbone and side-chain conformations, and provable guarantees of accuracy with respect to the input. However, previous methods that use all three design principles are single-sequence (SS) algorithms, which are very costly: linear in the number of sequences and thus exponential in the number of simultaneously mutable residues. To address this computational challenge, we introduce BBK*, a new CPD algorithm whose key innovation is the multisequence (MS) bound: BBK* efficiently computes a single provable upper bound to approximate Ka for a combinatorial number of sequences, and avoids SS computation for all provably suboptimal sequences. Thus, to our knowledge, BBK* is the first provable, ensemble-based CPD algorithm to run in time sublinear in the number of sequences. Computational experiments on 204 protein design problems show that BBK* finds the tightest binding sequences while approximating Ka for up to 10^5-fold fewer sequences than the previous state-of-the-art algorithms, which require exhaustive enumeration of sequences. Furthermore, for 51 protein-ligand design problems, BBK* provably approximates Ka up to 1982-fold faster than the previous state-of-the-art iMinDEE/A*/K* algorithm. Therefore, BBK* not only accelerates protein designs that are possible with previous provable algorithms, but also efficiently performs designs that are too large for previous methods.
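The ensemble-based approximation of Ka mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual K* form, a ratio of Boltzmann-weighted partition functions for the bound complex and the two unbound partners. The function names, the temperature, and the toy energy lists are hypothetical and are not taken from OSPREY or BBK*; the sketch also enumerates every conformation explicitly, which is exactly the exhaustive work that BBK* is designed to avoid.

```python
import math

# Gas constant (kcal/(mol*K)) times an assumed temperature of 298.15 K.
RT = 0.0019872 * 298.15


def partition_function(energies, rt=RT):
    """Boltzmann-weighted sum over the energies (kcal/mol) of an ensemble."""
    return sum(math.exp(-e / rt) for e in energies)


def kstar_score(complex_energies, protein_energies, ligand_energies, rt=RT):
    """Ratio of the bound-state partition function to the product of the
    unbound-state partition functions (a K*-style approximation of Ka)."""
    z_complex = partition_function(complex_energies, rt)
    z_protein = partition_function(protein_energies, rt)
    z_ligand = partition_function(ligand_energies, rt)
    return z_complex / (z_protein * z_ligand)


if __name__ == "__main__":
    # Hypothetical conformational energies for two candidate sequences.
    weak = kstar_score([-3.0, -2.5], [-1.0, -0.8], [-0.5])
    tight = kstar_score([-6.0, -5.5], [-1.0, -0.8], [-0.5])
    print(tight > weak)
```

A sequence whose bound-state ensemble is lower in energy receives a larger score, which is the ordering a design run would use to rank candidates.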
Affiliation(s)
- Adegoke A Ojewole
- Department of Computer Science, Duke University, Durham, North Carolina
- Computational Biology and Bioinformatics Program, Duke University, Durham, North Carolina
| | - Jonathan D Jou
- Department of Computer Science, Duke University, Durham, North Carolina
| | - Vance G Fowler
- Division of Infectious Diseases, Duke University Medical Center, Durham, North Carolina
| | - Bruce R Donald
- Department of Computer Science, Duke University, Durham, North Carolina
- Department of Biochemistry, Duke University Medical Center, Durham, North Carolina
| |
|
20
|
Computationally optimized deimmunization libraries yield highly mutated enzymes with low immunogenicity and enhanced activity. Proc Natl Acad Sci U S A 2017; 114:E5085-E5093. [PMID: 28607051 DOI: 10.1073/pnas.1621233114] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Indexed: 01/01/2023]
Abstract
Therapeutic proteins of wide-ranging function hold great promise for treating disease, but immune surveillance of these macromolecules can drive an antidrug immune response that compromises efficacy and even undermines safety. To eliminate widespread T-cell epitopes in any biotherapeutic and thereby mitigate this key source of detrimental immune recognition, we developed a Pareto optimal deimmunization library design algorithm that optimizes protein libraries to account for the simultaneous effects of combinations of mutations on both molecular function and epitope content. Active variants identified by high-throughput screening are thus inherently likely to be deimmunized. Functional screening of an optimized 10-site library (1,536 variants) of P99 β-lactamase (P99βL), a component of ADEPT cancer therapies, revealed that the population possessed high overall fitness, and comprehensive analysis of peptide-MHC II immunoreactivity showed the population possessed lower average immunogenic potential than the wild-type enzyme. Although similar functional screening of an optimized 30-site library (2.15 × 10^9 variants) revealed reduced population-wide fitness, numerous individual variants were found to have activity and stability better than the wild type despite bearing 13 or more deimmunizing mutations per enzyme. The immunogenic potential of one highly active and stable 14-mutation variant was assessed further using ex vivo cellular immunoassays, and the variant was found to silence T-cell activation in seven of the eight blood donors who responded strongly to wild-type P99βL. In summary, our multiobjective library-design process readily identified large and mutually compatible sets of epitope-deleting mutations and produced highly active but aggressively deimmunized constructs in only one round of library screening.
|
21
|
Ojewole A, Lowegard A, Gainza P, Reeve SM, Georgiev I, Anderson AC, Donald BR. OSPREY Predicts Resistance Mutations Using Positive and Negative Computational Protein Design. Methods Mol Biol 2017; 1529:291-306. [PMID: 27914058 PMCID: PMC5192561 DOI: 10.1007/978-1-4939-6637-0_15] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 06/06/2023]
Abstract
Drug resistance in protein targets is an increasingly common phenomenon that reduces the efficacy of both existing and new antibiotics. However, knowledge of future resistance mutations during pre-clinical phases of drug development would enable the design of novel antibiotics that are robust against not only known resistant mutants, but also against those that have not yet been clinically observed. Computational structure-based protein design (CSPD) is a transformative field that enables the prediction of protein sequences with desired biochemical properties such as binding affinity and specificity to a target. The use of CSPD to predict previously unseen resistance mutations represents one of the frontiers of computational protein design. In a recent study (Reeve et al. Proc Natl Acad Sci U S A 112(3):749-754, 2015), we used our OSPREY (Open Source Protein REdesign for You) suite of CSPD algorithms to prospectively predict resistance mutations that arise in the active site of the dihydrofolate reductase enzyme from methicillin-resistant Staphylococcus aureus (SaDHFR) in response to selective pressure from an experimental competitive inhibitor. We demonstrated that our top predicted candidates are indeed viable resistant mutants. Since that study, we have significantly enhanced the capabilities of OSPREY with not only improved modeling of backbone flexibility, but also efficient multi-state design, fast sparse approximations, partitioned continuous rotamers for more accurate energy bounds, and a computationally efficient representation of molecular-mechanics and quantum-mechanical energy functions. Here, using SaDHFR as an example, we present a protocol for resistance prediction using the latest version of OSPREY. Specifically, we show how to use a combination of positive and negative design to predict active site escape mutations that maintain the enzyme's catalytic function but selectively ablate binding of an inhibitor.
Affiliation(s)
- Adegoke Ojewole
- Program in Computational Biology and Bioinformatics, Duke University, Durham, NC, 27708, USA
| | - Anna Lowegard
- Program in Computational Biology and Bioinformatics, Duke University, Durham, NC, 27708, USA
| | - Pablo Gainza
- Department of Computer Science, Duke University, Durham, NC, 27708, USA
| | - Stephanie M Reeve
- Department of Pharmaceutical Sciences, University of Connecticut, Storrs, CT, 06269, USA
| | - Ivelin Georgiev
- Department of Computer Science, Duke University, Durham, NC, 27708, USA
- Vaccine Research Center, National Institute of Allergy and Infectious Diseases, Bethesda, MD, 20892, USA
| | - Amy C Anderson
- Department of Pharmaceutical Sciences, University of Connecticut, Storrs, CT, 06269, USA
| | - Bruce R Donald
- Department of Computer Science, Duke University, Durham, NC, 27708, USA.
- Department of Biochemistry, Duke University, Durham, NC, 27708, USA.
- Department of Chemistry, Duke University, Durham, NC, 27708, USA.
| |
|
22
|
Druart K, Bigot J, Audit E, Simonson T. A Hybrid Monte Carlo Scheme for Multibackbone Protein Design. J Chem Theory Comput 2016; 12:6035-6048. [DOI: 10.1021/acs.jctc.6b00421] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
Affiliation(s)
- Karen Druart
- Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
- Maison de la Simulation, CEA, CNRS, Univ. Paris-Sud, UVSQ, Université Paris-Saclay, 91191 Gif-sur-Yvette, France
| | - Julien Bigot
- Maison de la Simulation, CEA, CNRS, Univ. Paris-Sud, UVSQ, Université Paris-Saclay, 91191 Gif-sur-Yvette, France
| | - Edouard Audit
- Maison de la Simulation, CEA, CNRS, Univ. Paris-Sud, UVSQ, Université Paris-Saclay, 91191 Gif-sur-Yvette, France
| | - Thomas Simonson
- Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
| |
|
23
|
Hallen MA, Jou JD, Donald BR. LUTE (Local Unpruned Tuple Expansion): Accurate Continuously Flexible Protein Design with General Energy Functions and Rigid Rotamer-Like Efficiency. J Comput Biol 2016; 24:536-546. [PMID: 27681371 DOI: 10.1089/cmb.2016.0136] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 01/03/2023]
Abstract
Most protein design algorithms search over discrete conformations and an energy function that is residue-pairwise, that is, a sum of terms that depend on the sequence and conformation of at most two residues. Although modeling of continuous flexibility and of non-residue-pairwise energies significantly increases the accuracy of protein design, previous methods to model these phenomena add a significant asymptotic cost to design calculations. We now remove this cost by modeling continuous flexibility and non-residue-pairwise energies in a form suitable for direct input to highly efficient, discrete combinatorial optimization algorithms such as DEE/A* or branch-width minimization. Our novel algorithm performs a local unpruned tuple expansion (LUTE), which can efficiently represent both continuous flexibility and general, possibly nonpairwise energy functions to an arbitrary level of accuracy using a discrete energy matrix. We show using 47 design calculation test cases that LUTE provides a dramatic speedup in both single-state and multistate continuously flexible designs.
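The idea of representing an energy function through a discrete matrix of tuple terms can be sketched as follows. This is only an illustration under stated assumptions: the conformation is a tuple of residue-conformation indices, the expansion is truncated at pairs, and the coefficient table is supplied by hand. In the approach described above the coefficients would be fitted to sampled continuous energies; the function name `tuple_expansion_energy` and the data layout are hypothetical and are not the OSPREY implementation.

```python
from itertools import combinations


def tuple_expansion_energy(conf, coeffs, constant=0.0):
    """Approximate the energy of a conformation as a sum of tuple terms.

    conf     -- tuple of residue-conformation indices, one per design position
    coeffs   -- dict mapping a frozenset of (position, index) assignments
                to a coefficient; tuples absent from the dict contribute zero
    constant -- constant offset of the expansion
    """
    assignments = list(enumerate(conf))
    total = constant
    # Single-position terms.
    for a in assignments:
        total += coeffs.get(frozenset([a]), 0.0)
    # Pair terms; higher-order tuples could be added in the same way.
    for a, b in combinations(assignments, 2):
        total += coeffs.get(frozenset([a, b]), 0.0)
    return total


if __name__ == "__main__":
    # Hypothetical coefficients for a two-position toy problem.
    coeffs = {
        frozenset([(0, 1)]): -1.0,
        frozenset([(1, 0)]): -0.5,
        frozenset([(0, 1), (1, 0)]): 0.25,
    }
    print(tuple_expansion_energy((1, 0), coeffs))
```

Because the result is a plain lookup over discrete assignments, a table of this shape can be handed to discrete combinatorial optimizers, which is the property the abstract emphasizes.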
Affiliation(s)
- Mark A Hallen
- Department of Computer Science, Levine Science Research Center, Duke University, Durham, North Carolina
| | - Jonathan D Jou
- Department of Computer Science, Levine Science Research Center, Duke University, Durham, North Carolina
| | - Bruce R Donald
- Department of Computer Science, Levine Science Research Center, Duke University, Durham, North Carolina
- Department of Chemistry, Duke University, Durham, North Carolina
- Department of Biochemistry, Duke University Medical Center, Durham, North Carolina
| |
|
24
|
Gainza P, Nisonoff HM, Donald BR. Algorithms for protein design. Curr Opin Struct Biol 2016; 39:16-26. [PMID: 27086078 PMCID: PMC5065368 DOI: 10.1016/j.sbi.2016.03.006] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Received: 11/17/2015] [Revised: 03/15/2016] [Accepted: 03/22/2016] [Indexed: 02/05/2023]
Abstract
Computational structure-based protein design programs are becoming an increasingly important tool in molecular biology. These programs compute protein sequences that are predicted to fold to a target structure and perform a desired function. The success of a program's predictions largely relies on two components: first, the input biophysical model, and second, the algorithm that computes the best sequence(s) and structure(s) according to the biophysical model. Improving both the model and the algorithm in tandem is essential to improving the success rate of current programs, and here we review recent developments in algorithms for protein design, emphasizing how novel algorithms enable the use of more accurate biophysical models. We conclude with a list of algorithmic challenges in computational protein design that we believe will be especially important for the design of therapeutic proteins and protein assemblies.
Affiliation(s)
- Pablo Gainza
- Department of Computer Science, Duke University, Durham, NC, United States
| | - Hunter M Nisonoff
- Department of Computer Science, Duke University, Durham, NC, United States
| | - Bruce R Donald
- Department of Computer Science, Duke University, Durham, NC, United States
- Department of Biochemistry, Duke University Medical Center, Durham, NC, United States
- Department of Chemistry, Duke University, Durham, NC, United States
| |
|