1. Wankowicz SA, Ravikumar A, Sharma S, Riley BT, Raju A, Flowers J, Hogan D, van den Bedem H, Keedy DA, Fraser JS. Uncovering Protein Ensembles: Automated Multiconformer Model Building for X-ray Crystallography and Cryo-EM. bioRxiv 2024:2023.06.28.546963. [PMID: 37425870 PMCID: PMC10327213 DOI: 10.1101/2023.06.28.546963] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 07/11/2023]
Abstract
In their folded state, biomolecules exchange between multiple conformational states that are crucial for their function. Traditional structural biology methods, such as X-ray crystallography and cryogenic electron microscopy (cryo-EM), produce density maps that are ensemble averages, reflecting molecules in various conformations. Yet, most models derived from these maps explicitly represent only a single conformation, overlooking the complexity of biomolecular structures. To accurately reflect the diversity of biomolecular forms, there is a pressing need to shift towards modeling structural ensembles that mirror the experimental data. However, the challenge of distinguishing signal from noise complicates manual efforts to create these models. In response, we introduce the latest enhancements to qFit, an automated computational strategy designed to incorporate protein conformational heterogeneity into models built into density maps. These algorithmic improvements in qFit are substantiated by superior Rfree and geometry metrics across a wide range of proteins. Importantly, unlike more complex multicopy ensemble models, the multiconformer models produced by qFit can be manually modified in most major model building software (e.g. Coot) and fit can be further improved by refinement using standard pipelines (e.g. Phenix, Refmac, Buster). By reducing the barrier of creating multiconformer models, qFit can foster the development of new hypotheses about the relationship between macromolecular conformational dynamics and function.
Affiliation(s)
- Stephanie A. Wankowicz
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Ashraya Ravikumar
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Shivani Sharma
  - Structural Biology Initiative, CUNY Advanced Science Research Center, New York, NY 10031
  - Ph.D. Program in Biology, The Graduate Center – City University of New York, New York, NY 10016
- Blake T. Riley
  - Structural Biology Initiative, CUNY Advanced Science Research Center, New York, NY 10031
- Akshay Raju
  - Structural Biology Initiative, CUNY Advanced Science Research Center, New York, NY 10031
- Jessica Flowers
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Daniel Hogan
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Henry van den Bedem
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
  - Atomwise, Inc., San Francisco, CA, United States
- Daniel A. Keedy
  - Structural Biology Initiative, CUNY Advanced Science Research Center, New York, NY 10031
  - Department of Chemistry and Biochemistry, City College of New York, New York, NY 10031
  - Ph.D. Programs in Biochemistry, Biology, and Chemistry, The Graduate Center – City University of New York, New York, NY 10016
- James S. Fraser
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
2. Kortemme T. De novo protein design-From new structures to programmable functions. Cell 2024; 187:526-544. [PMID: 38306980 PMCID: PMC10990048 DOI: 10.1016/j.cell.2023.12.028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Revised: 12/03/2023] [Accepted: 12/19/2023] [Indexed: 02/04/2024]
Abstract
Methods from artificial intelligence (AI) trained on large datasets of sequences and structures can now "write" proteins with new shapes and molecular functions de novo, without starting from proteins found in nature. In this Perspective, I will discuss the state of the field of de novo protein design at the juncture of physics-based modeling approaches and AI. New protein folds and higher-order assemblies can be designed with considerable experimental success rates, and difficult problems requiring tunable control over protein conformations and precise shape complementarity for molecular recognition are coming into reach. Emerging approaches incorporate engineering principles (tunability, controllability, and modularity) into the design process from the beginning. Exciting frontiers lie in deconstructing cellular functions with de novo proteins and, conversely, constructing synthetic cellular signaling from the ground up. As methods improve, many more challenges are unsolved.
Affiliation(s)
- Tanja Kortemme
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA
  - Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, CA 94158, USA
  - Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
3. Talluri S. Algorithms for protein design. Adv Protein Chem Struct Biol 2022; 130:1-38. [PMID: 35534105 DOI: 10.1016/bs.apcsb.2022.01.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022]
Abstract
Computational Protein Design has the potential to contribute to major advances in enzyme technology, vaccine design, receptor-ligand engineering, biomaterials, nanosensors, and synthetic biology. Although Protein Design is a challenging problem, proteins can be designed by experts in Protein Design, as well as by non-experts whose primary interests are in the applications of Protein Design. The increased accessibility of Protein Design technology is attributable to the accumulated knowledge and experience with Protein Design as well as to the availability of software and online resources. The objective of this review is to serve as a guide to the relevant literature with a focus on the novel methods and algorithms that have been developed or applied for Protein Design, and to assist in the selection of algorithms for Protein Design. Novel algorithms and models that have been introduced to utilize the enormous amount of experimental data and novel computational hardware have the potential for producing substantial increases in the accuracy, reliability and range of applications of designed proteins.
Affiliation(s)
- Sekhar Talluri
  - Department of Biotechnology, GITAM, Visakhapatnam, India
4. Tailoring Escherichia coli Chemotactic Sensing towards Cadmium by Computational Redesign of Ribose-Binding Protein. mSystems 2022; 7:e0108421. [PMID: 35014867 PMCID: PMC8751387 DOI: 10.1128/msystems.01084-21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
Periplasmic binding proteins such as ribose-binding proteins (RBPs) are involved in the bacterial chemotaxis two-component system. RBP selectively identifies and interacts with ribose to induce a conformational change that leads to chemotaxis. Here, we report the development of an engineered Escherichia coli (E. coli) strain expressing a redesigned RBP that can effectively sense cadmium ions and regulate chemotactic movement of cells toward a cadmium ion gradient. RBP was computationally redesigned to bind cadmium ions and produce the conformational change required for chemoreceptor binding. The successful design, CdRBP1, binds to cadmium ions with a dissociation constant of 268 nM. When CdRBP1 was expressed in the periplasmic space of E. coli, the bacteria became live cadmium ion hunters with high selectivity over other divalent metal ions. This work presents an example of using cadmium ions, which are toxic for most organisms, as an attractant to regulate cell movement. Our approach also demonstrates that RBP can be precisely designed to develop metal-detecting living systems for potential applications in synthetic biology and environmental studies.
IMPORTANCE: Cadmium pollution is one of the major environmental problems due to excessive release and accumulation. New technologies that can auto-detect cadmium ions with good biocompatibility are urgently needed. In this study, we engineered the bacterial chemotaxis system to positively sense cadmium ions by redesigning ribose-binding protein (RBP) to tightly bind cadmium ions and produce the right conformational change for receptor binding and signaling. Our engineered E. coli cells can auto-detect and chase cadmium ions with divalent metal ion selectivity. Many attempts have been made to redesign RBP at the ribose binding site with little success. Instead of the ribose binding site, we introduced the cadmium binding site in the opening of the ribose binding pocket by a specially developed computational algorithm. Our design strategy can be applied to engineer live bacteria with autonomous detection and remediation abilities for metal ions or other chemicals in the future.
5. Bouchiba Y, Cortés J, Schiex T, Barbe S. Molecular flexibility in computational protein design: an algorithmic perspective. Protein Eng Des Sel 2021; 34:6271252. [PMID: 33959778 DOI: 10.1093/protein/gzab011] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/21/2020] [Revised: 03/12/2021] [Accepted: 03/29/2021] [Indexed: 12/19/2022]
Abstract
Computational protein design (CPD) is a powerful technique for engineering new proteins, with both great fundamental implications and diverse practical interests. However, the approximations usually made for computational efficiency, using a single fixed backbone and a discrete set of side chain rotamers, tend to produce rigid and hyper-stable folds that may lack functionality. These approximations contrast with the demonstrated importance of molecular flexibility and motions in a wide range of protein functions. The integration of backbone flexibility and multiple conformational states in CPD, in order to relieve the inaccuracies resulting from these simplifications and to improve design reliability, is attracting increased attention. However, the greatly increased search space that needs to be explored in these extensions defines extremely challenging computational problems. In this review, we outline the principles of CPD and discuss recent efforts in algorithmic developments for incorporating molecular flexibility in the design process.
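To make the "single fixed backbone and a discrete set of side chain rotamers" approximation concrete, here is a minimal sketch of the resulting combinatorial problem: pick one rotamer per position so that a pairwise-decomposed energy is minimal. Everything in it (the function names `total_energy` and `brute_force_gmec`, the energy tables, and the exhaustive search itself) is a hypothetical illustration rather than a method described in the review; real CPD software uses far more efficient search than brute force.

```python
from itertools import product

def total_energy(assignment, self_energy, pair_energy):
    """Pairwise-decomposed energy of one rotamer assignment.

    assignment[i] is the rotamer index chosen at position i.
    """
    energy = sum(self_energy[i][r] for i, r in enumerate(assignment))
    for i in range(len(assignment)):
        for j in range(i + 1, len(assignment)):
            energy += pair_energy[(i, j)][assignment[i]][assignment[j]]
    return energy

def brute_force_gmec(self_energy, pair_energy):
    """Exhaustively find the lowest-energy assignment (toy sizes only)."""
    choices = [range(len(rotamers)) for rotamers in self_energy]
    return min(
        product(*choices),
        key=lambda a: total_energy(a, self_energy, pair_energy),
    )

# Hypothetical toy problem: 3 positions, 2 rotamers each (arbitrary units).
self_energy = [[0.0, 1.0], [0.5, 0.0], [0.2, 0.3]]
pair_energy = {
    (0, 1): [[2.0, 0.0], [0.0, 0.0]],   # rotamers 0/0 clash
    (0, 2): [[0.0, 0.0], [0.0, 0.0]],
    (1, 2): [[0.0, 0.0], [0.0, -1.0]],  # rotamers 1/1 interact favorably
}
best = brute_force_gmec(self_energy, pair_energy)
print(best, total_energy(best, self_energy, pair_energy))
```

The search space grows as the product of the rotamer counts at each position, which is why adding backbone flexibility or multiple states, as the abstract notes, makes the problem much harder.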
Affiliation(s)
- Younes Bouchiba
  - Toulouse Biotechnology Institute, TBI, CNRS, INRAE, INSA, ANITI, Toulouse 31400, France
  - Laboratoire d'Analyse et d'Architecture des Systèmes, LAAS CNRS, Université de Toulouse, CNRS, Toulouse 31400, France
- Juan Cortés
  - Laboratoire d'Analyse et d'Architecture des Systèmes, LAAS CNRS, Université de Toulouse, CNRS, Toulouse 31400, France
- Thomas Schiex
  - Université de Toulouse, ANITI, INRAE, UR MIAT, F-31320, Castanet-Tolosan, France
- Sophie Barbe
  - Toulouse Biotechnology Institute, TBI, CNRS, INRAE, INSA, ANITI, Toulouse 31400, France
6. Pan X, Kortemme T. Recent advances in de novo protein design: Principles, methods, and applications. J Biol Chem 2021; 296:100558. [PMID: 33744284 PMCID: PMC8065224 DOI: 10.1016/j.jbc.2021.100558] [Citation(s) in RCA: 93] [Impact Index Per Article: 31.0] [Received: 01/13/2021] [Revised: 03/12/2021] [Accepted: 03/16/2021] [Indexed: 02/06/2023]
Abstract
Computational de novo protein design is increasingly applied to address a number of key challenges in biomedicine and biological engineering. Successes in expanding applications are driven by advances in design principles and methods over several decades. Here, we review recent innovations in major aspects of de novo protein design and include how these advances were informed by principles of protein architecture and interactions derived from the wealth of structures in the Protein Data Bank. We describe developments in de novo generation of designable backbone structures, optimization of sequences, design scoring functions, and the design of function. The advances not only highlight design goals reachable now but also point to the challenges and opportunities for the future of the field.
Affiliation(s)
- Xingjie Pan
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA
  - UC Berkeley - UCSF Graduate Program in Bioengineering, University of California San Francisco, San Francisco, California, USA
- Tanja Kortemme
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA
  - UC Berkeley - UCSF Graduate Program in Bioengineering, University of California San Francisco, San Francisco, California, USA
  - Quantitative Biosciences Institute (QBI), University of California San Francisco, San Francisco, California, USA
7. Lowegard AU, Frenkel MS, Holt GT, Jou JD, Ojewole AA, Donald BR. Novel, provable algorithms for efficient ensemble-based computational protein design and their application to the redesign of the c-Raf-RBD:KRas protein-protein interface. PLoS Comput Biol 2020; 16:e1007447. [PMID: 32511232 PMCID: PMC7329130 DOI: 10.1371/journal.pcbi.1007447] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/26/2019] [Revised: 07/01/2020] [Accepted: 05/13/2020] [Indexed: 11/25/2022]
Abstract
The K* algorithm provably approximates partition functions for a set of states (e.g., protein, ligand, and protein-ligand complex) to a user-specified accuracy ε. Often, reaching an ε-approximation for a particular set of partition functions takes a prohibitive amount of time and space. To alleviate some of this cost, we introduce two new algorithms into the osprey suite for protein design: fries, a Fast Removal of Inadequately Energied Sequences, and EWAK*, an Energy Window Approximation to K*. fries pre-processes the sequence space to limit a design to only the most stable, energetically favorable sequence possibilities. EWAK* then takes this pruned sequence space as input and, using a user-specified energy window, calculates K* scores using the lowest energy conformations. We expect fries/EWAK* to be most useful in cases where there are many unstable sequences in the design sequence space and when users are satisfied with enumerating the low-energy ensemble of conformations. In combination, these algorithms provably retain calculational accuracy while limiting the input sequence space and the conformations included in each partition function calculation to only the most energetically favorable, effectively reducing runtime while still enriching for desirable sequences. This combined approach led to significant speed-ups compared to the previous state-of-the-art multi-sequence algorithm, BBK*, while maintaining its efficiency and accuracy, which we show across 40 different protein systems and a total of 2,826 protein design problems. Additionally, as a proof of concept, we used these new algorithms to redesign the protein-protein interface (PPI) of the c-Raf-RBD:KRas complex. The Ras-binding domain of the protein kinase c-Raf (c-Raf-RBD) is the tightest known binder of KRas, a protein implicated in difficult-to-treat cancers. fries/EWAK* accurately retrospectively predicted the effect of 41 different sets of mutations in the PPI of the c-Raf-RBD:KRas complex. 
Notably, these mutations include mutations whose effect had previously been incorrectly predicted using other computational methods. Next, we used fries/EWAK* for prospective design and discovered a novel point mutation that improves binding of c-Raf-RBD to KRas in its active, GTP-bound state (KRasGTP). We combined this new mutation with two previously reported mutations (which were highly-ranked by osprey) to create a new variant of c-Raf-RBD, c-Raf-RBD(RKY). fries/EWAK* in osprey computationally predicted that this new variant binds even more tightly than the previous best-binding variant, c-Raf-RBD(RK). We measured the binding affinity of c-Raf-RBD(RKY) using a bio-layer interferometry (BLI) assay, and found that this new variant exhibits single-digit nanomolar affinity for KRasGTP, confirming the computational predictions made with fries/EWAK*. This new variant binds roughly five times more tightly than the previous best known binder and roughly 36 times more tightly than the design starting point (wild-type c-Raf-RBD). This study steps through the advancement and development of computational protein design by presenting theory, new algorithms, accurate retrospective designs, new prospective designs, and biochemical validation. Computational structure-based protein design is an innovative tool for redesigning proteins to introduce a particular or novel function. One such function is improving the binding of one protein to another, which can increase our understanding of important protein systems. Herein we introduce two novel, provable algorithms, fries and EWAK*, for more efficient computational structure-based protein design as well as their application to the redesign of the c-Raf-RBD:KRas protein-protein interface. These new algorithms speed-up computational structure-based protein design while maintaining accurate calculations, allowing for larger, previously infeasible protein designs. 
Additionally, using fries and EWAK* within the osprey suite, we designed the tightest known binder of KRas, a heavily studied cancer target that interacts with a number of different proteins. This previously undiscovered variant of a KRas-binding domain, c-Raf-RBD, has potential to serve as a tool to further probe the protein-protein interface of KRas with its effectors and its discovery alone emphasizes the potential for more successful applications of computational structure-based protein design.
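The idea of scoring binding with partition functions restricted to an energy window can be sketched as follows. This is only an illustrative toy under stated assumptions: the function names, the temperature, and the conformation energies are hypothetical, the score is taken to be the standard ratio of the complex partition function to the product of the unbound ones, and none of this is the osprey API or the provable fries/EWAK* implementation.

```python
import math

RT = 0.592  # kcal/mol, assuming a temperature of about 298 K

def partition_function(energies, window=None):
    """Boltzmann-weighted sum over conformation energies (kcal/mol).

    If `window` is given, only conformations within `window` kcal/mol
    of the lowest energy are included (an energy-window truncation).
    """
    e_min = min(energies)
    kept = [e for e in energies if window is None or e - e_min <= window]
    return sum(math.exp(-e / RT) for e in kept)

def k_star_score(complex_e, protein_e, ligand_e, window=None):
    """Ratio of bound to unbound partition functions (assumed score form)."""
    q_complex = partition_function(complex_e, window)
    q_protein = partition_function(protein_e, window)
    q_ligand = partition_function(ligand_e, window)
    return q_complex / (q_protein * q_ligand)

# Hypothetical conformation energies for each state.
complex_e = [-10.0, -9.5, -4.0]
protein_e = [-3.0, -2.8]
ligand_e = [-2.0, -1.0]

q_full = partition_function(complex_e)
q_win = partition_function(complex_e, window=1.0)
print(q_win / q_full)  # fraction of the full sum kept by the window
print(k_star_score(complex_e, protein_e, ligand_e, window=1.0))
```

Because Boltzmann weights fall off exponentially with energy, the high-energy conformation dropped by the window contributes very little to the sum in this toy case, which is the intuition behind restricting the calculation to the low-energy ensemble.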
Affiliation(s)
- Anna U. Lowegard
  - Program in Computational Biology and Bioinformatics, Duke University Medical Center, Durham, North Carolina, United States of America
  - Department of Computer Science, Duke University, Durham, North Carolina, United States of America
- Marcel S. Frenkel
  - Department of Biochemistry, Duke University Medical Center, Durham, North Carolina, United States of America
- Graham T. Holt
  - Program in Computational Biology and Bioinformatics, Duke University Medical Center, Durham, North Carolina, United States of America
  - Department of Computer Science, Duke University, Durham, North Carolina, United States of America
- Jonathan D. Jou
  - Department of Computer Science, Duke University, Durham, North Carolina, United States of America
- Adegoke A. Ojewole
  - Program in Computational Biology and Bioinformatics, Duke University Medical Center, Durham, North Carolina, United States of America
  - Department of Computer Science, Duke University, Durham, North Carolina, United States of America
- Bruce R. Donald
  - Department of Computer Science, Duke University, Durham, North Carolina, United States of America
  - Department of Biochemistry, Duke University Medical Center, Durham, North Carolina, United States of America
8. Jou JD, Holt GT, Lowegard AU, Donald BR. Minimization-Aware Recursive K*: A Novel, Provable Algorithm that Accelerates Ensemble-Based Protein Design and Provably Approximates the Energy Landscape. J Comput Biol 2019; 27:550-564. [PMID: 31855059 DOI: 10.1089/cmb.2019.0315] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 01/09/2023]
Abstract
Protein design algorithms that model continuous sidechain flexibility and conformational ensembles better approximate the in vitro and in vivo behavior of proteins. The previous state of the art, iMinDEE-A*-K*, computes provable ɛ-approximations to partition functions of protein states (e.g., bound vs. unbound) by computing provable, admissible pairwise-minimized energy lower bounds on protein conformations, and using the A* enumeration algorithm to return a gap-free list of lowest-energy conformations. iMinDEE-A*-K* runs in time sublinear in the number of conformations, but can be trapped in loosely-bounded, low-energy conformational wells containing many conformations with highly similar energies. That is, iMinDEE-A*-K* is unable to exploit the correlation between protein conformation and energy: similar conformations often have similar energy. We introduce two new concepts that exploit this correlation: Minimization-Aware Enumeration and Recursive K*. We combine these two insights into a novel algorithm, Minimization-Aware Recursive K* (MARK*), which tightens bounds not on single conformations, but instead on distinct regions of the conformation space. We compare the performance of iMinDEE-A*-K* versus MARK* by running the Branch and Bound over K* (BBK*) algorithm, which provably returns sequences in order of decreasing K* score, using either iMinDEE-A*-K* or MARK* to approximate partition functions. We show on 200 design problems that MARK* not only enumerates and minimizes vastly fewer conformations than the previous state of the art, but also runs up to 2 orders of magnitude faster. Finally, we show that MARK* not only efficiently approximates the partition function, but also provably approximates the energy landscape. To our knowledge, MARK* is the first algorithm to do so. 
We use MARK* to analyze the change in energy landscape of the bound and unbound states of an HIV-1 capsid protein C-terminal domain in complex with a camelid VHH, and measure the change in conformational entropy induced by binding. Thus, MARK* both accelerates existing designs and offers new capabilities not possible with previous algorithms.
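The notion of an ε-approximation to a partition function from a gap-free, lowest-energy-first list of conformations can be illustrated with a simplified stopping rule. The sketch below assumes exact energies that are already sorted, rather than the pairwise-minimized lower bounds and A* enumeration described in the abstract; the function name and the toy energies are hypothetical, and this is not the iMinDEE-A*-K* or MARK* implementation.

```python
import math

RT = 0.592  # kcal/mol, assuming a temperature of about 298 K

def epsilon_partition_function(sorted_energies, epsilon):
    """Return (q_star, n_used) such that q_star >= (1 - epsilon) * q.

    `sorted_energies` must be in increasing order with no gaps, so every
    conformation not yet visited has a Boltzmann weight no larger than
    that of the last one visited.
    """
    total = len(sorted_energies)
    q_star = 0.0
    for n_used, energy in enumerate(sorted_energies, start=1):
        weight = math.exp(-energy / RT)
        q_star += weight
        # Upper bound on everything still unenumerated.
        remaining_bound = (total - n_used) * weight
        # q_star >= (1 - eps) * (q_star + bound)  <=>  bound <= eps/(1-eps) * q_star
        if remaining_bound <= epsilon / (1.0 - epsilon) * q_star:
            return q_star, n_used
    return q_star, total

# Hypothetical conformation energies (kcal/mol), lowest first.
energies = sorted([-10.0, -9.8, -9.0, -5.0, -4.0, -3.0, -1.0, 0.0])
q_exact = sum(math.exp(-e / RT) for e in energies)
q_star, n_used = epsilon_partition_function(energies, epsilon=0.01)
print(n_used, q_star / q_exact)
```

In this toy case the guarantee is reached after visiting only part of the list, which is the kind of saving that makes ordered enumeration attractive; the abstract's point is that wells containing many conformations of similar energy weaken such per-conformation bounds.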
Affiliation(s)
- Jonathan D Jou
  - Department of Computer Science, Duke University, Durham, North Carolina
- Graham T Holt
  - Department of Computer Science, Duke University, Durham, North Carolina
  - Computational Biology and Bioinformatics Program, Duke University, Durham, North Carolina
- Anna U Lowegard
  - Department of Computer Science, Duke University, Durham, North Carolina
  - Computational Biology and Bioinformatics Program, Duke University, Durham, North Carolina
- Bruce R Donald
  - Department of Computer Science, Duke University, Durham, North Carolina
  - Department of Biochemistry, Duke University Medical Center, Durham, North Carolina
  - Department of Chemistry, Duke University, Durham, North Carolina
9. Holt GT, Jou JD, Gill NP, Lowegard AU, Martin JW, Madden DR, Donald BR. Computational Analysis of Energy Landscapes Reveals Dynamic Features That Contribute to Binding of Inhibitors to CFTR-Associated Ligand. J Phys Chem B 2019; 123:10441-10455. [PMID: 31697075 DOI: 10.1021/acs.jpcb.9b07278] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 12/31/2022]
Abstract
The CFTR-associated ligand PDZ domain (CALP) binds to the cystic fibrosis transmembrane conductance regulator (CFTR) and mediates lysosomal degradation of mature CFTR. Inhibition of this interaction has been explored as a therapeutic avenue for cystic fibrosis. Previously, we reported the ensemble-based computational design of a novel peptide inhibitor of CALP, which resulted in the most binding-efficient inhibitor to date. This inhibitor, kCAL01, was designed using osprey and evinced significant biological activity in in vitro cell-based assays. Here, we report a crystal structure of kCAL01 bound to CALP and compare structural features against iCAL36, a previously developed inhibitor of CALP. We compute side-chain energy landscapes for each structure to not only enable approximation of binding thermodynamics but also reveal ensemble features that contribute to the comparatively efficient binding of kCAL01. Finally, we compare the previously reported design ensemble for kCAL01 vs the new crystal structure and show that, despite small differences between the design model and crystal structure, significant biophysical features that enhance inhibitor binding are captured in the design ensemble. This suggests not only that ensemble-based design captured thermodynamically significant features observed in vitro, but also that a design eschewing ensembles would miss the kCAL01 sequence entirely.
Affiliation(s)
- Graham T Holt
  - Department of Computer Science, Duke University, Durham, North Carolina 27708, United States
  - Program in Computational Biology and Bioinformatics, Duke University, Durham, North Carolina 27708, United States
- Jonathan D Jou
  - Department of Computer Science, Duke University, Durham, North Carolina 27708, United States
- Nicholas P Gill
  - Department of Biochemistry & Cell Biology, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire 03755, United States
- Anna U Lowegard
  - Department of Computer Science, Duke University, Durham, North Carolina 27708, United States
  - Program in Computational Biology and Bioinformatics, Duke University, Durham, North Carolina 27708, United States
- Jeffrey W Martin
  - Department of Computer Science, Duke University, Durham, North Carolina 27708, United States
- Dean R Madden
  - Department of Biochemistry & Cell Biology, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire 03755, United States
- Bruce R Donald
  - Department of Computer Science, Duke University, Durham, North Carolina 27708, United States
  - Department of Biochemistry, Duke University, Durham, North Carolina 27710, United States
  - Department of Chemistry, Duke University, Durham, North Carolina 27710, United States
10. Loshbaugh AL, Kortemme T. Comparison of Rosetta flexible-backbone computational protein design methods on binding interactions. Proteins 2019; 88:206-226. [PMID: 31344278 DOI: 10.1002/prot.25790] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 04/05/2019] [Revised: 07/15/2019] [Accepted: 07/19/2019] [Indexed: 01/03/2023]
Abstract
Computational design of binding sites in proteins remains difficult, in part due to limitations in our current ability to sample backbone conformations that enable precise and accurate geometric positioning of side chains during sequence design. Here we present a benchmark framework for comparison between flexible-backbone design methods applied to binding interactions. We quantify the ability of different flexible backbone design methods in the widely used protein design software Rosetta to recapitulate observed protein sequence profiles assumed to represent functional protein/protein and protein/small molecule binding interactions. The CoupledMoves method, which combines backbone flexibility and sequence exploration into a single acceptance step during the sampling trajectory, better recapitulates observed sequence profiles than the BackrubEnsemble and FastDesign methods, which separate backbone flexibility and sequence design into separate acceptance steps during the sampling trajectory. Flexible-backbone design with the CoupledMoves method is a powerful strategy for reducing sequence space to generate targeted libraries for experimental screening and selection.
Affiliation(s)
- Amanda L Loshbaugh
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California
  - Biophysics Graduate Program, University of California San Francisco, San Francisco, California
- Tanja Kortemme
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California
  - Biophysics Graduate Program, University of California San Francisco, San Francisco, California
  - Quantitative Biosciences Institute, University of California San Francisco, San Francisco, California
  - Chan Zuckerberg Biohub, San Francisco, California
11. Hallen MA, Martin JW, Ojewole A, Jou JD, Lowegard AU, Frenkel MS, Gainza P, Nisonoff HM, Mukund A, Wang S, Holt GT, Zhou D, Dowd E, Donald BR. OSPREY 3.0: Open-source protein redesign for you, with powerful new features. J Comput Chem 2018; 39:2494-2507. [PMID: 30368845 PMCID: PMC6391056 DOI: 10.1002/jcc.25522] [Citation(s) in RCA: 49] [Impact Index Per Article: 8.2] [Received: 04/23/2018] [Accepted: 06/14/2018] [Indexed: 12/14/2022]
Abstract
We present osprey 3.0, a new and greatly improved release of the osprey protein design software. Osprey 3.0 features a convenient new Python interface, which greatly improves its ease of use. It is over two orders of magnitude faster than previous versions of osprey when running the same algorithms on the same hardware. Moreover, osprey 3.0 includes several new algorithms, which introduce substantial speedups as well as improved biophysical modeling. It also includes GPU support, which provides an additional speedup of over an order of magnitude. Like previous versions of osprey, osprey 3.0 offers a unique package of advantages over other design software, including provable design algorithms that account for continuous flexibility during design and model conformational entropy. Finally, we show here empirically that osprey 3.0 accurately predicts the effect of mutations on protein-protein binding. Osprey 3.0 is available at http://www.cs.duke.edu/donaldlab/osprey.php as free and open-source software. © 2018 Wiley Periodicals, Inc.
Collapse
Affiliation(s)
- Mark A. Hallen
- Department of Computer Science, Duke University, Durham, NC 27708
- Toyota Technological Institute at Chicago, Chicago, IL 60637
| | | | - Adegoke Ojewole
- Program in Computational Biology and Bioinformatics, Duke University Medical Center, Durham, NC 27710
| | - Jonathan D. Jou
- Department of Computer Science, Duke University, Durham, NC 27708
| | - Anna U. Lowegard
- Program in Computational Biology and Bioinformatics, Duke University Medical Center, Durham, NC 27710
| | - Marcel S. Frenkel
- Department of Biochemistry, Duke University Medical Center, Durham, NC 27710
| | - Pablo Gainza
- Department of Computer Science, Duke University, Durham, NC 27708
| | | | - Aditya Mukund
- Department of Computer Science, Duke University, Durham, NC 27708
| | - Siyu Wang
- Program in Computational Biology and Bioinformatics, Duke University Medical Center, Durham, NC 27710
| | - Graham T. Holt
- Program in Computational Biology and Bioinformatics, Duke University Medical Center, Durham, NC 27710
| | - David Zhou
- Department of Computer Science, Duke University, Durham, NC 27708
| | - Elizabeth Dowd
- Department of Computer Science, Duke University, Durham, NC 27708
| | - Bruce R. Donald
- Department of Computer Science, Duke University, Durham, NC 27708
- Department of Chemistry, Duke University, Durham, NC 27708
- Department of Biochemistry, Duke University Medical Center, Durham, NC 27710
| |
|
12
|
Hallen MA, Donald BR. CATS (Coordinates of Atoms by Taylor Series): protein design with backbone flexibility in all locally feasible directions. Bioinformatics 2018; 33:i5-i12. [PMID: 28882005 PMCID: PMC5870559 DOI: 10.1093/bioinformatics/btx277] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 11/12/2022] Open
Abstract
Motivation When proteins mutate or bind to ligands, their backbones often move significantly, especially in loop regions. Computational protein design algorithms must model these motions in order to accurately optimize protein stability and binding affinity. However, methods for backbone conformational search in design have been much more limited than for sidechain conformational search. This is especially true for combinatorial protein design algorithms, which aim to search a large sequence space efficiently and thus cannot rely on temporal simulation of each candidate sequence. Results We alleviate this difficulty with a new parameterization of backbone conformational space, which represents all degrees of freedom of a specified segment of protein chain that maintain valid bonding geometry (by maintaining the original bond lengths and angles and ω dihedrals). In order to search this space, we present an efficient algorithm, CATS, for computing atomic coordinates as a function of our new continuous backbone internal coordinates. CATS generalizes the iMinDEE and EPIC protein design algorithms, which model continuous flexibility in sidechain dihedrals, to model continuous, appropriately localized flexibility in the backbone dihedrals ϕ and ψ as well. We show using 81 test cases based on 29 different protein structures that CATS finds sequences and conformations that are significantly lower in energy than methods with less or no backbone flexibility do. In particular, we show that CATS can model the viability of an antibody mutation known experimentally to increase affinity, but that appears sterically infeasible when modeled with less or no backbone flexibility. Availability and implementation Our code is available as free software at https://github.com/donaldlab/OSPREY_refactor. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mark A Hallen
- Department of Computer Science, Duke University, Durham, NC, USA; Toyota Technological Institute at Chicago, Chicago, IL, USA
| | - Bruce R Donald
- Department of Computer Science, Duke University, Durham, NC, USA; Department of Chemistry, Duke University, Durham, NC, USA; Department of Biochemistry, Duke University Medical Center, Durham, NC, USA
| |
|
13
|
Xiao X, Wang Y, Leonard JN, Hall CK. Extended Concerted Rotation Technique Enhances the Sampling Efficiency of the Computational Peptide-Design Algorithm. J Chem Theory Comput 2017; 13:5709-5720. [PMID: 29023116 DOI: 10.1021/acs.jctc.7b00714] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 02/08/2023]
Abstract
To enhance the sampling efficiency of our computational peptide-design algorithm in conformational space, the concerted rotation (CONROT) technique is extended to enable larger conformational perturbations of peptide chains. This allows us to make relatively large peptide conformation changes during the process of designing peptide sequences to bind with high affinity to a specific target. Searches conducted using the new algorithm identified six potential λ N(2-22) peptide variants, called B1-B6, which bind to boxB RNA with high affinity. The results of explicit-solvent atomistic molecular dynamics simulations revealed that four of the evolved peptides, viz. B1, B2, B3, and B5, are excellent candidate binders to the target boxB RNA as they have lower binding free energies than the original λ N(2-22) peptide. Three of the four peptides, B2, B3, and B5, result from searches that contain both sequence and conformation changes, indicating that adding backbone motif changes to the peptide-design algorithm improves its performance considerably.
Affiliation(s)
- Xingqing Xiao
- Chemical and Biomolecular Engineering Department, North Carolina State University, Raleigh, North Carolina 27695-7905, United States
| | - Yiming Wang
- Chemical and Biomolecular Engineering Department, North Carolina State University, Raleigh, North Carolina 27695-7905, United States
| | - Joshua N Leonard
- Chemical and Biological Engineering Department and Chemistry of Life Processes Institute, Northwestern University, Evanston, Illinois 60208, United States
| | - Carol K Hall
- Chemical and Biomolecular Engineering Department, North Carolina State University, Raleigh, North Carolina 27695-7905, United States
| |
|
14
|
Jain S, Jou JD, Georgiev IS, Donald BR. A critical analysis of computational protein design with sparse residue interaction graphs. PLoS Comput Biol 2017; 13:e1005346. [PMID: 28358804 PMCID: PMC5391103 DOI: 10.1371/journal.pcbi.1005346] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/02/2015] [Revised: 04/13/2017] [Accepted: 01/03/2017] [Indexed: 11/19/2022] Open
Abstract
Protein design algorithms enumerate a combinatorial number of candidate structures to compute the Global Minimum Energy Conformation (GMEC). To efficiently find the GMEC, protein design algorithms must methodically reduce the conformational search space. By applying distance and energy cutoffs, the protein system to be designed can thus be represented using a sparse residue interaction graph, where the number of interacting residue pairs is less than all pairs of mutable residues, and the corresponding GMEC is called the sparse GMEC. However, ignoring some pairwise residue interactions can lead to a change in the energy, conformation, or sequence of the sparse GMEC vs. the original or the full GMEC. Despite the widespread use of sparse residue interaction graphs in protein design, the above mentioned effects of their use have not been previously analyzed. To analyze the costs and benefits of designing with sparse residue interaction graphs, we computed the GMECs for 136 different protein design problems both with and without distance and energy cutoffs, and compared their energies, conformations, and sequences. Our analysis shows that the differences between the GMECs depend critically on whether or not the design includes core, boundary, or surface residues. Moreover, neglecting long-range interactions can alter local interactions and introduce large sequence differences, both of which can result in significant structural and functional changes. Designs on proteins with experimentally measured thermostability show it is beneficial to compute both the full and the sparse GMEC accurately and efficiently. To this end, we show that a provable, ensemble-based algorithm can efficiently compute both GMECs by enumerating a small number of conformations, usually fewer than 1000. 
This provides a novel way to combine sparse residue interaction graphs with provable, ensemble-based algorithms to reap the benefits of sparse residue interaction graphs while avoiding their potential inaccuracies. Computational structure-based protein design algorithms have successfully redesigned proteins to fold and bind target substrates in vitro, and even in vivo. Because the complexity of a computational design increases dramatically with the number of mutable residues, many design algorithms employ cutoffs (distance or energy) to neglect some pairwise residue interactions, thereby reducing the effective search space and computational cost. However, the energies neglected by such cutoffs can add up, which may have nontrivial effects on the designed sequence and its function. To study the effects of using cutoffs on protein design, we computed the optimal sequence both with and without cutoffs, and showed that neglecting long-range interactions can significantly change the computed conformation and sequence. Designs on proteins with experimentally measured thermostability showed the benefits of computing the optimal sequences (and their conformations), both with and without cutoffs, efficiently and accurately. Therefore, we also showed that a provable, ensemble-based algorithm can efficiently compute the optimal conformation and sequence, both with and without applying cutoffs, by enumerating a small number of conformations, usually fewer than 1000. This provides a novel way to combine cutoffs with provable, ensemble-based algorithms to reap the computational efficiency of cutoffs while avoiding their potential inaccuracies.
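The effect described in this abstract, where dropping long-range pairwise terms changes the computed optimum, can be illustrated with a toy calculation. The following is a minimal sketch, not the authors' method or code: the function names `sparse_edges` and `gmec`, the three-position system, and all energies and distances are made-up assumptions, and the search is plain brute force rather than a provable design algorithm.

```python
from itertools import product

def sparse_edges(distances, cutoff):
    """Keep only residue pairs whose (hypothetical) distance is within the cutoff."""
    return {pair for pair, d in distances.items() if d <= cutoff}

def gmec(self_e, pair_e, edges):
    """Brute-force minimum-energy conformation over the given interaction edges.

    self_e[i][r]         : self energy of rotamer r at position i
    pair_e[(i, j)][r][s] : pairwise energy of rotamers r, s at positions i < j
    edges                : set of (i, j) pairs whose pairwise term is included
    """
    best = None
    for conf in product(*(range(len(e)) for e in self_e)):
        energy = sum(self_e[i][r] for i, r in enumerate(conf))
        energy += sum(pair_e[(i, j)][conf[i]][conf[j]] for (i, j) in edges)
        if best is None or energy < best[0]:
            best = (energy, conf)
    return best

# Toy system (all numbers invented for illustration): 3 positions, 2 rotamers each.
self_e = [[0.0, 0.2], [0.0, 0.1], [0.0, 0.3]]
pair_e = {
    (0, 1): [[-1.0, 0.0], [0.0, -1.0]],
    (1, 2): [[-1.0, 0.0], [0.0, -1.0]],
    (0, 2): [[1.5, 0.0], [0.0, 1.5]],   # long-range term that the cutoff discards
}
distances = {(0, 1): 4.0, (1, 2): 4.5, (0, 2): 9.0}

full_gmec = gmec(self_e, pair_e, set(pair_e))
sparse_gmec = gmec(self_e, pair_e, sparse_edges(distances, cutoff=8.0))
print("full GMEC:  ", full_gmec)
print("sparse GMEC:", sparse_gmec)
```

With these invented numbers the full and sparse optima are different conformations, because the discarded (0, 2) term penalizes exactly the pairing that the sparse model prefers.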
Affiliation(s)
- Swati Jain
- Computational Biology and Bioinformatics Program, Duke University, Durham, North Carolina, United States of America
- Department of Computer Science, Duke University, Durham, North Carolina, United States of America
- Department of Biochemistry, Duke University Medical Center, Durham, North Carolina, United States of America
| | - Jonathan D. Jou
- Department of Computer Science, Duke University, Durham, North Carolina, United States of America
| | - Ivelin S. Georgiev
- Department of Computer Science, Duke University, Durham, North Carolina, United States of America
| | - Bruce R. Donald
- Department of Computer Science, Duke University, Durham, North Carolina, United States of America
- Department of Biochemistry, Duke University Medical Center, Durham, North Carolina, United States of America
- Department of Chemistry, Duke University, Durham, North Carolina, United States of America
- * E-mail:
| |
|
15
|
Abstract
Protein-protein interactions play critical roles in essentially every cellular process. These interactions are often mediated by protein interaction domains that enable proteins to recognize their interaction partners, often by binding to short peptide motifs. For example, PDZ domains, which are among the most common protein interaction domains in the human proteome, recognize specific linear peptide sequences that are often at the C-terminus of other proteins. Determining the set of peptide sequences that a protein interaction domain binds, or its "peptide specificity," is crucial for understanding its cellular function, and predicting how mutations impact peptide specificity is important for elucidating the mechanisms underlying human diseases. Moreover, engineering novel cellular functions for synthetic biology applications, such as the biosynthesis of biofuels or drugs, requires the design of protein interaction specificity to avoid crosstalk with native metabolic and signaling pathways. The ability to accurately predict and design protein-peptide interaction specificity is therefore critical for understanding and engineering biological function. One approach that has recently been employed toward accomplishing this goal is computational protein design. This chapter provides an overview of recent methodological advances in computational protein design and highlights examples of how these advances can enable increased accuracy in predicting and designing peptide specificity.
Affiliation(s)
- Noah Ollikainen
- Division of Biology and Biological Engineering, California Institute of Technology, 1200 East California Blvd., Pasadena, CA, 91125, USA
| |
|
16
|
Hallen MA, Gainza P, Donald BR. Compact Representation of Continuous Energy Surfaces for More Efficient Protein Design. J Chem Theory Comput 2016; 11:2292-306. [PMID: 26089744 DOI: 10.1021/ct501031m] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Indexed: 11/30/2022]
Abstract
In macromolecular design, conformational energies are sensitive to small changes in atom coordinates; thus, modeling the small, continuous motions of atoms around low-energy wells confers a substantial advantage in structural accuracy. However, modeling these motions comes at the cost of a very large number of energy function calls, which form the bottleneck in the design calculations. In this work, we remove this bottleneck by consolidating all conformational energy evaluations into the pre-computation of a local polynomial expansion of the energy about the "ideal" conformation for each low-energy, "rotameric" state of each residue pair. This expansion is called "energy as polynomials in internal coordinates" (EPIC), where the internal coordinates can be side-chain dihedrals, backrub angles, and/or any other continuous degrees of freedom of a macromolecule, and any energy function can be used without adding any asymptotic complexity to the design. We demonstrate that EPIC efficiently represents the energy surface for both molecular-mechanics and quantum-mechanical energy functions, and apply it specifically to protein design for modeling both side chain and backbone degrees of freedom.
|
17
|
Regan L, Caballero D, Hinrichsen MR, Virrueta A, Williams DM, O'Hern CS. Protein design: Past, present, and future. Biopolymers 2016; 104:334-50. [PMID: 25784145 DOI: 10.1002/bip.22639] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 01/27/2015] [Revised: 03/05/2015] [Accepted: 03/07/2015] [Indexed: 01/16/2023]
Abstract
Building on the pioneering work of Ho and DeGrado (J Am Chem Soc 1987, 109, 6751-6758) in the late 1980s, protein design approaches have revealed many fundamental features of protein structure and stability. We are now in the era that the early work presaged - the design of new proteins with practical applications and uses. Here we briefly survey some past milestones in protein design, in addition to highlighting recent progress and future aspirations.
Affiliation(s)
- Lynne Regan
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT; Department of Chemistry, Yale University, New Haven, CT; Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, CT
| | - Diego Caballero
- Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, CT; Department of Physics, Yale University, New Haven, CT
| | - Michael R Hinrichsen
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT
| | - Alejandro Virrueta
- Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, CT; Department of Mechanical Engineering and Materials Science, Yale University, New Haven, CT
| | - Danielle M Williams
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT
| | - Corey S O'Hern
- Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, CT; Department of Physics, Yale University, New Haven, CT; Department of Mechanical Engineering and Materials Science, Yale University, New Haven, CT; Department of Applied Physics, Yale University, New Haven, CT
| |
|
18
|
Xiao X, Agris PF, Hall CK. Designing peptide sequences in flexible chain conformations to bind RNA: a search algorithm combining Monte Carlo, self-consistent mean field and concerted rotation techniques. J Chem Theory Comput 2016; 11:740-52. [PMID: 26579605 DOI: 10.1021/ct5008247] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 01/20/2023]
Abstract
A search algorithm combining Monte Carlo, self-consistent mean field, and concerted rotation techniques was developed to discover peptide sequences that are reasonable HIV drug candidates due to their exceptional binding to human tRNAUUU(Lys3), the primer of HIV replication. The search algorithm allows for iteration between sequence mutations and conformation changes during sequence evolution. Searches conducted for different classes of peptides identified several potential peptide candidates. Analysis of the energy revealed that the asparagine and cysteine at residues 11 and 12 play important roles in "recognizing" tRNA(Lys3) via van der Waals interactions, contributing to binding specificity. Arginines preferentially attract the phosphate linkage via charge-charge interaction, contributing to binding affinity. Evaluation of the RNA/peptide complex's structure revealed that adding conformation changes to the search algorithm yields peptides with better binding affinity and specificity to tRNA(Lys3) than a previous mutation-only algorithm.
Affiliation(s)
- Xingqing Xiao
- Chemical and Biomolecular Engineering Department, North Carolina State University, Raleigh, North Carolina 27695-7905, United States
| | - Paul F Agris
- The RNA Institute, University at Albany, State University of New York, Albany, New York 12222, United States
| | - Carol K Hall
- Chemical and Biomolecular Engineering Department, North Carolina State University, Raleigh, North Carolina 27695-7905, United States
| |
|
19
|
Maximova T, Moffatt R, Ma B, Nussinov R, Shehu A. Principles and Overview of Sampling Methods for Modeling Macromolecular Structure and Dynamics. PLoS Comput Biol 2016; 12:e1004619. [PMID: 27124275 PMCID: PMC4849799 DOI: 10.1371/journal.pcbi.1004619] [Citation(s) in RCA: 132] [Impact Index Per Article: 16.5] [Indexed: 01/08/2023] Open
Abstract
Investigation of macromolecular structure and dynamics is fundamental to understanding how macromolecules carry out their functions in the cell. Significant advances have been made toward this end in silico, with a growing number of computational methods proposed yearly to study and simulate various aspects of macromolecular structure and dynamics. This review aims to provide an overview of recent advances, focusing primarily on methods proposed for exploring the structure space of macromolecules in isolation and in assemblies for the purpose of characterizing equilibrium structure and dynamics. In addition to surveying recent applications that showcase current capabilities of computational methods, this review highlights state-of-the-art algorithmic techniques proposed to overcome challenges posed in silico by the disparate spatial and time scales accessed by dynamic macromolecules. This review is not meant to be exhaustive, as such an endeavor is impossible, but rather aims to balance breadth and depth of strategies for modeling macromolecular structure and dynamics for a broad audience of novices and experts.
Affiliation(s)
- Tatiana Maximova
- Department of Computer Science, George Mason University, Fairfax, Virginia, United States of America
| | - Ryan Moffatt
- Department of Computer Science, George Mason University, Fairfax, Virginia, United States of America
| | - Buyong Ma
- Basic Science Program, Leidos Biomedical Research, Inc. Cancer and Inflammation Program, National Cancer Institute, Frederick, Maryland, United States of America
| | - Ruth Nussinov
- Basic Science Program, Leidos Biomedical Research, Inc. Cancer and Inflammation Program, National Cancer Institute, Frederick, Maryland, United States of America
- Sackler Institute of Molecular Medicine, Department of Human Genetics and Molecular Medicine, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Amarda Shehu
- Department of Computer Science, George Mason University, Fairfax, Virginia, United States of America
- Department of Bioengineering, George Mason University, Fairfax, Virginia, United States of America
- School of Systems Biology, George Mason University, Manassas, Virginia, United States of America
| |
|
20
|
Jou JD, Jain S, Georgiev IS, Donald BR. BWM*: A Novel, Provable, Ensemble-based Dynamic Programming Algorithm for Sparse Approximations of Computational Protein Design. J Comput Biol 2016; 23:413-24. [PMID: 26744898 DOI: 10.1089/cmb.2015.0194] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Indexed: 11/13/2022] Open
Abstract
Sparse energy functions that ignore long range interactions between residue pairs are frequently used by protein design algorithms to reduce computational cost. Current dynamic programming algorithms that fully exploit the optimal substructure produced by these energy functions only compute the GMEC. This disproportionately favors the sequence of a single, static conformation and overlooks better binding sequences with multiple low-energy conformations. Provable, ensemble-based algorithms such as A* avoid this problem, but A* cannot guarantee better performance than exhaustive enumeration. We propose a novel, provable, dynamic programming algorithm called Branch-Width Minimization* (BWM*) to enumerate a gap-free ensemble of conformations in order of increasing energy. Given a branch-decomposition of branch-width w for an n-residue protein design with at most q discrete side-chain conformations per residue, BWM* returns the sparse GMEC in O([Formula: see text]) time and enumerates each additional conformation in merely O([Formula: see text]) time. We define a new measure, Total Effective Search Space (TESS), which can be computed efficiently a priori before BWM* or A* is run. We ran BWM* on 67 protein design problems and found that TESS discriminated between BWM*-efficient and A*-efficient cases with 100% accuracy. As predicted by TESS and validated experimentally, BWM* outperforms A* in 73% of the cases and computes the full ensemble or a close approximation faster than A*, enumerating each additional conformation in milliseconds. Unlike A*, the performance of BWM* can be predicted in polynomial time before running the algorithm, which gives protein designers the power to choose the most efficient algorithm for their particular design problem.
Affiliation(s)
- Jonathan D Jou
- Department of Computer Science, Duke University, Durham, North Carolina
| | - Swati Jain
- Department of Computer Science, Duke University, Durham, North Carolina; Department of Biochemistry, Duke University Medical Center, Durham, North Carolina; Department of Computational Biology and Bioinformatics Program, Duke University, Durham, North Carolina
| | - Ivelin S Georgiev
- Department of Computer Science, Duke University, Durham, North Carolina
| | - Bruce R Donald
- Department of Computer Science, Duke University, Durham, North Carolina; Department of Biochemistry, Duke University Medical Center, Durham, North Carolina; Department of Chemistry, Duke University, Durham, North Carolina
| |
|
21
|
Post-Transcriptional Modifications of RNA: Impact on RNA Function and Human Health. MODIFIED NUCLEIC ACIDS IN BIOLOGY AND MEDICINE 2016. [DOI: 10.1007/978-3-319-34175-0_5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 01/19/2023]
|
22
|
Abstract
PocketOptimizer is a computational method to design protein binding pockets that has been recently developed. Starting from a protein structure an existing small molecule binding pocket is optimized for the recognition of a new ligand. The modular program predicts mutations that will improve the affinity of a target small molecule to the protein of interest using a receptor-ligand scoring function to estimate the binding free energy. PocketOptimizer has been tested in a comprehensive benchmark and predicted mutations have also been used in experimental tests. In this chapter, we will provide general recommendations for usage as well as an in-depth description of all individual PocketOptimizer modules.
Affiliation(s)
- Andre C Stiel
- Max Planck Institute for Developmental Biology, Tübingen, Germany
| | - Mehdi Nellen
- Max Planck Institute for Developmental Biology, Tübingen, Germany
| | - Birte Höcker
- Max Planck Institute for Developmental Biology, Tübingen, Germany; Lehrstuhl für Biochemie, Universität Bayreuth, Bayreuth, Germany
| |
|
23
|
Keedy DA, Fraser JS, van den Bedem H. Exposing Hidden Alternative Backbone Conformations in X-ray Crystallography Using qFit. PLoS Comput Biol 2015; 11:e1004507. [PMID: 26506617 PMCID: PMC4624436 DOI: 10.1371/journal.pcbi.1004507] [Citation(s) in RCA: 59] [Impact Index Per Article: 6.6] [Received: 05/05/2015] [Accepted: 06/22/2015] [Indexed: 12/13/2022] Open
Abstract
Proteins must move between different conformations of their native ensemble to perform their functions. Crystal structures obtained from high-resolution X-ray diffraction data reflect this heterogeneity as a spatial and temporal conformational average. Although movement between natively populated alternative conformations can be critical for characterizing molecular mechanisms, it is challenging to identify these conformations within electron density maps. Alternative side chain conformations are generally well separated into distinct rotameric conformations, but alternative backbone conformations can overlap at several atomic positions. Our model building program qFit uses mixed integer quadratic programming (MIQP) to evaluate an extremely large number of combinations of sidechain conformers and backbone fragments to locally explain the electron density. Here, we describe two major modeling enhancements to qFit: peptide flips and alternative glycine conformations. We find that peptide flips fall into four stereotypical clusters and are enriched in glycine residues at the n+1 position. The potential for insights uncovered by new peptide flips and glycine conformations is exemplified by HIV protease, where different inhibitors are associated with peptide flips in the “flap” regions adjacent to the inhibitor binding site. Our results paint a picture of peptide flips as conformational switches, often enabled by glycine flexibility, that result in dramatic local rearrangements. Our results furthermore demonstrate the power of large-scale computational analysis to provide new insights into conformational heterogeneity. Overall, improved modeling of backbone heterogeneity with high-resolution X-ray data will connect dynamics to the structure-function relationship and help drive new design strategies for inhibitors of biomedically important systems. 
Describing the multiple conformations of proteins is important for understanding the relationship between molecular flexibility and function. However, most methods for interpreting data from X-ray crystallography focus on building a single structure of the protein, which limits the potential for biological insights. Here we introduce an improved algorithm for using crystallographic data to model these multiple conformations that addresses two previously overlooked types of protein backbone flexibility: peptide flips and glycine movements. The method successfully models known examples of these types of multiple conformations, and also identifies new cases that were previously unrecognized but are well supported by the experimental data. For example, we discover glycine-driven peptide flips in the inhibitor-gating “flaps” of the drug target HIV protease that were not modeled in the original structures. Automatically modeling “hidden” multiple conformations of proteins using our algorithm may help drive biomedically relevant insights in structural biology pertaining to, e.g., drug discovery for HIV–1 protease and other therapeutic targets.
Affiliation(s)
- Daniel A. Keedy
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California, United States of America
| | - James S. Fraser
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California, United States of America
| | - Henry van den Bedem
- Division of Biosciences, SLAC National Accelerator Laboratory, Stanford University, California, United States of America
- * E-mail:
| |
|
24
|
Roberts KE, Gainza P, Hallen MA, Donald BR. Fast gap-free enumeration of conformations and sequences for protein design. Proteins 2015; 83:1859-1877. [PMID: 26235965 DOI: 10.1002/prot.24870] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 03/24/2015] [Revised: 07/14/2015] [Accepted: 07/21/2015] [Indexed: 12/12/2022]
Abstract
Despite significant successes in structure-based computational protein design in recent years, protein design algorithms must be improved to increase the biological accuracy of new designs. Protein design algorithms search through an exponential number of protein conformations, protein ensembles, and amino acid sequences in an attempt to find globally optimal structures with a desired biological function. To improve the biological accuracy of protein designs, it is necessary to increase both the amount of protein flexibility allowed during the search and the overall size of the design, while guaranteeing that the lowest-energy structures and sequences are found. DEE/A*-based algorithms are the most prevalent provable algorithms in the field of protein design and can provably enumerate a gap-free list of low-energy protein conformations, which is necessary for ensemble-based algorithms that predict protein binding. We present two classes of algorithmic improvements to the A* algorithm that greatly increase the efficiency of A*. First, we analyze the effect of ordering the expansion of mutable residue positions within the A* tree and present a dynamic residue ordering that reduces the number of A* nodes that must be visited during the search. Second, we propose new methods to improve the conformational bounds used to estimate the energies of partial conformations during the A* search. The residue ordering techniques and improved bounds can be combined for additional increases in A* efficiency. Our enhancements enable all A*-based methods to more fully search protein conformation space, which will ultimately improve the accuracy of complex biomedically relevant designs.
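The gap-free enumeration that this abstract refers to can be sketched as a best-first (A*) search over partial rotamer assignments. This is a minimal illustration under stated assumptions, not the paper's implementation: positions are expanded in a fixed order (the paper's dynamic ordering is not shown), the bound is the simple min-over-rotamers lower bound on unassigned positions, and the function name `enumerate_conformations` and all energies are invented for the example.

```python
import heapq

def enumerate_conformations(self_e, pair_e):
    """Yield (energy, conformation) in order of non-decreasing energy.

    self_e[i][r]         : self energy of rotamer r at position i
    pair_e[(i, j)][r][s] : pairwise energy for positions i < j
    """
    n = len(self_e)

    def pair(i, r, j, s):
        return pair_e[(i, j)][r][s]          # callers always pass i < j

    def g(partial):
        # Exact energy of the positions assigned so far.
        e = sum(self_e[i][r] for i, r in enumerate(partial))
        e += sum(pair(i, partial[i], j, partial[j])
                 for j in range(len(partial)) for i in range(j))
        return e

    def h(partial):
        # Lower bound on what the unassigned positions can still contribute:
        # each one takes its best rotamer, and interactions with later
        # unassigned positions are bounded by their minimum.
        k = len(partial)
        total = 0.0
        for j in range(k, n):
            total += min(
                self_e[j][s]
                + sum(pair(i, partial[i], j, s) for i in range(k))
                + sum(min(pair(j, s, m, t) for t in range(len(self_e[m])))
                      for m in range(j + 1, n))
                for s in range(len(self_e[j]))
            )
        return total

    heap = [(h(()), ())]
    while heap:
        f, partial = heapq.heappop(heap)
        if len(partial) == n:
            yield f, partial                 # complete: f is the exact energy
            continue
        i = len(partial)                     # fixed expansion order (an assumption)
        for r in range(len(self_e[i])):
            child = partial + (r,)
            heapq.heappush(heap, (g(child) + h(child), child))

# Toy system, numbers invented for illustration.
self_e = [[0.0, 0.2], [0.0, 0.1], [0.0, 0.3]]
pair_e = {
    (0, 1): [[-1.0, 0.0], [0.0, -1.0]],
    (1, 2): [[-1.0, 0.0], [0.0, -1.0]],
    (0, 2): [[1.5, 0.0], [0.0, 1.5]],
}
results = list(enumerate_conformations(self_e, pair_e))
for energy, conf in results:
    print(conf, round(energy, 3))
```

Because the bound never overestimates the energy of any completion, every conformation still in the queue is at least as high in energy as the one just emitted, which is what makes the list gap-free. Tighter bounds or a better expansion order reduce the number of nodes visited without changing the output order.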
Affiliation(s)
- Kyle E Roberts
- Department of Computer Science, Duke University, Durham, NC
| | - Pablo Gainza
- Department of Computer Science, Duke University, Durham, NC
| | - Mark A Hallen
- Department of Computer Science, Duke University, Durham, NC
| | - Bruce R Donald
- Department of Computer Science, Duke University, Durham, NC; Department of Biochemistry, Duke University Medical Center, Durham, NC; Department of Chemistry, Duke University, Durham, NC
| |
|
25
|
LuCore SD, Litman JM, Powers KT, Gao S, Lynn AM, Tollefson WTA, Fenn TD, Washington MT, Schnieders MJ. Dead-End Elimination with a Polarizable Force Field Repacks PCNA Structures. Biophys J 2015; 109:816-26. [PMID: 26287633 PMCID: PMC4547145 DOI: 10.1016/j.bpj.2015.06.062] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 03/16/2015] [Revised: 06/07/2015] [Accepted: 06/29/2015] [Indexed: 11/15/2022]
Abstract
A balance of van der Waals, electrostatic, and hydrophobic forces drives the folding and packing of protein side chains. Although such interactions between residues are often approximated as being pairwise additive, in reality, higher-order many-body contributions that depend on environment drive hydrophobic collapse and cooperative electrostatics. Beginning from dead-end elimination, we derive the first algorithm, to our knowledge, capable of deterministic global repacking of side chains compatible with many-body energy functions. The approach is applied to seven PCNA x-ray crystallographic data sets with resolutions of 2.5-3.8 Å (mean 3.0 Å) using open-source software. While PDB_REDO models average an Rfree value of 29.5% and a MOLPROBITY score of 2.71 Å (77th percentile), dead-end elimination with the polarizable AMOEBA force field lowered Rfree by 2.8-26.7% and improved the mean MOLPROBITY score to atomic resolution at 1.25 Å (100th percentile). For structural biology applications that depend on side-chain repacking, including x-ray refinement, homology modeling, and protein design, the accuracy limitations of pairwise additivity can now be eliminated via polarizable or quantum mechanical potentials.
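For orientation, the classical pairwise form of dead-end elimination that the abstract takes as its starting point can be sketched as follows. This is a minimal sketch of the standard Goldstein criterion for pairwise-additive energies only; it does not implement the many-body extension described in the paper, and the table layout, function name, and toy energies are assumptions made for illustration.

```python
def goldstein_dee(self_e, pair_e):
    """Prune rotamers that cannot be part of any minimum-energy conformation.

    self_e[i][r]          : self energy of rotamer r at position i
    pair_e[(i, j)][r][s]  : pair energy of rotamer r at i with s at j, for i < j
    Returns one set of surviving rotamer indices per position.
    """
    n = len(self_e)
    alive = [set(range(len(e))) for e in self_e]

    def pair(i, r, j, s):
        # Symmetric lookup into a table stored only for i < j.
        return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]

    changed = True
    while changed:  # repeat until a full pass removes nothing
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                for t in sorted(alive[i]):
                    if t == r:
                        continue
                    # Goldstein criterion: r is dead if switching to t always
                    # lowers the energy, whatever the other positions choose.
                    gap = self_e[i][r] - self_e[i][t]
                    for j in range(n):
                        if j != i:
                            gap += min(pair(i, r, j, s) - pair(i, t, j, s)
                                       for s in alive[j])
                    if gap > 0:
                        alive[i].discard(r)
                        changed = True
                        break
    return alive


# Hypothetical toy problem: 2 positions, 2 rotamers each.
self_e = [[0.0, 10.0], [0.0, 1.0]]
pair_e = {(0, 1): [[0.0, 1.0], [2.0, 0.0]]}
print(goldstein_dee(self_e, pair_e))
```

The strict inequality means a pruned rotamer can never belong to a lowest-energy conformation, so at least one rotamer always survives at each position and the surviving set can be handed to an exact search.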
Affiliation(s)
- Stephen D LuCore
- Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa
| | - Jacob M Litman
- Department of Biochemistry, University of Iowa, Iowa City, Iowa
| | - Kyle T Powers
- Department of Biochemistry, University of Iowa, Iowa City, Iowa
| | - Shibo Gao
- Department of Biochemistry, University of Iowa, Iowa City, Iowa
| | - Ava M Lynn
- Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa
| | | | | | | | - Michael J Schnieders
- Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa; Department of Biochemistry, University of Iowa, Iowa City, Iowa.
| |
|
26
|
Allouche D, André I, Barbe S, Davies J, de Givry S, Katsirelos G, O'Sullivan B, Prestwich S, Schiex T, Traoré S. Computational protein design as an optimization problem. Artif Intell 2014. [DOI: 10.1016/j.artint.2014.03.005] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Indexed: 12/23/2022]
|
27
|
Headd JJ, Echols N, Afonine PV, Moriarty NW, Gildea RJ, Adams PD. Flexible torsion-angle noncrystallographic symmetry restraints for improved macromolecular structure refinement. Acta Crystallogr D Biol Crystallogr 2014; 70:1346-56. [PMID: 24816103 PMCID: PMC4014122 DOI: 10.1107/s1399004714003277] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 08/21/2013] [Accepted: 02/13/2014] [Indexed: 11/10/2022]
Abstract
One of the great challenges in refining macromolecular crystal structures is a low data-to-parameter ratio. Historically, knowledge from chemistry has been used to help to improve this ratio. When a macromolecule crystallizes with more than one copy in the asymmetric unit, the noncrystallographic symmetry relationships can be exploited to provide additional restraints when refining the working model. However, although globally similar, NCS-related chains often have local differences. To allow for local differences between NCS-related molecules, flexible torsion-based NCS restraints have been introduced, coupled with intelligent rotamer handling for protein chains, and are available in phenix.refine for refinement of models at all resolutions.
Affiliation(s)
- Jeffrey J. Headd
- Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Duke University Medical Center, Durham, NC 27710, USA
| | | | | | | | - Richard J. Gildea
- Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Diamond Light Source, Harwell Science and Innovation Campus, Didcot OX11 0DE, England
| | - Paul D. Adams
- Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Department of Bioengineering, UC Berkeley, Berkeley, CA 94720, USA
| |
|
28
|
Wijma HJ, Floor RJ, Jekel PA, Baker D, Marrink SJ, Janssen DB. Computationally designed libraries for rapid enzyme stabilization. Protein Eng Des Sel 2014; 27:49-58. [PMID: 24402331 PMCID: PMC3893934 DOI: 10.1093/protein/gzt061] [Citation(s) in RCA: 167] [Impact Index Per Article: 16.7] [Received: 08/20/2013] [Revised: 11/28/2013] [Accepted: 11/30/2013] [Indexed: 11/24/2022]
Abstract
The ability to engineer enzymes and other proteins to any desired stability would have wide-ranging applications. Here, we demonstrate that computational design of a library with chemically diverse stabilizing mutations allows the engineering of drastically stabilized and fully functional variants of the mesostable enzyme limonene epoxide hydrolase. First, point mutations were selected if they significantly improved the predicted free energy of protein folding. Disulfide bonds were designed using sampling of backbone conformational space, which tripled the number of experimentally stabilizing disulfide bridges. Next, orthogonal in silico screening steps were used to remove chemically unreasonable mutations and mutations that are predicted to increase protein flexibility. The resulting library of 64 variants was experimentally screened, which revealed 21 (pairs of) stabilizing mutations located both in relatively rigid and in flexible areas of the enzyme. Finally, combining 10-12 of these confirmed mutations resulted in multi-site mutants with an increase in apparent melting temperature from 50 to 85°C, enhanced catalytic activity, preserved regioselectivity and a >250-fold longer half-life. The developed Framework for Rapid Enzyme Stabilization by Computational libraries (FRESCO) requires far less screening than conventional directed evolution.
Affiliation(s)
- Hein J. Wijma
- Department of Biochemistry, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands
| | - Robert J. Floor
- Department of Biochemistry, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands
| | - Peter A. Jekel
- Department of Biochemistry, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands
| | - David Baker
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
| | - Siewert J. Marrink
- Department of Biochemistry, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands
- Department of Biophysical Chemistry, Zernike Institute for Advanced Materials, University of Groningen, Groningen, The Netherlands
| | - Dick B. Janssen
- Department of Biochemistry, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands
| |
|
29
|
Pitman DJ, Schenkelberg CD, Huang YM, Teets FD, DiTursi D, Bystroff C. Improving computational efficiency and tractability of protein design using a piecemeal approach. A strategy for parallel and distributed protein design. Bioinformatics 2013; 30:1138-1145. [PMID: 24371152 DOI: 10.1093/bioinformatics/btt735] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 08/29/2013] [Accepted: 12/15/2013] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Accuracy in protein design requires a fine-grained rotamer search, multiple backbone conformations, and a detailed energy function, creating a burden in runtime and memory requirements. A design task may be split into manageable pieces in both three-dimensional space and in the rotamer search space to produce small, fast jobs that are easily distributed. However, these jobs must overlap, presenting a problem in resolving conflicting solutions in the overlap regions. RESULTS: Piecemeal design, in which the design space is split into overlapping regions and rotamer search spaces, accelerates the design process whether jobs are run in series or in parallel. Large jobs that cannot fit in memory were made possible by splitting. Accepting the consensus amino acid selection in conflict regions led to non-optimal choices. Instead, conflicts were resolved using a second pass, in which the split regions were re-combined and designed as one, producing results that were closer to optimal with a minimal increase in runtime over the consensus strategy. Splitting the search space at the rotamer level instead of at the amino acid level further improved the efficiency by reducing the search space in the second pass. AVAILABILITY AND IMPLEMENTATION: Programs for splitting protein design expressions are available at www.bioinfo.rpi.edu/tools/piecemeal.html. CONTACT: bystrc@rpi.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Derek J Pitman
- Department of Biology, Rensselaer Polytechnic Institute, Troy, NY 12180, Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94143, Department of Computer Science and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
| | - Christian D Schenkelberg
- Department of Biology, Rensselaer Polytechnic Institute, Troy, NY 12180, Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94143, Department of Computer Science and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
| | - Yao-Ming Huang
- Department of Biology, Rensselaer Polytechnic Institute, Troy, NY 12180, Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94143, Department of Computer Science and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
| | - Frank D Teets
- Department of Biology, Rensselaer Polytechnic Institute, Troy, NY 12180, Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94143, Department of Computer Science and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
| | - Daniel DiTursi
- Department of Biology, Rensselaer Polytechnic Institute, Troy, NY 12180, Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94143, Department of Computer Science and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
| | - Christopher Bystroff
- Department of Biology, Rensselaer Polytechnic Institute, Troy, NY 12180, Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94143, Department of Computer Science and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
| |
|
30
|
Huang YM, Bystroff C. Expanded explorations into the optimization of an energy function for protein design. IEEE/ACM Trans Comput Biol Bioinform 2013; 10:1176-1187. [PMID: 24384706 PMCID: PMC3919130 DOI: 10.1109/tcbb.2013.113] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/03/2023]
Abstract
Nature possesses a secret formula for the energy as a function of the structure of a protein. In protein design, approximations are made to both the structural representation of the molecule and to the form of the energy equation, such that the existence of a general energy function for proteins is by no means guaranteed. Here, we present new insights toward the application of machine learning to the problem of finding a general energy function for protein design. Machine learning requires the definition of an objective function, which carries with it the implied definition of success in protein design. We explored four functions, consisting of two functional forms, each with two criteria for success. Optimization was carried out by a Monte Carlo search through the space of all variable parameters. Cross-validation of the optimized energy function against a test set gave significantly different results depending on the choice of objective function, pointing to relative correctness of the built-in assumptions. Novel energy cross terms correct for the observed nonadditivity of energy terms and an imbalance in the distribution of predicted amino acids. This paper expands on the work presented at the 2012 ACM-BCB.
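The Monte Carlo parameter search mentioned in the abstract can be pictured with a generic Metropolis-style loop over a weight vector. The sketch below is an illustration under stated assumptions rather than the authors' procedure: the objective (fraction of cases in which a "native" set of energy terms fails to score below a "decoy"), the toy data, and all names and settings are hypothetical.

```python
import math
import random


def monte_carlo_optimize(objective, start, steps=2000, step_size=0.1,
                         temperature=0.05, seed=0):
    """Metropolis search over a parameter vector; returns (best, best_score)."""
    rng = random.Random(seed)
    current = list(start)
    current_score = objective(current)
    best, best_score = list(current), current_score
    for _ in range(steps):
        trial = list(current)
        k = rng.randrange(len(trial))          # perturb one parameter at a time
        trial[k] += rng.gauss(0.0, step_size)
        trial_score = objective(trial)
        delta = trial_score - current_score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = trial, trial_score
            if current_score < best_score:
                best, best_score = list(current), current_score
    return best, best_score


# Hypothetical training cases: (native energy terms, decoy energy terms).
cases = [
    ((1.0, 3.0), (2.0, 1.0)),
    ((0.5, 2.0), (1.5, 1.5)),
    ((2.0, 0.0), (1.0, 1.0)),
]


def misranked_fraction(weights):
    """Fraction of cases where the weighted native energy is not below the decoy."""
    def energy(terms):
        return sum(w * t for w, t in zip(weights, terms))
    bad = sum(1 for native, decoy in cases if energy(native) >= energy(decoy))
    return bad / len(cases)


weights, score = monte_carlo_optimize(misranked_fraction, [1.0, 1.0])
print(weights, score)
```

Changing `misranked_fraction` to a different definition of success changes which weights are preferred, which is the sensitivity to the objective function that the abstract reports.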
|
31
|
Traoré S, Allouche D, André I, de Givry S, Katsirelos G, Schiex T, Barbe S. A new framework for computational protein design through cost function network optimization. Bioinformatics 2013; 29:2129-36. [DOI: 10.1093/bioinformatics/btt374] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.9] [Indexed: 11/13/2022]
|
32
|
Wijma HJ, Janssen DB. Computational design gains momentum in enzyme catalysis engineering. FEBS J 2013; 280:2948-60. [DOI: 10.1111/febs.12324] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.8] [Received: 02/20/2013] [Revised: 04/19/2013] [Accepted: 04/24/2013] [Indexed: 01/19/2023]
Affiliation(s)
- Hein J. Wijma
- Department of Biochemistry; Groningen Biomolecular Sciences and Biotechnology Institute; University of Groningen; The Netherlands
| | - Dick B. Janssen
- Department of Biochemistry; Groningen Biomolecular Sciences and Biotechnology Institute; University of Groningen; The Netherlands
| |
|
33
|
Schallmey M, Floor RJ, Hauer B, Breuer M, Jekel PA, Wijma HJ, Dijkstra BW, Janssen DB. Biocatalytic and structural properties of a highly engineered halohydrin dehalogenase. Chembiochem 2013; 14:870-81. [PMID: 23585096 DOI: 10.1002/cbic.201300005] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Received: 01/03/2013] [Indexed: 01/30/2023]
Abstract
Two highly engineered halohydrin dehalogenase variants were characterized in terms of their performance in dehalogenation and epoxide cyanolysis reactions. Both enzyme variants outperformed the wild-type enzyme in the cyanolysis of ethyl (S)-3,4-epoxybutyrate, a conversion yielding ethyl (R)-4-cyano-3-hydroxybutyrate, an important chiral building block for statin synthesis. One of the enzyme variants, HheC2360, displayed catalytic rates for this cyanolysis reaction enhanced up to tenfold. Furthermore, the enantioselectivity of this variant was the opposite of that of the wild-type enzyme, both for dehalogenation and for cyanolysis reactions. The 37-fold mutant HheC2360 showed an increase in thermal stability of 8 °C relative to the wild-type enzyme. Crystal structures of this enzyme were elucidated with chloride and ethyl (S)-3,4-epoxybutyrate or with ethyl (R)-4-cyano-3-hydroxybutyrate bound in the active site. The observed increase in temperature stability was explained in terms of a substantial increase in buried surface area relative to the wild-type HheC, together with enhanced interfacial interactions between the subunits that form the tetramer. The structures also revealed that the substrate binding pocket was modified both by substitutions and by backbone movements in loops surrounding the active site. The observed changes in the mutant structures are partly governed by coupled mutations, some of which are necessary to remove steric clashes or to allow backbone movements to occur. The importance of interactions between substitutions suggests that efficient directed evolution strategies should allow for compensating and synergistic mutations during library design.
Affiliation(s)
- Marcus Schallmey
- Department of Biochemistry, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands
| |
|
34
|
Gainza P, Roberts KE, Georgiev I, Lilien RH, Keedy DA, Chen CY, Reza F, Anderson AC, Richardson DC, Richardson JS, Donald BR. OSPREY: protein design with ensembles, flexibility, and provable algorithms. Methods Enzymol 2013; 523:87-107. [PMID: 23422427 DOI: 10.1016/b978-0-12-394292-0.00005-9] [Citation(s) in RCA: 96] [Impact Index Per Article: 8.7] [Indexed: 12/03/2022]
Abstract
We have developed a suite of protein redesign algorithms that improves realistic in silico modeling of proteins. These algorithms are based on three characteristics that make them unique: (1) improved flexibility of the protein backbone, protein side-chains, and ligand to accurately capture the conformational changes that are induced by mutations to the protein sequence; (2) modeling of proteins and ligands as ensembles of low-energy structures to better approximate binding affinity; and (3) a globally optimal protein design search, guaranteeing that the computational predictions are optimal with respect to the input model. Here, we illustrate the importance of these three characteristics. We then describe OSPREY, a protein redesign suite that implements our protein design algorithms. OSPREY has been used prospectively, with experimental validation, in several biomedically relevant settings. We show in detail how OSPREY has been used to predict resistance mutations and explain why improved flexibility, ensembles, and provability are essential for this application. AVAILABILITY: OSPREY is free and open source under a Lesser GPL license. The latest version is OSPREY 2.0. The program, user manual, and source code are available at www.cs.duke.edu/donaldlab/software.php. CONTACT: osprey@cs.duke.edu.
Affiliation(s)
- Pablo Gainza
- Department of Computer Science, Duke University, Durham, North Carolina, USA
| |
|
35
|
Malisi C, Schumann M, Toussaint NC, Kageyama J, Kohlbacher O, Höcker B. Binding pocket optimization by computational protein design. PLoS One 2012; 7:e52505. [PMID: 23300688 PMCID: PMC3531388 DOI: 10.1371/journal.pone.0052505] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Received: 07/25/2012] [Accepted: 11/14/2012] [Indexed: 01/19/2023]
Abstract
Engineering specific interactions between proteins and small molecules is extremely useful for biological studies, as these interactions are essential for molecular recognition. Furthermore, many biotechnological applications are made possible by such an engineering approach, ranging from biosensors to the design of custom enzyme catalysts. Here, we present a novel method for the computational design of protein-small ligand binding named PocketOptimizer. The program can be used to modify protein binding pocket residues to improve or establish binding of a small molecule. It is a modular pipeline based on a number of customizable molecular modeling tools to predict mutations that alter the affinity of a target protein to its ligand. At its heart it uses a receptor-ligand scoring function to estimate the binding free energy between protein and ligand. We compiled a benchmark set that we used to systematically assess the performance of our method. It consists of proteins for which mutational variants with different binding affinities for their ligands and experimentally determined structures exist. Within this test set PocketOptimizer correctly predicts the mutant with the higher affinity in about 69% of the cases. A detailed analysis of the results reveals that the strengths of PocketOptimizer lie in the correct introduction of stabilizing hydrogen bonds to the ligand, as well as in the improved geometric complementarity between ligand and binding pocket. Apart from the novel method for binding pocket design, we also introduce a much needed benchmark data set for the comparison of affinities of mutant binding pockets, which we use to assess programs for in silico design of ligand binding.
Affiliation(s)
- Christoph Malisi
- Max Planck Institute for Developmental Biology, Tübingen, Germany
| | - Marcel Schumann
- Center for Bioinformatics, Quantitative Biology Center, and Department of Computer Science, University of Tübingen, Tübingen, Germany
| | - Nora C. Toussaint
- Center for Bioinformatics, Quantitative Biology Center, and Department of Computer Science, University of Tübingen, Tübingen, Germany
| | - Jorge Kageyama
- Max Planck Institute for Developmental Biology, Tübingen, Germany
| | - Oliver Kohlbacher
- Center for Bioinformatics, Quantitative Biology Center, and Department of Computer Science, University of Tübingen, Tübingen, Germany
| | - Birte Höcker
- Max Planck Institute for Developmental Biology, Tübingen, Germany
| |
|
36
|
Chitsaz M, Mayo SL. GRID: a high-resolution protein structure refinement algorithm. J Comput Chem 2012; 34:445-50. [PMID: 23065773 DOI: 10.1002/jcc.23151] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 04/12/2012] [Revised: 07/31/2012] [Accepted: 08/27/2012] [Indexed: 12/27/2022]
Abstract
The energy-based refinement of protein structures generated by fold prediction algorithms to atomic-level accuracy remains a major challenge in structural biology. Energy-based refinement is mainly dependent on two components: (1) sufficiently accurate force fields, and (2) efficient conformational space search algorithms. Focusing on the latter, we developed a high-resolution refinement algorithm called GRID. It takes a three-dimensional protein structure as input and, using an all-atom force field, attempts to improve the energy of the structure by systematically perturbing backbone dihedrals and side-chain rotamer conformations. We compare GRID to Backrub, a stochastic algorithm that has been shown to predict a significant fraction of the conformational changes that occur with point mutations. We applied GRID and Backrub to 10 high-resolution (≤ 2.8 Å) crystal structures from the Protein Data Bank and measured the energy improvements obtained and the computation times required to achieve them. GRID resulted in energy improvements that were significantly better than those attained by Backrub while expending about the same amount of computational resources. GRID resulted in relaxed structures that had slightly higher backbone RMSDs compared to Backrub relative to the starting crystal structures. The average RMSD was 0.25 ± 0.02 Å for GRID versus 0.14 ± 0.04 Å for Backrub. These relatively minor deviations indicate that both algorithms generate structures that retain their original topologies, as expected given the nature of the algorithms.
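The backbone RMSD values quoted in the abstract are root-mean-square deviations between matched atom positions. A minimal sketch of that calculation is given below, assuming the two structures are already superimposed and list the same atoms in the same order; the function name and the coordinates are hypothetical, and no optimal-superposition step is included.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Hypothetical backbone atoms (in Å) before and after a small relaxation.
start = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
relaxed = [(0.3, 0.4, 0.0), (1.8, 0.4, 0.0), (3.3, 0.4, 0.0)]
print(round(rmsd(start, relaxed), 3))
```

In practice the comparison would be restricted to backbone atoms and preceded by a least-squares superposition, which this sketch leaves out.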
Affiliation(s)
- Mohsen Chitsaz
- Biochemistry and Molecular Biophysics Option, California Institute of Technology, Pasadena, California 91125, USA
| |
|
37
|
Hallen MA, Keedy DA, Donald BR. Dead-end elimination with perturbations (DEEPer): a provable protein design algorithm with continuous sidechain and backbone flexibility. Proteins 2012; 81:18-39. [PMID: 22821798 DOI: 10.1002/prot.24150] [Citation(s) in RCA: 63] [Impact Index Per Article: 5.3] [Received: 05/07/2012] [Revised: 07/01/2012] [Accepted: 07/11/2012] [Indexed: 11/12/2022]
Abstract
Computational protein and drug design generally require accurate modeling of protein conformations. This modeling typically starts with an experimentally determined protein structure and considers possible conformational changes due to mutations or new ligands. The DEE/A* algorithm provably finds the global minimum-energy conformation (GMEC) of a protein assuming that the backbone does not move and the sidechains take on conformations from a set of discrete, experimentally observed conformations called rotamers. DEE/A* can efficiently find the overall GMEC for exponentially many mutant sequences. Previous improvements to DEE/A* include modeling ensembles of sidechain conformations and either continuous sidechain or backbone flexibility. We present a new algorithm, DEEPer (Dead-End Elimination with Perturbations), that combines these advantages and can also handle much more extensive backbone flexibility and backbone ensembles. DEEPer provably finds the GMEC or, if desired by the user, all conformations and sequences within a specified energy window of the GMEC. It includes the new abilities to handle arbitrarily large backbone perturbations and to generate ensembles of backbone conformations. It also incorporates the shear, an experimentally observed local backbone motion never before used in design. Additionally, we derive a new method to accelerate DEE/A*-based calculations, indirect pruning, that is particularly useful for DEEPer. In 67 benchmark tests on 64 proteins, DEEPer consistently identified lower-energy conformations than previous methods did, indicating more accurate modeling. Additional tests demonstrated its ability to incorporate larger, experimentally observed backbone conformational changes and to model realistic conformational ensembles. These capabilities provide significant advantages for modeling protein mutations and protein-ligand interactions.
Affiliation(s)
- Mark A Hallen
- Department of Biochemistry, Duke University Medical Center, Durham, North Carolina, USA
| |
|
38
|
Keedy DA, Georgiev I, Triplett EB, Donald BR, Richardson DC, Richardson JS. The role of local backrub motions in evolved and designed mutations. PLoS Comput Biol 2012; 8:e1002629. [PMID: 22876172 PMCID: PMC3410847 DOI: 10.1371/journal.pcbi.1002629] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Received: 11/07/2011] [Accepted: 06/18/2012] [Indexed: 11/23/2022]
Abstract
Amino acid substitutions in protein structures often require subtle backbone adjustments that are difficult to model in atomic detail. An improved ability to predict realistic backbone changes in response to engineered mutations would be of great utility for the blossoming field of rational protein design. One model that has recently grown in acceptance is the backrub motion, a low-energy dipeptide rotation with single-peptide counter-rotations, that is coupled to dynamic two-state sidechain rotamer jumps, as evidenced by alternate conformations in very high-resolution crystal structures. It has been speculated that backrubs may facilitate sequence changes as readily as rotamer changes. However, backrub-induced shifts and experimental uncertainty are of similar magnitude for backbone atoms in even high-resolution structures, so comparison of wildtype-vs.-mutant crystal structure pairs is not sufficient to directly link backrubs to mutations. In this study, we use two alternative approaches that bypass this limitation. First, we use a quality-filtered structure database to aggregate many examples for precisely defined motifs with single amino acid differences, and find that the effectively amplified backbone differences closely resemble backrubs. Second, we directly apply a provably-accurate, backrub-enabled protein design algorithm to idealized versions of these motifs, and discover that the lowest-energy computed models match the average-coordinate experimental structures. These results support the hypothesis that backrubs participate in natural protein evolution and validate their continued use for design of synthetic proteins.
Protein design has the potential to generate useful molecules for medicine and chemistry, including sensors, drugs, and catalysts for arbitrary reactions. When protein design is carried out starting from an experimentally determined structure, as is often the case, one important aspect to consider is backbone flexibility, because in response to a mutation the backbone often must shift slightly to reconcile the new sidechain with its environment. In principle, one may model the backbone in many ways, but not all are physically realistic or experimentally validated. Here we study the "backrub" motion, which has been previously documented in atomic detail, but only for sidechain movements within single structures. By a two-pronged approach involving both structural bioinformatics and computation with a principled design algorithm, we demonstrate that backrubs are sufficient to explain the backbone differences between mutation-related sets of very precisely defined motifs from the protein structure database. Our findings illustrate that backrubs are useful for describing evolutionary sequence change and, by extension, suggest that they are also appropriate for rational protein design calculations.
Affiliation(s)
- Daniel A Keedy
- Department of Biochemistry, Duke University Medical Center, Durham, North Carolina, United States of America.
| |
|
39
|
Kulp DW, Subramaniam S, Donald JE, Hannigan BT, Mueller BK, Grigoryan G, Senes A. Structural informatics, modeling, and design with an open-source Molecular Software Library (MSL). J Comput Chem 2012; 33:1645-61. [PMID: 22565567 PMCID: PMC3432414 DOI: 10.1002/jcc.22968] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 12/19/2011] [Revised: 02/16/2012] [Accepted: 03/02/2012] [Indexed: 01/22/2023]
Abstract
We present the Molecular Software Library (MSL), a C++ library for molecular modeling. MSL is a set of tools that supports a large variety of algorithms for the design, modeling, and analysis of macromolecules. Among the main features supported by the library are methods for applying geometric transformations and alignments, the implementation of a rich set of energy functions, side chain optimization, backbone manipulation, calculation of solvent accessible surface area, and other tools. MSL has a number of unique features, such as the ability of storing alternative atomic coordinates (for modeling) and multiple amino acid identities at the same backbone position (for design). It has a straightforward mechanism for extending its energy functions and can work with any type of molecules. Although the code base is large, MSL was created with ease of developing in mind. It allows the rapid implementation of simple tasks while fully supporting the creation of complex applications. Some of the potentialities of the software are demonstrated here with examples that show how to program complex and essential modeling tasks with few lines of code. MSL is an ongoing and evolving project, with new features and improvements being introduced regularly, but it is mature and suitable for production and has been used in numerous protein modeling and design projects. MSL is open-source software, freely downloadable at http://msl-libraries.org. We propose it as a common platform for the development of new molecular algorithms and to promote the distribution, sharing, and reutilization of computational methods.
Affiliation(s)
- Brett T. Hannigan
- U. of Pennsylvania, Genomics and Computational Biology Graduate Group
|
40
|
Chen TS, Keating AE. Designing specific protein-protein interactions using computation, experimental library screening, or integrated methods. Protein Sci 2012; 21:949-63. [PMID: 22593041 DOI: 10.1002/pro.2096] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Received: 05/01/2012] [Accepted: 05/11/2012] [Indexed: 11/11/2022]
Abstract
Given the importance of protein-protein interactions for nearly all biological processes, the design of protein affinity reagents for use in research, diagnosis or therapy is an important endeavor. Engineered proteins would ideally have high specificities for their intended targets, but achieving interaction specificity by design can be challenging. There are two major approaches to protein design or redesign. Most commonly, proteins and peptides are engineered using experimental library screening and/or in vitro evolution. An alternative approach involves using protein structure and computational modeling to rationally choose sequences predicted to have desirable properties. Computational design has successfully produced novel proteins with enhanced stability, desired interactions and enzymatic function. Here we review the strengths and limitations of experimental library screening and computational structure-based design, giving examples where these methods have been applied to designing protein interaction specificity. We highlight recent studies that demonstrate strategies for combining computational modeling with library screening. The computational methods provide focused libraries predicted to be enriched in sequences with the properties of interest. Such integrated approaches represent a promising way to increase the efficiency of protein design and to engineer complex functionality such as interaction specificity.
Affiliation(s)
- T Scott Chen
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
|
41
|
Abstract
Optimizing amino acid conformation and identity is a central problem in computational protein design. Protein design algorithms must allow realistic protein flexibility to occur during this optimization, or they may fail to find the best sequence with the lowest energy. Most design algorithms implement side-chain flexibility by allowing the side chains to move between a small set of discrete, low-energy states, which we call rigid rotamers. In this work we show that allowing continuous side-chain flexibility (which we call continuous rotamers) greatly improves protein flexibility modeling. We present a large-scale study that compares the sequences and best energy conformations in 69 protein-core redesigns using a rigid-rotamer model versus a continuous-rotamer model. We show that in nearly all of our redesigns the sequence found by the continuous-rotamer model is different and has a lower energy than the one found by the rigid-rotamer model. Moreover, the sequences found by the continuous-rotamer model are more similar to the native sequences. We then show that the seemingly easy solution of sampling more rigid rotamers within the continuous region is not a practical alternative to a continuous-rotamer model: at computationally feasible resolutions, using more rigid rotamers was never better than a continuous-rotamer model and almost always resulted in higher energies. Finally, we present a new protein design algorithm based on the dead-end elimination (DEE) algorithm, which we call iMinDEE, that makes the use of continuous rotamers feasible in larger systems. iMinDEE guarantees finding the optimal answer while pruning the search space with close to the same efficiency as DEE. Availability: Software is available under the Lesser GNU Public License v3. Contact the authors for source code.
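For readers unfamiliar with dead-end elimination, the pruning idea behind the rigid-rotamer baseline can be sketched as follows. This is a minimal illustration of the classical Goldstein DEE criterion on discrete rotamers, not the iMinDEE algorithm described in the abstract (which additionally handles continuous minimization). The function names, the energy-table layout, and the toy energies are all assumptions made for this example.

```python
# Minimal sketch of Goldstein dead-end elimination (DEE) on rigid rotamers.
# Assumed (hypothetical) data layout:
#   self_e[i][r]         : self energy of rotamer r at position i
#   pair_e[(i, j)][r][s] : pairwise energy of rotamer r at i and s at j, i < j


def pair_energy(pair_e, i, r, j, s):
    """Look up a pairwise energy regardless of the order of i and j."""
    if i < j:
        return pair_e[(i, j)][r][s]
    return pair_e[(j, i)][s][r]


def goldstein_dee(self_e, pair_e):
    """Return, per position, the set of rotamer indices that survive pruning.

    Rotamer r at position i is pruned when some competitor t satisfies
    E(i_r) - E(i_t) + sum_j min_s [E(i_r, j_s) - E(i_t, j_s)] > 0,
    i.e. t is better than r no matter what the other positions do.
    """
    n = len(self_e)
    alive = [set(range(len(e))) for e in self_e]
    changed = True
    while changed:  # repeat until a full pass prunes nothing
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                for t in sorted(alive[i]):
                    if t == r:
                        continue
                    gap = self_e[i][r] - self_e[i][t]
                    for j in range(n):
                        if j == i:
                            continue
                        gap += min(
                            pair_energy(pair_e, i, r, j, s)
                            - pair_energy(pair_e, i, t, j, s)
                            for s in alive[j]
                        )
                    if gap > 0:
                        alive[i].discard(r)
                        changed = True
                        break
    return alive


# Toy system: two positions with two rotamers each (made-up energies).
toy_self = [[0.0, 5.0], [0.0, 1.0]]
toy_pair = {(0, 1): [[0.0, 0.0], [-1.0, -1.0]]}
survivors = goldstein_dee(toy_self, toy_pair)
```

In this toy case a single rotamer survives at each position, so no further search is needed; in realistic designs the surviving rotamers would typically be passed to an exact search such as A*.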
|
42
|
Zhang C, Lai L. Automatch: Target-binding protein design and enzyme design by automatic pinpointing potential active sites in available protein scaffolds. Proteins 2012; 80:1078-94. [DOI: 10.1002/prot.24009] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 05/24/2011] [Revised: 11/17/2011] [Accepted: 11/18/2011] [Indexed: 11/10/2022]
|
43
|
Zeng J, Roberts KE, Zhou P, Donald BR. A Bayesian approach for determining protein side-chain rotamer conformations using unassigned NOE data. J Comput Biol 2011; 18:1661-79. [PMID: 21970619 DOI: 10.1089/cmb.2011.0172] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]
Abstract
A major bottleneck in protein structure determination via nuclear magnetic resonance (NMR) is the lengthy and laborious process of assigning resonances and nuclear Overhauser effect (NOE) cross peaks. Recent studies have shown that accurate backbone folds can be determined using sparse NMR data, such as residual dipolar couplings (RDCs) or backbone chemical shifts. This opens a question of whether we can also determine the accurate protein side-chain conformations using sparse or unassigned NMR data. We attack this question by using unassigned nuclear Overhauser effect spectroscopy (NOESY) data, which records the through-space dipolar interactions between protons nearby in three-dimensional (3D) space. We propose a Bayesian approach with a Markov random field (MRF) model to integrate the likelihood function derived from observed experimental data, with prior information (i.e., empirical molecular mechanics energies) about the protein structures. We unify the side-chain structure prediction problem with the side-chain structure determination problem using unassigned NMR data, and apply the deterministic dead-end elimination (DEE) and A* search algorithms to provably find the global optimum solution that maximizes the posterior probability. We employ a Hausdorff-based measure to derive the likelihood of a rotamer or a pairwise rotamer interaction from unassigned NOESY data. In addition, we apply a systematic and rigorous approach to estimate the experimental noise in NMR data, which also determines the weighting factor of the data term in the scoring function derived from the Bayesian framework. We tested our approach on real NMR data of three proteins: the FF Domain 2 of human transcription elongation factor CA150 (FF2), the B1 domain of Protein G (GB1), and human ubiquitin. The promising results indicate that our algorithm can be applied in high-resolution protein structure determination. Since our approach does not require any NOE assignment, it can accelerate the NMR structure determination process.
Affiliation(s)
- Jianyang Zeng
- Department of Computer Science, Duke University, Durham, NC 27708, USA
|
44
|
Smith CA, Kortemme T. Predicting the tolerated sequences for proteins and protein interfaces using RosettaBackrub flexible backbone design. PLoS One 2011; 6:e20451. [PMID: 21789164 PMCID: PMC3138746 DOI: 10.1371/journal.pone.0020451] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.9] [Received: 02/15/2011] [Accepted: 04/20/2011] [Indexed: 11/18/2022]
Abstract
Predicting the set of sequences that are tolerated by a protein or protein interface, while maintaining a desired function, is useful for characterizing protein interaction specificity and for computationally designing sequence libraries to engineer proteins with new functions. Here we provide a general method, a detailed set of protocols, and several benchmarks and analyses for estimating tolerated sequences using flexible backbone protein design implemented in the Rosetta molecular modeling software suite. The input to the method is at least one experimentally determined three-dimensional protein structure or high-quality model. The starting structure(s) are expanded or refined into a conformational ensemble using Monte Carlo simulations consisting of backrub backbone and side chain moves in Rosetta. The method then uses a combination of simulated annealing and genetic algorithm optimization methods to enrich for low-energy sequences for the individual members of the ensemble. To emphasize certain functional requirements (e.g. forming a binding interface), interactions between and within parts of the structure (e.g. domains) can be reweighted in the scoring function. Results from each backbone structure are merged together to create a single estimate for the tolerated sequence space. We provide an extensive description of the protocol and its parameters, all source code, example analysis scripts and three tests applying this method to finding sequences predicted to stabilize proteins or protein interfaces. The generality of this method makes many other applications possible, for example stabilizing interactions with small molecules, DNA, or RNA. Through the use of within-domain reweighting and/or multistate design, it may also be possible to use this method to find sequences that stabilize particular protein conformations or binding interactions over others.
Affiliation(s)
- Colin A. Smith
- Graduate Program in Biological and Medical Informatics, University of California San Francisco, San Francisco, California, United States of America
- California Institute for Quantitative Biosciences, San Francisco, California, United States of America
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, United States of America
- Tanja Kortemme
- Graduate Program in Biological and Medical Informatics, University of California San Francisco, San Francisco, California, United States of America
- California Institute for Quantitative Biosciences, San Francisco, California, United States of America
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, United States of America
|
45
|
Samish I, MacDermaid CM, Perez-Aguilar JM, Saven JG. Theoretical and Computational Protein Design. Annu Rev Phys Chem 2011; 62:129-49. [DOI: 10.1146/annurev-physchem-032210-103509] [Citation(s) in RCA: 119] [Impact Index Per Article: 9.2] [Indexed: 11/09/2022]
Affiliation(s)
- Jeffery G. Saven
- Department of Chemistry, University of Pennsylvania, Philadelphia, Pennsylvania 19104
|
46
|
Sammond DW, Bosch DE, Butterfoss GL, Purbeck C, Machius M, Siderovski DP, Kuhlman B. Computational design of the sequence and structure of a protein-binding peptide. J Am Chem Soc 2011; 133:4190-2. [PMID: 21388199 DOI: 10.1021/ja110296z] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Indexed: 11/29/2022]
Abstract
The de novo design of protein-binding peptides is challenging because it requires the identification of both a sequence and a backbone conformation favorable for binding. We used a computational strategy that iterates between structure and sequence optimization to redesign the C-terminal portion of the RGS14 GoLoco motif peptide so that it adopts a new conformation when bound to Gα(i1). An X-ray crystal structure of the redesigned complex closely matches the computational model, with a backbone root-mean-square deviation of 1.1 Å.
Affiliation(s)
- Deanne W Sammond
- Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, North Carolina 27599-7260, USA
|
47
|
Sharabi O, Dekel A, Shifman JM. Triathlon for energy functions: who is the winner for design of protein-protein interactions? Proteins 2011; 79:1487-98. [PMID: 21365678 DOI: 10.1002/prot.22977] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 09/05/2010] [Revised: 12/19/2010] [Accepted: 12/22/2010] [Indexed: 11/09/2022]
Abstract
Computational prediction of stabilizing mutations into monomeric proteins has become an almost ordinary task. Yet, computational stabilization of protein–protein complexes remains a challenge. Design of protein–protein interactions (PPIs) is impeded by the absence of an energy function that could reliably reproduce all favorable interactions between the binding partners. In this work, we present three energy functions: one function that was trained on monomeric proteins, while the other two were optimized by different techniques to predict side-chain conformations in a dataset of PPIs. The performances of these energy functions are evaluated in three different tasks related to design of PPIs: predicting side-chain conformations in PPIs, recovering native binding-interface sequences, and predicting changes in free energy of binding due to mutations. Our findings show that both functions optimized on side-chain repacking in PPIs are more suitable for PPI design compared to the function trained on monomeric proteins. Yet, no function performs best at all three tasks. Comparison of the three energy functions and their performances revealed that (1) burial of polar atoms should not be penalized significantly in PPI design as in single-protein design and (2) contribution of electrostatic interactions should be increased several-fold when switching from single-protein to PPI design. In addition, the use of a softer van der Waals potential is beneficial in cases when backbone flexibility is important. All things considered, we define an energy function that captures most of the nuances of the binding energetics and hence, should be used in future for design of PPIs.
Affiliation(s)
- Oz Sharabi
- Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem 91904, Israel
|
48
|
Morin A, Meiler J, Mizoue LS. Computational design of protein-ligand interfaces: potential in therapeutic development. Trends Biotechnol 2011; 29:159-66. [PMID: 21295366 DOI: 10.1016/j.tibtech.2011.01.002] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 10/30/2010] [Revised: 12/22/2010] [Accepted: 01/05/2011] [Indexed: 01/16/2023]
Abstract
Computational design of protein-ligand interfaces finds optimal amino acid sequences within a small-molecule binding site of a protein for tight binding of a specific small molecule. It requires a search algorithm that can rapidly sample the vast sequence and conformational space, and a scoring function that can identify low energy designs. This review focuses on recent advances in computational design methods and their application to protein-small molecule binding sites. Strategies for increasing affinity, altering specificity, creating broad-spectrum binding, and building novel enzymes from scratch are described. Future prospects for applications in drug development are discussed, including limitations that will need to be overcome to achieve computational design of protein therapeutics with novel modes of action.
Affiliation(s)
- Andrew Morin
- Departments of Chemistry, Pharmacology, and Biomedical Informatics, Vanderbilt University, 7330 Stevenson Center, Station B 351822, Nashville, TN 37235, USA
|
49
|
Abstract
A long-standing goal of computational protein design is to create proteins similar to those found in Nature. One motivation is to harness the exquisite functional capabilities of proteins for our own purposes. The extent of similarity between designed and natural proteins also reports on how faithfully our models represent the selective pressures that determine protein sequences. As the field of protein design shifts emphasis from reproducing native-like protein structure to function, it has become important that these models treat the notion of specificity in molecular interactions. Although specificity may, in some cases, be achieved by optimization of a desired protein in isolation, methods have been developed to address directly the desire for proteins that exhibit specific functions and interactions.
Affiliation(s)
- James J Havranek
- Department of Genetics, Washington University School of Medicine, St Louis, Missouri 63110, USA.
|
50
|
Abstract
Drug resistance resulting from mutations to the target is an unfortunate common phenomenon that limits the lifetime of many of the most successful drugs. In contrast to the investigation of mutations after clinical exposure, it would be powerful to be able to incorporate strategies early in the development process to predict and overcome the effects of possible resistance mutations. Here we present a unique prospective application of an ensemble-based protein design algorithm, K*, to predict potential resistance mutations in dihydrofolate reductase from Staphylococcus aureus using positive design to maintain catalytic function and negative design to interfere with binding of a lead inhibitor. Enzyme inhibition assays show that three of the four highly-ranked predicted mutants are active yet display lower affinity (18-, 9-, and 13-fold) for the inhibitor. A crystal structure of the top-ranked mutant enzyme validates the predicted conformations of the mutated residues and the structural basis of the loss of potency. The use of protein design algorithms to predict resistance mutations could be incorporated in a lead design strategy against any target that is susceptible to mutational resistance.
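As background on what "ensemble-based" means here, a K*-style score is commonly described as a ratio of Boltzmann-weighted partition functions for the bound and unbound states. The sketch below illustrates that ratio only; it is not the published K* implementation, which computes provable approximations over very large conformational ensembles. The function names, the value of RT, and the toy energies are assumptions for illustration.

```python
import math

# Assumed thermal energy in kcal/mol near room temperature (illustrative).
RT = 0.592


def partition_function(energies, rt=RT):
    """Boltzmann-weighted sum over the conformational energies of one state."""
    return sum(math.exp(-e / rt) for e in energies)


def kstar_like_score(complex_e, protein_e, ligand_e, rt=RT):
    """Ratio q_complex / (q_protein * q_ligand); higher suggests tighter binding."""
    q_complex = partition_function(complex_e, rt)
    q_protein = partition_function(protein_e, rt)
    q_ligand = partition_function(ligand_e, rt)
    return q_complex / (q_protein * q_ligand)


# Toy ensembles with made-up energies: the bound state has two equally
# favorable conformations, each unbound partner has one.
toy_score = kstar_like_score([0.0, 0.0], [0.0], [0.0])
```

Under these assumptions, a candidate resistance mutation could be scored twice, once against the substrate (positive design) and once against the inhibitor (negative design), and ranked by how much the inhibitor score drops while the substrate score is retained.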
|