1
Nazet J, Lang E, Merkl R. Rosetta:MSF:NN: Boosting performance of multi-state computational protein design with a neural network. PLoS One 2021; 16:e0256691. [PMID: 34437621 PMCID: PMC8389498 DOI: 10.1371/journal.pone.0256691] [Received: 12/23/2020] [Accepted: 08/12/2021]
Abstract
Rational protein design aims at the targeted modification of existing proteins. To reach this goal, software suites like Rosetta propose sequences that introduce the desired properties. Challenging design problems require representing a protein by a structural ensemble. Thus, Rosetta multi-state design (MSD) protocols have been developed wherein each state represents one protein conformation. The computational demands of MSD protocols are high, because for each candidate sequence a costly three-dimensional (3D) model has to be created and assessed for all states. Each of these scores contributes one data point to a complex, design-specific energy landscape. As neural networks (NN) have proved well suited to learning such solution spaces, we integrated one into the framework Rosetta:MSF in place of the genetic algorithm used so far, with the aim of reducing computational costs. Like its predecessor, Rosetta:MSF:NN administers a set of candidate sequences and their scores and scans sequence space iteratively. During each iteration, the union of all candidate sequences and their Rosetta scores is used to re-train NNs that possess a design-specific architecture. The enormous speed of the NNs allows an extensive assessment of alternative sequences, which are ranked by the scores predicted by the NN. Costly 3D models are computed only for a small fraction of the best-scoring sequences; these sequences and their 3D-based scores replace half of the candidate sequences during each iteration. The analysis of two sets of candidate sequences generated for a specific design problem by means of a genetic algorithm confirmed that the NN predicted 3D-based scores quite well; the Pearson correlation coefficient was at least 0.95. Applying Rosetta:MSF:NN:enzdes to a benchmark consisting of 16 ligand-binding problems showed that this protocol converges ten times faster than the genetic algorithm and finds sequences with comparable scores.
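The iteration described above (score a pool, retrain a cheap model, rank many unseen sequences by predicted score, build costly models only for the top few, replace half of the pool) can be sketched as follows. This is a minimal toy, not Rosetta:MSF:NN: the NN is swapped for a simple additive per-position model, `costly_score` is a hypothetical stand-in for 3D modelling and Rosetta scoring over all states, and the alphabet, target, and pool sizes are assumptions chosen so that the whole sequence space can be enumerated. Training on every sequence scored so far is also an assumption.

```python
import random
from itertools import product

ALPHABET = "ADEK"  # hypothetical reduced amino-acid alphabet
TARGET = "KEDA"    # hypothetical optimum of the toy landscape


def costly_score(seq: str) -> float:
    """Hypothetical stand-in for building 3D models and scoring them (lower is better)."""
    return float(sum(a != b for a, b in zip(seq, TARGET)))


def fit_surrogate(scored: dict[str, float]):
    """Fit an additive model: mean score offset per (position, residue). Replaces the NN here."""
    mean = sum(scored.values()) / len(scored)
    per_site: dict[tuple[int, str], list[float]] = {}
    for seq, score in scored.items():
        for i, aa in enumerate(seq):
            per_site.setdefault((i, aa), []).append(score)
    offsets = {key: sum(v) / len(v) - mean for key, v in per_site.items()}

    def predict(seq: str) -> float:
        # Unseen (position, residue) pairs contribute no offset.
        return mean + sum(offsets.get((i, aa), 0.0) for i, aa in enumerate(seq))

    return predict


def surrogate_search(pool_size: int = 8, iterations: int = 20, seed: int = 0):
    rng = random.Random(seed)
    space = ["".join(p) for p in product(ALPHABET, repeat=len(TARGET))]
    pool = {s: costly_score(s) for s in rng.sample(space, pool_size)}
    archive = dict(pool)  # every sequence that has received a costly score
    history = [min(pool.values())]
    for _ in range(iterations):
        predict = fit_surrogate(archive)  # cheap model, retrained every iteration
        unseen = [s for s in space if s not in archive]
        # Rank all unseen sequences cheaply; only the best few get a costly score.
        new = {s: costly_score(s) for s in sorted(unseen, key=predict)[: pool_size // 2]}
        archive.update(new)
        # Keep the better-scoring part of the pool and fill it up with the new sequences.
        keep = sorted(pool, key=pool.get)[: pool_size - len(new)]
        pool = {s: pool[s] for s in keep} | new
        history.append(min(pool.values()))
    return pool, history


pool, history = surrogate_search()
print(history[0], history[-1])
```

Because the better half of the pool always survives, the best score in the pool can never get worse from one iteration to the next; ranking all 256 toy sequences with the cheap model is the step that stands in for the extensive NN-based assessment.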
Affiliation(s)
- Julian Nazet
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany
- Elmar Lang
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany
- Rainer Merkl
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany
2
Pan X, Kortemme T. Recent advances in de novo protein design: Principles, methods, and applications. J Biol Chem 2021; 296:100558. [PMID: 33744284 PMCID: PMC8065224 DOI: 10.1016/j.jbc.2021.100558] [Received: 01/13/2021] [Revised: 03/12/2021] [Accepted: 03/16/2021]
Abstract
Computational de novo protein design is increasingly applied to address a number of key challenges in biomedicine and biological engineering. Successes in expanding applications are driven by advances in design principles and methods over several decades. Here, we review recent innovations in major aspects of de novo protein design and describe how these advances were informed by principles of protein architecture and interactions derived from the wealth of structures in the Protein Data Bank. We describe developments in de novo generation of designable backbone structures, optimization of sequences, design scoring functions, and the design of function. The advances not only highlight design goals reachable now but also point to the challenges and opportunities for the future of the field.
Affiliation(s)
- Xingjie Pan
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA; UC Berkeley - UCSF Graduate Program in Bioengineering, University of California San Francisco, San Francisco, California, USA.
- Tanja Kortemme
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA; UC Berkeley - UCSF Graduate Program in Bioengineering, University of California San Francisco, San Francisco, California, USA; Quantitative Biosciences Institute (QBI), University of California San Francisco, San Francisco, California, USA.
3
Mignon D, Druart K, Michael E, Opuu V, Polydorides S, Villa F, Gaillard T, Panel N, Archontis G, Simonson T. Physics-Based Computational Protein Design: An Update. J Phys Chem A 2020; 124:10637-10648. [DOI: 10.1021/acs.jpca.0c07605]
Affiliation(s)
- David Mignon
- Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, 91128 Palaiseau, France
- Karen Druart
- Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, 91128 Palaiseau, France
- Eleni Michael
- Department of Physics, University of Cyprus, PO20537, CY1678 Nicosia, Cyprus
- Vaitea Opuu
- Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, 91128 Palaiseau, France
- Savvas Polydorides
- Department of Physics, University of Cyprus, PO20537, CY1678 Nicosia, Cyprus
- Francesco Villa
- Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, 91128 Palaiseau, France
- Thomas Gaillard
- Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, 91128 Palaiseau, France
- Nicolas Panel
- Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, 91128 Palaiseau, France
- Georgios Archontis
- Department of Physics, University of Cyprus, PO20537, CY1678 Nicosia, Cyprus
- Thomas Simonson
- Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, 91128 Palaiseau, France
4
Lucas JE, Kortemme T. New computational protein design methods for de novo small molecule binding sites. PLoS Comput Biol 2020; 16:e1008178. [PMID: 33017412 PMCID: PMC7575090 DOI: 10.1371/journal.pcbi.1008178] [Received: 03/30/2020] [Revised: 10/20/2020] [Accepted: 07/22/2020]
Abstract
Protein binding to small molecules is fundamental to many biological processes, yet it remains challenging to predictively design this functionality de novo. Current state-of-the-art computational design methods typically rely on existing small molecule binding sites or protein scaffolds with existing shape complementarity for a target ligand. Here we introduce new methods that utilize pools of discrete contacts between protein side chains and defined small molecule ligand substructures (ligand fragments) observed in the Protein Data Bank. We use the Rosetta Molecular Modeling Suite to recombine protein side chains in these contact pools to generate hundreds of thousands of energetically favorable binding sites for a target ligand. These composite binding sites are built into existing scaffold proteins matching the intended binding site geometry with high accuracy. In addition, we apply pools of side chain rotamers interacting with the target ligand to augment Rosetta's conventional design machinery and improve key metrics known to be predictive of design success. We demonstrate that our method reliably builds diverse binding sites into different scaffold proteins for a variety of target molecules. Our generalizable de novo ligand binding site design method provides a foundation for the versatile design of proteins that interface with previously unattainable molecules for applications in medical diagnostics and synthetic biology.
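The recombination step can be pictured as a combinatorial search over per-fragment contact pools. The sketch below is a toy under stated assumptions, not the authors' Rosetta pipeline: each hypothetical contact is reduced to a residue name, an interaction energy, and a single point in the ligand frame, a clash is any pair of chosen side chains closer than `clash_dist`, and a site is kept if its summed energy is at most `max_energy`.

```python
import math
from itertools import combinations, product

# Hypothetical contact: (residue name, interaction energy, side-chain position in the ligand frame).
Contact = tuple[str, float, tuple[float, float, float]]


def recombine(pools: list[list[Contact]], clash_dist: float = 3.0, max_energy: float = -2.0):
    """Pick one contact per ligand fragment; keep clash-free combinations with low total energy."""
    sites = []
    for combo in product(*pools):
        # Reject combinations in which two chosen side chains would overlap.
        if any(math.dist(a[2], b[2]) < clash_dist for a, b in combinations(combo, 2)):
            continue
        total = sum(c[1] for c in combo)
        if total <= max_energy:
            sites.append(([c[0] for c in combo], total))
    return sorted(sites, key=lambda site: site[1])  # most favorable first


pools = [
    [("Tyr", -1.5, (0.0, 0.0, 0.0)), ("Ser", -0.5, (0.0, 0.0, 5.0))],  # fragment 1
    [("Arg", -2.0, (0.0, 0.0, 1.0)), ("Lys", -1.0, (0.0, 4.0, 0.0))],  # fragment 2
]
print(recombine(pools))  # [(['Tyr', 'Lys'], -2.5), (['Ser', 'Arg'], -2.5)]
```

Here Tyr+Arg is rejected as a clash and Ser+Lys as too weak; the exhaustive `product` only works for tiny pools, which is why real pipelines need far more efficient search.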
Affiliation(s)
- James E. Lucas
- UC Berkeley–UCSF Graduate Program in Bioengineering, University of California San Francisco, San Francisco, CA, United States of America
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, United States of America
- Tanja Kortemme
- UC Berkeley–UCSF Graduate Program in Bioengineering, University of California San Francisco, San Francisco, CA, United States of America
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, United States of America
5
Sauer MF, Sevy AM, Crowe JE, Meiler J. Multi-state design of flexible proteins predicts sequences optimal for conformational change. PLoS Comput Biol 2020; 16:e1007339. [PMID: 32032348 PMCID: PMC7032724 DOI: 10.1371/journal.pcbi.1007339] [Received: 08/14/2019] [Revised: 02/20/2020] [Accepted: 12/23/2019]
Abstract
Computational protein design of an ensemble of conformations for one protein–i.e., multi-state design–determines the side chain identity by optimizing the energetic contributions of that side chain in each of the backbone conformations. Sampling the resulting large sequence-structure search space limits the number of conformations and the size of proteins in multi-state design algorithms. Here, we demonstrated that the REstrained CONvergence (RECON) algorithm can simultaneously evaluate the sequence of large proteins that undergo substantial conformational changes. Simultaneous optimization of side chain conformations across all conformations increased sequence conservation when compared to single-state designs in all cases. More importantly, the sequence space sampled by RECON MSD resembled the evolutionary sequence space of flexible proteins, particularly when confined to predicting the mutational preferences of limited common ancestral descent, such as in the case of influenza type A hemagglutinin. Additionally, we found that sequence positions which require substantial changes in their local environment across an ensemble of conformations are more likely to be conserved. These increased conservation rates are better captured by RECON MSD over multiple conformations and thus multiple local residue environments during design. To quantify this rewiring of contacts at a certain position in sequence and structure, we introduced a new metric designated ‘contact proximity deviation’ that enumerates contact map changes. This measure allows mapping of global conformational changes into local side chain proximity adjustments, a property not captured by traditional global similarity metrics such as RMSD or local similarity metrics such as changes in φ and ψ angles. Multi-state design can be used to engineer proteins that need to exist in multiple conformations or that bind to multiple partner molecules. 
In essence, multi-state design selects compromise protein sequences that allow for an ensemble of protein conformations, or states, associated with a particular biological function. In this paper, we used the REstrained CONvergence (RECON) algorithm with Rosetta to show that multi-state design of flexible proteins predicts sequences optimal for conformational change, mimicking mutation preferences sampled in evolution. Modeling optimal local side chain physicochemical environments within an ensemble selected significantly more native-like sequences than selections performed when all conformational states are designed independently. This outcome was particularly true for amino acids whose local side chain environment changes between conformations. To quantify such contact map changes, we introduced a novel metric to show that sequence conservation is dependent on protein flexibility, i.e., changes in local side chain environments between states limit the space of tolerated mutations. Additionally, such positions in sequence and structure are more likely to be energetically frustrated, at least in some states. Importantly, we showed that multi-state design over an ensemble of conformations (space) can explore evolutionarily tolerated sequence space (time), thus enabling RECON not only to design proteins that require multiple states for function but also to predict mutations that might be tolerated in native proteins but have not yet been explored by evolution. The latter aspect can be important to anticipate escape mutations, for example in pathogens or oncoproteins.
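The abstract describes contact proximity deviation only as a metric that enumerates contact map changes. A simplified, hypothetical version of that idea counts, per residue, the contacts gained or lost between two states; the 8 Å cutoff, the single representative coordinate per residue, and the functions below are assumptions for illustration, not the published definition.

```python
import math

Coord = tuple[float, float, float]


def contact_map(coords: list[Coord], cutoff: float = 8.0) -> set[tuple[int, int]]:
    """Residue pairs (i < j) whose representative coordinates lie within `cutoff` Å."""
    n = len(coords)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if math.dist(coords[i], coords[j]) < cutoff}


def contact_changes(state_a: list[Coord], state_b: list[Coord], cutoff: float = 8.0) -> list[int]:
    """Per-residue number of contacts gained or lost between two conformations."""
    changed = contact_map(state_a, cutoff) ^ contact_map(state_b, cutoff)  # symmetric difference
    counts = [0] * len(state_a)
    for i, j in changed:
        counts[i] += 1
        counts[j] += 1
    return counts


open_state = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
closed_state = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
print(contact_changes(open_state, closed_state))  # [0, 1, 1]
```

Residue 0 keeps the same neighbors in both states, so it scores 0 even though residue 2 moves by 10 Å; this is the kind of local rewiring that a global metric such as RMSD does not resolve per position.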
Affiliation(s)
- Marion F Sauer
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America; Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Alexander M Sevy
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America; Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- James E Crowe
- Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America; Department of Pediatrics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America; Department of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Jens Meiler
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America; Department of Chemistry, Vanderbilt University, Nashville, Tennessee, United States of America
6
Kuhlman B, Bradley P. Advances in protein structure prediction and design. Nat Rev Mol Cell Biol 2019; 20:681-697. [PMID: 31417196 PMCID: PMC7032036 DOI: 10.1038/s41580-019-0163-x] [Accepted: 07/19/2019]
Abstract
The prediction of protein three-dimensional structure from amino acid sequence has been a grand challenge problem in computational biophysics for decades, owing to its intrinsic scientific interest and also to the many potential applications for robust protein structure prediction algorithms, from genome interpretation to protein function prediction. More recently, the inverse problem - designing an amino acid sequence that will fold into a specified three-dimensional structure - has attracted growing attention as a potential route to the rational engineering of proteins with functions useful in biotechnology and medicine. Methods for the prediction and design of protein structures have advanced dramatically in the past decade. Increases in computing power and the rapid growth in protein sequence and structure databases have fuelled the development of new data-intensive and computationally demanding approaches for structure prediction. New algorithms for designing protein folds and protein-protein interfaces have been used to engineer novel high-order assemblies and to design from scratch fluorescent proteins with novel or enhanced properties, as well as signalling proteins with therapeutic potential. In this Review, we describe current approaches for protein structure prediction and design and highlight a selection of the successful applications they have enabled.
Affiliation(s)
- Brian Kuhlman
- Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, NC, USA.
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC, USA.
- Philip Bradley
- Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
- Institute for Protein Design, University of Washington, Seattle, WA, USA.
7
St-Jacques AD, Eyahpaise MÈC, Chica RA. Computational Design of Multisubstrate Enzyme Specificity. ACS Catal 2019. [DOI: 10.1021/acscatal.9b01464]
Affiliation(s)
- Antony D. St-Jacques
- Department of Chemistry and Biomolecular Sciences, University of Ottawa, Ottawa, Ontario K1N 6N5, Canada
- Marie-Ève C. Eyahpaise
- Department of Chemistry and Biomolecular Sciences, University of Ottawa, Ottawa, Ontario K1N 6N5, Canada
- Roberto A. Chica
- Department of Chemistry and Biomolecular Sciences, University of Ottawa, Ottawa, Ontario K1N 6N5, Canada
8
Löffler P, Schmitz S, Hupfeld E, Sterner R, Merkl R. Rosetta:MSF: a modular framework for multi-state computational protein design. PLoS Comput Biol 2017; 13:e1005600. [PMID: 28604768 PMCID: PMC5484525 DOI: 10.1371/journal.pcbi.1005600] [Received: 03/01/2017] [Revised: 06/26/2017] [Accepted: 05/27/2017]
Abstract
Computational protein design (CPD) is a powerful technique to engineer existing proteins or to design novel ones that display desired properties. Rosetta is a software suite including algorithms for computational modeling and analysis of protein structures and offers many elaborate protocols created to solve highly specific tasks of protein engineering. Most of Rosetta’s protocols optimize sequences based on a single conformation (i.e., design state). However, challenging CPD objectives like multi-specificity design or the concurrent consideration of positive and negative design goals demand the simultaneous assessment of multiple states. This is why we have developed the multi-state framework MSF, which facilitates the implementation of Rosetta’s single-state protocols in a multi-state environment, and made available two frequently used protocols. Utilizing MSF, we demonstrated for one of these protocols that multi-state design yields a 15% higher performance than single-state design on a ligand-binding benchmark consisting of structural conformations. With this protocol, we designed nine retro-aldolases de novo on a conformational ensemble deduced from a (βα)₈-barrel protein. All variants displayed measurable catalytic activity, testifying to a high success rate for this concept of multi-state enzyme design. Protein engineering, i.e., the targeted modification or design of proteins, has tremendous potential for medical and industrial applications. One generally applicable strategy for protein engineering is rational protein design: based on detailed knowledge of structure and function, computer programs like Rosetta propose the sequence of a protein possessing the desired properties. So far, most computer protocols have used rigid structures for design, which is a simplification because a protein’s structure is more accurately specified by a conformational ensemble.
We have now implemented a framework for computational protein design that allows certain design protocols of Rosetta to make use of multiple design states like structural ensembles. An in silico assessment simulating ligand-binding design showed that this new approach generates more reliably native-like sequences than a single-state approach. As a proof-of-concept, we introduced de novo retro-aldolase activity into a scaffold protein and characterized nine variants experimentally, all of which were catalytically active.
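Once every state has been scored, a multi-state protocol must reduce the per-state energies to a single value per sequence. The function below is a hypothetical aggregation chosen for illustration, not MSF's actual fitness function: the mean over positive (desired) states, minus a weighted specificity gap that penalizes a sequence when a negative (undesired) state scores close to, or better than, the positive ones.

```python
from collections.abc import Sequence
from statistics import mean


def multistate_fitness(positive: Sequence[float], negative: Sequence[float] = (),
                       weight: float = 1.0) -> float:
    """Combine per-state energies of one sequence (lower fitness is better).

    positive: energies in states the sequence should stabilize (e.g. ensemble members).
    negative: energies in states it should disfavor (negative design).
    """
    fitness = mean(positive)
    if negative:
        # Specificity gap: how much worse the best negative state is than the worst positive one.
        gap = min(negative) - max(positive)
        fitness -= weight * gap
    return fitness


print(multistate_fitness([-10.0, -8.0]))          # positive design only: -9.0
print(multistate_fitness([-10.0, -8.0], [-2.0]))  # off-target state is unfavorable: -15.0
print(multistate_fitness([-10.0, -8.0], [-9.0]))  # off-target state competes: -8.0
```

With only positive states the function reduces to an ensemble average; adding a negative state rewards sequences that keep the undesired state well above all desired ones.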
Affiliation(s)
- Patrick Löffler
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany
- Samuel Schmitz
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany
- Enrico Hupfeld
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany
- Reinhard Sterner
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany
- Rainer Merkl
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany
9
Computationally optimized deimmunization libraries yield highly mutated enzymes with low immunogenicity and enhanced activity. Proc Natl Acad Sci U S A 2017; 114:E5085-E5093. [PMID: 28607051 DOI: 10.1073/pnas.1621233114]
Abstract
Therapeutic proteins of wide-ranging function hold great promise for treating disease, but immune surveillance of these macromolecules can drive an antidrug immune response that compromises efficacy and even undermines safety. To eliminate widespread T-cell epitopes in any biotherapeutic and thereby mitigate this key source of detrimental immune recognition, we developed a Pareto optimal deimmunization library design algorithm that optimizes protein libraries to account for the simultaneous effects of combinations of mutations on both molecular function and epitope content. Active variants identified by high-throughput screening are thus inherently likely to be deimmunized. Functional screening of an optimized 10-site library (1,536 variants) of P99 β-lactamase (P99βL), a component of ADEPT cancer therapies, revealed that the population possessed high overall fitness, and comprehensive analysis of peptide-MHC II immunoreactivity showed the population possessed lower average immunogenic potential than the wild-type enzyme. Although similar functional screening of an optimized 30-site library (2.15 × 10⁹ variants) revealed reduced population-wide fitness, numerous individual variants were found to have activity and stability better than the wild type despite bearing 13 or more deimmunizing mutations per enzyme. The immunogenic potential of one highly active and stable 14-mutation variant was assessed further using ex vivo cellular immunoassays, and the variant was found to silence T-cell activation in seven of the eight blood donors who responded strongly to wild-type P99βL. In summary, our multiobjective library-design process readily identified large and mutually compatible sets of epitope-deleting mutations and produced highly active but aggressively deimmunized constructs in only one round of library screening.
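Pareto optimality here means that no design can be improved in one objective without getting worse in the other. The paper optimizes whole libraries; the sketch below only illustrates the underlying dominance test on individual variants, with hypothetical scores where higher function is better and lower epitope content is better.

```python
def pareto_front(variants: dict[str, tuple[float, float]]) -> list[str]:
    """Names of non-dominated variants: maximize function (first value), minimize epitopes (second)."""

    def dominates(a: tuple[float, float], b: tuple[float, float]) -> bool:
        # a is at least as good in both objectives and strictly better in at least one.
        return a[0] >= b[0] and a[1] <= b[1] and a != b

    return sorted(
        name for name, score in variants.items()
        if not any(dominates(other, score) for other in variants.values())
    )


# Hypothetical (relative activity, predicted epitope count) pairs.
variants = {"wt": (1.0, 10.0), "v1": (0.9, 4.0), "v2": (0.8, 5.0), "v3": (1.1, 9.0)}
print(pareto_front(variants))  # ['v1', 'v3']
```

The wild type is dominated by `v3` (more active, fewer epitopes) and `v2` by `v1`; the remaining two represent different trade-offs between function and deimmunization.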
10
Abstract
The ability of computational protein design (CPD) to identify protein sequences possessing desired characteristics in vast sequence spaces makes it a highly valuable tool in the protein engineering toolbox. CPD calculations are typically performed using a single-state design (SSD) approach in which amino-acid sequences are optimized on a single protein structure. Although SSD has been successfully applied to the design of numerous protein functions and folds, the approach can lead to the incorrect rejection of desirable sequences because of the combined use of a fixed protein backbone template and a set of rigid rotamers. This fixed backbone approximation can be addressed by using multistate design (MSD) with backbone ensembles. MSD improves the quality of predicted sequences by using ensembles approximating conformational flexibility as input templates instead of a single fixed protein structure. In this chapter, we present a step-by-step guide to the implementation and analysis of MSD calculations with backbone ensembles. Specifically, we describe ensemble generation with the PertMin protocol, execution of MSD calculations for recapitulation of Streptococcal protein G domain β1 mutant stability, and analysis of computational predictions by sequence binning. Furthermore, we provide a comparison between MSD and SSD calculation results and discuss the benefits of multistate approaches to CPD.
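The fixed-backbone problem mentioned above can be shown with a toy example: a sequence that clashes with the single input structure but fits other members of a backbone ensemble. The Boltzmann-weighted aggregate below (with an assumed kT of 0.6 kcal/mol) is one common way to score a sequence over an ensemble and is an assumption here; the chapter's own scoring and its sequence-binning analysis are not reproduced.

```python
import math


def ensemble_energy(energies: list[float], kT: float = 0.6) -> float:
    """Boltzmann-weighted aggregate over backbone ensemble members: -kT * ln(sum(exp(-E/kT)))."""
    low = min(energies)  # shift by the minimum for numerical stability
    return low - kT * math.log(sum(math.exp(-(e - low) / kT) for e in energies))


# Hypothetical energies of two sequences on a 3-member backbone ensemble (member 0 = input structure).
energies = {"A": [5.0, -10.0, -9.0], "B": [-3.0, -3.0, -3.0]}
ssd_ranking = sorted(energies, key=lambda s: energies[s][0])  # single-state: input structure only
msd_ranking = sorted(energies, key=lambda s: ensemble_energy(energies[s]))
print(ssd_ranking, msd_ranking)  # ['B', 'A'] ['A', 'B']
```

Single-state scoring on the input structure rejects sequence A because of one clash, whereas the ensemble score ranks it first because two nearby backbones accommodate it well.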
11
De Paepe B, Peters G, Coussement P, Maertens J, De Mey M. Tailor-made transcriptional biosensors for optimizing microbial cell factories. J Ind Microbiol Biotechnol 2016; 44:623-645. [PMID: 27837353 DOI: 10.1007/s10295-016-1862-3] [Received: 08/25/2016] [Accepted: 10/30/2016]
Abstract
Monitoring cellular behavior and eventually properly adapting cellular processes is key to handling the enormous complexity of today's metabolic engineering questions. Hence, transcriptional biosensors bear the potential to augment and accelerate current metabolic engineering strategies, catalyzing vital advances in industrial biotechnology. The development of such transcriptional biosensors typically starts with exploring nature's richness. Hence, in the first part, the transcriptional biosensor architecture and the various modi operandi are briefly discussed, as well as experimental and computational methods and relevant ontologies to search for natural transcription factors and their corresponding binding sites. In the second part of this review, various engineering approaches are reviewed to tune the main characteristics of these (natural) transcriptional biosensors, i.e., the response curve and ligand specificity, in view of specific industrial biotechnology applications, which is illustrated using success stories of transcriptional biosensor engineering.
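The response curve mentioned above is often summarized with a Hill-type function whose parameters (basal output, maximal output, half-maximal ligand concentration, and steepness) are the characteristics that biosensor engineering tunes. The function and parameter values below are an illustrative assumption, not taken from the review.

```python
def biosensor_response(ligand: float, basal: float, maximum: float, k: float, n: float) -> float:
    """Hill-type dose-response: output rises from `basal` to `maximum`, half-way at ligand == k."""
    occupancy = ligand**n / (k**n + ligand**n)
    return basal + (maximum - basal) * occupancy


# Hypothetical parameters: basal 0.5, maximum 10 (a.u.), K = 1 mM, Hill coefficient 2.
for conc in (0.0, 1.0, 10.0):
    print(conc, round(biosensor_response(conc, basal=0.5, maximum=10.0, k=1.0, n=2.0), 3))
```

Lowering `basal` or raising `maximum` widens the dynamic range, shifting `k` moves the operational range along the concentration axis, and `n` sets how switch-like the sensor is.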
Affiliation(s)
- Brecht De Paepe
- Department of Biochemical and Microbial Technology, Ghent University, Coupure Links 653, 9000, Ghent, Belgium
- Gert Peters
- Department of Biochemical and Microbial Technology, Ghent University, Coupure Links 653, 9000, Ghent, Belgium
- Pieter Coussement
- Department of Biochemical and Microbial Technology, Ghent University, Coupure Links 653, 9000, Ghent, Belgium
- Jo Maertens
- Department of Biochemical and Microbial Technology, Ghent University, Coupure Links 653, 9000, Ghent, Belgium
- Marjan De Mey
- Department of Biochemical and Microbial Technology, Ghent University, Coupure Links 653, 9000, Ghent, Belgium.
12
Gainza P, Nisonoff HM, Donald BR. Algorithms for protein design. Curr Opin Struct Biol 2016; 39:16-26. [PMID: 27086078 PMCID: PMC5065368 DOI: 10.1016/j.sbi.2016.03.006] [Received: 11/17/2015] [Revised: 03/15/2016] [Accepted: 03/22/2016]
Abstract
Computational structure-based protein design programs are becoming an increasingly important tool in molecular biology. These programs compute protein sequences that are predicted to fold to a target structure and perform a desired function. The success of a program's predictions largely relies on two components: first, the input biophysical model, and second, the algorithm that computes the best sequence(s) and structure(s) according to the biophysical model. Improving both the model and the algorithm in tandem is essential to improving the success rate of current programs, and here we review recent developments in algorithms for protein design, emphasizing how novel algorithms enable the use of more accurate biophysical models. We conclude with a list of algorithmic challenges in computational protein design that we believe will be especially important for the design of therapeutic proteins and protein assemblies.
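As one concrete example of the kind of search algorithm such reviews cover, dead-end elimination (DEE) prunes rotamers that provably cannot belong to the global minimum energy conformation (GMEC) under a pairwise-decomposable energy model. The sketch below implements the original (Desmet-style) singles criterion on a hypothetical two-position toy model; production implementations use stronger criteria such as Goldstein's and typically follow DEE with a search such as A*.

```python
# Hypothetical toy energy model:
#   self_e[i][r]          energy of rotamer r at position i
#   pair[(i, j)][r][s]    interaction of rotamer r at i with rotamer s at j (stored once, i < j)
Pair = dict[tuple[int, int], list[list[float]]]


def pair_energy(pair: Pair, i: int, r: int, j: int, s: int) -> float:
    if (i, j) in pair:
        return pair[(i, j)][r][s]
    if (j, i) in pair:
        return pair[(j, i)][s][r]
    return 0.0  # positions without an entry do not interact


def dead_end_elimination(self_e: list[list[float]], pair: Pair) -> dict[int, list[int]]:
    """Apply the original DEE criterion until no rotamer can be pruned.

    Rotamer r at position i is pruned if its best possible energy exceeds
    the worst possible energy of some other rotamer t at the same position.
    """
    alive = {i: list(range(len(rotamers))) for i, rotamers in enumerate(self_e)}
    changed = True
    while changed:
        changed = False
        for i in alive:
            others = [j for j in alive if j != i]
            for r in list(alive[i]):
                best_r = self_e[i][r] + sum(
                    min(pair_energy(pair, i, r, j, s) for s in alive[j]) for j in others)
                for t in alive[i]:
                    if t == r:
                        continue
                    worst_t = self_e[i][t] + sum(
                        max(pair_energy(pair, i, t, j, s) for s in alive[j]) for j in others)
                    if best_r > worst_t:
                        alive[i].remove(r)  # r can never be part of the GMEC
                        changed = True
                        break
    return alive


self_e = [[0.0, 5.0], [0.0, 0.0]]
pair = {(0, 1): [[0.0, 1.0], [0.0, 0.0]]}
print(dead_end_elimination(self_e, pair))  # {0: [0], 1: [0]}
```

In this toy model DEE alone resolves the GMEC (rotamer 0 at both positions, total energy 0); in realistic problems several rotamers usually survive and the remaining space is searched exhaustively or with A*.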
Affiliation(s)
- Pablo Gainza
- Department of Computer Science, Duke University, Durham, NC, United States
- Hunter M Nisonoff
- Department of Computer Science, Duke University, Durham, NC, United States
- Bruce R Donald
- Department of Computer Science, Duke University, Durham, NC, United States; Department of Biochemistry, Duke University Medical Center, Durham, NC, United States; Department of Chemistry, Duke University, Durham, NC, United States.
13
de los Santos ELC, Meyerowitz JT, Mayo SL, Murray RM. Engineering Transcriptional Regulator Effector Specificity Using Computational Design and In Vitro Rapid Prototyping: Developing a Vanillin Sensor. ACS Synth Biol 2016; 5:287-95. [PMID: 26262913 DOI: 10.1021/acssynbio.5b00090]
Abstract
The pursuit of circuits and metabolic pathways of increasing complexity and robustness in synthetic biology will require engineering new regulatory tools. Feedback control based on relevant molecules, including toxic intermediates and environmental signals, would enable genetic circuits to react appropriately to changing conditions. In this work, variants of qacR, a tetR family repressor, were generated by computational protein design and screened in a cell-free transcription-translation (TX-TL) system for responsiveness to a new targeted effector. The modified repressors target vanillin, a growth-inhibiting small molecule found in lignocellulosic hydrolysates and other industrial processes. Promising candidates from the in vitro screen were further characterized in vitro and in vivo in a gene circuit. The screen yielded two qacR mutants that respond to vanillin both in vitro and in vivo. While the mutants exhibit some toxicity to cells, presumably due to off-target effects, they are prime starting points for directed evolution toward vanillin sensors with the specifications required for use in a dynamic control loop. We believe this process, a combination of the generation of variants coupled with in vitro screening, can serve as a framework for designing new sensors for other target compounds.
Affiliation(s)
- Emmanuel L. C. de los Santos
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Joseph T. Meyerowitz
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Stephen L. Mayo
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Richard M. Murray
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
14
Sevy AM, Jacobs TM, Crowe JE, Meiler J. Design of Protein Multi-specificity Using an Independent Sequence Search Reduces the Barrier to Low Energy Sequences. PLoS Comput Biol 2015; 11:e1004300. [PMID: 26147100 PMCID: PMC4493036 DOI: 10.1371/journal.pcbi.1004300] [Received: 01/10/2015] [Accepted: 04/27/2015]
Abstract
Computational protein design has found great success in engineering proteins for thermodynamic stability, binding specificity, or enzymatic activity in a 'single state' design (SSD) paradigm. Multi-specificity design (MSD), on the other hand, involves considering the stability of multiple protein states simultaneously. We have developed a novel MSD algorithm, which we refer to as REstrained CONvergence in multi-specificity design (RECON). The algorithm allows each state to adopt its own sequence throughout the design process rather than enforcing a single sequence on all states. Convergence to a single sequence is encouraged through an incrementally increasing convergence restraint for corresponding positions. Compared with MSD algorithms that enforce (constrain) an identical sequence on all states, RECON simplifies the energy landscape, which accelerates the search drastically. As a result, RECON can readily be used in simulations with a flexible protein backbone. We have benchmarked RECON on two design tasks. First, we designed antibodies derived from a common germline gene against their diverse targets to assess recovery of the germline, polyspecific sequence. Second, we designed "promiscuous", polyspecific proteins against all binding partners and measured recovery of the native sequence. We show that RECON is able to efficiently recover native-like, biologically relevant sequences in this diverse set of protein complexes.
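RECON's central idea, letting each state carry its own sequence while a restraint with a growing weight pushes the sequences together, can be sketched as a penalty term. The form below (at each corresponding position, the number of states that deviate from the most common residue, scaled by a ramped weight) and the linear weight schedule are assumptions for illustration; the published restraint may be defined differently.

```python
from collections import Counter


def convergence_penalty(state_seqs: list[str], weight: float) -> float:
    """Weight times the number of (state, position) residues deviating from the per-position majority."""
    penalty = 0
    for column in zip(*state_seqs):  # residues at one corresponding position in every state
        _, top_count = Counter(column).most_common(1)[0]
        penalty += len(column) - top_count
    return weight * penalty


# Hypothetical per-state sequences and a linearly increasing restraint weight.
states = ["ACD", "ACE", "GCD"]
for step, weight in enumerate([0.5, 1.0, 1.5, 2.0]):
    print(step, convergence_penalty(states, weight))
```

Early in a run the small weight lets each state explore its own low-energy sequence; as the weight grows, disagreements cost more and the states are driven toward one shared sequence.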
Affiliation(s)
- Alexander M. Sevy
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America
- Tim M. Jacobs
- Department of Biochemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- James E. Crowe
- Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Jens Meiler
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee, United States of America
15
Warszawski S, Netzer R, Tawfik DS, Fleishman SJ. A "fuzzy"-logic language for encoding multiple physical traits in biomolecules. J Mol Biol 2014; 426:4125-4138. [PMID: 25311857 PMCID: PMC4270444 DOI: 10.1016/j.jmb.2014.10.002] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 07/30/2014] [Revised: 09/21/2014] [Accepted: 10/02/2014] [Indexed: 12/16/2022]
Abstract
To carry out their activities, biological macromolecules balance different physical traits, such as stability, interaction affinity, and selectivity. How such often opposing traits are encoded in a macromolecular system is critical to our understanding of evolutionary processes and to our ability to design new molecules with desired functions. We present a framework for constraining design simulations to balance different physical characteristics. Each trait is represented by the equilibrium fractional occupancy of the desired state relative to its alternatives, ranging from none to full occupancy, and the different traits are combined using Boolean operators to effect a "fuzzy"-logic language for encoding any combination of traits. In another paper, we presented a new combinatorial backbone design algorithm, AbDesign, in which the fuzzy-logic framework was used to optimize protein backbones and sequences for both stability and binding affinity in antibody-design simulations. We now extend this framework and find that fuzzy-logic design simulations reproduce the sequence and structure design principles that underlie exquisite specificity on the one hand and multispecificity on the other in natural molecules. The fuzzy-logic language is broadly applicable and could help define the space of tolerated and beneficial mutations in natural biomolecular systems and design artificial molecules that encode complex characteristics.
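The two ingredients named in the abstract, a per-trait equilibrium occupancy between 0 and 1 and Boolean combination of traits, can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: occupancy is computed from Boltzmann weights of supplied energies, and the fuzzy operators are the common product (AND), probabilistic sum (OR) and complement (NOT) choices, which may differ from those used by Warszawski et al. The value of `KT` and all energies are made up.

```python
import math
from typing import Sequence

KT = 0.593  # assumed thermal energy in kcal/mol (about 298 K)


def occupancy(desired: float, alternatives: Sequence[float], kt: float = KT) -> float:
    """Equilibrium fraction of the desired state among itself and its
    alternatives, from Boltzmann weights of their energies (range 0..1)."""
    energies = [desired, *alternatives]
    e_min = min(energies)  # shift energies so exp() cannot overflow
    weights = [math.exp(-(e - e_min) / kt) for e in energies]
    return weights[0] / sum(weights)


def fuzzy_and(*ps: float) -> float:
    """Product t-norm: high only if every trait is satisfied."""
    result = 1.0
    for p in ps:
        result *= p
    return result


def fuzzy_or(*ps: float) -> float:
    """Probabilistic sum: high if any trait is satisfied."""
    result = 0.0
    for p in ps:
        result = result + p - result * p
    return result


def fuzzy_not(p: float) -> float:
    """Complement: occupancy of 'not in the desired state'."""
    return 1.0 - p


if __name__ == "__main__":
    # Made-up energies (kcal/mol) for one hypothetical design.
    stable = occupancy(-8.0, [0.0])   # folded vs. unfolded
    binds_a = occupancy(-6.0, [0.0])  # bound to partner A vs. unbound
    binds_b = occupancy(-1.0, [0.0])  # bound to partner B vs. unbound
    specific = fuzzy_and(stable, binds_a, fuzzy_not(binds_b))
    multispecific = fuzzy_and(stable, binds_a, binds_b)
    print(f"specificity {specific:.3f}, multispecificity {multispecific:.3f}")
```

The same occupancies yield either a specificity objective (bind A AND NOT B) or a multispecificity objective (bind A AND B) simply by changing the logical expression, which is the flexibility the abstract attributes to the fuzzy-logic language.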
Affiliation(s)
- Shira Warszawski
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 76100, Israel
- Ravit Netzer
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 76100, Israel
- Dan S Tawfik
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 76100, Israel
- Sarel J Fleishman
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 76100, Israel
16
Lai YT, Tsai KL, Sawaya MR, Asturias FJ, Yeates TO. Structure and flexibility of nanoscale protein cages designed by symmetric self-assembly. J Am Chem Soc 2013; 135:7738-43. [PMID: 23621606 DOI: 10.1021/ja402277f] [Citation(s) in RCA: 75] [Impact Index Per Article: 6.8] [Indexed: 01/18/2023]
Abstract
Designing protein molecules that self-assemble into complex architectures is an outstanding goal in the area of nanobiotechnology. One design strategy for doing this involves genetically fusing together two natural proteins, each of which is known to form a simple oligomer on its own (e.g., a dimer or trimer). If two such components can be fused in a geometrically predefined configuration, that designed subunit can, in principle, assemble into highly symmetric architectures. Initial experiments showed that a 12-subunit tetrahedral cage, 16 nm in diameter, could be constructed following such a procedure [Padilla, J. E.; et al. Proc. Natl. Acad. Sci. U.S.A. 2001, 98, 2217; Lai, Y. T.; et al. Science 2012, 336, 1129]. Here we characterize multiple crystal structures of protein cages constructed in this way, including cages assembled from two mutant forms of the same basic protein subunit. The flexibilities of the designed assemblies and their deviations from the target model are described, along with implications for further design developments.
Affiliation(s)
- Yen-Ting Lai
- Department of Bioengineering, University of California, Los Angeles, California 90095, USA
17
Rational design of a ligand-controlled protein conformational switch. Proc Natl Acad Sci U S A 2013; 110:6800-4. [PMID: 23569285 DOI: 10.1073/pnas.1218319110] [Citation(s) in RCA: 92] [Impact Index Per Article: 8.4] [Indexed: 01/27/2023]
Abstract
Design of a regulatable multistate protein is a challenge for protein engineering. Here we design a protein with a unique topology, called uniRapR, whose conformation is controlled by the binding of a small molecule. We confirm the switching and control ability of uniRapR in silico, in vitro, and in vivo. As a proof of concept, uniRapR is used as an artificial regulatory domain to control the activity of kinases. By activating Src kinase using uniRapR in single cells and whole organisms, we observe two unique phenotypes consistent with its role in metastasis. Activation of Src kinase leads to rapid induction of protrusion with polarized spreading in HeLa cells, and morphological changes with loss of cell-cell contacts in the epidermal tissue of zebrafish. The rational creation of uniRapR exemplifies the strength of computational protein design, and offers a powerful means for targeted activation of many pathways to study signaling in living organisms.
18
Davey JA, Chica RA. Multistate approaches in computational protein design. Protein Sci 2012; 21:1241-52. [PMID: 22811394 DOI: 10.1002/pro.2128] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.8] [Received: 05/03/2012] [Revised: 07/04/2012] [Accepted: 07/12/2012] [Indexed: 11/10/2022]
Abstract
Computational protein design (CPD) is a useful tool for protein engineers. It has been successfully applied towards the creation of proteins with increased thermostability, improved binding affinity, novel enzymatic activity, and altered ligand specificity. Traditionally, CPD calculations search and rank sequences using a single fixed protein backbone template in an approach referred to as single-state design (SSD). While SSD has enjoyed considerable success, certain design objectives require the explicit consideration of multiple conformational and/or chemical states. Cases where a "multistate" approach may be advantageous over the SSD approach include designing conformational changes into proteins, using native ensembles to mimic backbone flexibility, and designing ligand or oligomeric association specificities. These design objectives can be efficiently tackled using multistate design (MSD), an emerging methodology in CPD that considers any number of protein conformational or chemical states as inputs instead of a single protein backbone template, as in SSD. In this review article, recent examples of the successful design of a desired property into proteins using MSD are described. These studies employing MSD are divided into two categories--those that utilized multiple conformational states, and those that utilized multiple chemical states. In addition, the scoring of competing states during negative design is discussed as a current challenge for MSD.
Affiliation(s)
- James A Davey
- Department of Chemistry, University of Ottawa, Ottawa, Ontario K1N 6N5, Canada
19
Leaver-Fay A, Jacak R, Stranges PB, Kuhlman B. A generic program for multistate protein design. PLoS One 2011; 6:e20937. [PMID: 21754981 PMCID: PMC3130737 DOI: 10.1371/journal.pone.0020937] [Citation(s) in RCA: 78] [Impact Index Per Article: 6.0] [Received: 02/14/2011] [Accepted: 05/13/2011] [Indexed: 11/18/2022]
Abstract
Some protein design tasks cannot be modeled by the traditional single state design strategy of finding a sequence that is optimal for a single fixed backbone. Such cases require multistate design, where a single sequence is threaded onto multiple backbones (states) and evaluated for its strengths and weaknesses on each backbone. For example, to design a protein that can switch between two specific conformations, it is necessary to find a sequence that is compatible with both backbone conformations. We present in this paper a generic implementation of multistate design that is suited for a wide range of protein design tasks and demonstrate in silico its capabilities at two design tasks: one of redesigning an obligate homodimer into an obligate heterodimer such that the new monomers would not homodimerize, and one of redesigning a promiscuous interface to bind to only a single partner and to no longer bind the rest of its partners. Both tasks contained negative design in that multistate design was asked to find sequences that would produce high energies for several of the states being modeled. Success at negative design was assessed by computationally redocking the undesired protein-pair interactions; we found that multistate design's accuracy improved as the diversity of conformations for the undesired protein-pair interactions increased. The paper concludes with a discussion of the pitfalls of negative design, which has proven considerably more challenging than positive design.
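The abstract describes scoring one sequence on several states, some of which should be favorable (positive design) and some unfavorable (negative design). The sketch below shows one possible fitness of that kind for the homodimer-to-heterodimer task; it is an assumption for illustration, not the fitness function used in the paper. The function names, the state labels "AB", "AA" and "BB", and the energy values are all hypothetical, and the energies stand in for scores that would come from threading each sequence onto each backbone.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# (sequence, state) -> energy; placeholder for a threaded-and-scored model.
EnergyFn = Callable[[str, str], float]


def msd_fitness(seq: str, energy: EnergyFn,
                positive: Sequence[str], negative: Sequence[str]) -> float:
    """Hypothetical multistate fitness (lower is better): the worst
    desired-state energy minus the best undesired-state energy, so a
    sequence scores well only if every positive state is favorable
    and no negative state is."""
    worst_positive = max(energy(seq, s) for s in positive)
    if not negative:
        return worst_positive
    best_negative = min(energy(seq, s) for s in negative)
    return worst_positive - best_negative


def rank(candidates: Sequence[str], energy: EnergyFn,
         positive: Sequence[str], negative: Sequence[str]) -> List[str]:
    """Sort candidate sequences from best (lowest) to worst fitness."""
    return sorted(candidates, key=lambda s: msd_fitness(s, energy, positive, negative))


if __name__ == "__main__":
    # Made-up energies: "AB" is the desired heterodimer,
    # "AA" and "BB" are homodimers to design against.
    table: Dict[Tuple[str, str], float] = {
        ("seq1", "AB"): -20.0, ("seq1", "AA"): -18.0, ("seq1", "BB"): -5.0,
        ("seq2", "AB"): -17.0, ("seq2", "AA"): -4.0, ("seq2", "BB"): -6.0,
    }

    def energy(seq: str, state: str) -> float:
        return table[(seq, state)]

    for s in rank(["seq1", "seq2"], energy, positive=["AB"], negative=["AA", "BB"]):
        print(s, msd_fitness(s, energy, ["AB"], ["AA", "BB"]))
```

Here `seq2` ranks first (fitness -11.0 versus -2.0) even though `seq1` has the better heterodimer energy, because `seq1` also stabilizes the undesired AA homodimer; this is the trade-off negative design is meant to capture.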
Affiliation(s)
- Andrew Leaver-Fay
- Department of Biochemistry, University of North Carolina, Chapel Hill, North Carolina, United States of America
20
Kumar A, Ramakrishnan V. Creating novel protein scripts beyond natural alphabets. Syst Synth Biol 2011; 4:247-56. [PMID: 22132051 DOI: 10.1007/s11693-011-9068-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 11/18/2010] [Accepted: 02/03/2011] [Indexed: 11/29/2022]
Abstract
Natural proteins are concatenated amino acids with a definite handedness, or chirality, and are predominantly left-handed (L-chiral). This paper discusses the biophysics of stereochemical perturbation of proteins by D-α-amino acids and their utility as an additional design alphabet for scripting novel protein structures.
21
Experimental library screening demonstrates the successful application of computational protein design to large structural ensembles. Proc Natl Acad Sci U S A 2010; 107:19838-43. [PMID: 21045132 DOI: 10.1073/pnas.1012985107] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.9] [Indexed: 11/18/2022]
Abstract
The stability, activity, and solubility of a protein sequence are determined by a delicate balance of molecular interactions in a variety of conformational states. Even so, most computational protein design methods model sequences in the context of a single native conformation. Simulations that model the native state as an ensemble have been mostly neglected due to the lack of sufficiently powerful optimization algorithms for multistate design. Here, we have applied our multistate design algorithm to study the potential utility of various forms of input structural data for design. To facilitate a more thorough analysis, we developed new methods for the design and high-throughput stability determination of combinatorial mutation libraries based on protein design calculations. The application of these methods to the core design of a small model system produced many variants with improved thermodynamic stability and showed that multistate design methods can be readily applied to large structural ensembles. We found that exhaustive screening of our designed libraries helped to clarify several sources of simulation error that would have otherwise been difficult to ascertain. Interestingly, the lack of correlation between our simulated and experimentally measured stability values shows clearly that a design procedure need not reproduce experimental data exactly to achieve success. This surprising result suggests potentially fruitful directions for the improvement of computational protein design technology.
22
Lassila JK. Conformational diversity and computational enzyme design. Curr Opin Chem Biol 2010; 14:676-82. [PMID: 20829099 PMCID: PMC2953567 DOI: 10.1016/j.cbpa.2010.08.010] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Received: 06/04/2010] [Revised: 08/06/2010] [Accepted: 08/06/2010] [Indexed: 11/22/2022]
Abstract
The application of computational protein design methods to the design of enzyme active sites offers potential routes to new catalysts and new reaction specificities. Computational design methods have typically treated the protein backbone as a rigid structure for the sake of computational tractability. However, this fixed-backbone approximation introduces its own special challenges for enzyme design and it contrasts with an emerging picture of natural enzymes as dynamic ensembles with multiple conformations and motions throughout a reaction cycle. This review considers the impact of conformational variation and dynamics on computational enzyme design and it highlights new approaches to addressing protein conformational diversity in enzyme design including recent advances in multi-state design, backbone flexibility, and computational library design.
Affiliation(s)
- Jonathan K Lassila
- Department of Biochemistry, Stanford University, Stanford, CA 94305, USA.
23
Abstract
Predictive methods for the computational design of proteins search for amino acid sequences adopting desired structures that perform specific functions. Typically, design of 'function' is formulated as engineering new and altered binding activities into proteins. Progress in the design of functional protein-protein interactions is directed toward engineering proteins to precisely control biological processes by specifically recognizing desired interaction partners while avoiding competitors. The field is aiming for strategies to harness recent advances in high-resolution computational modeling, particularly methods exploiting protein conformational variability, to engineer new functions and incorporate many functional requirements simultaneously.
Affiliation(s)
- Daniel J Mandell
- Graduate Program in Bioinformatics and Computational Biology, California Institute for Quantitative Biosciences, and Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, USA