Reference Citation Analysis

For: Yanover C, Fromer M, Shifman JM. Dead-end elimination for multistate protein design. J Comput Chem 2007;28:2122-9. [PMID: 17471460 DOI: 10.1002/jcc.20661] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.1] [Indexed: 11/08/2022]

Cited by Other Article(s)

Wang F, Feng X, Kong R, Chang S. Generating new protein sequences by using dense network and attention mechanism. Math Biosci Eng 2023;20:4178-4197. [PMID: 36899622 DOI: 10.3934/mbe.2023195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]

Talluri S. Algorithms for protein design. Adv Protein Chem Struct Biol 2022;130:1-38. [PMID: 35534105 DOI: 10.1016/bs.apcsb.2022.01.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022]

Nazet J, Lang E, Merkl R. Rosetta:MSF:NN: Boosting performance of multi-state computational protein design with a neural network. PLoS One 2021;16:e0256691. [PMID: 34437621 PMCID: PMC8389498 DOI: 10.1371/journal.pone.0256691] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/23/2020] [Accepted: 08/12/2021] [Indexed: 12/05/2022]

Abstract

Rational protein design aims at the targeted modification of existing proteins. To reach this goal, software suites like Rosetta propose sequences to introduce the desired properties. Challenging design problems necessitate the representation of a protein by means of a structural ensemble. Thus, Rosetta multi-state design (MSD) protocols have been developed wherein each state represents one protein conformation. Computational demands of MSD protocols are high, because for each of the candidate sequences a costly three-dimensional (3D) model has to be created and assessed for all states. Each of these scores contributes one data point to a complex, design-specific energy landscape. As neural networks (NN) proved well-suited to learning such solution spaces, we integrated one into the framework Rosetta:MSF in place of the previously used genetic algorithm, with the aim of reducing computational costs. Like its predecessor, Rosetta:MSF:NN administers a set of candidate sequences and their scores and scans sequence space iteratively. During each iteration, the union of all candidate sequences and their Rosetta scores are used to re-train NNs that possess a design-specific architecture. The enormous speed of the NNs allows an extensive assessment of alternative sequences, which are ranked by the scores predicted by the NN. Costly 3D models are computed only for a small fraction of best-scoring sequences; these and the corresponding 3D-based scores replace half of the candidate sequences during each iteration. The analysis of two sets of candidate sequences generated for a specific design problem by means of a genetic algorithm confirmed that the NN predicted 3D-based scores quite well; the Pearson correlation coefficient was at least 0.95. Applying Rosetta:MSF:NN:enzdes to a benchmark consisting of 16 ligand-binding problems showed that this protocol converges ten times faster than the genetic algorithm and finds sequences with comparable scores.
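The iterative scheme in this abstract (re-train a cheap model on every sequence scored so far, rank unseen sequences by predicted score, run the costly scoring only for the top few, and let those replace half of the candidate set) can be sketched as follows. This is a toy under stated assumptions, not Rosetta:MSF:NN: costly_score is a hypothetical stand-in for 3D modeling and multi-state Rosetta scoring, an additive per-position average replaces the neural network, the sequence space is small enough to rank exhaustively, and replacing the worst-scoring half is an assumption.

```python
import itertools
import random

ALPHABET = "AGLS"  # hypothetical reduced residue alphabet
N_POS = 4          # hypothetical number of designable positions

def costly_score(seq):
    """Hypothetical stand-in for modeling and scoring every state (lower is better):
    mismatches to a toy target plus one toy pairwise penalty."""
    score = sum(a != b for a, b in zip(seq, "LSGA"))
    if seq[0] == "L" and seq[1] == "L":
        score += 2
    return float(score)

def fit_surrogate(history):
    """Cheap additive model standing in for the NN: the prediction is the sum over
    positions of the mean observed score of sequences carrying that residue there."""
    stats = {}
    for seq, score in history.items():
        for i, aa in enumerate(seq):
            total, count = stats.get((i, aa), (0.0, 0))
            stats[(i, aa)] = (total + score, count + 1)
    overall = sum(history.values()) / len(history)

    def predict(seq):
        return sum(stats[(i, aa)][0] / stats[(i, aa)][1] if (i, aa) in stats else overall
                   for i, aa in enumerate(seq))
    return predict

def surrogate_design(pop_size=8, n_iter=6, seed=0):
    rng = random.Random(seed)
    population = {}
    while len(population) < pop_size:
        seq = "".join(rng.choice(ALPHABET) for _ in range(N_POS))
        population[seq] = costly_score(seq)
    history = dict(population)   # union of every sequence scored so far
    n_new = pop_size // 2        # half of the candidate set is replaced per iteration
    for _ in range(n_iter):
        predict = fit_surrogate(history)  # re-train on all scored sequences
        all_seqs = ("".join(p) for p in itertools.product(ALPHABET, repeat=N_POS))
        unseen = [s for s in all_seqs if s not in history]
        chosen = sorted(unseen, key=predict)[:n_new]   # cheap ranking of alternatives
        new = {s: costly_score(s) for s in chosen}     # costly scoring for the best few only
        history.update(new)
        # Assumption: the new sequences replace the worst-scoring half of the candidates.
        survivors = sorted(population.items(), key=lambda kv: kv[1])[:pop_size - len(new)]
        population = {**dict(survivors), **new}
    best = min(history, key=history.get)
    return best, history[best], len(history)

best, score, n_scored = surrogate_design()
print(best, score, n_scored)
```

Only fit_surrogate would change if a real neural network were plugged in; the loop structure (re-train, rank cheaply, score a few expensively, replace half) stays the same.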

Mignon D, Druart K, Michael E, Opuu V, Polydorides S, Villa F, Gaillard T, Panel N, Archontis G, Simonson T. Physics-Based Computational Protein Design: An Update. J Phys Chem A 2020;124:10637-10648. [DOI: 10.1021/acs.jpca.0c07605] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 02/06/2023]

Hallen MA, Donald BR. Protein Design by Provable Algorithms. Commun ACM 2019;62:76-84. [PMID: 31607753 PMCID: PMC6788629 DOI: 10.1145/3338124] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 06/10/2023]

Vucinic J, Simoncini D, Ruffini M, Barbe S, Schiex T. Positive multistate protein design. Bioinformatics 2019;36:122-130. [DOI: 10.1093/bioinformatics/btz497] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 02/15/2019] [Revised: 05/20/2019] [Accepted: 06/11/2019] [Indexed: 11/12/2022]

Hallen MA. PLUG (Pruning of Local Unrealistic Geometries) removes restrictions on biophysical modeling for protein design. Proteins 2018;87:62-73. [PMID: 30378699 DOI: 10.1002/prot.25623] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/12/2018] [Revised: 10/10/2018] [Accepted: 10/16/2018] [Indexed: 12/29/2022]

Karimi M, Shen Y. iCFN: an efficient exact algorithm for multistate protein design. Bioinformatics 2018;34:i811-i820. [PMID: 30423073 PMCID: PMC6129278 DOI: 10.1093/bioinformatics/bty564] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 11/13/2022]

In silico methods for design of biological therapeutics. Methods 2017;131:33-65. [PMID: 28958951 DOI: 10.1016/j.ymeth.2017.09.008] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Received: 06/15/2017] [Revised: 09/21/2017] [Accepted: 09/23/2017] [Indexed: 12/18/2022]

Löffler P, Schmitz S, Hupfeld E, Sterner R, Merkl R. Rosetta:MSF: a modular framework for multi-state computational protein design. PLoS Comput Biol 2017;13:e1005600. [PMID: 28604768 PMCID: PMC5484525 DOI: 10.1371/journal.pcbi.1005600] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 03/01/2017] [Revised: 06/26/2017] [Accepted: 05/27/2017] [Indexed: 12/20/2022]

Abstract

Computational protein design (CPD) is a powerful technique to engineer existing proteins or to design novel ones that display desired properties. Rosetta is a software suite including algorithms for computational modeling and analysis of protein structures and offers many elaborate protocols created to solve highly specific tasks of protein engineering. Most of Rosetta’s protocols optimize sequences based on a single conformation (i.e. design state). However, challenging CPD objectives like multi-specificity design or the concurrent consideration of positive and negative design goals demand the simultaneous assessment of multiple states. This is why we have developed the multi-state framework MSF that facilitates the implementation of Rosetta’s single-state protocols in a multi-state environment and made available two frequently used protocols. Utilizing MSF, we demonstrated for one of these protocols that multi-state design yields a 15% higher performance than single-state design on a ligand-binding benchmark consisting of structural conformations. With this protocol, we designed nine retro-aldolases de novo on a conformational ensemble deduced from a (βα)₈-barrel protein. All variants displayed measurable catalytic activity, testifying to a high success rate for this concept of multi-state enzyme design.

Protein engineering, i.e. the targeted modification or design of proteins, has tremendous potential for medical and industrial applications. One generally applicable strategy for protein engineering is rational protein design: based on detailed knowledge of structure and function, computer programs like Rosetta propose the sequence of a protein possessing the desired properties. So far, most computer protocols have used rigid structures for design, which is a simplification because a protein’s structure is more accurately specified by a conformational ensemble. We have now implemented a framework for computational protein design that allows certain design protocols of Rosetta to make use of multiple design states like structural ensembles. An in silico assessment simulating ligand-binding design showed that this new approach generates native-like sequences more reliably than a single-state approach. As a proof-of-concept, we introduced de novo retro-aldolase activity into a scaffold protein and characterized nine variants experimentally, all of which were catalytically active.

Computationally optimized deimmunization libraries yield highly mutated enzymes with low immunogenicity and enhanced activity. Proc Natl Acad Sci U S A 2017;114:E5085-E5093. [PMID: 28607051 DOI: 10.1073/pnas.1621233114] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Indexed: 01/01/2023]

Abstract

Therapeutic proteins of wide-ranging function hold great promise for treating disease, but immune surveillance of these macromolecules can drive an antidrug immune response that compromises efficacy and even undermines safety. To eliminate widespread T-cell epitopes in any biotherapeutic and thereby mitigate this key source of detrimental immune recognition, we developed a Pareto optimal deimmunization library design algorithm that optimizes protein libraries to account for the simultaneous effects of combinations of mutations on both molecular function and epitope content. Active variants identified by high-throughput screening are thus inherently likely to be deimmunized. Functional screening of an optimized 10-site library (1,536 variants) of P99 β-lactamase (P99βL), a component of ADEPT cancer therapies, revealed that the population possessed high overall fitness, and comprehensive analysis of peptide-MHC II immunoreactivity showed the population possessed lower average immunogenic potential than the wild-type enzyme. Although similar functional screening of an optimized 30-site library (2.15 × 10⁹ variants) revealed reduced population-wide fitness, numerous individual variants were found to have activity and stability better than the wild type despite bearing 13 or more deimmunizing mutations per enzyme. The immunogenic potential of one highly active and stable 14-mutation variant was assessed further using ex vivo cellular immunoassays, and the variant was found to silence T-cell activation in seven of the eight blood donors who responded strongly to wild-type P99βL. In summary, our multiobjective library-design process readily identified large and mutually compatible sets of epitope-deleting mutations and produced highly active but aggressively deimmunized constructs in only one round of library screening.
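The Pareto criterion mentioned here can be illustrated on two objectives, predicted function (higher is better) and predicted epitope content (lower is better). The sketch below only extracts the non-dominated designs from a list; it is not the authors' library-design algorithm, which optimizes whole combinatorial libraries, and the Design fields and example numbers are hypothetical.

```python
from typing import NamedTuple

class Design(NamedTuple):
    name: str
    fitness: float   # hypothetical predicted function score, higher is better
    epitopes: float  # hypothetical predicted epitope content, lower is better

def dominates(a: Design, b: Design) -> bool:
    """True if a is at least as good as b on both objectives and strictly better on one."""
    no_worse = a.fitness >= b.fitness and a.epitopes <= b.epitopes
    better = a.fitness > b.fitness or a.epitopes < b.epitopes
    return no_worse and better

def pareto_front(designs):
    """Keep the designs that no other design dominates (O(n^2), fine for short lists)."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]

designs = [
    Design("A", 0.9, 5), Design("B", 0.7, 2),
    Design("C", 0.8, 6), Design("D", 0.6, 2),
]
front = pareto_front(designs)
print([d.name for d in front])  # ['A', 'B']
```

C is dominated by A (lower fitness and more epitopes) and D by B (same epitope content, lower fitness); A and B represent different trade-offs, so both stay on the front.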

Jain S, Jou JD, Georgiev IS, Donald BR. A critical analysis of computational protein design with sparse residue interaction graphs. PLoS Comput Biol 2017;13:e1005346. [PMID: 28358804 PMCID: PMC5391103 DOI: 10.1371/journal.pcbi.1005346] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/02/2015] [Revised: 04/13/2017] [Accepted: 01/03/2017] [Indexed: 11/19/2022]

Abstract

Protein design algorithms enumerate a combinatorial number of candidate structures to compute the Global Minimum Energy Conformation (GMEC). To efficiently find the GMEC, protein design algorithms must methodically reduce the conformational search space. By applying distance and energy cutoffs, the protein system to be designed can thus be represented using a sparse residue interaction graph, where the number of interacting residue pairs is less than all pairs of mutable residues, and the corresponding GMEC is called the sparse GMEC. However, ignoring some pairwise residue interactions can lead to a change in the energy, conformation, or sequence of the sparse GMEC vs. the original or the full GMEC. Despite the widespread use of sparse residue interaction graphs in protein design, the above-mentioned effects of their use have not been previously analyzed. To analyze the costs and benefits of designing with sparse residue interaction graphs, we computed the GMECs for 136 different protein design problems both with and without distance and energy cutoffs, and compared their energies, conformations, and sequences. Our analysis shows that the differences between the GMECs depend critically on whether or not the design includes core, boundary, or surface residues. Moreover, neglecting long-range interactions can alter local interactions and introduce large sequence differences, both of which can result in significant structural and functional changes. Designs on proteins with experimentally measured thermostability show it is beneficial to compute both the full and the sparse GMEC accurately and efficiently. To this end, we show that a provable, ensemble-based algorithm can efficiently compute both GMECs by enumerating a small number of conformations, usually fewer than 1000. This provides a novel way to combine sparse residue interaction graphs with provable, ensemble-based algorithms to reap the benefits of sparse residue interaction graphs while avoiding their potential inaccuracies.

Computational structure-based protein design algorithms have successfully redesigned proteins to fold and bind target substrates in vitro, and even in vivo. Because the complexity of a computational design increases dramatically with the number of mutable residues, many design algorithms employ cutoffs (distance or energy) to neglect some pairwise residue interactions, thereby reducing the effective search space and computational cost. However, the energies neglected by such cutoffs can add up, which may have nontrivial effects on the designed sequence and its function. To study the effects of using cutoffs on protein design, we computed the optimal sequence both with and without cutoffs, and showed that neglecting long-range interactions can significantly change the computed conformation and sequence. Designs on proteins with experimentally measured thermostability showed the benefits of computing the optimal sequences (and their conformations), both with and without cutoffs, efficiently and accurately. Therefore, we also showed that a provable, ensemble-based algorithm can efficiently compute the optimal conformation and sequence, both with and without applying cutoffs, by enumerating a small number of conformations, usually fewer than 1000. This provides a novel way to combine cutoffs with provable, ensemble-based algorithms to reap the computational efficiency of cutoffs while avoiding their potential inaccuracies.
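The effect described above can be reproduced on a toy problem: apply a distance cutoff to build a sparse residue interaction graph, then compare the GMEC found with and without the neglected pairs. The residue coordinates, the 8 Å cutoff, and all energies below are hypothetical and were chosen so that a single long-range term changes the answer; exhaustive enumeration stands in for a real search algorithm, and only a distance cutoff (no energy cutoff) is applied.

```python
import itertools
import math

# Hypothetical toy system: 3 residues with 2 rotamers each, placed on a line (Å).
COORDS = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
E1 = [[0.0, 0.1], [0.0, 0.2], [0.0, 0.3]]   # single-residue energies
E2 = {(0, 2): [[0.5, 0.5], [0.0, 0.0]]}     # one long-range pair term; all other pairs are 0

def sparse_edges(coords, cutoff):
    """Residue pairs kept in the sparse interaction graph (distance cutoff only)."""
    n = len(coords)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if math.dist(coords[i], coords[j]) <= cutoff}

def energy(conf, edges=None):
    """Energy of a conformation; if edges is given, pairs outside the graph are ignored."""
    e = sum(E1[i][r] for i, r in enumerate(conf))
    for (i, j), table in E2.items():
        if edges is None or (i, j) in edges:
            e += table[conf[i]][conf[j]]
    return e

def gmec(edges=None):
    """Exhaustive minimum over all rotamer combinations."""
    confs = itertools.product(*[range(len(row)) for row in E1])
    return min(confs, key=lambda c: energy(c, edges))

EDGES = sparse_edges(COORDS, cutoff=8.0)  # drops the 12 Å pair (0, 2)
full, sparse = gmec(), gmec(EDGES)
print(full, energy(full))      # (1, 0, 0) 0.1
print(sparse, energy(sparse))  # (0, 0, 0) 0.5, the sparse GMEC scored with the full energy
```

The sparse GMEC looks optimal (energy 0) under the truncated energy function, but once the neglected pair is added back it is clearly worse than the full GMEC, which is the kind of discrepancy the study quantifies on real designs.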

Multistate Computational Protein Design with Backbone Ensembles. Methods Mol Biol 2017;1529:161-179. [PMID: 27914050 DOI: 10.1007/978-1-4939-6637-0_7] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Indexed: 12/12/2022]

Druart K, Bigot J, Audit E, Simonson T. A Hybrid Monte Carlo Scheme for Multibackbone Protein Design. J Chem Theory Comput 2016;12:6035-6048. [DOI: 10.1021/acs.jctc.6b00421] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 11/30/2022]

Hallen MA, Jou JD, Donald BR. LUTE (Local Unpruned Tuple Expansion): Accurate Continuously Flexible Protein Design with General Energy Functions and Rigid Rotamer-Like Efficiency. J Comput Biol 2016;24:536-546. [PMID: 27681371 DOI: 10.1089/cmb.2016.0136] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 01/03/2023]

Hallen MA, Donald BR. comets (Constrained Optimization of Multistate Energies by Tree Search): A Provable and Efficient Protein Design Algorithm to Optimize Binding Affinity and Specificity with Respect to Sequence. J Comput Biol 2016;23:311-21. [PMID: 26761641 DOI: 10.1089/cmb.2015.0188] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Indexed: 11/12/2022]

Sevy AM, Jacobs TM, Crowe JE, Meiler J. Design of Protein Multi-specificity Using an Independent Sequence Search Reduces the Barrier to Low Energy Sequences. PLoS Comput Biol 2015;11:e1004300. [PMID: 26147100 PMCID: PMC4493036 DOI: 10.1371/journal.pcbi.1004300] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 01/10/2015] [Accepted: 04/27/2015] [Indexed: 11/18/2022]

Warszawski S, Netzer R, Tawfik DS, Fleishman SJ. A "fuzzy"-logic language for encoding multiple physical traits in biomolecules. J Mol Biol 2014;426:4125-4138. [PMID: 25311857 PMCID: PMC4270444 DOI: 10.1016/j.jmb.2014.10.002] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 07/30/2014] [Revised: 09/21/2014] [Accepted: 10/02/2014] [Indexed: 12/16/2022]

Gaillard T, Simonson T. Pairwise decomposition of an MMGBSA energy function for computational protein design. J Comput Chem 2014;35:1371-87. [PMID: 24854675 DOI: 10.1002/jcc.23637] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 01/16/2014] [Revised: 04/14/2014] [Accepted: 05/01/2014] [Indexed: 02/02/2023]

Rational design of a ligand-controlled protein conformational switch. Proc Natl Acad Sci U S A 2013;110:6800-4. [PMID: 23569285 DOI: 10.1073/pnas.1218319110] [Citation(s) in RCA: 92] [Impact Index Per Article: 8.4] [Indexed: 01/27/2023]

Multistate Protein Design Using CLEVER and CLASSY. Methods Enzymol 2013. [DOI: 10.1016/b978-0-12-394292-0.00008-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7]

Computational Methods for Controlling Binding Specificity. Methods Enzymol 2013;523:41-59. [DOI: 10.1016/b978-0-12-394292-0.00003-5] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 12/11/2022]

Hallen MA, Keedy DA, Donald BR. Dead-end elimination with perturbations (DEEPer): a provable protein design algorithm with continuous sidechain and backbone flexibility. Proteins 2012;81:18-39. [PMID: 22821798 DOI: 10.1002/prot.24150] [Citation(s) in RCA: 63] [Impact Index Per Article: 5.3] [Received: 05/07/2012] [Revised: 07/01/2012] [Accepted: 07/11/2012] [Indexed: 11/12/2022]

Abstract

Computational protein and drug design generally require accurate modeling of protein conformations. This modeling typically starts with an experimentally determined protein structure and considers possible conformational changes due to mutations or new ligands. The DEE/A* algorithm provably finds the global minimum-energy conformation (GMEC) of a protein assuming that the backbone does not move and the sidechains take on conformations from a set of discrete, experimentally observed conformations called rotamers. DEE/A* can efficiently find the overall GMEC for exponentially many mutant sequences. Previous improvements to DEE/A* include modeling ensembles of sidechain conformations and either continuous sidechain or backbone flexibility. We present a new algorithm, DEEPer (Dead-End Elimination with Perturbations), that combines these advantages and can also handle much more extensive backbone flexibility and backbone ensembles. DEEPer provably finds the GMEC or, if desired by the user, all conformations and sequences within a specified energy window of the GMEC. It includes the new abilities to handle arbitrarily large backbone perturbations and to generate ensembles of backbone conformations. It also incorporates the shear, an experimentally observed local backbone motion never before used in design. Additionally, we derive a new method to accelerate DEE/A*-based calculations, indirect pruning, that is particularly useful for DEEPer. In 67 benchmark tests on 64 proteins, DEEPer consistently identified lower-energy conformations than previous methods did, indicating more accurate modeling. Additional tests demonstrated its ability to incorporate larger, experimentally observed backbone conformational changes and to model realistic conformational ensembles. These capabilities provide significant advantages for modeling protein mutations and protein-ligand interactions.
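DEE/A* as summarized here first prunes rotamers that provably cannot belong to the GMEC and then searches what remains. The sketch below implements the Goldstein dead-end elimination criterion for a rigid backbone with discrete rotamers and pairwise energies. It is a single-state illustration, not DEEPer (no perturbations, continuous flexibility, or indirect pruning); the random energies are hypothetical, and exhaustive enumeration stands in for the A* step.

```python
import itertools
import random

def pair_energy(E2, i, j, r, s):
    """Pairwise energy of rotamer r at residue i with rotamer s at residue j (stored once per i < j)."""
    return E2[(i, j)][r][s] if i < j else E2[(j, i)][s][r]

def total_energy(conf, E1, E2):
    """Energy of a full conformation; conf[i] is the rotamer index at residue i."""
    n = len(conf)
    singles = sum(E1[i][conf[i]] for i in range(n))
    pairs = sum(E2[(i, j)][conf[i]][conf[j]] for i in range(n) for j in range(i + 1, n))
    return singles + pairs

def goldstein_dee(E1, E2):
    """Repeatedly prune rotamer r at residue i if some surviving t at i satisfies
    E(i_r) - E(i_t) + sum_j min_s [E(i_r, j_s) - E(i_t, j_s)] > 0."""
    n = len(E1)
    alive = [set(range(len(row))) for row in E1]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                for t in list(alive[i]):
                    if t == r:
                        continue
                    gap = E1[i][r] - E1[i][t] + sum(
                        min(pair_energy(E2, i, j, r, s) - pair_energy(E2, i, j, t, s)
                            for s in alive[j])
                        for j in range(n) if j != i)
                    if gap > 0:  # swapping r for t always lowers the energy, so r is not in the GMEC
                        alive[i].discard(r)
                        changed = True
                        break
    return alive

def brute_force_gmec(E1, E2, rotamer_sets):
    """Exhaustive search over the surviving rotamers (stands in for A*)."""
    confs = itertools.product(*[sorted(s) for s in rotamer_sets])
    best = min(confs, key=lambda c: total_energy(c, E1, E2))
    return best, total_energy(best, E1, E2)

def random_problem(n_res, n_rot, seed):
    """Hypothetical random single and pairwise energies."""
    rng = random.Random(seed)
    E1 = [[rng.uniform(-2, 2) for _ in range(n_rot)] for _ in range(n_res)]
    E2 = {(i, j): [[rng.uniform(-1, 1) for _ in range(n_rot)] for _ in range(n_rot)]
          for i in range(n_res) for j in range(i + 1, n_res)}
    return E1, E2

E1, E2 = random_problem(n_res=5, n_rot=4, seed=7)
alive = goldstein_dee(E1, E2)
full = brute_force_gmec(E1, E2, [range(4)] * 5)
pruned = brute_force_gmec(E1, E2, alive)
print(sum(len(s) for s in alive), "of 20 rotamers survive;", full[0], pruned[0])
```

Because a rotamer is only removed when a surviving competitor beats it in every context, the GMEC over the pruned sets equals the GMEC over the full sets; the pruning only shrinks what the later search has to enumerate.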

Davey JA, Chica RA. Multistate approaches in computational protein design. Protein Sci 2012;21:1241-52. [PMID: 22811394 DOI: 10.1002/pro.2128] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.8] [Received: 05/03/2012] [Revised: 07/04/2012] [Accepted: 07/12/2012] [Indexed: 11/10/2022]

Designing electrostatic interactions in biological systems via charge optimization or combinatorial approaches: insights and challenges with a continuum electrostatic framework. Theor Chem Acc 2012. [DOI: 10.1007/s00214-012-1252-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/13/2022]

Samish I, MacDermaid CM, Perez-Aguilar JM, Saven JG. Theoretical and Computational Protein Design. Annu Rev Phys Chem 2011;62:129-49. [DOI: 10.1146/annurev-physchem-032210-103509] [Citation(s) in RCA: 119] [Impact Index Per Article: 9.2] [Indexed: 11/09/2022]

Erijman A, Aizner Y, Shifman JM. Multispecific Recognition: Mechanism, Evolution, and Design. Biochemistry 2011;50:602-11. [DOI: 10.1021/bi101563v] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Indexed: 11/29/2022]

Fromer M, Yanover C, Harel A, Shachar O, Weiss Y, Linial M. SPRINT: side-chain prediction inference toolbox for multistate protein design. Bioinformatics 2010;26:2466-7. [DOI: 10.1093/bioinformatics/btq445] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/12/2022]

Havranek JJ. Specificity in computational protein design. J Biol Chem 2010;285:31095-9. [PMID: 20670934 PMCID: PMC2951182 DOI: 10.1074/jbc.r110.157685] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 01/25/2023]

Fromer M, Yanover C, Linial M. Design of multispecific protein sequences using probabilistic graphical modeling. Proteins 2010;78:530-47. [PMID: 19842166 DOI: 10.1002/prot.22575] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 11/10/2022]

Fromer M, Yanover C. Accurate prediction for atomic-level protein design and its application in diversifying the near-optimal sequence space. Proteins 2009;75:682-705. [PMID: 19003998 DOI: 10.1002/prot.22280] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Indexed: 11/11/2022]

Abstract

The task of engineering a protein to assume a target three-dimensional structure is known as protein design. Computational search algorithms are devised to predict a minimal energy amino acid sequence for a particular structure. In practice, however, an ensemble of low-energy sequences is often sought. Primarily, this is performed because an individual predicted low-energy sequence may not necessarily fold to the target structure because of both inaccuracies in modeling protein energetics and the nonoptimal nature of search algorithms employed. Additionally, some low-energy sequences may be overly stable and thus lack the dynamic flexibility required for biological functionality. Furthermore, the investigation of low-energy sequence ensembles will provide crucial insights into the pseudo-physical energy force fields that have been derived to describe structural energetics for protein design. Significantly, numerous studies have predicted low-energy sequences, which were subsequently synthesized and demonstrated to fold to desired structures. However, the characterization of the sequence space defined by such energy functions as compatible with a target structure has not been performed in full detail. This issue is critical for protein design scientists to successfully continue using these force fields at an ever-increasing pace and scale. In this paper, we present a conceptually novel algorithm that rapidly predicts the set of lowest energy sequences for a given structure. Based on the theory of probabilistic graphical models, it performs efficient inspection and partitioning of the near-optimal sequence space, without making any assumptions of positional independence. We benchmark its performance on a diverse set of relevant protein design examples and show that it consistently yields sequences of lower energy than those derived from state-of-the-art techniques. Thus, we find that previously presented search techniques do not depict the low-energy space as precisely. Examination of the predicted ensembles indicates that, for each structure, the amino acid identity at a majority of positions must be chosen extremely selectively so as to not incur significant energetic penalties. We investigate this high degree of similarity and demonstrate how more diverse near-optimal sequences can be predicted in order to systematically overcome this bottleneck for computational design. Finally, we exploit this in-depth analysis of a collection of the lowest energy sequences to suggest an explanation for previously observed experimental design results. The novel methodologies introduced here accurately portray the sequence space compatible with a protein structure and further supply a scheme to yield heterogeneous low-energy sequences, thus providing a powerful instrument for future work on protein design.
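To make the notion of a set of lowest-energy sequences concrete, the sketch below enumerates every sequence of a tiny hypothetical pairwise model and keeps the k lowest-energy ones. It is a brute-force reference, not the probabilistic graphical-model algorithm of this paper, and the alphabet, per-position energies H and coupling J are invented; the coupling term shows why positions cannot be chosen independently.

```python
import heapq
import itertools

ALPHABET = "AVL"  # hypothetical three-letter alphabet

# Hypothetical per-position energies H[i][aa] (lower is better).
H = [
    {"A": 0.0, "V": -1.0, "L": -0.5},
    {"A": 0.0, "V": -0.2, "L": -1.0},
    {"A": -0.3, "V": 0.4, "L": 0.4},
]
# Hypothetical pairwise term: V at position 0 clashes with L at position 1.
J = {(0, 1): {("V", "L"): 2.0}}

def seq_energy(seq, h, couplings):
    """Sum of per-position terms plus pairwise terms (unlisted residue pairs count as 0)."""
    e = sum(h[i][aa] for i, aa in enumerate(seq))
    for (i, j), table in couplings.items():
        e += table.get((seq[i], seq[j]), 0.0)
    return e

def k_lowest(n_pos, h, couplings, k):
    """Score all len(ALPHABET)**n_pos sequences and return the k best as (energy, sequence)."""
    seqs = ("".join(p) for p in itertools.product(ALPHABET, repeat=n_pos))
    return heapq.nsmallest(k, ((seq_energy(s, h, couplings), s) for s in seqs))

print(k_lowest(3, H, J, k=2))
```

Choosing the best residue at each position independently would give VLA, whose coupled energy is about -0.3, whereas the true optimum is LLA at about -1.8, followed by VVA at about -1.5. Exhaustive enumeration grows as 20 to the power of the number of positions for real designs, which is why dedicated near-optimal search methods are needed.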

Georgiev I, Keedy D, Richardson JS, Richardson DC, Donald BR. Algorithm for backrub motions in protein design. Bioinformatics 2008;24:i196-204. [PMID: 18586714 PMCID: PMC2718647 DOI: 10.1093/bioinformatics/btn169] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.8] [Indexed: 11/17/2022]