1
Doga H, Raubenolt B, Cumbo F, Joshi J, DiFilippo FP, Qin J, Blankenberg D, Shehab O. A Perspective on Protein Structure Prediction Using Quantum Computers. J Chem Theory Comput 2024; 20:3359-3378. [PMID: 38703105 PMCID: PMC11099973 DOI: 10.1021/acs.jctc.4c00067] [Received: 01/22/2024] [Revised: 04/19/2024] [Accepted: 04/22/2024] [Indexed: 05/06/2024]
Abstract
Despite recent advances in deep learning methods such as AlphaFold2, in silico protein structure prediction remains a challenging problem in biomedical research. With the rapid evolution of quantum computing, it is natural to ask whether quantum computers can offer meaningful benefits for this problem. Yet identifying specific problem instances amenable to quantum advantage, and estimating the quantum resources they require, are equally challenging tasks. Here, we share our perspective on how to create a framework for systematically selecting protein structure prediction problems that are amenable to quantum advantage, and on how to estimate the quantum resources for such problems on a utility-scale quantum computer. As a proof of concept, we validate our problem selection framework by accurately predicting the structure of a catalytic loop of the Zika virus NS3 helicase on quantum hardware.
Affiliation(s)
- Hakan Doga
- IBM Quantum, Almaden Research Center, San Jose, California 95120, United States
- Bryan Raubenolt
- Center for Computational Life Sciences, Lerner Research Institute, The Cleveland Clinic, Cleveland, Ohio 44106, United States
- Fabio Cumbo
- Center for Computational Life Sciences, Lerner Research Institute, The Cleveland Clinic, Cleveland, Ohio 44106, United States
- Jayadev Joshi
- Center for Computational Life Sciences, Lerner Research Institute, The Cleveland Clinic, Cleveland, Ohio 44106, United States
- Frank P. DiFilippo
- Center for Computational Life Sciences, Lerner Research Institute, The Cleveland Clinic, Cleveland, Ohio 44106, United States
- Jun Qin
- Center for Computational Life Sciences, Lerner Research Institute, The Cleveland Clinic, Cleveland, Ohio 44106, United States
- Daniel Blankenberg
- Center for Computational Life Sciences, Lerner Research Institute, The Cleveland Clinic, Cleveland, Ohio 44106, United States
- Omar Shehab
- IBM Quantum, IBM Thomas J Watson Research Center, Yorktown Heights, New York 10598, United States
2
Saikia B, Baruah A. In silico design of misfolding resistant proteins: the role of structural similarity of a competing conformational ensemble in the optimization of frustration. SOFT MATTER 2024; 20:3283-3298. [PMID: 38529658 DOI: 10.1039/d4sm00171k] [Indexed: 03/27/2024]
Abstract
Most state-of-the-art in silico design methods fail because the designed sequences misfold into a conformation other than the target. A method for designing misfolding-resistant proteins would therefore both improve understanding of the misfolding phenomenon and increase the success rate of in silico design. In this work, we optimize the choice of the conformational ensemble used for negative design according to its similarity to the target. We create five ensembles with different degrees of similarity to the target. During sequence design with mean field theory and Monte Carlo simulation, each ensemble is destabilized while the target is stabilized. The results suggest that the degree of similarity of the non-native conformations to the target plays a prominent role in designing misfolding-resistant protein sequences. Design procedures that destabilize the conformational ensemble with moderate similarity to the target proved more promising. Including either highly similar or highly dissimilar conformations in the non-native ensemble to be destabilized may yield sequences with a higher misfolding propensity. This finding can significantly reduce the conformational space that any protein design procedure must consider. Interestingly, the results suggest that a sequence with higher frustration in the target structure is not necessarily misfolding-prone. A successful design method may deliberately choose a sequence that is frustrated in the target conformation if that sequence is even more frustrated in the competing non-native conformations.
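The abstract does not say how the similarity between a non-native conformation and the target is measured. The following minimal sketch assumes lattice conformations and uses the fraction Q of the target's non-bonded contacts that a decoy also forms; selecting a "moderately similar" ensemble could then look like this. The helper names and the band 0.3 ≤ Q ≤ 0.7 are hypothetical choices for illustration, not values from the paper.

```python
from itertools import combinations


def lattice_contacts(coords):
    """Non-bonded nearest-neighbor contacts (i, j), j > i + 1, on a square or cubic lattice."""
    found = set()
    for i, j in combinations(range(len(coords)), 2):
        if j - i > 1:
            # Manhattan distance 1 means the two residues sit on adjacent lattice sites.
            if sum(abs(a - b) for a, b in zip(coords[i], coords[j])) == 1:
                found.add((i, j))
    return found


def contact_overlap(target, decoy):
    """Fraction Q of the target's contacts that are also formed in the decoy."""
    target_contacts = lattice_contacts(target)
    if not target_contacts:
        return 0.0
    return len(target_contacts & lattice_contacts(decoy)) / len(target_contacts)


def select_moderate(target, decoys, q_low=0.3, q_high=0.7):
    """Keep decoys whose similarity to the target lies in a hypothetical 'moderate' band."""
    return [d for d in decoys if q_low <= contact_overlap(target, d) <= q_high]


# Toy 2D conformations (invented for illustration).
target = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]
half = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (1, 2)]
straight = [(i, 0) for i in range(6)]
print([contact_overlap(target, d) for d in (target, straight, half)])  # [1.0, 0.0, 0.5]
```

Here `target` forms contacts (0, 5) and (1, 4). `half` keeps only (1, 4), so it is the only decoy that falls inside the moderate band.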
Affiliation(s)
- Bondeepa Saikia
- Department of Chemistry, Dibrugarh University, Dibrugarh 786004, India
- Anupaul Baruah
- Department of Chemistry, Dibrugarh University, Dibrugarh 786004, India
3
Ray S, Tillo D, Assad N, Ufot A, Porollo A, Durell SR, Vinson C. Altering the Double-Stranded DNA Specificity of the bZIP Domain of Zta with Site-Directed Mutagenesis at N182. ACS OMEGA 2022; 7:129-139. [PMID: 35036684 PMCID: PMC8756438 DOI: 10.1021/acsomega.1c04148] [Received: 08/03/2021] [Accepted: 11/23/2021] [Indexed: 06/14/2023]
Abstract
Zta, the Epstein-Barr virus bZIP transcription factor (TF), binds both unmethylated and methylated double-stranded DNA (dsDNA) in a sequence-specific manner. We studied the contribution of a conserved asparagine (N182) to sequence-specific binding to four types of dsDNA:
(i) dsDNA with cytosine in both strands (DNA(C|C));
(ii, iii) dsDNA with 5-methylcytosine (5mC, M) or 5-hydroxymethylcytosine (5hmC, H) in one strand and cytosine in the other (DNA(5mC|C) and DNA(5hmC|C));
(iv) dsDNA with methylated cytosine in both strands at all CG dinucleotides (DNA(5mCG)).
We replaced the asparagine with five similarly sized amino acids (glutamine (Q), serine (S), threonine (T), isoleucine (I), or valine (V)) and used protein binding microarrays to evaluate sequence-specific dsDNA binding. Zta preferentially binds the pseudo-palindromic TRE (AP1) motif (T₋₄G₋₃A₋₂(G/C)₀T₂C₃A₄). Zta(N182Q) changes binding to A₃ in only one half-site. Zta(N182S) changes binding to G₃ in one or both halves of the motif. Zta(N182S) and Zta(N182Q) have 34- and 17-fold weaker median dsDNA binding, respectively. Zta(N182V) and Zta(N182I) have increased binding to dsDNA(5mC|C). Molecular dynamics simulations rationalize some of these results by identifying hydrogen bonds between glutamine and A₃. They do not, however, reveal why serine preferentially binds G₃, which suggests that entropic interactions may mediate this new binding specificity.
Affiliation(s)
- Sreejana Ray
- Laboratory of Metabolism, National Cancer Institute, National Institutes of Health, Room 5000, Building 37, Bethesda, Maryland 20892, United States
- Desiree Tillo
- Laboratory of Metabolism, National Cancer Institute, National Institutes of Health, Room 5000, Building 37, Bethesda, Maryland 20892, United States
- Cancer Genetics Branch, National Cancer Institute, National Institutes of Health, Building 37, Bethesda, Maryland 20892, United States
- Nima Assad
- Laboratory of Metabolism, National Cancer Institute, National Institutes of Health, Room 5000, Building 37, Bethesda, Maryland 20892, United States
- Aniekanabasi Ufot
- Laboratory of Metabolism, National Cancer Institute, National Institutes of Health, Room 5000, Building 37, Bethesda, Maryland 20892, United States
- Aleksey Porollo
- Center for Autoimmune Genomics and Etiology, Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio 45229, United States
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio 45267, United States
- Stewart R. Durell
- Laboratory of Cell Biology, National Cancer Institute, National Institutes of Health, Building 37, Bethesda, Maryland 20892, United States
- Charles Vinson
- Laboratory of Metabolism, National Cancer Institute, National Institutes of Health, Room 5000, Building 37, Bethesda, Maryland 20892, United States
4
Saikia B, Gogoi CR, Rahman A, Baruah A. Identification of an optimal foldability criterion to design misfolding resistant protein. J Chem Phys 2021; 155:144102. [PMID: 34654294 DOI: 10.1063/5.0057533] [Indexed: 12/28/2022]
Abstract
Proteins reach their functional, active three-dimensional native structures by avoiding entrapment in the non-native energy minima present in the energy landscape. The large and intricate set of interactions that drives protein folding also determines protein stability. Because so many stabilizing and destabilizing interactions compete, proteins are only marginally stable relative to competing structures. They can therefore become trapped in non-native conformations and misfold. These misfolded proteins lead to several debilitating diseases. This work compares existing foldability criteria for the computational design of misfolding-resistant protein sequences based on self-consistent mean field theory. The criteria compared are Ef, Δ, and Φ, all commonly used in protein design. The aim is to determine the most efficient criterion for designing misfolding-resistant proteins. The results suggest that Δ is significantly better at designing a funneled energy landscape that stabilizes the target state. They also suggest that including negative design features is important for designing misfolding-resistant proteins. However, incorporating more information about the non-native conformations through Φ leads to worse results than even simple positive design. Sequences designed using Δ show better resistance to misfolding in the Monte Carlo simulations performed in the study.
Affiliation(s)
- Bondeepa Saikia
- Department of Chemistry, Dibrugarh University, Dibrugarh 786004, India
- Chimi Rekha Gogoi
- Department of Chemistry, Dibrugarh University, Dibrugarh 786004, India
- Aziza Rahman
- Department of Chemistry, Dibrugarh University, Dibrugarh 786004, India
- Anupaul Baruah
- Department of Chemistry, Dibrugarh University, Dibrugarh 786004, India
5
Takahashi T, Chikenji G, Tokita K. Lattice protein design using Bayesian learning. Phys Rev E 2021; 104:014404. [PMID: 34412286 DOI: 10.1103/physreve.104.014404] [Received: 03/17/2020] [Accepted: 06/11/2021] [Indexed: 01/01/2023]
Abstract
Protein design is the inverse of three-dimensional (3D) structure prediction; both aim to elucidate the relationship between 3D structures and amino acid sequences. In general, computational protein design involves a double loop. An outer loop changes the amino acid sequence, and an inner loop performs an exhaustive conformational search for each sequence. Herein, we propose a novel statistical mechanical design method using Bayesian learning, which can design lattice proteins without the exhaustive conformational search. We consider a thermodynamic hypothesis of protein evolution and apply it to the prior distribution of amino acid sequences. Furthermore, we take the effect of water into account in the grand canonical picture. When applied to the 2D lattice hydrophobic-polar (HP) model, our design method successfully finds an amino acid sequence for which the target conformation is the unique ground state. However, performance was worse for 3D lattice HP models than for 2D models. The 3D performance improves when 20-letter lattice proteins are used. Furthermore, we find a strong linear relationship between the chemical potential of water and the number of surface residues, which reveals a relationship between protein structure and the effect of water molecules. The advantage of our method is that it greatly reduces computation time. It does not require the long partition-function calculations that an exhaustive conformational search entails. Because our method uses a general form of Bayesian learning and statistical mechanics and is not limited to lattice proteins, the results presented here elucidate some heuristics used successfully in previous protein design methods.
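The 2D HP model mentioned above scores a conformation on a square lattice by its non-bonded hydrophobic contacts. The sketch below is not the authors' Bayesian method. It illustrates two things:

- the standard HP energy (−ε per non-bonded H-H contact, with ε = 1 here);
- the brute-force "exhaustive conformational search" that their method avoids, used here to check whether a target is the unique ground state.

The function names are hypothetical. The enumeration is practical only for short chains, because the number of self-avoiding walks grows exponentially with chain length.

```python
from itertools import combinations

# Unit steps on the 2D square lattice.
STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def hp_energy(sequence, coords, epsilon=1.0):
    """HP energy: -epsilon for every non-bonded H-H pair on adjacent lattice sites."""
    energy = 0.0
    for i, j in combinations(range(len(sequence)), 2):
        if j - i > 1 and sequence[i] == "H" and sequence[j] == "H":
            (xi, yi), (xj, yj) = coords[i], coords[j]
            if abs(xi - xj) + abs(yi - yj) == 1:
                energy -= epsilon
    return energy


def enumerate_walks(n):
    """All self-avoiding walks with n sites, each counted once up to rotation/reflection.

    Symmetry is removed by fixing the first step to +x and the first turn to +y.
    """
    if n == 1:
        return [[(0, 0)]]
    walks = []

    def extend(path, turned):
        if len(path) == n:
            walks.append(list(path))
            return
        x, y = path[-1]
        for dx, dy in STEPS:
            # Before the first turn, only "straight" (+x) or "turn up" (+y) is allowed.
            if not turned and (dx, dy) not in ((1, 0), (0, 1)):
                continue
            site = (x + dx, y + dy)
            if site in path:  # self-avoidance
                continue
            path.append(site)
            extend(path, turned or (dx, dy) != (1, 0))
            path.pop()

    extend([(0, 0), (1, 0)], False)
    return walks


def ground_states(sequence):
    """Exhaustive search: minimum energy and every conformation that attains it."""
    walks = enumerate_walks(len(sequence))
    energies = [hp_energy(sequence, w) for w in walks]
    e_min = min(energies)
    return e_min, [w for w, e in zip(walks, energies) if e == e_min]


energy, states = ground_states("HPPH")
print(energy, len(states) == 1)  # -1.0 True
```

For "HPPH", only the U-shaped conformation brings the two H residues into contact, so it is the unique ground state.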
Affiliation(s)
- Tomoei Takahashi
- Graduate School of Informatics, Nagoya University, Nagoya 464-8601, Japan
- George Chikenji
- Graduate School of Engineering, Nagoya University, Nagoya 464-8603, Japan
- Kei Tokita
- Graduate School of Informatics, Nagoya University, Nagoya 464-8601, Japan
6
Roy P, Sengupta N. Hydration of a small protein under carbon nanotube confinement: Adsorbed substates induce selective separation of the dynamical response. J Chem Phys 2021; 154:204702. [PMID: 34241160 DOI: 10.1063/5.0047078] [Indexed: 11/15/2022]
Abstract
The co-involvement of biological molecules and nanomaterials has increasingly come to the fore in modern applications. The "bio-nano" (BN) interface has physicochemical characteristics manifestly different from those observed under isotropic bulk conditions. Yet the underlying molecular reasons remain poorly understood, especially for anomalies in interfacial hydration. In this paper, we use atomistic simulations to study the differential adsorption of a small protein on the inner (concave) surface of a single-walled carbon nanotube. The nanotube's diameter exceeds the dimensions conducive to single-file water movement. We find that the extent of adsorption is determined by the degree of foldedness of the protein's conformational substate. Importantly, partially folded substates, but not the natively folded one, reorganize the protein hydration layer into two layers. An inner water layer lies closer to the nanotube axis, and an outer water layer occupies the interstitial space near the nanotube walls. Further analyses reveal sharp dynamical differences between water molecules in the two layers. These differences appear as the onset of increased heterogeneity in rotational relaxation and as an enhanced deviation from Fickian behavior. The vibrational density of states shows that these dynamical distinctions correlate with differences in key bands of the power spectra. These results set the stage for further systematic studies of various BN interfaces with respect to the control of hydration properties.
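The abstract does not specify how the deviation from Fickian behavior was quantified. One common measure is the non-Gaussian parameter α₂(t) = 3⟨Δr⁴⟩ / (5⟨Δr²⟩²) − 1. It is zero for the Gaussian displacements of ordinary 3D diffusion, where the mean-square displacement (MSD) grows as 6Dt. This measure is shown here as an assumption, not as the authors' analysis. The function name and input layout are hypothetical.

```python
def displacement_stats(trajectories, lag):
    """Mean-square displacement and non-Gaussian parameter alpha_2 at a given lag.

    trajectories: one list of (x, y, z) positions per water molecule, sampled at
    equal time intervals (unwrapped coordinates are assumed).
    lag: time separation, in frames, between the two positions being compared.
    """
    squared = []
    for traj in trajectories:
        for t in range(len(traj) - lag):
            squared.append(sum((b - a) ** 2 for a, b in zip(traj[t], traj[t + lag])))
    if not squared:
        raise ValueError("lag is longer than every trajectory")
    msd = sum(squared) / len(squared)
    if msd == 0:
        raise ValueError("no displacement at this lag")
    mean_r4 = sum(r2 * r2 for r2 in squared) / len(squared)
    # alpha_2 = 3<dr^4> / (5<dr^2>^2) - 1 is zero for 3D Gaussian (Fickian) displacements.
    alpha2 = 3.0 * mean_r4 / (5.0 * msd ** 2) - 1.0
    return msd, alpha2


msd, alpha2 = displacement_stats([[(0, 0, 0), (1, 0, 0), (2, 0, 0)]], lag=1)
```

A molecule that hops exactly one unit per frame has identical displacements, so α₂ = 3/5 − 1 = −0.4. This distribution is narrower than a Gaussian. A departure of α₂ from zero in either direction signals non-Fickian motion at that lag.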
Affiliation(s)
- Priti Roy
- Department of Biological Sciences, Indian Institute of Science Education and Research Kolkata, Mohanpur, West Bengal 741246, India
- Neelanjana Sengupta
- Department of Biological Sciences, Indian Institute of Science Education and Research Kolkata, Mohanpur, West Bengal 741246, India
7
Barozet A, Bianciotto M, Vaisset M, Siméon T, Minoux H, Cortés J. Protein loops with multiple meta-stable conformations: A challenge for sampling and scoring methods. Proteins 2020; 89:218-231. [PMID: 32920900 DOI: 10.1002/prot.26008] [Received: 10/29/2019] [Revised: 08/10/2020] [Accepted: 08/25/2020] [Indexed: 12/25/2022]
Abstract
Flexible regions in proteins, such as loops, cannot be represented by a single conformation; conformational ensembles are needed to provide a more global picture. In this context, identifying statistically meaningful conformations within an ensemble generated by loop sampling techniques remains an open problem. The difficulty stems primarily from the lack of structural data about these flexible regions. Most structural data come from X-ray crystallography and largely ignore plasticity, which makes designing and evaluating loop scoring methods challenging. In this work, we compare the performance of various scoring methods on a set of eight protein loops that are known to be flexible. We assess the ability of each method to identify and select all of the known conformations. We also produce and project the underlying energy landscapes to visualize the qualitative differences between the methods. Statistical potentials prove considerably reliable even though they are designed to trade accuracy for lower computational cost. On a large pool of loop models, they can filter out statistically improbable states while retaining those that resemble known (and thus likely) conformations. However, computationally expensive methods are still required for more precise assessment and structural refinement. The results also highlight the importance of using several scaffolds for the protein. Small structural rearrangements in the rest of the protein strongly influence the modeled energy landscape of the loop.
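The abstract does not identify the specific statistical potentials that were compared. As a general illustration only, knowledge-based potentials are commonly derived by inverse Boltzmann: E_k = −kT ln(P_obs,k / P_ref,k) for each bin k of some structural feature, such as residue-pair distances. A model is then scored by summing the energies of the bins it occupies. The function names, the pseudocount, and kT ≈ 0.593 kcal/mol (near 298 K) are assumptions of this sketch.

```python
import math


def inverse_boltzmann(observed, reference, kT=0.593, pseudocount=1.0):
    """Per-bin energies E_k = -kT * ln(P_obs,k / P_ref,k).

    observed / reference: histogram counts (e.g. residue-pair distances in known
    structures vs. a reference state). A pseudocount avoids log(0) for empty bins.
    """
    if len(observed) != len(reference):
        raise ValueError("histograms must have the same number of bins")
    n_bins = len(observed)
    total_obs = sum(observed) + pseudocount * n_bins
    total_ref = sum(reference) + pseudocount * n_bins
    energies = []
    for o, r in zip(observed, reference):
        p_obs = (o + pseudocount) / total_obs
        p_ref = (r + pseudocount) / total_ref
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies


def score_model(bin_indices, energies):
    """Score one loop model by summing the energies of the bins its features fall into."""
    return sum(energies[k] for k in bin_indices)
```

A bin that is over-represented in known structures, relative to the reference state, receives a negative (favorable) energy. Lower scores therefore flag models that resemble observed conformations, which is the filtering behavior described above.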
Affiliation(s)
- Amélie Barozet
- LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
- Sanofi Recherche & Développement, Integrated Drug Discovery, Molecular Design Sciences, Vitry-sur-Seine, France
- Marc Bianciotto
- Sanofi Recherche & Développement, Integrated Drug Discovery, Molecular Design Sciences, Vitry-sur-Seine, France
- Marc Vaisset
- LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
- Thierry Siméon
- LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
- Hervé Minoux
- Sanofi Recherche & Développement, Integrated Drug Discovery, Molecular Design Sciences, Vitry-sur-Seine, France
- Juan Cortés
- LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
8
Hayes RL, Vilseck JZ, Brooks CL. Approaching protein design with multisite λ dynamics: Accurate and scalable mutational folding free energies in T4 lysozyme. Protein Sci 2018; 27:1910-1922. [PMID: 30175503 DOI: 10.1002/pro.3500] [Received: 05/17/2018] [Revised: 08/06/2018] [Accepted: 08/15/2018] [Indexed: 12/14/2022]
Abstract
The estimation of changes in free energy upon mutation is central to the problem of protein design. Modern protein design methods have had remarkable success over a wide range of design targets. However, they are reaching their limits in ligand binding and enzyme design because mutational free energies are not accurate enough. Alchemical free energy calculations could supplement modern design methods with more accurate, molecular dynamics-based predictions of free energy changes, but they suffer from high computational cost. Multisite λ dynamics (MSλD) is a particularly efficient and scalable free energy method. It has the potential to explore combinatorially large sequence spaces that other free energy methods cannot reach. This work aims to quantify the accuracy of MSλD and demonstrate its scalability. We apply MSλD to the classic problem of calculating folding free energies in T4 lysozyme, a system with a wealth of experimental measurements. Single-site calculations covering 32 mutations agree remarkably well with experiment, with a Pearson correlation of 0.914 and a mean unsigned error of 1.19 kcal/mol. Multisite calculations with up to five concurrent mutations, spanning 240 different sequences, show comparable agreement with experiment. These results demonstrate the promise of MSλD for exploring large sequence spaces in protein design.
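The two agreement statistics quoted above (Pearson correlation and mean unsigned error) can be computed as follows. This is a generic sketch with hypothetical function names. The toy numbers in the usage line are invented for illustration and are not data from the study. Python 3.10+ also provides `statistics.correlation` for Pearson's r.

```python
import math


def pearson_r(predicted, measured):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mp = sum(predicted) / n
    mm = sum(measured) / n
    sxy = sum((p - mp) * (m - mm) for p, m in zip(predicted, measured))
    sxx = sum((p - mp) ** 2 for p in predicted)
    syy = sum((m - mm) ** 2 for m in measured)
    return sxy / math.sqrt(sxx * syy)


def mean_unsigned_error(predicted, measured):
    """Mean absolute difference, in the units of the inputs (e.g. kcal/mol)."""
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(predicted)


toy_pred, toy_expt = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(pearson_r(toy_pred, toy_expt), mean_unsigned_error(toy_pred, toy_expt))  # 1.0 2.0
```

The toy example shows why both numbers are reported. The predictions are perfectly correlated with the measurements (r = 1) yet still off by 2 units on average.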
Affiliation(s)
- Ryan L Hayes
- Department of Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Jonah Z Vilseck
- Department of Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Charles L Brooks
- Department of Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Biophysics Program, University of Michigan, Ann Arbor, Michigan, 48109
9
Chen J, Schafer NP, Wolynes PG, Clementi C. Localizing Frustration in Proteins Using All-Atom Energy Functions. J Phys Chem B 2019; 123:4497-4504. [PMID: 31063375 DOI: 10.1021/acs.jpcb.9b01545] [Indexed: 12/22/2022]
Abstract
The problems of protein folding and protein design are two sides of the same coin. Protein folding involves exploring a protein's configuration space given a fixed sequence, whereas protein design involves searching in sequence space given a particular target structure. For a protein to fold quickly and reliably, its energy landscape must be biased toward the folded ensemble throughout its configuration space and must lack deep kinetic traps that would otherwise frustrate folding. Evolution has "designed" the sequences of many naturally occurring proteins, through an eons-long process of random mutation and selection, to yield landscapes with a minimal degree of frustration. The task facing humans hoping to design protein sequences that fold into particular structures is to use the available approximate energy functions to sculpt funneled landscapes that work in the laboratory. In this work, we demonstrate how to calculate several localized frustration measures using an all-atom energy function. Specifically, we employ the Rosetta energy function, which has been used successfully to design proteins and which has a natural pairwise decomposition that is suitably solvent-averaged. We calculate these newly developed frustration measures for both a mutated WW domain, FiP35, and a three-helix bundle that was designed completely by humans, Alpha3D. The structure of FiP35 exhibits less localized frustration than that of Alpha3D. A mutation toward the consensus sequence for WW domains in FiP35, which has been shown unexpectedly in experiment to disrupt folding, induces localized frustration by disrupting the hydrophobic core. By performing a limited redesign on the sequence of Alpha3D, we show that some, but not all, mutations that lower the energy also result in decreased frustration. 
The results suggest that, in addition to being useful for detecting residual frustration in protein structures, optimizing the localized frustration measures presented here may be a useful and automatic means of balancing positive and negative design in protein design tasks.
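The abstract does not give the formulas for its frustration measures. The general idea of localized frustration can be sketched as a Z-score. It compares the energy of a native contact with the energies of decoys for the same contact, for example decoys with the residue identities randomized. In this sketch the decoy energies are plain numbers standing in for all-atom (e.g. Rosetta) pair energies. The function names and the classification thresholds 0.78 and −1 are illustrative assumptions, not values from this paper.

```python
import statistics


def frustration_index(native_energy, decoy_energies):
    """Z-score of a native contact energy against decoy energies for the same contact.

    Convention used here: lower energy = more favorable, so a positive index means
    the native interaction is more favorable than typical decoys.
    """
    spread = statistics.pstdev(decoy_energies)
    if spread == 0:
        raise ValueError("decoy energies have zero spread")
    return (statistics.fmean(decoy_energies) - native_energy) / spread


def classify(index, minimal=0.78, high=-1.0):
    """Label a contact; the default thresholds are illustrative assumptions."""
    if index >= minimal:
        return "minimally frustrated"
    if index <= high:
        return "highly frustrated"
    return "neutral"


print(classify(frustration_index(-2.0, [-1.0, 0.0, 1.0])))  # minimally frustrated
```

Because each contact is scored separately, mapping the labels back onto the structure localizes frustration. An example is a mutation that disrupts the hydrophobic core, as described for FiP35.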
10
Abstract
During the last two decades, the pharmaceutical industry has progressed from identifying small molecules to designing biologic-based therapeutics. Amino acid-based drugs are a group of biologic-based therapeutics that can effectively combat diseases caused by drug resistance or molecular deficiency. Computational techniques play a key role in designing and developing amino acid-based therapeutics such as proteins, peptides, and peptidomimetics. This review discusses the various elements of the computational design of amino acid-based therapeutics. Protein design seeks to identify the properties of amino acid sequences that fold into predetermined structures with desirable structural and functional characteristics. Peptide drugs occupy a middle ground between proteins and small molecules, and it is hoped that they can target "undruggable" intracellular protein-protein interactions. Peptidomimetics, compounds that mimic the biological characteristics of peptides, offer improved pharmacokinetic properties compared with the original peptides. Here, we discuss the techniques developed to characterize amino acid sequences consistent with a specific structure, which make protein design possible. We also highlight the key principles and recent advances in computational techniques for rational peptide design. Finally, we summarize the most advanced computational techniques for designing novel peptidomimetics.
Affiliation(s)
- Tayebeh Farhadi
- Chronic Respiratory Diseases Research Center (CRDRC), National Research Institute of Tuberculosis and Lung Diseases (NRITLD), Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Seyed MohammadReza Hashemian
- Chronic Respiratory Diseases Research Center (CRDRC), National Research Institute of Tuberculosis and Lung Diseases (NRITLD), Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Clinical Tuberculosis and Epidemiology Research Center, National Research Institute of Tuberculosis and Lung Disease, Shahid Beheshti University of Medical Sciences, Tehran, Iran
11
Jain S, Jou JD, Georgiev IS, Donald BR. A critical analysis of computational protein design with sparse residue interaction graphs. PLoS Comput Biol 2017; 13:e1005346. [PMID: 28358804 PMCID: PMC5391103 DOI: 10.1371/journal.pcbi.1005346] [Received: 12/02/2015] [Revised: 04/13/2017] [Accepted: 01/03/2017] [Indexed: 11/19/2022]
Abstract
Protein design algorithms enumerate a combinatorial number of candidate structures to compute the Global Minimum Energy Conformation (GMEC). To find the GMEC efficiently, protein design algorithms must methodically reduce the conformational search space. By applying distance and energy cutoffs, the protein system to be designed can be represented by a sparse residue interaction graph. In such a graph, only a subset of all pairs of mutable residues interacts, and the corresponding GMEC is called the sparse GMEC. However, ignoring some pairwise residue interactions can change the energy, conformation, or sequence of the sparse GMEC relative to the original, full GMEC. Sparse residue interaction graphs are widely used in protein design, yet these effects had not previously been analyzed. To analyze the costs and benefits of designing with sparse residue interaction graphs, we computed the GMECs for 136 different protein design problems both with and without distance and energy cutoffs. We then compared their energies, conformations, and sequences. Our analysis shows that the differences between the GMECs depend critically on whether the design includes core, boundary, or surface residues. Moreover, neglecting long-range interactions can alter local interactions and introduce large sequence differences, both of which can result in significant structural and functional changes. Designs on proteins with experimentally measured thermostability show that it is beneficial to compute both the full and the sparse GMEC accurately and efficiently. To this end, we show that a provable, ensemble-based algorithm can efficiently compute both GMECs by enumerating a small number of conformations, usually fewer than 1000. This provides a novel way to combine sparse residue interaction graphs with provable, ensemble-based algorithms, reaping the benefits of sparse residue interaction graphs while avoiding their potential inaccuracies.
Computational structure-based protein design algorithms have successfully redesigned proteins to fold and bind target substrates in vitro, and even in vivo. The complexity of a computational design increases dramatically with the number of mutable residues. Many design algorithms therefore use cutoffs (distance or energy) to neglect some pairwise residue interactions, which reduces the effective search space and computational cost. However, the energies neglected by such cutoffs can add up, with nontrivial effects on the designed sequence and its function. To study the effects of cutoffs on protein design, we computed the optimal sequence both with and without cutoffs. We showed that neglecting long-range interactions can significantly change the computed conformation and sequence. Designs on proteins with experimentally measured thermostability showed the benefits of computing the optimal sequences (and their conformations), both with and without cutoffs, efficiently and accurately. We therefore also showed that a provable, ensemble-based algorithm can efficiently compute the optimal conformation and sequence, both with and without cutoffs, by enumerating a small number of conformations, usually fewer than 1000. This provides a novel way to combine cutoffs with provable, ensemble-based algorithms, reaping the computational efficiency of cutoffs while avoiding their potential inaccuracies.
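This minimal sketch shows how distance and energy cutoffs turn the full set of residue pairs into a sparse residue interaction graph, and how the neglected pairwise terms shift the total energy. Its assumptions are:

- one fixed conformation with one energy value per residue pair (real design evaluates pairs of rotamers);
- hypothetical cutoff values, function names, and toy numbers.

This is not the provable, ensemble-based algorithm described above.

```python
import math
from itertools import combinations


def sparse_edges(positions, pair_energy, distance_cutoff=8.0, energy_cutoff=0.1):
    """Residue pairs kept in a sparse residue interaction graph.

    A pair (i, j) is kept only if the residues lie within distance_cutoff
    (same units as positions) AND |pair_energy[i][j]| >= energy_cutoff.
    """
    edges = set()
    for i, j in combinations(range(len(positions)), 2):
        close = math.dist(positions[i], positions[j]) <= distance_cutoff
        strong = abs(pair_energy[i][j]) >= energy_cutoff
        if close and strong:
            edges.add((i, j))
    return edges


def pairwise_energy(pair_energy, edges):
    """Sum of the pairwise terms that the energy model includes."""
    return sum(pair_energy[i][j] for i, j in edges)


# Toy system: residue 2 is far from residues 0 and 1 (values invented for illustration).
positions = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
pair_e = [[0.0, -1.0, -0.05], [-1.0, 0.0, -0.2], [-0.05, -0.2, 0.0]]
kept = sparse_edges(positions, pair_e)
full = pairwise_energy(pair_e, {(0, 1), (0, 2), (1, 2)})
print(kept, round(full - pairwise_energy(pair_e, kept), 2))  # {(0, 1)} -0.25
```

Each neglected term here is small, but together they shift the total by −0.25. This illustrates in miniature how energies dropped by cutoffs can add up.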
Affiliation(s)
- Swati Jain
- Computational Biology and Bioinformatics Program, Duke University, Durham, North Carolina, United States of America
- Department of Computer Science, Duke University, Durham, North Carolina, United States of America
- Department of Biochemistry, Duke University Medical Center, Durham, North Carolina, United States of America
- Jonathan D. Jou
- Department of Computer Science, Duke University, Durham, North Carolina, United States of America
- Ivelin S. Georgiev
- Department of Computer Science, Duke University, Durham, North Carolina, United States of America
- Bruce R. Donald
- Department of Computer Science, Duke University, Durham, North Carolina, United States of America
- Department of Biochemistry, Duke University Medical Center, Durham, North Carolina, United States of America
- Department of Chemistry, Duke University, Durham, North Carolina, United States of America
12
Porebski BT, Keleher S, Hollins JJ, Nickson AA, Marijanovic EM, Borg NA, Costa MGS, Pearce MA, Dai W, Zhu L, Irving JA, Hoke DE, Kass I, Whisstock JC, Bottomley SP, Webb GI, McGowan S, Buckle AM. Smoothing a rugged protein folding landscape by sequence-based redesign. Sci Rep 2016; 6:33958. [PMID: 27667094 PMCID: PMC5036219 DOI: 10.1038/srep33958] [Received: 06/22/2016] [Accepted: 09/01/2016] [Indexed: 11/09/2022]
Abstract
The rugged folding landscapes of functional proteins put them at risk of misfolding and aggregation. Serine protease inhibitors, or serpins, are paradigms of this delicate balance between function and misfolding. Serpins exist in a metastable state that undergoes a major conformational change in order to inhibit proteases. However, the conformational lability of the native serpin fold renders serpins susceptible to misfolding, which underlies misfolding diseases such as α1-antitrypsin deficiency. To investigate how serpins balance function and folding, we used consensus design to create conserpin, a synthetic serpin that folds reversibly and is functional, thermostable, and polymerization resistant. Characterization of its structure, folding, and dynamics suggests that consensus design has remodeled the folding landscape to reconcile competing requirements for stability and function. This approach may offer general benefits for engineering functional proteins with risky folding landscapes. These benefits include removing aggregation-prone intermediates and modifying scaffolds for use as protein therapeutics.
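Consensus design, as named above, starts from a multiple sequence alignment of a protein family and takes the most common residue at each position. The abstract does not describe how conserpin's consensus was built, so the following is only a minimal majority-rule sketch. The occupancy threshold for skipping mostly-gap columns, the alphabetical tie-break, and the function name are assumptions.

```python
from collections import Counter


def consensus_sequence(alignment, gap="-", min_occupancy=0.5):
    """Most frequent residue in each alignment column.

    Columns in which fewer than min_occupancy of the sequences carry a residue
    (i.e. mostly gaps) are skipped. Ties are broken alphabetically, an arbitrary
    choice made for this sketch.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != gap]
        if not residues or len(residues) / len(alignment) < min_occupancy:
            continue
        counts = Counter(residues)
        top = max(counts.values())
        consensus.append(min(res for res, c in counts.items() if c == top))
    return "".join(consensus)


print(consensus_sequence(["MK-VL", "MR-VL", "MKAIL", "-K-V-"]))  # MKVL
```

In the toy alignment, the third column is mostly gaps and is dropped. Every other position takes its majority residue, so rare variants at any single position do not enter the designed sequence.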
Affiliation(s)
- Benjamin T Porebski
- Biomedicine Discovery Institute, Department of Biochemistry and Molecular Biology, Monash University, Clayton, Victoria 3800, Australia; Medical Research Council Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, CB2 0QH, United Kingdom
- Shani Keleher
- Biomedicine Discovery Institute, Department of Biochemistry and Molecular Biology, Monash University, Clayton, Victoria 3800, Australia
- Jeffrey J Hollins
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, United Kingdom
- Adrian A Nickson
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, United Kingdom
- Emilia M Marijanovic
- Biomedicine Discovery Institute, Department of Biochemistry and Molecular Biology, Monash University, Clayton, Victoria 3800, Australia
- Natalie A Borg
- Biomedicine Discovery Institute, Department of Biochemistry and Molecular Biology, Monash University, Clayton, Victoria 3800, Australia
- Mauricio G S Costa
- Programa de Computação Científica, Fundação Oswaldo Cruz, 21949900 Rio de Janeiro, Brazil
- Mary A Pearce
- Biomedicine Discovery Institute, Department of Biochemistry and Molecular Biology, Monash University, Clayton, Victoria 3800, Australia
- Weiwen Dai
- Biomedicine Discovery Institute, Department of Biochemistry and Molecular Biology, Monash University, Clayton, Victoria 3800, Australia
- Liguang Zhu
- Faculty of Information Technology, Monash University, Clayton, Victoria 3800, Australia
- James A Irving
- Wolfson Institute for Biomedical Research, University College London, Gower Street, London, WC1E 6BT, United Kingdom
- David E Hoke
- Biomedicine Discovery Institute, Department of Biochemistry and Molecular Biology, Monash University, Clayton, Victoria 3800, Australia
- Itamar Kass
- Biomedicine Discovery Institute, Department of Biochemistry and Molecular Biology, Monash University, Clayton, Victoria 3800, Australia
- James C Whisstock
- Biomedicine Discovery Institute, Department of Biochemistry and Molecular Biology, Monash University, Clayton, Victoria 3800, Australia; ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Clayton, Victoria 3800, Australia
- Stephen P Bottomley
- Biomedicine Discovery Institute, Department of Biochemistry and Molecular Biology, Monash University, Clayton, Victoria 3800, Australia
- Geoffrey I Webb
- Faculty of Information Technology, Monash University, Clayton, Victoria 3800, Australia
- Sheena McGowan
- Biomedicine Discovery Institute, Department of Biochemistry and Molecular Biology, Monash University, Clayton, Victoria 3800, Australia; Biomedicine Discovery Institute, Department of Microbiology, Monash University, Clayton, Victoria 3800, Australia
- Ashley M Buckle
- Biomedicine Discovery Institute, Department of Biochemistry and Molecular Biology, Monash University, Clayton, Victoria 3800, Australia
13
Porebski BT, Nickson AA, Hoke DE, Hunter MR, Zhu L, McGowan S, Webb GI, Buckle AM. Structural and dynamic properties that govern the stability of an engineered fibronectin type III domain. Protein Eng Des Sel 2015; 28:67-78. [PMID: 25691761 PMCID: PMC4330816 DOI: 10.1093/protein/gzv002] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Indexed: 11/13/2022]
Abstract
Consensus protein design is a rapid and reliable technique for the improvement of protein stability, which relies on the use of homologous protein sequences. To enhance the stability of a fibronectin type III (FN3) domain, consensus design was employed using an alignment of 2123 sequences. The resulting FN3 domain, FN3con, has unprecedented stability, with a melting temperature >100°C, a ΔG(D−N) of 15.5 kcal/mol and a greatly reduced unfolding rate compared with wild-type. To determine the underlying molecular basis for stability, an X-ray crystal structure of FN3con was determined to 2.0 Å and compared with other FN3 domains of varying stabilities. The structure of FN3con reveals significantly increased salt bridge interactions that are cooperatively networked, and a highly optimized hydrophobic core. Molecular dynamics simulations of FN3con and comparison structures show the cooperative power of electrostatic and hydrophobic networks in improving FN3con stability. Taken together, our data reveal that FN3con stability does not result from a single mechanism, but rather the combination of several features and the removal of non-conserved, unfavorable interactions. The large number of sequences employed in this study has most likely enhanced the robustness of the consensus design, which is now possible due to the increased sequence availability in the post-genomic era. These studies increase our knowledge of the molecular mechanisms that govern stability and demonstrate the rising potential for enhancing stability via the consensus method.
Affiliation(s)
- Benjamin T Porebski
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, School of Biomedical Sciences, Monash University, Clayton, VIC 3800, Australia
- Adrian A Nickson
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
- David E Hoke
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, School of Biomedical Sciences, Monash University, Clayton, VIC 3800, Australia
- Morag R Hunter
- Centre for Brain Research and Department of Pharmacology and Clinical Pharmacology, Faculty of Medical and Health Sciences, University of Auckland, Auckland, New Zealand
- Liguang Zhu
- Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia
- Sheena McGowan
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, School of Biomedical Sciences, Monash University, Clayton, VIC 3800, Australia
- Geoffrey I Webb
- Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia
- Ashley M Buckle
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, School of Biomedical Sciences, Monash University, Clayton, VIC 3800, Australia
14
Abstract
Biomolecules are the prime information processing elements of living matter. Most of these inanimate systems are polymers that compute their own structures and dynamics using as input seemingly random character strings of their sequence, following which they coalesce and perform integrated cellular functions. In large computational systems with finite interaction codes, the appearance of conflicting goals is inevitable. Simple conflicting forces can lead to quite complex structures and behaviors, leading to the concept of frustration in condensed matter. We present here some basic ideas about frustration in biomolecules and how the frustration concept leads to a better appreciation of many aspects of the architecture of biomolecules, especially how biomolecular structure connects to function by means of localized frustration. These ideas are both seductively simple and perilously subtle to grasp completely. The energy landscape theory of protein folding provides a framework for quantifying frustration in large systems and has been implemented at many levels of description. We first review the notion of frustration from the areas of abstract logic and its uses in simple condensed matter systems. We then discuss how the frustration concept applies specifically to heteropolymers, testing folding landscape theory in computer simulations of protein models and in experimentally accessible systems. Studying the aspects of frustration averaged over many proteins provides ways to infer energy functions useful for reliable structure prediction. We discuss how frustration affects folding mechanisms. We review here how the biological functions of proteins are related to subtle local physical frustration effects and how frustration influences the appearance of metastable states, the nature of binding processes, catalysis and allosteric transitions.
In this review, we also emphasize that frustration, far from being always a bad thing, is an essential feature of biomolecules that allows dynamics to be harnessed for function. In this way, we hope to illustrate how frustration is a fundamental concept in molecular biology.
15
Schafer NP, Kim BL, Zheng W, Wolynes PG. Learning To Fold Proteins Using Energy Landscape Theory. Isr J Chem 2014; 54:1311-1337. [PMID: 25308991 PMCID: PMC4189132 DOI: 10.1002/ijch.201300145] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.1] [Indexed: 11/10/2022]
Abstract
This review is a tutorial for scientists interested in the problem of protein structure prediction, particularly those interested in using coarse-grained molecular dynamics models that are optimized using lessons learned from the energy landscape theory of protein folding. We also present a review of the results of the AMH/AMC/AMW/AWSEM family of coarse-grained molecular dynamics protein folding models to illustrate the points covered in the first part of the article. Accurate coarse-grained structure prediction models can be used to investigate a wide range of conceptual and mechanistic issues outside of protein structure prediction; specifically, the paper concludes by reviewing how AWSEM has in recent years been able to elucidate questions related to the unusual kinetic behavior of artificially designed proteins, multidomain protein misfolding, and the initial stages of protein aggregation.
Affiliation(s)
- N P Schafer
- Department of Physics, Rice University, Houston, TX 77005, USA; Center for Theoretical Biological Physics, Rice University, Houston, TX 77005, USA
- B L Kim
- Department of Chemistry, Rice University, Houston, TX 77005, USA; Center for Theoretical Biological Physics, Rice University, Houston, TX 77005, USA
- W Zheng
- Department of Chemistry, Rice University, Houston, TX 77005, USA; Center for Theoretical Biological Physics, Rice University, Houston, TX 77005, USA
- P G Wolynes
- Department of Physics, Rice University, Houston, TX 77005, USA; Department of Chemistry, Rice University, Houston, TX 77005, USA; Center for Theoretical Biological Physics, Rice University, Houston, TX 77005, USA
16
Yadahalli S, Hemanth Giri Rao VV, Gosavi S. Modeling Non-Native Interactions in Designed Proteins. Isr J Chem 2014. [DOI: 10.1002/ijch.201400035] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 11/10/2022]
17
Truong HH, Kim BL, Schafer NP, Wolynes PG. Funneling and frustration in the energy landscapes of some designed and simplified proteins. J Chem Phys 2013; 139:121908. [PMID: 24089720 PMCID: PMC3732306 DOI: 10.1063/1.4813504] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 04/15/2013] [Accepted: 06/26/2013] [Indexed: 11/15/2022]
Abstract
We explore the similarities and differences between the energy landscapes of proteins that have been selected by nature and those of some proteins designed by humans. Natural proteins have evolved to function as well as fold, and this is a source of energetic frustration. The sequence of Top7, on the other hand, was designed with architecture alone in mind using only native state stability as the optimization criterion. Its topology had not previously been observed in nature. Experimental studies show that the folding kinetics of Top7 is more complex than the kinetics of folding of otherwise comparable naturally occurring proteins. In this paper, we use structure prediction tools, frustration analysis, and free energy profiles to illustrate the folding landscapes of Top7 and two other proteins designed by Takada. We use both perfectly funneled (structure-based) and predictive (transferable) models to gain insight into the role of topological versus energetic frustration in these systems and show how they differ from those found for natural proteins. We also study how robust the folding of these designs would be to the simplification of the sequences using fewer amino acid types. Simplification using a five amino acid type code results in comparable quality of structure prediction to the full sequence in some cases, while the two-letter simplification scheme dramatically reduces the quality of structure prediction.
Affiliation(s)
- Ha H Truong
- Department of Chemistry, Rice University, Houston, Texas 77005, USA
18
Huntress MM, Gozem S, Malley KR, Jailaubekov AE, Vasileiou C, Vengris M, Geiger JH, Borhan B, Schapiro I, Larsen DS, Olivucci M. Toward an Understanding of the Retinal Chromophore in Rhodopsin Mimics. J Phys Chem B 2013; 117:10053-70. [DOI: 10.1021/jp305935t] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Indexed: 11/28/2022]
Affiliation(s)
- Mark M. Huntress
- Department of Chemistry, Bowling Green State University, Bowling Green, Ohio 43402, United States
- Samer Gozem
- Department of Chemistry, Bowling Green State University, Bowling Green, Ohio 43402, United States
- Konstantin R. Malley
- Department of Chemistry, University of California Davis, One Shields Avenue, Davis, California 95616, United States
- Askat E. Jailaubekov
- Department of Chemistry, University of California Davis, One Shields Avenue, Davis, California 95616, United States
- Chrysoula Vasileiou
- Department of Chemistry, Michigan State University, Lansing, Michigan 48824, United States
- Mikas Vengris
- Department of Chemistry, University of California Davis, One Shields Avenue, Davis, California 95616, United States
- Faculty of Physics, Vilnius University, Sauletekio 10, LT10223 Vilnius, Lithuania
- James H. Geiger
- Department of Chemistry, Michigan State University, Lansing, Michigan 48824, United States
- Babak Borhan
- Department of Chemistry, Michigan State University, Lansing, Michigan 48824, United States
- Igor Schapiro
- Department of Chemistry, Bowling Green State University, Bowling Green, Ohio 43402, United States
- Delmar S. Larsen
- Department of Chemistry, University of California Davis, One Shields Avenue, Davis, California 95616, United States
- Massimo Olivucci
- Department of Chemistry, Bowling Green State University, Bowling Green, Ohio 43402, United States
19
Minning J, Porto M, Bastolla U. Detecting selection for negative design in proteins through an improved model of the misfolded state. Proteins 2013; 81:1102-12. [PMID: 23280507 DOI: 10.1002/prot.24244] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 11/07/2012] [Accepted: 12/17/2012] [Indexed: 11/05/2022]
Abstract
Proteins that need to be structured in their native state must be stable both against the unfolded ensemble and against incorrectly folded (misfolded) conformations with low free energy. Positive design targets the first type of stability by strengthening native interactions. The second type of stability is achieved by destabilizing interactions that occur frequently in the misfolded ensemble, a strategy called negative design. Here, we investigate negative design adopting a statistical mechanical model of the misfolded ensemble, which improves the usual Gaussian approximation by taking into account the third moment of the energy distribution and contact correlations. Applying this model, we detect and quantify selection for negative design in most natural proteins, and we analytically design protein sequences that are stable both against unfolding and against misfolding.
Affiliation(s)
- Jonas Minning
- Institut für Festkörperphysik, Technische Universität Darmstadt, Darmstadt, Germany
20
Principles for designing ideal protein structures. Nature 2013; 491:222-7. [PMID: 23135467 DOI: 10.1038/nature11600] [Citation(s) in RCA: 408] [Impact Index Per Article: 37.1] [Received: 05/17/2012] [Accepted: 09/19/2012] [Indexed: 02/03/2023]
Abstract
Unlike random heteropolymers, natural proteins fold into unique ordered structures. Understanding how these are encoded in amino-acid sequences is complicated by energetically unfavourable non-ideal features (for example, kinked α-helices, bulged β-strands, strained loops and buried polar groups) that arise in proteins from evolutionary selection for biological function or from neutral drift. Here we describe an approach to designing ideal protein structures stabilized by completely consistent local and non-local interactions. The approach is based on a set of rules relating secondary structure patterns to protein tertiary motifs, which make possible the design of funnel-shaped protein folding energy landscapes leading into the target folded state. Guided by these rules, we designed sequences predicted to fold into ideal protein structures consisting of α-helices, β-strands and minimal loops. Designs for five different topologies were found to be monomeric and very stable and to adopt structures in solution nearly identical to the computational models. These results illuminate how the folding funnels of natural proteins arise and provide the foundation for engineering a new generation of functional proteins free from natural evolution.
21
Tiwari MK, Singh R, Singh RK, Kim IW, Lee JK. Computational approaches for rational design of proteins with novel functionalities. Comput Struct Biotechnol J 2012; 2:e201209002. [PMID: 24688643 PMCID: PMC3962203 DOI: 10.5936/csbj.201209002] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Received: 05/31/2012] [Revised: 08/17/2012] [Accepted: 08/23/2012] [Indexed: 11/22/2022]
Abstract
Proteins are the most multifaceted macromolecules in living systems and have various important functions, including structural, catalytic, sensory, and regulatory functions. Rational design of enzymes is a great challenge to our understanding of protein structure and physical chemistry and has numerous potential applications. Protein design algorithms have been applied to design or engineer proteins that fold, fold faster, catalyze, catalyze faster, signal, and adopt preferred conformational states. The field of de novo protein design, although only a few decades old, is beginning to produce exciting results. Developments in this field are already having a significant impact on biotechnology and chemical biology. The application of powerful computational methods for functional protein designing has recently succeeded at engineering target activities. Here, we review recently reported de novo functional proteins that were developed using various protein design approaches, including rational design, computational optimization, and selection from combinatorial libraries, highlighting recent advances and successes.
Affiliation(s)
- Manish Kumar Tiwari
- Department of Chemical Engineering, Konkuk University, 1 Hwayang-Dong, Gwangjin-Gu, Seoul 143-701, Korea; these authors contributed equally
- Ranjitha Singh
- Department of Chemical Engineering, Konkuk University, 1 Hwayang-Dong, Gwangjin-Gu, Seoul 143-701, Korea; these authors contributed equally
- Raushan Kumar Singh
- Department of Chemical Engineering, Konkuk University, 1 Hwayang-Dong, Gwangjin-Gu, Seoul 143-701, Korea
- In-Won Kim
- Department of Chemical Engineering, Konkuk University, 1 Hwayang-Dong, Gwangjin-Gu, Seoul 143-701, Korea
- Jung-Kul Lee
- Department of Chemical Engineering, Konkuk University, 1 Hwayang-Dong, Gwangjin-Gu, Seoul 143-701, Korea; Institute of SK-KU Biomaterials, Konkuk University, 1 Hwayang-Dong, Gwangjin-Gu, Seoul 143-701, Korea
22
Abstract
Surface charges of proteins have in several cases been found to function as "structural gatekeepers," which avoid unwanted interactions by negative design, for example, in the control of protein aggregation and binding. The question, then, is whether side-chain charges, due to their desolvation penalties, play a corresponding role in protein folding by avoiding competing misfolded traps. To find out, we removed all 32 side-chain charges from the 101-residue protein S6 from Thermus thermophilus. The results show that the charge-depleted S6 variant not only retains its native structure and cooperative folding transition, but also folds faster than the wild-type protein. In addition, charge removal unleashes pronounced aggregation on longer timescales. S6 thus provides an example where the bias toward native contacts of a naturally evolved protein sequence is independent of charges, and points to a fundamental difference in the codes for folding and intermolecular interaction: specificity in folding is governed primarily by hydrophobic packing and hydrogen bonding, whereas solubility and binding rely critically on the interplay of side-chain charges.
23
Samish I, MacDermaid CM, Perez-Aguilar JM, Saven JG. Theoretical and Computational Protein Design. Annu Rev Phys Chem 2011; 62:129-49. [DOI: 10.1146/annurev-physchem-032210-103509] [Citation(s) in RCA: 119] [Impact Index Per Article: 9.2] [Indexed: 11/09/2022]
Affiliation(s)
- Jeffery G. Saven
- Department of Chemistry, University of Pennsylvania, Philadelphia, Pennsylvania 19104;
24
The empirical valence bond model: theory and applications. Wiley Interdiscip Rev Comput Mol Sci 2011. [DOI: 10.1002/wcms.10] [Citation(s) in RCA: 113] [Impact Index Per Article: 8.7] [Indexed: 12/13/2022]
25
Fromer M, Yanover C, Linial M. Design of multispecific protein sequences using probabilistic graphical modeling. Proteins 2010; 78:530-47. [PMID: 19842166 DOI: 10.1002/prot.22575] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 11/10/2022]
Abstract
In nature, proteins partake in numerous protein-protein interactions that mediate their functions. Moreover, proteins have been shown to be physically stable in multiple structures, induced by cellular conditions, small ligands, or covalent modifications. Understanding how protein sequences achieve this structural promiscuity at the atomic level is a fundamental step in the drug design pipeline and a critical question in protein physics. One way to investigate this subject is to computationally predict protein sequences that are compatible with multiple states, i.e., multiple target structures or binding to distinct partners. The goal of engineering such proteins has been termed multispecific protein design. We develop a novel computational framework to efficiently and accurately perform multispecific protein design. This framework utilizes recent advances in probabilistic graphical modeling to predict sequences with low energies in multiple target states. Furthermore, it is also geared to specifically yield positional amino acid probability profiles compatible with these target states. Such profiles can be used as input to randomly bias high-throughput experimental sequence screening techniques, such as phage display, thus providing an alternative avenue for elucidating the multispecificity of natural proteins and the synthesis of novel proteins with specific functionalities. We prove the utility of such multispecific design techniques in better recovering amino acid sequence diversities similar to those resulting from millions of years of evolution. We then compare the approaches of prediction of low energy ensembles and of amino acid profiles and demonstrate their complementarity in providing more robust predictions for protein design.
Affiliation(s)
- Menachem Fromer
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel.
26
Kamerlin SCL, Warshel A. The EVB as a quantitative tool for formulating simulations and analyzing biological and chemical reactions. Faraday Discuss 2010; 145:71-106. [PMID: 25285029 PMCID: PMC4184467 DOI: 10.1039/b907354j] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.3] [Indexed: 11/21/2022]
Abstract
Recent years have seen dramatic improvements in computer power, allowing ever more challenging problems to be approached. In light of this, it is imperative to have a quantitative model for examining chemical reactivity, both in the condensed phase and in solution, as well as to accurately quantify physical organic chemistry (particularly as experimental approaches can often be inconclusive). Similarly, computational approaches allow for great progress in studying enzyme catalysis, as they allow for the separation of the relevant energy contributions to catalysis. Due to the complexity of the problems that need addressing, there is a need for an approach that can combine reliability with an ability to capture complex systems in order to resolve long-standing controversies in a unique way. Herein, we will demonstrate that the empirical valence bond (EVB) approach provides a powerful way to connect the classical concepts of physical organic chemistry to the actual energies of enzymatic reactions by means of computation. Additionally, we will discuss the proliferation of this approach, as well as attempts to capture its basic chemistry and repackage it under different names. We believe that the EVB approach is the most powerful tool that is currently available for studies of chemical processes in the condensed phase in general and enzymes in particular, particularly when trying to explore the different proposals about the origin of the catalytic power of enzymes.
Affiliation(s)
- Shina C. L. Kamerlin
- Department of Chemistry SGM418, University of Southern California, 3620 McClintock Ave., Los Angeles, CA-90089, USA
- Arieh Warshel
- Department of Chemistry SGM418, University of Southern California, 3620 McClintock Ave., Los Angeles, CA-90089, USA
27
Fromer M, Yanover C. Accurate prediction for atomic-level protein design and its application in diversifying the near-optimal sequence space. Proteins 2009; 75:682-705. [PMID: 19003998 DOI: 10.1002/prot.22280] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Indexed: 11/11/2022]
Abstract
The task of engineering a protein to assume a target three-dimensional structure is known as protein design. Computational search algorithms are devised to predict a minimal energy amino acid sequence for a particular structure. In practice, however, an ensemble of low-energy sequences is often sought. Primarily, this is performed because an individual predicted low-energy sequence may not necessarily fold to the target structure because of both inaccuracies in modeling protein energetics and the nonoptimal nature of search algorithms employed. Additionally, some low-energy sequences may be overly stable and thus lack the dynamic flexibility required for biological functionality. Furthermore, the investigation of low-energy sequence ensembles will provide crucial insights into the pseudo-physical energy force fields that have been derived to describe structural energetics for protein design. Significantly, numerous studies have predicted low-energy sequences, which were subsequently synthesized and demonstrated to fold to desired structures. However, the characterization of the sequence space defined by such energy functions as compatible with a target structure has not been performed in full detail. This issue is critical for protein design scientists to successfully continue using these force fields at an ever-increasing pace and scale. In this paper, we present a conceptually novel algorithm that rapidly predicts the set of lowest energy sequences for a given structure. Based on the theory of probabilistic graphical models, it performs efficient inspection and partitioning of the near-optimal sequence space, without making any assumptions of positional independence. We benchmark its performance on a diverse set of relevant protein design examples and show that it consistently yields sequences of lower energy than those derived from state-of-the-art techniques. 
Thus, we find that previously presented search techniques do not fully depict the low-energy space as precisely. Examination of the predicted ensembles indicates that, for each structure, the amino acid identity at a majority of positions must be chosen extremely selectively so as to not incur significant energetic penalties. We investigate this high degree of similarity and demonstrate how more diverse near-optimal sequences can be predicted in order to systematically overcome this bottleneck for computational design. Finally, we exploit this in-depth analysis of a collection of the lowest energy sequences to suggest an explanation for previously observed experimental design results. The novel methodologies introduced here accurately portray the sequence space compatible with a protein structure and further supply a scheme to yield heterogeneous low-energy sequences, thus providing a powerful instrument for future work on protein design.
Affiliation(s)
- Menachem Fromer
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel.
28
Jumawid MT, Takahashi T, Yamazaki T, Ashigai H, Mihara H. Selection and structural analysis of de novo proteins from an alpha3beta3 genetic library. Protein Sci 2009; 18:384-98. [PMID: 19173222 DOI: 10.1002/pro.41] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/11/2022]
Abstract
The construction of novel functional proteins has been a key area of protein engineering. However, there are few reports of functional proteins constructed from artificial scaffolds. Here, we have constructed a genetic library encoding alpha3beta3 de novo proteins to generate smaller novel scaffolds using a binary combination of simplified hydrophobic and hydrophilic amino acid sets. To screen for folded de novo proteins, we used a GFP-based screening system and successfully obtained proteins from colonies emitting very bright fluorescence, with an intensity similar to that of GFP. Proteins isolated from the very bright colonies (vTAJ) and bright colonies (wTAJ) were analyzed by circular dichroism (CD), 8-anilino-1-naphthalenesulfonate (ANS) binding assay, and analytical size-exclusion chromatography (SEC). CD studies revealed that vTAJ and wTAJ proteins had both alpha-helix and beta-sheet structures and were thermally stable. Moreover, the selected proteins showed a variety of association states, existing as monomers, dimers, and oligomers. The SEC and ANS binding assays revealed that vTAJ proteins tend to have the characteristics of folded proteins rather than of a molten globule. A vTAJ protein, vTAJ13, which has a packed globular structure and exists as a monomer, was further analyzed by nuclear magnetic resonance. NOE connectivities between backbone signals of vTAJ13 suggested that the protein contains three alpha-helices and three beta-strands as intended by its design. Thus, it would appear that artificially generated alpha3beta3 de novo proteins isolated from very bright colonies using the GFP fusion system exhibit excellent properties similar to folded proteins and would be available as artificial scaffolds to generate functional proteins with catalytic and ligand binding properties.
Affiliation(s)
- Mariejoy Therese Jumawid
- Department of Bioengineering, Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, Nagatsuta, Yokohama, Japan
29
Suárez M, Jaramillo A. Challenges in the computational design of proteins. J R Soc Interface 2009; 6 Suppl 4:S477-91. [PMID: 19324680 PMCID: PMC2843960 DOI: 10.1098/rsif.2008.0508.focus] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.8] [Received: 11/30/2008] [Accepted: 02/04/2009] [Indexed: 11/12/2022]
Abstract
Protein design has many applications not only in biotechnology but also in basic science. It uses our current knowledge of structural biology to predict, by computer simulation, an amino acid sequence that would produce a protein with targeted properties. As in other examples of synthetic biology, this approach allows many hypotheses in biology to be tested. The recent development of automated computational methods for protein design has enabled the design of proteins that are very different from any known ones. Moreover, some of these methods rely mostly on a physical description of atomic interactions, so that the designed sequences are not biased towards known proteins. In this paper, we describe the use of energy functions in computational protein design, the use of atomic models to evaluate the free energy in the unfolded and folded states, the exploration and optimization of amino acid sequences, the problem of negative design and the design of biomolecular function. We also consider its use together with experimental techniques such as directed evolution. We end by discussing the challenges ahead in computational protein design and some of its future applications.
Affiliation(s)
- María Suárez
- Laboratoire de Biochimie, Ecole Polytechnique, CNRS, 91128 Palaiseau Cedex, France
- Epigenomics Project, Genopole, Université d'Evry Val d'Essonne-Genopole-CNRS, Tour Evry2, Etage 10, Terrasses de l'Agora, 91034 Evry Cedex, France
- Alfonso Jaramillo
- Laboratoire de Biochimie, Ecole Polytechnique, CNRS, 91128 Palaiseau Cedex, France
- Epigenomics Project, Genopole, Université d'Evry Val d'Essonne-Genopole-CNRS, Tour Evry2, Etage 10, Terrasses de l'Agora, 91034 Evry Cedex, France
30
Sciretti D, Bruscolini P, Pelizzola A, Pretti M, Jaramillo A. Computational protein design with side-chain conformational entropy. Proteins 2009; 74:176-91. [PMID: 18618711 DOI: 10.1002/prot.22145] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 11/06/2022]
Abstract
Recent advances in modeling protein structures at the atomic level have made it possible to tackle "de novo" computational protein design. Most procedures are based on combinatorial optimization using a scoring function that estimates the folding free energy of a protein sequence on a given main-chain structure. However, computing the conformational entropy of the folded state is generally an intractable problem, and its contribution to the free energy is not properly evaluated. In this article, we propose a new automated protein design methodology that incorporates such conformational entropy based on statistical mechanics principles. We define the free energy of a protein sequence by the corresponding partition function over rotamer states. The free energy is written in variational form in a pairwise approximation and minimized using the Belief Propagation algorithm. In this way, a free energy is associated with each amino acid sequence: we use this insight to rescore the results obtained with a standard minimization method, with the energy as the cost function. Then, we set up a design method that directly uses the free energy as a cost function in combination with a stochastic search in sequence space. We validate the methods on the design of three superficial sites of a small SH3 domain, and then apply them to the complete redesign of 27 proteins. Our results indicate that accounting for the entropic contribution in the score function affects the outcome in a highly nontrivial way, and might improve current computational design techniques based on protein stability.
Affiliation(s)
- Daniele Sciretti
- Departamento de Física Teórica, Universidad de Zaragoza, c. Pedro Cerbuna 12, Zaragoza 50009, Spain
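The abstract above scores a sequence by the free energy of its rotamer ensemble rather than by its single lowest energy. As a minimal sketch of why that can change a design outcome, the Python below computes F = -kT ln Z by exact enumeration over short, hypothetical lists of conformation energies. This is not the paper's method: the authors approximate F with Belief Propagation in a pairwise approximation, which is what makes realistic proteins tractable. The energies and the kT of about 0.593 kcal/mol (roughly 298 K) are illustrative assumptions.

```python
import math


def free_energy(energies, kT=0.593):
    """Return F = -kT * ln(sum_c exp(-E_c / kT)) over an enumerated ensemble.

    The minimum energy is factored out first (log-sum-exp) so that large
    energy gaps do not underflow exp().
    """
    e_min = min(energies)
    z_shifted = sum(math.exp(-(e - e_min) / kT) for e in energies)
    return e_min - kT * math.log(z_shifted)


# Hypothetical rotamer-ensemble energies (kcal/mol) for two sequences on the
# same backbone: A has the single best conformation, B has many good ones.
seq_a = [-2.0, 5.0, 5.0, 5.0]
seq_b = [-1.8, -1.8, -1.8, -1.8]

print("ranked by minimum energy:", "A" if min(seq_a) < min(seq_b) else "B")
print("ranked by free energy:   ",
      "A" if free_energy(seq_a) < free_energy(seq_b) else "B")
```

With these toy numbers the two criteria disagree: A wins on minimum energy, while B, with four equally good conformations, has the lower free energy (about -2.62 versus about -2.00 kcal/mol).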
31
Crystal structure of an extensively simplified variant of bovine pancreatic trypsin inhibitor in which over one-third of the residues are alanines. Proc Natl Acad Sci U S A 2008; 105:15334-9. [PMID: 18829434 DOI: 10.1073/pnas.0802699105] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Indexed: 11/18/2022]
Abstract
We report the high-resolution crystal structures of an extensively simplified variant of bovine pancreatic trypsin inhibitor containing 20 alanines (BPTI-20st) and a reference single-disulfide-bonded variant (BPTI-[5,55]st) at 1.39 and 1.09 Å resolution, respectively. The sequence was simplified based on the results of an alanine scanning experiment, as reported previously. The effects of the multiple alanine substitutions on the overall backbone structure were surprisingly small (Cα atom RMSD of 0.53 Å) and were limited to small local structural perturbations. Both BPTI variants retained a wild-type level of trypsin inhibitory activity. The side-chain configurations of residues buried in the hydrophobic cores (<30% accessible surface area) were almost perfectly retained in both BPTI-20st and BPTI-[5,55]st, indicating that neither the multiple alanine replacements nor the removal of the disulfide bonds affected their precise placement. However, the side chains of three partially buried residues (Q31, R20, and to some extent Y21) and several unburied residues rearranged into alternative dense-packing structures, suggesting some plasticity in their shape complementarity. These results indicate that a protein sequence simplified over its entire length can retain its densely packed, native side-chain structure, and suggest that both the design and fold recognition of natively folded proteins may be easier than previously thought.
32
Georgiev I, Lilien RH, Donald BR. The minimized dead-end elimination criterion and its application to protein redesign in a hybrid scoring and search algorithm for computing partition functions over molecular ensembles. J Comput Chem 2008; 29:1527-42. [PMID: 18293294 DOI: 10.1002/jcc.20909] [Citation(s) in RCA: 88] [Impact Index Per Article: 5.5] [Indexed: 11/06/2022]
Abstract
One of the main challenges for protein redesign is the efficient evaluation of a combinatorial number of candidate structures. The modeling of protein flexibility, typically by using a rotamer library of commonly-observed low-energy side-chain conformations, further increases the complexity of the redesign problem. A dominant algorithm for protein redesign is dead-end elimination (DEE), which prunes the majority of candidate conformations by eliminating rigid rotamers that provably are not part of the global minimum energy conformation (GMEC). The identified GMEC consists of rigid rotamers (i.e., rotamers that have not been energy-minimized) and is thus referred to as the rigid-GMEC. As a postprocessing step, the conformations that survive DEE may be energy-minimized. When energy minimization is performed after pruning with DEE, the combined protein design process becomes heuristic, and is no longer provably accurate: a conformation that is pruned using rigid-rotamer energies may subsequently minimize to a lower energy than the rigid-GMEC. That is, the rigid-GMEC and the conformation with the lowest energy among all energy-minimized conformations (the minimized-GMEC) are likely to be different. While the traditional DEE algorithm succeeds in not pruning rotamers that are part of the rigid-GMEC, it makes no guarantees regarding the identification of the minimized-GMEC. In this paper we derive a novel, provable, and efficient DEE-like algorithm, called minimized-DEE (MinDEE), that guarantees that rotamers belonging to the minimized-GMEC will not be pruned, while still pruning a combinatorial number of conformations. We show that MinDEE is useful not only in identifying the minimized-GMEC, but also as a filter in an ensemble-based scoring and search algorithm for protein redesign that exploits energy-minimized conformations. 
We compare our results both to our previous computational predictions of protein designs and to biological activity assays of predicted protein mutants. Our provable and efficient minimized-DEE algorithm is applicable in protein redesign, protein-ligand binding prediction, and computer-aided drug design.
Affiliation(s)
- Ivelin Georgiev
- Department of Computer Science, Duke University, Durham, NC, USA
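The pruning step that MinDEE generalizes is the traditional rigid-rotamer DEE criterion: rotamer r at position i can be discarded when even its most favorable total interaction energy is worse than the least favorable total of some competitor t at the same position. The Python below is a minimal sketch of that traditional criterion only, not of MinDEE (which additionally bounds the energy changes caused by minimization). The three-position system and all of its self and pairwise energies are hypothetical toy values, and a brute-force search is included to confirm that the GMEC survives pruning.

```python
from itertools import product

# Hypothetical toy energies: E_SELF[i][r] is the self energy of rotamer r at
# position i; E_PAIR[(i, j)][r][s] (i < j) is the pairwise energy.
E_SELF = [[0.0, 5.0], [0.0, 1.0], [2.0, 0.0]]
E_PAIR = {
    (0, 1): [[0.0, 1.0], [1.0, 0.0]],
    (0, 2): [[0.0, 0.5], [0.5, 0.0]],
    (1, 2): [[1.0, 0.0], [0.0, 1.0]],
}


def pair(e_pair, i, r, j, s):
    """Pairwise energy between rotamer r at i and rotamer s at j (any order)."""
    return e_pair[(i, j)][r][s] if i < j else e_pair[(j, i)][s][r]


def dee_prune(e_self, e_pair):
    """Traditional rigid-rotamer DEE, iterated until nothing more is removed.

    r at i is pruned if, for some other surviving t at i,
        E(i_r) + sum_j min_s E(i_r, j_s) > E(i_t) + sum_j max_s E(i_t, j_s).
    """
    n = len(e_self)
    alive = [set(range(len(e_self[i]))) for i in range(n)]

    def bound(i, r, pick):
        # pick=min gives r's best case; pick=max gives a competitor's worst case
        return e_self[i][r] + sum(
            pick(pair(e_pair, i, r, j, s) for s in alive[j])
            for j in range(n) if j != i)

    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                best_r = bound(i, r, min)
                if any(best_r > bound(i, t, max) for t in alive[i] if t != r):
                    alive[i].discard(r)
                    changed = True
    return alive


def brute_force_gmec(e_self, e_pair):
    """Exhaustively find the global minimum energy conformation (GMEC)."""
    n = len(e_self)

    def energy(conf):
        total = sum(e_self[i][conf[i]] for i in range(n))
        total += sum(pair(e_pair, i, conf[i], j, conf[j])
                     for i in range(n) for j in range(i + 1, n))
        return total

    best = min(product(*(range(len(rots)) for rots in e_self)), key=energy)
    return best, energy(best)


survivors = dee_prune(E_SELF, E_PAIR)
gmec, gmec_energy = brute_force_gmec(E_SELF, E_PAIR)
print("surviving rotamers per position:", survivors)
print("GMEC:", gmec, "energy:", gmec_energy)
```

On this toy system DEE alone reduces every position to a single rotamer, and those rotamers coincide with the brute-force GMEC. With energy minimization after pruning, that guarantee no longer holds, which is the gap the abstract describes.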
33
Georgiev I, Keedy D, Richardson JS, Richardson DC, Donald BR. Algorithm for backrub motions in protein design. Bioinformatics 2008; 24:i196-204. [PMID: 18586714 PMCID: PMC2718647 DOI: 10.1093/bioinformatics/btn169] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.8] [Indexed: 11/17/2022]
Abstract
Motivation: The backrub is a small but kinematically efficient side-chain-coupled local backbone motion frequently observed in atomic-resolution crystal structures of proteins. A backrub shifts the Cα–Cβ orientation of a given side-chain by rigid-body dipeptide rotation plus smaller individual rotations of the two peptides, with virtually no change in the rest of the protein. Backrubs can therefore provide a biophysically realistic model of local backbone flexibility for structure-based protein design. Previously, however, backrub motions were applied via manual interactive model-building, so their incorporation into a protein design algorithm (a simultaneous search over mutation and backbone/side-chain conformation space) was infeasible. Results: We present a combinatorial search algorithm for protein design that incorporates an automated procedure for local backbone flexibility via backrub motions. We further derive a dead-end elimination (DEE)-based criterion for pruning candidate rotamers that, in contrast to previous DEE algorithms, is provably accurate with backrub motions. Our backrub-based algorithm successfully predicts alternate side-chain conformations from ≤0.9 Å resolution structures, confirming the suitability of the automated backrub procedure. Finally, the application of our algorithm to redesign two different proteins is shown to identify a large number of lower-energy conformations and mutation sequences that would have been ignored by a rigid-backbone model. Availability: Contact authors for source code. Contact: brd+ismb08@cs.duke.edu
Affiliation(s)
- Ivelin Georgiev
- Department of Computer Science, Duke University, Durham, NC 27708, USA
34
Effect of glycosylation on protein folding: a close look at thermodynamic stabilization. Proc Natl Acad Sci U S A 2008; 105:8256-61. [PMID: 18550810 DOI: 10.1073/pnas.0801340105] [Citation(s) in RCA: 424] [Impact Index Per Article: 26.5] [Indexed: 11/18/2022]
Abstract
Glycosylation is one of the most common posttranslational modifications to occur in protein biosynthesis, yet its effect on the thermodynamics and kinetics of proteins is poorly understood. A minimalist model based on the native protein topology, in which each amino acid and sugar ring was represented by a single bead, was used to study the effect of glycosylation on protein folding. We studied in silico the folding of 63 engineered SH3 domain variants that had been glycosylated with different numbers of conjugated polysaccharide chains at different sites on the protein's surface. Thermal stabilization of the protein by the polysaccharide chains was observed in proportion to the number of attached chains. Consistent with recent experimental data, the degree of thermal stabilization depended on the position of the glycosylation sites, but only very weakly on the size of the glycans. A thermodynamic analysis showed that the origin of the enhanced protein stabilization by glycosylation is destabilization of the unfolded state rather than stabilization of the folded state. The higher free energy of the unfolded state is enthalpic in origin because the bulky polysaccharide chains force the unfolded ensemble to adopt more extended conformations by prohibiting formation of a residual structure. The thermodynamic stabilization induced by glycosylation is coupled with kinetic stabilization. The effects introduced by the glycans on the biophysical properties of proteins are likely to be relevant to other protein polymeric conjugate systems that regularly occur in the cell as posttranslational modifications or for biotechnological purposes.
35
Fukunishi H, Teramoto R, Takada T, Shimada J. Bootstrap-Based Consensus Scoring Method for Protein–Ligand Docking. J Chem Inf Model 2008; 48:988-96. [DOI: 10.1021/ci700204v] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
Affiliation(s)
- Hiroaki Fukunishi
- Nano Electronics Research Laboratories and Bio-IT Center, Central Research Laboratories, NEC Corporation, 34, Miyukigaoka, Tsukuba, Ibaraki 305-8501, Japan, and Riken, Next-Generation Supercomputer R&D Center, sixth Fl., Meiji Seimei Kan, 2-1-1 Marunouchi, Chiyoda-ku, Tokyo 100-0005
- Reiji Teramoto
- Nano Electronics Research Laboratories and Bio-IT Center, Central Research Laboratories, NEC Corporation, 34, Miyukigaoka, Tsukuba, Ibaraki 305-8501, Japan, and Riken, Next-Generation Supercomputer R&D Center, sixth Fl., Meiji Seimei Kan, 2-1-1 Marunouchi, Chiyoda-ku, Tokyo 100-0005
- Toshikazu Takada
- Nano Electronics Research Laboratories and Bio-IT Center, Central Research Laboratories, NEC Corporation, 34, Miyukigaoka, Tsukuba, Ibaraki 305-8501, Japan, and Riken, Next-Generation Supercomputer R&D Center, sixth Fl., Meiji Seimei Kan, 2-1-1 Marunouchi, Chiyoda-ku, Tokyo 100-0005
- Jiro Shimada
- Nano Electronics Research Laboratories and Bio-IT Center, Central Research Laboratories, NEC Corporation, 34, Miyukigaoka, Tsukuba, Ibaraki 305-8501, Japan, and Riken, Next-Generation Supercomputer R&D Center, sixth Fl., Meiji Seimei Kan, 2-1-1 Marunouchi, Chiyoda-ku, Tokyo 100-0005
36
Suzuki Y, Noel JK, Onuchic JN. An analytical study of the interplay between geometrical and energetic effects in protein folding. J Chem Phys 2008; 128:025101. [PMID: 18205476 DOI: 10.1063/1.2812956] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/14/2022]
Abstract
Analytical studies have several advantages for understanding the mechanisms of protein folding, such as the interplay between geometrical and energetic effects. In this paper, we introduce a Gaussian filament with a Cα structure-based (Go) potential as a new theoretical scheme based on a Hamiltonian approach. This model takes geometrical information into account in a realistic fashion without the need for phenomenological descriptions. To make this model more appropriate for comparison with protein folding simulations and experiments, we introduce a many-body interaction into the potential term to enhance cooperativity. We apply our new analytical model to a beta-hairpin-type peptide and compare our results with a molecular dynamics simulation of a structure-based model.
Affiliation(s)
- Yoko Suzuki
- Department of Physics, School of Sciences and Engineering, Meisei University, 2-1-1 Hodokubo, Hino-shi, Tokyo 191-8506, Japan.
37
Abstract
We propose a method of quantifying the degree of frustration manifested by spatially local interactions in protein biomolecules. This method of localization smoothly generalizes the global criterion for an energy landscape to be funneled to the native state, which is in keeping with the principle of minimal frustration. A survey of the structural database shows that natural proteins are multiply connected by a web of local interactions that are individually minimally frustrated. In contrast, highly frustrated interactions are found clustered on the surface, often near binding sites. These binding sites become less frustrated upon complex formation.
38
Abstract
MOTIVATION Dead-End Elimination (DEE) is a powerful algorithm capable of reducing the search space for structure-based protein design by a combinatorial factor. By using a fixed backbone template, a rotamer library, and a potential energy function, DEE identifies and prunes rotamer choices that are provably not part of the Global Minimum Energy Conformation (GMEC), effectively eliminating the majority of the conformations that must be subsequently enumerated to obtain the GMEC. Since a fixed-backbone model biases the algorithm predictions against protein sequences for which even small backbone movements may result in a significantly enhanced stability, the incorporation of backbone flexibility can improve the accuracy of the design predictions. If explicit backbone flexibility is incorporated into the model, however, the traditional DEE criteria can no longer guarantee that the flexible-backbone GMEC, the lowest-energy conformation when the backbone is allowed to flex, will not be pruned. RESULTS We derive a novel DEE pruning criterion, flexible-backbone DEE (BD), that is provably accurate with backbone flexibility, guaranteeing that no rotamers belonging to the flexible-backbone GMEC are pruned; we also present further enhancements to BD for improved pruning efficiency. The results from applying our novel algorithms to redesign the beta1 domain of protein G and to switch the substrate specificity of the NRPS enzyme GrsA-PheA are then compared against the results from previous fixed-backbone DEE algorithms. We confirm experimentally that traditional-DEE is indeed not provably-accurate with backbone flexibility and that BD is capable of generating conformations with significantly lower energies, thus confirming the feasibility of our novel algorithms. AVAILABILITY Contact authors for source code.
Affiliation(s)
- Ivelin Georgiev
- Department of Computer Science, Duke University, Durham, NC 27708, USA
39
Biswas P, Zou J, Saven JG. Statistical theory for protein ensembles with designed energy landscapes. J Chem Phys 2007; 123:154908. [PMID: 16252973 DOI: 10.1063/1.2062047] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022]
Abstract
Combinatorial protein libraries provide a promising route to investigate the determinants and features of protein folding and to identify novel folding amino acid sequences. A library of sequences based on a pool of different monomer types is screened for folding molecules, consistent with a particular foldability criterion. The number of sequences grows exponentially with the length of the polymer, making both experimental and computational tabulations of sequences infeasible. Herein a statistical theory is extended to specify the properties of sequences having particular values of global energetic quantities that specify their energy landscape. The theory yields the site-specific monomer probabilities. A foldability criterion is derived that characterizes the properties of sequences by quantifying the energetic separation of the target state from low-energy states in the unfolded ensemble and the fluctuations of the energies in the unfolded state ensemble. For a simple lattice model of proteins, excellent agreement is observed between the theory and the results of exact enumeration. The theory may be used to provide a quantitative framework for the design and interpretation of combinatorial experiments.
Affiliation(s)
- Parbati Biswas
- Department of Chemistry, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA.
40
Abstract
Multistate protein design is the task of predicting the amino acid sequence that is best suited to selectively and stably fold to one state out of a set of competing structures. Computationally, it entails solving a challenging optimization problem. Therefore, notwithstanding the increased interest in multistate design, the only implementations reported are based on either genetic algorithms or Monte Carlo methods. The dead-end elimination (DEE) theorem cannot be readily transferred to multistate design problems despite its successful application to single-state protein design. In this article we propose a variant of the standard DEE, called type-dependent DEE. Our method reduces the size of the conformational space of the multistate design problem, while provably preserving the minimal energy conformational assignment for any choice of amino acid sequence. Type-dependent DEE can therefore be used as a preprocessing step in any computational multistate design scheme. We demonstrate the applicability of type-dependent DEE on a set of multistate design problems and discuss its strengths and limitations.
Affiliation(s)
- Chen Yanover
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 91904, Israel.
41
Gehenn K, Stege J, Reed J. The side chain interaction index as a tool for predicting fast-folding elements and the structure and stability of engineered peptides. Anal Biochem 2006; 356:12-7. [PMID: 16860775 DOI: 10.1016/j.ab.2006.06.021] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 12/14/2005] [Revised: 05/10/2006] [Accepted: 06/14/2006] [Indexed: 10/24/2022]
Abstract
The side chain interaction index (SCII) is a method of calculating the propensity for short-range interactions among side chains within a peptide sequence. Here, it is shown that the SCII values of secondary structure elements that have been shown to fold early and independently cluster separately from those of structures that fold later and/or are dependent on long-range interactions. In addition, the SCII values of engineered peptides that spontaneously adopt a particular desired fold in solution are significantly different from those of engineered peptides that fail to exhibit a stable conformation. Thus, the SCII, as a measure of local structural stability, constitutes a useful tool in folding prediction and in protein/peptide engineering. A program that allows rapid calculation of SCII values is presented.
Affiliation(s)
- Katja Gehenn
- Department of Pathochemistry, German Cancer Research Center, D-69120, Heidelberg, Germany
42
Abstract
Over the past 10 years there has been tremendous success in the area of computational protein design. Protein design software has been used to stabilize proteins, solubilize membrane proteins, design intermolecular interactions, and design new protein structures. A key motivation for these studies is that they test our understanding of protein energetics and structure. De novo design of novel structures is a particularly rigorous test because the protein backbone must be designed in addition to the amino acid side chains. A priori it is not guaranteed that the target backbone is even designable. To address this issue, researchers have developed a variety of methods for generating protein-like scaffolds and for optimizing the protein backbone in conjunction with the amino acid sequence. These protocols have been used to design proteins from scratch and to explore sequence space for naturally occurring protein folds.
Affiliation(s)
- Glenn L Butterfoss
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7260, USA.
43
Shakhnovich E. Protein folding thermodynamics and dynamics: where physics, chemistry, and biology meet. Chem Rev 2006; 106:1559-88. [PMID: 16683745 PMCID: PMC2735084 DOI: 10.1021/cr040425u] [Citation(s) in RCA: 253] [Impact Index Per Article: 14.1] [Indexed: 11/29/2022]
Affiliation(s)
- Eugene Shakhnovich
- Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, Massachusetts 02138, USA.
44
Chikenji G, Fujitsuka Y, Takada S. Shaping up the protein folding funnel by local interaction: lesson from a structure prediction study. Proc Natl Acad Sci U S A 2006; 103:3141-6. [PMID: 16488978 PMCID: PMC1413881 DOI: 10.1073/pnas.0508195103] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.2] [Received: 09/22/2005] [Indexed: 11/18/2022]
Abstract
Predicting protein tertiary structure by folding-like simulations is one of the most stringent tests of how well we understand the principles of protein folding. Currently, the most successful method for folding-based structure prediction is the fragment assembly (FA) method. Here, we address why the FA method is so successful and what it teaches about the folding problem. To do so, using the FA method, we designed a structure prediction test of "chimera proteins." In the chimera proteins, local structural preference is specific to the target sequences, whereas nonlocal interactions are only sequence-independent compaction forces. We find that these chimera proteins can find the native folds of the intact sequences with high probability, indicating a dominant role for the local interactions. We further explore the role of local structural preference by exact calculation of the HP lattice model of proteins. From these results, we suggest principles of protein folding: for small proteins, compact structures that are fully compatible with local structural preference are few, one of which is the native fold. These local biases shape up the funnel-like energy landscape.
Affiliation(s)
- George Chikenji
- Department of Chemistry, Faculty of Science, and
- Department of Computational Science and Engineering, Graduate School of Engineering, Nagoya University, Nagoya 464-8603, Japan; and
- Yoshimi Fujitsuka
- Graduate School of Science and Technology, Kobe University, Nada, Kobe 657-8501, Japan
- Shoji Takada
- Department of Chemistry, Faculty of Science, and
- Graduate School of Science and Technology, Kobe University, Nada, Kobe 657-8501, Japan
- Core Research for Evolutionary Science and Technology, Japan Science and Technology Agency, Nada, Kobe 657-8501, Japan
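The abstract above mentions an exact calculation of the HP lattice model. As a minimal sketch of what such an exact calculation involves, the Python below enumerates every self-avoiding conformation of a short chain on a 2D square lattice and scores each with the standard HP contact energy (-1 per pair of H residues that are lattice neighbors but not chain neighbors). The 2D lattice, the fixed first bond, and the example sequence are illustrative assumptions; the paper's own lattice setup and its local-preference terms are not reproduced here.

```python
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def conformations(n):
    """Yield every self-avoiding walk of n residues on the square lattice.

    The first bond is fixed along +x so that rotated copies are not counted;
    mirror images are still enumerated separately.
    """
    if n <= 0:
        return
    if n == 1:
        yield [(0, 0)]
        return
    path = [(0, 0), (1, 0)]
    occupied = set(path)

    def extend():
        if len(path) == n:
            yield list(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                path.append(nxt)
                occupied.add(nxt)
                yield from extend()
                path.pop()
                occupied.remove(nxt)

    yield from extend()


def hp_energy(sequence, coords):
    """-1 for every H-H pair that is in lattice contact but not chain-adjacent."""
    index = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in ((1, 0), (0, 1)):  # each lattice pair is checked once
            j = index.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                energy -= 1
    return energy


def ground_states(sequence):
    """Return the minimum energy and every conformation that reaches it."""
    best, best_confs = None, []
    for coords in conformations(len(sequence)):
        e = hp_energy(sequence, coords)
        if best is None or e < best:
            best, best_confs = e, [coords]
        elif e == best:
            best_confs.append(coords)
    return best, best_confs


energy, confs = ground_states("HPHPPHHPHH")
print(f"ground-state energy {energy}, {len(confs)} degenerate conformation(s)")
```

The number of conformations grows exponentially with chain length, so exhaustive enumeration like this is practical only for short chains, which is why such exact calculations are restricted to toy lattice models.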
45
A Novel Minimized Dead-End Elimination Criterion and Its Application to Protein Redesign in a Hybrid Scoring and Search Algorithm for Computing Partition Functions over Molecular Ensembles. ACTA ACUST UNITED AC 2006. [DOI: 10.1007/11732990_44] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1]
46
Lilien RH, Stevens BW, Anderson AC, Donald BR. A novel ensemble-based scoring and search algorithm for protein redesign and its application to modify the substrate specificity of the gramicidin synthetase a phenylalanine adenylation enzyme. J Comput Biol 2005; 12:740-61. [PMID: 16108714 DOI: 10.1089/cmb.2005.12.740] [Citation(s) in RCA: 85] [Impact Index Per Article: 4.5] [Indexed: 11/12/2022]
Abstract
Realization of novel molecular function requires the ability to alter molecular complex formation. Enzymatic function can be altered by changing enzyme-substrate interactions via modification of an enzyme's active site. A redesigned enzyme may either perform a novel reaction on its native substrates or its native reaction on novel substrates. A number of computational approaches have been developed to address the combinatorial nature of the protein redesign problem. These approaches typically search for the global minimum energy conformation among an exponential number of protein conformations. We present a novel algorithm for protein redesign, which combines a statistical mechanics-derived ensemble-based approach to computing the binding constant with the speed and completeness of a branch-and-bound pruning algorithm. In addition, we developed an efficient deterministic approximation algorithm, capable of approximating our scoring function to arbitrary precision. In practice, the approximation algorithm decreases the execution time of the mutation search by a factor of ten. To test our method, we examined the Phe-specific adenylation domain of the nonribosomal peptide synthetase gramicidin synthetase A (GrsA-PheA). Ensemble scoring, using a rotameric approximation to the partition functions of the bound and unbound states for GrsA-PheA, is first used to predict binding of the wild-type protein and a previously described mutant (selective for leucine), and second, to switch the enzyme specificity toward leucine, using two novel active site sequences computationally predicted by searching through the space of possible active site mutations. The top-scoring in silico mutants were created in the wet lab, and dissociation/binding constants were determined by fluorescence quenching. These tested mutations exhibit the desired change in specificity from Phe to Leu.
Our ensemble-based algorithm, which flexibly models both protein and ligand using rotamer-based partition functions, has application in enzyme redesign, the prediction of protein-ligand binding, and computer-aided drug design.
Affiliation(s)
- Ryan H Lilien
- Computer Science Department, Dartmouth College, Hanover, NH 03755, USA
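The ensemble-based score in this abstract compares Boltzmann-weighted partition functions of the bound complex and of the unbound partners, rather than single minimum energies. A minimal sketch follows, assuming the usual form of this score, K* = q_PL / (q_P q_L), with q = Σ exp(-E/kT) summed over rotameric conformations. The Python below enumerates toy energy lists exactly, whereas the paper uses energy-minimized rotamers, branch-and-bound pruning, and a provable approximation. The mutant names, all energies, and kT of about 0.593 kcal/mol are hypothetical.

```python
import math


def log_partition(energies, kT=0.593):
    """ln q for q = sum_c exp(-E_c / kT), using log-sum-exp for stability."""
    e_min = min(energies)
    return -e_min / kT + math.log(sum(math.exp(-(e - e_min) / kT)
                                      for e in energies))


def log_kstar(complex_energies, protein_energies, ligand_energies, kT=0.593):
    """ln K* = ln q_PL - ln q_P - ln q_L (larger means tighter predicted binding)."""
    return (log_partition(complex_energies, kT)
            - log_partition(protein_energies, kT)
            - log_partition(ligand_energies, kT))


# Hypothetical rotamer-ensemble energies (kcal/mol) for two active-site
# mutants binding the same ligand; none of these numbers come from the paper.
ligand = [0.0, 0.4]
candidates = {
    "mutant_1": {"complex": [-12.0, -11.5, -9.0], "protein": [-4.0, -3.8]},
    "mutant_2": {"complex": [-12.5, -6.0, -5.5], "protein": [-5.0, -4.9, -4.8]},
}

scores = {name: log_kstar(ens["complex"], ens["protein"], ligand)
          for name, ens in candidates.items()}
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: ln K* = {scores[name]:.2f}")
```

Working with ln K* rather than K* itself keeps the arithmetic stable when the partition functions span many orders of magnitude. Ranking by this ratio rewards mutants whose bound state has many low-energy conformations relative to the unbound partners, not just one deep minimum.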
47
Suzuki Y, Onuchic JN. Modeling the Interplay between Geometrical and Energetic Effects in Protein Folding. J Phys Chem B 2005; 109:16503-10. [PMID: 16853098 DOI: 10.1021/jp0512863] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
Abstract
A theoretical framework is constructed with the aid of a free-energy functional method that is capable of describing the interplay between geometrical and energetic effects on protein folding. In this paper, we generalize a free-energy functional model based on polymer theory to make it more appropriate for comparison with protein folding simulations and experiments. This generalization is made by introducing cooperativity into the configurational entropy and the internal energy. Modifications to configurational entropy enable the model to account for the loop-loop interactions, a contribution neglected in the original model. Modifications to the internal energy introduce many-body corrections, which are needed to establish quantitative contact to simulations as well as experimental observations. To demonstrate the efficiency of the modified analytical model, we compare our results with Cα structure-based (Go) model simulations of chymotrypsin inhibitor II and the SH3 domain of src.
Affiliation(s)
- Yoko Suzuki
- Department of Physics, Faculty of Physical Sciences and Engineering, Meisei University, 2-1-1 Hodokubo, Hino-shi, Tokyo 191-8506, Japan
48
Pokala N, Handel TM. Energy Functions for Protein Design: Adjustment with Protein–Protein Complex Affinities, Models for the Unfolded State, and Negative Design of Solubility and Specificity. J Mol Biol 2005; 347:203-27. [PMID: 15733929 DOI: 10.1016/j.jmb.2004.12.019] [Citation(s) in RCA: 157] [Impact Index Per Article: 8.3] [Received: 07/13/2004] [Revised: 12/05/2004] [Accepted: 12/09/2004] [Indexed: 11/16/2022]
Abstract
The development of the EGAD program and energy function for protein design is described. In contrast to most protein design methods, which require several empirical parameters or heuristics such as patterning of residues or rotamers, EGAD has a minimalist philosophy: it uses very few empirical factors to account for inaccuracies resulting from the use of fixed backbones and discrete rotamers in protein design calculations, and it describes the unfolded state, aggregates, and alternative conformers explicitly with physical models instead of fitted parameters. This approach unveils important issues in protein design that are often camouflaged by heuristic-emphasizing methods. Interatomic energies are modeled with the OPLS-AA all-atom force field, electrostatics with the generalized Born continuum model, and the hydrophobic effect with a solvent-accessible surface area-dependent term. Experimental characterization of proteins designed with an unmodified version of the energy function revealed problems with under-packing, stability, aggregation, and structural specificity. Under-packing was addressed by modifying the van der Waals function. By optimizing only three parameters, the effects of >400 mutations on protein-protein complex formation were predicted to within 1.0 kcal mol⁻¹. As an independent test, this modified energy function was used to predict the stabilities of >1500 mutants to within 1.0 kcal mol⁻¹; this required a physical model of the unfolded state that includes more interactions than traditional tripeptide-based models. Solubility and structural specificity were addressed with simple physical approximations of aggregation and conformational equilibria. The complete energy function can design protein sequences that have high levels of identity with their natural counterparts, and have predicted structural properties more consistent with soluble and uniquely folded proteins than the initial designs.
Affiliation(s)
- Navin Pokala
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA.
49
Wolynes PG. Energy landscapes and solved protein-folding problems. Philos Trans A Math Phys Eng Sci 2005; 363:453-67. [PMID: 15664893 DOI: 10.1098/rsta.2004.1502] [Citation(s) in RCA: 113] [Impact Index Per Article: 5.9] [Indexed: 05/24/2023]
Abstract
Energy-landscape theory has led to much progress in protein folding kinetics, protein structure prediction and protein design. Funnel landscapes describe protein folding and binding and explain how protein topology determines kinetics. Landscape-optimized energy functions based on bioinformatic input have been used to correctly predict low-resolution protein structures and also to design novel proteins automatically.
Affiliation(s)
- Peter G Wolynes
- Department of Chemistry and Biochemistry, Center for Theoretical Biological Physics, University of California, San Diego, 6202 Urey Hall 0371, 9500 Gilman Drive, La Jolla, California, USA.
50
Chapter 18 Computationally Assisted Protein Design. ACTA ACUST UNITED AC 2005. [DOI: 10.1016/s1574-1400(05)01018-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]