1
Das NR, Chaudhury KN, Pal D. Improved NMR-data-compliant protein structure modeling captures context-dependent variations and expands the scope of functional inference. Proteins 2023; 91:412-435. [PMID: 36287124 DOI: 10.1002/prot.26439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2022] [Revised: 09/12/2022] [Accepted: 10/20/2022] [Indexed: 11/13/2022]
Abstract
Nuclear magnetic resonance (NMR) spectroscopy can reveal conformational states of a protein in physiological conditions. However, sparsely available NMR data for a protein with large degrees of freedom can introduce structural artifacts in the built models. Currently used state-of-the-art methods deriving protein structure and conformation from NMR deploy molecular dynamics (MD) coupled with simulated annealing for building models. We provide an alternate graph-based modeling approach, where we first build substructures from NMR-derived distance-geometry constraints combined in one shot to form the core structure. The remaining molecule with inadequate data is modeled using a hybrid approach respecting the observed distance-geometry constraints. One-shot structure building is rarely undertaken for large and sparse data systems, but our data-driven bottom-up approach makes this uniquely feasible by suitable partitioning of the problem. A detailed comparison of select models with state-of-the-art methods reveals differences in the secondary structure regions, wherein the correctness of our models is confirmed by NMR data. Benchmarking of 106 protein folds covering structures of length 38-282 shows minimal experimental-constraint violations while conforming to other structure quality parameters such as proper folding, steric clash, and torsion angle violation based on Ramachandran plot criteria. Comparative MD studies using select protein models from a state-of-the-art method and ours under identical experimental parameters reveal distinct conformational dynamics that could be attributed to protein structure-function. Our work is thus useful in building enhanced NMR-evidence-based models that encapsulate the contextual secondary and tertiary structure variations present during the experimentation and expand the scope of functional inference.
Affiliation(s)
- Niladri R Das
  - IISc Mathematics Initiative, Indian Institute of Science, Bangalore, India
  - Department of Electrical Engineering, Indian Institute of Science, Bangalore, India
- Kunal N Chaudhury
  - Department of Electrical Engineering, Indian Institute of Science, Bangalore, India
- Debnath Pal
  - Department of Computational and Data Sciences, Indian Institute of Science, Bangalore, India
2
Labiak R, Lavor C, Souza M. Distance geometry and protein loop modeling. J Comput Chem 2021; 43:349-358. [PMID: 34904248 DOI: 10.1002/jcc.26796] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/22/2021] [Revised: 10/22/2021] [Accepted: 11/28/2021] [Indexed: 11/11/2022]
Abstract
Due to the role of loops in protein function, loop modeling is an important problem in computational biology. We present a new approach to loop modeling based on a combinatorial version of distance geometry, where the search space of the associated problem is represented by a binary tree and a branch-and-prune method is defined to explore it, following a previously given atomic ordering. This ordering is used to calculate the coordinates of each atom from the positions of its predecessors. In addition to the theoretical development, computational results are presented to illustrate the advantage of the proposed method compared with another approach from the literature. Our algorithm is freely available at https://github.com/michaelsouza/bpl.
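The binary-tree search described in this abstract can be illustrated with a minimal sketch. It is not the authors' bpl implementation: it assumes exact distances, assumes every atom has known distances to its three immediate predecessors in the ordering, and assumes no three consecutive atoms are collinear. All function and variable names (`sphere_intersections`, `branch_and_prune`, `true_xyz`) are hypothetical. Each new atom has at most two candidate positions (the intersection of three spheres), and any additional known distance to an earlier atom prunes infeasible branches.

```python
import math
from itertools import combinations

def _sub(a, b): return tuple(x - y for x, y in zip(a, b))
def _add(a, b): return tuple(x + y for x, y in zip(a, b))
def _scale(a, s): return tuple(x * s for x in a)
def _dot(a, b): return sum(x * y for x, y in zip(a, b))
def _cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])
def _unit(a): return _scale(a, 1.0 / math.sqrt(_dot(a, a)))

def sphere_intersections(p1, p2, p3, r1, r2, r3):
    """Points at distances r1, r2, r3 from p1, p2, p3 (zero, one or two points)."""
    ex = _unit(_sub(p2, p1))
    d = math.dist(p1, p2)
    v = _sub(p3, p1)
    i = _dot(ex, v)
    ey = _unit(_sub(v, _scale(ex, i)))   # fails if p1, p2, p3 are collinear
    j = _dot(ey, v)
    ez = _cross(ex, ey)
    x = (r1*r1 - r2*r2 + d*d) / (2*d)
    y = (r1*r1 - r3*r3 + i*i + j*j) / (2*j) - i*x/j
    z2 = r1*r1 - x*x - y*y
    if z2 < -1e-9:
        return []
    z = math.sqrt(max(z2, 0.0))
    base = _add(p1, _add(_scale(ex, x), _scale(ey, y)))
    if z < 1e-9:
        return [base]
    return [_add(base, _scale(ez, z)), _sub(base, _scale(ez, z))]

def branch_and_prune(dist, n, tol=1e-6):
    """Depth-first search over the binary tree of atom placements.

    dist maps (i, j) with i < j to a known distance. Pairs with j - i <= 3
    are required; any other pair present is used only for pruning.
    """
    d = lambda a, b: dist[(min(a, b), max(a, b))]
    # Fix the first three atoms to remove translations and rotations.
    cx = (d(0, 1)**2 + d(0, 2)**2 - d(1, 2)**2) / (2 * d(0, 1))
    cy = math.sqrt(max(d(0, 2)**2 - cx**2, 0.0))
    coords = [(0.0, 0.0, 0.0), (d(0, 1), 0.0, 0.0), (cx, cy, 0.0)]
    solutions = []

    def feasible(k, p):
        # Check the extra (pruning) distances to atoms before k-3.
        return all(abs(math.dist(p, coords[j]) - dist[(j, k)]) <= tol
                   for j in range(k - 3) if (j, k) in dist)

    def dfs(k):
        if k == n:
            solutions.append(list(coords))
            return
        for p in sphere_intersections(coords[k-3], coords[k-2], coords[k-1],
                                      d(k-3, k), d(k-2, k), d(k-1, k)):
            if feasible(k, p):
                coords.append(p)
                dfs(k + 1)
                coords.pop()

    dfs(3)
    return solutions

# Toy instance: distances taken from made-up coordinates (not real protein data).
true_xyz = [(0, 0, 0), (1.5, 0, 0), (2.0, 1.4, 0), (3.3, 1.6, 0.9),
            (3.9, 2.9, 0.3), (5.2, 3.1, 1.1)]
dist = {(i, j): math.dist(true_xyz[i], true_xyz[j])
        for i, j in combinations(range(len(true_xyz)), 2)
        if j - i <= 3 or (i, j) in {(0, 4), (1, 5)}}
solutions = branch_and_prune(dist, len(true_xyz))
print(len(solutions), "feasible embedding(s) found")
```

Fixing the first three atoms removes rigid-body freedom, so the remaining ambiguity is purely combinatorial. Real NMR data give interval distances rather than exact ones, which this sketch does not handle.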
Affiliation(s)
- Rodrigo Labiak
  - Department of Mathematics, University of Campinas, Campinas, Brazil
- Carlile Lavor
  - Department of Applied Mathematics, University of Campinas, Campinas, Brazil
- Michael Souza
  - Department of Applied Mathematics, Federal University of Ceara, Fortaleza, Brazil
3
Cole C, Parks C, Rachele J, Valafar H. Increased usability, algorithmic improvements and incorporation of data mining for structure calculation of proteins with REDCRAFT software package. BMC Bioinformatics 2020; 21:204. [PMID: 33272215 PMCID: PMC7712608 DOI: 10.1186/s12859-020-3522-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/16/2020] [Accepted: 04/29/2020] [Indexed: 02/08/2023] [Open Access]
Abstract
Background: Traditional approaches to elucidation of protein structures by Nuclear Magnetic Resonance spectroscopy (NMR) rely on distance restraints, also known as Nuclear Overhauser effects (NOEs). The use of NOEs as the primary source of structure determination by NMR spectroscopy is time consuming and expensive. Residual Dipolar Couplings (RDCs) have become an alternate approach for structure calculation by NMR spectroscopy. In previous works, the software package REDCRAFT has been presented as a means of harnessing the information contained in RDCs for structure calculation of proteins. However, to meet its full potential, several improvements to REDCRAFT must be made. Results: In this work, we present improvements to REDCRAFT that include increased usability, better interoperability, and a more robust core algorithm. We have demonstrated the impact of the improved core algorithm in the successful folding of the protein 1A1Z with as much as ±4 Hz of added error. The REDCRAFT-computed structure from the highly corrupted data deviated by less than 1.0 Å from the X-ray structure. We have also demonstrated the interoperability of REDCRAFT in a few instances, including with PDBMine, to reduce the amount of data required for successful folding of proteins to unprecedented levels. Here we have demonstrated the successful folding of the protein 1D3Z (to within 2.4 Å of the X-ray structure) using only N-H RDCs from one alignment medium. Conclusions: The additional GUI features of REDCRAFT combined with the NEF compliance have significantly increased the flexibility and usability of this software package. The improvements to the core algorithm have substantially improved the robustness of REDCRAFT in utilizing experimental data that are reduced in both quality and quantity.
Affiliation(s)
- Casey Cole
  - Department of Computer Science and Engineering, University of South Carolina, M. Bert Storey Engineering and Innovation Center, 550 Assembly St, Columbia, SC, 29201, USA
- Caleb Parks
  - Department of Computer Science and Engineering, University of South Carolina, M. Bert Storey Engineering and Innovation Center, 550 Assembly St, Columbia, SC, 29201, USA
- Julian Rachele
  - Department of Computer Science and Engineering, University of South Carolina, M. Bert Storey Engineering and Innovation Center, 550 Assembly St, Columbia, SC, 29201, USA
- Homayoun Valafar
  - Department of Computer Science and Engineering, University of South Carolina, M. Bert Storey Engineering and Innovation Center, 550 Assembly St, Columbia, SC, 29201, USA
4
Structural characterization of life-extending Caenorhabditis elegans Lipid Binding Protein 8. Sci Rep 2019; 9:9966. [PMID: 31292465 PMCID: PMC6620326 DOI: 10.1038/s41598-019-46230-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/08/2019] [Accepted: 05/24/2019] [Indexed: 01/07/2023] [Open Access]
Abstract
The lysosome plays a crucial role in the regulation of longevity. Lysosomal degradation is tightly coupled with autophagy, which is induced by many longevity paradigms and required for lifespan extension. The lysosome also serves as a hub for signal transduction and regulates longevity by affecting nuclear transcription. One lysosome-to-nucleus retrograde signaling pathway is mediated by a lysosome-associated fatty acid binding protein, LBP-8, in Caenorhabditis elegans. LBP-8 shuttles lysosomal lipids into the nucleus to activate the lipid-regulated nuclear receptors NHR-49 and NHR-80 and consequently promote longevity. However, the structural basis of LBP-8 action remains unclear. Here, we determined the first 1.3 Å high-resolution structure of this life-extending protein LBP-8, which allowed us to identify a structurally conserved nuclear localization signal and amino acids involved in lipid binding. Additionally, we described the range of fatty acids LBP-8 is capable of binding and showed that it binds to life-extending ligands in worms, such as oleic acid and oleoylethanolamide, with high affinity.
5
Hallen MA, Donald BR. CATS (Coordinates of Atoms by Taylor Series): protein design with backbone flexibility in all locally feasible directions. Bioinformatics 2018; 33:i5-i12. [PMID: 28882005 PMCID: PMC5870559 DOI: 10.1093/bioinformatics/btx277] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 11/12/2022] [Open Access]
Abstract
Motivation: When proteins mutate or bind to ligands, their backbones often move significantly, especially in loop regions. Computational protein design algorithms must model these motions in order to accurately optimize protein stability and binding affinity. However, methods for backbone conformational search in design have been much more limited than for sidechain conformational search. This is especially true for combinatorial protein design algorithms, which aim to search a large sequence space efficiently and thus cannot rely on temporal simulation of each candidate sequence. Results: We alleviate this difficulty with a new parameterization of backbone conformational space, which represents all degrees of freedom of a specified segment of protein chain that maintain valid bonding geometry (by maintaining the original bond lengths and angles and ω dihedrals). In order to search this space, we present an efficient algorithm, CATS, for computing atomic coordinates as a function of our new continuous backbone internal coordinates. CATS generalizes the iMinDEE and EPIC protein design algorithms, which model continuous flexibility in sidechain dihedrals, to model continuous, appropriately localized flexibility in the backbone dihedrals ϕ and ψ as well. We show using 81 test cases based on 29 different protein structures that CATS finds sequences and conformations that are significantly lower in energy than methods with less or no backbone flexibility do. In particular, we show that CATS can model the viability of an antibody mutation known experimentally to increase affinity, but that appears sterically infeasible when modeled with less or no backbone flexibility. Availability and implementation: Our code is available as free software at https://github.com/donaldlab/OSPREY_refactor. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mark A Hallen
  - Department of Computer Science, Duke University, Durham, NC, USA
  - Toyota Technological Institute at Chicago, Chicago, IL, USA
- Bruce R Donald
  - Department of Computer Science, Duke University, Durham, NC, USA
  - Department of Chemistry, Duke University, Durham, NC, USA
  - Department of Biochemistry, Duke University Medical Center, Durham, NC, USA
6
Khoo Y, Singer A, Cowburn D. Integrating NOE and RDC using sum-of-squares relaxation for protein structure determination. J Biomol NMR 2017; 68:163-185. [PMID: 28616711 PMCID: PMC11347928 DOI: 10.1007/s10858-017-0108-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/12/2016] [Accepted: 03/31/2017] [Indexed: 06/07/2023]
Abstract
We revisit the problem of protein structure determination from geometrical restraints from NMR, using convex optimization. It is well known that the NP-hard distance geometry problem of determining atomic positions from pairwise distance restraints can be relaxed into a convex semidefinite program (SDP). However, often the NOE distance restraints are too imprecise and sparse for accurate structure determination. Residual dipolar coupling (RDC) measurements provide additional geometric information on the angles between atom-pair directions and axes of the principal axis frame. The optimization problem involving RDC is highly non-convex and requires a good initialization even within the simulated annealing framework. In this paper, we model the protein backbone as an articulated structure composed of rigid units. Determining the rotation of each rigid unit gives the full protein structure. We propose solving the non-convex optimization problems using the sum-of-squares (SOS) hierarchy, a hierarchy of convex relaxations with increasing complexity and approximation power. Unlike classical global optimization approaches, SOS optimization returns a certificate of optimality if the global optimum is found. Based on the SOS method, we propose two algorithms, RDC-SOS and RDC-NOE-SOS, that have polynomial time complexity in the number of amino-acid residues and run efficiently on a standard desktop. In many instances, the proposed methods exactly recover the solution to the original non-convex optimization problem. To the best of our knowledge, this is the first time SOS relaxation is introduced to solve non-convex optimization problems in structural biology. We further introduce a statistical tool, the Cramér-Rao bound (CRB), to provide an information-theoretic bound on the highest resolution one can hope to achieve when determining protein structure from noisy measurements using any unbiased estimator. Our simulation results show that when the RDC measurements are corrupted by Gaussian noise of realistic variance, both SOS-based algorithms attain the CRB. We successfully apply our method in a divide-and-conquer fashion to determine the structure of ubiquitin from experimental NOE and RDC measurements obtained in two alignment media, achieving more accurate and faster reconstructions compared to the current state of the art.
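For orientation, the angular information that an RDC carries can be written in the standard textbook form below. This is general background rather than notation taken from the paper: the symbols $D_a$, $R$, $\theta$, $\phi$, $S$, $\mathbf{v}_{ij}$, $Q_k$ and $\mathbf{u}_{ij}$ are assumptions introduced here, and normalization conventions for $D_a$ and $D_{\max}$ vary between sources.

```latex
% RDC of the internuclear unit vector v_ij, expressed in the principal axis
% frame of the alignment tensor (theta, phi = polar angles of v_ij in that frame):
D_{ij} = D_a \left[ \left( 3\cos^2\theta - 1 \right)
         + \tfrac{3}{2}\, R \,\sin^2\theta \,\cos 2\phi \right]

% Equivalent tensor form, with S the symmetric, traceless 3x3 Saupe order tensor:
D_{ij} = D_{\max}\; \mathbf{v}_{ij}^{\top} S \,\mathbf{v}_{ij},
\qquad \operatorname{tr}(S) = 0, \quad S = S^{\top}

% If v_ij = Q_k u_ij for a rigid unit with rotation Q_k and a fixed local
% bond direction u_ij, then D_ij is a quadratic polynomial in the entries of Q_k:
D_{ij} = D_{\max}\; \mathbf{u}_{ij}^{\top} Q_k^{\top} S \, Q_k \,\mathbf{u}_{ij}
```

The last line shows why a rigid-unit model leads to polynomial constraints in the rotation entries, which is the kind of problem a sum-of-squares hierarchy is designed to relax.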
Affiliation(s)
- Y Khoo
  - Department of Physics, Princeton University, Princeton, NJ, 08540, USA
  - Department of Mathematics, Stanford University, Stanford, CA, 94305, USA
- A Singer
  - Department of Mathematics and PACM, Princeton University, Princeton, NJ, 08544, USA
- D Cowburn
  - Department of Biochemistry, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
7
Gainza P, Nisonoff HM, Donald BR. Algorithms for protein design. Curr Opin Struct Biol 2016; 39:16-26. [PMID: 27086078 PMCID: PMC5065368 DOI: 10.1016/j.sbi.2016.03.006] [Citation(s) in RCA: 58] [Impact Index Per Article: 7.3] [Received: 11/17/2015] [Revised: 03/15/2016] [Accepted: 03/22/2016] [Indexed: 02/05/2023]
Abstract
Computational structure-based protein design programs are becoming an increasingly important tool in molecular biology. These programs compute protein sequences that are predicted to fold to a target structure and perform a desired function. The success of a program's predictions largely relies on two components: first, the input biophysical model, and second, the algorithm that computes the best sequence(s) and structure(s) according to the biophysical model. Improving both the model and the algorithm in tandem is essential to improving the success rate of current programs, and here we review recent developments in algorithms for protein design, emphasizing how novel algorithms enable the use of more accurate biophysical models. We conclude with a list of algorithmic challenges in computational protein design that we believe will be especially important for the design of therapeutic proteins and protein assemblies.
Affiliation(s)
- Pablo Gainza
  - Department of Computer Science, Duke University, Durham, NC, United States
- Hunter M Nisonoff
  - Department of Computer Science, Duke University, Durham, NC, United States
- Bruce R Donald
  - Department of Computer Science, Duke University, Durham, NC, United States
  - Department of Biochemistry, Duke University Medical Center, Durham, NC, United States
  - Department of Chemistry, Duke University, Durham, NC, United States
8
Boulton S, Melacini G. Advances in NMR Methods To Map Allosteric Sites: From Models to Translation. Chem Rev 2016; 116:6267-304. [PMID: 27111288 DOI: 10.1021/acs.chemrev.5b00718] [Citation(s) in RCA: 59] [Impact Index Per Article: 7.4] [Indexed: 02/07/2023]
Abstract
The last five years have witnessed major developments in the understanding of the allosteric phenomenon, broadly defined as coupling between remote molecular sites. Such advances have been driven not only by new theoretical models and pharmacological applications of allostery, but also by progress in the experimental approaches designed to map allosteric sites and transitions. Among these techniques, NMR spectroscopy has played a major role given its unique near-atomic resolution and sensitivity to the dynamics that underlie allosteric couplings. Here, we highlight recent progress in the NMR methods tailored to investigate allostery with the goal of offering an overview of which NMR approaches are best suited for which allosterically relevant questions. The picture of the allosteric "NMR toolbox" is provided starting from one of the simplest models of allostery (i.e., the four-state thermodynamic cycle) and continuing to more complex multistate mechanisms. We also review how such an "NMR toolbox" has assisted the elucidation of the allosteric molecular basis for disease-related mutations and the discovery of novel leads for allosteric drugs. From this overview, it is clear that NMR plays a central role not only in experimentally validating transformative theories of allostery, but also in tapping the full translational potential of allosteric systems.
Affiliation(s)
- Stephen Boulton
  - Department of Chemistry and Chemical Biology; Department of Biochemistry and Biomedical Sciences, McMaster University, 1280 Main St. W., Hamilton L8S 4M1, Canada
- Giuseppe Melacini
  - Department of Chemistry and Chemical Biology; Department of Biochemistry and Biomedical Sciences, McMaster University, 1280 Main St. W., Hamilton L8S 4M1, Canada
9
Vammi V, Song G. Ensembles of a small number of conformations with relative populations. J Biomol NMR 2015; 63:341-351. [PMID: 26474790 DOI: 10.1007/s10858-015-9993-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/06/2015] [Accepted: 10/14/2015] [Indexed: 06/05/2023]
Abstract
In our previous work, we proposed a new way to represent protein native states, using ensembles of a small number of conformations with relative populations, or ESP for short. Using ubiquitin as an example, we showed that using a small number of conformations could greatly reduce the potential for overfitting and that assigning relative populations to protein ensembles could significantly improve their quality. To demonstrate that ESP is indeed an excellent alternative for representing protein native states, in this work we compare the quality of two ESP ensembles of ubiquitin with several well-known regular ensembles or average structure representations. An extensive amount of experimental data is employed to achieve a thorough assessment. Our results demonstrate that ESP ensembles, though much smaller in size compared to regular ensembles, perform equally well or sometimes even better on all four types of experimental data used in the assessment, namely residual dipolar couplings, residual chemical shift anisotropy, hydrogen exchange rates, and solution scattering profiles. This work further underlines the significance of having relative populations in describing the native states.
Affiliation(s)
- Vijay Vammi
  - Bioinformatics and Computational Biology Program, Department of Computer Science, Iowa State University, 226 Atanasoff Hall, Ames, IA, 50011, USA
- Guang Song
  - Bioinformatics and Computational Biology Program, Department of Computer Science, Iowa State University, 226 Atanasoff Hall, Ames, IA, 50011, USA
  - Baker Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, IA, USA
10
Mukhopadhyay R, Irausquin S, Schmidt C, Valafar H. Dynafold: a dynamic programming approach to protein backbone structure determination from minimal sets of Residual Dipolar Couplings. J Bioinform Comput Biol 2014; 12:1450002. [PMID: 24467760 DOI: 10.1142/s0219720014500024] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]
Abstract
Residual Dipolar Couplings (RDCs) are a source of NMR data that can provide a powerful set of constraints on the orientation of inter-nuclear vectors, and are quickly becoming a larger part of the experimental toolset for molecular biologists. However, few reliable protocols exist for the determination of protein backbone structures from small sets of RDCs. DynaFold is a new dynamic programming algorithm designed specifically for this task, using minimal sets of RDCs collected in multiple alignment media. DynaFold was first tested utilizing synthetic data generated for the N-H, C(α)-H(α), and C-N vectors of the 1BRF, 1F53, 110M, and 3LAY proteins, with up to ±1 Hz error in three alignment media, and was able to produce structures within 1.9 Å of the original structures. DynaFold was then tested using experimental data, obtained from the Biological Magnetic Resonance Bank, for proteins PDBID:1P7E and 1D3Z using RDC data from two alignment media. This exercise yielded structures within 1.0 Å of their respective published structures in segments with high data density, and less than 1.9 Å over the entire protein. The same sets of RDC data were also used in comparisons with traditional methods for analysis of RDCs, which failed to match the accuracy of DynaFold's approach to structure determination.
Affiliation(s)
- Rishi Mukhopadhyay
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA
11
Gainza P, Roberts KE, Georgiev I, Lilien RH, Keedy DA, Chen CY, Reza F, Anderson AC, Richardson DC, Richardson JS, Donald BR. OSPREY: protein design with ensembles, flexibility, and provable algorithms. Methods Enzymol 2013; 523:87-107. [PMID: 23422427 DOI: 10.1016/b978-0-12-394292-0.00005-9] [Citation(s) in RCA: 96] [Impact Index Per Article: 8.7] [Indexed: 12/03/2022]
Abstract
We have developed a suite of protein redesign algorithms that improves realistic in silico modeling of proteins. These algorithms are based on three characteristics that make them unique: (1) improved flexibility of the protein backbone, protein side-chains, and ligand to accurately capture the conformational changes that are induced by mutations to the protein sequence; (2) modeling of proteins and ligands as ensembles of low-energy structures to better approximate binding affinity; and (3) a globally optimal protein design search, guaranteeing that the computational predictions are optimal with respect to the input model. Here, we illustrate the importance of these three characteristics. We then describe OSPREY, a protein redesign suite that implements our protein design algorithms. OSPREY has been used prospectively, with experimental validation, in several biomedically relevant settings. We show in detail how OSPREY has been used to predict resistance mutations and explain why improved flexibility, ensembles, and provability are essential for this application. Availability: OSPREY is free and open source under a Lesser GPL license. The latest version is OSPREY 2.0. The program, user manual, and source code are available at www.cs.duke.edu/donaldlab/software.php. Contact: osprey@cs.duke.edu.
Affiliation(s)
- Pablo Gainza
  - Department of Computer Science, Duke University, Durham, North Carolina, USA
12
Zeng J, Zhou P, Donald BR. HASH: a program to accurately predict protein Hα shifts from neighboring backbone shifts. J Biomol NMR 2013; 55:105-18. [PMID: 23242797 PMCID: PMC3652891 DOI: 10.1007/s10858-012-9693-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/25/2012] [Accepted: 12/05/2012] [Indexed: 06/01/2023]
Abstract
Chemical shifts provide not only peak identities for analyzing nuclear magnetic resonance (NMR) data, but also an important source of conformational information for studying protein structures. Current structural studies requiring H(α) chemical shifts suffer from the following limitations. (1) For large proteins, the H(α) chemical shifts can be difficult to assign using conventional NMR triple-resonance experiments, mainly due to the fast transverse relaxation rate of C(α) that restricts the signal sensitivity. (2) Previous chemical shift prediction approaches either require homologous models with high sequence similarity or rely heavily on accurate backbone and side-chain structural coordinates. When neither sequence homologues nor structural coordinates are available, we must resort to other information to predict H(α) chemical shifts. Predicting accurate H(α) chemical shifts using other obtainable information, such as the chemical shifts of nearby backbone atoms (i.e., adjacent atoms in the sequence), can remedy the above dilemmas, and hence advance NMR-based structural studies of proteins. By specifically exploiting the dependencies on chemical shifts of nearby backbone atoms, we propose a novel machine learning algorithm, called HASH, to predict H(α) chemical shifts. HASH combines a new fragment-based chemical shift search approach with a non-parametric regression model, called the generalized additive model, to effectively solve the prediction problem. We demonstrate that the chemical shifts of nearby backbone atoms provide a reliable source of information for predicting accurate H(α) chemical shifts. Our testing results on different possible combinations of input data indicate that HASH has a wide range of potential NMR applications in structural and biological studies of proteins.
Affiliation(s)
- Jianyang Zeng
  - Department of Computer Science, Duke University, Durham NC 27708, USA
- Pei Zhou
  - Department of Biochemistry, Duke University Medical Center, Durham NC 27708, USA
- Bruce Randall Donald
  - Department of Computer Science, Duke University, Durham NC 27708, USA
  - Department of Biochemistry, Duke University Medical Center, Durham NC 27708, USA
13
Hallen MA, Keedy DA, Donald BR. Dead-end elimination with perturbations (DEEPer): a provable protein design algorithm with continuous sidechain and backbone flexibility. Proteins 2012; 81:18-39. [PMID: 22821798 DOI: 10.1002/prot.24150] [Citation(s) in RCA: 63] [Impact Index Per Article: 5.3] [Received: 05/07/2012] [Revised: 07/01/2012] [Accepted: 07/11/2012] [Indexed: 11/12/2022]
Abstract
Computational protein and drug design generally require accurate modeling of protein conformations. This modeling typically starts with an experimentally determined protein structure and considers possible conformational changes due to mutations or new ligands. The DEE/A* algorithm provably finds the global minimum-energy conformation (GMEC) of a protein assuming that the backbone does not move and the sidechains take on conformations from a set of discrete, experimentally observed conformations called rotamers. DEE/A* can efficiently find the overall GMEC for exponentially many mutant sequences. Previous improvements to DEE/A* include modeling ensembles of sidechain conformations and either continuous sidechain or backbone flexibility. We present a new algorithm, DEEPer (Dead-End Elimination with Perturbations), that combines these advantages and can also handle much more extensive backbone flexibility and backbone ensembles. DEEPer provably finds the GMEC or, if desired by the user, all conformations and sequences within a specified energy window of the GMEC. It includes the new abilities to handle arbitrarily large backbone perturbations and to generate ensembles of backbone conformations. It also incorporates the shear, an experimentally observed local backbone motion never before used in design. Additionally, we derive a new method to accelerate DEE/A*-based calculations, indirect pruning, that is particularly useful for DEEPer. In 67 benchmark tests on 64 proteins, DEEPer consistently identified lower-energy conformations than previous methods did, indicating more accurate modeling. Additional tests demonstrated its ability to incorporate larger, experimentally observed backbone conformational changes and to model realistic conformational ensembles. These capabilities provide significant advantages for modeling protein mutations and protein-ligand interactions.
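The pruning idea behind DEE for a fixed backbone and discrete rotamers can be sketched as follows. This is not DEEPer and not OSPREY code: it uses the well-known Goldstein singles criterion as a stand-in, omits the A* search, continuous flexibility and backbone perturbations, and all names (`dee_prune`, `brute_force_gmec`, the toy energy tables) are hypothetical. Rotamer r at position i is removed when some competitor t satisfies E(i_r) - E(i_t) + sum over j of min over s of [E(i_r, j_s) - E(i_t, j_s)] > 0, which guarantees that r cannot be part of the GMEC.

```python
from itertools import product

def pair(pair_e, i, r, j, s):
    """Pairwise energy between rotamer r at position i and rotamer s at position j."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]

def total_energy(assign, self_e, pair_e):
    """Energy of one full assignment (one rotamer index per position)."""
    n = len(assign)
    e = sum(self_e[i][assign[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            e += pair_e[(i, j)][assign[i]][assign[j]]
    return e

def dee_prune(self_e, pair_e):
    """Repeatedly apply the Goldstein singles criterion until nothing changes."""
    n = len(self_e)
    alive = [set(range(len(e))) for e in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                for t in sorted(alive[i]):
                    if t == r:
                        continue
                    # Best case for r relative to t over all surviving neighbors.
                    gap = self_e[i][r] - self_e[i][t]
                    for j in range(n):
                        if j != i:
                            gap += min(pair(pair_e, i, r, j, s) - pair(pair_e, i, t, j, s)
                                       for s in alive[j])
                    if gap > 0:          # r is always worse than t: dead end
                        alive[i].discard(r)
                        changed = True
                        break
    return alive

def brute_force_gmec(self_e, pair_e, allowed):
    """Exhaustive minimum over the allowed rotamers (only viable for toy sizes)."""
    return min((total_energy(a, self_e, pair_e), a)
               for a in product(*[sorted(s) for s in allowed]))

# Toy energy tables (made-up integers, not physical energies).
self_e = [[0, 2, 5], [1, 0, 4], [3, 1, 0]]
pair_e = {
    (0, 1): [[0, 1, 2], [1, 0, 3], [4, 2, 1]],
    (0, 2): [[1, 0, 2], [2, 1, 0], [3, 3, 2]],
    (1, 2): [[0, 2, 1], [1, 0, 1], [2, 3, 4]],
}
alive = dee_prune(self_e, pair_e)
print("surviving rotamers per position:", [sorted(s) for s in alive])
print("GMEC over survivors:", brute_force_gmec(self_e, pair_e, alive))
```

Because the criterion is strict, every minimum-energy assignment survives pruning, so searching only the survivors returns the same GMEC energy as searching the full space.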
Affiliation(s)
- Mark A Hallen
  - Department of Biochemistry, Duke University Medical Center, Durham, North Carolina, USA