51
Hallen MA, Donald BR. CATS (Coordinates of Atoms by Taylor Series): protein design with backbone flexibility in all locally feasible directions. Bioinformatics 2018; 33:i5-i12. [PMID: 28882005 PMCID: PMC5870559 DOI: 10.1093/bioinformatics/btx277] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 11/12/2022] Open
Abstract
Motivation: When proteins mutate or bind to ligands, their backbones often move significantly, especially in loop regions. Computational protein design algorithms must model these motions in order to accurately optimize protein stability and binding affinity. However, methods for backbone conformational search in design have been much more limited than for sidechain conformational search. This is especially true for combinatorial protein design algorithms, which aim to search a large sequence space efficiently and thus cannot rely on temporal simulation of each candidate sequence. Results: We alleviate this difficulty with a new parameterization of backbone conformational space, which represents all degrees of freedom of a specified segment of protein chain that maintain valid bonding geometry (by maintaining the original bond lengths and angles and ω dihedrals). In order to search this space, we present an efficient algorithm, CATS, for computing atomic coordinates as a function of our new continuous backbone internal coordinates. CATS generalizes the iMinDEE and EPIC protein design algorithms, which model continuous flexibility in sidechain dihedrals, to model continuous, appropriately localized flexibility in the backbone dihedrals ϕ and ψ as well. We show using 81 test cases based on 29 different protein structures that CATS finds sequences and conformations that are significantly lower in energy than methods with less or no backbone flexibility do. In particular, we show that CATS can model the viability of an antibody mutation known experimentally to increase affinity, but that appears sterically infeasible when modeled with less or no backbone flexibility. Availability and implementation: Our code is available as free software at https://github.com/donaldlab/OSPREY_refactor. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mark A Hallen
  - Department of Computer Science, Duke University, Durham, NC, USA; Toyota Technological Institute at Chicago, Chicago, IL, USA
- Bruce R Donald
  - Department of Computer Science, Duke University, Durham, NC, USA; Department of Chemistry, Duke University, Durham, NC, USA; Department of Biochemistry, Duke University Medical Center, Durham, NC, USA
52
Ojewole AA, Jou JD, Fowler VG, Donald BR. BBK* (Branch and Bound Over K*): A Provable and Efficient Ensemble-Based Protein Design Algorithm to Optimize Stability and Binding Affinity Over Large Sequence Spaces. J Comput Biol 2018; 25:726-739. [PMID: 29641249 DOI: 10.1089/cmb.2017.0267] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Indexed: 01/26/2023] Open
Abstract
Computational protein design (CPD) algorithms that compute binding affinity, Ka, search for sequences with an energetically favorable free energy of binding. Recent work shows that three principles improve the biological accuracy of CPD: ensemble-based design, continuous flexibility of backbone and side-chain conformations, and provable guarantees of accuracy with respect to the input. However, previous methods that use all three design principles are single-sequence (SS) algorithms, which are very costly: linear in the number of sequences and thus exponential in the number of simultaneously mutable residues. To address this computational challenge, we introduce BBK*, a new CPD algorithm whose key innovation is the multisequence (MS) bound: BBK* efficiently computes a single provable upper bound to approximate Ka for a combinatorial number of sequences, and avoids SS computation for all provably suboptimal sequences. Thus, to our knowledge, BBK* is the first provable, ensemble-based CPD algorithm to run in time sublinear in the number of sequences. Computational experiments on 204 protein design problems show that BBK* finds the tightest binding sequences while approximating Ka for up to 10^5-fold fewer sequences than the previous state-of-the-art algorithms, which require exhaustive enumeration of sequences. Furthermore, for 51 protein-ligand design problems, BBK* provably approximates Ka up to 1982-fold faster than the previous state-of-the-art iMinDEE/A*/K* algorithm. Therefore, BBK* not only accelerates protein designs that are possible with previous provable algorithms, but also efficiently performs designs that are too large for previous methods.
Affiliation(s)
- Adegoke A Ojewole
  - Department of Computer Science, Duke University, Durham, North Carolina; Computational Biology and Bioinformatics Program, Duke University, Durham, North Carolina
- Jonathan D Jou
  - Department of Computer Science, Duke University, Durham, North Carolina
- Vance G Fowler
  - Division of Infectious Diseases, Duke University Medical Center, Durham, North Carolina
- Bruce R Donald
  - Department of Computer Science, Duke University, Durham, North Carolina; Department of Biochemistry, Duke University Medical Center, Durham, North Carolina
53
Leem J, Georges G, Shi J, Deane CM. Antibody side chain conformations are position-dependent. Proteins 2018; 86:383-392. [PMID: 29318667 DOI: 10.1002/prot.25453] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 09/26/2017] [Revised: 12/15/2017] [Accepted: 01/05/2018] [Indexed: 11/11/2022]
Abstract
Side chain prediction is an integral component of computational antibody design and structure prediction. Current antibody modelling tools use backbone-dependent rotamer libraries with conformations taken from general proteins. Here we present our antibody-specific rotamer library, where rotamers are binned according to their immunogenetics (IMGT) position, rather than their local backbone geometry. We find that for some amino acid types at certain positions, only a restricted number of side chain conformations are ever observed. Using this information, we are able to reduce the breadth of the rotamer sampling space. Based on our rotamer library, we built a side chain predictor, position-dependent antibody rotamer swapper (PEARS). On a blind test set of 95 antibody model structures, PEARS had the highest average χ1 and χ1+2 accuracy (78.7% and 64.8%) compared to three leading backbone-dependent side chain predictors. Our use of IMGT position, rather than backbone ϕ/ψ, meant that PEARS was more robust to errors in the backbone of the model structure. PEARS also achieved the lowest number of side chain-side chain clashes. PEARS is freely available as a web application at http://opig.stats.ox.ac.uk/webapps/pears.
Affiliation(s)
- Jinwoo Leem
  - Department of Statistics, University of Oxford, 24-29 St Giles, Oxford, OX1 3LB, United Kingdom
- Guy Georges
  - Pharma Research and Early Development, Large Molecule Research, Roche Innovation Center Munich, Nonnenwald 2, Penzberg, 82377, Germany
- Jiye Shi
  - Chemistry Department, UCB, 208 Bath Road, Slough, SL1 3WE, United Kingdom
- Charlotte M Deane
  - Department of Statistics, University of Oxford, 24-29 St Giles, Oxford, OX1 3LB, United Kingdom
54
|
A Critical Note on Symmetry Contact Artifacts and the Evaluation of the Quality of Homology Models. Symmetry (Basel) 2018. [DOI: 10.3390/sym10010025] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/16/2022] Open
55
In silico methods for design of biological therapeutics. Methods 2017; 131:33-65. [PMID: 28958951 DOI: 10.1016/j.ymeth.2017.09.008] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Received: 06/15/2017] [Revised: 09/21/2017] [Accepted: 09/23/2017] [Indexed: 12/18/2022] Open
Abstract
It has been twenty years since the first rationally designed small molecule drug was introduced into the market. Since then, we have progressed from designing small molecules to designing biotherapeutics. This class of therapeutics includes designed proteins, peptides and nucleic acids that could more effectively combat drug resistance and even act in cases where the disease is caused because of a molecular deficiency. Computational methods are crucial in this design exercise and this review discusses the various elements of designing biotherapeutic proteins and peptides. Many of the techniques discussed here, such as the deterministic and stochastic design methods, are generally used in protein design. We have devoted special attention to the design of antibodies and vaccines. In addition to the methods for designing these molecules, we have included a comprehensive list of all biotherapeutics approved for clinical use. Also included is an overview of methods that predict the binding affinity, cell penetration ability, half-life, solubility, immunogenicity and toxicity of the designed therapeutics. Biotherapeutics are only going to grow in clinical importance and are set to herald a new generation of disease management and cure.
56
Sindhikara D, Spronk SA, Day T, Borrelli K, Cheney DL, Posy SL. Improving Accuracy, Diversity, and Speed with Prime Macrocycle Conformational Sampling. J Chem Inf Model 2017; 57:1881-1894. [DOI: 10.1021/acs.jcim.7b00052] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Indexed: 01/25/2023]
Affiliation(s)
- Dan Sindhikara
  - Schrödinger, Inc., 120 West 45th Street, 17th Floor, New York, New York 10036, United States
- Steven A. Spronk
  - Bristol-Myers Squibb Research and Development, Computer-Assisted Drug Design, Molecular Discovery Technologies, P.O. Box 5400, Princeton, New Jersey 08543, United States
- Tyler Day
  - Schrödinger, Inc., 120 West 45th Street, 17th Floor, New York, New York 10036, United States
- Ken Borrelli
  - Schrödinger, Inc., 120 West 45th Street, 17th Floor, New York, New York 10036, United States
- Daniel L. Cheney
  - Bristol-Myers Squibb Research and Development, Computer-Assisted Drug Design, Molecular Discovery Technologies, P.O. Box 5400, Princeton, New Jersey 08543, United States
- Shana L. Posy
  - Bristol-Myers Squibb Research and Development, Computer-Assisted Drug Design, Molecular Discovery Technologies, P.O. Box 5400, Princeton, New Jersey 08543, United States
57
Towards designing new nano-scale protein architectures. Essays Biochem 2017; 60:315-324. [PMID: 27903819 DOI: 10.1042/ebc20160018] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 01/21/2016] [Revised: 08/11/2016] [Accepted: 08/18/2016] [Indexed: 11/17/2022]
Abstract
The complexity of designed bionano-scale architectures is rapidly increasing mainly due to the expanding field of DNA-origami technology and accurate protein design approaches. The major advantage offered by polypeptide nanostructures compared with most other polymers resides in their highly programmable complexity. Proteins allow in vivo formation of well-defined structures with a precise spatial arrangement of functional groups, providing extremely versatile nano-scale scaffolds. Extending beyond existing proteins that perform a wide range of functions in biological systems, it became possible in the last few decades to engineer and predict properties of completely novel protein folds, opening the field of protein nanostructure design. This review offers an overview on rational and computational design approaches focusing on the main achievements of novel protein nanostructure design.
58
Pagadala NS, Syed K, Tuszynski J. Software for molecular docking: a review. Biophys Rev 2017; 9:91-102. [PMID: 28510083 DOI: 10.1007/s12551-016-0247-1] [Citation(s) in RCA: 654] [Impact Index Per Article: 93.4] [Received: 11/13/2016] [Accepted: 12/27/2016] [Indexed: 11/26/2022] Open
Abstract
Molecular docking methodology explores the behavior of small molecules in the binding site of a target protein. As more protein structures are determined experimentally using X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy, molecular docking is increasingly used as a tool in drug discovery. Docking against homology-modeled targets also becomes possible for proteins whose structures are not known. With the docking strategies, the druggability of the compounds and their specificity against a particular target can be calculated for further lead optimization processes. Molecular docking programs perform a search algorithm in which the conformation of the ligand is evaluated recursively until the convergence to the minimum energy is reached. Finally, an affinity scoring function, ΔG [U total in kcal/mol], is employed to rank the candidate poses as the sum of the electrostatic and van der Waals energies. The driving forces for these specific interactions in biological systems aim toward complementarities between the shape and electrostatics of the binding site surfaces and the ligand or substrate.
Affiliation(s)
- Nataraj S Pagadala
  - Department of Medical Microbiology and Immunology, Li Ka Shing Institute of Virology, 6-020 Katz Group Centre, University of Alberta, Edmonton, Alberta, T6G 2E1, Canada
- Khajamohiddin Syed
  - Unit for Drug Discovery Research, Department of Health Sciences, Faculty of Health and Environmental Sciences, Central University of Technology, Bloemfontein, 9300, Free State, South Africa
- Jack Tuszynski
  - Department of Experimental Oncology, Cross Cancer Institute, Edmonton, Alberta, Canada
  - Department of Physics, University of Alberta, Edmonton, Alberta, Canada
59
Ojewole A, Lowegard A, Gainza P, Reeve SM, Georgiev I, Anderson AC, Donald BR. OSPREY Predicts Resistance Mutations Using Positive and Negative Computational Protein Design. Methods Mol Biol 2017; 1529:291-306. [PMID: 27914058 PMCID: PMC5192561 DOI: 10.1007/978-1-4939-6637-0_15] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 06/06/2023]
Abstract
Drug resistance in protein targets is an increasingly common phenomenon that reduces the efficacy of both existing and new antibiotics. However, knowledge of future resistance mutations during pre-clinical phases of drug development would enable the design of novel antibiotics that are robust against not only known resistant mutants, but also against those that have not yet been clinically observed. Computational structure-based protein design (CSPD) is a transformative field that enables the prediction of protein sequences with desired biochemical properties such as binding affinity and specificity to a target. The use of CSPD to predict previously unseen resistance mutations represents one of the frontiers of computational protein design. In a recent study (Reeve et al. Proc Natl Acad Sci U S A 112(3):749-754, 2015), we used our OSPREY (Open Source Protein REdesign for You) suite of CSPD algorithms to prospectively predict resistance mutations that arise in the active site of the dihydrofolate reductase enzyme from methicillin-resistant Staphylococcus aureus (SaDHFR) in response to selective pressure from an experimental competitive inhibitor. We demonstrated that our top predicted candidates are indeed viable resistant mutants. Since that study, we have significantly enhanced the capabilities of OSPREY with not only improved modeling of backbone flexibility, but also efficient multi-state design, fast sparse approximations, partitioned continuous rotamers for more accurate energy bounds, and a computationally efficient representation of molecular-mechanics and quantum-mechanical energy functions. Here, using SaDHFR as an example, we present a protocol for resistance prediction using the latest version of OSPREY. Specifically, we show how to use a combination of positive and negative design to predict active site escape mutations that maintain the enzyme's catalytic function but selectively ablate binding of an inhibitor.
Affiliation(s)
- Adegoke Ojewole
  - Program in Computational Biology and Bioinformatics, Duke University, Durham, NC, 27708, USA
- Anna Lowegard
  - Program in Computational Biology and Bioinformatics, Duke University, Durham, NC, 27708, USA
- Pablo Gainza
  - Department of Computer Science, Duke University, Durham, NC, 27708, USA
- Stephanie M Reeve
  - Department of Pharmaceutical Sciences, University of Connecticut, Storrs, CT, 06269, USA
- Ivelin Georgiev
  - Department of Computer Science, Duke University, Durham, NC, 27708, USA
  - Vaccine Research Center, National Institute of Allergy and Infectious Diseases, Bethesda, MD, 20892, USA
- Amy C Anderson
  - Department of Pharmaceutical Sciences, University of Connecticut, Storrs, CT, 06269, USA
- Bruce R Donald
  - Department of Computer Science, Duke University, Durham, NC, 27708, USA
  - Department of Biochemistry, Duke University, Durham, NC, 27708, USA
  - Department of Chemistry, Duke University, Durham, NC, 27708, USA
60
Abstract
Molecular docking is a key tool in structural biology and computer-assisted drug design. Molecular docking is a method which predicts the preferred orientation of a ligand when bound in an active site to form a stable complex. It is the most common method used as a structure-based drug design. Here, the authors intend to discuss the various types of docking methods and their development and applications in modern drug discovery. The important basic theories such as sampling algorithm and scoring functions have been discussed briefly. The performances of the different available docking software have also been discussed. This chapter also includes some application examples of docking studies in modern drug discovery such as targeted drug delivery using carbon nanotubes, docking of nucleic acids to find the binding modes and a comparative study between high-throughput screening and structure-based virtual screening.
61
Traoré S, Allouche D, André I, Schiex T, Barbe S. Deterministic Search Methods for Computational Protein Design. Methods Mol Biol 2017; 1529:107-123. [PMID: 27914047 DOI: 10.1007/978-1-4939-6637-0_4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 06/06/2023]
Abstract
One main challenge in Computational Protein Design (CPD) lies in the exploration of the amino-acid sequence space, while considering, to some extent, side chain flexibility. The exorbitant size of the search space urges for the development of efficient exact deterministic search methods enabling identification of low-energy sequence-conformation models, corresponding either to the global minimum energy conformation (GMEC) or an ensemble of guaranteed near-optimal solutions. In contrast to stochastic local search methods that are not guaranteed to find the GMEC, exact deterministic approaches always identify the GMEC and prove its optimality in finite but exponential worst-case time. After a brief overview on these two classes of methods, we discuss the grounds and merits of four deterministic methods that have been applied to solve CPD problems. These approaches are based either on the Dead-End-Elimination theorem combined with A* algorithm (DEE/A*), on Cost Function Networks algorithms (CFN), on Integer Linear Programming solvers (ILP) or on Markov Random Fields solvers (MRF). The way two of these methods (DEE/A* and CFN) can be used in practice to identify low-energy sequence-conformation models starting from a pairwise decomposed energy matrix is detailed in this review.
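As a rough illustration of what "starting from a pairwise decomposed energy matrix" can mean in practice, the sketch below prunes rotamers with the Goldstein form of dead-end elimination and then finds the GMEC. It is a minimal toy under stated assumptions, not the authors' implementation: the energy values in `E1` and `E2` are made up, all function names are hypothetical, and plain exhaustive enumeration stands in for A* search.

```python
from itertools import product

def pair_energy(E2, i, r, j, s):
    """Pairwise energy of rotamer r at position i with rotamer s at position j."""
    return E2[(i, j)][r][s] if i < j else E2[(j, i)][s][r]

def total_energy(assign, E1, E2):
    """Sum of one-body and two-body terms for one full rotamer assignment."""
    n = len(assign)
    e = sum(E1[i][assign[i]] for i in range(n))
    e += sum(pair_energy(E2, i, assign[i], j, assign[j])
             for i in range(n) for j in range(i + 1, n))
    return e

def goldstein_dee(E1, E2):
    """Prune rotamer r at position i if some alive rotamer t always beats it."""
    n = len(E1)
    alive = [set(range(len(E1[i]))) for i in range(n)]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                for t in sorted(alive[i] - {r}):
                    # Lower bound on E(r) - E(t) over all alive neighbor rotamers.
                    gap = E1[i][r] - E1[i][t]
                    for j in range(n):
                        if j != i:
                            gap += min(pair_energy(E2, i, r, j, s)
                                       - pair_energy(E2, i, t, j, s)
                                       for s in alive[j])
                    if gap > 0:  # r can never be part of the GMEC
                        alive[i].discard(r)
                        changed = True
                        break
    return alive

def gmec(E1, E2, candidates):
    """Exhaustive search over the candidate rotamers (a stand-in for A*)."""
    return min(product(*[sorted(c) for c in candidates]),
               key=lambda a: total_energy(a, E1, E2))

# Made-up energies: 3 positions with 2, 3, and 2 rotamers respectively.
E1 = [[0.0, 1.0], [0.5, 0.0, 2.0], [0.0, 0.3]]
E2 = {
    (0, 1): [[0.0, 0.2, 1.0], [0.5, 0.1, 0.0]],
    (0, 2): [[0.0, 0.4], [0.3, 0.0]],
    (1, 2): [[0.2, 0.0], [0.0, 0.1], [1.0, 1.5]],
}

alive = goldstein_dee(E1, E2)
best = gmec(E1, E2, alive)
print(alive, best, total_energy(best, E1, E2))
```

Because a rotamer is removed only when the bound is strictly positive, the minimum energy over the pruned space equals the minimum over the full space; the pruning only shrinks the enumeration that follows.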
Affiliation(s)
- Seydou Traoré
  - INSA, UPS, INP, Université de Toulouse, 135 Avenue de Rangueil, 31077, Toulouse, France
  - Laboratoire d'Ingénierie des Systèmes Biologiques et des Procédés - INSA, INRA, UMR792, 31400, Toulouse, France
  - CNRS, UMR5504, 31400, Toulouse, France
- David Allouche
  - Unité de Mathématiques et Informatique de Toulouse, UR 875, INRA, 31320, Castanet Tolosan, France
- Isabelle André
  - INSA, UPS, INP, Université de Toulouse, 135 Avenue de Rangueil, 31077, Toulouse, France
  - Laboratoire d'Ingénierie des Systèmes Biologiques et des Procédés - INSA, INRA, UMR792, 31400, Toulouse, France
  - CNRS, UMR5504, 31400, Toulouse, France
- Thomas Schiex
  - Unité de Mathématiques et Informatique de Toulouse, UR 875, INRA, 31320, Castanet Tolosan, France
- Sophie Barbe
  - INSA, UPS, INP, Université de Toulouse, 135 Avenue de Rangueil, 31077, Toulouse, France
  - Laboratoire d'Ingénierie des Systèmes Biologiques et des Procédés - INSA, INRA, UMR792, 31400, Toulouse, France
  - CNRS, UMR5504, 31400, Toulouse, France
62
Abstract
Computational protein design (CPD), a yet evolving field, includes computer-aided engineering for partial or full de novo designs of proteins of interest. Designs are defined by a requested structure, function, or working environment. This chapter describes the birth and maturation of the field by presenting 101 CPD examples in a chronological order emphasizing achievements and pending challenges. Integrating these aspects presents the plethora of CPD approaches with the hope of providing a "CPD 101". These reflect on the broader structural bioinformatics and computational biophysics field and include: (1) integration of knowledge-based and energy-based methods, (2) hierarchical designated approach towards local, regional, and global motifs and the integration of high- and low-resolution design schemes that fit each such region, (3) systematic differential approaches towards different protein regions, (4) identification of key hot-spot residues and the relative effect of remote regions, (5) assessment of shape-complementarity, electrostatics and solvation effects, (6) integration of thermal plasticity and functional dynamics, (7) negative design, (8) systematic integration of experimental approaches, (9) objective cross-assessment of methods, and (10) successful ranking of potential designs. Future challenges also include dissemination of CPD software to the general use of life-sciences researchers and the emphasis of success within an in vivo milieu. CPD increases our understanding of protein structure and function and the relationships between the two along with the application of such know-how for the benefit of mankind. Applied aspects range from biological drugs, via healthier and tastier food products to nanotechnology and environmentally friendly enzymes replacing toxic chemicals utilized in the industry.
63
Abstract
Computational structure-based protein design (CSPD) is an important problem in computational biology, which aims to design or improve a prescribed protein function based on a protein structure template. It provides a practical tool for real-world protein engineering applications. A popular CSPD method that guarantees to find the global minimum energy solution (GMEC) is to combine both dead-end elimination (DEE) and A* tree search algorithms. However, in this framework, the A* search algorithm can run in exponential time in the worst case, which may become the computation bottleneck of large-scale computational protein design process. To address this issue, we extend and add a new module to the OSPREY program that was previously developed in the Donald lab (Gainza et al., Methods Enzymol 523:87, 2013) to implement a GPU-based massively parallel A* algorithm for improving protein design pipeline. By exploiting the modern GPU computational framework and optimizing the computation of the heuristic function for A* search, our new program, called gOSPREY, can provide up to four orders of magnitude speedups in large protein design cases with a small memory overhead comparing to the traditional A* search algorithm implementation, while still guaranteeing the optimality. In addition, gOSPREY can be configured to run in a bounded-memory mode to tackle the problems in which the conformation space is too large and the global optimal solution cannot be computed previously. Furthermore, the GPU-based A* algorithm implemented in the gOSPREY program can be combined with the state-of-the-art rotamer pruning algorithms such as iMinDEE (Gainza et al., PLoS Comput Biol 8:e1002335, 2012) and DEEPer (Hallen et al., Proteins 81:18-39, 2013) to also consider continuous backbone and side-chain flexibility.
Affiliation(s)
- Yichao Zhou
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, P. R. China
- Bruce R Donald
  - Department of Computer Science, Duke University, Durham, NC, USA
  - Department of Biochemistry, Duke University Medical Center, Durham, NC, USA
- Jianyang Zeng
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, P. R. China
64
Abstract
Protein-protein interactions play critical roles in essentially every cellular process. These interactions are often mediated by protein interaction domains that enable proteins to recognize their interaction partners, often by binding to short peptide motifs. For example, PDZ domains, which are among the most common protein interaction domains in the human proteome, recognize specific linear peptide sequences that are often at the C-terminus of other proteins. Determining the set of peptide sequences that a protein interaction domain binds, or its "peptide specificity," is crucial for understanding its cellular function, and predicting how mutations impact peptide specificity is important for elucidating the mechanisms underlying human diseases. Moreover, engineering novel cellular functions for synthetic biology applications, such as the biosynthesis of biofuels or drugs, requires the design of protein interaction specificity to avoid crosstalk with native metabolic and signaling pathways. The ability to accurately predict and design protein-peptide interaction specificity is therefore critical for understanding and engineering biological function. One approach that has recently been employed toward accomplishing this goal is computational protein design. This chapter provides an overview of recent methodological advances in computational protein design and highlights examples of how these advances can enable increased accuracy in predicting and designing peptide specificity.
Affiliation(s)
- Noah Ollikainen
  - Division of Biology and Biological Engineering, California Institute of Technology, 1200 East California Blvd., Pasadena, CA, 91125, USA
65
Kumar A, Ranbhor R, Patel K, Ramakrishnan V, Durani S. Automated protein design: Landmarks and operational principles. Progress in Biophysics and Molecular Biology 2016; 125:24-35. [PMID: 27979438 DOI: 10.1016/j.pbiomolbio.2016.12.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 09/01/2016] [Accepted: 12/06/2016] [Indexed: 11/25/2022]
Abstract
Protein design has an eventful history spanning over three decades, with a handful of success stories reported, and numerous failures not reported. Design practices have benefited tremendously from improvements in computer hardware and advances in scientific algorithms. Though the protein folding problem still remains unsolved, the possibility of having multiple sequence solutions for a single fold makes protein design a more tractable problem than protein folding. One of the most significant advancements in this area is the implementation of automated design algorithms on pre-defined templates or completely new folds, optimized through deterministic and heuristic search algorithms. This progress report provides a succinct presentation of important landmarks in automated design attempts, followed by a brief account of operational principles in automated design methods.
Affiliation(s)
- Anil Kumar
  - Department of Chemistry, University of Toronto, ON, M5S3H6, Canada
- Vibin Ramakrishnan
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Guwahati, 781039, India
- Susheel Durani
  - Department of Chemistry, Indian Institute of Technology, Bombay, 400076, India
66
Computational protein design with backbone plasticity. Biochem Soc Trans 2016; 44:1523-1529. [PMID: 27911735 PMCID: PMC5264498 DOI: 10.1042/bst20160155] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 06/03/2016] [Revised: 08/01/2016] [Accepted: 08/03/2016] [Indexed: 11/17/2022]
Abstract
The computational algorithms used in the design of artificial proteins have become increasingly sophisticated in recent years, producing a series of remarkable successes. The most dramatic of these is the de novo design of artificial enzymes. The majority of these designs have reused naturally occurring protein structures as ‘scaffolds’ onto which novel functionality can be grafted without having to redesign the backbone structure. The incorporation of backbone flexibility into protein design is a much more computationally challenging problem due to the greatly increased search space, but promises to remove the limitations of reusing natural protein scaffolds. In this review, we outline the principles of computational protein design methods and discuss recent efforts to consider backbone plasticity in the design process.
67
Au L, Green DF. Direct Calculation of Protein Fitness Landscapes through Computational Protein Design. Biophys J 2016; 110:75-84. [PMID: 26745411 DOI: 10.1016/j.bpj.2015.11.029] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 06/25/2015] [Revised: 11/03/2015] [Accepted: 11/16/2015] [Indexed: 11/24/2022] Open
Abstract
Naturally selected amino-acid sequences or experimentally derived ones are often the basis for understanding how protein three-dimensional conformation and function are determined by primary structure. Such sequences for a protein family comprise only a small fraction of all possible variants, however, representing the fitness landscape with limited scope. Explicitly sampling and characterizing alternative, unexplored protein sequences would directly identify fundamental reasons for sequence robustness (or variability), and we demonstrate that computational methods offer an efficient mechanism toward this end, on a large scale. The dead-end elimination and A* search algorithms were used here to find all low-energy single mutant variants, and corresponding structures of a G-protein heterotrimer, to measure changes in structural stability and binding interactions to define a protein fitness landscape. We established consistency between these algorithms with known biophysical and evolutionary trends for amino-acid substitutions, and could thus recapitulate known protein side-chain interactions and predict novel ones.
Affiliation(s)
- Loretta Au
  - Department of Statistics, The University of Chicago, Chicago, Illinois
- David F Green
  - Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York
68
Hallen MA, Jou JD, Donald BR. LUTE (Local Unpruned Tuple Expansion): Accurate Continuously Flexible Protein Design with General Energy Functions and Rigid Rotamer-Like Efficiency. J Comput Biol 2016; 24:536-546. [PMID: 27681371 DOI: 10.1089/cmb.2016.0136] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 01/03/2023] Open
Abstract
Most protein design algorithms search over discrete conformations and an energy function that is residue-pairwise, that is, a sum of terms that depend on the sequence and conformation of at most two residues. Although modeling of continuous flexibility and of non-residue-pairwise energies significantly increases the accuracy of protein design, previous methods to model these phenomena add a significant asymptotic cost to design calculations. We now remove this cost by modeling continuous flexibility and non-residue-pairwise energies in a form suitable for direct input to highly efficient, discrete combinatorial optimization algorithms such as DEE/A* or branch-width minimization. Our novel algorithm performs a local unpruned tuple expansion (LUTE), which can efficiently represent both continuous flexibility and general, possibly nonpairwise energy functions to an arbitrary level of accuracy using a discrete energy matrix. We show using 47 design calculation test cases that LUTE provides a dramatic speedup in both single-state and multistate continuously flexible designs.
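The notion of a residue-pairwise energy over discrete conformations can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the names `pairwise_energy` and `brute_force_gmec` and the toy energy tables are hypothetical, and the exhaustive search stands in for the far more efficient DEE/A* style algorithms the abstract refers to. It is not the LUTE method itself.

```python
from itertools import product

def pairwise_energy(conf, one_body, two_body):
    """Residue-pairwise energy: one-body terms plus terms for each residue pair.

    conf[i] is the index of the discrete conformation (rotamer) chosen at residue i.
    """
    n = len(conf)
    total = sum(one_body[i][conf[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            total += two_body[(i, j)][conf[i]][conf[j]]
    return total

def brute_force_gmec(one_body, two_body):
    """Exhaustively enumerate all assignments and return the lowest-energy one.

    Exponential in the number of residues; usable only on toy problems.
    """
    choices = [range(len(terms)) for terms in one_body]
    return min(product(*choices),
               key=lambda conf: pairwise_energy(conf, one_body, two_body))

# Hypothetical toy problem: 3 residues, 2 rotamers each (made-up energies).
one_body = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.2]]
two_body = {
    (0, 1): [[2.0, 0.0], [0.0, 0.0]],
    (0, 2): [[0.0, 0.0], [0.0, 0.0]],
    (1, 2): [[0.0, 1.0], [0.0, 0.0]],
}
best = brute_force_gmec(one_body, two_body)
best_energy = pairwise_energy(best, one_body, two_body)
```

Because every term depends on at most two residues, the whole energy function fits in a matrix of one-body and two-body entries, which is the kind of input discrete combinatorial optimizers expect.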
Affiliation(s)
- Mark A Hallen
  - Department of Computer Science, Levine Science Research Center, Duke University, Durham, North Carolina
- Jonathan D Jou
  - Department of Computer Science, Levine Science Research Center, Duke University, Durham, North Carolina
- Bruce R Donald
  - Department of Computer Science, Levine Science Research Center, Duke University, Durham, North Carolina
  - Department of Chemistry, Duke University, Durham, North Carolina
  - Department of Biochemistry, Duke University Medical Center, Durham, North Carolina
69
Allouche D, Bessiere C, Boizumault P, de Givry S, Gutierrez P, Lee JH, Leung KL, Loudni S, Métivier JP, Schiex T, Wu Y. Tractability-preserving transformations of global cost functions. Artif Intell 2016. [DOI: 10.1016/j.artint.2016.06.005] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]
70
Pan Y, Dong Y, Zhou J, Hallen M, Donald BR, Zeng J, Xu W. cOSPREY: A Cloud-Based Distributed Algorithm for Large-Scale Computational Protein Design. J Comput Biol 2016; 23:737-49. [PMID: 27154509 PMCID: PMC5586165 DOI: 10.1089/cmb.2015.0234] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/14/2023] Open
Abstract
Finding the global minimum energy conformation (GMEC) of a huge combinatorial search space is the key challenge in computational protein design (CPD) problems. Traditional algorithms lack a scalable and efficient distributed design scheme, preventing researchers from taking full advantage of current cloud infrastructures. We design cloud OSPREY (cOSPREY), an extension to a widely used protein design software OSPREY, to allow the original design framework to scale to the commercial cloud infrastructures. We propose several novel designs to integrate both algorithm and system optimizations, such as GMEC-specific pruning, state search partitioning, asynchronous algorithm state sharing, and fault tolerance. We evaluate cOSPREY on three different cloud platforms using different technologies and show that it can solve a number of large-scale protein design problems that have not been possible with previous approaches.
Affiliation(s)
- Yuchao Pan
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Yuxi Dong
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Jingtian Zhou
  - Department of Pharmacology and Pharmaceutical Sciences, Tsinghua University, Beijing, China
- Mark Hallen
  - Department of Computer Science, Duke University, Durham, North Carolina
  - Department of Biochemistry, Duke University Medical Center, Durham, North Carolina
- Bruce R. Donald
  - Department of Computer Science, Duke University, Durham, North Carolina
  - Department of Biochemistry, Duke University Medical Center, Durham, North Carolina
- Jianyang Zeng
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Wei Xu
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
71
Koh SK, Ananthasuresh GK, Vishveshwara S. A Deterministic Optimization Approach to Protein Sequence Design Using Continuous Models. Int J Rob Res 2016. [DOI: 10.1177/0278364905050354] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 11/16/2022]
Abstract
Determining the sequence of amino acid residues in a heteropolymer chain of a protein with a given conformation is a discrete combinatorial problem that is not generally amenable for gradient-based continuous optimization algorithms. In this paper we present a new approach to this problem using continuous models. In this modeling, continuous “state functions” are proposed to designate the type of each residue in the chain. Such a continuous model helps define a continuous sequence space in which a chosen criterion is optimized to find the most appropriate sequence. Searching a continuous sequence space using a deterministic optimization algorithm makes it possible to find the optimal sequences with much less computation than many other approaches. The computational efficiency of this method is further improved by combining it with a graph spectral method, which explicitly takes into account the topology of the desired conformation and also helps make the combined method more robust. The continuous modeling used here appears to have additional advantages in mimicking the folding pathways and in creating the energy landscapes that help find sequences with high stability and kinetic accessibility. To illustrate the new approach, a widely used simplifying assumption is made by considering only two types of residues: hydrophobic (H) and polar (P). Self-avoiding compact lattice models are used to validate the method with known results in the literature and data that can be practically obtained by exhaustive enumeration on a desktop computer. We also present examples of sequence design for the HP models of some real proteins, which are solved in less than five minutes on a single-processor desktop computer. Some open issues and future extensions are noted.
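To make the two-letter HP simplification concrete, here is a minimal sketch of how a sequence can be scored on a fixed self-avoiding lattice conformation. The abstract does not state its energy function, so the contact energy used here (minus one per non-bonded H-H lattice contact) is an assumption taken from the standard HP lattice model, and the function name `hp_energy` and the 2D square lattice are hypothetical choices for illustration. This is not the continuous state-function method of the paper.

```python
def hp_energy(sequence, coords):
    """Score an HP sequence on a fixed 2D square-lattice conformation.

    Assumed energy: -1 for every pair of H residues that are lattice
    neighbours but not consecutive along the chain.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and conformation must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    energy = 0
    for i in range(len(sequence)):
        # j starts at i + 2 so chain-bonded neighbours are skipped
        for j in range(i + 2, len(sequence)):
            if sequence[i] == "H" and sequence[j] == "H":
                dx = abs(coords[i][0] - coords[j][0])
                dy = abs(coords[i][1] - coords[j][1])
                if dx + dy == 1:  # adjacent lattice sites
                    energy -= 1
    return energy

# A four-residue chain folded into a unit square, and the same chain stretched out.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
line = [(0, 0), (1, 0), (2, 0), (3, 0)]
folded_score = hp_energy("HPPH", square)
extended_score = hp_energy("HPPH", line)
```

On the square, the first and last residues are lattice neighbours, so placing H at both ends earns a contact that the extended chain cannot form.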
Affiliation(s)
- Sung K. Koh
  - Mechanical Engineering and Applied Mechanics, University of Pennsylvania, Philadelphia, 19104-6315, USA
- G. K. Ananthasuresh
  - Mechanical Engineering and Applied Mechanics, University of Pennsylvania, Philadelphia, 19104-6315, USA and Mechanical Engineering, Indian Institute of Science, Bangalore 560 012, India
72
Swoboda P, Shekhovtsov A, Kappes JH, Schnorr C, Savchynskyy B. Partial Optimality by Pruning for MAP-Inference with General Graphical Models. IEEE Trans Pattern Anal Mach Intell 2016; 38:1370-1382. [PMID: 26468978 DOI: 10.1109/tpami.2015.2484327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
We consider the energy minimization problem for undirected graphical models, also known as MAP-inference problem for Markov random fields which is NP-hard in general. We propose a novel polynomial time algorithm to obtain a part of its optimal non-relaxed integral solution. Our algorithm is initialized with variables taking integral values in the solution of a convex relaxation of the MAP-inference problem and iteratively prunes those, which do not satisfy our criterion for partial optimality. We show that our pruning strategy is in a certain sense theoretically optimal. Also empirically our method outperforms previous approaches in terms of the number of persistently labelled variables. The method is very general, as it is applicable to models with arbitrary factors of an arbitrary order and can employ any solver for the considered relaxed problem. Our method's runtime is determined by the runtime of the convex relaxation solver for the MAP-inference problem.
73
Kmiecik S, Gront D, Kolinski M, Wieteska L, Dawid AE, Kolinski A. Coarse-Grained Protein Models and Their Applications. Chem Rev 2016; 116:7898-936. [DOI: 10.1021/acs.chemrev.6b00163] [Citation(s) in RCA: 555] [Impact Index Per Article: 69.4] [Indexed: 02/07/2023]
Affiliation(s)
- Sebastian Kmiecik
  - Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
- Dominik Gront
  - Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
- Michal Kolinski
  - Bioinformatics Laboratory, Mossakowski Medical Research Center of the Polish Academy of Sciences, Pawinskiego 5, 02-106 Warsaw, Poland
- Lukasz Wieteska
  - Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
  - Department of Medical Biochemistry, Medical University of Lodz, Mazowiecka 6/8, 92-215 Lodz, Poland
- Andrzej Kolinski
  - Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
74
Xiao X, Agris PF, Hall CK. Designing peptide sequences in flexible chain conformations to bind RNA: a search algorithm combining Monte Carlo, self-consistent mean field and concerted rotation techniques. J Chem Theory Comput 2016; 11:740-52. [PMID: 26579605 DOI: 10.1021/ct5008247] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 01/20/2023]
Abstract
A search algorithm combining Monte Carlo, self-consistent mean field, and concerted rotation techniques was developed to discover peptide sequences that are reasonable HIV drug candidates due to their exceptional binding to human tRNAUUU(Lys3), the primer of HIV replication. The search algorithm allows for iteration between sequence mutations and conformation changes during sequence evolution. Searches conducted for different classes of peptides identified several potential peptide candidates. Analysis of the energy revealed that the asparagine and cysteine at residues 11 and 12 play important roles in "recognizing" tRNA(Lys3) via van der Waals interactions, contributing to binding specificity. Arginines preferentially attract the phosphate linkage via charge-charge interaction, contributing to binding affinity. Evaluation of the RNA/peptide complex's structure revealed that adding conformation changes to the search algorithm yields peptides with better binding affinity and specificity to tRNA(Lys3) than a previous mutation-only algorithm.
Affiliation(s)
- Xingqing Xiao
  - Chemical and Biomolecular Engineering Department, North Carolina State University, Raleigh, North Carolina 27695-7905, United States
- Paul F Agris
  - The RNA Institute, University at Albany, State University of New York, Albany, New York 12222, United States
- Carol K Hall
  - Chemical and Biomolecular Engineering Department, North Carolina State University, Raleigh, North Carolina 27695-7905, United States
75
Zhou Y, Wu Y, Zeng J. Computational Protein Design Using AND/OR Branch-and-Bound Search. J Comput Biol 2016; 23:439-51. [DOI: 10.1089/cmb.2015.0212] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 01/04/2023] Open
Affiliation(s)
- Yichao Zhou
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Yuexin Wu
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Jianyang Zeng
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
76
Ryu J, Lee M, Cha J, Laskowski RA, Ryu SE, Kim DS. BetaSCPWeb: side-chain prediction for protein structures using Voronoi diagrams and geometry prioritization. Nucleic Acids Res 2016; 44:W416-23. [PMID: 27151195 PMCID: PMC4987919 DOI: 10.1093/nar/gkw368] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 02/05/2016] [Accepted: 04/23/2016] [Indexed: 11/13/2022] Open
Abstract
Many applications, such as protein design, homology modeling, flexible docking, etc. require the prediction of a protein's optimal side-chain conformations from just its amino acid sequence and backbone structure. Side-chain prediction (SCP) is an NP-hard energy minimization problem. Here, we present BetaSCPWeb which efficiently computes a conformation close to optimal using a geometry-prioritization method based on the Voronoi diagram of spherical atoms. Its outputs are visual, textual and PDB file format. The web server is free and open to all users at http://voronoi.hanyang.ac.kr/betascpweb with no login requirement.
Affiliation(s)
- Joonghyun Ryu
  - Voronoi Diagram Research Center, Hanyang University, Korea
- Mokwon Lee
  - Voronoi Diagram Research Center, Hanyang University, Korea
- Jehyun Cha
  - Voronoi Diagram Research Center, Hanyang University, Korea
- Seong Eon Ryu
  - Department of Bioengineering, Hanyang University, Korea
- Deok-Soo Kim
  - School of Mechanical Engineering, Hanyang University, Korea
77
Traoré S, Roberts KE, Allouche D, Donald BR, André I, Schiex T, Barbe S. Fast search algorithms for computational protein design. J Comput Chem 2016; 37:1048-58. [PMID: 26833706 PMCID: PMC4828276 DOI: 10.1002/jcc.24290] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 04/14/2015] [Revised: 09/23/2015] [Accepted: 11/27/2015] [Indexed: 12/12/2022]
Abstract
One of the main challenges in computational protein design (CPD) is the huge size of the protein sequence and conformational space that has to be computationally explored. Recently, we showed that state-of-the-art combinatorial optimization technologies based on Cost Function Network (CFN) processing allow speeding up provable rigid backbone protein design methods by several orders of magnitude. Building on this, we improved and injected CFN technology into the well-established CPD package Osprey to allow all Osprey CPD algorithms to benefit from associated speedups. Because Osprey fundamentally relies on the ability of A* to produce conformations in increasing order of energy, we defined new A* strategies combining CFN lower bounds with a new side-chain positioning-based branching scheme. Beyond the speedups obtained in the new A*-CFN combination, this novel branching scheme enables a much faster enumeration of suboptimal sequences, far beyond what is reachable without it. Together with the immediate and important speedups provided by CFN technology, these developments directly benefit all the algorithms that previously relied on the DEE/A* combination inside Osprey and make it possible to solve larger CPD problems with provable algorithms.
Affiliation(s)
- Seydou Traoré
  - Université de Toulouse; INSA, UPS, INP; LISBP, 135 Avenue de Rangueil, F-31077 Toulouse, France
  - INRA, UMR792, Ingénierie des Systèmes Biologiques et des Procédés, F-31400 Toulouse, France
  - CNRS, UMR5504, F-31400 Toulouse, France
- Kyle E. Roberts
  - Department of Biochemistry, Department of Computer Science, Department of Chemistry, Duke University, Durham, NC, USA
- David Allouche
  - Unité de Mathématiques et Informatique Appliquées de Toulouse, UR 875, INRA, F-31320 Castanet Tolosan, France
- Bruce R. Donald
  - Department of Biochemistry, Department of Computer Science, Department of Chemistry, Duke University, Durham, NC, USA
- Isabelle André
  - Université de Toulouse; INSA, UPS, INP; LISBP, 135 Avenue de Rangueil, F-31077 Toulouse, France
  - INRA, UMR792, Ingénierie des Systèmes Biologiques et des Procédés, F-31400 Toulouse, France
  - CNRS, UMR5504, F-31400 Toulouse, France
- Thomas Schiex
  - Unité de Mathématiques et Informatique Appliquées de Toulouse, UR 875, INRA, F-31320 Castanet Tolosan, France
- Sophie Barbe
  - Université de Toulouse; INSA, UPS, INP; LISBP, 135 Avenue de Rangueil, F-31077 Toulouse, France
  - INRA, UMR792, Ingénierie des Systèmes Biologiques et des Procédés, F-31400 Toulouse, France
  - CNRS, UMR5504, F-31400 Toulouse, France
78
Purvine E, Monson K, Jurrus E, Star K, Baker NA. Energy Minimization of Discrete Protein Titration State Models Using Graph Theory. J Phys Chem B 2016; 120:8354-60. [PMID: 27089174 DOI: 10.1021/acs.jpcb.6b02059] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Abstract
There are several applications in computational biophysics that require the optimization of discrete interacting states, for example, amino acid titration states, ligand oxidation states, or discrete rotamer angles. Such optimization can be very time-consuming as it scales exponentially in the number of sites to be optimized. In this paper, we describe a new polynomial time algorithm for optimization of discrete states in macromolecular systems. This algorithm was adapted from image processing and uses techniques from discrete mathematics and graph theory to restate the optimization problem in terms of "maximum flow-minimum cut" graph analysis. The interaction energy graph, a graph in which vertices (amino acids) and edges (interactions) are weighted with their respective energies, is transformed into a flow network in which the value of the minimum cut in the network equals the minimum free energy of the protein and the cut itself encodes the state that achieves the minimum free energy. Because of its deterministic nature and polynomial time performance, this algorithm has the potential to allow for the ionization state of larger proteins to be discovered.
Affiliation(s)
- Nathan A Baker
  - Division of Applied Mathematics, Brown University, Providence, Rhode Island 02912, United States
79
Pakulska MM, Miersch S, Shoichet MS. Designer protein delivery: From natural to engineered affinity-controlled release systems. Science 2016; 351:aac4750. [PMID: 26989257 DOI: 10.1126/science.aac4750] [Citation(s) in RCA: 107] [Impact Index Per Article: 13.4] [Indexed: 12/11/2022]
Abstract
Exploiting binding affinities between molecules is an established practice in many fields, including biochemical separations, diagnostics, and drug development; however, using these affinities to control biomolecule release is a more recent strategy. Affinity-controlled release takes advantage of the reversible nature of noncovalent interactions between a therapeutic protein and a binding partner to slow the diffusive release of the protein from a vehicle. This process, in contrast to degradation-controlled sustained-release formulations such as poly(lactic-co-glycolic acid) microspheres, is controlled through the strength of the binding interaction, the binding kinetics, and the concentration of binding partners. In the context of affinity-controlled release--and specifically the discovery or design of binding partners--we review advances in in vitro selection and directed evolution of proteins, peptides, and oligonucleotides (aptamers), aided by computational design.
Affiliation(s)
- Malgosia M Pakulska
  - Department of Chemical Engineering and Applied Chemistry, Institute of Biomaterials and Biomedical Engineering, and Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Shane Miersch
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Molly S Shoichet
  - Department of Chemical Engineering and Applied Chemistry, Institute of Biomaterials and Biomedical Engineering, and Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
  - Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
80
Gaillard T, Panel N, Simonson T. Protein side chain conformation predictions with an MMGBSA energy function. Proteins 2016; 84:803-19. [PMID: 26948696 DOI: 10.1002/prot.25030] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 10/02/2015] [Revised: 02/22/2016] [Accepted: 02/27/2016] [Indexed: 12/17/2022]
Abstract
The prediction of protein side chain conformations from backbone coordinates is an important task in structural biology, with applications in structure prediction and protein design. It is a difficult problem due to its combinatorial nature. We study the performance of an "MMGBSA" energy function, implemented in our protein design program Proteus, which combines molecular mechanics terms, a Generalized Born and Surface Area (GBSA) solvent model, with approximations that make the model pairwise additive. Proteus is not a competitor to specialized side chain prediction programs due to its cost, but it allows protein design applications, where side chain prediction is an important step and MMGBSA an effective energy model. We predict the side chain conformations for 18 proteins. The side chains are first predicted individually, with the rest of the protein in its crystallographic conformation. Next, all side chains are predicted together. The contributions of individual energy terms are evaluated and various parameterizations are compared. We find that the GB and SA terms, with an appropriate choice of the dielectric constant and surface energy coefficients, are beneficial for single side chain predictions. For the prediction of all side chains, however, errors due to the pairwise additive approximation overcome the improvement brought by these terms. We also show the crucial contribution of side chain minimization to alleviate the rigid rotamer approximation. Even without GB and SA terms, we obtain accuracies comparable to SCWRL4, a specialized side chain prediction program. In particular, we obtain a better RMSD than SCWRL4 for core residues (at a higher cost), despite our simpler rotamer library.
Affiliation(s)
- Thomas Gaillard
  - Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, 91128, France
- Nicolas Panel
  - Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, 91128, France
- Thomas Simonson
  - Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, 91128, France
81
Maximova T, Moffatt R, Ma B, Nussinov R, Shehu A. Principles and Overview of Sampling Methods for Modeling Macromolecular Structure and Dynamics. PLoS Comput Biol 2016; 12:e1004619. [PMID: 27124275 PMCID: PMC4849799 DOI: 10.1371/journal.pcbi.1004619] [Citation(s) in RCA: 132] [Impact Index Per Article: 16.5] [Indexed: 01/08/2023] Open
Abstract
Investigation of macromolecular structure and dynamics is fundamental to understanding how macromolecules carry out their functions in the cell. Significant advances have been made toward this end in silico, with a growing number of computational methods proposed yearly to study and simulate various aspects of macromolecular structure and dynamics. This review aims to provide an overview of recent advances, focusing primarily on methods proposed for exploring the structure space of macromolecules in isolation and in assemblies for the purpose of characterizing equilibrium structure and dynamics. In addition to surveying recent applications that showcase current capabilities of computational methods, this review highlights state-of-the-art algorithmic techniques proposed to overcome challenges posed in silico by the disparate spatial and time scales accessed by dynamic macromolecules. This review is not meant to be exhaustive, as such an endeavor is impossible, but rather aims to balance breadth and depth of strategies for modeling macromolecular structure and dynamics for a broad audience of novices and experts.
Affiliation(s)
- Tatiana Maximova
  - Department of Computer Science, George Mason University, Fairfax, Virginia, United States of America
- Ryan Moffatt
  - Department of Computer Science, George Mason University, Fairfax, Virginia, United States of America
- Buyong Ma
  - Basic Science Program, Leidos Biomedical Research, Inc. Cancer and Inflammation Program, National Cancer Institute, Frederick, Maryland, United States of America
- Ruth Nussinov
  - Basic Science Program, Leidos Biomedical Research, Inc. Cancer and Inflammation Program, National Cancer Institute, Frederick, Maryland, United States of America
  - Sackler Institute of Molecular Medicine, Department of Human Genetics and Molecular Medicine, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Amarda Shehu
  - Department of Computer Science, George Mason University, Fairfax, Virginia, United States of America
  - Department of Bioengineering, George Mason University, Fairfax, Virginia, United States of America
  - School of Systems Biology, George Mason University, Manassas, Virginia, United States of America
82
Du X, Li Y, Xia YL, Ai SM, Liang J, Sang P, Ji XL, Liu SQ. Insights into Protein-Ligand Interactions: Mechanisms, Models, and Methods. Int J Mol Sci 2016; 17:ijms17020144. [PMID: 26821017 PMCID: PMC4783878 DOI: 10.3390/ijms17020144] [Citation(s) in RCA: 738] [Impact Index Per Article: 92.3] [Received: 11/09/2015] [Revised: 01/13/2016] [Accepted: 01/18/2016] [Indexed: 01/16/2023] Open
Abstract
Molecular recognition, which is the process of biological macromolecules interacting with each other or various small molecules with a high specificity and affinity to form a specific complex, constitutes the basis of all processes in living organisms. Proteins, an important class of biological macromolecules, realize their functions through binding to themselves or other molecules. A detailed understanding of the protein–ligand interactions is therefore central to understanding biology at the molecular level. Moreover, knowledge of the mechanisms responsible for the protein-ligand recognition and binding will also facilitate the discovery, design, and development of drugs. In the present review, first, the physicochemical mechanisms underlying protein–ligand binding, including the binding kinetics, thermodynamic concepts and relationships, and binding driving forces, are introduced and rationalized. Next, three currently existing protein-ligand binding models—the “lock-and-key”, “induced fit”, and “conformational selection”—are described and their underlying thermodynamic mechanisms are discussed. Finally, the methods available for investigating protein–ligand binding affinity, including experimental and theoretical/computational approaches, are introduced, and their advantages, disadvantages, and challenges are discussed.
Affiliation(s)
- Xing Du
  - Laboratory for Conservation and Utilization of Bio-Resources, Yunnan University, Kunming 650091, China
- Yi Li
  - Laboratory for Conservation and Utilization of Bio-Resources, Yunnan University, Kunming 650091, China
- Yuan-Ling Xia
  - Laboratory for Conservation and Utilization of Bio-Resources, Yunnan University, Kunming 650091, China
- Shi-Meng Ai
  - Laboratory for Conservation and Utilization of Bio-Resources, Yunnan University, Kunming 650091, China
  - Department of Applied Mathematics, Yunnan Agricultural University, Kunming 650201, China
- Jing Liang
  - Laboratory for Conservation and Utilization of Bio-Resources, Yunnan University, Kunming 650091, China
- Peng Sang
  - Laboratory for Conservation and Utilization of Bio-Resources, Yunnan University, Kunming 650091, China
  - Laboratory of Molecular Cardiology, Department of Cardiology, The First Affiliated Hospital of Kunming Medical University, Kunming 650032, China
- Xing-Lai Ji
  - Laboratory for Conservation and Utilization of Bio-Resources, Yunnan University, Kunming 650091, China
  - Key Laboratory for Tumor molecular biology of High Education in Yunnan Province, School of Life Sciences, Yunnan University, Kunming 650091, China
- Shu-Qun Liu
  - Laboratory for Conservation and Utilization of Bio-Resources, Yunnan University, Kunming 650091, China
  - Key Laboratory for Tumor molecular biology of High Education in Yunnan Province, School of Life Sciences, Yunnan University, Kunming 650091, China
83
Hallen MA, Donald BR. COMETS (Constrained Optimization of Multistate Energies by Tree Search): A Provable and Efficient Protein Design Algorithm to Optimize Binding Affinity and Specificity with Respect to Sequence. J Comput Biol 2016; 23:311-21. [PMID: 26761641 DOI: 10.1089/cmb.2015.0188] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Indexed: 11/12/2022] Open
Abstract
Practical protein design problems require designing sequences with a combination of affinity, stability, and specificity requirements. Multistate protein design algorithms model multiple structural or binding "states" of a protein to address these requirements. COMETS provides a new level of versatile, efficient, and provable multistate design. It provably returns the minimum with respect to sequence of any desired linear combination of the energies of multiple protein states, subject to constraints on other linear combinations. Thus, it can target nearly any combination of affinity (to one or multiple ligands), specificity, and stability (for multiple states if needed). Empirical calculations on 52 protein design problems showed COMETS is far more efficient than the previous state of the art for provable multistate design (exhaustive search over sequences). COMETS can handle a very wide range of protein flexibility and can enumerate a gap-free list of the best constraint-satisfying sequences in order of objective function value.
Affiliation(s)
- Mark A Hallen
  - Department of Computer Science, Levine Science Research Center, Duke University, North Carolina
  - Department of Biochemistry, Duke University Medical Center, Durham, North Carolina
- Bruce R Donald
  - Department of Computer Science, Levine Science Research Center, Duke University, North Carolina
  - Department of Biochemistry, Duke University Medical Center, Durham, North Carolina
  - Department of Chemistry, Duke University, Durham, North Carolina
84
Tak Kam VW, Goddard WA. Flat-Bottom Strategy for Improved Accuracy in Protein Side-Chain Placements. J Chem Theory Comput 2015; 4:2160-9. [PMID: 26620487 DOI: 10.1021/ct800196k] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Indexed: 11/29/2022]
Abstract
We present a new strategy for protein side-chain placement that uses flat-bottom potentials for rotamer scoring. The extent of the flat bottom depends on the coarseness of the rotamer library and is optimized for libraries ranging from diversities of 0.2 Å to 5.0 Å. The parameters reported here were optimized for forcefields using Lennard-Jones 12-6 van der Waals potential with DREIDING parameters but are expected to be similar for AMBER, CHARMM, and other forcefields. This Side-Chain Rotamer Excitation Analysis Method is implemented in the SCREAM software package. Similar scoring function strategies should be useful for ligand docking, virtual ligand screening, and protein folding applications.
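The idea of a flat-bottom potential can be illustrated with a short Python sketch. The abstract does not give the functional form, so the version below is only one plausible construction, assumed for illustration: the inner (repulsive) wall of a Lennard-Jones 12-6 curve is shifted inward by `delta`, and the gap is filled with a constant equal to the well depth. The function names and parameters (`r0`, `depth`, `delta`) are hypothetical and are not taken from SCREAM.

```python
def lj_12_6(r, r0, depth):
    """Lennard-Jones 12-6 energy with its minimum of -depth at r = r0."""
    x = (r0 / r) ** 6
    return depth * (x * x - 2.0 * x)


def flat_bottom_lj(r, r0, depth, delta):
    """LJ 12-6 with the well widened by a flat region of width delta.

    Assumed form (not from the paper): unchanged beyond r0, constant at
    -depth on [r0 - delta, r0], and the original repulsive wall shifted
    inward by delta for shorter distances.
    """
    if r >= r0:
        return lj_12_6(r, r0, depth)
    if r >= r0 - delta:
        return -depth
    return lj_12_6(r + delta, r0, depth)
```

With this form, a rotamer that places two atoms slightly closer than `r0` is not penalized, which is the tolerance a coarse rotamer library needs. A larger `delta` would correspond to a coarser library, in line with the abstract's statement that the extent of the flat bottom depends on library coarseness.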
Affiliation(s)
- Victor Wai Tak Kam
- Materials and Process Simulation Center (MC-139-74), California Institute of Technology, Pasadena, California 91125
- William A Goddard
- Materials and Process Simulation Center (MC-139-74), California Institute of Technology, Pasadena, California 91125

85
Simoncini D, Allouche D, de Givry S, Delmas C, Barbe S, Schiex T. Guaranteed Discrete Energy Optimization on Large Protein Design Problems. J Chem Theory Comput 2015; 11:5980-9. [DOI: 10.1021/acs.jctc.5b00594] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Indexed: 11/28/2022]
Affiliation(s)
- David Allouche
- INRA MIAT, UR 875, Castanet-Tolosan, 31326 Cedex, France
- Simon de Givry
- INRA MIAT, UR 875, Castanet-Tolosan, 31326 Cedex, France
- Céline Delmas
- INRA MIAT, UR 875, Castanet-Tolosan, 31326 Cedex, France
- Sophie Barbe
- Université de Toulouse; INSA, UPS, INP; LISBP, 135 Avenue de Rangueil, F-31077 Toulouse, France
- CNRS, UMR5504, F-31400 Toulouse, France
- INRA, UMR792 Ingénierie des Systèmes Biologiques et des Procédés, F-31400 Toulouse, France
- Thomas Schiex
- INRA MIAT, UR 875, Castanet-Tolosan, 31326 Cedex, France

86
Rosenfeld L, Shirian J, Zur Y, Levaot N, Shifman JM, Papo N. Combinatorial and Computational Approaches to Identify Interactions of Macrophage Colony-stimulating Factor (M-CSF) and Its Receptor c-FMS. J Biol Chem 2015; 290:26180-93. [PMID: 26359491 PMCID: PMC4646268 DOI: 10.1074/jbc.m115.671271] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 06/09/2015] [Revised: 09/06/2015] [Indexed: 01/06/2023]
Abstract
The molecular interactions between macrophage colony-stimulating factor (M-CSF) and the tyrosine kinase receptor c-FMS play a key role in the immune response, bone metabolism, and the development of some cancers. Because no x-ray structure is available for the human M-CSF · c-FMS complex, the binding epitope for this complex is largely unknown. Our goal was to identify the residues that are essential for binding of the human M-CSF to c-FMS. For this purpose, we used a yeast surface display (YSD) approach. We expressed a combinatorial library of monomeric M-CSF (M-CSFM) single mutants and screened this library to isolate variants with reduced affinity for c-FMS using FACS. Sequencing yielded a number of single M-CSFM variants with mutations both in the direct binding interface and distant from the binding site. In addition, we used computational modeling to map the identified mutations onto the M-CSFM structure and to classify the mutations into three groups as follows: those that significantly decrease protein stability; those that destroy favorable intermolecular interactions; and those that decrease affinity through allosteric effects. To validate the YSD and computational data, M-CSFM and three variants were produced as soluble proteins; their affinity and structure were analyzed; and very good correlations with both YSD data and computational predictions were obtained. By identifying the M-CSFM residues critical for M-CSF · c-FMS interactions, we have laid down the basis for a deeper understanding of the M-CSF · c-FMS signaling mechanism and for the development of target-specific therapeutic agents with the ability to sterically occlude the M-CSF·c-FMS binding interface.
Affiliation(s)
- Lior Rosenfeld
- From the Department of Biotechnology Engineering and the National Institute of Biotechnology in the Negev, and
- Jason Shirian
- the Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, Hebrew University of Jerusalem, Jerusalem 9190401, Israel
- Yuval Zur
- From the Department of Biotechnology Engineering and the National Institute of Biotechnology in the Negev, and the Department of Physiology and Cell Biology, Ben-Gurion University of the Negev, Beer-Sheva 8410501 and
- Noam Levaot
- the Department of Physiology and Cell Biology, Ben-Gurion University of the Negev, Beer-Sheva 8410501 and
- Julia M Shifman
- the Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, Hebrew University of Jerusalem, Jerusalem 9190401, Israel
- Niv Papo
- From the Department of Biotechnology Engineering and the National Institute of Biotechnology in the Negev, and

87
Huang YM, Banerjee S, Crone DE, Schenkelberg CD, Pitman DJ, Buck PM, Bystroff C. Toward Computationally Designed Self-Reporting Biosensors Using Leave-One-Out Green Fluorescent Protein. Biochemistry 2015; 54:6263-73. [PMID: 26397806 DOI: 10.1021/acs.biochem.5b00786] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Abstract
Leave-one-out green fluorescent protein (LOOn-GFP) is a circularly permuted and truncated GFP lacking the nth β-strand element. LOO7-GFP derived from the wild-type sequence (LOO7-WT) folds and reconstitutes fluorescence upon addition of β-strand 7 (S7) as an exogenous peptide. Computational protein design may be used to modify the sequence of LOO7-GFP to fit a different peptide sequence, while retaining the reconstitution activity. Here we present a computationally designed leave-one-out GFP in which wild-type strand 7 has been replaced by a 12-residue peptide (HA) from the H5 antigenic region of the Thailand strain of H5N1 influenza virus hemagglutinin. The DEEdesign software was used to generate a sequence library with mutations at 13 positions around the peptide, coding for approximately 3 × 10^5 sequence combinations. The library was coexpressed with the HA peptide in E. coli and colonies were screened for in vivo fluorescence. Glowing colonies were sequenced, and one (LOO7-HA4) with 7 mutations was purified and characterized. LOO7-HA4 folds, fluoresces in vivo and in vitro, and binds HA. However, binding results in a decrease in fluorescence instead of the expected increase, caused by the peptide-induced dissociation of a novel, glowing oligomeric complex instead of the reconstitution of the native structure. Efforts to improve binding and recover reconstitution using in vitro evolution produced colonies that glowed brighter and matured faster. Two of these were characterized. One lost all affinity for the HA peptide but glowed more brightly in the unbound oligomeric state. The other increased in affinity to the HA peptide but still did not reconstitute the fully folded state. Despite failing to fold completely, peptide binding by computational design was observed and was improved by directed evolution.
The ratio of HA to S7 binding increased from 0.0 for the wild-type sequence (no binding) to 0.01 after computational design (weak binding) and to 0.48 (comparable binding) after in vitro evolution. The novel oligomeric state is composed of an open barrel.
Affiliation(s)
- Yao-Ming Huang
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California 94158, United States

88
Roberts KE, Gainza P, Hallen MA, Donald BR. Fast gap-free enumeration of conformations and sequences for protein design. Proteins 2015; 83:1859-1877. [PMID: 26235965 DOI: 10.1002/prot.24870] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 03/24/2015] [Revised: 07/14/2015] [Accepted: 07/21/2015] [Indexed: 12/12/2022]
Abstract
Despite significant successes in structure-based computational protein design in recent years, protein design algorithms must be improved to increase the biological accuracy of new designs. Protein design algorithms search through an exponential number of protein conformations, protein ensembles, and amino acid sequences in an attempt to find globally optimal structures with a desired biological function. To improve the biological accuracy of protein designs, it is necessary to increase both the amount of protein flexibility allowed during the search and the overall size of the design, while guaranteeing that the lowest-energy structures and sequences are found. DEE/A*-based algorithms are the most prevalent provable algorithms in the field of protein design and can provably enumerate a gap-free list of low-energy protein conformations, which is necessary for ensemble-based algorithms that predict protein binding. We present two classes of algorithmic improvements to the A* algorithm that greatly increase the efficiency of A*. First, we analyze the effect of ordering the expansion of mutable residue positions within the A* tree and present a dynamic residue ordering that reduces the number of A* nodes that must be visited during the search. Second, we propose new methods to improve the conformational bounds used to estimate the energies of partial conformations during the A* search. The residue ordering techniques and improved bounds can be combined for additional increases in A* efficiency. Our enhancements enable all A*-based methods to more fully search protein conformation space, which will ultimately improve the accuracy of complex biomedically relevant designs.
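As background for the A* search discussed in this abstract, the following Python sketch runs a best-first search over rotamer assignments with a pairwise energy model and a simple admissible lower bound on partial conformations. It uses a fixed residue order and a textbook-style bound, so it illustrates only the baseline that the paper improves on, not its dynamic ordering or tighter bounds. The data layout (`self_e[i][r]` and `pair_e[(i, j)][r][s]` for `i < j`) and all function names are assumptions.

```python
import heapq


def conformation_energy(conf, self_e, pair_e):
    """Energy of the first len(conf) positions under a pairwise model."""
    k = len(conf)
    energy = sum(self_e[i][conf[i]] for i in range(k))
    energy += sum(pair_e[(i, j)][conf[i]][conf[j]]
                  for i in range(k) for j in range(i + 1, k))
    return energy


def lower_bound(conf, self_e, pair_e):
    """Assigned energy plus an optimistic estimate for unassigned positions."""
    n, k = len(self_e), len(conf)
    bound = conformation_energy(conf, self_e, pair_e)
    for j in range(k, n):
        best = float("inf")
        for r in range(len(self_e[j])):
            e = self_e[j][r]
            # interactions with positions that already have a rotamer
            e += sum(pair_e[(i, j)][conf[i]][r] for i in range(k))
            # best possible interaction with each later unassigned position
            e += sum(min(pair_e[(j, m)][r]) for m in range(j + 1, n))
            best = min(best, e)
        bound += best
    return bound


def astar_gmec(self_e, pair_e):
    """Return (energy, rotamer tuple) of the lowest-energy conformation."""
    n = len(self_e)
    heap = [(lower_bound((), self_e, pair_e), ())]
    while heap:
        f, conf = heapq.heappop(heap)
        if len(conf) == n:
            return f, conf  # bound is exact for a full assignment
        j = len(conf)  # fixed expansion order: position 0, 1, 2, ...
        for r in range(len(self_e[j])):
            child = conf + (r,)
            heapq.heappush(heap, (lower_bound(child, self_e, pair_e), child))
    return None
```

Because the bound never overestimates, the first full conformation popped from the heap is a global minimum. Continuing to pop after that point would yield conformations in non-decreasing energy order, which is the gap-free enumeration the abstract refers to.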
Affiliation(s)
- Kyle E Roberts
- Department of Computer Science, Duke University, Durham, NC
- Pablo Gainza
- Department of Computer Science, Duke University, Durham, NC
- Mark A Hallen
- Department of Computer Science, Duke University, Durham, NC
- Bruce R Donald
- Department of Computer Science, Duke University, Durham, NC; Department of Biochemistry, Duke University Medical Center, Durham, NC; Department of Chemistry, Duke University, Durham, NC

89
LuCore SD, Litman JM, Powers KT, Gao S, Lynn AM, Tollefson WTA, Fenn TD, Washington MT, Schnieders MJ. Dead-End Elimination with a Polarizable Force Field Repacks PCNA Structures. Biophys J 2015; 109:816-26. [PMID: 26287633 PMCID: PMC4547145 DOI: 10.1016/j.bpj.2015.06.062] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 03/16/2015] [Revised: 06/07/2015] [Accepted: 06/29/2015] [Indexed: 11/15/2022]
Abstract
A balance of van der Waals, electrostatic, and hydrophobic forces drive the folding and packing of protein side chains. Although such interactions between residues are often approximated as being pairwise additive, in reality, higher-order many-body contributions that depend on environment drive hydrophobic collapse and cooperative electrostatics. Beginning from dead-end elimination, we derive the first algorithm, to our knowledge, capable of deterministic global repacking of side chains compatible with many-body energy functions. The approach is applied to seven PCNA x-ray crystallographic data sets with resolutions 2.5-3.8 Å (mean 3.0 Å) using an open-source software. While PDB_REDO models average an Rfree value of 29.5% and MOLPROBITY score of 2.71 Å (77th percentile), dead-end elimination with the polarizable AMOEBA force field lowered Rfree by 2.8-26.7% and improved mean MOLPROBITY score to atomic resolution at 1.25 Å (100th percentile). For structural biology applications that depend on side-chain repacking, including x-ray refinement, homology modeling, and protein design, the accuracy limitations of pairwise additivity can now be eliminated via polarizable or quantum mechanical potentials.
Affiliation(s)
- Stephen D LuCore
- Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa
- Jacob M Litman
- Department of Biochemistry, University of Iowa, Iowa City, Iowa
- Kyle T Powers
- Department of Biochemistry, University of Iowa, Iowa City, Iowa
- Shibo Gao
- Department of Biochemistry, University of Iowa, Iowa City, Iowa
- Ava M Lynn
- Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa
- Michael J Schnieders
- Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa; Department of Biochemistry, University of Iowa, Iowa City, Iowa.

90
Kafurke U, Erijman A, Aizner Y, Shifman JM, Eichler J. Synthetic peptides mimicking the binding site of human acetylcholinesterase for its inhibitor fasciculin 2. J Pept Sci 2015. [DOI: 10.1002/psc.2797] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/16/2022]
Affiliation(s)
- Uwe Kafurke
- Department of Chemistry and Pharmacy; University of Erlangen-Nuremberg; Schuhstr. 19 91052 Erlangen Germany
- Ariel Erijman
- Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences; The Hebrew University of Jerusalem; Jerusalem 91904 Israel
- Yonatan Aizner
- Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences; The Hebrew University of Jerusalem; Jerusalem 91904 Israel
- Julia M. Shifman
- Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences; The Hebrew University of Jerusalem; Jerusalem 91904 Israel
- Jutta Eichler
- Department of Chemistry and Pharmacy; University of Erlangen-Nuremberg; Schuhstr. 19 91052 Erlangen Germany

91
Chino M, Maglio O, Nastri F, Pavone V, DeGrado WF, Lombardi A. Artificial Diiron Enzymes with a De Novo Designed Four-Helix Bundle Structure. Eur J Inorg Chem 2015; 2015:3371-3390. [PMID: 27630532 PMCID: PMC5019575 DOI: 10.1002/ejic.201500470] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Received: 04/29/2015] [Indexed: 12/26/2022]
Abstract
A single polypeptide chain may provide an astronomical number of conformers. Nature selected only a trivial number of them through evolution, composing an alphabet of scaffolds, that can afford the complete set of chemical reactions needed to support life. These structural templates are so stable that they allow several mutations without disruption of the global folding, even having the ability to bind several exogenous cofactors. With this perspective, metal cofactors play a crucial role in the regulation and catalysis of several processes. Nature is able to modulate the chemistry of metals, adopting only a few ligands and slightly different geometries. Several scaffolds and metal-binding motifs are representing the focus of intense interest in the literature. This review discusses the widespread four-helix bundle fold, adopted as a scaffold for metal binding sites in the context of de novo protein design to obtain basic biochemical components for biosensing or catalysis. In particular, we describe the rational refinement of structure/function in diiron-oxo protein models from the due ferri (DF) family. The DF proteins were developed by us through an iterative process of design and rigorous characterization, which has allowed a shift from structural to functional models. The examples reported herein demonstrate the importance of the synergic application of de novo design methods as well as spectroscopic and structural characterization to optimize the catalytic performance of artificial enzymes.
Affiliation(s)
- Marco Chino
- Department of Chemical Sciences, University of Naples “Federico II”, Via Cintia, 80126 Naples, Italy
- Ornella Maglio
- Department of Chemical Sciences, University of Naples “Federico II”, Via Cintia, 80126 Naples, Italy
- IBB, CNR, Via Mezzocannone 16, 80134 Naples, Italy
- Flavia Nastri
- Department of Chemical Sciences, University of Naples “Federico II”, Via Cintia, 80126 Naples, Italy
- Vincenzo Pavone
- Department of Structural and Functional Biology, University of Naples “Federico II”, Via Cintia, 80126 Naples, Italy
- William F. DeGrado
- Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, San Francisco, CA 94158, USA
- Angela Lombardi
- Department of Chemical Sciences, University of Naples “Federico II”, Via Cintia, 80126 Naples, Italy

92
Roberts KE, Donald BR. Improved energy bound accuracy enhances the efficiency of continuous protein design. Proteins 2015; 83:1151-64. [PMID: 25846627 DOI: 10.1002/prot.24808] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 03/24/2015] [Accepted: 03/24/2015] [Indexed: 11/07/2022]
Abstract
Flexibility and dynamics are important for protein function and a protein's ability to accommodate amino acid substitutions. However, when computational protein design algorithms search over protein structures, the allowed flexibility is often reduced to a relatively small set of discrete side-chain and backbone conformations. While simplifications in scoring functions and protein flexibility are currently necessary to computationally search the vast protein sequence and conformational space, a rigid representation of a protein causes the search to become brittle and miss low-energy structures. Continuous rotamers more closely represent the allowed movement of a side chain within its torsional well and have been successfully incorporated into the protein design framework to design biomedically relevant protein systems. The use of continuous rotamers in protein design enables algorithms to search a larger conformational space than previously possible, but adds additional complexity to the design search. To design large, complex systems with continuous rotamers, new algorithms are needed to increase the efficiency of the search. We present two methods, PartCR and HOT, that greatly increase the speed and efficiency of protein design with continuous rotamers. These methods specifically target the large errors in energetic terms that are used to bound pairwise energies during the design search. By tightening the energy bounds, additional pruning of the conformation space can be achieved, and the number of conformations that must be enumerated to find the global minimum energy conformation is greatly reduced.
Affiliation(s)
- Kyle E Roberts
- Department of Computer Science, Duke University, Durham, North Carolina
- Bruce R Donald
- Department of Computer Science, Duke University, Durham, North Carolina; Department of Biochemistry, Duke University Medical Center, Durham, North Carolina; Department of Chemistry, Duke University, Durham, North Carolina

93
Moghadasi M, Mirzaei H, Mamonov A, Vakili P, Vajda S, Paschalidis IC, Kozakov D. The impact of side-chain packing on protein docking refinement. J Chem Inf Model 2015; 55:872-81. [PMID: 25714358 PMCID: PMC4734134 DOI: 10.1021/ci500380a] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 01/24/2023]
Abstract
We study the impact of optimizing the side-chain positions in the interface region between two proteins during the process of binding. Mathematically, the problem is similar to side-chain prediction, which has been extensively explored in the process of protein structure prediction. The protein-protein docking application, however, has a number of characteristics that necessitate different algorithmic and implementation choices. In this work, we implement a distributed approximate algorithm that can be implemented on multiprocessor architectures and enables a trade-off between accuracy and running speed. We report computational results on benchmarks of enzyme-inhibitor and other types of complexes, establishing that the side-chain flexibility our algorithm introduces substantially improves the performance of docking protocols. Furthermore, we establish that the inclusion of unbound side-chain conformers in the side-chain positioning problem is critical in these performance improvements. The code is available to the community under open source license.
Affiliation(s)
- Mohammad Moghadasi
- Division of Systems Engineering & Center for Information and Systems Engineering
- Hanieh Mirzaei
- Division of Systems Engineering & Center for Information and Systems Engineering
- Pirooz Vakili
- Division of Systems Engineering, and Department of Mechanical Engineering
- Ioannis Ch. Paschalidis
- Department of Electrical and Computer Engineering, Division of Systems Engineering, and Department of Biomedical Engineering

94
Stiebritz MT. MetREx: A protein design approach for the exploration of sequence-reactivity relationships in metalloenzymes. J Comput Chem 2015; 36:553-63. [DOI: 10.1002/jcc.23831] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/16/2014] [Revised: 12/12/2014] [Accepted: 12/16/2014] [Indexed: 01/10/2023]
Affiliation(s)
- Martin T. Stiebritz
- Laboratorium für Physikalische Chemie, ETH Zürich; Vladimir-Prelog-Weg 2 CH-8093 Zürich Switzerland

95
Approximate Counting with Deterministic Guarantees for Affinity Computation. Advances in Intelligent Systems and Computing 2015. [DOI: 10.1007/978-3-319-18167-7_15] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/02/2023]

96
Quan L, Lü Q, Li H, Xia X, Wu H. Improved packing of protein side chains with parallel ant colonies. BMC Bioinformatics 2014; 15 Suppl 12:S5. [PMID: 25474164 PMCID: PMC4251090 DOI: 10.1186/1471-2105-15-s12-s5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
Abstract
INTRODUCTION The accurate packing of protein side chains is important for many computational biology problems, such as ab initio protein structure prediction, homology modelling, and protein design and ligand docking applications. Many existing solutions are modelled as a computational optimisation problem. As well as the design of search algorithms, most solutions suffer from an inaccurate energy function for judging whether a prediction is good or bad. Even if the search has found the lowest energy, there is no certainty of obtaining the protein structures with correct side chains. METHODS We present a side-chain modelling method, pacoPacker, which uses a parallel ant colony optimisation strategy based on sharing a single pheromone matrix. This parallel approach combines different sources of energy functions and generates protein side-chain conformations with the lowest energies jointly determined by the various energy functions. We further optimised the selected rotamers to construct subrotamer by rotamer minimisation, which reasonably improved the discreteness of the rotamer library. RESULTS We focused on improving the accuracy of side-chain conformation prediction. For a testing set of 442 proteins, 87.19% of χ1 and 77.11% of χ1+2 angles were predicted correctly within 40° of the X-ray positions. We compared the accuracy of pacoPacker with state-of-the-art methods, such as CIS-RR and SCWRL4. We analysed the results from different perspectives, in terms of protein chain and individual residues. In this comprehensive benchmark testing, 51.5% of proteins within a length of 400 amino acids predicted by pacoPacker were superior to the results of CIS-RR and SCWRL4 simultaneously. Finally, we also showed the advantage of using the subrotamers strategy. All results confirmed that our parallel approach is competitive to state-of-the-art solutions for packing side chains.
CONCLUSIONS This parallel approach combines various sources of searching intelligence and energy functions to pack protein side chains. It provides a framework for combining different inaccuracy/usefulness objective functions by designing parallel heuristic search algorithms.
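The accuracy figures in this abstract rest on a dihedral-angle comparison with a 40° tolerance. The Python sketch below shows how such a metric might be computed. It is an illustration under stated assumptions rather than the benchmark code of the paper: the function names are hypothetical, angles are assumed to be in degrees, a residue counts toward χ1+2 only if it has a χ2 angle, and the special handling usually given to symmetric side chains is omitted.

```python
def angular_difference(a, b):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def chi_accuracy(predicted, reference, tolerance=40.0):
    """Return (chi1 accuracy, chi1+2 accuracy) as fractions.

    predicted and reference are parallel lists with one tuple of chi
    angles per residue; residues without any chi angle are skipped.
    Assumes at least one residue has chi1 and at least one has chi2.
    """
    chi1_ok = chi1_total = chi12_ok = chi12_total = 0
    for pred, ref in zip(predicted, reference):
        if not ref:
            continue
        ok1 = angular_difference(pred[0], ref[0]) <= tolerance
        chi1_total += 1
        chi1_ok += ok1
        if len(ref) >= 2:
            # chi1+2 is correct only when both chi1 and chi2 are within tolerance
            chi12_total += 1
            chi12_ok += ok1 and angular_difference(pred[1], ref[1]) <= tolerance
    return chi1_ok / chi1_total, chi12_ok / chi12_total
```

The wrap-around in `angular_difference` matters for side chains near ±180°, where a prediction of 175° against a reference of -178° differs by only 7°.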

97
Zhou Y, Xu W, Donald BR, Zeng J. An efficient parallel algorithm for accelerating computational protein design. Bioinformatics 2014; 30:i255-i263. [PMID: 24931991 PMCID: PMC4058937 DOI: 10.1093/bioinformatics/btu264] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 01/21/2023]
Abstract
Motivation: Structure-based computational protein design (SCPR) is an important topic in protein engineering. Under the assumption of a rigid backbone and a finite set of discrete conformations of side-chains, various methods have been proposed to address this problem. A popular method is to combine the dead-end elimination (DEE) and A* tree search algorithms, which provably finds the global minimum energy conformation (GMEC) solution. Results: In this article, we improve the efficiency of computing A* heuristic functions for protein design and propose a variant of A* algorithm in which the search process can be performed on a single GPU in a massively parallel fashion. In addition, we make some efforts to address the memory exceeding problem in A* search. As a result, our enhancements can achieve a significant speedup of the A*-based protein design algorithm by four orders of magnitude on large-scale test data through pre-computation and parallelization, while still maintaining an acceptable memory overhead. We also show that our parallel A* search algorithm could be successfully combined with iMinDEE, a state-of-the-art DEE criterion, for rotamer pruning to further improve SCPR with the consideration of continuous side-chain flexibility. Availability: Our software is available and distributed open-source under the GNU Lesser General License Version 2.1 (GNU, February 1999). The source code can be downloaded from http://www.cs.duke.edu/donaldlab/osprey.php or http://iiis.tsinghua.edu.cn/~compbio/software.html. Contact: zengjy321@tsinghua.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yichao Zhou
- Institute for Theoretical Computer Science (ITCS), Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, P. R. China, Department of Computer Science, Duke University, Durham, NC 27708, USA and Department of Biochemistry, Duke University Medical Center, Durham, NC 27708, USA
- Wei Xu
- Institute for Theoretical Computer Science (ITCS), Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, P. R. China, Department of Computer Science, Duke University, Durham, NC 27708, USA and Department of Biochemistry, Duke University Medical Center, Durham, NC 27708, USA
- Bruce R Donald
- Institute for Theoretical Computer Science (ITCS), Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, P. R. China, Department of Computer Science, Duke University, Durham, NC 27708, USA and Department of Biochemistry, Duke University Medical Center, Durham, NC 27708, USA
- Jianyang Zeng
- Institute for Theoretical Computer Science (ITCS), Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, P. R. China, Department of Computer Science, Duke University, Durham, NC 27708, USA and Department of Biochemistry, Duke University Medical Center, Durham, NC 27708, USA

98
Subramaniam S, Senes A. Backbone dependency further improves side chain prediction efficiency in the Energy-based Conformer Library (bEBL). Proteins 2014; 82:3177-87. [PMID: 25212195 DOI: 10.1002/prot.24685] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 07/02/2014] [Revised: 08/21/2014] [Accepted: 09/03/2014] [Indexed: 12/11/2022]
Abstract
Side chain optimization is an integral component of many protein modeling applications. In these applications, the conformational freedom of the side chains is often explored using libraries of discrete, frequently occurring conformations. Because side chain optimization can pose a computationally intensive combinatorial problem, the nature of these conformer libraries is important for ensuring efficiency and accuracy in side chain prediction. We have previously developed an innovative method to create a conformer library with enhanced performance. The Energy-based Library (EBL) was obtained by analyzing the energetic interactions between conformers and a large number of natural protein environments from crystal structures. This process guided the selection of conformers with the highest propensity to fit into spaces that should accommodate a side chain. Because the method requires a large crystallographic data-set, the EBL was created in a backbone-independent fashion. However, it is well established that side chain conformation is strongly dependent on the local backbone geometry, and that backbone-dependent libraries are more efficient in side chain optimization. Here we present the backbone-dependent EBL (bEBL), whose conformers are independently sorted for each populated region of Ramachandran space. The resulting library closely mirrors the local backbone-dependent distribution of side chain conformation. Compared to the EBL, we demonstrate that the bEBL uses fewer conformers to produce similar side chain prediction outcomes, thus further improving performance with respect to the already efficient backbone-independent version of the library.
Affiliation(s)
- Sabareesh Subramaniam
- Department of Biochemistry, University of Wisconsin-Madison, Madison, Wisconsin, 53706
|
99
|
Peterson LX, Kang X, Kihara D. Assessment of protein side-chain conformation prediction methods in different residue environments. Proteins 2014; 82:1971-84. [PMID: 24619909 PMCID: PMC5007623 DOI: 10.1002/prot.24552] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2014] [Revised: 03/02/2014] [Accepted: 03/07/2014] [Indexed: 11/09/2022]
Abstract
Computational prediction of side-chain conformation is an important component of protein structure prediction. Accurate side-chain prediction is crucial for practical applications of protein structure models that need atomic-detailed resolution such as protein and ligand design. We evaluated the accuracy of eight side-chain prediction methods in reproducing the side-chain conformations of experimentally solved structures deposited to the Protein Data Bank. Prediction accuracy was evaluated for a total of four different structural environments (buried, surface, interface, and membrane-spanning) in three different protein types (monomeric, multimeric, and membrane). Overall, the highest accuracy was observed for buried residues in monomeric and multimeric proteins. Notably, side-chains at protein interfaces and membrane-spanning regions were better predicted than surface residues even though the methods did not all use multimeric and membrane proteins for training. Thus, we conclude that the current methods are as practically useful for modeling protein docking interfaces and membrane-spanning regions as for modeling monomers.
Affiliation(s)
- Lenna X. Peterson
- Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA
- Xuejiao Kang
- Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
- Daisuke Kihara
- Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA
- Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
|
100
|
On simplified global nonlinear function for fitness landscape: a case study of inverse protein folding. PLoS One 2014; 9:e104403. [PMID: 25110986 PMCID: PMC4128808 DOI: 10.1371/journal.pone.0104403] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/26/2013] [Accepted: 07/14/2014] [Indexed: 11/19/2022] Open
Abstract
The construction of fitness landscapes has broad implications for understanding molecular evolution, cellular epigenetic states, and protein structures. We studied the problem of constructing a fitness landscape for inverse protein folding, or protein design, with the aim of generating amino acid sequences that fold into an a priori determined structural fold, which would enable the engineering of novel or enhanced biochemistry. For this task, an effective fitness function should allow identification of correct sequences that would fold into the desired structure. In this study, we showed that a nonlinear fitness function for protein design can be constructed using a rectangular kernel with a basis set of proteins and decoys chosen a priori. The full landscape for a large number of protein folds can be captured using only 480 native proteins and 3,200 non-protein decoys via a finite Newton method. A blind test of a simplified version of the fitness function for sequence design was carried out to simultaneously discriminate 428 native sequences not homologous to any training proteins from 11 million challenging protein-like decoys. This simplified function correctly classified 408 native sequences (20 misclassifications, 95% correct rate), outperforming several other statistical linear scoring functions and an optimized linear function. Our results further suggested that, for the task of global sequence design of 428 selected proteins, the search space of protein shape and sequence can be effectively parametrized with a carefully chosen basis set of just about 3,680 proteins and decoys, and we showed in addition that the overall landscape is not overly sensitive to the specific choice of this set. Our results can be generalized to construct other types of fitness landscapes.
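A kernel fitness function over a fixed basis set can be sketched as follows. This is only an illustration under stated assumptions: the Gaussian kernel, the two-dimensional toy feature vectors, the hand-picked weights, and the names `gaussian_kernel`, `fitness`, and `is_native` are all hypothetical and are not the authors' implementation (the abstract states that the actual weights are obtained with a finite Newton method, which is not reproduced here). The closing lines re-derive the counts quoted in the abstract.

```python
import math


def gaussian_kernel(x, b, sigma=1.0):
    """Similarity between a candidate x and one basis item b (assumed kernel)."""
    d2 = sum((xi - bi) ** 2 for xi, bi in zip(x, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def fitness(x, basis, weights, bias=0.0, sigma=1.0):
    """Nonlinear fitness: weighted sum of kernel values against the basis set."""
    return bias + sum(w * gaussian_kernel(x, b, sigma)
                      for w, b in zip(weights, basis))


def is_native(x, basis, weights, bias=0.0, sigma=1.0):
    """Classify x as native-like when its fitness is positive."""
    return fitness(x, basis, weights, bias, sigma) > 0.0


# Hypothetical toy basis: one "native" item (positive weight) and one
# "decoy" item (negative weight) in a made-up 2-D feature space.
basis = [(0.0, 0.0), (3.0, 3.0)]
weights = [1.0, -1.0]
near_native = is_native((0.1, 0.0), basis, weights)
near_decoy = is_native((3.0, 2.9), basis, weights)

# Arithmetic from the abstract.
basis_size = 480 + 3200            # native proteins + non-protein decoys
correct = 428 - 20                 # native sequences minus misclassifications
correct_rate = correct / 428       # about 0.953, reported as 95%
```

The basis set is fixed in advance, so the number of kernel evaluations per candidate equals the basis size (3,680 in the abstract) regardless of how many sequences were used for training.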
|