1
Chen X, Liu J, Park N, Cheng J. A Survey of Deep Learning Methods for Estimating the Accuracy of Protein Quaternary Structure Models. Biomolecules 2024; 14:574. [PMID: 38785981 PMCID: PMC11117562 DOI: 10.3390/biom14050574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Revised: 04/07/2024] [Accepted: 05/09/2024] [Indexed: 05/25/2024]
Abstract
The quality prediction of quaternary structure models of a protein complex, in the absence of its true structure, is known as the Estimation of Model Accuracy (EMA). EMA is useful for ranking predicted protein complex structures and using them appropriately in biomedical research, such as protein-protein interaction studies, protein design, and drug discovery. With the advent of more accurate protein complex (multimer) prediction tools, such as AlphaFold2-Multimer and ESMFold, the estimation of the accuracy of protein complex structures has attracted increasing attention. Many deep learning methods have been developed to tackle this problem; however, there is a noticeable absence of a comprehensive overview of these methods to facilitate future development. Addressing this gap, we present a review of deep learning EMA methods for protein complex structures developed in the past several years, analyzing their methodologies, data and feature construction. We also provide a prospective summary of some potential new developments for further improving the accuracy of the EMA methods.
Affiliation(s)
- Xiao Chen
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Jian Liu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
  - NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, USA
- Nolan Park
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
  - NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, USA

2
Chen X, Morehead A, Liu J, Cheng J. A gated graph transformer for protein complex structure quality assessment and its performance in CASP15. Bioinformatics 2023; 39:i308-i317. [PMID: 37387159 PMCID: PMC10311325 DOI: 10.1093/bioinformatics/btad203] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 07/01/2023]
Abstract
MOTIVATION Proteins interact to form complexes to carry out essential biological functions. Computational methods such as AlphaFold-multimer have been developed to predict the quaternary structures of protein complexes. An important yet largely unsolved challenge in protein complex structure prediction is to accurately estimate the quality of predicted protein complex structures without any knowledge of the corresponding native structures. Such estimations can then be used to select high-quality predicted complex structures to facilitate biomedical research such as protein function analysis and drug discovery. RESULTS In this work, we introduce a new gated neighborhood-modulating graph transformer to predict the quality of 3D protein complex structures. It incorporates node and edge gates within a graph transformer framework to control information flow during graph message passing. We trained, evaluated and tested the method (called DProQA) on newly-curated protein complex datasets before the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) and then blindly tested it in the 2022 CASP15 experiment. The method was ranked 3rd among the single-model quality assessment methods in CASP15 in terms of the ranking loss of TM-score on 36 complex targets. The rigorous internal and external experiments demonstrate that DProQA is effective in ranking protein complex structures. AVAILABILITY AND IMPLEMENTATION The source code, data, and pre-trained models are available at https://github.com/jianlin-cheng/DProQA.
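The ranking loss mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the usual definition in model accuracy estimation: the true TM-score of the best model in a pool minus the true TM-score of the model that the predictor ranks first. The function name and the example values are hypothetical and are not taken from DProQA.

```python
from typing import Sequence


def ranking_loss(true_tm_scores: Sequence[float],
                 predicted_quality: Sequence[float]) -> float:
    """Return the per-target ranking loss (0.0 means the best model was picked).

    Assumption: a higher predicted quality means a better model.
    """
    if not true_tm_scores or len(true_tm_scores) != len(predicted_quality):
        raise ValueError("score lists must be non-empty and of equal length")
    # Index of the model the predictor would select.
    selected = max(range(len(predicted_quality)),
                   key=lambda i: predicted_quality[i])
    # Loss = TM-score of the truly best model minus that of the selected model.
    return max(true_tm_scores) - true_tm_scores[selected]


# Hypothetical pool of three models for one complex target.
example_loss = ranking_loss([0.90, 0.70, 0.80], [0.20, 0.90, 0.50])
```

Averaging this quantity over all targets gives a single figure per method, with lower values indicating better model selection.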
Affiliation(s)
- Xiao Chen
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65201, United States
- Alex Morehead
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65201, United States
- Jian Liu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65201, United States
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65201, United States

3
Scalvini B, Sheikhhassani V, Woodard J, Aupič J, Dame RT, Jerala R, Mashaghi A. Topology of Folded Molecular Chains: From Single Biomolecules to Engineered Origami. Trends in Chemistry 2020. [DOI: 10.1016/j.trechm.2020.04.009] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 01/09/2023]

4
Villa F, Mignon D, Polydorides S, Simonson T. Comparing pairwise-additive and many-body generalized Born models for acid/base calculations and protein design. J Comput Chem 2017; 38:2396-2410. [PMID: 28749575 DOI: 10.1002/jcc.24898] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 03/07/2017] [Revised: 06/30/2017] [Accepted: 07/06/2017] [Indexed: 12/13/2022]
Abstract
Generalized Born (GB) solvent models are common in acid/base calculations and protein design. With GB, the interaction between a pair of solute atoms depends on the shape of the protein/solvent boundary and, therefore, the positions of all solute atoms, so that GB is a many-body potential. For compute-intensive applications, the model is often simplified further, by introducing a mean, native-like protein/solvent boundary, which removes the many-body property. We investigate a method for both acid/base calculations and protein design that uses Monte Carlo simulations in which side chains can explore rotamers, bind/release protons, or mutate. The fluctuating protein/solvent dielectric boundary is treated in a way that is numerically exact (within the GB framework), in contrast to a mean boundary. Its originality is that it captures the many-body character while retaining the residue-pairwise complexity given by a fixed boundary. The method is implemented in the Proteus protein design software. It yields a slight but systematic improvement for acid/base constants in nine proteins and a significant improvement for the computational design of three PDZ domains. It eliminates a source of model uncertainty, which will facilitate the analysis of other model limitations. © 2017 Wiley Periodicals, Inc.
Affiliation(s)
- Francesco Villa
  - Ecole Polytechnique, Laboratoire de Biochimie (CNRS UMR7654), Palaiseau, 91128, France
- David Mignon
  - Ecole Polytechnique, Laboratoire de Biochimie (CNRS UMR7654), Palaiseau, 91128, France
- Savvas Polydorides
  - Ecole Polytechnique, Laboratoire de Biochimie (CNRS UMR7654), Palaiseau, 91128, France
- Thomas Simonson
  - Ecole Polytechnique, Laboratoire de Biochimie (CNRS UMR7654), Palaiseau, 91128, France

5
Mignon D, Panel N, Chen X, Fuentes EJ, Simonson T. Computational Design of the Tiam1 PDZ Domain and Its Ligand Binding. J Chem Theory Comput 2017; 13:2271-2289. [PMID: 28394603 DOI: 10.1021/acs.jctc.6b01255] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
Abstract
PDZ domains direct protein-protein interactions and serve as models for protein design. Here, we optimized a protein design energy function for the Tiam1 and Cask PDZ domains that combines a molecular mechanics energy, Generalized Born solvent, and an empirical unfolded state model. Designed sequences were recognized as PDZ domains by the Superfamily fold recognition tool and had similarity scores comparable to natural PDZ sequences. The optimized model was used to redesign the two PDZ domains, by gradually varying the chemical potential of hydrophobic amino acids; the tendency of each position to lose or gain a hydrophobic character represents a novel hydrophobicity index. We also redesigned four positions in the Tiam1 PDZ domain involved in peptide binding specificity. The calculated affinity differences between designed variants reproduced experimental data and suggest substitutions with altered specificities.
Affiliation(s)
- David Mignon
  - Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
- Nicolas Panel
  - Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
- Xingyu Chen
  - Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
- Ernesto J Fuentes
  - Department of Biochemistry, Roy J. & Lucille A. Carver College of Medicine and Holden Comprehensive Cancer Center, University of Iowa, Iowa City, Iowa 52242-1109, United States
- Thomas Simonson
  - Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France

6
Shrivastava AK, Kumar S, Sahu PS, Mahapatra RK. In silico identification and validation of a novel hypothetical protein in Cryptosporidium hominis and virtual screening of inhibitors as therapeutics. Parasitol Res 2017; 116:1533-1544. [PMID: 28389892 DOI: 10.1007/s00436-017-5430-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 02/17/2017] [Accepted: 03/21/2017] [Indexed: 01/18/2023]
Abstract
Computational approaches to predict structure/function and other biological characteristics of proteins are becoming more common in comparison to the traditional methods in drug discovery. Cryptosporidiosis is a major zoonotic diarrheal disease, particularly in children, which is caused primarily by Cryptosporidium hominis and Cryptosporidium parvum. Currently, there are no vaccines for cryptosporidiosis and recommended drugs are ineffective. With the availability of the complete genome sequence of C. hominis, new targets have been recognized for the development of effective and better drugs and/or vaccines. We identified a unique hypothetical protein (TU502HP) in the C. hominis genome from the CryptoDB database. A three-dimensional model of the protein was generated using the Iterative Threading ASSEmbly Refinement server through an iterative threading method. Functional annotation and phylogenetic study of TU502HP protein revealed similarity with human transportin 3. The model was further subjected to a virtual screening study from the ZINC database compound library using the Dock Blaster server. A docking study through AutoDock software reported N-(3-chlorobenzyl)ethane-1,2-diamine as the best inhibitor in terms of docking score and binding energy. The reliability of the binding mode of the inhibitor was confirmed by a complex molecular dynamics simulation study using GROMACS software for 10 ns in the water environment. Furthermore, antigenic determinants of the protein were determined with the help of DNASTAR software. Our findings may provide insights for the development of new drug(s) or vaccine(s) for treatment and prophylaxis of cryptosporidiosis among humans and animals.
Affiliation(s)
- Subrat Kumar
  - School of Biotechnology, KIIT University, Bhubaneswar, Odisha, India
- Priyadarshi Soumyaranjan Sahu
  - School of Biotechnology, KIIT University, Bhubaneswar, Odisha, India
  - Divisions of Pathology, School of Medicine, International Medical University, 57000, Kuala Lumpur, Malaysia

7
Gainza P, Nisonoff HM, Donald BR. Algorithms for protein design. Curr Opin Struct Biol 2016; 39:16-26. [PMID: 27086078 PMCID: PMC5065368 DOI: 10.1016/j.sbi.2016.03.006] [Citation(s) in RCA: 58] [Impact Index Per Article: 7.3] [Received: 11/17/2015] [Revised: 03/15/2016] [Accepted: 03/22/2016] [Indexed: 02/05/2023]
Abstract
Computational structure-based protein design programs are becoming an increasingly important tool in molecular biology. These programs compute protein sequences that are predicted to fold to a target structure and perform a desired function. The success of a program's predictions largely relies on two components: first, the input biophysical model, and second, the algorithm that computes the best sequence(s) and structure(s) according to the biophysical model. Improving both the model and the algorithm in tandem is essential to improving the success rate of current programs, and here we review recent developments in algorithms for protein design, emphasizing how novel algorithms enable the use of more accurate biophysical models. We conclude with a list of algorithmic challenges in computational protein design that we believe will be especially important for the design of therapeutic proteins and protein assemblies.
Affiliation(s)
- Pablo Gainza
  - Department of Computer Science, Duke University, Durham, NC, United States
- Hunter M Nisonoff
  - Department of Computer Science, Duke University, Durham, NC, United States
- Bruce R Donald
  - Department of Computer Science, Duke University, Durham, NC, United States
  - Department of Biochemistry, Duke University Medical Center, Durham, NC, United States
  - Department of Chemistry, Duke University, Durham, NC, United States

8
Lipska AG, Seidman SR, Sieradzan AK, Giełdoń A, Liwo A, Scheraga HA. Molecular dynamics of protein A and a WW domain with a united-residue model including hydrodynamic interaction. J Chem Phys 2016; 144:184110. [PMID: 27179474 PMCID: PMC4866947 DOI: 10.1063/1.4948710] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 02/26/2016] [Accepted: 04/25/2016] [Indexed: 01/01/2023]
Abstract
The folding of the N-terminal part of the B-domain of staphylococcal protein A (PDB ID: 1BDD, a 46-residue three-α-helix bundle) and the formin-binding protein 28 WW domain (PDB ID: 1E0L, a 37-residue three-stranded anti-parallel β protein) was studied by means of Langevin dynamics with the coarse-grained UNRES force field to assess the influence of hydrodynamic interactions on protein-folding pathways and kinetics. The unfolded, intermediate, and native-like structures were identified by cluster analysis, and multi-exponential functions were fitted to the time dependence of the fractions of native and intermediate structures, respectively, to determine bulk kinetics. It was found that introducing hydrodynamic interactions slows down both the formation of an intermediate state and the transition from the collapsed structures to the final native-like structures by creating multiple kinetic traps. Therefore, introducing hydrodynamic interactions considerably slows the folding, as opposed to the results obtained from earlier studies with the use of Gō-like models.
Affiliation(s)
- Agnieszka G Lipska
  - Laboratory of Molecular Modeling, Faculty of Chemistry, University of Gdańsk, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
- Steven R Seidman
  - Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, New York 14853-1301, USA
- Adam K Sieradzan
  - Laboratory of Molecular Modeling, Faculty of Chemistry, University of Gdańsk, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
- Artur Giełdoń
  - Laboratory of Molecular Modeling, Faculty of Chemistry, University of Gdańsk, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
- Adam Liwo
  - Laboratory of Molecular Modeling, Faculty of Chemistry, University of Gdańsk, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
- Harold A Scheraga
  - Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, New York 14853-1301, USA

9
Islam MA, Bhayye S, Adeniyi AA, Soliman ME, Pillay TS. Diabetes mellitus caused by mutations in human insulin: analysis of impaired receptor binding of insulins Wakayama, Los Angeles and Chicago using pharmacoinformatics. J Biomol Struct Dyn 2016; 35:724-737. [DOI: 10.1080/07391102.2016.1160258] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 10/22/2022]
Affiliation(s)
- Md Ataul Islam
  - Faculty of Health Sciences, Department of Chemical Pathology, & Institute of Cellular & Molecular Medicine, University of Pretoria and National Health Laboratory Service Tshwane Academic Division, Pretoria, South Africa
- Sagar Bhayye
  - Department of Chemical Technology, University of Calcutta, 92, A. P. C. Road, Kolkata 700009, India
- Adebayo A. Adeniyi
  - Discipline of Pharmaceutical Sciences, School of Health Sciences, University of KwaZulu-Natal, Durban 4000, South Africa
- Mahmoud E.S. Soliman
  - Discipline of Pharmaceutical Sciences, School of Health Sciences, University of KwaZulu-Natal, Durban 4000, South Africa
- Tahir S. Pillay
  - Faculty of Health Sciences, Department of Chemical Pathology, & Institute of Cellular & Molecular Medicine, University of Pretoria and National Health Laboratory Service Tshwane Academic Division, Pretoria, South Africa
  - Division of Chemical Pathology, University of Cape Town, Cape Town, South Africa

10
Abstract
The Rosetta macromolecular modeling software is a versatile, rapidly developing set of tools that are now being routinely utilized to address state-of-the-art research challenges in academia and industrial research settings. A Rosetta Conference (RosettaCon) describing updates to the Rosetta source code is held annually. Every two years, a Rosetta Conference (RosettaCon) special collection describing the results presented at the annual conference by participating RosettaCommons labs is published by the Public Library of Science (PLOS). This is the introduction to the third RosettaCon 2014 Special Collection published by PLOS.
Affiliation(s)
- Sagar D. Khare
  - Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, NJ, United States of America
- Timothy A. Whitehead
  - Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, MI, United States of America
  - Department of Biosystems and Agricultural Engineering, Michigan State University, East Lansing, MI, United States of America

11
Zhang Z, Schindler CEM, Lange OF, Zacharias M. Application of Enhanced Sampling Monte Carlo Methods for High-Resolution Protein-Protein Docking in Rosetta. PLoS One 2015; 10:e0125941. [PMID: 26053419 PMCID: PMC4459952 DOI: 10.1371/journal.pone.0125941] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 02/12/2015] [Accepted: 03/26/2015] [Indexed: 11/30/2022]
Abstract
The high-resolution refinement of docked protein-protein complexes can provide valuable structural and mechanistic insight into protein complex formation complementing experiment. Monte Carlo (MC) based approaches are frequently applied to sample putative interaction geometries of proteins including also possible conformational changes of the binding partners. In order to explore efficiency improvements of the MC sampling, several enhanced sampling techniques, including temperature or Hamiltonian replica exchange and well-tempered ensemble approaches, have been combined with the MC method and were evaluated on 20 protein complexes using unbound partner structures. The well-tempered ensemble method combined with a 2-dimensional temperature and Hamiltonian replica exchange scheme (WTE-H-REMC) was identified as the most efficient search strategy. Comparison with prolonged MC searches indicates that the WTE-H-REMC approach requires approximately 5 times fewer MC steps to identify near native docking geometries compared to conventional MC searches.
Affiliation(s)
- Zhe Zhang
  - Physik-Department T38, Technische Universität München, James-Franck-Str. 1, 84748 Garching, Germany
- Oliver F. Lange
  - Biomolecular NMR and Munich Center for Integrated Protein Science, Department Chemie, Technische Universität München, Lichtenbergstr. 4, 85748 Garching, Germany
- Martin Zacharias
  - Physik-Department T38, Technische Universität München, James-Franck-Str. 1, 84748 Garching, Germany

12
Castro CE, Su HJ, Marras AE, Zhou L, Johnson J. Mechanical design of DNA nanostructures. Nanoscale 2015; 7:5913-21. [PMID: 25655237 DOI: 10.1039/c4nr07153k] [Citation(s) in RCA: 88] [Impact Index Per Article: 9.8] [Indexed: 05/09/2023]
Abstract
Structural DNA nanotechnology is a rapidly emerging field that has demonstrated great potential for applications such as single molecule sensing, drug delivery, and templating molecular components. As the applications of DNA nanotechnology expand, a consideration of their mechanical behavior is becoming essential to understand how these structures will respond to physical interactions. This review considers three major avenues of recent progress in this area: (1) measuring and designing mechanical properties of DNA nanostructures, (2) designing complex nanostructures based on imposed mechanical stresses, and (3) designing and controlling structurally dynamic nanostructures. This work has laid the foundation for mechanically active nanomachines that can generate, transmit, and respond to physical cues in molecular systems.
Affiliation(s)
- Carlos E Castro
  - Department of Mechanical and Aerospace Engineering, The Ohio State University, Columbus, OH 43210, USA

13
Abstract
We have come a long way in the 55 years since Edmond Fischer and the late Edwin Krebs discovered that the activity of glycogen phosphorylase is regulated by reversible protein phosphorylation. Many of the fundamental molecular mechanisms that operate in biological signaling have since been characterized and the vast web of interconnected pathways that make up the cellular signaling network has been mapped in considerable detail. Nonetheless, it is important to consider how fast this field is still moving and the issues at the current boundaries of our understanding. One must also appreciate what experimental strategies have allowed us to attain our present level of knowledge. We summarize here some key issues (both conceptual and methodological), raise unresolved questions, discuss potential pitfalls, and highlight areas in which our understanding is still rudimentary. We hope these wide-ranging ruminations will be useful to investigators who carry studies of signal transduction forward during the rest of the 21st century.

14
Advances in Human Biology: Combining Genetics and Molecular Biophysics to Pave the Way for Personalized Diagnostics and Medicine. ACTA ACUST UNITED AC 2014. [DOI: 10.1155/2014/471836] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 12/17/2022]
Abstract
Advances in several biology-oriented initiatives such as genome sequencing and structural genomics, along with the progress made through traditional biological and biochemical research, have opened up a unique opportunity to better understand the molecular effects of human diseases. Human DNA can vary significantly from person to person and determines an individual’s physical characteristics and their susceptibility to diseases. Armed with an individual’s DNA sequence, researchers and physicians can check for defects known to be associated with certain diseases by utilizing various databases. However, for unclassified DNA mutations, or in order to reveal the molecular mechanism behind the effects, the mutations have to be mapped onto the corresponding networks and macromolecular structures and then analyzed to reveal their effect on the wild-type properties of the biological processes involved. Predicting the effect of DNA mutations on an individual’s health is typically referred to as personalized or companion diagnostics. Furthermore, once the molecular mechanism of the mutations is revealed, the patient should be given the drugs that are most appropriate for the individual genome, an approach referred to as pharmacogenomics. Altogether, the shift in focus in medicine towards more genomic-oriented practices is the foundation of personalized medicine. The progress made in these rapidly developing fields is outlined.

15
Polydorides S, Simonson T. Monte Carlo simulations of proteins at constant pH with generalized Born solvent, flexible sidechains, and an effective dielectric boundary. J Comput Chem 2013; 34:2742-56. [PMID: 24122878 DOI: 10.1002/jcc.23450] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 05/14/2013] [Revised: 09/04/2013] [Accepted: 09/08/2013] [Indexed: 12/11/2022]
Abstract
Titratable residues determine the acid/base behavior of proteins, strongly influencing their function; in addition, proton binding is a valuable reporter on electrostatic interactions. We describe a method for pKa calculations, using constant-pH Monte Carlo (MC) simulations to explore the space of sidechain conformations and protonation states, with an efficient and accurate generalized Born model (GB) for the solvent effects. To overcome the many-body dependency of the GB model, we use a "Native Environment" approximation, whose accuracy is shown to be good. It allows the precalculation and storage of interactions between all sidechain pairs, a strategy borrowed from computational protein design, which makes the MC simulations themselves very fast. The method is tested for 12 proteins and 167 titratable sidechains. It gives an rms error of 1.1 pH units, similar to the trivial "Null" model. The only adjustable parameter is the protein dielectric constant. The best accuracy is achieved for values between 4 and 8, a range that is physically plausible for a protein interior. For sidechains with large pKa shifts, ≥2, the rms error is 1.6, compared to 2.5 with the Null model and 1.5 with the empirical PROPKA method.
Affiliation(s)
- Savvas Polydorides
  - Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, 91128, Palaiseau, France

16
Moal IH, Torchala M, Bates PA, Fernández-Recio J. The scoring of poses in protein-protein docking: current capabilities and future directions. BMC Bioinformatics 2013; 14:286. [PMID: 24079540 PMCID: PMC3850738 DOI: 10.1186/1471-2105-14-286] [Citation(s) in RCA: 76] [Impact Index Per Article: 6.9] [Received: 06/18/2013] [Accepted: 09/25/2013] [Indexed: 12/16/2022]
Abstract
BACKGROUND Protein-protein docking, which aims to predict the structure of a protein-protein complex from its unbound components, remains an unresolved challenge in structural bioinformatics. An important step is the ranking of docked poses using a scoring function, for which many methods have been developed. There is a need to explore the differences and commonalities of these methods with each other, as well as with functions developed in the fields of molecular dynamics and homology modelling. RESULTS We present an evaluation of 115 scoring functions on an unbound docking decoy benchmark covering 118 complexes for which a near-native solution can be found, yielding top 10 success rates of up to 58%. Hierarchical clustering is performed, so as to group together functions which identify near-natives in similar subsets of complexes. Three set theoretic approaches are used to identify pairs of scoring functions capable of correctly scoring different complexes. This shows that functions in different clusters capture different aspects of binding and are likely to work together synergistically. CONCLUSIONS All functions designed specifically for docking perform well, indicating that functions are transferable between sampling methods. We also identify promising methods from the field of homology modelling. Further, differential success rates by docking difficulty and solution quality suggest a need for flexibility-dependent scoring. Investigating pairs of scoring functions, the set theoretic measures identify known scoring strategies as well as a number of novel approaches, indicating promising augmentations of traditional scoring methods. Such augmentation and parameter combination strategies are discussed in the context of the learning-to-rank paradigm.
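The "top 10 success rate" quoted in this abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' evaluation code: each complex is represented by a list of (score, is_near_native) pairs, a lower score is assumed to mean a better pose, and a complex counts as a success when at least one near-native pose appears among its n best-scored poses. The function name and the data layout are hypothetical.

```python
from typing import Dict, List, Tuple

Pose = Tuple[float, bool]  # (score, is_near_native); lower score assumed better


def top_n_success_rate(decoys: Dict[str, List[Pose]], n: int = 10) -> float:
    """Fraction of complexes with a near-native pose among the n best-scored poses."""
    if not decoys:
        raise ValueError("at least one complex is required")
    successes = 0
    for poses in decoys.values():
        # Rank poses from best (lowest score) to worst.
        ranked = sorted(poses, key=lambda pose: pose[0])
        if any(is_near_native for _, is_near_native in ranked[:n]):
            successes += 1
    return successes / len(decoys)


# Hypothetical two-complex benchmark with two decoys each.
benchmark = {
    "complex_A": [(-5.0, False), (-3.0, True)],
    "complex_B": [(-2.0, False), (-1.0, False)],
}
rate_top1 = top_n_success_rate(benchmark, n=1)
rate_top2 = top_n_success_rate(benchmark, n=2)
```

For scoring functions where a higher value is better, the sort order would need to be reversed.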
Affiliation(s)
- Iain H Moal
  - Joint BSC-IRB Research Program in Computational Biology, Life Science Department, Barcelona Supercomputing Center, Barcelona 08034, Spain
- Mieczyslaw Torchala
  - Biomolecular Modelling Laboratory, Cancer Research UK London Research Institute, London WC2A 3LY, UK
- Paul A Bates
  - Biomolecular Modelling Laboratory, Cancer Research UK London Research Institute, London WC2A 3LY, UK
- Juan Fernández-Recio
  - Joint BSC-IRB Research Program in Computational Biology, Life Science Department, Barcelona Supercomputing Center, Barcelona 08034, Spain

17
Simonson T, Gaillard T, Mignon D, Schmidt am Busch M, Lopes A, Amara N, Polydorides S, Sedano A, Druart K, Archontis G. Computational protein design: the Proteus software and selected applications. J Comput Chem 2013; 34:2472-84. [PMID: 24037756 DOI: 10.1002/jcc.23418] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Received: 05/13/2013] [Revised: 07/08/2013] [Accepted: 07/28/2013] [Indexed: 12/13/2022]
Abstract
We describe an automated procedure for protein design, implemented in a flexible software package, called Proteus. System setup and calculation of an energy matrix are done with the XPLOR modeling program and its sophisticated command language, supporting several force fields and solvent models. A second program provides algorithms to search sequence space. It allows a decomposition of the system into groups, which can be combined in different ways in the energy function, for both positive and negative design. The whole procedure can be controlled by editing 2-4 scripts. Two applications consider the tyrosyl-tRNA synthetase enzyme and its successful redesign to bind both O-methyl-tyrosine and D-tyrosine. For the latter, we present Monte Carlo simulations where the D-tyrosine concentration is gradually increased, displacing L-tyrosine from the binding pocket and yielding the binding free energy difference, in good agreement with experiment. Complete redesign of the Crk SH3 domain is presented. The top 10000 sequences are all assigned to the correct fold by the SUPERFAMILY library of Hidden Markov Models. Finally, we report the acid/base behavior of the SNase protein. Sidechain protonation is treated as a form of mutation; it is then straightforward to perform constant-pH Monte Carlo simulations, which yield good agreement with experiment. Overall, the software can be used for a wide range of applications, producing not only native-like sequences but also thermodynamic properties with errors that appear comparable to other current software packages.
Affiliation(s)
- Thomas Simonson
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, Palaiseau, 91128, France
|
18
|
Tulumello DV, Johnson RM, Isupov I, Deber CM. Design, expression, and purification of de novo transmembrane “hairpin” peptides. Biopolymers 2012. [DOI: 10.1002/bip.22149] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/14/2022]
|
19
|
Herzog F, Kahraman A, Boehringer D, Mak R, Bracher A, Walzthoeni T, Leitner A, Beck M, Hartl FU, Ban N, Malmström L, Aebersold R. Structural Probing of a Protein Phosphatase 2A Network by Chemical Cross-Linking and Mass Spectrometry. Science 2012; 337:1348-52. [PMID: 22984071 DOI: 10.1126/science.1221483] [Citation(s) in RCA: 320] [Impact Index Per Article: 26.7] [Indexed: 12/25/2022]
Affiliation(s)
- Franz Herzog
- Department of Biology, Institute of Molecular Systems Biology, Eidgenössische Technische Hochschule Zürich, Wolfgang-Pauli Strasse 16, 8093 Zurich, Switzerland
|
20
|
Liu Y, Kellogg E, Liang H. Canonical and micro-canonical analysis of folding of trpzip2: An all-atom replica exchange Monte Carlo simulation study. J Chem Phys 2012; 137:045103. [DOI: 10.1063/1.4738760] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022] Open
|
21
|
Reconstructing virus structures from nanometer to near-atomic resolutions with cryo-electron microscopy and tomography. Advances in Experimental Medicine and Biology 2012; 726:49-90. [PMID: 22297510 DOI: 10.1007/978-1-4614-0980-9_4] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Indexed: 12/02/2022]
Abstract
The past few decades have seen tremendous advances in single-particle electron cryo-microscopy (cryo-EM). The field has matured to the point that near-atomic resolution density maps can be generated for icosahedral viruses without the need for crystallization. In parallel, substantial progress has been made in determining the structures of nonicosahedrally arranged proteins in viruses by employing either single-particle cryo-EM or cryo-electron tomography (cryo-ET). Implicit in this course have been the availability of a new generation of electron cryo-microscopes and the development of the computational tools that are essential for generating these maps and models. This methodology has enabled structural biologists to analyze structures in increasing detail for virus particles that are in different morphogenetic states. Furthermore, electron imaging of frozen, hydrated cells, in the process of being infected by viruses, has also opened up a new avenue for studying virus structures "in situ". Here we present the common techniques used to acquire and process cryo-EM and cryo-ET data and discuss their implications for structural virology both now and in the future.
|
22
|
Hoque MT, Chetty M, Lewis A, Sattar A. Twin removal in genetic algorithms for protein structure prediction using low-resolution model. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011; 8:234-245. [PMID: 21071811 DOI: 10.1109/tcbb.2009.34] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 05/30/2023]
Abstract
This paper presents the impact of twins and the measures for their removal from the population of a genetic algorithm (GA) when applied to effective conformational searching. It is conclusively shown that a twin removal strategy for a GA provides considerably enhanced performance when investigating solutions to complex ab initio protein structure prediction (PSP) problems in a low-resolution model. Without twin removal, GA crossover and mutation operations can become ineffectual as generations lose their ability to produce significant differences, which can lead to the solution stalling. The paper relaxes the definition of chromosomal twins in the removal strategy to encompass not only identical but also highly correlated chromosomes within the GA population, with empirical results consistently exhibiting significant improvements in solving PSP problems.
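The twin-removal idea summarized in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the similarity measure (fraction of matching positions), the function names `similarity` and `remove_twins`, and the move-string encoding of chromosomes are all assumptions made for illustration, since the abstract does not specify how correlation between chromosomes is computed.

```python
from typing import List, Sequence


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of positions at which two equal-length chromosomes agree.

    Assumption: this stands in for the paper's correlation measure,
    which the abstract does not define.
    """
    if len(a) != len(b):
        raise ValueError("chromosomes must have equal length")
    if not a:
        return 1.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def remove_twins(population: List[str], threshold: float = 1.0) -> List[str]:
    """Drop any chromosome whose similarity to an already kept one is >= threshold.

    threshold=1.0 removes only identical twins; a lower threshold also
    removes highly similar chromosomes (the relaxed twin definition).
    """
    kept: List[str] = []
    for chromosome in population:
        if all(similarity(chromosome, other) < threshold for other in kept):
            kept.append(chromosome)
    return kept


# Hypothetical population: each string encodes lattice moves of one conformation.
population = ["LLRRUU", "LLRRUU", "LLRRUD", "DDUULL"]
strict = remove_twins(population, threshold=1.0)   # identical twins only
relaxed = remove_twins(population, threshold=0.8)  # near-twins as well
```

In a real GA the removed twins would typically be replaced by fresh random chromosomes to keep the population size constant; that step is omitted here.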
Affiliation(s)
- Md Tamjidul Hoque
- Griffith University, Nathan campus, 170 Kessels Road, Nathan, Brisbane, Qld 4111, Australia.
|
23
|
|
24
|
Glembo TJ, Ozkan SB. Union of geometric constraint-based simulations with molecular dynamics for protein structure prediction. Biophys J 2010; 98:1046-54. [PMID: 20303862 PMCID: PMC2849074 DOI: 10.1016/j.bpj.2009.11.031] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 09/09/2009] [Revised: 11/05/2009] [Accepted: 11/17/2009] [Indexed: 10/19/2022] Open
Abstract
Although proteins are a fundamental unit in biology, the mechanism by which proteins fold into their native state is not well understood. In this work, we explore the assembly of secondary structure units via geometric constraint-based simulations and the effect of refinement of assembled structures using reservoir replica exchange molecular dynamics. Our approach uses two crucial features of these methods: i), geometric simulations speed up the search for nativelike topologies as there are no energy barriers to overcome; and ii), molecular dynamics identifies the low free energy structures and further refines these structures toward the actual native conformation. We use eight alpha-, beta-, and alpha/beta-proteins to test our method. The geometric simulations of our test set result in an average RMSD from native of 3.7 Å and this further reduces to 2.7 Å after refinement. We also explore the question of robustness of assembly for inaccurate (shifted and shortened) secondary structure. We find that the RMSD from native is highly dependent on the accuracy of secondary structure input, and even slightly shifting the location of secondary structure along the amino acid sequence can lead to a rapid decrease in RMSD to native due to incorrect packing.
Key Words
- CASP, Critical Assessment of Techniques for Protein Structure Prediction
- FRODA, framework rigidity optimized dynamics algorithm
- MD, molecular dynamics
- REMD, replica exchange molecular dynamics
- RMSD, root mean-square deviation
- R-REMD, reservoir replica exchange molecular dynamics
- ZAM, zipping and assembly method
- ZAMF, ZAM with FRODA
- 3-D, three-dimensional
- 1-D, one-dimensional
Affiliation(s)
- S. Banu Ozkan
- Center for Biological Physics, Department of Physics, Arizona State University, Tempe, Arizona
|
25
|
McAllister SR, Floudas CA. An improved hybrid global optimization method for protein tertiary structure prediction. Computational Optimization and Applications 2010; 45:377-413. [PMID: 20357906 PMCID: PMC2847311 DOI: 10.1007/s10589-009-9277-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 05/29/2023]
Abstract
First principles approaches to the protein structure prediction problem must search through an enormous conformational space to identify low-energy, near-native structures. In this paper, we describe the formulation of the tertiary structure prediction problem as a nonlinear constrained minimization problem, where the goal is to minimize the energy of a protein conformation subject to constraints on torsion angles and interatomic distances. The core of the proposed algorithm is a hybrid global optimization method that combines the benefits of the αBB deterministic global optimization approach with conformational space annealing. These global optimization techniques employ a local minimization strategy that combines torsion angle dynamics and rotamer optimization to identify and improve the selection of initial conformations and then applies a sequential quadratic programming approach to further minimize the energy of the protein conformations subject to constraints. The proposed algorithm demonstrates the ability to identify both lower energy protein structures, as well as larger ensembles of low-energy conformations.
|
26
|
Joo K, Lee J, Seo JH, Lee K, Kim BG, Lee J. All-atom chain-building by optimizing MODELLER energy function using conformational space annealing. Proteins 2009; 75:1010-23. [PMID: 19089941 DOI: 10.1002/prot.22312] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Indexed: 11/10/2022]
Abstract
We have investigated the effect of rigorous optimization of the MODELLER energy function for possible improvement in protein all-atom chain-building. For this we applied the global optimization method called conformational space annealing (CSA) to the standard MODELLER procedure to achieve better energy optimization than what MODELLER provides. The method, which we call MODELLERCSA, is tested on two benchmark sets. The first is the 298 proteins taken from the HOMSTRAD multiple alignment set. By simply optimizing the MODELLER energy function, we observe significant improvement in side-chain modeling, where MODELLERCSA provides about 10.7% (14.5%) improvement for chi(1) (chi(1) + chi(2)) accuracy compared to the standard MODELLER modeling. The improvement of backbone accuracy by MODELLERCSA is shown to be less prominent, and a similar improvement can be achieved by simply generating many standard MODELLER models and selecting lowest energy models. However, the level of side-chain modeling accuracy by MODELLERCSA could not be matched either by extensive MODELLER strategies, side-chain remodeling by SCWRL3, or copying unmutated rotamers. The identical procedure was successfully applied to 100 CASP7 template-based modeling domains during the prediction season in a blind fashion, and the results are included here for comparison. From this study, we observe a good correlation between the MODELLER energy and the side-chain accuracy. Our findings indicate that, when a good alignment between a target protein and its templates is provided, thorough optimization of the MODELLER energy function leads to accurate all-atom models.
Affiliation(s)
- Keehyoung Joo
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul 130-722, Korea
|
27
|
Jumawid MT, Takahashi T, Yamazaki T, Ashigai H, Mihara H. Selection and structural analysis of de novo proteins from an alpha3beta3 genetic library. Protein Sci 2009; 18:384-98. [PMID: 19173222 DOI: 10.1002/pro.41] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/11/2022]
Abstract
The construction of novel functional proteins has been a key area of protein engineering. However, there are few reports of functional proteins constructed from artificial scaffolds. Here, we have constructed a genetic library encoding alpha3beta3 de novo proteins to generate novel, smaller scaffolds using a binary combination of simplified hydrophobic and hydrophilic amino acid sets. To screen for folded de novo proteins, we used a GFP-based screening system and successfully obtained the proteins from colonies emitting very bright fluorescence at an intensity similar to that of GFP. Proteins isolated from the very bright colonies (vTAJ) and bright colonies (wTAJ) were analyzed by circular dichroism (CD), 8-anilino-1-naphthalenesulfonate (ANS) binding assay, and analytical size-exclusion chromatography (SEC). CD studies revealed that vTAJ and wTAJ proteins had both alpha-helix and beta-sheet structures with thermal stabilities. Moreover, the selected proteins demonstrated a variety of association states, existing as monomers, dimers, and oligomers. The SEC and ANS binding assays revealed that vTAJ proteins tend to have the characteristics of a folded protein rather than a molten-globule state. A vTAJ protein, vTAJ13, which has a packed globular structure and exists as a monomer, was further analyzed by nuclear magnetic resonance. NOE connectivities between backbone signals of vTAJ13 suggested that the protein contains three alpha-helices and three beta-strands as intended by its design. Thus, it would appear that artificially generated alpha3beta3 de novo proteins isolated from very bright colonies using the GFP fusion system exhibit excellent properties similar to folded proteins and would be available as artificial scaffolds to generate functional proteins with catalytic and ligand binding properties.
Affiliation(s)
- Mariejoy Therese Jumawid
- Department of Bioengineering, Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, Nagatsuta, Yokohama, Japan
|
28
|
Pechmann S, Levy ED, Tartaglia GG, Vendruscolo M. Physicochemical principles that regulate the competition between functional and dysfunctional association of proteins. Proc Natl Acad Sci U S A 2009; 106:10159-64. [PMID: 19502422 PMCID: PMC2700930 DOI: 10.1073/pnas.0812414106] [Citation(s) in RCA: 124] [Impact Index Per Article: 8.3] [Received: 12/09/2008] [Indexed: 01/31/2023] Open
Abstract
To maintain protein homeostasis, a variety of quality control mechanisms, such as the unfolded protein response and the heat shock response, enable proteins to fold and to assemble into functional complexes while avoiding the formation of aberrant and potentially harmful aggregates. We show here that a complementary contribution to the regulation of the interactions between proteins is provided by the physicochemical properties of their amino acid sequences. The results of a systematic analysis of the protein-protein complexes in the Protein Data Bank (PDB) show that interface regions are more prone to aggregate than other surface regions, indicating that many of the interactions that promote the formation of functional complexes, including hydrophobic and electrostatic forces, can potentially also cause abnormal intermolecular association. We also show, however, that aggregation-prone interfaces are prevented from triggering uncontrolled assembly by being stabilized into their functional conformations by disulfide bonds and salt bridges. These results indicate that functional and dysfunctional association of proteins are promoted by similar forces but also that they are closely regulated by the presence of specific interactions that stabilize native states.
Affiliation(s)
- Sebastian Pechmann
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Emmanuel D. Levy
- Laboratory of Molecular Biology, Medical Research Council, Hills Road, Cambridge CB2 0QH, United Kingdom
- Gian Gaetano Tartaglia
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Michele Vendruscolo
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
|
29
|
Bowman GR, Pande VS. Simulated tempering yields insight into the low-resolution Rosetta scoring functions. Proteins 2009; 74:777-88. [PMID: 18767152 DOI: 10.1002/prot.22210] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 11/07/2022]
Abstract
Rosetta is a structure prediction package that has been employed successfully in numerous protein design and other applications [1]. Previous reports have attributed the current limitations of the Rosetta de novo structure prediction algorithm to inadequate sampling, particularly during the low-resolution phase [2-5]. Here, we implement the Simulated Tempering (ST) sampling algorithm [6,7] in Rosetta to address this issue. ST is intended to yield canonical sampling by inducing a random walk in temperature space such that broad sampling is achieved at high temperatures and detailed exploration of local free energy minima is achieved at low temperatures. ST should therefore visit basins in accordance with their free energies rather than their energies and achieve more global sampling than the localized scheme currently implemented in Rosetta. However, we find that ST does not improve structure prediction with Rosetta. To understand why, we carried out a detailed analysis of the low-resolution scoring functions and find that they do not provide a strong bias towards the native state. In addition, we find that both ST and standard Rosetta runs started from the native state are biased away from the native state. Although the low-resolution scoring functions could be improved, we propose that working entirely at full-atom resolution is now possible and may be a better option due to superior native-state discrimination at full-atom resolution. Such an approach will require more attention to the kinetics of convergence, however, as functions capable of native state discrimination are not necessarily capable of rapidly guiding non-native conformations to the native state.
Affiliation(s)
- Gregory R Bowman
- Biophysics Program, Stanford University, Stanford, California 94305, USA
|
30
|
Hoque T, Chetty M, Sattar A. Extended HP model for protein structure prediction. J Comput Biol 2009; 16:85-103. [PMID: 19119994 DOI: 10.1089/cmb.2008.0082] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Indexed: 11/13/2022] Open
Abstract
This paper describes a detailed investigation of a lattice-based HP (hydrophobic-hydrophilic) model for ab initio protein structure prediction (PSP). The outcome of the simplified HP lattice model has high degeneracy, which could mislead the prediction. The HPNX model was proposed to address the degeneracy problem as well as to avoid the conformational deformity with the hydrophilic (P) residues. We have experimentally shown that it is necessary to further improve the existing HPNX model. We have found and solved a critical error in another existing model, YhHX. By extracting the significant features from the YhHX model for the HPNX model, we have proposed a novel hHPNX model. A Hybrid Genetic Algorithm (HGA) has been used to compare the predictability of these models, and hHPNX outperformed the other models. We preferred the 3D face-centered-cubic (FCC) lattice configuration because it most closely resembles the real folded 3D protein.
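As a rough illustration of the basic HP lattice model that this abstract builds on, the Python sketch below scores a conformation on an FCC lattice. It assumes the conventional HP energy of -1 for each contact between non-consecutive H residues; the abstract does not give the interaction values used by the HPNX, YhHX, or hHPNX variants, so none of those are reproduced here. The names `FCC_NEIGHBORS` and `hp_energy` are hypothetical.

```python
from itertools import product
from typing import List, Tuple

Coord = Tuple[int, int, int]

# The 12 nearest-neighbour offsets of the FCC lattice:
# exactly two coordinates are +/-1 and the third is 0.
FCC_NEIGHBORS = {
    v for v in product((-1, 0, 1), repeat=3) if sum(abs(c) for c in v) == 2
}


def hp_energy(sequence: str, coords: List[Coord]) -> int:
    """Energy of an HP chain placed on the FCC lattice.

    Assumption: each pair of H residues that are lattice neighbours but
    not adjacent along the chain contributes -1 (standard HP convention).
    Chain connectivity is not checked in this sketch.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    energy = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):  # skip chain neighbours
            if sequence[i] == "H" and sequence[j] == "H":
                delta = tuple(a - b for a, b in zip(coords[i], coords[j]))
                if delta in FCC_NEIGHBORS:
                    energy -= 1
    return energy


# A three-residue chain folded so that residues 0 and 2 touch.
conformation = [(0, 0, 0), (1, 1, 0), (1, 0, 1)]
folded = hp_energy("HPH", conformation)
no_contact = hp_energy("HPP", conformation)
```

Many distinct conformations share the same contact count under this two-letter scoring, which is the degeneracy the abstract refers to; richer alphabets such as HPNX add more interaction classes to separate them.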
Affiliation(s)
- Tamjidul Hoque
- Institute for Integrated and Intelligent Systems (IIIS), Griffith University, Nathan, QLD, Australia
- Madhu Chetty
- Gippsland School of Information Technology (GSIT), Monash University, Churchill, VIC, Australia
- Abdul Sattar
- Institute for Integrated and Intelligent Systems (IIIS), Griffith University, Nathan, QLD, Australia
|
31
|
On the Reconstruction of Three-dimensional Protein Structures from Contact Maps. Algorithms 2009. [DOI: 10.3390/a2010076] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
|
32
|
Hoque MT, Chetty M, Sattar A. Genetic Algorithm in Ab Initio Protein Structure Prediction Using Low Resolution Model: A Review. Biomedical Data and Applications 2009. [DOI: 10.1007/978-3-642-02193-0_14] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Indexed: 01/21/2023]
|
33
|
van der Kamp MW, Shaw KE, Woods CJ, Mulholland AJ. Biomolecular simulation and modelling: status, progress and prospects. J R Soc Interface 2008; 5 Suppl 3:S173-90. [PMID: 18611844 PMCID: PMC2706107 DOI: 10.1098/rsif.2008.0105.focus] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.1] [Received: 03/14/2008] [Revised: 06/05/2008] [Accepted: 06/06/2008] [Indexed: 11/12/2022] Open
Abstract
Molecular simulation is increasingly demonstrating its practical value in the investigation of biological systems. Computational modelling of biomolecular systems is an exciting and rapidly developing area, which is expanding significantly in scope. A range of simulation methods has been developed that can be applied to study a wide variety of problems in structural biology and at the interfaces between physics, chemistry and biology. Here, we give an overview of methods and some recent developments in atomistic biomolecular simulation. Some recent applications and theoretical developments are highlighted.
Affiliation(s)
- Adrian J. Mulholland
- Centre for Computational Chemistry, School of Chemistry, University of Bristol, Bristol BS8 1TS, UK
|
34
|
Süel KE, Gu H, Chook YM. Modular organization and combinatorial energetics of proline-tyrosine nuclear localization signals. PLoS Biol 2008; 6:e137. [PMID: 18532879 PMCID: PMC2408616 DOI: 10.1371/journal.pbio.0060137] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.2] [Received: 11/15/2007] [Accepted: 04/23/2008] [Indexed: 01/21/2023] Open
Abstract
Proline–tyrosine nuclear localization signals (PY-NLSs) are recognized and transported into the nucleus by human Karyopherin (Kap) β2/Transportin and yeast Kap104p. Multipartite PY-NLSs are highly diverse in sequence and structure, share a common C-terminal R/H/KX(2–5)PY motif, and can be subdivided into hydrophobic and basic subclasses based on loose N-terminal sequence motifs. PY-NLS variability is consistent with weak consensus motifs, but such diversity potentially renders comprehensive genome-scale searches intractable. Here, we use yeast Kap104p as a model system to understand the energetic organization of this NLS. First, we show that Kap104p substrates contain PY-NLSs, demonstrating their generality across eukaryotes. Previously reported Kapβ2–NLS structures explain Kap104p specificity for the basic PY-NLS. More importantly, thermodynamic analyses revealed physical properties that govern PY-NLS binding affinity: (1) PY-NLSs contain three energetically significant linear epitopes, (2) each epitope accommodates substantial sequence diversity, within defined limits, (3) the epitopes are energetically quasi-independent, and (4) a given linear epitope can contribute differently to total binding energy in different PY-NLSs, amplifying signal diversity through combinatorial mixing of energetically weak and strong motifs. The modular organization of the PY-NLS coupled with its combinatorial energetics lays a path to decode this diverse and evolvable signal for future comprehensive genome-scale identification of nuclear import substrates.
To travel between the cytoplasm and nucleus, proteins rely on a family of transport proteins known as the karyopherinβ family. Karyopherinβ2, the human version of a family member, recognizes cargo proteins containing a class of nuclear localization signal known as the PY-NLS. The yeast homolog of Karyopherinβ2, Kap104p, also recognizes PY-NLSs, indicating that this pathway has been conserved between evolutionarily distant species.
We mutated residues in the PY-NLSs of two Kap104p cargo proteins and analyzed how tightly these mutants bound Kap104p. These experiments revealed three PY-NLS regions, or epitopes, that are important for binding Kap104p. Each epitope is composed of amino acids that vary between cargoes. The epitopes are energetically independent and bind Kap104p with varying strengths in different PY-NLSs, such that mutating the epitope of one PY-NLS may mistakenly direct cargo to the cytoplasm, while a similar mutation in a different PY-NLS has little effect on cargo localization. This flexible, energetically modular, and combinatorial architecture of PY-NLSs may confer higher tolerance to mutations, but it also allows greater sequence diversity, making prediction of new PY-NLSs difficult. The characteristics of PY-NLSs reported here will assist in the identification of new Kap104p cargoes. And the approach used may be applicable to other biological recognition pathways. PY-nuclear localization signals contain three binding regions that are not closely related in sequence and are energetically quasi-independent. These modular epitopes can contribute differently to the total binding energy in different signals, to tune their affinity for binding to the carrier protein Karyopherinβ2/Kap104p, and also to amplify signal diversity.
Affiliation(s)
- Katherine E Süel
- Department of Pharmacology, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas, United States of America
- Hongmei Gu
- Department of Pharmacology, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas, United States of America
- Yuh Min Chook
- Department of Pharmacology, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas, United States of America
- * To whom correspondence should be addressed.
|
35
|
Pawłowski K. Uncharacterized/hypothetical proteins in biomedical 'omics' experiments: is novelty being swept under the carpet? Briefings in Functional Genomics and Proteomics 2008; 7:283-90. [PMID: 18641417 DOI: 10.1093/bfgp/eln033] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022]
Abstract
Many 'omics' studies, gene expression microarray experiments in particular, aim at charting the molecular mechanisms of physiology, disease and drug response. This short review discusses the bias present in many such studies, in which the focus is set on well-understood and established molecular scenarios. The under-reporting rate of 'hypothetical' or uncharacterized genes and proteins, differentially regulated in disease context, is assessed here. Reasons for this bias are discussed. Particular examples from the genomics studies on respiratory diseases are presented. This review aims at increasing awareness of the unexplored genomics data and proposes remedies in order to refocus genomics studies on the less-charted territories of the genome, transcriptome and proteome. It is suggested that routine use of function prediction methods in conjunction with omics analyses may allow better interpretation of the data, and facilitate discovery of true novelty.
Affiliation(s)
- Krzysztof Pawłowski
- Nencki Institute of Experimental Biology, PAS, Warsaw University of Life Sciences, Warszawa, Poland.
|
36
|
Crystal structure of the CaV2 IQ domain in complex with Ca2+/calmodulin: high-resolution mechanistic implications for channel regulation by Ca2+. Structure 2008; 16:607-20. [PMID: 18400181 DOI: 10.1016/j.str.2008.01.011] [Citation(s) in RCA: 80] [Impact Index Per Article: 5.0] [Received: 11/26/2007] [Revised: 01/12/2008] [Accepted: 01/22/2008] [Indexed: 11/21/2022]
Abstract
Calmodulin (CaM) regulation of Ca2+ channels is central to Ca2+ signaling. CaV1 versus CaV2 classes of these channels exhibit divergent forms of regulation, potentially relating to customized CaM/IQ interactions among different channels. Here we report the crystal structures for the Ca2+/CaM IQ domains of both CaV2.1 and CaV2.3 channels. These highly similar structures emphasize that major CaM contacts with the IQ domain extend well upstream of traditional consensus residues. Surprisingly, upstream mutations strongly diminished CaV2.1 regulation, whereas downstream perturbations had limited effects. Furthermore, our CaV2 structures closely resemble published Ca2+/CaM-CaV1.2 IQ structures, arguing against CaV1/2 regulatory differences based solely on contrasting CaM/IQ conformations. Instead, alanine scanning of the CaV2.1 IQ domain, combined with structure-based molecular simulation of corresponding CaM/IQ binding energy perturbations, suggests that the C lobe of CaM partially dislodges from the IQ element during channel regulation, allowing exposed IQ residues to trigger regulation via isoform-specific interactions with alternative channel regions.
|
37
|
Safety assessment of food products from r-DNA animals. Comp Immunol Microbiol Infect Dis 2008; 32:163-89. [PMID: 18258300 DOI: 10.1016/j.cimid.2007.11.007] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Accepted: 11/10/2007] [Indexed: 01/26/2023]
Abstract
Recombinant-DNA (transgenic) animals intended for food production are approaching the market. Among them, recombinant-DNA fishes constitute the most advanced case. As a result, intergovernmental organizations are working on guidelines which would eventually become international standards for national food safety assessments of these products. This article reviews the emerging elements for the food safety assessment of products derived from recombinant-DNA animals. These elements will become highly relevant both for researchers and regulators interested in developing or analyzing recombinant-DNA animals intended to be used in the commercial elaboration of food products. It also provides references to science-based tools that can be used to support food safety assessments. Finally, it proposes recommendations for the further development of biosafety assessment methodologies in this area.
|
38
|
Schmidt Am Busch M, Lopes A, Mignon D, Simonson T. Computational protein design: Software implementation, parameter optimization, and performance of a simple model. J Comput Chem 2008; 29:1092-102. [DOI: 10.1002/jcc.20870] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Indexed: 12/20/2022]
|
39
|
Abstract
The "protein folding problem" consists of three closely related puzzles: (a) What is the folding code? (b) What is the folding mechanism? (c) Can we predict the native structure of a protein from its amino acid sequence? Once regarded as a grand challenge, protein folding has seen great progress in recent years. Now, foldable proteins and nonbiological polymers are being designed routinely and moving toward successful applications. The structures of small proteins are now often well predicted by computer methods. And, there is now a testable explanation for how a protein can fold so quickly: A protein solves its large global optimization problem as a series of smaller local optimization problems, growing and assembling the native structure from peptide fragments, local structures first.
Affiliation(s)
- Ken A. Dill
- Department of Pharmaceutical Chemistry, University of California, San Francisco, California 94143
- Graduate Group in Biophysics, University of California, San Francisco, California 94143;
| | - S. Banu Ozkan
- Department of Physics, Arizona State University, Tempe, Arizona 85287;
| | - M. Scott Shell
- Department of Chemical Engineering, University of California, Santa Barbara, California 93106;
| | - Thomas R. Weikl
- Max Planck Institute of Colloids and Interfaces, Department of Theory and Bio-Systems, 14424 Potsdam, Germany;
|
40
|
Armstrong KA, Tidor B. Computationally mapping sequence space to understand evolutionary protein engineering. Biotechnol Prog 2007; 24:62-73. [PMID: 18020358 DOI: 10.1021/bp070134h] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 12/29/2022]
Abstract
Evolutionary protein engineering has been dramatically successful, producing a wide variety of new proteins with altered stability, binding affinity, and enzymatic activity. However, the success of such procedures is often unreliable, and the impact of the choice of protein, engineering goal, and evolutionary procedure is not well understood. We have created a framework for understanding aspects of the protein engineering process by computationally mapping regions of feasible sequence space for three small proteins using structure-based design protocols. We then tested the ability of different evolutionary search strategies to explore these sequence spaces. The results point to a non-intuitive relationship between the error-prone PCR mutation rate and the number of rounds of replication. The evolutionary relationships among feasible sequences reveal hub-like sequences that serve as particularly fruitful starting sequences for evolutionary search. Moreover, genetic recombination procedures were examined, and tradeoffs relating sequence diversity and search efficiency were identified. This framework allows us to consider the impact of protein structure on the allowed sequence space and therefore on the challenges that each protein presents to error-prone PCR and genetic recombination procedures.
Affiliation(s)
- Kathryn A Armstrong
- Computer Science and Artificial Intelligence Laboratory, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139-4307, USA
|
41
|
Abstract
This review presents the advances in protein structure prediction from the computational methods perspective. The approaches are classified into four major categories: comparative modeling, fold recognition, first principles methods that employ database information, and first principles methods without database information. Important advances along with current limitations and challenges are presented.
Affiliation(s)
- C A Floudas
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, USA.
|
42
|
Lippow SM, Tidor B. Progress in computational protein design. Curr Opin Biotechnol 2007; 18:305-11. [PMID: 17644370 PMCID: PMC3495006 DOI: 10.1016/j.copbio.2007.04.009] [Citation(s) in RCA: 161] [Impact Index Per Article: 9.5] [Received: 04/02/2007] [Accepted: 04/17/2007] [Indexed: 11/25/2022]
Abstract
Current progress in computational structure-based protein design is reviewed in the areas of methodology and applications. Foundational advances include new potential functions, more efficient ways of computing energetics, flexible treatments of solvent, and useful energy function approximations, as well as ensemble-based approaches to scoring designs for inclusion of entropic effects, improvements to guaranteed and to stochastic search techniques, and methods to design combinatorial libraries for screening and selection. Applications include new approaches and successes in the design of specificity for protein folding, binding, and catalysis, in the redesign of proteins for enhanced binding affinity, and in the application of design technology to study and alter enzyme catalysis. Computational protein design continues to mature and advance.
Affiliation(s)
- Shaun M Lippow
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA.
|
43
|
Dill KA, Ozkan SB, Weikl TR, Chodera JD, Voelz VA. The protein folding problem: when will it be solved? Curr Opin Struct Biol 2007; 17:342-6. [PMID: 17572080 DOI: 10.1016/j.sbi.2007.06.001] [Citation(s) in RCA: 163] [Impact Index Per Article: 9.6] [Received: 02/12/2007] [Revised: 04/11/2007] [Accepted: 06/06/2007] [Indexed: 01/29/2023]
Abstract
The protein folding problem can be viewed as three different problems: defining the thermodynamic folding code; devising a good computational structure prediction algorithm; and answering Levinthal's question regarding the kinetic mechanism of how proteins can fold so quickly. Once regarded as a grand challenge, protein folding has seen much progress in recent years. Folding codes are now being used to successfully design proteins and non-biological foldable polymers; aided by the Critical Assessment of Techniques for Structure Prediction (CASP) competition, protein structure prediction has now become quite good. Even the once-challenging Levinthal puzzle now seems to have an answer--a protein can avoid searching irrelevant conformations and fold quickly by making local independent decisions first, followed by non-local global decisions later.
Affiliation(s)
- Ken A Dill
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA 94143, USA.
|
44
|
Goodman CM, Choi S, Shandler S, DeGrado WF. Foldamers as versatile frameworks for the design and evolution of function. Nat Chem Biol 2007; 3:252-62. [PMID: 17438550 PMCID: PMC3810020 DOI: 10.1038/nchembio876] [Citation(s) in RCA: 759] [Impact Index Per Article: 44.6] [Indexed: 11/08/2022]
Abstract
Foldamers are sequence-specific oligomers akin to peptides, proteins and oligonucleotides that fold into well-defined three-dimensional structures. They offer the chemical biologist a broad palette of building blocks for the construction of molecules that test and extend our understanding of protein folding and function. Foldamers also provide templates for presenting complex arrays of functional groups in virtually unlimited geometrical patterns, thereby presenting attractive opportunities for the design of molecules that bind in a sequence- and structure-specific manner to oligosaccharides, nucleic acids, membranes and proteins. We summarize recent advances and highlight the future applications and challenges of this rapidly expanding field.
Affiliation(s)
- Catherine M Goodman
- Department of Biochemistry and Biophysics, University of Pennsylvania, School of Medicine, 422 Curie Boulevard, Philadelphia, Pennsylvania 19104-6059, USA
|
45
|
Liu T, Whitten ST, Hilser VJ. Functional residues serve a dominant role in mediating the cooperativity of the protein ensemble. Proc Natl Acad Sci U S A 2007; 104:4347-52. [PMID: 17360527 PMCID: PMC1838605 DOI: 10.1073/pnas.0607132104] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.4] [Received: 08/16/2006] [Indexed: 11/18/2022] Open
Abstract
Conformational fluctuations in proteins have emerged as a potentially important aspect of biological function, although the precise relationship and the implications have yet to be fully explored. Numerous studies have reported that the binding of ligand can influence fluctuations. However, the role of the binding site in mediating these fluctuations is not known. Of particular interest is whether in addition to serving as structural scaffolds for recognition and catalysis, active-site residues may also play a role in modulating the cooperative network. To address this question, we employ an experimentally validated ensemble-based description of proteins to elucidate the extent to which perturbations at different sites can influence the cooperative network in the protein. Applying this method to a database of test proteins, it is found statistically that binding sites are located in regions most able to affect the cooperative network, even for cooperative interactions between residues distant to the binding sites. This indicates that the conformational manifold under native conditions is determined by the network of cooperative interactions within the protein and suggests that proteins have evolved to use these conformational fluctuations in carrying out their functions. Furthermore, because the energetic coupling pattern calculated for each protein is robust and relatively insensitive to sequence, these studies further suggest that binding sites evolved in regions of the protein that are inherently poised to take advantage of the fluctuations in the native structure.
Affiliation(s)
- Tong Liu
- Department of Biochemistry and Molecular Biology, and Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Galveston, TX 77555-1068
| | - Steven T. Whitten
- Department of Biochemistry and Molecular Biology, and Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Galveston, TX 77555-1068
| | - Vincent J. Hilser
- Department of Biochemistry and Molecular Biology, and Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Galveston, TX 77555-1068
|
46
|
Liwo A, Khalili M, Czaplewski C, Kalinowski S, Ołdziej S, Wachucik K, Scheraga HA. Modification and optimization of the united-residue (UNRES) potential energy function for canonical simulations. I. Temperature dependence of the effective energy function and tests of the optimization method with single training proteins. J Phys Chem B 2007; 111:260-85. [PMID: 17201450 PMCID: PMC3236617 DOI: 10.1021/jp065380a] [Citation(s) in RCA: 157] [Impact Index Per Article: 9.2] [Indexed: 11/28/2022]
Abstract
We report the modification and parametrization of the united-residue (UNRES) force field for energy-based protein structure prediction and protein folding simulations. We tested the approach on three training proteins separately: 1E0L (beta), 1GAB (alpha), and 1E0G (alpha + beta). Heretofore, the UNRES force field had been designed and parametrized to locate native-like structures of proteins as global minima of their effective potential energy surfaces, which largely neglected the conformational entropy because decoys composed of only lowest-energy conformations were used to optimize the force field. Recently, we developed a mesoscopic dynamics procedure for UNRES and applied it with success to simulate protein folding pathways. However, the force field turned out to be largely biased toward α-helical structures in canonical simulations because the conformational entropy had been neglected in the parametrization. We applied the hierarchical optimization method, developed in our earlier work, to optimize the force field; in this method, the conformational space of a training protein is divided into levels, each corresponding to a certain degree of native-likeness. The levels are ordered according to increasing native-likeness; level 0 corresponds to structures with no native-like elements, and the highest level corresponds to the fully native-like structures. The aim of optimization is to achieve the order of the free energies of levels, decreasing as their native-likeness increases. The procedure is iterative, and decoys of the training protein(s) generated with the energy function parameters of the preceding iteration are used to optimize the force field in a current iteration. We applied the multiplexing replica-exchange molecular dynamics (MREMD) method, recently implemented in UNRES, to generate decoys; with this modification, conformational entropy is taken into account. Moreover, we optimized the free-energy gaps between levels at temperatures corresponding to a predominance of folded or unfolded structures, as well as to structures at the putative folding-transition temperature, changing the sign of the gaps at the transition temperature. This enabled us to obtain force fields characterized by a single peak in the heat capacity at the transition temperature. Furthermore, we introduced temperature dependence to the UNRES force field; this is consistent with the fact that it is a free-energy and not a potential energy function.
Affiliation(s)
- Adam Liwo
- Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, N.Y., 14853-1301, U.S.A
| | - Mey Khalili
- Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, N.Y., 14853-1301, U.S.A
| | - Cezary Czaplewski
- Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, N.Y., 14853-1301, U.S.A
| | - Sebastian Kalinowski
- Faculty of Chemistry, University of Gdańsk, Sobieskiego 18, 80-952 Gdańsk, Poland
| | - Stanisław Ołdziej
- Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, N.Y., 14853-1301, U.S.A
| | - Katarzyna Wachucik
- Faculty of Chemistry, University of Gdańsk, Sobieskiego 18, 80-952 Gdańsk, Poland
| | - Harold A. Scheraga
- Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, N.Y., 14853-1301, U.S.A
|
47
|
Jones DT, Sternberg MJE, Thornton JM. Introduction. Bioinformatics: from molecules to systems. Philos Trans R Soc Lond B Biol Sci 2006; 361:389-91. [PMID: 16524827 PMCID: PMC1609343 DOI: 10.1098/rstb.2005.1811] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022] Open
Affiliation(s)
- David T Jones
- University College London Department of Computer Science, Bioinformatics Unit Gower Street, London WC1E 6BT, UK
|