1
Dos Santos VP, Rodrigues A, Dutra G, Bastos L, Mariano D, Mendonça JG, Lobo YJG, Mendes E, Maia G, Machado KDS, Werhli AV, Rocha G, de Lima LHF, de Melo-Minardi R. E-Volve: understanding the impact of mutations in SARS-CoV-2 variants spike protein on antibodies and ACE2 affinity through patterns of chemical interactions at protein interfaces. PeerJ 2022; 10:e13099. [PMID: 35341044 PMCID: PMC8953562 DOI: 10.7717/peerj.13099] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/24/2021] [Accepted: 02/21/2022] [Indexed: 01/12/2023] Open
Abstract
Background: The SARS-CoV-2 pandemic reverberated, posing health and social hygiene obstacles throughout the globe. Mutant lineages of the virus have concerned scientists because of convergent amino acid alterations, mainly on the viral spike protein. Studies have shown that neutralizing antibodies are less active against these mutants, which also bind the human cell receptor, the ACE2 protein, with enhanced affinity.
Methods: To measure in real time the impact of variant strains on such complexes, we implemented E-Volve, a tool designed to model a structure with a list of mutations requested by users and return analyses of the variant protein. As a proof of concept, we scrutinized the spike-antibody and spike-ACE2 complexes formed in the variants of concern B.1.1.7 (Alpha), B.1.351 (Beta), and P.1 (Gamma), using contact maps depicting the interactions made between them, along with heat maps to quantify these major interactions.
Results: The results of this study depict the highly frequent interface changes made by the entire set of mutations, mainly driven by N501Y and E484K. In the spike-antibody complex, we noticed alterations in electrostatic surface complementarity, breaching essential sites in the P17 and BD-368-2 antibodies. In addition, the spike-ACE2 complex presented new hydrophobic bonds.
Discussion: Molecular dynamics simulations followed by Poisson-Boltzmann calculations corroborate the higher complementarity to the receptor and lower complementarity to the antibodies for the K417T/E484K/N501Y (Gamma) mutant compared to the wild-type strain, as pointed out by E-Volve, as well as an intensification of this effect by changes in the protein conformational equilibrium in solution. A local disorder of the loop α1'/β1', as well as its possible effects on the affinity to the BD-368-2 antibody, were also incorporated into the final conclusions after this analysis.
Moreover, E-Volve can depict the main alterations in important biological structures, as shown in the SARS-CoV-2 complexes, marking a major step in the real-time tracking of the virus mutant lineages. E-Volve is available at http://bioinfo.dcc.ufmg.br/evolve.
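As a rough illustration of what an interface contact map records, the sketch below lists residue pairs from two chains that lie within a distance cutoff. It is a purely geometric stand-in and not E-Volve's method, which classifies chemical interaction types; the function name `interface_contacts`, the one-point-per-residue representation, and the 8 Å cutoff are all assumptions made for this example.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def interface_contacts(
    chain_a: Dict[str, Coord],
    chain_b: Dict[str, Coord],
    cutoff: float = 8.0,  # assumed cutoff in angstroms, not a value from the paper
) -> List[Tuple[str, str, float]]:
    """Return (residue_a, residue_b, distance) for pairs within the cutoff.

    Each residue is represented by a single point (for example its C-alpha),
    which is a simplification chosen for this sketch.
    """
    contacts = []
    for res_a, xyz_a in chain_a.items():
        for res_b, xyz_b in chain_b.items():
            distance = math.dist(xyz_a, xyz_b)
            if distance <= cutoff:
                contacts.append((res_a, res_b, distance))
    # Closest pairs first, so the strongest candidates appear at the top.
    return sorted(contacts, key=lambda contact: contact[2])


# Toy coordinates, invented for the example (not taken from any structure).
toy_a = {"A1": (0.0, 0.0, 0.0), "A2": (20.0, 0.0, 0.0)}
toy_b = {"B1": (3.0, 4.0, 0.0)}
print(interface_contacts(toy_a, toy_b))
```

Comparing the contact lists of a wild-type and a mutant model would then show which interface pairs are gained or lost, which is the kind of difference a heat map could summarise.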
Affiliation(s)
- Vitor Pimentel Dos Santos
- Laboratory of Bioinformatics and Systems, Institute of Exact Sciences, Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
- André Rodrigues
- Laboratory of Bioinformatics and Systems, Institute of Exact Sciences, Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
- Gabriel Dutra
- Laboratory of Bioinformatics and Systems, Institute of Exact Sciences, Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
- Luana Bastos
- Laboratory of Bioinformatics and Systems, Institute of Exact Sciences, Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
- Diego Mariano
- Laboratory of Bioinformatics and Systems, Institute of Exact Sciences, Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
- José Gutembergue Mendonça
- Laboratory of Quantum and Computational Chemistry, Center of Exact and Natural Sciences, Department of Chemistry, Universidade Federal da Paraíba, João Pessoa, PB, Brazil
- Yan Jerônimo Gomes Lobo
- Laboratory of Molecular Modeling and Bioinformatics, Campus Sete Lagoas, Department of Exact and Biological Sciences, Universidade Federal de São João del-Rei, Sete Lagoas, MG, Brazil
- Eduardo Mendes
- Laboratory of Bioinformatics and Systems, Institute of Exact Sciences, Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
- Giovana Maia
- Laboratory of Bioinformatics and Systems, Institute of Exact Sciences, Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
- Karina dos Santos Machado
- Computational Biology Laboratory (ComBi-Lab), Center for Computational Sciences-C3, Universidade Federal do Rio Grande, Rio Grande, RS, Brazil
- Adriano Velasque Werhli
- Computational Biology Laboratory (ComBi-Lab), Center for Computational Sciences-C3, Universidade Federal do Rio Grande, Rio Grande, RS, Brazil
- Gerd Rocha
- Laboratory of Quantum and Computational Chemistry, Center of Exact and Natural Sciences, Department of Chemistry, Universidade Federal da Paraíba, João Pessoa, PB, Brazil
- Leonardo Henrique França de Lima
- Laboratory of Molecular Modeling and Bioinformatics, Campus Sete Lagoas, Department of Exact and Biological Sciences, Universidade Federal de São João del-Rei, Sete Lagoas, MG, Brazil
- Raquel de Melo-Minardi
- Laboratory of Bioinformatics and Systems, Institute of Exact Sciences, Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
2
Abbass J, Nebel JC. Enhancing fragment-based protein structure prediction by customising fragment cardinality according to local secondary structure. BMC Bioinformatics 2020; 21:170. [PMID: 32357827 PMCID: PMC7195757 DOI: 10.1186/s12859-020-3491-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/30/2019] [Accepted: 04/13/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Whenever suitable template structures are not available, fragment-based protein structure prediction becomes the only practical alternative, as pure ab initio techniques require massive computational resources even for very small proteins. However, the inaccuracy of their energy functions and their stochastic nature require the generation of a large number of decoys to explore the solution space adequately, limiting their usage to small proteins. Taking advantage of the uneven complexity of the sequence-structure relationship of short fragments, we adjusted the fragment insertion process by customising the number of available fragment templates according to the expected complexity of the predicted local secondary structure. Whereas the number of fragments is kept at its default value for coil regions, important and dramatic reductions are proposed for beta sheet and alpha helical regions, respectively.
RESULTS: The evaluation of our fragment selection approach was conducted using an enhanced version of the popular Rosetta fragment-based protein structure prediction tool. It was modified so that the number of fragment candidates used in Rosetta could be adjusted based on the local secondary structure. Compared to Rosetta's standard predictions, our strategy delivered improved first models, +24% and +6% in terms of GDT, when using 2000 and 20,000 decoys, respectively, while significantly reducing the number of fragment candidates. Furthermore, our enhanced version of Rosetta is able to deliver with 2000 decoys a performance equivalent to that produced by standard Rosetta while using 20,000 decoys. We hypothesise that, as the fragment insertion process focuses on the most challenging regions, such as coils, fewer decoys are needed to explore conformation spaces satisfactorily.
CONCLUSIONS: Taking advantage of the high accuracy of sequence-based secondary structure predictions, we showed the value of that information to customise the number of candidates used during the fragment insertion process of fragment-based protein structure prediction. Experiments conducted using standard Rosetta showed that, when using the recommended number of decoys, i.e. 20,000, our strategy produces better results. Alternatively, similar results can be achieved using only 2000 decoys. Consequently, we recommend the adoption of this strategy to either improve model quality significantly or reduce processing times by a factor of 10.
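The idea of tying the number of candidate fragments to the predicted local secondary structure can be sketched in a few lines. This is a minimal illustration and not the authors' modified Rosetta code: the function name `fragments_per_position`, the three-state `H`/`E`/`C` encoding, and every numeric count below are assumptions chosen for the example, since the abstract gives no specific values.

```python
from typing import Dict, List

# Placeholder counts: the abstract only says that coil keeps the default,
# beta sheet gets an "important" reduction and alpha helix a "dramatic" one.
# None of these numbers come from the paper.
DEFAULT_FRAGMENTS = 200
FRAGMENTS_PER_CLASS: Dict[str, int] = {
    "C": DEFAULT_FRAGMENTS,  # coil: keep the default number of candidates
    "E": 100,                # beta strand: assumed moderate reduction
    "H": 20,                 # alpha helix: assumed strong reduction
}


def fragments_per_position(secondary_structure: str) -> List[int]:
    """Map a predicted H/E/C string to a candidate count per position.

    Unknown symbols fall back to the default, which is the conservative choice.
    """
    return [
        FRAGMENTS_PER_CLASS.get(state, DEFAULT_FRAGMENTS)
        for state in secondary_structure
    ]


print(fragments_per_position("CCHHHHEEC"))
```

With such a table, a mostly helical protein would offer far fewer candidates per insertion than a coil-rich one, which is consistent with the abstract's hypothesis that sampling effort shifts toward the hardest regions.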
Affiliation(s)
- Jad Abbass
- Faculty of Science, Engineering and Computing, Kingston University, London, KT1 2EE UK
- Department of Computer Science, Lebanese International University, Bekaa, Lebanon
- Jean-Christophe Nebel
- Faculty of Science, Engineering and Computing, Kingston University, London, KT1 2EE UK
3
Abstract
Chemical Shift-Rosetta (CS-Rosetta) is an automated method that employs NMR chemical shifts to model protein structures de novo. In this chapter, we introduce the terminology and central concepts of CS-Rosetta. We describe the architecture and functionality of automatic NOESY assignment (AutoNOE) and structure determination protocols (Abrelax and RASREC) within the CS-Rosetta framework. We further demonstrate how CS-Rosetta can discriminate near-native structures against a large conformational search space using restraints obtained from NMR data, and/or sequence and structure homology. We highlight how CS-Rosetta can be combined with alternative automated approaches to (i) model oligomeric systems and (ii) create an NMR-based structure determination pipeline. To show its practical applicability, we emphasize the computational requirements and performance of CS-Rosetta for protein targets of varying molecular weight and complexity. Finally, we discuss the current Python interface, which enables easy execution of protocols for rapid and accurate high-resolution structure determination.
Affiliation(s)
- Santrupti Nerli
- Department of Chemistry and Biochemistry, University of California Santa Cruz, Santa Cruz, CA, United States; Department of Computer Science, University of California Santa Cruz, Santa Cruz, CA, United States
- Nikolaos G Sgourakis
- Department of Chemistry and Biochemistry, University of California Santa Cruz, Santa Cruz, CA, United States
4
Nerli S, McShan AC, Sgourakis NG. Chemical shift-based methods in NMR structure determination. Progress in Nuclear Magnetic Resonance Spectroscopy 2018; 106-107:1-25. [PMID: 31047599 PMCID: PMC6788782 DOI: 10.1016/j.pnmrs.2018.03.002] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 01/25/2018] [Revised: 03/09/2018] [Accepted: 03/09/2018] [Indexed: 05/08/2023]
Abstract
Chemical shifts are highly sensitive probes harnessed by NMR spectroscopists and structural biologists as conformational parameters to characterize a range of biological molecules. Traditionally, assignment of chemical shifts has been a labor-intensive process requiring numerous samples and a suite of multidimensional experiments. Over the past two decades, the development of complementary computational approaches has bolstered the analysis, interpretation and utilization of chemical shifts for elucidation of high resolution protein and nucleic acid structures. Here, we review the development and application of chemical shift-based methods for structure determination with a focus on ab initio fragment assembly, comparative modeling, oligomeric systems, and automated assignment methods. Throughout our discussion, we point out practical uses, as well as advantages and caveats, of using chemical shifts in structure modeling. We additionally highlight (i) hybrid methods that employ chemical shifts with other types of NMR restraints (residual dipolar couplings, paramagnetic relaxation enhancements and pseudocontact shifts) that allow for improved accuracy and resolution of generated 3D structures, (ii) the utilization of chemical shifts to model the structures of sparsely populated excited states, and (iii) modeling of sidechain conformations. Finally, we briefly discuss the advantages of contemporary methods that employ sparse NMR data recorded using site-specific isotope labeling schemes for chemical shift-driven structure determination of larger molecules. With this review, we aim to emphasize the accessibility and versatility of chemical shifts for structure determination of challenging biological systems, and to point out emerging areas of development that lead us towards the next generation of tools.
Affiliation(s)
- Santrupti Nerli
- Department of Chemistry and Biochemistry, University of California Santa Cruz, Santa Cruz, CA 95064, United States; Department of Computer Science, University of California Santa Cruz, Santa Cruz, CA 95064, United States
- Andrew C McShan
- Department of Chemistry and Biochemistry, University of California Santa Cruz, Santa Cruz, CA 95064, United States
- Nikolaos G Sgourakis
- Department of Chemistry and Biochemistry, University of California Santa Cruz, Santa Cruz, CA 95064, United States
5
de Oliveira SHP, Law EC, Shi J, Deane CM. Sequential search leads to faster, more efficient fragment-based de novo protein structure prediction. Bioinformatics 2018; 34:1132-1140. [PMID: 29136098 PMCID: PMC6030820 DOI: 10.1093/bioinformatics/btx722] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 02/17/2017] [Revised: 09/22/2017] [Accepted: 11/04/2017] [Indexed: 01/12/2023] Open
Abstract
Motivation: Most current de novo structure prediction methods randomly sample protein conformations and thus require large amounts of computational resource. Here, we consider a sequential sampling strategy, building on ideas from recent experimental work which shows that many proteins fold cotranslationally.
Results: We have investigated whether a pseudo-greedy search approach, which begins sequentially from one of the termini, can improve the performance and accuracy of de novo protein structure prediction. We observed that our sequential approach converges when fewer than 20 000 decoys have been produced, fewer than commonly expected. Using our software, SAINT2, we also compared the run time and quality of models produced in a sequential fashion against a standard, non-sequential approach. Sequential prediction produces an individual decoy 1.5-2.5 times faster than non-sequential prediction. When considering the quality of the best model, sequential prediction led to a better model being produced for 31 out of 41 soluble protein validation cases and for 18 out of 24 transmembrane protein cases. Correct models (TM-Score > 0.5) were produced for 29 of these cases by the sequential mode and for only 22 by the non-sequential mode. Our comparison reveals that a sequential search strategy can be used to drastically reduce computational time of de novo protein structure prediction and improve accuracy.
Availability and implementation: Data are available for download from: http://opig.stats.ox.ac.uk/resources. SAINT2 is available for download from: https://github.com/sauloho/SAINT2.
Contact: saulo.deoliveira@dtc.ox.ac.uk.
Supplementary information: Supplementary data are available at Bioinformatics online.
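The TM-Score > 0.5 criterion used in this abstract to call a model correct can be made concrete with a short sketch. It assumes the model has already been superposed on the native structure and that the distances between aligned Cα pairs are known; the full TM-score additionally searches for the superposition that maximises the sum, which is omitted here. The score is (1/L) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24 (L − 15)^(1/3) − 1.8 Å, where L is the target length. The function name and the 0.5 Å floor on d0 for short chains are assumptions of this sketch, and it is not part of SAINT2.

```python
from typing import Sequence


def tm_score_fixed_superposition(
    distances: Sequence[float], target_length: int
) -> float:
    """TM-score-style sum for one fixed superposition (no alignment search).

    distances: C-alpha distances in angstroms for the aligned residue pairs.
    target_length: number of residues in the native (target) structure.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    if len(distances) > target_length:
        raise ValueError("more aligned pairs than target residues")
    # Length-dependent distance scale d0, in angstroms.
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5
    d0 = max(d0, 0.5)  # assumed floor so that short chains never give d0 <= 0
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    # Unaligned target residues contribute zero, hence division by the full length.
    return total / target_length


# A model matching half of a 100-residue target perfectly scores 0.5.
print(tm_score_fixed_superposition([0.0] * 50, 100))
```

Because the sum is normalised by the target length rather than by the number of aligned pairs, a partial but accurate model is penalised for the residues it leaves unmatched.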
Affiliation(s)
- Eleanor C Law
- Department of Statistics, University of Oxford, Oxford, UK
- Jiye Shi
- Department of Informatics, UCB Pharma, Slough, UK
- Division of Physical Biology, Shanghai Institute of Applied Physics, Chinese Academy of Sciences, Shanghai, China
6
Protein Tertiary Structure by Crosslinking/Mass Spectrometry. Trends Biochem Sci 2018; 43:157-169. [PMID: 29395654 PMCID: PMC5854373 DOI: 10.1016/j.tibs.2017.12.006] [Citation(s) in RCA: 64] [Impact Index Per Article: 10.7] [Received: 11/12/2017] [Revised: 12/19/2017] [Accepted: 12/21/2017] [Indexed: 12/21/2022]
Abstract
Observing the structures of proteins within the cell and tracking structural changes under different cellular conditions are the ultimate challenges for structural biology. This, however, requires an experimental technique that can generate sufficient data for structure determination and is applicable in the native environment of proteins. Crosslinking/mass spectrometry (CLMS) and protein structure determination have recently advanced to meet these requirements and crosslinking-driven de novo structure determination in native environments is now possible. In this opinion article, we highlight recent successes in the field of CLMS with protein structure modeling and challenges it still holds. The earliest structural studies on proteins using crosslinking/mass spectrometry aimed to elucidate their tertiary three-dimensional structure. Tertiary structure modeling using crosslinking fell out of favor for almost two decades because crosslink data were not informative to aid structure modeling. Two game-changing trends emerged: using short-range crosslinkers that capture relevant modeling information and high-density crosslinking. High-density crosslinking uses unspecific crosslinkers to dramatically increase crosslink numbers. In addition, computational structure modeling methods made significant progress in exploiting CLMS data. The combination of high-density crosslinking and computational structure modeling enables the elucidation of tertiary protein structure in native environments. This sidesteps the key limitation of today’s structure determination methods, which are unable (except for a few, specialized methods) to probe the structure of proteins in cell lysates or even intact cells.
7
Gaalswyk K, Rowley CN. An explicit-solvent conformation search method using open software. PeerJ 2016; 4:e2088. [PMID: 27280078 PMCID: PMC4893328 DOI: 10.7717/peerj.2088] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 04/05/2016] [Accepted: 05/06/2016] [Indexed: 02/05/2023] Open
Abstract
Computer modeling is a popular tool to identify the most-probable conformers of a molecule. Although the solvent can have a large effect on the stability of a conformation, many popular conformational search methods are only capable of describing molecules in the gas phase or with an implicit solvent model. We have developed a work-flow for performing a conformation search on explicitly-solvated molecules using open source software. This method uses replica exchange molecular dynamics (REMD) to sample the conformational states of the molecule efficiently. Cluster analysis is used to identify the most probable conformations from the simulated trajectory. This work-flow was tested on drug molecules α-amanitin and cabergoline to illustrate its capabilities and effectiveness. The preferred conformations of these molecules in gas phase, implicit solvent, and explicit solvent are significantly different.
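The exchange step at the heart of temperature REMD can be illustrated with the standard Metropolis criterion, in which a swap between replicas i and j is accepted with probability min(1, exp[(β_i − β_j)(E_i − E_j)]), where β = 1/(k_B T). The sketch below is generic and is not taken from the authors' work-flow; the function name `exchange_probability` and the choice of kcal/mol energy units are assumptions.

```python
import math

BOLTZMANN_KCAL_PER_MOL_K = 0.0019872041  # k_B in kcal/(mol*K)


def exchange_probability(
    energy_i: float, energy_j: float, temp_i: float, temp_j: float
) -> float:
    """Metropolis acceptance probability for swapping two REMD replicas.

    energy_i, energy_j: potential energies (kcal/mol) of the configurations
    currently held at temperatures temp_i and temp_j (kelvin).
    """
    if temp_i <= 0.0 or temp_j <= 0.0:
        raise ValueError("temperatures must be positive")
    beta_i = 1.0 / (BOLTZMANN_KCAL_PER_MOL_K * temp_i)
    beta_j = 1.0 / (BOLTZMANN_KCAL_PER_MOL_K * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # A non-negative exponent means the swap is always accepted.
    return 1.0 if delta >= 0.0 else math.exp(delta)


# The colder replica holds the higher-energy configuration: always swap.
print(exchange_probability(-90.0, -100.0, 300.0, 310.0))
# The colder replica already holds the lower energy: swap only sometimes.
print(exchange_probability(-100.0, -90.0, 300.0, 310.0))
```

Swaps that move a lower-energy configuration to the colder replica are always accepted, while the reverse is accepted only occasionally; this is what lets configurations escape local minima at high temperature and still be sampled correctly at the temperature of interest.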
Affiliation(s)
- Kari Gaalswyk
- Department of Chemistry, Memorial University of Newfoundland, St. John's, Newfoundland and Labrador, Canada
- Christopher N Rowley
- Department of Chemistry, Memorial University of Newfoundland, St. John's, Newfoundland and Labrador, Canada
8
Maximova T, Moffatt R, Ma B, Nussinov R, Shehu A. Principles and Overview of Sampling Methods for Modeling Macromolecular Structure and Dynamics. PLoS Comput Biol 2016; 12:e1004619. [PMID: 27124275 PMCID: PMC4849799 DOI: 10.1371/journal.pcbi.1004619] [Citation(s) in RCA: 132] [Impact Index Per Article: 16.5] [Indexed: 01/08/2023] Open
Abstract
Investigation of macromolecular structure and dynamics is fundamental to understanding how macromolecules carry out their functions in the cell. Significant advances have been made toward this end in silico, with a growing number of computational methods proposed yearly to study and simulate various aspects of macromolecular structure and dynamics. This review aims to provide an overview of recent advances, focusing primarily on methods proposed for exploring the structure space of macromolecules in isolation and in assemblies for the purpose of characterizing equilibrium structure and dynamics. In addition to surveying recent applications that showcase current capabilities of computational methods, this review highlights state-of-the-art algorithmic techniques proposed to overcome challenges posed in silico by the disparate spatial and time scales accessed by dynamic macromolecules. This review is not meant to be exhaustive, as such an endeavor is impossible, but rather aims to balance breadth and depth of strategies for modeling macromolecular structure and dynamics for a broad audience of novices and experts.
Affiliation(s)
- Tatiana Maximova
- Department of Computer Science, George Mason University, Fairfax, Virginia, United States of America
- Ryan Moffatt
- Department of Computer Science, George Mason University, Fairfax, Virginia, United States of America
- Buyong Ma
- Basic Science Program, Leidos Biomedical Research, Inc. Cancer and Inflammation Program, National Cancer Institute, Frederick, Maryland, United States of America
- Ruth Nussinov
- Basic Science Program, Leidos Biomedical Research, Inc. Cancer and Inflammation Program, National Cancer Institute, Frederick, Maryland, United States of America
- Sackler Institute of Molecular Medicine, Department of Human Genetics and Molecular Medicine, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Amarda Shehu
- Department of Computer Science, George Mason University, Fairfax, Virginia, United States of America
- Department of Bioengineering, George Mason University, Fairfax, Virginia, United States of America
- School of Systems Biology, George Mason University, Manassas, Virginia, United States of America
9
Belsom A, Schneider M, Fischer L, Brock O, Rappsilber J. Serum Albumin Domain Structures in Human Blood Serum by Mass Spectrometry and Computational Biology. Mol Cell Proteomics 2016; 15:1105-16. [PMID: 26385339 PMCID: PMC4813692 DOI: 10.1074/mcp.m115.048504] [Citation(s) in RCA: 73] [Impact Index Per Article: 9.1] [Received: 01/29/2015] [Revised: 09/16/2015] [Indexed: 01/12/2023] Open
Abstract
Chemical cross-linking combined with mass spectrometry has proven useful for studying protein-protein interactions and protein structure; however, the low density of cross-link data has so far precluded its use in determining structures de novo. Cross-linking density has typically been limited by the chemical selectivity of the standard cross-linking reagents that are commonly used for protein cross-linking. We have implemented the use of a heterobifunctional cross-linking reagent, sulfosuccinimidyl 4,4'-azipentanoate (sulfo-SDA), combining a traditional sulfo-N-hydroxysuccinimide (sulfo-NHS) ester and a UV photoactivatable diazirine group. This diazirine yields a highly reactive and promiscuous carbene species, the net result being a greatly increased number of cross-links compared with homobifunctional, NHS-based cross-linkers. We present a novel methodology that combines the use of this high-density photo-cross-linking data with conformational space search to investigate the structure of human serum albumin domains, from purified samples, and in its native environment, human blood serum. Our approach is able to determine human serum albumin domain structures with good accuracy: root-mean-square deviations to the crystal structure are 2.8/5.6/2.9 Å (purified samples) and 4.5/5.9/4.8 Å (serum samples) for domains A/B/C for the first selected structure; 2.5/4.9/2.9 Å (purified samples) and 3.5/5.2/3.8 Å (serum samples) for the best out of top five selected structures. Our proof-of-concept study on human serum albumin demonstrates initial potential of our approach for determining the structures of more proteins in the complex biological contexts in which they function and which they may require for correct folding. Data are available via ProteomeXchange with identifier PXD001692.
Affiliation(s)
- Adam Belsom
- Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3BF, United Kingdom
- Michael Schneider
- Robotics and Biology Laboratory, Technische Universität Berlin, 10587 Berlin, Germany
- Lutz Fischer
- Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3BF, United Kingdom
- Oliver Brock
- Robotics and Biology Laboratory, Technische Universität Berlin, 10587 Berlin, Germany
- Juri Rappsilber
- Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3BF, United Kingdom; Department of Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
10
Garza-Fabre M, Kandathil SM, Handl J, Knowles J, Lovell SC. Generating, Maintaining, and Exploiting Diversity in a Memetic Algorithm for Protein Structure Prediction. Evolutionary Computation 2016; 24:577-607. [PMID: 26908350 DOI: 10.1162/evco_a_00176] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 06/05/2023]
Abstract
Computational approaches to de novo protein tertiary structure prediction, including those based on the preeminent "fragment-assembly" technique, have failed to scale up fully to larger proteins (on the order of 100 residues and above). A number of limiting factors are thought to contribute to the scaling problem over and above the simple combinatorial explosion, but the key ones relate to the lack of exploration of properly diverse protein folds, and to an acute form of "deception" in the energy function, whereby low-energy conformations do not reliably equate with native structures. In this article, solutions to both of these problems are investigated through a multistage memetic algorithm incorporating the successful Rosetta method as a local search routine. We found that specialised genetic operators significantly add to structural diversity and that this translates well to reaching low energies. The use of a generalised stochastic ranking procedure for selection enables the memetic algorithm to handle and traverse deep energy wells that can be considered deceptive, which further adds to the ability of the algorithm to obtain a much-improved diversity of folds. The results should translate to a tangible improvement in the performance of protein structure prediction algorithms in blind experiments such as CASP, and potentially to a further step towards the more challenging problem of predicting the three-dimensional shape of large proteins.
Affiliation(s)
- Mario Garza-Fabre
- Decision and Cognitive Sciences Research Centre, University of Manchester, Manchester, M15 6PB, UK
- Shaun M Kandathil
- Faculty of Life Sciences, University of Manchester, Manchester, M13 9PT, UK
- Julia Handl
- Decision and Cognitive Sciences Research Centre, University of Manchester, Manchester, M15 6PB, UK
- Joshua Knowles
- School of Computer Science, University of Birmingham, Birmingham, B15 2TT, UK
- Simon C Lovell
- Faculty of Life Sciences, University of Manchester, Manchester, M13 9PT, UK
11
Kandathil SM, Handl J, Lovell SC. Toward a detailed understanding of search trajectories in fragment assembly approaches to protein structure prediction. Proteins 2016; 84:411-26. [PMID: 26799916 PMCID: PMC4982100 DOI: 10.1002/prot.24987] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 07/07/2015] [Revised: 12/03/2015] [Accepted: 12/31/2015] [Indexed: 11/30/2022]
Abstract
Energy functions, fragment libraries, and search methods constitute three key components of fragment‐assembly methods for protein structure prediction, which are all crucial for their ability to generate high‐accuracy predictions. All of these components are tightly coupled; efficient searching becomes more important as the quality of fragment libraries decreases. Given these relationships, there is currently a poor understanding of the strengths and weaknesses of the sampling approaches currently used in fragment‐assembly techniques. Here, we determine how the performance of search techniques can be assessed in a meaningful manner, given the above problems. We describe a set of techniques that aim to reduce the impact of the energy function, and assess exploration in view of the search space defined by a given fragment library. We illustrate our approach using Rosetta and EdaFold, and show how certain features of these methods encourage or limit conformational exploration. We demonstrate that individual trajectories of Rosetta are susceptible to local minima in the energy landscape, and that this can be linked to non‐uniform sampling across the protein chain. We show that EdaFold's novel approach can help balance broad exploration with locating good low‐energy conformations. This occurs through two mechanisms which cannot be readily differentiated using standard performance measures: exclusion of false minima, followed by an increasingly focused search in low‐energy regions of conformational space. Measures such as ours can be helpful in characterizing new fragment‐based methods in terms of the quality of conformational exploration realized. Proteins 2016; 84:411–426. © 2016 The Authors Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.
Affiliation(s)
- Shaun M Kandathil
- Faculty of Life Sciences, the University of Manchester, Manchester, M13 9PL, United Kingdom
- Julia Handl
- Alliance Manchester Business School, Faculty of Humanities, the University of Manchester, Manchester, M13 9PL, United Kingdom
- Simon C Lovell
- Faculty of Life Sciences, the University of Manchester, Manchester, M13 9PL, United Kingdom
12
Mabrouk M, Werner T, Schneider M, Putz I, Brock O. Analysis of free modeling predictions by RBO aleph in CASP11. Proteins 2015; 84 Suppl 1:87-104. [PMID: 26492194 DOI: 10.1002/prot.24950] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 06/01/2015] [Revised: 09/28/2015] [Accepted: 10/19/2015] [Indexed: 12/15/2022]
Abstract
The CASP experiment is a biennial benchmark for assessing protein structure prediction methods. In CASP11, RBO Aleph ranked as one of the top-performing automated servers in the free modeling category. This category consists of targets for which structural templates are not easily retrievable. We analyze the performance of RBO Aleph and show that its success in CASP was a result of its ab initio structure prediction protocol. A detailed analysis of this protocol demonstrates that two components unique to our method greatly contributed to prediction quality: residue-residue contact prediction by EPC-map and contact-guided conformational space search by model-based search (MBS). Interestingly, our analysis also points to a possible fundamental problem in evaluating the performance of protein structure prediction methods: Improvements in components of the method do not necessarily lead to improvements of the entire method. This points to the fact that these components interact in ways that are poorly understood. This problem, if indeed true, represents a significant obstacle to community-wide progress. Proteins 2016; 84(Suppl 1):87-104. © 2015 Wiley Periodicals, Inc.
Affiliation(s)
- Mahmoud Mabrouk
- Department of Electrical Engineering and Computer Science, Robotics and Biology Laboratory, Technische Universität Berlin, Berlin, 10587, Germany
- Tim Werner
- Department of Electrical Engineering and Computer Science, Robotics and Biology Laboratory, Technische Universität Berlin, Berlin, 10587, Germany
- Michael Schneider
- Department of Electrical Engineering and Computer Science, Robotics and Biology Laboratory, Technische Universität Berlin, Berlin, 10587, Germany
- Ines Putz
- Department of Electrical Engineering and Computer Science, Robotics and Biology Laboratory, Technische Universität Berlin, Berlin, 10587, Germany
- Oliver Brock
- Department of Electrical Engineering and Computer Science, Robotics and Biology Laboratory, Technische Universität Berlin, Berlin, 10587, Germany
13
Mabrouk M, Putz I, Werner T, Schneider M, Neeb M, Bartels P, Brock O. RBO Aleph: leveraging novel information sources for protein structure prediction. Nucleic Acids Res 2015; 43:W343-8. [PMID: 25897112 PMCID: PMC4489312 DOI: 10.1093/nar/gkv357] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 02/24/2015] [Accepted: 04/03/2015] [Indexed: 02/02/2023] Open
Abstract
RBO Aleph is a novel protein structure prediction web server for template-based modeling, protein contact prediction and ab initio structure prediction. The server has a strong emphasis on modeling difficult protein targets for which templates cannot be detected. RBO Aleph's unique features are (i) the use of combined evolutionary and physicochemical information to perform residue–residue contact prediction and (ii) leveraging this contact information effectively in conformational space search. RBO Aleph emerged as one of the leading approaches to ab initio protein structure prediction and contact prediction during the most recent Critical Assessment of Protein Structure Prediction experiment (CASP11, 2014). In addition to RBO Aleph's main focus on ab initio modeling, the server also provides state-of-the-art template-based modeling services. Based on template availability, RBO Aleph switches automatically between template-based modeling and ab initio prediction based on the target protein sequence, facilitating use especially for non-expert users. The RBO Aleph web server offers a range of tools for visualization and data analysis, such as the visualization of predicted models, predicted contacts and the estimated prediction error along the model's backbone. The server is accessible at http://compbio.robotics.tu-berlin.de/rbo_aleph/.
Affiliation(s)
- Mahmoud Mabrouk
- Robotics and Biology Laboratory, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Marchstraße 23, 10587 Berlin, Germany
- Ines Putz
- Robotics and Biology Laboratory, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Marchstraße 23, 10587 Berlin, Germany
- Tim Werner
- Robotics and Biology Laboratory, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Marchstraße 23, 10587 Berlin, Germany
- Michael Schneider
- Robotics and Biology Laboratory, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Marchstraße 23, 10587 Berlin, Germany
- Moritz Neeb
- Robotics and Biology Laboratory, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Marchstraße 23, 10587 Berlin, Germany
- Philipp Bartels
- Robotics and Biology Laboratory, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Marchstraße 23, 10587 Berlin, Germany
- Oliver Brock
- Robotics and Biology Laboratory, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Marchstraße 23, 10587 Berlin, Germany
14
Shehu A. A Review of Evolutionary Algorithms for Computing Functional Conformations of Protein Molecules. Methods in Pharmacology and Toxicology 2015. [DOI: 10.1007/7653_2015_47] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 12/29/2022]
15
Petrella RJ. Optimization bias in energy-based structure prediction. Journal of Theoretical & Computational Chemistry 2013; 12:1341014. [PMID: 25552783 PMCID: PMC4278582 DOI: 10.1142/s0219633613410149] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]
Abstract
Physics-based computational approaches to predicting the structure of macromolecules such as proteins are gaining increased use, but there are remaining challenges. In the current work, it is demonstrated that in energy-based prediction methods, the degree of optimization of the sampled structures can influence the prediction results. In particular, discrepancies in the degree of local sampling can bias the predictions in favor of the oversampled structures by shifting the local probability distributions of the minimum sampled energies. In simple systems, it is shown that the magnitude of the errors can be calculated from the energy surface, and for certain model systems, derived analytically. Further, it is shown that for energy wells whose forms differ only by a randomly assigned energy shift, the optimal accuracy of prediction is achieved when the sampling around each structure is equal. Energy correction terms can be used in cases of unequal sampling to reproduce the total probabilities that would occur under equal sampling, but optimal corrections only partially restore the prediction accuracy lost to unequal sampling. For multiwell systems, the determination of the correction terms is a multibody problem; it is shown that the involved cross-correlation multiple integrals can be reduced to simpler integrals. The possible implications of the current analysis for macromolecular structure prediction are discussed.
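The sampling bias described in this abstract can be illustrated with a toy simulation. The sketch below is not the paper's derivation: it assumes two energy wells whose sampled energies follow the same distribution (uniform on [0, 1), chosen only for convenience) and asks how often the more heavily sampled well yields the lowest energy. The function name and parameters are hypothetical.

```python
import random


def oversampled_win_rate(n_a, n_b, trials=20000, seed=0):
    """Fraction of trials in which well A yields the lowest sampled energy.

    Both wells draw energies from the same distribution (an assumption of
    this toy model), so any preference for A comes only from unequal sampling.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        min_a = min(rng.random() for _ in range(n_a))  # best energy found in well A
        min_b = min(rng.random() for _ in range(n_b))  # best energy found in well B
        if min_a < min_b:
            wins += 1
    return wins / trials


equal = oversampled_win_rate(10, 10)   # equal sampling of both wells
biased = oversampled_win_rate(30, 10)  # well A sampled three times as often
print(equal, biased)
```

For identically distributed continuous energies, the lowest of all samples is equally likely to be any one of them, so well A wins with probability n_a / (n_a + n_b). The estimates should therefore land near 0.5 and 0.75 respectively, even though neither well is intrinsically lower in energy.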
Affiliation(s)
- Robert J. Petrella
- Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, MA 02138, USA
- Department of Medicine, Harvard Medical School, 25 Shattuck Street, Boston, MA 02115, USA
16
Molloy K, Shehu A. Elucidating the ensemble of functionally-relevant transitions in protein systems with a robotics-inspired method. BMC Structural Biology 2013; 13 Suppl 1:S8. [PMID: 24565158 PMCID: PMC3952944 DOI: 10.1186/1472-6807-13-s1-s8] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 12/17/2022]
Abstract
Background Many proteins tune their biological function by transitioning between different functional states, effectively acting as dynamic molecular machines. Detailed structural characterization of transition trajectories is central to understanding the relationship between protein dynamics and function. Computational approaches that build on the Molecular Dynamics framework are in principle able to model transition trajectories at great detail but also at considerable computational cost. Methods that delay consideration of dynamics and focus instead on elucidating energetically-credible conformational paths connecting two functionally-relevant structures provide a complementary approach. Effective sampling-based path planning methods originating in robotics have been recently proposed to produce conformational paths. These methods largely model short peptides or address large proteins by simplifying conformational space. Methods We propose a robotics-inspired method that connects two given structures of a protein by sampling conformational paths. The method focuses on small- to medium-size proteins, efficiently modeling structural deformations through the use of the molecular fragment replacement technique. In particular, the method grows a tree in conformational space rooted at the start structure, steering the tree to a goal region defined around the goal structure. We investigate various bias schemes over a progress coordinate for balance between coverage of conformational space and progress towards the goal. A geometric projection layer promotes path diversity. A reactive temperature scheme allows sampling of rare paths that cross energy barriers. Results and conclusions Experiments are conducted on small- to medium-size proteins of length up to 214 amino acids and with multiple known functionally-relevant states, some of which are more than 13 Å apart from each other. Analysis reveals that the method effectively obtains conformational paths connecting structural states that are significantly different. A detailed analysis on the depth and breadth of the tree suggests that a soft global bias over the progress coordinate enhances sampling and results in higher path diversity. The explicit geometric projection layer that biases the exploration away from over-sampled regions further increases coverage, often improving proximity to the goal by forcing the exploration to find new paths. The reactive temperature scheme is shown effective in increasing path diversity, particularly in difficult structural transitions with known high-energy barriers.
17
Saleh S, Olson B, Shehu A. A population-based evolutionary search approach to the multiple minima problem in de novo protein structure prediction. BMC Structural Biology 2013; 13 Suppl 1:S4. [PMID: 24565020 PMCID: PMC3953177 DOI: 10.1186/1472-6807-13-s1-s4] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 11/10/2022]
Abstract
Background Elucidating the native structure of a protein molecule from its sequence of amino acids, a problem known as de novo structure prediction, is a long standing challenge in computational structural biology. Difficulties in silico arise due to the high dimensionality of the protein conformational space and the ruggedness of the associated energy surface. The issue of multiple minima is a particularly troublesome hallmark of energy surfaces probed with current energy functions. In contrast to the true energy surface, these surfaces are weakly-funneled and rich in comparably deep minima populated by non-native structures. For this reason, many algorithms seek to be inclusive and obtain a broad view of the low-energy regions through an ensemble of low-energy (decoy) conformations. Conformational diversity in this ensemble is key to increasing the likelihood that the native structure has been captured. Methods We propose an evolutionary search approach to address the multiple-minima problem in decoy sampling for de novo structure prediction. Two population-based evolutionary search algorithms are presented that follow the basic approach of treating conformations as individuals in an evolving population. Coarse graining and molecular fragment replacement are used to efficiently obtain protein-like child conformations from parents. Potential energy is used both to bias parent selection and determine which subset of parents and children will be retained in the evolving population. The effect on the decoy ensemble of sampling minima directly is measured by additionally mapping a conformation to its nearest local minimum before considering it for retainment. The resulting memetic algorithm thus evolves not just a population of conformations but a population of local minima. Results and conclusions Results show that both algorithms are effective in terms of sampling conformations in proximity of the known native structure. 
The additional minimization is shown to be key to enhancing sampling capability and obtaining a diverse ensemble of decoy conformations, circumventing premature convergence to sub-optimal regions in the conformational space, and approaching the native structure with proximity that is comparable to state-of-the-art decoy sampling methods. The results are shown to be robust and valid when using two representative state-of-the-art coarse-grained energy functions.
18
Olson BS, Shehu A. Rapid sampling of local minima in protein energy surface and effective reduction through a multi-objective filter. Proteome Sci 2013; 11:S12. [PMID: 24564970 PMCID: PMC3908317 DOI: 10.1186/1477-5956-11-s1-s12] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Many problems in protein modeling require obtaining a discrete representation of the protein conformational space as an ensemble of conformations. In ab-initio structure prediction, in particular, where the goal is to predict the native structure of a protein chain given its amino-acid sequence, the ensemble needs to satisfy energetic constraints. Given the thermodynamic hypothesis, an effective ensemble contains low-energy conformations which are similar to the native structure. The high-dimensionality of the conformational space and the ruggedness of the underlying energy surface currently make it very difficult to obtain such an ensemble. Recent studies have proposed that Basin Hopping is a promising probabilistic search framework to obtain a discrete representation of the protein energy surface in terms of local minima. Basin Hopping performs a series of structural perturbations followed by energy minimizations with the goal of hopping between nearby energy minima. This approach has been shown to be effective in obtaining conformations near the native structure for small systems. Recent work by us has extended this framework to larger systems through employment of the molecular fragment replacement technique, resulting in rapid sampling of large ensembles. METHODS This paper investigates the algorithmic components in Basin Hopping to both understand and control their effect on the sampling of near-native minima. Realizing that such an ensemble is reduced before further refinement in full ab-initio protocols, we take an additional step and analyze the quality of the ensemble retained by ensemble reduction techniques. We propose a novel multi-objective technique based on the Pareto front to filter the ensemble of sampled local minima. 
RESULTS AND CONCLUSIONS We show that controlling the magnitude of the perturbation allows directly controlling the distance between consecutively-sampled local minima and, in turn, steering the exploration towards conformations near the native structure. For the minimization step, we show that the addition of Metropolis Monte Carlo-based minimization is no more effective than a simple greedy search. Finally, we show that the size of the ensemble of sampled local minima can be effectively and efficiently reduced by a multi-objective filter to obtain a simpler representation of the probed energy surface.
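The idea of filtering sampled minima by a Pareto front can be sketched as follows. The abstract does not state which objectives the authors use, so the two scores per minimum below are hypothetical placeholders (both to be minimized), and `pareto_front` is an illustrative helper rather than the paper's implementation.

```python
def pareto_front(points):
    """Return the points not dominated by any other point (all objectives minimized)."""

    def dominates(p, q):
        # p dominates q if it is no worse in every objective and better in at least one.
        return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (objective_1, objective_2) scores for five sampled local minima.
minima = [(-120.0, 4.1), (-118.5, 3.2), (-110.0, 3.9), (-125.0, 6.0), (-118.5, 3.5)]
front = pareto_front(minima)
print(front)  # [(-120.0, 4.1), (-118.5, 3.2), (-125.0, 6.0)]
```

The two discarded minima are each beaten or matched on both scores by `(-118.5, 3.2)`. The retained three represent different trade-offs, which is what lets the filter shrink the ensemble without collapsing it onto a single objective.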
Affiliation(s)
- Brian S Olson
- Department of Computer Science, George Mason University, 4400 University Dr., Fairfax, VA, 22030, USA
- Amarda Shehu
- Department of Computer Science, George Mason University, 4400 University Dr., Fairfax, VA, 22030, USA
- Department of Bioengineering, George Mason University, 4400 University Dr., Fairfax, VA, 22030, USA
- School of Systems Biology, George Mason University, 10900 University Blvd., Manassas, VA, 20110, USA
19
Molloy K, Saleh S, Shehu A. Probabilistic search and energy guidance for biased decoy sampling in ab initio protein structure prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013; 10:1162-1175. [PMID: 24384705 DOI: 10.1109/tcbb.2013.29] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 06/03/2023]
Abstract
Adequate sampling of the conformational space is a central challenge in ab initio protein structure prediction. In the absence of a template structure, a conformational search procedure guided by an energy function explores the conformational space, gathering an ensemble of low-energy decoy conformations. If the sampling is inadequate, the native structure may be missed altogether. Even if reproduced, a subsequent stage that selects a subset of decoys for further structural detail and energetic refinement may discard near-native decoys if they are high energy or insufficiently represented in the ensemble. Sampling should produce a decoy ensemble that facilitates the subsequent selection of near-native decoys. In this paper, we investigate a robotics-inspired framework that allows directly measuring the role of energy in guiding sampling. Testing demonstrates that a soft energy bias steers sampling toward a diverse decoy ensemble less prone to exploiting energetic artifacts and thus more likely to facilitate retainment of near-native conformations by selection techniques. We employ two different energy functions, the associative memory Hamiltonian with water and Rosetta. Results show that enhanced sampling provides a rigorous testing of energy functions and exposes different deficiencies in them, thus promising to guide development of more accurate representations and energy functions.
20
Basin Hopping as a General and Versatile Optimization Framework for the Characterization of Biological Macromolecules. 2012. [DOI: 10.1155/2012/674832] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Indexed: 12/19/2022]
Abstract
Since its introduction, the basin hopping (BH) framework has proven useful for hard nonlinear optimization problems with multiple variables and modalities. Applications span a wide range, from packing problems in geometry to characterization of molecular states in statistical physics. BH is seeing a reemergence in computational structural biology due to its ability to obtain a coarse-grained representation of the protein energy surface in terms of local minima. In this paper, we show that the BH framework is general and versatile, making it possible to address problems related to the characterization of protein structure, assembly, and motion due to its fundamental ability to sample minima in a high-dimensional variable space. We show how specific implementations of the main components in BH yield algorithmic realizations that attain state-of-the-art results in the context of ab initio protein structure prediction and rigid protein-protein docking. We also show that BH can map intermediate minima related to motions connecting diverse stable functionally relevant states in a protein molecule, thus serving as a first step towards the characterization of transition trajectories connecting these states.
21
Olson B, Molloy K, Hendi SF, Shehu A. Guiding probabilistic search of the protein conformational space with structural profiles. J Bioinform Comput Biol 2012; 10:1242005. [PMID: 22809381 DOI: 10.1142/s021972001242005x] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/18/2022]
Abstract
The roughness of the protein energy surface poses a significant challenge to search algorithms that seek to obtain a structural characterization of the native state. Recent research seeks to bias search toward near-native conformations through one-dimensional structural profiles of the protein native state. Here we investigate the effectiveness of such profiles in a structure prediction setting for proteins of various sizes and folds. We pursue two directions. We first investigate the contribution of structural profiles in comparison to or in conjunction with physics-based energy functions in providing an effective energy bias. We conduct this investigation in the context of Metropolis Monte Carlo with fragment-based assembly. Second, we explore the effectiveness of structural profiles in providing projection coordinates through which to organize the conformational space. We do so in the context of a robotics-inspired search framework proposed in our lab that employs projections of the conformational space to guide search. Our findings indicate that structural profiles are most effective in obtaining physically realistic near-native conformations when employed in conjunction with physics-based energy functions. Our findings also show that these profiles are very effective when employed instead as projection coordinates to guide probabilistic search toward undersampled regions of the conformational space.
Affiliation(s)
- Brian Olson
- Department of Computer Science, George Mason University, 4400 University Drive Fairfax, VA 22030, USA
22
Olson BS, Shehu A. Evolutionary-inspired probabilistic search for enhancing sampling of local minima in the protein energy surface. Proteome Sci 2012; 10 Suppl 1:S5. [PMID: 22759582 PMCID: PMC3380728 DOI: 10.1186/1477-5956-10-s1-s5] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Indexed: 11/22/2022] Open
Abstract
Background Despite computational challenges, elucidating conformations that a protein system assumes under physiologic conditions for the purpose of biological activity is a central problem in computational structural biology. While these conformations are associated with low energies in the energy surface that underlies the protein conformational space, few existing conformational search algorithms focus on explicitly sampling low-energy local minima in the protein energy surface. Methods This work proposes a novel probabilistic search framework, PLOW, that explicitly samples low-energy local minima in the protein energy surface. The framework combines algorithmic ingredients from evolutionary computation and computational structural biology to effectively explore the subspace of local minima. A greedy local search maps a conformation sampled in conformational space to a nearby local minimum. A perturbation move jumps out of a local minimum to obtain a new starting conformation for the greedy local search. The process repeats in an iterative fashion, resulting in a trajectory-based exploration of the subspace of local minima. Results and conclusions The analysis of PLOW's performance shows that, by navigating only the subspace of local minima, PLOW is able to sample conformations near a protein's native structure, either more effectively or as well as state-of-the-art methods that focus on reproducing the native structure for a protein system. Analysis of the actual subspace of local minima shows that PLOW samples this subspace more effectively than a naive sampling approach. Additional theoretical analysis reveals that the perturbation function employed by PLOW is key to its ability to sample a diverse set of low-energy conformations. This analysis also suggests directions for further research and novel applications for the proposed framework.
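The perturb-then-minimize loop described in the Methods can be sketched on a toy problem. This is not PLOW itself: the one-dimensional `energy` function stands in for a protein energy function, the perturbation is a uniform random jump rather than a fragment-based move, and every hop is accepted, which is an assumption since the abstract does not state an acceptance rule. All names and parameter values are hypothetical.

```python
import math
import random


def energy(x):
    """Toy rugged 1-D surface standing in for a protein energy function."""
    return 0.1 * x * x + math.sin(3.0 * x)


def greedy_local_search(x, step=0.01, max_iters=10000):
    """Move to the lower-energy neighbour until neither neighbour improves."""
    for _ in range(max_iters):
        best = min((x - step, x + step), key=energy)
        if energy(best) >= energy(x):
            break
        x = best
    return x


def sample_local_minima(x0, n_hops=50, jump=1.5, seed=0):
    """Alternate perturbation and greedy minimization, recording each minimum visited."""
    rng = random.Random(seed)
    current = greedy_local_search(x0)
    trajectory = [current]
    for _ in range(n_hops):
        start = current + rng.uniform(-jump, jump)  # perturbation move out of the minimum
        current = greedy_local_search(start)        # map the new start to a nearby minimum
        trajectory.append(current)
    return trajectory


minima = sample_local_minima(x0=4.0)
lowest = min(minima, key=energy)
print(len(minima), lowest, energy(lowest))
```

Because only minimized points are recorded, the trajectory lives entirely in the subspace of local minima. The `jump` size controls how far apart consecutive minima tend to be, which is the role the abstract attributes to the perturbation function.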
Affiliation(s)
- Brian S Olson
- Department of Computer Science, George Mason University, 4400 University Dr., Fairfax, VA, 22030, USA
23
Lange OF, Baker D. Resolution-adapted recombination of structural features significantly improves sampling in restraint-guided structure calculation. Proteins 2012; 80:884-95. [PMID: 22423358 PMCID: PMC3310173 DOI: 10.1002/prot.23245] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.2] [Indexed: 12/05/2022]
Abstract
Recent work has shown that NMR structures can be determined by integrating sparse NMR data with structure prediction methods such as Rosetta. The experimental data serve to guide the search for the lowest energy state towards the deep minimum at the native state which is frequently missed in Rosetta de novo structure calculations. However, as the protein size increases, sampling again becomes limiting; for example, the standard Rosetta protocol involving Monte Carlo fragment insertion starting from an extended chain fails to converge for proteins over 150 amino acids even with guidance from chemical shifts (CS-Rosetta) and other NMR data. The primary limitation of this protocol—that every folding trajectory is completely independent of every other—was recently overcome with the development of a new approach involving resolution-adapted structural recombination (RASREC). Here we describe the RASREC approach in detail and compare it to standard CS-Rosetta. We show that the improved sampling of RASREC is essential in obtaining accurate structures over a benchmark set of 11 proteins in the 15-25 kDa size range using chemical shifts, backbone RDCs and HN-HN NOE data; in a number of cases the improved sampling methodology makes a larger contribution than incorporation of additional experimental data. Experimental data are invaluable for guiding sampling to the vicinity of the global energy minimum, but for larger proteins, the standard Rosetta fold-from-extended-chain protocol does not converge on the native minimum even with experimental data and the more powerful RASREC approach is necessary to converge to accurate solutions.
Affiliation(s)
- Oliver F Lange
- Department Chemie, Biomolecular NMR and Munich Center for Integrated Protein Science, Technische Universität München, Garching, Germany.
24
Olson B, Molloy K, Shehu A. In search of the protein native state with a probabilistic sampling approach. J Bioinform Comput Biol 2011; 9:383-98. [DOI: 10.1142/s0219720011005574] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Received: 01/31/2011] [Revised: 04/07/2011] [Accepted: 04/11/2011] [Indexed: 11/18/2022]
Abstract
The three-dimensional structure of a protein is a key determinant of its biological function. Given the cost and time required to acquire this structure through experimental means, computational models are necessary to complement wet-lab efforts. Many computational techniques exist for navigating the high-dimensional protein conformational search space, which is explored for low-energy conformations that comprise a protein's native states. This work proposes two strategies to enhance the sampling of conformations near the native state. An enhanced fragment library with greater structural diversity is used to expand the search space in the context of fragment-based assembly. To manage the increased complexity of the search space, only a representative subset of the sampled conformations is retained to further guide the search towards the native state. Our results make the case that these two strategies greatly enhance the sampling of the conformational space near the native state. A detailed comparative analysis shows that our approach performs as well as state-of-the-art ab initio structure prediction protocols.
Affiliation(s)
- Brian Olson
- Department of Computer Science, George Mason University, 4400 University Drive, Fairfax, VA 22030, USA
- Kevin Molloy
- Department of Computer Science, George Mason University, 4400 University Drive, Fairfax, VA 22030, USA
- Amarda Shehu
- Department of Computer Science, George Mason University, 4400 University Drive, Fairfax, VA 22030, USA
- Department of Bioinformatics and Computational Biology, George Mason University, 4400 University Drive, Fairfax, VA 22030, USA
25
Handl J, Knowles J, Vernon R, Baker D, Lovell SC. The dual role of fragments in fragment-assembly methods for de novo protein structure prediction. Proteins 2011; 80:490-504. [PMID: 22095594 DOI: 10.1002/prot.23215] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Received: 07/05/2011] [Revised: 08/17/2011] [Accepted: 09/14/2011] [Indexed: 11/07/2022]
Abstract
In fragment-assembly techniques for protein structure prediction, models of protein structure are assembled from fragments of known protein structures. This process is typically guided by a knowledge-based energy function and uses a heuristic optimization method. The fragments play two important roles in this process: they define the set of structural parameters available, and they also assume the role of the main variation operators that are used by the optimiser. Previous analysis has typically focused on the first of these roles. In particular, the relationship between local amino acid sequence and local protein structure has been studied by a range of authors. The correlation between the two has been shown to vary with the window length considered, and the results of these analyses have informed directly the choice of fragment length in state-of-the-art prediction techniques. Here, we focus on the second role of fragments and aim to determine the effect of fragment length from an optimization perspective. We use theoretical analyses to reveal how the size and structure of the search space change as a function of insertion length. Furthermore, empirical analyses are used to explore additional ways in which the size of the fragment insertion influences the search both in a simulation model and for the fragment-assembly technique, Rosetta.
Affiliation(s)
- Julia Handl
- Manchester Business School, The University of Manchester, United Kingdom.
26
Zhou Y, Duan Y, Yang Y, Faraggi E, Lei H. Trends in template/fragment-free protein structure prediction. Theor Chem Acc 2011; 128:3-16. [PMID: 21423322 PMCID: PMC3030773 DOI: 10.1007/s00214-010-0799-2] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Received: 05/17/2010] [Accepted: 08/15/2010] [Indexed: 12/13/2022]
Abstract
Predicting the structure of a protein from its amino acid sequence is a long-standing unsolved problem in computational biology. Its solution would be of both fundamental and practical importance as the gap between the number of known sequences and the number of experimentally solved structures widens rapidly. Currently, the most successful approaches are based on fragment/template reassembly. Lacking progress in template-free structure prediction calls for novel ideas and approaches. This article reviews trends in the development of physical and specific knowledge-based energy functions as well as sampling techniques for fragment-free structure prediction. Recent physical- and knowledge-based studies demonstrated that it is possible to sample and predict highly accurate protein structures without borrowing native fragments from known protein structures. These emerging approaches with fully flexible sampling have the potential to move the field forward.
Affiliation(s)
- Yaoqi Zhou
- School of Informatics, Indiana Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indiana University Purdue University, 719 Indiana Ave #319, Walker Plaza Building, Indianapolis, IN 46202 USA
- Yong Duan
- UC Davis Genome Center and Department of Applied Science, University of California, One Shields Avenue, Davis, CA USA
- College of Physics, Huazhong University of Science and Technology, 1037 Luoyu Road, 430074 Wuhan, China
- Yuedong Yang
- School of Informatics, Indiana Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indiana University Purdue University, 719 Indiana Ave #319, Walker Plaza Building, Indianapolis, IN 46202 USA
- Eshel Faraggi
- School of Informatics, Indiana Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indiana University Purdue University, 719 Indiana Ave #319, Walker Plaza Building, Indianapolis, IN 46202 USA
- Hongxing Lei
- UC Davis Genome Center and Department of Applied Science, University of California, One Shields Avenue, Davis, CA USA
- Beijing Institute of Genomics, Chinese Academy of Sciences, 100029 Beijing, China
|
27
|
Guiding the Search for Native-like Protein Conformations with an Ab-initio Tree-based Exploration. Int J Rob Res 2010. [DOI: 10.1177/0278364910371527] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Indexed: 11/15/2022]
Abstract
In this paper we propose a robotics-inspired method to enhance sampling of native-like conformations when employing only amino acid sequence information for a protein at hand. Computing such conformations, essential to associating structural and functional information with gene sequences, is challenging due to the high dimensionality and the rugged energy surface of the protein conformational space. The contribution of this paper is a novel two-layered method to enhance the sampling of geometrically distinct low-energy conformations at a coarse-grained level of detail. The method grows a tree in conformational space reconciling two goals: (i) guiding the tree towards lower energies; and (ii) not oversampling geometrically similar conformations. Discretizations of the energy surface and a low-dimensional projection space are employed to preferentially select, for expansion, low-energy conformations in under-explored regions of the conformational space. The tree is expanded with low-energy conformations through a Metropolis Monte Carlo framework that uses a move set of physical fragment configurations. Testing on sequences of eight small-to-medium structurally diverse proteins shows that the method rapidly samples native-like conformations in a few hours on a single CPU. Analysis shows that computed conformations are good candidates for further detailed energetic refinements by larger studies in protein engineering and design.
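The two-layer selection idea described above can be sketched as follows. This is an illustrative assumption-laden sketch, not the authors' implementation: the bin sizes, the class name `ExplorationTree`, and both weighting formulas are hypothetical choices made only to show how discretizing energy and a low-dimensional projection can bias selection toward low-energy, under-explored conformations.

```python
import math
import random
from collections import defaultdict


class ExplorationTree:
    """Stores conformations by energy level and by projection-space cell."""

    def __init__(self, energy_bin=2.0, cell_size=1.0, seed=0):
        self.energy_bin = energy_bin
        self.cell_size = cell_size
        self.rng = random.Random(seed)
        # levels[level][cell] -> list of (conformation, energy)
        self.levels = defaultdict(lambda: defaultdict(list))

    def _level(self, energy):
        return math.floor(energy / self.energy_bin)

    def _cell(self, projection):
        return tuple(math.floor(x / self.cell_size) for x in projection)

    def add(self, conformation, energy, projection):
        level = self._level(energy)
        cell = self._cell(projection)
        self.levels[level][cell].append((conformation, energy))

    def select(self):
        """Pick a conformation to expand next."""
        if not self.levels:
            raise ValueError("tree is empty")
        level_ids = sorted(self.levels)
        lowest = level_ids[0]
        # Layer 1 (assumed weighting): lower energy levels are favoured.
        level_weights = [1.0 / (1 + lid - lowest) for lid in level_ids]
        level = self.rng.choices(level_ids, weights=level_weights, k=1)[0]
        cells = self.levels[level]
        cell_ids = list(cells)
        # Layer 2 (assumed weighting): sparsely populated cells are favoured.
        cell_weights = [1.0 / len(cells[c]) for c in cell_ids]
        cell = self.rng.choices(cell_ids, weights=cell_weights, k=1)[0]
        return self.rng.choice(cells[cell])
```

A selected conformation would then be expanded, for example by a short Metropolis Monte Carlo run of fragment moves, and the resulting conformation added back with `add`.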
|
28
|
Raman S, Lange OF, Rossi P, Tyka M, Wang X, Aramini J, Liu G, Ramelot T, Eletsky A, Szyperski T, Kennedy M, Prestegard J, Montelione GT, Baker D. NMR structure determination for larger proteins using backbone-only data. Science 2010; 327:1014-8. [PMID: 20133520 PMCID: PMC2909653 DOI: 10.1126/science.1183649] [Citation(s) in RCA: 220] [Impact Index Per Article: 15.7] [Indexed: 11/02/2022]
Abstract
Conventional protein structure determination from nuclear magnetic resonance data relies heavily on side-chain proton-to-proton distances. The necessary side-chain resonance assignment, however, is labor intensive and prone to error. Here we show that structures can be accurately determined without nuclear magnetic resonance (NMR) information on the side chains for proteins up to 25 kilodaltons by incorporating backbone chemical shifts, residual dipolar couplings, and amide proton distances into the Rosetta protein structure modeling methodology. These data, which are too sparse for conventional methods, serve only to guide conformational search toward the lowest-energy conformations in the folding landscape; the details of the computed models are determined by the physical chemistry implicit in the Rosetta all-atom energy function. The new method is not hindered by the deuteration required to suppress nuclear relaxation processes for proteins greater than 15 kilodaltons and should enable routine NMR structure determination for larger proteins.
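As a rough illustration of how sparse experimental data can guide a search without determining the details of the model, the sketch below adds a flat-bottomed distance penalty to an arbitrary base energy. It is a generic restraint scheme under stated assumptions, not the Rosetta scoring code: the function names, the quadratic penalty form, and the `(i, j, upper_bound)` restraint format are all hypothetical.

```python
import math


def flat_bottom_penalty(distance, upper_bound, weight=1.0):
    """Zero while the bound is satisfied, quadratic once it is exceeded."""
    excess = max(distance - upper_bound, 0.0)
    return weight * excess ** 2


def restrained_score(coords, base_energy, restraints, weight=1.0):
    """Base energy plus penalties for violated distance restraints.

    coords      -- mapping from atom index to an (x, y, z) tuple
    base_energy -- callable returning the unrestrained energy of coords
    restraints  -- iterable of (i, j, upper_bound) tuples (assumed format)
    """
    penalty = sum(
        flat_bottom_penalty(math.dist(coords[i], coords[j]), upper, weight)
        for i, j, upper in restraints
    )
    return base_energy(coords) + penalty
```

Because the penalty vanishes whenever a restraint is satisfied, conformations consistent with the data are ranked by the base energy alone, which mirrors the abstract's point that the data steer the search while the energy function sets the details.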
Affiliation(s)
- Srivatsan Raman
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Oliver F. Lange
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Paolo Rossi
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, and Northeast Structural Genomics Consortium, Rutgers University, Piscataway, NJ 08854
- Michael Tyka
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Xu Wang
- Complex Carbohydrate Research Center, University of Georgia, Athens, GA 30602
- James Aramini
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, and Northeast Structural Genomics Consortium, Rutgers University, Piscataway, NJ 08854
- Gaohua Liu
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, and Northeast Structural Genomics Consortium, Rutgers University, Piscataway, NJ 08854
- Theresa Ramelot
- Department of Chemistry and Biochemistry and Northeast Structural Genomics Consortium, Miami University, Oxford, OH
- Alexander Eletsky
- Department of Chemistry, State University of New York at Buffalo, Buffalo, New York 14260
- Thomas Szyperski
- Department of Chemistry, State University of New York at Buffalo, Buffalo, New York 14260
- Michael Kennedy
- Department of Chemistry and Biochemistry and Northeast Structural Genomics Consortium, Miami University, Oxford, OH
- James Prestegard
- Complex Carbohydrate Research Center, University of Georgia, Athens, GA 30602
- Gaetano T. Montelione
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, and Northeast Structural Genomics Consortium, Rutgers University, Piscataway, NJ 08854
- David Baker
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Howard Hughes Medical Institute, Seattle, WA 98195
|