1
Raddi RM, Ge Y, Voelz VA. BICePs v2.0: Software for Ensemble Reweighting Using Bayesian Inference of Conformational Populations. J Chem Inf Model 2023; 63:2370-2381. [PMID: 37027181 PMCID: PMC10278562 DOI: 10.1021/acs.jcim.2c01296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/08/2023]
Abstract
Bayesian Inference of Conformational Populations (BICePs) version 2.0 (v2.0) is a free, open-source Python package that reweights theoretical predictions of conformational state populations using sparse and/or noisy experimental measurements. In this article, we describe the implementation and usage of the latest version of BICePs (v2.0), a powerful, user-friendly and extensible package which makes several improvements upon the previous version. The algorithm now supports many experimental NMR observables (NOE distances, chemical shifts, J-coupling constants, and hydrogen-deuterium exchange protection factors), and enables convenient data preparation and processing. BICePs v2.0 can perform automatic analysis of the sampled posterior, including visualization, and evaluation of statistical significance and sampling convergence. We provide specific coding examples for these topics, and present a detailed example illustrating how to use BICePs v2.0 to reweight a theoretical ensemble using experimental measurements.
Affiliation(s)
- Robert M Raddi
- Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122, United States
- Yunhui Ge
- Department of Pharmaceutical Sciences, University of California, Irvine, California 92697, United States
- Vincent A Voelz
- Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122, United States

2
Mardia KV, Wiechers H, Eltzner B, Huckemann SF. Principal component analysis and clustering on manifolds. J Multivariate Anal 2022. [DOI: 10.1016/j.jmva.2021.104862] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]

3
Voelz VA, Ge Y, Raddi RM. Reconciling Simulations and Experiments With BICePs: A Review. Front Mol Biosci 2021; 8:661520. [PMID: 34046431 PMCID: PMC8144449 DOI: 10.3389/fmolb.2021.661520] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/30/2021] [Accepted: 04/12/2021] [Indexed: 02/04/2023] Open
Abstract
Bayesian Inference of Conformational Populations (BICePs) is an algorithm developed to reconcile simulated ensembles with sparse experimental measurements. The Bayesian framework of BICePs enables population reweighting as a post-simulation processing step, with several advantages over existing methods, including the proper use of reference potentials, and the estimation of a Bayes factor-like quantity called the BICePs score for model selection. Here, we summarize the theory underlying this method in context with related algorithms, review the history of BICePs applications to date, and discuss current shortcomings along with future plans for improvement.
Affiliation(s)
- Vincent A. Voelz
- Department of Chemistry, Temple University, Philadelphia, PA, United States
- Yunhui Ge
- Department of Pharmaceutical Sciences, University of California, Irvine, Irvine, CA, United States
- Robert M. Raddi
- Department of Chemistry, Temple University, Philadelphia, PA, United States

4
Ge Y, Voelz VA. Model Selection Using BICePs: A Bayesian Approach for Force Field Validation and Parameterization. J Phys Chem B 2018. [PMID: 29518328 DOI: 10.1021/acs.jpcb.7b11871] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 11/28/2022]
Abstract
The Bayesian Inference of Conformational Populations (BICePs) algorithm reconciles theoretical predictions of conformational state populations with sparse and/or noisy experimental measurements. Among its key advantages is its ability to perform objective model selection through a quantity we call the BICePs score, which reflects the integrated posterior evidence in favor of a given model, computed through free energy estimation methods. Here, we explore how the BICePs score can be used for force field validation and parametrization. Using a 2D lattice protein as a toy model, we demonstrate that BICePs is able to select the correct value of an interaction energy parameter given ensemble-averaged experimental distance measurements. We show that if conformational states are sufficiently fine-grained, the results are robust to experimental noise and measurement sparsity. Using these insights, we apply BICePs to perform force field evaluations for all-atom simulations of designed β-hairpin peptides against experimental NMR chemical shift measurements. These tests suggest that BICePs scores can be used for model selection in the context of all-atom simulations. We expect this approach to be particularly useful for the computational foldamer design as a tool for improving general-purpose force fields given sparse experimental measurements.
Affiliation(s)
- Yunhui Ge
- Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122, United States
- Vincent A Voelz
- Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122, United States

5
Vögeli B, Olsson S, Riek R, Güntert P. Complementarity and congruence between exact NOEs and traditional NMR probes for spatial decoding of protein dynamics. J Struct Biol 2015. [DOI: 10.1016/j.jsb.2015.07.008] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 12/18/2022]

6
Bratholm LA, Christensen AS, Hamelryck T, Jensen JH. Bayesian inference of protein structure from chemical shift data. PeerJ 2015; 3:e861. [PMID: 25825683 PMCID: PMC4375973 DOI: 10.7717/peerj.861] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 12/20/2014] [Accepted: 03/06/2015] [Indexed: 12/15/2022] Open
Abstract
Protein chemical shifts are routinely used to augment molecular mechanics force fields in protein structure simulations, with weights of the chemical shift restraints determined empirically. These weights, however, might not be an optimal descriptor of a given protein structure and predictive model, and a bias is introduced which might result in incorrect structures. In the inferential structure determination framework, both the unknown structure and the disagreement between experimental and back-calculated data are formulated as a joint probability distribution, thus utilizing the full information content of the data. Here, we present the formulation of such a probability distribution where the error in chemical shift prediction is described by either a Gaussian or Cauchy distribution. The methodology is demonstrated and compared to a set of empirically weighted potentials through Markov chain Monte Carlo simulations of three small proteins (ENHD, Protein G and the SMN Tudor Domain) using the PROFASI force field and the chemical shift predictor CamShift. Using a clustering criterion for identifying the best structure, together with the addition of a solvent exposure scoring term, the simulations suggest that sampling both the structure and the uncertainties in chemical shift prediction leads to more accurate structures compared to conventional methods using empirically determined weights. The Cauchy distribution, using either sampled uncertainties or predetermined weights, did, however, result in overall better convergence to the native fold, suggesting that both types of distribution might be useful in different aspects of protein structure prediction.
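The contrast between the two error models named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the residual values, and the scale parameters `sigma` and `gamma` are hypothetical and chosen only for illustration. It shows how a single outlier in the prediction error lowers a Gaussian log-likelihood far more than a Cauchy one.

```python
import math

def gaussian_loglik(residuals, sigma):
    """Log-likelihood of independent residuals under a zero-mean Gaussian error model."""
    n = len(residuals)
    norm = -n * math.log(sigma * math.sqrt(2.0 * math.pi))
    return norm - sum(r * r for r in residuals) / (2.0 * sigma ** 2)

def cauchy_loglik(residuals, gamma):
    """Log-likelihood of independent residuals under a zero-centred Cauchy error model."""
    return sum(-math.log(math.pi * gamma * (1.0 + (r / gamma) ** 2)) for r in residuals)

# Hypothetical differences between predicted and measured values.
clean = [0.1, -0.2, 0.05]
with_outlier = [0.1, -0.2, 5.0]

# How much each log-likelihood drops when one residual becomes an outlier.
gauss_drop = gaussian_loglik(clean, 1.0) - gaussian_loglik(with_outlier, 1.0)
cauchy_drop = cauchy_loglik(clean, 1.0) - cauchy_loglik(with_outlier, 1.0)
print(f"Gaussian penalty for the outlier: {gauss_drop:.2f}")
print(f"Cauchy penalty for the outlier:   {cauchy_drop:.2f}")
```

The heavier tails of the Cauchy density make it more tolerant of occasional large prediction errors, which is one plausible reading of why the two distributions behave differently in practice.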
Affiliation(s)
- Lars A. Bratholm
- Department of Chemistry, University of Copenhagen, Copenhagen, Denmark
- Thomas Hamelryck
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Jan H. Jensen
- Department of Chemistry, University of Copenhagen, Copenhagen, Denmark

7
Voelz VA, Zhou G. Bayesian inference of conformational state populations from computational models and sparse experimental observables. J Comput Chem 2014; 35:2215-24. [PMID: 25250719 DOI: 10.1002/jcc.23738] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 06/09/2014] [Revised: 08/25/2014] [Accepted: 08/31/2014] [Indexed: 12/29/2022]
Abstract
We present a Bayesian inference approach to estimating conformational state populations from a combination of molecular modeling and sparse experimental data. Unlike alternative approaches, our method is designed for use with small molecules and emphasizes high-resolution structural models, using inferential structure determination with reference potentials, and Markov Chain Monte Carlo to sample the posterior distribution of conformational states. As an application of the method, we determine solution-state conformational populations of the 14-membered macrocycle cineromycin B, using a combination of previously published sparse Nuclear Magnetic Resonance (NMR) observables and replica-exchange molecular dynamic/Quantum Mechanical (QM)-refined conformational ensembles. Our results agree better with experimental data compared to previous modeling efforts. Bayes factors are calculated to quantify the consistency of computational modeling with experiment, and the relative importance of reference potentials and other model parameters.
Affiliation(s)
- Vincent A Voelz
- Department of Chemistry, Temple University, Philadelphia, Pennsylvania

8
Olsson S, Vögeli BR, Cavalli A, Boomsma W, Ferkinghoff-Borg J, Lindorff-Larsen K, Hamelryck T. Probabilistic Determination of Native State Ensembles of Proteins. J Chem Theory Comput 2014; 10:3484-91. [DOI: 10.1021/ct5001236] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Indexed: 11/29/2022]
Affiliation(s)
- Simon Olsson
- Bioinformatics Centre, Department of Biology, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- Institute for Research in Biomedicine, CH-6500 Bellinzona, Switzerland
- Beat Rolf Vögeli
- Laboratory of Physical Chemistry, Eidgenössische Technische Hochschule Zürich, 8093 Zürich, Switzerland
- Andrea Cavalli
- Institute for Research in Biomedicine, CH-6500 Bellinzona, Switzerland
- Wouter Boomsma
- Structural Biology and NMR Laboratory, Department of Biology, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- Jesper Ferkinghoff-Borg
- Cellular Signal Integration Group, Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark
- Kresten Lindorff-Larsen
- Structural Biology and NMR Laboratory, Department of Biology, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- Thomas Hamelryck
- Bioinformatics Centre, Department of Biology, Faculty of Science, University of Copenhagen, Copenhagen, Denmark

9
Boomsma W, Ferkinghoff-Borg J, Lindorff-Larsen K. Combining experiments and simulations using the maximum entropy principle. PLoS Comput Biol 2014; 10:e1003406. [PMID: 24586124 PMCID: PMC3930489 DOI: 10.1371/journal.pcbi.1003406] [Citation(s) in RCA: 133] [Impact Index Per Article: 13.3] [Indexed: 11/30/2022] Open
Abstract
A key component of computational biology is to compare the results of computer modelling with experimental measurements. Despite substantial progress in the models and algorithms used in many areas of computational biology, such comparisons sometimes reveal that the computations are not in quantitative agreement with experimental data. The principle of maximum entropy is a general procedure for constructing probability distributions in the light of new data, making it a natural tool in cases when an initial model provides results that are at odds with experiments. The number of maximum entropy applications in our field has grown steadily in recent years, in areas as diverse as sequence analysis, structural modelling, and neurobiology. In this Perspectives article, we give a broad introduction to the method, in an attempt to encourage its further adoption. The general procedure is explained in the context of a simple example, after which we proceed with a real-world application in the field of molecular simulations, where the maximum entropy procedure has recently provided new insight. Given the limited accuracy of force fields, macromolecular simulations sometimes produce results that are not in complete and quantitative accordance with experiments. A common solution to this problem is to explicitly ensure agreement between the two by perturbing the potential energy function towards the experimental data. So far, a general consensus for how such perturbations should be implemented has been lacking. Three very recent papers have explored this problem using the maximum entropy approach, providing both new theoretical and practical insights to the problem. We highlight each of these contributions in turn and conclude with a discussion on remaining challenges.
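The maximum entropy idea summarised in this abstract can be sketched for the simplest case of a discrete ensemble and one ensemble-averaged observable. This is an illustrative sketch, not code from the article: the function name `maxent_reweight`, the bisection search, and the toy numbers are assumptions. The minimally perturbed weights take the form w_i ∝ w0_i · exp(−λ f_i), and the single Lagrange multiplier λ is tuned until the reweighted average matches the target value.

```python
import numpy as np

def maxent_reweight(prior, obs, target, lo=-50.0, hi=50.0, tol=1e-10, max_iter=200):
    """Return (weights, lam) with weights_i proportional to prior_i * exp(-lam * obs_i)
    such that the weighted average of obs equals target. Assumes strictly positive priors."""
    prior = np.asarray(prior, dtype=float)
    obs = np.asarray(obs, dtype=float)

    def weights(lam):
        logw = np.log(prior) - lam * obs
        logw -= logw.max()          # subtract the maximum for numerical stability
        w = np.exp(logw)
        return w / w.sum()

    def mean(lam):
        return float(weights(lam) @ obs)

    # The reweighted mean decreases monotonically with lam, so bisection applies.
    if not (mean(hi) <= target <= mean(lo)):
        raise ValueError("target is outside the range reachable within [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if mean(mid) > target:
            lo = mid                # mean still too high: a larger lam is needed
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return weights(lam), lam

# Toy ensemble: four states with uniform simulated populations and a
# hypothetical per-state observable; the "experimental" average is 2.0.
w, lam = maxent_reweight([0.25, 0.25, 0.25, 0.25], [1.0, 2.0, 3.0, 4.0], target=2.0)
print("reweighted populations:", np.round(w, 4))
print("Lagrange multiplier:", round(lam, 4))
```

With several observables the same form holds with one multiplier per restraint, and a multidimensional root finder or optimiser would replace the bisection.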
Affiliation(s)
- Wouter Boomsma
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Jesper Ferkinghoff-Borg
- Cellular Signal Integration Group, Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark
- Kresten Lindorff-Larsen
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Copenhagen, Denmark

10
Sanchez-Martinez M, Crehuet R. Application of the maximum entropy principle to determine ensembles of intrinsically disordered proteins from residual dipolar couplings. Phys Chem Chem Phys 2014; 16:26030-9. [DOI: 10.1039/c4cp03114h] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 12/28/2022]
Abstract
We present a method based on the maximum entropy principle that can re-weight an ensemble of protein structures based on data from residual dipolar couplings (RDCs).
Affiliation(s)
- R. Crehuet
- Institute of Advanced Chemistry of Catalunya (IQAC), CSIC, Spain

11
Olsson S, Frellsen J, Boomsma W, Mardia KV, Hamelryck T. Inference of structure ensembles of flexible biomolecules from sparse, averaged data. PLoS One 2013; 8:e79439. [PMID: 24244505 PMCID: PMC3820694 DOI: 10.1371/journal.pone.0079439] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Received: 03/27/2013] [Accepted: 09/24/2013] [Indexed: 11/21/2022] Open
Abstract
We present the theoretical foundations of a general principle to infer structure ensembles of flexible biomolecules from spatially and temporally averaged data obtained in biophysical experiments. The central idea is to compute the Kullback-Leibler optimal modification of a given prior distribution with respect to the experimental data and its uncertainty. This principle generalizes the successful inferential structure determination method and recently proposed maximum entropy methods. Tractability of the protocol is demonstrated through the analysis of simulated nuclear magnetic resonance spectroscopy data of a small peptide.
Affiliation(s)
- Simon Olsson
- Bioinformatics Centre, Department of Biology, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- Jes Frellsen
- Bioinformatics Centre, Department of Biology, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- Wouter Boomsma
- Structural Biology and NMR Laboratory, Department of Biology, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- Kanti V. Mardia
- Department of Statistics, School of Mathematics, University of Leeds, Leeds, United Kingdom
- Thomas Hamelryck
- Bioinformatics Centre, Department of Biology, Faculty of Science, University of Copenhagen, Copenhagen, Denmark

12
Boomsma W, Frellsen J, Harder T, Bottaro S, Johansson KE, Tian P, Stovgaard K, Andreetta C, Olsson S, Valentin JB, Antonov LD, Christensen AS, Borg M, Jensen JH, Lindorff-Larsen K, Ferkinghoff-Borg J, Hamelryck T. PHAISTOS: a framework for Markov chain Monte Carlo simulation and inference of protein structure. J Comput Chem 2013; 34:1697-705. [PMID: 23619610 DOI: 10.1002/jcc.23292] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Received: 12/06/2012] [Revised: 03/14/2013] [Accepted: 03/20/2013] [Indexed: 11/10/2022]
Abstract
We present a new software framework for Markov chain Monte Carlo sampling for simulation, prediction, and inference of protein structure. The software package contains implementations of recent advances in Monte Carlo methodology, such as efficient local updates and sampling from probabilistic models of local protein structure. These models form a probabilistic alternative to the widely used fragment and rotamer libraries. Combined with an easily extendible software architecture, this makes PHAISTOS well suited for Bayesian inference of protein structure from sequence and/or experimental data. Currently, two force-fields are available within the framework: PROFASI and OPLS-AA/L, the latter including the generalized Born surface area solvent model. A flexible command-line and configuration-file interface allows users quickly to set up simulations with the desired configuration. PHAISTOS is released under the GNU General Public License v3.0. Source code and documentation are freely available from http://phaistos.sourceforge.net. The software is implemented in C++ and has been tested on Linux and OSX platforms.
Affiliation(s)
- Wouter Boomsma
- Department of Biology, University of Copenhagen, Copenhagen, 2200, Denmark

13
Johansson KE, Hamelryck T. A simple probabilistic model of multibody interactions in proteins. Proteins 2013; 81:1340-50. [DOI: 10.1002/prot.24277] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 10/13/2012] [Revised: 01/31/2013] [Accepted: 02/18/2013] [Indexed: 11/10/2022]
Affiliation(s)
- Kristoffer Enøe Johansson
- Section for Biomolecular Sciences, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark
- Thomas Hamelryck
- Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Room 1.2.22, Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark

14
Harder T, Borg M, Bottaro S, Boomsma W, Olsson S, Ferkinghoff-Borg J, Hamelryck T. An Efficient Null Model for Conformational Fluctuations in Proteins. Structure 2012; 20:1028-39. [DOI: 10.1016/j.str.2012.03.020] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 06/27/2011] [Revised: 03/08/2012] [Accepted: 03/12/2012] [Indexed: 10/28/2022]