1
Lei H, Li X. Petrov-Galerkin methods for the construction of non-Markovian dynamics preserving nonlocal statistics. J Chem Phys 2021; 154:184108. [PMID: 34241032 DOI: 10.1063/5.0042679]
Abstract
A common observation in coarse-graining a molecular system is the non-Markovian behavior, primarily due to the lack of scale separations. This is reflected in the strong memory effect and the non-white noise spectrum, which must be incorporated into a coarse-grained description to correctly predict dynamic properties. To construct a stochastic model that gives rise to the correct non-Markovian dynamics, we propose a Galerkin projection approach, which transforms the exhausting effort of finding an appropriate model to choosing appropriate subspaces in terms of the derivatives of the coarse-grained variables and, at the same time, provides an accurate approximation to the generalized Langevin equation. We introduce the notion of fractional statistics that embodies nonlocal properties. More importantly, we show how to pick subspaces in the Galerkin projection so that those statistics are automatically matched.
Affiliation(s)
- Huan Lei
  - Department of Computational Mathematics, Science and Engineering and Department of Statistics and Probability, Michigan State University, East Lansing, Michigan 48824, USA
- Xiantao Li
  - Department of Mathematics, the Pennsylvania State University, University Park, Pennsylvania 16802, USA
2
Grogan F, Lei H, Li X, Baker NA. Data-driven molecular modeling with the generalized Langevin equation. JOURNAL OF COMPUTATIONAL PHYSICS 2020; 418:109633. [PMID: 32952214 PMCID: PMC7494205 DOI: 10.1016/j.jcp.2020.109633]
Abstract
The complexity of molecular dynamics simulations necessitates dimension reduction and coarse-graining techniques to enable tractable computation. The generalized Langevin equation (GLE) describes coarse-grained dynamics in reduced dimensions. In spite of playing a crucial role in non-equilibrium dynamics, the memory kernel of the GLE is often ignored because it is difficult to characterize and expensive to solve. To address these issues, we construct a data-driven rational approximation to the GLE. Building upon previous work leveraging the GLE to simulate simple systems, we extend these results to more complex molecules, whose many degrees of freedom and complicated dynamics require approximation methods. We demonstrate the effectiveness of our approximation by testing it against exact methods and comparing observables such as autocorrelation and transition rates.
Affiliation(s)
- Francesca Grogan
  - Pacific Northwest National Laboratory, Richland, WA 99352, United States
- Huan Lei
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI 48824, United States
  - Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, United States
- Xiantao Li
  - Department of Mathematics, Pennsylvania State University, State College, PA 16801, United States
- Nathan A. Baker
  - Pacific Northwest National Laboratory, Richland, WA 99352, United States
  - Division of Applied Mathematics, Brown University, Providence, RI 02912, United States
3
Gao P, Yang X, Tartakovsky AM. Learning Coarse-Grained Potentials for Binary Fluids. J Chem Inf Model 2020; 60:3731-3745. [PMID: 32668158 DOI: 10.1021/acs.jcim.0c00337]
Abstract
For a multiple-fluid system, CG models capable of accurately predicting the interfacial properties as a function of curvature are still lacking. In this work, we propose a new probabilistic machine learning (ML) model for learning CG potentials for binary fluids. The water-hexane mixture is selected as a typical immiscible binary liquid-liquid system. We develop a new CG force field (FF) using the Shinoda-DeVane-Klein (SDK) FF framework and compute parameters in this CG FF using the proposed probabilistic ML method. It is shown that a standard response-surface approach does not provide a unique set of parameters, as it results in a loss function with multiple shallow minima. To address this challenge, we develop a probabilistic ML approach where we compute the probability density function (PDF) of parameters that minimize the loss function. The PDF has a well-defined peak corresponding to a unique set of parameters in the CG FF that reproduces the desired properties of a liquid-liquid interface. We compare the performance of the new CG FF with several existing FFs for the water-hexane mixture, including two atomistic and three CG FFs with respect to modeling the interface structure and thermodynamic properties. It is demonstrated that the new FF significantly improves the CG model prediction of both the interfacial tension and structure for the water-hexane mixture.
Affiliation(s)
- Peiyuan Gao
  - Advanced Computing, Mathematics, and Data Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Xiu Yang
  - Department of Industrial and Systems Engineering, Lehigh University, Bethlehem, Pennsylvania 18015, United States
- Alexandre M Tartakovsky
  - Advanced Computing, Mathematics, and Data Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
4
Lei H, Li J, Gao P, Stinis P, Baker NA. A data-driven framework for sparsity-enhanced surrogates with arbitrary mutually dependent randomness. COMPUTER METHODS IN APPLIED MECHANICS AND ENGINEERING 2019; 350:199-227. [PMID: 32038051 PMCID: PMC7007047 DOI: 10.1016/j.cma.2019.03.014]
Abstract
The challenge of quantifying uncertainty propagation in real-world systems is rooted in the high-dimensionality of the stochastic input and the frequent lack of explicit knowledge of its probability distribution. Traditional approaches show limitations for such problems, especially when the size of the training data is limited. To address these difficulties, we have developed a general framework of constructing surrogate models on spaces of stochastic input with arbitrary probability measure irrespective of the mutual dependencies between individual components of the random inputs and the analytical form. The present Data-driven Sparsity-enhancing Rotation for Arbitrary Randomness (DSRAR) framework includes a data-driven construction of multivariate polynomial basis for arbitrary mutually dependent probability measures and a sparsity enhancement rotation procedure. This sparsity-enhancing rotation method was initially proposed in our previous work [1] for Gaussian density distributions, which may not be feasible for non-Gaussian distributions due to the loss of orthogonality after the rotation. To remedy such difficulties, we developed a new data-driven approach to construct orthonormal polynomials for arbitrary mutually dependent randomness, ensuring the constructed basis maintains the orthogonality/near-orthogonality with respect to the density of the rotated random vector, where directly applying the regular polynomial chaos including arbitrary polynomial chaos (aPC) [2] shows limitations due to the assumption of the mutual independence between the components of the random inputs. The developed DSRAR framework leads to accurate recovery, with only limited training data, of a sparse representation of the target functions. 
The effectiveness of our method is demonstrated in challenging problems such as partial differential equations and realistic molecular systems within high-dimensional (O(10)) conformational spaces where the underlying density is implicitly represented by a large collection of sample data, as well as systems with explicitly given non-Gaussian probabilistic measures.
Affiliation(s)
- Huan Lei
  - Pacific Northwest National Laboratory, Richland, WA 99352
- Jing Li
  - Pacific Northwest National Laboratory, Richland, WA 99352
- Peiyuan Gao
  - Pacific Northwest National Laboratory, Richland, WA 99352
- Panagiotis Stinis
  - Pacific Northwest National Laboratory, Richland, WA 99352
  - Department of Applied Mathematics, University of Washington, Seattle, WA 98195
- Nathan A. Baker
  - Pacific Northwest National Laboratory, Richland, WA 99352
  - Division of Applied Mathematics, Brown University, Providence, RI 02912
5
Goethe M, Fita I, Rubi JM. Testing the mutual information expansion of entropy with multivariate Gaussian distributions. J Chem Phys 2018; 147:224102. [PMID: 29246041 DOI: 10.1063/1.4996847]
Abstract
The mutual information expansion (MIE) represents an approximation of the configurational entropy in terms of low-dimensional integrals. It is frequently employed to compute entropies from simulation data of large systems, such as macromolecules, for which brute-force evaluation of the full configurational integral is intractable. Here, we test the validity of MIE for systems consisting of more than m = 100 degrees of freedom (dofs). The dofs are distributed according to multivariate Gaussian distributions which were generated from protein structures using a variant of the anisotropic network model. For the Gaussian distributions, we have semi-analytical access to the configurational entropy as well as to all contributions of MIE. This allows us to accurately assess the validity of MIE for different situations. We find that MIE diverges for systems containing long-range correlations which means that the error of consecutive MIE approximations grows with the truncation order n for all tractable n ≪ m. This fact implies severe limitations on the applicability of MIE, which are discussed in the article. For systems with correlations that decay exponentially with distance, MIE represents an asymptotic expansion of entropy, where the first successive MIE approximations approach the exact entropy, while MIE also diverges for larger orders. In this case, MIE serves as a useful entropy expansion when truncated up to a specific truncation order which depends on the correlation length of the system.
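As a small illustration of the quantities named in this abstract, the sketch below compares the exact entropy of a multivariate Gaussian with the second-order MIE truncation (the sum of one-dimensional entropies minus all pairwise mutual informations). The covariance model `chain_cov`, with correlations decaying as rho^|i-j|, is a toy assumption made here for illustration and is not the anisotropic network model variant used by the authors. All function names are hypothetical.

```python
import math


def cholesky_logdet(cov):
    """Return log(det(cov)) for a symmetric positive-definite matrix."""
    n = len(cov)
    lower = [[0.0] * n for _ in range(n)]
    logdet = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = cov[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][i] = math.sqrt(s)
                logdet += 2.0 * math.log(lower[i][i])
            else:
                lower[i][j] = s / lower[j][j]
    return logdet


def exact_entropy(cov):
    """Differential entropy of a Gaussian: 0.5 * log det(2*pi*e*cov)."""
    n = len(cov)
    return 0.5 * (n * math.log(2.0 * math.pi * math.e) + cholesky_logdet(cov))


def mie2_entropy(cov):
    """Second-order MIE: sum of marginal entropies minus pairwise mutual informations."""
    n = len(cov)
    marginals = sum(
        0.5 * math.log(2.0 * math.pi * math.e * cov[i][i]) for i in range(n)
    )
    pairwise_mi = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            rho_sq = cov[i][j] ** 2 / (cov[i][i] * cov[j][j])
            # Mutual information of a bivariate Gaussian with correlation rho.
            pairwise_mi += -0.5 * math.log(1.0 - rho_sq)
    return marginals - pairwise_mi


def chain_cov(n, rho):
    """Toy covariance (assumption): unit variances, correlation rho**|i-j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    for size in (2, 5, 20):
        cov = chain_cov(size, 0.5)
        print(size, exact_entropy(cov), mie2_entropy(cov))
```

For two variables the second-order truncation coincides with the exact entropy. For longer chains under this toy model it falls below the exact value, because the pairwise terms also subtract correlations that are only mediated through intermediate variables.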
Affiliation(s)
- Martin Goethe
  - Department of Condensed Matter Physics, University of Barcelona, Carrer Martí i Franqués 1, 08028 Barcelona, Spain
- Ignacio Fita
  - Molecular Biology Institute of Barcelona (IBMB-CSIC, Maria de Maeztu Unit of Excellence), Carrer Baldiri Reixac 4-8, 08028 Barcelona, Spain
- J Miguel Rubi
  - Department of Condensed Matter Physics, University of Barcelona, Carrer Martí i Franqués 1, 08028 Barcelona, Spain
6
Yang X, Lei H, Gao P, Thomas DG, Mobley DL, Baker NA. Atomic Radius and Charge Parameter Uncertainty in Biomolecular Solvation Energy Calculations. J Chem Theory Comput 2018; 14:759-767. [PMID: 29293342 DOI: 10.1021/acs.jctc.7b00905]
Abstract
Atomic radii and charges are two major parameters used in implicit solvent electrostatics and energy calculations. The optimization problem for charges and radii is underdetermined, leading to uncertainty in the values of these parameters and in the results of solvation energy calculations using these parameters. This paper presents a new method for quantifying this uncertainty in implicit solvation calculations of small molecules using surrogate models based on generalized polynomial chaos (gPC) expansions. There are relatively few atom types used to specify radii parameters in implicit solvation calculations; therefore, surrogate models for these low-dimensional spaces could be constructed using least-squares fitting. However, there are many more types of atomic charges; therefore, construction of surrogate models for the charge parameter space requires compressed sensing combined with an iterative rotation method to enhance problem sparsity. We demonstrate the application of the method by presenting results for the uncertainties in small molecule solvation energies based on these approaches. The method presented in this paper is a promising approach for efficiently quantifying uncertainty in a wide range of force field parametrization problems, including those beyond continuum solvation calculations. The intent of this study is to provide a way for developers of implicit solvent model parameter sets to understand the sensitivity of their target properties (solvation energy) on underlying choices for solute radius and charge parameters.
Affiliation(s)
- David L Mobley
  - Department of Pharmaceutical Sciences, University of California Irvine, Irvine, California 92697, United States
- Nathan A Baker
  - Division of Applied Mathematics, Brown University, Providence, Rhode Island 02912, United States
7
Li Z, Bian X, Yang X, Karniadakis GE. A comparative study of coarse-graining methods for polymeric fluids: Mori-Zwanzig vs. iterative Boltzmann inversion vs. stochastic parametric optimization. J Chem Phys 2017; 145:044102. [PMID: 27475343 DOI: 10.1063/1.4959121]
Abstract
We construct effective coarse-grained (CG) models for polymeric fluids by employing two coarse-graining strategies. The first one is a forward-coarse-graining procedure by the Mori-Zwanzig (MZ) projection while the other one applies a reverse-coarse-graining procedure, such as the iterative Boltzmann inversion (IBI) and the stochastic parametric optimization (SPO). More specifically, we perform molecular dynamics (MD) simulations of star polymer melts to provide the atomistic fields to be coarse-grained. Each molecule of a star polymer with internal degrees of freedom is coarsened into a single CG particle and the effective interactions between CG particles can be either evaluated directly from microscopic dynamics based on the MZ formalism, or obtained by the reverse methods, i.e., IBI and SPO. The forward procedure has no free parameters to tune and recovers the MD system faithfully. For the reverse procedure, we find that the parameters in CG models cannot be selected arbitrarily. If the free parameters are properly defined, the reverse CG procedure also yields an accurate effective potential. Moreover, we explain how an aggressive coarse-graining procedure introduces the many-body effect, which makes the pairwise potential invalid for the same system at densities away from the training point. From this work, general guidelines for coarse-graining of polymeric fluids can be drawn.
Affiliation(s)
- Zhen Li
  - Division of Applied Mathematics, Brown University, Providence, Rhode Island 02912, USA
- Xin Bian
  - Division of Applied Mathematics, Brown University, Providence, Rhode Island 02912, USA
- Xiu Yang
  - Pacific Northwest National Laboratory, Richland, Washington 99352, USA
- George Em Karniadakis
  - Division of Applied Mathematics, Brown University, Providence, Rhode Island 02912, USA
8
Gosink LJ, Overall CC, Reehl SM, Whitney PD, Mobley DL, Baker NA. Bayesian Model Averaging for Ensemble-Based Estimates of Solvation-Free Energies. J Phys Chem B 2017; 121:3458-3472. [PMID: 27966363 DOI: 10.1021/acs.jpcb.6b09198]
Abstract
This paper applies the Bayesian Model Averaging statistical ensemble technique to estimate small molecule solvation free energies. There is a wide range of methods available for predicting solvation free energies, ranging from empirical statistical models to ab initio quantum mechanical approaches. Each of these methods is based on a set of conceptual assumptions that can affect predictive accuracy and transferability. Using an iterative statistical process, we have selected and combined solvation energy estimates using an ensemble of 17 diverse methods from the fourth Statistical Assessment of Modeling of Proteins and Ligands (SAMPL) blind prediction study to form a single, aggregated solvation energy estimate. Methods that possess minimal or redundant information are pruned from the ensemble and the evaluation process repeats until aggregate predictive performance can no longer be improved. We show that this process results in a final aggregate estimate that outperforms all individual methods by reducing estimate errors by as much as 91% to 1.2 kcal mol⁻¹ accuracy. This work provides a new approach for accurate solvation free energy prediction and lays the foundation for future work on aggregate models that can balance computational cost with prediction accuracy.
Affiliation(s)
- David L Mobley
  - Departments of Pharmaceutical Sciences and Chemistry, University of California, Irvine, Irvine, California 92697, United States
- Nathan A Baker
  - Division of Applied Mathematics, Brown University, Providence, Rhode Island 02912, United States
9
Clement N, Rasheed M, Bajaj C. Uncertainty Quantified Computational Analysis of the Energetics of Virus Capsid Assembly. PROCEEDINGS. IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE 2016; 2016:1706-1713. [PMID: 28936368 PMCID: PMC5604467 DOI: 10.1109/bibm.2016.7822775]
Abstract
Most of the existing research in assembly pathway prediction/analysis of viral capsids makes the simplifying assumption that the configuration of the intermediate states can be extracted directly from the final configuration of the entire capsid. This assumption does not take into account the conformational changes of the constituent proteins as well as minor changes to the binding interfaces that continue throughout the assembly process until stabilization. This paper presents a statistical-ensemble based approach which samples the configurational space for each monomer with the relative local orientation between monomers, to capture the uncertainties in binding and conformations. Furthermore, instead of using larger capsomers (trimers, pentamers) as building blocks, we allow all possible subassemblies to bind in all possible combinations. We represent the resulting assembly graph in two different ways: First, we use the Wilcoxon signed rank measure to compare the distributions of binding free energy computed on the sampled conformations to predict likely pathways. Second, we represent chemical equilibrium aspects of the transitions as a Bayesian Factor graph where both associations and dissociations are modeled based on concentrations and the binding free energies. We applied these protocols on the feline panleukopenia virus and the Nudaurelia capensis virus. Results from these experiments showed significant departure from those one would obtain if only the static configurations of the proteins were considered. Hence, we establish the importance of an uncertainty-aware protocol for pathway analysis, and provide a statistical framework as an important first step towards assembly pathway prediction with high statistical confidence.
Affiliation(s)
- N Clement
  - Department of Computer Science, The University of Texas at Austin, Austin, TX 78712
- M Rasheed
  - Department of Computer Science, The University of Texas at Austin, Austin, TX 78712
- C Bajaj
  - Department of Computer Science, The University of Texas at Austin, Austin, TX 78712
10
Rasheed M, Clement N, Bhowmick A, Bajaj C. Statistical Framework for Uncertainty Quantification in Computational Molecular Modeling. ACM-BCB ... ... : THE ... ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND BIOMEDICINE. ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND BIOMEDICINE 2016; 2016:146-155. [PMID: 29202129 PMCID: PMC5710766 DOI: 10.1145/2975167.2975182]
Abstract
As computational modeling, simulation, and predictions are becoming integral parts of biomedical pipelines, it behooves us to emphasize the reliability of the computational protocol. For any reported quantity of interest (QOI), one must also compute and report a measure of the uncertainty or error associated with the QOI. This is especially important in molecular modeling, since in most practical applications the inputs to the computational protocol are often noisy, incomplete, or low-resolution. Unfortunately, currently available modeling tools do not account for uncertainties and their effect on the final QOIs with sufficient rigor. We have developed a statistical framework that expresses the uncertainty of the QOI as the probability that the reported value deviates from the true value by more than some user-defined threshold. First, we provide a theoretical approach where this probability can be bounded using Azuma-Hoeffding like inequalities. Second, we approximate this probability empirically by sampling the space of uncertainties of the input and provide applications of our framework to bound uncertainties of several QOIs commonly used in molecular modeling. Finally, we also present several visualization techniques to effectively and quantitatively visualize the uncertainties: in the input, final QOIs, and also intermediate states.
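The two estimates described in this abstract, a concentration bound and an empirical sampling estimate of the probability that a QOI deviates by more than a threshold, can be sketched on a toy problem. The example below is an assumption-laden illustration and not the authors' framework: the QOI is a hypothetical `qoi` function (the mean of the inputs), the inputs are independent and uniform on [0, 1], deviation is measured from the expected value 0.5, and the bound is the classical two-sided Hoeffding inequality for a mean of bounded independent variables.

```python
import math
import random


def qoi(inputs):
    """Hypothetical quantity of interest: the mean of the noisy inputs."""
    return sum(inputs) / len(inputs)


def empirical_deviation_probability(n_inputs, threshold, n_samples, seed=0):
    """Estimate P(|QOI - E[QOI]| > threshold) by sampling the inputs."""
    rng = random.Random(seed)
    expected = 0.5  # mean of the QOI for independent uniform [0, 1] inputs
    exceed = 0
    for _ in range(n_samples):
        sample = [rng.random() for _ in range(n_inputs)]
        if abs(qoi(sample) - expected) > threshold:
            exceed += 1
    return exceed / n_samples


def hoeffding_bound(n_inputs, threshold):
    """Two-sided Hoeffding bound for the mean of n independent [0, 1] variables."""
    return min(1.0, 2.0 * math.exp(-2.0 * n_inputs * threshold ** 2))


if __name__ == "__main__":
    n, t = 50, 0.1
    print("empirical:", empirical_deviation_probability(n, t, 2000))
    print("bound:    ", hoeffding_bound(n, t))
```

In this toy setting the sampled probability sits well below the bound, which is the usual trade-off: the bound holds for any input distribution on [0, 1], while the empirical estimate reflects the particular distribution that was sampled.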
Affiliation(s)
- Muhibur Rasheed
  - Department of Computer Science, University of Texas at Austin, Austin, TX, 78705
- Nathan Clement
  - Department of Computer Science, University of Texas at Austin, Austin, TX, 78705
- Abhishek Bhowmick
  - Department of Computer Science, University of Texas at Austin, Austin, TX, 78705
- Chandrajit Bajaj
  - Department of Computer Science, University of Texas at Austin, Austin, TX, 78705
11
An Adaptive WENO Collocation Method for Differential Equations with Random Coefficients. MATHEMATICS 2016. [DOI: 10.3390/math4020029]