1
An H, Liu X, Cai W, Shao X. AttenGpKa: A Universal Predictor of Solvation Acidity Using Graph Neural Network and Molecular Topology. J Chem Inf Model 2024. [PMID: 38982757 DOI: 10.1021/acs.jcim.4c00449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/11/2024]
Abstract
Rapid and accurate calculation of acid dissociation constant (pKa) is crucial for designing chemical synthesis routes, optimizing catalysts, and predicting chemical behavior. Despite recent progress in machine learning, predicting solvation acidity, especially in nonaqueous solvents, remains challenging due to limited experimental data. This challenge arises from treating experimental values in different solvents as distinct data domains and modeling them separately. In this work, we treat both the solutes and solvents equally from a perspective of molecular topology and propose a highly universal framework called AttenGpKa for predicting solvation acidity. AttenGpKa is trained using 26,522 experimental pKa values from 60 pure and mixed solvents in the iBonD database. As a result, our model can simultaneously predict the pKa values of a compound in various solvents, including pure water, pure nonaqueous, and mixed solvents. AttenGpKa achieves universality by using graph neural networks and attention mechanisms to learn complex effects within solute and solvent molecules. Furthermore, encodings of both solute and solvent molecules are adaptively fused to simulate the influence of the solvent on acid dissociation. AttenGpKa demonstrates robust generalization in extensive validations. The interpretability studies further indicate that our model has effectively learnt electronic and solvent effects. A free-to-use software is provided to facilitate the use of AttenGpKa for pKa prediction.
Affiliation(s)
- Hongle An
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xuyang Liu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Wensheng Cai
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xueguang Shao
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
2
Lelièvre T, Pigeon T, Stoltz G, Zhang W. Analyzing Multimodal Probability Measures with Autoencoders. J Phys Chem B 2024; 128:2607-2631. [PMID: 38466759 DOI: 10.1021/acs.jpcb.3c07075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/13/2024]
Abstract
Finding collective variables to describe some important coarse-grained information on physical systems, in particular metastable states, remains a key issue in molecular dynamics. Recently, machine learning techniques have been intensively used to complement and possibly bypass expert knowledge in order to construct collective variables. Our focus here is on neural network approaches based on autoencoders. We study some relevant mathematical properties of the loss function considered for training autoencoders and provide physical interpretations based on conditional variances and minimum energy paths. We also consider various extensions in order to better describe physical systems, by incorporating more information on transition states at saddle points, and/or allowing for multiple decoders in order to describe several transition paths. Our results are illustrated on toy two-dimensional systems and on alanine dipeptide.
Affiliation(s)
- Tony Lelièvre
- CERMICS, École des Ponts ParisTech, 6-8 Avenue Blaise Pascal, 77455 Marne-la-Vallée, France
- MATHERIALS Team-project, Inria Paris, 2 Rue Simone Iff, 75012 Paris, France
- Thomas Pigeon
- CERMICS, École des Ponts ParisTech, 6-8 Avenue Blaise Pascal, 77455 Marne-la-Vallée, France
- MATHERIALS Team-project, Inria Paris, 2 Rue Simone Iff, 75012 Paris, France
- IFP Energies Nouvelles, Rond-Point de l'Echangeur de Solaize, BP 3, 69360 Solaize, France
- Gabriel Stoltz
- CERMICS, École des Ponts ParisTech, 6-8 Avenue Blaise Pascal, 77455 Marne-la-Vallée, France
- MATHERIALS Team-project, Inria Paris, 2 Rue Simone Iff, 75012 Paris, France
- Wei Zhang
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 14, 14195 Berlin, Germany
- Zuse Institute Berlin, Takustraße 7, 14195 Berlin, Germany
3
Gong S, Zheng Z. A slow feature analysis approach for the optimization of collective variables. J Chem Phys 2024; 160:094104. [PMID: 38426510 DOI: 10.1063/5.0191014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Accepted: 02/13/2024] [Indexed: 03/02/2024] Open
Abstract
Molecular dynamics simulations have become increasingly important in understanding the microscopic mechanisms of various molecular systems. However, the high energy barriers in complicated molecules often make it difficult to observe events of interest within a reasonable timescale. To address this issue, researchers have developed a variety of enhanced sampling methods to explore configuration space by adding bias potentials along the slowly changing collective variables (CVs). In this study, we have developed a new tool that combines slow feature analysis and biasing-enhanced sampling methods to identify effective CVs and enhance the sampling efficiency of configuration space. We have demonstrated the effectiveness of this tool through three general examples.
Affiliation(s)
- Shuai Gong
- School of Chemistry, Chemical Engineering and Life Science, Wuhan University of Technology, 122 Luoshi Road, Wuhan 430070, People's Republic of China
- Zheng Zheng
- School of Chemistry, Chemical Engineering and Life Science, Wuhan University of Technology, 122 Luoshi Road, Wuhan 430070, People's Republic of China
- Divamics Inc., Suzhou 215000, People's Republic of China
4
Fu H, Bian H, Shao X, Cai W. Collective Variable-Based Enhanced Sampling: From Human Learning to Machine Learning. J Phys Chem Lett 2024; 15:1774-1783. [PMID: 38329095 DOI: 10.1021/acs.jpclett.3c03542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Enhanced-sampling algorithms relying on collective variables (CVs) are extensively employed to study complex (bio)chemical processes that are not amenable to brute-force molecular simulations. The selection of appropriate CVs characterizing the slow movement modes is of paramount importance for reliable and efficient enhanced-sampling simulations. In this Perspective, we first review the application and limitations of CVs obtained from chemical and geometrical intuition. We also introduce path-sampling algorithms, which can identify path-like CVs in a high-dimensional free-energy space. Machine-learning algorithms offer a viable approach to finding suitable CVs by analyzing trajectories from preliminary simulations. We discuss both the performance of machine-learning-derived CVs in enhanced-sampling simulations of experimental models and the challenges involved in applying these CVs to realistic, complex molecular assemblies. Moreover, we provide a prospective view of the potential advancements of machine-learning algorithms for the development of CVs in the field of enhanced-sampling simulations.
Affiliation(s)
- Haohao Fu
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Hengwei Bian
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xueguang Shao
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Wensheng Cai
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
5
Liu X, Xing J, Fu H, Shao X, Cai W. Analyzing Molecular Dynamics Trajectories Thermodynamically through Artificial Intelligence. J Chem Theory Comput 2024; 20:665-676. [PMID: 38193858 DOI: 10.1021/acs.jctc.3c00975] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 01/10/2024]
Abstract
Molecular dynamics simulations produce trajectories that correspond to vast amounts of structure when exploring biochemical processes. Extracting valuable information, e.g., important intermediate states and collective variables (CVs) that describe the major movement modes, from molecular trajectories to understand the underlying mechanisms of biological processes presents a significant challenge. To achieve this goal, we introduce a deep learning approach, coined DIKI (deep identification of key intermediates), to determine low-dimensional CVs distinguishing key intermediate conformations without a-priori assumptions. DIKI dynamically plans the distribution of latent space and groups together similar conformations within the same cluster. Moreover, by incorporating two user-defined parameters, namely, coarse focus knob and fine focus knob, to help identify conformations with low free energy and differentiate the subtle distinctions among these conformations, resolution-tunable clustering was achieved. Furthermore, the integration of DIKI with a path-finding algorithm contributes to the identification of crucial intermediates along the lowest free-energy pathway. We postulate that DIKI is a robust and flexible tool that can find widespread applications in the analysis of complex biochemical processes.
Affiliation(s)
- Xuyang Liu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Jingya Xing
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Haohao Fu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xueguang Shao
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Wensheng Cai
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
6
Fu H, Liu H, Xing J, Zhao T, Shao X, Cai W. Deep-Learning-Assisted Enhanced Sampling for Exploring Molecular Conformational Changes. J Phys Chem B 2023; 127:9926-9935. [PMID: 37947397 DOI: 10.1021/acs.jpcb.3c05284] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 11/12/2023]
Abstract
We present a novel strategy to explore conformational changes and identify stable states of molecular objects, eliminating the need for a priori knowledge. The approach applies a deep learning method to extract information about the movement modes of the molecular object from a short, high-dimensional, and parameter-free preliminary enhanced-sampling simulation. The gathered information is described by a small set of deep-learning-based collective variables (dCVs), which steer the production-enhanced-sampling simulation. Considering the challenge of adequately exploring the configurational space using the low-dimensional, suboptimal dCVs, we incorporate a method designed for ergodic sampling, namely, Gaussian-accelerated molecular dynamics (MD), into the framework of CV-based enhanced sampling. MD simulations on both toy models and nontrivial examples demonstrate the remarkable computational efficiency of the strategy in capturing the conformational changes of molecular objects without a priori knowledge. Specifically, we achieved the blind folding of two fast folders, chignolin and villin, within a time scale of hundreds of nanoseconds and successfully reconstructed the free-energy landscapes that characterize their reversible folding. All in all, the presented strategy holds significant promise for investigating conformational changes in macromolecules, and it is anticipated to find extensive applications in the fields of chemistry and biology.
Affiliation(s)
- Haohao Fu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Han Liu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Jingya Xing
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Tong Zhao
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Xueguang Shao
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Wensheng Cai
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
7
New avenues in artificial-intelligence-assisted drug discovery. Drug Discov Today 2023; 28:103516. [PMID: 36736583 DOI: 10.1016/j.drudis.2023.103516] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 08/07/2022] [Revised: 12/08/2022] [Accepted: 01/26/2023] [Indexed: 02/05/2023]
Abstract
Over the past decade, the amount of biomedical data available has grown at unprecedented rates. Increased automation technology and larger data volumes have encouraged the use of machine learning (ML) or artificial intelligence (AI) techniques for mining such data and extracting useful patterns. Because the identification of chemical entities with desired biological activity is a crucial task in drug discovery, AI technologies have the potential to accelerate this process and support decision making. In addition, the advent of deep learning (DL) has shown great promise in addressing diverse problems in drug discovery, such as de novo molecular design. Herein, we will appraise the current state-of-the-art in AI-assisted drug discovery, discussing the recent applications covering generative models for chemical structure generation, scoring functions to improve binding affinity and pose prediction, and molecular dynamics to assist in the parametrization, featurization and generalization tasks. Finally, we will discuss current hurdles and the strategies to overcome them, as well as potential future directions.
8
Heydari S, Raniolo S, Livi L, Limongelli V. Transferring chemical and energetic knowledge between molecular systems with machine learning. Commun Chem 2023; 6:13. [PMID: 36697971 PMCID: PMC9839695 DOI: 10.1038/s42004-022-00790-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Accepted: 12/07/2022] [Indexed: 01/15/2023] Open
Abstract
Predicting structural and energetic properties of a molecular system is one of the fundamental tasks in molecular simulations, and it has applications in chemistry, biology, and medicine. In the past decade, the advent of machine learning algorithms had an impact on molecular simulations for various tasks, including property prediction of atomistic systems. In this paper, we propose a novel methodology for transferring knowledge obtained from simple molecular systems to a more complex one, endowed with a significantly larger number of atoms and degrees of freedom. In particular, we focus on the classification of high and low free-energy conformations. Our approach relies on utilizing (i) a novel hypergraph representation of molecules, encoding all relevant information for characterizing multi-atom interactions for a given conformation, and (ii) novel message passing and pooling layers for processing and making free-energy predictions on such hypergraph-structured data. Despite the complexity of the problem, our results show a remarkable Area Under the Curve of 0.92 for transfer learning from tri-alanine to the deca-alanine system. Moreover, we show that the same transfer learning approach can also be used in an unsupervised way to group chemically related secondary structures of deca-alanine in clusters having similar free-energy values. Our study represents a proof of concept that reliable transfer learning models for molecular systems can be designed, paving the way to unexplored routes in prediction of structural and energetic properties of biologically relevant systems.
Affiliation(s)
- Sajjad Heydari
- Department of Computer Science, University of Manitoba, Winnipeg, MB R3T 2N2, Canada (GRID: grid.21613.37; ISNI: 0000 0004 1936 9609)
- Stefano Raniolo
- Faculty of Biomedical Sciences, Euler Institute, Università della Svizzera italiana (USI), via G. Buffi 13, CH-6900 Lugano, Switzerland (GRID: grid.29078.34; ISNI: 0000 0001 2203 2861)
- Lorenzo Livi
- Department of Computer Science, University of Manitoba, Winnipeg, MB R3T 2N2, Canada (GRID: grid.21613.37; ISNI: 0000 0004 1936 9609)
- Department of Computer Science, University of Exeter, Exeter, EX4 4QF, UK (GRID: grid.8391.3; ISNI: 0000 0004 1936 8024)
- Vittorio Limongelli
- Faculty of Biomedical Sciences, Euler Institute, Università della Svizzera italiana (USI), via G. Buffi 13, CH-6900 Lugano, Switzerland (GRID: grid.29078.34; ISNI: 0000 0001 2203 2861)
- Department of Pharmacy, University of Naples “Federico II”, via D. Montesano 49, I-80131 Naples, Italy (GRID: grid.4691.a; ISNI: 0000 0001 0790 385X)
9
Chen H, Chipot C. Chasing collective variables using temporal data-driven strategies. QRB DISCOVERY 2023; 4:e2. [PMID: 37564298 PMCID: PMC10411323 DOI: 10.1017/qrd.2022.23] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 10/11/2022] [Revised: 12/21/2022] [Accepted: 12/29/2022] [Indexed: 01/09/2023] Open
Abstract
The convergence of free-energy calculations based on importance sampling depends heavily on the choice of collective variables (CVs), which in principle, should include the slow degrees of freedom of the biological processes to be investigated. Autoencoders (AEs), as emerging data-driven dimension reduction tools, have been utilised for discovering CVs. AEs, however, are often treated as black boxes, and what AEs actually encode during training, and whether the latent variables from encoders are suitable as CVs for further free-energy calculations remains unknown. In this contribution, we review AEs and their time-series-based variants, including time-lagged AEs (TAEs) and modified TAEs, as well as the closely related model variational approach for Markov processes networks (VAMPnets). We then show through numerical examples that AEs learn the high-variance modes instead of the slow modes. In stark contrast, time series-based models are able to capture the slow modes. Moreover, both modified TAEs with extensions from slow feature analysis and the state-free reversible VAMPnets (SRVs) can yield orthogonal multidimensional CVs. As an illustration, we employ SRVs to discover the CVs of the isomerizations of N-acetyl-N'-methylalanylamide and trialanine by iterative learning with trajectories from biased simulations. Last, through numerical experiments with anisotropic diffusion, we investigate the potential relationship of time-series-based models and committor probabilities.
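The contrast drawn in this abstract, between models that pick up high-variance modes and models that pick up slow modes, can be illustrated with a small linear sketch. This is not the authors' code. PCA is used here as a stand-in for a linear autoencoder, and linear slow feature analysis as a stand-in for the time-series-based models. The synthetic trajectory, the function names, and all parameters are assumptions made for illustration.

```python
import numpy as np


def make_data(n=5000, seed=0):
    """Hypothetical 2D trajectory: a slow, low-variance mode mixed with a fast, high-variance one."""
    rng = np.random.default_rng(seed)
    slow = np.sin(np.linspace(0.0, 2.0 * np.pi, n))  # slow mode, variance about 0.5
    fast = 5.0 * rng.standard_normal(n)              # fast mode, variance about 25
    theta = 0.6                                      # arbitrary mixing angle (assumption)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
    return np.column_stack([slow, fast]) @ rot.T, slow


def pca_projection(x):
    """Project onto the direction of largest variance."""
    xc = x - x.mean(axis=0)
    _, evecs = np.linalg.eigh(np.cov(xc, rowvar=False))
    return xc @ evecs[:, -1]  # eigh sorts eigenvalues in ascending order


def sfa_projection(x):
    """Linear slow feature analysis: whiten, then minimize the variance of the time derivative."""
    xc = x - x.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(xc, rowvar=False))
    z = xc @ (evecs / np.sqrt(evals))  # whitened coordinates with unit variance
    dz = np.diff(z, axis=0)            # finite-difference time derivative
    _, dvecs = np.linalg.eigh(np.cov(dz, rowvar=False))
    return z @ dvecs[:, 0]             # direction with the slowest variation


def demo():
    """Return |correlation| of each projection with the known slow mode."""
    x, slow = make_data()
    c_pca = abs(np.corrcoef(pca_projection(x), slow)[0, 1])
    c_sfa = abs(np.corrcoef(sfa_projection(x), slow)[0, 1])
    return c_pca, c_sfa


if __name__ == "__main__":
    c_pca, c_sfa = demo()
    print(f"|corr(PCA component, slow mode)| = {c_pca:.3f}")
    print(f"|corr(SFA component, slow mode)| = {c_sfa:.3f}")
```

Under these assumptions, the leading PCA component should follow the fast, high-variance mode, while the SFA component should follow the slow one.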
Affiliation(s)
- Haochuan Chen
- Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, 54506 Vandœuvre-lès-Nancy, France
- Christophe Chipot
- Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, 54506 Vandœuvre-lès-Nancy, France
- Theoretical and Computational Biophysics Group, Beckman Institute, and Department of Physics, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, IL 60637, USA
10
Ketkaew R, Luber S. DeepCV: A Deep Learning Framework for Blind Search of Collective Variables in Expanded Configurational Space. J Chem Inf Model 2022; 62:6352-6364. [PMID: 36445176 DOI: 10.1021/acs.jcim.2c00883] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 12/02/2022]
Abstract
We present Deep learning for Collective Variables (DeepCV), a computer code that provides an efficient and customizable implementation of the deep autoencoder neural network (DAENN) algorithm that has been developed in our group for computing collective variables (CVs) and can be used with enhanced sampling methods to reconstruct free energy surfaces of chemical reactions. DeepCV can be used to conveniently calculate molecular features, train models, generate CVs, validate rare events from sampling, and analyze a trajectory for chemical reactions of interest. We use DeepCV in an example study of the conformational transition of cyclohexene, where metadynamics simulations are performed using DAENN-generated CVs. The results show that the adopted CVs give free energies in line with those obtained by previously developed CVs and experimental results. DeepCV is open-source software written in Python/C++ object-oriented languages, based on the TensorFlow framework and distributed free of charge for noncommercial purposes, which can be incorporated into general molecular dynamics software. DeepCV also comes with several additional tools, i.e., an application program interface (API), documentation, and tutorials.
Affiliation(s)
- Rangsiman Ketkaew
- Department of Chemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland
- Sandra Luber
- Department of Chemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland
11
Duan C, Liu X, Cai W, Shao X. Spectral Encoder to Extract the Features of Near-Infrared Spectra for Multivariate Calibration. J Chem Inf Model 2022; 62:3695-3703. [PMID: 35916486 DOI: 10.1021/acs.jcim.2c00786] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
An autoencoder architecture was adopted for near-infrared (NIR) spectral analysis by extracting the common features in the spectra. Three autoencoder-based networks with different purposes were constructed. First, a spectral encoder was established by training the network with a set of spectra as the input. The features of the spectra can be encoded by the nodes in the bottleneck layer, which in turn can be used to build a sparse and robust model. Second, taking the spectra of one instrument as the input and that of another instrument as the reference output, the common features in both spectra can be obtained in the bottleneck layer. Therefore, in the prediction step, the spectral features of the second can be predicted by taking the reverse of the decoder as the encoder. Furthermore, transfer learning was used to build the model for the spectra of more instruments by fine-tuning the trained network. NIR datasets of plant, wheat, and pharmaceutical tablets measured on multiple instruments were used to test the method. The multi-linear regression (MLR) model with the encoded features was found to have a similar or slightly better performance in prediction compared with the partial least-squares (PLS) model.
Affiliation(s)
- Chaoshu Duan
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xuyang Liu
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Wensheng Cai
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xueguang Shao
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
12
Challenges and frontiers of computational modelling of biomolecular recognition. QRB DISCOVERY 2022. [DOI: 10.1017/qrd.2022.11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022] Open
Abstract
Biomolecular recognition including binding of small molecules, peptides and proteins to their target receptors plays a key role in cellular function and has been targeted for therapeutic drug design. However, the high flexibility of biomolecules and slow binding and dissociation processes have presented challenges for computational modelling. Here, we review the challenges and computational approaches developed to characterise biomolecular binding, including molecular docking, molecular dynamics simulations (especially enhanced sampling) and machine learning. Further improvements are still needed in order to accurately and efficiently characterise binding structures, mechanisms, thermodynamics and kinetics of biomolecules in the future.
13
Bhakat S. Collective variable discovery in the age of machine learning: reality, hype and everything in between. RSC Adv 2022; 12:25010-25024. [PMID: 36199882 PMCID: PMC9437778 DOI: 10.1039/d2ra03660f] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 06/13/2022] [Accepted: 08/20/2022] [Indexed: 11/21/2022] Open
Abstract
Understanding the kinetic and thermodynamic profile of biomolecules is necessary to understand their functional roles, which has a major impact on mechanism-driven drug discovery. Molecular dynamics simulation has been routinely used to understand conformational dynamics and molecular recognition in biomolecules. Statistical analysis of high-dimensional spatiotemporal data generated from molecular dynamics simulation requires identification of a few low-dimensional variables which can describe the essential dynamics of a system without significant loss of information. In physical chemistry, these low-dimensional variables are often called collective variables. Collective variables are used to generate reduced representations of free energy surfaces and calculate transition probabilities between different metastable basins. However, the choice of collective variables is not trivial for complex systems. Collective variables range from geometric criteria such as distances and dihedral angles to abstract ones such as weighted linear combinations of multiple geometric variables. The advent of machine learning algorithms led to increasing use of abstract collective variables to represent biomolecular dynamics. In this review, I will highlight several nuances of commonly used collective variables ranging from geometric to abstract ones. Further, I will put forward some cases where machine learning based collective variables were used to describe simple systems which in principle could have been described by geometric ones. Finally, I will put forward my thoughts on artificial general intelligence and how it can be used to discover and predict collective variables from spatiotemporal data generated by molecular dynamics simulations. Graphical abstract: Data-driven collective variable discovery methods to capture conformational dynamics in biological macromolecules.
Affiliation(s)
- Soumendranath Bhakat
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Pennsylvania 19104-6059, USA