1. Alavi SF, Chen Y, Hou YF, Ge F, Zheng P, Dral PO. ANI-1ccx-gelu Universal Interatomic Potential and Its Fine-Tuning: Toward Accurate and Efficient Anharmonic Vibrational Frequencies. J Phys Chem Lett 2025:483-493. [PMID: 39748511 DOI: 10.1021/acs.jpclett.4c03031] [Indexed: 01/04/2025]
Abstract
Calculating anharmonic vibrational modes of molecules for interpreting experimental spectra is one of the most interesting challenges of contemporary computational chemistry. However, the traditional QM methods are costly for this application. Machine learning techniques have emerged as a powerful tool for substituting the traditional QM methods. Universal interatomic potentials (UIPs) hold a particular promise to deliver accurate results at a fraction of the cost of the traditional QM methods, but the performance of UIPs for calculating anharmonic vibrational frequencies remains hitherto unknown. Here we show that despite a known excellent performance of the representative UIP ANI-1ccx for thermochemical properties, it fails for the anharmonic frequencies due to the original unfortunate choice of the activation function. Hence, we recommend evaluating new UIPs on anharmonic frequencies as an additional important quality test. To remedy the shortcomings of ANI-1ccx, we introduce its reformulation ANI-1ccx-gelu with the GELU activation function, which is capable of calculating IR anharmonic frequencies with reasonable accuracy (close to B3LYP/6-31G*). We also show that our new UIP can be fine-tuned to obtain very accurate anharmonic frequencies for some specific molecules but more effort is needed to improve the overall quality of UIP and its capability for fine-tuning. The new UIP will be included as part of our universal and updatable AI-enhanced QM methods (UAIQM) platform and is available together with usage and fine-tuning tutorials in open-source MLatom at https://github.com/dralgroup/mlatom. The calculations can also be performed via a web browser at https://XACScloud.com.
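Anharmonic frequencies depend on second and higher derivatives of the potential energy surface, so a discontinuity in the derivatives of the activation function can propagate into the predicted surface. The sketch below only illustrates that point. It assumes, which the abstract does not state, that the original activation was CELU with alpha = 0.1; the helper names `gelu`, `celu`, and `second_derivative` are hypothetical and are not part of MLatom.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def celu(x: float, alpha: float = 0.1) -> float:
    """CELU: max(0, x) + min(0, alpha * (exp(x / alpha) - 1)).

    alpha = 0.1 is an assumption made for this illustration.
    """
    return max(0.0, x) + min(0.0, alpha * math.expm1(x / alpha))


def second_derivative(f, x: float, h: float = 1e-3) -> float:
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


if __name__ == "__main__":
    # Compare the curvature just left and just right of the origin.
    for name, f in (("gelu", gelu), ("celu", celu)):
        left = second_derivative(f, -0.01)
        right = second_derivative(f, 0.01)
        print(f"{name}: f''(-0.01) = {left:.3f}, f''(+0.01) = {right:.3f}")
```

Under these assumptions the curvature of GELU is nearly identical on both sides of zero, while the curvature of CELU jumps from roughly 1/alpha on the negative side to zero on the positive side.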
Affiliation(s)
- Seyedeh Fatemeh Alavi
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Yuxinxin Chen
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Yi-Fan Hou
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Fuchun Ge
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Peikun Zheng
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Pavlo O Dral
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
  - Institute of Physics, Faculty of Physics, Astronomy, and Informatics, Nicolaus Copernicus University in Torun, ul. Grudziądzka 5, 87-100 Torun, Poland
2. Maennel H, Unke OT, Müller KR. Complete and Efficient Covariants for Three-Dimensional Point Configurations with Application to Learning Molecular Quantum Properties. J Phys Chem Lett 2024; 15:12513-12519. [PMID: 39670428 DOI: 10.1021/acs.jpclett.4c02376] [Indexed: 12/14/2024]
Abstract
When physical properties of molecules are being modeled with machine learning, it is desirable to incorporate SO(3)-covariance. While such models based on low body order features are not complete, we formulate and prove general completeness properties for higher order methods and show that $6k - 5$ of these features are enough for up to $k$ atoms. We also find that the Clebsch-Gordan operations commonly used in these methods can be replaced by matrix multiplications without sacrificing completeness, lowering the scaling from $O(l^6)$ to $O(l^3)$ in the degree of the features. We apply this to quantum chemistry, but the proposed methods are generally applicable for problems involving three-dimensional point configurations.
Affiliation(s)
- Hartmut Maennel
  - Google DeepMind Zürich, Brandschenkestraße 110, 8002 Zürich, Switzerland
- Oliver T Unke
  - Google DeepMind Berlin, Tucholskystraße 2, 10117 Berlin, Germany
- Klaus-Robert Müller
  - Google DeepMind, https://deepmind.google/
  - TU Berlin, Machine Learning Group, Marchstraße 23, 10587 Berlin, Germany
  - Berlin Institute for the Foundation of Learning and Data, Ernst-Reuter-Platz 7, 10587 Berlin, Germany
  - Max Planck Institute for Informatics Saarbrücken, Saarland Informatics Campus, Building E1 4, 66123 Saarbrücken, Germany
  - Department of Artificial Intelligence, Korea University, Seoul 136-713, Korea
3. Kulichenko M, Nebgen B, Lubbers N, Smith JS, Barros K, Allen AEA, Habib A, Shinkle E, Fedik N, Li YW, Messerly RA, Tretiak S. Data Generation for Machine Learning Interatomic Potentials and Beyond. Chem Rev 2024; 124:13681-13714. [PMID: 39572011 DOI: 10.1021/acs.chemrev.4c00572] [Indexed: 12/26/2024]
Abstract
The field of data-driven chemistry is undergoing an evolution, driven by innovations in machine learning models for predicting molecular properties and behavior. Recent strides in ML-based interatomic potentials have paved the way for accurate modeling of diverse chemical and structural properties at the atomic level. The key determinant defining MLIP reliability remains the quality of the training data. A paramount challenge lies in constructing training sets that capture specific domains in the vast chemical and structural space. This Review navigates the intricate landscape of essential components and integrity of training data that ensure the extensibility and transferability of the resulting models. We delve into the details of active learning, discussing its various facets and implementations. We outline different types of uncertainty quantification applied to atomistic data acquisition and the correlations between estimated uncertainty and true error. The role of atomistic data samplers in generating diverse and informative structures is highlighted. Furthermore, we discuss data acquisition via modified and surrogate potential energy surfaces as an innovative approach to diversify training data. The Review also provides a list of publicly available data sets that cover essential domains of chemical space.
Affiliation(s)
- Maksim Kulichenko
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Benjamin Nebgen
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Nicholas Lubbers
  - Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Justin S Smith
  - NVIDIA Corporation, Santa Clara, California 95051, United States
- Kipton Barros
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
  - Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Alice E A Allen
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
  - Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Adela Habib
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Emily Shinkle
  - Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Nikita Fedik
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Ying Wai Li
  - Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Richard A Messerly
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Sergei Tretiak
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
  - Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
  - Center for Integrated Nanotechnologies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
4. Vendrell RC, Ajagekar A, Bergman MT, Hall CK, You F. Designing microplastic-binding peptides with a variational quantum circuit-based hybrid quantum-classical approach. Sci Adv 2024; 10:eadq8492. [PMID: 39693432 DOI: 10.1126/sciadv.adq8492] [Received: 06/03/2024] [Accepted: 11/12/2024] [Indexed: 12/20/2024]
Abstract
De novo peptide design exhibits great potential in materials engineering, particularly for the use of plastic-binding peptides to help remediate microplastic pollution. There are no known peptide binders for many plastics, a gap that can be filled with de novo design. Current computational methods for peptide design exhibit limitations in sampling and scaling that could be addressed with quantum computing. Hybrid quantum-classical methods can leverage complementary strengths of near-term quantum algorithms and classical techniques for complex tasks like peptide design. This work introduces a hybrid quantum-classical generative framework for designing plastic-binding peptides combining variational quantum circuits with a variational autoencoder network. We demonstrate the framework's effectiveness in generating peptide candidates, evaluate its efficiency for property-oriented design, and validate the candidates with molecular dynamics simulations. This quantum computing-based approach could accelerate the development of biomolecular tools for environmental and biomedical applications while advancing the study of biomolecular systems through quantum technologies.
Affiliation(s)
- Raul Conchello Vendrell
  - Institute for Theoretical Physics, ETH Zurich, Zurich 8093, Switzerland
  - Robert Frederick Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY 14853, USA
- Akshay Ajagekar
  - Systems Engineering, College of Engineering, Cornell University, Ithaca, NY 14853, USA
- Michael T Bergman
  - Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC 27606, USA
- Carol K Hall
  - Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC 27606, USA
- Fengqi You
  - Robert Frederick Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY 14853, USA
  - Systems Engineering, College of Engineering, Cornell University, Ithaca, NY 14853, USA
5. Sit MK, Das S, Samanta K. Machine Learning-Assisted Mixed Quantum-Classical Dynamics without Explicit Nonadiabatic Coupling: Application to the Photodissociation of Peroxynitric Acid. J Phys Chem A 2024; 128:8244-8253. [PMID: 39283987 DOI: 10.1021/acs.jpca.4c02876] [Indexed: 09/27/2024]
Abstract
We have devised a hybrid quantum-classical scheme utilizing machine-learned potential energy surfaces (PES), which circumvents the need for explicit computation of nonadiabatic coupling elements. The quantities necessary to account for the nonadiabatic effects are directly obtained from the PESs. The simulation of dynamics is based on the fewest-switches surface-hopping method. We applied this scheme to model the photodissociation of both N-O and O-O bonds in a conformer of peroxynitric acid (HO2NO2). Adiabatic PES data for the six lowest states of this molecule were computed at the CASSCF level for various nuclear configurations. These served as the training data for the machine-learning models for the PESs. The dynamics simulation was initiated on the lowest optically bright singlet excited state (S4) and propagated along the two Jacobi coordinates $\vec{J}_1$ and $\vec{J}_2$ while accounting for the nonadiabatic effects through transitions between PESs. Our analysis revealed that there is a very high chance of dissociation of the N-O bond leading to the HO2 and NO2 fragments.
Affiliation(s)
- Mahesh K Sit
  - School of Basic Sciences, Indian Institute of Technology Bhubaneswar, Argul, Odisha 752050, India
- Subhasish Das
  - School of Basic Sciences, Indian Institute of Technology Bhubaneswar, Argul, Odisha 752050, India
- Kousik Samanta
  - School of Basic Sciences, Indian Institute of Technology Bhubaneswar, Argul, Odisha 752050, India
6. Gryn'ova G, Bereau T, Müller C, Friederich P, Wade RC, Nunes-Alves A, Soares TA, Merz K. EDITORIAL: Chemical Compound Space Exploration by Multiscale High-Throughput Screening and Machine Learning. J Chem Inf Model 2024; 64:5737-5738. [PMID: 39129448 DOI: 10.1021/acs.jcim.4c01300] [Indexed: 08/13/2024]
Affiliation(s)
- Ganna Gryn'ova
  - School of Chemistry, University of Birmingham, Birmingham B15 2TT, United Kingdom
- Tristan Bereau
  - Institute for Theoretical Physics, Heidelberg University, Heidelberg 69120, Germany
- Carolin Müller
  - Computer-Chemistry-Center, Friedrich-Alexander-Universität Erlangen-Nürnberg, Nägelsbachstraße 25, Erlangen 91052, Germany
- Pascal Friederich
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Kaiserstr. 12, Karlsruhe 76131, Germany
  - Institute of Nanotechnology, Karlsruhe Institute of Technology, Kaiserstr. 12, Karlsruhe 76131, Germany
- Rebecca C Wade
  - Molecular and Cellular Modeling Group, Heidelberg Institute for Theoretical Studies (HITS), Schloss-Wolfsbrunnenweg 35, Heidelberg 69118, Germany
  - Center for Molecular Biology of Heidelberg University (ZMBH), DKFZ-ZMBH Alliance, Heidelberg University, Im Neuenheimer Feld 329, Heidelberg 69120, Germany
  - Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Im Neuenheimer Feld 205, Heidelberg 69120, Germany
- Ariane Nunes-Alves
  - Institute of Chemistry, Technische Universität Berlin, Berlin 10623, Germany
- Thereza A Soares
  - Department of Chemistry, FFCLRP, University of São Paulo, Ribeirão Preto 14040-901, Brazil
  - Hylleraas Centre for Quantum Molecular Sciences, University of Oslo, Oslo 0315, Norway
- Kenneth Merz
  - Department of Chemistry, Michigan State University, Michigan 48824, United States
7. Frank JT, Unke OT, Müller KR, Chmiela S. A Euclidean transformer for fast and stable machine learned force fields. Nat Commun 2024; 15:6539. [PMID: 39107296 PMCID: PMC11303804 DOI: 10.1038/s41467-024-50620-6] [Received: 08/25/2023] [Accepted: 07/10/2024] [Indexed: 08/10/2024] [Open access]
Abstract
Recent years have seen vast progress in the development of machine learned force fields (MLFFs) based on ab-initio reference calculations. Despite achieving low test errors, the reliability of MLFFs in molecular dynamics (MD) simulations is facing growing scrutiny due to concerns about instability over extended simulation timescales. Our findings suggest a potential connection between robustness to cumulative inaccuracies and the use of equivariant representations in MLFFs, but the computational cost associated with these representations can limit this advantage in practice. To address this, we propose a transformer architecture called SO3KRATES that combines sparse equivariant representations (Euclidean variables) with a self-attention mechanism that separates invariant and equivariant information, eliminating the need for expensive tensor products. SO3KRATES achieves a unique combination of accuracy, stability, and speed that enables insightful analysis of quantum properties of matter on extended time and system size scales. To showcase this capability, we generate stable MD trajectories for flexible peptides and supra-molecular structures with hundreds of atoms. Furthermore, we investigate the PES topology for medium-sized chainlike molecules (e.g., small peptides) by exploring thousands of minima. Remarkably, SO3KRATES demonstrates the ability to strike a balance between the conflicting demands of stability and the emergence of new minimum-energy conformations beyond the training data, which is crucial for realistic exploration tasks in the field of biochemistry.
Affiliation(s)
- J Thorben Frank
  - Machine Learning Group, TU Berlin, Berlin, Germany
  - BIFOLD, Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
- Klaus-Robert Müller
  - Machine Learning Group, TU Berlin, Berlin, Germany
  - BIFOLD, Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
  - Google DeepMind, Berlin, Germany
  - Department of Artificial Intelligence, Korea University, Seoul, Korea
  - Max Planck Institut für Informatik, Saarbrücken, Germany
- Stefan Chmiela
  - Machine Learning Group, TU Berlin, Berlin, Germany
  - BIFOLD, Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
8. Abranches DO, Maginn EJ, Colón YJ. Stochastic machine learning via sigma profiles to build a digital chemical space. Proc Natl Acad Sci U S A 2024; 121:e2404676121. [PMID: 39042681 PMCID: PMC11295021 DOI: 10.1073/pnas.2404676121] [Received: 03/07/2024] [Accepted: 06/16/2024] [Indexed: 07/25/2024] [Open access]
Abstract
This work establishes a different paradigm on digital molecular spaces and their efficient navigation by exploiting sigma profiles. To do so, the remarkable capability of Gaussian processes (GPs), a type of stochastic machine learning model, to correlate and predict physicochemical properties from sigma profiles is demonstrated, outperforming state-of-the-art neural networks previously published. The amount of chemical information encoded in sigma profiles eases the learning burden of machine learning models, permitting the training of GPs on small datasets which, due to their negligible computational cost and ease of implementation, are ideal models to be combined with optimization tools such as gradient search or Bayesian optimization (BO). Gradient search is used to efficiently navigate the sigma profile digital space, quickly converging to local extrema of target physicochemical properties. While this requires the availability of pretrained GP models on existing datasets, such limitations are eliminated with the implementation of BO, which can find global extrema with a limited number of iterations. A remarkable example of this is that of BO toward boiling temperature optimization. Holding no knowledge of chemistry except for the sigma profile and boiling temperature of carbon monoxide (the worst possible initial guess), BO finds the global maximum of the available boiling temperature dataset (over 1,000 molecules encompassing more than 40 families of organic and inorganic compounds) in just 15 iterations (i.e., 15 property measurements), cementing sigma profiles as a powerful digital chemical space for molecular optimization and discovery, particularly when little to no experimental data is initially available.
Affiliation(s)
- Dinis O. Abranches
  - Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, IN 46556
- Edward J. Maginn
  - Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, IN 46556
- Yamil J. Colón
  - Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, IN 46556
9. Yang ZX, Xie XT, Kang PL, Wang ZX, Shang C, Liu ZP. Many-Body Function Corrected Neural Network with Atomic Attention (MBNN-att) for Molecular Property Prediction. J Chem Theory Comput 2024. [PMID: 39034686 DOI: 10.1021/acs.jctc.4c00660] [Indexed: 07/23/2024]
Abstract
Recent years have seen a surge of machine learning (ML) in chemistry for predicting chemical properties, but a low-cost, general-purpose, and high-performance model that can run on central processing unit (CPU) devices is still unavailable. For this purpose, here we introduce an atomic attention mechanism into the many-body function corrected neural network (MBNN), yielding the MBNN-att ML model, to predict both the extensive and intensive properties of molecules and materials. MBNN-att uses explicit function descriptors as the inputs for the atom-based feed-forward neural network (NN). The output of the NN is designed to be a vector to implement the multihead self-attention mechanism. This vector is split into two parts: the atomic attention weight part and the many-body-function part. The final property is obtained by summing the products of each atomic attention weight and the corresponding many-body function. We show that MBNN-att performs well on all QM9 properties, with errors on all properties below chemical accuracy, and, in particular, achieves the top performance for the energy-related extensive properties. By systematically comparing with other explicit-function-type descriptor ML models and the graph representation ML models, we demonstrate that the many-body-function framework and atomic attention mechanism are key ingredients for the high performance and the good transferability of MBNN-att in molecular property prediction.
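The readout described in this abstract (split each atom's output vector into an attention-weight half and a many-body-function half, then sum the products) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `predict_property`, the equal split of the vector, and the use of raw (unnormalized) weights are all hypothetical choices, since the abstract does not specify them.

```python
from typing import Sequence


def predict_property(atom_outputs: Sequence[Sequence[float]], n_heads: int) -> float:
    """Sum attention-weighted many-body terms over all atoms and heads.

    Each per-atom vector is assumed (hypothetically) to hold n_heads
    attention weights followed by n_heads many-body function values.
    """
    total = 0.0
    for out in atom_outputs:
        if len(out) != 2 * n_heads:
            raise ValueError("each atomic output must have length 2 * n_heads")
        weights = out[:n_heads]      # atomic attention weight part
        many_body = out[n_heads:]    # many-body-function part
        total += sum(w * f for w, f in zip(weights, many_body))
    return total


if __name__ == "__main__":
    # Two atoms, two heads each; the numbers are made up for illustration.
    outputs = [
        [0.5, 1.0, 2.0, 3.0],
        [1.0, 0.0, 4.0, 5.0],
    ]
    print(predict_property(outputs, n_heads=2))
```

In a real model the per-atom vectors would come from the feed-forward network applied to the descriptors; here they are supplied directly so that only the readout step is shown.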
Affiliation(s)
- Zheng-Xin Yang
  - Collaborative Innovation Center of Chemistry for Energy Material, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, Key Laboratory of Computational Physical Science, Department of Chemistry, Fudan University, Shanghai 200433, China
- Xin-Tian Xie
  - Collaborative Innovation Center of Chemistry for Energy Material, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, Key Laboratory of Computational Physical Science, Department of Chemistry, Fudan University, Shanghai 200433, China
- Pei-Lin Kang
  - Collaborative Innovation Center of Chemistry for Energy Material, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, Key Laboratory of Computational Physical Science, Department of Chemistry, Fudan University, Shanghai 200433, China
- Zhen-Xiong Wang
  - Collaborative Innovation Center of Chemistry for Energy Material, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, Key Laboratory of Computational Physical Science, Department of Chemistry, Fudan University, Shanghai 200433, China
- Cheng Shang
  - Collaborative Innovation Center of Chemistry for Energy Material, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, Key Laboratory of Computational Physical Science, Department of Chemistry, Fudan University, Shanghai 200433, China
  - Shanghai Qi Zhi Institution, Shanghai 200030, China
- Zhi-Pan Liu
  - Collaborative Innovation Center of Chemistry for Energy Material, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, Key Laboratory of Computational Physical Science, Department of Chemistry, Fudan University, Shanghai 200433, China
  - Key Laboratory of Synthetic and Self-Assembly Chemistry for Organic Functional Molecules, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai 200032, China
  - Shanghai Qi Zhi Institution, Shanghai 200030, China
10. Fallani A, Medrano Sandonas L, Tkatchenko A. Inverse mapping of quantum properties to structures for chemical space of small organic molecules. Nat Commun 2024; 15:6061. [PMID: 39025883 PMCID: PMC11258234 DOI: 10.1038/s41467-024-50401-1] [Received: 09/14/2023] [Accepted: 07/01/2024] [Indexed: 07/20/2024] [Open access]
Abstract
Computer-driven molecular design combines the principles of chemistry, physics, and artificial intelligence to identify chemical compounds with tailored properties. While quantum-mechanical (QM) methods, coupled with machine learning, already offer a direct mapping from 3D molecular structures to their properties, effective methodologies for the inverse mapping in chemical space remain elusive. We address this challenge by demonstrating the possibility of parametrizing a chemical space with a finite set of QM properties. Our proof-of-concept implementation achieves an approximate property-to-structure mapping, the QIM model (which stands for "Quantum Inverse Mapping"), by forcing a variational auto-encoder with a property encoder to obtain a common internal representation for both structures and properties. After validating this mapping for small drug-like molecules, we illustrate its capabilities with an explainability study as well as by the generation of de novo molecular structures with targeted properties and transition pathways between conformational isomers. Our findings thus provide a proof-of-principle demonstration aiming to enable the inverse property-to-structure design in diverse chemical spaces.
Affiliation(s)
- Alessio Fallani
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
- Leonardo Medrano Sandonas
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
  - Institute for Materials Science and Max Bergmann Center of Biomaterials, TU Dresden, 01062, Dresden, Germany
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
11. Gould T, Chan B, Dale SG, Vuckovic S. Identifying and embedding transferability in data-driven representations of chemical space. Chem Sci 2024; 15:11122-11133. [PMID: 39027290 PMCID: PMC11253166 DOI: 10.1039/d4sc02358g] [Received: 04/10/2024] [Accepted: 06/02/2024] [Indexed: 07/20/2024] [Open access]
Abstract
Transferability, especially in the context of model generalization, is a paradigm of all scientific disciplines. However, the rapid advancement of machine learned model development threatens this paradigm, as it can be difficult to understand how transferability is embedded (or missed) in complex models developed using large training data sets. Two related open problems are how to identify, without relying on human intuition, what makes training data transferable; and how to embed transferability into training data. To solve both problems for ab initio chemical modelling, an indispensable tool in everyday chemistry research, we introduce a transferability assessment tool (TAT) and demonstrate it on a controllable data-driven model for developing density functional approximations (DFAs). We reveal that human intuition in the curation of training data introduces chemical biases that can hamper the transferability of data-driven DFAs. We use our TAT to motivate three transferability principles; one of which introduces the key concept of transferable diversity. Finally, we propose data curation strategies for general-purpose machine learning models in chemistry that identify and embed the transferability principles.
Affiliation(s)
- Tim Gould
  - Queensland Micro- and Nanotechnology Centre, Griffith University, Nathan, Qld 4111, Australia
- Bun Chan
  - Graduate School of Engineering, Nagasaki University, Bunkyo 1-14, Nagasaki 852-8521, Japan
- Stephen G Dale
  - Queensland Micro- and Nanotechnology Centre, Griffith University, Nathan, Qld 4111, Australia
  - Institute of Functional Intelligent Materials, National University of Singapore, 4 Science Drive 2, Singapore 117544
- Stefan Vuckovic
  - Department of Chemistry, University of Fribourg, Fribourg, Switzerland
12. Atz K, Nippa DF, Müller AT, Jost V, Anelli A, Reutlinger M, Kramer C, Martin RE, Grether U, Schneider G, Wuitschik G. Geometric deep learning-guided Suzuki reaction conditions assessment for applications in medicinal chemistry. RSC Med Chem 2024; 15:2310-2321. [PMID: 39026644 PMCID: PMC11253849 DOI: 10.1039/d4md00196f] [Received: 03/21/2024] [Accepted: 05/25/2024] [Indexed: 07/20/2024] [Open access]
Abstract
Suzuki cross-coupling reactions are considered a valuable tool for constructing carbon-carbon bonds in small molecule drug discovery. However, the synthesis of chemical matter often represents a time-consuming and labour-intensive bottleneck. We demonstrate how machine learning methods trained on high-throughput experimentation (HTE) data can be leveraged to enable fast reaction condition selection for novel coupling partners. We show that the trained models support chemists in determining suitable catalyst-solvent-base combinations for individual transformations, including an evaluation of the need for HTE screening. We introduce an algorithm for designing 96-well plates optimized towards reaction yields and discuss the model performance of zero- and few-shot machine learning. The best-performing machine learning model achieved a three-category classification accuracy of 76.3% (±0.2%) and an F1-score for a binary classification of 79.1% (±0.9%). Validation on eight reactions revealed an area under the receiver operating characteristic curve (ROC-AUC) value of 0.82 (±0.07) for few-shot machine learning. On the other hand, zero-shot machine learning models achieved a mean ROC-AUC value of 0.63 (±0.16). This study advocates the application of few-shot machine learning-guided reaction condition selection for HTE campaigns in medicinal chemistry and highlights practical applications as well as challenges associated with zero-shot machine learning.
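For readers less familiar with the two metrics quoted in this abstract, the sketch below shows their standard definitions in plain Python. It is a generic illustration, not the authors' evaluation code; the function names `f1_score` and `roc_auc` and the toy numbers are hypothetical.

```python
from typing import Sequence


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a positive example outscores a negative one.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Toy counts and scores, made up for illustration only.
    print(f1_score(tp=8, fp=2, fn=2))
    print(roc_auc([0.9, 0.8, 0.4], [0.7, 0.3]))
```

The pairwise loop is quadratic in the number of examples, which is acceptable for a handful of validation reactions; a rank-based implementation would be preferable for large datasets.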
Affiliation(s)
- Kenneth Atz
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
| | - David F Nippa
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
| | - Alex T Müller
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
| | - Vera Jost
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
| | - Andrea Anelli
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
| | - Michael Reutlinger
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
| | - Christian Kramer
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
| | - Rainer E Martin
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
| | - Uwe Grether
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
| | - Gisbert Schneider
- Department of Chemistry and Applied Biosciences, ETH Zurich Vladimir-Prelog-Weg 4 8093 Zurich Switzerland
| | - Georg Wuitschik
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
| |
|
13
|
Butin O, Pereyaslavets L, Kamath G, Illarionov A, Sakipov S, Kurnikov IV, Voronina E, Ivahnenko I, Leontyev I, Nawrocki G, Darkhovskiy M, Olevanov M, Cherniavskyi YK, Lock C, Greenslade S, Kornberg RD, Levitt M, Fain B. The Determination of Free Energy of Hydration of Water Ions from First Principles. J Chem Theory Comput 2024; 20:5215-5224. [PMID: 38842599 DOI: 10.1021/acs.jctc.3c01411] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2024]
Abstract
We model the autoionization of water by determining the free energy of hydration of the major intermediate species of water ions. We represent the smallest ions (the hydroxide ion OH-, the hydronium ion H3O+, and the Zundel ion H5O2+) by bonded models and the more extended ionic structures by strong nonbonded interactions (e.g., the Eigen H9O4+ = H3O+ + 3(H2O) and the Stoyanov H13O6+ = H5O2+ + 4(H2O)). Our models are faithful to the precise QM energies and their components to within 1% or less. Using the calculated free energies and atomization energies, we compute the pKa of pure water from first principles as a consistency check and arrive at a value within 1.3 log units of the experimental one. From these calculations, we conclude that the hydronium ion, and its hydrated state, the Eigen cation, are the dominant species in the water autoionization process.
Affiliation(s)
- Oleg Butin
- InterX, Inc. (a subsidiary of NeoTX Therapeutics, Ltd.), 805 Allston Way, Berkeley, California 94710, United States
| | - Leonid Pereyaslavets
- InterX, Inc. (a subsidiary of NeoTX Therapeutics, Ltd.), 805 Allston Way, Berkeley, California 94710, United States
| | - Ganesh Kamath
- InterX, Inc. (a subsidiary of NeoTX Therapeutics, Ltd.), 805 Allston Way, Berkeley, California 94710, United States
| | - Alexey Illarionov
- InterX, Inc. (a subsidiary of NeoTX Therapeutics, Ltd.), 805 Allston Way, Berkeley, California 94710, United States
| | - Serzhan Sakipov
- InterX, Inc. (a subsidiary of NeoTX Therapeutics, Ltd.), 805 Allston Way, Berkeley, California 94710, United States
| | - Igor V Kurnikov
- InterX, Inc. (a subsidiary of NeoTX Therapeutics, Ltd.), 805 Allston Way, Berkeley, California 94710, United States
| | - Ekaterina Voronina
- InterX, Inc. (a subsidiary of NeoTX Therapeutics, Ltd.), 805 Allston Way, Berkeley, California 94710, United States
- Skobeltsyn Institute of Nuclear Physics, Lomonosov Moscow State University, Moscow 119991, Russia
| | - Ilya Ivahnenko
- InterX, Inc. (a subsidiary of NeoTX Therapeutics, Ltd.), 805 Allston Way, Berkeley, California 94710, United States
| | - Igor Leontyev
- InterX, Inc. (a subsidiary of NeoTX Therapeutics, Ltd.), 805 Allston Way, Berkeley, California 94710, United States
| | - Grzegorz Nawrocki
- InterX, Inc. (a subsidiary of NeoTX Therapeutics, Ltd.), 805 Allston Way, Berkeley, California 94710, United States
| | - Mikhail Darkhovskiy
- InterX, Inc. (a subsidiary of NeoTX Therapeutics, Ltd.), 805 Allston Way, Berkeley, California 94710, United States
| | - Michael Olevanov
- InterX, Inc. (a subsidiary of NeoTX Therapeutics, Ltd.), 805 Allston Way, Berkeley, California 94710, United States
- Department of Physics, Lomonosov Moscow State University, Moscow 119991, Russia
| | - Yevhen K Cherniavskyi
- InterX, Inc. (a subsidiary of NeoTX Therapeutics, Ltd.), 805 Allston Way, Berkeley, California 94710, United States
| | - Christopher Lock
- InterX, Inc. (a subsidiary of NeoTX Therapeutics, Ltd.), 805 Allston Way, Berkeley, California 94710, United States
- Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Palo Alto, California 94304, United States
| | - Sean Greenslade
- InterX, Inc. (a subsidiary of NeoTX Therapeutics, Ltd.), 805 Allston Way, Berkeley, California 94710, United States
| | - Roger D Kornberg
- Department of Structural Biology, Stanford University School of Medicine, Stanford, California 94305, United States
| | - Michael Levitt
- Department of Structural Biology, Stanford University School of Medicine, Stanford, California 94305, United States
| | - Boris Fain
- InterX, Inc. (a subsidiary of NeoTX Therapeutics, Ltd.), 805 Allston Way, Berkeley, California 94710, United States
| |
|
14
|
Hadad RE, Roy A, Rabani E, Redmer R, Baer R. Stochastic density functional theory combined with Langevin dynamics for warm dense matter. Phys Rev E 2024; 109:065304. [PMID: 39020867 DOI: 10.1103/physreve.109.065304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Accepted: 05/17/2024] [Indexed: 07/19/2024]
Abstract
This study overviews and extends a recently developed stochastic finite-temperature Kohn-Sham density functional theory to study warm dense matter using Langevin dynamics, specifically under periodic boundary conditions. The method's algorithmic complexity exhibits nearly linear scaling with system size and is inversely proportional to the temperature. Additionally, a linear-scaling stochastic approach is introduced to assess the Kubo-Greenwood conductivity, demonstrating exceptional stability for dc conductivity. Utilizing the developed tools, we investigate the equation of state, radial distribution, and electronic conductivity of hydrogen at a temperature of 30 000 K. As for the radial distribution functions, we reveal a transition of hydrogen from gaslike to liquidlike behavior as its density exceeds 4 g/cm³. As for the electronic conductivity as a function of the density, we identified a remarkable isosbestic point at frequencies around 7 eV, which may be an additional signature of a gas-liquid transition in hydrogen at 30 000 K.
Affiliation(s)
| | | | - Eran Rabani
- Department of Chemistry, University of California, Berkeley, California 94720, USA; Materials Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA; and The Raymond and Beverly Sackler Center of Computational Molecular and Materials Science, Tel Aviv University, Tel Aviv 69978, Israel
| | | | | |
|
15
|
Shakiba M, Akimov AV. Machine-Learned Kohn-Sham Hamiltonian Mapping for Nonadiabatic Molecular Dynamics. J Chem Theory Comput 2024; 20:2992-3007. [PMID: 38581699 DOI: 10.1021/acs.jctc.4c00008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/08/2024]
Abstract
In this work, we report a simple, efficient, and scalable machine-learning (ML) approach for mapping non-self-consistent Kohn-Sham Hamiltonians constructed with one kind of density functional to the nearly self-consistent Hamiltonians constructed with another kind of density functional. This approach is designed as a fast surrogate Hamiltonian calculator for use in long nonadiabatic dynamics simulations of large atomistic systems. In this approach, the input and output features are Hamiltonian matrices computed from different levels of theory. We demonstrate that the developed ML-based Hamiltonian mapping method (1) speeds up the calculations by several orders of magnitude, (2) is conceptually simpler than alternative ML approaches, (3) is applicable to different systems and sizes and can be used for mapping Hamiltonians constructed with arbitrary density functionals, (4) requires modest training data, learns fast, and generates molecular orbitals and their energies with an accuracy nearly matching that of conventional calculations, and (5) when applied to nonadiabatic dynamics simulation of excitation energy relaxation in large systems, yields the corresponding time scales within the margin of error of the conventional calculations. Using this approach, we explore the excitation energy relaxation in C60 fullerene and Si75H64 quantum dot structures and derive qualitative and quantitative insights into dynamics in these systems.
Affiliation(s)
- Mohammad Shakiba
- Department of Chemistry, University at Buffalo, The State University of New York, Buffalo, New York 14260, United States
| | - Alexey V Akimov
- Department of Chemistry, University at Buffalo, The State University of New York, Buffalo, New York 14260, United States
| |
|
16
|
Unke OT, Stöhr M, Ganscha S, Unterthiner T, Maennel H, Kashubin S, Ahlin D, Gastegger M, Medrano Sandonas L, Berryman JT, Tkatchenko A, Müller KR. Biomolecular dynamics with machine-learned quantum-mechanical force fields trained on diverse chemical fragments. SCIENCE ADVANCES 2024; 10:eadn4397. [PMID: 38579003 DOI: 10.1126/sciadv.adn4397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2023] [Accepted: 02/29/2024] [Indexed: 04/07/2024]
Abstract
The GEMS method enables molecular dynamics simulations of large heterogeneous systems at ab initio quality.
Affiliation(s)
- Oliver T Unke
- Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- DFG Cluster of Excellence "Unifying Systems in Catalysis" (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
| | - Martin Stöhr
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
| | - Stefan Ganscha
- Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
| | - Thomas Unterthiner
- Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
| | - Hartmut Maennel
- Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
| | - Sergii Kashubin
- Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
| | - Daniel Ahlin
- Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
| | - Michael Gastegger
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- DFG Cluster of Excellence "Unifying Systems in Catalysis" (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
- BASLEARN - TU Berlin/BASF Joint Lab for Machine Learning, Technische Universität Berlin, 10587 Berlin, Germany
| | - Leonardo Medrano Sandonas
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
| | - Joshua T Berryman
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
| | - Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
| | - Klaus-Robert Müller
- Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul 02841, Korea
- Max Planck Institute for Informatics, Stuhlsatzenhausweg, 66123 Saarbrücken, Germany
- BIFOLD - Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
| |
|
17
|
Kneiding H, Nova A, Balcells D. Directional multiobjective optimization of metal complexes at the billion-system scale. NATURE COMPUTATIONAL SCIENCE 2024; 4:263-273. [PMID: 38553635 DOI: 10.1038/s43588-024-00616-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2023] [Accepted: 02/29/2024] [Indexed: 04/14/2024]
Abstract
The discovery of transition metal complexes (TMCs) with optimal properties requires large ligand libraries and efficient multiobjective optimization algorithms. Here we provide the tmQMg-L library, containing 30k diverse and synthesizable ligands with robustly assigned charges and metal coordination modes. tmQMg-L enabled the generation of 1.37 million palladium TMCs, which were used to develop and benchmark the Pareto-Lighthouse multiobjective genetic algorithm (PL-MOGA). With fine control over aim and scope, this algorithm maximized both the polarizability and highest occupied molecular orbital-lowest unoccupied molecular orbital gap of the TMCs within selected regions of the Pareto front, without requiring prior knowledge on the objective limits. Instead of genetic operations on small ligand fragments, the PL-MOGA did whole-ligand mutation and crossover operations, which in chemical spaces containing billions of systems, yielded thousands of highly diverse TMCs in an interpretable manner.
Affiliation(s)
- Hannes Kneiding
- Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, Oslo, Norway
| | - Ainara Nova
- Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, Oslo, Norway
- Centre for Materials Science and Nanotechnology, Department of Chemistry, University of Oslo, Oslo, Norway
| | - David Balcells
- Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, Oslo, Norway.
| |
|
18
|
Domenichini G. Extending the definition of atomic basis sets to atoms with fractional nuclear charge. J Chem Phys 2024; 160:124107. [PMID: 38526100 DOI: 10.1063/5.0196383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2024] [Accepted: 03/10/2024] [Indexed: 03/26/2024] Open
Abstract
Alchemical transformations showed that perturbation theory can also be applied to changes in the atomic nuclear charges of a molecule. The alchemical path that connects two different chemical species involves the conceptualization of a non-physical system in which an atom possesses a non-integer nuclear charge. A correct quantum mechanical treatment of these systems is limited by the fact that finite-size atomic basis sets do not define exponents and contraction coefficients for fractional-charge atoms. This paper proposes a solution to this problem and shows that a smooth interpolation of the atomic orbital coefficients and exponents across the periodic table is a convenient way to produce accurate alchemical predictions, even using small basis sets.
Affiliation(s)
- Giorgio Domenichini
- Faculty of Physics, University of Vienna, Kolingasse 14-16, 1090 Vienna, Austria
| |
|
19
|
Dral PO. AI in computational chemistry through the lens of a decade-long journey. Chem Commun (Camb) 2024; 60:3240-3258. [PMID: 38444290 DOI: 10.1039/d4cc00010b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/07/2024]
Abstract
This article gives a perspective on the progress of AI tools in computational chemistry through the lens of the author's decade-long contributions put in the wider context of the trends in this rapidly expanding field. This progress over the last decade is tremendous: while a decade ago we had a glimpse of what was to come through many proof-of-concept studies, now we witness the emergence of many AI-based computational chemistry tools that are mature enough to make faster and more accurate simulations increasingly routine. Such simulations in turn allow us to validate and even revise experimental results, deepen our understanding of the physicochemical processes in nature, and design better materials, devices, and drugs. The rapid introduction of powerful AI tools gives rise to unique challenges and opportunities that are discussed in this article too.
Affiliation(s)
- Pavlo O Dral
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China.
| |
|
20
|
Yu L, Zhang W, Nie Z, Duan J, Chen S. Machine learning guided tuning charge distribution by composition in MOFs for oxygen evolution reaction. RSC Adv 2024; 14:9032-9037. [PMID: 38500624 PMCID: PMC10945371 DOI: 10.1039/d3ra08873a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2023] [Accepted: 02/25/2024] [Indexed: 03/20/2024] Open
Abstract
Traditional design/optimization of metal-organic frameworks (MOFs) is time-consuming and labor-intensive. In this study, we utilize machine learning (ML) to accelerate the synthesis of MOFs. We have built a library of over 900 MOFs with different metal salts, solvent ratios, reaction durations and temperatures, and utilize zeta potentials as target variables for ML training. Four ML models were trained on the collected dataset and assessed for their convergence performance; the Random Forest Regression (RFR) and Gradient Boosting Regression (GBR) models show strong correlation and accurate predictions. We then predicted two kinds of MOFs from the RFR and GBR models. Remarkably, the experimental data for the synthesized MOFs closely matched the predicted results, and these MOFs exhibited excellent electrocatalytic performance for oxygen evolution. This study would have general implications for the use of machine learning to accelerate the synthesis of MOFs for diverse applications.
Affiliation(s)
- Licheng Yu
- Key Laboratory for Soft Chemistry and Functional Materials (Ministry of Education), School of Chemistry and Chemical Engineering, School of Energy and Power Engineering, Nanjing University of Science and Technology Nanjing 210094 China
| | - Wenwen Zhang
- Key Laboratory for Soft Chemistry and Functional Materials (Ministry of Education), School of Chemistry and Chemical Engineering, School of Energy and Power Engineering, Nanjing University of Science and Technology Nanjing 210094 China
| | - Zhihao Nie
- Key Laboratory for Soft Chemistry and Functional Materials (Ministry of Education), School of Chemistry and Chemical Engineering, School of Energy and Power Engineering, Nanjing University of Science and Technology Nanjing 210094 China
| | - Jingjing Duan
- Key Laboratory for Soft Chemistry and Functional Materials (Ministry of Education), School of Chemistry and Chemical Engineering, School of Energy and Power Engineering, Nanjing University of Science and Technology Nanjing 210094 China
| | - Sheng Chen
- Key Laboratory for Soft Chemistry and Functional Materials (Ministry of Education), School of Chemistry and Chemical Engineering, School of Energy and Power Engineering, Nanjing University of Science and Technology Nanjing 210094 China
| |
|
21
|
Kurnikov IV, Pereyaslavets L, Kamath G, Sakipov SN, Voronina E, Butin O, Illarionov A, Leontyev I, Nawrocki G, Darkhovskiy M, Olevanov M, Ivahnenko I, Chen Y, Lock CB, Levitt M, Kornberg RD, Fain B. Neural Network Corrections to Intermolecular Interaction Terms of a Molecular Force Field Capture Nuclear Quantum Effects in Calculations of Liquid Thermodynamic Properties. J Chem Theory Comput 2024; 20:1347-1357. [PMID: 38240485 PMCID: PMC11042917 DOI: 10.1021/acs.jctc.3c00921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
We incorporate nuclear quantum effects (NQE) in condensed matter simulations by introducing short-range neural network (NN) corrections to the ab initio fitted molecular force field ARROW. Force field NN corrections are fitted to average interaction energies and forces of molecular dimers, which are simulated using the Path Integral Molecular Dynamics (PIMD) technique with restrained centroid positions. The NN-corrected force field allows reproduction of the NQE for computed liquid water and methane properties such as density, radial distribution function (RDF), heat of evaporation (HVAP), and solvation free energy. Accounting for NQE through molecular force field corrections circumvents the need for explicit computationally expensive PIMD simulations in accurate calculations of the properties of chemical and biological systems. The accuracy and locality of pairwise NN NQE corrections indicate that this approach could be applicable to complex heterogeneous systems, such as proteins.
Affiliation(s)
- Igor V Kurnikov
- InterX Inc., (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
| | - Leonid Pereyaslavets
- InterX Inc., (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
| | - Ganesh Kamath
- InterX Inc., (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
| | - Serzhan N Sakipov
- InterX Inc., (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
| | - Ekaterina Voronina
- InterX Inc., (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
| | - Oleg Butin
- InterX Inc., (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
| | - Alexey Illarionov
- InterX Inc., (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
| | - Igor Leontyev
- InterX Inc., (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
| | - Grzegorz Nawrocki
- InterX Inc., (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
| | - Mikhail Darkhovskiy
- InterX Inc., (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
| | - Michael Olevanov
- InterX Inc., (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
| | - Ilya Ivahnenko
- InterX Inc., (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
| | - YuChun Chen
- InterX Inc., (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
| | - Christopher B Lock
- InterX Inc., (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
- Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Palo Alto, California 94304, United States
| | - Michael Levitt
- Department of Structural Biology, Stanford University School of Medicine, Stanford, California 94305, United States
| | - Roger D Kornberg
- Department of Structural Biology, Stanford University School of Medicine, Stanford, California 94305, United States
| | - Boris Fain
- InterX Inc., (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
| |
|
22
|
Deb J, Saikia L, Dihingia KD, Sastry GN. ChatGPT in the Material Design: Selected Case Studies to Assess the Potential of ChatGPT. J Chem Inf Model 2024; 64:799-811. [PMID: 38237025 DOI: 10.1021/acs.jcim.3c01702] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/13/2024]
Abstract
The pursuit of designing smart and functional materials is of paramount importance across various domains, such as material science, engineering, chemical technology, electronics, biomedicine, energy, and numerous others. Consequently, researchers are actively involved in the development of innovative models and strategies for material design. Recent advancements in analytical tools, experimentation, and computer technology additionally enhance the material design possibilities. Notably, data-driven techniques like artificial intelligence and machine learning have achieved substantial progress in exploring various applications within material science. One such approach, ChatGPT, a large language model, holds transformative potential for addressing complex queries. In this article, we explore ChatGPT's understanding of material science by assigning some simple tasks across various subareas of computational material science. The findings indicate that while ChatGPT may make some minor errors in accomplishing general tasks, it demonstrates the capability to learn and adapt through human interactions. However, issues like output consistency, probable hidden errors, and ethical consequences should be addressed.
Affiliation(s)
- Jyotirmoy Deb
- Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat 785006, Assam, India
| | - Lakshi Saikia
- Advanced Materials Group, Materials Sciences & Technology Division, CSIR-North East Institute of Science and Technology, Jorhat 785006, Assam, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, Uttar Pradesh, India
| | - Kripa Dristi Dihingia
- Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat 785006, Assam, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, Uttar Pradesh, India
| | - G Narahari Sastry
- Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat 785006, Assam, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, Uttar Pradesh, India
| |
|
23
|
Jijila B, Nirmala V, Selvarengan P, Kavitha D, Arun Muthuraj V, Rajagopal A. Employing neural density functionals to generate potential energy surfaces. J Mol Model 2024; 30:65. [PMID: 38340208 DOI: 10.1007/s00894-024-05834-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Accepted: 01/04/2024] [Indexed: 02/12/2024]
Abstract
CONTEXT: With the union of machine learning (ML) and quantum chemistry, amid the debate between machine-learned functionals and human-designed functionals in density functional theory (DFT), this paper aims to demonstrate the generation of potential energy surfaces using computations with a machine-learned density functional approximation (ML-DFA). A recent research trend is the application of ML in quantum sciences to the design of density functionals such as DeepMind's Deep Learning model (DeepMind21, DM21). Although Science reported the state-of-the-art performance of DM21, the opportunity to utilize DeepMind's pretrained DM21 neural networks in quantum chemistry computations has not yet been tapped. So far in the literature, the Deep Learning density functionals (DM21) have not been applied to generate potential energy surfaces. While the superior accuracy of DM21 has been reported, there is still a scarcity of publications that apply DM21 in calculations in the field. In this context, this paper contributes, for the first time in the literature, a machine-learned DFA-based computational method in which neural density functionals infer 2D potential energy surfaces (ML-DFA-PES). This paper reports the ML-DFA-generated PES for C4H8, H2O, H2, and H2+ obtained by employing a pretrained DM21m TensorFlow model with the cc-pVDZ basis set. In addition, we analyze the long-range behavior of the DM21-based PES to investigate its ability to describe a system at long ranges. Furthermore, we compare PES diagrams from DM21 with those from popular DFT functionals (B3LYP/PW6B95) and CCSD(T).
METHODS: In this method, 2D potential energy surfaces are obtained using an approach that relies upon the neural network's ability to accurately learn the mapping between the 3D electron density and the exchange-correlation potential. By inserting Deep Learning inference into DFT with a pretrained neural network, the self-consistent field (SCF) energy at different geometries along the coordinates of interest is computed, and the potential energy surfaces are then plotted. First, the electron density is computed mathematically, and this computed 3D electron density is used as an ML feature vector to predict the exchange-correlation potential as an ML inference computed by a forward pass of the pretrained DM21 TensorFlow computational graph. This is followed by the computation of the self-consistent field energy at multiple geometries, and the SCF energies at different bond lengths/angles are then plotted as a 2D PES. We implement this in Python source code using frameworks such as PySCF and DM21. This paper contributes this implementation as open source. The source code and DM21-DFA-based PES are contributed at https://sites.google.com/view/MLfunctionals-DeepMind-PES.
Affiliation(s)
- B Jijila
- Queen Mary's College, Chennai, India
| | - V Nirmala
- Queen Mary's College, Chennai, India.
| | - P Selvarengan
- Kalasalingam Academy of Research & Education, Krishnankoil, India
| | - D Kavitha
- Dr. MGR Educational and Research Institute, Chennai, India
| | | | - A Rajagopal
- Indian Institute of Technology, Madras, India
| |
|
24
|
Sahre MJ, von Rudorff GF, Marquetand P, von Lilienfeld OA. Transferability of atomic energies from alchemical decomposition. J Chem Phys 2024; 160:054106. [PMID: 38341696 DOI: 10.1063/5.0187298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 01/09/2024] [Indexed: 02/13/2024] Open
Abstract
We study alchemical atomic energy partitioning as a method to estimate atomization energies from atomic contributions, which are defined in physically rigorous and general ways through the use of the uniform electron gas as a joint reference. We analyze quantitatively the relation between atomic energies and their local environment using a dataset of 1325 organic molecules. The atomic energies are transferable across various molecules, enabling the prediction of atomization energies with a mean absolute error of 23 kcal/mol, comparable to simple statistical estimates but potentially more robust given their grounding in the physics-based decomposition scheme. A comparative analysis with other decomposition methods highlights its sensitivity to electrostatic variations, underlining its potential as a representation of the environment as well as in studying processes like diffusion in solids characterized by significant electrostatic shifts.
Affiliation(s)
- Michael J Sahre
  - Vienna Doctoral School in Chemistry (DoSChem) and Institute of Theoretical Chemistry and Faculty of Physics, University of Vienna, 1090 Vienna, Austria
- Guido Falk von Rudorff
  - Department of Chemistry, University Kassel, Heinrich-Plett-Str. 40, 34132 Kassel, Germany
  - Center for Interdisciplinary Nanostructure Science and Technology (CINSaT), Heinrich-Plett-Straße 40, 34132 Kassel, Germany
- Philipp Marquetand
  - Faculty of Chemistry, Institute of Theoretical Chemistry, University of Vienna, Währinger Str. 17, 1090 Vienna, Austria
- O Anatole von Lilienfeld
  - Vienna Doctoral School in Chemistry (DoSChem) and Institute of Theoretical Chemistry and Faculty of Physics, University of Vienna, 1090 Vienna, Austria
  - Chemical Physics Theory Group, Department of Chemistry, University of Toronto, St. George Campus, Toronto, M5S 3H6 Ontario, Canada
  - Department of Materials Science and Engineering, University of Toronto, St. George Campus, Toronto, M5S 3E4 Ontario, Canada
  - Vector Institute for Artificial Intelligence, Toronto, M5S 1M1 Ontario, Canada
  - ML Group, Technische Universität Berlin and Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
  - Berlin Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
  - Department of Physics, University of Toronto, St. George Campus, Toronto, M5S 1A7 Ontario, Canada
25
|
Kamath G, Illarionov A, Sakipov S, Pereyaslavets L, Kurnikov IV, Butin O, Voronina E, Ivahnenko I, Leontyev I, Nawrocki G, Darkhovskiy M, Olevanov M, Cherniavskyi YK, Lock C, Greenslade S, Chen Y, Kornberg RD, Levitt M, Fain B. Combining Force Fields and Neural Networks for an Accurate Representation of Bonded Interactions. J Phys Chem A 2024; 128:807-812. [PMID: 38232765 PMCID: PMC11008955 DOI: 10.1021/acs.jpca.3c07598] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2024]
Abstract
We present a formalism of a neural network encoding bonded interactions in molecules. This intramolecular encoding is consistent with the models of intermolecular interactions previously designed by this group. Variants of the encoding fed into a corresponding neural network may be used to economically improve the representation of torsional degrees of freedom in any force field. We test the accuracy of the reproduction of the ab initio potential energy surface on a set of conformations of two dipeptides, methyl-capped ALA and ASP, in several scenarios. The encoding, either alone or in conjunction with an analytical potential, improves agreement with ab initio energies that are on par with those of other neural network-based potentials. Using the encoding and neural nets in tandem with an analytical model places the agreements firmly within "chemical accuracy" of ±0.5 kcal/mol.
Affiliation(s)
- Ganesh Kamath
  - InterX Inc. (a subsidiary of NeoTX Therapeutics LTD), 805 Allston Way, Berkeley, California 94710, United States
- Alexey Illarionov
  - InterX Inc. (a subsidiary of NeoTX Therapeutics LTD), 805 Allston Way, Berkeley, California 94710, United States
- Serzhan Sakipov
  - InterX Inc. (a subsidiary of NeoTX Therapeutics LTD), 805 Allston Way, Berkeley, California 94710, United States
- Leonid Pereyaslavets
  - InterX Inc. (a subsidiary of NeoTX Therapeutics LTD), 805 Allston Way, Berkeley, California 94710, United States
- Igor V Kurnikov
  - InterX Inc. (a subsidiary of NeoTX Therapeutics LTD), 805 Allston Way, Berkeley, California 94710, United States
- Oleg Butin
  - InterX Inc. (a subsidiary of NeoTX Therapeutics LTD), 805 Allston Way, Berkeley, California 94710, United States
- Ekaterina Voronina
  - InterX Inc. (a subsidiary of NeoTX Therapeutics LTD), 805 Allston Way, Berkeley, California 94710, United States
  - Lomonosov MSU, Skobeltsyn Institute of Nuclear Physics, Moscow 119991, Russia
- Ilya Ivahnenko
  - InterX Inc. (a subsidiary of NeoTX Therapeutics LTD), 805 Allston Way, Berkeley, California 94710, United States
- Igor Leontyev
  - InterX Inc. (a subsidiary of NeoTX Therapeutics LTD), 805 Allston Way, Berkeley, California 94710, United States
- Grzegorz Nawrocki
  - InterX Inc. (a subsidiary of NeoTX Therapeutics LTD), 805 Allston Way, Berkeley, California 94710, United States
- Mikhail Darkhovskiy
  - InterX Inc. (a subsidiary of NeoTX Therapeutics LTD), 805 Allston Way, Berkeley, California 94710, United States
- Michael Olevanov
  - InterX Inc. (a subsidiary of NeoTX Therapeutics LTD), 805 Allston Way, Berkeley, California 94710, United States
  - Department of Physics, Lomonosov MSU, Moscow 119991, Russia
- Yevhen K Cherniavskyi
  - InterX Inc. (a subsidiary of NeoTX Therapeutics LTD), 805 Allston Way, Berkeley, California 94710, United States
- Christopher Lock
  - InterX Inc. (a subsidiary of NeoTX Therapeutics LTD), 805 Allston Way, Berkeley, California 94710, United States
  - Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Palo Alto, California 94304, United States
- Sean Greenslade
  - InterX Inc. (a subsidiary of NeoTX Therapeutics LTD), 805 Allston Way, Berkeley, California 94710, United States
- YuChun Chen
  - InterX Inc. (a subsidiary of NeoTX Therapeutics LTD), 805 Allston Way, Berkeley, California 94710, United States
- Roger D Kornberg
  - Department of Structural Biology, Stanford University School of Medicine, Stanford, California 94304, United States
- Michael Levitt
  - Department of Structural Biology, Stanford University School of Medicine, Stanford, California 94304, United States
- Boris Fain
  - InterX Inc. (a subsidiary of NeoTX Therapeutics LTD), 805 Allston Way, Berkeley, California 94710, United States
26
|
Isert C, Atz K, Riniker S, Schneider G. Exploring protein-ligand binding affinity prediction with electron density-based geometric deep learning. RSC Adv 2024; 14:4492-4502. [PMID: 38312732 PMCID: PMC10835705 DOI: 10.1039/d3ra08650j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 01/19/2024] [Indexed: 02/06/2024]
Abstract
Rational structure-based drug design relies on accurate predictions of protein-ligand binding affinity from structural molecular information. Although deep learning-based methods for predicting binding affinity have shown promise in computational drug design, certain approaches have faced criticism for their potential to inadequately capture the fundamental physical interactions between ligands and their macromolecular targets or for being susceptible to dataset biases. Herein, we propose to include bond-critical points based on the electron density of a protein-ligand complex as a fundamental physical representation of protein-ligand interactions. Employing a geometric deep learning model, we explore the usefulness of these bond-critical points to predict absolute binding affinities of protein-ligand complexes, benchmark model performance against existing methods, and provide a critical analysis of this new approach. The models achieved root-mean-squared errors of 1.4-1.8 log units on the PDBbind dataset, and 1.0-1.7 log units on the PDE10A dataset, not indicating significant advantages over benchmark methods, and thus rendering the utility of electron density for deep learning models context-dependent. The relationship between intermolecular electron density and corresponding binding affinity was analyzed, and Pearson correlation coefficients r > 0.7 were obtained for several macromolecular targets.
Affiliation(s)
- Clemens Isert
  - ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland, +41 44 633 73 27
- Kenneth Atz
  - ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland, +41 44 633 73 27
- Sereina Riniker
  - ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland, +41 44 633 73 27
- Gisbert Schneider
  - ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland, +41 44 633 73 27
27
|
Tu C, Huang W, Liang S, Wang K, Tian Q, Yan W. High-throughput virtual screening of organic second-order nonlinear optical chromophores within the donor-π-bridge-acceptor framework. Phys Chem Chem Phys 2024; 26:2363-2375. [PMID: 38167888 DOI: 10.1039/d3cp04046a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
Abstract
In view of the theoretical importance and huge application potential of second-order nonlinear optical (NLO) materials, it is of great significance to conduct high-throughput virtual screening (HTVS) on a compound library to find candidate NLO chromophores. Under the donor-π-bridge-acceptor structural framework, a virtual compound library (size = 27 090) was constructed by enumeration of structural fragments. The kernel property adopted for optimization is the static first hyperpolarizability (β0). By combining machine learning and quantum chemical calculations, we have performed an HTVS procedure to sieve out NLO chromophores, and the response mechanism of the selected optimal NLO chromophores was examined. We have found: (a) The multi-layer perceptron/extended connectivity fingerprint combination with a 20% selection ratio gives the highest prediction accuracy for the studied systems. (b) The two optimal donors are bis(4-diphenylaminophenyl)aminyl and bis(4-tert-butylphenyl)aminyl; the optimal π-bridges are composed of two thiophenyl, selenophenyl or furanyl units; and the two optimal acceptors are tri-s-triazinyl and 2,3-dicyanopyrazinyl. (c) The no. 1 candidate molecule can exhibit a calculated β0 equal to 8.55 × 10^4 a.u. (d) According to the two-level model, the difference in NLO responses of the 16 optimal molecules comes from the synergistic interaction of E_S1, Δμ and f. In addition, the sizable Δμ and f allow the studied optimal molecules to obtain a large NLO response while keeping a not-too-low excitation energy (retaining good optical transparency in the restricted range of the visible spectrum region). (e) With further modification of the acceptor, the designed DPA-π-TRZ-A' (A' = CN or NO2, π = oligo-thiophenyl or selenophenyl) systems can exhibit a rather large NLO response (maximum β0 = 3.17 × 10^5 a.u.), and hence should have considerable potential as second-order NLO chromophores. With the above observations, we expect to provide some insight for the research community into the HTVS of organic second-order NLO chromophores.
Affiliation(s)
- Chunyun Tu
  - School of Chemistry and Materials Engineering, Guiyang University, Guiyang, 550005, P. R. China
- Weijiang Huang
  - School of Chemistry and Materials Engineering, Guiyang University, Guiyang, 550005, P. R. China
- Sheng Liang
  - School of Mathematics and Information Science, Guiyang University, Guiyang, 550005, P. R. China
- Kui Wang
  - School of Chemistry and Materials Engineering, Guiyang University, Guiyang, 550005, P. R. China
- Qin Tian
  - School of Chemistry and Materials Engineering, Guiyang University, Guiyang, 550005, P. R. China
- Wei Yan
  - School of Chemistry and Materials Engineering, Guiyang University, Guiyang, 550005, P. R. China
28
|
Duan C, Du Y, Jia H, Kulik HJ. Accurate transition state generation with an object-aware equivariant elementary reaction diffusion model. Nat Comput Sci 2023; 3:1045-1055. [PMID: 38177724 DOI: 10.1038/s43588-023-00563-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/12/2023] [Accepted: 11/03/2023] [Indexed: 01/06/2024]
Abstract
Transition state search is key in chemistry for elucidating reaction mechanisms and exploring reaction networks. The search for accurate 3D transition state structures, however, requires numerous computationally intensive quantum chemistry calculations due to the complexity of potential energy surfaces. Here we developed an object-aware SE(3) equivariant diffusion model that satisfies all physical symmetries and constraints for generating sets of structures (reactant, transition state and product) in an elementary reaction. Provided reactant and product, this model generates a transition state structure in seconds instead of hours, which is typically required when performing quantum-chemistry-based optimizations. The generated transition state structures achieve a median of 0.08 Å root mean square deviation compared to the true transition state. With a confidence scoring model for uncertainty quantification, we approach an accuracy required for reaction barrier estimation (2.6 kcal/mol) by only performing quantum chemistry-based optimizations on 14% of the most challenging reactions. We envision usefulness for our approach in constructing large reaction networks with unknown mechanisms.
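The root mean square deviation quoted in this abstract measures how far a generated structure lies from a reference. A minimal sketch of the metric is given below; it assumes the two structures list their atoms in the same order and are already superimposed (real comparisons typically align the structures first), and the coordinates are made-up illustrative values, not data from the paper.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally ordered sets of 3D points.

    No alignment is performed here: the structures are assumed to be
    superimposed already (assumption of this sketch).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("structures must be non-empty and the same size")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical three-atom reference structure (coordinates in angstrom).
reference = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (0.0, 1.2, 0.0)]

# A "generated" structure in which every atom is displaced by 0.08 angstrom along x.
generated = [(x + 0.08, y, z) for x, y, z in reference]

value = rmsd(reference, generated)
print(f"RMSD = {value:.3f} angstrom")
```

Because every atom is displaced by the same amount in this toy case, the RMSD equals that displacement; in general it is dominated by the atoms that deviate most.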
Affiliation(s)
- Chenru Duan
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, US
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, US
- Yuanqi Du
  - Department of Computer Science, Cornell University, Ithaca, NY, US
- Haojun Jia
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, US
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, US
- Heather J Kulik
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, US
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, US
29
|
Millan R, Bello-Jurado E, Moliner M, Boronat M, Gomez-Bombarelli R. Effect of Framework Composition and NH3 on the Diffusion of Cu+ in Cu-CHA Catalysts Predicted by Machine-Learning Accelerated Molecular Dynamics. ACS Cent Sci 2023; 9:2044-2056. [PMID: 38033797 PMCID: PMC10683499 DOI: 10.1021/acscentsci.3c00870] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/13/2023] [Indexed: 12/02/2023]
Abstract
Cu-exchanged zeolites rely on mobile solvated Cu+ cations for their catalytic activity, but the role of the framework composition in transport is not fully understood. Ab initio molecular dynamics simulations can provide quantitative atomistic insight but are too computationally expensive to explore large length and time scales or diverse compositions. We report a machine-learning interatomic potential that accurately reproduces ab initio results and effectively generalizes to allow multinanosecond simulations of large supercells and diverse chemical compositions. Biased and unbiased simulations of [Cu(NH3)2]+ mobility show that aluminum pairing in eight-membered rings accelerates local hopping and demonstrate that increased NH3 concentration enhances long-range diffusion. The probability of finding two [Cu(NH3)2]+ complexes in the same cage, which is key for SCR-NOx reaction, increases with Cu content and Al content but does not correlate with the long-range mobility of Cu+. Supporting experimental evidence was obtained from reactivity tests of Cu-CHA catalysts with a controlled chemical composition.
Affiliation(s)
- Reisel Millan
  - Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Instituto de Tecnología Química, Universitat Politècnica de València-Consejo Superior de Investigaciones Científicas, Avenida de los Naranjos s/n, 46022 Valencia, Spain
- Estefanía Bello-Jurado
  - Instituto de Tecnología Química, Universitat Politècnica de València-Consejo Superior de Investigaciones Científicas, Avenida de los Naranjos s/n, 46022 Valencia, Spain
- Manuel Moliner
  - Instituto de Tecnología Química, Universitat Politècnica de València-Consejo Superior de Investigaciones Científicas, Avenida de los Naranjos s/n, 46022 Valencia, Spain
- Mercedes Boronat
  - Instituto de Tecnología Química, Universitat Politècnica de València-Consejo Superior de Investigaciones Científicas, Avenida de los Naranjos s/n, 46022 Valencia, Spain
- Rafael Gomez-Bombarelli
  - Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
30
|
Domenichini G, Dellago C. Molecular Hessian matrices from a machine learning random forest regression algorithm. J Chem Phys 2023; 159:194111. [PMID: 37982481 DOI: 10.1063/5.0169384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 10/27/2023] [Indexed: 11/21/2023]
Abstract
In this article, we present a machine learning model to obtain fast and accurate estimates of the molecular Hessian matrix. In this model, based on a random forest, the second derivatives of the energy with respect to redundant internal coordinates are learned individually. The internal coordinates together with their specific representation guarantee rotational and translational invariance. The model is trained on a subset of the QM7 dataset but is shown to be applicable to larger molecules picked from the QM9 dataset. From the predicted Hessian, it is also possible to obtain reasonable estimates of the vibrational frequencies, normal modes, and zero point energies of the molecules.
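The step from a Hessian to vibrational frequencies mentioned at the end of this abstract follows the standard textbook route: mass-weight the Hessian, diagonalize it, and convert the eigenvalues to wavenumbers. The sketch below works this through for a one-dimensional diatomic, where the 2x2 eigenproblem has a closed form. It is not the authors' model; the force constant (1902 N/m) and masses are illustrative, roughly CO-like values chosen as an assumption.

```python
import math

AMU_TO_KG = 1.66053906660e-27        # atomic mass unit in kg
SPEED_OF_LIGHT_CM_S = 2.99792458e10  # speed of light in cm/s


def diatomic_wavenumbers(k, m1_amu, m2_amu):
    """Return (lowest, highest) wavenumbers in cm^-1 for a 1D diatomic.

    k is the bond force constant in N/m. The Cartesian Hessian for
    displacements along the bond is [[k, -k], [-k, k]].
    """
    m1 = m1_amu * AMU_TO_KG
    m2 = m2_amu * AMU_TO_KG
    # Mass weighting divides Hessian element (i, j) by sqrt(m_i * m_j).
    a = k / m1
    d = k / m2
    b = -k / math.sqrt(m1 * m2)
    # Closed-form eigenvalues of the symmetric 2x2 matrix [[a, b], [b, d]].
    mean = 0.5 * (a + d)
    radius = math.sqrt((0.5 * (a - d)) ** 2 + b * b)
    eigenvalues = (mean - radius, mean + radius)
    # Eigenvalues are squared angular frequencies; clamp rounding noise at zero.
    return tuple(
        math.sqrt(max(ev, 0.0)) / (2.0 * math.pi * SPEED_OF_LIGHT_CM_S)
        for ev in eigenvalues
    )


translation, stretch = diatomic_wavenumbers(1902.0, 12.0, 15.9949)
print(f"translation mode: {translation:.4f} cm^-1")
print(f"stretching mode:  {stretch:.1f} cm^-1")
```

The near-zero eigenvalue corresponds to rigid translation along the bond, and the other to the bond stretch. For a polyatomic molecule the same procedure applies to the full 3N x 3N Hessian with a numerical eigensolver.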
Affiliation(s)
- Giorgio Domenichini
  - Faculty of Physics, University of Vienna, Kolingasse 14-16, 1090 Vienna, Austria
- Christoph Dellago
  - Faculty of Physics, University of Vienna, Kolingasse 14-16, 1090 Vienna, Austria
31
|
Nippa DF, Atz K, Müller AT, Wolfard J, Isert C, Binder M, Scheidegger O, Konrad DB, Grether U, Martin RE, Schneider G. Identifying opportunities for late-stage C-H alkylation with high-throughput experimentation and in silico reaction screening. Commun Chem 2023; 6:256. [PMID: 37985850 PMCID: PMC10661846 DOI: 10.1038/s42004-023-01047-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/08/2023] [Accepted: 10/30/2023] [Indexed: 11/22/2023]
Abstract
Enhancing the properties of advanced drug candidates is aided by the direct incorporation of specific chemical groups, avoiding the need to construct the entire compound from the ground up. Nevertheless, their chemical intricacy often poses challenges in predicting reactivity for C-H activation reactions and planning their synthesis. We adopted a reaction screening approach that combines high-throughput experimentation (HTE) at a nanomolar scale with computational graph neural networks (GNNs). This approach aims to identify suitable substrates for late-stage C-H alkylation using Minisci-type chemistry. GNNs were trained using experimentally generated reactions derived from in-house HTE and literature data. These trained models were then used to predict, in a forward-looking manner, the coupling of 3180 advanced heterocyclic building blocks with a diverse set of sp3-rich carboxylic acids. This predictive approach aimed to explore the substrate landscape for Minisci-type alkylations. Promising candidates were chosen, their production was scaled up, and they were subsequently isolated and characterized. This process led to the creation of 30 novel, functionally modified molecules that hold potential for further refinement. These results positively advocate the application of HTE-based machine learning to virtual reaction screening.
Affiliation(s)
- David F Nippa
  - Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070, Basel, Switzerland
  - Department of Pharmacy, Ludwig-Maximilians-Universität München, Butenandtstrasse 5, 81377, Munich, Germany
- Kenneth Atz
  - Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
- Alex T Müller
  - Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070, Basel, Switzerland
- Jens Wolfard
  - Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070, Basel, Switzerland
- Clemens Isert
  - Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
- Martin Binder
  - Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070, Basel, Switzerland
- Oliver Scheidegger
  - Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070, Basel, Switzerland
- David B Konrad
  - Department of Pharmacy, Ludwig-Maximilians-Universität München, Butenandtstrasse 5, 81377, Munich, Germany
- Uwe Grether
  - Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070, Basel, Switzerland
- Rainer E Martin
  - Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070, Basel, Switzerland
- Gisbert Schneider
  - Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
32
|
Xia S, Chen E, Zhang Y. Integrated Molecular Modeling and Machine Learning for Drug Design. J Chem Theory Comput 2023; 19:7478-7495. [PMID: 37883810 PMCID: PMC10653122 DOI: 10.1021/acs.jctc.3c00814] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 07/26/2023] [Revised: 10/10/2023] [Accepted: 10/11/2023] [Indexed: 10/28/2023]
Abstract
Modern therapeutic development often involves several stages that are interconnected, and multiple iterations are usually required to bring a new drug to the market. Computational approaches have increasingly become an indispensable part of helping reduce the time and cost of the research and development of new drugs. In this Perspective, we summarize our recent efforts on integrating molecular modeling and machine learning to develop computational tools for modulator design, including a pocket-guided rational design approach based on AlphaSpace to target protein-protein interactions, delta machine learning scoring functions for protein-ligand docking as well as virtual screening, and state-of-the-art deep learning models to predict calculated and experimental molecular properties based on molecular mechanics optimized geometries. Meanwhile, we discuss remaining challenges and promising directions for further development and use a retrospective example of FDA approved kinase inhibitor Erlotinib to demonstrate the use of these newly developed computational tools.
Affiliation(s)
- Song Xia
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Eric Chen
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Yingkai Zhang
  - Department of Chemistry, New York University, New York, New York 10003, United States
  - Simons Center for Computational Physical Chemistry at New York University, New York, New York 10003, United States
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
33
|
Vinod V, Maity S, Zaspel P, Kleinekathöfer U. Multifidelity Machine Learning for Molecular Excitation Energies. J Chem Theory Comput 2023; 19:7658-7670. [PMID: 37862054 DOI: 10.1021/acs.jctc.3c00882] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2023]
Abstract
The accurate but fast calculation of molecular excited states is still a very challenging topic. For many applications, detailed knowledge of the energy funnel in larger molecular aggregates is of key importance, requiring highly accurate excitation energies. To this end, machine learning techniques can be a very useful tool, though the cost of generating highly accurate training data sets remains a severe challenge. To overcome this hurdle, this work proposes the use of multifidelity machine learning, in which a small amount of high-accuracy training data is combined with cheaper, less accurate data to achieve the accuracy of the costlier level. In the present study, the approach is employed to predict vertical excitation energies to the first excited state for three molecules of increasing size, namely, benzene, naphthalene, and anthracene. The energies are trained and tested for conformations stemming from classical molecular dynamics and density functional based tight-binding simulations. It can be shown that the multifidelity machine learning model can achieve the same accuracy as a machine learning model built only on high-cost training data while expending a much lower computational effort to generate the data. The numerical gain observed in these benchmark test calculations was over a factor of 30 but certainly can be much higher for high-accuracy data.
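The idea of combining a few expensive data points with many cheap ones can be illustrated with a two-level Δ-style correction, a simpler relative of the multifidelity scheme described in this abstract and not the method of the paper itself. In the sketch below, `low_fidelity` and `high_fidelity` are hypothetical synthetic functions standing in for cheap and costly calculations; only three high-fidelity evaluations are used to fit the difference between the two levels.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def low_fidelity(x):
    """Cheap but biased model (synthetic stand-in)."""
    return 4.0 + 0.5 * x


def high_fidelity(x):
    """Expensive reference model (synthetic stand-in)."""
    return 4.3 + 0.4 * x


# Only a few geometries are evaluated at the costly level.
train_x = [0.0, 1.0, 2.0]
deltas = [high_fidelity(x) - low_fidelity(x) for x in train_x]
slope, intercept = fit_line(train_x, deltas)


def corrected(x):
    """Cheap prediction plus the learned correction toward the costly level."""
    return low_fidelity(x) + slope * x + intercept


prediction = corrected(2.5)
print(f"corrected: {prediction:.3f}, reference: {high_fidelity(2.5):.3f}")
```

The correction is exact here only because the synthetic difference is linear by construction. With real data the difference model would be a trained regressor, and the benefit depends on how smooth the difference between fidelities is.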
Affiliation(s)
- Vivin Vinod
  - School of Mathematics and Natural Science, University of Wuppertal, Wuppertal 42119, Germany
  - School of Computer Science and Engineering, Constructor University, Campus Ring 1, Bremen 28759, Germany
- Sayan Maity
  - School of Science, Constructor University, Campus Ring 1, Bremen 28759, Germany
- Peter Zaspel
  - School of Mathematics and Natural Science, University of Wuppertal, Wuppertal 42119, Germany
  - School of Computer Science and Engineering, Constructor University, Campus Ring 1, Bremen 28759, Germany
34
|
Xiang Y, Tang YH, Gong Z, Liu H, Wu L, Lin G, Sun H. Efficient Exploration of Chemical Compound Space Using Active Learning for Prediction of Thermodynamic Properties of Alkane Molecules. J Chem Inf Model 2023; 63:6515-6524. [PMID: 37857374 DOI: 10.1021/acs.jcim.3c01430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2023]
Abstract
We introduce an exploratory active learning (AL) algorithm using Gaussian process regression and marginalized graph kernel (GPR-MGK) to sample chemical compound space (CCS) at minimal cost. Targeting 251,728 enumerated alkane molecules with 4-19 carbon atoms, we applied the AL algorithm to select a diverse and representative set of molecules and then conducted high-throughput molecular simulations on these selected molecules. To demonstrate the power of the AL algorithm, we built directed message-passing neural networks (D-MPNN) using simulation data as the training set to predict liquid densities, heat capacities, and vaporization enthalpies of the CCS. Validations show that D-MPNN models built on the smallest training set considered in this work, which consists of 313 molecules or 0.124% of the original CCS, predict the properties with R^2 > 0.99 against the computational data and R^2 > 0.94 against the experimental data. The advantage of the presented AL algorithm is that the predicted uncertainty of GPR depends on only the molecular structures, which renders it compatible with high-throughput data generation.
Affiliation(s)
- Yan Xiang
  - School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Yu-Hang Tang
  - Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
  - NVIDIA Corporation, Santa Clara, California 95051, United States
- Zheng Gong
  - School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Hongyi Liu
  - School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Liang Wu
  - School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Guang Lin
  - Department of Mathematics & School of Mechanical Engineering, Purdue University, West Lafayette, Indiana 47907, United States
- Huai Sun
  - School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
35
|
Bhatia AS, Saggi MK, Kais S. Quantum Machine Learning Predicting ADME-Tox Properties in Drug Discovery. J Chem Inf Model 2023; 63:6476-6486. [PMID: 37603536 DOI: 10.1021/acs.jcim.3c01079] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 08/23/2023]
Abstract
In the drug discovery paradigm, the evaluation of absorption, distribution, metabolism, and excretion (ADME) and toxicity properties of new chemical entities is one of the most critical issues, which is a time-consuming process, immensely expensive, and poses formidable challenges in pharmaceutical R&D. In recent years, emerging technologies like artificial intelligence (AI), big data, and cloud technologies have garnered great attention to predict the ADME and toxicity of molecules. Currently, the blend of quantum computation and machine learning has attracted considerable attention in almost every field ranging from chemistry to biomedicine and several engineering disciplines as well. Quantum computers have the potential to bring advances in high-throughput experimental techniques and in screening billions of molecules by reducing development costs and time associated with the drug discovery process. Motivated by the efficiency of quantum kernel methods, we proposed a quantum machine learning (QML) framework consisting of a classical support vector classifier algorithm with a kernel-based quantum classifier. To demonstrate the feasibility of the proposed QML framework, the simplified molecular input line entry system (SMILES) notation-based string kernel, combined with a quantum support vector classifier, is used for the evaluation of chemical/drug ADME-Tox properties. The proposed quantum machine learning framework is validated and assessed via large-scale simulations. Based on our results from numerical simulations, the quantum model achieved the best performance as compared to classical counterparts in terms of the area under the curve of the receiver operating characteristic curve (AUC ROC; 0.80-0.95) for predicting outcomes on ADME-Tox data sets for small molecules, with a different number of features. The deployment of the proposed framework in the pharmaceutical industry would be extremely valuable in making the best decisions possible.
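The AUC ROC metric used in this abstract to compare classifiers can be computed directly from its pairwise interpretation: it is the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The sketch below implements that definition in plain Python; the labels and scores are made-up illustrative values, not data from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels are 0 (negative) or 1 (positive); scores are classifier outputs
    where a higher score means "more likely positive".
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: two inactive and two active molecules.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(f"AUC ROC = {auc:.2f}")
```

Three of the four positive/negative pairs are ranked correctly in this toy case. The pairwise loop is quadratic in the number of examples, so larger data sets are usually handled with a rank-based formula instead.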
Affiliation(s)
- Amandeep Singh Bhatia
  - School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana 47907, United States
- Mandeep Kaur Saggi
  - Department of Chemistry, Purdue University, West Lafayette, Indiana 47907, United States
- Sabre Kais
  - Department of Chemistry, Purdue University, West Lafayette, Indiana 47907, United States
|
36
|
Illarionov A, Sakipov S, Pereyaslavets L, Kurnikov IV, Kamath G, Butin O, Voronina E, Ivahnenko I, Leontyev I, Nawrocki G, Darkhovskiy M, Olevanov M, Cherniavskyi YK, Lock C, Greenslade S, Sankaranarayanan SKRS, Kurnikova MG, Potoff J, Kornberg RD, Levitt M, Fain B. Combining Force Fields and Neural Networks for an Accurate Representation of Chemically Diverse Molecular Interactions. J Am Chem Soc 2023; 145:23620-23629. [PMID: 37856313 PMCID: PMC10623557 DOI: 10.1021/jacs.3c07628] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 07/17/2023] [Indexed: 10/21/2023]
Abstract
A key goal of molecular modeling is the accurate reproduction of the true quantum mechanical potential energy of arbitrary molecular ensembles with a tractable classical approximation. The challenges are that analytical expressions found in general purpose force fields struggle to faithfully represent the intermolecular quantum potential energy surface at close distances and in strong interaction regimes; that the more accurate neural network approximations do not capture crucial physics concepts, e.g., nonadditive inductive contributions and application of electric fields; and that the ultra-accurate narrowly targeted models have difficulty generalizing to the entire chemical space. We therefore designed a hybrid wide-coverage intermolecular interaction model consisting of an analytically polarizable force field combined with a short-range neural network correction for the total intermolecular interaction energy. Here, we describe the methodology and apply the model to accurately determine the properties of water, the free energy of solvation of neutral and charged molecules, and the binding free energy of ligands to proteins. The correction is subtyped for distinct chemical species to match the underlying force field, to segment and reduce the amount of quantum training data, and to increase accuracy and computational speed. For the systems considered, the hybrid ab initio parametrized Hamiltonian reproduces the two-body dimer quantum mechanics (QM) energies to within 0.03 kcal/mol and the nonadditive many-molecule contributions to within 2%. Simulations of molecular systems using this interaction model run at speeds of several nanoseconds per day.
Affiliation(s)
- Alexey Illarionov
  - InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
- Serzhan Sakipov
  - InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
- Leonid Pereyaslavets
  - InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
- Igor V. Kurnikov
  - InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
- Ganesh Kamath
  - InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
- Oleg Butin
  - InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
- Ekaterina Voronina
  - InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
  - Lomonosov MSU, Skobeltsyn Institute of Nuclear Physics, Moscow, 119991, Russia
- Ilya Ivahnenko
  - InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
- Igor Leontyev
  - InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
- Grzegorz Nawrocki
  - InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
- Mikhail Darkhovskiy
  - InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
- Michael Olevanov
  - InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
  - Lomonosov MSU, Dept. of Physics, Moscow, 119991, Russia
- Yevhen K. Cherniavskyi
  - InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
- Christopher Lock
  - InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
  - Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Palo Alto, California 94304, United States
- Sean Greenslade
  - InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
- Subramanian KRS Sankaranarayanan
  - Center for Nanoscale Materials, Argonne National Lab, Argonne, Illinois 604391, United States
  - Department of Mechanical and Industrial Engineering, University of Illinois, Chicago, Illinois 60607, United States
- Maria G. Kurnikova
  - Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Jeffrey Potoff
  - Department of Chemical Engineering and Materials Science, Wayne State University, Detroit, Michigan 48202, United States
- Roger D. Kornberg
  - Department of Structural Biology, Stanford University School of Medicine, Stanford, California 94304, United States
- Michael Levitt
  - Department of Structural Biology, Stanford University School of Medicine, Stanford, California 94304, United States
- Boris Fain
  - InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
|
37
|
Kandpal SC, Otukile KP, Jindal S, Senthil S, Matthews C, Chakraborty S, Moskaleva LV, Ramakrishnan R. Stereo-electronic factors influencing the stability of hydroperoxyalkyl radicals: transferability of chemical trends across hydrocarbons and ab initio methods. Phys Chem Chem Phys 2023; 25:27302-27320. [PMID: 37791466 DOI: 10.1039/d3cp03598k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2023]
Abstract
The hydroperoxyalkyl radicals (˙QOOH) are known to play a significant role in combustion and tropospheric processes, yet their direct spectroscopic detection remains challenging. In this study, we investigate molecular stereo-electronic effects influencing the kinetic and thermodynamic stability of a ˙QOOH along its formation path from the precursor, alkylperoxyl radical (ROO˙), and the depletion path resulting in the formation of cyclic ether + ˙OH. We focus on reactive intermediates encountered in the oxidation of acyclic hydrocarbon radicals: ethyl, isopropyl, isobutyl, tert-butyl, neopentyl, and their alicyclic counterparts: cyclohexyl, cyclohexenyl, and cyclohexadienyl. We report reaction energies and barriers calculated with the highly accurate method Weizmann-1 (W1) for the channels: ROO˙ ⇌ ˙QOOH, ROO˙ ⇌ alkene + ˙OOH, ˙QOOH ⇌ alkene + ˙OOH, and ˙QOOH ⇌ cyclic ether + ˙OH. Using W1 results as a reference, we have systematically benchmarked the accuracy of popular density functional theory (DFT), composite thermochemistry methods, and an explicitly correlated coupled-cluster method. We ascertain inductive, resonance, and steric effects on the overall stability of ˙QOOH and computationally investigate the possibility of forming more stable species. With new reactions as test cases, we probe the capacity of various ab initio methods to yield quantitative insights on the elementary steps of combustion.
Affiliation(s)
- Kgalaletso P Otukile
  - Department of Chemistry, University of the Free State, PO Box 339, Bloemfontein 9300, South Africa.
- Shweta Jindal
  - Tata Institute of Fundamental Research, Hyderabad 500046, India.
- Salini Senthil
  - Tata Institute of Fundamental Research, Hyderabad 500046, India.
- Cameron Matthews
  - Department of Chemistry, University of the Free State, PO Box 339, Bloemfontein 9300, South Africa.
- Lyudmila V Moskaleva
  - Department of Chemistry, University of the Free State, PO Box 339, Bloemfontein 9300, South Africa.
|
38
|
Li J, Wu N, Zhang J, Wu HH, Pan K, Wang Y, Liu G, Liu X, Yao Z, Zhang Q. Machine Learning-Assisted Low-Dimensional Electrocatalysts Design for Hydrogen Evolution Reaction. NANO-MICRO LETTERS 2023; 15:227. [PMID: 37831203 PMCID: PMC10575847 DOI: 10.1007/s40820-023-01192-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/26/2023] [Accepted: 08/10/2023] [Indexed: 10/14/2023]
Abstract
Efficient electrocatalysts are crucial for hydrogen generation from electrolyzing water. Nevertheless, the conventional "trial and error" method for producing advanced electrocatalysts is not only cost-ineffective but also time-consuming and labor-intensive. Fortunately, the advancement of machine learning brings new opportunities for electrocatalysts discovery and design. By analyzing experimental and theoretical data, machine learning can effectively predict their hydrogen evolution reaction (HER) performance. This review summarizes recent developments in machine learning for low-dimensional electrocatalysts, including zero-dimension nanoparticles and nanoclusters, one-dimensional nanotubes and nanowires, two-dimensional nanosheets, as well as other electrocatalysts. In particular, the effects of descriptors and algorithms on screening low-dimensional electrocatalysts and investigating their HER performance are highlighted. Finally, the future directions and perspectives for machine learning in electrocatalysis are discussed, emphasizing the potential for machine learning to accelerate electrocatalyst discovery, optimize their performance, and provide new insights into electrocatalytic mechanisms. Overall, this work offers an in-depth understanding of the current state of machine learning in electrocatalysis and its potential for future research.
Affiliation(s)
- Jin Li
  - College of Chemistry and Chemical Engineering, and Henan Key Laboratory of Function-Oriented Porous Materials, Luoyang Normal University, Luoyang, 471934, People's Republic of China
- Naiteng Wu
  - College of Chemistry and Chemical Engineering, and Henan Key Laboratory of Function-Oriented Porous Materials, Luoyang Normal University, Luoyang, 471934, People's Republic of China
- Jian Zhang
  - New Energy Technology Engineering Lab of Jiangsu Province, College of Science, Nanjing University of Posts and Telecommunications (NUPT), Nanjing, 210023, People's Republic of China
- Hong-Hui Wu
  - School of Materials Science and Engineering, University of Science and Technology Beijing, Beijing, 100083, People's Republic of China.
  - Department of Chemistry, University of Nebraska-Lincoln, Lincoln, NE, 8588, USA.
- Kunming Pan
  - Henan Key Laboratory of High-Temperature Structural and Functional Materials, National Joint Engineering Research Center for Abrasion Control and Molding of Metal Materials, Henan University of Science and Technology, Luoyang, 471003, People's Republic of China
- Yingxue Wang
  - National Engineering Laboratory for Risk Perception and Prevention, Beijing, 100041, People's Republic of China.
- Guilong Liu
  - College of Chemistry and Chemical Engineering, and Henan Key Laboratory of Function-Oriented Porous Materials, Luoyang Normal University, Luoyang, 471934, People's Republic of China
- Xianming Liu
  - College of Chemistry and Chemical Engineering, and Henan Key Laboratory of Function-Oriented Porous Materials, Luoyang Normal University, Luoyang, 471934, People's Republic of China.
- Zhenpeng Yao
  - Center of Hydrogen Science, Shanghai Jiao Tong University, Shanghai, 200000, People's Republic of China
  - State Key Laboratory of Metal Matrix Composites, School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, 200000, People's Republic of China
- Qiaobao Zhang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Materials, Xiamen University, Xiamen, 361005, People's Republic of China.
|
39
|
Medrano Sandonas L, Hoja J, Ernst BG, Vázquez-Mayagoitia Á, DiStasio RA, Tkatchenko A. "Freedom of design" in chemical compound space: towards rational in silico design of molecules with targeted quantum-mechanical properties. Chem Sci 2023; 14:10702-10717. [PMID: 37829035 PMCID: PMC10566466 DOI: 10.1039/d3sc03598k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Accepted: 08/17/2023] [Indexed: 10/14/2023] Open
Abstract
The rational design of molecules with targeted quantum-mechanical (QM) properties requires an advanced understanding of the structure-property/property-property relationships (SPR/PPR) that exist across chemical compound space (CCS). In this work, we analyze these fundamental relationships in the sector of CCS spanned by small (primarily organic) molecules using the recently developed QM7-X dataset, a systematic, extensive, and tightly converged collection of 42 QM properties corresponding to ≈4.2M equilibrium and non-equilibrium molecular structures containing up to seven heavy/non-hydrogen atoms (including C, N, O, S, and Cl). By characterizing and enumerating progressively more complex manifolds of molecular property space (the corresponding high-dimensional space defined by the properties of each molecule in this sector of CCS), our analysis reveals that one has a substantial degree of flexibility or "freedom of design" when searching for a single molecule with a desired pair of properties or a set of distinct molecules sharing an array of properties. To explore how this intrinsic flexibility manifests in the molecular design process, we used multi-objective optimization to search for molecules with simultaneously large polarizabilities and HOMO-LUMO gaps; analysis of the resulting Pareto fronts identified non-trivial paths through CCS consisting of sequential structural and/or compositional changes that yield molecules with optimal combinations of these properties.
Affiliation(s)
- Leonardo Medrano Sandonas
  - Department of Physics and Materials Science, University of Luxembourg L-1511 Luxembourg City Luxembourg
- Johannes Hoja
  - Department of Physics and Materials Science, University of Luxembourg L-1511 Luxembourg City Luxembourg
  - Institute of Chemistry, University of Graz 8010 Graz Austria
- Brian G Ernst
  - Department of Chemistry and Chemical Biology, Cornell University Ithaca NY 14853 USA
- Robert A DiStasio
  - Department of Chemistry and Chemical Biology, Cornell University Ithaca NY 14853 USA
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg L-1511 Luxembourg City Luxembourg
|
40
|
Lederer J, Gastegger M, Schütt KT, Kampffmeyer M, Müller KR, Unke OT. Automatic identification of chemical moieties. Phys Chem Chem Phys 2023; 25:26370-26379. [PMID: 37750554 PMCID: PMC10548786 DOI: 10.1039/d3cp03845a] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/11/2023] [Accepted: 08/18/2023] [Indexed: 09/27/2023]
Abstract
In recent years, the prediction of quantum mechanical observables with machine learning methods has become increasingly popular. Message-passing neural networks (MPNNs) solve this task by constructing atomic representations, from which the properties of interest are predicted. Here, we introduce a method to automatically identify chemical moieties (molecular building blocks) from such representations, enabling a variety of applications beyond property prediction, which otherwise rely on expert knowledge. The required representation can either be provided by a pretrained MPNN, or be learned from scratch using only structural information. Beyond the data-driven design of molecular fingerprints, the versatility of our approach is demonstrated by enabling the selection of representative entries in chemical databases, the automatic construction of coarse-grained force fields, as well as the identification of reaction coordinates.
Affiliation(s)
- Jonas Lederer
  - Berlin Institute of Technology (TU Berlin), 10587 Berlin, Germany.
  - BIFOLD - Berlin Institute for the Foundations of Learning and Data, Germany
- Michael Gastegger
  - Berlin Institute of Technology (TU Berlin), 10587 Berlin, Germany.
  - BIFOLD - Berlin Institute for the Foundations of Learning and Data, Germany
- Kristof T Schütt
  - Berlin Institute of Technology (TU Berlin), 10587 Berlin, Germany.
  - BIFOLD - Berlin Institute for the Foundations of Learning and Data, Germany
- Michael Kampffmeyer
  - Department of Physics and Technology, UiT The Arctic University of Norway, 9019 Tromsø, Norway
- Klaus-Robert Müller
  - Berlin Institute of Technology (TU Berlin), 10587 Berlin, Germany.
  - BIFOLD - Berlin Institute for the Foundations of Learning and Data, Germany
  - Google Deepmind, Germany
  - Department of Artificial Intelligence, Korea University, Seoul 136-713, Korea
  - Max Planck Institut für Informatik, 66123 Saarbrücken, Germany
- Oliver T Unke
  - Berlin Institute of Technology (TU Berlin), 10587 Berlin, Germany.
  - BIFOLD - Berlin Institute for the Foundations of Learning and Data, Germany
  - Google Deepmind, Germany
|
41
|
Hermann J, Spencer J, Choo K, Mezzacapo A, Foulkes WMC, Pfau D, Carleo G, Noé F. Ab initio quantum chemistry with neural-network wavefunctions. Nat Rev Chem 2023; 7:692-709. [PMID: 37558761 DOI: 10.1038/s41570-023-00516-8] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Accepted: 06/16/2023] [Indexed: 08/11/2023]
Abstract
Deep learning methods outperform human capabilities in pattern recognition and data processing problems and now have an increasingly important role in scientific discovery. A key application of machine learning in molecular science is to learn potential energy surfaces or force fields from ab initio solutions of the electronic Schrödinger equation using data sets obtained with density functional theory, coupled cluster or other quantum chemistry (QC) methods. In this Review, we discuss a complementary approach using machine learning to aid the direct solution of QC problems from first principles. Specifically, we focus on quantum Monte Carlo methods that use neural-network ansatzes to solve the electronic Schrödinger equation, in first and second quantization, computing ground and excited states and generalizing over multiple nuclear configurations. Although still in their infancy, these methods can already generate virtually exact solutions of the electronic Schrödinger equation for small systems and rival advanced conventional QC methods for systems with up to a few dozen electrons.
Affiliation(s)
- Jan Hermann
  - Microsoft Research AI4Science, Berlin, Germany
  - FU Berlin, Department of Mathematics and Computer Science, Berlin, Germany
- Kenny Choo
  - Department of Physics, University of Zurich, Zurich, Switzerland
  - IBM Quantum, IBM Research Zurich, Ruschlikon, Switzerland
- W M C Foulkes
  - Imperial College London, Department of Physics, London, UK
- David Pfau
  - DeepMind, London, UK.
  - Imperial College London, Department of Physics, London, UK.
- Frank Noé
  - Microsoft Research AI4Science, Berlin, Germany.
  - FU Berlin, Department of Mathematics and Computer Science, Berlin, Germany.
  - FU Berlin, Department of Physics, Berlin, Germany.
  - Department of Chemistry, Rice University, Houston, TX, USA.
|
42
|
Hu F, He F, Yaron DJ. Treating Semiempirical Hamiltonians as Flexible Machine Learning Models Yields Accurate and Interpretable Results. J Chem Theory Comput 2023; 19:6185-6196. [PMID: 37705220 PMCID: PMC10536991 DOI: 10.1021/acs.jctc.3c00491] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/11/2023] [Indexed: 09/15/2023]
Abstract
Quantum chemistry provides chemists with invaluable information, but the high computational cost limits the size and type of systems that can be studied. Machine learning (ML) has emerged as a means to dramatically lower the cost while maintaining high accuracy. However, ML models often sacrifice interpretability by using components such as the artificial neural networks of deep learning that function as black boxes. These components impart the flexibility needed to learn from large volumes of data but make it difficult to gain insight into the physical or chemical basis for the predictions. Here, we demonstrate that semiempirical quantum chemical (SEQC) models can learn from large volumes of data without sacrificing interpretability. The SEQC model is that of density-functional-based tight binding (DFTB) with fixed atomic orbital energies and interactions that are one-dimensional functions of the interatomic distance. This model is trained to ab initio data in a manner that is analogous to that used to train deep learning models. Using benchmarks that reflect the accuracy of the training data, we show that the resulting model maintains a physically reasonable functional form while achieving an accuracy, relative to coupled cluster energies with a complete basis set extrapolation (CCSD(T)*/CBS), that is comparable to that of density functional theory (DFT). This suggests that trained SEQC models can achieve a low computational cost and high accuracy without sacrificing interpretability. Use of a physically motivated model form also substantially reduces the amount of ab initio data needed to train the model compared to that required for deep learning models.
Affiliation(s)
- Frank Hu
  - Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Francis He
  - Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- David J. Yaron
  - Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
|
43
|
Góger S, Sandonas LM, Müller C, Tkatchenko A. Data-driven tailoring of molecular dipole polarizability and frontier orbital energies in chemical compound space. Phys Chem Chem Phys 2023; 25:22211-22222. [PMID: 37566426 PMCID: PMC10445328 DOI: 10.1039/d3cp02256k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 07/27/2023] [Indexed: 08/12/2023]
Abstract
Understanding correlations - or lack thereof - between molecular properties is crucial for enabling fast and accurate molecular design strategies. In this contribution, we explore the relation between two key quantities describing the electronic structure and chemical properties of molecular systems: the energy gap between the frontier orbitals and the dipole polarizability. Based on the recently introduced QM7-X dataset, augmented with accurate molecular polarizability calculations as well as analysis of functional group compositions, we show that polarizability and HOMO-LUMO gap are uncorrelated when considering sufficiently extended subsets of the chemical compound space. The relation between these two properties is further analyzed on specific examples of molecules with similar composition as well as homooligomers. Remarkably, the freedom brought by the lack of correlation between molecular polarizability and HOMO-LUMO gap enables the design of novel materials, as we demonstrate on the example of organic photodetector candidates.
Affiliation(s)
- Szabolcs Góger
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg.
- Leonardo Medrano Sandonas
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg.
- Carolin Müller
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg.
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg.
|
44
|
Hagg A, Kirschner KN. Open-Source Machine Learning in Computational Chemistry. J Chem Inf Model 2023; 63:4505-4532. [PMID: 37466636 PMCID: PMC10430767 DOI: 10.1021/acs.jcim.3c00643] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/04/2023] [Indexed: 07/20/2023]
Abstract
The field of computational chemistry has seen a significant increase in the integration of machine learning concepts and algorithms. In this Perspective, we surveyed 179 open-source software projects, with corresponding peer-reviewed papers published within the last 5 years, to better understand the topics within the field being investigated by machine learning approaches. For each project, we provide a short description, the link to the code, the accompanying license type, and whether the training data and resulting models are made publicly available. Based on those deposited in GitHub repositories, the most popular employed Python libraries are identified. We hope that this survey will serve as a resource to learn about machine learning or specific architectures thereof by identifying accessible codes with accompanying papers on a topic basis. To this end, we also include computational chemistry open-source software for generating training data and fundamental Python libraries for machine learning. Based on our observations and considering the three pillars of collaborative machine learning work, open data, open source (code), and open models, we provide some suggestions to the community.
Affiliation(s)
- Alexander Hagg
  - Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
  - Department of Electrical Engineering, Mechanical Engineering and Technical Journalism, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Karl N. Kirschner
  - Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
  - Department of Computer Science, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
|
45
|
Zhang Z, Liu Q, Lee CK, Hsieh CY, Chen E. An equivariant generative framework for molecular graph-structure Co-design. Chem Sci 2023; 14:8380-8392. [PMID: 37564414 PMCID: PMC10411624 DOI: 10.1039/d3sc02538a] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/19/2023] [Accepted: 07/05/2023] [Indexed: 08/12/2023] Open
Abstract
Designing molecules with desirable physiochemical properties and functionalities is a long-standing challenge in chemistry, material science, and drug discovery. Recently, machine learning-based generative models have emerged as promising approaches for de novo molecule design. However, further refinement of methodology is highly desired as most existing methods lack unified modeling of 2D topology and 3D geometry information and fail to effectively learn the structure-property relationship for molecule design. Here we present MolCode, a roto-translation equivariant generative framework for molecular graph-structure Co-design. In MolCode, 3D geometric information empowers the molecular 2D graph generation, which in turn helps guide the prediction of molecular 3D structure. Extensive experimental results show that MolCode outperforms previous methods on a series of challenging tasks including de novo molecule design, targeted molecule discovery, and structure-based drug design. Particularly, MolCode not only consistently generates valid (99.95% validity) and diverse (98.75% uniqueness) molecular graphs/structures with desirable properties, but also generates drug-like molecules with high affinity to target proteins (61.8% high affinity ratio), which demonstrates MolCode's potential applications in material design and drug discovery. Our extensive investigation reveals that the 2D topology and 3D geometry contain intrinsically complementary information in molecule design, and provide new insights into machine learning-based molecule representation and generation.
Affiliation(s)
- Zaixi Zhang
  - Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China Hefei Anhui 230026 China
  - State Key Laboratory of Cognitive Intelligence Hefei Anhui 230088 China
- Qi Liu
  - Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China Hefei Anhui 230026 China
  - State Key Laboratory of Cognitive Intelligence Hefei Anhui 230088 China
- Chang-Yu Hsieh
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou Zhejiang 310058 China
- Enhong Chen
  - Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China Hefei Anhui 230026 China
  - State Key Laboratory of Cognitive Intelligence Hefei Anhui 230088 China
|
46
|
Huang B, von Rudorff GF, von Lilienfeld OA. The central role of density functional theory in the AI age. Science 2023; 381:170-175. [PMID: 37440654 DOI: 10.1126/science.abn3445] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 04/27/2023] [Accepted: 05/30/2023] [Indexed: 07/15/2023]
Abstract
Density functional theory (DFT) plays a pivotal role in chemical and materials science because of its relatively high predictive power, applicability, versatility, and computational efficiency. We review recent progress in machine learning (ML) model developments, which have relied heavily on DFT for synthetic data generation and for the design of model architectures. The general relevance of these developments is placed in a broader context for chemical and materials sciences. DFT-based ML models have reached high efficiency, accuracy, scalability, and transferability and pave the way to the routine use of successful experimental planning software within self-driving laboratories.
Affiliation(s)
- Bing Huang
  - University of Vienna, Faculty of Physics, AT1090 Wien, Austria
- Guido Falk von Rudorff
  - University Kassel, Department of Chemistry, 34132 Kassel, Germany
  - Center for Interdisciplinary Nanostructure Science and Technology (CINSaT), 34132 Kassel, Germany
- O Anatole von Lilienfeld
  - Vector Institute for Artificial Intelligence, Toronto, Ontario M5S 1M1, Canada
  - Department of Chemistry, University of Toronto, St. George Campus, Toronto, Ontario M5S 3H6, Canada
  - Department of Materials Science and Engineering, University of Toronto, St. George Campus, Toronto, Ontario M5S 3E4, Canada
  - Department of Physics, University of Toronto, St. George Campus, Toronto, Ontario M5S 1A7, Canada
  - Machine Learning Group, Technische Universität Berlin and Berlin Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
|
47
|
Serafim LF, Jayasinghe-Arachchige VM, Wang L, Rathee P, Yang J, Moorkkannur N S, Prabhakar R. Distinct chemical factors in hydrolytic reactions catalyzed by metalloenzymes and metal complexes. Chem Commun (Camb) 2023. [PMID: 37366367 DOI: 10.1039/d3cc01380d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/28/2023]
Abstract
The selective hydrolysis of the extremely stable phosphoester, peptide and ester bonds of molecules by bio-inspired metal-based catalysts (metallohydrolases) is required in a wide range of biological, biotechnological and industrial applications. Despite the impressive advances made in the field, the ultimate goal of designing efficient enzyme mimics for these reactions is still elusive. Its realization will require a deeper understanding of the diverse chemical factors that influence the activities of both natural and synthetic catalysts. They include catalyst-substrate complexation, non-covalent interactions and the electronic nature of the metal ion, ligand environment and nucleophile. Based on our computational studies, their roles are discussed for several mono- and binuclear metallohydrolases and their synthetic analogues. Hydrolysis by natural metallohydrolases is found to be promoted by a ligand environment with low basicity, a metal bound water and a heterobinuclear metal center (in binuclear enzymes). Additionally, peptide and phosphoester hydrolysis is dominated by two competing effects, i.e. nucleophilicity and Lewis acid activation, respectively. In synthetic analogues, hydrolysis is facilitated by the inclusion of a second metal center, hydrophobic effects, a biological metal (Zn, Cu and Co) and a terminal hydroxyl nucleophile. Due to the absence of the protein environment, hydrolysis by these small molecules is exclusively influenced by nucleophile activation. The results gleaned from these studies will enhance the understanding of fundamental principles of multiple hydrolytic reactions. They will also advance the development of computational methods as a predictive tool to design more efficient catalysts for hydrolysis, Diels-Alder reaction, Michael addition, epoxide opening and aldol condensation.
Affiliation(s)
- Leonardo F Serafim
- Department of Chemistry, University of Miami, Coral Gables, FL 33146, USA.
- Lukun Wang
- Department of Chemistry, University of Miami, Coral Gables, FL 33146, USA.
- Parth Rathee
- Department of Chemistry, University of Miami, Coral Gables, FL 33146, USA.
- Jiawen Yang
- Department of Chemistry, University of Miami, Coral Gables, FL 33146, USA.
- Rajeev Prabhakar
- Department of Chemistry, University of Miami, Coral Gables, FL 33146, USA.
|
48
|
Li SW, Xu LC, Zhang C, Zhang SQ, Hong X. Reaction performance prediction with an extrapolative and interpretable graph model based on chemical knowledge. Nat Commun 2023; 14:3569. [PMID: 37322041 DOI: 10.1038/s41467-023-39283-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Grants] [Received: 09/14/2022] [Accepted: 05/31/2023] [Indexed: 06/17/2023] Open
Abstract
Accurate prediction of reactivity and selectivity provides the desired guideline for synthetic development. Due to the high-dimensional relationship between molecular structure and synthetic function, it is challenging to achieve predictive modelling of synthetic transformations with the required extrapolative ability and chemical interpretability. To bridge the gap between the rich domain knowledge of chemistry and advanced molecular graph models, herein we report a knowledge-based graph model that embeds digitalized steric and electronic information. In addition, a molecular interaction module is developed to enable the learning of the synergistic influence of reaction components. In this study, we demonstrate that this knowledge-based graph model achieves excellent predictions of reaction yield and stereoselectivity, and its extrapolative ability is corroborated by additional scaffold-based data splits and experimental verification with new catalysts. Because it embeds the local environment, the model allows atomic-level interpretation of the steric and electronic influence on the overall synthetic performance, which serves as a useful guide for molecular engineering towards the target synthetic function. This model offers an extrapolative and interpretable approach for reaction performance prediction, pointing out the importance of chemical knowledge-constrained reaction modelling for synthetic purposes.
Affiliation(s)
- Shu-Wen Li
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, China
- Li-Cheng Xu
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, China
- Cheng Zhang
- Department of Chemistry, University of Science and Technology of China, Hefei, China
- Shuo-Qing Zhang
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, China
- Xin Hong
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, China
- Beijing National Laboratory for Molecular Sciences, Zhongguancun North First Street No. 2, Beijing, 100190, PR China
- Key Laboratory of Precise Synthesis of Functional Molecules of Zhejiang Province, School of Science, Westlake University, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang Province, China
|
49
|
Collins EM, Raghavachari K. Interpretable Graph-Network-Based Machine Learning Models via Molecular Fragmentation. J Chem Theory Comput 2023; 19:2804-2810. [PMID: 37134275 DOI: 10.1021/acs.jctc.2c01308] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 05/05/2023]
Abstract
Chemists have long benefitted from the ability to understand and interpret the predictions of computational models. With the current shift to more complex deep learning models, in many situations that utility is lost. In this work, we expand on our previous work on computational thermochemistry and propose an interpretable graph network, FragGraph(nodes), that decomposes predictions into fragment-wise contributions. We demonstrate the usefulness of our model in predicting a correction to density functional theory (DFT)-calculated atomization energies using Δ-learning. Our model predicts G4(MP2)-quality thermochemistry with an accuracy of <1 kJ mol⁻¹ for the GDB9 dataset. Besides the high accuracy of our predictions, we observe trends in the fragment corrections which quantitatively describe the deficiencies of B3LYP. Node-wise predictions significantly outperform our previous model predictions from a global state vector. This effect is most pronounced as we explore the generality by predicting on more diverse test sets, indicating that node-wise predictions are less sensitive to extending machine learning models to larger molecules.
Affiliation(s)
- Eric M Collins
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Krishnan Raghavachari
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
|
50
|
Schütt KT, Hessmann SSP, Gebauer NWA, Lederer J, Gastegger M. SchNetPack 2.0: A neural network toolbox for atomistic machine learning. J Chem Phys 2023; 158:144801. [PMID: 37061495 DOI: 10.1063/5.0138367] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Indexed: 04/17/2023] Open
Abstract
SchNetPack is a versatile neural network toolbox that addresses both the requirements of method development and the application of atomistic machine learning. Version 2.0 comes with an improved data pipeline, modules for equivariant neural networks, and a PyTorch implementation of molecular dynamics. An optional integration with PyTorch Lightning and the Hydra configuration framework powers a flexible command-line interface. This makes SchNetPack 2.0 easily extendable with custom code and ready for complex training tasks, such as the generation of 3D molecular structures.
Affiliation(s)
- Kristof T Schütt
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Niklas W A Gebauer
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Jonas Lederer
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Michael Gastegger
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
|