1
Lourenço MP, Hostaš J, Bellinger C, Tchagang A, Salahub DR. Reinforcement learning for in silico determination of adsorbate-substrate structures. J Comput Chem 2024; 45:1289-1302. [PMID: 38357973 DOI: 10.1002/jcc.27322] [Received: 12/11/2023] [Revised: 01/18/2024] [Accepted: 01/22/2024] [Indexed: 02/16/2024]
Abstract
Reinforcement learning (RL) methods have helped to define the state of the art in modern artificial intelligence, mostly since the AlphaGo breakthrough and the discovery of novel algorithms. In this work, we present an RL method, based on Q-learning, for the structural determination of adsorbate@substrate models in silico, in which the energy landscape arising from adsorbate-substrate interactions is minimized through actions on states (translations and rotations) chosen by an agent's policy. The proposed RL method is implemented in an early version of the reinforcement learning software for materials design and discovery (RLMaterial), written in Python 3.x. RLMaterial interfaces with the deMon2k, DFTB+, ORCA, and Quantum Espresso codes to compute the adsorbate@substrate energies. The RL method was applied to the structural determination of (i) the amino acid glycine and (ii) 2-amino-acetaldehyde, both interacting with a boron nitride (BN) monolayer, (iii) host-guest interactions between phenylboronic acid and β-cyclodextrin, and (iv) ammonia on naphthalene. Density functional tight binding (DFTB) calculations were used to build the complex search surfaces at reasonably low computational cost for systems (i)-(iii), and DFT was used for system (iv). Artificial neural network and gradient boosting regression techniques were employed to approximate the Q-matrix (Q-table), improving the policy's choice of next actions. Finally, we have developed a transfer-learning protocol within the RL framework that allows learning on one chemical system and transferring the experience to another, as well as between different DFT or DFTB levels of theory.
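The core of the method is an agent that lowers the adsorbate@substrate energy by choosing moves from a learned Q-table. A minimal tabular Q-learning sketch of that idea follows, under loudly stated assumptions: a hypothetical one-dimensional energy curve stands in for the DFTB/DFT energies that RLMaterial would request from deMon2k, DFTB+, ORCA, or Quantum Espresso; only translations are modelled (no rotations); the reward is the energy decrease of a move; and `alpha`, `gamma`, and `epsilon` are illustrative values. None of these names are RLMaterial's API.

```python
import random

# Hypothetical toy energy landscape (arbitrary units): the adsorbate's energy at
# each of N discrete positions along a single translation coordinate.
ENERGIES = [5.0, 3.0, 1.5, 0.2, -1.0, 0.4, 2.0, 4.0]
N = len(ENERGIES)
ACTIONS = (-1, 0, +1)  # translate one grid step left, stay put, or step right


def step(state, action):
    """Apply an action; the reward is the energy decrease it produces."""
    new_state = min(max(state + action, 0), N - 1)  # clamp at the grid edges
    reward = ENERGIES[state] - ENERGIES[new_state]
    return new_state, reward


def greedy(q, state):
    """Index of the action with the highest Q-value in this state."""
    return max(range(len(ACTIONS)), key=lambda i: q[state][i])


def q_learning(episodes=1000, steps=20, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N)]  # the Q-table
    for _ in range(episodes):
        state = rng.randrange(N)  # random starting geometry
        for _ in range(steps):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = greedy(q, state)  # exploit current estimates
            new_state, reward = step(state, ACTIONS[a])
            # Move Q(s, a) toward the target r + gamma * max_a' Q(s', a').
            target = reward + gamma * max(q[new_state])
            q[state][a] += alpha * (target - q[state][a])
            state = new_state
    return q


def greedy_descent(q, start, max_steps=20):
    """Follow the learned greedy policy from a starting position."""
    state = start
    for _ in range(max_steps):
        state, _ = step(state, ACTIONS[greedy(q, state)])
    return state
```

After training, `greedy_descent(q_learning(), s)` should end at the lowest-energy position from any start `s`. The abstract replaces the explicit table with neural-network or gradient-boosting approximators; in standard Q-learning with function approximation, the same target `r + gamma * max Q(s', a')` is used as the regression label.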
Affiliation(s)
- Maicon Pierre Lourenço
  - Departamento de Química e Física-Centro de Ciências Exatas, Naturais e da Saúde-CCENS-Universidade Federal do Espírito Santo, Alegre, Brasil
- Jiří Hostaš
  - Department of Chemistry, Department of Physics and Astronomy, CMS Centre for Molecular Simulation, IQST Institute for Quantum Science and Technology, Quantum Alberta, University of Calgary, Calgary, Alberta, Canada
  - Digital Technologies Research Centre, National Research Council of Canada, Ottawa, Ontario, Canada
- Colin Bellinger
  - Digital Technologies Research Centre, National Research Council of Canada, Ottawa, Ontario, Canada
- Alain Tchagang
  - Digital Technologies Research Centre, National Research Council of Canada, Ottawa, Ontario, Canada
- Dennis R Salahub
  - Department of Chemistry, Department of Physics and Astronomy, CMS Centre for Molecular Simulation, IQST Institute for Quantum Science and Technology, Quantum Alberta, University of Calgary, Calgary, Alberta, Canada
2
Das M, Ghosh A, Sunoj RB. Advances in machine learning with chemical language models in molecular property and reaction outcome predictions. J Comput Chem 2024; 45:1160-1176. [PMID: 38299229 DOI: 10.1002/jcc.27315] [Received: 11/22/2023] [Revised: 01/06/2024] [Accepted: 01/09/2024] [Indexed: 02/02/2024]
Abstract
Molecular properties and reactions form the foundation of chemical space. Over the years, innumerable molecules have been synthesized; a smaller fraction of them found immediate applications, while a larger proportion served as a testimony to the creative and empirical nature of chemical science. With increasing emphasis on sustainable practices, it is desirable that a target set of molecules be synthesized through fewer empirical attempts, rather than from a larger library, to realize an active candidate. On this front, predictive efforts using machine learning (ML) models built on available data are of high and timely significance. Prediction of molecular properties and reaction outcomes remains one of the burgeoning applications of ML in chemical science. Among the several methods of encoding molecular samples for ML models, those that employ language-like representations are steadily gaining popularity. Such representations also make it possible to adopt well-developed natural language processing (NLP) models for chemical applications. Against this background, herein we describe several successful chemical applications of NLP, focusing on molecular property and reaction outcome predictions. From relatively simple recurrent neural networks (RNNs) to complex models such as transformers, different network architectures have been leveraged for tasks such as de novo drug design, catalyst generation, and forward and retrosynthesis predictions. The chemical language model (CLM) provides promising avenues toward a broad range of applications in a time- and cost-effective manner. While we showcase an optimistic outlook for CLMs, attention is also placed on the persisting challenges in the reaction domain, which could be addressed by advanced algorithms tailored to chemical language and by the increased availability of high-quality datasets.
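Before an RNN or transformer can read a language-like representation such as SMILES, the string has to be split into tokens and mapped to integer ids. The sketch below uses a regular expression of the kind widely used for atom-level SMILES tokenization in reaction-prediction work; treat the exact pattern and the `<pad>` convention as assumptions, not as the tokenizer of any particular model covered by this review.

```python
import re

# Atom-level SMILES tokens: bracket atoms, two-letter halogens (Br, Cl),
# organic-subset and aromatic atoms, bonds, branches, and ring-closure digits.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens; raise if any character is left over."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not fully tokenize SMILES: {smiles!r}")
    return tokens


def build_vocab(corpus: list[str]) -> dict[str, int]:
    """Map each distinct token to an integer id (0 is reserved for padding)."""
    vocab = {"<pad>": 0}
    for smi in corpus:
        for tok in tokenize_smiles(smi):
            vocab.setdefault(tok, len(vocab))
    return vocab
```

For example, `tokenize_smiles("c1ccccc1Br")` keeps `Br` as one token rather than splitting it into `B` and `r`, which is why two-letter elements appear before single letters in the pattern.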
Affiliation(s)
- Manajit Das
  - Department of Chemistry, Indian Institute of Technology Bombay, Mumbai, India
- Ankit Ghosh
  - Department of Chemistry, Indian Institute of Technology Bombay, Mumbai, India
- Raghavan B Sunoj
  - Department of Chemistry, Indian Institute of Technology Bombay, Mumbai, India
  - Centre for Machine Intelligence and Data Science, Indian Institute of Technology Bombay, Mumbai, India
3
Wang G, Wang C, Zhang X, Li Z, Zhou J, Sun Z. Machine learning interatomic potential: Bridge the gap between small-scale models and realistic device-scale simulations. iScience 2024; 27:109673. [PMID: 38646181 PMCID: PMC11033164 DOI: 10.1016/j.isci.2024.109673] [Indexed: 04/23/2024]
Abstract
Machine learning interatomic potentials (MLIPs) overcome the high computational cost of density functional theory and the relatively low accuracy of classical large-scale molecular dynamics, enabling more efficient and precise simulations in materials research and design. In this review, the current state of the four essential stages of MLIP development is discussed: data generation methods, material structure descriptors, six distinct machine learning algorithms, and available software. Furthermore, applications of MLIPs in various fields are examined, notably phase-change memory materials, structure searching, material property prediction, and pre-trained universal models. Finally, future perspectives are discussed, including standard datasets, transferability, generalization, and the trade-off between accuracy and complexity in MLIPs.
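A material structure descriptor turns each atom's local environment into a fixed-length, symmetry-invariant feature vector for the regression model. As one concrete example, chosen here for illustration and not singled out by the review, the sketch below computes Behler-Parrinello-style radial symmetry functions G2 with a cosine cutoff. The parameter values (`etas`, `r_s`, `r_cut`) are arbitrary assumptions.

```python
import math


def cosine_cutoff(r: float, r_cut: float) -> float:
    """Smooth cutoff f_c(r) = 0.5 * (cos(pi * r / r_cut) + 1) for r < r_cut, else 0."""
    if r >= r_cut:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_cut) + 1.0)


def radial_symmetry_function(positions, i, eta, r_s, r_cut):
    """G2 of atom i: sum over neighbours j of exp(-eta * (r_ij - r_s)^2) * f_c(r_ij)."""
    xi = positions[i]
    total = 0.0
    for j, xj in enumerate(positions):
        if j == i:
            continue
        r_ij = math.dist(xi, xj)  # Euclidean distance between atoms i and j
        total += math.exp(-eta * (r_ij - r_s) ** 2) * cosine_cutoff(r_ij, r_cut)
    return total


def descriptor_vector(positions, i, etas, r_s=0.0, r_cut=6.0):
    """Stack G2 values for several (assumed) Gaussian widths into one feature vector."""
    return [radial_symmetry_function(positions, i, eta, r_s, r_cut) for eta in etas]
```

Because G2 depends only on interatomic distances, it is unchanged by translating or rotating the whole structure, and neighbours beyond `r_cut` contribute nothing; these are the properties an MLIP descriptor needs.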
Affiliation(s)
- Guanjie Wang
  - School of Materials Science and Engineering, Beihang University, Beijing 100191, China
  - School of Integrated Circuit Science and Engineering, Beihang University, Beijing 100191, China
- Changrui Wang
  - School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- Xuanguang Zhang
  - School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- Zefeng Li
  - School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- Jian Zhou
  - School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- Zhimei Sun
  - School of Materials Science and Engineering, Beihang University, Beijing 100191, China
4
Ge M, Pan Y, Liu X, Zhao Z, Su D. Automatic center identification of electron diffraction with multi-scale transformer networks. Ultramicroscopy 2024; 259:113926. [PMID: 38310650 DOI: 10.1016/j.ultramic.2024.113926] [Received: 07/03/2023] [Revised: 12/08/2023] [Accepted: 01/21/2024] [Indexed: 02/06/2024]
Abstract
Selected area electron diffraction (SAED) is a widely used technique for characterizing the structure and measuring the lattice parameters of materials. Autonomous analysis methods are urgently needed for the large-scale SAED data produced by in situ experiments. In this work, we automate center identification with a proposed deep segmentation model, the multi-scale Transformer (MS-Trans) network. This algorithm achieves robust segmentation of the central spots by combining a novel gated axial-attention module with multi-scale feature fusion. The proposed MS-Trans model shows high precision and robustness, enabling autonomous processing of SAED patterns without any prior knowledge. Its application to in situ SAED data from the oxidation of an FeNi alloy demonstrates its capability for autonomous quantitative processing.
Affiliation(s)
- Mengshu Ge
  - Beijing National Laboratory for Condensed Matter Physics, Institute of Physics, Chinese Academy of Sciences, Beijing, 100190, China
- Yue Pan
  - Beijing National Laboratory for Condensed Matter Physics, Institute of Physics, Chinese Academy of Sciences, Beijing, 100190, China
- Xiaozhi Liu
  - Beijing National Laboratory for Condensed Matter Physics, Institute of Physics, Chinese Academy of Sciences, Beijing, 100190, China
- Zhicheng Zhao
  - School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, 100876, China
  - Beijing Key Laboratory of Network System and Network Culture, Beijing University of Posts and Telecommunications, Beijing, 100876, China
- Dong Su
  - Beijing National Laboratory for Condensed Matter Physics, Institute of Physics, Chinese Academy of Sciences, Beijing, 100190, China
5
Huang Z, Wang Y, Li C, He H. Growing Like a Tree: Finding Trunks From Graph Skeleton Trees. IEEE Trans Pattern Anal Mach Intell 2024; 46:2838-2851. [PMID: 38015698 DOI: 10.1109/tpami.2023.3336315] [Indexed: 11/30/2023]
Abstract
The message-passing paradigm has served as the foundation of graph neural networks (GNNs) for years, enabling them to achieve great success in a wide range of applications. Despite its elegance, this paradigm presents several unexpected challenges for graph-level tasks, such as the long-range problem, the information bottleneck, the over-squashing phenomenon, and limited expressivity. In this study, we aim to overcome these major challenges and break the conventional "node- and edge-centric" mindset in graph-level tasks. To this end, we provide an in-depth theoretical analysis of the causes of the information bottleneck from the perspective of information influence. Building on the theoretical results, we offer unique insights into breaking this bottleneck and propose extracting a skeleton tree from the original graph, followed by propagating information in a distinctive manner on this tree. Drawing inspiration from natural trees, we further propose finding trunks in graph skeleton trees to create powerful graph representations, and we develop the corresponding framework for graph-level tasks. Extensive experiments on multiple real-world datasets demonstrate the superiority of our model. Comprehensive experimental analyses further highlight its capability to capture long-range dependencies and alleviate the over-squashing problem, thereby providing novel insights into graph-level tasks.
6
Wang HE, Triebkorn P, Breyton M, Dollomaja B, Lemarechal JD, Petkoski S, Sorrentino P, Depannemaecker D, Hashemi M, Jirsa VK. Virtual brain twins: from basic neuroscience to clinical use. Natl Sci Rev 2024; 11:nwae079. [PMID: 38698901 PMCID: PMC11065363 DOI: 10.1093/nsr/nwae079] [Received: 09/25/2023] [Revised: 02/05/2024] [Accepted: 02/20/2024] [Indexed: 05/05/2024]
Abstract
Virtual brain twins are personalized, generative and adaptive brain models based on data from an individual's brain, intended for scientific and clinical use. After describing the key elements of virtual brain twins, we present the standard model for personalized whole-brain network models. Personalization is accomplished using a subject's brain imaging data in three ways: (1) assembling cortical and subcortical areas in the subject-specific brain space; (2) directly mapping connectivity into the brain models, an approach that can be generalized to other parameters; and (3) estimating relevant parameters through model inversion, typically using probabilistic machine learning. We present the use of personalized whole-brain network models in healthy ageing and five clinical conditions: epilepsy, Alzheimer's disease, multiple sclerosis, Parkinson's disease and psychiatric disorders. Specifically, we introduce spatial masks for relevant parameters and demonstrate their use based on physiological and pathophysiological hypotheses. Finally, we pinpoint the key challenges and future directions.
Affiliation(s)
- Huifang E Wang
  - Aix Marseille Université, Institut National de la Santé et de la Recherche Médicale, Institut de Neurosciences des Systèmes (INS) UMR1106; Marseille 13005, France
- Paul Triebkorn
  - Aix Marseille Université, Institut National de la Santé et de la Recherche Médicale, Institut de Neurosciences des Systèmes (INS) UMR1106; Marseille 13005, France
- Martin Breyton
  - Aix Marseille Université, Institut National de la Santé et de la Recherche Médicale, Institut de Neurosciences des Systèmes (INS) UMR1106; Marseille 13005, France
  - Service de Pharmacologie Clinique et Pharmacosurveillance, AP–HM, Marseille, 13005, France
- Borana Dollomaja
  - Aix Marseille Université, Institut National de la Santé et de la Recherche Médicale, Institut de Neurosciences des Systèmes (INS) UMR1106; Marseille 13005, France
- Jean-Didier Lemarechal
  - Aix Marseille Université, Institut National de la Santé et de la Recherche Médicale, Institut de Neurosciences des Systèmes (INS) UMR1106; Marseille 13005, France
- Spase Petkoski
  - Aix Marseille Université, Institut National de la Santé et de la Recherche Médicale, Institut de Neurosciences des Systèmes (INS) UMR1106; Marseille 13005, France
- Pierpaolo Sorrentino
  - Aix Marseille Université, Institut National de la Santé et de la Recherche Médicale, Institut de Neurosciences des Systèmes (INS) UMR1106; Marseille 13005, France
- Damien Depannemaecker
  - Aix Marseille Université, Institut National de la Santé et de la Recherche Médicale, Institut de Neurosciences des Systèmes (INS) UMR1106; Marseille 13005, France
- Meysam Hashemi
  - Aix Marseille Université, Institut National de la Santé et de la Recherche Médicale, Institut de Neurosciences des Systèmes (INS) UMR1106; Marseille 13005, France
- Viktor K Jirsa
  - Aix Marseille Université, Institut National de la Santé et de la Recherche Médicale, Institut de Neurosciences des Systèmes (INS) UMR1106; Marseille 13005, France
7
Ahmadi M, Alizadeh B, Ayyoubzadeh SM, Abiyarghamsari M. Predicting Pharmacokinetics of Drugs Using Artificial Intelligence Tools: A Systematic Review. Eur J Drug Metab Pharmacokinet 2024; 49:249-262. [PMID: 38457092 DOI: 10.1007/s13318-024-00883-7] [Accepted: 01/29/2024] [Indexed: 03/09/2024]
Abstract
BACKGROUND AND OBJECTIVE Pharmacokinetic studies examine the absorption, distribution, metabolism, and excretion of bioactive compounds. The pharmacokinetics of drugs exert a substantial influence on their efficacy and safety, so investigating pharmacokinetics is of great importance. However, laboratory-based assessment requires numerous animals, various materials, and significant time. To mitigate these challenges, alternative methods such as artificial intelligence have emerged as a promising approach. This systematic review examines existing studies on the application of artificial intelligence tools to predicting the pharmacokinetics of drugs. METHODS A predefined search strategy based on related keywords was used to search several databases (PubMed, Scopus, Web of Science). The process involved merging the retrieved articles, removing duplicates, and screening articles by title, abstract, and full text. Articles were selected according to inclusion and exclusion criteria, and the quality of the included articles was then assessed using an appraisal tool. RESULTS Ultimately, 23 relevant articles were included in this study. Clearance was the most frequently investigated pharmacokinetic parameter, followed by the area under the concentration-time curve (AUC). Among the various models employed in the articles, Random Forest and eXtreme Gradient Boosting (XGBoost) were the most commonly used. Generalized Linear Models and Elastic Nets (GLMnet) and Random Forest models showed the best performance in predicting clearance. CONCLUSION Overall, artificial intelligence tools offer a robust, rapid, and precise means of predicting various pharmacokinetic parameters from datasets containing patient or drug information.
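Clearance and AUC, the two most studied prediction targets here, are linked by a standard relation. A minimal sketch of the usual non-compartmental calculation after an intravenous bolus, CL = dose / AUC(0-inf), assuming a linear-trapezoidal AUC and a log-linear extrapolation of the terminal phase; it illustrates the target quantities only, not any AI model from the reviewed studies, and the function names and synthetic data are hypothetical.

```python
import math


def auc_trapezoid(times, conc):
    """AUC from the first to the last sample by the linear trapezoidal rule."""
    return sum((t1 - t0) * (c0 + c1) / 2.0
               for t0, t1, c0, c1 in zip(times, times[1:], conc, conc[1:]))


def terminal_rate_constant(times, conc, n_points=3):
    """Elimination rate k_el from a least-squares fit of ln(C) vs t on the last points."""
    ts = times[-n_points:]
    logs = [math.log(c) for c in conc[-n_points:]]
    t_mean = sum(ts) / n_points
    l_mean = sum(logs) / n_points
    slope = sum((t - t_mean) * (l - l_mean) for t, l in zip(ts, logs)) / sum(
        (t - t_mean) ** 2 for t in ts
    )
    return -slope


def clearance_iv_bolus(dose, times, conc):
    """Non-compartmental clearance CL = dose / AUC(0-inf) after an IV bolus."""
    k_el = terminal_rate_constant(times, conc)
    auc_inf = auc_trapezoid(times, conc) + conc[-1] / k_el  # extrapolated tail C_last / k_el
    return dose / auc_inf
```

For a synthetic one-compartment profile C(t) = 10 exp(-0.2 t) sampled every 0.25 h for 24 h, a dose of 100 gives CL close to 2 (in consistent units), matching the analytic value k_el × V = 0.2 × 10.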
Affiliation(s)
- Mahnaz Ahmadi
  - Student Research Committee, School of Pharmacy, Shahid Beheshti University of Medical Sciences, Tehran, Iran
  - Medical Nanotechnology and Tissue Engineering Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Bahareh Alizadeh
  - Protein Technology Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Seyed Mohammad Ayyoubzadeh
  - Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
  - Health Information Management Research Center, Tehran University of Medical Sciences, Tehran, Iran
- Mahdiye Abiyarghamsari
  - Department of Clinical Pharmacy, School of Pharmacy, Shahid Beheshti University of Medical Sciences, Tehran, 1991953381, Iran
8
Smart SE, Welakuh DM, Narang P. Many-Body Excited States with a Contracted Quantum Eigensolver. J Chem Theory Comput 2024. [PMID: 38693607 DOI: 10.1021/acs.jctc.4c00030] [Indexed: 05/03/2024]
Abstract
Calculating ground and excited states is an exciting prospect for near-term quantum computing applications, and accurate and efficient algorithms are needed to assess viable directions. We develop an excited-state approach based on the contracted quantum eigensolver (ES-CQE), which iteratively attempts to solve a contraction of the Schrödinger equation projected onto a subspace and does not require a priori information about the system. We focus on the anti-Hermitian portion of the equation, which leads to a two-body unitary ansatz. We investigate the role of symmetries, initial states, constraints, and overall performance in the context of the strongly correlated rectangular H4 model system. We show that the ES-CQE achieves near-exact accuracy for the majority of states, covering regions of both strong and weak electron correlation, while also elucidating challenging instances for the two-body unitary ansatz.
Affiliation(s)
- Scott E Smart
  - College of Letters and Science, Physical Sciences Division, University of California, Los Angeles, California 90095, United States
- Davis M Welakuh
  - College of Letters and Science, Physical Sciences Division, University of California, Los Angeles, California 90095, United States
- Prineha Narang
  - College of Letters and Science, Physical Sciences Division, University of California, Los Angeles, California 90095, United States
9
Ni HC, Yuan R, Zhang J, Zuo JM. Framework of compressive sensing and data compression for 4D-STEM. Ultramicroscopy 2024; 259:113938. [PMID: 38359632 DOI: 10.1016/j.ultramic.2024.113938] [Received: 08/10/2023] [Revised: 01/28/2024] [Accepted: 02/08/2024] [Indexed: 02/17/2024]
Abstract
Four-dimensional scanning transmission electron microscopy (4D-STEM) is a powerful technique for high-resolution and high-precision materials characterization at multiple length scales, including the characterization of beam-sensitive materials. However, the field of view of 4D-STEM is relatively small; in the absence of live processing, it is limited by the size of the data that must be stored. Furthermore, the rectilinear scan approach currently employed in 4D-STEM imposes a resolution- and signal-dependent dose limit on the study of beam-sensitive materials. Improving 4D-STEM data and dose efficiency, by keeping the data size manageable while limiting the electron dose, is thus critical for broader applications. Here we introduce a general method for reconstructing 4D-STEM data with high fidelity from subsampling in both real and reciprocal space. The approach is first tested on subsampled datasets created from a full 4D-STEM dataset and then demonstrated experimentally using a random scan in real space. For crystalline samples, the same reconstruction algorithm can also be used to compress 4D-STEM datasets, reducing the data size by a factor of 100 or more while retaining the fine features of 4D-STEM imaging.
Affiliation(s)
- Hsu-Chih Ni
  - Department of Materials Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
  - Materials Research Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Renliang Yuan
  - Intel Corporation, Corporate Quality Network, Hillsboro, OR 97124, USA
- Jiong Zhang
  - Intel Corporation, Corporate Quality Network, Hillsboro, OR 97124, USA
- Jian-Min Zuo
  - Department of Materials Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
  - Materials Research Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
10
Quetin S, Bahoric B, Maleki F, Enger SA. Deep learning for high-resolution dose prediction in high dose rate brachytherapy for breast cancer treatment. Phys Med Biol 2024; 69:105011. [PMID: 38604185 DOI: 10.1088/1361-6560/ad3dbd] [Received: 12/04/2023] [Accepted: 04/11/2024] [Indexed: 04/13/2024]
Abstract
Objective. Monte Carlo (MC) simulations are the benchmark for accurate radiotherapy dose calculations, notably in patient-specific high dose rate brachytherapy (HDR BT), where accounting for tissue heterogeneities is critical. However, their lengthy computation time limits the practical application of MC simulations. Prior research used deep learning (DL) for dose prediction as an alternative to MC simulations. Although dose predictions with MC-like accuracy were attained, graphics processing unit (GPU) limitations constrained these predictions to large 3 mm × 3 mm × 3 mm voxels. This study aimed to enable dose predictions as accurate as MC simulations in 1 mm × 1 mm × 1 mm voxels within a clinically acceptable timeframe. Approach. Computed tomography scans of 98 breast cancer patients treated with iridium-192-based HDR BT were used: 70 for training, 14 for validation, and 14 for testing. A new cropping strategy based on the distance to the seed was devised to reduce the volume size, enabling efficient training of 3D DL models on 1 mm × 1 mm × 1 mm dose grids. Additionally, novel DL architectures with layer-level fusion were proposed to predict the MC-simulated dose to medium-in-medium (Dm,m). These architectures fuse the TG-43 dose to water-in-water (Dw,w) with patient tissue composition at the layer level. Different inputs describing patient body composition were investigated. Main results. The proposed approach demonstrated state-of-the-art performance, on par with the MC Dm,m maps but 300 times faster. The mean absolute percent error for dosimetric indices between the MC and DL-predicted complete treatment plans was 0.17% ± 0.15% for the planning target volume V100, 0.30% ± 0.32% for the skin D2cc, 0.82% ± 0.79% for the lung D2cc, 0.34% ± 0.29% for the chest wall D2cc, and 1.08% ± 0.98% for the heart D2cc. Significance. Unlike time-consuming MC simulations, the proposed strategy efficiently converts TG-43 Dw,w maps into precise Dm,m maps at high resolution, enabling clinical integration.
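The dosimetric comparison is reported as a mean absolute percent error with a spread (mean ± SD) for each index. A minimal sketch of that arithmetic, assuming one MC reference value and one DL prediction per test plan for a given index, and assuming a sample standard deviation (the abstract does not state which SD definition was used); the numbers in the usage line are hypothetical, not data from the study.

```python
import statistics


def absolute_percent_errors(reference, predicted):
    """Per-plan |predicted - reference| / |reference| * 100 for one dosimetric index."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    return [abs(p - r) / abs(r) * 100.0 for r, p in zip(reference, predicted)]


def mape_with_sd(reference, predicted):
    """Return (mean, sample standard deviation) of the per-plan absolute percent errors."""
    errors = absolute_percent_errors(reference, predicted)
    return statistics.mean(errors), statistics.stdev(errors)
```

For instance, `mape_with_sd([100.0, 100.0], [102.0, 96.0])` returns a mean of 3.0% with an SD of about 1.41%. Because errors are normalized per plan before averaging, plans with small reference doses weigh as much as plans with large ones.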
Affiliation(s)
- Sébastien Quetin
  - Medical Physics Unit, Department of Oncology, McGill University, Montreal, QC, Canada
  - Montreal Institute for Learning Algorithms, Mila, Montreal, QC, Canada
- Boris Bahoric
  - Department of Radiation Oncology, Jewish General Hospital, McGill University, Montreal, QC, Canada
- Farhad Maleki
  - Department of Computer Science, University of Calgary, Calgary, AB, Canada
  - Department of Diagnostic Radiology, McGill University, Montreal, QC, Canada
  - Department of Radiology, University of Florida, Gainesville, FL, United States of America
- Shirin A Enger
  - Medical Physics Unit, Department of Oncology, McGill University, Montreal, QC, Canada
  - Montreal Institute for Learning Algorithms, Mila, Montreal, QC, Canada
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC, Canada
11
Harris SB, Biswas A, Yun SJ, Roccapriore KM, Rouleau CM, Puretzky AA, Vasudevan RK, Geohegan DB, Xiao K. Autonomous Synthesis of Thin Film Materials with Pulsed Laser Deposition Enabled by In Situ Spectroscopy and Automation. Small Methods 2024:e2301763. [PMID: 38678523 DOI: 10.1002/smtd.202301763] [Received: 12/20/2023] [Revised: 04/10/2024] [Indexed: 05/01/2024]
Abstract
Autonomous systems that combine synthesis, characterization, and artificial intelligence can greatly accelerate the discovery and optimization of materials; however, platforms for the growth of macroscale thin films by physical vapor deposition techniques have lagged far behind others. This study demonstrates autonomous synthesis by pulsed laser deposition (PLD), a highly versatile synthesis technique, in the growth of ultrathin WSe2 films. By combining the automation of PLD synthesis and in situ diagnostic feedback with a high-throughput methodology, this study demonstrates a workflow and platform that uses Gaussian process regression and Bayesian optimization to autonomously identify growth regimes for WSe2 films based on Raman spectral criteria, efficiently sampling only 0.25% of the chosen 4D parameter space. With throughput at least 10 times higher than that of traditional PLD workflows, this platform and workflow enable the accelerated discovery and autonomous optimization of the vast number of materials that can be synthesized by PLD.
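Gaussian process regression and Bayesian optimization decide which growth condition to try next. A minimal NumPy sketch of such a loop, under stated assumptions: a squared-exponential kernel, a single normalized search variable instead of the study's 4D parameter space, an upper-confidence-bound acquisition (the abstract does not name the acquisition function), and a hypothetical `objective` standing in for a Raman-based growth score. Function names and hyperparameters are illustrative, not the platform's code.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.2, variance=1.0):
    """Squared-exponential (RBF) kernel between 1D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_query."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    chol = np.linalg.cholesky(k_tt)  # k_tt = chol @ chol.T
    weights = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_qt @ weights
    v = np.linalg.solve(chol, k_qt.T)
    prior_var = np.diag(rbf_kernel(x_query, x_query))
    var = prior_var - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def next_sample_ucb(x_train, y_train, candidates, beta=2.0):
    """Choose the candidate that maximises the upper confidence bound mean + beta * std."""
    mean, std = gp_posterior(x_train, y_train, candidates)
    return candidates[np.argmax(mean + beta * std)]


def bayes_opt(objective, n_init=3, n_iter=10, seed=0):
    """Hypothetical 1D optimisation loop over a fixed grid of candidate settings."""
    rng = np.random.default_rng(seed)
    candidates = np.linspace(0.0, 1.0, 201)
    x = rng.choice(candidates, size=n_init, replace=False)
    y = np.array([objective(v) for v in x])
    for _ in range(n_iter):
        x_next = next_sample_ucb(x, y, candidates)  # stands in for "run the next growth"
        x = np.append(x, x_next)
        y = np.append(y, objective(x_next))
    return x[np.argmax(y)], y.max()
```

The `beta` term trades exploration (high posterior uncertainty) against exploitation (high predicted score). This trade-off is what lets such a loop reach good conditions while evaluating only a small fraction of the candidate grid.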
Affiliation(s)
- Sumner B Harris
  - Center for Nanophase Materials Sciences, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Arpan Biswas
  - Center for Nanophase Materials Sciences, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Seok Joon Yun
  - Center for Nanophase Materials Sciences, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Kevin M Roccapriore
  - Center for Nanophase Materials Sciences, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Christopher M Rouleau
  - Center for Nanophase Materials Sciences, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Alexander A Puretzky
  - Center for Nanophase Materials Sciences, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Rama K Vasudevan
  - Center for Nanophase Materials Sciences, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- David B Geohegan
  - Center for Nanophase Materials Sciences, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Kai Xiao
  - Center for Nanophase Materials Sciences, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
12
Schwerdtfeger P, Wales DJ. 100 Years of the Lennard-Jones Potential. J Chem Theory Comput 2024. [PMID: 38669689 DOI: 10.1021/acs.jctc.4c00135] [Indexed: 04/28/2024]
Abstract
It is now 100 years since Lennard-Jones published his first paper introducing the now famous potential that bears his name. It is therefore timely to reflect on the many achievements, as well as the limitations, of this potential in the theory of atomic and molecular interactions, where applications range from descriptions of intermolecular forces to molecules, clusters, and condensed matter.
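For reference, the 12-6 form usually quoted for the Lennard-Jones potential is V(r) = 4ε[(σ/r)^12 - (σ/r)^6], with well depth ε and zero crossing at r = σ; its minimum lies at r_min = 2^(1/6)σ, where V = -ε. A small sketch that evaluates the pair energy and the radial force F = -dV/dr in reduced units (ε = σ = 1 by default); the function names are illustrative.

```python
import math


def lennard_jones(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """12-6 pair energy V(r) = 4 * eps * [(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lennard_jones_force(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Radial force F(r) = -dV/dr = (24 * eps / r) * [2 (sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon / r * (2.0 * sr6 * sr6 - sr6)


def r_min(sigma: float = 1.0) -> float:
    """Position of the potential minimum, 2^(1/6) * sigma."""
    return 2.0 ** (1.0 / 6.0) * sigma
```

At `r_min()` the energy evaluates to -1 (that is, -ε) and the force vanishes. For r < r_min the force is positive (repulsive, dominated by the r^-12 term); beyond r_min it is negative (attractive, dominated by the r^-6 dispersion term).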
Affiliation(s)
- Peter Schwerdtfeger
  - Centre for Theoretical Chemistry and Physics, The New Zealand Institute for Advanced Study, Massey University Auckland, Private Bag 102904, Auckland 0745, New Zealand
- David J Wales
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
13
Roche ST, Bayer Q, Carlson BT, Ouligian WC, Serhiayenka P, Stelzer J, Hong TM. Nanosecond anomaly detection with decision trees and real-time application to exotic Higgs decays. Nat Commun 2024; 15:3527. [PMID: 38664390 PMCID: PMC11045859 DOI: 10.1038/s41467-024-47704-8] [Received: 05/23/2023] [Accepted: 04/09/2024] [Indexed: 04/28/2024]
Abstract
We present an interpretable implementation of the autoencoding algorithm, used as an anomaly detector and built with a forest of deep decision trees on field-programmable gate arrays (FPGAs). We consider scenarios at the Large Hadron Collider at CERN, for which the autoencoder is trained using known physical processes of the Standard Model. The design is then deployed in real-time trigger systems for anomaly detection of unknown physical processes, such as rare exotic decays of the Higgs boson. Inference runs with a latency of 30 ns at percent-level resource usage on the Xilinx Virtex UltraScale+ VU9P FPGA. Our method offers low-latency anomaly detection for edge AI users with resource constraints.
Affiliation(s)
- S T Roche
- School of Medicine, Saint Louis University, Saint Louis, MO, USA
- Department of Physics and Astronomy, University of Pittsburgh, Pittsburgh, PA, USA
- Q Bayer
- Department of Physics and Astronomy, University of Pittsburgh, Pittsburgh, PA, USA
- B T Carlson
- Department of Physics and Astronomy, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Physics and Engineering, Westmont College, Santa Barbara, CA, USA
- W C Ouligian
- Department of Physics and Astronomy, University of Pittsburgh, Pittsburgh, PA, USA
- P Serhiayenka
- Department of Physics and Astronomy, University of Pittsburgh, Pittsburgh, PA, USA
- J Stelzer
- Department of Physics and Astronomy, University of Pittsburgh, Pittsburgh, PA, USA
- T M Hong
- Department of Physics and Astronomy, University of Pittsburgh, Pittsburgh, PA, USA
14
van Tilborg D, Brinkmann H, Criscuolo E, Rossen L, Özçelik R, Grisoni F. Deep learning for low-data drug discovery: Hurdles and opportunities. Curr Opin Struct Biol 2024; 86:102818. [PMID: 38669740 DOI: 10.1016/j.sbi.2024.102818] [Received: 01/29/2024] [Revised: 03/27/2024] [Accepted: 03/29/2024] [Indexed: 04/28/2024]
Abstract
Deep learning is becoming increasingly relevant in drug discovery, from de novo design to protein structure prediction and synthesis planning. However, it is often challenged by the small-data regimes typical of certain drug discovery tasks. In such scenarios, deep learning approaches, which are notoriously 'data-hungry', might fail to live up to their promise. Developing novel approaches that leverage the power of deep learning in low-data scenarios is attracting considerable attention, and future developments are expected to propel the field further. This mini-review provides an overview of recent low-data learning approaches in drug discovery, analyzing their hurdles and advantages. Finally, we venture a forecast of future research directions in low-data learning for drug discovery.
Affiliation(s)
- Derek van Tilborg
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands; Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Princetonlaan 6, 3584 CB, Utrecht, the Netherlands. https://twitter.com/DerekvTilborg
- Helena Brinkmann
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands. https://twitter.com/hlnbrkmnn
- Emanuele Criscuolo
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands. https://twitter.com/emanuelecriscu9
- Luke Rossen
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands. https://twitter.com/molecular_ml
- Rıza Özçelik
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands; Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Princetonlaan 6, 3584 CB, Utrecht, the Netherlands. https://twitter.com/Rza_ozcelik
- Francesca Grisoni
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands; Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Princetonlaan 6, 3584 CB, Utrecht, the Netherlands.
15
Doucet M, Candeago R, Wang H, Browning JF, Su X. Studying Transient Phenomena in Thin Films with Reinforcement Learning. J Phys Chem Lett 2024; 15:4444-4450. [PMID: 38626466 DOI: 10.1021/acs.jpclett.4c00467] [Indexed: 04/18/2024]
Abstract
Neutron reflectometry has long been a powerful tool for studying the interfacial properties of energy materials. Recently, time-resolved neutron reflectometry has been used to better understand transient phenomena in electrochemical systems. Such measurements often comprise a large number of reflectivity curves acquired over a narrow q range, each curve having lower information content than a typical steady-state measurement. In this work, we present an approach that leverages existing reinforcement learning tools to model time-resolved data and extract the time evolution of structure parameters. By mapping the reflectivity curves taken at different times to individual states, we use the Soft Actor-Critic algorithm to optimize the time series of structure parameters that best represents the evolution of an electrochemical system. We show that this approach constitutes an elegant solution to the modeling of time-resolved neutron reflectometry data.
Affiliation(s)
- Mathieu Doucet
- Neutron Scattering Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- Riccardo Candeago
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Hanyu Wang
- Center for Nanophase Materials Science, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- James F Browning
- Neutron Scattering Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- Xiao Su
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
16
Ge F, Wang R, Qu C, Zheng P, Nandi A, Conte R, Houston PL, Bowman JM, Dral PO. Tell Machine Learning Potentials What They Are Needed For: Simulation-Oriented Training Exemplified for Glycine. J Phys Chem Lett 2024; 15:4451-4460. [PMID: 38626460 DOI: 10.1021/acs.jpclett.4c00746] [Indexed: 04/18/2024]
Abstract
Machine learning potentials (MLPs) are widely applied as an efficient alternative for representing potential energy surfaces (PESs) in many chemical simulations. MLPs are often evaluated by the root-mean-square error on a test set drawn from the same distribution as the training data. Here, we systematically investigate the relationship between such test errors and the simulation accuracy of MLPs, using the example of a full-dimensional, global PES for the amino acid glycine. Our results show that test-set errors do not unambiguously reflect MLP performance in different simulation tasks, such as relative conformer energies, barriers, vibrational levels, and zero-point vibrational energies. We also offer an easily accessible solution for improving MLP quality in a simulation-oriented manner, yielding the most precise relative conformer energies and barriers. This solution also passed a stringent test based on diffusion Monte Carlo simulations.
Affiliation(s)
- Fuchun Ge
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Ran Wang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Chen Qu
- Independent Researcher, Toronto, Ontario M9B0E3, Canada
- Peikun Zheng
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Apurba Nandi
- Department of Chemistry and Cherry L. Emerson Center for Scientific Computation, Emory University, Atlanta, Georgia 30322, United States
- Department of Physics and Materials Science, University of Luxembourg, Luxembourg City L-1511, Luxembourg
- Riccardo Conte
- Dipartimento di Chimica, Università degli Studi di Milano, via Golgi 19, 20133 Milano, Italy
- Paul L Houston
- Department of Chemistry and Chemical Biology, Cornell University, Ithaca, New York 14853, United States
- Joel M Bowman
- Department of Chemistry and Cherry L. Emerson Center for Scientific Computation, Emory University, Atlanta, Georgia 30322, United States
- Pavlo O Dral
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
17
Boldini D, Friedrich L, Kuhn D, Sieber SA. Machine Learning Assisted Hit Prioritization for High Throughput Screening in Drug Discovery. ACS Cent Sci 2024; 10:823-832. [PMID: 38680560 PMCID: PMC11046457 DOI: 10.1021/acscentsci.3c01517] [Received: 12/07/2023] [Revised: 03/01/2024] [Accepted: 03/01/2024] [Indexed: 05/01/2024]
Abstract
Efficient prioritization of bioactive compounds from high throughput screening campaigns is a fundamental challenge for accelerating drug development efforts. In this study, we present the first data-driven approach to simultaneously detect assay interferents and prioritize true bioactive compounds. By analyzing the learning dynamics during training of a gradient boosting model on noisy high throughput screening data using a novel formulation of sample influence, we are able to distinguish between compounds exhibiting the desired biological response and those producing assay artifacts. Therefore, our method enables false positive and true positive detection without relying on prior screens or assay interference mechanisms, making it applicable to any high throughput screening campaign. We demonstrate that our approach consistently excludes assay interferents with different mechanisms and prioritizes biologically relevant compounds more efficiently than all tested baselines, including a retrospective case study simulating its use in a real drug discovery campaign. Finally, our tool is extremely computationally efficient, requiring less than 30 s per assay on low-resource hardware. As such, our findings show that our method is an ideal addition to existing false positive detection tools and can be used to guide further pharmacological optimization after high throughput screening campaigns.
Affiliation(s)
- Davide Boldini
- TUM School of Natural Sciences, Department of Bioscience, Center for Functional Protein Assemblies (CPA), Technical University of Munich, 85748 Garching bei München, Germany
- Lukas Friedrich
- The Healthcare business of Merck KGaA, 64293 Darmstadt, Germany
- Daniel Kuhn
- The Healthcare business of Merck KGaA, 64293 Darmstadt, Germany
- Stephan A. Sieber
- TUM School of Natural Sciences, Department of Bioscience, Center for Functional Protein Assemblies (CPA), Technical University of Munich, 85748 Garching bei München, Germany
18
Truex N, Mohapatra S, Melo M, Rodriguez J, Li N, Abraham W, Sementa D, Touti F, Keskin DB, Wu CJ, Irvine DJ, Gómez-Bombarelli R, Pentelute BL. Design of Cytotoxic T Cell Epitopes by Machine Learning of Human Degrons. ACS Cent Sci 2024; 10:793-802. [PMID: 38680558 PMCID: PMC11046456 DOI: 10.1021/acscentsci.3c01544] [Received: 12/11/2023] [Revised: 02/13/2024] [Accepted: 02/16/2024] [Indexed: 05/01/2024]
Abstract
Antigen processing is critical for therapeutic vaccines to generate epitopes for priming cytotoxic T cell responses against cancer and pathogens, but insufficient processing often limits the quantity of epitopes released. We address this challenge using machine learning to ascribe a proteasomal degradation score to epitope sequences. Epitopes with varying scores were translocated into cells using nontoxic anthrax proteins. Epitopes with a low score show pronounced immunogenicity due to antigen processing, but epitopes with a high score show limited immunogenicity. This work sheds light on the sequence-activity relationships between proteasomal degradation and epitope immunogenicity. We anticipate that future efforts to incorporate proteasomal degradation signals into vaccine designs will lead to enhanced cytotoxic T cell priming by these vaccines in clinical settings.
Affiliation(s)
- Nicholas L. Truex
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry and Biochemistry, University of South Carolina, Columbia, South Carolina 29208, United States
- Somesh Mohapatra
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Machine Intelligence and Manufacturing Operations Group, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Mariane Melo
- The Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, United States
- Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology, and Harvard University, Cambridge, Massachusetts 02139, United States
- Jacob Rodriguez
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Na Li
- The Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, United States
- Wuhbet Abraham
- The Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, United States
- Deborah Sementa
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Faycal Touti
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Derin B. Keskin
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, United States
- Harvard Medical School, Boston, Massachusetts 02115, United States
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, United States
- Translational Immunogenomics Laboratory (TIGL), Dana-Farber Cancer Institute, Boston, Massachusetts 02215, United States
- Department of Computer Science, Metropolitan College, Boston University, Boston, Massachusetts 02215, United States
- Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, Lyngby DK-2800, Denmark
- Catherine J. Wu
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, United States
- Harvard Medical School, Boston, Massachusetts 02115, United States
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, United States
- Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts 02115, United States
- Darrell J. Irvine
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- The Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, United States
- Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology, and Harvard University, Cambridge, Massachusetts 02139, United States
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Howard Hughes Medical Institute, Chevy Chase, Maryland 20815, United States
- Rafael Gómez-Bombarelli
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Bradley L. Pentelute
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- The Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, United States
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, United States
- Center for Environmental Health Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
19
Margraf JT. Neural graph distance embedding for molecular geometry generation. J Comput Chem 2024. [PMID: 38655845 DOI: 10.1002/jcc.27349] [Received: 11/30/2023] [Revised: 03/05/2024] [Accepted: 03/08/2024] [Indexed: 04/26/2024]
Abstract
This article introduces neural graph distance embedding (nGDE), a method for generating 3D molecular geometries. Leveraging a graph neural network trained on the OE62 dataset of molecular geometries, nGDE predicts interatomic distances from molecular graphs. These distances are then used in multidimensional scaling to produce 3D geometries, which are subsequently refined with standard bioorganic force fields. The machine-learned graph distance introduced herein is found to improve on the conventional shortest-path distances used in graph drawing. Comparative analysis with a state-of-the-art distance geometry method demonstrates nGDE's competitive performance, notably its robustness in handling polycyclic molecules, which are a challenge for existing methods.
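The geometry step described in this abstract (distances in, 3D coordinates out) can be illustrated with classical (Torgerson) multidimensional scaling, one standard MDS variant. Whether nGDE uses this exact variant is an assumption, and the sketch is not nGDE itself. It takes a distance matrix, computed here from known points rather than predicted by a graph neural network, and double-centers the squared distances into a Gram matrix. It then keeps the three leading eigenpairs, computed with a small Jacobi eigensolver so that no third-party libraries are needed. All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def jacobi_eigh(a: Matrix, tol: float = 1e-20, max_sweeps: int = 50):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors); eigenvector k is column k of the
    returned matrix. Stops when the off-diagonal mass is negligible relative
    to the Frobenius norm, which the rotations leave unchanged.
    """
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    scale = sum(x * x for row in a for x in row) or 1.0
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= tol * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # angle with tan(2*theta) = 2 a_pq / (a_qq - a_pp) zeroes a_pq
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def classical_mds(d: Matrix, dim: int = 3) -> Matrix:
    """Embed points in `dim` dimensions from a matrix of pairwise distances."""
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    grand_mean = sum(row_mean) / n
    # double centering: B = -1/2 J D^2 J is the Gram matrix of centered coordinates
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    vals, vecs = jacobi_eigh(b)
    order = sorted(range(n), key=lambda k: vals[k], reverse=True)[:dim]
    coords = [[0.0] * dim for _ in range(n)]
    for col, k in enumerate(order):
        root = math.sqrt(max(vals[k], 0.0))  # clip tiny negative eigenvalues
        for i in range(n):
            coords[i][col] = root * vecs[i][k]
    return coords


def pairwise_distances(x: Matrix) -> Matrix:
    return [[math.dist(p, q) for q in x] for p in x]


# Usage: distances from known points stand in for nGDE's predicted distances.
points = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.7], [1.0, 1.0, 1.0]]
d = pairwise_distances(points)
xyz = classical_mds(d, dim=3)
err = max(abs(x - y) for ra, rb in zip(d, pairwise_distances(xyz)) for x, y in zip(ra, rb))
print(f"max distance error: {err:.1e}")
```

Coordinates are recovered only up to rotation, reflection, and translation, because distances carry no orientation. With imperfect (predicted) distances, the discarded eigenvalues are no longer zero and the embedding is only approximate, which is one reason a force-field refinement step follows.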
Affiliation(s)
- Johannes T Margraf
- Bavarian Center for Battery Technology (BayBatt), University of Bayreuth, Bayreuth, Germany
20
Gangwal A, Lavecchia A. Unlocking the potential of generative AI in drug discovery. Drug Discov Today 2024:103992. [PMID: 38663579 DOI: 10.1016/j.drudis.2024.103992] [Received: 02/09/2024] [Revised: 03/22/2024] [Accepted: 04/18/2024] [Indexed: 05/04/2024]
Abstract
Artificial intelligence (AI) is revolutionizing drug discovery by enhancing precision, reducing timelines and costs, and enabling AI-driven computer-aided drug design. This review focuses on recent advancements in deep generative models (DGMs) for de novo drug design, exploring diverse algorithms and their profound impact. It critically analyses the challenges that are intricately interwoven into these technologies, proposing strategies to unlock their full potential. It features case studies of both successes and failures in advancing drugs to clinical trials with AI assistance. Last, it outlines a forward-looking plan for optimizing DGMs in de novo drug design, thereby fostering faster and more cost-effective drug development.
Affiliation(s)
- Amit Gangwal
- Department of Natural Product Chemistry, Shri Vile Parle Kelavani Mandal's Institute of Pharmacy, Dhule 424001, Maharashtra, India
- Antonio Lavecchia
- "Drug Discovery" Laboratory, Department of Pharmacy, University of Naples Federico II, I-80131 Naples, Italy
21
Shakiba M, Akimov AV. Machine-Learned Kohn-Sham Hamiltonian Mapping for Nonadiabatic Molecular Dynamics. J Chem Theory Comput 2024; 20:2992-3007. [PMID: 38581699 DOI: 10.1021/acs.jctc.4c00008] [Indexed: 04/08/2024]
Abstract
In this work, we report a simple, efficient, and scalable machine-learning (ML) approach for mapping non-self-consistent Kohn-Sham Hamiltonians constructed with one kind of density functional to the nearly self-consistent Hamiltonians constructed with another kind of density functional. This approach is designed as a fast surrogate Hamiltonian calculator for use in long nonadiabatic dynamics simulations of large atomistic systems. In this approach, the input and output features are Hamiltonian matrices computed at different levels of theory. We demonstrate that the developed ML-based Hamiltonian mapping method (1) speeds up the calculations by several orders of magnitude, (2) is conceptually simpler than alternative ML approaches, (3) is applicable to different systems and sizes and can be used for mapping Hamiltonians constructed with arbitrary density functionals, (4) requires modest training data, learns fast, and generates molecular orbitals and their energies with an accuracy nearly matching that of conventional calculations, and (5) when applied to nonadiabatic dynamics simulations of excitation energy relaxation in large systems, yields the corresponding time scales within the margin of error of the conventional calculations. Using this approach, we explore the excitation energy relaxation in C60 fullerene and Si75H64 quantum dot structures and derive qualitative and quantitative insights into the dynamics of these systems.
Affiliation(s)
- Mohammad Shakiba
- Department of Chemistry, University at Buffalo, The State University of New York, Buffalo, New York 14260, United States
- Alexey V Akimov
- Department of Chemistry, University at Buffalo, The State University of New York, Buffalo, New York 14260, United States
22
France-Lanord A, Vroylandt H, Salanne M, Rotenberg B, Saitta AM, Pietrucci F. Data-Driven Path Collective Variables. J Chem Theory Comput 2024; 20:3069-3084. [PMID: 38619076 DOI: 10.1021/acs.jctc.4c00123] [Indexed: 04/16/2024]
Abstract
Identifying optimal collective variables to model transformations using atomic-scale simulations is a long-standing challenge. We propose a new method for the generation, optimization, and comparison of collective variables that can be thought of as a data-driven generalization of the path collective variable concept. It consists of a kernel ridge regression of the committor probability, which encodes a transformation's progress. The resulting collective variable is one-dimensional, interpretable, and differentiable, making it appropriate for enhanced sampling simulations requiring biasing. We demonstrate the validity of the method on two different applications: a precipitation model and the association of Li+ and F- in water. For the former, we show that global descriptors such as the permutation invariant vector reach an accuracy far beyond that achieved with simpler, more intuitive variables. For the latter, we show that information correlated with the transformation mechanism is contained in the first solvation shell only and that inertial effects prevent the derivation of optimal collective variables from the atomic positions alone.
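The regression at the heart of this method can be illustrated with plain kernel ridge regression. The sketch below is a generic illustration built on assumptions, not the authors' implementation. The descriptor is a single hypothetical coordinate s, the "committor" labels are synthetic values drawn from a sigmoid, and the Gaussian kernel width `gamma` and ridge parameter `lam` are arbitrary choices. It solves (K + λI)α = q and returns both the fitted value and its analytic gradient. That gradient is what makes a variable of this kind usable for biased sampling.

```python
import math
from typing import List, Sequence


def rbf(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """Gaussian (RBF) kernel exp(-gamma * |x - y|^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


class KernelRidge:
    """Kernel ridge regression: minimizes sum_i (f(x_i) - q_i)^2 + lam * ||f||^2."""

    def __init__(self, gamma: float = 1.0, lam: float = 1e-3):
        self.gamma, self.lam = gamma, lam

    def fit(self, xs, qs):
        self.xs = [list(x) for x in xs]
        n = len(self.xs)
        k = [[rbf(self.xs[i], self.xs[j], self.gamma) + (self.lam if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        self.alpha = solve(k, list(qs))  # (K + lam I) alpha = q
        return self

    def predict(self, x) -> float:
        return sum(a * rbf(x, xi, self.gamma) for a, xi in zip(self.alpha, self.xs))

    def gradient(self, x) -> List[float]:
        # d/dx exp(-g|x - xi|^2) = -2 g (x - xi) exp(-g|x - xi|^2)
        grad = [0.0] * len(x)
        for a, xi in zip(self.alpha, self.xs):
            w = -2.0 * self.gamma * a * rbf(x, xi, self.gamma)
            for d in range(len(x)):
                grad[d] += w * (x[d] - xi[d])
        return grad


# Synthetic example: committor-like labels along one hypothetical descriptor s.
train_s = [[-2.0 + 0.25 * i] for i in range(17)]
train_q = [1.0 / (1.0 + math.exp(-3.0 * s[0])) for s in train_s]
model = KernelRidge(gamma=1.0, lam=1e-4).fit(train_s, train_q)
print(round(model.predict([0.0]), 3))  # close to 0.5, the transition region
print(model.gradient([0.0])[0] > 0)    # True: the fitted variable rises from reactant to product
```

In the published method the inputs are atomic-scale descriptors (for example, the permutation invariant vector) rather than a single coordinate; the sketch only shows the regression and differentiation steps.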
Affiliation(s)
- Arthur France-Lanord
- Institut des Sciences du Calcul et des Données, ISCD, Sorbonne Université, F-75005 Paris, France
- Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Sorbonne Université, F-75005 Paris, France
- Hadrien Vroylandt
- Institut des Sciences du Calcul et des Données, ISCD, Sorbonne Université, F-75005 Paris, France
- Mathieu Salanne
- Physicochimie des Électrolytes et Nanosystèmes Interfaciaux, Sorbonne Université, CNRS, 4 Place Jussieu, F-75005 Paris, France
- Institut Universitaire de France (IUF), 75231 Paris, France
- Benjamin Rotenberg
- Physicochimie des Électrolytes et Nanosystèmes Interfaciaux, Sorbonne Université, CNRS, 4 Place Jussieu, F-75005 Paris, France
- A Marco Saitta
- Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Sorbonne Université, F-75005 Paris, France
- Fabio Pietrucci
- Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Sorbonne Université, F-75005 Paris, France
23
Wang Y, Chen H, Xie L, Liu J, Zhang L, Yu J. Swarm Autonomy: From Agent Functionalization to Machine Intelligence. Adv Mater 2024:e2312956. [PMID: 38653192 DOI: 10.1002/adma.202312956] [Received: 11/30/2023] [Revised: 04/17/2024] [Indexed: 04/25/2024]
Abstract
Swarm behaviors are common in nature, where individual organisms collaborate via perception, communication, and adaptation. Emulating these dynamics, large groups of active agents can self-organize through localized interactions, giving rise to complex swarm behaviors with potential applications across various domains. This review presents a comprehensive summary of, and perspective on, synthetic swarms, to bridge the gap between microscale individual agents and the potential applications of synthetic swarms. We begin by examining active agents, the fundamental units of synthetic swarms, to understand the origins of their motility and functionality in the presence of external stimuli. We then summarize the inter-agent and agent-environment communications that contribute to swarm generation. Furthermore, we review the swarm behaviors reported to date and the emergence of machine intelligence within these behaviors. Finally, we summarize the applications enabled by distinct synthetic swarms. By discussing the emergent machine intelligence in swarm behaviors, we offer insights into the design and deployment of autonomous synthetic swarms for real-world applications.
Affiliation(s)
- Yibin Wang
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, 518172, China
- Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, 518172, China
- Hui Chen
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, 518172, China
- Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, 518172, China
- Leiming Xie
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, 518172, China
- Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, 518172, China
- Jinbo Liu
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, 518172, China
- Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, 518172, China
- Li Zhang
- Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Hong Kong, 999077, China
- Jiangfan Yu
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, 518172, China
- Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, 518172, China
24
Schmidt B, Hildebrandt A. From GPUs to AI and quantum: three waves of acceleration in bioinformatics. Drug Discov Today 2024; 29:103990. [PMID: 38663581 DOI: 10.1016/j.drudis.2024.103990] [Received: 12/13/2023] [Revised: 04/05/2024] [Accepted: 04/17/2024] [Indexed: 05/01/2024]
Abstract
The enormous growth in the amount of data generated by the life sciences is continuously shifting the field from model-driven science towards data-driven science. The need for efficient processing has led to the adoption of massively parallel accelerators such as graphics processing units (GPUs). Consequently, the development of bioinformatics methods nowadays often heavily depends on the effective use of these powerful technologies. Furthermore, progress in computational techniques and architectures continues to be highly dynamic, involving novel deep neural network models and artificial intelligence (AI) accelerators, and potentially quantum processing units in the future. These are expected to be disruptive for the life sciences as a whole and for drug discovery in particular. Here, we identify three waves of acceleration and their applications in a bioinformatics context: (i) GPU computing, (ii) AI and (iii) next-generation quantum computers.
Affiliation(s)
- Bertil Schmidt
- Institut für Informatik, Johannes Gutenberg University, Mainz, Germany
25
Meewan I, Panmanee J, Petchyam N, Lertvilai P. HBCVTr: an end-to-end transformer with a deep neural network hybrid model for anti-HBV and HCV activity predictor from SMILES. Sci Rep 2024; 14:9262. [PMID: 38649402 PMCID: PMC11035669 DOI: 10.1038/s41598-024-59933-4] [Received: 01/09/2024] [Accepted: 04/16/2024] [Indexed: 04/25/2024] [Open access]
Abstract
Hepatitis B and C viruses (HBV and HCV) are significant causes of chronic liver disease, with approximately 350 million infections globally. To accelerate the discovery of effective treatment options, we introduce HBCVTr, a novel ligand-based drug design (LBDD) method for predicting the inhibitory activity of small molecules against HBV and HCV. HBCVTr employs a hybrid model consisting of dual transformer encoders and a deep neural network to learn the relationship between small molecules' simplified molecular-input line-entry system (SMILES) strings and their antiviral activity against HBV or HCV. HBCVTr surpassed the prediction accuracy of baseline machine learning models and existing methods, with R-squared values of 0.641 and 0.721 for the HBV and HCV test sets, respectively. The trained models were successfully applied to virtual screening of 10 million compounds within 240 h, leading to the discovery of top novel inhibitor candidates, including IJN04 for HBV and IJN12 and IJN19 for HCV. Molecular docking and dynamics simulations identified the target proteins of IJN04, IJN12, and IJN19 as the HBV core antigen, HCV NS5B RNA-dependent RNA polymerase, and HCV NS3/4A serine protease, respectively. Overall, HBCVTr offers a new, rapid screening method for drug discovery and development targeting HBV and HCV.
Affiliation(s)
- Ittipat Meewan
- Center for Advanced Therapeutics, Institute of Molecular Biosciences, Mahidol University, Nakhon Pathom, 73170, Thailand.
- Jiraporn Panmanee
- Research Center for Neuroscience, Institute of Molecular Biosciences, Mahidol University, Nakhon Pathom, 73170, Thailand
- Nopphon Petchyam
- Center for Advanced Therapeutics, Institute of Molecular Biosciences, Mahidol University, Nakhon Pathom, 73170, Thailand
- Pichaya Lertvilai
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, 92037, USA
26
Westerlund AM, Manohar Koki S, Kancharla S, Tibo A, Saigiridharan L, Kabeshov M, Mercado R, Genheden S. Do Chemformers Dream of Organic Matter? Evaluating a Transformer Model for Multistep Retrosynthesis. J Chem Inf Model 2024; 64:3021-3033. [PMID: 38602390 DOI: 10.1021/acs.jcim.3c01685] [Indexed: 04/12/2024]
Abstract
Synthesis planning of new pharmaceutical compounds is a well-known bottleneck in modern drug design. Template-free methods, such as transformers, have recently been proposed as an alternative to template-based methods for single-step retrosynthetic predictions. Here, we trained and evaluated a transformer model, called the Chemformer, for retrosynthesis predictions within drug discovery. The proprietary data set used for training comprised ∼18 M reactions from literature, patents, and electronic lab notebooks. Chemformer was evaluated for both single-step and multistep retrosynthesis. We found that the single-step performance of Chemformer was especially good on reaction classes common in drug discovery, with most reaction classes showing a top-10 round-trip accuracy above 0.97. Moreover, Chemformer reached a higher round-trip accuracy than a template-based model. By analyzing multistep retrosynthesis experiments, we observed that Chemformer found synthetic routes leading to commercial starting materials for 95% of the target compounds, an increase of more than 20% compared to the template-based model on a proprietary compound data set. In addition, we discovered that Chemformer suggested novel disconnections corresponding to reaction templates that are not included in the template-based model. These findings were further supported by results on a publicly available ChEMBL compound data set. The conclusions drawn from this work allow for the design of a synthesis planning tool where template-based and template-free models work in harmony to optimize retrosynthetic recommendations.
Affiliation(s)
- Annie M Westerlund
- Department of Molecular AI, Discovery Sciences, R&D, AstraZeneca, 43183 Mölndal, Sweden
- Siva Manohar Koki
- Department of Molecular AI, Discovery Sciences, R&D, AstraZeneca, 43183 Mölndal, Sweden
- Department of Computer Science and Engineering, Chalmers University of Technology, 412 96 Göteborg, Sweden
- Supriya Kancharla
- Department of Molecular AI, Discovery Sciences, R&D, AstraZeneca, 43183 Mölndal, Sweden
- Department of Computer Science and Engineering, Chalmers University of Technology, 412 96 Göteborg, Sweden
- Alessandro Tibo
- Department of Molecular AI, Discovery Sciences, R&D, AstraZeneca, 43183 Mölndal, Sweden
- Mikhail Kabeshov
- Department of Molecular AI, Discovery Sciences, R&D, AstraZeneca, 43183 Mölndal, Sweden
- Rocío Mercado
- Department of Computer Science and Engineering, Chalmers University of Technology, 412 96 Göteborg, Sweden
- Samuel Genheden
- Department of Molecular AI, Discovery Sciences, R&D, AstraZeneca, 43183 Mölndal, Sweden
27
Gallegos M, Isamura BK, Popelier PLA, Martín Pendás Á. An Unsupervised Machine Learning Approach for the Automatic Construction of Local Chemical Descriptors. J Chem Inf Model 2024; 64:3059-3079. [PMID: 38498942 DOI: 10.1021/acs.jcim.3c01906] [Indexed: 03/20/2024]
Abstract
Condensing the many physical variables defining a chemical system into a fixed-size array poses a significant challenge in the development of chemical Machine Learning (ML). Atom Centered Symmetry Functions (ACSFs) offer an intuitive featurization approach by means of a tedious and labor-intensive selection of tunable parameters. In this work, we implement an unsupervised ML strategy relying on a Gaussian Mixture Model (GMM) to automatically optimize the ACSF parameters. GMMs effortlessly decompose the vastness of the chemical and conformational spaces into well-defined radial and angular clusters, which are then used to build tailor-made ACSFs. The unsupervised exploration of the space has demonstrated general applicability across a diverse range of systems, spanning from various unimolecular landscapes to heterogeneous databases. The impact of the sampling technique and temperature on space exploration is also addressed, highlighting the particularly advantageous role of high-temperature Molecular Dynamics (MD) simulations. The reliability of the resulting features is assessed through the estimation of the atomic charges of a prototypical capped amino acid and a heterogeneous collection of CHON molecules. The automatically constructed ACSFs serve as high-quality descriptors, consistently yielding typical prediction errors below 0.010 electrons bound for the reported atomic charges. Altering the spatial distribution of the functions with respect to the cluster highlights the critical role of symmetry rupture in achieving significantly improved features. More specifically, using two separate functions to describe the lower and upper tails of the cluster results in the best performing models with errors as low as 0.006 electrons. 
Finally, the effectiveness of finely tuned features was checked across different architectures, revealing the superior performance of Gaussian Process (GP) models over Feed Forward Neural Networks (FFNNs), particularly in low-data regimes, with nearly a 2-fold increase in prediction quality. Altogether, this approach paves the way toward an easier construction of local chemical descriptors, while providing valuable insights into how radial and angular spaces should be mapped. Moreover, this work opens the possibility of encoding many-body information beyond angular terms into upcoming ML features.
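The abstract builds radial ACSFs from Gaussian clusters of interatomic distances. The sketch below is minimal and rests on three assumptions:

- A one-dimensional GMM is fitted by expectation-maximization to a list of pair distances.
- The common Behler radial form G = Σ_j exp(−η (r_ij − r_s)²) f_c(r_ij) is used, with a cosine cutoff.
- Each function is placed using our own illustrative mapping, r_s = cluster mean and η = 1/(2σ²).

This mapping is not necessarily the paper's recipe. The tail-splitting variant described above is not shown.

```python
import math


def fit_gmm_1d(xs, k, iters=200):
    """Fit a 1D Gaussian mixture with EM; returns (weights, means, variances)."""
    n = len(xs)
    xs_sorted = sorted(xs)
    # Initialise means at evenly spaced quantiles, one shared variance, equal weights
    means = [xs_sorted[int((c + 0.5) * n / k)] for c in range(k)]
    mu = sum(xs) / n
    variances = [max(sum((x - mu) ** 2 for x in xs) / n, 1e-6)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each distance
        resp = []
        for x in xs:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances (variance floored at 1e-6)
        for c in range(k):
            nc = sum(r[c] for r in resp) + 1e-12
            weights[c] = nc / n
            means[c] = sum(r[c] * x for r, x in zip(resp, xs)) / nc
            variances[c] = max(
                sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, xs)) / nc, 1e-6)
    return weights, means, variances


def cosine_cutoff(r, r_c):
    """Cosine cutoff that decays smoothly to zero at r_c."""
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0) if r < r_c else 0.0


def radial_acsf(distances, r_s, eta, r_c):
    """Radial symmetry function G = sum_j exp(-eta (r_ij - r_s)^2) f_c(r_ij)."""
    return sum(math.exp(-eta * (r - r_s) ** 2) * cosine_cutoff(r, r_c) for r in distances)


# Toy neighbour distances (Angstrom) around one central atom, forming two shells
dists = [1.08, 1.09, 1.10, 1.11, 1.12, 1.48, 1.49, 1.50, 1.51, 1.52]
_, means, variances = fit_gmm_1d(dists, k=2)
# Illustrative mapping: centre r_s on each cluster mean, eta = 1 / (2 sigma^2)
params = [(m, 1.0 / (2.0 * v)) for m, v in zip(means, variances)]
features = [radial_acsf(dists, r_s, eta, r_c=6.0) for r_s, eta in params]
print(params, features)
```

Every name here is hypothetical. A production pipeline would cluster distances from many structures and would add angular terms.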
Affiliation(s)
- Miguel Gallegos
- Department of Analytical and Physical Chemistry, University of Oviedo, Oviedo E-33006, Spain
- Paul L A Popelier
- Department of Chemistry, The University of Manchester, Oxford Road, Manchester M13 9PL, U.K
- Ángel Martín Pendás
- Department of Analytical and Physical Chemistry, University of Oviedo, Oviedo E-33006, Spain
28
Ding Y, Qiang B, Chen Q, Liu Y, Zhang L, Liu Z. Exploring Chemical Reaction Space with Machine Learning Models: Representation and Feature Perspective. J Chem Inf Model 2024; 64:2955-2970. [PMID: 38489239 DOI: 10.1021/acs.jcim.4c00004] [Indexed: 03/17/2024]
Abstract
Chemical reactions serve as foundational building blocks for organic chemistry and drug design. In the era of large AI models, data-driven approaches have emerged to innovate the design of novel reactions, optimize existing ones for higher yields, and discover new pathways for synthesizing chemical structures comprehensively. To effectively address these challenges with machine learning models, it is imperative to derive robust and informative representations or engage in feature engineering using extensive data sets of reactions. This work aims to provide a comprehensive review of established reaction featurization approaches, offering insights into the selection of representations and the design of features for a wide array of tasks. The advantages and limitations of employing SMILES, molecular fingerprints, molecular graphs, and physics-based properties are meticulously elaborated. Solutions to bridge the gap between different representations will also be critically evaluated. Additionally, we introduce a new frontier in chemical reaction pretraining, holding promise as an innovative yet unexplored avenue.
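Reviews of this kind compare ways of encoding a whole reaction. One fingerprint-based option is a difference fingerprint: the summed product fingerprints minus the summed reactant fingerprints. The sketch below is minimal and uses a hypothetical `toy_fingerprint` (character bigram counts of a SMILES string). This stands in for a real chemistry-aware fingerprint and is not a method taken from the review.

```python
from collections import Counter


def toy_fingerprint(smiles, n=2):
    """Hypothetical stand-in fingerprint: counts of character n-grams in a SMILES string."""
    return Counter(smiles[i:i + n] for i in range(len(smiles) - n + 1))


def difference_fingerprint(reactants, products, n=2):
    """Reaction vector = sum of product fingerprints minus sum of reactant fingerprints."""
    diff = Counter()
    for s in products:
        diff.update(toy_fingerprint(s, n))
    for s in reactants:
        diff.subtract(toy_fingerprint(s, n))
    return {k: v for k, v in diff.items() if v != 0}  # keep only features that change


# Illustrative esterification-like reaction written as SMILES strings
print(difference_fingerprint(["CC(=O)O", "OC"], ["CC(=O)OC", "O"]))
```

Features present on both sides cancel, so the vector highlights what the reaction changes. A chemistry-aware fingerprint would make the same idea meaningful in practice.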
Affiliation(s)
- Yuheng Ding
- Department of Pharmaceutical Science, Peking University, Beijing 100191, China
- Bo Qiang
- Department of Pharmaceutical Science, Peking University, Beijing 100191, China
- Qixuan Chen
- Department of Pharmaceutical Science, Peking University, Beijing 100191, China
- Yiqiao Liu
- Department of Pharmaceutical Science, Peking University, Beijing 100191, China
- Liangren Zhang
- Department of Pharmaceutical Science, Peking University, Beijing 100191, China
- Zhenming Liu
- Department of Pharmaceutical Science, Peking University, Beijing 100191, China
29
Gou Q, Liu J, Su H, Guo Y, Chen J, Zhao X, Pu X. Exploring an accurate machine learning model to quickly estimate stability of diverse energetic materials. iScience 2024; 27:109452. [PMID: 38523799 PMCID: PMC10960145 DOI: 10.1016/j.isci.2024.109452] [Received: 12/18/2023] [Revised: 01/27/2024] [Accepted: 03/06/2024] [Indexed: 03/26/2024]
Abstract
High energy and low sensitivity have been the focus of developing new energetic materials (EMs). However, there has been no quick and accurate method for evaluating the stability of diverse EMs. Here, we develop a highly accurate machine learning model for predicting the bond dissociation energy (BDE) of EMs. A reliable and representative BDE dataset of EMs is constructed by collecting 778 experimental energetic compounds and performing quantum mechanics calculations. To characterize the BDE of EMs sufficiently, a hybrid feature representation is proposed that couples the local target bond with global structural characteristics. To alleviate the limitation of the small dataset, pairwise difference regression is used for data augmentation, with the advantages of reducing systematic errors and improving diversity. Benefiting from these improvements, the XGBoost model achieves the best prediction accuracy, with an R² of 0.98 and an MAE of 8.8 kJ mol⁻¹, significantly outperforming competitive models.
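Pairwise difference regression, used here for data augmentation, trains a model on differences between pairs of compounds. As a result, n training compounds yield n² ordered pairs. The sketch below is minimal and rests on these assumptions:

- A linear least-squares learner stands in for the paper's XGBoost.
- Each pair is encoded as [x_i, x_j, 1], a hypothetical encoding.
- The target for each pair is y_i − y_j.
- A new compound is predicted by averaging f(x_new, x_j) + y_j over all training compounds j.

```python
from itertools import product


def solve(a, b):
    """Solve the small dense system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bv] for row, bv in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system")
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [rv - f * cv for rv, cv in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def pair_features(xi, xj):
    # Hypothetical pair encoding: both descriptor vectors plus a bias term
    return list(xi) + list(xj) + [1.0]


def fit_pairwise(X, y):
    """Least-squares fit of y_i - y_j on pair features over all ordered pairs (n -> n^2 rows)."""
    rows, targets = [], []
    for i, j in product(range(len(X)), repeat=2):
        rows.append(pair_features(X[i], X[j]))
        targets.append(y[i] - y[j])
    d = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(d)] for a in range(d)]
    xty = [sum(r[a] * t for r, t in zip(rows, targets)) for a in range(d)]
    return solve(xtx, xty)


def predict_pairwise(w, X_train, y_train, x_new):
    """Average the anchored estimates f(x_new, x_j) + y_j over every training compound j."""
    estimates = [sum(wk * fk for wk, fk in zip(w, pair_features(x_new, xj))) + yj
                 for xj, yj in zip(X_train, y_train)]
    return sum(estimates) / len(estimates)


# Toy one-descriptor data following y = 2x + 1 (made-up numbers, not BDE values)
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]
w = fit_pairwise(X, y)
print(predict_pairwise(w, X, y, [10.0]))
```

Averaging over many anchors is one way a pairwise scheme can damp errors tied to any single reference compound. A nonlinear learner such as gradient-boosted trees would replace the linear fit here.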
Affiliation(s)
- Qiaolin Gou
- College of Chemistry, Sichuan University, Chengdu 610064, China
- Jing Liu
- College of Chemistry, Sichuan University, Chengdu 610064, China
- Haoming Su
- College of Chemistry, Sichuan University, Chengdu 610064, China
- Yanzhi Guo
- College of Chemistry, Sichuan University, Chengdu 610064, China
- Jiayi Chen
- College of Chemistry, Sichuan University, Chengdu 610064, China
- Xueyan Zhao
- Institute of Chemical Materials, China Academy of Engineering Physics, Mianyang 621900, China
- Xuemei Pu
- College of Chemistry, Sichuan University, Chengdu 610064, China
30
Zills F, Schäfer MR, Segreto N, Kästner J, Holm C, Tovey S. Collaboration on Machine-Learned Potentials with IPSuite: A Modular Framework for Learning-on-the-Fly. J Phys Chem B 2024; 128:3662-3676. [PMID: 38568231 DOI: 10.1021/acs.jpcb.3c07187] [Indexed: 04/19/2024]
Abstract
The field of machine learning potentials has experienced a rapid surge in progress, thanks to advances in machine learning theory, algorithms, and hardware capabilities. While the underlying methods are continuously evolving, the infrastructure for their deployment has lagged. The community, due to these rapid developments, frequently finds itself split into groups built around different implementations of machine-learned potentials. In this work, we introduce IPSuite, a Python-driven software package designed to connect different methods and algorithms from the comprehensive field of machine-learned potentials into a single platform while also providing a collaborative infrastructure, helping ensure reproducibility. Furthermore, the data management infrastructure of the IPSuite code enables simple model sharing and deployment in simulations. Currently, IPSuite supports six state-of-the-art machine learning approaches for the fitting of interatomic potentials as well as a variety of methods for the selection of training data, running of ab initio calculations, learning-on-the-fly strategies, model evaluation, and simulation deployment.
Affiliation(s)
- Fabian Zills
- Institute for Computational Physics, University of Stuttgart, 70569 Stuttgart, Germany
- Moritz René Schäfer
- Institute for Theoretical Chemistry, University of Stuttgart, 70569 Stuttgart, Germany
- Nico Segreto
- Institute for Theoretical Chemistry, University of Stuttgart, 70569 Stuttgart, Germany
- Johannes Kästner
- Institute for Theoretical Chemistry, University of Stuttgart, 70569 Stuttgart, Germany
- Christian Holm
- Institute for Computational Physics, University of Stuttgart, 70569 Stuttgart, Germany
- Samuel Tovey
- Institute for Computational Physics, University of Stuttgart, 70569 Stuttgart, Germany
31
Qin W, Wang H, Zhang F, Ma W, Wang J, Huang T. Nonconvex Robust High-Order Tensor Completion Using Randomized Low-Rank Approximation. IEEE Trans Image Process 2024; 33:2835-2850. [PMID: 38598373 DOI: 10.1109/tip.2024.3385284] [Indexed: 04/12/2024]
Abstract
Within the tensor singular value decomposition (T-SVD) framework, existing robust low-rank tensor completion approaches have achieved considerable success in various areas of science and engineering. Nevertheless, these methods rely on T-SVD-based low-rank approximation, which incurs high computational costs on large-scale tensor data. Moreover, most of them apply only to third-order tensors. To address these issues, in this article we first devise two efficient low-rank tensor approximation approaches that incorporate random projection techniques under the order-d (d ≥ 3) T-SVD framework. Theoretical error bounds for the proposed randomized algorithms are provided. On this basis, we further investigate the robust high-order tensor completion problem, developing a double nonconvex model together with fast optimization algorithms that have convergence guarantees. Experimental results on large-scale synthetic and real tensor data illustrate that the proposed method outperforms other state-of-the-art approaches in terms of both computational efficiency and estimation precision.
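The randomized approximations rest on random projection. The sketch below is a hedged illustration of that building block in the simpler matrix setting only. The paper's algorithms work on order-d tensors under T-SVD and are not reproduced here. The randomized range finder works in three steps:

1. It samples Y = AΩ using a Gaussian test matrix Ω.
2. It orthonormalizes Y into Q.
3. It approximates A ≈ Q(QᵀA).

The oversampling parameter and the toy matrix are our own choices.

```python
import random


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(r) for r in zip(*a)]


def orthonormal_columns(y, tol=1e-10):
    """Modified Gram-Schmidt on the columns of y, dropping numerically dependent ones."""
    basis = []
    for col in transpose(y):
        v = list(col)
        for q in basis:
            proj = sum(vi * qi for vi, qi in zip(v, q))
            v = [vi - proj * qi for vi, qi in zip(v, q)]
        norm = sum(vi * vi for vi in v) ** 0.5
        if norm > tol:
            basis.append([vi / norm for vi in v])
    return transpose(basis)  # m x r matrix with orthonormal columns


def randomized_low_rank(a, k, oversample=2, seed=0):
    """Randomized range finder: returns (Q, B) with A ~= Q B and B = Q^T A."""
    rng = random.Random(seed)
    omega = [[rng.gauss(0.0, 1.0) for _ in range(k + oversample)] for _ in range(len(a[0]))]
    q = orthonormal_columns(matmul(a, omega))  # Y = A Omega samples the range of A
    b = matmul(transpose(q), a)                # small r x n projected matrix
    return q, b


# Rank-2 toy matrix: row 1 = row 3 + 2 * row 4, and row 2 = 2 * row 1
A = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
Q, B = randomized_low_rank(A, k=2)
approx = matmul(Q, B)
err = max(abs(x - y) for ra, rb in zip(A, approx) for x, y in zip(ra, rb))
print(len(Q[0]), err)
```

The method needs only products with A, which is what can make such schemes cheaper than a full SVD for large inputs.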
32
Joshi KP, Adhikari G, Bhattarai D, Adhikari A, Lamichanne S. Forest fire vulnerability in Nepal's chure region: Investigating the influencing factors using generalized linear model. Heliyon 2024; 10:e28525. [PMID: 38596031 PMCID: PMC11002069 DOI: 10.1016/j.heliyon.2024.e28525] [Received: 12/21/2023] [Revised: 03/20/2024] [Accepted: 03/20/2024] [Indexed: 04/11/2024]
Abstract
The Chure region, among the world's youngest mountain ranges, is highly susceptible to natural calamities, particularly forest fires. The region has consistently experienced forest fire incidents, resulting in the degradation of valuable natural and anthropogenic resources. Despite this vulnerability, few studies have examined how the various causative factors relate to the recurring fire problem. To understand the factors influencing recurring forest fires and their extent, we used a generalized linear model (binary logistic regression) relating satellite-detected fire points, as the dependent variable, to various independent variables. We used a variance inflation factor (VIF) test and a correlation matrix to select 14 suitable variables for the study. The analysis revealed that forest fires occurred mostly during the three pre-monsoon periods and were significantly positively associated with the area under forest, rangeland, and bare ground, and with the Normalized Difference Vegetation Index (NDVI) (P < 0.05). Conversely, the model showed that the probability of fire incidents decreases with elevation, precipitation, and population density (P < 0.05). Among the significant variables, forest area emerges as the most influential factor, followed by precipitation, elevation, area of rangeland, population density, NDVI, and area of bare ground. The model was validated using the area under the curve (AUC = 0.92) and accuracy (ACC = 0.89), indicating excellent predictive capability. The modeling results and the forest fire susceptibility map provide valuable insights into forest fire vulnerability in the region, offering baseline information that can help line agencies prepare management strategies to prevent further deterioration of the region.
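The AUC reported here summarizes how well predicted fire probabilities rank fire points above non-fire points. The sketch below is minimal and assumes binary labels (1 = fire) and model scores. It computes AUC as the Mann-Whitney probability that a random positive outscores a random negative, with ties counting one half. It also computes accuracy at a 0.5 threshold, which is our assumption and not a value stated in the abstract.

```python
def auc_mann_whitney(labels, scores):
    """AUC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:  # O(n_pos * n_neg) comparisons; fine for a sketch
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of correct labels after thresholding predicted probabilities."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == lab for p, lab in zip(preds, labels)) / len(labels)


# Hypothetical fire (1) / no-fire (0) labels and predicted probabilities
labels = [1, 1, 0, 0]
probs = [0.9, 0.4, 0.6, 0.1]
print(auc_mann_whitney(labels, probs), accuracy(labels, probs))
```

AUC does not depend on any threshold, while accuracy does. This is why the two metrics can disagree, as they do on this toy data.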
Affiliation(s)
- Gunjan Adhikari
- Institute of Forestry, Pokhara Campus, Tribhuvan University, Pokhara, Nepal
- Divya Bhattarai
- Faculty of Forestry, Agriculture and Forestry University, Hetauda, 44100, Nepal
- Nepal Conservation and Research Center, Ratnanagar-6, Sauraha, Chitwan, Nepal
- Saurav Lamichanne
- Faculty of Forestry, Agriculture and Forestry University, Hetauda, 44100, Nepal
- Nepal Conservation and Research Center, Ratnanagar-6, Sauraha, Chitwan, Nepal
33
Zhang HK, Liu S, Zhang SX. Absence of Barren Plateaus in Finite Local-Depth Circuits with Long-Range Entanglement. Phys Rev Lett 2024; 132:150603. [PMID: 38682974 DOI: 10.1103/physrevlett.132.150603] [Received: 11/06/2023] [Revised: 01/24/2024] [Accepted: 03/13/2024] [Indexed: 05/01/2024]
Abstract
Ground state preparation is classically intractable for general Hamiltonians. On quantum devices, shallow parametrized circuits can be effectively trained to obtain short-range entangled states under the paradigm of variational quantum eigensolver, while deep circuits are generally untrainable due to the barren plateau phenomenon. In this Letter, we give a general lower bound on the variance of circuit gradients for arbitrary quantum circuits composed of local 2-designs. Based on our unified framework, we prove the absence of barren plateaus in training finite local-depth circuits (FLDC) for the ground states of local Hamiltonians. FLDCs are allowed to be deep in the conventional circuit depth to generate long-range entangled ground states, such as topologically ordered states, but their local depths are finite, i.e., there is only a finite number of gates acting on individual qubits. This characteristic sets FLDC apart from shallow circuits: FLDC in general cannot be classically simulated to estimate local observables efficiently by existing tensor network methods in two and higher dimensions. We validate our analytical results with extensive numerical simulations and demonstrate the effectiveness of variational training using the generalized toric code model.
Affiliation(s)
- Hao-Kai Zhang
- Institute for Advanced Study, Tsinghua University, Beijing 100084, China
- Shuo Liu
- Institute for Advanced Study, Tsinghua University, Beijing 100084, China
- Shi-Xin Zhang
- Tencent Quantum Laboratory, Tencent, Shenzhen, Guangdong 518057, China
34
Drmota P, Nadlinger DP, Main D, Nichol BC, Ainley EM, Leichtle D, Mantri A, Kashefi E, Srinivas R, Araneda G, Ballance CJ, Lucas DM. Verifiable Blind Quantum Computing with Trapped Ions and Single Photons. Phys Rev Lett 2024; 132:150604. [PMID: 38682960 DOI: 10.1103/physrevlett.132.150604] [Received: 05/10/2023] [Accepted: 01/16/2024] [Indexed: 05/01/2024]
Abstract
We report the first hybrid matter-photon implementation of verifiable blind quantum computing. We use a trapped-ion quantum server and a client-side photonic detection system networked via a fiber-optic quantum link. The availability of memory qubits and deterministic entangling gates enables interactive protocols without postselection, which are key requirements for any scalable blind server and which previous realizations could not provide. We quantify the privacy at ≲0.03 leaked classical bits per qubit. This experiment demonstrates a path to fully verified quantum computing in the cloud.
Affiliation(s)
- P Drmota
- Department of Physics, University of Oxford, Clarendon Laboratory, Parks Road, Oxford OX1 3PU, United Kingdom
- D P Nadlinger
- Department of Physics, University of Oxford, Clarendon Laboratory, Parks Road, Oxford OX1 3PU, United Kingdom
- D Main
- Department of Physics, University of Oxford, Clarendon Laboratory, Parks Road, Oxford OX1 3PU, United Kingdom
- B C Nichol
- Department of Physics, University of Oxford, Clarendon Laboratory, Parks Road, Oxford OX1 3PU, United Kingdom
- E M Ainley
- Department of Physics, University of Oxford, Clarendon Laboratory, Parks Road, Oxford OX1 3PU, United Kingdom
- D Leichtle
- Laboratoire d'Informatique de Paris 6, CNRS, Sorbonne Université, Paris 75005, France
- A Mantri
- Joint Center for Quantum Information and Computer Science, University of Maryland, College Park, Maryland, USA
- E Kashefi
- Laboratoire d'Informatique de Paris 6, CNRS, Sorbonne Université, Paris 75005, France
- School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, United Kingdom
- R Srinivas
- Department of Physics, University of Oxford, Clarendon Laboratory, Parks Road, Oxford OX1 3PU, United Kingdom
- G Araneda
- Department of Physics, University of Oxford, Clarendon Laboratory, Parks Road, Oxford OX1 3PU, United Kingdom
- C J Ballance
- Department of Physics, University of Oxford, Clarendon Laboratory, Parks Road, Oxford OX1 3PU, United Kingdom
- D M Lucas
- Department of Physics, University of Oxford, Clarendon Laboratory, Parks Road, Oxford OX1 3PU, United Kingdom
35
Choi S, Lee J, Seo J, Han SW, Lee SH, Seo JH, Seok J. Automated BigSMILES conversion workflow and dataset for homopolymeric macromolecules. Sci Data 2024; 11:371. [PMID: 38605036 PMCID: PMC11009387 DOI: 10.1038/s41597-024-03212-4] [Received: 10/09/2023] [Accepted: 04/02/2024] [Indexed: 04/13/2024]
Abstract
The simplified molecular-input line-entry system (SMILES) has been used in a variety of artificial intelligence analyses owing to its capability of representing chemical structures in line notation. However, its ability to represent macromolecules is limited, which has led to the proposal of BigSMILES as an alternative suited to macromolecules. Nevertheless, research on BigSMILES remains limited due to its preprocessing requirements. Thus, this study proposes a BigSMILES conversion workflow, focusing on automated generation from SMILES representations of homopolymers. BigSMILES representations for 4,927,181 records are provided, enabling their immediate use in various research and development applications. Our study presents a detailed description of a validation process that ensures the accuracy, interchangeability, and robustness of the conversion. Additionally, a systematic overview of the code and functions used, emphasizing their relevance to BigSMILES generation, is provided. This advancement is anticipated to significantly aid researchers and facilitate further studies in BigSMILES representation, including potential applications in deep learning and further extension to complex structures such as copolymers.
Affiliation(s)
- Sunho Choi
- School of Electrical Engineering, Korea University, Seoul, South Korea
- Joonbum Lee
- Department of Materials Science and Engineering, Korea University, Seoul, South Korea
- Jangwon Seo
- School of Electrical Engineering, Korea University, Seoul, South Korea
- Sung Won Han
- School of Industrial Management Engineering, Korea University, Seoul, South Korea
- Sang Hyun Lee
- School of Electrical Engineering, Korea University, Seoul, South Korea
- Ji-Hun Seo
- Department of Materials Science and Engineering, Korea University, Seoul, South Korea
- Junhee Seok
- School of Electrical Engineering, Korea University, Seoul, South Korea.
36
Panwar P, Yang Q, Martini A. Temperature-Dependent Density and Viscosity Prediction for Hydrocarbons: Machine Learning and Molecular Dynamics Simulations. J Chem Inf Model 2024; 64:2760-2774. [PMID: 37582234 DOI: 10.1021/acs.jcim.3c00231] [Indexed: 08/17/2023]
Abstract
Machine learning-based predictive models allow rapid and reliable prediction of material properties and facilitate innovative materials design. Base oils used in the formulation of lubricant products are complex hydrocarbons of varying sizes and structures. This study developed Gaussian process regression-based models to accurately predict the temperature-dependent density and dynamic viscosity of 305 complex hydrocarbons. In our approach, strongly correlated/collinear predictors were trimmed, important predictors were selected by least absolute shrinkage and selection operator (LASSO) regularization and prior domain knowledge, hyperparameters were systematically optimized by Bayesian optimization, and the models were interpreted. The approach provided versatile quantitative structure-property relationship (QSPR) models with relatively simple predictors for determining the dynamic viscosity and density of complex hydrocarbons at any temperature. In addition, we developed molecular dynamics simulation-based descriptors and evaluated the feasibility and versatility of dynamic descriptors from simulations for predicting the material properties. Models developed using a comparatively smaller pool of dynamic descriptors predicted density and viscosity about as well as models based on many more static descriptors. The best models predicted density and dynamic viscosity with coefficient of determination (R²) values of 99.6% and 97.7%, respectively, for all data sets, including a test data set of 45 molecules. Finally, partial dependence plots (PDPs), individual conditional expectation (ICE) plots, local interpretable model-agnostic explanation (LIME) values, and trimmed model R² values were used to identify the most important static and dynamic predictors of the density and viscosity.
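LASSO selects predictors by driving some coefficients exactly to zero. The sketch below is minimal and assumes already-centered features and targets. It runs cyclic coordinate descent on (1/2n)‖y − Xw‖² + α‖w‖₁, updating each coefficient by soft-thresholding. The value of α and the toy data are illustrative. This is not the authors' pipeline, which also used domain knowledge, Gaussian process regression, and Bayesian hyperparameter optimization.

```python
def soft_threshold(z, gamma):
    """Proximal operator of gamma * |w|: shrink z towards zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1/2n)||y - Xw||^2 + alpha ||w||_1; X columns and y assumed centred."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            z_j = sum(row[j] ** 2 for row in X) / n
            if z_j == 0.0:
                continue  # constant column: coefficient stays at zero
            # Correlation of feature j with the partial residual that excludes feature j
            rho_j = sum(
                row[j] * (yi - sum(row[k] * w[k] for k in range(d)) + row[j] * w[j])
                for row, yi in zip(X, y)) / n
            w[j] = soft_threshold(rho_j, alpha) / z_j
    return w


# Toy centred data: y depends only on the first predictor (y = 3 * x0)
X = [[-2.0, 1.0], [-1.0, -2.0], [0.0, 2.0], [1.0, -2.0], [2.0, 1.0]]
y = [-6.0, -3.0, 0.0, 3.0, 6.0]
print(lasso_coordinate_descent(X, y, alpha=0.1))
```

The irrelevant second predictor is set exactly to zero. The relevant one is shrunk slightly, from 3 to 2.95 here, which is the usual LASSO trade-off between sparsity and bias.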
Affiliation(s)
- Pawan Panwar
- Department of Mechanical Engineering, University of California Merced, 5200 North Lake Road, Merced, California 95343, United States
- Quanpeng Yang
- Department of Mechanical Engineering, University of California Merced, 5200 North Lake Road, Merced, California 95343, United States
- Ashlie Martini
- Department of Mechanical Engineering, University of California Merced, 5200 North Lake Road, Merced, California 95343, United States
37
Gao C, Bao W, Wang S, Zheng J, Wang L, Ren Y, Jiao L, Wang J, Wang X. DockingGA: enhancing targeted molecule generation using transformer neural network and genetic algorithm with docking simulation. Brief Funct Genomics 2024:elae011. [PMID: 38582610 DOI: 10.1093/bfgp/elae011] [Received: 11/09/2023] [Revised: 02/25/2024] [Accepted: 03/13/2024] [Indexed: 04/08/2024]
Abstract
Generative molecular models generate novel molecules with desired properties by searching chemical space. Traditional combinatorial optimization methods, such as genetic algorithms, have demonstrated superior performance in various molecular optimization tasks. However, these methods do not use docking simulation to inform the design process, depend heavily on the quality and quantity of available data, and require additional structural optimization before their outputs can become candidate drugs. To address these limitations, we propose a novel model named DockingGA that combines Transformer neural networks and genetic algorithms to generate molecules with better binding affinity for specific targets. To generate high-quality molecules, we chose Self-referencing Chemical Structure Strings to represent molecules and optimized the binding affinity of the molecules to different targets. Compared with other baseline models, DockingGA achieves the best docking results for the top 1, 10, and 100 molecules while maintaining 100% novelty. Furthermore, the distribution of physicochemical properties demonstrates the ability of DockingGA to generate molecules with favorable and appropriate properties. This innovation creates new opportunities for the application of generative models in practical drug discovery.
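The genetic-algorithm half of this design follows a generic select, crossover, mutate loop. The sketch below is minimal and makes these assumptions explicit:

- The strings are toy symbols, not the paper's molecular string representation.
- `toy_score` is a hypothetical stand-in for a docking score, where lower is better.
- No Transformer or docking software is involved.

```python
import random


def genetic_algorithm(score, alphabet, length, pop_size=30, generations=50,
                      mutation_rate=0.1, seed=0):
    """Minimise score(s) over fixed-length strings with a simple elitist GA."""
    if length < 2 or pop_size < 4:
        raise ValueError("need length >= 2 and pop_size >= 4")
    rng = random.Random(seed)
    pop = ["".join(rng.choice(alphabet) for _ in range(length)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score)                  # lower is better, as with docking scores
        parents = pop[: pop_size // 2]       # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)   # one-point crossover
            child = a[:cut] + b[cut:]
            # Per-position mutation to a random symbol
            child = "".join(rng.choice(alphabet) if rng.random() < mutation_rate else ch
                            for ch in child)
            children.append(child)
        pop = parents + children
    return min(pop, key=score)


# Hypothetical stand-in for a docking score: mismatches against a made-up target string
TARGET = "CCOCCN"


def toy_score(s):
    return sum(a != b for a, b in zip(s, TARGET))


best = genetic_algorithm(toy_score, alphabet="CNO", length=len(TARGET))
print(best, toy_score(best))
```

Because the best half is always carried over, the best score never gets worse between generations. In a docking-driven setting, each `score` call would be a docking run, so population size and generation count become the main cost levers.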
Affiliation(s)
- Changnan Gao
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China
- Wenjie Bao
- Guanghua School of Management, Peking University, Beijing 100091, China
- Shuang Wang
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China
- Jianyang Zheng
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China
- Lulu Wang
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China
- Yongqi Ren
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China
- Linfang Jiao
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China
- Jianmin Wang
- The Interdisciplinary Graduate Program in Integrative Biotechnology, Yonsei University, Incheon 21983, Republic of Korea
- Xun Wang
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China
- High Performance Computer Research Center, Institute of Computing Technology, CAS, Beijing 100190, China
38
Unke OT, Stöhr M, Ganscha S, Unterthiner T, Maennel H, Kashubin S, Ahlin D, Gastegger M, Medrano Sandonas L, Berryman JT, Tkatchenko A, Müller KR. Biomolecular dynamics with machine-learned quantum-mechanical force fields trained on diverse chemical fragments. Sci Adv 2024; 10:eadn4397. [PMID: 38579003 DOI: 10.1126/sciadv.adn4397] [Received: 12/10/2023] [Accepted: 02/29/2024] [Indexed: 04/07/2024]
Abstract
The GEMS method enables molecular dynamics simulations of large heterogeneous systems at ab initio quality.
Affiliation(s)
- Oliver T Unke
- Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- DFG Cluster of Excellence "Unifying Systems in Catalysis" (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
- Martin Stöhr
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Stefan Ganscha
- Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Thomas Unterthiner
- Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Hartmut Maennel
- Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Sergii Kashubin
- Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Daniel Ahlin
- Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Michael Gastegger
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- DFG Cluster of Excellence "Unifying Systems in Catalysis" (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
- BASLEARN - TU Berlin/BASF Joint Lab for Machine Learning, Technische Universität Berlin, 10587 Berlin, Germany
- Leonardo Medrano Sandonas
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Joshua T Berryman
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Klaus-Robert Müller
- Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul 02841, Korea
- Max Planck Institute for Informatics, Stuhlsatzenhausweg, 66123 Saarbrücken, Germany
- BIFOLD - Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
39
Lu B, Xia Y, Ren Y, Xie M, Zhou L, Vinai G, Morton SA, Wee ATS, van der Wiel WG, Zhang W, Wong PKJ. When Machine Learning Meets 2D Materials: A Review. Adv Sci (Weinh) 2024; 11:e2305277. [PMID: 38279508 PMCID: PMC10987159 DOI: 10.1002/advs.202305277] [Received: 07/31/2023] [Revised: 10/21/2023] [Indexed: 01/28/2024]
Abstract
The availability of an ever-expanding portfolio of 2D materials with rich internal degrees of freedom (spin, excitonic, valley, sublattice, and layer pseudospin), together with the unique ability to tailor heterostructures made layer by layer in a precisely chosen stacking sequence and relative crystallographic alignment, offers an unprecedented platform for realizing materials by design. However, the breadth of the multi-dimensional parameter space and the massive data sets involved are emblematic of complex, resource-intensive experimentation, which not only challenges the current state of the art but also renders exhaustive sampling untenable. To this end, machine learning, a very powerful data-driven approach and subset of artificial intelligence, is a potential game-changer, enabling a cheaper yet more efficient alternative to traditional computational strategies. It is also a new paradigm for autonomous experimentation for accelerated discovery and machine-assisted design of functional 2D materials and heterostructures. Here, the study reviews the recent progress and challenges of such endeavors, and highlights various emerging opportunities in this frontier research area.
Affiliation(s)
- Bin Lu
- ARTIST Lab for Artificial Electronic Materials and Technologies, School of Microelectronics, Northwestern Polytechnical University, Xi'an 710072, P. R. China
- Yangtze River Delta Research Institute of Northwestern Polytechnical University, Taicang 215400, P. R. China
- Yuze Xia
- ARTIST Lab for Artificial Electronic Materials and Technologies, School of Microelectronics, Northwestern Polytechnical University, Xi'an 710072, P. R. China
- Yangtze River Delta Research Institute of Northwestern Polytechnical University, Taicang 215400, P. R. China
- Yuqian Ren
- ARTIST Lab for Artificial Electronic Materials and Technologies, School of Microelectronics, Northwestern Polytechnical University, Xi'an 710072, P. R. China
- Yangtze River Delta Research Institute of Northwestern Polytechnical University, Taicang 215400, P. R. China
- Miaomiao Xie
- ARTIST Lab for Artificial Electronic Materials and Technologies, School of Microelectronics, Northwestern Polytechnical University, Xi'an 710072, P. R. China
- Yangtze River Delta Research Institute of Northwestern Polytechnical University, Taicang 215400, P. R. China
- Liguo Zhou
- ARTIST Lab for Artificial Electronic Materials and Technologies, School of Microelectronics, Northwestern Polytechnical University, Xi'an 710072, P. R. China
- Yangtze River Delta Research Institute of Northwestern Polytechnical University, Taicang 215400, P. R. China
- Giovanni Vinai
- Istituto Officina dei Materiali (IOM)‐CNR, Laboratorio TASC, Trieste I‐34149, Italy
- Simon A. Morton
- Advanced Light Source (ALS), Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Andrew T. S. Wee
- Department of Physics and Centre for Advanced 2D Materials (CA2DM) and Graphene Research Centre (GRC), National University of Singapore, Singapore 117542, Singapore
- Wilfred G. van der Wiel
- NanoElectronics Group, MESA+ Institute for Nanotechnology and BRAINS Center for Brain‐Inspired Nano Systems, University of Twente, Enschede 7500AE, The Netherlands
- Institute of Physics, University of Münster, 48149 Münster, Germany
- Wen Zhang
- ARTIST Lab for Artificial Electronic Materials and Technologies, School of Microelectronics, Northwestern Polytechnical University, Xi'an 710072, P. R. China
- Yangtze River Delta Research Institute of Northwestern Polytechnical University, Taicang 215400, P. R. China
- NanoElectronics Group, MESA+ Institute for Nanotechnology and BRAINS Center for Brain‐Inspired Nano Systems, University of Twente, Enschede 7500AE, The Netherlands
- Ping Kwan Johnny Wong
- ARTIST Lab for Artificial Electronic Materials and Technologies, School of Microelectronics, Northwestern Polytechnical University, Xi'an 710072, P. R. China
- Yangtze River Delta Research Institute of Northwestern Polytechnical University, Taicang 215400, P. R. China
- NPU Chongqing Technology Innovation Center, Chongqing 400000, P. R. China
40
Guo Y, Zhang H, Yuan L, Chen W, Zhao H, Yu QQ, Shi W. Machine learning and new insights for breast cancer diagnosis. J Int Med Res 2024; 52:3000605241237867. [PMID: 38663911 PMCID: PMC11047257 DOI: 10.1177/03000605241237867] [Received: 08/21/2023] [Accepted: 02/21/2024] [Indexed: 04/28/2024]
Abstract
Breast cancer (BC) is the most common form of cancer among women worldwide. The current methods of BC detection include X-ray mammography, ultrasound, computed tomography, magnetic resonance imaging, positron emission tomography and breast thermographic techniques. More recently, machine learning (ML) tools have been increasingly employed in diagnostic medicine because of their high efficiency in detection and intervention. The resulting imaging features and mathematical analyses can then be used to generate ML models that stratify, differentiate and detect benign and malignant breast lesions. Given its marked advantages, radiomics is a frequently used tool in recent research and in the clinic. Artificial neural networks and deep learning (DL) are newer forms of ML that evaluate data using computational models inspired by the human brain. DL directly processes unstructured information, such as images, sounds and language, and performs precise clinical image stratification, medical record analyses and tumour diagnosis. Herein, this review thoroughly summarizes prior investigations on the application of medical images for the detection and intervention of BC using radiomics, DL and ML. The aim was to provide guidance to scientists regarding the use of artificial intelligence and ML in research and the clinic.
Affiliation(s)
- Ya Guo
- Department of Oncology, Jining No.1 People’s Hospital, Shandong First Medical University, Jining, Shandong Province, China
- Heng Zhang
- Department of Laboratory Medicine, Shandong Daizhuang Hospital, Jining, Shandong Province, China
- Leilei Yuan
- Department of Oncology, Jining No.1 People’s Hospital, Shandong First Medical University, Jining, Shandong Province, China
- Weidong Chen
- Department of Oncology, Jining No.1 People’s Hospital, Shandong First Medical University, Jining, Shandong Province, China
- Haibo Zhao
- Department of Oncology, Jining No.1 People’s Hospital, Shandong First Medical University, Jining, Shandong Province, China
- Qing-Qing Yu
- Phase I Clinical Research Centre, Jining No.1 People’s Hospital, Shandong First Medical University, Jining, Shandong Province, China
- Wenjie Shi
- Molecular and Experimental Surgery, University Clinic for General-, Visceral-, Vascular- and Transplantation Surgery, Medical Faculty University Hospital Magdeburg, Otto-von-Guericke University, Magdeburg, Germany
41
Munteanu V, Starostin V, Greco A, Pithan L, Gerlach A, Hinderhofer A, Kowarik S, Schreiber F. Neural network analysis of neutron and X-ray reflectivity data incorporating prior knowledge. J Appl Crystallogr 2024; 57:456-469. [PMID: 38596736 PMCID: PMC11001411 DOI: 10.1107/s1600576724002115] [Received: 12/18/2023] [Accepted: 03/03/2024] [Indexed: 04/11/2024]
Abstract
Due to the ambiguity related to the lack of phase information, determining the physical parameters of multilayer thin films from measured neutron and X-ray reflectivity curves is, on a fundamental level, an underdetermined inverse problem. This ambiguity poses limitations on standard neural networks, constraining the range and number of considered parameters in previous machine learning solutions. To overcome this challenge, a novel training procedure has been designed which incorporates dynamic prior boundaries for each physical parameter as additional inputs to the neural network. In this manner, the neural network can be trained simultaneously on all well-posed subintervals of a larger parameter space in which the inverse problem is underdetermined. During inference, users can flexibly input their own prior knowledge about the physical system to constrain the neural network prediction to distinct target subintervals in the parameter space. The effectiveness of the method is demonstrated in various scenarios, including multilayer structures with a box model parameterization and a physics-inspired special parameterization of the scattering length density profile for a multilayer structure. In contrast to previous methods, this approach scales favourably when increasing the complexity of the inverse problem, working properly even for a five-layer multilayer model and a periodic multilayer model with up to 17 open parameters.
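The central idea, feeding per-parameter prior boundaries to the network as extra inputs, can be illustrated by how one training example might be assembled. This is a minimal sketch under stated assumptions: `toy_curve` is a hypothetical, non-physical stand-in for a reflectivity simulation, sub-intervals are drawn uniformly, and targets are rescaled to [-1, 1] relative to their own bounds; the paper's exact sampling and scaling scheme may differ.

```python
import numpy as np


def toy_curve(params):
    """Hypothetical forward model (NOT a reflectivity calculation): any smooth map works here."""
    q = np.linspace(0.0, 1.0, 32)
    return np.cos(np.outer(q, params)).sum(axis=1)


def sample_training_example(full_lo, full_hi, simulate_curve, rng):
    """Build (network input, target) with random prior bounds inside the full parameter range."""
    full_lo = np.asarray(full_lo, dtype=float)
    full_hi = np.asarray(full_hi, dtype=float)
    # Two uniform draws per parameter; sorting them gives a random prior sub-interval [lo, hi].
    edges = np.sort(rng.uniform(full_lo, full_hi, size=(2, full_lo.size)), axis=0)
    lo, hi = edges[0], edges[1]
    params = rng.uniform(lo, hi)  # ground-truth parameters inside the prior bounds
    curve = simulate_curve(params)
    # Bounds are fed to the network as extra inputs, normalized to the full range.
    span = full_hi - full_lo
    bounds_input = np.concatenate([(lo - full_lo) / span, (hi - full_lo) / span])
    # Target: where each parameter sits inside its own prior interval, mapped to [-1, 1].
    target = 2.0 * (params - lo) / (hi - lo) - 1.0
    return np.concatenate([curve, bounds_input]), target, params, lo, hi


def decode(prediction, lo, hi):
    """Invert the target scaling to get physical parameter values back."""
    return lo + (prediction + 1.0) * (hi - lo) / 2.0


rng = np.random.default_rng(0)
x, y, p, lo, hi = sample_training_example([0.0, 1.0, 10.0], [50.0, 6.0, 300.0], toy_curve, rng)
```

At inference time, a user would supply their own `lo`/`hi` in the bounds part of the input and pass the network output through `decode`, which is how prior knowledge narrows the prediction to a well-posed subinterval.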
Affiliation(s)
- Valentin Munteanu
- University of Tübingen, Auf der Morgenstelle 10, 72076 Tübingen, Germany
- Vladimir Starostin
- University of Tübingen, Auf der Morgenstelle 10, 72076 Tübingen, Germany
- Alessandro Greco
- University of Tübingen, Auf der Morgenstelle 10, 72076 Tübingen, Germany
- Linus Pithan
- University of Tübingen, Auf der Morgenstelle 10, 72076 Tübingen, Germany
- Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
- Alexander Gerlach
- University of Tübingen, Auf der Morgenstelle 10, 72076 Tübingen, Germany
- Stefan Kowarik
- Department of Physical Chemistry, University of Graz, Heinrichstraße 28, 8010 Graz, Austria
- Frank Schreiber
- University of Tübingen, Auf der Morgenstelle 10, 72076 Tübingen, Germany
42
Yang Z, Zhang L, Liu T, Wang H, Tang Z, Zhao H, Yuan L, Zhang Z, Liu X. Alternating projection combined with fast gradient projection (FGP-AP) method for intensity-only measurement optical diffraction tomography in LED array microscopy. Biomed Opt Express 2024; 15:2524-2542. [PMID: 38633101 PMCID: PMC11019679 DOI: 10.1364/boe.518955] [Received: 01/17/2024] [Revised: 03/06/2024] [Accepted: 03/11/2024] [Indexed: 04/19/2024]
Abstract
Optical diffraction tomography (ODT) is a powerful label-free measurement tool that can quantitatively image the three-dimensional (3D) refractive index (RI) distribution of samples. However, the inherent "missing cone problem," limited illumination angles, and dependence on intensity-only measurements in a simplified imaging setup can all lead to insufficient information mapping in the Fourier domain, affecting 3D reconstruction results. In this paper, we propose the alternating projection combined with fast gradient projection (FGP-AP) method to compensate for these problems; it effectively reconstructs the 3D RI distribution of samples from intensity-only images captured with LED array microscopy. The FGP-AP method employs the alternating projection (AP) algorithm for gradient descent and the fast gradient projection (FGP) algorithm for regularization constraints. This approach is equivalent to incorporating prior knowledge of sample non-negativity and smoothness into the 3D reconstruction process. Simulations demonstrate that the FGP-AP method improves reconstruction quality compared with the original AP method, particularly in the presence of noise. Experimental results, obtained from mouse kidney cells and label-free blood cells, further affirm the superior 3D imaging efficacy of the FGP-AP method.
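The "gradient step, then projection onto a constraint set, with Nesterov-type acceleration" pattern behind fast gradient projection can be shown on a small problem. This is a minimal sketch assuming a generic linear forward operator `A` (hypothetical, not the ODT imaging model) and only the non-negativity prior; it omits the smoothness regularizer and the alternating projection over illumination angles used in FGP-AP.

```python
import numpy as np


def fast_projected_gradient(A, b, n_iter=2000):
    """Accelerated projected gradient for  min 0.5 * ||A x - b||^2  subject to  x >= 0."""
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the gradient (largest singular value squared)
    x = np.zeros(A.shape[1])
    y = x.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)
        # Gradient step, then projection onto the non-negative orthant.
        x_new = np.maximum(y - grad / L, 0.0)
        # FISTA-style momentum: extrapolate from the last two iterates.
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x


rng = np.random.default_rng(0)
A = rng.normal(size=(30, 5))  # hypothetical linear forward operator
x_true = np.array([1.0, 0.0, 2.0, 0.0, 0.5])  # non-negative "object"
b = A @ x_true
x_hat = fast_projected_gradient(A, b)
```

The projection is what encodes the prior: every returned iterate is non-negative by construction, regardless of noise in `b`.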
Affiliation(s)
- Zewen Yang
- State Key Laboratory for Manufacturing System Engineering, Xi’an Jiaotong University, Xi’an 710049, China
- Lu Zhang
- State Key Laboratory for Manufacturing System Engineering, Xi’an Jiaotong University, Xi’an 710049, China
- School of Instrument Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China
- Tong Liu
- State Key Laboratory for Manufacturing System Engineering, Xi’an Jiaotong University, Xi’an 710049, China
- Huijun Wang
- State Key Laboratory for Manufacturing System Engineering, Xi’an Jiaotong University, Xi’an 710049, China
- Zhiyuan Tang
- Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an 710119, China
- Hong Zhao
- State Key Laboratory for Manufacturing System Engineering, Xi’an Jiaotong University, Xi’an 710049, China
- School of Instrument Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China
- Li Yuan
- First Affiliated Hospital, Xi’an Jiaotong University, Xi’an, Shannxi, 710049, China
- Zhenxi Zhang
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Xi’an Jiaotong University, Xi’an 710049, China
- Xiaolong Liu
- Mengchao Hepatobiliary Hospital of Fujian Medical University, The United Innovation of Mengchao Hepatobiliary Technology Key Laboratory of Fujian Province, Fuzhou 350025, China
43
Martínez‐Mauricio KL, García‐Jacas CR, Cordoves‐Delgado G. Examining evolutionary scale modeling-derived different-dimensional embeddings in the antimicrobial peptide classification through a KNIME workflow. Protein Sci 2024; 33:e4928. [PMID: 38501511 PMCID: PMC10949403 DOI: 10.1002/pro.4928] [Received: 09/07/2023] [Revised: 01/28/2024] [Accepted: 01/30/2024] [Indexed: 03/20/2024]
Abstract
Molecular features play an important role in different bio-chem-informatics tasks, such as Quantitative Structure-Activity Relationship (QSAR) modeling. Several pre-trained models have recently been created for use in downstream tasks, either by fine-tuning a specific model or by extracting features to feed traditional classifiers. In this regard, a new family of Evolutionary Scale Modeling models (termed ESM-2 models) was recently introduced, demonstrating outstanding results in protein structure prediction benchmarks. Herein, we studied the usefulness of the different-dimensional embeddings derived from the ESM-2 models to classify antimicrobial peptides (AMPs). To this end, we built a KNIME workflow that applies the same modeling methodology across experiments in order to guarantee fair analyses. As a result, the 640- and 1280-dimensional embeddings derived from the 30- and 33-layer ESM-2 models, respectively, are the most valuable, since the QSAR models built from them achieved statistically better performance. We also fused features from the different ESM-2 models and concluded that fusion yields better QSAR models than using features from a single ESM-2 model. Frequency studies revealed that only a portion of the ESM-2 embeddings is valuable for modeling tasks, since between 43% and 66% of the features were never used. Comparisons with state-of-the-art deep learning (DL) models confirm that, when performing methodologically principled studies in the prediction of AMPs, non-DL based QSAR models yield comparable-to-superior performance to DL-based QSAR models. The developed KNIME workflow is freely available at https://github.com/cicese-biocom/classification-QSAR-bioKom. This workflow can help avoid unfair comparisons of new computational methods and can be used to propose new non-DL based QSAR models.
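Feature fusion of this kind amounts to concatenating fixed-length vectors from each model. The sketch below assumes that per-residue embeddings are mean-pooled into one vector per peptide (the abstract does not state the pooling strategy), and uses random arrays as stand-ins for real 30-layer (640-d) and 33-layer (1280-d) ESM-2 outputs; `unused_fraction` is a hypothetical helper mirroring the "features never used" statistic.

```python
import numpy as np


def pool(per_residue):
    """Mean-pool per-residue embeddings (L, d) into one peptide-level vector (d,)."""
    return per_residue.mean(axis=0)


def fuse(*per_residue_embeddings):
    """Feature-level fusion: concatenate pooled vectors from several models."""
    return np.concatenate([pool(e) for e in per_residue_embeddings])


def unused_fraction(usage_counts):
    """Share of features that a model never selected (usage count of zero)."""
    counts = np.asarray(usage_counts)
    return float(np.mean(counts == 0))


rng = np.random.default_rng(0)
L = 25  # hypothetical peptide length
emb_t30 = rng.normal(size=(L, 640))   # stand-in for 30-layer ESM-2 per-residue output
emb_t33 = rng.normal(size=(L, 1280))  # stand-in for 33-layer ESM-2 per-residue output
features = fuse(emb_t30, emb_t33)
print(features.shape)  # (1920,)
```

The fused 1920-dimensional vector is what a traditional classifier would receive; counting how often each column is selected across models gives the kind of frequency analysis described above.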
Affiliation(s)
- Karla L. Martínez‐Mauricio
- Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Mexico
- César R. García‐Jacas
- Cátedras CONAHCYT – Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Mexico
- Greneter Cordoves‐Delgado
- Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Mexico
44
Ghiandoni GM, Evertsson E, Riley DJ, Tyrchan C, Rathi PC. Augmenting DMTA using predictive AI modelling at AstraZeneca. Drug Discov Today 2024; 29:103945. [PMID: 38460568 DOI: 10.1016/j.drudis.2024.103945] [Received: 12/22/2023] [Revised: 02/27/2024] [Accepted: 03/05/2024] [Indexed: 03/11/2024]
Abstract
Design-Make-Test-Analyse (DMTA) is the discovery cycle through which molecules are designed, synthesised, and assayed to produce data that in turn are analysed to inform the next iteration. The process is repeated until viable drug candidates are identified, often requiring many cycles before reaching a sweet spot. The advent of artificial intelligence (AI) and cloud computing presents an opportunity to innovate drug discovery to reduce the number of cycles needed to yield a candidate. Here, we present the Predictive Insight Platform (PIP), a cloud-native modelling platform developed at AstraZeneca. The impact of PIP in each step of DMTA, as well as its architecture, integration, and usage, are discussed and used to provide insights into the future of drug discovery.
Affiliation(s)
- Gian Marco Ghiandoni
- Augmented DMTA Platform, R&D IT, AstraZeneca, The Discovery Centre (DISC), Francis Crick Avenue, Cambridge CB2 0AA, UK
- Emma Evertsson
- Research and Early Development, Respiratory and Immunology (R&I), Biopharmaceuticals R&D, AstraZeneca, Pepparedsleden, Mölndal, SE 43183, Sweden
- David J Riley
- Augmented DMTA Platform, R&D IT, AstraZeneca, The Discovery Centre (DISC), Francis Crick Avenue, Cambridge CB2 0AA, UK
- Christian Tyrchan
- Research and Early Development, Respiratory and Immunology (R&I), Biopharmaceuticals R&D, AstraZeneca, Pepparedsleden, Mölndal, SE 43183, Sweden
- Prakash Chandra Rathi
- Augmented DMTA Platform, R&D IT, AstraZeneca, The Discovery Centre (DISC), Francis Crick Avenue, Cambridge CB2 0AA, UK
45
Daniel DT, Mitra S, Eichel RA, Diddens D, Granwehr J. Machine Learning Isotropic g Values of Radical Polymers. J Chem Theory Comput 2024; 20:2592-2604. [PMID: 38456629 DOI: 10.1021/acs.jctc.3c01252] [Indexed: 03/09/2024]
Abstract
Methods for electronic structure computations, such as density functional theory (DFT), are routinely used for the calculation of spectroscopic parameters to establish and validate structure-parameter correlations. DFT calculations, however, are computationally expensive for large systems such as polymers. This work explores the machine learning (ML) of isotropic g values, g_iso, obtained from electron paramagnetic resonance (EPR) experiments of an organic radical polymer. An ML model based on regression trees is trained on DFT-calculated g values of poly(2,2,6,6-tetramethylpiperidinyloxy-4-yl methacrylate) (PTMA) polymer structures extracted from different time frames of a molecular dynamics trajectory. The DFT-derived g values, g_iso^calc, for different radical densities of PTMA, are compared against experimentally derived g values obtained from in operando EPR measurements of a PTMA-based organic radical battery. The ML-predicted g_iso values, g_iso^pred, were compared with g_iso^calc to evaluate the performance of the model. Mean deviations of g_iso^pred from g_iso^calc were found to be on the order of 0.0001. Furthermore, a performance evaluation on test structures from a separate MD trajectory indicated that the model is sensitive to the radical density and efficiently learns to predict g_iso values even for radical densities that were not part of the training data set. Since our trained model can reproduce the changes in g_iso along the MD trajectory and is sensitive to the extent of equilibration of the polymer structure, it is a promising alternative to computationally more expensive DFT methods, particularly for large systems that cannot be easily represented by a smaller model system.
Affiliation(s)
- Davis Thomas Daniel
- Institute of Energy and Climate Research (IEK-9), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
- Institute of Technical and Macromolecular Chemistry, RWTH Aachen University, 52056 Aachen, Germany
- Souvik Mitra
- Institute of Physical Chemistry, University of Münster, 48149 Münster, Germany
- Rüdiger-A Eichel
- Institute of Energy and Climate Research (IEK-9), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
- Institute of Physical Chemistry, RWTH Aachen University, Aachen 52056, Germany
- Diddo Diddens
- Helmholtz Institute Münster (IEK-12), Forschungszentrum Jülich GmbH, 48149 Münster, Germany
- Josef Granwehr
- Institute of Energy and Climate Research (IEK-9), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
- Institute of Technical and Macromolecular Chemistry, RWTH Aachen University, 52056 Aachen, Germany
46
Elgendy R, Younes A, Abu-Donia HM, Farouk RM. Efficient quantum algorithms for set operations. Sci Rep 2024; 14:7015. [PMID: 38527996 DOI: 10.1038/s41598-024-56860-2] [Received: 12/11/2023] [Accepted: 03/12/2024] [Indexed: 03/27/2024]
Abstract
Analyzing the relations between Boolean functions has applications in many fields, such as database systems, cryptography, and collision problems. This paper proposes four quantum algorithms that use amplitude amplification techniques to perform set operations, including Intersection, Difference, and Union, on two Boolean functions in O(√N) time complexity. The proposed algorithms employ two quantum amplitude amplification techniques divided into two stages. The first stage uses the Younes et al. algorithm for quantum searching via entanglement and partial diffusion to prepare incomplete superpositions of the truth set of the first Boolean function. In the second stage, a modified version of Arima's algorithm, along with an oracle that represents the second Boolean function, is employed to handle the set operations. The proposed algorithms have a higher probability of success in more general and comprehensive applications when compared with relevant techniques in the literature.
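The basic amplitude-amplification mechanism (oracle phase flip followed by inversion about the mean, repeated about (π/4)·sqrt(N/M) times) can be checked with a small classical state-vector simulation. This sketch uses textbook Grover iterations with an oracle marking the intersection of two hypothetical Boolean functions `f1` and `f2`; it does not implement the Younes et al. partial-diffusion search or the modified Arima algorithm described above.

```python
import numpy as np


def amplify(n_qubits, predicate):
    """Simulate Grover amplitude amplification for the items where predicate(x) is True."""
    N = 2 ** n_qubits
    marked = np.array([x for x in range(N) if predicate(x)])
    M = len(marked)
    if M == 0 or M == N:
        raise ValueError("need 0 < M < N marked items")
    state = np.full(N, 1.0 / np.sqrt(N))  # uniform superposition over N items
    theta = np.arcsin(np.sqrt(M / N))
    iterations = int(np.floor(np.pi / (4 * theta)))  # roughly (pi/4) * sqrt(N / M)
    for _ in range(iterations):
        state[marked] *= -1.0  # oracle: phase flip on the truth set
        state = 2.0 * state.mean() - state  # diffusion: inversion about the mean
    success = float(np.sum(state[marked] ** 2))  # probability of measuring a marked item
    return iterations, success, set(marked.tolist())


def f1(x):
    return x % 4 == 0  # hypothetical Boolean function 1


def f2(x):
    return x % 3 == 0  # hypothetical Boolean function 2


# Intersection of the two truth sets over N = 64 items.
iters, p, members = amplify(6, lambda x: f1(x) and f2(x))
print(iters, round(p, 4), sorted(members))
```

Difference and union follow by changing the predicate (`f1(x) and not f2(x)`, `f1(x) or f2(x)`); note that plain Grover gains little when the marked set covers a large share of N, which is one motivation for the more elaborate schemes in the paper.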
Affiliation(s)
- Rehab Elgendy
- Department of Mathematics, Faculty of Science, Zagazig University, Zagazig, Egypt
- Ahmed Younes
- Department of Mathematics and Computer Science, Faculty of Science, Alexandria University, Alexandria, Egypt
- School of Computer Science, University of Birmingham, Birmingham, B15 2TT, UK
- H M Abu-Donia
- Department of Mathematics, Faculty of Science, Zagazig University, Zagazig, Egypt
- R M Farouk
- Department of Mathematics, Faculty of Science, Zagazig University, Zagazig, Egypt
47
Luo M, Lee SS. Tandem neural network-assisted inverse design of highly efficient diffractive slanted waveguide grating. Opt Express 2024; 32:12587-12600. [PMID: 38571077 DOI: 10.1364/oe.514502] [Received: 11/29/2023] [Accepted: 03/12/2024] [Indexed: 04/05/2024]
Abstract
Virtual reality devices featuring diffractive grating components have emerged as hotspots in the field of near-to-eye displays. The core aim of our work is to streamline the intricacies involved in designing a highly efficient slanted waveguide grating using a deep-learning-driven inverse design technique. We propose and establish a tandem neural network (TNN) comprising a generative flow-based invertible neural network and a fully connected neural network. The proposed TNN can automatically optimize the coupling efficiencies of the proposed grating at multiple wavelengths, including red, green, and blue beams, at incident angles in the range of 0°-15°. The efficiency indicators, namely peak transmittance, average transmittance, and illuminance uniformity, reach approximately 100%, 92%, and 98%, respectively. Additionally, the TNN can deduce the structural parameters of the grating inversely from these indicators within hundreds of milliseconds to seconds. The inverse-engineered grating is anticipated to serve as a paradigm for simplifying and expediting the development of diverse types of waveguide gratings.
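The tandem idea is that the inverse model is trained through a frozen forward model, so the loss is measured on responses rather than on design parameters, which sidesteps non-unique inverses. This is a minimal sketch assuming linear maps in place of both networks: a hypothetical "pre-trained" forward matrix `W` and a learned inverse matrix `V`, fitted by plain gradient descent. It is not the flow-based TNN of the paper.

```python
import numpy as np


def train_tandem_inverse(W_forward, targets, lr=0.05, steps=2000):
    """Fit a linear inverse map V through a frozen forward model.

    W_forward: (d_out, d_param) frozen forward model mapping design parameters to responses.
    targets:   (n, d_out) desired responses (e.g. efficiency indicators).
    Loss: mean over samples of ||W_forward @ V @ t - t||^2; only V is updated.
    """
    n = targets.shape[0]
    V = np.zeros((W_forward.shape[1], targets.shape[1]))
    for _ in range(steps):
        params = targets @ V.T  # inverse model proposes designs
        responses = params @ W_forward.T  # frozen forward model evaluates them
        residual = responses - targets
        # Gradient of the loss w.r.t. V, back-propagated through the frozen forward model.
        grad_V = (2.0 / n) * W_forward.T @ residual.T @ targets
        V -= lr * grad_V
    return V


W = np.array([[2.0, 0.5, 0.0],
              [0.0, 1.5, 0.3],
              [0.2, 0.0, 1.0]])  # hypothetical "pre-trained" forward model
rng = np.random.default_rng(0)
T = rng.normal(size=(200, 3))  # hypothetical target responses
V = train_tandem_inverse(W, T)
```

In this linear, invertible toy case the trained inverse converges to the matrix inverse of `W`; with real networks the same loss lets the inverse model pick any design that the forward model maps to the requested response.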
48
Korolev V, Mitrofanov A. Coarse-Grained Crystal Graph Neural Networks for Reticular Materials Design. J Chem Inf Model 2024; 64:1919-1931. [PMID: 38456446 DOI: 10.1021/acs.jcim.3c02083] [Indexed: 03/09/2024]
Abstract
Reticular materials, including metal-organic frameworks and covalent organic frameworks, combine relative ease of synthesis with an impressive range of applications in various fields, from gas storage to biomedicine. Diverse properties arise from the variation of building units (metal centers and organic linkers) in an almost infinite chemical space. Such variation substantially complicates experimental design and promotes the use of computational methods. In particular, the most successful artificial intelligence algorithms for predicting the properties of reticular materials are atomic-level graph neural networks, which optionally incorporate domain knowledge. Nonetheless, data-driven inverse design involving these models suffers from the incorporation of irrelevant and redundant features such as a full atomistic graph and network topology. In this study, we propose a new way of representing materials, aiming to overcome the limitations of existing methods: message passing is performed on a coarse-grained crystal graph that comprises molecular building units. To highlight the merits of our approach, we assessed the predictive performance and energy efficiency of neural networks built on different materials representations, including composition-based and crystal-structure-aware models. Coarse-grained crystal graph neural networks showed decent accuracy at low computational cost, making them a valuable alternative to omnipresent atomic-level algorithms. Moreover, the presented models can be successfully integrated into an inverse materials design pipeline as estimators of the objective function. Overall, the coarse-grained crystal graph framework aims to challenge the prevailing atom-centric perspective on reticular materials design.
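Message passing on a coarse-grained graph works exactly as on an atomic graph, only with far fewer nodes. This is a minimal generic sketch assuming each node is a building unit (two hypothetical metal nodes and two linkers with random 3-dimensional features), sum aggregation over neighbours, a ReLU update and sum pooling; it is not the architecture used in the paper.

```python
import numpy as np


def message_passing_readout(node_feats, edges, W_self, W_msg, rounds=2):
    """Sum-aggregation message passing on an undirected graph, then a sum-pooled readout."""
    n = node_feats.shape[0]
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0  # undirected connection between building units
    h = node_feats
    for _ in range(rounds):
        # Each node combines its own state with the sum of its neighbours' states.
        h = np.maximum(0.0, h @ W_self + A @ h @ W_msg)
    return h.sum(axis=0)  # permutation-invariant graph-level embedding


rng = np.random.default_rng(1)
unit_feats = rng.normal(size=(4, 3))  # 2 metal nodes + 2 linkers, 3 hypothetical features each
bonds = [(0, 2), (0, 3), (1, 2), (1, 3)]  # each metal node connected to each linker
W_self, W_msg = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
embedding = message_passing_readout(unit_feats, bonds, W_self, W_msg)
```

Because aggregation and pooling are sums, relabelling the building units leaves the graph embedding unchanged, which is the property that lets such a readout serve as a material-level descriptor.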
Affiliation(s)
- Vadim Korolev
- Department of Chemistry, Lomonosov Moscow State University, Moscow 119991, Russia
- MSU Institute for Artificial Intelligence, Lomonosov Moscow State University, Moscow 119192, Russia
- Artem Mitrofanov
- Department of Chemistry, Lomonosov Moscow State University, Moscow 119991, Russia
- MSU Institute for Artificial Intelligence, Lomonosov Moscow State University, Moscow 119192, Russia
49
Lalith N, Singh AR, Gauthier JA. The Importance of Reaction Energy in Predicting Chemical Reaction Barriers with Machine Learning Models. Chemphyschem 2024:e202300933. [PMID: 38517585 DOI: 10.1002/cphc.202300933] [Received: 12/07/2023] [Revised: 03/21/2024] [Accepted: 03/22/2024] [Indexed: 03/24/2024]
Abstract
Improving our fundamental understanding of complex heterogeneous catalytic processes increasingly relies on electronic structure simulations and microkinetic models based on calculated energy differences. In particular, calculation of activation barriers, usually achieved through compute-intensive saddle point search routines, remains a serious bottleneck in understanding trends in catalytic activity for highly branched reaction networks. Although the well-known Brønsted-Evans-Polanyi (BEP) scaling, a one-feature linear regression model, has been widely applied in such microkinetic models, it still relies on calculated reaction energies and may not generalize beyond a single facet of a single class of materials, e.g., terrace sites on transition metals. For highly branched and energetically shallow reaction networks, such as electrochemical CO₂ reduction or wastewater remediation, calculating even reaction energies on many surfaces can become computationally intractable due to the combinatorial explosion of states that must be considered. Here, we investigate the feasibility of predicting activation barriers without knowledge of the reaction energy, using linear and nonlinear machine learning (ML) models trained on a new database of over 500 dehydrogenation activation barriers. We find that inclusion of the reaction energy significantly improves both classes of ML models, but that complex nonlinear models can achieve performance similar to the simplest BEP scaling when predicting activation barriers on new systems. Additionally, inclusion of the reaction energy significantly improves generalizability to new systems beyond the training set. Our results suggest that the reaction energy is a critical feature to consider when building models to predict activation barriers, indicating that efforts to reliably predict reaction energies, e.g., through the Open Catalyst Project and others, will be an important route to effective model development for more complex systems.
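The BEP scaling described in this abstract is a one-feature linear model, E_a = α·ΔE + β, relating the activation barrier E_a to the reaction energy ΔE. The sketch below, assuming NumPy and energies in eV, fits and applies such a line by ordinary least squares; fit_bep, predict_bep and mae are hypothetical helper names, and no data from the study are reproduced.

```python
import numpy as np

def fit_bep(delta_e, e_act):
    """Least-squares fit of the one-feature BEP relation E_a = alpha * dE + beta.

    delta_e : reaction energies (eV), one per elementary step
    e_act   : corresponding activation barriers (eV)
    Returns (alpha, beta).
    """
    x = np.asarray(delta_e, dtype=float)
    y = np.asarray(e_act, dtype=float)
    # np.polyfit returns coefficients from highest degree down: [slope, intercept].
    alpha, beta = np.polyfit(x, y, deg=1)
    return float(alpha), float(beta)

def predict_bep(delta_e, alpha, beta):
    """Predict barriers from reaction energies with a fitted BEP line."""
    return alpha * np.asarray(delta_e, dtype=float) + beta

def mae(y_true, y_pred):
    """Mean absolute error, a common metric for barrier prediction."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))
```

For instance, fit_bep([-1, 0, 1, 2], [0.5, 1.0, 1.5, 2.0]) on toy points lying exactly on a line recovers α ≈ 0.5 and β ≈ 1.0. Comparing the held-out mae of such a fit with that of a model trained without ΔE is one simple way to probe how much the reaction energy contributes as a feature.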
Affiliation(s)
- Nithin Lalith
- Department of Chemical Engineering, Texas Tech University, Lubbock, TX 79409, USA
- Joseph A Gauthier
- Department of Chemical Engineering, Texas Tech University, Lubbock, TX 79409, USA
50
Sammüller F, Hermann S, Schmidt M. Why neural functionals suit statistical mechanics. J Phys Condens Matter 2024; 36:243002. [PMID: 38467072 DOI: 10.1088/1361-648x/ad326f] [Received: 11/29/2023] [Accepted: 03/11/2024] [Indexed: 03/13/2024]
Abstract
We describe recent progress in the statistical mechanical description of many-body systems via machine learning combined with concepts from density functional theory and many-body simulations. We argue that the neural functional theory by Sammüller et al (2023 Proc. Natl Acad. Sci. 120 e2312484120) gives a functional representation of direct correlations and of thermodynamics that allows for thorough quality control and consistency checking of the involved methods of artificial intelligence. Addressing a prototypical system, we here present a pedagogical application to hard-core particles in one spatial dimension, where Percus' exact solution for the free energy functional provides an unambiguous reference. A corresponding standalone numerical tutorial that demonstrates the neural functional concepts together with the underlying fundamentals of Monte Carlo simulations, classical density functional theory, machine learning, and differentiable programming is available online at https://github.com/sfalmo/NeuralDFT-Tutorial.
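Percus' exact reference can be made concrete. The sketch below, assuming NumPy, a periodic grid, and the standard weighted-density form of Percus' functional for hard rods of length σ = 2R (βF_exc = ∫ dx Φ, with Φ = -n0 ln(1 - n1), n0(x) = [ρ(x - R) + ρ(x + R)]/2 and n1(x) = ∫ ρ over [x - R, x + R]), evaluates the one-body direct correlation function c1(x) = -δβF_exc/δρ(x), the kind of direct correlation that a neural functional represents. It is an illustration of the exact functional, not code from the linked tutorial; the name percus_c1 is hypothetical.

```python
import numpy as np

def percus_c1(rho, dx, sigma=1.0):
    """One-body direct correlation c1(x) of 1D hard rods from Percus' functional.

    Assumes a periodic grid with spacing dx and rod length sigma = 2R,
    where R must be an integer multiple of dx.
    """
    rho = np.asarray(rho, dtype=float)
    m = int(round(sigma / 2 / dx))  # half rod length R in grid points
    if not np.isclose(m * dx, sigma / 2):
        raise ValueError("sigma/2 must be an integer multiple of dx")

    def shift(f, k):
        # Value of f at x + k*dx on a periodic grid.
        return np.roll(f, -k)

    def window_integral(f):
        # Trapezoidal integral of f over [x - R, x + R]; exact for constant f.
        total = 0.5 * (shift(f, -m) + shift(f, m))
        for k in range(-m + 1, m):
            total = total + shift(f, k)
        return total * dx

    n0 = 0.5 * (shift(rho, -m) + shift(rho, m))  # endpoint (contact) density
    n1 = window_integral(rho)                    # local packing fraction
    if np.any(n1 >= 1.0):
        raise ValueError("local packing fraction n1 must stay below 1")

    dphi_dn0 = -np.log(1.0 - n1)
    dphi_dn1 = n0 / (1.0 - n1)
    # Both weight functions are symmetric, so the functional derivative applies
    # the same weighted averages to the partial derivatives of Phi.
    return -(0.5 * (shift(dphi_dn0, -m) + shift(dphi_dn0, m))
             + window_integral(dphi_dn1))
```

For a uniform density ρ, the result reduces to c1 = ln(1 - η) - η/(1 - η) with η = ρσ, which is minus the excess chemical potential of the Tonks gas; this gives a quick consistency check of the kind the abstract advocates for machine-learned functionals.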
Affiliation(s)
- Florian Sammüller
- Theoretische Physik II, Physikalisches Institut, Universität Bayreuth, D-95447 Bayreuth, Germany
- Sophie Hermann
- Theoretische Physik II, Physikalisches Institut, Universität Bayreuth, D-95447 Bayreuth, Germany
- Matthias Schmidt
- Theoretische Physik II, Physikalisches Institut, Universität Bayreuth, D-95447 Bayreuth, Germany