1
Grambow CA, Weir H, Cunningham CN, Biancalani T, Chuang KV. CREMP: Conformer-rotamer ensembles of macrocyclic peptides for machine learning. Sci Data 2024; 11:859. [PMID: 39122750 PMCID: PMC11316032 DOI: 10.1038/s41597-024-03698-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Accepted: 07/29/2024] [Indexed: 08/12/2024]
Abstract
Computational and machine learning approaches to model the conformational landscape of macrocyclic peptides have the potential to enable rational design and optimization. However, accurate, fast, and scalable methods for modeling macrocycle geometries remain elusive. Recent deep learning approaches have significantly accelerated protein structure prediction and the generation of small-molecule conformational ensembles, yet similar progress has not been made for macrocyclic peptides due to their unique properties. Here, we introduce CREMP, a resource generated for the rapid development and evaluation of machine learning models for macrocyclic peptides. CREMP contains 36,198 unique macrocyclic peptides and their high-quality structural ensembles generated using the Conformer-Rotamer Ensemble Sampling Tool (CREST). Altogether, this new dataset contains nearly 31.3 million unique macrocycle geometries, each annotated with energies derived from semi-empirical extended tight-binding (xTB) DFT calculations. Additionally, we include 3,258 macrocycles with reported passive permeability data to couple conformational ensembles to experiment. We anticipate that this dataset will enable the development of machine learning models that can improve peptide design and optimization for novel therapeutics.
Affiliation(s)
- Colin A Grambow
  - Prescient Design, Genentech, 1 DNA Way, South San Francisco, CA, 94080, USA
- Hayley Weir
  - Prescient Design, Genentech, 1 DNA Way, South San Francisco, CA, 94080, USA
- Christian N Cunningham
  - Department of Peptide Therapeutics, Genentech, 1 DNA Way, South San Francisco, CA, 94080, USA
- Tommaso Biancalani
  - Biology Research | Development, Genentech, 1 DNA Way, South San Francisco, CA, 94080, USA
- Kangway V Chuang
  - Prescient Design, Genentech, 1 DNA Way, South San Francisco, CA, 94080, USA
2
Ai C, Yang H, Liu X, Dong R, Ding Y, Guo F. MTMol-GPT: De novo multi-target molecular generation with transformer-based generative adversarial imitation learning. PLoS Comput Biol 2024; 20:e1012229. [PMID: 38924082 PMCID: PMC11233020 DOI: 10.1371/journal.pcbi.1012229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Revised: 07/09/2024] [Accepted: 06/03/2024] [Indexed: 06/28/2024]
Abstract
De novo drug design, which aims to generate new drugs with specific pharmacological properties, is crucial in advancing drug discovery. Recently, deep generative models have achieved inspiring progress in generating drug-like compounds. However, these models prioritize single-target drug generation for pharmacological intervention, neglecting the complicated inherent mechanisms of diseases, which are influenced by multiple factors. Consequently, developing novel multi-target drugs that simultaneously act on specific targets can enhance anti-tumor efficacy and address issues related to resistance mechanisms. To address this issue and inspired by Generative Pre-trained Transformer (GPT) models, we propose an upgraded GPT model with generative adversarial imitation learning for multi-target molecular generation called MTMol-GPT. The multi-target molecular generator employs a dual discriminator model using the Inverse Reinforcement Learning (IRL) method for concurrent multi-target molecular generation. Extensive results show that MTMol-GPT generates various valid, novel, and effective multi-target molecules for various complex diseases, demonstrating robustness and generalization capability. In addition, molecular docking and pharmacophore mapping experiments demonstrate the drug-likeness and effectiveness of the generated molecules, which could potentially improve neuropsychiatric interventions. Furthermore, our model's generalizability is exemplified by a case study focusing on multi-targeted drug design for breast cancer. As a broadly applicable solution for multiple targets, MTMol-GPT provides new insight into future directions to enhance potential complex disease therapeutics by generating high-quality multi-target molecules in drug discovery.
Affiliation(s)
- Chengwei Ai
  - School of computer science and engineering, Central South University, Changsha, China
- Hongpeng Yang
  - Department of computer science and engineering, University of South Carolina, Columbia, South Carolina, United States of America
- Xiaoyi Liu
  - School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, China
  - Ministry of Education, Engineering Research Center for Pharmaceutics of Chinese Materia Medica and New Drug Development, Beijing, China
- Ruihan Dong
  - Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Yijie Ding
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Fei Guo
  - School of computer science and engineering, Central South University, Changsha, China
3
Kuznetsov M, Ryabov F, Schutski R, Shayakhmetov R, Lin YC, Aliper A, Polykovskiy D. COSMIC: Molecular Conformation Space Modeling in Internal Coordinates with an Adversarial Framework. J Chem Inf Model 2024; 64:3610-3620. [PMID: 38668753 PMCID: PMC11094738 DOI: 10.1021/acs.jcim.3c00989] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Revised: 03/29/2024] [Accepted: 04/02/2024] [Indexed: 05/14/2024]
Abstract
Fast and accurate conformation space modeling is an essential part of computational approaches for solving ligand- and structure-based drug discovery problems. Recent state-of-the-art diffusion models for molecular conformation generation show promising distribution coverage and physical plausibility metrics but suffer from a slow sampling procedure. We propose a novel adversarial generative framework, COSMIC, that shows comparable generative performance but provides a time-efficient sampling and training procedure. Given a molecular graph and random noise, the generator produces a conformation in two stages. First, it constructs a conformation in a rotation- and translation-invariant representation: internal coordinates. Second, the model predicts the distances between neighboring atoms and performs a few fast optimization steps to refine the initial conformation. The proposed model considers conformation energy, achieving comparable space coverage and diversity metrics.
Affiliation(s)
- Maksim Kuznetsov
  - Insilico Medicine Canada Inc., 1250 René-Lévesque Ouest, Suite 3710, Montréal, Québec H3B 4W8, Canada
- Fedor Ryabov
  - Insilico Medicine Hong Kong Ltd., Unit 310, 3/F, Building 8W, Phase 2, Hong Kong Science Park, Pak Shek Kok, New Territories, Hong Kong 999077, China
- Roman Schutski
  - Insilico Medicine Hong Kong Ltd., Unit 310, 3/F, Building 8W, Phase 2, Hong Kong Science Park, Pak Shek Kok, New Territories, Hong Kong 999077, China
- Rim Shayakhmetov
  - Insilico Medicine Canada Inc., 1250 René-Lévesque Ouest, Suite 3710, Montréal, Québec H3B 4W8, Canada
- Yen-Chu Lin
  - Insilico Medicine Taiwan Ltd., Taipei City 110208, Taiwan
- Alex Aliper
  - Insilico Medicine Hong Kong Ltd., Unit 310, 3/F, Building 8W, Phase 2, Hong Kong Science Park, Pak Shek Kok, New Territories, Hong Kong 999077, China
- Daniil Polykovskiy
  - Insilico Medicine Canada Inc., 1250 René-Lévesque Ouest, Suite 3710, Montréal, Québec H3B 4W8, Canada
4
Ju W, Fang Z, Gu Y, Liu Z, Long Q, Qiao Z, Qin Y, Shen J, Sun F, Xiao Z, Yang J, Yuan J, Zhao Y, Wang Y, Luo X, Zhang M. A Comprehensive Survey on Deep Graph Representation Learning. Neural Netw 2024; 173:106207. [PMID: 38442651 DOI: 10.1016/j.neunet.2024.106207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 01/23/2024] [Accepted: 02/21/2024] [Indexed: 03/07/2024]
Abstract
Graph representation learning aims to effectively encode high-dimensional sparse graph-structured data into low-dimensional dense vectors, which is a fundamental task that has been widely studied in a range of fields, including machine learning and data mining. Classic graph embedding methods follow the basic idea that the embedding vectors of interconnected nodes in the graph can still maintain a relatively close distance, thereby preserving the structural information between the nodes in the graph. However, this is sub-optimal because: (i) traditional methods have limited model capacity, which limits learning performance; (ii) existing techniques typically rely on unsupervised learning strategies and fail to couple with the latest learning paradigms; (iii) representation learning and downstream tasks depend on each other and should be jointly enhanced. With the remarkable success of deep learning, deep graph representation learning has shown great potential and advantages over shallow (traditional) methods, and a large number of deep graph representation learning techniques, especially graph neural networks, have been proposed in the past decade. In this survey, we comprehensively review current deep graph representation learning algorithms by proposing a new taxonomy of existing state-of-the-art literature. Specifically, we systematically summarize the essential components of graph representation learning and categorize existing approaches by graph neural network architecture and by the most recent advanced learning paradigms. Moreover, this survey also presents practical and promising applications of deep graph representation learning. Last but not least, we state new perspectives and suggest challenging directions that deserve further investigation in the future.
Affiliation(s)
- Wei Ju
  - School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Zheng Fang
  - School of Intelligence Science and Technology, Peking University, Beijing, 100871, China
- Yiyang Gu
  - School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Zequn Liu
  - School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Qingqing Long
  - Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100086, China
- Ziyue Qiao
  - Artificial Intelligence Thrust, The Hong Kong University of Science and Technology, Guangzhou, 511453, China
- Yifang Qin
  - School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Jianhao Shen
  - School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Fang Sun
  - Department of Computer Science, University of California, Los Angeles, 90095, USA
- Zhiping Xiao
  - Department of Computer Science, University of California, Los Angeles, 90095, USA
- Junwei Yang
  - School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Jingyang Yuan
  - School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Yusheng Zhao
  - School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Yifan Wang
  - School of Information Technology & Management, University of International Business and Economics, Beijing, 100029, China
- Xiao Luo
  - Department of Computer Science, University of California, Los Angeles, 90095, USA
- Ming Zhang
  - School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
5
Ding Y, Qiang B, Chen Q, Liu Y, Zhang L, Liu Z. Exploring Chemical Reaction Space with Machine Learning Models: Representation and Feature Perspective. J Chem Inf Model 2024; 64:2955-2970. [PMID: 38489239 DOI: 10.1021/acs.jcim.4c00004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2024]
Abstract
Chemical reactions serve as foundational building blocks for organic chemistry and drug design. In the era of large AI models, data-driven approaches have emerged to innovate the design of novel reactions, optimize existing ones for higher yields, and discover new pathways for synthesizing chemical structures comprehensively. To effectively address these challenges with machine learning models, it is imperative to derive robust and informative representations or engage in feature engineering using extensive data sets of reactions. This work aims to provide a comprehensive review of established reaction featurization approaches, offering insights into the selection of representations and the design of features for a wide array of tasks. The advantages and limitations of employing SMILES, molecular fingerprints, molecular graphs, and physics-based properties are meticulously elaborated. Solutions to bridge the gap between different representations will also be critically evaluated. Additionally, we introduce a new frontier in chemical reaction pretraining, holding promise as an innovative yet unexplored avenue.
Affiliation(s)
- Yuheng Ding
  - Department of Pharmaceutical Science, Peking University, Beijing 100191, China
- Bo Qiang
  - Department of Pharmaceutical Science, Peking University, Beijing 100191, China
- Qixuan Chen
  - Department of Pharmaceutical Science, Peking University, Beijing 100191, China
- Yiqiao Liu
  - Department of Pharmaceutical Science, Peking University, Beijing 100191, China
- Liangren Zhang
  - Department of Pharmaceutical Science, Peking University, Beijing 100191, China
- Zhenming Liu
  - Department of Pharmaceutical Science, Peking University, Beijing 100191, China
6
Williams DC, Inala N. Physics-Informed Generative Model for Drug-like Molecule Conformers. J Chem Inf Model 2024; 64:2988-3007. [PMID: 38486425 DOI: 10.1021/acs.jcim.3c01816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
We present a diffusion-based generative model for conformer generation. Our model is focused on the reproduction of the bonded structure and is constructed from the associated terms traditionally found in classical force fields to ensure a physically relevant representation. Techniques in deep learning are used to infer atom typing and geometric parameters from a training set. Conformer sampling is achieved by taking advantage of recent advancements in diffusion-based generation. By training on large, synthetic data sets of diverse, drug-like molecules optimized with the semiempirical GFN2-xTB method, high accuracy is achieved for bonded parameters, exceeding that of conventional, knowledge-based methods. Results are also compared to experimental structures from the Protein Databank and the Cambridge Structural Database.
Affiliation(s)
- David C Williams
  - Nobias Therapeutics, Inc., 144 S Whisman Rd, Suite C, Mountain View, California 94041, United States
- Neil Inala
  - Nobias Therapeutics, Inc., 144 S Whisman Rd, Suite C, Mountain View, California 94041, United States
7
Guo Z, Liu J, Wang Y, Chen M, Wang D, Xu D, Cheng J. Diffusion models in bioinformatics and computational biology. Nature Reviews Bioengineering 2024; 2:136-154. [PMID: 38576453 PMCID: PMC10994218 DOI: 10.1038/s44222-023-00114-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Accepted: 08/25/2023] [Indexed: 04/06/2024]
Abstract
Denoising diffusion models embody a type of generative artificial intelligence that can be applied in computer vision, natural language processing and bioinformatics. In this Review, we introduce the key concepts and theoretical foundations of three diffusion modelling frameworks (denoising diffusion probabilistic models, noise-conditioned scoring networks and score stochastic differential equations). We then explore their applications in bioinformatics and computational biology, including protein design and generation, drug and small-molecule design, protein-ligand interaction modelling, cryo-electron microscopy image data analysis and single-cell data analysis. Finally, we highlight open-source diffusion model tools and consider the future applications of diffusion models in bioinformatics.
Affiliation(s)
- Zhiye Guo
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
  - NextGen Precision Health, University of Missouri, Columbia, MO, USA
- Jian Liu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
  - NextGen Precision Health, University of Missouri, Columbia, MO, USA
- Yanli Wang
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
  - NextGen Precision Health, University of Missouri, Columbia, MO, USA
- Mengrui Chen
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
  - NextGen Precision Health, University of Missouri, Columbia, MO, USA
- Duolin Wang
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
  - NextGen Precision Health, University of Missouri, Columbia, MO, USA
- Dong Xu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
  - NextGen Precision Health, University of Missouri, Columbia, MO, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
  - NextGen Precision Health, University of Missouri, Columbia, MO, USA
8
Guzman-Pando A, Ramirez-Alonso G, Arzate-Quintana C, Camarillo-Cisneros J. Deep learning algorithms applied to computational chemistry. Mol Divers 2023:10.1007/s11030-023-10771-y. [PMID: 38151697 DOI: 10.1007/s11030-023-10771-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 11/14/2023] [Indexed: 12/29/2023]
Abstract
Recently, there has been a significant increase in the use of deep learning techniques in the molecular sciences, which have shown high performance on datasets and the ability to generalize across data. However, no model has achieved perfect performance in solving all problems, and the pros and cons of each approach remain unclear to those new to the field. Therefore, this paper aims to review deep learning algorithms that have been applied to solve molecular challenges in computational chemistry. We propose a comprehensive categorization that encompasses two primary approaches: conventional deep learning and geometric deep learning models. This classification takes into account the distinct techniques employed by the algorithms within each approach. We present an up-to-date analysis of these algorithms, emphasizing their key features and open issues. This includes details of input descriptors, datasets used, open-source code availability, task solutions, and actual research applications, focusing on general applications rather than specific ones such as drug discovery. Furthermore, our report discusses trends and future directions in molecular algorithm design, including the input descriptors used for each deep learning model, GPU usage, training and forward processing time, model parameters, the most commonly used datasets, libraries, and optimization schemes. This information aids in identifying the most suitable algorithms for a given task. It also serves as a reference for the datasets and input data frequently used for each algorithm technique. In addition, it provides insights into the benefits and open issues of each technique, and supports the development of novel computational chemistry systems.
Affiliation(s)
- Abimael Guzman-Pando
  - Computational Chemistry Physics Laboratory, Facultad de Medicina y Ciencias Biomédicas, Universidad Autónoma de Chihuahua, Campus II, 31125, Chihuahua, Mexico
- Graciela Ramirez-Alonso
  - Faculty of Engineering, Universidad Autónoma de Chihuahua, Campus II, 31125, Chihuahua, Mexico
- Carlos Arzate-Quintana
  - Computational Chemistry Physics Laboratory, Facultad de Medicina y Ciencias Biomédicas, Universidad Autónoma de Chihuahua, Campus II, 31125, Chihuahua, Mexico
- Javier Camarillo-Cisneros
  - Computational Chemistry Physics Laboratory, Facultad de Medicina y Ciencias Biomédicas, Universidad Autónoma de Chihuahua, Campus II, 31125, Chihuahua, Mexico
9
Stylianakis I, Zervos N, Lii JH, Pantazis DA, Kolocouris A. Conformational energies of reference organic molecules: benchmarking of common efficient computational methods against coupled cluster theory. J Comput Aided Mol Des 2023; 37:607-656. [PMID: 37597063 PMCID: PMC10618395 DOI: 10.1007/s10822-023-00513-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Accepted: 06/03/2023] [Indexed: 08/21/2023]
Abstract
We selected 145 reference organic molecules that include model fragments used in computer-aided drug design. We calculated 158 conformational energies and barriers using force fields, with wide applicability in commercial and free softwares and extensive application on the calculation of conformational energies of organic molecules, e.g. the UFF and DREIDING force fields, the Allinger's force fields MM3-96, MM3-00, MM4-8, the MM2-91 clones MMX and MM+, the MMFF94 force field, MM4, ab initio Hartree-Fock (HF) theory with different basis sets, the standard density functional theory B3LYP, the second-order post-HF MP2 theory and the Domain-based Local Pair Natural Orbital Coupled Cluster DLPNO-CCSD(T) theory, with the latter used for accurate reference values. The data set of the organic molecules includes hydrocarbons, haloalkanes, conjugated compounds, and oxygen-, nitrogen-, phosphorus- and sulphur-containing compounds. We reviewed in detail the conformational aspects of these model organic molecules providing the current understanding of the steric and electronic factors that determine the stability of low energy conformers and the literature including previous experimental observations and calculated findings. While progress on the computer hardware allows the calculations of thousands of conformations for later use in drug design projects, this study is an update from previous classical studies that used, as reference values, experimental ones using a variety of methods and different environments. The lowest mean error against the DLPNO-CCSD(T) reference was calculated for MP2 (0.35 kcal mol-1), followed by B3LYP (0.69 kcal mol-1) and the HF theories (0.81-1.0 kcal mol-1). 
As regards the force fields, the lowest errors were observed for the Allinger's force fields MM3-00 (1.28 kcal mol-1), MM3-96 (1.40 kcal mol-1) and the Halgren's MMFF94 force field (1.30 kcal mol-1) and then for the MM2-91 clones MMX (1.77 kcal mol-1) and MM+ (2.01 kcal mol-1) and MM4 (2.05 kcal mol-1). The DREIDING (3.63 kcal mol-1) and UFF (3.77 kcal mol-1) force fields have the lowest performance. These model organic molecules we used are often present as fragments in drug-like molecules. The values calculated using DLPNO-CCSD(T) make up a valuable data set for further comparisons and for improved force field parameterization.
Affiliation(s)
- Ioannis Stylianakis
  - Department of Medicinal Chemistry, Faculty of Pharmacy, National and Kapodistrian University of Athens, Panepistimioupolis Zografou, 15771, Athens, Greece
- Nikolaos Zervos
  - Department of Medicinal Chemistry, Faculty of Pharmacy, National and Kapodistrian University of Athens, Panepistimioupolis Zografou, 15771, Athens, Greece
- Jenn-Huei Lii
  - Department of Chemistry, National Changhua University of Education, Changhua City, Taiwan
- Dimitrios A Pantazis
  - Max-Planck-Institut für Kohlenforschung, Kaiser-Wilhelm-Platz 1, 45470, Mülheim an der Ruhr, Germany
- Antonios Kolocouris
  - Department of Medicinal Chemistry, Faculty of Pharmacy, National and Kapodistrian University of Athens, Panepistimioupolis Zografou, 15771, Athens, Greece
  - Laboratory of Medicinal Chemistry, Section of Pharmaceutical Chemistry, Department of Pharmacy, National and Kapodistrian University of Athens, Panepistimiopolis-Zografou, 15771, Athens, Greece
10
Park YJ, Kim H, Jo J, Yoon S. Deep contrastive learning of molecular conformation for efficient property prediction. Nature Computational Science 2023; 3:1015-1022. [PMID: 38177719 DOI: 10.1038/s43588-023-00560-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Accepted: 10/31/2023] [Indexed: 01/06/2024]
Abstract
Data-driven deep learning algorithms provide accurate prediction of high-level quantum-chemical molecular properties. However, their inputs must be constrained to the same quantum-chemical level of geometric relaxation as the training dataset, limiting their flexibility. Adopting alternative cost-effective conformation generative methods introduces domain-shift problems, deteriorating prediction accuracy. Here we propose a deep contrastive learning-based domain-adaptation method called Local Atomic environment Contrastive Learning (LACL). LACL learns to alleviate the disparities in distribution between the two geometric conformations by comparing different conformation-generation methods. We found that LACL forms a domain-agnostic latent space that encapsulates the semantics of an atom's local atomic environment. LACL achieves quantum-chemical accuracy while circumventing the geometric relaxation bottleneck and could enable future application scenarios such as inverse molecular engineering and large-scale screening. Our approach is also generalizable from small organic molecules to long chains of biological and pharmacological molecules.
Affiliation(s)
- Yang Jeong Park
  - Department of Electrical and Computer Engineering, Seoul National University, Seoul, Republic of Korea
  - Institute of New Media and Communications, Seoul National University, Seoul, Republic of Korea
  - Department of Nuclear Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- HyunGi Kim
  - Department of Electrical and Computer Engineering, Seoul National University, Seoul, Republic of Korea
- Jeonghee Jo
  - Department of Electrical and Computer Engineering, Seoul National University, Seoul, Republic of Korea
  - Institute of New Media and Communications, Seoul National University, Seoul, Republic of Korea
- Sungroh Yoon
  - Department of Electrical and Computer Engineering, Seoul National University, Seoul, Republic of Korea
  - Institute of New Media and Communications, Seoul National University, Seoul, Republic of Korea
  - Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, Republic of Korea
11
McNutt A, Bisiriyu F, Song S, Vyas A, Hutchison GR, Koes DR. Conformer Generation for Structure-Based Drug Design: How Many and How Good? J Chem Inf Model 2023; 63:6598-6607. [PMID: 37903507 PMCID: PMC10647020 DOI: 10.1021/acs.jcim.3c01245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 10/18/2023] [Accepted: 10/19/2023] [Indexed: 11/01/2023]
Abstract
Conformer generation, the assignment of realistic 3D coordinates to a small molecule, is fundamental to structure-based drug design. Conformational ensembles are required for rigid-body matching algorithms, such as shape-based or pharmacophore approaches, and even methods that treat the ligand flexibly, such as docking, are dependent on the quality of the provided conformations due to not sampling all degrees of freedom (e.g., only sampling torsions). Here, we empirically elucidate some general principles about the size, diversity, and quality of the conformational ensembles needed to get the best performance in common structure-based drug discovery tasks. In many cases, our findings may parallel "common knowledge" well-known to practitioners of the field. Nonetheless, we feel that it is valuable to quantify these conformational effects while reproducing and expanding upon previous studies. Specifically, we investigate the performance of a state-of-the-art generative deep learning approach versus a more classical geometry-based approach, the effect of energy minimization as a postprocessing step, the effect of ensemble size (maximum number of conformers), and construction (filtering by root-mean-square deviation for diversity) and how these choices influence the ability to recapitulate bioactive conformations and perform pharmacophore screening and molecular docking.
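The ensemble-construction step mentioned in this abstract (filtering by root-mean-square deviation for diversity, capped at a maximum number of conformers) can be sketched as a simple greedy filter. This is a generic illustration and not the authors' implementation: the function name, the precomputed pairwise RMSD matrix, the assumption that conformers arrive pre-sorted (for example by energy), and the default values are all hypothetical.

```python
from typing import List


def greedy_rmsd_filter(
    pairwise_rmsd: List[List[float]],
    min_rmsd: float = 0.5,      # hypothetical diversity cutoff, in angstroms
    max_conformers: int = 10,   # hypothetical cap on ensemble size
) -> List[int]:
    """Return indices of conformers kept by a greedy diversity filter.

    pairwise_rmsd[i][j] is the RMSD between conformers i and j.
    Conformers are visited in input order; one is kept only if it is at
    least min_rmsd away from every conformer already kept.
    """
    kept: List[int] = []
    for i in range(len(pairwise_rmsd)):
        if len(kept) >= max_conformers:
            break
        if all(pairwise_rmsd[i][k] >= min_rmsd for k in kept):
            kept.append(i)
    return kept


# Toy 4-conformer example: 0 and 1 are near-duplicates, as are 2 and 3.
toy_rmsd = [
    [0.0, 0.2, 1.0, 1.1],
    [0.2, 0.0, 0.9, 1.0],
    [1.0, 0.9, 0.0, 0.3],
    [1.1, 1.0, 0.3, 0.0],
]
selected = greedy_rmsd_filter(toy_rmsd, min_rmsd=0.5, max_conformers=10)
```

Because the filter is order-dependent, the choice of how conformers are sorted before filtering determines which member of each near-duplicate cluster survives.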
Affiliation(s)
- Andrew T. McNutt
  - Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15213, United States
- Fatimah Bisiriyu
  - The Neighborhood Academy, Pittsburgh, Pennsylvania 15206, United States
- Sophia Song
  - Upper St. Clair High School, Pittsburgh, Pennsylvania 15241, United States
- Ananya Vyas
  - Taylor Allderdice High School, Pittsburgh, Pennsylvania 15217, United States
- Geoffrey R. Hutchison
  - Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania 15213, United States
  - Department of Chemical and Petroleum Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15213, United States
- David Ryan Koes
  - Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15213, United States
12
Wang Z, Zhong H, Zhang J, Pan P, Wang D, Liu H, Yao X, Hou T, Kang Y. Small-Molecule Conformer Generators: Evaluation of Traditional Methods and AI Models on High-Quality Data Sets. J Chem Inf Model 2023; 63:6525-6536. [PMID: 37883143 DOI: 10.1021/acs.jcim.3c01519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2023]
Abstract
Small-molecule conformer generation (SMCG) is an extremely important task in both ligand- and structure-based computer-aided drug design, especially during the hit discovery phase. Recently, a multitude of artificial intelligence (AI) models tailored for SMCG have emerged. Despite developers typically furnishing performance evaluation data upon releasing their AI models, a comprehensive and equitable performance comparison between AI models and conventional methods is still lacking. In this study, we curated a new benchmarking data set comprising 3354 high-quality ligand bioactive conformations. Subsequently, we conducted a systematic assessment of the performance of four widely adopted traditional methods (i.e., ConfGenX, Conformator, OMEGA, and RDKit ETKDG) and five AI models (i.e., ConfGF, DMCG, GeoDiff, GeoMol, and torsional diffusion) in the tasks of reproducing bioactive and low-energy conformations of small molecules. In the former task, the AI models have no advantage, particularly with a maximum ensemble size of 1. Even the best-performing AI model GeoMol is still worse than any of the tested traditional methods. Conversely, in the latter task, the torsional diffusion model shows obvious advantages, surpassing the best-performing traditional method ConfGenX by 26.09 and 12.97% on the COV-R and COV-P metrics, respectively. Furthermore, the influence of force field-based fine-tuning on the quality of the generated conformers was also discussed. Finally, a user-friendly Web server called fastSMCG was developed to enable researchers to rapidly and flexibly generate small-molecule conformers using both traditional and AI methods. We anticipate that our work will offer valuable practical assistance to the scientific community in this field.
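The abstract reports COV-R and COV-P without defining them. As commonly used in the conformer-generation literature, COV-R (recall) is the fraction of reference conformers matched by at least one generated conformer within an RMSD threshold, and COV-P (precision) is the fraction of generated conformers that lie within the threshold of at least one reference. The sketch below assumes a precomputed RMSD matrix; the function name `coverage`, the strict `<` comparison, and the threshold value are illustrative assumptions, not taken from the paper.

```python
def coverage(rmsd_matrix, delta):
    """Return (COV-R, COV-P) for a reference-by-generated RMSD matrix.

    rmsd_matrix[i][j] is the RMSD between reference conformer i and
    generated conformer j; delta is the RMSD threshold (assumed, in angstroms).
    """
    n_ref = len(rmsd_matrix)
    n_gen = len(rmsd_matrix[0])
    # Recall: references that have at least one generated conformer nearby.
    covered_refs = sum(1 for row in rmsd_matrix if min(row) < delta)
    # Precision: generated conformers that are near at least one reference.
    good_gens = sum(
        1 for j in range(n_gen)
        if min(row[j] for row in rmsd_matrix) < delta
    )
    return covered_refs / n_ref, good_gens / n_gen
```

Some papers use `<=` rather than `<` and different thresholds per data set, so reported values are only comparable when those choices match.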
Affiliation(s)
- Zhe Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Haiyang Zhong
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Jintu Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Dong Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Huanxiang Liu
- Faculty of Applied Science, Macao Polytechnic University, Macao SAR 999078, China
| | - Xiaojun Yao
- State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao SAR 999078, China
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| |
|
13
|
Ilnicka A, Schneider G. Designing molecules with autoencoder networks. Nature Computational Science 2023; 3:922-933. [PMID: 38177601 DOI: 10.1038/s43588-023-00548-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Accepted: 10/03/2023] [Indexed: 01/06/2024]
Abstract
Autoencoders are versatile tools in molecular informatics. These unsupervised neural networks serve diverse tasks such as data-driven molecular representation and constructive molecular design. This Review explores their algorithmic foundations and applications in drug discovery, highlighting the most active areas of development and the contributions autoencoder networks have made in advancing this field. We also explore the challenges and prospects concerning the utilization of autoencoders and the various adaptations of this neural network architecture in molecular design.
Affiliation(s)
- Agnieszka Ilnicka
- Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland
| | - Gisbert Schneider
- Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland.
| |
|
14
|
Hagg A, Kirschner KN. Open-Source Machine Learning in Computational Chemistry. J Chem Inf Model 2023; 63:4505-4532. [PMID: 37466636 PMCID: PMC10430767 DOI: 10.1021/acs.jcim.3c00643] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/04/2023] [Indexed: 07/20/2023]
Abstract
The field of computational chemistry has seen a significant increase in the integration of machine learning concepts and algorithms. In this Perspective, we surveyed 179 open-source software projects, with corresponding peer-reviewed papers published within the last 5 years, to better understand the topics within the field being investigated by machine learning approaches. For each project, we provide a short description, the link to the code, the accompanying license type, and whether the training data and resulting models are made publicly available. Based on those deposited in GitHub repositories, the most popular employed Python libraries are identified. We hope that this survey will serve as a resource to learn about machine learning or specific architectures thereof by identifying accessible codes with accompanying papers on a topic basis. To this end, we also include computational chemistry open-source software for generating training data and fundamental Python libraries for machine learning. Based on our observations and considering the three pillars of collaborative machine learning work, open data, open source (code), and open models, we provide some suggestions to the community.
Affiliation(s)
- Alexander Hagg
- Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department of Electrical Engineering, Mechanical Engineering and Technical Journalism, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
| | - Karl N. Kirschner
- Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department of Computer Science, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
| |
|
15
|
Tang M, Li B, Chen H. Application of message passing neural networks for molecular property prediction. Curr Opin Struct Biol 2023; 81:102616. [PMID: 37267824 DOI: 10.1016/j.sbi.2023.102616] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 04/28/2023] [Accepted: 05/04/2023] [Indexed: 06/04/2023]
Abstract
Accurate molecular property prediction, one of the classical cheminformatics topics, plays a prominent role in the field of computer-aided drug design. For instance, property prediction models can be used to quickly screen large molecular libraries to find lead compounds. Message-passing neural networks (MPNNs), a sub-class of graph neural networks (GNNs), have recently been demonstrated to outperform other deep learning methods on a variety of tasks, including the prediction of molecular characteristics. In this survey, we provide a brief review of the MPNN models and their applications to molecular property prediction.
Affiliation(s)
- Miru Tang
- Guangzhou Laboratory, Guangzhou, 510005, Guangdong Province, China; Bioland Laboratory (Guangzhou Regenerative Medicine and Health-Guangdong Laboratory), Guangzhou, 510530, China; State Key Laboratory of Respiratory Disease, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou, 510530, China
| | - Baiqing Li
- Guangzhou Laboratory, Guangzhou, 510005, Guangdong Province, China
| | - Hongming Chen
- Guangzhou Laboratory, Guangzhou, 510005, Guangdong Province, China.
| |
|
16
|
Lungu CN, Putz MV. SARS-CoV-2 Spike Protein Interaction Space. Int J Mol Sci 2023; 24:12058. [PMID: 37569436 PMCID: PMC10418891 DOI: 10.3390/ijms241512058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Revised: 07/10/2023] [Accepted: 07/12/2023] [Indexed: 08/13/2023] Open
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a positive-sense single-stranded RNA virus. The virus has four major surface proteins: spike (S), envelope (E), membrane (M), and nucleocapsid (N), respectively. The constitutive proteins present a high grade of symmetry. Identifying a binding site is difficult. The virion is approximately 50-200 nm in diameter. Angiotensin-converting enzyme 2 (ACE2) acts as the cell receptor for the virus. SARS-CoV-2 has an increased affinity to human ACE2 compared with the original SARS strain. Topological space, and its symmetry, is a critical component in molecular interactions. By exploring this space, a suitable ligand space can be characterized accordingly. A spike protein (S) computational model in a complex with ACE2 was generated using in silico methods. Topological spaces were probed using high-throughput computational screening techniques to identify and characterize the topological space of both SARS and SARS-CoV-2 spike protein and its ligand space. In order to identify the symmetry clusters, computational analysis techniques, together with statistical analysis, were utilized. The computations are based on crystallographic Protein Data Bank (PDB)-based models of constitutive proteins. Cartesian coordinates of component atoms and some cluster maps were generated and analyzed. Dihedral angles were used in order to compute a topological receptor space. This computational study uses a multimodal representation of spike protein interactions with some fragment proteins. The chemical space of the receptors (a dimensional volume) suggests the relevance of the receptor as a drug target. The spike protein S of SARS and SARS-CoV-2 is analyzed and compared. The results suggest a mirror symmetry of SARS and SARS-CoV-2 spike proteins. The results show that the SARS-CoV-2 space is variable and has a distinct topology.
In conclusion, surface proteins grant virion variability and symmetry in interactions with a potential complementary target (protein, antibody, ligand). The mirror symmetry of dihedral angle clusters determines a high specificity of the receptor space.
Affiliation(s)
- Claudiu N. Lungu
- Department of Morphological and Functional Science, University of Medicine and Pharmacy Dunarea de Jos, Str. Alexandru Ioan Cuza No. 36, 800017 Galati, Romania;
| | - Mihai V. Putz
- Laboratory of Structural and Computational Physical-Chemistry for Nanosciences and QSAR, Biology-Chemistry Department, Faculty of Chemistry, Biology, Geography, West University of Timisoara, Str. Pestalozzi No. 16, 300115 Timisoara, Romania
| |
|
17
|
Tran T, Ekenna C. Molecular Descriptors Property Prediction Using Transformer-Based Approach. Int J Mol Sci 2023; 24:11948. [PMID: 37569322 PMCID: PMC10419034 DOI: 10.3390/ijms241511948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 07/21/2023] [Accepted: 07/24/2023] [Indexed: 08/13/2023] Open
Abstract
In this study, we introduce semi-supervised machine learning models designed to predict molecular properties. Our model employs a two-stage approach, involving pre-training and fine-tuning. Particularly, our model leverages a substantial amount of labeled and unlabeled data consisting of SMILES strings, a text representation system for molecules. During the pre-training stage, our model capitalizes on the Masked Language Model, which is widely used in natural language processing, for learning molecular chemical space representations. During the fine-tuning stage, our model is trained on a smaller labeled dataset to tackle specific downstream tasks, such as classification or regression. Preliminary results indicate that our model demonstrates comparable performance to state-of-the-art models on the chosen downstream tasks from MoleculeNet. Additionally, to reduce the computational overhead, we propose a new approach taking advantage of 3D compound structures for calculating the attention score used in the end-to-end transformer model to predict anti-malaria drug candidates. The results show that using the proposed attention score, our end-to-end model is able to have comparable performance with pre-trained models.
|
18
|
Zhang Z, Wang G, Li R, Ni L, Zhang R, Cheng K, Ren Q, Kong X, Ni S, Tong X, Luo L, Wang D, Lu X, Zheng M, Li X. Tora3D: an autoregressive torsion angle prediction model for molecular 3D conformation generation. J Cheminform 2023; 15:57. [PMID: 37287071 DOI: 10.1186/s13321-023-00726-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Accepted: 05/20/2023] [Indexed: 06/09/2023] Open
Abstract
Three-dimensional (3D) conformations of a small molecule profoundly affect its binding to the target of interest, the resulting biological effects, and its disposition in living organisms, but it is challenging to accurately characterize the conformational ensemble experimentally. Here, we proposed an autoregressive torsion angle prediction model Tora3D for molecular 3D conformer generation. Rather than directly predicting the conformations in an end-to-end way, Tora3D predicts a set of torsion angles of rotatable bonds by an interpretable autoregressive method and reconstructs the 3D conformations from them, which keeps structural validity during reconstruction. Another advancement of our method over other conformational generation methods is the ability to use energy to guide the conformation generation. In addition, we propose a new message-passing mechanism that applies the Transformer to the graph to solve the difficulty of remote message passing. Tora3D shows superior performance to prior computational models in the trade-off between accuracy and efficiency, and ensures conformational validity, accuracy, and diversity in an interpretable way. Overall, Tora3D can be used for the quick generation of diverse molecular conformations and 3D-based molecular representation, contributing to a wide range of downstream drug design tasks.
Affiliation(s)
- Zimei Zhang
- Division of Life Science and Medicine, University of Science and Technology of China, Hefei, 230026, Anhui, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Gang Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
| | - Rui Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- School of Pharmacy, China Pharmaceutical University, 639 Longmian Road, Nanjing, 211198, China
| | - Lin Ni
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing, 210023, China
| | - RunZe Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
| | - Kaiyang Cheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing, 210023, China
| | - Qun Ren
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing, 210023, China
| | - Xiangtai Kong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
| | - Shengkun Ni
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
| | - Xiaochu Tong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
| | - Li Luo
- Precision Pharmacy & Drug Development Center, Department of Pharmacy, Tangdu Hospital, Fourth Military Medical University, Xi'an, 710038, China
| | | | - Xiaojie Lu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
| | - Mingyue Zheng
- Division of Life Science and Medicine, University of Science and Technology of China, Hefei, 230026, Anhui, China.
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China.
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China.
- Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing, 210023, China.
| | - Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China.
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China.
| |
|
19
|
Kubečka J, Knattrup Y, Engsvang M, Jensen AB, Ayoubi D, Wu H, Christiansen O, Elm J. Current and future machine learning approaches for modeling atmospheric cluster formation. Nature Computational Science 2023; 3:495-503. [PMID: 38177415 DOI: 10.1038/s43588-023-00435-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/21/2022] [Accepted: 03/16/2023] [Indexed: 01/06/2024]
Abstract
The formation of strongly bound atmospheric molecular clusters is the first step towards forming new aerosol particles. Recent advances in the application of machine learning models open an enormous opportunity for complementing expensive quantum chemical calculations with efficient machine learning predictions. In this Perspective, we present how data-driven approaches can be applied to accelerate cluster configurational sampling, thereby greatly increasing the number of chemically relevant systems that can be covered.
Affiliation(s)
- Jakub Kubečka
- Department of Chemistry, Aarhus University, Aarhus, Denmark
| | - Yosef Knattrup
- Department of Chemistry, Aarhus University, Aarhus, Denmark
| | | | | | - Daniel Ayoubi
- Department of Chemistry, Aarhus University, Aarhus, Denmark
| | - Haide Wu
- Department of Chemistry, Aarhus University, Aarhus, Denmark
| | | | - Jonas Elm
- Department of Chemistry, Aarhus University, Aarhus, Denmark.
- iCLIMATE Aarhus University Interdisciplinary Centre for Climate Change, Aarhus, Denmark.
| |
|
20
|
Zaripova K, Cosmo L, Kazi A, Ahmadi SA, Bronstein MM, Navab N. Graph-in-Graph (GiG): Learning interpretable latent graphs in non-Euclidean domain for biological and healthcare applications. Med Image Anal 2023; 88:102839. [PMID: 37263109 DOI: 10.1016/j.media.2023.102839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Revised: 04/26/2023] [Accepted: 05/06/2023] [Indexed: 06/03/2023]
Abstract
Graphs are a powerful tool for representing and analyzing unstructured, non-Euclidean data ubiquitous in the healthcare domain. Two prominent examples are molecule property prediction and brain connectome analysis. Importantly, recent works have shown that considering relationships between input data samples has a positive regularizing effect on the downstream task in healthcare applications. These relationships are naturally modeled by a (possibly unknown) graph structure between input samples. In this work, we propose Graph-in-Graph (GiG), a neural network architecture for protein classification and brain imaging applications that exploits the graph representation of the input data samples and their latent relation. We assume an initially unknown latent-graph structure between graph-valued input data and propose to learn a parametric model for message passing within and across input graph samples, end-to-end along with the latent structure connecting the input graphs. Further, we introduce a Node Degree Distribution Loss (NDDL) that regularizes the predicted latent relationships structure. This regularization can significantly improve the downstream task. Moreover, the obtained latent graph can represent patient population models or networks of molecule clusters, providing a level of interpretability and knowledge discovery in the input domain, which is of particular value in healthcare.
Affiliation(s)
- Kamilia Zaripova
- Department of Computer Science, Technical University of Munich, Munich, Germany.
| | - Luca Cosmo
- Department of Environmental Sciences, Informatics and Statistics, Ca' Foscari University of Venice, Venice, Italy; Informatics Department, USI University of Lugano, Lugano, Switzerland
| | - Anees Kazi
- Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Harvard Medical School, Boston, USA
| | | | | | - Nassir Navab
- Department of Computer Science, Technical University of Munich, Munich, Germany; Whiting School of Engineering, Johns Hopkins University, Baltimore, USA
| |
|
21
|
Jun Yim S, Gyak KW, Kawale SA, Mottafegh A, Park CH, Ko Y, Kim I, Soo Jee S, Kim DP. One-flow Multi-step Synthesis of a Monomer as a Precursor of Thermal-Conductive Semiconductor Packaging Polymer via Multi-phasic Separation. J Ind Eng Chem 2023. [DOI: 10.1016/j.jiec.2023.03.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/05/2023]
|
22
|
Ma S, Liu JW. Self-supervised contrastive learning for heterogeneous graph based on multi-pretext tasks. Neural Comput Appl 2023. [DOI: 10.1007/s00521-023-08234-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/19/2023]
|
23
|
Zhang H, Li S, Zhang J, Wang Z, Wang J, Jiang D, Bian Z, Zhang Y, Deng Y, Song J, Kang Y, Hou T. SDEGen: learning to evolve molecular conformations from thermodynamic noise for conformation generation. Chem Sci 2023; 14:1557-1568. [PMID: 36794194 PMCID: PMC9906649 DOI: 10.1039/d2sc04429c] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/09/2022] [Accepted: 01/11/2023] [Indexed: 01/13/2023] Open
Abstract
Generation of representative conformations for small molecules is a fundamental task in cheminformatics and computer-aided drug discovery, but capturing the complex distribution of conformations that contains multiple low energy minima is still a great challenge. Deep generative modeling, aiming to learn complex data distributions, is a promising approach to tackle the conformation generation problem. Here, inspired by stochastic dynamics and recent advances in generative modeling, we developed SDEGen, a novel conformation generation model based on stochastic differential equations. Compared with existing conformation generation methods, it enjoys the following advantages: (1) high model capacity to capture multimodal conformation distribution, thereby searching for multiple low-energy conformations of a molecule quickly, (2) higher conformation generation efficiency, almost ten times faster than the state-of-the-art score-based model, ConfGF, and (3) a clear physical interpretation to learn how a molecule evolves in a stochastic dynamics system starting from noise and eventually relaxing to the conformation that falls in low energy minima. Extensive experiments demonstrate that SDEGen has surpassed existing methods in different tasks for conformation generation, interatomic distance distribution prediction, and thermodynamic property estimation, showing great potential for real-world applications.
Affiliation(s)
- Haotian Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Shengming Li
- College of Computer Science and Technology, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Jintu Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- State Key Lab of CAD&CG, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Zhe Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Jike Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- School of Computer Science, Wuhan University Wuhan 430072 Hubei China
| | - Dejun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Zhiwen Bian
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Yixue Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Yafeng Deng
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Jianfei Song
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- State Key Lab of CAD&CG, Zhejiang University Hangzhou 310058 Zhejiang China
| |
|
24
|
Pang S, Zhang K, Wang G, Lin JCW, Wang F, Meng X, Wang S, Zhang Y. AF-GCN: Completing Various Graph Tasks Efficiently via Adaptive Quadratic Frequency Response Function in Graph Spectral Domain. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.12.054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
|
25
|
Weinreich J, Lemm D, von Rudorff GF, von Lilienfeld OA. Ab initio machine learning of phase space averages. J Chem Phys 2022; 157:024303. [PMID: 35840379 DOI: 10.1063/5.0095674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022] Open
Abstract
Equilibrium structures determine material properties and biochemical functions. We here propose to machine learn phase space averages, conventionally obtained by ab initio or force-field-based molecular dynamics (MD) or Monte Carlo (MC) simulations. In analogy to ab initio MD, our ab initio machine learning (AIML) model does not require bond topologies and, therefore, enables a general machine learning pathway to obtain ensemble properties throughout the chemical compound space. We demonstrate AIML for predicting Boltzmann averaged structures after training on hundreds of MD trajectories. The AIML output is subsequently used to train machine learning models of free energies of solvation using experimental data and to reach competitive prediction errors (mean absolute error ∼0.8 kcal/mol) for out-of-sample molecules, within milliseconds. As such, AIML effectively bypasses the need for MD or MC-based phase space sampling, enabling exploration campaigns of Boltzmann averages throughout the chemical compound space at a much accelerated pace. We contextualize our findings by comparison to state-of-the-art methods resulting in a Pareto plot for the free energy of solvation predictions in terms of accuracy and time.
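For context, the Boltzmann average that such a model aims to predict directly is conventionally computed from sampled conformers as a weighted mean, with weights proportional to exp(-E/kT). The sketch below shows only that conventional calculation, not the AIML model; the function names, the default temperature, and the assumption that energies are given in kcal/mol are illustrative.

```python
import math

# Gas constant in kcal/(mol*K); energies are assumed to be in kcal/mol.
R_KCAL = 0.0019872041

def boltzmann_weights(energies, temperature=298.15):
    """Normalized Boltzmann weights for a list of conformer energies."""
    beta = 1.0 / (R_KCAL * temperature)
    e_min = min(energies)  # shift by the minimum for numerical stability
    factors = [math.exp(-beta * (e - e_min)) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]

def boltzmann_average(values, energies, temperature=298.15):
    """Boltzmann-weighted average of a per-conformer property."""
    weights = boltzmann_weights(energies, temperature)
    return sum(w * v for w, v in zip(weights, values))
```

Shifting by the minimum energy leaves the weights unchanged but avoids overflow or underflow in the exponentials.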
Affiliation(s)
- Jan Weinreich
- Faculty of Physics, University of Vienna, Kolingasse 14-16, AT-1090 Wien, Austria
| | - Dominik Lemm
- Faculty of Physics, University of Vienna, Kolingasse 14-16, AT-1090 Wien, Austria
| | | | | |
|
26
|
Xu Z, Escalera S, Pavão A, Richard M, Tu WW, Yao Q, Zhao H, Guyon I. Codabench: Flexible, easy-to-use, and reproducible meta-benchmark platform. Patterns 2022; 3:100543. [PMID: 35845844 PMCID: PMC9278500 DOI: 10.1016/j.patter.2022.100543] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/25/2022] [Revised: 03/21/2022] [Accepted: 06/03/2022] [Indexed: 11/29/2022]
Abstract
Obtaining a standardized benchmark of computational methods is a major issue in data-science communities. Dedicated frameworks enabling fair benchmarking in a unified environment are yet to be developed. Here, we introduce Codabench, a meta-benchmark platform that is open sourced and community driven for benchmarking algorithms or software agents versus datasets or tasks. A public instance of Codabench is open to everyone free of charge and allows benchmark organizers to fairly compare submissions under the same setting (software, hardware, data, algorithms), with custom protocols and data formats. Codabench has unique features facilitating easy organization of flexible and reproducible benchmarks, such as the possibility of reusing templates of benchmarks and supplying compute resources on demand. Codabench has been used internally and externally on various applications, receiving more than 130 users and 2,500 submissions. As illustrative use cases, we introduce four diverse benchmarks covering graph machine learning, cancer heterogeneity, clinical diagnosis, and reinforcement learning.
- Codabench facilitates flexible, easy, and reproducible benchmarking
- Organizers can customize benchmark design and submission format
- Organizers may host their own platform instance or use the public instance
- Four use cases in diverse domains are introduced to demonstrate the key features
In almost all communities working on data science, researchers face increasingly severe issues of reproducibility and fair comparison. Researchers work on their own version of hardware/software environment, code, and data, and consequently, the published results are hardly comparable. We introduce Codabench, a meta-benchmark platform, that is capable of flexible and easy benchmarking and supports reproducibility. Codabench is an important step toward benchmarking and reproducible research. It has been used in various communities including graph machine learning, cancer heterogeneity, clinical diagnosis, and reinforcement learning. Codabench is ready to help trendy research, e.g., artificial intelligence (AI) for science and data-centric AI.
Affiliation(s)
- Zhen Xu
- 4Paradigm, Beijing 100085, China
- Corresponding author
| | - Sergio Escalera
- Computer Vision Center, Universitat de Barcelona, 08007 Barcelona, Spain
| | - Adrien Pavão
- LISN/CNRS/INRIA, University Paris-Saclay, 91190 Gif-sur-Yvette, France
| | - Magali Richard
- University Grenoble Alpes, CNRS, UMR 5525, VetAgro Sup, Grenoble INP, TIMC, 38000 Grenoble, France
| | | | | | | | - Isabelle Guyon
- LISN/CNRS/INRIA, University Paris-Saclay, 91190 Gif-sur-Yvette, France
- ChaLearn, Berkeley, CA, USA
- Corresponding author
|
27
|
Spiekermann KA, Pattanaik L, Green WH. Fast Predictions of Reaction Barrier Heights: Toward Coupled-Cluster Accuracy. J Phys Chem A 2022; 126:3976-3986. [PMID: 35727075 DOI: 10.1021/acs.jpca.2c02614] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Indexed: 12/16/2022]
Abstract
Quantitative estimates of reaction barriers are essential for developing kinetic mechanisms and predicting reaction outcomes. However, the lack of experimental data and the steep scaling of accurate quantum calculations often hinder the ability to obtain reliable kinetic values. Here, we train a directed message passing neural network on nearly 24,000 diverse gas-phase reactions calculated at CCSD(T)-F12a/cc-pVDZ-F12//ωB97X-D3/def2-TZVP. Our model uses 75% fewer parameters than previous studies, an improved reaction representation, and proper data splits to accurately estimate performance on unseen reactions. Using information from only the reactant and product, our model quickly predicts barrier heights with a testing MAE of 2.6 kcal mol⁻¹ relative to the coupled-cluster data, making it more accurate than a good density functional theory calculation. Furthermore, our results show that future modeling efforts to estimate reaction properties would significantly benefit from fine-tuning calibration using a transfer learning technique. We anticipate this model will accelerate and improve kinetic predictions for small molecule chemistry.
Affiliation(s)
- Kevin A Spiekermann
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Lagnajit Pattanaik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
|
28
|
Lin X, Jiang Y, Yang Y. Molecular distance matrix prediction based on graph convolutional networks. J Mol Struct 2022. [DOI: 10.1016/j.molstruc.2022.132540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|
29
|
GEOM, energy-annotated molecular conformations for property prediction and molecular generation. Sci Data 2022; 9:185. [PMID: 35449137 PMCID: PMC9023519 DOI: 10.1038/s41597-022-01288-4] [Citation(s) in RCA: 43] [Impact Index Per Article: 21.5] [Received: 02/12/2021] [Accepted: 03/04/2022] [Indexed: 12/23/2022] Open
Abstract
Machine learning (ML) outperforms traditional approaches in many molecular design tasks. ML models usually predict molecular properties from a 2D chemical graph or a single 3D structure, but neither of these representations accounts for the ensemble of 3D conformers that are accessible to a molecule. Property prediction could be improved by using conformer ensembles as input, but there is no large-scale dataset that contains graphs annotated with accurate conformers and experimental data. Here we use advanced sampling and semi-empirical density functional theory (DFT) to generate 37 million molecular conformations for over 450,000 molecules. The Geometric Ensemble Of Molecules (GEOM) dataset contains conformers for 133,000 species from QM9, and 317,000 species with experimental data related to biophysics, physiology, and physical chemistry. Ensembles of 1,511 species with BACE-1 inhibition data are also labeled with high-quality DFT free energies in an implicit water solvent, and 534 ensembles are further optimized with DFT. GEOM will assist in the development of models that predict properties from conformer ensembles, and generative models that sample 3D conformations.
Measurement(s): Conformer geometries and properties. Technology Type(s): Computational Chemistry.
|
30
|
Ager Meldgaard S, Köhler J, Lund Mortensen H, Christiansen MPV, Noé F, Hammer B. Generating stable molecules using imitation and reinforcement learning. Machine Learning: Science and Technology 2022. [DOI: 10.1088/2632-2153/ac3eb4] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/11/2022] Open
Abstract
Chemical space is routinely explored by machine learning methods to discover interesting molecules, before time-consuming experimental synthesizing is attempted. However, these methods often rely on a graph representation, ignoring 3D information necessary for determining the stability of the molecules. We propose a reinforcement learning (RL) approach for generating molecules in Cartesian coordinates allowing for quantum chemical prediction of the stability. To improve sample-efficiency we learn basic chemical rules from imitation learning (IL) on the GDB-11 database to create an initial model applicable for all stoichiometries. We then deploy multiple copies of the model conditioned on a specific stoichiometry in a RL setting. The models correctly identify low energy molecules in the database and produce novel isomers not found in the training set. Finally, we apply the model to larger molecules to show how RL further refines the IL model in domains far from the training data.
|
31
|
Chen BP, Chen Y, Zeng GQ, She Q. Fractional-order convolutional neural networks with population extremal optimization. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.01.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|
32
|
Gebauer NWA, Gastegger M, Hessmann SSP, Müller KR, Schütt KT. Inverse design of 3d molecular structures with conditional generative neural networks. Nat Commun 2022; 13:973. [PMID: 35190542 PMCID: PMC8861047 DOI: 10.1038/s41467-022-28526-y] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Received: 09/10/2021] [Accepted: 01/28/2022] [Indexed: 11/09/2022] Open
Abstract
The rational design of molecules with desired properties is a long-standing challenge in chemistry. Generative neural networks have emerged as a powerful approach to sample novel molecules from a learned distribution. Here, we propose a conditional generative neural network for 3d molecular structures with specified chemical and structural properties. This approach is agnostic to chemical bonding and enables targeted sampling of novel molecules from conditional distributions, even in domains where reference calculations are sparse. We demonstrate the utility of our method for inverse design by generating molecules with specified motifs or composition, discovering particularly stable molecules, and jointly targeting multiple electronic properties beyond the training regime.
|
33
|
Steiner M, Reiher M. Autonomous Reaction Network Exploration in Homogeneous and Heterogeneous Catalysis. Top Catal 2022; 65:6-39. [PMID: 35185305 PMCID: PMC8816766 DOI: 10.1007/s11244-021-01543-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Accepted: 11/17/2021] [Indexed: 12/11/2022]
Abstract
Autonomous computations that rely on automated reaction network elucidation algorithms may pave the way to make computational catalysis on a par with experimental research in the field. Several advantages of this approach are key to catalysis: (i) automation allows one to consider orders of magnitude more structures in a systematic and open-ended fashion than what would be accessible by manual inspection. Eventually, full resolution in terms of structural varieties and conformations as well as with respect to the type and number of potentially important elementary reaction steps (including decomposition reactions that determine turnover numbers) may be achieved. (ii) Fast electronic structure methods with uncertainty quantification warrant high efficiency and reliability in order to not only deliver results quickly, but also to allow for predictive work. (iii) A high degree of autonomy reduces the amount of manual human work, processing errors, and human bias. Although being inherently unbiased, it is still steerable with respect to specific regions of an emerging network and with respect to the addition of new reactant species. This allows for a high fidelity of the formalization of some catalytic process and for surprising in silico discoveries. In this work, we first review the state of the art in computational catalysis to embed autonomous explorations into the general field from which it draws its ingredients. We then elaborate on the specific conceptual issues that arise in the context of autonomous computational procedures, some of which we discuss at an example catalytic system.
Affiliation(s)
- Miguel Steiner
- Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
| | - Markus Reiher
- Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
|
34
|
Ragoza M, Masuda T, Koes DR. Generating 3D Molecules Conditional on Receptor Binding Sites with Deep Generative Models. Chem Sci 2022; 13:2701-2713. [PMID: 35356675 PMCID: PMC8890264 DOI: 10.1039/d1sc05976a] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Received: 10/28/2021] [Accepted: 02/06/2022] [Indexed: 11/22/2022] Open
Abstract
The goal of structure-based drug discovery is to find small molecules that bind to a given target protein. Deep learning has been used to generate drug-like molecules with certain cheminformatic properties, but has not yet been applied to generating 3D molecules predicted to bind to proteins by sampling the conditional distribution of protein–ligand binding interactions. In this work, we describe for the first time a deep learning system for generating 3D molecular structures conditioned on a receptor binding site. We approach the problem using a conditional variational autoencoder trained on an atomic density grid representation of cross-docked protein–ligand structures. We apply atom fitting and bond inference procedures to construct valid molecular conformations from generated atomic densities. We evaluate the properties of the generated molecules and demonstrate that they change significantly when conditioned on mutated receptors. We also explore the latent space learned by our generative model using sampling and interpolation techniques. This work opens the door for end-to-end prediction of stable bioactive molecules from protein structures with deep learning.
We generate 3D molecules conditioned on receptor binding sites by training a deep generative model on protein–ligand complexes. Our model uses the conditional receptor information to make chemically relevant changes to the generated molecules.
Affiliation(s)
- Matthew Ragoza
- Intelligent Systems Program, University of Pittsburgh Pittsburgh PA 15213 USA
| | - Tomohide Masuda
- Department of Computational and Systems Biology, University of Pittsburgh Pittsburgh PA 15213 USA
| | - David Ryan Koes
- Department of Computational and Systems Biology, University of Pittsburgh Pittsburgh PA 15213 USA
|
35
|
Abstract
Computational methods play an increasingly important role in drug discovery. Structure-based drug design (SBDD), in particular, includes techniques that take into account the structure of the macromolecular target to predict compounds that are likely to establish optimal interactions with the binding site. The current interest in machine learning algorithms based on deep neural networks encouraged the application of deep learning to SBDD related problems. This chapter covers selected works in this active area of research.
|
36
|
Tong X, Liu X, Tan X, Li X, Jiang J, Xiong Z, Xu T, Jiang H, Qiao N, Zheng M. Generative Models for De Novo Drug Design. J Med Chem 2021; 64:14011-14027. [PMID: 34533311 DOI: 10.1021/acs.jmedchem.1c00927] [Citation(s) in RCA: 51] [Impact Index Per Article: 17.0] [Indexed: 01/10/2023]
Abstract
Artificial intelligence (AI) is booming. Among various AI approaches, generative models have received much attention in recent years. Inspired by these successes, researchers are now applying generative model techniques to de novo drug design, which has been considered as the "holy grail" of drug discovery. In this Perspective, we first focus on describing models such as recurrent neural network, autoencoder, generative adversarial network, transformer, and hybrid models with reinforcement learning. Next, we summarize the applications of generative models to drug design, including generating various compounds to expand the compound library and designing compounds with specific properties, and we also list a few publicly available molecular design tools based on generative models which can be used directly to generate molecules. In addition, we also introduce current benchmarks and metrics frequently used for generative models. Finally, we discuss the challenges and prospects of using generative models to aid drug design.
Affiliation(s)
- Xiaochu Tong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China.,University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
| | - Xiaohong Liu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China.,University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
| | - Xiaoqin Tan
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China.,University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
| | - Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China.,University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
| | - Jiaxin Jiang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
| | - Zhaoping Xiong
- Laboratory of Health Intelligence, Huawei Technologies Co., Ltd, Shenzhen 518100, China
| | | | - Hualiang Jiang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China.,University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
| | - Nan Qiao
- Laboratory of Health Intelligence, Huawei Technologies Co., Ltd, Shenzhen 518100, China
| | - Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China.,University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
|
37
|
Keith JA, Vassilev-Galindo V, Cheng B, Chmiela S, Gastegger M, Müller KR, Tkatchenko A. Combining Machine Learning and Computational Chemistry for Predictive Insights Into Chemical Systems. Chem Rev 2021; 121:9816-9872. [PMID: 34232033 PMCID: PMC8391798 DOI: 10.1021/acs.chemrev.1c00107] [Citation(s) in RCA: 190] [Impact Index Per Article: 63.3] [Received: 02/04/2021] [Indexed: 12/23/2022]
Abstract
Machine learning models are poised to make a transformative impact on chemical sciences by dramatically accelerating computational algorithms and amplifying insights available from computational chemistry methods. However, achieving this requires a confluence and coaction of expertise in computer science and physical sciences. This Review is written for new and experienced researchers working at the intersection of both fields. We first provide concise tutorials of computational chemistry and machine learning methods, showing how insights involving both can be achieved. We follow with a critical review of noteworthy applications that demonstrate how computational chemistry and machine learning can be used together to provide insightful (and useful) predictions in molecular and materials modeling, retrosyntheses, catalysis, and drug design.
Affiliation(s)
- John A. Keith
- Department of Chemical and Petroleum Engineering Swanson School of Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
| | - Valentin Vassilev-Galindo
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
| | - Bingqing Cheng
- Accelerate Programme for Scientific Discovery, Department of Computer Science and Technology, 15 J. J. Thomson Avenue, Cambridge CB3 0FD, United Kingdom
| | - Stefan Chmiela
- Department of Software Engineering and Theoretical Computer Science, Technische Universität Berlin, 10587, Berlin, Germany
| | - Michael Gastegger
- Department of Software Engineering and Theoretical Computer Science, Technische Universität Berlin, 10587, Berlin, Germany
| | - Klaus-Robert Müller
- Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
- Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul, 02841, Korea
- Max-Planck-Institut für Informatik, 66123 Saarbrücken, Germany
- Google Research, Brain Team, 10117 Berlin, Germany
| | - Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
|
38
|
Lemm D, von Rudorff GF, von Lilienfeld OA. Machine learning based energy-free structure predictions of molecules, transition states, and solids. Nat Commun 2021; 12:4468. [PMID: 34294693 PMCID: PMC8298673 DOI: 10.1038/s41467-021-24525-7] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 03/24/2021] [Accepted: 06/22/2021] [Indexed: 02/06/2023] Open
Abstract
The computational prediction of atomistic structure is a long-standing problem in physics, chemistry, materials, and biology. Conventionally, force-fields or ab initio methods determine structure through energy minimization, which is either approximate or computationally demanding. This accuracy/cost trade-off prohibits the generation of synthetic big data sets accounting for chemical space with atomistic detail. Exploiting implicit correlations among relaxed structures in training data sets, our machine learning model Graph-To-Structure (G2S) generalizes across compound space in order to infer interatomic distances for out-of-sample compounds, effectively enabling the direct reconstruction of coordinates, and thereby bypassing the conventional energy optimization task. The numerical evidence collected includes 3D coordinate predictions for organic molecules, transition states, and crystalline solids. G2S improves systematically with training set size, reaching mean absolute interatomic distance prediction errors of less than 0.2 Å for less than eight thousand training structures - on par or better than conventional structure generators. Applicability tests of G2S include successful predictions for systems which typically require manual intervention, improved initial guesses for subsequent conventional ab initio based relaxation, and input generation for subsequent use of structure based quantum machine learning models.
Affiliation(s)
- Dominik Lemm
- Faculty of Physics, University of Vienna, Vienna, Austria
| | | | - O Anatole von Lilienfeld
- Faculty of Physics, University of Vienna, Vienna, Austria.
- Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials (MARVEL), Department of Chemistry, University of Basel, Basel, Switzerland.
|
39
|
Terayama K, Sumita M, Katouda M, Tsuda K, Okuno Y. Efficient Search for Energetically Favorable Molecular Conformations against Metastable States via Gray-Box Optimization. J Chem Theory Comput 2021; 17:5419-5427. [PMID: 34261321 DOI: 10.1021/acs.jctc.1c00301] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
Abstract
In order to accurately understand and estimate molecular properties, finding energetically favorable molecular conformations is the most fundamental task for atomistic computational research on molecules and materials. Geometry optimization based on quantum chemical calculations has enabled the conformation prediction of arbitrary molecules, including de novo ones. However, it is computationally expensive to perform geometry optimizations for enormous conformers. In this study, we introduce the gray-box optimization (GBO) framework, which enables optimal control over the entire geometry optimization process, among multiple conformers. Algorithms designed for GBO roughly estimate energetically preferable conformers during their geometry optimization iterations. They then preferentially compute promising conformers. To evaluate the performance of the GBO framework, we applied it to a test set consisting of seven dipeptides and mycophenolic acid to determine their stable conformations at the density functional theory level. We thus preferentially obtained energetically favorable conformations. Furthermore, the computational costs required to find the most stable conformation were significantly reduced (approximately 1% on average, compared to the naive approach for the dipeptides).
Affiliation(s)
- Kei Terayama
- Graduate School of Medical Life Science, Yokohama City University, Tsurumi-ku, Yokohama 230-0045, Japan.,RIKEN Center for Advanced Intelligence Project, Tokyo 103-0027, Japan.,Medical Sciences Innovation Hub Program, RIKEN, Yokohama 230-0045, Japan.,Graduate School of Medicine, Kyoto University, Sakyo-ku, Kyoto 606-8507, Japan
| | - Masato Sumita
- RIKEN Center for Advanced Intelligence Project, Tokyo 103-0027, Japan.,International Center for Materials Nanoarchitectonics(WPI-MANA), National Institute for Materials Science, Tsukuba 305-0044, Japan
| | - Michio Katouda
- Department of Computational Science and Technology, Research Organization for Information Science and Technology, Minato-ku, Tokyo 105-0013, Japan.,Waseda Research Institute for Science and Engineering, Waseda University, Sinjuku-ku, Tokyo 169-8555, Japan
| | - Koji Tsuda
- RIKEN Center for Advanced Intelligence Project, Tokyo 103-0027, Japan.,Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa 277-8561, Japan.,Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, Tsukuba 305-0047, Japan
| | - Yasushi Okuno
- Medical Sciences Innovation Hub Program, RIKEN, Yokohama 230-0045, Japan.,Graduate School of Medicine, Kyoto University, Sakyo-ku, Kyoto 606-8507, Japan
|
40
|
Moskal M, Beker W, Szymkuć S, Grzybowski BA. Scaffold‐Directed Face Selectivity Machine‐Learned from Vectors of Non‐covalent Interactions. Angew Chem Int Ed Engl 2021. [DOI: 10.1002/ange.202101986] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 02/05/2023]
Affiliation(s)
- Martyna Moskal
- Institute of Organic Chemistry Polish Academy of Sciences Ul. Kasprzaka 44/52 01-224 Warsaw Poland
- Allchemy, Inc. Highland IN USA
| | - Wiktor Beker
- Institute of Organic Chemistry Polish Academy of Sciences Ul. Kasprzaka 44/52 01-224 Warsaw Poland
- Allchemy, Inc. Highland IN USA
| | - Sara Szymkuć
- Institute of Organic Chemistry Polish Academy of Sciences Ul. Kasprzaka 44/52 01-224 Warsaw Poland
- Allchemy, Inc. Highland IN USA
| | - Bartosz A. Grzybowski
- Institute of Organic Chemistry Polish Academy of Sciences Ul. Kasprzaka 44/52 01-224 Warsaw Poland
- Allchemy, Inc. Highland IN USA
- IBS Center for Soft and Living Matter and Department of Chemistry UNIST 50, UNIST-gil, Eonyang-eup, Ulju-gun Ulsan South Korea
|
41
|
Moskal M, Beker W, Szymkuć S, Grzybowski BA. Scaffold-Directed Face Selectivity Machine-Learned from Vectors of Non-covalent Interactions. Angew Chem Int Ed Engl 2021; 60:15230-15235. [PMID: 33876554 DOI: 10.1002/anie.202101986] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 02/08/2021] [Revised: 03/29/2021] [Indexed: 11/06/2022]
Abstract
This work describes a method to vectorize and Machine-Learn, ML, non-covalent interactions responsible for scaffold-directed reactions important in synthetic chemistry. Models trained on this representation predict correct face of approach in ca. 90 % of Michael additions or Diels-Alder cycloadditions. These accuracies are significantly higher than those based on traditional ML descriptors, energetic calculations, or intuition of experienced synthetic chemists. Our results also emphasize the importance of ML models being provided with relevant mechanistic knowledge; without such knowledge, these models cannot easily "transfer-learn" and extrapolate to previously unseen reaction mechanisms.
Affiliation(s)
- Martyna Moskal
- Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224, Warsaw, Poland.,Allchemy, Inc., Highland, IN, USA
| | - Wiktor Beker
- Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224, Warsaw, Poland.,Allchemy, Inc., Highland, IN, USA
| | - Sara Szymkuć
- Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224, Warsaw, Poland.,Allchemy, Inc., Highland, IN, USA
| | - Bartosz A Grzybowski
- Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224, Warsaw, Poland.,Allchemy, Inc., Highland, IN, USA.,IBS Center for Soft and Living Matter and Department of Chemistry, UNIST, 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan, South Korea
|
42
|
Westermayr J, Gastegger M, Schütt KT, Maurer RJ. Perspective on integrating machine learning into computational chemistry and materials science. J Chem Phys 2021; 154:230903. [PMID: 34241249 DOI: 10.1063/5.0047760] [Citation(s) in RCA: 67] [Impact Index Per Article: 22.3] [Indexed: 01/12/2023] Open
Abstract
Machine learning (ML) methods are being used in almost every conceivable area of electronic structure theory and molecular simulation. In particular, ML has become firmly established in the construction of high-dimensional interatomic potentials. Not a day goes by without another proof of principle being published on how ML methods can represent and predict quantum mechanical properties, be they observable, such as molecular polarizabilities, or not, such as atomic charges. As ML is becoming pervasive in electronic structure theory and molecular simulation, we provide an overview of how atomistic computational modeling is being transformed by the incorporation of ML approaches. From the perspective of the practitioner in the field, we assess how common workflows to predict structure, dynamics, and spectroscopy are affected by ML. Finally, we discuss how a tighter and lasting integration of ML methods with computational chemistry and materials science can be achieved and what it will mean for research practice, software development, and postgraduate training.
Affiliation(s)
- Julia Westermayr
- Department of Chemistry, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, United Kingdom
| | - Michael Gastegger
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
| | - Kristof T Schütt
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
| | - Reinhard J Maurer
- Department of Chemistry, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, United Kingdom
|
43
|
Mercado R, Rastemo T, Lindelöf E, Klambauer G, Engkvist O, Chen H, Jannik Bjerrum E. Graph networks for molecular design. Machine Learning: Science and Technology 2021. [DOI: 10.1088/2632-2153/abcf91] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Indexed: 12/19/2022]
|
44
|
Mercado R, Rastemo T, Lindelöf E, Klambauer G, Engkvist O, Chen H, Bjerrum EJ. Practical notes on building molecular graph generative models. Applied AI Letters 2021. [DOI: 10.1002/ail2.18] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 01/24/2023]
Affiliation(s)
- Rocío Mercado
- Molecular AI, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca Gothenburg Sweden
| | - Tobias Rastemo
- Molecular AI, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca Gothenburg Sweden
- Chalmers University of Technology Gothenburg Sweden
| | - Edvard Lindelöf
- Molecular AI, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca Gothenburg Sweden
- Chalmers University of Technology Gothenburg Sweden
| | - Günter Klambauer
- Institute of Bioinformatics, Johannes Kepler University Linz Austria
| | - Ola Engkvist
- Molecular AI, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca Gothenburg Sweden
| | - Hongming Chen
- Centre of Chemistry and Chemical Biology, Guangzhou Regenerative Medicine and Health, Guangdong Laboratory Guangzhou China
| | - Esben Jannik Bjerrum
- Molecular AI, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca Gothenburg Sweden
|
45
|
Wieder O, Kohlbacher S, Kuenemann M, Garon A, Ducrot P, Seidel T, Langer T. A compact review of molecular property prediction with graph neural networks. Drug Discovery Today: Technologies 2020; 37:1-12. [PMID: 34895648 DOI: 10.1016/j.ddtec.2020.11.009] [Citation(s) in RCA: 102] [Impact Index Per Article: 25.5] [Received: 07/08/2020] [Revised: 11/25/2020] [Accepted: 11/30/2020] [Indexed: 05/22/2023]
Abstract
As graph neural networks are becoming more and more powerful and useful in the field of drug discovery, many pharmaceutical companies are getting interested in utilizing these methods for their own in-house frameworks. This is especially compelling for tasks such as the prediction of molecular properties which is often one of the most crucial tasks in computer-aided drug discovery workflows. The immense hype surrounding these kinds of algorithms has led to the development of many different types of promising architectures and in this review we try to structure this highly dynamic field of AI-research by collecting and classifying 80 GNNs that have been used to predict more than 20 molecular properties using 48 different datasets.
Affiliation(s)
- Oliver Wieder
- University of Vienna, Department of Pharmaceutical Chemistry, Althanstraße 14, A-1090 Vienna, Austria
| | - Stefan Kohlbacher
- University of Vienna, Department of Pharmaceutical Chemistry, Althanstraße 14, A-1090 Vienna, Austria
| | - Mélaine Kuenemann
- Servier Research Institute - CentEx Biotechnology, 125 Chemin de Ronde, 78290 Croissy-sur-Seine, France
| | - Arthur Garon
- University of Vienna, Department of Pharmaceutical Chemistry, Althanstraße 14, A-1090 Vienna, Austria
| | - Pierre Ducrot
- University of Vienna, Department of Pharmaceutical Chemistry, Althanstraße 14, A-1090 Vienna, Austria
| | - Thomas Seidel
- University of Vienna, Department of Pharmaceutical Chemistry, Althanstraße 14, A-1090 Vienna, Austria
| | - Thierry Langer
- University of Vienna, Department of Pharmaceutical Chemistry, Althanstraße 14, A-1090 Vienna, Austria.
|
46
|
Stieffenhofer M, Wand M, Bereau T. Adversarial reverse mapping of equilibrated condensed-phase molecular structures. Machine Learning: Science and Technology 2020. [DOI: 10.1088/2632-2153/abb6d4] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/12/2022]
|
47
|
Kwon Y, Lee D, Choi YS, Kang M, Kang S. Neural Message Passing for NMR Chemical Shift Prediction. J Chem Inf Model 2020; 60:2024-2030. [DOI: 10.1021/acs.jcim.0c00195] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 11/28/2022]
Affiliation(s)
- Youngchun Kwon
- Samsung Advanced Institute of Technology, Samsung Electronics Co., Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
- Department of Computer Science and Engineering, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
| | - Dongseon Lee
- Samsung Advanced Institute of Technology, Samsung Electronics Co., Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
| | - Youn-Suk Choi
- Samsung Advanced Institute of Technology, Samsung Electronics Co., Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
| | - Myeonginn Kang
- Department of Industrial Engineering, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon 16419, Republic of Korea
| | - Seokho Kang
- Department of Industrial Engineering, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon 16419, Republic of Korea
|
48
|
Mansimov E, Mahmood O, Kang S, Cho K. Molecular Geometry Prediction using a Deep Generative Graph Neural Network. Sci Rep 2019; 9:20381. [PMID: 31892716 PMCID: PMC6938476 DOI: 10.1038/s41598-019-56773-5] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 06/10/2019] [Accepted: 12/16/2019] [Indexed: 11/25/2022] Open
Abstract
A molecule's geometry, also known as conformation, is one of a molecule's most important properties, determining the reactions it participates in, the bonds it forms, and the interactions it has with other molecules. Conventional conformation generation methods minimize hand-designed molecular force field energy functions that are often not well correlated with the true energy function of a molecule observed in nature. They generate geometrically diverse sets of conformations, some of which are very similar to the lowest-energy conformations and others of which are very different. In this paper, we propose a conditional deep generative graph neural network that learns an energy function by directly learning to generate molecular conformations that are energetically favorable and more likely to be observed experimentally in a data-driven manner. On three large-scale datasets containing small molecules, we show that our method generates a set of conformations that on average is far more likely to be close to the corresponding reference conformations than are those obtained from conventional force field methods. Our method maintains geometrical diversity by generating conformations that are not too similar to each other, and is also computationally faster. We also show that our method can be used to provide initial coordinates for conventional force field methods. On one of the evaluated datasets we show that this combination allows us to combine the best of both methods, yielding generated conformations that are on average close to reference conformations with some very similar to reference conformations.
Affiliation(s)
- Elman Mansimov
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, 60 5th Avenue, New York, New York, 10011, United States
| | - Omar Mahmood
- Center for Data Science, New York University, 60 5th Avenue, New York, New York, 10011, United States
| | - Seokho Kang
- Department of Systems Management Engineering, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon, 16419, Republic of Korea
| | - Kyunghyun Cho
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, 60 5th Avenue, New York, New York, 10011, United States.
- Center for Data Science, New York University, 60 5th Avenue, New York, New York, 10011, United States.
- Facebook AI Research, 770 Broadway, New York, New York, 10003, United States.
- CIFAR Azrieli Global Scholar, Canadian Institute for Advanced Research, 661 University Avenue, Toronto, ON, M5G 1M1, Canada.
|