1. Zhang O, Lin H, Zhang H, Zhao H, Huang Y, Hsieh CY, Pan P, Hou T. Deep Lead Optimization: Leveraging Generative AI for Structural Modification. J Am Chem Soc 2024. [PMID: 39499822 DOI: 10.1021/jacs.4c11686] [Indexed: 11/07/2024]
Abstract
The integration of deep learning-based molecular generation models into drug discovery has garnered significant attention for its potential to expedite the development process. Central to this is lead optimization, a critical phase where existing molecules are refined into viable drug candidates. As various methods for deep lead optimization continue to emerge, it is essential to classify these approaches more clearly. We categorize lead optimization methods into two main types: goal-directed and structure-directed. Our focus is on structure-directed optimization, which, while highly relevant to practical applications, is less explored than goal-directed optimization. Through a systematic review of conventional computational approaches, we identify four tasks specific to structure-directed optimization: fragment replacement, linker design, scaffold hopping, and side-chain decoration. We discuss the motivations, training data construction, and current developments for each of these tasks. Additionally, we use a classical optimization taxonomy to classify both goal-directed and structure-directed methods, highlighting their challenges and future development prospects. Finally, we propose a reference protocol for experimental chemists to effectively utilize Generative AI (GenAI)-based tools in structural modification tasks, bridging the gap between methodological advancements and practical applications.
Affiliation(s)
- Odin Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Haitao Lin
- AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou 310024, Zhejiang, China
- Hui Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Huifeng Zhao
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yufei Huang
- AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou 310024, Zhejiang, China
- Chang-Yu Hsieh
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Peichen Pan
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China

2. Singh S, Zeh G, Freiherr J, Bauer T, Türkmen I, Grasskamp AT. Classification of substances by health hazard using deep neural networks and molecular electron densities. J Cheminform 2024; 16:45. [PMID: 38627862 PMCID: PMC11302296 DOI: 10.1186/s13321-024-00835-y] [Received: 12/07/2023] [Accepted: 03/23/2024] [Indexed: 08/09/2024]
Abstract
In this paper, we present a method that leverages 3D electron density information to train a deep neural network pipeline that segments regions of high, medium and low electronegativity and classifies substances as health hazardous or non-hazardous. We show that this can be applied to use cases such as cosmetics and food products. For this purpose, we first generate 3D electron density cubes using semiempirical molecular calculations for a custom European Chemicals Agency (ECHA) subset consisting of substances labelled as hazardous and non-hazardous for cosmetic usage. Using these cubes together with their 3-class electronegativity maps, we train a modified 3D-UNet to segment reactive sites in molecules and classify substances with an accuracy of 78.1%. We perform the same process on a custom food dataset (CompFood) consisting of hazardous and non-hazardous substances compiled from the European Food Safety Authority (EFSA) OpenFoodTox, Food and Drug Administration (FDA) Generally Recognized as Safe (GRAS) and FooDB datasets, achieving a classification accuracy of 64.1%. Our results show that 3D electron densities, and particularly masked electron densities calculated by taking the product of the original electron densities and the regions of high and low electronegativity, can be used to classify molecules for different use cases. They can therefore not only guide safe-by-design product development but also aid regulatory decisions. SCIENTIFIC CONTRIBUTION: We aim to contribute to the diverse 3D molecular representations used for training machine learning algorithms by showing that a deep learning network can be trained on a 3D electron density representation of molecules. This approach has not previously been used to train machine learning models. It allows the true spatial domain of the molecule to be used for predicting properties such as suitability for use in cosmetics and food products and, in future, other molecular properties.
The data and code used for training are accessible at https://github.com/s-singh-ivv/eDen-Substances .
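The masking step described above is a voxel-wise product of the electron density cube with the regions labelled high or low electronegativity. A short NumPy sketch can illustrate it. The sketch rests on hypothetical assumptions: the 3-class electronegativity map is an integer array aligned with the density cube (0 = low, 1 = medium, 2 = high), and both regions are merged into a single mask. The encoding, the function name `masked_density`, and the toy arrays are illustrative and are not taken from the eDen-Substances repository.

```python
import numpy as np

# Hypothetical label encoding for the 3-class electronegativity map.
LOW, MEDIUM, HIGH = 0, 1, 2


def masked_density(density: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Zero out every voxel that is not labelled high or low electronegativity."""
    if density.shape != labels.shape:
        raise ValueError("density cube and label map must have the same shape")
    # Boolean mask: True where the voxel belongs to the high or low region.
    mask = np.isin(labels, (LOW, HIGH))
    # Voxel-wise product of the original density with the region mask.
    return density * mask


# Toy 2x2x2 "cube" standing in for a semiempirical electron density grid.
density = np.arange(8, dtype=float).reshape(2, 2, 2)
labels = np.array([0, 1, 2, 1, 0, 2, 1, 1]).reshape(2, 2, 2)

masked = masked_density(density, labels)
print(masked.ravel().tolist())  # [0.0, 0.0, 2.0, 0.0, 4.0, 5.0, 0.0, 0.0]
```

The abstract could equally be read as keeping the high and low regions as two separate masked channels, one product per region. The single combined mask here is only one plausible option.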
Affiliation(s)
- Satnam Singh
- Department of Sensory Analytics and Technologies, Fraunhofer Institute for Process Engineering and Packaging IVV, Giggenhauser Str. 35, 85354, Freising, Germany
- Department of Psychiatry and Psychotherapy, Friedrich-Alexander-Universität Erlangen-Nürnberg, Schwabachanlage 6, 91054, Erlangen, Germany
- Gina Zeh
- Department of Sensory Analytics and Technologies, Fraunhofer Institute for Process Engineering and Packaging IVV, Giggenhauser Str. 35, 85354, Freising, Germany
- Jessica Freiherr
- Department of Sensory Analytics and Technologies, Fraunhofer Institute for Process Engineering and Packaging IVV, Giggenhauser Str. 35, 85354, Freising, Germany
- Department of Psychiatry and Psychotherapy, Friedrich-Alexander-Universität Erlangen-Nürnberg, Schwabachanlage 6, 91054, Erlangen, Germany
- Thilo Bauer
- Computer Chemistry Center, Friedrich-Alexander-Universität Erlangen-Nürnberg, Nägelsbachstr. 25, 91052, Erlangen, Germany
- Isik Türkmen
- Department of Sensory Analytics and Technologies, Fraunhofer Institute for Process Engineering and Packaging IVV, Giggenhauser Str. 35, 85354, Freising, Germany
- Andreas T Grasskamp
- Department of Sensory Analytics and Technologies, Fraunhofer Institute for Process Engineering and Packaging IVV, Giggenhauser Str. 35, 85354, Freising, Germany

3. Helal H, Firoz J, Bilbrey JA, Sprueill H, Herman KM, Krell MM, Murray T, Roldan ML, Kraus M, Li A, Das P, Xantheas SS, Choudhury S. Acceleration of Graph Neural Network-Based Prediction Models in Chemistry via Co-Design Optimization on Intelligence Processing Units. J Chem Inf Model 2024; 64:1568-1580. [PMID: 38382011 DOI: 10.1021/acs.jcim.3c01312] [Indexed: 02/23/2024]
Abstract
Atomic structure prediction and associated property calculations are the bedrock of chemical physics. High-fidelity ab initio modeling techniques for computing structures and properties can be prohibitively expensive, which motivates the development of machine-learning (ML) models that make these predictions more efficiently. Training graph neural networks over large atomistic databases introduces unique computational challenges. These include the need to process millions of small graphs of variable size and to support communication patterns distinct from those of learning over large graphs such as social networks. We demonstrate a novel hardware-software codesign approach to scale up the training of atomistic graph neural networks (GNNs) for structure and property prediction. First, we formulate the coalescing of batches of variable-size atomistic graphs as a bin packing problem and introduce a hardware-agnostic algorithm to pack these batches. This eliminates the redundant computation and memory associated with alternative padding techniques and improves throughput by minimizing communication. In addition, we propose hardware-specific optimizations targeted at Graphcore's Intelligence Processing Unit (IPU), including a planner and vectorization for the gather-scatter operations. We also propose model-specific optimizations such as merged communication collectives and an optimized softplus. Putting all of these together, we demonstrate the effectiveness of the proposed codesign approach with an implementation of a well-established atomistic GNN on Graphcore IPUs. We evaluate training performance on multiple atomistic graph databases with varying graph counts, sizes, and sparsity. We show that this codesign approach can reduce the training time of atomistic GNNs and improve their performance by up to 1.5× over the baseline implementation of the model on the IPUs. Additionally, we compare our IPU implementation with an NVIDIA GPU-based implementation and show that our atomistic GNN implementation on the IPUs runs 1.8× faster on average than on the GPUs.
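The abstract frames the batching of variable-size atomistic graphs as a bin packing problem, but it does not state which packing algorithm is used. As a hedged illustration only, the sketch below applies the classic first-fit decreasing heuristic to per-molecule node counts with a fixed per-pack node budget. The function names `pack_graphs` and `padding_fraction` and the budget of 12 nodes are hypothetical. A real implementation would likely also cap edges and graphs per pack.

```python
from typing import List


def pack_graphs(node_counts: List[int], max_nodes: int) -> List[List[int]]:
    """Group graph indices into packs whose total node count is <= max_nodes.

    First-fit decreasing: visit graphs from largest to smallest and put each
    one into the first existing pack with enough spare capacity, opening a
    new pack only when none fits.
    """
    if any(n > max_nodes for n in node_counts):
        raise ValueError("a single graph exceeds the pack capacity")
    order = sorted(range(len(node_counts)), key=lambda i: node_counts[i], reverse=True)
    packs: List[List[int]] = []
    loads: List[int] = []  # current node count of each pack
    for i in order:
        n = node_counts[i]
        for p, load in enumerate(loads):
            if load + n <= max_nodes:
                packs[p].append(i)
                loads[p] += n
                break
        else:  # no existing pack had room
            packs.append([i])
            loads.append(n)
    return packs


def padding_fraction(node_counts: List[int], packs: List[List[int]], max_nodes: int) -> float:
    """Fraction of allocated node slots that would be filled with padding."""
    return 1 - sum(node_counts) / (len(packs) * max_nodes)


sizes = [9, 4, 7, 3, 5, 2, 6]  # atoms per molecule (toy values)
packs = pack_graphs(sizes, max_nodes=12)
print(packs)                               # [[0, 3], [2, 4], [6, 1, 5]]
print(padding_fraction(sizes, packs, 12))  # 0.0
```

Padding each of the seven toy graphs to the largest size (9 nodes) would allocate 63 node slots for 36 real atoms. The three packs above use 36 slots with no padding. This toy case happens to pack perfectly, which will not hold in general.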
Affiliation(s)
- Hatem Helal
- Graphcore, Kett House, Station Rd, Cambridge CB1 2JH, U.K.
- Jesun Firoz
- Advanced Computing, Mathematics and Data Division, Pacific Northwest National Laboratory, 1100 Dexter Ave N, Seattle, Washington 98109, United States
- Jenna A Bilbrey
- Artificial Intelligence and Data Analytics Division, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, Washington 99352, United States
- Henry Sprueill
- Artificial Intelligence and Data Analytics Division, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, Washington 99352, United States
- Kristina M Herman
- Department of Chemistry, University of Washington, Seattle, Washington 98185, United States
- Tom Murray
- Graphcore, Kett House, Station Rd, Cambridge CB1 2JH, U.K.
- Mike Kraus
- Graphcore, Kett House, Station Rd, Cambridge CB1 2JH, U.K.
- Ang Li
- Advanced Computing, Mathematics and Data Division, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, Washington 99352, United States
- Payel Das
- IBM Research, Yorktown Heights, New York 10598, United States
- Sotiris S Xantheas
- Department of Chemistry, University of Washington, Seattle, Washington 98185, United States
- Advanced Computing, Mathematics and Data Division, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, Washington 99352, United States
- Sutanay Choudhury
- Advanced Computing, Mathematics and Data Division, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, Washington 99352, United States

4. Wang M, Wu Z, Wang J, Weng G, Kang Y, Pan P, Li D, Deng Y, Yao X, Bing Z, Hsieh CY, Hou T. Genetic Algorithm-Based Receptor Ligand: A Genetic Algorithm-Guided Generative Model to Boost the Novelty and Drug-Likeness of Molecules in a Sampling Chemical Space. J Chem Inf Model 2024; 64:1213-1228. [PMID: 38302422 DOI: 10.1021/acs.jcim.3c01964] [Indexed: 02/03/2024]
Abstract
Deep learning (DL)-based de novo molecular design has recently gained significant attention. While numerous DL-based generative models have been successfully developed for designing novel compounds, the majority of the generated molecules lack sufficiently novel scaffolds or highly drug-like profiles. These issues may not be fully captured by commonly used metrics for assessing molecular generative models, such as novelty, diversity, and the quantitative estimate of drug-likeness (QED) score. To address these limitations, we propose GARel (genetic algorithm-based receptor-ligand interaction generator), a genetic algorithm-guided generative model. GARel is a novel framework for training a DL-based generative model to produce drug-like molecules with novel scaffolds. To train the GARel model efficiently, we used a dense net to update the parameters based on molecules with novel scaffolds and drug-like features. To demonstrate the capability of GARel, we used it to design inhibitors for three targets: AA2AR, EGFR, and SARS-CoV-2. The results indicate that GARel-generated molecules feature more diverse and novel scaffolds, possess more desirable physicochemical properties, and have favorable docking scores. Compared with other generative models, GARel makes significant progress in balancing novelty and drug-likeness. It offers a promising direction for the further development of DL-based de novo design methodology, with potential impacts on drug discovery.
Affiliation(s)
- Mingyang Wang
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- CarbonSilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Zhengjian Wu
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- School of Computer Science, Wuhan University, Wuhan 430072, Hubei, China
- Jike Wang
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- CarbonSilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Gaoqi Weng
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yu Kang
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Peichen Pan
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Dan Li
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yafeng Deng
- CarbonSilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Xiaojun Yao
- Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, Macau Institute for Applied Research in Medicine and Health, State Key Laboratory of Quality Research in Chinese Medicine, Macau University of Science and Technology, Taipa, Macau 999078, China
- Zhitong Bing
- Institute of Modern Physics, Chinese Academy of Sciences, Lanzhou, Gansu 730000, China
- Chang-Yu Hsieh
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Tingjun Hou
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China

5. Kim DN, McNaughton AD, Kumar N. Leveraging Artificial Intelligence to Expedite Antibody Design and Enhance Antibody-Antigen Interactions. Bioengineering (Basel) 2024; 11:185. [PMID: 38391671 PMCID: PMC10886287 DOI: 10.3390/bioengineering11020185] [Received: 12/30/2023] [Revised: 01/30/2024] [Accepted: 02/06/2024] [Indexed: 02/24/2024]
Abstract
This perspective sheds light on the transformative impact of recent computational advancements in the field of protein therapeutics, with a particular focus on the design and development of antibodies. Cutting-edge computational methods have revolutionized our understanding of protein-protein interactions (PPIs), enhancing the efficacy of protein therapeutics in preclinical and clinical settings. Central to these advancements is the application of machine learning and deep learning, which offers unprecedented insights into the intricate mechanisms of PPIs and facilitates precise control over protein functions. Despite these advancements, the complex structural nuances of antibodies pose ongoing challenges in their design and optimization. Our review provides a comprehensive exploration of the latest deep learning approaches, including language models and diffusion techniques, and their role in surmounting these challenges. We also present a critical analysis of these methods, offering insights to drive further progress in this rapidly evolving field. The paper includes practical recommendations for the application of these computational techniques, supplemented with independent benchmark studies. These studies focus on key performance metrics such as accuracy and the ease of program execution, providing a valuable resource for researchers engaged in antibody design and development. Through this detailed perspective, we aim to contribute to the advancement of antibody design, equipping researchers with the tools and knowledge to navigate the complexities of this field.
Affiliation(s)
- Neeraj Kumar
- Pacific Northwest National Laboratory, 902 Battelle Blvd., Richland, WA 99352, USA; (D.N.K.); (A.D.M.)

6. Xu C, Liu R, Huang S, Li W, Li Z, Luo HB. 3D-SMGE: a pipeline for scaffold-based molecular generation and evaluation. Brief Bioinform 2023; 24:bbad327. [PMID: 37756591 DOI: 10.1093/bib/bbad327] [Received: 05/17/2023] [Revised: 08/19/2023] [Accepted: 08/30/2023] [Indexed: 09/29/2023]
Abstract
In drug discovery, a key problem is how to improve biological activity and ADMET properties starting from a specific structure, a process known as structural optimization. Using deep generative models to generate molecules with desired drug-like properties from a starting scaffold will provide a powerful tool for accelerating structural optimization. However, existing generative models still struggle to extract molecular features efficiently in 3D space and to generate drug-like 3D molecules. Moreover, most existing ADMET prediction models predict different properties with a single model, which can reduce prediction accuracy on some datasets. To generate molecules from a specific scaffold effectively and to provide a basis for structural optimization, we present 3D-SMGE (3-Dimensional Scaffold-based Molecular Generation and Evaluation). This pipeline consists of molecular generation and ADMET property prediction. For molecular generation, we propose 3D-SMG, a novel deep generative model for the end-to-end design of 3D molecules. In 3D-SMG, we designed the cross-aggregated continuous-filter convolution (ca-cfconv). It achieves efficient, low-cost 3D spatial feature extraction while ensuring invariance to rotations of the atomic coordinates. 3D-SMG was shown to generate valid, unique and novel molecules with high drug-likeness. In addition, the proposed data-adaptive multi-model ADMET prediction method matched or exceeded the best evaluation metrics on 24 of 27 ADMET benchmark datasets. 3D-SMGE is anticipated to become a powerful tool for hit-to-lead structural optimization and to accelerate the drug discovery process.
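The rotation invariance claimed for ca-cfconv is the property inherited from continuous-filter convolutions of the SchNet family, whose filters depend only on interatomic distances. This abstract does not describe the cross-aggregation itself. The NumPy sketch below therefore shows only a plain continuous-filter convolution, with these simplifications: a Gaussian radial-basis expansion of distances, a single linear filter layer instead of a filter-generating network, and random weights. All names and sizes are hypothetical. The sketch checks numerically that rotating the coordinates leaves the output unchanged.

```python
import numpy as np


def rbf_expand(dist: np.ndarray, centers: np.ndarray, gamma: float) -> np.ndarray:
    """Expand distances in Gaussian radial basis functions: (...) -> (..., n_rbf)."""
    return np.exp(-gamma * (dist[..., None] - centers) ** 2)


def cfconv(x, pos, w_filter, centers, gamma=10.0):
    """Plain continuous-filter convolution (simplified, SchNet-style).

    x: (n_atoms, n_feat) atom features; pos: (n_atoms, 3) coordinates.
    Filters are built from interatomic distances only, so the result does not
    change when all coordinates are rotated or translated together.
    """
    diff = pos[:, None, :] - pos[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)                   # (n, n)
    filters = rbf_expand(dist, centers, gamma) @ w_filter  # (n, n, n_feat)
    neighbours = 1.0 - np.eye(len(pos))                    # exclude self-interaction
    # out[i, f] = sum_j neighbours[i, j] * filters[i, j, f] * x[j, f]
    return np.einsum("ij,ijf,jf->if", neighbours, filters, x)


rng = np.random.default_rng(0)
n_atoms, n_feat, n_rbf = 5, 4, 16
x = rng.normal(size=(n_atoms, n_feat))
pos = rng.normal(size=(n_atoms, 3))
w_filter = rng.normal(size=(n_rbf, n_feat))
centers = np.linspace(0.0, 5.0, n_rbf)

# Random orthogonal matrix from a QR decomposition (a rotation, possibly with a reflection).
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))

out = cfconv(x, pos, w_filter, centers)
out_rot = cfconv(x, pos @ q.T, w_filter, centers)
print(np.allclose(out, out_rot))  # True
```

Coordinates enter only through the pairwise distances, so invariance holds by construction. The check above is a numerical sanity test, not a proof.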
Affiliation(s)
- Chao Xu
- Key Laboratory of Tropical Biological Resources of Ministry of Education, School of Pharmaceutical Sciences, Hainan University, Haikou 570228, Hainan, P.R. China
- Runduo Liu
- School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou, 510000, Guangdong, P.R. China
- Shuheng Huang
- Key Laboratory of Tropical Biological Resources of Ministry of Education, School of Pharmaceutical Sciences, Hainan University, Haikou 570228, Hainan, P.R. China
- Wenchao Li
- School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou, 510000, Guangdong, P.R. China
- Zhe Li
- School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou, 510000, Guangdong, P.R. China
- Hai-Bin Luo
- Key Laboratory of Tropical Biological Resources of Ministry of Education, School of Pharmaceutical Sciences, Hainan University, Haikou 570228, Hainan, P.R. China

7. Varikoti RA, Schultz KJ, Kombala CJ, Kruel A, Brandvold KR, Zhou M, Kumar N. Integrated data-driven and experimental approaches to accelerate lead optimization targeting SARS-CoV-2 main protease. J Comput Aided Mol Des 2023:10.1007/s10822-023-00509-1. [PMID: 37314632 DOI: 10.1007/s10822-023-00509-1] [Received: 01/19/2023] [Accepted: 05/23/2023] [Indexed: 06/15/2023]
Abstract
Identification of potential therapeutic candidates can be expedited by integrating computational modeling with domain-aware machine learning (ML) models, followed by experimental validation, in an iterative manner. Generative deep learning models can generate thousands of new candidates; however, their physicochemical and biochemical properties are typically not fully optimized. Using our recently developed deep learning models and a scaffold as a starting point, we generated tens of thousands of compounds for SARS-CoV-2 Mpro that preserve the core scaffold. We implemented and applied several computational tools to the generated candidates to predict biological activity and binding affinity in advance. These tools included structural alert and toxicity analysis, high-throughput virtual screening, ML-based 3D quantitative structure-activity relationships, multi-parameter optimization, and graph neural networks. As a result of these combined computational efforts, eight promising candidates were selected and tested experimentally using native mass spectrometry and FRET-based functional assays. Two of the tested compounds, with quinazoline-2-thiol and acetylpiperidine core moieties, showed IC50 values in the low micromolar range: [Formula: see text] μM and 3.41 ± 0.0015 μM, respectively. Molecular dynamics simulations further highlight that binding of these compounds results in allosteric modulations within chain B and the interface domains of Mpro. Our integrated approach provides a platform for data-driven lead optimization with rapid characterization and experimental validation in a closed loop that could be applied to other potential protein targets.
Affiliation(s)
- Rohith Anand Varikoti
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99352, USA
- Katherine J Schultz
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99352, USA
- Chathuri J Kombala
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99352, USA
- Agustin Kruel
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99352, USA
- Kristoffer R Brandvold
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99352, USA
- Mowei Zhou
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99352, USA
- Neeraj Kumar
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99352, USA

8. Baillif B, Cole J, McCabe P, Bender A. Deep generative models for 3D molecular structure. Curr Opin Struct Biol 2023; 80:102566. [DOI: 10.1016/j.sbi.2023.102566] [Received: 12/23/2022] [Revised: 02/05/2023] [Accepted: 02/15/2023] [Indexed: 03/30/2023]

9. Schütt KT, Hessmann SSP, Gebauer NWA, Lederer J, Gastegger M. SchNetPack 2.0: A neural network toolbox for atomistic machine learning. J Chem Phys 2023; 158:144801. [PMID: 37061495 DOI: 10.1063/5.0138367] [Indexed: 04/17/2023]
Abstract
SchNetPack is a versatile neural network toolbox that addresses the requirements of both method development and the application of atomistic machine learning. Version 2.0 comes with an improved data pipeline, modules for equivariant neural networks, and a PyTorch implementation of molecular dynamics. An optional integration with PyTorch Lightning and the Hydra configuration framework powers a flexible command-line interface. This makes SchNetPack 2.0 easily extendable with custom code and ready for complex training tasks, such as the generation of 3D molecular structures.
Affiliation(s)
- Kristof T Schütt
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Niklas W A Gebauer
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Jonas Lederer
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Michael Gastegger
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany

10. Thomas M, Bender A, de Graaf C. Integrating structure-based approaches in generative molecular design. Curr Opin Struct Biol 2023; 79:102559. [PMID: 36870277 DOI: 10.1016/j.sbi.2023.102559] [Received: 11/11/2022] [Revised: 01/23/2023] [Accepted: 01/31/2023] [Indexed: 03/06/2023]
Abstract
Generative molecular design for drug discovery and development has seen a recent resurgence. It promises to improve the efficiency of the design-make-test-analyse cycle by computationally exploring much larger chemical spaces than traditional virtual screening techniques. However, most generative models thus far have only used small-molecule information to train and condition de novo molecule generators. Here, we instead focus on recent approaches that incorporate protein structure into de novo molecule optimization in an attempt to maximize the predicted on-target binding affinity of generated molecules. We classify these structure-integration principles as either distribution learning or goal-directed optimization. In each case, we further classify them by whether the approach is protein structure-explicit or implicit with respect to the generative model. We discuss recent approaches in the context of this categorization and provide our perspective on the future direction of the field.
Affiliation(s)
- Morgan Thomas
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK.
- Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK. https://twitter.com/@AndreasBenderUK
- Chris de Graaf
- Sosei Heptares, Steinmetz Building, Granta Park, Great Abington, Cambridge, CB21 6DG, UK. https://twitter.com/@Chris_de_Graaf

11. Joshi RP, Schultz KJ, Wilson JW, Kruel A, Varikoti RA, Kombala CJ, Kneller DW, Galanie S, Phillips G, Zhang Q, Coates L, Parvathareddy J, Surendranathan S, Kong Y, Clyde A, Ramanathan A, Jonsson CB, Brandvold KR, Zhou M, Head MS, Kovalevsky A, Kumar N. AI-Accelerated Design of Targeted Covalent Inhibitors for SARS-CoV-2. J Chem Inf Model 2023; 63:1438-1453. [PMID: 36808989 PMCID: PMC9969887 DOI: 10.1021/acs.jcim.2c01377] [Received: 10/31/2022] [Indexed: 02/23/2023]
Abstract
Direct-acting antivirals for the treatment of the COVID-19 pandemic caused by the SARS-CoV-2 virus are needed to complement vaccination efforts. Given the ongoing emergence of new variants, automated experimentation and fast, active learning-based workflows for antiviral lead discovery remain critical to our ability to address the pandemic's evolution in a timely manner. Several such pipelines have been introduced to discover candidates with noncovalent interactions with the main protease (Mpro). Here, we instead developed a closed-loop artificial intelligence pipeline to design electrophilic warhead-based covalent candidates. This work introduces a deep learning-assisted automated computational workflow that adds linkers and an electrophilic "warhead" to design covalent candidates and incorporates cutting-edge experimental techniques for validation. Using this process, promising candidates in the library were screened, and several potential hits were identified. These hits were tested experimentally using native mass spectrometry and fluorescence resonance energy transfer (FRET)-based screening assays. Using our pipeline, we identified four chloroacetamide-based covalent inhibitors of Mpro with micromolar affinities (KI of 5.27 μM). The binding modes of each compound were determined experimentally by room-temperature X-ray crystallography and are consistent with the predicted poses. Conformational changes induced upon binding, observed in molecular dynamics simulations, further suggest that dynamics may be an important factor in further improving selectivity, thereby effectively lowering KI and reducing toxicity. These results demonstrate the utility of our modular and data-driven approach for potent and selective covalent inhibitor discovery and provide a platform to apply it to other emerging targets.
Affiliation(s)
- Rajendra P. Joshi
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Katherine J. Schultz
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Jesse William Wilson
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Agustin Kruel
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Rohith Anand Varikoti
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Chathuri J. Kombala
- Elson S. Floyd College of Medicine, Department of Nutrition and Exercise Physiology, Washington State University, Spokane, Washington 99202, United States
- Daniel W. Kneller
- Neutron Scattering Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- National Virtual Biotechnology Laboratory, US Department of Energy, Washington, District of Columbia 20585, United States
- Stephanie Galanie
- National Virtual Biotechnology Laboratory, US Department of Energy, Washington, District of Columbia 20585, United States
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- Department of Process Research and Development, Merck & Co., Inc., 126 E. Lincoln Avenue, Rahway, New Jersey 07065, United States
- Gwyndalyn Phillips
- Neutron Scattering Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- National Virtual Biotechnology Laboratory, US Department of Energy, Washington, District of Columbia 20585, United States
- Qiu Zhang
- Neutron Scattering Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- National Virtual Biotechnology Laboratory, US Department of Energy, Washington, District of Columbia 20585, United States
- Leighton Coates
- National Virtual Biotechnology Laboratory, US Department of Energy, Washington, District of Columbia 20585, United States
- Second Target Station, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- Jyothi Parvathareddy
- Regional Biocontainment Laboratory, The University of Tennessee Health Science Center, Memphis, Tennessee 38105, United States
- Surekha Surendranathan
- Regional Biocontainment Laboratory, The University of Tennessee Health Science Center, Memphis, Tennessee 38105, United States
- Ying Kong
- Regional Biocontainment Laboratory, The University of Tennessee Health Science Center, Memphis, Tennessee 38105, United States
- Austin Clyde
- National Virtual Biotechnology Laboratory, US Department of Energy, Washington, District of Columbia 20585, United States
- Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Arvind Ramanathan
- National Virtual Biotechnology Laboratory, US Department of Energy, Washington, District of Columbia 20585, United States
- Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Colleen B. Jonsson
- Regional Biocontainment Laboratory, The University of Tennessee Health Science Center, Memphis, Tennessee 38105, United States
- Institute for the Study of Host-Pathogen Systems, University of Tennessee Health Science Center, Memphis, Tennessee 38103, United States
- Department of Microbiology, Immunology and Biochemistry, University of Tennessee Health Science Center, Memphis, Tennessee 38103, United States
- Kristoffer R. Brandvold
- Earth and Biological Sciences Directorate,
Pacific Northwest National Laboratory, Richland, Washington
99352, United States
- Elson S. Floyd College of Medicine, Department of
Nutrition and Exercise Physiology, Washington State University,
Spokane, Washington 99202, United States
| | - Mowei Zhou
- Earth and Biological Sciences Directorate,
Pacific Northwest National Laboratory, Richland, Washington
99352, United States
- National Virtual Biotechnology Laboratory,
US Department of Energy, Washington, District of Columbia
20585, United States
| | - Martha S. Head
- National Virtual Biotechnology Laboratory,
US Department of Energy, Washington, District of Columbia
20585, United States
- Joint Institute for Biological Sciences,
Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831,
United States
- Center for Research Acceleration by Digital
Innovation, Amgen Research, Thousand Oaks, California 91320,
United States
| | - Andrey Kovalevsky
- Neutron Scattering Division, Oak Ridge
National Laboratory, Oak Ridge, Tennessee 37831, United
States
- National Virtual Biotechnology Laboratory,
US Department of Energy, Washington, District of Columbia
20585, United States
| | - Neeraj Kumar
- Earth and Biological Sciences Directorate,
Pacific Northwest National Laboratory, Richland, Washington
99352, United States
- National Virtual Biotechnology Laboratory,
US Department of Energy, Washington, District of Columbia
20585, United States
| |
12
Westermayr J, Gilkes J, Barrett R, Maurer RJ. High-throughput property-driven generative design of functional organic molecules. Nat Comput Sci 2023; 3:139-148. [PMID: 38177626 DOI: 10.1038/s43588-022-00391-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 07/27/2022] [Accepted: 12/14/2022] [Indexed: 01/06/2024]
Abstract
The design of molecules and materials with tailored properties is challenging, as candidate molecules must satisfy multiple competing requirements that are often difficult to measure or compute. While molecular structures produced through generative deep learning will satisfy these patterns, they often only possess specific target properties by chance and not by design, which makes molecular discovery via this route inefficient. In this work, we predict molecules with (Pareto-)optimal properties by combining a generative deep learning model that predicts three-dimensional conformations of molecules with a supervised deep learning model that takes these as inputs and predicts their electronic structure. Optimization of (multiple) molecular properties is achieved by screening newly generated molecules for desirable electronic properties and reusing hit molecules to retrain the generative model with a bias. The approach is demonstrated to find optimal molecules for organic electronics applications. Our method is generally applicable and eliminates the need for quantum chemical calculations during predictions, making it suitable for high-throughput screening in materials and catalyst design.
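The abstract above refers to (Pareto-)optimal properties without spelling out the criterion. As a minimal sketch, assuming every objective is to be maximized, a candidate is Pareto-optimal when no other candidate is at least as good in every objective and strictly better in at least one. This is not the authors' code; the function names, molecule labels, and scores below are hypothetical placeholders.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is >= `b` in every objective and > in at least one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: dict[str, Sequence[float]]) -> list[str]:
    """Return the names of non-dominated candidates, preserving input order."""
    return [
        name
        for name, s in scores.items()
        if not any(
            dominates(other, s)
            for other_name, other in scores.items()
            if other_name != name
        )
    ]


# Hypothetical candidates scored on two objectives to maximize,
# e.g. (predicted property A, predicted property B); values are illustrative only.
candidates = {
    "mol_1": (0.9, 0.2),
    "mol_2": (0.5, 0.5),
    "mol_3": (0.4, 0.4),  # dominated by mol_2
    "mol_4": (0.1, 0.9),
}

print(pareto_front(candidates))  # → ['mol_1', 'mol_2', 'mol_4']
```

In a screen-and-retrain loop of the kind the abstract describes, non-dominated hits like these would be the molecules fed back to bias the generative model; that retraining step is not sketched here. The pairwise check is O(n²) in the number of candidates, which is fine for illustration but may be slow for very large generated libraries.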
Affiliation(s)
- Julia Westermayr
  - Department of Chemistry, University of Warwick, Coventry, UK
  - Wilhelm-Ostwald-Institut für Physikalische und Theoretische Chemie, Universität Leipzig, Leipzig, Germany
- Joe Gilkes
  - Department of Chemistry, University of Warwick, Coventry, UK
  - HetSys Centre for Doctoral Training, University of Warwick, Coventry, UK
- Rhyan Barrett
  - Department of Chemistry, University of Warwick, Coventry, UK
  - Wilhelm-Ostwald-Institut für Physikalische und Theoretische Chemie, Universität Leipzig, Leipzig, Germany
13
Wang M, Wang J, Weng G, Kang Y, Pan P, Li D, Deng Y, Li H, Hsieh CY, Hou T. ReMODE: a deep learning-based web server for target-specific drug design. J Cheminform 2022; 14:84. [PMID: 36510307 PMCID: PMC9743675 DOI: 10.1186/s13321-022-00665-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Accepted: 12/01/2022] [Indexed: 12/14/2022]
Abstract
Deep learning (DL) and machine learning have contributed significantly to basic biology research and drug discovery over the past few decades. Recent advances in DL-based generative models have led to substantial progress in de novo drug design. However, limited data availability, the need for in-depth data processing, and the lack of user-friendly DL tools and interfaces make it difficult to apply these DL techniques to drug design. Here we present ReMODE (Receptor-based MOlecular DEsign), a new web server based on a DL algorithm for target-specific ligand design, which integrates different functional modules to let users build customizable drug design tasks. As designed, the ReMODE server can construct target-specific tasks for the protein targets selected by users. The server also provides several extensions: users can optimize the drug-likeness or synthetic accessibility of the generated molecules and control other physicochemical properties, and they can choose a substructure or scaffold as a starting point for fragment-based drug design. The ReMODE server also enables users to optimize the pharmacophore matching and docking conformations of the generated molecules. We believe that the ReMODE server will benefit drug discovery researchers. ReMODE is publicly available at http://cadd.zju.edu.cn/relation/remode/ .
Affiliation(s)
- Mingyang Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, People’s Republic of China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, People’s Republic of China
- Jike Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, People’s Republic of China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, People’s Republic of China
- Gaoqi Weng
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, People’s Republic of China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, People’s Republic of China
- Yu Kang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, People’s Republic of China
- Peichen Pan
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, People’s Republic of China
- Dan Li
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, People’s Republic of China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, People’s Republic of China
- Honglin Li
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai 200237, People’s Republic of China
- Chang-Yu Hsieh
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, People’s Republic of China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, People’s Republic of China
14
Chan L, Kumar R, Verdonk M, Poelking C. A multilevel generative framework with hierarchical self-contrasting for bias control and transparency in structure-based ligand design. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00564-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
15
Sun Y, Jiao Y, Shi C, Zhang Y. Deep learning-based molecular dynamics simulation for structure-based drug design against SARS-CoV-2. Comput Struct Biotechnol J 2022; 20:5014-5027. [PMID: 36091720 PMCID: PMC9448712 DOI: 10.1016/j.csbj.2022.09.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/14/2022] [Revised: 08/03/2022] [Accepted: 09/03/2022] [Indexed: 11/26/2022]
Abstract
Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has led to a global pandemic. Deep learning (DL) technology and molecular dynamics (MD) simulation are two mainstream computational approaches for investigating the geometric, chemical, and structural features of proteins and for guiding the relevant drug design. Despite the large number of research papers on drug design for SARS-CoV-2 using DL architectures, it remains unclear how the binding energy of protein-protein/ligand complexes evolves dynamically, which is also vital for drug development. In addition, traditional deep neural networks usually have obvious deficiencies in predicting interaction sites as the protein conformation changes. In this review, we introduce the latest progress in DL and DL-based MD simulation approaches in structure-based drug design (SBDD) for SARS-CoV-2, which could address the problems of protein structure and binding prediction, drug virtual screening, molecular docking, and complex evolution. Furthermore, the current challenges and future directions of DL-based MD simulation for SBDD are discussed.
Affiliation(s)
- Yao Sun
  - School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
- Yanqi Jiao
  - School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
- Chengcheng Shi
  - State Key Lab of Urban Water Resource and Environment, School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
- Yang Zhang
  - School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
16
Wang A, Durrant JD. Open-Source Browser-Based Tools for Structure-Based Computer-Aided Drug Discovery. Molecules 2022; 27:4623. [PMID: 35889494 PMCID: PMC9319651 DOI: 10.3390/molecules27144623] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/05/2022] [Revised: 07/17/2022] [Accepted: 07/18/2022] [Indexed: 01/27/2023]
Abstract
We here outline the importance of open-source, accessible tools for computer-aided drug discovery (CADD). We begin with a discussion of drug discovery in general to provide context for a subsequent discussion of structure-based CADD applied to small-molecule ligand discovery. Next, we identify usability challenges common to many open-source CADD tools. To address these challenges, we propose a browser-based approach to CADD tool deployment in which CADD calculations run in modern web browsers on users' local computers. The browser app approach eliminates the need for user-initiated download and installation, ensures broad operating system compatibility, enables easy updates, and provides a user-friendly graphical user interface. Unlike server apps, which run calculations "in the cloud" rather than on users' local computers, browser apps do not require users to upload proprietary information to a third-party (remote) server. They also eliminate the need for the difficult-to-maintain computer infrastructure required to run user-initiated calculations remotely. We conclude by describing some CADD browser apps developed in our lab, which illustrate the utility of this approach. Aside from introducing readers to these specific tools, we are hopeful that this review highlights the need for additional browser-compatible, user-friendly CADD software.
Affiliation(s)
- Jacob D. Durrant
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA
17
Wang M, Hsieh CY, Wang J, Wang D, Weng G, Shen C, Yao X, Bing Z, Li H, Cao D, Hou T. RELATION: A Deep Generative Model for Structure-Based De Novo Drug Design. J Med Chem 2022; 65:9478-9492. [PMID: 35713420 DOI: 10.1021/acs.jmedchem.2c00732] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Indexed: 01/30/2023]
Abstract
Deep learning (DL)-based de novo molecular design has recently gained considerable traction. Many DL-based generative models have been successfully developed to design novel molecules, but most of them are ligand-centric, and the role of the 3D geometries of target binding pockets in molecular generation has not been well exploited. Here, we propose a new 3D-based generative model called RELATION. In the RELATION model, the BiTL algorithm was specifically designed to extract and transfer the desired geometric features of protein-ligand complexes to a latent space for generation. Pharmacophore conditioning and docking-based Bayesian sampling were applied to efficiently navigate the vast chemical space and design molecules with the desired geometric properties and pharmacophore features. As a proof of concept, the RELATION model was used to design inhibitors for two targets, AKT1 and CDK2. The computational results demonstrated that the RELATION model could efficiently generate novel molecules with favorable binding affinity and pharmacophore features.
Affiliation(s)
- Mingyang Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Chang-Yu Hsieh
  - Tencent, Tencent Quantum Lab, Shenzhen 518057, Guangdong, P. R. China
- Jike Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Dong Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Gaoqi Weng
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Chao Shen
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Xiaojun Yao
  - Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, Macau Institute for Applied Research in Medicine and Health, State Key Laboratory of Quality Research in Chinese Medicine, Macau University of Science and Technology, Taipa 999078, Macau, P. R. China
- Zhitong Bing
  - Institute of Modern Physics, Chinese Academy of Sciences, Lanzhou 730000, P. R. China
- Honglin Li
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai 200237, P. R. China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
18
Xie W, Wang F, Li Y, Lai L, Pei J. Advances and Challenges in De Novo Drug Design Using Three-Dimensional Deep Generative Models. J Chem Inf Model 2022; 62:2269-2279. [PMID: 35544331 DOI: 10.1021/acs.jcim.2c00042] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Indexed: 11/30/2022]
Abstract
A persistent goal for de novo drug design is to generate novel chemical compounds with desirable properties in a labor-, time-, and cost-efficient manner. Deep generative models provide alternative routes to this goal. Numerous model architectures and optimization strategies have been explored in recent years, most of which have been developed to generate two-dimensional molecular structures. Some generative models aiming at three-dimensional (3D) molecule generation have also been proposed, gaining attention for their unique advantages and potential to directly design drug-like molecules in a target-conditioning manner. This review highlights current developments in 3D molecular generative models combined with deep learning and discusses future directions for de novo drug design.
Affiliation(s)
- Weixin Xie
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Fanhao Wang
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Yibo Li
  - Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Luhua Lai
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
  - Peking-Tsinghua Center for Life Science at BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Jianfeng Pei
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
19
Gebauer NWA, Gastegger M, Hessmann SSP, Müller KR, Schütt KT. Inverse design of 3d molecular structures with conditional generative neural networks. Nat Commun 2022; 13:973. [PMID: 35190542 PMCID: PMC8861047 DOI: 10.1038/s41467-022-28526-y] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Received: 09/10/2021] [Accepted: 01/28/2022] [Indexed: 11/09/2022]
Abstract
The rational design of molecules with desired properties is a long-standing challenge in chemistry. Generative neural networks have emerged as a powerful approach to sample novel molecules from a learned distribution. Here, we propose a conditional generative neural network for 3d molecular structures with specified chemical and structural properties. This approach is agnostic to chemical bonding and enables targeted sampling of novel molecules from conditional distributions, even in domains where reference calculations are sparse. We demonstrate the utility of our method for inverse design by generating molecules with specified motifs or composition, discovering particularly stable molecules, and jointly targeting multiple electronic properties beyond the training regime.
Affiliation(s)
- Niklas W A Gebauer
  - Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
  - Berlin Institute for the Foundations of Learning and Data, 10587, Berlin, Germany
  - BASLEARN-TU Berlin/BASF Joint Lab for Machine Learning, Technische Universität Berlin, 10587, Berlin, Germany
- Michael Gastegger
  - Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
  - BASLEARN-TU Berlin/BASF Joint Lab for Machine Learning, Technische Universität Berlin, 10587, Berlin, Germany
- Stefaan S P Hessmann
  - Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
  - Berlin Institute for the Foundations of Learning and Data, 10587, Berlin, Germany
- Klaus-Robert Müller
  - Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
  - Berlin Institute for the Foundations of Learning and Data, 10587, Berlin, Germany
  - Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul, 02841, Korea
  - Max-Planck-Institut für Informatik, 66123, Saarbrücken, Germany
- Kristof T Schütt
  - Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
  - Berlin Institute for the Foundations of Learning and Data, 10587, Berlin, Germany
20
Abstract
Machine learning (ML) has revolutionised the field of structure-based drug design (SBDD) in recent years. During the training stage, ML techniques typically analyse large amounts of experimentally determined data to create predictive models that inform the drug discovery process. Deep learning (DL) is a subfield of ML that relies on multiple layers of a neural network to extract significantly more complex patterns from experimental data, and it has recently become a popular choice in SBDD. This review provides a thorough summary of recent DL trends in SBDD, with a particular focus on de novo drug design, binding site prediction, and binding affinity prediction of small molecules.
21
Joshi RP, Kumar N. Artificial Intelligence for Autonomous Molecular Design: A Perspective. Molecules 2021; 26:6761. [PMID: 34833853 PMCID: PMC8619999 DOI: 10.3390/molecules26226761] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/16/2021] [Revised: 10/23/2021] [Accepted: 10/29/2021] [Indexed: 11/23/2022]
Abstract
Domain-aware artificial intelligence has been increasingly adopted in recent years to expedite molecular design in various applications, including drug design and discovery. Recent advances in areas such as physics-informed machine learning and reasoning, software engineering, high-end hardware development, and computing infrastructures are providing opportunities to build scalable and explainable AI molecular discovery systems. Such systems could improve design hypotheses through feedback analysis and data integration, provide a basis for end-to-end automation of compound discovery and optimization, and enable more intelligent searches of chemical space. Several state-of-the-art ML architectures are predominantly and independently used for predicting the properties of small molecules, for their high-throughput synthesis and screening, and for iteratively identifying and optimizing lead therapeutic candidates. However, such deep learning and ML approaches also raise considerable conceptual, technical, scalability, and end-to-end error quantification challenges, as well as skepticism about the current hype around AI-based automated tools. To this end, synergistically and intelligently combining these individual components with robust quantum physics-based molecular representation and data generation tools in a closed loop holds enormous promise for accelerated therapeutic design. This article critically analyzes the opportunities and challenges for their more widespread application, aims to identify the most recent technologies and breakthroughs achieved by each of the components, and discusses how such autonomous AI and ML workflows can be integrated to radically accelerate protein target- or disease model-based probe design that can be iteratively validated experimentally. Taken together, this could significantly reduce the timeline for end-to-end therapeutic discovery and optimization upon the arrival of any novel zoonotic transmission event.
Our article serves as a guide for medicinal, computational chemistry and biology, analytical chemistry, and the ML community to practice autonomous molecular design in precision medicine and drug discovery.
Affiliation(s)
- Neeraj Kumar
  - Computational Biology Group, Biological Science Division, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA 99352, USA