1
Guo J, Schwaller P. Augmented Memory: Sample-Efficient Generative Molecular Design with Reinforcement Learning. JACS Au 2024; 4:2160-2172. [PMID: 38938817 PMCID: PMC11200228 DOI: 10.1021/jacsau.4c00066] [Received: 01/23/2024] [Revised: 03/29/2024] [Accepted: 04/01/2024] [Indexed: 06/29/2024]
Abstract
Sample efficiency is a fundamental challenge in de novo molecular design. Ideally, molecular generative models should learn to satisfy a desired objective under minimal calls to oracles (computational property predictors). This problem becomes more apparent when using oracles that can provide increased predictive accuracy but impose significant computational cost. Consequently, designing molecules that are optimized for such oracles cannot be achieved under a practical computational budget. Molecular generative models based on simplified molecular-input line-entry system (SMILES) have shown remarkable sample efficiency when coupled with reinforcement learning, as demonstrated in the practical molecular optimization (PMO) benchmark. Here, we first show that experience replay drastically improves the performance of multiple previously proposed algorithms. Next, we propose a novel algorithm called Augmented Memory that combines data augmentation with experience replay. We show that scores obtained from oracle calls can be reused to update the model multiple times. We compare Augmented Memory to previously proposed algorithms and show significantly enhanced sample efficiency in an exploitation task, a drug discovery case study requiring both exploration and exploitation, and a materials design case study optimizing explicitly for quantum-mechanical properties. Our method achieves a new state-of-the-art in sample-efficient de novo molecular design, outperforming all of the previously reported methods. The code is available at https://github.com/schwallergroup/augmented_memory.
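The idea of reusing oracle scores across several model updates can be illustrated with a minimal sketch. This is not the authors' implementation: the class `ReplayBuffer`, the function `replay_batches`, and the `augment` callback are hypothetical names chosen for illustration, and `augment` stands in for any routine that rewrites a SMILES string as an alternative string for the same molecule.

```python
from typing import Callable, Dict, Iterator, List, Tuple

ScoredSmiles = Tuple[str, float]


class ReplayBuffer:
    """Keeps the highest-scoring molecules seen so far with their cached scores."""

    def __init__(self, capacity: int = 100) -> None:
        self.capacity = capacity
        self.scores: Dict[str, float] = {}  # SMILES -> cached oracle score

    def add(self, smiles: str, score: float) -> None:
        """Store a scored molecule; evict the worst entry if over capacity."""
        self.scores[smiles] = score
        if len(self.scores) > self.capacity:
            worst = min(self.scores, key=self.scores.__getitem__)
            del self.scores[worst]

    def top(self, k: int) -> List[ScoredSmiles]:
        """Return the k best (SMILES, score) pairs, best first."""
        ranked = sorted(self.scores.items(), key=lambda item: item[1], reverse=True)
        return ranked[:k]


def replay_batches(
    buffer: ReplayBuffer,
    augment: Callable[[str], str],  # hypothetical SMILES rewriter (assumption)
    rounds: int,
    k: int,
) -> Iterator[List[ScoredSmiles]]:
    """Yield one batch per extra update round.

    Each batch pairs an augmented SMILES string with the score already
    cached in the buffer, so no new oracle call is made here.
    """
    for _ in range(rounds):
        yield [(augment(smiles), score) for smiles, score in buffer.top(k)]
```

In a training loop, each yielded batch would feed one additional policy update; the oracle is only called when new molecules are sampled and added to the buffer.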
Affiliation(s)
- Jeff Guo
  - Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
  - National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
- Philippe Schwaller
  - Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
  - National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
2
Thomas M, O'Boyle NM, Bender A, De Graaf C. MolScore: a scoring, evaluation and benchmarking framework for generative models in de novo drug design. J Cheminform 2024; 16:64. [PMID: 38816825 PMCID: PMC11141043 DOI: 10.1186/s13321-024-00861-w] [Received: 11/21/2023] [Accepted: 05/15/2024] [Indexed: 06/01/2024] Open
Abstract
Generative models are undergoing rapid research and application to de novo drug design. To facilitate their application and evaluation, we present MolScore. MolScore already contains many drug-design-relevant scoring functions commonly used in benchmarks, such as molecular similarity, molecular docking, predictive models, synthesizability, and more. In addition, it provides performance metrics to evaluate generative model performance based on the chemistry generated. With this unification of functionality, MolScore re-implements commonly used benchmarks in the field (such as GuacaMol, MOSES, and MolOpt). Moreover, new benchmarks can be created trivially. We demonstrate this by testing a chemical language model with reinforcement learning on three new tasks of increasing complexity related to the design of 5-HT2a ligands that utilise either molecular descriptors, 266 pre-trained QSAR models, or dual molecular docking. Lastly, MolScore can be integrated into an existing Python script with just three lines of code. This framework is a step towards unifying generative model application and evaluation as applied to drug design for both practitioners and researchers. The framework can be found on GitHub and downloaded directly from the Python Package Index. Scientific Contribution: MolScore is an open-source platform to facilitate generative molecular design and evaluation thereof for application in drug design. This platform takes important steps towards unifying existing benchmarks, providing a platform to share new benchmarks, and improves customisation, flexibility and usability for practitioners over existing solutions.
Affiliation(s)
- Morgan Thomas
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK
- Noel M O'Boyle
  - Computational Chemistry, Nxera Pharma, Steinmetz Building, Granta Park, Great Abington, Cambridge, CB21 6DG, UK
- Andreas Bender
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK
- Chris De Graaf
  - Computational Chemistry, Nxera Pharma, Steinmetz Building, Granta Park, Great Abington, Cambridge, CB21 6DG, UK
3
Dodds M, Guo J, Löhr T, Tibo A, Engkvist O, Janet JP. Sample efficient reinforcement learning with active learning for molecular design. Chem Sci 2024; 15:4146-4160. [PMID: 38487235 PMCID: PMC10935729 DOI: 10.1039/d3sc04653b] [Received: 09/03/2023] [Accepted: 02/07/2024] [Indexed: 03/17/2024] Open
Abstract
Reinforcement learning (RL) is a powerful and flexible paradigm for searching for solutions in high-dimensional action spaces. However, bridging the gap between playing computer games with thousands of simulated episodes and solving real scientific problems with complex and involved environments (up to actual laboratory experiments) requires improvements in terms of sample efficiency to make the most of expensive information. The discovery of new drugs is a major commercial application of RL, motivated by the very large nature of the chemical space and the need to perform multiparameter optimization (MPO) across different properties. In silico methods, such as virtual library screening (VS) and de novo molecular generation with RL, show great promise in accelerating this search. However, incorporation of increasingly complex computational models in these workflows requires increasing sample efficiency. Here, we introduce an active learning system linked with an RL model (RL-AL) for molecular design, which aims to improve the sample-efficiency of the optimization process. We identify and characterize unique challenges combining RL and AL, investigate the interplay between the systems, and develop a novel AL approach to solve the MPO problem. Our approach greatly expedites the search for novel solutions relative to baseline-RL for simple ligand- and structure-based oracle functions, with a 5-66-fold increase in hits generated for a fixed oracle budget and a 4-64-fold reduction in computational time to find a specific number of hits. Furthermore, compounds discovered through RL-AL display substantial enrichment of a multi-parameter scoring objective, indicating superior efficacy in curating high-scoring compounds, without a reduction in output diversity. This significant acceleration improves the feasibility of oracle functions that have largely been overlooked in RL due to high computational costs, for example free energy perturbation methods, and in principle is applicable to any RL domain.
Affiliation(s)
- Michael Dodds
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
- Jeff Guo
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
- Thomas Löhr
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
- Alessandro Tibo
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
- Ola Engkvist
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
- Jon Paul Janet
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
4
Karrenbrock M, Rizzi V, Procacci P, Gervasio FL. Addressing Suboptimal Poses in Nonequilibrium Alchemical Calculations. J Phys Chem B 2024; 128:1595-1605. [PMID: 38323915 DOI: 10.1021/acs.jpcb.3c06516] [Indexed: 02/08/2024]
Abstract
Alchemical transformations can be used to quantitatively estimate absolute binding free energies at a reasonable computational cost. However, most of the approaches currently in use require knowledge of the correct (crystallographic) pose. In this paper, we present a combined Hamiltonian replica exchange nonequilibrium alchemical method that allows us to reliably calculate absolute binding free energies, even when starting from suboptimal initial binding poses. Performing a preliminary Hamiltonian replica exchange enhances the sampling of slow degrees of freedom of the ligand and the target, allowing the system to populate the correct binding pose when starting from an approximate docking pose. We apply the method on 6 ligands of the first bromodomain of the BRD4 bromodomain-containing protein. For each ligand, we start nonequilibrium alchemical transformations from both the crystallographic pose and the top-scoring docked pose that are often significantly different. We show that the method produces statistically equivalent binding free energies, making it a useful tool for computational drug discovery pipelines.
Affiliation(s)
- Maurice Karrenbrock
  - School of Pharmaceutical Sciences, University of Geneva, Rue Michel-Servet 1, CH-1206 Geneva, Switzerland
- Valerio Rizzi
  - School of Pharmaceutical Sciences, University of Geneva, Rue Michel-Servet 1, CH-1206 Geneva, Switzerland
- Piero Procacci
  - Chemistry Department, University of Florence, Via della Lastruccia 3-13, 50019 Sesto Fiorentino, Italy
- Francesco Luigi Gervasio
  - School of Pharmaceutical Sciences, University of Geneva, Rue Michel-Servet 1, CH-1206 Geneva, Switzerland
  - Institute of Pharmaceutical Sciences of Western Switzerland, University of Geneva, CH-1206 Geneva, Switzerland
  - Chemistry Department, University College London (UCL), WC1E 6BT London, U.K.
  - Swiss Bioinformatics Institute, University of Geneva, CH-1206 Geneva, Switzerland
5
Loeffler HH, He J, Tibo A, Janet JP, Voronov A, Mervin LH, Engkvist O. Reinvent 4: Modern AI-driven generative molecule design. J Cheminform 2024; 16:20. [PMID: 38383444 PMCID: PMC10882833 DOI: 10.1186/s13321-024-00812-5] [Received: 11/23/2023] [Accepted: 02/09/2024] [Indexed: 02/23/2024] Open
Abstract
REINVENT 4 is a modern open-source generative AI framework for the design of small molecules. The software utilizes recurrent neural networks and transformer architectures to drive molecule generation. These generators are seamlessly embedded within the general machine learning optimization algorithms, transfer learning, reinforcement learning and curriculum learning. REINVENT 4 enables and facilitates de novo design, R-group replacement, library design, linker design, scaffold hopping and molecule optimization. This contribution gives an overview of the software and describes its design. Algorithms and their applications are discussed in detail. REINVENT 4 is a command line tool which reads a user configuration in either TOML or JSON format. The aim of this release is to provide reference implementations for some of the most common algorithms in AI based molecule generation. An additional goal with the release is to create a framework for education and future innovation in AI based molecular design. The software is available from https://github.com/MolecularAI/REINVENT4 and released under the permissive Apache 2.0 license. Scientific contribution. The software provides an open-source reference implementation for generative molecular design where the software is also being used in production to support in-house drug discovery projects. The publication of the most common machine learning algorithms in one code and full documentation thereof will increase transparency of AI and foster innovation, collaboration and education.
Affiliation(s)
- Hannes H Loeffler
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Jiazhen He
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Alessandro Tibo
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Jon Paul Janet
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Alexey Voronov
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Lewis H Mervin
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
- Ola Engkvist
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
6
Zhang H, Huang J, Xie J, Huang W, Yang Y, Xu M, Lei J, Chen H. GRELinker: A Graph-Based Generative Model for Molecular Linker Design with Reinforcement and Curriculum Learning. J Chem Inf Model 2024; 64:666-676. [PMID: 38241022 DOI: 10.1021/acs.jcim.3c01700] [Indexed: 02/13/2024]
Abstract
Fragment-based drug discovery (FBDD) is widely used in drug design. One useful strategy in FBDD is designing linkers for linking fragments to optimize their molecular properties. In the current study, we present a novel generative fragment linking model, GRELinker, which utilizes a gated-graph neural network combined with reinforcement and curriculum learning to generate molecules with desirable attributes. The model has been shown to be efficient in multiple tasks, including controlling log P, optimizing synthesizability or predicted bioactivity of compounds, and generating molecules with high 3D similarity but low 2D similarity to the lead compound. Specifically, our model outperforms the previously reported reinforcement learning (RL) built-in method DRlinker on these benchmark tasks. Moreover, GRELinker has been successfully used in an actual FBDD case to generate optimized molecules with enhanced affinities by employing the docking score as the scoring function in RL. Besides, the implementation of curriculum learning in our framework enables the generation of structurally complex linkers more efficiently. These results demonstrate the benefits and feasibility of GRELinker in linker design for molecular optimization and drug discovery.
Affiliation(s)
- Hao Zhang
  - School of Pharmaceutical Science, Sun Yat-sen University, Guangzhou 510006, China
- Jinchao Huang
  - School of Pharmaceutical Science, Sun Yat-sen University, Guangzhou 510006, China
- Junjie Xie
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
- Weifeng Huang
  - School of Pharmaceutical Science, Sun Yat-sen University, Guangzhou 510006, China
- Yuedong Yang
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
- Mingyuan Xu
  - Guangzhou National Laboratory, Guangzhou International Bio Island, No. 9 Xin Dao Huan Bei Road, Guangzhou 510005, China
- Jinping Lei
  - School of Pharmaceutical Science, Sun Yat-sen University, Guangzhou 510006, China
- Hongming Chen
  - Guangzhou National Laboratory, Guangzhou International Bio Island, No. 9 Xin Dao Huan Bei Road, Guangzhou 510005, China
7
Knight IS, Mailhot O, Tang KG, Irwin JJ. DockOpt: A Tool for Automatic Optimization of Docking Models. J Chem Inf Model 2024; 64:1004-1016. [PMID: 38206771 PMCID: PMC10865354 DOI: 10.1021/acs.jcim.3c01406] [Received: 09/02/2023] [Revised: 12/17/2023] [Accepted: 12/26/2023] [Indexed: 01/13/2024]
Abstract
Molecular docking is a widely used technique for leveraging protein structure for ligand discovery, but it remains difficult to utilize due to limitations that have not been adequately addressed. Despite some progress toward automation, docking still requires expert guidance, hindering its adoption by a broader range of investigators. To make docking more accessible, we developed a new utility called DockOpt, which automates the creation, evaluation, and optimization of docking models prior to their deployment in large-scale prospective screens. DockOpt outperforms our previous automated pipeline across all 43 targets in the DUDE-Z benchmark data set, and the generated models for 84% of targets demonstrate sufficient enrichment to warrant their use in prospective screens, with normalized LogAUC values of at least 15%. DockOpt is available as part of the Python package Pydock3 included in the UCSF DOCK 3.8 distribution, which is available for free to academic researchers at https://dock.compbio.ucsf.edu and free for everyone upon registration at https://tldr.docking.org.
Affiliation(s)
- Ian S. Knight
  - Department of Pharmaceutical Chemistry, UCSF, 1700 Fourth Street, San Francisco, California 94158-2330, United States
- Olivier Mailhot
  - Department of Pharmaceutical Chemistry, UCSF, 1700 Fourth Street, San Francisco, California 94158-2330, United States
- Khanh G. Tang
  - Department of Pharmaceutical Chemistry, UCSF, 1700 Fourth Street, San Francisco, California 94158-2330, United States
- John J. Irwin
  - Department of Pharmaceutical Chemistry, UCSF, 1700 Fourth Street, San Francisco, California 94158-2330, United States
8
Flachsenberg F, Ehrt C, Gutermuth T, Rarey M. Redocking the PDB. J Chem Inf Model 2024; 64:219-237. [PMID: 38108627 DOI: 10.1021/acs.jcim.3c01573] [Indexed: 12/19/2023]
Abstract
Molecular docking is a standard technique in structure-based drug design (SBDD). It aims to predict the 3D structure of a small molecule in the binding site of a receptor (often a protein). Despite being a common technique, it often necessitates multiple tools and involves manual steps. Here, we present the JAMDA preprocessing and docking workflow that is easy to use and allows fully automated docking. We evaluate the JAMDA docking workflow on binding sites extracted from the complete PDB and derive key factors determining JAMDA's docking performance. With that, we try to remove most of the bias due to manual intervention and provide a realistic estimate of the redocking performance of our JAMDA preprocessing and docking workflow for any PDB structure. On this large PDBScan22 data set, our JAMDA workflow finds a pose with an RMSD of at most 2 Å to the crystal ligand on the top rank for 30.1% of the structures. When applying objective structure quality filters to the PDBScan22 data set, the success rate increases to 61.8%. Given the prepared structures from the JAMDA preprocessing pipeline, both JAMDA and the widely used AutoDock Vina perform comparably on this filtered data set (the PDBScan22-HQ data set).
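The redocking success criterion quoted above (top-ranked pose within 2 Å RMSD of the crystal ligand) can be written out as a short sketch. It assumes that atoms in the two poses are already matched in the same order and that both poses share the receptor's coordinate frame; it performs no superposition and no symmetry correction, which a production tool may well do. The function names `rmsd` and `is_redocking_success` are illustrative only.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(pose: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root-mean-square deviation in Å between two atom-matched poses."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and contain the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(pose, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared / len(pose))


def is_redocking_success(
    pose: Sequence[Coord], reference: Sequence[Coord], threshold: float = 2.0
) -> bool:
    """True if the pose lies within `threshold` Å RMSD of the reference pose."""
    return rmsd(pose, reference) <= threshold
```

The success rate reported for a data set is then the fraction of structures whose top-ranked pose passes this check.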
Affiliation(s)
- Florian Flachsenberg
  - Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany
- Christiane Ehrt
  - Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany
- Torben Gutermuth
  - Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany
- Matthias Rarey
  - Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany
9
Handa K, Thomas MC, Kageyama M, Iijima T, Bender A. On the difficulty of validating molecular generative models realistically: a case study on public and proprietary data. J Cheminform 2023; 15:112. [PMID: 37990215 PMCID: PMC10664602 DOI: 10.1186/s13321-023-00781-1] [Received: 07/06/2023] [Accepted: 11/10/2023] [Indexed: 11/23/2023] Open
Abstract
While a multitude of deep generative models have recently emerged there exists no best practice for their practically relevant validation. On the one hand, novel de novo-generated molecules cannot be refuted by retrospective validation (so that this type of validation is biased); but on the other hand prospective validation is expensive and then often biased by the human selection process. In this case study, we frame retrospective validation as the ability to mimic human drug design, by answering the following question: Can a generative model trained on early-stage project compounds generate middle/late-stage compounds de novo? To this end, we used experimental data that contains the elapsed time of a synthetic expansion following hit identification from five public (where the time series was pre-processed to better reflect realistic synthetic expansions) and six in-house project datasets, and used REINVENT as a widely adopted RNN-based generative model. After splitting the dataset and training REINVENT on early-stage compounds, we found that rediscovery of middle/late-stage compounds was much higher in public projects (at 1.60%, 0.64%, and 0.21% of the top 100, 500, and 5000 scored generated compounds) than in in-house projects (where the values were 0.00%, 0.03%, and 0.04%, respectively). Similarly, average single nearest neighbour similarity between early- and middle/late-stage compounds in public projects was higher between active compounds than inactive compounds; however, for in-house projects the converse was true, which makes rediscovery (if so desired) more difficult. We hence show that the generative model recovers very few middle/late-stage compounds from real-world drug discovery projects, highlighting the fundamental difference between purely algorithmic design and drug discovery as a real-world process. 
Evaluating de novo compound design approaches appears, based on the current study, difficult or even impossible to do retrospectively. Scientific Contribution: This contribution hence illustrates aspects of evaluating the performance of generative models in a real-world setting which have not been extensively described previously and which hopefully contribute to their further future development.
Affiliation(s)
- Koichi Handa
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
  - Toxicology & DMPK Research Department, Teijin Institute for Bio-Medical Research, Teijin Pharma Limited, 4-3-2 Asahigaoka, Hino-Shi, Tokyo, 191-8512, Japan
- Morgan C Thomas
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
- Michiharu Kageyama
  - Toxicology & DMPK Research Department, Teijin Institute for Bio-Medical Research, Teijin Pharma Limited, 4-3-2 Asahigaoka, Hino-Shi, Tokyo, 191-8512, Japan
- Takeshi Iijima
  - Toxicology & DMPK Research Department, Teijin Institute for Bio-Medical Research, Teijin Pharma Limited, 4-3-2 Asahigaoka, Hino-Shi, Tokyo, 191-8512, Japan
- Andreas Bender
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
10
Minibaeva G, Ivanova A, Polishchuk P. EasyDock: customizable and scalable docking tool. J Cheminform 2023; 15:102. [PMID: 37915072 PMCID: PMC10619229 DOI: 10.1186/s13321-023-00772-2] [Received: 05/23/2023] [Accepted: 10/21/2023] [Indexed: 11/03/2023] Open
Abstract
Docking of large compound collections becomes an important procedure to discover new chemical entities. Screening of large sets of compounds may also occur in de novo design projects guided by molecular docking. To facilitate these processes, there is a need for automated tools capable of efficiently docking a large number of molecules using multiple computational nodes within a reasonable timeframe. These tools should also allow for easy integration of new docking programs and provide a user-friendly program interface to support the development of further approaches utilizing docking as a foundation. Currently available tools have certain limitations, such as lacking a convenient program interface or lacking support for distributed computations. In response to these limitations, we have developed a module called EasyDock. It can be deployed over a network of computational nodes using the Dask library, without requiring a specific cluster scheduler. Furthermore, we have proposed and implemented a simple model that predicts the runtime of docking experiments and applied it to minimize overall docking time. The current version of EasyDock supports popular docking programs, namely Autodock Vina, gnina, and smina. Additionally, we implemented a supplementary feature to enable docking of boron-containing compounds, which are not inherently supported by Vina and smina, and demonstrated its applicability on a set of 55 PDB protein-ligand complexes.
Affiliation(s)
- Guzel Minibaeva
  - Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University and University Hospital in Olomouc, Hnevotinska 5, 77900, Olomouc, Czech Republic
- Aleksandra Ivanova
  - Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University and University Hospital in Olomouc, Hnevotinska 5, 77900, Olomouc, Czech Republic
- Pavel Polishchuk
  - Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University and University Hospital in Olomouc, Hnevotinska 5, 77900, Olomouc, Czech Republic
11
Durai P, Lee SJ, Lee JW, Pan CH, Park K. Iterative machine learning-based chemical similarity search to identify novel chemical inhibitors. J Cheminform 2023; 15:86. [PMID: 37742003 PMCID: PMC10517535 DOI: 10.1186/s13321-023-00760-6] [Received: 05/23/2023] [Accepted: 09/12/2023] [Indexed: 09/25/2023] Open
Abstract
Machine learning-based chemical screening has made substantial progress in recent years. However, these predictions often have low accuracy and high uncertainty when identifying new active chemical scaffolds. Hence, a high proportion of retrieved compounds are not structurally novel. In this study, we proposed a strategy to address this issue by iteratively optimizing an evolutionary chemical binding similarity (ECBS) model using experimental validation data. Various data update and model retraining schemes were tested to efficiently incorporate new experimental data into ECBS models, resulting in a fine-tuned ECBS model with improved accuracy and coverage. To demonstrate the effectiveness of our approach, we identified the novel hit molecules for the mitogen-activated protein kinase kinase 1 (MEK1). These molecules showed sub-micromolar affinity (Kd 0.1-5.3 μM) to MEKs and were distinct from previously-known MEK1 inhibitors. We also determined the binding specificity of different MEK isoforms and proposed potential docking models. Furthermore, using de novo drug design tools, we utilized one of the new MEK inhibitors to generate additional drug-like molecules with improved binding scores. This resulted in the identification of several potential MEK1 inhibitors with better binding affinity scores. Our results demonstrated the potential of this approach for identifying novel hit molecules and optimizing their binding affinities.
Affiliation(s)
- Prasannavenkatesh Durai
  - Natural Product Informatics Research Center, Korea Institute of Science and Technology, Gangneung, 25451, Republic of Korea
- Sue Jung Lee
  - Natural Product Research Center, Korea Institute of Science and Technology, Gangneung, 25451, Republic of Korea
- Jae Wook Lee
  - Natural Product Research Center, Korea Institute of Science and Technology, Gangneung, 25451, Republic of Korea
- Cheol-Ho Pan
  - Natural Product Informatics Research Center, Korea Institute of Science and Technology, Gangneung, 25451, Republic of Korea
- Keunwan Park
  - Natural Product Informatics Research Center, Korea Institute of Science and Technology, Gangneung, 25451, Republic of Korea
  - Department of YM-KIST Bio-Health Convergence, Yonsei University, Wonju, 26493, Republic of Korea
12
de Chaves MA, da Costa BS, de Souza JA, Batista MA, de Andrade SF, Hage-Melim LIDS, Abegg M, Lopes MS, Fuentefria AM. In silico and in vitro analysis of the mechanisms of action of nitroxoline against some medically important opportunistic fungi. J Mycol Med 2023; 33:101411. [PMID: 37413753 DOI: 10.1016/j.mycmed.2023.101411] [Received: 02/01/2023] [Revised: 06/04/2023] [Accepted: 06/26/2023] [Indexed: 07/08/2023]
Abstract
The increasing resistance to antifungal agents associated with toxicity and interactions turns therapeutic management of fungal infections difficult. This scenario emphasizes the importance of drug repositioning, such as nitroxoline - a urinary antibacterial agent that has shown potential antifungal activity. The aims of this study were to discover the possible therapeutic targets of nitroxoline using an in silico approach, and to determine the in vitro antifungal activity of the drug against the fungal cell wall and cytoplasmic membrane. We explored the biological activity of nitroxoline using PASS, SwissTargetPrediction and Cortellis Drug Discovery Intelligence web tools. After confirmation, the molecule was designed and optimized in HyperChem software. GOLD 2020.1 software was used to predict the interactions between the drug and the target proteins. In vitro investigation evaluated the effect of nitroxoline on the fungal cell wall through sorbitol protection assay. Ergosterol binding assay was carried out to assess the effect of the drug on the cytoplasmic membrane. In silico investigation revealed biological activity with alkane 1-monooxygenase and methionine aminopeptidase enzymes, showing nine and five interactions in the molecular docking, respectively. In vitro results exhibited no effect on the fungal cell wall or cytoplasmic membrane. Finally, nitroxoline has potential as an antifungal agent due to the interaction with alkane 1-monooxygenase and methionine aminopeptidase enzymes, which are not the main human therapeutic targets. These results have potentially revealed a new biological target for the treatment of fungal infections. We also consider that further studies are required to confirm the biological activity of nitroxoline on fungal cells, mainly the confirmation of the alkB gene.
Affiliation(s)
- Magda Antunes de Chaves
  - Graduate Program in Agricultural and Environmental Microbiology, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
- Bárbara Souza da Costa
  - Graduate Program in Pharmaceutical Sciences, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
- Jade André de Souza
  - Graduate Program in Agricultural and Environmental Microbiology, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
- Mateus Alves Batista
  - Laboratory of Pharmaceutical and Medicinal Chemistry (PharMedChem), Federal University of Amapá, Rod JK Km 2, Macapá, Amapá, Brazil
- Saulo Fernandes de Andrade
  - Graduate Program in Pharmaceutical Sciences, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
- Maxwell Abegg
  - Institute of Exact Sciences and Technology, Federal University of Amazonas, Itacoatiara, Amazonas, Brazil
- Marcela Silva Lopes
  - Graduate Program in Pharmaceutical Sciences, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
- Alexandre Meneghello Fuentefria
  - Graduate Program in Agricultural and Environmental Microbiology, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
  - Graduate Program in Pharmaceutical Sciences, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
|
13
|
Bjerrum EJ, Margreitter C, Blaschke T, Kolarova S, de Castro RLR. Faster and more diverse de novo molecular optimization with double-loop reinforcement learning using augmented SMILES. J Comput Aided Mol Des 2023:10.1007/s10822-023-00512-6. [PMID: 37329395 DOI: 10.1007/s10822-023-00512-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Accepted: 05/29/2023] [Indexed: 06/19/2023]
Abstract
Using generative deep learning models and reinforcement learning together can effectively generate new molecules with desired properties. By employing a multi-objective scoring function, thousands of high-scoring molecules can be generated, making this approach useful for drug discovery and material science. However, the application of these methods can be hindered by computationally expensive or time-consuming scoring procedures, particularly when a large number of function calls are required as feedback in the reinforcement learning optimization. Here, we propose the use of double-loop reinforcement learning with simplified molecular line entry system (SMILES) augmentation to improve the efficiency and speed of the optimization. By adding an inner loop that augments the generated SMILES strings to non-canonical SMILES for use in additional reinforcement learning rounds, we can both reuse the scoring calculations on the molecular level, thereby speeding up the learning process, as well as offer additional protection against mode collapse. We find that employing between 5 and 10 augmentation repetitions is optimal for the scoring functions tested and is further associated with an increased diversity in the generated compounds, improved reproducibility of the sampling runs and the generation of molecules of higher similarity to known ligands.
Affiliation(s)
- Raquel López-Ríos de Castro
  - Odyssey Therapeutics, Cambridge, MA, USA
  - Department of Physics and Department of Chemistry, King's College, London, UK
|
14
|
Yang Y, Hsieh CY, Kang Y, Hou T, Liu H, Yao X. Deep Generation Model Guided by the Docking Score for Active Molecular Design. J Chem Inf Model 2023; 63:2983-2991. [PMID: 37163364 DOI: 10.1021/acs.jcim.3c00572] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/12/2023]
Abstract
A deep generation model, as a novel drug design and discovery tool, shows obvious advantages in generating compounds with novel backbones and has been applied successfully in the field of drug discovery. However, it is still a challenge to generate molecules with expected properties, especially high activity. Here, to obtain compounds both with novelty and high activity to a target, we proposed a conditional molecular generation model COMG by considering the docking score and 3D pharmacophore matching during molecular generation. The proposed model was based on the conditional variational autoencoder architecture constrained by the pharmacophore matching score. During Bayesian optimization, the docking score was applied to enhance the target relevance of generated compounds. Furthermore, to overcome the problem of high structural similarity caused by Bayesian optimization, the idea of the scaffold memory unit was also introduced. The evaluation results of COMG show that our model not only can improve the structural diversity of generated molecules but also can effectively improve the proportion of target-related drug-active molecules. The obtained results indicate that our proposed model COMG is a useful drug design tool.
Affiliation(s)
- Yuwei Yang
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao (SAR) 999078, P. R. China
  - School of Pharmacy, Lanzhou University, Lanzhou 730000, Gansu, P. R. China
- Chang-Yu Hsieh
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, P. R. China
- Yu Kang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, P. R. China
- Tingjun Hou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, P. R. China
- Huanxiang Liu
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao (SAR) 999078, P. R. China
- Xiaojun Yao
  - State Key Laboratory of Quality Research in Chinese Medicine, Macau University of Science and Technology, Taipa, 999078 Macau (SAR), P. R. China
|
15
|
Thomas M, Bender A, de Graaf C. Integrating structure-based approaches in generative molecular design. Curr Opin Struct Biol 2023; 79:102559. [PMID: 36870277 DOI: 10.1016/j.sbi.2023.102559] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 11/11/2022] [Revised: 01/23/2023] [Accepted: 01/31/2023] [Indexed: 03/06/2023]
Abstract
Generative molecular design for drug discovery and development has seen a recent resurgence promising to improve the efficiency of the design-make-test-analyse cycle; by computationally exploring much larger chemical spaces than traditional virtual screening techniques. However, most generative models thus far have only utilized small-molecule information to train and condition de novo molecule generators. Here, we instead focus on recent approaches that incorporate protein structure into de novo molecule optimization in an attempt to maximize the predicted on-target binding affinity of generated molecules. We summarize these structure integration principles into either distribution learning or goal-directed optimization and for each case whether the approach is protein structure-explicit or implicit with respect to the generative model. We discuss recent approaches in the context of this categorization and provide our perspective on the future direction of the field.
Affiliation(s)
- Morgan Thomas
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK
- Andreas Bender
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK. https://twitter.com/@AndreasBenderUK
- Chris de Graaf
  - Sosei Heptares, Steinmetz Building, Granta Park, Great Abington, Cambridge, CB21 6DG, UK. https://twitter.com/@Chris_de_Graaf
|
16
|
Danel T, Łęski J, Podlewska S, Podolak IT. Docking-based generative approaches in the search for new drug candidates. Drug Discov Today 2023; 28:103439. [PMID: 36372330 DOI: 10.1016/j.drudis.2022.103439] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 06/30/2022] [Revised: 10/08/2022] [Accepted: 11/08/2022] [Indexed: 11/13/2022]
Abstract
Despite the popularity of virtual screening (VS) of existing compound libraries, the search for new potential drug candidates also takes advantage of generative protocols, where new compound suggestions are enumerated using various algorithms. To increase the activity potency of generative approaches, they have recently been coupled with molecular docking, a leading methodology of structure-based drug design (SBDD). In this review, we summarize progress since docking-based generative models emerged. We propose a new taxonomy for these methods and discuss their importance for the field of computer-aided drug design (CADD). In addition, we discuss the most promising directions for the further development of generative protocols coupled with docking.
Affiliation(s)
- Tomasz Danel
  - Faculty of Mathematics and Computer Science, Jagiellonian University, 6 Łojasiewicza Street, 30-348 Kraków, Poland
- Jan Łęski
  - Faculty of Mathematics and Computer Science, Jagiellonian University, 6 Łojasiewicza Street, 30-348 Kraków, Poland
- Sabina Podlewska
  - Maj Institute of Pharmacology, Polish Academy of Sciences, Department of Medicinal Chemistry, 31-343 Kraków, Smętna Street 12, Poland
- Igor T Podolak
  - Faculty of Mathematics and Computer Science, Jagiellonian University, 6 Łojasiewicza Street, 30-348 Kraków, Poland
|
17
|
Sundin I, Voronov A, Xiao H, Papadopoulos K, Bjerrum EJ, Heinonen M, Patronov A, Kaski S, Engkvist O. Human-in-the-loop assisted de novo molecular design. J Cheminform 2022; 14:86. [PMID: 36578043 PMCID: PMC9795720 DOI: 10.1186/s13321-022-00667-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Accepted: 12/03/2022] [Indexed: 12/29/2022]
Abstract
A de novo molecular design workflow can be used together with technologies such as reinforcement learning to navigate the chemical space. A bottleneck in the workflow that remains to be solved is how to integrate human feedback in the exploration of the chemical space to optimize molecules. A human drug designer still needs to design the goal, expressed as a scoring function for the molecules that captures the designer's implicit knowledge about the optimization task. Little support for this task exists and, consequently, a chemist usually resorts to iteratively building the objective function of multi-parameter optimization (MPO) in de novo design. We propose a principled approach to use human-in-the-loop machine learning to help the chemist to adapt the MPO scoring function to better match their goal. An advantage is that the method can learn the scoring function directly from the user's feedback while they browse the output of the molecule generator, instead of the current manual tuning of the scoring function with trial and error. The proposed method uses a probabilistic model that captures the user's idea and uncertainty about the scoring function, and it uses active learning to interact with the user. We present two case studies for this: In the first use-case, the parameters of an MPO are learned, and in the second use-case a non-parametric component of the scoring function to capture human domain knowledge is developed. The results show the effectiveness of the methods in two simulated example cases with an oracle, achieving significant improvement in less than 200 feedback queries, for the goals of a high QED score and identifying potent molecules for the DRD2 receptor, respectively. We further demonstrate the performance gains with a medicinal chemist interacting with the system.
Affiliation(s)
- Iiris Sundin
  - Department of Computer Science, Aalto University, Espoo, Finland
- Alexey Voronov
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Haoping Xiao
  - Department of Computer Science, Aalto University, Espoo, Finland
- Kostas Papadopoulos
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
  - Present Address: Odyssey Therapeutics, Cambridge, MA, USA
- Esben Jannik Bjerrum
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
  - Present Address: Odyssey Therapeutics, Cambridge, MA, USA
- Markus Heinonen
  - Department of Computer Science, Aalto University, Espoo, Finland
- Atanas Patronov
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
  - Present Address: Odyssey Therapeutics, Cambridge, MA, USA
- Samuel Kaski
  - Department of Computer Science, Aalto University, Espoo, Finland
  - Department of Computer Science, University of Manchester, Manchester, UK
- Ola Engkvist
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
  - Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden
|
18
|
Sauer S, Matter H, Hessler G, Grebner C. Optimizing interactions to protein binding sites by integrating docking-scoring strategies into generative AI methods. Front Chem 2022; 10:1012507. [PMID: 36339033 PMCID: PMC9629386 DOI: 10.3389/fchem.2022.1012507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2022] [Accepted: 09/20/2022] [Indexed: 11/14/2022]
Abstract
The identification and optimization of promising lead molecules is essential for drug discovery. Recently, artificial intelligence (AI) based generative methods provided complementary approaches for generating molecules under specific design constraints of relevance in drug design. The goal of our study is to incorporate protein 3D information directly into generative design by flexible docking plus an adapted protein-ligand scoring function, thereby moving towards automated structure-based design. First, the protein-ligand scoring function RFXscore integrating individual scoring terms, ligand descriptors, and combined terms was derived using the PDBbind database and internal data. Next, design results for different workflows are compared to solely ligand-based reward schemes. Our newly proposed, optimal workflow for structure-based generative design is shown to produce promising results, especially for those exploration scenarios, where diverse structures fitting to a protein binding site are requested. Best results are obtained using docking followed by RFXscore, while, depending on the exact application scenario, it was also found useful to combine this approach with other metrics that bias structure generation into “drug-like” chemical space, such as target-activity machine learning models, respectively.
|
19
|
Thomas M, O'Boyle NM, Bender A, de Graaf C. Augmented Hill-Climb increases reinforcement learning efficiency for language-based de novo molecule generation. J Cheminform 2022; 14:68. [PMID: 36192789 PMCID: PMC9531503 DOI: 10.1186/s13321-022-00646-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2022] [Accepted: 09/23/2022] [Indexed: 11/10/2022]
Abstract
A plethora of AI-based techniques now exists to conduct de novo molecule generation that can devise molecules conditioned towards a particular endpoint in the context of drug design. One popular approach is using reinforcement learning to update a recurrent neural network or language-based de novo molecule generator. However, reinforcement learning can be inefficient, sometimes requiring up to 10^5 molecules to be sampled to optimize more complex objectives, which poses a limitation when using computationally expensive scoring functions like docking or computer-aided synthesis planning models. In this work, we propose a reinforcement learning strategy called Augmented Hill-Climb based on a simple, hypothesis-driven hybrid between REINVENT and Hill-Climb that improves sample-efficiency by addressing the limitations of both currently used strategies. We compare its ability to optimize several docking tasks with REINVENT and benchmark this strategy against other commonly used reinforcement learning strategies including REINFORCE, REINVENT (version 1 and 2), Hill-Climb and best agent reminder. We find that optimization ability is improved ~1.5-fold and sample-efficiency is improved ~45-fold compared to REINVENT while still delivering appealing chemistry as output. Diversity filters were used, and their parameters were tuned to overcome observed failure modes that take advantage of certain diversity filter configurations. We find that Augmented Hill-Climb outperforms the other reinforcement learning strategies used on six tasks, especially in the early stages of training or for more difficult objectives. Lastly, we show improved performance not only on recurrent neural networks but also on a reinforcement learning stabilized transformer architecture. Overall, we show that Augmented Hill-Climb improves sample-efficiency for language-based de novo molecule generation conditioning via reinforcement learning, compared to the current state-of-the-art. This makes more computationally expensive scoring functions, such as docking, more accessible on a relevant timescale.
Affiliation(s)
- Morgan Thomas
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK
- Noel M O'Boyle
  - Computational Chemistry, Sosei Heptares, Steinmetz Building, Granta Park, Great Abington, Cambridge, CB21 6DG, UK
- Andreas Bender
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK
- Chris de Graaf
  - Computational Chemistry, Sosei Heptares, Steinmetz Building, Granta Park, Great Abington, Cambridge, CB21 6DG, UK
|
20
|
García-Ortegón M, Simm GNC, Tripp AJ, Hernández-Lobato JM, Bender A, Bacallado S. DOCKSTRING: Easy Molecular Docking Yields Better Benchmarks for Ligand Design. J Chem Inf Model 2022; 62:3486-3502. [PMID: 35849793 PMCID: PMC9364321 DOI: 10.1021/acs.jcim.1c01334] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 11/03/2021] [Indexed: 01/05/2023]
Abstract
The field of machine learning for drug discovery is witnessing an explosion of novel methods. These methods are often benchmarked on simple physicochemical properties such as solubility or general druglikeness, which can be readily computed. However, these properties are poor representatives of objective functions in drug design, mainly because they do not depend on the candidate compound's interaction with the target. By contrast, molecular docking is a widely applied method in drug discovery to estimate binding affinities. However, docking studies require a significant amount of domain knowledge to set up correctly, which hampers adoption. Here, we present dockstring, a bundle for meaningful and robust comparison of ML models using docking scores. dockstring consists of three components: (1) an open-source Python package for straightforward computation of docking scores, (2) an extensive dataset of docking scores and poses of more than 260,000 molecules for 58 medically relevant targets, and (3) a set of pharmaceutically relevant benchmark tasks such as virtual screening or de novo design of selective kinase inhibitors. The Python package implements a robust ligand and target preparation protocol that allows nonexperts to obtain meaningful docking scores. Our dataset is the first to include docking poses, as well as the first of its size that is a full matrix, thus facilitating experiments in multiobjective optimization and transfer learning. Overall, our results indicate that docking scores are a more realistic evaluation objective than simple physicochemical properties, yielding benchmark tasks that are more challenging and more closely related to real problems in drug discovery.
Affiliation(s)
- Miguel García-Ortegón
  - Statistical Laboratory, Centre for Mathematical Sciences, University of Cambridge, Wilberforce Rd., Cambridge CB3 0WB, United Kingdom
- Gregor N. C. Simm
  - Department of Engineering, University of Cambridge, Trumpington St., Cambridge CB2 1PZ, United Kingdom
- Austin J. Tripp
  - Department of Engineering, University of Cambridge, Trumpington St., Cambridge CB2 1PZ, United Kingdom
- Andreas Bender
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Rd., Cambridge CB2 1EW, United Kingdom
- Sergio Bacallado
  - Statistical Laboratory, Centre for Mathematical Sciences, University of Cambridge, Wilberforce Rd., Cambridge CB3 0WB, United Kingdom
|
21
|
|