1
van Tilborg D, Grisoni F. Traversing chemical space with active deep learning for low-data drug discovery. Nature Computational Science 2024:10.1038/s43588-024-00697-2. [PMID: 39333789 DOI: 10.1038/s43588-024-00697-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Accepted: 08/22/2024] [Indexed: 09/30/2024]
Abstract
Deep learning is accelerating drug discovery. However, current approaches are often affected by limitations in the available data, in terms of either size or molecular diversity. Active deep learning has high potential for low-data drug discovery, as it allows iterative model improvement during the screening process. However, there are several 'known unknowns' that limit the wider adoption of active deep learning in drug discovery: (1) what the best computational strategies are for chemical space exploration, (2) how active learning holds up to traditional, non-iterative, approaches and (3) how it should be used in the low-data scenarios typical of drug discovery. To provide answers, this study simulates a low-data drug discovery scenario, and systematically analyzes six active learning strategies combined with two deep learning architectures, on three large-scale molecular libraries. We identify the most important determinants of success in low-data regimes and show that active learning can achieve up to a sixfold improvement in hit discovery when compared with traditional screening methods.
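The iterative screening idea described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the one-dimensional "library", the `oracle` function standing in for an assay, the nearest-neighbour surrogate (in place of a deep learning model), and the greedy acquisition rule. It is not one of the six strategies or two architectures evaluated in the study.

```python
import random


def nearest_neighbour_predict(x, labelled):
    """Toy surrogate: predict the score of the closest already-labelled point."""
    nearest = min(labelled, key=lambda known: abs(known - x))
    return labelled[nearest]


def active_learning_screen(library, oracle, n_initial=5, batch_size=5,
                           n_cycles=4, seed=0):
    """Label a random starting set, then repeatedly acquire the batch that the
    surrogate currently ranks highest (greedy, exploitative acquisition)."""
    rng = random.Random(seed)
    pool = list(library)
    rng.shuffle(pool)
    labelled = {x: oracle(x) for x in pool[:n_initial]}
    pool = pool[n_initial:]
    for _ in range(n_cycles):
        # Re-rank the unlabelled pool with the surrogate built from all labels so far.
        pool.sort(key=lambda x: nearest_neighbour_predict(x, labelled), reverse=True)
        batch, pool = pool[:batch_size], pool[batch_size:]
        labelled.update({x: oracle(x) for x in batch})
    return labelled


def oracle(x):
    """Hypothetical 'assay': activity peaks at x = 12.0."""
    return -(x - 12.0) ** 2


library = [i / 10 for i in range(200)]  # hypothetical 200-compound library
results = active_learning_screen(library, oracle)
print(len(results), "compounds labelled; best found:", max(results, key=results.get))
```

A non-iterative baseline would label the same total number of compounds in a single random draw; comparing the best scores found by the two procedures is the kind of contrast the abstract refers to.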
Affiliation(s)
- Derek van Tilborg
  - Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
  - Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Utrecht, The Netherlands
- Francesca Grisoni
  - Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands.
  - Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Utrecht, The Netherlands.
2
Clark F, Robb GR, Cole DJ, Michel J. Automated Adaptive Absolute Binding Free Energy Calculations. J Chem Theory Comput 2024. [PMID: 39254715 PMCID: PMC11428140 DOI: 10.1021/acs.jctc.4c00806] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/11/2024]
Abstract
Alchemical absolute binding free energy (ABFE) calculations have substantial potential in drug discovery, but are often prohibitively computationally expensive. To unlock their potential, efficient automated ABFE workflows are required to reduce both computational cost and human intervention. We present a fully automated ABFE workflow based on the automated selection of λ windows, the ensemble-based detection of equilibration, and the adaptive allocation of sampling time based on inter-replicate statistics. We find that the automated selection of intermediate states with consistent overlap is rapid, robust, and simple to implement. Robust detection of equilibration is achieved with a paired t-test between the free energy estimates at initial and final portions of an ensemble of runs. We determine reasonable default parameters for all algorithms and show that the full workflow produces equivalent results to a nonadaptive scheme over a variety of test systems, while often accelerating equilibration. Our complete workflow is implemented in the open-source package A3FE (https://github.com/michellab/a3fe).
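The paired t-test mentioned in this abstract can be sketched as follows. This is an illustration only and not the A3FE implementation: the function names, the toy free energy values, and the choice of a two-sided 5% significance level are assumptions. Each replica contributes one estimate from an early portion and one from a late portion of its run, and the test asks whether the mean difference is distinguishable from zero.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(initial, final):
    """Return (t, degrees of freedom) for paired per-replica estimates."""
    if len(initial) != len(final) or len(initial) < 2:
        raise ValueError("need two equal-length samples with at least 2 replicas")
    diffs = [f - i for i, f in zip(initial, final)]
    sd = stdev(diffs)
    dof = len(diffs) - 1
    if sd == 0.0:
        # All differences identical: no spread to normalise by.
        return (0.0 if mean(diffs) == 0.0 else math.inf), dof
    return mean(diffs) / (sd / math.sqrt(len(diffs))), dof


def looks_equilibrated(initial, final, t_critical):
    """True if early and late estimates are not significantly different."""
    t, _ = paired_t_statistic(initial, final)
    return abs(t) < t_critical


# Two-sided 5% critical value of Student's t for 4 degrees of freedom.
T_CRIT_DF4 = 2.776

# Hypothetical free energy estimates (kcal/mol) from 5 replicas.
early = [-10.2, -10.5, -9.9, -10.1, -10.4]
late = [-10.3, -10.4, -10.0, -10.0, -10.5]
still_drifting_early = [-8.0, -8.4, -7.9, -8.2, -8.1]

stable = looks_equilibrated(early, late, T_CRIT_DF4)
drifting = looks_equilibrated(still_drifting_early, late, T_CRIT_DF4)
print(stable, drifting)
```

Pairing by replica removes replica-to-replica offsets from the comparison, so only a systematic drift between the early and late portions drives the statistic.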
Affiliation(s)
- Finlay Clark
  - EaStCHEM School of Chemistry, University of Edinburgh, David Brewster Road, Edinburgh EH9 3FJ, United Kingdom
- Graeme R Robb
  - Oncology R&D, AstraZeneca, Cambridge CB4 0WG, United Kingdom
- Daniel J Cole
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, United Kingdom
- Julien Michel
  - EaStCHEM School of Chemistry, University of Edinburgh, David Brewster Road, Edinburgh EH9 3FJ, United Kingdom
3
Loeffler HH, Wan S, Klähn M, Bhati AP, Coveney PV. Optimal Molecular Design: Generative Active Learning Combining REINVENT with Precise Binding Free Energy Ranking Simulations. J Chem Theory Comput 2024; 20. [PMID: 39225482 PMCID: PMC11428133 DOI: 10.1021/acs.jctc.4c00576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2024] [Revised: 08/08/2024] [Accepted: 08/08/2024] [Indexed: 09/04/2024]
Abstract
Active learning (AL) is a specific instance of sequential experimental design and uses machine learning to intelligently choose the next data point or batch of molecular structures to be evaluated. In this sense, it closely mimics the iterative design-make-test-analysis cycle of laboratory experiments to find optimized compounds for a given design task. Here, we describe an AL protocol which combines generative molecular AI, using REINVENT, and physics-based absolute binding free energy molecular dynamics simulation, using ESMACS, to discover new ligands for two different target proteins, 3CLpro and TNKS2. We have deployed our generative active learning (GAL) protocol on Frontier, the world's only exa-scale machine. We show that the protocol can find higher-scoring molecules compared to the baseline, a surrogate ML docking model for 3CLpro and compounds with experimentally determined binding affinities for TNKS2. The ligands found are also chemically diverse and occupy a different chemical space than the baseline. We vary the batch sizes that are put forward for free energy assessment in each GAL cycle to assess the impact on their efficiency on the GAL protocol and recommend their optimal values in different scenarios. Overall, we demonstrate a powerful capability of the combination of physics-based and AI methods which yields effective chemical space sampling at an unprecedented scale and is of immediate and direct relevance to modern, data-driven drug discovery.
Affiliation(s)
- Hannes H. Loeffler
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Mölndal 431 83, Sweden
- Shunzhou Wan
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.
- Marco Klähn
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Mölndal 431 83, Sweden
- Agastya P. Bhati
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.
- Peter V. Coveney
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.
  - Advanced Research Computing Centre, University College London, London WC1H 0AJ, U.K.
  - Institute for Informatics, Faculty of Science, University of Amsterdam, Amsterdam 1098XH, The Netherlands
4
Tibo A, He J, Janet JP, Nittinger E, Engkvist O. Exhaustive local chemical space exploration using a transformer model. Nat Commun 2024; 15:7315. [PMID: 39183239 PMCID: PMC11345417 DOI: 10.1038/s41467-024-51672-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Accepted: 08/12/2024] [Indexed: 08/27/2024] Open
Abstract
How many near-neighbors does a molecule have? This fundamental question in chemistry is crucial for molecular optimization problems under the similarity principle assumption. Generative models can sample molecules from a vast chemical space but lack explicit knowledge about molecular similarity. Therefore, these models need guidance from reinforcement learning to sample a relevant similar chemical space. However, they still miss a mechanism to measure the coverage of a specific region of the chemical space. To overcome these limitations, a source-target molecular transformer model, regularized via a similarity kernel function, is proposed. Trained on a large dataset of ≥200 billion molecular pairs, the model enforces a direct relationship between generating a target molecule and its similarity to a source molecule. Results indicate that the regularization term significantly improves the correlation between generation probability and molecular similarity, enabling exhaustive exploration of molecule near-neighborhoods.
Affiliation(s)
- Alessandro Tibo
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden.
- Jiazhen He
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Jon Paul Janet
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Eva Nittinger
  - Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D AstraZeneca, Gothenburg, Sweden
- Ola Engkvist
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
  - Data Science and AI, Computer Science and Engineering, Chalmers, Gothenburg, Sweden
5
Crivelli-Decker J, Beckwith Z, Tom G, Le L, Khuttan S, Salomon-Ferrer R, Beall J, Gómez-Bombarelli R, Bortolato A. Machine Learning Guided AQFEP: A Fast and Efficient Absolute Free Energy Perturbation Solution for Virtual Screening. J Chem Theory Comput 2024; 20. [PMID: 39146234 PMCID: PMC11360131 DOI: 10.1021/acs.jctc.4c00399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2024] [Revised: 07/25/2024] [Accepted: 07/29/2024] [Indexed: 08/17/2024]
Abstract
Structure-based methods in drug discovery have become an integral part of the modern drug discovery process. The power of virtual screening lies in its ability to rapidly and cost-effectively explore enormous chemical spaces to select promising ligands for further experimental investigation. Relative free energy perturbation (RFEP) and similar methods are the gold standard for binding affinity prediction in drug discovery hit-to-lead and lead optimization phases, but have high computational cost and the requirement of a structural analog with a known activity. Without a reference molecule requirement, absolute FEP (AFEP) has, in theory, better accuracy for hit ID, but in practice, the slow throughput is not compatible with VS, where fast docking and unreliable scoring functions are still the standard. Here, we present an integrated workflow to virtually screen large and diverse chemical libraries efficiently, combining active learning with a physics-based scoring function based on a fast absolute free energy perturbation method. We validated the performance of the approach in the ranking of structurally related ligands, virtual screening hit rate enrichment, and active learning chemical space exploration; disclosing the largest reported collection of free energy simulations to date.
Affiliation(s)
- Zane Beckwith
  - SandboxAQ, Palo Alto, California 94301, United States
- Gary Tom
  - SandboxAQ, Palo Alto, California 94301, United States
  - Department of Chemistry and Department of Computer Science, University of Toronto, Toronto, ON M5S 3H6, Canada
  - Vector Institute for Artificial Intelligence, Toronto, ON M5S 3H6, Canada
- Ly Le
  - SandboxAQ, Palo Alto, California 94301, United States
- Sheenam Khuttan
  - SandboxAQ, Palo Alto, California 94301, United States
  - Department of Chemistry, Brooklyn College of the City University of New York, Brooklyn, New York 11367, United States
- Jackson Beall
  - SandboxAQ, Palo Alto, California 94301, United States
- Rafael Gómez-Bombarelli
  - Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
6
Sankaranarayanan K, Jensen KF. Similarity based functionalization for enumeration of synthetically plausible chemical libraries surrounding a target. Chem Sci 2024; 15:10221-10231. [PMID: 38966353 PMCID: PMC11220589 DOI: 10.1039/d4sc00523f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Accepted: 05/22/2024] [Indexed: 07/06/2024] Open
Abstract
Functionalization of lead compounds to create analogs is a challenging step in discovering new molecules with desired properties and it is conducted throughout the chemical industry, including pharmaceuticals and agrochemicals. The process can be time-consuming and expensive, requiring expert intuition and experience. To help address synthesis planning challenges in late-stage functionalization, we have developed a molecular similarity approach that proposes single-step functionalization reactions based on analogy to precedent reactions. The developed approach mimics reaction strategies and suggests co-reactants defined implicitly by a corpus of known reactions. Using ca. 348 k reactions from the patent literature as a knowledge base, the recorded products or close analogs are among the top 20 proposed products in 74% of ∼44 k test reactions. The combinatorial growth inherent in recursive applications of the tool allows the enumeration of chemical libraries surrounding a target compound of interest. Moreover, each step of the resulting library synthesis leverages common chemical transformations reported in the literature accessible to most chemists.
Affiliation(s)
- Karthik Sankaranarayanan
  - Department of Agriculture and Biological Engineering, Purdue University West Lafayette Indiana 47907 USA
  - Department of Chemical Engineering, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge Massachusetts 02139 USA
- Klavs F Jensen
  - Department of Chemical Engineering, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge Massachusetts 02139 USA
7
Carlino L, Astles PC, Ackroyd B, Ahmed A, Chan C, Collie GW, Dale IL, O'Donovan DH, Fawcett C, di Fruscia P, Gohlke A, Guo X, Hao-Ru Hsu J, Kaplan B, Milbradt AG, Northall S, Petrović D, Rivers EL, Underwood E, Webb A. Identification of Novel Potent NSD2-PWWP1 Ligands Using Structure-Based Design and Computational Approaches. J Med Chem 2024; 67:8962-8987. [PMID: 38748070 DOI: 10.1021/acs.jmedchem.4c00215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2024]
Abstract
Dysregulation of histone methyl transferase nuclear receptor-binding SET domain 2 (NSD2) has been implicated in several hematological and solid malignancies. NSD2 is a large multidomain protein that carries histone writing and histone reading functions. To date, identifying inhibitors of the enzymatic activity of NSD2 has proven challenging in terms of potency and SET domain selectivity. Inhibition of the NSD2-PWWP1 domain using small molecules has been considered as an alternative approach to reduce NSD2-unregulated activity. In this article, we present novel computational chemistry approaches, encompassing free energy perturbation coupled to machine learning (FEP/ML) models as well as virtual screening (VS) activities, to identify high-affinity NSD2 PWWP1 binders. Through these activities, we have identified the most potent NSD2-PWWP1 binder reported so far in the literature: compound 34 (pIC50 = 8.2). The compounds identified herein represent useful tools for studying the role of PWWP1 domains for inhibition of human NSD2.
Affiliation(s)
- Luca Carlino
  - Oncology R&D, AstraZeneca, 1 Francis Crick Avenue, Cambridge CB2 0AA, U.K.
- Peter C Astles
  - Oncology R&D, AstraZeneca, 1 Francis Crick Avenue, Cambridge CB2 0AA, U.K.
- Bryony Ackroyd
  - Discovery Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge CB2 0AA, U.K.
- Afshan Ahmed
  - Discovery Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge CB2 0AA, U.K.
- Christina Chan
  - Oncology R&D, AstraZeneca, 1 Francis Crick Avenue, Cambridge CB2 0AA, U.K.
- Gavin W Collie
  - Mechanistic and Structural Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge CB2 0AA, U.K.
- Ian L Dale
  - Oncology R&D, AstraZeneca, 1 Francis Crick Avenue, Cambridge CB2 0AA, U.K.
- Daniel H O'Donovan
  - Oncology R&D, AstraZeneca, 1 Francis Crick Avenue, Cambridge CB2 0AA, U.K.
- Caroline Fawcett
  - Oncology R&D, AstraZeneca, Waltham, Massachusetts 02451, United States
- Paolo di Fruscia
  - Oncology R&D, AstraZeneca, 1 Francis Crick Avenue, Cambridge CB2 0AA, U.K.
- Andrea Gohlke
  - Mechanistic and Structural Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge CB2 0AA, U.K.
- Xiaoxiao Guo
  - Mechanistic and Structural Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge CB2 0AA, U.K.
- Jessie Hao-Ru Hsu
  - Oncology R&D, AstraZeneca, Waltham, Massachusetts 02451, United States
- Bethany Kaplan
  - Oncology R&D, AstraZeneca, Waltham, Massachusetts 02451, United States
- Alexander G Milbradt
  - Mechanistic and Structural Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge CB2 0AA, U.K.
- Sarah Northall
  - Mechanistic and Structural Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge CB2 0AA, U.K.
- Dušan Petrović
  - Hit Discovery, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Gothenburg 431 50, Sweden
- Emma L Rivers
  - Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Cambridge CB2 0AA, U.K.
- Elizabeth Underwood
  - Discovery Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge CB2 0AA, U.K.
- Alice Webb
  - Mechanistic and Structural Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge CB2 0AA, U.K.
8
Retchin M, Wang Y, Takaba K, Chodera JD. DrugGym: A testbed for the economics of autonomous drug discovery. bioRxiv 2024:2024.05.28.596296. [PMID: 38854082 PMCID: PMC11160604 DOI: 10.1101/2024.05.28.596296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
Drug discovery is stochastic. The effectiveness of candidate compounds in satisfying design objectives is unknown ahead of time, and the tools used for prioritization (predictive models and assays) are inaccurate and noisy. In a typical discovery campaign, thousands of compounds may be synthesized and tested before design objectives are achieved, with many others ideated but deprioritized. These challenges are well-documented, but assessing potential remedies has been difficult. We introduce DrugGym, a framework for modeling the stochastic process of drug discovery. Emulating biochemical assays with realistic surrogate models, we simulate the progression from weak hits to sub-micromolar leads with viable ADME. We use this testbed to examine how different ideation, scoring, and decision-making strategies impact statistical measures of utility, such as the probability of program success within predefined budgets and the expected costs to achieve target candidate profile (TCP) goals. We also assess the influence of affinity model inaccuracy, chemical creativity, batch size, and multi-step reasoning. Our findings suggest that reducing affinity model inaccuracy from 2 to 0.5 pIC50 units improves budget-constrained success rates tenfold. DrugGym represents a realistic testbed for machine learning methods applied to the hit-to-lead phase. Source code is available at www.drug-gym.org.
Affiliation(s)
- Michael Retchin
  - Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, Cornell University, New York, NY 10065
- Yuanqing Wang
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065
  - Simons Center for Computational Chemistry and Center for Data Science, New York University, New York, NY 10004
- Kenichiro Takaba
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065
  - Pharmaceutical Research Center, Advanced Drug Discovery, Asahi Kasei Pharma Corporation, Shizuoka 410-2321, Japan
- John D. Chodera
  - Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, Cornell University, New York, NY 10065
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065
9
Krishnan SR, Bung N, Srinivasan R, Roy A. Target-specific novel molecules with their recipe: Incorporating synthesizability in the design process. J Mol Graph Model 2024; 129:108734. [PMID: 38442440 DOI: 10.1016/j.jmgm.2024.108734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Revised: 02/14/2024] [Accepted: 02/15/2024] [Indexed: 03/07/2024]
Abstract
Application of artificial intelligence (AI) in drug discovery has led to several success stories in recent times. While traditional methods mostly relied upon screening large chemical libraries for early-stage drug design, de novo design can help identify novel target-specific molecules by sampling from a much larger chemical space. Although this has increased the possibility of finding diverse and novel molecules from previously unexplored chemical space, it has also posed a great challenge for medicinal chemists to synthesize at least some of the de novo designed novel molecules for experimental validation. To address this challenge, in this work, we propose a novel forward synthesis-based generative AI method, which is used to explore the synthesizable chemical space. The method uses a structure-based drug design framework, where the target protein structure and a target-specific seed fragment from co-crystal structures can be the initial inputs. A random fragment from a purchasable fragment library can also be the input if a target-specific fragment is unavailable. Then a template-based forward synthesis route prediction and molecule generation is performed in parallel using the Monte Carlo Tree Search (MCTS) method, where the subsequent fragments for molecule growth can again be obtained from a purchasable fragment library. The rewards for each iteration of MCTS are computed using a drug-target affinity (DTA) model based on the docking pose of the generated reaction intermediates at the binding site of the target protein of interest. With the help of the proposed method, it is now possible to overcome one of the major obstacles posed to the AI-based drug design approaches through the ability of the method to design novel target-specific synthesizable molecules.
Affiliation(s)
- Navneet Bung
  - TCS Research (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad, 500081, India
- Rajgopal Srinivasan
  - TCS Research (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad, 500081, India
- Arijit Roy
  - TCS Research (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad, 500081, India.
10
Vo HHN, Phung THT, Chung KL, Vu TY. Precise cuts for tailoring chromene-phenyl COX inhibitors with Ligand Designer. J Mol Graph Model 2024; 129:108747. [PMID: 38447296 DOI: 10.1016/j.jmgm.2024.108747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Revised: 12/04/2023] [Accepted: 02/20/2024] [Indexed: 03/08/2024]
Abstract
Cyclooxygenases 1 and 2 (COX-1/2) are enzymes renowned for inducing inflammatory responses through the production of prostaglandins. Thus, the development of COX inhibitors has been a promising approach for identifying compounds with anti-inflammatory potential. In this study, we designed 27 new compounds (1-27) based on the structure of a previously known COX inhibitor, using the Ligand Designer tool. Our aim was to improve the affinity of the compounds with COX enzymes by inducing interactions with residue Arg120 while retaining the good π-π stacking interactions of the chromene-phenyl scaffold. Through screening based on ligand-binding free energy defined by molecular docking simulations and MM/GBSA technique, compounds 9 and 10 were identified as having the highest ability to inhibit COX proteins. The binding affinities of the two compounds with COX-1/2 were superior to those of the original NAI10 compound and the reference drug indomethacin. Our virtual screening suggests that compounds 9 and 10 have a strong ability to inhibit COX-1/2 and thus could be promising candidates for further anti-inflammatory drug studies. In essence, our study underscores the pivotal role of the N-aryl iminocoumarin scaffold in shaping the future landscape of novel anti-inflammatory drug development.
Affiliation(s)
- Thu Huong Thi Phung
  - NTT Hi-Tech Institute, Nguyen Tat Thanh University, Ho Chi Minh City, Viet Nam
- Khanh Linh Chung
  - Faculty of Pharmacy, Ton Duc Thang University, Ho Chi Minh City, Viet Nam
- Thien Y Vu
  - Faculty of Pharmacy, Ton Duc Thang University, Ho Chi Minh City, Viet Nam.
11
Wang L, Zhou Z, Yang X, Shi S, Zeng X, Cao D. The present state and challenges of active learning in drug discovery. Drug Discov Today 2024; 29:103985. [PMID: 38642700 DOI: 10.1016/j.drudis.2024.103985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2024] [Revised: 04/08/2024] [Accepted: 04/15/2024] [Indexed: 04/22/2024]
Abstract
Active learning (AL) is an iterative feedback process that efficiently identifies valuable data within vast chemical space, even with limited labeled data. This characteristic renders it a valuable approach to tackle the ongoing challenges faced in drug discovery, such as the ever-expanding exploration space and the limitations of labeled data. Consequently, AL is increasingly gaining prominence in the field of drug development. In this paper, we comprehensively review the application of AL at all stages of drug discovery, including compound-target interaction prediction, virtual screening, molecular generation and optimization, as well as molecular property prediction. Additionally, we discuss the challenges and prospects associated with the current applications of AL in drug discovery.
Affiliation(s)
- Lei Wang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, China
- Zhenran Zhou
  - Department of Computer Science, Hunan University, Changsha 410082, Hunan, China
- Xixi Yang
  - Department of Computer Science, Hunan University, Changsha 410082, Hunan, China
- Shaohua Shi
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, China
- Xiangxiang Zeng
  - Department of Computer Science, Hunan University, Changsha 410082, Hunan, China.
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, China.
12
Burger PB, Hu X, Balabin I, Muller M, Stanley M, Joubert F, Kaiser TM. FEP Augmentation as a Means to Solve Data Paucity Problems for Machine Learning in Chemical Biology. J Chem Inf Model 2024; 64:3812-3825. [PMID: 38651738 PMCID: PMC11094716 DOI: 10.1021/acs.jcim.4c00071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Revised: 04/01/2024] [Accepted: 04/02/2024] [Indexed: 04/25/2024]
Abstract
In the realm of medicinal chemistry, the primary objective is to swiftly optimize a multitude of chemical properties of a set of compounds to yield a clinical candidate poised for clinical trials. In recent years, two computational techniques, machine learning (ML) and physics-based methods, have evolved substantially and are now frequently incorporated into the medicinal chemist's toolbox to enhance the efficiency of both hit optimization and candidate design. Both computational methods come with their own set of limitations, and they are often used independently of each other. ML's capability to screen extensive compound libraries expediently is tempered by its reliance on quality data, which can be scarce especially during early-stage optimization. Contrarily, physics-based approaches like free energy perturbation (FEP) are frequently constrained by low throughput and high cost by comparison; however, physics-based methods are capable of making highly accurate binding affinity predictions. In this study, we harnessed the strength of FEP to overcome data paucity in ML by generating virtual activity data sets which then inform the training of algorithms. Here, we show that ML algorithms trained with an FEP-augmented data set could achieve comparable predictive accuracy to data sets trained on experimental data from biological assays. Throughout the paper, we emphasize key mechanistic considerations that must be taken into account when aiming to augment data sets and lay the groundwork for successful implementation. Ultimately, the study advocates for the synergy of physics-based methods and ML to expedite the lead optimization process. We believe that the physics-based augmentation of ML will significantly benefit drug discovery, as these techniques continue to evolve.
Affiliation(s)
- Pieter B. Burger
  - Avicenna Biosciences Inc., 101 W. Chapel Hill Street, Suite 210, Durham, North Carolina 27001, United States
- Xiaohu Hu
  - Schrödinger, Inc., 120 West 45th Street, New York, New York 10036, United States
- Ilya Balabin
  - Avicenna Biosciences Inc., 101 W. Chapel Hill Street, Suite 210, Durham, North Carolina 27001, United States
- Morné Muller
  - Avicenna Biosciences Inc., 101 W. Chapel Hill Street, Suite 210, Durham, North Carolina 27001, United States
- Megan Stanley
  - Microsoft Research AI4Science, 21 Station Road, Cambridge CB1 2FB, U.K.
- Fourie Joubert
  - Centre for Bioinformatics and Computational Biology, Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria 0001, South Africa
- Thomas M. Kaiser
  - Avicenna Biosciences Inc., 101 W. Chapel Hill Street, Suite 210, Durham, North Carolina 27001, United States
13
Kim H, Lee K, Kim C, Lim J, Kim WY. DFRscore: Deep Learning-Based Scoring of Synthetic Complexity with Drug-Focused Retrosynthetic Analysis for High-Throughput Virtual Screening. J Chem Inf Model 2024; 64:2432-2444. [PMID: 37651152 DOI: 10.1021/acs.jcim.3c01134] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 09/01/2023]
Abstract
Recently emerging generative AI models enable us to produce a vast number of compounds for potential applications. While they can provide novel molecular structures, the synthetic feasibility of the generated molecules is often questioned. To address this issue, a few recent studies have attempted to use deep learning models to estimate the synthetic accessibility of many molecules rapidly. However, retrosynthetic analysis tools used to train the models rely on reaction templates automatically extracted from a large reaction database that are not domain-specific and may exhibit low chemical correctness. To overcome this limitation, we introduce DFRscore (Drug-Focused Retrosynthetic score), a deep learning-based approach for a more practical assessment of synthetic accessibility in drug discovery. The DFRscore model is trained exclusively on drug-focused reactions, providing a predicted number of minimally required synthetic steps for each compound. This approach enables practitioners to filter out compounds that do not meet their desired level of synthetic accessibility at an early stage of high-throughput virtual screening for accelerated drug discovery. The proposed strategy can be easily adapted to other domains by adjusting the synthesis planning setup of the reaction templates and starting materials.
Affiliation(s)
- Hyeongwoo Kim
- Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
| | - Kyunghoon Lee
- Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
| | - Chansu Kim
- Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
| | - Jaechang Lim
- HITS Incorporation, 124 Teheran-ro, Gangnam-gu, Seoul 06234, Republic of Korea
| | - Woo Youn Kim
- Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- HITS Incorporation, 124 Teheran-ro, Gangnam-gu, Seoul 06234, Republic of Korea
- AI Institute, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
| |
|
14
|
Gorantla R, Kubincová A, Suutari B, Cossins BP, Mey ASJS. Benchmarking Active Learning Protocols for Ligand-Binding Affinity Prediction. J Chem Inf Model 2024; 64:1955-1965. [PMID: 38446131 PMCID: PMC10966646 DOI: 10.1021/acs.jcim.4c00220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Accepted: 02/23/2024] [Indexed: 03/07/2024]
Abstract
Active learning (AL) has become a powerful tool in computational drug discovery, enabling the identification of top binders from vast molecular libraries. To design a robust AL protocol, it is important to understand the influence of AL parameters, as well as the features of the data sets on the outcomes. We use four affinity data sets for different targets (TYK2, USP7, D2R, Mpro) to systematically evaluate the performance of machine learning models [Gaussian process (GP) model and Chemprop model], sample selection protocols, and the batch size based on metrics describing the overall predictive power of the model (R2, Spearman rank, root-mean-square error) as well as the accurate identification of top 2%/5% binders (Recall, F1 score). Both models have a comparable Recall of top binders on large data sets, but the GP model surpasses the Chemprop model when training data are sparse. A larger initial batch size, especially on diverse data sets, increased the Recall of both models as well as overall correlation metrics. However, for subsequent cycles, smaller batch sizes of 20 or 30 compounds proved to be desirable. Furthermore, adding artificial Gaussian noise to the data up to a certain threshold still allowed the model to identify clusters with top-scoring compounds. However, excessive noise (<1σ) did impact the model's predictive and exploitative capabilities.
Affiliation(s)
- Rohan Gorantla
- School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, U.K.
- EaStCHEM School of Chemistry, University of Edinburgh, Edinburgh EH9 3FJ, U.K.
- Exscientia, Schrödinger Building, Oxford OX4 4GE, U.K.
| | | | | | | | | |
|
15
|
Amezcua M, Setiadi J, Mobley DL. The SAMPL9 host-guest blind challenge: an overview of binding free energy predictive accuracy. Phys Chem Chem Phys 2024; 26:9207-9225. [PMID: 38444308 PMCID: PMC10954238 DOI: 10.1039/d3cp05111k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Accepted: 02/03/2024] [Indexed: 03/07/2024]
Abstract
We report the results of the SAMPL9 host-guest blind challenge for predicting binding free energies. The challenge focused on macrocycles from the pillar[n]-arene and cyclodextrin host families, including WP6, bCD, and HbCD. A variety of methods were used by participants to submit binding free energy predictions. A machine learning approach based on molecular descriptors achieved the highest accuracy (RMSE of 2.04 kcal mol⁻¹) among the ranked methods in the WP6 dataset. Interestingly, predictions for WP6 obtained via docking tended to outperform all methods (RMSE of 1.70 kcal mol⁻¹), most of which are MD based and computationally more expensive. In general, methods applying force fields achieved better correlation with experiments for WP6 than the machine learning and docking models. In the cyclodextrin-phenothiazine challenge, the ATM approach emerged as the top performing method with RMSE less than 1.86 kcal mol⁻¹. Correlation metrics of ranked methods in this dataset were relatively poor compared to WP6. We also highlight several lessons learned to guide future work and help improve studies on the systems discussed. For example, WP6 may be present in microstates other than its -12 state in the presence of certain guests. Machine learning approaches can be used to fine tune or help train force fields for certain chemistry (i.e. WP6-G4). Certain phenothiazines occupy distinct primary and secondary orientations, some of which were considered individually for accurate binding free energies. The accuracy of predictions from certain methods while starting from a single binding pose/orientation demonstrates the sensitivity of calculated binding free energies to the orientation, and in some cases the likely dominant orientation for the system. Computational and experimental results suggest that the guest phenothiazine core traverses both the secondary and primary faces of the cyclodextrin hosts, a bulky cationic side chain will primarily occupy the primary face, and the phenothiazine core substituent resides at the larger secondary face.
Affiliation(s)
- Martin Amezcua
- Department of Pharmaceutical Sciences, University of California, Irvine, Irvine, California 92697, USA.
| | - Jeffry Setiadi
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, California 92093, USA
| | - David L Mobley
- Department of Pharmaceutical Sciences, University of California, Irvine, Irvine, California 92697, USA.
- Department of Chemistry, University of California, Irvine, Irvine, California 92697, USA
| |
|
16
|
Dodds M, Guo J, Löhr T, Tibo A, Engkvist O, Janet JP. Sample efficient reinforcement learning with active learning for molecular design. Chem Sci 2024; 15:4146-4160. [PMID: 38487235 PMCID: PMC10935729 DOI: 10.1039/d3sc04653b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2023] [Accepted: 02/07/2024] [Indexed: 03/17/2024]
Abstract
Reinforcement learning (RL) is a powerful and flexible paradigm for searching for solutions in high-dimensional action spaces. However, bridging the gap between playing computer games with thousands of simulated episodes and solving real scientific problems with complex and involved environments (up to actual laboratory experiments) requires improvements in terms of sample efficiency to make the most of expensive information. The discovery of new drugs is a major commercial application of RL, motivated by the very large nature of the chemical space and the need to perform multiparameter optimization (MPO) across different properties. In silico methods, such as virtual library screening (VS) and de novo molecular generation with RL, show great promise in accelerating this search. However, incorporation of increasingly complex computational models in these workflows requires increasing sample efficiency. Here, we introduce an active learning system linked with an RL model (RL-AL) for molecular design, which aims to improve the sample-efficiency of the optimization process. We identify and characterize unique challenges combining RL and AL, investigate the interplay between the systems, and develop a novel AL approach to solve the MPO problem. Our approach greatly expedites the search for novel solutions relative to baseline-RL for simple ligand- and structure-based oracle functions, with a 5-66-fold increase in hits generated for a fixed oracle budget and a 4-64-fold reduction in computational time to find a specific number of hits. Furthermore, compounds discovered through RL-AL display substantial enrichment of a multi-parameter scoring objective, indicating superior efficacy in curating high-scoring compounds, without a reduction in output diversity. This significant acceleration improves the feasibility of oracle functions that have largely been overlooked in RL due to high computational costs, for example free energy perturbation methods, and in principle is applicable to any RL domain.
Affiliation(s)
- Michael Dodds
- Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
| | - Jeff Guo
- Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
| | - Thomas Löhr
- Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
| | - Alessandro Tibo
- Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
| | - Ola Engkvist
- Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
| | - Jon Paul Janet
- Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
| |
|
17
|
Kyro GW, Morgunov A, Brent RI, Batista VS. ChemSpaceAL: An Efficient Active Learning Methodology Applied to Protein-Specific Molecular Generation. J Chem Inf Model 2024; 64:653-665. [PMID: 38287889 DOI: 10.1021/acs.jcim.3c01456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2024]
Abstract
The incredible capabilities of generative artificial intelligence models have inevitably led to their application in the domain of drug discovery. Within this domain, the vastness of chemical space motivates the development of more efficient methods for identifying regions with molecules that exhibit desired characteristics. In this work, we present a computationally efficient active learning methodology and demonstrate its applicability to targeted molecular generation. When applied to c-Abl kinase, a protein with FDA-approved small-molecule inhibitors, the model learns to generate molecules similar to the inhibitors without prior knowledge of their existence and even reproduces two of them exactly. We also show that the methodology is effective for a protein without any commercially available small-molecule inhibitors, the HNH domain of the CRISPR-associated protein 9 (Cas9) enzyme. To facilitate implementation and reproducibility, we made all of our software available through the open-source ChemSpaceAL Python package.
Affiliation(s)
- Gregory W Kyro
- Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States
| | - Anton Morgunov
- Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States
| | - Rafael I Brent
- Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States
| | - Victor S Batista
- Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States
| |
|
18
|
Zheng L, Shi F, Peng C, Xu M, Fan F, Li Y, Zhang L, Du J, Wang Z, Lin Z, Sun Y, Deng C, Duan X, Wei L, Zhao C, Fang L, Zhang P, Ma S, Lai L, Yang M. Application scenario-oriented molecule generation platform developed for drug discovery. Methods 2024; 222:112-121. [PMID: 38215898 DOI: 10.1016/j.ymeth.2023.12.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Revised: 11/22/2023] [Accepted: 12/23/2023] [Indexed: 01/14/2024]
Abstract
Design of molecules for candidate compound selection is one of the central challenges in drug discovery due to the complexity of chemical space and the requirement of multi-parameter optimization. Here we present an application scenario-oriented platform (ID4Idea) for molecule generation in different scenarios of drug discovery. This platform utilizes both library- or rule-based and generative algorithms (VAE, RNN, GAN, etc.), in combination with various AI learning types (pre-training, transfer learning, reinforcement learning, active learning, etc.) and input representations (1D SMILES, 2D graph, 3D shape, binding site, pharmacophore, etc.), to enable customized solutions for a given molecular design scenario. Besides the usual protocol of generation followed by screening, goal-directed molecule generation can also be conducted towards predefined goals, enhancing the efficiency of hit identification, lead finding, and lead optimization. We demonstrate the effectiveness of the ID4Idea platform through case studies, showcasing customized solutions for different design tasks using various input information, such as binding pockets, pharmacophores, and compound representations. In addition, remaining challenges are discussed to unlock the full potential of AI models in drug discovery and pave the way for the development of novel therapeutics.
Affiliation(s)
- Lianjun Zheng
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
| | - Fangjun Shi
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
| | - Chunwang Peng
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
| | - Min Xu
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
| | - Fangda Fan
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
| | - Yuanpeng Li
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
| | - Lin Zhang
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
| | - Jiewen Du
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
| | - Zonghu Wang
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
| | - Zhixiong Lin
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
| | - Yina Sun
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
| | - Chenglong Deng
- Jingtai Zhiyao Technology (Shanghai) Co., Ltd. (XtalPi), No. 207 Huanqiao Road, Pudong New Area, Shanghai 201315, China
| | - Xinli Duan
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
| | - Lin Wei
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
| | | | - Lei Fang
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
| | - Peiyu Zhang
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
| | - Songling Ma
- XtalPi Innovation Center, XtalPi Inc., Beijing, China.
| | - Lipeng Lai
- XtalPi Innovation Center, XtalPi Inc., Beijing, China.
| | - Mingjun Yang
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China.
| |
|
19
|
Papadourakis M, Sinenka H, Matricon P, Hénin J, Brannigan G, Pérez-Benito L, Pande V, van Vlijmen H, de Graaf C, Deflorian F, Tresadern G, Cecchini M, Cournia Z. Alchemical Free Energy Calculations on Membrane-Associated Proteins. J Chem Theory Comput 2023; 19:7437-7458. [PMID: 37902715 PMCID: PMC11017255 DOI: 10.1021/acs.jctc.3c00365] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 03/31/2023] [Indexed: 10/31/2023]
Abstract
Membrane proteins have diverse functions within cells and are well-established drug targets. The advances in membrane protein structural biology have revealed drug and lipid binding sites on membrane proteins, while computational methods such as molecular simulations can resolve the thermodynamic basis of these interactions. Particularly, alchemical free energy calculations have shown promise in the calculation of reliable and reproducible binding free energies of protein-ligand and protein-lipid complexes in membrane-associated systems. In this review, we present an overview of representative alchemical free energy studies on G-protein-coupled receptors, ion channels, transporters as well as protein-lipid interactions, with emphasis on best practices and critical aspects of running these simulations. Additionally, we analyze challenges and successes when running alchemical free energy calculations on membrane-associated proteins. Finally, we highlight the value of alchemical free energy calculations in drug discovery and their applicability in the pharmaceutical industry.
Affiliation(s)
- Michail Papadourakis
- Biomedical Research Foundation, Academy of Athens, 4 Soranou Ephessiou, 11527 Athens, Greece
| | - Hryhory Sinenka
- Institut de Chimie de Strasbourg, UMR7177, CNRS, Université de Strasbourg, F-67083 Strasbourg Cedex, France
| | - Pierre Matricon
- Sosei Heptares, Steinmetz Building, Granta Park, Great Abington, Cambridge CB21 6DG, United Kingdom
| | - Jérôme Hénin
- Laboratoire de Biochimie Théorique UPR 9080, CNRS and Université Paris Cité, 75005 Paris, France
| | - Grace Brannigan
- Center for Computational and Integrative Biology, Rutgers University−Camden, Camden, New Jersey 08103, United States of America
- Department of Physics, Rutgers University−Camden, Camden, New Jersey 08102, United States of America
| | - Laura Pérez-Benito
- CADD, In Silico Discovery, Janssen Research & Development, Turnhoutseweg 30, 2340 Beerse, Belgium
| | - Vineet Pande
- CADD, In Silico Discovery, Janssen Research & Development, Turnhoutseweg 30, 2340 Beerse, Belgium
| | - Herman van Vlijmen
- CADD, In Silico Discovery, Janssen Research & Development, Turnhoutseweg 30, 2340 Beerse, Belgium
| | - Chris de Graaf
- Sosei Heptares, Steinmetz Building, Granta Park, Great Abington, Cambridge CB21 6DG, United Kingdom
| | - Francesca Deflorian
- Sosei Heptares, Steinmetz Building, Granta Park, Great Abington, Cambridge CB21 6DG, United Kingdom
| | - Gary Tresadern
- CADD, In Silico Discovery, Janssen Research & Development, Turnhoutseweg 30, 2340 Beerse, Belgium
| | - Marco Cecchini
- Institut de Chimie de Strasbourg, UMR7177, CNRS, Université de Strasbourg, F-67083 Strasbourg Cedex, France
| | - Zoe Cournia
- Biomedical Research Foundation, Academy of Athens, 4 Soranou Ephessiou, 11527 Athens, Greece
| |
|
20
|
Kayes MR, Saha S, Alanazi MM, Ozeki Y, Pal D, Hadda TB, Legssyer A, Kawsar SM. Macromolecules: Synthesis, antimicrobial, POM analysis and computational approaches of some glucoside derivatives bearing acyl moieties. Saudi Pharm J 2023; 31:101804. [PMID: 37868643 PMCID: PMC10585311 DOI: 10.1016/j.jsps.2023.101804] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Accepted: 09/27/2023] [Indexed: 10/24/2023]
Abstract
Macromolecules, i.e., carbohydrate derivatives, are crucial to biochemical and medical research. Herein, we designed and synthesized eight methyl α-D-glucopyranoside (MGP) derivatives (2-8) in good yields following the regioselective direct acylation method. The structural configurations of the synthesized MGP derivatives were analyzed and verified using multiple physicochemical and spectroscopic techniques. Antimicrobial experiments revealed that almost all derivatives demonstrated noticeable antifungal and antibacterial efficacy. The synthesized derivatives showed minimum inhibitory concentration (MIC) values ranging from 0.75 μg/mL to 1.50 μg/mL and minimum bactericidal concentrations (MBCs) ranging from 8.00 μg/mL to 16.00 μg/mL. Compound 6 inhibited Ehrlich ascites carcinoma (EAC) cell proliferation by 10.36% with an IC50 of 2602.23 μg/mL in the MTT colorimetric assay. The obtained results were further rationalized by docking analysis of the synthesized derivatives against 4URO and 4XE3 receptors to explore the binding affinities and nonbonding interactions of MGP derivatives with target proteins. Compound 6 demonstrated the potential to bind with the target with the highest binding energy. In a simulated environment, a molecular dynamics study showed that MGP derivatives have a stable conformation and binding pattern. The MGP derivatives were examined using POM (Petra/Osiris/Molinspiration) bioinformatics, and as a result, these derivatives showed favorable toxicity, bioavailability, and pharmacokinetic profiles. Various antifungal/antiviral pharmacophore (Oδ-, O'δ-) sites were identified by using POM investigations, and compound 6 was further tested against other pathogenic fungi and viruses, such as Omicron and Delta mutants of SARS-CoV-2.
Affiliation(s)
- Mohammad R. Kayes
- Laboratory of Carbohydrate and Nucleoside Chemistry, Department of Chemistry, Faculty of Science, University of Chittagong, Chittagong 4331, Bangladesh
| | - Supriyo Saha
- Uttaranchal Institute of Pharmaceutical Sciences, Uttaranchal University, Dehradun, Uttarakhand 248007, India
| | - Mohammed M. Alanazi
- Department of Pharmaceutical Chemistry, College of Pharmacy, King Saud University, P.O. Box 2457, Riyadh 11451, Saudi Arabia
| | - Yasuhiro Ozeki
- School of Sciences, Yokohama City University, 22-2, Seto, Kanazawa-Ku, Yokohama 236-0027, Japan
| | - Dilipkumar Pal
- Department of Pharmaceutical Sciences, Guru Ghasidas Vishwavidyalaya (A Central University), C.G, 495009 Bilaspur, India
| | - Taibi B. Hadda
- BBEH and LACE Laboratories of Applied Chemistry & Environment, Faculty of Sciences, Mohammed Premier University, MB 524, 60000 Oujda, Morocco
| | - Abdelkhaleq Legssyer
- BBEH and LACE Laboratories of Applied Chemistry & Environment, Faculty of Sciences, Mohammed Premier University, MB 524, 60000 Oujda, Morocco
| | - Sarkar M.A. Kawsar
- Laboratory of Carbohydrate and Nucleoside Chemistry, Department of Chemistry, Faculty of Science, University of Chittagong, Chittagong 4331, Bangladesh
| |
|
21
|
Pereira TO, Abbasi M, Arrais JP. Enhancing reinforcement learning for de novo molecular design applying self-attention mechanisms. Brief Bioinform 2023; 24:bbad368. [PMID: 37903414 DOI: 10.1093/bib/bbad368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Revised: 09/04/2023] [Accepted: 09/26/2023] [Indexed: 11/01/2023]
Abstract
The drug discovery process can be significantly improved by applying deep reinforcement learning (RL) methods that learn to generate compounds with desired pharmacological properties. Nevertheless, RL-based methods typically condense the evaluation of sampled compounds into a single scalar value, making it difficult for the generative agent to learn the optimal policy. This work combines self-attention mechanisms and RL to generate promising molecules. The idea is to evaluate the relative significance of each atom and functional group in their interaction with the target, and to utilize this information for optimizing the Generator. Therefore, the framework for de novo drug design is composed of a Generator that samples new compounds combined with a Transformer-encoder and a biological affinity Predictor that evaluate the generated structures. Moreover, it takes advantage of the knowledge encapsulated in the Transformer's attention weights to evaluate each token individually. We compared the performance of two output prediction strategies for the Transformer: standard and masked language model (MLM). The results show that the MLM Transformer is more effective in optimizing the Generator than state-of-the-art works. Additionally, the evaluation models identified the most important regions of each molecule for the biological interaction with the target. As a case study, we generated synthesizable hit compounds that can be putative inhibitors of the enzyme ubiquitin-specific protein 7 (USP7).
Affiliation(s)
- Tiago O Pereira
- Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Univ Coimbra, Coimbra, Portugal
| | - Maryam Abbasi
- Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Univ Coimbra, Coimbra, Portugal
| | - Joel P Arrais
- Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Univ Coimbra, Coimbra, Portugal
| |
|
22
|
Lotfi B, Mebarka O, Alhatlani BY, Abdallah EM, Kawsar SMA. Pharmacoinformatics and Breed-Based De Novo Hybridization Studies to Develop New Neuraminidase Inhibitors as Potential Anti-Influenza Agents. Molecules 2023; 28:6678. [PMID: 37764457 PMCID: PMC10534564 DOI: 10.3390/molecules28186678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2023] [Revised: 09/09/2023] [Accepted: 09/13/2023] [Indexed: 09/29/2023]
Abstract
Influenza represents a profoundly transmissible viral ailment primarily afflicting the respiratory system. Neuraminidase inhibitors constitute a class of antiviral therapeutics employed in the management of influenza. These inhibitors impede the liberation of the viral neuraminidase protein, thereby impeding viral dissemination from the infected cell to host cells. As such, neuraminidase has emerged as a pivotal target for mitigating influenza and its associated complications. Here, we apply a de novo hybridization approach based on a breed-centric methodology to elucidate novel neuraminidase inhibitors. The breed technique amalgamates established ligand frameworks with the shared target, neuraminidase, resulting in innovative inhibitor constructs. Molecular docking analysis revealed that the seven synthesized breed molecules (designated Breeds 1-7) formed more robust complexes with the neuraminidase receptor than conventional clinical neuraminidase inhibitors such as zanamivir, oseltamivir, and peramivir. Pharmacokinetic evaluations of the seven breed molecules (Breeds 1-7) demonstrated favorable bioavailability and optimal permeability, all falling within the specified parameters for human application. Molecular dynamics simulations spanning 100 nanoseconds corroborated the stability of these breed molecules within the active site of neuraminidase, shedding light on their structural dynamics. Binding energy assessments, which were conducted through MM-PBSA analysis, substantiated the enduring complexes formed by the seven types of molecules and the neuraminidase receptor. Last, the investigation employed a reaction-based enumeration technique to ascertain the synthetic pathways for the synthesis of the seven breed molecules.
Affiliation(s)
- Bourougaa Lotfi
- Group of Computational and Medicinal Chemistry, LMCE Laboratory, University of Biskra, BP 145, Biskra 70700, Algeria
| | - Ouassaf Mebarka
- Group of Computational and Medicinal Chemistry, LMCE Laboratory, University of Biskra, BP 145, Biskra 70700, Algeria
| | - Bader Y. Alhatlani
- Unit of Scientific Research, Applied College, Qassim University, Buraydah 52571, Saudi Arabia
| | - Emad M. Abdallah
- Department of Science Laboratories, College of Science and Arts, Qassim University, Ar Rass 51921, Saudi Arabia
| | - Sarkar M. A. Kawsar
- Laboratory of Carbohydrate and Nucleoside Chemistry, Department of Chemistry, Faculty of Science, University of Chittagong, Chittagong 4331, Bangladesh
| |
|
23
|
Wei L, Xu M, Liu Z, Jiang C, Lin X, Hu Y, Wen X, Zou R, Peng C, Lin H, Wang G, Yang L, Fang L, Yang M, Zhang P. Hit Identification Driven by Combining Artificial Intelligence and Computational Chemistry Methods: A PI5P4K-β Case Study. J Chem Inf Model 2023; 63:5341-5355. [PMID: 37549337 DOI: 10.1021/acs.jcim.3c00543] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/09/2023]
Abstract
Computer-aided drug design (CADD), especially artificial intelligence-driven drug design (AIDD), is increasingly used in drug discovery. In this paper, a novel and efficient workflow for hit identification was developed within the ID4Inno drug discovery platform, featuring innovative artificial intelligence, high-accuracy computational chemistry, and high-performance cloud computing. The workflow was validated by discovering a few potent hit compounds (best IC50 is ∼0.80 μM) against PI5P4K-β, a novel anti-cancer target. Furthermore, by applying the tools implemented in ID4Inno, we managed to optimize these hit compounds and finally obtained five hit series with different scaffolds, all of which showed high activity against PI5P4K-β. These results demonstrate the effectiveness of ID4Inno in driving hit identification based on artificial intelligence, computational chemistry, and cloud computing.
Affiliation(s)
- Lin Wei
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Shenzhen 518000, China
- Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
| | - Min Xu
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Shenzhen 518000, China
| | - Zhiqiang Liu
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Shenzhen 518000, China
| | - Chongguo Jiang
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Shenzhen 518000, China
| | - Xiaohua Lin
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Shenzhen 518000, China
| | - Yaogang Hu
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Shenzhen 518000, China
| | - Xiaoming Wen
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Shenzhen 518000, China
| | - Rongfeng Zou
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Shenzhen 518000, China
| | - Chunwang Peng
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Shenzhen 518000, China
| | - Hongrui Lin
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Shenzhen 518000, China
| | - Guo Wang
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Shenzhen 518000, China
| | - Lijun Yang
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Shenzhen 518000, China
| | - Lei Fang
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Shenzhen 518000, China
| | - Mingjun Yang
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Shenzhen 518000, China
| | - Peiyu Zhang
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Shenzhen 518000, China
| |
|
24
|
Yoo J, Kim TY, Joung I, Song SO. Industrializing AI/ML during the end-to-end drug discovery process. Curr Opin Struct Biol 2023; 79:102528. [PMID: 36736243 DOI: 10.1016/j.sbi.2023.102528] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 09/15/2022] [Revised: 12/16/2022] [Accepted: 12/20/2022] [Indexed: 02/04/2023]
Abstract
Drug discovery aims to select proper targets and drug candidates to address unmet clinical needs. The end-to-end drug discovery process includes all stages of drug discovery from target identification to drug candidate selection. Recently, several artificial intelligence and machine learning (AI/ML)-based drug discovery companies have attempted to build data-driven platforms spanning the end-to-end drug discovery process. The ability to identify elusive targets essentially leads to the diversification of discovery pipelines, thereby increasing the ability to address unmet needs. Modern ML technologies are complementing traditional computer-aided drug discovery by accelerating candidate optimization in innovative ways. This review summarizes recent developments in AI/ML methods from target identification to molecule optimization, and concludes with an overview of current industrial trends in end-to-end AI/ML platforms.
Affiliation(s)
- Jiho Yoo
- Standigm Inc., 3F, 70 Nonhyeon-ro 85-gil, Gangnam-gu, Seoul, South Korea, 06234 +82.2.501.8118
| | - Tae Yong Kim
- Standigm Inc., 3F, 70 Nonhyeon-ro 85-gil, Gangnam-gu, Seoul, South Korea, 06234 +82.2.501.8118
| | - InSuk Joung
- Standigm Inc., 3F, 70 Nonhyeon-ro 85-gil, Gangnam-gu, Seoul, South Korea, 06234 +82.2.501.8118
| | - Sang Ok Song
- Standigm Inc., 3F, 70 Nonhyeon-ro 85-gil, Gangnam-gu, Seoul, South Korea, 06234 +82.2.501.8118.
| |
|
25
|
Design of Novel Phosphatidylinositol 3-Kinase Inhibitors for Non-Hodgkin's Lymphoma: Molecular Docking, Molecular Dynamics, and Density Functional Theory Studies on Gold Nanoparticles. Molecules 2023; 28:2289. [PMID: 36903539 PMCID: PMC10005307 DOI: 10.3390/molecules28052289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 02/25/2023] [Accepted: 02/27/2023] [Indexed: 03/06/2023] Open
Abstract
Non-Hodgkin's lymphomas are a diverse collection of lymphoproliferative cancers that are much less predictable than Hodgkin's lymphomas with a far greater tendency to metastasize to extranodal sites. A quarter of non-Hodgkin's lymphoma cases develop at extranodal sites and the majority of them involve nodal and extranodal sites. The most common subtypes include follicular lymphoma, chronic/small lymphocytic leukaemia, mantle cell lymphoma, and marginal zone lymphoma. Umbralisib is one of the latest PI3Kδ inhibitors in clinical trials for several hematologic cancer indications. In this study, new umbralisib analogues were designed and docked to the active site of PI3Kδ, the main target of the phosphoinositide 3-kinase/Akt/mammalian target of rapamycin (PI3K/AKT/mTOR) pathway. This study resulted in eleven candidates with strong binding to PI3Kδ, with docking scores between -7.66 and -8.42 kcal/mol. The docking analysis of ligand-receptor interactions between umbralisib analogues bound to PI3K showed that their interactions were mainly controlled by hydrophobic interactions and, to a lesser extent, by hydrogen bonding. In addition, the MM-GBSA binding free energy was calculated. Analogue 306 showed the most favorable free energy of binding, -52.22 kcal/mol. Molecular dynamics simulation was used to assess the structural changes and stability of the complexes formed by the proposed ligands. Based on these findings, the best-designed analogue, analogue 306, formed a stable ligand-protein complex. In addition, pharmacokinetics and toxicity analysis using the QikProp tool demonstrated that analogue 306 had good absorption, distribution, metabolism, and excretion properties. It also has a promising predicted profile in immune toxicity, carcinogenicity, and cytotoxicity. Furthermore, analogue 306 had stable interactions with gold nanoparticles, as studied using density functional theory calculations. The best interaction with gold was observed at oxygen atom number 5, with -29.42 kcal/mol. Further in vitro and in vivo investigations are recommended to verify the anticancer activity of this analogue.
|
26
|
Xu H. The slow but steady rise of binding free energy calculations in drug discovery. J Comput Aided Mol Des 2023; 37:67-74. [PMID: 36469232 DOI: 10.1007/s10822-022-00494-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 11/01/2022] [Accepted: 11/28/2022] [Indexed: 12/12/2022]
Abstract
Binding free energy calculations are increasingly used in drug discovery research to predict protein-ligand binding affinities and to prioritize candidate drug molecules accordingly. It has taken decades of collective effort to transform this academic concept into a technology adopted by the pharmaceutical and biotech industry. Having personally witnessed and taken part in this transformation, here I recount the (incomplete) list of problems that had to be solved to make this computational tool practical and suggest areas of future development.
Affiliation(s)
- Huafeng Xu
- Roivant Discovery, 151 West 42nd Street, New York, NY, 10036, USA.
| |
|
27
|
Danel T, Łęski J, Podlewska S, Podolak IT. Docking-based generative approaches in the search for new drug candidates. Drug Discov Today 2023; 28:103439. [PMID: 36372330 DOI: 10.1016/j.drudis.2022.103439] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 06/30/2022] [Revised: 10/08/2022] [Accepted: 11/08/2022] [Indexed: 11/13/2022]
Abstract
Despite the popularity of virtual screening (VS) of existing compound libraries, the search for new potential drug candidates also takes advantage of generative protocols, where new compound suggestions are enumerated using various algorithms. To increase the activity potency of generative approaches, they have recently been coupled with molecular docking, a leading methodology of structure-based drug design (SBDD). In this review, we summarize progress since docking-based generative models emerged. We propose a new taxonomy for these methods and discuss their importance for the field of computer-aided drug design (CADD). In addition, we discuss the most promising directions for the further development of generative protocols coupled with docking.
Affiliation(s)
- Tomasz Danel
- Faculty of Mathematics and Computer Science, Jagiellonian University, 6 Łojasiewicza Street, 30-348 Kraków, Poland.
| | - Jan Łęski
- Faculty of Mathematics and Computer Science, Jagiellonian University, 6 Łojasiewicza Street, 30-348 Kraków, Poland
| | - Sabina Podlewska
- Maj Institute of Pharmacology, Polish Academy of Sciences, Department of Medicinal Chemistry, 31-343 Kraków, Smętna Street 12, Poland
| | - Igor T Podolak
- Faculty of Mathematics and Computer Science, Jagiellonian University, 6 Łojasiewicza Street, 30-348 Kraków, Poland
| |
|
28
|
Gusev F, Gutkin E, Kurnikova MG, Isayev O. Active Learning Guided Drug Design Lead Optimization Based on Relative Binding Free Energy Modeling. J Chem Inf Model 2023; 63:583-594. [PMID: 36599125 DOI: 10.1021/acs.jcim.2c01052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2023]
Abstract
In silico identification of potent protein inhibitors commonly requires prediction of a ligand binding free energy (BFE). Thermodynamic integration (TI) based on molecular dynamics (MD) simulations is a BFE calculation method capable of yielding accurate BFEs, but it is computationally expensive and time-consuming. In this work, we have developed an efficient automated workflow for identifying compounds with the lowest BFE among thousands of congeneric ligands, which requires only hundreds of TI calculations. Automated machine learning (AutoML) orchestrated by active learning (AL) in an AL-AutoML workflow allows unbiased and efficient search for a small set of best-performing molecules. We have applied this workflow to select inhibitors of the SARS-CoV-2 papain-like protease and were able to find 133 compounds with improved binding affinity, including 16 compounds with better than 100-fold binding affinity improvement. We obtained a hit rate that outperforms that expected of traditional expert medicinal chemist-guided campaigns. Thus, we demonstrate that the combination of AL and AutoML with free energy simulations provides at least 20× speedup relative to the naïve brute force approaches.
Affiliation(s)
- Filipp Gusev
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
| | - Evgeny Gutkin
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
| | - Maria G Kurnikova
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
| | - Olexandr Isayev
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
| |
|
29
|
Breznik M, Ge Y, Bluck JP, Briem H, Hahn DF, Christ CD, Mortier J, Mobley DL, Meier K. Prioritizing Small Sets of Molecules for Synthesis through in-silico Tools: A Comparison of Common Ranking Methods. ChemMedChem 2023; 18:e202200425. [PMID: 36240514 PMCID: PMC9868080 DOI: 10.1002/cmdc.202200425] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 08/01/2022] [Revised: 10/10/2022] [Indexed: 01/26/2023]
Abstract
Prioritizing molecules for synthesis is a key role of computational methods within medicinal chemistry. Multiple tools exist for ranking molecules, from the cheap and popular molecular docking methods to more computationally expensive molecular-dynamics (MD)-based methods. It is often questioned whether the accuracy of the more rigorous methods justifies the higher computational cost and associated calculation time. Here, we compared the performance on ranking the binding of small molecules for seven scoring functions from five docking programs, one end-point method (MM/GBSA), and two MD-based free energy methods (PMX, FEP+). We investigated 16 pharmaceutically relevant targets with a total of 423 known binders. The performance of docking methods for ligand ranking was strongly system dependent. We observed that MD-based methods predominantly outperformed docking algorithms and MM/GBSA calculations. Based on our results, we recommend the application of MD-based free energy methods for prioritization of molecules for synthesis in lead optimization, whenever feasible.
Affiliation(s)
- Marko Breznik
- Computational Molecular Design, Pharmaceuticals, R&D, Bayer AG, 13342 Berlin, Germany
| | - Yunhui Ge
- Department of Pharmaceutical Sciences, University of California, Irvine, CA 92697, USA
| | - Joseph P. Bluck
- Computational Molecular Design, Pharmaceuticals, R&D, Bayer AG, 13342 Berlin, Germany
| | - Hans Briem
- Computational Molecular Design, Pharmaceuticals, R&D, Bayer AG, 13342 Berlin, Germany
| | - David F. Hahn
- Computational Chemistry, Janssen Research & Development, Turnhoutseweg 30, Beerse B-2340, Belgium
| | - Clara D. Christ
- Molecular Design, Pharmaceuticals, R&D, Bayer AG, 13342 Berlin, Germany
| | - Jérémie Mortier
- Computational Molecular Design, Pharmaceuticals, R&D, Bayer AG, 13342 Berlin, Germany
| | - David L. Mobley
- Department of Pharmaceutical Sciences, University of California, Irvine, CA 92697, USA
- Department of Chemistry, University of California, Irvine, CA 92697, USA
| | - Katharina Meier
- Computational Life Science Technology Functions, Crop Science, R&D, Bayer AG, 40789 Monheim, Germany
| |
|
30
|
Gumede NJ. Pathfinder-Driven Chemical Space Exploration and Multiparameter Optimization in Tandem with Glide/IFD and QSAR-Based Active Learning Approach to Prioritize Design Ideas for FEP+ Calculations of SARS-CoV-2 PLpro Inhibitors. Molecules 2022; 27:8569. [PMID: 36500659 PMCID: PMC9741453 DOI: 10.3390/molecules27238569] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2022] [Revised: 11/25/2022] [Accepted: 11/30/2022] [Indexed: 12/12/2022] Open
Abstract
The global pandemic caused by the SARS-CoV-2 virus, which started in 2020 and has wreaked havoc on humanity, continues to this day. The negative impact of travel restrictions and lockdowns has underscored the importance of our preparedness for future pandemics. The main thrust of this work was to address this need by traversing chemical space to design inhibitors that target the SARS-CoV-2 papain-like protease (PLpro). Pathfinder-based retrosynthesis analysis was used to generate analogs of GRL-0617 using commercially available building blocks by replacing the naphthalene moiety. A total of 10 models were built using active learning QSAR, which achieved good statistical results, with R² > 0.70, Q² > 0.64, standard deviation < 0.30, and RMSE < 0.31 on average across all models. A total of 35 ideas were further prioritized for FEP+ calculations. The FEP+ results revealed that compound 45 was the most active compound in this series with a ΔG of −7.28 ± 0.96 kcal/mol. Compound 5 exhibited a ΔG of −6.78 ± 1.30 kcal/mol. The inactive compounds in this series were compound 91 and compound 23, with ΔG values of −5.74 ± 1.06 and −3.11 ± 1.45 kcal/mol, respectively. The combined strategy employed here is envisaged to be of great utility in multiparameter lead optimization efforts, to traverse chemical space, maintaining and/or improving the potency as well as the property space of synthetically aware design ideas.
Affiliation(s)
- Njabulo Joyfull Gumede
- Department of Chemistry, Mangosuthu University of Technology, P.O. Box 12363, Jacobs 4026, South Africa
| |
|
31
|
Khalak Y, Tresadern G, Hahn DF, de Groot BL, Gapsys V. Chemical Space Exploration with Active Learning and Alchemical Free Energies. J Chem Theory Comput 2022; 18:6259-6270. [PMID: 36148968 PMCID: PMC9558370 DOI: 10.1021/acs.jctc.2c00752] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 07/21/2022] [Indexed: 11/30/2022]
Abstract
Drug discovery can be thought of as a search for a needle in a haystack: searching through a large chemical space for the most active compounds. Computational techniques can narrow the search space for experimental follow up, but even they become unaffordable when evaluating large numbers of molecules. Therefore, machine learning (ML) strategies are being developed as computationally cheaper complementary techniques for navigating and triaging large chemical libraries. Here, we explore how an active learning protocol can be combined with first-principles based alchemical free energy calculations to identify high affinity phosphodiesterase 2 (PDE2) inhibitors. We first calibrate the procedure using a set of experimentally characterized PDE2 binders. The optimized protocol is then used prospectively on a large chemical library to navigate toward potent inhibitors. In the active learning cycle, at every iteration a small fraction of compounds is probed by alchemical calculations and the obtained affinities are used to train ML models. With successive rounds, high affinity binders are identified by explicitly evaluating only a small subset of compounds in a large chemical library, thus providing an efficient protocol that robustly identifies a large fraction of true positives.
Affiliation(s)
- Yuriy Khalak
- Computational Biomolecular Dynamics Group, Department of Theoretical and Computational Biophysics, Max Planck Institute for Multidisciplinary Sciences, Am Fassberg 11, D-37077 Göttingen, Germany
| | - Gary Tresadern
- Computational Chemistry, Janssen Research & Development, Janssen Pharmaceutica N. V., Turnhoutseweg 30, 2340 Beerse, Belgium
| | - David F. Hahn
- Computational Chemistry, Janssen Research & Development, Janssen Pharmaceutica N. V., Turnhoutseweg 30, 2340 Beerse, Belgium
| | - Bert L. de Groot
- Computational Biomolecular Dynamics Group, Department of Theoretical and Computational Biophysics, Max Planck Institute for Multidisciplinary Sciences, Am Fassberg 11, D-37077 Göttingen, Germany
| | - Vytautas Gapsys
- Computational Biomolecular Dynamics Group, Department of Theoretical and Computational Biophysics, Max Planck Institute for Multidisciplinary Sciences, Am Fassberg 11, D-37077 Göttingen, Germany
| |
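The active learning cycle summarized in the Khalak et al. abstract above (evaluate a small batch with an expensive method, retrain a cheap model, pick the next batch) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the one-dimensional "descriptor", the `oracle` function standing in for an alchemical free energy calculation, the 1-nearest-neighbour surrogate, and the greedy selection rule. None of these are taken from the paper.

```python
import random

def oracle(x):
    """Hypothetical stand-in for an expensive affinity calculation (lower is better)."""
    return (x - 0.7) ** 2

def predict(x, labeled):
    """Toy surrogate: reuse the value of the closest already-evaluated compound."""
    nearest = min(labeled, key=lambda item: abs(item[0] - x))
    return nearest[1]

def active_learning(library, n_rounds=5, batch_size=5, seed=0):
    rng = random.Random(seed)
    pool = list(library)
    rng.shuffle(pool)
    # Round 0: evaluate a random batch to seed the surrogate model.
    labeled = [(x, oracle(x)) for x in pool[:batch_size]]
    pool = pool[batch_size:]
    for _ in range(n_rounds):
        # Greedy acquisition: rank the unevaluated pool by predicted score.
        pool.sort(key=lambda x: predict(x, labeled))
        batch, pool = pool[:batch_size], pool[batch_size:]
        # Only the selected batch is sent to the expensive oracle.
        labeled.extend((x, oracle(x)) for x in batch)
    return labeled

# A made-up library of 200 "compounds", each described by a single number.
library = [i / 200 for i in range(200)]
labeled = active_learning(library)
best_x, best_score = min(labeled, key=lambda item: item[1])
```

With these settings only 30 of the 200 library members are ever passed to `oracle`, which is the point of the approach: the expensive calculation is reserved for compounds the cheap model ranks highly. A realistic setup would swap in molecular descriptors, a trained regression model, and an acquisition rule that also weighs model uncertainty.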
|
32
|
A pocket-based 3D molecule generative model fueled by experimental electron density. Sci Rep 2022; 12:15100. [PMID: 36068257 PMCID: PMC9448726 DOI: 10.1038/s41598-022-19363-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/17/2022] [Accepted: 08/29/2022] [Indexed: 11/08/2022] Open
Abstract
We report for the first time the use of experimental electron density (ED) as training data for the generation of drug-like three-dimensional molecules based on the structure of a target protein pocket. Similar to a structural biologist building molecules based on their ED, our model functions with two main components: a generative adversarial network (GAN) to generate the ligand ED in the input pocket and an ED interpretation module for molecule generation. The model was tested on three targets: a kinase (hematopoietic progenitor kinase 1), protease (SARS-CoV-2 main protease), and nuclear receptor (vitamin D receptor), and evaluated with a reference dataset composed of over 8000 compounds that have their activities reported in the literature. The evaluation considered the chemical validity, chemical space distribution-based diversity, and similarity with reference active compounds concerning the molecular structure and pocket-binding mode. Our model can generate molecules with similar structures to classical active compounds and novel compounds sharing similar binding modes with active compounds, making it a promising tool for library generation supporting high-throughput virtual screening. The ligand ED generated can also be used to support fragment-based drug design. Our model is available as an online service to academic users via https://edmg.stonewise.cn/#/create .
|
33
|
Petrović D, Scott JS, Bodnarchuk MS, Lorthioir O, Boyd S, Hughes GM, Lane J, Wu A, Hargreaves D, Robinson J, Sadowski J. Virtual Screening in the Cloud Identifies Potent and Selective ROS1 Kinase Inhibitors. J Chem Inf Model 2022; 62:3832-3843. [PMID: 35920716 DOI: 10.1021/acs.jcim.2c00644] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/28/2022]
Abstract
ROS1 rearrangements account for 1-2% of non-small cell lung cancer patients, yet there are no specifically designed, selective ROS1 therapies in the clinic. Previous knowledge of potent ROS1 inhibitors with selectivity over TrkA, a selected antitarget, enabled virtual screening as a hit finding approach in this project. The ligand-based virtual screening was focused on identifying molecules with a similar 3D shape and pharmacophore to the known actives. To that end, we turned to the AstraZeneca virtual library, estimated to cover 10¹⁵ synthesizable make-on-demand molecules. We used cloud computing-enabled FastROCS technology to search the enumerated 10¹⁰ subset of the full virtual space. A small number of specific libraries were prioritized based on the compound properties and a medicinal chemistry assessment and further enumerated with available building blocks. Following the docking evaluation to the ROS1 structure, the most promising hits were synthesized and tested, resulting in the identification of several potent and selective series. The best among them gave a nanomolar ROS1 inhibitor with over 1000-fold selectivity over TrkA and, from the preliminary established SAR, these have the potential to be further optimized. Our prospective study describes how conceptually simple shape-matching approaches can identify potent and selective compounds by searching ultralarge virtual libraries, demonstrating the applicability of such workflows and their importance in early drug discovery.
Affiliation(s)
- Dušan Petrović
- Hit Discovery, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Gothenburg 431 50, Sweden
| | - James S Scott
- Oncology R&D, AstraZeneca, Cambridge CB4 0WG, United Kingdom
| | | | | | - Scott Boyd
- Oncology R&D, AstraZeneca, Cambridge CB4 0WG, United Kingdom
| | - George M Hughes
- Discovery Biology, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge CB2 0AA, United Kingdom
| | - Jordan Lane
- Discovery Biology, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge CB2 0AA, United Kingdom
| | - Allan Wu
- Mechanistic and Structural Biology, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Waltham, Massachusetts 02451, United States
| | - David Hargreaves
- Mechanistic and Structural Biology, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge CB2 0AA, United Kingdom
| | - James Robinson
- Mechanistic and Structural Biology, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge CB2 0AA, United Kingdom
| | - Jens Sadowski
- Hit Discovery, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Gothenburg 431 50, Sweden
| |
|
34
|
Zhu Z, Deng Z, Wang Q, Wang Y, Zhang D, Xu R, Guo L, Wen H. Simulation and Machine Learning Methods for Ion-Channel Structure Determination, Mechanistic Studies and Drug Design. Front Pharmacol 2022; 13:939555. [PMID: 35837274 PMCID: PMC9275593 DOI: 10.3389/fphar.2022.939555] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/09/2022] [Accepted: 06/07/2022] [Indexed: 11/13/2022] Open
Abstract
Ion channels are expressed in almost all living cells, where they control communication into and out of the cell, making them ideal drug targets, especially for central nervous system diseases. However, owing to their dynamic nature and the presence of a membrane environment, ion channels have remained difficult targets over the past decades. Recent advances in cryo-electron microscopy and computational methods have shed light on this issue. An explosion in high-resolution ion channel structures has paved the way for structure-based rational drug design, and state-of-the-art simulation and machine learning techniques have dramatically improved the efficiency and effectiveness of computer-aided drug design. Here we present an overview of how simulation and machine learning-based methods have fundamentally changed ion channel-related drug design at different levels, as well as the emerging trends in the field.
Affiliation(s)
- Zhengdan Zhu
- Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Beijing Institute of Big Data Research, Beijing, China
| | - Zhenfeng Deng
- DP Technology, Beijing, China
- School of Pharmaceutical Sciences, Peking University, Beijing, China
| | | | | | - Duo Zhang
- Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- DP Technology, Beijing, China
| | - Ruihan Xu
- DP Technology, Beijing, China
- National Engineering Research Center of Visual Technology, Peking University, Beijing, China
| | | | - Han Wen
- DP Technology, Beijing, China
| |
|
35
|
Goldman B, Kearnes S, Kramer T, Riley P, Walters WP. Defining Levels of Automated Chemical Design. J Med Chem 2022; 65:7073-7087. [PMID: 35511951 PMCID: PMC9150065 DOI: 10.1021/acs.jmedchem.2c00334] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 03/02/2022] [Indexed: 01/07/2023]
Abstract
One application area of computational methods in drug discovery is the automated design of small molecules. Despite the large number of publications describing methods and their application in both retrospective and prospective studies, there is a lack of agreement on terminology and key attributes to distinguish these various systems. We introduce Automated Chemical Design (ACD) Levels to clearly define the level of autonomy along the axes of ideation and decision making. To fully illustrate this framework, we provide literature exemplars and place some notable methods and applications into the levels. The ACD framework provides a common language for describing automated small molecule design systems and enables medicinal chemists to better understand and evaluate such systems.
Affiliation(s)
- Brian Goldman
- Relay Therapeutics, 399 Binney Street, Cambridge, Massachusetts 02139, United States
| | - Steven Kearnes
- Relay Therapeutics, 399 Binney Street, Cambridge, Massachusetts 02139, United States
| | - Trevor Kramer
- Relay Therapeutics, 399 Binney Street, Cambridge, Massachusetts 02139, United States
| | - Patrick Riley
- Relay Therapeutics, 399 Binney Street, Cambridge, Massachusetts 02139, United States
| | - W. Patrick Walters
- Relay Therapeutics, 399 Binney Street, Cambridge, Massachusetts 02139, United States
| |
|
36
|
Moshawih S, Goh HP, Kifli N, Idris AC, Yassin H, Kotra V, Goh KW, Liew KB, Ming LC. Synergy between machine learning and natural products cheminformatics: Application to the lead discovery of anthraquinone derivatives. Chem Biol Drug Des 2022; 100:185-217. [PMID: 35490393 DOI: 10.1111/cbdd.14062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2022] [Revised: 04/15/2022] [Accepted: 04/23/2022] [Indexed: 11/28/2022]
Abstract
Cheminformatics utilizing machine learning (ML) techniques has opened up a new horizon in drug discovery. This is owing to vast chemical space expansion with rocketing numbers of expected hits and lead compounds that match druggable macromolecular targets, in particular from natural compounds. Due to the natural products' (NP) structural complexity, uniqueness, and diversity, they could occupy a bigger space in pharmaceuticals, allowing the industry to pursue more selective leads in the nanomolar range of binding affinity. ML is an essential part of each step of the drug design pipeline, such as target prediction, compound library preparation, and lead optimization. Notably, molecular mechanics and dynamics simulations, induced docking, and free energy perturbations are essential in predicting best binding poses, binding free energy values, and molecular mechanics force fields. Those applications have benefited from artificial intelligence (AI), which decreases the computational costs required for such costly simulations. This review aimed to describe chemical space and compound libraries related to NPs. High-throughput screening utilized for fractionating NPs and high-throughput virtual screening, together with their strategies and significance, are reviewed. Particular emphasis was given to AI approaches, ML tools, algorithms, and techniques, especially in drug discovery of macrocyclic compounds and approaches in computer-aided and ML-based drug discovery. Anthraquinone derivatives were discussed as a source of new lead compounds that can be developed using ML tools for diverse medicinal uses such as cancer, infectious diseases, and metabolic disorders. Furthermore, the power of principal component analysis in understanding relevant protein conformations, and molecular modeling of protein-ligand interaction were also presented. Apart from being a concise reference for cheminformatics, this review is a useful text to understand the application of ML-based algorithms to molecular dynamics simulation and in silico absorption, distribution, metabolism, excretion, and toxicity prediction.
Affiliation(s)
- Said Moshawih
- PAP Rashidah Sa'adatul Bolkiah Institute of Health Sciences, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
| | - Hui Poh Goh
- PAP Rashidah Sa'adatul Bolkiah Institute of Health Sciences, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
| | - Nurolaini Kifli
- PAP Rashidah Sa'adatul Bolkiah Institute of Health Sciences, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
| | - Azam Che Idris
- Faculty of Integrated Technologies, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
| | - Hayati Yassin
- Faculty of Integrated Technologies, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
| | - Vijay Kotra
- Faculty of Pharmacy, Quest International University, Perak, Malaysia
| | - Khang Wen Goh
- Faculty of Data Science and Information Technology, INTI International University, Nilai, Malaysia
| | - Kai Bin Liew
- Faculty of Pharmacy, University of Cyberjaya, Cyberjaya, Malaysia
| | - Long Chiau Ming
- PAP Rashidah Sa'adatul Bolkiah Institute of Health Sciences, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
| |
|
37
|
Warr WA, Nicklaus MC, Nicolaou CA, Rarey M. Exploration of Ultralarge Compound Collections for Drug Discovery. J Chem Inf Model 2022; 62:2021-2034. [PMID: 35421301 DOI: 10.1021/acs.jcim.2c00224] [Citation(s) in RCA: 46] [Impact Index Per Article: 23.0] [Indexed: 02/07/2023]
Abstract
Designing new medicines more cheaply and quickly is tightly linked to the quest of exploring chemical space more widely and efficiently. Chemical space is monumentally large, but recent advances in computer software and hardware have enabled researchers to navigate virtual chemical spaces containing billions of chemical structures. This review specifically concerns collections of many millions or even billions of enumerated chemical structures as well as even larger chemical spaces that are not fully enumerated. We present examples of chemical libraries and spaces and the means used to construct them, and we discuss new technologies for searching huge libraries and for searching combinatorially in chemical space. We also cover space navigation techniques and consider new approaches to de novo drug design and the impact of the "autonomous laboratory" on synthesis of designed compounds. Finally, we summarize some other challenges and opportunities for the future.
Affiliation(s)
- Wendy A Warr
- Wendy Warr & Associates, 6 Berwick Court, Holmes Chapel, Crewe, Cheshire CW4 7HZ, United Kingdom
| | - Marc C Nicklaus
- NCI, NIH, CADD Group, NCI-Frederick, Frederick, Maryland 21702, United States
| | - Christos A Nicolaou
- Discovery Chemistry, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana 46285, United States
| | - Matthias Rarey
- Universität Hamburg, ZBH Center for Bioinformatics, 20146 Hamburg, Germany
| |
|
38
|
Bos PH, Houang EM, Ranalli F, Leffler AE, Boyles NA, Eyrich VA, Luria Y, Katz D, Tang H, Abel R, Bhat S. AutoDesigner, a De Novo Design Algorithm for Rapidly Exploring Large Chemical Space for Lead Optimization: Application to the Design and Synthesis of d-Amino Acid Oxidase Inhibitors. J Chem Inf Model 2022; 62:1905-1915. [PMID: 35417149 DOI: 10.1021/acs.jcim.2c00072] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 12/13/2022]
Abstract
The lead optimization stage of a drug discovery program generally involves the design, synthesis, and assaying of hundreds to thousands of compounds. The design phase is usually carried out via traditional medicinal chemistry approaches and/or structure-based drug design (SBDD) when suitable structural information is available. Two of the major limitations of this approach are (1) difficulty in rapidly designing potent molecules that adhere to myriad project criteria, or the multiparameter optimization (MPO) problem, and (2) the relatively small number of molecules explored compared to the vast size of chemical space. To address these limitations, we have developed AutoDesigner, a de novo design algorithm. AutoDesigner employs a cloud-native, multistage search algorithm to carry out successive rounds of chemical space exploration and filtering. Millions to billions of virtual molecules are explored and optimized while adhering to a customizable set of project criteria such as physicochemical properties and potency. Additionally, the algorithm only requires a single ligand with measurable affinity and a putative binding model as a starting point, making it amenable to the early stages of an SBDD project where limited data are available. To assess the effectiveness of AutoDesigner, we applied it to the design of novel inhibitors of d-amino acid oxidase (DAO), a target for the treatment of schizophrenia. AutoDesigner was able to generate and efficiently explore over 1 billion molecules to successfully address a variety of project goals. The compounds generated by AutoDesigner that were synthesized and assayed (1) simultaneously met not only physicochemical criteria, clearance, and central nervous system (CNS) penetration (Kp,uu) cutoffs but also potency thresholds and (2) fully utilize structural data to discover and explore novel interactions and a previously unexplored subpocket in the DAO active site. 
The reported data demonstrate that AutoDesigner can play a key role in accelerating the discovery of novel, potent chemical matter within the constraints of a given drug discovery lead optimization campaign.
Affiliation(s)
- Pieter H Bos
  - Schrödinger, Inc., 1540 Broadway, 24th Floor, New York, New York 10036, United States
- Evelyne M Houang
  - Schrödinger, Inc., 1540 Broadway, 24th Floor, New York, New York 10036, United States
- Fabio Ranalli
  - Schrödinger, Inc., 1540 Broadway, 24th Floor, New York, New York 10036, United States
- Abba E Leffler
  - Schrödinger, Inc., 1540 Broadway, 24th Floor, New York, New York 10036, United States
- Nicholas A Boyles
  - Schrödinger, Inc., 1540 Broadway, 24th Floor, New York, New York 10036, United States
- Volker A Eyrich
  - Schrödinger, Inc., 1540 Broadway, 24th Floor, New York, New York 10036, United States
- Yuval Luria
  - Schrödinger, Inc., 1540 Broadway, 24th Floor, New York, New York 10036, United States
- Dana Katz
  - Schrödinger, Inc., 1540 Broadway, 24th Floor, New York, New York 10036, United States
- Haifeng Tang
  - Schrödinger, Inc., 1540 Broadway, 24th Floor, New York, New York 10036, United States
- Robert Abel
  - Schrödinger, Inc., 1540 Broadway, 24th Floor, New York, New York 10036, United States
- Sathesh Bhat
  - Schrödinger, Inc., 1540 Broadway, 24th Floor, New York, New York 10036, United States
39
Bolcato G, Heid E, Boström J. On the Value of Using 3D Shape and Electrostatic Similarities in Deep Generative Methods. J Chem Inf Model 2022; 62:1388-1398. [PMID: 35271260 PMCID: PMC8965872 DOI: 10.1021/acs.jcim.1c01535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Multiparameter optimization, the heart of drug design, is still an open challenge. Thus, improved methods for automated compound design with multiple controlled properties are desired. Here, we present a significant extension to our previously described fragment-based reinforcement learning method (DeepFMPO) for the generation of novel molecules with optimal properties. As before, the generative process outputs optimized molecules similar to the input structures, now with the improved feature of replacing parts of these molecules with fragments of similar three-dimensional (3D) shape and electrostatics. We developed and benchmarked a new Python package, ESP-Sim, for the comparison of the electrostatic potential and the molecular shape, allowing the calculation of high-quality partial charges (e.g., RESP with B3LYP/6-31G**) obtained using the quantum chemistry program Psi4. By performing comparisons of 3D fragments, we can simulate 3D properties while overcoming the notoriously difficult step of accurately describing bioactive conformations. The new improved generative (DeepFMPO v3D) method is demonstrated with a scaffold-hopping exercise identifying CDK2 bioisosteres. The code is open-source and freely available.
Affiliation(s)
- Giovanni Bolcato
  - Molecular Modeling Section, University of Padova, 35131 Padova, Italy
- Esther Heid
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, 02139 Massachusetts, United States
- Jonas Boström
  - Medicinal Chemistry, Early CVRM, BioPharmaceuticals R&D, AstraZeneca, 431 50 Mölndal, Sweden
40
Ferguson AL, Brown KA. Data-Driven Design and Autonomous Experimentation in Soft and Biological Materials Engineering. Annu Rev Chem Biomol Eng 2022; 13:25-44. [PMID: 35236085 DOI: 10.1146/annurev-chembioeng-092120-020803] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 11/09/2022]
Abstract
This article reviews recent developments in the applications of machine learning, data-driven modeling, transfer learning, and autonomous experimentation for the discovery, design, and optimization of soft and biological materials. The design and engineering of molecules and molecular systems have long been a preoccupation of chemical and biomolecular engineers using a variety of computational and experimental techniques. Increasingly, researchers have looked to emerging and established tools in artificial intelligence and machine learning to integrate with established approaches in chemical science to realize powerful, efficient, and in some cases autonomous platforms for molecular discovery, materials engineering, and process optimization. This review summarizes the basic principles underpinning these techniques and highlights recent successful example applications in autonomous materials discovery, transfer learning, and multi-fidelity active learning. Expected final online publication date for the Annual Review of Chemical and Biomolecular Engineering, Volume 13 is October 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Andrew L Ferguson
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, USA
- Keith A Brown
  - Mechanical Engineering, Boston University, Boston, Massachusetts 02215, USA
41
Iovanac NC, MacKnight R, Savoie BM. Actively Searching: Inverse Design of Novel Molecules with Simultaneously Optimized Properties. J Phys Chem A 2022; 126:333-340. [PMID: 34985908 DOI: 10.1021/acs.jpca.1c08191] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 12/23/2022]
Abstract
Combining quantum chemistry characterizations with generative machine learning models has the potential to accelerate molecular discovery. In this paradigm, quantum chemistry acts as a relatively cost-effective oracle for evaluating the properties of particular molecules, while generative models provide a means of sampling chemical space based on learned structure-function relationships. For practical applications, multiple potentially orthogonal properties must be optimized in tandem during a discovery workflow. This carries additional difficulties associated with the specificity of the targets and the ability for the model to reconcile all properties simultaneously. Here, we demonstrate an active learning approach to improve the performance of multi-target generative chemical models. We first demonstrate the effectiveness of a set of baseline models trained on single property prediction tasks in generating novel compounds (i.e., not present in the training data) with various property targets, including both interpolative and extrapolative generation scenarios. For property ranges where accurate targeting proves difficult, the novel compounds suggested by the model are characterized using quantum chemistry and the new molecules closest to expressing the desired properties are fed back into the generative model for additional training. This gradually improves the generative models' understanding of targeted areas of chemical space and shifts the distribution of the generated compounds toward the targeted values. We then demonstrate the effectiveness of this active learning approach in generating compounds with multiple chemical constraints, including vertical ionization potential, electron affinity, and dipole moment targets, and validate the results at the ωB97X-D3/def2-TZVP level. 
This method requires no modifications to extant generative approaches, but rather utilizes their inherent generative and predictive aspects for self-refinement, and can be applied to situations where any number of properties with varying degrees of correlation must be optimized simultaneously.
Affiliation(s)
- Nicolae C Iovanac
  - Charles D. Davidson School of Chemical Engineering, Purdue University, 480 Stadium Mall Drive, West Lafayette, Indiana 47906, United States
- Robert MacKnight
  - Charles D. Davidson School of Chemical Engineering, Purdue University, 480 Stadium Mall Drive, West Lafayette, Indiana 47906, United States
- Brett M Savoie
  - Charles D. Davidson School of Chemical Engineering, Purdue University, 480 Stadium Mall Drive, West Lafayette, Indiana 47906, United States
42
Kwak HS, An Y, Giesen DJ, Hughes TF, Brown CT, Leswing K, Abroshan H, Halls MD. Design of Organic Electronic Materials With a Goal-Directed Generative Model Powered by Deep Neural Networks and High-Throughput Molecular Simulations. Front Chem 2022; 9:800370. [PMID: 35111730 PMCID: PMC8802168 DOI: 10.3389/fchem.2021.800370] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 10/23/2021] [Accepted: 12/15/2021] [Indexed: 11/16/2022]
Abstract
In recent years, generative machine learning approaches have attracted significant attention as an enabling approach for designing novel molecular materials with minimal design bias and thereby realizing more directed design for a specific materials property space. Further, data-driven approaches have emerged as a new tool to accelerate the development of novel organic electronic materials for organic light-emitting diode (OLED) applications. We demonstrate and validate a goal-directed generative machine learning framework based on a recurrent neural network (RNN) deep reinforcement learning approach for the design of hole transporting OLED materials. These large-scale molecular simulations also demonstrate a rapid, cost-effective method to identify new materials in OLEDs while also enabling expansion into many other verticals such as catalyst design, aerospace, life science, and petrochemicals.
Affiliation(s)
- H. Shaun Kwak
  - Schrödinger, Inc., Portland, OR, United States
  - *Correspondence: H. Shaun Kwak; Yuling An
- Yuling An
  - Schrödinger, Inc., New York, NY, United States
  - *Correspondence: H. Shaun Kwak; Yuling An
43
Abstract
Artificial intelligence (AI) tools find increasing application in drug discovery, supporting every stage of the Design-Make-Test-Analyse (DMTA) cycle. The main focus of this chapter is their application in molecular generation with the aid of deep neural networks (DNN). We present a historical overview of the main advances in the field. We analyze the concepts of distribution and goal-directed learning and then highlight some of the recent applications of generative models in drug design, with a focus on research work from the biopharmaceutical industry. We present in more detail REINVENT, an open-source software developed within our group at AstraZeneca and the main platform for AI molecular design support for a number of medicinal chemistry projects in the company, and we also demonstrate some of our work in library design. Finally, we present some of the main challenges in the application of AI in drug discovery and different approaches to respond to these challenges, which define areas for current and future work.
44
Andrianov GV, Ong WJG, Serebriiskii I, Karanicolas J. Efficient Hit-to-Lead Searching of Kinase Inhibitor Chemical Space via Computational Fragment Merging. J Chem Inf Model 2021; 61:5967-5987. [PMID: 34762402 PMCID: PMC8865965 DOI: 10.1021/acs.jcim.1c00630] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/30/2022]
Abstract
In early-stage drug discovery, the hit-to-lead optimization (or "hit expansion") stage entails starting from a newly identified active compound and improving its potency or other properties. Traditionally, this process relies on synthesizing and evaluating a series of analogues to build up structure-activity relationships. Here, we describe a computational strategy focused on kinase inhibitors, intended to expedite the process of identifying analogues with improved potency. Our protocol begins from an inhibitor of the target kinase and generalizes the synthetic route used to access it. By searching for commercially available replacements for the individual building blocks used to make the parent inhibitor, we compile an enumerated library of compounds that can be accessed using the same chemical transformations; these huge libraries can exceed many millions, or billions, of compounds. Because the resulting libraries are much too large for explicit virtual screening, we instead consider alternate approaches to identify the top-scoring compounds. We find that contributions from individual substituents are well described by a pairwise additivity approximation, provided that the corresponding fragments position their shared core in precisely the same way relative to the binding site. This key insight allows us to determine which fragments are suitable for merging into single new compounds and which are not. Further, the use of pairwise approximation allows interaction energies to be assigned to each compound in the library without the need for any further structure-based modeling: interaction energies instead can be reliably estimated from the energies of the component fragments, and the reduced computational requirements allow for flexible energy minimizations that allow the kinase to respond to each substitution.
We demonstrate this protocol using libraries built from six representative kinase inhibitors drawn from the literature, which target five different kinases: CDK9, CHK1, CDK2, EGFR T790M, and ACK1. In each example, the enumerated library includes additional analogues reported by the original study to have activity, and these analogues are successfully prioritized within the library. We envision that the insights from this work can facilitate the rapid assembly and screening of increasingly large libraries for focused hit-to-lead optimization. To enable adoption of these methods and to encourage further analyses, we disseminate the computational tools needed to deploy this protocol.
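The idea of scoring an enumerated library from per-fragment energies can be illustrated with a small sketch. It is not the authors' protocol: the abstract does not give the exact form of the additivity approximation, so the sketch assumes, purely for illustration, that the energy of a merged compound is a core term plus one contribution per substituent. The function name `score_library`, the substituent labels, and the energy values are all hypothetical.

```python
from itertools import product

def score_library(r1_energies, r2_energies, core_energy=0.0):
    """Estimate an energy for every R1 x R2 combination.

    Assumption (illustrative only): contributions are additive, so
    E(core + R1 + R2) ~= core_energy + E(R1) + E(R2).
    Lower energy is treated as better.
    """
    scored = []
    for (r1, e1), (r2, e2) in product(r1_energies.items(), r2_energies.items()):
        scored.append((core_energy + e1 + e2, r1, r2))
    scored.sort()  # best (most negative) estimate first
    return scored

# Hypothetical per-fragment interaction energies (arbitrary units).
r1 = {"Me": -1.0, "Ph": -2.5}
r2 = {"OH": -0.5, "NH2": -1.5}
ranking = score_library(r1, r2)
best_energy, best_r1, best_r2 = ranking[0]
```

Under this assumption only one evaluation per fragment is needed, whereas the number of ranked compounds grows with the product of the fragment counts, which is what makes very large enumerated libraries tractable.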
Affiliation(s)
- Grigorii V. Andrianov
  - Program in Molecular Therapeutics, Fox Chase Cancer Center, Philadelphia, PA 19111-2497
  - Institute of Fundamental Medicine and Biology, Kazan Federal University, Kazan, Russia, 420008
- Wern Juin Gabriel Ong
  - Program in Molecular Therapeutics, Fox Chase Cancer Center, Philadelphia, PA 19111-2497
  - Bowdoin College, Brunswick, ME 04011
- Ilya Serebriiskii
  - Program in Molecular Therapeutics, Fox Chase Cancer Center, Philadelphia, PA 19111-2497
  - Institute of Fundamental Medicine and Biology, Kazan Federal University, Kazan, Russia, 420008
- John Karanicolas
  - Program in Molecular Therapeutics, Fox Chase Cancer Center, Philadelphia, PA 19111-2497
  - To whom correspondence should be addressed (215-728-7067)
45
Frye L, Bhat S, Akinsanya K, Abel R. From computer-aided drug discovery to computer-driven drug discovery. DRUG DISCOVERY TODAY. TECHNOLOGIES 2021; 39:111-117. [PMID: 34906321 DOI: 10.1016/j.ddtec.2021.08.001] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 05/03/2021] [Revised: 07/06/2021] [Accepted: 08/02/2021] [Indexed: 12/16/2022]
Abstract
Computational chemistry and structure-based design have traditionally been viewed as a subset of tools that could aid acceleration of the drug discovery process, but were not commonly regarded as a driving force in small molecule drug discovery. In the last decade however, there have been dramatic advances in the field, including (1) development of physics-based computational approaches to accurately predict a broad variety of endpoints from potency to solubility, (2) improvements in artificial intelligence and deep learning methods and (3) dramatic increases in computational power with the advent of GPUs and cloud computing, resulting in the ability to explore and accurately profile vast amounts of drug-like chemical space in silico. There have also been simultaneous advancements in structural biology such as cryogenic electron microscopy (cryo-EM) and computational protein-structure prediction, allowing for access to many more high-resolution 3D structures of novel drug-receptor complexes. The convergence of these breakthroughs has positioned structurally-enabled computational methods to be a driving force behind the discovery of novel small molecule therapeutics. This review will give a broad overview of the synergies in recent advances in the fields of computational chemistry, machine learning and structural biology, in particular in the areas of hit identification, hit-to-lead, and lead optimization.
Affiliation(s)
- Leah Frye
  - Schrödinger Inc., 120 West 45th Street, 17th Floor, New York, NY 10036-4041, United States
- Sathesh Bhat
  - Schrödinger Inc., 120 West 45th Street, 17th Floor, New York, NY 10036-4041, United States
- Karen Akinsanya
  - Schrödinger Inc., 120 West 45th Street, 17th Floor, New York, NY 10036-4041, United States
- Robert Abel
  - Schrödinger Inc., 120 West 45th Street, 17th Floor, New York, NY 10036-4041, United States
46
Bhati AP, Wan S, Alfè D, Clyde AR, Bode M, Tan L, Titov M, Merzky A, Turilli M, Jha S, Highfield RR, Rocchia W, Scafuri N, Succi S, Kranzlmüller D, Mathias G, Wifling D, Donon Y, Di Meglio A, Vallecorsa S, Ma H, Trifan A, Ramanathan A, Brettin T, Partin A, Xia F, Duan X, Stevens R, Coveney PV. Pandemic drugs at pandemic speed: infrastructure for accelerating COVID-19 drug discovery with hybrid machine learning- and physics-based simulations on high-performance computers. Interface Focus 2021; 11:20210018. [PMID: 34956592 PMCID: PMC8504892 DOI: 10.1098/rsfs.2021.0018] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Accepted: 09/07/2021] [Indexed: 12/13/2022]
Abstract
The race to meet the challenges of the global pandemic has served as a reminder that the existing drug discovery process is expensive, inefficient and slow. There is a major bottleneck in screening the vast number of potential small molecules to shortlist lead compounds for antiviral drug development. New opportunities to accelerate drug discovery lie at the interface between machine learning methods, in this case, developed for linear accelerators, and physics-based methods. The two in silico methods each have their own advantages and limitations which, interestingly, complement each other. Here, we present an innovative infrastructural development that combines both approaches to accelerate drug discovery. The scale of the potential resulting workflow is such that it is dependent on supercomputing to achieve extremely high throughput. We have demonstrated the viability of this workflow for the study of inhibitors for four COVID-19 target proteins and our ability to perform the required large-scale calculations to identify lead antiviral compounds through repurposing on a variety of supercomputers.
Affiliation(s)
- Agastya P. Bhati
  - Centre for Computational Science, University College London, Gordon Street, London WC1H 0AJ, UK
- Shunzhou Wan
  - Centre for Computational Science, University College London, Gordon Street, London WC1H 0AJ, UK
- Dario Alfè
  - Department of Earth Sciences, London Centre for Nanotechnology and Thomas Young Centre at University College London, University College London, Gower Street, London WC1E 6BT, UK
  - Dipartimento di Fisica Ettore Pancini, Università di Napoli Federico II, Monte Sant'Angelo, Napoli 80126, Italy
- Austin R. Clyde
  - Department of Computer Science, University of Chicago, Chicago, IL, USA
- Mathis Bode
  - Institute for Combustion Technology, RWTH Aachen University, Aachen 52056, Germany
- Li Tan
  - Brookhaven National Laboratory, Upton, NY 11973, USA
- Mikhail Titov
  - Department of Electrical and Computer Engineering, Rutgers, the State University of New Jersey, Piscataway, NJ 08854, USA
- Andre Merzky
  - Department of Electrical and Computer Engineering, Rutgers, the State University of New Jersey, Piscataway, NJ 08854, USA
- Matteo Turilli
  - Department of Electrical and Computer Engineering, Rutgers, the State University of New Jersey, Piscataway, NJ 08854, USA
- Shantenu Jha
  - Brookhaven National Laboratory, Upton, NY 11973, USA
  - Department of Electrical and Computer Engineering, Rutgers, the State University of New Jersey, Piscataway, NJ 08854, USA
- Walter Rocchia
  - Concept Lab, Italian Institute of Technology, Via Melen, Genova, Italy
- Nicola Scafuri
  - Concept Lab, Italian Institute of Technology, Via Melen, Genova, Italy
- Sauro Succi
  - Center for Life Nanosciences at La Sapienza, Italian Institute of Technology, viale Regina Elena, Roma, Italy
- Dieter Kranzlmüller
  - Leibniz Supercomputing Centre (LRZ) of the Bavarian Academy of Sciences and Humanities, Boltzmannstrasse 1, Garching bei München 85748, Germany
- Gerald Mathias
  - Leibniz Supercomputing Centre (LRZ) of the Bavarian Academy of Sciences and Humanities, Boltzmannstrasse 1, Garching bei München 85748, Germany
- David Wifling
  - Leibniz Supercomputing Centre (LRZ) of the Bavarian Academy of Sciences and Humanities, Boltzmannstrasse 1, Garching bei München 85748, Germany
- Heng Ma
  - Data Science and Learning Division, Argonne National Laboratory, Lemont, IL 60439, USA
- Anda Trifan
  - Data Science and Learning Division, Argonne National Laboratory, Lemont, IL 60439, USA
- Arvind Ramanathan
  - Data Science and Learning Division, Argonne National Laboratory, Lemont, IL 60439, USA
- Tom Brettin
  - Computing, Environment and Life Sciences Directorate, Argonne National Laboratory, Lemont, IL 60439, USA
- Alexander Partin
  - Data Science and Learning Division, Argonne National Laboratory, Lemont, IL 60439, USA
- Fangfang Xia
  - Data Science and Learning Division, Argonne National Laboratory, Lemont, IL 60439, USA
- Xiaotan Duan
  - Department of Computer Science, University of Chicago, Chicago, IL, USA
- Rick Stevens
  - Computing, Environment and Life Sciences Directorate, Argonne National Laboratory, Lemont, IL 60439, USA
- Peter V. Coveney
  - Centre for Computational Science, University College London, Gordon Street, London WC1H 0AJ, UK
  - Institute for Informatics, University of Amsterdam, Science Park 904, Amsterdam 1098 XH, The Netherlands
47
Yang Y, Yao K, Repasky MP, Leswing K, Abel R, Shoichet BK, Jerome SV. Efficient Exploration of Chemical Space with Docking and Deep Learning. J Chem Theory Comput 2021; 17:7106-7119. [PMID: 34592101 DOI: 10.1021/acs.jctc.1c00810] [Citation(s) in RCA: 77] [Impact Index Per Article: 25.7] [Indexed: 12/15/2022]
Abstract
With the advent of make-on-demand commercial libraries, the number of purchasable compounds available for virtual screening and assay has grown explosively in recent years, with several libraries eclipsing one billion compounds. Today's screening libraries are larger and more diverse, enabling the discovery of more-potent hit compounds and unlocking new areas of chemical space, represented by new core scaffolds. Applying physics-based in silico screening methods in an exhaustive manner, where every molecule in the library must be enumerated and evaluated independently, is increasingly cost-prohibitive. Here, we introduce a protocol for machine learning-enhanced molecular docking based on active learning to dramatically increase throughput over traditional docking. We leverage a novel selection protocol that strikes a balance between two objectives: (1) identifying the best scoring compounds and (2) exploring a large region of chemical space, demonstrating superior performance compared to a purely greedy approach. Together with automated redocking of the top compounds, this method captures almost all the high scoring scaffolds in the library found by exhaustive docking. This protocol is applied to our recent virtual screening campaigns against the D4 and AMPC targets that produced dozens of highly potent, novel inhibitors, and a blind test against the MT1 target. Our protocol recovers more than 80% of the experimentally confirmed hits with a 14-fold reduction in compute cost, and more than 90% of the hit scaffolds in the top 5% of model predictions, preserving the diversity of the experimentally confirmed hit compounds.
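The trade-off between picking the best-scoring compounds and exploring chemical space can be sketched as a batch-selection step inside an active learning loop. The abstract does not spell out the selection rule, so the split below (top-ranked picks by predicted score plus random picks from the remainder) is an assumption chosen for illustration, not the published protocol. The function name `select_batch`, the `explore_fraction` parameter, and the scores are hypothetical.

```python
import random

def select_batch(predicted_scores, batch_size, explore_fraction=0.2, seed=0):
    """Pick compounds to dock next from model predictions.

    predicted_scores maps compound id -> predicted docking score
    (lower is better). Part of the batch is chosen greedily; the rest
    is sampled at random from the remaining compounds (an assumed,
    simple form of exploration).
    """
    n_explore = int(batch_size * explore_fraction)
    n_greedy = batch_size - n_explore
    ranked = sorted(predicted_scores, key=predicted_scores.get)
    greedy = ranked[:n_greedy]
    remainder = ranked[n_greedy:]
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    explore = rng.sample(remainder, min(n_explore, len(remainder)))
    return greedy + explore

# Hypothetical predicted docking scores for five compounds.
scores = {"a": -9.0, "b": -8.0, "c": -5.0, "d": -3.0, "e": -1.0}
batch = select_batch(scores, batch_size=4, explore_fraction=0.5)
```

In a full loop, the selected batch would be docked, the results added to the training set, and the model retrained before the next selection; setting `explore_fraction=0` recovers a purely greedy strategy.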
Affiliation(s)
- Ying Yang
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94158, United States
- Kun Yao
  - Schrödinger, Inc., 120 West 45th Street, 17th Floor, New York, New York 10036, United States
- Matthew P Repasky
  - Schrödinger, Inc., 101 SW Main Street, #1300, Portland, Oregon 97239, United States
- Karl Leswing
  - Schrödinger, Inc., 120 West 45th Street, 17th Floor, New York, New York 10036, United States
- Robert Abel
  - Schrödinger, Inc., 120 West 45th Street, 17th Floor, New York, New York 10036, United States
- Brian K Shoichet
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94158, United States
- Steven V Jerome
  - Schrödinger, Inc., 10201 Wateridge Cir Suite 220, San Diego, California 92121, United States
48
Machine Learning Applied to the Modeling of Pharmacological and ADMET Endpoints. Methods Mol Biol 2021. [PMID: 34731464 DOI: 10.1007/978-1-0716-1787-8_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2023]
Abstract
The well-known concept of quantitative structure-activity relationships (QSAR) has been gaining significant interest in recent years. Data, descriptors, and algorithms are the main pillars for building useful models that support more efficient drug discovery processes with in silico methods. Significant advances in all three areas are the reason for the regained interest in these models. In this book chapter, we review various machine learning (ML) approaches that make use of measured in vitro/in vivo data for many compounds. We put these in context with other digital drug discovery methods and present some application examples.
49
Hamdy R, Fayed B, Mostafa A, Shama NMA, Mahmoud SH, Mehta CH, Nayak Y, M. Soliman SS. Iterated Virtual Screening-Assisted Antiviral and Enzyme Inhibition Assays Reveal the Discovery of Novel Promising Anti-SARS-CoV-2 with Dual Activity. Int J Mol Sci 2021; 22:9057. [PMID: 34445763 PMCID: PMC8396542 DOI: 10.3390/ijms22169057] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/19/2021] [Revised: 08/16/2021] [Accepted: 08/19/2021] [Indexed: 02/06/2023]
Abstract
Unfortunately, COVID-19 is still a threat to humankind and has a dramatic impact on human health, social life, the world economy, and food security. With the limited number of suggested therapies under clinical trials, the discovery of novel therapeutic agents is essential. Here, a previously identified anti-SARS-CoV-2 compound named Compound 13 (1,2,5-Oxadiazole-3-carboximidic acid, 4,4'-(methylenediimino) bis,bis[[(2-hydroxyphenyl)methylene]hydrazide) was subjected to an iterated virtual screening against SARS-CoV-2 Mpro using a combination of Ligand Designer and PathFinder. PathFinder, a computational reaction enumeration tool, was used for the rapid generation of enumerated structures via the default reaction library. Ligand Designer was employed for the computerized lead optimization and selection of the best structural modification that resulted in a favorable ligand-protein complex. The obtained compounds that showed the best binding to Mpro were re-screened against TMPRSS2, leading to the identification of 20 shared compounds. The compounds were further visually inspected, which resulted in the identification of five shared compounds, M1-5, with dual binding affinity. In vitro evaluation and enzyme inhibition assays indicated that M3, an analogue of Compound 13 afforded by replacing the phenolic moiety with pyridinyl, possesses improved antiviral activity and safety. M3 displayed in vitro antiviral activity with IC50 0.016 µM and Mpro inhibition activity with IC50 0.013 µM, 7-fold more potent than the parent Compound 13 and more potent than the antiviral drugs that are currently under clinical trials. Moreover, M3 showed potent activity against human TMPRSS2 and furin enzymes with IC50 values of 0.05 and 0.08 µM, respectively. Molecular docking, WaterMap analysis, molecular dynamics simulation, and R-group analysis confirmed the superiority of the binding fit of M3 with the target enzymes.
WaterMap analysis calculated the thermodynamic properties of the hydration site in the binding pocket that significantly affects the biological activity. Loading M3 on zinc oxide nanoparticles (ZnO NPs) increased the antiviral activity of the compound 1.5-fold, while maintaining a higher safety profile. In conclusion, lead optimized discovery following an iterated virtual screening in association with molecular docking and biological evaluation revealed a novel compound named M3 with promising dual activity against SARS-CoV-2. The compound deserves further investigation for potential clinical-based studies.
Affiliation(s)
- Rania Hamdy
  - Research Institute for Medical and Health Sciences, University of Sharjah, Sharjah 27272, United Arab Emirates
  - Faculty of Pharmacy, Zagazig University, Zagazig 44519, Egypt
- Bahgat Fayed
  - Research Institute for Medical and Health Sciences, University of Sharjah, Sharjah 27272, United Arab Emirates
  - Chemistry of Natural and Microbial Product Department, National Research Centre, Cairo 12622, Egypt
- Ahmed Mostafa
  - Centre of Scientific Excellence for Influenza Viruses, National Research Centre, Giza 12622, Egypt
- Noura M. Abo Shama
  - Centre of Scientific Excellence for Influenza Viruses, National Research Centre, Giza 12622, Egypt
- Sara Hussein Mahmoud
  - Centre of Scientific Excellence for Influenza Viruses, National Research Centre, Giza 12622, Egypt
- Chetan Hasmukh Mehta
  - Manipal College of Pharmaceutical Sciences, Manipal Academy of Higher Education, Manipal 576104, India
- Yogendra Nayak
  - Manipal College of Pharmaceutical Sciences, Manipal Academy of Higher Education, Manipal 576104, India
- Sameh S. M. Soliman
  - Research Institute for Medical and Health Sciences, University of Sharjah, Sharjah 27272, United Arab Emirates
  - College of Pharmacy, University of Sharjah, Sharjah 27272, United Arab Emirates
50
Tynes M, Gao W, Burrill DJ, Batista ER, Perez D, Yang P, Lubbers N. Pairwise Difference Regression: A Machine Learning Meta-algorithm for Improved Prediction and Uncertainty Quantification in Chemical Search. J Chem Inf Model 2021; 61:3846-3857. [PMID: 34347460 DOI: 10.1021/acs.jcim.1c00670] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/28/2022]
Abstract
Machine learning (ML) plays a growing role in the design and discovery of chemicals, aiming to reduce the need to perform expensive experiments and simulations. ML for such applications is promising but difficult, as models must generalize to vast chemical spaces from small training sets and must have reliable uncertainty quantification metrics to identify and prioritize unexplored regions. Ab initio computational chemistry and chemical intuition alike often take advantage of differences between chemical conditions, rather than their absolute structure or state, to generate more reliable results. We have developed an analogous comparison-based approach for ML regression, called pairwise difference regression (PADRE), which is applicable to arbitrary underlying learning models and operates on pairs of input data points. During training, the model learns to predict differences between all possible pairs of input points. During prediction, the test points are paired with all training set points, giving rise to a set of predictions that can be treated as a distribution of which the mean is treated as a final prediction and the dispersion is treated as an uncertainty measure. Pairwise difference regression was shown to reliably improve the performance of the random forest algorithm across five chemical ML tasks. Additionally, the pair-derived dispersion is both well correlated with model error and performs well in active learning. We also show that this method is competitive with state-of-the-art neural network techniques. Thus, pairwise difference regression is a promising tool for candidate selection algorithms used in chemical discovery.
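The pairwise scheme described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`knn_fit`, `pair_features`, `padre_fit`) are hypothetical, a small k-nearest-neighbour regressor stands in for the random forest mentioned in the abstract, and two details are assumptions made for illustration, namely the pair representation (both points plus their difference) and the step that turns a predicted difference into an absolute estimate by adding the known training target.

```python
import math
import statistics
from itertools import product


def knn_fit(features, targets, k=2):
    """Return a k-nearest-neighbour regressor (stand-in base learner, an assumption)."""
    def predict(query):
        # Rank training rows by Euclidean distance to the query and average the k closest targets.
        order = sorted(range(len(features)),
                       key=lambda idx: math.dist(features[idx], query))
        return statistics.fmean([targets[idx] for idx in order[:k]])
    return predict


def pair_features(a, b):
    """Describe an ordered pair by both points and their element-wise difference (assumed encoding)."""
    return list(a) + list(b) + [ai - bi for ai, bi in zip(a, b)]


def padre_fit(X, y, base_fit=knn_fit):
    """Train the base learner on target differences over all ordered pairs of training points."""
    pairs = list(product(range(len(X)), repeat=2))
    feats = [pair_features(X[i], X[j]) for i, j in pairs]
    diffs = [y[i] - y[j] for i, j in pairs]
    diff_model = base_fit(feats, diffs)

    def predict(x_new):
        # Pair the new point with every training point; each pair yields one
        # absolute estimate: known target plus predicted difference (assumption).
        estimates = [y_j + diff_model(pair_features(x_new, x_j))
                     for x_j, y_j in zip(X, y)]
        # Mean of the estimates is the prediction, their dispersion the uncertainty.
        return statistics.fmean(estimates), statistics.pstdev(estimates)
    return predict


if __name__ == "__main__":
    # Toy one-dimensional data following y = 2x + 1 (illustrative only).
    X = [[float(i)] for i in range(10)]
    y = [2.0 * i + 1.0 for i in range(10)]
    model = padre_fit(X, y)
    print(model([4.5]))   # query inside the training range
    print(model([20.0]))  # query far outside the training range
```

In this toy setting the estimates agree for a query inside the training range and disagree for a query far outside it, so the dispersion behaves like the uncertainty signal the abstract describes. Training on all ordered pairs grows quadratically with the training set size, so a real application would need to consider that cost.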
Affiliation(s)
- Michael Tynes
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
  - Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Wenhao Gao
  - Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Daniel J Burrill
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
  - Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Enrique R Batista
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
  - Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Danny Perez
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Ping Yang
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Nicholas Lubbers
  - Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
|