151
|
Sarkar C, Das B, Rawat VS, Wahlang JB, Nongpiur A, Tiewsoh I, Lyngdoh NM, Das D, Bidarolli M, Sony HT. Artificial Intelligence and Machine Learning Technology Driven Modern Drug Discovery and Development. Int J Mol Sci 2023; 24:ijms24032026. [PMID: 36768346 PMCID: PMC9916967 DOI: 10.3390/ijms24032026] [Citation(s) in RCA: 27] [Impact Index Per Article: 27.0] [Received: 11/29/2022] [Revised: 12/27/2022] [Accepted: 12/28/2022] [Indexed: 01/22/2023] Open
Abstract
The discovery and development of medicines may be regarded as the ultimate translational science effort, one that adds to human invulnerability and happiness. But developing a new medication is a convoluted, costly, and protracted process, normally costing about USD 2.6 billion and taking a mean of 12 years. The search for ways to cut costs and speed up new drug discovery has prompted intense effort in the pharmaceutical industry. The adoption of artificial intelligence (AI), and of deep learning (DL) in particular, has been facilitated across all fields by classified big data together with greatly increased computing power and cloud storage. AI has energized computer-facilitated drug discovery. The broad adoption of machine learning (ML), especially DL, in many scientific specialties, and the refinements in computing hardware and software, in concert with various aspects of the problem, sustain this progress. ML algorithms have been used extensively for computer-facilitated drug discovery. DL methods, such as artificial neural networks (ANNs) with multiple hidden processing layers, have recently seen a resurgence because they can extract features automatically from the input data and capture nonlinear input-output relationships. These capabilities complement classical ML techniques, which rely on human-engineered molecular descriptors. Much of the early reluctance about the utility of AI in pharmaceutical discovery has begun to melt, thereby advancing medicinal chemistry. AI, along with modern experimental techniques, is expected to make the search for new and improved pharmaceuticals faster, more economical, and more effective. DL-facilitated methods have only begun to address some integral issues in drug discovery. Many technological advances, such as "message-passing paradigms", "spatial-symmetry-preserving networks", "hybrid de novo design", and other ingenious ML approaches, are expected to become widespread and help answer many of the biggest and most intriguing questions. Open data sharing and model development will play a decisive role in the progress of AI-driven drug discovery. This review addresses the prospective uses of AI to refine and strengthen the drug discovery process.
Affiliation(s)
- Chayna Sarkar
- Department of Pharmacology, North Eastern Indira Gandhi Regional Institute of Health and Medical Sciences (NEIGRIHMS), Mawdiangdiang, Shillong 793018, Meghalaya, India
| | - Biswadeep Das
- Department of Pharmacology, All India Institute of Medical Sciences (AIIMS), Virbhadra Road, Rishikesh 249203, Uttarakhand, India
- Correspondence: ; Tel./Fax: +91-135-708-856-0009
| | - Vikram Singh Rawat
- Department of Psychiatry, All India Institute of Medical Sciences (AIIMS), Virbhadra Road, Rishikesh 249203, Uttarakhand, India
| | - Julie Birdie Wahlang
- Department of Pharmacology, North Eastern Indira Gandhi Regional Institute of Health and Medical Sciences (NEIGRIHMS), Mawdiangdiang, Shillong 793018, Meghalaya, India
| | - Arvind Nongpiur
- Department of Psychiatry, North Eastern Indira Gandhi Regional Institute of Health and Medical Sciences (NEIGRIHMS), Mawdiangdiang, Shillong 793018, Meghalaya, India
| | - Iadarilang Tiewsoh
- Department of Medicine, North Eastern Indira Gandhi Regional Institute of Health and Medical Sciences (NEIGRIHMS), Mawdiangdiang, Shillong 793018, Meghalaya, India
| | - Nari M. Lyngdoh
- Department of Anesthesiology, North Eastern Indira Gandhi Regional Institute of Health and Medical Sciences (NEIGRIHMS), Mawdiangdiang, Shillong 793018, Meghalaya, India
| | - Debasmita Das
- Department of Computer Science and Engineering, Vellore Institute of Technology, Vellore Campus, Tiruvalam Road, Katpadi, Vellore 632014, Tamil Nadu, India
| | - Manjunath Bidarolli
- Department of Pharmacology, All India Institute of Medical Sciences (AIIMS), Virbhadra Road, Rishikesh 249203, Uttarakhand, India
| | - Hannah Theresa Sony
- Department of Pharmacology, All India Institute of Medical Sciences (AIIMS), Virbhadra Road, Rishikesh 249203, Uttarakhand, India
| |
|
152
|
Choi J, Seo S, Park S. COMA: efficient structure-constrained molecular generation using contractive and margin losses. J Cheminform 2023; 15:8. [PMID: 36658602 PMCID: PMC9850577 DOI: 10.1186/s13321-023-00679-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2022] [Accepted: 01/04/2023] [Indexed: 01/20/2023] Open
Abstract
BACKGROUND Structure-constrained molecular generation is a promising approach to drug discovery. The goal of structure-constrained molecular generation is to produce a novel molecule that is similar to a given source molecule (e.g. hit molecules) but has enhanced chemical properties (for lead optimization). Many structure-constrained molecular generation models with superior performance in improving chemical properties have been proposed; however, they still have difficulty producing many novel molecules that satisfy both the high structural similarities to each source molecule and improved molecular properties. METHODS We propose a structure-constrained molecular generation model that utilizes contractive and margin loss terms to simultaneously achieve property improvement and high structural similarity. The proposed model has two training phases; a generator first learns molecular representation vectors using metric learning with contractive and margin losses and then explores optimized molecular structure for target property improvement via reinforcement learning. RESULTS We demonstrate the superiority of our proposed method by comparing it with various state-of-the-art baselines and through ablation studies. Furthermore, we demonstrate the use of our method in drug discovery using an example of sorafenib-like molecular generation in patients with drug resistance.
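As a rough illustration of the first training phase described above, the following sketch shows a generic pairwise metric-learning objective with a pull ("contractive") term and a push ("margin") term. It is not the COMA formulation; the function names, the Euclidean distance, the squared form of both terms, and the default margin are all assumptions made for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contractive_loss(anchor, positive):
    """Pull term (hypothetical form): squared distance between a source
    embedding and the embedding of a structurally similar molecule."""
    return euclidean(anchor, positive) ** 2


def margin_loss(anchor, negative, margin=1.0):
    """Push term (hypothetical form): penalize a dissimilar molecule only
    when its embedding lies closer to the anchor than `margin`."""
    return max(0.0, margin - euclidean(anchor, negative)) ** 2


def pair_loss(anchor, positive, negative, margin=1.0):
    """Sum of the pull and push terms for one (anchor, positive, negative) triple."""
    return contractive_loss(anchor, positive) + margin_loss(anchor, negative, margin)
```

With this form, a dissimilar molecule already farther away than the margin contributes nothing, so only "too close" negatives shape the embedding space.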
Affiliation(s)
- Jonghwan Choi
- Department of Computer Science, Yonsei University, Yonsei-ro 50, 03722 Seoul, Republic of Korea; UBLBio Corporation, Yeongtong-ro 237, 16679 Suwon, Gyeonggi-do, Republic of Korea
| | - Sangmin Seo
- Department of Computer Science, Yonsei University, Yonsei-ro 50, 03722 Seoul, Republic of Korea; UBLBio Corporation, Yeongtong-ro 237, 16679 Suwon, Gyeonggi-do, Republic of Korea
| | - Sanghyun Park
- Department of Computer Science, Yonsei University, Yonsei-ro 50, 03722 Seoul, Republic of Korea
| |
|
153
|
Abate C, Decherchi S, Cavalli A. Graph neural networks for conditional de novo drug design. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2023. [DOI: 10.1002/wcms.1651] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2023]
Affiliation(s)
- Carlo Abate
- Fondazione Istituto Italiano di Tecnologia Genoa Italy
- Università degli Studi di Bologna Bologna Italy
| | | | - Andrea Cavalli
- Fondazione Istituto Italiano di Tecnologia Genoa Italy
- Università degli Studi di Bologna Bologna Italy
| |
|
154
|
Moret M, Pachon Angona I, Cotos L, Yan S, Atz K, Brunner C, Baumgartner M, Grisoni F, Schneider G. Leveraging molecular structure and bioactivity with chemical language models for de novo drug design. Nat Commun 2023; 14:114. [PMID: 36611029 PMCID: PMC9825622 DOI: 10.1038/s41467-022-35692-6] [Citation(s) in RCA: 35] [Impact Index Per Article: 35.0] [Received: 10/26/2021] [Accepted: 12/19/2022] [Indexed: 01/09/2023] Open
Abstract
Generative chemical language models (CLMs) can be used for de novo molecular structure generation by learning from a textual representation of molecules. Here, we show that hybrid CLMs can additionally leverage the bioactivity information available for the training compounds. To computationally design ligands of phosphoinositide 3-kinase gamma (PI3Kγ), a collection of virtual molecules was created with a generative CLM. This virtual compound library was refined using a CLM-based classifier for bioactivity prediction. This second hybrid CLM was pretrained with patented molecular structures and fine-tuned with known PI3Kγ ligands. Several of the computer-generated molecular designs were commercially available, enabling fast prescreening and preliminary experimental validation. A new PI3Kγ ligand with sub-micromolar activity was identified, highlighting the method's scaffold-hopping potential. Chemical synthesis and biochemical testing of two of the top-ranked de novo designed molecules and their derivatives corroborated the model's ability to generate PI3Kγ ligands with medium to low nanomolar activity for hit-to-lead expansion. The most potent compounds led to pronounced inhibition of PI3K-dependent Akt phosphorylation in a medulloblastoma cell model, demonstrating efficacy of PI3Kγ ligands in PI3K/Akt pathway repression in human tumor cells. The results positively advocate hybrid CLMs for virtual compound screening and activity-focused molecular design.
Affiliation(s)
- Michael Moret
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - Irene Pachon Angona
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - Leandro Cotos
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - Shen Yan
- University of Zurich, University Children's Hospital, Children's Research Center, Pediatric Molecular Neuro-Oncology Research, Lengghalde 5, 8008, Zurich, Switzerland
| | - Kenneth Atz
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - Cyrill Brunner
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - Martin Baumgartner
- University of Zurich, University Children's Hospital, Children's Research Center, Pediatric Molecular Neuro-Oncology Research, Lengghalde 5, 8008, Zurich, Switzerland
| | - Francesca Grisoni
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland.
- Eindhoven University of Technology, Institute for Complex Molecular Systems and Eindhoven Artificial Intelligence Systems Institute, Department of Biomedical Engineering, Groene Loper 7, 5612AZ, Eindhoven, The Netherlands.
- Center for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Utrecht, 3584 CB, The Netherlands.
| | - Gisbert Schneider
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland.
- ETH Singapore SEC Ltd, 1 CREATE Way, #06-01 CREATE Tower, Singapore, 138602, Singapore.
| |
|
155
|
Thakur M, Bateman A, Brooksbank C, Freeberg M, Harrison M, Hartley M, Keane T, Kleywegt G, Leach A, Levchenko M, Morgan S, McDonagh E, Orchard S, Papatheodorou I, Velankar S, Vizcaino J, Witham R, Zdrazil B, McEntyre J. EMBL's European Bioinformatics Institute (EMBL-EBI) in 2022. Nucleic Acids Res 2023; 51:D9-D17. [PMID: 36477213 PMCID: PMC9825486 DOI: 10.1093/nar/gkac1098] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 10/13/2022] [Revised: 10/21/2022] [Accepted: 10/31/2022] [Indexed: 12/13/2022] Open
Abstract
The European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) is one of the world's leading sources of public biomolecular data. Based at the Wellcome Genome Campus in Hinxton, UK, EMBL-EBI is one of six sites of the European Molecular Biology Laboratory (EMBL), Europe's only intergovernmental life sciences organisation. This overview summarises the status of services that EMBL-EBI data resources provide to scientific communities globally. The scale, openness, rich metadata and extensive curation of EMBL-EBI added-value databases make them particularly well-suited as training sets for deep learning, machine learning and artificial intelligence applications, a selection of which are described here. The data resources at EMBL-EBI can catalyse such developments because they offer sustainable, high-quality data, collected in some cases over decades and made openly available to any researcher, globally. Our aim is for EMBL-EBI data resources to keep providing the foundations for tools and research insights that transform fields across the life sciences.
Affiliation(s)
| | - Alex Bateman
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Cath Brooksbank
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Mallory Freeberg
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Melissa Harrison
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Matthew Hartley
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Thomas Keane
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Gerard Kleywegt
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Andrew Leach
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Mariia Levchenko
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Sarah Morgan
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Ellen M McDonagh
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- OpenTargets, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Sandra Orchard
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Irene Papatheodorou
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Sameer Velankar
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Juan Antonio Vizcaino
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Rick Witham
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Barbara Zdrazil
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | | |
|
156
|
Molecule generation toward target protein (SARS-CoV-2) using reinforcement learning-based graph neural network via knowledge graph. NETWORK MODELING AND ANALYSIS IN HEALTH INFORMATICS AND BIOINFORMATICS 2023; 12:13. [PMID: 36627927 PMCID: PMC9817447 DOI: 10.1007/s13721-023-00409-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/27/2022] [Revised: 11/23/2022] [Accepted: 12/31/2022] [Indexed: 01/07/2023]
Abstract
AI-driven approaches are widely used in drug discovery, where candidate molecules are generated and tested on a target protein for binding affinity prediction. However, generating new compounds with desirable molecular properties such as Quantitative Estimate of Drug-likeness (QED) and Dopamine Receptor D2 activity (DRD2) while adhering to distinct chemical laws is challenging. To address these challenges, we proposed a graph-based deep learning framework to generate potential therapeutic drugs targeting the SARS-CoV-2 protein. Our proposed framework consists of two modules: a novel reinforcement learning (RL)-based graph generative module with a knowledge graph (KG) and a graph early fusion approach (GEFA) for binding affinity prediction. The first module uses a gated graph neural network (GGNN) model under the RL environment for generating novel molecular compounds with desired properties and a custom-made KG for molecule screening. The second module uses GEFA to predict binding affinity scores between the generated compounds and target proteins. Experiments show that fine-tuning the GGNN model under the RL environment steers generation toward molecules with the desired properties, yielding 100% valid and 100% unique compounds using different scoring functions. Additionally, KG-based screening reduces the search space of generated candidate molecules by 96.64% while retaining 95.38% of promising binding molecules against the SARS-CoV-2 protein, i.e., 3C-like protease (3CLpro). We achieved a binding affinity score of 8.185 for the top-ranked generated compound. In addition, we compared top-ranked generated compounds to Indinavir on different parameters, including drug-likeness and medicinal chemistry, for qualitative analysis from a drug development perspective. Supplementary Information: The online version contains supplementary material available at 10.1007/s13721-023-00409-2.
|
157
|
Liu C, Zhang Y, Gao J, Zhang Q, Sun L, Ma Q, Qiao X, Li X, Liu J, Bu J, Zhang Z, Han L, Zhao D, Yang Y. A highly potent small-molecule antagonist of exportin-1 selectively eliminates CD44+CD24- enriched breast cancer stem-like cells. Drug Resist Updat 2023; 66:100903. [PMID: 36463808 DOI: 10.1016/j.drup.2022.100903] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 10/20/2022] [Revised: 11/17/2022] [Accepted: 11/18/2022] [Indexed: 11/21/2022]
Abstract
Breast cancer stem-like cells (BCSCs) have been suggested as the underlying cause of tumor recurrence, metastasis and drug resistance in triple-negative breast cancer (TNBC). Here, we report the discovery and biological evaluation of a highly potent small-molecule antagonist of exportin-1, LFS-1107. We ascertained that exportin-1 (also known as CRM1) is a main cellular target of LFS-1107 by a nuclear export functional assay, a bio-layer interferometry binding assay and a C528S mutant cell line. We found that LFS-1107 significantly inhibited TNBC tumor cells at low nanomolar concentrations and can selectively eliminate CD44+CD24- enriched BCSCs. We demonstrated that LFS-1107 can induce the nuclear retention of Survivin and consequent strong suppression of STAT3 transactivation abilities and the expression of downstream stemness regulators. Administration of LFS-1107 can strongly inhibit tumor growth in a mouse xenograft model and eradicate BCSCs in residual tumor tissues. Moreover, LFS-1107 can significantly ablate the patient-derived tumor organoids (PDTOs) of TNBC as compared to a few approved cancer drugs. Lastly, we revealed that LFS-1107 can enhance the killing effects of chemotherapy drugs and downregulate multidrug resistance related protein targets. These new findings provide preclinical evidence supporting LFS-1107 as a promising therapeutic agent to deplete BCSCs for the treatment of TNBC.
Affiliation(s)
- Caigang Liu
- Department of Oncology, Cancer Stem Cell and Translational Medicine Lab, Innovative Cancer Drug Research and Development Engineering Center of Liaoning Province, Shengjing Hospital of China Medical University, Shenyang 110004, China.
| | - Yixiao Zhang
- Department of Urology Surgery, Shengjing Hospital of China Medical University, Shenyang 110004, China
| | - Jiujiao Gao
- School of Bioengineering, Dalian University of Technology, Dalian 116023, China; Department of Pharmacology, Tsinghua University, Beijing 100191, China.
| | - Qi Zhang
- National Engineering Research Center of Pharmaceutics of Traditional Chinese Medicine, China Resources Sanjiu Medical & Pharmaceutical Co., Ltd, Shenzhen 518000, China
| | - Lisha Sun
- Department of Oncology, Cancer Stem Cell and Translational Medicine Lab, Innovative Cancer Drug Research and Development Engineering Center of Liaoning Province, Shengjing Hospital of China Medical University, Shenyang 110004, China
| | - Qingtian Ma
- Department of Oncology, Cancer Stem Cell and Translational Medicine Lab, Innovative Cancer Drug Research and Development Engineering Center of Liaoning Province, Shengjing Hospital of China Medical University, Shenyang 110004, China
| | - Xinbo Qiao
- Department of Oncology, Cancer Stem Cell and Translational Medicine Lab, Innovative Cancer Drug Research and Development Engineering Center of Liaoning Province, Shengjing Hospital of China Medical University, Shenyang 110004, China
| | - Xinnan Li
- Department of Oncology, Cancer Stem Cell and Translational Medicine Lab, Innovative Cancer Drug Research and Development Engineering Center of Liaoning Province, Shengjing Hospital of China Medical University, Shenyang 110004, China
| | - Jinchi Liu
- Department of Oncology, Cancer Stem Cell and Translational Medicine Lab, Innovative Cancer Drug Research and Development Engineering Center of Liaoning Province, Shengjing Hospital of China Medical University, Shenyang 110004, China
| | - Jiawen Bu
- Department of Oncology, Cancer Stem Cell and Translational Medicine Lab, Innovative Cancer Drug Research and Development Engineering Center of Liaoning Province, Shengjing Hospital of China Medical University, Shenyang 110004, China
| | - Zhan Zhang
- Department of Oncology, Cancer Stem Cell and Translational Medicine Lab, Innovative Cancer Drug Research and Development Engineering Center of Liaoning Province, Shengjing Hospital of China Medical University, Shenyang 110004, China
| | - Ling Han
- National Engineering Research Center of Pharmaceutics of Traditional Chinese Medicine, China Resources Sanjiu Medical & Pharmaceutical Co., Ltd, Shenzhen 518000, China
| | - Dongyu Zhao
- International Cancer Institute, Peking University Health Science Center, Peking University, Beijing 100191, China
| | - Yongliang Yang
- Department of Oncology, Cancer Stem Cell and Translational Medicine Lab, Innovative Cancer Drug Research and Development Engineering Center of Liaoning Province, Shengjing Hospital of China Medical University, Shenyang 110004, China; School of Bioengineering, Dalian University of Technology, Dalian 116023, China.
| |
|
158
|
|
159
|
Noguchi S, Inoue J. Exploration of Chemical Space Guided by PixelCNN for Fragment-Based De Novo Drug Discovery. J Chem Inf Model 2022; 62:5988-6001. [PMID: 36454646 DOI: 10.1021/acs.jcim.2c01345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2022]
Abstract
We report a novel framework for fragment-based molecular design using a pixel convolutional neural network (PixelCNN) combined with the simplified molecular input line entry system (SMILES) as the molecular representation. While a widely used recurrent neural network (RNN) assumes monotonically decaying correlations in strings, PixelCNN captures a periodicity among the characters of SMILES. Thus, PixelCNN provides a novel solution for the analysis of chemical space by extracting the periodicity of molecular structures that is otherwise buried in SMILES. Moreover, this characteristic enables us to generate molecules by combining several simple building blocks, such as a benzene ring and side-chain structures, which contributes to the effective exploration of chemical space by searching step by step for molecules from a target fragment. In conclusion, PixelCNN could be a powerful approach that exploits the periodicity of molecules to explore chemical space for fragment-based molecular design.
Affiliation(s)
- Satoshi Noguchi
- Department of Advanced Interdisciplinary Studies, The University of Tokyo, 4-6-1 Komaba, Meguro, Tokyo 153-8904, Japan
| | - Junya Inoue
- Institute for Industrial Science, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-0082, Japan; Department of Materials Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo, Tokyo 113-8656, Japan; Research Center for Advanced Science and Technology, The University of Tokyo, 4-6-1 Komaba, Meguro, Tokyo 153-8904, Japan
| |
|
160
|
Tan Y, Dai L, Huang W, Guo Y, Zheng S, Lei J, Chen H, Yang Y. DRlinker: Deep Reinforcement Learning for Optimization in Fragment Linking Design. J Chem Inf Model 2022; 62:5907-5917. [PMID: 36404642 DOI: 10.1021/acs.jcim.2c00982] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 11/22/2022]
Abstract
Fragment-based drug discovery is a widely used strategy for drug design in both academia and the pharmaceutical industry. Although fragments can be linked to generate candidate compounds by the latest deep generative models, generating linkers with specified attributes remains underdeveloped. In this study, we presented a novel framework, DRlinker, to control fragment linking toward compounds with given attributes through reinforcement learning. The method has been shown to be effective for many tasks, from controlling the linker length and log P and optimizing the predicted bioactivity of compounds to various multiobjective tasks. Specifically, our model successfully generated 91.0% and 93.9% of compounds complying with the desired linker length and log P and improved the 7.5 pChEMBL value in bioactivity optimization. Finally, a quasi-scaffold-hopping study revealed that DRlinker could generate nearly 30% of molecules with high 3D similarity but low 2D similarity to the lead inhibitor, demonstrating the benefits and applicability of DRlinker in actual fragment-based drug design.
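The abstract does not say how 2D similarity was measured. One common choice, assumed here purely for illustration, is the Tanimoto coefficient over the sets of "on" bits in two molecular fingerprints. The function name and the toy bit sets below are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) coefficient between two fingerprints, each given
    as a set of 'on' bit indices: |A & B| / |A | B|."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Two empty fingerprints: treated as identical by convention here.
        return 1.0
    return len(a & b) / len(union)


# Toy fingerprints (made-up bit indices, not derived from real molecules).
lead = {1, 2, 3}
candidate = {2, 3, 4}
similarity = tanimoto(lead, candidate)  # 2 shared bits / 4 bits in the union
```

In a scaffold-hopping setting, a candidate with a low value from a measure like this but a high 3D shape similarity would be the kind of molecule the abstract describes.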
Affiliation(s)
- Youhai Tan
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
| | - Lingxue Dai
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
| | - Weifeng Huang
- School of Pharmaceutical Science, Sun Yat-sen University, Guangzhou 510006, China
| | - Yinfeng Guo
- School of Pharmaceutical Science, Sun Yat-sen University, Guangzhou 510006, China
| | - Shuangjia Zheng
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China; Galixir Technologies, Beijing 100083, China
| | - Jinping Lei
- School of Pharmaceutical Science, Sun Yat-sen University, Guangzhou 510006, China
| | - Hongming Chen
- Guangzhou Laboratory, No. 9 XinDaoHuanBei Road, Guangzhou International Bio Island, Guangzhou 510005, China
| | - Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
| |
|
161
|
Kim H, Ko S, Kim BJ, Ryu SJ, Ahn J. Predicting chemical structure using reinforcement learning with a stack-augmented conditional variational autoencoder. J Cheminform 2022; 14:83. [PMID: 36494855 PMCID: PMC9733204 DOI: 10.1186/s13321-022-00666-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2022] [Accepted: 12/03/2022] [Indexed: 12/13/2022] Open
Abstract
In this paper, a reinforcement learning model is proposed that can maximize the predicted binding affinity between a generated molecule and target proteins. The generator in the proposed model is the Stacked Conditional Variational AutoEncoder (Stack-CVAE), which acts as an agent in reinforcement learning so that the resulting chemical formulas have the desired chemical properties and show high binding affinity with specific target proteins. We generated 1000 chemical formulas using the chemical properties of sorafenib and the three target kinases of sorafenib. Then, we confirmed that Stack-CVAE generates more valid and unique chemical compounds that have the desired chemical properties and predicted binding affinity than other generative models. More detailed analysis of 100 of the top-scoring molecules shows that they are novel and not found in existing chemical databases. Moreover, they show significantly higher predicted binding affinity scores for Raf kinases than for other kinases. Furthermore, they are highly druggable and synthesizable.
Affiliation(s)
- Hwanhee Kim
- Department of Computer Science and Engineering, Incheon National University, Incheon, 22012 Republic of Korea
| | - Soohyun Ko
- GenesisEgo, Seoul, 04382 Republic of Korea
| | - Byung Ju Kim
- UBLBio Corporation, Suwon, 16679 Republic of Korea
| | - Sung Jin Ryu
- UBLBio Corporation, Suwon, 16679 Republic of Korea
| | - Jaegyoon Ahn
- Department of Computer Science and Engineering, Incheon National University, Incheon, 22012 Republic of Korea
| |
|
162
|
Chan L, Kumar R, Verdonk M, Poelking C. A multilevel generative framework with hierarchical self-contrasting for bias control and transparency in structure-based ligand design. NAT MACH INTELL 2022. [DOI: 10.1038/s42256-022-00564-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
|
163
|
Li T, Zhao XM, Li L. Co-VAE: Drug-Target Binding Affinity Prediction by Co-Regularized Variational Autoencoders. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:8861-8873. [PMID: 34652996 DOI: 10.1109/tpami.2021.3120428] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Indexed: 06/13/2023]
Abstract
Identifying drug-target interactions has been a key step in drug discovery. Many computational methods have been proposed to directly determine whether drugs and targets can interact or not. Drug-target binding affinity is another type of data, which shows the strength of the binding interaction between a drug and a target. However, it is more challenging to predict drug-target binding affinity, and thus very few studies follow this line. In our work, we propose a novel co-regularized variational autoencoder (Co-VAE) to predict drug-target binding affinity based on drug structures and target sequences. The Co-VAE model consists of two VAEs for generating drug SMILES strings and target sequences, respectively, and a co-regularization part for generating the binding affinities. We theoretically prove that the Co-VAE model maximizes the lower bound of the joint likelihood of drug, protein and their affinity. The Co-VAE could predict drug-target affinity and generate new drugs which share similar targets with the input drugs. The experimental results on two datasets show that the Co-VAE could predict drug-target affinity better than existing affinity prediction methods such as DeepDTA and DeepAffinity, and could generate more new valid drugs than existing methods such as GAN and VAE.
|
164
|
Urbina F, Ekins S. The Commoditization of AI for Molecule Design. ARTIFICIAL INTELLIGENCE IN THE LIFE SCIENCES 2022; 2:100031. [PMID: 36211981 PMCID: PMC9541920 DOI: 10.1016/j.ailsci.2022.100031] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/03/2023]
Abstract
Anyone involved in designing or finding molecules in the life sciences over the past few years has witnessed a dramatic change in how we now work due to the COVID-19 pandemic. Computational technologies like artificial intelligence (AI) seemed to become ubiquitous in 2020 and have been increasingly applied as scientists worked from home and were separated from the laboratory and their colleagues. This shift may be more permanent as the future of molecule design across different industries will increasingly require machine learning models for design and optimization of molecules as they become "designed by AI". AI and machine learning have essentially become a commodity within the pharmaceutical industry. This perspective will briefly describe our personal opinions of how machine learning has evolved and is being applied to model different molecule properties, with utility that crosses industries, and will ultimately suggest the potential for tight integration of AI into equipment and automated experimental pipelines. It will also describe how many groups have implemented generative models, covering different architectures, for de novo design of molecules. We also highlight some of the companies at the forefront of using AI to demonstrate how machine learning has impacted and influenced our work. Finally, we will peer into the future and suggest some of the areas that represent the most interesting technologies that may shape the future of molecule design, highlighting how we can help increase the efficiency of the design-make-test cycle, which is currently a major focus across industries.
Affiliation(s)
- Fabio Urbina
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
| | - Sean Ekins
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
| |
|
165
|
Transformation rule-based molecular evolution for automatic gasoline molecule design. Chem Eng Sci 2022. [DOI: 10.1016/j.ces.2022.118119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|
166
|
Bort W, Mazitov D, Horvath D, Bonachera F, Lin A, Marcou G, Baskin I, Madzhidov T, Varnek A. Inverse QSAR: Reversing Descriptor-Driven Prediction Pipeline Using Attention-Based Conditional Variational Autoencoder. J Chem Inf Model 2022; 62:5471-5484. [PMID: 36332178 DOI: 10.1021/acs.jcim.2c01086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
In order to better formalize it, the notorious inverse-QSAR problem (finding structures of given QSAR-predicted properties) is considered in this paper as a two-step process including (i) finding "seed" descriptor vectors corresponding to user-constrained QSAR model output values and (ii) identifying the chemical structures best matching the "seed" vectors. The main development effort here was focused on the latter stage, proposing a new attention-based conditional variational autoencoder neural-network architecture based on recent developments in attention-based methods. The obtained results show that this workflow was capable of generating compounds predicted to display desired activity while being completely novel compared to the training database (ChEMBL). Moreover, the generated compounds show acceptable druglikeness and synthetic accessibility. Both pharmacophore and docking studies were carried out as "orthogonal" in silico validation methods, proving that some of the de novo structures are, beyond being predicted active by 2D-QSAR models, clearly able to match binding 3D pharmacophores and bind the protein pocket.
Affiliation(s)
- William Bort
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
| | - Daniyar Mazitov
- Laboratory of Chemoinformatics and Molecular Modeling, A. M. Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008 Kazan, Russia
| | - Dragos Horvath
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
| | - Fanny Bonachera
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
| | - Arkadii Lin
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
| | - Gilles Marcou
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
| | - Igor Baskin
- Department of Material Science and Engineering, Technion─Israel Institute of Technology, 3200003 Haifa, Israel
| | - Timur Madzhidov
- Laboratory of Chemoinformatics and Molecular Modeling, A. M. Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008 Kazan, Russia
| | - Alexandre Varnek
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
| |
|
167
|
Yang J, Cai Y, Zhao K, Xie H, Chen X. Concepts and applications of chemical fingerprint for hit and lead screening. Drug Discov Today 2022; 27:103356. [PMID: 36113834 DOI: 10.1016/j.drudis.2022.103356] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2021] [Revised: 07/28/2022] [Accepted: 09/08/2022] [Indexed: 11/22/2022]
Abstract
Molecular fingerprints are used to represent chemical (structural, physicochemical, etc.) properties of large-scale chemical sets at low computational cost. They have a prominent role in transforming chemical data sets into consistent input formats (bit strings or numeric values) suitable for in silico approaches. In this review, we summarize and classify common and state-of-the-art fingerprints into eight different types (dictionary based, circular, topological, pharmacophore, protein-ligand interaction, shape based, reinforced, and multi). We also highlight applications of fingerprints in early drug research and development (R&D). Thus, this review provides a guide for the selection of appropriate fingerprints of compounds (or ligand-protein complexes) for use in drug R&D.
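The idea of a dictionary-based fingerprint mentioned in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the review: the key list FEATURE_KEYS, the feature names, and the function names are hypothetical, and a real fingerprint would detect substructures with a cheminformatics toolkit rather than receive them as a ready-made set.

```python
# Hypothetical key dictionary: one bit position per predefined structural feature.
# Real dictionary-based fingerprints use curated substructure keys instead.
FEATURE_KEYS = ["aromatic_ring", "hydroxyl", "carbonyl", "amine"]


def dictionary_fingerprint(features: set, keys: list = FEATURE_KEYS) -> str:
    """Return a bit string with '1' where the key is among the molecule's features."""
    return "".join("1" if key in features else "0" for key in keys)


def to_numeric(bits: str) -> list:
    """Convert the bit string into a list of 0/1 integers for numeric pipelines."""
    return [int(b) for b in bits]


if __name__ == "__main__":
    # A toy molecule described only by the features it contains (assumed input).
    phenol_like = {"aromatic_ring", "hydroxyl"}
    bits = dictionary_fingerprint(phenol_like)
    print(bits, to_numeric(bits))
```

Because every molecule is mapped onto the same ordered key list, the outputs have a consistent length and can be compared or fed to a model directly, which is the "consistent input format" property the abstract refers to.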
Affiliation(s)
- Jingbo Yang
- Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
| | - Yiyang Cai
- Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
| | - Kairui Zhao
- Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
| | - Hongbo Xie
- Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China.
| | - Xiujie Chen
- Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China.
| |
|
168
|
Fu T, Xiao C, Glass LM, Sun J. MOLER: Incorporate Molecule-Level Reward to Enhance Deep Generative Model for Molecule Optimization. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 2022; 34:5459-5471. [PMID: 36590707 PMCID: PMC9802662 DOI: 10.1109/tkde.2021.3052150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 06/17/2023]
Abstract
The goal of molecular optimization is to generate molecules similar to a target molecule but with better chemical properties. Deep generative models have shown great success in molecule optimization. However, due to the iterative local generation process of deep generative models, the resulting molecules can significantly deviate from the input in molecular similarity and size, leading to poor chemical properties. The key issue here is that the existing deep generative models restrict their attention to substructure-level generation without considering the entire molecule as a whole. To address this challenge, we propose Molecule-Level Reward functions (MOLER) to encourage (1) the input and the generated molecule to be similar, and to ensure (2) the generated molecule has a similar size to the input. The proposed method can be combined with various deep generative models. A policy gradient technique is introduced to optimize reward-based objectives with small computational overhead. Empirical studies show that MOLER achieves up to 20.2% relative improvement in success rate over the best baseline method on several properties, including QED, DRD2 and LogP.
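The two reward components named in the abstract above (input/output similarity and similar size) can be sketched as follows. This is only an illustration under stated assumptions and not the reward defined in the paper: Tanimoto similarity over toy fingerprint sets, the linear size term, and the size_weight parameter are all choices made here for the example.

```python
def tanimoto(fp_a: frozenset, fp_b: frozenset) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    union = fp_a | fp_b
    return len(fp_a & fp_b) / len(union) if union else 1.0


def size_term(n_in: int, n_out: int) -> float:
    """Assumed size score: 1.0 for equal atom counts, falling toward 0 as they diverge."""
    return 1.0 - abs(n_in - n_out) / max(n_in, n_out, 1)


def molecule_level_reward(fp_in, fp_out, n_in, n_out, size_weight=0.5):
    """Assumed combination: similarity plus a weighted size-agreement term."""
    return tanimoto(frozenset(fp_in), frozenset(fp_out)) + size_weight * size_term(n_in, n_out)


if __name__ == "__main__":
    # Toy input and generated molecules: on-bit sets and heavy-atom counts (made up).
    print(molecule_level_reward({1, 2, 3}, {2, 3, 4}, n_in=10, n_out=8))
```

In a policy-gradient setting, a scalar of this kind would be computed once per finished molecule and used to weight the log-probability of the generation steps; that training loop is omitted here.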
Affiliation(s)
- Tianfan Fu
- Department of Computer Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA
| | - Cao Xiao
- Analytics Center of Excellence, IQVIA, Cambridge, MA 02139 USA
| | - Lucas M Glass
- Analytics Center of Excellence, IQVIA, Plymouth Meeting, PA 19462 USA, and also with Temple University, Philadelphia, PA 19122 USA
| | - Jimeng Sun
- Computer Science Department, University of Illinois, Urbana-Champaign, Champaign, IL 61820 USA
| |
|
169
|
Kumar R, Sharma A, Alexiou A, Ashraf GM. Artificial Intelligence in De novo Drug Design: Are We Still There? Curr Top Med Chem 2022; 22:2483-2492. [PMID: 36263480 DOI: 10.2174/1568026623666221017143244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2022] [Revised: 09/06/2022] [Accepted: 09/15/2022] [Indexed: 01/20/2023]
Abstract
BACKGROUND The artificial intelligence (AI)-assisted design of drug candidates with novel structures and desired properties has received significant attention in the recent past, as have related areas of forward prediction that aim to discover chemical matter worth synthesizing and investigating experimentally. OBJECTIVES The purpose behind developing AI-driven models is to explore the broader chemical space and suggest new drug candidate scaffolds with promising therapeutic value. Moreover, it is anticipated that such AI-based models may not only significantly reduce the cost and time but also decrease the attrition rate of drug candidates that fail to reach the desirable endpoints at the final stages of drug development. In an attempt to develop AI-based models for de novo drug design, numerous methods have been proposed by various study groups by applying machine learning and deep learning algorithms to chemical datasets. However, there are many challenges in obtaining accurate predictions, and real breakthroughs in de novo drug design are still scarce. METHODS In this review, we explore the recent trends in developing AI-based models for de novo drug design to assess the current status, challenges, and opportunities in the field. CONCLUSION The consistently improved AI algorithms and the abundance of curated training chemical data indicate that AI-based de novo drug design should perform better than the current models. Improvements in the performance are warranted to obtain better outcomes in the form of potential drug candidates, which can perform well in in vivo conditions, especially in the case of more complex diseases.
Affiliation(s)
- Rajnish Kumar
- Amity Institute of Biotechnology, Amity University Uttar Pradesh Lucknow Campus, Uttar Pradesh, India
| | - Anju Sharma
- Department of Applied Science, Indian Institute of Information Technology, Allahabad, Uttar Pradesh, India
| | - Athanasios Alexiou
- Novel Global Community Educational Foundation, Hebersham, 2770 NSW, Australia.,AFNP Med Austria, 1010 Wien, Austria
| | - Ghulam Md Ashraf
- Pre-Clinical Research Unit (PCRU), King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia.,Department of Medical Laboratory Technology, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
| |
|
170
|
Yarish D, Garkot S, Grygorenko OO, Radchenko DS, Moroz YS, Gurbych O. Advancing molecular graphs with descriptors for the prediction of chemical reaction yields. J Comput Chem 2022; 44:76-92. [PMID: 36264601 DOI: 10.1002/jcc.27016] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2022] [Revised: 08/31/2022] [Accepted: 09/05/2022] [Indexed: 11/08/2022]
Abstract
Chemical yield is the percentage of the reactants converted to the desired products. Chemists use predictive algorithms to select high-yielding reactions and score synthesis routes, saving time and reagents. This study suggests a novel graph neural network architecture for chemical yield prediction. The network combines structural information about participants of the transformation as well as molecular and reaction-level descriptors. It works with incomplete chemical reactions and generates reactants-product atom mapping. We show that the network benefits from advanced information by comparing it with several machine learning models and molecular representations. Models included logistic regression, support vector machine, CatBoost, and Bidirectional Encoder Representations from Transformers. Molecular representations included extended-connectivity fingerprints, Morgan fingerprints, SMILESVec embeddings, and textual representations. Classification and regression objectives were assessed for each model and feature set. The goal of each classification model was to separate zero- and non-zero-yielding reactions. The models were trained and evaluated on a proprietary dataset of 10 reaction types. Also, the models were benchmarked on two public single reaction type datasets. The study was supplemented with analysis of data, results, and errors, as well as the impact of steric factors, side reactions, isolation, and purification efficiency. The supplementary code is available at https://github.com/SoftServeInc/yield-paper.
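The yield definition and the zero versus non-zero classification target described in the abstract above can be written out as a short sketch. It assumes the common convention of yield as isolated product relative to the theoretical maximum, both in moles; the function names and the threshold parameter are hypothetical and are not taken from the paper or its repository.

```python
def percent_yield(actual_mol: float, theoretical_mol: float) -> float:
    """Yield in percent: isolated product divided by the theoretical maximum (both in moles)."""
    if theoretical_mol <= 0:
        raise ValueError("theoretical amount must be positive")
    return 100.0 * actual_mol / theoretical_mol


def is_nonzero_yield(yield_pct: float, threshold: float = 0.0) -> int:
    """Binary classification label: 1 if the reaction gave any product above the threshold."""
    return int(yield_pct > threshold)


if __name__ == "__main__":
    y = percent_yield(actual_mol=0.75, theoretical_mol=1.0)
    print(y, is_nonzero_yield(y))
```

The regression objective would predict the percentage directly, while the classification objective would predict the binary label.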
Affiliation(s)
| | - Sofiya Garkot
- SoftServe, Inc., Lviv, Ukraine.,Ukrainian Catholic University, Lviv, Ukraine
| | - Oleksandr O Grygorenko
- Enamine Ltd., Kyiv, Ukraine.,Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
| | - Dmytro S Radchenko
- Enamine Ltd., Kyiv, Ukraine.,Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
| | - Yurii S Moroz
- Taras Shevchenko National University of Kyiv, Kyiv, Ukraine.,Chemspace LLC, Kyiv, Ukraine
| | - Oleksandr Gurbych
- Lviv Polytechnic National University, Lviv, Ukraine.,Blackthorn AI, Ltd., London, UK
| |
|
171
|
Sauer S, Matter H, Hessler G, Grebner C. Optimizing interactions to protein binding sites by integrating docking-scoring strategies into generative AI methods. Front Chem 2022; 10:1012507. [PMID: 36339033 PMCID: PMC9629386 DOI: 10.3389/fchem.2022.1012507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2022] [Accepted: 09/20/2022] [Indexed: 11/14/2022] Open
Abstract
The identification and optimization of promising lead molecules is essential for drug discovery. Recently, artificial intelligence (AI) based generative methods provided complementary approaches for generating molecules under specific design constraints of relevance in drug design. The goal of our study is to incorporate protein 3D information directly into generative design by flexible docking plus an adapted protein-ligand scoring function, thereby moving towards automated structure-based design. First, the protein-ligand scoring function RFXscore integrating individual scoring terms, ligand descriptors, and combined terms was derived using the PDBbind database and internal data. Next, design results for different workflows are compared to solely ligand-based reward schemes. Our newly proposed, optimal workflow for structure-based generative design is shown to produce promising results, especially for those exploration scenarios where diverse structures fitting to a protein binding site are requested. Best results are obtained using docking followed by RFXscore, while, depending on the exact application scenario, it was also found useful to combine this approach with other metrics that bias structure generation into "drug-like" chemical space, such as target-activity machine learning models.
Affiliation(s)
| | | | | | - Christoph Grebner
- Synthetic Molecular Design, Integrated Drug Discovery, Sanofi, Frankfurt, Germany
| |
|
172
|
Korshunova M, Huang N, Capuzzi S, Radchenko DS, Savych O, Moroz YS, Wells CI, Willson TM, Tropsha A, Isayev O. Generative and reinforcement learning approaches for the automated de novo design of bioactive compounds. Commun Chem 2022; 5:129. [PMID: 36697952 PMCID: PMC9814657 DOI: 10.1038/s42004-022-00733-0] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2021] [Accepted: 09/12/2022] [Indexed: 01/28/2023] Open
Abstract
Deep generative neural networks have been used increasingly in computational chemistry for de novo design of molecules with desired properties. Many deep learning approaches employ reinforcement learning for optimizing the target properties of the generated molecules. However, the success of this approach is often hampered by the problem of sparse rewards, as the majority of the generated molecules are expectedly predicted as inactives. We propose several technical innovations to address this problem and improve the balance between exploration and exploitation modes in reinforcement learning. In a proof-of-concept study, we demonstrate the application of the deep generative recurrent neural network architecture enhanced by several proposed technical tricks to design inhibitors of the epidermal growth factor receptor (EGFR) and further experimentally validate their potency. The proposed technical solutions are expected to substantially improve the success rate of finding novel bioactive compounds for specific biological targets using generative and reinforcement learning approaches.
Affiliation(s)
- Maria Korshunova
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, PA, USA.,Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.
| | - Niles Huang
- Department of Biochemistry, University of Oxford, Oxford, UK
| | - Stephen Capuzzi
- Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Dmytro S Radchenko
- Enamine Ltd, 78 Chervonotkatska Street, Kyiv, 02094, Ukraine.,Taras Shevchenko National University of Kyiv, Volodymyrska Street 60, Kyiv, 01601, Ukraine
| | - Olena Savych
- Enamine Ltd, 78 Chervonotkatska Street, Kyiv, 02094, Ukraine
| | - Yuriy S Moroz
- Taras Shevchenko National University of Kyiv, Volodymyrska Street 60, Kyiv, 01601, Ukraine.,Chemspace LLC, Chervonotkatska Street 85, Suite 1, Kyiv, 02094, Ukraine
| | - Carrow I Wells
- Structural Genomics Consortium, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Timothy M Willson
- Structural Genomics Consortium, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Alexander Tropsha
- Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Olexandr Isayev
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, PA, USA.,Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.
| |
|
173
|
Kong W, Hu Y, Zhang J, Tan Q. Application of SMILES-based molecular generative model in new drug design. Front Pharmacol 2022; 13:1046524. [PMCID: PMC9606214 DOI: 10.3389/fphar.2022.1046524] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2022] [Accepted: 10/03/2022] [Indexed: 11/13/2022] Open
Affiliation(s)
- Weiya Kong
- School of Sports Medicine and Rehabilitation, Beijing Sport University, Beijing, China
| | - Yuejuan Hu
- Nursing Department of Fenyang College of Shanxi Medical University, Fenyang, China
| | - Jiao Zhang
- Innovation and Entrepreneurship College of Hunan University of Finance and Economics, Changsha, China
| | - Qiaoyin Tan
- College of Teacher Education, Zhejiang Normal University, Jinhua, China
- *Correspondence: Qiaoyin Tan,
| |
|
174
|
Atance SR, Diez JV, Engkvist O, Olsson S, Mercado R. De Novo Drug Design Using Reinforcement Learning with Graph-Based Deep Generative Models. J Chem Inf Model 2022; 62:4863-4872. [PMID: 36219571 DOI: 10.1021/acs.jcim.2c00838] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
Machine learning provides effective computational tools for exploring the chemical space via deep generative models. Here, we propose a new reinforcement learning scheme to fine-tune graph-based deep generative models for de novo molecular design tasks. We show how our computational framework can successfully guide a pretrained generative model toward the generation of molecules with a specific property profile, even when such molecules are not present in the training set and unlikely to be generated by the pretrained model. We explored the following tasks: generating molecules of decreasing/increasing size, increasing drug-likeness, and increasing bioactivity. Using the proposed approach, we achieve a model which generates diverse compounds with predicted DRD2 activity for 95% of sampled molecules, outperforming previously reported methods on this metric.
Affiliation(s)
- Sara Romeo Atance
- Molecular AI, Discovery Sciences, R&D, AstraZeneca Gothenburg, Pepparedsleden 1, 431 50 Mölndal, Sweden.,Department of Computer Science and Engineering, Chalmers University of Technology, Rännvägen 6, 412 58 Göteborg, Sweden
| | - Juan Viguera Diez
- Molecular AI, Discovery Sciences, R&D, AstraZeneca Gothenburg, Pepparedsleden 1, 431 50 Mölndal, Sweden.,Department of Computer Science and Engineering, Chalmers University of Technology, Rännvägen 6, 412 58 Göteborg, Sweden
| | - Ola Engkvist
- Molecular AI, Discovery Sciences, R&D, AstraZeneca Gothenburg, Pepparedsleden 1, 431 50 Mölndal, Sweden.,Department of Computer Science and Engineering, Chalmers University of Technology, Rännvägen 6, 412 58 Göteborg, Sweden
| | - Simon Olsson
- Department of Computer Science and Engineering, Chalmers University of Technology, Rännvägen 6, 412 58 Göteborg, Sweden
| | - Rocío Mercado
- Molecular AI, Discovery Sciences, R&D, AstraZeneca Gothenburg, Pepparedsleden 1, 431 50 Mölndal, Sweden
| |
|
175
|
Thomas M, O’Boyle NM, Bender A, de Graaf C. Augmented Hill-Climb increases reinforcement learning efficiency for language-based de novo molecule generation. J Cheminform 2022; 14:68. [PMID: 36192789 PMCID: PMC9531503 DOI: 10.1186/s13321-022-00646-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2022] [Accepted: 09/23/2022] [Indexed: 11/10/2022] Open
Abstract
A plethora of AI-based techniques now exists to conduct de novo molecule generation that can devise molecules conditioned towards a particular endpoint in the context of drug design. One popular approach is using reinforcement learning to update a recurrent neural network or language-based de novo molecule generator. However, reinforcement learning can be inefficient, sometimes requiring up to 10^5 molecules to be sampled to optimize more complex objectives, which poses a limitation when using computationally expensive scoring functions like docking or computer-aided synthesis planning models. In this work, we propose a reinforcement learning strategy called Augmented Hill-Climb based on a simple, hypothesis-driven hybrid between REINVENT and Hill-Climb that improves sample-efficiency by addressing the limitations of both currently used strategies. We compare its ability to optimize several docking tasks with REINVENT and benchmark this strategy against other commonly used reinforcement learning strategies including REINFORCE, REINVENT (version 1 and 2), Hill-Climb and best agent reminder. We find that optimization ability is improved ~1.5-fold and sample-efficiency is improved ~45-fold compared to REINVENT while still delivering appealing chemistry as output. Diversity filters were used, and their parameters were tuned to overcome observed failure modes that take advantage of certain diversity filter configurations. We find that Augmented Hill-Climb outperforms the other reinforcement learning strategies used on six tasks, especially in the early stages of training or for more difficult objectives. Lastly, we show improved performance not only on recurrent neural networks but also on a reinforcement learning stabilized transformer architecture. Overall, we show that Augmented Hill-Climb improves sample-efficiency for language-based de novo molecule generation conditioning via reinforcement learning, compared to the current state-of-the-art. This makes more computationally expensive scoring functions, such as docking, more accessible on a relevant timescale.
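The "hybrid between REINVENT and Hill-Climb" described in the abstract above can be sketched roughly as: score a sampled batch, keep only the best-scoring fraction (the Hill-Climb part), and apply a REINVENT-style loss to those molecules alone. The sketch below rests on assumptions that are not stated in the abstract: the loss is taken to be the squared difference between an augmented likelihood (prior log-likelihood plus sigma times score) and the agent log-likelihood, scores are assumed to lie in [0, 1], and the sigma and topk defaults are hypothetical.

```python
def augmented_hill_climb_loss(batch, sigma=60.0, topk=0.5):
    """Mean REINVENT-style loss over the top-scoring fraction of a sampled batch.

    batch: list of (prior_loglik, agent_loglik, score) tuples, one per molecule.
    sigma: assumed scaling of the score inside the augmented likelihood.
    topk:  assumed fraction of the batch retained, ranked by score.
    """
    if not batch:
        raise ValueError("batch must contain at least one molecule")
    k = max(1, int(len(batch) * topk))
    # Hill-Climb step: keep only the k best-scoring molecules.
    best = sorted(batch, key=lambda m: m[2], reverse=True)[:k]
    # REINVENT-style step: squared gap between augmented and agent log-likelihood.
    losses = [(prior + sigma * score - agent) ** 2 for prior, agent, score in best]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy batch with made-up log-likelihoods and scores.
    toy = [(-20.0, -20.0, 1.0), (-20.0, -20.0, 0.0)]
    print(augmented_hill_climb_loss(toy, sigma=10.0, topk=0.5))
```

In an actual training loop the agent log-likelihoods would be differentiable tensors and this loss would be backpropagated through the generator; plain floats are used here only to show the selection and loss arithmetic.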
Affiliation(s)
- Morgan Thomas
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW UK
| | - Noel M. O’Boyle
- Computational Chemistry, Sosei Heptares, Steinmetz Building, Granta Park, Great Abington, Cambridge, CB21 6DG UK
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW UK
| | - Chris de Graaf
- Computational Chemistry, Sosei Heptares, Steinmetz Building, Granta Park, Great Abington, Cambridge, CB21 6DG UK
| |
|
176
|
D'Souza S, Kv P, Balaji S. Training recurrent neural networks as generative neural networks for molecular structures: how does it impact drug discovery? Expert Opin Drug Discov 2022; 17:1071-1079. [PMID: 36216812 DOI: 10.1080/17460441.2023.2134340] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]
Abstract
INTRODUCTION Deep learning approaches have become popular in recent years in de novo drug design. Generative models for molecule generation and optimization have shown promising results. Models trained on different chemical data could regenerate molecules that were similar to the query molecule, thus supporting lead optimization. Recurrent neural network-based generative models have demonstrated application in low-data drug discovery, fragment-based drug design and in lead optimization. AREAS COVERED In this review, we have provided an overview of recurrent neural network models and their variants for molecule generation with recent examples. The input representations of molecules as SMILES and molecular graphs have been discussed. The evaluation benchmarks and metrics used in generative neural network models are also highlighted. For this, ScienceDirect, Web of Science, and Google Scholar databases were searched with the article's keywords and their combinations to retrieve the most relevant and up-to-date information. EXPERT OPINION The simplicity of SMILES notation makes it suitable for training a sequence-based model such as a recurrent neural network. However, models that could be trained on molecular graphs to generate molecular structures which could be synthesized could open new possibilities for valid molecule generation and synthetic feasibility.
Affiliation(s)
- Sofia D'Souza
- Department of Computer Science and Engineering, Manipal Institute of Technology, MAHE, Manipal, India
| | - Prema Kv
- Department of Computer Science and Engineering, Manipal Institute of Technology, MAHE, Manipal, India
| | - Seetharaman Balaji
- Department of Computer Science and Engineering, Manipal Institute of Technology, MAHE, Manipal, India
| |
|
177
|
An interpretable machine learning model for selectivity of small molecules against homologous protein family. Future Med Chem 2022; 14:1441-1453. [PMID: 36169035 DOI: 10.4155/fmc-2022-0075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
Abstract
Aim: In the early stages of drug discovery, various experimental and computational methods are used to measure the specificity of small molecules against a target protein. The selectivity of small molecules remains a challenge leading to off-target side effects. Methods: We have developed a multitask deep learning model for predicting the selectivity on closely related homologs of the target protein. The model has been tested on the Janus-activated kinase and dopamine receptor families of proteins. Results & conclusion: The feature-based representation (extended connectivity fingerprint 4) with Extreme Gradient Boosting performed better when compared with deep neural network models in most of the evaluation metrics. Both the Extreme Gradient Boosting and deep neural network models outperformed the graph-based models. Furthermore, to decipher the model decision on selectivity, the important fragments associated with each homologous protein were identified.
|
178
|
Chadi MA, Mousannif H, Aamouche A. Conditional reduction of the loss value versus reinforcement learning for biassing a de-novo drug design generator. J Cheminform 2022; 14:65. [PMID: 36167559 PMCID: PMC9516832 DOI: 10.1186/s13321-022-00643-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2022] [Accepted: 09/07/2022] [Indexed: 11/10/2022] Open
Abstract
Deep learning has demonstrated promising results in de novo drug design. Often, the general pipeline consists of training a generative model (G) to learn the building rules of valid molecules, then using a biassing technique such as reinforcement learning (RL) to focus G on the desired chemical space. However, this sequential training of the same model for different tasks is known to be prone to a catastrophic forgetting (CF) phenomenon. This work presents a novel yet simple approach to bias G with significantly less CF than RL. The proposed method relies on backpropagating a reduced value of the cross-entropy loss used to train G according to the proportion of desired molecules that the biased-G can generate. We named our approach CRLV, short for conditional reduction of the loss value. We compared the two biased models (RL-biased-G and CRLV-biased-G) for four different objectives related to de novo drug design. CRLV-biased-G outperformed RL-biased-G in all four objectives and manifested appreciably less CF. Besides, an intersection analysis between molecules generated by the RL-biased-G and the CRLV-biased-G revealed that, given the low percentage of overlap between the two, they can be used jointly without losing diversity to further increase the desirability. Finally, we show that the difficulty of an objective is proportional to (i) its frequency in the dataset used to train G and (ii) the associated structural variance (SV), which is a new parameter we introduced in this paper, calling for novel exploration techniques for such difficult objectives.
Affiliation(s)
- Mohamed-Amine Chadi
- Laboratoire Ingénierie des Systèmes Informatiques (LISI), Department of Computer Science, Faculty of Sciences Semlalia, Cadi Ayyad University, 40000, Marrakech, Morocco
| | - Hajar Mousannif
- Laboratoire Ingénierie des Systèmes Informatiques (LISI), Department of Computer Science, Faculty of Sciences Semlalia, Cadi Ayyad University, 40000, Marrakech, Morocco
| | - Ahmed Aamouche
- Laboratoire Ingénierie des Systèmes et Applications (LISA), Ecole Nationale des Sciences Appliquées de Marrakech, Cadi Ayyad University, BP 575, Avenue Abdelkrim Khattabi, 40000, Marrakech, Morocco
|
179
|
Soleymani F, Paquet E, Viktor H, Michalowski W, Spinello D. Protein-protein interaction prediction with deep learning: A comprehensive review. Comput Struct Biotechnol J 2022; 20:5316-5341. [PMID: 36212542 PMCID: PMC9520216 DOI: 10.1016/j.csbj.2022.08.070] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2022] [Revised: 08/29/2022] [Accepted: 08/30/2022] [Indexed: 11/15/2022] Open
Abstract
Most proteins perform their biological function by interacting with themselves or other molecules. Thus, one may obtain biological insights into protein functions, disease prevalence, and therapy development by identifying protein-protein interactions (PPI). However, finding the interacting and non-interacting protein pairs through experimental approaches is labour-intensive and time-consuming, owing to the variety of proteins. Hence, protein-protein interaction and protein-ligand binding problems have drawn attention in the fields of bioinformatics and computer-aided drug discovery. Deep learning methods paved the way for scientists to predict the 3-D structure of proteins from genomes, predict the functions and attributes of a protein, and modify and design new proteins to provide desired functions. This review focuses on recent deep learning methods applied to problems including predicting protein functions, protein-protein interaction and their sites, protein-ligand binding, and protein design.
Affiliation(s)
- Farzan Soleymani
- Department of Mechanical Engineering, University of Ottawa, Ottawa, ON, Canada
| | - Eric Paquet
- National Research Council, 1200 Montreal Road, Ottawa, ON K1A 0R6, Canada
| | - Herna Viktor
- School of Electrical Engineering and Computer Science, University of Ottawa, ON, Canada
| | | | - Davide Spinello
- Department of Mechanical Engineering, University of Ottawa, Ottawa, ON, Canada
|
180
|
Zheng S, Tan Y, Wang Z, Li C, Zhang Z, Sang X, Chen H, Yang Y. Accelerated rational PROTAC design via deep learning and molecular simulations. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00527-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
|
181
|
A pocket-based 3D molecule generative model fueled by experimental electron density. Sci Rep 2022; 12:15100. [PMID: 36068257 PMCID: PMC9448726 DOI: 10.1038/s41598-022-19363-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2022] [Accepted: 08/29/2022] [Indexed: 11/08/2022] Open
Abstract
We report for the first time the use of experimental electron density (ED) as training data for the generation of drug-like three-dimensional molecules based on the structure of a target protein pocket. Similar to a structural biologist building molecules based on their ED, our model functions with two main components: a generative adversarial network (GAN) to generate the ligand ED in the input pocket and an ED interpretation module for molecule generation. The model was tested on three targets: a kinase (hematopoietic progenitor kinase 1), protease (SARS-CoV-2 main protease), and nuclear receptor (vitamin D receptor), and evaluated with a reference dataset composed of over 8000 compounds that have their activities reported in the literature. The evaluation considered the chemical validity, chemical space distribution-based diversity, and similarity with reference active compounds concerning the molecular structure and pocket-binding mode. Our model can generate molecules with similar structures to classical active compounds and novel compounds sharing similar binding modes with active compounds, making it a promising tool for library generation supporting high-throughput virtual screening. The ligand ED generated can also be used to support fragment-based drug design. Our model is available as an online service to academic users via https://edmg.stonewise.cn/#/create .
|
182
|
Wang J, Wang X, Sun H, Wang M, Zeng Y, Jiang D, Wu Z, Liu Z, Liao B, Yao X, Hsieh CY, Cao D, Chen X, Hou T. ChemistGA: A Chemical Synthesizable Accessible Molecular Generation Algorithm for Real-World Drug Discovery. J Med Chem 2022; 65:12482-12496. [PMID: 36065998 DOI: 10.1021/acs.jmedchem.2c01179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
Many deep learning (DL)-based molecular generative models have been proposed to design novel molecules. These models may perform well on benchmarks, but they usually do not take real-world constraints into account, such as available training data set, synthetic accessibility, and scaffold diversity in drug discovery. In this study, a new algorithm, ChemistGA, was proposed by combining the traditional heuristic algorithm with DL, in which the crossover of the traditional genetic algorithm (GA) was redefined by DL in conjunction with GA, and an innovative backcrossing operation was implemented to generate desired molecules. Our results clearly show that ChemistGA not only retains the strength of the traditional GA but also greatly enhances the synthetic accessibility and success rate of the generated molecules with desired properties. Calculations on the two benchmarks illustrate that ChemistGA achieves impressive performance among the state-of-the-art baselines, and it opens a new avenue for the application of generative models to real-world drug discovery scenarios.
Affiliation(s)
- Jike Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China.,School of Computer Science, Wuhan University, Wuhan 430072, Hubei, P. R. China.,CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, P. R. China
| | - Xiaorui Wang
- CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, P. R. China.,State Key Laboratory of Quality Research in Chinese Medicine, Macau University of Science and Technology, Taipa 999078, Macau(SAR), P. R. China
| | - Huiyong Sun
- Department of Medicinal Chemistry, China Pharmaceutical University, Nanjing 210009, Jiangsu, P. R. China
| | - Mingyang Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China.,CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, P. R. China
| | - Yundian Zeng
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Dejun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China.,CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, P. R. China
| | - Zhenxing Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Zeyi Liu
- DAMTP, Centre for Mathematical Sciences, University of Cambridge, Cambridge CB3 0WA, U.K.
| | - Ben Liao
- Tencent Quantum Laboratory, Tencent, Shenzhen 518057, Guangdong, P. R. China
| | - Xiaojun Yao
- State Key Laboratory of Quality Research in Chinese Medicine, Macau University of Science and Technology, Taipa 999078, Macau(SAR), P. R. China
| | - Chang-Yu Hsieh
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China.,Tencent Quantum Laboratory, Tencent, Shenzhen 518057, Guangdong, P. R. China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410004, Hunan, P. R. China
| | - Xi Chen
- School of Computer Science, Wuhan University, Wuhan 430072, Hubei, P. R. China
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
|
183
|
Ando T, Shimizu N, Yamamoto N, Matsuzawa NN, Maeshima H, Kaneko H. Design of Molecules with Low Hole and Electron Reorganization Energy Using DFT Calculations and Bayesian Optimization. J Phys Chem A 2022; 126:6336-6347. [PMID: 36053017 DOI: 10.1021/acs.jpca.2c05229] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
Materials exhibiting higher mobility than conventional organic semiconducting materials, such as fullerenes and fused thiophenes, are in high demand for applications in printed electronics. To discover new molecules that might show improved charge mobility, the adaptive design of experiments (DoE) to design molecules with low reorganization energy was performed by combining density functional theory (DFT) methods and machine learning techniques. DFT-calculated values of 165 molecules were used as an initial training dataset for a Gaussian process regression (GPR) model, and five rounds of molecular designs applying the GPR model and validation via DFT calculations were executed. As a result, new molecules whose reorganization energy is smaller than the lowest value in the initial training dataset were successfully discovered.
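The adaptive design loop described here (a GPR model proposes candidates, DFT validates them) needs a rule for ranking candidates from the model's predicted mean and uncertainty. The abstract does not name the acquisition function, so the expected-improvement rule below is an assumed, commonly used choice; the candidate names and numbers are hypothetical.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mean, std, best_so_far):
    """Expected improvement for MINIMIZATION of a predicted property.

    mean, std: surrogate-model prediction for a candidate (assumed to come
    from a Gaussian process regression model).
    best_so_far: lowest value observed in the training data.
    """
    if std <= 0.0:
        return max(best_so_far - mean, 0.0)
    z = (best_so_far - mean) / std
    return (best_so_far - mean) * normal_cdf(z) + std * normal_pdf(z)

# Hypothetical predictions (mean, std) of reorganization energy in eV.
candidates = {
    "mol_A": (0.30, 0.01),
    "mol_B": (0.28, 0.05),
    "mol_C": (0.35, 0.02),
}
best_known = 0.29

# Rank candidates for the next round of DFT validation, best first.
ranked = sorted(
    candidates,
    key=lambda name: expected_improvement(*candidates[name], best_known),
    reverse=True,
)
```

A rule of this kind favors candidates that are either predicted to beat the current best or are uncertain enough that they might, which is how a loop can find values below the lowest one in its initial training set.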
Affiliation(s)
- Tatsuhito Ando
- Engineering Division, Panasonic Industry Co., Ltd., Kadoma, Osaka 571-8506, Japan
| | - Naoto Shimizu
- Department of Applied Chemistry, School of Science and Technology, Meiji University, 1-1-1 Higashi-Mita, Tama-ku, Kawasaki, Kanagawa 214-8571, Japan
| | - Norihisa Yamamoto
- Department of Applied Chemistry, School of Science and Technology, Meiji University, 1-1-1 Higashi-Mita, Tama-ku, Kawasaki, Kanagawa 214-8571, Japan
| | - Nobuyuki N Matsuzawa
- Engineering Division, Panasonic Industry Co., Ltd., Kadoma, Osaka 571-8506, Japan
| | - Hiroyuki Maeshima
- Engineering Division, Panasonic Industry Co., Ltd., Kadoma, Osaka 571-8506, Japan
| | - Hiromasa Kaneko
- Department of Applied Chemistry, School of Science and Technology, Meiji University, 1-1-1 Higashi-Mita, Tama-ku, Kawasaki, Kanagawa 214-8571, Japan
|
184
|
Qian H, Lin C, Zhao D, Tu S, Xu L. AlphaDrug: protein target specific de novo molecular generation. PNAS Nexus 2022; 1:pgac227. [PMID: 36714828 PMCID: PMC9802440 DOI: 10.1093/pnasnexus/pgac227] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/07/2022] [Accepted: 10/02/2022] [Indexed: 11/06/2022]
Abstract
Traditional drug discovery is very laborious, expensive, and time-consuming, due to the huge combinatorial complexity of the discrete molecular search space. Researchers have turned to machine learning methods for help to tackle this difficult problem. However, most existing methods are either virtual screening on the available database of compounds by protein-ligand affinity prediction, or unconditional molecular generation, which does not take into account the information of the protein target. In this paper, we propose a protein target-oriented de novo drug design method, called AlphaDrug. Our method is able to automatically generate molecular drug candidates in an autoregressive way, and the drug candidates can dock into the given target protein well. To fulfill this goal, we devise a modified transformer network for the joint embedding of protein target and the molecule, and a Monte Carlo tree search (MCTS) algorithm for the conditional molecular generation. In the transformer variant, we impose a hierarchy of skip connections from protein encoder to molecule decoder for efficient feature transfer. The transformer variant computes the probabilities of next atoms based on the protein target and the molecule intermediate. We use the probabilities to guide the look-ahead search by MCTS to enhance or correct the next-atom selection. Moreover, MCTS is also guided by a value function implemented by a docking program, such that the paths with many low docking values are seldom chosen. Experiments on diverse protein targets demonstrate the effectiveness of our methods, indicating that AlphaDrug is a potentially promising solution to target-specific de novo drug design.
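The interplay between network probabilities and search statistics described above can be sketched with a PUCT-style selection rule, in which a prior probability steers exploration and accumulated values steer exploitation. The abstract does not give the exact selection formula, so the rule, the constant `c_puct`, and the token statistics below are assumptions for illustration.

```python
import math

def puct_score(prior, visit_count, value_sum, parent_visits, c_puct=1.5):
    """PUCT-style score: mean value plus a prior-weighted exploration bonus.

    ASSUMPTION: AlphaZero-style formula; the paper's actual rule may differ.
    """
    mean_value = value_sum / visit_count if visit_count > 0 else 0.0
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + visit_count)
    return mean_value + exploration

def select_next_token(children, c_puct=1.5):
    """Pick the child token with the highest PUCT score."""
    parent_visits = max(sum(c["visits"] for c in children.values()), 1)
    return max(
        children,
        key=lambda tok: puct_score(
            children[tok]["prior"],
            children[tok]["visits"],
            children[tok]["value_sum"],
            parent_visits,
            c_puct,
        ),
    )

# Hypothetical statistics for three candidate next atoms.
# "prior" stands in for the transformer's next-atom probability;
# "value_sum" stands in for accumulated, normalized docking-based rewards.
children = {
    "C": {"prior": 0.6, "visits": 10, "value_sum": 4.0},
    "N": {"prior": 0.3, "visits": 2, "value_sum": 1.6},
    "O": {"prior": 0.1, "visits": 0, "value_sum": 0.0},
}
chosen = select_next_token(children)
```

In this toy state the rule picks "N": it has the highest mean value and few visits, so it outranks the more probable but heavily explored "C".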
Affiliation(s)
- Hao Qian
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Centre for Cognitive Machines and Computational Health (CMaCH), Shanghai Jiao Tong University, Shanghai 200240, China
| | - Cheng Lin
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Centre for Cognitive Machines and Computational Health (CMaCH), Shanghai Jiao Tong University, Shanghai 200240, China
| | - Dengwei Zhao
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Centre for Cognitive Machines and Computational Health (CMaCH), Shanghai Jiao Tong University, Shanghai 200240, China
| | - Shikui Tu
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Centre for Cognitive Machines and Computational Health (CMaCH), Shanghai Jiao Tong University, Shanghai 200240, China
| | - Lei Xu
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Centre for Cognitive Machines and Computational Health (CMaCH), Shanghai Jiao Tong University, Shanghai 200240, China
|
185
|
Liu Z, Du J, Lin Z, Li Z, Liu B, Cui Z, Fang J, Xie L. DenovoProfiling: A webserver for de novo generated molecule library profiling. Comput Struct Biotechnol J 2022; 20:4082-4097. [PMID: 36016718 PMCID: PMC9379519 DOI: 10.1016/j.csbj.2022.07.045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2022] [Revised: 07/25/2022] [Accepted: 07/25/2022] [Indexed: 01/10/2023] Open
Abstract
Various deep learning-based architectures for molecular generation have been proposed for de novo drug design. The proliferation of de novo molecular generation methods and applications has created a great demand for visualization and functional profiling of de novo generated molecules. An increasing number of publicly available chemogenomic databases lays a good foundation and creates good opportunities for comprehensive profiling of the de novo library. In this paper, we present DenovoProfiling, a webserver dedicated to de novo library visualization and functional profiling. Currently, DenovoProfiling contains six modules: (1) an identification & visualization module for visualizing chemical structures and identifying reported structures, (2) a chemical space module for chemical space exploration using similarity maps, principal components analysis (PCA), drug-like properties distribution, and scaffold-based clustering, (3) an ADMET prediction module for predicting the ADMET properties of the de novo molecules, (4) a molecular alignment module for three-dimensional molecular shape analysis, (5) a drugs mapping module for identifying structurally similar drugs, and (6) a target & pathway module for identifying the reported targets and corresponding functional pathways. DenovoProfiling could provide structural identification, chemical space exploration, drug mapping, and target & pathway information. The comprehensive annotated information could give users a clear picture of their de novo library and could guide the further selection of candidates for chemical synthesis and biological confirmation. DenovoProfiling is freely available at http://denovoprofiling.xielab.net.
Key Words
- DDR1, Discoidin domain receptor 1
- De novo drug design
- De novo molecule library
- Deep learning
- FBDD, Fragment-based drug design
- FDR, False discovery rate
- GAN, Generative adversarial networks
- HTS, High throughput screening
- LSTM, Long short-term memory
- Library profiling
- PCA, Principal components analysis
- RNN, Recurrent neural networks
- SCA, Scaffold-based classification approach
- VAE, Variational autoencoders
Affiliation(s)
- Zhihong Liu
- School of Public Health, Xinxiang Medical University, Xinxiang, China
- Guangdong Provincial Key Laboratory of Microbial Culture Collection and Application, State Key Laboratory of Applied Microbiology Southern China, Institute of Microbiology, Guangdong Academy of Sciences, Guangzhou 510070, China
| | - Jiewen Du
- Beijing Jingpai Technology Co., Ltd., 1500-1, Hailong Building Z-Park, Beijing 100090, China
| | - Ziying Lin
- Guangdong Provincial Key Laboratory of Microbial Culture Collection and Application, State Key Laboratory of Applied Microbiology Southern China, Institute of Microbiology, Guangdong Academy of Sciences, Guangzhou 510070, China
| | - Ze Li
- School of Public Health, Xinxiang Medical University, Xinxiang, China
| | - Bingdong Liu
- Guangdong Provincial Key Laboratory of Microbial Culture Collection and Application, State Key Laboratory of Applied Microbiology Southern China, Institute of Microbiology, Guangdong Academy of Sciences, Guangzhou 510070, China
| | - Zongbin Cui
- Guangdong Provincial Key Laboratory of Microbial Culture Collection and Application, State Key Laboratory of Applied Microbiology Southern China, Institute of Microbiology, Guangdong Academy of Sciences, Guangzhou 510070, China
| | - Jiansong Fang
- Science and Technology Innovation Center, Guangzhou University of Chinese Medicine, Guangzhou, China
- Corresponding authors at: School of Public Health, Xinxiang Medical University, Xinxiang, China (L. Xie). Science and Technology Innovation Center, Guangzhou University of Chinese Medicine, Guangzhou, China (J. Fang).
| | - Liwei Xie
- School of Public Health, Xinxiang Medical University, Xinxiang, China
- Guangdong Provincial Key Laboratory of Microbial Culture Collection and Application, State Key Laboratory of Applied Microbiology Southern China, Institute of Microbiology, Guangdong Academy of Sciences, Guangzhou 510070, China
- Zhujiang Hospital, Southern Medical University, Guangzhou, China
|
186
|
Li C, Wang C, Sun M, Zeng Y, Yuan Y, Gou Q, Wang G, Guo Y, Pu X. Correlated RNN Framework to Quickly Generate Molecules with Desired Properties for Energetic Materials in the Low Data Regime. J Chem Inf Model 2022; 62:4873-4887. [PMID: 35998331 DOI: 10.1021/acs.jcim.2c00997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/10/2023]
Abstract
Motivated by the challenge of deep learning in the low data regime and the urgent demand for intelligent design of highly energetic materials, we explore a correlated deep learning framework, which consists of three recurrent neural networks (RNNs) correlated by the transfer learning strategy, to efficiently generate new energetic molecules with a high detonation velocity when very limited data are available. To avoid dependence on an external big data set, data augmentation by fragment shuffling of 303 energetic compounds is utilized to produce 500,000 molecules to pretrain the RNN, through which the model can learn sufficient structure knowledge. Then the pretrained RNN is fine-tuned by focusing on the 303 energetic compounds to generate 7153 molecules similar to the energetic compounds. In order to more reliably screen the molecules with a high detonation velocity, SMILES enumeration augmentation coupled with the pretrained knowledge is utilized to build an RNN-based prediction model, through which R² is boosted from 0.4446 to 0.9572. The comparable performance with the transfer learning strategy based on an existing big database (ChEMBL) to produce the energetic molecules and drug-like ones further supports the effectiveness and generality of our strategy in the low data regime. High-precision quantum mechanics calculations further confirm that 35 new molecules present a higher detonation velocity and lower synthetic accessibility than the classic explosive RDX, along with good thermal stability. In particular, three new molecules are comparable to caged CL-20 in the detonation velocity. All the source codes and the data set are freely available at https://github.com/wangchenghuidream/RNNMGM.
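The prediction model in this abstract is assessed with the coefficient of determination. As a reminder of what that figure measures, a minimal Python sketch follows; the detonation velocities are hypothetical values chosen for illustration and are not taken from the paper.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0.0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot

# Hypothetical detonation velocities in km/s (illustration only).
observed = [8.0, 8.5, 9.0, 9.5]
predicted = [8.1, 8.4, 9.1, 9.4]
score = r_squared(observed, predicted)
```

A value near 1 means the predictions explain almost all of the variance in the observed property, while a value near 0 means they do little better than predicting the mean.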
Affiliation(s)
- Chuan Li
- College of Computer Science, Sichuan University, Chengdu 610064, China
| | - Chenghui Wang
- College of Computer Science, Sichuan University, Chengdu 610064, China
| | - Ming Sun
- College of Chemistry, Sichuan University, Chengdu 610064, China
| | - Yan Zeng
- College of Computer Science, Sichuan University, Chengdu 610064, China
| | - Yuan Yuan
- College of Management, Southwest University for Nationalities, Chengdu 610041, China
| | - Qiaolin Gou
- College of Chemistry, Sichuan University, Chengdu 610064, China
| | - Guangchuan Wang
- College of Computer Science, Sichuan University, Chengdu 610064, China
| | - Yanzhi Guo
- College of Chemistry, Sichuan University, Chengdu 610064, China
| | - Xuemei Pu
- College of Chemistry, Sichuan University, Chengdu 610064, China
|
187
|
Staker J, Marshall K, Leswing K, Robertson T, Halls MD, Goldberg A, Morisato T, Maeshima H, Ando T, Arai H, Sasago M, Fujii E, Matsuzawa NN. De Novo Design of Molecules with Low Hole Reorganization Energy Based on a Quarter-Million Molecule DFT Screen: Part 2. J Phys Chem A 2022; 126:5837-5852. [PMID: 35984470 DOI: 10.1021/acs.jpca.2c04221] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
Organic semiconductors have many desirable properties including improved manufacturing and flexible mechanical properties. Due to the vastness of chemical space, it is essential to efficiently explore chemical space when designing new materials, including through the use of generative techniques. New generative machine learning methods for molecular design continue to be published in the literature at a significant rate but successfully adapting methods to new chemistry and problem domains remains difficult. These challenges necessitate continual method evaluation to probe method viability for use in alternative applications not covered in the original works. In continuation of our previous work, we evaluate four additional machine-learning-based de novo methods for generating molecules with high predicted hole mobility for use in semiconductor applications. The four generative methods evaluated here are (1) Molecule Deep Q-Networks (MolDQN), which utilizes Deep-Q learning to directly optimize molecular structure graphs for desired properties instead of generating SMILES, (2) Graph-based Genetic Algorithm (GraphGA), which uses a genetic algorithm for optimization where crossovers and mutations are defined in terms of RDKit's reaction SMILES, (3) Generative Tensorial Reinforcement Learning (GENTRL), which is a variational autoencoder (VAE) with a learned prior distribution and optimized using reinforcement learning, and (4) Monte Carlo tree search exploration of chemical space in conjunction with a recurrent neural network (RNN) decoder (ChemTS). The generated molecules were evaluated using density functional theory (DFT) and we discovered better performing molecules with the GraphGA method compared to the other approaches.
Affiliation(s)
- Joshua Staker
- Schrödinger Inc., 101 SW Main Street, Suite 1300, Portland, Oregon 97204, United States
| | - Kyle Marshall
- Schrödinger Inc., 1540 Broadway, 24th Floor, New York, New York 10036, United States
| | - Karl Leswing
- Schrödinger Inc., 1540 Broadway, 24th Floor, New York, New York 10036, United States
| | - Tim Robertson
- Schrödinger Inc., 1540 Broadway, 24th Floor, New York, New York 10036, United States
| | - Mathew D Halls
- Schrödinger Inc., 10201 Wateridge Circle, Suite 220, San Diego, California 92121, United States
| | - Alexander Goldberg
- Schrödinger Inc., 10201 Wateridge Circle, Suite 220, San Diego, California 92121, United States
| | - Tsuguo Morisato
- Schrödinger K. K., 13th Floor, Marunouchi Trust Tower North Building, 1-8-1 Marunouchi, Chiyoda-ku, Tokyo 100-0005, Japan
| | - Hiroyuki Maeshima
- Engineering Division, Panasonic Industry Co., Ltd., 1006 Kadoma, Kadoma, Osaka 571-8506, Japan
| | - Tatsuhito Ando
- Engineering Division, Panasonic Industry Co., Ltd., 1006 Kadoma, Kadoma, Osaka 571-8506, Japan
| | - Hideyuki Arai
- Engineering Division, Panasonic Industry Co., Ltd., 1006 Kadoma, Kadoma, Osaka 571-8506, Japan
| | - Masaru Sasago
- Engineering Division, Panasonic Industry Co., Ltd., 1006 Kadoma, Kadoma, Osaka 571-8506, Japan
| | - Eiji Fujii
- Engineering Division, Panasonic Industry Co., Ltd., 1006 Kadoma, Kadoma, Osaka 571-8506, Japan
| | - Nobuyuki N Matsuzawa
- Engineering Division, Panasonic Industry Co., Ltd., 1006 Kadoma, Kadoma, Osaka 571-8506, Japan
|
188
|
Ishitani R, Kataoka T, Rikimaru K. Molecular Design Method Using a Reversible Tree Representation of Chemical Compounds and Deep Reinforcement Learning. J Chem Inf Model 2022; 62:4032-4048. [PMID: 35960209 PMCID: PMC9472278 DOI: 10.1021/acs.jcim.2c00366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
Automatic design of molecules with specific chemical and biochemical properties is an important process in material informatics and computational drug discovery. In this study, we designed a novel coarse-grained tree representation of molecules (Reversible Junction Tree; “RJT”) for the aforementioned purposes, which is reversely convertible to the original molecule without external information. By leveraging this representation, we further formulated the molecular design and optimization problem as a tree-structure construction using deep reinforcement learning (“RJT-RL”). In this method, all of the intermediate and final states of reinforcement learning are convertible to valid molecules, which could efficiently guide the optimization process in simple benchmark tasks. We further examined the multiobjective optimization and fine-tuning of the reinforcement learning models using RJT-RL, demonstrating the applicability of our method to more realistic tasks in drug discovery.
Affiliation(s)
- Ryuichiro Ishitani
- Preferred Networks, Inc., 1-6-1 Otemachi, Chiyoda-ku, Tokyo 100-0004, Japan
| | - Toshiki Kataoka
- Preferred Networks, Inc., 1-6-1 Otemachi, Chiyoda-ku, Tokyo 100-0004, Japan
| | - Kentaro Rikimaru
- Preferred Networks, Inc., 1-6-1 Otemachi, Chiyoda-ku, Tokyo 100-0004, Japan
|
189
|
Kumar S, Kumar GS, Maitra SS, Malý P, Bharadwaj S, Sharma P, Dwivedi VD. Viral informatics: bioinformatics-based solution for managing viral infections. Brief Bioinform 2022; 23:6659740. [PMID: 35947964 DOI: 10.1093/bib/bbac326] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2022] [Revised: 06/26/2022] [Accepted: 07/18/2022] [Indexed: 11/13/2022] Open
Abstract
Several new viral infections have emerged in the human population and established themselves as global pandemics. With advancements in translational research, the scientific community has developed potential therapeutics to eradicate or control certain viral infections, such as smallpox and polio, responsible for billions of disabilities and deaths in the past. Unfortunately, some viral infections, such as dengue virus (DENV) and human immunodeficiency virus-1 (HIV-1), are still prevailing due to a lack of specific therapeutics, while new pathogenic viral strains or variants are emerging because of high genetic recombination or cross-species transmission. Consequently, to combat the emerging viral infections, bioinformatics-based strategies have been developed for characterizing viruses and developing new effective therapeutics for their eradication or management. This review attempts to provide a single platform for the available wide range of bioinformatics-based approaches, including bioinformatics methods for the identification and management of emerging or evolved viral strains, genome analysis concerning the pathogenicity and epidemiological analysis, computational methods for designing the viral therapeutics, and consolidated information in the form of databases against the known pathogenic viruses. This enriched review of the generally applicable viral informatics approaches aims to provide an overview of available resources capable of carrying out the desired task and may be utilized to expand additional strategies to improve the quality of translational viral informatics research.
Affiliation(s)
- Sanjay Kumar
- School of Biotechnology, Jawaharlal Nehru University, New Delhi, India.,Center for Bioinformatics, Computational and Systems Biology, Pathfinder Research and Training Foundation, Greater Noida, India
| | - Geethu S Kumar
- Department of Life Science, School of Basic Science and Research, Sharda University, Greater Noida, Uttar Pradesh, India.,Center for Bioinformatics, Computational and Systems Biology, Pathfinder Research and Training Foundation, Greater Noida, India
| | | | - Petr Malý
- Laboratory of Ligand Engineering, Institute of Biotechnology of the Czech Academy of Sciences v.v.i., BIOCEV Research Center, Vestec, Czech Republic
| | - Shiv Bharadwaj
- Laboratory of Ligand Engineering, Institute of Biotechnology of the Czech Academy of Sciences v.v.i., BIOCEV Research Center, Vestec, Czech Republic
| | - Pradeep Sharma
- Department of Biophysics, All India Institute of Medical Sciences, New Delhi, India
| | - Vivek Dhar Dwivedi
- Center for Bioinformatics, Computational and Systems Biology, Pathfinder Research and Training Foundation, Greater Noida, India.,Institute of Advanced Materials, IAAM, 59053 Ulrika, Sweden
|
190
|
Yang L, Yang G, Bing Z, Tian Y, Huang L, Niu Y, Yang L. Accelerating the discovery of anticancer peptides targeting lung and breast cancers with the Wasserstein autoencoder model and PSO algorithm. Brief Bioinform 2022; 23:6658854. [PMID: 35945135 DOI: 10.1093/bib/bbac320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2022] [Revised: 06/14/2022] [Accepted: 07/15/2022] [Indexed: 11/13/2022] Open
Abstract
In the development of targeted drugs, anticancer peptides (ACPs) have attracted great attention because of their high selectivity, low toxicity and minimal non-specificity. In this work, we report a framework for ACP generation that combines a Wasserstein autoencoder (WAE) generative model with a Particle Swarm Optimization (PSO) forward search algorithm, guided by an attribute predictive model, to generate ACPs with desired properties. It is well known that generative models based on the Variational AutoEncoder (VAE) and Generative Adversarial Networks (GAN) are difficult to use for de novo design because of the problems of posterior collapse and difficult training convergence. Our WAE-based generative model trains more successfully (lower perplexity and reconstruction loss) than both VAE- and GAN-based generative models, and the semantic connections in the latent space of the WAE accelerate the forward controlled generation performed by PSO, whereas the VAE fails to capture this feature. Finally, we validated our pipeline on breast cancer targets (HIF-1) and lung cancer targets (VEGR, ErbB2). By peptide-protein docking, we found candidate compounds with the same binding sites as the peptides carried in the crystal structure but with higher binding affinity and novel structures, which may be potent antagonists that interfere with the signaling mediated by these targets.
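The PSO search named in this abstract can be illustrated with a minimal, generic sketch. This is not the authors' pipeline: the latent vector, the `toy_property_loss` objective (standing in for an attribute predictive model), and all hyperparameter values are assumptions chosen only to show how particles move through a continuous space toward a better score.

```python
import random


def pso_minimize(objective, dim, n_particles=20, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5,
                 bounds=(-5.0, 5.0), seed=0):
    """Generic particle swarm optimization over a box-bounded continuous space."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # personal best positions
    best_val = [objective(p) for p in pos]      # personal best scores
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to personal best, pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Clamp the updated position to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def toy_property_loss(z):
    """Hypothetical stand-in for a property predictor evaluated on a latent vector."""
    target = [1.0, -2.0, 0.5]  # assumed "ideal" latent point, illustration only
    return sum((a - b) ** 2 for a, b in zip(z, target))


best_z, best_loss = pso_minimize(toy_property_loss, dim=3)
```

In a generative setting, each candidate vector would presumably be decoded into a sequence before scoring; here the objective is evaluated directly on the vector to keep the sketch self-contained.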
Affiliation(s)
- Lijuan Yang
  - Institute of Modern Physics, Chinese Academy of Science, Lanzhou 730000, China
  - School of Physics and Technology, Lanzhou University, Lanzhou 730000, China
  - School of Physics, University of Chinese Academy of Science, Beijing 100049, China
  - Advanced Energy Science and Technology Guangdong Laboratory, Huizhou 516000, China
- Guanghui Yang
  - Institute of Modern Physics, Chinese Academy of Science, Lanzhou 730000, China
  - Advanced Energy Science and Technology Guangdong Laboratory, Huizhou 516000, China
- Zhitong Bing
  - Institute of Modern Physics, Chinese Academy of Science, Lanzhou 730000, China
  - Advanced Energy Science and Technology Guangdong Laboratory, Huizhou 516000, China
- Yuan Tian
  - Institute of Modern Physics, Chinese Academy of Science, Lanzhou 730000, China
  - School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China
- Liang Huang
  - School of Physics and Technology, Lanzhou University, Lanzhou 730000, China
- Yuzhen Niu
  - Shandong Provincial Research Center for Bioinformatic Engineering and Technique, School of Life Sciences, Shandong University of Technology, Zibo 255000, China
- Lei Yang
  - Institute of Modern Physics, Chinese Academy of Science, Lanzhou 730000, China
  - Advanced Energy Science and Technology Guangdong Laboratory, Huizhou 516000, China
191
García-Ortegón M, Simm GNC, Tripp AJ, Hernández-Lobato JM, Bender A, Bacallado S. DOCKSTRING: Easy Molecular Docking Yields Better Benchmarks for Ligand Design. J Chem Inf Model 2022; 62:3486-3502. [PMID: 35849793 PMCID: PMC9364321 DOI: 10.1021/acs.jcim.1c01334] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 11/03/2021] [Indexed: 01/05/2023]
Abstract
The field of machine learning for drug discovery is witnessing an explosion of novel methods. These methods are often benchmarked on simple physicochemical properties such as solubility or general druglikeness, which can be readily computed. However, these properties are poor representatives of objective functions in drug design, mainly because they do not depend on the candidate compound's interaction with the target. By contrast, molecular docking is a widely applied method in drug discovery to estimate binding affinities. However, docking studies require a significant amount of domain knowledge to set up correctly, which hampers adoption. Here, we present dockstring, a bundle for meaningful and robust comparison of ML models using docking scores. dockstring consists of three components: (1) an open-source Python package for straightforward computation of docking scores, (2) an extensive dataset of docking scores and poses of more than 260,000 molecules for 58 medically relevant targets, and (3) a set of pharmaceutically relevant benchmark tasks such as virtual screening or de novo design of selective kinase inhibitors. The Python package implements a robust ligand and target preparation protocol that allows nonexperts to obtain meaningful docking scores. Our dataset is the first to include docking poses, as well as the first of its size that is a full matrix, thus facilitating experiments in multiobjective optimization and transfer learning. Overall, our results indicate that docking scores are a more realistic evaluation objective than simple physicochemical properties, yielding benchmark tasks that are more challenging and more closely related to real problems in drug discovery.
Affiliation(s)
- Miguel García-Ortegón
  - Statistical Laboratory, Centre for Mathematical Sciences, University of Cambridge, Wilberforce Rd., Cambridge CB3 0WB, United Kingdom
- Gregor N. C. Simm
  - Department of Engineering, University of Cambridge, Trumpington St., Cambridge CB2 1PZ, United Kingdom
- Austin J. Tripp
  - Department of Engineering, University of Cambridge, Trumpington St., Cambridge CB2 1PZ, United Kingdom
- Andreas Bender
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Rd., Cambridge CB2 1EW, United Kingdom
- Sergio Bacallado
  - Statistical Laboratory, Centre for Mathematical Sciences, University of Cambridge, Wilberforce Rd., Cambridge CB3 0WB, United Kingdom
192
Lim S, Lee S, Piao Y, Choi M, Bang D, Gu J, Kim S. On modeling and utilizing chemical compound information with deep learning technologies: A task-oriented approach. Comput Struct Biotechnol J 2022; 20:4288-4304. [PMID: 36051875 PMCID: PMC9399946 DOI: 10.1016/j.csbj.2022.07.049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2022] [Revised: 07/29/2022] [Accepted: 07/29/2022] [Indexed: 11/22/2022] Open
Abstract
A large number of chemical compounds are available in databases such as PubChem and ZINC. However, the currently known compounds, though numerous, represent only a fraction of all possible compounds, the full set of which is known as chemical space. Many of the compounds in these databases are annotated with properties and assay data that can be used for drug discovery efforts. To this end, a number of machine learning algorithms have been developed, and recent deep learning technologies can be used effectively to navigate chemical space, especially for unknown chemical compounds, in terms of drug-related tasks. In this article, we survey how deep learning technologies can model and utilize chemical compound information in a task-oriented way by exploiting annotated properties and assay data in chemical compound databases. We first compile the kinds of tasks that machine learning methods are trying to accomplish. Then, we survey deep learning technologies to show their modeling power and current applications for accomplishing drug-related tasks. Next, we survey deep learning techniques that address the insufficiency of annotated data for more effective navigation of chemical space. Chemical compound information alone may not be powerful enough for drug-related tasks, so we survey what kinds of information, such as assay and gene expression data, can be used to improve the predictive power of deep learning models. Finally, we conclude this survey with four important newly developed technologies that are yet to be fully incorporated into computational analysis of chemical information.
Affiliation(s)
- Sangsoo Lim
  - Bioinformatics Institute, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
- Sangseon Lee
  - Institute of Computer Technology, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
- Yinhua Piao
  - Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
- MinGyu Choi
  - Department of Chemistry, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
  - AIGENDRUG Co., Ltd., Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
- Dongmin Bang
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
- Jeonghyeon Gu
  - Interdisciplinary Program in Artificial Intelligence, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
- Sun Kim
  - Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
  - Interdisciplinary Program in Artificial Intelligence, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
  - MOGAM Institute for Biomedical Research, Yong-in 16924, South Korea
  - AIGENDRUG Co., Ltd., Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
193
Menon D, Ranganathan R. A Generative Approach to Materials Discovery, Design, and Optimization. ACS Omega 2022; 7:25958-25973. [PMID: 35936396 PMCID: PMC9352221 DOI: 10.1021/acsomega.2c03264] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/25/2022] [Accepted: 07/11/2022] [Indexed: 05/25/2023]
Abstract
Despite its potential to transform society, materials research suffers from a major drawback: its long research timeline. Recently, machine-learning techniques have emerged as a viable solution to this drawback and have shown accuracies comparable to other computational techniques like density functional theory (DFT) at a fraction of the computational time. One particular class of machine-learning models, known as "generative models", is of particular interest owing to its ability to approximate high-dimensional probability distribution functions, which in turn can be used to generate novel data such as molecular structures by sampling these approximated probability distribution functions. This review article aims to provide an in-depth understanding of the underlying mathematical principles of popular generative models such as recurrent neural networks, variational autoencoders, and generative adversarial networks and discuss their state-of-the-art applications in the domains of biomaterials and organic drug-like materials, energy materials, and structural materials. Here, we discuss a broad range of applications of these models spanning from the discovery of drugs that treat cancer to finding the first room-temperature superconductor and from the discovery and optimization of battery and photovoltaic materials to the optimization of high-entropy alloys. We conclude by presenting a brief outlook of the major challenges that lie ahead for the mainstream usage of these models for materials research.
194
Tan RK, Liu Y, Xie L. Reinforcement learning for systems pharmacology-oriented and personalized drug design. Expert Opin Drug Discov 2022; 17:849-863. [PMID: 35510835 PMCID: PMC9824901 DOI: 10.1080/17460441.2022.2072288] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 01/11/2023]
Abstract
INTRODUCTION: Many multi-genic systemic diseases, such as neurological disorders, inflammatory diseases, and the majority of cancers, do not yet have effective treatments. Reinforcement learning-powered systems pharmacology is a potentially effective approach to designing personalized therapies for untreatable complex diseases. AREAS COVERED: In this survey, state-of-the-art reinforcement learning methods and their latest applications to drug design are reviewed. The challenges of harnessing reinforcement learning for systems pharmacology and personalized medicine are discussed. Potential solutions to overcome these challenges are proposed. EXPERT OPINION: In spite of the successful application of advanced reinforcement learning techniques to target-based drug discovery, new reinforcement learning strategies are needed to address systems pharmacology-oriented personalized de novo drug design.
Affiliation(s)
- Ryan K. Tan
  - Department of Computer Science, Hunter College, The City University of New York
- Yang Liu
  - Department of Computer Science, Hunter College, The City University of New York
- Lei Xie
  - Department of Computer Science, Hunter College, The City University of New York
  - Ph.D. Program in Computer Science, Biology & Biochemistry, The Graduate Center, The City University of New York
  - Helen and Robert Appel Alzheimer’s Disease Research Institute, Feil Family Brain & Mind Research Institute, Weill Cornell Medicine, Cornell University
  - Correspondence should be addressed to Lei Xie -
195
Sajjan M, Li J, Selvarajan R, Sureshbabu SH, Kale SS, Gupta R, Singh V, Kais S. Quantum machine learning for chemistry and physics. Chem Soc Rev 2022; 51:6475-6573. [PMID: 35849066 DOI: 10.1039/d2cs00203e] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Indexed: 12/12/2022]
Abstract
Machine learning (ML) has emerged as a formidable force for identifying hidden but pertinent patterns within a given data set with the objective of subsequently generating automated predictive behavior. It is safe to conclude that, in recent years, ML and its close cousin, deep learning (DL), have ushered in unprecedented developments in all areas of the physical sciences, especially chemistry. Not only classical variants of ML but also those trainable on near-term quantum hardware have been developed with promising outcomes. Such algorithms have revolutionized materials design and the performance of photovoltaics, electronic structure calculations of ground and excited states of correlated matter, computation of force fields and potential energy surfaces informing chemical reaction dynamics, reactivity-inspired rational strategies for drug design, and even classification of phases of matter with accurate identification of emergent criticality. In this review we shall explicate a subset of such topics and delineate the contributions made by both classical and quantum computing-enhanced machine learning algorithms over the past few years. We shall not only present a brief overview of the well-known techniques but also highlight their learning strategies using statistical physical insight. The objective of the review is not only to foster exposition of the aforesaid techniques but also to empower and promote cross-pollination among future research in all areas of chemistry that can benefit from ML and that can, in turn, potentially accelerate the growth of such algorithms.
Affiliation(s)
- Manas Sajjan
  - Department of Chemistry, Purdue University, West Lafayette, IN-47907, USA
  - Purdue Quantum Science and Engineering Institute, Purdue University, West Lafayette, Indiana 47907, USA
- Junxu Li
  - Purdue Quantum Science and Engineering Institute, Purdue University, West Lafayette, Indiana 47907, USA
  - Department of Physics and Astronomy, Purdue University, West Lafayette, IN-47907, USA
- Raja Selvarajan
  - Purdue Quantum Science and Engineering Institute, Purdue University, West Lafayette, Indiana 47907, USA
  - Department of Physics and Astronomy, Purdue University, West Lafayette, IN-47907, USA
- Shree Hari Sureshbabu
  - Purdue Quantum Science and Engineering Institute, Purdue University, West Lafayette, Indiana 47907, USA
  - Elmore Family School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN-47907, USA
- Sumit Suresh Kale
  - Department of Chemistry, Purdue University, West Lafayette, IN-47907, USA
  - Purdue Quantum Science and Engineering Institute, Purdue University, West Lafayette, Indiana 47907, USA
- Rishabh Gupta
  - Department of Chemistry, Purdue University, West Lafayette, IN-47907, USA
  - Purdue Quantum Science and Engineering Institute, Purdue University, West Lafayette, Indiana 47907, USA
- Vinit Singh
  - Department of Chemistry, Purdue University, West Lafayette, IN-47907, USA
  - Purdue Quantum Science and Engineering Institute, Purdue University, West Lafayette, Indiana 47907, USA
- Sabre Kais
  - Department of Chemistry, Purdue University, West Lafayette, IN-47907, USA
  - Purdue Quantum Science and Engineering Institute, Purdue University, West Lafayette, Indiana 47907, USA
  - Department of Physics and Astronomy, Purdue University, West Lafayette, IN-47907, USA
  - Elmore Family School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN-47907, USA
196
Guo M, Shou W, Makatura L, Erps T, Foshey M, Matusik W. PolyGrammar: Grammar for Digital Polymer Representation and Generation. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2022; 9:e2101864. [PMID: 35678650 PMCID: PMC9376847 DOI: 10.1002/advs.202101864] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/05/2021] [Revised: 12/04/2021] [Indexed: 05/22/2023]
Abstract
Polymers are widely studied materials with diverse properties and applications determined by molecular structures. It is essential to represent these structures clearly and explore the full space of achievable chemical designs. However, existing approaches cannot offer comprehensive design models for polymers because of their inherent scale and structural complexity. Here, a parametric, context-sensitive grammar designed specifically for polymers (PolyGrammar) is proposed. Using the symbolic hypergraph representation and 14 simple production rules, PolyGrammar can represent and generate all valid polyurethane structures. An algorithm is presented to translate any polyurethane structure from the popular Simplified Molecular-Input Line-entry System (SMILES) string format into the PolyGrammar representation. The representative power of PolyGrammar is tested by translating a dataset of over 600 polyurethane samples collected from the literature. Furthermore, it is shown that PolyGrammar can be easily extended to other copolymers and homopolymers. By offering a complete, explicit representation scheme and an explainable generative model with validity guarantees, PolyGrammar takes an essential step toward a more comprehensive and practical system for polymer discovery and exploration. As the first bridge between formal languages and chemistry, PolyGrammar also serves as a critical blueprint to inform the design of similar grammars for other chemistries, including organic and inorganic molecules.
Affiliation(s)
- Minghao Guo
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - CUHK Multimedia Lab, The Chinese University of Hong Kong, Sha Tin, Hong Kong
- Wan Shou
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Liane Makatura
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Timothy Erps
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Michael Foshey
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Wojciech Matusik
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
197
Zhang J, Chen H. De Novo Molecule Design Using Molecular Generative Models Constrained by Ligand-Protein Interactions. J Chem Inf Model 2022; 62:3291-3306. [PMID: 35793555 DOI: 10.1021/acs.jcim.2c00177] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 11/28/2022]
Abstract
In recent years, molecular deep generative models have attracted much attention for their application in de novo drug design. A data-driven molecular deep generative model approximates the high-dimensional distribution of chemical space by learning from a large amount of molecular structural data. So far, most molecular generative models rely on purely 2D ligand information in structure generation. Here, we propose a novel molecular deep generative model that adopts a recurrent neural network (RNN) architecture coupled with a ligand-protein interaction fingerprint as a constraint. The fingerprint was constructed from ligand docking poses and represents the 3D binding mode of ligands in the protein pocket. In the current work, generative models constrained with interaction fingerprints were trained and compared with normal RNN models. The results show that models trained with ligand-protein interaction fingerprint constraints have a clear tendency to generate compounds that maintain similar binding modes. Our results demonstrate the potential application of the interaction fingerprint-constrained generative model for targeted molecule generation and guided exploration of drug-like chemical space.
Affiliation(s)
- Jie Zhang
  - Guangdong Provincial Key Laboratory of Laboratory Animals, Guangdong Laboratory Animals Monitoring Institute, Guangzhou 510663, P. R. China
  - State Key Laboratory of Respiratory Disease, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, P. R. China
  - Bioland Laboratory (Guangzhou Regenerative Medicine and Health-Guangdong Laboratory), Guangzhou 510530, P. R. China
- Hongming Chen
  - Bioland Laboratory (Guangzhou Regenerative Medicine and Health-Guangdong Laboratory), Guangzhou 510530, P. R. China
  - Guangzhou International Bio Island, Guangzhou Laboratory, No. 9 XinDaoHuanBei Road, Guangzhou 510005, China
198
Woodward DJ, Bradley AR, van Hoorn WP. Coverage Score: A Model Agnostic Method to Efficiently Explore Chemical Space. J Chem Inf Model 2022; 62:4391-4402. [PMID: 35867814 DOI: 10.1021/acs.jcim.2c00258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Selecting the most appropriate compounds to synthesize and test is a vital aspect of drug discovery. Methods like clustering and diversity present weaknesses in selecting the optimal sets for information gain. Active learning techniques often rely on an initial model and computationally expensive semi-supervised batch selection. Herein, we describe a new subset-based selection method, Coverage Score, that combines Bayesian statistics and information entropy to balance representation and diversity to select a maximally informative subset. Coverage Score can be influenced by prior selections and desirable properties. In this paper, subsets selected through Coverage Score are compared against subsets selected through model-independent and model-dependent techniques for several datasets. In drug-like chemical space, Coverage Score consistently selects subsets that lead to more accurate predictions compared to other selection methods. Subsets selected through Coverage Score produced Random Forest models that have a root-mean-square-error up to 12.8% lower than subsets selected at random and can retain up to 99% of the structural dissimilarity of a diversity selection.
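For context, the kind of diversity selection that this abstract uses as a comparison baseline can be sketched as a greedy MaxMin picker over Tanimoto distances. This is not Coverage Score itself, whose Bayesian and entropy terms are not specified here; the fingerprints (sets of on-bit indices) and the function names are assumptions made for illustration.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def maxmin_select(fingerprints, k, first=0):
    """Greedy MaxMin: repeatedly add the compound farthest from those already chosen."""
    n = len(fingerprints)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of compounds")
    selected = [first]
    # min_dist[i] is the distance from compound i to its nearest selected compound;
    # selected compounds are marked with -1.0 so they are never picked twice.
    min_dist = [1.0 - tanimoto(fp, fingerprints[first]) for fp in fingerprints]
    min_dist[first] = -1.0
    while len(selected) < k:
        nxt = max(range(n), key=min_dist.__getitem__)
        selected.append(nxt)
        min_dist[nxt] = -1.0
        for i, fp in enumerate(fingerprints):
            d = 1.0 - tanimoto(fp, fingerprints[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Four toy fingerprints: compound 2 shares no bits with the others.
toy_fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}]
picked = maxmin_select(toy_fps, k=3)
```

A picker like this maximizes structural dissimilarity only; it ignores how well the subset represents the dense regions of the library, which is the trade-off the abstract says Coverage Score is designed to balance.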
Affiliation(s)
- Daniel J Woodward
  - Exscientia plc, The Schrödinger Building, Oxford Science Park, Oxford OX4 4GE, U.K
- Anthony R Bradley
  - Exscientia plc, The Schrödinger Building, Oxford Science Park, Oxford OX4 4GE, U.K
- Willem P van Hoorn
  - Exscientia plc, The Schrödinger Building, Oxford Science Park, Oxford OX4 4GE, U.K
199
Wang A, Durrant JD. Open-Source Browser-Based Tools for Structure-Based Computer-Aided Drug Discovery. Molecules 2022; 27:4623. [PMID: 35889494 PMCID: PMC9319651 DOI: 10.3390/molecules27144623] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/05/2022] [Revised: 07/17/2022] [Accepted: 07/18/2022] [Indexed: 01/27/2023] Open
Abstract
Here we outline the importance of open-source, accessible tools for computer-aided drug discovery (CADD). We begin with a discussion of drug discovery in general to provide context for a subsequent discussion of structure-based CADD applied to small-molecule ligand discovery. Next, we identify usability challenges common to many open-source CADD tools. To address these challenges, we propose a browser-based approach to CADD tool deployment in which CADD calculations run in modern web browsers on users' local computers. The browser app approach eliminates the need for user-initiated download and installation, ensures broad operating system compatibility, enables easy updates, and provides a user-friendly graphical user interface. Unlike server apps, which run calculations "in the cloud" rather than on users' local computers, browser apps do not require users to upload proprietary information to a third-party (remote) server. They also eliminate the need for the difficult-to-maintain computer infrastructure required to run user-initiated calculations remotely. We conclude by describing some CADD browser apps developed in our lab, which illustrate the utility of this approach. Aside from introducing readers to these specific tools, we are hopeful that this review highlights the need for additional browser-compatible, user-friendly CADD software.
Affiliation(s)
- Jacob D. Durrant
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA
200
Patel LA, Chau P, Debesai S, Darwin L, Neale C. Drug Discovery by Automated Adaptation of Chemical Structure and Identity. J Chem Theory Comput 2022; 18:5006-5024. [PMID: 35834740 DOI: 10.1021/acs.jctc.1c01271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Computer-aided drug design offers the potential to dramatically reduce the cost and effort required for drug discovery. While screening-based methods are valuable in the early stages of hit identification, they are frequently succeeded by iterative, hypothesis-driven computations that require recurrent investment of human time and intuition. To increase automation, we introduce a computational method for lead refinement that combines concerted dynamics of the ligand/protein complex via molecular dynamics simulations with integrated Monte Carlo-based changes in the chemical formula of the ligand. This approach, which we refer to as ligand-exchange Monte Carlo molecular dynamics, accounts for solvent- and entropy-based contributions to competitive binding free energies by coupling the energetics of bound and unbound states during the ligand-exchange attempt. Quantitative comparison of relative binding free energies to reference values from free energy perturbation, conducted in vacuum, indicates that ligand-exchange Monte Carlo molecular dynamics simulations sample relevant conformational ensembles and are capable of identifying strongly binding compounds. Additional simulations demonstrate the use of an implicit solvent model. We speculate that the use of chemical graphs in which exchanges are only permitted between ligands with sufficient similarity may enable an automated search to capture some of the benefits provided by human intuition during hypothesis-guided lead refinement.
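The Monte Carlo exchange step described in this abstract can be illustrated with a generic Metropolis acceptance rule. This is a toy sketch, not the authors' ligand-exchange Monte Carlo molecular dynamics method: the ligand names, the `binding_scores` values (a hypothetical stand-in for the coupled bound and unbound energetics), and the `kT` value are all assumptions, and no molecular dynamics is performed.

```python
import math
import random


def metropolis_accept(delta_energy, kT, rng=random):
    """Standard Metropolis criterion: always accept downhill moves,
    accept uphill moves with probability exp(-delta_energy / kT)."""
    if delta_energy <= 0.0:
        return True
    return rng.random() < math.exp(-delta_energy / kT)


def sample_ligands(binding_scores, kT=0.6, n_steps=20000, seed=0):
    """Toy exchange loop: propose a ligand uniformly at random and accept or
    reject the swap with the Metropolis rule. Returns visit counts per ligand."""
    rng = random.Random(seed)
    names = sorted(binding_scores)
    current = names[0]
    counts = dict.fromkeys(names, 0)
    for _ in range(n_steps):
        proposal = rng.choice(names)  # symmetric proposal distribution
        delta = binding_scores[proposal] - binding_scores[current]
        if metropolis_accept(delta, kT, rng):
            current = proposal
        counts[current] += 1
    return counts


# Hypothetical relative binding scores in arbitrary energy units (lower is better).
toy_scores = {"ligand_A": 0.0, "ligand_B": -1.0, "ligand_C": -3.0}
visits = sample_ligands(toy_scores)
```

Because the proposal is symmetric, the visit counts approach Boltzmann weights for the assumed scores, so the most favorable ligand dominates the sample while weaker binders are still visited occasionally.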