1. Thomas M, Ahmad M, Tresadern G, de Fabritiis G. PromptSMILES: prompting for scaffold decoration and fragment linking in chemical language models. J Cheminform 2024; 16:77. [PMID: 38965600 PMCID: PMC11225391 DOI: 10.1186/s13321-024-00866-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2024] [Accepted: 06/04/2024] [Indexed: 07/06/2024] [Open Access]
Abstract
SMILES-based generative models are amongst the most robust and successful recent methods used to augment drug design. They are typically used for complete de novo generation, however, scaffold decoration and fragment linking applications are sometimes desirable which requires a different grammar, architecture, training dataset and therefore, re-training of a new model. In this work, we describe a simple procedure to conduct constrained molecule generation with a SMILES-based generative model to extend applicability to scaffold decoration and fragment linking by providing SMILES prompts, without the need for re-training. In combination with reinforcement learning, we show that pre-trained, decoder-only models adapt to these applications quickly and can further optimize molecule generation towards a specified objective. We compare the performance of this approach to a variety of orthogonal approaches and show that performance is comparable or better. For convenience, we provide an easy-to-use python package to facilitate model sampling which can be found on GitHub and the Python Package Index.
Scientific contribution
This novel method extends an autoregressive chemical language model to scaffold decoration and fragment linking scenarios. This doesn't require re-training, the use of a bespoke grammar, or curation of a custom dataset, as commonly required by other approaches.
Affiliation(s)
- Morgan Thomas
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aguiader 88, 08003, Barcelona, Spain.
- Mazen Ahmad
  - In Silico Discovery, Janssen Pharmaceutica N. V., Turnhoutseweg 30, 2340, Beerse, Belgium
- Gary Tresadern
  - In Silico Discovery, Janssen Pharmaceutica N. V., Turnhoutseweg 30, 2340, Beerse, Belgium
- Gianni de Fabritiis
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aguiader 88, 08003, Barcelona, Spain.
  - Acellera Labs, C Dr. Trueta 183, 08005, Barcelona, Spain.
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, 08010, Barcelona, Spain.
2. Guo J, Schwaller P. Augmented Memory: Sample-Efficient Generative Molecular Design with Reinforcement Learning. JACS Au 2024; 4:2160-2172. [PMID: 38938817 PMCID: PMC11200228 DOI: 10.1021/jacsau.4c00066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 03/29/2024] [Accepted: 04/01/2024] [Indexed: 06/29/2024]
Abstract
Sample efficiency is a fundamental challenge in de novo molecular design. Ideally, molecular generative models should learn to satisfy a desired objective under minimal calls to oracles (computational property predictors). This problem becomes more apparent when using oracles that can provide increased predictive accuracy but impose significant computational cost. Consequently, designing molecules that are optimized for such oracles cannot be achieved under a practical computational budget. Molecular generative models based on simplified molecular-input line-entry system (SMILES) have shown remarkable sample efficiency when coupled with reinforcement learning, as demonstrated in the practical molecular optimization (PMO) benchmark. Here, we first show that experience replay drastically improves the performance of multiple previously proposed algorithms. Next, we propose a novel algorithm called Augmented Memory that combines data augmentation with experience replay. We show that scores obtained from oracle calls can be reused to update the model multiple times. We compare Augmented Memory to previously proposed algorithms and show significantly enhanced sample efficiency in an exploitation task, a drug discovery case study requiring both exploration and exploitation, and a materials design case study optimizing explicitly for quantum-mechanical properties. Our method achieves a new state-of-the-art in sample-efficient de novo molecular design, outperforming all of the previously reported methods. The code is available at https://github.com/schwallergroup/augmented_memory.
Affiliation(s)
- Jeff Guo
  - Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
  - National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
- Philippe Schwaller
  - Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
  - National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
3. Hua Y, Luo L, Qiu H, Huang D, Zhao Y, Liu H, Lu T, Chen Y, Zhang Y, Jiang Y. Multimodal multi-task deep neural network framework for kinase-target prediction. Mol Divers 2023; 27:2491-2503. [PMID: 36369613 DOI: 10.1007/s11030-022-10565-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2022] [Accepted: 11/01/2022] [Indexed: 11/13/2022]
Abstract
Kinase plays a significant role in various disease signaling pathways. Due to the highly conserved sequence of kinase family members, understanding the selectivity profile of kinase inhibitors remains a priority for drug discovery. Previous methods for kinase selectivity identification use biochemical assays, which are very useful but limited by the protein available. The lack of kinase selectivity can exert benefits but also can cause adverse effects. With the explosion of the dataset for kinase activities, current computational methods can achieve accuracy for large-scale selectivity predictions. Here, we present a multimodal multi-task deep neural network model for kinase selectivity prediction by calculating the fingerprint and physiochemical descriptors. With the multimodal inputs of structure and physiochemical properties information, the multi-task framework could accurately predict the kinome map for selectivity analysis. The proposed model displays better performance for kinase-target prediction based on system evaluations.
Affiliation(s)
- Yi Hua
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Lin Luo
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Haodi Qiu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Dingfang Huang
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Yang Zhao
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Haichun Liu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Tao Lu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
  - State Key Laboratory of Natural Medicines, China Pharmaceutical University, 24 Tongjiaxiang, Nanjing, 210009, China
- Yadong Chen
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China.
- Yanmin Zhang
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China.
- Yulei Jiang
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China.
4. Stanley M, Segler M. Fake it until you make it? Generative de novo design and virtual screening of synthesizable molecules. Curr Opin Struct Biol 2023; 82:102658. [PMID: 37473637 DOI: 10.1016/j.sbi.2023.102658] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 02/27/2023] [Revised: 06/21/2023] [Accepted: 06/22/2023] [Indexed: 07/22/2023]
Abstract
Computational techniques, including virtual screening, de novo design, and generative models, play an increasing role in expediting DMTA cycles for modern molecular discovery. However, computationally proposed molecules must be synthetically feasible for laboratory testing. In this perspective, we offer a succinct introduction to the subject, and showcase typical workflows to integrate synthesis planning, synthesizability scoring, and molecule generation. Finally, we address limitations and opportunities for future research.
Affiliation(s)
- Megan Stanley
  - Microsoft Research AI4Science, UK. https://twitter.com/@megjanestanley
5. Ivanenkov Y, Zagribelnyy B, Malyshev A, Evteev S, Terentiev V, Kamya P, Bezrukov D, Aliper A, Ren F, Zhavoronkov A. The Hitchhiker's Guide to Deep Learning Driven Generative Chemistry. ACS Med Chem Lett 2023; 14:901-915. [PMID: 37465301 PMCID: PMC10351082 DOI: 10.1021/acsmedchemlett.3c00041] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/03/2023] [Accepted: 06/09/2023] [Indexed: 07/20/2023] [Open Access]
Abstract
This microperspective covers the most recent research outcomes of artificial intelligence (AI) generated molecular structures from the point of view of the medicinal chemist. The main focus is on studies that include synthesis and experimental in vitro validation in biochemical assays of the generated molecular structures, where we analyze the reported structures' relevance in modern medicinal chemistry and their novelty. The authors believe that this review would be appreciated by medicinal chemistry and AI-driven drug design (AIDD) communities and can be adopted as a comprehensive approach for qualifying different research outcomes in AIDD.
Affiliation(s)
- Yan Ivanenkov
  - Insilico Medicine Hong Kong Ltd., Science Park East Avenue, Hong Kong Science Park, Pak Shek Kok, Hong Kong
- Bogdan Zagribelnyy
  - Insilico Medicine AI Limited, Level 6, Unit 08, Block A, IRENA HQ Building, P.O. Box 145748, Masdar City, Abu Dhabi, United Arab Emirates
- Alex Malyshev
  - Insilico Medicine Hong Kong Ltd., Science Park East Avenue, Hong Kong Science Park, Pak Shek Kok, Hong Kong
- Sergei Evteev
  - Insilico Medicine Hong Kong Ltd., Science Park East Avenue, Hong Kong Science Park, Pak Shek Kok, Hong Kong
- Victor Terentiev
  - Insilico Medicine Hong Kong Ltd., Science Park East Avenue, Hong Kong Science Park, Pak Shek Kok, Hong Kong
- Petrina Kamya
  - Insilico Medicine Canada Inc., 3710-1250 René-Lévesque Blvd W, Montreal, Quebec, Canada H3B 4W8
- Dmitry Bezrukov
  - Insilico Medicine Hong Kong Ltd., Science Park East Avenue, Hong Kong Science Park, Pak Shek Kok, Hong Kong
- Alex Aliper
  - Insilico Medicine AI Limited, Level 6, Unit 08, Block A, IRENA HQ Building, P.O. Box 145748, Masdar City, Abu Dhabi, United Arab Emirates
- Feng Ren
  - Insilico Medicine Shanghai Ltd., Suite 901, Tower C, Changtai Plaza, 2889 Jinke Road, Pudong New District, Shanghai 201203, China
- Alex Zhavoronkov
  - Insilico Medicine Hong Kong Ltd., Science Park East Avenue, Hong Kong Science Park, Pak Shek Kok, Hong Kong