1
Romanelli V, Annunziata D, Cerchia C, Cerciello D, Piccialli F, Lavecchia A. Enhancing De Novo Drug Design across Multiple Therapeutic Targets with CVAE Generative Models. ACS Omega 2024; 9:43963-43976. [PMID: 39493989 PMCID: PMC11525747 DOI: 10.1021/acsomega.4c08027] [Received: 08/31/2024] [Revised: 09/25/2024] [Accepted: 09/30/2024] [Indexed: 11/05/2024]
Abstract
Drug discovery is a costly and time-consuming process, necessitating innovative strategies to enhance efficiency across different stages, from initial hit identification to final market approval. Recent advances in deep learning (DL), particularly in de novo drug design, show promise. Generative models, a subclass of DL algorithms, have significantly accelerated the de novo drug design process by exploring vast areas of chemical space. Here, we introduce a Conditional Variational Autoencoder (CVAE) generative model tailored for de novo molecular design tasks, utilizing both SMILES and SELFIES as molecular representations. Our computational framework successfully generates molecules with specific property profiles validated through metrics such as uniqueness, validity, novelty, quantitative estimate of drug-likeness (QED), and synthetic accessibility (SA). We evaluated our model's efficacy in generating novel molecules capable of binding to three therapeutic molecular targets: CDK2, PPARγ, and DPP-IV. Comparison with state-of-the-art frameworks demonstrated our model's ability to achieve higher structural diversity while maintaining the molecular property ranges observed in the training set molecules. This proposed model stands as a valuable resource for advancing de novo molecular design capabilities.
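Two of the metrics named in this abstract, uniqueness and novelty, can be illustrated with a minimal Python sketch. The definitions used here are common conventions assumed for illustration, not taken from the paper: uniqueness is the share of distinct strings among all generated strings, and novelty is the share of distinct generated strings that do not appear in the training set. The function name and the toy SMILES strings are hypothetical, and strings are compared verbatim without canonicalization or validity checking.

```python
def uniqueness_and_novelty(generated, training):
    """Return (uniqueness, novelty) for lists of molecule strings.

    Assumed definitions (conventions vary between papers):
      uniqueness = distinct generated strings / all generated strings
      novelty    = distinct generated strings absent from training
                   / distinct generated strings
    Strings are compared verbatim; a real pipeline would canonicalize first.
    """
    if not generated:
        return 0.0, 0.0
    distinct = set(generated)
    uniqueness = len(distinct) / len(generated)
    novel = distinct - set(training)
    novelty = len(novel) / len(distinct)
    return uniqueness, novelty


# Toy example with hypothetical SMILES strings.
generated = ["CCO", "CCO", "CCN", "c1ccccc1"]
training = ["CCO", "CCC"]
uniq, nov = uniqueness_and_novelty(generated, training)
print(f"uniqueness={uniq:.2f}, novelty={nov:.2f}")
```

Validity, QED, and SA are omitted because they require a cheminformatics toolkit to parse and score the molecules.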
Affiliation(s)
- Virgilio Romanelli
  - Department of Pharmacy, “Drug Discovery Laboratory”, University of Naples Federico II, Naples 80131, Italy
- Daniela Annunziata
  - Department of Mathematics and Applications “R. Caccioppoli”, University of Naples Federico II, Naples 80126, Italy
- Carmen Cerchia
  - Department of Pharmacy, “Drug Discovery Laboratory”, University of Naples Federico II, Naples 80131, Italy
- Donato Cerciello
  - Department of Mathematics and Applications “R. Caccioppoli”, University of Naples Federico II, Naples 80126, Italy
- Francesco Piccialli
  - Department of Mathematics and Applications “R. Caccioppoli”, University of Naples Federico II, Naples 80126, Italy
- Antonio Lavecchia
  - Department of Pharmacy, “Drug Discovery Laboratory”, University of Naples Federico II, Naples 80131, Italy
2
Kneiding H, Balcells D. Augmenting genetic algorithms with machine learning for inverse molecular design. Chem Sci 2024:d4sc02934h. [PMID: 39296997 PMCID: PMC11404003 DOI: 10.1039/d4sc02934h] [Received: 05/03/2024] [Accepted: 09/09/2024] [Indexed: 09/21/2024]
Abstract
Evolutionary and machine learning methods have been successfully applied to the generation of molecules and materials exhibiting desired properties. The combination of these two paradigms in inverse design tasks can yield powerful methods that explore massive chemical spaces more efficiently, improving the quality of the generated compounds. However, such synergistic approaches are still an incipient area of research and appear underexplored in the literature. This perspective covers different ways of incorporating machine learning approaches into evolutionary learning frameworks, with the overall goal of increasing the optimization efficiency of genetic algorithms. In particular, machine learning surrogate models for faster fitness function evaluation, discriminator models to control population diversity on-the-fly, machine learning based crossover operations, and evolution in latent space are discussed. The further potential of these synergistic approaches in generative tasks is also assessed, outlining promising directions for future developments.
Affiliation(s)
- Hannes Kneiding
  - Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P.O. Box 1033, Blindern, 0315 Oslo, Norway
- David Balcells
  - Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P.O. Box 1033, Blindern, 0315 Oslo, Norway
3
He J, Tibo A, Janet JP, Nittinger E, Tyrchan C, Czechtizky W, Engkvist O. Evaluation of reinforcement learning in transformer-based molecular design. J Cheminform 2024; 16:95. [PMID: 39118113 PMCID: PMC11312936 DOI: 10.1186/s13321-024-00887-0] [Received: 03/15/2024] [Accepted: 07/21/2024] [Indexed: 08/10/2024]
Abstract
Designing compounds with a range of desirable properties is a fundamental challenge in drug discovery. In pre-clinical early drug discovery, novel compounds are often designed based on an already existing promising starting compound through structural modifications for further property optimization. Recently, transformer-based deep learning models have been explored for the task of molecular optimization by training on pairs of similar molecules. This provides a starting point for generating similar molecules to a given input molecule, but has limited flexibility regarding user-defined property profiles. Here, we evaluate the effect of reinforcement learning on transformer-based molecular generative models. The generative model can be considered as a pre-trained model with knowledge of the chemical space close to an input compound, while reinforcement learning can be viewed as a tuning phase, steering the model towards chemical space with user-specific desirable properties. The evaluation of two distinct tasks, molecular optimization and scaffold discovery, suggests that reinforcement learning could guide the transformer-based generative model towards the generation of more compounds of interest. Additionally, the impact of pre-trained models, learning steps and learning rates is investigated. Scientific contribution: Our study investigates the effect of reinforcement learning on a transformer-based generative model initially trained for generating molecules similar to starting molecules. The reinforcement learning framework is applied to facilitate multiparameter optimisation of starting molecules. This approach allows for more flexibility in optimizing user-specific property profiles and helps find more ideas of interest.
Affiliation(s)
- Jiazhen He
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Alessandro Tibo
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Jon Paul Janet
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Eva Nittinger
  - Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
- Christian Tyrchan
  - Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
- Werngard Czechtizky
  - Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
- Ola Engkvist
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
  - Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden
4
Hu J, Wu P, Wang S, Wang B, Yang G. A Human Feedback Strategy for Photoresponsive Molecules in Drug Delivery: Utilizing GPT-2 and Time-Dependent Density Functional Theory Calculations. Pharmaceutics 2024; 16:1014. [PMID: 39204359 PMCID: PMC11359544 DOI: 10.3390/pharmaceutics16081014] [Received: 05/28/2024] [Revised: 07/11/2024] [Accepted: 07/19/2024] [Indexed: 09/04/2024]
Abstract
Photoresponsive drug delivery stands as a pivotal frontier in smart drug administration, leveraging the non-invasive, stable, and finely tunable nature of light-triggered methodologies. The generative pre-trained transformer (GPT) has been employed to generate molecular structures. In our study, we harnessed GPT-2 on the QM7b dataset to refine a UV-GPT model with adapters, enabling the generation of molecules responsive to UV light excitation. Utilizing the Coulomb matrix as a molecular descriptor, we predicted the excitation wavelengths of these molecules. Furthermore, we validated the excited state properties through quantum chemical simulations. Based on the results of these calculations, we summarized some tips for chemical structures and integrated them into the alignment of large-scale language models within the reinforcement learning from human feedback (RLHF) framework. The synergy of these findings underscores the successful application of GPT technology in this critical domain.
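The Coulomb matrix descriptor mentioned in this abstract has a simple closed form, sketched below in Python. The sketch uses the standard definition (diagonal entries 0.5·Z^2.4, off-diagonal entries Z_i·Z_j divided by the interatomic distance, conventionally in atomic units). The function name and the H2 geometry are hypothetical, and any sorting, padding, or other preprocessing the authors may have applied is not reproduced.

```python
import math


def coulomb_matrix(charges, coords):
    """Build a Coulomb matrix from nuclear charges and 3D coordinates.

    M[i][i] = 0.5 * Z_i ** 2.4
    M[i][j] = Z_i * Z_j / |R_i - R_j|   for i != j
    Coordinates are assumed to be in bohr (atomic units).
    """
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                distance = math.dist(coords[i], coords[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix


# Hypothetical H2 geometry: two hydrogen nuclei 1.4 bohr apart.
h2 = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)])
for row in h2:
    print(row)
```

The matrix is symmetric by construction, and its dimension grows with the number of atoms, which is why fixed-size padding is usually needed before it is fed to a regression model.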
Affiliation(s)
- Junjie Hu
  - Faculty of Medicine, Imperial College London, London SW7 2AZ, UK
- Peng Wu
  - School of Chemistry and Chemical Engineering, Ningxia University, Yinchuan 750014, China
- Shiyi Wang
  - Bioengineering Department and Imperial-X, Imperial College London, London W12 7SL, UK
- Binju Wang
  - College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Guang Yang
  - Bioengineering Department and Imperial-X, Imperial College London, London W12 7SL, UK
  - National Heart and Lung Institute, Imperial College London, London SW7 2AZ, UK
  - Cardiovascular Research Centre, Royal Brompton Hospital, London SW3 6NP, UK
  - School of Biomedical Engineering & Imaging Sciences, King's College London, London WC2R 2LS, UK
5
Atz K, Nippa DF, Müller AT, Jost V, Anelli A, Reutlinger M, Kramer C, Martin RE, Grether U, Schneider G, Wuitschik G. Geometric deep learning-guided Suzuki reaction conditions assessment for applications in medicinal chemistry. RSC Med Chem 2024; 15:2310-2321. [PMID: 39026644 PMCID: PMC11253849 DOI: 10.1039/d4md00196f] [Received: 03/21/2024] [Accepted: 05/25/2024] [Indexed: 07/20/2024]
Abstract
Suzuki cross-coupling reactions are considered a valuable tool for constructing carbon-carbon bonds in small molecule drug discovery. However, the synthesis of chemical matter often represents a time-consuming and labour-intensive bottleneck. We demonstrate how machine learning methods trained on high-throughput experimentation (HTE) data can be leveraged to enable fast reaction condition selection for novel coupling partners. We show that the trained models support chemists in determining suitable catalyst-solvent-base combinations for individual transformations, including an evaluation of the need for HTE screening. We introduce an algorithm for designing 96-well plates optimized towards reaction yields and discuss the model performance of zero- and few-shot machine learning. The best-performing machine learning model achieved a three-category classification accuracy of 76.3% (±0.2%) and an F1-score for a binary classification of 79.1% (±0.9%). Validation on eight reactions revealed an area under the receiver operating characteristic curve (ROC-AUC) value of 0.82 (±0.07) for few-shot machine learning. On the other hand, zero-shot machine learning models achieved a mean ROC-AUC value of 0.63 (±0.16). This study positively advocates the application of few-shot machine learning-guided reaction condition selection for HTE campaigns in medicinal chemistry and highlights practical applications as well as challenges associated with zero-shot machine learning.
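For readers unfamiliar with the classification metrics quoted in this abstract, the following Python sketch shows how accuracy and a binary F1-score are computed from labels. It uses the standard textbook definitions; the function names and the toy labels are hypothetical and unrelated to the paper's data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def binary_f1(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: F1 is defined as 0 here
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (1 = reaction works, 0 = reaction fails); purely illustrative.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
acc = accuracy(y_true, y_pred)
f1 = binary_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}, F1={f1:.3f}")
```

The same `accuracy` function applies unchanged to a three-category task, since it only compares labels for equality.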
Affiliation(s)
- Kenneth Atz
  - Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070 Basel, Switzerland
- David F Nippa
  - Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070 Basel, Switzerland
- Alex T Müller
  - Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070 Basel, Switzerland
- Vera Jost
  - Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070 Basel, Switzerland
- Andrea Anelli
  - Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070 Basel, Switzerland
- Michael Reutlinger
  - Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070 Basel, Switzerland
- Christian Kramer
  - Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070 Basel, Switzerland
- Rainer E Martin
  - Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070 Basel, Switzerland
- Uwe Grether
  - Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070 Basel, Switzerland
- Gisbert Schneider
  - Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland
- Georg Wuitschik
  - Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070 Basel, Switzerland
6
Kim J, Chang W, Ji H, Joung I. Quantum-Informed Molecular Representation Learning Enhancing ADMET Property Prediction. J Chem Inf Model 2024; 64:5028-5040. [PMID: 38916580 DOI: 10.1021/acs.jcim.4c00772] [Indexed: 06/26/2024]
Abstract
We examined pretraining tasks leveraging abundant labeled data to effectively enhance molecular representation learning in downstream tasks, specifically emphasizing graph transformers to improve the prediction of ADMET properties. Our investigation revealed limitations in previous pretraining tasks and identified more meaningful training targets, ranging from 2D molecular descriptors to extensive quantum chemistry simulations. These data were seamlessly integrated into supervised pretraining tasks. The implementation of our pretraining strategy and multitask learning outperforms conventional methods, achieving state-of-the-art outcomes in 7 out of 22 ADMET tasks within the Therapeutics Data Commons by utilizing a shared encoder across all tasks. Our approach underscores the effectiveness of learning molecular representations and highlights the potential for scalability when leveraging extensive data sets, marking a significant advancement in this domain.
Affiliation(s)
- Jungwoo Kim
  - Standigm Inc., 182 Dogok-ro, 6F, Gangnam-gu, Seoul 06261, Korea
- Woojae Chang
  - Standigm Inc., 182 Dogok-ro, 6F, Gangnam-gu, Seoul 06261, Korea
- Hyunjun Ji
  - Standigm Inc., 182 Dogok-ro, 6F, Gangnam-gu, Seoul 06261, Korea
- InSuk Joung
  - Standigm Inc., 182 Dogok-ro, 6F, Gangnam-gu, Seoul 06261, Korea
7
An Y, Lim J, Glavatskikh M, Wang X, Norris-Drouin J, Hardy PB, Leisner TM, Pearce KH, Kireev D. In silico fragment-based discovery of CIB1-directed anti-tumor agents by FRASE-bot. Nat Commun 2024; 15:5564. [PMID: 38956119 PMCID: PMC11219766 DOI: 10.1038/s41467-024-49892-9] [Received: 07/23/2023] [Accepted: 06/19/2024] [Indexed: 07/04/2024]
Abstract
Chemical probes are an indispensable tool for translating biological discoveries into new therapies, though they are increasingly difficult to identify since novel therapeutic targets are often hard-to-drug proteins. We introduce the FRASE-based hit-finding robot (FRASE-bot) to expedite drug discovery for unconventional therapeutic targets. FRASE-bot mines available 3D structures of ligand-protein complexes to create a database of FRAgments in Structural Environments (FRASE). The FRASE database can be screened to identify structural environments similar to those in the target protein and seed the target structure with relevant ligand fragments. A neural network model is used to retain fragments with the highest likelihood of being native binders. The seeded fragments then inform ultra-large-scale virtual screening of commercially available compounds. We apply FRASE-bot to identify ligands for Calcium and Integrin Binding protein 1 (CIB1), a promising drug target implicated in triple negative breast cancer. FRASE-based virtual screening identifies a small-molecule CIB1 ligand (with binding confirmed in a TR-FRET assay) showing specific cell-killing activity in CIB1-dependent cancer cells, but not in CIB1-depletion-insensitive cells.
Affiliation(s)
- Yi An
  - Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
- Jiwoong Lim
  - Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
- Marta Glavatskikh
  - Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
- Xiaowen Wang
  - Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
  - Chemistry department, University of Missouri, Columbia, Columbia, MO, 65211, USA
- Jacqueline Norris-Drouin
  - Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
- P Brian Hardy
  - Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
- Tina M Leisner
  - Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
- Kenneth H Pearce
  - Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
- Dmitri Kireev
  - Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
  - Chemistry department, University of Missouri, Columbia, Columbia, MO, 65211, USA
8
Yoo S, Kim J. Adapt-cMolGPT: A Conditional Generative Pre-Trained Transformer with Adapter-Based Fine-Tuning for Target-Specific Molecular Generation. Int J Mol Sci 2024; 25:6641. [PMID: 38928346 PMCID: PMC11203498 DOI: 10.3390/ijms25126641] [Received: 05/16/2024] [Revised: 06/09/2024] [Accepted: 06/14/2024] [Indexed: 06/28/2024]
Abstract
Small-molecule drug design aims to generate compounds that target specific proteins, playing a crucial role in the early stages of drug discovery. Recently, research has emerged that uses the GPT model, which has achieved significant success in various fields, to generate molecular compounds. However, due to the persistent challenge of small datasets in the pharmaceutical field, there has been some degradation in the performance of generating target-specific compounds. To address this issue, we propose an enhanced target-specific drug generation model, Adapt-cMolGPT, which modifies molecular representation and optimizes the fine-tuning process. In particular, we introduce a new fine-tuning method that incorporates an adapter module into a pre-trained base model and alternates weight updates by sections. We evaluated the proposed model through multiple experiments and demonstrated performance improvements compared to previous models. In the experimental results, Adapt-cMolGPT generated a greater number of novel and valid compounds compared to other models, with these generated compounds exhibiting properties similar to those of real molecular data. These results indicate that our proposed method is highly effective in designing drugs targeting specific proteins.
Affiliation(s)
- Soyoung Yoo
  - Department of Artificial Intelligence, Sejong University, Seoul 05006, Republic of Korea
- Junghyun Kim
  - Department of Artificial Intelligence, Sejong University, Seoul 05006, Republic of Korea
  - Deep Learning Architecture Research Center, Sejong University, Seoul 05006, Republic of Korea
9
Das M, Ghosh A, Sunoj RB. Advances in machine learning with chemical language models in molecular property and reaction outcome predictions. J Comput Chem 2024; 45:1160-1176. [PMID: 38299229 DOI: 10.1002/jcc.27315] [Received: 11/22/2023] [Revised: 01/06/2024] [Accepted: 01/09/2024] [Indexed: 02/02/2024]
Abstract
Molecular properties and reactions form the foundation of chemical space. Over the years, innumerable molecules have been synthesized; a smaller fraction of them found immediate applications, while a larger proportion served as a testimony to the creative and empirical nature of the domain of chemical science. With increasing emphasis on sustainable practices, it is desirable that a target set of molecules is synthesized, preferably through fewer empirical attempts instead of a larger library, to realize an active candidate. On this front, predictive endeavors using machine learning (ML) models built on available data acquire high timely significance. Prediction of molecular property and reaction outcome remains one of the burgeoning applications of ML in chemical science. Among several methods of encoding molecular samples for ML models, the ones that employ language-like representations are gaining steady popularity. Such representations would additionally help adopt well-developed natural language processing (NLP) models for chemical applications. Given this advantageous background, herein we describe several successful chemical applications of NLP focusing on molecular property and reaction outcome predictions. From relatively simpler recurrent neural networks (RNNs) to complex models like transformers, different network architectures have been leveraged for tasks such as de novo drug design, catalyst generation, and forward and retro-synthesis predictions. The chemical language model (CLM) provides promising avenues toward a broad range of applications in a time- and cost-effective manner. While we showcase an optimistic outlook of CLMs, attention is also placed on the persisting challenges in the reaction domain, which would optimistically be addressed by advanced algorithms tailored to chemical language and with increased availability of high-quality datasets.
Affiliation(s)
- Manajit Das
  - Department of Chemistry, Indian Institute of Technology Bombay, Mumbai, India
- Ankit Ghosh
  - Department of Chemistry, Indian Institute of Technology Bombay, Mumbai, India
- Raghavan B Sunoj
  - Department of Chemistry, Indian Institute of Technology Bombay, Mumbai, India
  - Centre for Machine Intelligence and Data Science, Indian Institute of Technology Bombay, Mumbai, India
10
Wang S, Liang D, Wang J, Dong K, Zhang Y, Liang H, Xu X, Song T. FraHMT: A Fragment-Oriented Heterogeneous Graph Molecular Generation Model for Target Proteins. J Chem Inf Model 2024; 64:3718-3732. [PMID: 38644797 DOI: 10.1021/acs.jcim.4c00252] [Indexed: 04/23/2024]
Abstract
The molecular generation task stands as a pivotal step in the domains of computational chemistry and drug discovery, aiming to computationally generate molecular structures for specific properties. In contrast to previous models that focused primarily on SMILES strings or molecular graphs, our model placed a special emphasis on the substructure information of molecules, enabling the model to learn richer chemical rules and structure features from fragments and chemical reaction information of molecules. To accomplish this, we fragmented the molecules to construct heterogeneous graph representations based on atom and fragment information. Then our model mapped the heterogeneous graph data into a latent vector space by using an encoder and employed a self-regressive generative model as a decoder for molecular generation. Additionally, we performed transfer learning on the model using a small set of ligand molecules known to be active against the target protein to generate molecules that bind better to the target protein. Experimental results demonstrate that our model is highly competitive with state-of-the-art models. It can generate valid and diverse molecules with favorable physicochemical properties and drug-likeness. Importantly, it produces novel molecules with high docking scores against the target proteins.
Affiliation(s)
- Shuang Wang
  - College of Computer Science and Technology, China University of Petroleum, QingDao 266580, China
- Dingming Liang
  - College of Computer Science and Technology, China University of Petroleum, QingDao 266580, China
- Jianmin Wang
  - College of Computer Science and Technology, China University of Petroleum, QingDao 266580, China
  - The Interdisciplinary Graduate Program in Integrative Biotechnology, Yonsei University, Incheon 21983, Republic of Korea
- Kaiyu Dong
  - College of Computer Science and Technology, China University of Petroleum, QingDao 266580, China
- Yunjing Zhang
  - College of Computer Science and Technology, China University of Petroleum, QingDao 266580, China
- Huicong Liang
  - Marine Biomedical Institute of Qingdao, School of Medicine and Pharmacy, Ocean University of China, QingDao 266580, China
- Ximing Xu
  - Marine Biomedical Institute of Qingdao, School of Medicine and Pharmacy, Ocean University of China, QingDao 266580, China
- Tao Song
  - College of Computer Science and Technology, China University of Petroleum, QingDao 266580, China
  - Department of Artificial Intelligence, Faculty of Computer Science, Polytechnical University of Madrid, Madrid 28031, Spain
11
Xia W, Xiao J, Bian H, Zhang J, Zhang JZH, Zhang H. Deep Learning-Based construction of a Drug-Like compound database and its application in virtual screening of HsDHODH inhibitors. Methods 2024; 225:44-51. [PMID: 38518843 DOI: 10.1016/j.ymeth.2024.03.008] [Received: 10/08/2023] [Revised: 01/24/2024] [Accepted: 03/13/2024] [Indexed: 03/24/2024]
Abstract
The process of virtual screening relies heavily on databases, but it is disadvantageous to conduct virtual screening based on commercial databases with patent-protected compounds, high compound toxicity and side effects. Therefore, this paper utilizes generative recurrent neural networks (RNN) containing long short-term memory (LSTM) cells to learn the properties of drug compounds in DrugBank, aiming to obtain a new virtual screening compound database with drug-like properties. Ultimately, a compound database consisting of 26,316 compounds is obtained by this method. To evaluate the potential of this compound database, a series of tests are performed, including chemical space, ADME properties, compound fragmentation, and synthesizability analysis. The results show that the database has good drug-like properties and relatively new backbones; its potential in virtual screening is then further tested. Finally, a series of seedling compounds with completely new backbones are obtained through docking and binding free energy calculations.
Affiliation(s)
- Wei Xia
  - Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Jin Xiao
  - Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, Shanghai Key Laboratory of Green Chemistry & Chemical Process, School of Chemistry and Molecular Engineering, East China Normal University at Shanghai, 200062, China
- Hengwei Bian
  - Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, Shanghai Key Laboratory of Green Chemistry & Chemical Process, School of Chemistry and Molecular Engineering, East China Normal University at Shanghai, 200062, China
- Jiajun Zhang
  - Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- John Z H Zhang
  - Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
  - Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, Shanghai Key Laboratory of Green Chemistry & Chemical Process, School of Chemistry and Molecular Engineering, East China Normal University at Shanghai, 200062, China
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
  - Department of Chemistry, New York University, NY, NY 10003, USA
  - Collaborative Innovation Center of Extreme Optics, Shanxi University, Taiyuan, Shanxi, 030006, China
- Haiping Zhang
  - Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
12
Vogt M. Chemoinformatic approaches for navigating large chemical spaces. Expert Opin Drug Discov 2024; 19:403-414. [PMID: 38300511 DOI: 10.1080/17460441.2024.2313475] [Received: 12/12/2023] [Accepted: 01/30/2024] [Indexed: 02/02/2024]
Abstract
INTRODUCTION: Large chemical spaces (CSs) include traditional large compound collections, combinatorial libraries covering billions to trillions of molecules, DNA-encoded chemical libraries comprising complete combinatorial CSs in a single mixture, and virtual CSs explored by generative models. The diverse nature of these types of CSs requires different chemoinformatic approaches for navigation. AREAS COVERED: An overview of different types of large CSs is provided. Molecular representations and similarity metrics suitable for large CS exploration are discussed. A summary of navigation of CSs in generative models is provided. Methods for characterizing and comparing CSs are discussed. EXPERT OPINION: The size of large CSs might restrict navigation to specialized algorithms and limit it to considering neighborhoods of structurally similar molecules. Efficient navigation of large CSs not only requires methods that scale with size but also requires smart approaches that focus on better but not necessarily larger molecule selections. Deep generative models aim to provide such approaches by implicitly learning features relevant for targeted biological properties. It is unclear whether these models can fulfill this ideal as validation is difficult as long as the covered CSs remain mainly virtual without experimental verification.
Affiliation(s)
- Martin Vogt
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
13
Dodds M, Guo J, Löhr T, Tibo A, Engkvist O, Janet JP. Sample efficient reinforcement learning with active learning for molecular design. Chem Sci 2024; 15:4146-4160. [PMID: 38487235 PMCID: PMC10935729 DOI: 10.1039/d3sc04653b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2023] [Accepted: 02/07/2024] [Indexed: 03/17/2024] Open
Abstract
Reinforcement learning (RL) is a powerful and flexible paradigm for searching for solutions in high-dimensional action spaces. However, bridging the gap between playing computer games with thousands of simulated episodes and solving real scientific problems with complex and involved environments (up to actual laboratory experiments) requires improvements in terms of sample efficiency to make the most of expensive information. The discovery of new drugs is a major commercial application of RL, motivated by the very large nature of the chemical space and the need to perform multiparameter optimization (MPO) across different properties. In silico methods, such as virtual library screening (VS) and de novo molecular generation with RL, show great promise in accelerating this search. However, incorporation of increasingly complex computational models in these workflows requires increasing sample efficiency. Here, we introduce an active learning system linked with an RL model (RL-AL) for molecular design, which aims to improve the sample-efficiency of the optimization process. We identify and characterize unique challenges combining RL and AL, investigate the interplay between the systems, and develop a novel AL approach to solve the MPO problem. Our approach greatly expedites the search for novel solutions relative to baseline-RL for simple ligand- and structure-based oracle functions, with a 5-66-fold increase in hits generated for a fixed oracle budget and a 4-64-fold reduction in computational time to find a specific number of hits. Furthermore, compounds discovered through RL-AL display substantial enrichment of a multi-parameter scoring objective, indicating superior efficacy in curating high-scoring compounds, without a reduction in output diversity. This significant acceleration improves the feasibility of oracle functions that have largely been overlooked in RL due to high computational costs, for example free energy perturbation methods, and in principle is applicable to any RL domain.
Affiliation(s)
- Michael Dodds
- Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
| | - Jeff Guo
- Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
| | - Thomas Löhr
- Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
| | - Alessandro Tibo
- Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
| | - Ola Engkvist
- Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
| | - Jon Paul Janet
- Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
14
Loeffler HH, He J, Tibo A, Janet JP, Voronov A, Mervin LH, Engkvist O. Reinvent 4: Modern AI-driven generative molecule design. J Cheminform 2024; 16:20. [PMID: 38383444 PMCID: PMC10882833 DOI: 10.1186/s13321-024-00812-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Accepted: 02/09/2024] [Indexed: 02/23/2024] Open
Abstract
REINVENT 4 is a modern open-source generative AI framework for the design of small molecules. The software utilizes recurrent neural networks and transformer architectures to drive molecule generation. These generators are seamlessly embedded within the general machine learning optimization algorithms, transfer learning, reinforcement learning and curriculum learning. REINVENT 4 enables and facilitates de novo design, R-group replacement, library design, linker design, scaffold hopping and molecule optimization. This contribution gives an overview of the software and describes its design. Algorithms and their applications are discussed in detail. REINVENT 4 is a command line tool which reads a user configuration in either TOML or JSON format. The aim of this release is to provide reference implementations for some of the most common algorithms in AI based molecule generation. An additional goal with the release is to create a framework for education and future innovation in AI based molecular design. The software is available from https://github.com/MolecularAI/REINVENT4 and released under the permissive Apache 2.0 license. Scientific contribution. The software provides an open-source reference implementation for generative molecular design where the software is also being used in production to support in-house drug discovery projects. The publication of the most common machine learning algorithms in one code and full documentation thereof will increase transparency of AI and foster innovation, collaboration and education.
Affiliation(s)
- Hannes H Loeffler
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden.
| | - Jiazhen He
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - Alessandro Tibo
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - Jon Paul Janet
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - Alexey Voronov
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - Lewis H Mervin
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
| | - Ola Engkvist
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
15
Kyro GW, Morgunov A, Brent RI, Batista VS. ChemSpaceAL: An Efficient Active Learning Methodology Applied to Protein-Specific Molecular Generation. J Chem Inf Model 2024; 64:653-665. [PMID: 38287889 DOI: 10.1021/acs.jcim.3c01456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2024]
Abstract
The incredible capabilities of generative artificial intelligence models have inevitably led to their application in the domain of drug discovery. Within this domain, the vastness of chemical space motivates the development of more efficient methods for identifying regions with molecules that exhibit desired characteristics. In this work, we present a computationally efficient active learning methodology and demonstrate its applicability to targeted molecular generation. When applied to c-Abl kinase, a protein with FDA-approved small-molecule inhibitors, the model learns to generate molecules similar to the inhibitors without prior knowledge of their existence and even reproduces two of them exactly. We also show that the methodology is effective for a protein without any commercially available small-molecule inhibitors, the HNH domain of the CRISPR-associated protein 9 (Cas9) enzyme. To facilitate implementation and reproducibility, we made all of our software available through the open-source ChemSpaceAL Python package.
Affiliation(s)
- Gregory W Kyro
- Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States
| | - Anton Morgunov
- Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States
| | - Rafael I Brent
- Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States
| | - Victor S Batista
- Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States
16
Chowdhury J, Fricke C, Bamidele O, Bello M, Yang W, Heyden A, Terejanu G. Invariant Molecular Representations for Heterogeneous Catalysis. J Chem Inf Model 2024; 64:327-339. [PMID: 38197612 PMCID: PMC10806804 DOI: 10.1021/acs.jcim.3c00594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 12/25/2023] [Accepted: 12/28/2023] [Indexed: 01/11/2024]
Abstract
Catalyst screening is a critical step in the discovery and development of heterogeneous catalysts, which are vital for a wide range of chemical processes. In recent years, computational catalyst screening, primarily through density functional theory (DFT), has gained significant attention as a method for identifying promising catalysts. However, the computation of adsorption energies for all likely chemical intermediates present in complex surface chemistries is computationally intensive and costly due to the expensive nature of these calculations and the intrinsic idiosyncrasies of the methods or data sets used. This study introduces a novel machine learning (ML) method to learn adsorption energies from multiple DFT functionals by using invariant molecular representations (IMRs). To do this, we first extract molecular fingerprints for the reaction intermediates and later use a Siamese-neural-network-based training strategy to learn invariant molecular representations or the IMR across all available functionals. Our Siamese network-based representations demonstrate superior performance in predicting adsorption energies compared with other molecular representations. Notably, when considering mean absolute values of adsorption energies as 0.43 eV (PBE-D3), 0.46 eV (BEEF-vdW), 0.81 eV (RPBE), and 0.37 eV (scan+rVV10), our IMR method has achieved the lowest mean absolute errors (MAEs) of 0.18, 0.10, 0.16, and 0.18 eV, respectively. These results emphasize the superior predictive capacity of our Siamese network-based representations. The empirical findings in this study illuminate the efficacy, robustness, and dependability of our proposed ML paradigm in predicting adsorption energies, specifically for propane dehydrogenation on a platinum catalyst surface.
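To put the reported errors in context, the sketch below relates each MAE to the mean absolute adsorption energy quoted for the same functional. The numbers are taken from the abstract, paired in the order given there ("respectively"); the helper `mean_absolute_error` and its toy inputs are illustrative assumptions, not the authors' code or data.

```python
def mean_absolute_error(predicted, reference):
    """MAE = average of |prediction - reference| over all samples."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Values quoted in the abstract (eV), keyed by DFT functional.
mean_abs_energy = {"PBE-D3": 0.43, "BEEF-vdW": 0.46, "RPBE": 0.81, "scan+rVV10": 0.37}
reported_mae = {"PBE-D3": 0.18, "BEEF-vdW": 0.10, "RPBE": 0.16, "scan+rVV10": 0.18}

# Relative error: MAE as a fraction of the typical energy magnitude.
relative_error = {f: reported_mae[f] / mean_abs_energy[f] for f in reported_mae}

for functional, ratio in relative_error.items():
    print(f"{functional}: MAE is {ratio:.0%} of the mean absolute adsorption energy")

# Toy check of the MAE definition itself (made-up numbers).
toy_mae = mean_absolute_error([1.0, 2.0], [1.5, 1.0])
```

Read this way, the errors range from roughly a fifth to roughly half of the typical energy magnitude, depending on the functional.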
Affiliation(s)
- Jawad Chowdhury
- Department of Computer Science, University of North Carolina at Charlotte, Charlotte, North Carolina 28223, United States
| | - Charles Fricke
- Department of Chemical Engineering, University of South Carolina, Columbia, South Carolina 29208, United States
| | - Olajide Bamidele
- Department of Chemical Engineering, University of South Carolina, Columbia, South Carolina 29208, United States
| | - Mubarak Bello
- Department of Chemical Engineering, University of South Carolina, Columbia, South Carolina 29208, United States
| | - Wenqiang Yang
- Department of Chemical Engineering, University of South Carolina, Columbia, South Carolina 29208, United States
| | - Andreas Heyden
- Department of Chemical Engineering, University of South Carolina, Columbia, South Carolina 29208, United States
| | - Gabriel Terejanu
- Department of Computer Science, University of North Carolina at Charlotte, Charlotte, North Carolina 28223, United States
17
Melancon K, Pliushcheuskaya P, Meiler J, Künze G. Targeting ion channels with ultra-large library screening for hit discovery. Front Mol Neurosci 2024; 16:1336004. [PMID: 38249296 PMCID: PMC10796734 DOI: 10.3389/fnmol.2023.1336004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Accepted: 12/05/2023] [Indexed: 01/23/2024] Open
Abstract
Ion channels play a crucial role in a variety of physiological and pathological processes, making them attractive targets for drug development in diseases such as diabetes, epilepsy, hypertension, cancer, and chronic pain. Despite the importance of ion channels in drug discovery, the vastness of chemical space and the complexity of ion channels pose significant challenges for identifying drug candidates. The use of in silico methods in drug discovery has dramatically reduced the time and cost of drug development and has the potential to revolutionize the field of medicine. Recent advances in computer hardware and software have enabled the screening of ultra-large compound libraries. Integration of different methods at various scales and dimensions is becoming an inevitable trend in drug development. In this review, we provide an overview of current state-of-the-art computational chemistry methodologies for ultra-large compound library screening and their application to ion channel drug discovery research. We discuss the advantages and limitations of various in silico techniques, including virtual screening, molecular mechanics/dynamics simulations, and machine learning-based approaches. We also highlight several successful applications of computational chemistry methodologies in ion channel drug discovery and provide insights into future directions and challenges in this field.
Affiliation(s)
- Kortney Melancon
- Department of Chemistry, Vanderbilt University, Nashville, TN, United States
- Center for Structural Biology, Vanderbilt University, Nashville, TN, United States
| | | | - Jens Meiler
- Department of Chemistry, Vanderbilt University, Nashville, TN, United States
- Center for Structural Biology, Vanderbilt University, Nashville, TN, United States
- Medical Faculty, Institute for Drug Discovery, Leipzig University, Leipzig, Germany
- Center for Scalable Data Analytics and Artificial Intelligence, Leipzig University, Leipzig, Germany
| | - Georg Künze
- Medical Faculty, Institute for Drug Discovery, Leipzig University, Leipzig, Germany
- Center for Scalable Data Analytics and Artificial Intelligence, Leipzig University, Leipzig, Germany
- Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig, Germany
18
Agarwal D, Kumar S, Ambatwar R, Bhanwala N, Chandrakar L, Khatik GL. Lead Identification Through In Silico Studies: Targeting Acetylcholinesterase Enzyme Against Alzheimer's Disease. Cent Nerv Syst Agents Med Chem 2024; 24:219-242. [PMID: 38288823 DOI: 10.2174/0118715249268585240107184956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Revised: 11/09/2023] [Accepted: 12/01/2023] [Indexed: 07/23/2024]
Abstract
AIMS In this work, we aimed to acquire the best potential small molecule for Alzheimer's disease (AD) treatment using different models in Biovia Discovery Studio to identify new potential inhibitors of acetylcholinesterase (AChE) via in silico studies. BACKGROUND The prevalence of cognitive impairment-related neurodegenerative disorders, such as AD, has been observed to escalate rapidly. However, we still know little about the underlying functions, outcome predictors, or intervention targets causing AD. OBJECTIVES The objective of the study was to optimize and identify the lead compound to target AChE against Alzheimer's disease. METHODS Different in silico studies were employed, including the pharmacophore model, virtual screening, molecular docking, de novo evolution model, and molecular dynamics. RESULTS The pharmacophoric features of AChE inhibitors were determined by ligand-based pharmacophore models and 3D QSAR pharmacophore generation. Further validation of the best pharmacophore model was done using the cost analysis method, Fischer's randomization method, and test set. The molecules that harmonized the best pharmacophore model with the estimated activity < 1 nM and ADMET parameters were filtered, and 12 molecules were subjected to molecular docking studies to obtain binding energy. 3vsp_EK8_1 secured the highest binding energy of 65.60 kcal/mol. Further optimization led to a 3v_Evo_4 molecule with a better binding energy of 70.17 kcal/mol. The molecule 3v_Evo_4 was subjected to 100 ns molecular simulation compared to donepezil, which showed better stability at the binding site. CONCLUSION A lead compound, 3v_Evo_4 molecule, was identified to inhibit AChE, and it could be further studied to develop as a drug with better efficacy than the existing available drugs for treating AD.
Affiliation(s)
- Dhairiya Agarwal
- Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research-Raebareli, Uttar Pradesh, 226002, India
| | - Sumit Kumar
- Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research-Raebareli, Uttar Pradesh, 226002, India
| | - Ramesh Ambatwar
- Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research-Raebareli, Uttar Pradesh, 226002, India
| | - Neeru Bhanwala
- Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research-Raebareli, Uttar Pradesh, 226002, India
| | - Lokesh Chandrakar
- Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research-Raebareli, Uttar Pradesh, 226002, India
| | - Gopal L Khatik
- Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research-Raebareli, Uttar Pradesh, 226002, India
19
Mehrzadi A, Rezaee E, Gharaghani S, Fakhar Z, Mirhosseini SM. A Molecular Generative Model of COVID-19 Main Protease Inhibitors Using Long Short-Term Memory-Based Recurrent Neural Network. J Comput Biol 2024; 31:83-98. [PMID: 38054946 DOI: 10.1089/cmb.2023.0064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/07/2023] Open
Abstract
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused a serious threat to public health and prompted researchers to find anti-coronavirus 2019 (COVID-19) compounds. In this study, the long short-term memory-based recurrent neural network was used to generate new inhibitors for the coronavirus. First, the model was trained to generate drug compounds in the form of valid simplified molecular-input line-entry system strings. Then, the structures of COVID-19 main protease inhibitors were applied to fine-tune the model. After fine-tuning, the network could generate new molecular structures as novel SARS-CoV-2 main protease inhibitors. Molecular docking exhibited that some generated compounds have the proper affinity to the active site of the protease. Molecular Dynamics simulations explored binding free energies of the compounds over simulation trajectories. In addition, in silico absorption, distribution, metabolism, and excretion studies showed that some novel compounds could be formulated as orally active agents. Based on molecular docking and molecular dynamics simulation studies, compound AADH possessed significant binding affinity and presumably inhibition against the SARS-CoV-2 main protease enzyme. Therefore, the proposed deep learning-based model was capable of generating promising anti-COVID-19 drugs.
Affiliation(s)
- Arash Mehrzadi
- Department of Electrical, Computer and IT Engineering, Qazvin Branch, Islamic Azad University, Qazvin, Iran
| | - Elham Rezaee
- Department of Pharmaceutical Chemistry, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Sajjad Gharaghani
- Department of Bioinformatics, Laboratory of Bioinformatics and Drug Design (LBD), University of Tehran, Tehran, Iran
| | - Zeynab Fakhar
- Department of Bioinformatics, Laboratory of Bioinformatics and Drug Design (LBD), University of Tehran, Tehran, Iran
20
Huang CH, Lin ST. MARS Plus: An Improved Molecular Design Tool for Complex Compounds Involving Ionic, Stereo, and Cis-Trans Isomeric Structures. J Chem Inf Model 2023; 63:7711-7728. [PMID: 38100117 DOI: 10.1021/acs.jcim.3c01745] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/26/2023]
Abstract
MARS (Molecular Assembling and Representation Suite) (Hsu et al. J. Chem. Inf. Model. 2019, 59, 3703-3713) is a toolbox for the molecular design of organic molecules. MARS uses integer arrays to represent the elements and connectivity between elements of a molecule. It provides a collection of operations to manipulate the elemental composition and connectivity of a molecule (or a pair of molecules), enabling the creation of novel chemical compounds. In this work, the original MARS is extended to handle complex molecular structures, including geometric (cis-trans) isomers, stereo isomers, cyclic compounds, and ionic species. The extended version of MARS, referred to as MARS+, has a more comprehensive coverage of the chemical space and therefore can explore molecules with a greater chemical and physical diversity. Compared to other molecular design tools, MARS+ is designed to perform all possible manipulations on a given molecule or a pair of molecules. Molecular structure manipulation can be conducted in either a controlled or a random fashion. Furthermore, every structure manipulation has a counterpart so that the operation can be reversed. Nearly any possible chemical structure can be generated with MARS+ via a combination of molecular operations. The capabilities of MARS+ are examined by the design of new ionic liquids (ILs). The results show that MARS+ is a useful tool for computer-aided molecular design (CAMD) and molecular structure enumeration.
Affiliation(s)
- Chen-Hsuan Huang
- Department of Chemical Engineering, National Taiwan University, Taipei 10617, Taiwan
| | - Shiang-Tai Lin
- Department of Chemical Engineering, National Taiwan University, Taipei 10617, Taiwan
21
Karandashev K, Weinreich J, Heinen S, Arismendi Arrieta DJ, von Rudorff GF, Hermansson K, von Lilienfeld OA. Evolutionary Monte Carlo of QM Properties in Chemical Space: Electrolyte Design. J Chem Theory Comput 2023; 19:8861-8870. [PMID: 38009856 PMCID: PMC10720348 DOI: 10.1021/acs.jctc.3c00822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 10/29/2023] [Accepted: 10/30/2023] [Indexed: 11/29/2023]
Abstract
Optimizing a target function over the space of organic molecules is an important problem appearing in many fields of applied science but also a very difficult one due to the vast number of possible molecular systems. We propose an evolutionary Monte Carlo algorithm for solving such problems which is capable of straightforwardly tuning both exploration and exploitation characteristics of an optimization procedure while retaining favorable properties of genetic algorithms. The method, dubbed MOSAiCS (Metropolis Optimization by Sampling Adaptively in Chemical Space), is tested on problems related to optimizing components of battery electrolytes, namely, minimizing solvation energy in water or maximizing dipole moment while enforcing a lower bound on the HOMO-LUMO gap; optimization was carried out over sets of molecular graphs inspired by QM9 and Electrolyte Genome Project (EGP) data sets. MOSAiCS reliably generated molecular candidates with good target quantity values, which were in most cases better than the ones found in QM9 or EGP. While the optimization results presented in this work sometimes required up to 10^6 QM calculations and were thus feasible only thanks to computationally efficient ab initio approximations of properties of interest, we discuss possible strategies for accelerating MOSAiCS using machine learning approaches.
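The name MOSAiCS refers to Metropolis sampling. As background only, the following minimal sketch shows the textbook Metropolis acceptance rule applied to a toy one-dimensional search. It is not the MOSAiCS algorithm: the integer search space, the cost function, the neighbor moves, and all function names are assumptions chosen to keep the example self-contained.

```python
import math
import random


def metropolis_accept(current_cost, proposed_cost, temperature, rng):
    """Accept downhill moves always; accept uphill moves with Boltzmann probability."""
    if proposed_cost <= current_cost:
        return True
    return rng.random() < math.exp(-(proposed_cost - current_cost) / temperature)


def minimize(cost, neighbors, start, temperature, steps, seed=0):
    """Random-walk minimization that remembers the best state visited."""
    rng = random.Random(seed)
    state = best = start
    for _ in range(steps):
        candidate = rng.choice(neighbors(state))
        if metropolis_accept(cost(state), cost(candidate), temperature, rng):
            state = candidate
            if cost(state) < cost(best):
                best = state
    return best


# Toy stand-in for "chemical space": integers 0..20 with a single optimum at 13.
def toy_cost(x):
    return (x - 13) ** 2


def toy_neighbors(x):
    return [max(0, x - 1), min(20, x + 1)]


best = minimize(toy_cost, toy_neighbors, start=0, temperature=1.0, steps=2000)
print(best)
```

The temperature controls the balance the abstract describes: higher values accept more uphill moves (exploration), lower values make the walk greedier (exploitation).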
Affiliation(s)
| | - Jan Weinreich
- Faculty of Physics, University of Vienna, Kolingasse 14-16, AT-1090 Wien, Austria
| | - Stefan Heinen
- Vector Institute for Artificial Intelligence, Toronto, M5S 1M1 Ontario, Canada
| | | | - Guido Falk von Rudorff
- Department of Chemistry, University Kassel, Heinrich-Plett-Str.40, 34132 Kassel, Germany
- Center for Interdisciplinary Nanostructure Science and Technology (CINSaT), Heinrich-Plett-Straße 40, 34132 Kassel, Germany
| | - Kersti Hermansson
- Department of Chemistry-Ångström Laboratory, Uppsala University, Box 538, SE-75121 Uppsala, Sweden
| | - O. Anatole von Lilienfeld
- Vector Institute for Artificial Intelligence, Toronto, M5S 1M1 Ontario, Canada
- Departments of Chemistry, Materials Science and Engineering, and Physics, University of Toronto, St. George Campus, Toronto, M5S 1A1 Ontario, Canada
- Machine Learning Group, Technische Universität Berlin and Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
22
Viswanathan K, Goel M, Laghuvarapu S, Varma G, Priyakumar UD. Streamlining pipeline efficiency: a novel model-agnostic technique for accelerating conditional generative and virtual screening pipelines. Sci Rep 2023; 13:21069. [PMID: 38030689 PMCID: PMC10686981 DOI: 10.1038/s41598-023-42952-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2023] [Accepted: 09/16/2023] [Indexed: 12/01/2023] Open
Abstract
The discovery of potential therapeutic agents for life-threatening diseases has become a significant problem. There is a requirement for fast and accurate methods to identify drug-like molecules that can be used as potential candidates for novel targets. Existing techniques like high-throughput screening and virtual screening are time-consuming and inefficient. Traditional molecule generation pipelines are more efficient than virtual screening but use time-consuming docking software. Such docking functions can be emulated using Machine Learning models with comparable accuracy and faster execution times. However, we find that when pre-trained machine learning models are employed in generative pipelines as oracles, they suffer from model degradation in areas where data is scarce. In this study, we propose an active learning-based model that can be added as a supplement to enhanced molecule generation architectures. The proposed method uses uncertainty sampling on the molecules created by the generator model and dynamically learns as the generator samples molecules from different regions of the chemical space. The proposed framework can generate molecules with high binding affinity with [Formula: see text]a 70% improvement in runtime compared to the baseline model by labeling only [Formula: see text]30% of molecules compared to the baseline oracle.
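The uncertainty-sampling step mentioned in the abstract can be sketched as follows. The abstract does not state how uncertainty is estimated, so this example assumes disagreement within a small ensemble of surrogate predictions; the function name `select_most_uncertain`, the molecule identifiers, and the scores are hypothetical.

```python
from statistics import pstdev


def select_most_uncertain(predictions, k):
    """Return the k molecule ids whose ensemble predictions disagree the most.

    `predictions` maps a molecule id to a list of scores, one per ensemble
    member. Disagreement is measured by the population standard deviation.
    """
    ranked = sorted(predictions, key=lambda mol: pstdev(predictions[mol]), reverse=True)
    return ranked[:k]


# Made-up surrogate docking scores from a three-member ensemble.
ensemble_predictions = {
    "mol_a": [-7.1, -7.0, -7.2],
    "mol_b": [-5.0, -8.5, -6.4],
    "mol_c": [-9.0, -8.8, -9.1],
    "mol_d": [-4.0, -6.0, -5.1],
}

# Only these molecules would be sent to the expensive oracle for labeling.
to_label = select_most_uncertain(ensemble_predictions, k=2)
print(to_label)
```

Labeling only the molecules the surrogate is least sure about is what allows an active-learning loop to query the expensive oracle for a fraction of the generated molecules.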
Affiliation(s)
- Karthik Viswanathan
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
| | - Manan Goel
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
| | - Siddhartha Laghuvarapu
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
| | - Girish Varma
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
| | - U Deva Priyakumar
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India.
23
Zou J, Zhao L, Shi S. Generation of focused drug molecule library using recurrent neural network. J Mol Model 2023; 29:361. [PMID: 37932607 DOI: 10.1007/s00894-023-05772-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2023] [Accepted: 10/26/2023] [Indexed: 11/08/2023]
Abstract
CONTEXT With the wide application of deep learning in drug research and development, de novo molecular design methods based on recurrent neural network (RNN) have strong advantages in drug molecule generation. The RNN model can be used to learn the internal chemical structure of molecules, which is similar to a natural language processing task. Although techniques for generating target-specific molecular libraries based on RNN models are mature, research related to drug design and screening continues around the clock. Research based on de novo drug design methods to generate larger quantities of valid compounds is necessary. METHODS In this study, a molecular generation model based on RNN was designed, which abandoned the traditional way of stacked RNN and introduced the Nested long short-term memory network structure. To enrich the library of focused molecules for specific targets, we fine-tuned the model using active molecules from novel coronavirus pneumonia and screened the molecules using machine learning models. Following rigorous screening, the selected molecules underwent molecular docking with the SARS-CoV-2 M-pro receptor using AutoDock2.4 to identify the top 3 potential inhibitors. Subsequently, 100-ns molecular dynamics simulations were conducted using Amber22. Molecule parameterization involved the GAFF2 force field, while the proteins were modeled using the ff19SB force field, with solvation facilitated by a truncated octahedral TIP3P solvent environment. Upon completion of molecular dynamics simulations, stability of ligand-protein complexes was assessed by analysis of RMSD, H-bonds, and MM-GBSA. The results indicate that the model can complete the task of de novo drug design and that the generated molecules have the potential to be ideal drug molecules.
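For readers unfamiliar with the RMSD measure used in the stability analysis, a minimal sketch of the calculation follows. It assumes the two coordinate sets list the same atoms in the same order and are already superimposed; trajectory analysis tools normally perform an alignment first, which is omitted here. The function name `rmsd` and the coordinates are illustrative assumptions.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy example: every atom displaced by 1 unit along z (made-up coordinates).
reference_frame = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
later_frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]

deviation = rmsd(reference_frame, later_frame)
print(deviation)
```

A ligand whose RMSD relative to its starting pose stays low and flat over a simulation is usually read as stably bound.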
Affiliation(s)
- Jinping Zou
- Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang, 330031, China
- Institute of Mathematics and Interdisciplinary Sciences, Nanchang University, Nanchang, 330031, China
| | - Long Zhao
- Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang, 330031, China
- Institute of Mathematics and Interdisciplinary Sciences, Nanchang University, Nanchang, 330031, China
| | - Shaoping Shi
- Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang, 330031, China.
- Institute of Mathematics and Interdisciplinary Sciences, Nanchang University, Nanchang, 330031, China.
24
Ilnicka A, Schneider G. Designing molecules with autoencoder networks. Nature Computational Science 2023; 3:922-933. [PMID: 38177601 DOI: 10.1038/s43588-023-00548-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Accepted: 10/03/2023] [Indexed: 01/06/2024]
Abstract
Autoencoders are versatile tools in molecular informatics. These unsupervised neural networks serve diverse tasks such as data-driven molecular representation and constructive molecular design. This Review explores their algorithmic foundations and applications in drug discovery, highlighting the most active areas of development and the contributions autoencoder networks have made in advancing this field. We also explore the challenges and prospects concerning the utilization of autoencoders and the various adaptations of this neural network architecture in molecular design.
Affiliation(s)
- Agnieszka Ilnicka
- Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland
| | - Gisbert Schneider
- Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland.
| |
Collapse
|
25
|
Capponi S, Daniels KG. Harnessing the power of artificial intelligence to advance cell therapy. Immunol Rev 2023; 320:147-165. [PMID: 37415280 DOI: 10.1111/imr.13236] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 05/02/2023] [Accepted: 06/17/2023] [Indexed: 07/08/2023]
Abstract
Cell therapies are powerful technologies in which human cells are reprogrammed for therapeutic applications such as killing cancer cells or replacing defective cells. The technologies underlying cell therapies are increasing in effectiveness and complexity, making rational engineering of cell therapies more difficult. Creating the next generation of cell therapies will require improved experimental approaches and predictive models. Artificial intelligence (AI) and machine learning (ML) methods have revolutionized several fields in biology including genome annotation, protein structure prediction, and enzyme design. In this review, we discuss the potential of combining experimental library screens and AI to build predictive models for the development of modular cell therapy technologies. Advances in DNA synthesis and high-throughput screening techniques enable the construction and screening of libraries of modular cell therapy constructs. AI and ML models trained on this screening data can accelerate the development of cell therapies by generating predictive models, design rules, and improved designs.
Affiliation(s)
- Sara Capponi
- Department of Functional Genomics and Cellular Engineering, IBM Almaden Research Center, San Jose, California, USA
- Center for Cellular Construction, San Francisco, California, USA
- Kyle G Daniels
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, California, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, California, USA
26
Raza A, Chohan TA, Buabeid M, Arafa ESA, Chohan TA, Fatima B, Sultana K, Ullah MS, Murtaza G. Deep learning in drug discovery: a futuristic modality to materialize the large datasets for cheminformatics. J Biomol Struct Dyn 2023; 41:9177-9192. [PMID: 36305195 DOI: 10.1080/07391102.2022.2136244] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/24/2022] [Accepted: 10/08/2022] [Indexed: 10/31/2022]
Abstract
Artificial intelligence (AI) development imitates the workings of the human brain to comprehend modern problems. The traditional approaches such as high throughput screening (HTS) and combinatorial chemistry are lengthy and expensive to the pharmaceutical industry as they can only handle a smaller dataset. Deep learning (DL) is a sophisticated AI method that uses a thorough comprehension of particular systems. The pharmaceutical industry is now adopting DL techniques to enhance the research and development process. Multi-oriented algorithms play a crucial role in the processing of QSAR analysis, de novo drug design, ADME evaluation, physicochemical analysis, preclinical development, followed by clinical trial data precision. In this study, we investigated the performance of several algorithms, including deep neural networks (DNN), convolutional neural networks (CNN) and multi-task learning (MTL), with the aim of generating high-quality, interpretable big and diverse databases for drug design and development. Studies have demonstrated that CNN, recurrent neural network and deep belief network are compatible, accurate and effective for the molecular description of pharmacodynamic properties. In Covid-19, existing pharmacological compounds have also been repurposed using DL models. In the absence of the Covid-19 vaccine, remdesivir and oseltamivir have been widely employed to treat severe SARS-CoV-2 infections. In conclusion, the results indicate the potential benefits of employing the DL strategies in the drug discovery process. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Ali Raza
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, The University of Lahore, Pakistan
- Institute of Molecular Biology and Biochemistry, The University of Lahore, Pakistan
- Talha Ali Chohan
- Institute of Molecular Biology and Biochemistry, The University of Lahore, Pakistan
- Institute of Pharmaceutical Science, UVAS, Lahore, Pakistan
- Manal Buabeid
- Department of Clinical Sciences, College of Pharmacy and Health Sciences, Ajman University, Ajman, United Arab Emirates
- El-Shaima A Arafa
- Department of Clinical Sciences, College of Pharmacy and Health Sciences, Ajman University, Ajman, United Arab Emirates
- Centre of Medical and Bio-Allied Health Sciences Research, Ajman University, Ajman, United Arab Emirates
- Batool Fatima
- Department of Biochemistry, Bahauddin Zakariya University, Multan, Pakistan
- Kishwar Sultana
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, The University of Lahore, Pakistan
- Malik Saad Ullah
- Department of Pharmacy, Government College University, Faisalabad, Pakistan
- Ghulam Murtaza
- Department of Pharmacy, COMSATS University Islamabad, Lahore Campus, Pakistan
27
Arras P, Yoo HB, Pekar L, Clarke T, Friedrich L, Schröter C, Schanz J, Tonillo J, Siegmund V, Doerner A, Krah S, Guarnera E, Zielonka S, Evers A. AI/ML combined with next-generation sequencing of VHH immune repertoires enables the rapid identification of de novo humanized and sequence-optimized single domain antibodies: a prospective case study. Front Mol Biosci 2023; 10:1249247. [PMID: 37842638 PMCID: PMC10575757 DOI: 10.3389/fmolb.2023.1249247] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 06/28/2023] [Accepted: 08/31/2023] [Indexed: 10/17/2023] Open
Abstract
Introduction: In this study, we demonstrate the feasibility of yeast surface display (YSD) and next-generation sequencing (NGS) in combination with artificial intelligence and machine learning methods (AI/ML) for the identification of de novo humanized single domain antibodies (sdAbs) with favorable early developability profiles. Methods: The display library was derived from a novel approach, in which VHH-based CDR3 regions obtained from a llama (Lama glama), immunized against NKp46, were grafted onto a humanized VHH backbone library that was diversified in CDR1 and CDR2. Following NGS analysis of sequence pools from two rounds of fluorescence-activated cell sorting we focused on four sequence clusters based on NGS frequency and enrichment analysis as well as in silico developability assessment. For each cluster, long short-term memory (LSTM) based deep generative models were trained and used for the in silico sampling of new sequences. Sequences were subjected to sequence- and structure-based in silico developability assessment to select a set of less than 10 sequences per cluster for production. Results: As demonstrated by binding kinetics and early developability assessment, this procedure represents a general strategy for the rapid and efficient design of potent and automatically humanized sdAb hits from screening selections with favorable early developability profiles.
Affiliation(s)
- Paul Arras
- Antibody Discovery and Protein Engineering, Merck Healthcare KGaA, Darmstadt, Germany
- Institute for Organic Chemistry and Biochemistry, Technical University of Darmstadt, Darmstadt, Germany
- Han Byul Yoo
- Antibody Discovery and Protein Engineering, Merck Healthcare KGaA, Darmstadt, Germany
- Lukas Pekar
- Antibody Discovery and Protein Engineering, Merck Healthcare KGaA, Darmstadt, Germany
- Thomas Clarke
- Bioinformatics, EMD Serono, Billerica, MA, United States
- Lukas Friedrich
- Computational Chemistry and Biologics, Merck Healthcare KGaA, Darmstadt, Germany
- Jennifer Schanz
- ADCs & Targeted NBE Therapeutics, Merck KGaA, Darmstadt, Germany
- Jason Tonillo
- ADCs & Targeted NBE Therapeutics, Merck KGaA, Darmstadt, Germany
- Vanessa Siegmund
- Early Protein Supply and Characterization, Merck Healthcare KGaA, Darmstadt, Germany
- Achim Doerner
- Antibody Discovery and Protein Engineering, Merck Healthcare KGaA, Darmstadt, Germany
- Simon Krah
- Antibody Discovery and Protein Engineering, Merck Healthcare KGaA, Darmstadt, Germany
- Enrico Guarnera
- Antibody Discovery and Protein Engineering, Merck Healthcare KGaA, Darmstadt, Germany
- Stefan Zielonka
- Antibody Discovery and Protein Engineering, Merck Healthcare KGaA, Darmstadt, Germany
- Institute for Organic Chemistry and Biochemistry, Technical University of Darmstadt, Darmstadt, Germany
- Andreas Evers
- Antibody Discovery and Protein Engineering, Merck Healthcare KGaA, Darmstadt, Germany
28
Bae B, Bae H, Nam H. LOGICS: Learning optimal generative distribution for designing de novo chemical structures. J Cheminform 2023; 15:77. [PMID: 37674239 PMCID: PMC10483765 DOI: 10.1186/s13321-023-00747-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Accepted: 08/23/2023] [Indexed: 09/08/2023] Open
Abstract
In recent years, the field of computational drug design has made significant strides in the development of artificial intelligence (AI) models for the generation of de novo chemical compounds with desired properties and biological activities, such as enhanced binding affinity to target proteins. These high-affinity compounds have the potential to be developed into more potent therapeutics for a broad spectrum of diseases. Due to the lack of data required for the training of deep generative models, however, some of these approaches have fine-tuned their molecular generators using data obtained from a separate predictor. While these studies show that generative models can produce structures with the desired target properties, it remains unclear whether the diversity of the generated structures and the span of their chemical space align with the distribution of the intended target molecules. In this study, we present a novel generative framework, LOGICS, a framework for Learning Optimal Generative distribution Iteratively for designing target-focused Chemical Structures. We address the exploration-exploitation dilemma, which weighs the choice between exploring new options and exploiting current knowledge. To tackle this issue, we incorporate experience memory and employ a layered tournament selection approach to refine the fine-tuning process. The proposed method was applied to the binding affinity optimization of two target proteins of different protein classes, κ-opioid receptors, and PIK3CA, and the quality and the distribution of the generative molecules were evaluated. The results showed that LOGICS outperforms competing state-of-the-art models and generates more diverse de novo chemical structures with optimized properties. The source code is available at the GitHub repository ( https://github.com/GIST-CSBL/LOGICS ).
Affiliation(s)
- Bongsung Bae
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Buk-Gu, Gwangju, 61005, Republic of Korea
- Haelee Bae
- AI Graduate School, Gwangju Institute of Science and Technology (GIST), Buk-Gu, Gwangju, 61005, Republic of Korea
- Hojung Nam
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Buk-Gu, Gwangju, 61005, Republic of Korea
- AI Graduate School, Gwangju Institute of Science and Technology (GIST), Buk-Gu, Gwangju, 61005, Republic of Korea
- Center for AI-Applied High Efficiency Drug Discovery (AHEDD), Gwangju Institute of Science and Technology (GIST), Buk-Gu, Gwangju, 61005, Republic of Korea
29
Williams AH, Zhan CG. Staying Ahead of the Game: How SARS-CoV-2 has Accelerated the Application of Machine Learning in Pandemic Management. BioDrugs 2023; 37:649-674. [PMID: 37464099 DOI: 10.1007/s40259-023-00611-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/28/2023] [Indexed: 07/20/2023]
Abstract
In recent years, machine learning (ML) techniques have garnered considerable interest for their potential use in accelerating the rate of drug discovery. With the emergence of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic, the utilization of ML has become even more crucial in the search for effective antiviral medications. The pandemic has presented the scientific community with a unique challenge, and the rapid identification of potential treatments has become an urgent priority. Researchers have been able to accelerate the process of identifying drug candidates, repurposing existing drugs, and designing new compounds with desirable properties using machine learning in drug discovery. To train predictive models, ML techniques in drug discovery rely on the analysis of large datasets, including both experimental and clinical data. These models can be used to predict the biological activities, potential side effects, and interactions with specific target proteins of drug candidates. This strategy has proven to be an effective method for identifying potential coronavirus disease 2019 (COVID-19) and other disease treatments. This paper offers a thorough analysis of the various ML techniques implemented to combat COVID-19, including supervised and unsupervised learning, deep learning, and natural language processing. The paper discusses the impact of these techniques on pandemic drug development, including the identification of potential treatments, the understanding of the disease mechanism, and the creation of effective and safe therapeutics. The lessons learned can be applied to future outbreaks and drug discovery initiatives.
Affiliation(s)
- Alexander H Williams
- Molecular Modeling and Biopharmaceutical Center, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA
- Department of Pharmaceutical Sciences, College of Pharmacy, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA
- GSK Upper Providence, 1250 S. Collegeville Road, Collegeville, PA, 19426, USA
- Chang-Guo Zhan
- Molecular Modeling and Biopharmaceutical Center, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA
- Department of Pharmaceutical Sciences, College of Pharmacy, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA
30
An Y, Glavatskikh M, Lim J, Wang X, Norris-Drouin J, Hardy PB, Leisner TM, Pearce KH, Kireev D. Machine Learning-driven Fragment-based Discovery of CIB1-directed Anti-Tumor Agents by FRASE-bot. RESEARCH SQUARE 2023:rs.3.rs-3197490. [PMID: 37645935 PMCID: PMC10462244 DOI: 10.21203/rs.3.rs-3197490/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]
Abstract
Chemical probes are an indispensable tool for translating biological discoveries into new therapies, though are increasingly difficult to identify. Novel therapeutic targets are often hard-to-drug proteins, such as messengers or transcription factors. Computational strategies arise as a promising solution to expedite drug discovery for unconventional therapeutic targets. FRASE-bot exploits big data and machine learning (ML) to distill 3D information relevant to the target protein from thousands of protein-ligand complexes to seed it with ligand fragments. The seeded fragments can then inform either (i) de novo design of 3D ligand structures or (ii) ultra-large-scale virtual screening of commercially available compounds. Here, FRASE-bot was applied to identify ligands for Calcium and Integrin Binding protein 1 (CIB1), a promising but ligand-orphan drug target implicated in triple negative breast cancer. The signaling function of CIB1 relies on protein-protein interactions and its structure does not feature any natural ligand-binding pocket. FRASE-based virtual screening identified the first small-molecule CIB1 ligand (with binding confirmed in a TR-FRET assay) showing specific cell-killing activity in CIB1-dependent cancer cells, but not in CIB1-depleted cells.
Affiliation(s)
- Yi An
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27513
- Marta Glavatskikh
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27513
- Jiwoong Lim
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27513
- Xiaowen Wang
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27513
- Chemistry Department, University of Missouri, Columbia, Columbia, MO, 65211
- Jacqueline Norris-Drouin
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27513
- P. Brian Hardy
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27513
- Tina M. Leisner
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27513
- Kenneth H. Pearce
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27513
- Dmitri Kireev
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27513
- Chemistry Department, University of Missouri, Columbia, Columbia, MO, 65211
31
Jin J, Wang D, Shi G, Bao J, Wang J, Zhang H, Pan P, Li D, Yao X, Liu H, Hou T, Kang Y. FFLOM: A Flow-Based Autoregressive Model for Fragment-to-Lead Optimization. J Med Chem 2023; 66:10808-10823. [PMID: 37471134 DOI: 10.1021/acs.jmedchem.3c01009] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 07/21/2023]
Abstract
Recently, deep generative models have been regarded as promising tools in fragment-based drug design (FBDD). Despite the growing interest in these models, they still face challenges in generating molecules with desired properties in low data regimes. In this study, we propose a novel flow-based autoregressive model named FFLOM for linker and R-group design. In a large-scale benchmark evaluation on ZINC, CASF, and PDBbind test sets, FFLOM achieves state-of-the-art performance in terms of validity, uniqueness, novelty, and recovery of the generated molecules and can recover over 92% of the original molecules in the PDBbind test set (with at least five atoms). FFLOM also exhibits excellent potential applicability in several practical scenarios encompassing fragment linking, PROTAC design, R-group growing, and R-group optimization. In all four cases, FFLOM can perfectly reconstruct the ground-truth compounds and generate over 74% of molecules with novel fragments, some of which have higher binding affinity than the ground truth.
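The validity, uniqueness, and novelty metrics named in this abstract can be illustrated with a minimal sketch. It uses one commonly used set of definitions, which may differ from the exact ones in the paper. The function name `generation_metrics` and the `is_valid` callback are hypothetical; a real validity check would call a cheminformatics toolkit, and strings would normally be canonicalized before comparison.

```python
from typing import Callable, Dict, Iterable


def generation_metrics(
    generated: Iterable[str],
    training: Iterable[str],
    is_valid: Callable[[str], bool] = lambda s: bool(s.strip()),
) -> Dict[str, float]:
    """Compute validity, uniqueness, and novelty for generated molecule strings.

    Assumptions (not taken from the paper):
    - `is_valid` is a placeholder; by default any non-empty string counts as valid.
    - Strings are compared verbatim, without canonicalization.
    """
    generated = list(generated)
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training)
    return {
        # fraction of generated strings that pass the validity check
        "validity": len(valid) / len(generated) if generated else 0.0,
        # fraction of valid strings that are distinct
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        # fraction of distinct valid strings absent from the training set
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


if __name__ == "__main__":
    sample = ["CCO", "CCO", "c1ccccc1", ""]
    print(generation_metrics(sample, training=["CCO"]))
```

Passing the validity check as a callback keeps the sketch independent of any particular chemistry library.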
Affiliation(s)
- Jieyu Jin
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Dong Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Guqin Shi
- Shanghai Qilu Pharmaceutical R&D Center, 576 Libing Road, Pudong New Area District, Shanghai 310115, China
- Jingxiao Bao
- Shanghai Qilu Pharmaceutical R&D Center, 576 Libing Road, Pudong New Area District, Shanghai 310115, China
- Jike Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Haotian Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Dan Li
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Xiaojun Yao
- State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Macau 999078, China
- Huanxiang Liu
- Faculty of Applied Science, Macao Polytechnic University, Macau 999078, China
- Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
32
Prabhakaran P, Hebbani AV, Menon SV, Paital B, Murmu S, Kumar S, Singh MK, Sahoo DK, Desai PPD. Insilico generation of novel ligands for the inhibition of SARS-CoV-2 main protease (3CL pro) using deep learning. Front Microbiol 2023; 14:1194794. [PMID: 37448573 PMCID: PMC10338188 DOI: 10.3389/fmicb.2023.1194794] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/27/2023] [Accepted: 06/05/2023] [Indexed: 07/15/2023] Open
Abstract
The recent emergence of novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) causing the coronavirus disease (COVID-19) has become a global public health crisis, and a crucial need exists for rapid identification and development of novel therapeutic interventions. In this study, a recurrent neural network (RNN) is trained and optimized to produce novel ligands that could serve as potential inhibitors to the SARS-CoV-2 viral protease: 3 chymotrypsin-like protease (3CLpro). Structure-based virtual screening was performed through molecular docking, ADMET profiling, and predictions of various molecular properties were done to evaluate the toxicity and drug-likeness of the generated novel ligands. The properties of the generated ligands were also compared with current drugs under various phases of clinical trials to assess the efficacy of the novel ligands. Twenty novel ligands were selected that exhibited good drug-likeness properties, with most ligands conforming to Lipinski's rule of 5, high binding affinity (highest binding affinity: -9.4 kcal/mol), and promising ADMET profile. Additionally, the generated ligands complexed with 3CLpro were found to be stable based on the results of molecular dynamics simulation studies conducted over a 100 ns period. Overall, the findings offer a promising avenue for the rapid identification and development of effective therapeutic interventions to treat COVID-19.
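Lipinski's rule of 5, used in this abstract as a drug-likeness filter, can be illustrated with a short sketch. The thresholds are the standard ones: molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, and at most 10 hydrogen-bond acceptors. The names `Descriptors`, `lipinski_violations`, and `passes_rule_of_five` are hypothetical. The descriptor values are assumed to be precomputed by a cheminformatics toolkit. Tolerating one violation is a common convention and is not necessarily the criterion the authors applied.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (assumed inputs, not computed here)."""
    mol_weight: float   # molecular weight in Da
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donor count
    h_acceptors: int    # hydrogen-bond acceptor count


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-5 thresholds are exceeded."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ]
    return sum(checks)


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Accept a molecule with at most `max_violations` violations (assumed convention)."""
    return lipinski_violations(d) <= max_violations


if __name__ == "__main__":
    # Approximate, illustrative descriptor values for a small drug-like molecule.
    small = Descriptors(mol_weight=180.16, logp=1.2, h_donors=1, h_acceptors=4)
    print(lipinski_violations(small), passes_rule_of_five(small))
```

Setting `max_violations=0` gives the strict form of the filter.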
Affiliation(s)
- Prejwal Prabhakaran
- Department of Biotechnology, New Horizon College of Engineering, Bangalore, India
- Faculty of Biology, Albert-Ludwigs-Universität Freiburg, Freiburg im Breisgau, Germany
- Ananda Vardhan Hebbani
- Department of Biochemistry, Indian Academy Degree College (Autonomous), Bangalore, India
- Soumya V. Menon
- Department of Chemistry and Biochemistry, School of Sciences, Jain (Deemed-to-be) University, Bangalore, India
- Biswaranjan Paital
- Redox Regulation Laboratory, Department of Zoology, College of Basic Science and Humanities, Odisha University of Agriculture and Technology, Bhubaneswar, India
- Sneha Murmu
- ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi, India
- Sunil Kumar
- ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi, India
- Dipak Kumar Sahoo
- Department of Veterinary Clinical Sciences, College of Veterinary Medicine, Iowa State University, Ames, IA, United States
33
Zhang W, Zhang K, Huang J. A Simple Way to Incorporate Target Structural Information in Molecular Generative Models. J Chem Inf Model 2023. [PMID: 37318828 DOI: 10.1021/acs.jcim.3c00293] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/17/2023]
Abstract
Deep learning generative models are now being applied in various fields including drug discovery. In this work, we propose a novel approach to include target 3D structural information in molecular generative models for structure-based drug design. The method combines a message-passing neural network model that predicts docking scores with a generative neural network model as its reward function to navigate the chemical space searching for molecules that bind favorably with a specific target. A key feature of the method is the construction of target-specific molecular sets for training, designed to overcome potential transferability issues of surrogate docking models through a two-round training process. Consequently, this enables accurate guided exploration of the chemical space without reliance on the collection of prior knowledge about active and inactive compounds for the specific target. Tests on eight target proteins showed a 100-fold increase in hit generation compared to conventional docking calculations and the ability to generate molecules similar to approved drugs or known active ligands for specific targets without prior knowledge. This method provides a general and highly efficient solution for structure-based molecular generation.
Affiliation(s)
- Wenyi Zhang
- Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Institute of Biology, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Kaiyue Zhang
- Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Jing Huang
- Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Institute of Biology, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
34
Li J, Beaudoin C, Ghosh S. Energy-based generative models for target-specific drug discovery. FRONTIERS IN MOLECULAR MEDICINE 2023; 3:1160877. [PMID: 39086693 PMCID: PMC11285544 DOI: 10.3389/fmmed.2023.1160877] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/07/2023] [Accepted: 05/12/2023] [Indexed: 08/02/2024]
Abstract
Drug targets are the main focus of drug discovery due to their key role in disease pathogenesis. Computational approaches are widely applied to drug development because of the increasing availability of biological molecular datasets. Popular generative approaches can create new drug molecules by learning the given molecule distributions. However, these approaches are mostly not for target-specific drug discovery. We developed an energy-based probabilistic model for computational target-specific drug discovery. Results show that our proposed TagMol can generate molecules with similar binding affinity scores as real molecules. GAT-based models showed faster and better learning relative to Graph Convolutional Network baseline models.
Affiliation(s)
- Junde Li
- Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, United States
35
Mazuz E, Shtar G, Shapira B, Rokach L. Molecule generation using transformers and policy gradient reinforcement learning. Sci Rep 2023; 13:8799. [PMID: 37258546 DOI: 10.1038/s41598-023-35648-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 09/08/2022] [Accepted: 05/22/2023] [Indexed: 06/02/2023] Open
Abstract
Generating novel valid molecules is often a difficult task, because the vast chemical space relies on the intuition of experienced chemists. In recent years, deep learning models have helped accelerate this process. These advanced models can also help identify suitable molecules for disease treatment. In this paper, we propose Taiga, a transformer-based architecture for the generation of molecules with desired properties. Using a two-stage approach, we first treat the problem as a language modeling task of predicting the next token, using SMILES strings. Then, we use reinforcement learning to optimize molecular properties such as QED. This approach allows our model to learn the underlying rules of chemistry and more easily optimize for molecules with desired properties. Our evaluation of Taiga, which was performed with multiple datasets and tasks, shows that Taiga is comparable to, or even outperforms, state-of-the-art baselines for molecule optimization, with improvements in the QED ranging from 2 to over 20 percent. The improvement was demonstrated both on datasets containing lead molecules and random molecules. We also show that with its two stages, Taiga is capable of generating molecules with higher biological property scores than the same model without reinforcement learning.
Affiliation(s)
- Eyal Mazuz
- Ben-Gurion University of the Negev, Beersheba, Israel
- Guy Shtar
- Ben-Gurion University of the Negev, Beersheba, Israel
- Lior Rokach
- Ben-Gurion University of the Negev, Beersheba, Israel
36
Wang J, Zeng Y, Sun H, Wang J, Wang X, Jin R, Wang M, Zhang X, Cao D, Chen X, Hsieh CY, Hou T. Molecular Generation with Reduced Labeling through Constraint Architecture. J Chem Inf Model 2023. [PMID: 37184885 DOI: 10.1021/acs.jcim.3c00579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2023]
Abstract
In the past few years, a number of machine learning (ML)-based molecular generative models have been proposed for generating molecules with desirable properties, but they all require a large amount of labeled data on pharmacological and physicochemical properties. However, experimental determination of these labels, especially bioactivity labels, is very expensive. In this study, we analyze the dependence of various multi-property molecule generation models on biological activity label data and propose Frag-G/M, a fragment-based multi-constraint molecular generation framework based on a conditional transformer, recurrent neural networks (RNNs), and reinforcement learning (RL). The experimental results illustrate that, using the same number of labels, Frag-G/M can generate several times more desired molecules than the baselines. Moreover, compared with the known active compounds, the molecules generated by Frag-G/M exhibit higher scaffold diversity than those generated by the baselines, making the framework more promising for real-world drug discovery scenarios.
Affiliation(s)
- Jike Wang
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China
- School of Computer Science, Wuhan University, Wuhan, Hubei 430072, P. R. China
- Yundian Zeng
- College of Control Science and Engineering, Zhejiang University, Hangzhou, Zhejiang 310027, P. R. China
- Huiyong Sun
- Department of Medicinal Chemistry, China Pharmaceutical University, Nanjing, Jiangsu 210009, P. R. China
- Junmei Wang
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
- Xiaorui Wang
- State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Macau 999078, P. R. China
- Ruofan Jin
- College of Life Science, Zhejiang University, Hangzhou, Zhejiang 310027, P. R. China
- Mingyang Wang
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China
- Xujun Zhang
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410004, P. R. China
- Xi Chen
- School of Computer Science, Wuhan University, Wuhan, Hubei 430072, P. R. China
- Chang-Yu Hsieh
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China
- Tingjun Hou
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China
37
Ji C, Zheng Y, Wang R, Cai Y, Wu H. Graph Polish: A Novel Graph Generation Paradigm for Molecular Optimization. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:2323-2337. [PMID: 34520363 DOI: 10.1109/tnnls.2021.3106392] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/04/2023]
Abstract
Molecular optimization, which transforms a given input molecule X into another Y with desired properties, is essential in molecular drug discovery. Traditional approaches either suffer from sample-inefficient learning or ignore information that can be captured with the supervised learning of optimized molecule pairs. In this study, we present a novel molecular optimization paradigm, Graph Polish. In this paradigm, with the guidance of the source and target molecule pairs of the desired properties, a heuristic optimization solution can be derived: given an input molecule, we first predict which atom can be viewed as the optimization center, and then the nearby regions are optimized around this center. We then propose an effective and efficient learning framework, Teacher and Student polish, to capture the dependencies in the optimization steps. A teacher component automatically identifies and annotates the optimization centers and the preservation, removal, and addition of some parts of the molecules; a student component learns this knowledge and applies it to a new molecule. The proposed paradigm can offer an intuitive interpretation for the molecular optimization result. Experiments with multiple optimization tasks are conducted on several benchmark datasets. The proposed approach achieves a significant advantage over six state-of-the-art baseline methods. Also, extensive studies are conducted to validate the effectiveness, explainability, and time savings of the novel optimization paradigm.
38
Chen L, Shen Q, Lou J. Magicmol: a light-weighted pipeline for drug-like molecule evolution and quick chemical space exploration. BMC Bioinformatics 2023; 24:173. [PMID: 37101113 PMCID: PMC10132416 DOI: 10.1186/s12859-023-05286-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2022] [Accepted: 04/13/2023] [Indexed: 04/28/2023] Open
Abstract
The flourishing of machine learning and deep learning methods has boosted the development of cheminformatics, especially in drug discovery and new material exploration. Lower time and space costs make it possible for scientists to search the enormous chemical space. Recently, some work has combined reinforcement learning strategies with recurrent neural network (RNN)-based models to optimize the properties of generated small molecules, which notably improved a number of critical factors for these candidates. However, a common problem among these RNN-based methods is that several generated molecules are difficult to synthesize despite having higher desired properties such as binding affinity. On the other hand, RNN-based frameworks reproduce the molecule distribution of the training set better than other categories of models during molecule exploration tasks. Thus, to optimize the whole exploration process and make it contribute to the optimization of specified molecules, we devised a lightweight pipeline called Magicmol; this pipeline has a re-mastered RNN network and utilizes the SELFIES representation instead of SMILES. Our backbone model achieved extraordinary performance while reducing the training cost; moreover, we devised reward truncation strategies to eliminate the model collapse problem. Additionally, adopting the SELFIES representation made it possible to combine STONED-SELFIES as a post-processing procedure for specified molecule optimization and quick chemical space exploration.
Affiliation(s)
- Lin Chen
- Yangtze Delta Region (Huzhou) Institute of Intelligent Transportation, Huzhou University, Huzhou, China
- Zhejiang Province Key Laboratory of Smart Management and Application of Modern Agricultural Resources, School of Information Engineering, Huzhou University, Huzhou, China
- Qing Shen
- Yangtze Delta Region (Huzhou) Institute of Intelligent Transportation, Huzhou University, Huzhou, China
- School of Electronic Information, Huzhou College, Huzhou, China
- Jungang Lou
- Yangtze Delta Region (Huzhou) Institute of Intelligent Transportation, Huzhou University, Huzhou, China.
- Zhejiang Province Key Laboratory of Smart Management and Application of Modern Agricultural Resources, School of Information Engineering, Huzhou University, Huzhou, China.
39
Nemoto S, Mizuno T, Kusuhara H. Investigation of chemical structure recognition by encoder-decoder models in learning progress. J Cheminform 2023; 15:45. [PMID: 37046349 PMCID: PMC10100163 DOI: 10.1186/s13321-023-00713-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/22/2022] [Accepted: 03/18/2023] [Indexed: 04/14/2023] Open
Abstract
Descriptor generation methods using latent representations of encoder-decoder (ED) models with SMILES as input are useful because of the continuity of the descriptor and its restorability to the structure. However, it is not clear how the structure is recognized as ED models learn. In this work, we created ED models at various stages of learning progress and investigated the relationship between structural information and learning progress. We showed that compound substructures were learned early in ED models by monitoring the accuracy of downstream tasks and input-output substructure similarity using substructure-based descriptors, which suggests that existing evaluation methods based on the accuracy of downstream tasks may not be sensitive enough to evaluate the performance of ED models with SMILES as descriptor generation methods. On the other hand, we showed that structure restoration was time-consuming, and in particular, insufficient learning led to the estimation of a larger structure than the actual one. It can be inferred that determining the endpoint of the structure is a difficult task for the model. To our knowledge, this is the first study to link the learning progress of SMILES by ED models to chemical structures for a wide range of chemicals.
Affiliation(s)
- Shumpei Nemoto
- Department of Pharmaceutical Sciences, The University of Tokyo, Bunkyo, Tokyo, Japan
- Tadahaya Mizuno
- Department of Pharmaceutical Sciences, The University of Tokyo, Bunkyo, Tokyo, Japan.
- Hiroyuki Kusuhara
- Department of Pharmaceutical Sciences, The University of Tokyo, Bunkyo, Tokyo, Japan
40
Liu X, Zhang W, Tong X, Zhong F, Li Z, Xiong Z, Xiong J, Wu X, Fu Z, Tan X, Liu Z, Zhang S, Jiang H, Li X, Zheng M. MolFilterGAN: a progressively augmented generative adversarial network for triaging AI-designed molecules. J Cheminform 2023; 15:42. [PMID: 37031191 PMCID: PMC10082991 DOI: 10.1186/s13321-023-00711-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2022] [Accepted: 03/14/2023] [Indexed: 04/10/2023] Open
Abstract
Artificial intelligence (AI)-based molecular design methods, especially deep generative models for generating novel molecule structures, have gratified our imagination to explore unknown chemical space without relying on brute-force exploration. However, whether designed by AI or human experts, the molecules need to be accessibly synthesized and biologically evaluated, and the trial-and-error process remains a resource-intensive endeavor. Therefore, AI-based drug design methods face a major challenge of how to prioritize the molecular structures with potential for subsequent drug development. This study indicates that common filtering approaches based on traditional screening metrics fail to differentiate AI-designed molecules. To address this issue, we propose a novel molecular filtering method, MolFilterGAN, based on a progressively augmented generative adversarial network. Comparative analysis shows that MolFilterGAN outperforms conventional screening approaches based on drug-likeness or synthetic ability metrics. Retrospective analysis of AI-designed discoidin domain receptor 1 (DDR1) inhibitors shows that MolFilterGAN significantly increases the efficiency of molecular triaging. Further evaluation of MolFilterGAN on eight external ligand sets suggests that MolFilterGAN is useful in triaging or enriching bioactive compounds across a wide range of target types. These results highlight the importance of MolFilterGAN in evaluating molecules integrally and further accelerating molecular discovery, especially when combined with advanced AI generative models.
Affiliation(s)
- Xiaohong Liu
- Shanghai Institute for Advanced Immunochemical Studies, and School of Life Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai, 201210, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- AlphaMa Inc., No. 108, Yuxin Road, Suzhou Industrial Park, Suzhou, 215128, China
- Wei Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Xiaochu Tong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Feisheng Zhong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Zhaojun Li
- AlphaMa Inc., No. 108, Yuxin Road, Suzhou Industrial Park, Suzhou, 215128, China
- Zhaoping Xiong
- Shanghai Institute for Advanced Immunochemical Studies, and School of Life Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai, 201210, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Jiacheng Xiong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Xiaolong Wu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China
- Zunyun Fu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- Xiaoqin Tan
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- ByteDance AI Lab, No. 1999 Yishan Road, Shanghai, 201103, China
- Zhiguo Liu
- AlphaMa Inc., No. 108, Yuxin Road, Suzhou Industrial Park, Suzhou, 215128, China
- Sulin Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Hualiang Jiang
- Shanghai Institute for Advanced Immunochemical Studies, and School of Life Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai, 201210, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China.
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China.
- Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China.
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China.
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, 310024, Hangzhou, China.
41
Koutroumpa NM, Papavasileiou KD, Papadiamantis AG, Melagraki G, Afantitis A. A Systematic Review of Deep Learning Methodologies Used in the Drug Discovery Process with Emphasis on In Vivo Validation. Int J Mol Sci 2023; 24:6573. [PMID: 37047543 PMCID: PMC10095548 DOI: 10.3390/ijms24076573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2022] [Revised: 03/24/2023] [Accepted: 03/28/2023] [Indexed: 04/05/2023] Open
Abstract
The discovery and development of new drugs are extremely long and costly processes. Recent progress in artificial intelligence has made a positive impact on the drug development pipeline. Numerous challenges have been addressed with the growing exploitation of drug-related data and the advancement of deep learning technology. Several model frameworks have been proposed to enhance the performance of deep learning algorithms in molecular design. However, only a few have had an immediate impact on drug development since computational results may not be confirmed experimentally. This systematic review aims to summarize the different deep learning architectures that are used in the drug discovery process and validated with further in vivo experiments. For each presented study, the proposed molecule or peptide that has been generated or identified by the deep learning model has been biologically evaluated in animal models. These state-of-the-art studies highlight that even if artificial intelligence in drug discovery is still in its infancy, it has great potential to accelerate the drug discovery cycle, reduce the required costs, and contribute to the integration of the 3R (Replacement, Reduction, Refinement) principles. Out of all the reviewed scientific articles, seven algorithms were identified: recurrent neural networks, specifically long short-term memory networks (LSTM-RNNs); Autoencoders (AEs) and their Wasserstein Autoencoder (WAE) and Variational Autoencoder (VAE) variants; Convolutional Neural Networks (CNNs); Direct Message Passing Neural Networks (D-MPNNs); and Multitask Deep Neural Networks (MTDNNs). LSTM-RNNs were the most used architectures with molecules or peptide sequences as inputs.
Affiliation(s)
- Nikoletta-Maria Koutroumpa
- Department of ChemoInformatics, NovaMechanics Ltd., Nicosia 1070, Cyprus
- School of Chemical Engineering, National Technical University of Athens, 157 80 Athens, Greece
- Division of Data Driven Innovation, Entelos Institute, Larnaca 6059, Cyprus
- Konstantinos D. Papavasileiou
- Department of ChemoInformatics, NovaMechanics Ltd., Nicosia 1070, Cyprus
- Division of Data Driven Innovation, Entelos Institute, Larnaca 6059, Cyprus
- Department of ChemoInformatics, NovaMechanics MIKE., 185 45 Piraeus, Greece
- Anastasios G. Papadiamantis
- Department of ChemoInformatics, NovaMechanics Ltd., Nicosia 1070, Cyprus
- Division of Data Driven Innovation, Entelos Institute, Larnaca 6059, Cyprus
- Georgia Melagraki
- Division of Physical Sciences & Applications, Hellenic Military Academy, 166 73 Vari, Greece
- Antreas Afantitis
- Department of ChemoInformatics, NovaMechanics Ltd., Nicosia 1070, Cyprus
- Division of Data Driven Innovation, Entelos Institute, Larnaca 6059, Cyprus
- Department of ChemoInformatics, NovaMechanics MIKE., 185 45 Piraeus, Greece
42
Zhou Z, Eden M, Shen W. Treat Molecular Linear Notations as Sentences: Accurate Quantitative Structure–Property Relationship Modeling via a Natural Language Processing Approach. Ind Eng Chem Res 2023. [DOI: 10.1021/acs.iecr.2c04070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/31/2023]
43
Li H, Zou L, Kowah JAH, He D, Liu Z, Ding X, Wen H, Wang L, Yuan M, Liu X. A compact review of progress and prospects of deep learning in drug discovery. J Mol Model 2023; 29:117. [PMID: 36976427 DOI: 10.1007/s00894-023-05492-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/18/2022] [Accepted: 02/27/2023] [Indexed: 03/29/2023]
Abstract
BACKGROUND Drug discovery processes, such as new drug development, drug synergy, and drug repurposing, consume significant resources every year. Computer-aided drug discovery can effectively improve the efficiency of drug discovery. Traditional computer methods such as virtual screening and molecular docking have achieved many gratifying results in drug development. However, with the rapid growth of computer science, data structures have changed considerably; with larger, higher-dimensional data and greater data volumes, traditional computer methods can no longer be applied well. Deep learning methods are based on deep neural network structures that can handle high-dimensional data very well, so they are used in current drug development. RESULTS This review summarized the applications of deep learning methods in drug discovery, such as drug target discovery, drug de novo design, drug recommendation, drug synergy, and drug response prediction. While applying deep learning methods to drug discovery suffers from a lack of data, transfer learning is an excellent solution to this problem. Furthermore, deep learning methods can extract deeper features and have higher predictive power than other machine learning methods. Deep learning methods have great potential in drug discovery and are expected to facilitate drug discovery development.
Affiliation(s)
- Huijun Li
- College of Medicine, Guangxi University, Nanning, 530004, China
- Lin Zou
- College of Medicine, Guangxi University, Nanning, 530004, China
- Dongqiong He
- College of Chemistry and Chemical Engineering, Guangxi University, Nanning, 530004, China
- Zifan Liu
- College of Medicine, Guangxi University, Nanning, 530004, China
- Xuejie Ding
- College of Medicine, Guangxi University, Nanning, 530004, China
- Hao Wen
- College of Chemistry and Chemical Engineering, Guangxi University, Nanning, 530004, China
- Lisheng Wang
- College of Medicine, Guangxi University, Nanning, 530004, China
- Mingqing Yuan
- College of Medicine, Guangxi University, Nanning, 530004, China
- Xu Liu
- College of Medicine, Guangxi University, Nanning, 530004, China.
44
Chen W, Liu X, Zhang S, Chen S. Artificial intelligence for drug discovery: Resources, methods, and applications. MOLECULAR THERAPY. NUCLEIC ACIDS 2023; 31:691-702. [PMID: 36923950 PMCID: PMC10009646 DOI: 10.1016/j.omtn.2023.02.019] [Citation(s) in RCA: 37] [Impact Index Per Article: 37.0] [Indexed: 02/19/2023]
Abstract
Conventional wet laboratory testing, validations, and synthetic procedures are costly and time-consuming for drug discovery. Advancements in artificial intelligence (AI) techniques have revolutionized their applications to drug discovery. Combined with accessible data resources, AI techniques are changing the landscape of drug discovery. In the past decades, a series of AI-based models have been developed for various steps of drug discovery. These models have been used as complements of conventional experiments and have accelerated the drug discovery process. In this review, we first introduced the widely used data resources in drug discovery, such as ChEMBL and DrugBank, followed by the molecular representation schemes that convert data into computer-readable formats. Meanwhile, we summarized the algorithms used to develop AI-based models for drug discovery. Subsequently, we discussed the applications of AI techniques in pharmaceutical analysis, including predicting drug toxicity, bioactivity, and physicochemical properties. Furthermore, we introduced the AI-based models for de novo drug design, drug-target structure prediction, drug-target interaction, and binding affinity prediction. Moreover, we also highlighted the advanced applications of AI in drug synergism/antagonism prediction and nanomedicine design. Finally, we discussed the challenges and future perspectives on the applications of AI to drug discovery.
Affiliation(s)
- Wei Chen
- State Key Laboratory of Southwestern Chinese Medicine Resources, Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China; Institute of Herbgenomics, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Xuesong Liu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Sanyin Zhang
- State Key Laboratory of Southwestern Chinese Medicine Resources, Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China; Institute of Herbgenomics, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Shilin Chen
- State Key Laboratory of Southwestern Chinese Medicine Resources, Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China; Institute of Herbgenomics, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
45
Design of New Dispersants Using Machine Learning and Visual Analytics. Polymers (Basel) 2023; 15:1324. [PMID: 36904566 PMCID: PMC10007083 DOI: 10.3390/polym15051324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2023] [Revised: 02/23/2023] [Accepted: 02/25/2023] [Indexed: 03/09/2023] Open
Abstract
Artificial intelligence (AI) is an emerging technology that is revolutionizing the discovery of new materials. One key application of AI is virtual screening of chemical libraries, which enables the accelerated discovery of materials with desired properties. In this study, we developed computational models to predict the dispersancy efficiency of oil and lubricant additives, a critical property in their design that can be estimated through a quantity named blotter spot. We propose a comprehensive approach that combines machine learning techniques with visual analytics strategies in an interactive tool that supports domain experts' decision-making. We evaluated the proposed models quantitatively and illustrated their benefits through a case study. Specifically, we analyzed a series of virtual polyisobutylene succinimide (PIBSI) molecules derived from a known reference substrate. Our best-performing probabilistic model was Bayesian Additive Regression Trees (BART), which achieved a mean absolute error of 5.50±0.34 and a root mean square error of 7.56±0.47, as estimated through 5-fold cross-validation. To facilitate future research, we have made the dataset, including the potential dispersants used for modeling, publicly available. Our approach can help accelerate the discovery of new oil and lubricant additives, and our interactive tool can aid domain experts in making informed decisions based on blotter spot and other key properties.
46
Urán Landaburu L, Didier Garnham M, Agüero F. Targeting trypanosomes: how chemogenomics and artificial intelligence can guide drug discovery. Biochem Soc Trans 2023; 51:195-206. [PMID: 36606702 DOI: 10.1042/bst20220618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Revised: 12/01/2022] [Accepted: 12/05/2022] [Indexed: 01/07/2023]
Abstract
Trypanosomatids are protozoan parasites that cause human and animal neglected diseases. Despite global efforts, effective treatments are still much needed. Phenotypic screens have provided several chemical leads for drug discovery, but the mechanism of action for many of these chemicals is currently unknown. Recently, chemogenomic screens assessing the susceptibility or resistance of parasites carrying genome-wide modifications started to define the mechanism of action of drugs at large scale. In this review, we discuss how genomics is being used for drug discovery in trypanosomatids, how integration of chemical and genomics data from these and other organisms has guided prioritisations of candidate therapeutic targets and additional chemical starting points, and how these data can fuel the expansion of drug discovery pipelines into the era of artificial intelligence.
Affiliation(s)
- Lionel Urán Landaburu
- Instituto de Investigaciones Biotecnológicas (IIB), Universidad Nacional de San Martín (UNSAM) - Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), San Martín, Argentina
- Escuela de Bio y Nanociencias (EByN), Universidad Nacional de San Martín, San Martín, Argentina
- Mercedes Didier Garnham
- Instituto de Investigaciones Biotecnológicas (IIB), Universidad Nacional de San Martín (UNSAM) - Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), San Martín, Argentina
- Escuela de Bio y Nanociencias (EByN), Universidad Nacional de San Martín, San Martín, Argentina
- Fernán Agüero
- Instituto de Investigaciones Biotecnológicas (IIB), Universidad Nacional de San Martín (UNSAM) - Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), San Martín, Argentina
- Escuela de Bio y Nanociencias (EByN), Universidad Nacional de San Martín, San Martín, Argentina
47
Zhang Y, Li S, Xing M, Yuan Q, He H, Sun S. Universal Approach to De Novo Drug Design for Target Proteins Using Deep Reinforcement Learning. ACS OMEGA 2023; 8:5464-5474. [PMID: 36816653 PMCID: PMC9933084 DOI: 10.1021/acsomega.2c06653] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/15/2022] [Accepted: 01/05/2023] [Indexed: 05/28/2023]
Abstract
In drug design, the design and manufacture of safe and effective compounds is a long and complex process. Therefore, developing a new rapid and generalizable drug design method is of great value. This study aimed to propose a general model based on reinforcement learning combined with drug-target interaction, which could be used to design new molecules according to different protein targets. The method adopted recurrent neural network molecular modeling and took the drug-target affinity model as the reward function for optimizing molecular generation. It did not need to know the three-dimensional structure and active sites of protein targets but only required the information of a one-dimensional amino acid sequence. This approach was demonstrated to produce molecules highly similar to marketed drugs and to design molecules with a better binding energy.
Affiliation(s)
- Yunjiang Zhang
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, PR China
| | - Shuyuan Li
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, PR China
| | - Miaojuan Xing
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, PR China
| | - Qing Yuan
- Department of Chemistry and Chemical Engineering, Beijing University of Technology, Beijing 100124, China
| | - Hong He
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, PR China
| | - Shaorui Sun
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, PR China
| |
|
48
|
Schoenmaker L, Béquignon OJM, Jespers W, van Westen GJP. UnCorrupt SMILES: a novel approach to de novo design. J Cheminform 2023; 15:22. [PMID: 36788579 PMCID: PMC9926805 DOI: 10.1186/s13321-023-00696-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 10/13/2022] [Accepted: 02/06/2023] [Indexed: 02/16/2023]
Abstract
Generative deep learning models have emerged as a powerful approach for de novo drug design as they aid researchers in finding new molecules with desired properties. Despite continuous improvements in the field, a subset of the outputs that sequence-based de novo generators produce cannot be progressed due to errors. Here, we propose to fix these invalid outputs post hoc. In similar tasks, transformer models from the field of natural language processing have been shown to be very effective. Therefore, here this type of model was trained to translate invalid Simplified Molecular-Input Line-Entry System (SMILES) into valid representations. The performance of this SMILES corrector was evaluated on four representative methods of de novo generation: a recurrent neural network (RNN), a target-directed RNN, a generative adversarial network (GAN), and a variational autoencoder (VAE). This study has found that the percentage of invalid outputs from these specific generative models ranges between 4 and 89%, with different models having different error-type distributions. Post hoc correction of SMILES was shown to increase model validity. The SMILES corrector trained with one error per input alters 60-90% of invalid generator outputs and fixes 35-80% of them. However, a higher error detection and performance was obtained for transformer models trained with multiple errors per input. In this case, the best model was able to correct 60-95% of invalid generator outputs. Further analysis showed that these fixed molecules are comparable to the correct molecules from the de novo generators based on novelty and similarity. Additionally, the SMILES corrector can be used to expand the amount of interesting new molecules within the targeted chemical space. Introducing different errors into existing molecules yields novel analogs with a uniqueness of 39% and a novelty of approximately 20%. 
The results of this research demonstrate that SMILES correction is a viable post hoc extension and can enhance the search for better drug candidates.
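The uniqueness and novelty figures quoted above can be illustrated with a minimal sketch. The definitions used here are a common convention and an assumption on our part, not necessarily the ones used in the paper: uniqueness is the fraction of generated strings that are distinct, and novelty is the fraction of distinct generated strings absent from the training set. The function names and toy data are hypothetical, and the strings are assumed to be canonical SMILES already (real pipelines canonicalize with a cheminformatics toolkit first).

```python
def uniqueness(generated):
    """Fraction of generated strings that are distinct (assumed definition)."""
    if not generated:
        return 0.0
    return len(set(generated)) / len(generated)


def novelty(generated, training):
    """Fraction of distinct generated strings absent from the training set (assumed definition)."""
    unique = set(generated)
    if not unique:
        return 0.0
    return len(unique - set(training)) / len(unique)


# Toy data for illustration only; strings are assumed to be canonical SMILES.
training_set = ["CCO", "c1ccccc1", "CC(=O)O"]
generated_set = ["CCO", "CCN", "CCN", "CCCl", "c1ccccc1"]

print(f"uniqueness = {uniqueness(generated_set):.2f}")
print(f"novelty    = {novelty(generated_set, training_set):.2f}")
```

With this toy data, four of the five generated strings are distinct, and two of those four do not occur in the training set.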
Affiliation(s)
- Linde Schoenmaker
- Computational Drug Discovery, Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden, The Netherlands (GRID: grid.5132.5; ISNI: 0000 0001 2312 1970)
| | - Olivier J. M. Béquignon
- Computational Drug Discovery, Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden, The Netherlands (GRID: grid.5132.5; ISNI: 0000 0001 2312 1970)
| | - Willem Jespers
- Computational Drug Discovery, Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden, The Netherlands (GRID: grid.5132.5; ISNI: 0000 0001 2312 1970)
| | - Gerard J. P. van Westen
- Computational Drug Discovery, Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden, The Netherlands (GRID: grid.5132.5; ISNI: 0000 0001 2312 1970)
| |
|
49
|
Zhang H, Saravanan KM, Wei Y, Jiao Y, Yang Y, Pan Y, Wu X, Zhang JZH. Deep Learning-Based Bioactive Therapeutic Peptide Generation and Screening. J Chem Inf Model 2023; 63:835-845. [PMID: 36724090 DOI: 10.1021/acs.jcim.2c01485] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Indexed: 02/02/2023]
Abstract
Many bioactive peptides have demonstrated therapeutic effects against complex diseases, including antiviral, antibacterial, and anticancer activity. It is possible to generate a large number of potentially bioactive peptides using deep learning, in a manner analogous to the generation of de novo chemical compounds, using known bioactive peptides as a training set. Such generative techniques would be significant for drug development since peptides are much easier and cheaper to synthesize than compounds. Given the limited availability of deep learning-based peptide-generating models, we have built an LSTM model (called LSTM_Pep) to generate de novo peptides and fine-tuned the model to generate de novo peptides with specific prospective therapeutic benefits. Remarkably, the Antimicrobial Peptide Database has been effectively utilized to generate various kinds of potentially active de novo peptides. We proposed a pipeline for screening the generated peptides against a given target and used the main protease of SARS-CoV-2 as a proof of concept. Moreover, we have developed a deep learning-based protein-peptide prediction model (DeepPep) for rapid screening of the generated peptides against the given targets. Together with the generative model, we have demonstrated that iterating fine-tuning, generation, and screening can yield peptides with higher predicted binding affinity. Our work sheds light on developing deep learning-based methods and pipelines to effectively generate and obtain bioactive peptides with a specific therapeutic effect and showcases how artificial intelligence can help discover de novo bioactive peptides that can bind to a particular target.
Affiliation(s)
- Haiping Zhang
- Shenzhen Institute of Synthetic Biology, Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, Guangdong, China
| | - Konda Mani Saravanan
- Department of Biotechnology, Bharath Institute of Higher Education and Research, Chennai 600073, Tamil Nadu, India
| | - Yanjie Wei
- Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, Guangdong, China
| | - Yang Jiao
- Faculty of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Yang Yang
- Shenzhen Key Laboratory of Pathogen and Immunity, National Clinical Research Center for infectious disease, State Key Discipline of Infectious Disease, Shenzhen Third People's Hospital, Second Hospital Affiliated to Southern University of Science and Technology, Shenzhen 518112, China
| | - Yi Pan
- Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, Guangdong, China; Faculty of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Xuli Wu
- School of Medicine, Shenzhen University, Shenzhen 518060, Guangdong, China
| | - John Z H Zhang
- Shenzhen Institute of Synthetic Biology, Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, Guangdong, China; East China Normal University, Shanghai 200062, China; NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
| |
|
50
|
Wang J, Mao J, Wang M, Le X, Wang Y. Explore drug-like space with deep generative models. Methods 2023; 210:52-59. [PMID: 36682423 DOI: 10.1016/j.ymeth.2023.01.004] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 12/08/2022] [Revised: 01/05/2023] [Accepted: 01/17/2023] [Indexed: 01/20/2023]
Abstract
The design and discovery of drugs involves identifying and designing novel molecules that have the desired properties and bind well to a given disease-relevant target. One of the main challenges in effectively identifying potential drug candidates is exploring the vast drug-like chemical space to find novel chemical structures with desired physicochemical properties and biological characteristics. Moreover, the chemical space of currently available molecular libraries is only a small fraction of the total possible drug-like chemical space. Deep molecular generative models have received much attention and provide an alternative approach to the design and discovery of molecules. To efficiently explore the drug-like space, we first constructed a drug-like dataset and then performed the generative design of drug-like molecules using a Conditional Randomized Transformer approach with the Molecular ACCess System (MACCS) fingerprint as a condition, and compared it with previously published molecular generative models. The results show that the deep molecular generative model explores a wider drug-like chemical space. The generated drug-like molecules share chemical space with known drugs, and the drug-like space captured by the combination of quantitative estimate of drug-likeness (QED) and quantitative estimate of protein-protein interaction targeting drug-likeness (QEPPI) can cover a larger drug-like space. Finally, we show the potential application of the model in the design of inhibitors of the MDM2-p53 protein-protein interaction. Our results demonstrate the potential application of deep molecular generative models for guided exploration of drug-like chemical space and molecular design.
Affiliation(s)
- Jianmin Wang
- The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon 21983, Korea
| | - Jiashun Mao
- The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon 21983, Korea
| | - Meng Wang
- Department of Biostatistics, School of Public Health, Harbin Medical University
| | - Xiangyang Le
- Department of Medicinal Chemistry, Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, China
| | - Yunyun Wang
- School of Pharmacy and Jiangsu Province Key Laboratory for Inflammation and Molecular Drug Target, Nantong University, Nantong 226001, China
| |
|