1
|
Gangwal A, Lavecchia A. Artificial Intelligence in Natural Product Drug Discovery: Current Applications and Future Perspectives. J Med Chem 2025; 68:3948-3969. [PMID: 39916476 PMCID: PMC11874025 DOI: 10.1021/acs.jmedchem.4c01257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2024] [Revised: 12/01/2024] [Accepted: 01/28/2025] [Indexed: 02/28/2025]
Abstract
Drug discovery, a multifaceted process from compound identification to regulatory approval, historically plagued by inefficiencies and time lags due to limited data utilization, now faces urgent demands for accelerated lead compound identification. Innovations in biological data and computational chemistry have spurred a shift from trial-and-error methods to holistic approaches to medicinal chemistry. Computational techniques, particularly artificial intelligence (AI), notably machine learning (ML) and deep learning (DL), have revolutionized drug development, enhancing data analysis and predictive modeling. Natural products (NPs) have long served as rich sources of biologically active compounds, with many successful drugs originating from them. Advances in information science expanded NP-related databases, enabling deeper exploration with AI. Integrating AI into NP drug discovery promises accelerated discoveries, leveraging AI's analytical prowess, including generative AI for data synthesis. This perspective illuminates AI's current landscape in NP drug discovery, addressing strengths, limitations, and future trajectories to advance this vital research domain.
Collapse
Affiliation(s)
- Amit Gangwal
- Department
of Natural Product Chemistry, Shri Vile
Parle Kelavani Mandal’s Institute of Pharmacy, Dhule, 424001 Maharashtra, India
| | - Antonio Lavecchia
- “Drug
Discovery” Laboratory, Department of Pharmacy, University of Naples Federico II, I-80131 Naples, Italy
| |
Collapse
|
2
|
Creanza TM, Alberga D, Patruno C, Mangiatordi GF, Ancona N. Transformer Decoder Learns from a Pretrained Protein Language Model to Generate Ligands with High Affinity. J Chem Inf Model 2025; 65:1258-1277. [PMID: 39871540 DOI: 10.1021/acs.jcim.4c02019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/29/2025]
Abstract
The drug discovery process can be significantly accelerated by using deep learning methods to suggest molecules with druglike features and, more importantly, that are good candidates to bind specific proteins of interest. We present a novel deep learning generative model, Prot2Drug, that learns to generate ligands binding specific targets leveraging (i) the information carried by a pretrained protein language model and (ii) the ability of transformers to capitalize the knowledge gathered from thousands of protein-ligand interactions. The embedding unveils the receipt to follow for designing molecules binding a given protein, and Prot2Drug translates such instructions by using the syntax of the molecular language generating novel compounds which are predicted to have favorable physicochemical properties and high affinity toward specific targets. Moreover, Prot2Drug reproduced numerous known interactions between compounds and proteins used for generating them and suggested novel protein targets for known compounds, indicating potential drug repurposing strategies. Remarkably, Prot2Drug facilitates the design of promising ligands even for protein targets with limited or no information about their ligands or 3D structure.
Collapse
Affiliation(s)
- Teresa Maria Creanza
- Institute of Intelligent Industrial Technologies and Systems for Advanced Manufacturing, Consiglio Nazionale delle Ricerche, Via G. Amendola, 122/d, Bari 70126, Italy
| | - Domenico Alberga
- Institute of Crystallography, Consiglio Nazionale delle Ricerche, Via G. Amendola, 122/d, Bari 70126, Italy
| | - Cosimo Patruno
- Institute of Intelligent Industrial Technologies and Systems for Advanced Manufacturing, Consiglio Nazionale delle Ricerche, Via G. Amendola, 122/d, Bari 70126, Italy
| | | | - Nicola Ancona
- Institute of Intelligent Industrial Technologies and Systems for Advanced Manufacturing, Consiglio Nazionale delle Ricerche, Via G. Amendola, 122/d, Bari 70126, Italy
| |
Collapse
|
3
|
Ramos MC, Collison CJ, White AD. A review of large language models and autonomous agents in chemistry. Chem Sci 2025; 16:2514-2572. [PMID: 39829984 PMCID: PMC11739813 DOI: 10.1039/d4sc03921a] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2024] [Accepted: 12/03/2024] [Indexed: 01/22/2025] Open
Abstract
Large language models (LLMs) have emerged as powerful tools in chemistry, significantly impacting molecule design, property prediction, and synthesis optimization. This review highlights LLM capabilities in these domains and their potential to accelerate scientific discovery through automation. We also review LLM-based autonomous agents: LLMs with a broader set of tools to interact with their surrounding environment. These agents perform diverse tasks such as paper scraping, interfacing with automated laboratories, and synthesis planning. As agents are an emerging topic, we extend the scope of our review of agents beyond chemistry and discuss across any scientific domains. This review covers the recent history, current capabilities, and design of LLMs and autonomous agents, addressing specific challenges, opportunities, and future directions in chemistry. Key challenges include data quality and integration, model interpretability, and the need for standard benchmarks, while future directions point towards more sophisticated multi-modal agents and enhanced collaboration between agents and experimental methods. Due to the quick pace of this field, a repository has been built to keep track of the latest studies: https://github.com/ur-whitelab/LLMs-in-science.
Collapse
Affiliation(s)
- Mayk Caldas Ramos
- FutureHouse Inc. San Francisco CA USA
- Department of Chemical Engineering, University of Rochester Rochester NY USA
| | - Christopher J Collison
- School of Chemistry and Materials Science, Rochester Institute of Technology Rochester NY USA
| | - Andrew D White
- FutureHouse Inc. San Francisco CA USA
- Department of Chemical Engineering, University of Rochester Rochester NY USA
| |
Collapse
|
4
|
Tang X, Zhao Q, Wang J, Duan G. MolFCL: predicting molecular properties through chemistry-guided contrastive and prompt learning. BIOINFORMATICS (OXFORD, ENGLAND) 2025; 41:btaf061. [PMID: 39921889 DOI: 10.1093/bioinformatics/btaf061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/10/2024] [Revised: 01/11/2025] [Accepted: 02/05/2025] [Indexed: 02/10/2025]
Abstract
MOTIVATION Accurately identifying and predicting molecular properties is a crucial task in molecular machine learning, and the key lies in how to extract effective molecular representations. Contrastive learning opens new avenues for representation learning, and a large amount of unlabeled data enables the model to generalize to the huge chemical space. However, existing contrastive learning-based models face two challenges: (i) existing methods destroy the original molecular environment and ignore chemical prior information, and (ii) there is a lack of a prior knowledge to guide the prediction of molecular properties. RESULTS In this work, we propose a molecular property prediction framework called MolFCL, which consists of fragment-based contrastive learning and functional group-based prompt learning. Specifically, we introduced fragment-fragment interactions for the first time in the contrastive learning framework and designed a fragment-based augmented molecular graph that integrates the original chemical environment and fragment reactions. Furthermore, we proposed a novel functional group-based prompt learning during fine-tuning, which first incorporates functional group knowledge and the corresponding atomic signals, to improve molecular representation and provide interpretable analyses. The results show that MolFCL outperforms state-of-the-art baseline models on 23 molecular property prediction datasets. Moreover, visualizations show that MolFCL can learn to embed molecules into representations that can distinguish chemical properties. MolFCL can give higher weight to functional groups consistent with chemical knowledge during the prediction of molecular properties, which offers an interpretable ability of the model. Overall, MolFCL is a practically useful tool for molecular property prediction and assists drug scientists in designing drugs more effectively. AVAILABILITY AND IMPLEMENTATION MolFCL is available at https://github.com/tangxiangcsu/MolFCLSupplementary.
Collapse
Affiliation(s)
- Xiang Tang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Qichang Zhao
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Guihua Duan
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| |
Collapse
|
5
|
Mastrolorito F, Ciriaco F, Togo MV, Gambacorta N, Trisciuzzi D, Altomare CD, Amoroso N, Grisoni F, Nicolotti O. fragSMILES as a chemical string notation for advanced fragment and chirality representation. Commun Chem 2025; 8:26. [PMID: 39880917 PMCID: PMC11779804 DOI: 10.1038/s42004-025-01423-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2024] [Accepted: 01/21/2025] [Indexed: 01/31/2025] Open
Abstract
Generative models have revolutionized de novo drug design, allowing to produce molecules on-demand with desired physicochemical and pharmacological properties. String based molecular representations, such as SMILES (Simplified Molecular Input Line Entry System) and SELFIES (Self-Referencing Embedded Strings), have played a pivotal role in the success of generative approaches, thanks to their capacity to encode atom- and bond- information and ease-of-generation. However, such 'atom-level' string representations could have certain limitations, in terms of capturing information on chirality, and synthetic accessibility of the corresponding designs.In this paper, we present fragSMILES, a novel fragment-based molecular representation in the form of string. fragSMILES encode fragments in a 'chemically-meaningful' way via a novel graph-reduction approach, allowing to obtain an efficient, interpretable, and expressive molecular representation, which also avoids fragment redundancy. fragSMILES contributes to the field of fragment-based representation, by reporting fragments and their 'breaking' bonds independently. Moreover, fragSMILES also embeds information of molecular chirality, thereby overcoming known limitations of existing string notations. When compared with SMILES, SELFIES and t-SMILES for de novo design, the fragSMILES notation showed its promise in generating molecules with desirable biochemical and scaffolds properties.
Collapse
Affiliation(s)
- Fabrizio Mastrolorito
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, Bari, Italy
| | - Fulvio Ciriaco
- Dipartimento di Chimica, Università degli Studi di Bari Aldo Moro, Bari, Italy
| | - Maria Vittoria Togo
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, Bari, Italy
| | - Nicola Gambacorta
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, Bari, Italy
| | - Daniela Trisciuzzi
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, Bari, Italy
| | - Cosimo Damiano Altomare
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, Bari, Italy
| | - Nicola Amoroso
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, Bari, Italy
| | - Francesca Grisoni
- Institute for Complex Molecular Systems and Eindhoven Artificial Intelligence Systems Institute, Department of Biomedical Engineering, Eindhoven University of Technology, AZ, Eindhoven, Netherlands.
- Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Utrecht, The Netherlands.
| | - Orazio Nicolotti
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, Bari, Italy.
| |
Collapse
|
6
|
Liu T, Hwang L, Burley S, Nitsche C, Southan C, Walters W, Gilson M. BindingDB in 2024: a FAIR knowledgebase of protein-small molecule binding data. Nucleic Acids Res 2025; 53:D1633-D1644. [PMID: 39574417 PMCID: PMC11701568 DOI: 10.1093/nar/gkae1075] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2024] [Revised: 10/16/2024] [Accepted: 10/23/2024] [Indexed: 01/18/2025] Open
Abstract
BindingDB (bindingdb.org) is a public, web-accessible database of experimentally measured binding affinities between small molecules and proteins, which supports diverse applications including medicinal chemistry, biochemical pathway annotation, training of artificial intelligence models and computational chemistry methods development. This update reports significant growth and enhancements since our last review in 2016. Of note, the database now contains 2.9 million binding measurements spanning 1.3 million compounds and thousands of protein targets. This growth is largely attributable to our unique focus on curating data from US patents, which has yielded a substantial influx of novel binding data. Recent improvements include a remake of the website following responsive web design principles, enhanced search and filtering capabilities, new data download options and webservices and establishment of a long-term data archive replicated across dispersed sites. We also discuss BindingDB's positioning relative to related resources, its open data sharing policies, insights gleaned from the dataset and plans for future growth and development.
Collapse
Affiliation(s)
- Tiqing Liu
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA
| | - Linda Hwang
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA
| | - Stephen K Burley
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers. The State University of New Jersey, Piscataway, NJ 08854, USA; Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Rutgers Cancer Institute, Robert Wood Johnson Medical School, New Brunswick, NJ 08903, USA; Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA; Rutgers Artificial Intelligence and Data Science (RAD) Collaboratory, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Carmen I Nitsche
- Cambridge Crystallographic Data Centre, Inc., Boston, MA 02108, USA
| | - Christopher Southan
- Deanery of Biomedical Sciences, University of Edinburgh, Edinburgh, EH8 9XD, UK
| | | | - Michael K Gilson
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, CA 92093, USA
| |
Collapse
|
7
|
Wang J, Luo H, Qin R, Wang M, Wan X, Fang M, Zhang O, Gou Q, Su Q, Shen C, You Z, Liu L, Hsieh CY, Hou T, Kang Y. 3DSMILES-GPT: 3D molecular pocket-based generation with token-only large language model. Chem Sci 2025; 16:637-648. [PMID: 39664804 PMCID: PMC11629531 DOI: 10.1039/d4sc06864e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2024] [Accepted: 12/03/2024] [Indexed: 12/13/2024] Open
Abstract
The generation of three-dimensional (3D) molecules based on target structures represents a cutting-edge challenge in drug discovery. Many existing approaches often produce molecules with invalid configurations, unphysical conformations, suboptimal drug-like qualities, limited synthesizability, and require extensive generation times. To address these challenges, we present 3DSMILES-GPT, a fully language-model-driven framework for 3D molecular generation that utilizes tokens exclusively. We treat both two-dimensional (2D) and 3D molecular representations as linguistic expressions, combining them through full-dimensional representations and pre-training the model on a vast dataset encompassing tens of millions of drug-like molecules. This token-only approach enables the model to comprehensively understand the 2D and 3D characteristics of large-scale molecules. Subsequently, we fine-tune the model using pair-wise structural data of protein pockets and molecules, followed by reinforcement learning to further optimize the biophysical and chemical properties of the generated molecules. Experimental results demonstrate that 3DSMILES-GPT generates molecules that comprehensively outperform existing methods in terms of binding affinity, drug-likeness (QED), and synthetic accessibility score (SAS). Notably, it achieves a 33% enhancement in the quantitative estimation of QED, meanwhile the binding affinity estimated by Vina docking maintaining its state-of-the-art performance. The generation speed is remarkably fast, with the average time approximately 0.45 seconds per generation, representing a threefold increase over the fastest existing methods. This innovative 3DSMILES-GPT approach has the potential to positively impact the generation of 3D molecules in drug discovery.
Collapse
Affiliation(s)
- Jike Wang
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Hao Luo
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Rui Qin
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Mingyang Wang
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Xiaozhe Wan
- Advanced Computing and Storage Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd Nanjing 210000 Jiangsu China
| | - Meijing Fang
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Odin Zhang
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Qiaolin Gou
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Qun Su
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Chao Shen
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Ziyi You
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Liwei Liu
- Advanced Computing and Storage Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd Nanjing 210000 Jiangsu China
| | - Chang-Yu Hsieh
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Yu Kang
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| |
Collapse
|
8
|
Lu Z, Han J, Ji Y, Li B, Zhang A. Computational design of CDK1 inhibitors with enhanced target affinity and drug-likeness using deep-learning framework. Heliyon 2024; 10:e40345. [PMID: 39748968 PMCID: PMC11693894 DOI: 10.1016/j.heliyon.2024.e40345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2024] [Revised: 09/20/2024] [Accepted: 11/11/2024] [Indexed: 01/04/2025] Open
Abstract
Cyclin Dependent Kinase 1 (CDK1) plays a crucial role in cell cycle regulation, and dysregulation of its activity has been implicated in various cancers. Although several CDK1 inhibitors are currently in clinical trials, none have yet been approved for therapeutic use. This research utilized deep learning techniques, specifically Recurrent Neural Networks with Long Short-Term Memory (LSTM), to generate potential CDK1 inhibitors. Molecular docking, evaluation of molecular properties, and molecular dynamics simulations were conducted to identify the most promising candidates. The results showed that the generated ligands exhibited substantial improvements in target affinity and drug-likeness. Molecular docking results showed that the generated ligands had an average binding affinity of -10.65 ± 0.877 kcal/mol towards CDK1. The Quantitative Estimate of Drug-likeness (QED) values for the generated ligands averaged 0.733 ± 0.10, significantly higher than the 0.547 ± 0.15 observed for known CDK1 inhibitors (p < 0.001). Molecular dynamics simulations further confirmed the stability and favorable interactions of the selected ligands with the CDK1 complex. The identification of novel CDK1 inhibitors with improved binding affinities and drug-likeness properties could potentially fill the gap in the ongoing development of CDK inhibitors. However, it is imperative to note that extensive experimental validation is required prior to advancing these generated ligands to subsequent stages of drug development.
Collapse
Affiliation(s)
- Zuokun Lu
- Food and Pharmacy College, Xuchang University, Xuchang, 461000, Henan, China
- Key Laboratory of Biomarker-Based Rapid Detection Technology for Food Safety of Henan Province, Xuchang University, Xuchang, 461000, Henan, China
| | - Jiayuan Han
- Food and Pharmacy College, Xuchang University, Xuchang, 461000, Henan, China
| | - Yibo Ji
- Food and Pharmacy College, Xuchang University, Xuchang, 461000, Henan, China
| | - Bingrui Li
- Food and Pharmacy College, Xuchang University, Xuchang, 461000, Henan, China
| | - Aili Zhang
- Food and Pharmacy College, Xuchang University, Xuchang, 461000, Henan, China
| |
Collapse
|
9
|
Romanelli V, Annunziata D, Cerchia C, Cerciello D, Piccialli F, Lavecchia A. Enhancing De Novo Drug Design across Multiple Therapeutic Targets with CVAE Generative Models. ACS OMEGA 2024; 9:43963-43976. [PMID: 39493989 PMCID: PMC11525747 DOI: 10.1021/acsomega.4c08027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/31/2024] [Revised: 09/25/2024] [Accepted: 09/30/2024] [Indexed: 11/05/2024]
Abstract
Drug discovery is a costly and time-consuming process, necessitating innovative strategies to enhance efficiency across different stages, from initial hit identification to final market approval. Recent advancement in deep learning (DL), particularly in de novo drug design, show promise. Generative models, a subclass of DL algorithms, have significantly accelerated the de novo drug design process by exploring vast areas of chemical space. Here, we introduce a Conditional Variational Autoencoder (CVAE) generative model tailored for de novo molecular design tasks, utilizing both SMILES and SELFIES as molecular representations. Our computational framework successfully generates molecules with specific property profiles validated though metrics such as uniqueness, validity, novelty, quantitative estimate of drug-likeness (QED), and synthetic accessibility (SA). We evaluated our model's efficacy in generating novel molecules capable of binding to three therapeutic molecular targets: CDK2, PPARγ, and DPP-IV. Comparing with state-of-the-art frameworks demonstrated our model's ability to achieve higher structural diversity while maintaining the molecular properties ranges observed in the training set molecules. This proposed model stands as a valuable resource for advancing de novo molecular design capabilities.
Collapse
Affiliation(s)
- Virgilio Romanelli
- Department
of Pharmacy, “Drug Discovery Laboratory”, University of Naples Federico II, Naples 80131, Italy
| | - Daniela Annunziata
- Department
of Mathematics and Applications “R. Caccioppoli”, University of Naples Federico II, Naples 80126, Italy
| | - Carmen Cerchia
- Department
of Pharmacy, “Drug Discovery Laboratory”, University of Naples Federico II, Naples 80131, Italy
| | - Donato Cerciello
- Department
of Mathematics and Applications “R. Caccioppoli”, University of Naples Federico II, Naples 80126, Italy
| | - Francesco Piccialli
- Department
of Mathematics and Applications “R. Caccioppoli”, University of Naples Federico II, Naples 80126, Italy
| | - Antonio Lavecchia
- Department
of Pharmacy, “Drug Discovery Laboratory”, University of Naples Federico II, Naples 80131, Italy
| |
Collapse
|
10
|
Kneiding H, Balcells D. Augmenting genetic algorithms with machine learning for inverse molecular design. Chem Sci 2024:d4sc02934h. [PMID: 39296997 PMCID: PMC11404003 DOI: 10.1039/d4sc02934h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2024] [Accepted: 09/09/2024] [Indexed: 09/21/2024] Open
Abstract
Evolutionary and machine learning methods have been successfully applied to the generation of molecules and materials exhibiting desired properties. The combination of these two paradigms in inverse design tasks can yield powerful methods that explore massive chemical spaces more efficiently, improving the quality of the generated compounds. However, such synergistic approaches are still an incipient area of research and appear underexplored in the literature. This perspective covers different ways of incorporating machine learning approaches into evolutionary learning frameworks, with the overall goal of increasing the optimization efficiency of genetic algorithms. In particular, machine learning surrogate models for faster fitness function evaluation, discriminator models to control population diversity on-the-fly, machine learning based crossover operations, and evolution in latent space are discussed. The further potential of these synergistic approaches in generative tasks is also assessed, outlining promising directions for future developments.
Collapse
Affiliation(s)
- Hannes Kneiding
- Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo P.O. Box 1033, Blindern 0315 Oslo Norway
| | - David Balcells
- Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo P.O. Box 1033, Blindern 0315 Oslo Norway
| |
Collapse
|
11
|
He J, Tibo A, Janet JP, Nittinger E, Tyrchan C, Czechtizky W, Engkvist O. Evaluation of reinforcement learning in transformer-based molecular design. J Cheminform 2024; 16:95. [PMID: 39118113 PMCID: PMC11312936 DOI: 10.1186/s13321-024-00887-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2024] [Accepted: 07/21/2024] [Indexed: 08/10/2024] Open
Abstract
Designing compounds with a range of desirable properties is a fundamental challenge in drug discovery. In pre-clinical early drug discovery, novel compounds are often designed based on an already existing promising starting compound through structural modifications for further property optimization. Recently, transformer-based deep learning models have been explored for the task of molecular optimization by training on pairs of similar molecules. This provides a starting point for generating similar molecules to a given input molecule, but has limited flexibility regarding user-defined property profiles. Here, we evaluate the effect of reinforcement learning on transformer-based molecular generative models. The generative model can be considered as a pre-trained model with knowledge of the chemical space close to an input compound, while reinforcement learning can be viewed as a tuning phase, steering the model towards chemical space with user-specific desirable properties. The evaluation of two distinct tasks-molecular optimization and scaffold discovery-suggest that reinforcement learning could guide the transformer-based generative model towards the generation of more compounds of interest. Additionally, the impact of pre-trained models, learning steps and learning rates are investigated.Scientific contributionOur study investigates the effect of reinforcement learning on a transformer-based generative model initially trained for generating molecules similar to starting molecules. The reinforcement learning framework is applied to facilitate multiparameter optimisation of starting molecules. This approach allows for more flexibility for optimizing user-specific property profiles and helps finding more ideas of interest.
Collapse
Affiliation(s)
- Jiazhen He
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden.
| | - Alessandro Tibo
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - Jon Paul Janet
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - Eva Nittinger
- Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
| | - Christian Tyrchan
- Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
| | - Werngard Czechtizky
- Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
| | - Ola Engkvist
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden
| |
Collapse
|
12
|
Hu J, Wu P, Wang S, Wang B, Yang G. A Human Feedback Strategy for Photoresponsive Molecules in Drug Delivery: Utilizing GPT-2 and Time-Dependent Density Functional Theory Calculations. Pharmaceutics 2024; 16:1014. [PMID: 39204359 PMCID: PMC11359544 DOI: 10.3390/pharmaceutics16081014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2024] [Revised: 07/11/2024] [Accepted: 07/19/2024] [Indexed: 09/04/2024] Open
Abstract
Photoresponsive drug delivery stands as a pivotal frontier in smart drug administration, leveraging the non-invasive, stable, and finely tunable nature of light-triggered methodologies. The generative pre-trained transformer (GPT) has been employed to generate molecular structures. In our study, we harnessed GPT-2 on the QM7b dataset to refine a UV-GPT model with adapters, enabling the generation of molecules responsive to UV light excitation. Utilizing the Coulomb matrix as a molecular descriptor, we predicted the excitation wavelengths of these molecules. Furthermore, we validated the excited state properties through quantum chemical simulations. Based on the results of these calculations, we summarized some tips for chemical structures and integrated them into the alignment of large-scale language models within the reinforcement learning from human feedback (RLHF) framework. The synergy of these findings underscores the successful application of GPT technology in this critical domain.
Collapse
Affiliation(s)
- Junjie Hu
- Faculty of Medicine, Imperial College London, London SW7 2AZ, UK
| | - Peng Wu
- School of Chemistry and Chemical Engineering, Ningxia University, Yinchuan 750014, China
| | - Shiyi Wang
- Bioengineering Department and Imperial-X, Imperial College London, London W12 7SL, UK
| | - Binju Wang
- College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Guang Yang
- Bioengineering Department and Imperial-X, Imperial College London, London W12 7SL, UK
- National Heart and Lung Institute, Imperial College London, London SW7 2AZ, UK
- Cardiovascular Research Centre, Royal Brompton Hospital, London SW3 6NP, UK
- School of Biomedical Engineering & Imaging Sciences, King's College London, London WC2R 2LS, UK
| |
Collapse
|
13
|
Atz K, Nippa DF, Müller AT, Jost V, Anelli A, Reutlinger M, Kramer C, Martin RE, Grether U, Schneider G, Wuitschik G. Geometric deep learning-guided Suzuki reaction conditions assessment for applications in medicinal chemistry. RSC Med Chem 2024; 15:2310-2321. [PMID: 39026644 PMCID: PMC11253849 DOI: 10.1039/d4md00196f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2024] [Accepted: 05/25/2024] [Indexed: 07/20/2024] Open
Abstract
Suzuki cross-coupling reactions are considered a valuable tool for constructing carbon-carbon bonds in small molecule drug discovery. However, the synthesis of chemical matter often represents a time-consuming and labour-intensive bottleneck. We demonstrate how machine learning methods trained on high-throughput experimentation (HTE) data can be leveraged to enable fast reaction condition selection for novel coupling partners. We show that the trained models support chemists in determining suitable catalyst-solvent-base combinations for individual transformations including an evaluation of the need for HTE screening. We introduce an algorithm for designing 96-well plates optimized towards reaction yields and discuss the model performance of zero- and few-shot machine learning. The best-performing machine learning model achieved a three-category classification accuracy of 76.3% (±0.2%) and an F 1-score for a binary classification of 79.1% (±0.9%). Validation on eight reactions revealed a receiver operating characteristic (ROC) curve (AUC) value of 0.82 (±0.07) for few-shot machine learning. On the other hand, zero-shot machine learning models achieved a mean ROC-AUC value of 0.63 (±0.16). This study positively advocates the application of few-shot machine learning-guided reaction condition selection for HTE campaigns in medicinal chemistry and highlights practical applications as well as challenges associated with zero-shot machine learning.
Collapse
Affiliation(s)
- Kenneth Atz
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
| | - David F Nippa
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
| | - Alex T Müller
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
| | - Vera Jost
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
| | - Andrea Anelli
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
| | - Michael Reutlinger
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
| | - Christian Kramer
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
| | - Rainer E Martin
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
| | - Uwe Grether
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
| | - Gisbert Schneider
- Department of Chemistry and Applied Biosciences, ETH Zurich Vladimir-Prelog-Weg 4 8093 Zurich Switzerland
| | - Georg Wuitschik
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
| |
Collapse
|
14
|
Kim J, Chang W, Ji H, Joung I. Quantum-Informed Molecular Representation Learning Enhancing ADMET Property Prediction. J Chem Inf Model 2024; 64:5028-5040. [PMID: 38916580 DOI: 10.1021/acs.jcim.4c00772] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/26/2024]
Abstract
We examined pretraining tasks leveraging abundant labeled data to effectively enhance molecular representation learning in downstream tasks, specifically emphasizing graph transformers to improve the prediction of ADMET properties. Our investigation revealed limitations in previous pretraining tasks and identified more meaningful training targets, ranging from 2D molecular descriptors to extensive quantum chemistry simulations. These data were seamlessly integrated into supervised pretraining tasks. The implementation of our pretraining strategy and multitask learning outperforms conventional methods, achieving state-of-the-art outcomes in 7 out of 22 ADMET tasks within the Therapeutics Data Commons by utilizing a shared encoder across all tasks. Our approach underscores the effectiveness of learning molecular representations and highlights the potential for scalability when leveraging extensive data sets, marking a significant advancement in this domain.
Collapse
Affiliation(s)
- Jungwoo Kim
- Standigm Inc., 182 Dogok-ro, 6F, Gangnam-gu, Seoul 06261, Korea
| | - Woojae Chang
- Standigm Inc., 182 Dogok-ro, 6F, Gangnam-gu, Seoul 06261, Korea
| | - Hyunjun Ji
- Standigm Inc., 182 Dogok-ro, 6F, Gangnam-gu, Seoul 06261, Korea
| | - InSuk Joung
- Standigm Inc., 182 Dogok-ro, 6F, Gangnam-gu, Seoul 06261, Korea
| |
Collapse
|
15
|
An Y, Lim J, Glavatskikh M, Wang X, Norris-Drouin J, Hardy PB, Leisner TM, Pearce KH, Kireev D. In silico fragment-based discovery of CIB1-directed anti-tumor agents by FRASE-bot. Nat Commun 2024; 15:5564. [PMID: 38956119 PMCID: PMC11219766 DOI: 10.1038/s41467-024-49892-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2023] [Accepted: 06/19/2024] [Indexed: 07/04/2024] Open
Abstract
Chemical probes are an indispensable tool for translating biological discoveries into new therapies, though are increasingly difficult to identify since novel therapeutic targets are often hard-to-drug proteins. We introduce FRASE-based hit-finding robot (FRASE-bot), to expedite drug discovery for unconventional therapeutic targets. FRASE-bot mines available 3D structures of ligand-protein complexes to create a database of FRAgments in Structural Environments (FRASE). The FRASE database can be screened to identify structural environments similar to those in the target protein and seed the target structure with relevant ligand fragments. A neural network model is used to retain fragments with the highest likelihood of being native binders. The seeded fragments then inform ultra-large-scale virtual screening of commercially available compounds. We apply FRASE-bot to identify ligands for Calcium and Integrin Binding protein 1 (CIB1), a promising drug target implicated in triple negative breast cancer. FRASE-based virtual screening identifies a small-molecule CIB1 ligand (with binding confirmed in a TR-FRET assay) showing specific cell-killing activity in CIB1-dependent cancer cells, but not in CIB1-depletion-insensitive cells.
Collapse
Affiliation(s)
- Yi An
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
| | - Jiwoong Lim
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
| | - Marta Glavatskikh
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
| | - Xiaowen Wang
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
- Chemistry department, University of Missouri, Columbia, Columbia, MO, 65211, USA
| | - Jacqueline Norris-Drouin
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
| | - P Brian Hardy
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
| | - Tina M Leisner
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
| | - Kenneth H Pearce
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA.
| | - Dmitri Kireev
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA.
- Chemistry department, University of Missouri, Columbia, Columbia, MO, 65211, USA.
| |
Collapse
|
16
|
Yoo S, Kim J. Adapt-cMolGPT: A Conditional Generative Pre-Trained Transformer with Adapter-Based Fine-Tuning for Target-Specific Molecular Generation. Int J Mol Sci 2024; 25:6641. [PMID: 38928346 PMCID: PMC11203498 DOI: 10.3390/ijms25126641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2024] [Revised: 06/09/2024] [Accepted: 06/14/2024] [Indexed: 06/28/2024] Open
Abstract
Small-molecule drug design aims to generate compounds that target specific proteins, playing a crucial role in the early stages of drug discovery. Recently, research has emerged that utilizes the GPT model, which has achieved significant success in various fields to generate molecular compounds. However, due to the persistent challenge of small datasets in the pharmaceutical field, there has been some degradation in the performance of generating target-specific compounds. To address this issue, we propose an enhanced target-specific drug generation model, Adapt-cMolGPT, which modifies molecular representation and optimizes the fine-tuning process. In particular, we introduce a new fine-tuning method that incorporates an adapter module into a pre-trained base model and alternates weight updates by sections. We evaluated the proposed model through multiple experiments and demonstrated performance improvements compared to previous models. In the experimental results, Adapt-cMolGPT generated a greater number of novel and valid compounds compared to other models, with these generated compounds exhibiting properties similar to those of real molecular data. These results indicate that our proposed method is highly effective in designing drugs targeting specific proteins.
Collapse
Affiliation(s)
- Soyoung Yoo
- Department of Artificial Intelligence, Sejong University, Seoul 05006, Republic of Korea;
| | - Junghyun Kim
- Department of Artificial Intelligence, Sejong University, Seoul 05006, Republic of Korea;
- Deep Learning Architecture Research Center, Sejong University, Seoul 05006, Republic of Korea
| |
Collapse
|
17
|
Das M, Ghosh A, Sunoj RB. Advances in machine learning with chemical language models in molecular property and reaction outcome predictions. J Comput Chem 2024; 45:1160-1176. [PMID: 38299229 DOI: 10.1002/jcc.27315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2023] [Revised: 01/06/2024] [Accepted: 01/09/2024] [Indexed: 02/02/2024]
Abstract
Molecular properties and reactions form the foundation of chemical space. Over the years, innumerable molecules have been synthesized, a smaller fraction of them found immediate applications, while a larger proportion served as a testimony to creative and empirical nature of the domain of chemical science. With increasing emphasis on sustainable practices, it is desirable that a target set of molecules are synthesized preferably through a fewer empirical attempts instead of a larger library, to realize an active candidate. In this front, predictive endeavors using machine learning (ML) models built on available data acquire high timely significance. Prediction of molecular property and reaction outcome remain one of the burgeoning applications of ML in chemical science. Among several methods of encoding molecular samples for ML models, the ones that employ language like representations are gaining steady popularity. Such representations would additionally help adopt well-developed natural language processing (NLP) models for chemical applications. Given this advantageous background, herein we describe several successful chemical applications of NLP focusing on molecular property and reaction outcome predictions. From relatively simpler recurrent neural networks (RNNs) to complex models like transformers, different network architecture have been leveraged for tasks such as de novo drug design, catalyst generation, forward and retro-synthesis predictions. The chemical language model (CLM) provides promising avenues toward a broad range of applications in a time and cost-effective manner. While we showcase an optimistic outlook of CLMs, attention is also placed on the persisting challenges in reaction domain, which would optimistically be addressed by advanced algorithms tailored to chemical language and with increased availability of high-quality datasets.
Collapse
Affiliation(s)
- Manajit Das
- Department of Chemistry, Indian Institute of Technology Bombay, Mumbai, India
| | - Ankit Ghosh
- Department of Chemistry, Indian Institute of Technology Bombay, Mumbai, India
| | - Raghavan B Sunoj
- Department of Chemistry, Indian Institute of Technology Bombay, Mumbai, India
- Centre for Machine Intelligence and Data Science, Indian Institute of Technology Bombay, Mumbai, India
| |
Collapse
|
18
|
Wang S, Liang D, Wang J, Dong K, Zhang Y, Liang H, Xu X, Song T. FraHMT: A Fragment-Oriented Heterogeneous Graph Molecular Generation Model for Target Proteins. J Chem Inf Model 2024; 64:3718-3732. [PMID: 38644797 DOI: 10.1021/acs.jcim.4c00252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/23/2024]
Abstract
The molecular generation task stands as a pivotal step in the domains of computational chemistry and drug discovery, aiming to computationally generate molecular structures for specific properties. In contrast to previous models that focused primarily on SMILES strings or molecular graphs, our model placed a special emphasis on the substructure information on molecules, enabling the model to learn richer chemical rules and structure features from fragments and chemical reaction information on molecules. To accomplish this, we fragmented the molecules to construct heterogeneous graph representations based on atom and fragment information. Then our model mapped the heterogeneous graph data into a latent vector space by using an encoder and employed a self-regressive generative model as a decoder for molecular generation. Additionally, we performed transfer learning on the model using a small set of ligand molecules known to be active against the target protein to generate molecules that bind better to the target protein. Experimental results demonstrate that our model is highly competitive with state-of-the-art models. It can generate valid and diverse molecules with favorable physicochemical properties and drug-likeness. Importantly, they produce novel molecules with high docking scores against the target proteins.
Collapse
Affiliation(s)
- Shuang Wang
- College of Computer Science and Technology, China University of Petroleum, QingDao 266580, China
| | - Dingming Liang
- College of Computer Science and Technology, China University of Petroleum, QingDao 266580, China
| | - Jianmin Wang
- College of Computer Science and Technology, China University of Petroleum, QingDao 266580, China
- The Interdisciplinary Graduate Program in Integrative Biotechnology, Yonsei University, Incheon 21983, Republic of Korea
| | - Kaiyu Dong
- College of Computer Science and Technology, China University of Petroleum, QingDao 266580, China
| | - Yunjing Zhang
- College of Computer Science and Technology, China University of Petroleum, QingDao 266580, China
| | - Huicong Liang
- Marine Biomedical Institute of Qingdao, School of Medicine and Pharmacy, Ocean University of China, QingDao 266580, China
| | - Ximing Xu
- Marine Biomedical Institute of Qingdao, School of Medicine and Pharmacy, Ocean University of China, QingDao 266580, China
| | - Tao Song
- College of Computer Science and Technology, China University of Petroleum, QingDao 266580, China
- Department of Artificial Intelligence, Faculty of Computer Science, Polytechnical University of Madrid, Madrid 28031, Spain
| |
Collapse
|
19
|
Xia W, Xiao J, Bian H, Zhang J, Zhang JZH, Zhang H. Deep Learning-Based construction of a Drug-Like compound database and its application in virtual screening of HsDHODH inhibitors. Methods 2024; 225:44-51. [PMID: 38518843 DOI: 10.1016/j.ymeth.2024.03.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/08/2023] [Revised: 01/24/2024] [Accepted: 03/13/2024] [Indexed: 03/24/2024] Open
Abstract
The process of virtual screening relies heavily on the databases, but it is disadvantageous to conduct virtual screening based on commercial databases with patent-protected compounds, high compound toxicity and side effects. Therefore, this paper utilizes generative recurrent neural networks (RNN) containing long short-term memory (LSTM) cells to learn the properties of drug compounds in the DrugBank, aiming to obtain a new and virtual screening compounds database with drug-like properties. Ultimately, a compounds database consisting of 26,316 compounds is obtained by this method. To evaluate the potential of this compounds database, a series of tests are performed, including chemical space, ADME properties, compound fragmentation, and synthesizability analysis. As a result, it is proved that the database is equipped with good drug-like properties and a relatively new backbone, its potential in virtual screening is further tested. Finally, a series of seedling compounds with completely new backbones are obtained through docking and binding free energy calculations.
Collapse
Affiliation(s)
- Wei Xia
- Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Jin Xiao
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, Shanghai Key Laboratory of Green Chemistry & Chemical Process, School of Chemistry and Molecular Engineering, East China Normal University at Shanghai, 200062, China
| | - Hengwei Bian
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, Shanghai Key Laboratory of Green Chemistry & Chemical Process, School of Chemistry and Molecular Engineering, East China Normal University at Shanghai, 200062, China.
| | - Jiajun Zhang
- Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - John Z H Zhang
- Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, Shanghai Key Laboratory of Green Chemistry & Chemical Process, School of Chemistry and Molecular Engineering, East China Normal University at Shanghai, 200062, China; NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China; Department of Chemistry, New York University, NY, NY10003, USA; Collaborative Innovation Center of Extreme Optics, Shanxi University, Taiyuan, Shanxi, 030006, China.
| | - Haiping Zhang
- Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China.
| |
Collapse
|
20
|
Vogt M. Chemoinformatic approaches for navigating large chemical spaces. Expert Opin Drug Discov 2024; 19:403-414. [PMID: 38300511 DOI: 10.1080/17460441.2024.2313475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2023] [Accepted: 01/30/2024] [Indexed: 02/02/2024]
Abstract
INTRODUCTION Large chemical spaces (CSs) include traditional large compound collections, combinatorial libraries covering billions to trillions of molecules, DNA-encoded chemical libraries comprising complete combinatorial CSs in a single mixture, and virtual CSs explored by generative models. The diverse nature of these types of CSs require different chemoinformatic approaches for navigation. AREAS COVERED An overview of different types of large CSs is provided. Molecular representations and similarity metrics suitable for large CS exploration are discussed. A summary of navigation of CSs in generative models is provided. Methods for characterizing and comparing CSs are discussed. EXPERT OPINION The size of large CSs might restrict navigation to specialized algorithms and limit it to considering neighborhoods of structurally similar molecules. Efficient navigation of large CSs not only requires methods that scale with size but also requires smart approaches that focus on better but not necessarily larger molecule selections. Deep generative models aim to provide such approaches by implicitly learning features relevant for targeted biological properties. It is unclear whether these models can fulfill this ideal as validation is difficult as long as the covered CSs remain mainly virtual without experimental verification.
Collapse
Affiliation(s)
- Martin Vogt
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
| |
Collapse
|
21
|
Dodds M, Guo J, Löhr T, Tibo A, Engkvist O, Janet JP. Sample efficient reinforcement learning with active learning for molecular design. Chem Sci 2024; 15:4146-4160. [PMID: 38487235 PMCID: PMC10935729 DOI: 10.1039/d3sc04653b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2023] [Accepted: 02/07/2024] [Indexed: 03/17/2024] Open
Abstract
Reinforcement learning (RL) is a powerful and flexible paradigm for searching for solutions in high-dimensional action spaces. However, bridging the gap between playing computer games with thousands of simulated episodes and solving real scientific problems with complex and involved environments (up to actual laboratory experiments) requires improvements in terms of sample efficiency to make the most of expensive information. The discovery of new drugs is a major commercial application of RL, motivated by the very large nature of the chemical space and the need to perform multiparameter optimization (MPO) across different properties. In silico methods, such as virtual library screening (VS) and de novo molecular generation with RL, show great promise in accelerating this search. However, incorporation of increasingly complex computational models in these workflows requires increasing sample efficiency. Here, we introduce an active learning system linked with an RL model (RL-AL) for molecular design, which aims to improve the sample-efficiency of the optimization process. We identity and characterize unique challenges combining RL and AL, investigate the interplay between the systems, and develop a novel AL approach to solve the MPO problem. Our approach greatly expedites the search for novel solutions relative to baseline-RL for simple ligand- and structure-based oracle functions, with a 5-66-fold increase in hits generated for a fixed oracle budget and a 4-64-fold reduction in computational time to find a specific number of hits. Furthermore, compounds discovered through RL-AL display substantial enrichment of a multi-parameter scoring objective, indicating superior efficacy in curating high-scoring compounds, without a reduction in output diversity. This significant acceleration improves the feasibility of oracle functions that have largely been overlooked in RL due to high computational costs, for example free energy perturbation methods, and in principle is applicable to any RL domain.
Collapse
Affiliation(s)
- Michael Dodds
- Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
| | - Jeff Guo
- Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
| | - Thomas Löhr
- Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
| | - Alessandro Tibo
- Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
| | - Ola Engkvist
- Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
| | - Jon Paul Janet
- Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
| |
Collapse
|
22
|
Loeffler HH, He J, Tibo A, Janet JP, Voronov A, Mervin LH, Engkvist O. Reinvent 4: Modern AI-driven generative molecule design. J Cheminform 2024; 16:20. [PMID: 38383444 PMCID: PMC10882833 DOI: 10.1186/s13321-024-00812-5] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2023] [Accepted: 02/09/2024] [Indexed: 02/23/2024] Open
Abstract
REINVENT 4 is a modern open-source generative AI framework for the design of small molecules. The software utilizes recurrent neural networks and transformer architectures to drive molecule generation. These generators are seamlessly embedded within the general machine learning optimization algorithms, transfer learning, reinforcement learning and curriculum learning. REINVENT 4 enables and facilitates de novo design, R-group replacement, library design, linker design, scaffold hopping and molecule optimization. This contribution gives an overview of the software and describes its design. Algorithms and their applications are discussed in detail. REINVENT 4 is a command line tool which reads a user configuration in either TOML or JSON format. The aim of this release is to provide reference implementations for some of the most common algorithms in AI based molecule generation. An additional goal with the release is to create a framework for education and future innovation in AI based molecular design. The software is available from https://github.com/MolecularAI/REINVENT4 and released under the permissive Apache 2.0 license. Scientific contribution. The software provides an open-source reference implementation for generative molecular design where the software is also being used in production to support in-house drug discovery projects. The publication of the most common machine learning algorithms in one code and full documentation thereof will increase transparency of AI and foster innovation, collaboration and education.
Collapse
Affiliation(s)
- Hannes H Loeffler
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden.
| | - Jiazhen He
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - Alessandro Tibo
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - Jon Paul Janet
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - Alexey Voronov
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - Lewis H Mervin
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
| | - Ola Engkvist
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| |
Collapse
|
23
|
Kyro GW, Morgunov A, Brent RI, Batista VS. ChemSpaceAL: An Efficient Active Learning Methodology Applied to Protein-Specific Molecular Generation. J Chem Inf Model 2024; 64:653-665. [PMID: 38287889 DOI: 10.1021/acs.jcim.3c01456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/31/2024]
Abstract
The incredible capabilities of generative artificial intelligence models have inevitably led to their application in the domain of drug discovery. Within this domain, the vastness of chemical space motivates the development of more efficient methods for identifying regions with molecules that exhibit desired characteristics. In this work, we present a computationally efficient active learning methodology and demonstrate its applicability to targeted molecular generation. When applied to c-Abl kinase, a protein with FDA-approved small-molecule inhibitors, the model learns to generate molecules similar to the inhibitors without prior knowledge of their existence and even reproduces two of them exactly. We also show that the methodology is effective for a protein without any commercially available small-molecule inhibitors, the HNH domain of the CRISPR-associated protein 9 (Cas9) enzyme. To facilitate implementation and reproducibility, we made all of our software available through the open-source ChemSpaceAL Python package.
Collapse
Affiliation(s)
- Gregory W Kyro
- Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States
| | - Anton Morgunov
- Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States
| | - Rafael I Brent
- Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States
| | - Victor S Batista
- Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States
| |
Collapse
|
24
|
Chowdhury J, Fricke C, Bamidele O, Bello M, Yang W, Heyden A, Terejanu G. Invariant Molecular Representations for Heterogeneous Catalysis. J Chem Inf Model 2024; 64:327-339. [PMID: 38197612 PMCID: PMC10806804 DOI: 10.1021/acs.jcim.3c00594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2023] [Revised: 12/25/2023] [Accepted: 12/28/2023] [Indexed: 01/11/2024]
Abstract
Catalyst screening is a critical step in the discovery and development of heterogeneous catalysts, which are vital for a wide range of chemical processes. In recent years, computational catalyst screening, primarily through density functional theory (DFT), has gained significant attention as a method for identifying promising catalysts. However, the computation of adsorption energies for all likely chemical intermediates present in complex surface chemistries is computationally intensive and costly due to the expensive nature of these calculations and the intrinsic idiosyncrasies of the methods or data sets used. This study introduces a novel machine learning (ML) method to learn adsorption energies from multiple DFT functionals by using invariant molecular representations (IMRs). To do this, we first extract molecular fingerprints for the reaction intermediates and later use a Siamese-neural-network-based training strategy to learn invariant molecular representations or the IMR across all available functionals. Our Siamese network-based representations demonstrate superior performance in predicting adsorption energies compared with other molecular representations. Notably, when considering mean absolute values of adsorption energies as 0.43 eV (PBE-D3), 0.46 eV (BEEF-vdW), 0.81 eV (RPBE), and 0.37 eV (scan+rVV10), our IMR method has achieved the lowest mean absolute errors (MAEs) of 0.18 0.10, 0.16, and 0.18 eV, respectively. These results emphasize the superior predictive capacity of our Siamese network-based representations. The empirical findings in this study illuminate the efficacy, robustness, and dependability of our proposed ML paradigm in predicting adsorption energies, specifically for propane dehydrogenation on a platinum catalyst surface.
Collapse
Affiliation(s)
- Jawad Chowdhury
- Department
of Computer Science, University of North
Carolina at Charlotte, Charlotte, North Carolina 28223, United States
| | - Charles Fricke
- Department
of Chemical Engineering, University of South
Carolina, Columbia, South Carolina 29208, United States
| | - Olajide Bamidele
- Department
of Chemical Engineering, University of South
Carolina, Columbia, South Carolina 29208, United States
| | - Mubarak Bello
- Department
of Chemical Engineering, University of South
Carolina, Columbia, South Carolina 29208, United States
| | - Wenqiang Yang
- Department
of Chemical Engineering, University of South
Carolina, Columbia, South Carolina 29208, United States
| | - Andreas Heyden
- Department
of Chemical Engineering, University of South
Carolina, Columbia, South Carolina 29208, United States
| | - Gabriel Terejanu
- Department
of Computer Science, University of North
Carolina at Charlotte, Charlotte, North Carolina 28223, United States
| |
Collapse
|
25
|
Melancon K, Pliushcheuskaya P, Meiler J, Künze G. Targeting ion channels with ultra-large library screening for hit discovery. Front Mol Neurosci 2024; 16:1336004. [PMID: 38249296 PMCID: PMC10796734 DOI: 10.3389/fnmol.2023.1336004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2023] [Accepted: 12/05/2023] [Indexed: 01/23/2024] Open
Abstract
Ion channels play a crucial role in a variety of physiological and pathological processes, making them attractive targets for drug development in diseases such as diabetes, epilepsy, hypertension, cancer, and chronic pain. Despite the importance of ion channels in drug discovery, the vastness of chemical space and the complexity of ion channels pose significant challenges for identifying drug candidates. The use of in silico methods in drug discovery has dramatically reduced the time and cost of drug development and has the potential to revolutionize the field of medicine. Recent advances in computer hardware and software have enabled the screening of ultra-large compound libraries. Integration of different methods at various scales and dimensions is becoming an inevitable trend in drug development. In this review, we provide an overview of current state-of-the-art computational chemistry methodologies for ultra-large compound library screening and their application to ion channel drug discovery research. We discuss the advantages and limitations of various in silico techniques, including virtual screening, molecular mechanics/dynamics simulations, and machine learning-based approaches. We also highlight several successful applications of computational chemistry methodologies in ion channel drug discovery and provide insights into future directions and challenges in this field.
Collapse
Affiliation(s)
- Kortney Melancon
- Department of Chemistry, Vanderbilt University, Nashville, TN, United States
- Center for Structural Biology, Vanderbilt University, Nashville, TN, United States
| | | | - Jens Meiler
- Department of Chemistry, Vanderbilt University, Nashville, TN, United States
- Center for Structural Biology, Vanderbilt University, Nashville, TN, United States
- Medical Faculty, Institute for Drug Discovery, Leipzig University, Leipzig, Germany
- Center for Scalable Data Analytics and Artificial Intelligence, Leipzig University, Leipzig, Germany
| | - Georg Künze
- Medical Faculty, Institute for Drug Discovery, Leipzig University, Leipzig, Germany
- Center for Scalable Data Analytics and Artificial Intelligence, Leipzig University, Leipzig, Germany
- Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig, Germany
| |
Collapse
|
26
|
Agarwal D, Kumar S, Ambatwar R, Bhanwala N, Chandrakar L, Khatik GL. Lead Identification Through In Silico Studies: Targeting Acetylcholinesterase Enzyme Against Alzheimer's Disease. Cent Nerv Syst Agents Med Chem 2024; 24:219-242. [PMID: 38288823 DOI: 10.2174/0118715249268585240107184956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2023] [Revised: 11/09/2023] [Accepted: 12/01/2023] [Indexed: 07/23/2024]
Abstract
AIMS In this work, we aimed to acquire the best potential small molecule for Alzheimer's disease (AD) treatment using different models in Biovia Discovery Studio to identify new potential inhibitors of acetylcholinesterase (AChE) via in silico studies. BACKGROUND The prevalence of cognitive impairment-related neurodegenerative disorders, such as AD, has been observed to escalate rapidly. However, we still know little about the underlying functions, outcome predictors, or intervention targets causing AD. OBJECTIVES The objective of the study was to optimize and identify the lead compound to target AChE against Alzheimer's disease. METHODS Different in silico studies were employed, including the pharmacophore model, virtual screening, molecular docking, de novo evolution model, and molecular dynamics. RESULTS The pharmacophoric features of AChE inhibitors were determined by ligand-based pharmacophore models and 3D QSAR pharmacophore generation. Further validation of the best pharmacophore model was done using the cost analysis method, Fischer's randomization method, and test set. The molecules that harmonized the best pharmacophore model with the estimated activity < 1 nM and ADMET parameters were filtered, and 12 molecules were subjected to molecular docking studies to obtain binding energy. 3vsp_EK8_1 secured the highest binding energy of 65.60 kcal/mol. Further optimization led to a 3v_Evo_4 molecule with a better binding energy of 70.17 kcal/mol. The molecule 3v_evo_4 was subjected to 100 ns molecular simulation compared to donepezil, which showed better stability at the binding site. CONCLUSION A lead compound, 3v_Evo_4 molecule, was identified to inhibit AChE, and it could be further studied to develop as a drug with better efficacy than the existing available drugs for treating AD.
Collapse
Affiliation(s)
- Dhairiya Agarwal
- Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research-Raebareli, Uttar Pradesh, 226002, India
| | - Sumit Kumar
- Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research-Raebareli, Uttar Pradesh, 226002, India
| | - Ramesh Ambatwar
- Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research-Raebareli, Uttar Pradesh, 226002, India
| | - Neeru Bhanwala
- Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research-Raebareli, Uttar Pradesh, 226002, India
| | - Lokesh Chandrakar
- Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research-Raebareli, Uttar Pradesh, 226002, India
| | - Gopal L Khatik
- Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research-Raebareli, Uttar Pradesh, 226002, India
| |
Collapse
|
27
|
Mehrzadi A, Rezaee E, Gharaghani S, Fakhar Z, Mirhosseini SM. A Molecular Generative Model of COVID-19 Main Protease Inhibitors Using Long Short-Term Memory-Based Recurrent Neural Network. J Comput Biol 2024; 31:83-98. [PMID: 38054946 DOI: 10.1089/cmb.2023.0064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/07/2023] Open
Abstract
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused a serious threat to public health and prompted researchers to find anti-coronavirus 2019 (COVID-19) compounds. In this study, the long short-term memory-based recurrent neural network was used to generate new inhibitors for the coronavirus. First, the model was trained to generate drug compounds in the form of valid simplified molecular-input line-entry system strings. Then, the structures of COVID-19 main protease inhibitors were applied to fine-tune the model. After fine-tuning, the network could generate new molecular structures as novel SARS-CoV-2 main protease inhibitors. Molecular docking exhibited that some generated compounds have the proper affinity to the active site of the protease. Molecular Dynamics simulations explored binding free energies of the compounds over simulation trajectories. In addition, in silico absorption, distribution, metabolism, and excretion studies showed that some novel compounds could be formulated as orally active agents. Based on molecular docking and molecular dynamics simulation studies, compound AADH possessed significant binding affinity and presumably inhibition against the SARS-CoV-2 main protease enzyme. Therefore, the proposed deep learning-based model was capable of generating promising anti-COVID-19 drugs.
Collapse
Affiliation(s)
- Arash Mehrzadi
- Department of Electrical, Computer and IT Engineering, Qazvin Branch, Islamic Azad University, Qazvin, Iran
| | - Elham Rezaee
- Department of Pharmaceutical Chemistry, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Sajjad Gharaghani
- Department of Bioinformatics, Laboratory of Bioinformatics and Drug Design (LBD), University of Tehran, Tehran, Iran
| | - Zeynab Fakhar
- Department of Bioinformatics, Laboratory of Bioinformatics and Drug Design (LBD), University of Tehran, Tehran, Iran
| | | |
Collapse
|
28
|
Huang CH, Lin ST. MARS Plus: An Improved Molecular Design Tool for Complex Compounds Involving Ionic, Stereo, and Cis-Trans Isomeric Structures. J Chem Inf Model 2023; 63:7711-7728. [PMID: 38100117 DOI: 10.1021/acs.jcim.3c01745] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2023]
Abstract
MARS (Molecular Assembling and Representation Suite) (Hsu et al. J. Chem. Inf. Model. 2019, 59, 3703-3713) is a toolbox for the molecular design of organic molecules. MARS uses integer arrays to represent the elements and connectivity between elements of a molecule. It provides a collection of operations to manipulate the elemental composition and connectivity of a molecule (or a pair of molecules), enabling the creation of novel chemical compounds. In this work, the original MARS is extended to handle complex molecular structures, including geometric (cis-trans) isomers, stereo isomers, cyclic compounds, and ionic species. The extended version of MARS, referred to as MARS+, has a more comprehensive coverage of the chemical space and therefore can explore molecules with a greater chemical and physical diversity. Compared to other molecular design tools, MARS+ is designed to perform all possible manipulations on a given molecule or a pair of molecules. Molecular structure manipulation can be conducted in either a controlled or a random fashion. Furthermore, every structure manipulation has a counterpart so that the operation can be reversed. Nearly any possible chemical structure can be generated with MARS+ via a combination of molecular operations. The capabilities of MARS+ are examined by the design of new ionic liquids (ILs). The results show that MARS+ is a useful tool for computer-aided molecular design (CAMD) and molecular structure enumeration.
Collapse
Affiliation(s)
- Chen-Hsuan Huang
- Department of Chemical Engineering, National Taiwan University, Taipei 10617, Taiwan
| | - Shiang-Tai Lin
- Department of Chemical Engineering, National Taiwan University, Taipei 10617, Taiwan
| |
Collapse
|
29
|
Karandashev K, Weinreich J, Heinen S, Arismendi Arrieta DJ, von Rudorff GF, Hermansson K, von Lilienfeld OA. Evolutionary Monte Carlo of QM Properties in Chemical Space: Electrolyte Design. J Chem Theory Comput 2023; 19:8861-8870. [PMID: 38009856 PMCID: PMC10720348 DOI: 10.1021/acs.jctc.3c00822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2023] [Revised: 10/29/2023] [Accepted: 10/30/2023] [Indexed: 11/29/2023]
Abstract
Optimizing a target function over the space of organic molecules is an important problem appearing in many fields of applied science but also a very difficult one due to the vast number of possible molecular systems. We propose an evolutionary Monte Carlo algorithm for solving such problems which is capable of straightforwardly tuning both exploration and exploitation characteristics of an optimization procedure while retaining favorable properties of genetic algorithms. The method, dubbed MOSAiCS (Metropolis Optimization by Sampling Adaptively in Chemical Space), is tested on problems related to optimizing components of battery electrolytes, namely, minimizing solvation energy in water or maximizing dipole moment while enforcing a lower bound on the HOMO-LUMO gap; optimization was carried out over sets of molecular graphs inspired by QM9 and Electrolyte Genome Project (EGP) data sets. MOSAiCS reliably generated molecular candidates with good target quantity values, which were in most cases better than the ones found in QM9 or EGP. While the optimization results presented in this work sometimes required up to 106 QM calculations and were thus feasible only thanks to computationally efficient ab initio approximations of properties of interest, we discuss possible strategies for accelerating MOSAiCS using machine learning approaches.
Collapse
Affiliation(s)
| | - Jan Weinreich
- Faculty
of Physics, University of Vienna, Kolingasse 14-16, AT-1090 Wien, Austria
| | - Stefan Heinen
- Vector
Institute for Artificial Intelligence, Toronto, M5S 1M1 Ontario, Canada
| | | | - Guido Falk von Rudorff
- Department
of Chemistry, University Kassel, Heinrich-Plett-Str.40, 34132 Kassel, Germany
- Center
for Interdisciplinary Nanostructure Science and Technology (CINSaT), Heinrich-Plett-Straße 40, 34132 Kassel, Germany
| | - Kersti Hermansson
- Department
of Chemistry-Ångström Laboratory, Uppsala University, Box 538, SE-75121 Uppsala, Sweden
| | - O. Anatole von Lilienfeld
- Vector
Institute for Artificial Intelligence, Toronto, M5S 1M1 Ontario, Canada
- Departments
of Chemistry, Materials Science and Engineering, and Physics, University of Toronto, St. George
Campus, Toronto, M5S 1A1 Ontario, Canada
- Machine
Learning Group, Technische Universität
Berlin and Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
| |
Collapse
|
30
|
Viswanathan K, Goel M, Laghuvarapu S, Varma G, Priyakumar UD. Streamlining pipeline efficiency: a novel model-agnostic technique for accelerating conditional generative and virtual screening pipelines. Sci Rep 2023; 13:21069. [PMID: 38030689 PMCID: PMC10686981 DOI: 10.1038/s41598-023-42952-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2023] [Accepted: 09/16/2023] [Indexed: 12/01/2023] Open
Abstract
The discovery of potential therapeutic agents for life-threatening diseases has become a significant problem. There is a requirement for fast and accurate methods to identify drug-like molecules that can be used as potential candidates for novel targets. Existing techniques like high-throughput screening and virtual screening are time-consuming and inefficient. Traditional molecule generation pipelines are more efficient than virtual screening but use time-consuming docking software. Such docking functions can be emulated using Machine Learning models with comparable accuracy and faster execution times. However, we find that when pre-trained machine learning models are employed in generative pipelines as oracles, they suffer from model degradation in areas where data is scarce. In this study, we propose an active learning-based model that can be added as a supplement to enhanced molecule generation architectures. The proposed method uses uncertainty sampling on the molecules created by the generator model and dynamically learns as the generator samples molecules from different regions of the chemical space. The proposed framework can generate molecules with high binding affinity with [Formula: see text]a 70% improvement in runtime compared to the baseline model by labeling only [Formula: see text]30% of molecules compared to the baseline oracle.
Collapse
Affiliation(s)
- Karthik Viswanathan
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
| | - Manan Goel
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
| | - Siddhartha Laghuvarapu
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
| | - Girish Varma
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
| | - U Deva Priyakumar
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India.
| |
Collapse
|
31
|
Zou J, Zhao L, Shi S. Generation of focused drug molecule library using recurrent neural network. J Mol Model 2023; 29:361. [PMID: 37932607 DOI: 10.1007/s00894-023-05772-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2023] [Accepted: 10/26/2023] [Indexed: 11/08/2023]
Abstract
CONTEXT With the wide application of deep learning in drug research and development, de novo molecular design methods based on recurrent neural network (RNN) have strong advantages in drug molecule generation. The RNN model can be used to learn the internal chemical structure of molecules, which is similar to a natural language processing task. Although techniques for generating target-specific molecular libraries based on RNN models are mature, research related to drug design and screening continues around the clock. Research based on de novo drug design methods to generate larger quantities of valid compounds is necessary. METHODS In this study, a molecular generation model based on RNN was designed, which abandoned the traditional way of stacked RNN and introduced the Nested long short-term memory network structure. To enrich the library of focused molecules for specific targets, we fine-tuned the model using active molecules from novel coronavirus pneumonia and screened the molecules using machine learning models. Following rigorous screening, the selected molecules underwent molecular docking with the SARS-CoV-2 M-pro receptor using AutoDock2.4 to identify the top 3 potential inhibitors. Subsequently, 100-ns molecular dynamics simulations were conducted using Amber22. Molecule parameterization involved the GAFF2 force field, while the proteins were modeled using the ff19SB force field, with solvation facilitated by a truncated octahedral TIP3P solvent environment. Upon completion of molecular dynamics simulations, stability of ligand-protein complexes was assessed by analysis of RMSD, H-bonds, and MM-GBSA. Reasonable results prove that the model can complete the task of de novo drug design and has the potential to be ideal drug molecules.
Collapse
Affiliation(s)
- Jinping Zou
- Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang, 330031, China
- Institute of Mathematics and Interdisciplinary Sciences, Nanchang University, Nanchang, 330031, China
| | - Long Zhao
- Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang, 330031, China
- Institute of Mathematics and Interdisciplinary Sciences, Nanchang University, Nanchang, 330031, China
| | - Shaoping Shi
- Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang, 330031, China.
- Institute of Mathematics and Interdisciplinary Sciences, Nanchang University, Nanchang, 330031, China.
| |
Collapse
|
32
|
Ilnicka A, Schneider G. Designing molecules with autoencoder networks. NATURE COMPUTATIONAL SCIENCE 2023; 3:922-933. [PMID: 38177601 DOI: 10.1038/s43588-023-00548-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/01/2023] [Accepted: 10/03/2023] [Indexed: 01/06/2024]
Abstract
Autoencoders are versatile tools in molecular informatics. These unsupervised neural networks serve diverse tasks such as data-driven molecular representation and constructive molecular design. This Review explores their algorithmic foundations and applications in drug discovery, highlighting the most active areas of development and the contributions autoencoder networks have made in advancing this field. We also explore the challenges and prospects concerning the utilization of autoencoders and the various adaptations of this neural network architecture in molecular design.
Collapse
Affiliation(s)
- Agnieszka Ilnicka
- Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland
| | - Gisbert Schneider
- Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland.
| |
Collapse
|
33
|
Capponi S, Daniels KG. Harnessing the power of artificial intelligence to advance cell therapy. Immunol Rev 2023; 320:147-165. [PMID: 37415280 DOI: 10.1111/imr.13236] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2023] [Accepted: 06/17/2023] [Indexed: 07/08/2023]
Abstract
Cell therapies are powerful technologies in which human cells are reprogrammed for therapeutic applications such as killing cancer cells or replacing defective cells. The technologies underlying cell therapies are increasing in effectiveness and complexity, making rational engineering of cell therapies more difficult. Creating the next generation of cell therapies will require improved experimental approaches and predictive models. Artificial intelligence (AI) and machine learning (ML) methods have revolutionized several fields in biology including genome annotation, protein structure prediction, and enzyme design. In this review, we discuss the potential of combining experimental library screens and AI to build predictive models for the development of modular cell therapy technologies. Advances in DNA synthesis and high-throughput screening techniques enable the construction and screening of libraries of modular cell therapy constructs. AI and ML models trained on this screening data can accelerate the development of cell therapies by generating predictive models, design rules, and improved designs.
Collapse
Affiliation(s)
- Sara Capponi
- Department of Functional Genomics and Cellular Engineering, IBM Almaden Research Center, San Jose, California, USA
- Center for Cellular Construction, San Francisco, California, USA
| | - Kyle G Daniels
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, California, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, California, USA
| |
Collapse
|
34
|
Raza A, Chohan TA, Buabeid M, Arafa ESA, Chohan TA, Fatima B, Sultana K, Ullah MS, Murtaza G. Deep learning in drug discovery: a futuristic modality to materialize the large datasets for cheminformatics. J Biomol Struct Dyn 2023; 41:9177-9192. [PMID: 36305195 DOI: 10.1080/07391102.2022.2136244] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2022] [Accepted: 10/08/2022] [Indexed: 10/31/2022]
Abstract
Artificial intelligence (AI) development imitates the workings of the human brain to comprehend modern problems. The traditional approaches such as high throughput screening (HTS) and combinatorial chemistry are lengthy and expensive to the pharmaceutical industry as they can only handle a smaller dataset. Deep learning (DL) is a sophisticated AI method that uses a thorough comprehension of particular systems. The pharmaceutical industry is now adopting DL techniques to enhance the research and development process. Multi-oriented algorithms play a crucial role in the processing of QSAR analysis, de novo drug design, ADME evaluation, physicochemical analysis, preclinical development, followed by clinical trial data precision. In this study, we investigated the performance of several algorithms, including deep neural networks (DNN), convolutional neural networks (CNN) and multi-task learning (MTL), with the aim of generating high-quality, interpretable big and diverse databases for drug design and development. Studies have demonstrated that CNN, recurrent neural network and deep belief network are compatible, accurate and effective for the molecular description of pharmacodynamic properties. In Covid-19, existing pharmacological compounds has also been repurposed using DL models. In the absence of the Covid-19 vaccine, remdesivir and oseltamivir have been widely employed to treat severe SARS-CoV-2 infections. In conclusion, the results indicate the potential benefits of employing the DL strategies in the drug discovery process.Communicated by Ramaswamy H. Sarma.
Collapse
Affiliation(s)
- Ali Raza
- Department of pharmaceutical chemistry, Faculty of Pharmacy, The University of Lahore, Pakistan
- Institute of Molecular Biology and Biochemistry, The University of Lahore, Pakistan
| | - Talha Ali Chohan
- Institute of Molecular Biology and Biochemistry, The University of Lahore, Pakistan
- Institute of Pharmaceutical Science, UVAS, Lahore, Pakistan
| | - Manal Buabeid
- Department of Clinical Sciences, College of Pharmacy and Health Sciences, Ajman University, Ajman, United Arab Emirates
| | - El-Shaima A Arafa
- Department of Clinical Sciences, College of Pharmacy and Health Sciences, Ajman University, Ajman, United Arab Emirates
- Centre of Medical and Bio-Allied Health Sciences Research, Ajman University, Ajman, United Arab Emirates
| | | | - Batool Fatima
- Department of biochemistry, Bahauddin Zakariya University, Multan, Pakistan
| | - Kishwar Sultana
- Department of pharmaceutical chemistry, Faculty of Pharmacy, The University of Lahore, Pakistan
| | - Malik Saad Ullah
- Department of Pharmacy, Government College University, Faisalabad, Pakistan
| | - Ghulam Murtaza
- Department of Pharmacy, COMSATS University Islamabad, Lahore Campus, Pakistan
| |
Collapse
|
35
|
Arras P, Yoo HB, Pekar L, Clarke T, Friedrich L, Schröter C, Schanz J, Tonillo J, Siegmund V, Doerner A, Krah S, Guarnera E, Zielonka S, Evers A. AI/ML combined with next-generation sequencing of VHH immune repertoires enables the rapid identification of de novo humanized and sequence-optimized single domain antibodies: a prospective case study. Front Mol Biosci 2023; 10:1249247. [PMID: 37842638 PMCID: PMC10575757 DOI: 10.3389/fmolb.2023.1249247] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2023] [Accepted: 08/31/2023] [Indexed: 10/17/2023] Open
Abstract
Introduction: In this study, we demonstrate the feasibility of yeast surface display (YSD) and nextgeneration sequencing (NGS) in combination with artificial intelligence and machine learning methods (AI/ML) for the identification of de novo humanized single domain antibodies (sdAbs) with favorable early developability profiles. Methods: The display library was derived from a novel approach, in which VHH-based CDR3 regions obtained from a llama (Lama glama), immunized against NKp46, were grafted onto a humanized VHH backbone library that was diversified in CDR1 and CDR2. Following NGS analysis of sequence pools from two rounds of fluorescence-activated cell sorting we focused on four sequence clusters based on NGS frequency and enrichment analysis as well as in silico developability assessment. For each cluster, long short-term memory (LSTM) based deep generative models were trained and used for the in silico sampling of new sequences. Sequences were subjected to sequence- and structure-based in silico developability assessment to select a set of less than 10 sequences per cluster for production. Results: As demonstrated by binding kinetics and early developability assessment, this procedure represents a general strategy for the rapid and efficient design of potent and automatically humanized sdAb hits from screening selections with favorable early developability profiles.
Collapse
Affiliation(s)
- Paul Arras
- Antibody Discovery and Protein Engineering, Merck Healthcare KGaA, Darmstadt, Germany
- Institute for Organic Chemistry and Biochemistry, Technical University of Darmstadt, Darmstadt, Germany
| | - Han Byul Yoo
- Antibody Discovery and Protein Engineering, Merck Healthcare KGaA, Darmstadt, Germany
| | - Lukas Pekar
- Antibody Discovery and Protein Engineering, Merck Healthcare KGaA, Darmstadt, Germany
| | - Thomas Clarke
- Bioinformatics, EMD Serono, Billerica, MA, United States
| | - Lukas Friedrich
- Computational Chemistry and Biologics, Merck Healthcare KGaA, Darmstadt, Germany
| | | | - Jennifer Schanz
- ADCs & Targeted NBE Therapeutics, Merck KGaA, Darmstadt, Germany
| | - Jason Tonillo
- ADCs & Targeted NBE Therapeutics, Merck KGaA, Darmstadt, Germany
| | - Vanessa Siegmund
- Early Protein Supply and Characterization, Merck Healthcare KGaA, Darmstadt, Germany
| | - Achim Doerner
- Antibody Discovery and Protein Engineering, Merck Healthcare KGaA, Darmstadt, Germany
| | - Simon Krah
- Antibody Discovery and Protein Engineering, Merck Healthcare KGaA, Darmstadt, Germany
| | - Enrico Guarnera
- Antibody Discovery and Protein Engineering, Merck Healthcare KGaA, Darmstadt, Germany
| | - Stefan Zielonka
- Antibody Discovery and Protein Engineering, Merck Healthcare KGaA, Darmstadt, Germany
- Institute for Organic Chemistry and Biochemistry, Technical University of Darmstadt, Darmstadt, Germany
| | - Andreas Evers
- Antibody Discovery and Protein Engineering, Merck Healthcare KGaA, Darmstadt, Germany
| |
Collapse
|
36
|
Bae B, Bae H, Nam H. LOGICS: Learning optimal generative distribution for designing de novo chemical structures. J Cheminform 2023; 15:77. [PMID: 37674239 PMCID: PMC10483765 DOI: 10.1186/s13321-023-00747-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2023] [Accepted: 08/23/2023] [Indexed: 09/08/2023] Open
Abstract
In recent years, the field of computational drug design has made significant strides in the development of artificial intelligence (AI) models for the generation of de novo chemical compounds with desired properties and biological activities, such as enhanced binding affinity to target proteins. These high-affinity compounds have the potential to be developed into more potent therapeutics for a broad spectrum of diseases. Due to the lack of data required for the training of deep generative models, however, some of these approaches have fine-tuned their molecular generators using data obtained from a separate predictor. While these studies show that generative models can produce structures with the desired target properties, it remains unclear whether the diversity of the generated structures and the span of their chemical space align with the distribution of the intended target molecules. In this study, we present a novel generative framework, LOGICS, a framework for Learning Optimal Generative distribution Iteratively for designing target-focused Chemical Structures. We address the exploration-exploitation dilemma, which weighs the choice between exploring new options and exploiting current knowledge. To tackle this issue, we incorporate experience memory and employ a layered tournament selection approach to refine the fine-tuning process. The proposed method was applied to the binding affinity optimization of two target proteins of different protein classes, κ-opioid receptors, and PIK3CA, and the quality and the distribution of the generative molecules were evaluated. The results showed that LOGICS outperforms competing state-of-the-art models and generates more diverse de novo chemical structures with optimized properties. The source code is available at the GitHub repository ( https://github.com/GIST-CSBL/LOGICS ).
Collapse
Affiliation(s)
- Bongsung Bae
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Buk-Gu, Gwangju, 61005, Republic of Korea
| | - Haelee Bae
- AI Graduate School, Gwangju Institute of Science and Technology (GIST), Buk-Gu, Gwangju, 61005, Republic of Korea
| | - Hojung Nam
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Buk-Gu, Gwangju, 61005, Republic of Korea.
- AI Graduate School, Gwangju Institute of Science and Technology (GIST), Buk-Gu, Gwangju, 61005, Republic of Korea.
- Center for AI-Applied High Efficiency Drug Discovery (AHEDD), Gwangju Institute of Science and Technology (GIST), Buk-Gu, Gwangju, 61005, Republic of Korea.
| |
Collapse
|
37
|
Williams AH, Zhan CG. Staying Ahead of the Game: How SARS-CoV-2 has Accelerated the Application of Machine Learning in Pandemic Management. BioDrugs 2023; 37:649-674. [PMID: 37464099 DOI: 10.1007/s40259-023-00611-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/28/2023] [Indexed: 07/20/2023]
Abstract
In recent years, machine learning (ML) techniques have garnered considerable interest for their potential use in accelerating the rate of drug discovery. With the emergence of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic, the utilization of ML has become even more crucial in the search for effective antiviral medications. The pandemic has presented the scientific community with a unique challenge, and the rapid identification of potential treatments has become an urgent priority. Researchers have been able to accelerate the process of identifying drug candidates, repurposing existing drugs, and designing new compounds with desirable properties using machine learning in drug discovery. To train predictive models, ML techniques in drug discovery rely on the analysis of large datasets, including both experimental and clinical data. These models can be used to predict the biological activities, potential side effects, and interactions with specific target proteins of drug candidates. This strategy has proven to be an effective method for identifying potential coronavirus disease 2019 (COVID-19) and other disease treatments. This paper offers a thorough analysis of the various ML techniques implemented to combat COVID-19, including supervised and unsupervised learning, deep learning, and natural language processing. The paper discusses the impact of these techniques on pandemic drug development, including the identification of potential treatments, the understanding of the disease mechanism, and the creation of effective and safe therapeutics. The lessons learned can be applied to future outbreaks and drug discovery initiatives.
Collapse
Affiliation(s)
- Alexander H Williams
- Molecular Modeling and Biopharmaceutical Center, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA
- Department of Pharmaceutical Sciences, College of Pharmacy, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA
- GSK Upper Providence, 1250 S. Collegeville Road, Collegeville, PA, 19426, USA
| | - Chang-Guo Zhan
- Molecular Modeling and Biopharmaceutical Center, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA.
- Department of Pharmaceutical Sciences, College of Pharmacy, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA.
| |
Collapse
|
38
|
An Y, Glavatskikh M, Lim J, Wang X, Norris-Drouin J, Hardy PB, Leisner TM, Pearce KH, Kireev D. Machine Learning-driven Fragment-based Discovery of CIB1-directed Anti-Tumor Agents by FRASE-bot. RESEARCH SQUARE 2023:rs.3.rs-3197490. [PMID: 37645935 PMCID: PMC10462244 DOI: 10.21203/rs.3.rs-3197490/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/31/2023]
Abstract
Chemical probes are an indispensable tool for translating biological discoveries into new therapies, though are increasingly difficult to identify. Novel therapeutic targets are often hard-to-drug proteins, such as messengers or transcription factors. Computational strategies arise as a promising solution to expedite drug discovery for unconventional therapeutic targets. FRASE-bot exploits big data and machine learning (ML) to distill 3D information relevant to the target protein from thousands of protein-ligand complexes to seed it with ligand fragments. The seeded fragments can then inform either (i) de novo design of 3D ligand structures or (ii) ultra-large-scale virtual screening of commercially available compounds. Here, FRASE-bot was applied to identify ligands for Calcium and Integrin Binding protein 1 (CIB1), a promising but ligand-orphan drug target implicated in triple negative breast cancer. The signaling function of CIB1 relies on protein-protein interactions and its structure does not feature any natural ligand-binding pocket. FRASE-based virtual screening identified the first small-molecule CIB1 ligand (with binding confirmed in a TR-FRET assay) showing specific cell-killing activity in CIB1-dependent cancer cells, but not in CIB1-depleted cells.
Collapse
Affiliation(s)
- Yi An
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27513
| | - Marta Glavatskikh
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27513
| | - Jiwoong Lim
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27513
| | - Xiaowen Wang
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27513
- Chemistry department, University of Missouri, Columbia, Columbia, MO, 65211
| | - Jacqueline Norris-Drouin
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27513
| | - P. Brian Hardy
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27513
| | - Tina M. Leisner
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27513
| | - Kenneth H. Pearce
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27513
| | - Dmitri Kireev
- Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27513
- Chemistry department, University of Missouri, Columbia, Columbia, MO, 65211
| |
Collapse
|
39
|
Jin J, Wang D, Shi G, Bao J, Wang J, Zhang H, Pan P, Li D, Yao X, Liu H, Hou T, Kang Y. FFLOM: A Flow-Based Autoregressive Model for Fragment-to-Lead Optimization. J Med Chem 2023; 66:10808-10823. [PMID: 37471134 DOI: 10.1021/acs.jmedchem.3c01009] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/21/2023]
Abstract
Recently, deep generative models have been regarded as promising tools in fragment-based drug design (FBDD). Despite the growing interest in these models, they still face challenges in generating molecules with desired properties in low data regimes. In this study, we propose a novel flow-based autoregressive model named FFLOM for linker and R-group design. In a large-scale benchmark evaluation on ZINC, CASF, and PDBbind test sets, FFLOM achieves state-of-the-art performance in terms of validity, uniqueness, novelty, and recovery of the generated molecules and can recover over 92% of the original molecules in the PDBbind test set (with at least five atoms). FFLOM also exhibits excellent potential applicability in several practical scenarios encompassing fragment linking, PROTAC design, R-group growing, and R-group optimization. In all four cases, FFLOM can perfectly reconstruct the ground-truth compounds and generate over 74% of molecules with novel fragments, some of which have higher binding affinity than the ground truth.
Collapse
Affiliation(s)
- Jieyu Jin
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Dong Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Guqin Shi
- Shanghai Qilu Pharmaceutical R&D Center, 576 Libing Road, Pudong New Area District, Shanghai 310115, China
| | - Jingxiao Bao
- Shanghai Qilu Pharmaceutical R&D Center, 576 Libing Road, Pudong New Area District, Shanghai 310115, China
| | - Jike Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Haotian Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Dan Li
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Xiaojun Yao
- State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Macau 999078, China
| | - Huanxiang Liu
- Faculty of Applied Science, Macao Polytechnic University, Macau 999078, China
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| |
Collapse
|
40
|
Prabhakaran P, Hebbani AV, Menon SV, Paital B, Murmu S, Kumar S, Singh MK, Sahoo DK, Desai PPD. Insilico generation of novel ligands for the inhibition of SARS-CoV-2 main protease (3CL pro) using deep learning. Front Microbiol 2023; 14:1194794. [PMID: 37448573 PMCID: PMC10338188 DOI: 10.3389/fmicb.2023.1194794] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2023] [Accepted: 06/05/2023] [Indexed: 07/15/2023] Open
Abstract
The recent emergence of novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) causing the coronavirus disease (COVID-19) has become a global public health crisis, and a crucial need exists for rapid identification and development of novel therapeutic interventions. In this study, a recurrent neural network (RNN) is trained and optimized to produce novel ligands that could serve as potential inhibitors to the SARS-CoV-2 viral protease: 3 chymotrypsin-like protease (3CLpro). Structure-based virtual screening was performed through molecular docking, ADMET profiling, and predictions of various molecular properties were done to evaluate the toxicity and drug-likeness of the generated novel ligands. The properties of the generated ligands were also compared with current drugs under various phases of clinical trials to assess the efficacy of the novel ligands. Twenty novel ligands were selected that exhibited good drug-likeness properties, with most ligands conforming to Lipinski's rule of 5, high binding affinity (highest binding affinity: -9.4 kcal/mol), and promising ADMET profile. Additionally, the generated ligands complexed with 3CLpro were found to be stable based on the results of molecular dynamics simulation studies conducted over a 100 ns period. Overall, the findings offer a promising avenue for the rapid identification and development of effective therapeutic interventions to treat COVID-19.
Collapse
Affiliation(s)
- Prejwal Prabhakaran
- Department of Biotechnology, New Horizon College of Engineering, Bangalore, India
- Faculty of Biology, Albert-Ludwigs-Universität Freiburg, Freiburg im Breisgau, Germany
| | - Ananda Vardhan Hebbani
- Department of Biochemistry, Indian Academy Degree College (Autonomous), Bangalore, India
| | - Soumya V. Menon
- Department of Chemistry and Biochemistry, School of Sciences, Jain (Deemed-to-be) University, Bangalore, India
| | - Biswaranjan Paital
- Redox Regulation Laboratory, Department of Zoology, College of Basic Science and Humanities, Odisha University of Agriculture and Technology, Bhubaneswar, India
| | - Sneha Murmu
- ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi, India
| | - Sunil Kumar
- ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi, India
| | | | - Dipak Kumar Sahoo
- Department of Veterinary Clinical Sciences, College of Veterinary Medicine, Iowa State University, Ames, IA, United States
| | | |
Collapse
|
41
|
Zhang W, Zhang K, Huang J. A Simple Way to Incorporate Target Structural Information in Molecular Generative Models. J Chem Inf Model 2023. [PMID: 37318828 DOI: 10.1021/acs.jcim.3c00293] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/17/2023]
Abstract
Deep learning generative models are now being applied in various fields including drug discovery. In this work, we propose a novel approach to include target 3D structural information in molecular generative models for structure-based drug design. The method combines a message-passing neural network model that predicts docking scores with a generative neural network model as its reward function to navigate the chemical space searching for molecules that bind favorably with a specific target. A key feature of the method is the construction of target-specific molecular sets for training, designed to overcome potential transferability issues of surrogate docking models through a two-round training process. Consequently, this enables accurate guided exploration of the chemical space without reliance on the collection of prior knowledge about active and inactive compounds for the specific target. Tests on eight target proteins showed a 100-fold increase in hit generation compared to conventional docking calculations and the ability to generate molecules similar to approved drugs or known active ligands for specific targets without prior knowledge. This method provides a general and highly efficient solution for structure-based molecular generation.
Collapse
Affiliation(s)
- Wenyi Zhang
- Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Institute of Biology, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
| | - Kaiyue Zhang
- Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
| | - Jing Huang
- Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Institute of Biology, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
| |
Collapse
|
42
|
Li J, Beaudoin C, Ghosh S. Energy-based generative models for target-specific drug discovery. FRONTIERS IN MOLECULAR MEDICINE 2023; 3:1160877. [PMID: 39086693 PMCID: PMC11285544 DOI: 10.3389/fmmed.2023.1160877] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 02/07/2023] [Accepted: 05/12/2023] [Indexed: 08/02/2024]
Abstract
Drug targets are the main focus of drug discovery due to their key role in disease pathogenesis. Computational approaches are widely applied to drug development because of the increasing availability of biological molecular datasets. Popular generative approaches can create new drug molecules by learning the given molecule distributions. However, these approaches are mostly not for target-specific drug discovery. We developed an energy-based probabilistic model for computational target-specific drug discovery. Results show that our proposed TagMol can generate molecules with similar binding affinity scores as real molecules. GAT-based models showed faster and better learning relative to Graph Convolutional Network baseline models.
Collapse
Affiliation(s)
- Junde Li
- Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, United States
| | | | | |
Collapse
|
43
|
Mazuz E, Shtar G, Shapira B, Rokach L. Molecule generation using transformers and policy gradient reinforcement learning. Sci Rep 2023; 13:8799. [PMID: 37258546 DOI: 10.1038/s41598-023-35648-w] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2022] [Accepted: 05/22/2023] [Indexed: 06/02/2023] Open
Abstract
Generating novel valid molecules is often a difficult task, because the vast chemical space relies on the intuition of experienced chemists. In recent years, deep learning models have helped accelerate this process. These advanced models can also help identify suitable molecules for disease treatment. In this paper, we propose Taiga, a transformer-based architecture for the generation of molecules with desired properties. Using a two-stage approach, we first treat the problem as a language modeling task of predicting the next token, using SMILES strings. Then, we use reinforcement learning to optimize molecular properties such as QED. This approach allows our model to learn the underlying rules of chemistry and more easily optimize for molecules with desired properties. Our evaluation of Taiga, which was performed with multiple datasets and tasks, shows that Taiga is comparable to, or even outperforms, state-of-the-art baselines for molecule optimization, with improvements in the QED ranging from 2 to over 20 percent. The improvement was demonstrated both on datasets containing lead molecules and random molecules. We also show that with its two stages, Taiga is capable of generating molecules with higher biological property scores than the same model without reinforcement learning.
Collapse
Affiliation(s)
- Eyal Mazuz
- Ben-Gurion University of the Negev, Beersheba, Israel.
| | - Guy Shtar
- Ben-Gurion University of the Negev, Beersheba, Israel
| | | | - Lior Rokach
- Ben-Gurion University of the Negev, Beersheba, Israel
| |
Collapse
|
44
|
Wang J, Zeng Y, Sun H, Wang J, Wang X, Jin R, Wang M, Zhang X, Cao D, Chen X, Hsieh CY, Hou T. Molecular Generation with Reduced Labeling through Constraint Architecture. J Chem Inf Model 2023. [PMID: 37184885 DOI: 10.1021/acs.jcim.3c00579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/16/2023]
Abstract
In the past few years, a number of machine learning (ML)-based molecular generative models have been proposed for generating molecules with desirable properties, but they all require a large amount of label data of pharmacological and physicochemical properties. However, experimental determination of these labels, especially bioactivity labels, is very expensive. In this study, we analyze the dependence of various multi-property molecule generation models on biological activity label data and propose Frag-G/M, a fragment-based multi-constraint molecular generation framework based on conditional transformer, recurrent neural networks (RNNs), and reinforcement learning (RL). The experimental results illustrate that, using the same number of labels, Frag-G/M can generate more desired molecules than the baselines (several times more than the baselines). Moreover, compared with the known active compounds, the molecules generated by Frag-G/M exhibit higher scaffold diversity than those generated by the baselines, thus making it more promising to be used in real-world drug discovery scenarios.
Collapse
Affiliation(s)
- Jike Wang
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China
- School of Computer Science, Wuhan University, Wuhan, Hubei 430072, P. R. China
| | - Yundian Zeng
- College of Control Science and Engineering, Zhejiang University, Hangzhou, Zhejiang 310027, P. R. China
| | - Huiyong Sun
- Department of Medicinal Chemistry, China Pharmaceutical University, Nanjing, Jiangsu 210009, P. R. China
| | - Junmei Wang
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
| | - Xiaorui Wang
- State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Macau 999078, P. R. China
| | - Ruofan Jin
- College of Life Science, Zhejiang University, Hangzhou, Zhejiang 310027, P. R. China
| | - Mingyang Wang
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China
| | - Xujun Zhang
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410004, P. R. China
| | - Xi Chen
- School of Computer Science, Wuhan University, Wuhan, Hubei 430072, P. R. China
| | - Chang-Yu Hsieh
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China
| | - Tingjun Hou
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China
| |
Collapse
|
45
|
Ji C, Zheng Y, Wang R, Cai Y, Wu H. Graph Polish: A Novel Graph Generation Paradigm for Molecular Optimization. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:2323-2337. [PMID: 34520363 DOI: 10.1109/tnnls.2021.3106392] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]
Abstract
Molecular optimization, which transforms a given input molecule X into another Y with desired properties, is essential in molecular drug discovery. The traditional approaches either suffer from sample-inefficient learning or ignore information that can be captured with the supervised learning of optimized molecule pairs. In this study, we present a novel molecular optimization paradigm, Graph Polish. In this paradigm, with the guidance of the source and target molecule pairs of the desired properties, a heuristic optimization solution can be derived: given an input molecule, we first predict which atom can be viewed as the optimization center, and then the nearby regions are optimized around this center. We then propose an effective and efficient learning framework, Teacher and Student polish, to capture the dependencies in the optimization steps. A teacher component automatically identifies and annotates the optimization centers and the preservation, removal, and addition of some parts of the molecules; a student component learns these knowledges and applies them to a new molecule. The proposed paradigm can offer an intuitive interpretation for the molecular optimization result. Experiments with multiple optimization tasks are conducted on several benchmark datasets. The proposed approach achieves a significant advantage over the six state-of-the-art baseline methods. Also, extensive studies are conducted to validate the effectiveness, explainability, and time savings of the novel optimization paradigm.
Collapse
|
46
|
Chen L, Shen Q, Lou J. Magicmol: a light-weighted pipeline for drug-like molecule evolution and quick chemical space exploration. BMC Bioinformatics 2023; 24:173. [PMID: 37101113 PMCID: PMC10132416 DOI: 10.1186/s12859-023-05286-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2022] [Accepted: 04/13/2023] [Indexed: 04/28/2023] Open
Abstract
The flourishment of machine learning and deep learning methods has boosted the development of cheminformatics, especially regarding the application of drug discovery and new material exploration. Lower time and space expenses make it possible for scientists to search the enormous chemical space. Recently, some work combined reinforcement learning strategies with recurrent neural network (RNN)-based models to optimize the property of generated small molecules, which notably improved a batch of critical factors for these candidates. However, a common problem among these RNN-based methods is that several generated molecules have difficulty in synthesizing despite owning higher desired properties such as binding affinity. However, RNN-based framework better reproduces the molecule distribution among the training set than other categories of models during molecule exploration tasks. Thus, to optimize the whole exploration process and make it contribute to the optimization of specified molecules, we devised a light-weighted pipeline called Magicmol; this pipeline has a re-mastered RNN network and utilize SELFIES presentation instead of SMILES. Our backbone model achieved extraordinary performance while reducing the training cost; moreover, we devised reward truncate strategies to eliminate the model collapse problem. Additionally, adopting SELFIES presentation made it possible to combine STONED-SELFIES as a post-processing procedure for specified molecule optimization and quick chemical space exploration.
Collapse
Affiliation(s)
- Lin Chen
- Yangtze Delta Region (Huzhou) Institute of Intelligent Transportation, Huzhou University, Huzhou, China
- Zhejiang Province Key Laboratory of Smart Management and Application of Modern Agricultural Resources, School of Information Engineering, Huzhou University, Huzhou, China
| | - Qing Shen
- Yangtze Delta Region (Huzhou) Institute of Intelligent Transportation, Huzhou University, Huzhou, China
- School of Electronic Information, Huzhou College, Huzhou, China
| | - Jungang Lou
- Yangtze Delta Region (Huzhou) Institute of Intelligent Transportation, Huzhou University, Huzhou, China.
- Zhejiang Province Key Laboratory of Smart Management and Application of Modern Agricultural Resources, School of Information Engineering, Huzhou University, Huzhou, China.
| |
Collapse
|
47
|
Nemoto S, Mizuno T, Kusuhara H. Investigation of chemical structure recognition by encoder-decoder models in learning progress. J Cheminform 2023; 15:45. [PMID: 37046349 PMCID: PMC10100163 DOI: 10.1186/s13321-023-00713-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2022] [Accepted: 03/18/2023] [Indexed: 04/14/2023] Open
Abstract
Descriptor generation methods using latent representations of encoder-decoder (ED) models with SMILES as input are useful because of the continuity of descriptor and restorability to the structure. However, it is not clear how the structure is recognized in the learning progress of ED models. In this work, we created ED models of various learning progress and investigated the relationship between structural information and learning progress. We showed that compound substructures were learned early in ED models by monitoring the accuracy of downstream tasks and input-output substructure similarity using substructure-based descriptors, which suggests that existing evaluation methods based on the accuracy of downstream tasks may not be sensitive enough to evaluate the performance of ED models with SMILES as descriptor generation methods. On the other hand, we showed that structure restoration was time-consuming, and in particular, insufficient learning led to the estimation of a larger structure than the actual one. It can be inferred that determining the endpoint of the structure is a difficult task for the model. To our knowledge, this is the first study to link the learning progress of SMILES by ED model to chemical structures for a wide range of chemicals.
Collapse
Affiliation(s)
- Shumpei Nemoto
- Department of Pharmaceutical Sciences, The University of Tokyo, Bunkyo, Tokyo, Japan
| | - Tadahaya Mizuno
- Department of Pharmaceutical Sciences, The University of Tokyo, Bunkyo, Tokyo, Japan.
| | - Hiroyuki Kusuhara
- Department of Pharmaceutical Sciences, The University of Tokyo, Bunkyo, Tokyo, Japan
| |
Collapse
|
48
|
Liu X, Zhang W, Tong X, Zhong F, Li Z, Xiong Z, Xiong J, Wu X, Fu Z, Tan X, Liu Z, Zhang S, Jiang H, Li X, Zheng M. MolFilterGAN: a progressively augmented generative adversarial network for triaging AI-designed molecules. J Cheminform 2023; 15:42. [PMID: 37031191 PMCID: PMC10082991 DOI: 10.1186/s13321-023-00711-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2022] [Accepted: 03/14/2023] [Indexed: 04/10/2023] Open
Abstract
Artificial intelligence (AI)-based molecular design methods, especially deep generative models for generating novel molecule structures, have gratified our imagination to explore unknown chemical space without relying on brute-force exploration. However, whether designed by AI or human experts, the molecules need to be accessibly synthesized and biologically evaluated, and the trial-and-error process remains a resources-intensive endeavor. Therefore, AI-based drug design methods face a major challenge of how to prioritize the molecular structures with potential for subsequent drug development. This study indicates that common filtering approaches based on traditional screening metrics fail to differentiate AI-designed molecules. To address this issue, we propose a novel molecular filtering method, MolFilterGAN, based on a progressively augmented generative adversarial network. Comparative analysis shows that MolFilterGAN outperforms conventional screening approaches based on drug-likeness or synthetic ability metrics. Retrospective analysis of AI-designed discoidin domain receptor 1 (DDR1) inhibitors shows that MolFilterGAN significantly increases the efficiency of molecular triaging. Further evaluation of MolFilterGAN on eight external ligand sets suggests that MolFilterGAN is useful in triaging or enriching bioactive compounds across a wide range of target types. These results highlighted the importance of MolFilterGAN in evaluating molecules integrally and further accelerating molecular discovery especially combined with advanced AI generative models.
Collapse
Affiliation(s)
- Xiaohong Liu
- Shanghai Institute for Advanced Immunochemical Studies, and School of Life Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai, 201210, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- AlphaMa Inc., No. 108, Yuxin Road, Suzhou Industrial Park, Suzhou, 215128, China
| | - Wei Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Xiaochu Tong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Feisheng Zhong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Zhaojun Li
- AlphaMa Inc., No. 108, Yuxin Road, Suzhou Industrial Park, Suzhou, 215128, China
| | - Zhaoping Xiong
- Shanghai Institute for Advanced Immunochemical Studies, and School of Life Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai, 201210, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Jiacheng Xiong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Xiaolong Wu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China
| | - Zunyun Fu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Xiaoqin Tan
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- ByteDance AI Lab, No. 1999 Yishan Road, Shanghai, 201103, China
| | - Zhiguo Liu
- AlphaMa Inc., No. 108, Yuxin Road, Suzhou Industrial Park, Suzhou, 215128, China
| | - Sulin Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Hualiang Jiang
- Shanghai Institute for Advanced Immunochemical Studies, and School of Life Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai, 201210, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China.
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China.
| | - Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China.
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China.
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, 310024, Hangzhou, China.
| |
Collapse
|
49
|
Koutroumpa NM, Papavasileiou KD, Papadiamantis AG, Melagraki G, Afantitis A. A Systematic Review of Deep Learning Methodologies Used in the Drug Discovery Process with Emphasis on In Vivo Validation. Int J Mol Sci 2023; 24:6573. [PMID: 37047543 PMCID: PMC10095548 DOI: 10.3390/ijms24076573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2022] [Revised: 03/24/2023] [Accepted: 03/28/2023] [Indexed: 04/05/2023] Open
Abstract
The discovery and development of new drugs are extremely long and costly processes. Recent progress in artificial intelligence has made a positive impact on the drug development pipeline. Numerous challenges have been addressed with the growing exploitation of drug-related data and the advancement of deep learning technology. Several model frameworks have been proposed to enhance the performance of deep learning algorithms in molecular design. However, only a few have had an immediate impact on drug development since computational results may not be confirmed experimentally. This systematic review aims to summarize the different deep learning architectures used in the drug discovery process and are validated with further in vivo experiments. For each presented study, the proposed molecule or peptide that has been generated or identified by the deep learning model has been biologically evaluated in animal models. These state-of-the-art studies highlight that even if artificial intelligence in drug discovery is still in its infancy, it has great potential to accelerate the drug discovery cycle, reduce the required costs, and contribute to the integration of the 3R (Replacement, Reduction, Refinement) principles. Out of all the reviewed scientific articles, seven algorithms were identified: recurrent neural networks, specifically, long short-term memory (LSTM-RNNs), Autoencoders (AEs) and their Wasserstein Autoencoders (WAEs) and Variational Autoencoders (VAEs) variants; Convolutional Neural Networks (CNNs); Direct Message Passing Neural Networks (D-MPNNs); and Multitask Deep Neural Networks (MTDNNs). LSTM-RNNs were the most used architectures with molecules or peptide sequences as inputs.
Collapse
Affiliation(s)
- Nikoletta-Maria Koutroumpa
- Department of ChemoInformatics, NovaMechanics Ltd., Nicosia 1070, Cyprus
- School of Chemical Engineering, National Technical University of Athens, 157 80 Athens, Greece
- Division of Data Driven Innovation, Entelos Institute, Larnaca 6059, Cyprus
| | - Konstantinos D. Papavasileiou
- Department of ChemoInformatics, NovaMechanics Ltd., Nicosia 1070, Cyprus
- Division of Data Driven Innovation, Entelos Institute, Larnaca 6059, Cyprus
- Department of ChemoInformatics, NovaMechanics MIKE., 185 45 Piraeus, Greece
| | - Anastasios G. Papadiamantis
- Department of ChemoInformatics, NovaMechanics Ltd., Nicosia 1070, Cyprus
- Division of Data Driven Innovation, Entelos Institute, Larnaca 6059, Cyprus
| | - Georgia Melagraki
- Division of Physical Sciences & Applications, Hellenic Military Academy, 166 73 Vari, Greece
| | - Antreas Afantitis
- Department of ChemoInformatics, NovaMechanics Ltd., Nicosia 1070, Cyprus
- Division of Data Driven Innovation, Entelos Institute, Larnaca 6059, Cyprus
- Department of ChemoInformatics, NovaMechanics MIKE., 185 45 Piraeus, Greece
| |
Collapse
|
50
|
Zhou Z, Eden M, Shen W. Treat Molecular Linear Notations as Sentences: Accurate Quantitative Structure–Property Relationship Modeling via a Natural Language Processing Approach. Ind Eng Chem Res 2023. [DOI: 10.1021/acs.iecr.2c04070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/31/2023]
|