1. Proszewska M, Wolczyk M, Zieba M, Wielopolski P, Maziarka L, Smieja M. Multi-Label Conditional Generation From Pre-Trained Models. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024; 46:6185-6198. [PMID: 38530738 DOI: 10.1109/tpami.2024.3382008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/28/2024]
Abstract
Although modern generative models achieve excellent quality in a variety of tasks, they often lack the essential ability to generate examples with requested properties, such as the age of the person in the photo or the weight of the generated molecule. To overcome these limitations, we propose PluGeN (Plugin Generative Network), a simple yet effective generative technique that can be used as a plugin for pre-trained generative models. The idea behind our approach is to transform the entangled latent representation using a flow-based module into a multi-dimensional space where the values of each attribute are modeled as an independent one-dimensional distribution. As a consequence, PluGeN can generate new samples with desired attributes as well as manipulate labeled attributes of existing examples. Due to the disentangling of the latent representation, we are even able to generate samples with combinations of attributes that are rare or unseen in the dataset, such as a young person with gray hair, men with make-up, or women with beards. In contrast to competing approaches, PluGeN can be trained on partially labeled data. We combined PluGeN with GAN and VAE models and applied it to conditional generation and manipulation of images, chemical molecule modeling, and 3D point cloud generation.
2. Kanakala GC, Devata S, Chatterjee P, Priyakumar UD. Generative artificial intelligence for small molecule drug design. Curr Opin Biotechnol 2024; 89:103175. [PMID: 39106790 DOI: 10.1016/j.copbio.2024.103175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2024] [Revised: 06/19/2024] [Accepted: 07/12/2024] [Indexed: 08/09/2024]
Abstract
In recent years, the rapid advancement of generative artificial intelligence (GenAI) has revolutionized the landscape of drug design, offering innovative solutions to potentially expedite the discovery of novel therapeutics. GenAI encompasses algorithms and models that autonomously create new data, including text, images, and molecules, often mirroring characteristics of existing datasets. This comprehensive review delves into the realm of GenAI for drug design, emphasizing recent advancements and methodologies that have propelled the field forward. Specifically, we focus on three prominent paradigms: transformers, diffusion models, and reinforcement learning algorithms, which have been exceptionally impactful in the last few years. By synthesizing insights from a myriad of studies and developments, we elucidate the potential of these approaches in accelerating the drug discovery process. Through a detailed analysis, we explore the current state and future directions of GenAI in the context of drug design, highlighting its transformative impact on pharmaceutical research and development.
Affiliation(s)
- Ganesh Chandan Kanakala
  - Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, Telangana, India
- Sriram Devata
  - Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, Telangana, India
- Prathit Chatterjee
  - Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, Telangana, India
- Udaykumar Deva Priyakumar
  - Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, Telangana, India
3. Fallani A, Medrano Sandonas L, Tkatchenko A. Inverse mapping of quantum properties to structures for chemical space of small organic molecules. Nat Commun 2024; 15:6061. [PMID: 39025883 PMCID: PMC11258234 DOI: 10.1038/s41467-024-50401-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Accepted: 07/01/2024] [Indexed: 07/20/2024] Open
Abstract
Computer-driven molecular design combines the principles of chemistry, physics, and artificial intelligence to identify chemical compounds with tailored properties. While quantum-mechanical (QM) methods, coupled with machine learning, already offer a direct mapping from 3D molecular structures to their properties, effective methodologies for the inverse mapping in chemical space remain elusive. We address this challenge by demonstrating the possibility of parametrizing a chemical space with a finite set of QM properties. Our proof-of-concept implementation achieves an approximate property-to-structure mapping, the QIM model (which stands for "Quantum Inverse Mapping"), by forcing a variational auto-encoder with a property encoder to obtain a common internal representation for both structures and properties. After validating this mapping for small drug-like molecules, we illustrate its capabilities with an explainability study as well as by the generation of de novo molecular structures with targeted properties and transition pathways between conformational isomers. Our findings thus provide a proof-of-principle demonstration aiming to enable the inverse property-to-structure design in diverse chemical spaces.
Affiliation(s)
- Alessio Fallani
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
- Leonardo Medrano Sandonas
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
  - Institute for Materials Science and Max Bergmann Center of Biomaterials, TU Dresden, 01062, Dresden, Germany
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
4. Xia X, Liu Y, Zheng C, Zhang X, Wu Q, Gao X, Zeng X, Su Y. Evolutionary Multiobjective Molecule Optimization in an Implicit Chemical Space. J Chem Inf Model 2024; 64:5161-5174. [PMID: 38870455 PMCID: PMC11235097 DOI: 10.1021/acs.jcim.4c00031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Revised: 05/08/2024] [Accepted: 05/13/2024] [Indexed: 06/15/2024]
Abstract
Optimization techniques play a pivotal role in advancing drug development, serving as the foundation of numerous generative methods tailored to efficiently design optimized molecules derived from existing lead compounds. However, existing methods often encounter difficulties in generating diverse, novel, and high-property molecules that simultaneously optimize multiple drug properties. To overcome this bottleneck, we propose a multiobjective molecule optimization framework (MOMO). MOMO employs a specially designed Pareto-based multiproperty evaluation strategy at the molecular sequence level to guide the evolutionary search in an implicit chemical space. A comparative analysis of MOMO with five state-of-the-art methods across two benchmark multiproperty molecule optimization tasks reveals that MOMO markedly outperforms them in terms of diversity, novelty, and optimized properties. The practical applicability of MOMO in drug discovery has also been validated on four challenging tasks in the real-world discovery problem. These results suggest that MOMO can provide a useful tool to facilitate molecule optimization problems with multiple properties.
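The Pareto-based evaluation mentioned in this abstract can be illustrated with a minimal sketch of Pareto dominance and non-dominated filtering. This is not the MOMO implementation: the function names, the assumption that every property is maximized, and the toy score tuples are all hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a Pareto-dominates b (every objective is maximized)."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical (property_1, property_2) scores for four candidate molecules.
candidates = [(0.9, 0.2), (0.6, 0.6), (0.5, 0.5), (0.2, 0.9)]
front = pareto_front(candidates)  # front == [0, 1, 3]
```

In an evolutionary loop, the non-dominated candidates would typically be kept as parents for the next generation; how MOMO ranks and selects them is not specified in the abstract.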
Affiliation(s)
- Xin Xia
  - The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
  - Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, 5089 Wangjiang West Road, Hefei 230088, Anhui, China
- Yiping Liu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410012, China
- Chunhou Zheng
  - The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
- Xingyi Zhang
  - The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
- Qingwen Wu
  - The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
- Xin Gao
  - Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Xiangxiang Zeng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410012, China
- Yansen Su
  - The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
  - Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, 5089 Wangjiang West Road, Hefei 230088, Anhui, China
5. Wei W, Fang J, Yang N, Li Q, Hu L, Zhao L, Han J. AC-ModNet: Molecular Reverse Design Network Based on Attribute Classification. Int J Mol Sci 2024; 25:6940. [PMID: 39000049 PMCID: PMC11241775 DOI: 10.3390/ijms25136940] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2024] [Revised: 06/13/2024] [Accepted: 06/22/2024] [Indexed: 07/14/2024] Open
Abstract
Deep generative models are becoming a tool of choice for exploring the molecular space. One important application area of deep generative models is the reverse design of drug compounds for given attributes (solubility, ease of synthesis, etc.). Although there are many generative models, these models cannot generate molecules within specific attribute intervals. This paper proposes an AC-ModNet model that effectively combines VAE with AC-GAN to generate molecular structures in specific attribute intervals. The AC-ModNet is trained and evaluated using the open 250K ZINC dataset. In comparison with related models, our method performs best in the FCD and Frag model evaluation indicators. Moreover, we prove the AC-ModNet created molecules have potential application value in drug design by comparing and analyzing them with medical records in the PubChem database. The results of this paper will provide a new method for machine learning drug reverse design.
Affiliation(s)
- Ning Yang
  - School of Automation, Northwestern Polytechnical University, Xi’an 710072, China; (W.W.); (J.F.); (Q.L.); (L.H.); (L.Z.); (J.H.)
6. Alberga D, Lamanna G, Graziano G, Delre P, Lomuscio MC, Corriero N, Ligresti A, Siliqi D, Saviano M, Contino M, Stefanachi A, Mangiatordi GF. DeLA-DrugSelf: Empowering multi-objective de novo design through SELFIES molecular representation. Comput Biol Med 2024; 175:108486. [PMID: 38653065 DOI: 10.1016/j.compbiomed.2024.108486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2024] [Revised: 04/08/2024] [Accepted: 04/15/2024] [Indexed: 04/25/2024]
Abstract
In this paper, we introduce DeLA-DrugSelf, an upgraded version of DeLA-Drug [J. Chem. Inf. Model. 62 (2022) 1411-1424], which incorporates essential advancements for automated multi-objective de novo design. Unlike its predecessor, which relies on SMILES notation for molecular representation, DeLA-DrugSelf employs a novel and robust molecular representation string named SELFIES (SELF-referencing Embedded String). The generation process in DeLA-DrugSelf not only involves substitutions to the initial string representing the starting query molecule but also incorporates insertions and deletions. This enhancement makes DeLA-DrugSelf significantly more adept at executing data-driven scaffold decoration and lead optimization strategies. Remarkably, DeLA-DrugSelf explicitly addresses the SELFIES-related collapse issue, considering only collapse-free compounds during generation. These compounds undergo a rigorous quality metrics evaluation, highlighting substantial advancements in terms of drug-likeness, uniqueness, and novelty compared to the molecules generated by the previous version of the algorithm. To evaluate the potential of DeLA-DrugSelf as a mutational operator within a genetic algorithm framework for multi-objective optimization, we employed a fitness function based on Pareto dominance. Our objectives focused on target-oriented properties aimed at optimizing known cannabinoid receptor 2 (CB2R) ligands. The results obtained indicate that DeLA-DrugSelf, available as a user-friendly web platform (https://www.ba.ic.cnr.it/softwareic/delaself/), can effectively contribute to the data-driven optimization of starting bioactive molecules based on user-defined parameters.
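The three edit types named in this abstract (substitution, insertion, deletion on a token string) can be sketched as a simple mutation operator. This is an illustration only: the abstract does not say how DeLA-DrugSelf chooses its edits, so uniform random choice is used here as a stand-in, and the small SELFIES-style token alphabet and the `mutate` function are hypothetical. No validity or collapse check is performed.

```python
import random
from typing import List

# Hypothetical token alphabet; bracketed tokens only mimic SELFIES notation.
ALPHABET = ["[C]", "[N]", "[O]", "[=C]", "[Ring1]"]


def mutate(tokens: List[str], rng: random.Random) -> List[str]:
    """Return a copy of tokens with at most one random edit applied."""
    tokens = list(tokens)  # work on a copy so the caller's list is untouched
    op = rng.choice(["substitute", "insert", "delete"])
    if op == "substitute" and tokens:
        # Replace one token with a randomly chosen alphabet token.
        tokens[rng.randrange(len(tokens))] = rng.choice(ALPHABET)
    elif op == "insert" or not tokens:
        # Insert a new token at any position, including either end.
        tokens.insert(rng.randrange(len(tokens) + 1), rng.choice(ALPHABET))
    elif len(tokens) > 1:
        # Delete one token, but never empty the string entirely.
        del tokens[rng.randrange(len(tokens))]
    return tokens


query = ["[C]", "[C]", "[O]"]
variant = mutate(query, random.Random(0))
```

Used inside a genetic algorithm, an operator like this would produce offspring that are then scored by the fitness function; a seeded `random.Random` keeps runs reproducible.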
Affiliation(s)
- Domenico Alberga
  - CNR - Institute of Crystallography, Via Amendola 122/o, 70126, Bari, Italy
- Giuseppe Lamanna
  - CNR - Institute of Crystallography, Via Amendola 122/o, 70126, Bari, Italy
- Giovanni Graziano
  - Department of Pharmacy - Pharmaceutical Sciences, University of Bari "Aldo Moro", via E. Orabona, 4, I-70125, Bari, Italy
- Pietro Delre
  - CNR - Institute of Crystallography, Via Amendola 122/o, 70126, Bari, Italy
- Nicola Corriero
  - CNR - Institute of Crystallography, Via Amendola 122/o, 70126, Bari, Italy
- Alessia Ligresti
  - CNR - Institute of Biomolecular Chemistry, Via Campi Flegrei 34, 80078, Pozzuoli, Italy
- Dritan Siliqi
  - CNR - Institute of Crystallography, Via Amendola 122/o, 70126, Bari, Italy
- Michele Saviano
  - CNR - Institute of Crystallography, Via Vivaldi 43, 81100, Caserta, Italy
- Marialessandra Contino
  - Department of Pharmacy - Pharmaceutical Sciences, University of Bari "Aldo Moro", via E. Orabona, 4, I-70125, Bari, Italy
- Angela Stefanachi
  - Department of Pharmacy - Pharmaceutical Sciences, University of Bari "Aldo Moro", via E. Orabona, 4, I-70125, Bari, Italy
7. Horne RI, Andrzejewska EA, Alam P, Brotzakis ZF, Srivastava A, Aubert A, Nowinska M, Gregory RC, Staats R, Possenti A, Chia S, Sormanni P, Ghetti B, Caughey B, Knowles TPJ, Vendruscolo M. Discovery of potent inhibitors of α-synuclein aggregation using structure-based iterative learning. Nat Chem Biol 2024; 20:634-645. [PMID: 38632492 PMCID: PMC11062903 DOI: 10.1038/s41589-024-01580-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Accepted: 02/12/2024] [Indexed: 04/19/2024]
Abstract
Machine learning methods hold the promise to reduce the costs and the failure rates of conventional drug discovery pipelines. This issue is especially pressing for neurodegenerative diseases, where the development of disease-modifying drugs has been particularly challenging. To address this problem, we describe here a machine learning approach to identify small molecule inhibitors of α-synuclein aggregation, a process implicated in Parkinson's disease and other synucleinopathies. Because the proliferation of α-synuclein aggregates takes place through autocatalytic secondary nucleation, we aim to identify compounds that bind the catalytic sites on the surface of the aggregates. To achieve this goal, we use structure-based machine learning in an iterative manner to first identify and then progressively optimize secondary nucleation inhibitors. Our results demonstrate that this approach leads to the facile identification of compounds two orders of magnitude more potent than previously reported ones.
Affiliation(s)
- Robert I Horne
  - Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK
- Ewa A Andrzejewska
  - Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK
- Parvez Alam
  - Laboratory of Neurological Infections and Immunity, Rocky Mountain Laboratories, National Institute for Allergy and Infectious Diseases, National Institutes of Health, Hamilton, MT, USA
- Z Faidon Brotzakis
  - Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK
- Ankit Srivastava
  - Laboratory of Neurological Infections and Immunity, Rocky Mountain Laboratories, National Institute for Allergy and Infectious Diseases, National Institutes of Health, Hamilton, MT, USA
- Alice Aubert
  - Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK
- Magdalena Nowinska
  - Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK
- Rebecca C Gregory
  - Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK
- Roxine Staats
  - Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK
- Andrea Possenti
  - Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK
- Sean Chia
  - Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK
  - Bioprocessing Technology Institute, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Pietro Sormanni
  - Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK
- Bernardino Ghetti
  - Department of Pathology and Laboratory Medicine, Indiana University School of Medicine, Indianapolis, IN, USA
- Byron Caughey
  - Laboratory of Neurological Infections and Immunity, Rocky Mountain Laboratories, National Institute for Allergy and Infectious Diseases, National Institutes of Health, Hamilton, MT, USA
- Tuomas P J Knowles
  - Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK
- Michele Vendruscolo
  - Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK
8. Pang C, Qiao J, Zeng X, Zou Q, Wei L. Deep Generative Models in De Novo Drug Molecule Generation. J Chem Inf Model 2024; 64:2174-2194. [PMID: 37934070 DOI: 10.1021/acs.jcim.3c01496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2023]
Abstract
The discovery of new drugs has important implications for human health. Traditional methods for drug discovery rely on experiments to optimize the structure of lead molecules, which are time-consuming and high-cost. Recently, artificial intelligence has exhibited promising and efficient performance for drug-like molecule generation. In particular, deep generative models achieve great success in de novo generation of drug-like molecules with desired properties, showing massive potential for novel drug discovery. In this study, we review the recent progress of molecule generation using deep generative models, mainly focusing on molecule representations, public databases, data processing tools, and advanced artificial intelligence based molecule generation frameworks. In particular, we present a comprehensive comparison of state-of-the-art deep generative models for molecule generation and a summary of commonly used molecular design strategies. We identify research gaps and challenges of molecule generation such as the need for better databases, missing 3D information in molecular representation, and the lack of high-precision evaluation metrics. We suggest future directions for molecular generation and drug discovery.
Affiliation(s)
- Chao Pang
  - School of Software, Shandong University, Jinan 250100, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250100, China
- Jianbo Qiao
  - School of Software, Shandong University, Jinan 250100, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250100, China
- Xiangxiang Zeng
  - College of Information Science and Engineering, Hunan University, Changsha 410082, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Leyi Wei
  - School of Software, Shandong University, Jinan 250100, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250100, China
9. Li H, Shee Y, Allen B, Maschietto F, Morgunov A, Batista V. Kernel-elastic autoencoder for molecular design. PNAS Nexus 2024; 3:pgae168. [PMID: 38689710 PMCID: PMC11059255 DOI: 10.1093/pnasnexus/pgae168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Accepted: 04/04/2024] [Indexed: 05/02/2024]
Abstract
We introduce the kernel-elastic autoencoder (KAE), a self-supervised generative model based on the transformer architecture with enhanced performance for molecular design. KAE employs two innovative loss functions: modified maximum mean discrepancy (m-MMD) and weighted reconstruction (L_WCEL). The m-MMD loss has significantly improved the generative performance of KAE when compared to using the traditional Kullback-Leibler loss of VAE, or standard maximum mean discrepancy. Including the weighted reconstruction loss L_WCEL, KAE achieves valid generation and accurate reconstruction at the same time, allowing for generative behavior that is intermediate between VAE and autoencoder not available in existing generative approaches. Further advancements in KAE include its integration with conditional generation, setting a new state-of-the-art benchmark in constrained optimizations. Moreover, KAE has demonstrated its capability to generate molecules with favorable binding affinities in docking applications, as evidenced by AutoDock Vina and Glide scores, outperforming all existing candidates from the training dataset. Beyond molecular design, KAE holds promise to solve problems by generation across a broad spectrum of applications.
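As a rough illustration of the quantity that the m-MMD loss modifies, the sketch below computes the standard (biased) squared maximum mean discrepancy with a Gaussian kernel in plain Python. It is not the authors' m-MMD, whose modification the abstract does not describe; the kernel choice, the `bandwidth` value, and the toy 2-D points standing in for latent codes and prior samples are all assumptions.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def rbf_kernel(x: Point, y: Point, bandwidth: float = 1.0) -> float:
    """Gaussian (RBF) kernel between two points of equal dimension."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs: List[Point], ys: List[Point], bandwidth: float = 1.0) -> float:
    """Biased estimate of squared MMD between samples xs and ys."""

    def mean_kernel(a: List[Point], b: List[Point]) -> float:
        # Average kernel value over all pairs (p, q) with p in a, q in b.
        total = sum(rbf_kernel(p, q, bandwidth) for p in a for q in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Hypothetical 2-D "latent codes" and two sets of "prior samples".
latent = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
prior_near = [(0.1, 0.0), (0.9, 0.1), (0.0, 0.9)]
prior_far = [(3.0, 3.0), (4.0, 3.0), (3.0, 4.0)]

near_gap = mmd_squared(latent, prior_near)
far_gap = mmd_squared(latent, prior_far)
```

The estimate is zero when both samples are identical and grows as the two samples move apart, which is why a term of this kind can be minimized to pull encoded latents toward a chosen prior.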
Affiliation(s)
- Haote Li
  - Department of Chemistry, Yale University, New Haven, CT 06520, USA
- Yu Shee
  - Department of Chemistry, Yale University, New Haven, CT 06520, USA
- Brandon Allen
  - Department of Chemistry, Yale University, New Haven, CT 06520, USA
- Anton Morgunov
  - Department of Chemistry, Yale University, New Haven, CT 06520, USA
- Victor Batista
  - Department of Chemistry, Yale University, New Haven, CT 06520, USA
10. Zhang Y, Tong Y, Xia X, Wu Q, Su Y. A domain-label-guided translation model for molecular optimization. Methods 2024; 224:71-78. [PMID: 38395182 DOI: 10.1016/j.ymeth.2024.02.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 02/11/2024] [Accepted: 02/17/2024] [Indexed: 02/25/2024] Open
Abstract
Molecular optimization, which aims to improve molecular properties by modifying complex molecular structures, is a crucial and challenging task in drug discovery. In recent years, translation models provide a promising way to transform low-property molecules to high-property molecules, which enables molecular optimization to achieve remarkable progress. However, most existing models require matched molecular pairs, which are prone to be limited by the datasets. Although some models do not require matched molecular pairs, their performance is usually sacrificed due to the lack of useful supervising information. To address this issue, a domain-label-guided translation model is proposed in this paper, namely DLTM. In the model, the domain label information of molecules is exploited as a control condition to obtain different embedding representations, enabling the model to generate diverse molecules. Besides, the model adopts a classifier network to identify the property categories of transformed molecules, guiding the model to generate molecules with desired properties. The performance of DLTM is verified on two optimization tasks, namely the quantitative estimation of drug-likeness and penalized logP. Experimental results show that the proposed DLTM is superior to the compared baseline models.
Affiliation(s)
- Yajie Zhang
  - School of Computer Science and Technology, Anhui University, Hefei, 230601, China
- Yongqi Tong
  - School of Computer Science and Technology, Anhui University, Hefei, 230601, China
- Xin Xia
  - Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, 230088, China; School of Artificial Intelligence, Anhui University, Hefei, 230601, China
- Qingwen Wu
  - Affiliated Hospital of Jining Medical University, Jining, 272007, China
- Yansen Su
  - Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, 230088, China; School of Artificial Intelligence, Anhui University, Hefei, 230601, China
11. Zhang C, Xie L, Lu X, Mao R, Xu L, Xu X. Developing an Improved Cycle Architecture for AI-Based Generation of New Structures Aimed at Drug Discovery. Molecules 2024; 29:1499. [PMID: 38611779 PMCID: PMC11013495 DOI: 10.3390/molecules29071499] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2024] [Revised: 03/18/2024] [Accepted: 03/21/2024] [Indexed: 04/14/2024] Open
Abstract
Drug discovery involves a crucial step of optimizing molecules with the desired structural groups. In the domain of computer-aided drug discovery, deep learning has emerged as a prominent technique in molecular modeling. Deep generative models, based on deep learning, play a crucial role in generating novel molecules when optimizing molecules. However, many existing molecular generative models have limitations as they solely process input information in a forward way. To overcome this limitation, we propose an improved generative model called BD-CycleGAN, which incorporates BiLSTM (bidirectional long short-term memory) and Mol-CycleGAN (molecular cycle generative adversarial network) to preserve the information of molecular input. To evaluate the proposed model, we assess its performance by analyzing the structural distribution and evaluation matrices of generated molecules in the process of structural transformation. The results demonstrate that the BD-CycleGAN model achieves a higher success rate and exhibits increased diversity in molecular generation. Furthermore, we demonstrate its application in molecular docking, where it successfully increases the docking score for the generated molecules. The proposed BD-CycleGAN architecture harnesses the power of deep learning to facilitate the generation of molecules with desired structural features, thus offering promising advancements in the field of drug discovery processes.
Affiliation(s)
- Lei Xu
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China; (C.Z.); (L.X.); (X.L.); (R.M.)
- Xiaojun Xu
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China; (C.Z.); (L.X.); (X.L.); (R.M.)
12. Parrilla-Gutiérrez JM, Granda JM, Ayme JF, Bajczyk MD, Wilbraham L, Cronin L. Electron density-based GPT for optimization and suggestion of host-guest binders. Nature Computational Science 2024; 4:200-209. [PMID: 38459272 DOI: 10.1038/s43588-024-00602-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/18/2023] [Accepted: 01/23/2024] [Indexed: 03/10/2024]
Abstract
Here we present a machine learning model trained on electron density for the production of host-guest binders. These are read out as simplified molecular-input line-entry system (SMILES) format with >98% accuracy, enabling a complete characterization of the molecules in two dimensions. Our model generates three-dimensional representations of the electron density and electrostatic potentials of host-guest systems using a variational autoencoder, and then utilizes these representations to optimize the generation of guests via gradient descent. Finally, the guests are converted to SMILES using a transformer. The successful practical application of our model to established molecular host systems, cucurbit[n]uril and metal-organic cages, resulted in the discovery of 9 previously validated guests for CB[6] and 7 unreported guests (with association constant Ka ranging from 13.5 M⁻¹ to 5,470 M⁻¹) and the discovery of 4 unreported guests for [Pd214]4+ (with Ka ranging from 44 M⁻¹ to 529 M⁻¹).
Affiliation(s)
- Juan M Parrilla-Gutiérrez
  - School of Chemistry, University of Glasgow, Glasgow, UK
  - School of Computing, Engineering and Built Environment, Glasgow Caledonian University, Glasgow, UK
- Jarosław M Granda
  - School of Chemistry, University of Glasgow, Glasgow, UK
  - Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
- Leroy Cronin
  - School of Chemistry, University of Glasgow, Glasgow, UK
13. Zhu Z, Lu J, Yuan S, He Y, Zheng F, Jiang H, Yan Y, Sun Q. Automated Generation and Analysis of Molecular Images Using Generative Artificial Intelligence Models. J Phys Chem Lett 2024; 15:1985-1992. [PMID: 38346383 DOI: 10.1021/acs.jpclett.3c03504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
The development of scanning probe microscopy (SPM) has enabled unprecedented scientific discoveries through high-resolution imaging. Simulations and theoretical analysis of SPM images are equally important as obtaining experimental images since their comparisons provide fruitful understandings of the structures and physical properties of the investigated systems. So far, SPM image simulations are conventionally based on quantum mechanical theories, which can take several days in tasks of large-scale systems. Here, we have developed a scanning tunneling microscopy (STM) molecular image simulation and analysis framework based on a generative adversarial model, CycleGAN. It allows efficient translations between STM data and molecular models. Our CycleGAN-based framework introduces an approach for high-fidelity STM image simulation, outperforming traditional quantum mechanical methods in efficiency and accuracy. We envision that the integration of generative networks and high-resolution molecular imaging opens avenues in materials discovery relying on SPM technologies.
Affiliation(s)
- Zhiwen Zhu
- Materials Genome Institute, Shanghai University, 200444 Shanghai, China
| | - Jiayi Lu
- Materials Genome Institute, Shanghai University, 200444 Shanghai, China
| | - Shaoxuan Yuan
- Materials Genome Institute, Shanghai University, 200444 Shanghai, China
| | - Yu He
- Materials Genome Institute, Shanghai University, 200444 Shanghai, China
| | - Fengru Zheng
- Materials Genome Institute, Shanghai University, 200444 Shanghai, China
| | - Hao Jiang
- Materials Genome Institute, Shanghai University, 200444 Shanghai, China
| | - Yuyi Yan
- Materials Genome Institute, Shanghai University, 200444 Shanghai, China
| | - Qiang Sun
- Materials Genome Institute, Shanghai University, 200444 Shanghai, China
| |
|
14
|
Horne R, Wilson-Godber J, González Díaz A, Brotzakis ZF, Seal S, Gregory RC, Possenti A, Chia S, Vendruscolo M. Using Generative Modeling to Endow with Potency Initially Inert Compounds with Good Bioavailability and Low Toxicity. J Chem Inf Model 2024; 64:590-596. [PMID: 38261763 PMCID: PMC10865343 DOI: 10.1021/acs.jcim.3c01777] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2023] [Revised: 12/10/2023] [Accepted: 12/12/2023] [Indexed: 01/25/2024]
Abstract
In the early stages of drug development, large chemical libraries are typically screened to identify compounds of promising potency against the chosen targets. Often, however, the resulting hit compounds tend to have poor drug metabolism and pharmacokinetics (DMPK), with negative developability features that may be difficult to eliminate. Therefore, starting the drug discovery process with a "null library", compounds that have highly desirable DMPK properties but no potency against the chosen targets, could be advantageous. Here, we explore the opportunities offered by machine learning to realize this strategy in the case of the inhibition of α-synuclein aggregation, a process associated with Parkinson's disease. We apply MolDQN, a generative machine learning method, to build an inhibitory activity against α-synuclein aggregation into an initial inactive compound with good DMPK properties. Our results illustrate how generative modeling can be used to endow initially inert compounds with desirable developability properties.
Affiliation(s)
- Robert I. Horne
- Centre for Misfolding Diseases, Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom
| | - Jared Wilson-Godber
- Centre for Misfolding Diseases, Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom
| | - Alicia González Díaz
- Centre for Misfolding Diseases, Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom
| | - Z. Faidon Brotzakis
- Centre for Misfolding Diseases, Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom
| | - Srijit Seal
- Centre for Misfolding Diseases, Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom
- Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, United States
| | - Rebecca C. Gregory
- Centre for Misfolding Diseases, Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom
| | - Andrea Possenti
- Centre for Misfolding Diseases, Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom
| | - Sean Chia
- Centre for Misfolding Diseases, Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom
- Bioprocessing Technology Institute, Agency for Science, Technology and Research (A*STAR), 138668 Singapore, Singapore
| | - Michele Vendruscolo
- Centre for Misfolding Diseases, Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom
| |
|
15
|
Macedo B, Ribeiro Vaz I, Taveira Gomes T. MedGAN: optimized generative adversarial network with graph convolutional networks for novel molecule design. Sci Rep 2024; 14:1212. [PMID: 38216614 PMCID: PMC10786821 DOI: 10.1038/s41598-023-50834-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2023] [Accepted: 12/26/2023] [Indexed: 01/14/2024] Open
Abstract
Generative Artificial Intelligence can be an important asset in the drug discovery process to meet the demand for novel medicines. This work outlines the optimization and fine-tuning steps of MedGAN, a deep learning model based on Wasserstein Generative Adversarial Networks and Graph Convolutional Networks, developed to generate new quinoline-scaffold molecules from complex molecular graphs, including hyperparameter adjustments and evaluations of drug-likeness attributes such as pharmacokinetics, toxicity, and synthetic accessibility. The best model was capable of generating 25% valid molecules, 62% fully connected, from which 92% were quinolines, 93% were novel, and 95% unique, preserving chirality, atom charge, and favorable drug-like properties while generating 4831 novel quinolines. These results provide valuable insights into how activation functions, optimizers, learning rates, neuron units, molecule size and constitution, and scaffold structure affect the performance of generative models and their potential to create new molecular structures, enhancing deep learning applications in computational drug design.
Affiliation(s)
- Bruno Macedo
- Faculty of Medicine, University of Porto, Porto, Portugal.
- MedFacts Lda., Lisbon, Portugal.
| | - Inês Ribeiro Vaz
- Faculty of Medicine, University of Porto, Porto, Portugal
- Department of Community Medicine, Information and Decision in Health, Faculty of Medicine, University of Porto, Porto, Portugal
- Center for Health Technology and Services Research (CINTESIS), Porto, Portugal
| | - Tiago Taveira Gomes
- Faculty of Medicine, University of Porto, Porto, Portugal
- Department of Community Medicine, Information and Decision in Health, Faculty of Medicine, University of Porto, Porto, Portugal
- Faculty of Health Sciences, University Fernando Pessoa, Porto, Portugal
- SIGIL Scientific Enterprises, Dubai, UAE
| |
|
16
|
Verburgt J, Jain A, Kihara D. Recent Deep Learning Applications to Structure-Based Drug Design. Methods Mol Biol 2024; 2714:215-234. [PMID: 37676602 PMCID: PMC10578466 DOI: 10.1007/978-1-0716-3441-7_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 09/08/2023]
Abstract
Identification and optimization of small molecules that bind to and modulate protein function is a crucial step in the early stages of drug development. For decades, this process has benefitted greatly from the use of computational models that can provide insights into molecular binding affinity and optimization. Over the past several years, various types of deep learning models have shown great potential in improving and enhancing the performance of traditional computational methods. In this chapter, we provide an overview of recent deep learning-based developments with applications in drug discovery. We classify these methods into four subcategories dependent on the task each method is aiming to solve. For each subcategory, we provide the general framework of the approach and discuss individual methods.
Affiliation(s)
- Jacob Verburgt
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
| | - Anika Jain
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
| | - Daisuke Kihara
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA.
- Department of Computer Science, Purdue University, West Lafayette, IN, USA.
| |
|
17
|
Viswanathan K, Goel M, Laghuvarapu S, Varma G, Priyakumar UD. Streamlining pipeline efficiency: a novel model-agnostic technique for accelerating conditional generative and virtual screening pipelines. Sci Rep 2023; 13:21069. [PMID: 38030689 PMCID: PMC10686981 DOI: 10.1038/s41598-023-42952-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2023] [Accepted: 09/16/2023] [Indexed: 12/01/2023] Open
Abstract
The discovery of potential therapeutic agents for life-threatening diseases has become a significant problem. There is a requirement for fast and accurate methods to identify drug-like molecules that can be used as potential candidates for novel targets. Existing techniques like high-throughput screening and virtual screening are time-consuming and inefficient. Traditional molecule generation pipelines are more efficient than virtual screening but use time-consuming docking software. Such docking functions can be emulated using Machine Learning models with comparable accuracy and faster execution times. However, we find that when pre-trained machine learning models are employed in generative pipelines as oracles, they suffer from model degradation in areas where data is scarce. In this study, we propose an active learning-based model that can be added as a supplement to enhanced molecule generation architectures. The proposed method uses uncertainty sampling on the molecules created by the generator model and dynamically learns as the generator samples molecules from different regions of the chemical space. The proposed framework can generate molecules with high binding affinity with [Formula: see text]a 70% improvement in runtime compared to the baseline model by labeling only [Formula: see text]30% of molecules compared to the baseline oracle.
Affiliation(s)
- Karthik Viswanathan
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
| | - Manan Goel
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
| | - Siddhartha Laghuvarapu
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
| | - Girish Varma
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
| | - U Deva Priyakumar
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India.
| |
|
18
|
Erikawa D, Yasuo N, Suzuki T, Nakamura S, Sekijima M. Gargoyles: An Open Source Graph-Based Molecular Optimization Method Based on Deep Reinforcement Learning. ACS Omega 2023; 8:37431-37441. [PMID: 37841174 PMCID: PMC10568706 DOI: 10.1021/acsomega.3c05430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 07/26/2023] [Accepted: 09/13/2023] [Indexed: 10/17/2023]
Abstract
Automatic optimization methods for compounds in the vast compound space are important for drug discovery and material design. Several machine learning-based molecular generative models for drug discovery have been proposed, but most of these methods generate compounds from scratch and are not suitable for exploring and optimizing user-defined compounds. In this study, we developed a compound optimization method based on molecular graphs using deep reinforcement learning. This method searches for compounds on a fragment-by-fragment basis and at high density by generating fragments to be added atom by atom. Experimental results confirmed that the quantitative estimate of drug-likeness (QED), the optimization target set in this study, was enhanced by searching around the starting compound. As a use case, we successfully enhanced the activity of a compound by targeting dopamine receptor D2 (DRD2). The generated compounds remained structurally similar to the starting compounds while their activity increased, indicating that this method is suitable for optimizing molecules from a given compound. The source code is available at https://github.com/sekijima-lab/GARGOYLES.
Affiliation(s)
- Daiki Erikawa
- Department of Computer Science, Tokyo Institute of Technology, 4259-J3-23, Nagatsuta-cho, Midori-ku, Yokohama 226-8501, Japan
| | - Nobuaki Yasuo
- Academy for Convergence of Materials and Informatics (TAC-MI), Tokyo Institute of Technology, S6-23, Ookayama, Meguro-ku, Tokyo 152-8550, Japan
| | - Takamasa Suzuki
- Department of Computer Science, Tokyo Institute of Technology, 4259-J3-23, Nagatsuta-cho, Midori-ku, Yokohama 226-8501, Japan
| | - Shogo Nakamura
- Department of Life Science and Technology, Tokyo Institute of Technology, 4259-J3-23, Nagatsuta-cho, Midori-ku, Yokohama 226-8501, Japan
| | - Masakazu Sekijima
- Department of Computer Science, Tokyo Institute of Technology, 4259-J3-23, Nagatsuta-cho, Midori-ku, Yokohama 226-8501, Japan
| |
|
19
|
Zhu H, Zhou R, Cao D, Tang J, Li M. A pharmacophore-guided deep learning approach for bioactive molecular generation. Nat Commun 2023; 14:6234. [PMID: 37803000 PMCID: PMC10558534 DOI: 10.1038/s41467-023-41454-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2022] [Accepted: 08/30/2023] [Indexed: 10/08/2023] Open
Abstract
The rational design of novel molecules with the desired bioactivity is a critical but challenging task in drug discovery, especially when treating a novel target family or understudied targets. We propose a Pharmacophore-Guided deep learning approach for bioactive Molecule Generation (PGMG). Through pharmacophore guidance, PGMG provides a flexible strategy for generating bioactive molecules. PGMG uses a graph neural network to encode spatially distributed chemical features and a transformer decoder to generate molecules. A latent variable is introduced to solve the many-to-many mapping between pharmacophores and molecules to improve the diversity of the generated molecules. Compared to existing methods, PGMG generates molecules with strong docking affinities and high scores of validity, uniqueness, and novelty. In the case studies, we use PGMG in ligand-based and structure-based de novo drug design. Overall, its flexibility and effectiveness make PGMG a useful tool to accelerate the drug discovery process.
Affiliation(s)
- Huimin Zhu
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
| | - Renyi Zhou
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410008, China
| | - Jing Tang
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, 00290, Finland
- Department of Biochemistry and Developmental Biology, Faculty of Medicine, University of Helsinki, Helsinki, 00290, Finland
| | - Min Li
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China.
| |
|
20
|
Haroon S, C A H, A S J. Generative Pre-trained Transformer (GPT) based model with relative attention for de novo drug design. Comput Biol Chem 2023; 106:107911. [PMID: 37450999 DOI: 10.1016/j.compbiolchem.2023.107911] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2023] [Revised: 06/24/2023] [Accepted: 06/28/2023] [Indexed: 07/18/2023]
Abstract
De novo drug design refers to the process of designing new drug molecules from scratch using computational methods. In contrast to other computational methods that primarily focus on modifying existing molecules, designing from scratch enables the exploration of new chemical space and the potential discovery of novel molecules with enhanced properties. In this research, we proposed a model that utilizes Generative Pre-trained Transformer (GPT) architecture and relative attention for de novo drug design. GPT is a language model that utilizes transformer architecture to predict the next word or token in a given sequence. Representation of molecules using SMILES notation has enabled the use of next-token prediction techniques in de novo drug design. GPT uses attention mechanisms to capture the dependencies and relationships between different tokens in a sequence and allows the model to focus on the most important information when processing the input. Relative attention is a variant of the attention mechanism, which allows the model to capture the relative distances and relationships between tokens in the input sequence. In the standard attention mechanism, positional information is typically encoded using fixed-position embeddings. In relative attention, positional information is supplied dynamically during attention calculation by incorporating relative positional encodings, enabling the model to quickly learn the syntax of new unseen tokens. Relative attention enables the GPT model to better understand the relative positions of tokens in the sequence, which can be particularly useful when dealing with limited dataset sizes or generating target-specific drugs. The proposed model was trained on benchmark datasets, and performance was compared with other generative models. We show that relative attention and transfer learning could enable the GPT model to generate molecules with improved validity, uniqueness, and novelty in the context of de novo drug design. 
To illustrate the effectiveness of relative attention, the model was trained using transfer learning on three target-specific datasets, and the performance was compared with standard attention.
Affiliation(s)
- Suhail Haroon
- Bioinformatics Lab, Department of Computer Science, Cochin University of Science and Technology, Kerala 682022, India.
| | - Hafsath C A
- Bioinformatics Lab, Department of Computer Science, Cochin University of Science and Technology, Kerala 682022, India
| | - Jereesh A S
- Bioinformatics Lab, Department of Computer Science, Cochin University of Science and Technology, Kerala 682022, India.
| |
|
21
|
Liu K, Han Y, Gong Z, Xu H. Low-Data Drug Design with Few-Shot Generative Domain Adaptation. Bioengineering (Basel) 2023; 10:1104. [PMID: 37760206 PMCID: PMC10526055 DOI: 10.3390/bioengineering10091104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2023] [Revised: 09/04/2023] [Accepted: 09/18/2023] [Indexed: 09/29/2023] Open
Abstract
Developing new drugs for emerging diseases, such as COVID-19, is crucial for promoting public health. In recent years, the application of artificial intelligence (AI) has significantly advanced drug discovery pipelines. Generative models, such as generative adversarial networks (GANs), exhibit the potential for discovering novel drug molecules by relying on a vast number of training samples. However, for new diseases, only a few samples are typically available, posing a significant challenge to learning a generative model that produces both high-quality and diverse molecules under limited supervision. To address this low-data drug generation issue, we propose a novel molecule generative domain adaptation paradigm (Mol-GenDA), which transfers a pre-trained GAN on a large-scale drug molecule dataset to a new disease domain using only a few references. Specifically, we introduce a molecule adaptor into the GAN generator during the fine tuning, allowing the generator to reuse prior knowledge learned in pre-training to the greatest extent and maintain the quality and diversity of the generated molecules. Comprehensive downstream experiments demonstrate that Mol-GenDA can produce high-quality and diverse drug candidates. In summary, the proposed approach offers a promising solution to expedite drug discovery for new diseases, which could lead to the timely development of effective drugs to combat emerging outbreaks.
Affiliation(s)
- Ke Liu
- College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou 311200, China
| | - Yuqiang Han
- College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou 311200, China
| | - Zhichen Gong
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou 311200, China
- Department of Computer Science, University College London, London WC1E 6BT, UK
| | - Hongxia Xu
- Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou 310027, China
| |
|
22
|
Wojtuch A, Danel T, Podlewska S, Maziarka Ł. Extended study on atomic featurization in graph neural networks for molecular property prediction. J Cheminform 2023; 15:81. [PMID: 37726841 PMCID: PMC10507875 DOI: 10.1186/s13321-023-00751-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2023] [Accepted: 08/23/2023] [Indexed: 09/21/2023] Open
Abstract
Graph neural networks have recently become a standard method for analyzing chemical compounds. In the field of molecular property prediction, the emphasis is now on designing new model architectures, and the importance of atom featurization is oftentimes belittled. When contrasting two graph neural networks, the use of different representations possibly leads to incorrect attribution of the results solely to the network architecture. To better understand this issue, we compare multiple atom representations by evaluating them on the prediction of free energy, solubility, and metabolic stability using graph convolutional networks. We discover that the choice of atom representation has a significant impact on model performance and that the optimal subset of features is task-specific. Additional experiments involving more sophisticated architectures, including graph transformers, support these findings. Moreover, we demonstrate that some commonly used atom features, such as the number of neighbors or the number of hydrogens, can be easily predicted using only information about bonds and atom type, yet their explicit inclusion in the representation has a positive impact on model performance. Finally, we explain the predictions of the best-performing models to better understand how they utilize the available atomic features.
Affiliation(s)
- Agnieszka Wojtuch
- Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348, Kraków, Poland.
| | - Tomasz Danel
- Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348, Kraków, Poland
| | - Sabina Podlewska
- Maj Institute of Pharmacology, Polish Academy of Sciences, Smętna 12, 31-343, Kraków, Poland
| | - Łukasz Maziarka
- Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348, Kraków, Poland
| |
|
23
|
Jin J, Wang D, Shi G, Bao J, Wang J, Zhang H, Pan P, Li D, Yao X, Liu H, Hou T, Kang Y. FFLOM: A Flow-Based Autoregressive Model for Fragment-to-Lead Optimization. J Med Chem 2023; 66:10808-10823. [PMID: 37471134 DOI: 10.1021/acs.jmedchem.3c01009] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/21/2023]
Abstract
Recently, deep generative models have been regarded as promising tools in fragment-based drug design (FBDD). Despite the growing interest in these models, they still face challenges in generating molecules with desired properties in low data regimes. In this study, we propose a novel flow-based autoregressive model named FFLOM for linker and R-group design. In a large-scale benchmark evaluation on ZINC, CASF, and PDBbind test sets, FFLOM achieves state-of-the-art performance in terms of validity, uniqueness, novelty, and recovery of the generated molecules and can recover over 92% of the original molecules in the PDBbind test set (with at least five atoms). FFLOM also exhibits excellent potential applicability in several practical scenarios encompassing fragment linking, PROTAC design, R-group growing, and R-group optimization. In all four cases, FFLOM can perfectly reconstruct the ground-truth compounds and generate over 74% of molecules with novel fragments, some of which have higher binding affinity than the ground truth.
Affiliation(s)
- Jieyu Jin
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Dong Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Guqin Shi
- Shanghai Qilu Pharmaceutical R&D Center, 576 Libing Road, Pudong New Area District, Shanghai 310115, China
| | - Jingxiao Bao
- Shanghai Qilu Pharmaceutical R&D Center, 576 Libing Road, Pudong New Area District, Shanghai 310115, China
| | - Jike Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Haotian Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Dan Li
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Xiaojun Yao
- State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Macau 999078, China
| | - Huanxiang Liu
- Faculty of Applied Science, Macao Polytechnic University, Macau 999078, China
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| |
|
24
|
Wang SH, Chen G, Zhong X, Lin T, Shen Y, Fan X, Cao L. Global development of artificial intelligence in cancer field: a bibliometric analysis range from 1983 to 2022. Front Oncol 2023; 13:1215729. [PMID: 37519796 PMCID: PMC10382324 DOI: 10.3389/fonc.2023.1215729] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2023] [Accepted: 06/26/2023] [Indexed: 08/01/2023] Open
Abstract
Background: Artificial intelligence (AI) is now widely applied in the cancer field. The aim of this study is to explore the hotspots and trends of AI in cancer research. Methods: The search used four topic terms ("tumor," "cancer," "carcinoma," and "artificial intelligence") in the Web of Science database, covering January 1983 to December 2022. We then documented and processed all data, including country, continent, and Journal Impact Factor, using bibliometric software. Results: A total of 6,920 papers were collected and analyzed. We present the annual publications and citations, the most productive countries/regions, the most influential scholars, the collaborations of journals and institutions, and the research focus and hotspots in AI-based cancer research. Conclusion: This study systematically summarizes the current state of AI in cancer research so as to lay the foundation for future research.
Affiliation(s)
- Sui-Han Wang
- Department of General Surgery, Sir Run Run Shaw Hospital, School of Medicine, Zhejiang University, Hangzhou, China
| | - Guoqiao Chen
- Department of General Surgery, Sir Run Run Shaw Hospital, School of Medicine, Zhejiang University, Hangzhou, China
| | - Xin Zhong
- Department of General Surgery, Sir Run Run Shaw Hospital, School of Medicine, Zhejiang University, Hangzhou, China
| | - Tianyu Lin
- Department of General Surgery, Sir Run Run Shaw Hospital, School of Medicine, Zhejiang University, Hangzhou, China
| | - Yan Shen
- Department of General Surgery, The First People’s Hospital of Yu Hang District, Hangzhou, China
| | - Xiaoxiao Fan
- Department of General Surgery, Sir Run Run Shaw Hospital, School of Medicine, Zhejiang University, Hangzhou, China
| | - Liping Cao
- Department of General Surgery, Sir Run Run Shaw Hospital, School of Medicine, Zhejiang University, Hangzhou, China
| |
|
25
|
Ciepliński T, Danel T, Podlewska S, Jastrzębski S. Generative Models Should at Least Be Able to Design Molecules That Dock Well: A New Benchmark. J Chem Inf Model 2023. [PMID: 37224003 DOI: 10.1021/acs.jcim.2c01355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/26/2023]
Abstract
Designing compounds with desired properties is a key element of the drug discovery process. However, measuring progress in the field has been challenging due to the lack of realistic retrospective benchmarks, and the large cost of prospective validation. To close this gap, we propose a benchmark based on docking, a widely used computational method for assessing molecule binding to a protein. Concretely, the goal is to generate drug-like molecules that are scored highly by SMINA, a popular docking software. We observe that various graph-based generative models fail to propose molecules with a high docking score when trained using a realistically sized training set. This suggests a limitation of the current incarnation of models for de novo drug design. Finally, we also include simpler tasks in the benchmark based on a simpler scoring function. We release the benchmark as an easy to use package available at https://github.com/cieplinski-tobiasz/smina-docking-benchmark. We hope that our benchmark will serve as a stepping stone toward the goal of automatically generating promising drug candidates.
Affiliation(s)
- Tobiasz Ciepliński
- Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348 Kraków, Poland
| | - Tomasz Danel
- Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348 Kraków, Poland
| | - Sabina Podlewska
- Maj Institute of Pharmacology, Polish Academy of Sciences, Smętna 12, 31-343 Kraków, Poland
| | - Stanisław Jastrzębski
- Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348 Kraków, Poland
- Molecule.one, Al. Jerozolimskie 96, 00-807 Warsaw, Poland
| |
|
26
|
Bhat V, Callaway CP, Risko C. Computational Approaches for Organic Semiconductors: From Chemical and Physical Understanding to Predicting New Materials. Chem Rev 2023. [PMID: 37141497 DOI: 10.1021/acs.chemrev.2c00704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/06/2023]
Abstract
While a complete understanding of organic semiconductor (OSC) design principles remains elusive, computational methods, ranging from techniques based in classical and quantum mechanics to more recent data-enabled models, can complement experimental observations and provide deep physicochemical insights into OSC structure-processing-property relationships, offering new capabilities for in silico OSC discovery and design. In this Review, we trace the evolution of these computational methods and their application to OSCs, beginning with early quantum-chemical methods to investigate resonance in benzene and building to recent machine-learning (ML) techniques and their application to ever more sophisticated OSC scientific and engineering challenges. Along the way, we highlight the limitations of the methods and how sophisticated physical and mathematical frameworks have been created to overcome those limitations. We illustrate applications of these methods to a range of specific challenges in OSCs derived from π-conjugated polymers and molecules, including predicting charge-carrier transport, modeling chain conformations and bulk morphology, estimating thermomechanical properties, and describing phonons and thermal transport, to name a few. Through these examples, we demonstrate how advances in computational methods accelerate the deployment of OSCs in wide-ranging technologies, such as organic photovoltaics (OPVs), organic light-emitting diodes (OLEDs), organic thermoelectrics, organic batteries, and organic (bio)sensors. We conclude by providing an outlook for the future development of computational techniques to discover and assess the properties of high-performing OSCs with greater accuracy.
Affiliation(s)
- Vinayak Bhat
- Department of Chemistry & Center for Applied Energy Research, University of Kentucky, Lexington, Kentucky 40506-0055, United States
| | - Connor P Callaway
- Department of Chemistry & Center for Applied Energy Research, University of Kentucky, Lexington, Kentucky 40506-0055, United States
| | - Chad Risko
- Department of Chemistry & Center for Applied Energy Research, University of Kentucky, Lexington, Kentucky 40506-0055, United States
| |
|
27
|
Zhang Z, Wei X. Artificial intelligence-assisted selection and efficacy prediction of antineoplastic strategies for precision cancer therapy. Semin Cancer Biol 2023; 90:57-72. [PMID: 36796530 DOI: 10.1016/j.semcancer.2023.02.005] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 10/19/2022] [Revised: 01/12/2023] [Accepted: 02/13/2023] [Indexed: 02/16/2023]
Abstract
The rapid development of artificial intelligence (AI) technologies, together with the vast amount of collectable data obtained from high-throughput sequencing, has led to an unprecedented understanding of cancer and accelerated the advent of a new era of clinical oncology characterized by precision treatment and personalized medicine. However, the gains achieved by a variety of AI models in clinical oncology practice are far from what one would expect, and in particular, there are still many uncertainties in the selection of clinical treatment options that pose significant challenges to the application of AI in clinical oncology. In this review, we summarize emerging approaches, relevant datasets and open-source software of AI and show how to integrate them to address problems from clinical oncology and cancer research. We focus on the principles and procedures for identifying different antitumor strategies with the assistance of AI, including targeted cancer therapy, conventional cancer therapy, and cancer immunotherapy. In addition, we highlight the current challenges and directions of AI in clinical oncology translation. Overall, we hope this article will provide researchers and clinicians with a deeper understanding of the role and implications of AI in precision cancer therapy, and help AI move more quickly into accepted cancer guidelines.
Affiliation(s)
- Zhe Zhang
- Laboratory of Aging Research and Cancer Drug Target, State Key Laboratory of Biotherapy and Cancer Center, National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu 610041, PR China; State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, and Collaborative Innovation Center for Biotherapy, Chengdu 610041, PR China
| | - Xiawei Wei
- Laboratory of Aging Research and Cancer Drug Target, State Key Laboratory of Biotherapy and Cancer Center, National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu 610041, PR China.
| |
|
28
|
Guo X, Zhao L. A Systematic Survey on Deep Generative Models for Graph Generation. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023; 45:5370-5390. [PMID: 36251910 DOI: 10.1109/tpami.2022.3214832] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/16/2023]
Abstract
Graphs are important data representations for describing objects and their relationships, which appear in a wide diversity of real-world scenarios. As one of the critical problems in this area, graph generation considers learning the distributions of given graphs and generating more novel graphs. Despite their wide range of applications and rich history, generative models for graphs have traditionally been hand-crafted and capable of modeling only a few statistical properties of graphs. Recent advances in deep generative models for graph generation are an important step towards improving the fidelity of generated graphs and pave the way for new kinds of applications. This article provides an extensive overview of the literature in the field of deep generative models for graph generation. First, the formal definition of deep generative models for graph generation and the preliminary knowledge are provided. Second, taxonomies of deep generative models for both unconditional and conditional graph generation are proposed; the existing works of each are compared and analyzed. After that, an overview of the evaluation metrics in this specific domain is provided. Finally, the applications that deep graph generation enables are summarized and five promising future research directions are highlighted.
|
29
|
Liu X, Zhang W, Tong X, Zhong F, Li Z, Xiong Z, Xiong J, Wu X, Fu Z, Tan X, Liu Z, Zhang S, Jiang H, Li X, Zheng M. MolFilterGAN: a progressively augmented generative adversarial network for triaging AI-designed molecules. J Cheminform 2023; 15:42. [PMID: 37031191 PMCID: PMC10082991 DOI: 10.1186/s13321-023-00711-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2022] [Accepted: 03/14/2023] [Indexed: 04/10/2023] Open
Abstract
Artificial intelligence (AI)-based molecular design methods, especially deep generative models for generating novel molecule structures, have gratified our imagination to explore unknown chemical space without relying on brute-force exploration. However, whether designed by AI or human experts, the molecules need to be accessibly synthesized and biologically evaluated, and the trial-and-error process remains a resource-intensive endeavor. Therefore, AI-based drug design methods face a major challenge of how to prioritize the molecular structures with potential for subsequent drug development. This study indicates that common filtering approaches based on traditional screening metrics fail to differentiate AI-designed molecules. To address this issue, we propose a novel molecular filtering method, MolFilterGAN, based on a progressively augmented generative adversarial network. Comparative analysis shows that MolFilterGAN outperforms conventional screening approaches based on drug-likeness or synthetic ability metrics. Retrospective analysis of AI-designed discoidin domain receptor 1 (DDR1) inhibitors shows that MolFilterGAN significantly increases the efficiency of molecular triaging. Further evaluation of MolFilterGAN on eight external ligand sets suggests that MolFilterGAN is useful in triaging or enriching bioactive compounds across a wide range of target types. These results highlight the importance of MolFilterGAN in evaluating molecules integrally and further accelerating molecular discovery, especially when combined with advanced AI generative models.
Affiliation(s)
- Xiaohong Liu
- Shanghai Institute for Advanced Immunochemical Studies, and School of Life Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai, 201210, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- AlphaMa Inc., No. 108, Yuxin Road, Suzhou Industrial Park, Suzhou, 215128, China
| | - Wei Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Xiaochu Tong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Feisheng Zhong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Zhaojun Li
- AlphaMa Inc., No. 108, Yuxin Road, Suzhou Industrial Park, Suzhou, 215128, China
| | - Zhaoping Xiong
- Shanghai Institute for Advanced Immunochemical Studies, and School of Life Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai, 201210, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Jiacheng Xiong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Xiaolong Wu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China
| | - Zunyun Fu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Xiaoqin Tan
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- ByteDance AI Lab, No. 1999 Yishan Road, Shanghai, 201103, China
| | - Zhiguo Liu
- AlphaMa Inc., No. 108, Yuxin Road, Suzhou Industrial Park, Suzhou, 215128, China
| | - Sulin Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Hualiang Jiang
- Shanghai Institute for Advanced Immunochemical Studies, and School of Life Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai, 201210, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China.
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China.
| | - Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China.
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China.
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, 310024, Hangzhou, China.
| |
|
30
|
Chen Y, Wang Z, Wang L, Wang J, Li P, Cao D, Zeng X, Ye X, Sakurai T. Deep generative model for drug design from protein target sequence. J Cheminform 2023; 15:38. [PMID: 36978179 PMCID: PMC10052801 DOI: 10.1186/s13321-023-00702-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 07/31/2022] [Accepted: 02/18/2023] [Indexed: 03/30/2023] Open
Abstract
Drug discovery for a protein target is a laborious and costly process. Deep learning (DL) methods have been applied to drug discovery and successfully generated novel molecular structures, and they can substantially reduce development time and costs. However, most of them rely on prior knowledge, either by drawing on the structure and properties of known molecules to generate similar candidate molecules or extracting information on the binding sites of protein pockets to obtain molecules that can bind to them. In this paper, DeepTarget, an end-to-end DL model, was proposed to generate novel molecules solely relying on the amino acid sequence of the target protein to reduce the heavy reliance on prior knowledge. DeepTarget includes three modules: Amino Acid Sequence Embedding (AASE), Structural Feature Inference (SFI), and Molecule Generation (MG). AASE generates embeddings from the amino acid sequence of the target protein. SFI infers the potential structural features of the synthesized molecule, and MG seeks to construct the eventual molecule. The validity of the generated molecules was demonstrated by a benchmark platform of molecular generation models. The interaction between the generated molecules and the target proteins was also verified on the basis of two metrics, drug-target affinity and molecular docking. The results of the experiments indicated the efficacy of the model for direct molecule generation solely conditioned on amino acid sequence.
Affiliation(s)
- Yangyang Chen
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan.
| | - Zixu Wang
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
| | - Lei Wang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, China
| | - Jianmin Wang
- The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon, 21983, Republic of Korea
- Bioinformatics and Molecular Design Research Center (BMDRC), Incheon, 21983, Republic of Korea
| | - Pengyong Li
- School of Computer Science and Technology, Xidian University, Xian, 710071, China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, China.
| | - Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, People's Republic of China.
| | - Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan.
| | - Tetsuya Sakurai
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
| |
|
31
|
Yao L, Yang M, Song J, Yang Z, Sun H, Shi H, Liu X, Ji X, Deng Y, Wang X. Conditional Molecular Generation Net Enables Automated Structure Elucidation Based on 13C NMR Spectra and Prior Knowledge. Anal Chem 2023; 95:5393-5401. [PMID: 36926883 DOI: 10.1021/acs.analchem.2c05817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
Abstract
Structure elucidation of unknown compounds based on nuclear magnetic resonance (NMR) remains a challenging problem in both synthetic organic and natural product chemistry. Library matching has been an efficient method to assist structure elucidation. However, it is limited by the coverage of libraries. In addition, prior knowledge such as molecular fragments is neglected. To solve the problem, we propose a conditional molecular generation net (CMGNet) to allow input of multiple sources of information. CMGNet takes as input not only 13C NMR spectrum data but also molecular formulas and molecular fragments as conditions. Our model applies large-scale pretraining for molecular understanding and fine-tuning on two NMR spectral data sets of different granularity levels to accommodate structure elucidation tasks. CMGNet generates structures based on 13C NMR data, molecular formula, and fragment information, with a recovery rate of 94.17% in the top 10 recommendations. In addition, the generative model performed well in the generation of various classes of compounds and in the structural revision task. CMGNet has a deep understanding of molecular connectivities from 13C NMR, molecular formula, and fragments, paving the way for a new paradigm of deep learning-assisted inverse problem-solving.
Affiliation(s)
- Lin Yao
- CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China
| | - Minjian Yang
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
| | - Jianfei Song
- CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China
| | - Zhuo Yang
- CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China
| | - Hanyu Sun
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
| | - Hui Shi
- CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China
| | - Xue Liu
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
| | - Xiangyang Ji
- Department of Automation, Tsinghua University, Beijing 100084, China
| | - Yafeng Deng
- CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China
- Department of Automation, Tsinghua University, Beijing 100084, China
| | - Xiaojian Wang
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China
| |
|
32
|
Joel IY, Sulaimon LA, Idris MO, Adigun TO, Adisa RA, Ademoye TA, Ogunleye MO, Olaniyi TO. Descriptor-free QSAR: effectiveness in screening for putative inhibitors of FGFR1. J Biomol Struct Dyn 2023; 41:2016-2032. [PMID: 35073829 DOI: 10.1080/07391102.2022.2026248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2023]
Abstract
The long short-term memory (LSTM) algorithm has provided solutions to the limitations of descriptor-based QSAR models in drug design. However, the direct application of LSTM remains scarce. This study investigated the effectiveness of a descriptor-free QSAR (LSTM-SM) in modeling the FGFR1 inhibitors dataset, using two conventional descriptor-based QSAR models (126-bit Morgan fingerprint and 2D descriptors, respectively) as baselines. The validated descriptor-free QSAR model was thereafter used to screen for active FGFR1 inhibitors in the ChemDiv database, and the hits were subjected to molecular docking, induced-fit docking, QM-MM optimization, and molecular dynamics simulations to filter for compounds with high binding affinity and suggest the putative mechanism of inhibition and specificity. The LSTM-SM model performed better than conventional QSAR, having accuracy, specificity, and sensitivity of 0.92, model loss of 0.025, and AUC of 0.95. Fifteen thousand compounds were predicted as actives from the ChemDiv database and four compounds were finally selected. Of the four, two showed putatively effective binding interactions with key active site residues. Molecular dynamics simulations on these compounds in complex with the receptor further give insight into the conformational dynamics of each compound bound to the receptor. The complexes formed are stable and exhibit a similar degree of compactness. Our findings predicted the advent of self-feature extracting machine learning algorithms of compounds, and have provided the possibility of better predictive model quality that is not necessarily limited by compound descriptors. The putative FGFR1 inhibitors, with their mechanism of inhibition and specificity, were elucidated using this approach. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- I Y Joel
- University of Ilorin Molecular Diagnostic and Research Laboratory, Ilorin, Kwara State, Nigeria
| | - L A Sulaimon
- Department of Biochemistry, Faculty of Basic Medical Sciences, College of Medicine University of Lagos, Idi-araba, Lagos, Nigeria
| | - M O Idris
- School of Life Sciences, University of Science and Technology of China, Hefei, China
| | - T O Adigun
- University of Ilorin Molecular Diagnostic and Research Laboratory, Ilorin, Kwara State, Nigeria
| | - R A Adisa
- Department of Biochemistry, Faculty of Basic Medical Sciences, College of Medicine University of Lagos, Idi-araba, Lagos, Nigeria
| | - T A Ademoye
- Department of Biochemistry, Faculty of Basic Medical Sciences, College of Medicine University of Lagos, Idi-araba, Lagos, Nigeria
| | - M O Ogunleye
- Department of Biochemistry, Faculty of Basic Medical Sciences, College of Medicine University of Lagos, Idi-araba, Lagos, Nigeria
| | - T O Olaniyi
- Department of Science Laboratory Technology, Faculty of Science, Oyo State College of Agriculture and Technology, Igbo-ora, Oyo, Nigeria
| |
|
33
|
Wang L, Song Y, Wang H, Zhang X, Wang M, He J, Li S, Zhang L, Li K, Cao L. Advances of Artificial Intelligence in Anti-Cancer Drug Design: A Review of the Past Decade. Pharmaceuticals (Basel) 2023; 16:253. [PMID: 37259400 PMCID: PMC9963982 DOI: 10.3390/ph16020253] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 12/29/2022] [Revised: 01/25/2023] [Accepted: 02/06/2023] [Indexed: 10/13/2023] Open
Abstract
Anti-cancer drug design has been acknowledged as a complicated, expensive, time-consuming, and challenging task. How to reduce the research costs and speed up the development process of anti-cancer drug designs has become a challenging and urgent question for the pharmaceutical industry. Computer-aided drug design methods have played a major role in the development of cancer treatments for over three decades. Recently, artificial intelligence has emerged as a powerful and promising technology for faster, cheaper, and more effective anti-cancer drug designs. This narrative review covers a wide range of applications of artificial intelligence-based methods in anti-cancer drug design. We further clarify the fundamental principles of these methods, along with their advantages and disadvantages. Furthermore, we collate a large number of databases, including the omics database, the epigenomics database, the chemical compound database, and drug databases. Other researchers can consider them and adapt them to their own requirements.
Affiliation(s)
- Kang Li
- Department of Biostatistics, School of Public Health, Harbin Medical University, Harbin 150081, China
| | - Lei Cao
- Department of Biostatistics, School of Public Health, Harbin Medical University, Harbin 150081, China
| |
|
34
|
Chen L, Yu L, Gao L. Potent antibiotic design via guided search from antibacterial activity evaluations. Bioinformatics 2023; 39:7008322. [PMID: 36707990 PMCID: PMC9897189 DOI: 10.1093/bioinformatics/btad059] [Citation(s) in RCA: 39] [Impact Index Per Article: 39.0] [Received: 09/26/2022] [Revised: 01/14/2023] [Accepted: 01/25/2023] [Indexed: 01/29/2023] Open
Abstract
MOTIVATION: The emergence of drug-resistant bacteria makes the discovery of new antibiotics an urgent issue, but finding new molecules with the desired antibacterial activity is an extremely difficult task. To address this challenge, we established a framework, MDAGS (Molecular Design via Attribute-Guided Search), to optimize and generate potent antibiotic molecules.
RESULTS: By designing the antibacterial activity latent space and guiding the optimization of functional compounds based on this space, the model MDAGS can generate novel compounds with desirable antibacterial activity without the need for extensive expensive and time-consuming evaluations. Compared with existing antibiotics, candidate antibacterial compounds generated by MDAGS always possessed significantly better antibacterial activity and ensured high similarity. Furthermore, although without explicit constraints on similarity to known antibiotics, these candidate antibacterial compounds all exhibited the highest structural similarity to antibiotics of expected function in the DrugBank database query. Overall, our approach provides a viable solution to the problem of bacterial drug resistance.
AVAILABILITY AND IMPLEMENTATION: Code of the model and datasets can be downloaded from GitHub (https://github.com/LiangYu-Xidian/MDAGS).
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lu Chen
- School of Computer Science and Technology, Xidian University, Xi'an 710071, Shaanxi, China
| | - Liang Yu
- School of Computer Science and Technology, Xidian University, Xi'an 710071, Shaanxi, China
| | - Lin Gao
- School of Computer Science and Technology, Xidian University, Xi'an 710071, Shaanxi, China
| |
|
35
|
Kumar M, Nguyen TPN, Kaur J, Singh TG, Soni D, Singh R, Kumar P. Opportunities and challenges in application of artificial intelligence in pharmacology. Pharmacol Rep 2023; 75:3-18. [PMID: 36624355 PMCID: PMC9838466 DOI: 10.1007/s43440-022-00445-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 10/20/2022] [Revised: 12/23/2022] [Accepted: 12/25/2022] [Indexed: 01/11/2023]
Abstract
Artificial intelligence (AI) is a machine science that can mimic human behaviour such as intelligent analysis of data. AI functions with specialized algorithms and integrates with deep and machine learning. The digital world generates a huge amount of medical data every day. Therefore, we need an automated and reliable evaluation tool that can make decisions more accurately and faster. Machine learning has the potential to learn, understand and analyse the data used in healthcare systems. In the last few years, AI has been employed in various fields in pharmaceutical science, especially in pharmacological research. It helps in the analysis of preclinical (laboratory animals) and clinical (in human) trial data. AI also plays an important role in various processes such as drug discovery/manufacturing, diagnosis of big data for disease identification, personalized treatment, clinical trial research, radiotherapy, surgical robotics, smart electronic health records, and epidemic outbreak prediction. Moreover, AI has been used in the evaluation of biomarkers and diseases. In this review, we explain various models and general processes of machine learning and their role in pharmacological science. Therefore, AI with deep learning and machine learning could be relevant in pharmacological research.
Affiliation(s)
- Mandeep Kumar
- Department of Pharmacy, Unit of Pharmacology and Toxicology, University of Genoa, Genoa, Italy
| | - T P Nhung Nguyen
- Department of Pharmacy, Unit of Pharmacology and Toxicology, University of Genoa, Genoa, Italy
- Department of Pharmacy, Da Nang University of Medical Technology and Pharmacy, Da Nang, Vietnam
| | - Jasleen Kaur
- Department of Pharmacology and Toxicology, National Institute of Pharmaceutical Education and Research (NIPER), Lucknow, Uttar Pradesh, 226002, India
| | - Divya Soni
- Department of Pharmacology, Central University of Punjab, Ghudda, Bathinda, Punjab, 151401, India
| | - Randhir Singh
- Department of Pharmacology, Central University of Punjab, Ghudda, Bathinda, Punjab, 151401, India
| | - Puneet Kumar
- Department of Pharmacology, Central University of Punjab, Ghudda, Bathinda, Punjab, 151401, India.
| |
|
36
|
Duran-Frigola M, Cigler M, Winter GE. Advancing Targeted Protein Degradation via Multiomics Profiling and Artificial Intelligence. J Am Chem Soc 2023; 145:2711-2732. [PMID: 36706315 PMCID: PMC9912273 DOI: 10.1021/jacs.2c11098] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 01/28/2023]
Abstract
Only around 20% of the human proteome is considered to be druggable with small-molecule antagonists. This leaves some of the most compelling therapeutic targets outside the reach of ligand discovery. The concept of targeted protein degradation (TPD) promises to overcome some of these limitations. In brief, TPD is dependent on small molecules that induce the proximity between a protein of interest (POI) and an E3 ubiquitin ligase, causing ubiquitination and degradation of the POI. In this perspective, we want to reflect on current challenges in the field, and discuss how advances in multiomics profiling, artificial intelligence, and machine learning (AI/ML) will be vital in overcoming them. The presented roadmap is discussed in the context of small-molecule degraders but is equally applicable for other emerging proximity-inducing modalities.
Affiliation(s)
- Miquel Duran-Frigola
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Ersilia Open Source Initiative, 28 Belgrave Road, CB1 3DE, Cambridge, United Kingdom
| | - Marko Cigler
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
| | - Georg E. Winter
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
| |
|
37
|
Abate C, Decherchi S, Cavalli A. Graph neural networks for conditional de novo drug design. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2023. [DOI: 10.1002/wcms.1651] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2023]
Affiliation(s)
- Carlo Abate
- Fondazione Istituto Italiano di Tecnologia, Genoa, Italy
- Università degli Studi di Bologna, Bologna, Italy
| | | | - Andrea Cavalli
- Fondazione Istituto Italiano di Tecnologia, Genoa, Italy
- Università degli Studi di Bologna, Bologna, Italy
| |
|
38
|
Artificial intelligence in cancer research and precision medicine: Applications, limitations and priorities to drive transformation in the delivery of equitable and unbiased care. Cancer Treat Rev 2023; 112:102498. [PMID: 36527795 DOI: 10.1016/j.ctrv.2022.102498] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 11/15/2022] [Revised: 12/03/2022] [Accepted: 12/06/2022] [Indexed: 12/14/2022]
Abstract
Artificial intelligence (AI) has experienced explosive growth in oncology and related specialties in recent years. The improved expertise in data capture, the increased capacity for data aggregation and analytic power, along with decreasing costs of genome sequencing and related biologic "omics", set the foundation and need for novel tools that can meaningfully process these data from multiple sources and of varying types. These advances provide value across biomedical discovery, diagnosis, prognosis, treatment, and prevention, in a multimodal fashion. However, while big data and AI tools have already revolutionized many fields, medicine has partially lagged due to its complexity and multi-dimensionality, leading to technical challenges in developing and validating solutions that generalize to diverse populations. Indeed, inner biases and miseducation of algorithms, in view of their implementation in daily clinical practice, are increasingly relevant concerns; critically, it is possible for AI to mirror the unconscious biases of the humans who generated these algorithms. Therefore, to avoid worsening existing health disparities, it is critical to employ a thoughtful, transparent, and inclusive approach that involves addressing bias in algorithm design and implementation along the cancer care continuum. In this review, a broad landscape of major applications of AI in cancer care is provided, with a focus on cancer research and precision medicine. Major challenges posed by the implementation of AI in the clinical setting will be discussed. Potentially feasible solutions for mitigating bias are provided, in the light of promoting cancer health equity.
|
39
|
Noguchi S, Inoue J. Exploration of Chemical Space Guided by PixelCNN for Fragment-Based De Novo Drug Discovery. J Chem Inf Model 2022; 62:5988-6001. [PMID: 36454646 DOI: 10.1021/acs.jcim.2c01345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2022]
Abstract
We report a novel framework for achieving fragment-based molecular design using pixel convolutional neural network (PixelCNN) combined with the simplified molecular input line entry system (SMILES) as molecular representation. While a widely used recurrent neural network (RNN) assumes monotonically decaying correlations in strings, PixelCNN captures a periodicity among characters of SMILES. Thus, PixelCNN provides us with a novel solution for the analysis of chemical space by extracting the periodicity of molecular structures that will be buried in SMILES. Moreover, this characteristic enables us to generate molecules by combining several simple building blocks, such as a benzene ring and side-chain structures, which contributes to the effective exploration of chemical space by step-by-step searching for molecules from a target fragment. In conclusion, PixelCNN could be a powerful approach focusing on the periodicity of molecules to explore chemical space for the fragment-based molecular design.
Affiliation(s)
- Satoshi Noguchi
- Department of Advanced Interdisciplinary Studies, The University of Tokyo, 4-6-1 Komaba, Meguro, Tokyo 153-8904, Japan
| | - Junya Inoue
- Institute for Industrial Science, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-0082, Japan
- Department of Materials Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo, Tokyo 113-8656, Japan
- Research Center for Advanced Science and Technology, The University of Tokyo, 4-6-1 Komaba, Meguro, Tokyo 153-8904, Japan
| |
|
40
|
Urbina F, Ekins S. The Commoditization of AI for Molecule Design. ARTIFICIAL INTELLIGENCE IN THE LIFE SCIENCES 2022; 2:100031. [PMID: 36211981 PMCID: PMC9541920 DOI: 10.1016/j.ailsci.2022.100031] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 02/03/2023]
Abstract
Anyone involved in designing or finding molecules in the life sciences over the past few years has witnessed a dramatic change in how we now work due to the COVID-19 pandemic. Computational technologies like artificial intelligence (AI) seemed to become ubiquitous in 2020 and have been increasingly applied as scientists worked from home and were separated from the laboratory and their colleagues. This shift may be more permanent, as the future of molecule design across different industries will increasingly require machine learning models for the design and optimization of molecules as they become "designed by AI". AI and machine learning have essentially become a commodity within the pharmaceutical industry. This perspective briefly describes our personal opinions of how machine learning has evolved and is being applied to model different molecule properties whose utility crosses industries, and ultimately suggests the potential for tight integration of AI into equipment and automated experimental pipelines. It also describes how many groups have implemented generative models, covering different architectures, for de novo design of molecules. We also highlight some of the companies at the forefront of using AI to demonstrate how machine learning has impacted and influenced our work. Finally, we peer into the future and suggest some of the areas that represent the most interesting technologies that may shape the future of molecule design, highlighting how we can help increase the efficiency of the design-make-test cycle, which is currently a major focus across industries.
Affiliation(s)
- Fabio Urbina
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
| | - Sean Ekins
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
| |
|
41
|
Li S, Wang X, Wu Y, Duan H, Tang L. Generation of novel Diels-Alder reactions using a generative adversarial network. RSC Adv 2022; 12:33801-33807. [PMID: 36505715 PMCID: PMC9693912 DOI: 10.1039/d2ra06022a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2022] [Accepted: 11/07/2022] [Indexed: 11/27/2022] Open
Abstract
Deep learning has enormous potential in the chemical and pharmaceutical fields, and generative adversarial networks (GANs) in particular have exhibited remarkable performance in the field of molecular generation as generative models. However, their application in the field of organic chemistry has been limited; thus, in this study, we attempt to utilize a GAN as a generative model for the generation of Diels-Alder reactions. A MaskGAN model was trained with 14 092 Diels-Alder reactions, and 1441 novel Diels-Alder reactions were generated. Analysis of the generated reactions indicated that the model learned several reaction rules in-depth. Thus, the MaskGAN model can be used to generate organic reactions and aid chemists in the exploration of novel reactions.
Affiliation(s)
- Sheng Li
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, P. R. China
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, P. R. China
| | - Xinqiao Wang
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, P. R. China
| | - Yejian Wu
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, P. R. China
| | - Hongliang Duan
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, P. R. China
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica (SIMM), Chinese Academy of Sciences, Shanghai 201203, China
| | - Lan Tang
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, P. R. China
| |
|
42
|
Nguyen MT, Nguyen T, Tran T. Learning to discover medicines. INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS 2022; 16:1-16. [PMID: 36440369 PMCID: PMC9676887 DOI: 10.1007/s41060-022-00371-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Accepted: 11/05/2022] [Indexed: 11/19/2022]
Abstract
Discovering new medicines is the hallmark of the human endeavor to live a better and longer life. Yet the pace of discovery has slowed down as we need to venture into more wildly unexplored biomedical space to find medicines that match today's high standards. Modern AI, enabled by powerful computing, large biomedical databases, and breakthroughs in deep learning, offers new hope of breaking this loop, as it is rapidly maturing and ready to make a huge impact in the area. In this paper, we review recent advances in AI methodologies that aim to crack this challenge. We organize the vast and rapidly growing literature on AI for drug discovery into three relatively stable sub-areas: (a) representation learning over molecular sequences and geometric graphs; (b) data-driven reasoning, where we predict molecular properties and their binding, optimize existing compounds, generate de novo molecules, and plan the synthesis of target molecules; and (c) knowledge-based reasoning, where we discuss the construction of and reasoning over biomedical knowledge graphs. We also identify open challenges and chart possible research directions for the years to come.
Affiliation(s)
- Minh-Tri Nguyen
- Applied Artificial Intelligence Institute, Deakin University, Burwood, VIC, Australia
| | - Thin Nguyen
- Applied Artificial Intelligence Institute, Deakin University, Burwood, VIC, Australia
| | - Truyen Tran
- Applied Artificial Intelligence Institute, Deakin University, Burwood, VIC, Australia
| |
|
43
|
Kumar R, Sharma A, Alexiou A, Ashraf GM. Artificial Intelligence in De novo Drug Design: Are We Still There? Curr Top Med Chem 2022; 22:2483-2492. [PMID: 36263480 DOI: 10.2174/1568026623666221017143244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2022] [Revised: 09/06/2022] [Accepted: 09/15/2022] [Indexed: 01/20/2023]
Abstract
BACKGROUND: The artificial intelligence (AI)-assisted design of drug candidates with novel structures and desired properties has received significant attention in the recent past, as have related areas of forward prediction that aim to discover chemical matter worth synthesizing and investigating experimentally. OBJECTIVES: The purpose behind developing AI-driven models is to explore the broader chemical space and suggest new drug candidate scaffolds with promising therapeutic value. Moreover, it is anticipated that such AI-based models may not only significantly reduce cost and time but also decrease the attrition rate of drug candidates that fail to reach the desirable endpoints at the final stages of drug development. In an attempt to develop AI-based models for de novo drug design, numerous methods have been proposed by various study groups by applying machine learning and deep learning algorithms to chemical datasets. However, there are many challenges in obtaining accurate predictions, and real breakthroughs in de novo drug design are still scarce. METHODS: In this review, we explore the recent trends in developing AI-based models for de novo drug design to assess the current status, challenges, and opportunities in the field. CONCLUSION: The consistently improving AI algorithms and the abundance of curated chemical training data indicate that AI-based de novo drug design should perform better than the current models. Improvements in performance are warranted to obtain better outcomes in the form of potential drug candidates that perform well under in vivo conditions, especially in the case of more complex diseases.
Affiliation(s)
- Rajnish Kumar
- Amity Institute of Biotechnology, Amity University Uttar Pradesh Lucknow Campus, Uttar Pradesh, India
| | - Anju Sharma
- Department of Applied Science, Indian Institute of Information Technology, Allahabad, Uttar Pradesh, India
| | - Athanasios Alexiou
- Novel Global Community Educational Foundation, Hebersham, 2770 NSW, Australia
- AFNP Med Austria, 1010 Wien, Austria
| | - Ghulam Md Ashraf
- Pre-Clinical Research Unit (PCRU), King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia
- Department of Medical Laboratory Technology, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
| |
|
44
|
Deep generative molecular design reshapes drug discovery. Cell Rep Med 2022; 3:100794. [PMID: 36306797 PMCID: PMC9797947 DOI: 10.1016/j.xcrm.2022.100794] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 05/05/2022] [Revised: 08/05/2022] [Accepted: 09/30/2022] [Indexed: 11/05/2022]
Abstract
Recent advances and accomplishments of artificial intelligence (AI) and deep generative models have established their usefulness in medicinal applications, especially in drug discovery and development. To correctly apply AI, the developer and user face questions such as which protocols to consider, which factors to scrutinize, and how the deep generative models can integrate the relevant disciplines. This review summarizes classical and newly developed AI approaches, providing an updated and accessible guide to the broad computational drug discovery and development community. We introduce deep generative models from different standpoints and describe the theoretical frameworks for representing chemical and biological structures and their applications. We discuss the data and technical challenges and highlight future directions of multimodal deep generative models for accelerating drug discovery.
|
45
|
Interpretable Machine Learning Models for Molecular Design of Tyrosine Kinase Inhibitors Using Variational Autoencoders and Perturbation-Based Approach of Chemical Space Exploration. Int J Mol Sci 2022; 23:ijms231911262. [PMID: 36232566 PMCID: PMC9569663 DOI: 10.3390/ijms231911262] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/29/2022] [Revised: 09/21/2022] [Accepted: 09/21/2022] [Indexed: 11/17/2022] Open
Abstract
In the current study, we introduce an integrative machine learning strategy for the autonomous molecular design of protein kinase inhibitors using variational autoencoders and a novel cluster-based perturbation approach for exploration of the chemical latent space. The proposed strategy combines autoencoder-based embedding of small molecules with a cluster-based perturbation approach for efficient navigation of the latent space and a feature-based kinase inhibition likelihood classifier that guides optimization of the molecular properties and targeted molecular design. In the proposed generative approach, molecules sharing similar structures tend to cluster in the latent space, and interpolating between two molecules in the latent space enables smooth changes in the molecular structures and properties. The results demonstrated that the proposed strategy can efficiently explore the latent space of small molecules and kinase inhibitors along interpretable directions to guide the generation of novel family-specific kinase molecules that display a significant scaffold diversity and optimal biochemical properties. Through assessment of the latent-based and chemical feature-based binary and multiclass classifiers, we developed a robust probabilistic evaluator of kinase inhibition likelihood that is specifically tailored to guide the molecular design of novel SRC kinase molecules. The generated molecules originating from LCK and ABL1 kinase inhibitors yielded ~40% of novel and valid SRC kinase compounds with high kinase inhibition likelihood probability values (p > 0.75) and high similarity (Tanimoto coefficient > 0.6) to the known SRC inhibitors. 
By combining the molecular perturbation design with the kinase inhibition likelihood analysis and similarity assessments, we showed that the proposed molecular design strategy can produce novel valid molecules and transform known inhibitors of different kinase families into potential chemical probes of the SRC kinase with excellent physicochemical profiles and high similarity to the known SRC kinase drugs. The results of our study suggest that task-specific manipulation of a biased latent space may be an important direction for more effective task-oriented and target-specific autonomous chemical design models.
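The two acceptance thresholds quoted in this abstract (inhibition likelihood p > 0.75 and Tanimoto coefficient > 0.6 against known SRC inhibitors) can be illustrated with a minimal sketch. It assumes each molecule is already represented as a set of "on" fingerprint bit indices; the function names `tanimoto` and `passes_filter` are hypothetical and are not taken from the cited study, whose fingerprint type and classifier are not specified here.

```python
from typing import Iterable, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch when both fingerprints are empty
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def passes_filter(
    candidate: Set[int],
    known_inhibitors: Iterable[Set[int]],
    p_inhibit: float,
    sim_cutoff: float = 0.6,   # similarity threshold quoted in the abstract
    p_cutoff: float = 0.75,    # likelihood threshold quoted in the abstract
) -> bool:
    """Keep a candidate only if both thresholds are exceeded."""
    best = max((tanimoto(candidate, ref) for ref in known_inhibitors), default=0.0)
    return p_inhibit > p_cutoff and best > sim_cutoff


if __name__ == "__main__":
    known = [{1, 2, 3}, {7, 8, 9}]
    print(passes_filter({1, 2, 3, 4}, known, p_inhibit=0.8))
```

In practice the bit sets would come from a cheminformatics toolkit and the probability from a trained classifier; both are stand-ins in this sketch.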
|
46
|
Mehta S, Goel M, Priyakumar UD. MO-MEMES: A method for accelerating virtual screening using multi-objective Bayesian optimization. Front Med (Lausanne) 2022; 9:916481. [PMID: 36213671 PMCID: PMC9537730 DOI: 10.3389/fmed.2022.916481] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/09/2022] [Accepted: 08/29/2022] [Indexed: 11/13/2022] Open
Abstract
The pursuit of potential inhibitors for novel targets has become a very important problem, especially over the last 2 years with the world in the midst of the COVID-19 pandemic. This entails performing high-throughput screening exercises on drug libraries to identify potential “hits”. These hits are identified by analyzing their physical properties, such as binding affinity to the target receptor, octanol-water partition coefficient (LogP), and more. However, drug libraries can be extremely large, and it is infeasible to calculate and analyze the physical properties of each molecule within an acceptable time; moreover, each molecule must possess a multitude of properties apart from just binding affinity. To address this problem, in this study, we propose an extension of the Machine learning framework for Enhanced MolEcular Screening (MEMES) to multi-objective Bayesian optimization. This approach is capable of identifying over 90% of the most desirable molecules with respect to all required properties while explicitly calculating the values of each of those properties on only 6% of the entire drug library. This framework would provide an immense boost in identifying potential hits that possess all the properties required of a drug molecule.
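As a rough illustration of what "most desirable with respect to all required properties" can mean in a multi-objective setting, the sketch below selects the non-dominated (Pareto-optimal) molecules from a table of property scores. It assumes every objective has been oriented so that larger is better, and it is not the MO-MEMES algorithm itself: the surrogate model and acquisition step of Bayesian optimization are omitted, and the names `dominates` and `pareto_front` are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return the indices of the points that no other point dominates."""
    return [
        i
        for i, p in enumerate(scores)
        if not any(dominates(q, p) for j, q in enumerate(scores) if j != i)
    ]


if __name__ == "__main__":
    # Each tuple is (objective 1, objective 2) for one molecule; the values are made up.
    library = [(1, 5), (2, 4), (1, 4), (3, 1), (2, 2)]
    print(pareto_front(library))
```

This brute-force check is quadratic in the number of molecules, which is acceptable for a small candidate batch but not for a full drug library; that cost is one reason screening methods evaluate only a subset explicitly.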
|
47
|
García-Ortegón M, Simm GNC, Tripp AJ, Hernández-Lobato JM, Bender A, Bacallado S. DOCKSTRING: Easy Molecular Docking Yields Better Benchmarks for Ligand Design. J Chem Inf Model 2022; 62:3486-3502. [PMID: 35849793 PMCID: PMC9364321 DOI: 10.1021/acs.jcim.1c01334] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 11/03/2021] [Indexed: 01/05/2023]
Abstract
The field of machine learning for drug discovery is witnessing an explosion of novel methods. These methods are often benchmarked on simple physicochemical properties such as solubility or general druglikeness, which can be readily computed. However, these properties are poor representatives of objective functions in drug design, mainly because they do not depend on the candidate compound's interaction with the target. By contrast, molecular docking is a widely applied method in drug discovery to estimate binding affinities. However, docking studies require a significant amount of domain knowledge to set up correctly, which hampers adoption. Here, we present dockstring, a bundle for meaningful and robust comparison of ML models using docking scores. dockstring consists of three components: (1) an open-source Python package for straightforward computation of docking scores, (2) an extensive dataset of docking scores and poses of more than 260,000 molecules for 58 medically relevant targets, and (3) a set of pharmaceutically relevant benchmark tasks such as virtual screening or de novo design of selective kinase inhibitors. The Python package implements a robust ligand and target preparation protocol that allows nonexperts to obtain meaningful docking scores. Our dataset is the first to include docking poses, as well as the first of its size that is a full matrix, thus facilitating experiments in multiobjective optimization and transfer learning. Overall, our results indicate that docking scores are a more realistic evaluation objective than simple physicochemical properties, yielding benchmark tasks that are more challenging and more closely related to real problems in drug discovery.
Affiliation(s)
- Miguel García-Ortegón
- Statistical Laboratory, Centre for Mathematical Sciences, University of Cambridge, Wilberforce Rd., Cambridge CB3 0WB, United Kingdom
| | - Gregor N. C. Simm
- Department of Engineering, University of Cambridge, Trumpington St., Cambridge CB2 1PZ, United Kingdom
| | - Austin J. Tripp
- Department of Engineering, University of Cambridge, Trumpington St., Cambridge CB2 1PZ, United Kingdom
| | | | - Andreas Bender
- Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Rd., Cambridge CB2 1EW, United Kingdom
| | - Sergio Bacallado
- Statistical Laboratory, Centre for Mathematical Sciences, University of Cambridge, Wilberforce Rd., Cambridge CB3 0WB, United Kingdom
| |
|
48
|
Lim S, Lee S, Piao Y, Choi M, Bang D, Gu J, Kim S. On modeling and utilizing chemical compound information with deep learning technologies: A task-oriented approach. Comput Struct Biotechnol J 2022; 20:4288-4304. [PMID: 36051875 PMCID: PMC9399946 DOI: 10.1016/j.csbj.2022.07.049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2022] [Revised: 07/29/2022] [Accepted: 07/29/2022] [Indexed: 11/22/2022] Open
Abstract
A large number of chemical compounds are available in databases such as PubChem and ZINC. However, currently known compounds, though large in number, represent only a fraction of all possible compounds, the set of which is known as chemical space. Many of the compounds in these databases are annotated with properties and assay data that can be used for drug discovery efforts. For this goal, a number of machine learning algorithms have been developed, and recent deep learning technologies can be effectively used to navigate chemical space, especially for unknown chemical compounds, in terms of drug-related tasks. In this article, we survey how deep learning technologies can model and utilize chemical compound information in a task-oriented way by exploiting annotated properties and assay data in chemical compound databases. We first compile the kinds of tasks that machine learning methods try to accomplish. Then, we survey deep learning technologies to show their modeling power and current applications for accomplishing drug-related tasks. Next, we survey deep learning techniques that address the insufficiency of annotated data for more effective navigation of chemical space. Chemical compound information alone may not be powerful enough for drug-related tasks, so we survey which kinds of information, such as assay and gene expression data, can be used to improve the predictive power of deep learning models. Finally, we conclude this survey with four important newly developed technologies that are yet to be fully incorporated into the computational analysis of chemical information.
Affiliation(s)
- Sangsoo Lim
- Bioinformatics Institute, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
| | - Sangseon Lee
- Institute of Computer Technology, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
| | - Yinhua Piao
- Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
| | - MinGyu Choi
- Department of Chemistry, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
- AIGENDRUG Co., Ltd., Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
| | - Dongmin Bang
- Interdisciplinary Program in Bioinformatics, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
| | - Jeonghyeon Gu
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
| | - Sun Kim
- Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
- MOGAM Institute for Biomedical Research, Yong-in 16924, South Korea
- AIGENDRUG Co., Ltd., Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
| |
|
49
|
Menon D, Ranganathan R. A Generative Approach to Materials Discovery, Design, and Optimization. ACS OMEGA 2022; 7:25958-25973. [PMID: 35936396 PMCID: PMC9352221 DOI: 10.1021/acsomega.2c03264] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/25/2022] [Accepted: 07/11/2022] [Indexed: 05/25/2023]
Abstract
Despite its potential to transform society, materials research suffers from a major drawback: its long research timeline. Recently, machine-learning techniques have emerged as a viable solution to this drawback and have shown accuracies comparable to other computational techniques like density functional theory (DFT) at a fraction of the computational time. One particular class of machine-learning models, known as "generative models", is of particular interest owing to its ability to approximate high-dimensional probability distribution functions, which in turn can be used to generate novel data such as molecular structures by sampling these approximated probability distribution functions. This review article aims to provide an in-depth understanding of the underlying mathematical principles of popular generative models such as recurrent neural networks, variational autoencoders, and generative adversarial networks and discuss their state-of-the-art applications in the domains of biomaterials and organic drug-like materials, energy materials, and structural materials. Here, we discuss a broad range of applications of these models spanning from the discovery of drugs that treat cancer to finding the first room-temperature superconductor and from the discovery and optimization of battery and photovoltaic materials to the optimization of high-entropy alloys. We conclude by presenting a brief outlook of the major challenges that lie ahead for the mainstream usage of these models for materials research.
|
50
|
Guo M, Shou W, Makatura L, Erps T, Foshey M, Matusik W. Polygrammar: Grammar for Digital Polymer Representation and Generation. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2022; 9:e2101864. [PMID: 35678650 PMCID: PMC9376847 DOI: 10.1002/advs.202101864] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/05/2021] [Revised: 12/04/2021] [Indexed: 05/22/2023]
Abstract
Polymers are widely studied materials with diverse properties and applications determined by molecular structures. It is essential to represent these structures clearly and explore the full space of achievable chemical designs. However, existing approaches cannot offer comprehensive design models for polymers because of their inherent scale and structural complexity. Here, a parametric, context-sensitive grammar designed specifically for polymers (PolyGrammar) is proposed. Using the symbolic hypergraph representation and 14 simple production rules, PolyGrammar can represent and generate all valid polyurethane structures. An algorithm is presented to translate any polyurethane structure from the popular Simplified Molecular-Input Line-entry System (SMILES) string format into the PolyGrammar representation. The representative power of PolyGrammar is tested by translating a dataset of over 600 polyurethane samples collected from the literature. Furthermore, it is shown that PolyGrammar can be easily extended to other copolymers and homopolymers. By offering a complete, explicit representation scheme and an explainable generative model with validity guarantees, PolyGrammar takes an essential step toward a more comprehensive and practical system for polymer discovery and exploration. As the first bridge between formal languages and chemistry, PolyGrammar also serves as a critical blueprint to inform the design of similar grammars for other chemistries, including organic and inorganic molecules.
Affiliation(s)
- Minghao Guo
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- CUHK Multimedia Lab, The Chinese University of Hong Kong, Sha Tin, Hong Kong
| | - Wan Shou
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Liane Makatura
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Timothy Erps
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Michael Foshey
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Wojciech Matusik
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| |
|