1
Matsukiyo Y, Tengeiji A, Li C, Yamanishi Y. Transcriptionally Conditional Recurrent Neural Network for De Novo Drug Design. J Chem Inf Model 2024. [PMID: 39049516 DOI: 10.1021/acs.jcim.4c00531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/27/2024]
Abstract
Computational molecular generation methods that generate chemical structures from gene expression profiles have been actively developed for de novo drug design. However, most omics-based methods involve complex models consisting of multiple neural networks, which require pretraining. In this study, we propose a straightforward molecular generation method called GxRNN (gene expression profile-based recurrent neural network), employing a single recurrent neural network (RNN) that necessitates no pretraining for omics-based drug design. Specifically, our method utilizes the desired gene expression profile as input for the RNN, conditioning it to generate molecules likely to induce a similar profile. In a case study involving ten target proteins, GxRNN exhibited superior structural reproducibility of known ligands, surpassing several existing methods. This advancement positions our proposed method as a promising tool for facilitating de novo drug design.
Affiliation(s)
- Yuki Matsukiyo
  - Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
  - Department of Complex Systems Science, Graduate School of Informatics, Nagoya University, Chikusa, Nagoya, Aichi 464-8601, Japan
- Atsushi Tengeiji
  - Modality Research Laboratories I, Daiichi Sankyo Co., Ltd., 1-2-58 Hiromachi, Shinagawa, Tokyo 140-8710, Japan
- Chen Li
  - Department of Complex Systems Science, Graduate School of Informatics, Nagoya University, Chikusa, Nagoya, Aichi 464-8601, Japan
- Yoshihiro Yamanishi
  - Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
  - Department of Complex Systems Science, Graduate School of Informatics, Nagoya University, Chikusa, Nagoya, Aichi 464-8601, Japan
2
Ivanov SM, Rudik AV, Lagunin AA, Filimonov DA, Poroikov VV. DIGEP-Pred 2.0: A web application for predicting drug-induced cell signaling and gene expression changes. Mol Inform 2024:e202400032. [PMID: 38979651 DOI: 10.1002/minf.202400032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2024] [Revised: 05/16/2024] [Accepted: 06/14/2024] [Indexed: 07/10/2024]
Abstract
The analysis of drug-induced gene expression profiles (DIGEP) is widely used to estimate potential therapeutic and adverse drug effects as well as the molecular mechanisms of drug action. However, the corresponding experimental data are absent for many existing drugs and drug-like compounds. To solve this problem, we created the DIGEP-Pred 2.0 web application, which predicts DIGEP and potential drug targets from the structural formula of drug-like compounds. It is based on the combined use of structure-activity relationships (SARs) and network analysis. SAR models were created using PASS (Prediction of Activity Spectra for Substances) technology for data from the Comparative Toxicogenomics Database (CTD) and the Connectivity Map (CMap) for the prediction of DIGEP, and from PubChem and ChEMBL for the prediction of molecular mechanisms of action (MoA). Using only the structural formula of a compound, the user can obtain information on potential gene expression changes in several cell lines and on drug targets, which are potential master regulators responsible for the observed DIGEP. The mean accuracy of prediction calculated by leave-one-out cross-validation was 86.5 % for 13377 genes and 94.8 % for 2932 proteins (CTD data), and it was 97.9 % for 2170 MoAs. SAR models (mean accuracy: 87.5 %) were also created for CMap data on the MCF7, PC3, and HL60 cell lines with different threshold values for the logarithm of fold changes: 0.5, 0.7, 1, 1.5, and 2. Additionally, data on pathways (KEGG, Reactome), Gene Ontology biological processes, and diseases (DisGeNet) enriched by the predicted genes, together with the estimation of target-master regulators based on OmniPath data, are provided. The DIGEP-Pred 2.0 web application is freely available at https://www.way2drug.com/digep-pred.
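As an illustration of the thresholding step mentioned in this abstract, the following minimal sketch labels genes as up-regulated, down-regulated, or unchanged at each of the listed log fold change thresholds. It is not the DIGEP-Pred implementation: the function `label_genes`, the symmetric use of the threshold for up- and down-regulation, and the example gene names and values are all assumptions made for illustration.

```python
from typing import Dict

# Threshold values for the log fold change, as listed in the abstract.
THRESHOLDS = [0.5, 0.7, 1.0, 1.5, 2.0]


def label_genes(log_fc: Dict[str, float], threshold: float) -> Dict[str, str]:
    """Label each gene 'up', 'down', or 'unchanged' for one threshold.

    Assumption: the same threshold is applied symmetrically to
    positive and negative log fold changes.
    """
    labels = {}
    for gene, value in log_fc.items():
        if value >= threshold:
            labels[gene] = "up"
        elif value <= -threshold:
            labels[gene] = "down"
        else:
            labels[gene] = "unchanged"
    return labels


# Hypothetical log fold changes, for illustration only.
example = {"GENE_A": 1.2, "GENE_B": -0.6, "GENE_C": 0.1}
labels_by_threshold = {t: label_genes(example, t) for t in THRESHOLDS}
```

Stricter thresholds leave fewer genes labeled as changed, which is why a separate set of labels (and, in the abstract, a separate set of SAR models) exists per threshold.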
Affiliation(s)
- Sergey M Ivanov
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Pogodinskaya Street, 10 bldg. 8, Moscow, 119121, Russia
  - Department of Bioinformatics, Pirogov Russian National Research Medical University, Ostrovityanova Street, 1, Moscow, 117997, Russia
- Anastasia V Rudik
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Pogodinskaya Street, 10 bldg. 8, Moscow, 119121, Russia
- Alexey A Lagunin
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Pogodinskaya Street, 10 bldg. 8, Moscow, 119121, Russia
  - Department of Bioinformatics, Pirogov Russian National Research Medical University, Ostrovityanova Street, 1, Moscow, 117997, Russia
- Dmitry A Filimonov
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Pogodinskaya Street, 10 bldg. 8, Moscow, 119121, Russia
- Vladimir V Poroikov
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Pogodinskaya Street, 10 bldg. 8, Moscow, 119121, Russia
3
Gangwal A, Ansari A, Ahmad I, Azad AK, Wan Sulaiman WMA. Current strategies to address data scarcity in artificial intelligence-based drug discovery: A comprehensive review. Comput Biol Med 2024; 179:108734. [PMID: 38964243 DOI: 10.1016/j.compbiomed.2024.108734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Revised: 06/01/2024] [Accepted: 06/08/2024] [Indexed: 07/06/2024]
Abstract
Artificial intelligence (AI) has played a vital role in computer-aided drug design (CADD). This development has been further accelerated by the increasing use of machine learning (ML), mainly deep learning (DL), and by advances in computing hardware and software. As a result, initial doubts about the application of AI in drug discovery have been dispelled, leading to significant benefits in medicinal chemistry. At the same time, it is crucial to recognize that AI is still in its infancy and faces several limitations that need to be addressed to harness its full potential in drug discovery. Notable limitations include insufficient, unlabeled, and non-uniform data; the resemblance of some AI-generated molecules to existing molecules; the unavailability of adequate benchmarks; intellectual property rights (IPR) hurdles in data sharing; poor understanding of biology; a focus on proxy data and ligands; and the lack of holistic methods to represent input (molecular structures) without pre-processing of input molecules (feature engineering). The major component of AI infrastructure is input data, as most successes of AI-driven drug discovery depend, among other factors, on the quality and quantity of the data used to train and test AI algorithms. Additionally, data-hungry DL approaches may fail to live up to their promise without sufficient data. The current literature suggests several methods that, to a certain extent, handle low-data settings effectively and improve the output of AI models in drug discovery. These include transfer learning (TL), active learning (AL), single or one-shot learning (OSL), multi-task learning (MTL), data augmentation (DA), and data synthesis (DS). A different method, federated learning (FL), enables sharing of proprietary data on a common platform to train ML models without compromising data privacy. In this review, we compare and discuss these methods, their recent applications, and their limitations when modeling small-molecule data to improve the output of AI methods in drug discovery. The article also summarizes other novel methods for handling inadequate data.
Affiliation(s)
- Amit Gangwal
  - Department of Natural Product Chemistry, Shri Vile Parle Kelavani Mandal's Institute of Pharmacy, Dhule, 424001, Maharashtra, India
- Azim Ansari
  - Computer Aided Drug Design Center, Shri Vile Parle Kelavani Mandal's Institute of Pharmacy, Dhule, 424001, Maharashtra, India
- Iqrar Ahmad
  - Department of Pharmaceutical Chemistry, Prof. Ravindra Nikam College of Pharmacy, Gondur, Dhule, 424002, Maharashtra, India
- Abul Kalam Azad
  - Faculty of Pharmacy, University College of MAIWP International, Batu Caves, 68100, Kuala Lumpur, Malaysia
4
Tong X, Qu N, Kong X, Ni S, Zhou J, Wang K, Zhang L, Wen Y, Shi J, Zhang S, Li X, Zheng M. Deep representation learning of chemical-induced transcriptional profile for phenotype-based drug discovery. Nat Commun 2024; 15:5378. [PMID: 38918369 PMCID: PMC11199551 DOI: 10.1038/s41467-024-49620-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Accepted: 06/10/2024] [Indexed: 06/27/2024] [Open Access]
Abstract
Artificial intelligence transforms drug discovery, with phenotype-based approaches emerging as a promising alternative to target-based methods, overcoming limitations like lack of well-defined targets. While chemical-induced transcriptional profiles offer a comprehensive view of drug mechanisms, inherent noise often obscures the true signal, hindering their potential for meaningful insights. Here, we highlight the development of TranSiGen, a deep generative model employing self-supervised representation learning. TranSiGen analyzes basal cell gene expression and molecular structures to reconstruct chemical-induced transcriptional profiles with high accuracy. By capturing both cellular and compound information, TranSiGen-derived representations demonstrate efficacy in diverse downstream tasks like ligand-based virtual screening, drug response prediction, and phenotype-based drug repurposing. Notably, in vitro validation of TranSiGen's application in pancreatic cancer drug discovery highlights its potential for identifying effective compounds. We envisage that integrating TranSiGen into the drug discovery and mechanism research holds significant promise for advancing biomedicine.
Affiliation(s)
- Xiaochu Tong
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Ning Qu
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Xiangtai Kong
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Shengkun Ni
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Jingyi Zhou
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - School of Physical Science and Technology, ShanghaiTech University, Shanghai, 201210, China
  - Lingang Laboratory, Shanghai, 200031, China
- Kun Wang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230026, China
- Lehan Zhang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Yiming Wen
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310024, China
- Jiangshan Shi
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Sulin Zhang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Xutong Li
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Mingyue Zheng
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310024, China
5
Livne M, Miftahutdinov Z, Tutubalina E, Kuznetsov M, Polykovskiy D, Brundyn A, Jhunjhunwala A, Costa A, Aliper A, Aspuru-Guzik A, Zhavoronkov A. nach0: multimodal natural and chemical languages foundation model. Chem Sci 2024; 15:8380-8389. [PMID: 38846388 PMCID: PMC11151847 DOI: 10.1039/d4sc00966e] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/08/2024] [Accepted: 04/26/2024] [Indexed: 06/09/2024] [Open Access]
Abstract
Large Language Models (LLMs) have substantially driven scientific progress in various domains, and many papers have demonstrated their ability to tackle complex problems with creative solutions. Our paper introduces a new foundation model, nach0, capable of solving various chemical and biological tasks: biomedical question answering, named entity recognition, molecular generation, molecular synthesis, attributes prediction, and others. nach0 is a multi-domain and multi-task encoder-decoder LLM pre-trained on unlabeled text from scientific literature, patents, and molecule strings to incorporate a range of chemical and linguistic knowledge. We employed instruction tuning, where specific task-related instructions are utilized to fine-tune nach0 for the final set of tasks. To train nach0 effectively, we leverage the NeMo framework, enabling efficient parallel optimization of both base and large model versions. Extensive experiments demonstrate that our model outperforms state-of-the-art baselines on single-domain and cross-domain tasks. Furthermore, it can generate high-quality outputs in molecular and textual formats, showcasing its effectiveness in multi-domain setups.
Affiliation(s)
- Micha Livne
  - NVIDIA 2788 San Tomas Expressway Santa Clara 95051 CA USA
- Zulfat Miftahutdinov
  - Insilico Medicine Canada Inc. 3710-1250 René-Lévesque West Montreal Quebec Canada
- Elena Tutubalina
  - Insilico Medicine Hong Kong Ltd. Unit 310, 3/F, Building 8W, Phase 2, Hong Kong Science Park, Pak Shek Kok New Territories Hong Kong
- Maksim Kuznetsov
  - Insilico Medicine Canada Inc. 3710-1250 René-Lévesque West Montreal Quebec Canada
- Daniil Polykovskiy
  - Insilico Medicine Canada Inc. 3710-1250 René-Lévesque West Montreal Quebec Canada
- Annika Brundyn
  - NVIDIA 2788 San Tomas Expressway Santa Clara 95051 CA USA
- Anthony Costa
  - NVIDIA 2788 San Tomas Expressway Santa Clara 95051 CA USA
- Alex Aliper
  - Insilico Medicine AI Ltd. Level 6, Unit 08, Block A, IRENA HQ Building, Masdar City Abu Dhabi United Arab Emirates
- Alán Aspuru-Guzik
  - University of Toronto Lash Miller Building 80 St. George Street Toronto Ontario Canada
- Alex Zhavoronkov
  - Insilico Medicine Hong Kong Ltd. Unit 310, 3/F, Building 8W, Phase 2, Hong Kong Science Park, Pak Shek Kok New Territories Hong Kong
6
Liu Y, Yu H, Duan X, Zhang X, Cheng T, Jiang F, Tang H, Ruan Y, Zhang M, Zhang H, Zhang Q. TransGEM: a molecule generation model based on Transformer with gene expression data. Bioinformatics 2024; 40:btae189. [PMID: 38632084 PMCID: PMC11078772 DOI: 10.1093/bioinformatics/btae189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Revised: 03/26/2024] [Accepted: 04/16/2024] [Indexed: 04/19/2024] [Open Access]
Abstract
MOTIVATION: It is difficult to generate new molecules with desirable bioactivity through ligand-based de novo drug design, and receptor-based de novo drug design is constrained by the availability of disease target information. The combination of artificial intelligence and phenotype-based de novo drug design can generate new bioactive molecules independently of disease target information. Gene expression profiles can be used to characterize biological phenotypes. The Transformer model can be used to capture the associations between gene expression profiles and molecular structures owing to its remarkable ability to process contextual information.
RESULTS: We propose TransGEM (Transformer-based model from gene expression to molecules), a phenotype-based de novo drug design model. A specialized gene expression encoder is used to embed gene expression difference values between diseased cell lines and their corresponding normal tissue cells into the TransGEM model. The results demonstrate that the TransGEM model can generate molecules with desirable evaluation metrics and property distributions. Case studies illustrate that the TransGEM model can generate structurally novel molecules with good binding affinity to disease target proteins. The majority of genes with high attention scores obtained from the TransGEM model are associated with the onset of the disease, indicating the potential of these genes as disease targets. Therefore, this study provides a new paradigm for de novo drug design, and it will promote phenotype-based drug discovery.
AVAILABILITY AND IMPLEMENTATION: The code is available at https://github.com/hzauzqy/TransGEM.
Affiliation(s)
- Yanguang Liu
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
- Hailong Yu
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
- Xinya Duan
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
- Xiaomin Zhang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
- Ting Cheng
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
- Feng Jiang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
- Hao Tang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
- Yao Ruan
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
- Miao Zhang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
- Hongyu Zhang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
- Qingye Zhang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
7
Fan W, He Y, Zhu F. RM-GPT: Enhance the comprehensive generative ability of molecular GPT model via LocalRNN and RealFormer. Artif Intell Med 2024; 150:102827. [PMID: 38553166 DOI: 10.1016/j.artmed.2024.102827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 02/26/2024] [Accepted: 02/26/2024] [Indexed: 04/02/2024]
Abstract
Owing to surging costs, artificial intelligence-assisted de novo drug design has supplanted conventional methods and become an emerging option for drug discovery. Although there are many successful examples of applying generative models to the molecular field, these methods struggle with conditional generation that meets chemists' practical requirements, which call for a controllable process to generate new molecules or optimize starting molecules under specified conditions. To address this problem, a Recurrent Molecular-Generative Pretrained Transformer model, referred to as RM-GPT, is proposed, supplemented by LocalRNN and a Residual Attention Layer Transformer. RM-GPT rebuilds the GPT model's architecture by incorporating LocalRNN and the Residual Attention Layer Transformer so that it can extract local information and build connectivity between attention blocks. The incorporation of the Transformer in these two modules makes it possible to leverage the parallel computing advantages of multi-head attention mechanisms while extracting local structural information effectively. By exploring and learning in a large chemical space, RM-GPT acquires the ability to generate drug-like molecules under required conditions, such as desired properties and scaffolds, precisely and stably. RM-GPT achieved better results than SOTA methods on conditional generation.
Affiliation(s)
- Wenfeng Fan
  - School of Computer Science and Technology, Soochow University, Suzhou, 215006, China
- Yue He
  - School of Computer Science and Technology, Soochow University, Suzhou, 215006, China
- Fei Zhu
  - School of Computer Science and Technology, Soochow University, Suzhou, 215006, China
8
Wang C, Ong HH, Chiba S, Rajapakse JC. GLDM: hit molecule generation with constrained graph latent diffusion model. Brief Bioinform 2024; 25:bbae142. [PMID: 38581415 PMCID: PMC10998532 DOI: 10.1093/bib/bbae142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2023] [Revised: 03/08/2024] [Accepted: 03/03/2024] [Indexed: 04/08/2024] [Open Access]
Abstract
Discovering hit molecules with desired biological activity in a directed manner is a promising but challenging task in computer-aided drug discovery. Inspired by recent generative AI approaches, particularly Diffusion Models (DM), we propose the Graph Latent Diffusion Model (GLDM), a latent DM that preserves both the effectiveness of autoencoders in compressing complex chemical data and the DM's capability to generate novel molecules. Specifically, we first develop an autoencoder to encode the molecular data into low-dimensional latent representations and then train the DM on the latent space to generate molecules inducing targeted biological activity defined by gene expression profiles. Manipulating the DM in the latent space rather than the input space avoids complicated operations to map molecule decomposition and reconstruction to diffusion processes, and thus improves training efficiency. Experiments show that GLDM not only achieves outstanding performance on molecular generation benchmarks, but also generates samples with optimal chemical properties and the potential to induce desired biological activity.
Affiliation(s)
- Conghao Wang
  - School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Ave, 639798, Singapore
- Hiok Hian Ong
  - School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Ave, 639798, Singapore
- Shunsuke Chiba
  - School of Chemistry, Chemical Engineering and Biotechnology, Nanyang Technological University, 21 Nanyang Link, 637371, Singapore
- Jagath C Rajapakse
  - School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Ave, 639798, Singapore
9
Pun FW, Ozerov IV, Zhavoronkov A. AI-powered therapeutic target discovery. Trends Pharmacol Sci 2023; 44:561-572. [PMID: 37479540 DOI: 10.1016/j.tips.2023.06.010] [Citation(s) in RCA: 26] [Impact Index Per Article: 26.0] [Received: 05/25/2023] [Revised: 06/20/2023] [Accepted: 06/23/2023] [Indexed: 07/23/2023]
Abstract
Disease modeling and target identification are the most crucial initial steps in drug discovery, and influence the probability of success at every step of drug development. Traditional target identification is a time-consuming process that takes years to decades and usually starts in an academic setting. Given its advantages of analyzing large datasets and intricate biological networks, artificial intelligence (AI) is playing a growing role in modern drug target identification. We review recent advances in target discovery, focusing on breakthroughs in AI-driven therapeutic target exploration. We also discuss the importance of striking a balance between novelty and confidence in target selection. An increasing number of AI-identified targets are being validated through experiments and several AI-derived drugs are entering clinical trials; we highlight current limitations and potential pathways for moving forward.
Affiliation(s)
- Frank W Pun
  - Insilico Medicine Hong Kong Ltd., Hong Kong Science and Technology Park, New Territories, Hong Kong
- Ivan V Ozerov
  - Insilico Medicine Hong Kong Ltd., Hong Kong Science and Technology Park, New Territories, Hong Kong
- Alex Zhavoronkov
  - Insilico Medicine Hong Kong Ltd., Hong Kong Science and Technology Park, New Territories, Hong Kong
  - Insilico Medicine MENA, 6F IRENA Building, Abu Dhabi, United Arab Emirates
  - Buck Institute for Research on Aging, Novato, CA, USA
10
Sinha K, Ghosh N, Sil PC. A Review on the Recent Applications of Deep Learning in Predictive Drug Toxicological Studies. Chem Res Toxicol 2023; 36:1174-1205. [PMID: 37561655 DOI: 10.1021/acs.chemrestox.2c00375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/12/2023]
Abstract
Drug toxicity prediction is an important step in ensuring patient safety during drug design studies. While traditional preclinical studies have historically relied on animal models to evaluate toxicity, recent advances in deep-learning approaches have shown great promise in advancing drug safety science and reducing animal use in preclinical studies. However, deep-learning-based approaches also face challenges in handling large biological data sets, model interpretability, and regulatory acceptance. In this review, we provide an overview of recent developments in deep-learning-based approaches for predicting drug toxicity, highlighting their potential advantages over traditional methods and the need to address their limitations. Deep-learning models have demonstrated excellent performance in predicting toxicity outcomes from various data sources such as chemical structures, genomic data, and high-throughput screening assays. The potential of deep learning for automated feature engineering is also discussed. This review emphasizes the need to address ethical concerns related to the use of deep learning in drug toxicity studies, including the reduction of animal use and ensuring regulatory acceptance. Furthermore, emerging applications of deep learning in drug toxicity prediction, such as predicting drug-drug interactions and toxicity in rare subpopulations, are highlighted. The integration of deep-learning-based approaches with traditional methods is discussed as a way to develop more reliable and efficient predictive models for drug safety assessment, paving the way for safer and more effective drug discovery and development. Overall, this review highlights the critical role of deep learning in predictive toxicology and drug safety evaluation, emphasizing the need for continued research and development in this rapidly evolving field. By addressing the limitations of traditional methods, leveraging the potential of deep learning for automated feature engineering, and addressing ethical concerns, deep-learning-based approaches have the potential to revolutionize drug toxicity prediction and improve patient safety in drug discovery and development.
Affiliation(s)
- Krishnendu Sinha
  - Department of Zoology, Jhargram Raj College, Jhargram 721507, West Bengal, India
- Nabanita Ghosh
  - Department of Zoology, Maulana Azad College, Kolkata 700013, West Bengal, India
- Parames C Sil
  - Division of Molecular Medicine, Bose Institute, Kolkata 700054, West Bengal, India
11
Pravalphruekul N, Piriyajitakonkij M, Phunchongharn P, Piyayotai S. De Novo Design of Molecules with Multiaction Potential from Differential Gene Expression using Variational Autoencoder. J Chem Inf Model 2023; 63:3999-4011. [PMID: 37347587 DOI: 10.1021/acs.jcim.3c00355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/24/2023]
Abstract
The modulating effect of chemical compounds and therapeutics on gene transcription is well-reported and has been intensively studied for both clinical and research purposes. Emerging research points toward the utility of drug-induced transcriptional alterations in de novo molecular design and highlights the idea of phenotype-matching an expression signature of interest to the structures being designed. In this work, we build an autoencoder-based generative model, BiCEV, around this concept. Our generative autoencoder has demonstrably generated a set of new molecules from gene expression input with notable validity (96%), uniqueness (98%), and internal diversity (0.77). Further, we attempted to validate BiCEV by testing the model on gene-knockdown profiles and combined signatures of synergistic drug pairs. From these investigations, we found the designed structures to be consistently high in collective quality. However, when their similarities to the supposed functional equivalents as determined by shared targets were considered, the findings were somewhat mixed. In spite of this, we believe the generative model merits further development in conjunction with in vitro corroboration to lend itself to being an assistive tool for drug discovery experts, particularly to support the initial stages of hit identification and lead optimization.
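For readers unfamiliar with the generation metrics quoted in this abstract, the sketch below shows one common way to compute uniqueness and internal diversity. The abstract does not state the exact definitions used for BiCEV, so the formulas here (uniqueness as the fraction of distinct strings, internal diversity as one minus the mean pairwise Tanimoto similarity) are assumptions, as are the function names and the toy fingerprints, which are plain sets of hypothetical feature IDs rather than real chemical fingerprints. Validity is omitted because it requires a chemistry toolkit to parse structures.

```python
from itertools import combinations
from typing import FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def uniqueness(smiles: List[str]) -> float:
    """Fraction of distinct strings; assumes canonical SMILES as input."""
    return len(set(smiles)) / len(smiles)


def internal_diversity(fingerprints: List[FrozenSet[int]]) -> float:
    """One minus the mean pairwise Tanimoto similarity (needs >= 2 items)."""
    pairs = list(combinations(fingerprints, 2))
    mean_similarity = sum(tanimoto(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_similarity


# Hypothetical generated molecules and toy fingerprints, for illustration only.
generated = ["CCO", "CCO", "CCN", "c1ccccc1"]
toy_fingerprints = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({5, 6})]

unique_fraction = uniqueness(generated)
diversity = internal_diversity(toy_fingerprints)
```

In practice the fingerprints would come from a cheminformatics library, and uniqueness is usually computed over valid molecules only; both choices affect the reported numbers.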
Affiliation(s)
- Nutaya Pravalphruekul
- Department of Computer Engineering, King Mongkut's University of Technology Thonburi, Bang Mod, Thung Khru, Bangkok 10140, Thailand
| | | | - Phond Phunchongharn
- Department of Computer Engineering, King Mongkut's University of Technology Thonburi, Bang Mod, Thung Khru, Bangkok 10140, Thailand
- Big Data Experience Center, Bang Mod, Thung Khru, Bangkok 10140, Thailand
| | - Supanida Piyayotai
- Big Data Experience Center, Bang Mod, Thung Khru, Bangkok 10140, Thailand
- Learning Institute, King Mongkut's University of Technology Thonburi, Bang Mod, Thung Khru, Bangkok 10140, Thailand
| |
|
12
|
Olsen A, Harpaz Z, Ren C, Shneyderman A, Veviorskiy A, Dralkina M, Konnov S, Shcheglova O, Pun FW, Leung GHD, Leung HW, Ozerov IV, Aliper A, Korzinkin M, Zhavoronkov A. Identification of dual-purpose therapeutic targets implicated in aging and glioblastoma multiforme using PandaOmics - an AI-enabled biological target discovery platform. Aging (Albany NY) 2023; 15:2863-2876. [PMID: 37100462 DOI: 10.18632/aging.204678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 04/09/2023] [Indexed: 04/28/2023]
Abstract
Glioblastoma Multiforme (GBM) is the most aggressive and most common primary malignant brain tumor. The age of GBM patients is considered as one of the disease's negative prognostic factors and the mean age of diagnosis is 62 years. A promising approach to preventing both GBM and aging is to identify new potential therapeutic targets that are associated with both conditions as concurrent drivers. In this work, we present a multi-angled approach of identifying targets, which takes into account not only the disease-related genes but also the ones important in aging. For this purpose, we developed three strategies of target identification using the results of correlation analysis augmented with survival data, differences in expression levels and previously published information of aging-related genes. Several studies have recently validated the robustness and applicability of AI-driven computational methods for target identification in both cancer and aging-related diseases. Therefore, we leveraged the AI predictive power of the PandaOmics TargetID engine in order to rank the resulting target hypotheses and prioritize the most promising therapeutic gene targets. We propose cyclic nucleotide gated channel subunit alpha 3 (CNGA3), glutamate dehydrogenase 1 (GLUD1) and sirtuin 1 (SIRT1) as potential novel dual-purpose therapeutic targets to treat aging and GBM.
Affiliation(s)
- Andrea Olsen
- The Youth Longevity Association, Sevenoaks, NA, United Kingdom
| | - Zachary Harpaz
- The Youth Longevity Association, Sevenoaks, NA, United Kingdom
- Pine Crest School Science Research Department, Fort Lauderdale, Florida 33334, USA
| | - Christopher Ren
- Shanghai High School International Division, Shanghai 200231, China
| | - Anastasia Shneyderman
- Insilico Medicine Hong Kong Ltd., Hong Kong Science and Technology Park, New Territories, Hong Kong, China
| | - Alexander Veviorskiy
- Insilico Medicine Hong Kong Ltd., Hong Kong Science and Technology Park, New Territories, Hong Kong, China
| | - Maria Dralkina
- Insilico Medicine Hong Kong Ltd., Hong Kong Science and Technology Park, New Territories, Hong Kong, China
| | - Simon Konnov
- Insilico Medicine Hong Kong Ltd., Hong Kong Science and Technology Park, New Territories, Hong Kong, China
| | - Olga Shcheglova
- Insilico Medicine Hong Kong Ltd., Hong Kong Science and Technology Park, New Territories, Hong Kong, China
| | - Frank W Pun
- Insilico Medicine Hong Kong Ltd., Hong Kong Science and Technology Park, New Territories, Hong Kong, China
| | - Geoffrey Ho Duen Leung
- Insilico Medicine Hong Kong Ltd., Hong Kong Science and Technology Park, New Territories, Hong Kong, China
| | - Hoi Wing Leung
- Insilico Medicine Hong Kong Ltd., Hong Kong Science and Technology Park, New Territories, Hong Kong, China
| | - Ivan V Ozerov
- Insilico Medicine Hong Kong Ltd., Hong Kong Science and Technology Park, New Territories, Hong Kong, China
| | - Alex Aliper
- Insilico Medicine Hong Kong Ltd., Hong Kong Science and Technology Park, New Territories, Hong Kong, China
| | - Mikhail Korzinkin
- Insilico Medicine Hong Kong Ltd., Hong Kong Science and Technology Park, New Territories, Hong Kong, China
| | - Alex Zhavoronkov
- Insilico Medicine Hong Kong Ltd., Hong Kong Science and Technology Park, New Territories, Hong Kong, China
| |
|
13
|
Das D, Chakrabarty B, Srinivasan R, Roy A. Gex2SGen: Designing Drug-like Molecules from Desired Gene Expression Signatures. J Chem Inf Model 2023; 63:1882-1893. [PMID: 36971750 DOI: 10.1021/acs.jcim.2c01301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/29/2023]
Abstract
Drug-induced gene expression profiling provides a lot of useful information covering various aspects of drug discovery and development. Most importantly, this knowledge can be used to discover drugs' mechanisms of action. Recently, deep learning-based drug design methods are in the spotlight due to their ability to explore huge chemical space and design property-optimized target-specific drug molecules. Recent advances in accessibility of open-source drug-induced transcriptomic data along with the ability of deep learning algorithms to understand hidden patterns have opened opportunities for designing drug molecules based on desired gene expression signatures. In this study, we propose a deep learning model, Gex2SGen (Gene Expression 2 SMILES Generation), to generate novel drug-like molecules based on desired gene expression profiles. The model accepts desired gene expression profiles in a cell-specific manner as input and designs drug-like molecules which can elicit the required transcriptomic profile. The model was first tested against individual gene-knocked-out transcriptomic profiles, where the newly designed molecules showed high similarity with known inhibitors of the knocked-out target genes. The model was next applied on a triple negative breast cancer signature profile, where it could generate novel molecules, highly similar to known anti-breast cancer drugs. Overall, this work provides a generalized method, where the method first learned the molecular signature of a given cell due to a specific condition, and designs new small molecules with drug-like properties.
|
14
|
Bian Y, Xie XQ. Artificial Intelligent Deep Learning Molecular Generative Modeling of Scaffold-Focused and Cannabinoid CB2 Target-Specific Small-Molecule Sublibraries. Cells 2022; 11:cells11050915. [PMID: 35269537 PMCID: PMC8909864 DOI: 10.3390/cells11050915] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/27/2022] [Revised: 02/26/2022] [Accepted: 02/26/2022] [Indexed: 02/01/2023] Open
Abstract
Design and generation of high-quality target- and scaffold-specific small molecules is an important strategy for the discovery of unique and potent bioactive drug molecules. To achieve this goal, authors have developed the deep-learning molecule generation model (DeepMGM) and applied it for the de novo molecular generation of scaffold-focused small-molecule libraries. In this study, a recurrent neural network (RNN) using long short-term memory (LSTM) units was trained with drug-like molecules to result in a general model (g-DeepMGM). Sampling practices on indole and purine scaffolds illustrate the feasibility of creating scaffold-focused chemical libraries based on machine intelligence. Subsequently, a target-specific model (t-DeepMGM) for cannabinoid receptor 2 (CB2) was constructed following the transfer learning process of known CB2 ligands. Sampling outcomes can present similar properties to the reported active molecules. Finally, a discriminator was trained and attached to the DeepMGM to result in an in silico molecular design-test circle. Medicinal chemistry synthesis and biological validation was performed to further investigate the generation outcome, showing that XIE9137 was identified as a potential allosteric modulator of CB2. This study demonstrates how recent progress in deep learning intelligence can benefit drug discovery, especially in de novo molecular design and chemical library generation.
Affiliation(s)
- Yuemin Bian
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, Pharmacometrics & System Pharmacology PharmacoAnalytics, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- NIH National Center of Excellence for Computational Drug Abuse Research (CDAR), University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Xiang-Qun Xie
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, Pharmacometrics & System Pharmacology PharmacoAnalytics, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- NIH National Center of Excellence for Computational Drug Abuse Research (CDAR), University of Pittsburgh, Pittsburgh, PA 15261, USA
- Drug Discovery Institute, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Departments of Computational Biology and Structural Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15261, USA
| |
|
15
|
Overhoff B, Falls Z, Mangione W, Samudrala R. A Deep-Learning Proteomic-Scale Approach for Drug Design. Pharmaceuticals (Basel) 2021; 14:1277. [PMID: 34959678 PMCID: PMC8709297 DOI: 10.3390/ph14121277] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/24/2021] [Revised: 11/27/2021] [Accepted: 11/29/2021] [Indexed: 12/26/2022] Open
Abstract
Computational approaches have accelerated novel therapeutic discovery in recent decades. The Computational Analysis of Novel Drug Opportunities (CANDO) platform for shotgun multitarget therapeutic discovery, repurposing, and design aims to improve their efficacy and safety by employing a holistic approach that computes interaction signatures between every drug/compound and a large library of non-redundant protein structures corresponding to the human proteome fold space. These signatures are compared and analyzed to determine if a given drug/compound is efficacious and safe for a given indication/disease. In this study, we used a deep learning-based autoencoder to first reduce the dimensionality of CANDO-computed drug-proteome interaction signatures. We then employed a reduced conditional variational autoencoder to generate novel drug-like compounds when given a target encoded "objective" signature. Using this approach, we designed compounds to recreate the interaction signatures for twenty approved and experimental drugs and showed that 16/20 designed compounds were predicted to be significantly (p-value ≤ 0.05) more behaviorally similar relative to all corresponding controls, and 20/20 were predicted to be more behaviorally similar relative to a random control. We further observed that redesigns of objectives developed via rational drug design performed significantly better than those derived from natural sources (p-value ≤ 0.05), suggesting that the model learned an abstraction of rational drug design. We also show that the designed compounds are structurally diverse and synthetically feasible when compared to their respective objective drugs despite consistently high predicted behavioral similarity. Finally, we generated new designs that enhanced thirteen drugs/compounds associated with non-small cell lung cancer and anti-aging properties using their predicted proteomic interaction signatures. 
This study represents a significant step forward in automating holistic therapeutic design with machine learning, enabling the rapid generation of novel, effective, and safe drug leads for any indication.
Affiliation(s)
| | | | | | - Ram Samudrala
- Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY 14203, USA; (B.O.); (Z.F.); (W.M.)
| |
|
16
|
Sousa T, Correia J, Pereira V, Rocha M. Generative Deep Learning for Targeted Compound Design. J Chem Inf Model 2021; 61:5343-5361. [PMID: 34699719 DOI: 10.1021/acs.jcim.0c01496] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Indexed: 01/10/2023]
Abstract
In the past few years, de novo molecular design has increasingly been using generative models from the emergent field of Deep Learning, proposing novel compounds that are likely to possess desired properties or activities. De novo molecular design finds applications in different fields ranging from drug discovery and materials sciences to biotechnology. A panoply of deep generative models, including architectures as Recurrent Neural Networks, Autoencoders, and Generative Adversarial Networks, can be trained on existing data sets and provide for the generation of novel compounds. Typically, the new compounds follow the same underlying statistical distributions of properties exhibited on the training data set. Additionally, different optimization strategies, including transfer learning, Bayesian optimization, reinforcement learning, and conditional generation, can direct the generation process toward desired aims, regarding their biological activities, synthesis processes or chemical features. Given the recent emergence of these technologies and their relevance, this work presents a systematic and critical review on deep generative models and related optimization methods for targeted compound design, and their applications.
Affiliation(s)
- Tiago Sousa
- Centre of Biological Engineering, Campus Gualtar, University of Minho, 4710-057 Braga, Portugal
| | - João Correia
- Centre of Biological Engineering, Campus Gualtar, University of Minho, 4710-057 Braga, Portugal
| | - Vítor Pereira
- Centre of Biological Engineering, Campus Gualtar, University of Minho, 4710-057 Braga, Portugal
| | - Miguel Rocha
- Centre of Biological Engineering, Campus Gualtar, University of Minho, 4710-057 Braga, Portugal
| |
|
17
|
Deep Learning Applied to Ligand-Based De Novo Drug Design. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2021; 2390:273-299. [PMID: 34731474 DOI: 10.1007/978-1-0716-1787-8_12] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 10/19/2022]
Abstract
In the latest years, the application of deep generative models to suggest virtual compounds is becoming a new and powerful tool in drug discovery projects. The idea behind this review is to offer an updated view on de novo design approaches based on artificial intelligent (AI) algorithms, with a particular focus on ligand-based methods. We start this review by reporting a brief overview of the most relevant de novo design approaches developed before the use of AI techniques. We then describe the nowadays most common neural network architectures employed in ligand-based de novo design, together with an up-to-date list of more than 100 deep generative models found in the literature (2017-2020). In order to show how deep generative approaches are applied into drug discovery context, we report all the now available studies in which generated compounds have been synthetized and their biological activity tested. Finally, we discuss what we envisage as beneficial future directions for further application of deep generative models in de novo drug design.
|
18
|
Sun AM, Hoffman T, Luu BQ, Ashammakhi N, Li S. Application of lung microphysiological systems to COVID-19 modeling and drug discovery: a review. Biodes Manuf 2021; 4:757-775. [PMID: 34178414 PMCID: PMC8213042 DOI: 10.1007/s42242-021-00136-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 01/05/2021] [Accepted: 05/13/2021] [Indexed: 01/08/2023]
Abstract
There is a pressing need for effective therapeutics for coronavirus disease 2019 (COVID-19), the respiratory disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus. The process of drug development is a costly and meticulously paced process, where progress is often hindered by the failure of initially promising leads. To aid this challenge, in vitro human microphysiological systems need to be refined and adapted for mechanistic studies and drug screening, thereby saving valuable time and resources during a pandemic crisis. The SARS-CoV-2 virus attacks the lung, an organ where the unique three-dimensional (3D) structure of its functional units is critical for proper respiratory function. The in vitro lung models essentially recapitulate the distinct tissue structure and the dynamic mechanical and biological interactions between different cell types. Current model systems include Transwell, organoid and organ-on-a-chip or microphysiological systems (MPSs). We review models that have direct relevance toward modeling the pathology of COVID-19, including the processes of inflammation, edema, coagulation, as well as lung immune function. We also consider the practical issues that may influence the design and fabrication of MPS. The role of lung MPS is addressed in the context of multi-organ models, and it is discussed how high-throughput screening and artificial intelligence can be integrated with lung MPS to accelerate drug development for COVID-19 and other infectious diseases.
Affiliation(s)
- Argus M. Sun
- Department of Bioengineering, Samueli School of Engineering, University of California - Los Angeles, 420 Westwood Plaza 5121 Engineering V University of California, Los Angeles, CA 90095-1600 USA
- UC San Diego Healthcare, UCSD, La Jolla, CA 92037 USA
| | - Tyler Hoffman
- Department of Bioengineering, Samueli School of Engineering, University of California - Los Angeles, 420 Westwood Plaza 5121 Engineering V University of California, Los Angeles, CA 90095-1600 USA
| | - Bao Q. Luu
- Pulmonary Diseases and Critical Care, Scripps Green Hospital, Scripps Health, La Jolla, CA 92037 USA
| | - Nureddin Ashammakhi
- Department of Bioengineering, Samueli School of Engineering, University of California - Los Angeles, 420 Westwood Plaza 5121 Engineering V University of California, Los Angeles, CA 90095-1600 USA
- Department of Biomedical Engineering, College of Engineering, Michigan State University, East Lansing, MI 48824 USA
| | - Song Li
- Department of Bioengineering, Samueli School of Engineering, University of California - Los Angeles, 420 Westwood Plaza 5121 Engineering V University of California, Los Angeles, CA 90095-1600 USA
- Department of Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA 90095 USA
| |
|
19
|
Bian Y, Xie XQ. Generative chemistry: drug discovery with deep learning generative models. J Mol Model 2021; 27:71. [PMID: 33543405 PMCID: PMC10984615 DOI: 10.1007/s00894-021-04674-8] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 11/13/2020] [Accepted: 01/13/2021] [Indexed: 12/15/2022]
Abstract
The de novo design of molecular structures using deep learning generative models introduces an encouraging solution to drug discovery in the face of the continuously increased cost of new drug development. From the generation of original texts, images, and videos, to the scratching of novel molecular structures the creativity of deep learning generative models exhibits the height machine intelligence can achieve. The purpose of this paper is to review the latest advances in generative chemistry which relies on generative modeling to expedite the drug discovery process. This review starts with a brief history of artificial intelligence in drug discovery to outline this emerging paradigm. Commonly used chemical databases, molecular representations, and tools in cheminformatics and machine learning are covered as the infrastructure for generative chemistry. The detailed discussions on utilizing cutting-edge generative architectures, including recurrent neural network, variational autoencoder, adversarial autoencoder, and generative adversarial network for compound generation are focused. Challenges and future perspectives follow.
Affiliation(s)
- Yuemin Bian
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA, 15261, USA
- NIH National Center of Excellence for Computational Drug Abuse Research, University of Pittsburgh, Pittsburgh, PA, 15261, USA
| | - Xiang-Qun Xie
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA, 15261, USA.
- NIH National Center of Excellence for Computational Drug Abuse Research, University of Pittsburgh, Pittsburgh, PA, 15261, USA.
- Drug Discovery Institute, University of Pittsburgh, 335 Sutherland Drive, 206 Salk Pavilion, Pittsburgh, PA, 15261, USA.
- Departments of Computational Biology and Structural Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15261, USA.
| |
|
20
|
Vanhaelen Q, Lin YC, Zhavoronkov A. The Advent of Generative Chemistry. ACS Med Chem Lett 2020; 11:1496-1505. [PMID: 32832015 PMCID: PMC7429972 DOI: 10.1021/acsmedchemlett.0c00088] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 02/19/2020] [Accepted: 07/14/2020] [Indexed: 12/12/2022] Open
Abstract
Generative adversarial networks (GANs), first published in 2014, are among the most important concepts in modern artificial intelligence (AI). Bridging deep learning and game theory, GANs are used to generate or "imagine" new objects with desired properties. Since 2016, multiple GANs with reinforcement learning (RL) have been successfully applied in pharmacology for de novo molecular design. Those techniques aim at a more efficient use of the data and a better exploration of the chemical space. We review recent advances for the generation of novel molecules with desired properties with a focus on the applications of GANs, RL, and related techniques. We also discuss the current limitations and challenges in the new growing field of generative chemistry.
Affiliation(s)
- Quentin Vanhaelen
- Insilico Medicine Hong Kong Ltd, Pak Shek Kok, New Territories, Hong Kong
| | - Yen-Chu Lin
- Insilico Medicine Hong Kong Ltd, Pak Shek Kok, New Territories, Hong Kong
- Insilico Taiwan, Taipei City 115, Taiwan, R.O.C
| | - Alex Zhavoronkov
- Insilico Medicine Hong Kong Ltd, Pak Shek Kok, New Territories, Hong Kong
| |
|