1
Xu C, Zheng L, Fan Q, Liu Y, Zeng C, Ning X, Liu H, Du K, Lu T, Chen Y, Zhang Y. Progress in the application of artificial intelligence in molecular generation models based on protein structure. Eur J Med Chem 2024; 277:116735. [PMID: 39098131 DOI: 10.1016/j.ejmech.2024.116735] [Received: 05/20/2024] [Revised: 07/12/2024] [Accepted: 07/30/2024] [Indexed: 08/06/2024]
Abstract
The molecular generation models based on protein structures represent a cutting-edge research direction in artificial intelligence-assisted drug discovery. This article aims to comprehensively summarize the research methods and developments by analyzing a series of novel molecular generation models predicated on protein structures. Initially, we categorize the molecular generation models based on protein structures and highlight the architectural frameworks utilized in these models. Subsequently, we detail the design and implementation of protein structure-based molecular generation models by introducing different specific examples. Lastly, we outline the current opportunities and challenges encountered in this field, intending to offer guidance and a referential framework for developing and studying new models in related fields in the future.
Affiliation(s)
- Chengcheng Xu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Lidan Zheng
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Qing Fan
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Yingxu Liu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Chen Zeng
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Xiangzhen Ning
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Haichun Liu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Ke Du
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Tao Lu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China; State Key Laboratory of Natural Medicines, China Pharmaceutical University, 24 Tongjiaxiang, Nanjing, 210009, China.
- Yadong Chen
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China.
- Yanmin Zhang
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China.
2
Tong WH, Wang SQ, Chen GY, Li DX, Wang YS, Zhao LM, Yang Y. Characterization of the structural and molecular interactions of Ferulic acid ethyl ester with human serum albumin and Lysozyme through multi-methods. Spectrochim Acta A Mol Biomol Spectrosc 2024; 320:124549. [PMID: 38870694 DOI: 10.1016/j.saa.2024.124549] [Received: 12/25/2023] [Revised: 05/22/2024] [Accepted: 05/26/2024] [Indexed: 06/15/2024]
Abstract
Ferulic acid ethyl ester (FAEE) is an essential raw material for the formulation of drugs for cardiovascular and cerebrovascular diseases and leukopenia. It is also used as a fixed aroma agent for food production due to its high pharmacological activity. In this study, the interaction of FAEE with human serum albumin (HSA) and lysozyme (LZM) was characterized by multi-spectral methods and molecular dynamics simulations at four different temperatures. Additionally, the quenching mechanisms of FAEE-HSA and FAEE-LZM were explored. Meanwhile, the binding constants, binding sites, thermodynamic parameters, molecular dynamics, molecular docking binding energy, and the influence of metal ions in the system were evaluated. The results of synchronous fluorescence spectroscopy, UV-vis spectroscopy, circular dichroism (CD), three-dimensional fluorescence spectroscopy, and resonance light scattering showed that the microenvironment of HSA and LZM and the protein conformation changed in the presence of FAEE. Furthermore, the effects of some common metal ions on the binding constants of FAEE-HSA and FAEE-LZM were investigated. Overall, the experimental results provide a theoretical basis for promoting the application of FAEE in the cosmetics, food, and pharmaceutical industries and significant guidance for food safety, drug design, and development.
Affiliation(s)
- Wen-Hua Tong
- School of Biological Engineering, Sichuan University of Science and Engineering, Yibin 644000, China; Key Laboratory of Brewing Biotechnology and Application, Yibin 644000, China.
- Shu-Qin Wang
- School of Biological Engineering, Sichuan University of Science and Engineering, Yibin 644000, China
- Guan-Ying Chen
- School of Biological Engineering, Sichuan University of Science and Engineering, Yibin 644000, China
- Dong-Xu Li
- School of Biological Engineering, Sichuan University of Science and Engineering, Yibin 644000, China
- Yan-Sen Wang
- School of Biological Engineering, Sichuan University of Science and Engineering, Yibin 644000, China
- Li-Ming Zhao
- School of Biotechnology, East China University of Science and Technology, Shanghai 200000, China
- Ying Yang
- School of Biological Engineering, Sichuan University of Science and Engineering, Yibin 644000, China.
3
Wu K, Xia Y, Deng P, Liu R, Zhang Y, Guo H, Cui Y, Pei Q, Wu L, Xie S, Chen S, Lu X, Hu S, Wu J, Chan CK, Chen S, Zhou L, Yu N, Chen E, Liu H, Guo J, Qin T, Liu TY. TamGen: drug design with target-aware molecule generation through a chemical language model. Nat Commun 2024; 15:9360. [PMID: 39472567 PMCID: PMC11522292 DOI: 10.1038/s41467-024-53632-4] [Received: 03/15/2024] [Accepted: 10/14/2024] [Indexed: 11/02/2024] [Open access]
Abstract
Generative drug design facilitates the creation of compounds effective against pathogenic target proteins. This opens up the potential to discover novel compounds within the vast chemical space and fosters the development of innovative therapeutic strategies. However, the practicality of generated molecules is often limited, as many designs focus on a narrow set of drug-related properties and fail to improve the success rate of the subsequent drug discovery process. To overcome these challenges, we develop TamGen, a method that employs a GPT-like chemical language model and enables target-aware molecule generation and compound refinement. We demonstrate that the compounds generated by TamGen have improved molecular quality and viability. Additionally, we have integrated TamGen into a drug discovery pipeline and identified 14 compounds showing compelling inhibitory activity against the Tuberculosis ClpP protease, with the most effective compound exhibiting a half maximal inhibitory concentration (IC50) of 1.9 μM. Our findings underscore the practical potential and real-world applicability of generative drug design approaches, paving the way for future advancements in the field.
Affiliation(s)
- Kehan Wu
- University of Science and Technology of China, Hefei, China
- Yingce Xia
- Microsoft Research AI for Science, Beijing, China.
- Pan Deng
- Microsoft Research AI for Science, Beijing, China
- Renhe Liu
- Global Health Drug Discovery Institute, Beijing, China
- Yuan Zhang
- Global Health Drug Discovery Institute, Beijing, China
- Han Guo
- Global Health Drug Discovery Institute, Beijing, China
- Yumeng Cui
- Global Health Drug Discovery Institute, Beijing, China
- Qizhi Pei
- Renmin University of China, Beijing, China
- Lijun Wu
- Microsoft Research AI for Science, Beijing, China
- Shufang Xie
- Microsoft Research AI for Science, Beijing, China
- Si Chen
- Global Health Drug Discovery Institute, Beijing, China
- Xi Lu
- Global Health Drug Discovery Institute, Beijing, China
- Song Hu
- Global Health Drug Discovery Institute, Beijing, China
- Jinzhi Wu
- Global Health Drug Discovery Institute, Beijing, China
- Chi-Kin Chan
- Global Health Drug Discovery Institute, Beijing, China
- Shawn Chen
- Global Health Drug Discovery Institute, Beijing, China
- Nenghai Yu
- University of Science and Technology of China, Hefei, China
- Enhong Chen
- University of Science and Technology of China, Hefei, China
- Haiguang Liu
- Microsoft Research AI for Science, Beijing, China
- Jinjiang Guo
- Global Health Drug Discovery Institute, Beijing, China.
- Tao Qin
- Microsoft Research AI for Science, Beijing, China.
- Tie-Yan Liu
- Microsoft Research AI for Science, Beijing, China
4
Cheng B. Response Matching for Generating Materials and Molecules. J Chem Theory Comput 2024; 20:9259-9266. [PMID: 39365029 PMCID: PMC11500275 DOI: 10.1021/acs.jctc.4c00998] [Received: 07/31/2024] [Revised: 09/22/2024] [Accepted: 09/24/2024] [Indexed: 10/05/2024]
Abstract
Diffusion models have recently emerged as powerful tools for the generation of new molecular and material structures. The key insight is that the noise in these models is related to the response of the atoms to displacement, and the denoising step is thus analogous to the geometry relaxation of atomistic systems starting from a random structure. Building on this, we present a generative method called Response Matching (RM), which leverages the fact that each stable material or molecule exists at the minimum of its potential energy surface. Any perturbation induces a response in energy and stress, driving the structure back to equilibrium. Matching this response is closely related to score matching in diffusion models. Another important aspect of state-of-the-art diffusion models is the incorporation of physical symmetries such as translation, rotation, and periodicity. RM employs a machine learning interatomic potential and random structure search as the denoising model, inherently respecting these symmetries and exploiting the locality of atomic interactions. RM handles both molecules and bulk materials under the same framework. Its efficiency and generalization are demonstrated on three systems: a small organic molecular data set, stable crystals from the Materials Project, and one-shot learning on a single diamond configuration.
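The link between restoring forces and score matching described in this abstract can be made concrete with the standard Boltzmann relation. This is a textbook identity offered as a sketch, not the derivation used in the paper; the symbols $E$ (potential energy), $T$, $k_B$, $\mathbf{F}$ (force), and the harmonic expansion around a minimum $\mathbf{x}_0$ with Hessian $\mathbf{H}$ are assumptions introduced here for illustration.

```latex
% Assumed: configurations x are distributed as p(x) \propto \exp(-E(x) / k_B T)
\nabla_{\mathbf{x}} \log p(\mathbf{x})
  = -\frac{1}{k_B T}\,\nabla_{\mathbf{x}} E(\mathbf{x})
  = \frac{\mathbf{F}(\mathbf{x})}{k_B T},
\qquad
\mathbf{F}(\mathbf{x}_0 + \boldsymbol{\delta}) \approx -\mathbf{H}\,\boldsymbol{\delta}
\quad \text{near a minimum } \mathbf{x}_0 .
```

Under this assumption the score is proportional to the force, so a model that predicts the response to a small displacement also points back toward the energy minimum.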
Affiliation(s)
- Bingqing Cheng
- Department of Chemistry, University of California, Berkeley, California 94720, United States
- The Institute of Science and Technology Austria, Am Campus 1, 3400 Klosterneuburg, Austria
5
Wang Z, Liu Z, Zhang W, Li Y, Feng Y, Lv S, Diao H, Luo Z, Yan P, He M, Li X. AptaDiff: de novo design and optimization of aptamers based on diffusion models. Brief Bioinform 2024; 25:bbae517. [PMID: 39431516 PMCID: PMC11491854 DOI: 10.1093/bib/bbae517] [Received: 04/01/2024] [Revised: 07/05/2024] [Accepted: 10/05/2024] [Indexed: 10/22/2024] [Open access]
Abstract
Aptamers are single-stranded nucleic acid ligands, featuring high affinity and specificity to target molecules. Traditionally they are identified from large DNA/RNA libraries using in vitro methods, like Systematic Evolution of Ligands by Exponential Enrichment (SELEX). However, these libraries capture only a small fraction of theoretical sequence space, and various aptamer candidates are constrained by actual sequencing capabilities from the experiment. Addressing this, we proposed AptaDiff, the first in silico aptamer design and optimization method based on the diffusion model. AptaDiff can generate aptamers beyond the constraints of high-throughput sequencing data, leveraging motif-dependent latent embeddings from a variational autoencoder, and can optimize aptamers by affinity-guided aptamer generation according to Bayesian optimization. Comparative evaluations revealed AptaDiff's superiority over existing aptamer generation methods in terms of quality and fidelity across four high-throughput screening datasets targeting distinct proteins. Moreover, surface plasmon resonance experiments were conducted to validate the binding affinity of aptamers generated through Bayesian optimization for two target proteins. The results showed a significant boost of 87.9% and 60.2% in RU values, along with a 3.6-fold and 2.4-fold decrease in KD values for the respective target proteins. Notably, the optimized aptamers demonstrated superior binding affinity compared to top experimental candidates selected through SELEX, underscoring the promise of AptaDiff in accelerating the discovery of superior aptamers.
Affiliation(s)
- Zhen Wang
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018 Zhejiang, China
- College of Electrical and Information Engineering, Hunan University, Changsha, 410082 Hunan, China
- Ziqi Liu
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018 Zhejiang, China
- School of Molecular Medicine, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310024 Zhejiang, China
- Wei Zhang
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018 Zhejiang, China
- Yanjun Li
- Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development, University of Florida, Gainesville, FL 32610, United States
- Yizhen Feng
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018 Zhejiang, China
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310014 Zhejiang, China
- Shaokang Lv
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018 Zhejiang, China
- Department of Chemical Biology, Zhejiang University of Technology, Huzhou, 313200 Zhejiang, China
- Han Diao
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018 Zhejiang, China
- Department of Chemical Biology, Zhejiang University of Technology, Huzhou, 313200 Zhejiang, China
- Zhaofeng Luo
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018 Zhejiang, China
- Pengju Yan
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018 Zhejiang, China
- ElasticMind Inc, Hangzhou, 310018 Zhejiang, China
- Min He
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018 Zhejiang, China
- College of Electrical and Information Engineering, Hunan University, Changsha, 410082 Hunan, China
- Xiaolin Li
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018 Zhejiang, China
- ElasticMind Inc, Hangzhou, 310018 Zhejiang, China
6
Yang Y, Chen G, Li J, Li J, Zhang O, Zhang X, Li L, Hao J, Wang E, Heng PA. Enabling target-aware molecule generation to follow multi objectives with Pareto MCTS. Commun Biol 2024; 7:1074. [PMID: 39223327 PMCID: PMC11368924 DOI: 10.1038/s42003-024-06746-w] [Received: 04/03/2024] [Accepted: 08/16/2024] [Indexed: 09/04/2024] [Open access]
Abstract
Target-aware drug discovery has greatly accelerated the drug discovery process to design small-molecule ligands with high binding affinity to disease-related protein targets. Conditioned on targeted proteins, previous works utilize various kinds of deep generative models and have shown great potential in generating molecules with strong protein-ligand binding interactions. However, beyond binding affinity, effective drug molecules must manifest other essential properties such as high drug-likeness, which are not explicitly addressed by current target-aware generative methods. In this article, aiming to bridge the gap of multi-objective target-aware molecule generation in the field of deep learning-based drug discovery, we propose ParetoDrug, a Pareto Monte Carlo Tree Search (MCTS) generation algorithm. ParetoDrug searches molecules on the Pareto Front in chemical space using MCTS to enable synchronous optimization of multiple properties. Specifically, ParetoDrug utilizes pretrained atom-by-atom autoregressive generative models for the exploration guidance to desired molecules during MCTS searching. Besides, when selecting the next atom symbol, a scheme named ParetoPUCT is proposed to balance exploration and exploitation. Benchmark experiments and case studies demonstrate that ParetoDrug is highly effective in traversing the large and complex chemical space to discover novel compounds with satisfactory binding affinities and drug-like properties for various multi-objective target-aware drug discovery tasks.
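The Pareto front mentioned in this abstract can be illustrated with a minimal non-dominated filter. This is only a sketch of the general concept, not ParetoDrug's implementation: the function names `dominates` and `pareto_front` and the candidate scores (a binding score where higher is better, paired with a drug-likeness value) are hypothetical and chosen for illustration.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is at least as good as b on every objective
    and strictly better on at least one (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (binding score, drug-likeness) pairs for four candidate molecules.
candidates = [
    (7.5, 0.40),
    (6.0, 0.80),
    (5.0, 0.70),  # dominated by (6.0, 0.80): worse on both objectives
    (8.0, 0.30),
]

front = pareto_front(candidates)
print(front)
```

A search that targets this front returns a set of trade-offs rather than a single optimum, which is why no one objective, such as binding affinity, can be improved at unlimited cost to the others.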
Affiliation(s)
- Yaodong Yang
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
- Jinpeng Li
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
- Jianye Hao
- Noah's Ark Lab, Huawei, Shenzhen, China.
- Pheng-Ann Heng
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
7
Weller J, Rohs R. Structure-Based Drug Design with a Deep Hierarchical Generative Model. J Chem Inf Model 2024; 64:6450-6463. [PMID: 39058534 PMCID: PMC11350878 DOI: 10.1021/acs.jcim.4c01193] [Received: 07/08/2024] [Revised: 07/16/2024] [Accepted: 07/17/2024] [Indexed: 07/28/2024]
Abstract
Recently, the remarkable growth of available crystal structure data and libraries of commercially available or readily synthesizable molecules have unlocked previously inaccessible regions of chemical space for drug development. Paired with improvements in virtual ligand screening methods, these expanded libraries are having a notable impact on early drug design efforts. Yet screening-based methods still face scalability limits, due to computational constraints and the sheer scale of drug-like space. Machine learning approaches are overcoming these limitations by learning the fundamental intra- and intermolecular relationships in drug-target systems from existing data. Here, we introduce DrugHIVE, a deep hierarchical variational autoencoder that outperforms state-of-the-art autoregressive and diffusion-based methods in both speed and performance on common generative benchmarks. DrugHIVE's hierarchical design enables improved control over molecular generation. Its capabilities include dramatically increasing virtual screening efficiency and accelerating a wide range of common drug design tasks, including de novo generation, molecular optimization, scaffold hopping, linker design, and high-throughput pattern replacement. Our highly scalable method can even be applied to receptors with high-confidence AlphaFold-predicted structures, extending the ability to generate high-quality drug-like molecules to a majority of the unsolved human proteome.
Affiliation(s)
- Jesse A. Weller
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California 90089, United States
- Department of Physics and Astronomy, University of Southern California, Los Angeles, California 90089, United States
- Remo Rohs
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California 90089, United States
- Department of Physics and Astronomy, University of Southern California, Los Angeles, California 90089, United States
- Department of Chemistry, University of Southern California, Los Angeles, California 90089, United States
- Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, California 90089, United States
8
Cremer J, Le T, Noé F, Clevert DA, Schütt KT. PILOT: equivariant diffusion for pocket-conditioned de novo ligand generation with multi-objective guidance via importance sampling. Chem Sci 2024:d4sc03523b. [PMID: 39211741 PMCID: PMC11348832 DOI: 10.1039/d4sc03523b] [Received: 05/29/2024] [Accepted: 08/19/2024] [Indexed: 09/04/2024] [Open access]
Abstract
The generation of ligands that both are tailored to a given protein pocket and exhibit a range of desired chemical properties is a major challenge in structure-based drug design. Here, we propose an in silico approach for the de novo generation of 3D ligand structures using the equivariant diffusion model PILOT, combining pocket conditioning with large-scale pre-training and property guidance. Its multi-objective trajectory-based importance sampling strategy is designed to direct the model towards molecules that not only exhibit desired characteristics such as increased binding affinity for a given protein pocket but also maintain high synthetic accessibility. This ensures the practicality of sampled molecules, thus maximizing their potential for the drug discovery pipeline. PILOT significantly outperforms existing methods across various metrics on the common benchmark dataset CrossDocked2020. Moreover, we employ PILOT to generate novel ligands for unseen protein pockets from the Kinodata-3D dataset, which encompasses a substantial portion of the human kinome. The generated structures exhibit predicted IC50 values indicative of potent biological activity, which highlights the potential of PILOT as a powerful tool for structure-based drug design.
Affiliation(s)
- Julian Cremer
- Machine Learning & Computational Sciences, Pfizer Worldwide R&D Berlin Germany
- Computational Science Laboratory, Universitat Pompeu Fabra, PRBB Spain
- Tuan Le
- Machine Learning & Computational Sciences, Pfizer Worldwide R&D Berlin Germany
- Department of Mathematics and Computer Science, Freie Universität Berlin Germany
- Frank Noé
- Department of Mathematics and Computer Science, Freie Universität Berlin Germany
- Microsoft Research AI4Science, Microsoft Berlin Germany
- Djork-Arné Clevert
- Machine Learning & Computational Sciences, Pfizer Worldwide R&D Berlin Germany
- Kristof T Schütt
- Machine Learning & Computational Sciences, Pfizer Worldwide R&D Berlin Germany
9
Carlsson J, Luttens A. Structure-based virtual screening of vast chemical space as a starting point for drug discovery. Curr Opin Struct Biol 2024; 87:102829. [PMID: 38848655 DOI: 10.1016/j.sbi.2024.102829] [Received: 02/28/2024] [Revised: 04/16/2024] [Accepted: 04/21/2024] [Indexed: 06/09/2024]
Abstract
Structure-based virtual screening aims to find molecules forming favorable interactions with a biological macromolecule using computational models of complexes. The recent surge of commercially available chemical space provides the opportunity to search for ligands of therapeutic targets among billions of compounds. This review offers a compact overview of structure-based virtual screens of vast chemical spaces, highlighting successful applications in early drug discovery for therapeutically important targets such as G protein-coupled receptors and viral enzymes. Emphasis is placed on strategies to explore ultra-large chemical libraries and synergies with emerging machine learning techniques. The current opportunities and future challenges of virtual screening are discussed, indicating that this approach will play an important role in the next-generation drug discovery pipeline.
Affiliation(s)
- Jens Carlsson
- Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, BMC, Box 596, SE-751 24 Uppsala, Sweden.
- Andreas Luttens
- Institute for Medical Engineering & Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
10
Ibrahim PEGF, Zuccotto F, Zachariae U, Gilbert I, Bodkin M. Accurate prediction of dynamic protein-ligand binding using P-score ranking. J Comput Chem 2024; 45:1762-1778. [PMID: 38647338 DOI: 10.1002/jcc.27370] [Received: 10/20/2023] [Revised: 03/20/2024] [Accepted: 03/22/2024] [Indexed: 04/25/2024]
Abstract
Protein-ligand binding prediction typically relies on docking methodologies and associated scoring functions to propose the binding mode of a ligand in a biological target. Significant challenges are associated with this approach, including the flexibility of the protein-ligand system, solvent-mediated interactions, and associated entropy changes. In addition, scoring functions are only weakly accurate due to the short time required for calculating enthalpic and entropic binding interactions. The workflow described here attempts to address these limitations by combining supervised molecular dynamics with dynamical averaging quantum mechanics fragment molecular orbital. This combination significantly increased the ability to predict the experimental binding structure of protein-ligand complexes independent from the starting position of the ligands or the binding site conformation. We found that the predictive power could be enhanced by combining the residence time and interaction energies as descriptors in a novel scoring function named the P-score. This is illustrated using six different protein-ligand targets as case studies.
Affiliation(s)
- Peter E G F Ibrahim
- Drug Discovery Unit, Division of Biological Chemistry and Drug Discovery, University of Dundee, Dundee, UK
- Fabio Zuccotto
- Drug Discovery Unit, Division of Biological Chemistry and Drug Discovery, University of Dundee, Dundee, UK
- Ulrich Zachariae
- Drug Discovery Unit, Division of Biological Chemistry and Drug Discovery, University of Dundee, Dundee, UK
- Ian Gilbert
- Drug Discovery Unit, Division of Biological Chemistry and Drug Discovery, University of Dundee, Dundee, UK
- Mike Bodkin
- Drug Discovery Unit, Division of Biological Chemistry and Drug Discovery, University of Dundee, Dundee, UK
11
Bai Q, Xu T, Huang J, Pérez-Sánchez H. Geometric deep learning methods and applications in 3D structure-based drug design. Drug Discov Today 2024; 29:104024. [PMID: 38759948 DOI: 10.1016/j.drudis.2024.104024] [Received: 02/23/2024] [Revised: 05/02/2024] [Accepted: 05/10/2024] [Indexed: 05/19/2024]
Abstract
3D structure-based drug design (SBDD) is considered a challenging and rational way for innovative drug discovery. Geometric deep learning is a promising approach that solves the accurate model training of 3D SBDD through building neural network models to learn non-Euclidean data, such as 3D molecular graphs and manifold data. Here, we summarize geometric deep learning methods and applications that contain 3D molecular representations, equivariant graph neural networks (EGNNs), and six generative model methods [diffusion model, flow-based model, generative adversarial networks (GANs), variational autoencoder (VAE), autoregressive models, and energy-based models]. Our review provides insights into geometric deep learning methods and advanced applications of 3D SBDD that will be of relevance for the drug discovery community.
Affiliation(s)
- Qifeng Bai
- School of Basic Medical Sciences, Lanzhou University, Lanzhou 730000, Gansu, PR China.
- Junzhou Huang
- Department of Computer Science and Engineering, the University of Texas at Arlington, Arlington, TX 76019, USA
- Horacio Pérez-Sánchez
- Structural Bioinformatics and High Performance Computing Research Group (BIO-HPC), Computer Engineering Department, UCAM Universidad Católica de Murcia, Murcia 30107, Spain.
12
Wang H, Chen B, Sun H, Zhang Y. Carbon-based molecular properties efficiently predicted by deep learning-based quantum chemical simulation with large language models. Comput Biol Med 2024; 176:108531. [PMID: 38728991 DOI: 10.1016/j.compbiomed.2024.108531] [Received: 02/11/2024] [Revised: 04/21/2024] [Accepted: 04/28/2024] [Indexed: 05/12/2024]
Abstract
The prediction of thermodynamic properties of carbon-based molecules based on their geometrical conformation using fluctuation and density functional theories has achieved great success in the field of energy chemistry, while the excessive computational cost provides both opportunities and challenges for the integration of machine learning. In this work, a deep learning-based quantum chemical prediction model was constructed for efficient prediction of thermodynamic properties of carbon-based molecules. We constructed a novel framework - encoding the 3D information into a large language model (LLM), which in turn generates a 2D SMILES string, while embedding a learnable encoding designed to preserve the integrity of the original 3D information, providing better structural information for the model. Additionally, we have designed an equivariant learning module to encompass representations of conformations and feature learning for conformational sampling. This framework aims to predict thermodynamic properties more accurately than learning from 2D topology alone, while providing faster computational speeds than conventional simulations. By combining machine learning and quantum chemistry, we pioneer efficient practical applications in the field of energy chemistry. Our model advances the integration of data-driven and physics-based modeling to unlock novel insights into carbon-based molecules.
Collapse
Affiliation(s)
- Haoyu Wang
  - University of Shanghai for Science and Technology, Shanghai, China; School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, China.
- Bin Chen
  - University of Shanghai for Science and Technology, Shanghai, China; School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, China.
- Hangling Sun
  - Hengtu Imalligent Technology (Shanghai) Co., Ltd., Shanghai, China
- Yuxuan Zhang
  - University of Shanghai for Science and Technology, Shanghai, China
|
13
|
Ju W, Fang Z, Gu Y, Liu Z, Long Q, Qiao Z, Qin Y, Shen J, Sun F, Xiao Z, Yang J, Yuan J, Zhao Y, Wang Y, Luo X, Zhang M. A Comprehensive Survey on Deep Graph Representation Learning. Neural Netw 2024; 173:106207. [PMID: 38442651 DOI: 10.1016/j.neunet.2024.106207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 01/23/2024] [Accepted: 02/21/2024] [Indexed: 03/07/2024]
Abstract
Graph representation learning aims to effectively encode high-dimensional sparse graph-structured data into low-dimensional dense vectors, which is a fundamental task that has been widely studied in a range of fields, including machine learning and data mining. Classic graph embedding methods follow the basic idea that the embedding vectors of interconnected nodes in the graph can still maintain a relatively close distance, thereby preserving the structural information between the nodes in the graph. However, this is sub-optimal for three reasons: (i) traditional methods have limited model capacity, which limits learning performance; (ii) existing techniques typically rely on unsupervised learning strategies and fail to couple with the latest learning paradigms; (iii) representation learning and downstream tasks depend on each other and should be jointly enhanced. With the remarkable success of deep learning, deep graph representation learning has shown great potential and advantages over shallow (traditional) methods, and a large number of deep graph representation learning techniques, especially graph neural networks, have been proposed in the past decade. In this survey, we comprehensively review current deep graph representation learning algorithms by proposing a new taxonomy of existing state-of-the-art literature. Specifically, we systematically summarize the essential components of graph representation learning and categorize existing approaches by graph neural network architecture and by the most recent advanced learning paradigms. Moreover, this survey also presents practical and promising applications of deep graph representation learning. Last but not least, we state new perspectives and suggest challenging directions that deserve further investigation in the future.
Affiliation(s)
- Wei Ju
  - School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Zheng Fang
  - School of Intelligence Science and Technology, Peking University, Beijing, 100871, China
- Yiyang Gu
  - School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Zequn Liu
  - School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Qingqing Long
  - Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100086, China
- Ziyue Qiao
  - Artificial Intelligence Thrust, The Hong Kong University of Science and Technology, Guangzhou, 511453, China
- Yifang Qin
  - School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Jianhao Shen
  - School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Fang Sun
  - Department of Computer Science, University of California, Los Angeles, 90095, USA
- Zhiping Xiao
  - Department of Computer Science, University of California, Los Angeles, 90095, USA
- Junwei Yang
  - School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Jingyang Yuan
  - School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Yusheng Zhao
  - School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Yifan Wang
  - School of Information Technology & Management, University of International Business and Economics, Beijing, 100029, China
- Xiao Luo
  - Department of Computer Science, University of California, Los Angeles, 90095, USA.
- Ming Zhang
  - School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China.
|
14
|
Dunn I, Koes DR. Mixed Continuous and Categorical Flow Matching for 3D De Novo Molecule Generation. ARXIV 2024:arXiv:2404.19739v1. [PMID: 38745704 PMCID: PMC11092876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
Deep generative models that produce novel molecular structures have the potential to facilitate chemical discovery. Diffusion models currently achieve state-of-the-art performance for 3D molecule generation. In this work, we explore the use of flow matching, a recently proposed generative modeling framework that generalizes diffusion models, for the task of de novo molecule generation. Flow matching provides flexibility in model design; however, the framework is predicated on the assumption of continuously-valued data. 3D de novo molecule generation requires jointly sampling continuous and categorical variables such as atom position and atom type. We extend the flow matching framework to categorical data by constructing flows that are constrained to exist on a continuous representation of categorical data known as the probability simplex. We call this extension SimplexFlow. We explore the use of SimplexFlow for de novo molecule generation. However, we find that, in practice, a simpler approach that makes no accommodations for the categorical nature of the data yields equivalent or superior performance. As a result of these experiments, we present FlowMol, a flow matching model for 3D de novo molecule generation that achieves improved performance over prior flow matching methods, and we raise important questions about the design of prior distributions for achieving strong performance in flow matching models. Code and trained models for reproducing this work are available at https://github.com/dunni3/FlowMol.
Affiliation(s)
- Ian Dunn
  - Dept. of Computational & Systems Biology, University of Pittsburgh, Pittsburgh, PA 15260
- David Ryan Koes
  - Dept. of Computational & Systems Biology, University of Pittsburgh, Pittsburgh, PA 15260
|
15
|
Kim H, Lee K, Kim C, Lim J, Kim WY. DFRscore: Deep Learning-Based Scoring of Synthetic Complexity with Drug-Focused Retrosynthetic Analysis for High-Throughput Virtual Screening. J Chem Inf Model 2024; 64:2432-2444. [PMID: 37651152 DOI: 10.1021/acs.jcim.3c01134] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 09/01/2023]
Abstract
Recently emerging generative AI models enable us to produce a vast number of compounds for potential applications. While they can provide novel molecular structures, the synthetic feasibility of the generated molecules is often questioned. To address this issue, a few recent studies have attempted to use deep learning models to estimate the synthetic accessibility of many molecules rapidly. However, retrosynthetic analysis tools used to train the models rely on reaction templates automatically extracted from a large reaction database that are not domain-specific and may exhibit low chemical correctness. To overcome this limitation, we introduce DFRscore (Drug-Focused Retrosynthetic score), a deep learning-based approach for a more practical assessment of synthetic accessibility in drug discovery. The DFRscore model is trained exclusively on drug-focused reactions, providing a predicted number of minimally required synthetic steps for each compound. This approach enables practitioners to filter out compounds that do not meet their desired level of synthetic accessibility at an early stage of high-throughput virtual screening for accelerated drug discovery. The proposed strategy can be easily adapted to other domains by adjusting the synthesis planning setup of the reaction templates and starting materials.
Affiliation(s)
- Hyeongwoo Kim
  - Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- Kyunghoon Lee
  - Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- Chansu Kim
  - Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- Jaechang Lim
  - HITS Incorporation, 124 Teheran-ro, Gangnam-gu, Seoul 06234, Republic of Korea
- Woo Youn Kim
  - Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
  - HITS Incorporation, 124 Teheran-ro, Gangnam-gu, Seoul 06234, Republic of Korea
  - AI Institute, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
|
16
|
Wu P, Du H, Yan Y, Lee TY, Bai C, Wu S. Guided diffusion for molecular generation with interaction prompt. Brief Bioinform 2024; 25:bbae174. [PMID: 38647154 PMCID: PMC11033848 DOI: 10.1093/bib/bbae174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Revised: 03/16/2024] [Accepted: 03/26/2024] [Indexed: 04/25/2024] Open
Abstract
Molecular generative models have exhibited promising capabilities in designing molecules from scratch with high binding affinities in a predetermined protein pocket, offering potential synergies with traditional structure-based drug design strategies. However, the generative processes of such models are random, and the atomic interaction information between ligand and protein is ignored. On the other hand, the ligand has a high propensity to bind with residues called hotspots. Hotspot residues contribute to the majority of the binding free energies and have been recognized as appealing targets for designed molecules. In this work, we develop an interaction prompt guided diffusion model, InterDiff, to deal with these challenges. Four kinds of atomic interactions are involved in our model and represented as learnable vector embeddings. These embeddings serve as conditions for individual residues to guide the molecular generative process. Comprehensive in silico experiments show that our model can generate molecules with desired ligand-protein interactions in a guidable way. Furthermore, we validate InterDiff on two realistic protein-based therapeutic agents. Results show that InterDiff can generate molecules with better or similar binding modes compared to known targeted drugs.
Affiliation(s)
- Peng Wu
  - Department of Urology, South China Hospital, Medical School, Shenzhen University, Fuxin Road, Longgang District, Shenzhen, 518116, China. Tel.: +86 0755 89798999
- Huabin Du
  - MoMed Biotechnology Co., Ltd., Hangzhou 310005, China
- Yingchao Yan
  - MoMed Biotechnology Co., Ltd., Hangzhou 310005, China
- Tzong-Yi Lee
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan, China. Tel.: +886 0928 560313
- Chen Bai
  - MoMed Biotechnology Co., Ltd., Hangzhou 310005, China
  - Warshel Institute for Computational Biology, School of Life and Health Sciences, School of Medicine, The Chinese University of Hong Kong, Shenzhen, Shenzhen, 518172, Guangdong, China. Tel.: +86 0755 84273118
- Song Wu
  - Department of Urology, South China Hospital, Medical School, Shenzhen University, Fuxin Road, Longgang District, Shenzhen, 518116, China. Tel.: +86 0755 89798999
  - South China Hospital, Health Science Center, Shenzhen University, Shenzhen 518116, China
|
17
|
Huang L, Xu T, Yu Y, Zhao P, Chen X, Han J, Xie Z, Li H, Zhong W, Wong KC, Zhang H. A dual diffusion model enables 3D molecule generation and lead optimization based on target pockets. Nat Commun 2024; 15:2657. [PMID: 38531837 DOI: 10.1038/s41467-024-46569-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Accepted: 03/01/2024] [Indexed: 03/28/2024] Open
Abstract
Structure-based generative chemistry is essential in computer-aided drug discovery by exploring a vast chemical space to design ligands with high binding affinity for targets. However, traditional in silico methods are limited by computational inefficiency, while machine learning approaches face bottlenecks due to auto-regressive sampling. To address these concerns, we have developed a conditional deep generative model, PMDM, for 3D molecule generation fitting specified targets. PMDM consists of a conditional equivariant diffusion model with both local and global molecular dynamics, enabling PMDM to consider the conditioned protein information to generate molecules efficiently. The comprehensive experiments indicate that PMDM outperforms baseline models across multiple evaluation metrics. To evaluate the applications of PMDM under real drug design scenarios, we conduct lead compound optimization for SARS-CoV-2 main protease (Mpro) and Cyclin-dependent Kinase 2 (CDK2), respectively. The selected lead optimization molecules are synthesized and evaluated for their in-vitro activities against CDK2, displaying improved CDK2 activity.
Affiliation(s)
- Lei Huang
  - City University of Hong Kong, Hong Kong, SAR, China
  - Tencent AI Lab, Shenzhen, China
- Yang Yu
  - Tencent AI Lab, Shenzhen, China
- Jing Han
  - Regor Therapeutics Group, Shanghai, China
- Zhi Xie
  - Regor Therapeutics Group, Shanghai, China
- Hailong Li
  - Regor Therapeutics Group, Shanghai, China.
- Ka-Chun Wong
  - City University of Hong Kong, Hong Kong, SAR, China.
|
18
|
Tang Y, Moretti R, Meiler J. Recent Advances in Automated Structure-Based De Novo Drug Design. J Chem Inf Model 2024; 64:1794-1805. [PMID: 38485516 PMCID: PMC10966644 DOI: 10.1021/acs.jcim.4c00247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2024] [Revised: 02/26/2024] [Accepted: 02/29/2024] [Indexed: 03/26/2024]
Abstract
As the number of determined and predicted protein structures and the size of druglike 'make-on-demand' libraries soar, the time-consuming nature of structure-based computer-aided drug design calls for innovative computational algorithms. De novo drug design introduces in silico heuristics to accelerate searching in the vast chemical space. This review focuses on recent advances in structure-based de novo drug design, ranging from conventional fragment-based methods, evolutionary algorithms, and Metropolis Monte Carlo methods to deep generative models. Due to the historical limitation of de novo drug design generating readily available drug-like molecules, we highlight the synthetic accessibility efforts in each category and the benchmarking strategies taken to validate the proposed framework.
Affiliation(s)
- Yidan Tang
  - Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
- Rocco Moretti
  - Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
  - Center for Structural Biology, Vanderbilt University, Nashville, Tennessee 37240, United States
- Jens Meiler
  - Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
  - Center for Structural Biology, Vanderbilt University, Nashville, Tennessee 37240, United States
  - Institute of Drug Discovery, Faculty of Medicine, University of Leipzig, 04103 Leipzig, Germany
|
19
|
Weller JA, Rohs R. DrugHIVE: Target-specific spatial drug design and optimization with a hierarchical generative model. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.12.22.573155. [PMID: 38187658 PMCID: PMC10769420 DOI: 10.1101/2023.12.22.573155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Rapid advancement in the computational methods of structure-based drug design has led to their widespread adoption as key tools in the early drug development process. Recently, the remarkable growth of available crystal structure data and libraries of commercially available or readily synthesizable molecules have unlocked previously inaccessible regions of chemical space for drug development. Paired with improvements in virtual ligand screening methods, these expanded libraries are having a significant impact on the success of early drug design efforts. However, screening-based methods are limited in their scalability due to computational limits and the sheer scale of drug-like space. An approach within the quickly evolving field of artificial intelligence (AI), deep generative modeling, is extending the reach of molecular design beyond classical methods by learning the fundamental intra- and inter-molecular relationships in drug-target systems from existing data. In this work we introduce DrugHIVE, a deep hierarchical structure-based generative model that enables fine-grained control over molecular generation. Our model outperforms state-of-the-art autoregressive and diffusion-based methods on common benchmarks and in speed of generation. Here, we demonstrate DrugHIVE's capacity to accelerate a wide range of common drug design tasks such as de novo generation, molecular optimization, scaffold hopping, linker design, and high-throughput pattern replacement. Our method is highly scalable and can be applied to high-confidence AlphaFold-predicted receptors, extending our ability to generate high-quality drug-like molecules to a majority of the unsolved human proteome.
|
20
|
Wang M, Wu Z, Wang J, Weng G, Kang Y, Pan P, Li D, Deng Y, Yao X, Bing Z, Hsieh CY, Hou T. Genetic Algorithm-Based Receptor Ligand: A Genetic Algorithm-Guided Generative Model to Boost the Novelty and Drug-Likeness of Molecules in a Sampling Chemical Space. J Chem Inf Model 2024; 64:1213-1228. [PMID: 38302422 DOI: 10.1021/acs.jcim.3c01964] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2024]
Abstract
Deep learning-based de novo molecular design has recently gained significant attention. While numerous DL-based generative models have been successfully developed for designing novel compounds, the majority of the generated molecules lack sufficiently novel scaffolds or high drug-like profiles. The aforementioned issues may not be fully captured by commonly used metrics for the assessment of molecular generative models, such as novelty, diversity, and quantitative estimation of the drug-likeness score. To address these limitations, we proposed a genetic algorithm-guided generative model called GARel (genetic algorithm-based receptor-ligand interaction generator), a novel framework for training a DL-based generative model to produce drug-like molecules with novel scaffolds. To efficiently train the GARel model, we utilized dense net to update the parameters based on molecules with novel scaffolds and drug-like features. To demonstrate the capability of the GARel model, we used it to design inhibitors for three targets: AA2AR, EGFR, and SARS-Cov2. The results indicate that GARel-generated molecules feature more diverse and novel scaffolds and possess more desirable physicochemical properties and favorable docking scores. Compared with other generative models, GARel makes significant progress in balancing novelty and drug-likeness, providing a promising direction for the further development of DL-based de novo design methodology with potential impacts on drug discovery.
Affiliation(s)
- Mingyang Wang
  - College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - CarbonSilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Zhengjian Wu
  - College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - School of Computer Science, Wuhan University, Wuhan 430072, Hubei, China
- Jike Wang
  - College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - CarbonSilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Gaoqi Weng
  - College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yu Kang
  - College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Peichen Pan
  - College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Dan Li
  - College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Xiaojun Yao
  - Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery Macau Institute for Applied Research in Medicine and Health State Key Laboratory of Quality Research in Chinese Medicine, Macau University of Science and Technology, Taipa, Macau 999078, China
- Zhitong Bing
  - Institute of Modern Physics, Chinese Academy of Sciences, Lanzhou, Gansu 730000, China
- Chang-Yu Hsieh
  - College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Tingjun Hou
  - College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
|
21
|
Garg V. Generative AI for graph-based drug design: Recent advances and the way forward. Curr Opin Struct Biol 2024; 84:102769. [PMID: 38199072 DOI: 10.1016/j.sbi.2023.102769] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Revised: 12/17/2023] [Accepted: 12/19/2023] [Indexed: 01/12/2024]
Abstract
Discovering new promising molecule candidates that could translate into effective drugs is a key scientific pursuit. However, factors such as the vastness and discreteness of the molecular search space pose a formidable technical challenge in this quest. AI-driven generative models can effectively learn from data and offer hope of streamlining drug design. In this article, we review the state of the art in generative models that operate on molecular graphs. We also shed light on some limitations of the existing methodology and sketch directions to harness the potential of AI for drug design tasks going forward.
Affiliation(s)
- Vikas Garg
- Aalto University and YaiYai Ltd, Finland.
|
22
|
Bass L, Elder LH, Folescu DE, Forouzesh N, Tolokh IS, Karpatne A, Onufriev AV. Improving the Accuracy of Physics-Based Hydration-Free Energy Predictions by Machine Learning the Remaining Error Relative to the Experiment. J Chem Theory Comput 2024; 20:396-410. [PMID: 38149593 PMCID: PMC10950260 DOI: 10.1021/acs.jctc.3c00981] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/28/2023]
Abstract
The accuracy of computational models of water is key to atomistic simulations of biomolecules. We propose a computationally efficient way to improve the accuracy of the prediction of hydration-free energies (HFEs) of small molecules: the remaining errors of the physics-based models relative to the experiment are predicted and mitigated by machine learning (ML) as a postprocessing step. Specifically, the trained graph convolutional neural network attempts to identify the "blind spots" in the physics-based model predictions, where the complex physics of aqueous solvation is poorly accounted for, and partially corrects for them. The strategy is explored for five classical solvent models representing various accuracy/speed trade-offs, from the fast analytical generalized Born (GB) to the popular TIP3P explicit solvent model; experimental HFEs of small neutral molecules from the FreeSolv set are used for the training and testing. For all of the models, the ML correction reduces the resulting root-mean-square error relative to the experiment for HFEs of small molecules, without significant overfitting and with negligible computational overhead. For example, on the test set, the relative accuracy improvement is 47% for the fast analytical GB, making it, after the ML correction, almost as accurate as uncorrected TIP3P. For the TIP3P model, the accuracy improvement is about 39%, bringing the ML-corrected model's accuracy below the 1 kcal/mol threshold. In general, the relative benefit of the ML corrections is smaller for more accurate physics-based models, reaching the lower limit of about 20% relative accuracy gain compared with that of the physics-based treatment alone. The proposed strategy of using ML to learn the remaining error of physics-based models offers a distinct advantage over training ML alone directly on reference HFEs: it preserves the correct overall trend, even well outside of the training set.
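The residual-learning strategy described in this abstract (predict the remaining error of a physics-based model relative to experiment, then add that prediction back as a correction) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the function names are hypothetical, and an ordinary least-squares model stands in for the graph convolutional network used by the authors.

```python
import numpy as np

def fit_residual_correction(features, physics_pred, experiment):
    """Fit a linear model to the remaining error (experiment - physics)."""
    residual = experiment - physics_pred
    X = np.column_stack([features, np.ones(len(features))])  # append bias column
    weights, *_ = np.linalg.lstsq(X, residual, rcond=None)
    return weights

def apply_correction(weights, features, physics_pred):
    """Add the predicted residual to the physics-based prediction."""
    X = np.column_stack([features, np.ones(len(features))])
    return physics_pred + X @ weights

def rmse(pred, target):
    """Root-mean-square error between predictions and reference values."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

# Synthetic stand-ins (NOT FreeSolv data): reference values and a toy
# "physics" model whose error is partly systematic in the features.
rng = np.random.default_rng(0)
n = 200
features = rng.normal(size=(n, 3))
experiment = rng.normal(loc=-4.0, scale=3.0, size=n)
systematic = features @ np.array([0.8, -0.5, 0.3]) + 0.4
physics_pred = experiment + systematic + rng.normal(scale=0.3, size=n)

# Train the correction on one split, evaluate on a held-out split.
train, test = slice(0, 150), slice(150, n)
w = fit_residual_correction(features[train], physics_pred[train], experiment[train])
corrected = apply_correction(w, features[test], physics_pred[test])

rmse_before = rmse(physics_pred[test], experiment[test])
rmse_after = rmse(corrected, experiment[test])
relative_gain = 1.0 - rmse_after / rmse_before  # relative accuracy improvement
print(f"RMSE before: {rmse_before:.2f}, after: {rmse_after:.2f}, gain: {relative_gain:.0%}")
```

Because the correction is added to the physics-based prediction rather than replacing it, the corrected output inherits the overall trend of the physics model, which is the advantage the abstract attributes to this design.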
Affiliation(s)
- Lewis Bass
  - Department of Computer Engineering, Virginia Tech, Blacksburg, Virginia 24061, United States
- Luke H Elder
  - Department of Computer Science, Virginia Tech, Blacksburg, Virginia 24061, United States
- Dan E Folescu
  - Department of Computer Science, Virginia Tech, Blacksburg, Virginia 24061, United States
  - Department of Mathematics, Virginia Tech, Blacksburg, Virginia 24061, United States
- Negin Forouzesh
  - Department of Computer Science, California State University, Los Angeles, California 90032, United States
- Igor S Tolokh
  - Department of Computer Science, Virginia Tech, Blacksburg, Virginia 24061, United States
- Anuj Karpatne
  - Department of Computer Science, Virginia Tech, Blacksburg, Virginia 24061, United States
- Alexey V Onufriev
  - Department of Computer Science, Virginia Tech, Blacksburg, Virginia 24061, United States
  - Department of Physics, Virginia Tech, Blacksburg, Virginia 24061, United States
  - Center for Soft Matter and Biological Physics, Virginia Tech, Blacksburg, Virginia 24061, United States
|
23
|
Powers A, Yu HH, Suriana P, Koodli RV, Lu T, Paggi JM, Dror RO. Geometric Deep Learning for Structure-Based Ligand Design. ACS CENTRAL SCIENCE 2023; 9:2257-2267. [PMID: 38161364 PMCID: PMC10755842 DOI: 10.1021/acscentsci.3c00572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Revised: 10/26/2023] [Accepted: 10/27/2023] [Indexed: 01/03/2024]
Abstract
A pervasive challenge in drug design is determining how to expand a ligand (a small molecule that binds to a target biomolecule) in order to improve various properties of the ligand. Adding single chemical groups, known as fragments, is important for lead optimization tasks, and adding multiple fragments is critical for fragment-based drug design. We have developed a comprehensive framework that uses machine learning and three-dimensional protein-ligand structures to address this challenge. Our method, FRAME, iteratively determines where on a ligand to add fragments, selects fragments to add, and predicts the geometry of the added fragments. On a comprehensive benchmark, FRAME consistently improves predicted affinity and selectivity relative to the initial ligand, while generating molecules with more drug-like chemical properties than docking-based methods currently in widespread use. FRAME learns to accurately describe molecular interactions despite being given no prior information on such interactions. The resulting framework for quality molecular hypothesis generation can be easily incorporated into the workflows of medicinal chemists for diverse tasks, including lead optimization, fragment-based drug discovery, and de novo drug design.
Affiliation(s)
- Alexander
S. Powers
- Department
of Chemistry, Stanford University, Stanford, California 94305, United States
- Department
of Computer Science, Stanford University, Stanford, California 94305, United States
- Department
of Molecular and Cellular Physiology, Stanford
University School of Medicine, Stanford, California 94305, United States
- Department
of Structural Biology, Stanford University
School of Medicine, Stanford, California 94305, United States
- Institute
for Computational and Mathematical Engineering, Stanford University, Stanford, California 94305, United States
| | - Helen H. Yu
- Department
of Computer Science, Stanford University, Stanford, California 94305, United States
- Department
of Molecular and Cellular Physiology, Stanford
University School of Medicine, Stanford, California 94305, United States
- Department
of Structural Biology, Stanford University
School of Medicine, Stanford, California 94305, United States
- Institute
for Computational and Mathematical Engineering, Stanford University, Stanford, California 94305, United States
| | - Patricia Suriana
- Department
of Computer Science, Stanford University, Stanford, California 94305, United States
- Department
of Molecular and Cellular Physiology, Stanford
University School of Medicine, Stanford, California 94305, United States
- Department
of Structural Biology, Stanford University
School of Medicine, Stanford, California 94305, United States
- Institute
for Computational and Mathematical Engineering, Stanford University, Stanford, California 94305, United States
| | - Rohan V. Koodli
- Department
of Computer Science, Stanford University, Stanford, California 94305, United States
- Department
of Molecular and Cellular Physiology, Stanford
University School of Medicine, Stanford, California 94305, United States
- Department
of Structural Biology, Stanford University
School of Medicine, Stanford, California 94305, United States
- Institute
for Computational and Mathematical Engineering, Stanford University, Stanford, California 94305, United States
- Biomedical
Informatics Program, Stanford University
School of Medicine, Stanford, California 94305, United States
| | - Tianyu Lu
- Department of Computer Science, Stanford University, Stanford, California 94305, United States
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, California 94305, United States
- Department of Structural Biology, Stanford University School of Medicine, Stanford, California 94305, United States
- Institute for Computational and Mathematical Engineering, Stanford University, Stanford, California 94305, United States
- Department of Bioengineering, Stanford University, Stanford, California 94305, United States
| | - Joseph M. Paggi
- Department of Computer Science, Stanford University, Stanford, California 94305, United States
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, California 94305, United States
- Department of Structural Biology, Stanford University School of Medicine, Stanford, California 94305, United States
- Institute for Computational and Mathematical Engineering, Stanford University, Stanford, California 94305, United States
| | - Ron O. Dror
- Department of Computer Science, Stanford University, Stanford, California 94305, United States
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, California 94305, United States
- Department of Structural Biology, Stanford University School of Medicine, Stanford, California 94305, United States
- Institute for Computational and Mathematical Engineering, Stanford University, Stanford, California 94305, United States
| |
|
24
|
Li G, Yuan Y, Zhang R. Ensemble of local and global information for Protein-Ligand Binding Affinity Prediction. Comput Biol Chem 2023; 107:107972. [PMID: 37883905 DOI: 10.1016/j.compbiolchem.2023.107972] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Revised: 10/07/2023] [Accepted: 10/17/2023] [Indexed: 10/28/2023]
Abstract
Accurately predicting protein-ligand binding affinities is crucial for determining molecular properties and understanding their physical effects. Neural networks and transformers are the predominant methods for sequence modeling, and both have been successfully applied independently for protein-ligand binding affinity prediction. As local and global information of molecules are vital for protein-ligand binding affinity prediction, we aim to combine bi-directional gated recurrent unit (BiGRU) and convolutional neural network (CNN) to effectively capture both local and global molecular information. Additionally, attention mechanisms can be incorporated to automatically learn and adjust the level of attention given to local and global information, thereby enhancing the performance of the model. To achieve this, we propose the PLAsformer approach, which encodes local and global information of molecules using 3DCNN and BiGRU with attention mechanism, respectively. This approach enhances the model's ability to encode comprehensive local and global molecular information. PLAsformer achieved a Pearson's correlation coefficient of 0.812 and a Root Mean Square Error (RMSE) of 1.284 when comparing experimental and predicted affinity on the PDBBind-2016 dataset. These results surpass the current state-of-the-art methods for binding affinity prediction. The high accuracy of PLAsformer's predictive performance, along with its excellent generalization ability, is clearly demonstrated by these findings.
Affiliation(s)
- Gaili Li
- School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China.
| | - Yongna Yuan
- School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China.
| | - Ruisheng Zhang
- School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China.
| |
|
25
|
Du H, Jiang D, Zhang O, Wu Z, Gao J, Zhang X, Wang X, Deng Y, Kang Y, Li D, Pan P, Hsieh CY, Hou T. A flexible data-free framework for structure-based de novo drug design with reinforcement learning. Chem Sci 2023; 14:12166-12181. [PMID: 37969589 PMCID: PMC10631243 DOI: 10.1039/d3sc04091g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Accepted: 10/11/2023] [Indexed: 11/17/2023] Open
Abstract
Contemporary structure-based molecular generative methods have demonstrated their potential to model the geometric and energetic complementarity between ligands and receptors, thereby facilitating the design of molecules with favorable binding affinity and target specificity. Despite the introduction of deep generative models for molecular generation, the atom-wise generation paradigm that partially contradicts chemical intuition limits the validity and synthetic accessibility of the generated molecules. Additionally, the dependence of deep learning models on large-scale structural data has hindered their adaptability across different targets. To overcome these challenges, we present a novel search-based framework, 3D-MCTS, for structure-based de novo drug design. Distinct from prevailing atom-centric methods, 3D-MCTS employs a fragment-based molecular editing strategy. The fragments decomposed from small-molecule drugs are recombined under predefined retrosynthetic rules, offering improved drug-likeness and synthesizability, overcoming the inherent limitations of atom-based approaches. Leveraging multi-threaded parallel simulations combined with a real-time energy constraint-based pruning strategy, 3D-MCTS achieves remarkable efficiency. At a fixed computational cost, it outperforms other state-of-the-art (SOTA) methods by producing molecules with enhanced binding affinity. Furthermore, its fragment-based approach ensures the generation of more dependable binding conformations, exhibiting a success rate 43.6% higher than that of other SOTAs. This advantage becomes even more pronounced when handling targets that significantly deviate from the training dataset. 3D-MCTS is capable of achieving thirty times more hits with high binding affinity than traditional virtual screening methods, which demonstrates the superior ability of 3D-MCTS to explore chemical space. 
Moreover, the flexibility of our framework makes it easy to incorporate domain knowledge during the process, thereby enabling the generation of molecules with desirable pharmacophores and enhanced binding affinity. The adaptability of 3D-MCTS is further showcased in metalloprotein applications, highlighting its potential across various drug design scenarios.
Affiliation(s)
- Hongyan Du
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Dejun Jiang
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Odin Zhang
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Zhenxing Wu
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Junbo Gao
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Xujun Zhang
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Xiaorui Wang
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
- Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology Macao 999078 China
| | - Yafeng Deng
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Yu Kang
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Dan Li
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Peichen Pan
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Chang-Yu Hsieh
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| |
|
26
|
Runcie N, Mey AS. SILVR: Guided Diffusion for Molecule Generation. J Chem Inf Model 2023; 63:5996-6005. [PMID: 37724771 PMCID: PMC10565820 DOI: 10.1021/acs.jcim.3c00667] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/03/2023] [Indexed: 09/21/2023]
Abstract
Computationally generating new synthetically accessible compounds with high affinity and low toxicity is a great challenge in drug design. Machine learning models beyond conventional pharmacophoric methods have shown promise in the generation of novel small-molecule compounds but require significant tuning for a specific protein target. Here, we introduce a method called selective iterative latent variable refinement (SILVR) for conditioning an existing diffusion-based equivariant generative model without retraining. The model allows the generation of new molecules that fit into a binding site of a protein based on fragment hits. We use the SARS-CoV-2 main protease fragments from Diamond XChem that form part of the COVID Moonshot project as a reference dataset for conditioning the molecule generation. The SILVR rate controls the extent of conditioning, and we show that moderate SILVR rates make it possible to generate new molecules of similar shape to the original fragments, meaning that the new molecules fit the binding site without knowledge of the protein. We can also merge up to 3 fragments into a new molecule without affecting the quality of molecules generated by the underlying generative model. Our method is generalizable to any protein target with known fragments and any diffusion-based model for molecule generation.
Affiliation(s)
- Nicholas T. Runcie
- EaSTCHEM School of Chemistry, University of Edinburgh, Edinburgh EH9 3FJ, U.K.
| | - Antonia S.J.S. Mey
- EaSTCHEM School of Chemistry, University of Edinburgh, Edinburgh EH9 3FJ, U.K.
| |
|
27
|
Zhang O, Wang T, Weng G, Jiang D, Wang N, Wang X, Zhao H, Wu J, Wang E, Chen G, Deng Y, Pan P, Kang Y, Hsieh CY, Hou T. Learning on topological surface and geometric structure for 3D molecular generation. Nat Comput Sci 2023; 3:849-859. [PMID: 38177756 DOI: 10.1038/s43588-023-00530-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/28/2023] [Accepted: 09/06/2023] [Indexed: 01/06/2024]
Abstract
Highly effective de novo design is a grand challenge of computer-aided drug discovery. Practical structure-specific three-dimensional molecule generations have started to emerge in recent years, but most approaches treat the target structure as a conditional input to bias the molecule generation and do not fully learn the detailed atomic interactions that govern the molecular conformation and stability of the binding complexes. The omission of these fine details leads to many models having difficulty in outputting reasonable molecules for a variety of therapeutic targets. Here, to address this challenge, we formulate a model, called SurfGen, that designs molecules in a fashion closely resembling the figurative key-and-lock principle. SurfGen comprises two equivariant neural networks, Geodesic-GNN and Geoatom-GNN, which capture the topological interactions on the pocket surface and the spatial interaction between ligand atoms and surface nodes, respectively. SurfGen outperforms other methods in a number of benchmarks, and its high sensitivity on the pocket structures enables an effective generative-model-based solution to the thorny issue of mutation-induced drug resistance.
Affiliation(s)
- Odin Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Tianyue Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Gaoqi Weng
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Dejun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Ning Wang
- Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou, China
| | - Xiaorui Wang
- Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou, China
| | - Huifeng Zhao
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Jialu Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Ercheng Wang
- Zhejiang Lab, Zhejiang University, Hangzhou, China
| | | | - Yafeng Deng
- Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou, China
| | - Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China.
| | - Chang-Yu Hsieh
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China.
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China.
| |
|
28
|
Hadfield TE, Scantlebury J, Deane CM. Exploring the ability of machine learning-based virtual screening models to identify the functional groups responsible for binding. J Cheminform 2023; 15:84. [PMID: 37726844 PMCID: PMC10509074 DOI: 10.1186/s13321-023-00755-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Accepted: 08/25/2023] [Indexed: 09/21/2023] Open
Abstract
Many recently proposed structure-based virtual screening models appear to be able to accurately distinguish high affinity binders from non-binders. However, several recent studies have shown that they often do so by exploiting ligand-specific biases in the dataset, rather than identifying favourable intermolecular interactions in the input protein-ligand complex. In this work we propose a novel approach for assessing the extent to which machine learning-based virtual screening models are able to identify the functional groups responsible for binding. To sidestep the difficulty in establishing the ground truth importance of each atom of a large scale set of protein-ligand complexes, we propose a protocol for generating synthetic data. Each ligand in the dataset is surrounded by a randomly sampled point cloud of pharmacophores, and the label assigned to the synthetic protein-ligand complex is determined by a 3-dimensional deterministic binding rule. This allows us to precisely quantify the ground truth importance of each atom and compare it to the model generated attributions. Using our generated datasets, we demonstrate that a recently proposed deep learning-based virtual screening model, PointVS, identified the most important functional groups with 39% more efficiency than a fingerprint-based random forest, suggesting that it would generalise more effectively to new examples. In addition, we found that ligand-specific biases, such as those present in widely used virtual screening datasets, substantially impaired the ability of all ML models to identify the most important functional groups. We have made our synthetic data generation framework available to facilitate the benchmarking of new virtual screening models. Code is available at https://github.com/tomhadfield95/synthVS .
Affiliation(s)
- Thomas E Hadfield
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, UK
| | - Jack Scantlebury
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, UK
| | - Charlotte M Deane
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, UK.
| |
|
29
|
Sagar D, Risheh A, Sheikh N, Forouzesh N. Physics-Guided Deep Generative Model for New Ligand Discovery. ACM-BCB ... ... : THE ... ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND BIOMEDICINE. ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND BIOMEDICINE 2023; 2023:10.1145/3584371.3613067. [PMID: 38706556 PMCID: PMC11067829 DOI: 10.1145/3584371.3613067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/07/2024]
Abstract
Structure-based drug discovery aims to identify small molecules that can attach to a specific target protein and change its functionality. Recently, deep learning has shown great promise in generating drug-like molecules with specific biochemical features and conditioned with structural features. However, they usually fail to incorporate an essential factor: the underlying physics which guides molecular formation and binding in real-world scenarios. In this work, we describe a physics-guided deep generative model for new ligand discovery, conditioned not only on the binding site but also on physics-based features that describe the binding mechanism between a receptor and a ligand. The proposed hybrid model has been tested on large protein-ligand complexes and small host-guest systems. Using the top-N methodology, on average more than 75% of the generated structures by our hybrid model were stronger binders than the original reference ligand. All of them had higher ΔGbind (affinity) values than the ones generated by the previous state-of-the-art method by an average margin of 1.88 kcal/mol. The visualization of the top-5 ligands generated by the proposed physics-guided model and the reference deep learning model demonstrate more feasible conformations and orientations by the former. The future directions include training and testing the hybrid model on larger datasets, adding more relevant physics-based features, and interpreting the deep learning outcomes from biophysical perspectives.
Affiliation(s)
- Dikshant Sagar
- Department of Computer Science, California State University, Los Angeles, Los Angeles, California, USA
| | - Ali Risheh
- Department of Computer Science, California State University, Los Angeles, Los Angeles, California, USA
| | - Nida Sheikh
- Department of Computer Science, California State University, Los Angeles, Los Angeles, California, USA
| | - Negin Forouzesh
- Department of Computer Science, California State University, Los Angeles, Los Angeles, California, USA
| |
|
30
|
Hagg A, Kirschner KN. Open-Source Machine Learning in Computational Chemistry. J Chem Inf Model 2023; 63:4505-4532. [PMID: 37466636 PMCID: PMC10430767 DOI: 10.1021/acs.jcim.3c00643] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/04/2023] [Indexed: 07/20/2023]
Abstract
The field of computational chemistry has seen a significant increase in the integration of machine learning concepts and algorithms. In this Perspective, we surveyed 179 open-source software projects, with corresponding peer-reviewed papers published within the last 5 years, to better understand the topics within the field being investigated by machine learning approaches. For each project, we provide a short description, the link to the code, the accompanying license type, and whether the training data and resulting models are made publicly available. Based on those deposited in GitHub repositories, the most popular employed Python libraries are identified. We hope that this survey will serve as a resource to learn about machine learning or specific architectures thereof by identifying accessible codes with accompanying papers on a topic basis. To this end, we also include computational chemistry open-source software for generating training data and fundamental Python libraries for machine learning. Based on our observations and considering the three pillars of collaborative machine learning work, open data, open source (code), and open models, we provide some suggestions to the community.
Affiliation(s)
- Alexander Hagg
- Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department of Electrical Engineering, Mechanical Engineering and Technical Journalism, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
| | - Karl N. Kirschner
- Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department of Computer Science, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
| |
|
31
|
Zhang Z, Liu Q, Lee CK, Hsieh CY, Chen E. An equivariant generative framework for molecular graph-structure Co-design. Chem Sci 2023; 14:8380-8392. [PMID: 37564414 PMCID: PMC10411624 DOI: 10.1039/d3sc02538a] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 05/19/2023] [Accepted: 07/05/2023] [Indexed: 08/12/2023] Open
Abstract
Designing molecules with desirable physiochemical properties and functionalities is a long-standing challenge in chemistry, material science, and drug discovery. Recently, machine learning-based generative models have emerged as promising approaches for de novo molecule design. However, further refinement of methodology is highly desired as most existing methods lack unified modeling of 2D topology and 3D geometry information and fail to effectively learn the structure-property relationship for molecule design. Here we present MolCode, a roto-translation equivariant generative framework for molecular graph-structure Co-design. In MolCode, 3D geometric information empowers the molecular 2D graph generation, which in turn helps guide the prediction of molecular 3D structure. Extensive experimental results show that MolCode outperforms previous methods on a series of challenging tasks including de novo molecule design, targeted molecule discovery, and structure-based drug design. Particularly, MolCode not only consistently generates valid (99.95% validity) and diverse (98.75% uniqueness) molecular graphs/structures with desirable properties, but also generates drug-like molecules with high affinity to target proteins (61.8% high affinity ratio), which demonstrates MolCode's potential applications in material design and drug discovery. Our extensive investigation reveals that the 2D topology and 3D geometry contain intrinsically complementary information in molecule design, and provide new insights into machine learning-based molecule representation and generation.
Affiliation(s)
- Zaixi Zhang
- Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China Hefei Anhui 230026 China
- State Key Laboratory of Cognitive Intelligence Hefei Anhui 230088 China
| | - Qi Liu
- Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China Hefei Anhui 230026 China
- State Key Laboratory of Cognitive Intelligence Hefei Anhui 230088 China
| | | | - Chang-Yu Hsieh
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou Zhejiang 310058 China
| | - Enhong Chen
- Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China Hefei Anhui 230026 China
- State Key Laboratory of Cognitive Intelligence Hefei Anhui 230088 China
| |
|
32
|
Zhang W, Zhang K, Huang J. A Simple Way to Incorporate Target Structural Information in Molecular Generative Models. J Chem Inf Model 2023. [PMID: 37318828 DOI: 10.1021/acs.jcim.3c00293] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/17/2023]
Abstract
Deep learning generative models are now being applied in various fields including drug discovery. In this work, we propose a novel approach to include target 3D structural information in molecular generative models for structure-based drug design. The method combines a message-passing neural network model that predicts docking scores with a generative neural network model as its reward function to navigate the chemical space searching for molecules that bind favorably with a specific target. A key feature of the method is the construction of target-specific molecular sets for training, designed to overcome potential transferability issues of surrogate docking models through a two-round training process. Consequently, this enables accurate guided exploration of the chemical space without reliance on the collection of prior knowledge about active and inactive compounds for the specific target. Tests on eight target proteins showed a 100-fold increase in hit generation compared to conventional docking calculations and the ability to generate molecules similar to approved drugs or known active ligands for specific targets without prior knowledge. This method provides a general and highly efficient solution for structure-based molecular generation.
Affiliation(s)
- Wenyi Zhang
- Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Institute of Biology, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
| | - Kaiyue Zhang
- Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
| | - Jing Huang
- Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Institute of Biology, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
| |
|
33
|
Baillif B, Cole J, McCabe P, Bender A. Deep generative models for 3D molecular structure. Curr Opin Struct Biol 2023; 80:102566. [DOI: 10.1016/j.sbi.2023.102566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Revised: 02/05/2023] [Accepted: 02/15/2023] [Indexed: 03/30/2023]
|
34
|
Janet JP, Mervin L, Engkvist O. Artificial intelligence in molecular de novo design: Integration with experiment. Curr Opin Struct Biol 2023; 80:102575. [PMID: 36966692 DOI: 10.1016/j.sbi.2023.102575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 02/09/2023] [Accepted: 02/18/2023] [Indexed: 06/04/2023]
Abstract
In this mini review, we capture the latest progress of applying artificial intelligence (AI) techniques based on deep learning architectures to molecular de novo design with a focus on integration with experimental validation. We will cover the progress and experimental validation of novel generative algorithms, the validation of QSAR models and how AI-based molecular de novo design is starting to become connected with chemistry automation. While progress has been made in the last few years, it is still early days. The experimental validations conducted thus far should be considered proof-of-principle, providing confidence that the field is moving in the right direction.
Affiliation(s)
- Jon Paul Janet
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - Lewis Mervin
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK.
| | - Ola Engkvist
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| |
|
35
|
Ciepliński T, Danel T, Podlewska S, Jastrzȩbski S. Generative Models Should at Least Be Able to Design Molecules That Dock Well: A New Benchmark. J Chem Inf Model 2023. [PMID: 37224003 DOI: 10.1021/acs.jcim.2c01355] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 05/26/2023]
Abstract
Designing compounds with desired properties is a key element of the drug discovery process. However, measuring progress in the field has been challenging due to the lack of realistic retrospective benchmarks, and the large cost of prospective validation. To close this gap, we propose a benchmark based on docking, a widely used computational method for assessing molecule binding to a protein. Concretely, the goal is to generate drug-like molecules that are scored highly by SMINA, a popular docking software. We observe that various graph-based generative models fail to propose molecules with a high docking score when trained using a realistically sized training set. This suggests a limitation of the current incarnation of models for de novo drug design. Finally, we also include simpler tasks in the benchmark based on a simpler scoring function. We release the benchmark as an easy to use package available at https://github.com/cieplinski-tobiasz/smina-docking-benchmark. We hope that our benchmark will serve as a stepping stone toward the goal of automatically generating promising drug candidates.
Affiliation(s)
- Tobiasz Ciepliński
- Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348 Kraków, Poland
| | - Tomasz Danel
- Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348 Kraków, Poland
| | - Sabina Podlewska
- Maj Institute of Pharmacology, Polish Academy of Sciences, Smȩtna 12, 31-343 Kraków, Poland
| | - Stanisław Jastrzȩbski
- Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348 Kraków, Poland
- Molecule.one, Al. Jerozolimskie 96, 00-807 Warsaw, Poland
| |
|
36
|
Thomas M, Bender A, de Graaf C. Integrating structure-based approaches in generative molecular design. Curr Opin Struct Biol 2023; 79:102559. [PMID: 36870277 DOI: 10.1016/j.sbi.2023.102559] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 11/11/2022] [Revised: 01/23/2023] [Accepted: 01/31/2023] [Indexed: 03/06/2023]
Abstract
Generative molecular design for drug discovery and development has seen a recent resurgence promising to improve the efficiency of the design-make-test-analyse cycle by computationally exploring much larger chemical spaces than traditional virtual screening techniques. However, most generative models thus far have only utilized small-molecule information to train and condition de novo molecule generators. Here, we instead focus on recent approaches that incorporate protein structure into de novo molecule optimization in an attempt to maximize the predicted on-target binding affinity of generated molecules. We summarize these structure integration principles into either distribution learning or goal-directed optimization and for each case whether the approach is protein structure-explicit or implicit with respect to the generative model. We discuss recent approaches in the context of this categorization and provide our perspective on the future direction of the field.
Affiliation(s)
- Morgan Thomas
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK.
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK. https://twitter.com/@AndreasBenderUK
| | - Chris de Graaf
- Sosei Heptares, Steinmetz Building, Granta Park, Great Abington, Cambridge, CB21 6DG, UK. https://twitter.com/@Chris_de_Graaf
| |
|
37
|
New avenues in artificial-intelligence-assisted drug discovery. Drug Discov Today 2023; 28:103516. [PMID: 36736583 DOI: 10.1016/j.drudis.2023.103516] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Received: 08/07/2022] [Revised: 12/08/2022] [Accepted: 01/26/2023] [Indexed: 02/05/2023]
Abstract
Over the past decade, the amount of biomedical data available has grown at unprecedented rates. Increased automation technology and larger data volumes have encouraged the use of machine learning (ML) or artificial intelligence (AI) techniques for mining such data and extracting useful patterns. Because the identification of chemical entities with desired biological activity is a crucial task in drug discovery, AI technologies have the potential to accelerate this process and support decision making. In addition, the advent of deep learning (DL) has shown great promise in addressing diverse problems in drug discovery, such as de novo molecular design. Herein, we will appraise the current state-of-the-art in AI-assisted drug discovery, discussing the recent applications covering generative models for chemical structure generation, scoring functions to improve binding affinity and pose prediction, and molecular dynamics to assist in the parametrization, featurization and generalization tasks. Finally, we will discuss current hurdles and the strategies to overcome them, as well as potential future directions.
|
38
|
Banerjee A, Saha S, Tvedt NC, Yang LW, Bahar I. Mutually beneficial confluence of structure-based modeling of protein dynamics and machine learning methods. Curr Opin Struct Biol 2023; 78:102517. [PMID: 36587424 PMCID: PMC10038760 DOI: 10.1016/j.sbi.2022.102517] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 08/18/2022] [Revised: 11/19/2022] [Accepted: 11/22/2022] [Indexed: 12/31/2022]
Abstract
Proteins sample an ensemble of conformers under physiological conditions, having access to a spectrum of modes of motion, also called intrinsic dynamics. These motions ensure the adaptation to various interactions in the cell, and largely assist in, if not determine, viable mechanisms of biological function. In recent years, machine learning frameworks have proven uniquely useful in structural biology, and recent studies further provide evidence for the utility and/or necessity of considering intrinsic dynamics for increasing their predictive ability. Efficient quantification of dynamics-based attributes by recently developed physics-based theories and models such as elastic network models provides a unique opportunity to generate data on dynamics for training ML models towards inferring mechanisms of protein function, assessing pathogenicity, or estimating binding affinities.
Affiliation(s)
- Anupam Banerjee
- Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh PA 15261, USA
| | - Satyaki Saha
- Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh PA 15261, USA
| | - Nathan C Tvedt
- Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh PA 15261, USA; Computational and Applied Mathematics and Statistics, The College of William and Mary, Williamsburg, VA 23185, USA
| | - Lee-Wei Yang
- Institute of Bioinformatics and Structural Biology, and PhD Program in Biomedical Artificial Intelligence, National Tsing Hua University, Hsinchu 300044, Taiwan; Physics Division, National Center for Theoretical Sciences, Taipei 106319, Taiwan
| | - Ivet Bahar
- Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh PA 15261, USA.
| |
|
39
|
Danel T, Łęski J, Podlewska S, Podolak IT. Docking-based generative approaches in the search for new drug candidates. Drug Discov Today 2023; 28:103439. [PMID: 36372330 DOI: 10.1016/j.drudis.2022.103439] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 06/30/2022] [Revised: 10/08/2022] [Accepted: 11/08/2022] [Indexed: 11/13/2022]
Abstract
Despite the popularity of virtual screening (VS) of existing compound libraries, the search for new potential drug candidates also takes advantage of generative protocols, where new compound suggestions are enumerated using various algorithms. To increase the activity potency of generative approaches, they have recently been coupled with molecular docking, a leading methodology of structure-based drug design (SBDD). In this review, we summarize progress since docking-based generative models emerged. We propose a new taxonomy for these methods and discuss their importance for the field of computer-aided drug design (CADD). In addition, we discuss the most promising directions for the further development of generative protocols coupled with docking.
Affiliation(s)
- Tomasz Danel
- Faculty of Mathematics and Computer Science, Jagiellonian University, 6 Łojasiewicza Street, 30-348 Kraków, Poland.
| | - Jan Łęski
- Faculty of Mathematics and Computer Science, Jagiellonian University, 6 Łojasiewicza Street, 30-348 Kraków, Poland
| | - Sabina Podlewska
- Maj Institute of Pharmacology, Polish Academy of Sciences, Department of Medicinal Chemistry, 31-343 Kraków, Smętna Street 12, Poland
| | - Igor T Podolak
- Faculty of Mathematics and Computer Science, Jagiellonian University, 6 Łojasiewicza Street, 30-348 Kraków, Poland
| |
|
40
|
Wang M, Wang J, Weng G, Kang Y, Pan P, Li D, Deng Y, Li H, Hsieh CY, Hou T. ReMODE: a deep learning-based web server for target-specific drug design. J Cheminform 2022; 14:84. [PMID: 36510307 PMCID: PMC9743675 DOI: 10.1186/s13321-022-00665-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Accepted: 12/01/2022] [Indexed: 12/14/2022] Open
Abstract
Deep learning (DL) and machine learning have contributed significantly to basic biology research and drug discovery in the past few decades. Recent advances in DL-based generative models have led to superior developments in de novo drug design. However, data availability, deep data processing, and the lack of user-friendly DL tools and interfaces make it difficult to apply these DL techniques to drug design. We hereby present ReMODE (Receptor-based MOlecular DEsign), a new web server based on a DL algorithm for target-specific ligand design, which integrates different functional modules to enable users to develop customizable drug design tasks. As designed, the ReMODE server can construct target-specific tasks toward the protein targets selected by users. Meanwhile, the server also provides some extensions: users can optimize the drug-likeness or synthetic accessibility of the generated molecules, and control other physicochemical properties; users can also choose a sub-structure/scaffold as a starting point for fragment-based drug design. The ReMODE server also enables users to optimize the pharmacophore matching and docking conformations of the generated molecules. We believe that the ReMODE server will benefit researchers for drug discovery. ReMODE is publicly available at http://cadd.zju.edu.cn/relation/remode/.
Affiliation(s)
- Mingyang Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, 310058 Zhejiang, People’s Republic of China; CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018 Zhejiang, People’s Republic of China
| | - Jike Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, 310058 Zhejiang, People’s Republic of China; CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018 Zhejiang, People’s Republic of China
| | - Gaoqi Weng
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, 310058 Zhejiang, People’s Republic of China; CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018 Zhejiang, People’s Republic of China
| | - Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, 310058 Zhejiang, People’s Republic of China
| | - Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, 310058 Zhejiang, People’s Republic of China
| | - Dan Li
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, 310058 Zhejiang, People’s Republic of China
| | - Yafeng Deng
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018 Zhejiang, People’s Republic of China
| | - Honglin Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai, 200237 People’s Republic of China
| | - Chang-Yu Hsieh
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, 310058 Zhejiang, People’s Republic of China
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou, 310058 Zhejiang, People’s Republic of China
| |
|
41
|
Yang X, Zheng Y, Xing X, Sui X, Jia W, Pan H. Immune subtype identification and multi-layer perceptron classifier construction for breast cancer. Front Oncol 2022; 12:943874. [PMID: 36568197 PMCID: PMC9780074 DOI: 10.3389/fonc.2022.943874] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/14/2022] [Accepted: 11/17/2022] [Indexed: 12/13/2022] Open
Abstract
Introduction: Breast cancer is a heterogeneous tumor. Tumor microenvironment (TME) has an important effect on the proliferation, metastasis, treatment, and prognosis of breast cancer. Methods: In this study, we calculated the relative proportion of tumor infiltrating immune cells (TIICs) in the breast cancer TME, and used the consensus clustering algorithm to cluster the breast cancer subtypes. We also developed a multi-layer perceptron (MLP) classifier based on a deep learning framework to detect breast cancer subtypes, in which 70% of the breast cancer research cohort was used for model training and 30% for validation. Results: By performing the K-means clustering algorithm, the research cohort was clustered into two subtypes. The Kaplan-Meier survival estimate analysis showed significant differences in the overall survival (OS) between the two identified subtypes. Estimating the difference in the relative proportion of TIICs showed that the two subtypes had significant differences in multiple immune cells, such as CD8, CD4, and regulatory T cells. Further, the expression level of immune checkpoint molecules (PDL1, CTLA4, LAG3, TIGIT, CD27, IDO1, ICOS) and tumor mutational burden (TMB) also showed significant differences between the two subtypes, indicating the clinical value of the two subtypes. Finally, we identified a 38-gene signature and developed a multilayer perceptron (MLP) classifier that combined the multi-gene signature to identify breast cancer subtypes. The results showed that the classifier had an accuracy rate of 93.56% and can be robustly used for breast cancer subtype diagnosis. Conclusion: Identification of breast cancer subtypes based on the immune signature in the tumor microenvironment can assist clinicians to effectively and accurately assess the progression of breast cancer and formulate different treatment strategies for different subtypes.
Affiliation(s)
- Xinbo Yang
- School of Information Science and Engineering, Shandong Normal University, Jinan, China
| | - Yuanjie Zheng
- School of Information Science and Engineering, Shandong Normal University, Jinan, China. *Correspondence: Yuanjie Zheng; Huali Pan
| | - Xianrong Xing
- Department of Pharmacy, Shandong Medical College, Jinan, China
| | - Xiaodan Sui
- School of Information Science and Engineering, Shandong Normal University, Jinan, China
| | - Weikuan Jia
- School of Information Science and Engineering, Shandong Normal University, Jinan, China
| | - Huali Pan
- School of Information Science and Engineering, Shandong Normal University, Jinan, China; Business School, Shandong Normal University, Jinan, China. *Correspondence: Yuanjie Zheng; Huali Pan
| |
|
42
|
Chan L, Kumar R, Verdonk M, Poelking C. A multilevel generative framework with hierarchical self-contrasting for bias control and transparency in structure-based ligand design. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00564-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
|
43
|
Kong W, Hu Y, Zhang J, Tan Q. Application of SMILES-based molecular generative model in new drug design. Front Pharmacol 2022; 13:1046524. [PMCID: PMC9606214 DOI: 10.3389/fphar.2022.1046524] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2022] [Accepted: 10/03/2022] [Indexed: 11/13/2022] Open
Affiliation(s)
- Weiya Kong
- School of Sports Medicine and Rehabilitation, Beijing Sport University, Beijing, China
| | - Yuejuan Hu
- Nursing Department of Fenyang College of Shanxi Medical University, Fenyang, China
| | - Jiao Zhang
- Innovation and Entrepreneurship College of Hunan University of Finance and Economics, Changsha, China
| | - Qiaoyin Tan
- College of Teacher Education, Zhejiang Normal University, Jinhua, China
- *Correspondence: Qiaoyin Tan
| |
|
44
|
Wang M, Hsieh CY, Wang J, Wang D, Weng G, Shen C, Yao X, Bing Z, Li H, Cao D, Hou T. RELATION: A Deep Generative Model for Structure-Based De Novo Drug Design. J Med Chem 2022; 65:9478-9492. [PMID: 35713420 DOI: 10.1021/acs.jmedchem.2c00732] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Indexed: 01/30/2023]
Abstract
Deep learning (DL)-based de novo molecular design has recently gained considerable traction. Many DL-based generative models have been successfully developed to design novel molecules, but most of them are ligand-centric and the role of the 3D geometries of target binding pockets in molecular generation has not been well-exploited. Here, we proposed a new 3D-based generative model called RELATION. In the RELATION model, the BiTL algorithm was specifically designed to extract and transfer the desired geometric features of the protein-ligand complexes to a latent space for generation. The pharmacophore conditioning and docking-based Bayesian sampling were applied to efficiently navigate the vast chemical space for the design of molecules with desired geometric properties and pharmacophore features. As a proof of concept, the RELATION model was used to design inhibitors for two targets, AKT1 and CDK2. The calculation results demonstrated that the RELATION model could efficiently generate novel molecules with favorable binding affinity and pharmacophore features.
Affiliation(s)
- Mingyang Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Chang-Yu Hsieh
- Tencent, Tencent Quantum Lab, Shenzhen 518057, Guangdong, P. R. China
| | - Jike Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Dong Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Gaoqi Weng
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Xiaojun Yao
- Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, Macau Institute for Applied Research in Medicine and Health, State Key Laboratory of Quality Research in Chinese Medicine, Macau University of Science and Technology, Taipa 999078, Macau, P. R. China
| | - Zhitong Bing
- Institute of Modern Physics, Chinese Academy of Sciences, Lanzhou 730000, P. R. China
| | - Honglin Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai 200237, P. R. China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| |
|
45
|
Xie W, Wang F, Li Y, Lai L, Pei J. Advances and Challenges in De Novo Drug Design Using Three-Dimensional Deep Generative Models. J Chem Inf Model 2022; 62:2269-2279. [PMID: 35544331 DOI: 10.1021/acs.jcim.2c00042] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Indexed: 11/30/2022]
Abstract
A persistent goal for de novo drug design is to generate novel chemical compounds with desirable properties in a labor-, time-, and cost-efficient manner. Deep generative models provide alternative routes to this goal. Numerous model architectures and optimization strategies have been explored in recent years, most of which have been developed to generate two-dimensional molecular structures. Some generative models aiming at three-dimensional (3D) molecule generation have also been proposed, gaining attention for their unique advantages and potential to directly design drug-like molecules in a target-conditioning manner. This review highlights current developments in 3D molecular generative models combined with deep learning and discusses future directions for de novo drug design.
Affiliation(s)
- Weixin Xie
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
| | - Fanhao Wang
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
| | - Yibo Li
- Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
| | - Luhua Lai
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China; Peking-Tsinghua Center for Life Science at BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
| | - Jianfeng Pei
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
| |
|
46
|
Hadfield TE, Imrie F, Merritt A, Birchall K, Deane CM. Incorporating Target-Specific Pharmacophoric Information into Deep Generative Models for Fragment Elaboration. J Chem Inf Model 2022; 62:2280-2292. [PMID: 35499971 PMCID: PMC9131447 DOI: 10.1021/acs.jcim.1c01311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
Abstract
Despite recent interest in deep generative models for scaffold elaboration, their applicability to fragment-to-lead campaigns has so far been limited. This is primarily due to their inability to account for local protein structure or a user's design hypothesis. We propose a novel method for fragment elaboration, STRIFE, that overcomes these issues. STRIFE takes as input fragment hotspot maps (FHMs) extracted from a protein target and processes them to provide meaningful and interpretable structural information to its generative model, which in turn is able to rapidly generate elaborations with complementary pharmacophores to the protein. In a large-scale evaluation, STRIFE outperforms existing, structure-unaware, fragment elaboration methods in proposing highly ligand-efficient elaborations. In addition to automatically extracting pharmacophoric information from a protein target's FHM, STRIFE optionally allows the user to specify their own design hypotheses.
Affiliation(s)
- Thomas E Hadfield
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
| | - Fergus Imrie
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
| | - Andy Merritt
- LifeArc, SBC Open Innovation Campus, Stevenage SG1 2FX, United Kingdom
| | - Kristian Birchall
- LifeArc, SBC Open Innovation Campus, Stevenage SG1 2FX, United Kingdom
| | - Charlotte M Deane
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
| |
|