1. Cremer J, Le T, Noé F, Clevert DA, Schütt KT. PILOT: equivariant diffusion for pocket-conditioned de novo ligand generation with multi-objective guidance via importance sampling. Chem Sci 2024:d4sc03523b. [PMID: 39211741 PMCID: PMC11348832 DOI: 10.1039/d4sc03523b] [Received: 05/29/2024] [Accepted: 08/19/2024] [Indexed: 09/04/2024] [Open access]
Abstract
The generation of ligands that both are tailored to a given protein pocket and exhibit a range of desired chemical properties is a major challenge in structure-based drug design. Here, we propose an in silico approach for the de novo generation of 3D ligand structures using the equivariant diffusion model PILOT, combining pocket conditioning with a large-scale pre-training and property guidance. Its multi-objective trajectory-based importance sampling strategy is designed to direct the model towards molecules that not only exhibit desired characteristics such as increased binding affinity for a given protein pocket but also maintain high synthetic accessibility. This ensures the practicality of sampled molecules, thus maximizing their potential for the drug discovery pipeline. PILOT significantly outperforms existing methods across various metrics on the common benchmark dataset CrossDocked2020. Moreover, we employ PILOT to generate novel ligands for unseen protein pockets from the Kinodata-3D dataset, which encompasses a substantial portion of the human kinome. The generated structures exhibit predicted IC50 values indicative of potent biological activity, which highlights the potential of PILOT as a powerful tool for structure-based drug design.
Affiliation(s)
- Julian Cremer
  - Machine Learning & Computational Sciences, Pfizer Worldwide R&D Berlin Germany
  - Computational Science Laboratory, Universitat Pompeu Fabra, PRBB Spain
- Tuan Le
  - Machine Learning & Computational Sciences, Pfizer Worldwide R&D Berlin Germany
  - Department of Mathematics and Computer Science, Freie Universität Berlin Germany
- Frank Noé
  - Department of Mathematics and Computer Science, Freie Universität Berlin Germany
  - Microsoft Research AI4Science, Microsoft Berlin Germany
- Djork-Arné Clevert
  - Machine Learning & Computational Sciences, Pfizer Worldwide R&D Berlin Germany
- Kristof T Schütt
  - Machine Learning & Computational Sciences, Pfizer Worldwide R&D Berlin Germany
2. Bai Q, Xu T, Huang J, Pérez-Sánchez H. Geometric deep learning methods and applications in 3D structure-based drug design. Drug Discov Today 2024; 29:104024. [PMID: 38759948 DOI: 10.1016/j.drudis.2024.104024] [Received: 02/23/2024] [Revised: 05/02/2024] [Accepted: 05/10/2024] [Indexed: 05/19/2024]
Abstract
3D structure-based drug design (SBDD) is considered a challenging and rational way for innovative drug discovery. Geometric deep learning is a promising approach that solves the accurate model training of 3D SBDD through building neural network models to learn non-Euclidean data, such as 3D molecular graphs and manifold data. Here, we summarize geometric deep learning methods and applications that contain 3D molecular representations, equivariant graph neural networks (EGNNs), and six generative model methods [diffusion model, flow-based model, generative adversarial networks (GANs), variational autoencoder (VAE), autoregressive models, and energy-based models]. Our review provides insights into geometric deep learning methods and advanced applications of 3D SBDD that will be of relevance for the drug discovery community.
Affiliation(s)
- Qifeng Bai
  - School of Basic Medical Sciences, Lanzhou University, Lanzhou 730000, Gansu, PR China
- Junzhou Huang
  - Department of Computer Science and Engineering, the University of Texas at Arlington, Arlington, TX 76019, USA
- Horacio Pérez-Sánchez
  - Structural Bioinformatics and High Performance Computing Research Group (BIO-HPC), Computer Engineering Department, UCAM Universidad Católica de Murcia, Murcia 30107, Spain
3. Singh S, Zeh G, Freiherr J, Bauer T, Türkmen I, Grasskamp AT. Classification of substances by health hazard using deep neural networks and molecular electron densities. J Cheminform 2024; 16:45. [PMID: 38627862 PMCID: PMC11302296 DOI: 10.1186/s13321-024-00835-y] [Received: 12/07/2023] [Accepted: 03/23/2024] [Indexed: 08/09/2024] [Open access]
Abstract
In this paper we present a method that allows leveraging 3D electron density information to train a deep neural network pipeline to segment regions of high, medium and low electronegativity and classify substances as health hazardous or non-hazardous. We show that this can be used for use-cases such as cosmetics and food products. For this purpose, we first generate 3D electron density cubes using semiempirical molecular calculations for a custom European Chemicals Agency (ECHA) subset consisting of substances labelled as hazardous and non-hazardous for cosmetic usage. Together with their 3-class electronegativity maps we train a modified 3D-UNet with electron density cubes to segment reactive sites in molecules and classify substances with an accuracy of 78.1%. We perform the same process on a custom food dataset (CompFood) consisting of hazardous and non-hazardous substances compiled from European Food Safety Authority (EFSA) OpenFoodTox, Food and Drug Administration (FDA) Generally Recognized as Safe (GRAS) and FooDB datasets to achieve a classification accuracy of 64.1%. Our results show that 3D electron densities and particularly masked electron densities, calculated by taking a product of original electron densities and regions of high and low electronegativity can be used to classify molecules for different use-cases and thus serve not only to guide safe-by-design product development but also aid in regulatory decisions. SCIENTIFIC CONTRIBUTION: We aim to contribute to the diverse 3D molecular representations used for training machine learning algorithms by showing that a deep learning network can be trained on 3D electron density representation of molecules. This approach has previously not been used to train machine learning models and it allows utilization of the true spatial domain of the molecule for prediction of properties such as their suitability for usage in cosmetics and food products and in future, to other molecular properties. 
The data and code used for training is accessible at https://github.com/s-singh-ivv/eDen-Substances .
Affiliation(s)
- Satnam Singh
  - Department of Sensory Analytics and Technologies, Fraunhofer Institute for Process Engineering and Packaging IVV, Giggenhauser Str. 35, 85354, Freising, Germany
  - Department of Psychiatry and Psychotherapy, Friedrich-Alexander-Universität Erlangen-Nürnberg, Schwabachanlage 6, 91054, Erlangen, Germany
- Gina Zeh
  - Department of Sensory Analytics and Technologies, Fraunhofer Institute for Process Engineering and Packaging IVV, Giggenhauser Str. 35, 85354, Freising, Germany
- Jessica Freiherr
  - Department of Sensory Analytics and Technologies, Fraunhofer Institute for Process Engineering and Packaging IVV, Giggenhauser Str. 35, 85354, Freising, Germany
  - Department of Psychiatry and Psychotherapy, Friedrich-Alexander-Universität Erlangen-Nürnberg, Schwabachanlage 6, 91054, Erlangen, Germany
- Thilo Bauer
  - Computer Chemistry Center, Friedrich-Alexander-Universität Erlangen-Nürnberg, Nägelsbachstr. 25, 91052, Erlangen, Germany
- Isik Türkmen
  - Department of Sensory Analytics and Technologies, Fraunhofer Institute for Process Engineering and Packaging IVV, Giggenhauser Str. 35, 85354, Freising, Germany
- Andreas T Grasskamp
  - Department of Sensory Analytics and Technologies, Fraunhofer Institute for Process Engineering and Packaging IVV, Giggenhauser Str. 35, 85354, Freising, Germany
4. Li X, Shen C, Zhu H, Yang Y, Wang Q, Yang J, Huang N. A High-Quality Data Set of Protein-Ligand Binding Interactions Via Comparative Complex Structure Modeling. J Chem Inf Model 2024; 64:2454-2466. [PMID: 38181418 DOI: 10.1021/acs.jcim.3c01170] [Indexed: 01/07/2024]
Abstract
High-quality protein-ligand complex structures provide the basis for understanding the nature of noncovalent binding interactions at the atomic level and enable structure-based drug design. However, experimentally determined complex structures are scarce compared with the vast chemical space. In this study, we addressed this issue by constructing the BindingNet data set via comparative complex structure modeling, which contains 69,816 modeled high-quality protein-ligand complex structures with experimental binding affinity data. BindingNet provides valuable insights into investigating protein-ligand interactions, allowing visual inspection and interpretation of structural analogues' structure-activity relationships. It can also be used for evaluating machine-learning-based scoring functions. Our results indicate that machine learning models trained on BindingNet could reduce the bias caused by buried solvent-accessible surface area, as we previously found for models trained on the PDBbind data set. We also discussed strategies to improve BindingNet and its potential utilization for benchmarking the molecular docking methods and ligand binding free energy calculation approaches. The BindingNet complements PDBbind in constructing a sufficient and unbiased protein-ligand binding data set and is freely available at http://bindingnet.huanglab.org.cn.
Affiliation(s)
- Xuelian Li
  - National Institute of Biological Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100730, China
  - National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
- Cheng Shen
  - National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
- Hui Zhu
  - National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
  - Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 102206, China
- Yujian Yang
  - National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
- Qing Wang
  - National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
- Jincai Yang
  - National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
- Niu Huang
  - National Institute of Biological Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100730, China
  - National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
  - Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 102206, China
5. Kyro GW, Morgunov A, Brent RI, Batista VS. ChemSpaceAL: An Efficient Active Learning Methodology Applied to Protein-Specific Molecular Generation. ARXIV 2023:arXiv:2309.05853v2. [PMID: 37744464 PMCID: PMC10516108] [Indexed: 09/26/2023]
Abstract
The incredible capabilities of generative artificial intelligence models have inevitably led to their application in the domain of drug discovery. Within this domain, the vastness of chemical space motivates the development of more efficient methods for identifying regions with molecules that exhibit desired characteristics. In this work, we present a computationally efficient active learning methodology that requires evaluation of only a subset of the generated data in the constructed sample space to successfully align a generative model with respect to a specified objective. We demonstrate the applicability of this methodology to targeted molecular generation by fine-tuning a GPT-based molecular generator toward a protein with FDA-approved small-molecule inhibitors, c-Abl kinase. Remarkably, the model learns to generate molecules similar to the inhibitors without prior knowledge of their existence, and even reproduces two of them exactly. We also show that the methodology is effective for a protein without any commercially available small-molecule inhibitors, the HNH domain of the CRISPR-associated protein 9 (Cas9) enzyme. We believe that the inherent generality of this method ensures that it will remain applicable as the exciting field of in silico molecular generation evolves. To facilitate implementation and reproducibility, we have made all of our software available through the open-source ChemSpaceAL Python package.
6. Ma W, Zhang W, Le Y, Shi X, Xu Q, Xiao Y, Dou Y, Wang X, Zhou W, Peng W, Zhang H, Huang B. Using macromolecular electron densities to improve the enrichment of active compounds in virtual screening. Commun Chem 2023; 6:173. [PMID: 37608192 PMCID: PMC10444862 DOI: 10.1038/s42004-023-00984-5] [Received: 04/18/2023] [Accepted: 08/15/2023] [Indexed: 08/24/2023] [Open access]
Abstract
The quest for effective virtual screening algorithms is hindered by the scarcity of training data, calling for innovative approaches. This study presents the use of experimental electron density (ED) data for improving active compound enrichment in virtual screening, supported by ED's ability to reflect the time-averaged behavior of ligands and solvents in the binding pocket. Experimental ED-based grid matching score (ExptGMS) was developed to score compounds by measuring the degree of matching between their binding conformations and a series of multi-resolution experimental ED grids. The efficiency of ExptGMS was validated using both in silico tests with the Directory of Useful Decoys-Enhanced dataset and wet-lab tests on Covid-19 3CLpro-inhibitors. ExptGMS improved the active compound enrichment in top-ranked molecules by approximately 20%. Furthermore, ExptGMS identified four active inhibitors of 3CLpro, with the most effective showing an IC50 value of 1.9 µM. We also developed an online database containing experimental ED grids for over 17,000 proteins to facilitate the use of ExptGMS for academic users.
Affiliation(s)
- Wenzhi Ma
  - Beijing StoneWise Technology Co Ltd., Haidian Street #15, Haidian District, 100080, Beijing, China
- Wei Zhang
  - State Key Laboratory of Respiratory Disease, First Affiliated Hospital of Guangzhou Medical University, 510182, Guangzhou, China
  - Innovation Center for Pathogen Research, Guangzhou Laboratory, 510320, Guangzhou, China
- Yuan Le
  - Beijing StoneWise Technology Co Ltd., Haidian Street #15, Haidian District, 100080, Beijing, China
- Xiaoxuan Shi
  - Beijing StoneWise Technology Co Ltd., Haidian Street #15, Haidian District, 100080, Beijing, China
- Qingbo Xu
  - Beijing StoneWise Technology Co Ltd., Haidian Street #15, Haidian District, 100080, Beijing, China
- Yang Xiao
  - Beijing StoneWise Technology Co Ltd., Haidian Street #15, Haidian District, 100080, Beijing, China
- Yueying Dou
  - Beijing StoneWise Technology Co Ltd., Haidian Street #15, Haidian District, 100080, Beijing, China
- Xiaoman Wang
  - Beijing StoneWise Technology Co Ltd., Haidian Street #15, Haidian District, 100080, Beijing, China
- Wenbiao Zhou
  - Beijing StoneWise Technology Co Ltd., Haidian Street #15, Haidian District, 100080, Beijing, China
- Wei Peng
  - State Key Laboratory of Respiratory Disease, First Affiliated Hospital of Guangzhou Medical University, 510182, Guangzhou, China
  - Innovation Center for Pathogen Research, Guangzhou Laboratory, 510320, Guangzhou, China
- Hongbo Zhang
  - Beijing StoneWise Technology Co Ltd., Haidian Street #15, Haidian District, 100080, Beijing, China
- Bo Huang
  - Beijing StoneWise Technology Co Ltd., Haidian Street #15, Haidian District, 100080, Beijing, China
7. Baillif B, Cole J, McCabe P, Bender A. Deep generative models for 3D molecular structure. Curr Opin Struct Biol 2023; 80:102566. [DOI: 10.1016/j.sbi.2023.102566] [Received: 12/23/2022] [Revised: 02/05/2023] [Accepted: 02/15/2023] [Indexed: 03/30/2023]