1. Gheeraert A, Guyon F, Pérez S, Galochkina T. Unraveling the diversity of protein-carbohydrate interfaces: Insights from a multi-scale study. Carbohydr Res 2025; 550:109377. [PMID: 39823696 DOI: 10.1016/j.carres.2025.109377] [Received: 10/29/2024] [Revised: 12/18/2024] [Accepted: 01/08/2025] [Indexed: 01/20/2025]
Abstract
Protein-carbohydrate interactions play a crucial role in numerous fundamental biological processes. Thus, description and comparison of the carbohydrate binding site (CBS) architecture is of great importance for understanding of the underlying biological mechanisms. However, traditional approaches for carbohydrate-binding protein analysis and annotation rely primarily on the sequence-based methods applied to specific protein classes. The recently released DIONYSUS database aims to fill this gap by providing tools for CBS comparison at different levels: both in terms of protein properties and classification, as well as in terms of atomistic CBS organization. In the current study, we explore DIONYSUS content using a combination of the suggested approaches in order to evaluate the diversity of the currently resolved non-covalent protein-carbohydrate interfaces at different scales. Notably, our analysis reveals evolutionary convergence of CBS in proteins with distinct folds and coming from organisms across different kingdoms of life. Furthermore, we demonstrate that a CBS structure based approach has the potential to facilitate functional annotation for the proteins with missing information in the existing databases. In particular, it provides reliable information for numerous carbohydrate-binding proteins from rapidly evolving organisms, whose analysis is particularly challenging for classical sequence-based methods.
Affiliation(s)
- Aria Gheeraert
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France
- Frédéric Guyon
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France
- Serge Pérez
  - Centre de Recherches sur les Macromolécules Végétales, University Grenoble Alpes, CNRS, UPR 5301, Grenoble, France
- Tatiana Galochkina
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France

2. Lin M, Li K, Zhang Y, Pan F, Wu W, Zhang J. DisDock: A Deep Learning Method for Metal Ion-Protein Redocking. Proteins 2025. [PMID: 39838957 DOI: 10.1002/prot.26791] [Received: 01/30/2024] [Revised: 09/10/2024] [Accepted: 12/18/2024] [Indexed: 01/23/2025]
Abstract
The structures of metalloproteins are essential for comprehending their functions and interactions. The breakthrough of AlphaFold has made it possible to predict protein structures with experimental accuracy. However, the type of metal ion that a metalloprotein binds and the binding structure are still not readily available, even with the predicted protein structure. In this study, we present DisDock, a deep learning method for predicting protein-metal docking. DisDock takes distogram of randomly initialized protein-ligand configuration as input and outputs the distogram of the predicted binding complex. It combines the U-net architecture with self-attention modules to enhance model performance. Taking inspiration from the physical principle that atoms in closer proximity display a stronger mutual attraction, this predictor capitalizes on geometric information to uncover latent characteristics indicative of atom interactions. To train our model, we employ a high-quality metalloprotein dataset sourced from the Mother of All Databases (MOAD). Experimental results demonstrate that our approach outperforms other existing methods in prediction accuracy for various types of metal ions.
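As a rough illustration of the input representation mentioned in this abstract, a distogram can be built by computing all pairwise interatomic distances and assigning each one to a distance bin. The sketch below assumes NumPy; the function name `distogram`, the bin edges and the toy coordinates are hypothetical choices made for illustration and are not taken from DisDock.

```python
import numpy as np


def distogram(coords, bin_edges):
    """Return the pairwise distance matrix and its binned version.

    coords    : (N, 3) array-like of atom coordinates (hypothetical input).
    bin_edges : increasing 1-D array of bin boundaries, in the same units.
    """
    coords = np.asarray(coords, dtype=float)
    # Broadcast to an (N, N, 3) array of coordinate differences.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # np.digitize returns, for each distance, the index of the bin it falls in.
    bins = np.digitize(dist, bin_edges)
    return dist, bins


# Toy configuration: three points forming a 3-4-5 right triangle.
coords = [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 4.0, 0.0]]
bin_edges = np.arange(2.0, 22.0, 2.0)  # assumed edges: 2, 4, ..., 20
dist, bins = distogram(coords, bin_edges)
```

The binned matrix is symmetric with bin 0 on the diagonal; a learned model would typically consume a one-hot encoding of `bins`, which is left out here.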
Affiliation(s)
- Menghan Lin
  - Department of Statistics, Florida State University, Tallahassee, Florida, USA
- Keqiao Li
  - Department of Statistics, Florida State University, Tallahassee, Florida, USA
- Yuan Zhang
  - Department of Statistics, Florida State University, Tallahassee, Florida, USA
- Feng Pan
  - Department of Statistics, Florida State University, Tallahassee, Florida, USA
- Wei Wu
  - Department of Statistics, Florida State University, Tallahassee, Florida, USA
- Jinfeng Zhang
  - Department of Statistics, Florida State University, Tallahassee, Florida, USA

3. Marchand A, Buckley S, Schneuing A, Pacesa M, Elia M, Gainza P, Elizarova E, Neeser RM, Lee PW, Reymond L, Miao Y, Scheller L, Georgeon S, Schmidt J, Schwaller P, Maerkl SJ, Bronstein M, Correia BE. Targeting protein-ligand neosurfaces with a generalizable deep learning tool. Nature 2025:10.1038/s41586-024-08435-4. [PMID: 39814890 DOI: 10.1038/s41586-024-08435-4] [Received: 06/07/2024] [Accepted: 11/20/2024] [Indexed: 01/18/2025]
Abstract
Molecular recognition events between proteins drive biological processes in living systems [1]. However, higher levels of mechanistic regulation have emerged, in which protein-protein interactions are conditioned to small molecules [2-5]. Despite recent advances, computational tools for the design of new chemically induced protein interactions have remained a challenging task for the field [6,7]. Here we present a computational strategy for the design of proteins that target neosurfaces, that is, surfaces arising from protein-ligand complexes. To develop this strategy, we leveraged a geometric deep learning approach based on learned molecular surface representations [8,9] and experimentally validated binders against three drug-bound protein complexes: Bcl2-venetoclax, DB3-progesterone and PDF1-actinonin. All binders demonstrated high affinities and accurate specificities, as assessed by mutational and structural characterization. Remarkably, surface fingerprints previously trained only on proteins could be applied to neosurfaces induced by interactions with small molecules, providing a powerful demonstration of generalizability that is uncommon in other deep learning approaches. We anticipate that such designed chemically induced protein interactions will have the potential to expand the sensing repertoire and the assembly of new synthetic pathways in engineered cells for innovative drug-controlled cell-based therapies [10].
Affiliation(s)
- Anthony Marchand
  - Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
- Stephen Buckley
  - Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
- Arne Schneuing
  - Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
- Martin Pacesa
  - Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
- Maddalena Elia
  - Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
- Pablo Gainza
  - Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
  - Monte Rosa Therapeutics, Boston, MA, USA
- Evgenia Elizarova
  - Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
- Rebecca M Neeser
  - Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
  - Laboratory of Chemical Artificial Intelligence, Institute of Chemical Sciences and Engineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
- Pao-Wan Lee
  - Laboratory of Biological Network Characterization, Institute of Bioengineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
- Luc Reymond
  - Biomolecular Screening Core Facility, School of Life Sciences, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
- Yangyang Miao
  - Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
- Leo Scheller
  - Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
- Sandrine Georgeon
  - Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
- Joseph Schmidt
  - Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
- Philippe Schwaller
  - Laboratory of Chemical Artificial Intelligence, Institute of Chemical Sciences and Engineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
- Sebastian J Maerkl
  - Laboratory of Biological Network Characterization, Institute of Bioengineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland
- Michael Bronstein
  - Department of Computer Science, University of Oxford, Oxford, UK
  - Aithyra Research Institute for Biomedical Artificial Intelligence, Austrian Academy of Sciences, Vienna, Austria
- Bruno E Correia
  - Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, Ecole polytechnique fédérale de Lausanne, Lausanne, Switzerland

4. Nie D, Zhao H, Zhang O, Weng G, Zhang H, Jin J, Lin H, Huang Y, Liu L, Li D, Hou T, Kang Y. Durian: A Comprehensive Benchmark for Structure-Based 3D Molecular Generation. J Chem Inf Model 2025; 65:173-186. [PMID: 39681323 DOI: 10.1021/acs.jcim.4c02232] [Indexed: 12/18/2024]
Abstract
Three-dimensional (3D) molecular generation models employ deep neural networks to simultaneously generate both topological representation and molecular conformations. Due to their advantages in utilizing the structural and interaction information on targets, as well as their reduced reliance on existing bioactivity data, these models have attracted widespread attention. However, limited training and testing data sets and the unexpected biases inherent in single evaluation metrics pose a significant challenge in comparing these models in practical settings. In this work, we proposed Durian, an evaluation framework for structure-based 3D molecular generation that incorporates protein-ligand data with experimental affinity and a comprehensive array of physicochemical and geometric metrics. The benchmark tasks encompass assessing the capability of models to reproduce the property distribution of training sets, generate molecules with rational distributions of drug-related properties, and exhibit potential high affinity toward given targets. Binding affinities were evaluated using three independent docking methods (QuickVina2, Surflex and Gnina) with both "Dock" and "Score" modes to reduce false positives arising from conformational searches or scoring functions. Specifically, we applied Durian to six 3D molecular generation methods: LiGAN, Pocket2Mol, DiffSBDD, SBDD, GraphBP, and SurfGen. While most methods demonstrated the ability to generate drug-like small molecules with reasonable physicochemical properties, they exhibited varying degrees of limitations in balancing novelty, structural rationality, and synthetic accessibility, thereby constraining their practical applications in drug discovery. Based on a total of 17 metrics, Durian highlights the importance of multiobjective optimization in 3D molecular generation methods. 
For instance, SurfGen and SBDD showed relatively comprehensive performance but could benefit from further improvements in molecular conformational rationality. Our evaluation framework is expected to provide meaningful guidance for the selection, optimization, and application of 3D generative models in practical drug design tasks.
Affiliation(s)
- Dou Nie
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
- Huifeng Zhao
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
- Odin Zhang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
- Gaoqi Weng
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
- Hui Zhang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
- Jieyu Jin
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
- Haitao Lin
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
- Yufei Huang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
- Liwei Liu
  - Huawei Nanjing Research & Development Center, No. 101 Software Avenue, Yuhuatai District, Nanjing, 210012 Jiangsu, China
- Dan Li
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
- Yu Kang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China

5. Gheeraert A, Bailly T, Ren Y, Hamraoui A, Te J, Vander Meersche Y, Cretin G, Leon Foun Lin R, Gelly JC, Pérez S, Guyon F, Galochkina T. DIONYSUS: a database of protein-carbohydrate interfaces. Nucleic Acids Res 2025; 53:D387-D395. [PMID: 39436020 PMCID: PMC11701518 DOI: 10.1093/nar/gkae890] [Received: 07/03/2024] [Revised: 09/03/2024] [Accepted: 09/26/2024] [Indexed: 10/23/2024] [Open access]
Abstract
Protein-carbohydrate interactions govern a wide variety of biological processes and play an essential role in the development of different diseases. Here, we present DIONYSUS, the first database of protein-carbohydrate interfaces annotated according to structural, chemical and functional properties of both proteins and carbohydrates. We provide exhaustive information on the nature of interactions, binding site composition, biological function and specific additional information retrieved from existing databases. The user can easily search the database using protein sequence and structure information or by carbohydrate binding site properties. Moreover, for a given interaction site, the user can perform its comparison with a representative subset of non-covalent protein-carbohydrate interactions to retrieve information on its potential function or specificity. Therefore, DIONYSUS is a source of valuable information both for a deeper understanding of general protein-carbohydrate interaction patterns, for annotation of the previously unannotated proteins and for such applications as carbohydrate-based drug design. DIONYSUS is freely available at www.dsimb.inserm.fr/DIONYSUS/.
Affiliation(s)
- Aria Gheeraert
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
- Thomas Bailly
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
- Yani Ren
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
  - Université Paris-Saclay, INRAE, MetaGenoPolis, 78350 Jouy-en-Josas, France
- Ali Hamraoui
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
  - Institut de biologie de l’Ecole normale supérieure (IBENS), Ecole normale supérieure, CNRS, INSERM, PSL Universite Paris, 75005 Paris, France
- Julie Te
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
- Yann Vander Meersche
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
- Gabriel Cretin
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
- Ravy Leon Foun Lin
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
- Jean-Christophe Gelly
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
- Serge Pérez
  - Centre de Recherches sur les Macromolécules Végétales, University Grenoble Alpes, CNRS, UPR 5301, Grenoble, France
- Frédéric Guyon
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
- Tatiana Galochkina
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France

6. Xue Z, Sun C, Zheng W, Lv J, Liu X. TargetSA: adaptive simulated annealing for target-specific drug design. Bioinformatics 2024; 41:btae730. [PMID: 39656791 DOI: 10.1093/bioinformatics/btae730] [Received: 09/05/2024] [Revised: 10/28/2024] [Accepted: 12/02/2024] [Indexed: 12/17/2024]
Abstract
MOTIVATION: The burgeoning field of target-specific drug design has attracted considerable attention, focusing on identifying compounds with high binding affinity toward specific target pockets. Nevertheless, existing target-specific deep generative models encounter notable challenges. Some models heavily rely on elaborate datasets and complicated training methodologies, while others neglect the multi-constraint optimization problem inherent in drug design, resulting in generated molecules with irrational structures or chemical properties. RESULTS: To address these issues, we propose a novel framework (TargetSA) that leverages adaptive simulated annealing (SA) for target-specific molecular generation and multi-constraint optimization. The SA process explores the discrete structural space of molecules, progressively converging toward the optimal solution that fulfills the predefined objective. To propose novel compounds, we first predict promising editing positions based on historical experience, and then iteratively edit molecular graphs through four operations (insertion, replacement, deletion, and cyclization). Together, these operations constitute a complete operation set, facilitating a thorough exploration of the drug-like space. Furthermore, we introduce a reversible sampling strategy to re-accept currently suboptimal solutions, greatly enhancing the generation quality. Empirical evaluations demonstrate that TargetSA achieves state-of-the-art performance in generating high-affinity molecules (average Vina Dock score: -9.09) while maintaining desirable chemical properties. AVAILABILITY AND IMPLEMENTATION: https://github.com/XueZhe-Zachary/TargetSA.
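For readers unfamiliar with simulated annealing, the generic loop can be sketched as follows. This is a textbook version with a Metropolis acceptance rule and a geometric cooling schedule, applied to a toy integer objective; it is not the adaptive schedule, the molecular graph edit operations or the reversible sampling strategy of TargetSA. All function names and parameter values are hypothetical.

```python
import math
import random


def simulated_annealing(initial, score, propose, n_steps=2000,
                        t_start=1.0, t_end=0.01, seed=0):
    """Minimise `score` by generic simulated annealing (illustrative only)."""
    rng = random.Random(seed)
    current, current_score = initial, score(initial)
    best, best_score = current, current_score
    for step in range(n_steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        frac = step / max(n_steps - 1, 1)
        temperature = t_start * (t_end / t_start) ** frac
        candidate = propose(current, rng)
        candidate_score = score(candidate)
        delta = candidate_score - current_score
        # Metropolis rule: always accept improvements, sometimes accept
        # worse candidates, less often as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score


def toy_score(x):
    """Toy objective standing in for a docking score: minimum at x = 7."""
    return (x - 7) ** 2


def toy_propose(x, rng):
    """Toy edit operation: move one step left or right."""
    return x + rng.choice((-1, 1))


best, best_score = simulated_annealing(0, toy_score, toy_propose)
```

In a molecular setting, `propose` would apply one graph edit and `score` would combine affinity and property constraints; both are left abstract here.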
Affiliation(s)
- Zhe Xue
  - College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China
- Chenwei Sun
  - College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China
- Wenhao Zheng
  - College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China
- Jiancheng Lv
  - College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China
- Xianggen Liu
  - College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China
  - Laboratory of Anesthesia and Critical Care Medicine, Department of Anesthesiology, Translational Neuroscience Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China

7. Vittorio S, Lunghini F, Morerio P, Gadioli D, Orlandini S, Silva P, Martinovic J, Pedretti A, Bonanni D, Del Bue A, Palermo G, Vistoli G, Beccari AR. Addressing docking pose selection with structure-based deep learning: Recent advances, challenges and opportunities. Comput Struct Biotechnol J 2024; 23:2141-2151. [PMID: 38827235 PMCID: PMC11141151 DOI: 10.1016/j.csbj.2024.05.024] [Received: 01/23/2024] [Revised: 05/15/2024] [Accepted: 05/15/2024] [Indexed: 06/04/2024] [Open access]
Abstract
Molecular docking is a widely used technique in drug discovery to predict the binding mode of a given ligand to its target. However, the identification of the near-native binding pose in docking experiments still represents a challenging task as the scoring functions currently employed by docking programs are parametrized to predict the binding affinity, and, therefore, they often fail to correctly identify the ligand native binding conformation. Selecting the correct binding mode is crucial to obtaining meaningful results and to conveniently optimizing new hit compounds. Deep learning (DL) algorithms have been an area of a growing interest in this sense for their capability to extract the relevant information directly from the protein-ligand structure. Our review aims to present the recent advances regarding the development of DL-based pose selection approaches, discussing limitations and possible future directions. Moreover, a comparison between the performances of some classical scoring functions and DL-based methods concerning their ability to select the correct binding mode is reported. In this regard, two novel DL-based pose selectors developed by us are presented.
Affiliation(s)
- Serena Vittorio
  - Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Luigi Mangiagalli 25, I-20133 Milano, Italy
- Filippo Lunghini
  - EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123 Naples, Italy
- Pietro Morerio
  - Pattern Analysis and Computer Vision, Fondazione Istituto Italiano di Tecnologia, Via Morego, 30, 16163 Genova, Italy
- Davide Gadioli
  - Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, I-20133 Milano, Italy
- Sergio Orlandini
  - SCAI, SuperComputing Applications and Innovation Department, CINECA, Via dei Tizii 6, Rome 00185, Italy
- Paulo Silva
  - IT4Innovations, VSB – Technical University of Ostrava, 17. listopadu 2172/15, 70800 Ostrava-Poruba, Czech Republic
- Jan Martinovic
  - IT4Innovations, VSB – Technical University of Ostrava, 17. listopadu 2172/15, 70800 Ostrava-Poruba, Czech Republic
- Alessandro Pedretti
  - Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Luigi Mangiagalli 25, I-20133 Milano, Italy
- Domenico Bonanni
  - Department of Physical and Chemical Sciences, University of L'Aquila, via Vetoio, L'Aquila 67010, Italy
- Alessio Del Bue
  - Pattern Analysis and Computer Vision, Fondazione Istituto Italiano di Tecnologia, Via Morego, 30, 16163 Genova, Italy
- Gianluca Palermo
  - Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, I-20133 Milano, Italy
- Giulio Vistoli
  - Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Luigi Mangiagalli 25, I-20133 Milano, Italy
- Andrea R. Beccari
  - EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123 Naples, Italy

8. Schneuing A, Harris C, Du Y, Didi K, Jamasb A, Igashov I, Du W, Gomes C, Blundell TL, Lio P, Welling M, Bronstein M, Correia B. Structure-based drug design with equivariant diffusion models. Nat Comput Sci 2024; 4:899-909. [PMID: 39653846 DOI: 10.1038/s43588-024-00737-x] [Received: 04/23/2024] [Accepted: 11/04/2024] [Indexed: 12/21/2024]
Abstract
Structure-based drug design (SBDD) aims to design small-molecule ligands that bind with high affinity and specificity to pre-determined protein targets. Generative SBDD methods leverage structural data of drugs with their protein targets to propose new drug candidates. However, most existing methods focus exclusively on bottom-up de novo design of compounds or tackle other drug development challenges with task-specific models. The latter requires curation of suitable datasets, careful engineering of the models and retraining from scratch for each task. Here we show how a single pretrained diffusion model can be applied to a broader range of problems, such as off-the-shelf property optimization, explicit negative design and partial molecular design with inpainting. We formulate SBDD as a three-dimensional conditional generation problem and present DiffSBDD, an SE(3)-equivariant diffusion model that generates novel ligands conditioned on protein pockets. Furthermore, we show how additional constraints can be used to improve the generated drug candidates according to a variety of computational metrics.
Affiliation(s)
- Arne Schneuing
  - École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Arian Jamasb
  - University of Cambridge, Cambridge, UK
  - Prescient Design, Genentech, Basel, Switzerland
- Ilia Igashov
  - École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Weitao Du
  - Chinese Academy of Mathematics and System Science, Beijing, China
- Tom L Blundell
  - University of Cambridge, Cambridge, UK
  - Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
- Pietro Lio
  - University of Cambridge, Cambridge, UK
  - University of Rome 'La Sapienza', Rome, Italy
- Max Welling
  - Microsoft Research AI4Science, Amsterdam, Netherlands
  - University of Amsterdam, Amsterdam, Netherlands
- Bruno Correia
  - École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland

9. Utgés JS, Barton GJ. Comparative evaluation of methods for the prediction of protein-ligand binding sites. J Cheminform 2024; 16:126. [PMID: 39529176 PMCID: PMC11552181 DOI: 10.1186/s13321-024-00923-z] [Received: 08/02/2024] [Accepted: 10/28/2024] [Indexed: 11/16/2024] [Open access]
Abstract
The accurate identification of protein-ligand binding sites is of critical importance in understanding and modulating protein function. Accordingly, ligand binding site prediction has remained a research focus for over three decades with over 50 methods developed and a change of paradigm from geometry-based to machine learning. In this work, we collate 13 ligand binding site predictors, spanning 30 years, focusing on the latest machine learning-based methods such as VN-EGNN, IF-SitePred, GrASP, PUResNet, and DeepPocket and compare them to the established P2Rank, PRANK and fpocket and earlier methods like PocketFinder, Ligsite and Surfnet. We benchmark the methods against the human subset of our new curated reference dataset, LIGYSIS. LIGYSIS is a comprehensive protein-ligand complex dataset comprising 30,000 proteins with bound ligands which aggregates biologically relevant unique protein-ligand interfaces across biological units of multiple structures from the same protein. LIGYSIS is an improvement for testing methods over earlier datasets like sc-PDB, PDBbind, binding MOAD, COACH420 and HOLO4K which either include 1:1 protein-ligand complexes or consider asymmetric units. Re-scoring of fpocket predictions by PRANK and DeepPocket displays the highest recall (60%) whilst IF-SitePred presents the lowest recall (39%). We demonstrate the detrimental effect that redundant prediction of binding sites has on performance as well as the beneficial impact of stronger pocket scoring schemes, with improvements up to 14% in recall (IF-SitePred) and 30% in precision (Surfnet). Finally, we propose top-N+2 recall as the universal benchmark metric for ligand binding site prediction and urge authors to share not only the source code of their methods, but also of their benchmark. Scientific contributions: This study conducts the largest benchmark of ligand binding site prediction methods to date, comparing 13 original methods and 15 variants using 10 informative metrics. 
The LIGYSIS dataset is introduced, which aggregates biologically relevant protein-ligand interfaces across multiple structures of the same protein. The study highlights the detrimental effect of redundant binding site prediction and demonstrates significant improvement in recall and precision through stronger scoring schemes. Finally, top-N+2 recall is proposed as a universal benchmark metric for ligand binding site prediction, with a recommendation for open-source sharing of both methods and benchmarks.
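To make the proposed metric concrete, one plausible reading of top-N+2 recall is sketched below: for a protein with N observed binding sites, only the N+2 highest-ranked predicted pockets are considered, and recall is the fraction of observed sites matched by at least one of them. The matching criterion used here (distance between site and pocket centres below a threshold), the threshold value and the function name are assumptions for illustration, not details given in the abstract.

```python
import math


def top_n_plus_2_recall(proteins, threshold=4.0):
    """Fraction of observed sites recovered among the top N+2 predictions.

    proteins  : list of (true_site_centres, ranked_predicted_centres) pairs,
                with predictions sorted from best to worst rank.
    threshold : assumed maximum centre-to-centre distance for a match.
    """
    hits = 0
    total = 0
    for true_sites, ranked_predictions in proteins:
        n = len(true_sites)
        considered = ranked_predictions[: n + 2]  # keep only the top N+2
        for site in true_sites:
            total += 1
            if any(math.dist(site, pred) <= threshold for pred in considered):
                hits += 1
    return hits / total if total else 0.0


# Toy data: protein A has one observed site, protein B has two.
protein_a = ([(0, 0, 0)],
             [(10, 0, 0), (20, 0, 0), (1, 0, 0), (0, 0, 1)])
protein_b = ([(0, 0, 0), (50, 0, 0)],
             [(0, 1, 0), (9, 9, 9), (8, 8, 8), (7, 7, 7), (50, 0, 1)])
recall = top_n_plus_2_recall([protein_a, protein_b])
```

In the toy data, the second site of protein B is matched only by the fifth-ranked pocket, which falls outside the top N+2 = 4, so two of the three observed sites count as recovered.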
Affiliation(s)
- Javier S Utgés
  - Division of Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, Scotland, UK
- Geoffrey J Barton
  - Division of Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, Scotland, UK

10. Long Y, Donald BR. Predicting Affinity Through Homology (PATH): Interpretable Binding Affinity Prediction with Persistent Homology. bioRxiv 2024:2023.11.16.567384. [PMID: 38014181 PMCID: PMC10680814 DOI: 10.1101/2023.11.16.567384] [Indexed: 11/29/2023]
Abstract
Accurate binding affinity prediction is crucial to structure-based drug design. Recent work used computational topology to obtain an effective representation of protein-ligand interactions. While algorithms using algebraic topology have proven useful in predicting properties of biomolecules, previous algorithms employed uninterpretable machine learning models which failed to explain the underlying geometric and topological features that drive accurate binding affinity prediction. Moreover, they had high computational complexity which made them intractable for large proteins. We present the fastest known algorithm to compute persistent homology features for protein-ligand complexes using opposition distance, with a runtime that is independent of the protein size. Then, we exploit these features in a novel, interpretable algorithm to predict protein-ligand binding affinity. Our algorithm achieves interpretability through an effective embedding of distances across bipartite matchings of the protein and ligand atoms into real-valued functions by summing Gaussians centered at features constructed by persistent homology. We name these functions internuclear persistent contours (IPCs). Next, we introduce persistence fingerprints, a vector with 10 components that sketches the distances of different bipartite matching between protein and ligand atoms, refined from IPCs. Let the number of protein atoms in the protein-ligand complex be $n$, number of ligand atoms be $m$, and $\omega \approx 2.4$ be the matrix multiplication exponent. We show that for any $0 < \varepsilon < 1$, after an $\mathcal{O}(mn \log(mn))$ preprocessing procedure, we can compute an $\varepsilon$-accurate approximation to the persistence fingerprint in $\mathcal{O}(m \log^{6\omega}(m/\varepsilon))$ time, independent of protein size. This is an improvement in time complexity by a factor of $\mathcal{O}((m+n)^{3})$ over any previous binding affinity prediction that uses persistent homology.
We show that the representational power of persistence fingerprint generalizes to protein-ligand binding datasets beyond the training dataset. Then, we introduce PATH, Predicting Affinity Through Homology, a two-part algorithm consisting of PATH+ and PATH-. PATH+ is an interpretable, small ensemble of shallow regression trees for binding affinity prediction from persistence fingerprints. We show that despite using 1,400-fold fewer features, PATH+ has comparable performance to a previous state-of-the-art binding affinity prediction algorithm that uses persistent homology. Moreover, PATH+ has the advantage of being interpretable. We visualize the features captured by persistence fingerprint for variant HIV-1 protease complexes and show that persistence fingerprint captures binding-relevant structural mutations. PATH-, in turn, uses regression trees over IPCs to differentiate between binding and decoy complexes. Finally, we benchmarked PATH versus established binding affinity prediction algorithms spanning physics-based, knowledge-based, and deep learning methods, revealing that PATH has comparable or better performance with less overfitting, compared to these state-of-the-art methods. The source code for PATH is released open-source as part of the OSPREY protein design software package.
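The abstract describes embedding persistent-homology features into a real-valued function by summing Gaussians centered at those features. The snippet below is only a minimal sketch of that one idea, not the authors' implementation: the function name `gaussian_contour`, the width `sigma`, and the example feature values are all hypothetical, and the opposition-distance and bipartite-matching steps that produce the features are not reproduced here.

```python
import math


def gaussian_contour(centers, sigma=0.5):
    """Return f(x) = sum_i exp(-(x - c_i)^2 / (2 * sigma^2)).

    `centers` stands in for feature values produced by persistent homology
    (hypothetical inputs here); `sigma` is an assumed Gaussian width.
    """
    def f(x):
        return sum(
            math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2)) for c in centers
        )
    return f


# Hypothetical feature values, e.g. distances in angstroms (illustrative only).
features = [2.0, 2.0, 3.5]
ipc = gaussian_contour(features, sigma=0.5)

# Evaluate the contour on a small grid of distances.
grid = [1.0, 2.0, 3.5, 10.0]
values = [ipc(x) for x in grid]
```

The resulting function peaks where features cluster and decays to zero far from every feature, which is what makes such a contour straightforward to plot and inspect.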
|
11
|
Zhang Z, Shen WX, Liu Q, Zitnik M. Efficient Generation of Protein Pockets with PocketGen. bioRxiv 2024:2024.02.25.581968. [PMID: 38464121 PMCID: PMC10925136 DOI: 10.1101/2024.02.25.581968] [Indexed: 03/12/2024]
Abstract
Designing protein-binding proteins is critical for drug discovery. However, the AI-based design of such proteins is challenging due to the complexity of ligand-protein interactions, the flexibility of ligand molecules and amino acid side chains, and sequence-structure dependencies. We introduce PocketGen, a deep generative model that simultaneously produces both the residue sequence and atomic structure of the protein regions where ligand interactions occur. PocketGen ensures consistency between sequence and structure by using a graph transformer for structural encoding and a sequence refinement module based on a protein language model. The bilevel graph transformer captures interactions at multiple scales, including atom, residue, and ligand levels. To enhance sequence refinement, PocketGen integrates a structural adapter into the protein language model, ensuring that structure-based predictions align with sequence-based predictions. PocketGen can generate high-fidelity protein pockets with superior binding affinity and structural validity. It operates ten times faster than physics-based methods and achieves a 95% success rate, defined as the percentage of generated pockets with higher binding affinity than reference pockets. Additionally, it attains an amino acid recovery rate exceeding 64%.
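The two headline metrics can be made concrete with a short sketch. The success rate follows the definition given in the abstract (share of generated pockets that bind more strongly than their reference pocket). The amino acid recovery computation is an assumption, using the common definition of position-wise identity with the native sequence, and treating a lower (more negative) score as stronger binding is also an assumption. All names and numbers are hypothetical.

```python
def success_rate(generated_scores, reference_scores):
    """Fraction of generated pockets that beat their reference pocket.

    Assumes docking-style scores where lower means stronger binding.
    """
    wins = sum(1 for g, r in zip(generated_scores, reference_scores) if g < r)
    return wins / len(generated_scores)


def recovery_rate(designed_seq, native_seq):
    """Fraction of positions where the designed residue equals the native one."""
    matches = sum(1 for d, n in zip(designed_seq, native_seq) if d == n)
    return matches / len(native_seq)


# Illustrative values only, not data from the paper.
generated = [-8.0, -7.0, -9.5, -6.0]
reference = [-7.5, -7.2, -9.0, -6.5]
rate = success_rate(generated, reference)      # 2 of 4 pockets improve
recovery = recovery_rate("ACDEFG", "ACDQFG")   # 5 of 6 residues match
```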
Affiliation(s)
- Zaixi Zhang
  - State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China, Hefei, Anhui, China
  - Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, Anhui, China
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Wan Xiang Shen
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Qi Liu
  - State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China, Hefei, Anhui, China
  - Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, Anhui, China
- Marinka Zitnik
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Harvard Data Science Initiative, Cambridge, MA, USA
|
12
|
Manen-Freixa L, Antolin AA. Polypharmacology prediction: the long road toward comprehensively anticipating small-molecule selectivity to de-risk drug discovery. Expert Opin Drug Discov 2024; 19:1043-1069. [PMID: 39004919 DOI: 10.1080/17460441.2024.2376643] [Received: 03/15/2024] [Accepted: 07/02/2024] [Indexed: 07/16/2024]
Abstract
INTRODUCTION: Small molecules often bind to multiple targets, a behavior termed polypharmacology. Anticipating polypharmacology is essential for drug discovery since unknown off-targets can modulate safety and efficacy - profoundly affecting drug discovery success. Unfortunately, experimental methods to assess selectivity present significant limitations and drugs still fail in the clinic due to unanticipated off-targets. Computational methods are a cost-effective, complementary approach to predict polypharmacology. AREAS COVERED: This review aims to provide a comprehensive overview of the state of polypharmacology prediction and discuss its strengths and limitations, covering both classical cheminformatics methods and bioinformatic approaches. The authors review available data sources, paying close attention to their different coverage. The authors then discuss major algorithms grouped by the types of data that they exploit using selected examples. EXPERT OPINION: Polypharmacology prediction has made impressive progress over the last decades and contributed to identify many off-targets. However, data incompleteness currently limits most approaches to comprehensively predict selectivity. Moreover, our limited agreement on model assessment challenges the identification of the best algorithms - which at present show modest performance in prospective real-world applications. Despite these limitations, the exponential increase of multidisciplinary Big Data and AI hold much potential to better polypharmacology prediction and de-risk drug discovery.
Affiliation(s)
- Leticia Manen-Freixa
  - Oncobell Division, Bellvitge Biomedical Research Institute (IDIBELL) and ProCURE Department, Catalan Institute of Oncology (ICO), Barcelona, Spain
- Albert A Antolin
  - Oncobell Division, Bellvitge Biomedical Research Institute (IDIBELL) and ProCURE Department, Catalan Institute of Oncology (ICO), Barcelona, Spain
  - Center for Cancer Drug Discovery, The Division of Cancer Therapeutics, The Institute of Cancer Research, London, UK
|
13
|
Weller J, Rohs R. Structure-Based Drug Design with a Deep Hierarchical Generative Model. J Chem Inf Model 2024; 64:6450-6463. [PMID: 39058534 PMCID: PMC11350878 DOI: 10.1021/acs.jcim.4c01193] [Received: 07/08/2024] [Revised: 07/16/2024] [Accepted: 07/17/2024] [Indexed: 07/28/2024]
Abstract
Recently, the remarkable growth of available crystal structure data and libraries of commercially available or readily synthesizable molecules have unlocked previously inaccessible regions of chemical space for drug development. Paired with improvements in virtual ligand screening methods, these expanded libraries are having a notable impact on early drug design efforts. Yet screening-based methods still face scalability limits, due to computational constraints and the sheer scale of drug-like space. Machine learning approaches are overcoming these limitations by learning the fundamental intra- and intermolecular relationships in drug-target systems from existing data. Here, we introduce DrugHIVE, a deep hierarchical variational autoencoder that outperforms state-of-the-art autoregressive and diffusion-based methods in both speed and performance on common generative benchmarks. DrugHIVE's hierarchical design enables improved control over molecular generation. Its capabilities include dramatically increasing virtual screening efficiency and accelerating a wide range of common drug design tasks, including de novo generation, molecular optimization, scaffold hopping, linker design, and high-throughput pattern replacement. Our highly scalable method can even be applied to receptors with high-confidence AlphaFold-predicted structures, extending the ability to generate high-quality drug-like molecules to a majority of the unsolved human proteome.
Affiliation(s)
- Jesse A. Weller
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California 90089, United States
  - Department of Physics and Astronomy, University of Southern California, Los Angeles, California 90089, United States
- Remo Rohs
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California 90089, United States
  - Department of Physics and Astronomy, University of Southern California, Los Angeles, California 90089, United States
  - Department of Chemistry, University of Southern California, Los Angeles, California 90089, United States
  - Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, California 90089, United States
|
14
|
Morehead A, Cheng J. Geometry-complete diffusion for 3D molecule generation and optimization. Commun Chem 2024; 7:150. [PMID: 38961141 PMCID: PMC11222514 DOI: 10.1038/s42004-024-01233-z] [Received: 05/19/2024] [Accepted: 06/20/2024] [Indexed: 07/05/2024]
Abstract
Generative deep learning methods have recently been proposed for generating 3D molecules using equivariant graph neural networks (GNNs) within a denoising diffusion framework. However, such methods are unable to learn important geometric properties of 3D molecules, as they adopt molecule-agnostic and non-geometric GNNs as their 3D graph denoising networks, which notably hinders their ability to generate valid large 3D molecules. In this work, we address these gaps by introducing the Geometry-Complete Diffusion Model (GCDM) for 3D molecule generation, which outperforms existing 3D molecular diffusion models by significant margins across conditional and unconditional settings for the QM9 dataset and the larger GEOM-Drugs dataset, respectively. Importantly, we demonstrate that GCDM's generative denoising process enables the model to generate a significant proportion of valid and energetically-stable large molecules at the scale of GEOM-Drugs, whereas previous methods fail to do so with the features they learn. Additionally, we show that extensions of GCDM can not only effectively design 3D molecules for specific protein pockets but can be repurposed to consistently optimize the geometry and chemical composition of existing 3D molecules for molecular stability and property specificity, demonstrating new versatility of molecular diffusion models. Code and data are freely available on GitHub.
Affiliation(s)
- Alex Morehead
  - Department of Electrical Engineering & Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO, 65211, USA
- Jianlin Cheng
  - Department of Electrical Engineering & Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO, 65211, USA
|
15
|
Amorim AM, Piochi LF, Gaspar AT, Preto A, Rosário-Ferreira N, Moreira IS. Advancing Drug Safety in Drug Development: Bridging Computational Predictions for Enhanced Toxicity Prediction. Chem Res Toxicol 2024; 37:827-849. [PMID: 38758610 PMCID: PMC11187637 DOI: 10.1021/acs.chemrestox.3c00352] [Received: 11/06/2023] [Revised: 04/29/2024] [Accepted: 05/07/2024] [Indexed: 05/19/2024]
Abstract
The attrition rate of drugs in clinical trials is generally quite high, with estimates suggesting that approximately 90% of drugs fail to make it through the process. The identification of unexpected toxicity issues during preclinical stages is a significant factor contributing to this high rate of failure. These issues can have a major impact on the success of a drug and must be carefully considered throughout the development process. These late-stage rejections or withdrawals of drug candidates significantly increase the costs associated with drug development, particularly when toxicity is detected during clinical trials or after market release. Understanding drug-biological target interactions is essential for evaluating compound toxicity and safety, as well as predicting therapeutic effects and potential off-target effects that could lead to toxicity. This will enable scientists to predict and assess the safety profiles of drug candidates more accurately. Evaluation of toxicity and safety is a critical aspect of drug development, and biomolecules, particularly proteins, play vital roles in complex biological networks and often serve as targets for various chemicals. Therefore, a better understanding of these interactions is crucial for the advancement of drug development. The development of computational methods for evaluating protein-ligand interactions and predicting toxicity is emerging as a promising approach that adheres to the 3Rs principles (replace, reduce, and refine) and has garnered significant attention in recent years. In this review, we present a thorough examination of the latest breakthroughs in drug toxicity prediction, highlighting the significance of drug-target binding affinity in anticipating and mitigating possible adverse effects. In doing so, we aim to contribute to the development of more effective and secure drugs.
Affiliation(s)
- Ana M. B. Amorim
  - Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - PhD Programme in Biosciences, Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - PURR.AI, Rua Pedro Nunes, IPN Incubadora, Ed C, 3030-199 Coimbra, Portugal
- Luiz F. Piochi
  - Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- Ana T. Gaspar
  - Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- António J. Preto
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - PhD Programme in Experimental Biology and Biomedicine, Institute for Interdisciplinary Research (IIIUC), University of Coimbra, Casa Costa Alemão, 3030-789 Coimbra, Portugal
- Nícia Rosário-Ferreira
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- Irina S. Moreira
  - Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
|
16
|
Tang X, Dai H, Knight E, Wu F, Li Y, Li T, Gerstein M. A survey of generative AI for de novo drug design: new frontiers in molecule and protein generation. Brief Bioinform 2024; 25:bbae338. [PMID: 39007594 PMCID: PMC11247410 DOI: 10.1093/bib/bbae338] [Received: 03/10/2024] [Revised: 05/21/2024] [Accepted: 06/27/2024] [Indexed: 07/16/2024]
Abstract
Artificial intelligence (AI)-driven methods can vastly improve the historically costly drug design process, with various generative models already in widespread use. Generative models for de novo drug design, in particular, focus on the creation of novel biological compounds entirely from scratch, representing a promising future direction. Rapid development in the field, combined with the inherent complexity of the drug design process, creates a difficult landscape for new researchers to enter. In this survey, we organize de novo drug design into two overarching themes: small molecule and protein generation. Within each theme, we identify a variety of subtasks and applications, highlighting important datasets, benchmarks, and model architectures and comparing the performance of top models. We take a broad approach to AI-driven drug design, allowing for both micro-level comparisons of various methods within each subtask and macro-level observations across different fields. We discuss parallel challenges and approaches between the two applications and highlight future directions for AI-driven de novo drug design as a whole. An organized repository of all covered sources is available at https://github.com/gersteinlab/GenAI4Drug.
Affiliation(s)
- Xiangru Tang
  - Department of Computer Science, Yale University, New Haven, CT 06520, United States
- Howard Dai
  - Department of Computer Science, Yale University, New Haven, CT 06520, United States
- Elizabeth Knight
  - School of Medicine, Yale University, New Haven, CT 06520, United States
- Fang Wu
  - Computer Science Department, Stanford University, CA 94305, United States
- Yunyang Li
  - Department of Computer Science, Yale University, New Haven, CT 06520, United States
- Tianxiao Li
  - Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT 06520, United States
- Mark Gerstein
  - Department of Computer Science, Yale University, New Haven, CT 06520, United States
  - Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT 06520, United States
  - Department of Statistics & Data Science, Yale University, New Haven, CT 06520, United States
  - Department of Biomedical Informatics & Data Science, Yale University, New Haven, CT 06520, United States
  - Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, CT 06520, United States
|
17
|
Siebenmorgen T, Menezes F, Benassou S, Merdivan E, Didi K, Mourão ASD, Kitel R, Liò P, Kesselheim S, Piraud M, Theis FJ, Sattler M, Popowicz GM. MISATO: machine learning dataset of protein-ligand complexes for structure-based drug discovery. Nat Comput Sci 2024; 4:367-378. [PMID: 38730184 PMCID: PMC11136668 DOI: 10.1038/s43588-024-00627-2] [Received: 05/30/2023] [Accepted: 04/11/2024] [Indexed: 05/12/2024]
Abstract
Large language models have greatly enhanced our ability to understand biology and chemistry, yet robust methods for structure-based drug discovery, quantum chemistry and structural biology are still sparse. Precise biomolecule-ligand interaction datasets are urgently needed for large language models. To address this, we present MISATO, a dataset that combines quantum mechanical properties of small molecules and associated molecular dynamics simulations of ~20,000 experimental protein-ligand complexes with extensive validation of experimental data. Starting from the existing experimental structures, semi-empirical quantum mechanics was used to systematically refine these structures. A large collection of molecular dynamics traces of protein-ligand complexes in explicit water is included, accumulating over 170 μs. We give examples of machine learning (ML) baseline models proving an improvement of accuracy by employing our data. An easy entry point for ML experts is provided to enable the next generation of drug discovery artificial intelligence models.
Affiliation(s)
- Till Siebenmorgen
  - Molecular Targets and Therapeutics Center, Institute of Structural Biology, Helmholtz Munich, Neuherberg, Germany
  - TUM School of Natural Sciences, Department of Bioscience, Bayerisches NMR Zentrum, Technical University of Munich, Garching, Germany
- Filipe Menezes
  - Molecular Targets and Therapeutics Center, Institute of Structural Biology, Helmholtz Munich, Neuherberg, Germany
  - TUM School of Natural Sciences, Department of Bioscience, Bayerisches NMR Zentrum, Technical University of Munich, Garching, Germany
- Sabrina Benassou
  - Jülich Supercomputing Centre, Forschungszentrum Jülich, Jülich, Germany
- Kieran Didi
  - Computer Laboratory, Cambridge University, Cambridge, UK
- André Santos Dias Mourão
  - Molecular Targets and Therapeutics Center, Institute of Structural Biology, Helmholtz Munich, Neuherberg, Germany
  - TUM School of Natural Sciences, Department of Bioscience, Bayerisches NMR Zentrum, Technical University of Munich, Garching, Germany
- Radosław Kitel
  - Faculty of Chemistry, Jagiellonian University, Krakow, Poland
- Pietro Liò
  - Computer Laboratory, Cambridge University, Cambridge, UK
- Stefan Kesselheim
  - Jülich Supercomputing Centre, Forschungszentrum Jülich, Jülich, Germany
- Marie Piraud
  - Helmholtz AI, Helmholtz Munich, Neuherberg, Germany
- Fabian J Theis
  - Helmholtz AI, Helmholtz Munich, Neuherberg, Germany
  - Computational Health Center, Institute of Computational Biology, Helmholtz Munich, Neuherberg, Germany
  - TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Michael Sattler
  - Molecular Targets and Therapeutics Center, Institute of Structural Biology, Helmholtz Munich, Neuherberg, Germany
  - TUM School of Natural Sciences, Department of Bioscience, Bayerisches NMR Zentrum, Technical University of Munich, Garching, Germany
- Grzegorz M Popowicz
  - Molecular Targets and Therapeutics Center, Institute of Structural Biology, Helmholtz Munich, Neuherberg, Germany
  - TUM School of Natural Sciences, Department of Bioscience, Bayerisches NMR Zentrum, Technical University of Munich, Garching, Germany
|
18
|
Li X, Shen C, Zhu H, Yang Y, Wang Q, Yang J, Huang N. A High-Quality Data Set of Protein-Ligand Binding Interactions Via Comparative Complex Structure Modeling. J Chem Inf Model 2024; 64:2454-2466. [PMID: 38181418 DOI: 10.1021/acs.jcim.3c01170] [Indexed: 01/07/2024]
Abstract
High-quality protein-ligand complex structures provide the basis for understanding the nature of noncovalent binding interactions at the atomic level and enable structure-based drug design. However, experimentally determined complex structures are scarce compared with the vast chemical space. In this study, we addressed this issue by constructing the BindingNet data set via comparative complex structure modeling, which contains 69,816 modeled high-quality protein-ligand complex structures with experimental binding affinity data. BindingNet provides valuable insights into investigating protein-ligand interactions, allowing visual inspection and interpretation of structural analogues' structure-activity relationships. It can also be used for evaluating machine-learning-based scoring functions. Our results indicate that machine learning models trained on BindingNet could reduce the bias caused by buried solvent-accessible surface area, as we previously found for models trained on the PDBbind data set. We also discussed strategies to improve BindingNet and its potential utilization for benchmarking the molecular docking methods and ligand binding free energy calculation approaches. The BindingNet complements PDBbind in constructing a sufficient and unbiased protein-ligand binding data set and is freely available at http://bindingnet.huanglab.org.cn.
Affiliation(s)
- Xuelian Li
  - National Institute of Biological Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100730, China
  - National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
- Cheng Shen
  - National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
- Hui Zhu
  - National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
  - Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 102206, China
- Yujian Yang
  - National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
- Qing Wang
  - National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
- Jincai Yang
  - National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
- Niu Huang
  - National Institute of Biological Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100730, China
  - National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
  - Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 102206, China
|
19
|
Carbery A, Buttenschoen M, Skyner R, von Delft F, Deane CM. Learnt representations of proteins can be used for accurate prediction of small molecule binding sites on experimentally determined and predicted protein structures. J Cheminform 2024; 16:32. [PMID: 38486231 PMCID: PMC10941399 DOI: 10.1186/s13321-024-00821-4] [Received: 09/07/2023] [Accepted: 03/01/2024] [Indexed: 03/17/2024]
Abstract
Protein-ligand binding site prediction is a useful tool for understanding the functional behaviour and potential drug-target interactions of a novel protein of interest. However, most binding site prediction methods are tested by providing crystallised ligand-bound (holo) structures as input. This testing regime is insufficient to understand the performance on novel protein targets where experimental structures are not available. An alternative option is to provide computationally predicted protein structures, but this is not commonly tested. However, due to the training data used, computationally-predicted protein structures tend to be extremely accurate, and are often biased toward a holo conformation. In this study we describe and benchmark IF-SitePred, a protein-ligand binding site prediction method which is based on the labelling of ESM-IF1 protein language model embeddings combined with point cloud annotation and clustering. We show that not only is IF-SitePred competitive with state-of-the-art methods when predicting binding sites on experimental structures, but it performs better on proxies for novel proteins where low accuracy has been simulated by molecular dynamics. Finally, IF-SitePred outperforms other methods if ensembles of predicted protein structures are generated.
Affiliation(s)
- Anna Carbery
  - Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, OX1 3LB, UK
  - Diamond Light Source, Harwell Science and Innovation Campus, Didcot, OX11 0DE, UK
- Martin Buttenschoen
  - Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, OX1 3LB, UK
- Rachael Skyner
  - OMass Therapeutics, Building 4000, Chancellor Court, John Smith Drive, ARC Oxford, OX4 2GX, UK
- Frank von Delft
  - Diamond Light Source, Harwell Science and Innovation Campus, Didcot, OX11 0DE, UK
  - Centre for Medicines Discovery, University of Oxford, Oxford, OX3 7DQ, UK
  - Research Complex at Harwell, Harwell Science and Innovation Campus, Didcot, OX11 0FA, United Kingdom
  - Department of Biochemistry, University of Johannesburg, Johannesburg, 2006, South Africa
- Charlotte M Deane
  - Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, OX1 3LB, UK
|
20
|
Clemente CM, Prieto JM, Martí M. Unlocking Precision Docking for Metalloproteins. J Chem Inf Model 2024; 64:1581-1592. [PMID: 38373276 DOI: 10.1021/acs.jcim.3c01853] [Indexed: 02/21/2024]
Abstract
Metalloproteins play a fundamental role in molecular biology, contributing to various biological processes. However, the discovery of high-affinity ligands targeting metalloproteins has been delayed due, in part, to a lack of suitable tools and data. Molecular docking, a widely used technique for virtual screening of small-molecule ligand interactions with proteins, often faces challenges when applied to metalloproteins due to the particular nature of the ligand metal bond. To address these limitations associated with docking metalloproteins, we introduce a knowledge-driven docking approach known as "metalloprotein bias docking" (MBD), which extends the AutoDock Bias technique. We assembled a comprehensive data set of metalloprotein-ligand complexes from 15 different metalloprotein families, encompassing Ca, Co, Fe, Mg, Mn, and Zn metal ions. Subsequently, we conducted a performance analysis of our MBD method and compared it to the conventional docking (CD) program AutoDock4, applied to various metalloprotein targets within our data set. Our results demonstrate that MBD outperforms CD, significantly enhancing accuracy, selectivity, and precision in ligand pose prediction. Additionally, we observed a positive correlation between our predicted ligand free energies and the corresponding experimental values. These findings underscore the potential of MBD as a valuable tool for the effective exploration of metalloprotein-ligand interactions.
Affiliation(s)
- Camila M Clemente
  - Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires (FCEyN-UBA) e Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN) CONICET, Pabellón 2 de Ciudad Universitaria, Ciudad de Buenos Aires C1428EHA, Argentina
- Juan M Prieto
  - Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires (FCEyN-UBA) e Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN) CONICET, Pabellón 2 de Ciudad Universitaria, Ciudad de Buenos Aires C1428EHA, Argentina
- Marcelo Martí
  - Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires (FCEyN-UBA) e Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN) CONICET, Pabellón 2 de Ciudad Universitaria, Ciudad de Buenos Aires C1428EHA, Argentina
|
21
|
Zhang L, Yang Y, Yang Y, Xiao Z. Discovery of Novel Metalloenzyme Inhibitors Based on Property Characterization: Strategy and Application for HDAC1 Inhibitors. Molecules 2024; 29:1096. [PMID: 38474606 DOI: 10.3390/molecules29051096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 02/17/2024] [Accepted: 02/27/2024] [Indexed: 03/14/2024] [Open Access]
Abstract
Metalloenzymes are ubiquitously present in the human body and are relevant to a variety of diseases. However, the development of metalloenzyme inhibitors is limited by low specificity and poor drug-likeness associated with metal-binding fragments (MBFs). A generalized drug discovery strategy was established, centered on the property characterization of zinc-dependent metalloenzyme inhibitors (ZnMIs). Fifteen potential Zn²⁺-binding fragments (ZnBFs) were identified, and a customized pharmacophore feature was defined based on these ZnBFs. The customized feature was set as a required feature and applied to a search for novel inhibitors of histone deacetylase 1 (HDAC1). Ten potential HDAC1 inhibitors were identified, one of which (compound 9) was a known potent HDAC1 inhibitor. The results demonstrated the effectiveness of our strategy for identifying novel inhibitors of zinc-dependent metalloenzymes.
Affiliation(s)
- Lu Zhang
- Beijing Key Laboratory of Active Substance Discovery and Druggability Evaluation, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Department of Toxicology, Tianjin Centers for Disease Control and Prevention, Tianjin 300011, China
- Yajun Yang
- Beijing Key Laboratory of Active Substance Discovery and Druggability Evaluation, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Ying Yang
- Beijing Key Laboratory of Active Substance Discovery and Druggability Evaluation, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Zhiyan Xiao
- Beijing Key Laboratory of Active Substance Discovery and Druggability Evaluation, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- State Key Laboratory of Digestive Health, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
22
Wang DD, Wu W, Wang R. Structure-based, deep-learning models for protein-ligand binding affinity prediction. J Cheminform 2024; 16:2. [PMID: 38173000 PMCID: PMC10765576 DOI: 10.1186/s13321-023-00795-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2023] [Accepted: 12/10/2023] [Indexed: 01/05/2024] [Open Access]
Abstract
The launch of the AlphaFold series has brought deep-learning techniques into molecular structural science. As another crucial problem, structure-based prediction of protein-ligand binding affinity urgently calls for advanced computational techniques. Is deep learning ready to decode this problem? Here we review mainstream structure-based, deep-learning approaches for this problem, focusing on molecular representations, learning architectures and model interpretability. A model taxonomy has been generated. To compensate for the lack of valid comparisons among those models, we implemented and evaluated representative models on a uniform basis and discuss their advantages and shortcomings. This review will potentially benefit structure-based drug discovery and related areas.
Affiliation(s)
- Debby D Wang
- School of Science and Technology, Hong Kong Metropolitan University, 81 Chung Hau Street, Ho Man Tin, Hong Kong, China
- Wenhui Wu
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China
- Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen University, Shenzhen, 518060, China
- Ran Wang
- School of Mathematical Science, Shenzhen University, Shenzhen, 518060, China.
- Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen University, Shenzhen, 518060, China.
- Shenzhen Key Laboratory of Advanced Machine Learning and Applications, Shenzhen University, Shenzhen, 518060, China.
23
Verburgt J, Jain A, Kihara D. Recent Deep Learning Applications to Structure-Based Drug Design. Methods Mol Biol 2024; 2714:215-234. [PMID: 37676602 PMCID: PMC10578466 DOI: 10.1007/978-1-0716-3441-7_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/08/2023]
Abstract
Identification and optimization of small molecules that bind to and modulate protein function is a crucial step in the early stages of drug development. For decades, this process has benefitted greatly from the use of computational models that can provide insights into molecular binding affinity and optimization. Over the past several years, various types of deep learning models have shown great potential in improving and enhancing the performance of traditional computational methods. In this chapter, we provide an overview of recent deep learning-based developments with applications in drug discovery. We classify these methods into four subcategories dependent on the task each method is aiming to solve. For each subcategory, we provide the general framework of the approach and discuss individual methods.
Affiliation(s)
- Jacob Verburgt
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Anika Jain
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Daisuke Kihara
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA.
- Department of Computer Science, Purdue University, West Lafayette, IN, USA.
24
Libouban PY, Aci-Sèche S, Gómez-Tamayo JC, Tresadern G, Bonnet P. The Impact of Data on Structure-Based Binding Affinity Predictions Using Deep Neural Networks. Int J Mol Sci 2023; 24:16120. [PMID: 38003312 PMCID: PMC10671244 DOI: 10.3390/ijms242216120] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/14/2023] [Revised: 10/30/2023] [Accepted: 11/01/2023] [Indexed: 11/26/2023] [Open Access]
Abstract
Artificial intelligence (AI) has gained significant traction in the field of drug discovery, with deep learning (DL) algorithms playing a crucial role in predicting protein-ligand binding affinities. Despite advancements in neural network architectures, system representation, and training techniques, the performance of DL affinity prediction has reached a plateau, prompting the question of whether it is truly solved or if the current performance is overly optimistic and reliant on biased, easily predictable data. Like other DL-related problems, this issue seems to stem from the training and test sets used when building the models. In this work, we investigate the impact of several parameters related to the input data on the performance of neural network affinity prediction models. Notably, we identify the size of the binding pocket as a critical factor influencing the performance of our statistical models; furthermore, it is more important to train a model with as much data as possible than to restrict the training to only high-quality datasets. Finally, we also confirm the bias in the typically used current test sets. Therefore, several types of evaluation and benchmarking are required to understand models' decision-making processes and accurately compare the performance of models.
Affiliation(s)
- Pierre-Yves Libouban
- Institute of Organic and Analytical Chemistry (ICOA), UMR7311, Université d’Orléans, CNRS, Pôle de Chimie rue de Chartres, 45067 Orléans, CEDEX 2, France; (P.-Y.L.); (S.A.-S.)
- Samia Aci-Sèche
- Institute of Organic and Analytical Chemistry (ICOA), UMR7311, Université d’Orléans, CNRS, Pôle de Chimie rue de Chartres, 45067 Orléans, CEDEX 2, France; (P.-Y.L.); (S.A.-S.)
- Jose Carlos Gómez-Tamayo
- Computational Chemistry, Janssen Research & Development, Janssen Pharmaceutica N. V., B-2340 Beerse, Belgium; (J.C.G.-T.); (G.T.)
- Gary Tresadern
- Computational Chemistry, Janssen Research & Development, Janssen Pharmaceutica N. V., B-2340 Beerse, Belgium; (J.C.G.-T.); (G.T.)
- Pascal Bonnet
- Institute of Organic and Analytical Chemistry (ICOA), UMR7311, Université d’Orléans, CNRS, Pôle de Chimie rue de Chartres, 45067 Orléans, CEDEX 2, France; (P.-Y.L.); (S.A.-S.)
25
Krishna Swaroop A, Krishnan Namboori PK, Esakkimuthukumar M, Praveen TK, Nagarjuna P, Patnaik SK, Selvaraj J. Leveraging decagonal in-silico strategies for uncovering IL-6 inhibitors with precision. Comput Biol Med 2023; 163:107231. [PMID: 37421735 DOI: 10.1016/j.compbiomed.2023.107231] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/19/2023] [Revised: 06/27/2023] [Accepted: 07/01/2023] [Indexed: 07/10/2023]
Abstract
Interleukin-6 (IL-6) upregulation leads to various acute-phase reactions, such as local and systemic inflammation, in many diseases, including cancer, multiple sclerosis, rheumatoid arthritis, anemia, and Alzheimer's disease, stimulating the JAK/STAT3, Ras/MAPK, and PI3K-PKB/Akt pathogenic pathways. Because no small molecules against IL-6 are currently available on the market, we designed a class of small bioactive 1,3-indanedione (IDC) molecules for inhibiting IL-6 using a decagonal computational approach. The IL-6 mutations were mapped onto the IL-6 protein (PDB ID: 1ALU) from thorough pharmacogenomic and proteomic studies. Protein-drug interaction network analysis of 2637 FDA-approved drugs with the IL-6 protein using Cytoscape software showed that 14 drugs have prominent interactions with IL-6. Molecular docking studies showed that the designed compound IDC-24 (-11.8 kcal/mol) and methotrexate (-5.20 kcal/mol) bound most strongly to the South Asian population-mutated 1ALU protein. MMGBSA results indicated that IDC-24 (-41.78 kcal/mol) and methotrexate (-36.81 kcal/mol) had the highest binding energy when compared to the standard molecules LMT-28 (-35.87 kcal/mol) and MDL-A (-26.18 kcal/mol). These results were substantiated by molecular dynamics studies, in which IDC-24 and methotrexate had the highest stability. Further, the MMPBSA computations produced energies of -28 kcal/mol and -14.69 kcal/mol for IDC-24 and LMT-28, respectively. KDeep absolute binding affinity computations revealed energies of -5.81 kcal/mol and -4.74 kcal/mol for IDC-24 and LMT-28, respectively. Finally, our decagonal approach established the compound IDC-24 from the designed 1,3-indanedione library and methotrexate from protein-drug interaction networking as suitable hits against IL-6.
Affiliation(s)
- Akey Krishna Swaroop
- Department of Pharmaceutical Chemistry, JSS College of Pharmacy, JSS Academy of Higher Education and Research, Ooty, Tamilnadu, India
- P K Krishnan Namboori
- Amrita Molecular Modeling and Synthesis (AMMAS) Research Lab, Amrita Vishwavidyapeetham, Amrita Nagar, Ettimadai, Coimbatore, Tamilnadu, India
- M Esakkimuthukumar
- Department of Pharmaceutical Chemistry, JSS College of Pharmacy, JSS Academy of Higher Education and Research, Ooty, Tamilnadu, India
- T K Praveen
- Department of Pharmacology, JSS College of Pharmacy, JSS Academy of Higher Education and Research, Ooty, Tamilnadu, India
- Palathoti Nagarjuna
- Department of Pharmaceutical Chemistry, JSS College of Pharmacy, JSS Academy of Higher Education and Research, Ooty, Tamilnadu, India
- Sunil Kumar Patnaik
- Department of Pharmaceutical Chemistry, JSS College of Pharmacy, JSS Academy of Higher Education and Research, Ooty, Tamilnadu, India
- Jubie Selvaraj
- Department of Pharmaceutical Chemistry, JSS College of Pharmacy, JSS Academy of Higher Education and Research, Ooty, Tamilnadu, India.
26
Thomas M, Bender A, de Graaf C. Integrating structure-based approaches in generative molecular design. Curr Opin Struct Biol 2023; 79:102559. [PMID: 36870277 DOI: 10.1016/j.sbi.2023.102559] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 11/11/2022] [Revised: 01/23/2023] [Accepted: 01/31/2023] [Indexed: 03/06/2023]
Abstract
Generative molecular design for drug discovery and development has seen a recent resurgence, promising to improve the efficiency of the design-make-test-analyse cycle by computationally exploring much larger chemical spaces than traditional virtual screening techniques. However, most generative models thus far have only utilized small-molecule information to train and condition de novo molecule generators. Here, we instead focus on recent approaches that incorporate protein structure into de novo molecule optimization in an attempt to maximize the predicted on-target binding affinity of generated molecules. We categorize these structure integration principles as either distribution learning or goal-directed optimization and, in each case, by whether the approach is protein structure-explicit or implicit with respect to the generative model. We discuss recent approaches in the context of this categorization and provide our perspective on the future direction of the field.
Affiliation(s)
- Morgan Thomas
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK.
- Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK. https://twitter.com/@AndreasBenderUK
- Chris de Graaf
- Sosei Heptares, Steinmetz Building, Granta Park, Great Abington, Cambridge, CB21 6DG, UK. https://twitter.com/@Chris_de_Graaf
27
Guo B, Zheng H, Jiang H, Li X, Guan N, Zuo Y, Zhang Y, Yang H, Wang X. Enhanced compound-protein binding affinity prediction by representing protein multimodal information via a coevolutionary strategy. Brief Bioinform 2023; 24:6995409. [PMID: 36682005 DOI: 10.1093/bib/bbac628] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/15/2022] [Revised: 12/12/2022] [Accepted: 12/25/2022] [Indexed: 01/23/2023] [Open Access]
Abstract
Due to the lack of a method to efficiently represent the multimodal information of a protein, including its structure and sequence information, predicting compound-protein binding affinity (CPA) still suffers from low accuracy when applying machine-learning methods. To overcome this limitation, in a novel end-to-end architecture (named FeatNN), we develop a coevolutionary strategy to jointly represent the structure and sequence features of proteins and ultimately optimize the mathematical models for predicting CPA. Furthermore, from a data-driven perspective, we propose a rational method that can utilize both high- and low-quality databases to optimize the accuracy and generalization ability of FeatNN in CPA prediction tasks. Notably, we visually interpret the feature interaction process between sequence and structure in the rationally designed architecture. As a result, FeatNN considerably outperforms the state-of-the-art (SOTA) baseline in virtual drug evaluation tasks, indicating the feasibility of this approach for practical use. FeatNN provides an outstanding method for higher CPA prediction accuracy and better generalization ability by efficiently representing multimodal information of proteins via a coevolutionary strategy.
Affiliation(s)
- Binjie Guo
- Department of Neurobiology and Department of Rehabilitation Medicine, First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang Province 310058, China
- Liangzhu Laboratory, MOE Frontier Science Center for Brain Science and Brain-machine Integration, State Key Laboratory of Brain-machine Intelligence, Zhejiang University, 1369 West Wenyi Road, Hangzhou 311121, China
- NHC and CAMS Key Laboratory of Medical Neurobiology, Zhejiang University, Hangzhou 310058, China
- Hanyu Zheng
- Department of Neurobiology and Department of Rehabilitation Medicine, First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang Province 310058, China
- Liangzhu Laboratory, MOE Frontier Science Center for Brain Science and Brain-machine Integration, State Key Laboratory of Brain-machine Intelligence, Zhejiang University, 1369 West Wenyi Road, Hangzhou 311121, China
- NHC and CAMS Key Laboratory of Medical Neurobiology, Zhejiang University, Hangzhou 310058, China
- Haohan Jiang
- Department of Neurobiology and Department of Rehabilitation Medicine, First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang Province 310058, China
- Liangzhu Laboratory, MOE Frontier Science Center for Brain Science and Brain-machine Integration, State Key Laboratory of Brain-machine Intelligence, Zhejiang University, 1369 West Wenyi Road, Hangzhou 311121, China
- NHC and CAMS Key Laboratory of Medical Neurobiology, Zhejiang University, Hangzhou 310058, China
- Xiaodan Li
- Department of Neurobiology and Department of Rehabilitation Medicine, First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang Province 310058, China
- Liangzhu Laboratory, MOE Frontier Science Center for Brain Science and Brain-machine Integration, State Key Laboratory of Brain-machine Intelligence, Zhejiang University, 1369 West Wenyi Road, Hangzhou 311121, China
- NHC and CAMS Key Laboratory of Medical Neurobiology, Zhejiang University, Hangzhou 310058, China
- Naiyu Guan
- Department of Neurobiology and Department of Rehabilitation Medicine, First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang Province 310058, China
- Liangzhu Laboratory, MOE Frontier Science Center for Brain Science and Brain-machine Integration, State Key Laboratory of Brain-machine Intelligence, Zhejiang University, 1369 West Wenyi Road, Hangzhou 311121, China
- NHC and CAMS Key Laboratory of Medical Neurobiology, Zhejiang University, Hangzhou 310058, China
- Yanming Zuo
- Liangzhu Laboratory, MOE Frontier Science Center for Brain Science and Brain-machine Integration, State Key Laboratory of Brain-machine Intelligence, Zhejiang University, 1369 West Wenyi Road, Hangzhou 311121, China
- NHC and CAMS Key Laboratory of Medical Neurobiology, Zhejiang University, Hangzhou 310058, China
- Yicheng Zhang
- Department of Neurobiology and Department of Rehabilitation Medicine, First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang Province 310058, China
- Liangzhu Laboratory, MOE Frontier Science Center for Brain Science and Brain-machine Integration, State Key Laboratory of Brain-machine Intelligence, Zhejiang University, 1369 West Wenyi Road, Hangzhou 311121, China
- NHC and CAMS Key Laboratory of Medical Neurobiology, Zhejiang University, Hangzhou 310058, China
- Hengfu Yang
- School of Computer Science, Hunan First Normal University, Changsha, 410205 Hunan, China
- Xuhua Wang
- Department of Neurobiology and Department of Rehabilitation Medicine, First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang Province 310058, China
- Liangzhu Laboratory, MOE Frontier Science Center for Brain Science and Brain-machine Integration, State Key Laboratory of Brain-machine Intelligence, Zhejiang University, 1369 West Wenyi Road, Hangzhou 311121, China
- NHC and CAMS Key Laboratory of Medical Neurobiology, Zhejiang University, Hangzhou 310058, China
- Co-innovation Center of Neuroregeneration, Nantong University, Nantong, 226001 Jiangsu, China
28
Sunsetting Binding MOAD with its last data update and the addition of 3D-ligand polypharmacology tools. Sci Rep 2023; 13:3008. [PMID: 36810894 PMCID: PMC9944886 DOI: 10.1038/s41598-023-29996-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Accepted: 02/14/2023] [Indexed: 02/24/2023] [Open Access]
Abstract
Binding MOAD is a database of protein-ligand complexes and their affinities with many structured relationships across the dataset. The project has been in development for over 20 years, but now, the time has come to bring it to a close. Currently, the database contains 41,409 structures with affinity coverage for 15,223 (37%) complexes. The website BindingMOAD.org provides numerous tools for polypharmacology exploration. Current relationships include links for structures with sequence similarity, 2D ligand similarity, and binding-site similarity. In this last update, we have added 3D ligand similarity using ROCS to identify ligands which may not necessarily be similar in two dimensions but can occupy the same three-dimensional space. For the 20,387 different ligands present in the database, a total of 1,320,511 3D-shape matches between the ligands were added. Examples of the utility of 3D-shape matching in polypharmacology are presented. Finally, plans for future access to the project data are outlined.
29
Bahia MS, Kaspi O, Touitou M, Binayev I, Dhail S, Spiegel J, Khazanov N, Yosipof A, Senderowitz H. A comparison between 2D and 3D descriptors in QSAR modeling based on bio-active conformations. Mol Inform 2023; 42:e2200186. [PMID: 36617991 DOI: 10.1002/minf.202200186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2023] [Revised: 01/03/2023] [Accepted: 01/04/2023] [Indexed: 01/10/2023]
Abstract
QSAR models are widely and successfully used in many research areas. The success of such models highly depends on molecular descriptors typically classified as 1D, 2D, 3D, or 4D. While 3D information is likely important, e.g., for modeling ligand-protein binding, previous comparisons between the performances of 2D and 3D descriptors were inconclusive. Yet in such comparisons the modeled ligands were not necessarily represented by their bioactive conformations. With this in mind, we mined the PDB for sets of protein-ligand complexes sharing the same protein for which uniform activity data were reported. The results, totaling 461 structures spread across six series, were compiled into a carefully curated, first-of-its-kind dataset in which each ligand is represented by its bioactive conformation. Next, each set was characterized by 2D, 3D and 2D + 3D descriptors and modeled using three machine learning algorithms, namely, k-Nearest Neighbors, Random Forest and Lasso Regression. Models' performances were evaluated on external test sets derived from the parent datasets either randomly or in a rational manner. We found that many more significant models were obtained when combining 2D and 3D descriptors. We attribute these improvements to the ability of 2D and 3D descriptors to code for different, yet complementary molecular properties.
Affiliation(s)
- Omer Kaspi
- Department of Chemistry, Bar-Ilan University, Ramat-Gan, 5290002, Israel
- Meir Touitou
- School of Cancer and Pharmaceutical Sciences, King's College London, London, 150 Stamford Street, SE1 9NH, United Kingdom
- Idan Binayev
- Department of Chemistry, Bar-Ilan University, Ramat-Gan, 5290002, Israel
- Seema Dhail
- Department of Chemistry, Bar-Ilan University, Ramat-Gan, 5290002, Israel
- Jacob Spiegel
- Department of Chemistry, Bar-Ilan University, Ramat-Gan, 5290002, Israel
- Netaly Khazanov
- Department of Chemistry, Bar-Ilan University, Ramat-Gan, 5290002, Israel
- Abraham Yosipof
- Department of Information Systems, College of Law & Business, Ramat-Gan, P.O. Box 852, Bnei Brak, 5110801, Israel
- Hanoch Senderowitz
- Department of Chemistry, Bar-Ilan University, Ramat-Gan, 5290002, Israel
30
Durairaj J, de Ridder D, van Dijk AD. Beyond sequence: Structure-based machine learning. Comput Struct Biotechnol J 2022; 21:630-643. [PMID: 36659927 PMCID: PMC9826903 DOI: 10.1016/j.csbj.2022.12.039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Revised: 12/21/2022] [Accepted: 12/21/2022] [Indexed: 12/31/2022] [Open Access]
Abstract
Recent breakthroughs in protein structure prediction demarcate the start of a new era in structural bioinformatics. Combined with various advances in experimental structure determination and the uninterrupted pace at which new structures are published, this promises an age in which protein structure information is as prevalent and ubiquitous as sequence. Machine learning in protein bioinformatics has been dominated by sequence-based methods, but this is now changing to make use of the deluge of rich structural information as input. Machine learning methods making use of structures are scattered across literature and cover a number of different applications and scopes; while some try to address questions and tasks within a single protein family, others aim to capture characteristics across all available proteins. In this review, we look at the variety of structure-based machine learning approaches, how structures can be used as input, and typical applications of these approaches in protein biology. We also discuss current challenges and opportunities in this all-important and increasingly popular field.
Affiliation(s)
- Janani Durairaj
- Biozentrum, University of Basel, Basel, Switzerland
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
- Dick de Ridder
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
- Aalt D.J. van Dijk
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
31
Bogado ML, Villafañe RN, Gómez Chavez JL, Angelina EL, Sosa GL, Peruchena NM. Targeting Protein Pockets with Halogen Bonds: The Role of the Halogen Environment. J Chem Inf Model 2022; 62:6494-6507. [PMID: 36044012 DOI: 10.1021/acs.jcim.2c00475] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/07/2023]
Abstract
Protein pockets that form a halogen bond (X-bond) with a halogenated ligand molecule simultaneously form other (mainly hydrophobic) interactions with the halogen atom that can be considered as its "X-bond environment" (XBenv). Most studies in the field have focused on the X-bond, with the properties of the XBenv usually overlooked. In this work, we derived a protocol that evaluates the XBenv strength as a measure of the propensity of a protein pocket to host an X-bond. Charge density-based topological descriptors, in combination with machine learning tools, were employed to predict the formation and strength of the interactions that make up the XBenv as a function of their geometrical parameters. On the basis of these results, we propose that the XBenv can be used as a footprint to judge the likelihood that a protein pocket will form an X-bond.
Affiliation(s)
- María Lucrecia Bogado
- Lab. Estructura Molecular y Propiedades, IQUIBA-NEA, Universidad Nacional del Nordeste, CONICET, FaCENA, Av. Libertad 5470, Corrientes 3400, Argentina
- Roxana Noelia Villafañe
- Lab. Estructura Molecular y Propiedades, IQUIBA-NEA, Universidad Nacional del Nordeste, CONICET, FaCENA, Av. Libertad 5470, Corrientes 3400, Argentina
- José Leonardo Gómez Chavez
- Lab. Estructura Molecular y Propiedades, IQUIBA-NEA, Universidad Nacional del Nordeste, CONICET, FaCENA, Av. Libertad 5470, Corrientes 3400, Argentina
- Emilio Luis Angelina
- Lab. Estructura Molecular y Propiedades, IQUIBA-NEA, Universidad Nacional del Nordeste, CONICET, FaCENA, Av. Libertad 5470, Corrientes 3400, Argentina
- Gladis Laura Sosa
- Lab. Estructura Molecular y Propiedades, IQUIBA-NEA, Universidad Nacional del Nordeste, CONICET, FaCENA, Av. Libertad 5470, Corrientes 3400, Argentina
- Nélida María Peruchena
- Lab. Estructura Molecular y Propiedades, IQUIBA-NEA, Universidad Nacional del Nordeste, CONICET, FaCENA, Av. Libertad 5470, Corrientes 3400, Argentina
32
Chan L, Kumar R, Verdonk M, Poelking C. A multilevel generative framework with hierarchical self-contrasting for bias control and transparency in structure-based ligand design. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00564-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
33
Trawally M, Demir-Yazıcı K, İpek Dingis-Birgül S, Kaya K, Akdemir A, Güzel-Akdemir Ö. Dithiocarbamates and dithiocarbonates containing 6-nitrosaccharin scaffold: Synthesis, antimycobacterial activity and in silico target prediction using ensemble docking-based reverse virtual screening. J Mol Struct 2022. [DOI: 10.1016/j.molstruc.2022.134818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
34
Blanchard AE, Gounley J, Bhowmik D, Chandra Shekar M, Lyngaas I, Gao S, Yin J, Tsaris A, Wang F, Glaser J. Language models for the prediction of SARS-CoV-2 inhibitors. Int J High Perform Comput Appl 2022; 36:587-602. [PMID: 38603308 PMCID: PMC9548488 DOI: 10.1177/10943420221121804] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 06/10/2023]
Abstract
The COVID-19 pandemic highlights the need for computational tools to automate and accelerate drug design for novel protein targets. We leverage deep learning language models to generate and score drug candidates based on predicted protein binding affinity. We pre-trained a deep learning language model (BERT) on ∼9.6 billion molecules and achieved peak performance of 603 petaflops in mixed precision. Our work reduces pre-training time from days to hours, compared to previous efforts with this architecture, while also increasing the dataset size by nearly an order of magnitude. For scoring, we fine-tuned the language model using an assembled set of thousands of protein targets with binding affinity data and searched for inhibitors of specific protein targets, SARS-CoV-2 Mpro and PLpro. We utilized a genetic algorithm approach for finding optimal candidates using the generation and scoring capabilities of the language model. Our generalizable models accelerate the identification of inhibitors for emerging therapeutic targets.
Affiliation(s)
- John Gounley
- Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Shang Gao
- Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Junqi Yin
- Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Feiyi Wang
- Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Jens Glaser
- Oak Ridge National Laboratory, Oak Ridge, TN, USA
|
35
|
Wang C, Chen Y, Zhang Y, Li K, Lin M, Pan F, Wu W, Zhang J. A reinforcement learning approach for protein-ligand binding pose prediction. BMC Bioinformatics 2022; 23:368. [PMID: 36076158 PMCID: PMC9454149 DOI: 10.1186/s12859-022-04912-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/04/2022] [Accepted: 08/25/2022] [Indexed: 11/10/2022] Open
Abstract
Protein-ligand docking is an indispensable tool for computational prediction of protein functions and screening of drug candidates. Despite significant progress over the past two decades, it remains a challenging problem because of the limited understanding of the energetics between proteins and ligands and the vast conformational space that has to be searched to find a satisfactory solution. In this project, we developed a novel reinforcement learning (RL) approach, the asynchronous advantage actor-critic model (A3C), to address the protein-ligand docking problem. The overall framework consists of two models. During the search process, the agent takes an action selected by the actor model based on the current location. The critic model then evaluates this action and predicts the distance between the current location and the true binding site. Experimental results showed that in both single- and multi-atom cases, our model improves binding site prediction substantially compared to a naïve model. For the single-atom ligand, copper ion (Cu²⁺), the model-predicted binding sites have a median root-mean-square deviation (RMSD) of 2.39 Å to the true binding sites when starting from random locations. For the multi-atom ligand, sulfate ion (SO₄²⁻), the predicted binding sites have a median RMSD of 3.82 Å to the true binding sites. The ligand-specific models built in this study can be used in solvent mapping studies, and the RL framework can be readily scaled up to larger and more diverse sets of ligands.
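The RMSD figures quoted in this abstract rest on a simple formula, sketched below. The function name `rmsd` and all coordinates are illustrative assumptions; the sketch also assumes both poses list the same atoms in the same order and that no superposition step is needed.

```python
import math
from statistics import median


def rmsd(predicted, reference):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for p, r in zip(predicted, reference):
        total += sum((pi - ri) ** 2 for pi, ri in zip(p, r))
    return math.sqrt(total / len(predicted))


# Single-atom ligand: the RMSD reduces to a plain Euclidean distance.
single_atom = rmsd([(1.0, 2.0, 2.0)], [(0.0, 0.0, 0.0)])

# Median over several hypothetical runs started from random locations.
run_rmsds = [1.2, 2.4, 3.9]
print(single_atom, median(run_rmsds))
```

Reporting the median rather than the mean keeps a few badly failed runs from dominating the summary.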
Affiliation(s)
- Chenran Wang
- Department of Statistics, Florida State University, Tallahassee, FL, 32306-4330, USA
- Yang Chen
- Department of Statistics, Florida State University, Tallahassee, FL, 32306-4330, USA
- Yuan Zhang
- Department of Statistics, Florida State University, Tallahassee, FL, 32306-4330, USA
- Keqiao Li
- Department of Statistics, Florida State University, Tallahassee, FL, 32306-4330, USA
- Menghan Lin
- Department of Statistics, Florida State University, Tallahassee, FL, 32306-4330, USA
- Feng Pan
- Department of Statistics, Florida State University, Tallahassee, FL, 32306-4330, USA
- Wei Wu
- Department of Statistics, Florida State University, Tallahassee, FL, 32306-4330, USA
- Jinfeng Zhang
- Department of Statistics, Florida State University, Tallahassee, FL, 32306-4330, USA
|
36
|
Korlepara DB, Vasavi CS, Jeurkar S, Pal PK, Roy S, Mehta S, Sharma S, Kumar V, Muvva C, Sridharan B, Garg A, Modee R, Bhati AP, Nayar D, Priyakumar UD. PLAS-5k: Dataset of Protein-Ligand Affinities from Molecular Dynamics for Machine Learning Applications. Sci Data 2022; 9:548. [PMID: 36071074 PMCID: PMC9451116 DOI: 10.1038/s41597-022-01631-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/22/2022] [Accepted: 08/15/2022] [Indexed: 11/08/2022] Open
Abstract
Computational methods, and more recently machine learning methods, have played a key role in structure-based drug design. Though several benchmarking datasets are available for machine learning applications in virtual screening, accurate prediction of binding affinity for a protein-ligand complex remains a major challenge. New datasets that allow the development of models that predict binding affinities better than state-of-the-art scoring functions are therefore important. For the first time, we have developed a dataset, PLAS-5k, comprising 5000 protein-ligand complexes chosen from the PDB database. The dataset consists of binding affinities along with energy components such as electrostatic, van der Waals, polar and non-polar solvation energies calculated from molecular dynamics simulations using the MMPBSA (Molecular Mechanics Poisson-Boltzmann Surface Area) method. The calculated binding affinities outperformed docking scores and showed a good correlation with the available experimental values. The availability of energy components may enable optimization of desired components during machine learning-based drug design. Further, the OnionNet model has been retrained on the PLAS-5k dataset and is provided as a baseline for the prediction of binding affinities.
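As a rough illustration of how the four energy components named in this abstract combine, the sketch below sums them per frame and averages over frames, following the usual MM-PBSA decomposition with the entropy term omitted. The class name, field names, and energy values are assumptions for illustration and do not reflect the actual PLAS-5k file layout.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class EnergyComponents:
    """Per-frame MM-PBSA terms in kcal/mol (hypothetical field names)."""
    electrostatic: float
    van_der_waals: float
    polar_solvation: float
    nonpolar_solvation: float

    def total(self) -> float:
        # Binding energy estimate: sum of the four components, entropy omitted.
        return (self.electrostatic + self.van_der_waals
                + self.polar_solvation + self.nonpolar_solvation)


def mean_binding_energy(frames):
    """Average the per-frame totals over a trajectory."""
    return mean(frame.total() for frame in frames)


# Made-up values for two trajectory frames.
frames = [
    EnergyComponents(-40.0, -35.0, 50.0, -5.0),
    EnergyComponents(-42.0, -33.0, 48.0, -5.0),
]
print(mean_binding_energy(frames))
```

Keeping the components separate, rather than storing only the total, is what allows a model to be trained or optimized against an individual term.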
Affiliation(s)
- Divya B Korlepara
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- C S Vasavi
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Shruti Jeurkar
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Pradeep Kumar Pal
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Subhajit Roy
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- UM-DAE-Centre For Excellence In Basic Sciences, University of Mumbai, Vidyanagari, Mumbai, India
- Sarvesh Mehta
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Shubham Sharma
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Vishal Kumar
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Charuvaka Muvva
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Bhuvanesh Sridharan
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Akshit Garg
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Rohit Modee
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Agastya P Bhati
- Centre for Computational Science, Department of Chemistry, University College London, London, WC1H 0AJ, United Kingdom
- Divya Nayar
- Department of Materials Science and Engineering, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, 110016, India
- U Deva Priyakumar
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
|
37
|
McGibbon M, Money-Kyrle S, Blay V, Houston DR. SCORCH: Improving structure-based virtual screening with machine learning classifiers, data augmentation, and uncertainty estimation. J Adv Res 2022; 46:135-147. [PMID: 35901959 PMCID: PMC10105235 DOI: 10.1016/j.jare.2022.07.001] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/05/2022] [Revised: 07/08/2022] [Accepted: 07/09/2022] [Indexed: 11/17/2022] Open
Abstract
INTRODUCTION: The discovery of a new drug is a costly and lengthy endeavour. The computational prediction of which small molecules can bind to a protein target can accelerate this process if the predictions are fast and accurate enough. Recent machine-learning scoring functions re-evaluate the output of molecular docking to achieve more accurate predictions. However, previous scoring functions were trained on crystallised protein-ligand complexes and datasets of decoys. The limited availability of crystal structures and biases in the decoy datasets can lower the performance of scoring functions.
OBJECTIVES: To address key limitations of previous scoring functions and thus improve the predictive performance of structure-based virtual screening.
METHODS: A novel machine-learning scoring function was created, named SCORCH (Scoring COnsensus for RMSD-based Classification of Hits). To develop SCORCH, training data is augmented by considering multiple ligand poses and labelling poses based on their RMSD from the native pose. Decoy bias is addressed by generating property-matched decoys for each ligand and using the same methodology for preparing and docking decoys and ligands. A consensus of 3 different machine learning approaches is also used to improve performance.
RESULTS: We find that multi-pose augmentation in SCORCH improves its docking power and screening power on independent benchmark datasets. SCORCH outperforms an equivalent scoring function trained on single poses, with a 1% enrichment factor (EF) of 13.78 vs. 10.86 on 18 DEKOIS 2.0 targets and a mean native pose rank of 5.9 vs. 30.4 on CSAR 2014. Additionally, SCORCH outperforms widely used scoring functions in virtual screening and pose prediction on independent benchmark datasets.
CONCLUSION: By rationally addressing key limitations of previous scoring functions, SCORCH improves the performance of virtual screening. SCORCH also provides an estimate of its uncertainty, which can help reduce the cost and time required for drug discovery.
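The 1% enrichment factor reported above compares the hit rate among the top-ranked compounds with the hit rate in the whole library. A minimal sketch follows, assuming that a higher score means a more likely binder and that the top fraction is rounded to the nearest whole compound; the function name `enrichment_factor` and the toy library are illustrative, not taken from SCORCH.

```python
def enrichment_factor(scores, is_active, fraction=0.01):
    """EF = (actives in top fraction / size of top fraction) / (actives / library size)."""
    n = len(scores)
    if n == 0 or n != len(is_active):
        raise ValueError("scores and labels must be non-empty and of equal length")
    total_actives = sum(is_active)
    if total_actives == 0:
        raise ValueError("at least one active compound is required")
    n_top = max(1, round(n * fraction))
    # Rank compounds from best to worst score (assumption: higher is better).
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits = sum(active for _, active in ranked[:n_top])
    return (hits / n_top) / (total_actives / n)


# Toy library: 200 compounds, the 10 best-scoring ones are the actives.
scores = [float(i) for i in range(200)]
labels = [i >= 190 for i in range(200)]
ef_1_percent = enrichment_factor(scores, labels, fraction=0.01)
print(ef_1_percent)
```

In this toy library the top 1% holds two compounds, both active, against a base rate of 5%, so the enrichment is twentyfold; a random ranking would give a value near 1.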
Affiliation(s)
- Miles McGibbon
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, UK
- Sam Money-Kyrle
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, UK
- Vincent Blay
- Department of Microbiology and Environmental Toxicology, University of California at Santa Cruz, Santa Cruz, CA 95064, USA; Institute for Integrative Systems Biology (I²SysBio), Universitat de València and Spanish Research Council (CSIC), 46980 Valencia, Spain
- Douglas R Houston
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, UK
|
38
|
Meli R, Morris GM, Biggin PC. Scoring Functions for Protein-Ligand Binding Affinity Prediction using Structure-Based Deep Learning: A Review. Frontiers in Bioinformatics 2022; 2:885983. [PMID: 36187180 PMCID: PMC7613667 DOI: 10.3389/fbinf.2022.885983] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 02/28/2022] [Accepted: 05/11/2022] [Indexed: 01/01/2023] Open
Abstract
The rapid and accurate in silico prediction of protein-ligand binding free energies or binding affinities has the potential to transform drug discovery. In recent years, there has been a rapid growth of interest in deep learning methods for the prediction of protein-ligand binding affinities based on the structural information of protein-ligand complexes. These structure-based scoring functions often obtain better results than classical scoring functions when applied within their applicability domain. Here we review structure-based scoring functions for binding affinity prediction based on deep learning, focussing on different types of architectures, featurization strategies, data sets, methods for training and evaluation, and the role of explainable artificial intelligence in building useful models for real drug-discovery applications.
Affiliation(s)
- Rocco Meli
- Department of Biochemistry, University of Oxford, Oxford, United Kingdom
- Garrett M. Morris
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Philip C. Biggin
- Department of Biochemistry, University of Oxford, Oxford, United Kingdom
|
39
|
Morningstar-Kywi N, Wang K, Asbell TR, Wang Z, Giles JB, Lai J, Brill D, Sutch BT, Haworth IS. Prediction of Water Distributions and Displacement at Protein-Ligand Interfaces. J Chem Inf Model 2022; 62:1489-1497. [PMID: 35261241 DOI: 10.1021/acs.jcim.1c01266] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/03/2023]
Abstract
The retention and displacement of water molecules during formation of ligand-protein interfaces play a major role in determining ligand binding. Understanding these effects requires a method for positioning of water molecules in the bound and unbound proteins and for defining water displacement upon ligand binding. We describe an algorithm for water placement and a calculation of ligand-driven water displacement in >9000 protein-ligand complexes. The algorithm predicts approximately 38% of experimental water positions within 1.0 Å and about 83% within 1.5 Å. We further show that the predicted water molecules can complete water networks not detected in crystallographic structures of the protein-ligand complexes. The algorithm was also applied to solvation of the corresponding unbound proteins, and this allowed calculation of water displacement upon ligand binding based on differences in the water network between the bound and unbound structures. We illustrate use of this approach through comparison of water displacement by structurally related ligands at the same binding site. This method for evaluation of water displacement upon ligand binding may be of value for prediction of the effects of ligand modification in drug design.
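The recovery percentages in this abstract (within 1.0 Å and within 1.5 Å) can be computed with a nearest-neighbour check like the one sketched here. The function name `fraction_recovered` and the coordinates are assumptions; the sketch also assumes that a predicted water may match more than one experimental water, which may differ from the matching rule used in the paper.

```python
import math


def fraction_recovered(experimental, predicted, cutoff):
    """Fraction of experimental water positions with a predicted water within cutoff (Å)."""
    if not experimental:
        raise ValueError("at least one experimental position is required")
    recovered = 0
    for exp_xyz in experimental:
        if any(math.dist(exp_xyz, pred_xyz) <= cutoff for pred_xyz in predicted):
            recovered += 1
    return recovered / len(experimental)


# Made-up oxygen coordinates in Å.
experimental = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 5.0, 0.0), (9.0, 9.0, 9.0)]
predicted = [(0.5, 0.0, 0.0), (5.0, 1.2, 0.0), (0.0, 5.0, 3.0)]

print(fraction_recovered(experimental, predicted, cutoff=1.0))
print(fraction_recovered(experimental, predicted, cutoff=1.5))
```

Loosening the cutoff can only keep or raise the recovered fraction, which is consistent with the abstract reporting a higher percentage at 1.5 Å than at 1.0 Å.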
Affiliation(s)
- Noam Morningstar-Kywi
- Department of Pharmacology and Pharmaceutical Sciences, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, California 90089, United States
- Kaichen Wang
- Department of Pharmacology and Pharmaceutical Sciences, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, California 90089, United States
- Thomas R Asbell
- Department of Pharmacology and Pharmaceutical Sciences, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, California 90089, United States
- Zhaohui Wang
- Department of Pharmacology and Pharmaceutical Sciences, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, California 90089, United States
- Jason B Giles
- Department of Pharmacology and Pharmaceutical Sciences, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, California 90089, United States
- Jiawei Lai
- Department of Pharmacology and Pharmaceutical Sciences, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, California 90089, United States
- Dab Brill
- Department of Pharmacology and Pharmaceutical Sciences, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, California 90089, United States
- Brian T Sutch
- Department of Pharmacology and Pharmaceutical Sciences, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, California 90089, United States
- Ian S Haworth
- Department of Pharmacology and Pharmaceutical Sciences, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, California 90089, United States
|
40
|
Nikolaienko T, Gurbych O, Druchok M. Complex machine learning model needs complex testing: Examining predictability of molecular binding affinity by a graph neural network. J Comput Chem 2022; 43:728-739. [PMID: 35201629 DOI: 10.1002/jcc.26831] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/22/2021] [Revised: 01/04/2022] [Accepted: 02/09/2022] [Indexed: 12/12/2022]
Abstract
Drug discovery pipelines typically involve high-throughput screening of large numbers of compounds in search of potential drug candidates. As the chemical space of small organic molecules is huge, navigating it calls for fast and lightweight computational methods, which favors machine-learning approaches for processing huge pools of candidates. In this contribution, we present a graph-based deep neural network for prediction of protein-drug binding affinity and assess its predictive power under thorough testing conditions. Within the suggested approach, both protein and drug molecules are represented as graphs and passed to separate graph sub-networks, whose outputs are then concatenated and regressed towards a binding affinity. The neural network is trained on two binding affinity datasets: PDBbind and data imported from the RCSB Protein Data Bank. To explore the generalization capabilities of the model, we go beyond traditional random or leave-cluster-out techniques and demonstrate the need for more elaborate model performance assessment: six different strategies for test/train data partitioning (random, time- and property-arranged, protein- and ligand-clustered) with k-fold cross-validation are employed. Finally, we discuss the model performance in terms of a set of metrics for different split strategies and fold arrangements. Our code is available at https://github.com/SoftServeInc/affinity-by-GNN.
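Two of the partitioning strategies mentioned in this abstract, cluster-wise k-fold and a time-arranged split, might look roughly like the sketch below. The function names `group_kfold` and `time_split`, the greedy fold balancing, and the toy labels are assumptions for illustration; the linked repository may implement the splits differently.

```python
from collections import defaultdict


def group_kfold(groups, k):
    """Yield (train, test) index lists such that no group spans both sides."""
    members = defaultdict(list)
    for idx, group in enumerate(groups):
        members[group].append(idx)
    if k < 2 or k > len(members):
        raise ValueError("k must be between 2 and the number of distinct groups")
    folds = [[] for _ in range(k)]
    # Greedy balancing: place the largest groups first into the smallest fold.
    for group in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        smallest = min(range(k), key=lambda i: len(folds[i]))
        folds[smallest].extend(members[group])
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test


def time_split(years, cutoff_year):
    """Train on complexes released up to cutoff_year, test on later ones."""
    train = [i for i, year in enumerate(years) if year <= cutoff_year]
    test = [i for i, year in enumerate(years) if year > cutoff_year]
    return train, test


# Hypothetical protein-cluster labels for six complexes.
clusters = ["kinase", "kinase", "protease", "protease", "protease", "lectin"]
for train_idx, test_idx in group_kfold(clusters, k=2):
    print(train_idx, test_idx)
print(time_split([2015, 2018, 2020, 2021], cutoff_year=2018))
```

Keeping whole clusters on one side of the split prevents near-duplicate proteins or ligands from leaking between training and test data, which is the concern that motivates going beyond random splits.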
Affiliation(s)
- Tymofii Nikolaienko
- SoftServe, Inc., Lviv, Ukraine; Faculty of Physics, Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
- Oleksandr Gurbych
- Blackthorn AI Ltd., London, UK; Department of Artificial Intelligence Systems, Lviv Polytechnic National University, Lviv, Ukraine
- Maksym Druchok
- SoftServe, Inc., Lviv, Ukraine; Institute for Condensed Matter Physics, NAS of Ukraine, Lviv, Ukraine
|
41
|
Khashan R, Tropsha A, Zheng W. Data Mining Meets Machine Learning: A Novel ANN-based Multi-Body Interaction Docking Scoring Function (MBI-Score) based on Utilizing Frequent Geometric and Chemical Patterns of Interfacial Atoms in Native Protein-Ligand Complexes. Mol Inform 2022; 41:e2100248. [PMID: 35142086 DOI: 10.1002/minf.202100248] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/22/2021] [Accepted: 02/09/2022] [Indexed: 11/11/2022]
Abstract
Accurate prediction of binding poses is crucial to structure-based drug design. We employ two powerful artificial intelligence (AI) approaches, data mining and machine learning, to design an artificial neural network (ANN)-based pose-scoring function. It is a simple machine-learning-based statistical function that employs frequent geometric and chemical patterns of interacting atoms at protein-ligand interfaces. The patterns are derived by mining interfaces of "native" protein-ligand complexes. Each interface is represented by a graph where nodes are atoms and edges connect protein-ligand interfacial atoms located within a certain cutoff distance of each other. Applying frequent subgraph mining to these interfaces provides "native" frequent patterns of interacting atoms. Subsequently, given a pose for a protein-ligand complex of interest, the pose-scoring function (the information-processing unit or neuron) calculates the degree of matching between the interaction patterns present at the pose's interface and the native frequent patterns. The pose-scoring function takes into account the frequency of occurrence of the matching native patterns, the size of the match, and the degree of geometrical similarity between pose-specific and matching native frequent patterns. This novel "multi-body interaction" pose-scoring function (MBI-Score) was validated using two databases, PDBbind and Astex-85, and it outperformed seven commonly used commercial scoring functions. MBI-Score is available at www.khashanlab.org/mbi-score.
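The interface graph described in this abstract (atoms as nodes, edges between protein and ligand atoms within a cutoff) can be sketched as follows. The function name `interface_edges`, the tuple layout, and the 4.0 Å default are assumptions, since the abstract does not state the cutoff; the frequent subgraph mining step is not shown.

```python
import math


def interface_edges(protein_atoms, ligand_atoms, cutoff=4.0):
    """Return (protein_idx, ligand_idx, protein_type, ligand_type, distance) edges.

    Each atom is given as (atom_type, (x, y, z)); an edge links a protein atom
    and a ligand atom whose distance does not exceed the cutoff (in Å).
    """
    edges = []
    for i, (p_type, p_xyz) in enumerate(protein_atoms):
        for j, (l_type, l_xyz) in enumerate(ligand_atoms):
            distance = math.dist(p_xyz, l_xyz)
            if distance <= cutoff:
                edges.append((i, j, p_type, l_type, distance))
    return edges


# Made-up coordinates in Å.
protein = [("N", (0.0, 0.0, 0.0)), ("O", (10.0, 0.0, 0.0))]
ligand = [("O", (3.0, 0.0, 0.0)), ("C", (0.0, 3.5, 0.0))]

edges = interface_edges(protein, ligand)
for edge in edges:
    print(edge)
```

Storing the atom types and the distance on each edge keeps both the chemical and the geometric information that a pattern-matching score would need.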
Affiliation(s)
- Raed Khashan
- University of the Sciences in Philadelphia, United States
- Weifan Zheng
- North Carolina Central University, United States
|
42
|
Dhakal A, McKay C, Tanner JJ, Cheng J. Artificial intelligence in the prediction of protein-ligand interactions: recent advances and future directions. Brief Bioinform 2022; 23:bbab476. [PMID: 34849575 PMCID: PMC8690157 DOI: 10.1093/bib/bbab476] [Citation(s) in RCA: 91] [Impact Index Per Article: 30.3] [Received: 08/05/2021] [Revised: 09/28/2021] [Accepted: 10/15/2021] [Indexed: 12/13/2022] Open
Abstract
New drug production, from target identification to marketing approval, takes over 12 years and can cost around $2.6 billion. Furthermore, the COVID-19 pandemic has unveiled the urgent need for more powerful computational methods for drug discovery. Here, we review the computational approaches to predicting protein-ligand interactions in the context of drug discovery, focusing on methods using artificial intelligence (AI). We begin with a brief introduction to proteins (targets), ligands (e.g. drugs) and their interactions for nonexperts. Next, we review databases that are commonly used in the domain of protein-ligand interactions. Finally, we survey and analyze the machine learning (ML) approaches implemented to predict protein-ligand binding sites, ligand-binding affinity and binding pose (conformation) including both classical ML algorithms and recent deep learning methods. After exploring the correlation between these three aspects of protein-ligand interaction, it has been proposed that they should be studied in unison. We anticipate that our review will aid exploration and development of more accurate ML-based prediction strategies for studying protein-ligand interactions.
Affiliation(s)
- Ashwin Dhakal
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
- Cole McKay
- Department of Biochemistry, University of Missouri, Columbia, MO, 65211, USA
- John J Tanner
- Department of Biochemistry, University of Missouri, Columbia, MO, 65211, USA
- Department of Chemistry, University of Missouri, Columbia, MO, 65211, USA
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
|
43
|
Varela-Rial A, Maryanow I, Majewski M, Doerr S, Schapin N, Jiménez-Luna J, De Fabritiis G. PlayMolecule Glimpse: Understanding Protein-Ligand Property Predictions with Interpretable Neural Networks. J Chem Inf Model 2022; 62:225-231. [PMID: 34978201 PMCID: PMC8790755 DOI: 10.1021/acs.jcim.1c00691] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 01/20/2023]
Abstract
Deep learning has been successfully applied to structure-based protein–ligand affinity prediction, yet the black box nature of these models raises some questions. In a previous study, we presented KDEEP, a convolutional neural network that predicted the binding affinity of a given protein–ligand complex while reaching state-of-the-art performance. However, it was unclear what this model was learning. In this work, we present a new application to visualize the contribution of each input atom to the prediction made by the convolutional neural network, aiding in the interpretability of such predictions. The results suggest that KDEEP is able to learn meaningful chemistry signals from the data, but it has also exposed the inaccuracies of the current model, serving as a guideline for further optimization of our prediction tools.
Affiliation(s)
- Alejandro Varela-Rial
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003 Barcelona, Spain; Acellera Labs, Doctor Trueta 183, 08005 Barcelona, Spain
- Iain Maryanow
- Acellera Labs, Doctor Trueta 183, 08005 Barcelona, Spain
- Maciej Majewski
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003 Barcelona, Spain
- Stefan Doerr
- Acellera Labs, Doctor Trueta 183, 08005 Barcelona, Spain
- Nikolai Schapin
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003 Barcelona, Spain; Acellera Labs, Doctor Trueta 183, 08005 Barcelona, Spain
- José Jiménez-Luna
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003 Barcelona, Spain
- Gianni De Fabritiis
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003 Barcelona, Spain; Acellera Labs, Doctor Trueta 183, 08005 Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, 08010 Barcelona, Spain
|
44
|
Rezaei MA, Li Y, Wu D, Li X, Li C. Deep Learning in Drug Design: Protein-Ligand Binding Affinity Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:407-417. [PMID: 33360998 PMCID: PMC8942327 DOI: 10.1109/tcbb.2020.3046945] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 06/12/2023]
Abstract
Computational drug design relies on the calculation of binding strength between two biological counterparts, especially a chemical compound, i.e., a ligand, and a protein. Predicting the affinity of protein-ligand binding with reasonable accuracy is crucial for drug discovery, and enables the optimization of compounds to achieve better interaction with their target protein. In this paper, we propose a data-driven framework named DeepAtom to accurately predict the protein-ligand binding affinity. With a 3D Convolutional Neural Network (3D-CNN) architecture, DeepAtom can automatically extract binding-related atomic interaction patterns from the voxelized complex structure. Compared with other CNN-based approaches, our light-weight model design effectively improves the model representational capacity, even with limited available training data. We carried out validation experiments on the PDBbind v.2016 benchmark and the independent Astex Diverse Set. We demonstrate that the DeepAtom approach, which depends less on feature engineering, consistently outperforms the other baseline scoring methods. We also compile and propose a new benchmark dataset to further improve the model performance. With the new dataset as training input, DeepAtom achieves Pearson's R=0.83 and RMSE=1.23 pK units on the PDBbind v.2016 core set. The promising results demonstrate that DeepAtom models can potentially be adopted in computational drug development protocols such as molecular docking and virtual screening.
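The two evaluation metrics quoted in this abstract, Pearson's R and RMSE, can be computed as in the sketch below. The function names and the pK values are illustrative assumptions and are unrelated to the DeepAtom results.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted, observed):
    """Root-mean-square error in the units of the inputs (here pK units)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(predicted))


# Made-up predicted and experimental pK values.
predicted_pk = [5.0, 6.0, 7.0, 8.0]
experimental_pk = [5.5, 6.5, 7.5, 8.5]
print(pearson_r(predicted_pk, experimental_pk), rmse(predicted_pk, experimental_pk))
```

The example shows why both numbers are reported together: a constant offset of 0.5 pK units leaves the correlation perfect while the RMSE still registers the error.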
Affiliation(s)
- Mohammad A. Rezaei
- Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development (CNPD3), University of Florida
- Yanjun Li
- Large-scale Intelligent Systems Laboratory, NSF Center for Big Learning, University of Florida Gainesville, FL, USA
- Dapeng Wu
- Large-scale Intelligent Systems Laboratory, NSF Center for Big Learning, University of Florida Gainesville, FL, USA
- Xiaolin Li
- Cognization Lab, Palo Alto, California, USA
- Chenglong Li
- Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development (CNPD3), University of Florida
- Large-scale Intelligent Systems Laboratory, NSF Center for Big Learning, University of Florida Gainesville, FL, USA
|
45
|
Can docking scoring functions guarantee success in virtual screening? Virtual Screening and Drug Docking 2022. [DOI: 10.1016/bs.armc.2022.08.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
46
|
Gusmão AS, Abreu LS, Tavares JF, de Freitas HF, Silva da Rocha Pita S, Dos Santos EG, Caldas IS, Vieira AA, Silva EO. Computer-Guided Trypanocidal Activity of Natural Lactones Produced by Endophytic Fungus of Euphorbia umbellata. Chem Biodivers 2021; 18:e2100493. [PMID: 34403573 DOI: 10.1002/cbdv.202100493] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/21/2021] [Accepted: 08/17/2021] [Indexed: 11/11/2022]
Abstract
Hundreds of millions of people worldwide are affected by Chagas' disease caused by Trypanosoma cruzi. Since the current treatments lack efficacy and specificity and cause several side effects, novel therapeutics are needed. Natural products from endophytic fungi have been useful sources of lead compounds. In this study, three lactones isolated from an endophytic strain culture were evaluated in silico to guide their bioassay screening. All lactones displayed in vitro activity against T. cruzi epimastigote and trypomastigote forms. Notably, the IC50 values of (+)-phomolactone were lower than those of benznidazole (0.86 vs. 30.78 μM against epimastigotes and 0.41 vs. 4.88 μM against trypomastigotes). Target-based studies suggested that the lactones exert their trypanocidal activity through inhibition of T. cruzi glyceraldehyde-3-phosphate dehydrogenase (TcGAPDH), and the binding free energies of the three TcGAPDH-lactone complexes suggested that (+)-phomolactone has a lower score value (-3.38), in agreement with the IC50 assays. These results highlight the potential of these lactones for further anti-T. cruzi drug development.
Collapse
Affiliation(s)
- Amanda Santos Gusmão
- Organic Chemistry Department, Chemistry Institute, Federal University of Bahia, Barão de Jeremoabo 147, Salvador, 40170115, Bahia, Brazil
| | - Lucas Silva Abreu
- Institute for Research in Pharmaceuticals and Medications, Federal University of Paraíba, Campus I, João Pessoa, 58051900, Paraíba, Brazil
| | - Josean Fechine Tavares
- Institute for Research in Pharmaceuticals and Medications, Federal University of Paraíba, Campus I, João Pessoa, 58051900, Paraíba, Brazil
| | - Humberto Fonseca de Freitas
- Laboratory of Bioinformatics and Molecular Modeling (LaBiMM), Pharmacy College, Federal University of Bahia, Barão de Jeremoabo 147, Salvador, 40170115, Bahia, Brazil
| | - Samuel Silva da Rocha Pita
- Laboratory of Bioinformatics and Molecular Modeling (LaBiMM), Pharmacy College, Federal University of Bahia, Barão de Jeremoabo 147, Salvador, 40170115, Bahia, Brazil
| | - Elda Gonçalves Dos Santos
- Pathology and Parasitology Department, Institute of Biomedical Sciences, Federal University of Alfenas, Gabriel Monteiro da Silva 500, Alfenas, 37130001, Minas Gerais, Brazil
| | - Ivo Santana Caldas
- Pathology and Parasitology Department, Institute of Biomedical Sciences, Federal University of Alfenas, Gabriel Monteiro da Silva 500, Alfenas, 37130001, Minas Gerais, Brazil
| | - André Alexandre Vieira
- Organic Chemistry Department, Chemistry Institute, Federal University of Bahia, Barão de Jeremoabo 147, Salvador, 40170115, Bahia, Brazil
| | - Eliane Oliveira Silva
- Organic Chemistry Department, Chemistry Institute, Federal University of Bahia, Barão de Jeremoabo 147, Salvador, 40170115, Bahia, Brazil
| |
|
47
|
Ahmed A, Mam B, Sowdhamini R. DEELIG: A Deep Learning Approach to Predict Protein-Ligand Binding Affinity. Bioinform Biol Insights 2021; 15:11779322211030364. [PMID: 34290496 PMCID: PMC8274096 DOI: 10.1177/11779322211030364] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 12/08/2020] [Accepted: 06/05/2021] [Indexed: 12/03/2022] Open
Abstract
Protein-ligand binding prediction has extensive biological significance. Binding affinity helps in understanding the degree of protein-ligand interactions and is a useful measure in drug design. Predicting the binding affinity of a ligand to its cognate receptor requires protein-ligand docking through virtual screening and molecular dynamics simulations. Performing such analyses to cover the entire chemical space of small molecules requires intense computational power. Recent developments in deep learning have enabled us to make sense of massive amounts of complex data, where the strength of the approach is the ability of the model to “learn” intrinsic patterns in a complex plane of data. Here, we have incorporated convolutional neural networks to find spatial relationships among data and to predict the binding affinity of proteins across whole superfamilies toward a diverse set of ligands without requiring a docked pose or complex as user input. The models were trained and validated using a stringent methodology for feature extraction. Our model outperforms some widely used existing methods and is suitable for predictions with a high-resolution protein crystal structure (⩽2.5 Å) and a nonpeptide ligand as individual inputs. Our approach to network construction and training on a protein-ligand data set prepared in-house has yielded significant insights. We have also tested DEELIG on a few COVID-19 main protease-inhibitor complexes relevant to the current public health scenario. DEELIG-based predictions can be incorporated in existing databases including RCSB PDB, PDBMoad, and PDBbind to fill missing binding affinity data for protein-ligand complexes.
Affiliation(s)
- Asad Ahmed
- National Institute of Technology Warangal, Warangal, India
| | - Bhavika Mam
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, India
- The University of Trans-Disciplinary Health Sciences and Technology (TDU), Bangalore, India
| | - Ramanathan Sowdhamini
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, India
- Ramanathan Sowdhamini, National Centre for Biological Sciences, Tata Institute of Fundamental Research, GKVK Campus, Bangalore 560065, Karnataka, India.
| |
|
48
|
Ma F, Zhang S, Song L, Wang B, Wei L, Zhang F. Applications and analytical tools of cell communication based on ligand-receptor interactions at single cell level. Cell Biosci 2021; 11:121. [PMID: 34217372 PMCID: PMC8254218 DOI: 10.1186/s13578-021-00635-z] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 11/30/2020] [Accepted: 06/22/2021] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND Cellular communication is an essential feature of multicellular organisms. Binding of ligands to their homologous receptors, which activates specific cell signaling pathways, is a basic type of cellular communication and is intimately linked to many degenerative processes leading to disease. MAIN BODY This study reviews the history of ligand-receptor research and presents the databases that store ligand-receptor pairs. Recent applications and research tools for studying ligand-receptor interactions in cell communication at the single-cell level using single-cell RNA sequencing are summarized. CONCLUSION The summary of the advantages and disadvantages of the analysis tools will greatly help researchers analyze cell communication at the single-cell level. Studying cell communication based on ligand-receptor interactions by single-cell RNA sequencing paves the way for developing new targeted drugs and personalizing treatment.
Affiliation(s)
- Fen Ma
- Department of Microbiology, Harbin Medical University, Harbin, 150081 China
- Wu Lien-Teh Institute, Harbin Medical University, Harbin, 150081 China
| | - Siwei Zhang
- Department of Microbiology, Harbin Medical University, Harbin, 150081 China
- Wu Lien-Teh Institute, Harbin Medical University, Harbin, 150081 China
| | - Lianhao Song
- Department of Microbiology, Harbin Medical University, Harbin, 150081 China
- Wu Lien-Teh Institute, Harbin Medical University, Harbin, 150081 China
| | - Bozhi Wang
- Department of Microbiology, Harbin Medical University, Harbin, 150081 China
- Wu Lien-Teh Institute, Harbin Medical University, Harbin, 150081 China
| | - Lanlan Wei
- Department of Microbiology, Harbin Medical University, Harbin, 150081 China
- Wu Lien-Teh Institute, Harbin Medical University, Harbin, 150081 China
- Shenzhen Third People's Hospital, Second Hospital, Affiliated to Southern University of Science and Technology, Shenzhen, 518112 China
| | - Fengmin Zhang
- Department of Microbiology, Harbin Medical University, Harbin, 150081 China
- Wu Lien-Teh Institute, Harbin Medical University, Harbin, 150081 China
| |
|
49
|
Green H, Durrant JD. DeepFrag: An Open-Source Browser App for Deep-Learning Lead Optimization. J Chem Inf Model 2021; 61:2523-2529. [PMID: 34029094 PMCID: PMC8243318 DOI: 10.1021/acs.jcim.1c00103] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 01/30/2021] [Indexed: 11/28/2022]
Abstract
Lead optimization, a critical step in early stage drug discovery, involves making chemical modifications to a small-molecule ligand to improve properties such as binding affinity. We recently developed DeepFrag, a deep-learning model capable of recommending such modifications. Though a powerful hypothesis-generating tool, DeepFrag is currently implemented in Python and so requires a certain degree of computational expertise. To encourage broader adoption, we have created the DeepFrag browser app, which provides a user-friendly graphical user interface that runs the DeepFrag model in users' web browsers. The browser app does not require users to upload their molecular structures to a third-party server, nor does it require the separate installation of any third-party software. We are hopeful that the app will be a useful tool for both researchers and students. It can be accessed free of charge, without registration, at http://durrantlab.com/deepfrag. The source code is also available at http://git.durrantlab.com/jdurrant/deepfrag-app, released under the terms of the open-source Apache License, Version 2.0.
Affiliation(s)
- Harrison Green
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| | - Jacob D. Durrant
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| |
|
50
|
Rasool N, Yasmin F, Sahai S, Hussain W, Inam H, Arshad A. Biological perspective of thiazolide derivatives against Mpro and MTase of SARS-CoV-2: Molecular docking, DFT and MD simulation investigations. Chem Phys Lett 2021; 771:138463. [PMID: 33716307 PMCID: PMC7936854 DOI: 10.1016/j.cplett.2021.138463] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/12/2021] [Revised: 02/15/2021] [Accepted: 02/16/2021] [Indexed: 12/16/2022]
Abstract
Humans around the globe have been severely affected by SARS-CoV-2, and no treatment has yet been authorized for the severe condition caused by COVID-19. Here, an in silico study was performed to elucidate the inhibitory potential of selected thiazolide derivatives against the SARS-CoV-2 protease (Mpro) and methyltransferase (MTase). Based on the analysis, four compounds were found to show efficacious and remarkable results against the proteins of interest. The results of this study not only suggest these compounds as potential inhibitors but also pave the way for their in vivo and in vitro validation.
Affiliation(s)
- Nouman Rasool
- Center for Professional & Applied Studies, Lahore, Pakistan (corresponding author)
| | - Farkhanda Yasmin
- Department of Biotechnology, Khawaja Fareed University of Science and Technology, Rahim Yar Khan, Pakistan
| | | | - Waqar Hussain
- Center for Professional & Applied Studies, Lahore, Pakistan
- National Center of Artificial Intelligence, Punjab University College of Information Technology, University of the Punjab, Lahore, Pakistan
| | - Hadiqa Inam
- Department of Life Sciences, University of Management and Technology, Lahore, Pakistan
| | - Arooj Arshad
- Department of Life Sciences, University of Management and Technology, Lahore, Pakistan
| |
|