51
Thomas M, Bender A, de Graaf C. Integrating structure-based approaches in generative molecular design. Curr Opin Struct Biol 2023; 79:102559. [PMID: 36870277 DOI: 10.1016/j.sbi.2023.102559] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 11/11/2022] [Revised: 01/23/2023] [Accepted: 01/31/2023] [Indexed: 03/06/2023]
Abstract
Generative molecular design for drug discovery and development has seen a recent resurgence, promising to improve the efficiency of the design-make-test-analyse cycle by computationally exploring much larger chemical spaces than traditional virtual screening techniques. However, most generative models thus far have only utilized small-molecule information to train and condition de novo molecule generators. Here, we instead focus on recent approaches that incorporate protein structure into de novo molecule optimization in an attempt to maximize the predicted on-target binding affinity of generated molecules. We categorize these structure integration principles as either distribution learning or goal-directed optimization and, in each case, by whether the approach is protein structure-explicit or structure-implicit with respect to the generative model. We discuss recent approaches in the context of this categorization and provide our perspective on the future direction of the field.
Affiliation(s)
- Morgan Thomas
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK.
- Andreas Bender
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK. https://twitter.com/@AndreasBenderUK
- Chris de Graaf
  - Sosei Heptares, Steinmetz Building, Granta Park, Great Abington, Cambridge, CB21 6DG, UK. https://twitter.com/@Chris_de_Graaf
52
Zhan H, Zhu X, Qiao Z, Hu J. Graph Neural Tree: A novel and interpretable deep learning-based framework for accurate molecular property predictions. Anal Chim Acta 2023; 1244:340558. [PMID: 36737143 DOI: 10.1016/j.aca.2022.340558] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/10/2022] [Accepted: 10/24/2022] [Indexed: 11/06/2022]
Abstract
Determining various properties of molecules is a critical step in drug discovery. Recently, with the improvement of large heterogeneous datasets and the development of deep learning approaches, more and more scientists have turned their attention to neural network-based virtual preliminary screening to reduce the time and monetary cost of drug discovery. However, the poor interpretability of deep learning masks causality, so models' conclusions are often beyond the comprehension of human users, which reduces the credibility of the model and makes it difficult for chemists to further narrow the huge chemical space based on models' results. Thus, this study develops a novel framework consisting of Graph Neural Networks for feature extraction, Curriculum-Based Learning Strategies for optimization, and a Learning Binary Neural Tree (LBNT) for prediction, to improve the performance of neural networks and reveal their decision-making process to chemists. The framework encodes molecular graph data with graph neural networks (GNNs), then retrains the encoder with curriculum-based learning strategies to reduce uncertainty and improve accuracy, and finally uses LBNT as the predictor for prediction and visualization; the LBNT is first trained independently and then retrained jointly with the encoder. The framework is validated on public datasets and compared with single GNNs trained with normal strategies, as well as with GNN encoders paired with common machine learning predictors instead of the LBNT predictor. The results show that the proposed framework enhances the point prediction accuracy of the completely trained GNN and reduces its uncertainty through curriculum-based learning, and that combining it with LBNT further improves accuracy. In addition, compared with common machine learning tools, the LBNT predictor generally has the best performance because of joint retraining with the GNN encoder. The decision-making process of LBNT is also easier to explain than that of other models.
Affiliation(s)
- Haolin Zhan
  - Guangzhou Key Laboratory for New Energy and Green Catalysis, School of Chemistry and Chemical Engineering, Guangzhou University, Guangzhou, China
  - College of Economics and Statistics, Guangzhou University, Guangzhou, China
- Xin Zhu
  - Guangzhou Key Laboratory for New Energy and Green Catalysis, School of Chemistry and Chemical Engineering, Guangzhou University, Guangzhou, China.
- Zhiwei Qiao
  - Guangzhou Key Laboratory for New Energy and Green Catalysis, School of Chemistry and Chemical Engineering, Guangzhou University, Guangzhou, China
  - Joint Institute of Guangzhou University & Institute of Corrosion Science and Technology, Guangzhou University, Guangzhou, 510006, China.
- Jianming Hu
  - College of Economics and Statistics, Guangzhou University, Guangzhou, China.
53
Dorahy G, Chen JZ, Balle T. Computer-Aided Drug Design towards New Psychotropic and Neurological Drugs. Molecules 2023; 28:1324. [PMID: 36770990 PMCID: PMC9921936 DOI: 10.3390/molecules28031324] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 12/15/2022] [Revised: 01/23/2023] [Accepted: 01/26/2023] [Indexed: 01/31/2023] Open
Abstract
Central nervous system (CNS) disorders are a therapeutic area in drug discovery where demand for new treatments greatly exceeds approved treatment options. This is complicated by the high failure rate in late-stage clinical trials, resulting in exorbitant costs associated with bringing new CNS drugs to market. Computer-aided drug design (CADD) techniques minimise the time and cost burdens associated with drug research and development by ensuring an advantageous starting point for pre-clinical and clinical assessments. The key elements of CADD are divided into ligand-based and structure-based methods. Ligand-based methods encompass techniques including pharmacophore modelling and quantitative structure activity relationships (QSARs), which use the relationship between biological activity and chemical structure to ascertain suitable lead molecules. In contrast, structure-based methods use information about the binding site architecture from an established protein structure to select suitable molecules for further investigation. In recent years, deep learning techniques have been applied in drug design and present an exciting addition to CADD workflows. Despite the difficulties associated with CNS drug discovery, advances towards new pharmaceutical treatments continue to be made, and CADD has supported these findings. This review explores various CADD techniques and discusses applications in CNS drug discovery from 2018 to November 2022.
Affiliation(s)
- Georgia Dorahy
  - Sydney Pharmacy School, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW 2006, Australia
  - Brain and Mind Centre, The University of Sydney, Camperdown, NSW 2050, Australia
- Jake Zheng Chen
  - Sydney Pharmacy School, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW 2006, Australia
  - Brain and Mind Centre, The University of Sydney, Camperdown, NSW 2050, Australia
- Thomas Balle
  - Sydney Pharmacy School, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW 2006, Australia
  - Brain and Mind Centre, The University of Sydney, Camperdown, NSW 2050, Australia
54
Kanakala G, Aggarwal R, Nayar D, Priyakumar UD. Latent Biases in Machine Learning Models for Predicting Binding Affinities Using Popular Data Sets. ACS Omega 2023; 8:2389-2397. [PMID: 36687059 PMCID: PMC9850481 DOI: 10.1021/acsomega.2c06781] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 10/20/2022] [Accepted: 11/21/2022] [Indexed: 06/17/2023]
Abstract
Drug design involves the process of identifying and designing molecules that bind well to a given receptor. A vital computational component of this process is a protein-ligand interaction scoring function that evaluates, with reasonable accuracy, the binding ability of various molecules or ligands to a given protein receptor binding pocket. With protein-ligand binding affinity data sets publicly available in both sequential and structural forms, machine learning methods have gained traction as a top choice for developing such scoring functions. While the performance shown by these models is promising, there are several hidden biases present in the data sets themselves that affect the utility of such models for practical purposes such as virtual screening. In this work, we use published methods to systematically investigate several such factors or biases present in these data sets. In our analysis, we highlight the importance of considering sequence, protein-ligand interaction, and pocket structure similarity while constructing data splits and provide an explanation for good protein-only and ligand-only performances in some data sets. Through this study, we provide the community with several pointers for the design of binding affinity predictors and data sets for reliable applicability.
Affiliation(s)
- Rishal Aggarwal
  - International Institute of Information Technology, Hyderabad 500 032, India
- Divya Nayar
  - Department of Materials Science and Engineering, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India
- U. Deva Priyakumar
  - International Institute of Information Technology, Hyderabad 500 032, India
55
Diakou I, Papakonstantinou E, Papageorgiou L, Pierouli K, Dragoumani K, Spandidos DA, Bacopoulou F, Chrousos GP, Eliopoulos E, Vlachakis D. Novel computational pipelines in antiviral structure-based drug design (Review). Biomed Rep 2022; 17:97. [PMID: 36382260 PMCID: PMC9634337 DOI: 10.3892/br.2022.1580] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2022] [Accepted: 10/05/2022] [Indexed: 11/22/2022] Open
Abstract
Viral infections constitute a fundamental and continuous challenge for the global scientific and medical community, as highlighted by the ongoing COVID-19 pandemic. In combination with prophylactic vaccines, the development of safe and effective antiviral drugs remains a pressing need for the effective management of rare and common pathogenic viruses. The design of potent antivirals can be informed by the study of the three-dimensional structure of viral protein targets. Structure-based design of antivirals in silico provides a solution to the arduous and costly process of conventional drug development pipelines. Furthermore, rapid advances in high-throughput computing, along with the growth of available biomolecular and biochemical data, enable the development of novel computational pipelines in the hunt for antivirals. The incorporation of modern methods, such as deep learning and artificial intelligence, has the potential to revolutionize the structure-based design and repurposing of antiviral compounds with minimal side effects and high efficacy. The present review aims to provide an outline of both traditional computational drug design and emerging, high-level computing strategies.
Affiliation(s)
- Io Diakou
  - Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, 11855 Athens, Greece
- Eleni Papakonstantinou
  - Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, 11855 Athens, Greece
- Louis Papageorgiou
  - Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, 11855 Athens, Greece
- Katerina Pierouli
  - Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, 11855 Athens, Greece
- Konstantina Dragoumani
  - Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, 11855 Athens, Greece
- Demetrios A. Spandidos
  - Laboratory of Clinical Virology, School of Medicine, University of Crete, 71003 Heraklion, Greece
- Flora Bacopoulou
  - University Research Institute of Maternal and Child Health and Precision Medicine, and UNESCO Chair on Adolescent Health Care, National and Kapodistrian University of Athens, ‘Aghia Sophia’ Children's Hospital, 11527 Athens, Greece
- George P. Chrousos
  - University Research Institute of Maternal and Child Health and Precision Medicine, and UNESCO Chair on Adolescent Health Care, National and Kapodistrian University of Athens, ‘Aghia Sophia’ Children's Hospital, 11527 Athens, Greece
- Elias Eliopoulos
  - Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, 11855 Athens, Greece
- Dimitrios Vlachakis
  - Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, 11855 Athens, Greece
  - University Research Institute of Maternal and Child Health and Precision Medicine, and UNESCO Chair on Adolescent Health Care, National and Kapodistrian University of Athens, ‘Aghia Sophia’ Children's Hospital, 11527 Athens, Greece
  - Division of Endocrinology and Metabolism, Center of Clinical, Experimental Surgery and Translational Research, Biomedical Research Foundation of The Academy of Athens, 11527 Athens, Greece
56
Zhu H, Yang J, Huang N. Assessment of the Generalization Abilities of Machine-Learning Scoring Functions for Structure-Based Virtual Screening. J Chem Inf Model 2022; 62:5485-5502. [PMID: 36268980 DOI: 10.1021/acs.jcim.2c01149] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 12/13/2022]
Abstract
In structure-based virtual screening (SBVS), it is critical that scoring functions capture protein-ligand atomic interactions. By focusing on the local domains of ligand binding pockets, a standardized pocket Pfam-based clustering (Pfam-cluster) approach was developed to assess the cross-target generalization ability of machine-learning scoring functions (MLSFs). Subsequently, 12 typical MLSFs were evaluated using random cross-validation (Random-CV), protein sequence similarity-based cross-validation (Seq-CV), and pocket Pfam-based cross-validation (Pfam-CV) methods. Surprisingly, all of the tested models showed decreasing performance from Random-CV to Seq-CV to Pfam-CV experiments and did not show satisfactory generalization capacity. Our interpretability analysis suggested that the predictions of MLSFs on novel targets depended on buried solvent-accessible surface area (SASA)-related features of complex structures, with greater predicted binding affinities for complexes with larger protein-ligand interfaces. By combining buried SASA-related features with target-specific patterns that were only shared among structurally similar compounds in the same cluster, the random forest (RF)-Score attained a good performance in the Random-CV test. Based on these findings, we strongly advise assessing the generalization ability of MLSFs with the Pfam-cluster approach and being cautious with the features learned by MLSFs.
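The difference between random and cluster-based cross-validation described in this abstract can be illustrated with a minimal sketch. It is not the authors' Pfam-cluster implementation: the function name `grouped_kfold`, the greedy fold balancing, and the toy cluster labels are all assumptions made for illustration. The only property it demonstrates is that every complex from a given cluster lands on the same side of a split, so test targets are never seen during training.

```python
from collections import defaultdict

def grouped_kfold(groups, n_folds):
    """Yield (train_idx, test_idx) pairs in which no group spans both sides.

    groups[i] is a cluster label for sample i (hypothetical here; in a real
    study it would come from a clustering step such as pocket-family assignment).
    """
    members = defaultdict(list)
    for idx, label in enumerate(groups):
        members[label].append(idx)

    # Greedy balancing: place the largest clusters first, each into the
    # fold that currently holds the fewest samples.
    folds = [[] for _ in range(n_folds)]
    for label in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[label])

    for f in range(n_folds):
        test = sorted(folds[f])
        train = sorted(i for k in range(n_folds) if k != f for i in folds[k])
        yield train, test

# Toy labels only, chosen for illustration.
toy_groups = ["kinase", "kinase", "protease", "protease", "protease",
              "nuclear_receptor", "gpcr", "gpcr"]
splits = list(grouped_kfold(toy_groups, n_folds=3))
```

A random split would instead shuffle individual indices, which lets near-identical complexes from one cluster appear in both training and test sets and can inflate the apparent performance.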
Affiliation(s)
- Hui Zhu
  - Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 102206, China
  - National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
- Jincai Yang
  - National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
- Niu Huang
  - Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 102206, China
  - National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
57
Réau M, Renaud N, Xue LC, Bonvin AMJJ. DeepRank-GNN: a graph neural network framework to learn patterns in protein-protein interfaces. Bioinformatics 2022; 39:6845451. [PMID: 36420989 PMCID: PMC9805592 DOI: 10.1093/bioinformatics/btac759] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 11/25/2021] [Revised: 10/19/2022] [Accepted: 11/23/2022] [Indexed: 11/25/2022] Open
Abstract
MOTIVATION: Gaining structural insights into the protein-protein interactome is essential to understand biological phenomena and extract knowledge for rational drug design or protein engineering. We have previously developed DeepRank, a deep-learning framework to facilitate pattern learning from protein-protein interfaces using convolutional neural network (CNN) approaches. However, CNN is not rotation invariant and data augmentation is required to desensitize the network to the input data orientation, which dramatically impairs the computation performance. Representing protein-protein complexes as atomic- or residue-scale rotation invariant graphs instead enables using graph neural network (GNN) approaches, bypassing those limitations.
RESULTS: We have developed DeepRank-GNN, a framework that converts protein-protein interfaces from PDB 3D coordinates files into graphs that are further provided to a pre-defined or user-defined GNN architecture to learn problem-specific interaction patterns. DeepRank-GNN is designed to be highly modularizable, easily customized and is wrapped into a user-friendly Python 3 package. Here, we showcase DeepRank-GNN's performance on two applications using a dedicated graph interaction neural network: (i) the scoring of docking poses and (ii) the discriminating of biological and crystal interfaces. In addition to the highly competitive performance obtained in those tasks as compared to state-of-the-art methods, we show a significant improvement in speed and storage requirement using DeepRank-GNN as compared to DeepRank.
AVAILABILITY AND IMPLEMENTATION: DeepRank-GNN is freely available from https://github.com/DeepRank/DeepRank-GNN.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Li C Xue
  - Center for Molecular and Biomolecular Informatics, Radboudumc, Nijmegen 6525 GA, The Netherlands
58
Zhang Y, Luo M, Wu P, Wu S, Lee TY, Bai C. Application of Computational Biology and Artificial Intelligence in Drug Design. Int J Mol Sci 2022; 23:13568. [PMID: 36362355 PMCID: PMC9658956 DOI: 10.3390/ijms232113568] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/08/2022] [Revised: 10/29/2022] [Accepted: 11/03/2022] [Indexed: 08/24/2023] Open
Abstract
Traditional drug design requires a great amount of research time and developmental expense. Rapidly growing computational approaches, including computational biology, computer-aided drug design, and artificial intelligence, have the potential to expedite drug discovery by minimizing its time and financial cost. In recent years, computational approaches have been widely used to improve the efficacy and effectiveness of the drug discovery pipeline, leading to the approval of many new drugs for marketing. The present review emphasizes the applications of these indispensable computational approaches in aiding target identification, lead discovery, and lead optimization. Some challenges of using these approaches for drug design are also discussed. Moreover, we propose a methodology for integrating various computational techniques into new drug discovery and design.
Affiliation(s)
- Yue Zhang
  - School of Life and Health Sciences, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
  - School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
  - Warshel Institute for Computational Biology, Shenzhen 518172, China
- Mengqi Luo
  - School of Life and Health Sciences, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
  - South China Hospital, Health Science Center, Shenzhen University, Shenzhen 518116, China
- Peng Wu
  - School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen 518055, China
- Song Wu
  - South China Hospital, Health Science Center, Shenzhen University, Shenzhen 518116, China
- Tzong-Yi Lee
  - School of Life and Health Sciences, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, Shenzhen 518172, China
- Chen Bai
  - School of Life and Health Sciences, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, Shenzhen 518172, China
59
Bieniek MK, Cree B, Pirie R, Horton JT, Tatum NJ, Cole DJ. An open-source molecular builder and free energy preparation workflow. Commun Chem 2022; 5:136. [PMID: 36320862 PMCID: PMC9607723 DOI: 10.1038/s42004-022-00754-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2022] [Accepted: 10/11/2022] [Indexed: 01/27/2023] Open
Abstract
Automated free energy calculations for the prediction of binding free energies of congeneric series of ligands to a protein target are growing in popularity, but building reliable initial binding poses for the ligands is challenging. Here, we introduce the open-source FEgrow workflow for building user-defined congeneric series of ligands in protein binding pockets for input to free energy calculations. For a given ligand core and receptor structure, FEgrow enumerates and optimises the bioactive conformations of the grown functional group(s), making use of hybrid machine learning/molecular mechanics potential energy functions where possible. Low energy structures are optionally scored using the gnina convolutional neural network scoring function, and output for more rigorous protein-ligand binding free energy predictions. We illustrate use of the workflow by building and scoring binding poses for ten congeneric series of ligands bound to targets from a standard, high quality dataset of protein-ligand complexes. Furthermore, we build a set of 13 inhibitors of the SARS-CoV-2 main protease from the literature, and use free energy calculations to retrospectively compute their relative binding free energies. FEgrow is freely available at https://github.com/cole-group/FEgrow, along with a tutorial.
Affiliation(s)
- Mateusz K. Bieniek
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
- Ben Cree
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
- Rachael Pirie
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
- Joshua T. Horton
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
- Natalie J. Tatum
  - Newcastle University Centre for Cancer, Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK
- Daniel J. Cole
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
60
Krasoulis A, Antonopoulos N, Pitsikalis V, Theodorakis S. DENVIS: Scalable and High-Throughput Virtual Screening Using Graph Neural Networks with Atomic and Surface Protein Pocket Features. J Chem Inf Model 2022; 62:4642-4659. [PMID: 36154119 DOI: 10.1021/acs.jcim.2c01057] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
Abstract
Computational methods for virtual screening can dramatically accelerate early-stage drug discovery by identifying potential hits for a specified target. Docking algorithms traditionally use physics-based simulations to address this challenge by estimating the binding orientation of a query protein-ligand pair and a corresponding binding affinity score. Over the recent years, classical and modern machine learning architectures have shown potential for outperforming traditional docking algorithms. Nevertheless, most learning-based algorithms still rely on the availability of the protein-ligand complex binding pose, typically estimated via docking simulations, which leads to a severe slowdown of the overall virtual screening process. A family of algorithms processing target information at the amino acid sequence level avoid this requirement, however, at the cost of processing protein data at a higher representation level. We introduce deep neural virtual screening (DENVIS), an end-to-end pipeline for virtual screening using graph neural networks (GNNs). By performing experiments on two benchmark databases, we show that our method performs competitively to several docking-based, machine learning-based, and hybrid docking/machine learning-based algorithms. By avoiding the intermediate docking step, DENVIS exhibits several orders of magnitude faster screening times (i.e., higher throughput) than both docking-based and hybrid models. When compared to an amino acid sequence-based machine learning model with comparable screening times, DENVIS achieves dramatically better performance. Some key elements of our approach include protein pocket modeling using a combination of atomic and surface features, the use of model ensembles, and data augmentation via artificial negative sampling during model training. 
In summary, DENVIS achieves virtual screening performance competitive with the state of the art, while offering the potential to scale to billions of molecules using minimal computational resources.
61
Qu X, Dong L, Zhang J, Si Y, Wang B. Systematic Improvement of the Performance of Machine Learning Scoring Functions by Incorporating Features of Protein-Bound Water Molecules. J Chem Inf Model 2022; 62:4369-4379. [PMID: 36083808 DOI: 10.1021/acs.jcim.2c00916] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 11/30/2022]
Abstract
Water molecules at the ligand-protein interfaces play crucial roles in the binding of the ligands, but the behavior of protein-bound water is largely ignored in many currently used machine learning (ML)-based scoring functions (SFs). In an attempt to improve the prediction performance of existing ML-based SFs, we estimated the water distribution with a HydraMap (HM) method and then incorporated the features extracted from protein-bound waters obtained in this way into three ML-based SFs: RF-Score, ECIF, and PLEC. It was found that a combination of HM-based features can consistently improve the performance of all three SFs, including their scoring, ranking, and docking power. HydraMap-based features show consistently good performance with both crystal structures and docked structures, demonstrating their robustness for SFs. Overall, HM-based features, which are a statistical representation of hydration sites at protein-ligand interfaces, are expected to improve the prediction performance for diverse SFs.
Affiliation(s)
- Xiaoyang Qu
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen 361005, P. R. China
- Lina Dong
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen 361005, P. R. China
- Jinyan Zhang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen 361005, P. R. China
- Yubing Si
  - College of Chemistry, Zhengzhou University, Zhengzhou 450001, P. R. China
- Binju Wang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen 361005, P. R. China
62
A pocket-based 3D molecule generative model fueled by experimental electron density. Sci Rep 2022; 12:15100. [PMID: 36068257 PMCID: PMC9448726 DOI: 10.1038/s41598-022-19363-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/17/2022] [Accepted: 08/29/2022] [Indexed: 11/08/2022] Open
Abstract
We report for the first time the use of experimental electron density (ED) as training data for the generation of drug-like three-dimensional molecules based on the structure of a target protein pocket. Similar to a structural biologist building molecules based on their ED, our model functions with two main components: a generative adversarial network (GAN) to generate the ligand ED in the input pocket and an ED interpretation module for molecule generation. The model was tested on three targets: a kinase (hematopoietic progenitor kinase 1), protease (SARS-CoV-2 main protease), and nuclear receptor (vitamin D receptor), and evaluated with a reference dataset composed of over 8000 compounds that have their activities reported in the literature. The evaluation considered the chemical validity, chemical space distribution-based diversity, and similarity with reference active compounds concerning the molecular structure and pocket-binding mode. Our model can generate molecules with similar structures to classical active compounds and novel compounds sharing similar binding modes with active compounds, making it a promising tool for library generation supporting high-throughput virtual screening. The ligand ED generated can also be used to support fragment-based drug design. Our model is available as an online service to academic users via https://edmg.stonewise.cn/#/create.
63
Osaki K, Ekimoto T, Yamane T, Ikeguchi M. 3D-RISM-AI: A Machine Learning Approach to Predict Protein-Ligand Binding Affinity Using 3D-RISM. J Phys Chem B 2022; 126:6148-6158. [PMID: 35969673 PMCID: PMC9421647 DOI: 10.1021/acs.jpcb.2c03384] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/16/2022] [Revised: 07/27/2022] [Indexed: 11/30/2022]
Abstract
Hydration free energy (HFE) is a key factor in improving protein-ligand binding free energy (BFE) prediction accuracy. The HFE itself can be calculated using the three-dimensional reference interaction site model (3D-RISM); however, the BFE predictions solely evaluated using 3D-RISM are not correlated to the experimental BFE for abundant protein-ligand pairs. In this study, to predict the BFE for multiple sets of protein-ligand pairs, we propose a machine learning approach incorporating the HFEs obtained using 3D-RISM, termed 3D-RISM-AI. In the learning process, structural metrics, intra-/intermolecular energies, and HFEs obtained via 3D-RISM of ∼4000 complexes in the PDBbind database (ver. 2018) were used. The BFEs predicted using 3D-RISM-AI were well correlated to the experimental data (Pearson's correlation coefficient of 0.80 and root-mean-square error of 1.91 kcal/mol). As important factors for the prediction, the difference in the solvent accessible surface area between the bound and unbound structures and the hydration properties of the ligands were detected during the learning process.
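The two evaluation metrics quoted in this abstract (Pearson's correlation coefficient and root-mean-square error) can be computed as in the minimal sketch below. The function names and the example energies are hypothetical placeholders for illustration; they are not data or code from the 3D-RISM-AI study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted, observed):
    """Root-mean-square error, in the same units as the inputs (e.g. kcal/mol)."""
    n = len(predicted)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


# Hypothetical binding free energies in kcal/mol (illustrative values only).
experimental = [-9.1, -7.4, -6.2, -10.3, -8.0]
predicted = [-8.5, -7.9, -6.8, -9.6, -7.1]

print(f"Pearson r = {pearson_r(predicted, experimental):.2f}")
print(f"RMSE      = {rmse(predicted, experimental):.2f} kcal/mol")
```

Reporting both metrics is useful because a model can rank complexes well (high r) while still being offset in absolute terms (high RMSE), or the reverse.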
Affiliation(s)
- Kazu Osaki
  - Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- Toru Ekimoto
  - Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- Tsutomu Yamane
  - Center for Computational Science, RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- Mitsunori Ikeguchi
  - Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
  - Center for Computational Science, RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
64
Meli R, Morris GM, Biggin PC. Scoring Functions for Protein-Ligand Binding Affinity Prediction using Structure-Based Deep Learning: A Review. Frontiers in Bioinformatics 2022; 2:885983. [PMID: 36187180 PMCID: PMC7613667 DOI: 10.3389/fbinf.2022.885983] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 02/28/2022] [Accepted: 05/11/2022] [Indexed: 01/01/2023] Open
Abstract
The rapid and accurate in silico prediction of protein-ligand binding free energies or binding affinities has the potential to transform drug discovery. In recent years, there has been a rapid growth of interest in deep learning methods for the prediction of protein-ligand binding affinities based on the structural information of protein-ligand complexes. These structure-based scoring functions often obtain better results than classical scoring functions when applied within their applicability domain. Here we review structure-based scoring functions for binding affinity prediction based on deep learning, focussing on different types of architectures, featurization strategies, data sets, methods for training and evaluation, and the role of explainable artificial intelligence in building useful models for real drug-discovery applications.
Affiliation(s)
- Rocco Meli
  - Department of Biochemistry, University of Oxford, Oxford, United Kingdom
- Garrett M. Morris
  - Department of Statistics, University of Oxford, Oxford, United Kingdom
- Philip C. Biggin
  - Department of Biochemistry, University of Oxford, Oxford, United Kingdom
65
Wang Y, Wei Z, Xi L. Sfcnn: a novel scoring function based on 3D convolutional neural network for accurate and stable protein-ligand affinity prediction. BMC Bioinformatics 2022; 23:222. [PMID: 35676617 PMCID: PMC9178885 DOI: 10.1186/s12859-022-04762-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 10/31/2021] [Accepted: 06/01/2022] [Indexed: 01/09/2023] Open
Abstract
Background: Computer-aided drug design provides an effective method of identifying lead compounds. However, success rates are significantly bottlenecked by the lack of accurate and reliable scoring functions needed to evaluate binding affinities of protein–ligand complexes. Therefore, many scoring functions based on machine learning or deep learning have been developed to improve prediction accuracies in recent years. In this work, we proposed a novel featurization method, generating a new scoring function model based on 3D convolutional neural network. Results: This work showed the results from testing four architectures and three featurization methods, and outlined the development of a novel deep 3D convolutional neural network scoring function model. This model simplified feature engineering, and in combination with Grad-CAM made the intermediate layers of the neural network more interpretable. This model was evaluated and compared with other scoring functions on multiple independent datasets. The Pearson correlation coefficients between the predicted binding affinities by our model and the experimental data achieved 0.7928, 0.7946, 0.6758, and 0.6474 on CASF-2016 dataset, CASF-2013 dataset, CSAR_HiQ_NRC_set, and Astex_diverse_set, respectively. Overall, our model performed accurately and stably enough in the scoring power to predict the binding affinity of a protein–ligand complex. Conclusions: These results indicate our model is an excellent scoring function, and performs well in scoring power for accurately and stably predicting the protein–ligand affinity. Our model will contribute towards improving the success rate of virtual screening, thus will accelerate the development of potential drugs or novel biologically active lead compounds. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04762-3.
Affiliation(s)
- Yu Wang
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Tele-Communications, No. 2 Chongwen Road, Nan'an District, Chongqing, 400065, China.
- Zhengxiao Wei
  - Department of Clinical Laboratory, Public Health Clinical Center of Chengdu, Chengdu, 610095, China
- Lei Xi
  - Hubei Provincial Key Laboratory of Occurrence and Intervention of Rheumatic Diseases, Hubei Minzu University, Enshi, 445000, China
66
Shim H, Kim H, Allen JE, Wulff H. Pose Classification Using Three-Dimensional Atomic Structure-Based Neural Networks Applied to Ion Channel-Ligand Docking. J Chem Inf Model 2022; 62:2301-2315. [PMID: 35447030 PMCID: PMC9131459 DOI: 10.1021/acs.jcim.1c01510] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/18/2021] [Indexed: 12/11/2022]
Abstract
The identification of promising lead compounds showing pharmacological activities toward a biological target is essential in early stage drug discovery. With the recent increase in available small-molecule databases, virtual high-throughput screening using physics-based molecular docking has emerged as an essential tool in assisting fast and cost-efficient lead discovery and optimization. However, the best scored docking poses are often suboptimal, resulting in incorrect screening and chemical property calculation. We address the pose classification problem by leveraging data-driven machine learning approaches to identify correct docking poses from AutoDock Vina and Glide screens. To enable effective classification of docking poses, we present two convolutional neural network approaches: a three-dimensional convolutional neural network (3D-CNN) and an attention-based point cloud network (PCN) trained on the PDBbind refined set. We demonstrate the effectiveness of our proposed classifiers on multiple evaluation data sets including the standard PDBbind CASF-2016 benchmark data set and various compound libraries with structurally different protein targets including an ion channel data set extracted from Protein Data Bank (PDB) and an in-house KCa3.1 inhibitor data set. Our experiments show that excluding false positive docking poses using the proposed classifiers improves virtual high-throughput screening to identify novel molecules against each target protein compared to the initial screen based on the docking scores.
Affiliation(s)
- Heesung Shim
  - Department of Pharmacology, University of California, Davis, California 95616, United States
- Hyojin Kim
  - Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Jonathan E. Allen
  - Global Security Computing Applications Division, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Heike Wulff
  - Department of Pharmacology, University of California, Davis, California 95616, United States
67
Bai Q, Liu S, Tian Y, Xu T, Banegas‐Luna AJ, Pérez‐Sánchez H, Huang J, Liu H, Yao X. Application advances of deep learning methods for de novo drug design and molecular dynamics simulation. WIREs Computational Molecular Science 2022. [DOI: 10.1002/wcms.1581] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 12/30/2022]
Affiliation(s)
- Qifeng Bai
  - Key Lab of Preclinical Study for New Drugs of Gansu Province Institute of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Lanzhou University Lanzhou Gansu China
- Shuo Liu
  - School of Pharmacy Lanzhou University Lanzhou Gansu China
- Yanan Tian
  - School of Pharmacy Lanzhou University Lanzhou Gansu China
- Tingyang Xu
  - Tencent AI Lab, Shenzhen Tencent Computer Ltd Shenzhen China
- Antonio Jesús Banegas‐Luna
  - Structural Bioinformatics and High Performance Computing Research Group (BIO‐HPC), Computer Engineering Department UCAM Universidad Católica de Murcia Murcia Spain
- Horacio Pérez‐Sánchez
  - Structural Bioinformatics and High Performance Computing Research Group (BIO‐HPC), Computer Engineering Department UCAM Universidad Católica de Murcia Murcia Spain
- Junzhou Huang
  - Tencent AI Lab, Shenzhen Tencent Computer Ltd Shenzhen China
- Huanxiang Liu
  - School of Pharmacy Lanzhou University Lanzhou Gansu China
- Xiaojun Yao
  - College of Chemistry and Chemical Engineering Lanzhou University Lanzhou Gansu China
68
McNutt AT, Koes DR. Improving ΔΔG Predictions with a Multitask Convolutional Siamese Network. J Chem Inf Model 2022; 62:1819-1829. [PMID: 35380443 PMCID: PMC9038699 DOI: 10.1021/acs.jcim.1c01497] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 01/04/2023]
Abstract
The lead optimization phase of drug discovery refines an initial hit molecule for desired properties, especially potency. Synthesis and experimental testing of the small perturbations during this refinement can be quite costly and time-consuming. Relative binding free energy (RBFE, also referred to as ΔΔG) methods allow the estimation of binding free energy changes after small changes to a ligand scaffold. Here, we propose and evaluate a Siamese convolutional neural network (CNN) for the prediction of RBFE between two bound ligands. We show that our multitask loss is able to improve on a previous state-of-the-art Siamese network for RBFE prediction via increased regularization of the latent space. The Siamese network architecture is well suited to the prediction of RBFE in comparison to a standard CNN trained on the same data (Pearson's R of 0.553 and 0.5, respectively). When evaluated on a left-out protein family, our Siamese CNN shows variability in its RBFE predictive performance depending on the protein family being evaluated (Pearson's R ranging from -0.44 to 0.97). RBFE prediction performance can be improved during generalization by injecting only a few examples (few-shot learning) from the evaluation data set during model training.
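As a reminder of the quantity being predicted, the sketch below computes a relative binding free energy from two measured affinities using the standard relation ΔG = RT ln Kd. It illustrates the definition of ΔΔG only; it is not the Siamese network from the paper, and the function names and the pK inputs are hypothetical.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def delta_g_from_pk(pk, temperature=298.15):
    """Binding free energy (kcal/mol) from pK = -log10(Kd in mol/L).

    Uses dG = RT ln(Kd) = -RT ln(10) * pK, so tighter binders are more negative.
    """
    return -R_KCAL * temperature * math.log(10) * pk


def relative_binding_free_energy(pk_reference, pk_modified, temperature=298.15):
    """ddG = dG(modified) - dG(reference); negative means the change helps binding."""
    return (delta_g_from_pk(pk_modified, temperature)
            - delta_g_from_pk(pk_reference, temperature))


# Hypothetical congeneric pair: a tenfold potency gain (pK 6 -> 7).
ddg = relative_binding_free_energy(6.0, 7.0)
print(f"ddG = {ddg:.2f} kcal/mol")
```

By construction the quantity is antisymmetric: swapping the two ligands flips the sign of ΔΔG, which is one reason a paired (Siamese) architecture is a natural fit for it.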
Affiliation(s)
- Andrew T McNutt
  - Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
- David Ryan Koes
  - Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
69
Sha CM, Wang J, Dokholyan NV. NeuralDock: Rapid and Conformation-Agnostic Docking of Small Molecules. Front Mol Biosci 2022; 9:867241. [PMID: 35392534 PMCID: PMC8980736 DOI: 10.3389/fmolb.2022.867241] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/31/2022] [Accepted: 02/22/2022] [Indexed: 01/09/2023] Open
Abstract
Virtual screening is a cost- and time-effective alternative to traditional high-throughput screening in the drug discovery process. Both virtual screening approaches, structure-based molecular docking and ligand-based cheminformatics, suffer from computational cost, low accuracy, and/or reliance on prior knowledge of a ligand that binds to a given target. Here, we propose a neural network framework, NeuralDock, which accelerates the process of high-quality computational docking by a factor of 10^6, and does not require prior knowledge of a ligand that binds to a given target. By approximating both protein-small molecule conformational sampling and energy-based scoring, NeuralDock accurately predicts the binding energy and affinity of a protein-small molecule pair, based on protein pocket 3D structure and small molecule topology. We use NeuralDock and 25 GPUs to dock 937 million molecules from the ZINC database against superoxide dismutase-1 in 21 h, which we validate with physical docking using MedusaDock. Due to its speed and accuracy, NeuralDock may be useful in brute-force virtual screening of massive chemical libraries and training of generative drug models.
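The screening figures in this abstract (937 million molecules, 25 GPUs, 21 h) imply an average throughput that is easy to work out. The short sketch below does the arithmetic; the function name is hypothetical, and the calculation assumes the work was spread evenly over the GPUs for the whole run, which the abstract does not state.

```python
def screening_throughput(n_molecules, hours, n_gpus):
    """Return (overall, per-GPU) average throughput in molecules per second."""
    seconds = hours * 3600
    overall = n_molecules / seconds
    return overall, overall / n_gpus


# Figures quoted in the abstract.
overall, per_gpu = screening_throughput(937_000_000, 21, 25)
print(f"{overall:,.0f} molecules/s overall")
print(f"{per_gpu:,.0f} molecules/s per GPU")
```

Under that even-load assumption the run averages about 12,400 molecules per second in total, or roughly 500 per second on each GPU.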
Affiliation(s)
- Congzhou M. Sha
  - Department of Engineering Science and Mechanics, Pennsylvania State University, University Park, PA, United States
  - Department of Pharmacology, Penn State College of Medicine, Hershey, PA, United States
- Jian Wang
  - Department of Pharmacology, Penn State College of Medicine, Hershey, PA, United States
- Nikolay V. Dokholyan
  - Department of Engineering Science and Mechanics, Pennsylvania State University, University Park, PA, United States
  - Department of Pharmacology, Penn State College of Medicine, Hershey, PA, United States
  - Department of Biochemistry and Molecular Biology, Penn State College of Medicine, Hershey, PA, United States
  - Departments of Chemistry and Biomedical Engineering, Penn State University, University Park, PA, United States
  - *Correspondence: Nikolay V. Dokholyan,
70
Affinity prediction using deep learning based on SMILES input for D3R grand challenge 4. J Comput Aided Mol Des 2022; 36:225-235. [PMID: 35314897 DOI: 10.1007/s10822-022-00448-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2021] [Accepted: 03/08/2022] [Indexed: 10/18/2022]
Abstract
Modern molecular docking comprises the prediction of pose and affinity. Prediction of docking poses is required for affinity prediction when three-dimensional coordinates of the ligand have not been provided. However, existing methods require a large amount of feature engineering. In addition, a robust model is needed for the sequential combination of pose and affinity prediction, because the predicted ligand position is subject to probabilistic deviation. We propose a pipeline using a bipartite graph neural network and transfer learning trained on a re-docking dataset. We evaluated our model on the released data from the Drug Design Data Resource Grand Challenge 4 (D3R GC4). The data for the two target proteins provided by the challenge have different patterns. The model outperformed the best participant by 9% on the BACE target protein from stage 2. Further, our model showed competitive performance on the CatS target protein.
71
Zheng L, Meng J, Jiang K, Lan H, Wang Z, Lin M, Li W, Guo H, Wei Y, Mu Y. Improving protein-ligand docking and screening accuracies by incorporating a scoring function correction term. Brief Bioinform 2022; 23:6548372. [PMID: 35289359 PMCID: PMC9116214 DOI: 10.1093/bib/bbac051] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Received: 10/26/2021] [Revised: 01/30/2022] [Accepted: 01/31/2022] [Indexed: 12/13/2022] Open
Abstract
Scoring functions are important components in molecular docking for structure-based drug discovery. Traditional scoring functions, generally empirical- or force field-based, are robust and have proven to be useful for identifying hits and lead optimizations. Although multiple highly accurate deep learning- or machine learning-based scoring functions have been developed, their direct applications for docking and screening are limited. We describe a novel strategy to develop a reliable protein–ligand scoring function by augmenting the traditional scoring function Vina score using a correction term (OnionNet-SFCT). The correction term is developed based on an AdaBoost random forest model, utilizing multiple layers of contacts formed between protein residues and ligand atoms. In addition to the Vina score, the model considerably enhances the AutoDock Vina prediction abilities for docking and screening tasks based on different benchmarks (such as cross-docking dataset, CASF-2016, DUD-E and DUD-AD). Furthermore, our model could be combined with multiple docking applications to increase pose selection accuracies and screening abilities, indicating its wide usage for structure-based drug discoveries. Furthermore, in a reverse practice, the combined scoring strategy successfully identified multiple known receptors of a plant hormone. To summarize, the results show that the combination of data-driven model (OnionNet-SFCT) and empirical scoring function (Vina score) is a good scoring strategy that could be useful for structure-based drug discoveries and potentially target fishing in future.
Affiliation(s)
- Liangzhen Zheng
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China
  - Shanghai Zelixir Biotech Company Ltd., Shanghai 200030, China
- Jintao Meng
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China
  - National Supercomputer Center in Shenzhen, Shenzhen, 518000, China
- Kai Jiang
  - Institute of Plant and Food Science, Department of Biology, School of Life Sciences, Southern University of Science and Technology (SUSTech), Shenzhen, Guangdong 518055, China
- Haidong Lan
  - Tencent AI Lab, Shenzhen, Guangdong 518000, China
- Zechen Wang
  - School of Physics, Shandong University, Jinan, Shandong 250101, China
- Mingzhi Lin
  - Shanghai Zelixir Biotech Company Ltd., Shanghai 200030, China
- Weifeng Li
  - School of Physics, Shandong University, Jinan, Shandong 250101, China
- Hongwei Guo
  - Institute of Plant and Food Science, Department of Biology, School of Life Sciences, Southern University of Science and Technology (SUSTech), Shenzhen, Guangdong 518055, China
- Yanjie Wei
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China
- Yuguang Mu
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive 637551, Singapore
72
Stafford KA, Anderson BM, Sorenson J, van den Bedem H. AtomNet PoseRanker: Enriching Ligand Pose Quality for Dynamic Proteins in Virtual High-Throughput Screens. J Chem Inf Model 2022; 62:1178-1189. [PMID: 35235748 PMCID: PMC8924924 DOI: 10.1021/acs.jcim.1c01250] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/13/2021] [Indexed: 12/17/2022]
Abstract
Structure-based, virtual High-Throughput Screening (vHTS) methods for predicting ligand activity in drug discovery are important when there are no or relatively few known compounds that interact with a therapeutic target of interest. State-of-the-art computational vHTS necessarily relies on effective methods for pose sampling and docking and generating an accurate affinity score from the docked poses. However, proteins are dynamic; in vivo ligands bind to a conformational ensemble. In silico docking to the single conformation represented by a crystal structure can adversely affect the pose quality. Here, we introduce AtomNet PoseRanker (ANPR), a graph convolutional network trained to identify and rerank crystal-like ligand poses from a sampled ensemble of protein conformations and ligand poses. In contrast to conventional vHTS methods that incorporate receptor flexibility, a deep learning approach can internalize valid cognate and noncognate binding modes corresponding to distinct receptor conformations, thereby learning to infer and account for receptor flexibility even on single conformations. ANPR significantly enriched pose quality in docking to cognate and noncognate receptors of the PDBbind v2019 data set. Improved pose rankings that better represent experimentally observed ligand binding modes improve hit rates in vHTS campaigns and thereby advance computational drug discovery, especially for novel therapeutic targets or novel binding sites.
Affiliation(s)
- Kate A. Stafford
  - Atomwise, Inc., 717 Market Street, Suite 800, San Francisco, California 94103, United States
- Brandon M. Anderson
  - Atomwise, Inc., 717 Market Street, Suite 800, San Francisco, California 94103, United States
- Jon Sorenson
  - Atomwise, Inc., 717 Market Street, Suite 800, San Francisco, California 94103, United States
- Henry van den Bedem
  - Atomwise, Inc., 717 Market Street, Suite 800, San Francisco, California 94103, United States
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California 94158, United States
73
A Brief Review of Machine Learning-Based Bioactive Compound Research. Applied Sciences-Basel 2022. [DOI: 10.3390/app12062906] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/04/2023]
Abstract
Bioactive compounds are often used as initial substances for many therapeutic agents. In recent years, both theoretical and practical innovations in hardware-assisted and fast-evolving machine learning (ML) have made it possible to identify desired bioactive compounds in chemical spaces, such as those in natural products (NPs). This review introduces how machine learning approaches can be used for the identification and evaluation of bioactive compounds. It also provides an overview of recent research trends in machine learning-based prediction and the evaluation of bioactive compounds by listing real-world examples along with various input data. In addition, several ML-based approaches to identify specific bioactive compounds for cardiovascular and metabolic diseases are described. Overall, these approaches are important for the discovery of novel bioactive compounds and provide new insights into the machine learning basis for various traditional applications of bioactive compound-related research.
74
Ragoza M, Masuda T, Koes DR. Generating 3D molecules conditional on receptor binding sites with deep generative models. Chem Sci 2022; 13:2701-2713. [PMID: 35356675 PMCID: PMC8890264 DOI: 10.1039/d1sc05976a] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Received: 10/28/2021] [Accepted: 02/06/2022] [Indexed: 11/22/2022] Open
Abstract
The goal of structure-based drug discovery is to find small molecules that bind to a given target protein. Deep learning has been used to generate drug-like molecules with certain cheminformatic properties, but has not yet been applied to generating 3D molecules predicted to bind to proteins by sampling the conditional distribution of protein-ligand binding interactions. In this work, we describe for the first time a deep learning system for generating 3D molecular structures conditioned on a receptor binding site. We approach the problem using a conditional variational autoencoder trained on an atomic density grid representation of cross-docked protein-ligand structures. We apply atom fitting and bond inference procedures to construct valid molecular conformations from generated atomic densities. We evaluate the properties of the generated molecules and demonstrate that they change significantly when conditioned on mutated receptors. We also explore the latent space learned by our generative model using sampling and interpolation techniques. This work opens the door for end-to-end prediction of stable bioactive molecules from protein structures with deep learning.
Affiliation(s)
- Matthew Ragoza
  - Intelligent Systems Program, University of Pittsburgh Pittsburgh PA 15213 USA
- Tomohide Masuda
  - Department of Computational and Systems Biology, University of Pittsburgh Pittsburgh PA 15213 USA
- David Ryan Koes
  - Department of Computational and Systems Biology, University of Pittsburgh Pittsburgh PA 15213 USA
75
Clyde A. Ultrahigh Throughput Protein-Ligand Docking with Deep Learning. Methods Mol Biol 2022; 2390:301-319. [PMID: 34731475 DOI: 10.1007/978-1-0716-1787-8_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Ultrahigh-throughput virtual screening (uHTVS) is an emerging field linking together classical docking techniques with high-throughput AI methods. We outline mechanistic docking models' goals and successes. We present different AI accelerated workflows for uHTVS, mainly through surrogate docking models. We showcase a novel feature representation technique, molecular depictions (images), as a surrogate model for docking. Along with a discussion on analyzing screens using regression enrichment surfaces at the tens of billion scale, we outline a future for uHTVS screening pipelines with deep learning.
Affiliation(s)
- Austin Clyde
  - Department of Computer Science, University of Chicago, Chicago, IL, USA.
  - Data Science and Learning Division, Argonne National Laboratory, Lemont, IL, USA.
76
Abstract
Virtual screening, predicting which compounds within a specified compound library bind to a target molecule (typically a protein), is a fundamental task in the field of drug discovery. Doing virtual screening well provides tangible practical benefits, including reduced drug development costs, faster time to therapeutic viability, and fewer unforeseen side effects. As with most applied computational tasks, the algorithms currently used to perform virtual screening feature inherent tradeoffs between speed and accuracy. Furthermore, even theoretically rigorous, computationally intensive methods may fail to account for important effects relevant to whether a given compound will ultimately be usable as a drug. Here we investigate the virtual screening performance of the recently released Gnina molecular docking software, which uses deep convolutional networks to score protein-ligand structures. We find, on average, that Gnina outperforms conventional empirical scoring. The default scoring in Gnina outperforms the empirical AutoDock Vina scoring function on 89 of the 117 targets of the DUD-E and LIT-PCBA virtual screening benchmarks with a median 1% early enrichment factor that is more than twice that of Vina. However, we also find that issues of bias linger in these sets, even when not used directly to train models, and this bias obfuscates to what extent machine learning models are achieving their performance through a sophisticated interpretation of molecular interactions versus fitting to non-informative simplistic property distributions.
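The "1% early enrichment factor" mentioned here is conventionally the hit rate among the top-scored 1% of a library divided by the hit rate in the whole library. The sketch below implements that common definition; the function name, the rounding of the cutoff, and the toy library are assumptions for illustration and are not taken from the Gnina evaluation code.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor at the given top fraction of a ranked library.

    scores: one score per compound, where HIGHER means predicted more active
            (negate energy-like scores in which lower is better).
    labels: 1 for known actives, 0 for decoys/inactives.
    """
    n = len(scores)
    n_actives = sum(labels)
    if n == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")
    n_top = max(1, round(n * fraction))  # size of the top slice (an assumption)
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_actives = sum(label for _, label in ranked[:n_top])
    return (top_actives / n_top) / (n_actives / n)


# Toy library: 1000 compounds, 10 actives, 5 of them ranked in the top 10.
scores = list(range(1000))
labels = [0] * 1000
for index in list(range(0, 5)) + list(range(995, 1000)):
    labels[index] = 1

print(enrichment_factor(scores, labels, fraction=0.01))
```

An enrichment factor of 1 corresponds to random ranking, and the maximum attainable value is capped by the active-to-decoy ratio of the benchmark, so values are only comparable within the same dataset.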
Affiliation(s)
- David Ryan Koes
  - Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15260, USA
77
Jabir NR, Rehman MT, Alsolami K, Shakil S, Zughaibi TA, Alserihi RF, Khan MS, AlAjmi MF, Tabrez S. Concatenation of molecular docking and molecular simulation of BACE-1, γ-secretase targeted ligands: in pursuit of Alzheimer's treatment. Ann Med 2021; 53:2332-2344. [PMID: 34889159 PMCID: PMC8667905 DOI: 10.1080/07853890.2021.2009124] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 07/20/2021] [Accepted: 11/15/2021] [Indexed: 12/13/2022] Open
Abstract
INTRODUCTION: Alzheimer's disease (AD), the most predominant cause of dementia, has evolved tremendously with an escalating frequency, mainly affecting the elderly population. An effective means of delaying, preventing, or treating AD is yet to be achieved. The failure rate of dementia drug trials has been relatively higher than in other disease-related clinical trials. Hence, multi-targeted therapeutic approaches are gaining attention in pharmacological developments. AIMS: As an extension of our earlier reports, we have performed docking and molecular dynamic (MD) simulation studies for the same 13 potential ligands against beta-site APP cleaving enzyme 1 (BACE-1) and γ-secretase as a therapeutic target for AD. The in silico screening of these ligands as potential inhibitors of BACE-1 and γ-secretase was performed using AutoDock enabled PyRx v-0.8. The protein-ligand interactions were analyzed in Discovery Studio 2020 (BIOVIA). The stability of the most promising ligand against BACE-1 and γ-secretase was evaluated by MD simulation using Desmond-2018 (Schrodinger, LLC, NY, USA). RESULTS: The computational screening revealed that the docking energy values for each of the ligands against both the target enzymes were in the range of -7.0 to -10.1 kcal/mol. Among the 13 ligands, 8 (55E, 6Z2, 6Z5, BRW, F1B, GVP, IQ6, and X37) showed binding energies of ≤-8 kcal/mol against BACE-1 and γ-secretase. For the selected enzyme targets, BACE-1 and γ-secretase, 6Z5 displayed the lowest binding energy of -10.1 and -9.8 kcal/mol, respectively. The MD simulation study confirmed the stability of BACE-6Z5 and γ-secretase-6Z5 complexes and highlighted the formation of a stable complex between 6Z5 and target enzymes. CONCLUSION: The virtual screening, molecular docking, and molecular dynamics simulation studies revealed the potential of these multi-enzyme targeted ligands. Among the studied ligands, 6Z5 seems to have the best binding potential and forms a stable complex with BACE-1 and γ-secretase. We recommend the synthesis of 6Z5 for future in-vitro and in-vivo studies.
Affiliation(s)
- Nasimudeen R. Jabir
  - Department of Biochemistry, Centre for Research and Development, PRIST University, Thanjavur, India
- Md. Tabish Rehman
  - Department of Pharmacognosy, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
- Khadeejah Alsolami
  - Department of Pharmacology and Toxicology, College of Pharmacy, Taif University, Taif, Saudi Arabia
- Shazi Shakil
  - King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia
  - Department of Medical Laboratory Technology, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
  - Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia
- Torki A. Zughaibi
  - King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia
  - Department of Medical Laboratory Technology, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
- Raed F. Alserihi
  - Department of Medical Laboratory Technology, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
  - 3D Bioprinting Unit, Center of Innovation in Personalized Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
- Mohd. Shahnawaz Khan
  - Department of Biochemistry, College of Sciences, King Saud University, Riyadh, Saudi Arabia
- Mohamed F. AlAjmi
  - Department of Pharmacognosy, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
- Shams Tabrez
  - King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia
  - Department of Medical Laboratory Technology, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
|
78
|
Shen C, Hu X, Gao J, Zhang X, Zhong H, Wang Z, Xu L, Kang Y, Cao D, Hou T. The impact of cross-docked poses on performance of machine learning classifier for protein-ligand binding pose prediction. J Cheminform 2021; 13:81. [PMID: 34656169 PMCID: PMC8520186 DOI: 10.1186/s13321-021-00560-w] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 07/29/2021] [Accepted: 10/05/2021] [Indexed: 02/06/2023] Open
Abstract
Structure-based drug design depends on detailed knowledge of the three-dimensional (3D) structures of protein-ligand binding complexes, but accurate prediction of ligand-binding poses is still a major challenge for molecular docking because of deficiencies in scoring functions (SFs) and the neglect of protein flexibility upon ligand binding. In this study, based on a cross-docking dataset purpose-built from the PDBbind database, we developed several XGBoost-trained classifiers to discriminate near-native binding poses from decoys, and systematically assessed their performance with and without cross-docked poses in the training/test sets. The results show that using Extended Connectivity Interaction Features (ECIF), Vina energy terms, and docking pose ranks as features achieves the best performance, according to validation through random splitting or refined-core splitting and testing on re-docked or cross-docked poses. In addition, despite the significant decrease in performance under threefold clustered cross-validation, the inclusion of the Vina energy terms effectively secures a lower bound on model performance and thus improves generalization capability. Furthermore, our results highlight the importance of incorporating cross-docked poses into the training of SFs with a wide application domain and high robustness for binding pose prediction. The source code and the newly developed cross-docking datasets are freely available at https://github.com/sc8668/ml_pose_prediction and https://zenodo.org/record/5525936 , respectively, under an open-source license. We believe that our study may provide valuable guidance for the development and assessment of new machine learning-based SFs (MLSFs) for the prediction of protein-ligand binding poses.
Affiliation(s)
- Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China.,State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
| | - Xueping Hu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
| | - Junbo Gao
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
| | - Xujun Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
| | - Haiyang Zhong
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
| | - Zhe Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
| | - Lei Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, 213001, China
| | - Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China.
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan, 410013, People's Republic of China.
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China. .,State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China.
| |
|
79
|
Crampon K, Giorkallos A, Deldossi M, Baud S, Steffenel LA. Machine-learning methods for ligand-protein molecular docking. Drug Discov Today 2021; 27:151-164. [PMID: 34560276 DOI: 10.1016/j.drudis.2021.09.007] [Citation(s) in RCA: 96] [Impact Index Per Article: 32.0] [Received: 04/01/2021] [Revised: 07/14/2021] [Accepted: 09/15/2021] [Indexed: 12/22/2022]
Abstract
Artificial intelligence (AI) is often presented as a new Industrial Revolution. Many domains use AI, including molecular simulation for drug discovery. In this review, we provide an overview of ligand-protein molecular docking and how machine learning (ML), especially deep learning (DL), a subset of ML, is transforming the field by tackling the associated challenges.
Affiliation(s)
- Kevin Crampon
- Université de Reims Champagne Ardenne, CNRS, MEDyC UMR 7369, 51097 Reims, France; Université de Reims Champagne Ardenne, LICIIS - LRC CEA DIGIT, 51100 Reims, France; Atos SE, Center of Excellence in Advanced Computing, 38130 Echirolles, France
| | - Alexis Giorkallos
- Atos SE, Center of Excellence in Advanced Computing, 38130 Echirolles, France
| | - Myrtille Deldossi
- Atos SE, Center of Excellence in Advanced Computing, 38130 Echirolles, France
| | - Stéphanie Baud
- Université de Reims Champagne Ardenne, CNRS, MEDyC UMR 7369, 51097 Reims, France
| | | |
|
80
|
Meli R, Anighoro A, Bodkin MJ, Morris GM, Biggin PC. Learning protein-ligand binding affinity with atomic environment vectors. J Cheminform 2021; 13:59. [PMID: 34391475 PMCID: PMC8364054 DOI: 10.1186/s13321-021-00536-w] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 06/04/2021] [Accepted: 07/21/2021] [Indexed: 12/03/2022] Open
Abstract
Scoring functions for the prediction of protein-ligand binding affinity have seen renewed interest in recent years when novel machine learning and deep learning methods started to consistently outperform classical scoring functions. Here we explore the use of atomic environment vectors (AEVs) and feed-forward neural networks, the building blocks of several neural network potentials, for the prediction of protein-ligand binding affinity. The AEV-based scoring function, which we term AEScore, is shown to perform as well or better than other state-of-the-art scoring functions on binding affinity prediction, with an RMSE of 1.22 pK units and a Pearson’s correlation coefficient of 0.83 for the CASF-2016 benchmark. However, AEScore does not perform as well in docking and virtual screening tasks, for which it has not been explicitly trained. Therefore, we show that the model can be combined with the classical scoring function AutoDock Vina in the context of Δ-learning, where corrections to the AutoDock Vina scoring function are learned instead of the protein-ligand binding affinity itself. Combined with AutoDock Vina, Δ-AEScore has an RMSE of 1.32 pK units and a Pearson’s correlation coefficient of 0.80 on the CASF-2016 benchmark, while retaining the docking and screening power of the underlying classical scoring function.
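The Δ-learning idea in this abstract (learn a correction to a baseline score rather than the affinity itself) can be illustrated with a minimal sketch. This is not the AEScore implementation: the neural network is replaced here by a one-variable least-squares fit, and the names `fit_delta_correction`, `predict_with_delta`, and `descriptor`, as well as all numbers, are hypothetical and synthetic.

```python
from statistics import mean

def fit_delta_correction(descriptor, baseline, measured):
    """Fit residual = slope * descriptor + intercept by ordinary least squares.

    The residual is (measured - baseline), i.e. what the baseline score
    gets wrong; the model learns that correction, not the affinity itself.
    """
    residuals = [m - b for m, b in zip(measured, baseline)]
    x_bar, r_bar = mean(descriptor), mean(residuals)
    sxx = sum((x - x_bar) ** 2 for x in descriptor)
    sxr = sum((x - x_bar) * (r - r_bar) for x, r in zip(descriptor, residuals))
    slope = sxr / sxx
    intercept = r_bar - slope * x_bar
    return slope, intercept

def predict_with_delta(descriptor, baseline, slope, intercept):
    """Final prediction = baseline score + learned correction."""
    return [b + slope * x + intercept for x, b in zip(descriptor, baseline)]

def rmse(pred, ref):
    """Root-mean-square error between two equally long sequences."""
    return (sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref)) ** 0.5

# Synthetic illustration only: these numbers are not from the paper.
descriptor = [0.0, 1.0, 2.0, 3.0]   # hypothetical per-complex feature
baseline = [5.0, 6.0, 4.0, 7.0]     # hypothetical baseline scores (pK units)
measured = [b + 0.5 * x - 1.0 for x, b in zip(descriptor, baseline)]

slope, intercept = fit_delta_correction(descriptor, baseline, measured)
corrected = predict_with_delta(descriptor, baseline, slope, intercept)
```

Because the baseline score is kept and only adjusted, the corrected score stays anchored to the classical function, which is consistent with the abstract's remark that docking and screening power are retained.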
Affiliation(s)
- Rocco Meli
- Department of Biochemistry, University of Oxford, Oxford, UK
| | | | | | | | - Philip C Biggin
- Department of Biochemistry, University of Oxford, Oxford, UK.
| |
|
81
|
Jabir NR, Rehman MT, Tabrez S, Alserihi RF, AlAjmi MF, Khan MS, Husain FM, Ahmed BA. Identification of Butyrylcholinesterase and Monoamine Oxidase B Targeted Ligands and their Putative Application in Alzheimer's Treatment: A Computational Strategy. Curr Pharm Des 2021; 27:2425-2434. [PMID: 33634754 DOI: 10.2174/1381612827666210226123240] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/04/2020] [Accepted: 02/19/2021] [Indexed: 11/22/2022]
Abstract
BACKGROUND With the burgeoning worldwide aging population, the incidence of Alzheimer's disease (AD) and its associated disorders is continuously rising. To appraise other relevant drug targets that could lead to potent enzyme targeting, 13 previously predicted ligands (which showed favorable binding with acetylcholinesterase (AChE) and glycogen synthase kinase-3 (GSK-3)) were screened against 3 different enzymes, namely butyrylcholinesterase (BChE), monoamine oxidase A (MAO-A), and monoamine oxidase B (MAO-B), to possibly meet the unmet medical need for better AD treatment. MATERIALS AND METHODS The study utilized in silico screening of 13 ligands against BChE, MAO-A, and MAO-B using PyRx-Python Prescription 0.8. The interactions of the studied compounds with the targeted proteins were visualized in Discovery Studio 2020 (BIOVIA). RESULTS The computational screening revealed docking energies in the range of -2.4 to -11.3 kcal/mol across the studied enzymes. Among the 13 ligands, 8 (55E, 6Z2, 6Z5, BRW, F1B, GVP, IQ6, and X37) showed binding energies of ≤ -8.0 kcal/mol towards BChE, MAO-A, and MAO-B. The ligand 6Z5 was found to be the most potent inhibitor of BChE and MAO-B, with binding energies of -9.7 and -10.4 kcal/mol, respectively. Molecular dynamics simulation of the BChE-6Z5 and MAO-B-6Z5 complexes confirmed the formation of stable complexes. CONCLUSION Our computational screening, molecular docking, and molecular dynamics simulation studies revealed that these enzyme-targeted ligands might expedite the future design of potent anti-AD drugs based on this chemical scaffold.
Affiliation(s)
- Nasimudeen R Jabir
- Department of Biochemistry, Centre for Research and Development, PRIST University, Vallam, Thanjavur, Tamil Nadu, India
| | - Md Tabish Rehman
- Department of Pharmacognosy, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
| | - Shams Tabrez
- King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Raed F Alserihi
- Department of Medical Laboratory Technology, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Mohamed F AlAjmi
- Department of Pharmacognosy, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
| | - Mohd Shahnawaz Khan
- Protein Research Chair, Department of Biochemistry, College of Sciences, King Saud University, Riyadh, Saudi Arabia
| | - Fohad Mabood Husain
- Department of Food Science and Nutrition, Faculty of Food and Agriculture Sciences, King Saud University, Riyadh, Saudi Arabia
| | - Bakrudeen Ali Ahmed
- Department of Biochemistry, Centre for Research and Development, PRIST University, Vallam, Thanjavur, Tamil Nadu, India
| |
|
82
|
Xiong G, Shen C, Yang Z, Jiang D, Liu S, Lu A, Chen X, Hou T, Cao D. Featurization strategies for protein–ligand interactions and their applications in scoring function development. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2021. [DOI: 10.1002/wcms.1567] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/18/2022]
Affiliation(s)
- Guoli Xiong
- Xiangya School of Pharmaceutical Sciences Central South University Changsha China
| | - Chao Shen
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences Zhejiang University Hangzhou China
| | - Ziyi Yang
- Xiangya School of Pharmaceutical Sciences Central South University Changsha China
| | - Dejun Jiang
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences Zhejiang University Hangzhou China
- College of Computer Science and Technology Zhejiang University Hangzhou China
| | - Shao Liu
- Department of Pharmacy Xiangya Hospital, Central South University Changsha China
| | - Aiping Lu
- Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine Hong Kong Baptist University Hong Kong SAR China
| | - Xiang Chen
- Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis Xiangya Hospital, Central South University Changsha China
| | - Tingjun Hou
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences Zhejiang University Hangzhou China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences Central South University Changsha China
- Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine Hong Kong Baptist University Hong Kong SAR China
| |
|
83
|
McNutt AT, Francoeur P, Aggarwal R, Masuda T, Meli R, Ragoza M, Sunseri J, Koes DR. GNINA 1.0: molecular docking with deep learning. J Cheminform 2021; 13:43. [PMID: 34108002 PMCID: PMC8191141 DOI: 10.1186/s13321-021-00522-2] [Citation(s) in RCA: 175] [Impact Index Per Article: 58.3] [Received: 01/19/2021] [Accepted: 05/26/2021] [Indexed: 12/20/2022] Open
Abstract
Molecular docking computationally predicts the conformation of a small molecule when binding to a receptor. Scoring functions are a vital piece of any molecular docking pipeline as they determine the fitness of sampled poses. Here we describe and evaluate the 1.0 release of the GNINA docking software, which utilizes an ensemble of convolutional neural networks (CNNs) as a scoring function. We also explore an array of parameter values for GNINA 1.0 to optimize docking performance and computational cost. Docking performance, as evaluated by the percentage of targets where the top pose is better than 2 Å root mean square deviation (Top1), is compared to AutoDock Vina scoring when utilizing explicitly defined binding pockets or whole protein docking. GNINA, utilizing a CNN scoring function to rescore the output poses, outperforms AutoDock Vina scoring on redocking and cross-docking tasks when the binding pocket is defined (Top1 increases from 58% to 73% and from 27% to 37%, respectively) and when the whole protein defines the binding pocket (Top1 increases from 31% to 38% and from 12% to 16%, respectively). The derived ensemble of CNNs generalizes to unseen proteins and ligands and produces scores that correlate well with the root mean square deviation to the known binding pose. We provide the 1.0 version of GNINA under an open source license for use as a molecular docking tool at https://github.com/gnina/gnina .
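The Top1 metric described in this abstract (the share of targets whose top-ranked pose is better than 2 Å RMSD) can be sketched as follows. This is a plain-Python illustration, not code from GNINA; the function name `top1_success_rate` and the RMSD values are hypothetical.

```python
def top1_success_rate(top_pose_rmsds, threshold=2.0):
    """Fraction of targets whose top-ranked pose has RMSD below `threshold` (Å).

    `top_pose_rmsds` holds one value per target: the RMSD between the
    top-scored docked pose and the known binding pose.
    """
    if not top_pose_rmsds:
        raise ValueError("need at least one target")
    hits = sum(1 for rmsd in top_pose_rmsds if rmsd < threshold)
    return hits / len(top_pose_rmsds)

# Hypothetical RMSDs for five targets (not data from the paper).
example_rmsds = [0.8, 1.9, 2.0, 3.5, 6.1]
rate = top1_success_rate(example_rmsds)
```

The comparison is strict here, so a pose at exactly 2.0 Å does not count as a success; whether the boundary is inclusive is an assumption of this sketch.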
Affiliation(s)
- Andrew T McNutt
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
| | - Paul Francoeur
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
| | - Rishal Aggarwal
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500 032, India
| | - Tomohide Masuda
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
| | - Rocco Meli
- Department of Biochemistry, University of Oxford, Oxford, United Kingdom
| | - Matthew Ragoza
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
| | - Jocelyn Sunseri
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
| | - David Ryan Koes
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA.
| |
|
84
|
Qin T, Zhu Z, Wang XS, Xia J, Wu S. Computational representations of protein-ligand interfaces for structure-based virtual screening. Expert Opin Drug Discov 2021; 16:1175-1192. [PMID: 34011222 DOI: 10.1080/17460441.2021.1929921] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/30/2022]
Abstract
Introduction: Structure-based virtual screening (SBVS) is an essential strategy for hit identification. SBVS primarily uses molecular docking, which exploits the protein-ligand binding mode and associated affinity score for compound ranking. Previous studies have shown that computational representation of protein-ligand interfaces and the later establishment of machine learning models are efficacious in improving the accuracy of SBVS. Areas covered: The authors review the computational methods for representing protein-ligand interfaces, which include the traditional ones that use deliberately designed fingerprints and descriptors and the more recent methods that automatically extract features with deep learning. The effects of these methods on the performance of machine learning models are briefly discussed. Additionally, case studies that applied various computational representations to machine learning are cited with remarks. Expert opinion: It has become a trend to extract binding features automatically by deep learning, which uses a completely end-to-end representation. However, there is still plenty of scope for improvement. The interpretability of deep-learning models, the organization of data management, the quantity and quality of available data, and the optimization of hyperparameters could impact the accuracy of feature extraction. In addition, other important structural factors such as water molecules and protein flexibility should be considered.
Affiliation(s)
- Tong Qin
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Department of New Drug Research and Development, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Zihao Zhu
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Department of New Drug Research and Development, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Xiang Simon Wang
- Artificial Intelligence and Drug Discovery Core Laboratory for District of Columbia Center for AIDS Research (DC CFAR), Department of Pharmaceutical Sciences, College of Pharmacy, Howard University, U.S.A
| | - Jie Xia
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Department of New Drug Research and Development, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Song Wu
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Department of New Drug Research and Development, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| |
|
85
|
Francoeur PG, Koes DR. SolTranNet-A Machine Learning Tool for Fast Aqueous Solubility Prediction. J Chem Inf Model 2021; 61:2530-2536. [PMID: 34038123 DOI: 10.1021/acs.jcim.1c00331] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Indexed: 11/28/2022]
Abstract
While accurate prediction of aqueous solubility remains a challenge in drug discovery, machine learning (ML) approaches have become increasingly popular for this task. For instance, in the Second Challenge to Predict Aqueous Solubility (SC2), all groups utilized machine learning methods in their submissions. We present SolTranNet, a molecule attention transformer to predict aqueous solubility from a molecule's SMILES representation. Atypically, we demonstrate that larger models perform worse at this task, with SolTranNet's final architecture having 3,393 parameters while outperforming linear ML approaches. SolTranNet has a 3-fold scaffold split cross-validation root-mean-square error (RMSE) of 1.459 on AqSolDB and an RMSE of 1.711 on a withheld test set. We also demonstrate that, when used as a classifier to filter out insoluble compounds, SolTranNet achieves a sensitivity of 94.8% on the SC2 data set and is competitive with the other methods submitted to the competition. SolTranNet is distributed via pip, and its source code is available at https://github.com/gnina/SolTranNet.
Affiliation(s)
- Paul G Francoeur
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| | - David R Koes
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| |
|
86
|
Deng D, Chen X, Zhang R, Lei Z, Wang X, Zhou F. XGraphBoost: Extracting Graph Neural Network-Based Features for a Better Prediction of Molecular Properties. J Chem Inf Model 2021; 61:2697-2705. [PMID: 34009965 DOI: 10.1021/acs.jcim.0c01489] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Indexed: 01/02/2023]
Abstract
Determining the properties of chemical molecules is essential for screening candidates similar to a specific drug. These candidate molecules are further evaluated for their target binding affinities, side effects, target missing probabilities, etc. Conventional machine learning algorithms demonstrated satisfying prediction accuracies of molecular properties. A molecule cannot be directly loaded into a machine learning model, and a set of engineered features needs to be designed and calculated from a molecule. Such hand-crafted features rely heavily on the experiences of the investigating researchers. The concept of graph neural networks (GNNs) was recently introduced to describe the chemical molecules. The features may be automatically and objectively extracted from the molecules through various types of GNNs, e.g., GCN (graph convolution network), GGNN (gated graph neural network), DMPNN (directed message passing neural network), etc. However, the training of a stable GNN model requires a huge number of training samples and a large amount of computing power, compared with the conventional machine learning strategies. This study proposed the integrated framework XGraphBoost to extract the features using a GNN and build an accurate prediction model of molecular properties using the classifier XGBoost. The proposed framework XGraphBoost fully inherits the merits of the GNN-based automatic molecular feature extraction and XGBoost-based accurate prediction performance. Both classification and regression problems were evaluated using the framework XGraphBoost. The experimental results strongly suggest that XGraphBoost may facilitate the efficient and accurate predictions of various molecular properties. The source code is freely available to academic users at https://github.com/chenxiaowei-vincent/XGraphBoost.git.
Affiliation(s)
- Daiguo Deng
- Fermion Technology Co., Ltd., Guangzhou, Guangdong 510000, P.R. China
| | - Xiaowei Chen
- Fermion Technology Co., Ltd., Guangzhou, Guangdong 510000, P.R. China
| | - Ruochi Zhang
- Fermion Technology Co., Ltd., Guangzhou, Guangdong 510000, P.R. China.,College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, P.R. China
| | - Zengrong Lei
- Fermion Technology Co., Ltd., Guangzhou, Guangdong 510000, P.R. China
| | - Xiaojian Wang
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P.R. China
| | - Fengfeng Zhou
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, P.R. China
| |
|
87
|
Son J, Kim D. Development of a graph convolutional neural network model for efficient prediction of protein-ligand binding affinities. PLoS One 2021; 16:e0249404. [PMID: 33831016 PMCID: PMC8031450 DOI: 10.1371/journal.pone.0249404] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 07/14/2020] [Accepted: 03/17/2021] [Indexed: 12/15/2022] Open
Abstract
Prediction of protein-ligand interactions is a critical step during the initial phase of drug discovery. We propose a novel deep-learning-based prediction model based on a graph convolutional neural network, named GraphBAR, for protein-ligand binding affinity. Graph convolutional neural networks reduce the computational time and resources that are normally required by the traditional convolutional neural network models. In this technique, the structure of a protein-ligand complex is represented as a graph of multiple adjacency matrices whose entries are affected by distances, and a feature matrix that describes the molecular properties of the atoms. We evaluated the predictive power of GraphBAR for protein-ligand binding affinities by using PDBbind datasets and proved the efficiency of the graph convolution. Given the computational efficiency of graph convolutional neural networks, we also performed data augmentation to improve the model performance. We found that data augmentation with docking simulation data could improve the prediction accuracy although the improvement seems not to be significant. The high prediction performance and speed of GraphBAR suggest that such networks can serve as valuable tools in drug discovery.
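The graph representation mentioned in this abstract (several adjacency matrices whose entries depend on interatomic distances) can be sketched in a simplified form. The sketch below assumes binary entries and one matrix per distance shell; the actual GraphBAR construction may differ, and the function name `distance_binned_adjacency`, the bin edges, and the coordinates are hypothetical.

```python
from math import dist

def distance_binned_adjacency(coords, bin_edges):
    """Return one symmetric 0/1 adjacency matrix per distance bin.

    Atoms i and j are connected in matrix k when their distance d
    satisfies bin_edges[k] < d <= bin_edges[k + 1].
    """
    n = len(coords)
    n_bins = len(bin_edges) - 1
    matrices = [[[0] * n for _ in range(n)] for _ in range(n_bins)]
    for i in range(n):
        for j in range(i + 1, n):
            d = dist(coords[i], coords[j])
            for k in range(n_bins):
                if bin_edges[k] < d <= bin_edges[k + 1]:
                    matrices[k][i][j] = matrices[k][j][i] = 1
                    break  # each atom pair falls into at most one bin
    return matrices

# Three hypothetical atoms (coordinates in Å) and two hypothetical shells.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 3.0, 0.0)]
adj = distance_binned_adjacency(coords, bin_edges=[0.0, 2.0, 4.0])
```

Here `adj[0]` links only the pair 1.5 Å apart, while `adj[1]` links the two pairs that are between 2 Å and 4 Å apart. A feature matrix of per-atom properties, as the abstract describes, would accompany these matrices as input to the graph convolution.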
Affiliation(s)
- Jeongtae Son
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
| | - Dongsup Kim
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
| |
|
88
|
Abstract
Docking can predict binding poses and affinities accurately if proper care is taken. Accounting for the entropic penalty to the binding energy that arises when flexible ligands lose conformational freedom on binding is computationally difficult, but it is very important for obtaining a reliable ranking of ligand binding affinities to specific protein targets.
Affiliation(s)
- David A Winkler
- La Trobe University, Kingsbury Drive, Bundoora 3042, Australia.,Monash Institute of Pharmaceutical Sciences, Monash University, Parkville 3052, Australia.,School of Pharmacy, University of Nottingham, Nottingham NG7 2RD, United Kingdom.,CSIRO Data61, Pullenvale 4069, Australia
| |
|