1
|
Stadler PF, Will S. Bi-alignments with affine gaps costs. Algorithms Mol Biol 2022; 17:10. [PMID: 35578255 PMCID: PMC9109335 DOI: 10.1186/s13015-022-00219-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2021] [Accepted: 05/01/2022] [Indexed: 12/02/2022] Open
Abstract
Background: Commonly, sequence and structure elements are assumed to evolve congruently, such that homologous sequence positions correspond to homologous structural features. Assuming congruent evolution, alignments based on sequence and structure similarity can therefore optimize both similarities at the same time in a single alignment. To model incongruent evolution, where sequence and structural features diverge positionally, we recently introduced bi-alignments. This generalization of sequence and structure-based alignments is best understood as alignments of two distinct pairwise alignments of the same entities: one modeling sequence similarity, the other structural similarity.
Results: Optimal bi-alignments with affine gap costs (or affine shift cost) for two constituent alignments can be computed exactly in quartic space and time. Even bi-alignments with affine shift and gap cost, as well as bi-alignment with sub-additive gap cost are optimized efficiently. Affine gap-cost bi-alignment of large proteins (∼930 aa) can be computed.
Conclusion: Affine cost bi-alignments are of practical interest to study shifts of protein sequences and protein structures relative to each other.
Availability: The affine cost bi-alignment algorithm has been implemented in Python 3 and Cython. It is available as free software from https://github.com/s-will/BiAlign/releases/tag/v0.3 and as bioconda package bialign.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13015-022-00219-7.
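For readers unfamiliar with affine gap costs, the following is a minimal sketch of the classical three-state recurrence for an ordinary pairwise alignment, where a gap of length k costs `gap_open + k * gap_extend`. It is not the bi-alignment algorithm of the paper and is not taken from BiAlign; the function name and all scoring parameters are illustrative assumptions.

```python
NEG_INF = float("-inf")


def affine_gap_score(a, b, match=1.0, mismatch=-1.0, gap_open=-2.0, gap_extend=-0.5):
    """Optimal global alignment score under an affine gap cost.

    A gap of length k contributes gap_open + k * gap_extend.
    All parameter values are hypothetical defaults chosen for illustration.
    """
    n, m = len(a), len(b)
    # M: a[i-1] is aligned to b[j-1]
    # X: a[i-1] is aligned to a gap
    # Y: b[j-1] is aligned to a gap
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_open + i * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + j * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap pays gap_open once; extending pays only gap_extend.
            X[i][j] = max(
                M[i - 1][j] + gap_open + gap_extend,
                Y[i - 1][j] + gap_open + gap_extend,
                X[i - 1][j] + gap_extend,
            )
            Y[i][j] = max(
                M[i][j - 1] + gap_open + gap_extend,
                X[i][j - 1] + gap_open + gap_extend,
                Y[i][j - 1] + gap_extend,
            )
    return max(M[n][m], X[n][m], Y[n][m])
```

With these assumed parameters, aligning `"AAAA"` to `"AA"` uses one gap of length 2 (cost -3.0) rather than two gaps of length 1 (cost -5.0), which is the behaviour an affine cost is meant to favour.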
|
2
|
Abstract
Computational methods play an increasingly important role in drug discovery. Structure-based drug design (SBDD), in particular, includes techniques that take into account the structure of the macromolecular target to predict compounds that are likely to establish optimal interactions with the binding site. The current interest in machine learning algorithms based on deep neural networks encouraged the application of deep learning to SBDD related problems. This chapter covers selected works in this active area of research.
|
3
|
Grebner C, Matter H, Hessler G. Artificial Intelligence in Compound Design. Methods Mol Biol 2021; 2390:349-382. [PMID: 34731477 DOI: 10.1007/978-1-0716-1787-8_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2023]
Abstract
Artificial intelligence has seen an incredibly fast development in recent years. Many novel technologies for property prediction of drug molecules as well as for the design of novel molecules were introduced by different research groups. These artificial intelligence-based design methods can be applied for suggesting novel chemical motifs in lead generation or scaffold hopping as well as for optimization of desired property profiles during lead optimization. In lead generation, broad sampling of the chemical space for identification of novel motifs is required, while in the lead optimization phase, a detailed exploration of the chemical neighborhood of a current lead series is advantageous. These different requirements for successful design outcomes render different combinations of artificial intelligence technologies useful. Overall, we observe that a combination of different approaches with tailored scoring and evaluation schemes appears beneficial for efficient artificial intelligence-based compound design.
Affiliation(s)
- Christoph Grebner
  - Sanofi-Aventis Deutschland GmbH, R&D, Integrated Drug Discovery, Frankfurt am Main, Germany
- Hans Matter
  - Sanofi-Aventis Deutschland GmbH, R&D, Integrated Drug Discovery, Frankfurt am Main, Germany
- Gerhard Hessler
  - Sanofi-Aventis Deutschland GmbH, R&D, Integrated Drug Discovery, Frankfurt am Main, Germany
|
4
|
Abstract
Molecular docking has become an important component of the drug discovery process. Since first being developed in the 1980s, advancements in the power of computer hardware and the increasing number of and ease of access to small molecule and protein structures have contributed to the development of improved methods, making docking more popular in both industrial and academic settings. Over the years, the modalities by which docking is used to assist the different tasks of drug discovery have changed. Although initially developed and used as a standalone method, docking is now mostly employed in combination with other computational approaches within integrated workflows. Despite its invaluable contribution to the drug discovery process, molecular docking is still far from perfect. In this chapter we will provide an introduction to molecular docking and to the different docking procedures with a focus on several considerations and protocols, including protonation states, active site waters and consensus, that can greatly improve the docking results.
Affiliation(s)
- Ilenia Giangreco
  - Cambridge Crystallographic Data Centre, Cambridge, United Kingdom
- Jason C Cole
  - Cambridge Crystallographic Data Centre, Cambridge, United Kingdom
|
5
|
Ghislat G, Rahman T, Ballester PJ. Recent progress on the prospective application of machine learning to structure-based virtual screening. Curr Opin Chem Biol 2021; 65:28-34. [PMID: 34052776 DOI: 10.1016/j.cbpa.2021.04.009] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 03/03/2021] [Revised: 04/13/2021] [Accepted: 04/23/2021] [Indexed: 12/30/2022]
Abstract
As more bioactivity and protein structure data become available, scoring functions (SFs) using machine learning (ML) to leverage these data sets continue to gain further accuracy and broader applicability. Advances in our understanding of the optimal ways to train and evaluate these ML-based SFs have introduced further improvements. One of these advances is how to select the most suitable decoys (molecules assumed inactive) to train or test an ML-based SF on a given target. We also review the latest applications of ML-based SFs for prospective structure-based virtual screening (SBVS), with a focus on the observed improvement over those using classical SFs. Finally, we provide recommendations for future prospective SBVS studies based on the findings of recent methodological studies.
Affiliation(s)
- Ghita Ghislat
  - U1104, CNRS UMR7280, Centre D'Immunologie de Marseille-Luminy, Inserm, Marseille, France
- Taufiq Rahman
  - Department of Pharmacology, University of Cambridge, Cambridge, CB2 1PD, UK
- Pedro J Ballester
  - Centre de Recherche en Cancérologie de Marseille (CRCM), Inserm, U1068, Marseille, F-13009, France; CNRS, UMR7258, Marseille, F-13009, France; Institut Paoli-Calmettes, Marseille, F-13009, France; Aix-Marseille University, UM 105, F-13284, Marseille, France
|
6
|
Siddiqa MA, Rao DS, Suvarna G, Chennamachetty VK, Verma MK, Rao MVR. In-Silico Drug Designing of Spike Receptor with Its ACE2 Receptor and Nsp10/Nsp16 MTase Complex Against SARS-CoV-2. Int J Pept Res Ther 2021;:1-8. [PMID: 33746660 DOI: 10.1007/s10989-021-10196-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 03/02/2021] [Indexed: 12/23/2022]
Abstract
Coronaviruses, which belong to the realm Riboviria, led to the emergence of the COVID-19 pandemic in the twenty-first century, which has affected millions of lives. At present, the management of COVID-19 largely depends on antiviral therapeutics along with anti-inflammatory drugs. Vaccines are in the final clinical phase and are available for emergency use. In the present work, we target ACE2 and Nsp10/Nsp16 MTase as potential drug candidates in COVID-19 management. For drug design, various computational tools were employed: Swiss-Model for homology modeling, HawkDock for the binding energies of the molecule with a target, HDOCK for simulating the conformation and binding poses, pyDock for the statistics of the protein lock with the target key, and PockDrug for druggability. The current in-silico screening indicates that the spike protein receptor is complementary to the target when the two are bound and forms a stable complex. The MM-GBSA binding free energy of receptor and ligand is critical. The intermolecular statistics with the target Nsp10/Nsp16 MTase complex are plausible. We also observed a high-affinity binding pocket on the target. Therefore, the favorable intermolecular interactions and physico-chemical properties point to a drug candidate for treating COVID-19. This study used computational tools to analyze the conformation, binding affinity, and druggability of the receptor-ligand complex. Thus, the spike receptor with its ACE2 receptor and the Nsp10/Nsp16 MTase complex would be a potent drug against SARS-CoV-2 and can cure the infection as per consensus scoring.
|
7
|
Zhang X, Shen C, Guo X, Wang Z, Weng G, Ye Q, Wang G, He Q, Yang B, Cao D, Hou T. ASFP (Artificial Intelligence based Scoring Function Platform): a web server for the development of customized scoring functions. J Cheminform 2021; 13:6. [PMID: 33541407 PMCID: PMC7860246 DOI: 10.1186/s13321-021-00486-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/19/2020] [Accepted: 01/17/2021] [Indexed: 12/18/2022] Open
Abstract
Virtual screening (VS) based on molecular docking has emerged as one of the mainstream technologies of drug discovery due to its low cost and high efficiency. However, the scoring functions (SFs) implemented in most docking programs are not always accurate enough and how to improve their prediction accuracy is still a big challenge. Here, we propose an integrated platform called ASFP, a web server for the development of customized SFs for structure-based VS. There are three main modules in ASFP: (1) the descriptor generation module that can generate up to 3437 descriptors for the modelling of protein–ligand interactions; (2) the AI-based SF construction module that can establish target-specific SFs based on the pre-generated descriptors through three machine learning (ML) techniques; (3) the online prediction module that provides some well-constructed target-specific SFs for VS and an additional generic SF for binding affinity prediction. Our methodology has been validated on several benchmark datasets. The target-specific SFs can achieve an average ROC AUC of 0.973 towards 32 targets and the generic SF can achieve the Pearson correlation coefficient of 0.81 on the PDBbind version 2016 core set. To sum up, the ASFP server is a powerful tool for structure-based VS.
Affiliation(s)
- Xujun Zhang
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China
- Chao Shen
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China
- Xueying Guo
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China
- Zhe Wang
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China
- Gaoqi Weng
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China
- Qing Ye
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China
- Gaoang Wang
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China
- Qiaojun He
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China
- Bo Yang
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan, 10013, China
- Tingjun Hou
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China
  - State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang, 310058, China
|
8
|
Wang S, Jiang JH, Li RY, Deng P. Docking-based virtual screening of TβR1 inhibitors: evaluation of pose prediction and scoring functions. BMC Chem 2020; 14:52. [PMID: 32818203 PMCID: PMC7427878 DOI: 10.1186/s13065-020-00704-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 03/04/2020] [Accepted: 08/06/2020] [Indexed: 12/22/2022] Open
Abstract
To improve the reliability of virtual screening for transforming growth factor-beta type 1 receptor (TβR1) inhibitors, 2 docking methods and 11 scoring functions in Discovery Studio software were evaluated and validated in this study. LibDock and CDOCKER protocols were performed on a test set of 24 TβR1 protein-ligand complexes. Based on the root-mean-square deviation (RMSD) values (in Å) between the docking poses and co-crystal conformations, the CDOCKER protocol can be efficiently applied to obtain more accurate dockings in medium-size virtual screening experiments of TβR1, with a successful docking rate of 95%. A dataset including 281 known active and 8677 inactive ligands was used to determine the best scoring function. Receiver operating characteristic (ROC) curves were used to compare how well the scoring functions assign better scores to active than to inactive ligands. The results show that Ludi 1, PMF, Ludi 2, Ludi 3, PMF04, PLP1, PLP2, LigScore2, Jain and LigScore1 are better scoring functions than the random distribution model, with AUC of 0.864, 0.856, 0.842, 0.812, 0.776, 0.774, 0.769, 0.762, 0.697 and 0.660, respectively. Based on the pairwise comparison of ROC curves, Ludi 1 and PMF were chosen as the best scoring functions for virtual screening of TβR1 inhibitors. Further enrichment factor (EF) analysis also supports PMF and Ludi 1 as the top two scoring functions.
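The two ranking metrics named in this abstract, ROC AUC and enrichment factor, can be sketched as follows. This is a minimal illustration, not the Discovery Studio implementation; it assumes that a higher score means a better-ranked ligand (negate the scores first if lower is better), and the function names and the top-fraction rounding rule are choices made here for the example.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a random active outscores a random inactive.

    Ties count as half a win. labels: 1 for active, 0 for inactive.
    Pairwise counting is O(n_active * n_inactive), which is fine for a sketch.
    """
    actives = [s for s, y in zip(scores, labels) if y]
    inactives = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in actives:
        for q in inactives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


def enrichment_factor(scores, labels, fraction):
    """Hit rate in the top-scoring fraction divided by the hit rate of the whole set."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, int(round(fraction * len(ranked))))
    hits_top = sum(1 for _, y in ranked[:n_top] if y)
    hits_all = sum(1 for y in labels if y)
    return (hits_top / n_top) / (hits_all / len(labels))
```

An AUC of 0.5 corresponds to random ranking, and an enrichment factor of 1.0 means the top fraction contains actives at the same rate as the full library.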
Affiliation(s)
- Shuai Wang
  - College of Pharmacy, Chongqing Medical University, Chongqing, 400016 China
- Jun-Hao Jiang
  - College of Pharmacy, Chongqing Medical University, Chongqing, 400016 China
- Ruo-Yu Li
  - College of Pharmacy, Chongqing Medical University, Chongqing, 400016 China
- Ping Deng
  - College of Pharmacy, Chongqing Medical University, Chongqing, 400016 China
|
9
|
Soni A, Bhat R, Jayaram B. Improving the binding affinity estimations of protein-ligand complexes using machine-learning facilitated force field method. J Comput Aided Mol Des 2020; 34:817-30. [PMID: 32185583 DOI: 10.1007/s10822-020-00305-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/16/2019] [Accepted: 03/07/2020] [Indexed: 10/24/2022]
Abstract
Scoring functions are routinely deployed in structure-based drug design to quantify the potential for protein-ligand (PL) complex formation. Here, we present a new scoring function, Bappl+, that is designed to predict the binding affinities of non-metallo and metallo PL complexes. Bappl+ outperforms other state-of-the-art scoring functions, achieving a high Pearson correlation coefficient of up to ~0.76 with low standard deviations. The biggest contributors to the increased performance are the use of a machine-learning model and the enlarged training dataset. We have also evaluated the performance of Bappl+ on target-specific proteins, which highlighted the limitations of our function and provides a way for further improvements. We believe that the Bappl+ methodology could prove valuable in ranking candidate molecules against a target metallo or non-metallo protein by reliably predicting their binding affinities, thus helping in the drug discovery process.
|
10
|
El Khoury L, Santos-Martins D, Sasmal S, Eberhardt J, Bianco G, Ambrosio FA, Solis-Vasquez L, Koch A, Forli S, Mobley DL. Comparison of affinity ranking using AutoDock-GPU and MM-GBSA scores for BACE-1 inhibitors in the D3R Grand Challenge 4. J Comput Aided Mol Des 2019; 33:1011-1020. [PMID: 31691919 PMCID: PMC7027993 DOI: 10.1007/s10822-019-00240-w] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 06/21/2019] [Accepted: 10/21/2019] [Indexed: 11/25/2022]
Abstract
Molecular docking has been successfully used in computer-aided molecular design projects for the identification of ligand poses within protein binding sites. However, relying on docking scores to rank different ligands with respect to their experimental affinities might not be sufficient. It is believed that the binding scores calculated using molecular mechanics combined with the Poisson-Boltzmann surface area (MM-PBSA) or generalized Born surface area (MM-GBSA) can predict binding affinities more accurately. In this perspective, we decided to take part in Stage 2 of the Drug Design Data Resource (D3R) Grand Challenge 4 (GC4) to compare the performance of a quick scoring function, AutoDock4, to that of MM-GBSA in predicting the binding affinities of a set of β-Amyloid Cleaving Enzyme 1 (BACE-1) ligands. Our results show that re-scoring docking poses using MM-GBSA did not improve the correlation with experimental affinities. We further did a retrospective analysis of the results and found that our MM-GBSA protocol is sensitive to details in the protein-ligand system: (i) neutral ligands are more adapted to MM-GBSA calculations than charged ligands, (ii) predicted binding affinities depend on the initial conformation of the BACE-1 receptor, (iii) protonating the aspartyl dyad of BACE-1 correctly results in more accurate binding affinity predictions.
Affiliation(s)
- Léa El Khoury
  - Department of Pharmaceutical Sciences, University of California, Irvine, Irvine, USA
- Diogo Santos-Martins
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA, 92037-1000, USA
- Sukanya Sasmal
  - Department of Pharmaceutical Sciences, University of California, Irvine, Irvine, USA
- Jérôme Eberhardt
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA, 92037-1000, USA
- Giulia Bianco
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA, 92037-1000, USA
- Francesca Alessandra Ambrosio
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA, 92037-1000, USA
  - Department of Health Sciences, "Magna Græcia" University of Catanzaro, Campus "S. Venuta", Viale Europa, 88100, Catanzaro, Italy
- Leonardo Solis-Vasquez
  - Embedded Systems and Applications Group, Technische Universität Darmstadt, Darmstadt, Germany
- Andreas Koch
  - Embedded Systems and Applications Group, Technische Universität Darmstadt, Darmstadt, Germany
- Stefano Forli
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA, 92037-1000, USA
- David L Mobley
  - Department of Pharmaceutical Sciences, University of California, Irvine, Irvine, USA
  - Department of Chemistry, University of California, Irvine, 147 Bison Modular, Irvine, CA, 92697, USA
|
11
|
Joseph AP, Lagerstedt I, Patwardhan A, Topf M, Winn M. Improved metrics for comparing structures of macromolecular assemblies determined by 3D electron-microscopy. J Struct Biol 2017; 199:12-26. [PMID: 28552721 PMCID: PMC5479444 DOI: 10.1016/j.jsb.2017.05.007] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 10/15/2016] [Revised: 05/19/2017] [Accepted: 05/23/2017] [Indexed: 11/28/2022]
Abstract
Recent developments in 3-dimensional electron microscopy (3D-EM) techniques and a concomitant drive to look at complex molecular structures have led to a rapid increase in the amount of volume data available for biomolecules. This creates a demand for better methods to analyse the data, including improved scores for comparison, classification and integration of data at different resolutions. To this end, we developed and evaluated a set of scoring functions that compare 3D-EM volumes. To test our scores we used a benchmark set of volume alignments derived from the Electron Microscopy Data Bank. We find that the performance of different scores varies with the map type, resolution and the extent of overlap between volumes. Importantly, adding the overlap information to the local scoring functions can significantly improve their precision and accuracy in a range of resolutions. A combined score involving the local mutual information and overlap (LMI_OV) performs best overall, irrespective of the map category, resolution or the extent of overlap, and we recommend this score for general use. The local mutual information score itself is found to be more discriminatory than cross-correlation coefficient for intermediate-to-low resolution maps or when the map size and density distribution differ significantly. For comparing map surfaces, we implemented two filters to detect the surface points, including one based on the 'extent of surface exposure'. We show that scores that compare surfaces are useful at low resolutions and for maps with evident surface features. All the scores discussed are implemented in TEMPy (http://tempy.ismb.lon.ac.uk/).
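To make the contrast between cross-correlation and mutual information concrete, here is a minimal sketch of both scores on two density maps flattened to equal-length lists of voxel values. It is not the TEMPy implementation and does not reproduce the local masking or the overlap term of LMI_OV; the equal-width binning, the bin count and the function names are assumptions made for this example.

```python
import math


def cross_correlation(a, b):
    """Mean-centred cross-correlation coefficient between two voxel lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def mutual_information(a, b, bins=4):
    """Mutual information (in nats) after equal-width binning of each map."""

    def binned(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # guard against a constant map
        return [min(int((v - lo) / width), bins - 1) for v in values]

    ia, ib = binned(a), binned(b)
    n = len(a)
    joint, count_a, count_b = {}, {}, {}
    for i, j in zip(ia, ib):
        joint[(i, j)] = joint.get((i, j), 0) + 1
        count_a[i] = count_a.get(i, 0) + 1
        count_b[j] = count_b.get(j, 0) + 1
    # Sum of p(i, j) * log(p(i, j) / (p(i) * p(j))) over observed bin pairs.
    return sum(
        (c / n) * math.log(c * n / (count_a[i] * count_b[j]))
        for (i, j), c in joint.items()
    )
```

Because mutual information depends only on how the binned densities co-occur, it can stay high when two maps are related by a non-linear change in density scale, a case where a linear correlation coefficient drops.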
Affiliation(s)
- Agnel Praveen Joseph
  - Institute of Structural and Molecular Biology, Department of Biological Sciences, Birkbeck College, University of London, Malet Street, London WC1E 7HX, United Kingdom; Scientific Computing Department, Science and Technology Facilities Council, Research Complex at Harwell, Didcot OX11 0FA, United Kingdom
- Ingvar Lagerstedt
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom; Computational Chemistry and Cheminformatics, Lilly UK, Windlesham GU20 6PH, United Kingdom
- Ardan Patwardhan
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Maya Topf
  - Institute of Structural and Molecular Biology, Department of Biological Sciences, Birkbeck College, University of London, Malet Street, London WC1E 7HX, United Kingdom
- Martyn Winn
  - Scientific Computing Department, Science and Technology Facilities Council, Research Complex at Harwell, Didcot OX11 0FA, United Kingdom
|
12
|
Abstract
Background: One goal of structural biology is to understand how a protein’s 3-dimensional conformation determines its capacity to interact with potential ligands. In the case of small chemical ligands, deconstructing a static protein-ligand complex into its constituent atom-atom interactions is typically sufficient to rapidly predict ligand affinity with high accuracy (>70% correlation between predicted and experimentally-determined affinity), a fact that is exploited to support structure-based drug design. We recently found that protein-DNA/RNA affinity can also be predicted with high accuracy using extensions of existing techniques, but protein-protein affinity could not be predicted with >60% correlation, even when the protein-protein complex was available.
Methods: X-ray and NMR structures of protein-protein complexes, their associated binding affinities and experimental conditions were obtained from different binding affinity and structural databases. Statistical models were implemented using a generalized linear model framework, including the experimental conditions as new model features. We evaluated the potential for new features to improve affinity prediction models by calculating the Pearson correlation between predicted and experimental binding affinities on the training and test data after model fitting and after cross-validation. Differences in accuracy were assessed using two-sample t test and nonparametric Mann–Whitney U test.
Results: Here we evaluate a range of potential factors that may interfere with accurate protein-protein affinity prediction. We find that X-ray crystal resolution has the strongest single effect on protein-protein affinity prediction. Limiting our analyses to only high-resolution complexes (≤2.5 Å) increased the correlation between predicted and experimental affinity from 54 to 68% (p = 4.32 × 10⁻³). In addition, incorporating information on the experimental conditions under which affinities were measured (pH, temperature and binding assay) had significant effects on prediction accuracy. We also highlight a number of potential errors in large structure-affinity databases, which could affect both model training and accuracy assessment.
Conclusions: The results suggest that the accuracy of statistical models for protein-protein affinity prediction may be limited by the information present in databases used to train new models. Improving our capacity to integrate large-scale structural and functional information may be required to substantively advance our understanding of the general principles by which a protein’s structure determines its function.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1533-z) contains supplementary material, which is available to authorized users.
|
13
|
Abstract
Scoring functions that assess spectrum similarity play a crucial role in many computational mass spectrometry algorithms. These functions are used to compare an experimentally acquired fragmentation (MS/MS) spectrum against two different types of target MS/MS spectra: either against a theoretical MS/MS spectrum derived from a peptide from a sequence database, or against another, previously acquired MS/MS spectrum. The former is typically encountered in database searching, while the latter is used in spectrum clustering and spectral library searching. The comparison between acquired versus theoretical MS/MS spectra is most commonly performed using cross-correlations or probability derived scoring functions, while the comparison of two acquired MS/MS spectra typically makes use of a normalized dot product, especially in spectrum library search algorithms. In addition to these scoring functions, Pearson's or Spearman's correlation coefficients, mean squared error, or median absolute deviation scores can also be used for the same purpose. Here, we describe and evaluate these scoring functions with regards to their ability to assess spectrum similarity for theoretical versus acquired, and acquired versus acquired spectra.
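The normalized dot product mentioned for comparing two acquired MS/MS spectra can be sketched as a cosine similarity over binned peak intensities. This is a minimal illustration under stated assumptions: spectra are given as lists of (m/z, intensity) pairs, peaks are grouped into fixed-width m/z bins (the 1.0 bin width is an arbitrary choice here), and no intensity transformation or precursor filtering is applied. The function names are hypothetical.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins.

    peaks: iterable of (mz, intensity) pairs. Returns {bin_index: summed intensity}.
    """
    binned = {}
    for mz, intensity in peaks:
        key = int(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def normalized_dot_product(peaks_a, peaks_b, bin_width=1.0):
    """Cosine of the angle between two binned intensity vectors (0.0 to 1.0)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)
```

Because both vectors are divided by their lengths, the score is insensitive to overall intensity scale: a spectrum and a copy with every intensity doubled score 1.0, while two spectra with no shared bins score 0.0.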
Affiliation(s)
- Şule Yilmaz
  - Medical Biotechnology Center, VIB, Albert Baertsoenkaai 3, Ghent, 9000, Belgium
  - Department of Biochemistry, Ghent University, Ghent, 9000, Belgium
  - Bioinformatics Institute Ghent, Ghent University, Ghent, 9000, Belgium
- Elien Vandermarliere
  - Medical Biotechnology Center, VIB, Albert Baertsoenkaai 3, Ghent, 9000, Belgium
  - Department of Biochemistry, Ghent University, Ghent, 9000, Belgium
  - Bioinformatics Institute Ghent, Ghent University, Ghent, 9000, Belgium
- Lennart Martens
  - Medical Biotechnology Center, VIB, Albert Baertsoenkaai 3, Ghent, 9000, Belgium
  - Department of Biochemistry, Ghent University, Ghent, 9000, Belgium
  - Bioinformatics Institute Ghent, Ghent University, Ghent, 9000, Belgium
|
14
|
Pérez-Cano L, Romero-Durana M, Fernández-Recio J. Structural and energy determinants in protein-RNA docking. Methods 2016; 118-119:163-170. [PMID: 27816523 DOI: 10.1016/j.ymeth.2016.11.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 07/20/2016] [Revised: 10/14/2016] [Accepted: 11/01/2016] [Indexed: 01/02/2023] Open
Abstract
Deciphering the structural and energetic determinants of protein-RNA interactions harbors the potential to understand key cell processes at molecular level, such as gene expression and regulation. With this purpose, computational methods like docking aim to complement current biophysical and structural biology efforts. However, the few reported docking algorithms for protein-RNA interactions show limited predictive success rates, mainly due to incomplete sampling of the conformational space of both the protein and the RNA molecules, as well as to the difficulties of the scoring function in identifying the correct docking models. Here, we have tested the predictive value of a variety of knowledge-based and energetic scoring functions on a recently published protein-RNA docking benchmark and developed a scoring function able to efficiently discriminate docking decoys. We first performed docking calculations with the bound conformation, which allowed us to analyze the problem in optimal conditions. We found that geometry-based terms and electrostatics were the most important scoring terms, while binding propensities and desolvation were much less relevant for the scoring of protein-RNA models. This is in contrast with what we observed for protein-protein docking. The results also showed an interesting dependence of the predictive rates on the flexibility of the protein molecule, which arises from the observed higher positive charge of flexible interfaces and provides hints for future development of more efficient protein-RNA docking methods.
Affiliation(s)
- Laura Pérez-Cano
  - Joint BSC-CRG-IRB Research Program in Computational Biology, Life Sciences Department, Barcelona Supercomputing Center (BSC), Jordi Girona 29, Barcelona 08034, Spain; Center for Neurobehavioral Genetics and Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Miguel Romero-Durana
  - Joint BSC-CRG-IRB Research Program in Computational Biology, Life Sciences Department, Barcelona Supercomputing Center (BSC), Jordi Girona 29, Barcelona 08034, Spain
- Juan Fernández-Recio
  - Joint BSC-CRG-IRB Research Program in Computational Biology, Life Sciences Department, Barcelona Supercomputing Center (BSC), Jordi Girona 29, Barcelona 08034, Spain
|
15
|
Zhou Y, McGillick BE, Teng YHG, Haranahalli K, Ojima I, Swaminathan S, Rizzo RC. Identification of small molecule inhibitors of botulinum neurotoxin serotype E via footprint similarity. Bioorg Med Chem 2016; 24:4875-4889. [PMID: 27543389 DOI: 10.1016/j.bmc.2016.07.031] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 04/08/2016] [Revised: 07/15/2016] [Accepted: 07/16/2016] [Indexed: 11/15/2022]
Abstract
Botulinum neurotoxins (BoNT) are among the most poisonous substances known, and of the 7 serotypes (A-G) identified thus far, at least 4 can cause death in humans. The goal of this work was the identification of inhibitors that specifically target the light chain catalytic site of the highly pathogenic but lesser-studied E serotype (BoNT/E). Large-scale computational screening, employing the program DOCK, was used to perform atomic-level docking of 1.4 million small molecules to prioritize those making favorable interactions with the BoNT/E site. In particular, 'footprint similarity' (FPS) scoring was used to identify compounds that could potentially mimic features of the known substrate tetrapeptide RIME. Among 92 compounds purchased and experimentally tested, compound C562-1101 emerged as the most promising hit, with an apparent IC50 value three-fold more potent than that of the first reported BoNT/E small molecule inhibitor NSC-77053. Additional analysis showed that the predicted binding pose of C562-1101 was geometrically and energetically stable over an ensemble of structures generated by molecular dynamics simulations and that many of the intended interactions seen with RIME were maintained. Several analogs were also computationally designed and predicted to have further molecular mimicry, thereby demonstrating the potential utility of footprint-based scoring protocols to help guide hit refinement.
Affiliation(s)
- Yuchen Zhou
- Department of Applied Mathematics & Statistics, Stony Brook University, Stony Brook, NY 11794, United States
- Brian E McGillick
- Graduate Program in Biochemistry & Structural Biology, Stony Brook University, Stony Brook, NY 11794, United States; Biology Department, Brookhaven National Laboratory, Upton, NY 11973, United States
- Yu-Han Gary Teng
- Institute of Chemical Biology & Drug Discovery, Stony Brook University, Stony Brook, NY 11794, United States; Department of Chemistry, Stony Brook University, Stony Brook, NY 11794, United States
- Iwao Ojima
- Institute of Chemical Biology & Drug Discovery, Stony Brook University, Stony Brook, NY 11794, United States; Department of Chemistry, Stony Brook University, Stony Brook, NY 11794, United States
- Robert C Rizzo
- Department of Applied Mathematics & Statistics, Stony Brook University, Stony Brook, NY 11794, United States; Institute of Chemical Biology & Drug Discovery, Stony Brook University, Stony Brook, NY 11794, United States; Laufer Center for Physical & Quantitative Biology, Stony Brook University, Stony Brook, NY 11794, United States.
|
16
|
Cerqueira NMFSA, Gesto D, Oliveira EF, Santos-Martins D, Brás NF, Sousa SF, Fernandes PA, Ramos MJ. Receptor-based virtual screening protocol for drug discovery. Arch Biochem Biophys 2015; 582:56-67. [PMID: 26045247 DOI: 10.1016/j.abb.2015.05.011] [Citation(s) in RCA: 81] [Impact Index Per Article: 9.0] [Received: 03/02/2015] [Revised: 05/26/2015] [Accepted: 05/27/2015] [Indexed: 12/12/2022]
Abstract
Computer-aided drug design (CADD) is presently a key component in the process of drug discovery and development, as it offers great promise to drastically reduce cost and time requirements. In the pharmaceutical arena, virtual screening is normally regarded as the top CADD tool to screen large libraries of chemical structures and reduce them to a key set of likely drug candidates for a specific protein target. This chapter provides a comprehensive overview of the receptor-based virtual screening process and of its importance in the present drug discovery and development paradigm. Following a focused contextualization of the subject, the main stages of a virtual screening campaign, including its strengths and limitations, are the subject of particular attention in this review. In all of these stages, special consideration will be given to practical issues that are normally the Achilles heel of the virtual screening process.
Affiliation(s)
- Nuno M F S A Cerqueira
- UCIBIO, REQUIMTE, Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Diana Gesto
- UCIBIO, REQUIMTE, Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Eduardo F Oliveira
- UCIBIO, REQUIMTE, Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Diogo Santos-Martins
- UCIBIO, REQUIMTE, Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Natércia F Brás
- UCIBIO, REQUIMTE, Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Sérgio F Sousa
- UCIBIO, REQUIMTE, Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Pedro A Fernandes
- UCIBIO, REQUIMTE, Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Maria J Ramos
- UCIBIO, REQUIMTE, Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal.
|
17
|
Xu W, Lucke AJ, Fairlie DP. Comparing sixteen scoring functions for predicting biological activities of ligands for protein targets. J Mol Graph Model 2015; 57:76-88. [PMID: 25682361 DOI: 10.1016/j.jmgm.2015.01.009] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.6] [Received: 10/21/2014] [Revised: 01/22/2015] [Accepted: 01/23/2015] [Indexed: 12/17/2022]
Abstract
Accurately predicting relative binding affinities and biological potencies for ligands that interact with proteins remains a significant challenge for computational chemists. Most evaluations of docking and scoring algorithms have focused on enhancing ligand affinity for a protein by optimizing docking poses and enrichment factors during virtual screening. However, there is still relatively limited information on the accuracy of commercially available docking and scoring software programs for correctly predicting binding affinities and biological activities of structurally related inhibitors of different enzyme classes. Presented here is a comparative evaluation of eight molecular docking programs (Autodock Vina, Fitted, FlexX, Fred, Glide, GOLD, LibDock, MolDock) using sixteen docking and scoring functions to predict the rank-order activity of different ligand series for six pharmacologically important protein and enzyme targets (Factor Xa, Cdk2 kinase, Aurora A kinase, COX-2, pla2g2a, β Estrogen receptor). Use of Fitted gave an excellent correlation (Pearson 0.86, Spearman 0.91) between predicted and experimental binding only for Cdk2 kinase inhibitors. FlexX and GOLDScore produced good correlations (Pearson>0.6) for hydrophilic targets such as Factor Xa, Cdk2 kinase and Aurora A kinase. By contrast, pla2g2a and COX-2 emerged as difficult targets for scoring functions to predict ligand activities. Although possessing a high hydrophobicity in its binding site, β Estrogen receptor produced reasonable correlations using LibDock (Pearson 0.75, Spearman 0.68). These findings can assist medicinal chemists to better match scoring functions with ligand-target systems for hit-to-lead optimization using computer-aided drug design approaches.
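The Pearson and Spearman values quoted in this abstract compare predicted docking scores with experimental activities. As a minimal sketch of how such coefficients could be computed, the following standard-library Python uses hypothetical helper names (`pearson`, `ranks`, `spearman`) and made-up example data; it is not the authors' evaluation code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: predicted scores (higher = better) vs. measured pIC50.
predicted = [5.1, 6.3, 6.0, 7.8, 8.4]
measured = [5.0, 5.9, 6.4, 7.1, 8.9]
print(round(pearson(predicted, measured), 2), round(spearman(predicted, measured), 2))
```

Many docking programs report more negative scores for better poses; in that case the scores would need a sign flip before being compared with activities expressed on a "higher is better" scale.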
Affiliation(s)
- Weijun Xu
- Division of Chemistry and Structural Biology, Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia
- Andrew J Lucke
- Division of Chemistry and Structural Biology, Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia
- David P Fairlie
- Division of Chemistry and Structural Biology, Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia.
|
18
|
Abstract
Docking methodology aims to predict the experimental binding modes and affinities of small molecules within the binding site of particular receptor targets and is currently used as a standard computational tool in drug design for lead compound optimisation and in virtual screening studies to find novel biologically active molecules. The basic tools of a docking methodology include a search algorithm and an energy scoring function for generating and evaluating ligand poses. In this review, we present the search algorithms and scoring functions most commonly used in current molecular docking methods that focus on protein-ligand applications. We summarise the main topics and recent computational and methodological advances in protein-ligand docking. Protein flexibility, multiple ligand binding modes and the free-energy landscape profile for binding affinity prediction are important and interconnected challenges to be overcome by further methodological developments in the docking field.
|
19
|
Holden PM, Allen WJ, Gochin M, Rizzo RC. Strategies for lead discovery: application of footprint similarity targeting HIVgp41. Bioorg Med Chem 2013; 22:651-61. [PMID: 24315195 DOI: 10.1016/j.bmc.2013.10.022] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 07/02/2013] [Revised: 10/08/2013] [Accepted: 10/17/2013] [Indexed: 10/26/2022]
Abstract
A highly conserved binding pocket on HIVgp41 is an important target for development of anti-viral inhibitors. Holden et al. (Bioorg. Med. Chem. Lett. 2012, 22, 3011) recently reported 7 experimentally verified leads identified through a computational screen to the gp41 pocket in conjunction with a new DOCK scoring method (termed FPS scoring) developed in our laboratory. The method employs molecular footprints based on per-residue van der Waals interactions, electrostatic interactions, or the sum. In this work, we critically examine the gp41 screening results, prioritized using different scoring methods, in terms of two main criteria: (1) ligand pose properties, which include footprint and energy score decompositions, MW, number of rotatable bonds, ligand efficiency, formal charge, and volume overlap, and (2) ligand pose stability, which includes footprint stability (changes in footprint overlap) and rmsd stability (changes in geometry). Relative to standard DOCK scoring, pose property analyses demonstrate how FPS scoring can be used to identify ligands that mimic a known reference (derived here from the native gp41 substrate), while pose stability analyses demonstrate how FPS scoring can be used to enrich for compounds with greater overall stability during molecular dynamics (MD) simulations. Compellingly, of the 115 compounds tested experimentally, the 7 active compounds, as a group, more closely mimic the footprints made by the reference and show greater MD stability compared to the inactive group. Extensive studies using 116 protein-ligand complexes as controls reveal that ligands in their crystallographic binding pose also maintain higher FPS scores and smaller rmsds than do accompanying decoys, confirming that native poses are indeed 'stable' under the same conditions and that monitoring FPS variability during compound prioritization is likely to be beneficial. Overall, the results suggest the new scoring method will complement current virtual screening approaches for both the identification (FPS-ranking) and prioritization (FPS-stability) of target-compatible molecules in a quantitative and logical way.
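The abstract describes footprints as per-residue van der Waals terms, electrostatic terms, or their sum, compared against a reference. A minimal Python sketch of that idea follows. The abstract does not state the comparison metric, so the Euclidean distance used here is an illustrative assumption, and the function names and energy values are hypothetical rather than taken from DOCK.

```python
from math import sqrt


def footprint(vdw, es):
    """Combine per-residue van der Waals and electrostatic terms into a summed footprint."""
    return [v + e for v, e in zip(vdw, es)]


def footprint_distance(reference, candidate):
    """Euclidean distance between two footprints (assumed metric; 0.0 means identical)."""
    return sqrt(sum((r - c) ** 2 for r, c in zip(reference, candidate)))


# Hypothetical per-residue interaction energies (kcal/mol) for three pocket residues.
reference = footprint(vdw=[-2.0, -1.0, 0.0], es=[-1.0, 0.0, 0.0])
mimic = footprint(vdw=[-2.0, -1.0, 0.0], es=[-1.0, 0.0, 0.0])
non_mimic = footprint(vdw=[0.0, -1.0, -3.0], es=[0.0, 0.0, -1.0])

print(footprint_distance(reference, mimic))      # identical footprints
print(footprint_distance(reference, non_mimic))  # interactions on different residues
```

Under this assumed metric, a candidate that places its favorable interactions on the same residues as the reference gets a small distance, while one that binds through different residues gets a large one, even if the total energies are similar.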
Affiliation(s)
- Patrick M Holden
- Department of Applied Mathematics & Statistics, Stony Brook University, Stony Brook, NY 11794, United States
- William J Allen
- Department of Applied Mathematics & Statistics, Stony Brook University, Stony Brook, NY 11794, United States
- Miriam Gochin
- Department of Basic Sciences, Touro University-California, Mare Island, Vallejo, CA 94592, United States; Department of Pharmaceutical Chemistry, University of California San Francisco, CA 94143, United States
- Robert C Rizzo
- Department of Applied Mathematics & Statistics, Stony Brook University, Stony Brook, NY 11794, United States; Institute of Chemical Biology & Drug Discovery, Stony Brook University, Stony Brook, NY 11794, United States; Laufer Center for Physical & Quantitative Biology, Stony Brook University, Stony Brook, NY 11794, United States.
|
20
|
Ul-Haq Z, Uddin R, Gul S. Optimization of Structure Based Virtual Screening Protocols Against Thymidine Monophosphate Kinase Inhibitors as Antitubercular Agents. Mol Inform 2011; 30:851-62. [PMID: 27468105 DOI: 10.1002/minf.201100049] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 03/31/2011] [Accepted: 06/20/2011] [Indexed: 11/06/2022]
Abstract
Thymidine monophosphate kinase from Mycobacterium tuberculosis (TMPKMtub) is an established drug target against tuberculosis. The enzyme TMPKMtub is responsible for the survival of the bacterium MTB and is required to synthesize thymidine triphosphate (TTP), an essential building block of the bacterial DNA. There are several potent inhibitors available against the target enzyme, but the majority are substrate analogues. Recently, three-dimensional structures of TMPKMtub-inhibitor complexes were resolved using X-ray crystallography. These available crystal structures were the basis for initiating a structure-based lead identification campaign against TMPKMtub. The available information was utilized to perform structure-based virtual screening against TMPKMtub in the hope of diversifying the structures of the current inhibitors. In order to set up the protocol, 10 000 out of 45 000 drug-like molecules were randomly selected from the National Cancer Institute's (NCI) database. Additionally, 105 known inhibitors along with 11 natural substrates were mixed with the 10 000 selected compounds. For the current study, a rigid-body docking algorithm, i.e., FRED, has been utilized to set up an efficient docking and scoring protocol. Methods including enrichment curves, consensus scoring and ROC curves provide useful insights into setting up a suitable structure-based docking protocol against TMPKMtub. As a result, an optimum docking and scoring function has been identified for future large-scale virtual screening. In the present work, we have demonstrated a rational choice of protocol for structure-based virtual screening of chemical libraries, which also helps to understand the influence of receptor flexibility by using multiple geometries.
Affiliation(s)
- Zaheer Ul-Haq
- Dr. Panjwani Center for Molecular Medicine and Drug Research, International Center for Chemical and Biological Sciences, University of Karachi, Karachi-75270, Pakistan; Molecular and Cellular Modeling Group, Heidelberg Institute for Theoretical Studies (HITS) gGmbH, Schloss-Wolfsbrunnenweg 35, 69118 Heidelberg, Germany
- Reaz Uddin
- Dr. Panjwani Center for Molecular Medicine and Drug Research, International Center for Chemical and Biological Sciences, University of Karachi, Karachi-75270, Pakistan; Institute of General, Inorganic and Theoretical Chemistry, University of Innsbruck, Innrain 52a, A-6020 Innsbruck, Austria
- Sana Gul
- Dr. Panjwani Center for Molecular Medicine and Drug Research, International Center for Chemical and Biological Sciences, University of Karachi, Karachi-75270, Pakistan
|