1
Wahl J. PheSA: An Open-Source Tool for Pharmacophore-Enhanced Shape Alignment. J Chem Inf Model 2024; 64:5944-5953. [PMID: 39092495 DOI: 10.1021/acs.jcim.4c00516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/04/2024]
Abstract
PheSA is a pharmacophore- and shape-based screening and molecular alignment tool that is fully open-source as part of OpenChemLib. Supporting standard ligand-based screening, flexible refinement of alignments, and receptor-guided shape docking, PheSA is a very flexible tool that can be applied to different use cases in structure-based drug design. We present the algorithm and several benchmark studies that investigate the screening performance, the quality of the generated alignments, and the pose prediction performance of the receptor-guided PheSA algorithm. An important finding is the effect of the type of similarity metric used for measuring screening enrichment (symmetric Tanimoto versus asymmetric Tversky): we observed improved enrichment rates when using Tversky. PheSA exhibits enrichments on the DUD-E that are on par with commercial methods.
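The difference between the symmetric and asymmetric metrics can be illustrated with a minimal Python sketch. It assumes each molecule is reduced to a set of hypothetical feature identifiers; PheSA itself scores shape and pharmacophore overlap, and the abstract does not state which Tversky weights were used, so the weights below are illustrative only.

```python
def tanimoto(a: set, b: set) -> float:
    """Symmetric Tanimoto similarity: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def tversky(a: set, b: set, alpha: float, beta: float) -> float:
    """Tversky similarity: |A & B| / (|A & B| + alpha*|A - B| + beta*|B - A|)."""
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 0.0


# Hypothetical feature sets: a small query fully contained in a larger candidate.
query = {1, 2, 3, 4}
candidate = {1, 2, 3, 4, 5, 6, 7, 8}

sym = tanimoto(query, candidate)                      # penalizes the extra candidate features
asym = tversky(query, candidate, alpha=1.0, beta=0.0)  # ignores features only in the candidate
same = tversky(query, candidate, alpha=1.0, beta=1.0)  # alpha = beta = 1 reduces to Tanimoto
```

With these inputs the Tanimoto score drops because the candidate is larger than the query, while the asymmetric weighting scores only how much of the query is covered.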
Affiliation(s)
- Joel Wahl
- Scientific Computing Drug Discovery, Idorsia Pharmaceuticals Ltd, Hegenheimermattweg 91, CH-4123 Allschwil, Switzerland
2
Koukos PI, Réau M, Bonvin AMJJ. Shape-Restrained Modeling of Protein-Small-Molecule Complexes with High Ambiguity Driven DOCKing. J Chem Inf Model 2021; 61:4807-4818. [PMID: 34436890 PMCID: PMC8479858 DOI: 10.1021/acs.jcim.1c00796] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 01/20/2023]
Abstract
Small-molecule docking remains one of the most valuable computational techniques for the structure prediction of protein-small-molecule complexes. It allows us to study the interactions between compounds and the protein receptors they target at atomic detail in a timely and efficient manner. Here, we present a new protocol in HADDOCK (High Ambiguity Driven DOCKing), our integrative modeling platform, which incorporates homology information for both receptor and compounds. It makes use of HADDOCK's unique ability to integrate information in the simulation to drive it toward conformations, which agree with the provided data. The focal point is the use of shape restraints derived from homologous compounds bound to the target receptors. We have developed two protocols: in the first, the shape is composed of dummy atom beads based on the position of the heavy atoms of the homologous template compound, whereas in the second, the shape is additionally annotated with pharmacophore data for some or all beads. For both protocols, ambiguous distance restraints are subsequently defined between those beads and the heavy atoms of the ligand to be docked. We have benchmarked the performance of these protocols with a fully unbound version of the widely used DUD-E (Database of Useful Decoys-Enhanced) dataset. In this unbound docking scenario, our template/shape-based docking protocol reaches an overall success rate of 81% when a reliable template can be identified (which was the case for 99 out of 102 complexes in the DUD-E dataset), which is close to the best results reported for bound docking on the DUD-E dataset.
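How an ambiguous distance restraint between one shape bead and all heavy atoms of the ligand can be evaluated is sketched below in Python. The abstract does not give the functional form; the sketch assumes the inverse-sixth-power summed distance that is commonly used for ambiguous restraints, and the function name and example distances are hypothetical.

```python
def effective_distance(distances):
    """Combine several candidate distances (in angstroms) into one effective distance.

    Assumed form: d_eff = (sum_i d_i ** -6) ** (-1/6).
    The result is dominated by the shortest distance, so the restraint is
    satisfied as soon as any one atom comes close to the bead.
    """
    if not distances:
        raise ValueError("at least one distance is required")
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)


# Hypothetical distances from one bead to three ligand heavy atoms.
bead_to_atoms = [3.0, 10.0, 12.0]
d_eff = effective_distance(bead_to_atoms)
```

Because of the steep inverse power, distant atoms contribute almost nothing, which is what makes the restraint ambiguous: it does not prescribe which ligand atom has to sit on the bead.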
Affiliation(s)
- Panagiotis I Koukos
- Computational Structural Biology Group, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht 3584CH, The Netherlands
- Manon Réau
- Computational Structural Biology Group, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht 3584CH, The Netherlands
- Alexandre M J J Bonvin
- Computational Structural Biology Group, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht 3584CH, The Netherlands
3
Cleves AE, Johnson SR, Jain AN. Electrostatic-field and surface-shape similarity for virtual screening and pose prediction. J Comput Aided Mol Des 2019; 33:865-886. [PMID: 31650386 PMCID: PMC6856045 DOI: 10.1007/s10822-019-00236-6] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 07/01/2019] [Accepted: 10/11/2019] [Indexed: 02/04/2023]
Abstract
We introduce a new method for rapid computation of 3D molecular similarity that combines electrostatic field comparison with comparison of molecular surface-shape and directional hydrogen-bonding preferences (called "eSim"). Rather than employing heuristic "colors" or user-defined molecular feature types to represent conformation-dependent molecular electrostatics, eSim calculates the similarity of the electrostatic fields of two molecules (in addition to shape and hydrogen-bonding). We present detailed virtual screening performance data on the standard 102 target DUD-E set. In its moderately fast screening mode, eSim running on a single computing core is capable of processing over 60 molecules per second. In this mode, eSim performed significantly better than all alternate methods for which full DUD-E data were available (mean ROC area of 0.74, p [Formula: see text], by paired t-test, compared with the best performing alternate method). In addition, for 92 targets of the DUD-E set where multiple ligand-bound crystal structures were available, screening performance was assessed using alternate ligands or sets thereof (in their bound poses) as similarity targets. Using the joint alignment of five ligands for each protein target, mean ROC area exceeded 0.82 for the 92 targets. Design-focused application of ligand similarity methods depends on accurate predictions of geometric molecular relationships. We comprehensively assessed pose prediction accuracy by curating nearly 400,000 bound ligand pose pairs across the DUD-E targets. Overall, beginning from agnostic initial poses, we observed an 80% success rate for RMSD [Formula: see text] Å among the top 20 predicted eSim poses. These examples were split roughly 50/50 into cases with high direct atomic overlap (where a shared scaffold exists between a pair) and low direct atomic overlap (where a ligand pair has dissimilar scaffolds but largely occupies the same space). Within the high direct atomic overlap subset, the pose prediction success rate was 93%. For the more challenging subset (where dissimilar scaffolds are to be aligned), the success rate was 70%. The eSim approach enables both large-scale screening and rational design of ligands and is rooted in physically meaningful, non-heuristic, molecular comparisons.
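The "mean ROC area" figure quoted above is an average of per-target ROC areas. A minimal Python sketch of that bookkeeping follows; it uses the rank-based (Mann-Whitney) definition of ROC area, and the target names and similarity scores are invented for illustration and are not taken from the paper.

```python
def roc_auc(active_scores, decoy_scores):
    """ROC area as the probability that a random active outscores a random decoy.

    Ties between an active and a decoy count as half a win.
    """
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


# Hypothetical similarity scores for two targets: (actives, decoys).
screens = {
    "target_A": ([0.9, 0.8], [0.1, 0.2]),
    "target_B": ([0.9, 0.3], [0.5, 0.1]),
}

per_target = {name: roc_auc(act, dec) for name, (act, dec) in screens.items()}
mean_auc = sum(per_target.values()) / len(per_target)
```

The pairwise loop is quadratic in the number of compounds; for DUD-E-sized sets a sort-based rank computation would be the usual choice, but the definition is the same.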
Affiliation(s)
- Ann E Cleves
- Applied Science, BioPharmics LLC, Santa Rosa, CA, USA
- Stephen R Johnson
- Computer-Assisted Drug-Design, Bristol-Myers Squibb, Co., Princeton, NJ, USA
- Ajay N Jain
- Dept. of Bioengineering and Therapeutic Sciences, University of California, San Francisco, USA
4
Laufkötter O, Miyao T, Bajorath J. Large-Scale Comparison of Alternative Similarity Search Strategies with Varying Chemical Information Contents. ACS Omega 2019; 4:15304-15311. [PMID: 31552377 PMCID: PMC6751733 DOI: 10.1021/acsomega.9b02470] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/03/2019] [Accepted: 08/23/2019] [Indexed: 06/10/2023]
Abstract
Similarity searching (SS) is a core approach in computational compound screening and has a long tradition in pharmaceutical research. Over the years, different approaches have been introduced to increase the information content of search calculations and optimize the ability to detect compounds having similar activity. We present a large-scale comparison of distinct search strategies on more than 600 qualifying compound activity classes. Challenging test cases for SS were identified and used to evaluate different ways to further improve search performance, which provided a differentiated view of alternative search strategies and their relative performance. It was found that search results could not only be improved by increasing compound input information but also by focusing similarity calculations on database compounds. In the presence of multiple active reference compounds, asymmetric SS with high weights on chemical features of target compounds emerged as an overall preferred approach across many different activity classes. These findings have implications for practical virtual screening applications.
Affiliation(s)
- Oliver Laufkötter
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, D-53115 Bonn, Germany
- Tomoyuki Miyao
- Data Science Center and Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
- Jürgen Bajorath
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, D-53115 Bonn, Germany
5
Kumar A, Zhang KYJ. Shape similarity guided pose prediction: lessons from D3R Grand Challenge 3. J Comput Aided Mol Des 2018; 33:47-59. [PMID: 30084081 DOI: 10.1007/s10822-018-0142-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 05/19/2018] [Accepted: 08/01/2018] [Indexed: 12/15/2022]
Abstract
To extend the utility of ligand 3D shape similarity into pose prediction and virtual screening, we have previously developed CDVS and PoPSS methods. Both of them utilize ligand 3D shape similarity with the crystallographic ligands to improve pose prediction. While CDVS utilizes shape similarity to select suitable receptor structures for molecular docking, PoPSS places a ligand conformation of the highest shape similarity with crystal ligands into the target protein binding pocket which is then refined by side-chain repacking and Monte Carlo energy minimization. Analyses of PoPSS revealed some drawbacks in ligand conformation generation and the scoring scheme used. Moreover, as PoPSS does not sample the ligand conformation after placing it in the binding pocket, it relies solely on conformation generation methods to produce native like conformations. To address these limitations of PoPSS method, we report here a modified approach named as PoPSS-Lite, where side-chain repacking was replaced by a simple grid-based energy minimization. This modification also allowed the sampling of terminal functional groups while keeping the core scaffold fixed. Furthermore, shape similarity calculations were improved by increasing the number of ligand conformations and using a different similarity metric. The performance of PoPSS-Lite was prospectively evaluated in D3R GC3. Comparison of PoPSS-Lite demonstrated superior performance over PoPSS and CDVS with lower mean and median RMSDs. Furthermore, comparison with other D3R GC3 pose prediction submissions revealed top performance for PoPSS-Lite. Our D3R GC3 result extends our perspective that ligand 3D shape similarity with known crystallographic information can be successfully used to predict the binding pose of ligands with unknown binding modes. Our D3R GC3 results further highlight the necessity for improvement in conformer generation methods in order to improve shape similarity guided pose prediction.
Affiliation(s)
- Ashutosh Kumar
- Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa, 230-0045, Japan
- Kam Y J Zhang
- Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa, 230-0045, Japan
6
Assessment of tautomer distribution using the condensed reaction graph approach. J Comput Aided Mol Des 2018; 32:401-414. [DOI: 10.1007/s10822-018-0101-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 09/20/2017] [Accepted: 01/18/2018] [Indexed: 02/07/2023]
7
Wong LWY, Tam GSS, Chen X, So FTK, Soecipto A, Sheong FK, Sung HHY, Lin Z, Williams ID. A chiral spiroborate anion from diphenyl-l-tartramide [B{l-Tar(NHPh)2}2]− applied to some challenging resolutions. CrystEngComm 2018. [DOI: 10.1039/c8ce00855h] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/21/2022]
Abstract
A chiral spiroborate anion [B{l-Tar(NHPh)2}2]− is effective in challenging high-yield, one-pot resolutions, such as that of the S-2-phenylpropylammonium salt.
Affiliation(s)
- Lawrence W.-Y. Wong
- Department of Chemistry, Hong Kong University of Science and Technology, Kowloon, China
- Gemma S.-S. Tam
- Department of Chemistry, Hong Kong University of Science and Technology, Kowloon, China
- Xiaoyan Chen
- Department of Chemistry, Hong Kong University of Science and Technology, Kowloon, China
- Frederick T.-K. So
- Department of Chemistry, Hong Kong University of Science and Technology, Kowloon, China
- Aristyo Soecipto
- Department of Chemistry, Hong Kong University of Science and Technology, Kowloon, China
- Fu Kit Sheong
- Department of Chemistry, Hong Kong University of Science and Technology, Kowloon, China
- Herman H.-Y. Sung
- Department of Chemistry, Hong Kong University of Science and Technology, Kowloon, China
- Zhenyang Lin
- Department of Chemistry, Hong Kong University of Science and Technology, Kowloon, China
- Ian D. Williams
- Department of Chemistry, Hong Kong University of Science and Technology, Kowloon, China
8
QSAR modeling and chemical space analysis of antimalarial compounds. J Comput Aided Mol Des 2017; 31:441-451. [PMID: 28374255 DOI: 10.1007/s10822-017-0019-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 12/21/2016] [Accepted: 03/18/2017] [Indexed: 10/19/2022]
Abstract
Generative topographic mapping (GTM) has been used to visualize and analyze the chemical space of antimalarial compounds as well as to build predictive models linking structure of molecules with their antimalarial activity. For this, a database, including ~3000 molecules tested in one or several of 17 anti-Plasmodium activity assessment protocols, has been compiled by assembling experimental data from in-house and ChEMBL databases. GTM classification models built on subsets corresponding to individual bioassays perform similarly to the earlier reported SVM models. Zones preferentially populated by active and inactive molecules, respectively, clearly emerge in the class landscapes supported by the GTM model. Their analysis resulted in identification of privileged structural motifs of potential antimalarial compounds. Projection of marketed antimalarial drugs on this map allowed us to delineate several areas in the chemical space corresponding to different mechanisms of antimalarial activity. This helped us to make a suggestion about the mode of action of the molecules populating these zones.
9
O'Hagan S, Kell DB. Analysis of drug-endogenous human metabolite similarities in terms of their maximum common substructures. J Cheminform 2017; 9:18. [PMID: 28316656 PMCID: PMC5344883 DOI: 10.1186/s13321-017-0198-y] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 11/14/2016] [Accepted: 02/09/2017] [Indexed: 12/21/2022]
Abstract
In previous work, we have assessed the structural similarities between marketed drugs (‘drugs’) and endogenous natural human metabolites (‘metabolites’ or ‘endogenites’), using ‘fingerprint’ methods in common use, and the Tanimoto and Tversky similarity metrics, finding that the fingerprint encoding used had a dramatic effect on the apparent similarities observed. By contrast, the maximal common substructure (MCS), when the means of determining it is fixed, is a means of determining similarities that is largely independent of the fingerprints, and also has a clear chemical meaning. We here explored the utility of the MCS and metrics derived therefrom. In many cases, a shared scaffold helps cluster drugs and endogenites, and gives insight into enzymes (in particular transporters) that they both share. Tanimoto and Tversky similarities based on the MCS tend to be smaller than those based on the MACCS fingerprint-type encoding, though the converse is also true for a significant fraction of the comparisons. While no single molecular descriptor can account for these differences, a machine learning-based analysis of the nature of the differences (MACCS_Tanimoto vs MCS_Tversky) shows that they are indeed deterministic, although the features that are used in the model to account for this vary greatly with each individual drug. The extent of its utility and interpretability vary with the drug of interest, implying that while MCS is neither ‘better’ nor ‘worse’ for every drug–endogenite comparison, it is sufficiently different to be of value. The overall conclusion is thus that the use of the MCS provides an additional and valuable strategy for understanding the structural basis for similarities between synthetic, marketed drugs and natural intermediary metabolites.
Affiliation(s)
- Steve O'Hagan
- School of Chemistry, The University of Manchester, 131 Princess St, Manchester, M1 7DN UK; Manchester Institute of Biotechnology, The University of Manchester, 131 Princess St, Manchester, M1 7DN UK
- Douglas B Kell
- School of Chemistry, The University of Manchester, 131 Princess St, Manchester, M1 7DN UK; Manchester Institute of Biotechnology, The University of Manchester, 131 Princess St, Manchester, M1 7DN UK; Centre for the Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131 Princess St, Manchester, M1 7DN UK
10
Horvath D, Marcou G, Varnek A. Generative Topographic Mapping Approach to Chemical Space Analysis. Challenges and Advances in Computational Chemistry and Physics 2017. [DOI: 10.1007/978-3-319-56850-8_6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 12/05/2022]
11
Kearnes S, Pande V. ROCS-derived features for virtual screening. J Comput Aided Mol Des 2016; 30:609-17. [PMID: 27624668 DOI: 10.1007/s10822-016-9959-3] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 06/10/2016] [Accepted: 08/31/2016] [Indexed: 10/21/2022]
Abstract
Rapid overlay of chemical structures (ROCS) is a standard tool for the calculation of 3D shape and chemical ("color") similarity. ROCS uses unweighted sums to combine many aspects of similarity, yielding parameter-free models for virtual screening. In this report, we decompose the ROCS color force field into color components and color atom overlaps, novel color similarity features that can be weighted in a system-specific manner by machine learning algorithms. In cross-validation experiments, these additional features significantly improve virtual screening performance relative to standard ROCS.
Affiliation(s)
- Steven Kearnes
- Stanford University, 318 Campus Dr. S296, Stanford, CA, 94305, USA; Google Inc., 1600 Amphitheatre Pkwy, Mountain View, CA, 94043, USA
- Vijay Pande
- Stanford University, 318 Campus Dr. S296, Stanford, CA, 94305, USA
12
Kunimoto R, Vogt M, Bajorath J. Maximum common substructure-based Tversky index: an asymmetric hybrid similarity measure. J Comput Aided Mol Des 2016; 30:523-31. [PMID: 27515428 DOI: 10.1007/s10822-016-9935-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 07/08/2016] [Accepted: 08/04/2016] [Indexed: 12/01/2022]
Abstract
Current approaches for the assessment of molecular similarity can generally be divided into descriptor-based and substructure-based methods. The former require the application of similarity metrics that yield continuous similarity values, whereas the readout of the latter is binary (i.e. similar vs. not similar). However, it is also possible to combine descriptor-based and substructure-based methods to exploit advantages of individual methods in context and generate similarity measures for special applications. Herein we present a hybrid measure for asymmetric similarity calculations on the basis of maximum common core structures. This similarity function can be effectively applied to compare small reference compounds with larger test molecules, which is difficult using conventional metrics.
Affiliation(s)
- Ryo Kunimoto
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, 53113, Bonn, Germany
- Martin Vogt
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, 53113, Bonn, Germany
- Jürgen Bajorath
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, 53113, Bonn, Germany
13
Johnson DK, Karanicolas J. Ultra-High-Throughput Structure-Based Virtual Screening for Small-Molecule Inhibitors of Protein-Protein Interactions. J Chem Inf Model 2016; 56:399-411. [PMID: 26726827 DOI: 10.1021/acs.jcim.5b00572] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Indexed: 12/15/2022]
Abstract
Protein-protein interactions play important roles in virtually all cellular processes, making them enticing targets for modulation by small-molecule therapeutics: specific examples have been well validated in diseases ranging from cancer and autoimmune disorders, to bacterial and viral infections. Despite several notable successes, however, overall these remain a very challenging target class. Protein interaction sites are especially challenging for computational approaches, because the target protein surface often undergoes a conformational change to enable ligand binding: this confounds traditional approaches for virtual screening. Through previous studies, we demonstrated that biased "pocket optimization" simulations could be used to build collections of low-energy pocket-containing conformations, starting from an unbound protein structure. Here, we demonstrate that these pockets can further be used to identify ligands that complement the protein surface. To do so, we first build from a given pocket its "exemplar": a perfect, but nonphysical, pseudoligand that would optimally match the shape and chemical features of the pocket. In our previous studies, we used these exemplars to quantitatively compare protein surface pockets to one another. Here, we now introduce this exemplar as a template for pharmacophore-based screening of chemical libraries. Through a series of benchmark experiments, we demonstrate that this approach exhibits comparable performance as traditional docking methods for identifying known inhibitors acting at protein interaction sites. However, because this approach is predicated on ligand/exemplar overlays, and thus does not require explicit calculation of protein-ligand interactions, exemplar screening provides a tremendous speed advantage over docking: 6 million compounds can be screened in about 15 min on a single 16-core, dual-GPU computer. 
The extreme speed at which large compound libraries can be traversed easily enables screening against a "pocket-optimized" ensemble of protein conformations, which in turn facilitates identification of more diverse classes of active compounds for a given protein target.
Affiliation(s)
- David K Johnson
- Center for Computational Biology and Department of Molecular Biosciences, University of Kansas, 2030 Becker Drive, Lawrence, Kansas 66045-7534, United States
- John Karanicolas
- Center for Computational Biology and Department of Molecular Biosciences, University of Kansas, 2030 Becker Drive, Lawrence, Kansas 66045-7534, United States
14
Muegge I, Mukherjee P. An overview of molecular fingerprint similarity search in virtual screening. Expert Opin Drug Discov 2015; 11:137-48. [PMID: 26558489 DOI: 10.1517/17460441.2016.1117070] [Citation(s) in RCA: 119] [Impact Index Per Article: 13.2] [Indexed: 12/20/2022]
Abstract
INTRODUCTION: A central premise of medicinal chemistry is that structurally similar molecules exhibit similar biological activities. Molecular fingerprints encode properties of small molecules and assess their similarities computationally through bit string comparisons. Based on the similarity to a biologically active template, molecular fingerprint methods allow for identifying additional compounds with a higher chance of displaying similar biological activities against the same target - a process commonly referred to as virtual screening (VS).
AREAS COVERED: This article focuses on fingerprint similarity searches in the context of compound selection for enhancing hit sets, comparing compound decks, and VS. In addition, the authors discuss the application of fingerprints in predictive modeling.
EXPERT OPINION: Fingerprint similarity search methods are especially useful in VS if only a few unrelated ligands are known for a given target and therefore more complex and information rich methods such as pharmacophore searches or structure-based design are not applicable. In addition, fingerprint methods are used in characterizing properties of compound collections such as chemical diversity, density in chemical space, and content of biologically active molecules (biodiversity). Such assessments are important for deciding what compounds to experimentally screen, to purchase, or to assemble in a virtual compound deck for in silico screening or de novo design.
Affiliation(s)
- Ingo Muegge
- Boehringer Ingelheim Pharmaceuticals, Department of Small Molecule Discovery Research, Ridgefield, CT, USA
- Prasenjit Mukherjee
- Boehringer Ingelheim Pharmaceuticals, Department of Small Molecule Discovery Research, Ridgefield, CT, USA
15
Sidorov P, Gaspar H, Marcou G, Varnek A, Horvath D. Mappability of drug-like space: towards a polypharmacologically competent map of drug-relevant compounds. J Comput Aided Mol Des 2015; 29:1087-108. [PMID: 26564142 DOI: 10.1007/s10822-015-9882-z] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.0] [Received: 10/05/2015] [Accepted: 11/06/2015] [Indexed: 11/30/2022]
Abstract
Intuitive, visual rendering--mapping--of high-dimensional chemical spaces (CS), is an important topic in chemoinformatics. Such maps were so far dedicated to specific compound collections--either limited series of known activities, or large, even exhaustive enumerations of molecules, but without associated property data. Typically, they were challenged to answer some classification problem with respect to those same molecules, admired for their aesthetical virtues and then forgotten--because they were set-specific constructs. This work wishes to address the question whether a general, compound set-independent map can be generated, and the claim of "universality" quantitatively justified, with respect to all the structure-activity information available so far--or, more realistically, an exploitable but significant fraction thereof. The "universal" CS map is expected to project molecules from the initial CS into a lower-dimensional space that is neighborhood behavior-compliant with respect to a large panel of ligand properties. Such map should be able to discriminate actives from inactives, or even support quantitative neighborhood-based, parameter-free property prediction (regression) models, for a wide panel of targets and target families. It should be polypharmacologically competent, without requiring any target-specific parameter fitting. This work describes an evolutionary growth procedure of such maps, based on generative topographic mapping, followed by the validation of their polypharmacological competence. Validation was achieved with respect to a maximum of exploitable structure-activity information, covering all of Homo sapiens proteins of the ChEMBL database, antiparasitic and antiviral data, etc. Five evolved maps satisfactorily solved hundreds of activity-based ligand classification challenges for targets, and even in vivo properties independent from training data. 
They also stood chemogenomics-related challenges, as cumulated responsibility vectors obtained by mapping of target-specific ligand collections were shown to represent validated target descriptors, complying with currently accepted target classification in biology. Therefore, they represent, in our opinion, a robust and well documented answer to the key question "What is a good CS map?"
Affiliation(s)
- Pavel Sidorov
- Laboratoire de Chémoinformatique, UMR 7140, CNRS-Univ. Strasbourg, 1 rue Blaise Pascal, 67000, Strasbourg, France; Laboratory of Chemoinformatics, Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- Helena Gaspar
- Laboratoire de Chémoinformatique, UMR 7140, CNRS-Univ. Strasbourg, 1 rue Blaise Pascal, 67000, Strasbourg, France
- Gilles Marcou
- Laboratoire de Chémoinformatique, UMR 7140, CNRS-Univ. Strasbourg, 1 rue Blaise Pascal, 67000, Strasbourg, France
- Alexandre Varnek
- Laboratoire de Chémoinformatique, UMR 7140, CNRS-Univ. Strasbourg, 1 rue Blaise Pascal, 67000, Strasbourg, France; Laboratory of Chemoinformatics, Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- Dragos Horvath
- Laboratoire de Chémoinformatique, UMR 7140, CNRS-Univ. Strasbourg, 1 rue Blaise Pascal, 67000, Strasbourg, France
16
Gadhe CG, Lee E, Kim MH. Finding new scaffolds of JAK3 inhibitors in public database: 3D-QSAR models & shape-based screening. Arch Pharm Res 2015; 38:2008-19. [PMID: 25956696 DOI: 10.1007/s12272-015-0607-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 09/30/2014] [Accepted: 04/20/2015] [Indexed: 01/02/2023]
Abstract
The STAT/JAK3 pathway is a well-known therapeutic target in various diseases (ex. rheumatoid arthritis and psoriasis). The therapeutic advantage of JAK3 inhibition motivated to find new scaffolds with desired DMPK. For the purpose, in silico high-throughput sieves method is developed consisting of a receptor-guided three-dimensional quantitative structure-activity relationship study and shape-based virtual screening. We developed robust and predictive comparative molecular field analysis (q (2) = 0.760, r (2) = 0.915) and comparative molecular similarity index analysis (q (2) = 0.817, r (2) = 0.981) models and validated these using a test set, which produced satisfactory predictions of 0.925 and 0.838, respectively.
Affiliation(s)
- Changdev G Gadhe
- Department of Pharmacy, College of Pharmacy, Gachon University, 155 Gaetbeol-ro, Yeonsu-gu, Incheon, Republic of Korea
- Gachon Institute of Pharmaceutical Science, Gachon University, Yeonsu-gu, Incheon, Republic of Korea
| | - Eunhee Lee
- Department of Pharmacy, College of Pharmacy, Gachon University, 155 Gaetbeol-ro, Yeonsu-gu, Incheon, Republic of Korea
- Gachon Institute of Pharmaceutical Science, Gachon University, Yeonsu-gu, Incheon, Republic of Korea
| | - Mi-Hyun Kim
- Department of Pharmacy, College of Pharmacy, Gachon University, 155 Gaetbeol-ro, Yeonsu-gu, Incheon, Republic of Korea
- Gachon Institute of Pharmaceutical Science, Gachon University, Yeonsu-gu, Incheon, Republic of Korea
| |
|
17
|
Duesbury E, Holliday J, Willett P. Maximum Common Substructure-Based Data Fusion in Similarity Searching. J Chem Inf Model 2015; 55:222-30. [DOI: 10.1021/ci5005702] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 11/30/2022]
Affiliation(s)
- Edmund Duesbury
- Information School, University of Sheffield, 211 Portobello, Sheffield S1 4DP, United Kingdom
| | - John Holliday
- Information School, University of Sheffield, 211 Portobello, Sheffield S1 4DP, United Kingdom
| | - Peter Willett
- Information School, University of Sheffield, 211 Portobello, Sheffield S1 4DP, United Kingdom
| |
|
18
|
Gan S, Cosgrove DA, Gardiner EJ, Gillet VJ. Investigation of the use of spectral clustering for the analysis of molecular data. J Chem Inf Model 2014; 54:3302-19. [PMID: 25379955 DOI: 10.1021/ci500480b] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 12/13/2022]
Abstract
Spectral clustering involves placing objects into clusters based on the eigenvectors and eigenvalues of an associated matrix. The technique was first applied to molecular data by Brewer [J. Chem. Inf. Model. 2007, 47, 1727-1733] who demonstrated its use on a very small dataset of 125 COX-2 inhibitors. We have determined suitable parameters for spectral clustering using a wide variety of molecular descriptors and several datasets of a few thousand compounds and compared the results of clustering using a nonoverlapping version of Brewer's use of Sarker and Boyer's algorithm with that of Ward's and k-means clustering. We then replaced the exact eigendecomposition method with two different approximate methods and concluded that Singular Value Decomposition is the most appropriate method for clustering larger compound collections of up to 100,000 compounds. We have also used spectral clustering with the Tversky coefficient to generate two sets of clusters linked by a common set of eigenvalues and have used this novel approach to cluster sets of fragments such as those used in fragment-based drug design.
Affiliation(s)
- Sonny Gan
- Information School, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield S1 4DP, United Kingdom
| |
|
19
|
Computational chemogenomics: is it more than inductive transfer? J Comput Aided Mol Des 2014; 28:597-618. [PMID: 24771144 DOI: 10.1007/s10822-014-9743-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 01/20/2014] [Accepted: 04/11/2014] [Indexed: 10/25/2022]
Abstract
High-throughput assays challenge us to extract knowledge from multi-ligand, multi-target activity data. In QSAR, weights are statically fitted to each ligand descriptor with respect to a single endpoint or target. However, computational chemogenomics (CG) has demonstrated benefits of learning from entire grids of data at once, rather than building target-specific QSARs. A possible reason for this is the emergence of inductive knowledge transfer (IT) between targets, providing statistical robustness to the model, with no assumption about the structure of the targets. Relevant protein descriptors in CG should allow one to learn how to dynamically adjust ligand attribute weights with respect to protein structure. Hence, models built through explicit learning (EL) by including protein information, while benefitting from IT enhancement, should provide additional predictive capability, notably for protein deorphanization. This interplay between IT and EL in CG modeling is not sufficiently studied. While IT is likely to occur irrespective of the injected target information, it is not clear whether and when boosting due to EL may occur. EL is only possible if protein description is appropriate to the target set under investigation. The key issue here is the search for evidence of genuine EL exceeding expectations based on pure IT. We explore the problem in the context of Support Vector Regression, using more than 9,400 pKi values of 31 GPCRs, where compound-protein interactions are represented by the concatenation of vectorial descriptions of compounds and proteins. This provides a unified framework to generate both IT-enhanced and potentially EL-enabled models, where the difference is toggled by supplied protein information. For EL-enabled models, protein information includes genuine protein descriptors such as typical sequence-based terms, but also the experimentally determined affinity cross-correlation fingerprints. 
These latter benchmark the expected behavior of a quasi-ideal descriptor capturing the actual functional protein-protein relatedness, and are therefore thought to be the most likely to enable EL. EL- and IT-based methods were benchmarked alongside classical QSAR, with respect to cross-validation and deorphanization challenges. A rational method for projecting benchmarked methodologies into a strategy space is given, with the aim that the projection will provide directions for the types of molecule designs possible using a given methodology. While EL-enabled strategies outperform classical QSARs and compare favorably to similar published results, they are, in all respects evaluated herein, not strongly distinguished from IT-enhanced models. Moreover, EL-enabled strategies failed to prove superior in deorphanization challenges. Therefore, this paper raises caution that, contrary to common belief and intuitive expectation, the benefits of chemogenomics models over classical QSAR are quite possibly due less to the injection of protein-related information and more to the effect of inductive transfer arising from simultaneous learning from all of the modeled endpoints. These results show that the field of protein descriptor research needs further improvements to truly realize the expected benefit of EL.
|