1. Basciu A, Callea L, Motta S, Bonvin AM, Bonati L, Vargiu AV. No dance, no partner! A tale of receptor flexibility in docking and virtual screening. Virtual Screening and Drug Docking 2022. [DOI: 10.1016/bs.armc.2022.08.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
2. Ballante F, Kooistra AJ, Kampen S, de Graaf C, Carlsson J. Structure-Based Virtual Screening for Ligands of G Protein-Coupled Receptors: What Can Molecular Docking Do for You? Pharmacol Rev 2021; 73:527-565. [PMID: 34907092 DOI: 10.1124/pharmrev.120.000246] [Citation(s) in RCA: 61] [Impact Index Per Article: 20.3] [Indexed: 12/19/2022] Open
Abstract
G protein-coupled receptors (GPCRs) constitute the largest family of membrane proteins in the human genome and are important therapeutic targets. During the last decade, the number of atomic-resolution structures of GPCRs has increased rapidly, providing insights into drug binding at the molecular level. These breakthroughs have created excitement regarding the potential of using structural information in ligand design and initiated a new era of rational drug discovery for GPCRs. The molecular docking method is now widely applied to model the three-dimensional structures of GPCR-ligand complexes and screen for chemical probes in large compound libraries. In this review article, we first summarize the current structural coverage of the GPCR superfamily and the understanding of receptor-ligand interactions at atomic resolution. We then present the general workflow of structure-based virtual screening and strategies to discover GPCR ligands in chemical libraries. We assess the state of the art of this research field by summarizing prospective applications of virtual screening based on experimental structures. Strategies to identify compounds with specific efficacy and selectivity profiles are discussed, illustrating the opportunities and limitations of the molecular docking method. Our overview shows that structure-based virtual screening can discover novel leads and will be essential in pursuing the next generation of GPCR drugs. SIGNIFICANCE STATEMENT: Extraordinary advances in the structural biology of G protein-coupled receptors have revealed the molecular details of ligand recognition by this large family of therapeutic targets, providing novel avenues for rational drug design. Structure-based docking is an efficient computational approach to identify novel chemical probes from large compound libraries, which has the potential to accelerate the development of drug candidates.
Affiliation(s)
- Flavio Ballante
- Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden (F.B., S.K., J.C.); Department of Drug Design and Pharmacology, University of Copenhagen, Copenhagen, Denmark (A.J.K.); and Sosei Heptares, Steinmetz Building, Granta Park, Great Abington, Cambridge, United Kingdom (C.d.G.)
- Albert J Kooistra
- Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden (F.B., S.K., J.C.); Department of Drug Design and Pharmacology, University of Copenhagen, Copenhagen, Denmark (A.J.K.); and Sosei Heptares, Steinmetz Building, Granta Park, Great Abington, Cambridge, United Kingdom (C.d.G.)
- Stefanie Kampen
- Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden (F.B., S.K., J.C.); Department of Drug Design and Pharmacology, University of Copenhagen, Copenhagen, Denmark (A.J.K.); and Sosei Heptares, Steinmetz Building, Granta Park, Great Abington, Cambridge, United Kingdom (C.d.G.)
- Chris de Graaf
- Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden (F.B., S.K., J.C.); Department of Drug Design and Pharmacology, University of Copenhagen, Copenhagen, Denmark (A.J.K.); and Sosei Heptares, Steinmetz Building, Granta Park, Great Abington, Cambridge, United Kingdom (C.d.G.)
- Jens Carlsson
- Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden (F.B., S.K., J.C.); Department of Drug Design and Pharmacology, University of Copenhagen, Copenhagen, Denmark (A.J.K.); and Sosei Heptares, Steinmetz Building, Granta Park, Great Abington, Cambridge, United Kingdom (C.d.G.)
3. Bitencourt-Ferreira G, Rizzotto C, de Azevedo Junior WF. Machine Learning-Based Scoring Functions, Development and Applications with SAnDReS. Curr Med Chem 2021; 28:1746-1756. [PMID: 32410551 DOI: 10.2174/0929867327666200515101820] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 12/17/2019] [Revised: 04/06/2020] [Accepted: 04/07/2020] [Indexed: 11/22/2022]
Abstract
BACKGROUND Analysis of atomic coordinates of protein-ligand complexes can provide three-dimensional data to generate computational models to evaluate binding affinity and thermodynamic state functions. Application of machine learning techniques can create models to assess protein-ligand potential energy and binding affinity. These methods show superior predictive performance when compared with classical scoring functions available in docking programs. OBJECTIVE Our purpose here is to review the development and application of the program SAnDReS. We describe the creation of machine learning models to assess the binding affinity of protein-ligand complexes. METHODS SAnDReS implements machine learning methods available in the scikit-learn library. This program is available for download at https://github.com/azevedolab/sandres. SAnDReS uses crystallographic structures, binding and thermodynamic data to create targeted scoring functions. RESULTS Recent applications of the program SAnDReS to drug targets such as Coagulation factor Xa, cyclin-dependent kinases and HIV-1 protease were able to create targeted scoring functions to predict inhibition of these proteins. These targeted models outperform classical scoring functions. CONCLUSION Here, we reviewed the development of machine learning scoring functions to predict binding affinity through the application of the program SAnDReS. Our studies show the superior predictive performance of the SAnDReS-developed models when compared with classical scoring functions available in the programs such as AutoDock4, Molegro Virtual Docker and AutoDock Vina.
Affiliation(s)
- Camila Rizzotto
- Pontifical Catholic University of Rio Grande do Sul - PUCRS, Porto Alegre-RS, Brazil
4. Berenger F, Kumar A, Zhang KYJ, Yamanishi Y. Lean-Docking: Exploiting Ligands' Predicted Docking Scores to Accelerate Molecular Docking. J Chem Inf Model 2021; 61:2341-2352. [PMID: 33861591 DOI: 10.1021/acs.jcim.0c01452] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Indexed: 12/18/2022]
Abstract
In structure-based virtual screening (SBVS), a binding site on a protein structure is used to search for ligands with favorable nonbonded interactions. Because it is computationally difficult, docking is time-consuming and any docking user will eventually encounter a chemical library that is too big to dock. This problem might arise because there is not enough computing power or because preparing and storing so many three-dimensional (3D) ligands requires too much space. In this study, however, we show that quality regressors can be trained to predict docking scores from molecular fingerprints. Although typical docking has a screening rate of less than one ligand per second on one CPU core, our regressors can predict about 5800 docking scores per second. This approach allows us to focus docking on the portion of a database that is predicted to have docking scores below a user-chosen threshold. Herein, usage examples are shown, where only 25% of a ligand database is docked, without any significant virtual screening performance loss. We call this method "lean-docking". To validate lean-docking, a massive docking campaign using several state-of-the-art docking software packages was undertaken on an unbiased data set, with only wet-lab tested active and inactive molecules. Although regressors allow the screening of a larger chemical space, even at a constant docking power, it is also clear that significant progress in the virtual screening power of docking scores is desirable.
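The filtering step described in this abstract can be sketched as follows. This is a minimal illustration rather than the authors' implementation: `lean_dock`, `predict_score`, and `dock` are hypothetical names, the fast predictor stands in for a regressor trained on molecular fingerprints, and lower scores are assumed to be better.

```python
from typing import Callable, Iterable, List, Tuple


def lean_dock(
    ligands: Iterable[str],
    predict_score: Callable[[str], float],  # hypothetical fast surrogate model
    dock: Callable[[str], float],           # hypothetical slow docking call
    threshold: float,
) -> List[Tuple[str, float]]:
    """Dock only ligands whose predicted score falls below `threshold`.

    Returns (ligand, docking score) pairs sorted from best to worst,
    assuming that a lower score means a more favorable pose.
    """
    shortlisted = [lig for lig in ligands if predict_score(lig) < threshold]
    docked = [(lig, dock(lig)) for lig in shortlisted]
    return sorted(docked, key=lambda pair: pair[1])


# Toy data standing in for a trained regressor and a docking program.
predicted = {"lig_a": -9.1, "lig_b": -4.0, "lig_c": -7.5, "lig_d": -3.2}
actual = {"lig_a": -8.7, "lig_b": -5.1, "lig_c": -7.9, "lig_d": -2.8}
docked_calls: List[str] = []


def toy_dock(name: str) -> float:
    """Record which ligands were actually docked, then return a toy score."""
    docked_calls.append(name)
    return actual[name]


ranked = lean_dock(predicted, predicted.__getitem__, toy_dock, threshold=-7.0)
print(ranked)        # only the shortlisted ligands, best score first
print(docked_calls)  # ligands that reached the expensive docking step
```

In this toy run only two of the four ligands reach the docking call; in practice the threshold would be tuned against the predictor's error so that few true top scorers are discarded.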
Affiliation(s)
- Francois Berenger
- Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka 820-8502, Japan
- Ashutosh Kumar
- Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Kam Y J Zhang
- Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Yoshihiro Yamanishi
- Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka 820-8502, Japan
5. Bitencourt-Ferreira G, Duarte da Silva A, Filgueira de Azevedo W. Application of Machine Learning Techniques to Predict Binding Affinity for Drug Targets: A Study of Cyclin-Dependent Kinase 2. Curr Med Chem 2021; 28:253-265. [PMID: 31729287 DOI: 10.2174/2213275912666191102162959] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 06/17/2019] [Revised: 08/22/2019] [Accepted: 09/24/2019] [Indexed: 11/22/2022]
Abstract
BACKGROUND The elucidation of the structure of cyclin-dependent kinase 2 (CDK2) made it possible to develop targeted scoring functions for virtual screening aimed at identifying new inhibitors of this enzyme. CDK2 is a protein target for the development of drugs intended to modulate cell-cycle progression and control. Such drugs have potential anticancer activities. OBJECTIVE Our goal here is to review recent applications of machine learning methods to predict ligand-binding affinity for protein targets. To assess the predictive performance of classical scoring functions and targeted scoring functions, we focused our analysis on CDK2 structures. METHODS We have experimental structural data for hundreds of binary complexes of CDK2 with different ligands, many of them with inhibition constant information. We investigate here computational methods to calculate the binding affinity of CDK2 through classical scoring functions and machine-learning models. RESULTS Analysis of the predictive performance of classical scoring functions available in docking programs such as Molegro Virtual Docker, AutoDock4, and AutoDock Vina indicated that these methods failed to predict binding affinity with significant correlation with experimental data. Targeted scoring functions developed through supervised machine learning techniques showed a significant correlation with experimental data. CONCLUSION Here, we described the application of supervised machine learning techniques to generate a scoring function to predict binding affinity. Machine learning models showed superior predictive performance when compared with classical scoring functions. Analysis of the computational models obtained through machine learning could capture essential structural features responsible for binding affinity against CDK2.
Affiliation(s)
- Gabriela Bitencourt-Ferreira
- Laboratory of Computational Systems Biology, Pontifical Catholic University of Rio Grande do Sul (PUCRS), Av. Ipiranga, 6681 Porto Alegre/RS 90619-900, Brazil
- Amauri Duarte da Silva
- Specialization Program in Bioinformatics, Pontifical Catholic University of Rio Grande do Sul (PUCRS), Av. Ipiranga, 6681 Porto Alegre/RS 90619-900, Brazil
- Walter Filgueira de Azevedo
- Laboratory of Computational Systems Biology, Pontifical Catholic University of Rio Grande do Sul (PUCRS), Av. Ipiranga, 6681 Porto Alegre/RS 90619-900, Brazil
6. Singh N, Villoutreix BO. Demystifying the Molecular Basis of Pyrazoloquinolinones Recognition at the Extracellular α1+/β3- Interface of the GABAA Receptor by Molecular Modeling. Front Pharmacol 2020; 11:561834. [PMID: 33041802 PMCID: PMC7518038 DOI: 10.3389/fphar.2020.561834] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/13/2020] [Accepted: 08/26/2020] [Indexed: 12/16/2022] Open
Abstract
GABAA receptors are pentameric ligand-gated ion channels that serve as major inhibitory neurotransmitter receptors in the mammalian brain and the target of numerous clinically relevant drugs interacting with different ligand binding sites. Here, we report an in silico approach to investigate the binding of pyrazoloquinolinones (PQs) that mediate allosteric effects through the extracellular α+/β- interface of GABAA receptors. First, we docked a potent prototype of PQs into the α1+/β3- site of a homology model of the human α1β3γ2 subtype of the GABAA receptor. Next, for each docking pose, we computationally derived protein-ligand complexes for 18 PQ analogs with known experimental potency. Subsequently, binding energy was calculated for all complexes using the molecular mechanics-generalized Born surface area method. Finally, docking poses were quantitatively assessed in the light of experimental data to derive a binding hypothesis. Collectively, the results indicate that PQs at the α1+/β3- site likely exhibit a common binding mode that can be characterized by a hydrogen bond interaction with β3Q64 and hydrophobic interactions involving residues α1F99, β3Y62, β3M115, α1Y159, and α1Y209. Importantly, our results are in good agreement with the recently resolved cryo-Electron Microscopy structures of the human α1β3γ2 and α1β2γ2 subtypes of GABAA receptors.
Affiliation(s)
- Natesh Singh
- Univ. Lille, INSERM, Institut Pasteur de Lille, U1177-Drugs and Molecules for Living Systems, Lille, France; Department of Pharmaceutical Chemistry, University of Vienna, Vienna, Austria
- Bruno O Villoutreix
- Univ. Lille, INSERM, Institut Pasteur de Lille, U1177-Drugs and Molecules for Living Systems, Lille, France
7. Singh N, Chaput L, Villoutreix BO. Fast Rescoring Protocols to Improve the Performance of Structure-Based Virtual Screening Performed on Protein-Protein Interfaces. J Chem Inf Model 2020; 60:3910-3934. [PMID: 32786511 DOI: 10.1021/acs.jcim.0c00545] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 01/06/2023]
Abstract
Protein-protein interactions (PPIs) are attractive targets for drug design because of their essential role in numerous cellular processes and disease pathways. However, in general, PPIs display exposed binding pockets at the interface, and as such, have been largely unexploited for therapeutic interventions with low-molecular weight compounds. Here, we used docking and various rescoring strategies in an attempt to recover PPI inhibitors from a set of active and inactive molecules for 11 targets collected in ChEMBL and PubChem. Our focus is on the screening power of the various developed protocols and on using fast approaches so as to be able to apply such a strategy to the screening of ultralarge libraries in the future. First, we docked compounds into each target using the fast "pscreen" mode of the structure-based virtual screening (VS) package Surflex. Subsequently, the docking poses were postprocessed to derive a set of 3D topological descriptors: (i) shape similarity and (ii) interaction fingerprint similarity with a co-crystallized inhibitor, (iii) solvent-accessible surface area, and (iv) extent of deviation from the geometric center of a reference inhibitor. The derivatized descriptors, together with descriptor-scaled scoring functions, were utilized to investigate possible impacts on VS performance metrics. Moreover, four standalone scoring functions, RF-Score-VS (machine-learning), DLIGAND2 (knowledge-based), Vinardo (empirical), and X-SCORE (empirical), were employed to rescore the PPI compounds. Collectively, the results indicate that the topological scoring algorithms could be valuable both at a global level, with up to 79% increase in areas under the receiver operating characteristic curve for some targets, and in early stages, with up to a 4-fold increase in enrichment factors at 1% of the screened collections. Outstandingly, DLIGAND2 emerged as the best scoring function on this data set, outperforming all rescoring techniques in terms of VS metrics. The described methodology could help in the rational design of small-molecule PPI inhibitors and has direct applications in many therapeutic areas, including cancer, CNS, and infectious diseases such as COVID-19.
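The two virtual-screening metrics named in this abstract, the area under the ROC curve and the enrichment factor at a given fraction of the ranked library, can be sketched as below. This is an illustrative sketch and not the authors' code: the function names are hypothetical, labels are 1 for actives and 0 for inactives, and lower scores are assumed to rank first.

```python
import math
from typing import Sequence


def enrichment_factor(scores: Sequence[float], labels: Sequence[int],
                      fraction: float = 0.01) -> float:
    """Active rate in the top `fraction` of the ranking divided by the
    active rate in the whole library (lower score = ranked earlier)."""
    n = len(scores)
    total_actives = sum(labels)
    if n == 0 or total_actives == 0:
        raise ValueError("need a non-empty library with at least one active")
    n_top = max(1, math.ceil(fraction * n))
    order = sorted(range(n), key=lambda i: scores[i])
    hits_top = sum(labels[i] for i in order[:n_top])
    return (hits_top / n_top) / (total_actives / n)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random active outranks a random inactive
    (pairwise Mann-Whitney form; ties count as one half)."""
    actives = [s for s, y in zip(scores, labels) if y == 1]
    inactives = [s for s, y in zip(scores, labels) if y == 0]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive")
    wins = sum(1.0 if a < d else 0.5 if a == d else 0.0
               for a in actives for d in inactives)
    return wins / (len(actives) * len(inactives))


# Toy ranking of ten compounds with two actives.
toy_scores = [-9.0, -8.0, -7.0, -6.0, -5.0, -4.0, -3.0, -2.0, -1.0, 0.0]
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
ef_20 = enrichment_factor(toy_scores, toy_labels, fraction=0.2)
auc = roc_auc(toy_scores, toy_labels)
print(ef_20, auc)
```

The pairwise AUC is quadratic in library size, which is fine for a toy set; a rank-based formulation would be the usual choice for large screens.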
Affiliation(s)
- Natesh Singh
- Université de Lille, Inserm, Institut Pasteur de Lille, U1177-Drugs and Molecules for Living Systems, F-59000 Lille, France
- Ludovic Chaput
- Université de Lille, Inserm, Institut Pasteur de Lille, U1177-Drugs and Molecules for Living Systems, F-59000 Lille, France
- Bruno O Villoutreix
- Université de Lille, Inserm, Institut Pasteur de Lille, U1177-Drugs and Molecules for Living Systems, F-59000 Lille, France
8. Horvath D, Marcou G, Varnek A. "Big Data" Fast Chemoinformatics Model to Predict Generalized Born Radius and Solvent Accessibility as a Function of Geometry. J Chem Inf Model 2020; 60:2951-2965. [PMID: 32374171 DOI: 10.1021/acs.jcim.9b01172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
The Generalized Born (GB) solvent model offers the best accuracy/computing effort ratio, yet requires drastic simplifications to estimate the Effective Born Radii (EBR) while bypassing an overly expensive volume integration step. EBRs are a measure of the degree of burial of an atom and are not very sensitive to small changes of geometry: in molecular dynamics, the costly EBR update procedure is not mandatory at every step. This work, however, aims at implementing a GB model into the Sampler for Multiple Protein-Ligand Entities (S4MPLE) evolutionary algorithm, with mandatory EBR updates at each step triggering arbitrarily large geometric changes. Therefore, a quantitative structure-property relationship has been developed in order to express the EBRs as a linear function of both the topological neighborhood and geometric occupancy of the space around atoms. A training set of 810 molecular systems, ranging from fragment-like to drug-like compounds, proteins, host-guest systems, and ligand-protein complexes, has been compiled. For each species, S4MPLE generated several hundred random conformers. For each atom in each geometry of each species, its "standard" EBR was calculated by numeric integration and associated with topological and geometric descriptors of the atom neighborhood. This training set (EBR, atom descriptors) involving >5 M entries was subjected to a bootstrapping multilinear regression process with descriptor selection. In parallel, the strategy was repurposed to also learn atomic solvent-accessible areas (SA) based on the same descriptors. The resulting linear equations were challenged to predict EBR and SA values for a similarly compiled external set of >2000 new molecular systems. Solvation energies calculated with estimated EBR and SA match "standard" energies within the typical error of a force-field-based approach (a few kilocalories per mole). Given the extreme diversity of molecular systems covered by the model, this simple EBR/SA estimator covers a vast applicability domain.
Affiliation(s)
- Dragos Horvath
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Gilles Marcou
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Alexandre Varnek
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
9. Ramírez-Palma DI, García-Jacas CR, Carpio-Martínez P, Cortés-Guzmán F. Predicting reactive sites with quantum chemical topology: carbonyl additions in multicomponent reactions. Phys Chem Chem Phys 2020; 22:9283-9289. [DOI: 10.1039/d0cp00300j] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/31/2022]
Abstract
The reactivity of an atom within a molecule depends mostly on the way the electron density polarizes, as reflected in the quadrupole moment of the reactive atom.
Affiliation(s)
- Cesar R. García-Jacas
- Cátedras CONACYT – Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Mexico
10. Moman E, Grishina MA, Potemkin VA. Nonparametric chemical descriptors for the calculation of ligand-biopolymer affinities with machine-learning scoring functions. J Comput Aided Mol Des 2019; 33:943-953. [PMID: 31728812 DOI: 10.1007/s10822-019-00248-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/30/2019] [Accepted: 11/04/2019] [Indexed: 12/20/2022]
Abstract
The computational prediction of ligand-biopolymer affinities is a crucial endeavor in modern drug discovery and one that still poses major challenges. The choice of the appropriate computational method often reveals itself as a trade-off between accuracy and speed, with mathematical devices referred to as scoring functions being the fastest. Among the many shortcomings of scoring functions is the lack of universal applicability to every molecular system. This is largely due to their reliance on atom type perception and/or parametrization. This article proposes the use of nonparametric Model of Effective Radii of Atoms descriptors that can be readily computed for the entire Periodic Table and demonstrates that, in combination with machine learning algorithms, they can yield competitive performances and chemically meaningful insights.
Affiliation(s)
- Edelmiro Moman
- South Ural State University, 20A Tchaikovsky Street, Chelyabinsk, Russian Federation, 454080
- Maria A Grishina
- South Ural State University, 20A Tchaikovsky Street, Chelyabinsk, Russian Federation, 454080
- Vladimir A Potemkin
- South Ural State University, 20A Tchaikovsky Street, Chelyabinsk, Russian Federation, 454080
11. Scheidig AJ, Horvath D, Szedlacsek SE. Crystal structure of a xylulose 5-phosphate phosphoketolase. Insights into the substrate specificity for xylulose 5-phosphate. J Struct Biol 2019; 207:85-102. [PMID: 31059775 DOI: 10.1016/j.jsb.2019.04.017] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/13/2019] [Revised: 04/25/2019] [Accepted: 04/26/2019] [Indexed: 12/11/2022]
Abstract
Phosphoketolases (PKs) are TPP-dependent enzymes that play essential roles in the carbohydrate metabolism of numerous bacteria. Depending on their substrate specificity, PKs can be subdivided into xylulose 5-phosphate (X5P)-specific PKs (XPKs) and PKs that accept both X5P and fructose 6-phosphate (F6P) (XFPKs). Despite their key metabolic importance, so far only the crystal structures of two XFPKs have been reported. There are no reported structures for any XPK or for any complex between a PK and its substrate. One of the major unknowns concerning the PK mechanism of action relates to the structural determinants of PK substrate specificity for X5P or F6P. We report here the crystal structure of XPK from Lactococcus lactis (XPK-Ll) at 2.1 Å resolution. Using small-angle X-ray scattering (SAXS), we proved that XPK-Ll is a dimer in solution. Towards a better understanding of PK substrate specificity, we performed flexible docking of TPP-X5P and TPP-F6P on crystal structures of XPK-Ll, two XFPKs and transketolase (TK). Calculated structure-based binding energies consistently support the XPK-Ll preference for X5P. Analysis of the structural models thus obtained shows that substrates adopt moderately different conformations in PK active sites, following distinct networks of polar interactions. Based on the structure of XPK-Ll reported here, we propose the most probable amino acid residues involved in the catalytic steps of the reaction mechanism. Altogether, our results suggest that PK substrate preference for X5P or F6P is the outcome of a fine balance between a specific binding network and dissimilar catalytic residues, depending on the enzyme (XPK or XFPK)-substrate (X5P or F6P) couple.
Affiliation(s)
- A J Scheidig
- Structural Biology, Zoological Institute, Kiel University, Am Botanischen Garten 1-9, 24118 Kiel, Germany
- D Horvath
- Laboratoire de Chémoinformatique, UMR 7140 CNRS-Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France
- S E Szedlacsek
- Department of Enzymology, Institute of Biochemistry of the Romanian Academy, Spl. Independentei 296, Bucharest 060031, Romania
12. Horvath D, Marcou G, Varnek A. Generative Topographic Mapping of the Docking Conformational Space. Molecules 2019; 24:molecules24122269. [PMID: 31216756 PMCID: PMC6631714 DOI: 10.3390/molecules24122269] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/20/2019] [Revised: 06/14/2019] [Accepted: 06/15/2019] [Indexed: 12/21/2022] Open
Abstract
Following previous efforts to render the Conformational Space (CS) of flexible compounds by Generative Topographic Mapping (GTM), this polyvalent mapping technique is here adapted to the docking problem. Contact fingerprints (CF) characterize ligands from the perspective of the binding site by monitoring protein atoms that are “touched” by those of the ligand. A “Contact” (CF) map was built by GTM-driven dimensionality reduction of the CF vector space. Alternatively, a “Hybrid” (Hy) map used a composite descriptor of CFs concatenated with ligand fragment descriptors. These maps indirectly represent the active site and integrate the binding information of multiple ligands. The concept is illustrated by a docking study into the ATP-binding site of CDK2, using the S4MPLE program to generate thousands of poses for each ligand. Both maps were challenged to (1) discriminate native from non-native ligand poses, e.g., create RMSD landscapes “colored” by the conformer ensemble of ligands of known binding modes in order to highlight “native” map zones (poses with RMSD to PDB structures < 2 Å), so that projecting poses of other ligands onto such landscapes might serve to predict those falling in native zones as being well docked; and (2) distinguish ligands, characterized by their ensemble of conformers, by their potency, e.g., testing the hypothesis that zones privileged by potent binders are clearly separated from the ones preferred by decoys on the maps. Hybrid maps were better in both challenges and outperformed the classical energy and individual contact satisfaction scores in discriminating ligands by potency. Moreover, the intuitive visualization and analysis of docking CS may have several applications, from highlighting key contacts to monitoring docking calculation convergence.
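The "native pose" criterion mentioned in this abstract (RMSD to the PDB structure below 2 Å) can be illustrated with a short sketch. It is not taken from the paper: the function names are hypothetical, atoms are assumed to be listed in the same order in both poses, both poses are assumed to share the receptor frame (no superposition), and ligand symmetry is ignored.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def pose_rmsd(pose: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root-mean-square deviation between matched atoms of two poses.

    No alignment is performed, because docking poses are compared in the
    fixed coordinate frame of the binding site.
    """
    if not pose or len(pose) != len(reference):
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum((a - b) ** 2
                  for p, r in zip(pose, reference)
                  for a, b in zip(p, r))
    return math.sqrt(squared / len(pose))


def is_native(pose: Sequence[Coord], reference: Sequence[Coord],
              cutoff: float = 2.0) -> bool:
    """Label a pose as native when its RMSD is below `cutoff` (in Å)."""
    return pose_rmsd(pose, reference) < cutoff


# Toy three-atom ligand: one pose shifted by 0.5 Å, one by 3.0 Å along x.
reference_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
near_pose = [(x + 0.5, y, z) for x, y, z in reference_pose]
far_pose = [(x + 3.0, y, z) for x, y, z in reference_pose]
print(pose_rmsd(near_pose, reference_pose), is_native(near_pose, reference_pose))
print(pose_rmsd(far_pose, reference_pose), is_native(far_pose, reference_pose))
```

For ligands with symmetric groups, a symmetry-aware atom matching would be needed before this comparison is meaningful.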
Affiliation(s)
- Dragos Horvath
- Laboratoire de Chemoinformatique, UMR7140 CNRS/Univ. of Strasbourg, 1, rue Blaise Pascal, 67000 Strasbourg, France
- Gilles Marcou
- Laboratoire de Chemoinformatique, UMR7140 CNRS/Univ. of Strasbourg, 1, rue Blaise Pascal, 67000 Strasbourg, France
- Alexandre Varnek
- Laboratoire de Chemoinformatique, UMR7140 CNRS/Univ. of Strasbourg, 1, rue Blaise Pascal, 67000 Strasbourg, France
13. Sattarov B, Baskin II, Horvath D, Marcou G, Bjerrum EJ, Varnek A. De Novo Molecular Design by Combining Deep Autoencoder Recurrent Neural Networks with Generative Topographic Mapping. J Chem Inf Model 2019; 59:1182-1196. [PMID: 30785751 DOI: 10.1021/acs.jcim.8b00751] [Citation(s) in RCA: 75] [Impact Index Per Article: 15.0] [Indexed: 11/28/2022]
Abstract
Here we show that Generative Topographic Mapping (GTM) can be used to explore the latent space of the SMILES-based autoencoders and generate focused molecular libraries of interest. We have built a sequence-to-sequence neural network with Bidirectional Long Short-Term Memory layers and trained it on the SMILES strings from ChEMBL23. Very high reconstruction rates of the test set molecules were achieved (>98%), which are comparable to the ones reported in related publications. Using GTM, we have visualized the autoencoder latent space on the two-dimensional topographic map. Targeted map zones can be used for generating novel molecular structures by sampling associated latent space points and decoding them to SMILES. The sampling method based on a genetic algorithm was introduced to optimize compound properties "on the fly". The generated focused molecular libraries were shown to contain original and a priori feasible compounds which, pending actual synthesis and testing, showed encouraging behavior in independent structure-based affinity estimation procedures (pharmacophore matching, docking).
Affiliation(s)
- Boris Sattarov
- Laboratory of Chemoinformatics, UMR 7177 University of Strasbourg/CNRS, 4 rue B. Pascal, 67000 Strasbourg, France
- Igor I Baskin
- Faculty of Physics, M.V. Lomonosov Moscow State University, Leninskie Gory, Moscow 19991, Russia
- Dragos Horvath
- Laboratory of Chemoinformatics, UMR 7177 University of Strasbourg/CNRS, 4 rue B. Pascal, 67000 Strasbourg, France
- Gilles Marcou
- Laboratory of Chemoinformatics, UMR 7177 University of Strasbourg/CNRS, 4 rue B. Pascal, 67000 Strasbourg, France
- Esben Jannik Bjerrum
- Wildcard Pharmaceutical Consulting, Zeaborg Science Center, Frødings Allé 41, 2860 Søborg, Denmark
- Alexandre Varnek
- Laboratory of Chemoinformatics, UMR 7177 University of Strasbourg/CNRS, 4 rue B. Pascal, 67000 Strasbourg, France