1
DMSO Solubility Assessment for Fragment-Based Screening. Molecules 2021; 26:3950. [PMID: 34203441 PMCID: PMC8271413 DOI: 10.3390/molecules26133950] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/04/2021] [Revised: 06/23/2021] [Accepted: 06/23/2021] [Indexed: 11/16/2022] [Open Access]
Abstract
In this paper, we report comprehensive experimental and chemoinformatics analyses of the solubility of small organic molecules (“fragments”) in dimethyl sulfoxide (DMSO) in the context of their ability to be tested in screening experiments. Here, DMSO solubility of 939 fragments has been measured experimentally using an NMR technique. A Support Vector Classification model was built on the obtained data using the ISIDA fragment descriptors. The analysis revealed 34 outliers: experimental issues were retrospectively identified for 28 of them. The updated model performs well in 5-fold cross-validation (balanced accuracy = 0.78). The datasets are available on the Zenodo platform (DOI:10.5281/zenodo.4767511) and the model is available on the website of the Laboratory of Chemoinformatics.
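The balanced accuracy quoted in this abstract is the mean of the per-class recalls, which is a stricter figure than plain accuracy when one class dominates the data. The sketch below is illustrative only: the labels are made up, the function name is hypothetical, and it is not the authors' evaluation code.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; for two classes this is (sensitivity + specificity) / 2."""
    hits = defaultdict(int)    # correct predictions per true class
    totals = defaultdict(int)  # number of samples per true class
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical labels (assumption): 1 = soluble in DMSO, 0 = insoluble.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0]

score = balanced_accuracy(y_true, y_pred)
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"balanced accuracy = {score:.4f}, plain accuracy = {plain:.2f}")
```

In this toy case the majority class is recalled at 7/8 and the minority class at 1/2, so the balanced accuracy falls below the plain accuracy, which is why the metric suits imbalanced soluble/insoluble data.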

2
Horvath D, Marcou G, Varnek A. Generative topographic mapping in drug design. Drug Discovery Today: Technologies 2019; 32-33:99-107. [PMID: 33386101 DOI: 10.1016/j.ddtec.2020.06.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/07/2020] [Revised: 06/10/2020] [Accepted: 06/18/2020] [Indexed: 06/12/2023]
Abstract
This is a review article on Generative Topographic Mapping (GTM), a non-linear dimensionality reduction technique producing generative 2D maps of high-dimensional vector spaces, and its specific applications in drug design (chemical space cartography, compound library design and analysis, virtual screening, pharmacological profiling, de novo drug design, conformational space and docking interaction cartography, etc.). Written by chemoinformaticians for potential users among medicinal chemists and biologists, the article purposely avoids all underlying mathematics. First, the GTM concept is intuitively explained, based on its strong analogies with the rather popular Self-Organizing Maps (SOMs), which are well-established library analysis tools. GTM is basically a fuzzy-logic-based generalization of SOMs. In the second part of the review, some published GTM applications in drug design are briefly revisited.
Affiliation(s)
- Dragos Horvath
  - Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Gilles Marcou
  - Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Alexandre Varnek
  - Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France

3
Lunghini F, Marcou G, Azam P, Horvath D, Patoux R, Van Miert E, Varnek A. Consensus models to predict oral rat acute toxicity and validation on a dataset coming from the industrial context. SAR and QSAR in Environmental Research 2019; 30:879-897. [PMID: 31607169 DOI: 10.1080/1062936x.2019.1672089] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 07/12/2019] [Accepted: 09/21/2019] [Indexed: 06/10/2023]
Abstract
We report predictive models of acute oral systemic toxicity, a follow-up of our previous work in the framework of the NICEATM project. It includes an update of the original models through the addition of new data and an external validation of the models using a dataset relevant to the chemical industry context. A regression model for LD50 and a multi-class classification model for toxicity classes according to the Globally Harmonized System categories were prepared. ISIDA descriptors were used to encode molecular structures. Machine learning algorithms included support vector machine (SVM), random forest (RF) and naïve Bayes. Selected individual models were combined in a consensus. The different datasets were compared using the generative topographic mapping approach. It appeared that the NICEATM datasets were lacking some chemotypes relevant to the chemical industry. The new models, trained on enlarged datasets, have applicability domains (AD) sufficiently large to accommodate industrial compounds. The fraction of compounds inside the models' AD increased from 58% (NICEATM model) to 94% (new model). Enlarging the training sets improved the models' prediction performance: RMSE values decreased from 0.56 to 0.47 and balanced accuracies increased from 0.69 to 0.71 for the NICEATM and new models, respectively.
Affiliation(s)
- F Lunghini
  - Laboratory of Chemoinformatics, University of Strasbourg, Strasbourg, France
  - Toxicological and Environmental Risk Assessment unit, Solvay S.A., St. Fons, France
- G Marcou
  - Laboratory of Chemoinformatics, University of Strasbourg, Strasbourg, France
- P Azam
  - Toxicological and Environmental Risk Assessment unit, Solvay S.A., St. Fons, France
- D Horvath
  - Laboratory of Chemoinformatics, University of Strasbourg, Strasbourg, France
- R Patoux
  - Toxicological and Environmental Risk Assessment unit, Solvay S.A., St. Fons, France
- E Van Miert
  - Toxicological and Environmental Risk Assessment unit, Solvay S.A., St. Fons, France
- A Varnek
  - Laboratory of Chemoinformatics, University of Strasbourg, Strasbourg, France

4
Lunghini F, Marcou G, Azam P, Patoux R, Enrici MH, Bonachera F, Horvath D, Varnek A. QSPR models for bioconcentration factor (BCF): are they able to predict data of industrial interest? SAR and QSAR in Environmental Research 2019; 30:507-524. [PMID: 31244346 DOI: 10.1080/1062936x.2019.1626278] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 04/09/2019] [Accepted: 05/29/2019] [Indexed: 05/27/2023]
Abstract
The bioconcentration factor (BCF), a key parameter required by the REACH regulation, estimates the tendency of a xenobiotic to concentrate inside living organisms. In silico methods can be valid alternatives to costly data measurements. However, in the industrial context, these theoretical approaches may fail to predict BCF with reasonable accuracy. We analyzed whether models built on public data only perform adequately when challenged to predict industrial compounds. A new set of 1129 compounds has been collected by merging publicly available datasets. Generative Topographic Mapping was employed to compare this chemical space with a set of new compounds issued from industry. Some new chemotypes absent from the training set (such as siloxanes) have been detected. A new BCF model has been built using ISIDA (In SIlico design and Data Analysis) fragment descriptors with support vector regression and random forest machine-learning methods. It has been externally validated on (i) data collected from the literature and (ii) industrial data. The latter also served as a benchmark for the freely available tools VEGA, EPISuite, TEST and OPERA. The new model performs comparably to existing ones (RMSE of 0.58 log BCF units) but benefits from an extended applicability domain, covering the chemical space of the industrial set (78% data coverage).
Affiliation(s)
- F Lunghini
  - Laboratory of Chemoinformatics, University of Strasbourg, Strasbourg, France
  - Solvay S.A., France
- G Marcou
  - Laboratory of Chemoinformatics, University of Strasbourg, Strasbourg, France
- F Bonachera
  - Laboratory of Chemoinformatics, University of Strasbourg, Strasbourg, France
- D Horvath
  - Laboratory of Chemoinformatics, University of Strasbourg, Strasbourg, France
- A Varnek
  - Laboratory of Chemoinformatics, University of Strasbourg, Strasbourg, France

5
Scheidig AJ, Horvath D, Szedlacsek SE. Crystal structure of a xylulose 5-phosphate phosphoketolase. Insights into the substrate specificity for xylulose 5-phosphate. J Struct Biol 2019; 207:85-102. [PMID: 31059775 DOI: 10.1016/j.jsb.2019.04.017] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/13/2019] [Revised: 04/25/2019] [Accepted: 04/26/2019] [Indexed: 12/11/2022]
Abstract
Phosphoketolases (PKs) are TPP-dependent enzymes which play essential roles in the carbohydrate metabolism of numerous bacteria. Depending on their substrate specificity, PKs can be subdivided into xylulose 5-phosphate (X5P) specific PKs (XPKs) and PKs which accept both X5P and fructose 6-phosphate (F6P) (XFPKs). Despite their key metabolic importance, so far only the crystal structures of two XFPKs have been reported. There are no reported structures for any XPK or for any complex between a PK and its substrate. One of the major unknowns concerning the PK mechanism of action is related to the structural determinants of PK substrate specificity for X5P or F6P. We report here the crystal structure of XPK from Lactococcus lactis (XPK-Ll) at 2.1 Å resolution. Using small angle X-ray scattering (SAXS) we proved that XPK-Ll is a dimer in solution. Towards a better understanding of PK substrate specificity, we performed flexible docking of TPP-X5P and TPP-F6P on the crystal structures of XPK-Ll, two XFPKs and transketolase (TK). Calculated structure-based binding energies consistently support the preference of XPK-Ll for X5P. Analysis of the structural models thus obtained shows that substrates adopt moderately different conformations in PK active sites, following distinct networks of polar interactions. Based on the structure of XPK-Ll reported here, we propose the most probable amino acid residues involved in the catalytic steps of the reaction mechanism. Altogether, our results suggest that PK substrate preference for X5P or F6P is the outcome of a fine balance between specific binding networks and dissimilar catalytic residues, depending on the enzyme (XPK or XFPK) and substrate (X5P or F6P) couples.
Affiliation(s)
- A J Scheidig
  - Structural Biology, Zoological Institute, Kiel University, Am Botanischen Garten 1-9, 24118 Kiel, Germany
- D Horvath
  - Laboratoire de Chémoinformatique, UMR 7140 CNRS-Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France
- S E Szedlacsek
  - Department of Enzymology, Institute of Biochemistry of the Romanian Academy, Spl. Independentei 296, Bucharest 060031, Romania

6
Horvath D, Marcou G, Varnek A. Generative Topographic Mapping of the Docking Conformational Space. Molecules 2019; 24:2269. [PMID: 31216756 PMCID: PMC6631714 DOI: 10.3390/molecules24122269] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/20/2019] [Revised: 06/14/2019] [Accepted: 06/15/2019] [Indexed: 12/21/2022] [Open Access]
Abstract
Following previous efforts to render the Conformational Space (CS) of flexible compounds by Generative Topographic Mapping (GTM), this polyvalent mapping technique is here adapted to the docking problem. Contact fingerprints (CF) characterize ligands from the perspective of the binding site by monitoring protein atoms that are “touched” by those of the ligand. A “Contact” (CF) map was built by GTM-driven dimensionality reduction of the CF vector space. Alternatively, a “Hybrid” (Hy) map used a composite descriptor of CFs concatenated with ligand fragment descriptors. These maps indirectly represent the active site and integrate the binding information of multiple ligands. The concept is illustrated by a docking study into the ATP-binding site of CDK2, using the S4MPLE program to generate thousands of poses for each ligand. Both maps were challenged to (1) discriminate native from non-native ligand poses, e.g., create RMSD landscapes “colored” by the conformer ensemble of ligands of known binding modes in order to highlight “native” map zones (poses with RMSD to PDB structures < 2 Å); projection of poses of other ligands onto such landscapes might then serve to predict those falling in native zones as being well-docked; and (2) distinguish ligands, characterized by their ensemble of conformers, by their potency, e.g., testing the hypothesis that zones privileged by potent binders are clearly separated on the maps from the ones preferred by decoys. Hybrid maps were better in both challenges and outperformed the classical energy and individual contact satisfaction scores in discriminating ligands by potency. Moreover, the intuitive visualization and analysis of the docking CS may have several applications, from highlighting key contacts to monitoring docking calculation convergence.
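As a rough illustration of the contact fingerprint idea described in this abstract, the sketch below marks a protein atom as "touched" when any ligand atom lies within a distance cutoff. The 4.0 Å cutoff, the coordinates, and the function name are assumptions made for illustration; the paper's actual contact definition may differ.

```python
import math


def contact_fingerprint(protein_atoms, ligand_atoms, cutoff=4.0):
    """Return one bit per protein atom: 1 if any ligand atom is within `cutoff` (Å).

    The cutoff value is an assumption, not taken from the paper.
    """
    return [
        int(any(math.dist(p, l) <= cutoff for l in ligand_atoms))
        for p in protein_atoms
    ]


# Hypothetical binding-site and ligand-pose coordinates, in Å.
protein_atoms = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
ligand_atoms = [(1.0, 1.0, 0.0), (0.0, 8.0, 0.0)]

fp = contact_fingerprint(protein_atoms, ligand_atoms)
print(fp)
```

Because every pose of every ligand is encoded over the same ordered list of protein atoms, the resulting vectors share one descriptor space, which is what allows poses of different ligands to be projected onto a common map.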
Affiliation(s)
- Dragos Horvath
  - Laboratoire de Chemoinformatique, UMR7140 CNRS/Univ. of Strasbourg, 1, rue Blaise Pascal, 67000 Strasbourg, France
- Gilles Marcou
  - Laboratoire de Chemoinformatique, UMR7140 CNRS/Univ. of Strasbourg, 1, rue Blaise Pascal, 67000 Strasbourg, France
- Alexandre Varnek
  - Laboratoire de Chemoinformatique, UMR7140 CNRS/Univ. of Strasbourg, 1, rue Blaise Pascal, 67000 Strasbourg, France

7
Sattarov B, Baskin II, Horvath D, Marcou G, Bjerrum EJ, Varnek A. De Novo Molecular Design by Combining Deep Autoencoder Recurrent Neural Networks with Generative Topographic Mapping. J Chem Inf Model 2019; 59:1182-1196. [PMID: 30785751 DOI: 10.1021/acs.jcim.8b00751] [Citation(s) in RCA: 75] [Impact Index Per Article: 15.0] [Indexed: 11/28/2022]
Abstract
Here we show that Generative Topographic Mapping (GTM) can be used to explore the latent space of the SMILES-based autoencoders and generate focused molecular libraries of interest. We have built a sequence-to-sequence neural network with Bidirectional Long Short-Term Memory layers and trained it on the SMILES strings from ChEMBL23. Very high reconstruction rates of the test set molecules were achieved (>98%), which are comparable to the ones reported in related publications. Using GTM, we have visualized the autoencoder latent space on the two-dimensional topographic map. Targeted map zones can be used for generating novel molecular structures by sampling associated latent space points and decoding them to SMILES. The sampling method based on a genetic algorithm was introduced to optimize compound properties "on the fly". The generated focused molecular libraries were shown to contain original and a priori feasible compounds which, pending actual synthesis and testing, showed encouraging behavior in independent structure-based affinity estimation procedures (pharmacophore matching, docking).
Affiliation(s)
- Boris Sattarov
  - Laboratory of Chemoinformatics, UMR 7177 University of Strasbourg/CNRS, 4 rue B. Pascal, 67000 Strasbourg, France
- Igor I Baskin
  - Faculty of Physics, M.V. Lomonosov Moscow State University, Leninskie Gory, Moscow 19991, Russia
- Dragos Horvath
  - Laboratory of Chemoinformatics, UMR 7177 University of Strasbourg/CNRS, 4 rue B. Pascal, 67000 Strasbourg, France
- Gilles Marcou
  - Laboratory of Chemoinformatics, UMR 7177 University of Strasbourg/CNRS, 4 rue B. Pascal, 67000 Strasbourg, France
- Esben Jannik Bjerrum
  - Wildcard Pharmaceutical Consulting, Zeaborg Science Center, Frødings Allé 41, 2860 Søborg, Denmark
- Alexandre Varnek
  - Laboratory of Chemoinformatics, UMR 7177 University of Strasbourg/CNRS, 4 rue B. Pascal, 67000 Strasbourg, France

8
Horvath D, Marcou G, Varnek A. Monitoring of the Conformational Space of Dipeptides by Generative Topographic Mapping. Mol Inform 2017; 37. [DOI: 10.1002/minf.201700115] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 09/15/2017] [Accepted: 11/08/2017] [Indexed: 12/28/2022]
Affiliation(s)
- Dragos Horvath
  - Laboratoire de Chémoinformatique, UMR 7140 CNRS-Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France
- Gilles Marcou
  - Laboratoire de Chémoinformatique, UMR 7140 CNRS-Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France
- Alexandre Varnek
  - Laboratoire de Chémoinformatique, UMR 7140 CNRS-Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France