1
Lejmi M, Geslin D, Bureau R, Cuissart B, Ben Slima I, Meddouri N, Borgi A, Lamotte JL, Lepailleur A. Navigating pharmacophore space to identify activity discontinuities: A case study with BCR-ABL. Mol Inform 2024; 43:e202400050. [PMID: 38979846 DOI: 10.1002/minf.202400050] [Received: 02/07/2024] [Revised: 04/03/2024] [Accepted: 04/04/2024] [Indexed: 07/10/2024]
Abstract
The exploration of chemical space is a fundamental aspect of chemoinformatics, particularly when one explores a large compound data set to relate chemical structures to molecular properties. In this study, we extend our previous work on chemical space visualization at the pharmacophoric level. Instead of the conventional binary classification of affinity (active vs inactive), we introduce a refined approach that categorizes compounds into four distinct classes based on their activity levels: super active, very active, active, and inactive. This classification enriches the color scheme applied to pharmacophore space, where the color of a pharmacophore hypothesis is determined by the compounds associated with it. Using the BCR-ABL tyrosine kinase as a case study, we identified intriguing regions corresponding to pharmacophore activity discontinuities, providing valuable insights for structure-activity relationship (SAR) analysis.
Affiliation(s)
- Maroua Lejmi
- Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen, UNICAEN, ENSICAEN, CNRS - UMR GREYC, Normandie Univ, Caen, France
- Laboratoire en Informatique, Programmation Algorithmique et Heuristique, LIPAH, Université de Tunis El Manar, Tunis, Tunisia
- Damien Geslin
- Centre d'Etudes et de Recherche sur le Médicament de Normandie, UNICAEN, CERMN, Normandie Univ, Caen, France
- Ronan Bureau
- Centre d'Etudes et de Recherche sur le Médicament de Normandie, UNICAEN, CERMN, Normandie Univ, Caen, France
- Bertrand Cuissart
- Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen, UNICAEN, ENSICAEN, CNRS - UMR GREYC, Normandie Univ, Caen, France
- Ilef Ben Slima
- ISMAI, University of Kairouan, Kairouan, Tunisia
- Laboratory of Signals, systeMs, aRtificial Intelligence and neTworkS, SM@RTS, Digital Research Center of Sfax, Sfax, Tunisia
- Nida Meddouri
- Laboratoire de Recherche de l'EPITA, LRE, Le Kremlin-Bicêtre, Paris, France
- Amel Borgi
- Laboratoire en Informatique, Programmation Algorithmique et Heuristique, LIPAH, Université de Tunis El Manar, Tunis, Tunisia
- Jean-Luc Lamotte
- Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen, UNICAEN, ENSICAEN, CNRS - UMR GREYC, Normandie Univ, Caen, France
- Alban Lepailleur
- Centre d'Etudes et de Recherche sur le Médicament de Normandie, UNICAEN, CERMN, Normandie Univ, Caen, France
2
Pikalyova R, Zabolotna Y, Horvath D, Marcou G, Varnek A. Meta-GTM: Visualization and Analysis of the Chemical Library Space. J Chem Inf Model 2023; 63:5571-5582. [PMID: 37602843 DOI: 10.1021/acs.jcim.3c00719] [Indexed: 08/22/2023]
Abstract
In chemical library analysis, it may be useful to describe libraries as individual items rather than as collections of compounds. This is particularly true for ultra-large, non-cherry-pickable compound mixtures such as DNA-encoded libraries (DELs). In this sense, the chemical library space (CLS) is useful for managing a portfolio of libraries, just as chemical space (CS) helps manage a portfolio of molecules. Several possible CLSs were previously defined using vectorial library representations obtained from generative topographic mapping (GTM). Given the steadily growing number of DEL designs, the CLS is becoming "crowded" and requires analysis tools beyond pairwise library comparison. Therefore, herein, we investigate the cartography of the CLS on meta-(μ)GTMs; the prefix "meta" is a reminder that these are maps of the CLS, which is itself based on responsibility vectors issued by regular CS GTMs. 2.5K DELs and ChEMBL (as a reference) were projected onto the μGTM, producing landscapes of library-specific properties. These landscapes describe both interlibrary similarity and intrinsic library characteristics in the same view, thereby facilitating the selection of the best libraries for a given project.
Affiliation(s)
- Regina Pikalyova
- Laboratory of Chemoinformatics, University of Strasbourg, 4, rue B. Pascal, Strasbourg 67081, France
- Yuliana Zabolotna
- Laboratory of Chemoinformatics, University of Strasbourg, 4, rue B. Pascal, Strasbourg 67081, France
- Dragos Horvath
- Laboratory of Chemoinformatics, University of Strasbourg, 4, rue B. Pascal, Strasbourg 67081, France
- Gilles Marcou
- Laboratory of Chemoinformatics, University of Strasbourg, 4, rue B. Pascal, Strasbourg 67081, France
- Alexandre Varnek
- Laboratory of Chemoinformatics, University of Strasbourg, 4, rue B. Pascal, Strasbourg 67081, France
3
Kausar S, Falcao AO. A visual approach for analysis and inference of molecular activity spaces. J Cheminform 2019; 11:63. [PMID: 33430986 PMCID: PMC6805449 DOI: 10.1186/s13321-019-0386-z] [Received: 10/01/2018] [Accepted: 10/05/2019] [Indexed: 11/25/2022]
Abstract
BACKGROUND Molecular space visualization can help to explore the diversity of large, heterogeneous chemical data, which may ultimately improve the understanding of structure-activity relationships (SAR) in drug discovery projects. Visual SAR analysis can therefore be useful for library design, for the chemical classification of compounds for biological evaluation, and for virtual screening to select compounds for synthesis or in vitro testing. As such, computational approaches for molecular space visualization have become an important topic in cheminformatics research. The proposed approach uses molecular similarity as the sole input for computing a probabilistic surface of molecular activity (PSMA). The similarity matrix is projected into 2D using different dimensionality reduction algorithms (principal coordinates analysis (PCooA), Kruskal multidimensional scaling, Sammon mapping, and t-SNE). A kernel density function is then applied to this projection to compute the probability of activity at each coordinate of the new projected space. RESULTS This methodology was tested on four different quantitative structure-activity relationship (QSAR) binary classification data sets, and a PSMA was computed for each. The generated maps showed internal consistency, with active molecules grouped together for all data sets and all dimensionality reduction algorithms. To validate the quality of the generated maps, the 2D coordinates of test molecules were computed in the new reference space using a data transformation matrix. In total, sixteen PSMAs were built, and their performance was assessed using the area under the curve (AUC) and the Matthews correlation coefficient (MCC). For the best projection of each data set, test AUC values ranged from 0.87 to 0.98 and MCC scores from 0.33 to 0.77, suggesting that this methodology can validly capture the complexities of the molecular activity space.
All four mapping functions provided generally good results, yet the overall performance of PCooA and t-SNE was slightly better than that of Sammon mapping and Kruskal multidimensional scaling. CONCLUSIONS Our results showed that, by using an appropriate combination of metric space representation and dimensionality reduction applied over metric spaces, it is possible to produce a visual PSMA whose consistency has been validated by using the map as a classification model. The maps produced can be used as prediction tools, since any molecule can easily be projected into this new reference space as long as its similarities to the molecules used to compute the initial similarity matrix can be computed.
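The PSMA pipeline summarized in this abstract (similarity matrix, 2D projection, kernel density) can be illustrated with a short sketch. It is not the authors' implementation, and several choices are assumptions: distances are taken as 1 - similarity, only principal coordinates analysis is shown (not Kruskal MDS, Sammon mapping, or t-SNE), and the probability of activity is modeled as the ratio of a Gaussian kernel density over active compounds to the density over all compounds, with a hypothetical `bandwidth` parameter. The projection of test molecules through a transformation matrix is not covered. All function names are hypothetical.

```python
import math
import random


def similarity_to_distance(sim):
    """Assumption: distance = 1 - similarity (e.g. Tanimoto values in [0, 1])."""
    return [[1.0 - s for s in row] for row in sim]


def pcoa_2d(dist, iters=2000, seed=0):
    """Principal coordinates analysis (classical MDS) of a distance matrix into 2D.

    Uses power iteration with deflation instead of a full eigensolver so the
    sketch needs only the standard library.
    """
    n = len(dist)
    # Double-centre the squared distances: B = -1/2 * J D^2 J.
    d2 = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in d2]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0, 0.0] for _ in range(n)]
    for axis in range(2):
        # Gershgorin shift: makes B + shift*I positive semidefinite, so power
        # iteration finds the largest (not largest-magnitude) eigenvalue even
        # when non-Euclidean distances give B negative eigenvalues.
        shift = max(sum(abs(x) for x in r) for r in b)
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient on the unshifted matrix gives the eigenvalue of B.
        lam = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][axis] = v[i] * scale
        # Deflate so the next pass finds the next principal coordinate.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


def activity_probability(point, coords, is_active, bandwidth=1.0):
    """Assumed model: Gaussian-kernel density of actives divided by density of all compounds."""
    num = den = 0.0
    for (x, y), active in zip(coords, is_active):
        k = math.exp(-((point[0] - x) ** 2 + (point[1] - y) ** 2) / (2.0 * bandwidth ** 2))
        den += k
        if active:
            num += k
    # Far from every compound the kernels underflow; report 0.0 (no evidence of activity).
    return num / den if den > 0.0 else 0.0


def probability_surface(coords, is_active, step=0.5, bandwidth=1.0, margin=1.0):
    """Evaluate activity_probability on a regular grid covering the projected compounds."""
    xs = [c[0] for c in coords]
    ys = [c[1] for c in coords]
    x0, y0 = min(xs) - margin, min(ys) - margin
    nx = int((max(xs) + margin - x0) / step) + 1
    ny = int((max(ys) + margin - y0) / step) + 1
    return [[activity_probability((x0 + i * step, y0 + j * step), coords, is_active, bandwidth)
             for i in range(nx)] for j in range(ny)]
```

On two well-separated clusters, the recovered coordinates reproduce the input distances up to rotation and reflection, and compounds inside the active cluster score close to 1. The bandwidth controls how smooth the surface is and would need tuning on real data.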
Affiliation(s)
- Samina Kausar
- LaSIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal
- BioISI: Biosystems & Integrative Sciences Institute, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal
- Andre O. Falcao
- LaSIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal
- BioISI: Biosystems & Integrative Sciences Institute, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal
4
Sattarov B, Baskin II, Horvath D, Marcou G, Bjerrum EJ, Varnek A. De Novo Molecular Design by Combining Deep Autoencoder Recurrent Neural Networks with Generative Topographic Mapping. J Chem Inf Model 2019; 59:1182-1196. [PMID: 30785751 DOI: 10.1021/acs.jcim.8b00751] [Indexed: 11/28/2022]
Abstract
Here we show that Generative Topographic Mapping (GTM) can be used to explore the latent space of SMILES-based autoencoders and to generate focused molecular libraries of interest. We built a sequence-to-sequence neural network with bidirectional long short-term memory (LSTM) layers and trained it on SMILES strings from ChEMBL23. Very high reconstruction rates (>98%) were achieved for the test set molecules, comparable to those reported in related publications. Using GTM, we visualized the autoencoder latent space on a two-dimensional topographic map. Targeted map zones can be used to generate novel molecular structures by sampling the associated latent space points and decoding them to SMILES. A sampling method based on a genetic algorithm was introduced to optimize compound properties "on the fly". The generated focused molecular libraries were shown to contain original and a priori feasible compounds which, pending actual synthesis and testing, showed encouraging behavior in independent structure-based affinity estimation procedures (pharmacophore matching, docking).
Affiliation(s)
- Boris Sattarov
- Laboratory of Chemoinformatics, UMR 7177 University of Strasbourg/CNRS, 4 rue B. Pascal, 67000 Strasbourg, France
- Igor I Baskin
- Faculty of Physics, M.V. Lomonosov Moscow State University, Leninskie Gory, Moscow 119991, Russia
- Dragos Horvath
- Laboratory of Chemoinformatics, UMR 7177 University of Strasbourg/CNRS, 4 rue B. Pascal, 67000 Strasbourg, France
- Gilles Marcou
- Laboratory of Chemoinformatics, UMR 7177 University of Strasbourg/CNRS, 4 rue B. Pascal, 67000 Strasbourg, France
- Esben Jannik Bjerrum
- Wildcard Pharmaceutical Consulting, Zeaborg Science Center, Frødings Allé 41, 2860 Søborg, Denmark
- Alexandre Varnek
- Laboratory of Chemoinformatics, UMR 7177 University of Strasbourg/CNRS, 4 rue B. Pascal, 67000 Strasbourg, France
5
Abstract
Various machine learning methods (supervised and unsupervised, linear and nonlinear, classification and regression), in combination with various types of molecular descriptors, both "handcrafted" and "data-driven," are considered in the context of their use in computational toxicology. This review focuses on multiple linear regression, variants of the naïve Bayes classifier, k-nearest neighbors, support vector machines, decision trees, ensemble learning, random forests, several types of neural networks, and deep learning. The role of fragment descriptors, graph mining, and graph kernels is highlighted. The application of unsupervised methods, such as Kohonen's self-organizing maps and related approaches, which make it possible to combine predictions with data analysis and visualization, is also considered. The review underlines the need to apply a wide range of machine learning methods in computational toxicology.
Affiliation(s)
- Igor I Baskin
- Faculty of Physics, M.V. Lomonosov Moscow State University, Moscow, Russian Federation
- Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russian Federation
6
Kontijevskis A. Mapping of Drug-like Chemical Universe with Reduced Complexity Molecular Frameworks. J Chem Inf Model 2017; 57:680-699. [DOI: 10.1021/acs.jcim.7b00006] [Indexed: 01/03/2023]