51
On Generative Topographic Mapping and Graph Theory combined approach for unsupervised non-linear data visualization and fault identification. Comput Chem Eng 2017. [DOI: 10.1016/j.compchemeng.2016.12.009]
52
Horvath D, Marcou G, Varnek A. Generative Topographic Mapping Approach to Chemical Space Analysis. Challenges and Advances in Computational Chemistry and Physics 2017. [DOI: 10.1007/978-3-319-56850-8_6]
53
Velkoborsky J, Hoksza D. Scaffold analysis of PubChem database as background for hierarchical scaffold-based visualization. J Cheminform 2016; 8:74. [PMID: 28090217 PMCID: PMC5199768 DOI: 10.1186/s13321-016-0186-7] [Received: 09/19/2016] [Accepted: 12/02/2016]
Abstract
Background: Visualization of large molecular datasets is a challenging yet important topic in diverse fields of chemistry, ranging from materials engineering to drug design. Especially in drug design, modern high-throughput screening methods generate large amounts of molecular data that call for methods enabling their analysis. One such method is the classification of compounds by their molecular scaffolds, a concept widely used by medicinal chemists to group molecules with similar properties. This classification can then be used for intuitive visualization of compounds. Results: In this paper, we propose a scaffold hierarchy derived from a large-scale analysis of the PubChem Compound database. The analysis not only provides insights into the scaffold diversity of the PubChem Compound database, but also enables scaffold-based hierarchical visualization of user compound data sets against the background of empirical chemical space, as defined by the PubChem data, or against any other user-defined data set. The visualization is performed by a web-based client-server application called Scaffvis. It provides an interactive, zoomable tree-map visualization of data sets of up to hundreds of thousands of molecules. Scaffvis is free to use, and its source code has been published under an open-source license. Electronic supplementary material: the online version of this article (doi:10.1186/s13321-016-0186-7) contains supplementary material, which is available to authorized users.
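A scaffold-based tree map sizes each node by the number of compounds it covers, including compounds filed under more specific child scaffolds. The sketch below illustrates only that aggregation step under simple assumptions: the hierarchy is given as a hypothetical child-to-parent dictionary, and each compound has already been assigned to its most specific scaffold. It is not Scaffvis source code, and the scaffold IDs are placeholders rather than real scaffold definitions.

```python
from collections import Counter


def subtree_counts(parent, scaffold_of):
    """Number of compounds under each node of a scaffold hierarchy.

    parent: hypothetical child -> parent map of scaffold IDs (None marks a root).
    scaffold_of: compound ID -> most specific scaffold ID of that compound.
    A compound is counted at its own scaffold and at every ancestor, so a
    node's count equals the size of its subtree, i.e. its tree-map area.
    """
    counts = Counter()
    for scaffold in scaffold_of.values():
        node = scaffold
        while node is not None:  # climb from the scaffold up to the root
            counts[node] += 1
            node = parent[node]
    return counts


def area_fractions(parent, counts, node):
    """Share of a node's area taken by each direct child.

    The fractions sum to less than 1 when some compounds are assigned
    directly to the node itself rather than to one of its children.
    """
    children = [c for c, p in parent.items() if p == node]
    return {c: counts[c] / counts[node] for c in children}


if __name__ == "__main__":
    # Placeholder hierarchy: root S with children S1 and S2; S1a refines S1.
    parent = {"S": None, "S1": "S", "S2": "S", "S1a": "S1"}
    compounds = {"c1": "S1a", "c2": "S1", "c3": "S2", "c4": "S1a"}
    counts = subtree_counts(parent, compounds)
    print(dict(counts))
    print(area_fractions(parent, counts, "S"))
```

Zooming into a node then amounts to repeating `area_fractions` one level down, which is why the counts are accumulated over whole subtrees first.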
Affiliation(s)
- Jakub Velkoborsky
  - Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic
- David Hoksza
  - Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic
54
Klimenko K, Marcou G, Horvath D, Varnek A. Chemical Space Mapping and Structure-Activity Analysis of the ChEMBL Antiviral Compound Set. J Chem Inf Model 2016; 56:1438-54. [PMID: 27410486 DOI: 10.1021/acs.jcim.6b00192]
Abstract
Curation, standardization and data fusion of the antiviral information present in the ChEMBL public database led to the definition of a robust data set that associates antiviral compounds with seven broadly defined antiviral activity classes. Generative topographic mapping (GTM), subjected to evolutionary tuning, was then used to produce maps of the antiviral chemical space that optimally separate the compound families associated with the different antiviral classes. The ability to pinpoint the specific regions of a map (responsibility patterns) occupied by the various classes of antiviral compounds opened the way for a GTM-supported search for privileged structural motifs typical of each antiviral class. The privileged locations of the antiviral classes were analyzed to highlight the underlying privileged structural motifs. Unlike classical medicinal chemistry, where privileged structures are almost always predefined scaffolds, privileged structural motif detection based on GTM responsibility patterns has the decisive advantage of automatically capturing the nature ("resolution detail": scaffold, detailed substructure, pharmacophore pattern, etc.) of the relevant structural motifs. Responsibility patterns were found to represent underlying structural motifs of various natures: from very fuzzy (groups of various "interchangeable" similar scaffolds), through the classical medicinal chemistry scenario (the underlying motif actually being the scaffold), to very precisely defined motifs (specifically substituted scaffolds).
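In GTM, a compound's responsibility vector gives the posterior probability that each node of the latent grid generated it, assuming equal node priors and isotropic Gaussian noise around each node's image y_k in descriptor space. A class's responsibility pattern can then be summarized by averaging these vectors over the class members. The sketch below assumes a map that has already been fitted: the node images and the inverse variance `beta` are plain inputs, EM training is omitted, and the toy coordinates are hypothetical. It illustrates the projection step only and is not the authors' evolutionary GTM pipeline.

```python
import math


def responsibilities(t, nodes, beta):
    """Posterior probability that each GTM node generated the data point t.

    nodes: fitted node images y_k in descriptor space (one list of floats per node).
    beta: inverse variance of the isotropic Gaussian noise model.
    Equal node priors are assumed, so they cancel in the normalization, as does
    the Gaussian prefactor (beta / (2*pi))**(D/2), which is the same for every node.
    """
    # Log-likelihood of t under each node, up to the shared constant.
    logs = [-0.5 * beta * sum((a - b) ** 2 for a, b in zip(t, y)) for y in nodes]
    top = max(logs)  # subtract the maximum before exp() for numerical stability
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


def class_pattern(points, nodes, beta):
    """Mean responsibility vector of a set of compounds (a class responsibility pattern)."""
    pattern = [0.0] * len(nodes)
    for t in points:
        for k, r in enumerate(responsibilities(t, nodes, beta)):
            pattern[k] += r
    return [v / len(points) for v in pattern]


if __name__ == "__main__":
    # Hypothetical fitted map: three nodes in a 2D descriptor space.
    nodes = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(responsibilities([0.0, 0.0], nodes, beta=10.0))
    print(class_pattern([[0.1, 0.0], [0.9, 0.1]], nodes, beta=10.0))
```

Nodes where a class pattern is high and other classes' patterns are low are the candidate "privileged locations" whose member compounds would then be inspected for shared motifs.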
Affiliation(s)
- Kyrylo Klimenko
  - Laboratoire de Chemoinformatique, UMR 7140 CNRS/Université de Strasbourg, 1, rue Blaise Pascal, Strasbourg 67000, France
  - Department on Molecular Structure and Chemoinformatics, A.V. Bogatsky Physico-Chemical Institute of NAS of Ukraine, Lyustdorfskaya doroga, 86, Odessa 65080, Ukraine
- Gilles Marcou
  - Laboratoire de Chemoinformatique, UMR 7140 CNRS/Université de Strasbourg, 1, rue Blaise Pascal, Strasbourg 67000, France
- Dragos Horvath
  - Laboratoire de Chemoinformatique, UMR 7140 CNRS/Université de Strasbourg, 1, rue Blaise Pascal, Strasbourg 67000, France
- Alexandre Varnek
  - Laboratoire de Chemoinformatique, UMR 7140 CNRS/Université de Strasbourg, 1, rue Blaise Pascal, Strasbourg 67000, France
55
Abstract
INTRODUCTION: Neural networks are becoming a very popular method for solving machine learning and artificial intelligence problems. The variety of neural network types and their applications to drug discovery means that expert knowledge is needed to choose the most appropriate approach. AREAS COVERED: In this review, the authors discuss traditional and newly emerging neural network approaches to drug discovery. Their focus is on backpropagation neural networks and their variants, self-organizing maps and associated methods, and a relatively new technique, deep learning. The most important technical issues are discussed, including overfitting and its prevention through regularization, ensemble and multitask modeling, model interpretation, and estimation of the applicability domain. Different aspects of using neural networks in drug discovery are considered: building structure-activity models for various targets; predicting drug selectivity, toxicity profiles, ADMET and physicochemical properties, and characteristics of drug-delivery systems; and virtual screening. EXPERT OPINION: Neural networks continue to grow in importance for drug discovery. Recent developments in deep learning suggest that further improvements may be gained in the analysis of large chemical data sets. It is anticipated that neural networks will be used more widely in drug discovery in the future and applied in non-traditional areas such as drug-delivery systems, biologically compatible materials, and regenerative medicine.
Affiliation(s)
- Igor I Baskin
  - Faculty of Physics, M.V. Lomonosov Moscow State University, Moscow, Russia
  - A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- David Winkler
  - CSIRO Manufacturing, Clayton, VIC, Australia
  - Monash Institute for Pharmaceutical Sciences, Monash University, Parkville, VIC, Australia
  - Latrobe Institute for Molecular Science, Bundoora, VIC, Australia
  - School of Chemical and Physical Sciences, Flinders University, Bedford Park, SA, Australia
- Igor V Tetko
  - Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Institute of Structural Biology, Neuherberg, Germany
  - BigChem GmbH, Neuherberg, Germany
56
Awale M, Reymond JL. Web-based 3D-visualization of the DrugBank chemical space. J Cheminform 2016; 8:25. [PMID: 27148409 PMCID: PMC4855437 DOI: 10.1186/s13321-016-0138-2] [Received: 02/18/2016] [Accepted: 04/27/2016]
Abstract
Background: Similar to the periodic table for elements, chemical space offers an organizing principle for representing the diversity of organic molecules, usually in the form of multi-dimensional property spaces that are subjected to dimensionality reduction to obtain 3D spaces or 2D maps suitable for visual inspection. Unfortunately, tools for viewing chemical space on the internet are currently very limited. Results: Herein we present webDrugCS, a web application freely available at www.gdb.unibe.ch to visualize DrugBank (www.drugbank.ca, containing over 6000 investigational and approved drugs) in five different property spaces. webDrugCS displays 3D clouds of color-coded grid points representing molecules, whose structural formulas are displayed on mouse-over, with an option to link to the corresponding molecule page on the DrugBank website. The 3D clouds are obtained by principal component analysis (PCA) of high-dimensional property spaces describing constitution and topology (42D molecular quantum numbers, MQN), structural features (34D SMILES fingerprint, SMIfp), molecular shape (20D atom pair fingerprint, APfp), pharmacophores (55D atom category extended atom pair fingerprint, Xfp) and substructures (1024D binary substructure fingerprint, Sfp). User-defined molecules can be uploaded as SMILES lists and displayed together with DrugBank. In contrast to 2D maps, where many compounds fold onto each other, these 3D spaces have a resolution comparable to that of their parent high-dimensional chemical space. Conclusion: To the best of our knowledge, webDrugCS is the first publicly available web tool for interactive visualization and exploration of the DrugBank chemical space in 3D.
webDrugCS works on computers, tablets and phones, and facilitates visual exploration of DrugBank to rapidly learn about the structural diversity of small-molecule drugs. Figure: webDrugCS visualization of DrugBank projected in 3D MQN space, color-coded by ring count, with the pointer showing the drug 5-fluorouracil.
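The 3D clouds described above come from principal component analysis. The following minimal, dependency-free sketch projects descriptor vectors (for example MQN count vectors) onto their first three principal components. It uses power iteration with deflation on the covariance matrix, a simplification chosen for illustration; a production pipeline would more likely call a linear-algebra library, and nothing here reflects the actual webDrugCS implementation. As in any PCA, the sign of each component is arbitrary.

```python
import random


def pca_project(X, n_components=3, iters=200, seed=0):
    """Project the rows of X onto their leading principal components.

    X: list of equal-length descriptor vectors (at least two rows).
    Uses power iteration with deflation on the sample covariance matrix.
    """
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[x - m for x, m in zip(row, mean)] for row in X]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        # Positive random unit start vector, refined by repeated multiplication with cov.
        v = [rng.random() + 0.1 for _ in range(d)]
        s = sum(x * x for x in v) ** 0.5
        v = [x / s for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:  # no variance left in the remaining directions
                break
            v = [x / norm for x in w]
        # Rayleigh quotient of the unit vector v = variance explained along v.
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        components.append(v)
        # Deflation: remove this direction so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    # Coordinates of each centered row along the components (x, y, z for 3D).
    return [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in centered]


if __name__ == "__main__":
    # Toy 4D descriptors with clearly different spreads along each axis.
    X = [[3.0 * a, 2.0 * b, 1.0 * c, 0.0]
         for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)]
    for point in pca_project(X):
        print([round(x, 3) for x in point])
```

Power iteration is adequate here because only three components of a modest descriptor dimension (42D for MQN) are needed; for the 1024D Sfp space a library eigensolver would be the more sensible choice.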
Affiliation(s)
- Mahendra Awale
  - Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR TransCure, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
- Jean-Louis Reymond
  - Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR TransCure, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
57
Gaspar HA, Sidorov P, Horvath D, Baskin II, Marcou G, Varnek A. Generative Topographic Mapping Approach to Chemical Space Analysis. Frontiers in Molecular Design and Chemical Information Science - Herman Skolnik Award Symposium 2015: Jürgen Bajorath 2016. [DOI: 10.1021/bk-2016-1222.ch011]
Affiliation(s)
- Héléna A. Gaspar
  - Laboratoire de Chemoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France
  - Faculty of Physics, M.V. Lomonosov Moscow State University, Leninskie Gory, Moscow 119991, Russia
  - Laboratory of Chemoinformatics, Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- Pavel Sidorov
  - Laboratoire de Chemoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France
  - Faculty of Physics, M.V. Lomonosov Moscow State University, Leninskie Gory, Moscow 119991, Russia
  - Laboratory of Chemoinformatics, Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- Dragos Horvath
  - Laboratoire de Chemoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France
  - Faculty of Physics, M.V. Lomonosov Moscow State University, Leninskie Gory, Moscow 119991, Russia
  - Laboratory of Chemoinformatics, Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- Igor I. Baskin
  - Laboratoire de Chemoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France
  - Faculty of Physics, M.V. Lomonosov Moscow State University, Leninskie Gory, Moscow 119991, Russia
  - Laboratory of Chemoinformatics, Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- Gilles Marcou
  - Laboratoire de Chemoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France
  - Faculty of Physics, M.V. Lomonosov Moscow State University, Leninskie Gory, Moscow 119991, Russia
  - Laboratory of Chemoinformatics, Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- Alexandre Varnek
  - Laboratoire de Chemoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France
  - Faculty of Physics, M.V. Lomonosov Moscow State University, Leninskie Gory, Moscow 119991, Russia
  - Laboratory of Chemoinformatics, Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
58
Sidorov P, Gaspar H, Marcou G, Varnek A, Horvath D. Mappability of drug-like space: towards a polypharmacologically competent map of drug-relevant compounds. J Comput Aided Mol Des 2015; 29:1087-108. [PMID: 26564142 DOI: 10.1007/s10822-015-9882-z] [Received: 10/05/2015] [Accepted: 11/06/2015]
Abstract
Intuitive visual rendering (mapping) of high-dimensional chemical spaces (CS) is an important topic in chemoinformatics. Such maps have so far been dedicated to specific compound collections: either limited series with known activities, or large, even exhaustive, enumerations of molecules without associated property data. Typically, they were challenged to answer some classification problem with respect to those same molecules, admired for their aesthetic virtues and then forgotten, because they were set-specific constructs. This work addresses the question of whether a general, compound set-independent map can be generated, and whether its claim of "universality" can be quantitatively justified with respect to all the structure-activity information available so far (or, more realistically, an exploitable but significant fraction thereof). The "universal" CS map is expected to project molecules from the initial CS into a lower-dimensional space that is neighborhood behavior-compliant with respect to a large panel of ligand properties. Such a map should be able to discriminate actives from inactives, or even support quantitative, neighborhood-based, parameter-free property prediction (regression) models, for a wide panel of targets and target families. It should be polypharmacologically competent without requiring any target-specific parameter fitting. This work describes an evolutionary growth procedure for such maps, based on generative topographic mapping, followed by validation of their polypharmacological competence. Validation was carried out against a maximum of exploitable structure-activity information, covering all Homo sapiens proteins in the ChEMBL database, antiparasitic and antiviral data, etc. Five evolved maps satisfactorily solved hundreds of activity-based ligand classification challenges, independent of the training data, for targets and even for in vivo properties.
They also withstood chemogenomics-related challenges: cumulated responsibility vectors obtained by mapping target-specific ligand collections were shown to represent validated target descriptors, consistent with currently accepted target classification in biology. Therefore, in our opinion, they represent a robust and well-documented answer to the key question "What is a good CS map?"
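One common way to turn a GTM into a parameter-free regression model, in the spirit of the neighborhood-based prediction described above, is an activity landscape: each node receives the responsibility-weighted mean property of the training compounds, and a new compound is predicted as the responsibility-weighted mean of the node values. The sketch below assumes responsibility vectors have already been computed for all compounds; the function names and toy numbers are hypothetical, and the exact weighting used in the paper may differ.

```python
def activity_landscape(train_resp, train_values):
    """Responsibility-weighted mean property value of each map node.

    train_resp: one responsibility vector (length K, summing to 1) per training compound.
    train_values: measured property of each training compound.
    Nodes that receive no responsibility mass are set to None.
    """
    n_nodes = len(train_resp[0])
    num = [0.0] * n_nodes
    den = [0.0] * n_nodes
    for resp, value in zip(train_resp, train_values):
        for k, r in enumerate(resp):
            num[k] += r * value
            den[k] += r
    return [num[k] / den[k] if den[k] > 0 else None for k in range(n_nodes)]


def predict(resp, landscape):
    """Responsibility-weighted mean of the landscape over nodes that have a value.

    Returns None when the compound falls only on nodes with no training data,
    a simple stand-in for an applicability-domain check.
    """
    num = den = 0.0
    for r, value in zip(resp, landscape):
        if value is not None:
            num += r * value
            den += r
    return num / den if den > 0 else None


if __name__ == "__main__":
    # Hypothetical 3-node map and three training compounds.
    landscape = activity_landscape(
        [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]], [2.0, 4.0, 3.0])
    print(landscape)
    print(predict([0.5, 0.5, 0.0], landscape))
```

Nothing in this model is fitted per target beyond averaging, which is what makes one fixed map reusable across many property panels.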
Affiliation(s)
- Pavel Sidorov
  - Laboratoire de Chémoinformatique, UMR 7140, CNRS-Univ. Strasbourg, 1 rue Blaise Pascal, 67000, Strasbourg, France
  - Laboratory of Chemoinformatics, Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- Helena Gaspar
  - Laboratoire de Chémoinformatique, UMR 7140, CNRS-Univ. Strasbourg, 1 rue Blaise Pascal, 67000, Strasbourg, France
- Gilles Marcou
  - Laboratoire de Chémoinformatique, UMR 7140, CNRS-Univ. Strasbourg, 1 rue Blaise Pascal, 67000, Strasbourg, France
- Alexandre Varnek
  - Laboratoire de Chémoinformatique, UMR 7140, CNRS-Univ. Strasbourg, 1 rue Blaise Pascal, 67000, Strasbourg, France
  - Laboratory of Chemoinformatics, Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- Dragos Horvath
  - Laboratoire de Chémoinformatique, UMR 7140, CNRS-Univ. Strasbourg, 1 rue Blaise Pascal, 67000, Strasbourg, France
59
Awale M, Reymond JL. Similarity Mapplet: Interactive Visualization of the Directory of Useful Decoys and ChEMBL in High Dimensional Chemical Spaces. J Chem Inf Model 2015. [PMID: 26207526 DOI: 10.1021/acs.jcim.5b00182]
Abstract
An Internet portal accessible at www.gdb.unibe.ch has been set up to automatically generate color-coded similarity maps of the ChEMBL database in relation to up to two sets of active compounds taken from the enhanced Directory of Useful Decoys (eDUD), a random set of molecules, or up to two sets of user-defined reference molecules. These maps visualize the relationships between the selected compounds and ChEMBL in six different high-dimensional chemical spaces, namely MQN (42-D molecular quantum numbers), SMIfp (34-D SMILES fingerprint), APfp (20-D shape fingerprint), Xfp (55-D pharmacophore fingerprint), Sfp (1024-bit substructure fingerprint), and ECfp4 (1024-bit extended connectivity fingerprint). The maps are supplied in the form of Java-based desktop applications called "similarity mapplets", which allow interactive content browsing and are linked to a "Multifingerprint Browser for ChEMBL" (also accessible directly at www.gdb.unibe.ch) to perform nearest-neighbor searches. One can obtain six similarity mapplets of ChEMBL relative to random reference compounds, 606 similarity mapplets relative to single eDUD active sets, 30,300 similarity mapplets relative to pairs of eDUD active sets, and any number of similarity mapplets relative to user-defined reference sets. These mapplets help visualize the structural diversity of compound series in drug optimization projects and their relationship to other known bioactive compounds.
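The nearest-neighbor searches mentioned above rank library compounds by their distance to a query within one descriptor space. The sketch below assumes city-block (Manhattan) distance for count-valued descriptors such as MQN and Tanimoto similarity for binary fingerprints stored as sets of on-bit indices. These are common choices in the field but are assumptions here, as are the function names, toy vectors and compound IDs; the sketch does not describe the Multifingerprint Browser's internals.

```python
import heapq


def city_block(a, b):
    """Manhattan distance between two count-valued descriptor vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def tanimoto(a, b):
    """Tanimoto similarity of two binary fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def tanimoto_distance(a, b):
    """Turn Tanimoto similarity into a distance so smaller means closer."""
    return 1.0 - tanimoto(a, b)


def nearest(query, library, k=3, distance=city_block):
    """The k (distance, compound_id) pairs closest to the query, nearest first.

    library: dict mapping a compound ID to its descriptor in the same space as query.
    Ties in distance are broken by compound ID.
    """
    scored = ((distance(query, desc), cid) for cid, desc in library.items())
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    # Hypothetical 3D count vectors standing in for real 42D MQN vectors.
    mqn_like = {"a": [0, 0, 0], "b": [1, 1, 0], "c": [5, 0, 0], "d": [0, 0, 1]}
    print(nearest([0, 0, 0], mqn_like, k=2))
    bits = {"x": {1, 2, 3}, "y": {7, 8}, "z": {1, 2}}
    print(nearest({1, 2, 3}, bits, k=2, distance=tanimoto_distance))
```

`heapq.nsmallest` avoids sorting the whole library when only a handful of neighbors is wanted, though a ChEMBL-scale browser would normally rely on a precomputed index instead of a linear scan.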
Affiliation(s)
- Mahendra Awale
  - Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
- Jean-Louis Reymond
  - Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
60
Osolodkin DI, Radchenko EV, Orlov AA, Voronkov AE, Palyulin VA, Zefirov NS. Progress in visual representations of chemical space. Expert Opin Drug Discov 2015; 10:959-73. [DOI: 10.1517/17460441.2015.1060216]
61
Clark AM, Ekins S. Open Source Bayesian Models. 2. Mining a "Big Dataset" To Create and Validate Models with ChEMBL. J Chem Inf Model 2015; 55:1246-60. [PMID: 25995041 DOI: 10.1021/acs.jcim.5b00144]
Abstract
In an associated paper, we described a reference implementation of Laplacian-corrected naïve Bayesian model building using extended-connectivity (ECFP) and functional-class (FCFP) fingerprints with a maximum diameter of 6. As a follow-up, we have now undertaken a large-scale validation study to ensure that the technique generalizes to a broad variety of drug discovery datasets. To achieve this, we used the ChEMBL (version 20) database and split it into more than 2000 separate datasets, each consisting of compounds and measurements with the same target and activity measurement. To test these datasets with two-state Bayesian classification, we developed an automated algorithm for detecting a suitable threshold for active/inactive designation, which we applied to all collections. With these datasets, we were able to establish that our Bayesian model implementation is effective in the large majority of cases, and to quantify the impact of fingerprint folding on receiver operating characteristic (ROC) cross-validation metrics. We were also able to study the impact that the choice of training/testing set partitioning has on the resulting recall rates. The datasets have been made publicly available for download, along with the corresponding model data files, which can be used in conjunction with the CDK and several mobile apps. We have also explored some novel visualization methods that leverage the structural origins of the ECFP/FCFP fingerprints to attribute regions of a molecule responsible for positive and negative contributions to activity. The ability to score molecules across thousands of relevant datasets across organisms may also help to assess desirable and undesirable off-target effects, as well as suggest potential targets for compounds derived from phenotypic screens.
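The Laplacian-corrected naïve Bayesian model mentioned above can be sketched in a few lines. For each fingerprint feature F, with T_F training compounds containing F, A_F of them active, and a baseline active rate P, the widely used corrected estimate (A_F + 1) / (T_F + 1/P) is divided by P and log-transformed; a molecule's score is the sum of these weights over its features. The function names are hypothetical, fingerprints are assumed to be precomputed sets of hashed feature IDs (for example ECFP6 bits), and this is a sketch rather than the authors' reference implementation.

```python
import math
from collections import Counter


def train_laplacian_nb(fingerprints, labels):
    """Per-feature weights of a Laplacian-corrected naive Bayesian classifier.

    fingerprints: one set of hashed feature IDs per training compound.
    labels: True for active, False for inactive (must be non-empty).
    """
    p_active = sum(labels) / len(labels)  # baseline active rate P
    total, active = Counter(), Counter()
    for fp, is_active in zip(fingerprints, labels):
        for f in fp:
            total[f] += 1
            if is_active:
                active[f] += 1
    # log( [(A_F + 1) / (T_F + 1/P)] / P ) simplifies to log((A_F + 1) / (T_F * P + 1)).
    return {f: math.log((active[f] + 1) / (total[f] * p_active + 1)) for f in total}


def score(fp, weights):
    """Sum of feature weights; features never seen in training contribute 0."""
    return sum(weights.get(f, 0.0) for f in fp)


if __name__ == "__main__":
    # Toy training set: feature 1 only occurs in actives.
    weights = train_laplacian_nb([{1, 2}, {1, 3}, {2}, {3}], [True, True, False, False])
    print(weights)
    print(score({1, 99}, weights))
```

With this weighting, a feature that occurs among actives at exactly the baseline rate gets weight 0, and rarely seen features are pulled toward 0 instead of receiving extreme log-odds, which is the purpose of the Laplacian correction.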
Affiliation(s)
- Alex M Clark
  - Molecular Materials Informatics, Inc., 1900 St. Jacques No. 302, Montreal H3J 2S1, Quebec, Canada
- Sean Ekins
  - Collaborations Pharmaceuticals, Inc., 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina 27526, United States
  - Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina 27526, United States
  - Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States