1
Vogt M. Chemoinformatic approaches for navigating large chemical spaces. Expert Opin Drug Discov 2024; 19:403-414. [PMID: 38300511 DOI: 10.1080/17460441.2024.2313475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Accepted: 01/30/2024] [Indexed: 02/02/2024]
Abstract
INTRODUCTION: Large chemical spaces (CSs) include traditional large compound collections, combinatorial libraries covering billions to trillions of molecules, DNA-encoded chemical libraries comprising complete combinatorial CSs in a single mixture, and virtual CSs explored by generative models. The diverse nature of these types of CSs requires different chemoinformatic approaches for navigation. AREAS COVERED: An overview of different types of large CSs is provided. Molecular representations and similarity metrics suitable for large CS exploration are discussed. A summary of navigation of CSs in generative models is provided. Methods for characterizing and comparing CSs are discussed. EXPERT OPINION: The size of large CSs might restrict navigation to specialized algorithms and limit it to considering neighborhoods of structurally similar molecules. Efficient navigation of large CSs not only requires methods that scale with size but also requires smart approaches that focus on better but not necessarily larger molecule selections. Deep generative models aim to provide such approaches by implicitly learning features relevant for targeted biological properties. It is unclear whether these models can fulfill this ideal, as validation is difficult as long as the covered CSs remain mainly virtual without experimental verification.
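The abstract above mentions similarity metrics and neighborhoods of structurally similar molecules. As a minimal sketch of that idea (not taken from the review itself), the following assumes fingerprints are given as sets of on-bit indices and uses the Tanimoto coefficient; the names `tanimoto`, `nearest_neighbors`, and the toy library are hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention chosen for this sketch: two empty fingerprints count as identical
    return len(a & b) / len(a | b)


def nearest_neighbors(query: set, library: dict, k: int = 2) -> list:
    """Return the k library entries most similar to the query, best first."""
    ranked = sorted(library.items(), key=lambda item: tanimoto(query, item[1]), reverse=True)
    return [(name, round(tanimoto(query, fp), 3)) for name, fp in ranked[:k]]


# Toy fingerprints (illustrative bit indices, not derived from real molecules).
library = {
    "mol_a": {1, 4, 9, 12},
    "mol_b": {1, 4, 9, 13, 20},
    "mol_c": {2, 5, 7},
}
query = {1, 4, 9, 12, 20}
print(nearest_neighbors(query, library))
```

A brute-force scan like this is linear in library size, so it illustrates the metric only; the specialized algorithms the abstract refers to are needed once a space can no longer be enumerated.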
Affiliation(s)
- Martin Vogt
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany

2
Nicolle A, Deng S, Ihme M, Kuzhagaliyeva N, Ibrahim EA, Farooq A. Mixtures Recomposition by Neural Nets: A Multidisciplinary Overview. J Chem Inf Model 2024; 64:597-620. [PMID: 38284618 DOI: 10.1021/acs.jcim.3c01633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/30/2024]
Abstract
Artificial Neural Networks (ANNs) are transforming how we understand chemical mixtures, providing an expressive view of the chemical space and multiscale processes. Their hybridization with physical knowledge can bridge the gap between predictivity and understanding of the underlying processes. This overview explores recent progress in ANNs, particularly their potential in the 'recomposition' of chemical mixtures. Graph-based representations reveal patterns among mixture components, and deep learning models excel in capturing complexity and symmetries when compared to traditional Quantitative Structure-Property Relationship models. Key components, such as Hamiltonian networks and convolution operations, play a central role in representing multiscale mixtures. The integration of ANNs with Chemical Reaction Networks and Physics-Informed Neural Networks for inverse chemical kinetic problems is also examined. The combination of sensors with ANNs shows promise in optical and biomimetic applications. A common ground is identified in the context of statistical physics, where ANN-based methods iteratively adapt their models by blending their initial states with training data. The concept of mixture recomposition unveils a reciprocal inspiration between ANNs and reactive mixtures, highlighting learning behaviors influenced by the training environment.
Affiliation(s)
- Andre Nicolle
  - Aramco Fuel Research Center, Rueil-Malmaison 92852, France
- Sili Deng
  - Massachusetts Institute of Technology, Cambridge 02139, Massachusetts, United States
- Matthias Ihme
  - Stanford University, Stanford 94305, California, United States
- Emad Al Ibrahim
  - King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
- Aamir Farooq
  - King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia

3
Zhang W, Zhang K, Huang J. A Simple Way to Incorporate Target Structural Information in Molecular Generative Models. J Chem Inf Model 2023. [PMID: 37318828 DOI: 10.1021/acs.jcim.3c00293] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/17/2023]
Abstract
Deep learning generative models are now being applied in various fields including drug discovery. In this work, we propose a novel approach to include target 3D structural information in molecular generative models for structure-based drug design. The method combines a message-passing neural network model that predicts docking scores with a generative neural network model as its reward function to navigate the chemical space searching for molecules that bind favorably with a specific target. A key feature of the method is the construction of target-specific molecular sets for training, designed to overcome potential transferability issues of surrogate docking models through a two-round training process. Consequently, this enables accurate guided exploration of the chemical space without reliance on the collection of prior knowledge about active and inactive compounds for the specific target. Tests on eight target proteins showed a 100-fold increase in hit generation compared to conventional docking calculations and the ability to generate molecules similar to approved drugs or known active ligands for specific targets without prior knowledge. This method provides a general and highly efficient solution for structure-based molecular generation.
Affiliation(s)
- Wenyi Zhang
  - Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
  - Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
  - Institute of Biology, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Kaiyue Zhang
  - Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
  - Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Jing Huang
  - Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
  - Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
  - Institute of Biology, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China

4
Dang HT, Nguyen VD, Haug GC, Arman HD, Larionov OV. Decarboxylative Triazolation Enables Direct Construction of Triazoles from Carboxylic Acids. JACS Au 2023; 3:813-822. [PMID: 37006773 PMCID: PMC10052276 DOI: 10.1021/jacsau.2c00606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Revised: 02/07/2023] [Accepted: 02/08/2023] [Indexed: 06/19/2023]
Abstract
Triazoles have major roles in chemistry, medicine, and materials science, as centrally important heterocyclic motifs and bioisosteric replacements for amides, carboxylic acids, and other carbonyl groups, as well as some of the most widely used linkers in click chemistry. Yet, the chemical space and molecular diversity of triazoles remain limited by the accessibility of synthetically challenging organoazides, thereby requiring preinstallation of the azide precursors and restricting triazole applications. We report herein a photocatalytic, tricomponent decarboxylative triazolation reaction that for the first time enables direct conversion of carboxylic acids to triazoles in a single-step, triple catalytic coupling with alkynes and a simple azide reagent. Data-guided inquiry of the accessible chemical space of decarboxylative triazolation indicates that the transformation can improve access to the structural diversity and molecular complexity of triazoles. Experimental studies demonstrate a broad scope of the synthetic method that includes a variety of carboxylic acid, polymer, and peptide substrates. When performed in the absence of alkynes, the reaction can also be used to access organoazides, thereby obviating preactivation and specialized azide reagents and providing a two-pronged approach to C-N bond-forming decarboxylative functional group interconversions.

5
Zabolotna Y, Bonachera F, Horvath D, Lin A, Marcou G, Klimchuk O, Varnek A. Chemspace Atlas: Multiscale Chemography of Ultralarge Libraries for Drug Discovery. J Chem Inf Model 2022; 62:4537-4548. [DOI: 10.1021/acs.jcim.2c00509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Affiliation(s)
- Yuliana Zabolotna
  - University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Fanny Bonachera
  - University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Dragos Horvath
  - University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Arkadii Lin
  - University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Gilles Marcou
  - University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Olga Klimchuk
  - University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Alexandre Varnek
  - University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France

6
Yoshimori A, Bajorath J. Computational method for the systematic alignment of analogue series with structure-activity relationship transfer potential across different targets. Eur J Med Chem 2022; 239:114558. [PMID: 35763865 DOI: 10.1016/j.ejmech.2022.114558] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/10/2022] [Revised: 06/10/2022] [Accepted: 06/18/2022] [Indexed: 11/17/2022]
Abstract
Lead optimization focuses on the generation of analogue series (ASs) with sustainable structure-activity relationship (SAR) progression. If roadblocks are encountered during multi-property optimization, it is often desirable to replace an AS with another containing a different core structure but having similar SAR characteristics for a given target. This process represents target-based SAR transfer. A previously unexplored question is to what extent AS-based SAR transfer events might also occur across different targets. To address this question, we have developed and applied a new computational approach to systematically search for ASs with SAR transfer potential and align qualifying series in a chemically intuitive way. The methodology relies on fragment similarity scoring in combination with dynamic programming. Our large-scale analysis has revealed that SAR transfer events across different targets are more frequently observed than one might expect, providing many opportunities for the design of new SAR transfer analogues for evolving series.
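The abstract above states that the methodology combines fragment similarity scoring with dynamic programming. A minimal sketch of that general combination is a global (Needleman-Wunsch style) alignment of two substituent sequences. This is not the authors' implementation: the scoring function `fragment_similarity`, the gap penalty, and the substituent labels are all hypothetical and chosen only to make the example concrete.

```python
HALOGENS = {"F", "Cl", "Br"}
ALKOXY = {"OMe", "OEt"}


def fragment_similarity(a: str, b: str) -> float:
    """Toy score: identical = 1.0, same substituent class = 0.5, otherwise -1.0."""
    if a == b:
        return 1.0
    if {a, b} <= HALOGENS or {a, b} <= ALKOXY:
        return 0.5
    return -1.0


def align_series(series_a, series_b, sim=fragment_similarity, gap=-0.5):
    """Globally align two substituent sequences; returns (score, aligned pairs)."""
    n, m = len(series_a), len(series_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Borders accumulate gap penalties so the traceback comparisons are exact.
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sim(series_a[i - 1], series_b[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sim(series_a[i - 1], series_b[j - 1]):
            pairs.append((series_a[i - 1], series_b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((series_a[i - 1], None))  # gap in series_b
            i -= 1
        else:
            pairs.append((None, series_b[j - 1]))  # gap in series_a
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


best_score, alignment = align_series(["H", "F", "Cl", "OMe"], ["H", "Cl", "OMe", "OEt"])
print(best_score, alignment)
```

In this toy case the unmatched trailing `OEt` in the second series is the kind of position where an aligned series could suggest a new analogue for the first one.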
Affiliation(s)
- Atsushi Yoshimori
  - Institute for Theoretical Medicine, Inc., 26-1 Muraoka-Higashi 2-chome, Fujisawa, Kanagawa, 251-0012, Japan
- Jürgen Bajorath
  - Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, D-53115, Bonn, Germany.

7
Warr WA, Nicklaus MC, Nicolaou CA, Rarey M. Exploration of Ultralarge Compound Collections for Drug Discovery. J Chem Inf Model 2022; 62:2021-2034. [PMID: 35421301 DOI: 10.1021/acs.jcim.2c00224] [Citation(s) in RCA: 46] [Impact Index Per Article: 23.0] [Indexed: 02/07/2023]
Abstract
Designing new medicines more cheaply and quickly is tightly linked to the quest of exploring chemical space more widely and efficiently. Chemical space is monumentally large, but recent advances in computer software and hardware have enabled researchers to navigate virtual chemical spaces containing billions of chemical structures. This review specifically concerns collections of many millions or even billions of enumerated chemical structures as well as even larger chemical spaces that are not fully enumerated. We present examples of chemical libraries and spaces and the means used to construct them, and we discuss new technologies for searching huge libraries and for searching combinatorially in chemical space. We also cover space navigation techniques and consider new approaches to de novo drug design and the impact of the "autonomous laboratory" on synthesis of designed compounds. Finally, we summarize some other challenges and opportunities for the future.
Affiliation(s)
- Wendy A Warr
  - Wendy Warr & Associates, 6 Berwick Court, Holmes Chapel, Crewe, Cheshire CW4 7HZ, United Kingdom
- Marc C Nicklaus
  - NCI, NIH, CADD Group, NCI-Frederick, Frederick, Maryland 21702, United States
- Christos A Nicolaou
  - Discovery Chemistry, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana 46285, United States
- Matthias Rarey
  - Universität Hamburg, ZBH Center for Bioinformatics, 20146 Hamburg, Germany

8
Humer C, Heberle H, Montanari F, Wolf T, Huber F, Henderson R, Heinrich J, Streit M. ChemInformatics Model Explorer (CIME): exploratory analysis of chemical model explanations. J Cheminform 2022; 14:21. [PMID: 35379315 PMCID: PMC8981840 DOI: 10.1186/s13321-022-00600-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/07/2021] [Accepted: 03/12/2022] [Indexed: 11/10/2022] Open
Abstract
The introduction of machine learning to small molecule research, an inherently multidisciplinary field in which chemists and data scientists combine their expertise and collaborate, has been vital to making screening processes more efficient. In recent years, numerous models that predict pharmacokinetic properties or bioactivity have been published, and these are used on a daily basis by chemists to make decisions and prioritize ideas. The emerging field of explainable artificial intelligence is opening up new possibilities for understanding the reasoning that underlies a model. In small molecule research, this means relating contributions of substructures of compounds to their predicted properties, which in turn also allows the areas of the compounds that have the greatest influence on the outcome to be identified. However, there is no interactive visualization tool that facilitates such interdisciplinary collaborations towards interpretability of machine learning models for small molecules. To fill this gap, we present CIME (ChemInformatics Model Explorer), an interactive web-based system that allows users to inspect chemical data sets, visualize model explanations, compare interpretability techniques, and explore subgroups of compounds. The tool is model-agnostic and can be run on a server or a workstation.
Affiliation(s)
- Henry Heberle
  - Division Crop Science, Bayer AG, 40789, Monheim am Rhein, DE, Germany.
- Thomas Wolf
  - Division Crop Science, Bayer AG, 65926, Frankfurt, DE, Germany
- Florian Huber
  - Division Crop Science, Bayer AG, 65926, Frankfurt, DE, Germany
- Ryan Henderson
  - Digital Technologies, Bayer AG, 13353, Berlin, DE, Germany
- Julian Heinrich
  - Division Crop Science, Bayer AG, 40789, Monheim am Rhein, DE, Germany.
- Marc Streit
  - Johannes Kepler University Linz, Linz, Austria.

9
Yang T, Li Z, Chen Y, Feng D, Wang G, Fu Z, Ding X, Tan X, Zhao J, Luo X, Chen K, Jiang H, Zheng M. DrugSpaceX: a large screenable and synthetically tractable database extending drug space. Nucleic Acids Res 2021; 49:D1170-D1178. [PMID: 33104791 PMCID: PMC7778939 DOI: 10.1093/nar/gkaa920] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 07/24/2020] [Revised: 09/11/2020] [Accepted: 10/05/2020] [Indexed: 02/07/2023] Open
Abstract
One of the most prominent topics in drug discovery is efficient exploration of the vast drug-like chemical space to find synthesizable and novel chemical structures with desired biological properties. To address this challenge, we created the DrugSpaceX (https://drugspacex.simm.ac.cn/) database based on expert-defined transformations of approved drug molecules. The current version of DrugSpaceX contains >100 million transformed chemical products for virtual screening, with outstanding characteristics in terms of structural novelty, diversity and large three-dimensional chemical space coverage. To illustrate its practical application in drug discovery, we used a case study of discoidin domain receptor 1 (DDR1), a kinase target implicated in fibrosis and other diseases, to show DrugSpaceX performing a quick search of initial hit compounds. Additionally, for ligand identification and optimization purposes, DrugSpaceX also provides several subsets for download, including a 10% diversity subset, an extended drug-like subset, a drug-like subset, a lead-like subset, and a fragment-like subset. In addition to chemical properties and transformation instructions, DrugSpaceX can locate the position of transformation, which will enable medicinal chemists to easily integrate strategy planning and protection design.
Affiliation(s)
- Tianbiao Yang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
  - Department of Pharmacy, University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 310024, China
- Zhaojun Li
  - School of Information Management, Dezhou University, No. 566 University Rd. West, Dezhou 253023, Shandong, China
- Yingjia Chen
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
  - Department of Pharmacy, University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
- Dan Feng
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
  - Department of Chemistry, College of Sciences, Shanghai University, Shanghai, China
- Guangchao Wang
  - School of Information Management, Dezhou University, No. 566 University Rd. West, Dezhou 253023, Shandong, China
- Zunyun Fu
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
  - Nanjing University of Chinese Medicine, 138 Xianlin Road, Jiangsu, Nanjing 210023, China
- Xiaoyu Ding
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
  - Department of Pharmacy, University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
- Xiaoqin Tan
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
  - Department of Pharmacy, University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
- Jihui Zhao
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
  - Department of Pharmacy, University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
- Xiaomin Luo
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
  - Department of Pharmacy, University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
- Kaixian Chen
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
  - Department of Pharmacy, University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
- Hualiang Jiang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
  - Department of Pharmacy, University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 310024, China
  - School of Life Science and Technology, ShanghaiTech University, 393 Huaxiazhong Road, Shanghai 200031, China
- Mingyue Zheng
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
  - Department of Pharmacy, University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China

10
Zhao L, Ciallella HL, Aleksunes LM, Zhu H. Advancing computer-aided drug discovery (CADD) by big data and data-driven machine learning modeling. Drug Discov Today 2020; 25:1624-1638. [PMID: 32663517 PMCID: PMC7572559 DOI: 10.1016/j.drudis.2020.07.005] [Citation(s) in RCA: 66] [Impact Index Per Article: 16.5] [Received: 02/19/2020] [Revised: 06/26/2020] [Accepted: 07/06/2020] [Indexed: 02/06/2023]
Abstract
Advancing a new drug to market requires substantial investments in time as well as financial resources. Crucial bioactivities for drug candidates, including their efficacy, pharmacokinetics (PK), and adverse effects, need to be investigated during drug development. With advancements in chemical synthesis and biological screening technologies over the past decade, a large amount of biological data points for millions of small molecules have been generated and are stored in various databases. These accumulated data, combined with new machine learning (ML) approaches, such as deep learning, have shown great potential to provide insights into relevant chemical structures to predict in vitro, in vivo, and clinical outcomes, thereby advancing drug discovery and development in the big data era.
Affiliation(s)
- Linlin Zhao
  - The Rutgers Center for Computational and Integrative Biology, Camden, NJ 08102, USA
- Heather L Ciallella
  - The Rutgers Center for Computational and Integrative Biology, Camden, NJ 08102, USA
- Lauren M Aleksunes
  - Department of Pharmacology and Toxicology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, NJ 08854, USA
- Hao Zhu
  - The Rutgers Center for Computational and Integrative Biology, Camden, NJ 08102, USA; Department of Chemistry, Rutgers University, Camden, NJ 08102, USA.

11
Probst D, Reymond JL. Visualization of very large high-dimensional data sets as minimum spanning trees. J Cheminform 2020; 12:12. [PMID: 33431043 PMCID: PMC7015965 DOI: 10.1186/s13321-020-0416-x] [Citation(s) in RCA: 116] [Impact Index Per Article: 29.0] [Received: 01/09/2020] [Accepted: 02/04/2020] [Indexed: 01/10/2023] Open
Abstract
The chemical sciences are producing an unprecedented amount of large, high-dimensional data sets containing chemical structures and associated properties. However, there are currently no algorithms to visualize such data while preserving both global and local features with a sufficient level of detail to allow for human inspection and interpretation. Here, we propose a solution to this problem with a new data visualization method, TMAP, capable of representing data sets of up to millions of data points and arbitrary high dimensionality as a two-dimensional tree (http://tmap.gdb.tools). Visualizations based on TMAP are better suited than t-SNE or UMAP for the exploration and interpretation of large data sets due to their tree-like nature, increased local and global neighborhood and structure preservation, and the transparency of the methods the algorithm is based on. We apply TMAP to the most used chemistry data sets including databases of molecules such as ChEMBL, FDB17, the Natural Products Atlas, DSSTox, as well as to the MoleculeNet benchmark collection of data sets. We also show its broad applicability with further examples from biology, particle physics, and literature.
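The title and abstract above describe representing a data set as a minimum spanning tree. The sketch below illustrates only that core idea on a toy scale, using Prim's algorithm over exact all-pairs Jaccard distances between set-based fingerprints. It is not the TMAP implementation, which is designed for millions of points; the function names and toy fingerprints are hypothetical.

```python
def jaccard_distance(a: set, b: set) -> float:
    """1 minus the Jaccard similarity of two on-bit sets (0.0 for two empty sets)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def minimum_spanning_tree(points: dict) -> list:
    """Prim's algorithm; returns edges as (distance, node_in_tree, new_node) tuples."""
    names = list(points)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge that connects the current tree to a node outside it.
        best = min(
            (jaccard_distance(points[u], points[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        edges.append(best)
        in_tree.add(best[2])
    return edges


# Toy fingerprints (illustrative bit indices only).
fingerprints = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {1, 2, 3, 5},
    "mol_c": {1, 2, 6, 7},
    "mol_d": {8, 9},
}
tree = minimum_spanning_tree(fingerprints)
for distance, u, v in tree:
    print(f"{u} -- {v}  (distance {distance:.3f})")
```

This naive version recomputes distances on every step and is only practical for small sets; a scalable method needs an approximate nearest-neighbor graph before the tree is built.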
Affiliation(s)
- Daniel Probst
  - Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland.
- Jean-Louis Reymond
  - Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland.

12
Kausar S, Falcao AO. A visual approach for analysis and inference of molecular activity spaces. J Cheminform 2019; 11:63. [PMID: 33430986 PMCID: PMC6805449 DOI: 10.1186/s13321-019-0386-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/01/2018] [Accepted: 10/05/2019] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND: Molecular space visualization can help to explore the diversity of large heterogeneous chemical data, which ultimately may increase the understanding of structure-activity relationships (SAR) in drug discovery projects. Visual SAR analysis can therefore be useful for library design, chemical classification of compounds for biological evaluation, and virtual screening for the selection of compounds for synthesis or in vitro testing. As such, computational approaches for molecular space visualization have become an important issue in cheminformatics research. The proposed approach uses molecular similarity as the sole input for computing a probabilistic surface of molecular activity (PSMA). This similarity matrix is transformed into 2D using different dimension reduction algorithms (Principal Coordinates Analysis (PCooA), Kruskal multidimensional scaling, Sammon mapping and t-SNE). From this projection, a kernel density function is applied to compute the probability of activity for each coordinate in the new projected space. RESULTS: This methodology was tested over four different quantitative structure-activity relationship (QSAR) binary classification data sets and the PSMAs were computed for each. The generated maps showed internal consistency with active molecules grouped together for all data sets and all dimensionality reduction algorithms. To validate the quality of the generated maps, the 2D coordinates of test molecules were computed into the new reference space using a data transformation matrix. In total, sixteen PSMAs were built, and their performance was assessed using the Area Under Curve (AUC) and the Matthews correlation coefficient (MCC). For the best projections for each data set, AUC testing results ranged from 0.87 to 0.98 and the MCC scores ranged from 0.33 to 0.77, suggesting this methodology can validly capture the complexities of the molecular activity space. All four mapping functions provided generally good results, yet the overall performance of PCooA and t-SNE was slightly better than that of Sammon mapping and Kruskal multidimensional scaling. CONCLUSIONS: Our results showed that by using an appropriate combination of metric space representation and dimensionality reduction applied over metric spaces it is possible to produce a visual PSMA whose consistency has been validated by using this map as a classification model. The produced maps can be used as prediction tools, as it is simple to project any molecule into this new reference space as long as the similarities to the molecules used to compute the initial similarity matrix can be computed.
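The abstract above describes applying a kernel density function to a 2D projection to obtain a probability of activity at each coordinate. The sketch below shows one plausible reading of that step, a Gaussian-kernel-weighted fraction of active neighbors. It assumes the 2D coordinates already exist; the estimator, the bandwidth, the fallback for empty neighborhoods, and all names are assumptions made for illustration rather than the authors' formulation.

```python
import math


def activity_probability(point, coords, labels, bandwidth=0.3):
    """Gaussian-kernel-weighted fraction of active molecules around `point`.

    coords: list of (x, y) positions of training molecules in the 2D map.
    labels: list of booleans, True for active molecules.
    """
    weights = [
        math.exp(-((point[0] - x) ** 2 + (point[1] - y) ** 2) / (2 * bandwidth ** 2))
        for x, y in coords
    ]
    total = sum(weights)
    if total == 0.0:
        # Far from every training molecule: fall back to the overall active
        # fraction (a choice made for this sketch only).
        return sum(labels) / len(labels)
    return sum(w for w, active in zip(weights, labels) if active) / total


# Toy projection: two actives on the right, two inactives on the left.
coords = [(0.4, 0.05), (0.4, -0.05), (-0.4, 0.05), (-0.4, -0.05)]
labels = [True, True, False, False]

for probe in [(0.4, 0.0), (0.0, 0.0), (-0.4, 0.0)]:
    print(probe, round(activity_probability(probe, coords, labels), 3))
```

Evaluating this function on a grid of probe points would give a surface that can be drawn as a map, and thresholding it at 0.5 turns the map into a simple classifier of the kind the abstract evaluates with AUC and MCC.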
Affiliation(s)
- Samina Kausar
  - LaSIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal
  - BioISI: Biosystems & Integrative Sciences Institute, Faculdade de Ciencias, Universidade de Lisboa, 1749-016 Lisboa, Portugal
- Andre O. Falcao
  - LaSIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal
  - BioISI: Biosystems & Integrative Sciences Institute, Faculdade de Ciencias, Universidade de Lisboa, 1749-016 Lisboa, Portugal

13
Khodabakhshi-Javinani D, Ebrahim-Habibi A, Afshar M, Navidpour L. Virtual Screening of Henna Compounds Library for Discovery of New Leads against Human Thymidine Phosphorylase, an Overexpressed Factor of Hand-Foot Syndrome. Lett Drug Des Discov 2019. [DOI: 10.2174/1570180815666180816123233] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/10/2023]
Abstract
Background:
Capecitabine is one of the most effective and successful drugs for the treatment of uterine and colorectal cancer, but its use has been limited by the occurrence of hand-foot syndrome (HFS). Overexpression of the human thymidine phosphorylase (hTP) enzyme is predicted to be one of the main causes of this syndrome. Thymidine phosphorylase is involved in many cancers and inflammatory diseases, and the pyrimidine nucleoside phosphorylase family is found in a variety of organisms. Results of clinical studies have shown that topical usage of the henna plant (Lawsonia inermis from the family Lythraceae) could reduce the severity of HFS.
Methods:
Using in silico methods on reported compounds of henna, the present study aimed to find phytochemicals and chemical groups with the potential to efficiently interact with and inhibit human thymidine phosphorylase. Various compounds (825) of henna from different chemical groups (138) were virtually screened with the interface to AutoDock in the YASARA software package, against the enzyme structure obtained from X-ray crystallography and refined by homology modeling methods.
Results:
By virtual screening, i.e. docking of candidate ligands into the determined active site of hTP followed by applying the scoring function of binding affinity, 71 compounds (out of 825 compounds) were estimated to be likely to bind to the protein with an interaction energy higher than 10 kcal/mol (concerning the sign of “binding energies”, please refer to the Methods section).
Conclusion:
Finally, diosmetin-3'-O-β-D-glucopyranoside (#219) and monoglycosylated naphthalene were selected as the most potent phytochemical and chemical group, respectively. Flavonoid-like compounds with appropriate interaction energy were also considered the most probable inhibitors. Further investigation of henna compounds is needed to confirm their effectiveness and to explore additional anti-cancer, anti-inflammatory, anti-angiogenic, and even antibiotic activities.
Affiliation(s)
- Davood Khodabakhshi-Javinani
- Department of Medicinal Chemistry, Faculty of Pharmacy, Tehran University of Medical Sciences, Tehran 14176, Iran
| | - Azadeh Ebrahim-Habibi
- Biosensor Research Center, Endocrinology and Metabolism Molecular-Cellular Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
| | - Minoo Afshar
- Department of pharmaceutics, Faculty of Pharmacy and Pharmaceutical Sciences, Tehran Medical Sciences, Islamic Azad University, Tehran 193956466, Iran
| | - Latifeh Navidpour
- Department of Medicinal Chemistry, Faculty of Pharmacy, Tehran University of Medical Sciences, Tehran 14176, Iran
| |
|
14
|
Capecchi A, Awale M, Probst D, Reymond JL. PubChem and ChEMBL beyond Lipinski. Mol Inform 2019; 38:e1900016. [PMID: 30844149 DOI: 10.1002/minf.201900016] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 01/30/2019] [Accepted: 02/18/2019] [Indexed: 12/13/2022]
Abstract
Seven million of the currently 94 million entries in the PubChem database break at least one of the four Lipinski constraints for oral bioavailability, 183,185 of which are also found in the ChEMBL database. These non-Lipinski PubChem (NLP) and ChEMBL (NLC) subsets are interesting because they contain new modalities that can display biological properties not accessible to small molecule drugs. Unfortunately, the current search tools in PubChem and ChEMBL are designed for small molecules and are not well suited to explore these subsets, which therefore remain poorly appreciated. Herein we report MXFP (macromolecule extended atom-pair fingerprint), a 217-D fingerprint tailored to analyze large molecules in terms of molecular shape and pharmacophores. We implement MXFP in two web-based applications, the first one to visualize NLP and NLC interactively using Faerun (http://faerun.gdb.tools/), the second one to perform MXFP nearest neighbor searches in NLP and NLC (http://similaritysearch.gdb.tools/). We show that these tools provide a meaningful insight into the diversity of large molecules in NLP and NLC. The interactive tools presented here are publicly available at http://gdb.unibe.ch and can be used freely to explore and better understand the diversity of non-Lipinski molecules in PubChem and ChEMBL.
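The selection criterion named in this abstract (breaking at least one of the four Lipinski constraints) can be illustrated with a minimal sketch. It assumes the commonly cited rule-of-five limits (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors); the exact descriptor definitions used by the authors are not given here, and the `Properties` container and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    """Precomputed descriptors for one molecule (hypothetical container)."""
    mol_weight: float       # molecular weight in Da
    logp: float             # calculated octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(p: Properties) -> int:
    """Count how many of the four rule-of-five limits are exceeded."""
    checks = [
        p.mol_weight > 500,
        p.logp > 5,
        p.h_bond_donors > 5,
        p.h_bond_acceptors > 10,
    ]
    return sum(checks)


def is_non_lipinski(p: Properties) -> bool:
    """True if at least one constraint is broken (assumed subset criterion)."""
    return lipinski_violations(p) >= 1


# Illustrative descriptor values only; they do not describe real compounds.
small = Properties(mol_weight=180.2, logp=1.2, h_bond_donors=1, h_bond_acceptors=4)
large = Properties(mol_weight=1200.0, logp=-1.0, h_bond_donors=18, h_bond_acceptors=20)
```

Descriptor values would normally come from a cheminformatics toolkit; they are passed in directly here so that the sketch stays self-contained.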
Affiliation(s)
- Alice Capecchi
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
| | - Mahendra Awale
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
| | - Daniel Probst
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
| |
|
15
|
The next level in chemical space navigation: going far beyond enumerable compound libraries. Drug Discov Today 2019; 24:1148-1156. [PMID: 30851414 DOI: 10.1016/j.drudis.2019.02.013] [Citation(s) in RCA: 127] [Impact Index Per Article: 25.4] [Received: 11/19/2018] [Revised: 02/01/2019] [Accepted: 02/28/2019] [Indexed: 10/27/2022]
Abstract
Recent innovations have brought pharmacophore-driven methods for navigating virtual chemical spaces, the size of which can reach into the billions of molecules, to the fingertips of every chemist. There has been a paradigm shift in the underlying computational chemistry that drives chemical space search applications, incorporating intelligent reaction knowledge into their core so that they can readily deliver commercially available molecules as nearest neighbor hits from within giant virtual spaces. These vast resources enable medicinal chemists to execute rapid scaffold-hopping experiments, rapid hit expansion, and structure-activity relationship (SAR) exploitation in largely intellectual property (IP)-free territory and at unparalleled low cost.
|
16
|
Probst D, Reymond JL. A probabilistic molecular fingerprint for big data settings. J Cheminform 2018; 10:66. [PMID: 30564943 PMCID: PMC6755601 DOI: 10.1186/s13321-018-0321-8] [Citation(s) in RCA: 49] [Impact Index Per Article: 8.2] [Received: 10/06/2018] [Accepted: 12/13/2018] [Indexed: 11/10/2022] Open
Abstract
Background Among the various molecular fingerprints available to describe small organic molecules, extended connectivity fingerprint, up to four bonds (ECFP4) performs best in benchmarking drug analog recovery studies as it encodes substructures with a high level of detail. Unfortunately, ECFP4 requires high dimensional representations (≥ 1024D) to perform well, so that ECFP4 nearest neighbor searches in very large databases such as GDB, PubChem or ZINC perform very slowly due to the curse of dimensionality. Results Herein we report a new fingerprint, called MinHash fingerprint, up to six bonds (MHFP6), which encodes detailed substructures using the extended connectivity principle of ECFP in a fundamentally different manner, increasing the performance of exact nearest neighbor searches in benchmarking studies and enabling the application of locality sensitive hashing (LSH) approximate nearest neighbor search algorithms. To describe a molecule, MHFP6 extracts the SMILES of all circular substructures around each atom up to a diameter of six bonds and applies the MinHash method to the resulting set. MHFP6 outperforms ECFP4 in benchmarking analog recovery studies. By leveraging locality sensitive hashing, LSH approximate nearest neighbor search methods perform as well on unfolded MHFP6 as comparable methods do on folded ECFP4 fingerprints in terms of speed and relative recovery rate, while operating in very sparse and high-dimensional binary chemical space. Conclusion MHFP6 is a new molecular fingerprint, encoding circular substructures, which outperforms ECFP4 for analog searches while allowing the direct application of locality sensitive hashing algorithms. It should be well suited for the analysis of large databases.
The source code for MHFP6 is available on GitHub (https://github.com/reymond-group/mhfp). Electronic supplementary material The online version of this article (10.1186/s13321-018-0321-8) contains supplementary material, which is available to authorized users.
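The MinHash step described in this abstract can be illustrated with a minimal sketch. It is not the MHFP6 implementation: the substructure strings below are placeholder tokens rather than circular-substructure SMILES, and the hash family, signature length and seed are assumptions chosen for illustration.

```python
import hashlib
import random

PRIME = (1 << 61) - 1  # modulus for the hash family (a * x + b) mod PRIME


def make_permutations(n_perm: int, seed: int = 42):
    """Draw n_perm random (a, b) pairs, one per hash function."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n_perm)]


def base_hash(token: str) -> int:
    """Map a substructure string to a 32-bit integer."""
    return int.from_bytes(hashlib.sha1(token.encode("utf-8")).digest()[:4], "big")


def minhash(tokens, perms):
    """Signature of a non-empty token set: the minimum value under each hash function."""
    hashed = [base_hash(t) for t in set(tokens)]
    return [min((a * h + b) % PRIME for h in hashed) for a, b in perms]


def estimated_jaccard(sig_a, sig_b) -> float:
    """The fraction of matching signature positions estimates the Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def exact_jaccard(a, b) -> float:
    """Exact Jaccard similarity of two token sets, for comparison."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


# Placeholder tokens standing in for substructure SMILES (hypothetical data).
perms = make_permutations(256)
set_a = [f"frag{i}" for i in range(10)]
set_b = [f"frag{i}" for i in range(5, 15)]
estimate = estimated_jaccard(minhash(set_a, perms), minhash(set_b, perms))
```

Because signatures have a fixed length regardless of how many substructures a molecule contains, they can be bucketed by locality sensitive hashing, which is the property the abstract relies on for approximate nearest neighbor search.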
Affiliation(s)
- Daniel Probst
- Department of Chemistry and Biochemistry, National Center for Competence in Research NCCR TransCure, University of Berne, Freiestrasse 3, 3012, Bern, Switzerland.
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, National Center for Competence in Research NCCR TransCure, University of Berne, Freiestrasse 3, 3012, Bern, Switzerland
| |
|
17
|
Dittrich J, Schmidt D, Pfleger C, Gohlke H. Converging a Knowledge-Based Scoring Function: DrugScore2018. J Chem Inf Model 2018; 59:509-521. [DOI: 10.1021/acs.jcim.8b00582] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Indexed: 11/29/2022]
Affiliation(s)
- Jonas Dittrich
- Mathematisch-Naturwissenschaftliche Fakultät, Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
| | - Denis Schmidt
- Mathematisch-Naturwissenschaftliche Fakultät, Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
| | - Christopher Pfleger
- Mathematisch-Naturwissenschaftliche Fakultät, Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
| | - Holger Gohlke
- Mathematisch-Naturwissenschaftliche Fakultät, Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- John von Neumann Institute for Computing (NIC), Jülich Supercomputing Centre (JSC) & Institute for Complex Systems−Structural Biochemistry (ICS-6), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
| |
|
18
|
Abstract
The recent general availability of low-cost virtual reality headsets and accompanying three-dimensional (3D) engine support presents an opportunity to bring the concept of chemical space into virtual environments. While virtual reality applications represent a category of widespread tools in other fields, their use in the visualization and exploration of abstract data such as chemical spaces has been experimental. In our previous work, we established the concept of interactive two-dimensional (2D) maps of chemical spaces followed by interactive web-based 3D visualizations, culminating in the interactive web-based 3D visualization of extremely large chemical spaces. Virtual reality chemical spaces are a natural extension of these concepts. As 2D and 3D embeddings and projections of high-dimensional chemical fingerprint spaces have been shown to be valuable tools in chemical space visualization and exploration, existing pipelines of data mining and preparation can be extended to be used in virtual reality applications. Here we present an application based on the Unity engine and the Virtual Reality Toolkit, allowing for the interactive exploration of chemical space populated by DrugBank compounds in virtual reality. The source code of the application as well as the most recent build are available on GitHub (https://github.com/reymond-group/virtual-reality-chemical-space).
Affiliation(s)
- Daniel Probst
- Department of Chemistry and Biochemistry, National Center for Competence in Research NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, National Center for Competence in Research NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
| |
|
19
|
Probst D, Reymond JL. SmilesDrawer: Parsing and Drawing SMILES-Encoded Molecular Structures Using Client-Side JavaScript. J Chem Inf Model 2018; 58:1-7. [PMID: 29257869 DOI: 10.1021/acs.jcim.7b00425] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Indexed: 01/18/2023]
Abstract
Here we present SmilesDrawer, a dependency-free JavaScript component capable of both parsing and drawing SMILES-encoded molecular structures client-side, developed to be easily integrated into web projects and to display organic molecules in large numbers and fast succession. SmilesDrawer can draw structurally and stereochemically complex structures such as maitotoxin and C60 without using templates, yet has an exceptionally small computational footprint and low memory usage without the requirement for loading images or any other form of client-server communication, making it easy to integrate even in secure (intranet, firewalled) or offline applications. These features allow the rendering of thousands of molecular structure drawings on a single web page within seconds on a wide range of hardware supporting modern browsers. The source code as well as the most recent build of SmilesDrawer is available on GitHub (http://doc.gdb.tools/smilesDrawer/). Both yarn and npm packages are also available.
Affiliation(s)
- Daniel Probst
- Department of Chemistry and Biochemistry, National Center for Competence in Research NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, National Center for Competence in Research NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
| |
|
20
|
Visini R, Arús-Pous J, Awale M, Reymond JL. Virtual Exploration of the Ring Systems Chemical Universe. J Chem Inf Model 2017; 57:2707-2718. [PMID: 29019686 DOI: 10.1021/acs.jcim.7b00457] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 12/30/2022]
Abstract
Here, we explore the chemical space of all virtually possible organic molecules focusing on ring systems, which represent the cyclic cores of organic molecules obtained by removing all acyclic bonds and converting all remaining atoms to carbon. This approach circumvents the combinatorial explosion encountered when enumerating the molecules themselves. We report the chemical universe database GDB4c containing 916 130 ring systems up to four saturated or aromatic rings and maximum ring size of 14 atoms and GDB4c3D containing the corresponding 6 555 929 stereoisomers. Almost all (98.6%) of these ring systems are unknown and represent chiral 3D-shaped macrocycles containing small rings and quaternary centers reminiscent of polycyclic natural products. We envision that GDB4c can serve to select new ring systems from which to design analogs of such natural products. The database is available for download at www.gdb.unibe.ch together with interactive visualization and search tools as a resource for molecular design.
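The ring-system definition given in this abstract (remove all acyclic bonds, convert the remaining atoms to carbon) can be sketched on a plain molecular graph. This is an illustration under stated assumptions, not the authors' procedure: bond orders, aromaticity and stereochemistry are ignored, acyclic bonds are taken to be graph bridges, and the function names and example graph are hypothetical.

```python
def find_bridges(n_atoms, bonds):
    """Return the bonds (as sorted index pairs) that lie on no cycle."""
    adj = {i: [] for i in range(n_atoms)}
    for u, v in bonds:
        adj[u].append(v)
        adj[v].append(u)
    disc, low, bridges = {}, {}, set()
    counter = 0

    def dfs(u, parent):
        nonlocal counter
        disc[u] = low[u] = counter
        counter += 1
        for v in adj[u]:
            if v == parent:
                continue
            if v in disc:
                low[u] = min(low[u], disc[v])      # back edge closes a ring
            else:
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:               # no ring spans this bond
                    bridges.add((min(u, v), max(u, v)))

    for start in range(n_atoms):
        if start not in disc:
            dfs(start, None)
    return bridges


def ring_system(elements, bonds):
    """Drop acyclic bonds, discard atoms left without bonds, relabel the rest as carbon."""
    bridges = find_bridges(len(elements), bonds)
    ring_bonds = [(min(u, v), max(u, v)) for u, v in bonds
                  if (min(u, v), max(u, v)) not in bridges]
    ring_atoms = sorted({atom for bond in ring_bonds for atom in bond})
    return {atom: "C" for atom in ring_atoms}, ring_bonds


# Hypothetical input: a six-membered ring containing one nitrogen, plus a methyl substituent.
elements = ["N", "C", "C", "C", "C", "C", "C"]
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (1, 6)]
atoms, kept_bonds = ring_system(elements, bonds)
```

Working on such reduced cores rather than on full molecules is what lets the enumeration avoid the combinatorial explosion mentioned in the abstract.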
Affiliation(s)
- Ricardo Visini
- Department of Chemistry and Biochemistry, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
| | - Josep Arús-Pous
- Department of Chemistry and Biochemistry, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
| | - Mahendra Awale
- Department of Chemistry and Biochemistry, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
| |
|
21
|
Sebastián-Pérez V, Roca C, Awale M, Reymond JL, Martinez A, Gil C, Campillo NE. Medicinal and Biological Chemistry (MBC) Library: An Efficient Source of New Hits. J Chem Inf Model 2017; 57:2143-2151. [PMID: 28813151 DOI: 10.1021/acs.jcim.7b00401] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 11/29/2022]
Abstract
Identification of new hits is one of the biggest challenges in drug discovery. Creating a library of well-characterized drug-like compounds is a key step in this process. Our group has developed an in-house chemical library called the Medicinal and Biological Chemistry (MBC) library. This collection, built up over more than 30 years of experience in drug design and discovery of new drugs for unmet diseases, has been successfully used to start several medicinal chemistry programs. It contains over 1000 compounds, mainly heterocyclic scaffolds. In this work, an analysis of its drug-like properties and a comparative study with well-known libraries, carried out using different computer software, are presented.
Affiliation(s)
- Víctor Sebastián-Pérez
- Centro de Investigaciones Biológicas (CIB, CSIC), Ramiro de Maeztu 9, 28040 Madrid, Spain
| | - Carlos Roca
- Centro de Investigaciones Biológicas (CIB, CSIC), Ramiro de Maeztu 9, 28040 Madrid, Spain
| | - Mahendra Awale
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| | - Ana Martinez
- Centro de Investigaciones Biológicas (CIB, CSIC), Ramiro de Maeztu 9, 28040 Madrid, Spain
| | - Carmen Gil
- Centro de Investigaciones Biológicas (CIB, CSIC), Ramiro de Maeztu 9, 28040 Madrid, Spain
| | - Nuria E Campillo
- Centro de Investigaciones Biológicas (CIB, CSIC), Ramiro de Maeztu 9, 28040 Madrid, Spain
| |
|
22
|
Abstract
To better understand chemical space we recently enumerated the database GDB-17 containing 166.4 billion possible molecules up to 17 atoms of C, N, O, S and halogen following the simple rules of chemical stability and synthetic feasibility. However, due to the combinatorial explosion caused by systematic enumeration, GDB-17 is strongly biased toward the largest, functionally and stereochemically most complex molecules and is far too large for most virtual screening tools. Herein we selected a much smaller subset of GDB-17, called the fragment database FDB-17, which contains 10 million fragment-like molecules evenly covering a broad value range for molecular size, polarity, and stereochemical complexity. The database is available at www.gdb.unibe.ch for download and free use, together with an interactive visualization application and a Web-based nearest neighbor search tool to facilitate the selection of new fragment-sized molecules for chemical synthesis.
Affiliation(s)
- Ricardo Visini
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Berne, Switzerland
| | - Mahendra Awale
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Berne, Switzerland
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Berne, Switzerland
| |
|
23
|
Awale M, Probst D, Reymond JL. WebMolCS: A Web-Based Interface for Visualizing Molecules in Three-Dimensional Chemical Spaces. J Chem Inf Model 2017; 57:643-649. [PMID: 28316236 DOI: 10.1021/acs.jcim.6b00690] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Indexed: 11/28/2022]
Abstract
The concept of chemical space provides a convenient framework to analyze large collections of molecules by placing them in property spaces where distances represent similarities. Here we report webMolCS, a new type of web-based interface visualizing up to 5000 user-defined molecules in six different three-dimensional (3D) chemical spaces obtained by principal component analysis or similarity mapping of multidimensional property spaces describing composition (MQN: 42D molecular quantum numbers, SMIfp: 34D SMILES fingerprint), shapes and pharmacophores (APfp: 20D atom pair fingerprint, Xfp: 55D category extended atom pair fingerprint), and substructures (Sfp: 1024D binary substructure fingerprint, ECfp4: 1024D extended connectivity fingerprint). Each molecule is shown as a sphere, and its structure appears on mouse over. The sphere is color-coded by similarity to the first compound in the list, by the list rank, or by a user-defined value, which reveals the relationship between any property encoded by these values and structural similarities. WebMolCS is freely available at www.gdb.unibe.ch.
Affiliation(s)
- Mahendra Awale
- Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
| | - Daniel Probst
- Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
| |
|
24
|
Lu J, Carlson HA. ChemTreeMap: an interactive map of biochemical similarity in molecular datasets. Bioinformatics 2016; 32:3584-3592. [PMID: 27515740 DOI: 10.1093/bioinformatics/btw523] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 11/24/2015] [Revised: 07/18/2016] [Accepted: 08/07/2016] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION What if you could explain complex chemistry in a simple tree and share that data online with your collaborators? Computational biology often incorporates diverse chemical data to probe a biological question, but the existing tools for chemical data are ill-suited for the very large datasets inherent to bioinformatics. Furthermore, existing visualization methods often require an expert chemist to interpret the patterns. Biologists need an interactive tool for visualizing chemical information in an intuitive, accessible way that facilitates its integration into today's team-based biological research. RESULTS ChemTreeMap is an interactive, bioinformatics tool designed to explore chemical space and mine the relationships between chemical structure, molecular properties, and biological activity. ChemTreeMap synergistically combines extended connectivity fingerprints and a neighbor-joining algorithm to produce a hierarchical tree with branch lengths proportional to molecular similarity. Compound properties are shown by leaf color, size and outline to yield a user-defined visualization of the tree. Two representative analyses are included to demonstrate ChemTreeMap's capabilities and utility: assessing dataset overlap and mining structure-activity relationships. AVAILABILITY AND IMPLEMENTATION The examples from this paper may be accessed at http://ajing.github.io/ChemTreeMap/. Code for the server and client are available in the Supplementary Information, at the aforementioned github site, and on Docker Hub (https://hub.docker.com) with the nametag ajing/chemtreemap. CONTACT carlsonh@umich.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
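The combination named in this abstract, fingerprint similarity feeding a neighbor-joining tree, can be sketched for its first step only. The sketch assumes fingerprints stored as sets of on-bit indices and a Tanimoto distance; it computes the standard neighbor-joining criterion Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k) and selects the first pair to join. The function names and example bit sets are hypothetical, and the full tree construction used by ChemTreeMap is not reproduced.

```python
from itertools import combinations


def tanimoto_distance(fp_a, fp_b) -> float:
    """1 - |A & B| / |A | B| for fingerprints stored as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0
    return 1.0 - len(fp_a & fp_b) / union


def distance_matrix(fps):
    """Symmetric matrix of pairwise Tanimoto distances."""
    n = len(fps)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = tanimoto_distance(fps[i], fps[j])
    return d


def first_join(d):
    """Return the index pair minimising the neighbor-joining criterion Q."""
    n = len(d)
    row_sums = [sum(row) for row in d]
    return min(
        combinations(range(n), 2),
        key=lambda p: (n - 2) * d[p[0]][p[1]] - row_sums[p[0]] - row_sums[p[1]],
    )


# Hypothetical on-bit sets: two pairs of closely related compounds.
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11, 12, 13}, {10, 11, 12, 14}]
dmat = distance_matrix(fps)
pair = first_join(dmat)
```

A complete neighbor-joining run would merge the selected pair into a new node, update the distance matrix, and repeat until the tree is resolved.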
Affiliation(s)
- Jing Lu
- Department of Computational Medicine and Bioinformatics
| | - Heather A Carlson
- Department of Computational Medicine and Bioinformatics, Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
| |
|
25
|
Nicolaou CA, Watson IA, Hu H, Wang J. The Proximal Lilly Collection: Mapping, Exploring and Exploiting Feasible Chemical Space. J Chem Inf Model 2016; 56:1253-66. [DOI: 10.1021/acs.jcim.6b00173] [Citation(s) in RCA: 51] [Impact Index Per Article: 6.4] [Indexed: 12/13/2022]
Affiliation(s)
- Christos A. Nicolaou
- Discovery Chemistry, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana 46285, United States
| | - Ian A. Watson
- Discovery Chemistry, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana 46285, United States
| | - Hong Hu
- Discovery Chemistry, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana 46285, United States
| | - Jibo Wang
- Discovery Chemistry, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana 46285, United States
| |
|
26
|
Awale M, Reymond JL. Web-based 3D-visualization of the DrugBank chemical space. J Cheminform 2016; 8:25. [PMID: 27148409 PMCID: PMC4855437 DOI: 10.1186/s13321-016-0138-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 02/18/2016] [Accepted: 04/27/2016] [Indexed: 12/14/2022] Open
Abstract
Background Similarly to the periodic table for elements, chemical space offers an organizing principle for representing the diversity of organic molecules, usually in the form of multi-dimensional property spaces that are subjected to dimensionality reduction methods to obtain 3D-spaces or 2D-maps suitable for visual inspection. Unfortunately, tools to look at chemical space on the internet are currently very limited. Results Herein we present webDrugCS, a web application freely available at www.gdb.unibe.ch to visualize DrugBank (www.drugbank.ca, containing over 6000 investigational and approved drugs) in five different property spaces. WebDrugCS displays 3D-clouds of color-coded grid points representing molecules, whose structural formula is displayed on mouse over with an option to link to the corresponding molecule page at the DrugBank website. The 3D-clouds are obtained by principal component analysis of high dimensional property spaces describing constitution and topology (42D molecular quantum numbers MQN), structural features (34D SMILES fingerprint SMIfp), molecular shape (20D atom pair fingerprint APfp), pharmacophores (55D atom category extended atom pair fingerprint Xfp) and substructures (1024D binary substructure fingerprint Sfp). User defined molecules can be uploaded as SMILES lists and displayed together with DrugBank. In contrast to 2D-maps where many compounds fold onto each other, these 3D-spaces have a comparable resolution to their parent high-dimensional chemical space. Conclusion To the best of our knowledge webDrugCS is the first publicly available web tool for interactive visualization and exploration of the DrugBank chemical space in 3D. 
WebDrugCS works on computers, tablets and phones, and facilitates the visual exploration of DrugBank to rapidly learn about the structural diversity of small molecule drugs.
Graphical abstract: webDrugCS visualization of DrugBank projected in 3D MQN space color-coded by ring count, with pointer showing the drug 5-fluorouracil.
Affiliation(s)
- Mahendra Awale
- Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR TransCure, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR TransCure, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| |
|
27
|
|
28
|
Abstract
Analysis of physical properties and structural diversity of 57 molecules derived from screening 5–16 DNA encoded libraries against two protein targets. DNA encoded library size is not predictive of productivity.
Affiliation(s)
- Oliv Eidam
- Roche Pharmaceutical Research and Early Development (pRED)
- Roche Innovation Center Basel
- F. Hoffmann-La Roche Ltd
- CH-4070 Basel
- Switzerland
| | - Alexander L. Satz
- Roche Pharmaceutical Research and Early Development (pRED)
- Roche Innovation Center Basel
- F. Hoffmann-La Roche Ltd
- CH-4070 Basel
- Switzerland
| |
|
29
|
Jin X, Awale M, Zasso M, Kostro D, Patiny L, Reymond JL. PDB-Explorer: a web-based interactive map of the protein data bank in shape space. BMC Bioinformatics 2015; 16:339. [PMID: 26493835 PMCID: PMC4619230 DOI: 10.1186/s12859-015-0776-9] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 07/29/2015] [Accepted: 10/14/2015] [Indexed: 11/17/2022] Open
Abstract
Background The RCSB Protein Data Bank (PDB) provides public access to experimentally determined 3D-structures of biological macromolecules (proteins, peptides and nucleic acids). While various tools are available to explore the PDB, options to access the global structural diversity of the entire PDB and to perceive relationships between PDB structures remain very limited. Methods A 136-dimensional atom pair 3D-fingerprint for proteins (3DP) counting categorized atom pairs at increasing through-space distances was designed to represent the molecular shape of PDB-entries. Nearest neighbor search examples were reported, exemplifying the ability of 3DP-similarity to identify closely related biomolecules from small peptides to enzymes and large multiprotein complexes such as virus particles. Principal component analysis was used to obtain the visualization of PDB in 3DP-space. Results The 3DP property space groups proteins and protein assemblies according to their 3D-shape similarity, yet shows exquisite ability to distinguish between closely related structures. An interactive website called PDB-Explorer is presented featuring a color-coded interactive map of PDB in 3DP-space. Each pixel of the map contains one or more PDB-entries which are directly visualized as ribbon diagrams when the pixel is selected. The PDB-Explorer website allows performing 3DP-nearest neighbor searches of any PDB-entry or of any structure uploaded as protein-type PDB file. All functionalities on the website are implemented in JavaScript in a platform-independent manner and draw data from a server that is updated daily with the latest PDB additions, ensuring complete and up-to-date coverage. The essentially instantaneous 3DP-similarity search with the PDB-Explorer provides results comparable to those of much slower 3D-alignment algorithms, and automatically clusters proteins from the same superfamilies in tight groups.
Conclusion A chemical space classification of PDB based on molecular shape was obtained using a new atom-pair 3D-fingerprint for proteins and implemented in a web-based database exploration tool comprising an interactive color-coded map of the PDB chemical space and a nearest neighbor search tool. The PDB-Explorer website is freely available at www.cheminfo.org/pdbexplorer and represents an unprecedented opportunity to interactively visualize and explore the structural diversity of the PDB.
Graphical abstract: Maps of PDB in 3DP-space color-coded by heavy atom count and shape.
Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0776-9) contains supplementary material, which is available to authorized users.
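The idea of an atom-pair 3D fingerprint, counting categorized atom pairs at increasing through-space distances, can be sketched as a histogram over (category pair, distance bin). This is a simplified illustration and not the published 136-dimensional 3DP definition: the categories, the bin edges in ångström, the function names and the use of a city-block distance for comparison are all assumptions.

```python
import math
from itertools import combinations

# Hypothetical distance bin edges in angstroms; the published 3DP binning may differ.
BIN_EDGES = [2.0, 4.0, 8.0, 16.0, 32.0]


def atom_pair_fingerprint(atoms, categories):
    """Count atom pairs per (category pair, distance bin).

    atoms: list of (category, (x, y, z)) tuples.
    categories: ordered list of category labels.
    """
    pair_types = [(a, b) for i, a in enumerate(categories) for b in categories[i:]]
    index = {pair_type: k for k, pair_type in enumerate(pair_types)}
    order = {c: i for i, c in enumerate(categories)}
    n_bins = len(BIN_EDGES) + 1
    fp = [0] * (len(pair_types) * n_bins)
    for (cat_a, pos_a), (cat_b, pos_b) in combinations(atoms, 2):
        key = tuple(sorted((cat_a, cat_b), key=order.__getitem__))
        dist = math.dist(pos_a, pos_b)
        bin_idx = sum(dist >= edge for edge in BIN_EDGES)  # number of edges passed
        fp[index[key] * n_bins + bin_idx] += 1
    return fp


def city_block(fp_a, fp_b) -> int:
    """Sum of absolute count differences, one simple choice of fingerprint distance."""
    return sum(abs(x - y) for x, y in zip(fp_a, fp_b))


# Hypothetical three-atom example with two categories.
atoms = [("C", (0.0, 0.0, 0.0)), ("C", (1.5, 0.0, 0.0)), ("N", (0.0, 3.0, 0.0))]
fp = atom_pair_fingerprint(atoms, ["C", "N"])
```

Because such a fingerprint is a fixed-length vector of counts, comparing two structures reduces to a cheap vector distance, which is consistent with the abstract's contrast between fast similarity search and slower 3D alignment.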
Affiliation(s)
- Xian Jin
- Department of Chemistry and Biochemistry, University of Berne, Freiestrasse 3, 3012, Berne, Switzerland.
| | - Mahendra Awale
- Department of Chemistry and Biochemistry, University of Berne, Freiestrasse 3, 3012, Berne, Switzerland.
| | - Michaël Zasso
- Ecole Polytechnique Fédérale de Lausanne (EPFL), Institute of Chemical Sciences and Engineering (ISIC), Lausanne, 1015, Switzerland.
| | - Daniel Kostro
- Ecole Polytechnique Fédérale de Lausanne (EPFL), Institute of Chemical Sciences and Engineering (ISIC), Lausanne, 1015, Switzerland.
| | - Luc Patiny
- Ecole Polytechnique Fédérale de Lausanne (EPFL), Institute of Chemical Sciences and Engineering (ISIC), Lausanne, 1015, Switzerland.
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Berne, Freiestrasse 3, 3012, Berne, Switzerland.
| |
|
30
|
Martínez MJ, Ponzoni I, Díaz MF, Vazquez GE, Soto AJ. Visual analytics in cheminformatics: user-supervised descriptor selection for QSAR methods. J Cheminform 2015; 7:39. [PMID: 26300983 PMCID: PMC4540751 DOI: 10.1186/s13321-015-0092-4] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 01/20/2015] [Accepted: 07/30/2015] [Indexed: 12/02/2022] Open
Abstract
Background The design of QSAR/QSPR models is a challenging problem, where the selection of the most relevant descriptors constitutes a key step of the process. Several feature selection methods that address this step concentrate on statistical associations among descriptors and target properties, whereas chemical knowledge is left out of the analysis. For this reason, the interpretability and generality of the QSAR/QSPR models obtained by these feature selection methods are drastically affected. Therefore, an approach for integrating the domain expert’s knowledge into the selection process is needed to increase the confidence in the final set of descriptors. Results In this paper we propose a software tool, named Visual and Interactive DEscriptor ANalysis (VIDEAN), that combines statistical methods with interactive visualizations for choosing a set of descriptors for predicting a target property. Domain expertise can be added to the feature selection process by means of an interactive visual exploration of data, aided by statistical tools and metrics based on information theory. Coordinated visual representations are presented for capturing different relationships and interactions among descriptors, target properties and candidate subsets of descriptors. The competencies of the proposed software were assessed through different scenarios. These scenarios reveal how an expert can use this tool to choose one subset of descriptors from a group of candidate subsets or how to modify existing descriptor subsets and even incorporate new descriptors according to his or her own knowledge of the target property. Conclusions The reported experiences showed the suitability of our software for selecting sets of descriptors with low cardinality, high interpretability, low redundancy and high statistical performance in a visual exploratory way.
Therefore, it is possible to conclude that the resulting tool allows the integration of a chemist’s expertise in the descriptor selection process with a low cognitive effort in contrast with the alternative of using an ad-hoc manual analysis of the selected descriptors. Electronic supplementary material The online version of this article (doi:10.1186/s13321-015-0092-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- María Jimena Martínez
- Departamento de Ciencias e Ingeniería de la Computación, Laboratorio de Investigación y Desarrollo en Computación Científica (LIDeCC), Instituto de Ciencias e Ingeniería de la Computación (ICIC), Universidad Nacional del Sur, Av. Alem 1253, 8000 Bahía Blanca, Argentina
- Ignacio Ponzoni
- Departamento de Ciencias e Ingeniería de la Computación, Laboratorio de Investigación y Desarrollo en Computación Científica (LIDeCC), Instituto de Ciencias e Ingeniería de la Computación (ICIC), Universidad Nacional del Sur, Av. Alem 1253, 8000 Bahía Blanca, Argentina
- Mónica F Díaz
- Planta Piloto de Ingeniería Química (PLAPIQUI)-UNS-CONICET, Co., La Carrindanga km.7, CC 717 Bahía Blanca, Argentina
- Gustavo E Vazquez
- Facultad de Ingeniería y Tecnologías, Universidad Católica del Uruguay, Av. 8 de Octubre 2801, CC 11300 Montevideo, Uruguay
- Axel J Soto
- Faculty of Computer Science, Dalhousie University, 6050 University Av., Halifax, Canada
31
Awale M, Reymond JL. Similarity Mapplet: Interactive Visualization of the Directory of Useful Decoys and ChEMBL in High Dimensional Chemical Spaces. J Chem Inf Model 2015. [PMID: 26207526 DOI: 10.1021/acs.jcim.5b00182] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Indexed: 12/31/2022]
Abstract
An Internet portal accessible at www.gdb.unibe.ch has been set up to automatically generate color-coded similarity maps of the ChEMBL database in relation to up to two sets of active compounds taken from the enhanced Directory of Useful Decoys (eDUD), a random set of molecules, or up to two sets of user-defined reference molecules. These maps visualize the relationships between the selected compounds and ChEMBL in six different high dimensional chemical spaces, namely MQN (42-D molecular quantum numbers), SMIfp (34-D SMILES fingerprint), APfp (20-D shape fingerprint), Xfp (55-D pharmacophore fingerprint), Sfp (1024-bit substructure fingerprint), and ECfp4 (1024-bit extended connectivity fingerprint). The maps are supplied in the form of Java-based desktop applications called "similarity mapplets" that allow interactive content browsing and are linked to a "Multifingerprint Browser for ChEMBL" (also accessible directly at www.gdb.unibe.ch) to perform nearest neighbor searches. One can obtain six similarity mapplets of ChEMBL relative to random reference compounds, 606 similarity mapplets relative to single eDUD active sets, 30,300 similarity mapplets relative to pairs of eDUD active sets, and any number of similarity mapplets relative to user-defined reference sets to help visualize the structural diversity of compound series in drug optimization projects and their relationship to other known bioactive compounds.
Affiliation(s)
- Mahendra Awale
- Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
- Jean-Louis Reymond
- Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
32
Osolodkin DI, Radchenko EV, Orlov AA, Voronkov AE, Palyulin VA, Zefirov NS. Progress in visual representations of chemical space. Expert Opin Drug Discov 2015; 10:959-73. [DOI: 10.1517/17460441.2015.1060216] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Indexed: 01/06/2023]
33
Tao L, Zhu F, Qin C, Zhang C, Chen S, Zhang P, Zhang C, Tan C, Gao C, Chen Z, Jiang Y, Chen YZ. Clustered distribution of natural product leads of drugs in the chemical space as influenced by the privileged target-sites. Sci Rep 2015; 5:9325. [PMID: 25790752 PMCID: PMC5380136 DOI: 10.1038/srep09325] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 09/12/2014] [Accepted: 02/18/2015] [Indexed: 01/02/2023] Open
Abstract
Some natural product leads of drugs (NPLDs) have been found to congregate in the chemical space. The extent, detailed patterns, and mechanisms of this congregation phenomenon have not been fully investigated and their usefulness for NPLD discovery needs to be more extensively tested. In this work, we generated and evaluated the distribution patterns of 442 NPLDs of 749 pre-2013 approved and 263 clinical trial small molecule drugs in the chemical space represented by the molecular scaffold and fingerprint trees of 137,836 non-redundant natural products. In the molecular scaffold trees, 62.7% approved and 37.4% clinical trial NPLDs congregate in 62 drug-productive scaffolds/scaffold-branches. In the molecular fingerprint tree, 82.5% approved and 63.0% clinical trial NPLDs are clustered in 60 drug-productive clusters (DCs) partly due to their preferential binding to 45 privileged target-site classes. The distribution patterns of the NPLDs are distinguished from those of the bioactive natural products. 11.7% of the NPLDs in these DCs have remote-similarity relationship with the nearest NPLD in their own DC. The majority of the new NPLDs emerge from preexisting DCs. The usefulness of the derived knowledge for NPLD discovery was demonstrated by the recognition of the new NPLDs of 2013-2014 approved drugs.
Affiliation(s)
- Lin Tao
- [1] Department of Pharmacology and Pharmaceutical Sciences, School of Medicine, Tsinghua University, the Ministry-Province Jointly Constructed Base for State Key Lab-Shenzhen Key Laboratory of Chemical Biology, the Graduate School at Shenzhen, Tsinghua University, Shenzhen, and Shenzhen Technology and Engineering Laboratory for Personalized Cancer Diagnostics and Therapeutics, PO Box 518000, P. R. China [2] Bioinformatics and Drug Design Group, Department of Pharmacy, and Center for Computational Science and Engineering, National University of Singapore, Singapore 117543 [3] NUS Graduate School for Integrative Sciences and Engineering, Singapore 117456
- Feng Zhu
- [1] Department of Pharmacology and Pharmaceutical Sciences, School of Medicine, Tsinghua University, the Ministry-Province Jointly Constructed Base for State Key Lab-Shenzhen Key Laboratory of Chemical Biology, the Graduate School at Shenzhen, Tsinghua University, Shenzhen, and Shenzhen Technology and Engineering Laboratory for Personalized Cancer Diagnostics and Therapeutics, PO Box 518000, P. R. China [2] Bioinformatics and Drug Design Group, Department of Pharmacy, and Center for Computational Science and Engineering, National University of Singapore, Singapore 117543 [3] Innovative Drug Research Centre and College of Chemistry and Chemical Engineering, Chongqing University, Chongqing, P. R. China
- Chu Qin
- [1] Bioinformatics and Drug Design Group, Department of Pharmacy, and Center for Computational Science and Engineering, National University of Singapore, Singapore 117543 [2] NUS Graduate School for Integrative Sciences and Engineering, Singapore 117456
- Cheng Zhang
- Bioinformatics and Drug Design Group, Department of Pharmacy, and Center for Computational Science and Engineering, National University of Singapore, Singapore 117543
- Shangying Chen
- Bioinformatics and Drug Design Group, Department of Pharmacy, and Center for Computational Science and Engineering, National University of Singapore, Singapore 117543
- Peng Zhang
- Bioinformatics and Drug Design Group, Department of Pharmacy, and Center for Computational Science and Engineering, National University of Singapore, Singapore 117543
- Cunlong Zhang
- Department of Pharmacology and Pharmaceutical Sciences, School of Medicine, Tsinghua University, the Ministry-Province Jointly Constructed Base for State Key Lab-Shenzhen Key Laboratory of Chemical Biology, the Graduate School at Shenzhen, Tsinghua University, Shenzhen, and Shenzhen Technology and Engineering Laboratory for Personalized Cancer Diagnostics and Therapeutics, PO Box 518000, P. R. China
- Chunyan Tan
- Department of Pharmacology and Pharmaceutical Sciences, School of Medicine, Tsinghua University, the Ministry-Province Jointly Constructed Base for State Key Lab-Shenzhen Key Laboratory of Chemical Biology, the Graduate School at Shenzhen, Tsinghua University, Shenzhen, and Shenzhen Technology and Engineering Laboratory for Personalized Cancer Diagnostics and Therapeutics, PO Box 518000, P. R. China
- Chunmei Gao
- Department of Pharmacology and Pharmaceutical Sciences, School of Medicine, Tsinghua University, the Ministry-Province Jointly Constructed Base for State Key Lab-Shenzhen Key Laboratory of Chemical Biology, the Graduate School at Shenzhen, Tsinghua University, Shenzhen, and Shenzhen Technology and Engineering Laboratory for Personalized Cancer Diagnostics and Therapeutics, PO Box 518000, P. R. China
- Zhe Chen
- Zhejiang Key Laboratory of Gastro-intestinal Pathophysiology, Zhejiang Hospital of Traditional Chinese Medicine, Zhejiang Chinese Medical University, Hangzhou, P. R. China
- Yuyang Jiang
- Department of Pharmacology and Pharmaceutical Sciences, School of Medicine, Tsinghua University, the Ministry-Province Jointly Constructed Base for State Key Lab-Shenzhen Key Laboratory of Chemical Biology, the Graduate School at Shenzhen, Tsinghua University, Shenzhen, and Shenzhen Technology and Engineering Laboratory for Personalized Cancer Diagnostics and Therapeutics, PO Box 518000, P. R. China
- Yu Zong Chen
- [1] Department of Pharmacology and Pharmaceutical Sciences, School of Medicine, Tsinghua University, the Ministry-Province Jointly Constructed Base for State Key Lab-Shenzhen Key Laboratory of Chemical Biology, the Graduate School at Shenzhen, Tsinghua University, Shenzhen, and Shenzhen Technology and Engineering Laboratory for Personalized Cancer Diagnostics and Therapeutics, PO Box 518000, P. R. China [2] Bioinformatics and Drug Design Group, Department of Pharmacy, and Center for Computational Science and Engineering, National University of Singapore, Singapore 117543 [3] NUS Graduate School for Integrative Sciences and Engineering, Singapore 117456
34
Abstract
One of the simplest questions that can be asked about molecular diversity is how many organic molecules are possible in total? To answer this question, my research group has computationally enumerated all possible organic molecules up to a certain size to gain an unbiased insight into the entire chemical space. Our latest database, GDB-17, contains 166.4 billion molecules of up to 17 atoms of C, N, O, S, and halogens, by far the largest small molecule database reported to date. Molecules allowed by valency rules but unstable or nonsynthesizable due to strained topologies or reactive functional groups were not considered, which reduced the enumeration by at least 10 orders of magnitude and was essential to arrive at a manageable database size. Despite these restrictions, GDB-17 is highly relevant with respect to known molecules. Beyond enumeration, understanding and exploiting GDBs (generated databases) led us to develop methods for virtual screening and visualization of very large databases in the form of a "periodic system of molecules" comprising six different fingerprint spaces, with web-browsers for nearest neighbor searches, and the MQN- and SMIfp-Mapplet application for exploring color-coded principal component maps of GDB and other large databases. Proof-of-concept applications of GDB for drug discovery were realized by combining virtual screening with chemical synthesis and activity testing for neurotransmitter receptor and transporter ligands. One surprising lesson from using GDB for drug analog searches is the incredible depth of chemical space, that is, the fact that millions of very close analogs of any molecule can be readily identified by nearest-neighbor searches in the MQN-space of the various GDBs. The chemical space project has opened an unprecedented door on chemical diversity. Ongoing and yet unmet challenges concern enumerating molecules beyond 17 atoms and synthesizing GDB molecules with innovative scaffolds and pharmacophores.
Affiliation(s)
- Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
35
Abstract
In this article strategies for the design and synthesis of natural product analogues are summarized and illustrated with some selected examples.
Affiliation(s)
- Martin E. Maier
- Institut für Organische Chemie, Eberhard Karls Universität Tübingen, 72076 Tübingen, Germany
36
Abstract
Background Sound statistical validation is important to evaluate and compare the overall performance of (Q)SAR models. However, classical validation does not help the user better understand the properties of the model or the underlying data. Even though a number of visualization tools for analyzing (Q)SAR information in small molecule datasets exist, integrated visualization methods that allow the investigation of model validation results are still lacking. Results We propose visual validation as an approach for the graphical inspection of (Q)SAR model validation results. The approach applies the 3D viewer CheS-Mapper, an open-source application for the exploration of small molecules in virtual 3D space. The present work describes the new functionalities in CheS-Mapper 2.0 that facilitate the analysis of (Q)SAR information and allow the visual validation of (Q)SAR models. The tool enables the comparison of model predictions to the actual activity in feature space. The approach is generic: it is model-independent and can handle physico-chemical and structural input features as well as quantitative and qualitative endpoints. Conclusions Visual validation with CheS-Mapper enables analyzing (Q)SAR information in the data and indicates how this information is employed by the (Q)SAR model. It reveals whether the endpoint is modeled too specifically or too generically and highlights common properties of misclassified compounds. Moreover, the researcher can use CheS-Mapper to inspect how the (Q)SAR model predicts activity cliffs. The CheS-Mapper software is freely available at http://ches-mapper.org. Graphical abstract Comparing actual and predicted activity values with CheS-Mapper.
37
Ruddigkeit L, Awale M, Reymond JL. Expanding the fragrance chemical space for virtual screening. J Cheminform 2014; 6:27. [PMID: 24876890 PMCID: PMC4037718 DOI: 10.1186/1758-2946-6-27] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 03/25/2014] [Accepted: 05/12/2014] [Indexed: 12/30/2022] Open
Abstract
The properties of fragrance molecules in the public databases SuperScent and Flavornet were analyzed to define a “fragrance-like” (FL) property range (Heavy Atom Count ≤ 21, only C, H, O, S, (O + S) ≤ 3, Hydrogen Bond Donor ≤ 1) and the corresponding chemical space including FL molecules from PubChem (NIH repository of molecules), ChEMBL (bioactive molecules), ZINC (drug-like molecules), and GDB-13 (all possible organic molecules up to 13 atoms of C, N, O, S, Cl). The FL subsets of these databases were classified by MQN (Molecular Quantum Numbers, a set of 42 integer value descriptors of molecular structure) and formatted for fast MQN-similarity searching and interactive exploration of color-coded principal component maps in form of the FL-mapplet and FL-browser applications freely available at http://www.gdb.unibe.ch. MQN-similarity is shown to efficiently recover 15 different fragrance molecule families from the different FL subsets, demonstrating the relevance of the MQN-based tool to explore the fragrance chemical space.
Affiliation(s)
- Lars Ruddigkeit
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
- Mahendra Awale
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
- Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
38
Abstract
To confirm the activity of an initial small molecule 'hit compound' from an activity screening, one needs to probe the structure-activity relationships by testing close analogs. The multi-fingerprint browser presented here (http://dcb-reymond23.unibe.ch:8080/MCSS/) enables one to rapidly identify such close analogs among commercially available compounds in the ZINC database (>13 million molecules). The browser retrieves nearest neighbors of any query molecule in multi-dimensional chemical spaces defined by four different fingerprints, each of which represents relevant structural and pharmacophoric features in a different way: sFP (substructure fingerprint), ECFP4 (extended connectivity fingerprint), MQNs (molecular quantum numbers) and SMIfp (SMILES fingerprint). Distances are calculated using the city-block distance, a similarity measure that performs as well as Tanimoto similarity but is much faster to compute. The list of up to 1000 nearest neighbors of any query molecule is retrieved by the browser and can be then clustered using the K-means clustering algorithm to produce a focused list of analogs with likely similar bioactivity to be considered for experimental evaluation.
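The retrieval step described in this abstract, ranking library molecules by city-block distance to a query fingerprint and keeping the closest ones, can be sketched in a few lines. This is only an illustration under stated assumptions, not the browser's implementation: the function names, the toy fingerprint vectors, and the dictionary-based library are hypothetical, and fingerprints are assumed to be equal-length integer vectors.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def city_block_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """City-block (L1) distance: sum of absolute per-dimension differences."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_neighbors(
    query: Sequence[int],
    library: Dict[str, Sequence[int]],
    k: int = 1000,
) -> List[Tuple[int, str]]:
    """Return up to k (distance, name) pairs, closest first.

    `library` maps a molecule name to its fingerprint vector; both the
    names and the vectors are placeholders chosen for this sketch.
    """
    scored = ((city_block_distance(query, fp), name) for name, fp in library.items())
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    # Toy 3-dimensional count fingerprints (hypothetical values).
    toy_library = {"mol_a": [2, 0, 1], "mol_b": [2, 1, 1], "mol_c": [9, 4, 0]}
    for distance, name in nearest_neighbors([2, 0, 1], toy_library, k=2):
        print(name, distance)
```

The default `k=1000` mirrors the "up to 1000 nearest neighbors" mentioned in the abstract. A linear scan like this is only practical for small libraries; searching millions of molecules would need an index or a pre-sorted layout, which is outside the scope of this sketch.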
Affiliation(s)
- Mahendra Awale
- Department of Chemistry and Biochemistry, University of Berne, Freiestrasse 3, Berne-3012, Switzerland
- Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Berne, Freiestrasse 3, Berne-3012, Switzerland
39
Kouri TM, Awale M, Slyby JK, Reymond JL, Mehta DP. “Social” Network of Isomers Based on Bond Count Distance: Algorithms. J Chem Inf Model 2014; 54:57-68. [DOI: 10.1021/ci4005173] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Affiliation(s)
- Tina M. Kouri
- Department of Computer Science and Engineering, University of South Florida, 4202 E. Fowler Avenue, Tampa, Florida 33620, United States
- Mahendra Awale
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3013 Bern, Switzerland
- James K. Slyby
- Department of Electrical Engineering and Computer Science, Colorado School of Mines, 1500 Illinois Street, Golden, Colorado 80401, United States
- Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3013 Bern, Switzerland
- Dinesh P. Mehta
- Department of Electrical Engineering and Computer Science, Colorado School of Mines, 1500 Illinois Street, Golden, Colorado 80401, United States
40
Melagraki G, Afantitis A. Enalos InSilicoNano platform: an online decision support tool for the design and virtual screening of nanoparticles. RSC Adv 2014. [DOI: 10.1039/c4ra07756c] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.7] [Indexed: 11/21/2022] Open
Abstract
A QNAR model, available online through Enalos InSilicoNano platform, has been developed and validated for the risk assessment of nanoparticles (NPs).
41
Ming H, Tiejun C, Yanli W, Stephen BH. Web search and data mining of natural products and their bioactivities in PubChem. Sci China Chem 2013; 56:10.1007/s11426-013-4910-0. [PMID: 24363665 PMCID: PMC3869387 DOI: 10.1007/s11426-013-4910-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 10/26/2022]
Abstract
Natural products, historically a major resource for drug discovery, have recently been gaining more attention owing to advances in genomic sequencing and other technologies, which make them attractive and amenable to drug candidate screening. Collecting and mining the bioactivity information of natural products is extremely important for accelerating the drug development process and reducing its cost. Lately, a number of publicly accessible databases have been established to facilitate access to chemical biology data for small molecules, including natural products. It is therefore imperative for scientists in related fields to exploit these resources in order to expedite their research on natural products as drug leads and candidates for disease treatment. PubChem, a public database, contains a large number of natural products associated with bioactivity data. In this review, we introduce the information system provided at PubChem and systematically describe the applications of a set of PubChem web services for rapid retrieval, analysis, and downloading of natural product data. We hope this work can serve as a starting point for researchers to perform data mining on natural products using PubChem.
Affiliation(s)
- Hao Ming
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Cheng Tiejun
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Wang Yanli
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Bryant H. Stephen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
42
Medina-Franco JL, Aguayo-Ortiz R. Progress in the Visualization and Mining of Chemical and Target Spaces. Mol Inform 2013; 32:942-53. [DOI: 10.1002/minf.201300041] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 03/05/2013] [Accepted: 05/06/2013] [Indexed: 01/15/2023]
43
Schwartz J, Awale M, Reymond JL. SMIfp (SMILES fingerprint) chemical space for virtual screening and visualization of large databases of organic molecules. J Chem Inf Model 2013; 53:1979-89. [PMID: 23845040 DOI: 10.1021/ci400206h] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Indexed: 01/16/2023]
Abstract
SMIfp (SMILES fingerprint) is defined here as a scalar fingerprint describing organic molecules by counting the occurrences of 34 different symbols in their SMILES strings, which creates a 34-dimensional chemical space. Ligand-based virtual screening using the city-block distance CBD(SMIfp) as similarity measure provides good AUC values and enrichment factors for recovering series of actives from the directory of useful decoys (DUD-E) and from ZINC. DrugBank, ChEMBL, ZINC, PubChem, GDB-11, GDB-13, and GDB-17 can be searched by CBD(SMIfp) using an online SMIfp-browser at www.gdb.unibe.ch. Visualization of the SMIfp chemical space was performed by principal component analysis and color-coded maps of the (PC1, PC2)-planes, with interactive access to the molecules enabled by the Java application SMIfp-MAPPLET available from www.gdb.unibe.ch. These maps spread molecules according to their fraction of aromatic atoms, size and polarity. SMIfp provides a new and relevant entry to explore the small molecule chemical space.
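The idea of a symbol-count fingerprint compared by city-block distance can be illustrated with a minimal sketch. The abstract does not list the 34 symbols, so the short `SYMBOLS` tuple below is a hypothetical stand-in chosen for illustration and does not reproduce SMIfp. The greedy tokenizer is likewise a simplification that a real SMILES parser would replace.

```python
from typing import List, Sequence

# Hypothetical symbol set for illustration only; NOT the 34 symbols of SMIfp.
SYMBOLS = ("Cl", "Br", "C", "N", "O", "S", "c", "n", "=", "#", "(", "[")


def symbol_count_fingerprint(smiles: str, symbols: Sequence[str] = SYMBOLS) -> List[int]:
    """Count occurrences of each symbol in a SMILES string.

    Two-character symbols (e.g. "Cl") are matched before one-character
    symbols so that "Cl" is not counted as a carbon. Characters outside
    the symbol set (ring-closure digits, ")" and so on) are skipped.
    """
    counts = {s: 0 for s in symbols}
    i = 0
    while i < len(smiles):
        two = smiles[i:i + 2]
        if len(two) == 2 and two in counts:
            counts[two] += 1
            i += 2
            continue
        one = smiles[i]
        if one in counts:
            counts[one] += 1
        i += 1
    return [counts[s] for s in symbols]


def cbd(a: Sequence[int], b: Sequence[int]) -> int:
    """City-block distance between two count vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    ethanol = symbol_count_fingerprint("CCO")
    chlorobenzene = symbol_count_fingerprint("Clc1ccccc1")
    print(ethanol, chlorobenzene, cbd(ethanol, chlorobenzene))
```

Because each molecule becomes a short vector of integers, the distance costs one subtraction and one addition per dimension.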
Affiliation(s)
- Julian Schwartz
- Department of Chemistry and Biochemistry, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
44
Villoutreix BO, Lagorce D, Labbé CM, Sperandio O, Miteva MA. One hundred thousand mouse clicks down the road: selected online resources supporting drug discovery collected over a decade. Drug Discov Today 2013; 18:1081-9. [PMID: 23831439 DOI: 10.1016/j.drudis.2013.06.013] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.0] [Received: 05/03/2013] [Revised: 06/18/2013] [Accepted: 06/26/2013] [Indexed: 12/17/2022]
Abstract
Online resources enabling and supporting drug discovery have blossomed during the past ten years. However, drug hunters commonly find themselves overwhelmed by the proliferation of these computer-based resources. Ten years ago, we, the authors of this review, felt that a comprehensive list of in silico resources relating to drug discovery was needed, especially because the internet provides a wealth of inspiring tools that, if fully exploited, could greatly assist the process. We present here a compilation of online tools and databases collected over the past decade. The tools were essentially found through literature and internet searches and, currently, our list contains over 1500 URLs. We also briefly highlight some recently reported services and comment on ongoing and future efforts in the field.
Affiliation(s)
- Bruno O Villoutreix
- Université Paris Diderot, Sorbonne Paris Cité, Inserm UMR-S 973, Molécules Thérapeutiques In Silico, 39 rue Helene Brion, 75013 Paris, France.