1
Orsi M, Probst D, Schwaller P, Reymond JL. Alchemical analysis of FDA approved drugs. Digital Discovery 2023; 2:1289-1296. [PMID: 38013905 PMCID: PMC10561545 DOI: 10.1039/d3dd00039g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Accepted: 08/29/2023] [Indexed: 11/29/2023]
Abstract
Chemical space maps help visualize similarities within molecular sets. However, there are many different molecular similarity measures resulting in a confusing number of possible comparisons. To overcome this limitation, we exploit the fact that tools designed for reaction informatics also work for alchemical processes that do not obey Lavoisier's principle, such as the transmutation of lead into gold. We start by using the differential reaction fingerprint (DRFP) to create tree-maps (TMAPs) representing the chemical space of pairs of drugs selected as being similar according to various molecular fingerprints. We then use the Transformer-based RXNMapper model to understand structural relationships between drugs, and its confidence score to distinguish between pairs related by chemically feasible transformations and pairs related by alchemical transmutations. This analysis reveals a diversity of structural similarity relationships that are otherwise difficult to analyze simultaneously. We exemplify this approach by visualizing FDA-approved drugs, EGFR inhibitors, and polymyxin B analogs.
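The key idea in this abstract is to treat any pair of drugs as a "reaction" and encode it with the differential reaction fingerprint (DRFP). As a rough sketch of that idea only: collect substructure identifiers on each side, keep the ones present on exactly one side (the symmetric difference), and hash-fold them into a fixed-length bit vector. The fragment strings below are hypothetical placeholders (a real pipeline would extract circular substructures with a cheminformatics toolkit), and the SHA-1 folding and 2048-bit length are illustrative assumptions, not the settings or API of the published drfp package.

```python
import hashlib


def fold_token(token: str, n_bits: int) -> int:
    """Map a substructure identifier to a bit index with a stable (unsalted) hash."""
    digest = hashlib.sha1(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little") % n_bits


def differential_fingerprint(left: set[str], right: set[str], n_bits: int = 2048) -> list[int]:
    """Bit vector of substructures found on only one side of a drug pair."""
    bits = [0] * n_bits
    for token in left ^ right:  # symmetric difference: shared fragments cancel out
        bits[fold_token(token, n_bits)] = 1
    return bits


# Hypothetical fragment sets for two related drugs (placeholders, not real extractions)
drug_a = {"c1ccccc1", "C(=O)O", "CN"}
drug_b = {"c1ccccc1", "C(=O)N", "CN"}
fp = differential_fingerprint(drug_a, drug_b)
print(sum(fp))  # at most 2 bits set: one per unshared fragment, fewer if they collide
```

Because shared fragments cancel, in this sketch two near-identical drugs yield an almost empty vector, so the fingerprint describes the difference between the pair rather than either molecule on its own.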
Affiliation(s)
- Markus Orsi
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
- Daniel Probst
- École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Jean-Louis Reymond
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
2
Image-based cell subpopulation identification through automated cell tracking, principal component analysis, and partitioning around medoids clustering. Med Biol Eng Comput 2021; 59:1851-1864. [PMID: 34331635 DOI: 10.1007/s11517-021-02418-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/24/2020] [Accepted: 07/14/2021] [Indexed: 01/23/2023]
Abstract
In vitro cell culture model systems often employ monocultures, even though cells generally exist in a diverse, heterogeneous microenvironment in vivo. In response, heterogeneous cultures are increasingly being used to study how cell phenotypes interact. However, accurately identifying and characterizing distinct phenotypic subpopulations within heterogeneous systems remains a major challenge. Here, we present a computational, image analysis-based approach (automated contour-based cell tracking for feature identification, principal component analysis for feature reduction, and partitioning around medoids for subpopulation characterization) to non-destructively and non-invasively identify functionally distinct cell phenotypic subpopulations from live-cell microscopy image data. Using a heterogeneous model system of endothelial and smooth muscle cells, we demonstrate that this approach can be applied to both mono- and co-culture nuclear morphometric and motility data to discern cell phenotypic subpopulations. Morphometric clustering identified minimal differences between mono- and co-culture, while motility clustering revealed that a portion of endothelial cells and smooth muscle cells adopt increased motility rates in co-culture that are not observed in monoculture. We anticipate that this non-destructive, non-invasive imaging approach can be applied broadly to heterogeneous cell culture model systems to advance understanding of how heterogeneity alters cell phenotype.
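The feature-reduction and clustering stages described above can be sketched in a few lines, assuming per-cell features (for example nuclear morphometrics or motility measures derived from tracking) are already collected in a NumPy array. Tracking itself is not shown, and the Euclidean distance and the greedy BUILD plus best-improvement SWAP loop are generic textbook choices, not necessarily the authors' implementation.

```python
import numpy as np


def pca(X: np.ndarray, n_components: int) -> np.ndarray:
    """Project centered features onto the leading principal axes (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def pam(X: np.ndarray, k: int, max_iter: int = 100):
    """Partitioning around medoids: greedy BUILD, then best-improvement SWAP."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    n = len(X)

    def cost(meds):
        # Total distance from every cell to its closest medoid
        return D[:, meds].min(axis=1).sum()

    # BUILD: start from the most central point, then add the point that lowers cost most
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        nearest = D[:, medoids].min(axis=1)
        gains = [-1.0 if c in medoids else np.maximum(nearest - D[:, c], 0).sum() for c in range(n)]
        medoids.append(int(np.argmax(gains)))

    # SWAP: replace a medoid by a non-medoid whenever that lowers the total cost
    best = cost(medoids)
    for _ in range(max_iter):
        best_trial = None
        for i in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = h
                c = cost(trial)
                if c < best - 1e-12:
                    best, best_trial = c, trial
        if best_trial is None:
            break
        medoids = best_trial

    labels = np.argmin(D[:, medoids], axis=1)
    return np.array(medoids), labels


# Two synthetic "subpopulations" of 10 cells, 5 features each
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(0, 0.1, (10, 5)), rng.normal(5, 0.1, (10, 5))])
medoids, labels = pam(pca(features, 2), k=2)
```

Unlike k-means, PAM picks actual cells as cluster centers, which keeps each subpopulation anchored to a real, inspectable example.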
3
Miranda-Quintana RA, Rácz A, Bajusz D, Héberger K. Extended similarity indices: the benefits of comparing more than two objects simultaneously. Part 2: speed, consistency, diversity selection. J Cheminform 2021; 13:33. [PMID: 33892799 PMCID: PMC8067665 DOI: 10.1186/s13321-021-00504-4] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 11/20/2020] [Accepted: 03/12/2021] [Indexed: 11/10/2022]
Abstract
Despite being a central concept in cheminformatics, molecular similarity has so far been limited to the simultaneous comparison of only two molecules at a time and to the use of a single index, generally the Tanimoto coefficient. In a recent contribution we not only introduced a complete mathematical framework for extended similarity calculations (i.e., comparisons of more than two molecules at a time) but also defined a series of novel indices. Part 1 is a detailed analysis of the effects of various parameters on the similarity values calculated by the extended formulas. Their features were revealed by sum of ranking differences and ANOVA. Here, in addition to characterizing several important aspects of the newly introduced similarity metrics, we highlight their applicability and utility in real-life scenarios using datasets with popular molecular fingerprints. Remarkably, for large datasets, the use of extended similarity measures provides an unprecedented speed-up over “traditional” pairwise similarity matrix calculations. We also provide illustrative examples of a more direct algorithm based on the extended Tanimoto similarity to select diverse compound sets, resulting in much higher levels of diversity than traditional approaches. We discuss the inner and outer consistency of our indices, which are key in practical applications, showing whether the n-ary and binary indices rank the data in the same way. We demonstrate the use of the new n-ary similarity metrics on t-distributed stochastic neighbor embedding (t-SNE) plots of datasets of varying diversity, or corresponding to ligands of different pharmaceutical targets, which show that our indices provide a better measure of set compactness than standard binary measures. We also present a conceptual example of the applicability of our indices in agglomerative hierarchical algorithms. The Python code for calculating the extended similarity metrics is freely available at: https://github.com/ramirandaq/MultipleComparisons
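The speed-up claim rests on the fact that an n-ary index only needs per-bit column sums, which take one pass over N fingerprints instead of N(N-1)/2 pairwise comparisons. The sketch below illustrates that principle with a simplified, hypothetical "column-sum Tanimoto" (pairs sharing an on-bit divided by pairs sharing an on-bit plus pairs disagreeing on it). It reduces exactly to the binary Tanimoto coefficient for two fingerprints, but it is not one of the paper's extended indices, which additionally use coincidence thresholds and weighting; use the authors' repository for those.

```python
from itertools import combinations

import numpy as np


def mean_pairwise_tanimoto(fps: np.ndarray) -> float:
    """O(N^2) baseline: average Tanimoto over all pairs of binary fingerprints."""
    sims = []
    for i, j in combinations(range(len(fps)), 2):
        inter = np.logical_and(fps[i], fps[j]).sum()
        union = np.logical_or(fps[i], fps[j]).sum()
        sims.append(inter / union if union else 1.0)
    return float(np.mean(sims))


def column_sum_tanimoto(fps: np.ndarray) -> float:
    """O(N) simplified n-ary score computed from per-bit 'on' counts k_j only."""
    n = len(fps)
    k = fps.sum(axis=0).astype(float)
    shared_on = (k * (k - 1) / 2).sum()  # pairs in which both members have bit j on
    mismatched = (k * (n - k)).sum()     # pairs in which exactly one member has bit j on
    denom = shared_on + mismatched
    return float(shared_on / denom) if denom else 1.0


pair = np.array([[1, 1, 0, 1, 0], [1, 0, 1, 1, 0]])
print(column_sum_tanimoto(pair), mean_pairwise_tanimoto(pair))  # both 0.5 for two molecules
```

For more than two molecules the column-sum score is a ratio of pooled pair counts, not the mean of pairwise Tanimoto values, so the two numbers generally differ; the point here is only the linear scaling.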
Affiliation(s)
- Anita Rácz
- Plasma Chemistry Research Group, Research Centre for Natural Sciences, Magyar tudósok krt. 2, 1117, Budapest, Hungary
- Dávid Bajusz
- Medicinal Chemistry Research Group, Research Centre for Natural Sciences, Magyar tudósok krt. 2, 1117, Budapest, Hungary
- Károly Héberger
- Plasma Chemistry Research Group, Research Centre for Natural Sciences, Magyar tudósok krt. 2, 1117, Budapest, Hungary
4
Ahamed TKS, Muraleedharan K. A cheminformatic study on chemical space characterization and diversity analysis of 5-LOX inhibitors. J Mol Graph Model 2020; 100:107699. [PMID: 32799052 DOI: 10.1016/j.jmgm.2020.107699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2020] [Revised: 06/19/2020] [Accepted: 07/10/2020] [Indexed: 10/23/2022]
Abstract
Blocking 5-lipoxygenase (5-LOX)-catalyzed leukotriene biosynthesis has been recognized for the past few decades as a promising therapeutic strategy for acute inflammatory, allergic, and respiratory diseases. Because of the toxicity of the FDA-approved 5-LOX inhibitor zileuton, the scientific community has sought novel 5-LOX inhibitors. As a result, a substantial body of structure-activity information on 5-LOX inhibitors has been released and stored in public databases. In this study, we aimed at a comprehensive cheminformatic characterization of the diversity and complexity of the chemical space of 5-LOX inhibitors and of inhibitors of its activating protein, FLAP, by comparing them with approved drug space and a virtual LOX library. A visual representation of the property space indicates that some compounds in the 5-LOX inhibitor space extend beyond traditional medicinal chemistry space. The structural diversity of the databases was computed using complementary approaches, including physicochemical property (PCP) descriptors, molecular fingerprints, and molecular scaffolds. With the apparent exception of approved drugs, the 5-LOX dataset is more diverse than the FLAP and virtual LOX library sets. This study identified underlying patterns in chemical and pharmacological property space that were decisive for the discovery and development of 5-LOX inhibitors.
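Scaffold-based diversity, one of the complementary approaches listed above, is often summarized by how molecules distribute over their scaffolds. A minimal sketch, assuming scaffold SMILES (for example Bemis-Murcko frameworks) have already been extracted with a cheminformatics toolkit: the function name and the scaffold strings are hypothetical, and the scaled Shannon entropy is computed here over all scaffolds, whereas published analyses often restrict it to the n most populated ones.

```python
import math
from collections import Counter


def scaffold_diversity(scaffolds: list[str]) -> dict[str, float]:
    """Simple scaffold diversity summary for one compound set."""
    counts = Counter(scaffolds)
    n = len(scaffolds)
    n_scaf = len(counts)
    # Shannon entropy (bits) of the molecule-to-scaffold distribution
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    # Scaled to [0, 1]: 1 means molecules are spread evenly over the observed scaffolds
    scaled = entropy / math.log2(n_scaf) if n_scaf > 1 else 0.0
    return {
        "n_scaffolds": float(n_scaf),
        "scaffolds_per_molecule": n_scaf / n,
        "shannon_entropy": entropy,
        "scaled_entropy": scaled,
    }


# Hypothetical pre-computed scaffolds for a four-compound set
summary = scaffold_diversity(["c1ccccc1", "c1ccccc1", "c1ccncc1", "C1CCCCC1"])
```

Comparing these numbers across datasets (for example 5-LOX inhibitors versus a virtual library) gives one scaffold-level view of diversity to set beside fingerprint-based measures.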
Affiliation(s)
- K Muraleedharan
- Department of Chemistry, University of Calicut, Malappuram, 673635, India
5
Kaspi O, Yosipof A, Senderowitz H. Visualization of Solar Cell Library Space by Dimensionality Reduction Methods. J Chem Inf Model 2018; 58:2428-2439. [PMID: 30485100 DOI: 10.1021/acs.jcim.8b00552] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 01/07/2023]
Abstract
Visualizing high-dimensional data by projecting them into a two- or three-dimensional space is a popular approach in many scientific fields, including computer-aided drug design and cheminformatics. In contrast, dimensionality reduction techniques have been far less explored for materials informatics. Nevertheless, just as they are useful for analyzing the space of, e.g., drug-like molecules, such techniques could provide useful insights into materials space, including an intuitive grasp of the overall distribution of samples, the identification of interesting trends such as the formation of materials clusters and the presence of activity cliffs and outliers, and rational navigation through this space in the search for new materials. Here we present the first application of four dimensionality reduction techniques, namely, principal component analysis (PCA), kernel PCA, Isomap, and diffusion map, to visualize and analyze a part of the materials space populated by solar cells made of metal oxides. Solar cells in general, and metal-oxide-based solar cells in particular, hold the promise of contributing to the world's search for clean and affordable energy resources. With the exception of PCA, these methods have seldom been used to visualize chemistry space and almost never been used to visualize materials space. For this purpose, we integrated five metal-oxide-based solar cell libraries into a uniform database and subjected it to dimensionality reduction by all four methods, comparing their performances using various criteria such as maintaining the local environment of samples and the clustering structure in the low-dimensional space. We also looked at the number of outliers produced by each method and analyzed common outliers.
We found that PCA performs best in terms of the ability to correctly maintain the local environment of samples, whereas Isomap does the best job of assigning class membership on the basis of the identities of nearest neighbors (i.e., it is the best classifier). We also found that many of the outliers identified by all of the methods could be rationalized. We suggest that the methods used in this work could be extended to study other types of solar cells, thereby setting the ground for further analysis of the photovoltaic (PV) space as well as other regions of materials space.
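One way to quantify how well a projection "maintains the local environment of samples" is to check how many of each sample's k nearest neighbors in the original descriptor space are still among its k nearest neighbors after projection. The sketch below does this for PCA only; `neighborhood_preservation` is a simple k-NN overlap score chosen for illustration, the paper's exact criterion may be defined differently, and kernel PCA, Isomap, and diffusion maps are not shown.

```python
import numpy as np


def pca_project(X: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Linear projection onto the top principal components."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def knn_indices(X: np.ndarray, k: int) -> np.ndarray:
    """Indices of each sample's k nearest neighbors (Euclidean)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)  # a sample is not its own neighbor
    return np.argsort(D, axis=1)[:, :k]


def neighborhood_preservation(X_high: np.ndarray, X_low: np.ndarray, k: int = 5) -> float:
    """Mean fraction of high-dimensional k-NN that remain k-NN after projection."""
    hi, lo = knn_indices(X_high, k), knn_indices(X_low, k)
    return float(np.mean([len(set(a) & set(b)) / k for a, b in zip(hi, lo)]))


# Synthetic stand-in for descriptor vectors: 2D structure hidden in 6 dimensions
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
basis, _ = np.linalg.qr(rng.normal(size=(6, 2)))  # 6x2 matrix with orthonormal columns
X = latent @ basis.T
score = neighborhood_preservation(X, pca_project(X, 2))
```

Here the data are exactly two-dimensional, so PCA preserves neighborhoods almost perfectly; on real libraries the score drops as more variance lies outside the retained components.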
Affiliation(s)
- Omer Kaspi
- Department of Chemistry, Bar-Ilan University, Ramat-Gan 5290002, Israel
- Abraham Yosipof
- Department of Information Systems, College of Law & Business, Ramat-Gan, P.O. Box 852, Bnei Brak 5110801, Israel
- Hanoch Senderowitz
- Department of Chemistry, Bar-Ilan University, Ramat-Gan 5290002, Israel
6
Yosipof A, Guedes RC, García-Sosa AT. Data Mining and Machine Learning Models for Predicting Drug Likeness and Their Disease or Organ Category. Front Chem 2018; 6:162. [PMID: 29868564 PMCID: PMC5954128 DOI: 10.3389/fchem.2018.00162] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 02/28/2018] [Accepted: 04/20/2018] [Indexed: 12/11/2022]
Abstract
Data mining approaches can uncover underlying patterns in chemical and pharmacological property space that are decisive for drug discovery and development. Two of the most common approaches are visualization and machine learning methods. Visualization methods use dimensionality reduction techniques to reduce multidimensional data to 2D or 3D representations with minimal loss of information. Machine learning attempts to find correlations between specific activities or classifications for a set of compounds and their features by means of recurring mathematical models. Both approaches exploit the varied and deep relationships that can exist between compound features: they can classify compounds on the basis of such features or, in the case of visualization methods, uncover underlying patterns in the feature space. Drug-likeness has been studied from several viewpoints, but here we provide the first implementation in chemoinformatics of the t-Distributed Stochastic Neighbor Embedding (t-SNE) method for visualizing and representing chemical space, and the use of different machine learning methods, separately and together, to form a new ensemble learning method called AL Boost. The models obtained from AL Boost synergistically combine decision tree, random forest (RF), support vector machine (SVM), artificial neural network (ANN), k-nearest neighbors (kNN), and logistic regression models. In this work, we show that together they form a predictive model that not only improves predictive power but also decreases bias. This resulted in a correct classification rate of over 0.81, as well as higher sensitivity and specificity for the models. In addition, good separation and good models were also achieved for disease categories such as antineoplastic compounds and nervous system diseases, among others. Such models can be used to guide decisions on the feature landscape of compounds and their likeness to either drugs or other characteristics, such as specific or multiple disease categories or organs of action of a molecule.
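To make the reported metrics concrete, the sketch below combines several classifiers' 0/1 calls by hard majority vote and scores the result with sensitivity, specificity, and a correct classification rate (CCR). Two assumptions to flag: majority voting is just one simple way to combine models (the abstract does not spell out how AL Boost combines its members, so this is not AL Boost), and CCR is taken here as the mean of sensitivity and specificity, a common QSAR definition.

```python
import numpy as np


def majority_vote(predictions: np.ndarray) -> np.ndarray:
    """predictions: (n_models, n_samples) array of 0/1 class calls."""
    votes = predictions.sum(axis=0)
    # A tie (possible with an even number of models) is resolved toward class 1
    return (2 * votes >= predictions.shape[0]).astype(int)


def classification_rates(y_true, y_pred) -> dict[str, float]:
    """Sensitivity, specificity, and CCR for binary labels (1 = drug-like)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": float(sensitivity),
        "specificity": float(specificity),
        "ccr": float((sensitivity + specificity) / 2),
    }


# Three hypothetical models voting on four compounds
calls = np.array([[1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 1, 0]])
ensemble = majority_vote(calls)
rates = classification_rates([1, 0, 1, 1], ensemble)
```

Averaging sensitivity and specificity keeps CCR from rewarding a model that simply predicts the majority class of an imbalanced dataset.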
Affiliation(s)
- Abraham Yosipof
- Department of Information Systems and Department of Business Administration, College of Law & Business, Ramat-Gan, Israel
- Rita C Guedes
- Department of Medicinal Chemistry, Faculty of Pharmacy, Research Institute for Medicines (iMed.ULisboa), Universidade de Lisboa, Lisbon, Portugal
- Alfonso T García-Sosa
- Department of Molecular Technology, Institute of Chemistry, University of Tartu, Tartu, Estonia
7
8
Kontijevskis A. Mapping of Drug-like Chemical Universe with Reduced Complexity Molecular Frameworks. J Chem Inf Model 2017; 57:680-699. [DOI: 10.1021/acs.jcim.7b00006] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 01/03/2023]
9
Miteva MA, Villoutreix BO. Computational Biology and Chemistry in MTi: Emphasis on the Prediction of Some ADMET Properties. Mol Inform 2017; 36. [DOI: 10.1002/minf.201700008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/17/2017] [Accepted: 02/03/2017] [Indexed: 12/21/2022]
Affiliation(s)
- Maria A. Miteva
- Université Paris Diderot, Sorbonne Paris Cité, Molécules Thérapeutiques In Silico, Inserm UMR-S 973; 35 rue Helene Brion 75013 Paris France
- INSERM, U973; F-75205 Paris France
- Bruno O. Villoutreix
- Université Paris Diderot, Sorbonne Paris Cité, Molécules Thérapeutiques In Silico, Inserm UMR-S 973; 35 rue Helene Brion 75013 Paris France
- INSERM, U973; F-75205 Paris France
10
Ekins S, Perryman AL, Clark AM, Reynolds RC, Freundlich JS. Machine Learning Model Analysis and Data Visualization with Small Molecules Tested in a Mouse Model of Mycobacterium tuberculosis Infection (2014-2015). J Chem Inf Model 2016; 56:1332-43. [PMID: 27335215 PMCID: PMC4962118 DOI: 10.1021/acs.jcim.6b00004] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Indexed: 01/01/2023]
Abstract
The renewed urgency to develop new treatments for Mycobacterium tuberculosis (Mtb) infection has resulted in large-scale phenotypic screening and thousands of new active compounds in vitro. The next challenge is to identify candidates to pursue in a mouse in vivo efficacy model as a step to predicting clinical efficacy. We previously analyzed over 70 years of this mouse in vivo efficacy data, which we used to generate and validate machine learning models. Curation of 60 additional small molecules with in vivo data published in 2014 and 2015 was undertaken to further test these models. This represents a much larger test set than for the previous models. Several computational approaches have now been applied to analyze these molecules and compare their molecular properties beyond those attempted previously. Our previous machine learning models have been updated, and a novel aspect has been added in the form of mouse liver microsomal half-life (MLM t1/2) and in vitro-based Mtb models incorporating cytotoxicity data that were used to predict in vivo activity for comparison. Our best Mtb in vivo models possess fivefold cross-validation ROC values > 0.7, sensitivity > 80%, and concordance > 60%, while the best specificity value is > 40%. Use of an MLM t1/2 Bayesian model affords comparable results for scoring the 60 compounds tested. Combining MLM stability and in vitro Mtb models in a novel consensus workflow in the best cases has a positive predictive value (hit rate) > 77%. Our results indicate that Bayesian models constructed with literature in vivo Mtb data generated by different laboratories in various mouse models can have predictive value and may be used alongside MLM t1/2 and in vitro-based Mtb models to assist in selecting antitubercular compounds with desirable in vivo efficacy. We demonstrate for the first time that consensus models of any kind can be used to predict in vivo activity for Mtb. In addition, we describe a new clustering method for data visualization and apply this to the in vivo training and test data, ultimately making the method accessible in a mobile app.
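The ROC values quoted above summarize how well a model's scores rank actives above inactives. A small sketch of the usual pairwise (Mann-Whitney) formulation, assuming each compound has a model score (for example a Bayesian score) and a 0/1 in vivo activity label; the data below are made up, and the O(P·N) pairwise loop is fine for test sets of a few dozen compounds such as the 60 described here.

```python
import numpy as np


def roc_auc(scores, labels) -> float:
    """Probability that a random active scores higher than a random inactive (ties count half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


# Hypothetical scores for two actives and two inactives
auc = roc_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0])  # 3 of 4 active/inactive pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking, so the "> 0.7" threshold in the abstract indicates a moderate but real ability to prioritize actives.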
Affiliation(s)
- Sean Ekins
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States; Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina 27526, United States
- Alexander L Perryman
- Department of Pharmacology, Physiology and Neuroscience, Rutgers University-New Jersey Medical School, Newark, New Jersey 07103, United States
- Alex M Clark
- Molecular Materials Informatics, Inc., 1900 St. Jacques #302, Montreal, Quebec H3J 2S1, Canada
- Robert C Reynolds
- Division of Hematology and Oncology, Department of Medicine, and Department of Chemistry, College of Arts and Sciences, University of Alabama at Birmingham, 1530 Third Avenue South, Birmingham, Alabama 35294-1240, United States
- Joel S Freundlich
- Department of Pharmacology, Physiology and Neuroscience, Rutgers University-New Jersey Medical School, Newark, New Jersey 07103, United States; Division of Infectious Diseases, Department of Medicine, and the Ruy V. Lourenço Center for the Study of Emerging and Re-emerging Pathogens, Rutgers University-New Jersey Medical School, Newark, New Jersey 07103, United States
11
Awale M, Reymond JL. Web-based 3D-visualization of the DrugBank chemical space. J Cheminform 2016; 8:25. [PMID: 27148409 PMCID: PMC4855437 DOI: 10.1186/s13321-016-0138-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 02/18/2016] [Accepted: 04/27/2016] [Indexed: 12/14/2022]
Abstract
Background Similarly to the periodic table for elements, chemical space offers an organizing principle for representing the diversity of organic molecules, usually in the form of multi-dimensional property spaces that are subjected to dimensionality reduction methods to obtain 3D-spaces or 2D-maps suitable for visual inspection. Unfortunately, tools to look at chemical space on the internet are currently very limited. Results Herein we present webDrugCS, a web application freely available at www.gdb.unibe.ch to visualize DrugBank (www.drugbank.ca, containing over 6000 investigational and approved drugs) in five different property spaces. WebDrugCS displays 3D-clouds of color-coded grid points representing molecules, whose structural formula is displayed on mouse over with an option to link to the corresponding molecule page at the DrugBank website. The 3D-clouds are obtained by principal component analysis of high dimensional property spaces describing constitution and topology (42D molecular quantum numbers MQN), structural features (34D SMILES fingerprint SMIfp), molecular shape (20D atom pair fingerprint APfp), pharmacophores (55D atom category extended atom pair fingerprint Xfp) and substructures (1024D binary substructure fingerprint Sfp). User defined molecules can be uploaded as SMILES lists and displayed together with DrugBank. In contrast to 2D-maps where many compounds fold onto each other, these 3D-spaces have a comparable resolution to their parent high-dimensional chemical space. Conclusion To the best of our knowledge webDrugCS is the first publicly available web tool for interactive visualization and exploration of the DrugBank chemical space in 3D. 
WebDrugCS works on computers, tablets and phones, and facilitates the visual exploration of DrugBank to rapidly learn about the structural diversity of small-molecule drugs.
[Graphical abstract: webDrugCS visualization of DrugBank projected in 3D MQN space, color-coded by ring count, with the pointer showing the drug 5-fluorouracil.]
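The step from a high-dimensional property space to a 3D cloud of color-coded grid points can be sketched as: project descriptors onto three principal components, bin the coordinates into a regular grid, and color each occupied grid point by a property averaged over its molecules. This is an illustrative reconstruction under stated assumptions, not webDrugCS code; the grid resolution, min-max binning, and use of the mean property value for color are all hypothetical choices, and the input here is random integer data standing in for 42D MQN vectors.

```python
from collections import defaultdict

import numpy as np


def grid_cloud(X: np.ndarray, color_values, n_bins: int = 50) -> dict:
    """Bin molecules into a 3D grid in PCA space; return {cell: (count, mean color value)}."""
    # PCA: center the descriptors and keep the first three components
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    P = Xc @ vt[:3].T
    # Rescale each axis to integer grid coordinates 0 .. n_bins - 1
    lo, hi = P.min(axis=0), P.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    cells = np.minimum(((P - lo) / span * n_bins).astype(int), n_bins - 1)
    # Group molecules sharing a grid point and average the color-coding property
    colors = np.asarray(color_values, dtype=float)
    members = defaultdict(list)
    for idx, cell in enumerate(cells):
        members[tuple(int(v) for v in cell)].append(idx)
    return {cell: (len(ids), float(colors[ids].mean())) for cell, ids in members.items()}


# Random stand-in for 42D MQN count vectors and a ring-count property
rng = np.random.default_rng(2)
mqn_like = rng.integers(0, 20, size=(200, 42))
ring_counts = rng.integers(0, 6, size=200)
cloud = grid_cloud(mqn_like, ring_counts, n_bins=10)
```

Each dictionary entry corresponds to one displayable grid point; its count could set point size and its mean property value the color.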
Affiliation(s)
- Mahendra Awale
- Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR TransCure, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
- Jean-Louis Reymond
- Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR TransCure, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
12
Mathea M, Klingspohn W, Baumann K. Chemoinformatic Classification Methods and their Applicability Domain. Mol Inform 2016; 35:160-80. [PMID: 27492083 DOI: 10.1002/minf.201501019] [Citation(s) in RCA: 87] [Impact Index Per Article: 9.7] [Received: 11/12/2015] [Accepted: 01/20/2016] [Indexed: 11/08/2022]
Abstract
Classification rules are often used in chemoinformatics to predict categorical properties of drug candidates related to bioactivity from explanatory variables, which encode the respective molecular structures (i.e. molecular descriptors). To avoid predictions with an unduly large error probability, the domain the classifier is applied to should be restricted to the domain covered by the training set objects. This latter domain is commonly referred to as applicability domain in chemoinformatics. Conceptually, the applicability domain defines the region in space where the "normal" objects are located. Defining the border of the applicability domain may then be viewed as detecting anomalous or novel objects or as detecting outliers. Currently two different types of measures are in use. The first one defines the applicability domain solely in terms of the molecular descriptor space, which is referred to as novelty detection. The second type defines the applicability domain in terms of the expected reliability of the predictions which is referred to as confidence estimation. Both types are systematically differentiated here and the most popular measures are reviewed. It will be shown that all common chemoinformatic classifiers have built-in confidence scores. Since confidence estimation uses information of the class labels for computing the confidence scores, it is expected to be more efficient in reducing the error rate than novelty detection, which solely uses the information of the explanatory variables.
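The two families of applicability-domain measures contrasted above can be illustrated side by side. Novelty detection below uses only descriptors (mean distance to the k nearest training compounds, compared with a threshold from leave-one-out training distances); confidence estimation also uses class labels (the fraction of the k nearest neighbors agreeing with the majority class). Both are common, simple choices picked for illustration; k, the 95% quantile, and the function names are assumptions, not the review's recommended settings.

```python
import numpy as np


def knn_novelty_scores(X_train, X_query, k=3):
    """Descriptor-only score: mean distance to the k nearest training compounds."""
    D = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=-1)
    return np.sort(D, axis=1)[:, :k].mean(axis=1)


def novelty_threshold(X_train, k=3, quantile=0.95):
    """Leave-one-out scores on the training set define what counts as 'normal'."""
    D = np.linalg.norm(X_train[:, None, :] - X_train[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    loo = np.sort(D, axis=1)[:, :k].mean(axis=1)
    return float(np.quantile(loo, quantile))


def knn_confidence(X_train, y_train, X_query, k=3):
    """Label-aware score: share of the k nearest neighbors voting for the majority class."""
    D = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=-1)
    nn = np.argsort(D, axis=1)[:, :k]
    frac_pos = y_train[nn].mean(axis=1)
    return np.maximum(frac_pos, 1 - frac_pos)


# Two labeled training clusters; one query inside class 0, one far from all data
rng = np.random.default_rng(3)
X_train = np.vstack([rng.normal(0, 0.2, (20, 2)), rng.normal(3, 0.2, (20, 2))])
y_train = np.array([0] * 20 + [1] * 20)
queries = np.array([[0.0, 0.0], [20.0, 20.0]])
inside_domain = knn_novelty_scores(X_train, queries) <= novelty_threshold(X_train)
```

A query can pass the novelty check yet still receive low confidence (for example one lying between the two classes), which is why label-aware confidence scores can reduce error rates more effectively than novelty detection.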
Affiliation(s)
- Miriam Mathea
- Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106 Braunschweig, Germany
- Waldemar Klingspohn
- Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106 Braunschweig, Germany
- Knut Baumann
- Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106 Braunschweig, Germany
13
Awale M, Reymond JL. Similarity Mapplet: Interactive Visualization of the Directory of Useful Decoys and ChEMBL in High Dimensional Chemical Spaces. J Chem Inf Model 2015. [PMID: 26207526 DOI: 10.1021/acs.jcim.5b00182] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 12/31/2022]
Abstract
An Internet portal accessible at www.gdb.unibe.ch has been set up to automatically generate color-coded similarity maps of the ChEMBL database in relation to up to two sets of active compounds taken from the enhanced Directory of Useful Decoys (eDUD), a random set of molecules, or up to two sets of user-defined reference molecules. These maps visualize the relationships between the selected compounds and ChEMBL in six different high-dimensional chemical spaces, namely MQN (42-D molecular quantum numbers), SMIfp (34-D SMILES fingerprint), APfp (20-D shape fingerprint), Xfp (55-D pharmacophore fingerprint), Sfp (1024-bit substructure fingerprint), and ECfp4 (1024-bit extended connectivity fingerprint). The maps are supplied in the form of Java-based desktop applications called "similarity mapplets" that allow interactive content browsing and are linked to a "Multifingerprint Browser for ChEMBL" (also accessible directly at www.gdb.unibe.ch) to perform nearest neighbor searches. One can obtain six similarity mapplets of ChEMBL relative to random reference compounds, 606 similarity mapplets relative to single eDUD active sets, 30,300 similarity mapplets relative to pairs of eDUD active sets, and any number of similarity mapplets relative to user-defined reference sets to help visualize the structural diversity of compound series in drug optimization projects and their relationship to other known bioactive compounds.
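A nearest neighbor search in one of these fixed-length descriptor spaces reduces to ranking library vectors by distance to the query. The sketch assumes city-block (Manhattan) distance, a natural fit for integer count descriptors such as MQN, but treat that choice, the brute-force linear scan, and the function name as assumptions rather than a description of how the Multifingerprint Browser is actually implemented.

```python
import numpy as np


def cityblock_neighbors(query, library, n=5):
    """Return indices and distances of the n library vectors closest to the query."""
    query = np.asarray(query)
    library = np.asarray(library)
    # City-block distance: sum of absolute per-dimension differences
    d = np.abs(library - query).sum(axis=1)
    order = np.argsort(d, kind="stable")[:n]  # stable sort keeps library order on ties
    return order, d[order]


# Toy 2D "descriptor" library and query
library = [[0, 0], [1, 1], [5, 5], [2, 0]]
idx, dist = cityblock_neighbors([0, 1], library, n=3)
```

For millions of ChEMBL compounds a production service would likely need indexing or precomputation rather than a full scan per query.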
Affiliation(s)
- Mahendra Awale
- Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
- Jean-Louis Reymond
- Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
14
Fey N. Lost in chemical space? Maps to support organometallic catalysis. Chem Cent J 2015; 9:38. [PMID: 26113874 PMCID: PMC4480443 DOI: 10.1186/s13065-015-0104-5] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 01/19/2015] [Accepted: 05/08/2015] [Indexed: 01/08/2023]
Abstract
Descriptors calculated from molecular structures have been used to map different areas of chemical space. A number of applications for such maps can be identified, ranging from the fine-tuning and optimisation of catalytic activity and compound properties to virtual screening of novel compounds, as well as the exhaustive exploration of large areas of chemical space by automated combinatorial building and evaluation. This review focuses on organometallic catalysis, but also touches on other areas where similar approaches have been used, with a view to assessing the extent to which chemical space has been explored.
[Graphical abstract: cartoon representation of a chemical space map.]
Affiliation(s)
- Natalie Fey
- School of Chemistry, University of Bristol, Cantock's Close, Bristol, BS8 1TS UK
15
Clark AM, Dole K, Coulon-Spektor A, McNutt A, Grass G, Freundlich JS, Reynolds RC, Ekins S. Open Source Bayesian Models. 1. Application to ADME/Tox and Drug Discovery Datasets. J Chem Inf Model 2015; 55:1231-45. [PMID: 25994950 PMCID: PMC4478615 DOI: 10.1021/acs.jcim.5b00143] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.0] [Indexed: 02/07/2023]
Abstract
On the order of hundreds of absorption, distribution, metabolism, excretion, and toxicity (ADME/Tox) models have been described in the literature in the past decade, and they are more often than not inaccessible to anyone but their authors. Public accessibility is also an issue with computational models for bioactivity, and the ability to share such models remains a major challenge limiting drug discovery. We describe the creation of a reference implementation of a Bayesian model-building software module, which we have released as an open source component that is now included in the Chemistry Development Kit (CDK) project, as well as implemented in the CDD Vault and in several mobile apps. We use this implementation to build an array of Bayesian models for ADME/Tox, in vitro and in vivo bioactivity, and other physicochemical properties. We show that these models possess cross-validation receiver operating characteristic (ROC) curve values comparable to those generated in prior publications using alternative tools. We have now described how the implementation of Bayesian models with FCFP6 descriptors generated in the CDD Vault enables the rapid production of robust machine learning models from public data or the user’s own datasets. The current study sets the stage for generating models in proprietary software (such as CDD) and exporting these models in a format that could be run in open source software using CDK components. This work also demonstrates that we can enable biocomputation across distributed private or public datasets to enhance drug discovery.
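Bayesian models built on circular-fingerprint features are commonly of the Laplacian-corrected naive Bayes type: each feature f gets a weight log((A_f + 1) / (T_f·P + 1)), where A_f is the number of actives containing f, T_f the number of compounds containing f, and P the overall fraction of actives, and a compound's score is the sum of the weights of its features. The sketch below implements that formulation under the assumption that features are already available as sets of integer IDs (FCFP6-like generation is not shown); it is not the CDK module's API, and the feature IDs in the example are hypothetical.

```python
import math
from collections import Counter


def train_laplacian_bayes(feature_sets, labels) -> dict:
    """Per-feature Laplacian-corrected log weights from binary activity labels."""
    p_active = sum(labels) / len(labels)
    total, active = Counter(), Counter()
    for feats, y in zip(feature_sets, labels):
        for f in feats:
            total[f] += 1
            if y == 1:
                active[f] += 1
    # log((A_f + 1) / (T_f * P + 1)): features enriched in actives get positive weights
    return {f: math.log((active[f] + 1) / (total[f] * p_active + 1)) for f in total}


def bayes_score(model: dict, feats) -> float:
    """Sum the weights of a compound's features; unseen features contribute 0."""
    return sum(model.get(f, 0.0) for f in feats)


# Hypothetical feature IDs: feature 1 occurs only in actives, feature 2 only in inactives
train_feats = [{1, 3}, {1, 4}, {2, 3}, {2, 4}]
train_labels = [1, 1, 0, 0]
model = train_laplacian_bayes(train_feats, train_labels)
```

The "+1" terms shrink weights for rarely seen features toward zero, which keeps a feature observed in a single active compound from dominating the score.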
Affiliation(s)
- Alex M Clark
- Molecular Materials Informatics, Inc., 1900 St. Jacques No. 302, Montreal H3J 2S1, Quebec, Canada
- Krishna Dole
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States
- Anna Coulon-Spektor
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States
- Andrew McNutt
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States
- George Grass
- G2 Research, Inc., P.O. Box 1242, Tahoe City, California 96145, United States
- Robert C Reynolds
- Department of Chemistry, College of Arts and Sciences, University of Alabama at Birmingham, 1530 Third Avenue South, Birmingham, Alabama 35294-1240, United States
- Sean Ekins
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States; Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina 27526, United States
| |
Collapse
|
16
|
Ivanenkov YA, Veselov MS, Chufarova NV, Majouga AG, Kudryavceva AA, Ivachtchenko AV. Non-dopamine receptor ligands for the treatment of Parkinson's disease. Insight into the related chemical/property space. Mol Divers 2015; 20:345-65. [PMID: 25956815 DOI: 10.1007/s11030-015-9598-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2014] [Accepted: 04/06/2015] [Indexed: 10/23/2022]
Abstract
Extensive biochemical and clinical studies have increasingly recognized Parkinson's disease as a highly complex and multi-faceted neurological disorder with a range of non-motor symptoms, including sleep disorders, pain, constipation, psychosis, depression, and fatigue. The wide range of brain targets implicated in this pathology has led to a large number of novel small-molecule compounds with promising activity. This review thoroughly describes the chemical space of non-dopamine receptor ligands in terms of diversity, isosteric/bioisosteric morphing, and molecular descriptors.
Affiliation(s)
- Yan A Ivanenkov
- Moscow Institute of Physics and Technology (State University), 9 Institutskiy Lane, Dolgoprudny, Moscow Region, 141700, Russian Federation; ChemDiv, 6605 Nancy Ridge Drive, San Diego, CA, 92121, USA; Chemistry Department, Moscow State University, Leninskie Gory, Building 1/3, Moscow, 119991, Russian Federation
- Mark S Veselov
- Moscow Institute of Physics and Technology (State University), 9 Institutskiy Lane, Dolgoprudny, Moscow Region, 141700, Russian Federation; Chemistry Department, Moscow State University, Leninskie Gory, Building 1/3, Moscow, 119991, Russian Federation; National University of Science and Technology MISiS, 9 Leninskiy pr., Moscow, 119049, Russian Federation
- Nina V Chufarova
- Moscow Institute of Physics and Technology (State University), 9 Institutskiy Lane, Dolgoprudny, Moscow Region, 141700, Russian Federation; National University of Science and Technology MISiS, 9 Leninskiy pr., Moscow, 119049, Russian Federation
- Alexander G Majouga
- Chemistry Department, Moscow State University, Leninskie Gory, Building 1/3, Moscow, 119991, Russian Federation; National University of Science and Technology MISiS, 9 Leninskiy pr., Moscow, 119049, Russian Federation
- Anna A Kudryavceva
- Moscow Institute of Physics and Technology (State University), 9 Institutskiy Lane, Dolgoprudny, Moscow Region, 141700, Russian Federation
17
Abstract
Background: Sound statistical validation is important to evaluate and compare the overall performance of (Q)SAR models. However, classical validation does not help the user better understand the properties of the model or the underlying data. Although a number of visualization tools for analyzing (Q)SAR information in small molecule datasets exist, integrated visualization methods that allow the investigation of model validation results are still lacking. Results: We propose visual validation as an approach for the graphical inspection of (Q)SAR model validation results. The approach applies the 3D viewer CheS-Mapper, an open-source application for the exploration of small molecules in virtual 3D space. The present work describes the new functionalities in CheS-Mapper 2.0 that facilitate the analysis of (Q)SAR information and allow the visual validation of (Q)SAR models. The tool enables the comparison of model predictions to the actual activity in feature space. The approach is generic: it is model-independent and can handle physico-chemical and structural input features as well as quantitative and qualitative endpoints. Conclusions: Visual validation with CheS-Mapper enables analyzing (Q)SAR information in the data and indicates how this information is employed by the (Q)SAR model. It reveals whether the endpoint is modeled too specifically or too generically and highlights common properties of misclassified compounds. Moreover, the researcher can use CheS-Mapper to inspect how the (Q)SAR model predicts activity cliffs. The CheS-Mapper software is freely available at http://ches-mapper.org. Graphical abstract: Comparing actual and predicted activity values with CheS-Mapper.
18
Ovchinnikova SI, Bykov AA, Tsivadze AY, Dyachkov EP, Kireeva NV. Supervised extensions of chemography approaches: case studies of chemical liabilities assessment. J Cheminform 2014; 6:20. [PMID: 24868246 PMCID: PMC4018504 DOI: 10.1186/1758-2946-6-20] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/25/2013] [Accepted: 04/28/2014] [Indexed: 12/04/2022]
Abstract
Chemical liabilities, such as adverse effects and toxicity, play a significant role in the modern drug discovery process. In silico assessment of chemical liabilities is an important step aimed at reducing costs and animal testing by complementing or replacing in vitro and in vivo experiments. Herein, we propose an approach that combines several classification and chemography methods to predict chemical liabilities and to interpret the results in terms of how structural changes in compounds affect their pharmacological profile. To our knowledge, a supervised extension of Generative Topographic Mapping is proposed here for the first time, as an effective new chemography method. We also propose a new approach for mapping new data with supervised Isomap without rebuilding models from scratch. Two approaches for estimating a model's applicability domain are, to our knowledge, used in this study for the first time in chemoinformatics. Model interpretation identified the structural alerts responsible for the negative characteristics of the compounds' pharmacological profiles.
Affiliation(s)
- Svetlana I Ovchinnikova
- Frumkin Institute of Physical Chemistry and Electrochemistry RAS, Leninsky pr-t 31-4, 119071 Moscow, Russia
- Moscow Institute of Physics and Technology, Institutsky per., 9, 141700 Dolgoprudny, Russia
- Arseniy A Bykov
- Frumkin Institute of Physical Chemistry and Electrochemistry RAS, Leninsky pr-t 31-4, 119071 Moscow, Russia
- Moscow Institute of Physics and Technology, Institutsky per., 9, 141700 Dolgoprudny, Russia
- Aslan Yu Tsivadze
- Frumkin Institute of Physical Chemistry and Electrochemistry RAS, Leninsky pr-t 31-4, 119071 Moscow, Russia
- Evgeny P Dyachkov
- Kurnakov Institute of General and Inorganic Chemistry RAS, Leninsky pr-t 31, 119071 Moscow, Russia
- Natalia V Kireeva
- Frumkin Institute of Physical Chemistry and Electrochemistry RAS, Leninsky pr-t 31-4, 119071 Moscow, Russia
- Moscow Institute of Physics and Technology, Institutsky per., 9, 141700 Dolgoprudny, Russia
19
Hoksza D, Skoda P, Voršilák M, Svozil D. Molpher: a software framework for systematic chemical space exploration. J Cheminform 2014; 6:7. [PMID: 24655571 PMCID: PMC3998053 DOI: 10.1186/1758-2946-6-7] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 01/07/2014] [Accepted: 03/17/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND Chemical space is the virtual space occupied by all chemically meaningful organic compounds. It is an important concept in contemporary chemoinformatics research, and its systematic exploration is vital to the discovery of either novel drugs or new tools for chemical biology. RESULTS In this paper, we describe Molpher, an open-source framework for the systematic exploration of chemical space. Through a process we term 'molecular morphing', Molpher produces a path of structurally-related compounds. This path is generated by the iterative application of so-called 'morphing operators' that represent simple structural changes, such as the addition or removal of an atom or a bond. Molpher incorporates an optimized parallel exploration algorithm, compound logging and a two-dimensional visualization of the exploration process. Its feature set can be easily extended by implementing additional morphing operators, chemical fingerprints, similarity measures and visualization methods. Molpher not only offers an intuitive graphical user interface but can also be run in batch mode. This enables users to easily incorporate molecular morphing into their existing drug discovery pipelines. CONCLUSIONS Molpher is an open-source software framework for the design of virtual chemical libraries focused on a particular mechanistic class of compounds. These libraries, represented by a morphing path and its surroundings, provide valuable starting data for future in silico and in vitro experiments. Molpher is highly extensible and can be easily incorporated into any existing computational drug design pipeline.
Affiliation(s)
- David Hoksza
- Laboratory of Informatics and Chemistry, Faculty of Chemical Technology, Institute of Chemical Technology Prague, Technická 5, CZ-166 28 Prague, Czech Republic.
20
Ekins S, Freundlich JS, Reynolds RC. Fusing dual-event data sets for Mycobacterium tuberculosis machine learning models and their evaluation. J Chem Inf Model 2013; 53:3054-63. [PMID: 24144044 DOI: 10.1021/ci400480s] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Indexed: 12/25/2022]
Abstract
The search for new tuberculosis treatments continues, as we need to find molecules that act more quickly, can be accommodated in multidrug regimens, and overcome ever-increasing levels of drug resistance. Multiple large-scale phenotypic high-throughput screens against Mycobacterium tuberculosis (Mtb) have generated dose-response data, enabling the generation of machine learning models. These models also incorporated cytotoxicity data and were recently validated with a large external data set. A cheminformatics data-fusion approach followed by Bayesian machine learning, Support Vector Machine, or Recursive Partitioning model development (based on publicly available Mtb screening data) was used to compare individual data sets and subsequent combined models. A set of 1924 commercially available molecules with promising antitubercular activity (and lack of relative cytotoxicity to Vero cells) was used to evaluate the predictive nature of the models. We demonstrate that combining three data sets incorporating antitubercular and cytotoxicity data in Vero cells from our previous screens results in an external validation receiver operating characteristic (ROC) value of 0.83 (Bayesian or RP Forest). Models that do not have the highest 5-fold cross-validation ROC scores can outperform other models in a test-set-dependent manner. We demonstrate with predictions for a recently published set of Mtb leads from GlaxoSmithKline that no single machine learning model may be enough to identify compounds of interest. Data set fusion represents a further useful strategy for building machine learning models, as illustrated with Mtb. Coverage of chemistry and Mtb target spaces may also be limiting factors for the whole-cell screening data generated to date.
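The abstract compares models by 5-fold cross-validation and external ROC values. A minimal sketch of how such a value can be computed from model scores, using the rank-based (Mann-Whitney) identity for the area under the ROC curve. The interleaved fold assignment in `k_fold_indices` is a simplifying assumption for illustration; real studies usually shuffle or stratify, and this is not the paper's protocol.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve as the probability that a randomly chosen
    active (label 1) scores higher than a randomly chosen inactive (label 0).
    Ties count as half a win (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k=5):
    """Yield (train, test) index lists; fold i holds indices i, i+k, i+2k, ..."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # 1.0: perfect ranking
```

In a cross-validation run, a model would be trained on each `train` split, scored on the matching `test` split, and the held-out scores pooled or averaged into a single ROC value.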
Affiliation(s)
- Sean Ekins
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States
21
Schwartz J, Awale M, Reymond JL. SMIfp (SMILES fingerprint) chemical space for virtual screening and visualization of large databases of organic molecules. J Chem Inf Model 2013; 53:1979-89. [PMID: 23845040 DOI: 10.1021/ci400206h] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.8] [Indexed: 01/16/2023]
Abstract
SMIfp (SMILES fingerprint) is defined here as a scalar fingerprint describing organic molecules by counting the occurrences of 34 different symbols in their SMILES strings, which creates a 34-dimensional chemical space. Ligand-based virtual screening using the city-block distance CBD(SMIfp) as similarity measure provides good AUC values and enrichment factors for recovering series of actives from the Directory of Useful Decoys, Enhanced (DUD-E) and from ZINC. DrugBank, ChEMBL, ZINC, PubChem, GDB-11, GDB-13, and GDB-17 can be searched by CBD(SMIfp) using an online SMIfp-browser at www.gdb.unibe.ch. Visualization of the SMIfp chemical space was performed by principal component analysis and color-coded maps of the (PC1, PC2)-planes, with interactive access to the molecules enabled by the Java application SMIfp-MAPPLET available from www.gdb.unibe.ch. These maps spread molecules according to their fraction of aromatic atoms, size and polarity. SMIfp provides a new and relevant entry to explore the small molecule chemical space.
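The counting scheme and the city-block similarity described above can be sketched in a few lines. The published SMIfp uses a specific list of 34 symbols that the abstract does not give, so `SYMBOLS` below is a hypothetical, shortened subset chosen only for illustration, and the simple tokenizer only keeps `Cl` and `Br` together.

```python
from collections import Counter

# Hypothetical, shortened symbol list for illustration only; the published
# SMIfp counts 34 specific symbols that are not listed in the abstract.
SYMBOLS = ["C", "c", "N", "n", "O", "o", "S", "s", "F", "Cl", "Br",
           "=", "#", "(", "1", "2"]
TWO_LETTER = {"Cl", "Br"}


def tokenize(smiles):
    """Split a SMILES string into symbols, keeping Cl and Br together."""
    tokens, i = [], 0
    while i < len(smiles):
        if smiles[i:i + 2] in TWO_LETTER:
            tokens.append(smiles[i:i + 2])
            i += 2
        else:
            tokens.append(smiles[i])
            i += 1
    return tokens


def smifp_like(smiles):
    """Count occurrences of each symbol in SYMBOLS (a scalar count vector)."""
    counts = Counter(tokenize(smiles))
    return [counts[s] for s in SYMBOLS]


def city_block(a, b):
    """City-block (Manhattan) distance between two count vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_by_similarity(query, library):
    """Sort library SMILES by increasing city-block distance to the query."""
    q = smifp_like(query)
    return sorted(library, key=lambda s: city_block(q, smifp_like(s)))


print(rank_by_similarity("CCO", ["c1ccccc1", "CCN", "CCO"]))
# ['CCO', 'CCN', 'c1ccccc1']
```

Because the fingerprint is a plain integer vector, the same vectors can also be fed to principal component analysis for the (PC1, PC2) maps mentioned in the abstract.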
Affiliation(s)
- Julian Schwartz
- Department of Chemistry and Biochemistry, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
22
Wang L, Wang M, Yan A, Dai B. Using self-organizing map (SOM) and support vector machine (SVM) for classification of selectivity of ACAT inhibitors. Mol Divers 2013; 17:85-96. [PMID: 23124952 DOI: 10.1007/s11030-012-9404-z] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 07/14/2012] [Accepted: 10/08/2012] [Indexed: 01/29/2023]
Abstract
Using a self-organizing map (SOM) and a support vector machine (SVM), two classification models were built to predict the selectivity of a compound toward the two acyl-coenzyme A:cholesterol acyltransferase (ACAT) isozymes, ACAT-1 and ACAT-2. A dataset of 97 ACAT inhibitors was collected. For each molecule, global descriptors, 2D and 3D property autocorrelation descriptors, and autocorrelation of surface properties were calculated with the program ADRIANA.Code. With the training/test split produced by the SOM method, the models achieved test-set prediction accuracies of 88.9% (SOM1) and 92.6% (SVM1). In addition, extended connectivity fingerprints (ECFP_4) were calculated for all molecules and the structure-activity relationship of selective ACAT inhibitors was summarized, which may help identify structural features of inhibitors related to ACAT isozyme selectivity.
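The abstract does not give the SOM settings used in the study. A minimal sketch of a Kohonen self-organizing map, assuming a rectangular grid, a Gaussian neighbourhood, and linearly decaying learning rate and radius; descriptor vectors (e.g. from ADRIANA.Code) are assumed to be precomputed and scaled, and all function names and defaults are hypothetical.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid node whose weight vector is closest (squared Euclidean) to x."""
    return min(weights,
               key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def som_update(weights, x, lr, sigma):
    """One Kohonen step: pull every node toward x, scaled by a Gaussian
    neighbourhood centred on the best-matching unit (BMU). Returns the BMU."""
    bmu = best_matching_unit(weights, x)
    for (r, c), w in weights.items():
        d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2  # squared grid distance
        h = math.exp(-d2 / (2 * sigma ** 2))
        for k in range(len(w)):
            w[k] += lr * h * (x[k] - w[k])
    return bmu


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a rows x cols map; learning rate and radius decay linearly."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            frac = step / total
            som_update(weights, data[i], lr0 * (1 - frac),
                       max(sigma0 * (1 - frac), 0.01))  # keep sigma > 0
            step += 1
    return weights
```

After training, each compound can be assigned to its best-matching node. One way to obtain a training/test split of the kind the abstract mentions is to draw test compounds from many different nodes so that both sets cover the map; that use is an illustration, not the paper's exact protocol.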
Affiliation(s)
- Ling Wang
- School of Chemistry and Chemical Engineering, Key Laboratory for Green Process of Chemical Engineering of Xinjiang Bingtuan, Shihezi University, Xinjiang, Shihezi 832003, China
23
Awale M, van Deursen R, Reymond JL. MQN-mapplet: visualization of chemical space with interactive maps of DrugBank, ChEMBL, PubChem, GDB-11, and GDB-13. J Chem Inf Model 2013; 53:509-18. [PMID: 23297797 DOI: 10.1021/ci300513m] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.7] [Indexed: 01/21/2023]
Abstract
The MQN-mapplet is a Java application giving access to the structure of small molecules in large databases via color-coded maps of their chemical space. These maps are projections from a 42-dimensional property space defined by 42 integer value descriptors called molecular quantum numbers (MQN), which count different categories of atoms, bonds, polar groups, and topological features and categorize molecules by size, rigidity, and polarity. Despite its simplicity, MQN-space is relevant to biological activities. The MQN-mapplet allows localization of any molecule on the color-coded images, visualization of the molecules, and identification of analogs as neighbors on the MQN-map or in the original 42-dimensional MQN-space. No query molecule is necessary to start the exploration, which may be particularly attractive for nonchemists. To our knowledge, this type of interactive exploration tool is unprecedented for very large databases such as PubChem and GDB-13 (almost one billion molecules). The application is freely available for download at www.gdb.unibe.ch.
Affiliation(s)
- Mahendra Awale
- Department of Chemistry and Biochemistry, NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
24
Benzimidazole derivatives: synthesis, leishmanicidal effectiveness, and molecular docking studies. Med Chem Res 2012. [DOI: 10.1007/s00044-012-0375-5] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Indexed: 10/27/2022]
25
Kireeva N, Kuznetsov SL, Tsivadze AY. Toward Navigating Chemical Space of Ionic Liquids: Prediction of Melting Points Using Generative Topographic Maps. Ind Eng Chem Res 2012. [DOI: 10.1021/ie3021895] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022]
Affiliation(s)
- Natalia Kireeva
- Institute of Physical Chemistry and Electrochemistry RAS, Leninsky pr-t 31, 119071 Moscow, Russian Federation
- Laboratoire d’Infochimie, UMR 7177 CNRS, Université de Strasbourg, 4 rue B. Pascal, Strasbourg 67000, France
- Sergey L. Kuznetsov
- Institute of Physical Chemistry and Electrochemistry RAS, Leninsky pr-t 31, 119071 Moscow, Russian Federation
- Aslan Yu. Tsivadze
- Institute of Physical Chemistry and Electrochemistry RAS, Leninsky pr-t 31, 119071 Moscow, Russian Federation
26
Kireeva N, Baskin II, Gaspar HA, Horvath D, Marcou G, Varnek A. Generative Topographic Mapping (GTM): Universal Tool for Data Visualization, Structure-Activity Modeling and Dataset Comparison. Mol Inform 2012; 31:301-12. [PMID: 27477099 DOI: 10.1002/minf.201100163] [Citation(s) in RCA: 86] [Impact Index Per Article: 6.6] [Received: 12/15/2011] [Accepted: 02/29/2012] [Indexed: 11/10/2022]
Abstract
Here, the utility of Generative Topographic Maps (GTM) for data visualization, structure-activity modeling and database comparison is evaluated using subsets of the Directory of Useful Decoys (DUD). Unlike other popular dimensionality reduction approaches such as Principal Component Analysis, Sammon mapping or Self-Organizing Maps, GTMs have the great advantage of providing probability distribution functions (PDFs) of the data, both in the high-dimensional space defined by molecular descriptors and in the 2D latent space. PDFs for the molecules of different activity classes were successfully used to build classification models in the framework of the Bayesian approach. Because PDFs are represented by a mixture of Gaussian functions, the Bhattacharyya kernel has been proposed as a measure of the overlap of datasets, which leads to an elegant method of global comparison of chemical libraries.
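The abstract applies the Bhattacharyya kernel to the Gaussian-mixture PDFs themselves. As a simpler illustration of the same overlap idea, the sketch below assumes each library has been discretized onto the GTM latent grid by averaging per-molecule responsibilities (computed elsewhere by a GTM implementation) and then computes the discrete Bhattacharyya coefficient. This discretization is an assumption for illustration, not the paper's kernel.

```python
import math


def mean_responsibilities(resp_rows):
    """Average per-molecule responsibility vectors (each summing to 1 over
    the latent grid nodes) into one distribution for the whole library."""
    n = len(resp_rows)
    k = len(resp_rows[0])
    return [sum(row[j] for row in resp_rows) / n for j in range(k)]


def bhattacharyya_coefficient(p, q):
    """BC(p, q) = sum_k sqrt(p_k * q_k): 1 for identical distributions,
    0 for libraries occupying disjoint regions of the latent space."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


lib_a = mean_responsibilities([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # [0.5, 0.5, 0.0]
lib_b = mean_responsibilities([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])  # [0.0, 0.5, 0.5]
print(bhattacharyya_coefficient(lib_a, lib_b))  # 0.5: half-overlapping libraries
```

The related Bhattacharyya distance is -ln BC, which is 0 for identical libraries and grows as their overlap shrinks.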
Affiliation(s)
- N Kireeva
- Laboratoire d'Infochimie, UMR 7177 CNRS, Université de Strasbourg, 4 rue B. Pascal, Strasbourg 67000, France; Institute of Physical Chemistry and Electrochemistry RAS, Leninsky pr-t 31, 119991 Moscow, Russian Federation
- I I Baskin
- Laboratoire d'Infochimie, UMR 7177 CNRS, Université de Strasbourg, 4 rue B. Pascal, Strasbourg 67000, France; Department of Chemistry, Lomonosov Moscow State University, 119991, Moscow, Russian Federation
- H A Gaspar
- Laboratoire d'Infochimie, UMR 7177 CNRS, Université de Strasbourg, 4 rue B. Pascal, Strasbourg 67000, France
- D Horvath
- Laboratoire d'Infochimie, UMR 7177 CNRS, Université de Strasbourg, 4 rue B. Pascal, Strasbourg 67000, France
- G Marcou
- Laboratoire d'Infochimie, UMR 7177 CNRS, Université de Strasbourg, 4 rue B. Pascal, Strasbourg 67000, France
- A Varnek
- Laboratoire d'Infochimie, UMR 7177 CNRS, Université de Strasbourg, 4 rue B. Pascal, Strasbourg 67000, France
27
Reutlinger M, Schneider G. Nonlinear dimensionality reduction and mapping of compound libraries for drug discovery. J Mol Graph Model 2012; 34:108-17. [PMID: 22326864 DOI: 10.1016/j.jmgm.2011.12.006] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Received: 09/14/2011] [Revised: 12/13/2011] [Accepted: 12/14/2011] [Indexed: 01/29/2023]
Abstract
Visualization of 'chemical space' and compound distributions has attracted much attention from medicinal chemists, as it may help them intuitively comprehend pharmaceutically relevant molecular features. It has been recognized that meaningful feature extraction from complex multivariate chemical data, such as compound libraries represented by many molecular descriptors, requires nonlinear projection techniques. Recent advances in machine learning and artificial intelligence have led to the transfer of such methods to chemistry. We provide an overview of prominent visualization methods based on nonlinear dimensionality reduction and highlight applications in drug discovery. Emphasis is on neural network techniques, kernel methods and stochastic embedding approaches, which have been successfully used for ligand-based virtual screening, SAR landscape analysis, combinatorial library design, and screening compound selection.
Affiliation(s)
- Michael Reutlinger
- Swiss Federal Institute of Technology (ETH), Department of Chemistry and Applied Biosciences, Zurich, Switzerland
28
Sacan A, Ekins S, Kortagere S. Applications and limitations of in silico models in drug discovery. Methods Mol Biol 2012; 910:87-124. [PMID: 22821594 DOI: 10.1007/978-1-61779-965-5_6] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Indexed: 02/07/2023]
Abstract
Drug discovery in the late twentieth and early twenty-first centuries has witnessed many changes adopted to predict whether a compound is likely to be successful or, conversely, to identify molecules with liabilities as early as possible. These changes include the integration of in silico strategies for lead design and optimization that complement the traditional in vitro and in vivo approaches. In silico models are facilitated by the availability of large datasets from high-throughput screening, bioinformatics algorithms to mine and annotate the data from a target perspective, and chemoinformatics methods to integrate chemistry into the lead design process. This chapter highlights the applications of some of these methods and their limitations. We hope this serves as an introduction to in silico drug discovery.
Affiliation(s)
- Ahmet Sacan
- School of Biomedical Engineering, Drexel University, Philadelphia, PA, USA
29
Schneider P, Stutz K, Kasper L, Haller S, Reutlinger M, Reisen F, Geppert T, Schneider G. Target Profile Prediction and Practical Evaluation of a Biginelli-Type Dihydropyrimidine Compound Library. Pharmaceuticals (Basel) 2011. [PMCID: PMC4058656 DOI: 10.3390/ph4091236] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/24/2022]
Abstract
We present a self-organizing map (SOM) approach to predicting macromolecular targets for combinatorial compound libraries. The aim was to study the usefulness of the SOM in combination with a topological pharmacophore representation (CATS) for selecting biologically active compounds from a virtual combinatorial compound collection, taking the multi-component Biginelli dihydropyrimidine reaction as an example. We synthesized a candidate compound from this library, for which the SOM model suggested inhibitory activity against cyclin-dependent kinase 2 (CDK2) and other kinases. The prediction was confirmed in an in vitro panel assay comprising 48 human kinases. We conclude that the computational technique may be used for ligand-based in silico pharmacology studies, off-target prediction, and drug re-purposing, thereby complementing receptor-based approaches.
Affiliation(s)
- Gisbert Schneider
- Author to whom correspondence should be addressed; Tel.: +41-44-633-7438; Fax: +41-44-633-1379
30
Koeppen H, Kriegl J, Lessel U, Tautermann CS, Wellenzohn B. Ligand-Based Virtual Screening. METHODS AND PRINCIPLES IN MEDICINAL CHEMISTRY 2011. [DOI: 10.1002/9783527633326.ch3] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 12/24/2022]
31
Abstract
This introductory chapter gives a brief overview of the history of cheminformatics, and then summarizes some recent trends in computing, cultures, open systems, chemical structure representation, docking, de novo design, fragment-based drug design, molecular similarity, quantitative structure-activity relationships (QSAR), metabolite prediction, the use of pharmacophores in drug discovery, data reduction and visualization, and text mining. The aim is to set the scene for the more detailed exposition of these topics in the later chapters.
Affiliation(s)
- Wendy A Warr
- Wendy Warr & Associates, Holmes Chapel, Cheshire, UK
32
Discriminating of HMG-CoA reductase inhibitors and decoys using self-organizing maps. Mol Divers 2010; 15:655-63. [DOI: 10.1007/s11030-010-9288-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 07/07/2010] [Accepted: 10/22/2010] [Indexed: 10/18/2022]
33
Wawer M, Lounkine E, Wassermann AM, Bajorath J. Data structures and computational tools for the extraction of SAR information from large compound sets. Drug Discov Today 2010; 15:630-9. [DOI: 10.1016/j.drudis.2010.06.004] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.2] [Received: 03/18/2010] [Revised: 05/17/2010] [Accepted: 06/07/2010] [Indexed: 12/12/2022]
34
Kortagere S, Ekins S. Troubleshooting computational methods in drug discovery. J Pharmacol Toxicol Methods 2010; 61:67-75. [PMID: 20176118 DOI: 10.1016/j.vascn.2010.02.005] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Received: 02/03/2010] [Accepted: 02/11/2010] [Indexed: 10/19/2022]
Abstract
Computational approaches for drug discovery, such as ligand-based and structure-based methods, are increasingly seen as an efficient approach for lead discovery as well as for providing insights on absorption, distribution, metabolism, excretion and toxicity (ADME/Tox). Perhaps less well known and less widely described are the limitations of the different technologies. We therefore present a troubleshooting approach to QSAR, homology modeling, docking, and hybrid methods. If such computational or cheminformatics methods are to become more widely used by non-experts, it is critical that their limitations are brought to the user's attention and addressed during their workflows. This could improve the quality of the models and results that are obtained.
Affiliation(s)
- Sandhya Kortagere
- Department of Microbiology and Immunology, Drexel University College of Medicine, Philadelphia, PA 19129, USA.
35
Reymond JL, van Deursen R, Blum LC, Ruddigkeit L. Chemical space as a source for new drugs. MEDCHEMCOMM 2010. [DOI: 10.1039/c0md00020e] [Citation(s) in RCA: 210] [Impact Index Per Article: 14.0] [Indexed: 11/21/2022]