1
Muratov EN, Bajorath J, Sheridan RP, Tetko IV, Filimonov D, Poroikov V, Oprea TI, Baskin II, Varnek A, Roitberg A, Isayev O, Curtarolo S, Fourches D, Cohen Y, Aspuru-Guzik A, Winkler DA, Agrafiotis D, Cherkasov A, Tropsha A. QSAR without borders. Chem Soc Rev 2020; 49:3525-3564. [PMID: 32356548 PMCID: PMC8008490 DOI: 10.1039/d0cs00098a] [Citation(s) in RCA: 319] [Impact Index Per Article: 79.8] [Indexed: 12/15/2022]
Abstract
Prediction of chemical bioactivity and physical properties has been one of the most important applications of statistical and, more recently, machine learning and artificial intelligence methods in the chemical sciences. This field of research, broadly known as quantitative structure-activity relationship (QSAR) modeling, has developed many important algorithms and has found a broad range of applications in physical organic and medicinal chemistry in the past 55+ years. This Perspective summarizes recent technological advances in QSAR modeling, but it also highlights the applicability of algorithms, modeling methods, and validation practices developed in QSAR to a wide range of research areas outside of traditional QSAR boundaries, including synthesis planning, nanotechnology, materials science, biomaterials, and clinical informatics. As modern research methods generate rapidly increasing amounts of data, the knowledge of robust data-driven modeling methods professed within the QSAR field can become essential for scientists working both within and outside of chemical research. We hope that this contribution highlighting the generalizable components of QSAR modeling will serve to address this challenge.
Affiliation(s)
- Eugene N Muratov
  - UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA.
2
Horvath D, Marcou G, Varnek A. "Big Data" Fast Chemoinformatics Model to Predict Generalized Born Radius and Solvent Accessibility as a Function of Geometry. J Chem Inf Model 2020; 60:2951-2965. [PMID: 32374171 DOI: 10.1021/acs.jcim.9b01172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
The Generalized Born (GB) solvent model offers the best accuracy/computing effort ratio, yet requires drastic simplifications to estimate the Effective Born Radii (EBR) while bypassing an overly expensive volume integration step. EBRs are a measure of the degree of burial of an atom and are not very sensitive to small changes of geometry: in molecular dynamics, the costly EBR update procedure is not mandatory at every step. This work, however, aims at implementing a GB model into the Sampler for Multiple Protein-Ligand Entities (S4MPLE) evolutionary algorithm, with mandatory EBR updates at each step triggering arbitrarily large geometric changes. Therefore, a quantitative structure-property relationship has been developed in order to express the EBRs as a linear function of both the topological neighborhood and the geometric occupancy of the space around atoms. A training set of 810 molecular systems, ranging from fragment-like to drug-like compounds, proteins, host-guest systems, and ligand-protein complexes, has been compiled. For each species, S4MPLE generated several hundred random conformers. For each atom in each geometry of each species, its "standard" EBR was calculated by numeric integration and associated with topological and geometric descriptors of the atom neighborhood. This training set (EBR, atom descriptors) involving >5 M entries was subjected to a bootstrapping multilinear regression process with descriptor selection. In parallel, the strategy was repurposed to also learn atomic solvent-accessible areas (SA) based on the same descriptors. The resulting linear equations were challenged to predict EBR and SA values for a similarly compiled external set of >2000 new molecular systems. Solvation energies calculated with estimated EBR and SA match "standard" energies within the typical error of a force-field-based approach (a few kilocalories per mole). Given the extreme diversity of molecular systems covered by the model, this simple EBR/SA estimator covers a vast applicability domain.
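As a rough illustration of what "bootstrapping multilinear regression with descriptor selection" can look like, the sketch below refits an ordinary least-squares model on resampled data and keeps descriptors whose coefficients are stable across resamples. Everything here is an assumption made for illustration: the synthetic descriptors, the function name `bootstrap_linear_fit`, and the stability rule (|mean| > 3 × std) are hypothetical and are not taken from the paper, which does not spell out its procedure in the abstract.

```python
import numpy as np

def bootstrap_linear_fit(X, y, n_boot=200, seed=0):
    """Fit y ~ intercept + X @ w on bootstrap resamples.

    Returns the mean and standard deviation of each coefficient
    (intercept first) over all resamples.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    A = np.column_stack([np.ones(n), X])          # prepend intercept column
    coefs = np.empty((n_boot, A.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)          # sample rows with replacement
        coefs[b], *_ = np.linalg.lstsq(A[idx], y[idx], rcond=None)
    return coefs.mean(axis=0), coefs.std(axis=0)

# Synthetic stand-in for (property, atom descriptors): only the first
# two of four descriptors actually influence the target.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=500)

coef_mean, coef_std = bootstrap_linear_fit(X, y)

# Hypothetical selection rule: keep descriptors whose coefficient is
# clearly non-zero relative to its bootstrap spread.
selected = np.abs(coef_mean[1:]) > 3.0 * coef_std[1:]
```

The bootstrap spread gives a cheap estimate of how trustworthy each coefficient is, which is one plausible basis for pruning descriptors before the final linear equation is fixed.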
Affiliation(s)
- Dragos Horvath
  - Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Gilles Marcou
  - Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Alexandre Varnek
  - Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
3
Lin A, Horvath D, Marcou G, Beck B, Varnek A. Multi-task generative topographic mapping in virtual screening. J Comput Aided Mol Des 2019; 33:331-343. [PMID: 30739238 DOI: 10.1007/s10822-019-00188-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 09/15/2018] [Accepted: 02/02/2019] [Indexed: 12/16/2022]
Abstract
The previously reported procedure to generate "universal" Generative Topographic Maps (GTMs) of the drug-like chemical space is in practice a multi-task learning process, in which both operational GTM parameters (example: map grid size) and hyperparameters (key example: the molecular descriptor space to be used) are chosen by an evolutionary process in order to fit/select "universal" GTM manifolds. After selection (a one-time task aimed at optimizing the compromise in terms of neighborhood behavior compliance, over a large pool of various biological targets), the manifolds are ready to provide "fit-free" predictive models for any further use. Using any structure-activity set (irrespective of whether the associated target served at the map fitting stage or not), generating or "coloring" a property landscape enables predicting the property for any external molecule, with zero additional fittable parameters involved. While previous works have signaled the excellent behavior of such models in aggressive three-fold cross-validation assessments of their predictive power, the present work explores their behavior in Virtual Screening (VS), here simulated by means of external DUD ligand and decoy series that are fully disjoint from the ChEMBL-extracted landscape coloring sets. Beyond the rather robust results of the universal GTM manifolds in this challenge, it could be shown that the descriptor spaces selected by the evolutionary multi-task learner were intrinsically able to serve as an excellent support for many other VS procedures, ranging from parameter-free similarity searching, to local (target-specific) GTM models, to parameter-rich, nonlinear Random Forest and Neural Network approaches.
Affiliation(s)
- Arkadii Lin
  - Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4, Blaise Pascal Str., 67081, Strasbourg, France
  - Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorferstrasse 65, 88397, Biberach an der Riss, Germany
- Dragos Horvath
  - Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4, Blaise Pascal Str., 67081, Strasbourg, France
- Gilles Marcou
  - Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4, Blaise Pascal Str., 67081, Strasbourg, France
- Bernd Beck
  - Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorferstrasse 65, 88397, Biberach an der Riss, Germany
- Alexandre Varnek
  - Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4, Blaise Pascal Str., 67081, Strasbourg, France
4
Kooistra AJ, Vass M, McGuire R, Leurs R, de Esch IJP, Vriend G, Verhoeven S, de Graaf C. 3D-e-Chem: Structural Cheminformatics Workflows for Computer-Aided Drug Discovery. ChemMedChem 2018; 13:614-626. [PMID: 29337438 PMCID: PMC5900740 DOI: 10.1002/cmdc.201700754] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 12/05/2017] [Revised: 01/11/2018] [Indexed: 01/06/2023]
Abstract
eScience technologies are needed to process the information available in many heterogeneous types of protein-ligand interaction data and to capture these data in models that enable the design of efficacious and safe medicines. Here we present scientific KNIME tools and workflows that enable the integration of chemical, pharmacological, and structural information for: i) structure-based bioactivity data mapping, ii) structure-based identification of scaffold replacement strategies for ligand design, iii) ligand-based target prediction, iv) protein sequence-based binding site identification and ligand repurposing, and v) structure-based pharmacophore comparison for ligand repurposing across protein families. The modular setup of the workflows and the use of well-established standards allow the re-use of these protocols and facilitate the design of customized computer-aided drug discovery workflows.
Affiliation(s)
- Albert J. Kooistra
  - Centre for Molecular and Biomolecular Informatics (CMBI), Radboud University Medical Center (RadboudUMC), Nijmegen, The Netherlands
  - Division of Medicinal Chemistry, Faculty of Science, Amsterdam Institute for Molecules, Medicines and Systems (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Márton Vass
  - Division of Medicinal Chemistry, Faculty of Science, Amsterdam Institute for Molecules, Medicines and Systems (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Ross McGuire
  - Centre for Molecular and Biomolecular Informatics (CMBI), Radboud University Medical Center (RadboudUMC), Nijmegen, The Netherlands
  - BioAxis Research, Pivot Park, Oss, The Netherlands
- Rob Leurs
  - Division of Medicinal Chemistry, Faculty of Science, Amsterdam Institute for Molecules, Medicines and Systems (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Iwan J. P. de Esch
  - Division of Medicinal Chemistry, Faculty of Science, Amsterdam Institute for Molecules, Medicines and Systems (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Gert Vriend
  - Centre for Molecular and Biomolecular Informatics (CMBI), Radboud University Medical Center (RadboudUMC), Nijmegen, The Netherlands
- Chris de Graaf
  - Division of Medicinal Chemistry, Faculty of Science, Amsterdam Institute for Molecules, Medicines and Systems (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
5
Simões RS, Maltarollo VG, Oliveira PR, Honorio KM. Transfer and Multi-task Learning in QSAR Modeling: Advances and Challenges. Front Pharmacol 2018; 9:74. [PMID: 29467659 PMCID: PMC5807924 DOI: 10.3389/fphar.2018.00074] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Received: 09/22/2017] [Accepted: 01/22/2018] [Indexed: 12/11/2022]
Abstract
Medicinal chemistry projects involve several steps aimed at developing a new drug, such as the analysis of biological targets related to a given disease, the discovery and development of drug candidates for these targets, and parallel biological tests to validate drug effectiveness and side effects. Approaches such as quantitative structure-activity relationship (QSAR) studies involve the construction of predictive models that relate a set of descriptors of a chemical compound series to its biological activities with respect to one or more targets in the human body. Datasets used to perform QSAR analyses are generally characterized by a small number of samples, which makes it harder to build accurate predictive models. In this context, transfer and multi-task learning techniques are very suitable, since they take information from other QSAR models for the same biological target, reducing the effort and cost of generating new chemical compounds. Therefore, this review presents the main features of transfer and multi-task learning studies, as well as some applications and their potential in drug design projects.
Affiliation(s)
- Rodolfo S Simões
  - School of Arts, Sciences and Humanities, University of São Paulo, São Paulo, Brazil
- Vinicius G Maltarollo
  - Department of Pharmaceutical Products, Faculty of Pharmacy, Federal University of Minas Gerais, Belo Horizonte, Brazil
- Patricia R Oliveira
  - School of Arts, Sciences and Humanities, University of São Paulo, São Paulo, Brazil
- Kathia M Honorio
  - School of Arts, Sciences and Humanities, University of São Paulo, São Paulo, Brazil
  - Center for Natural and Human Sciences, Federal University of ABC, Santo André, Brazil
6
Rakers C, Najnin RA, Polash AH, Takeda S, Brown J. Chemogenomic Active Learning's Domain of Applicability on Small, Sparse qHTS Matrices: A Study Using Cytochrome P450 and Nuclear Hormone Receptor Families. ChemMedChem 2018; 13:511-521. [DOI: 10.1002/cmdc.201700677] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 10/31/2017] [Revised: 12/04/2017] [Indexed: 01/21/2023]
Affiliation(s)
- Christin Rakers
  - Institute of Transformative bio-Molecules, WPI-ITbM; Nagoya University; Furo-cho Chikusa-ku Nagoya 464-8602 Japan
- Rifat Ara Najnin
  - Department of Radiation Genetics; Kyoto University Graduate School of Medicine; Sakyo, Yoshida-konoemachi Building D, 3F Kyoto 606-8501 Japan
- Ahsan Habib Polash
  - Department of Radiation Genetics; Kyoto University Graduate School of Medicine; Sakyo, Yoshida-konoemachi Building D, 3F Kyoto 606-8501 Japan
- Shunichi Takeda
  - Department of Radiation Genetics; Kyoto University Graduate School of Medicine; Sakyo, Yoshida-konoemachi Building D, 3F Kyoto 606-8501 Japan
- J.B. Brown
  - Laboratory for Molecular Biosciences; Kyoto University Graduate School of Medicine; Yoshida-konoemachi Building E 606-8501 Kyoto Sakyo Japan
7
Assessment of tautomer distribution using the condensed reaction graph approach. J Comput Aided Mol Des 2018; 32:401-414. [DOI: 10.1007/s10822-018-0101-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 09/20/2017] [Accepted: 01/18/2018] [Indexed: 02/07/2023]
8
Abstract
Following the elucidation of the human genome, chemogenomics emerged in the beginning of the twenty-first century as an interdisciplinary research field with the aim to accelerate target and drug discovery by making best usage of the genomic data and the data linkable to it. What started as a systematization approach within protein target families now encompasses all types of chemical compounds and gene products. A key objective of chemogenomics is the establishment, extension, analysis, and prediction of a comprehensive SAR matrix which by application will enable further systematization in drug discovery. Herein we outline future perspectives of chemogenomics including the extension to new molecular modalities, or the potential extension beyond the pharma to the agro and nutrition sectors, and the importance for environmental protection. The focus is on computational sciences with potential applications for compound library design, virtual screening, hit assessment, analysis of phenotypic screens, lead finding and optimization, and systems biology-based prediction of toxicology and translational research.
Affiliation(s)
- Edgar Jacoby
  - Janssen Research & Development, Beerse, Belgium.
- J B Brown
  - Life Science Informatics Research Unit, Laboratory of Molecular Biosciences, Kyoto University Graduate School of Medicine, Kyoto, Japan
9
Xu Y, Ma J, Liaw A, Sheridan RP, Svetnik V. Demystifying Multitask Deep Neural Networks for Quantitative Structure-Activity Relationships. J Chem Inf Model 2017; 57:2490-2504. [PMID: 28872869 DOI: 10.1021/acs.jcim.7b00087] [Citation(s) in RCA: 130] [Impact Index Per Article: 18.6] [Indexed: 11/30/2022]
Abstract
Deep neural networks (DNNs) are complex computational models that have found great success in many artificial intelligence applications, such as computer vision [1,2] and natural language processing [3,4]. In the past four years, DNNs have also generated promising results for quantitative structure-activity relationship (QSAR) tasks [5,6]. Previous work showed that DNNs can routinely make better predictions than traditional methods, such as random forests, on a diverse collection of QSAR data sets. It was also found that multitask DNN models (those trained on and predicting multiple QSAR properties simultaneously) outperform DNNs trained separately on the individual data sets in many, but not all, tasks. To date there has been no satisfactory explanation of why the QSAR of one task embedded in a multitask DNN can borrow information from other unrelated QSAR tasks. Thus, using multitask DNNs in a way that consistently provides a predictive advantage becomes a challenge. In this work, we explored why multitask DNNs make a difference in predictive performance. Our results show that during prediction a multitask DNN does borrow "signal" from molecules with similar structures in the training sets of the other tasks. However, whether this borrowing leads to better or worse predictive performance depends on whether the activities are correlated. On the basis of this, we have developed a strategy to use multitask DNNs that incorporate prior domain knowledge to select training sets with correlated activities, and we demonstrate its effectiveness on several examples.
Affiliation(s)
- Yuting Xu
  - Biometrics Research Department, Merck & Co., Inc., Rahway, New Jersey 07065, United States
- Junshui Ma
  - Biometrics Research Department, Merck & Co., Inc., Rahway, New Jersey 07065, United States
- Andy Liaw
  - Biometrics Research Department, Merck & Co., Inc., Rahway, New Jersey 07065, United States
- Robert P Sheridan
  - Modeling and Informatics Department, Merck & Co., Inc., Kenilworth, New Jersey 07033, United States
- Vladimir Svetnik
  - Biometrics Research Department, Merck & Co., Inc., Rahway, New Jersey 07065, United States
10
Marcou G, Horvath D, Varnek A. Neighboring Structure Visualization on a Grid-based Layout. Mol Inform 2017; 36. [PMID: 28902973 DOI: 10.1002/minf.201700047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2017] [Accepted: 06/12/2017] [Indexed: 11/09/2022]
Abstract
Here, we describe an algorithm to visualize chemical structures on a grid-based layout in such a way that similar structures are neighbors. It is based on structure reordering with the help of the Hilbert-Schmidt Independence Criterion, which is an empirical estimate of the Hilbert-Schmidt norm of the cross-covariance operator. The method can be applied to any layout of two- or three-dimensional shape. The approach is demonstrated on a set of dopamine D5 ligands visualized on square, disk and spherical layouts.
Affiliation(s)
- G Marcou
  - Laboratory of Chemoinformatics, University of Strasbourg, 1 rue Blaise Pascal, 67000, Strasbourg, France
- D Horvath
  - Laboratory of Chemoinformatics, University of Strasbourg, 1 rue Blaise Pascal, 67000, Strasbourg, France
- A Varnek
  - Laboratory of Chemoinformatics, University of Strasbourg, 1 rue Blaise Pascal, 67000, Strasbourg, France
11
Abstract
Aim: Computational chemogenomics models the compound–protein interaction space, typically for drug discovery, where existing methods predominantly either incorporate increasing numbers of bioactivity samples or focus on specific subfamilies of proteins and ligands. As an alternative to modeling entire large datasets at once, active learning adaptively incorporates a minimum of informative examples for modeling, yielding compact but high quality models. Results/methodology: We assessed active learning for protein/target family-wide chemogenomic modeling by replicate experiment. Results demonstrate that small yet highly predictive models can be extracted from only 10–25% of large bioactivity datasets, irrespective of molecule descriptors used. Conclusion: Chemogenomic active learning identifies small subsets of ligand–target interactions in a large screening database that lead to knowledge discovery and highly predictive models.
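The pool-based active learning loop described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' protocol: the data are synthetic two-dimensional points standing in for ligand-target descriptors, the model is a toy nearest-centroid classifier (the study itself does not specify this), and the query rule is plain uncertainty sampling. All function names are hypothetical.

```python
import numpy as np

def fit_centroids(X, y):
    """Toy classifier: one centroid per class (0 = inactive, 1 = active)."""
    return X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)

def margin(centroids, X):
    """Signed score: positive when a point is closer to the class-1 centroid."""
    c0, c1 = centroids
    return np.linalg.norm(X - c0, axis=1) - np.linalg.norm(X - c1, axis=1)

def active_learning(X, y, n_queries=20, seed=0):
    """Start from two labeled examples per class, then repeatedly query
    the pool example the current model is least certain about."""
    rng = np.random.default_rng(seed)
    labeled = [int(i) for c in (0, 1)
               for i in rng.choice(np.flatnonzero(y == c), 2, replace=False)]
    pool = [i for i in range(len(y)) if i not in labeled]
    for _ in range(n_queries):
        model = fit_centroids(X[labeled], y[labeled])
        uncertainty = np.abs(margin(model, X[pool]))   # small = near boundary
        labeled.append(pool.pop(int(np.argmin(uncertainty))))
    return fit_centroids(X[labeled], y[labeled]), labeled

# Synthetic "screening database": 200 examples, two well-separated classes.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, size=(100, 2)),
               rng.normal(2.0, 1.0, size=(100, 2))])
y = np.repeat([0, 1], 100)

model, labeled = active_learning(X, y)
accuracy = np.mean((margin(model, X) > 0).astype(int) == y)
```

In this toy run the model sees labels for 24 of the 200 examples (12%), which mirrors the idea of building a compact model from a small, adaptively chosen fraction of the data.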
12
Small Random Forest Models for Effective Chemogenomic Active Learning. Journal of Computer Aided Chemistry 2017. [DOI: 10.2751/jcac.18.124] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 11/08/2022]
13
Glavatskikh M, Madzhidov T, Solov'ev V, Marcou G, Horvath D, Varnek A. Predictive Models for the Free Energy of Hydrogen Bonded Complexes with Single and Cooperative Hydrogen Bonds. Mol Inform 2016; 35:629-638. [DOI: 10.1002/minf.201600070] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 05/28/2016] [Accepted: 06/27/2016] [Indexed: 11/10/2022]
Affiliation(s)
- Marta Glavatskikh
  - Laboratoire de Chémoinformatique; UMR 7140 CNRS; Université de Strasbourg; 1, rue Blaise Pascal 67000 Strasbourg France
  - Laboratory of Chemoinformatics and Molecular Modeling; Butlerov Institute of Chemistry; Kazan Federal University; Kremlevskaya 18 Kazan Russia
- Timur Madzhidov
  - Laboratory of Chemoinformatics and Molecular Modeling; Butlerov Institute of Chemistry; Kazan Federal University; Kremlevskaya 18 Kazan Russia
- Vitaly Solov'ev
  - A.N. Frumkin Institute of Physical Chemistry and Electrochemistry; Russian Academy of Sciences; Leninskiy prosp., 31 119071 Moscow Russia
- Gilles Marcou
  - Laboratoire de Chémoinformatique; UMR 7140 CNRS; Université de Strasbourg; 1, rue Blaise Pascal 67000 Strasbourg France
- Dragos Horvath
  - Laboratoire de Chémoinformatique; UMR 7140 CNRS; Université de Strasbourg; 1, rue Blaise Pascal 67000 Strasbourg France
- Alexandre Varnek
  - Laboratoire de Chémoinformatique; UMR 7140 CNRS; Université de Strasbourg; 1, rue Blaise Pascal 67000 Strasbourg France
14
Gaspar HA, Sidorov P, Horvath D, Baskin II, Marcou G, Varnek A. Generative Topographic Mapping Approach to Chemical Space Analysis. Frontiers in Molecular Design and Chemical Information Science - Herman Skolnik Award Symposium 2015: Jürgen Bajorath 2016. [DOI: 10.1021/bk-2016-1222.ch011] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 12/04/2022]
Affiliation(s)
- Héléna A. Gaspar
  - Laboratoire de Chemoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France
  - Faculty of Physics, M.V. Lomonosov Moscow State University, Leninskie Gory, Moscow 119991, Russia
  - Laboratory of Chemoinformatics, Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- Pavel Sidorov
  - Laboratoire de Chemoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France
  - Faculty of Physics, M.V. Lomonosov Moscow State University, Leninskie Gory, Moscow 119991, Russia
  - Laboratory of Chemoinformatics, Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- Dragos Horvath
  - Laboratoire de Chemoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France
  - Faculty of Physics, M.V. Lomonosov Moscow State University, Leninskie Gory, Moscow 119991, Russia
  - Laboratory of Chemoinformatics, Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- Igor I. Baskin
  - Laboratoire de Chemoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France
  - Faculty of Physics, M.V. Lomonosov Moscow State University, Leninskie Gory, Moscow 119991, Russia
  - Laboratory of Chemoinformatics, Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- Gilles Marcou
  - Laboratoire de Chemoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France
  - Faculty of Physics, M.V. Lomonosov Moscow State University, Leninskie Gory, Moscow 119991, Russia
  - Laboratory of Chemoinformatics, Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- Alexandre Varnek
  - Laboratoire de Chemoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France
  - Faculty of Physics, M.V. Lomonosov Moscow State University, Leninskie Gory, Moscow 119991, Russia
  - Laboratory of Chemoinformatics, Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
15
Marcou G, Horvath D, Varnek A. Kernel Target Alignment Parameter: A New Modelability Measure for Regression Tasks. J Chem Inf Model 2015; 56:6-11. [PMID: 26673976 DOI: 10.1021/acs.jcim.5b00539] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 11/27/2022]
Abstract
In this paper, we demonstrate that the kernel target alignment (KTA) parameter can efficiently be used to estimate the relevance of molecular descriptors for QSAR modeling on a given data set, i.e., as a modelability measure. The efficiency of KTA in assessing modelability was demonstrated in two series of QSAR modeling studies, either varying the descriptor space for the same data set, or comparing various data sets within the same descriptor space. The data sets considered included 25 series of various GPCR binders with ChEMBL-reported pKi values, and a toxicity data set. The descriptor spaces employed covered more than 100 different ISIDA fragment descriptor types, and ChemAxon BCUT terms. Model performances (RMSE) were seen to anticorrelate consistently with the KTA parameter. Two other modelability measures were employed for benchmarking purposes: the Jaccard distance average over the data set (Div), and a measure related to the normalized mean absolute error (MAE) obtained in 1-nearest neighbors calculations on the training set (Sim = 1 - MAE). It has been demonstrated that both Div and Sim perform similarly to KTA. However, a consensus index combining KTA, Div and Sim provides a more robust correlation with RMSE than any of the individual modelability measures.
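For readers unfamiliar with kernel target alignment, the sketch below computes it in its commonly used form, A = <K, yy^T>_F / (||K||_F ||yy^T||_F), where K is a kernel (similarity) matrix over the compounds and y is the property vector. The choices made here are assumptions for illustration only: the property values are mean-centered, a linear kernel on random synthetic descriptors is used, and the function name is hypothetical. The exact variant used in the paper may differ.

```python
import numpy as np

def kernel_target_alignment(K, y):
    """Alignment between a kernel matrix K (n x n) and the target kernel y y^T.

    Assumption: y is mean-centered before forming y y^T.
    By the Cauchy-Schwarz inequality the result lies in [-1, 1].
    """
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    Y = np.outer(y, y)                      # ideal "target" kernel
    numerator = np.sum(K * Y)               # Frobenius inner product <K, Y>
    return numerator / (np.linalg.norm(K) * np.linalg.norm(Y))

# Synthetic descriptors and two candidate properties: one that depends
# on the descriptors and one that is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
K = X @ X.T                                  # linear kernel on the descriptors
y_informative = X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0])
y_noise = rng.normal(size=50)

kta_informative = kernel_target_alignment(K, y_informative)
kta_noise = kernel_target_alignment(K, y_noise)
```

A higher alignment for `y_informative` than for `y_noise` reflects the intuition behind using KTA as a modelability measure: compounds that the descriptor space considers similar also have similar property values.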
Affiliation(s)
- Gilles Marcou
  - Laboratory of Chemoinformatics, University of Strasbourg, 1 rue Blaise Pascal, 67000 Strasbourg, France
- Dragos Horvath
  - Laboratory of Chemoinformatics, University of Strasbourg, 1 rue Blaise Pascal, 67000 Strasbourg, France
- Alexandre Varnek
  - Laboratory of Chemoinformatics, University of Strasbourg, 1 rue Blaise Pascal, 67000 Strasbourg, France
  - Laboratory of Chemoinformatics, Federal University of Kazan, Kremlevskaya str. 18, 420008 Kazan, Russia
16
Large-Scale Prediction of Beneficial Drug Combinations Using Drug Efficacy and Target Profiles. J Chem Inf Model 2015; 55:2705-16. [PMID: 26624799 DOI: 10.1021/acs.jcim.5b00444] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Indexed: 01/12/2023]
Abstract
The identification of beneficial drug combinations is a challenging issue in pharmaceutical and clinical research toward combinatorial drug therapy. In the present study, we developed a novel computational method for large-scale prediction of beneficial drug combinations using drug efficacy and target profiles. We designed an informative descriptor for each drug-drug pair based on multiple drug profiles representing drug-targeted proteins and Anatomical Therapeutic Chemical Classification System codes. Then, we constructed a predictive model by learning a sparsity-induced classifier based on known drug combinations from the Orange Book and KEGG DRUG databases. Our results show that the proposed method outperforms the previous methods in terms of the accuracy of high-confidence predictions, and the extracted features are biologically meaningful. Finally, we performed a comprehensive prediction of novel drug combinations for 2,639 approved drugs, which predicted 142,988 new potentially beneficial drug-drug pairs. We showed several examples of successfully predicted drug combinations for a variety of diseases.
17
Sidorov P, Gaspar H, Marcou G, Varnek A, Horvath D. Mappability of drug-like space: towards a polypharmacologically competent map of drug-relevant compounds. J Comput Aided Mol Des 2015; 29:1087-108. [PMID: 26564142 DOI: 10.1007/s10822-015-9882-z] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.0] [Received: 10/05/2015] [Accepted: 11/06/2015] [Indexed: 11/30/2022]
Abstract
Intuitive, visual rendering (mapping) of high-dimensional chemical spaces (CS) is an important topic in chemoinformatics. Such maps have so far been dedicated to specific compound collections: either limited series of known activities, or large, even exhaustive enumerations of molecules, but without associated property data. Typically, they were challenged to answer some classification problem with respect to those same molecules, admired for their aesthetic virtues and then forgotten, because they were set-specific constructs. This work addresses the question of whether a general, compound set-independent map can be generated, and the claim of "universality" quantitatively justified, with respect to all the structure-activity information available so far or, more realistically, an exploitable but significant fraction thereof. The "universal" CS map is expected to project molecules from the initial CS into a lower-dimensional space that is neighborhood behavior-compliant with respect to a large panel of ligand properties. Such a map should be able to discriminate actives from inactives, or even support quantitative neighborhood-based, parameter-free property prediction (regression) models, for a wide panel of targets and target families. It should be polypharmacologically competent, without requiring any target-specific parameter fitting. This work describes an evolutionary growth procedure for such maps, based on generative topographic mapping, followed by the validation of their polypharmacological competence. Validation was achieved with respect to a maximum of exploitable structure-activity information, covering all Homo sapiens proteins of the ChEMBL database, antiparasitic and antiviral data, etc. Five evolved maps satisfactorily solved hundreds of activity-based ligand classification challenges for targets, and even in vivo properties, independent of the training data. They also withstood chemogenomics-related challenges, as cumulated responsibility vectors obtained by mapping target-specific ligand collections were shown to represent validated target descriptors, complying with the currently accepted target classification in biology. Therefore, they represent, in our opinion, a robust and well documented answer to the key question "What is a good CS map?"
Affiliation(s)
- Pavel Sidorov
  - Laboratoire de Chémoinformatique, UMR 7140, CNRS-Univ. Strasbourg, 1 rue Blaise Pascal, 67000, Strasbourg, France
  - Laboratory of Chemoinformatics, Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- Helena Gaspar
  - Laboratoire de Chémoinformatique, UMR 7140, CNRS-Univ. Strasbourg, 1 rue Blaise Pascal, 67000, Strasbourg, France
- Gilles Marcou
  - Laboratoire de Chémoinformatique, UMR 7140, CNRS-Univ. Strasbourg, 1 rue Blaise Pascal, 67000, Strasbourg, France
- Alexandre Varnek
  - Laboratoire de Chémoinformatique, UMR 7140, CNRS-Univ. Strasbourg, 1 rue Blaise Pascal, 67000, Strasbourg, France
  - Laboratory of Chemoinformatics, Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- Dragos Horvath
  - Laboratoire de Chémoinformatique, UMR 7140, CNRS-Univ. Strasbourg, 1 rue Blaise Pascal, 67000, Strasbourg, France
|
18
|
Gaspar HA, Baskin II, Marcou G, Horvath D, Varnek A. Stargate GTM: Bridging Descriptor and Activity Spaces. J Chem Inf Model 2015; 55:2403-10. [DOI: 10.1021/acs.jcim.5b00398] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Indexed: 11/29/2022]
Affiliation(s)
- Héléna A. Gaspar
  - Laboratoire de Chemoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France
- Igor I. Baskin
  - Faculty of Physics, M.V. Lomonosov Moscow State University, Leninskie Gory, Moscow 119991, Russia
  - Laboratory of Chemoinformatics, Butlerov Institute of Chemistry, Kazan Federal University, Kazan 420008, Russia
- Gilles Marcou
  - Laboratoire de Chemoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France
- Dragos Horvath
  - Laboratoire de Chemoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France
- Alexandre Varnek
  - Laboratoire de Chemoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France
  - Laboratory of Chemoinformatics, Butlerov Institute of Chemistry, Kazan Federal University, Kazan 420008, Russia
|
19
|
Cortés-Ciriano I, van Westen GJP, Bouvier G, Nilges M, Overington JP, Bender A, Malliavin TE. Improved large-scale prediction of growth inhibition patterns using the NCI60 cancer cell line panel. Bioinformatics 2015; 32:85-95. [PMID: 26351271 PMCID: PMC4681992 DOI: 10.1093/bioinformatics/btv529] [Citation(s) in RCA: 59] [Impact Index Per Article: 6.6] [Received: 02/27/2015] [Accepted: 08/26/2015] [Indexed: 01/28/2023] Open
Abstract
MOTIVATION: Recent large-scale omics initiatives have catalogued the somatic alterations of cancer cell line panels along with their pharmacological response to hundreds of compounds. In this study, we have explored these data to advance computational approaches that enable more effective and targeted use of current and future anticancer therapeutics.
RESULTS: We modelled the 50% growth inhibition bioassay end-point (GI50) of 17,142 compounds screened against 59 cancer cell lines from the NCI60 panel (941,831 data-points, matrix 93.08% complete) by integrating the chemical and biological (cell line) information. We determine that the protein, gene transcript and miRNA abundance provide the highest predictive signal when modelling the GI50 endpoint, which significantly outperformed the DNA copy-number variation or exome sequencing data (Tukey's Honestly Significant Difference, P < 0.05). We demonstrate that, within the limits of the data, our approach exhibits the ability to both interpolate and extrapolate compound bioactivities to new cell lines and tissues and, although to a lesser extent, to dissimilar compounds. Moreover, our approach outperforms previous models generated on the GDSC dataset. Finally, we determine that in the cases investigated in more detail, the predicted drug-pathway associations and growth inhibition patterns are mostly consistent with the experimental data, which also suggests the possibility of identifying genomic markers of drug sensitivity for novel compounds on novel cell lines.
CONTACT: terez@pasteur.fr; ab454@ac.cam.uk
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Isidro Cortés-Ciriano
  - Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR 3825, Structural Biology and Chemistry Department, 75 724 Paris, France
- Gerard J P van Westen
  - Medicinal Chemistry, Leiden Academic Centre for Drug Research, Einsteinweg 55, 2333CC, Leiden
- Guillaume Bouvier
  - Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR 3825, Structural Biology and Chemistry Department, 75 724 Paris, France
- Michael Nilges
  - Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR 3825, Structural Biology and Chemistry Department, 75 724 Paris, France
- John P Overington
  - European Molecular Biology Laboratory European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD, Hinxton, Cambridge, UK
- Andreas Bender
  - Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, CB2 1EW Cambridge, UK
- Thérèse E Malliavin
  - Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR 3825, Structural Biology and Chemistry Department, 75 724 Paris, France
|
20
|
Ain QU, Méndez-Lucio O, Ciriano IC, Malliavin T, van Westen GJP, Bender A. Modelling ligand selectivity of serine proteases using integrative proteochemometric approaches improves model performance and allows the multi-target dependent interpretation of features. Integr Biol (Camb) 2015; 6:1023-33. [PMID: 25255469 DOI: 10.1039/c4ib00175c] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Indexed: 12/12/2022]
Abstract
Serine proteases, implicated in important physiological functions, have a high intra-family similarity, which leads to unwanted off-target effects of inhibitors with insufficient selectivity. However, the availability of sequence and structure data has now made it possible to develop approaches to design pharmacological agents that can discriminate successfully between their related binding sites. In this study, we have quantified the relationship between 12,625 distinct protease inhibitors and their bioactivity against 67 targets of the serine protease family (20,213 data points) in an integrative manner, using proteochemometric modelling (PCM). The benchmarking of 21 different target descriptors motivated the usage of specific binding pocket amino acid descriptors, which helped in the identification of active site residues and selective compound chemotypes affecting compound affinity and selectivity. PCM models performed better than alternative approaches (models trained using exclusively compound descriptors on all available data, QSAR) employed for comparison with R²/RMSE values of 0.64 ± 0.23/0.66 ± 0.20 vs. 0.35 ± 0.27/1.05 ± 0.27 log units, respectively. Moreover, the interpretation of the PCM model singled out various chemical substructures responsible for bioactivity and selectivity towards particular proteases (thrombin, trypsin and coagulation factor 10) in agreement with the literature. For instance, absence of a tertiary sulphonamide was identified to be responsible for decreased selective activity (by on average 0.27 ± 0.65 pChEMBL units) on FA10. Among the binding pocket residues, the amino acids (arginine, leucine and tyrosine) at positions 35, 39, 60, 93, 140 and 207 were observed as key contributing residues for selective affinity on these three targets.
Affiliation(s)
- Qurrat U Ain
  - Centre for Molecular Informatics, Department of Chemistry, Lensfield Road, CB2 1EW, University of Cambridge, UK
|
21
|
Paricharak S, Cortés-Ciriano I, IJzerman AP, Malliavin TE, Bender A. Proteochemometric modelling coupled to in silico target prediction: an integrated approach for the simultaneous prediction of polypharmacology and binding affinity/potency of small molecules. J Cheminform 2015; 7:15. [PMID: 25926892 PMCID: PMC4413554 DOI: 10.1186/s13321-015-0063-9] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 11/18/2014] [Accepted: 03/17/2015] [Indexed: 11/10/2022] Open
Abstract
The rampant increase of public bioactivity databases has fostered the development of computational chemogenomics methodologies to evaluate potential ligand-target interactions (polypharmacology) both in a qualitative and quantitative way. Bayesian target prediction algorithms predict the probability of an interaction between a compound and a panel of targets, thus assessing compound polypharmacology qualitatively, whereas structure-activity relationship techniques are able to provide quantitative bioactivity predictions. We propose an integrated drug discovery pipeline combining in silico target prediction and proteochemometric modelling (PCM) for the respective prediction of compound polypharmacology and potency/affinity. The proposed pipeline was evaluated on the retrospective discovery of Plasmodium falciparum DHFR inhibitors. The qualitative in silico target prediction model comprised 553,084 ligand-target associations (a total of 262,174 compounds), covering 3,481 protein targets and used protein domain annotations to extrapolate predictions across species. The prediction of bioactivities for plasmodial DHFR led to a recall value of 79% and a precision of 100%, where the latter high value arises from the structural similarity of plasmodial DHFR inhibitors and T. gondii DHFR inhibitors in the training set. Quantitative PCM models were then trained on a dataset comprising 20 eukaryotic, protozoan and bacterial DHFR sequences, and 1,505 distinct compounds (in total 3,099 data points). The most predictive PCM model exhibited R₀²test and RMSEtest values of 0.79 and 0.59 pIC50 units respectively, which was shown to outperform models based exclusively on compound (R₀²test/RMSEtest = 0.63/0.78) and target information (R₀²test/RMSEtest = 0.09/1.22), as well as inductive knowledge transfer between targets, with respective R₀²test and RMSEtest values of 0.76 and 0.63 pIC50 units.
Finally, both methods were integrated to predict the protein targets and the potency on plasmodial DHFR for the GSK TCAMS dataset, which comprises 13,533 compounds displaying strong anti-malarial activity. Of those compounds, 534 were identified as DHFR inhibitors by the target prediction algorithm, while the PCM algorithm identified 25 compounds, and 23 compounds (predicted pIC50 > 7) were identified by both methods. Overall, this integrated approach simultaneously provides target and potency/affinity predictions for small molecules. Graphical abstract: Proteochemometric modelling coupled to in silico target prediction.
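The recall and precision figures quoted in the abstract above follow the usual set-based definitions. As a minimal sketch, assuming hypothetical compound identifiers (none of the values below come from the paper), the two quantities can be computed as follows; the example also shows how a precision of 100% can coexist with a recall below 100%.

```python
def precision_recall(predicted_active, truly_active):
    """Return (precision, recall) for a set of predicted actives.

    precision = true positives / all predicted actives
    recall    = true positives / all true actives
    """
    predicted = set(predicted_active)
    actual = set(truly_active)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical compound IDs, for illustration only.
known_inhibitors = {"c1", "c2", "c3", "c4", "c5"}
flagged_by_model = {"c1", "c2", "c3", "c4"}

precision, recall = precision_recall(flagged_by_model, known_inhibitors)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Every flagged compound here is a true inhibitor, so precision is perfect, while one known inhibitor is missed, which lowers recall.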
Affiliation(s)
- Shardul Paricharak
  - Department of Chemistry, Centre for Molecular Science Informatics, University of Cambridge, Lensfield Road, CB2 1EW Cambridge, UK
  - Division of Medicinal Chemistry, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA Leiden, The Netherlands
- Isidro Cortés-Ciriano
  - Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR 3825, Structural Biology and Chemistry Department, 25-28, rue du Dr. Roux, 75 724 Paris, France
- Adriaan P IJzerman
  - Division of Medicinal Chemistry, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA Leiden, The Netherlands
- Thérèse E Malliavin
  - Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR 3825, Structural Biology and Chemistry Department, 25-28, rue du Dr. Roux, 75 724 Paris, France
- Andreas Bender
  - Department of Chemistry, Centre for Molecular Science Informatics, University of Cambridge, Lensfield Road, CB2 1EW Cambridge, UK
|
22
|
Cortes-Ciriano I, Murrell DS, van Westen GJ, Bender A, Malliavin TE. Prediction of the potency of mammalian cyclooxygenase inhibitors with ensemble proteochemometric modeling. J Cheminform 2015; 7:1. [PMID: 25705261 PMCID: PMC4335128 DOI: 10.1186/s13321-014-0049-z] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.0] [Received: 08/01/2014] [Accepted: 11/21/2014] [Indexed: 12/16/2022] Open
Abstract
Cyclooxygenases (COX) are present in the body in two isoforms: COX-1, constitutively expressed, and COX-2, induced in physiopathological conditions such as cancer or chronic inflammation. The inhibition of COX with non-steroidal anti-inflammatory drugs (NSAIDs) is the most widely used treatment for chronic inflammation despite the adverse effects associated with prolonged NSAID intake. Although selective COX-2 inhibition has been shown not to palliate all adverse effects (e.g. cardiotoxicity), there are still niche populations which can benefit from selective COX-2 inhibition. Thus, capitalizing on bioactivity data from both isoforms simultaneously would contribute to developing COX inhibitors with better safety profiles. We applied ensemble proteochemometric modeling (PCM) for the prediction of the potency of 3,228 distinct COX inhibitors on 11 mammalian cyclooxygenases. Ensemble PCM models ([Formula: see text], and RMSEtest = 0.71) outperformed models exclusively trained on compound ([Formula: see text], and RMSEtest = 1.09) or protein descriptors ([Formula: see text] and RMSEtest = 1.10) on the test set. Moreover, PCM predicted COX potency for 1,086 selective and non-selective COX inhibitors with [Formula: see text] and RMSEtest = 0.76. These values are in agreement with the maximum and minimum achievable [Formula: see text] and RMSEtest values of approximately 0.68 for both metrics. Confidence intervals for individual predictions were calculated from the standard deviation of the predictions from the individual models composing the ensembles. Finally, two substructure analysis pipelines singled out chemical substructures implicated in both potency and selectivity in agreement with the literature. Graphical Abstract: Prediction of uncorrelated bioactivity profiles for mammalian COX inhibitors with Ensemble Proteochemometric Modeling.
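The abstract above states that confidence intervals for individual predictions were derived from the standard deviation across ensemble members. A minimal sketch of that idea follows. The abstract does not say how the standard deviation was turned into an interval, so the normal-approximation multiplier `z = 1.96`, the use of the sample standard deviation, and the prediction values are all assumptions made here for illustration.

```python
from statistics import mean, stdev


def ensemble_interval(member_predictions, z=1.96):
    """Return (center, lower, upper) for one compound.

    member_predictions: predictions from the individual ensemble models.
    z: assumed interval half-width in standard deviations (not from the paper).
    """
    center = mean(member_predictions)
    spread = stdev(member_predictions)  # sample standard deviation across models
    return center, center - z * spread, center + z * spread


# Hypothetical pIC50 predictions from five ensemble members for one compound.
member_predictions = [6.1, 6.4, 5.9, 6.3, 6.3]
center, lower, upper = ensemble_interval(member_predictions)
print(f"prediction={center:.2f} interval=[{lower:.2f}, {upper:.2f}]")
```

A wider spread among the ensemble members yields a wider interval, which is what makes the per-compound standard deviation usable as an uncertainty estimate.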
Affiliation(s)
- Isidro Cortes-Ciriano
  - Département de Biologie Structurale et Chimie, Institut Pasteur, Unité de Bioinformatique Structurale; CNRS UMR 3825, 25, rue du Dr Roux, Paris, 75015 France
- Daniel S Murrell
  - Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
- Gerard Jp van Westen
  - European Molecular Biology Laboratory European Bioinformatics Institute Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- Andreas Bender
  - Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
- Thérèse E Malliavin
  - Département de Biologie Structurale et Chimie, Institut Pasteur, Unité de Bioinformatique Structurale; CNRS UMR 3825, 25, rue du Dr Roux, Paris, 75015 France
|
23
|
Cortes-Ciriano I, van Westen GJ, Lenselink EB, Murrell DS, Bender A, Malliavin T. Proteochemometric modeling in a Bayesian framework. J Cheminform 2014; 6:35. [PMID: 25045403 PMCID: PMC4083135 DOI: 10.1186/1758-2946-6-35] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 03/05/2014] [Accepted: 06/18/2014] [Indexed: 11/10/2022] Open
Abstract
Proteochemometrics (PCM) is an approach for bioactivity predictive modeling which models the relationship between protein and chemical information. Gaussian Processes (GP), based on Bayesian inference, provide the most objective estimation of the uncertainty of the predictions, thus permitting the evaluation of the applicability domain (AD) of the model. Furthermore, the experimental error on bioactivity measurements can be used as input for this probabilistic model. In this study, we apply GP implemented with a panel of kernels on three diverse, multispecies PCM datasets. The first dataset consisted of information from 8 human and rat adenosine receptors with 10,999 small molecule ligands and their binding affinity. The second consisted of the catalytic activity of four dengue virus NS3 proteases on 56 small peptides. Finally, we have gathered bioactivity information of small molecule ligands on 91 aminergic GPCRs from 9 different species, leading to a dataset of 24,593 datapoints with a matrix completeness of only 2.43%. GP models trained on these datasets are statistically sound, at the same level of statistical significance as Support Vector Machines (SVM), with R₀² values on the external dataset ranging from 0.68 to 0.92, and RMSEP values close to the experimental error. Furthermore, the best GP models obtained with the normalized polynomial and radial kernels provide confidence intervals for the predictions in agreement with the cumulative Gaussian distribution. GP models were also interpreted on the basis of individual targets and of ligand descriptors. In the dengue dataset, the model interpretation in terms of the amino-acid positions in the tetra-peptide ligands gave biologically meaningful results.
Affiliation(s)
- Isidro Cortes-Ciriano
  - Institut Pasteur, Unité de Bioinformatique Structurale; CNRS UMR 3825; Département de Biologie Structurale et Chimie
- Gerard Jp van Westen
  - ChEMBL Group, European Molecular Biology Laboratory European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD, Hinxton, Cambridge, UK
- Eelke Bart Lenselink
  - Division of Medicinal Chemistry, Leiden Academic Center for Drug Research, Leiden, The Netherlands
- Daniel S Murrell
  - Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
- Andreas Bender
  - Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
- Thérèse Malliavin
  - Institut Pasteur, Unité de Bioinformatique Structurale; CNRS UMR 3825; Département de Biologie Structurale et Chimie
|