1
Catalá TS, Speidel LG, Wenzel-Storjohann A, Dittmar T, Tasdemir D. Bioactivity profile of dissolved organic matter and its relation to molecular composition. Natural Products and Bioprospecting 2023; 13:32. [PMID: 37721596 PMCID: PMC10507005 DOI: 10.1007/s13659-023-00395-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/23/2023] [Accepted: 08/30/2023] [Indexed: 09/19/2023]
Abstract
Dissolved organic matter (DOM) occupies a huge and uncharted molecular space. Given its properties, DOM can be presented as a promising biotechnological resource. However, research into bioactivities of DOM is still in early stages. In this study, the biotechnological potential of terrestrial and marine DOM, its molecular composition and their relationships are investigated. Samples were screened for their in vitro antibacterial, antifungal, anticancer and antioxidant activities. Antibacterial activity was detected against Staphylococcus aureus in almost all DOM samples, with freshwater DOM showing the lowest IC50 values. Most samples also inhibited Staphylococcus epidermidis, and four DOM extracts showed up to fourfold higher potency than the reference drug. Antifungal activity was limited to only porewater DOM towards human dermatophyte Trichophyton rubrum. No significant in vitro anticancer activity was observed. Low antioxidant potential was exerted. The molecular characterization by FT-ICR MS allowed a broad compositional overview. Three main distinguished groups have been identified by PCoA analyses. Antibacterial activities are related to high aromaticity content and highly-unsaturated molecular formulae (O-poor). Antifungal effect is correlated with highly-unsaturated molecular formulae (O-rich). Antioxidant activity is positively related to the presence of double bonds and polyphenols. This study evidenced for the first time antibacterial and antifungal activity in DOM with potential applications in cosmeceutical, pharmaceutical and aquaculture industry. The lack of cytotoxicity and the almost unlimited presence of this organic material may open new avenues in future marine bioprospecting efforts.
Affiliation(s)
- Teresa S Catalá
  - Global Society Institute, Wälderhaus, Hamburg, Germany
  - Organization for Science, Education and Global Society gGmbH, Stuttgart, Germany
  - ICBM-MPI Bridging Group for Marine Geochemistry, Institute for Chemistry and Biology of the Marine Environment (ICBM), University of Oldenburg, Oldenburg, Germany
- Linn G Speidel
  - ICBM-MPI Bridging Group for Marine Geochemistry, Institute for Chemistry and Biology of the Marine Environment (ICBM), University of Oldenburg, Oldenburg, Germany
  - Geological Institute, Department of Earth Sciences, ETH Zurich, 8092, Zurich, Switzerland
- Arlette Wenzel-Storjohann
  - GEOMAR Centre for Marine Biotechnology, Research Unit Marine Natural Products Chemistry, GEOMAR Helmholtz Centre for Ocean Research Kiel, Am Kiel-Kanal 44, 24106, Kiel, Germany
- Thorsten Dittmar
  - ICBM-MPI Bridging Group for Marine Geochemistry, Institute for Chemistry and Biology of the Marine Environment (ICBM), University of Oldenburg, Oldenburg, Germany
  - Helmholtz Institute for Functional Marine Biodiversity, University of Oldenburg, Oldenburg, Germany
- Deniz Tasdemir
  - GEOMAR Centre for Marine Biotechnology, Research Unit Marine Natural Products Chemistry, GEOMAR Helmholtz Centre for Ocean Research Kiel, Am Kiel-Kanal 44, 24106, Kiel, Germany
  - Kiel University, Christian-Albrechts-Platz 4, 24118, Kiel, Germany
2
Shimizu Y, Yonezawa T, Bao Y, Sakamoto J, Yokogawa M, Furuya T, Osawa M, Ikeda K. Applying deep learning to iterative screening of medium-sized molecules for protein-protein interaction-targeted drug discovery. Chem Commun (Camb) 2023; 59:6722-6725. [PMID: 37191131 DOI: 10.1039/d3cc01283b] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 05/17/2023]
Abstract
We combined a library of medium-sized molecules with iterative screening using multiple machine learning algorithms that were ligand-based, which resulted in a large increase of the hit rate against a protein-protein interaction target. This was demonstrated by inhibition assays using a PPI target, Kelch-like ECH-associated protein 1/nuclear factor erythroid 2-related factor 2 (Keap1/Nrf2), and a deep neural network model based on the first-round assay data showed a highest hit rate of 27.3%. Using the models, we identified novel active and non-flat compounds far from public datasets, expanding the chemical space.
Affiliation(s)
- Yugo Shimizu
  - Division of Physics for Life Functions, Keio University Faculty of Pharmacy, 1-5-30 Shibakoen, Minato-ku, Tokyo 105-8512, Japan
- Tomoki Yonezawa
  - Division of Physics for Life Functions, Keio University Faculty of Pharmacy, 1-5-30 Shibakoen, Minato-ku, Tokyo 105-8512, Japan
- Yu Bao
  - Division of Physics for Life Functions, Keio University Faculty of Pharmacy, 1-5-30 Shibakoen, Minato-ku, Tokyo 105-8512, Japan
- Junichi Sakamoto
  - Axcelead Drug Discovery Partners, Inc., 26-1, Muraoka-Higashi 2-chome, Fujisawa, Kanagawa 251-0012, Japan
- Mariko Yokogawa
  - Division of Physics for Life Functions, Keio University Faculty of Pharmacy, 1-5-30 Shibakoen, Minato-ku, Tokyo 105-8512, Japan
- Toshio Furuya
  - Drug Discovery Department, Research & Development Division, PharmaDesign, Inc., Hatchobori 2-19-8, Chuo-ku, Tokyo 104-0032, Japan
- Masanori Osawa
  - Division of Physics for Life Functions, Keio University Faculty of Pharmacy, 1-5-30 Shibakoen, Minato-ku, Tokyo 105-8512, Japan
- Kazuyoshi Ikeda
  - Division of Physics for Life Functions, Keio University Faculty of Pharmacy, 1-5-30 Shibakoen, Minato-ku, Tokyo 105-8512, Japan
  - HPC- and AI-driven Drug Development Platform Division, Center for Computational Science, RIKEN, Yokohama 230-0045, Japan
3
Bajusz D, Keserű GM. Maximizing the integration of virtual and experimental screening in hit discovery. Expert Opin Drug Discov 2022; 17:629-640. [PMID: 35671403 DOI: 10.1080/17460441.2022.2085685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022]
Abstract
INTRODUCTION Experimental and virtual screening contributes to the discovery of more than 50% of clinical candidates. Considering the similar concept and goals, early-phase drug discovery would benefit from the effective integration of these approaches. AREAS COVERED After reviewing the recent trends in both experimental and virtual screening, the authors discuss different integration strategies from parallel, focused, sequential, and iterative screening. Strategic considerations are demonstrated in a number of real-life case studies. EXPERT OPINION Experimental and virtual screening are complementary approaches that should be integrated in lead discovery settings. Virtual screening can access extremely large synthetically feasible chemical space that can be effectively searched on GPU clusters or cloud architectures. Experimental screening provides reliable datasets by quantitative HTS applications, and DNA-encoded libraries (DEL) have enlarged the chemical space covered by these technologies. These developments, together with the use of artificial intelligence methods, represent new options for their efficient integration. The case studies discussed here demonstrate the benefits of complementary strategies, such as focused and iterative screening.
Affiliation(s)
- Dávid Bajusz
  - Medicinal Chemistry Research Group, Research Centre for Natural Sciences, Budapest, Hungary
- György M Keserű
  - Medicinal Chemistry Research Group, Research Centre for Natural Sciences, Budapest, Hungary
4
5
Schuffenhauer A, Schneider N, Hintermann S, Auld D, Blank J, Cotesta S, Engeloch C, Fechner N, Gaul C, Giovannoni J, Jansen J, Joslin J, Krastel P, Lounkine E, Manchester J, Monovich LG, Pelliccioli AP, Schwarze M, Shultz MD, Stiefl N, Baeschlin DK. Evolution of Novartis' Small Molecule Screening Deck Design. J Med Chem 2020; 63:14425-14447. [PMID: 33140646 DOI: 10.1021/acs.jmedchem.0c01332] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Indexed: 12/22/2022]
Abstract
This article summarizes the evolution of the screening deck at the Novartis Institutes for BioMedical Research (NIBR). Historically, the screening deck was an assembly of all available compounds. In 2015, we designed a first deck to facilitate access to diverse subsets with optimized properties. We allocated the compounds as plated subsets on a 2D grid with property based ranking in one dimension and increasing structural redundancy in the other. The learnings from the 2015 screening deck were applied to the design of a next generation in 2019. We found that using traditional leadlikeness criteria (mainly MW, clogP) reduces the hit rates of attractive chemical starting points in subset screening. Consequently, the 2019 deck relies on solubility and permeability to select preferred compounds. The 2019 design also uses NIBR's experimental assay data and inferred biological activity profiles in addition to structural diversity to define redundancy across the compound sets.
Affiliation(s)
- Ansgar Schuffenhauer
  - Novartis Institutes for BioMedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Nadine Schneider
  - Novartis Institutes for BioMedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Samuel Hintermann
  - Novartis Institutes for BioMedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Douglas Auld
  - Novartis Institutes for BioMedical Research Inc., 181 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Jutta Blank
  - Novartis Institutes for BioMedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Simona Cotesta
  - Novartis Institutes for BioMedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Caroline Engeloch
  - Novartis Institutes for BioMedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Nikolas Fechner
  - Novartis Institutes for BioMedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Christoph Gaul
  - Novartis Institutes for BioMedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Jerome Giovannoni
  - Novartis Institutes for BioMedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Johanna Jansen
  - Novartis Institutes for BioMedical Research-Emeryville, 5300 Chiron Way, Emeryville, California 94608-2916, United States
- John Joslin
  - Genomics Institute of the Novartis Foundation, San Diego, California 92121, United States
- Philipp Krastel
  - Novartis Institutes for BioMedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Eugen Lounkine
  - Novartis Institutes for BioMedical Research Inc., 181 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- John Manchester
  - Novartis Institutes for BioMedical Research Inc., 181 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Lauren G Monovich
  - Novartis Institutes for BioMedical Research Inc., 181 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Anna Paola Pelliccioli
  - Novartis Institutes for BioMedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Manuel Schwarze
  - Novartis Institutes for BioMedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Michael D Shultz
  - Novartis Institutes for BioMedical Research Inc., 181 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Nikolaus Stiefl
  - Novartis Institutes for BioMedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Daniel K Baeschlin
  - Novartis Institutes for BioMedical Research, Novartis Campus, CH-4002 Basel, Switzerland
6
Burggraaff L, Lenselink EB, Jespers W, van Engelen J, Bongers BJ, González MG, Liu R, Hoos HH, van Vlijmen HWT, IJzerman AP, van Westen GJP. Successive Statistical and Structure-Based Modeling to Identify Chemically Novel Kinase Inhibitors. J Chem Inf Model 2020; 60:4283-4295. [PMID: 32343143 PMCID: PMC7525794 DOI: 10.1021/acs.jcim.9b01204] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/20/2022]
Abstract
Kinases are frequently studied in the context of anticancer drugs. Their involvement in cell responses, such as proliferation, differentiation, and apoptosis, makes them interesting subjects in multitarget drug design. In this study, a workflow is presented that models the bioactivity spectra for two panels of kinases: (1) inhibition of RET, BRAF, SRC, and S6K, while avoiding inhibition of MKNK1, TTK, ERK8, PDK1, and PAK3, and (2) inhibition of AURKA, PAK1, FGFR1, and LKB1, while avoiding inhibition of PAK3, TAK1, and PIK3CA. Both statistical and structure-based models were included, which were thoroughly benchmarked and optimized. A virtual screening was performed to test the workflow for one of the main targets, RET kinase. This resulted in 5 novel and chemically dissimilar RET inhibitors with remaining RET activity of <60% (at a concentration of 10 μM) and similarities with known RET inhibitors from 0.18 to 0.29 (Tanimoto, ECFP6). The four more potent inhibitors were assessed in a concentration range and proved to be modestly active with a pIC50 value of 5.1 for the most active compound. The experimental validation of inhibitors for RET strongly indicates that the multitarget workflow is able to detect novel inhibitors for kinases, and hence, this workflow can potentially be applied in polypharmacology modeling. We conclude that this approach can identify new chemical matter for existing targets. Moreover, this workflow can easily be applied to other targets as well.
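To make the two quantities quoted in this abstract concrete, the following is a minimal sketch of a Tanimoto coefficient on fingerprints and of the pIC50-to-IC50 conversion. It assumes fingerprints are represented as sets of on-bit indices; the bit sets `query` and `known` are made-up illustrations, not ECFP6 fingerprints of any compound from the study, and the function names are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) coefficient of two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def pic50_to_ic50_molar(pic50):
    """pIC50 is -log10(IC50 in mol/L), so IC50 = 10 ** (-pIC50)."""
    return 10.0 ** (-pic50)


# Illustrative on-bit sets (assumed values, not real compound fingerprints).
query = {1, 4, 9, 16, 25, 36}
known = {4, 9, 49, 64, 81, 100, 121}

similarity = tanimoto(query, known)          # 2 shared bits out of 11 distinct bits
ic50_micromolar = pic50_to_ic50_molar(5.1) * 1e6

print(f"Tanimoto similarity: {similarity:.2f}")
print(f"IC50 for pIC50 5.1: {ic50_micromolar:.1f} uM")
```

A similarity near 0.18 means the two fingerprints share fewer than one in five of their combined on-bits, which is why values in the 0.18 to 0.29 range are described as chemically dissimilar. A pIC50 of 5.1 corresponds to an IC50 of roughly 8 μM.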
Affiliation(s)
- Lindsey Burggraaff
  - Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
- Eelke B Lenselink
  - Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
- Willem Jespers
  - Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
  - Department of Cell and Molecular Biology, Uppsala University, Uppsala 75124, Sweden
- Jesper van Engelen
  - Department of Computer Science, Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands
- Brandon J Bongers
  - Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
- Marina Gorostiola González
  - Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
- Rongfang Liu
  - Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
- Holger H Hoos
  - Department of Computer Science, Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands
- Herman W T van Vlijmen
  - Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
  - Janssen Research & Development, Turnhoutseweg 30, 2340 Beerse, Belgium
- Adriaan P IJzerman
  - Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
- Gerard J P van Westen
  - Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
7
Dreiman GHS, Bictash M, Fish PV, Griffin L, Svensson F. Changing the HTS Paradigm: AI-Driven Iterative Screening for Hit Finding. SLAS Discovery 2020; 26:257-262. [PMID: 32808550 PMCID: PMC7838329 DOI: 10.1177/2472555220949495] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 11/16/2022]
Abstract
Iterative screening is a process in which screening is done in batches, with each batch filled by using machine learning to select the most promising compounds from the library based on the previous results. We believe iterative screening is poised to enhance the screening process by improving hit finding while at the same time reducing the number of compounds screened. In addition, we see this process as a key enabler of next-generation high-throughput screening (HTS), which uses more complex assays that better describe the biology but demand more resource per screened compound. To demonstrate the utility of these methods, we retrospectively analyze HTS data from PubChem with a focus on machine learning–based screening strategies that can be readily implemented in practice. Our results show that over a variety of HTS experimental paradigms, an iterative screening setup that screens a total of 35% of the screening collection over as few as three iterations has a median return rate of approximately 70% of the active compounds. Increasing the portion of the library screened to 50% yields median returns of approximately 80% of actives. Using six iterations increases these return rates to 78% and 90%, respectively. The best results were achieved with machine learning models that can be run on a standard desktop. By demonstrating that the utility of iterative screening holds true even with a small number of iterations, and without requiring significant computational resources, we provide a roadmap for the practical implementation of these techniques in hit finding.
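The batch-wise loop described in this abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' pipeline: the "model" is a hypothetical 1-nearest-neighbour ranker standing in for the machine learning models benchmarked in the paper, the library is 900 random 2D points, and `toy_assay` is a made-up stand-in for an HTS readout. All function and variable names are invented for this example.

```python
import random


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def iterative_screen(library, assay, n_iterations, batch_size, seed=0):
    """Screen `library` in batches; return (screened indices, confirmed actives).

    `assay(i)` stands in for the experimental measurement of compound i.
    The ranking step is an assumed 1-nearest-neighbour heuristic.
    """
    rng = random.Random(seed)
    unscreened = set(range(len(library)))
    screened, actives = set(), set()
    for _ in range(n_iterations):
        size = min(batch_size, len(unscreened))
        if not actives:
            # No actives known yet: fall back to a random batch.
            batch = rng.sample(sorted(unscreened), size)
        else:
            # Rank untested compounds by distance to the closest known active.
            def score(i):
                return min(squared_distance(library[i], library[a]) for a in actives)

            batch = sorted(unscreened, key=score)[:size]
        for i in batch:
            unscreened.discard(i)
            screened.add(i)
            if assay(i):
                actives.add(i)
    return screened, actives


# Toy library: 900 random points; "actives" lie in a small disc around (0.5, 0.5).
_rng = random.Random(42)
library = [(_rng.random(), _rng.random()) for _ in range(900)]


def toy_assay(i):
    return squared_distance(library[i], (0.5, 0.5)) < 0.01


screened, found = iterative_screen(library, toy_assay, n_iterations=3, batch_size=100)
total_actives = sum(toy_assay(i) for i in range(len(library)))
print(f"screened {len(screened)} of {len(library)}; found {len(found)} of {total_actives} actives")
```

The first batch is random because no labels exist yet; each later batch is chosen from the results so far, which is the defining feature of iterative screening. The return rates reported in the abstract come from the authors' retrospective PubChem analysis and should not be expected from this toy setup.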
Affiliation(s)
- Gabriel H S Dreiman
  - The Alzheimer's Research UK University College London Drug Discovery Institute, London, UK
  - Department of Computer Science, University College London, London, UK
- Magda Bictash
  - The Alzheimer's Research UK University College London Drug Discovery Institute, London, UK
- Paul V Fish
  - The Alzheimer's Research UK University College London Drug Discovery Institute, London, UK
- Lewis Griffin
  - Department of Computer Science, University College London, London, UK
- Fredrik Svensson
  - The Alzheimer's Research UK University College London Drug Discovery Institute, London, UK
8
Reker D. Practical considerations for active machine learning in drug discovery. Drug Discovery Today. Technologies 2020; 32-33:73-79. [PMID: 33386097 DOI: 10.1016/j.ddtec.2020.06.001] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 04/15/2020] [Revised: 06/01/2020] [Accepted: 06/10/2020] [Indexed: 02/01/2023]
Abstract
Active machine learning enables the automated selection of the most valuable next experiments to improve predictive modelling and hasten active retrieval in drug discovery. Although a long established theoretical concept and introduced to drug discovery approximately 15 years ago, the deployment of active learning technology in the discovery pipelines across academia and industry remains slow. With the recent re-discovered enthusiasm for artificial intelligence as well as improved flexibility of laboratory automation, active learning is expected to surge and become a key technology for molecular optimizations. This review recapitulates key findings from previous active learning studies to highlight the challenges and opportunities of applying adaptive machine learning to drug discovery. Specifically, considerations regarding implementation, infrastructural integration, and expected benefits are discussed. By focusing on these practical aspects of active learning, this review aims at providing insights for scientists planning to implement active learning workflows in their discovery pipelines.
Affiliation(s)
- Daniel Reker
  - Koch Institute for Integrative Cancer Research and MIT-IBM Watson AI Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Division of Gastroenterology, Hepatology and Endoscopy, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
9
Škuta C, Cortés-Ciriano I, Dehaen W, Kříž P, van Westen GJP, Tetko IV, Bender A, Svozil D. QSAR-derived affinity fingerprints (part 1): fingerprint construction and modeling performance for similarity searching, bioactivity classification and scaffold hopping. J Cheminform 2020; 12:39. [PMID: 33431038 PMCID: PMC7260783 DOI: 10.1186/s13321-020-00443-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 08/07/2019] [Accepted: 05/16/2020] [Indexed: 02/11/2023] Open
Abstract
An affinity fingerprint is a vector consisting of a compound's affinity or potency against a reference panel of protein targets. Here, we present the QAFFP fingerprint, a 440-element in silico QSAR-based affinity fingerprint whose components are predicted by Random Forest regression models trained on bioactivity data from the ChEMBL database. Both real-valued (rv-QAFFP) and binary (b-QAFFP) versions of the QAFFP fingerprint were implemented, and their performance in similarity searching, biological activity classification and scaffold hopping was assessed and compared to that of the 1024-bit Morgan2 fingerprint (the RDKit implementation of the ECFP4 fingerprint). In both similarity searching and biological activity classification, the QAFFP fingerprint yields retrieval rates, measured by AUC (~ 0.65 and ~ 0.70 for similarity searching depending on data sets, and ~ 0.85 for classification) and EF5 (~ 4.67 and ~ 5.82 for similarity searching depending on data sets, and ~ 2.10 for classification), comparable to those of the Morgan2 fingerprint (similarity searching AUC of ~ 0.57 and ~ 0.66, and EF5 of ~ 4.09 and ~ 6.41, depending on data sets, classification AUC of ~ 0.87, and EF5 of ~ 2.16). However, the QAFFP fingerprint outperforms the Morgan2 fingerprint in scaffold hopping, as it is able to retrieve 1146 of the 1749 existing scaffolds, while the Morgan2 fingerprint retrieves only 864 scaffolds.
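The EF5 values quoted above can be read with the help of a short sketch. It assumes that EF5 denotes the enrichment factor in the top 5% of a ranked list, which is the usual meaning but is not spelled out in the abstract; the scores and labels below are made-up illustrations, and `enrichment_factor` is a hypothetical helper name.

```python
def enrichment_factor(scores, labels, fraction=0.05):
    """Hit rate in the top `fraction` of the ranking divided by the overall hit rate.

    `scores`: higher means predicted more likely active.
    `labels`: 1 for active, 0 for inactive, in the same order as `scores`.
    """
    n = len(scores)
    total_hits = sum(labels)
    if n == 0 or total_hits == 0:
        raise ValueError("need at least one compound and one active")
    n_top = max(1, int(round(n * fraction)))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])
    return (hits_top / n_top) / (total_hits / n)


# 100 compounds already in rank order (index 0 scores highest); 10 are active,
# and 3 of those actives sit among the top 5 ranks.
scores = list(range(100, 0, -1))
active_ranks = {0, 2, 4, 20, 30, 40, 50, 60, 70, 80}
labels = [1 if i in active_ranks else 0 for i in range(100)]

ef5 = enrichment_factor(scores, labels, fraction=0.05)
print(f"EF5 = {ef5:.2f}")
```

In this toy ranking the top 5% has a 60% hit rate against a 10% base rate, so the enrichment factor is about 6. An EF5 of 1 would mean the ranking does no better than random selection.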
Affiliation(s)
- C Škuta
  - CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR, v. v. i., Vídeňská 1083, 142 20, Prague 4, Czech Republic
- I Cortés-Ciriano
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
- W Dehaen
  - CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR, v. v. i., Vídeňská 1083, 142 20, Prague 4, Czech Republic
  - CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague, Czech Republic
- P Kříž
  - Department of Mathematics, Faculty of Chemical Engineering, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague, Czech Republic
- G J P van Westen
  - Computational Drug Discovery, Drug Discovery and Safety, LACDR, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
- I V Tetko
  - Helmholtz Zentrum Muenchen - German Research Center for Environmental Health (GmbH) and BIGCHEM GmbH, Ingolstaedter Landstrasse 1, 85764, Neuherberg, Germany
- A Bender
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
- D Svozil
  - CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR, v. v. i., Vídeňská 1083, 142 20, Prague 4, Czech Republic
  - CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague, Czech Republic
10
Yan XC, Sanders JM, Gao YD, Tudor M, Haidle AM, Klein DJ, Converso A, Lesburg CA, Zang Y, Wood HB. Augmenting Hit Identification by Virtual Screening Techniques in Small Molecule Drug Discovery. J Chem Inf Model 2020; 60:4144-4152. [PMID: 32309939 DOI: 10.1021/acs.jcim.0c00113] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 12/21/2022]
Abstract
Two orthogonal approaches for hit identification in drug discovery are large-scale in vitro and in silico screening. In recent years, due to the emergence of new targets and a rapid increase in the size of the readily synthesizable chemical space, there is a growing emphasis on the integration of the two techniques to improve the hit finding efficiency. Here, we highlight three examples of drug discovery projects at Merck & Co., Inc., Kenilworth, NJ, USA in which different virtual screening (VS) techniques, each specifically tailored to leverage knowledge available for the target, were utilized to augment the selection of high-quality chemical matter for in vitro assays and to enhance the diversity and tractability of hits. Central to success is a fully integrated workflow combining in silico and experimental expertise at every stage of the hit identification process. We advocate that workflows encompassing VS as part of an integrated hit finding plan should be widely adopted to accelerate hit identification and foster cross-functional collaborations in modern drug discovery.
11
Willems H, De Cesco S, Svensson F. Computational Chemistry on a Budget: Supporting Drug Discovery with Limited Resources. J Med Chem 2020; 63:10158-10169. [DOI: 10.1021/acs.jmedchem.9b02126] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Indexed: 12/16/2022]
Affiliation(s)
- Henriëtte Willems
  - The ALBORADA Drug Discovery Institute, University of Cambridge, Island Research Building, Cambridge Biomedical Campus, Hills Road, Cambridge CB2 0AH, U.K.
- Stephane De Cesco
  - Alzheimer’s Research UK Oxford Drug Discovery Institute, University of Oxford, NDM Research Building, Old Road Campus, Roosevelt Drive, Oxford OX3 7FZ, U.K.
- Fredrik Svensson
  - Alzheimer’s Research UK UCL Drug Discovery Institute, University College London, The Cruciform Building, Gower Street, London WC1E 6BT, U.K.
12
Schroedl S. Current methods and challenges for deep learning in drug discovery. Drug Discovery Today. Technologies 2019; 32-33:9-17. [PMID: 33386100 DOI: 10.1016/j.ddtec.2020.07.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/15/2020] [Revised: 06/17/2020] [Accepted: 07/24/2020] [Indexed: 12/18/2022]
Abstract
Driven by rapid advances in computer hardware and publicly available datasets over the past decade, deep learning has achieved tremendous success in the transformation of many computational disciplines. These novel technologies have had considerable impact on computer-aided drug design as well, throughout all stages of the development pipeline. A flexible toolbox of neural architectures has been developed that are well-suited to represent the sequential, topological, or geometrical concepts of chemistry and biology; and that are able to either discriminate existing molecules or to generate new ones from scratch. For some biochemical prediction tasks, the state of the art has been advanced; however, for complex and practically relevant projects, the outcomes are less clear-cut. Current deep learning methods rely on massive amounts of labeled examples, but drug discovery data is comparatively limited in quantity and quality. These problems need to be resolved and existing sources used more effectively to demonstrate that deep learning can revolutionize the field in general.
13
David L, Arús-Pous J, Karlsson J, Engkvist O, Bjerrum EJ, Kogej T, Kriegl JM, Beck B, Chen H. Applications of Deep-Learning in Exploiting Large-Scale and Heterogeneous Compound Data in Industrial Pharmaceutical Research. Front Pharmacol 2019; 10:1303. [PMID: 31749705 PMCID: PMC6848277 DOI: 10.3389/fphar.2019.01303] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 08/07/2019] [Accepted: 10/14/2019] [Indexed: 12/21/2022] Open
Abstract
In recent years, the development of high-throughput screening (HTS) technologies and their establishment in an industrialized environment have given scientists the possibility to test millions of molecules and profile them against a multitude of biological targets in a short period of time, generating data in a much faster pace and with a higher quality than before. Besides the structure activity data from traditional bioassays, more complex assays such as transcriptomics profiling or imaging have also been established as routine profiling experiments thanks to the advancement of Next Generation Sequencing or automated microscopy technologies. In industrial pharmaceutical research, these technologies are typically established in conjunction with automated platforms in order to enable efficient handling of screening collections of thousands to millions of compounds. To exploit the ever-growing amount of data that are generated by these approaches, computational techniques are constantly evolving. In this regard, artificial intelligence technologies such as deep learning and machine learning methods play a key role in cheminformatics and bio-image analytics fields to address activity prediction, scaffold hopping, de novo molecule design, reaction/retrosynthesis predictions, or high content screening analysis. Herein we summarize the current state of analyzing large-scale compound data in industrial pharmaceutical research and describe the impact it has had on the drug discovery process over the last two decades, with a specific focus on deep-learning technologies.
Collapse
Affiliation(s)
- Laurianne David
- Hit Discovery, Discovery Sciences, Biopharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Department of Life Science Informatics, B-IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
| | - Josep Arús-Pous
- Hit Discovery, Discovery Sciences, Biopharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Department of Chemistry and Biochemistry, University of Bern, Bern, Switzerland
| | - Johan Karlsson
- Quantitative Biology, Discovery Sciences, Biopharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
| | - Ola Engkvist
- Hit Discovery, Discovery Sciences, Biopharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
| | - Esben Jannik Bjerrum
- Hit Discovery, Discovery Sciences, Biopharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
| | - Thierry Kogej
- Hit Discovery, Discovery Sciences, Biopharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
| | - Jan M. Kriegl
- Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riss, Germany
| | - Bernd Beck
- Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riss, Germany
| | - Hongming Chen
- Hit Discovery, Discovery Sciences, Biopharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Chemistry and Chemical Biology Centre, Guangzhou Regenerative Medicine and Health – Guangdong Laboratory, Guangzhou, China
|
14
|
Laufkötter O, Sturm N, Bajorath J, Chen H, Engkvist O. Combining structural and bioactivity-based fingerprints improves prediction performance and scaffold hopping capability. J Cheminform 2019; 11:54. [PMID: 31396716 PMCID: PMC6686534 DOI: 10.1186/s13321-019-0376-1] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 02/19/2019] [Accepted: 07/31/2019] [Indexed: 11/29/2022] Open
Abstract
This study aims at improving upon existing activity prediction methods by augmenting chemical structure fingerprints with bioactivity-based fingerprints derived from high-throughput screening (HTS) data (HTSFPs) and thereby showcasing the benefits of combining different descriptor types. This type of descriptor would be applied in an iterative screening scenario for more targeted compound set selection. The HTSFPs were generated from HTS data obtained from PubChem and combined with an ECFP4 structural fingerprint. The bioactivity-structure hybrid (BaSH) fingerprint was benchmarked against the individual ECFP4 and HTSFP fingerprints. Their performance was evaluated via retrospective analysis of a subset of the PubChem HTS data. Results showed that the BaSH fingerprint has improved predictive performance as well as scaffold hopping capability. The BaSH fingerprint identified unique compounds compared to both the ECFP4 and the HTSFP fingerprint, indicating synergistic effects between the two fingerprints. A feature importance analysis showed that a small subset of the HTSFP features contributes most to the overall performance of the BaSH fingerprint. This hybrid approach allows for activity prediction of compounds with only sparse HTSFPs due to the supporting effect from the structural fingerprint.
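The hybrid idea in this abstract can be illustrated with a toy sketch. It assumes both fingerprints are already available as bit lists, that missing HTS measurements are encoded as 0, and that similarity is measured with the Tanimoto coefficient. The function names and example bit vectors are hypothetical, and the abstract does not state how the authors actually built or scored the BaSH fingerprint.

```python
from typing import List


def bash_fingerprint(structural_bits: List[int], hts_bits: List[int]) -> List[int]:
    """Concatenate a structural fingerprint with a bioactivity fingerprint."""
    return list(structural_bits) + list(hts_bits)


def tanimoto(a: List[int], b: List[int]) -> float:
    """Tanimoto coefficient between two equal-length bit vectors."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


# Hypothetical compounds: B has no recorded HTS activity (a sparse HTSFP).
ecfp_a, hts_a = [1, 0, 1, 1], [1, 0, 1]
ecfp_b, hts_b = [1, 0, 1, 0], [0, 0, 0]

hts_only = tanimoto(hts_a, hts_b)
hybrid = tanimoto(bash_fingerprint(ecfp_a, hts_a), bash_fingerprint(ecfp_b, hts_b))
print(hts_only, hybrid)
```

In this toy case the HTS-only similarity carries no signal for the sparsely profiled compound, while the concatenated vector still reflects the shared structural bits, which is one way to read the "supporting effect" mentioned in the abstract.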
Affiliation(s)
- Oliver Laufkötter
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
| | - Noé Sturm
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - Jürgen Bajorath
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
| | - Hongming Chen
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - Ola Engkvist
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden.
|
15
|
Predicting kinase inhibitors using bioactivity matrix derived informer sets. PLoS Comput Biol 2019; 15:e1006813. [PMID: 31381559 PMCID: PMC6695194 DOI: 10.1371/journal.pcbi.1006813] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/24/2019] [Revised: 08/15/2019] [Accepted: 07/13/2019] [Indexed: 12/21/2022] Open
Abstract
Prediction of compounds that are active against a desired biological target is a common step in drug discovery efforts. Virtual screening methods seek some active-enriched fraction of a library for experimental testing. Where data are too scarce to train supervised learning models for compound prioritization, initial screening must provide the necessary data. Commonly, such an initial library is selected on the basis of chemical diversity by some pseudo-random process (for example, the first few plates of a larger library) or by selecting an entire smaller library. These approaches may not produce a sufficient number or diversity of actives. An alternative approach is to select an informer set of screening compounds on the basis of chemogenomic information from previous testing of compounds against a large number of targets. We compare different ways of using chemogenomic data to choose a small informer set of compounds based on previously measured bioactivity data. We develop this Informer-Based-Ranking (IBR) approach using the Published Kinase Inhibitor Sets (PKIS) as the chemogenomic data to select the informer sets. We test the informer compounds on a target that is not part of the chemogenomic data, then predict the activity of the remaining compounds based on the experimental informer data and the chemogenomic data. Through new chemical screening experiments, we demonstrate the utility of IBR strategies in a prospective test on three kinase targets not included in the PKIS. In the early stages of drug discovery efforts, computational models are used to predict activity and prioritize compounds for experimental testing. New targets commonly lack the data necessary to build effective models, and the screening needed to generate that experimental data can be costly. We seek to improve the efficiency of the initial screening phase, and of the process of prioritizing compounds for subsequent screening. 
We choose a small informer set of compounds based on publicly available prior screening data on distinct targets. We then collect experimental data on these informer compounds and use that data to predict the activity of other compounds in the set for the target of interest. Computational and statistical tools are needed to identify informer compounds and to prioritize other compounds for subsequent phases of screening. We find that selection of informer compounds on the basis of bioactivity data from previous screening efforts is superior to the traditional approach of selection of a chemically diverse subset of compounds. We demonstrate the success of this approach in retrospective tests on the Published Kinase Inhibitor Sets (PKIS) chemogenomic data and in prospective experimental screens against three additional non-human kinase targets.
|
16
|
Jansen JM, De Pascale G, Fong S, Lindvall M, Moser HE, Pfister K, Warne B, Wartchow C. Biased Complement Diversity Selection for Effective Exploration of Chemical Space in Hit-Finding Campaigns. J Chem Inf Model 2019; 59:1709-1714. [DOI: 10.1021/acs.jcim.9b00048] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/02/2023]
Affiliation(s)
- Johanna M. Jansen
- Novartis Institutes for BioMedical Research, 5300 Chiron Way, Emeryville, California 94608, United States
| | - Gianfranco De Pascale
- Novartis Institutes for BioMedical Research, 5300 Chiron Way, Emeryville, California 94608, United States
| | - Susan Fong
- Novartis Institutes for BioMedical Research, 5300 Chiron Way, Emeryville, California 94608, United States
| | - Mika Lindvall
- Novartis Institutes for BioMedical Research, 5300 Chiron Way, Emeryville, California 94608, United States
| | - Heinz E. Moser
- Novartis Institutes for BioMedical Research, 5300 Chiron Way, Emeryville, California 94608, United States
| | - Keith Pfister
- Novartis Institutes for BioMedical Research, 5300 Chiron Way, Emeryville, California 94608, United States
| | - Bob Warne
- Novartis Institutes for BioMedical Research, 5300 Chiron Way, Emeryville, California 94608, United States
| | - Charles Wartchow
- Novartis Institutes for BioMedical Research, 5300 Chiron Way, Emeryville, California 94608, United States
|
17
|
Buendia R, Kogej T, Engkvist O, Carlsson L, Linusson H, Johansson U, Toccaceli P, Ahlberg E. Accurate Hit Estimation for Iterative Screening Using Venn-ABERS Predictors. J Chem Inf Model 2019; 59:1230-1237. [PMID: 30726080 DOI: 10.1021/acs.jcim.8b00724] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
Abstract
Iterative screening has emerged as a promising approach to increase the efficiency of high-throughput screening (HTS) campaigns in drug discovery. By learning from a subset of the compound library, inferences on what compounds to screen next can be made by predictive models. One of the challenges of iterative screening is to decide how many iterations to perform. This is mainly related to difficulties in estimating the prospective hit rate in any given iteration. In this article, a novel method based on Venn-ABERS predictors is proposed. The method provides accurate estimates of the number of hits retrieved in any given iteration during an HTS campaign. The estimates provide the necessary information to support the decision on the number of iterations needed to maximize the screening outcome. Thus, this method offers a prospective screening strategy for early-stage drug discovery.
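As a rough illustration of how calibrated probabilities could feed the stopping decision described here, the sketch below sums per-compound probability intervals into bounds on the expected number of hits in the next batch. It assumes each compound already has a lower and upper probability of being active (Venn-ABERS predictors output such a pair); the function names, the threshold rule, and the numbers are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def expected_hits(prob_intervals: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Sum per-compound (lower, upper) activity probabilities into hit-count bounds."""
    lower = sum(p0 for p0, _ in prob_intervals)
    upper = sum(p1 for _, p1 in prob_intervals)
    return lower, upper


def should_screen_batch(prob_intervals: List[Tuple[float, float]], min_hits: float) -> bool:
    """Hypothetical rule: continue only if the optimistic estimate reaches min_hits."""
    _, upper = expected_hits(prob_intervals)
    return upper >= min_hits


# Hypothetical probability intervals for the next four compounds to be screened.
next_batch = [(0.10, 0.20), (0.60, 0.70), (0.05, 0.15), (0.80, 0.90)]
low, high = expected_hits(next_batch)
print(low, high, should_screen_batch(next_batch, min_hits=1.0))
```

A more conservative variant would compare the lower bound against the threshold instead; which bound to use is a design choice that depends on how costly a wasted iteration is.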
Affiliation(s)
- Ruben Buendia
- Department of Information Technology, University of Borås, SE-501 90 Borås, Sweden
| | - Thierry Kogej
- Discovery Sciences, AstraZeneca IMED Biotech Unit, SE-431 83 Mölndal, Sweden
| | - Ola Engkvist
- Discovery Sciences, AstraZeneca IMED Biotech Unit, SE-431 83 Mölndal, Sweden
| | - Lars Carlsson
- Discovery Sciences, AstraZeneca IMED Biotech Unit, SE-431 83 Mölndal, Sweden
- Department of Computer Science, Royal Holloway, University of London, Egham, Surrey TW20 0EX, United Kingdom
| | - Henrik Linusson
- Department of Information Technology, University of Borås, SE-501 90 Borås, Sweden
| | - Ulf Johansson
- Department of Information Technology, University of Borås, SE-501 90 Borås, Sweden
| | - Paolo Toccaceli
- Department of Computer Science, Royal Holloway, University of London, Egham, Surrey TW20 0EX, United Kingdom
| | - Ernst Ahlberg
- Data Science and AI, Drug Safety & Metabolism, AstraZeneca IMED Biotech Unit, SE-431 83 Mölndal, Sweden
|
18
|
|
19
|
Davies JA. Real-World Synthetic Biology: Is It Founded on an Engineering Approach, and Should It Be? Life (Basel) 2019; 9:life9010006. [PMID: 30621107 PMCID: PMC6463249 DOI: 10.3390/life9010006] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 11/21/2018] [Revised: 12/20/2018] [Accepted: 12/29/2018] [Indexed: 12/22/2022] Open
Abstract
Authors often assert that a key feature of 21st-century synthetic biology is its use of an 'engineering approach': design using predictive models, modular architecture, construction using well-characterized standard parts, and rigorous testing using standard metrics. This article examines whether this is, or even should be, the case. A brief survey of synthetic biology projects that have reached, or are near to, commercial application outside laboratories shows that they had very few of these attributes. Instead, they featured much trial and error, and the use of specialized, custom components and assays. What is more, consideration of the special features of living systems suggests that a conventional engineering approach will often not be helpful. The article concludes that the engineering approach may be useful in some projects, but it should not be used to define or constrain synthetic biological endeavour, and that in fact conventional engineering has more to gain by expanding and embracing more biological ways of working.
Affiliation(s)
- Jamie A Davies
- UK Centre for Mammalian Synthetic Biology, University of Edinburgh, Edinburgh EH8 9YL, UK.
|
20
|
Sturm N, Sun J, Vandriessche Y, Mayr A, Klambauer G, Carlsson L, Engkvist O, Chen H. Application of Bioactivity Profile-Based Fingerprints for Building Machine Learning Models. J Chem Inf Model 2018; 59:962-972. [DOI: 10.1021/acs.jcim.8b00550] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 01/22/2023]
Affiliation(s)
- Noé Sturm
- Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Pepparedsleden 1, 43153 Mölndal, Sweden
| | - Jiangming Sun
- Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Pepparedsleden 1, 43153 Mölndal, Sweden
| | - Yves Vandriessche
- Intel Corporation, Data Center Group, Veldkant 31, 2550 Kontich, Belgium
| | - Andreas Mayr
- LIT AI Lab & Institute for Machine Learning, Johannes Kepler University Linz, Altenbergerstr 69, 4040 Linz, Austria
| | - Günter Klambauer
- LIT AI Lab & Institute for Machine Learning, Johannes Kepler University Linz, Altenbergerstr 69, 4040 Linz, Austria
| | - Lars Carlsson
- Quantitative Biology, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Pepparedsleden 1, 43153 Mölndal, Sweden
| | - Ola Engkvist
- Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Pepparedsleden 1, 43153 Mölndal, Sweden
| | - Hongming Chen
- Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Pepparedsleden 1, 43153 Mölndal, Sweden
|
21
|
Mason DJ, Eastman RT, Lewis RPI, Stott IP, Guha R, Bender A. Using Machine Learning to Predict Synergistic Antimalarial Compound Combinations With Novel Structures. Front Pharmacol 2018; 9:1096. [PMID: 30333748 PMCID: PMC6176478 DOI: 10.3389/fphar.2018.01096] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 05/02/2018] [Accepted: 09/07/2018] [Indexed: 01/28/2023] Open
Abstract
The parasite Plasmodium falciparum is the most lethal species of Plasmodium to cause serious malaria infection in humans. With resistance developing rapidly, novel treatment modalities are currently being sought, one of which is the combination of existing compounds. The discovery of combinations of antimalarial drugs that act synergistically with one another is hence of great importance; however, an exhaustive pairwise experimental screen of a large drug space is not an option. In this study we apply our machine learning approach, Combination Synergy Estimation (CoSynE), which can predict novel synergistic drug interactions using only prior experimental combination screening data and knowledge of compound molecular structures, to a dataset of 1,540 antimalarial drug combinations in which 22.2% were synergistic. Cross validation of our model showed that synergistic CoSynE predictions are enriched 2.74× compared to random selection when both compounds in a predicted combination are known from other combinations among the training data, 2.36× when only one compound is known from the training data, and 1.5× for entirely novel combinations. We prospectively validated our model by making predictions for 185 combinations of 23 entirely novel compounds. CoSynE predicted 20 combinations to be synergistic, which was experimentally validated for nine of them (45%), corresponding to an enrichment of 1.70× compared to random selection from this prospective data set. Such enrichment corresponds to a 41% reduction in experimental effort. Interestingly, we found that pairwise screening of the compounds CoSynE individually predicted to be synergistic would result in an enrichment of 1.36× compared to random selection, indicating that synergy among compound combinations is not a random event.
The nine novel and correctly predicted synergistic compound combinations mainly (where sufficient bioactivity information is available) consist of efflux or transporter inhibitors (such as hydroxyzine), combined with compounds exhibiting antimalarial activity alone (such as sorafenib, apicidin, or dihydroergotamine). However, not all compound synergies could be rationalized easily in this way. Overall, this study highlights the potential for predictive modeling to expedite the discovery of novel drug combinations in the fight against antimalarial resistance, while the underlying approach is also generally applicable.
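The prospective figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only numbers stated above (20 predictions, nine confirmed, 1.70× enrichment). Reading the 41% effort reduction as 1 − 1/enrichment is an assumption that happens to reproduce the reported value, and the background synergy rate is back-calculated rather than reported.

```python
# Figures taken from the abstract.
n_predicted = 20            # combinations predicted synergistic
n_confirmed = 9             # of those, experimentally confirmed
reported_enrichment = 1.70  # versus random selection in the prospective set

# Fraction of predictions that were confirmed (the "45%" in the abstract).
precision = n_confirmed / n_predicted

# Back-calculated background synergy rate implied by the reported enrichment.
implied_base_rate = precision / reported_enrichment

# Assumed interpretation: fewer experiments needed per synergistic hit found.
effort_reduction = 1 - 1 / reported_enrichment

print(f"precision = {precision:.2f}")
print(f"implied base rate = {implied_base_rate:.3f}")
print(f"effort reduction = {effort_reduction:.2f}")
```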
Affiliation(s)
- Daniel J Mason
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Cambridge, United Kingdom
- Healx Ltd., Cambridge, United Kingdom
| | - Richard T Eastman
- Division of Preclinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD, United States
| | - Richard P I Lewis
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Cambridge, United Kingdom
| | - Ian P Stott
- Unilever Research and Development, Wirral, United Kingdom
| | - Rajarshi Guha
- Division of Preclinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD, United States
| | - Andreas Bender
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Cambridge, United Kingdom
|
22
|
Cortés-Ciriano I, Firth NC, Bender A, Watson O. Discovering Highly Potent Molecules from an Initial Set of Inactives Using Iterative Screening. J Chem Inf Model 2018; 58:2000-2014. [PMID: 30130102 DOI: 10.1021/acs.jcim.8b00376] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Indexed: 02/07/2023]
Abstract
The versatility of similarity searching and quantitative structure-activity relationships to model the activity of compound sets within given bioactivity ranges (i.e., interpolation) is well established. However, their relative performance in the common scenario in early stage drug discovery where lots of inactive data but no active data points are available (i.e., extrapolation from the low-activity to the high-activity range) has not been thoroughly examined yet. To this aim, we have designed an iterative virtual screening strategy which was evaluated on 25 diverse bioactivity data sets from ChEMBL. We benchmark the efficiency of random forest (RF), multiple linear regression, ridge regression, similarity searching, and random selection of compounds to identify a highly active molecule in the test set among a large number of low-potency compounds. We use the number of iterations required to find this active molecule to evaluate the performance of each experimental setup. We show that linear and ridge regression often outperform RF and similarity searching, reducing the number of iterations to find an active compound by a factor of 2 or more. Even simple regression methods seem better able to extrapolate to high-bioactivity ranges than RF, which only provides output values in the range covered by the training set. In addition, examination of the scaffold diversity in the data sets used shows that in some cases similarity searching and RF require two times as many iterations as random selection depending on the chemical space covered in the initial training data. Lastly, we show using bioactivity data for COX-1 and COX-2 that our framework can be extended to multitarget drug discovery, where compounds are selected by concomitantly considering their activity against multiple targets. 
Overall, this study provides an approach for iterative screening in the early stages of drug discovery, when only inactive data are available, that helps discover highly potent compounds and identify the best experimental setup for doing so.
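The extrapolation argument in this abstract (a linear model can predict beyond the training activity range, while a predictor that averages or copies training values cannot) can be shown with a one-descriptor toy example. The sketch below is an illustration only: the nearest-neighbour predictor is a simple stand-in for a tree-based model, not a random forest, and the descriptor values and activities are made up.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def nearest_value_predict(xs: List[float], ys: List[float], x_new: float) -> float:
    """Return the activity of the closest training point (never leaves the training range)."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x_new))
    return ys[nearest]


# Hypothetical training set containing only low-activity compounds.
descriptor = [1.0, 2.0, 3.0, 4.0]
activity = [4.0, 4.5, 5.0, 5.5]

slope, intercept = fit_line(descriptor, activity)
linear_prediction = slope * 8.0 + intercept
bounded_prediction = nearest_value_predict(descriptor, activity, 8.0)
print(linear_prediction, bounded_prediction, max(activity))
```

The linear prediction for the unseen descriptor value lies above every training activity, whereas the bounded predictor returns at most the largest activity it has already seen.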
Affiliation(s)
- Isidro Cortés-Ciriano
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
| | - Nicholas C Firth
- Centre for Medical Image Computing, Department of Computer Science, UCL, London WC1E 6BT, United Kingdom
- Evariste Technologies Ltd, Goring on Thames RG8 9AL, United Kingdom
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
| | - Oliver Watson
- Evariste Technologies Ltd, Goring on Thames RG8 9AL, United Kingdom
|
23
|
Leveridge M, Chung CW, Gross JW, Phelps CB, Green D. Integration of Lead Discovery Tactics and the Evolution of the Lead Discovery Toolbox. SLAS DISCOVERY 2018; 23:881-897. [PMID: 29874524 DOI: 10.1177/2472555218778503] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 12/26/2022]
Abstract
There has been much debate around the success rates of various screening strategies to identify starting points for drug discovery. Although high-throughput target-based and phenotypic screening has been the focus of this debate, techniques such as fragment screening, virtual screening, and DNA-encoded library screening are also increasingly reported as a source of new chemical equity. Here, we provide examples in which integration of more than one screening approach has improved the campaign outcome and discuss how strengths and weaknesses of various methods can be used to build a complementary toolbox of approaches, giving researchers the greatest probability of successfully identifying leads. Among others, we highlight case studies for receptor-interacting serine/threonine-protein kinase 1 and the bromo- and extra-terminal domain family of bromodomains. In each example, the unique insight or chemistries individual approaches provided are described, emphasizing the synergy of information obtained from the various tactics employed and the particular question each tactic was employed to answer. We conclude with a short prospective discussing how screening strategies are evolving, what this screening toolbox might look like in the future, how to maximize success through integration of multiple tactics, and scenarios that drive selection of one combination of tactics over another.
Affiliation(s)
- Melanie Leveridge
- GlaxoSmithKline Drug Design and Selection, Platform Technology and Science, Stevenage, Hertfordshire, UK
| | - Chun-Wa Chung
- GlaxoSmithKline Drug Design and Selection, Platform Technology and Science, Stevenage, Hertfordshire, UK
| | - Jeffrey W Gross
- GlaxoSmithKline Drug Design and Selection, Platform Technology and Science, Collegeville, PA, USA
| | - Christopher B Phelps
- GlaxoSmithKline Drug Design and Selection, Platform Technology and Science, Cambridge, MA, USA
| | - Darren Green
- GlaxoSmithKline Drug Design and Selection, Platform Technology and Science, Stevenage, Hertfordshire, UK
|
24
|
Paricharak S, Méndez-Lucio O, Chavan Ravindranath A, Bender A, IJzerman AP, van Westen GJP. Data-driven approaches used for compound library design, hit triage and bioactivity modeling in high-throughput screening. Brief Bioinform 2018; 19:277-285. [PMID: 27789427 PMCID: PMC6018726 DOI: 10.1093/bib/bbw105] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 07/19/2016] [Revised: 09/26/2016] [Indexed: 12/25/2022] Open
Abstract
High-throughput screening (HTS) campaigns are routinely performed in pharmaceutical companies to explore activity profiles of chemical libraries for the identification of promising candidates for further investigation. With the aim of improving hit rates in these campaigns, data-driven approaches have been used to design relevant compound screening collections, enable effective hit triage and perform activity modeling for compound prioritization. Remarkable progress has been made in the activity modeling area since the recent introduction of large-scale bioactivity-based compound similarity metrics. This is evidenced by increased hit rates in iterative screening strategies and novel insights into compound mode of action obtained through activity modeling. Here, we provide an overview of the developments in data-driven approaches, elaborate on novel activity modeling techniques and screening paradigms explored and outline their significance in HTS.
Affiliation(s)
- Shardul Paricharak
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, United Kingdom
- Division of Medicinal Chemistry, Leiden Academic Centre for Drug Research, Leiden University, RA Leiden, The Netherlands
| | - Oscar Méndez-Lucio
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, United Kingdom
- Facultad de Química, Departamento de Farmacia, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City, Mexico
| | - Aakash Chavan Ravindranath
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, United Kingdom
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, United Kingdom
| | - Adriaan P IJzerman
- Division of Medicinal Chemistry, Leiden Academic Centre for Drug Research, Leiden University, RA Leiden, The Netherlands
| | - Gerard J P van Westen
- Division of Medicinal Chemistry, Leiden Academic Centre for Drug Research, Leiden University, RA Leiden, The Netherlands
|
25
|
Svensson F, Afzal AM, Norinder U, Bender A. Maximizing gain in high-throughput screening using conformal prediction. J Cheminform 2018; 10:7. [PMID: 29468427 PMCID: PMC5821614 DOI: 10.1186/s13321-018-0260-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 11/15/2017] [Accepted: 02/09/2018] [Indexed: 11/10/2022] Open
Abstract
Iterative screening has emerged as a promising approach to increase the efficiency of screening campaigns compared to traditional high throughput approaches. By learning from a subset of the compound library, inferences on what compounds to screen next can be made by predictive models, resulting in more efficient screening. One way to evaluate screening is to consider the cost of screening compared to the gain associated with finding an active compound. In this work, we introduce a conformal predictor coupled with a gain-cost function with the aim to maximise gain in iterative screening. Using this setup we were able to show that by evaluating the predictions on the training data, very accurate predictions on what settings will produce the highest gain on the test data can be made. We evaluate the approach on 12 bioactivity datasets from PubChem training the models using 20% of the data. Depending on the settings of the gain-cost function, the settings generating the maximum gain were accurately identified in 8-10 out of the 12 datasets. Broadly, our approach can predict what strategy generates the highest gain based on the results of the cost-gain evaluation: to screen the compounds predicted to be active, to screen all the remaining data, or not to screen any additional compounds. When the algorithm indicates that the predicted active compounds should be screened, our approach also indicates what confidence level to apply in order to maximize gain. Hence, our approach facilitates decision-making and allocation of the resources where they deliver the most value by indicating in advance the likely outcome of a screening campaign.
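The three-way decision described in this abstract (screen the predicted actives, screen everything that remains, or stop) can be sketched with a simple linear gain-cost function. The abstract does not give the authors' actual function, so the form used here (value per hit minus cost per screened compound), the function names, and all numbers are assumptions for illustration. In practice the hit counts would be estimated from predictions on the training data rather than known in advance.

```python
from typing import Dict, Tuple


def gain(n_screened: int, n_hits: int, gain_per_hit: float, cost_per_compound: float) -> float:
    """Assumed linear gain-cost function: value of hits minus cost of screening."""
    return n_hits * gain_per_hit - n_screened * cost_per_compound


def best_strategy(
    options: Dict[str, Tuple[int, int]], gain_per_hit: float, cost_per_compound: float
) -> Tuple[str, Dict[str, float]]:
    """Score each strategy, given (compounds screened, hits found), and pick the best."""
    scores = {
        name: gain(n_screened, n_hits, gain_per_hit, cost_per_compound)
        for name, (n_screened, n_hits) in options.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical estimates: (compounds screened, expected hits) per strategy.
options = {
    "screen predicted actives": (100, 30),
    "screen all remaining": (1000, 50),
    "screen nothing": (0, 0),
}
choice, scores = best_strategy(options, gain_per_hit=20.0, cost_per_compound=1.0)
print(choice, scores)
```

Changing the ratio of gain per hit to cost per compound shifts which strategy wins, which is why the abstract reports results "depending on the settings of the gain-cost function".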
Affiliation(s)
- Fredrik Svensson
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW UK
- IOTA Pharmaceuticals, St Johns Innovation Centre, Cowley Road, Cambridge, CB4 0WS UK
| | - Avid M. Afzal
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW UK
| | - Ulf Norinder
- Unit of Toxicology Sciences, Karolinska Institutet, Swetox, Forskargatan 20, 151 36 Södertälje, Sweden
- Department of Computer and Systems Sciences, Stockholm University, Box 7003, 164 07 Kista, Sweden
| | - Andreas Bender
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW UK
|
26
|
Mason DJ, Stott I, Ashenden S, Weinstein ZB, Karakoc I, Meral S, Kuru N, Bender A, Cokol M. Prediction of Antibiotic Interactions Using Descriptors Derived from Molecular Structure. J Med Chem 2017; 60:3902-3912. [PMID: 28383902 DOI: 10.1021/acs.jmedchem.7b00204] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Indexed: 01/08/2023]
Abstract
Combination antibiotic therapies are clinically important in the fight against bacterial infections. However, the search space of drug combinations is large, making the identification of effective combinations a challenging task. Here, we present a computational framework that uses substructure profiles derived from the molecular structures of drugs and predicts antibiotic interactions. Using a previously published data set of 153 drug pairs, we showed that substructure profiles are useful in predicting synergy. We experimentally measured the interaction of 123 new drug pairs, as a prospective validation set for our approach, and identified 37 new synergistic pairs. Of the 12 pairs predicted to be synergistic, 10 were experimentally validated, corresponding to a 2.8-fold enrichment. Having thus validated our methodology, we produced a compendium of interaction predictions for all pairwise combinations among 100 antibiotics. Our methodology can make reliable antibiotic interaction predictions for any antibiotic pair within the applicability domain of the model since it solely requires chemical structures as an input.
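The 2.8-fold enrichment quoted in this abstract follows from the counts it reports. The sketch below recomputes it as the hit rate among predicted pairs divided by the hit rate in the full prospective set; the function name is hypothetical, and the formula is the standard enrichment-factor definition, which is assumed to be what the authors used.

```python
def enrichment_factor(hits_selected: int, n_selected: int, hits_total: int, n_total: int) -> float:
    """Hit rate in the selected subset divided by the hit rate in the whole set."""
    return (hits_selected / n_selected) / (hits_total / n_total)


# Counts from the abstract: 10 of 12 predicted pairs were synergistic,
# against 37 synergistic pairs among the 123 pairs measured.
ef = enrichment_factor(hits_selected=10, n_selected=12, hits_total=37, n_total=123)
print(f"enrichment = {ef:.2f}")
```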
Affiliation(s)
- Daniel J Mason
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom
| | - Ian Stott
- Unilever Research and Development, Port Sunlight, Wirral CH63 3JW, United Kingdom
| | - Stephanie Ashenden
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom
| | - Zohar B Weinstein
- Boston University School of Medicine, Boston, Massachusetts 02118, United States
| | - Idil Karakoc
- Faculty of Engineering and Natural Sciences, Sabanci University, Tuzla, Istanbul 34956, Turkey
| | - Selin Meral
- Faculty of Engineering and Natural Sciences, Sabanci University, Tuzla, Istanbul 34956, Turkey
| | - Nurdan Kuru
- Faculty of Engineering and Natural Sciences, Sabanci University, Tuzla, Istanbul 34956, Turkey
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom
| | - Murat Cokol
- Faculty of Engineering and Natural Sciences, Sabanci University, Tuzla, Istanbul 34956, Turkey
- Department of Molecular Biology and Microbiology, Tufts University School of Medicine, Boston, Massachusetts 02111, United States
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston, Massachusetts 02115, United States
|
27
|
Pertusi DA, O’Donnell G, Homsher MF, Solly K, Patel A, Stahler SL, Riley D, Finley MF, Finger EN, Adam GC, Meng J, Bell DJ, Zuck PD, Hudak EM, Weber MJ, Nothstein JE, Locco L, Quinn C, Amoss A, Squadroni B, Hartnett M, Heo MR, White T, May SA, Boots E, Roberts K, Cocchiarella P, Wolicki A, Kreamer A, Kutchukian PS, Wassermann AM, Uebele VN, Glick M, Rusinko A, Culberson JC. Prospective Assessment of Virtual Screening Heuristics Derived Using a Novel Fusion Score. SLAS DISCOVERY 2017; 22:995-1006. [DOI: 10.1177/2472555217706058] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/12/2022]
Abstract
High-throughput screening (HTS) is a widespread method in early drug discovery for identifying promising chemical matter that modulates a target or phenotype of interest. Because HTS campaigns involve screening millions of compounds, it is often desirable to initiate screening with a subset of the full collection. Subsequently, virtual screening methods prioritize likely active compounds in the remaining collection in an iterative process. With this approach, orthogonal virtual screening methods are often applied, necessitating the prioritization of hits from different approaches. Here, we introduce a novel method of fusing these prioritizations and benchmark it prospectively on 17 screening campaigns using virtual screening methods in three descriptor spaces. We found that the fusion approach retrieves 15% to 65% more active chemical series than any single machine-learning method and that appropriately weighting contributions of similarity and machine-learning scoring techniques can increase enrichment by 1% to 19%. We also use fusion scoring to evaluate the tradeoff between screening more chemical matter initially in lieu of replicate samples to prevent false-positives and find that the former option leads to the retrieval of more active chemical series. These results represent guidelines that can increase the rate of identification of promising active compounds in future iterative screens.
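The abstract does not spell out how the fusion score is computed, so the following is only a minimal sketch of a generic weighted rank-sum fusion, not the authors' method. It assumes each virtual screening method assigns every compound a score where higher means "more likely active"; the function names `rank_scores` and `fuse`, the method labels, and the weights are all hypothetical.

```python
def rank_scores(scores):
    """Convert raw scores to rank-based scores in (0, 1]; the top compound gets 1.0."""
    order = sorted(scores, key=scores.get, reverse=True)
    n = len(order)
    return {cid: (n - pos) / n for pos, cid in enumerate(order)}


def fuse(method_scores, weights):
    """Weighted rank-sum fusion (illustrative, not the published fusion score).

    method_scores: {method_name: {compound_id: raw_score}}
    weights:       {method_name: weight}; methods without a weight default to 1.0
    Returns compound ids ordered from most to least prioritized.
    Compounds missing from a method simply receive no contribution from it.
    """
    fused = {}
    for name, scores in method_scores.items():
        w = weights.get(name, 1.0)
        for cid, r in rank_scores(scores).items():
            fused[cid] = fused.get(cid, 0.0) + w * r
    # Sort by fused score (descending), breaking ties by compound id.
    return sorted(fused, key=lambda c: (-fused[c], c))


# Hypothetical scores from a similarity search and a machine-learning model.
method_scores = {
    "similarity": {"A": 0.9, "B": 0.2, "C": 0.5},
    "ml": {"A": 0.6, "B": 0.8, "C": 0.1},
}
equal_weighting = fuse(method_scores, {"similarity": 1.0, "ml": 1.0})
ml_heavy = fuse(method_scores, {"similarity": 1.0, "ml": 3.0})
```

Changing the weights changes which compounds would be screened next, which is the kind of similarity-versus-machine-learning weighting trade-off the abstract reports as affecting enrichment.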
Affiliation(s)
- Dante A. Pertusi
- Modeling and Informatics, Merck & Co., Inc., West Point, PA, USA
| | - Gregory O’Donnell
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Merck & Co., Inc., West Point, PA, USA
| | - Michelle F. Homsher
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Merck & Co., Inc., West Point, PA, USA
| | - Kelli Solly
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Merck & Co., Inc., West Point, PA, USA
| | - Amita Patel
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Merck & Co., Inc., West Point, PA, USA
| | - Shannon L. Stahler
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Merck & Co., Inc., West Point, PA, USA
| | - Daniel Riley
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Merck & Co., Inc., West Point, PA, USA
| | - Michael F. Finley
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Discovery Sciences, Janssen Research and Development LLC, Spring House, PA, USA
| | - Eleftheria N. Finger
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Discovery & Preclinical Development, GlaxoSmithKline, Collegeville, PA, USA
| | - Gregory C. Adam
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Merck & Co., Inc., West Point, PA, USA
| | - Juncai Meng
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
| | - David J. Bell
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Merck & Co., Inc., North Wales, PA, USA
| | - Paul D. Zuck
- Merck & Co., Inc., North Wales, PA, USA
- Automation and Engineering, Merck & Co., Inc., North Wales, PA, USA
| | - Edward M. Hudak
- Discovery Sample Management, Merck & Co., Inc., North Wales, PA, USA
| | - Michael J. Weber
- Automation and Engineering, Merck & Co., Inc., North Wales, PA, USA
| | - Jennifer E. Nothstein
- Merck & Co., Inc., West Point, PA, USA
- Automation and Engineering, Merck & Co., Inc., North Wales, PA, USA
| | - Louis Locco
- Automation and Engineering, Merck & Co., Inc., North Wales, PA, USA
| | - Carissa Quinn
- Discovery Sciences, Janssen Research and Development LLC, Spring House, PA, USA
- Automation and Engineering, Merck & Co., Inc., North Wales, PA, USA
| | - Adam Amoss
- Automation and Engineering, Merck & Co., Inc., North Wales, PA, USA
| | - Brian Squadroni
- Merck & Co., Inc., West Point, PA, USA
- Automation and Engineering, Merck & Co., Inc., North Wales, PA, USA
| | - Michelle Hartnett
- Discovery Sciences, Janssen Research and Development LLC, Spring House, PA, USA
- Automation and Engineering, Merck & Co., Inc., North Wales, PA, USA
| | - Mee Ra Heo
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Merck & Co., Inc., North Wales, PA, USA
| | - Tara White
- Discovery Sample Management, Merck & Co., Inc., North Wales, PA, USA
| | - S. Alex May
- Automation and Engineering, Merck & Co., Inc., North Wales, PA, USA
| | - Evelyn Boots
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
| | - Kenneth Roberts
- Automation and Engineering, Merck & Co., Inc., North Wales, PA, USA
| | | | - Alex Wolicki
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
| | - Anthony Kreamer
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Merck & Co., Inc., Kenilworth, NJ, USA
| | | | | | - Victor N. Uebele
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Merck & Co., Inc., North Wales, PA, USA
| | - Meir Glick
- Modeling and Informatics, Merck & Co., Inc., Boston, MA, USA
| | - Andrew Rusinko
- Modeling and Informatics, Merck & Co., Inc., West Point, PA, USA
| | | |
|
28
|
Svensson F, Norinder U, Bender A. Improving Screening Efficiency through Iterative Screening Using Docking and Conformal Prediction. J Chem Inf Model 2017; 57:439-444. [DOI: 10.1021/acs.jcim.6b00532] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Indexed: 12/11/2022]
Affiliation(s)
- Fredrik Svensson
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
| | - Ulf Norinder
- Swetox, Karolinska Institutet, Unit of Toxicology Sciences, Forskargatan 20, SE-151 36 Södertälje, Sweden
- Department of Computer and Systems Sciences, Stockholm University, Box 7003, SE-164 07 Kista, Sweden
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
| |
|
29
|
Kutchukian PS, Warren L, Magliaro BC, Amoss A, Cassaday JA, O’Donnell G, Squadroni B, Zuck P, Pascarella D, Culberson JC, Cooke AJ, Hurzy D, Schlegel KAS, Thomson F, Johnson EN, Uebele VN, Hermes JD, Parmentier-Batteur S, Finley M. Iterative Focused Screening with Biological Fingerprints Identifies Selective Asc-1 Inhibitors Distinct from Traditional High Throughput Screening. ACS Chem Biol 2017; 12:519-527. [PMID: 28032990 DOI: 10.1021/acschembio.6b00913] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 12/27/2022]
Abstract
N-methyl-D-aspartate receptors (NMDARs) mediate glutamatergic signaling that is critical to cognitive processes in the central nervous system, and NMDAR hypofunction is thought to contribute to cognitive impairment observed in both schizophrenia and Alzheimer's disease. One approach to enhance the function of NMDAR is to increase the concentration of an NMDAR coagonist, such as glycine or D-serine, in the synaptic cleft. Inhibition of alanine-serine-cysteine transporter-1 (Asc-1), the primary transporter of D-serine, is attractive because the transporter is localized to neurons in brain regions critical to cognitive function, including the hippocampus and cortical layers III and IV, and is colocalized with D-serine and NMDARs. To identify novel Asc-1 inhibitors, two different screening approaches were performed with whole-cell amino acid uptake in heterologous cells stably expressing human Asc-1: (1) a high-throughput screen (HTS) of 3 million compounds measuring [35S]L-cysteine uptake into cells attached to scintillation proximity assay beads in a 1536-well format and (2) an iterative focused screen (IFS) of a 45 000-compound diversity set using a [3H]D-serine uptake assay with a liquid scintillation plate reader in a 384-well format. Critically important for both screening approaches was the implementation of counter screens to remove nonspecific inhibitors of radioactive amino acid uptake. Furthermore, a 15 000-compound expansion step incorporating both on- and off-target data into chemical and biological fingerprint-based models for selection of additional hits enabled the identification of novel Asc-1-selective chemical matter from the IFS that was not identified in the full-collection HTS.
Affiliation(s)
- Peter S. Kutchukian
- Modeling and Informatics, Merck & Co., Inc., MRL, Boston, Massachusetts, United States
| | - Lee Warren
- Neuroscience, Merck & Co., Inc., MRL, West Point, Pennsylvania, United States
| | - Brian C. Magliaro
- Pharmacology, Merck & Co., Inc., MRL, West Point, Pennsylvania, United States
| | - Adam Amoss
- Screening and Protein Sciences, Merck & Co., Inc., MRL, North Wales, Pennsylvania, United States
| | - Jason A. Cassaday
- Screening and Protein Sciences, Merck & Co., Inc., MRL, North Wales, Pennsylvania, United States
| | - Gregory O’Donnell
- Screening and Protein Sciences, Merck & Co., Inc., MRL, North Wales, Pennsylvania, United States
| | - Brian Squadroni
- Screening and Protein Sciences, Merck & Co., Inc., MRL, North Wales, Pennsylvania, United States
| | - Paul Zuck
- Screening and Protein Sciences, Merck & Co., Inc., MRL, North Wales, Pennsylvania, United States
| | - Danette Pascarella
- Pharmacology, Merck & Co., Inc., MRL, West Point, Pennsylvania, United States
| | - J. Chris Culberson
- Modeling and Informatics, Merck & Co., Inc., MRL, West Point, Pennsylvania, United States
| | - Andrew J. Cooke
- Chemistry, Merck & Co., Inc., MRL, West Point, Pennsylvania, United States
| | - Danielle Hurzy
- Chemistry, Merck & Co., Inc., MRL, West Point, Pennsylvania, United States
| | | | - Fiona Thomson
- Neuroscience, Merck & Co., Inc., MRL, West Point, Pennsylvania, United States
| | - Eric N. Johnson
- Screening and Protein Sciences, Merck & Co., Inc., MRL, North Wales, Pennsylvania, United States
| | - Victor N. Uebele
- Screening and Protein Sciences, Merck & Co., Inc., MRL, North Wales, Pennsylvania, United States
| | - Jeffrey D. Hermes
- Screening and Protein Sciences, Merck & Co., Inc., MRL, North Wales, Pennsylvania, United States
| | | | - Michael Finley
- Screening and Protein Sciences, Merck & Co., Inc., MRL, North Wales, Pennsylvania, United States
| |
|
30
|
Mok NY, Brown N. Applications of Systematic Molecular Scaffold Enumeration to Enrich Structure-Activity Relationship Information. J Chem Inf Model 2016; 57:27-35. [PMID: 27990817 PMCID: PMC6152611 DOI: 10.1021/acs.jcim.6b00386] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 01/02/2023]
Abstract
Establishing structure–activity relationships (SARs) in hit identification during early stage drug discovery is important in accelerating hit confirmation and expansion. We describe the development of EnCore, a systematic molecular scaffold enumeration protocol using single atom mutations, to enhance the application of objective scaffold definitions and to enrich SAR information from analysis of high-throughput screening output. A list of 43 literature medicinal chemistry compound series, each containing a minimum of 100 compounds, published in the Journal of Medicinal Chemistry was collated to validate the protocol. Analysis using the top representative Level 1 scaffolds of this list of literature compound series demonstrated that EnCore could mimic the scaffold exploration conducted when establishing SAR. When EnCore was applied to analyze an HTS library containing over 200 000 compounds, we observed that over 70% of the molecular scaffolds matched extant scaffolds within the library after enumeration. In particular, over 60% of the singleton scaffolds with only one representative compound were found to have structurally related compounds after enumeration. These results illustrate the potential of EnCore to enrich SAR information. A case study using literature cyclooxygenase-2 inhibitors further demonstrates the advantage of EnCore application in establishing SAR from structurally related scaffolds. EnCore complements literature enumeration methods in enabling changes to the physicochemical properties of molecular scaffolds and structural modifications to aliphatic rings and linkers. The enumerated scaffold clusters generated would constitute a comprehensive collection of scaffolds for scaffold morphing and hopping.
Affiliation(s)
- N Yi Mok
- Cancer Research UK Cancer Therapeutics Unit, Division of Cancer Therapeutics, The Institute of Cancer Research, London, SM2 5NG, U.K.
| | - Nathan Brown
- Cancer Research UK Cancer Therapeutics Unit, Division of Cancer Therapeutics, The Institute of Cancer Research, London, SM2 5NG, U.K.
| |
|
31
|
Paricharak S, IJzerman AP, Jenkins JL, Bender A, Nigsch F. Data-Driven Derivation of an "Informer Compound Set" for Improved Selection of Active Compounds in High-Throughput Screening. J Chem Inf Model 2016; 56:1622-30. [PMID: 27487177 DOI: 10.1021/acs.jcim.6b00244] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 11/29/2022]
Abstract
Despite the usefulness of high-throughput screening (HTS) in drug discovery, for some systems, low assay throughput or high screening cost can prohibit the screening of large numbers of compounds. In such cases, iterative cycles of screening involving active learning (AL) are employed, creating the need for smaller "informer sets" that can be routinely screened to build predictive models for selecting compounds from the screening collection for follow-up screens. Here, we present a data-driven derivation of an informer compound set with improved predictivity of active compounds in HTS, and we validate its benefit over randomly selected training sets on 46 PubChem assays comprising at least 300,000 compounds and covering a wide range of assay biology. The informer compound set showed improvement in BEDROC(α = 100), PRAUC, and ROCAUC values averaged over all assays of 0.024, 0.014, and 0.016, respectively, compared to randomly selected training sets, all with paired t-test p-values <10⁻¹⁵. A per-assay assessment showed that the BEDROC(α = 100), which is of particular relevance for early retrieval of actives, improved for 38 out of 46 assays, increasing the success rate of smaller follow-up screens. Overall, we showed that an informer set derived from historical HTS activity data can be employed for routine small-scale exploratory screening in an assay-agnostic fashion. This approach led to a consistent improvement in hit rates in follow-up screens without compromising scaffold retrieval. The informer set is adjustable in size depending on the number of compounds one intends to screen, as performance gains are realized for sets with more than 3,000 compounds, and this set is therefore applicable to a variety of situations. Finally, our results indicate that random sampling may not adequately cover descriptor space, drawing attention to the importance of the composition of the training set for predicting actives.
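For readers unfamiliar with the BEDROC metric named in this abstract, the sketch below computes it from a ranked list using the closed-form expression of Truchon and Bayly (2007). It is an illustration under stated assumptions, not the authors' evaluation code: the function name `bedroc` is hypothetical, higher scores are assumed to mean "ranked earlier", and tied scores keep their input order.

```python
import math


def bedroc(scores, labels, alpha=100.0):
    """BEDROC for a scored list; labels are truthy for actives.

    Larger alpha puts more weight on actives found at the very top of the ranking.
    """
    n_total = len(scores)
    if n_total != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_actives = sum(1 for y in labels if y)
    if n_actives == 0 or n_actives == n_total:
        raise ValueError("need at least one active and one inactive")

    # Rank compounds from best (rank 1) to worst (rank n_total).
    order = sorted(range(n_total), key=lambda i: scores[i], reverse=True)
    sum_exp = sum(
        math.exp(-alpha * rank / n_total)
        for rank, i in enumerate(order, start=1)
        if labels[i]
    )

    ra = n_actives / n_total  # fraction of actives in the list
    # Expected value of sum_exp if actives were spread uniformly at random.
    random_sum = ra * (1.0 - math.exp(-alpha)) / (math.exp(alpha / n_total) - 1.0)
    # Rescaling terms that map the robust initial enhancement onto [0, 1].
    factor = ra * math.sinh(alpha / 2.0) / (
        math.cosh(alpha / 2.0) - math.cosh(alpha / 2.0 - alpha * ra)
    )
    constant = 1.0 / (1.0 - math.exp(alpha * (1.0 - ra)))
    return (sum_exp / random_sum) * factor + constant


# Toy data: 100 compounds scored 100..1, with 10 actives.
toy_scores = list(range(100, 0, -1))
perfect = bedroc(toy_scores, [1] * 10 + [0] * 90)  # all actives ranked first
worst = bedroc(toy_scores, [0] * 90 + [1] * 10)    # all actives ranked last
mixed = bedroc(toy_scores, [1, 0] * 10 + [0] * 80)  # actives interleaved near the top
```

A perfect ranking gives a value of essentially 1 and the worst ranking a value of essentially 0, so the averaged improvement of 0.024 reported above is a shift on that unit scale.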
Affiliation(s)
- Shardul Paricharak
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, CB2 1EW, Cambridge, United Kingdom
- Division of Medicinal Chemistry, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA Leiden, The Netherlands
- Developmental & Molecular Pathways, Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4056 Basel, Switzerland
| | - Adriaan P IJzerman
- Division of Medicinal Chemistry, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA Leiden, The Netherlands
| | - Jeremy L Jenkins
- Developmental & Molecular Pathways, Novartis Institutes for BioMedical Research, Cambridge, Massachusetts 02139, United States
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, CB2 1EW, Cambridge, United Kingdom
| | - Florian Nigsch
- Developmental & Molecular Pathways, Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4056 Basel, Switzerland
| |
|