1
Designing small molecules for therapeutic success: A contemporary perspective. Drug Discov Today 2021; 27:538-546. [PMID: 34601124 DOI: 10.1016/j.drudis.2021.09.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2021] [Revised: 08/31/2021] [Accepted: 09/25/2021] [Indexed: 11/23/2022]
Abstract
Successful small-molecule drug design requires a molecular target with inherent therapeutic potential and a molecule with the right properties to unlock its potential. Present-day drug design strategies have evolved to leave little room for improvement in drug-like properties. As a result, inadequate safety or efficacy associated with molecular targets now constitutes the primary cause of attrition in preclinical development through Phase II. This finding has led to a deeper focus on target selection. In this current reality, design tactics that enable rapid identification of risk-balanced clinical candidates, translation of clinical experience into meaningful differentiation strategies, and expansion of the druggable proteome represent significant levers by which drug designers can accelerate the discovery of the next generation of medicines.
2
CAVIAR: a method for automatic cavity detection, description and decomposition into subcavities. J Comput Aided Mol Des 2021; 35:737-750. [PMID: 34050420 DOI: 10.1007/s10822-021-00390-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/23/2021] [Accepted: 05/11/2021] [Indexed: 10/21/2022]
Abstract
The accurate description of protein binding sites is essential to the determination of similarity and the application of machine learning methods to relate the binding sites to observed functions. This work describes CAVIAR, a new open source tool for generating descriptors for binding sites, using protein structures in PDB and mmCIF format as well as trajectory frames from molecular dynamics simulations as input. The applicability of CAVIAR descriptors is showcased by computing machine learning predictions of binding site ligandability. The method can also automatically assign subcavities, even in the absence of a bound ligand. The defined subpockets mimic the empirical definitions used in medicinal chemistry projects. It is shown that the experimental binding affinity scales relatively well with the number of subcavities filled by the ligand, with compounds binding to more than three subcavities having nanomolar or better affinities to the target. The CAVIAR descriptors and methods can be used in any machine learning-based investigations of problems involving binding sites, from protein engineering to hit identification. The full software code is available on GitHub and a conda package is hosted on Anaconda cloud.
3
Ding X, Cui C, Wang D, Zhao J, Zheng M, Luo X, Jiang H, Chen K. Bioactivity Prediction Based on Matched Molecular Pair and Matched Molecular Series Methods. Curr Pharm Des 2021; 26:4195-4205. [PMID: 32338210 DOI: 10.2174/1381612826666200427111309] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/06/2019] [Accepted: 04/08/2020] [Indexed: 11/22/2022]
Abstract
BACKGROUND Enhancing a compound's biological activity is the central task of lead optimization in small-molecule drug discovery. However, performing many iterative rounds of compound synthesis and bioactivity testing is laborious. To address this issue, high-quality in silico bioactivity prediction approaches are in high demand to prioritize the more active compound derivatives and reduce trial and error. METHODS Two kinds of bioactivity prediction models were constructed from a large-scale structure-activity relationship (SAR) database. The first is based on the similarity of substituents and is realized by matched molecular pair analysis; it includes the SA, SA_BR, SR, and SR_BR models. The second is based on SAR transferability and is realized by matched molecular series analysis; it includes the Single MMS pair, Full MMS series, and Multi single MMS pairs models. We also defined the applicability domain of the models using a distance-based threshold. RESULTS Among the seven individual models, the Multi single MMS pairs model performed best (R2 = 0.828, MAE = 0.406, RMSE = 0.591), and the baseline model (SA) had the lowest prediction accuracy (R2 = 0.798, MAE = 0.446, RMSE = 0.637). Consensus modeling further improved predictive accuracy (R2 = 0.842, MAE = 0.397, RMSE = 0.563). CONCLUSION A consensus method yielded an accurate bioactivity prediction model that was superior to all individual models. Our model should be a valuable tool for lead optimization.
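The three figures of merit quoted in this abstract (R2, MAE, RMSE) and the idea of consensus modeling can be illustrated with a minimal Python sketch. Two points are assumptions rather than details from the abstract: R2 is taken here as the coefficient of determination, and the consensus is formed by a plain unweighted average of the individual models' predictions. The function names are hypothetical.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return R2 (coefficient of determination, an assumption), MAE and RMSE."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    rmse = sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot
    return {"r2": r2, "mae": mae, "rmse": rmse}


def consensus(predictions):
    """Average per-compound predictions across models (unweighted mean, an assumption)."""
    return [sum(column) / len(column) for column in zip(*predictions)]


# Toy data: measured activities and two hypothetical models' predictions.
measured = [5.0, 6.0, 7.0, 8.0]
model_a = [5.4, 6.2, 6.6, 8.4]
model_b = [4.6, 6.2, 7.2, 7.8]
print(regression_metrics(measured, consensus([model_a, model_b])))
```

In this toy case the averaged predictions partly cancel the individual models' errors, which is the usual motivation for consensus modeling; whether the paper weights its models equally is not stated in the abstract.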
Affiliation(s)
- Xiaoyu Ding
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Chen Cui
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Dingyan Wang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Jihui Zhao
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Mingyue Zheng
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Xiaomin Luo
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Hualiang Jiang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Kaixian Chen
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
4
Green DVS, Pickett S, Luscombe C, Senger S, Marcus D, Meslamani J, Brett D, Powell A, Masson J. BRADSHAW: a system for automated molecular design. J Comput Aided Mol Des 2020; 34:747-765. [PMID: 31637565 PMCID: PMC7292824 DOI: 10.1007/s10822-019-00234-8] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 06/10/2019] [Accepted: 10/05/2019] [Indexed: 12/18/2022]
Abstract
This paper introduces BRADSHAW (Biological Response Analysis and Design System using an Heterogenous, Automated Workflow), a system for automated molecular design which integrates methods for chemical structure generation, experimental design, active learning and cheminformatics tools. The simple user interface is designed to facilitate access to large scale automated design whilst minimising software development required to introduce new algorithms, a critical requirement in what is a very fast moving field. The system embodies a philosophy of automation, best practice, experimental design and the use of both traditional cheminformatics and modern machine learning algorithms.
Affiliation(s)
- Darren V S Green
  - Department of Molecular Design, Data and Computational Sciences, GlaxoSmithKline, Gunnels Wood Road, Stevenage, Hertfordshire, SG1 2NY, UK
- Stephen Pickett
  - Department of Molecular Design, Data and Computational Sciences, GlaxoSmithKline, Gunnels Wood Road, Stevenage, Hertfordshire, SG1 2NY, UK
- Chris Luscombe
  - Department of Molecular Design, Data and Computational Sciences, GlaxoSmithKline, Gunnels Wood Road, Stevenage, Hertfordshire, SG1 2NY, UK
- Stefan Senger
  - Department of Molecular Design, Data and Computational Sciences, GlaxoSmithKline, Gunnels Wood Road, Stevenage, Hertfordshire, SG1 2NY, UK
- David Marcus
  - Department of Molecular Design, Data and Computational Sciences, GlaxoSmithKline, Gunnels Wood Road, Stevenage, Hertfordshire, SG1 2NY, UK
- Jamel Meslamani
  - Department of Molecular Design, Data and Computational Sciences, GlaxoSmithKline, 1250 South Collegeville Road, Collegeville, PA, 19426, USA
- David Brett
  - Tessella Ltd, Walkern Road, Stevenage, Hertfordshire, SG1 3QP, UK
- Adam Powell
  - Tessella Ltd, Walkern Road, Stevenage, Hertfordshire, SG1 3QP, UK
- Jonathan Masson
  - Tessella Ltd, Walkern Road, Stevenage, Hertfordshire, SG1 3QP, UK
5
Kruger F, Fechner N, Stiefl N. Automated Identification of Chemical Series: Classifying like a Medicinal Chemist. J Chem Inf Model 2020; 60:2888-2902. [PMID: 32374165 DOI: 10.1021/acs.jcim.0c00204] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022]
Abstract
We investigate different automated approaches for the classification of chemical series in early drug discovery, with the aim of closely mimicking human chemical series conception. Chemical series, which are commonly defined by hand-drawn scaffolds, organize datasets in drug discovery projects. Often, they form the basis for further project decisions. To trace and evaluate these decisions in historic and ongoing projects, it is important to know or reconstruct chemical series. There is not a unique correct definition of chemical series, and the human definition certainly involves a subjective bias. Hence, we first develop quality metrics for the chemical series definitions, evaluating the size and specificity of chemical series. These metrics are applied to categorize human series definitions and implemented in automated classification approaches. For the automated classification of chemical series, we test different fragmentation and similarity-based clustering algorithms and apply different approaches to infer series definitions from these clusters or sets of fragments. We benchmark the classification results against human-defined series from 30 internal projects. The best results in reproducing the composition of human-defined series are achieved when applying UPGMA (unweighted pair group method with arithmetic mean) clustering to the project dataset and calculating maximum common substructures of the clusters as series definitions. We evaluate this approach in more detail on a public dataset and assess its robustness by 10-fold cross-validation, each time sampling 40% of the dataset. Through these benchmarking and validation experiments, we show that the proposed automated approach is able to accurately and robustly identify human-defined series, which comply with a certain, predefined level of specificity and size. 
Suggesting a thoroughly tested algorithm for series classification, as well as quality metrics for series and several benchmarking approaches, this work lays the foundation for further analysis of project decisions, and it offers an enhanced understanding of the properties of human-defined chemical series.
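The clustering step named in this abstract, UPGMA (average linkage), can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the input is a precomputed pairwise distance matrix (for example, one minus a fingerprint similarity), merging stops at a hypothetical `threshold`, and the subsequent maximum-common-substructure step is omitted because it needs a cheminformatics toolkit.

```python
def upgma_clusters(dist, threshold):
    """Average-linkage (UPGMA) clustering on a symmetric distance matrix.

    Clusters are merged greedily until the closest pair of clusters is
    farther apart than `threshold` (a hypothetical cut-off for this sketch).
    """
    clusters = [[i] for i in range(len(dist))]

    def average_distance(a, b):
        # UPGMA: mean of all pairwise distances between members of a and b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Toy distance matrix for four compounds: 0 and 1 are close, 2 and 3 are close.
toy = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.85, 0.9],
    [0.9, 0.85, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
print(upgma_clusters(toy, threshold=0.5))
```

Each resulting cluster would then be a candidate chemical series; in the workflow the abstract describes, a maximum common substructure computed over the cluster members serves as the series definition.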
Affiliation(s)
- Franziska Kruger
  - Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4002 Basel, Switzerland
- Nikolas Fechner
  - Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4002 Basel, Switzerland
- Nikolaus Stiefl
  - Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4002 Basel, Switzerland
6
Awale M, Riniker S, Kramer C. Matched Molecular Series Analysis for ADME Property Prediction. J Chem Inf Model 2020; 60:2903-2914. [PMID: 32369360 DOI: 10.1021/acs.jcim.0c00269] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 12/19/2022]
Abstract
Generation and prioritization of new molecules are the most central parts of the drug design process. Matched molecular series analysis (MMSA) has recently been proposed as a formal approach that captures both of these key elements of design. To better understand the power of MMSA and its specific limitations, we here evaluate its performance as an ADME property prediction tool. We use four large and diverse in-house data sets: logD, microsomal clearance, CYP2C9 inhibition, and CYP3A4 inhibition. MMSA follows the concept of parallel structure-activity relationships (SAR): if two identical substituent series on different scaffolds show similar property profiles, SAR from one series can be transferred to the other. We test four different similarity metrics to identify pairs of molecular series between which information can be transferred. We find that the best prediction performance is achieved by a combination of centered root-mean-square deviation (cRMSD) and a network score approach previously published by Keefer et al. However, cRMSD alone strikes the best balance between accuracy and the number of predictions that can be made. We identify statistical metrics that allow estimating when MMSA predictions will work, similar to the well-known applicability domain concept in machine learning. MMSA achieves a prediction accuracy comparable to that of a standard machine-learning model and of matched molecular pair analysis. In contrast to machine learning, however, it is very easy to understand where MMSA predictions come from. Finally, to prospectively test the power of MMSA, we retested compounds that were strong outliers in the initial predictions and show how the MMSA model can help to identify erroneous data points.
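The parallel-SAR idea in this abstract can be made concrete with a small Python sketch. It rests on assumptions that the abstract does not confirm: cRMSD is taken to mean the RMSD between two series after each has been centered on its own mean over the shared substituents, and the transfer step simply shifts a reference value by the difference of those means. The function names and toy logD values are hypothetical.

```python
from math import sqrt


def _shared_means(series_a, series_b):
    """Return shared substituents and each series' mean over them."""
    shared = sorted(set(series_a) & set(series_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared substituents")
    mean_a = sum(series_a[s] for s in shared) / len(shared)
    mean_b = sum(series_b[s] for s in shared) / len(shared)
    return shared, mean_a, mean_b


def centered_rmsd(series_a, series_b):
    """RMSD between two mean-centered property profiles (assumed cRMSD definition)."""
    shared, mean_a, mean_b = _shared_means(series_a, series_b)
    squared = sum(
        ((series_a[s] - mean_a) - (series_b[s] - mean_b)) ** 2 for s in shared
    )
    return sqrt(squared / len(shared))


def transfer_prediction(reference, target, substituent):
    """Predict the target-series value of a substituent measured only in the reference."""
    _, mean_ref, mean_target = _shared_means(reference, target)
    return reference[substituent] + (mean_target - mean_ref)


# Toy logD values for the same substituents on two different scaffolds.
scaffold_1 = {"H": 1.0, "F": 1.5, "Cl": 2.0, "Br": 2.4}
scaffold_2 = {"H": 2.0, "F": 2.5, "Cl": 3.0}
print(centered_rmsd(scaffold_1, scaffold_2))
print(transfer_prediction(scaffold_1, scaffold_2, "Br"))
```

A low cRMSD indicates that the two series have parallel profiles, which is the condition under which transferring a value from one scaffold to the other is plausible. It also shows why such predictions are easy to trace: each one points back to a specific reference series.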
Affiliation(s)
- Mahendra Awale
  - Computer-Aided Drug Design/Therapeutic Modalities, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070 Basel, Switzerland
- Sereina Riniker
  - Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Christian Kramer
  - Computer-Aided Drug Design/Therapeutic Modalities, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070 Basel, Switzerland
7
Can we accelerate medicinal chemistry by augmenting the chemist with Big Data and artificial intelligence? Drug Discov Today 2018; 23:1373-1384. [DOI: 10.1016/j.drudis.2018.03.011] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 01/12/2018] [Revised: 02/27/2018] [Accepted: 03/20/2018] [Indexed: 12/18/2022]