1
|
Khan M, Kandwal S, Fayne D. DataPype: A Fully Automated Unified Software Platform for Computer-Aided Drug Design. ACS OMEGA 2023; 8:39468-39480. [PMID: 37901539 PMCID: PMC10601415 DOI: 10.1021/acsomega.3c05207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 07/20/2023] [Accepted: 09/26/2023] [Indexed: 10/31/2023]
Abstract
With the advent of computer-aided drug design (CADD), traditional physical testing of thousands of molecules has now been replaced by target-focused drug discovery, where potentially bioactive molecules are predicted by computer software before their physical synthesis. However, despite being a significant breakthrough, CADD still faces various limitations and challenges. The increasing availability of data on small molecules has created a need to streamline the sourcing of data from different databases and automate the processing and cleaning of data into a form that can be used by multiple CADD software applications. Several standalone software packages are available to aid the drug designer, each with its own specific application, requiring specialized knowledge and expertise for optimal use. These applications require their own input and output files, making it a challenge for nonexpert users or multidisciplinary discovery teams. Here, we have developed a new software platform called DataPype, which wraps around these different software packages. It provides a unified automated workflow to search for hit compounds using specialist software. Additionally, multiple virtual screening packages can be used in the one workflow, and if different ways of looking at potential hit compounds all predict the same set of molecules, we have higher confidence that we should make or purchase and test the molecules. Importantly, DataPype can run on computer servers, speeding up the virtual screening for new compounds. Combining access to multiple CADD tools within one interface will enhance the early stage of drug discovery, increase usability, and enable the use of parallel computing.
Collapse
Affiliation(s)
- Mohemmed
Faraz Khan
- Molecular
Design Group, School of Biochemistry and Immunology, Trinity Biomedical Sciences Institute, Trinity College Dublin, Dublin 2, Ireland
- Department
of Pharmaceutical Chemistry, Faculty of Pharmacy, Integral University, Lucknow U.P., 226026, India
| | - Shubhangi Kandwal
- Molecular
Design Group, School of Biochemistry and Immunology, Trinity Biomedical Sciences Institute, Trinity College Dublin, Dublin 2, Ireland
| | - Darren Fayne
- Molecular
Design Group, School of Biochemistry and Immunology, Trinity Biomedical Sciences Institute, Trinity College Dublin, Dublin 2, Ireland
| |
Collapse
|
2
|
Kmetič I, Murati T, Kovač V, Jurčević IL, Šimić B, Radošević K, Miletić M. Novel ferrocene-containing triacyl derivative of resveratrol improves viability parameters in ovary cells. J Appl Toxicol 2023. [PMID: 36823762 DOI: 10.1002/jat.4452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2023] [Revised: 02/10/2023] [Accepted: 02/20/2023] [Indexed: 02/25/2023]
Abstract
Besides the use of resveratrol as a drug candidate, there are obstacles mainly due to its poor pharmacokinetic properties. Numerous studies are being conducted on the synthesis of resveratrol derivatives that exhibit enhanced biological activity. The aim of our research was to investigate activity of the newly synthesized ferrocene-containing triacyl derivative of resveratrol to achieve cell protection from endo/exogenous ROS and reduction in cell death by assessing multiple endpoints. Our research showed that both resveratrol and the resveratrol derivatives (1-100 μM) lower the levels of ROS in CHO-K1 cells. Resveratrol at doses <20 μM had little or no effect on cell proliferation, while at higher doses, a significant inhibitory effect on cell proliferation and viability has been noticed. The activity of the new derivative was significantly altered compared to resveratrol-cellular viability was not suppressed regardless of the concentration applied, and the extent of apoptosis was low. In summary, the new ferrocene-resveratrol derivative has the potential to protect cells from oxidative stress due to its low cytotoxicity and retained antioxidant properties, whereas caution should be exercised with resveratrol at higher doses, as it significantly impairs cell viability and induces cell death. By linking ROS to specific diseases (relevance in neurodegenerative, cardiovascular, and neoplastic diseases), we can assume that the new resveratrol derivative can prevent or alleviate the mentioned disorders. Furthermore, recognition of the resveratrol derivative as an anti-apoptotic chemical could be useful in the cultivation of various cell lines on a large scale in the industrial biotechnology.
Collapse
Affiliation(s)
- Ivana Kmetič
- Laboratory for Toxicology, Faculty of Food Technology and Biotechnology, University of Zagreb, Pierotti St. 6, Zagreb, 10000, Croatia
| | - Teuta Murati
- Laboratory for Toxicology, Faculty of Food Technology and Biotechnology, University of Zagreb, Pierotti St. 6, Zagreb, 10000, Croatia
| | - Veronika Kovač
- Laboratory for Organic Chemistry, Faculty of Food Technology and Biotechnology, University of Zagreb, Pierotti St. 6, Zagreb, 10000, Croatia
| | - Irena Landeka Jurčević
- Laboratory for Food Chemistry and Biochemistry, Faculty of Food Technology and Biotechnology, University of Zagreb, Pierotti St. 6, Zagreb, 10000, Croatia
| | - Branimir Šimić
- Laboratory for Toxicology, Faculty of Food Technology and Biotechnology, University of Zagreb, Pierotti St. 6, Zagreb, 10000, Croatia
| | - Kristina Radošević
- Laboratory for Cell Culture Technology and Biotransformations, Faculty of Food Technology and Biotechnology, University of Zagreb, Pierotti St. 6, Zagreb, 10000, Croatia
| | - Marina Miletić
- Laboratory for Toxicology, Faculty of Food Technology and Biotechnology, University of Zagreb, Pierotti St. 6, Zagreb, 10000, Croatia
| |
Collapse
|
3
|
Yu J, Wang D, Zheng M. Uncertainty quantification: Can we trust artificial intelligence in drug discovery? iScience 2022; 25:104814. [PMID: 35996575 PMCID: PMC9391523 DOI: 10.1016/j.isci.2022.104814] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022] Open
Abstract
The problem of human trust is one of the most fundamental problems in applied artificial intelligence in drug discovery. In silico models have been widely used to accelerate the process of drug discovery in recent years. However, most of these models can only give reliable predictions within a limited chemical space that the training set covers (applicability domain). Predictions of samples falling outside the applicability domain are unreliable and sometimes dangerous for the drug-design decision-making process. Uncertainty quantification accordingly has drawn great attention to enable autonomous drug designing. By quantifying the confidence level of model predictions, the reliability of the predictions can be quantitatively represented to assist researchers in their molecular reasoning and experimental design. Here we summarize the state-of-the-art approaches to uncertainty quantification and underline how they can be used for drug design and discovery projects. Furthermore, we also outline four representative application scenarios of uncertainty quantification in drug discovery.
Collapse
|
4
|
Kolmar SS, Grulke CM. The effect of noise on the predictive limit of QSAR models. J Cheminform 2021; 13:92. [PMID: 34823605 PMCID: PMC8613965 DOI: 10.1186/s13321-021-00571-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2021] [Accepted: 11/14/2021] [Indexed: 01/09/2023] Open
Abstract
A key challenge in the field of Quantitative Structure Activity Relationships (QSAR) is how to effectively treat experimental error in the training and evaluation of computational models. It is often assumed in the field of QSAR that models cannot produce predictions which are more accurate than their training data. Additionally, it is implicitly assumed, by necessity, that data points in test sets or validation sets do not contain error, and that each data point is a population mean. This work proposes the hypothesis that QSAR models can make predictions which are more accurate than their training data and that the error-free test set assumption leads to a significant misevaluation of model performance. This work used 8 datasets with six different common QSAR endpoints, because different endpoints should have different amounts of experimental error associated with varying complexity of the measurements. Up to 15 levels of simulated Gaussian distributed random error was added to the datasets, and models were built on the error laden datasets using five different algorithms. The models were trained on the error laden data, evaluated on error-laden test sets, and evaluated on error-free test sets. The results show that for each level of added error, the RMSE for evaluation on the error free test sets was always better. The results support the hypothesis that, at least under the conditions of Gaussian distributed random error, QSAR models can make predictions which are more accurate than their training data, and that the evaluation of models on error laden test and validation sets may give a flawed measure of model performance. These results have implications for how QSAR models are evaluated, especially for disciplines where experimental error is very large, such as in computational toxicology. ![]()
Collapse
Affiliation(s)
- Scott S Kolmar
- Center for Computational Toxicology and Exposure, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, USA.
| | - Christopher M Grulke
- Center for Computational Toxicology and Exposure, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, USA
| |
Collapse
|
5
|
Mervin LH, Trapotsi MA, Afzal AM, Barrett IP, Bender A, Engkvist O. Probabilistic Random Forest improves bioactivity predictions close to the classification threshold by taking into account experimental uncertainty. J Cheminform 2021; 13:62. [PMID: 34412708 PMCID: PMC8375213 DOI: 10.1186/s13321-021-00539-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2021] [Accepted: 07/30/2021] [Indexed: 11/24/2022] Open
Abstract
Measurements of protein–ligand interactions have reproducibility limits due to experimental errors. Any model based on such assays will consequentially have such unavoidable errors influencing their performance which should ideally be factored into modelling and output predictions, such as the actual standard deviation of experimental measurements (σ) or the associated comparability of activity values between the aggregated heterogenous activity units (i.e., Ki versus IC50 values) during dataset assimilation. However, experimental errors are usually a neglected aspect of model generation. In order to improve upon the current state-of-the-art, we herein present a novel approach toward predicting protein–ligand interactions using a Probabilistic Random Forest (PRF) classifier. The PRF algorithm was applied toward in silico protein target prediction across ~ 550 tasks from ChEMBL and PubChem. Predictions were evaluated by taking into account various scenarios of experimental standard deviations in both training and test sets and performance was assessed using fivefold stratified shuffled splits for validation. The largest benefit in incorporating the experimental deviation in PRF was observed for data points close to the binary threshold boundary, when such information was not considered in any way in the original RF algorithm. For example, in cases when σ ranged between 0.4–0.6 log units and when ideal probability estimates between 0.4–0.6, the PRF outperformed RF with a median absolute error margin of ~ 17%. In comparison, the baseline RF outperformed PRF for cases with high confidence to belong to the active class (far from the binary decision threshold), although the RF models gave errors smaller than the experimental uncertainty, which could indicate that they were overtrained and/or over-confident. Finally, the PRF models trained with putative inactives decreased the performance compared to PRF models without putative inactives and this could be because putative inactives were not assigned an experimental pXC50 value, and therefore they were considered inactives with a low uncertainty (which in practice might not be true). In conclusion, PRF can be useful for target prediction models in particular for data where class boundaries overlap with the measurement uncertainty, and where a substantial part of the training data is located close to the classification threshold.
Collapse
Affiliation(s)
- Lewis H Mervin
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK.
| | - Maria-Anna Trapotsi
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
| | - Avid M Afzal
- Data Sciences & Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
| | - Ian P Barrett
- Data Sciences & Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
| | - Andreas Bender
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
| | - Ola Engkvist
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden.,Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden
| |
Collapse
|
6
|
Bender A, Cortés-Ciriano I. Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 1: Ways to make an impact, and why we are not there yet. Drug Discov Today 2020; 26:511-524. [PMID: 33346134 DOI: 10.1016/j.drudis.2020.12.009] [Citation(s) in RCA: 95] [Impact Index Per Article: 23.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2020] [Revised: 09/07/2020] [Accepted: 12/11/2020] [Indexed: 12/30/2022]
Abstract
Although artificial intelligence (AI) has had a profound impact on areas such as image recognition, comparable advances in drug discovery are rare. This article quantifies the stages of drug discovery in which improvements in the time taken, success rate or affordability will have the most profound overall impact on bringing new drugs to market. Changes in clinical success rates will have the most profound impact on improving success in drug discovery; in other words, the quality of decisions regarding which compound to take forward (and how to conduct clinical trials) are more important than speed or cost. Although current advances in AI focus on how to make a given compound, the question of which compound to make, using clinical efficacy and safety-related end points, has received significantly less attention. As a consequence, current proxy measures and available data cannot fully utilize the potential of AI in drug discovery, in particular when it comes to drug efficacy and safety in vivo. Thus, addressing the questions of which data to generate and which end points to model will be key to improving clinically relevant decision-making in the future.
Collapse
Affiliation(s)
- Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road CB2 1EW, UK; Imaging and Data Analytics, Clinical Pharmacology and Safety Sciences, R&D, AstraZeneca, Cambridge, UK.
| | - Isidro Cortés-Ciriano
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK.
| |
Collapse
|
7
|
Watson OP, Cortes-Ciriano I, Taylor AR, Watson JA. A decision-theoretic approach to the evaluation of machine learning algorithms in computational drug discovery. Bioinformatics 2020; 35:4656-4663. [PMID: 31070704 PMCID: PMC6853675 DOI: 10.1093/bioinformatics/btz293] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2018] [Revised: 03/22/2019] [Accepted: 04/17/2019] [Indexed: 02/07/2023] Open
Abstract
Motivation Artificial intelligence, trained via machine learning (e.g. neural nets, random forests) or computational statistical algorithms (e.g. support vector machines, ridge regression), holds much promise for the improvement of small-molecule drug discovery. However, small-molecule structure-activity data are high dimensional with low signal-to-noise ratios and proper validation of predictive methods is difficult. It is poorly understood which, if any, of the currently available machine learning algorithms will best predict new candidate drugs. Results The quantile-activity bootstrap is proposed as a new model validation framework using quantile splits on the activity distribution function to construct training and testing sets. In addition, we propose two novel rank-based loss functions which penalize only the out-of-sample predicted ranks of high-activity molecules. The combination of these methods was used to assess the performance of neural nets, random forests, support vector machines (regression) and ridge regression applied to 25 diverse high-quality structure-activity datasets publicly available on ChEMBL. Model validation based on random partitioning of available data favours models that overfit and ‘memorize’ the training set, namely random forests and deep neural nets. Partitioning based on quantiles of the activity distribution correctly penalizes extrapolation of models onto structurally different molecules outside of the training data. Simpler, traditional statistical methods such as ridge regression can outperform state-of-the-art machine learning methods in this setting. In addition, our new rank-based loss functions give considerably different results from mean squared error highlighting the necessity to define model optimality with respect to the decision task at hand. Availability and implementation All software and data are available as Jupyter notebooks found at https://github.com/owatson/QuantileBootstrap. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
| | - Isidro Cortes-Ciriano
- Goring on Thames, Evariste Technologies Ltd., RG8 9AL UK.,Department of Chemistry, Centre for Molecular Science Informatics, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
| | - Aimee R Taylor
- Department of Epidemiology, Center for Communicable Disease Dynamics, Harvard T.H. Chan School of Public Health, Boston, MA 02115 USA.,Infectious Disease Microbiome Program, Broad Institute, Cambridge, MA 02142 USA
| | - James A Watson
- Nuffield Department of Medicine, Centre for Tropical Medicine and Global Health, University of Oxford, Oxford OX3, 7LF UK.,Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Bangkok 10400, Thailand
| |
Collapse
|
8
|
Cortés-Ciriano I, Škuta C, Bender A, Svozil D. QSAR-derived affinity fingerprints (part 2): modeling performance for potency prediction. J Cheminform 2020; 12:41. [PMID: 33431016 PMCID: PMC7339533 DOI: 10.1186/s13321-020-00444-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2019] [Accepted: 05/16/2020] [Indexed: 01/22/2023] Open
Abstract
Affinity fingerprints report the activity of small molecules across a set of assays, and thus permit to gather information about the bioactivities of structurally dissimilar compounds, where models based on chemical structure alone are often limited, and model complex biological endpoints, such as human toxicity and in vitro cancer cell line sensitivity. Here, we propose to model in vitro compound activity using computationally predicted bioactivity profiles as compound descriptors. To this aim, we apply and validate a framework for the calculation of QSAR-derived affinity fingerprints (QAFFP) using a set of 1360 QSAR models generated using Ki, Kd, IC50 and EC50 data from ChEMBL database. QAFFP thus represent a method to encode and relate compounds on the basis of their similarity in bioactivity space. To benchmark the predictive power of QAFFP we assembled IC50 data from ChEMBL database for 18 diverse cancer cell lines widely used in preclinical drug discovery, and 25 diverse protein target data sets. This study complements part 1 where the performance of QAFFP in similarity searching, scaffold hopping, and bioactivity classification is evaluated. Despite being inherently noisy, we show that using QAFFP as descriptors leads to errors in prediction on the test set in the ~ 0.65-0.95 pIC50 units range, which are comparable to the estimated uncertainty of bioactivity data in ChEMBL (0.76-1.00 pIC50 units). We find that the predictive power of QAFFP is slightly worse than that of Morgan2 fingerprints and 1D and 2D physicochemical descriptors, with an effect size in the 0.02-0.08 pIC50 units range. Including QSAR models with low predictive power in the generation of QAFFP does not lead to improved predictive power. Given that the QSAR models we used to compute the QAFFP were selected on the basis of data availability alone, we anticipate better modeling results for QAFFP generated using more diverse and biologically meaningful targets. Data sets and Python code are publicly available at https://github.com/isidroc/QAFFP_regression .
Collapse
Affiliation(s)
- Isidro Cortés-Ciriano
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK. .,European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, CB10 1SD, UK.
| | - Ctibor Škuta
- CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR, v. v. i., Vídeňská 1083, 142 20, Prague, Czech Republic
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
| | - Daniel Svozil
- CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR, v. v. i., Vídeňská 1083, 142 20, Prague, Czech Republic.,CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague, Czech Republic
| |
Collapse
|
9
|
Drakakis G, Cortés-Ciriano I, Alexander-Dann B, Bender A. Elucidating Compound Mechanism of Action and Predicting Cytotoxicity Using Machine Learning Approaches, Taking Prediction Confidence into Account. ACTA ACUST UNITED AC 2020; 11:e73. [PMID: 31483099 DOI: 10.1002/cpch.73] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]
Abstract
The modes of action (MoAs) of drugs frequently are unknown, because many are small molecules initially identified from phenotypic screens, giving rise to the need to elucidate their MoAs. In addition, the high attrition rate for candidate drugs in preclinical studies due to intolerable toxicity has motivated the development of computational approaches to predict drug candidate (cyto)toxicity as early as possible in the drug-discovery process. Here, we provide detailed instructions for capitalizing on bioactivity predictions to elucidate the MoAs of small molecules and infer their underlying phenotypic effects. We illustrate how these predictions can be used to infer the underlying antidepressive effects of marketed drugs. We also provide the necessary functionalities to model cytotoxicity data using single and ensemble machine-learning algorithms. Finally, we give detailed instructions on how to calculate confidence intervals for individual predictions using the conformal prediction framework. © 2019 by John Wiley & Sons, Inc.
Collapse
Affiliation(s)
- Georgios Drakakis
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
| | - Isidro Cortés-Ciriano
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
| | - Ben Alexander-Dann
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
| |
Collapse
|
10
|
Tarasova OA, Biziukova NY, Filimonov DA, Poroikov VV, Nicklaus MC. Data Mining Approach for Extraction of Useful Information About Biologically Active Compounds from Publications. J Chem Inf Model 2019; 59:3635-3644. [PMID: 31453694 DOI: 10.1021/acs.jcim.9b00164] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]
Abstract
A lot of high quality data on the biological activity of chemical compounds are required throughout the whole drug discovery process: from development of computational models of the structure-activity relationship to experimental testing of lead compounds and their validation in clinics. Currently, a large amount of such data is available from databases, scientific publications, and patents. Biological data are characterized by incompleteness, uncertainty, and low reproducibility. Despite the existence of free and commercially available databases of biological activities of compounds, they usually lack unambiguous information about peculiarities of biological assays. On the other hand, scientific papers are the primary source of new data disclosed to the scientific community for the first time. In this study, we have developed and validated a data-mining approach for extraction of text fragments containing description of bioassays. We have used this approach to evaluate compounds and their biological activity reported in scientific publications. We have found that categorization of papers into relevant and irrelevant may be performed based on the machine-learning analysis of the abstracts. Text fragments extracted from the full texts of publications allow their further partitioning into several classes according to the peculiarities of bioassays. We demonstrate the applicability of our approach to the comparison of the endpoint values of biological activity and cytotoxicity of reference compounds.
Collapse
Affiliation(s)
- Olga A Tarasova
- Department of Bioinformatics , Institute of Biomedical Chemistry , 10 Building 8, Pogodinskaya Street , Moscow 119121 , Russia
| | - Nadezhda Yu Biziukova
- Department of Bioinformatics , Institute of Biomedical Chemistry , 10 Building 8, Pogodinskaya Street , Moscow 119121 , Russia
| | - Dmitry A Filimonov
- Department of Bioinformatics , Institute of Biomedical Chemistry , 10 Building 8, Pogodinskaya Street , Moscow 119121 , Russia
| | - Vladimir V Poroikov
- Department of Bioinformatics , Institute of Biomedical Chemistry , 10 Building 8, Pogodinskaya Street , Moscow 119121 , Russia
| | - Marc C Nicklaus
- Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research , National Cancer Institute , Frederick , Maryland 21702 , United States
| |
Collapse
|
11
|
Cortés-Ciriano I, Bender A. Reliable Prediction Errors for Deep Neural Networks Using Test-Time Dropout. J Chem Inf Model 2019; 59:3330-3339. [PMID: 31241929 DOI: 10.1021/acs.jcim.9b00297] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
Abstract
While the use of deep learning in drug discovery is gaining increasing attention, the lack of methods to compute reliable errors in prediction for Neural Networks prevents their application to guide decision making in domains where identifying unreliable predictions is essential, e.g., precision medicine. Here, we present a framework to compute reliable errors in prediction for Neural Networks using Test-Time Dropout and Conformal Prediction. Specifically, the algorithm consists of training a single Neural Network using dropout, and then applying it N times to both the validation and test sets, also employing dropout in this step. Therefore, for each instance in the validation and test sets an ensemble of predictions are generated. The residuals and absolute errors in prediction for the validation set are then used to compute prediction errors for the test set instances using Conformal Prediction. We show using 24 bioactivity data sets from ChEMBL 23 that Dropout Conformal Predictors are valid (i.e., the fraction of instances whose true value lies within the predicted interval strongly correlates with the confidence level) and efficient, as the predicted confidence intervals span a narrower set of values than those computed with Conformal Predictors generated using Random Forest (RF) models. Lastly, we show in retrospective virtual screening experiments that dropout and RF-based Conformal Predictors lead to comparable retrieval rates of active compounds. Overall, we propose a computationally efficient framework (as only N extra forward passes are required in addition to training a single network) to harness Test-Time Dropout and the Conformal Prediction framework, which is generally applicable to generate reliable prediction errors for Deep Neural Networks in drug discovery and beyond.
Collapse
Affiliation(s)
- Isidro Cortés-Ciriano
- Centre for Molecular Informatics, Department of Chemistry , University of Cambridge , Lensfield Road , Cambridge CB2 1EW , United Kingdom
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry , University of Cambridge , Lensfield Road , Cambridge CB2 1EW , United Kingdom
| |
Collapse
|
12
|
Cortés-Ciriano I, Bender A. KekuleScope: prediction of cancer cell line sensitivity and compound potency using convolutional neural networks trained on compound images. J Cheminform 2019; 11:41. [PMID: 31218493 PMCID: PMC6582521 DOI: 10.1186/s13321-019-0364-5] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2019] [Accepted: 06/09/2019] [Indexed: 02/08/2023] Open
Abstract
The application of convolutional neural networks (ConvNets) to harness high-content screening images or 2D compound representations is gaining increasing attention in drug discovery. However, existing applications often require large data sets for training, or sophisticated pretraining schemes. Here, we show using 33 IC50 data sets from ChEMBL 23 that the in vitro activity of compounds on cancer cell lines and protein targets can be accurately predicted on a continuous scale from their Kekulé structure representations alone by extending existing architectures (AlexNet, DenseNet-201, ResNet152 and VGG-19), which were pretrained on unrelated image data sets. We show that the predictive power of the generated models, which just require standard 2D compound representations as input, is comparable to that of Random Forest (RF) models and fully-connected Deep Neural Networks trained on circular (Morgan) fingerprints. Notably, including additional fully-connected layers further increases the predictive power of the ConvNets by up to 10%. Analysis of the predictions generated by RF models and ConvNets shows that by simply averaging the output of the RF models and ConvNets we obtain significantly lower errors in prediction for multiple data sets, although the effect size is small, than those obtained with either model alone, indicating that the features extracted by the convolutional layers of the ConvNets provide complementary predictive signal to Morgan fingerprints. Lastly, we show that multi-task ConvNets trained on compound images permit to model COX isoform selectivity on a continuous scale with errors in prediction comparable to the uncertainty of the data. Overall, in this work we present a set of ConvNet architectures for the prediction of compound activity from their Kekulé structure representations with state-of-the-art performance, that require no generation of compound descriptors or use of sophisticated image processing techniques. The code needed to reproduce the results presented in this study and all the data sets are provided at https://github.com/isidroc/kekulescope .
Collapse
Affiliation(s)
- Isidro Cortés-Ciriano
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW UK
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW UK
| |
Collapse
|
13
|
Hofmarcher M, Rumetshofer E, Clevert DA, Hochreiter S, Klambauer G. Accurate Prediction of Biological Assays with High-Throughput Microscopy Images and Convolutional Networks. J Chem Inf Model 2019; 59:1163-1171. [DOI: 10.1021/acs.jcim.8b00670] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
Affiliation(s)
- Markus Hofmarcher
- LIT AI Lab & Institute for Machine Learning, Johannes Kepler University, Linz 4040, Austria
| | - Elisabeth Rumetshofer
- LIT AI Lab & Institute for Machine Learning, Johannes Kepler University, Linz 4040, Austria
| | | | - Sepp Hochreiter
- LIT AI Lab & Institute for Machine Learning, Johannes Kepler University, Linz 4040, Austria
| | - Günter Klambauer
- LIT AI Lab & Institute for Machine Learning, Johannes Kepler University, Linz 4040, Austria
| |
Collapse
|
14
|
Cortés-Ciriano I, Bender A. Deep Confidence: A Computationally Efficient Framework for Calculating Reliable Prediction Errors for Deep Neural Networks. J Chem Inf Model 2018; 59:1269-1281. [DOI: 10.1021/acs.jcim.8b00542] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/05/2023]
Affiliation(s)
- Isidro Cortés-Ciriano
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
| |
Collapse
|
15
|
Cortés-Ciriano I, Firth NC, Bender A, Watson O. Discovering Highly Potent Molecules from an Initial Set of Inactives Using Iterative Screening. J Chem Inf Model 2018; 58:2000-2014. [PMID: 30130102 DOI: 10.1021/acs.jcim.8b00376] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]
Abstract
The versatility of similarity searching and quantitative structure-activity relationships to model the activity of compound sets within given bioactivity ranges (i.e., interpolation) is well established. However, their relative performance in the common scenario in early stage drug discovery where lots of inactive data but no active data points are available (i.e., extrapolation from the low-activity to the high-activity range) has not been thoroughly examined yet. To this aim, we have designed an iterative virtual screening strategy which was evaluated on 25 diverse bioactivity data sets from ChEMBL. We benchmark the efficiency of random forest (RF), multiple linear regression, ridge regression, similarity searching, and random selection of compounds to identify a highly active molecule in the test set among a large number of low-potency compounds. We use the number of iterations required to find this active molecule to evaluate the performance of each experimental setup. We show that linear and ridge regression often outperform RF and similarity searching, reducing the number of iterations to find an active compound by a factor of 2 or more. Even simple regression methods seem better able to extrapolate to high-bioactivity ranges than RF, which only provides output values in the range covered by the training set. In addition, examination of the scaffold diversity in the data sets used shows that in some cases similarity searching and RF require two times as many iterations as random selection depending on the chemical space covered in the initial training data. Lastly, we show using bioactivity data for COX-1 and COX-2 that our framework can be extended to multitarget drug discovery, where compounds are selected by concomitantly considering their activity against multiple targets. Overall, this study provides an approach for iterative screening where only inactive data are present in early stages of drug discovery in order to discover highly potent compounds and the best experimental set up in which to do so.
Collapse
Affiliation(s)
- Isidro Cortés-Ciriano
- Centre for Molecular Informatics, Department of Chemistry , University of Cambridge , Lensfield Road , Cambridge CB2 1EW , United Kingdom
| | - Nicholas C Firth
- Centre for Medical Image Computing, Department of Computer Science , UCL , London WC1E 6BT , United Kingdom.,Evariste Technologies Ltd , Goring on Thames RG8 9AL , United Kingdom
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry , University of Cambridge , Lensfield Road , Cambridge CB2 1EW , United Kingdom
| | - Oliver Watson
- Evariste Technologies Ltd , Goring on Thames RG8 9AL , United Kingdom
| |
Collapse
|
16
|
Seidel D, Rothe R, Kirsten M, Jahnke HG, Dumann K, Ziemer M, Simon JC, Robitzki AA. A multidimensional impedance platform for the real-time analysis of single and combination drug pharmacology in patient-derived viable melanoma models. Biosens Bioelectron 2018; 123:185-194. [PMID: 30201332 DOI: 10.1016/j.bios.2018.08.049] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2018] [Revised: 08/14/2018] [Accepted: 08/20/2018] [Indexed: 02/06/2023]
Abstract
In today's development of anticancer drugs, there is an enormous demand for sensitive, non-invasive real-time screening technologies to identify pharmacodynamics/-kinetics of single and combined drugs with high precision. The combination of sophisticated drug sensitivity testing with advanced in vitro tumor models reflecting heterogeneous tumor behavior in vivo is needed to more reasonably predict therapeutic outcome in vivo. In this study, the benefits of our real-time, non-invasive multidimensional impedance platform over standard in vitro drug sensitivity assays were demonstrated quantitatively using an advanced melanoma model. Detailed pharmacological profiles of clinically established targeted therapeutics in single and combination treatment have been identified in patient tissue and isolated 2D/3D cell line cultures. Impedance spectroscopy revealed significant differences in tissue structure responsible for BRAF inhibitor pharmacokinetics in BRAFV600E tumor microfragments and cell lines. Remarkably, BRAF-/MEK inhibitor combination treatment of direct patient-derived tissue, but not melanoma cell lines, resulted in short-term antagonistic effects consistent with in vivo findings. In contrast, the clinically validated resistance delay and thus long-term synergy of targeted therapeutics in advanced melanoma models has been demonstrated using impedance technology. The results demonstrate limited clinical transferability of 2D/3D cancer cell line-based chemosensitivity data and underline the importance of in vivo-like direct patient-derived tissue for predictive drug studies. Our non-invasive and highly sensitive multidimensional impedance platform offers great potential for quantifying short- and long-term drug kinetics and synergies to identify the most effective drug combinations in advanced cancer models, thereby improving personalized drug development and treatment planning and ultimately, overall patient outcomes.
Collapse
Affiliation(s)
- Diana Seidel
- Center for Biotechnology and Biomedicine (BBZ), Universität Leipzig, Division of Molecular Biological-Biochemical Processing Technology, Deutscher Platz 5, 04103 Leipzig, Germany
| | - Rebecca Rothe
- Center for Biotechnology and Biomedicine (BBZ), Universität Leipzig, Division of Molecular Biological-Biochemical Processing Technology, Deutscher Platz 5, 04103 Leipzig, Germany
| | - Mandy Kirsten
- Center for Biotechnology and Biomedicine (BBZ), Universität Leipzig, Division of Molecular Biological-Biochemical Processing Technology, Deutscher Platz 5, 04103 Leipzig, Germany
| | - Heinz-Georg Jahnke
- Center for Biotechnology and Biomedicine (BBZ), Universität Leipzig, Division of Molecular Biological-Biochemical Processing Technology, Deutscher Platz 5, 04103 Leipzig, Germany
| | - Konstantin Dumann
- Leipzig University Medical Center, Department of Dermatology, Venerology and Allergology, Philipp-Rosenthal-Str. 23, 04103 Leipzig, Germany
| | - Mirjana Ziemer
- Leipzig University Medical Center, Department of Dermatology, Venerology and Allergology, Philipp-Rosenthal-Str. 23, 04103 Leipzig, Germany
| | - Jan-Christoph Simon
- Leipzig University Medical Center, Department of Dermatology, Venerology and Allergology, Philipp-Rosenthal-Str. 23, 04103 Leipzig, Germany
| | - Andrea A Robitzki
- Center for Biotechnology and Biomedicine (BBZ), Universität Leipzig, Division of Molecular Biological-Biochemical Processing Technology, Deutscher Platz 5, 04103 Leipzig, Germany.
| |
Collapse
|
17
|
Svensson F, Aniceto N, Norinder U, Cortes-Ciriano I, Spjuth O, Carlsson L, Bender A. Conformal Regression for Quantitative Structure–Activity Relationship Modeling—Quantifying Prediction Uncertainty. J Chem Inf Model 2018; 58:1132-1140. [DOI: 10.1021/acs.jcim.8b00054] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]
Affiliation(s)
- Fredrik Svensson
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
- IOTA Pharmaceuticals, St Johns Innovation Centre, Cowley Road, Cambridge CB4 0WS, U.K
| | - Natalia Aniceto
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
| | - Ulf Norinder
- Swetox, Unit of Toxicology Sciences, Karolinska Institutet, Forskargatan 20, SE-151 36 Södertälje, Sweden
- Department of Computer and Systems Sciences, Stockholm University, Box 7003, SE-164 07 Kista, Sweden
| | - Isidro Cortes-Ciriano
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
| | - Ola Spjuth
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, SE-75124, Uppsala Sweden
| | - Lars Carlsson
- Quantitative Biology, Discovery Sciences, IMED Biotech Unit, AstraZeneca, SE-43183, Mölndal, Sweden
- Department of Computer Science, Royal Holloway, University of London, Egham Hill, Surrey, U.K
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
| |
Collapse
|
18
|
Kireeva N, Pervov VS. Materials space of solid-state electrolytes: unraveling chemical composition–structure–ionic conductivity relationships in garnet-type metal oxides using cheminformatics virtual screening approaches. Phys Chem Chem Phys 2017; 19:20904-20918. [DOI: 10.1039/c7cp00518k] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
Several candidate garnet-related compounds have been recommended for synthesis as potential materials for solid-state electrolytes.
Collapse
Affiliation(s)
- Natalia Kireeva
- Frumkin Institute of Physical Chemistry and Electrochemistry Russian Academy of Sciences
- Moscow
- Russia
- Moscow Institute of Physics and Technology (State University)
- Dolgoprudny
| | - Vladislav S. Pervov
- Kurnakov Institute of General and Inorganic Chemistry Russian Academy of Sciences
- Moscow
- Russia
| |
Collapse
|
19
|
Svensson F, Norinder U, Bender A. Modelling compound cytotoxicity using conformal prediction and PubChem HTS data. Toxicol Res (Camb) 2017; 6:73-80. [PMID: 30090478 PMCID: PMC6061930 DOI: 10.1039/c6tx00252h] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2016] [Accepted: 10/28/2016] [Indexed: 12/28/2022] Open
Abstract
The assessment of compound cytotoxicity is an important part of the drug discovery process. Accurate predictions of cytotoxicity have the potential to expedite decision making and save considerable time and effort. In this work we apply class conditional conformal prediction to model the cytotoxicity of compounds based on 16 high throughput cytotoxicity assays from PubChem. The data span 16 cell lines and comprise more than 440 000 unique compounds. The data sets are heavily imbalanced with only 0.8% of the tested compounds being cytotoxic. We trained one classification model for each cell line and validated the performance with respect to validity and accuracy. The generated models deliver high quality predictions for both toxic and non-toxic compounds despite the imbalance between the two classes. On external data collected from the same assay provider as one of the investigated cell lines the model had a sensitivity of 74% and a specificity of 65% at the 80% confidence level among the compounds assigned to a single class. Compared to previous approaches for large scale cytotoxicity modelling, this represents a balanced performance in the prediction of the toxic and non-toxic classes. The conformal prediction framework also allows the modeller to control the error frequency of the predictions, allowing predictions of cytotoxicity outcomes with confidence.
Collapse
Affiliation(s)
- Fredrik Svensson
- Centre for Molecular Informatics , Department of Chemistry , University of Cambridge , Lensfield Road , Cambridge CB2 1EW , UK .
| | - Ulf Norinder
- Swedish Toxicology Sciences Research Center , SE-151 36 Södertälje , Sweden
- Dept. Computer and Systems Sciences , Stockholm Univ. , Box 7003 , SE-164 07 Kista , Sweden
| | - Andreas Bender
- Centre for Molecular Informatics , Department of Chemistry , University of Cambridge , Lensfield Road , Cambridge CB2 1EW , UK .
| |
Collapse
|
20
|
Santo VE, Rebelo SP, Estrada MF, Alves PM, Boghaert E, Brito C. Drug screening in 3D in vitro tumor models: overcoming current pitfalls of efficacy read-outs. Biotechnol J 2016; 12. [PMID: 27966285 DOI: 10.1002/biot.201600505] [Citation(s) in RCA: 69] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2016] [Revised: 10/24/2016] [Accepted: 11/10/2016] [Indexed: 12/13/2022]
Abstract
There is cumulating evidence that in vitro 3D tumor models with increased physiological relevance can improve the predictive value of pre-clinical research and ultimately contribute to achieve decisions earlier during the development of cancer-targeted therapies. Due to the role of tumor microenvironment in the response of tumor cells to therapeutics, the incorporation of different elements of the tumor niche on cell model design is expected to contribute to the establishment of more predictive in vitro tumor models. This review is focused on the several challenges and adjustments that the field of oncology research is facing to translate these advanced tumor cells models to drug discovery, taking advantage of the progress on culture technologies, imaging platforms, high throughput and automated systems. The choice of 3D cell model, the experimental design, choice of read-outs and interpretation of data obtained from 3D cell models are critical aspects when considering their implementation in drug discovery. In this review, we foresee some of these aspects and depict the potential directions of pre-clinical oncology drug discovery towards improved prediction of drug efficacy.
Collapse
Affiliation(s)
- Vítor E Santo
- iBET, Instituto de Biologia Experimental e Tecnológica, Oeiras, Portugal.,Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Oeiras, Portugal
| | - Sofia P Rebelo
- iBET, Instituto de Biologia Experimental e Tecnológica, Oeiras, Portugal.,Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Oeiras, Portugal
| | - Marta F Estrada
- iBET, Instituto de Biologia Experimental e Tecnológica, Oeiras, Portugal.,Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Oeiras, Portugal
| | - Paula M Alves
- iBET, Instituto de Biologia Experimental e Tecnológica, Oeiras, Portugal.,Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Oeiras, Portugal
| | | | - Catarina Brito
- iBET, Instituto de Biologia Experimental e Tecnológica, Oeiras, Portugal.,Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Oeiras, Portugal
| |
Collapse
|
21
|
Mervin LH, Cao Q, Barrett IP, Firth MA, Murray D, McWilliams L, Haddrick M, Wigglesworth M, Engkvist O, Bender A. Understanding Cytotoxicity and Cytostaticity in a High-Throughput Screening Collection. ACS Chem Biol 2016; 11:3007-3023. [PMID: 27571164 DOI: 10.1021/acschembio.6b00538] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]
Abstract
While mechanisms of cytotoxicity and cytostaticity have been studied extensively from the biological side, relatively little is currently understood regarding areas of chemical space leading to cytotoxicity and cytostasis in large compound collections. Predicting and rationalizing potential adverse mechanism-of-actions (MoAs) of small molecules is however crucial for screening library design, given the link of even low level cytotoxicity and adverse events observed in man. In this study, we analyzed results from a cell-based cytotoxicity screening cascade, comprising 296 970 nontoxic, 5784 cytotoxic and cytostatic, and 2327 cytostatic-only compounds evaluated on the THP-1 cell-line. We employed an in silico MoA analysis protocol, utilizing 9.5 million active and 602 million inactive bioactivity points to generate target predictions, annotate predicted targets with pathways, and calculate enrichment metrics to highlight targets and pathways. Predictions identify known mechanisms for the top ranking targets and pathways for both phenotypes after review and indicate that while processes involved in cytotoxicity versus cytostaticity seem to overlap, differences between both phenotypes seem to exist to some extent. Cytotoxic predictions highlight many kinases, including the potentially novel cytotoxicity-related target STK32C, while cytostatic predictions outline targets linked with response to DNA damage, metabolism, and cytoskeletal machinery. Fragment analysis was also employed to generate a library of toxicophores to improve general understanding of the chemical features driving toxicity. We highlight substructures with potential kinase-dependent and kinase-independent mechanisms of toxicity. We also trained a cytotoxic classification model on proprietary and public compound readouts, and prospectively validated these on 988 novel compounds comprising difficult and trivial testing instances, to establish the applicability domain of models. The proprietary model performed with precision and recall scores of 77.9% and 83.8%, respectively. The MoA results and top ranking substructures with accompanying MoA predictions are available as a platform to assess screening collections.
Collapse
Affiliation(s)
- Lewis H. Mervin
- Centre
for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
| | - Qing Cao
- Discovery Sciences, AstraZeneca R&D, Waltham, United States
| | - Ian P. Barrett
- Discovery Sciences, AstraZeneca R&D, Cambridge Science Park, Cambridge, United Kingdom
| | - Mike A. Firth
- Discovery Sciences, AstraZeneca R&D, Cambridge Science Park, Cambridge, United Kingdom
| | - David Murray
- Discovery Sciences, AstraZeneca R&D, Alderley Park, Macclesfield, United Kingdom
| | - Lisa McWilliams
- Discovery Sciences, AstraZeneca R&D, Alderley Park, Macclesfield, United Kingdom
| | - Malcolm Haddrick
- Discovery Sciences, AstraZeneca R&D, Alderley Park, Macclesfield, United Kingdom
| | - Mark Wigglesworth
- Discovery Sciences, AstraZeneca R&D, Alderley Park, Macclesfield, United Kingdom
| | - Ola Engkvist
- Discovery Sciences, AstraZeneca R&D, Mölndal, Sweden
| | - Andreas Bender
- Centre
for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
| |
Collapse
|