1
|
Wang N, Li X, Xiao J, Liu S, Cao D. Data-driven toxicity prediction in drug discovery: Current status and future directions. Drug Discov Today 2024; 29:104195. [PMID: 39357621 DOI: 10.1016/j.drudis.2024.104195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2024] [Revised: 09/13/2024] [Accepted: 09/26/2024] [Indexed: 10/04/2024]
Abstract
Early toxicity assessment plays a vital role in the drug discovery process on account of its significant influence on the attrition rate of candidates. Recently, constant upgrading of information technology has greatly promoted the continuous development of toxicity prediction. To give an overview of the current state of data-driven toxicity prediction, we reviewed relevant studies and summarized them in three main respects: the features and difficulties of toxicity prediction, the evolution of modeling approaches, and the available tools for toxicity prediction. For each part, we expound the research status, existing challenges, and feasible solutions. Finally, several new directions and suggestions for toxicity prediction are also put forward.
Collapse
Affiliation(s)
- Ningning Wang
- Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; The Hunan Institute of Pharmacy Practice and Clinical Research, Changsha 410008 Hunan, PR China
| | - Xinliang Li
- Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; The Hunan Institute of Pharmacy Practice and Clinical Research, Changsha 410008 Hunan, PR China
| | - Jing Xiao
- Hunan Institute for Drug Control, Changsha 410001 Hunan, PR China
| | - Shao Liu
- Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; The Hunan Institute of Pharmacy Practice and Clinical Research, Changsha 410008 Hunan, PR China.
| | - Dongsheng Cao
- Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, PR China.
| |
Collapse
|
2
|
Arvidsson McShane S, Norinder U, Alvarsson J, Ahlberg E, Carlsson L, Spjuth O. CPSign: conformal prediction for cheminformatics modeling. J Cheminform 2024; 16:75. [PMID: 38943219 PMCID: PMC11214261 DOI: 10.1186/s13321-024-00870-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2023] [Accepted: 06/11/2024] [Indexed: 07/01/2024] Open
Abstract
Conformal prediction has seen many applications in pharmaceutical science, being able to calibrate outputs of machine learning models and producing valid prediction intervals. We here present the open source software CPSign that is a complete implementation of conformal prediction for cheminformatics modeling. CPSign implements inductive and transductive conformal prediction for classification and regression, and probabilistic prediction with the Venn-ABERS methodology. The main chemical representation is signatures but other types of descriptors are also supported. The main modeling methodology is support vector machines (SVMs), but additional modeling methods are supported via an extension mechanism, e.g. DeepLearning4J models. We also describe features for visualizing results from conformal models including calibration and efficiency plots, as well as features to publish predictive models as REST services. We compare CPSign against other common cheminformatics modeling approaches including random forest, and a directed message-passing neural network. The results show that CPSign produces robust predictive performance with comparative predictive efficiency, with superior runtime and lower hardware requirements compared to neural network based models. CPSign has been used in several studies and is in production-use in multiple organizations. The ability to work directly with chemical input files, perform descriptor calculation and modeling with SVM in the conformal prediction framework, with a single software package having a low footprint and fast execution time makes CPSign a convenient and yet flexible package for training, deploying, and predicting on chemical data. CPSign can be downloaded from GitHub at https://github.com/arosbio/cpsign .Scientific contribution CPSign provides a single software that allows users to perform data preprocessing, modeling and make predictions directly on chemical structures, using conformal and probabilistic prediction. Building and evaluating new models can be achieved at a high abstraction level, without sacrificing flexibility and predictive performance-showcased with a method evaluation against contemporary modeling approaches, where CPSign performs on par with a state-of-the-art deep learning based model.
Collapse
Affiliation(s)
- Staffan Arvidsson McShane
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, 75124, Sweden
| | - Ulf Norinder
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, 75124, Sweden
- Department of Computer and Systems Sciences, Stockholm University, Stockholm, 10587, Sweden
- MTM Research Centre, School of Science and Technology, Örebro University, Örebro, 70182, Sweden
| | - Jonathan Alvarsson
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, 75124, Sweden
| | - Ernst Ahlberg
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, 75124, Sweden
- Department of Computer Science, Royal Holloway University of London, Egham, TW20 0EX, UK
| | - Lars Carlsson
- Department of Computer Science, Royal Holloway University of London, Egham, TW20 0EX, UK
- Department of Computing, Jönköping University, Jönköping, 55111, Sweden
| | - Ola Spjuth
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, 75124, Sweden.
| |
Collapse
|
3
|
Mikutis S, Lawrinowitz S, Kretzer C, Dunsmore L, Sketeris L, Rodrigues T, Werz O, Bernardes GJL. Machine Learning Uncovers Natural Product Modulators of the 5-Lipoxygenase Pathway and Facilitates the Elucidation of Their Biological Mechanisms. ACS Chem Biol 2024; 19:217-229. [PMID: 38149598 PMCID: PMC10804367 DOI: 10.1021/acschembio.3c00725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2023] [Revised: 12/10/2023] [Accepted: 12/12/2023] [Indexed: 12/28/2023]
Abstract
Machine learning (ML) models have made inroads into chemical sciences, with optimization of chemical reactions and prediction of biologically active molecules being prime examples thereof. These models excel where physical experiments are expensive or time-consuming, for example, due to large scales or the need for materials that are difficult to obtain. Studies of natural products suffer from these issues─this class of small molecules is known for its wealth of structural diversity and wide-ranging biological activities, but their investigation is hindered by poor synthetic accessibility and lack of scalability. To facilitate the evaluation of these molecules, we designed ML models that predict which natural products can interact with a particular target or a relevant pathway. Here, we focused on discovering natural products that are capable of modulating the 5-lipoxygenase (5-LO) pathway that plays key roles in lipid signaling and inflammation. These computational approaches led to the identification of nine natural products that either directly inhibit the activity of the 5-LO enzyme or affect the cellular 5-LO pathway. Further investigation of one of these molecules, deltonin, led us to discover a new cell-type-selective mechanism of action. Our ML approach helped deorphanize natural products as well as shed light on their mechanisms and can be broadly applied to other use cases in chemical biology.
Collapse
Affiliation(s)
- Sigitas Mikutis
- Yusuf
Hamied Department of Chemistry, University
of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K.
| | - Stefanie Lawrinowitz
- Department
of Pharmaceutical/Medicinal Chemistry, Institute of Pharmacy, Friedrich Schiller University Jena, Philosophenweg 14, 07743 Jena, Germany
| | - Christian Kretzer
- Department
of Pharmaceutical/Medicinal Chemistry, Institute of Pharmacy, Friedrich Schiller University Jena, Philosophenweg 14, 07743 Jena, Germany
| | - Lavinia Dunsmore
- Yusuf
Hamied Department of Chemistry, University
of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K.
| | - Laurynas Sketeris
- Yusuf
Hamied Department of Chemistry, University
of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K.
| | - Tiago Rodrigues
- Instituto
de Investigação do Medicamento (iMed), Faculdade de
Farmácia, Universidade de Lisboa, Av. Prof. Gama Pinto, 1649-003 Lisbon, Portugal
| | - Oliver Werz
- Department
of Pharmaceutical/Medicinal Chemistry, Institute of Pharmacy, Friedrich Schiller University Jena, Philosophenweg 14, 07743 Jena, Germany
| | - Gonçalo J. L. Bernardes
- Yusuf
Hamied Department of Chemistry, University
of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K.
- Instituto
de Medicina Molecular João Lobo Antunes, Faculdade de Medicina, Universidade de Lisboa, Avenida Professor Egas Moniz, 1649-028 Lisboa, Portugal
| |
Collapse
|
4
|
Guo W, Liu J, Dong F, Song M, Li Z, Khan MKH, Patterson TA, Hong H. Review of machine learning and deep learning models for toxicity prediction. Exp Biol Med (Maywood) 2023; 248:1952-1973. [PMID: 38057999 PMCID: PMC10798180 DOI: 10.1177/15353702231209421] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/08/2023] Open
Abstract
The ever-increasing number of chemicals has raised public concerns due to their adverse effects on human health and the environment. To protect public health and the environment, it is critical to assess the toxicity of these chemicals. Traditional in vitro and in vivo toxicity assays are complicated, costly, and time-consuming and may face ethical issues. These constraints raise the need for alternative methods for assessing the toxicity of chemicals. Recently, due to the advancement of machine learning algorithms and the increase in computational power, many toxicity prediction models have been developed using various machine learning and deep learning algorithms such as support vector machine, random forest, k-nearest neighbors, ensemble learning, and deep neural network. This review summarizes the machine learning- and deep learning-based toxicity prediction models developed in recent years. Support vector machine and random forest are the most popular machine learning algorithms, and hepatotoxicity, cardiotoxicity, and carcinogenicity are the frequently modeled toxicity endpoints in predictive toxicology. It is known that datasets impact model performance. The quality of datasets used in the development of toxicity prediction models using machine learning and deep learning is vital to the performance of the developed models. The different toxicity assignments for the same chemicals among different datasets of the same type of toxicity have been observed, indicating benchmarking datasets is needed for developing reliable toxicity prediction models using machine learning and deep learning algorithms. This review provides insights into current machine learning models in predictive toxicology, which are expected to promote the development and application of toxicity prediction models in the future.
Collapse
Affiliation(s)
- Wenjing Guo
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
| | - Jie Liu
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
| | - Fan Dong
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
| | - Meng Song
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
| | - Zoe Li
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
| | - Md Kamrul Hasan Khan
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
| | - Tucker A Patterson
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
| | - Huixiao Hong
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
| |
Collapse
|
5
|
Morger A, Garcia de Lomana M, Norinder U, Svensson F, Kirchmair J, Mathea M, Volkamer A. Studying and mitigating the effects of data drifts on ML model performance at the example of chemical toxicity data. Sci Rep 2022; 12:7244. [PMID: 35508546 PMCID: PMC9068909 DOI: 10.1038/s41598-022-09309-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2021] [Accepted: 03/17/2022] [Indexed: 11/09/2022] Open
Abstract
Machine learning models are widely applied to predict molecular properties or the biological activity of small molecules on a specific protein. Models can be integrated in a conformal prediction (CP) framework which adds a calibration step to estimate the confidence of the predictions. CP models present the advantage of ensuring a predefined error rate under the assumption that test and calibration set are exchangeable. In cases where the test data have drifted away from the descriptor space of the training data, or where assay setups have changed, this assumption might not be fulfilled and the models are not guaranteed to be valid. In this study, the performance of internally valid CP models when applied to either newer time-split data or to external data was evaluated. In detail, temporal data drifts were analysed based on twelve datasets from the ChEMBL database. In addition, discrepancies between models trained on publicly-available data and applied to proprietary data for the liver toxicity and MNT in vivo endpoints were investigated. In most cases, a drastic decrease in the validity of the models was observed when applied to the time-split or external (holdout) test sets. To overcome the decrease in model validity, a strategy for updating the calibration set with data more similar to the holdout set was investigated. Updating the calibration set generally improved the validity, restoring it completely to its expected value in many cases. The restored validity is the first requisite for applying the CP models with confidence. However, the increased validity comes at the cost of a decrease in model efficiency, as more predictions are identified as inconclusive. This study presents a strategy to recalibrate CP models to mitigate the effects of data drifts. Updating the calibration sets without having to retrain the model has proven to be a useful approach to restore the validity of most models.
Collapse
Affiliation(s)
- Andrea Morger
- In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin Berlin, Berlin, 10117, Germany
| | - Marina Garcia de Lomana
- BASF SE, 67056, Ludwigshafen, Germany
- Division of Pharmaceutical Chemistry, Department of Pharmaceutical Sciences, University of Vienna, Vienna, 1090, Austria
| | - Ulf Norinder
- Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, 751 24, Sweden
- Dept Computer and Systems Sciences, Stockholm University, Kista, 164 07, Sweden
- MTM Research Centre, School of Science and Technology, 701 82, Örebro, Sweden
| | - Fredrik Svensson
- Alzheimer's Research UK UCL Drug Discovery Institute, London, WC1E 6BT, UK
| | - Johannes Kirchmair
- Division of Pharmaceutical Chemistry, Department of Pharmaceutical Sciences, University of Vienna, Vienna, 1090, Austria
| | | | - Andrea Volkamer
- In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin Berlin, Berlin, 10117, Germany.
| |
Collapse
|
6
|
In silico predictions of the gastrointestinal uptake of macrocycles in man using conformal prediction methodology. J Pharm Sci 2022; 111:2614-2619. [DOI: 10.1016/j.xphs.2022.05.010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2022] [Revised: 05/16/2022] [Accepted: 05/16/2022] [Indexed: 11/17/2022]
|
7
|
Fagerholm U, Hellberg S, Alvarsson J, Spjuth O. In silico predictions of the human pharmacokinetics/toxicokinetics of 65 chemicals from various classes using conformal prediction methodology. Xenobiotica 2022; 52:113-118. [PMID: 35238270 DOI: 10.1080/00498254.2022.2049397] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/20/2023]
Abstract
Pharmacokinetic/toxicokinetic (PK/TK) information for chemicals in humans is generally lacking. Here we applied machine learning, conformal prediction and a new physiologically-based PK/TK model for prediction of the human PK/TK of 65 chemicals from different classes, including carcinogens, food constituents and preservatives, vitamins, sweeteners, dyes and colours, pesticides, alternative medicines, flame retardants, psychoactive drugs, dioxins, poisons, UV-absorbents, surfactants, solvents and cosmetics.About 80% of the main human PK/TK (fraction absorbed, oral bioavailability, half-life, unbound fraction in plasma, clearance, volume of distribution, fraction excreted) for the selected chemicals was missing in the literature. This information was now added (from in silico predictions). Median and mean prediction errors for these parameters were 1.3- to 2.7-fold and 1.4- to 4.8-fold, respectively. In total, 59 and 86% of predictions had errors <2- and <5-fold, respectively. Predicted and observed PK/TK for the chemicals was generally within the range for pharmaceutical drugs.The results validated the new integrated system for prediction of the human PK/TK for different chemicals and added important missing information. No general difference in PK/TK-characteristics was found between the selected chemicals and pharmaceutical drugs.
Collapse
Affiliation(s)
| | | | - Jonathan Alvarsson
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Box 591, Uppsala, SE-751 24 Sweden
| | - Ola Spjuth
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Box 591, Uppsala, 75124 Sweden
| |
Collapse
|
8
|
Klutzny S, Kornhuber M, Morger A, Schönfelder G, Volkamer A, Oelgeschläger M, Dunst S. Quantitative high-throughput phenotypic screening for environmental estrogens using the E-Morph Screening Assay in combination with in silico predictions. ENVIRONMENT INTERNATIONAL 2022; 158:106947. [PMID: 34717173 DOI: 10.1016/j.envint.2021.106947] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/14/2021] [Revised: 10/14/2021] [Accepted: 10/18/2021] [Indexed: 06/13/2023]
Abstract
BACKGROUND Exposure to environmental chemicals that interfere with normal estrogen function can lead to adverse health effects, including cancer. High-throughput screening (HTS) approaches facilitate the efficient identification and characterization of such substances. OBJECTIVES We recently described the development of the E-Morph Assay, which measures changes at adherens junctions as a clinically-relevant phenotypic readout for estrogen receptor (ER) alpha signaling activity. Here, we describe its further development and application for automated robotic HTS. METHODS Using the advanced E-Morph Screening Assay, we screened a substance library comprising 430 toxicologically-relevant industrial chemicals, biocides, and plant protection products to identify novel substances with estrogenic activities. Based on the primary screening data and the publicly available ToxCast dataset, we performed an insilico similarity search to identify further substances with potential estrogenic activity for follow-up hit expansion screening, and built seven insilico ER models using the conformal prediction (CP) framework to evaluate the HTS results. RESULTS The primary and hit confirmation screens identified 27 'known' estrogenic substances with potencies correlating very well with the published ToxCast ER Agonist Score (r=+0.95). We additionally detected potential 'novel' estrogenic activities for 10 primary hit substances and for another nine out of 20 structurally similar substances from insilico predictions and follow-up hit expansion screening. The concordance of the E-Morph Screening Assay with the ToxCast ER reference data and the generated CP ER models was 71% and 73%, respectively, with a high predictivity for ER active substances of up to 87%, which is particularly important for regulatory purposes. DISCUSSION These data provide a proof-of-concept for the combination of in vitro HTS approaches with insilico methods (similarity search, CP models) for efficient analysis of large substance libraries in order to prioritize substances with potential estrogenic activity for subsequent testing against higher tier human endpoints.
Collapse
Affiliation(s)
- Saskia Klutzny
- Experimental Toxicology and ZEBET, German Federal Institute for Risk Assessment (BfR), German Centre for the Protection of Laboratory Animals (Bf3R), Berlin, Germany
| | - Marja Kornhuber
- Experimental Toxicology and ZEBET, German Federal Institute for Risk Assessment (BfR), German Centre for the Protection of Laboratory Animals (Bf3R), Berlin, Germany; Freie Universität Berlin, Berlin, Germany
| | - Andrea Morger
- In silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany
| | - Gilbert Schönfelder
- Experimental Toxicology and ZEBET, German Federal Institute for Risk Assessment (BfR), German Centre for the Protection of Laboratory Animals (Bf3R), Berlin, Germany; Institute of Clinical Pharmacology and Toxicology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany
| | - Andrea Volkamer
- In silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany
| | - Michael Oelgeschläger
- Experimental Toxicology and ZEBET, German Federal Institute for Risk Assessment (BfR), German Centre for the Protection of Laboratory Animals (Bf3R), Berlin, Germany
| | - Sebastian Dunst
- Experimental Toxicology and ZEBET, German Federal Institute for Risk Assessment (BfR), German Centre for the Protection of Laboratory Animals (Bf3R), Berlin, Germany.
| |
Collapse
|
9
|
Fagerholm U, Hellberg S, Alvarsson J, Arvidsson McShane S, Spjuth O. In silico prediction of volume of distribution of drugs in man using conformal prediction performs on par with animal data-based models. Xenobiotica 2021; 51:1366-1371. [PMID: 34845977 DOI: 10.1080/00498254.2021.2011471] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
Abstract
Volume of distribution at steady state (Vss) is an important pharmacokinetic endpoint. In this study we apply machine learning and conformal prediction for human Vss prediction, and make a head-to-head comparison with rat-to-man scaling, allometric scaling and the Rodgers-Lukova method on combined in silico and in vitro data, using a test set of 105 compounds with experimentally observed Vss.The mean prediction error and % with <2-fold prediction error for our method were 2.4-fold and 64%, respectively. 69% of test compounds had an observed Vss within the prediction interval at a 70% confidence level. In comparison, 2.2-, 2.9- and 3.1-fold mean errors and 69, 64 and 61% of predictions with <2-fold error was reached with rat-to-man and allometric scaling and Rodgers-Lukova method, respectively.We conclude that our method has theoretically proven validity that was empirically confirmed, and showing predictive accuracy on par with animal models and superior to an alternative widely used in silico-based method. The option for the user to select the level of confidence in predictions offers better guidance on how to optimise Vss in drug discovery applications.
Collapse
Affiliation(s)
| | | | - Jonathan Alvarsson
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| | - Staffan Arvidsson McShane
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| | - Ola Spjuth
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| |
Collapse
|
10
|
Kolmar SS, Grulke CM. The effect of noise on the predictive limit of QSAR models. J Cheminform 2021; 13:92. [PMID: 34823605 PMCID: PMC8613965 DOI: 10.1186/s13321-021-00571-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2021] [Accepted: 11/14/2021] [Indexed: 01/09/2023] Open
Abstract
A key challenge in the field of Quantitative Structure Activity Relationships (QSAR) is how to effectively treat experimental error in the training and evaluation of computational models. It is often assumed in the field of QSAR that models cannot produce predictions which are more accurate than their training data. Additionally, it is implicitly assumed, by necessity, that data points in test sets or validation sets do not contain error, and that each data point is a population mean. This work proposes the hypothesis that QSAR models can make predictions which are more accurate than their training data and that the error-free test set assumption leads to a significant misevaluation of model performance. This work used 8 datasets with six different common QSAR endpoints, because different endpoints should have different amounts of experimental error associated with varying complexity of the measurements. Up to 15 levels of simulated Gaussian distributed random error was added to the datasets, and models were built on the error laden datasets using five different algorithms. The models were trained on the error laden data, evaluated on error-laden test sets, and evaluated on error-free test sets. The results show that for each level of added error, the RMSE for evaluation on the error free test sets was always better. The results support the hypothesis that, at least under the conditions of Gaussian distributed random error, QSAR models can make predictions which are more accurate than their training data, and that the evaluation of models on error laden test and validation sets may give a flawed measure of model performance. These results have implications for how QSAR models are evaluated, especially for disciplines where experimental error is very large, such as in computational toxicology. ![]()
Collapse
Affiliation(s)
- Scott S Kolmar
- Center for Computational Toxicology and Exposure, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, USA.
| | - Christopher M Grulke
- Center for Computational Toxicology and Exposure, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, USA
| |
Collapse
|
11
|
Krstajic D. Critical Assessment of Conformal Prediction Methods Applied in Binary Classification Settings. J Chem Inf Model 2021; 61:4823-4826. [PMID: 34550693 DOI: 10.1021/acs.jcim.1c00549] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
In recent years there has been an increase in the number of scientific papers that suggest using conformal predictions in drug discovery. We consider that some versions of conformal predictions applied in binary settings are embroiled in pitfalls, not obvious at first sight, and that it is important to inform the scientific community about them. In the paper we first introduce the general theory of conformal predictions and follow with an explanation of the version currently dominant in drug discovery research today. Finally, we provide cases for their critical assessment in binary classification settings.
Collapse
Affiliation(s)
- Damjan Krstajic
- Research Centre for Cheminformatics, Jasenova 7, 11030 Beograd, Serbia
| |
Collapse
|
12
|
Norinder U, Spjuth O, Svensson F. Synergy conformal prediction applied to large-scale bioactivity datasets and in federated learning. J Cheminform 2021; 13:77. [PMID: 34600569 PMCID: PMC8487527 DOI: 10.1186/s13321-021-00555-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2021] [Accepted: 09/15/2021] [Indexed: 12/05/2022] Open
Abstract
Confidence predictors can deliver predictions with the associated confidence required for decision making and can play an important role in drug discovery and toxicity predictions. In this work we investigate a recently introduced version of conformal prediction, synergy conformal prediction, focusing on the predictive performance when applied to bioactivity data. We compare the performance to other variants of conformal predictors for multiple partitioned datasets and demonstrate the utility of synergy conformal predictors for federated learning where data cannot be pooled in one location. Our results show that synergy conformal predictors based on training data randomly sampled with replacement can compete with other conformal setups, while using completely separate training sets often results in worse performance. However, in a federated setup where no method has access to all the data, synergy conformal prediction is shown to give promising results. Based on our study, we conclude that synergy conformal predictors are a valuable addition to the conformal prediction toolbox.
Collapse
Affiliation(s)
- Ulf Norinder
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Box 591, SE-75124, Uppsala, Sweden.,Department of Computer and Systems Sciences, Stockholm University, Box 7003, 164 07, Kista, Sweden.,MTM Research Centre, School of Science and Technology, Örebro University, 70182, Örebro, Sweden
| | - Ola Spjuth
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Box 591, SE-75124, Uppsala, Sweden.
| | - Fredrik Svensson
- Alzheimer's Research UK UCL Drug Discovery Institute, University College London, The Cruciform Building, Gower Street, London, WC1E 6BT, UK.
| |
Collapse
|
13
|
Huang Y, Wang J, Wang S, Xu X, Qin W, Wen Y, Zhao YH, Martyniuk CJ. Discrimination of active and inactive substances in cytotoxicity based on Tox21 10K compound library: Structure alert and mode of action. Toxicology 2021; 462:152948. [PMID: 34530041 DOI: 10.1016/j.tox.2021.152948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2021] [Revised: 08/28/2021] [Accepted: 09/08/2021] [Indexed: 10/20/2022]
Abstract
In vitro cytotoxicity assay is an ideal alternative method for the in vivo toxicity in the risk assessment of pollutants in environment. However, modes of action (MOAs) of cytotoxicity have not been investigated for a wide range of compounds. In this paper, binomial and recursive partitioning analysis were carried out between the cytotoxicity and molecular descriptors for 8981 compounds. The results showed that cytotoxicity is strongly related to the chemical hydrophobicity and excess molar refraction, indicating the bio-uptake and chemical-receptor interaction through π and n electron pair play important roles in the cytotoxicity. The decision tree derived from recursive partitioning analysis revealed that the studied compounds could be divided into 25 groups and their structural characteristics could be used as structure alert to identify active and inactive compounds in cytotoxicity. The descriptors used in the decision tree revealed that chemical ionization and bioavailability could affect the cytotoxicity for ionizable and highly hydrophobic compounds. Comparison of MOAs based on Verhaar's classification scheme showed that many inert or less inert compounds were inactive substance, and many reactive or specifically-acting compounds were active substances in the cytotoxicity. In vitro toxicity assay instead of in vivo toxicity assay can be used in the environmental hazard and risk assessment of organic pollutants. The descriptors used in the binomial equation and decision tree reveal that chemical hydrophobicity, ionization and solubility play very important roles for identification of active and inactive compounds. The results obtained in this paper are valuable for understanding the modes of action in cytotoxicity and in vivo-in vitro toxicity relationship.
Collapse
Affiliation(s)
- Ying Huang
- State Environmental Protection Key Laboratory of Wetland Ecology and Vegetation Restoration, School of Environment, Northeast Normal University, Changchun, Jilin 130117, PR China
| | - Jia Wang
- State Environmental Protection Key Laboratory of Wetland Ecology and Vegetation Restoration, School of Environment, Northeast Normal University, Changchun, Jilin 130117, PR China
| | - Shuo Wang
- State Environmental Protection Key Laboratory of Wetland Ecology and Vegetation Restoration, School of Environment, Northeast Normal University, Changchun, Jilin 130117, PR China
| | - Xiaotian Xu
- State Environmental Protection Key Laboratory of Wetland Ecology and Vegetation Restoration, School of Environment, Northeast Normal University, Changchun, Jilin 130117, PR China
| | - Weichao Qin
- State Environmental Protection Key Laboratory of Wetland Ecology and Vegetation Restoration, School of Environment, Northeast Normal University, Changchun, Jilin 130117, PR China
| | - Yang Wen
- Key Laboratory of Environmental Materials and Pollution Control, The Education Department of Jilin Province, School of Environmental Science and Engineering, Jilin Normal University, Siping, Jilin 136000, PR China.
| | - Yuan H Zhao
- State Environmental Protection Key Laboratory of Wetland Ecology and Vegetation Restoration, School of Environment, Northeast Normal University, Changchun, Jilin 130117, PR China.
| | - Christopher J Martyniuk
- Center for Environmental and Human Toxicology, Department of Physiological Sciences, College of Veterinary Medicine, UF Genetics Institute, Interdisciplinary Program in Biomedical Sciences Neuroscience, University of Florida, Gainesville, FL, 32611, USA
| |
Collapse
|
14
|
MohammadiPeyhani H, Chiappino-Pepe A, Haddadi K, Hafner J, Hadadi N, Hatzimanikatis V. NICEdrug.ch, a workflow for rational drug design and systems-level analysis of drug metabolism. eLife 2021; 10:e65543. [PMID: 34340747 PMCID: PMC8331181 DOI: 10.7554/elife.65543] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2020] [Accepted: 07/07/2021] [Indexed: 12/30/2022] Open
Abstract
The discovery of a drug requires over a decade of intensive research and financial investments - and still has a high risk of failure. To reduce this burden, we developed the NICEdrug.ch resource, which incorporates 250,000 bioactive molecules, and studied their enzymatic metabolic targets, fate, and toxicity. NICEdrug.ch includes a unique fingerprint that identifies reactive similarities between drug-drug and drug-metabolite pairs. We validated the application, scope, and performance of NICEdrug.ch over similar methods in the field on golden standard datasets describing drugs and metabolites sharing reactivity, drug toxicities, and drug targets. We use NICEdrug.ch to evaluate inhibition and toxicity by the anticancer drug 5-fluorouracil, and suggest avenues to alleviate its side effects. We propose shikimate 3-phosphate for targeting liver-stage malaria with minimal impact on the human host cell. Finally, NICEdrug.ch suggests over 1300 candidate drugs and food molecules to target COVID-19 and explains their inhibitory mechanism for further experimental screening. The NICEdrug.ch database is accessible online to systematically identify the reactivity of small molecules and druggable enzymes with practical applications in lead discovery and drug repurposing.
Collapse
Affiliation(s)
- Homa MohammadiPeyhani
- Laboratory of Computational Systems Biotechnology, École Polytechnique Fédérale de Lausanne, EPFLLausanneSwitzerland
| | - Anush Chiappino-Pepe
- Laboratory of Computational Systems Biotechnology, École Polytechnique Fédérale de Lausanne, EPFLLausanneSwitzerland
| | - Kiandokht Haddadi
- Laboratory of Computational Systems Biotechnology, École Polytechnique Fédérale de Lausanne, EPFLLausanneSwitzerland
| | - Jasmin Hafner
- Laboratory of Computational Systems Biotechnology, École Polytechnique Fédérale de Lausanne, EPFLLausanneSwitzerland
| | - Noushin Hadadi
- Laboratory of Computational Systems Biotechnology, École Polytechnique Fédérale de Lausanne, EPFLLausanneSwitzerland
| | - Vassily Hatzimanikatis
- Laboratory of Computational Systems Biotechnology, École Polytechnique Fédérale de Lausanne, EPFLLausanneSwitzerland
| |
Collapse
|
15
|
Nigam A, Pollice R, Hurley MFD, Hickman RJ, Aldeghi M, Yoshikawa N, Chithrananda S, Voelz VA, Aspuru-Guzik A. Assigning confidence to molecular property prediction. Expert Opin Drug Discov 2021; 16:1009-1023. [DOI: 10.1080/17460441.2021.1925247] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
Affiliation(s)
- AkshatKumar Nigam
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Canada
- Department of Computer Science, University of Toronto, Toronto, Canada
| | - Robert Pollice
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Canada
- Department of Computer Science, University of Toronto, Toronto, Canada
| | | | - Riley J. Hickman
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Canada
- Department of Computer Science, University of Toronto, Toronto, Canada
| | - Matteo Aldeghi
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Canada
- Department of Computer Science, University of Toronto, Toronto, Canada
- Vector Institute for Artificial Intelligence, University Ave Suite 710, Toronto, Canada
| | - Naruki Yoshikawa
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Canada
- Department of Computer Science, University of Toronto, Toronto, Canada
| | | | | | - Alán Aspuru-Guzik
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Canada
- Department of Computer Science, University of Toronto, Toronto, Canada
- Vector Institute for Artificial Intelligence, University Ave Suite 710, Toronto, Canada
- Canadian Institute for Advanced Research (CIFAR), University Ave, Toronto, Canada
| |
Collapse
|
16
|
Zhang J, Norinder U, Svensson F. Deep Learning-Based Conformal Prediction of Toxicity. J Chem Inf Model 2021; 61:2648-2657. [PMID: 34043352 DOI: 10.1021/acs.jcim.1c00208] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Predictive modeling for toxicity can help reduce risks in a range of applications and potentially serve as the basis for regulatory decisions. However, the utility of these predictions can be limited if the associated uncertainty is not adequately quantified. With recent studies showing great promise for deep learning-based models also for toxicity predictions, we investigate the combination of deep learning-based predictors with the conformal prediction framework to generate highly predictive models with well-defined uncertainties. We use a range of deep feedforward neural networks and graph neural networks in a conformal prediction setting and evaluate their performance on data from the Tox21 challenge. We also compare the results from the conformal predictors to those of the underlying machine learning models. The results indicate that highly predictive models can be obtained that result in very efficient conformal predictors even at high confidence levels. Taken together, our results highlight the utility of conformal predictors as a convenient way to deliver toxicity predictions with confidence, adding both statistical guarantees on the model performance as well as better predictions of the minority class compared to the underlying models.
Collapse
Affiliation(s)
- Jin Zhang
- Department of Drug Metabolism and Pharmacokinetics, Janssen Pharmaceutica NV, B-2340 Beerse, Belgium
| | - Ulf Norinder
- Department of Computer and Systems Sciences, Stockholm University, P.O. Box 7003, SE-164 07 Kista, Sweden.,Department of Pharmaceutical Biosciences, Uppsala University, P.O. Box 591, SE-751 24 Uppsala, Sweden.,MTM Research Centre, School of Science and Technology, Örebro University, SE-701 82 Örebro, Sweden
| | - Fredrik Svensson
- The Alzheimer's Research UK University College London Drug Discovery Institute, The Cruciform Building, Gower Street, London WC1E 6BT, U.K
| |
Collapse
|
17
|
Mukherjee A, Su A, Rajan K. Deep Learning Model for Identifying Critical Structural Motifs in Potential Endocrine Disruptors. J Chem Inf Model 2021; 61:2187-2197. [PMID: 33872000 DOI: 10.1021/acs.jcim.0c01409] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
Abstract
This paper aims to identify structural motifs within a molecule that contribute the most toward a chemical being an endocrine disruptor. We have developed a deep neural network-based toolkit toward this aim. The trained model can virtually assess a synthetic chemical's potential to be an endocrine disruptor using machine-readable molecular representation, simplified molecular input line entry system (SMILES). Our proposed toolkit is a multilabel or multioutput classification model that combines both convolution and long short-term memory (LSTM) architectures. The toolkit leverages the advantages of an active learning-based framework that combines multiple sources of data. Class activation maps (CAMs) generated from the feature-extraction layers can identify the structural alerts and the chemical environment that determines the specificity of the structural alerts.
Collapse
Affiliation(s)
- Arpan Mukherjee
- Department of Materials Design and Innovation, University at Buffalo, Buffalo, New York 14260-1660, United States
| | - An Su
- Department of Materials Design and Innovation, University at Buffalo, Buffalo, New York 14260-1660, United States
| | - Krishna Rajan
- Department of Materials Design and Innovation, University at Buffalo, Buffalo, New York 14260-1660, United States
| |
Collapse
|
18
|
Seal S, Yang H, Vollmers L, Bender A. Comparison of Cellular Morphological Descriptors and Molecular Fingerprints for the Prediction of Cytotoxicity- and Proliferation-Related Assays. Chem Res Toxicol 2021; 34:422-437. [PMID: 33522793 DOI: 10.1021/acs.chemrestox.0c00303] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]
Abstract
Cell morphology features, such as those from the Cell Painting assay, can be generated at relatively low costs and represent versatile biological descriptors of a system and thereby compound response. In this study, we explored cell morphology descriptors and molecular fingerprints, separately and in combination, for the prediction of cytotoxicity- and proliferation-related in vitro assay endpoints. We selected 135 compounds from the MoleculeNet ToxCast benchmark data set which were annotated with Cell Painting readouts, where the relatively small size of the data set is due to the overlap of required annotations. We trained Random Forest classification models using nested cross-validation and Cell Painting descriptors, Morgan and ErG fingerprints, and their combinations. While using leave-one-cluster-out cross-validation (with clusters based on physicochemical descriptors), models using Cell Painting descriptors achieved higher average performance over all assays (Balanced Accuracy of 0.65, Matthews Correlation Coefficient of 0.28, and AUC-ROC of 0.71) compared to models using ErG fingerprints (BA 0.55, MCC 0.09, and AUC-ROC 0.60) and Morgan fingerprints alone (BA 0.54, MCC 0.06, and AUC-ROC 0.56). While using random shuffle splits, the combination of Cell Painting descriptors with ErG and Morgan fingerprints further improved balanced accuracy on average by 8.9% (in 9 out of 12 assays) and 23.4% (in 8 out of 12 assays) compared to using only ErG and Morgan fingerprints, respectively. Regarding feature importance, Cell Painting descriptors related to nuclei texture, granularity of cells, and cytoplasm as well as cell neighbors and radial distributions were identified to be most contributing, which is plausible given the endpoint considered. We conclude that cell morphological descriptors contain complementary information to molecular fingerprints which can be used to improve the performance of predictive cytotoxicity models, in particular in areas of novel structural space.
Collapse
Affiliation(s)
- Srijit Seal
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
| | - Hongbin Yang
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
| | - Luis Vollmers
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
| |
Collapse
|
19
|
Jain S, Norinder U, Escher SE, Zdrazil B. Combining In Vivo Data with In Silico Predictions for Modeling Hepatic Steatosis by Using Stratified Bagging and Conformal Prediction. Chem Res Toxicol 2020; 34:656-668. [PMID: 33347274 PMCID: PMC7887803 DOI: 10.1021/acs.chemrestox.0c00511] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]
Abstract
Hepatic steatosis (fatty liver) is a severe liver disease induced by the excessive accumulation of fatty acids in hepatocytes. In this study, we developed reliable in silico models for predicting hepatic steatosis on the basis of an in vivo data set of 1041 compounds measured in rodent studies with repeated oral exposure. The imbalanced nature of the data set (1:8, with the "steatotic" compounds belonging to the minority class) required the use of meta-classifiers-bagging with stratified under-sampling and Mondrian conformal prediction-on top of the base classifier random forest. One major goal was the investigation of the influence of different descriptor combinations on model performance (tested by predicting an external validation set): physicochemical descriptors (RDKit), ToxPrint features, as well as predictions from in silico nuclear receptor and transporter models. All models based upon descriptor combinations including physicochemical features led to reasonable balanced accuracies (BAs between 0.65 and 0.69 for the respective models). Combining physicochemical features with transporter predictions and further with ToxPrint features gave the best performing model (BAs up to 0.7 and efficiencies of 0.82). Whereas both meta-classifiers proved useful for this highly imbalanced toxicity data set, the conformal prediction framework also guarantees the error level and thus might be favored for future studies in the field of predictive toxicology.
Collapse
Affiliation(s)
- Sankalp Jain
- Department of Pharmaceutical Chemistry, Division of Drug Design and Medicinal Chemistry, University of Vienna, 1090 Vienna, Austria
| | - Ulf Norinder
- Unit of Toxicology Sciences, Swetox, Karolinska Institutet, SE-15136 Södertälje, Sweden
| | - Sylvia E Escher
- Fraunhofer Institute for Toxicology and Experimental Medicine (ITEM), 30625 Hannover, Germany
| | - Barbara Zdrazil
- Department of Pharmaceutical Chemistry, Division of Drug Design and Medicinal Chemistry, University of Vienna, 1090 Vienna, Austria
| |
Collapse
|
20
|
Mervin LH, Johansson S, Semenova E, Giblin KA, Engkvist O. Uncertainty quantification in drug design. Drug Discov Today 2020; 26:474-489. [PMID: 33253918 DOI: 10.1016/j.drudis.2020.11.027] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2020] [Revised: 07/13/2020] [Accepted: 11/23/2020] [Indexed: 01/03/2023]
Abstract
Machine learning and artificial intelligence are increasingly being applied to the drug-design process as a result of the development of novel algorithms, growing access, the falling cost of computation and the development of novel technologies for generating chemically and biologically relevant data. There has been recent progress in fields such as molecular de novo generation, synthetic route prediction and, to some extent, property predictions. Despite this, most research in these fields has focused on improving the accuracy of the technologies, rather than on quantifying the uncertainty in the predictions. Uncertainty quantification will become a key component in autonomous decision making and will be crucial for integrating machine learning and chemistry automation to create an autonomous design-make-test-analyse cycle. This review covers the empirical, frequentist and Bayesian approaches to uncertainty quantification, and outlines how they can be used for drug design. We also outline the impact of uncertainty quantification on decision making.
Collapse
Affiliation(s)
- Lewis H Mervin
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK.
| | - Simon Johansson
- Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden; Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - Elizaveta Semenova
- Data Sciences and Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
| | - Kathryn A Giblin
- Medicinal Chemistry, Research and Early Development, Oncology R&D, AstraZeneca, Cambridge, UK
| | - Ola Engkvist
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| |
Collapse
|
21
|
Alvarsson J, Arvidsson McShane S, Norinder U, Spjuth O. Predicting With Confidence: Using Conformal Prediction in Drug Discovery. J Pharm Sci 2020; 110:42-49. [PMID: 33075380 DOI: 10.1016/j.xphs.2020.09.055] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2020] [Revised: 09/28/2020] [Accepted: 09/29/2020] [Indexed: 10/23/2022]
Abstract
One of the challenges with predictive modeling is how to quantify the reliability of the models' predictions on new objects. In this work we give an introduction to conformal prediction, a framework that sits on top of traditional machine learning algorithms and which outputs valid confidence estimates to predictions from QSAR models in the form of prediction intervals that are specific to each predicted object. For regression, a prediction interval consists of an upper and a lower bound. For classification, a prediction interval is a set that contains none, one, or many of the potential classes. The size of the prediction interval is affected by a user-specified confidence/significance level, and by the nonconformity of the predicted object; i.e., the strangeness as defined by a nonconformity function. Conformal prediction provides a rigorous and mathematically proven framework for in silico modeling with guarantees on error rates as well as a consistent handling of the models' applicability domain intrinsically linked to the underlying machine learning model. Apart from introducing the concepts and types of conformal prediction, we also provide an example application for modeling ABC transporters using conformal prediction, as well as a discussion on general implications for drug discovery.
Collapse
Affiliation(s)
- Jonathan Alvarsson
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Box 591, SE-75124, Uppsala, Sweden
| | - Staffan Arvidsson McShane
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Box 591, SE-75124, Uppsala, Sweden
| | - Ulf Norinder
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Box 591, SE-75124, Uppsala, Sweden; Department of Computer and Systems Sciences, Stockholm University, Box 7003, SE-16407, Kista, Sweden; MTM Research Centre, School of Science and Technology, Örebro University, SE-70182 Örebro, Sweden
| | - Ola Spjuth
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Box 591, SE-75124, Uppsala, Sweden.
| |
Collapse
|
22
|
Mervin LH, Afzal AM, Engkvist O, Bender A. Comparison of Scaling Methods to Obtain Calibrated Probabilities of Activity for Protein–Ligand Predictions. J Chem Inf Model 2020; 60:4546-4559. [DOI: 10.1021/acs.jcim.0c00476] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022]
Affiliation(s)
- Lewis H. Mervin
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Cambridge CB2 0AA, U.K
| | - Avid M. Afzal
- Data Sciences & Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge CB2 0AA, U.K
| | - Ola Engkvist
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Mölndal SE-431 83, Sweden
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge CB2 1TN, U.K
| |
Collapse
|
23
|
Norinder U, Spjuth O, Svensson F. Using Predicted Bioactivity Profiles to Improve Predictive Modeling. J Chem Inf Model 2020; 60:2830-2837. [PMID: 32374618 DOI: 10.1021/acs.jcim.0c00250] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]
Abstract
Predictive modeling is a cornerstone in early drug development. Using information for multiple domains or across prediction tasks has the potential to improve the performance of predictive modeling. However, aggregating data often leads to incomplete data matrices that might be limiting for modeling. In line with previous studies, we show that by generating predicted bioactivity profiles, and using these as additional features, prediction accuracy of biological endpoints can be improved. Using conformal prediction, a type of confidence predictor, we present a robust framework for the calculation of these profiles and the evaluation of their impact. We report on the outcomes from several approaches to generate the predicted profiles on 16 datasets in cytotoxicity and bioactivity and show that efficiency is improved the most when including the p-values from conformal prediction as bioactivity profiles.
Collapse
Affiliation(s)
- Ulf Norinder
- Department of Computer and Systems Sciences, Stockholm University, Box 7003, SE-164 07 Kista, Sweden.,Department of Pharmaceutical Biosciences, Uppsala University, Box 591, SE-75124 Uppsala, Sweden.,MTM Research Centre, School of Science and Technology, Örebro University, SE-70182 Örebro, Sweden
| | - Ola Spjuth
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, SE-75124 Uppsala, Sweden.,Science for Life Laboratory, Uppsala University, Box 591, SE-75124 Uppsala, Sweden
| | - Fredrik Svensson
- The Alzheimer's Research UK University College London Drug Discovery Institute, The Cruciform Building, Gower Street, WC1E 6BT London, U.K
| |
Collapse
|
24
|
Revealing cytotoxic substructures in molecules using deep learning. J Comput Aided Mol Des 2020; 34:731-746. [PMID: 32297073 PMCID: PMC7292813 DOI: 10.1007/s10822-020-00310-4] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2020] [Accepted: 03/16/2020] [Indexed: 01/22/2023]
Abstract
In drug development, late stage toxicity issues of a compound are the main cause of failure in clinical trials. In silico methods are therefore of high importance to guide the early design process to reduce time, costs and animal testing. Technical advances and the ever growing amount of available toxicity data enabled machine learning, especially neural networks, to impact the field of predictive toxicology. In this study, cytotoxicity prediction, one of the earliest handles in drug discovery, is investigated using a deep learning approach trained on a highly consistent in-house data set of over 34,000 compounds with a share of less than 5% of cytotoxic molecules. The model reached a balanced accuracy of over 70%, similar to previously reported studies using Random Forest. Albeit yielding good results, neural networks are often described as a black box lacking deeper mechanistic understanding of the underlying model. To overcome this absence of interpretability, a Deep Taylor Decomposition method is investigated to identify substructures that may be responsible for the cytotoxic effects, the so-called toxicophores. Furthermore, this study introduces cytotoxicity maps which provide a visual structural interpretation of the relevance of these substructures. Using this approach could be helpful in drug development to predict the potential toxicity of a compound as well as to generate new insights into the toxic mechanism. Moreover, it could also help to de-risk and optimize compounds.
Collapse
|
25
|
Morger A, Mathea M, Achenbach JH, Wolf A, Buesen R, Schleifer KJ, Landsiedel R, Volkamer A. KnowTox: pipeline and case study for confident prediction of potential toxic effects of compounds in early phases of development. J Cheminform 2020; 12:24. [PMID: 33431007 PMCID: PMC7157991 DOI: 10.1186/s13321-020-00422-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2019] [Accepted: 03/09/2020] [Indexed: 02/07/2023] Open
Abstract
Risk assessment of newly synthesised chemicals is a prerequisite for regulatory approval. In this context, in silico methods have great potential to reduce time, cost, and ultimately animal testing as they make use of the ever-growing amount of available toxicity data. Here, KnowTox is presented, a novel pipeline that combines three different in silico toxicology approaches to allow for confident prediction of potentially toxic effects of query compounds, i.e. machine learning models for 88 endpoints, alerts for 919 toxic substructures, and computational support for read-across. It is mainly based on the ToxCast dataset, containing after preprocessing a sparse matrix of 7912 compounds tested against 985 endpoints. When applying machine learning models, applicability and reliability of predictions for new chemicals are of utmost importance. Therefore, first, the conformal prediction technique was deployed, comprising an additional calibration step and per definition creating internally valid predictors at a given significance level. Second, to further improve validity and information efficiency, two adaptations are suggested, exemplified at the androgen receptor antagonism endpoint. An absolute increase in validity of 23% on the in-house dataset of 534 compounds could be achieved by introducing KNNRegressor normalisation. This increase in validity comes at the cost of efficiency, which could again be improved by 20% for the initial ToxCast model by balancing the dataset during model training. Finally, the value of the developed pipeline for risk assessment is discussed using two in-house triazole molecules. Compared to a single toxicity prediction method, complementing the outputs of different approaches can have a higher impact on guiding toxicity testing and de-selecting most likely harmful development-candidate compounds early in the development process.
Collapse
Affiliation(s)
- Andrea Morger
- In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin Berlin, Charitéplatz 1, Berlin, Germany
| | | | | | | | | | | | | | - Andrea Volkamer
- In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin Berlin, Charitéplatz 1, Berlin, Germany.
| |
Collapse
|
26
|
Hemmerich J, Ecker GF. In silico toxicology: From structure–activity relationships towards deep learning and adverse outcome pathways. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL MOLECULAR SCIENCE 2020; 10:e1475. [PMID: 35866138 PMCID: PMC9286356 DOI: 10.1002/wcms.1475] [Citation(s) in RCA: 57] [Impact Index Per Article: 11.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/07/2020] [Revised: 03/09/2020] [Accepted: 03/10/2020] [Indexed: 12/18/2022]
Abstract
In silico toxicology is an emerging field. It gains increasing importance as research is aiming to decrease the use of animal experiments as suggested in the 3R principles by Russell and Burch. In silico toxicology is a means to identify hazards of compounds before synthesis, and thus in very early stages of drug development. For chemical industries, as well as regulatory agencies it can aid in gap‐filling and guide risk minimization strategies. Techniques such as structural alerts, read‐across, quantitative structure–activity relationship, machine learning, and deep learning allow to use in silico toxicology in many cases, some even when data is scarce. Especially the concept of adverse outcome pathways puts all techniques into a broader context and can elucidate predictions by mechanistic insights. This article is categorized under:Structure and Mechanism > Computational Biochemistry and Biophysics Data Science > Chemoinformatics
Collapse
Affiliation(s)
- Jennifer Hemmerich
- Department of Pharmaceutical Chemistry University of Vienna Vienna Austria
| | - Gerhard F. Ecker
- Department of Pharmaceutical Chemistry University of Vienna Vienna Austria
| |
Collapse
|
27
|
Sun H, Wang Y, Cheff DM, Hall MD, Shen M. Predictive models for estimating cytotoxicity on the basis of chemical structures. Bioorg Med Chem 2020; 28:115422. [PMID: 32234277 DOI: 10.1016/j.bmc.2020.115422] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2019] [Revised: 02/28/2020] [Accepted: 03/02/2020] [Indexed: 12/19/2022]
Abstract
Cytotoxicity is a critical property in determining the fate of a small molecule in the drug discovery pipeline. Cytotoxic compounds are identified and triaged in both target-based and cell-based phenotypic approaches due to their off-target toxicity or on-target and on-mechanism toxicity for oncology and neurodegenerative targets. It is critical that chemical-induced cytotoxicity be reliably predicted before drug candidates advance to the late stage of development, or more ideally, before compounds are synthesized. In this study, we assessed the cell-based cytotoxicity of nearly 10,000 compounds in NCATS annotated libraries against four 'normal' cell lines (HEK 293, NIH 3T3, CRL-7250 and HaCat) using CellTiter-Glo (CTG) technology and constructed highly predictive models to estimate cytotoxicity from chemical structures. There are 5,241 non-redundant compounds having unambiguous activities in the four different cell lines, among which 11.8% compounds exhibited cytotoxicity in two or more cell lines and are thus labelled cytotoxic. The support vector classification (SVC) models trained with 80% randomly selected molecules achieved the area under the receiver operating characteristic curve (AUC-ROC) of 0.88 on average for the remaining 20% compounds in the test sets in 10 repeating experiments. Application of under-sampling rebalancing method further improved the averaged AUC-ROC to 0.90. Analysis of structural features shared by cytotoxic compounds may offer medicinal chemists heuristic design ideas to eliminate undesirable cytotoxicity. The profiling of cytotoxicity of drug-like molecules with annotated primary mechanism of action (MOA) will inform on the roles played by different targets or pathways in cellular viability. The predictive models for cytotoxicity (accessible at https://tripod.nih.gov/web_adme/cytotox.html) provide the scientific community a fast yet reliable way to prioritize molecules with little or no cytotoxicity for downstream development.
Collapse
Affiliation(s)
- Hongmao Sun
- National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Dr., Rockville, MD 20850, United States.
| | - Yuhong Wang
- National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Dr., Rockville, MD 20850, United States
| | - Dorian M Cheff
- National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Dr., Rockville, MD 20850, United States
| | - Matthew D Hall
- National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Dr., Rockville, MD 20850, United States
| | - Min Shen
- National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Dr., Rockville, MD 20850, United States.
| |
Collapse
|
28
|
Svensson F, Norinder U. Conformal Prediction for Ecotoxicology and Implications for Regulatory Decision-Making. METHODS IN PHARMACOLOGY AND TOXICOLOGY 2020. [DOI: 10.1007/978-1-0716-0150-1_12] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
|
29
|
Zhang J, Mucs D, Norinder U, Svensson F. LightGBM: An Effective and Scalable Algorithm for Prediction of Chemical Toxicity-Application to the Tox21 and Mutagenicity Data Sets. J Chem Inf Model 2019; 59:4150-4158. [PMID: 31560206 DOI: 10.1021/acs.jcim.9b00633] [Citation(s) in RCA: 95] [Impact Index Per Article: 15.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
Abstract
Machine learning algorithms have attained widespread use in assessing the potential toxicities of pharmaceuticals and industrial chemicals because of their faster speed and lower cost compared to experimental bioassays. Gradient boosting is an effective algorithm that often achieves high predictivity, but historically the relative long computational time limited its applications in predicting large compound libraries or developing in silico predictive models that require frequent retraining. LightGBM, a recent improvement of the gradient boosting algorithm, inherited its high predictivity but resolved its scalability and long computational time by adopting a leaf-wise tree growth strategy and introducing novel techniques. In this study, we compared the predictive performance and the computational time of LightGBM to deep neural networks, random forests, support vector machines, and XGBoost. All algorithms were rigorously evaluated on publicly available Tox21 and mutagenicity data sets using a Bayesian optimization integrated nested 10-fold cross-validation scheme that performs hyperparameter optimization while examining model generalizability and transferability to new data. The evaluation results demonstrated that LightGBM is an effective and highly scalable algorithm offering the best predictive performance while consuming significantly shorter computational time than the other investigated algorithms across all Tox21 and mutagenicity data sets. We recommend LightGBM for applications of in silico safety assessment and also other areas of cheminformatics to fulfill the ever-growing demand for accurate and rapid prediction of various toxicity or activity related end points of large compound libraries present in the pharmaceutical and chemical industry.
Collapse
Affiliation(s)
- Jin Zhang
- Department of Chemistry , Umeå University , SE-901 87 Umeå , Sweden
| | - Daniel Mucs
- Swetox, Unit of Toxicology Sciences , Karolinska Institutet , Forskargatan 20 , SE-151 36 Södertälje , Sweden
| | - Ulf Norinder
- Swetox, Unit of Toxicology Sciences , Karolinska Institutet , Forskargatan 20 , SE-151 36 Södertälje , Sweden.,Department of Computer and Systems Sciences , Stockholm University , Box 7003, SE-164 07 Kista , Sweden
| | - Fredrik Svensson
- The Alzheimer's Research UK University College London Drug Discovery Institute , The Cruciform Building, Gower Street , London WC1E 6BT , U.K.,The Francis Crick Institute , 1 Midland Road , London NW1 1AT , U.K
| |
Collapse
|
30
|
Zhang Y, Lee AA. Bayesian semi-supervised learning for uncertainty-calibrated prediction of molecular properties and active learning. Chem Sci 2019; 10:8154-8163. [PMID: 31857882 PMCID: PMC6837061 DOI: 10.1039/c9sc00616h] [Citation(s) in RCA: 62] [Impact Index Per Article: 10.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2019] [Accepted: 07/04/2019] [Indexed: 12/16/2022] Open
Abstract
We report a statistically principled method to quantify the uncertainty of machine learning models for molecular properties prediction. We show that this uncertainty estimate can be used to judiciously design experiments.
Predicting bioactivity and physical properties of small molecules is a central challenge in drug discovery. Deep learning is becoming the method of choice but studies to date focus on mean accuracy as the main metric. However, to replace costly and mission-critical experiments by models, a high mean accuracy is not enough: outliers can derail a discovery campaign, thus models need to reliably predict when it will fail, even when the training data is biased; experiments are expensive, thus models need to be data-efficient and suggest informative training sets using active learning. We show that uncertainty quantification and active learning can be achieved by Bayesian semi-supervised graph convolutional neural networks. The Bayesian approach estimates uncertainty in a statistically principled way through sampling from the posterior distribution. Semi-supervised learning disentangles representation learning and regression, keeping uncertainty estimates accurate in the low data limit and allowing the model to start active learning from a small initial pool of training data. Our study highlights the promise of Bayesian deep learning for chemistry.
Collapse
Affiliation(s)
- Yao Zhang
- Cavendish Laboratory , University of Cambridge , Cambridge CB3 0HE , UK . .,Department of Applied Mathematics and Theoretical Physics , University of Cambridge , Cambridge CB3 0WA , UK
| | - Alpha A Lee
- Cavendish Laboratory , University of Cambridge , Cambridge CB3 0HE , UK .
| |
Collapse
|
31
|
Ferreira LL, Andricopulo AD. ADMET modeling approaches in drug discovery. Drug Discov Today 2019; 24:1157-1165. [DOI: 10.1016/j.drudis.2019.03.015] [Citation(s) in RCA: 102] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2019] [Revised: 02/08/2019] [Accepted: 03/14/2019] [Indexed: 12/31/2022]
|
32
|
Norinder U, Svensson F. Multitask Modeling with Confidence Using Matrix Factorization and Conformal Prediction. J Chem Inf Model 2019; 59:1598-1604. [PMID: 30908915 DOI: 10.1021/acs.jcim.9b00027] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023]
Abstract
Multitask prediction of bioactivities is often faced with challenges relating to the sparsity of data and imbalance between different labels. We propose class conditional (Mondrian) conformal predictors using underlying Macau models as a novel approach for large scale bioactivity prediction. This approach handles both high degrees of missing data and label imbalances while still producing high quality predictive models. When applied to ten assay end points from PubChem, the models generated valid models with an efficiency of 74.0-80.1% at the 80% confidence level with similar performance both for the minority and majority class. Also when deleting progressively larger portions of the available data (0-80%) the performance of the models remained robust with only minor deterioration (reduction in efficiency between 5 and 10%). Compared to using Macau without conformal prediction the method presented here significantly improves the performance on imbalanced data sets.
Collapse
Affiliation(s)
- Ulf Norinder
- Swetox, Unit of Toxicology Sciences , Karolinska Institutet , Forskargatan 20 , SE-151 36 Södertälje , Sweden.,Department of Computer and Systems Sciences , Stockholm University , Box 7003 , SE-164 07 Kista , Sweden
| | - Fredrik Svensson
- Alzheimer's Research UK UCL Drug Discovery Institute , University College London , Cruciform Building, Gower Street , London , WC1E 6BT , U.K.,The Francis Crick Institute , 1 Midland Road , London , NW1 1AT , U.K
| |
Collapse
|
33
|
Yin Z, Ai H, Zhang L, Ren G, Wang Y, Zhao Q, Liu H. Predicting the cytotoxicity of chemicals using ensemble learning methods and molecular fingerprints. J Appl Toxicol 2019; 39:1366-1377. [PMID: 30763981 DOI: 10.1002/jat.3785] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2018] [Revised: 01/14/2019] [Accepted: 01/14/2019] [Indexed: 12/12/2022]
Abstract
The prediction of compound cytotoxicity is an important part of the drug discovery process. However, it usually appears as poor predictive performance because the datasets are high-throughput and have a class-imbalance problem. In this study, several strategies of performing a structure-activity relationship study for a cytotoxic endpoint in the AID364 dataset were explored to solve the class-imbalance problem. Random forest adaboost was used as the base learners for 10 types of molecular fingerprints and an ensemble method and six data-balancing methods were applied to balance the classes. As a result, the ensemble model using MACCS fingerprint was found to be the best, giving area under the curve of 85.2% ± 0.35%, sensitivity of 81.8% ± 0.65%, and specificity of 76.0% ± 0.12% in fivefold cross-validation and area under the curve of 78.8%, sensitivity of 55.5% and specificity of 78.5% in external validation. Good performance also appeared on other datasets with different sizes/degrees of imbalance. To explore the structural commonality of cytotoxic compounds, several substructures were identified as an important reference for substructure alerts. The convincing results indicate that the proposed models are helpful in predicting the cytotoxicity of chemicals.
Collapse
Affiliation(s)
- Zimo Yin
- School of Information, Liaoning University, Shenyang, 110036, China
| | - Haixin Ai
- School of Life Science, Liaoning University, Shenyang, 110036, China.,Research Center for Computer Simulating and Information Processing of Bio-macromolecules of Liaoning Province, Shenyang, 110036, China.,Engineering Laboratory for Molecular Simulation and Designing of Drug Molecules of Liaoning, Shenyang, 110036, China
| | - Li Zhang
- School of Life Science, Liaoning University, Shenyang, 110036, China.,Research Center for Computer Simulating and Information Processing of Bio-macromolecules of Liaoning Province, Shenyang, 110036, China.,Engineering Laboratory for Molecular Simulation and Designing of Drug Molecules of Liaoning, Shenyang, 110036, China
| | - Guofei Ren
- School of Information, Liaoning University, Shenyang, 110036, China
| | - Yuming Wang
- Department of Breast Surgery, The First Hospital of China Medical University, Shenyang, Liaoning, 110001, China
| | - Qi Zhao
- School of Mathematics, Liaoning University, Shenyang, 110036, China
| | - Hongsheng Liu
- School of Life Science, Liaoning University, Shenyang, 110036, China.,Research Center for Computer Simulating and Information Processing of Bio-macromolecules of Liaoning Province, Shenyang, 110036, China.,Engineering Laboratory for Molecular Simulation and Designing of Drug Molecules of Liaoning, Shenyang, 110036, China
| |
Collapse
|
34
|
Bosc N, Atkinson F, Felix E, Gaulton A, Hersey A, Leach AR. Large scale comparison of QSAR and conformal prediction methods and their applications in drug discovery. J Cheminform 2019; 11:4. [PMID: 30631996 PMCID: PMC6690068 DOI: 10.1186/s13321-018-0325-4] [Citation(s) in RCA: 74] [Impact Index Per Article: 12.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2018] [Accepted: 12/24/2018] [Indexed: 12/22/2022] Open
Abstract
Structure–activity relationship modelling is frequently used in the early stage of drug discovery to assess the activity of a compound on one or several targets, and can also be used to assess the interaction of compounds with liability targets. QSAR models have been used for these and related applications over many years, with good success. Conformal prediction is a relatively new QSAR approach that provides information on the certainty of a prediction, and so helps in decision-making. However, it is not always clear how best to make use of this additional information. In this article, we describe a case study that directly compares conformal prediction with traditional QSAR methods for large-scale predictions of target-ligand binding. The ChEMBL database was used to extract a data set comprising data from 550 human protein targets with different bioactivity profiles. For each target, a QSAR model and a conformal predictor were trained and their results compared. The models were then evaluated on new data published since the original models were built to simulate a “real world” application. The comparative study highlights the similarities between the two techniques but also some differences that it is important to bear in mind when the methods are used in practical drug discovery applications.
Collapse
Affiliation(s)
- Nicolas Bosc
- Chemogenomics Team, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
| | - Francis Atkinson
- Chemogenomics Team, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Eloy Felix
- Chemogenomics Team, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Anna Gaulton
- Chemogenomics Team, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Anne Hersey
- Chemogenomics Team, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Andrew R Leach
- Chemogenomics Team, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| |
Collapse
|
35
|
Norinder U, Ahlberg E, Carlsson L. Predicting Ames Mutagenicity Using Conformal Prediction in the Ames/QSAR International Challenge Project. Mutagenesis 2018; 34:33-40. [DOI: 10.1093/mutage/gey038] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2018] [Revised: 10/10/2018] [Accepted: 11/13/2018] [Indexed: 12/19/2022] Open
Affiliation(s)
- Ulf Norinder
- Swetox, Unit of Toxicology Sciences, Karolinska Institutet, Södertälje, Sweden
- Department of Computer and Systems Sciences, Stockholm University, Kista, Sweden
| | - Ernst Ahlberg
- Drug Safety and Metabolism, Innovative Medicines and Early Development Biotech Unit, AstraZeneca R&D Gothenburg, Mölndal, Sweden
| | - Lars Carlsson
- Computer Learning Research Centre, Royal Holloway, University of London Egham, Surrey, UK
| |
Collapse
|
36
|
Svensson F, Aniceto N, Norinder U, Cortes-Ciriano I, Spjuth O, Carlsson L, Bender A. Conformal Regression for Quantitative Structure–Activity Relationship Modeling—Quantifying Prediction Uncertainty. J Chem Inf Model 2018; 58:1132-1140. [DOI: 10.1021/acs.jcim.8b00054] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]
Affiliation(s)
- Fredrik Svensson
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
- IOTA Pharmaceuticals, St Johns Innovation Centre, Cowley Road, Cambridge CB4 0WS, U.K
| | - Natalia Aniceto
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
| | - Ulf Norinder
- Swetox, Unit of Toxicology Sciences, Karolinska Institutet, Forskargatan 20, SE-151 36 Södertälje, Sweden
- Department of Computer and Systems Sciences, Stockholm University, Box 7003, SE-164 07 Kista, Sweden
| | - Isidro Cortes-Ciriano
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
| | - Ola Spjuth
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, SE-75124, Uppsala Sweden
| | - Lars Carlsson
- Quantitative Biology, Discovery Sciences, IMED Biotech Unit, AstraZeneca, SE-43183, Mölndal, Sweden
- Department of Computer Science, Royal Holloway, University of London, Egham Hill, Surrey, U.K
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
| |
Collapse
|
37
|
Svensson F, Afzal AM, Norinder U, Bender A. Maximizing gain in high-throughput screening using conformal prediction. J Cheminform 2018; 10:7. [PMID: 29468427 PMCID: PMC5821614 DOI: 10.1186/s13321-018-0260-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2017] [Accepted: 02/09/2018] [Indexed: 11/10/2022] Open
Abstract
Iterative screening has emerged as a promising approach to increase the efficiency of screening campaigns compared to traditional high throughput approaches. By learning from a subset of the compound library, inferences on what compounds to screen next can be made by predictive models, resulting in more efficient screening. One way to evaluate screening is to consider the cost of screening compared to the gain associated with finding an active compound. In this work, we introduce a conformal predictor coupled with a gain-cost function with the aim to maximise gain in iterative screening. Using this setup we were able to show that by evaluating the predictions on the training data, very accurate predictions on what settings will produce the highest gain on the test data can be made. We evaluate the approach on 12 bioactivity datasets from PubChem training the models using 20% of the data. Depending on the settings of the gain-cost function, the settings generating the maximum gain were accurately identified in 8-10 out of the 12 datasets. Broadly, our approach can predict what strategy generates the highest gain based on the results of the cost-gain evaluation: to screen the compounds predicted to be active, to screen all the remaining data, or not to screen any additional compounds. When the algorithm indicates that the predicted active compounds should be screened, our approach also indicates what confidence level to apply in order to maximize gain. Hence, our approach facilitates decision-making and allocation of the resources where they deliver the most value by indicating in advance the likely outcome of a screening campaign.
Collapse
Affiliation(s)
- Fredrik Svensson
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW UK
- IOTA Pharmaceuticals, St Johns Innovation Centre, Cowley Road, Cambridge, CB4 0WS UK
| | - Avid M. Afzal
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW UK
| | - Ulf Norinder
- Unit of Toxicology Sciences, Karolinska Institutet, Swetox, Forskargatan 20, 151 36 Södertälje, Sweden
- Department of Computer and Systems Sciences, Stockholm University, Box 7003, 164 07 Kista, Sweden
| | - Andreas Bender
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW UK
| |
Collapse
|
38
|
Rakers C, Najnin RA, Polash AH, Takeda S, Brown J. Chemogenomic Active Learning's Domain of Applicability on Small, Sparse qHTS Matrices: A Study Using Cytochrome P450 and Nuclear Hormone Receptor Families. ChemMedChem 2018; 13:511-521. [DOI: 10.1002/cmdc.201700677] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2017] [Revised: 12/04/2017] [Indexed: 01/21/2023]
Affiliation(s)
- Christin Rakers
- Institute of Transformative bio-Molecules, WPI-ITbM; Nagoya University; Furo-cho Chikusa-ku Nagoya 464-8602 Japan
| | - Rifat Ara Najnin
- Department of Radiation Genetics; Kyoto University Graduate School of Medicine; Sakyo, Yoshida-konoemachi Building D, 3F Kyoto 606-8501 Japan
| | - Ahsan Habib Polash
- Department of Radiation Genetics; Kyoto University Graduate School of Medicine; Sakyo, Yoshida-konoemachi Building D, 3F Kyoto 606-8501 Japan
| | - Shunichi Takeda
- Department of Radiation Genetics; Kyoto University Graduate School of Medicine; Sakyo, Yoshida-konoemachi Building D, 3F Kyoto 606-8501 Japan
| | - J.B. Brown
- Laboratory for Molecular Biosciences; Kyoto University Graduate School of Medicine; Yoshida-konoemachi Building E 606-8501 Kyoto Sakyo Japan
| |
Collapse
|
39
|
Svensson F, Norinder U, Bender A. Improving Screening Efficiency through Iterative Screening Using Docking and Conformal Prediction. J Chem Inf Model 2017; 57:439-444. [DOI: 10.1021/acs.jcim.6b00532] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
Affiliation(s)
- Fredrik Svensson
- Centre
for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
| | - Ulf Norinder
- Swetox,
Karolinska Institutet, Unit of Toxicology Sciences, Forskargatan
20, SE-151 36 Södertälje, Sweden
- Department
of Computer and Systems Sciences, Stockholm University, Box 7003, SE-164
07 Kista, Sweden
| | - Andreas Bender
- Centre
for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
| |
Collapse
|