1
Arvidsson McShane S, Norinder U, Alvarsson J, Ahlberg E, Carlsson L, Spjuth O. CPSign: conformal prediction for cheminformatics modeling. J Cheminform 2024; 16:75. [PMID: 38943219 PMCID: PMC11214261 DOI: 10.1186/s13321-024-00870-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Accepted: 06/11/2024] [Indexed: 07/01/2024]
Abstract
Conformal prediction has seen many applications in pharmaceutical science, being able to calibrate outputs of machine learning models and produce valid prediction intervals. We here present the open source software CPSign that is a complete implementation of conformal prediction for cheminformatics modeling. CPSign implements inductive and transductive conformal prediction for classification and regression, and probabilistic prediction with the Venn-ABERS methodology. The main chemical representation is signatures but other types of descriptors are also supported. The main modeling methodology is support vector machines (SVMs), but additional modeling methods are supported via an extension mechanism, e.g. DeepLearning4J models. We also describe features for visualizing results from conformal models including calibration and efficiency plots, as well as features to publish predictive models as REST services. We compare CPSign against other common cheminformatics modeling approaches including random forest, and a directed message-passing neural network. The results show that CPSign produces robust predictive performance with comparative predictive efficiency, with superior runtime and lower hardware requirements compared to neural network based models. CPSign has been used in several studies and is in production-use in multiple organizations. The ability to work directly with chemical input files, perform descriptor calculation and modeling with SVM in the conformal prediction framework, with a single software package having a low footprint and fast execution time makes CPSign a convenient and yet flexible package for training, deploying, and predicting on chemical data. CPSign can be downloaded from GitHub at https://github.com/arosbio/cpsign.
Scientific contribution: CPSign provides a single software that allows users to perform data preprocessing, modeling and make predictions directly on chemical structures, using conformal and probabilistic prediction. Building and evaluating new models can be achieved at a high abstraction level, without sacrificing flexibility and predictive performance, as showcased with a method evaluation against contemporary modeling approaches, where CPSign performs on par with a state-of-the-art deep learning based model.
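As an illustration of the inductive conformal regression that the abstract mentions, the following is a minimal sketch and not CPSign's own API or implementation. It assumes a plain least-squares line in place of the SVM, absolute residuals as nonconformity scores, and synthetic data; all function and variable names are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (a stand-in for the SVM model)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def icp_half_width(cal_scores, significance):
    """Interval half-width from calibration nonconformity scores.

    Picks the ceil((n + 1) * (1 - significance))-th smallest score; if that
    rank exceeds n, the interval is unbounded.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - significance))
    if k > n:
        return math.inf
    return sorted(cal_scores)[k - 1]


# Synthetic data (assumption): y = 2x + 1 with unit Gaussian noise.
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(600)]
ys = [2.0 * x + 1.0 + random.gauss(0, 1) for x in xs]

# Proper training set, calibration set, and test set.
train, cal, test = slice(0, 300), slice(300, 500), slice(500, 600)

a, b = fit_line(xs[train], ys[train])

# Nonconformity score: absolute residual on the calibration set.
cal_scores = [abs(y - (a * x + b)) for x, y in zip(xs[cal], ys[cal])]
half_width = icp_half_width(cal_scores, significance=0.2)

# Empirical coverage of the intervals [prediction - hw, prediction + hw].
covered = sum(
    abs(y - (a * x + b)) <= half_width for x, y in zip(xs[test], ys[test])
)
coverage = covered / len(xs[test])
print(f"half-width={half_width:.2f}, test coverage={coverage:.2f}")
```

At a significance level of 0.2 the intervals should contain roughly 80% of the test values, which is the validity property that a calibration plot checks across significance levels.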
Affiliation(s)
- Staffan Arvidsson McShane
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, 75124, Sweden
- Ulf Norinder
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, 75124, Sweden
  - Department of Computer and Systems Sciences, Stockholm University, Stockholm, 10587, Sweden
  - MTM Research Centre, School of Science and Technology, Örebro University, Örebro, 70182, Sweden
- Jonathan Alvarsson
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, 75124, Sweden
- Ernst Ahlberg
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, 75124, Sweden
  - Department of Computer Science, Royal Holloway University of London, Egham, TW20 0EX, UK
- Lars Carlsson
  - Department of Computer Science, Royal Holloway University of London, Egham, TW20 0EX, UK
  - Department of Computing, Jönköping University, Jönköping, 55111, Sweden
- Ola Spjuth
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, 75124, Sweden.
2
Fagerholm U, Hellberg S, Alvarsson J, Spjuth O. In Silico Prediction of Human Clinical Pharmacokinetics with ANDROMEDA by Prosilico: Predictions for an Established Benchmarking Data Set, a Modern Small Drug Data Set, and a Comparison with Laboratory Methods. Altern Lab Anim 2023; 51:39-54. [PMID: 36572567 DOI: 10.1177/02611929221148447] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/28/2022]
Abstract
There is an ongoing aim to replace animal and in vitro laboratory models with in silico methods. Such replacement requires the successful validation and comparably good performance of the alternative methods. We have developed an in silico prediction system for human clinical pharmacokinetics, based on machine learning, conformal prediction and a new physiologically-based pharmacokinetic model, i.e. ANDROMEDA. The objectives of this study were: a) to evaluate how well ANDROMEDA predicts the human clinical pharmacokinetics of a previously proposed benchmarking data set comprising 24 physicochemically diverse drugs and 28 small drug molecules new to the market in 2021; b) to compare its predictive performance with that of laboratory methods; and c) to investigate and describe the pharmacokinetic characteristics of the modern drugs. Median and maximum prediction errors for the selected major parameters were ca 1.2 to 2.5-fold and 16-fold for both data sets, respectively. Prediction accuracy was on par with, or better than, the best laboratory-based prediction methods (superior performance for a vast majority of the comparisons), and the prediction range was considerably broader. The modern drugs have higher average molecular weight than those in the benchmarking set from 15 years earlier (ca 200 g/mol higher), and were predicted to (generally) have relatively complex pharmacokinetics, including permeability and dissolution limitations and significant renal, biliary and/or gut-wall elimination. In conclusion, the results were overall better than those obtained with laboratory methods, and thus serve to further validate the ANDROMEDA in silico system for the prediction of human clinical pharmacokinetics of modern and physicochemically diverse drugs.
Affiliation(s)
- Jonathan Alvarsson
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Ola Spjuth
  - Prosilico AB, Huddinge, Sweden
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
3
Sapounidou M, Norinder U, Andersson PL. Predicting Endocrine Disruption Using Conformal Prediction - A Prioritization Strategy to Identify Hazardous Chemicals with Confidence. Chem Res Toxicol 2022; 36:53-65. [PMID: 36534483 PMCID: PMC9846826 DOI: 10.1021/acs.chemrestox.2c00267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
Abstract
Receptor-mediated molecular initiating events (MIEs) and their relevance in endocrine activity (EA) have been highlighted in literature. More than 15 receptors have been associated with neurodevelopmental adversity and metabolic disruption. MIEs describe chemical interactions with defined biological outcomes, a relationship that could be described with quantitative structure-activity relationship (QSAR) models. QSAR uncertainty can be assessed using the conformal prediction (CP) framework, which provides similarity (i.e., nonconformity) scores relative to the defined classes per prediction. CP calibration can indirectly mitigate data imbalance during model development, and the nonconformity scores serve as intrinsic measures of chemical applicability domain assessment during screening. The focus of this work was to propose an in silico predictive strategy for EA. First, 23 QSAR models for MIEs associated with EA were developed using high-throughput data for 14 receptors. To handle the data imbalance, five protocols were compared, and CP provided the most balanced class definition. Second, the developed QSAR models were applied to a large data set (∼55,000 chemicals), comprising chemicals representative of potential risk for human exposure. Using CP, it was possible to assess the uncertainty of the screening results and identify model strengths and out of domain chemicals. Last, two clustering methods, t-distributed stochastic neighbor embedding and Tanimoto similarity, were used to identify compounds with potential EA using known endocrine disruptors as reference. The cluster overlap between methods produced 23 chemicals with suspected or demonstrated EA potential. The presented models could be utilized for first-tier screening and identification of compounds with potential biological activity across the studied MIEs.
Affiliation(s)
- Ulf Norinder
  - Department of Computer and Systems Sciences, Stockholm University, Box 7003, 164 07 Kista, Sweden
  - MTM Research Centre, School of Science and Technology, Örebro University, 701 82 Örebro, Sweden
  - Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 75 124 Uppsala, Sweden
4
In silico predictions of the gastrointestinal uptake of macrocycles in man using conformal prediction methodology. J Pharm Sci 2022; 111:2614-2619. [DOI: 10.1016/j.xphs.2022.05.010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/26/2022] [Revised: 05/16/2022] [Accepted: 05/16/2022] [Indexed: 11/17/2022]
5
Climate-change-driven growth decline of European beech forests. Commun Biol 2022; 5:163. [PMID: 35273334 PMCID: PMC8913685 DOI: 10.1038/s42003-022-03107-3] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 08/16/2021] [Accepted: 02/02/2022] [Indexed: 11/08/2022]
Abstract
The growth of past, present, and future forests was, is and will be affected by climate variability. This multifaceted relationship has been assessed in several regional studies, but spatially resolved, large-scale analyses are largely missing so far. Here we estimate recent changes in growth of 5800 beech trees (Fagus sylvatica L.) from 324 sites, representing the full geographic and climatic range of the species. Future growth trends were predicted considering state-of-the-art climate scenarios. The validated models indicate growth declines across large regions of the distribution in recent decades, and project severe future growth declines ranging from -20% to more than -50% by 2090, depending on the region and climate change scenario (i.e. CMIP6 SSP1-2.6 and SSP5-8.5). Forecasted forest productivity losses are most striking towards the southern distribution limit of Fagus sylvatica, in regions where persisting atmospheric high-pressure systems are expected to increase drought severity. The projected 21st century growth changes across Europe indicate serious ecological and economic consequences that require immediate forest adaptation.
6
Fagerholm U, Hellberg S, Alvarsson J, Spjuth O. In silico predictions of the human pharmacokinetics/toxicokinetics of 65 chemicals from various classes using conformal prediction methodology. Xenobiotica 2022; 52:113-118. [PMID: 35238270 DOI: 10.1080/00498254.2022.2049397] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/20/2023]
Abstract
Pharmacokinetic/toxicokinetic (PK/TK) information for chemicals in humans is generally lacking. Here we applied machine learning, conformal prediction and a new physiologically-based PK/TK model for prediction of the human PK/TK of 65 chemicals from different classes, including carcinogens, food constituents and preservatives, vitamins, sweeteners, dyes and colours, pesticides, alternative medicines, flame retardants, psychoactive drugs, dioxins, poisons, UV-absorbents, surfactants, solvents and cosmetics. About 80% of the main human PK/TK (fraction absorbed, oral bioavailability, half-life, unbound fraction in plasma, clearance, volume of distribution, fraction excreted) for the selected chemicals was missing in the literature. This information was now added (from in silico predictions). Median and mean prediction errors for these parameters were 1.3- to 2.7-fold and 1.4- to 4.8-fold, respectively. In total, 59 and 86% of predictions had errors <2- and <5-fold, respectively. Predicted and observed PK/TK for the chemicals was generally within the range for pharmaceutical drugs. The results validated the new integrated system for prediction of the human PK/TK for different chemicals and added important missing information. No general difference in PK/TK-characteristics was found between the selected chemicals and pharmaceutical drugs.
Affiliation(s)
- Jonathan Alvarsson
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Box 591, Uppsala, SE-751 24 Sweden
- Ola Spjuth
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Box 591, Uppsala, 75124 Sweden
7
Klutzny S, Kornhuber M, Morger A, Schönfelder G, Volkamer A, Oelgeschläger M, Dunst S. Quantitative high-throughput phenotypic screening for environmental estrogens using the E-Morph Screening Assay in combination with in silico predictions. Environment International 2022; 158:106947. [PMID: 34717173 DOI: 10.1016/j.envint.2021.106947] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/14/2021] [Revised: 10/14/2021] [Accepted: 10/18/2021] [Indexed: 06/13/2023]
Abstract
BACKGROUND: Exposure to environmental chemicals that interfere with normal estrogen function can lead to adverse health effects, including cancer. High-throughput screening (HTS) approaches facilitate the efficient identification and characterization of such substances. OBJECTIVES: We recently described the development of the E-Morph Assay, which measures changes at adherens junctions as a clinically-relevant phenotypic readout for estrogen receptor (ER) alpha signaling activity. Here, we describe its further development and application for automated robotic HTS. METHODS: Using the advanced E-Morph Screening Assay, we screened a substance library comprising 430 toxicologically-relevant industrial chemicals, biocides, and plant protection products to identify novel substances with estrogenic activities. Based on the primary screening data and the publicly available ToxCast dataset, we performed an in silico similarity search to identify further substances with potential estrogenic activity for follow-up hit expansion screening, and built seven in silico ER models using the conformal prediction (CP) framework to evaluate the HTS results. RESULTS: The primary and hit confirmation screens identified 27 'known' estrogenic substances with potencies correlating very well with the published ToxCast ER Agonist Score (r=+0.95). We additionally detected potential 'novel' estrogenic activities for 10 primary hit substances and for another nine out of 20 structurally similar substances from in silico predictions and follow-up hit expansion screening. The concordance of the E-Morph Screening Assay with the ToxCast ER reference data and the generated CP ER models was 71% and 73%, respectively, with a high predictivity for ER active substances of up to 87%, which is particularly important for regulatory purposes.
DISCUSSION: These data provide a proof-of-concept for the combination of in vitro HTS approaches with in silico methods (similarity search, CP models) for efficient analysis of large substance libraries in order to prioritize substances with potential estrogenic activity for subsequent testing against higher tier human endpoints.
Affiliation(s)
- Saskia Klutzny
  - Experimental Toxicology and ZEBET, German Federal Institute for Risk Assessment (BfR), German Centre for the Protection of Laboratory Animals (Bf3R), Berlin, Germany
- Marja Kornhuber
  - Experimental Toxicology and ZEBET, German Federal Institute for Risk Assessment (BfR), German Centre for the Protection of Laboratory Animals (Bf3R), Berlin, Germany
  - Freie Universität Berlin, Berlin, Germany
- Andrea Morger
  - In silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany
- Gilbert Schönfelder
  - Experimental Toxicology and ZEBET, German Federal Institute for Risk Assessment (BfR), German Centre for the Protection of Laboratory Animals (Bf3R), Berlin, Germany
  - Institute of Clinical Pharmacology and Toxicology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany
- Andrea Volkamer
  - In silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany
- Michael Oelgeschläger
  - Experimental Toxicology and ZEBET, German Federal Institute for Risk Assessment (BfR), German Centre for the Protection of Laboratory Animals (Bf3R), Berlin, Germany
- Sebastian Dunst
  - Experimental Toxicology and ZEBET, German Federal Institute for Risk Assessment (BfR), German Centre for the Protection of Laboratory Animals (Bf3R), Berlin, Germany.
8
Fagerholm U, Hellberg S, Alvarsson J, Arvidsson McShane S, Spjuth O. In silico prediction of volume of distribution of drugs in man using conformal prediction performs on par with animal data-based models. Xenobiotica 2021; 51:1366-1371. [PMID: 34845977 DOI: 10.1080/00498254.2021.2011471] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/19/2022]
Abstract
Volume of distribution at steady state (Vss) is an important pharmacokinetic endpoint. In this study we apply machine learning and conformal prediction for human Vss prediction, and make a head-to-head comparison with rat-to-man scaling, allometric scaling and the Rodgers-Lukova method on combined in silico and in vitro data, using a test set of 105 compounds with experimentally observed Vss. The mean prediction error and % with <2-fold prediction error for our method were 2.4-fold and 64%, respectively. 69% of test compounds had an observed Vss within the prediction interval at a 70% confidence level. In comparison, 2.2-, 2.9- and 3.1-fold mean errors and 69, 64 and 61% of predictions with <2-fold error were reached with rat-to-man scaling, allometric scaling and the Rodgers-Lukova method, respectively. We conclude that our method has theoretically proven validity that was empirically confirmed, and shows predictive accuracy on par with animal models and superior to an alternative widely used in silico-based method. The option for the user to select the level of confidence in predictions offers better guidance on how to optimise Vss in drug discovery applications.
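The summary statistics quoted in this abstract (mean fold error, percentage of predictions within 2-fold, and percentage of observed values inside the prediction interval) can be computed as in the sketch below. The abstract does not define fold error, so the symmetric ratio max(predicted/observed, observed/predicted) and the arithmetic mean are assumptions, and the numbers are hypothetical toy values rather than data from the study.

```python
def fold_error(predicted, observed):
    """Symmetric fold error (assumed definition): always >= 1 for positive values."""
    return max(predicted / observed, observed / predicted)


def summarize(predicted, observed, intervals):
    """Mean fold error, % of predictions with <2-fold error, % of observed values in interval."""
    folds = [fold_error(p, o) for p, o in zip(predicted, observed)]
    n = len(folds)
    in_interval = sum(lo <= o <= hi for o, (lo, hi) in zip(observed, intervals))
    return {
        "mean_fold_error": sum(folds) / n,
        "pct_within_2_fold": 100.0 * sum(f < 2 for f in folds) / n,
        "pct_in_interval": 100.0 * in_interval / n,
    }


# Hypothetical toy values (for example Vss in L/kg); not taken from the paper.
observed = [1.0, 2.0, 4.0, 0.5]
predicted = [2.0, 2.0, 1.0, 0.5]
intervals = [(0.5, 3.0), (1.0, 4.0), (0.5, 2.0), (0.25, 1.0)]

stats = summarize(predicted, observed, intervals)
print(stats)
```

For a valid conformal predictor, the percentage of observed values inside the interval should be close to the chosen confidence level, which is the comparison the abstract makes between 69% and 70%.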
Affiliation(s)
- Jonathan Alvarsson
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Staffan Arvidsson McShane
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Ola Spjuth
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
9
Wilm A, Garcia de Lomana M, Stork C, Mathai N, Hirte S, Norinder U, Kühnl J, Kirchmair J. Predicting the Skin Sensitization Potential of Small Molecules with Machine Learning Models Trained on Biologically Meaningful Descriptors. Pharmaceuticals (Basel) 2021; 14:ph14080790. [PMID: 34451887 PMCID: PMC8402010 DOI: 10.3390/ph14080790] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/14/2021] [Revised: 08/03/2021] [Accepted: 08/06/2021] [Indexed: 02/06/2023]
Abstract
In recent years, a number of machine learning models for the prediction of the skin sensitization potential of small organic molecules have been reported and become available. These models generally perform well within their applicability domains but, as a result of the use of molecular fingerprints and other non-intuitive descriptors, the interpretability of the existing models is limited. The aim of this work is to develop a strategy to replace the non-intuitive features by predicted outcomes of bioassays. We show that such replacement is indeed possible and that as few as ten interpretable, predicted bioactivities are sufficient to reach competitive performance. On a holdout data set of 257 compounds, the best model (“Skin Doctor CP:Bio”) obtained an efficiency of 0.82 and an MCC of 0.52 (at the significance level of 0.20). Skin Doctor CP:Bio is available free of charge for academic research. The modeling strategies explored in this work are easily transferable and could be adopted for the development of more interpretable machine learning models for the prediction of the bioactivity and toxicity of small organic compounds.
Affiliation(s)
- Anke Wilm
  - Center for Bioinformatics (ZBH), Department of Informatics, Universität Hamburg, 20146 Hamburg, Germany
  - HITeC e.V., 22527 Hamburg, Germany
- Marina Garcia de Lomana
  - Department of Pharmaceutical Sciences, Faculty of Life Sciences, University of Vienna, 1090 Vienna, Austria
- Conrad Stork
  - Center for Bioinformatics (ZBH), Department of Informatics, Universität Hamburg, 20146 Hamburg, Germany
- Neann Mathai
  - Computational Biology Unit (CBU), Department of Chemistry, University of Bergen, N-5020 Bergen, Norway
- Steffen Hirte
  - Department of Pharmaceutical Sciences, Faculty of Life Sciences, University of Vienna, 1090 Vienna, Austria
- Ulf Norinder
  - MTM Research Centre, School of Science and Technology, Örebro University, SE-70182 Örebro, Sweden
  - Department of Computer and Systems Sciences, Stockholm University, SE-16407 Kista, Sweden
  - Department of Pharmaceutical Biosciences, Uppsala University, SE-75124 Uppsala, Sweden
- Jochen Kühnl
  - Front End Innovation, Beiersdorf AG, 22529 Hamburg, Germany
- Johannes Kirchmair
  - Center for Bioinformatics (ZBH), Department of Informatics, Universität Hamburg, 20146 Hamburg, Germany
  - Department of Pharmaceutical Sciences, Faculty of Life Sciences, University of Vienna, 1090 Vienna, Austria
  - Correspondence: ; Tel.: +43-1-4277-55104
10
Combined Naïve Bayesian, Chemical Fingerprints and Molecular Docking Classifiers to Model and Predict Androgen Receptor Binding Data for Environmentally- and Health-Sensitive Substances. Int J Mol Sci 2021; 22:ijms22136695. [PMID: 34206613 PMCID: PMC8267747 DOI: 10.3390/ijms22136695] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/06/2021] [Revised: 06/18/2021] [Accepted: 06/20/2021] [Indexed: 12/15/2022]
Abstract
Many chemicals that enter the environment, food chain, and the human body can disrupt androgen-dependent pathways and mimic hormones and therefore, may be responsible for multiple diseases from reproductive to tumor. Thus, modeling and predicting androgen receptor activity is an important area of research. The aim of the current study was to find a method or combination of methods to predict compounds that can bind to and/or disrupt the androgen receptor, and thereby guide decision making and further analysis. A stepwise procedure proceeded from analysis of protein structures from human, chimp, and rat, followed by docking and subsequent ligand, and statistics based techniques that improved classification gradually. The best methods used multivariate logistic regression of combinations of chimpanzee protein structural docking scores, extended connectivity fingerprints, and naïve Bayesians of known binders and non-binders. Combination or consensus methods included data from a variety of procedures to improve the final model accuracy.
11
Wilm A, Norinder U, Agea MI, de Bruyn Kops C, Stork C, Kühnl J, Kirchmair J. Skin Doctor CP: Conformal Prediction of the Skin Sensitization Potential of Small Organic Molecules. Chem Res Toxicol 2020; 34:330-344. [PMID: 33295759 PMCID: PMC7887802 DOI: 10.1021/acs.chemrestox.0c00253] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 01/03/2023]
Abstract
Skin sensitization potential or potency is an important end point in the safety assessment of new chemicals and new chemical mixtures. Formerly, animal experiments such as the local lymph node assay (LLNA) were the main form of assessment. Today, however, the focus lies on the development of nonanimal testing approaches (i.e., in vitro and in chemico assays) and computational models. In this work, we investigate, based on publicly available LLNA data, the ability of aggregated, Mondrian conformal prediction classifiers to differentiate between nonsensitizing and sensitizing compounds as well as between two levels of skin sensitization potential (weak to moderate sensitizers, and strong to extreme sensitizers). The advantage of the conformal prediction framework over other modeling approaches is that it assigns compounds to activity classes only if a defined minimum level of confidence is reached for the individual predictions. This eliminates the need for applicability domain criteria that often are arbitrary in their nature and less flexible. Our new binary classifier, named Skin Doctor CP, differentiates nonsensitizers from sensitizers with a higher reliability-to-efficiency ratio than the corresponding nonconformal prediction workflow that we presented earlier. When tested on a set of 257 compounds at the significance levels of 0.10 and 0.30, the model reached an efficiency of 0.49 and 0.92, and an accuracy of 0.83 and 0.75, respectively. In addition, we developed a ternary classification workflow to differentiate nonsensitizers, weak to moderate sensitizers, and strong to extreme sensitizers.
Although this model achieved satisfactory overall performance (accuracies of 0.90 and 0.73, and efficiencies of 0.42 and 0.90, at significance levels 0.10 and 0.30, respectively), it did not obtain satisfying class-wise results (at a significance level of 0.30, the validities obtained for nonsensitizers, weak to moderate sensitizers, and strong to extreme sensitizers were 0.70, 0.58, and 0.63, respectively). We argue that the model is, in consequence, unable to reliably identify strong to extreme sensitizers and suggest that other ternary models derived from the currently accessible LLNA data might suffer from the same problem. Skin Doctor CP is available via a public web service at https://nerdd.zbh.uni-hamburg.de/skinDoctorII/.
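The Mondrian (class-conditional) conformal classification described in this abstract can be sketched as follows. This is an illustrative sketch, not the Skin Doctor CP implementation: the nonconformity scores are hypothetical toy values, and efficiency is assumed here to mean the fraction of predictions that contain exactly one class.

```python
def mondrian_p_value(cal_scores_for_class, test_score):
    """p-value for one tentative class, calibrated only on that class's compounds."""
    n = len(cal_scores_for_class)
    at_least = sum(s >= test_score for s in cal_scores_for_class)
    return (at_least + 1) / (n + 1)


def prediction_set(cal_scores, test_scores, significance):
    """Classes whose p-value exceeds the significance level."""
    return {
        label
        for label, score in test_scores.items()
        if mondrian_p_value(cal_scores[label], score) > significance
    }


def efficiency(prediction_sets):
    """Fraction of single-class prediction sets (assumed definition)."""
    return sum(len(s) == 1 for s in prediction_sets) / len(prediction_sets)


# Hypothetical calibration nonconformity scores, grouped by true class.
cal_scores = {
    "sensitizer": [0.1, 0.2, 0.3, 0.6],
    "nonsensitizer": [0.1, 0.15, 0.2, 0.3, 0.35, 0.4, 0.5, 0.7, 0.8],
}

# Nonconformity of one test compound under each tentative label (hypothetical).
test_scores = {"sensitizer": 0.25, "nonsensitizer": 0.75}

strict = prediction_set(cal_scores, test_scores, significance=0.10)
relaxed = prediction_set(cal_scores, test_scores, significance=0.30)
print(strict, relaxed, efficiency([strict, relaxed]))
```

Because each class is calibrated separately, the error rate is controlled per class, which is why the abstract can report class-wise validities. Raising the significance level shrinks the prediction sets, so efficiency rises while the guaranteed confidence drops, matching the trade-off between the 0.10 and 0.30 results.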
Affiliation(s)
- Anke Wilm
  - Center for Bioinformatics (ZBH), Department of Informatics, Universität Hamburg, 20146 Hamburg, Germany
  - HITeC e.V., 22527 Hamburg, Germany
- Ulf Norinder
  - Department of Computer and Systems Sciences, Stockholm University, SE-16407 Kista, Sweden
  - Department of Pharmaceutical Biosciences, Uppsala University, SE-75124 Uppsala, Sweden
  - MTM Research Centre, School of Science and Technology, Örebro University, SE-70182 Örebro, Sweden
- M Isabel Agea
  - Department of Informatics and Chemistry, University of Chemistry and Technology Prague, 16628 Prague, Czech Republic
- Christina de Bruyn Kops
  - Center for Bioinformatics (ZBH), Department of Informatics, Universität Hamburg, 20146 Hamburg, Germany
- Conrad Stork
  - Center for Bioinformatics (ZBH), Department of Informatics, Universität Hamburg, 20146 Hamburg, Germany
- Jochen Kühnl
  - Front End Innovation, Beiersdorf AG, 22529 Hamburg, Germany
- Johannes Kirchmair
  - Center for Bioinformatics (ZBH), Department of Informatics, Universität Hamburg, 20146 Hamburg, Germany
  - Department of Pharmaceutical Chemistry, University of Vienna, 1090 Vienna, Austria
12
Morger A, Mathea M, Achenbach JH, Wolf A, Buesen R, Schleifer KJ, Landsiedel R, Volkamer A. KnowTox: pipeline and case study for confident prediction of potential toxic effects of compounds in early phases of development. J Cheminform 2020; 12:24. [PMID: 33431007 PMCID: PMC7157991 DOI: 10.1186/s13321-020-00422-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 12/18/2019] [Accepted: 03/09/2020] [Indexed: 02/07/2023]
Abstract
Risk assessment of newly synthesised chemicals is a prerequisite for regulatory approval. In this context, in silico methods have great potential to reduce time, cost, and ultimately animal testing as they make use of the ever-growing amount of available toxicity data. Here, KnowTox is presented, a novel pipeline that combines three different in silico toxicology approaches to allow for confident prediction of potentially toxic effects of query compounds, i.e. machine learning models for 88 endpoints, alerts for 919 toxic substructures, and computational support for read-across. It is mainly based on the ToxCast dataset, containing after preprocessing a sparse matrix of 7912 compounds tested against 985 endpoints. When applying machine learning models, applicability and reliability of predictions for new chemicals are of utmost importance. Therefore, first, the conformal prediction technique was deployed, comprising an additional calibration step and per definition creating internally valid predictors at a given significance level. Second, to further improve validity and information efficiency, two adaptations are suggested, exemplified at the androgen receptor antagonism endpoint. An absolute increase in validity of 23% on the in-house dataset of 534 compounds could be achieved by introducing KNNRegressor normalisation. This increase in validity comes at the cost of efficiency, which could again be improved by 20% for the initial ToxCast model by balancing the dataset during model training. Finally, the value of the developed pipeline for risk assessment is discussed using two in-house triazole molecules. Compared to a single toxicity prediction method, complementing the outputs of different approaches can have a higher impact on guiding toxicity testing and de-selecting most likely harmful development-candidate compounds early in the development process.
Affiliation(s)
- Andrea Morger
- In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin Berlin, Charitéplatz 1, Berlin, Germany
| | | | | | | | | | | | | | - Andrea Volkamer
- In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin Berlin, Charitéplatz 1, Berlin, Germany.
| |
|
13
|
Hemmerich J, Ecker GF. In silico toxicology: From structure–activity relationships towards deep learning and adverse outcome pathways. Wiley Interdiscip Rev Comput Mol Sci 2020; 10:e1475. [PMID: 35866138 PMCID: PMC9286356 DOI: 10.1002/wcms.1475] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 01/07/2020] [Revised: 03/09/2020] [Accepted: 03/10/2020] [Indexed: 12/18/2022]
Abstract
In silico toxicology is an emerging field. It gains increasing importance as research is aiming to decrease the use of animal experiments as suggested in the 3R principles by Russell and Burch. In silico toxicology is a means to identify hazards of compounds before synthesis, and thus in very early stages of drug development. For chemical industries, as well as regulatory agencies, it can aid in gap-filling and guide risk minimization strategies. Techniques such as structural alerts, read-across, quantitative structure–activity relationship, machine learning, and deep learning allow the use of in silico toxicology in many cases, some even when data is scarce. Especially the concept of adverse outcome pathways puts all techniques into a broader context and can elucidate predictions by mechanistic insights. This article is categorized under: Structure and Mechanism > Computational Biochemistry and Biophysics; Data Science > Chemoinformatics.
Affiliation(s)
- Jennifer Hemmerich
- Department of Pharmaceutical Chemistry University of Vienna Vienna Austria
| | - Gerhard F. Ecker
- Department of Pharmaceutical Chemistry University of Vienna Vienna Austria
| |
|
14
|
Zorn KM, Lane TR, Russo DP, Clark AM, Makarov V, Ekins S. Multiple Machine Learning Comparisons of HIV Cell-based and Reverse Transcriptase Data Sets. Mol Pharm 2019; 16:1620-1632. [PMID: 30779585 DOI: 10.1021/acs.molpharmaceut.8b01297] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Indexed: 02/06/2023]
Abstract
The human immunodeficiency virus (HIV) causes over a million deaths every year and has a huge economic impact in many countries. The first class of drugs approved were nucleoside reverse transcriptase inhibitors. A newer generation of reverse transcriptase inhibitors have become susceptible to drug resistant strains of HIV, and hence, alternatives are urgently needed. We have recently pioneered the use of Bayesian machine learning to generate models with public data to identify new compounds for testing against different disease targets. The current study has used the NIAID ChemDB HIV, Opportunistic Infection and Tuberculosis Therapeutics Database for machine learning studies. We curated and cleaned data from HIV-1 wild-type cell-based and reverse transcriptase (RT) DNA polymerase inhibition assays. Compounds from this database with ≤1 μM HIV-1 RT DNA polymerase activity inhibition and cell-based HIV-1 inhibition are correlated (Pearson r = 0.44, n = 1137, p < 0.0001). Models were trained using multiple machine learning approaches (Bernoulli Naive Bayes, AdaBoost Decision Tree, Random Forest, support vector classification, k-Nearest Neighbors, and deep neural networks as well as consensus approaches) and then their predictive abilities were compared. Our comparison of different machine learning methods demonstrated that support vector classification, deep learning, and a consensus were generally comparable and not significantly different from each other using 5-fold cross validation and using 24 training and test set combinations. This study demonstrates findings in line with our previous studies for various targets that training and testing with multiple data sets does not demonstrate a significant difference between support vector machine and deep neural networks.
Affiliation(s)
- Kimberley M Zorn
- Collaborations Pharmaceuticals, Inc. , Main Campus Drive, Lab 3510 , Raleigh , North Carolina 27606 , United States
| | - Thomas R Lane
- Collaborations Pharmaceuticals, Inc. , Main Campus Drive, Lab 3510 , Raleigh , North Carolina 27606 , United States
| | - Daniel P Russo
- Collaborations Pharmaceuticals, Inc. , Main Campus Drive, Lab 3510 , Raleigh , North Carolina 27606 , United States.,The Rutgers Center for Computational and Integrative Biology , Camden , New Jersey 08102 , United States
| | - Alex M Clark
- Molecular Materials Informatics, Inc. , 2234 Duvernay Street , Montreal , Quebec H3J2Y3 , Canada
| | - Vadim Makarov
- Bach Institute of Biochemistry , Research Center of Biotechnology of the Russian Academy of Sciences , Leninsky Prospekt 33-2 , Moscow 119071 , Russia
| | - Sean Ekins
- Collaborations Pharmaceuticals, Inc. , Main Campus Drive, Lab 3510 , Raleigh , North Carolina 27606 , United States
| |
|
15
|
Hanser T, Barber C, Guesné S, Marchaland JF, Werner S. Applicability Domain: Towards a More Formal Framework to Express the Applicability of a Model and the Confidence in Individual Predictions. Challenges and Advances in Computational Chemistry and Physics 2019. [DOI: 10.1007/978-3-030-16443-0_11] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
|
16
|
Lampa S, Alvarsson J, Arvidsson Mc Shane S, Berg A, Ahlberg E, Spjuth O. Predicting Off-Target Binding Profiles With Confidence Using Conformal Prediction. Front Pharmacol 2018; 9:1256. [PMID: 30459617 PMCID: PMC6233526 DOI: 10.3389/fphar.2018.01256] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 07/02/2018] [Accepted: 10/15/2018] [Indexed: 01/04/2023] Open
Abstract
Ligand-based models can be used in drug discovery to obtain an early indication of potential off-target interactions that could be linked to adverse effects. Another application is to combine such models into a panel, allowing to compare and search for compounds with similar profiles. Most contemporary methods and implementations however lack valid measures of confidence in their predictions, and only provide point predictions. We here describe a methodology that uses Conformal Prediction for predicting off-target interactions, with models trained on data from 31 targets in the ExCAPE-DB dataset selected for their utility in broad early hazard assessment. Chemicals were represented by the signature molecular descriptor and support vector machines were used as the underlying machine learning method. By using conformal prediction, the results from predictions come in the form of confidence p-values for each class. The full pre-processing and model training process is openly available as scientific workflows on GitHub, rendering it fully reproducible. We illustrate the usefulness of the developed methodology on a set of compounds extracted from DrugBank. The resulting models are published online and are available via a graphical web interface and an OpenAPI interface for programmatic access.
Affiliation(s)
- Samuel Lampa
- Pharmaceutical Bioinformatics Group, Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
| | - Jonathan Alvarsson
- Pharmaceutical Bioinformatics Group, Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
| | - Staffan Arvidsson Mc Shane
- Pharmaceutical Bioinformatics Group, Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
| | - Arvid Berg
- Pharmaceutical Bioinformatics Group, Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
| | - Ernst Ahlberg
- Predictive Compound ADME and Safety, Drug Safety and Metabolism, AstraZeneca IMED Biotech Unit, Mölndal, Sweden
| | - Ola Spjuth
- Pharmaceutical Bioinformatics Group, Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
| |
|
17
|
Norinder U, Myatt G, Ahlberg E. Predicting Aromatic Amine Mutagenicity with Confidence: A Case Study Using Conformal Prediction. Biomolecules 2018; 8:biom8030085. [PMID: 30158463 PMCID: PMC6163496 DOI: 10.3390/biom8030085] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 06/26/2018] [Revised: 08/16/2018] [Accepted: 08/21/2018] [Indexed: 01/09/2023] Open
Abstract
The occurrence of mutagenicity in primary aromatic amines has been investigated using conformal prediction. The results of the investigation show that it is possible to develop mathematically proven valid models using conformal prediction and that the existence of uncertain classes of prediction, such as both (both classes assigned to a compound) and empty (no class assigned to a compound), provides the user with additional information on how to use, further develop, and possibly improve future models. The study also indicates that the use of different sets of fingerprints results in models, for which the ability to discriminate varies with respect to the set level of acceptable errors.
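The "both" and "empty" outcomes mentioned in this abstract follow from how a conformal classifier builds its prediction set. The sketch below is a minimal illustration, not the authors' implementation: it assumes class-wise (Mondrian) calibration, and the calibration scores, test scores, and function names are all hypothetical.

```python
from typing import Dict, List, Set


def p_value(cal_scores: List[float], test_score: float) -> float:
    """Share of calibration scores at least as nonconforming as the test score.

    The +1 terms count the test example itself, as in inductive conformal prediction.
    """
    n_greater_equal = sum(1 for s in cal_scores if s >= test_score)
    return (n_greater_equal + 1) / (len(cal_scores) + 1)


def prediction_set(
    cal_scores_by_class: Dict[str, List[float]],
    test_scores_by_class: Dict[str, float],
    significance: float,
) -> Set[str]:
    """Keep every label whose class-wise p-value exceeds the significance level."""
    return {
        label
        for label, test_score in test_scores_by_class.items()
        if p_value(cal_scores_by_class[label], test_score) > significance
    }


# Hypothetical calibration nonconformity scores, stored per class (assumption).
calibration = {
    "mutagenic": [0.05, 0.10, 0.20, 0.30, 0.40, 0.55, 0.70, 0.80, 0.90],
    "non-mutagenic": [0.05, 0.15, 0.25, 0.35, 0.45, 0.50, 0.60, 0.75, 0.85],
}

# Hypothetical nonconformity of one test compound under each candidate label.
test = {"mutagenic": 0.45, "non-mutagenic": 0.55}

for significance in (0.2, 0.45, 0.6):
    print(significance, sorted(prediction_set(calibration, test, significance)))
```

With these made-up numbers the same compound receives both labels at a strict significance level, a single label at an intermediate one, and an empty set when the accepted error rate is high, which is the behaviour the abstract describes as additional information for the user.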
Affiliation(s)
- Ulf Norinder
- Swetox, Karolinska Institutet, Unit of Toxicology Sciences, SE-151 36 Södertälje, Sweden.
- Dept. Computer and Systems Sciences, Stockholm Univ., Box 7003, SE-164 07 Kista, Sweden.
| | - Glenn Myatt
- Leadscope, 1393 Dublin Road, Columbus, OH 43215, USA.
| | - Ernst Ahlberg
- Drug Safety and Metabolism, Innovative Medicines and Early Development Biotech Unit, AstraZeneca R&D Gothenburg, SE-431 83 Mölndal, Sweden.
| |
|
18
|
Svensson F, Aniceto N, Norinder U, Cortes-Ciriano I, Spjuth O, Carlsson L, Bender A. Conformal Regression for Quantitative Structure–Activity Relationship Modeling—Quantifying Prediction Uncertainty. J Chem Inf Model 2018; 58:1132-1140. [DOI: 10.1021/acs.jcim.8b00054] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Indexed: 12/21/2022]
Affiliation(s)
- Fredrik Svensson
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
- IOTA Pharmaceuticals, St Johns Innovation Centre, Cowley Road, Cambridge CB4 0WS, U.K
| | - Natalia Aniceto
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
| | - Ulf Norinder
- Swetox, Unit of Toxicology Sciences, Karolinska Institutet, Forskargatan 20, SE-151 36 Södertälje, Sweden
- Department of Computer and Systems Sciences, Stockholm University, Box 7003, SE-164 07 Kista, Sweden
| | - Isidro Cortes-Ciriano
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
| | - Ola Spjuth
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, SE-75124, Uppsala Sweden
| | - Lars Carlsson
- Quantitative Biology, Discovery Sciences, IMED Biotech Unit, AstraZeneca, SE-43183, Mölndal, Sweden
- Department of Computer Science, Royal Holloway, University of London, Egham Hill, Surrey, U.K
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
| |
|
19
|
Lapins M, Arvidsson S, Lampa S, Berg A, Schaal W, Alvarsson J, Spjuth O. A confidence predictor for logD using conformal regression and a support-vector machine. J Cheminform 2018; 10:17. [PMID: 29616425 PMCID: PMC5882484 DOI: 10.1186/s13321-018-0271-1] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 12/21/2017] [Accepted: 03/25/2018] [Indexed: 02/03/2023] Open
Abstract
Lipophilicity is a major determinant of ADMET properties and overall suitability of drug candidates. We have developed large-scale models to predict water–octanol distribution coefficient (logD) for chemical compounds, aiding drug discovery projects. Using ACD/logD data for 1.6 million compounds from the ChEMBL database, models are created and evaluated by a support-vector machine with a linear kernel using conformal prediction methodology, outputting prediction intervals at a specified confidence level. The resulting model shows a predictive ability of Q² = 0.973 and with the best performing nonconformity measure having median prediction interval of ±0.39 log units at 80% confidence and ±0.60 log units at 90% confidence. The model is available as an online service via an OpenAPI interface, a web page with a molecular editor, and we also publish predictive values at 90% confidence level for 91 M PubChem structures in RDF format for download and as an URI resolver service.
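The way a conformal regressor turns a point prediction into an interval at a chosen confidence level can be sketched as follows. This is a simplified illustration under stated assumptions, not the published model: it uses the plain absolute error as nonconformity measure (the paper compares several measures), and the calibration errors, the point prediction, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def interval_half_width(cal_abs_errors: List[float], confidence: float) -> float:
    """Half-width of the prediction interval from held-out calibration errors.

    Picks the ceil((n + 1) * confidence)-th smallest absolute error; if that rank
    exceeds n, the calibration set is too small and the interval is unbounded.
    """
    n = len(cal_abs_errors)
    # Small tolerance guards against floating-point noise in the product.
    rank = math.ceil((n + 1) * confidence - 1e-9)
    if rank > n:
        return math.inf
    return sorted(cal_abs_errors)[rank - 1]


def predict_interval(
    point_prediction: float, cal_abs_errors: List[float], confidence: float
) -> Tuple[float, float]:
    """Symmetric interval around the point prediction of the underlying model."""
    half_width = interval_half_width(cal_abs_errors, confidence)
    return point_prediction - half_width, point_prediction + half_width


# Hypothetical absolute errors |y - y_hat| on a calibration set (assumption).
calibration_errors = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.50, 0.70]

# Hypothetical logD point prediction for one compound (assumption).
for confidence in (0.8, 0.9):
    print(confidence, predict_interval(2.0, calibration_errors, confidence))
```

As in the abstract, a higher confidence level produces a wider interval. A normalised nonconformity measure would additionally scale the half-width per compound, which this sketch leaves out.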
Affiliation(s)
- Maris Lapins
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24, Uppsala, Sweden
| | - Staffan Arvidsson
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24, Uppsala, Sweden
| | - Samuel Lampa
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24, Uppsala, Sweden
| | - Arvid Berg
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24, Uppsala, Sweden
| | - Wesley Schaal
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24, Uppsala, Sweden
| | - Jonathan Alvarsson
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24, Uppsala, Sweden
| | - Ola Spjuth
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24, Uppsala, Sweden.
| |
|
20
|
Devillers J, Devillers H, Bro E, Millot F. Expert judgment based multicriteria decision models to assess the risk of pesticides on reproduction failures of grey partridge. SAR QSAR Environ Res 2017; 28:889-911. [PMID: 29206499 DOI: 10.1080/1062936x.2017.1402449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2017] [Accepted: 11/04/2017] [Indexed: 06/07/2023]
Abstract
A suite of models is proposed for estimating the risk of pesticides against the grey partridge (Perdix perdix) and their clutches. Radio-tracked data of females, description and location of the clutches, and data on the pesticide treatments during the laying periods of the partridges were used as basic information. Quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) modelling allowed us to characterize the pesticides by their 1-octanol/water partition coefficient (log P), vapour pressure, primary and ultimate biodegradation potential, acute toxicity (LD50) on P. perdix, and endocrine disruption potential. From these physicochemical and toxicological data, the system of integration of risk with interaction of scores (SIRIS) method was used to design scores of risk for pesticides, alone or in mixture. A program, written in R (version 3.1.1), called Simulation of Toxicity in Perdix perdix (SimToxPP), was designed for estimating the risk of substances, considered alone or in mixture, against the grey partridge during breeding. The software tool is flexible enough to simulate realistic in situ scenarios. Different examples of applications are shown. The advantages and limitations of the approach are briefly discussed.
Affiliation(s)
| | - H Devillers
- b Micalis Institute, INRA, University Paris-Saclay , Jouy-en-Josas , France
| | - E Bro
- c Research Department , National Game and Wildlife Institute (ONCFS) , Auffargis , France
| | - F Millot
- c Research Department , National Game and Wildlife Institute (ONCFS) , Auffargis , France
| |
|
21
|
Norinder U, Boyer S. Binary classification of imbalanced datasets using conformal prediction. J Mol Graph Model 2017; 72:256-265. [PMID: 28135672 DOI: 10.1016/j.jmgm.2017.01.008] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 09/03/2016] [Revised: 12/19/2016] [Accepted: 01/04/2017] [Indexed: 11/28/2022]
Abstract
Aggregated Conformal Prediction is used as an effective alternative to other, more complicated and/or ambiguous methods involving various balancing measures when modelling severely imbalanced datasets. Additional explicit balancing measures other than those already a part of the Conformal Prediction framework are shown not to be required. The Aggregated Conformal Prediction procedure appears to be a promising approach for severely imbalanced datasets in order to retrieve a large majority of active minority class compounds while avoiding information loss or distortion.
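The aggregation step can be illustrated with a short sketch. It assumes that several inductive conformal predictors, each trained on a different random split, have already produced class-wise p-values for one compound, and that these are combined by taking the median. The median is one common choice and is an assumption here, since the abstract does not state the rule used; all numbers and names are hypothetical.

```python
from statistics import median
from typing import Dict, List


def aggregate_p_values(p_values_per_model: List[Dict[str, float]]) -> Dict[str, float]:
    """Combine class-wise p-values from several conformal predictors by their median."""
    labels = p_values_per_model[0].keys()
    return {
        label: median(model[label] for model in p_values_per_model)
        for label in labels
    }


# Hypothetical p-values for one compound from three predictors (assumption).
per_model = [
    {"active": 0.30, "inactive": 0.10},
    {"active": 0.50, "inactive": 0.05},
    {"active": 0.40, "inactive": 0.20},
]

aggregated = aggregate_p_values(per_model)
print(aggregated)
```

The aggregated p-values are then compared with the significance level exactly as for a single predictor. Because each class is calibrated separately, the minority class keeps its own p-value scale without explicit rebalancing.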
Affiliation(s)
- Ulf Norinder
- Swedish Toxicology Sciences Research Center, SE-151 36 Södertälje, Sweden.
| | - Scott Boyer
- Swedish Toxicology Sciences Research Center, SE-151 36 Södertälje, Sweden.
| |
|
22
|
Hanser T, Barber C, Marchaland JF, Werner S. Applicability domain: towards a more formal definition. SAR QSAR Environ Res 2016; 27:893-909. [PMID: 27827546 DOI: 10.1080/1062936x.2016.1250229] [Citation(s) in RCA: 67] [Impact Index Per Article: 8.4] [Received: 08/15/2016] [Accepted: 10/16/2016] [Indexed: 06/06/2023]
Abstract
In recent years the applicability domain (AD) of a prediction system has become an important concern in (Q)SAR modelling, especially in the context of human safety assessment. Today AD is an active research topic, and many methods have been designed to estimate the adequacy of a model and the confidence in its outcome for a given prediction task. Unfortunately, the wide spectrum of techniques developed for this purpose is based on various definitions of the concept of AD, often taking into account different types of information. This variety of methodologies confuses the end users and makes the comparison of the AD for different models almost impossible. In this article, we demonstrate that AD is not a monolithic concept and can be broken down into three well-defined sub-domains assessing confidence at the model, prediction and decision levels, respectively. By leveraging this separation of concerns we have an opportunity to clarify, formalize and extend the definition of AD. We propose a framework that captures this new vision with the aim to initiate a global effort to converge towards a common AD definition within the (Q)SAR community.
Affiliation(s)
- T Hanser
- a Research Group, Lhasa Limited (UK) , Leeds , UK
| | - C Barber
- a Research Group, Lhasa Limited (UK) , Leeds , UK
| | | | - S Werner
- a Research Group, Lhasa Limited (UK) , Leeds , UK
| |
|