Reference Citation Analysis

For: Norinder U, Rybacka A, Andersson PL. Conformal prediction to define applicability domain - A case study on predicting ER and AR binding. SAR QSAR Environ Res 2016;27:303-316. [PMID: 27088868 DOI: 10.1080/1062936x.2016.1172665] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Indexed: 06/05/2023]

Cited by Other Article(s)

Arvidsson McShane S, Norinder U, Alvarsson J, Ahlberg E, Carlsson L, Spjuth O. CPSign: conformal prediction for cheminformatics modeling. J Cheminform 2024;16:75. [PMID: 38943219 PMCID: PMC11214261 DOI: 10.1186/s13321-024-00870-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Accepted: 06/11/2024] [Indexed: 07/01/2024] [Open access]

Abstract

Conformal prediction has seen many applications in pharmaceutical science, being able to calibrate outputs of machine learning models and producing valid prediction intervals. We here present the open source software CPSign that is a complete implementation of conformal prediction for cheminformatics modeling. CPSign implements inductive and transductive conformal prediction for classification and regression, and probabilistic prediction with the Venn-ABERS methodology. The main chemical representation is signatures but other types of descriptors are also supported. The main modeling methodology is support vector machines (SVMs), but additional modeling methods are supported via an extension mechanism, e.g. DeepLearning4J models. We also describe features for visualizing results from conformal models including calibration and efficiency plots, as well as features to publish predictive models as REST services. We compare CPSign against other common cheminformatics modeling approaches including random forest, and a directed message-passing neural network. The results show that CPSign produces robust predictive performance with comparative predictive efficiency, with superior runtime and lower hardware requirements compared to neural network based models. CPSign has been used in several studies and is in production-use in multiple organizations. The ability to work directly with chemical input files, perform descriptor calculation and modeling with SVM in the conformal prediction framework, with a single software package having a low footprint and fast execution time makes CPSign a convenient and yet flexible package for training, deploying, and predicting on chemical data. CPSign can be downloaded from GitHub at https://github.com/arosbio/cpsign.

Scientific contribution: CPSign provides a single software that allows users to perform data preprocessing, modeling and make predictions directly on chemical structures, using conformal and probabilistic prediction. Building and evaluating new models can be achieved at a high abstraction level, without sacrificing flexibility and predictive performance, as showcased with a method evaluation against contemporary modeling approaches, where CPSign performs on par with a state-of-the-art deep learning based model.

Fagerholm U, Hellberg S, Alvarsson J, Spjuth O. In Silico Prediction of Human Clinical Pharmacokinetics with ANDROMEDA by Prosilico: Predictions for an Established Benchmarking Data Set, a Modern Small Drug Data Set, and a Comparison with Laboratory Methods. Altern Lab Anim 2023;51:39-54. [PMID: 36572567 DOI: 10.1177/02611929221148447] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/28/2022]

Sapounidou M, Norinder U, Andersson PL. Predicting Endocrine Disruption Using Conformal Prediction - A Prioritization Strategy to Identify Hazardous Chemicals with Confidence. Chem Res Toxicol 2022;36:53-65. [PMID: 36534483 PMCID: PMC9846826 DOI: 10.1021/acs.chemrestox.2c00267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]

Abstract

Receptor-mediated molecular initiating events (MIEs) and their relevance in endocrine activity (EA) have been highlighted in literature. More than 15 receptors have been associated with neurodevelopmental adversity and metabolic disruption. MIEs describe chemical interactions with defined biological outcomes, a relationship that could be described with quantitative structure-activity relationship (QSAR) models. QSAR uncertainty can be assessed using the conformal prediction (CP) framework, which provides similarity (i.e., nonconformity) scores relative to the defined classes per prediction. CP calibration can indirectly mitigate data imbalance during model development, and the nonconformity scores serve as intrinsic measures of chemical applicability domain assessment during screening. The focus of this work was to propose an in silico predictive strategy for EA. First, 23 QSAR models for MIEs associated with EA were developed using high-throughput data for 14 receptors. To handle the data imbalance, five protocols were compared, and CP provided the most balanced class definition. Second, the developed QSAR models were applied to a large data set (∼55,000 chemicals), comprising chemicals representative of potential risk for human exposure. Using CP, it was possible to assess the uncertainty of the screening results and identify model strengths and out of domain chemicals. Last, two clustering methods, t-distributed stochastic neighbor embedding and Tanimoto similarity, were used to identify compounds with potential EA using known endocrine disruptors as reference. The cluster overlap between methods produced 23 chemicals with suspected or demonstrated EA potential. The presented models could be utilized for first-tier screening and identification of compounds with potential biological activity across the studied MIEs.

In silico predictions of the gastrointestinal uptake of macrocycles in man using conformal prediction methodology. J Pharm Sci 2022;111:2614-2619. [DOI: 10.1016/j.xphs.2022.05.010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/26/2022] [Revised: 05/16/2022] [Accepted: 05/16/2022] [Indexed: 11/17/2022]

Climate-change-driven growth decline of European beech forests. Commun Biol 2022;5:163. [PMID: 35273334 PMCID: PMC8913685 DOI: 10.1038/s42003-022-03107-3] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 08/16/2021] [Accepted: 02/02/2022] [Indexed: 11/08/2022] [Open access]

Fagerholm U, Hellberg S, Alvarsson J, Spjuth O. In silico predictions of the human pharmacokinetics/toxicokinetics of 65 chemicals from various classes using conformal prediction methodology. Xenobiotica 2022;52:113-118. [PMID: 35238270 DOI: 10.1080/00498254.2022.2049397] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/20/2023]

Klutzny S, Kornhuber M, Morger A, Schönfelder G, Volkamer A, Oelgeschläger M, Dunst S. Quantitative high-throughput phenotypic screening for environmental estrogens using the E-Morph Screening Assay in combination with in silico predictions. Environment International 2022;158:106947. [PMID: 34717173 DOI: 10.1016/j.envint.2021.106947] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/14/2021] [Revised: 10/14/2021] [Accepted: 10/18/2021] [Indexed: 06/13/2023]

Abstract

BACKGROUND

Exposure to environmental chemicals that interfere with normal estrogen function can lead to adverse health effects, including cancer. High-throughput screening (HTS) approaches facilitate the efficient identification and characterization of such substances.

OBJECTIVES

We recently described the development of the E-Morph Assay, which measures changes at adherens junctions as a clinically-relevant phenotypic readout for estrogen receptor (ER) alpha signaling activity. Here, we describe its further development and application for automated robotic HTS.

METHODS

Using the advanced E-Morph Screening Assay, we screened a substance library comprising 430 toxicologically-relevant industrial chemicals, biocides, and plant protection products to identify novel substances with estrogenic activities. Based on the primary screening data and the publicly available ToxCast dataset, we performed an in silico similarity search to identify further substances with potential estrogenic activity for follow-up hit expansion screening, and built seven in silico ER models using the conformal prediction (CP) framework to evaluate the HTS results.

RESULTS

The primary and hit confirmation screens identified 27 'known' estrogenic substances with potencies correlating very well with the published ToxCast ER Agonist Score (r = +0.95). We additionally detected potential 'novel' estrogenic activities for 10 primary hit substances and for another nine out of 20 structurally similar substances from in silico predictions and follow-up hit expansion screening. The concordance of the E-Morph Screening Assay with the ToxCast ER reference data and the generated CP ER models was 71% and 73%, respectively, with a high predictivity for ER active substances of up to 87%, which is particularly important for regulatory purposes.

DISCUSSION

These data provide a proof-of-concept for the combination of in vitro HTS approaches with in silico methods (similarity search, CP models) for efficient analysis of large substance libraries in order to prioritize substances with potential estrogenic activity for subsequent testing against higher tier human endpoints.

Fagerholm U, Hellberg S, Alvarsson J, Arvidsson McShane S, Spjuth O. In silico prediction of volume of distribution of drugs in man using conformal prediction performs on par with animal data-based models. Xenobiotica 2021;51:1366-1371. [PMID: 34845977 DOI: 10.1080/00498254.2021.2011471] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/19/2022]

Wilm A, Garcia de Lomana M, Stork C, Mathai N, Hirte S, Norinder U, Kühnl J, Kirchmair J. Predicting the Skin Sensitization Potential of Small Molecules with Machine Learning Models Trained on Biologically Meaningful Descriptors. Pharmaceuticals (Basel) 2021;14:790. [PMID: 34451887 PMCID: PMC8402010 DOI: 10.3390/ph14080790] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/14/2021] [Revised: 08/03/2021] [Accepted: 08/06/2021] [Indexed: 02/06/2023] [Open access]

Combined Naïve Bayesian, Chemical Fingerprints and Molecular Docking Classifiers to Model and Predict Androgen Receptor Binding Data for Environmentally- and Health-Sensitive Substances. Int J Mol Sci 2021;22:6695. [PMID: 34206613 PMCID: PMC8267747 DOI: 10.3390/ijms22136695] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/06/2021] [Revised: 06/18/2021] [Accepted: 06/20/2021] [Indexed: 12/15/2022] [Open access]

Wilm A, Norinder U, Agea MI, de Bruyn Kops C, Stork C, Kühnl J, Kirchmair J. Skin Doctor CP: Conformal Prediction of the Skin Sensitization Potential of Small Organic Molecules. Chem Res Toxicol 2020;34:330-344. [PMID: 33295759 PMCID: PMC7887802 DOI: 10.1021/acs.chemrestox.0c00253] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 01/03/2023]

Abstract

Skin sensitization potential or potency is an important end point in the safety assessment of new chemicals and new chemical mixtures. Formerly, animal experiments such as the local lymph node assay (LLNA) were the main form of assessment. Today, however, the focus lies on the development of nonanimal testing approaches (i.e., in vitro and in chemico assays) and computational models. In this work, we investigate, based on publicly available LLNA data, the ability of aggregated, Mondrian conformal prediction classifiers to differentiate between non-sensitizing and sensitizing compounds as well as between two levels of skin sensitization potential (weak to moderate sensitizers, and strong to extreme sensitizers). The advantage of the conformal prediction framework over other modeling approaches is that it assigns compounds to activity classes only if a defined minimum level of confidence is reached for the individual predictions. This eliminates the need for applicability domain criteria that often are arbitrary in their nature and less flexible. Our new binary classifier, named Skin Doctor CP, differentiates nonsensitizers from sensitizers with a higher reliability-to-efficiency ratio than the corresponding nonconformal prediction workflow that we presented earlier. When tested on a set of 257 compounds at the significance levels of 0.10 and 0.30, the model reached an efficiency of 0.49 and 0.92, and an accuracy of 0.83 and 0.75, respectively. In addition, we developed a ternary classification workflow to differentiate nonsensitizers, weak to moderate sensitizers, and strong to extreme sensitizers. Although this model achieved satisfactory overall performance (accuracies of 0.90 and 0.73, and efficiencies of 0.42 and 0.90, at significance levels 0.10 and 0.30, respectively), it did not obtain satisfying class-wise results (at a significance level of 0.30, the validities obtained for nonsensitizers, weak to moderate sensitizers, and strong to extreme sensitizers were 0.70, 0.58, and 0.63, respectively). We argue that the model is, in consequence, unable to reliably identify strong to extreme sensitizers and suggest that other ternary models derived from the currently accessible LLNA data might suffer from the same problem. Skin Doctor CP is available via a public web service at https://nerdd.zbh.uni-hamburg.de/skinDoctorII/.
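
Editorial illustration (not taken from the cited paper): the abstract above reports efficiency at significance levels 0.10 and 0.30 for a Mondrian conformal classifier. The sketch below shows, under stated assumptions, how class-wise (Mondrian) p-values could turn into prediction sets and how a stricter significance level tends to lower efficiency. The function names, the toy nonconformity scores, and the reading of "efficiency" as the fraction of single-label predictions are all assumptions for illustration; the authors' aggregated workflow is more involved.

```python
from collections import defaultdict


def mondrian_p_values(cal_scores, cal_labels, test_scores):
    """Class-conditional (Mondrian) conformal p-values.

    cal_scores[i] is the nonconformity score of calibration compound i for
    its true label cal_labels[i]. test_scores maps each candidate label to
    the nonconformity score the test compound would have under that label.
    """
    by_class = defaultdict(list)
    for score, label in zip(cal_scores, cal_labels):
        by_class[label].append(score)

    p_values = {}
    for label, score in test_scores.items():
        reference = by_class[label]  # calibrate against the same class only
        n_at_least = sum(1 for s in reference if s >= score)
        p_values[label] = (n_at_least + 1) / (len(reference) + 1)
    return p_values


def prediction_set(p_values, significance):
    """Keep every label whose p-value exceeds the significance level."""
    return {label for label, p in p_values.items() if p > significance}


def efficiency(prediction_sets):
    """Assumed definition: fraction of predictions with exactly one label."""
    single = sum(1 for s in prediction_sets if len(s) == 1)
    return single / len(prediction_sets)


# Hypothetical toy calibration data: four compounds per class.
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
cal_labels = ["non"] * 4 + ["sens"] * 4
p = mondrian_p_values(cal_scores, cal_labels, {"non": 0.35, "sens": 0.9})

strict = prediction_set(p, 0.10)  # higher confidence, larger set
loose = prediction_set(p, 0.30)   # lower confidence, single label here
```

In this toy case the compound receives both labels at significance 0.10 and only one at 0.30, which mirrors the trade-off between confidence and efficiency reported in the abstract.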

Morger A, Mathea M, Achenbach JH, Wolf A, Buesen R, Schleifer KJ, Landsiedel R, Volkamer A. KnowTox: pipeline and case study for confident prediction of potential toxic effects of compounds in early phases of development. J Cheminform 2020;12:24. [PMID: 33431007 PMCID: PMC7157991 DOI: 10.1186/s13321-020-00422-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 12/18/2019] [Accepted: 03/09/2020] [Indexed: 02/07/2023] [Open access]

Abstract

Risk assessment of newly synthesised chemicals is a prerequisite for regulatory approval. In this context, in silico methods have great potential to reduce time, cost, and ultimately animal testing as they make use of the ever-growing amount of available toxicity data. Here, KnowTox is presented, a novel pipeline that combines three different in silico toxicology approaches to allow for confident prediction of potentially toxic effects of query compounds, i.e. machine learning models for 88 endpoints, alerts for 919 toxic substructures, and computational support for read-across. It is mainly based on the ToxCast dataset, containing after preprocessing a sparse matrix of 7912 compounds tested against 985 endpoints. When applying machine learning models, applicability and reliability of predictions for new chemicals are of utmost importance. Therefore, first, the conformal prediction technique was deployed, comprising an additional calibration step and per definition creating internally valid predictors at a given significance level. Second, to further improve validity and information efficiency, two adaptations are suggested, exemplified at the androgen receptor antagonism endpoint. An absolute increase in validity of 23% on the in-house dataset of 534 compounds could be achieved by introducing KNNRegressor normalisation. This increase in validity comes at the cost of efficiency, which could again be improved by 20% for the initial ToxCast model by balancing the dataset during model training. Finally, the value of the developed pipeline for risk assessment is discussed using two in-house triazole molecules. Compared to a single toxicity prediction method, complementing the outputs of different approaches can have a higher impact on guiding toxicity testing and de-selecting most likely harmful development-candidate compounds early in the development process.

Hemmerich J, Ecker GF. In silico toxicology: From structure–activity relationships towards deep learning and adverse outcome pathways. Wiley Interdisciplinary Reviews: Computational Molecular Science 2020;10:e1475. [PMID: 35866138 PMCID: PMC9286356 DOI: 10.1002/wcms.1475] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 01/07/2020] [Revised: 03/09/2020] [Accepted: 03/10/2020] [Indexed: 12/18/2022]

Zorn KM, Lane TR, Russo DP, Clark AM, Makarov V, Ekins S. Multiple Machine Learning Comparisons of HIV Cell-based and Reverse Transcriptase Data Sets. Mol Pharm 2019;16:1620-1632. [PMID: 30779585 DOI: 10.1021/acs.molpharmaceut.8b01297] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Indexed: 02/06/2023]

Hanser T, Barber C, Guesné S, Marchaland JF, Werner S. Applicability Domain: Towards a More Formal Framework to Express the Applicability of a Model and the Confidence in Individual Predictions. Challenges and Advances in Computational Chemistry and Physics 2019. [DOI: 10.1007/978-3-030-16443-0_11] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]

Lampa S, Alvarsson J, Arvidsson McShane S, Berg A, Ahlberg E, Spjuth O. Predicting Off-Target Binding Profiles With Confidence Using Conformal Prediction. Front Pharmacol 2018;9:1256. [PMID: 30459617 PMCID: PMC6233526 DOI: 10.3389/fphar.2018.01256] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 07/02/2018] [Accepted: 10/15/2018] [Indexed: 01/04/2023] [Open access]

Norinder U, Myatt G, Ahlberg E. Predicting Aromatic Amine Mutagenicity with Confidence: A Case Study Using Conformal Prediction. Biomolecules 2018;8:85. [PMID: 30158463 PMCID: PMC6163496 DOI: 10.3390/biom8030085] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 06/26/2018] [Revised: 08/16/2018] [Accepted: 08/21/2018] [Indexed: 01/09/2023] [Open access]

Svensson F, Aniceto N, Norinder U, Cortes-Ciriano I, Spjuth O, Carlsson L, Bender A. Conformal Regression for Quantitative Structure–Activity Relationship Modeling: Quantifying Prediction Uncertainty. J Chem Inf Model 2018;58:1132-1140. [DOI: 10.1021/acs.jcim.8b00054] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Indexed: 12/21/2022]

Lapins M, Arvidsson S, Lampa S, Berg A, Schaal W, Alvarsson J, Spjuth O. A confidence predictor for logD using conformal regression and a support-vector machine. J Cheminform 2018;10:17. [PMID: 29616425 PMCID: PMC5882484 DOI: 10.1186/s13321-018-0271-1] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 12/21/2017] [Accepted: 03/25/2018] [Indexed: 02/03/2023] [Open access]

Abstract

Lipophilicity is a major determinant of ADMET properties and overall suitability of drug candidates. We have developed large-scale models to predict water–octanol distribution coefficient (logD) for chemical compounds, aiding drug discovery projects. Using ACD/logD data for 1.6 million compounds from the ChEMBL database, models are created and evaluated by a support-vector machine with a linear kernel using conformal prediction methodology, outputting prediction intervals at a specified confidence level. The resulting model shows a predictive ability of $Q^{2} = 0.973$, with the best performing nonconformity measure having a median prediction interval of $\pm 0.39$ log units at 80% confidence and $\pm 0.60$ log units at 90% confidence. The model is available as an online service via an OpenAPI interface and a web page with a molecular editor, and we also publish predictive values at 90% confidence level for 91 M PubChem structures in RDF format for download and as a URI resolver service.
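
Editorial illustration (not taken from the cited paper): the abstract above reports prediction intervals that widen from 80% to 90% confidence. A minimal sketch of how an inductive conformal regressor could derive such an interval half-width from calibration residuals follows. It assumes the simplest nonconformity measure, the unnormalized absolute residual, whereas the paper compares several measures; the function name and the toy data are hypothetical.

```python
import math


def conformal_half_width(cal_true, cal_pred, confidence):
    """Half-width of an inductive conformal prediction interval.

    Uses the absolute residual |y - y_hat| on a held-out calibration set as
    the nonconformity score (an assumption made for this sketch). Returns
    the k-th smallest residual with k = ceil((n + 1) * confidence), or
    infinity when the calibration set is too small for that confidence.
    """
    residuals = sorted(abs(t - p) for t, p in zip(cal_true, cal_pred))
    n = len(residuals)
    k = math.ceil((n + 1) * confidence)
    if k > n:
        return math.inf
    return residuals[k - 1]


# Hypothetical calibration set: true values all 0.0, predictions off by 0.1..0.9.
cal_true = [0.0] * 9
cal_pred = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

width_80 = conformal_half_width(cal_true, cal_pred, 0.80)
width_90 = conformal_half_width(cal_true, cal_pred, 0.90)

# A new prediction y_hat would then be reported as y_hat ± width.
```

Requesting higher confidence selects a larger calibration residual, so the interval can only stay the same or widen, which is consistent with the ±0.39 versus ±0.60 figures quoted in the abstract.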

Devillers J, Devillers H, Bro E, Millot F. Expert judgment based multicriteria decision models to assess the risk of pesticides on reproduction failures of grey partridge. SAR QSAR Environ Res 2017;28:889-911. [PMID: 29206499 DOI: 10.1080/1062936x.2017.1402449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2017] [Accepted: 11/04/2017] [Indexed: 06/07/2023]

Norinder U, Boyer S. Binary classification of imbalanced datasets using conformal prediction. J Mol Graph Model 2017;72:256-265. [PMID: 28135672 DOI: 10.1016/j.jmgm.2017.01.008] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 09/03/2016] [Revised: 12/19/2016] [Accepted: 01/04/2017] [Indexed: 11/28/2022]

Hanser T, Barber C, Marchaland JF, Werner S. Applicability domain: towards a more formal definition. SAR QSAR Environ Res 2016;27:893-909. [PMID: 27827546 DOI: 10.1080/1062936x.2016.1250229] [Citation(s) in RCA: 67] [Impact Index Per Article: 8.4] [Received: 08/15/2016] [Accepted: 10/16/2016] [Indexed: 06/06/2023]