1
Xu Y, Liaw A, Sheridan RP, Svetnik V. Development and Evaluation of Conformal Prediction Methods for Quantitative Structure-Activity Relationship. ACS OMEGA 2024; 9:29478-29490. [PMID: 39005801 PMCID: PMC11238240 DOI: 10.1021/acsomega.4c02017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Revised: 06/10/2024] [Accepted: 06/12/2024] [Indexed: 07/16/2024]
Abstract
The quantitative structure-activity relationship (QSAR) regression model is a commonly used technique for predicting the biological activities of compounds using their molecular descriptors. Besides accurate activity estimation, obtaining a prediction uncertainty metric like a prediction interval is highly desirable. Quantifying prediction uncertainty is an active research area in statistical and machine learning (ML), but the implementation for QSAR remains challenging. However, most ML algorithms with high predictive performance require add-on companions for estimating the uncertainty of their prediction. Conformal prediction (CP) is a promising approach as its main components are agnostic to the prediction modes, and it produces valid prediction intervals under weak assumptions on the data distribution. We proposed computationally efficient CP algorithms tailored to the most widely used ML models, including random forests, deep neural networks, and gradient boosting. The algorithms use a novel approach to the derivation of nonconformity scores from the estimates of prediction uncertainty generated by the ensembles of point predictions. The validity and efficiency of proposed algorithms are demonstrated on a diverse collection of QSAR data sets as well as simulation studies. The provided software implementing our algorithms can be used as stand-alone or easily incorporated into other ML software packages for QSAR modeling.
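As an illustration of the approach summarized in this abstract, the following is a minimal sketch of split conformal prediction in which the nonconformity score is the absolute residual divided by the spread of an ensemble's point predictions. It is not the authors' published algorithm or software: the function names, the `eps` stabilizer, and the toy constant-valued ensemble are assumptions made for the example.

```python
import math
import statistics


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]


def ensemble_predict(members, x):
    """Mean and population standard deviation of the ensemble's point predictions."""
    preds = [member(x) for member in members]
    return statistics.fmean(preds), statistics.pstdev(preds)


def calibrate(members, calib_x, calib_y, alpha=0.1, eps=1e-6):
    """Compute the score quantile on a held-out calibration set."""
    scores = []
    for x, y in zip(calib_x, calib_y):
        mean, spread = ensemble_predict(members, x)
        # Normalized nonconformity score (eps is a hypothetical stabilizer).
        scores.append(abs(y - mean) / (spread + eps))
    return conformal_quantile(scores, alpha)


def predict_interval(members, x, q, eps=1e-6):
    """Interval centred on the ensemble mean, scaled by the ensemble spread."""
    mean, spread = ensemble_predict(members, x)
    half_width = q * (spread + eps)
    return mean - half_width, mean + half_width


# Toy ensemble of two constant predictors (purely illustrative).
members = [lambda x: 1.0, lambda x: 3.0]
calib_x = [0.0] * 10
calib_y = [2.0 + i / 10 for i in range(1, 11)]
q = calibrate(members, calib_x, calib_y, alpha=0.1, eps=0.0)
low, high = predict_interval(members, 0.0, q, eps=0.0)
```

Dividing by the ensemble spread lets the interval widen where the ensemble members disagree. The coverage guarantee of split conformal prediction relies on the calibration and test points being exchangeable.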
Affiliation(s)
- Yuting Xu
- Early Development Statistics, Merck & Co., Inc., Rahway, New Jersey 07065, United States
- Andy Liaw
- Early Development Statistics, Merck & Co., Inc., Rahway, New Jersey 07065, United States
- Robert P. Sheridan
- Modeling and Informatics, Merck & Co., Inc., Rahway, New Jersey 07033, United States
- Vladimir Svetnik
- Early Development Statistics, Merck & Co., Inc., Rahway, New Jersey 07065, United States
2
Balraadjsing S, J G M Peijnenburg W, Vijver MG. Building species trait-specific nano-QSARs: Model stacking, navigating model uncertainties and limitations, and the effect of dataset size. ENVIRONMENT INTERNATIONAL 2024; 188:108764. [PMID: 38788418 DOI: 10.1016/j.envint.2024.108764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2024] [Revised: 05/17/2024] [Accepted: 05/19/2024] [Indexed: 05/26/2024]
Abstract
A strong need exists for broadly applicable nano-QSARs, capable of predicting toxicological outcomes towards untested species and nanomaterials, under different environmental conditions. Existing nano-QSARs are generally limited to only a few species but the inclusion of species characteristics into models can aid in making them applicable to multiple species, even when toxicity data is not available for biological species. Species traits were used to create classification and regression machine learning models to predict acute toxicity towards aquatic species for metallic nanomaterials. Afterwards, the individual classification and regression models were stacked into a meta-model to improve performance. Additionally, the uncertainty and limitations of the models were assessed in detail (beyond the OECD principles) and it was investigated whether models would benefit from the addition of more data. Results showed a significant improvement in model performance following model stacking. Investigation of model uncertainties and limitations highlighted the discrepancy between the applicability domain and accuracy of predictions. Data points outside of the assessed chemical space did not have higher likelihoods of generating inadequate predictions or vice versa. It is therefore concluded that the applicability domain does not give complete insight into the uncertainty of predictions and instead the generation of prediction intervals can help in this regard. Furthermore, results indicated that an increase of the dataset size did not improve model performance. This implies that larger dataset sizes may not necessarily improve model performance while in turn also meaning that large datasets are not necessarily required for prediction of acute toxicity with nano-QSARs.
Affiliation(s)
- Surendra Balraadjsing
- Institute of Environmental Sciences (CML), Leiden University, PO Box 9518, 2300 RA Leiden, the Netherlands
- Willie J G M Peijnenburg
- Institute of Environmental Sciences (CML), Leiden University, PO Box 9518, 2300 RA Leiden, the Netherlands; Centre for Safety of Substances and Products, National Institute of Public Health and the Environment (RIVM), PO Box 1, 3720 BA Bilthoven, the Netherlands
- Martina G Vijver
- Institute of Environmental Sciences (CML), Leiden University, PO Box 9518, 2300 RA Leiden, the Netherlands
3
Smajić A, Rami I, Sosnin S, Ecker GF. Identifying Differences in the Performance of Machine Learning Models for Off-Targets Trained on Publicly Available and Proprietary Data Sets. Chem Res Toxicol 2023; 36:1300-1312. [PMID: 37439496 PMCID: PMC10445286 DOI: 10.1021/acs.chemrestox.3c00042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Indexed: 07/14/2023]
Abstract
Each year, publicly available databases are updated with new compounds from different research institutions. Positive experimental outcomes are more likely to be reported; therefore, they account for a considerable fraction of these entries. Established publicly available databases such as ChEMBL allow researchers to use information without constrictions and create predictive tools for a broad spectrum of applications in the field of toxicology. Therefore, we investigated the distribution of positive and nonpositive entries within ChEMBL for a set of off-targets and its impact on the performance of classification models when applied to pharmaceutical industry data sets. Results indicate that models trained on publicly available data tend to overpredict positives, and models based on industry data sets predict negatives more often than those built using publicly available data sets. This is strengthened even further by the visualization of the prediction space for a set of 10,000 compounds, which makes it possible to identify regions in the chemical space where predictions converge. Finally, we highlight the utilization of these models for consensus modeling for potential adverse events prediction.
Affiliation(s)
- Aljoša Smajić
- Department of Pharmaceutical Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria
- Iris Rami
- Department of Pharmaceutical Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria
- Sergey Sosnin
- Department of Pharmaceutical Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria
- Gerhard F. Ecker
- Department of Pharmaceutical Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria
4
Belfield SJ, Cronin MTD, Enoch SJ, Firman JW. Guidance for good practice in the application of machine learning in development of toxicological quantitative structure-activity relationships (QSARs). PLoS One 2023; 18:e0282924. [PMID: 37163504 PMCID: PMC10171609 DOI: 10.1371/journal.pone.0282924] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/11/2022] [Accepted: 02/26/2023] [Indexed: 05/12/2023] Open
Abstract
Recent years have seen a substantial growth in the adoption of machine learning approaches for the purposes of quantitative structure-activity relationship (QSAR) development. Such a trend has coincided with desire to see a shifting in the focus of methodology employed within chemical safety assessment: away from traditional reliance upon animal-intensive in vivo protocols, and towards increased application of in silico (or computational) predictive toxicology. With QSAR central amongst techniques applied in this area, the emergence of algorithms trained through machine learning with the objective of toxicity estimation has, quite naturally, arisen. On account of the pattern-recognition capabilities of the underlying methods, the statistical power of the ensuing models is potentially considerable, appropriate for the handling even of vast, heterogeneous datasets. However, such potency comes at a price: this manifesting as the general practical deficits observed with respect to the reproducibility, interpretability and generalisability of the resulting tools. Unsurprisingly, these elements have served to hinder broader uptake (most notably within a regulatory setting). Areas of uncertainty liable to accompany (and hence detract from applicability of) toxicological QSAR have previously been highlighted, accompanied by the forwarding of suggestions for "best practice" aimed at mitigation of their influence. However, the scope of such exercises has remained limited to "classical" QSAR: that conducted through use of linear regression and related techniques, with the adoption of comparatively few features or descriptors. Accordingly, the intention of this study has been to extend the remit of best practice guidance, so as to address concerns specific to employment of machine learning within the field. In doing so, the impact of strategies aimed at enhancing the transparency (feature importance, feature reduction), generalisability (cross-validation) and predictive power (hyperparameter optimisation) of algorithms, trained upon real toxicity data through six common learning approaches, is evaluated.
Affiliation(s)
- Samuel J Belfield
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom
- Mark T D Cronin
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom
- Steven J Enoch
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom
- James W Firman
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom
5
Multi-Strategy Assessment of Different Uses of QSAR under REACH Analysis of Alternatives to Advance Information Transparency. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022; 19:ijerph19074338. [PMID: 35410019 PMCID: PMC8998180 DOI: 10.3390/ijerph19074338] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/19/2022] [Revised: 03/13/2022] [Accepted: 03/17/2022] [Indexed: 11/16/2022]
Abstract
Under the Registration, Evaluation, Authorization, and Restriction of Chemicals (REACH) analysis of alternatives (AoA) process, quantitative structure–activity relationship (QSAR) models play an important role in expanding information gathering and organizing frameworks. Increasingly recognized as an alternative to testing under registration, QSARs have become a relevant tool in bridging data gaps and supporting weight of evidence (WoE) when assessing alternative substances. Additionally, QSARs are growing in importance in integrated testing strategies (ITS). For example, the REACH ITS framework for specific endpoints directs registrants to consider non-testing results, including QSAR predictions, when deciding if further animal testing is needed. Despite the raised profile of QSARs in these frameworks, a gap exists in the evaluation of QSAR use and QSAR documentation under authorization. An assessment of the different uses (e.g., WoE and ITS) in which QSAR predictions play a role in evidence gathering and organizing remains unaddressed for AoA. This study approached the disparity in information for QSAR predictions by conducting a substantive review of 24 AoA through May 2017, which contained higher-tier endpoints under REACH. Understanding the manner in which applicants manage QSAR prediction information in AoA and assessing their potential within ITS will be valuable in promoting regulatory use of QSARs and building out future platforms in the face of rapidly evolving technology while advancing information transparency.
6
Aniceto N, Freitas AA, Bender A, Ghafourian T. A novel applicability domain technique for mapping predictive reliability across the chemical space of a QSAR: reliability-density neighbourhood. J Cheminform 2016. [PMCID: PMC5395519 DOI: 10.1186/s13321-016-0182-y] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Indexed: 01/20/2023] Open
Abstract
The ability to define the regions of chemical space where a predictive model can be safely used is a necessary condition to assure the reliability of new predictions. This implies that reliability must be determined across chemical space in the attempt to localize “safe” and “unsafe” regions for prediction. As a result we devised an applicability domain technique that addresses the data locally instead of handling it as a whole: the reliability-density neighbourhood (RDN). The main novelty aspect of this method is that it characterizes each single training instance according to the density of its neighbourhood in the training set, as well as its individual bias and precision. By scanning through the chemical space (by iteratively increasing the applicability domain area), it was observed that new test compounds are successively included into the applicability domain region in such a manner that strongly correlates to their predictive performance. This allows the mapping of local reliability across different locations in the training set space, and thus allows identifying regions where the model has low reliability. This method also showed matching profiles between two external sets, which is an indication that it performs robustly with new data. Another novel aspect in this technique is that it is paired with a specific feature selection algorithm. As a result, the impact of the feature set used was studied from which the top 20 features selected by ReliefF yielded the best results, as opposed to using the model’s features or the entire feature set as commonly done. As the third novel aspect, in this work we propose a new scoring function to help evaluate the quality of an applicability domain profile (i.e., the curve of accuracy vs the applicability domain measure in question). Overall, the RDN showed to be a promising method that can correctly sort new instances according to predictive performance. As a result, this technique can be received by an end-user as proof of concept for the performance of a QSAR model in new data, thus promoting the user’s trust on the QSAR output.
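To make the idea of a locally defined, progressively widened applicability domain concrete, here is a much simplified sketch based only on local neighbourhood density. It is not the RDN method itself: it omits the per-instance bias and precision terms and the ReliefF feature selection described in the abstract, and the function names and the `scale` parameter are assumptions for illustration.

```python
import math


def knn_mean_distance(point, others, k):
    """Mean Euclidean distance from `point` to its k nearest neighbours in `others`."""
    distances = sorted(math.dist(point, other) for other in others)
    return sum(distances[:k]) / k


def fit_local_radii(train, k=3):
    """Give each training point a radius equal to its mean k-nearest-neighbour distance."""
    radii = []
    for i, point in enumerate(train):
        rest = train[:i] + train[i + 1:]
        radii.append(knn_mean_distance(point, rest, k))
    return radii


def in_domain(query, train, radii, scale=1.0):
    """True if the query lies within the scaled local radius of any training point."""
    return any(
        math.dist(query, point) <= scale * radius
        for point, radius in zip(train, radii)
    )


# Toy descriptor space: four training compounds on a unit square.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
radii = fit_local_radii(train, k=2)

inside = in_domain((0.5, 0.5), train, radii)              # close to the training set
outside = in_domain((5.0, 5.0), train, radii)             # far from every training point
widened = in_domain((5.0, 5.0), train, radii, scale=6.0)  # domain enlarged sixfold
```

Increasing `scale` step by step and recording when each test compound first enters the domain gives a crude analogue of the scanning procedure the abstract describes.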
7
Yang M, Chen J, Shi X, Xu L, Xi Z, You L, An R, Wang X. Development of in Silico Models for Predicting P-Glycoprotein Inhibitors Based on a Two-Step Approach for Feature Selection and Its Application to Chinese Herbal Medicine Screening. Mol Pharm 2015; 12:3691-713. [PMID: 26376206 DOI: 10.1021/acs.molpharmaceut.5b00465] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 12/27/2022]
Abstract
P-glycoprotein (P-gp) is regarded as an important factor in determining the ADMET (absorption, distribution, metabolism, elimination, and toxicity) characteristics of drugs and drug candidates. Successful prediction of P-gp inhibitors can thus lead to an improved understanding of the underlying mechanisms of both changes in the pharmacokinetics of drugs and drug-drug interactions. Therefore, there has been considerable interest in the development of in silico modeling of P-gp inhibitors in recent years. Considering that a large number of molecular descriptors are used to characterize structurally diverse molecules, efficient feature selection methods are required to extract the most informative predictors. In this work, we constructed an extensive available data set of 2428 molecules that includes 1518 P-gp inhibitors and 910 P-gp noninhibitors from multiple resources. Importantly, a two-step feature selection approach based on a genetic algorithm and a greedy forward-searching algorithm was employed to select the minimum set of the most informative descriptors that contribute to the prediction of P-gp inhibitors. To determine the best machine learning algorithm, 18 classifiers coupled with the feature selection method were compared. The top three best-performing models (flexible discriminant analysis, support vector machine, and random forest) and their ensemble model using respectively only 3, 9, 7, and 14 descriptors achieve an overall accuracy of 83.2%-86.7% for the training set containing 1040 compounds, an overall accuracy of 82.3%-85.5% for the test set containing 1039 compounds, and a prediction accuracy of 77.4%-79.9% for the external validation set containing 349 compounds. The models were further extensively validated by DrugBank database (1890 compounds). The proposed models are competitive with and in some cases better than other published models in terms of prediction accuracy and minimum number of descriptors. Applicability domain then was addressed by developing an ensemble classification model to obtain more reliable predictions. Finally, we employed these models as a virtual screening tool for identifying potential P-gp inhibitors in Traditional Chinese Medicine Systems Pharmacology (TCMSP) database containing a total of 13 051 unique compounds from 498 herbs, resulting in 875 potential P-gp inhibitors and 15 inhibitor-rich herbs. These predictions were partly supported by a literature search and are valuable not only to develop novel P-gp inhibitors from TCM in the early stages of drug development, but also to optimize the use of herbal remedies.
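The greedy forward-searching step mentioned in the abstract can be illustrated with a generic sketch. This covers only the second step (the genetic algorithm pre-selection is not shown) and is not the authors' implementation. The `score` callback, the descriptor names, and the toy scoring weights are hypothetical; in practice `score` would be something like a cross-validated accuracy.

```python
def greedy_forward_selection(candidates, score, max_features=None, min_gain=0):
    """Add one feature at a time, always the one that raises `score` the most.

    Stops when no remaining feature improves the score by more than `min_gain`
    or when `max_features` have been selected.
    """
    selected = []
    best = score(selected)
    remaining = list(candidates)
    while remaining and (max_features is None or len(selected) < max_features):
        # Evaluate every remaining feature when added to the current subset.
        trials = [(score(selected + [feature]), feature) for feature in remaining]
        top_score, top_feature = max(trials, key=lambda trial: trial[0])
        if top_score - best <= min_gain:
            break  # no candidate gives a worthwhile improvement
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best


# Hypothetical stand-in for a model-quality metric: each descriptor contributes
# a fixed benefit, and every added descriptor costs a complexity penalty of 5.
BENEFIT = {"logP": 50, "MW": 30, "TPSA": 0, "HBD": 10}


def toy_score(subset):
    return sum(BENEFIT[name] for name in subset) - 5 * len(subset)


chosen, final_score = greedy_forward_selection(list(BENEFIT), toy_score)
```

The per-feature penalty in the toy score is what makes the search stop early, which mirrors the abstract's goal of finding a minimal descriptor set.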
Affiliation(s)
- Ming Yang
- Department of Chemistry, College of Pharmacy, Shanghai University of Traditional Chinese Medicine, Shanghai 200444, People's Republic of China; Department of Pharmacy, Longhua Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai 200032, People's Republic of China
- Jialei Chen
- Department of Pharmacy, Longhua Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai 200032, People's Republic of China
- Xiufeng Shi
- Department of Pharmacy, Longhua Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai 200032, People's Republic of China
- Liwen Xu
- Department of Pharmacy, Longhua Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai 200032, People's Republic of China
- Zhijun Xi
- Department of Pharmacy, Longhua Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai 200032, People's Republic of China
- Lisha You
- Department of Chemistry, College of Pharmacy, Shanghai University of Traditional Chinese Medicine, Shanghai 200444, People's Republic of China
- Rui An
- Department of Chemistry, College of Pharmacy, Shanghai University of Traditional Chinese Medicine, Shanghai 200444, People's Republic of China
- Xinhong Wang
- Department of Chemistry, College of Pharmacy, Shanghai University of Traditional Chinese Medicine, Shanghai 200444, People's Republic of China
8
Gajewicz A, Schaeublin N, Rasulev B, Hussain S, Leszczynska D, Puzyn T, Leszczynski J. Towards understanding mechanisms governing cytotoxicity of metal oxides nanoparticles: hints from nano-QSAR studies. Nanotoxicology 2014; 9:313-25. [PMID: 24983896 DOI: 10.3109/17435390.2014.930195] [Citation(s) in RCA: 94] [Impact Index Per Article: 9.4] [Indexed: 11/13/2022]
Abstract
The production of nanomaterials increases exponentially every year, and therefore the probability that these novel materials could cause adverse outcomes for human health and the environment also expands rapidly. We proposed two types of mechanisms of toxic action that are collectively applied in a nano-QSAR model, which provides governance over the toxicity of metal oxide nanoparticles to the human keratinocyte cell line (HaCaT). The combined experimental-theoretical studies allowed the development of an interpretative nano-QSAR model describing the toxicity of 18 nano-metal oxides to the HaCaT cell line, which is a common in vitro model for keratinocyte response during toxic dermal exposure. The comparison of the toxicity of metal oxide nanoparticles to bacteria Escherichia coli (prokaryotic system) and a human keratinocyte cell line (eukaryotic system) resulted in the hypothesis that different modes of toxic action occur between prokaryotic and eukaryotic systems.
Affiliation(s)
- Agnieszka Gajewicz
- Laboratory of Environmental Chemometrics, Institute for Environmental and Human Health Protection, Faculty of Chemistry, University of Gdańsk, Gdańsk, Poland
9
Clark RD, Liang W, Lee AC, Lawless MS, Fraczkiewicz R, Waldman M. Using beta binomials to estimate classification uncertainty for ensemble models. J Cheminform 2014; 6:34. [PMID: 24987464 PMCID: PMC4076254 DOI: 10.1186/1758-2946-6-34] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 03/05/2014] [Accepted: 06/16/2014] [Indexed: 12/14/2022] Open
Abstract
Background: Quantitative structure-activity (QSAR) models have enormous potential for reducing drug discovery and development costs as well as the need for animal testing. Great strides have been made in estimating their overall reliability, but to fully realize that potential, researchers and regulators need to know how confident they can be in individual predictions.
Results: Submodels in an ensemble model which have been trained on different subsets of a shared training pool represent multiple samples of the model space, and the degree of agreement among them contains information on the reliability of ensemble predictions. For artificial neural network ensembles (ANNEs) using two different methods for determining ensemble classification – one using vote tallies and the other averaging individual network outputs – we have found that the distribution of predictions across positive vote tallies can be reasonably well-modeled as a beta binomial distribution, as can the distribution of errors. Together, these two distributions can be used to estimate the probability that a given predictive classification will be in error. Large data sets comprised of logP, Ames mutagenicity, and CYP2D6 inhibition data are used to illustrate and validate the method. The distributions of predictions and errors for the training pool accurately predicted the distribution of predictions and errors for large external validation sets, even when the number of positive and negative examples in the training pool were not balanced. Moreover, the likelihood of a given compound being prospectively misclassified as a function of the degree of consensus between networks in the ensemble could in most cases be estimated accurately from the fitted beta binomial distributions for the training pool.
Conclusions: Confidence in an individual predictive classification by an ensemble model can be accurately assessed by examining the distributions of predictions and errors as a function of the degree of agreement among the constituent submodels. Further, ensemble uncertainty estimation can often be improved by adjusting the voting or classification threshold based on the parameters of the error distribution. Finally, the profiles for models whose predictive uncertainty estimates are not reliable provide clues to that effect without the need for comparison to an external test set.
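One way to read the statement that the two fitted distributions together give a per-prediction error probability is through Bayes' rule: P(error | k votes) = P(k votes | error) × P(error) / P(k votes). The sketch below assumes that reading. It is not taken from the paper, and the function names, the parameter tuples, and the clamping to 1.0 are illustrative assumptions.

```python
import math


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k, n, a, b):
    """Probability of k positive votes out of n under a beta binomial(a, b)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))


def error_probability(k, n, pred_params, err_params, overall_error_rate):
    """Estimate P(error | k positive votes) from two fitted beta binomials.

    pred_params: (a, b) fitted to the vote tallies of all predictions.
    err_params:  (a, b) fitted to the vote tallies of erroneous predictions.
    The result is clamped to 1.0 because independently fitted curves need not
    be perfectly consistent with each other.
    """
    p_tally = beta_binomial_pmf(k, n, *pred_params)
    p_tally_given_error = beta_binomial_pmf(k, n, *err_params)
    return min(1.0, overall_error_rate * p_tally_given_error / p_tally)


# Sanity check: with a = b = 1 the beta binomial is uniform over 0..n.
uniform_mass = beta_binomial_pmf(3, 10, 1.0, 1.0)
total_mass = sum(beta_binomial_pmf(k, 33, 2.0, 5.0) for k in range(34))
```

With identical parameters for both distributions the estimate reduces to the overall error rate, as expected when the vote tally carries no information about errors.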
Affiliation(s)
- Robert D Clark
- Department of Life Sciences, Simulations Plus, Inc., 45205 10th Street West, Lancaster, CA 93534, USA
- Wenkel Liang
- Department of Life Sciences, Simulations Plus, Inc., 45205 10th Street West, Lancaster, CA 93534, USA
- Adam C Lee
- Department of Life Sciences, Simulations Plus, Inc., 45205 10th Street West, Lancaster, CA 93534, USA
- Michael S Lawless
- Department of Life Sciences, Simulations Plus, Inc., 45205 10th Street West, Lancaster, CA 93534, USA
- Robert Fraczkiewicz
- Department of Life Sciences, Simulations Plus, Inc., 45205 10th Street West, Lancaster, CA 93534, USA
- Marvin Waldman
- Department of Life Sciences, Simulations Plus, Inc., 45205 10th Street West, Lancaster, CA 93534, USA