1
Banerjee A, Kar S, Roy K, Patlewicz G, Charest N, Benfenati E, Cronin MTD. Molecular similarity in chemical informatics and predictive toxicity modeling: from quantitative read-across (q-RA) to quantitative read-across structure-activity relationship (q-RASAR) with the application of machine learning. Crit Rev Toxicol 2024; 54:659-684. [PMID: 39225123 DOI: 10.1080/10408444.2024.2386260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2024] [Revised: 07/25/2024] [Accepted: 07/25/2024] [Indexed: 09/04/2024]
Abstract
This article aims to provide a comprehensive critical, yet readable, review of general interest to the chemistry community on molecular similarity as applied to chemical informatics and predictive modeling with a special focus on read-across (RA) and read-across structure-activity relationships (RASAR). Molecular similarity-based computational tools, such as quantitative structure-activity relationships (QSARs) and RA, are routinely used to fill the data gaps for a wide range of properties including toxicity endpoints for regulatory purposes. This review will explore the background of RA starting from how structural information has been used through to how other similarity contexts such as physicochemical, absorption, distribution, metabolism, and elimination (ADME) properties, and biological aspects are being characterized. More recent developments of RA's integration with QSAR have resulted in the emergence of novel models such as ToxRead, generalized read-across (GenRA), and quantitative RASAR (q-RASAR). Conventional QSAR techniques have been excluded from this review except where necessary for context.
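As a rough illustration of the similarity-weighted prediction that underlies quantitative read-across, the sketch below estimates a target compound's endpoint value from its most similar source compounds. It is a minimal example under stated assumptions, not the method of any tool named in the abstract: compounds are represented as plain sets of feature identifiers, similarity is the Tanimoto coefficient, and the function names `tanimoto` and `read_across` are hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of feature identifiers."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def read_across(target: set, sources: list, k: int = 3) -> float:
    """Similarity-weighted mean of the k most similar source compounds.

    `sources` is a list of (feature_set, measured_value) pairs
    (an assumed, illustrative data layout).
    """
    # Score every source compound against the target, keep the k closest.
    scored = sorted(
        ((tanimoto(target, feats), value) for feats, value in sources),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in scored)
    if total == 0:
        raise ValueError("no source compound shares any feature with the target")
    # Each neighbour contributes in proportion to its similarity.
    return sum(sim * value for sim, value in scored) / total
```

With real data the feature sets would typically come from a fingerprinting library, and the choice of `k` and of the similarity measure would need to be justified for the endpoint in question.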
Affiliation(s)
- Arkaprava Banerjee
  - Department of Pharmaceutical Technology, Drug Theoretics and Cheminformatics (DTC) Laboratory, Jadavpur University, Kolkata, India
- Supratik Kar
  - Department of Chemistry and Physics, Chemometrics & Molecular Modeling Laboratory, Kean University, Union, NJ, USA
- Kunal Roy
  - Department of Pharmaceutical Technology, Drug Theoretics and Cheminformatics (DTC) Laboratory, Jadavpur University, Kolkata, India
- Grace Patlewicz
  - Center for Computational Toxicology and Exposure, US Environmental Protection Agency, Research Triangle Park, NC, USA
- Nathaniel Charest
  - Center for Computational Toxicology and Exposure, US Environmental Protection Agency, Research Triangle Park, NC, USA
- Emilio Benfenati
  - Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
- Mark T D Cronin
  - School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, UK
2
Prussia AJ, Welsh C, Somers TS, Ruiz P. Workflow for predictive risk assessments of UVCBs: cheminformatics library design, QSAR, and read-across approaches applied to complex mixtures of metal naphthenates. Frontiers in Toxicology 2024; 6:1452838. [PMID: 39411268 PMCID: PMC11473587 DOI: 10.3389/ftox.2024.1452838] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2024] [Accepted: 09/06/2024] [Indexed: 10/19/2024] Open
Abstract
Substances of unknown or variable composition, complex reaction products, and biological materials (UVCBs) are commonly found in the environment. However, assessing their human toxicological risk is challenging due to their variable composition and many constituents. Metal naphthenate salts are one such category of UVCBs that are the reaction products of naphthenic acids with metals to form complex mixtures. Metal naphthenates are often found or used in household and industrial materials with potential for human exposure, but very few of these materials have been evaluated for causing human health hazards. Herein, we evaluate metal naphthenates using predictions derived from read-across and quantitative structure-activity/property relationship (QSAR/QSPR) models. Accordingly, we first built a computational chemistry library by enumerating the structures of naphthenic acids and derived 11,850 QSAR-acceptable structures; then, we used open and commercial in silico tools on these structures to predict a set of physicochemical properties and toxicity endpoints. We then compared the QSAR/QSPR predictions with available experimental data on naphthenic acids to provide a more complete picture of the contributions of the components to the toxicity profiles of metal naphthenate mixtures. The available systematic acute oral toxicity values (LD50) and QSAR LD50 predictions of all the naphthenic acid components indicated low concern for toxicity. The point of departure predictions for chronic repeated dose toxicity for the naphthenic acid components using QSAR models developed from studies on rats ranged from 25 to 50 mg/kg/day. These values are in good agreement with findings from studies on copper and zinc naphthenates, which had no observed adverse effect levels of 30 and 118 mg/kg/day, respectively. 
Hence, this study demonstrates how published in silico approaches can be used to identify the potential components of metal naphthenates for further testing, inform groupings of UVCBs such as naphthenates, as well as fill the data gaps using read-across and QSAR models to inform risk assessment.
Affiliation(s)
- A. J. Prussia
  - Office of Innovation and Analytics, Agency for Toxic Substances and Disease Registry, Atlanta, GA, United States
- C. Welsh
  - Office of Innovation and Analytics, Agency for Toxic Substances and Disease Registry, Atlanta, GA, United States
- T. S. Somers
  - Office of Community Health and Hazard Assessment, Agency for Toxic Substances and Disease Registry, Atlanta, GA, United States
- P. Ruiz
  - Office of Innovation and Analytics, Agency for Toxic Substances and Disease Registry, Atlanta, GA, United States
3
Silva M, Capps S, London JK. Community-Engaged Research and the Use of Open Access ToxVal/ToxRef In Vivo Databases and New Approach Methodologies (NAM) to Address Human Health Risks From Environmental Contaminants. Birth Defects Res 2024; 116:e2395. [PMID: 39264239 PMCID: PMC11407745 DOI: 10.1002/bdr2.2395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 06/19/2024] [Accepted: 08/11/2024] [Indexed: 09/13/2024]
Abstract
BACKGROUND: This paper analyzes opportunities for integrating open access resources (Abstract Sifter, US EPA and NTP Toxicity Value and Toxicity Reference [ToxVal/ToxRefDB]) and New Approach Methodologies (NAMs) into Community Engaged Research (CEnR).
METHODS: The CompTox Chemicals Dashboard and Integrated Chemical Environment, with in vivo ToxVal/ToxRef and NAM (in vitro) databases, are presented in three case studies to show how these resources could be used in pilot projects involving CEnR from the University of California, Davis, Environmental Health Sciences Center.
RESULTS: Case #1 developed a novel assay methodology for testing pesticide toxicity. Case #2 involved detection of water contaminants from wildfire ash, and Case #3 involved contaminants on Tribal lands. Abstract Sifter/ToxVal/ToxRefDB regulatory data and NAMs could be used to screen and prioritize risks from exposure to metals, PAHs, and PFAS from wildfire ash leached into water and to investigate activities of environmental toxins (e.g., pesticides) on Tribal lands. Open access NAMs and computational tools can be applied to detect sensitive biological activities in potential or known adverse outcome pathways and to predict points of departure (PODs) for comparison with regulatory values for hazard identification. Open access Systematic Empirical Evaluation of Models or biomonitoring exposures are available for human subpopulations and can be used to determine the bioactivity (POD) to exposure ratio to facilitate mitigation.
CONCLUSIONS: These resources help prioritize chemical toxicity and facilitate regulatory decisions and health protective policies that can aid stakeholders in deciding on needed research. Insights into exposure risks can aid environmental justice and health equity advocates.
Affiliation(s)
- Marilyn Silva
  - Co-Chair Community Stakeholders' Advisory Committee, University of California (UC Davis), Environmental Health Sciences Center (EHSC), Davis, California, USA
- Shosha Capps
  - Co-Director Community Engagement Core, UC Davis EHSC, Davis, California, USA
- Jonathan K London
  - Department of Human Ecology and Faculty Director Community Engagement Core, UC Davis EHSC, Sacramento, California, USA
4
Mansouri K, Taylor K, Auerbach S, Ferguson S, Frawley R, Hsieh JH, Jahnke G, Kleinstreuer N, Mehta S, Moreira-Filho JT, Parham F, Rider C, Rooney AA, Wang A, Sutherland V. Unlocking the Potential of Clustering and Classification Approaches: Navigating Supervised and Unsupervised Chemical Similarity. Environmental Health Perspectives 2024; 132:85002. [PMID: 39106156 DOI: 10.1289/ehp14001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/09/2024]
Abstract
BACKGROUND: The field of toxicology has witnessed substantial advancements in recent years, particularly with the adoption of new approach methodologies (NAMs) to understand and predict chemical toxicity. Class-based methods such as clustering and classification are key to NAMs development and application, aiding the understanding of hazard and risk concerns associated with groups of chemicals without additional laboratory work. Advances in computational chemistry, data generation and availability, and machine learning algorithms represent important opportunities for continued improvement of these techniques to optimize their utility for specific regulatory and research purposes. However, due to their intricacy, deep understanding and careful selection are imperative to align the adequate methods with their intended applications.
OBJECTIVES: This commentary aims to deepen the understanding of class-based approaches by elucidating the pivotal role of chemical similarity (structural and biological) in clustering and classification approaches (CCAs). It addresses the dichotomy between general end point-agnostic similarity, often entailing unsupervised analysis, and end point-specific similarity necessitating supervised learning. The goal is to highlight the nuances of these approaches, their applications, and common misuses.
DISCUSSION: Understanding similarity is pivotal in toxicological research involving CCAs. The effectiveness of these approaches depends on the right definition and measure of similarity, which varies based on context and objectives of the study. This choice is influenced by how chemical structures are represented and the respective labels indicating biological activity, if applicable. The distinction between unsupervised clustering and supervised classification methods is vital, requiring the use of end point-agnostic vs. end point-specific similarity definition. Separate use or combination of these methods requires careful consideration to prevent bias and ensure relevance for the goal of the study. Unsupervised methods use end point-agnostic similarity measures to uncover general structural patterns and relationships, aiding hypothesis generation and facilitating exploration of datasets without the need for predefined labels or explicit guidance. Conversely, supervised techniques demand end point-specific similarity to group chemicals into predefined classes or to train classification models, allowing accurate predictions for new chemicals. Misuse can arise when unsupervised methods are applied to end point-specific contexts, like analog selection in read-across, leading to erroneous conclusions. This commentary provides insights into the significance of similarity and its role in supervised classification and unsupervised clustering approaches. https://doi.org/10.1289/EHP14001.
Affiliation(s)
- Kamel Mansouri
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Kyla Taylor
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Scott Auerbach
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Stephen Ferguson
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Rachel Frawley
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Jui-Hua Hsieh
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Gloria Jahnke
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Nicole Kleinstreuer
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Suril Mehta
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- José T Moreira-Filho
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Fred Parham
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Cynthia Rider
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Andrew A Rooney
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Amy Wang
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Vicki Sutherland
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
5
Srisongkram T. DeepRA: A novel deep learning-read-across framework and its application in non-sugar sweeteners mutagenicity prediction. Comput Biol Med 2024; 178:108731. [PMID: 38870727 DOI: 10.1016/j.compbiomed.2024.108731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 05/07/2024] [Accepted: 06/08/2024] [Indexed: 06/15/2024]
Abstract
Non-sugar sweeteners (NSSs), or artificial sweeteners, have been used as food chemicals since World War II. NSSs, however, also raise a concern about their mutagenicity. Evaluating the mutagenic ability of NSSs is crucial for food safety; this step is needed for every new chemical registration in the food and pharmaceutical industries. A computational assessment requires less time, less money, and fewer animals than in vivo experiments; thus, this study developed a novel computational method from an ensemble convolutional deep neural network and read-across algorithms, called DeepRA, to classify the mutagenicity of chemicals. The mutagenicity data were obtained from the curated Ames test data set. The DeepRA model was developed using both molecular descriptors and molecular fingerprints. The obtained DeepRA model provides accurate and reliable mutagenicity classification on an independent test set. This model was then used to examine NSS-related chemicals, enabling the evaluation of the mutagenicity of NSS-like substances. Finally, the model is publicly available at https://github.com/taraponglab/deepra for further use in chemical regulation and risk assessment.
Affiliation(s)
- Tarapong Srisongkram
  - Division of Pharmaceutical Chemistry, Faculty of Pharmaceutical Sciences, Khon Kaen University, 40002, Thailand
6
Newmeyer MN, Lyu Q, Sobus JR, Williams AJ, Nachman KE, Prasse C. Combining Nontargeted Analysis with Computer-Based Hazard Comparison Approaches to Support Prioritization of Unregulated Organic Contaminants in Biosolids. Environmental Science & Technology 2024; 58:12135-12146. [PMID: 38916220 PMCID: PMC11381038 DOI: 10.1021/acs.est.4c02934] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024]
Abstract
Biosolids are a byproduct of wastewater treatment that can be beneficially applied to agricultural land as a fertilizer. While U.S. regulations limit metals and pathogens in biosolids intended for land applications, no organic contaminants are currently regulated. Novel techniques can aid in detection, evaluation, and prioritization of biosolid-associated organic contaminants (BOCs). For example, nontargeted analysis (NTA) can detect a broad range of chemicals, producing data sets representing thousands of measured analytes that can be combined with computational toxicological tools to support human and ecological hazard assessment and prioritization. We combined NTA with a computer-based tool from the U.S. EPA, the Cheminformatics Hazard Comparison Module (HCM), to identify and prioritize BOCs present in U.S. and Canadian biosolids (n = 16). Four-hundred fifty-one features were detected in at least 80% of samples, with identities of 92 compounds confirmed or assigned probable structures. These compounds were primarily categorized as endogenous compounds, pharmaceuticals, industrial chemicals, and fragrances. Examples of top prioritized compounds were p-cresol and chlorophene, based on human health end points, and fludioxonil and triclocarban, based on ecological health end points. Combining NTA results with hazard comparison data allowed us to prioritize compounds to be included in future studies of the environmental fate and transport of BOCs.
Affiliation(s)
- Matthew N Newmeyer
  - Department of Environmental Health and Engineering, Johns Hopkins University, Baltimore, Maryland 21205, United States
- Qinfan Lyu
  - Department of Environmental Health and Engineering, Johns Hopkins University, Baltimore, Maryland 21205, United States
- Jon R Sobus
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27709, United States
- Antony J Williams
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27709, United States
- Keeve E Nachman
  - Department of Environmental Health and Engineering, Johns Hopkins University, Baltimore, Maryland 21205, United States
  - Risk Sciences and Public Policy Institute, Johns Hopkins University, Baltimore, Maryland 21205, United States
  - Center for a Livable Future, Johns Hopkins University, Baltimore, Maryland 21205, United States
- Carsten Prasse
  - Department of Environmental Health and Engineering, Johns Hopkins University, Baltimore, Maryland 21205, United States
  - Risk Sciences and Public Policy Institute, Johns Hopkins University, Baltimore, Maryland 21205, United States
7
Groff L, Williams A, Shah I, Patlewicz G. MetSim: Integrated Programmatic Access and Pathway Management for Xenobiotic Metabolism Simulators. Chem Res Toxicol 2024; 37:685-697. [PMID: 38598715 PMCID: PMC11325951 DOI: 10.1021/acs.chemrestox.3c00398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/12/2024]
Abstract
Xenobiotic metabolism is a key consideration in evaluating the hazards and risks posed by environmental chemicals. A number of software tools exist that are capable of simulating metabolites, but each reports its predictions in a different format and with varying levels of detail. This makes comparing the performance and coverage of the tools a practical challenge. To address this shortcoming, we developed a metabolic simulation framework called MetSim, which comprises three main components. A graph-based schema was developed to allow metabolism information to be harmonized. The schema was implemented in MongoDB to store and retrieve metabolic graphs for subsequent analysis. MetSim currently includes an application programming interface for four metabolic simulators: BioTransformer, the OECD Toolbox, EPA's chemical transformation simulator (CTS), and tissue metabolism simulator (TIMES). Lastly, MetSim provides functions to help evaluate simulator performance for specific data sets. In this study, a set of 112 drugs with 432 reported metabolites were compiled, and predictions were made using the 4 simulators. Fifty-nine of the 112 drugs were taken from the Small Molecule Pathway Database, with the remainder sourced from the literature. The human models within BioTransformer and CTS (Phase I only) and the rat models within TIMES and the OECD Toolbox (Phase I only) were used to make predictions for the chemicals in the data set. The recall and precision (recall, precision) ranked in order of highest recall for each individual tool were CTS (0.54, 0.017), BioTransformer (0.50, 0.008), Toolbox in vitro (0.40, 0.144), TIMES in vivo (0.40, 0.133), Toolbox in vivo (0.40, 0.118), and TIMES in vitro (0.39, 0.128). Combining all of the model predictions together increased the overall recall (0.73, 0.008). 
MetSim enabled insights into the performance and coverage of in silico metabolic simulators to be more efficiently derived, which in turn should aid future efforts to evaluate other data sets.
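The (recall, precision) pairs quoted in the abstract can be read as set comparisons between predicted and reported metabolites. The sketch below is a minimal illustration under that assumption; it is not MetSim's API, the function name `recall_precision` is hypothetical, and the metabolite identifiers are invented placeholders. It also shows why pooling the predictions of several simulators tends to raise recall while precision can stay low.

```python
def recall_precision(predicted: set, reported: set) -> tuple:
    """Return (recall, precision) for predicted vs. reported metabolites."""
    hits = len(predicted & reported)
    recall = hits / len(reported) if reported else 0.0
    precision = hits / len(predicted) if predicted else 0.0
    return recall, precision


# Placeholder identifiers, not real data: M* are reported metabolites,
# X* are predictions with no experimental counterpart.
reported = {"M1", "M2", "M3", "M4"}
tool_a = {"M1", "M2", "X1", "X2"}
tool_b = {"M3", "X3", "X4", "X5"}

per_tool = [recall_precision(t, reported) for t in (tool_a, tool_b)]
# The union recovers more reported metabolites but also pools the false positives.
combined = recall_precision(tool_a | tool_b, reported)
```

In practice, matching predicted to reported structures would need a canonical structure identifier rather than free-text labels.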
Affiliation(s)
- Louis Groff
  - Center for Computational Toxicology and Exposure (CCTE), Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
- Antony Williams
  - Center for Computational Toxicology and Exposure (CCTE), Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
- Imran Shah
  - Center for Computational Toxicology and Exposure (CCTE), Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
- Grace Patlewicz
  - Center for Computational Toxicology and Exposure (CCTE), Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
8
Karamertzanis PG, Patlewicz G, Sannicola M, Paul-Friedman K, Shah I. Systematic Approaches for the Encoding of Chemical Groups: A Case Study. Chem Res Toxicol 2024; 37:600-619. [PMID: 38498310 PMCID: PMC11258607 DOI: 10.1021/acs.chemrestox.3c00411] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/20/2024]
Abstract
Regulatory authorities aim to organize substances into groups to facilitate prioritization within hazard and risk assessment processes. Often, such chemical groupings are not explicitly defined by structural rules or physicochemical property information. This is largely due to how these groupings are developed, namely, a manual expert curation process, which in turn makes updating and refining groupings, as new substances are evaluated, a practical challenge. Herein, machine learning methods were leveraged to build models that could preliminarily assign substances to predefined groups. A set of 86 groupings containing 2,184 substances as published on the European Chemicals Agency (ECHA) website were mapped to the U.S. Environmental Protection Agency (EPA) Distributed Toxicity Structure Database (DSSTox) content to extract chemical and structural information. Substances were represented using Morgan fingerprints, and two machine learning approaches were used to classify test substances into 56 groups containing at least 10 substances with a structural representation in the data set: k-nearest neighbor (kNN) and random forest (RF), that led to mean 5-fold cross-validation test accuracies (average F1 scores) of 0.781 and 0.853, respectively. With a 9% improvement, the RF classifier was significantly more accurate than KNN (p-value = 0.001). The approach offers promise as a means of the initial profiling of new substances into predefined groups to facilitate prioritization efforts and streamline the assessment of new substances when earlier groupings are available. The algorithm to fit and use these models has been made available in the accompanying repository, thereby enabling both use of the produced models and refitting of these models, as new groupings become available by regulatory authorities or industry.
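For readers unfamiliar with the metric, an average F1 score over groups can be computed as sketched below. This is a minimal illustration that assumes an unweighted (macro) average over classes; the abstract does not state which averaging scheme was used, and the function name `macro_f1` is hypothetical.

```python
def macro_f1(y_true: list, y_pred: list) -> float:
    """Unweighted mean of per-class F1 scores (macro average, an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

In a 5-fold cross-validation such as the one described, this score would be computed on each held-out fold and then averaged across folds.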
Affiliation(s)
- Panagiotis G Karamertzanis
  - Computational Assessment and Alternative Methods, European Chemicals Agency (ECHA), Telakkakatu 6, Helsinki 00150, Finland
- Grace Patlewicz
  - Center for Computational Toxicology and Exposure (CCTE), US EPA, 109 TW Alexander Dr, Research Triangle Park, North Carolina 27711, United States
- Marta Sannicola
  - Computational Assessment and Alternative Methods, European Chemicals Agency (ECHA), Telakkakatu 6, Helsinki 00150, Finland
- Katie Paul-Friedman
  - Center for Computational Toxicology and Exposure (CCTE), US EPA, 109 TW Alexander Dr, Research Triangle Park, North Carolina 27711, United States
- Imran Shah
  - Center for Computational Toxicology and Exposure (CCTE), US EPA, 109 TW Alexander Dr, Research Triangle Park, North Carolina 27711, United States
9
Tate T, Patlewicz G, Shah I. A Comparison of Machine Learning Approaches for predicting Hepatotoxicity potential using Chemical Structure and Targeted Transcriptomic Data. Computational Toxicology (Amsterdam, Netherlands) 2024; 29:1-14. [PMID: 38993502 PMCID: PMC11235188 DOI: 10.1016/j.comtox.2024.100301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/13/2024]
Abstract
Animal toxicity testing is time and resource intensive, making it difficult to keep pace with the number of substances requiring assessment. Machine learning (ML) models that use chemical structure information and high-throughput experimental data can be helpful in predicting potential toxicity. However, much of the toxicity data used to train ML models is biased, with an unequal balance of positives and negatives, primarily because substances selected for in vivo testing are expected to elicit some toxicity effect. To investigate the impact this bias had on predictive performance, various sampling approaches were used to balance in vivo toxicity data as part of a supervised ML workflow to predict hepatotoxicity outcomes from chemical structure and/or targeted transcriptomic data. From the chronic, subchronic, developmental, multigenerational reproductive, and subacute repeat-dose testing toxicity outcomes with a minimum of 50 positive and 50 negative substances, 18 different study-toxicity outcome combinations were evaluated in up to 7 ML models. These included Artificial Neural Networks, Random Forests, Bernoulli Naïve Bayes, Gradient Boosting, and Support Vector classification algorithms, which were compared with a local approach, Generalised Read-Across (GenRA), a similarity-weighted k-Nearest Neighbour (k-NN) method. The mean CV F1 performance for unbalanced data across all classifiers and descriptors for chronic liver effects was 0.735 (0.0395 SD). Mean CV F1 performance dropped to 0.639 (0.073 SD) with over-sampling approaches, though the poorer performance of k-NN approaches in some cases contributed to the observed decrease (mean CV F1 performance excluding k-NN was 0.697 (0.072 SD)). With under-sampling approaches, the mean CV F1 was 0.523 (0.083 SD). For developmental liver effects, the mean CV F1 performance was much lower, with 0.089 (0.111 SD) for unbalanced approaches and 0.149 (0.084 SD) for under-sampling. Over-sampling approaches led to an increase in mean CV F1 performance (0.234 (0.107 SD)) for developmental liver toxicity. Model performance was found to be dependent on dataset, model type, balancing approach and feature selection. Accordingly, tailoring ML workflows for predicting toxicity should consider class imbalance and rely on simpler classifiers first.
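The two balancing strategies compared in the abstract can be sketched in a few lines. The example below uses plain random over- and under-sampling, which is an assumption: the abstract does not say which specific sampling algorithms were used, and the function name `rebalance` is hypothetical.

```python
import random


def rebalance(samples: list, labels: list, mode: str = "over", seed: int = 0) -> list:
    """Return (sample, label) pairs with every class resampled to equal size.

    mode="over":  minority classes are topped up by sampling with replacement.
    mode="under": majority classes are cut down by sampling without replacement.
    """
    if mode not in ("over", "under"):
        raise ValueError("mode must be 'over' or 'under'")
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    sizes = [len(items) for items in by_class.values()]
    target = max(sizes) if mode == "over" else min(sizes)
    balanced = []
    for label, items in by_class.items():
        if len(items) >= target:
            chosen = rng.sample(items, target)
        else:
            # Keep every original item, then draw duplicates at random.
            chosen = items + rng.choices(items, k=target - len(items))
        balanced.extend((sample, label) for sample in chosen)
    rng.shuffle(balanced)
    return balanced
```

When combined with cross-validation, resampling is normally applied to the training folds only, so that duplicated or discarded records do not distort the held-out estimate.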
Affiliation(s)
- Tia Tate
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27709, USA
- Grace Patlewicz
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27709, USA
- Imran Shah
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27709, USA
10
Berridge BR, Bucher JR, Sistare F, Stevens JL, Chappell GA, Clemons M, Snow S, Wignall J, Shipkowski KA. Enabling novel paradigms: a biological questions-based approach to human chemical hazard and drug safety assessment. Toxicol Sci 2024; 198:4-13. [PMID: 38134427 PMCID: PMC10901149 DOI: 10.1093/toxsci/kfad124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2023] Open
Abstract
Throughput needs, costs of time and resources, and concerns about the use of animals in hazard and safety assessment studies are fueling a growing interest in adopting new approach methodologies for use in product development and risk assessment. However, current efforts to define "next-generation risk assessment" vary considerably across commercial and regulatory sectors, and an a priori definition of the biological scope of data needed to assess hazards is generally lacking. We propose that the absence of clearly defined questions that can be answered during hazard assessment is the primary barrier to the generation of a paradigm flexible enough to be used across varying product development and approval decision contexts. Herein, we propose a biological questions-based approach (BQBA) for hazard and safety assessment to facilitate fit-for-purpose method selection and more efficient evidence-based decision-making. The key pillars of this novel approach are bioavailability, bioactivity, adversity, and susceptibility. This BQBA is compared with current hazard approaches and is applied in scenarios of varying pathobiological understanding and/or regulatory testing requirements. To further define the paradigm and key questions that allow better prediction and characterization of human health hazard, a multidisciplinary collaboration among stakeholder groups should be initiated.
Affiliation(s)
- Brian R Berridge
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina 27709, USA
- John R Bucher
  - Retired (Division of Translational Toxicology, NIEHS), Hillsborough, North Carolina 27278, USA
- James L Stevens
  - Paradox Found Consulting Services, Apex, North Carolina 27523, USA
- Kelly A Shipkowski
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina 27709, USA
11
Srisongkram T. Ensemble Quantitative Read-Across Structure-Activity Relationship Algorithm for Predicting Skin Cytotoxicity. Chem Res Toxicol 2023; 36:1961-1972. [PMID: 38047785 DOI: 10.1021/acs.chemrestox.3c00238] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 12/05/2023]
Abstract
Read-across (RA) and quantitative structure-activity relationship (QSAR) are two alternative methods commonly used to fill data gaps in chemical registrations. These approaches use physicochemical properties or molecular fingerprints of source substances to predict the properties of unknown substances that have similar chemical structures or physicochemical properties. Research on RA and QSAR is essential to minimize the time, money, and animal testing needed to determine biological properties that are not currently known. This study developed a stacked ensemble quantitative read-across structure-activity relationship algorithm (enQRASAR) for predicting skin irritation toxicity, using the negative logarithm of the 50% cell viability inhibition concentration (pIC50) against skin keratinocytes as the end point. The goodness-of-fit and predictability of this algorithm were validated using leave-one-out cross-validation and external test data sets. The results obtained were statistically reliable in terms of goodness-of-fit, robustness, and predictability metrics. Additionally, the developed model demonstrated a low prediction error when predicting FDA-approved drugs. These results confirm that the enQRASAR algorithm can be used to predict skin cytotoxicity of chemicals. The model has therefore been made publicly available to further facilitate toxicity predictions of unknown compounds in chemical registrations.
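The pIC50 end point mentioned above is, by the usual convention, the negative base-10 logarithm of the IC50 expressed in mol/L. The sketch below shows that conversion only; the function name `pic50_from_ic50` and the unit table are illustrative assumptions, and the abstract does not state the units of the underlying data.

```python
import math

# Conversion factors to mol/L (assumed set of supported units).
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9}


def pic50_from_ic50(ic50: float, unit: str = "uM") -> float:
    """pIC50 = -log10(IC50 in mol/L); higher values mean greater potency."""
    if ic50 <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50 * UNIT_TO_MOLAR[unit])
```

On this scale a tenfold drop in IC50 raises pIC50 by one unit, which is why the transformed value is a convenient regression target.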
Affiliation(s)
- Tarapong Srisongkram
  - Division of Pharmaceutical Chemistry, Faculty of Pharmaceutical Sciences, Khon Kaen University, Khon Kaen 40000, Thailand