1
Dobbelaere MR, Lengyel I, Stevens CV, Van Geem KM. Geometric deep learning for molecular property predictions with chemical accuracy across chemical space. J Cheminform 2024; 16:99. [PMID: 39138560 PMCID: PMC11323398 DOI: 10.1186/s13321-024-00895-0] [Received: 06/08/2024] [Accepted: 08/06/2024] [Indexed: 08/15/2024] [Open access]
Abstract
Chemical engineers rely heavily on precise knowledge of physicochemical properties to model chemical processes. Despite the growing popularity of deep learning, it is only rarely applied to property prediction because of data scarcity and limited accuracy for compounds in industrially relevant areas of chemical space. Herein, we present a geometric deep learning framework for predicting gas- and liquid-phase properties based on novel quantum chemical datasets comprising 124,000 molecules. Our findings reveal that the need for quantum-chemical information in deep learning models varies significantly depending on the modeled physicochemical property. Specifically, our top-performing geometric model meets the most stringent criteria for "chemically accurate" thermochemistry predictions. We also show that by carefully selecting the appropriate model featurization and evaluating prediction uncertainties, the reliability of the predictions can be strongly enhanced. These insights represent a crucial step towards establishing deep learning as the standard property prediction workflow in both industry and academia.
Scientific contribution: We propose a flexible property prediction tool that can handle two-dimensional and three-dimensional molecular information. A thermochemistry prediction methodology that achieves high-level quantum chemistry accuracy for a broad application range is presented. Trained deep learning models and large novel molecular databases of real-world molecules are provided to offer a directly usable and fast property prediction solution to practitioners.
Affiliation(s)
- Maarten R Dobbelaere
- Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Faculty of Engineering and Architecture, Ghent University, Technologiepark 125, 9052, Ghent, Belgium
- István Lengyel
- Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Faculty of Engineering and Architecture, Ghent University, Technologiepark 125, 9052, Ghent, Belgium
- ChemInsights LLC, Dover, DE, 19901, USA
- Christian V Stevens
- SynBioC Research Group, Department of Green Chemistry and Technology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000, Ghent, Belgium
- Kevin M Van Geem
- Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Faculty of Engineering and Architecture, Ghent University, Technologiepark 125, 9052, Ghent, Belgium
2
Brown TN, Sangion A, Arnot JA. Identifying uncertainty in physical-chemical property estimation with IFSQSAR. J Cheminform 2024; 16:65. [PMID: 38816859 PMCID: PMC11140865 DOI: 10.1186/s13321-024-00853-w] [Received: 01/28/2024] [Accepted: 05/09/2024] [Indexed: 06/01/2024] [Open access]
Abstract
This study describes the development and evaluation of six new models for predicting physical-chemical (PC) properties that are highly relevant for chemical hazard, exposure, and risk estimation: solubility in water (SW) and octanol (SO), vapor pressure (VP), and the octanol-water (KOW), octanol-air (KOA), and air-water (KAW) partition ratios. The models are implemented in version 1.1.0 of the Iterative Fragment Selection Quantitative Structure-Activity Relationship (IFSQSAR) Python package as Poly-Parameter Linear Free Energy Relationship (PPLFER) equations, which combine experimentally calibrated system parameters with solute descriptors predicted by QSPRs. Two ancillary models have also been developed and implemented: a QSPR for molar volume (MV) and a classifier for the physical state of chemicals at room temperature. The IFSQSAR methods for characterizing the applicability domain (AD) and for calculating uncertainty estimates, expressed as 95% prediction intervals (PIs) for predicted properties, are described and tested on 9,000 measured partition ratios and 4,000 VP and SW values. These measured data are external to the IFSQSAR training and validation datasets and are used to assess the predictivity of the models for "novel chemicals" in an unbiased manner. The 95% PIs calculated from validation datasets for partition ratios had to be scaled by a factor of 1.25 to capture 95% of the external data. Predictions for VP and SW are more uncertain, primarily because of the difficulty of determining the physical state (i.e., liquid or solid) of chemicals at room temperature. The prediction accuracy of the models for log KOW, log KAW, and log KOA of novel, data-poor chemicals is estimated as a root mean squared error of prediction (RMSEP) of 0.7 to 1.4, with RMSEP of 1.7-1.8 for log VP and log SW.
Scientific contribution: New partitioning models integrate empirical PPLFER equations and QSARs, allowing seamless integration of experimental data and model predictions. This work tests the real predictivity of the models for novel chemicals that are not in the model training or external validation datasets.
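The coverage check behind the 1.25 scaling factor, and the RMSEP metric quoted above, can be illustrated with a short Python sketch. It is not the IFSQSAR implementation: symmetric intervals of the form prediction ± half-width, the function names, and the toy values are all hypothetical.

```python
import math


def prediction_interval(pred, half_width, scale=1.0):
    """Symmetric interval pred +/- scale * half_width (assumed form, hypothetical helper)."""
    w = scale * half_width
    return pred - w, pred + w


def coverage(preds, half_widths, observed, scale=1.0):
    """Fraction of observed values that fall inside their (optionally scaled) intervals."""
    hits = 0
    for p, h, y in zip(preds, half_widths, observed):
        lo, hi = prediction_interval(p, h, scale)
        if lo <= y <= hi:
            hits += 1
    return hits / len(observed)


def rmsep(observed, predicted):
    """Root mean squared error of prediction."""
    sq = [(y - p) ** 2 for y, p in zip(observed, predicted)]
    return math.sqrt(sum(sq) / len(sq))


# Hypothetical log-unit predictions, PI half-widths, and external measurements.
preds = [2.0, 3.0, 4.0, 5.0]
half_widths = [1.0, 1.0, 1.0, 1.0]
observed = [2.5, 4.1, 3.2, 6.2]

print(coverage(preds, half_widths, observed))              # unscaled intervals
print(coverage(preds, half_widths, observed, scale=1.25))  # intervals widened by 1.25
print(round(rmsep(observed, preds), 3))
```

With these toy numbers the unscaled intervals capture only half of the measurements, while the intervals widened by 1.25 capture all of them.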
Affiliation(s)
- Trevor N Brown
- ARC Arnot Research & Consulting, Toronto, ON, M4C 2B4, Canada
- Jon A Arnot
- ARC Arnot Research & Consulting, Toronto, ON, M4C 2B4, Canada
- Department of Physical and Environmental Sciences, University of Toronto Scarborough, Toronto, ON, M1C 1A4, Canada
- Department of Pharmacology and Toxicology, University of Toronto, Toronto, ON, M5S 1A8, Canada
3
Karamertzanis PG, Patlewicz G, Sannicola M, Paul-Friedman K, Shah I. Systematic Approaches for the Encoding of Chemical Groups: A Case Study. Chem Res Toxicol 2024; 37:600-619. [PMID: 38498310 PMCID: PMC11258607 DOI: 10.1021/acs.chemrestox.3c00411] [Indexed: 03/20/2024]
Abstract
Regulatory authorities aim to organize substances into groups to facilitate prioritization within hazard and risk assessment processes. Often, such chemical groupings are not explicitly defined by structural rules or physicochemical property information. This is largely due to how the groupings are developed, namely through manual expert curation, which in turn makes updating and refining them as new substances are evaluated a practical challenge. Herein, machine learning methods were leveraged to build models that could preliminarily assign substances to predefined groups. A set of 86 groupings containing 2,184 substances, as published on the European Chemicals Agency (ECHA) website, was mapped to the content of the U.S. Environmental Protection Agency (EPA) Distributed Structure-Searchable Toxicity (DSSTox) database to extract chemical and structural information. Substances were represented using Morgan fingerprints, and two machine learning approaches, k-nearest neighbor (kNN) and random forest (RF), were used to classify test substances into the 56 groups that contained at least 10 substances with a structural representation in the data set. These led to mean 5-fold cross-validation test accuracies (average F1 scores) of 0.781 and 0.853, respectively. With a 9% improvement, the RF classifier was significantly more accurate than kNN (p-value = 0.001). The approach offers promise as a means of initially profiling new substances into predefined groups, facilitating prioritization efforts and streamlining the assessment of new substances when earlier groupings are available. The algorithm to fit and use these models has been made available in the accompanying repository, enabling both use of the produced models and their refitting as new groupings are made available by regulatory authorities or industry.
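To make the kNN idea concrete, here is a minimal Python sketch of nearest-neighbor group assignment using Tanimoto similarity, a common similarity measure for binary fingerprints, together with a macro-averaged F1 score. It is not the authors' code: in practice the Morgan fingerprints would come from a cheminformatics toolkit, whereas here they are replaced by hypothetical sets of on-bit indices, and reading "average F1" as an unweighted macro average is an assumption.

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints stored as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def knn_predict(query, train_fps, train_labels, k=3):
    """Majority vote over the k training substances most similar to the query."""
    ranked = sorted(
        zip(train_fps, train_labels),
        key=lambda pair: tanimoto(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-group F1 scores."""
    groups = sorted(set(y_true) | set(y_pred))
    scores = []
    for g in groups:
        tp = sum(t == g and p == g for t, p in zip(y_true, y_pred))
        fp = sum(t != g and p == g for t, p in zip(y_true, y_pred))
        fn = sum(t == g and p != g for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical on-bit sets standing in for Morgan fingerprints.
train_fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {7, 8, 10}]
train_labels = ["group_A", "group_A", "group_B", "group_B"]
print(knn_predict({1, 2, 5}, train_fps, train_labels, k=3))
```

A random forest would replace the similarity vote with an ensemble of decision trees over the same bit vectors; the kNN version is shown only because it is short enough to write without external libraries.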
Affiliation(s)
- Panagiotis G Karamertzanis
- Computational Assessment and Alternative Methods, European Chemicals Agency (ECHA), Telakkakatu 6, Helsinki 00150, Finland
- Grace Patlewicz
- Center for Computational Toxicology and Exposure (CCTE), US EPA, 109 TW Alexander Dr, Research Triangle Park, North Carolina 27711, United States
- Marta Sannicola
- Computational Assessment and Alternative Methods, European Chemicals Agency (ECHA), Telakkakatu 6, Helsinki 00150, Finland
- Katie Paul-Friedman
- Center for Computational Toxicology and Exposure (CCTE), US EPA, 109 TW Alexander Dr, Research Triangle Park, North Carolina 27711, United States
- Imran Shah
- Center for Computational Toxicology and Exposure (CCTE), US EPA, 109 TW Alexander Dr, Research Triangle Park, North Carolina 27711, United States
4
Charest N, Lowe CN, Ramsland C, Meyer B, Samano V, Williams AJ. Improving predictions of compound amenability for liquid chromatography-mass spectrometry to enhance non-targeted analysis. Anal Bioanal Chem 2024; 416:2565-2579. [PMID: 38530399 PMCID: PMC11228616 DOI: 10.1007/s00216-024-05229-5] [Received: 11/13/2023] [Revised: 02/14/2024] [Accepted: 02/16/2024] [Indexed: 03/28/2024]
Abstract
Mass-spectrometry-based non-targeted analysis (NTA), in which mass spectrometric signals are assigned chemical identities based on a systematic collation of evidence, is a growing area of interest for toxicological risk assessment. Successful NTA results in better identification of potentially hazardous pollutants in the environment, facilitating the development of targeted analytical strategies to best characterize risks to human and ecological health. A supporting component of the NTA process involves assessing whether suspected chemicals are amenable to the mass spectrometric method, which is necessary in order to assign an observed signal to a chemical structure. Prior work from this group involved the development of a random forest model for predicting the amenability of 5517 unique chemical structures to liquid chromatography-mass spectrometry (LC-MS). This work improves the interpretability of the group's prior model of the same endpoint and integrates 1348 additional data points across negative and positive ionization modes. We enhance interpretability through feature engineering, a machine learning practice that reduces the input dimensionality while attempting to preserve performance statistics. We emphasize the importance of interpretable machine learning models for building confidence in NTA identifications. The novel data were curated by expert curators who labeled compounds as amenable or unamenable, resulting in an enhanced set of chemical compounds that expands the applicability domain of the prior model. The balanced accuracy (BA) of the newly developed models is comparable to previously reported performance (mean cross-validation (CV) BA of 0.84 vs. 0.82 in positive mode, and 0.85 vs. 0.82 in negative mode), while on a novel external set derived from this work's data, the Matthews correlation coefficients (MCC) of the new models are 0.66 and 0.68 for positive and negative mode, respectively. Our group's previously published models scored MCCs of 0.55 and 0.54 on the same external sets. This demonstrates an appreciable improvement across the chemical space captured by the expanded dataset. This work forms part of our ongoing efforts to develop models with higher interpretability and higher performance to support NTA efforts.
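The two metrics reported above, balanced accuracy and the Matthews correlation coefficient, follow directly from a binary confusion matrix. The short Python sketch below shows both using their standard definitions; the helper names and the example counts are hypothetical and do not come from the study.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for boolean labels, with True meaning 'amenable'."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn


def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical confusion-matrix counts for one ionization mode.
tp, tn, fp, fn = 40, 45, 5, 10
print(round(balanced_accuracy(tp, tn, fp, fn), 3))
print(round(mcc(tp, tn, fp, fn), 3))
```

Unlike plain accuracy, both metrics stay informative when amenable and unamenable compounds are imbalanced, which is one reason to report them side by side.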
Affiliation(s)
- Nathaniel Charest
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, 27711, USA
- Charles N Lowe
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, 27711, USA
- Brian Meyer
- Senior Environmental Employment Program, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, 27711, USA
- Vicente Samano
- Senior Environmental Employment Program, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, 27711, USA
- Antony J Williams
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, 27711, USA
5
Tian L, Woo W, Canchola A, Chen K, Lin YH. Correlation gas chromatography and two-dimensional volatility basis methods to predict gas-particle partitioning for e-cigarette aerosols. Aerosol Sci Technol 2024; 58:630-643. [PMID: 38774581 PMCID: PMC11105163 DOI: 10.1080/02786826.2024.2326547] [Received: 11/14/2023] [Accepted: 02/27/2024] [Indexed: 05/24/2024]
Abstract
E-cigarette aerosols contain a complex mixture of harmful and potentially harmful chemicals. Once released into the environment, these aerosols evolve and become new sources of indoor air pollutants that could pose a significant threat to both users and non-users. However, current understanding of the physicochemical properties of e-cigarette aerosol constituents that govern gas-particle partitioning in the atmosphere is limited, making it difficult to estimate the health risks associated with exposure. Here, we used correlation gas chromatography (C-GC) and two-dimensional volatility basis set (2D-VBS) methods to determine the vapor pressures and volatilities of commonly reported toxic and irritating e-cigarette aerosol constituents. The vapor pressures of target compounds at 298 K were estimated from the Antoine-type linear relationship between the vapor pressures of reference standards and their retention times. Our C-GC results showed an overall positive correlation (R = 0.84) with estimates from the EPI (Estimation Programs Interface) Suite. The volatility calculated by 2D-VBS correlates well with the vapor pressures calculated by both C-GC (R = 0.82) and EPI Suite (R = 0.85). The volatility distribution also indicated that fresh e-cigarette aerosol constituents are mainly the more volatile organic compounds. Our case study revealed that low-vapor-pressure compounds (e.g., δ-dodecalactone, γ-decalactone, and maltol) become enriched in e-cigarette aerosols within 2 hours of vaping emissions. Overall, these findings demonstrate the applicability of the C-GC and 2D-VBS methods for determining the physicochemical properties of e-cigarette aerosol constituents, which can aid in assessing the dynamic chemical composition of e-cigarette aerosols and exposure to vaping emissions in indoor environments.
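A C-GC calibration amounts to fitting a straight line for reference standards of known vapor pressure and then reading unknowns off that line. The Python sketch below assumes the line relates ln(vapor pressure) to ln(retention time); the exact Antoine-type form, any retention-time correction, and the temperature handling used by the authors may differ, and the function names and standard data are hypothetical. A Pearson correlation helper is included because the agreement with EPI Suite and 2D-VBS is reported as R values.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def calibrate(ret_times, vapor_pressures):
    """Assumed calibration form: ln(p) = a + b*ln(t) over the reference standards."""
    return fit_line([math.log(t) for t in ret_times],
                    [math.log(p) for p in vapor_pressures])


def estimate_vp(ret_time, a, b):
    """Vapor pressure of a target compound from its retention time (same units as standards)."""
    return math.exp(a + b * math.log(ret_time))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# Hypothetical standards: retention times (min) and vapor pressures (Pa) at 298 K.
std_times = [4.0, 6.5, 9.0, 12.5]
std_vps = [120.0, 25.0, 6.0, 1.2]
a, b = calibrate(std_times, std_vps)
print(estimate_vp(8.0, a, b))
```

Later-eluting standards have lower vapor pressures, so the fitted slope is negative for these made-up data.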
Affiliation(s)
- Linhui Tian
- Department of Environmental Sciences, University of California, Riverside, California, USA
- Wonsik Woo
- Environmental Toxicology Graduate Program, University of California, Riverside, California, USA
- Alexa Canchola
- Environmental Toxicology Graduate Program, University of California, Riverside, California, USA
- Kunpeng Chen
- Department of Environmental Sciences, University of California, Riverside, California, USA
- Ying-Hsuan Lin
- Department of Environmental Sciences, University of California, Riverside, California, USA
- Environmental Toxicology Graduate Program, University of California, Riverside, California, USA
6
Mansouri K, Moreira-Filho JT, Lowe CN, Charest N, Martin T, Tkachenko V, Judson R, Conway M, Kleinstreuer NC, Williams AJ. Free and open-source QSAR-ready workflow for automated standardization of chemical structures in support of QSAR modeling. J Cheminform 2024; 16:19. [PMID: 38378618 PMCID: PMC10880251 DOI: 10.1186/s13321-024-00814-3] [Received: 11/29/2023] [Accepted: 02/10/2024] [Indexed: 02/22/2024] [Open access]
Abstract
The rapid increase in publicly available chemical structures and associated experimental data presents a valuable opportunity to build robust QSAR models for applications in different fields. However, a common concern is the quality of both the chemical structure information and the associated experimental data. This is especially true when the data are collected from multiple sources, as chemical substance mappings can contain many duplicate structures and molecular inconsistencies. Such issues can affect the resulting molecular descriptors and their mappings to experimental data and, subsequently, the accuracy, repeatability, and reliability of the derived models. Herein we describe the development of an automated workflow that standardizes chemical structures according to a set of standard rules and generates two- and/or three-dimensional "QSAR-ready" forms prior to the calculation of molecular descriptors. The workflow was designed in the KNIME workflow environment and consists of three high-level steps. First, a structure encoding is read; then the resulting in-memory representation is cross-referenced with any existing identifiers for consistency. Finally, the structure is standardized using a series of operations including desalting, stripping of stereochemistry (for two-dimensional structures), standardization of tautomers and nitro groups, valence correction, neutralization when possible, and removal of duplicates. This workflow was initially developed to support collaborative QSAR modeling projects and to ensure consistency of results across participants. It was then updated and generalized for other modeling applications, including a modified version that generates "MS-ready structures" to support substance mappings and searches in software applications for non-targeted analysis by mass spectrometry. Both the QSAR-ready and MS-ready workflows are freely available to the scientific community in KNIME, as standalone versions on GitHub, and as Docker container resources.
Scientific contribution: This work pioneers an automated workflow in KNIME that systematically standardizes chemical structures to ensure their readiness for QSAR modeling and broader scientific applications. By addressing data quality concerns through desalting, stereochemistry stripping, and normalization, it improves the accuracy and reliability of molecular descriptors. The freely available resources in KNIME, GitHub, and Docker containers democratize access, benefiting collaborative research and advancing diverse modeling endeavors in chemistry and mass spectrometry.
Affiliation(s)
- Kamel Mansouri
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, National Institute of Environmental Health Sciences, Research Triangle Park, NC, 27709, USA
- José T Moreira-Filho
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, National Institute of Environmental Health Sciences, Research Triangle Park, NC, 27709, USA
- Charles N Lowe
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
- Nathaniel Charest
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
- Todd Martin
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
- Richard Judson
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
- Mike Conway
- National Institute of Environmental Health Sciences, Research Triangle Park, NC, 27709, USA
- Nicole C Kleinstreuer
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, National Institute of Environmental Health Sciences, Research Triangle Park, NC, 27709, USA
- Antony J Williams
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
7
Kramer L, Schulze T, Klüver N, Altenburger R, Hackermüller J, Krauss M, Busch W. Curated mode-of-action data and effect concentrations for chemicals relevant for the aquatic environment. Sci Data 2024; 11:60. [PMID: 38200014 PMCID: PMC10781676 DOI: 10.1038/s41597-023-02904-7] [Received: 07/10/2023] [Accepted: 12/29/2023] [Indexed: 01/12/2024] [Open access]
Abstract
Chemicals in the aquatic environment can be harmful to organisms and ecosystems. Knowledge of effect concentrations, as well as of mechanisms and modes of interaction with biological molecules and signaling pathways, is necessary to perform chemical risk assessment and identify toxic compounds. To this end, we developed criteria and a pipeline for harvesting and summarizing effect concentrations from the US ECOTOX database for three aquatic species groups (algae, crustaceans, and fish) and researched the modes of action of more than 3,300 environmentally relevant chemicals in the literature and databases. We provide a curated dataset ready to be used for risk assessment based on monitoring data, together with the first comprehensive collection and categorization of modes of action of environmental chemicals. Authorities, regulators, and scientists can use these data for grouping chemicals, establishing meaningful assessment groups, and developing in vitro and in silico approaches for chemical testing and assessment.
Affiliation(s)
- Lena Kramer
- Helmholtz Centre for Environmental Research - UFZ, Permoserstr. 15, 04318, Leipzig, Germany
- Tobias Schulze
- Helmholtz Centre for Environmental Research - UFZ, Permoserstr. 15, 04318, Leipzig, Germany
- Nils Klüver
- Helmholtz Centre for Environmental Research - UFZ, Permoserstr. 15, 04318, Leipzig, Germany
- Rolf Altenburger
- Helmholtz Centre for Environmental Research - UFZ, Permoserstr. 15, 04318, Leipzig, Germany
- RWTH Aachen University, Institute for Environmental Research, 52074, Aachen, Germany
- Jörg Hackermüller
- Helmholtz Centre for Environmental Research - UFZ, Permoserstr. 15, 04318, Leipzig, Germany
- University of Leipzig, Faculty of Mathematics and Computer Science, Ritterstr. 26, 04109, Leipzig, Germany
- Martin Krauss
- Helmholtz Centre for Environmental Research - UFZ, Permoserstr. 15, 04318, Leipzig, Germany
- Wibke Busch
- Helmholtz Centre for Environmental Research - UFZ, Permoserstr. 15, 04318, Leipzig, Germany
8
Chen M, Yang J, Tang C, Lu X, Wei Z, Liu Y, Yu P, Li H. Improving ADMET Prediction Accuracy for Candidate Drugs: Factors to Consider in QSPR Modeling Approaches. Curr Top Med Chem 2024; 24:222-242. [PMID: 38083894 DOI: 10.2174/0115680266280005231207105900] [Received: 09/19/2023] [Revised: 11/02/2023] [Accepted: 11/10/2023] [Indexed: 05/04/2024]
Abstract
Quantitative Structure-Property Relationship (QSPR) employs mathematical and statistical methods to reveal quantitative correlations between the pharmacokinetics of compounds and their molecular structures, as well as their physical and chemical properties. QSPR models have been widely applied in the prediction of drug absorption, distribution, metabolism, excretion, and toxicity (ADMET). However, the accuracy of QSPR models for predicting drug ADMET properties still needs improvement. Therefore, this paper comprehensively reviews the tools employed in various stages of QSPR predictions for drug ADMET. It summarizes commonly used approaches to building QSPR models, systematically analyzing the advantages and limitations of each modeling method to ensure their judicious application. We provide an overview of recent advancements in the application of QSPR models for predicting drug ADMET properties. Furthermore, this review explores the inherent challenges in QSPR modeling while also proposing a range of considerations aimed at enhancing model prediction accuracy. The objective is to enhance the predictive capabilities of QSPR models in the field of drug development and provide valuable reference and guidance for researchers in this domain.
Affiliation(s)
- Meilun Chen
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Changsha, Hunan, 410013, China
- Jie Yang
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Changsha, Hunan, 410013, China
- Chunhua Tang
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Changsha, Hunan, 410013, China
- Xiaoling Lu
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Changsha, Hunan, 410013, China
- Zheng Wei
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Changsha, Hunan, 410013, China
- Yijie Liu
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Changsha, Hunan, 410013, China
- Peng Yu
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Changsha, Hunan, 410013, China
- HuanHuan Li
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Changsha, Hunan, 410013, China
9
Gui W, Guo H, Wang C, Li M, Jin Y, Zhang K, Dai J, Zhao Y. Comparative developmental toxicities of zebrafish towards structurally diverse per- and polyfluoroalkyl substances. Sci Total Environ 2023; 902:166569. [PMID: 37633367 DOI: 10.1016/j.scitotenv.2023.166569] [Received: 05/06/2023] [Revised: 08/23/2023] [Accepted: 08/23/2023] [Indexed: 08/28/2023]
Abstract
Structurally diverse per- and polyfluoroalkyl substances (PFASs) are increasingly detected in ecosystems and humans, so clarification of their ecological and health risks is urgently required. In the present study, the developmental toxicity in zebrafish of a series of PFASs, including PFOS, PFBS, Nafion BP1, Nafion BP2, F53B, OBS, PFOA, PFUnDA, PFO5DoDA, and HFPO-TA, was investigated, revealing similarities and differences in their developmental toxicity potentials. Our results demonstrated that PFUnDA exhibited the highest toxicity, with the lowest EC50 value of 4.36 mg/L (for morphological abnormality), followed by F53B (5.58 mg/L), PFOS (6.15 mg/L), and OBS (10.65 mg/L). Positive correlations with volatility/solubility and with chemotypes related to specific biological activity, including the bioconcentration factor (LogBCF), and negative correlations with lipid solubility and with carbon-chain-related chemotypes, including the numbers of carbon and fluorine atoms, provided a reasonable explanation from the perspective of molecular structure. Furthermore, comparative transcriptome analysis provided molecular evidence for the relationship between PFAS exposure and malformations. Common differentially expressed genes (DEGs) involved in spine curve development, pericardial edema, and cell/organism growth-related pathways represented common targets leading to toxic effects. Therefore, the present results provide novel insights into the potential environmental risks of structurally diverse PFASs and contribute to the selection of safer PFAS replacements.
Affiliation(s)
- Wanying Gui
- State Environmental Protection Key Laboratory of Environmental Health Impact Assessment of Emerging Contaminants, School of Environmental Science and Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China; Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Hua Guo
- State Environmental Protection Key Laboratory of Environmental Health Impact Assessment of Emerging Contaminants, School of Environmental Science and Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China; Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Congcong Wang
- State Environmental Protection Key Laboratory of Environmental Health Impact Assessment of Emerging Contaminants, School of Environmental Science and Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China
- Minjia Li
- State Environmental Protection Key Laboratory of Environmental Health Impact Assessment of Emerging Contaminants, School of Environmental Science and Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China
- Yuanxiang Jin
- Department of Biotechnology, College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou, China
- Kun Zhang
- State Environmental Protection Key Laboratory of Environmental Health Impact Assessment of Emerging Contaminants, School of Environmental Science and Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China
- Jiayin Dai
- State Environmental Protection Key Laboratory of Environmental Health Impact Assessment of Emerging Contaminants, School of Environmental Science and Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China
- Yanbin Zhao
- State Environmental Protection Key Laboratory of Environmental Health Impact Assessment of Emerging Contaminants, School of Environmental Science and Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China
10
Pattanaik L, Menon A, Settels V, Spiekermann KA, Tan Z, Vermeire FH, Sandfort F, Eiden P, Green WH. ConfSolv: Prediction of Solute Conformer-Free Energies across a Range of Solvents. J Phys Chem B 2023; 127:10151-10170. [PMID: 37966798 DOI: 10.1021/acs.jpcb.3c05904] [Indexed: 11/16/2023]
Abstract
Predicting the Gibbs free energy of solution is key to understanding solvent effects on thermodynamics and on reaction rates for kinetic modeling. Accurately computing solution free energies requires the enumeration and evaluation of relevant solute conformers in solution. However, even after relevant conformers have been generated, determining their free energy of solution requires an expensive workflow consisting of several ab initio computational chemistry calculations. To help address this challenge, we generate a large data set of solution free energies for nearly 44,000 solutes with almost 9 million conformers calculated in 41 different solvents using density functional theory and COSMO-RS, and we quantify the impact of solute conformers on the solution free energy. We then train a message passing neural network to predict the relative solution free energies of a set of solute conformers, enabling the identification of a small subset of thermodynamically relevant conformers. The model offers substantial computational time savings, with predictions usually well within 1 kcal/mol of the solution free energy calculated using computational chemistry methods.
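Once relative conformer free energies are available, whether from a learned model or from quantum chemistry, a common way to pick out thermodynamically relevant conformers is to apply an energy window and Boltzmann weighting. The Python sketch below illustrates that general idea only; it is not ConfSolv's selection procedure, and the function names, the window value, and the example energies are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def boltzmann_weights(rel_g, temp=298.15):
    """Boltzmann populations from relative free energies in kcal/mol."""
    rt = R_KCAL * temp
    g_min = min(rel_g)
    terms = [math.exp(-(g - g_min) / rt) for g in rel_g]
    z = sum(terms)
    return [t / z for t in terms]


def ensemble_free_energy(g, temp=298.15):
    """G = -RT ln(sum_i exp(-G_i/RT)), shifted by the minimum for numerical stability."""
    rt = R_KCAL * temp
    g_min = min(g)
    return g_min - rt * math.log(sum(math.exp(-(gi - g_min) / rt) for gi in g))


def relevant_conformers(rel_g, window):
    """Indices of conformers within `window` kcal/mol of the lowest one (hypothetical cutoff)."""
    g_min = min(rel_g)
    return [i for i, g in enumerate(rel_g) if g - g_min <= window]


# Hypothetical relative solution free energies (kcal/mol) for five conformers.
energies = [0.0, 0.4, 1.5, 0.9, 3.2]
print(relevant_conformers(energies, window=1.0))
print([round(w, 3) for w in boltzmann_weights(energies)])
```

In this toy example the three conformers inside the 1 kcal/mol window carry roughly 95% of the Boltzmann population at 298.15 K, so discarding the others changes the ensemble free energy only slightly; a sensible window would depend on the expected error of the predicted energies.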
Affiliation(s)
- Lagnajit Pattanaik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Angiras Menon
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Volker Settels
- BASF SE, Scientific Modeling, Group Research, Ludwigshafen am Rhein 67056, Germany
- Kevin A Spiekermann
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Zipei Tan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Florence H Vermeire
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemical Engineering, KU Leuven, Celestijnenlaan 200F, Leuven 3001, Belgium
- Frederik Sandfort
- BASF SE, Scientific Modeling, Group Research, Ludwigshafen am Rhein 67056, Germany
- Philipp Eiden
- BASF SE, Scientific Modeling, Group Research, Ludwigshafen am Rhein 67056, Germany
- William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
11
Tran TTV, Tayara H, Chong KT. Recent Studies of Artificial Intelligence on In Silico Drug Absorption. J Chem Inf Model 2023; 63:6198-6211. [PMID: 37819031 DOI: 10.1021/acs.jcim.3c00960] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/13/2023]
Abstract
Absorption is an important area of research in pharmacochemistry and drug development, because a drug has to be absorbed before any drug effects can occur. Furthermore, the ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) profile of drugs can be directly and considerably altered by modulating factors affecting absorption. Many drugs in development fail because of poor absorption. Continuous research efforts in recent years have brought many successes and much promise in drug absorption property prediction, especially in silico, which significantly reduces the time and cost of screening out undesirable drug candidates. In this report, we provide an overview of recent in silico studies, especially from 2019 to the present, that use artificial intelligence to predict absorption properties. Additionally, we have collected and investigated public databases that support absorption prediction research. On those grounds, we also outline the challenges and future development directions of absorption prediction. We hope this review can provide researchers with valuable guidelines on absorption prediction to facilitate the development of newer approaches in drug discovery.
Affiliation(s)
- Thi Tuyet Van Tran
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Faculty of Information Technology, An Giang University, Long Xuyen 880000, Vietnam
- Vietnam National University, Ho Chi Minh City, Ho Chi Minh 700000, Vietnam
- Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Kil To Chong
- Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, Republic of Korea
12
Shukla S, Rawat P, Sharma P, Trivedi P, Ghous F, Bishnoi A. Spectroscopic characterization, molecular docking and machine learning studies of sulphur containing hydrazide derivatives. Phys Chem Chem Phys 2023; 25:27677-27693. [PMID: 37812135 DOI: 10.1039/d3cp01133j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2023]
Abstract
Machine learning applied in chemistry is a growing field of research. For assessing structure-property variations, this paper describes in silico studies of the hydrazide derivatives of thiosemicarbazide (TSCZ) and thiocarbohydrazide (TCHZ). The structures of TSCZ and TCHZ have been elucidated using modern spectroscopic techniques. The UV-vis spectra showed strong charge transfer transitions (π-π*) for TSCZ and TCHZ with high extinction coefficients. The NBO analysis showed orbital overlap between lp1 (N2) and σ* (C3-S4) in TSCZ and TCHZ due to intramolecular charge transfer. The first hyperpolarizabilities (β₀) for TSCZ and TCHZ were found to be 0.7155 × 10⁻³⁰ and 2.1615 × 10⁻³⁰ esu, respectively, indicating their greater suitability for NLO applications compared with the standard reference urea. The strong electrophilic behaviour of TSCZ and TCHZ is indicated by their global electrophilicity index. The electrophilic reactivity descriptor analysis indicated that the investigated molecules could serve as precursors for the targeted synthesis of new heterocyclic derivatives. The docking studies showed appreciable binding energies with target proteins having PDB IDs 2WJE and 6CLU of Gram-positive bacteria, namely, Streptococcus pneumoniae phosphatase (PTP-CPS4B) and Staphylococcus aureus dihydropteroate synthase (saDHPS), respectively, for TSCZ and TCHZ, predicting good antimicrobial activity.
Affiliation(s)
- Soni Shukla
- Department of Chemistry, University of Lucknow, Lucknow-226007, Uttar Pradesh, India.
- Poonam Rawat
- Department of Chemistry, University of Lucknow, Lucknow-226007, Uttar Pradesh, India.
- Pulkit Sharma
- Department of Chemistry, University of Lucknow, Lucknow-226007, Uttar Pradesh, India.
- Prince Trivedi
- Department of Chemistry, University of Lucknow, Lucknow-226007, Uttar Pradesh, India.
- Faraz Ghous
- Department of Chemistry, University of Lucknow, Lucknow-226007, Uttar Pradesh, India.
- Abha Bishnoi
- Department of Chemistry, University of Lucknow, Lucknow-226007, Uttar Pradesh, India.
13
Murillo-Gelvez J, Dmitrenko O, Torralba-Sanchez TL, Tratnyek PG, Di Toro DM. pKa prediction of per- and polyfluoroalkyl acids in water using in silico gas phase stretching vibrational frequencies and infrared intensities. Phys Chem Chem Phys 2023; 25:24745-24760. [PMID: 37671434 DOI: 10.1039/d3cp01390a] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 09/07/2023]
Abstract
To successfully understand and model the environmental fate of per- and polyfluoroalkyl substances (PFAS), it is necessary to know key physicochemical properties (PChPs) such as pKa; however, measured PChPs of PFAS are scarce and of uncertain reliability. In this study, quantitative structure-activity relationships (QSARs) were developed by correlating calculated (M06-2X/aug-cc-pVDZ) vibrational frequencies (VF) and corresponding infrared intensities (IRInt) to the pKa of carboxylic acids, sulfonic acids, phosphonic acids, sulfonamides, betaines, and alcohols. Antisymmetric stretching VF of the anionic species were used for all subclasses except for alcohols, where the OH stretching VF performed better. The individual QSARs predicted the pKa for each subclass mostly within 0.5 pKa units of the experimental values. The inclusion of IRInt as a pKa predictor for carboxylic acids improved the results by decreasing the root-mean-square error from 0.35 to 0.25 (n > 100). Application of the developed QSARs to estimate the pKa of PFAS within each subclass revealed that the length of the perfluoroalkyl chain has minimal effect on the pKa, consistent with other models but in stark contrast with the limited experimental data available.
Affiliation(s)
- Jimmy Murillo-Gelvez
- Department of Civil and Environmental Engineering, University of Delaware, Newark, DE 19716, USA.
- Olga Dmitrenko
- Department of Chemistry and Biochemistry, University of Delaware, Newark, DE 19716, USA
- Paul G Tratnyek
- OHSU-PSU School of Public Health, Oregon Health & Science University, Portland, OR 97239, USA
- Dominic M Di Toro
- Department of Civil and Environmental Engineering, University of Delaware, Newark, DE 19716, USA.
14
Muellers TD, Petrovic PV, Zimmerman JB, Anastas PT. Toward Property-Based Regulation. Environ Sci Technol 2023; 57:11718-11730. [PMID: 37527361 DOI: 10.1021/acs.est.3c00643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/03/2023]
Abstract
An expanding web of adverse impacts on people and the environment has been steadily linked to anthropogenic chemicals and their proliferation. Central to this web are the regulatory structures intended to protect human and environmental health through the control of new molecules. Through chronically insufficient and inefficient action, the current chemical-by-chemical regulatory approach, which considers regulation at the level of chemical identity, has enabled many adverse impacts to develop and persist. Recognizing the link between fundamental physicochemical properties and hazards, we describe a new paradigm: property-based regulation. By regulating physicochemical properties, we show how governments can delineate and enforce safe chemical spaces, increasing the scalability of chemical assessments, reducing the time and resources to regulate a substance, and providing transparency for chemical designers. We highlight sparse existing property-based approaches and demonstrate their applicability using bioaccumulation as an example. Finally, we present a path to implementation in the United States, prescribing roles and steps for government, nongovernmental organizations, and industry to accelerate this transition, to the benefit of all.
Affiliation(s)
- Tobias D Muellers
- School of the Environment, Yale University, 195 Prospect St, New Haven, Connecticut 06511, United States
- Center for Green Chemistry and Green Engineering, Yale University, 370 Prospect St, New Haven, Connecticut 06511, United States
- Predrag V Petrovic
- School of the Environment, Yale University, 195 Prospect St, New Haven, Connecticut 06511, United States
- Center for Green Chemistry and Green Engineering, Yale University, 370 Prospect St, New Haven, Connecticut 06511, United States
- Julie B Zimmerman
- School of the Environment, Yale University, 195 Prospect St, New Haven, Connecticut 06511, United States
- Center for Green Chemistry and Green Engineering, Yale University, 370 Prospect St, New Haven, Connecticut 06511, United States
- Paul T Anastas
- School of the Environment, Yale University, 195 Prospect St, New Haven, Connecticut 06511, United States
- Center for Green Chemistry and Green Engineering, Yale University, 370 Prospect St, New Haven, Connecticut 06511, United States
- School of Public Health, Yale University, 60 College St, New Haven, Connecticut 06520, United States
15
Buckley TJ, Egeghy PP, Isaacs K, Richard AM, Ring C, Sayre RR, Sobus JR, Thomas RS, Ulrich EM, Wambaugh JF, Williams AJ. Cutting-edge computational chemical exposure research at the U.S. Environmental Protection Agency. Environ Int 2023; 178:108097. [PMID: 37478680 PMCID: PMC10588682 DOI: 10.1016/j.envint.2023.108097] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/21/2023] [Revised: 06/05/2023] [Accepted: 07/12/2023] [Indexed: 07/23/2023]
Abstract
Exposure science is evolving from its traditional "after the fact" and "one chemical at a time" approach to forecasting chemical exposures rapidly enough to keep pace with the constantly expanding landscape of chemicals and exposures. In this article, we provide an overview of the approaches, accomplishments, and plans for advancing computational exposure science within the U.S. Environmental Protection Agency's Office of Research and Development (EPA/ORD). First, to characterize the universe of chemicals in commerce and the environment, a carefully curated, web-accessible chemical resource has been created. This DSSTox database unambiguously identifies >1.2 million unique substances reflecting potential environmental and human exposures and includes computationally accessible links to each compound's corresponding data resources. Next, EPA is developing, applying, and evaluating predictive exposure models. These models increasingly rely on data, computational tools like quantitative structure activity relationship (QSAR) models, and machine learning/artificial intelligence to provide timely and efficient prediction of chemical exposure (and associated uncertainty) for thousands of chemicals at a time. Integral to this modeling effort, EPA is developing data resources across the exposure continuum that includes application of high-resolution mass spectrometry (HRMS) non-targeted analysis (NTA) methods providing measurement capability at scale with the number of chemicals in commerce. These research efforts are integrated and well-tailored to support population exposure assessment to prioritize chemicals for exposure as a critical input to risk management. In addition, the exposure forecasts will allow a wide variety of stakeholders to explore sustainable initiatives like green chemistry to achieve economic, social, and environmental prosperity and protection of future generations.
Affiliation(s)
- Timothy J Buckley
- U.S. Environmental Protection Agency, Office of Research & Development, Center for Computational Toxicology & Exposure (CCTE), 109 TW Alexander Drive, Research Triangle Park, NC 27711, United States.
- Peter P Egeghy
- U.S. Environmental Protection Agency, Office of Research & Development, Center for Computational Toxicology & Exposure (CCTE), 109 TW Alexander Drive, Research Triangle Park, NC 27711, United States
- Kristin Isaacs
- U.S. Environmental Protection Agency, Office of Research & Development, Center for Computational Toxicology & Exposure (CCTE), 109 TW Alexander Drive, Research Triangle Park, NC 27711, United States
- Ann M Richard
- U.S. Environmental Protection Agency, Office of Research & Development, Center for Computational Toxicology & Exposure (CCTE), 109 TW Alexander Drive, Research Triangle Park, NC 27711, United States
- Caroline Ring
- U.S. Environmental Protection Agency, Office of Research & Development, Center for Computational Toxicology & Exposure (CCTE), 109 TW Alexander Drive, Research Triangle Park, NC 27711, United States
- Risa R Sayre
- U.S. Environmental Protection Agency, Office of Research & Development, Center for Computational Toxicology & Exposure (CCTE), 109 TW Alexander Drive, Research Triangle Park, NC 27711, United States
- Jon R Sobus
- U.S. Environmental Protection Agency, Office of Research & Development, Center for Computational Toxicology & Exposure (CCTE), 109 TW Alexander Drive, Research Triangle Park, NC 27711, United States
- Russell S Thomas
- U.S. Environmental Protection Agency, Office of Research & Development, Center for Computational Toxicology & Exposure (CCTE), 109 TW Alexander Drive, Research Triangle Park, NC 27711, United States
- Elin M Ulrich
- U.S. Environmental Protection Agency, Office of Research & Development, Center for Computational Toxicology & Exposure (CCTE), 109 TW Alexander Drive, Research Triangle Park, NC 27711, United States
- John F Wambaugh
- U.S. Environmental Protection Agency, Office of Research & Development, Center for Computational Toxicology & Exposure (CCTE), 109 TW Alexander Drive, Research Triangle Park, NC 27711, United States
- Antony J Williams
- U.S. Environmental Protection Agency, Office of Research & Development, Center for Computational Toxicology & Exposure (CCTE), 109 TW Alexander Drive, Research Triangle Park, NC 27711, United States
16
Fuchsman P, Fetters K, O'Connor A. Target Lipid Model and Empirical Organic Carbon Partition Coefficients Predict Sediment Toxicity of Polychlorinated Biphenyls to Benthic Invertebrates. Environ Toxicol Chem 2023; 42:1134-1151. [PMID: 36808761 DOI: 10.1002/etc.5588] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/19/2022] [Revised: 10/19/2022] [Accepted: 02/13/2023] [Indexed: 06/18/2023]
Abstract
Quantifying causal exposure-response relationships for polychlorinated biphenyl (PCB) toxicity to benthic invertebrates can be an important component of contaminated sediment assessments, informing cleanup decisions and natural resource injury determinations. Building on prior analyses, we demonstrate that the target lipid model accurately predicts aquatic toxicity of PCBs to invertebrates, providing a means to account for effects of PCB mixture composition on the toxicity of bioavailable PCBs. We also incorporate updated data on PCB partitioning between particles and interstitial water in field-collected sediments, to better account for effects of PCB mixture composition on PCB bioavailability. To validate the resulting model, we compare its predictions with sediment toxicity data from spiked sediment toxicity tests and a variety of recent case studies from sites where PCBs are the primary sediment contaminant. The updated model should provide a useful tool for both screening-level and in-depth risk analyses for PCBs in sediment, and it should aid in diagnosing potential contributing factors at sites where sediment toxicity and benthic community impairment are observed.
17
Jonkers TJH, Keizers PHJ, Béen F, Meijer J, Houtman CJ, Al Gharib I, Molenaar D, Hamers T, Lamoree MH. Identifying antimicrobials and their metabolites in wastewater and surface water with effect-directed analysis. Chemosphere 2023; 320:138093. [PMID: 36758810 DOI: 10.1016/j.chemosphere.2023.138093] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/12/2022] [Revised: 02/04/2023] [Accepted: 02/06/2023] [Indexed: 06/18/2023]
Abstract
This study aimed to identify antimicrobial contaminants in the aquatic environment with effect-directed analysis. Wastewater influent, effluent, and surface water (up- and downstream of the discharge location) were sampled at two study sites. The samples were enriched, subjected to high-resolution fractionation, and the resulting 80 fractions were tested in an antibiotics bioassay. The resulting bioactive fractions guided the suspect and nontargeted identification strategy in the high-resolution mass spectrometry data that was recorded in parallel. Chemical features were annotated with reference databases, assessed on annotation quality, and assigned identification confidence levels. To identify antibiotic metabolites, Phase I metabolites were predicted in silico for over 500 antibiotics and included as a suspect list. Predicted retention times and fragmentation patterns reduced the number of annotations to consider for confirmation testing. Overall, the bioactivity of three fractions could be explained by the identified antibiotics (clarithromycin and azithromycin) and an antibiotic metabolite (14-OH(R) clarithromycin), explaining 78% of the bioactivity measured at one study site. The applied identification strategy successfully identified antibiotic metabolites in the aquatic environment, emphasizing the need to include the toxic effects of bioactive metabolites in environmental risk assessments.
Affiliation(s)
- Tim J H Jonkers
- Department of Environment & Health, Faculty of Science, Amsterdam Institute for Life and Environment, Vrije Universiteit Amsterdam, De Boelelaan 1085, 1081 HV, Amsterdam, the Netherlands.
- Peter H J Keizers
- National Institute for Public Health and the Environment RIVM, A. van Leeuwenhoeklaan 9, 3721MA, Bilthoven, the Netherlands.
- Frederic Béen
- Department of Environment & Health, Faculty of Science, Amsterdam Institute for Life and Environment, Vrije Universiteit Amsterdam, De Boelelaan 1085, 1081 HV, Amsterdam, the Netherlands; KWR Water Research Institute, Groningenhaven 7, 3430 BB, Nieuwegein, the Netherlands.
- Jeroen Meijer
- Department of Environment & Health, Faculty of Science, Amsterdam Institute for Life and Environment, Vrije Universiteit Amsterdam, De Boelelaan 1085, 1081 HV, Amsterdam, the Netherlands; Institute for Risk Assessment Sciences (IRAS), Utrecht University, Yalelaan 2, 3584 CM, Utrecht, the Netherlands.
- Corine J Houtman
- The Water Laboratory, J.W. Lucasweg 2, 2031 BE, Haarlem, the Netherlands.
- Imane Al Gharib
- Systems Biology Lab, Faculty of Science, Amsterdam Institute for Life and Environment, Vrije Universiteit Amsterdam, De Boelelaan 1085, 1081 HV, Amsterdam, the Netherlands
- Douwe Molenaar
- Systems Biology Lab, Faculty of Science, Amsterdam Institute for Life and Environment, Vrije Universiteit Amsterdam, De Boelelaan 1085, 1081 HV, Amsterdam, the Netherlands.
- Timo Hamers
- Department of Environment & Health, Faculty of Science, Amsterdam Institute for Life and Environment, Vrije Universiteit Amsterdam, De Boelelaan 1085, 1081 HV, Amsterdam, the Netherlands.
- Marja H Lamoree
- Department of Environment & Health, Faculty of Science, Amsterdam Institute for Life and Environment, Vrije Universiteit Amsterdam, De Boelelaan 1085, 1081 HV, Amsterdam, the Netherlands.
18
Kenney DH, Paffenroth RC, Timko MT, Teixeira AR. Dimensionally reduced machine learning model for predicting single component octanol-water partition coefficients. J Cheminform 2023; 15:9. [PMID: 36658606 PMCID: PMC9854055 DOI: 10.1186/s13321-022-00660-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Accepted: 11/25/2022] [Indexed: 01/20/2023] Open
Abstract
MF-LOGP, a new method for determining single-component octanol-water partition coefficients (log P) that uses the molecular formula as its only input, is presented. Octanol-water partition coefficients are useful in many applications, ranging from environmental fate to drug delivery. Currently, partition coefficients are either experimentally measured or predicted as a function of structural fragments, topological descriptors, or thermodynamic properties known or calculated from precise molecular structures. The MF-LOGP method presented here differs from classical methods in that it does not require any structural information and uses the molecular formula as the sole model input. MF-LOGP is therefore useful in situations where the structure is unknown or where low-dimensional, easily automatable, and computationally inexpensive calculations are required. MF-LOGP is a random forest algorithm trained and tested on 15,377 data points, using 10 features derived from the molecular formula to make log P predictions. Using an independent validation set of 2713 data points, MF-LOGP was found to have an average RMSE = 0.77 ± 0.007, MAE = 0.52 ± 0.003, and R² = 0.83 ± 0.003. This performance fell within the spectrum of performances reported in the published literature for conventional higher-dimensional models (RMSE = 0.42-1.54, MAE = 0.09-1.07, and R² = 0.32-0.95). Compared with existing models, MF-LOGP requires a maximum of ten features and no structural information, thereby providing a practical yet predictive tool. The development of MF-LOGP lays the groundwork for more physical prediction models leveraging big data analytical methods or complex multicomponent mixtures.
Affiliation(s)
- David H. Kenney
- Department of Chemical Engineering, Worcester Polytechnic Institute, Worcester, MA 01609 USA
- Randy C. Paffenroth
- Department of Mathematical Sciences, Worcester Polytechnic Institute, Worcester, MA 01609 USA
- Michael T. Timko
- Department of Chemical Engineering, Worcester Polytechnic Institute, Worcester, MA 01609 USA
- Andrew R. Teixeira
- Department of Chemical Engineering, Worcester Polytechnic Institute, Worcester, MA 01609 USA
19
van Tilborg D, Alenicheva A, Grisoni F. Exposing the Limitations of Molecular Machine Learning with Activity Cliffs. J Chem Inf Model 2022; 62:5938-5951. [PMID: 36456532 PMCID: PMC9749029 DOI: 10.1021/acs.jcim.2c01073] [Citation(s) in RCA: 46] [Impact Index Per Article: 23.0] [Received: 08/23/2022] [Indexed: 12/03/2022]
Abstract
Machine learning has become a crucial tool in drug discovery and chemistry at large, e.g., to predict molecular properties, such as bioactivity, with high accuracy. However, activity cliffs (pairs of molecules that are highly similar in structure but exhibit large differences in potency) have received limited attention for their effect on model performance. Not only are these edge cases informative for molecule discovery and optimization, but models that are well equipped to accurately predict the potency of activity cliffs also have increased potential for prospective applications. Our work aims to fill the current knowledge gap on best-practice machine learning methods in the presence of activity cliffs. We benchmarked a total of 24 machine and deep learning approaches on curated bioactivity data from 30 macromolecular targets for their performance on activity cliff compounds. While all methods struggled in the presence of activity cliffs, machine learning approaches based on molecular descriptors outperformed more complex deep learning methods. Our findings highlight large case-by-case differences in performance, advocating for (a) the inclusion of dedicated "activity-cliff-centered" metrics during model development and evaluation and (b) the development of novel algorithms to better predict the properties of activity cliffs. To this end, the methods, metrics, and results of this study have been encapsulated into an open-access benchmarking platform named MoleculeACE (Activity Cliff Estimation, available on GitHub at: https://github.com/molML/MoleculeACE). MoleculeACE is designed to steer the community toward addressing the pressing but overlooked limitation of molecular machine learning models posed by activity cliffs.
Affiliation(s)
- Derek van Tilborg
- Institute for Complex Molecular Systems and Dept. Biomedical Engineering, Eindhoven University of Technology, 5612 AZ Eindhoven, The Netherlands
- Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, 3584 CB Utrecht, The Netherlands
- Francesca Grisoni
- Institute for Complex Molecular Systems and Dept. Biomedical Engineering, Eindhoven University of Technology, 5612 AZ Eindhoven, The Netherlands
- Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, 3584 CB Utrecht, The Netherlands
20
Lee KM, Corley R, Jarabek AM, Kleinstreuer N, Paini A, Stucki AO, Bell S. Advancing New Approach Methodologies (NAMs) for Tobacco Harm Reduction: Synopsis from the 2021 CORESTA SSPT-NAMs Symposium. Toxics 2022; 10:760. [PMID: 36548593 PMCID: PMC9781465 DOI: 10.3390/toxics10120760] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2022] [Revised: 11/05/2022] [Accepted: 11/05/2022] [Indexed: 06/17/2023]
Abstract
New approach methodologies (NAMs) are emerging chemical safety assessment tools consisting of in vitro and in silico (computational) methodologies intended to reduce, refine, or replace (3R) various in vivo animal testing methods traditionally used for risk assessment. Significant progress has been made toward the adoption of NAMs for human health and environmental toxicity assessment. However, additional efforts are needed to expand their development and their use in regulatory decision making. A virtual symposium was held during the 2021 Cooperation Centre for Scientific Research Relative to Tobacco (CORESTA) Smoke Science and Product Technology (SSPT) conference (titled "Advancing New Alternative Methods for Tobacco Harm Reduction"), with the goals of introducing the concepts and potential application of NAMs in the evaluation of potentially reduced-risk (PRR) tobacco products. At the symposium, experts from regulatory agencies, research organizations, and NGOs shared insights on the status of available tools, strengths, limitations, and opportunities in the application of NAMs using case examples from safety assessments of chemicals and tobacco products. Following seven presentations providing background and application of NAMs, a discussion was held where the presenters and audience discussed the outlook for extending the NAMs toxicological applications for tobacco products. The symposium, endorsed by the CORESTA In Vitro Tox Subgroup, Biomarker Subgroup, and NextG Tox Task Force, illustrated common ground and interest in science-based engagement across the scientific community and stakeholders in support of tobacco regulatory science. Highlights of the symposium are summarized in this paper.
Affiliation(s)
- Richard Corley
- Greek Creek Toxicokinetics Consulting, LLC, Boise, ID 83714, USA
- Annie M. Jarabek
- Office of Research and Development, U.S. Environmental Protection Agency (EPA), Research Triangle Park, NC 27711, USA
- Nicole Kleinstreuer
- National Toxicology Program Interagency Center for Evaluation of Alternative Toxicological Methods (NICEATM), Research Triangle Park, NC 27711, USA
- Alicia Paini
- European Commission Joint Research Center (EC JRC), 2749 Ispra, Italy
- Andreas O. Stucki
- PETA Science Consortium International e.V., 70499 Stuttgart, Germany
- Shannon Bell
- Inotiv-RTP, Research Triangle Park, NC 27709, USA
21
Nicolas CI, Linakis MW, Minto MS, Mansouri K, Clewell RA, Yoon M, Wambaugh JF, Patlewicz G, McMullen PD, Andersen ME, Clewell III HJ. Estimating provisional margins of exposure for data-poor chemicals using high-throughput computational methods. Front Pharmacol 2022; 13:980747. [PMID: 36278238 PMCID: PMC9586287 DOI: 10.3389/fphar.2022.980747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2022] [Accepted: 09/20/2022] [Indexed: 11/13/2022] Open
Abstract
Current computational technologies hold promise for prioritizing the testing of the thousands of chemicals in commerce. Here, a case study is presented demonstrating comparative risk-prioritization approaches based on the ratio of surrogate hazard and exposure data, called margins of exposure (MoEs). Exposures were estimated using results from the U.S. EPA's ExpoCast predictive model (SEEM3), and bioactivity was estimated using (1) oral equivalent doses (OEDs) derived from the U.S. EPA's ToxCast high-throughput screening program together with in vitro to in vivo extrapolation and (2) thresholds of toxicological concern (TTCs) determined with a structure-based decision tree in the open-source Toxtree software. To ground-truth these computational approaches, we compared the MoEs based on predicted noncancer TTC and OED values to those derived using the traditional method of deriving points of departure from no-observed-adverse-effect levels (NOAELs) from in vivo oral exposures in rodents. TTC-based MoEs were lower than NOAEL-based MoEs for 520 of 522 (99.6%) compounds in this smaller overlapping dataset, but were relatively well correlated with them (r² = 0.59). TTC-based MoEs were also lower than OED-based MoEs for 590 (83.2%) of the 709 evaluated chemicals, indicating that TTCs may serve as a conservative surrogate in the absence of chemical-specific experimental data. The TTC-based MoE prioritization process was then applied to over 45,000 curated environmental chemical structures as a proof of concept for high-throughput prioritization using TTC-based MoEs. This study demonstrates the utility of exploiting existing computational methods at the pre-assessment phase of a tiered risk-based approach to quickly and conservatively prioritize thousands of untested chemicals for further study.
Affiliation(s)
- Chantel I. Nicolas
- Office of Chemical Safety and Pollution Prevention, US EPA, Washington, DC, United States
- Kamel Mansouri
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, Research Triangle Park, NC, United States
- John F. Wambaugh
- Center for Computational Toxicology and Exposure Office of Research and Development, US EPA, Research Triangle Park, NC, United States
- Grace Patlewicz
- Center for Computational Toxicology and Exposure Office of Research and Development, US EPA, Research Triangle Park, NC, United States
22
Li L, Zhang Z, Men Y, Baskaran S, Sangion A, Wang S, Arnot JA, Wania F. Retrieval, Selection, and Evaluation of Chemical Property Data for Assessments of Chemical Emissions, Fate, Hazard, Exposure, and Risks. ACS ENVIRONMENTAL AU 2022; 2:376-395. [PMID: 37101455 PMCID: PMC10125307 DOI: 10.1021/acsenvironau.2c00010] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 03/02/2022] [Revised: 07/01/2022] [Accepted: 07/05/2022] [Indexed: 04/28/2023]
Abstract
Reliable chemical property data are the key to defensible and unbiased assessments of chemical emissions, fate, hazard, exposure, and risks. However, the retrieval, evaluation, and use of reliable chemical property data can often be a formidable challenge for chemical assessors and model users. This comprehensive review provides practical guidance for use of chemical property data in chemical assessments. We assemble available sources for obtaining experimentally derived and in silico predicted property data; we also elaborate strategies for evaluating and curating the obtained property data. We demonstrate that both experimentally derived and in silico predicted property data can be subject to considerable uncertainty and variability. Chemical assessors are encouraged to use property data derived through the harmonization of multiple carefully selected experimental data if a sufficient number of reliable laboratory measurements is available or through the consensus consolidation of predictions from multiple in silico tools if the data pool from laboratory measurements is not adequate.
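The review's closing recommendation amounts to a decision rule: harmonize experimental values when enough reliable measurements exist, otherwise form a consensus from several in silico tools. A minimal Python sketch, assuming log-scale property values and a hypothetical threshold of three reliable measurements (the review does not fix a number); the tool names and values are placeholders:

```python
from statistics import mean, median

# Hypothetical cut-off: the review does not prescribe a fixed number of measurements.
MIN_RELIABLE_MEASUREMENTS = 3


def select_property_value(measurements, predictions, min_n=MIN_RELIABLE_MEASUREMENTS):
    """Pick a property value (e.g. log Kow) for an assessment.

    measurements: reliable experimental values already screened by the assessor.
    predictions:  mapping of in silico tool name -> predicted value.
    """
    if len(measurements) >= min_n:
        # Enough reliable lab data: harmonize by averaging the log-scale values.
        return mean(measurements), "harmonized experimental"
    if predictions:
        # Otherwise: consensus of several tools; the median damps a single outlying tool.
        return median(predictions.values()), "consensus in silico"
    raise ValueError("no experimental or predicted data available")


value, source = select_property_value(
    [4.1],  # only one measurement: below the hypothetical threshold
    {"tool_1": 3.9, "tool_2": 4.3, "tool_3": 6.0},  # hypothetical tool outputs
)
print(value, source)  # 4.3 consensus in silico
```

The median is one reasonable consensus statistic; a mean or a weighted average of tools with known applicability domains would be equally valid design choices.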
Affiliation(s)
- Li Li
- School of Public Health, University of Nevada Reno, Reno, Nevada 89557, United States
- Phone: +1 (775) 682 7077
- Zhizhen Zhang
- School of Public Health, University of Nevada Reno, Reno, Nevada 89557, United States
- Yujie Men
- Department of Chemical & Environmental Engineering, University of California Riverside, Riverside, California 92521, United States
- Sivani Baskaran
- Department of Physical and Environmental Sciences, University of Toronto Scarborough, Toronto, Ontario M1C 1A4, Canada
- Alessandro Sangion
- Department of Physical and Environmental Sciences, University of Toronto Scarborough, Toronto, Ontario M1C 1A4, Canada
- ARC Arnot Research & Consulting, Toronto, Ontario M4M 1W4, Canada
- Shenghong Wang
- School of Public Health, University of Nevada Reno, Reno, Nevada 89557, United States
- Jon A. Arnot
- Department of Physical and Environmental Sciences, University of Toronto Scarborough, Toronto, Ontario M1C 1A4, Canada
- ARC Arnot Research & Consulting, Toronto, Ontario M4M 1W4, Canada
- Department of Pharmacology and Toxicology, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Frank Wania
- Department of Physical and Environmental Sciences, University of Toronto Scarborough, Toronto, Ontario M1C 1A4, Canada
23
Khawar MI, Mahmood A, Nabi D. Exploring the role of octanol-water partition coefficient and Henry's law constant in predicting the lipid-water partition coefficients of organic chemicals. Sci Rep 2022; 12:14936. [PMID: 36056200 PMCID: PMC9440013 DOI: 10.1038/s41598-022-19452-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2022] [Accepted: 08/29/2022] [Indexed: 11/19/2022]
Abstract
Partition coefficients for storage lipid-water (logKlw) and phospholipid-water (logKpw) phases are key parameters for understanding the bioaccumulation and toxicity of organic contaminants. However, the published experimental databases of these properties are small, and current estimation approaches are cumbersome. Here, we present partition models that exploit the correlations of logKlw and logKpw with linear combinations of the octanol-water partition coefficient (logKow) and the dimensionless Henry's law constant (air-water partition coefficient, logKaw). The calibrated partition models successfully describe the variations in logKlw data (n = 305, R² = 0.971, root-mean-square error (rmse) = 0.375) and in logKpw data (n = 131, R² = 0.953, rmse = 0.413). With logKow and logKaw inputs estimated by the U.S. EPA's EPI Suite, our logKlw and logKpw models exhibited an rmse of 0.52 relative to experimental values, indicating that they are suitable for inclusion in EPI Suite. Our models perform similarly to or better than previously reported models such as one-parameter partition models, Abraham solvation models, and models based on quantum-chemical calculations. Taken together, our models are robust and easy to use, and they provide insight into the variations of logKlw and logKpw in terms of the hydrophobicity and volatility of chemicals.
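Because the partition models above are linear in two descriptors, calibrating them reduces to ordinary least squares on logKow and logKaw. A dependency-free Python sketch, assuming the form logK = a·logKow + b·logKaw + c; the coefficients in the example are invented only to check that the fit recovers them, and are not the published values:

```python
import math


def solve_linear(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear descriptors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_partition_model(log_kow, log_kaw, log_k):
    """Least-squares fit of log_k = a*log_kow + b*log_kaw + c; returns [a, b, c]."""
    rows = [(x1, x2, 1.0) for x1, x2 in zip(log_kow, log_kaw)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, log_k)) for i in range(3)]
    return solve_linear(xtx, xty)  # normal equations: (X^T X) beta = X^T y


def predict(coef, log_kow, log_kaw):
    a, b, c = coef
    return [a * x1 + b * x2 + c for x1, x2 in zip(log_kow, log_kaw)]


def rmse(pred, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


# Synthetic check: data generated from known (hypothetical) coefficients.
kow = [1.0, 2.0, 3.0, 4.0, 5.0]
kaw = [-3.0, -1.0, -4.0, -2.0, 0.0]
observed = [0.9 * x1 - 0.1 * x2 + 0.2 for x1, x2 in zip(kow, kaw)]
coef = fit_partition_model(kow, kaw, observed)
print([round(c, 6) for c in coef])  # [0.9, -0.1, 0.2]
print(rmse(predict(coef, kow, kaw), observed))
```

Normal equations are fine for three well-separated columns like these; for real, larger descriptor sets a QR- or SVD-based solver such as NumPy's `lstsq` is numerically safer.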
Affiliation(s)
- Muhammad Irfan Khawar
- Institute of Environmental Science and Engineering (IESE), School of Civil and Environmental Engineering (SCEE), National University of Sciences and Technology (NUST), Islamabad, H-12, Pakistan
- Environment and Agriculture Laboratory, School of Interdisciplinary Engineering and Sciences (SINES), National University of Sciences and Technology (NUST), Islamabad, H-12, Pakistan
- Azhar Mahmood
- School of Natural Sciences (SNS), National University of Sciences and Technology (NUST), Islamabad, H-12, Pakistan
- Deedar Nabi
- Institute of Environmental Science and Engineering (IESE), School of Civil and Environmental Engineering (SCEE), National University of Sciences and Technology (NUST), Islamabad, H-12, Pakistan
- Environment and Agriculture Laboratory, School of Interdisciplinary Engineering and Sciences (SINES), National University of Sciences and Technology (NUST), Islamabad, H-12, Pakistan
24
Aurisano N, Fantke P. Semi-automated harmonization and selection of chemical data for risk and impact assessment. CHEMOSPHERE 2022; 302:134886. [PMID: 35537623 DOI: 10.1016/j.chemosphere.2022.134886] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/14/2021] [Revised: 05/03/2022] [Accepted: 05/05/2022] [Indexed: 06/14/2023]
Abstract
Chemical data for thousands of substances are available for safety, risk, life cycle and substitution assessments, as submitted for example under the European Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) Regulation. However, to widely disseminate reported physicochemical properties as well as human and ecological exposure and toxicological data for use in various science and policy fields, systematic methods for data harmonization and selection are necessary. In response to this need, we developed a semi-automated method for deriving appropriate substance property values as input for various assessment frameworks with different requirements for resolution and data quality. Starting with data reported for a given substance and property, we propose a set of aligned data selection and harmonization criteria to obtain a representative mean value and related confidence intervals per chemical-property combination. The proposed method was tested on a set of octanol-water partition coefficients (Kow) for an illustrative set of 20 substances, reported under the REACH regulation as example data source. Our method is generally applicable to any set of substances, and can assess specific distributions in quality and variability across reported data. Further research can likely extend our method for mining information from text fields and adapt it to available data reported or collected from other sources and other substance properties to improve the reliability of input data for risk and impact assessments.
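The selection-and-harmonization step can be pictured as: drop records that fail a reliability screen, then report a mean and confidence interval per chemical-property pair. A minimal Python sketch; screening on Klimisch reliability scores and using a normal-approximation 95% interval are assumptions for illustration, not necessarily the criteria Aurisano and Fantke adopt, and the records are hypothetical:

```python
import math
from statistics import mean, stdev


def harmonize(records, max_klimisch=2, z=1.96):
    """Return (mean, (low, high)) for log Kow records passing the reliability screen.

    records: dicts with 'log_kow' and 'klimisch' keys (1 = reliable without
    restriction, 2 = reliable with restrictions, 3-4 = excluded here).
    """
    values = [r["log_kow"] for r in records if r["klimisch"] <= max_klimisch]
    if len(values) < 2:
        raise ValueError("need at least two reliable values for an interval")
    centre = mean(values)
    # Normal approximation; for very small n a Student-t quantile would be wider.
    half_width = z * stdev(values) / math.sqrt(len(values))
    return centre, (centre - half_width, centre + half_width)


reported = [
    {"log_kow": 2.0, "klimisch": 1},
    {"log_kow": 2.2, "klimisch": 2},
    {"log_kow": 2.4, "klimisch": 2},
    {"log_kow": 5.0, "klimisch": 3},  # unreliable entry, excluded
]
centre, (low, high) = harmonize(reported)
print(f"log Kow = {centre:.2f} (95% CI {low:.2f} to {high:.2f})")  # log Kow = 2.20 (95% CI 1.97 to 2.43)
```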
Affiliation(s)
- Nicolò Aurisano
- Quantitative Sustainability Assessment, Department of Environmental and Resource Engineering, Technical University of Denmark, Produktionstorvet 424, 2800, Kgs. Lyngby, Denmark
- Peter Fantke
- Quantitative Sustainability Assessment, Department of Environmental and Resource Engineering, Technical University of Denmark, Produktionstorvet 424, 2800, Kgs. Lyngby, Denmark.
25
Karmaus AL, Mansouri K, To KT, Blake B, Fitzpatrick J, Strickland J, Patlewicz G, Allen D, Casey W, Kleinstreuer N. Evaluation of Variability across Rat Acute Oral Systemic Toxicity Studies. Toxicol Sci 2022; 188:34-47. [PMID: 35426934 PMCID: PMC9237992 DOI: 10.1093/toxsci/kfac042] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Indexed: 11/15/2022]
Abstract
Regulatory agencies rely upon rodent in vivo acute oral toxicity data to determine hazard categorization, require appropriate precautionary labeling, and perform quantitative risk assessments. As the field of toxicology moves toward animal-free new approach methodologies (NAMs), there is a pressing need to develop a reliable, robust reference data set to characterize the reproducibility and inherent variability of the in vivo acute oral toxicity test method, which would serve to contextualize results and set expectations regarding NAM performance. Such a data set is also needed for training and evaluating computational models. To meet these needs, rat acute oral LD50 data from multiple databases were compiled, curated, and analyzed to characterize the variability and reproducibility of results across a set of up to 2441 chemicals with multiple independent study records. Conditional probability analyses reveal that replicate studies yield the same hazard categorization with only 60% likelihood on average. Although we did not have sufficient study metadata to evaluate the impact of specific protocol components (e.g., strain, age, or sex of rat, feed used, treatment vehicle), studies were assumed to follow standard test guidelines. We investigated various chemical properties as sources of the variability (i.e., chemical structure, physicochemical properties, functional use) but could not attribute it to any of them. Thus, we conclude that inherent biological or protocol variability likely underlies the variance in the results. Based on the observed variability, we were able to quantify a margin of uncertainty of ±0.24 log10 (mg/kg) associated with discrete in vivo rat acute oral LD50 values.
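To see why replicate LD50 studies can disagree on hazard category, it helps to map each LD50 onto the GHS acute oral cut-offs and count how often pairs of replicates land in the same band. A simplified Python sketch: pairwise agreement stands in for the paper's conditional probability analysis, the study values are hypothetical, and the ±0.24 log10 margin is applied as a symmetric multiplicative band:

```python
from itertools import combinations

# GHS acute oral toxicity categories by LD50 (mg/kg body weight), upper bounds inclusive.
GHS_CUTOFFS = [(5, "1"), (50, "2"), (300, "3"), (2000, "4"), (5000, "5")]


def ghs_category(ld50_mg_per_kg):
    for upper, category in GHS_CUTOFFS:
        if ld50_mg_per_kg <= upper:
            return category
    return "NC"  # not classified (> 5000 mg/kg)


def replicate_concordance(studies_by_chemical):
    """Fraction of replicate-study pairs (same chemical) assigned the same category."""
    same = total = 0
    for ld50s in studies_by_chemical.values():
        for a, b in combinations(ld50s, 2):
            total += 1
            same += ghs_category(a) == ghs_category(b)
    return same / total if total else float("nan")


def ld50_range(ld50, margin_log10=0.24):
    """Band implied by a +/- margin on the log10(mg/kg) scale."""
    return ld50 * 10 ** -margin_log10, ld50 * 10 ** margin_log10


studies = {"chem_X": [100, 200, 250], "chem_Y": [1000, 2500]}  # hypothetical LD50s
print(replicate_concordance(studies))  # 0.75
low, high = ld50_range(280)
print(ghs_category(low), ghs_category(high))  # 3 4
```

The last line illustrates the practical point: a single LD50 of 280 mg/kg sits in Category 3, but the ±0.24 log10 band around it already reaches into Category 4.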
Affiliation(s)
- Agnes L Karmaus
- Integrated Laboratory Systems, LLC, Morrisville, NC, 27560, USA
- Kamel Mansouri
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, National Institute of Environmental Health Sciences, Research Triangle Park, NC, 27709, USA
- Kimberly T To
- Integrated Laboratory Systems, LLC, Morrisville, NC, 27560, USA
- Bevin Blake
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, National Institute of Environmental Health Sciences, Research Triangle Park, NC, 27709, USA
- Jeremy Fitzpatrick
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
- Judy Strickland
- Integrated Laboratory Systems, LLC, Morrisville, NC, 27560, USA
- Grace Patlewicz
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
- David Allen
- Integrated Laboratory Systems, LLC, Morrisville, NC, 27560, USA
- Warren Casey
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, National Institute of Environmental Health Sciences, Research Triangle Park, NC, 27709, USA
- Nicole Kleinstreuer
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, National Institute of Environmental Health Sciences, Research Triangle Park, NC, 27709, USA
26
Meng J, Chen P, Wahib M, Yang M, Zheng L, Wei Y, Feng S, Liu W. Boosting the predictive performance with aqueous solubility dataset curation. Sci Data 2022; 9:71. [PMID: 35241693 PMCID: PMC8894363 DOI: 10.1038/s41597-022-01154-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2020] [Accepted: 01/25/2022] [Indexed: 12/02/2022]
Abstract
Intrinsic solubility is a critical property in the pharmaceutical industry that affects the in vivo bioavailability of small-molecule drugs. However, solubility prediction with artificial intelligence (AI) faces insufficient data, poor data quality, and a lack of unified measurements for AI and physics-based approaches. We collect 7 aqueous solubility datasets and present a dataset curation workflow. Evaluating the curated data with two expanded deep learning methods, we observe improved RMSE scores on all curated thermodynamic datasets. We also compare an expanded Chemprop model enhanced with curated data against a state-of-the-art physics-based approach using Pearson and Spearman correlation coefficients; the expanded Chemprop model achieves similar performance, with a Pearson coefficient of 0.930 and a Spearman coefficient of 0.947. Pearson and Spearman values are also shown to improve steadily as the number of data points increases. In addition, the computational advantage of AI models enables quick evaluation of large sets of molecules during the hit identification or lead optimization stages, which supports further decision making within the time cycle of the drug discovery stage.
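The two metrics used in the comparison above differ in what they reward: Pearson measures linear agreement, while Spearman is Pearson applied to ranks and so only checks whether the ordering of compounds is preserved. A dependency-free Python sketch with tie-aware average ranks; the log S values are hypothetical:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical log S values (log mol/L): measured vs. model-predicted.
measured = [-1.2, -2.5, -3.1, -4.0, -5.6]
predicted = [-1.0, -2.9, -3.0, -4.4, -5.1]
print(round(pearson(measured, predicted), 3), round(spearman(measured, predicted), 3))
```

Here the predicted values keep the measured ordering exactly, so Spearman is 1.0 while Pearson is slightly lower because the errors are not uniform.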
Affiliation(s)
- Jintao Meng
- Shenzhen Institutes of Advanced Technology, CAS, Shenzhen, 518000, China
- National Supercomputer Center in Shenzhen, Shenzhen, 518000, China
- Tencent AI Lab, Shenzhen, 518000, China
- Peng Chen
- National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- RIKEN Center for Computational Science, Hyogo, Japan
- Mohamed Wahib
- National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- RIKEN Center for Computational Science, Hyogo, Japan
- Liangzhen Zheng
- Shenzhen Institutes of Advanced Technology, CAS, Shenzhen, 518000, China
- Yanjie Wei
- Shenzhen Institutes of Advanced Technology, CAS, Shenzhen, 518000, China
- Shengzhong Feng
- National Supercomputer Center in Shenzhen, Shenzhen, 518000, China
- Wei Liu
- Tencent AI Lab, Shenzhen, 518000, China
27
Boyce M, Meyer B, Grulke C, Lizarraga L, Patlewicz G. Comparing the performance and coverage of selected in silico (liver) metabolism tools relative to reported studies in the literature to inform analogue selection in read-across: A case study. COMPUTATIONAL TOXICOLOGY (AMSTERDAM, NETHERLANDS) 2022; 21:1-15. [PMID: 35386221 PMCID: PMC8979226 DOI: 10.1016/j.comtox.2021.100208] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 05/29/2023]
Abstract
Changes in the regulatory landscape of chemical safety assessment call for the use of New Approach Methodologies (NAMs), including read-across, to fill data gaps. One critical aspect of analogue evaluation is the extent to which target and source analogues are metabolically similar. In this study, a set of 37 structurally diverse chemicals was compiled from the EPA ToxCast inventory to compare and contrast a selection of in silico metabolism tools in terms of their coverage and performance relative to metabolism information reported in the literature. The aim was to build understanding of the scope and capabilities of these tools and how they could be utilised in a read-across assessment. The tools were Systematic Generation of Metabolites (SyGMa), Meteor Nexus, BioTransformer, Tissue Metabolism Simulator (TIMES), OECD Toolbox, and Chemical Transformation Simulator (CTS). Performance was characterised by sensitivity and precision, determined by comparing predictions against literature-reported metabolites (from 44 publications). A coverage score was derived to provide a relative quantitative comparison between the tools. Meteor, TIMES, Toolbox, and CTS predictions were run in batch mode using default settings. SyGMa and BioTransformer were run with user-defined settings (two passes of phase I and one pass of phase II). Hierarchical clustering revealed high similarity between TIMES and Toolbox. SyGMa had the highest coverage, matching an average of 38.63% of the predictions generated by the other tools, though it was prone to significant overprediction: it generated 5,125 metabolites, which represented 54.67% of all predictions. Precision and sensitivity values ranged from 1.1-29% and 14.7-28.3%, respectively. The Toolbox had the highest performance overall.
A case study of 3,4-toluenediamine (3,4-TDA), assessed for the derivation of screening-level Provisional Peer Reviewed Toxicity Values (PPRTVs), was used to demonstrate the practical role that in silico metabolism information can play in analogue evaluation as part of a read-across approach.
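Sensitivity and precision, as used above, compare a tool's predicted metabolites with those reported in the literature. A minimal Python sketch, assuming structures have already been standardized to comparable identifiers (for example InChIKeys); the identifiers below are hypothetical placeholders:

```python
def evaluate_tool(predicted, reported):
    """Sensitivity and precision of predicted metabolites against literature reports.

    Both arguments are iterables of structure identifiers (e.g. InChIKeys) that
    have already been standardized, so identical structures compare equal.
    """
    predicted, reported = set(predicted), set(reported)
    true_positives = len(predicted & reported)
    sensitivity = true_positives / len(reported) if reported else float("nan")
    precision = true_positives / len(predicted) if predicted else float("nan")
    return sensitivity, precision


# Hypothetical identifiers for one parent chemical.
literature = {"M1", "M2", "M3"}
tool_output = {"M1", "M3", "M7", "M8", "M9", "M10"}  # over-predicting tool
print(evaluate_tool(tool_output, literature))  # sensitivity 2/3, precision 1/3
```

The example shows the trade-off reported for SyGMa: generating many candidates raises the chance of matching reported metabolites, but every unmatched candidate lowers precision.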
Affiliation(s)
- Matthew Boyce
- Oak Ridge Associated Universities, Oak Ridge, TN, 37830, USA
- Center for Computational Toxicology & Exposure (CCTE), U.S. Environmental Protection Agency, Research Triangle Park, Durham, NC, 27709, USA
- Brian Meyer
- Center for Computational Toxicology & Exposure (CCTE), U.S. Environmental Protection Agency, Research Triangle Park, Durham, NC, 27709, USA
- Chris Grulke
- Center for Computational Toxicology & Exposure (CCTE), U.S. Environmental Protection Agency, Research Triangle Park, Durham, NC, 27709, USA
- Lucina Lizarraga
- Center for Public Health and Environmental Assessment (CPHEA), U.S. Environmental Protection Agency, Cincinnati, OH, USA
- Grace Patlewicz
- Center for Computational Toxicology & Exposure (CCTE), U.S. Environmental Protection Agency, Research Triangle Park, Durham, NC, 27709, USA
28
Predicting compound amenability with liquid chromatography-mass spectrometry to improve non-targeted analysis. Anal Bioanal Chem 2021; 413:7495-7508. [PMID: 34648052 DOI: 10.1007/s00216-021-03713-w] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 07/28/2021] [Revised: 09/22/2021] [Accepted: 10/01/2021] [Indexed: 10/20/2022]
Abstract
With the increasing availability of high-resolution mass spectrometers, suspect screening and non-targeted analysis are becoming popular compound identification tools for environmental researchers. Samples of interest often contain a large (unknown) number of chemicals spanning the detectable mass range of the instrument. In an effort to separate these chemicals prior to injection into the mass spectrometer, a chromatography method is often utilized. There are numerous types of gas and liquid chromatographs that can be coupled to commercially available mass spectrometers. Depending on the type of instrument used for analysis, the researcher is likely to observe a different subset of compounds based on the amenability of those chemicals to the selected experimental techniques and equipment. It would be advantageous if this subset of chemicals could be predicted prior to conducting the experiment, in order to minimize potential false-positive and false-negative identifications. In this work, we utilize experimental datasets to predict the amenability of chemical compounds to detection with liquid chromatography-electrospray ionization-mass spectrometry (LC-ESI-MS). The assembled dataset totals 5517 unique chemicals either explicitly detected or not detected with LC-ESI-MS. The resulting detected/not-detected matrix has been modeled using specific molecular descriptors to predict which chemicals are amenable to LC-ESI-MS, and to which form(s) of ionization. Random forest models, including a measure of the applicability domain of the model for both positive and negative modes of the electrospray ionization source, were successfully developed. The outcome of this work will help to inform future suspect screening and non-targeted analyses of chemicals by better defining the potential LC-ESI-MS detectable chemical landscape of interest.
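The abstract does not say how the applicability domain of the random forest models is measured. One common, generic option is a distance-to-training-set check; the Python sketch below uses the mean Euclidean distance to the k nearest training compounds with a mean + z·SD threshold. This is a swapped-in illustration, not the authors' method, and the descriptor vectors are hypothetical:

```python
import math
from statistics import mean, stdev


def knn_mean_distance(x, reference, k):
    """Mean Euclidean distance from x to its k nearest reference points."""
    distances = sorted(math.dist(x, r) for r in reference)
    return mean(distances[:k])


def fit_ad_threshold(train, k=3, z=1.0):
    """Threshold = mean + z*SD of leave-one-out k-NN distances within the training set."""
    scores = [knn_mean_distance(x, train[:i] + train[i + 1:], k) for i, x in enumerate(train)]
    return mean(scores) + z * stdev(scores)


def in_domain(x, train, threshold, k=3):
    return knn_mean_distance(x, train, k) <= threshold


# Hypothetical 2-D descriptor vectors (e.g. scaled logP and polar surface area).
train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
threshold = fit_ad_threshold(train)
print(in_domain([0.5, 0.4], train, threshold))  # True
print(in_domain([5.0, 5.0], train, threshold))  # False
```

A detected/not-detected prediction for a compound that falls outside the domain would then be reported with lower confidence rather than discarded.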
29
Lawler R, Liu YH, Majaya N, Allam O, Ju H, Kim JY, Jang SS. DFT-Machine Learning Approach for Accurate Prediction of pKa. J Phys Chem A 2021; 125:8712-8722. [PMID: 34554744 DOI: 10.1021/acs.jpca.1c05031] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/29/2022]
Abstract
In this study, we propose a novel method for pKa prediction in a diverse set of acids that combines density functional theory (DFT) with machine learning (ML) methods. First, DFT at the B3LYP/6-31++G**/SM8 level is used to predict pKa, yielding a mean absolute error (MAE) of 1.85 pKa units. These DFT-predicted pKa values are then used as one of 10 molecular descriptors for developing ML models trained on experimental data. Kernel ridge regression (KRR), Gaussian process regression, and artificial neural networks are optimized using three pipelines: Pipeline 1 involves only hyperparameter optimization (HPO); Pipeline 2 involves HPO followed by relative contribution analysis (RCA) and recursive feature elimination (RFE); and Pipeline 3 involves HPO followed by RCA and RFE on an expanded set of composite features. KRR with Pipeline 3 yields the best pKa predictions, with an MAE of 0.60 log units. This algorithm was then used to predict the pKa of 37 novel acids. The two most important features were found to be the number of hydrogen atoms in the molecule and the degree of oxidation of the acid. The predicted pKa values were documented for future reference.
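Kernel ridge regression, the best-performing model in the abstract above, fits dual weights α = (K + λI)⁻¹y on a kernel matrix K and predicts with a weighted sum of kernel evaluations. A dependency-free Python sketch, assuming an RBF kernel (the abstract does not state the kernel or hyperparameters) and hypothetical descriptor rows of [DFT-predicted pKa, number of H atoms, degree of oxidation]:

```python
import math


def rbf(a, b, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[p] = m[p], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


class KernelRidge:
    def __init__(self, lam=1e-3, gamma=0.5):
        self.lam, self.gamma = lam, gamma

    def fit(self, X, y):
        n = len(X)
        # Build K + lambda*I; solving (K + lambda*I) alpha = y gives the dual weights.
        k = [[rbf(X[i], X[j], self.gamma) + (self.lam if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        self.alpha, self.X = solve_linear(k, y), X
        return self

    def predict(self, X_new):
        return [sum(a * rbf(x, xi, self.gamma) for a, xi in zip(self.alpha, self.X))
                for x in X_new]


def mae(pred, obs):
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


# Hypothetical rows: [DFT-predicted pKa, number of H atoms, degree of oxidation]
X = [[4.0, 1, 1], [5.0, 2, 1], [3.0, 1, 2], [6.0, 3, 2]]
y = [4.2, 4.8, 2.9, 5.5]  # hypothetical experimental pKa values
model = KernelRidge(lam=1e-3, gamma=0.5).fit(X, y)
print(mae(model.predict([[4.5, 2, 1]]), [4.6]))
```

In practice the descriptors would be standardized first, and λ and γ would be the hyperparameters tuned by the HPO step that all three pipelines share.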
Affiliation(s)
- Robin Lawler
- School of Materials Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0245, United States
- School of Chemical and Biomolecular Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Yao-Hao Liu
- School of Materials Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0245, United States
- Nessa Majaya
- School of Materials Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0245, United States
- Omar Allam
- School of Materials Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0245, United States
- G. W. Woodruff School of Mechanical Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Hyunchul Ju
- Department of Mechanical Engineering, Inha University, 100 Inha-ro, Michuhol-gu, Incheon, 22212, Republic of Korea
- Jin Young Kim
- Center for Hydrogen Fuel Cell Research, Korea Institute of Science and Technology, Seoul, 02792, Republic of Korea
- Seung Soon Jang
- School of Materials Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0245, United States
30
Williams AJ, Lambert JC, Thayer K, Dorne JLCM. Sourcing data on chemical properties and hazard data from the US-EPA CompTox Chemicals Dashboard: A practical guide for human risk assessment. ENVIRONMENT INTERNATIONAL 2021; 154:106566. [PMID: 33934018 PMCID: PMC9667884 DOI: 10.1016/j.envint.2021.106566] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 12/14/2020] [Revised: 04/04/2021] [Accepted: 04/05/2021] [Indexed: 05/19/2023]
Abstract
For the past six decades, human health risk assessment of chemicals has relied on in vivo data from human epidemiological and experimental animal toxicological studies to inform the derivation of non-cancer toxicity values. The ongoing evolution of this risk assessment paradigm in an environmental landscape of data-poor chemicals has highlighted the need to develop and implement non-testing methods, so-called New Approach Methodologies (NAMs). NAMs include a growing number of in silico and in vitro data streams designed to inform hazard properties of chemicals, including kinetics and dynamics at different levels of biological organization, environmental fate and transport, and exposure. NAMs provide a fit-for-purpose scientific basis for human hazard and risk characterization of chemicals, ranging from data-gap filling applications to broad evidence-based decision-making. Systematic assembly and delivery of empirical and predicted data for chemicals are paramount to advancing chemical evaluation, and software tools serve an essential role in delivering these data to the scientific community. The CompTox Chemicals Dashboard (from here on referred to as the "Dashboard") is one such tool: a publicly available web-based application developed by the US Environmental Protection Agency to provide access to chemistry, toxicity and exposure information for ~900,000 chemicals. The Dashboard is increasingly becoming a valuable resource for assessors tasked with the evaluation of potential human health risks associated with chemical exposures.
In this context, the significant amount of information present in the Dashboard facilitates: 1) assembly of information on physicochemical properties and environmental fate and transport and exposure parameters and metrics; 2) identification of cancer and non-cancer health effects from extant human and experimental animal studies in the public domain and/or information not available in the public domain (i.e., "grey literature"); 3) systematic literature searching and review for developing cancer and non-cancer hazard evidence bases; and 4) access to mechanistic information that can aid or augment the analysis of traditional toxicology evidence bases, or potentially, serve as the primary basis for informing hazard identification and dose-response when traditional bioassay data are lacking. Finally, in silico predictive tools developed to conduct structure-activity or read-across analyses are also available within the Dashboard. This practical tutorial is intended to address key questions from the human health risk assessment community dealing with chemicals in both food and in the environment. Perspectives for future development or refinement of the Dashboard highlight foreseen activities to further support the research and risk assessment community in cancer and non-cancer chemical evaluations.
Affiliation(s)
- Antony J Williams
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, NC, USA.
- Jason C Lambert
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, NC, USA
- Kris Thayer
- Center for Public Health and Environmental Assessment, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, NC, USA
- Jean-Lou C M Dorne
- Scientific Committee and Emerging Risks Unit, Department of Risk Assessment and Scientific Assistance, European Food Safety Authority, 43126 Parma, Italy
31
Meijer J, Lamoree M, Hamers T, Antignac JP, Hutinet S, Debrauwer L, Covaci A, Huber C, Krauss M, Walker DI, Schymanski EL, Vermeulen R, Vlaanderen J. An annotation database for chemicals of emerging concern in exposome research. ENVIRONMENT INTERNATIONAL 2021; 152:106511. [PMID: 33773387 DOI: 10.1016/j.envint.2021.106511] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 09/01/2020] [Revised: 02/03/2021] [Accepted: 03/06/2021] [Indexed: 05/18/2023]
Abstract
BACKGROUND Chemicals of Emerging Concern (CECs) comprise a very broad group of chemicals that are suspected to be responsible for adverse health effects, but for which very limited information is available. Chromatographic techniques coupled with high-resolution mass spectrometry (HRMS) can be used for non-targeted screening and detection of CECs by using comprehensive annotation databases. Establishing a database focused on the annotation of CECs in human samples will provide new insight into the distribution and extent of exposures to a wide range of CECs in humans. OBJECTIVES This study describes an approach for the aggregation and curation of an annotation database (CECscreen) for the identification of CECs in human biological samples. METHODS The approach consists of three main parts. First, CEC compound lists from various sources were aggregated, and duplicates and inorganic compounds were removed. Subsequently, the list was curated by standardizing structures to create "MS-ready" and "QSAR-ready" SMILES, and by calculating exact masses (monoisotopic and adducts) and molecular formulas. The second step included the simulation of Phase I metabolites. The third and final step included the calculation of QSAR predictions related to physicochemical properties, environmental fate, toxicity and Absorption, Distribution, Metabolism, Excretion (ADME) processes, and the retrieval of information from the US EPA CompTox Chemicals Dashboard. RESULTS All CECscreen database and property files are publicly available (DOI: https://doi.org/10.5281/zenodo.3956586). In total, 145,284 entries were aggregated from various CEC data sources. After elimination of duplicates and curation, the pipeline produced 70,397 unique "MS-ready" structures and 66,071 unique QSAR-ready structures, corresponding to 69,526 CAS numbers. Simulation of Phase I metabolites resulted in 306,279 unique metabolites.
QSAR predictions could be performed for 64,684 of the QSAR-ready structures, whereas information was retrieved from the CompTox Chemicals Dashboard for 59,739 CAS numbers out of 69,526 inquiries. CECscreen is incorporated in the in silico fragmentation approach MetFrag. DISCUSSION The CECscreen database can be used to prioritize annotation of CECs measured in non-targeted HRMS, facilitating the large-scale detection of CECs in human samples for exposome research. Large-scale detection of CECs can be further improved by integrating the present database with resources that contain CECs (metabolites) and meta-data measurements, further expansion towards in silico and experimental (e.g., MassBank) generation of MS/MS spectra, and development of bioinformatics approaches capable of using correlation patterns in the measured chemical features.
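The exact-mass step in the curation pipeline sums monoisotopic atomic masses over the molecular formula and then adds or subtracts a proton for the common protonated and deprotonated adducts. A minimal Python sketch; the element table is deliberately incomplete, only [M+H]+ and [M-H]- are covered, and caffeine (C8H10N4O2) is used purely as a worked example:

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "S": 31.97207100,
    "P": 30.97376163,
    "F": 18.99840322,
    "Cl": 34.96885268,
}
PROTON = 1.007276467  # mass of a proton (Da)


def parse_formula(formula):
    """'C8H10N4O2' -> {'C': 8, 'H': 10, 'N': 4, 'O': 2} (no brackets or charges)."""
    counts = {}
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def monoisotopic_mass(formula):
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def adduct_mz(formula):
    """Neutral monoisotopic mass and m/z of singly charged [M+H]+ and [M-H]- ions."""
    m = monoisotopic_mass(formula)
    return {"[M]": m, "[M+H]+": m + PROTON, "[M-H]-": m - PROTON}


for ion, mz in adduct_mz("C8H10N4O2").items():
    print(f"{ion}: {mz:.4f}")
```

For caffeine this gives a neutral monoisotopic mass of about 194.0804 Da and an [M+H]+ ion near m/z 195.0877; elements outside the small table raise a `KeyError` rather than being silently skipped.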
Affiliation(s)
- Jeroen Meijer
- Institute for Risk Assessment Sciences (IRAS), Utrecht University, Utrecht, the Netherlands; Department Environment & Health, Vrije Universiteit, Amsterdam, the Netherlands
- Marja Lamoree
- Department Environment & Health, Vrije Universiteit, Amsterdam, the Netherlands
- Timo Hamers
- Department Environment & Health, Vrije Universiteit, Amsterdam, the Netherlands
- Laurent Debrauwer
- Toxalim (Research Centre in Food Toxicology), Toulouse University, INRAE, ENVT, INP-Purpan, Toulouse, France; Metatoul-AXIOM Platform, National Infrastructure for Metabolomics and Fluxomics: MetaboHUB, Toxalim, INRAE, Toulouse, France
- Adrian Covaci
- Toxicological Center, University of Antwerp, Belgium
- Carolin Huber
- Department Effect-Directed Analysis, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany
- Martin Krauss
- Department Effect-Directed Analysis, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany
- Douglas I Walker
- Department of Environmental Medicine and Public Health, Icahn School of Medicine, Mount Sinai, New York, NY, USA
- Emma L Schymanski
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Roel Vermeulen
- Institute for Risk Assessment Sciences (IRAS), Utrecht University, Utrecht, the Netherlands
- Jelle Vlaanderen
- Institute for Risk Assessment Sciences (IRAS), Utrecht University, Utrecht, the Netherlands.
32
Exploring the octanol-water partition coefficient dataset using deep learning techniques and data augmentation. Commun Chem 2021; 4:90. [PMID: 36697535 PMCID: PMC9814212 DOI: 10.1038/s42004-021-00528-9] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 11/27/2020] [Accepted: 05/21/2021] [Indexed: 01/28/2023] [Open access]
Abstract
Today more and more data are freely available. Based on these big datasets, deep neural networks (DNNs) are rapidly gaining relevance in computational chemistry. Here, we explore the potential of DNNs to predict chemical properties from chemical structures. We selected the octanol-water partition coefficient (log P) as an example; it plays an essential role in environmental chemistry and toxicology, and also in chemical analysis. The predictive performance of the developed DNN is good, with an RMSE of 0.47 log units on the test dataset and an RMSE of 0.33 log units on an external dataset from the SAMPL6 challenge. To achieve this, we trained the DNN using data augmentation that considers all potential tautomeric forms of the chemicals. We further demonstrate how DNN models can help in the curation of the log P dataset by identifying potential errors, and address limitations of the dataset itself.
33
Alves VM, Auerbach SS, Kleinstreuer N, Rooney JP, Muratov EN, Rusyn I, Tropsha A, Schmitt C. Curated Data In - Trustworthy In Silico Models Out: The Impact of Data Quality on the Reliability of Artificial Intelligence Models as Alternatives to Animal Testing. Altern Lab Anim 2021; 49:73-82. [PMID: 34233495 PMCID: PMC8609471 DOI: 10.1177/02611929211029635] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 11/16/2022]
Abstract
New Approach Methodologies (NAMs) that employ artificial intelligence (AI) for predicting adverse effects of chemicals have generated optimistic expectations as alternatives to animal testing. However, the major underappreciated challenge in developing robust and predictive AI models is the impact of the quality of the input data on the model accuracy. Indeed, poor data reproducibility and quality have been frequently cited as factors contributing to the crisis in biomedical research, as well as similar shortcomings in the fields of toxicology and chemistry. In this article, we review the most recent efforts to improve confidence in the robustness of toxicological data and investigate the impact that data curation has on the confidence in model predictions. We also present two case studies demonstrating the effect of data curation on the performance of AI models for predicting skin sensitisation and skin irritation. We show that, whereas models generated with uncurated data had a 7-24% higher correct classification rate (CCR), the perceived performance was, in fact, inflated owing to the high number of duplicates in the training set. We assert that data curation is a critical step in building computational models, to help ensure that reliable predictions of chemical toxicity are achieved through use of the models.
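The case studies above attribute part of the apparently higher correct classification rate (CCR) to duplicates in the training set. A minimal Python sketch of two curation checks this points to (collapsing duplicates, and removing test compounds that also occur in the training set) is shown below, together with CCR computed as the mean of sensitivity and specificity (balanced accuracy), which is how CCR is commonly defined. It assumes each compound is represented by a standardized structure key (for example an InChIKey) and a binary label; the rule of discarding duplicates whose labels disagree is one possible choice and is not taken from the authors' protocol.

```python
from typing import Dict, List, Sequence, Set, Tuple

Record = Tuple[str, int]  # (standardized structure key, binary label 0/1)


def deduplicate(records: Sequence[Record]) -> List[Record]:
    """Keep one record per structure key; drop keys whose duplicates disagree."""
    labels: Dict[str, Set[int]] = {}
    order: List[str] = []  # first-seen order of keys
    for key, label in records:
        if key not in labels:
            labels[key] = set()
            order.append(key)
        labels[key].add(label)
    return [(key, next(iter(labels[key]))) for key in order if len(labels[key]) == 1]


def remove_overlap(train: Sequence[Record], test: Sequence[Record]) -> List[Record]:
    """Drop test records whose structure key already occurs in the training set."""
    train_keys = {key for key, _ in train}
    return [rec for rec in test if rec[0] not in train_keys]


def ccr(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Correct classification rate: mean of sensitivity and specificity."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


raw = [("KEY-A", 1), ("KEY-A", 1), ("KEY-B", 0), ("KEY-C", 1), ("KEY-C", 0)]
print(deduplicate(raw))                  # [('KEY-A', 1), ('KEY-B', 0)]
print(ccr([1, 1, 0, 0], [1, 0, 0, 0]))   # 0.75
```

Running both checks before any train/test split or cross-validation keeps identical structures from being scored against themselves, which is the mechanism by which duplicates can inflate a reported CCR.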
Affiliation(s)
- Vinicius M. Alves
- Office of Data Science, Division of the National Toxicology Program (DNTP), National Institute of Environmental Health Sciences (NIEHS), Durham, NC, USA
- Scott S. Auerbach
- Toxinformatics Group, Predictive Toxicology Branch, DNTP, NIEHS, Durham, NC, USA
- Nicole Kleinstreuer
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, Scientific Director's Office, DNTP, NIEHS, Durham, NC, USA
- John P. Rooney
- Integrated Laboratory Systems, LLC, Morrisville, NC, USA
- Eugene N. Muratov
- Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, The University of North Carolina at Chapel Hill, NC, USA
- Department of Pharmaceutical Sciences, Federal University of Paraiba, Joao Pessoa, Paraiba, Brazil
- Ivan Rusyn
- Department of Veterinary Integrative Biosciences, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX, USA
- Alexander Tropsha
- Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, The University of North Carolina at Chapel Hill, NC, USA
- Charles Schmitt
- Office of Data Science, Division of the National Toxicology Program (DNTP), National Institute of Environmental Health Sciences (NIEHS), Durham, NC, USA
34
Mansouri K, Karmaus AL, Fitzpatrick J, Patlewicz G, Pradeep P, Alberga D, Alepee N, Allen TE, Allen D, Alves VM, Andrade CH, Auernhammer TR, Ballabio D, Bell S, Benfenati E, Bhattacharya S, Bastos JV, Boyd S, Brown J, Capuzzi SJ, Chushak Y, Ciallella H, Clark AM, Consonni V, Daga PR, Ekins S, Farag S, Fedorov M, Fourches D, Gadaleta D, Gao F, Gearhart JM, Goh G, Goodman JM, Grisoni F, Grulke CM, Hartung T, Hirn M, Karpov P, Korotcov A, Lavado GJ, Lawless M, Li X, Luechtefeld T, Lunghini F, Mangiatordi GF, Marcou G, Marsh D, Martin T, Mauri A, Muratov EN, Myatt GJ, Nguyen DT, Nicolotti O, Note R, Pande P, Parks AK, Peryea T, Polash AH, Rallo R, Roncaglioni A, Rowlands C, Ruiz P, Russo DP, Sayed A, Sayre R, Sheils T, Siegel C, Silva AC, Simeonov A, Sosnin S, Southall N, Strickland J, Tang Y, Teppen B, Tetko IV, Thomas D, Tkachenko V, Todeschini R, Toma C, Tripodi I, Trisciuzzi D, Tropsha A, Varnek A, Vukovic K, Wang Z, Wang L, Waters KM, Wedlake AJ, Wijeyesakere SJ, Wilson D, Xiao Z, Yang H, Zahoranszky-Kohalmi G, Zakharov AV, Zhang FF, Zhang Z, Zhao T, Zhu H, Zorn KM, Casey W, Kleinstreuer NC. CATMoS: Collaborative Acute Toxicity Modeling Suite. Environ Health Perspect 2021; 129:47013. [PMID: 33929906 PMCID: PMC8086800 DOI: 10.1289/ehp8495] [Citation(s) in RCA: 53] [Impact Index Per Article: 17.7] [Indexed: 05/02/2023]
Abstract
BACKGROUND Humans are exposed to tens of thousands of chemical substances that need to be assessed for their potential toxicity. Acute systemic toxicity testing serves as the basis for regulatory hazard classification, labeling, and risk management. However, it is cost- and time-prohibitive to evaluate all new and existing chemicals using traditional rodent acute toxicity tests. In silico models built using existing data facilitate rapid acute toxicity predictions without using animals. OBJECTIVES The U.S. Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM) Acute Toxicity Workgroup organized an international collaboration to develop in silico models for predicting acute oral toxicity based on five different end points: the Lethal Dose 50 (LD50) value, U.S. Environmental Protection Agency hazard (four) categories, Globally Harmonized System for Classification and Labeling hazard (five) categories, very toxic chemicals (LD50 ≤ 50 mg/kg), and nontoxic chemicals (LD50 > 2,000 mg/kg). METHODS An acute oral toxicity data inventory for 11,992 chemicals was compiled, split into training and evaluation sets, and made available to 35 participating international research groups that submitted a total of 139 predictive models. Predictions that fell within the applicability domains of the submitted models were evaluated using external validation sets. These were then combined into consensus models to leverage the strengths of individual approaches. RESULTS The resulting consensus predictions, which leverage the collective strengths of each individual model, form the Collaborative Acute Toxicity Modeling Suite (CATMoS). CATMoS demonstrated high performance in terms of accuracy and robustness when compared with in vivo results. DISCUSSION CATMoS is being evaluated by regulatory agencies for its utility and applicability as a potential replacement for in vivo rat acute oral toxicity studies.
CATMoS predictions for more than 800,000 chemicals have been made available via the National Toxicology Program's Integrated Chemical Environment tools and data sets (ice.ntp.niehs.nih.gov). The models are also implemented in a free, standalone, open-source tool, OPERA, which allows predictions of new and untested chemicals to be made. https://doi.org/10.1289/EHP8495.
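CATMoS combines in-domain predictions from many models into consensus values and also reports two binary end points defined by LD50 cut-offs (very toxic at 50 mg/kg or below, nontoxic above 2,000 mg/kg). The Python sketch below shows one simple way such a consensus could be formed: the geometric mean (the average in log10 units) of the LD50 predictions from models whose applicability domain covers the chemical, followed by the two cut-offs. The log-averaging rule, the function names and the use of `None` for out-of-domain predictions are assumptions for illustration; the consensus procedure actually used for CATMoS is described in the paper and may weight or combine models differently.

```python
import math
from typing import Dict, Optional, Sequence

VERY_TOXIC_MAX = 50.0   # mg/kg: "very toxic" if LD50 <= 50
NONTOXIC_MIN = 2000.0   # mg/kg: "nontoxic" if LD50 > 2,000


def consensus_ld50(predictions: Sequence[Optional[float]]) -> Optional[float]:
    """Geometric mean of in-domain LD50 predictions (mg/kg).

    None marks a model whose applicability domain does not cover the chemical;
    the function returns None when no model is in domain.
    """
    in_domain = [p for p in predictions if p is not None]
    if not in_domain:
        return None
    if any(p <= 0 for p in in_domain):
        raise ValueError("LD50 predictions must be positive")
    mean_log = sum(math.log10(p) for p in in_domain) / len(in_domain)
    return 10 ** mean_log


def binary_endpoints(ld50: float) -> Dict[str, bool]:
    """Map an LD50 value (mg/kg) onto the two binary end points."""
    return {"very_toxic": ld50 <= VERY_TOXIC_MAX, "nontoxic": ld50 > NONTOXIC_MIN}


# Three hypothetical models; the second is outside its applicability domain.
value = consensus_ld50([10.0, None, 1000.0])  # averages log10 values 1 and 3
print(value, binary_endpoints(value))
```

Averaging in log units keeps one very large prediction from dominating the consensus, since LD50 values for different chemicals span several orders of magnitude.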
Affiliation(s)
- Kamel Mansouri
- Integrated Laboratory Systems, LLC, Morrisville, North Carolina, USA
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, Research Triangle Park, North Carolina, USA
- Agnes L. Karmaus
- Integrated Laboratory Systems, LLC, Morrisville, North Carolina, USA
- Grace Patlewicz
- Center for Computational Toxicology and Exposure, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
- Prachi Pradeep
- Center for Computational Toxicology and Exposure, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
- Oak Ridge Institute for Science and Education (ORISE) Research Participation Program, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
- Domenico Alberga
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari “Aldo Moro”, Bari, Italy
- Timothy E.H. Allen
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
- Dave Allen
- Integrated Laboratory Systems, LLC, Morrisville, North Carolina, USA
- Vinicius M. Alves
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina, USA
- Laboratory for Molecular Modeling and Design, Faculty of Pharmacy, Federal University of Goiás, Goiania, Brazil
- Carolina H. Andrade
- Laboratory for Molecular Modeling and Design, Faculty of Pharmacy, Federal University of Goiás, Goiania, Brazil
- Davide Ballabio
- Milano Chemometrics & QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
- Shannon Bell
- Integrated Laboratory Systems, LLC, Morrisville, North Carolina, USA
- Emilio Benfenati
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
- Sudin Bhattacharya
- Institute for Quantitative Health Science and Engineering, Department of Biomedical Engineering, Michigan State University, East Lansing, Michigan, USA
- Joyce V. Bastos
- Laboratory for Molecular Modeling and Design, Faculty of Pharmacy, Federal University of Goiás, Goiania, Brazil
- Stephen Boyd
- Department of Plant, Soil, and Microbial Sciences, Michigan State University, East Lansing, Michigan, USA
- J.B. Brown
- Kyoto University Graduate School of Medicine, Kyoto, Japan
- Stephen J. Capuzzi
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina, USA
- Yaroslav Chushak
- Aeromedical Research Department, Force Health Protection, USAFSAM, Dayton, Ohio, USA
- Henry M Jackson Foundation for the Advancement of Military Medicine, Dayton, Ohio, USA
- Heather Ciallella
- Center for Computational and Integrative Biology, Rutgers University, Camden, New Jersey, USA
- Alex M. Clark
- Collaborations Pharmaceuticals, Inc., Raleigh, North Carolina, USA
- Viviana Consonni
- Milano Chemometrics & QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
- Sean Ekins
- Collaborations Pharmaceuticals, Inc., Raleigh, North Carolina, USA
- Sherif Farag
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina, USA
- Maxim Fedorov
- Skoltech, Skolkovo Institute of Science and Technology, Moscow, Russia
- Denis Fourches
- Department of Chemistry, North Carolina State University, Raleigh, North Carolina, USA
- Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA
- Domenico Gadaleta
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
- Feng Gao
- Department of Plant, Soil, and Microbial Sciences, Michigan State University, East Lansing, Michigan, USA
- Jeffery M. Gearhart
- Aeromedical Research Department, Force Health Protection, USAFSAM, Dayton, Ohio, USA
- Henry M Jackson Foundation for the Advancement of Military Medicine, Dayton, Ohio, USA
- Garett Goh
- Pacific Northwest National Laboratory, Richland, Washington, USA
- Jonathan M. Goodman
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
- Francesca Grisoni
- Milano Chemometrics & QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
- Christopher M. Grulke
- Center for Computational Toxicology and Exposure, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
- Matthew Hirn
- Department of Computational Mathematics, Science & Engineering, Department of Mathematics, Michigan State University, East Lansing, Michigan, USA
- Pavel Karpov
- Institute of Structural Biology, Helmholtz Zentrum München (GmbH), Neuherberg, Germany
- Giovanna J. Lavado
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
- Xinhao Li
- Department of Chemistry, North Carolina State University, Raleigh, North Carolina, USA
- Filippo Lunghini
- Laboratoire de Chemoinformatique, URM7140, Université de Strasbourg, Strasbourg, France
- Giuseppe F. Mangiatordi
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari “Aldo Moro”, Bari, Italy
- Gilles Marcou
- Laboratoire de Chemoinformatique, URM7140, Université de Strasbourg, Strasbourg, France
- Dan Marsh
- Underwriters Laboratories, Northbrook, Illinois, USA
- Todd Martin
- Center for Computational Toxicology and Exposure, U.S. Environmental Protection Agency, Cincinnati, Ohio, USA
- Eugene N. Muratov
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina, USA
- Laboratory for Molecular Modeling and Design, Faculty of Pharmacy, Federal University of Goiás, Goiania, Brazil
- Dac-Trung Nguyen
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Orazio Nicolotti
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari “Aldo Moro”, Bari, Italy
- Reine Note
- L’Oréal Research & Innovation, Aulnay-sous-Bois, France
- Paritosh Pande
- Pacific Northwest National Laboratory, Richland, Washington, USA
- Tyler Peryea
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Robert Rallo
- Pacific Northwest National Laboratory, Richland, Washington, USA
- Alessandra Roncaglioni
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
- Patricia Ruiz
- Office of Innovation and Analytics, Agency for Toxic Substances and Disease Registry, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Daniel P. Russo
- Center for Computational and Integrative Biology, Rutgers University, Camden, New Jersey, USA
- Ahmed Sayed
- Rosettastein Consulting UG, Freising, Germany
- Risa Sayre
- Center for Computational Toxicology and Exposure, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
- Oak Ridge Institute for Science and Education (ORISE) Research Participation Program, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
- Timothy Sheils
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Charles Siegel
- Pacific Northwest National Laboratory, Richland, Washington, USA
- Arthur C. Silva
- Laboratory for Molecular Modeling and Design, Faculty of Pharmacy, Federal University of Goiás, Goiania, Brazil
- Anton Simeonov
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Sergey Sosnin
- Skoltech, Skolkovo Institute of Science and Technology, Moscow, Russia
- Noel Southall
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Judy Strickland
- Integrated Laboratory Systems, LLC, Morrisville, North Carolina, USA
- Yun Tang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Brian Teppen
- Department of Plant, Soil, and Microbial Sciences, Michigan State University, East Lansing, Michigan, USA
- Igor V. Tetko
- Institute of Structural Biology, Helmholtz Zentrum München (GmbH), Neuherberg, Germany
- BIGCHEM GmbH, Unterschleissheim, Germany
- Dennis Thomas
- Pacific Northwest National Laboratory, Richland, Washington, USA
- Roberto Todeschini
- Milano Chemometrics & QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
- Cosimo Toma
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
- Ignacio Tripodi
- Computer Science/Interdisciplinary Quantitative Biology, University of Colorado, Boulder, Colorado, USA
- Daniela Trisciuzzi
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari “Aldo Moro”, Bari, Italy
- Alexander Tropsha
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina, USA
- Alexandre Varnek
- Laboratoire de Chemoinformatique, URM7140, Université de Strasbourg, Strasbourg, France
- Kristijan Vukovic
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
- Zhongyu Wang
- School of Environmental Sciences and Technology, Dalian University of Technology; Dalian, Liaoning, China
- Liguo Wang
- School of Environmental Sciences and Technology, Dalian University of Technology; Dalian, Liaoning, China
- Andrew J. Wedlake
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
- Dan Wilson
- The Dow Chemical Company, Midland, Michigan, USA
- Zijun Xiao
- School of Environmental Sciences and Technology, Dalian University of Technology; Dalian, Liaoning, China
- Hongbin Yang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Gergely Zahoranszky-Kohalmi
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Alexey V. Zakharov
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Zhen Zhang
- Dow Agrosciences, Indianapolis, Indiana, USA
- Tongan Zhao
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Hao Zhu
- Center for Computational and Integrative Biology, Rutgers University, Camden, New Jersey, USA
- Warren Casey
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, Research Triangle Park, North Carolina, USA
- Nicole C. Kleinstreuer
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, Research Triangle Park, North Carolina, USA
35
Chushak Y, Gearhart JM, Ott D. In Silico Assessment of Acute Oral Toxicity for Mixtures. Chem Res Toxicol 2020; 34:345-354. [PMID: 33206501 DOI: 10.1021/acs.chemrestox.0c00256] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 01/10/2023]
Abstract
While exposure of humans to environmental hazards often occurs with complex chemical mixtures, the majority of existing toxicity data are for single compounds. The Globally Harmonized System of chemical classification (GHS) developed by the Organization for Economic Cooperation and Development uses the additivity formula for acute oral toxicity classification of mixtures, which is based on the acute toxicity estimate (ATE) of individual ingredients. We evaluated the prediction of GHS category classifications for mixtures using toxicological data collected in the Integrated Chemical Environment (ICE) developed by the National Toxicology Program (United States Department of Health and Human Services). The ICE database contains in vivo acute oral toxicity data for ∼10,000 chemicals and for 582 mixtures with one or multiple active ingredients. By using the available experimental data for individual ingredients, we were able to calculate a GHS category for only half of the mixtures. To expand the set of components with acute oral toxicity data, we used the Collaborative Acute Toxicity Modeling Suite (CATMoS) implemented in the Open Structure-Activity/Property Relationship App to make predictions for active ingredients without available experimental data. As a result, we were able to make predictions for 503 mixtures/formulations with 72% accuracy for the GHS classification. For 186 mixtures with two or more active ingredients, the accuracy rate was 76%. The structure-based analysis of the misclassified mixtures did not reveal any specific structural features associated with the mispredictions. Our results demonstrate that CATMoS together with an additivity formula can be used to predict the GHS category for chemical mixtures.
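In its basic GHS form, the additivity formula reads 100 / ATE_mix = Σ C_i / ATE_i, where C_i is the concentration of ingredient i (% w/w) and ATE_i its acute toxicity estimate in mg/kg body weight. A minimal Python sketch of that calculation and of the mapping onto the GHS acute oral categories follows. It covers only the basic formula (GHS also gives an adjusted formula when ingredients of unknown acute toxicity are present, which is not handled here), and the function names are illustrative. In the study summarized above, missing ATE_i values were filled with CATMoS predictions; here they are simply passed in as numbers.

```python
from typing import List, Sequence, Tuple

# Upper ATE limits (mg/kg body weight) of the GHS acute oral toxicity categories.
GHS_ORAL_CATEGORIES: List[Tuple[float, str]] = [
    (5.0, "Category 1"),
    (50.0, "Category 2"),
    (300.0, "Category 3"),
    (2000.0, "Category 4"),
    (5000.0, "Category 5"),
]


def ate_mixture(ingredients: Sequence[Tuple[float, float]]) -> float:
    """Basic GHS additivity formula: 100 / ATE_mix = sum(C_i / ATE_i).

    ingredients: (concentration C_i in % w/w, ATE_i in mg/kg) for each
    ingredient with a known (experimental or predicted) ATE.
    """
    total = 0.0
    for conc, ate in ingredients:
        if ate <= 0:
            raise ValueError("ATE values must be positive")
        total += conc / ate
    if total <= 0:
        raise ValueError("need at least one ingredient with positive concentration")
    return 100.0 / total


def ghs_oral_category(ate: float) -> str:
    """Return the first GHS category whose upper ATE limit is not exceeded."""
    for upper, label in GHS_ORAL_CATEGORIES:
        if ate <= upper:
            return label
    return "Not classified"


# Hypothetical formulation: 10 % of an ingredient with ATE 100 mg/kg and
# 20 % of one with ATE 500 mg/kg; the remaining ingredients are assumed
# non-toxic and are left out of the sum.
ate = ate_mixture([(10.0, 100.0), (20.0, 500.0)])
print(round(ate, 1), ghs_oral_category(ate))  # -> 714.3 Category 4
```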
Affiliation(s)
- Yaroslav Chushak
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Wright-Patterson Air Force Base, Dayton, Ohio 45433, United States
- Jeffery M Gearhart
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Wright-Patterson Air Force Base, Dayton, Ohio 45433, United States
- Darrin Ott
- Warfighter Medical Optimization Division, 711 Human Performance Wing, Air Force Research Laboratory, Wright-Patterson Air Force Base, Dayton, Ohio 45433, United States
36
The TTC Data Mart: An interactive browser for threshold of toxicological concern calculations. Comput Toxicol 2020. [DOI: 10.1016/j.comtox.2020.100128] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/17/2022]
37
Abstract
At the end of her academic career, the author summarizes the main aspects of QSAR modeling, giving comments and suggestions based on her 23 years of experience in QSAR research on environmental topics. The focus is mainly on Multiple Linear Regression, particularly Ordinary Least Squares, using a Genetic Algorithm for variable selection from various theoretical molecular descriptors, but the comments may also be useful for other QSAR methods. The need for rigorous validation, including external validation, and for applicability domain checks to guarantee the predictivity and reliability of QSAR models is particularly highlighted. The approach discussed is the “predictive” one, based on chemometrics, and is usefully applied to the prioritization of environmental pollutants. All the discussed points and the author's ideas are implemented in the software QSARINS, as a legacy to the QSAR community.
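For the Ordinary Least Squares models discussed in this abstract, the fitted coefficients and the leverage-based applicability domain check (the basis of the Williams plot) are commonly written as below. These are standard textbook definitions added for orientation; the 3p'/n warning threshold is the usual convention, and the exact settings used in QSARINS should be checked in its own documentation.

$$
\hat{\boldsymbol{\beta}} = \left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\mathbf{y},
\qquad
h_i = \mathbf{x}_i^{\top}\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{x}_i,
\qquad
h^{*} = \frac{3p'}{n}
$$

Here X is the n × p' matrix of training-set descriptors (including a column of ones for the intercept, so p' is the number of selected descriptors plus one), y is the vector of training responses, and x_i is the descriptor vector of compound i. A compound with h_i > h* lies outside the structural applicability domain, so its prediction is an extrapolation and should be treated with caution.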
38
Comess S, Akbay A, Vasiliou M, Hines RN, Joppa L, Vasiliou V, Kleinstreuer N. Bringing Big Data to Bear in Environmental Public Health: Challenges and Recommendations. Front Artif Intell 2020; 3. [PMID: 33184612 PMCID: PMC7654840 DOI: 10.3389/frai.2020.00031] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 01/28/2023] [Open access]
Abstract
Understanding the role that the environment plays in influencing public health often involves collecting and studying large, complex data sets. There have been a number of private and public efforts to gather sufficient information and confront significant unknowns in the field of environmental public health, yet there is a persistent and largely unmet need for findable, accessible, interoperable, and reusable (FAIR) data. Even when data are readily available, the ability to create, analyze, and draw conclusions from these data using emerging computational tools, such as augmented and artificial intelligence (AI) and machine learning, requires technical skills not currently implemented on a programmatic level across research hubs and academic institutions. We argue that collaborative efforts in data curation and storage, scientific computing, and training are of paramount importance to empower researchers within environmental sciences and the broader public health community to apply AI approaches and fully realize their potential. Leaders in the field were asked to prioritize challenges in incorporating big data in environmental public health research: inconsistent implementation of FAIR principles in data collection and sharing, a lack of skilled data scientists and appropriate cyber-infrastructures, and limited understanding of possibilities and communication of benefits were among those identified. These issues are discussed, and actionable recommendations are provided.
Affiliation(s)
- Saskia Comess
- Department of Environmental Health Sciences, Yale School of Public Health, New Haven, CT, United States; Department of Statistics and Data Science, Yale University, New Haven, CT, United States
- Alexia Akbay
- Department of Environmental Health Sciences, Yale School of Public Health, New Haven, CT, United States; Symbrosia Inc, Kailua-Kona, HI, United States
- Melpomene Vasiliou
- Department of Environmental Health Sciences, Yale School of Public Health, New Haven, CT, United States
- Ronald N Hines
- US Environmental Protection Agency, Center for Public Health and Environmental Assessment, Research Triangle Park, NC, United States
- Lucas Joppa
- Microsoft Corporation, AI for Earth, Redmond, WA, United States
- Vasilis Vasiliou
- Department of Environmental Health Sciences, Yale School of Public Health, New Haven, CT, United States
- Nicole Kleinstreuer
- Department of Environmental Health Sciences, Yale School of Public Health, New Haven, CT, United States; National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, National Institute of Environmental Health Sciences, Research Triangle Park, NC, United States
39
Cendoya X, Quevedo C, Ipiñazar M, Planes FJ. Computational approach for collection and prediction of molecular initiating events in developmental toxicity. Reprod Toxicol 2020; 94:55-64. [PMID: 32344110 DOI: 10.1016/j.reprotox.2020.03.010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/12/2019] [Revised: 03/04/2020] [Accepted: 03/20/2020] [Indexed: 02/06/2023]
Abstract
Developmental toxicity is defined as the occurrence of adverse effects on the developing organism as a result of exposure to a toxic agent. These alterations can have long-term acute effects. Current in vitro models present important limitations, and the evaluation of toxicity is not entirely objective. In silico methods have also shown limited success, in part due to the complex and varied mechanisms of action that mediate developmental toxicity, which are sometimes poorly understood. In this article, we compiled a dataset of compounds with developmental toxicity categories and annotated mechanisms of action for both toxic and non-toxic compounds (DVTOX). With it, we selected a panel of protein targets that might be part of putative Molecular Initiating Events (MIEs) of Adverse Outcome Pathways of developmental toxicity. The validity of this list of candidate MIEs was studied through the evaluation of new drug-target relationships that include such proteins but were not part of the original database. Finally, an orthology analysis of this protein panel was conducted to select an appropriate animal model to assess developmental toxicity. We tested our approach using the zebrafish embryo toxicity test, finding positive results.
Affiliation(s)
- Xabier Cendoya
- TECNUN, University of Navarra, San Sebastian, 20018, Spain
40
Hemmerich J, Ecker GF. In silico toxicology: From structure–activity relationships towards deep learning and adverse outcome pathways. Wiley Interdiscip Rev Comput Mol Sci 2020; 10:e1475. [PMID: 35866138 PMCID: PMC9286356 DOI: 10.1002/wcms.1475] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 01/07/2020] [Revised: 03/09/2020] [Accepted: 03/10/2020] [Indexed: 12/18/2022]
Abstract
In silico toxicology is an emerging field. It is gaining importance as research aims to reduce the use of animal experiments, as suggested by the 3R principles of Russell and Burch. In silico toxicology is a means to identify hazards of compounds before synthesis, and thus at very early stages of drug development. For the chemical industry as well as regulatory agencies, it can aid in gap-filling and guide risk-minimization strategies. Techniques such as structural alerts, read-across, quantitative structure–activity relationships, machine learning, and deep learning allow in silico toxicology to be applied in many cases, some even when data are scarce. The concept of adverse outcome pathways in particular places all of these techniques in a broader context and can explain predictions through mechanistic insight. This article is categorized under: Structure and Mechanism > Computational Biochemistry and Biophysics; Data Science > Chemoinformatics
Affiliation(s)
- Jennifer Hemmerich
- Department of Pharmaceutical Chemistry University of Vienna Vienna Austria
- Gerhard F. Ecker
- Department of Pharmaceutical Chemistry University of Vienna Vienna Austria
41
Mansouri K, Kleinstreuer N, Abdelaziz AM, Alberga D, Alves VM, Andersson PL, Andrade CH, Bai F, Balabin I, Ballabio D, Benfenati E, Bhhatarai B, Boyer S, Chen J, Consonni V, Farag S, Fourches D, García-Sosa AT, Gramatica P, Grisoni F, Grulke CM, Hong H, Horvath D, Hu X, Huang R, Jeliazkova N, Li J, Li X, Liu H, Manganelli S, Mangiatordi GF, Maran U, Marcou G, Martin T, Muratov E, Nguyen DT, Nicolotti O, Nikolov NG, Norinder U, Papa E, Petitjean M, Piir G, Pogodin P, Poroikov V, Qiao X, Richard AM, Roncaglioni A, Ruiz P, Rupakheti C, Sakkiah S, Sangion A, Schramm KW, Selvaraj C, Shah I, Sild S, Sun L, Taboureau O, Tang Y, Tetko IV, Todeschini R, Tong W, Trisciuzzi D, Tropsha A, Van Den Driessche G, Varnek A, Wang Z, Wedebye EB, Williams AJ, Xie H, Zakharov AV, Zheng Z, Judson RS. CoMPARA: Collaborative Modeling Project for Androgen Receptor Activity. Environmental Health Perspectives 2020; 128:27002. [PMID: 32074470 DOI: 10.23645/epacomptox.5176876] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/24/2023]
Abstract
BACKGROUND: Endocrine disrupting chemicals (EDCs) are xenobiotics that mimic the interaction of natural hormones and alter synthesis, transport, or metabolic pathways. The prospect of EDCs causing adverse health effects in humans and wildlife has led to the development of scientific and regulatory approaches for evaluating bioactivity. This need is being addressed using high-throughput screening (HTS) in vitro approaches and computational modeling.
OBJECTIVES: In support of the Endocrine Disruptor Screening Program, the U.S. Environmental Protection Agency (EPA) led two worldwide consortiums to virtually screen chemicals for their potential estrogenic and androgenic activities. Here, we describe the Collaborative Modeling Project for Androgen Receptor Activity (CoMPARA) efforts, which follow the steps of the Collaborative Estrogen Receptor Activity Prediction Project (CERAPP).
METHODS: The CoMPARA list of screened chemicals built on CERAPP's list of 32,464 chemicals to include additional chemicals of interest, as well as simulated ToxCast™ metabolites, totaling 55,450 chemical structures. Computational toxicology scientists from 25 international groups contributed 91 predictive models for binding, agonist, and antagonist activity predictions. Models were underpinned by a common training set of 1,746 chemicals compiled from a combined data set of 11 ToxCast™/Tox21 HTS in vitro assays.
RESULTS: The resulting models were evaluated using curated literature data extracted from different sources. To overcome the limitations of single-model approaches, CoMPARA predictions were combined into consensus models that provided an averaged predictive accuracy of approximately 80% for the evaluation set.
DISCUSSION: The strengths and limitations of the consensus predictions were discussed with example chemicals; then, the models were implemented into the free and open-source OPERA application to enable screening of new chemicals with a defined applicability domain and accuracy assessment. This implementation was used to screen the entire EPA DSSTox database of ∼875,000 chemicals, and their predicted AR activities have been made available on the EPA CompTox Chemicals dashboard and National Toxicology Program's Integrated Chemical Environment. https://doi.org/10.1289/EHP5580.
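The consensus step in the RESULTS can be illustrated with a minimal sketch. It assumes a simple unweighted majority vote over binary active/inactive calls, scored by plain accuracy against reference labels. The abstract does not specify how the consortium weighted individual models, and the chemicals and calls below are hypothetical.

```python
from collections import Counter

def consensus_call(calls: list[str]) -> str:
    """Majority vote over individual model calls ('active' / 'inactive').

    Ties resolve to 'inactive'; this tie rule is an illustrative assumption.
    """
    counts = Counter(calls)
    return "active" if counts["active"] > counts["inactive"] else "inactive"

def accuracy(predicted: list[str], observed: list[str]) -> float:
    """Fraction of chemicals whose consensus call matches the reference label."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)

# Hypothetical calls from three models for four chemicals (not CoMPARA data).
model_calls = {
    "chem_1": ["active", "active", "inactive"],
    "chem_2": ["inactive", "inactive", "inactive"],
    "chem_3": ["active", "inactive", "inactive"],
    "chem_4": ["active", "active", "active"],
}
reference = {"chem_1": "active", "chem_2": "inactive", "chem_3": "active", "chem_4": "active"}

consensus = {chem: consensus_call(calls) for chem, calls in model_calls.items()}
acc = accuracy([consensus[c] for c in reference], list(reference.values()))
print(consensus, acc)
```

Pooling many models smooths out individual model errors. A single dissenting model, as for `chem_1`, does not change the consensus call.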
Affiliation(s)
- Kamel Mansouri
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
- ScitoVation LLC, Research Triangle Park, North Carolina, USA
- Integrated Laboratory Systems, Inc., Morrisville, North Carolina, USA
- Nicole Kleinstreuer
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods (NICEATM), National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Ahmed M Abdelaziz
- Technische Universität München, Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt, Department für Biowissenschaftliche Grundlagen, Weihenstephaner Steig 23, 85350 Freising, Germany
- Domenico Alberga
- Department of Pharmacy-Drug Sciences, University of Bari, Bari, Italy
- Vinicius M Alves
- Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goiás, Goiânia, Brazil
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Carolina H Andrade
- Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goiás, Goiânia, Brazil
- Fang Bai
- School of Pharmacy, Lanzhou University, China
- Ilya Balabin
- Information Systems & Global Solutions (IS&GS), Lockheed Martin, USA
- Davide Ballabio
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
- Emilio Benfenati
- Istituto di Ricerche Farmacologiche "Mario Negri", IRCCS, Milan, Italy
- Barun Bhhatarai
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
- Scott Boyer
- Swedish Toxicology Sciences Research Center, Karolinska Institutet, Södertälje, Sweden
- Jingwen Chen
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
- Viviana Consonni
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
- Sherif Farag
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Denis Fourches
- Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA
- Paola Gramatica
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
- Francesca Grisoni
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
- Chris M Grulke
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
- Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicology Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
- Dragos Horvath
- Laboratoire de Chémoinformatique-UMR7140, University of Strasbourg/CNRS, Strasbourg, France
- Xin Hu
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Ruili Huang
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Jiazhong Li
- School of Pharmacy, Lanzhou University, China
- Xuehua Li
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
- Serena Manganelli
- Istituto di Ricerche Farmacologiche "Mario Negri", IRCCS, Milan, Italy
- Uko Maran
- Institute of Chemistry, University of Tartu, Tartu, Estonia
- Gilles Marcou
- Laboratoire de Chémoinformatique-UMR7140, University of Strasbourg/CNRS, Strasbourg, France
- Todd Martin
- National Risk Management Research Laboratory, U.S. EPA, Cincinnati, Ohio, USA
- Eugene Muratov
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Dac-Trung Nguyen
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Orazio Nicolotti
- Department of Pharmacy-Drug Sciences, University of Bari, Bari, Italy
- Nikolai G Nikolov
- Division of Risk Assessment and Nutrition, National Food Institute, Technical University of Denmark, Copenhagen, Denmark
- Ulf Norinder
- Swedish Toxicology Sciences Research Center, Karolinska Institutet, Södertälje, Sweden
- Ester Papa
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
- Michel Petitjean
- Computational Modeling of Protein-Ligand Interactions (CMPLI)-INSERM UMR 8251, INSERM ERL U1133, Functional and Adaptative Biology (BFA), Universite de Paris, Paris, France
- Geven Piir
- Institute of Chemistry, University of Tartu, Tartu, Estonia
- Pavel Pogodin
- Institute of Biomedical Chemistry IBMC, 10 Building 8, Pogodinskaya st., Moscow 119121, Russia
- Vladimir Poroikov
- Institute of Biomedical Chemistry IBMC, 10 Building 8, Pogodinskaya st., Moscow 119121, Russia
- Xianliang Qiao
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
- Ann M Richard
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
- Patricia Ruiz
- Computational Toxicology and Methods Development Laboratory, Division of Toxicology and Human Health Sciences, Agency for Toxic Substances and Disease Registry, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Chetan Rupakheti
- National Risk Management Research Laboratory, U.S. EPA, Cincinnati, Ohio, USA
- Department of Biochemistry and Molecular Biophysics, University of Chicago, Chicago, Illinois, USA
- Sugunadevi Sakkiah
- Division of Bioinformatics and Biostatistics, National Center for Toxicology Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
- Alessandro Sangion
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
- Karl-Werner Schramm
- Technische Universität München, Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt, Department für Biowissenschaftliche Grundlagen, Weihenstephaner Steig 23, 85350 Freising, Germany
- Chandrabose Selvaraj
- Division of Bioinformatics and Biostatistics, National Center for Toxicology Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
- Imran Shah
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
- Sulev Sild
- Institute of Chemistry, University of Tartu, Tartu, Estonia
- Lixia Sun
- Department of Pharmaceutical Sciences, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Olivier Taboureau
- Computational Modeling of Protein-Ligand Interactions (CMPLI)-INSERM UMR 8251, INSERM ERL U1133, Functional and Adaptative Biology (BFA), Universite de Paris, Paris, France
- Yun Tang
- Department of Pharmaceutical Sciences, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Igor V Tetko
- BIGCHEM GmbH, Neuherberg, Germany
- Helmholtz Zentrum Muenchen - German Research Center for Environmental Health (GmbH), Neuherberg, Germany
- Roberto Todeschini
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
- Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicology Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
- Alexander Tropsha
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- George Van Den Driessche
- Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA
- Alexandre Varnek
- Laboratoire de Chémoinformatique-UMR7140, University of Strasbourg/CNRS, Strasbourg, France
- Zhongyu Wang
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
- Eva B Wedebye
- Division of Risk Assessment and Nutrition, National Food Institute, Technical University of Denmark, Copenhagen, Denmark
- Antony J Williams
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
- Hongbin Xie
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
- Alexey V Zakharov
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Ziye Zheng
- Chemistry Department, Umeå University, Umeå, Sweden
- Richard S Judson
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
42
Fantke P, Aurisano N, Provoost J, Karamertzanis PG, Hauschild M. Toward effective use of REACH data for science and policy. Environment International 2020; 135:105336. [PMID: 31884133 DOI: 10.1016/j.envint.2019.105336] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/18/2019] [Revised: 11/15/2019] [Accepted: 11/15/2019] [Indexed: 06/10/2023]
Affiliation(s)
- Peter Fantke
- Quantitative Sustainability Assessment, Department of Technology, Management and Economics, Technical University of Denmark, Produktionstorvet 424, 2800 Kgs. Lyngby, Denmark
- Nicolò Aurisano
- Quantitative Sustainability Assessment, Department of Technology, Management and Economics, Technical University of Denmark, Produktionstorvet 424, 2800 Kgs. Lyngby, Denmark
- Jeroen Provoost
- Computational Assessment Unit, Directorate of Prioritisation and Integration, European Chemicals Agency, Annankatu 18, 00121 Helsinki, Finland
- Panagiotis G Karamertzanis
- Computational Assessment Unit, Directorate of Prioritisation and Integration, European Chemicals Agency, Annankatu 18, 00121 Helsinki, Finland
- Michael Hauschild
- Quantitative Sustainability Assessment, Department of Technology, Management and Economics, Technical University of Denmark, Produktionstorvet 424, 2800 Kgs. Lyngby, Denmark
43
Mansouri K, Kleinstreuer N, Abdelaziz AM, Alberga D, Alves VM, Andersson PL, Andrade CH, Bai F, Balabin I, Ballabio D, Benfenati E, Bhhatarai B, Boyer S, Chen J, Consonni V, Farag S, Fourches D, García-Sosa AT, Gramatica P, Grisoni F, Grulke CM, Hong H, Horvath D, Hu X, Huang R, Jeliazkova N, Li J, Li X, Liu H, Manganelli S, Mangiatordi GF, Maran U, Marcou G, Martin T, Muratov E, Nguyen DT, Nicolotti O, Nikolov NG, Norinder U, Papa E, Petitjean M, Piir G, Pogodin P, Poroikov V, Qiao X, Richard AM, Roncaglioni A, Ruiz P, Rupakheti C, Sakkiah S, Sangion A, Schramm KW, Selvaraj C, Shah I, Sild S, Sun L, Taboureau O, Tang Y, Tetko IV, Todeschini R, Tong W, Trisciuzzi D, Tropsha A, Van Den Driessche G, Varnek A, Wang Z, Wedebye EB, Williams AJ, Xie H, Zakharov AV, Zheng Z, Judson RS. CoMPARA: Collaborative Modeling Project for Androgen Receptor Activity. Environmental Health Perspectives 2020; 128:27002. [PMID: 32074470 PMCID: PMC7064318 DOI: 10.1289/ehp5580] [Citation(s) in RCA: 100] [Impact Index Per Article: 25.0] [Received: 05/06/2019] [Revised: 11/27/2019] [Accepted: 12/05/2019] [Indexed: 05/04/2023]
Abstract
BACKGROUND: Endocrine disrupting chemicals (EDCs) are xenobiotics that mimic the interaction of natural hormones and alter synthesis, transport, or metabolic pathways. The prospect of EDCs causing adverse health effects in humans and wildlife has led to the development of scientific and regulatory approaches for evaluating bioactivity. This need is being addressed using high-throughput screening (HTS) in vitro approaches and computational modeling.
OBJECTIVES: In support of the Endocrine Disruptor Screening Program, the U.S. Environmental Protection Agency (EPA) led two worldwide consortiums to virtually screen chemicals for their potential estrogenic and androgenic activities. Here, we describe the Collaborative Modeling Project for Androgen Receptor Activity (CoMPARA) efforts, which follow the steps of the Collaborative Estrogen Receptor Activity Prediction Project (CERAPP).
METHODS: The CoMPARA list of screened chemicals built on CERAPP's list of 32,464 chemicals to include additional chemicals of interest, as well as simulated ToxCast™ metabolites, totaling 55,450 chemical structures. Computational toxicology scientists from 25 international groups contributed 91 predictive models for binding, agonist, and antagonist activity predictions. Models were underpinned by a common training set of 1,746 chemicals compiled from a combined data set of 11 ToxCast™/Tox21 HTS in vitro assays.
RESULTS: The resulting models were evaluated using curated literature data extracted from different sources. To overcome the limitations of single-model approaches, CoMPARA predictions were combined into consensus models that provided an averaged predictive accuracy of approximately 80% for the evaluation set.
DISCUSSION: The strengths and limitations of the consensus predictions were discussed with example chemicals; then, the models were implemented into the free and open-source OPERA application to enable screening of new chemicals with a defined applicability domain and accuracy assessment. This implementation was used to screen the entire EPA DSSTox database of ∼875,000 chemicals, and their predicted AR activities have been made available on the EPA CompTox Chemicals dashboard and National Toxicology Program's Integrated Chemical Environment. https://doi.org/10.1289/EHP5580.
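The abstract mentions screening with a defined applicability domain. One common way to define such a domain is a k-nearest-neighbour distance check in descriptor space, sketched below. This is an illustrative assumption, not necessarily the exact procedure implemented in OPERA, and the two-descriptor training set is hypothetical.

```python
import math

Point = tuple[float, ...]

def mean_knn_distance(x: Point, points: list[Point], k: int) -> float:
    """Mean Euclidean distance from x to its k nearest neighbours among points."""
    distances = sorted(math.dist(x, p) for p in points)
    return sum(distances[:k]) / k

def ad_threshold(train: list[Point], k: int = 2, quantile: float = 0.95) -> float:
    """Cut-off taken from the training set's own leave-one-out kNN distances."""
    scores = sorted(
        mean_knn_distance(t, train[:i] + train[i + 1:], k) for i, t in enumerate(train)
    )
    index = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[index]

def in_domain(x: Point, train: list[Point], k: int = 2, quantile: float = 0.95) -> bool:
    """True if x is no farther from its k nearest training compounds than the cut-off."""
    return mean_knn_distance(x, train, k) <= ad_threshold(train, k, quantile)

# Hypothetical training set described by two scaled descriptors.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(in_domain((0.5, 0.5), train), in_domain((5.0, 5.0), train))
```

Deriving the cut-off from the training compounds' own neighbour distances ties the domain to how densely the training space is sampled. Predictions for chemicals outside it would be flagged as less reliable.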
Affiliation(s)
- Kamel Mansouri
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
- ScitoVation LLC, Research Triangle Park, North Carolina, USA
- Integrated Laboratory Systems, Inc., Morrisville, North Carolina, USA
- Nicole Kleinstreuer
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods (NICEATM), National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Ahmed M. Abdelaziz
- Technische Universität München, Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt, Department für Biowissenschaftliche Grundlagen, Weihenstephaner Steig 23, 85350 Freising, Germany
- Domenico Alberga
- Department of Pharmacy-Drug Sciences, University of Bari, Bari, Italy
- Vinicius M. Alves
- Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goiás, Goiânia, Brazil
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Carolina H. Andrade
- Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goiás, Goiânia, Brazil
- Fang Bai
- School of Pharmacy, Lanzhou University, China
- Ilya Balabin
- Information Systems & Global Solutions (IS&GS), Lockheed Martin, USA
- Davide Ballabio
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
- Emilio Benfenati
- Istituto di Ricerche Farmacologiche "Mario Negri", IRCCS, Milan, Italy
- Barun Bhhatarai
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
- Scott Boyer
- Swedish Toxicology Sciences Research Center, Karolinska Institutet, Södertälje, Sweden
- Jingwen Chen
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
- Viviana Consonni
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
- Sherif Farag
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Denis Fourches
- Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA
- Paola Gramatica
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
- Francesca Grisoni
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
- Chris M. Grulke
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
- Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicology Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
- Dragos Horvath
- Laboratoire de Chémoinformatique-UMR7140, University of Strasbourg/CNRS, Strasbourg, France
- Xin Hu
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Ruili Huang
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Jiazhong Li
- School of Pharmacy, Lanzhou University, China
- Xuehua Li
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
- Serena Manganelli
- Istituto di Ricerche Farmacologiche "Mario Negri", IRCCS, Milan, Italy
- Uko Maran
- Institute of Chemistry, University of Tartu, Tartu, Estonia
- Gilles Marcou
- Laboratoire de Chémoinformatique-UMR7140, University of Strasbourg/CNRS, Strasbourg, France
- Todd Martin
- National Risk Management Research Laboratory, U.S. EPA, Cincinnati, Ohio, USA
- Eugene Muratov
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Dac-Trung Nguyen
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Orazio Nicolotti
- Department of Pharmacy-Drug Sciences, University of Bari, Bari, Italy
- Nikolai G. Nikolov
- Division of Risk Assessment and Nutrition, National Food Institute, Technical University of Denmark, Copenhagen, Denmark
- Ulf Norinder
- Swedish Toxicology Sciences Research Center, Karolinska Institutet, Södertälje, Sweden
- Ester Papa
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
- Michel Petitjean
- Computational Modeling of Protein-Ligand Interactions (CMPLI)-INSERM UMR 8251, INSERM ERL U1133, Functional and Adaptative Biology (BFA), Universite de Paris, Paris, France
- Geven Piir
- Institute of Chemistry, University of Tartu, Tartu, Estonia
- Pavel Pogodin
- Institute of Biomedical Chemistry IBMC, 10 Building 8, Pogodinskaya st., Moscow 119121, Russia
- Vladimir Poroikov
- Institute of Biomedical Chemistry IBMC, 10 Building 8, Pogodinskaya st., Moscow 119121, Russia
- Xianliang Qiao
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
- Ann M. Richard
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
- Patricia Ruiz
- Computational Toxicology and Methods Development Laboratory, Division of Toxicology and Human Health Sciences, Agency for Toxic Substances and Disease Registry, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Chetan Rupakheti
- National Risk Management Research Laboratory, U.S. EPA, Cincinnati, Ohio, USA
- Department of Biochemistry and Molecular Biophysics, University of Chicago, Chicago, Illinois, USA
- Sugunadevi Sakkiah
- Division of Bioinformatics and Biostatistics, National Center for Toxicology Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
- Alessandro Sangion
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
- Karl-Werner Schramm
- Technische Universität München, Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt, Department für Biowissenschaftliche Grundlagen, Weihenstephaner Steig 23, 85350 Freising, Germany
- Chandrabose Selvaraj
- Division of Bioinformatics and Biostatistics, National Center for Toxicology Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
- Imran Shah
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
- Sulev Sild
- Institute of Chemistry, University of Tartu, Tartu, Estonia
- Lixia Sun
- Department of Pharmaceutical Sciences, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Olivier Taboureau
- Computational Modeling of Protein-Ligand Interactions (CMPLI)-INSERM UMR 8251, INSERM ERL U1133, Functional and Adaptative Biology (BFA), Universite de Paris, Paris, France
- Yun Tang
- Department of Pharmaceutical Sciences, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Igor V. Tetko
- BIGCHEM GmbH, Neuherberg, Germany
- Helmholtz Zentrum Muenchen - German Research Center for Environmental Health (GmbH), Neuherberg, Germany
- Roberto Todeschini
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
- Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicology Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
- Alexander Tropsha
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- George Van Den Driessche
- Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA
- Alexandre Varnek
- Laboratoire de Chémoinformatique-UMR7140, University of Strasbourg/CNRS, Strasbourg, France
- Zhongyu Wang
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
- Eva B. Wedebye
- Division of Risk Assessment and Nutrition, National Food Institute, Technical University of Denmark, Copenhagen, Denmark
- Antony J. Williams
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
- Hongbin Xie
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
- Alexey V. Zakharov
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Ziye Zheng
- Chemistry Department, Umeå University, Umeå, Sweden
- Richard S. Judson
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
44
Ambure P, Cordeiro MNDS. Importance of Data Curation in QSAR Studies Especially While Modeling Large-Size Datasets. Methods in Pharmacology and Toxicology 2020. [DOI: 10.1007/978-1-0716-0150-1_5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/16/2022]
45
Krewski D, Andersen ME, Tyshenko MG, Krishnan K, Hartung T, Boekelheide K, Wambaugh JF, Jones D, Whelan M, Thomas R, Yauk C, Barton-Maclaren T, Cote I. Toxicity testing in the 21st century: progress in the past decade and future perspectives. Arch Toxicol 2019; 94:1-58. [DOI: 10.1007/s00204-019-02613-4] [Citation(s) in RCA: 124] [Impact Index Per Article: 24.8] [Received: 04/25/2019] [Accepted: 11/05/2019] [Indexed: 12/19/2022]
46
Sheffield TY, Judson RS. Ensemble QSAR Modeling to Predict Multispecies Fish Toxicity Lethal Concentrations and Points of Departure. Environmental Science & Technology 2019; 53:12793-12802. [PMID: 31560848 PMCID: PMC7047609 DOI: 10.1021/acs.est.9b03957] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 06/10/2023]
Abstract
QSAR modeling can be used to aid testing prioritization of the thousands of chemical substances for which no ecological toxicity data are available. We drew on the U.S. Environmental Protection Agency's ECOTOX database, with additional data from ECHA, to build a large data set containing in vivo test data on fish for thousands of chemical substances. This data set was used to create QSAR models to predict two types of end points: acute LC50 (median lethal concentration) and points of departure similar to the NOEC (no observed effect concentration) for any duration (named the "LC50" and "NOEC" models, respectively). These models used study covariates, such as species and exposure route, as features to facilitate the simultaneous use of varied data types. A novel method of substituting taxonomy groups for species dummy variables was introduced to maximize generalizability to different species. A stacked ensemble of three machine learning methods (random forest, gradient boosted trees, and support vector regression) was implemented to make the best use of a large data set with many descriptors. The LC50 and NOEC models predicted end points within 1 order of magnitude 81% and 76% of the time, respectively, and had RMSEs of roughly 0.83 and 0.98 log10(mg/L), respectively. Benchmarks against the existing TEST and ECOSAR tools suggest improved prediction accuracy.
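The two performance measures quoted above can be computed as in the minimal sketch below:

- the share of predictions within one order of magnitude;
- the RMSE in log10(mg/L).

It assumes "within 1 order of magnitude" means an absolute log10 error of at most 1, and it uses made-up concentrations rather than ECOTOX data.

```python
import math

def log_error_metrics(predicted_mg_l: list[float], observed_mg_l: list[float]) -> tuple[float, float]:
    """Return (fraction within one order of magnitude, RMSE) on the log10(mg/L) scale."""
    errors = [math.log10(p) - math.log10(o) for p, o in zip(predicted_mg_l, observed_mg_l)]
    within_one_log = sum(abs(e) <= 1.0 for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return within_one_log, rmse

# Hypothetical predicted and observed LC50 values in mg/L.
predicted = [5.0, 1.0, 0.02, 100.0]
observed = [1.0, 1.0, 0.01, 0.1]
within, rmse = log_error_metrics(predicted, observed)
print(within, round(rmse, 3))
```

Working on the log scale makes a tenfold over-prediction and a tenfold under-prediction count equally. A single large miss (here, three orders of magnitude) inflates the RMSE far more than it lowers the within-one-order share.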
Affiliation(s)
- Thomas Y. Sheffield
- U.S. Department of Energy, Oak Ridge Institute for Science and Education, Oak Ridge, TN, 37830, USA
- Richard S. Judson
- U.S. Environmental Protection Agency, National Center for Computational Toxicology, Research Triangle Park, NC, 27709, USA
47
Nelms MD, Lougee R, Roberts DW, Richard A, Patlewicz G. Comparing and contrasting the coverage of publicly available structural alerts for protein binding. Computational Toxicology (Amsterdam, Netherlands) 2019; 12:1-13. [PMID: 37701288 PMCID: PMC10494887 DOI: 10.1016/j.comtox.2019.100100] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 09/12/2023]
Abstract
The molecular initiating event for many mechanisms of toxicological action comprise the reactive, covalent binding between an exogenous electrophile and an endogenous nucleophile. The target sites for electrophiles are typically peptides, proteins, enzymes or DNA. Of these, the formation of covalent adducts with proteins and DNA are perhaps the most established as they are most closely associated with skin sensitisation and genotoxicity endpoints. As such, being able to identify electrophilic features within a chemical structure provides a starting point to characterise its reactivity profile. There are a number of software tools that have been developed to help identify structural features indicative of electrophilic reactive potential to address various purposes, including: 1) to facilitate category formation for read-across of toxicity effects such as skin sensitisation potential, as well as 2) to profile substances to identify potential confounding factors to rationalise their activity in high-throughput screening (HTS) assays. Here, three such schemes that have been published in the literature as collections of SMARTS patterns and their associated chemical-biological reaction domains have been compared. The goals are 1) to better understand their scope and coverage, and 2) to assess their performance relative to a published skin sensitisation dataset where manual annotations to assign likely mechanistic domains based on expert judgement were already available. The 3 schemes were then applied to the Tox21 library and the consensus outcome was reported to highlight the proportion of chemicals likely to exhibit a reactivity response, specific to a mechanistic reaction domain, but non-specific with respect to target-tissue based activity. ToxPrint fingerprints were computed and activity enrichments computed to compare the structural features identified for the skin sensitisation dataset and Tox21 chemicals for each 'consensus' reaction domain. 
Enriched ToxPrints were also used to identify ToxCast assays potentially informative for reactivity.
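The abstract does not state how the ToxPrint "activity enrichments" were quantified. One common way to score whether a chemotype fragment is over-represented among active chemicals is an odds ratio together with a one-sided Fisher exact test on a 2×2 contingency table; the sketch below assumes that approach. The function name `enrichment` and the counts in the usage note are hypothetical and are not taken from the paper.

```python
from math import comb


def enrichment(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Odds ratio and one-sided Fisher exact p-value for a 2x2 table.

    Hypothetical helper (assumed method, not the paper's code):
      a: chemicals WITH the fragment that are active
      b: chemicals WITH the fragment that are inactive
      c: chemicals WITHOUT the fragment that are active
      d: chemicals WITHOUT the fragment that are inactive
    """
    # Odds ratio (a*d)/(b*c); undefined or infinite when b*c is zero.
    if b * c == 0:
        odds_ratio = float("inf") if a * d > 0 else float("nan")
    else:
        odds_ratio = (a * d) / (b * c)

    # With the table margins fixed, the number of fragment-bearing actives
    # follows a hypergeometric distribution; sum its upper tail from a.
    n_frag = a + b          # chemicals carrying the fragment
    n_active = a + c        # active chemicals
    n_total = a + b + c + d
    tail = sum(
        comb(n_active, k) * comb(n_total - n_active, n_frag - k)
        for k in range(a, min(n_frag, n_active) + 1)
    )
    p_value = tail / comb(n_total, n_frag)
    return odds_ratio, p_value
```

For example, `enrichment(8, 2, 1, 9)` describes a fragment found in 8 of 9 actives and 2 of 11 inactives, giving an odds ratio of 36.0 and p = 506/184756 (about 0.0027).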
Affiliation(s)
- Mark D. Nelms
  - Oak Ridge Institute for Science and Education (ORISE), 1299 Bethel Valley Road, Oak Ridge, TN 37830, USA
  - National Center for Computational Toxicology (NCCT), Office of Research and Development, US Environmental Protection Agency (US EPA), 109 TW Alexander Dr, Research Triangle Park (RTP), NC 27711, USA
- Ryan Lougee
  - Oak Ridge Institute for Science and Education (ORISE), 1299 Bethel Valley Road, Oak Ridge, TN 37830, USA
  - National Center for Computational Toxicology (NCCT), Office of Research and Development, US Environmental Protection Agency (US EPA), 109 TW Alexander Dr, Research Triangle Park (RTP), NC 27711, USA
- David W. Roberts
  - School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF, UK
- Ann Richard
  - National Center for Computational Toxicology (NCCT), Office of Research and Development, US Environmental Protection Agency (US EPA), 109 TW Alexander Dr, Research Triangle Park (RTP), NC 27711, USA
- Grace Patlewicz
  - National Center for Computational Toxicology (NCCT), Office of Research and Development, US Environmental Protection Agency (US EPA), 109 TW Alexander Dr, Research Triangle Park (RTP), NC 27711, USA
48
Grulke CM, Williams AJ, Thillanadarajah I, Richard AM. EPA's DSSTox database: History of development of a curated chemistry resource supporting computational toxicology research. COMPUTATIONAL TOXICOLOGY 2019; 12:100096. [PMID: 33426407 PMCID: PMC7787967 DOI: 10.1016/j.comtox.2019.100096] [Citation(s) in RCA: 94] [Impact Index Per Article: 18.8] [Indexed: 12/13/2022]
Abstract
The US Environmental Protection Agency's (EPA) Distributed Structure-Searchable Toxicity (DSSTox) database, launched publicly in 2004, currently exceeds 875,000 substances spanning hundreds of lists of interest to EPA and environmental researchers. From its inception, DSSTox has focused curation efforts on resolving chemical identifier errors and conflicts in the public domain, with the goal of assigning accurate chemical structures to data and lists of importance to the environmental research and regulatory community. Accurate structure-data associations, in turn, are necessary inputs to structure-based predictive models supporting hazard and risk assessments. In 2014, the legacy, manually curated DSSTox_V1 content was migrated to a MySQL data model, with modern cheminformatics tools supporting both manual and automated curation processes to increase efficiency. This was followed by sequential auto-loads of filtered portions of three public datasets: EPA's Substance Registry Services (SRS), the National Library of Medicine's ChemID, and PubChem. This process was constrained by a key requirement of uniquely mapped identifiers (i.e., CAS RN, name and structure) for each substance, rejecting content where any two identifiers were in conflict either within or across datasets. This rejected content highlighted the degree of conflicting, inaccurate substance-structure ID mappings in the public domain, ranging from 12% (within EPA SRS) to 49% (across ChemID and PubChem). Substances successfully added to DSSTox from each auto-load were assigned to one of five qc_levels, conveying curator confidence in each dataset. This process enabled a significant expansion of DSSTox content to provide better coverage of the chemical landscape of interest to environmental scientists, while retaining focus on the accuracy of substance-structure-data associations.
Currently, DSSTox serves as the core foundation of EPA's CompTox Chemicals Dashboard [https://comptox.epa.gov/dashboard], which provides public access to DSSTox content in support of a broad range of modeling and research activities within EPA and, increasingly, across the field of computational toxicology.
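The uniqueness constraint on CAS RN, name and structure can be illustrated as a filter over pooled records (from one dataset, or several datasets combined to check conflicts across them). This is a simplified sketch, not the DSSTox implementation: it assumes each record is a dict with hypothetical keys `casrn`, `name` and `structure`, treats an identifier value that co-occurs with more than one value of another identifier as a conflict, and rejects every record involved in such a conflict. The function name `split_conflicts` is also hypothetical.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical field names for the three identifiers.
FIELDS = ("casrn", "name", "structure")


def split_conflicts(records):
    """Partition records into (accepted, rejected) under a one-to-one ID rule."""
    # For every ordered pair of identifier fields, collect which values of the
    # second field co-occur with each value of the first field.
    links = {pair: defaultdict(set) for pair in permutations(FIELDS, 2)}
    for rec in records:
        for src, dst in links:
            links[(src, dst)][rec[src]].add(rec[dst])

    accepted, rejected = [], []
    for rec in records:
        # A record is conflicted if any of its identifiers maps to more than
        # one distinct value of another identifier.
        conflicted = any(
            len(links[(src, dst)][rec[src]]) > 1 for src, dst in links
        )
        (rejected if conflicted else accepted).append(rec)
    return accepted, rejected
```

For example, a formaldehyde record (`50-00-0`, `formaldehyde`, `C=O`) is accepted, while two ethanol records sharing CAS RN `64-17-5` but carrying different structures (`CCO` and a mistaken `CCCO`) are both rejected.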
Affiliation(s)
- Christopher M Grulke
  - National Center for Computational Toxicology, Office of Research & Development, US Environmental Protection Agency, Mail Drop D143-02, Research Triangle Park, NC 27711, USA
- Antony J Williams
  - National Center for Computational Toxicology, Office of Research & Development, US Environmental Protection Agency, Mail Drop D143-02, Research Triangle Park, NC 27711, USA
- Inthirany Thillanadarajah
  - Senior Environmental Employment Program, US Environmental Protection Agency, Research Triangle Park, NC 27711, USA
- Ann M Richard
  - National Center for Computational Toxicology, Office of Research & Development, US Environmental Protection Agency, Mail Drop D143-02, Research Triangle Park, NC 27711, USA
49
Now the future, we see our dreams: artificial intelligence in drug discovery. FUTURE DRUG DISCOVERY 2019. [DOI: 10.4155/fdd-2019-0027] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/04/2023]
50
Mansouri K, Cariello NF, Korotcov A, Tkachenko V, Grulke CM, Sprankle CS, Allen D, Casey WM, Kleinstreuer NC, Williams AJ. Open-source QSAR models for pKa prediction using multiple machine learning approaches. J Cheminform 2019; 11:60. [PMID: 33430972 PMCID: PMC6749653 DOI: 10.1186/s13321-019-0384-1] [Citation(s) in RCA: 49] [Impact Index Per Article: 9.8] [Received: 05/23/2019] [Accepted: 09/03/2019] [Indexed: 12/13/2022]
Abstract
BACKGROUND: The logarithmic acid dissociation constant pKa reflects the ionization of a chemical, which affects lipophilicity, solubility, protein binding, and ability to pass through the plasma membrane. Thus, pKa affects chemical absorption, distribution, metabolism, excretion, and toxicity properties. Multiple proprietary software packages exist for the prediction of pKa, but to the best of our knowledge no free and open-source programs exist for this purpose. Using a freely available data set and three machine learning approaches, we developed open-source models for pKa prediction.
METHODS: The experimental strongest acidic and strongest basic pKa values in water for 7912 chemicals were obtained from DataWarrior, a freely available software package. Chemical structures were curated and standardized for quantitative structure-activity relationship (QSAR) modeling using KNIME, and a subset comprising 79% of the initial set was used for modeling. To evaluate different modeling approaches, several datasets were constructed based on different processing of chemical structures with acidic and/or basic pKas. Continuous molecular descriptors, binary fingerprints, and fragment counts were generated using PaDEL, and pKa prediction models were created using three machine learning methods: (1) support vector machines (SVM) combined with k-nearest neighbors (kNN), (2) extreme gradient boosting (XGB), and (3) deep neural networks (DNN).
RESULTS: The three methods delivered comparable performance on the training and test sets, with a root-mean-squared error (RMSE) of around 1.5 and a coefficient of determination (R²) of around 0.80. Two commercial pKa predictors, from ACD/Labs and ChemAxon, were used to benchmark the three best models developed in this work, and our models compared favorably with the commercial products.
CONCLUSIONS: This work provides multiple QSAR models to predict the strongest acidic and strongest basic pKas of chemicals, built using publicly available data and released as free and open-source software on GitHub.
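The two performance metrics quoted in the results can be written out explicitly. This sketch assumes the usual definitions, RMSE as the square root of the mean squared residual and R² as 1 − SS_res/SS_tot; the paper may compute R² differently (for example as a squared correlation coefficient), and the function names and values in the usage note are hypothetical illustrations, not data from the study.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root-mean-squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    sq_err = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return sqrt(sq_err / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are equal")
    return 1 - ss_res / ss_tot
```

With made-up observed values `[2.0, 4.0, 6.0, 8.0]` and predictions `[3.0, 3.0, 7.0, 7.0]`, every residual has magnitude 1, so RMSE is 1.0; with SS_tot = 20 and SS_res = 4, R² is 0.8.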
Affiliation(s)
- Kamel Mansouri
  - Integrated Laboratory Systems, Inc., P.O. Box 13501, Research Triangle Park, NC 27709, USA
- Neal F. Cariello
  - Integrated Laboratory Systems, Inc., P.O. Box 13501, Research Triangle Park, NC 27709, USA
- Alexandru Korotcov
  - Science Data Software LLC, 14914 Bradwill Court, Rockville, MD 20850, USA
- Valery Tkachenko
  - Science Data Software LLC, 14914 Bradwill Court, Rockville, MD 20850, USA
- Chris M. Grulke
  - National Center for Computational Toxicology, U.S. Environmental Protection Agency, 109 T.W. Alexander Dr., Mail Code D143-02, Research Triangle Park, NC 27709, USA
- Catherine S. Sprankle
  - Integrated Laboratory Systems, Inc., P.O. Box 13501, Research Triangle Park, NC 27709, USA
- David Allen
  - Integrated Laboratory Systems, Inc., P.O. Box 13501, Research Triangle Park, NC 27709, USA
- Warren M. Casey
  - National Institute of Environmental Health Sciences, P.O. Box 12233, Mail Stop K2-16, Research Triangle Park, NC 27709, USA
- Nicole C. Kleinstreuer
  - National Institute of Environmental Health Sciences, P.O. Box 12233, Mail Stop K2-16, Research Triangle Park, NC 27709, USA
- Antony J. Williams
  - National Center for Computational Toxicology, U.S. Environmental Protection Agency, 109 T.W. Alexander Dr., Mail Code D143-02, Research Triangle Park, NC 27709, USA