51
Ambure P, Gajewicz-Skretna A, Cordeiro MNDS, Roy K. New Workflow for QSAR Model Development from Small Data Sets: Small Dataset Curator and Small Dataset Modeler. Integration of Data Curation, Exhaustive Double Cross-Validation, and a Set of Optimal Model Selection Techniques. J Chem Inf Model 2019; 59:4070-4076. [DOI: 10.1021/acs.jcim.9b00476] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Indexed: 12/30/2022]
Affiliation(s)
- Pravin Ambure
- LAQV@REQUIMTE/Department of Chemistry and Biochemistry, University of Porto, 4169-007 Porto, Portugal
| | - Agnieszka Gajewicz-Skretna
- Laboratory of Environmental Chemometrics, Faculty of Chemistry, University of Gdansk, Gdansk 80-308, Poland
| | - M. Natalia D. S. Cordeiro
- LAQV@REQUIMTE/Department of Chemistry and Biochemistry, University of Porto, 4169-007 Porto, Portugal
| | - Kunal Roy
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
52
van der Spoel D, Manzetti S, Zhang H, Klamt A. Prediction of Partition Coefficients of Environmental Toxins Using Computational Chemistry Methods. ACS Omega 2019; 4:13772-13781. [PMID: 31497695 PMCID: PMC6713992 DOI: 10.1021/acsomega.9b01277] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 05/03/2019] [Accepted: 06/27/2019] [Indexed: 05/05/2023]
Abstract
The partitioning of compounds between aqueous and other phases is important for predicting toxicity. Although thousands of octanol-water partition coefficients have been measured, these represent only a small fraction of the anthropogenic compounds present in the environment. The octanol phase is often taken to be a mimic of the inner parts of phospholipid membranes. However, the core of such membranes is typically more hydrophobic than octanol, and other partition coefficients with other compounds may give complementary information. Although a number of (cheap) empirical methods exist to compute octanol-water (log k_OW) and hexadecane-water (log k_HW) partition coefficients, it would be interesting to know whether physics-based models can predict these crucial values more accurately. Here, we have computed log k_OW and log k_HW for 133 compounds from seven different pollutant categories as well as a control group using the solvation model based on electronic density (SMD) protocol based on Hartree-Fock (HF) or density functional theory (DFT) and the COSMO-RS method. For comparison, XlogP3 (log k_OW) values were retrieved from the PubChem database, and KowWin log k_OW values were determined as well. For 24 of these compounds, log k_OW was computed using potential of mean force (PMF) calculations based on classical molecular dynamics simulations. A comparison of the accuracy of the methods shows that COSMO-RS, KowWin, and XlogP3 all have a root-mean-square deviation (rmsd) from the experimental data of ≈0.4 log units, whereas the SMD protocol has an rmsd of 1.0 log units using HF and 0.9 using DFT. PMF calculations yield the poorest accuracy (rmsd = 1.1 log units). Thirty-six out of 133 calculations are for compounds without known log k_OW, and for these, we provide what we consider a robust prediction, in the sense that there are few outliers, by averaging over the methods. The results supplied may be instrumental when developing new methods in computational ecotoxicity. The log k_HW values are found to be strongly correlated to log k_OW for most compounds.
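The two quantities this abstract relies on, the root-mean-square deviation of a method from experiment and a consensus prediction obtained by averaging over methods, can be illustrated with a minimal Python sketch. The function names, method labels, and numbers below are hypothetical placeholders chosen for illustration; they are not data or code from the paper.

```python
import math


def rmsd(predicted, experimental):
    """Root-mean-square deviation between paired predicted and experimental values."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))


def consensus(predictions_by_method):
    """Average the per-compound predictions over all methods (plain mean)."""
    columns = list(predictions_by_method.values())
    if not columns or any(len(c) != len(columns[0]) for c in columns):
        raise ValueError("every method must predict the same compounds")
    return [sum(values) / len(values) for values in zip(*columns)]


# Hypothetical log partition coefficients for three compounds (illustrative only).
experimental = [2.0, 3.0, 4.0]
predictions = {
    "method_a": [2.5, 3.5, 4.5],  # systematically too high
    "method_b": [1.5, 2.5, 3.5],  # systematically too low
}

per_method_rmsd = {name: rmsd(vals, experimental) for name, vals in predictions.items()}
averaged = consensus(predictions)
consensus_rmsd = rmsd(averaged, experimental)
```

In this toy case the two methods err in opposite directions, so the average lands closer to experiment than either method alone; with real data the benefit of averaging depends on how correlated the methods' errors are.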
Affiliation(s)
- David van der Spoel
- Uppsala Center for Computational Chemistry, Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Husargatan 3, Box 596, SE-75124 Uppsala, Sweden
- Phone: +46 18 4714205
| | - Sergio Manzetti
- Uppsala Center for Computational Chemistry, Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Husargatan 3, Box 596, SE-75124 Uppsala, Sweden
- Fjordforsk A.S., Institute for Science and Technology, Midtun, 6894 Vangsnes, Norway
| | - Haiyang Zhang
- Department of Biological Science and Engineering, School of Chemistry and Biological Engineering, University of Science and Technology Beijing, 100083 Beijing, China
| | - Andreas Klamt
- COSMOlogic GmbH & Co. KG, Imbacher Weg 46, D-51379 Leverkusen, Germany
- Institute of Physical and Theoretical Chemistry, University of Regensburg, 93053 Regensburg, Germany
53
Thomas RS, Bahadori T, Buckley TJ, Cowden J, Deisenroth C, Dionisio KL, Frithsen JB, Grulke CM, Gwinn MR, Harrill JA, Higuchi M, Houck KA, Hughes MF, Hunter ES, Isaacs KK, Judson RS, Knudsen TB, Lambert JC, Linnenbrink M, Martin TM, Newton SR, Padilla S, Patlewicz G, Paul-Friedman K, Phillips KA, Richard AM, Sams R, Shafer TJ, Setzer RW, Shah I, Simmons JE, Simmons SO, Singh A, Sobus JR, Strynar M, Swank A, Tornero-Valez R, Ulrich EM, Villeneuve DL, Wambaugh JF, Wetmore BA, Williams AJ. The Next Generation Blueprint of Computational Toxicology at the U.S. Environmental Protection Agency. Toxicol Sci 2019; 169:317-332. [PMID: 30835285 PMCID: PMC6542711 DOI: 10.1093/toxsci/kfz058] [Citation(s) in RCA: 217] [Impact Index Per Article: 43.4] [Indexed: 12/15/2022]
Abstract
The U.S. Environmental Protection Agency (EPA) is faced with the challenge of efficiently and credibly evaluating chemical safety often with limited or no available toxicity data. The expanding number of chemicals found in commerce and the environment, coupled with time and resource requirements for traditional toxicity testing and exposure characterization, continue to underscore the need for new approaches. In 2005, EPA charted a new course to address this challenge by embracing computational toxicology (CompTox) and investing in the technologies and capabilities to push the field forward. The return on this investment has been demonstrated through results and applications across a range of human and environmental health problems, as well as initial application to regulatory decision-making within programs such as the EPA's Endocrine Disruptor Screening Program. The CompTox initiative at EPA is more than a decade old. This manuscript presents a blueprint to guide the strategic and operational direction over the next 5 years. The primary goal is to obtain broader acceptance of the CompTox approaches for application to higher tier regulatory decisions, such as chemical assessments. To achieve this goal, the blueprint expands and refines the use of high-throughput and computational modeling approaches to transform the components in chemical risk assessment, while systematically addressing key challenges that have hindered progress. In addition, the blueprint outlines additional investments in cross-cutting efforts to characterize uncertainty and variability, develop software and information technology tools, provide outreach and training, and establish scientific confidence for application to different public health and environmental regulatory decisions.
Affiliation(s)
- Russell S. Thomas
- National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency
| | - Tina Bahadori
- National Center for Environmental Assessment, Office of Research and Development, US Environmental Protection Agency
| | - Timothy J. Buckley
- National Exposure Research Laboratory, Office of Research and Development, US Environmental Protection Agency
| | - John Cowden
- National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency
| | - Chad Deisenroth
- National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency
| | - Kathie L. Dionisio
- National Exposure Research Laboratory, Office of Research and Development, US Environmental Protection Agency
| | - Jeffrey B. Frithsen
- Chemical Safety for Sustainability National Research Program, Office of Research and Development, US Environmental Protection Agency
| | - Christopher M. Grulke
- National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency
| | - Maureen R. Gwinn
- National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency
| | - Joshua A. Harrill
- National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency
| | - Mark Higuchi
- National Health and Environmental Effects Research Laboratory, Office of Research and Development, US Environmental Protection Agency
| | - Keith A. Houck
- National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency
| | - Michael F. Hughes
- National Health and Environmental Effects Research Laboratory, Office of Research and Development, US Environmental Protection Agency
| | - E. Sidney Hunter
- National Health and Environmental Effects Research Laboratory, Office of Research and Development, US Environmental Protection Agency
| | - Kristin K. Isaacs
- National Exposure Research Laboratory, Office of Research and Development, US Environmental Protection Agency
| | - Richard S. Judson
- National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency
| | - Thomas B. Knudsen
- National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency
| | - Jason C. Lambert
- National Center for Environmental Assessment, Office of Research and Development, US Environmental Protection Agency
| | - Monica Linnenbrink
- National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency
| | - Todd M. Martin
- National Risk Management Research Laboratory, Office of Research and Development, US Environmental Protection Agency
| | - Seth R. Newton
- National Exposure Research Laboratory, Office of Research and Development, US Environmental Protection Agency
| | - Stephanie Padilla
- National Health and Environmental Effects Research Laboratory, Office of Research and Development, US Environmental Protection Agency
| | - Grace Patlewicz
- National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency
| | - Katie Paul-Friedman
- National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency
| | - Katherine A. Phillips
- National Exposure Research Laboratory, Office of Research and Development, US Environmental Protection Agency
| | - Ann M. Richard
- National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency
| | - Reeder Sams
- National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency
| | - Timothy J. Shafer
- National Health and Environmental Effects Research Laboratory, Office of Research and Development, US Environmental Protection Agency
| | - R. Woodrow Setzer
- National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency
| | - Imran Shah
- National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency
| | - Jane E. Simmons
- National Health and Environmental Effects Research Laboratory, Office of Research and Development, US Environmental Protection Agency
| | - Steven O. Simmons
- National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency
| | - Amar Singh
- National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency
| | - Jon R. Sobus
- National Exposure Research Laboratory, Office of Research and Development, US Environmental Protection Agency
| | - Mark Strynar
- National Exposure Research Laboratory, Office of Research and Development, US Environmental Protection Agency
| | - Adam Swank
- National Exposure Research Laboratory, Office of Research and Development, US Environmental Protection Agency
| | - Rogelio Tornero-Valez
- National Exposure Research Laboratory, Office of Research and Development, US Environmental Protection Agency
| | - Elin M. Ulrich
- National Exposure Research Laboratory, Office of Research and Development, US Environmental Protection Agency
| | - Daniel L Villeneuve
- National Health and Environmental Effects Research Laboratory, Office of Research and Development, US Environmental Protection Agency
| | - John F. Wambaugh
- National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency
| | - Barbara A. Wetmore
- National Exposure Research Laboratory, Office of Research and Development, US Environmental Protection Agency
| | - Antony J. Williams
- National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency
54
Ambure P, Halder AK, González Díaz H, Cordeiro MNDS. QSAR-Co: An Open Source Software for Developing Robust Multitasking or Multitarget Classification-Based QSAR Models. J Chem Inf Model 2019; 59:2538-2544. [PMID: 31083984 DOI: 10.1021/acs.jcim.9b00295] [Citation(s) in RCA: 49] [Impact Index Per Article: 9.8] [Indexed: 12/19/2022]
Abstract
Quantitative structure-activity relationship (QSAR) modeling is a well-known computational technique with wide applications in fields such as drug design, toxicity prediction, and nanomaterials. However, QSAR researchers still face certain problems in developing robust classification-based QSAR models, especially when handling response data pertaining to diverse experimental and/or theoretical conditions. In the present work, we have developed an open source standalone software "QSAR-Co" (available to download at https://sites.google.com/view/qsar-co ) to set up classification-based QSAR models that allow mining the response data coming from multiple conditions. The software comprises two modules: (1) the Model development module and (2) the Screen/Predict module. This user-friendly software provides several functionalities required for developing a robust multitasking or multitarget classification-based QSAR model using linear discriminant analysis or random forest techniques, with appropriate validation, following the principles set by the Organisation for Economic Co-operation and Development (OECD) for applying QSAR models in regulatory assessments.
Affiliation(s)
- Pravin Ambure
- LAQV@REQUIMTE, Department of Chemistry and Biochemistry, University of Porto, 4169-007 Porto, Portugal
| | - Amit Kumar Halder
- LAQV@REQUIMTE, Department of Chemistry and Biochemistry, University of Porto, 4169-007 Porto, Portugal
| | - Humbert González Díaz
- Department of Organic Chemistry II, University of Basque Country UPV/EHU, 48940 Leioa, Spain
| | - M Natália D S Cordeiro
- LAQV@REQUIMTE, Department of Chemistry and Biochemistry, University of Porto, 4169-007 Porto, Portugal
55
Vighi M, Barsi A, Focks A, Grisoni F. Predictive models in ecotoxicology: Bridging the gap between scientific progress and regulatory applicability-Remarks and research needs. Integrated Environmental Assessment and Management 2019; 15:345-351. [PMID: 30821044 DOI: 10.1002/ieam.4136] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/22/2018] [Accepted: 02/18/2019] [Indexed: 06/09/2023]
Abstract
This paper concludes a special series of 7 articles (4 on toxicokinetic-toxicodynamic [TK-TD] models and 3 on quantitative structure-activity relationship [QSAR] models) published in previous issues of Integrated Environmental Assessment and Management (IEAM). The present paper summarizes the special series articles and highlights their contribution to the topic of increasing the regulatory applicability of effect models. For both TK-TD and QSAR approaches, we then describe the main research needs. The use of TK-TD models for describing sublethal effects must be better developed, particularly through the improvement of the dynamic energy budget (DEBtox) approach. The potential of TK-TD models for moving from lower (molecular) to higher (population) hierarchical levels is highlighted as a promising research line. Some relevant issues to improve the acceptance of QSAR models at the regulatory level are also described, such as increased transparency of the performance assessment and of the modeling algorithms, model documentation, relevance of the chosen target for regulatory needs, and improved mechanistic interpretability. Integr Environ Assess Manag 2019;15:345-351. © 2019 SETAC.
Affiliation(s)
- Marco Vighi
- IMDEA Water Institute, Alcalá de Henares (Madrid), Spain
| | - Alpar Barsi
- Dutch Board for the Authorisation of Plant Protection Products and Biocides (Ctgb), Ede, Netherlands
| | - Andreas Focks
- Wageningen University & Research, Wageningen, Netherlands
| | - Francesca Grisoni
- University of Milano-Bicocca, Department of Earth and Environmental Sciences, Milano, Italy
56
Ellison CA, Blackburn KL, Carmichael PL, Clewell HJ, Cronin MTD, Desprez B, Escher SE, Ferguson SS, Grégoire S, Hewitt NJ, Hollnagel HM, Klaric M, Patel A, Salhi S, Schepky A, Schmitt BG, Wambaugh JF, Worth A. Challenges in working towards an internal threshold of toxicological concern (iTTC) for use in the safety assessment of cosmetics: Discussions from the Cosmetics Europe iTTC Working Group workshop. Regul Toxicol Pharmacol 2019; 103:63-72. [PMID: 30653989 PMCID: PMC6644721 DOI: 10.1016/j.yrtph.2019.01.016] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 10/10/2018] [Revised: 01/02/2019] [Accepted: 01/07/2019] [Indexed: 11/22/2022]
Abstract
The Threshold of Toxicological Concern (TTC) is an important risk assessment tool which establishes acceptable low-level exposure values to be applied to chemicals with limited toxicological data. One of the logical next steps in the continued evolution of TTC is to develop this concept further so that it is representative of internal exposures (TTC based on plasma concentration). An internal TTC (iTTC) would provide threshold values that could be utilized in exposure-based safety assessments. As part of a Cosmetics Europe (CosEu) research program, CosEu has initiated a project that is working towards the development of iTTCs that can be used for human safety assessment. Knowing that the development of an iTTC is an ambitious and broad-spanning topic, CosEu organized a Working Group comprising a balance of multiple stakeholders (cosmetics and chemical industries, the EPA, the JRC, and academia) with relevant experience and expertise, and a workshop to critically evaluate the requirements to establish an iTTC. Outcomes from the workshop included an evaluation of the current state of the science for iTTC, the overall iTTC strategy, selection of chemical databases, capture and curation of chemical information, ADME and repeat dose data, expected challenges, as well as next steps and ongoing work.
Affiliation(s)
- Corie A Ellison
- The Procter & Gamble Company, Cincinnati, OH, United States.
| | | | - Paul L Carmichael
- Unilever Safety and Environmental Assurance Center, Bedfordshire, UK
| | | | - Mark T D Cronin
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, England, UK
| | | | - Sylvia E Escher
- Fraunhofer Institute for Toxicology and Experimental Medicine, Hannover, Germany
| | - Steve S Ferguson
- National Institute of Environmental Health Sciences, North Carolina, United States
| | | | | | | | | | - Atish Patel
- Research Institute for Fragrance Materials, New Jersey, United States
| | | | | | | | - John F Wambaugh
- United States Environmental Protection Agency, National Center for Computational Toxicology, North Carolina, United States
| | - Andrew Worth
- European Commission, Joint Research Centre, Ispra, Italy
57
Idakwo G, Luttrell J, Chen M, Hong H, Zhou Z, Gong P, Zhang C. A review on machine learning methods for in silico toxicity prediction. Journal of Environmental Science and Health. Part C, Environmental Carcinogenesis & Ecotoxicology Reviews 2019; 36:169-191. [PMID: 30628866 DOI: 10.1080/10590501.2018.1537118] [Citation(s) in RCA: 66] [Impact Index Per Article: 13.2] [Indexed: 06/09/2023]
Abstract
In silico toxicity prediction plays an important role in the regulatory decision making and selection of leads in drug design as in vitro/vivo methods are often limited by ethics, time, budget, and other resources. Many computational methods have been employed in predicting the toxicity profile of chemicals. This review provides a detailed end-to-end overview of the application of machine learning algorithms to Structure-Activity Relationship (SAR)-based predictive toxicology. From raw data to model validation, the importance of data quality is stressed as it greatly affects the predictive power of derived models. Commonly overlooked challenges such as data imbalance, activity cliff, model evaluation, and definition of applicability domain are highlighted, and plausible solutions for alleviating these challenges are discussed.
Affiliation(s)
- Gabriel Idakwo
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, Mississippi, USA
| | - Joseph Luttrell
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, Mississippi, USA
| | - Minjun Chen
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Science, US Food and Drug Administration, Jefferson, Arkansas, USA
| | - Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Science, US Food and Drug Administration, Jefferson, Arkansas, USA
| | - Zhaoxian Zhou
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, Mississippi, USA
| | - Ping Gong
- Environmental Laboratory, US Army Engineer Research and Development Center, Vicksburg, Mississippi, USA
| | - Chaoyang Zhang
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, Mississippi, USA
58
Gadaleta D, Lombardo A, Toma C, Benfenati E. A new semi-automated workflow for chemical data retrieval and quality checking for modeling applications. J Cheminform 2018; 10:60. [PMID: 30536051 PMCID: PMC6503381 DOI: 10.1186/s13321-018-0315-6] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 09/26/2018] [Accepted: 12/01/2018] [Indexed: 11/30/2022]
Abstract
The quality of data used for QSAR model derivation is extremely important as it strongly affects the final robustness and predictive power of the model. Ambiguous or wrong structures need to be carefully checked, because they lead to errors in calculation of descriptors, hence leading to meaningless results. The increasing amounts of data, however, have often made it hard to check very large databases manually. In light of this, we designed and implemented a semi-automated workflow integrating structural data retrieval from several web-based databases, automated comparison of these data, chemical structure cleaning, selection and standardization of data into a consistent, ready-to-use format that can be employed for modeling. The workflow integrates best practices for data curation that have been suggested in the recent literature. The workflow has been implemented with the freely available KNIME software and is freely available to the cheminformatics community for improvement and application to a broad range of chemical datasets.
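The "automated comparison" step described in this abstract can be pictured with a small Python sketch: structures retrieved for the same compound from several sources are grouped, compounds on which every source agrees are accepted, and the rest are set aside for manual review. This is only an illustration under stated assumptions. The published workflow is implemented in KNIME, not Python; the function name, source labels, and records below are hypothetical; and the comparison here is a plain string match after trimming whitespace, whereas a real pipeline would compare canonicalized structures.

```python
from collections import defaultdict


def compare_sources(records):
    """Split compounds into consistent ones and ones needing manual review.

    records: iterable of (compound_id, source_name, structure_string) tuples.
    Returns (consistent, to_review), where consistent maps compound_id to the
    single agreed structure and to_review maps compound_id to the conflicting
    per-source structures.
    """
    by_compound = defaultdict(dict)
    for compound_id, source, structure in records:
        # Assumption: trimming whitespace stands in for real structure standardization.
        by_compound[compound_id][source] = structure.strip()

    consistent, to_review = {}, {}
    for compound_id, per_source in by_compound.items():
        distinct = set(per_source.values())
        if len(distinct) == 1:
            consistent[compound_id] = distinct.pop()
        else:
            to_review[compound_id] = per_source
    return consistent, to_review


# Hypothetical records: two sources agree on one compound and disagree on another.
records = [
    ("cmpd-1", "source_a", "CCO"),
    ("cmpd-1", "source_b", "CCO "),
    ("cmpd-2", "source_a", "c1ccccc1"),
    ("cmpd-2", "source_b", "C1CCCCC1"),
]
consistent, to_review = compare_sources(records)
```

Routing disagreements to a review queue rather than silently picking one source mirrors the semi-automated intent described in the abstract: the bulk of the checking is automated, while ambiguous structures still receive human attention.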
Affiliation(s)
- Domenico Gadaleta
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via la Masa 19, 20156, Milan, Italy.
| | - Anna Lombardo
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via la Masa 19, 20156, Milan, Italy
| | - Cosimo Toma
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via la Masa 19, 20156, Milan, Italy
| | - Emilio Benfenati
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via la Masa 19, 20156, Milan, Italy
59
Nicolas CI, Mansouri K, Phillips KA, Grulke CM, Richard AM, Williams AJ, Rabinowitz J, Isaacs KK, Yau A, Wambaugh JF. Rapid experimental measurements of physicochemical properties to inform models and testing. The Science of the Total Environment 2018; 636:901-909. [PMID: 29729507 PMCID: PMC6214190 DOI: 10.1016/j.scitotenv.2018.04.266] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 02/20/2018] [Revised: 04/19/2018] [Accepted: 04/20/2018] [Indexed: 04/14/2023]
Abstract
The structures and physicochemical properties of chemicals are important for determining their potential toxicological effects, toxicokinetics, and route(s) of exposure. These data are needed to prioritize the risk for thousands of environmental chemicals, but experimental values are often lacking. In an attempt to efficiently fill data gaps in physicochemical property information, we generated new data for 200 structurally diverse compounds, which were rigorously selected from the USEPA ToxCast chemical library, and whose structures are available within the Distributed Structure-Searchable Toxicity Database (DSSTox). This pilot study evaluated rapid experimental methods to determine five physicochemical properties, including the log of the octanol:water partition coefficient (known as log(Kow) or logP), vapor pressure, water solubility, Henry's law constant, and the acid dissociation constant (pKa). For most compounds, experiments were successful for at least one property; log(Kow) yielded the largest return (176 values). It was determined that 77 ToxPrint structural features were enriched in chemicals with at least one measurement failure, indicating which features may have played a role in rapid method failures. To gauge consistency with traditional measurement methods, the new measurements were compared with previous measurements (where available). Since quantitative structure-activity/property relationship (QSAR/QSPR) models are used to fill gaps in physicochemical property information, 5 suites of QSPRs were evaluated for their predictive ability and chemical coverage or applicability domain of new experimental measurements. 
The ability to have accurate measurements of these properties will facilitate better exposure predictions in two ways: 1) direct input of these experimental measurements into exposure models; and 2) construction of QSPRs with a wider applicability domain, as their predicted physicochemical values can be used to parameterize exposure models in the absence of experimental data.
Affiliation(s)
- Chantel I Nicolas
- ScitoVation, LLC 6 Davis Drive, Durham, NC 27703, USA; National Center for Computational Toxicology, Office of Research and Development, US EPA, Research Triangle Park, NC 27711, USA; Oak Ridge Institute for Science and Education, Oak Ridge, TN 37831, USA
| | - Kamel Mansouri
- ScitoVation, LLC 6 Davis Drive, Durham, NC 27703, USA; National Center for Computational Toxicology, Office of Research and Development, US EPA, Research Triangle Park, NC 27711, USA; Oak Ridge Institute for Science and Education, Oak Ridge, TN 37831, USA
| | - Katherine A Phillips
- National Exposure Research Laboratory, Office of Research and Development, US EPA, Research Triangle Park, NC 27711, USA
| | - Christopher M Grulke
- National Center for Computational Toxicology, Office of Research and Development, US EPA, Research Triangle Park, NC 27711, USA
| | - Ann M Richard
- National Center for Computational Toxicology, Office of Research and Development, US EPA, Research Triangle Park, NC 27711, USA
| | - Antony J Williams
- National Center for Computational Toxicology, Office of Research and Development, US EPA, Research Triangle Park, NC 27711, USA
| | - James Rabinowitz
- National Center for Computational Toxicology, Office of Research and Development, US EPA, Research Triangle Park, NC 27711, USA
| | - Kristin K Isaacs
- National Exposure Research Laboratory, Office of Research and Development, US EPA, Research Triangle Park, NC 27711, USA
| | - Alice Yau
- Southwest Research Institute, San Antonio, TX 78238, USA
| | - John F Wambaugh
- National Center for Computational Toxicology, Office of Research and Development, US EPA, Research Triangle Park, NC 27711, USA.
60
Oreluk J, Liu Z, Hegde A, Li W, Packard A, Frenklach M, Zubarev D. Diagnostics of Data-Driven Models: Uncertainty Quantification of PM7 Semi-Empirical Quantum Chemical Method. Sci Rep 2018; 8:13248. [PMID: 30185953 PMCID: PMC6125339 DOI: 10.1038/s41598-018-31677-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/19/2018] [Accepted: 08/22/2018] [Indexed: 12/21/2022]
Abstract
We report an evaluation of a semi-empirical quantum chemical method PM7 from the perspective of uncertainty quantification. Specifically, we apply Bound-to-Bound Data Collaboration, an uncertainty quantification framework, to characterize (a) variability of PM7 model parameter values consistent with the uncertainty in the training data and (b) uncertainty propagation from the training data to the model predictions. Experimental heats of formation of a homologous series of linear alkanes are used as the property of interest. The training data are chemically accurate, i.e., they have very low uncertainty by the standards of computational chemistry. The analysis does not find evidence of PM7 consistency with the entire data set considered as no single set of parameter values is found that captures the experimental uncertainties of all training data. A set of parameter values for PM7 was able to capture the training data within ±1 kcal/mol, but not to the smaller level of uncertainty in the reported data. Nevertheless, PM7 was found to be consistent for subsets of the training data. In such cases, uncertainty propagation from the chemically accurate training data to the predicted values preserves error within bounds of chemical accuracy if predictions are made for the molecules of comparable size. Otherwise, the error grows linearly with the relative size of the molecules.
Affiliation(s)
- James Oreluk
- Department of Mechanical Engineering, University of California at Berkeley, Berkeley, California, 94720-1740, USA
| | - Zhenyuan Liu
- Department of Mechanical Engineering, University of California at Berkeley, Berkeley, California, 94720-1740, USA
| | - Arun Hegde
- Department of Mechanical Engineering, University of California at Berkeley, Berkeley, California, 94720-1740, USA
| | - Wenyu Li
- Department of Mechanical Engineering, University of California at Berkeley, Berkeley, California, 94720-1740, USA
| | - Andrew Packard
- Department of Mechanical Engineering, University of California at Berkeley, Berkeley, California, 94720-1740, USA
| | - Michael Frenklach
- Department of Mechanical Engineering, University of California at Berkeley, Berkeley, California, 94720-1740, USA.
| | - Dmitry Zubarev
- IBM Almaden Research Center, 650 Harry Road, San Jose, California, 95136, USA
| |
|
61
|
Sobus JR, Wambaugh JF, Isaacs KK, Williams AJ, McEachran AD, Richard AM, Grulke CM, Ulrich EM, Rager JE, Strynar MJ, Newton SR. Integrating tools for non-targeted analysis research and chemical safety evaluations at the US EPA. Journal of Exposure Science & Environmental Epidemiology 2018; 28:411-426. [PMID: 29288256 PMCID: PMC6661898 DOI: 10.1038/s41370-017-0012-y] [Citation(s) in RCA: 136] [Impact Index Per Article: 22.7] [Received: 05/27/2017] [Revised: 08/04/2017] [Accepted: 08/25/2017] [Indexed: 05/18/2023]
Abstract
Tens of thousands of chemicals are registered in the U.S. for use in countless processes and products. Recent evidence suggests that many of these chemicals are measurable in environmental and/or biological systems, indicating the potential for widespread exposures. Traditional public health research tools, including in vivo studies and targeted analytical chemistry methods, have been unable to meet the needs of screening programs designed to evaluate chemical safety. As such, new tools have been developed to enable rapid assessment of potentially harmful chemical exposures and their attendant biological responses. One group of tools, known as "non-targeted analysis" (NTA) methods, allows the rapid characterization of thousands of never-before-studied compounds in a wide variety of environmental, residential, and biological media. This article discusses current applications of NTA methods, challenges to their effective use in chemical screening studies, and ways in which shared resources (e.g., chemical standards, databases, model predictions, and media measurements) can advance their use in risk-based chemical prioritization. A brief review is provided of resources and projects within EPA's Office of Research and Development (ORD) that provide benefit to, and receive benefits from, NTA research endeavors. A summary of EPA's Non-Targeted Analysis Collaborative Trial (ENTACT) is also given, which makes direct use of ORD resources to benefit the global NTA research community. Finally, a research framework is described that shows how NTA methods will bridge chemical prioritization efforts within ORD. This framework exists as a guide for institutions seeking to understand the complexity of chemical exposures, and the impact of these exposures on living systems.
Affiliation(s)
- Jon R Sobus
- U.S. Environmental Protection Agency, Office of Research and Development, National Exposure Research Laboratory, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA.
| | - John F Wambaugh
- U.S. Environmental Protection Agency, Office of Research and Development, National Center for Computational Toxicology, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
| | - Kristin K Isaacs
- U.S. Environmental Protection Agency, Office of Research and Development, National Exposure Research Laboratory, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
| | - Antony J Williams
- U.S. Environmental Protection Agency, Office of Research and Development, National Center for Computational Toxicology, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
| | - Andrew D McEachran
- Oak Ridge Institute for Science and Education (ORISE) Participant, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
| | - Ann M Richard
- U.S. Environmental Protection Agency, Office of Research and Development, National Center for Computational Toxicology, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
| | - Christopher M Grulke
- U.S. Environmental Protection Agency, Office of Research and Development, National Center for Computational Toxicology, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
| | - Elin M Ulrich
- U.S. Environmental Protection Agency, Office of Research and Development, National Exposure Research Laboratory, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
| | - Julia E Rager
- Oak Ridge Institute for Science and Education (ORISE) Participant, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
- ToxStrategies, Inc., 9390 Research Blvd., Suite 100, Austin, TX, 78759, USA
| | - Mark J Strynar
- U.S. Environmental Protection Agency, Office of Research and Development, National Exposure Research Laboratory, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
| | - Seth R Newton
- U.S. Environmental Protection Agency, Office of Research and Development, National Exposure Research Laboratory, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
| |
|
62
|
McEachran AD, Mansouri K, Grulke C, Schymanski EL, Ruttkies C, Williams AJ. "MS-Ready" structures for non-targeted high-resolution mass spectrometry screening studies. J Cheminform 2018; 10:45. [PMID: 30167882 PMCID: PMC6117229 DOI: 10.1186/s13321-018-0299-2] [Citation(s) in RCA: 54] [Impact Index Per Article: 9.0] [Received: 05/16/2018] [Accepted: 08/21/2018] [Indexed: 02/05/2023] Open
Abstract
Chemical database searching has become a fixture in many non-targeted identification workflows based on high-resolution mass spectrometry (HRMS). However, the form of a chemical structure observed in HRMS does not always match the form stored in a database (e.g., the neutral form versus a salt; one component of a mixture rather than the mixture form used in a consumer product). Linking the form of a structure observed via HRMS to its related form(s) within a database will enable the return of all relevant variants of a structure, as well as the related metadata, in a single query. A Konstanz Information Miner (KNIME) workflow has been developed to produce structural representations observed using HRMS ("MS-Ready structures") and link them to those stored in a database. These MS-Ready structures, and associated mappings to the full chemical representations, are surfaced via the US EPA's Chemistry Dashboard (https://comptox.epa.gov/dashboard/). This article describes the workflow for the generation and linking of ~700,000 MS-Ready structures (derived from ~760,000 original structures) as well as download, search and export capabilities to serve structure identification using HRMS. The importance of this form of structural representation for HRMS is demonstrated with several examples, including integration with the in silico fragmentation software application MetFrag. The structures, search, download and export functionality are all available through the CompTox Chemistry Dashboard, while the MetFrag implementation can be viewed at https://msbi.ipb-halle.de/MetFragBeta/.
Affiliation(s)
- Andrew D. McEachran
- Oak Ridge Institute for Science and Education (ORISE) Research Participation Program, U.S. Environmental Protection Agency, 109 T.W. Alexander Dr., Research Triangle Park, NC 27711 USA
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Mail Drop D143-02, 109 T.W. Alexander Dr., Research Triangle Park, NC 27711 USA
| | - Kamel Mansouri
- Oak Ridge Institute for Science and Education (ORISE) Research Participation Program, U.S. Environmental Protection Agency, 109 T.W. Alexander Dr., Research Triangle Park, NC 27711 USA
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Mail Drop D143-02, 109 T.W. Alexander Dr., Research Triangle Park, NC 27711 USA
- Present Address: Integrated Laboratory Systems, Inc., 601 Keystone Dr., Morrisville, NC 27650 USA
| | - Chris Grulke
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Mail Drop D143-02, 109 T.W. Alexander Dr., Research Triangle Park, NC 27711 USA
| | - Emma L. Schymanski
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6, avenue du Swing, 4367 Belvaux, Luxembourg
| | - Christoph Ruttkies
- Department of Stress and Development Biology, Leibniz Institute of Plant Biochemistry (IPB), Weinberg 3, 06120 Halle (Saale), Germany
| | - Antony J. Williams
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Mail Drop D143-02, 109 T.W. Alexander Dr., Research Triangle Park, NC 27711 USA
| |
|
63
|
Myatt GJ, Ahlberg E, Akahori Y, Allen D, Amberg A, Anger LT, Aptula A, Auerbach S, Beilke L, Bellion P, Benigni R, Bercu J, Booth ED, Bower D, Brigo A, Burden N, Cammerer Z, Cronin MTD, Cross KP, Custer L, Dettwiler M, Dobo K, Ford KA, Fortin MC, Gad-McDonald SE, Gellatly N, Gervais V, Glover KP, Glowienke S, Van Gompel J, Gutsell S, Hardy B, Harvey JS, Hillegass J, Honma M, Hsieh JH, Hsu CW, Hughes K, Johnson C, Jolly R, Jones D, Kemper R, Kenyon MO, Kim MT, Kruhlak NL, Kulkarni SA, Kümmerer K, Leavitt P, Majer B, Masten S, Miller S, Moser J, Mumtaz M, Muster W, Neilson L, Oprea TI, Patlewicz G, Paulino A, Lo Piparo E, Powley M, Quigley DP, Reddy MV, Richarz AN, Ruiz P, Schilter B, Serafimova R, Simpson W, Stavitskaya L, Stidl R, Suarez-Rodriguez D, Szabo DT, Teasdale A, Trejo-Martin A, Valentin JP, Vuorinen A, Wall BA, Watts P, White AT, Wichard J, Witt KL, Woolley A, Woolley D, Zwickl C, Hasselgren C. In silico toxicology protocols. Regul Toxicol Pharmacol 2018; 96:1-17. [PMID: 29678766 DOI: 10.1016/j.yrtph.2018.04.014] [Citation(s) in RCA: 117] [Impact Index Per Article: 19.5] [Received: 12/03/2017] [Revised: 03/16/2018] [Accepted: 04/16/2018] [Indexed: 10/17/2022]
Abstract
The present publication surveys several applications of in silico (i.e., computational) toxicology approaches across different industries and institutions. It highlights the need to develop standardized protocols when conducting toxicity-related predictions. This contribution articulates the information needed for protocols to support in silico predictions for major toxicological endpoints of concern (e.g., genetic toxicity, carcinogenicity, acute toxicity, reproductive toxicity, developmental toxicity) across several industries and regulatory bodies. Such novel in silico toxicology (IST) protocols, when fully developed and implemented, will ensure in silico toxicological assessments are performed and evaluated in a consistent, reproducible, and well-documented manner across industries and regulatory bodies to support wider uptake and acceptance of the approaches. The development of IST protocols is an initiative developed through a collaboration among an international consortium to reflect the state-of-the-art in in silico toxicology for hazard identification and characterization. A general outline for describing the development of such protocols is included and it is based on in silico predictions and/or available experimental data for a defined series of relevant toxicological effects or mechanisms. The publication presents a novel approach for determining the reliability of in silico predictions alongside experimental data. In addition, we discuss how to determine the level of confidence in the assessment based on the relevance and reliability of the information.
Affiliation(s)
- Glenn J Myatt
- Leadscope, Inc., 1393 Dublin Rd, Columbus, OH 43215, USA.
| | - Ernst Ahlberg
- Predictive Compound ADME & Safety, Drug Safety & Metabolism, AstraZeneca IMED Biotech Unit, Mölndal, Sweden
| | - Yumi Akahori
- Chemicals Evaluation and Research Institute, 1-4-25 Kouraku, Bunkyo-ku, Tokyo 112-0004 Japan
| | - David Allen
- Integrated Laboratory Systems, Inc., Research Triangle Park, NC, USA
| | - Alexander Amberg
- Sanofi, R&D Preclinical Safety Frankfurt, Industriepark Hoechst, D-65926 Frankfurt am Main, Germany
| | - Lennart T Anger
- Sanofi, R&D Preclinical Safety Frankfurt, Industriepark Hoechst, D-65926 Frankfurt am Main, Germany
| | - Aynur Aptula
- Unilever, Safety and Environmental Assurance Centre, Colworth, Beds, UK
| | - Scott Auerbach
- The National Institute of Environmental Health Sciences, Division of the National Toxicology Program, Research Triangle Park, NC 27709, USA
| | - Lisa Beilke
- Toxicology Solutions Inc., San Diego, CA, USA
| | - Joel Bercu
- Gilead Sciences, 333 Lakeside Drive, Foster City, CA, USA
| | - Ewan D Booth
- Syngenta, Product Safety Department, Jealott's Hill International Research Centre, Bracknell, Berkshire, RG42 6EY, UK
| | - Dave Bower
- Leadscope, Inc., 1393 Dublin Rd, Columbus, OH 43215, USA
| | - Alessandro Brigo
- Roche Pharmaceutical Research & Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, Switzerland
| | - Natalie Burden
- National Centre for the Replacement, Refinement and Reduction of Animals in Research (NC3Rs), Gibbs Building, 215 Euston Road, London NW1 2BE, UK
| | - Zoryana Cammerer
- Janssen Research & Development, 1400 McKean Road, Spring House, PA 19477, USA
| | - Mark T D Cronin
- School of Pharmacy and Chemistry, Liverpool John Moores University, Liverpool, L3 3AF, UK
| | - Kevin P Cross
- Leadscope, Inc., 1393 Dublin Rd, Columbus, OH 43215, USA
| | - Laura Custer
- Bristol-Myers Squibb, Drug Safety Evaluation, 1 Squibb Dr, New Brunswick, NJ 08903, USA
| | - Krista Dobo
- Pfizer Global Research & Development, 558 Eastern Point Road, Groton, CT 06340, USA
| | - Kevin A Ford
- Global Blood Therapeutics, South San Francisco, CA 94080, USA
| | - Marie C Fortin
- Department of Pharmacology and Toxicology, Ernest Mario School of Pharmacy, Rutgers, The State University of New Jersey, 170 Frelinghuysen Rd, Piscataway, NJ 08855, USA
| | - Nichola Gellatly
- National Centre for the Replacement, Refinement and Reduction of Animals in Research (NC3Rs), Gibbs Building, 215 Euston Road, London NW1 2BE, UK
| | - Kyle P Glover
- Defense Threat Reduction Agency, Edgewood Chemical Biological Center, Aberdeen Proving Ground, MD 21010, USA
| | - Susanne Glowienke
- Novartis Pharma AG, Pre-Clinical Safety, Werk Klybeck, CH-4057, Basel, Switzerland
| | - Jacky Van Gompel
- Janssen Pharmaceutical Companies of Johnson & Johnson, 2340 Beerse, Belgium
| | - Steve Gutsell
- Unilever, Safety and Environmental Assurance Centre, Colworth, Beds, UK
| | - Barry Hardy
- Douglas Connect GmbH, Technology Park Basel, Hochbergerstrasse 60C, CH-4057 Basel / Basel-Stadt, Switzerland
| | - James S Harvey
- GlaxoSmithKline Pre-Clinical Development, Park Road, Ware, Hertfordshire, SG12 0DP, UK
| | - Jedd Hillegass
- Bristol-Myers Squibb, Drug Safety Evaluation, 1 Squibb Dr, New Brunswick, NJ 08903, USA
| | - Jui-Hua Hsieh
- Kelly Government Solutions, Research Triangle Park, NC 27709, USA
| | - Chia-Wen Hsu
- FDA Center for Drug Evaluation and Research, Silver Spring, MD 20993, USA
| | - Kathy Hughes
- Existing Substances Risk Assessment Bureau, Health Canada, Ottawa, ON, K1A 0K9, Canada
| | - Robert Jolly
- Toxicology Division, Eli Lilly and Company, Indianapolis, IN, USA
| | - David Jones
- Medicines and Healthcare Products Regulatory Agency, 151 Buckingham Palace Road, London, SW1W 9SZ, UK
| | - Ray Kemper
- Vertex Pharmaceuticals Inc., Discovery and Investigative Toxicology, 50 Northern Ave, Boston, MA, USA
| | - Michelle O Kenyon
- Pfizer Global Research & Development, 558 Eastern Point Road, Groton, CT 06340, USA
| | - Marlene T Kim
- FDA Center for Drug Evaluation and Research, Silver Spring, MD 20993, USA
| | - Naomi L Kruhlak
- FDA Center for Drug Evaluation and Research, Silver Spring, MD 20993, USA
| | - Sunil A Kulkarni
- Existing Substances Risk Assessment Bureau, Health Canada, Ottawa, ON, K1A 0K9, Canada
| | - Klaus Kümmerer
- Institute for Sustainable and Environmental Chemistry, Leuphana University Lüneburg, Scharnhorststraße 1/C13.311b, 21335 Lüneburg, Germany
| | - Penny Leavitt
- Bristol-Myers Squibb, Drug Safety Evaluation, 1 Squibb Dr, New Brunswick, NJ 08903, USA
| | - Scott Masten
- The National Institute of Environmental Health Sciences, Division of the National Toxicology Program, Research Triangle Park, NC 27709, USA
| | - Scott Miller
- Leadscope, Inc., 1393 Dublin Rd, Columbus, OH 43215, USA
| | - Janet Moser
- Chemical Security Analysis Center, Department of Homeland Security, 3401 Ricketts Point Road, Aberdeen Proving Ground, MD 21010-5405, USA; Battelle Memorial Institute, 505 King Avenue, Columbus, OH 43210, USA
| | - Moiz Mumtaz
- Agency for Toxic Substances and Disease Registry, US Department of Health and Human Services, Atlanta, GA, USA
| | - Wolfgang Muster
- Roche Pharmaceutical Research & Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, Switzerland
| | - Louise Neilson
- British American Tobacco, Research and Development, Regents Park Road, Southampton, Hampshire, SO15 8TL, UK
| | - Tudor I Oprea
- Translational Informatics Division, Department of Internal Medicine, Health Sciences Center, The University of New Mexico, NM, USA
| | - Grace Patlewicz
- U.S. Environmental Protection Agency, National Center for Computational Toxicology, Research Triangle Park, NC 27711, USA
| | - Alexandre Paulino
- SAPEC Agro, S.A., Avenida do Rio Tejo, Herdade das Praias, 2910-440 Setúbal, Portugal
| | - Elena Lo Piparo
- Chemical Food Safety Group, Nestlé Research Center, Lausanne, Switzerland
| | - Mark Powley
- FDA Center for Drug Evaluation and Research, Silver Spring, MD 20993, USA
| | - Andrea-Nicole Richarz
- European Commission, Joint Research Centre, Directorate for Health, Consumers and Reference Materials, Chemical Safety and Alternative Methods Unit, Via Enrico Fermi 2749, 21027 Ispra, VA, Italy
| | - Patricia Ruiz
- Agency for Toxic Substances and Disease Registry, US Department of Health and Human Services, Atlanta, GA, USA
| | - Benoit Schilter
- Chemical Food Safety Group, Nestlé Research Center, Lausanne, Switzerland
| | - Wendy Simpson
- Unilever, Safety and Environmental Assurance Centre, Colworth, Beds, UK
| | - Lidiya Stavitskaya
- FDA Center for Drug Evaluation and Research, Silver Spring, MD 20993, USA
| | - David T Szabo
- RAI Services Company, 950 Reynolds Blvd., Winston-Salem, NC 27105, USA
| | - Brian A Wall
- Colgate-Palmolive Company, Piscataway, NJ 08854, USA
| | - Pete Watts
- Bibra, Cantium House, Railway Approach, Wallington, Surrey, SM6 0DZ, UK
| | - Angela T White
- GlaxoSmithKline Pre-Clinical Development, Park Road, Ware, Hertfordshire, SG12 0DP, UK
| | - Joerg Wichard
- Bayer Pharma AG, Investigational Toxicology, Muellerstr. 178, D-13353 Berlin, Germany
| | - Kristine L Witt
- The National Institute of Environmental Health Sciences, Division of the National Toxicology Program, Research Triangle Park, NC 27709, USA
| | - Adam Woolley
- ForthTox Limited, PO Box 13550, Linlithgow, EH49 7YU, UK
| | - David Woolley
- ForthTox Limited, PO Box 13550, Linlithgow, EH49 7YU, UK
| | - Craig Zwickl
- Transendix LLC, 1407 Moores Manor, Indianapolis, IN 46229, USA
| |
|
64
|
Howe DG. A statistical approach to identify, monitor, and manage incomplete curated data sets. BMC Bioinformatics 2018; 19:110. [PMID: 29609549 PMCID: PMC5879614 DOI: 10.1186/s12859-018-2121-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/05/2017] [Accepted: 03/21/2018] [Indexed: 12/16/2022] Open
Abstract
Background: Many biological knowledge bases gather data through expert curation of published literature. High data volume, selective partial curation, delays in access, and publication of data prior to the ability to curate it can result in incomplete curation of published data. Knowing which data sets are incomplete and how incomplete they are remains a challenge. Awareness that a data set may be incomplete is important for proper interpretation and for avoiding flawed hypothesis generation, and can justify further exploration of published literature for additional relevant data. Computational methods to assess data set completeness are needed. One such method is presented here.
Results: In this work, a multivariate linear regression model was used to identify genes in the Zebrafish Information Network (ZFIN) Database having incomplete curated gene expression data sets. Starting with 36,655 gene records from ZFIN, data aggregation, cleansing, and filtering reduced the set to 9870 gene records suitable for training and testing the model to predict the number of expression experiments per gene. Feature engineering and selection identified the following predictive variables: the number of journal publications; the number of journal publications already attributed for gene expression annotation; the percent of journal publications already attributed for expression data; the gene symbol; and the number of transgenic constructs associated with each gene. Twenty-five percent of the gene records (2483 genes) were used to train the model. The remaining 7387 genes were used to test the model. One hundred and twenty-two and 165 of the 7387 tested genes were identified as missing expression annotations based on their residuals being outside the model lower or upper 95% confidence interval, respectively. The model had precision of 0.97 and recall of 0.71 at the negative 95% confidence interval and precision of 0.76 and recall of 0.73 at the positive 95% confidence interval.
Conclusions: This method can be used to identify data sets that are incompletely curated, as demonstrated using the gene expression data set from ZFIN. This information can help both database resources and data consumers gauge when it may be useful to look further for published data to augment the existing expertly curated information.
Electronic supplementary material: The online version of this article (10.1186/s12859-018-2121-6) contains supplementary material, which is available to authorized users.
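The residual-based flagging and the precision/recall figures in this abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the toy data, and the use of a normal approximation (mean residual plus or minus 1.96 sample standard deviations) as the "95% interval" are all assumptions made for illustration.

```python
from statistics import mean, stdev

def flag_by_residual(observed, predicted, z=1.96):
    """Label each record by where its residual falls relative to an
    approximate 95% band (assumed here: mean +/- z * sample std dev)."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    center, spread = mean(residuals), stdev(residuals)
    low, high = center - z * spread, center + z * spread
    labels = []
    for r in residuals:
        if r < low:
            labels.append("below")
        elif r > high:
            labels.append("above")
        else:
            labels.append("within")
    return labels

def precision_recall(flagged, truly_incomplete):
    """Precision and recall of a flagged set against a reference set."""
    true_pos = len(flagged & truly_incomplete)
    precision = true_pos / len(flagged) if flagged else 0.0
    recall = true_pos / len(truly_incomplete) if truly_incomplete else 0.0
    return precision, recall

# Hypothetical toy data: ten genes, one with far fewer observed
# expression experiments than the (hypothetical) model predicted.
observed = [5] * 9 + [0]
predicted = [5] * 9 + [10]
labels = flag_by_residual(observed, predicted)
```

In practice the predicted values would come from the fitted regression model, and the reference set for precision and recall would come from manual review of the flagged records.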
Affiliation(s)
- Douglas G Howe
- The Institute of Neuroscience, University of Oregon, Eugene, OR, USA.
| |
|
65
|
Mansouri K, Grulke CM, Judson RS, Williams AJ. OPERA models for predicting physicochemical properties and environmental fate endpoints. J Cheminform 2018. [PMID: 29520515 PMCID: PMC5843579 DOI: 10.1186/s13321-018-0263-1] [Citation(s) in RCA: 271] [Impact Index Per Article: 45.2] [Indexed: 12/24/2022] Open
Abstract
The collection of chemical structure information and associated experimental data for quantitative structure–activity/property relationship (QSAR/QSPR) modeling is facilitated by an increasing number of public databases containing large amounts of useful data. However, the performance of QSAR models highly depends on the quality of the data and modeling methodology used. This study aims to develop robust QSAR/QSPR models for chemical properties of environmental interest that can be used for regulatory purposes. This study primarily uses data from the publicly available PHYSPROP database consisting of a set of 13 common physicochemical and environmental fate properties. These datasets have undergone extensive curation using an automated workflow to select only high-quality data, and the chemical structures were standardized prior to calculation of the molecular descriptors. The modeling procedure was developed based on the five Organization for Economic Cooperation and Development (OECD) principles for QSAR models. A weighted k-nearest neighbor approach was adopted using a minimum number of required descriptors calculated using PaDEL, an open-source software. The genetic algorithms selected only the most pertinent and mechanistically interpretable descriptors (2–15, with an average of 11 descriptors). The sizes of the modeled datasets varied from 150 chemicals for biodegradability half-life to 14,050 chemicals for logP, with an average of 3222 chemicals across all endpoints. The optimal models were built on randomly selected training sets (75%) and validated using fivefold cross-validation (CV) and test sets (25%). The CV Q2 of the models varied from 0.72 to 0.95, with an average of 0.86 and an R2 test value from 0.71 to 0.96, with an average of 0.82. Modeling and performance details are described in QSAR model reporting format and were validated by the European Commission’s Joint Research Center to be OECD compliant. 
All models are freely available as an open-source, command-line application called OPEn structure–activity/property Relationship App (OPERA). OPERA models were applied to more than 750,000 chemicals to produce freely available predicted data on the U.S. Environmental Protection Agency’s CompTox Chemistry Dashboard.
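As a rough illustration of the weighted k-nearest neighbor idea named in this abstract, the following Python sketch predicts a property value as a distance-weighted average over the nearest training chemicals. The inverse-distance weighting, Euclidean distance, function name, and toy descriptors are assumptions; the abstract does not specify OPERA's actual weighting scheme or descriptor handling.

```python
import math

def weighted_knn_predict(train_X, train_y, query, k=3, eps=1e-9):
    """Predict a value for `query` as the inverse-distance-weighted mean
    of the k nearest training points (Euclidean distance assumed)."""
    ranked = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    neighbors = ranked[:k]
    # eps avoids division by zero when the query matches a training point
    weights = [1.0 / (d + eps) for d, _ in neighbors]
    weighted_sum = sum(w * y for w, (_, y) in zip(weights, neighbors))
    return weighted_sum / sum(weights)

# Hypothetical one-descriptor training set and property values.
train_X = [(0.0,), (1.0,), (2.0,), (10.0,)]
train_y = [0.0, 1.0, 2.0, 10.0]
prediction = weighted_knn_predict(train_X, train_y, (0.5,), k=2)
```

A real workflow would standardize the descriptors first and choose k and the descriptor subset by cross-validation, as the abstract describes for the 75%/25% split with fivefold CV.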
Affiliation(s)
- Kamel Mansouri
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA; Oak Ridge Institute for Science and Education, 1299 Bethel Valley Road, Oak Ridge, TN, 37830, USA; ScitoVation LLC, 6 Davis Drive, Research Triangle Park, NC, 27709, USA.
| | - Chris M Grulke
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
| | - Richard S Judson
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
| | - Antony J Williams
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
| |
|
66
|
Newton SR, McMahen RL, Sobus JR, Mansouri K, Williams AJ, McEachran AD, Strynar MJ. Suspect screening and non-targeted analysis of drinking water using point-of-use filters. Environmental Pollution (Barking, Essex : 1987) 2018; 234:297-306. [PMID: 29182974 PMCID: PMC6145080 DOI: 10.1016/j.envpol.2017.11.033] [Citation(s) in RCA: 83] [Impact Index Per Article: 13.8] [Received: 06/27/2017] [Revised: 11/07/2017] [Accepted: 11/08/2017] [Indexed: 05/18/2023]
Abstract
Monitored contaminants in drinking water represent a small portion of the total compounds present, many of which may be relevant to human health. To understand the totality of human exposure to compounds in drinking water, broader monitoring methods are imperative. In an effort to more fully characterize the drinking water exposome, point-of-use water filtration devices (Brita® filters) were employed to collect time-integrated drinking water samples in a pilot study of nine North Carolina homes. A suspect screening analysis was performed by matching high resolution mass spectra of unknown features to molecular formulas from EPA's DSSTox database. Candidate compounds with those formulas were retrieved from the EPA's CompTox Chemistry Dashboard, a recently developed data hub for approximately 720,000 compounds. To prioritize compounds into those most relevant for human health, toxicity data from the US federal collaborative Tox21 program and the EPA ToxCast program, as well as exposure estimates from EPA's ExpoCast program, were used in conjunction with sample detection frequency and abundance to calculate a "ToxPi" score for each candidate compound. From ∼15,000 molecular features in the raw data, 91 candidate compounds were ultimately grouped into the highest priority class for follow up study. Fifteen of these compounds were confirmed using analytical standards including the highest priority compound, 1,2-Benzisothiazolin-3-one, which appeared in 7 out of 9 samples. The majority of the other high priority compounds are not targets of routine monitoring, highlighting major gaps in our understanding of drinking water exposures. General product-use categories from EPA's CPCat database revealed that several of the high priority chemicals are used in industrial processes, indicating the drinking water in central North Carolina may be impacted by local industries.
Affiliation(s)
- Seth R Newton
- United States Environmental Protection Agency, National Exposure Research Laboratory, Research Triangle Park, NC 27709, United States.
| | - Rebecca L McMahen
- United States Environmental Protection Agency, National Exposure Research Laboratory, Research Triangle Park, NC 27709, United States; Oak Ridge Institute for Science and Education Research Participant, 109 T.W. Alexander Drive, Research Triangle Park, NC 27709, United States
| | - Jon R Sobus
- United States Environmental Protection Agency, National Exposure Research Laboratory, Research Triangle Park, NC 27709, United States
| | - Kamel Mansouri
- Oak Ridge Institute for Science and Education Research Participant, 109 T.W. Alexander Drive, Research Triangle Park, NC 27709, United States; United States Environmental Protection Agency, National Center for Computational Toxicology, Research Triangle Park, NC 27709, United States
| | - Antony J Williams
- United States Environmental Protection Agency, National Center for Computational Toxicology, Research Triangle Park, NC 27709, United States
| | - Andrew D McEachran
- Oak Ridge Institute for Science and Education Research Participant, 109 T.W. Alexander Drive, Research Triangle Park, NC 27709, United States; United States Environmental Protection Agency, National Center for Computational Toxicology, Research Triangle Park, NC 27709, United States
| | - Mark J Strynar
- United States Environmental Protection Agency, National Exposure Research Laboratory, Research Triangle Park, NC 27709, United States
| |
|
67
|
Tebes-Stevens C, Patel JM, Koopmans M, Olmstead J, Hilal SH, Pope N, Weber EJ, Wolfe K. Demonstration of a consensus approach for the calculation of physicochemical properties required for environmental fate assessments. Chemosphere 2018; 194:94-106. [PMID: 29197820 PMCID: PMC6146973 DOI: 10.1016/j.chemosphere.2017.11.137] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 10/05/2017] [Revised: 11/21/2017] [Accepted: 11/22/2017] [Indexed: 05/21/2023]
Abstract
Eight software applications are compared for their performance in estimating the octanol-water partition coefficient (Kow), melting point, vapor pressure and water solubility for a dataset of polychlorinated biphenyls, polybrominated diphenyl ethers, polychlorinated dibenzodioxins, and polycyclic aromatic hydrocarbons. The predicted property values are compared against a curated dataset of measured property values compiled from the scientific literature with careful consideration given to the analytical methods used for property measurements of these hydrophobic chemicals. The variability in the predicted values from different calculators generally increases for higher values of Kow and melting point and for lower values of water solubility and vapor pressure. For each property, no individual calculator outperforms the others for all four of the chemical classes included in the analysis. Because calculator performance varies based on chemical class and property value, the geometric mean and the median of the calculated values from multiple calculators that use different estimation algorithms are recommended as more reliable estimates of the property value than the value from any single calculator.
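The consensus estimate recommended in this abstract (geometric mean and median across calculators) can be sketched in a few lines of Python. The function name and the example values are hypothetical and do not come from the paper; the sketch assumes the estimates are given on a linear scale (e.g., Kow rather than log Kow) and are strictly positive.

```python
import math
from statistics import median

def consensus_estimates(values):
    """Return (geometric_mean, median) of strictly positive estimates
    of one property for one chemical from several calculators."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("need at least one estimate, all strictly positive")
    # Geometric mean computed as the antilog of the mean log10 value
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log, median(values)

# Hypothetical Kow estimates from four calculators; one is a high outlier.
kow_estimates = [1.0e5, 2.0e5, 4.0e5, 8.0e6]
geo_mean, med = consensus_estimates(kow_estimates)
```

Both statistics damp the influence of a single outlying calculator compared with the arithmetic mean, which is one plausible reading of why the abstract recommends them for properties that span orders of magnitude.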
Affiliation(s)
- Caroline Tebes-Stevens
- U.S. Environmental Protection Agency, National Exposure Research Laboratory, Athens, GA 30605, United States.
| | - Jay M Patel
- ORISE Fellow, U.S. Environmental Protection Agency, National Exposure Research Laboratory, Athens, GA 30605, United States
| | - Michaela Koopmans
- ORAU, U.S. Environmental Protection Agency, National Exposure Research Laboratory, Athens, GA 30605, United States
| | - John Olmstead
- ORAU, U.S. Environmental Protection Agency, National Exposure Research Laboratory, Athens, GA 30605, United States
| | - Said H Hilal
- U.S. Environmental Protection Agency, National Exposure Research Laboratory, Athens, GA 30605, United States
| | - Nick Pope
- Independent Contractor, Hildebran, NC, United States
| | - Eric J Weber
- U.S. Environmental Protection Agency, National Exposure Research Laboratory, Athens, GA 30605, United States
| | - Kurt Wolfe
- U.S. Environmental Protection Agency, National Exposure Research Laboratory, Athens, GA 30605, United States
| |
|
68
|
Truong L, Ouedraogo G, Pham L, Clouzeau J, Loisel-Joubert S, Blanchet D, Noçairi H, Setzer W, Judson R, Grulke C, Mansouri K, Martin M. Predicting in vivo effect levels for repeat-dose systemic toxicity using chemical, biological, kinetic and study covariates. Arch Toxicol 2018; 92:587-600. [PMID: 29075892 PMCID: PMC5818596 DOI: 10.1007/s00204-017-2067-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 04/03/2017] [Accepted: 09/18/2017] [Indexed: 11/29/2022]
Abstract
In an effort to address a major challenge in chemical safety assessment, alternative approaches for characterizing systemic effect levels, a predictive model was developed. Systemic effect levels were curated from ToxRefDB, HESS-DB and COSMOS-DB from numerous study types totaling 4379 in vivo studies for 1247 chemicals. Observed systemic effects in mammalian models are a complex function of chemical dynamics, kinetics, and inter- and intra-individual variability. To address this complex problem, systemic effect levels were modeled at the study-level by leveraging study covariates (e.g., study type, strain, administration route) in addition to multiple descriptor sets, including chemical (ToxPrint, PaDEL, and Physchem), biological (ToxCast), and kinetic descriptors. Using random forest modeling with cross-validation and external validation procedures, study-level covariates alone accounted for approximately 15% of the variance reducing the root mean squared error (RMSE) from 0.96 log10 to 0.85 log10 mg/kg/day, providing a baseline performance metric (lower expectation of model performance). A consensus model developed using a combination of study-level covariates, chemical, biological, and kinetic descriptors explained a total of 43% of the variance with an RMSE of 0.69 log10 mg/kg/day. A benchmark model (upper expectation of model performance) was also developed with an RMSE of 0.5 log10 mg/kg/day by incorporating study-level covariates and the mean effect level per chemical. To achieve a representative chemical-level prediction, the minimum study-level predicted and observed effect level per chemical were compared reducing the RMSE from 1.0 to 0.73 log10 mg/kg/day, equivalent to 87% of predictions falling within an order-of-magnitude of the observed value. 
Although biological descriptors did not improve model performance, the final model was enriched for biological descriptors that indicated xenobiotic metabolism gene expression, oxidative stress, and cytotoxicity, demonstrating the importance of accounting for kinetics and non-specific bioactivity in predicting systemic effect levels. Herein, we generated an externally predictive model of systemic effect levels for use as a safety assessment tool and have generated forward predictions for over 30,000 chemicals.
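The two performance measures quoted in this abstract, RMSE in log10 units and the share of predictions within an order of magnitude of the observed value, can be computed as in the following sketch. The function names and the four observed/predicted pairs are hypothetical illustrations, not data from the study; values are assumed to be already log10-transformed (log10 mg/kg/day).

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def fraction_within(observed, predicted, tolerance=1.0):
    """Fraction of predictions within `tolerance` log10 units of the
    observed value (tolerance=1.0 means one order of magnitude)."""
    hits = sum(abs(p - o) <= tolerance for o, p in zip(observed, predicted))
    return hits / len(observed)


# Made-up effect levels in log10 mg/kg/day
observed = [1.0, 2.0, 0.5, -0.3]
predicted = [1.5, 1.0, 0.5, 1.2]

print(f"RMSE = {rmse(observed, predicted):.3f} log10 units")
print(f"within one order of magnitude: {fraction_within(observed, predicted):.0%}")
```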
Affiliation(s)
- Lisa Truong
- National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
- Currently at Oregon State University, Corvallis, USA
| | - Gladys Ouedraogo
- L'Oréal Safety Research Department, 1 Avenue E. Schueller, 93600, Aulnay-Sous-Bois, France
| | - LyLy Pham
- National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
| | - Jacques Clouzeau
- L'Oréal Safety Research Department, 1 Avenue E. Schueller, 93600, Aulnay-Sous-Bois, France
| | - Sophie Loisel-Joubert
- L'Oréal Safety Research Department, 1 Avenue E. Schueller, 93600, Aulnay-Sous-Bois, France
| | - Delphine Blanchet
- L'Oréal Safety Research Department, 1 Avenue E. Schueller, 93600, Aulnay-Sous-Bois, France
| | - Hicham Noçairi
- L'Oréal Safety Research Department, 1 Avenue E. Schueller, 93600, Aulnay-Sous-Bois, France
| | - Woodrow Setzer
- National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
| | - Richard Judson
- National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
| | - Chris Grulke
- National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
| | - Kamel Mansouri
- National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
- Currently at Scitovation LLC, Research Triangle Park, NC, USA
| | - Matthew Martin
- National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, 27711, USA.
- Currently at Pfizer, Inc, Drug Safety Research and Development, 445 Eastern Point Road, MS 8274-1224, Groton, CT, 06340, USA.
| |
|
69
|
McMullen PD, Andersen ME, Cholewa B, Clewell HJ, Dunnick KM, Hartman JK, Mansouri K, Minto MS, Nicolas CI, Phillips MB, Slattery S, Yoon M, Clewell RA. Evaluating opportunities for advancing the use of alternative methods in risk assessment through the development of fit-for-purpose in vitro assays. Toxicol In Vitro 2018; 48:310-317. [PMID: 29391263 DOI: 10.1016/j.tiv.2018.01.027] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 07/26/2017] [Revised: 12/27/2017] [Accepted: 01/29/2018] [Indexed: 12/11/2022]
Abstract
An evolving regulatory, scientific, and legislative landscape is driving a fundamental change in how chemical safety decisions are made. As we move to implement changes, regulatory agencies and industry are beginning to adopt tiered approaches, which leverage high-throughput screening technologies for prioritization and read across, followed by interrogation of "hit chemicals" with more rigorous dose-response assessment either in fit-for-purpose human cell-based assays or with traditional in vivo tests. However, to date, suitable in vitro alternatives do not exist for the vast majority of the organ toxicities that form the basis of current regulatory decisions. To successfully support safety decisions, biologically relevant, quantitative, cell-based assays that evaluate dose-response and identify regions of safety for chemical exposure are required. This review evaluates the current state of the science in the development of such assays, identifies key gaps in the current tests, and recommends areas where research efforts may be focused to help move the risk assessment community towards more wide-spread use of in vitro methods. Our analysis suggests that a key shortcoming in the current efforts is the ability to test volatile compounds and to predict pulmonary toxicity. We present a mechanistically-based path forward for the development of a fit-for-purpose lung toxicity assay.
Affiliation(s)
| | | | - Brian Cholewa
- ScitoVation, LLC., Research Triangle Park, NC 27709, United States
| | - Harvey J Clewell
- ScitoVation, LLC., Research Triangle Park, NC 27709, United States
| | | | | | - Kamel Mansouri
- ScitoVation, LLC., Research Triangle Park, NC 27709, United States
| | - Melyssa S Minto
- ScitoVation, LLC., Research Triangle Park, NC 27709, United States
| | | | | | - Scott Slattery
- ScitoVation, LLC., Research Triangle Park, NC 27709, United States
| | - Miyoung Yoon
- ScitoVation, LLC., Research Triangle Park, NC 27709, United States
| | | |
|
70
|
A comparison of three liquid chromatography (LC) retention time prediction models. Talanta 2018; 182:371-379. [PMID: 29501166 DOI: 10.1016/j.talanta.2018.01.022] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 09/22/2017] [Revised: 01/08/2018] [Accepted: 01/09/2018] [Indexed: 11/20/2022]
Abstract
High-resolution mass spectrometry (HRMS) data has revolutionized the identification of environmental contaminants through non-targeted analysis (NTA). However, chemical identification remains challenging due to the vast number of unknown molecular features typically observed in environmental samples. Advanced data processing techniques are required to improve chemical identification workflows. The ideal workflow brings together a variety of data and tools to increase the certainty of identification. One such tool is chromatographic retention time (RT) prediction, which can be used to reduce the number of possible suspect chemicals within an observed RT window. This paper compares the relative predictive ability and applicability to NTA workflows of three RT prediction models: (1) a logP (octanol-water partition coefficient)-based model using EPI Suite™ logP predictions; (2) a commercially available ACD/ChromGenius model; and (3) a newly developed Quantitative Structure Retention Relationship model called OPERA-RT. Models were developed using the same training set of 78 compounds with experimental RT data and evaluated for external predictivity on an identical test set of 19 compounds. Both the ACD/ChromGenius and OPERA-RT models outperformed the EPI Suite™ logP-based RT model (R² = 0.81-0.92, 0.86-0.83, 0.66-0.69 for training-test sets, respectively). Further, both OPERA-RT and ACD/ChromGenius predicted 95% of RTs within a ±15% chromatographic time window of experimental RTs. Based on these results, we simulated an NTA workflow with a ten-fold larger list of candidate structures generated for formulae of the known test set chemicals using the U.S. EPA's CompTox Chemistry Dashboard (https://comptox.epa.gov/dashboard). RTs for all candidates were predicted using both ACD/ChromGenius and OPERA-RT, and RT screening windows were assessed for their ability to filter out unlikely candidate chemicals and enhance potential identification.
Compared to ACD/ChromGenius, OPERA-RT screened out a greater percentage of candidate structures within a 3-min RT window (60% vs. 40%) but retained fewer of the known chemicals (42% vs. 83%). By several metrics, the OPERA-RT model, generated as a proof-of-concept using a limited set of open source data, performed as well as the commercial tool ACD/ChromGenius when constrained to the same small training and test sets. As the availability of RT data increases, we expect the OPERA-RT model's predictive ability will increase.
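The RT screening step described in this abstract can be sketched as a simple filter over predicted retention times. Everything named here is hypothetical: the candidate labels, the RT values, and the function are invented for illustration, and the sketch assumes the "3-min RT window" is a total width centred on the observed RT (a half-width of 1.5 min), which the abstract does not specify.

```python
def screen_candidates(observed_rt, predicted_rts, half_width=1.5):
    """Keep candidates whose predicted RT (minutes) lies within
    `half_width` minutes of the observed RT.

    Assumption: a 3-min window is interpreted as +/- 1.5 min.
    """
    return {
        name: rt
        for name, rt in predicted_rts.items()
        if abs(rt - observed_rt) <= half_width
    }


# Made-up predicted RTs (minutes) for four candidate structures
candidates = {"cand_A": 5.2, "cand_B": 9.8, "cand_C": 6.4, "cand_D": 2.0}
kept = screen_candidates(observed_rt=6.0, predicted_rts=candidates)
screened_out_fraction = 1 - len(kept) / len(candidates)

print(sorted(kept))
print(f"screened out: {screened_out_fraction:.0%}")
```

A narrower window removes more candidates but raises the chance of discarding the true structure, which mirrors the trade-off the abstract reports between the share of candidates screened out and the share of known chemicals retained.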
|
71
|
Grisoni F, Ballabio D, Todeschini R, Consonni V. Molecular Descriptors for Structure-Activity Applications: A Hands-On Approach. Methods Mol Biol 2018; 1800:3-53. [PMID: 29934886 DOI: 10.1007/978-1-4939-7899-1_1] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 06/08/2023]
Abstract
Molecular descriptors capture diverse parts of the structural information of molecules and they are the support of many contemporary computer-assisted toxicological and chemical applications. After briefly introducing some fundamental concepts of structure-activity applications (e.g., molecular descriptor dimensionality, classical vs. fingerprint description, and activity landscapes), this chapter guides the readers through a step-by-step explanation of molecular descriptors rationale and application. To this end, the chapter illustrates a case study of a recently published application of molecular descriptors for modeling the activity on cytochrome P450.
Affiliation(s)
- Francesca Grisoni
- Department of Earth and Environmental Sciences, Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, Italy.
| | - Davide Ballabio
- Department of Earth and Environmental Sciences, Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, Italy
| | - Roberto Todeschini
- Department of Earth and Environmental Sciences, Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, Italy
| | - Viviana Consonni
- Department of Earth and Environmental Sciences, Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, Italy
| |
|
72
|
Grisoni F, Consonni V, Todeschini R. Impact of Molecular Descriptors on Computational Models. Methods Mol Biol 2018; 1825:171-209. [PMID: 30334206 DOI: 10.1007/978-1-4939-8639-2_5] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Indexed: 01/13/2023]
Abstract
Molecular descriptors encode a wide variety of molecular information and have become the support of many contemporary chemoinformatic and bioinformatic applications. They grasp specific molecular features (e.g., geometry, shape, pharmacophores, or atomic properties) and directly affect computational models, in terms of outcome, performance, and applicability. This chapter aims to illustrate the impact of different molecular descriptors on the structural information captured and on the perceived chemical similarity among molecules. After introducing the fundamental concepts of molecular descriptor theory and application, a step-by-step retrospective virtual screening procedure guides users through the fundamental processing steps and discusses the impact of different types of molecular descriptors.
Affiliation(s)
- Francesca Grisoni
- Department of Earth and Environmental Sciences, Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, Italy.
| | - Viviana Consonni
- Department of Earth and Environmental Sciences, Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, Italy
| | - Roberto Todeschini
- Department of Earth and Environmental Sciences, Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, Italy
| |
|
73
|
Williams AJ, Grulke CM, Edwards J, McEachran AD, Mansouri K, Baker NC, Patlewicz G, Shah I, Wambaugh JF, Judson RS, Richard AM. The CompTox Chemistry Dashboard: a community data resource for environmental chemistry. J Cheminform 2017; 9:61. [PMID: 29185060 PMCID: PMC5705535 DOI: 10.1186/s13321-017-0247-6] [Citation(s) in RCA: 584] [Impact Index Per Article: 83.4] [Received: 09/30/2017] [Accepted: 11/18/2017] [Indexed: 11/10/2022]
Abstract
Despite an abundance of online databases providing access to chemical data, there is increasing demand for high-quality, structure-curated, open data to meet the various needs of the environmental sciences and computational toxicology communities. The U.S. Environmental Protection Agency's (EPA) web-based CompTox Chemistry Dashboard is addressing these needs by integrating diverse types of relevant domain data through a cheminformatics layer, built upon a database of curated substances linked to chemical structures. These data include physicochemical, environmental fate and transport, exposure, usage, in vivo toxicity, and in vitro bioassay data, surfaced through an integration hub with link-outs to additional EPA data and public domain online resources. Batch searching allows for direct chemical identifier (ID) mapping and downloading of multiple data streams in several different formats. This facilitates fast access to available structure, property, toxicity, and bioassay data for collections of chemicals (hundreds to thousands at a time). Advanced search capabilities are available to support, for example, non-targeted analysis and identification of chemicals using mass spectrometry. The contents of the chemistry database, presently containing ~ 760,000 substances, are available as public domain data for download. The chemistry content underpinning the Dashboard has been aggregated over the past 15 years by both manual and auto-curation techniques within EPA's DSSTox project. DSSTox chemical content is subject to strict quality controls to enforce consistency among chemical substance-structure identifiers, as well as list curation review to ensure accurate linkages of DSSTox substances to chemical lists and associated data. The Dashboard, publicly launched in April 2016, has expanded considerably in content and user traffic over the past year. 
It is continuously evolving with the growth of DSSTox into high-interest or data-rich domains of interest to EPA, such as chemicals on the Toxic Substances Control Act listing, while providing the user community with a flexible and dynamic web-based platform for integration, processing, visualization and delivery of data and resources. The Dashboard provides support for a broad array of research and regulatory programs across the worldwide community of toxicologists and environmental scientists.
Affiliation(s)
- Antony J. Williams
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC USA
| | - Christopher M. Grulke
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC USA
| | - Jeff Edwards
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC USA
| | | | - Kamel Mansouri
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC USA
- Oak Ridge Institute for Science and Education, Oak Ridge, TN USA
- ScitoVation LLC, Research Triangle Park, NC USA
| | | | - Grace Patlewicz
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC USA
| | - Imran Shah
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC USA
| | - John F. Wambaugh
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC USA
| | - Richard S. Judson
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC USA
| | - Ann M. Richard
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC USA
| |
|
74
|
Biryol D, Nicolas CI, Wambaugh J, Phillips K, Isaacs K. High-throughput dietary exposure predictions for chemical migrants from food contact substances for use in chemical prioritization. ENVIRONMENT INTERNATIONAL 2017; 108:185-194. [PMID: 28865378 PMCID: PMC5894819 DOI: 10.1016/j.envint.2017.08.004] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 09/23/2016] [Revised: 08/07/2017] [Accepted: 08/08/2017] [Indexed: 05/21/2023]
Abstract
Under the ExpoCast program, United States Environmental Protection Agency (EPA) researchers have developed a high-throughput (HT) framework for estimating aggregate exposures to chemicals from multiple pathways to support rapid prioritization of chemicals. Here, we present methods to estimate HT exposures to chemicals migrating into food from food contact substances (FCS). These methods consisted of combining an empirical model of chemical migration with estimates of daily population food intakes derived from food diaries from the National Health and Nutrition Examination Survey (NHANES). A linear regression model for migration at equilibrium was developed by fitting available migration measurements as a function of temperature, food type (i.e., fatty, aqueous, acidic, alcoholic), initial chemical concentration in the FCS (C0) and chemical properties. The most predictive variables in the resulting model were C0, molecular weight, log Kow, and food type (R² = 0.71, p < 0.0001). Migration-based concentrations for 1009 chemicals identified via publicly-available data sources as being present in polymer FCSs were predicted for 12 food groups (combinations of 3 storage temperatures and food type). The model was parameterized with screening-level estimates of C0 based on the functional role of chemicals in FCS. By combining these concentrations with daily intakes for food groups derived from NHANES, population ingestion exposures of chemical in mg/kg-bodyweight/day (mg/kg-BW/day) were estimated. Calibrated aggregate exposures were estimated for 1931 chemicals by fitting HT FCS and consumer product exposures to exposures inferred from NHANES biomonitoring (R² = 0.61, p < 0.001); both FCS and consumer product pathway exposures were significantly predictive of inferred exposures. Including the FCS pathway significantly impacted the ratio of predicted exposures to those estimated to produce steady-state blood concentrations equal to in-vitro bioactive concentrations.
While these HT methods have large uncertainties (and thus may not be appropriate for assessments of single chemicals), they can provide critical refinement to aggregate exposure predictions used in risk-based chemical priority-setting.
Affiliation(s)
- Derya Biryol
- Oak Ridge Institute for Science and Education, Oak Ridge, TN 37830, United States; U.S. Environmental Protection Agency, Office of Research and Development, National Exposure Research Laboratory, 109 T.W. Alexander Drive, Research Triangle Park, NC 27709, United States
| | - Chantel I Nicolas
- Oak Ridge Institute for Science and Education, Oak Ridge, TN 37830, United States; U.S. Environmental Protection Agency, Office of Research and Development, National Center for Computational Toxicology, 109 T.W. Alexander Drive, Research Triangle Park, NC 27709, United States
| | - John Wambaugh
- U.S. Environmental Protection Agency, Office of Research and Development, National Center for Computational Toxicology, 109 T.W. Alexander Drive, Research Triangle Park, NC 27709, United States
| | - Katherine Phillips
- U.S. Environmental Protection Agency, Office of Research and Development, National Exposure Research Laboratory, 109 T.W. Alexander Drive, Research Triangle Park, NC 27709, United States
| | - Kristin Isaacs
- U.S. Environmental Protection Agency, Office of Research and Development, National Exposure Research Laboratory, 109 T.W. Alexander Drive, Research Triangle Park, NC 27709, United States.
| |
|
75
|
Devillers J, Devillers H, Bro E, Millot F. Expert judgment based multicriteria decision models to assess the risk of pesticides on reproduction failures of grey partridge. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2017; 28:889-911. [PMID: 29206499 DOI: 10.1080/1062936x.2017.1402449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2017] [Accepted: 11/04/2017] [Indexed: 06/07/2023]
Abstract
A suite of models is proposed for estimating the risk of pesticides against the grey partridge (Perdix perdix) and their clutches. Radio-tracked data of females, description and location of the clutches, and data on the pesticide treatments during the laying periods of the partridges were used as basic information. Quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) modelling allowed us to characterize the pesticides by their 1-octanol/water partition coefficient (log P), vapour pressure, primary and ultimate biodegradation potential, acute toxicity (LD50) on P. perdix, and endocrine disruption potential. From these physicochemical and toxicological data, the system of integration of risk with interaction of scores (SIRIS) method was used to design scores of risk for pesticides, alone or in mixture. A program, written in R (version 3.1.1), called Simulation of Toxicity in Perdix perdix (SimToxPP), was designed for estimating the risk of substances, considered alone or in mixture, against the grey partridge during breeding. The software tool is flexible enough to simulate realistic in situ scenarios. Different examples of applications are shown. The advantages and limitations of the approach are briefly discussed.
Affiliation(s)
| | - H Devillers
- Micalis Institute, INRA, University Paris-Saclay, Jouy-en-Josas, France
| | - E Bro
- Research Department, National Game and Wildlife Institute (ONCFS), Auffargis, France
| | - F Millot
- Research Department, National Game and Wildlife Institute (ONCFS), Auffargis, France
| |
|
76
|
Murtazalieva KA, Druzhilovskiy DS, Goel RK, Sastry GN, Poroikov VV. How good are publicly available web services that predict bioactivity profiles for drug repurposing? SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2017; 28:843-862. [PMID: 29183230 DOI: 10.1080/1062936x.2017.1399448] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 10/25/2017] [Accepted: 10/29/2017] [Indexed: 06/07/2023]
Abstract
Drug repurposing provides a non-laborious and less expensive way of finding new human medicines. Computational assessment of bioactivity profiles sheds light on the hidden pharmacological potential of launched drugs. Currently, several computational tools that predict multitarget profiles of drug-like compounds are freely available via the Internet. They are based on chemical similarity assessment (ChemProt, SuperPred, SEA, SwissTargetPrediction and TargetHunter) or machine learning methods (ChemProt and PASS). To compare their performance, this study created two evaluation sets, consisting of (1) 50 well-known repositioned drugs and (2) 12 drugs recently patented for new indications. In the first set, sensitivity values varied from 0.64 (TarPred) to 1.00 (PASS Online) for the initial indications and from 0.64 (TarPred) to 0.98 (PASS Online) for the repurposed indications. In the second set, sensitivity values varied from 0.08 (SuperPred) to 1.00 (PASS Online) for the initial indications and from 0.00 (SuperPred) to 1.00 (PASS Online) for the repurposed indications. Thus, this analysis demonstrated that the performance of machine learning methods surpassed that of chemical similarity assessments, particularly in the case of novel repurposed indications.
Affiliation(s)
- K A Murtazalieva
- Institute of Biomedical Chemistry, Moscow, Russia
- Pirogov Russian National Research Medical University, Moscow, Russia
| | | | - R K Goel
- Punjabi University, Patiala, Punjab, India
| | - G N Sastry
- CSIR-Indian Institute of Chemical Technology, Hyderabad, India
| | - V V Poroikov
- Institute of Biomedical Chemistry, Moscow, Russia
| |
|
77
|
Gedeck P, Skolnik S, Rodde S. Developing Collaborative QSAR Models Without Sharing Structures. J Chem Inf Model 2017; 57:1847-1858. [DOI: 10.1021/acs.jcim.7b00315] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
Affiliation(s)
- Peter Gedeck
- Peter Gedeck LLC, 2309 Grove Avenue, Falls Church, Virginia 22046, United States
| | - Suzanne Skolnik
- Novartis Institute for Biomedical Research, 250 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - Stephane Rodde
- Novartis Institute for Biomedical Research, Postfach, CH-4002 Basel, Switzerland
| |
|
78
|
Schymanski EL, Williams AJ. Open Science for Identifying "Known Unknown" Chemicals. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2017; 51:5357-5359. [PMID: 28475325 PMCID: PMC6260822 DOI: 10.1021/acs.est.7b01908] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Indexed: 05/18/2023]
Affiliation(s)
- Emma L. Schymanski
- Eawag: Swiss Federal Institute for Aquatic Science and Technology, Überlandstrasse 133, 8600 Dübendorf, Switzerland.
| | - Antony J. Williams
- National Center for Computational Toxicology, US EPA, Research Triangle Park, Durham, NC, 27711.
| |
|
79
|
Card ML, Gomez-Alvarez V, Lee WH, Lynch DG, Orentas NS, Lee MT, Wong EM, Boethling RS. History of EPI Suite™ and future perspectives on chemical property estimation in US Toxic Substances Control Act new chemical risk assessments. ENVIRONMENTAL SCIENCE. PROCESSES & IMPACTS 2017; 19:203-212. [PMID: 28275775 DOI: 10.1039/c7em00064b] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 05/21/2023]
Abstract
Chemical property estimation is a key component in many industrial, academic, and regulatory activities, including in the risk assessment associated with the approximately 1000 new chemical pre-manufacture notices the United States Environmental Protection Agency (US EPA) receives annually. The US EPA evaluates fate, exposure and toxicity under the 1976 Toxic Substances Control Act (amended by the 2016 Frank R. Lautenberg Chemical Safety for the 21st Century Act), which does not require test data with new chemical applications. Though the submission of data is not required, the US EPA has, over the past 40 years, occasionally received chemical-specific data with pre-manufacture notices. The US EPA has been actively using this and publicly available data to develop and refine predictive computerized models, most of which are housed in EPI Suite™, to estimate chemical properties used in the risk assessment of new chemicals. The US EPA develops and uses models based on (quantitative) structure-activity relationships ([Q]SARs) to estimate critical parameters. As in any evolving field, (Q)SARs have experienced successes, suffered failures, and responded to emerging trends. Correlations of a chemical structure with its properties or biological activity were first demonstrated in the late 19th century and today have been encapsulated in a myriad of quantitative and qualitative SARs. The development and proliferation of the personal computer in the late 20th century gave rise to a quickly increasing number of property estimation models, and continually improved computing power and connectivity among researchers via the internet are enabling the development of increasingly complex models.
Affiliation(s)
- Marcella L Card
  - United States Environmental Protection Agency Office of Pollution Prevention and Toxics, Washington, DC 20004, USA.
- Vicente Gomez-Alvarez
  - United States Environmental Protection Agency Office of Pollution Prevention and Toxics, Washington, DC 20004, USA.
- Wen-Hsiung Lee
  - United States Environmental Protection Agency Office of Pollution Prevention and Toxics, Washington, DC 20004, USA.
- David G Lynch
  - United States Environmental Protection Agency Office of Pollution Prevention and Toxics, Washington, DC 20004, USA.
- Nerija S Orentas
  - United States Environmental Protection Agency Office of Pollution Prevention and Toxics, Washington, DC 20004, USA.
- Mari Titcombe Lee
  - United States Environmental Protection Agency Office of Pollution Prevention and Toxics, Washington, DC 20004, USA.
- Edmund M Wong
  - United States Environmental Protection Agency Office of Pollution Prevention and Toxics, Washington, DC 20004, USA.
|
80
|
McEachran AD, Shea D, Nichols EG. Pharmaceuticals in a temperate forest-water reuse system. THE SCIENCE OF THE TOTAL ENVIRONMENT 2017; 581-582:705-714. [PMID: 28073640 PMCID: PMC5303553 DOI: 10.1016/j.scitotenv.2016.12.185] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/14/2016] [Revised: 12/28/2016] [Accepted: 12/29/2016] [Indexed: 05/08/2023]
Abstract
Forest-water reuse systems infiltrate municipal, industrial, and agricultural wastewaters through forest soils to shallow aquifers that ultimately discharge to surface waters. Their ability to mitigate regulated nutrients, metals, and organic chemicals is well known, but the fate of non-regulated chemicals in these systems is largely unstudied. This study quantified 33 pharmaceuticals and personal care products (PPCPs) in soils, groundwaters, and surface waters in a 2000-hectare forest that receives ~1200 mm/year of secondary-treated, municipal wastewater in addition to natural rainfall (~1300 mm/year). This forest-water reuse system does contribute PPCPs to soils, groundwater, and surface waters. PPCPs were more abundant in soils versus underlying groundwater by an order of magnitude (5-10 ng/g summed PPCPs in soil and 50-100 ng/L in groundwater) and the more hydrophobic chemicals were predominant in soil over water. PPCP concentrations in surface waters were greater at the onset of significant storm events and during low-rainfall periods when total summed PPCPs were >80 ng/L, higher than the annual average. With few exceptions, the margins of exposure for PPCPs in groundwater and surface waters were several orders of magnitude above values indicative of human health risk.
Affiliation(s)
- Andrew D McEachran
  - North Carolina State University, Department of Forestry and Environmental Resources, College of Natural Resources, Campus Box 8008, Raleigh, NC 27695, USA.
- Damian Shea
  - North Carolina State University, Department of Biological Sciences, College of Sciences, Campus Box 7614, Raleigh, NC 27695, USA.
- Elizabeth Guthrie Nichols
  - North Carolina State University, Department of Forestry and Environmental Resources, College of Natural Resources, Campus Box 8008, Raleigh, NC 27695, USA.
|
81
|
Zang Q, Mansouri K, Williams AJ, Judson RS, Allen DG, Casey WM, Kleinstreuer NC. In Silico Prediction of Physicochemical Properties of Environmental Chemicals Using Molecular Fingerprints and Machine Learning. J Chem Inf Model 2017; 57:36-49. [PMID: 28006899 PMCID: PMC6131700 DOI: 10.1021/acs.jcim.6b00625] [Citation(s) in RCA: 81] [Impact Index Per Article: 11.6] [Indexed: 12/20/2022]
Abstract
Few toxicity data are available for the vast majority of chemicals in commerce. High-throughput screening (HTS) studies, such as those being carried out by the U.S. Environmental Protection Agency (EPA) ToxCast program in partnership with the federal Tox21 research program, can generate biological data to inform models for predicting potential toxicity. However, physicochemical properties are also needed to model environmental fate and transport, as well as exposure potential. The purpose of the present study was to generate an open-source quantitative structure-property relationship (QSPR) workflow to predict a variety of physicochemical properties that would have cross-platform compatibility to integrate into existing cheminformatics workflows. In this effort, decades-old experimental property data sets available within the EPA EPI Suite were reanalyzed using modern cheminformatics workflows to develop updated QSPR models capable of supplying computationally efficient, open, and transparent HTS property predictions in support of environmental modeling efforts. Models were built using updated EPI Suite data sets for the prediction of six physicochemical properties: octanol-water partition coefficient (logP), water solubility (logS), boiling point (BP), melting point (MP), vapor pressure (logVP), and bioconcentration factor (logBCF). The coefficient of determination (R²) between the estimated values and experimental data for the six predicted properties ranged from 0.826 (MP) to 0.965 (BP), with model performance for five of the six properties exceeding those from the original EPI Suite models. The newly derived models can be employed for rapid estimation of physicochemical properties within an open-source HTS workflow to inform fate and toxicity prediction models of environmental chemicals.
Affiliation(s)
- Qingda Zang
  - Integrated Laboratory Systems, Inc., Research Triangle Park, NC 27709, USA
- Kamel Mansouri
  - National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711, USA
- Antony J. Williams
  - National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711, USA
- Richard S. Judson
  - National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711, USA
- David G. Allen
  - Integrated Laboratory Systems, Inc., Research Triangle Park, NC 27709, USA
- Warren M. Casey
  - National Toxicology Program, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA
- Nicole C. Kleinstreuer
  - National Toxicology Program, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA
|