1. Xu JY, Wang K, Men SH, Yang Y, Zhou Q, Yan ZG. QSAR-QSIIR-based prediction of bioconcentration factor using machine learning and preliminary application. ENVIRONMENT INTERNATIONAL 2023; 177:108003. [PMID: 37276762 DOI: 10.1016/j.envint.2023.108003] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/04/2023] [Revised: 05/25/2023] [Accepted: 05/29/2023] [Indexed: 06/07/2023]
Abstract
Bioconcentration factor (BCF) is an important parameter for developing human health ambient water quality criteria (HHAWQC) for chemical pollutants. Traditional experimental methods for obtaining BCF are time-consuming and costly, so predicting BCF by modeling has attracted much attention. QSAR (Quantitative Structure-Activity Relationship) models based on molecular descriptors are often used to predict BCF. However, to improve prediction accuracy, previous models have been restricted to a single category of substance and a single species, and cannot meet the need for BCF prediction for pollutants lacking toxicity data. In this study, 17 optimized traditional molecular descriptors and five kinds of bioactivity descriptors were selected from more than 200 molecular descriptors and 25 kinds of biological activity descriptors. A QSAR-QSIIR (Quantitative Structure In vitro-In vivo Relationship) model suitable for multiple chemical substances and all species was constructed using an optimized 4-MLP machine learning algorithm with the selected molecular and bioactivity descriptors. The constructed model significantly improves the prediction accuracy of BCF. The R² values of the verification set and test set are 0.8575 and 0.7924, respectively, and the predicted and measured BCF mostly differ by less than a factor of 1.5. The BCF of BTEX in common Chinese aquatic products was then predicted using the constructed QSAR-QSIIR model, and the HHAWQC of BTEX in China were derived from the predicted BCF, providing a valuable reference for establishing China's BTEX water quality standards.
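As a rough illustration of the two accuracy measures quoted in this abstract, the sketch below computes a coefficient of determination and the share of predictions that fall within a 1.5-fold difference of the measured value. The function names and the sample numbers are hypothetical and are not taken from the paper, which may define its fold difference on another scale (for example on log BCF).

```python
def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def fold_difference(measured, predicted):
    """Ratio of the larger to the smaller value (always >= 1 for positive inputs)."""
    return max(measured, predicted) / min(measured, predicted)


def share_within_fold(measured, predicted, limit=1.5):
    """Fraction of pairs whose fold difference does not exceed `limit`."""
    hits = sum(
        1 for m, p in zip(measured, predicted) if fold_difference(m, p) <= limit
    )
    return hits / len(measured)


# Made-up BCF values, for illustration only (assumption, not paper data).
measured = [10.0, 20.0, 40.0, 80.0]
predicted = [12.0, 18.0, 70.0, 75.0]

print(r_squared(measured, predicted))
print(share_within_fold(measured, predicted))
```

Reporting both numbers is useful because R² summarizes overall fit, while the fold-difference share shows how many individual predictions are close enough for practical use.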
Affiliation(s)
- Jia-Yun Xu
  - State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, China
- Kun Wang
  - National Engineering Laboratory for Lake Pollution Control and Ecological Restoration, State Environment Protection Key Laboratory for Lake Pollution Control, Chinese Research Academy of Environmental Sciences, Beijing 100012, China
- Shu-Hui Men
  - State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, China
- Yang Yang
  - China Energy Longyuan Environmental Protection Co., Ltd., Beijing 100039, China
- Quan Zhou
  - State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, China
- Zhen-Guang Yan
  - State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, China

2. Hao N, Sun P, Zhao W, Li X. Application of a developed triple-classification machine learning model for carcinogenic prediction of hazardous organic chemicals to the US, EU, and WHO based on Chinese database. ECOTOXICOLOGY AND ENVIRONMENTAL SAFETY 2023; 255:114806. [PMID: 36948010 DOI: 10.1016/j.ecoenv.2023.114806] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2022] [Revised: 03/04/2023] [Accepted: 03/16/2023] [Indexed: 06/18/2023]
Abstract
Cancer, the second largest human disease, has become a major public health problem. Predicting the carcinogenicity of chemicals before they are synthesized is crucial. In this paper, seven machine learning algorithms (i.e., Random Forest (RF), Logistic Regression (LR), Support Vector Machines (SVM), Complement Naive Bayes (CNB), K-Nearest Neighbor (KNN), XGBoost, and Multilayer Perceptron (MLP)) were used to construct a carcinogenicity triple classification prediction (TCP) model (i.e., Category 1A, 1B, and 2). A total of 1444 descriptors of 118 hazardous organic chemicals were calculated with Discovery Studio 2020, Sybyl X-2.0 and PaDEL-Descriptor software. The constructed carcinogenicity TCP model was evaluated with five indicators (i.e., Accuracy, Precision, Recall, F1 Score and AUC), all of which met the requirement (greater than 0.6). The accuracy of the RF, LR, XGBoost, and MLP models for predicting Category 2 carcinogenicity is 91.67%, 79.17%, 100%, and 100%, respectively. In addition, the machine learning model constructed in this study has potential for error correction. Taking the XGBoost model as an example, the predicted carcinogenicity level of 1,2,3-Trichloropropane (96-18-4) is Category 2, whereas the actual level is 1B. However, the difference between Category 2 and 1B is only 0.004, indicating that XGBoost is one of the optimum models among the seven constructed. The results also showed that functional groups such as chlorine and the benzene ring might influence the prediction of carcinogenic classification. Considering the functional group characteristics of chemicals before constructing a carcinogenicity prediction model for organic chemicals is therefore recommended. The carcinogenicity predicted for the organic chemicals by the optimum machine learning model (i.e., XGBoost) was also evaluated and verified by toxicokinetics.
The RF and XGBoost TCP models constructed in this paper can be used for carcinogenicity detection before new organic substances are synthesized. They also provide technical support for the subsequent management of organic chemicals.
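To make the evaluation indicators named in this abstract concrete, here is a minimal sketch that computes accuracy and macro-averaged precision, recall and F1 for a three-class problem. The abstract does not say how the per-class scores were averaged, so macro averaging is an assumption, AUC is left out, and the labels and predictions are invented for illustration.

```python
def class_counts(actual, predicted, label):
    """True positives, false positives and false negatives for one class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == label and p == label)
    fp = sum(1 for a, p in pairs if a != label and p == label)
    fn = sum(1 for a, p in pairs if a == label and p != label)
    return tp, fp, fn


def macro_metrics(actual, predicted, labels):
    """Accuracy plus unweighted (macro) averages of per-class scores."""
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp, fp, fn = class_counts(actual, predicted, label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    accuracy = sum(1 for a, p in zip(actual, predicted) if a == p) / len(actual)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical carcinogenicity categories (assumption, not data from the paper).
labels = ["1A", "1B", "2"]
actual = ["1A", "1A", "1B", "1B", "2", "2"]
predicted = ["1A", "1B", "1B", "1B", "2", "1B"]

scores = macro_metrics(actual, predicted, labels)
print(scores)
```

Macro averaging weights each category equally, which matters when one category has far fewer chemicals than the others.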
Affiliation(s)
- Ning Hao
  - College of New Energy and Environment, Jilin University, Changchun 130012, China
- Peixuan Sun
  - College of New Energy and Environment, Jilin University, Changchun 130012, China
- Wenjin Zhao
  - College of New Energy and Environment, Jilin University, Changchun 130012, China
- Xixi Li
  - State Environmental Protection Key Laboratory of Ecological Effect and Risk Assessment of Chemicals, Chinese Research Academy of Environmental Sciences, Beijing 100012, China
  - Northern Region Persistent Organic Pollution Control (NRPOP) Laboratory, Faculty of Engineering and Applied Science, Memorial University, St. John's, A1B 3X5, Canada

3. Allen TEH, Nelms MD, Edwards SW, Goodman JM, Gutsell S, Russell PJ. In Silico Guidance for In Vitro Androgen and Glucocorticoid Receptor ToxCast Assays. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2020; 54:7461-7470. [PMID: 32432465 DOI: 10.1021/acs.est.0c01105] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]
Abstract
Molecular initiating events (MIEs) are key events in adverse outcome pathways that link molecular chemistry to target biology. As they are based on chemistry, these interactions are excellent targets for computational chemistry approaches to in silico modeling. In this work, we aim to link ligand chemical structures to MIEs for androgen receptor (AR) and glucocorticoid receptor (GR) binding using ToxCast data. This has been done using an automated computational algorithm to perform maximal common substructure searches on chemical binders for each target from the ToxCast dataset. The models developed show a high level of accuracy, correctly assigning 87.20% of AR binders and 96.81% of GR binders in a 25% test set using holdout cross-validation. The 2D structural alerts developed can be used as in silico models to predict these MIEs and as guidance for in vitro ToxCast assays to confirm hits. These models can target such experimental work, reducing the number of assays to be performed to gain required toxicological insight. Development of these models has also allowed some structural alerts to be identified as predictors for agonist or antagonist behavior at the receptor target. This work represents a first step in using computational methods to guide and target experimental approaches.
Affiliation(s)
- Timothy E H Allen
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
  - MRC Toxicology Unit, University of Cambridge, Hodgkin Building, Lancaster Road, Leicester LE1 7HB, U.K
- Mark D Nelms
  - Oak Ridge Institute for Science and Education, Oak Ridge, Tennessee 37830, United States
  - Integrated Systems Toxicology Division, National Health and Environmental Effects Research Laboratory, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, Durham, North Carolina 27709, United States
- Stephen W Edwards
  - Integrated Systems Toxicology Division, National Health and Environmental Effects Research Laboratory, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, Durham, North Carolina 27709, United States
- Jonathan M Goodman
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
- Steve Gutsell
  - Unilever Safety and Environmental Assurance Centre, Colworth Science Park, Sharnbrook, Bedfordshire MK44 1LQ, U.K
- Paul J Russell
  - Unilever Safety and Environmental Assurance Centre, Colworth Science Park, Sharnbrook, Bedfordshire MK44 1LQ, U.K

4. Baxter LK, Dionisio K, Pradeep P, Rappazzo K, Neas L. Human exposure factors as potential determinants of the heterogeneity in city-specific associations between PM2.5 and mortality. JOURNAL OF EXPOSURE SCIENCE & ENVIRONMENTAL EPIDEMIOLOGY 2019; 29:557-567. [PMID: 30310133 PMCID: PMC6643264 DOI: 10.1038/s41370-018-0080-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 12/11/2017] [Revised: 08/27/2018] [Accepted: 09/17/2018] [Indexed: 06/01/2023]
Abstract
Multi-city population-based epidemiological studies of short-term fine particulate matter (PM2.5) exposures and mortality have observed heterogeneity in risk estimates between cities. Factors affecting exposures, such as pollutant infiltration, which are not captured by central-site monitoring data, can differ between communities, potentially explaining some of this heterogeneity. This analysis evaluates exposure factors as potential determinants of the heterogeneity in associations between PM2.5 and mortality specific to 312 core-based statistical areas (CBSAs), using inverse variance weighted linear regression. Exposure factor variables were created based on data on housing characteristics, commuting patterns, heating fuel usage, and climatic factors from national surveys. When survey data were not available, air conditioning (AC) prevalence was predicted utilizing machine learning techniques. Across all CBSAs, there was a 0.95% (interquartile range (IQR) of 2.25) increase in non-accidental mortality per 10 µg/m³ increase in PM2.5 and significant heterogeneity between CBSAs. CBSAs with larger homes, more heating degree days, and a higher percentage of home heating with oil had significantly (p < 0.05) higher health effect estimates, while cities with more gas heating had significantly lower health effect estimates. While univariate models did not explain much of the heterogeneity in health effect estimates (R² < 1%), multivariate models began to explain some of the observed heterogeneity (R² = 13%).
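The inverse variance weighted linear regression mentioned in this abstract can be sketched for a single predictor as follows. This is a minimal illustration under stated assumptions, not the authors' code: each area contributes an effect estimate with a standard error, the weight is 1 / SE², and the function name, variable names and numbers are all hypothetical.

```python
def ivw_linear_regression(x, y, se):
    """Weighted least squares fit of y = intercept + slope * x,
    with weights equal to the inverse variance 1 / se**2."""
    weights = [1.0 / s ** 2 for s in se]
    w_sum = sum(weights)
    # Weighted means of predictor and outcome.
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / w_sum
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / w_sum
    # Weighted covariance and variance terms.
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Invented example: an exposure factor per area, the area-specific
# percent increase in mortality, and the standard error of each estimate.
exposure_factor = [0.2, 0.4, 0.5, 0.7, 0.9]
effect_estimate = [0.5, 0.9, 0.8, 1.4, 1.3]
standard_error = [0.3, 0.2, 0.6, 0.25, 0.5]

intercept, slope = ivw_linear_regression(
    exposure_factor, effect_estimate, standard_error
)
print(intercept, slope)
```

Weighting by inverse variance lets precisely estimated areas influence the fitted line more than areas with wide uncertainty.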
Affiliation(s)
- Lisa K Baxter
  - National Health and Environmental Effects Research Laboratory, Office of Research and Development, Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
- Kathie Dionisio
  - National Exposure Research Laboratory, Office of Research and Development, Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
- Prachi Pradeep
  - National Center for Computational Toxicology, Office of Research and Development, Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
  - Oak Ridge Institute for Science and Education, Oak Ridge, TN, USA
- Kristen Rappazzo
  - National Health and Environmental Effects Research Laboratory, Office of Research and Development, Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
- Lucas Neas
  - National Health and Environmental Effects Research Laboratory, Office of Research and Development, Environmental Protection Agency, Research Triangle Park, NC, 27711, USA

5. Patlewicz G, Cronin MT, Helman G, Lambert JC, Lizarraga LE, Shah I. Navigating through the minefield of read-across frameworks: A commentary perspective. COMPUTATIONAL TOXICOLOGY 2018. [DOI: 10.1016/j.comtox.2018.04.002] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Indexed: 11/15/2022]

6. Patlewicz G, Helman G, Pradeep P, Shah I. Navigating through the minefield of read-across tools: A review of in silico tools for grouping. COMPUTATIONAL TOXICOLOGY 2017; 3:1-18. [PMID: 30221211 DOI: 10.1016/j.comtox.2017.05.003] [Citation(s) in RCA: 59] [Impact Index Per Article: 8.4] [Indexed: 01/30/2023]
Abstract
Read-across is a popular data gap filling technique used within analogue and category approaches for regulatory purposes. In recent years there have been many efforts focused on the challenges involved in read-across development, its scientific justification and documentation. Tools have also been developed to facilitate read-across development and application. Here, we describe a number of publicly available read-across tools in the context of the category/analogue workflow and review their respective capabilities, strengths and weaknesses. No single tool addresses all aspects of the workflow. We highlight how the different tools complement each other and some of the opportunities for their further development to address the continued evolution of read-across.
Affiliation(s)
- Grace Patlewicz
  - National Center for Computational Toxicology (NCCT), Office of Research and Development, US Environmental Protection Agency, 109 TW Alexander Dr, Research Triangle Park (RTP), NC 27711, USA
- George Helman
  - National Center for Computational Toxicology (NCCT), Office of Research and Development, US Environmental Protection Agency, 109 TW Alexander Dr, Research Triangle Park (RTP), NC 27711, USA
  - Oak Ridge Institute for Science and Education (ORISE), Oak Ridge, TN, USA
- Prachi Pradeep
  - National Center for Computational Toxicology (NCCT), Office of Research and Development, US Environmental Protection Agency, 109 TW Alexander Dr, Research Triangle Park (RTP), NC 27711, USA
  - Oak Ridge Institute for Science and Education (ORISE), Oak Ridge, TN, USA
- Imran Shah
  - National Center for Computational Toxicology (NCCT), Office of Research and Development, US Environmental Protection Agency, 109 TW Alexander Dr, Research Triangle Park (RTP), NC 27711, USA