1
Pore S, Pelloux A, Chatterjee M, Banerjee A, Roy K. Machine learning-based q-RASAR predictions of the bioconcentration factor of organic molecules estimated following the organisation for economic co-operation and development guideline 305. JOURNAL OF HAZARDOUS MATERIALS 2024; 479:135725. [PMID: 39243539 DOI: 10.1016/j.jhazmat.2024.135725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2024] [Revised: 08/31/2024] [Accepted: 08/31/2024] [Indexed: 09/09/2024]
Abstract
In this study, we utilized an innovative quantitative read-across (RA) structure-activity relationship (q-RASAR) approach to predict the bioconcentration factor (BCF) values of a diverse range of organic compounds, based on a dataset of 575 compounds tested using Organisation for Economic Co-operation and Development Test Guideline 305 for bioaccumulation in fish. Initially, we constructed the q-RASAR model using the partial least squares regression method, yielding promising statistical results for the training set (R2 = 0.71, Q2LOO = 0.68, mean absolute error [MAE]training = 0.54). The model was further validated using the test set (Q2F1 = 0.77, Q2F2 = 0.75, MAEtest = 0.51). Subsequently, we explored the q-RASAR method using other regression-based supervised machine-learning algorithms, demonstrating favourable results for the training and test sets. All models exhibited R2 and Q2F1 values exceeding 0.7, Q2LOO values greater than 0.6, and low MAE values, indicating high model quality and predictive capability for new, unidentified chemical substances. These findings underscore the significance of the RASAR method in enhancing predictivity for new unknown chemicals due to the incorporation of similarity functions in the RASAR descriptors, independent of a specific algorithm.
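The external validation statistics quoted in this abstract (Q2F1, Q2F2 and MAE) follow standard definitions, which can be sketched in plain Python as below. The function name and the toy numbers in the usage note are illustrative assumptions and are not taken from the paper; only the formulas themselves are standard.

```python
from statistics import fmean


def external_validation_metrics(y_train, y_test, y_pred):
    """Return Q2F1, Q2F2 and MAE for a held-out test set.

    Hypothetical helper for illustration only:
    Q2F1 references the training-set mean, Q2F2 the test-set mean.
    """
    # Predictive residual sum of squares on the test set
    press = sum((obs - pred) ** 2 for obs, pred in zip(y_test, y_pred))
    train_mean = fmean(y_train)
    test_mean = fmean(y_test)
    # Reference sums of squares around the training and test means
    ss_train_ref = sum((obs - train_mean) ** 2 for obs in y_test)
    ss_test_ref = sum((obs - test_mean) ** 2 for obs in y_test)
    return {
        "Q2F1": 1.0 - press / ss_train_ref,
        "Q2F2": 1.0 - press / ss_test_ref,
        "MAE": fmean(abs(obs - pred) for obs, pred in zip(y_test, y_pred)),
    }
```

For example, `external_validation_metrics([0, 2, 4, 6], [1, 3], [1.5, 2.5])` gives Q2F1 = 0.875, Q2F2 = 0.75 and MAE = 0.5. The two Q2 values differ only because the training mean (3) differs from the test mean (2).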
Affiliation(s)
- Souvik Pore
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, 188 Raja S C Mullick Road, 700032 Kolkata, India
- Alexia Pelloux
- Global Product Compliance (Europe) AB, Ideon Beta 5, Scheelevägen 17, 223 63 Lund, Sweden
- Mainak Chatterjee
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, 188 Raja S C Mullick Road, 700032 Kolkata, India
- Arkaprava Banerjee
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, 188 Raja S C Mullick Road, 700032 Kolkata, India
- Kunal Roy
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, 188 Raja S C Mullick Road, 700032 Kolkata, India.
2
Bhattacharyya P, Samanta P, Kumar A, Das S, Ojha PK. Quantitative read-across structure-property relationship (q-RASPR): a novel approach to estimate the bioaccumulative potential for diverse classes of industrial chemicals in aquatic organisms. ENVIRONMENTAL SCIENCE. PROCESSES & IMPACTS 2024. [PMID: 39485241 DOI: 10.1039/d4em00374h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2024]
Abstract
The Bioconcentration Factor (BCF) is used to evaluate the bioaccumulation potential of chemical substances in reference organisms, and it directly correlates with ecotoxicity. Traditional in vivo BCF estimation methods are costly, time-consuming, and involve animal sacrifice. Many in silico technologies are used to avoid the problems associated with in vivo testing. This study aims to develop a quantitative read across structure-property relationship (q-RASPR) model using a structurally diverse dataset consisting of 1303 compounds by combining quantitative structure-property relationship (QSPR) and read-across (RA) algorithms. The model incorporates simple, interpretable, and reproducible 2D molecular descriptors along with RASAR descriptors. The PLS-based q-RASPR model demonstrated robust performance with internal validation metrics (R2 = 0.727 and Q2(LOO) = 0.723) and external validation metrics (Q2F1 = 0.739, Q2F2 = 0.739, and CCC = 0.858). These results indicate that the q-RASPR model is statistically superior to the corresponding QSPR model. Furthermore, screening of 1694 compounds from the Pesticide Properties Database (PPDB) was performed using the PLS-based q-RASPR model for assessing the eco-toxicological bioaccumulative potential of various compounds, ensuring the external predictability of the developed model and confirming the real-world application of the developed model. This model offers a reliable tool for predicting the BCF of new or untested compounds, thereby helping to develop safe and environment-friendly chemicals.
Affiliation(s)
- Prodipta Bhattacharyya
- Drug Discovery and Development Laboratory (DDD Lab), Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India.
- Pabitra Samanta
- Drug Discovery and Development Laboratory (DDD Lab), Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India.
- Ankur Kumar
- Drug Discovery and Development Laboratory (DDD Lab), Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India.
- Shubha Das
- Drug Discovery and Development Laboratory (DDD Lab), Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India.
- Probir Kumar Ojha
- Drug Discovery and Development Laboratory (DDD Lab), Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India.
3
Lotfi S, Ahmadi S, Azimi A, Kumar P. In silico aquatic toxicity prediction of chemicals toward Daphnia magna and fathead minnow using Monte Carlo approaches. Toxicol Mech Methods 2024:1-13. [PMID: 39397353 DOI: 10.1080/15376516.2024.2416226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2024] [Revised: 09/05/2024] [Accepted: 10/08/2024] [Indexed: 10/15/2024]
Abstract
The fast-increasing use of chemicals led to large numbers of chemical compounds entering the aquatic environment, raising concerns about their potential effects on ecosystems. Therefore, assessment of the ecotoxicological features of organic compounds on aquatic organisms is very important. Daphnia magna and Fathead minnow are two aquatic species that are commonly tested as standard test organisms for aquatic risk assessment and are typically chosen as the biological model for the ecotoxicology investigations of chemical pollutants. Herein, global quantitative structure-toxicity relationship (QSTR) models have been developed to predict the toxicity (pEC(LC)50) of a large dataset comprising 2106 chemicals toward Daphnia magna and Fathead minnow. The optimal descriptor of correlation weights (DCWs) is calculated using the notation of simplified molecular input line entry system (SMILES) and is used to construct QSTR models. Three target functions, TF1, TF2, and TF3 are utilized to generate 12 QSTR models from four splits, and their statistical characteristics are also compared. The designed QSTR models are validated using both internal and external validation criteria and are found to be reliable, robust, and excellently predictive. Among the models, those generated using the TF3 demonstrate the best statistical quality with R2 values ranging from 0.9467 to 0.9607, Q2 values ranging from 0.9462 to 0.9603 and RMSE values ranging from 0.3764 to 0.4413 for the validation set. The applicability domain and the mechanistic interpretations of generated models were also discussed.
Affiliation(s)
- Shahram Lotfi
- Department of Chemistry, Payame Noor University (PNU), Tehran, Iran
- Shahin Ahmadi
- Department of Pharmaceutical Chemistry, Faculty of Pharmaceutical Chemistry, Tehran Medical Sciences, Islamic Azad University, Tehran, Iran
- Ali Azimi
- Department of Chemistry, Science and Research Branch, Islamic Azad University, Tehran, Iran
- Parvin Kumar
- Department of Chemistry, Kurukshetra University, Kurukshetra, India
4
Khan MSJ, Sidek LM, Kumar P, Alkhadher SAA, Basri H, Zawawi MH, El-Shafie A, Ahmed AN. Machine learning based-model to predict catalytic performance on removal of hazardous nitrophenols and azo dyes pollutants from wastewater. Int J Biol Macromol 2024; 278:134701. [PMID: 39151852 DOI: 10.1016/j.ijbiomac.2024.134701] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Revised: 07/26/2024] [Accepted: 08/11/2024] [Indexed: 08/19/2024]
Abstract
To maintain human health and the purity of drinking water, it is crucial to eliminate harmful chemicals such as nitrophenols and azo dyes, considering their natural presence in the surroundings. In this study, machine learning techniques were used to estimate the performance of reduction catalysis for the ecologically detrimental contaminants nitrophenols and azo dyes. The catalyst utilized in the experiment was Ag@CMC, which proved to be highly effective in eliminating various contaminants found in water, such as 4-nitrophenol (4-NP). The experiments were carefully conducted at various time intervals, and the machine learning procedures used in this study were all employed to forecast catalytic performance. The performance of these algorithms was evaluated using the mean absolute error. The noteworthy findings of this research indicated that the ADAM and LSTM algorithms exhibited the most favourable performance in the case of toxic compounds, i.e., 4-NP. Moreover, the Ag@CMC catalyst demonstrated an impressive reduction efficiency of 98 % against nitrophenol in just 8 min. Thus, based on these compelling results, it can be concluded that Ag@CMC works as a highly effective catalyst for practical applications in real-world scenarios.
Affiliation(s)
- Lariyah Mohd Sidek
- Institute of Energy Infrastructure (IEI), Universiti Tenaga Nasional (UNITEN), 43000, Selangor, Malaysia
- Pavitra Kumar
- Department of Geography and Planning, University of Liverpool, Liverpool, UK
- Hidayah Basri
- Institute of Energy Infrastructure (IEI), Universiti Tenaga Nasional (UNITEN), 43000, Selangor, Malaysia
- Mohd Hafiz Zawawi
- Institute of Energy Infrastructure (IEI), Universiti Tenaga Nasional (UNITEN), 43000, Selangor, Malaysia
- Ahmed El-Shafie
- National Water and Energy Center, United Arab Emirates University, P.O. Box 15551, Al Ain, United Arab Emirates
- Ali Najah Ahmed
- Department of Engineering School of Engineering and Technology, Sunway University, Bandar Sunway, Petaling Jaya, 47500, Malaysia.
5
Wang R, Wang B, Chen A. Application of machine learning in the study of development, behavior, nerve, and genotoxicity of zebrafish. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2024; 358:124473. [PMID: 38945191 DOI: 10.1016/j.envpol.2024.124473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Revised: 05/26/2024] [Accepted: 06/28/2024] [Indexed: 07/02/2024]
Abstract
Machine learning (ML) as a novel model-based approach has been used in studying aquatic toxicology in the environmental field. Zebrafish, as an ideal model organism in aquatic toxicology research, has been widely used to study the toxic effects of various pollutants. However, toxicity testing on organisms may cause significant harm, consume considerable time and resources, and raise ethical concerns. Therefore, ML is used in related research to reduce animal experiments and assist researchers in conducting toxicological research. Although ML techniques have matured in various fields, research on ML-based aquatic toxicology is still in its infancy due to the lack of comprehensive large-scale toxicity databases for environmental pollutants and model organisms. Therefore, to better understand the recent research progress of ML in studying the development, behavior, nerve, and genotoxicity of zebrafish, this review mainly focuses on using ML modeling to assess and predict the toxic effects of zebrafish exposure to different toxic chemicals. Meanwhile, the opportunities and challenges faced by ML in the field of toxicology were analyzed. Finally, suggestions and perspectives were proposed for the toxicity studies of ML on zebrafish in future applications.
Affiliation(s)
- Rui Wang
- Key Laboratory of Karst Georesources and Environment, Ministry of Education, (Guizhou University), Guiyang, Guizhou, 550025, China
- Bing Wang
- Key Laboratory of Karst Georesources and Environment, Ministry of Education, (Guizhou University), Guiyang, Guizhou, 550025, China; College of Resources and Environmental Engineering, Guizhou University, Guiyang, Guizhou, 550025, China.
- Anying Chen
- College of Resources and Environmental Engineering, Guizhou University, Guiyang, Guizhou, 550025, China
6
Italiya G, Subramanian S. Leveraging new approach methodologies: ecotoxicological modelling of endocrine disrupting chemicals to Danio rerio through machine learning and toxicity studies. Toxicol Mech Methods 2024:1-17. [PMID: 39223866 DOI: 10.1080/15376516.2024.2400324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 07/30/2024] [Accepted: 07/31/2024] [Indexed: 09/04/2024]
Abstract
New approach methodologies (NAMs) offer information tailored to the intended application while reducing the use of animals. NAMs aim to develop quantitative structure-activity relationship (QSAR) and quantitative read-across structure-activity relationship (q-RASAR) models to predict and categorize the acute toxicity of known and unknown endocrine-disrupting chemicals (EDCs) against zebrafish. EDCs are a diverse group of toxic substances that disrupt the endocrine system of humans and animals. The q-RASAR model was constructed and verified using validation metrics (R2 = 0.886 and Q2 = 0.814) and was found to be more reliable than the QSAR model. The substructure fingerprint was well-fitted for the classification model, which was validated using 10-fold average accuracy (Q = 86.88%), specificity (Sp = 88.89%), Matthews correlation coefficient (MCC = 0.621) and receiver operating characteristics (ROC = 0.828). The dataset of unknown substances revealed that phenolphthalein (Php) exhibited a significant level of toxicity based on the q-RASAR model. The docking and simulation study indicated that the computationally derived important features successfully bound to the target zebrafish sex hormone binding globulin (zfSHBG). The experimental LC50 value of 0.790 mg L-1 was very close to the predicted value of 0.763 mg L-1, which lends high confidence to the developed model.
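Two of the classification statistics reported in this abstract, specificity and the Matthews correlation coefficient (MCC), can be computed directly from confusion-matrix counts. The sketch below uses the standard formulas. The function names and the counts in the usage note are hypothetical and do not come from the paper.

```python
from math import sqrt


def specificity(tn, fp):
    """True-negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion counts.

    Returns 0.0 when any marginal total is zero (a common convention,
    chosen here as an assumption to avoid division by zero).
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

For instance, `matthews_corrcoef(5, 5, 0, 0)` returns 1.0 for a perfect classifier, while `matthews_corrcoef(2, 2, 2, 2)` returns 0.0 for predictions that are no better than chance.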
Affiliation(s)
- Gopal Italiya
- School of Bioscience and Technology, Vellore Institute of Technology, Vellore, India
- Sangeetha Subramanian
- School of Bioscience and Technology, Vellore Institute of Technology, Vellore, India
7
Yang X, Sun J, Jin B, Lu Y, Cheng J, Jiang J, Zhao Q, Shuai J. Multi-task aquatic toxicity prediction model based on multi-level features fusion. J Adv Res 2024:S2090-1232(24)00226-1. [PMID: 38844122 DOI: 10.1016/j.jare.2024.06.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2024] [Revised: 05/21/2024] [Accepted: 06/02/2024] [Indexed: 06/09/2024] Open
Abstract
INTRODUCTION With the escalating menace of organic compounds in environmental pollution imperiling the survival of aquatic organisms, the investigation of organic compound toxicity across diverse aquatic species assumes paramount significance for environmental protection. Understanding how different species respond to these compounds helps assess the potential ecological impact of pollution on aquatic ecosystems as a whole. Compared with traditional experimental methods, deep learning methods have higher accuracy in predicting aquatic toxicity, faster data processing speed and better generalization ability. OBJECTIVES This article presents ATFPGT-multi, an advanced multi-task deep neural network prediction model for organic toxicity. METHODS The model integrates molecular fingerprints and molecule graphs to characterize molecules, enabling the simultaneous prediction of acute toxicity for the same organic compound across four distinct fish species. Furthermore, to validate the advantages of multi-task learning, we independently construct prediction models, named ATFPGT-single, for each fish species. We employ cross-validation in our experiments to assess the performance and generalization ability of ATFPGT-multi. RESULTS The experimental results indicate, first, that ATFPGT-multi outperforms ATFPGT-single on four fish datasets with AUC improvements of 9.8%, 4%, 4.8%, and 8.2%, respectively, demonstrating the superiority of multi-task learning over single-task learning. Furthermore, in comparison with previous algorithms, ATFPGT-multi outperforms comparative methods, emphasizing that our approach exhibits higher accuracy and reliability in predicting aquatic toxicity. Moreover, ATFPGT-multi utilizes attention scores to identify molecular fragments associated with fish toxicity in organic molecules, as demonstrated by two organic molecule examples in the main text, demonstrating the interpretability of ATFPGT-multi. 
CONCLUSION In summary, ATFPGT-multi provides important support and reference for the further development of aquatic toxicity assessment. All code and datasets are freely available online at https://github.com/zhaoqi106/ATFPGT-multi.
Affiliation(s)
- Xin Yang
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China; Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou 325001, China
- Jianqiang Sun
- School of Information Science and Engineering, Linyi University, Linyi 276000, China
- Bingyu Jin
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China
- Yuer Lu
- Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou 325001, China
- Jinyan Cheng
- Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou 325001, China
- Jiaju Jiang
- College of Life Sciences, Sichuan University, Chengdu 610064, China
- Qi Zhao
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China.
- Jianwei Shuai
- Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou 325001, China; Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou 325001, China.
8
Liu G, Li X, Guo Y, Zhang L, Liu H, Ai H. Ensemble multiclassification model for predicting developmental toxicity in zebrafish. AQUATIC TOXICOLOGY (AMSTERDAM, NETHERLANDS) 2024; 271:106936. [PMID: 38723470 DOI: 10.1016/j.aquatox.2024.106936] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2024] [Revised: 04/29/2024] [Accepted: 05/01/2024] [Indexed: 05/21/2024]
Abstract
In recent years, with the rapid development of society, organic compounds have been released into aquatic environments in various forms, posing a significant threat to the survival of aquatic organisms. The assessment of developmental toxicity is an important part of environmental safety risk systems, helping to identify the potential impacts of organic compounds on the embryonic development of aquatic organisms and enabling early detection and warning of potential ecological risks. Additionally, binary classification models cannot accurately classify organic compounds. Therefore, it is crucial to construct a multiclassification model for predicting the developmental toxicity of organic compounds. In this study, binary and multiclassification models were developed based on the ToxCast™ Phase I chemical library and literature data. The random forest, support vector machine, extreme gradient boosting, adaptive gradient boosting, and C5.0 decision tree algorithms, as well as 8 types of molecular fingerprint were used to establish a multiclassification base model for predicting developmental toxicity through 5-fold cross-validation and external validation. Ultimately, a multiclassification ensemble model was derived through a voting method. The performance of the binary ensemble model, as measured by the balanced accuracy, was 0.918, while that of the multiclassification model was 0.819. The developmental toxicity voting ensemble model (DT-VEM) achieved accuracies of 0.804, 0.834, and 0.855. Furthermore, by utilizing the XGBoost machine learning algorithm to construct separate models for molecular descriptors and substructure molecular fingerprints, we identified several substructures and physical properties related to developmental toxicity. Our research contributes to a more detailed classification of developmental toxicity, providing a new and valuable tool for predicting the developmental toxicity effects of unknown compounds. 
This supplement addresses the limitations of previous tools, as it offers an enhanced ability to predict potential developmental toxicity in novel compounds.
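The abstract describes combining base models through a voting method and scoring the result by balanced accuracy. A minimal sketch of both ideas follows, assuming hard (majority) voting, since the abstract does not say whether hard or soft voting was used. The function names and labels are illustrative and are not the authors' implementation.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one list by hard majority vote.

    predictions_per_model: list of equal-length label lists, one per base model.
    Ties resolve to the tied label that appears first among the votes.
    """
    voted = []
    for votes in zip(*predictions_per_model):
        voted.append(Counter(votes).most_common(1)[0][0])
    return voted


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    hits = Counter()
    totals = Counter()
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)
```

With three hypothetical base models predicting `[0, 1, 2]`, `[0, 1, 1]` and `[1, 1, 2]` for three compounds, `majority_vote` returns `[0, 1, 2]`. Balanced accuracy averages recall per class, so a rare toxicity class weighs as much as a common one.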
Affiliation(s)
- Gaohua Liu
- College of Life Science, Liaoning University, Shenyang, 110036, China
- Xinran Li
- College of Life Science, Liaoning University, Shenyang, 110036, China
- Yaxu Guo
- College of Life Science, Liaoning University, Shenyang, 110036, China
- Li Zhang
- College of Life Science, Liaoning University, Shenyang, 110036, China; China Research Center for Computer Simulating and Information Processing of Bio-macromolecules of Shenyang, China
- Hongsheng Liu
- College of Life Science, Liaoning University, Shenyang, 110036, China; China Research Center for Computer Simulating and Information Processing of Bio-macromolecules of Shenyang, China
- Haixin Ai
- College of Life Science, Liaoning University, Shenyang, 110036, China; China Research Center for Computer Simulating and Information Processing of Bio-macromolecules of Shenyang, China.
9
Mostafa F, Chen M. Computational models for predicting liver toxicity in the deep learning era. FRONTIERS IN TOXICOLOGY 2024; 5:1340860. [PMID: 38312894 PMCID: PMC10834666 DOI: 10.3389/ftox.2023.1340860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2023] [Accepted: 12/22/2023] [Indexed: 02/06/2024] Open
Abstract
Drug-induced liver injury (DILI) is a severe adverse reaction caused by drugs and may result in acute liver failure and even death. Many efforts have centered on mitigating risks associated with potential DILI in humans. Among these, quantitative structure-activity relationship (QSAR) modeling has proven to be a valuable tool for early-stage hepatotoxicity screening. Its advantages include no requirement for physical substances and rapid delivery of results. Deep learning (DL) has made rapid advances recently and has been used for developing QSAR models. This review discusses the use of DL in predicting DILI, focusing on the development of QSAR models employing extensive chemical structure datasets alongside their corresponding DILI outcomes. We undertake a comprehensive evaluation of various DL methods, comparing them with traditional machine learning (ML) approaches, and explore the strengths and limitations of DL techniques regarding their interpretability, scalability, and generalization. Overall, our review underscores the potential of DL methodologies to enhance DILI prediction and provides insights into future avenues for developing predictive models to mitigate DILI risk in humans.
Affiliation(s)
- Fahad Mostafa
- Department of Mathematics and Statistics, Texas Tech University, Lubbock, TX, United States
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, United States
- Minjun Chen
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, United States
10
Chen P, Hu Y, Chen G, Zhao N, Dou Z. Probing the bioconcentration and metabolism disruption of bisphenol A and its analogues in adult female zebrafish from integrated AutoQSAR and metabolomics studies. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023; 905:167011. [PMID: 37704156 DOI: 10.1016/j.scitotenv.2023.167011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Revised: 08/31/2023] [Accepted: 09/10/2023] [Indexed: 09/15/2023]
Abstract
Many emerging bisphenol A (BPA) substitutes await assessment of their bioconcentration and metabolism disruption. Computational methods, such as automated quantitative structure-activity relationship (AutoQSAR) modeling, are useful to fill the data gap in chemical risk assessment. It is not clear how AutoQSAR performs in predicting the bioconcentration factor (BCF) in adult zebrafish. Herein, AutoQSAR was used to predict the logBCFs of BPA, bisphenol AF (BPAF), bisphenol B, bisphenol F and bisphenol S (BPS). For the test set, a linear relationship was shown between the observed and predicted logBCFs with a slope of 0.97. The predicted logBCFs of these five bisphenols were quite close to their experimental data with a slope of 0.94, suggesting better performance than directed message passing neural networks and EPI Suite with a slope of 0.69 and 0.61, respectively. Thus, AutoQSAR is powerful in modeling logBCFs in fish with minimal time and expertise. To link bioconcentration with metabolic effects, female zebrafish were exposed to BPA, BPAF and BPS for metabolomics analysis. BPA caused a significant disturbance in amino acid metabolism, while BPAF and BPS significantly altered another three metabolic pathways, showing chemical-specific responses. BPAF with the highest logBCF elicited the strongest metabolomic responses reflected by the metabolic effect level index, followed by BPA and BPS. Thus, BPAF and BPS elicited higher or similar metabolism disruption compared with BPA in female zebrafish, respectively, reflecting consequences of bioconcentration.
Affiliation(s)
- Pengyu Chen
- Jiangsu Province Engineering Research Center for Marine Bio-resources Sustainable Utilization, College of Oceanography, Hohai University, Nanjing 210024, China; Key Laboratory of Integrated Regulation and Resources Development of Shallow Lakes of Ministry of Education, Hohai University, Nanjing 210024, China.
- Yuxi Hu
- Jiangsu Province Engineering Research Center for Marine Bio-resources Sustainable Utilization, College of Oceanography, Hohai University, Nanjing 210024, China
- Geng Chen
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 330106, China
- Na Zhao
- Jiangsu Province Engineering Research Center for Marine Bio-resources Sustainable Utilization, College of Oceanography, Hohai University, Nanjing 210024, China
- Zhichao Dou
- Jiangsu Province Engineering Research Center for Marine Bio-resources Sustainable Utilization, College of Oceanography, Hohai University, Nanjing 210024, China
11
Guo W, Liu J, Dong F, Song M, Li Z, Khan MKH, Patterson TA, Hong H. Review of machine learning and deep learning models for toxicity prediction. Exp Biol Med (Maywood) 2023; 248:1952-1973. [PMID: 38057999 PMCID: PMC10798180 DOI: 10.1177/15353702231209421] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/08/2023] Open
Abstract
The ever-increasing number of chemicals has raised public concerns due to their adverse effects on human health and the environment. To protect public health and the environment, it is critical to assess the toxicity of these chemicals. Traditional in vitro and in vivo toxicity assays are complicated, costly, and time-consuming and may face ethical issues. These constraints raise the need for alternative methods for assessing the toxicity of chemicals. Recently, due to the advancement of machine learning algorithms and the increase in computational power, many toxicity prediction models have been developed using various machine learning and deep learning algorithms such as support vector machine, random forest, k-nearest neighbors, ensemble learning, and deep neural network. This review summarizes the machine learning- and deep learning-based toxicity prediction models developed in recent years. Support vector machine and random forest are the most popular machine learning algorithms, and hepatotoxicity, cardiotoxicity, and carcinogenicity are the frequently modeled toxicity endpoints in predictive toxicology. It is known that datasets impact model performance. The quality of datasets used in the development of toxicity prediction models using machine learning and deep learning is vital to the performance of the developed models. Different toxicity assignments for the same chemicals have been observed among datasets for the same type of toxicity, indicating that benchmark datasets are needed for developing reliable toxicity prediction models using machine learning and deep learning algorithms. This review provides insights into current machine learning models in predictive toxicology, which are expected to promote the development and application of toxicity prediction models in the future.
Affiliation(s)
- Wenjing Guo
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Jie Liu
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Fan Dong
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Meng Song
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Zoe Li
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Md Kamrul Hasan Khan
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Tucker A Patterson
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Huixiao Hong
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
12
He Y, Liu G, Hu S, Wang X, Jia J, Zhou H, Yan X. Implementing comprehensive machine learning models of multispecies toxicity assessment to improve regulation of organic compounds. JOURNAL OF HAZARDOUS MATERIALS 2023; 458:131942. [PMID: 37390684 DOI: 10.1016/j.jhazmat.2023.131942] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 04/15/2023] [Revised: 06/12/2023] [Accepted: 06/24/2023] [Indexed: 07/02/2023]
Abstract
Machine learning has made significant progress in assessing the risk associated with hazardous chemicals. However, most models were constructed by randomly selecting one algorithm and one toxicity endpoint for a single species, which may cause biased regulation of chemicals. In the present study, we implemented comprehensive prediction models involving multiple advanced machine learning and end-to-end deep learning methods to assess the aquatic toxicity of chemicals. The generated optimal models accurately unravel the quantitative structure-toxicity relationships, with correlation coefficients from 0.59 to 0.81 for the training sets and from 0.56 to 0.83 for the test sets. For each chemical, its ecological risk was determined from the toxicity information towards multiple species. The results also revealed that the toxicity mechanisms of chemicals were species-sensitive, and that higher-level organisms faced more serious side effects from hazardous substances. The proposed approach was finally applied to screen over 16,000 compounds and identify high-risk chemicals. We believe that the current approach can provide a useful tool for predicting the toxicity of diverse organic chemicals and help regulatory authorities make more reasonable decisions.
Affiliation(s)
- Ying He
- Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
- Guohong Liu
- Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China; School of Agriculture and Biological Sciences, Qiannan Normal University for Nationalities, Duyun 558000, China
- Song Hu
- School of Environmental Science and Engineering, Shandong University, Qingdao 266237, China
- Xiaohong Wang
- Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
- Jianbo Jia
- Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
- Hongyu Zhou
- Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China.
- Xiliang Yan
- Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China; School of Agriculture and Biological Sciences, Qiannan Normal University for Nationalities, Duyun 558000, China.
|
13
|
Li X, Liu G, Wang Z, Zhang L, Liu H, Ai H. Ensemble multiclassification model for aquatic toxicity of organic compounds. AQUATIC TOXICOLOGY (AMSTERDAM, NETHERLANDS) 2023; 255:106379. [PMID: 36587517 DOI: 10.1016/j.aquatox.2022.106379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2022] [Revised: 12/04/2022] [Accepted: 12/19/2022] [Indexed: 06/17/2023]
Abstract
With environmental pollution becoming increasingly serious, organic compounds have become the main hazard of environmental pollution and exert substantial negative impacts on aquatic organisms. In research pertaining to the acute toxicity of organic compounds, traditional biological experimental methods are time-consuming and expensive. In addition, computer-aided binary classification models cannot accurately classify acute toxicity. Therefore, a multiclassification model is necessary for more accurate classification of acute toxicity. In this study, median lethal concentrations of 373 organic compounds in the environmental toxicology datasets ECOTOX and EAT5 were used. These chemicals were classified into four categories based on the European Economic Community criteria. Then the random forest, support vector machine, extreme gradient boosting, adaptive gradient boosting, and C5.0 decision tree algorithms and eight molecular fingerprints were used to build multiclassification base models for the acute toxicity of organic compounds. The base models were repeated 100 times with fivefold cross-validation and external validation. The ensemble model was obtained by the voting method. The best base classifier was ExtendFP-C5.0, which had accuracy, sensitivity and specificity values of 87.30%, 87.32% and 95.76% for external validation, while the voting ensemble model reached 96.92%, 96.93% and 98.97%, respectively. The ensemble model achieved a higher accuracy than previously reported studies. Our study will help to further classify the acute toxicity of organic compounds to aquatic organisms and predict the hazard classes of organic compounds.
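The voting step mentioned in this abstract can be pictured with a small example. The following is a minimal sketch of plain majority (hard) voting only; the abstract does not say which voting scheme was used, and the function names and the example class labels are hypothetical, not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label among base-model predictions for one compound."""
    if not predictions:
        raise ValueError("need at least one prediction")
    # On a tie, most_common(1) returns the label that was seen first,
    # so the earliest base model in the list wins.
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(per_model_predictions):
    """Combine per-model label lists (all the same length) compound by compound."""
    return [majority_vote(votes) for votes in zip(*per_model_predictions)]


# Hypothetical toxicity classes (1-4) from three base models for four compounds.
base_outputs = [
    [1, 2, 4, 3],
    [1, 3, 4, 3],
    [2, 3, 4, 1],
]
print(ensemble_predict(base_outputs))
```

Each compound receives the class that most base models agree on, which is why an ensemble can outperform its best single member when the members make different errors.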
Affiliation(s)
- Xinran Li
- College of Life Science, Liaoning University, Shenyang, 110036, China
- Gaohua Liu
- College of Life Science, Liaoning University, Shenyang, 110036, China
- Zhibo Wang
- College of Life Science, Liaoning University, Shenyang, 110036, China
- Li Zhang
- College of Life Science, Liaoning University, Shenyang, 110036, China; China Research Center for Computer Simulating and Information Processing of Bio-macromolecules of Shenyang, China
- Hongsheng Liu
- China Research Center for Computer Simulating and Information Processing of Bio-macromolecules of Shenyang, China; College of Pharmacy, Liaoning University, Shenyang, 110036, China
- Haixin Ai
- College of Life Science, Liaoning University, Shenyang, 110036, China; China Research Center for Computer Simulating and Information Processing of Bio-macromolecules of Shenyang, China.
|
14
|
Peng C, Zhou S, Zhang Y, Zhang H, Zhang W, Ling S, Hu S. Dynamics and mechanisms of bioaccumulation and elimination of nonylphenol in zebrafish. Toxicology 2023; 483:153375. [PMID: 36375624 DOI: 10.1016/j.tox.2022.153375] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/27/2022] [Revised: 11/01/2022] [Accepted: 11/09/2022] [Indexed: 11/13/2022]
Abstract
Nonylphenol (NP) has attracted wide concern for its endocrine-disrupting effects. In this study, we investigated the accumulation and elimination of NP for the whole body and trunk of zebrafish (Danio rerio). The results show that the LC50 values of NP in zebrafish ranged from 474 μg·L⁻¹ (24-h exposure) to 238 μg·L⁻¹ (96-h exposure). Meanwhile, the NP concentrations in zebrafish during the depuration stage fitted the first-order kinetic model well, and the depuration rate constant (K2) was reduced from 0.412 d⁻¹ to 0.2827 d⁻¹ with higher NP. The half-life (t1/2) of NP was 1.75-2.45 d in the whole fish and 0.56-0.86 d in the trunk under low to high NP, respectively. Both the accumulation and elimination of NP in trunk were faster than those in whole fish, indicating the preferential transfer from viscera to muscle and rapid diffusion in reverse. The bioconcentration factors (BCFSS) of NP were 104-112 L·kg⁻¹ in whole body and 76-104 L·kg⁻¹ in trunk, respectively, suggesting that the muscle was a major site for NP storage.
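For a first-order depuration model, the concentration decays as C(t) = C0 · exp(−k2 · t), so the half-life follows as t1/2 = ln 2 / k2. The sketch below simply evaluates that standard relation for the two rate constants quoted in the abstract; the function names are illustrative, and the abstract does not state which tissue those rate constants refer to, so the inputs are used only as examples.

```python
import math


def half_life(k2_per_day):
    """Half-life in days for first-order elimination C(t) = C0 * exp(-k2 * t)."""
    if k2_per_day <= 0:
        raise ValueError("rate constant must be positive")
    return math.log(2) / k2_per_day


def remaining_fraction(k2_per_day, t_days):
    """Fraction of the initial body burden left after t_days of depuration."""
    return math.exp(-k2_per_day * t_days)


# Rate constants quoted in the abstract (per day).
for k2 in (0.412, 0.2827):
    print(f"k2 = {k2} 1/d -> t1/2 = {half_life(k2):.2f} d")
```

With k2 = 0.2827 per day this gives roughly 2.45 d, in line with the upper end of the whole-fish range reported above; k2 = 0.412 per day gives roughly 1.68 d, close to but not identical with the reported lower bound of 1.75 d.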
Affiliation(s)
- Cheng Peng
- State Environmental Protection Key Laboratory of Environmental Risk Assessment and Control on Chemical Process, School of Resource and Environmental Engineering, East China University of Science and Technology, Shanghai 200237, China; Shanghai Academy of Environmental Sciences, Shanghai 200233, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai 200092, China
- Shanqi Zhou
- State Environmental Protection Key Laboratory of Environmental Risk Assessment and Control on Chemical Process, School of Resource and Environmental Engineering, East China University of Science and Technology, Shanghai 200237, China
- Yinjie Zhang
- State Environmental Protection Key Laboratory of Environmental Risk Assessment and Control on Chemical Process, School of Resource and Environmental Engineering, East China University of Science and Technology, Shanghai 200237, China
- Hongchang Zhang
- Shanghai Academy of Environmental Sciences, Shanghai 200233, China
- Wei Zhang
- State Environmental Protection Key Laboratory of Environmental Risk Assessment and Control on Chemical Process, School of Resource and Environmental Engineering, East China University of Science and Technology, Shanghai 200237, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai 200092, China.
- Siyuan Ling
- Shanghai Academy of Environmental Sciences, Shanghai 200233, China
- Shuangqing Hu
- Shanghai Academy of Environmental Sciences, Shanghai 200233, China.
|
15
|
Application of multi-objective optimization in the study of anti-breast cancer candidate drugs. Sci Rep 2022; 12:19347. [PMID: 36369522 PMCID: PMC9652409 DOI: 10.1038/s41598-022-23851-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/19/2022] [Accepted: 11/07/2022] [Indexed: 11/13/2022] Open
Abstract
In the development of anti-breast cancer drugs, the quantitative structure-activity relationship model of compounds is usually used to select potentially active compounds. However, the existing methods often have problems such as low model prediction performance, lack of overall consideration of the biological activity and related properties of compounds, and difficulty in directly selecting candidate drugs. Therefore, this paper constructs a complete compound selection framework covering three aspects: feature selection, relationship mapping and multi-objective optimization problem solving. In the feature selection part, a feature selection method based on unsupervised spectral clustering is proposed. The selected features express information more comprehensively. In the relationship mapping part, a variety of machine learning algorithms are compared experimentally. Finally, the CatBoost algorithm is selected to perform the relationship mapping, and better prediction performance is achieved. In the multi-objective optimization part, based on an analysis of the conflicts between the objectives, the AGE-MOEA algorithm is improved and used to solve this problem. Compared with various algorithms, the improved algorithm has better search performance.
|
16
|
Yang L, Chen P, He K, Wang R, Chen G, Shan G, Zhu L. Predicting bioconcentration factor and estrogen receptor bioactivity of bisphenol a and its analogues in adult zebrafish by directed message passing neural networks. ENVIRONMENT INTERNATIONAL 2022; 169:107536. [PMID: 36152365 DOI: 10.1016/j.envint.2022.107536] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/03/2022] [Revised: 08/23/2022] [Accepted: 09/19/2022] [Indexed: 06/16/2023]
Abstract
The bioconcentration factor (BCF) is a key parameter for bioavailability assessment of environmental pollutants in regulatory frameworks. The comparative toxicology and mechanism of action of congeners are also of concern. However, there are limitations in acquiring them through field and laboratory experiments, while machine learning is emerging as a promising predictive tool to fill the gap. In this study, the Directed Message Passing Neural Network (DMPNN) was applied to predict logBCFs of bisphenol A (BPA) and its four analogues (bisphenol AF (BPAF), bisphenol B (BPB), bisphenol F (BPF) and bisphenol S (BPS)). For the test set, the Pearson correlation coefficient (PCC) and mean square error (MSE) were 0.85 and 0.52 respectively, suggesting a good predictive performance. The logBCF values predicted by the DMPNN, ranging from 0.35 (BPS) to 2.14 (BPAF), coincided well with those from the classical EPI Suite (BCFBAF model). Besides, estrogen receptor α (ERα) bioactivity of these bisphenols was also predicted well by the DMPNN, with a probability of 97.0 % (BPB) to 99.7 % (BPAF), which was validated by the extent of vitellogenin (VTG) induction in male zebrafish as a biomarker, except for BPS. Thus, with little need for expert knowledge, DMPNN is confirmed to be a useful tool to accurately predict logBCF and screen for estrogenic activity from molecular structures. Moreover, a gender difference was noted in the changes of three endpoints (logBCF, ER binding affinity and VTG levels), the rank order of which was consistently BPAF > BPB > BPA > BPF > BPS, and abnormal amino acid metabolism is featured as an omics signature of abnormal hormone protein expression.
Affiliation(s)
- Liping Yang
- Key Laboratory of Pollution Processes and Environmental Criteria, Ministry of Education, Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300350, China
- Pengyu Chen
- Key Laboratory of Pollution Processes and Environmental Criteria, Ministry of Education, Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300350, China; College of Oceanography, Hohai University, Nanjing 210098, China
- Keyan He
- Key Laboratory of Pollution Processes and Environmental Criteria, Ministry of Education, Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300350, China
- Ruihan Wang
- College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, China
- Geng Chen
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 330106, China
- Guoqiang Shan
- Key Laboratory of Pollution Processes and Environmental Criteria, Ministry of Education, Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300350, China.
- Lingyan Zhu
- Key Laboratory of Pollution Processes and Environmental Criteria, Ministry of Education, Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300350, China
|
17
|
Zhang R, Guo H, Hua Y, Cui X, Shi Y, Li X. Modeling and insights into the structural basis of chemical acute aquatic toxicity. ECOTOXICOLOGY AND ENVIRONMENTAL SAFETY 2022; 242:113940. [PMID: 35999760 DOI: 10.1016/j.ecoenv.2022.113940] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/05/2022] [Revised: 07/16/2022] [Accepted: 07/29/2022] [Indexed: 06/15/2023]
Abstract
It has become a top global regulatory priority to prevent and control pollution from the release of synthetic chemicals, which continues to affect aquatic communities. In the past decades, computational tools have been widely used to significantly reduce the budget and time cost of chemical acute aquatic toxicity assessment. But the structural basis of toxic compounds was rarely analyzed. In the present study, we collected 1438, 485 and 961 chemicals with acute toxicity data records for three representative aquatic species, Tetrahymena pyriformis, Daphnia magna, and Fathead minnow, respectively. A series of artificial intelligence models were developed using OCHEM tools. For each aquatic toxicity endpoint, a consensus model was developed based on the top-performing individual models. The consensus models provided good performance on external validation sets, with total accuracy values of 96.88 %, 90.63 %, and 84.90 % for Tetrahymena pyriformis toxicity (TPT), Daphnia magna toxicity (DMT), and Fathead minnow toxicity (FMT), respectively. The models can be freely accessed via https://ochem.eu/article/146910. Moreover, the analysis of physical-chemical properties suggested that several key molecular properties of aquatic toxic compounds were significantly different from those of non-toxic compounds. Thus, these descriptors may be associated with chemical acute aquatic toxicity, and may be useful for understanding chemical aquatic toxicity. Besides, in this study, the structural alerts for aquatic toxicity were detected using f-score and frequency ratio analysis of predefined substructures. A total of 112, 58 and 33 structural alerts were identified as responsible for TPT, DMT, and FMT, respectively. These structural alerts could provide useful information on the mechanisms of chemical aquatic toxicity and visual alerts for environmental assessment. All the structural alerts were integrated in the web-server SApredictor (www.sapredictor.cn).
Affiliation(s)
- Ruiqiu Zhang
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Huizhu Guo
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Yuqing Hua
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Xueyan Cui
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Yinping Shi
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Xiao Li
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China; Department of Clinical Pharmacy, Shandong Provincial Qianfoshan Hospital, Shandong University, Jinan 250014, China.
|
18
|
Abstract
Machine learning and artificial intelligence approaches have revolutionized multiple disciplines, including toxicology. This review summarizes representative recent applications of machine learning and artificial intelligence approaches in different areas of toxicology, including physiologically based pharmacokinetic (PBPK) modeling, quantitative structure-activity relationship modeling for toxicity prediction, adverse outcome pathway analysis, high-throughput screening, toxicogenomics, big data and toxicological databases. By leveraging machine learning and artificial intelligence approaches, now it is possible to develop PBPK models for hundreds of chemicals efficiently, to create in silico models to predict toxicity for a large number of chemicals with similar accuracies compared to in vivo animal experiments, and to analyze a large amount of different types of data (toxicogenomics, high-content image data, etc.) to generate new insights into toxicity mechanisms rapidly, which was impossible by manual approaches in the past. To continue advancing the field of toxicological sciences, several challenges should be considered: (1) not all machine learning models are equally useful for a particular type of toxicology data, and thus it is important to test different methods to determine the optimal approach; (2) current toxicity prediction is mainly on bioactivity classification (yes/no), so additional studies are needed to predict the intensity of effect or dose-response relationship; (3) as more data become available, it is crucial to perform rigorous data quality check and develop infrastructure to store, share, analyze, evaluate, and manage big data; and (4) it is important to convert machine learning models to user-friendly interfaces to facilitate their applications by both computational and bench scientists.
Affiliation(s)
- Zhoumeng Lin
- Department of Environmental and Global Health, College of Public Health and Health Professions, University of Florida, Gainesville, FL, 32610, USA; Center for Environmental and Human Toxicology, University of Florida, FL, 32608, USA
- Wei-Chun Chou
- Department of Environmental and Global Health, College of Public Health and Health Professions, University of Florida, Gainesville, FL, 32610, USA; Center for Environmental and Human Toxicology, University of Florida, FL, 32608, USA
|
19
|
Abstract
In this paper, we study the classification problem of large data with many features and strong feature dependencies. Machine learning models handle this type of problem poorly. Therefore, a classification model with cognitive reasoning ability is proposed. The core idea is to use the cognitive reasoning mechanism proposed in this paper to solve the classification problem of large structured data with multiple features and strong correlation between features, and then to implement cognitive reasoning for features. The model has three parts. The first part proposes a Feature-to-Image algorithm for converting structured data into image data. The algorithm quantifies the dependencies between features, so as to take into account the impact of individual independent features and correlations between features on the prediction results. The second part designs and implements low-level feature extraction of the quantified features using convolutional neural networks. With the relative symmetry of the capsule network, the third part proposes a cognitive reasoning mechanism to implement high-level feature extraction, feature cognitive reasoning, and classification tasks of the data. At the same time, this paper provides the derivation process and algorithm description of the cognitive reasoning mechanism. Experiments show that our model is efficient and outperforms comparable models on the category prediction experiment of ADMET properties of five compounds. This work will provide a new way for cognitive computing of intelligent data analysis.
|
20
|
Wu J, D'Ambrosi S, Ammann L, Stadnicka-Michalak J, Schirmer K, Baity-Jesi M. Predicting chemical hazard across taxa through machine learning. ENVIRONMENT INTERNATIONAL 2022; 163:107184. [PMID: 35306252 DOI: 10.1016/j.envint.2022.107184] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 10/18/2021] [Revised: 02/07/2022] [Accepted: 03/11/2022] [Indexed: 06/14/2023]
Abstract
We applied machine learning methods to predict chemical hazards focusing on fish acute toxicity across taxa. We analyzed the relevance of taxonomy and experimental setup, showing that taking them into account can lead to considerable improvements in the classification performance. We quantified the gain obtained through the introduction of taxonomic and experimental information, compared to classification based on chemical information alone. We used our approach with standard machine learning models (K-nearest neighbors, random forests and deep neural networks), as well as the recently proposed Read-Across Structure Activity Relationship (RASAR) models, which were very successful in predicting chemical hazards to mammals based on chemical similarity. We were able to obtain accuracies of over 93% on datasets where, due to noise in the data, the maximum achievable accuracy was expected to be below 96%. The best performances were obtained by random forests and RASAR models. We analyzed metrics to compare our results with animal test reproducibility, and although most of our models "outperform animal test reproducibility" as measured through recently proposed metrics, we showed that the comparison between machine learning performance and animal test reproducibility should be addressed with particular care. While we focused on fish mortality, our approach, provided that the right data is available, is valid for any combination of chemicals, effects and taxa.
Affiliation(s)
- Jimeng Wu
- Eawag, Überlandstrasse 133, CH-8600 Dübendorf, Switzerland; Department of Environmental Engineering, ETHZ, Zurich, Switzerland.
- Simone D'Ambrosi
- Department of Statistics, Sapienza University of Rome, Piazzale Aldo Moro, 5, 00185 Rome, RM, Italy
- Lorenz Ammann
- Swiss Federal Institute for Forest, Snow, and Landscape Research WSL, Zürcherstrasse 111, CH-8903 Birmensdorf, Switzerland
- Kristin Schirmer
- Eawag, Überlandstrasse 133, CH-8600 Dübendorf, Switzerland; School of Architecture, Civil and Environmental Engineering, EPFL, Lausanne, Switzerland.
|
21
|
In silico prediction models for thyroid peroxidase inhibitors and their application to synthetic flavors. Food Sci Biotechnol 2022; 31:483-495. [PMID: 35464247 PMCID: PMC8994803 DOI: 10.1007/s10068-022-01041-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2021] [Revised: 01/22/2022] [Accepted: 02/02/2022] [Indexed: 11/27/2022] Open
Abstract
Systematic toxicity tests are often waived for synthetic flavors because they are added to foods in very small amounts. However, their safety for some endpoints such as endocrine disruption should be of concern, as they are likely to be active at low levels. In this case, structure–activity-relationship (SAR) models are good alternatives. In this study, therefore, binary, ternary, and quaternary prediction models were designed using simple or complex machine-learning methods. Overall, hard-voting classifiers outperformed other methods. The test scores for the best binary, ternary, and quaternary models were 0.6635, 0.5083, and 0.5217, respectively. Along with model development, some substructures, including primary aromatic amine, (enol)ether, phenol, heterocyclic sulfur, and heterocyclic nitrogen, were found to occur predominantly in the most highly active compounds. The best predicting models were applied to synthetic flavors, and 22 agents appeared to have a strong inhibitory potential towards TPO activities.
|
22
|
Zhang K, Zhang H. Predicting Solute Descriptors for Organic Chemicals by a Deep Neural Network (DNN) Using Basic Chemical Structures and a Surrogate Metric. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2022; 56:2054-2064. [PMID: 34995441 DOI: 10.1021/acs.est.1c05398] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 06/14/2023]
Abstract
Solute descriptors have been widely used to model chemical transfer processes through poly-parameter linear free energy relationships (pp-LFERs); however, there are still substantial difficulties in obtaining these descriptors accurately and quickly for new organic chemicals. In this research, models (PaDEL-DNN) that require only SMILES of chemicals were built to satisfactorily estimate pp-LFER descriptors using deep neural networks (DNN) and the PaDEL chemical representation. The PaDEL-DNN-estimated pp-LFER descriptors demonstrated good performance in modeling storage-lipid/water partitioning coefficient (log Kstorage-lipid/water), bioconcentration factor (BCF), aqueous solubility (ESOL), and hydration free energy (freesolve). Then, assuming that the accuracy in the estimated values of widely available properties, e.g., logP (octanol-water partition coefficient), can calibrate estimates for less available but related properties, we proposed logP as a surrogate metric for evaluating the overall accuracy of the estimated pp-LFER descriptors. When using the pp-LFER descriptors to model log Kstorage-lipid/water, BCF, ESOL, and freesolve, we achieved around 0.1 log unit lower errors for chemicals whose estimated pp-LFER descriptors were deemed "accurate" by the surrogate metric. The interpretation of the PaDEL-DNN models revealed that, for a given test chemical, having several (around 5) "similar" chemicals in the training data set was crucial for accurate estimation while the remaining less similar training chemicals provided reasonable baseline estimates. Lastly, pp-LFER descriptors for over 2800 persistent, bioaccumulative, and toxic chemicals were reasonably estimated by combining PaDEL-DNN with the surrogate metric. Overall, the PaDEL-DNN/surrogate metric and newly estimated descriptors will greatly benefit chemical transfer modeling.
Affiliation(s)
- Kai Zhang
- Department of Civil and Environmental Engineering, Case Western Reserve University, Cleveland, Ohio 44106, United States
- Huichun Zhang
- Department of Civil and Environmental Engineering, Case Western Reserve University, Cleveland, Ohio 44106, United States
|
23
|
Huang R, Ma C, Ma J, Huangfu X, He Q. Machine learning in natural and engineered water systems. WATER RESEARCH 2021; 205:117666. [PMID: 34560616 DOI: 10.1016/j.watres.2021.117666] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 05/31/2021] [Revised: 09/01/2021] [Accepted: 09/11/2021] [Indexed: 06/13/2023]
Abstract
Water resources of desired quality and quantity are the foundation for human survival and sustainable development. To better protect the water environment and conserve water resources, efficient water management, purification, and transportation are of critical importance. In recent years, machine learning (ML) has exhibited its practicability, reliability, and high efficiency in numerous applications; furthermore, it has solved conventional and emerging problems in both natural and engineered water systems. For example, ML can predict various water quality indicators in situ and in real time by considering the complex interactions among water-related variables. ML approaches can also solve emerging pollution problems with proven rules or universal mechanisms summarized from the related research. Moreover, by applying image recognition technology to analyze the relationships between image information and physicochemical properties of the research object, ML can effectively identify and characterize specific contaminants. In view of the bright prospects of ML, this review comprehensively summarizes the development of ML applications in natural and engineered water systems. First, the concept and modeling steps of ML are briefly introduced, including data preparation, algorithm selection, and model evaluation. In addition, comprehensive applications of ML in recent studies, including predicting water quality, mapping groundwater contaminants, classifying water resources, tracing contaminant sources, and evaluating pollutant toxicity in natural water systems, as well as modeling treatment techniques, assisting characterization analysis, purifying and distributing drinking water, and collecting and treating sewage water in engineered water systems, are summarized. Finally, the advantages and disadvantages of commonly used algorithms are analyzed according to their structures and mechanisms, and recommendations on the selection of ML algorithms for different studies, as well as prospects for the application and development of ML in water science, are proposed. This review provides references for solving a wider range of water-related problems and brings further insights into the intelligent development of water science.
Affiliation(s)
- Ruixing Huang
- Key Laboratory of Eco-environments in the Three Gorges Reservoir Region, Ministry of Education, College of Environmental and Ecology, Chongqing University, Chongqing 400044, China; State Key Laboratory of Urban Water Resource and Environment, School of Municipal and Environmental Engineering, Harbin Institute of Technology, Harbin 150090, China
| | - Chengxue Ma
- Key Laboratory of Eco-environments in the Three Gorges Reservoir Region, Ministry of Education, College of Environmental and Ecology, Chongqing University, Chongqing 400044, China; State Key Laboratory of Urban Water Resource and Environment, School of Municipal and Environmental Engineering, Harbin Institute of Technology, Harbin 150090, China
| | - Jun Ma
- State Key Laboratory of Urban Water Resource and Environment, School of Municipal and Environmental Engineering, Harbin Institute of Technology, Harbin 150090, China
| | - Xiaoliu Huangfu
- Key Laboratory of Eco-environments in the Three Gorges Reservoir Region, Ministry of Education, College of Environmental and Ecology, Chongqing University, Chongqing 400044, China.
| | - Qiang He
- Key Laboratory of Eco-environments in the Three Gorges Reservoir Region, Ministry of Education, College of Environmental and Ecology, Chongqing University, Chongqing 400044, China
| |
|
24
|
Choudhary S, Herdt D, Spoor E, García Molina JF, Nachtmann M, Rädle M. Incremental Learning in Modelling Process Analysis Technology (PAT)-An Important Tool in the Measuring and Control Circuit on the Way to the Smart Factory. SENSORS 2021; 21:s21093144. [PMID: 34062767 PMCID: PMC8124399 DOI: 10.3390/s21093144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2021] [Revised: 04/15/2021] [Accepted: 04/28/2021] [Indexed: 12/01/2022]
Abstract
To meet the demands of the chemical and pharmaceutical process industry for a combination of high measurement accuracy, product selectivity, and low cost of ownership, the existing measurement and evaluation methods have to be further developed. This paper demonstrates an attempt to combine future Raman photometers with promising evaluation methods. As part of the investigations presented here, a new and easy-to-use evaluation method based on a self-learning algorithm is presented. This method can be applied to various measurement methods and is carried out here using the example of a Raman spectrometer system with an alcohol-water mixture as the demonstration fluid. The chosen spectral bands can later be transferred to low-priced and even more robust Raman photometers. The evaluation method gives more precise results than classical evaluation methods such as the one primarily used in the software package Unscrambler. This technique increases the accuracy of detection and proves the concept of Raman process monitoring for determining concentrations. In the alcohol/water example, the computation time is shorter, and the method can be applied to continuous column monitoring.
Affiliation(s)
- Shivani Choudhary
- Center for Mass Spectrometry and Optical Spectroscopy, Mannheim University of Applied Sciences, Paul-Wittsack-Straße 10, 68163 Mannheim, Germany
- Correspondence: (S.C.); (D.H.)
| | - Deborah Herdt
- Center for Mass Spectrometry and Optical Spectroscopy, Mannheim University of Applied Sciences, Paul-Wittsack-Straße 10, 68163 Mannheim, Germany
- Correspondence: (S.C.); (D.H.)
| | - Erik Spoor
- Center for Mass Spectrometry and Optical Spectroscopy, Mannheim University of Applied Sciences, Paul-Wittsack-Straße 10, 68163 Mannheim, Germany
| | - José Fernando García Molina
- Institute of Process Control and Innovative Energy Conversion, Mannheim University of Applied Sciences, 68163 Mannheim, Germany
| | - Marcel Nachtmann
- Center for Mass Spectrometry and Optical Spectroscopy, Mannheim University of Applied Sciences, Paul-Wittsack-Straße 10, 68163 Mannheim, Germany
| | - Matthias Rädle
- Center for Mass Spectrometry and Optical Spectroscopy, Mannheim University of Applied Sciences, Paul-Wittsack-Straße 10, 68163 Mannheim, Germany
| |
|
25
|
Jiao Z, Hu P, Xu H, Wang Q. Machine Learning and Deep Learning in Chemical Health and Safety: A Systematic Review of Techniques and Applications. ACS CHEMICAL HEALTH & SAFETY 2020. [DOI: 10.1021/acs.chas.0c00075] [Citation(s) in RCA: 61] [Impact Index Per Article: 15.3] [Indexed: 12/17/2022]
Affiliation(s)
- Zeren Jiao
- Mary Kay O’Connor Process Safety Center, Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843-3122, United States
| | - Pingfan Hu
- Mary Kay O’Connor Process Safety Center, Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843-3122, United States
| | - Hongfei Xu
- Mary Kay O’Connor Process Safety Center, Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843-3122, United States
| | - Qingsheng Wang
- Mary Kay O’Connor Process Safety Center, Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843-3122, United States
| |
|
26
|
Toropova AP, Duchowicz PR, Saavedra LM, Castro EA, Toropov AA. The Use of the Index of Ideality of Correlation to Build Up Models for Bioconcentration Factor. Mol Inform 2020; 39:e1900070. [DOI: 10.1002/minf.201900070] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/12/2019] [Accepted: 12/24/2019] [Indexed: 01/16/2023]
Affiliation(s)
- Alla P. Toropova
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Science, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via La Masa 19, 20156 Milano, Italy
| | - Pablo R. Duchowicz
- Instituto de Investigaciones Fisicoquímicas Teóricas y Aplicadas (INIFTA), CONICET, UNLP, Diag. 113 y 64, C.C. 16, Sucursal 4, 1900 La Plata, Argentina
| | - Laura M. Saavedra
- Instituto de Investigaciones Fisicoquímicas Teóricas y Aplicadas (INIFTA), CONICET, UNLP, Diag. 113 y 64, C.C. 16, Sucursal 4, 1900 La Plata, Argentina
| | - Eduardo A. Castro
- Instituto de Investigaciones Fisicoquímicas Teóricas y Aplicadas (INIFTA), CONICET, UNLP, Diag. 113 y 64, C.C. 16, Sucursal 4, 1900 La Plata, Argentina
| | - Andrey A. Toropov
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Science, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via La Masa 19, 20156 Milano, Italy
| |
|
27
|
Cui X, Yang R, Li S, Liu J, Wu Q, Li X. Modeling and insights into molecular basis of low molecular weight respiratory sensitizers. Mol Divers 2020; 25:847-859. [PMID: 32166484 DOI: 10.1007/s11030-020-10069-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/25/2019] [Accepted: 03/03/2020] [Indexed: 01/10/2023]
Abstract
Respiratory sensitization is considered an important toxicological endpoint because of the severe risk it poses to human health. In past decades, a large proportion of sensitization events were caused by low molecular weight (< 1000) respiratory sensitizers. However, there is currently no widely accepted test method that can identify prospective low molecular weight respiratory sensitizers. Herein, we modeled low molecular weight respiratory sensitizers and examined their molecular basis using a high-quality data set containing 136 respiratory sensitizers and 518 nonsensitizers. We built a number of classification models using OCHEM tools, and a consensus model was developed based on the ten best individual models. The consensus model showed good predictive ability, with balanced accuracies of 0.78 and 0.85 on fivefold cross-validation and external validation, respectively. Readers can predict the respiratory sensitization of organic compounds via https://ochem.eu/article/114857. The effect of several molecular properties on respiratory sensitization was also evaluated. The results indicated that these properties differ significantly between respiratory sensitizers and nonsensitizers. Furthermore, 14 privileged substructures responsible for respiratory sensitization were identified. We hope the models and findings will be useful for environmental risk assessment.
Affiliation(s)
- Xueyan Cui
- Department of Clinical pharmacy, Shandong Provincial Qianfoshan Hospital, Shandong University, Jinan, 250014, China
| | - Rui Yang
- Department of Clinical pharmacy, Shandong Provincial Qianfoshan Hospital, Shandong University, Jinan, 250014, China
| | - Siwen Li
- Department of Clinical pharmacy, Shandong Provincial Qianfoshan Hospital, Shandong University, Jinan, 250014, China
| | - Juan Liu
- Department of Clinical pharmacy, Shandong Provincial Qianfoshan Hospital, Shandong University, Jinan, 250014, China
| | - Qiuyun Wu
- Department of Clinical pharmacy, Shandong Provincial Qianfoshan Hospital, Shandong University, Jinan, 250014, China
| | - Xiao Li
- Department of Clinical pharmacy, Shandong Provincial Qianfoshan Hospital, Shandong University, Jinan, 250014, China; Department of Clinical pharmacy, The First Affiliated Hospital of Shandong First Medical University, Shandong First Medical University, Jinan, 250014, China
| |
|
28
|
Zhang Y, Han Z, Gao Q, Bai X, Zhang C, Hou H. Prediction of K562 Cells Functional Inhibitors Based on Machine Learning Approaches. Curr Pharm Des 2019; 25:4296-4302. [PMID: 31696803 DOI: 10.2174/1381612825666191107092214] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/18/2019] [Accepted: 11/04/2019] [Indexed: 12/14/2022]
Abstract
BACKGROUND: β-thalassemia is a common monogenic genetic disease that is very harmful to human health. The disease arises from the deletion of or defects in β-globin, which reduces synthesis of the β-globin chain, resulting in a relative excess of α-chains. The formation of inclusion bodies deposited on the cell membrane decreases the ability of red blood cells to deform and leads to a group of hereditary haemolytic diseases caused by massive destruction of the cells in the spleen. METHODS: In this work, machine learning algorithms were employed to build a prediction model for inhibitors against K562 cells based on 117 inhibitors and 190 non-inhibitors. RESULTS: The overall accuracies (ACC) of a 10-fold cross-validation test and an independent set test using Adaboost were 83.1% and 78.0%, respectively, surpassing Bayes Net, Random Forest, Random Tree, C4.5, SVM, KNN and Bagging. CONCLUSION: This study indicated that Adaboost could be applied to build a learning model for the prediction of inhibitors against K562 cells.
Affiliation(s)
- Yuan Zhang
- Department of Obstetrics, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, 510630, China
| | - Zhenyan Han
- Department of Obstetrics, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, 510630, China
| | - Qian Gao
- Department of Obstetrics, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, 510630, China
| | - Xiaoyi Bai
- Department of Obstetrics, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, 510630, China
| | - Chi Zhang
- Huaxia Eye Hospital of Foshan, Huaxia Eye Hospital Group, Foshan, Guangdong, China; University of Auckland, Auckland, New Zealand
| | - Hongying Hou
- Department of Obstetrics, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, 510630, China
| |
|