1
Cao S, Tian Y, Zhao R, Gu W, Tang S, Xu L, Cai Y. Effective prediction of organosilicon molecular structures and risks in aquatic environment with machine learning. The Science of the Total Environment 2025; 959:178320. [PMID: 39754948 DOI: 10.1016/j.scitotenv.2024.178320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2024] [Revised: 12/18/2024] [Accepted: 12/27/2024] [Indexed: 01/06/2025]
Abstract
Mass spectrometry databases still lack molecular information for most organosilicon oligomers, and risk models that need accurate molecular descriptors are unavailable for these emerging contaminants with thousands of monomers. To address this issue, this study used molecular/fragment ions and relative abundances from GC-Orbitrap-MS to develop classification (accuracies = 0.750-0.804) and regression (MSE = 0.008-0.014) models, based on neural network and support vector frameworks, for organosilicon main/branch chain structures. The predicted structures were then used to estimate persistent, bio-accumulative and toxic (PBT) potentials with neural networks (MSE = 0.002-0.017). With these methods, 116 oligomers [with 1-7 Si atoms, SiO (68.6 %) or CC (31.4 %) backbones, cyclic (14.7 %) or linear (85.3 %) structures, and six kinds of branch groups] were identified in waters from 21 Chinese cities. Hazard indices of total organosilicons exceeded 1 in 17 cities, and 5-43 oligomers found in rivers for the first time showed persistent, bio-accumulative or toxic potential. Characteristic oligomers indicated that the dyeing, textile, and petrochemical industries made major contributions (13.1-34.8 %) to local organosilicon emissions, and the petrochemical industry was identified for the first time as a ubiquitous source of the nationwide organosilicon distribution. This study provides a valuable methodology for risk assessment of organosilicons and of other chemicals lacking MS databases.
Affiliation(s)
- Shengyu Cao
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China; School of Environment, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 330106, China
- Youliang Tian
  - Guizhou Environmental Scientific Research and Design Institute, Guiyang, Guizhou 550081, China
- Rusong Zhao
  - Qilu University of Technology (Shandong Academy of Sciences), Shandong Analysis and Test Center, Key Laboratory for Applied Technology of Sophisticated Analytical Instruments of Shandong Province, Jinan 250014, China
- Wen Gu
  - China CDC Key Laboratory of Environment and Population Health, National Institute of Environmental Health, Chinese Center for Disease Control and Prevention, Beijing 100021, China
- Song Tang
  - China CDC Key Laboratory of Environment and Population Health, National Institute of Environmental Health, Chinese Center for Disease Control and Prevention, Beijing 100021, China; Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Lin Xu
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China; School of Environment, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 330106, China
- Yaqi Cai
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China; School of Environment, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 330106, China
2
Daneshmand M, SalarAmoli J, BaghbanZadeh N. A QSAR study for predicting malformation in zebrafish embryo. Toxicol Mech Methods 2024; 34:743-749. [PMID: 38586962 DOI: 10.1080/15376516.2024.2338907] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Accepted: 03/30/2024] [Indexed: 04/09/2024]
Abstract
BACKGROUND Developmental toxicity tests are extremely expensive, require a large number of animals, and are time-consuming. A new approach is needed to simplify the analysis of developmental endpoints. One of these endpoints is malformation, and in silico models are one group of methods for simplifying the analysis. In this study, we aim to develop a quantitative structure-activity relationship (QSAR) model and identify the best algorithm for predicting malformations, as well as the most important and effective physicochemical properties associated with malformation. METHODS The dataset was extracted from a reliable database, COMPTOX. Physicochemical properties (descriptors) were calculated using the Mordred and RDKit chemoinformatics software. The data were cleaned, preprocessed, and then split into training and testing sets. Machine learning algorithms, such as gradient boosting model (GBM) and logistic regression (LR), as well as deep learning models, including multilayer perceptron (MLP) and neural networks (NNs), were trained with the training set and different sets of descriptors. The models were then validated with the test set, and various statistical parameters, such as the Matthews correlation coefficient (MCC) and balanced accuracy (BAC) score, were used to compare the models. RESULTS A descriptor set achieving 78% AUC was identified as the best set of descriptors. Gradient boosting was determined to be the best algorithm, with 78% predictive power. CONCLUSIONS The descriptors that were most effective for developing models directly impact the mechanism of malformation, and GBM is the best model based on its MCC and BAC.
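The two comparison metrics named in this abstract, MCC and balanced accuracy, can both be computed from a binary confusion matrix. The following is a minimal sketch, not the authors' code; the function names and the toy labels are illustrative assumptions.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = malformation, 0 = none)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(counts, mcc(*counts), balanced_accuracy(*counts))
```

Both metrics are less sensitive to class imbalance than plain accuracy, which is one plausible reason to prefer them when active and inactive compounds are unevenly represented.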
Affiliation(s)
- Mahsa Daneshmand
  - Department of Comparative Bioscience, Faculty of Veterinary Medicine, University of Tehran, Tehran, Iran
- Jamileh SalarAmoli
  - Department of Comparative Bioscience, Faculty of Veterinary Medicine, University of Tehran, Tehran, Iran
3
Lovrić M, Wang T, Staffe MR, Šunić I, Časni K, Lasky-Su J, Chawes B, Rasmussen MA. A Chemical Structure and Machine Learning Approach to Assess the Potential Bioactivity of Endogenous Metabolites and Their Association with Early Childhood Systemic Inflammation. Metabolites 2024; 14:278. [PMID: 38786755 PMCID: PMC11122766 DOI: 10.3390/metabo14050278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2024] [Revised: 04/29/2024] [Accepted: 05/08/2024] [Indexed: 05/25/2024] [Open Access]
Abstract
Metabolomics has gained much attention due to its potential to reveal molecular disease mechanisms and present viable biomarkers. This work uses a panel of untargeted serum metabolomes from 602 children from the COPSAC2010 mother-child cohort. The annotated part of the metabolome consists of 517 chemical compounds curated using automated procedures. We created a filtering method for the quantified metabolites using predicted quantitative structure-bioactivity relationships for the Tox21 database on nuclear receptors and stress response in cell lines. The metabolites measured in the children's serums are predicted to affect specific targeted models, known for their significance in inflammation, immune function, and health outcomes. The targets from Tox21 have been used as targets with quantitative structure-activity relationships (QSARs). They were trained for ~7000 structures, saved as models, and then applied to the annotated metabolites to predict their potential bioactivities. The models were selected based on strict accuracy criteria surpassing random effects. After application, 52 metabolites showed potential bioactivity based on structural similarity with known active compounds from the Tox21 set. The filtered compounds were subsequently used and weighted by their bioactive potential to show an association with early childhood hs-CRP levels at six months in a linear model supporting a physiological adverse effect on systemic low-grade inflammation.
Affiliation(s)
- Mario Lovrić
  - COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, 2820 Gentofte, Denmark
  - Centre for Applied Bioanthropology, Institute for Anthropological Research, 10000 Zagreb, Croatia
  - The Lisbon Council, 1040 Brussels, Belgium
- Tingting Wang
  - COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, 2820 Gentofte, Denmark
- Mads Rønnow Staffe
  - Department of Food Science, University of Copenhagen, 1958 Frederiksberg, Denmark
- Iva Šunić
  - Centre for Applied Bioanthropology, Institute for Anthropological Research, 10000 Zagreb, Croatia
- Jessica Lasky-Su
  - Department of Medicine, Boston, MA 02115, USA
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
- Bo Chawes
  - COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, 2820 Gentofte, Denmark
  - Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, 2300 Copenhagen, Denmark
- Morten Arendt Rasmussen
  - COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, 2820 Gentofte, Denmark
  - Department of Food Science, University of Copenhagen, 1958 Frederiksberg, Denmark
4
Zhao W, Chen Y, Hu N, Long D, Cao Y. The uses of zebrafish (Danio rerio) as an in vivo model for toxicological studies: A review based on bibliometrics. Ecotoxicology and Environmental Safety 2024; 272:116023. [PMID: 38290311 DOI: 10.1016/j.ecoenv.2024.116023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 01/20/2024] [Accepted: 01/24/2024] [Indexed: 02/01/2024]
Abstract
An in vivo model is necessary for toxicology. This review analyzed the uses of zebrafish (Danio rerio) in toxicology based on bibliometrics. A total of 56,816 publications about zebrafish from 2002 to 2023 were found in the Web of Science Core Collection, with Toxicology ranking sixth among all disciplines. Accordingly, the bibliometric map reveals that "toxicity" has become a hot keyword. It further reveals that the most common exposure types include acute, chronic, and combined exposure. The toxicological effects include behavioral, intestinal, cardiovascular, hepatic, and endocrine toxicity, neurotoxicity, immunotoxicity, genotoxicity, and reproductive and transgenerational toxicity. The mechanisms include oxidative stress, inflammation, autophagy, and dysbiosis of gut microbiota. The toxicants commonly evaluated using the zebrafish model include nanomaterials, arsenic, metals, bisphenol, and dioxin. Overall, zebrafish provide a unique and well-accepted model for investigating toxicological effects and mechanisms. We also discuss possible ways to address some of the limitations of the zebrafish model, such as combining it with human organoids to avoid species differences.
Affiliation(s)
- Weichao Zhao
  - Hunan Province Key Laboratory of Typical Environmental Pollution and Health Hazards, School of Public Health, Hengyang Medical School, University of South China, Hengyang 421001, PR China
- Yuna Chen
  - Hunan Province Key Laboratory of Typical Environmental Pollution and Health Hazards, School of Public Health, Hengyang Medical School, University of South China, Hengyang 421001, PR China
- Nan Hu
  - Key Discipline Laboratory for National Defense for Biotechnology in Uranium Mining and Hydrometallurgy, University of South China, Hengyang 421001, PR China
- Dingxin Long
  - Hunan Province Key Laboratory of Typical Environmental Pollution and Health Hazards, School of Public Health, Hengyang Medical School, University of South China, Hengyang 421001, PR China
- Yi Cao
  - Hunan Province Key Laboratory of Typical Environmental Pollution and Health Hazards, School of Public Health, Hengyang Medical School, University of South China, Hengyang 421001, PR China
5
Lovrić M, Wang T, Staffe MR, Šunić I, Časni K, Lasky-Su J, Chawes B, Rasmussen MA. A chemical structure and machine learning approach to assess the potential bioactivity of endogenous metabolites and their association with early-childhood hs-CRP levels. bioRxiv: the preprint server for biology 2023:2023.11.15.567095. [PMID: 38014335 PMCID: PMC10680762 DOI: 10.1101/2023.11.15.567095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Metabolomics has gained much attention due to its potential to reveal molecular disease mechanisms and present viable biomarkers. In this work we used a panel of untargeted serum metabolomes from 602 children of the COPSAC2010 mother-child cohort. The annotated part of the metabolome consists of 493 chemical compounds curated using automated procedures. Using predicted quantitative structure-bioactivity relationships for the Tox21 database on nuclear receptors and stress response in cell lines, we created a filtering method for the vast number of quantified metabolites. The metabolites measured in the children's serum have predicted potential against the chosen modelled targets. The targets from Tox21 were used with quantitative structure-activity relationships (QSARs): models were trained on ~7000 structures, saved, and then applied to the 493 metabolites to predict their potential bioactivities. The models were selected based on strict accuracy criteria surpassing random effects. After application, 52 metabolites showed potential bioactivity based on structural similarity with known active compounds from the Tox21 set. The filtered compounds were subsequently used and weighted by their bioactive potential to show an association with early childhood hs-CRP levels at six months in a linear model supporting a physiological adverse effect on systemic low-grade inflammation. The significant metabolites were reported.
Affiliation(s)
- Mario Lovrić
  - COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
  - Centre for Applied Bioanthropology, Institute for Anthropological Research, 10000 Zagreb, Croatia
  - Faculty of Electrical Engineering, Computer Science and Information Technology Osijek, Josip Juraj Strossmayer University of Osijek, Kneza Trpimira 2b, HR-31000 Osijek, Croatia
- Tingting Wang
  - COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
- Mads Rønnow Staffe
  - University of Copenhagen, Department of Food Science, Rolighedsvej 26, 1958 Frb. C., Denmark
- Iva Šunić
  - Centre for Applied Bioanthropology, Institute for Anthropological Research, 10000 Zagreb, Croatia
- Jessica Lasky-Su
  - COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
  - Centre for Applied Bioanthropology, Institute for Anthropological Research, 10000 Zagreb, Croatia
  - Faculty of Electrical Engineering, Computer Science and Information Technology Osijek, Josip Juraj Strossmayer University of Osijek, Kneza Trpimira 2b, HR-31000 Osijek, Croatia
  - University of Copenhagen, Department of Food Science, Rolighedsvej 26, 1958 Frb. C., Denmark
  - Know-Center, Inffeldgasse 13, AT-8010 Graz
  - Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
- Bo Chawes
  - COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
- Morten Arendt Rasmussen
  - COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
  - University of Copenhagen, Department of Food Science, Rolighedsvej 26, 1958 Frb. C., Denmark
6
Fan YL, Hsu FR, Wang Y, Liao LD. Unlocking the Potential of Zebrafish Research with Artificial Intelligence: Advancements in Tracking, Processing, and Visualization. Med Biol Eng Comput 2023; 61:2797-2814. [PMID: 37558927 DOI: 10.1007/s11517-023-02903-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/22/2023] [Accepted: 08/01/2023] [Indexed: 08/11/2023]
Abstract
Zebrafish have become a widely accepted model organism for biomedical research due to their strong cortisol stress response, behavioral strain differences, and sensitivity to both drug treatments and predators. However, experimental zebrafish studies generate substantial data that must be analyzed through objective, accurate, and repeatable analysis methods. Recently, advancements in artificial intelligence (AI) have enabled automated tracking, image recognition, and data analysis, leading to more efficient and insightful investigations. In this review, we examine key AI applications in zebrafish research, including behavior analysis, genomics, and neuroscience. With the development of deep learning technology, AI algorithms have been used to precisely analyze and identify images of zebrafish, enabling automated testing and analysis. By applying AI algorithms in genomics research, researchers have elucidated the relationship between genes and biology, providing a better basis for the development of disease treatments and gene therapies. Additionally, the development of more effective neuroscience tools could help researchers better understand the complex neural networks in the zebrafish brain. In the future, further advancements in AI technology are expected to enable more extensive and in-depth medical research applications in zebrafish, improving our understanding of this important animal model. This review highlights the potential of AI technology in achieving the full potential of zebrafish research by enabling researchers to efficiently track, process, and visualize the outcomes of their experiments.
Affiliation(s)
- Yi-Ling Fan
  - Institute of Biomedical Engineering and Nanomedicine, National Health Research Institutes, 35, Keyan Road, Zhunan Town, Miaoli County, 35053, Taiwan
  - Department of Information Engineering and Computer Science, Feng Chia University, Taichung, 407, Taiwan
- Fang-Rong Hsu
  - Department of Information Engineering and Computer Science, Feng Chia University, Taichung, 407, Taiwan
- Yuhling Wang
  - Institute of Biomedical Engineering and Nanomedicine, National Health Research Institutes, 35, Keyan Road, Zhunan Town, Miaoli County, 35053, Taiwan
  - Department of Electrical Engineering, National United University, 2, Lien-Da, Nan-Shih Li, Miaoli, 360302, Taiwan
- Lun-De Liao
  - Institute of Biomedical Engineering and Nanomedicine, National Health Research Institutes, 35, Keyan Road, Zhunan Town, Miaoli County, 35053, Taiwan
7
Jeong J, Kim D, Choi J. Application of ToxCast/Tox21 data for toxicity mechanism-based evaluation and prioritization of environmental chemicals: Perspective and limitations. Toxicol In Vitro 2022; 84:105451. [PMID: 35921976 DOI: 10.1016/j.tiv.2022.105451] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 06/17/2022] [Accepted: 07/28/2022] [Indexed: 01/28/2023]
Abstract
In response to the need to minimize the use of experimental animals, new approach methodologies (NAMs) using advanced technology have emerged in the 21st century. ToxCast/Tox21 aims to evaluate the adverse effects of chemicals quickly and efficiently using a high-throughput screening and to transform the paradigm of toxicity assessment into mechanism-based toxicity prediction. The ToxCast/Tox21 database, which contains extensive data from over 1400 assays with numerous biological targets and activity data for over 9000 chemicals, can be used for various purposes in the field of chemical prioritization and toxicity prediction. In this study, an overview of the database was explored to aid mechanism-based chemical prioritization and toxicity prediction. Implications for the utilization of the ToxCast/Tox21 database in chemical prioritization and toxicity prediction were derived. The research trends in ToxCast/Tox21 assay data were reviewed in the context of toxicity mechanism identification, chemical priority, environmental monitoring, assay development, and toxicity prediction. Finally, the potential applications and limitations of using ToxCast/Tox21 assay data in chemical risk assessment were discussed. The analysis of the toxicity mechanism-based assays of ToxCast/Tox21 will help in chemical prioritization and regulatory applications without the use of laboratory animals.
Affiliation(s)
- Jaeseong Jeong
  - School of Environmental Engineering, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, Republic of Korea
- Donghyeon Kim
  - School of Environmental Engineering, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, Republic of Korea
- Jinhee Choi
  - School of Environmental Engineering, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, Republic of Korea
8
Davey CJE, Kraak MHS, Praetorius A, Ter Laak TL, van Wezel AP. Occurrence, hazard, and risk of psychopharmaceuticals and illicit drugs in European surface waters. Water Research 2022; 222:118878. [PMID: 35878520 DOI: 10.1016/j.watres.2022.118878] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 05/13/2022] [Revised: 07/13/2022] [Accepted: 07/17/2022] [Indexed: 06/15/2023]
Abstract
This study aimed to provide insights into the risk posed by psychopharmaceuticals and illicit drugs in European surface waters, and to identify current knowledge gaps hampering this risk assessment. First, the availability and quality of data on the concentrations of psychopharmaceuticals and illicit drugs in surface waters (occurrence) and on the toxicity to aquatic organisms (hazard) were reviewed. If both occurrence and ecotoxicity data were available, risk quotients (risk) were calculated. Where abundant ecotoxicity data were available, a species sensitivity distribution (SSD) was constructed, from which the hazardous concentration for 5% of the species (HC5) was derived, allowing integrated multi-species risks to be derived. A total of 702 compounds were categorised as psychopharmaceuticals and illicit drugs based on a combination of all 502 Anatomical Therapeutic Chemical (ATC) class 'N' pharmaceuticals and a list of illicit drugs according to the Dutch Opium Act. Of these, 343 (49%) returned occurrence data, while only 105 (15%) returned ecotoxicity data. Moreover, many ecotoxicity tests used endpoints that are irrelevant for neurologically active compounds, such as mortality, which may underestimate the hazard of psychopharmaceuticals. Due to data limitations, risks could only be assessed for 87 (12%) compounds, with 23 (3.3%) compounds indicating a potential risk, and several highly prescribed drugs returned neither occurrence nor ecotoxicity data. Primary bottlenecks in risk calculation included the lack of ecotoxicity data, a lack of diversity of test species and ecotoxicological endpoints, and large disparities between well-studied and understudied compounds for both occurrence and toxicity data. This study identified which compounds merit concern, as well as the many compounds that lack the data for any calculation of risk, driving research priorities.
Despite the large knowledge gaps, we concluded that the presence of a substantial part (26%) of data-rich psychopharmaceuticals in surface waters presents an ecological risk for aquatic non-target organisms.
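The workflow described in this abstract (fit an SSD, take its 5th percentile as the HC5, then divide a measured concentration by that threshold) can be sketched as follows. This is a hedged illustration only: the log-normal SSD, the function names, and all numbers are assumptions, and the abstract does not state which distribution or fitting procedure the authors used.

```python
import math
from statistics import NormalDist, mean, stdev


def hc5_lognormal(toxicity_values):
    """HC5 under an assumed log-normal SSD: the 5th percentile of a normal
    distribution fitted to the log10-transformed per-species toxicity values."""
    logs = [math.log10(v) for v in toxicity_values]
    ssd = NormalDist(mean(logs), stdev(logs))
    return 10 ** ssd.inv_cdf(0.05)


def risk_quotient(measured_concentration, threshold):
    """Risk quotient: measured environmental concentration over the effect threshold.
    Values of 1 or more are conventionally read as a potential risk."""
    return measured_concentration / threshold


# Hypothetical per-species effect concentrations (ug/L), invented for illustration.
species_ec50 = [1.0, 10.0, 100.0]
hc5 = hc5_lognormal(species_ec50)
rq = risk_quotient(0.5, hc5)
print(f"HC5 = {hc5:.3f} ug/L, RQ = {rq:.2f}, potential risk: {rq >= 1}")
```

A real SSD needs toxicity data for many more species than this toy example uses, which is consistent with the abstract's point that an SSD could only be built where ecotoxicity data were abundant.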
Affiliation(s)
- Charlie J E Davey
  - FAME, UvA IBED: Universiteit van Amsterdam Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Science Park 904, Amsterdam 1098 XH, the Netherlands
- Michiel H S Kraak
  - FAME, UvA IBED: Universiteit van Amsterdam Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Science Park 904, Amsterdam 1098 XH, the Netherlands
- Antonia Praetorius
  - FAME, UvA IBED: Universiteit van Amsterdam Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Science Park 904, Amsterdam 1098 XH, the Netherlands
- Thomas L Ter Laak
  - FAME, UvA IBED: Universiteit van Amsterdam Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Science Park 904, Amsterdam 1098 XH, the Netherlands; KWR Water Research Institute, Nieuwegein, the Netherlands
- Annemarie P van Wezel
  - FAME, UvA IBED: Universiteit van Amsterdam Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Science Park 904, Amsterdam 1098 XH, the Netherlands
9
Lovrić M, Đuričić T, Tran HTN, Hussain H, Lacić E, Rasmussen MA, Kern R. Should We Embed in Chemistry? A Comparison of Unsupervised Transfer Learning with PCA, UMAP, and VAE on Molecular Fingerprints. Pharmaceuticals (Basel) 2021; 14:758. [PMID: 34451855 PMCID: PMC8400160 DOI: 10.3390/ph14080758] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/30/2021] [Revised: 07/21/2021] [Accepted: 07/22/2021] [Indexed: 02/07/2023] [Open Access]
Abstract
Methods for dimensionality reduction are showing significant contributions to knowledge generation in high-dimensional modeling scenarios throughout many disciplines. By achieving a lower dimensional representation (also called embedding), fewer computing resources are needed in downstream machine learning tasks, thus leading to a faster training time, lower complexity, and statistical flexibility. In this work, we investigate the utility of three prominent unsupervised embedding techniques (principal component analysis, PCA; uniform manifold approximation and projection, UMAP; and variational autoencoders, VAEs) for solving classification tasks in the domain of toxicology. To this end, we compare these embedding techniques against a set of molecular fingerprint-based models that do not utilize additional preprocessing of features. Inspired by the success of transfer learning in several fields, we further study the performance of embedders when trained on an external dataset of chemical compounds. To gain a better understanding of their characteristics, we evaluate the embedders with different embedding dimensionalities, and with different sizes of the external dataset. Our findings show that the recently popularized UMAP approach can be utilized alongside known techniques such as PCA and VAE as a pre-compression technique in the toxicology domain. Nevertheless, the generative model of VAE shows an advantage in pre-compressing the data with respect to classification accuracy.
Affiliation(s)
- Mario Lovrić
  - Know-Center, Inffeldgasse 13, 8010 Graz, Austria
  - Centre for Applied Bioanthropology, Institute for Anthropological Research, 10000 Zagreb, Croatia
- Tomislav Đuričić
  - Know-Center, Inffeldgasse 13, 8010 Graz, Austria
  - Institute of Interactive Systems and Data Science, Graz University of Technology, Inffeldgasse 16C, 8010 Graz, Austria
- Han T. N. Tran
  - Know-Center, Inffeldgasse 13, 8010 Graz, Austria
- Hussain Hussain
  - Know-Center, Inffeldgasse 13, 8010 Graz, Austria
  - Institute of Interactive Systems and Data Science, Graz University of Technology, Inffeldgasse 16C, 8010 Graz, Austria
- Emanuel Lacić
  - Know-Center, Inffeldgasse 13, 8010 Graz, Austria
- Morten A. Rasmussen
  - Copenhagen Studies on Asthma in Childhood, Herlev-Gentofte Hospital, University of Copenhagen, Ledreborg Alle 34, 2820 Gentofte, Denmark
  - Department of Food Science, University of Copenhagen, Rolighedsvej 26, 1958 Frederiksberg, Denmark
- Roman Kern
  - Know-Center, Inffeldgasse 13, 8010 Graz, Austria
  - Institute of Interactive Systems and Data Science, Graz University of Technology, Inffeldgasse 16C, 8010 Graz, Austria
10
Wu Y, Zhu J, Fu P, Tong W, Hong H, Chen M. Machine Learning for Predicting Risk of Drug-Induced Autoimmune Diseases by Structural Alerts and Daily Dose. International Journal of Environmental Research and Public Health 2021; 18:7139. [PMID: 34281077 PMCID: PMC8296890 DOI: 10.3390/ijerph18137139] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/07/2021] [Revised: 06/20/2021] [Accepted: 06/25/2021] [Indexed: 12/28/2022]
Abstract
An effective approach for assessing a drug’s potential to induce autoimmune diseases (ADs) is needed in drug development. Here, we aim to develop a workflow to examine the association between structural alerts and drug-induced ADs to improve toxicological prescreening tools. Considering reactive metabolite (RM) formation as a well-documented mechanism for drug-induced ADs, we investigated whether the presence of certain RM-related structural alerts was predictive of the risk of drug-induced AD. We constructed a database containing 171 RM-related structural alerts, generated a dataset of 407 AD- and non-AD-associated drugs, and performed statistical analysis. The nitrogen-containing benzene substituent alerts were found to be significantly associated with the risk of drug-induced ADs (odds ratio = 2.95, p = 0.0036). Furthermore, we developed a machine-learning-based predictive model using daily dose and nitrogen-containing benzene substituent alerts as the top inputs and achieved an area under the curve (AUC) of 70%. Additionally, we confirmed the reactivity of the nitrogen-containing benzene substituent aniline and related metabolites using quantum chemistry analysis and explored the underlying mechanisms. These identified structural alerts could be helpful in identifying drug candidates that carry a potential risk of drug-induced ADs to improve their safety profiles.
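The association reported above is an odds ratio from a 2x2 table of alert presence against AD outcome. The sketch below shows how such an odds ratio and a Wald-type 95% confidence interval are commonly computed. It is not the authors' analysis: the abstract gives no cell counts, so the counts here are hypothetical and do not reproduce the reported value of 2.95.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: alert present, AD drug      b: alert present, non-AD drug
    c: alert absent,  AD drug      d: alert absent,  non-AD drug
    All four cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR) (Woolf's method).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, invented for illustration only.
or_value, ci_low, ci_high = odds_ratio_with_ci(a=30, b=20, c=70, d=180)
print(f"OR = {or_value:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

A confidence interval that excludes 1 corresponds to a statistically significant association at roughly the 5% level, which is how a result such as the reported p = 0.0036 would typically be read.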
Affiliation(s)
- Yue Wu
  - National Center for Toxicological Research, Division of Bioinformatics and Biostatistics, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Jieqiang Zhu
  - National Center for Toxicological Research, Division of Bioinformatics and Biostatistics, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Peter Fu
  - National Center for Toxicological Research, Division of Biochemical Toxicology, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Weida Tong
  - National Center for Toxicological Research, Division of Bioinformatics and Biostatistics, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Huixiao Hong
  - National Center for Toxicological Research, Division of Bioinformatics and Biostatistics, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Minjun Chen
  - National Center for Toxicological Research, Division of Bioinformatics and Biostatistics, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
  - Correspondence: ; Fax: +1-870-543-7865