1. Lei L, Zhang L, Han Z, Chen Q, Liao P, Wu D, Tai J, Xie B, Su Y. Advancing chronic toxicity risk assessment in freshwater ecology by molecular characterization-based machine learning. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2024; 342:123093. [PMID: 38072027 DOI: 10.1016/j.envpol.2023.123093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Revised: 11/30/2023] [Accepted: 12/02/2023] [Indexed: 01/26/2024]
Abstract
The continuously increased production of various chemicals and their release into environments have raised potential negative effects on ecological health. However, traditional labor-intensive assessment methods cannot effectively and rapidly evaluate these hazards, especially for chronic risk. In this study, machine learning (ML) was employed to construct quantitative structure-activity relationship (QSAR) models, enabling the prediction of chronic toxicity to aquatic organisms by leveraging the molecular characteristics of pollutants, namely, the molecular descriptors, fingerprints, and graphs. The limited dataset size hindered the notable advantages of the graph attention network (GAT) model for the molecular graphs. Considering computational efficiency and performance (R² = 0.78; RMSE = 0.77), XGBoost (XGB) was used for reliable QSAR-ML models predicting chronic toxicity using small- or medium-sized tabular data and the molecular descriptors. Further kernel density estimation analysis confirmed the high accuracy of the model for pollutant concentrations ranging from 10⁻³ to 10² mg/L, effectively aligning with most environmental scenarios. Model interpretation showed SlogP and exposure duration as the primary influential factors. SlogP, representing the distribution coefficient of a molecule between lipophilic and hydrophilic environments, had a negative effect on the toxicity outcomes. Additionally, the exposure duration played a crucial role in determining the chronic toxicity. Finally, the chronic toxicity data of bisphenol A validated the robustness and reliability of the model established in this research. Our study provided a robust and feasible methodology for chronic ecological risk evaluation of various types of pollutants and could facilitate and increase the use of ML applications in environmental fields.
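The R² and RMSE values quoted in the abstract above are standard regression metrics. As a minimal sketch of how such figures are typically computed (the function names and the sample values are hypothetical illustrations, not data or code from the study):

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed vs. predicted toxicity values (arbitrary units).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(f"R2 = {r2(observed, predicted):.2f}, RMSE = {rmse(observed, predicted):.2f}")
```

A higher R² and a lower RMSE both indicate closer agreement between predictions and observations; RMSE is expressed in the units of the predicted quantity.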
Affiliation(s)
- Lang Lei
  - Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
- Liangmao Zhang
  - Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
- Zhibang Han
  - Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
- Qirui Chen
  - Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
- Pengcheng Liao
  - Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
- Dong Wu
  - Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China; Chongqing Key Laboratory of Precision Optics, Chongqing Institute of East China Normal University, Chongqing, 401120, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai, 200092, China
- Jun Tai
  - Shanghai Environmental Sanitation Engineering Design Institute Co., Ltd., Shanghai, 200232, China
- Bing Xie
  - Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai, 200092, China
- Yinglong Su
  - Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China; Chongqing Key Laboratory of Precision Optics, Chongqing Institute of East China Normal University, Chongqing, 401120, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai, 200092, China
2. Yosipof A, Khalemsky A, Gelbard R, Senderowitz H. Dynamic Classification for Materials‐Informatics: Mining the Solar Cell Space. Mol Inform 2020; 41:e2000173. [DOI: 10.1002/minf.202000173] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/09/2020] [Accepted: 09/21/2020] [Indexed: 11/11/2022]
Affiliation(s)
- Abraham Yosipof
  - Faculty of Information Systems and Computer Science, College of Law & Business, Ramat-Gan, Israel
- Anna Khalemsky
  - Graduate School of Business Administration, Bar Ilan University, Ramat-Gan 5290002, Israel
- Roy Gelbard
  - Graduate School of Business Administration, Bar Ilan University, Ramat-Gan 5290002, Israel
3. Veremyev A, Liyanage L, Fornari M, Boginski V, Curtarolo S, Butenko S, Buongiorno Nardelli M. Networks of materials: Construction and structural analysis. AIChE J 2020. [DOI: 10.1002/aic.17051] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/06/2022]
Affiliation(s)
- Alexander Veremyev
  - Department of Industrial Engineering and Management Systems, University of Central Florida, Orlando, Florida, USA
- Marco Fornari
  - Department of Physics and Science of Advanced Materials Program, Central Michigan University, Mount Pleasant, Michigan, USA
- Vladimir Boginski
  - Department of Industrial Engineering and Management Systems, University of Central Florida, Orlando, Florida, USA
- Stefano Curtarolo
  - Department of Mechanical Engineering and Materials Science, Duke University, Durham, North Carolina, USA
- Sergiy Butenko
  - Department of Industrial and Systems Engineering, Texas A&M University, College Station, Texas, USA
4. López-López E, Naveja JJ, Medina-Franco JL. DataWarrior: an evaluation of the open-source drug discovery tool. Expert Opin Drug Discov 2019; 14:335-341. [PMID: 30806519 DOI: 10.1080/17460441.2019.1581170] [Citation(s) in RCA: 57] [Impact Index Per Article: 11.4] [Indexed: 10/27/2022]
Abstract
Introduction: DataWarrior is open and interactive software for data analysis and visualization that integrates well-established and novel chemoinformatics algorithms in a single environment. Since its public release in 2014, DataWarrior has been used by research groups in universities, government, and industry. Areas covered: Herein, the authors discuss, in a critical manner, the tools and distinct technical features of DataWarrior and analyze the areas of opportunity. Authors also present the most common applications as well as emerging uses in research areas beyond drug discovery with an emphasis on multidisciplinary projects. Expert opinion: In the era of big data and data-driven science, DataWarrior stands out as a technology that combines prediction of physicochemical properties of pharmaceutical interest, cheminformatics calculations, multivariate data analysis, and interactive visualization with dynamic plots. The well-established chemoinformatics tools implemented in DataWarrior, as well as the innovative algorithms, make the technology useful and attractive as revealed by the increasing number of documented applications.
Affiliation(s)
- Edgar López-López
  - Department of Pharmacy, School of Chemistry, National Autonomous University of Mexico, Mexico City, Mexico; Medicinal Chemistry Laboratory, University of Veracruz, Veracruz, Mexico
- J Jesús Naveja
  - Department of Pharmacy, School of Chemistry, National Autonomous University of Mexico, Mexico City, Mexico; PECEM, Faculty of Medicine, National Autonomous University of Mexico, Mexico City, Mexico
- José L Medina-Franco
  - Department of Pharmacy, School of Chemistry, National Autonomous University of Mexico, Mexico City, Mexico
5. Kaspi O, Yosipof A, Senderowitz H. Visualization of Solar Cell Library Space by Dimensionality Reduction Methods. J Chem Inf Model 2018; 58:2428-2439. [PMID: 30485100 DOI: 10.1021/acs.jcim.8b00552] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 01/07/2023]
Abstract
Visualizing high-dimensional data by projecting them into a two- or three-dimensional space is a popular approach in many scientific fields, including computer-aided drug design and cheminformatics. In contrast, dimensionality reduction techniques have been far less explored for materials informatics. Nevertheless, similar to their usefulness in analyzing the space of, e.g., drug-like molecules, such techniques could provide useful insights on materials space, including an intuitive grasp of the overall distribution of samples, the identification of interesting trends, including the formation of materials clusters and the presence of activity cliffs and outliers, and rational navigation through this space in the search for new materials. Here we present the first application of four dimensionality reduction techniques, namely, principal component analysis (PCA), kernel PCA, Isomap, and diffusion map, to visualize and analyze a part of the materials space populated by solar cells made of metal oxides. Solar cells in general and metal-oxide-based solar cells in particular hold the promise of contributing to the world's search for clean and affordable energy resources. With the exception of PCA, these methods have seldom been used to visualize chemistry space and almost never been used to visualize materials space. For this purpose, we integrated five metal-oxide-based solar cell libraries into a uniform database and subjected it to dimensionality reduction by all four methods, comparing their performances using various criteria such as maintaining the local environment of samples and the clustering structure in the low-dimensional space. We also looked at the number of outliers produced by each method and analyzed common outliers. 
We found that PCA performs best in terms of the ability to correctly maintain the local environment of samples, whereas Isomap does the best job of assigning class membership on the basis of the identities of nearest neighbors (i.e., it is the best classifier). We also found that many of the outliers identified by all of the methods could be rationalized. We suggest that the methods used in this work could be extended to study other types of solar cells, thereby setting the ground for further analysis of the photovoltaic (PV) space as well as other regions of materials space.
Affiliation(s)
- Omer Kaspi
  - Department of Chemistry, Bar-Ilan University, Ramat-Gan 5290002, Israel
- Abraham Yosipof
  - Department of Information Systems, College of Law & Business, Ramat-Gan, P.O. Box 852, Bnei Brak 5110801, Israel
- Hanoch Senderowitz
  - Department of Chemistry, Bar-Ilan University, Ramat-Gan 5290002, Israel
6. Kaspi O, Yosipof A, Senderowitz H. RANdom SAmple Consensus (RANSAC) algorithm for material-informatics: application to photovoltaic solar cells. J Cheminform 2017; 9:34. [PMID: 29086047 PMCID: PMC5461245 DOI: 10.1186/s13321-017-0224-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 02/11/2017] [Accepted: 05/25/2017] [Indexed: 01/04/2023]
Abstract
An important aspect of chemoinformatics and material-informatics is the usage of machine learning algorithms to build Quantitative Structure Activity Relationship (QSAR) models. The RANdom SAmple Consensus (RANSAC) algorithm is a predictive modeling tool widely used in the image processing field for cleaning datasets from noise. RANSAC could be used as a “one stop shop” algorithm for developing and validating QSAR models, performing outlier removal, descriptors selection, model development and predictions for test set samples using applicability domain. For “future” predictions (i.e., for samples not included in the original test set) RANSAC provides a statistical estimate for the probability of obtaining reliable predictions, i.e., predictions within a pre-defined number of standard deviations from the true values. In this work we describe the first application of RANSAC in material informatics, focusing on the analysis of solar cells. We demonstrate that for three datasets representing different metal oxide (MO) based solar cell libraries RANSAC-derived models select descriptors previously shown to correlate with key photovoltaic properties and lead to good predictive statistics for these properties. These models were subsequently used to predict the properties of virtual solar cells libraries highlighting interesting dependencies of PV properties on MO compositions.
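To illustrate the outlier-removal idea behind RANSAC mentioned in the abstract above, here is a minimal sketch of the generic consensus loop for a one-descriptor linear model. It is not the authors' QSAR workflow (it omits descriptor selection and the applicability domain), and the function names, the `threshold` and `n_iter` parameters, and the data are hypothetical.

```python
import random


def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b to (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def ransac_line(points, n_iter=200, threshold=0.5, seed=0):
    """Keep the largest set of points consistent with a line through a random pair."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iter):
        p, q = rng.sample(points, 2)
        if p[0] == q[0]:
            continue  # a vertical pair cannot define y = a*x + b
        a, b = fit_line([p, q])
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        raise ValueError("no consensus set found")
    # Refit on the consensus set only, so outliers do not influence the model.
    a, b = fit_line(best_inliers)
    return a, b, best_inliers


# Hypothetical data: ten points on y = 2x + 1 plus three gross outliers.
data = [(float(x), 2.0 * x + 1.0) for x in range(10)]
data += [(2.0, 30.0), (5.0, -20.0), (8.0, 40.0)]
slope, intercept, kept = ransac_line(data)
print(slope, intercept, len(kept))
```

The final refit uses only the consensus set, which is what makes the estimate robust to the outliers; the inlier `threshold` plays the role of the pre-defined tolerance around the model.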
Affiliation(s)
- Omer Kaspi
  - Department of Systems Engineering, Afeka - Tel-Aviv Academic College of Engineering, Tel-Aviv, Israel; Department of Chemistry, Bar-Ilan University, 5290002, Ramat-Gan, Israel
- Abraham Yosipof
  - Faculty of Business Administration, College of Law & Business, 26 Ben Gurion Street, Ramat-Gan, P.O. Box 852, 5110801, Bnei Brak, Israel
- Hanoch Senderowitz
  - Department of Chemistry, Bar-Ilan University, 5290002, Ramat-Gan, Israel