1
Banerjee A, Kar S, Roy K, Patlewicz G, Charest N, Benfenati E, Cronin MTD. Molecular similarity in chemical informatics and predictive toxicity modeling: from quantitative read-across (q-RA) to quantitative read-across structure-activity relationship (q-RASAR) with the application of machine learning. Crit Rev Toxicol 2024; 54:659-684. [PMID: 39225123 DOI: 10.1080/10408444.2024.2386260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2024] [Revised: 07/25/2024] [Accepted: 07/25/2024] [Indexed: 09/04/2024]
Abstract
This article aims to provide a comprehensive, critical, yet readable review of general interest to the chemistry community on molecular similarity as applied to chemical informatics and predictive modeling, with a special focus on read-across (RA) and read-across structure-activity relationships (RASAR). Molecular similarity-based computational tools, such as quantitative structure-activity relationships (QSARs) and RA, are routinely used to fill data gaps for a wide range of properties, including toxicity endpoints for regulatory purposes. This review explores the background of RA, starting from how structural information has been used through to how other similarity contexts, such as physicochemical properties, absorption, distribution, metabolism, and elimination (ADME) properties, and biological aspects, are being characterized. More recent developments in the integration of RA with QSAR have resulted in the emergence of novel models such as ToxRead, generalized read-across (GenRA), and quantitative RASAR (q-RASAR). Conventional QSAR techniques have been excluded from this review except where necessary for context.
Affiliation(s)
- Arkaprava Banerjee
  - Department of Pharmaceutical Technology, Drug Theoretics and Cheminformatics (DTC) Laboratory, Jadavpur University, Kolkata, India
- Supratik Kar
  - Department of Chemistry and Physics, Chemometrics & Molecular Modeling Laboratory, Kean University, Union, NJ, USA
- Kunal Roy
  - Department of Pharmaceutical Technology, Drug Theoretics and Cheminformatics (DTC) Laboratory, Jadavpur University, Kolkata, India
- Grace Patlewicz
  - Center for Computational Toxicology and Exposure, US Environmental Protection Agency, Research Triangle Park, NC, USA
- Nathaniel Charest
  - Center for Computational Toxicology and Exposure, US Environmental Protection Agency, Research Triangle Park, NC, USA
- Emilio Benfenati
  - Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
- Mark T D Cronin
  - School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, UK

2
Jiang JR, Cai WX, Chen ZF, Liao XL, Cai Z. Prediction of acute toxicity for Chlorella vulgaris caused by tire wear particle-derived compounds using quantitative structure-activity relationship models. Water Research 2024; 256:121643. [PMID: 38663211 DOI: 10.1016/j.watres.2024.121643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 04/16/2024] [Accepted: 04/17/2024] [Indexed: 05/12/2024]
Abstract
Tire wear particles (TWPs) enter aquatic ecosystems through various pathways, such as rainwater and urban runoff. Additives in TWPs can harm aquatic organisms in these ecosystems. Therefore, it is essential to investigate their toxicity to aquatic organisms. In our study, we initially recorded the median effective concentrations of 21 TWP-derived compounds on Chlorella vulgaris growth, ranging from 0.04 to 8.60 mg/L. Subsequently, through an extensive review of the literature, we incorporated 112 compounds with specific toxicity endpoints to construct the QSAR model using genetic algorithm and multiple linear regression techniques, followed by the construction of the consensus model and the quantitative read-across structure-activity relationship (q-RASAR) model. Meanwhile, we employed rigorous internal and external validation measures to assess the performance of the model. The results indicated that the developed q-RASAR model exhibited strong adaptation, robustness, and reliable prediction, with q-RASAR indicators of Q²_LOO = 0.7673, R²_tr = 0.8079, R²_test = 0.8610, Q²_Fn = 0.8285-0.8614, and CCC_test = 0.9222. Based on an external dataset containing 128 emerging TWP-derived compounds, the model's applicability domain coverage was 90.6%. The q-RASAR model predicted that the structure of diphenylamine was associated with higher toxicity, possibly linked to the SpMax2_Bhm and LogBCF descriptors. The established model reliably provides predictions and fills a critical data gap. These findings highlight the potential risks posed by emerging TWP-derived compounds to aquatic organisms.
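The external validation indicators quoted in this abstract can be illustrated with a short sketch. The abstract does not state how the authors computed them, so the code below assumes the standard textbook definitions of Q²F1, Q²F2, and Lin's concordance correlation coefficient (CCC); all function names are hypothetical.

```python
from math import fsum


def _one_minus_ratio(y_obs, y_pred, reference_mean):
    """Return 1 - SS_res / SS_ref, with SS_ref taken around reference_mean."""
    ss_res = fsum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_ref = fsum((o - reference_mean) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_ref


def q2_f1(y_test, y_pred, y_train):
    """Q2_F1 (assumed definition): test residuals scaled by the spread of
    the test responses around the TRAINING-set mean."""
    train_mean = fsum(y_train) / len(y_train)
    return _one_minus_ratio(y_test, y_pred, train_mean)


def q2_f2(y_test, y_pred):
    """Q2_F2 (assumed definition): same ratio, but around the TEST-set mean."""
    test_mean = fsum(y_test) / len(y_test)
    return _one_minus_ratio(y_test, y_pred, test_mean)


def ccc(y_obs, y_pred):
    """Lin's concordance correlation coefficient (population moments)."""
    n = len(y_obs)
    mean_o = fsum(y_obs) / n
    mean_p = fsum(y_pred) / n
    cov = fsum((o - mean_o) * (p - mean_p) for o, p in zip(y_obs, y_pred)) / n
    var_o = fsum((o - mean_o) ** 2 for o in y_obs) / n
    var_p = fsum((p - mean_p) ** 2 for p in y_pred) / n
    return 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


if __name__ == "__main__":
    # Made-up numbers purely for illustration; not data from the study.
    y_train = [0.0, 2.0, 4.0]
    y_test = [1.0, 2.0, 3.0]
    y_pred = [1.1, 1.9, 3.2]
    print(q2_f1(y_test, y_pred, y_train), q2_f2(y_test, y_pred), ccc(y_test, y_pred))
```

Q²F1 and Q²F2 differ only in the reference mean, which is why they are often reported together as a range, as in "Q²_Fn" above.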
Affiliation(s)
- Jie-Ru Jiang
  - Guangdong Key Laboratory of Environmental Catalysis and Health Risk Control, Guangdong-Hong Kong-Macao Joint Laboratory for Contaminants Exposure and Health, School of Environmental Science and Engineering, Guangdong University of Technology, Guangzhou 510006, China
- Wen-Xi Cai
  - Guangdong Key Laboratory of Environmental Catalysis and Health Risk Control, Guangdong-Hong Kong-Macao Joint Laboratory for Contaminants Exposure and Health, School of Environmental Science and Engineering, Guangdong University of Technology, Guangzhou 510006, China
- Zhi-Feng Chen
  - Guangdong Key Laboratory of Environmental Catalysis and Health Risk Control, Guangdong-Hong Kong-Macao Joint Laboratory for Contaminants Exposure and Health, School of Environmental Science and Engineering, Guangdong University of Technology, Guangzhou 510006, China
- Xiao-Liang Liao
  - Guangdong Key Laboratory of Environmental Catalysis and Health Risk Control, Guangdong-Hong Kong-Macao Joint Laboratory for Contaminants Exposure and Health, School of Environmental Science and Engineering, Guangdong University of Technology, Guangzhou 510006, China
- Zongwei Cai
  - Guangdong Key Laboratory of Environmental Catalysis and Health Risk Control, Guangdong-Hong Kong-Macao Joint Laboratory for Contaminants Exposure and Health, School of Environmental Science and Engineering, Guangdong University of Technology, Guangzhou 510006, China; State Key Laboratory of Environmental and Biological Analysis, Department of Chemistry, Hong Kong Baptist University, Hong Kong 999077, China

3
Kumar A, Ojha PK, Roy K. The first report on the assessment of maximum acceptable daily intake (MADI) of pesticides for humans using intelligent consensus predictions. Environmental Science. Processes & Impacts 2024; 26:870-881. [PMID: 38652036 DOI: 10.1039/d4em00059e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/25/2024]
Abstract
Direct or indirect consumption of pesticides and their related products by humans and other living organisms without safe dosing may pose a health risk. The risk may arise after a short or long time, depending on the nature and amount of chemicals consumed. Therefore, the maximum acceptable daily intake of chemicals must be calculated to prevent these risks. In the present work, regression-based quantitative structure-activity relationship (QSAR) models were developed using 39 pesticides with maximum acceptable daily intake (MADI) for humans as the endpoint. From the statistical results (R² = 0.674-0.712, Q²_LOO = 0.553-0.580, Q²_F1 = 0.544-0.611, and Q²_F2 = 0.531-0.599), it can be inferred that the developed models were robust, reliable, reproducible, accurate, and predictive. Intelligent Consensus Prediction (ICP) was employed to improve the external predictivity (Q²_F1 = 0.579-0.657 and Q²_F2 = 0.563-0.647) of the models. Some of the chemical markers responsible for toxicity enhancement are the presence of unsaturated bonds, lipophilicity, presence of C< (double bond-single bond-single bonded carbon), and the presence of sulphur and phosphate bonds at the topological distances 1 and 6, while the presence of hydrophilic groups and short chain fragments reduces the toxicity. The Pesticide Properties Database (PPDB) (1694 pesticides) was also screened with the developed models. Hence, this research work will be helpful for the toxicity assessment of pesticides before their synthesis, the development of eco-friendly and safer pesticides, and data-gap filling, reducing the time, cost, and animal experimentation. Thus, this study might hold promise for future MADI assessment of pesticides and provide a meaningful contribution to the field of risk assessment.
Affiliation(s)
- Ankur Kumar
  - Drug Discovery and Development Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
- Probir Kumar Ojha
  - Drug Discovery and Development Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
- Kunal Roy
  - Drug Theoretics and Cheminformatics (DTC) Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India

4
Martinez-Mayorga K, Rosas-Jiménez JG, Gonzalez-Ponce K, López-López E, Neme A, Medina-Franco JL. The pursuit of accurate predictive models of the bioactivity of small molecules. Chem Sci 2024; 15:1938-1952. [PMID: 38332817 PMCID: PMC10848664 DOI: 10.1039/d3sc05534e] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/18/2023] [Accepted: 01/09/2024] [Indexed: 02/10/2024] Open
Abstract
Property prediction is a key interest in chemistry. For several decades there has been a continued and incremental development of mathematical models to predict properties. As more data is generated and accumulated, there seem to be more areas of opportunity to develop models with increased accuracy. The same is true if one considers the large developments in machine and deep learning models. However, along with these areas of opportunity and development, issues and challenges remain and, with more data, new challenges emerge, such as the quality, quantity, and reliability of the data, and model reproducibility. Herein, we discuss the status of the accuracy of predictive models and present the authors' perspective on the direction of the field, emphasizing good practices. We focus on predictive models of bioactive properties of small molecules relevant for drug discovery, agrochemical, food chemistry, natural product research, and related fields.
Affiliation(s)
- Karina Martinez-Mayorga
  - Institute of Chemistry, Merida Unit, National Autonomous University of Mexico Merida-Tetiz Highway, Km. 4.5 Ucu Yucatan Mexico
  - Institute for Applied Mathematics and Systems, Merida Research Unit, National Autonomous University of Mexico Sierra Papacal Merida Yucatan Mexico
- José G Rosas-Jiménez
  - Department of Theoretical Biophysics, IMPRS on Cellular Biophysics Max-von-Laue Strasse 3 Frankfurt am Main 60438 Germany
- Karla Gonzalez-Ponce
  - Institute of Chemistry, Merida Unit, National Autonomous University of Mexico Merida-Tetiz Highway, Km. 4.5 Ucu Yucatan Mexico
- Edgar López-López
  - Department of Chemistry and Graduate Program in Pharmacology, Center for Research and Advanced Studies of the National Polytechnic Institute Mexico City 07000 Mexico
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry National Autonomous University of Mexico Mexico City 04510 Mexico
- Antonio Neme
  - Institute for Applied Mathematics and Systems, Merida Research Unit, National Autonomous University of Mexico Sierra Papacal Merida Yucatan Mexico
- José L Medina-Franco
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry National Autonomous University of Mexico Mexico City 04510 Mexico

5
Chatterjee M, Roy K. Predictive binary mixture toxicity modeling of fluoroquinolones (FQs) and the projection of toxicity of hypothetical binary FQ mixtures: a combination of 2D-QSAR and machine-learning approaches. Environmental Science. Processes & Impacts 2024; 26:105-118. [PMID: 38073518 DOI: 10.1039/d3em00445g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/25/2024]
Abstract
All sorts of chemicals are degraded under various environmental stresses, and the degradates coexist with the parent compounds as mixtures in the environment. Antibiotics are an additional concern due to the bioactive nature of both the parent compound and degradation products and their combined exposure to the environment. Therefore, environmental risk assessment of antibiotics and their degradation products is necessary. To this end, we made use of in silico new approach methodologies (NAMs) and machine-learning algorithms. In this study, we have developed a robust and predictive mixture-quantitative structure-activity relationship (QSAR) model with promising quality and predictability (internal: MAE_train = 0.085, Q²_LOO = 0.849; external: MAE_test = 0.090, Q²_F1 = 0.859) for predicting the toxicity of the mixtures of a class of antibiotics and their degradation products. To obtain the predictive model, toxicity data of 78 binary fluoroquinolone mixtures in E. coli (endpoint: log 1/IC50 in molar) have been utilized. We have used only 0D-2D descriptors to efficiently encode the structural features of mixture components without any additional complexities. The optimization of the class of mixture descriptors has been performed in this study by using three different mixing rules (linear combination of molecular contributions, the squared molecular contributions, and the norm of molecular contributions). Different machine-learning approaches, namely random forest (RF), AdaBoost, gradient boost (GB), extreme gradient boost (XGB), support vector machine (SVM), linear support vector machine (LSVM), and ridge regression (RR), have been employed here in addition to the conventional partial least squares (PLS) regression to optimize the modeling approach. A rigorous validation protocol has been used for assessing the goodness-of-fit, robustness, and external predictability of the models.
Finally, the toxicity of possible untested mixtures of different photodegradation products of fluoroquinolones has been predicted using the best model reported in this study.
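The three mixing rules named in this abstract can be sketched for a single descriptor. The abstract does not give the formulas, so the forms below (mole-fraction-weighted contributions, their squares, and their Euclidean norm) are an assumption based on how such rules are commonly written; the function names and the numbers are hypothetical.

```python
from math import sqrt


def linear_combination(fractions, descriptors):
    """Assumed form: D_mix = sum_i x_i * D_i."""
    return sum(x * d for x, d in zip(fractions, descriptors))


def squared_contributions(fractions, descriptors):
    """Assumed form: D_mix = sum_i (x_i * D_i)^2."""
    return sum((x * d) ** 2 for x, d in zip(fractions, descriptors))


def norm_of_contributions(fractions, descriptors):
    """Assumed form: D_mix = sqrt(sum_i (x_i * D_i)^2)."""
    return sqrt(squared_contributions(fractions, descriptors))


if __name__ == "__main__":
    # One descriptor for a hypothetical equimolar binary mixture.
    x = [0.5, 0.5]   # mole fractions of components 1 and 2
    d = [6.0, 8.0]   # descriptor value of each pure component
    print(linear_combination(x, d))
    print(squared_contributions(x, d))
    print(norm_of_contributions(x, d))
```

Applying each rule to every 0D-2D descriptor yields three alternative mixture-descriptor sets, which is presumably what allows the mixing rule itself to be treated as something to optimize.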
Affiliation(s)
- Mainak Chatterjee
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
- Kunal Roy
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India

6
Lee S, Ok SY, Moon HB, Seo SC, Ra JS. Developing a Novel Read-Across Concept for Ecotoxicological Risk Assessment of Phosphate Chemicals: A Case Study. Toxics 2024; 12:96. [PMID: 38276731 PMCID: PMC10818528 DOI: 10.3390/toxics12010096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Revised: 01/17/2024] [Accepted: 01/18/2024] [Indexed: 01/27/2024]
Abstract
This study introduces a novel concept for read-across assessment that considers species sensitivity differences among phosphate chemicals within structurally similar compound groups. Twenty-five organic chemicals, with a log Kow of 5 or less, were categorized into three functional groups based on acetylcholinesterase (AChE) inhibition as a specific mode of action (MOA). The short-term aquatic toxicity data (LC50) for fish, crustaceans, and insects were collected from the U.S. EPA Ecotoxicology (ECOTOX) Knowledgebase. A geometric mean calculation method was applied for multiple toxic endpoints. Performance metrics for the new read-across concept, including correlation coefficient, bias, precision, and accuracy, were calculated. Overall, slightly more overestimation (49.2%) than underestimation (48.4%) in toxicity predictions was observed in two case studies. In Case study I, a strong positive correlation (r = 0.93) between the predicted and known toxicity values of target chemicals was observed, while Case study II, with limited information on species and their ecotoxicity, showed a moderate correlation (r = 0.75). Overall, the bias and precision for Case study I were 0.32 ± 0.01, while Case study II showed 0.65 ± 0.06; however, the relative bias (%) increased from 37.65% (Case study I) to 91.94% (Case study II). Bland-Altman plots highlight mean differences of 1.33 (Case study I) and 1.24 (Case study II). The new read-across concept, focusing on AChE inhibition and structural similarity, demonstrated good reliability, applicability, and accuracy with minimal bias. Future studies are needed to evaluate various types of chemical substances, diverse modes of action, functional groups, toxic endpoints, and test species to ensure overall comprehensiveness and robustness in toxicity predictions.
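Several of the quantities mentioned in this abstract (geometric mean of multiple endpoints, correlation coefficient, Bland-Altman mean difference) have standard definitions that can be sketched briefly. The abstract does not specify the exact formulas or any log transformation the authors used, so the code below is only an illustration of the general statistics; the function names and the numbers are hypothetical.

```python
from math import exp, log, sqrt


def geometric_mean(values):
    """Geometric mean of positive values, e.g. several LC50s for one species."""
    return exp(sum(log(v) for v in values) / len(values))


def pearson_r(x, y):
    """Pearson correlation between predicted and known toxicity values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def bland_altman(predicted, observed):
    """Mean difference and 95% limits of agreement (mean +/- 1.96 * SD)."""
    diffs = [p - o for p, o in zip(predicted, observed)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    sd = sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
    return mean_diff, mean_diff - 1.96 * sd, mean_diff + 1.96 * sd


if __name__ == "__main__":
    # Made-up values purely for illustration; not data from the study.
    lc50_replicates = [1.0, 100.0]
    print(geometric_mean(lc50_replicates))
    predicted = [2.0, 3.5, 4.1, 5.2]
    observed = [1.8, 3.0, 4.4, 5.0]
    print(pearson_r(predicted, observed))
    print(bland_altman(predicted, observed))
```

The geometric mean is the usual choice for pooling toxicity values because they tend to span orders of magnitude, so averaging on the log scale keeps one large value from dominating.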
Affiliation(s)
- Seokwon Lee
  - Geum River Environment Research Center, National Institute of Environmental Research, Okcheon-gun 29027, Chungbuk, Republic of Korea
- Seung-Yeop Ok
  - Department of Environmental Fate and Modelling, Knoell Korea Ltd., Seoul 07327, Republic of Korea
  - Department of Marine Sciences and Convergent Engineering, Hanyang University, Ansan 15588, Republic of Korea
- Hyo-Bang Moon
  - Department of Marine Sciences and Convergent Engineering, Hanyang University, Ansan 15588, Republic of Korea
- Sung-Chul Seo
  - Department of Nano, Chemical and Biological Engineering, College of Engineering, Seokyeong University, Seoul 02173, Republic of Korea
- Jin-Sung Ra
  - Regulatory Chemical Analysis & Risk Assessment Center, Korea Institute of Industrial Technology (KITECH), Ansan 15588, Republic of Korea

7
Ghosh S, Chatterjee M, Roy K. Quantitative Read-across structure-activity relationship (q-RASAR): A new approach methodology to model aquatic toxicity of organic pesticides against different fish species. Aquatic Toxicology (Amsterdam, Netherlands) 2023; 265:106776. [PMID: 38006764 DOI: 10.1016/j.aquatox.2023.106776] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 11/17/2023] [Accepted: 11/19/2023] [Indexed: 11/27/2023]
Abstract
We have developed quantitative toxicity prediction models for organic pesticides of agricultural importance considering different fish species using a novel quantitative Read-across structure-activity relationship (q-RASAR) approach. The current study uses experimental (Log 1/LC50) data of organic pesticides for various fish species, including Rainbow trout (RT: Oncorhynchus mykiss: 715 data points), Lepomis (LP: Lepomis macrochirus: 136 data points), and Miscellaneous (Pimephales promelas, Brachydanio rerio: 226 data points). This study also discusses the validation of the developed models and the analysis of structural features that are important for aquatic toxicity towards fishes. The read-across-derived similarity, error, and concordance measures (RASAR descriptors) have been extracted from the preliminary 0D-2D descriptors; the combined pool of RASAR and selected 0D-2D descriptors has been used to develop the final models by employing the partial least squares algorithm. All the q-RASAR models are acceptable in terms of goodness of fit, robustness, and external predictivity, surpassing the quality of the respective QSAR models, as seen from the computed validation metrics. The q-RASAR approach is effective and has the potential to be used as an alternative way to enhance external predictivity, interpretability, and transferability for aquatic toxicity prediction as well as ecotoxicity potential identification.
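The idea of read-across-derived similarity descriptors can be made concrete with a small sketch. The abstract does not describe the actual similarity functions or descriptor definitions, so everything below is an assumption for illustration: a Gaussian-kernel similarity on Euclidean distance in descriptor space, the k most similar training compounds as the source set, and hypothetical feature names.

```python
from math import exp, sqrt


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rasar_features(query, train_x, train_y, k=3, sigma=1.0):
    """Illustrative read-across features for one query compound.

    Assumptions (not taken from the paper): Gaussian-kernel similarity,
    k nearest source compounds, similarity-weighted mean and SD of activity.
    """
    # Similarity of the query to every training compound, in (0, 1].
    sims = [exp(-euclidean(query, x) ** 2 / (2 * sigma ** 2)) for x in train_x]
    # Keep the k most similar training compounds as read-across sources.
    top = sorted(zip(sims, train_y), key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(s for s, _ in top)  # can underflow to 0 if sigma is very small
    weighted_avg = sum(s * y for s, y in top) / total
    weighted_sd = sqrt(sum(s * (y - weighted_avg) ** 2 for s, y in top) / total)
    return {
        "ra_prediction": weighted_avg,   # similarity-weighted read-across value
        "sd_activity": weighted_sd,      # disagreement among the source compounds
        "max_similarity": top[0][0],     # closeness of the nearest source
        "mean_similarity": total / len(top),
    }


if __name__ == "__main__":
    # Made-up descriptor vectors and activities purely for illustration.
    train_x = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
    train_y = [1.0, 2.0, 5.0]
    print(rasar_features([0.5, 0.0], train_x, train_y, k=2))
```

Features of this kind would then be pooled with the selected structural descriptors and passed to a regression method such as partial least squares, as the abstract describes.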
Affiliation(s)
- Shilpayan Ghosh
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
- Mainak Chatterjee
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
- Kunal Roy
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India

8
Chatterjee M, Roy K. "Data fusion" quantitative read-across structure-activity-activity relationships (q-RASAARs) for the prediction of toxicities of binary and ternary antibiotic mixtures toward three bacterial species. Journal of Hazardous Materials 2023; 459:132129. [PMID: 37506640 DOI: 10.1016/j.jhazmat.2023.132129] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 05/10/2023] [Revised: 06/28/2023] [Accepted: 07/21/2023] [Indexed: 07/30/2023]
Abstract
Antibiotics are often found in the environment as pollutants. They usually occur as mixtures and may produce toxicity against different ecological species due to joint exposure in the sub-optimal range. Sometimes the degradation products of parent chemicals also interact with them and cause mixture toxicity. In this study, we have developed three different mixture-Quantitative Structure-Activity Relationship (mixture-QSAR) models for three different bacterial species (Vibrio fischeri, Escherichia coli, and Bacillus subtilis). The toxicity data were collected from a previous experimental report in the literature, which comprised binary and ternary mixtures of sulfonamides (SAs), sulfonamide potentiators (SAPs), and tetracyclines (TCs). We have also explored interspecies modeling to find inter-correlation among the toxicities toward these organisms and have developed quantitative structure-activity-activity relationship (QSAAR) models by employing the "data fusion" quantitative read-across structure-activity-activity relationship (q-RASAAR) and partial least squares (PLS) regression algorithms. All the models are rigorously validated using both internal and external validation tests as suggested in the OECD guidelines. Three different mixing rules have been used in this study for descriptor computations to incorporate the additive and interaction effects among the mixture components. To the best of our knowledge, this is the first report of interspecies mixture toxicity models that can predict the cellular toxicity of binary and ternary mixtures against any of the three above-mentioned organisms.
Affiliation(s)
- Mainak Chatterjee
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
- Kunal Roy
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India

9
Zothantluanga JH, Chetia D, Rajkhowa S, Umar AK. Unsupervised machine learning, QSAR modelling and web tool development for streamlining the lead identification process of antimalarial flavonoids. SAR and QSAR in Environmental Research 2023; 34:117-146. [PMID: 36744427 DOI: 10.1080/1062936x.2023.2169347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Accepted: 01/10/2023] [Indexed: 06/18/2023]
Abstract
Identification of lead compounds with the traditional laboratory approach is expensive and time-consuming. Nowadays, in silico techniques have emerged as a promising approach for lead identification. In this study, we aim to develop robust and predictive 2D-QSAR models to identify lead flavonoids by predicting the IC50 against Plasmodium falciparum. We applied machine learning algorithms (Principal component analysis followed by K-means clustering) and Pearson correlation analysis to select 9 molecular descriptors (MDs) for model building. We selected and validated the three best QSAR models after execution of multiple linear regression (MLR) 100 times with different combinations of MDs. The developed models have fulfilled the five principles for QSAR models as specified by the Organization for Economic Co-operation and Development. The outcome of the study is a reliable and sustainable in silico method of IC50 (Mean ± SD) prediction that will positively impact the antimalarial drug development process by reducing the money and time required to identify potential antimalarial lead compounds from the class of flavonoids. We also developed a web tool (JazQSAR, https://etflin.com/news/4) to offer an easily accessible platform for the developed QSAR models.
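The abstract mentions Pearson correlation analysis as part of descriptor selection without saying how it was applied. One common use, shown here strictly as an assumption rather than the authors' procedure, is to discard descriptors that are highly inter-correlated with one already kept. The function names, the 0.9 threshold, and the data are hypothetical, and the PCA and K-means steps are not reproduced.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation; raises ZeroDivisionError for a constant column."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def drop_correlated(columns, threshold=0.9):
    """Greedy filter: keep a descriptor only if |r| with every descriptor
    kept so far is below the threshold. `columns` maps name -> values."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson_r(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Made-up descriptor table purely for illustration.
    table = {
        "a": [1.0, 2.0, 3.0, 4.0],
        "b": [2.0, 4.0, 6.0, 8.0],    # perfectly correlated with "a"
        "c": [1.0, -1.0, 1.0, -1.0],  # weakly correlated with "a"
    }
    print(drop_correlated(table))
```

Because the filter is greedy, the result depends on column order; ranking descriptors beforehand (for example by correlation with the endpoint) is a typical refinement.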
Affiliation(s)
- J H Zothantluanga
  - Department of Pharmaceutical Sciences, Faculty of Science and Engineering, Dibrugarh University, Dibrugarh, India
- D Chetia
  - Department of Pharmaceutical Sciences, Faculty of Science and Engineering, Dibrugarh University, Dibrugarh, India
- S Rajkhowa
  - Centre for Biotechnology and Bioinformatics, Faculty of Biological Sciences, Dibrugarh University, Dibrugarh, India
- A K Umar
  - Department of Pharmaceutics and Pharmaceutical Technology, Faculty of Pharmacy, Universitas Padjadjaran, Sumedang, Indonesia