1
Vittoria Togo M, Mastrolorito F, Orfino A, Graps EA, Tondo AR, Altomare CD, Ciriaco F, Trisciuzzi D, Nicolotti O, Amoroso N. Where developmental toxicity meets explainable artificial intelligence: state-of-the-art and perspectives. Expert Opin Drug Metab Toxicol 2024; 20:561-577. [PMID: 38141160 DOI: 10.1080/17425255.2023.2298827] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/05/2023] [Accepted: 12/20/2023] [Indexed: 12/24/2023]
Abstract
INTRODUCTION: The application of Artificial Intelligence (AI) to predictive toxicology is rapidly increasing, particularly aiming to develop non-testing methods that effectively address ethical concerns and reduce economic costs. In this context, Developmental Toxicity (Dev Tox) stands as a key human health endpoint, especially significant for safeguarding maternal and child well-being.
AREAS COVERED: This review outlines the existing methods employed in Dev Tox predictions and underscores the benefits of utilizing New Approach Methodologies (NAMs), specifically focusing on eXplainable Artificial Intelligence (XAI), which proves highly efficient in constructing reliable and transparent models aligned with recommendations from international regulatory bodies.
EXPERT OPINION: The limited availability of high-quality data and the absence of dependable Dev Tox methodologies render XAI an appealing avenue for systematically developing interpretable and transparent models, which hold immense potential for both scientific evaluations and regulatory decision-making.
Affiliation(s)
- Maria Vittoria Togo
  - Department of Pharmacy - Pharmaceutical Sciences, Università degli Studi di Bari "Aldo Moro", Bari, Italy
- Fabrizio Mastrolorito
  - Department of Pharmacy - Pharmaceutical Sciences, Università degli Studi di Bari "Aldo Moro", Bari, Italy
- Angelica Orfino
  - Department of Pharmacy - Pharmaceutical Sciences, Università degli Studi di Bari "Aldo Moro", Bari, Italy
- Elisabetta Anna Graps
  - ARESS Puglia - Agenzia Regionale strategica per la Salute ed il Sociale, Presidenza della Regione Puglia, Bari, Italy
- Anna Rita Tondo
  - Department of Pharmacy - Pharmaceutical Sciences, Università degli Studi di Bari "Aldo Moro", Bari, Italy
- Cosimo Damiano Altomare
  - Department of Pharmacy - Pharmaceutical Sciences, Università degli Studi di Bari "Aldo Moro", Bari, Italy
- Fulvio Ciriaco
  - Department of Chemistry, Università degli Studi di Bari "Aldo Moro", Bari, Italy
- Daniela Trisciuzzi
  - Department of Pharmacy - Pharmaceutical Sciences, Università degli Studi di Bari "Aldo Moro", Bari, Italy
- Orazio Nicolotti
  - Department of Pharmacy - Pharmaceutical Sciences, Università degli Studi di Bari "Aldo Moro", Bari, Italy
- Nicola Amoroso
  - Department of Pharmacy - Pharmaceutical Sciences, Università degli Studi di Bari "Aldo Moro", Bari, Italy

2
Daneshmand M, SalarAmoli J, BaghbanZadeh N. A QSAR study for predicting malformation in zebrafish embryo. Toxicol Mech Methods 2024:1-7. [PMID: 38586962 DOI: 10.1080/15376516.2024.2338907] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Accepted: 03/30/2024] [Indexed: 04/09/2024]
Abstract
BACKGROUND: Developmental toxicity tests are extremely expensive, time-consuming, and require a large number of animals. A new approach is needed to simplify the analysis of developmental endpoints. One of these endpoints is malformation, and in silico models are one group of methods being pursued to simplify its assessment. In this study, we aim to develop a quantitative structure-activity relationship (QSAR) model and identify the best algorithm for predicting malformations, as well as the physicochemical properties most strongly associated with malformation.
METHODS: The dataset was extracted from a reliable database called COMPTOX. Physicochemical properties (descriptors) were calculated using the Mordred and RDKit cheminformatics software. The data were cleaned, preprocessed, and then split into training and test sets. Machine learning algorithms, such as gradient boosting model (GBM) and logistic regression (LR), as well as deep learning models, including multilayer perceptron (MLP) and neural networks (NNs), were trained on the training set with different sets of descriptors. The models were then validated on the test set, and statistical parameters such as the Matthews correlation coefficient (MCC) and balanced accuracy (BAC) score were used to compare them.
RESULTS: A set of descriptors with 78% AUC was identified as the best set of descriptors. Gradient boosting was determined to be the best algorithm, with 78% predictive power.
CONCLUSIONS: The descriptors that were most effective for developing models directly impact the mechanism of malformation, and GBM is the best model based on its MCC and BAC.
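The two comparison metrics named in this abstract, MCC and balanced accuracy, can both be computed from a binary confusion matrix. The following is a minimal sketch in plain Python, assuming labels are coded 0/1; the function names are illustrative and are not taken from the study.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn)); 0 if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity and specificity."""
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return (sens + spec) / 2
```

Both metrics are less sensitive to class imbalance than plain accuracy, which is presumably why they are favored for toxicity datasets where one class dominates.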
Affiliation(s)
- Mahsa Daneshmand
  - Department of Comparative Bioscience, Faculty of Veterinary Medicine, University of Tehran, Tehran, Iran
- Jamileh SalarAmoli
  - Department of Comparative Bioscience, Faculty of Veterinary Medicine, University of Tehran, Tehran, Iran

3
Mastrolorito F, Togo MV, Gambacorta N, Trisciuzzi D, Giannuzzi V, Bonifazi F, Liantonio A, Imbrici P, De Luca A, Altomare CD, Ciriaco F, Amoroso N, Nicolotti O. TISBE: A Public Web Platform for the Consensus-Based Explainable Prediction of Developmental Toxicity. Chem Res Toxicol 2024; 37:323-339. [PMID: 38200616 DOI: 10.1021/acs.chemrestox.3c00310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/12/2024]
Abstract
Despite being extremely relevant for the protection of prenatal and neonatal health, developmental toxicity (Dev Tox) is a highly complex endpoint whose molecular rationale is still largely unknown. The lack of high-quality data and of robust nontesting methods makes its understanding even more difficult. Thus, the application of new explainable alternative methods is of utmost importance, with Dev Tox being one of the most animal-intensive research themes of regulatory toxicology. Descending from TIRESIA (Toxicology Intelligence and Regulatory Evaluations for Scientific and Industry Applications), the present work describes TISBE (TIRESIA Improved on Structure-Based Explainability), a new public web platform implementing four fundamental advancements for in silico analyses: a three times larger dataset, a transparent XAI (explainable artificial intelligence) framework employing a fragment-based fingerprint coding, a novel consensus classifier based on five independent machine learning models, and a new applicability domain (AD) method based on a double top-down approach for better estimating the prediction reliability. The training set (TS) includes as many as 1008 chemicals annotated with experimental toxicity values. Based on a 5-fold cross-validation, a median value of 0.410 for the Matthews correlation coefficient was calculated; TISBE was very effective, with median sensitivity and specificity equal to 0.984 and 0.274, respectively. TISBE was applied to two external pools comprising 1484 bioactive compounds and 85 pediatric drugs taken from the ChEMBL (Chemical European Molecular Biology Laboratory) and TEDDY (Task-Force in Europe for Drug Development in the Young) repositories, respectively. Notably, TISBE gives users the option to clearly spot the molecular fragments responsible for the toxicity or the safety of a given chemical query and is available for free at https://prometheus.farmacia.uniba.it/tisbe.
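To illustrate how a consensus of five classifiers can be scored by sensitivity and specificity, here is a minimal sketch in plain Python. The abstract does not state how the consensus is formed, so the simple majority vote below is an assumption, and all function names are hypothetical.

```python
def consensus_predict(model_predictions):
    """Majority vote across models (an assumed consensus rule).

    model_predictions: list of per-model lists of 0/1 labels, one label per chemical.
    """
    n_models = len(model_predictions)
    return [1 if 2 * sum(votes) > n_models else 0
            for votes in zip(*model_predictions)]

def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec
```

A classifier with high sensitivity and low specificity, as reported here, rarely misses a toxicant but flags many safe chemicals as toxic, which is a conservative profile for prioritization.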
Affiliation(s)
- Fabrizio Mastrolorito
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, 70125 Bari, Italy
- Maria Vittoria Togo
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, 70125 Bari, Italy
- Nicola Gambacorta
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, 70125 Bari, Italy
- Daniela Trisciuzzi
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, 70125 Bari, Italy
- Viviana Giannuzzi
  - Fondazione per la Ricerca Farmacologica Gianni Benzi Onlus, 70010 Valenzano (BA), Italy
- Fedele Bonifazi
  - Fondazione per la Ricerca Farmacologica Gianni Benzi Onlus, 70010 Valenzano (BA), Italy
- Antonella Liantonio
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, 70125 Bari, Italy
- Paola Imbrici
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, 70125 Bari, Italy
- Annamaria De Luca
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, 70125 Bari, Italy
- Cosimo Damiano Altomare
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, 70125 Bari, Italy
- Fulvio Ciriaco
  - Dipartimento di Chimica, Università degli Studi di Bari Aldo Moro, 70125 Bari, Italy
- Nicola Amoroso
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, 70125 Bari, Italy
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, 70125 Bari, Italy
- Orazio Nicolotti
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, 70125 Bari, Italy

4
Amoroso N, Gambacorta N, Mastrolorito F, Togo MV, Trisciuzzi D, Monaco A, Pantaleo E, Altomare CD, Ciriaco F, Nicolotti O. Making sense of chemical space network shows signs of criticality. Sci Rep 2023; 13:21335. [PMID: 38049451 PMCID: PMC10696027 DOI: 10.1038/s41598-023-48107-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/13/2023] [Accepted: 11/22/2023] [Indexed: 12/06/2023] [Open Access]
Abstract
Chemical space modelling has great importance in unveiling and visualising latent information, which is critical in predictive toxicology related to the drug discovery process. While the use of traditional molecular descriptors and fingerprints may suffer from the so-called curse of dimensionality, complex networks are devoid of the typical drawbacks of coordinate-based representations. Herein, we use chemical space networks (CSNs) to analyse the case of developmental toxicity (Dev Tox), which remains a challenging endpoint because of the difficulty of gathering enough reliable data, despite being very important for the protection of maternal and child health. Our study showed that the Dev Tox CSN has a complex non-random organisation and can thus provide a wealth of meaningful information, also for predictive purposes. At a phase transition, chemical similarities highlight well-established toxicophores, such as aryl derivatives, mostly neurotoxic hydantoins, barbiturates and amino alcohols, steroids, and volatile organic compounds ether-like chemicals, which are strongly suspected of contributing to Dev Tox onset and can thus be employed as effective alerts for prioritising chemicals before testing.
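A chemical space network treats molecules as nodes and links two molecules when their similarity exceeds a threshold. The sketch below is a minimal illustration in plain Python, assuming fingerprints are given as sets of "on" bit indices and that Tanimoto similarity is the similarity measure; the abstract does not specify either choice, and the function names are hypothetical.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def build_csn(fingerprints, threshold):
    """Return the edge set of a chemical space network.

    fingerprints: dict mapping molecule name -> set of on-bits (toy representation).
    An edge joins two molecules whose similarity is at least `threshold`.
    """
    edges = set()
    for (name_a, fp_a), (name_b, fp_b) in combinations(fingerprints.items(), 2):
        if tanimoto(fp_a, fp_b) >= threshold:
            edges.add((name_a, name_b))
    return edges
```

Sweeping the threshold from high to low takes the network from isolated nodes to a densely connected graph; the phase transition mentioned in the abstract refers to a threshold regime between those extremes.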
Affiliation(s)
- Nicola Amoroso
  - Dipartimento di Farmacia - Scienze del Farmaco, Università degli studi di Bari Aldo Moro, via E. Orabona, 4, 70125, Bari, Italy
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, via E. Orabona, 4, 70125, Bari, Italy
- Nicola Gambacorta
  - Dipartimento di Farmacia - Scienze del Farmaco, Università degli studi di Bari Aldo Moro, via E. Orabona, 4, 70125, Bari, Italy
  - Division of Medical Genetics, Fondazione IRCCS-Casa Sollievo della Sofferenza, San Giovanni Rotondo (Foggia), Italy
- Fabrizio Mastrolorito
  - Dipartimento di Farmacia - Scienze del Farmaco, Università degli studi di Bari Aldo Moro, via E. Orabona, 4, 70125, Bari, Italy
- Maria Vittoria Togo
  - Dipartimento di Farmacia - Scienze del Farmaco, Università degli studi di Bari Aldo Moro, via E. Orabona, 4, 70125, Bari, Italy
- Daniela Trisciuzzi
  - Dipartimento di Farmacia - Scienze del Farmaco, Università degli studi di Bari Aldo Moro, via E. Orabona, 4, 70125, Bari, Italy
- Alfonso Monaco
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, via E. Orabona, 4, 70125, Bari, Italy
  - Dipartimento Interateneo di Fisica "M. Merlin", Università degli studi di Bari Aldo Moro, Via Giovanni Amendola, 173, 70125, Bari, Italy
- Ester Pantaleo
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, via E. Orabona, 4, 70125, Bari, Italy
  - Dipartimento Interateneo di Fisica "M. Merlin", Università degli studi di Bari Aldo Moro, Via Giovanni Amendola, 173, 70125, Bari, Italy
- Cosimo Damiano Altomare
  - Dipartimento di Farmacia - Scienze del Farmaco, Università degli studi di Bari Aldo Moro, via E. Orabona, 4, 70125, Bari, Italy
- Fulvio Ciriaco
  - Dipartimento di Chimica, Università degli studi di Bari Aldo Moro, via E. Orabona, 4, 70125, Bari, Italy
- Orazio Nicolotti
  - Dipartimento di Farmacia - Scienze del Farmaco, Università degli studi di Bari Aldo Moro, via E. Orabona, 4, 70125, Bari, Italy

5
Guo W, Liu J, Dong F, Song M, Li Z, Khan MKH, Patterson TA, Hong H. Review of machine learning and deep learning models for toxicity prediction. Exp Biol Med (Maywood) 2023; 248:1952-1973. [PMID: 38057999 PMCID: PMC10798180 DOI: 10.1177/15353702231209421] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/08/2023] [Open Access]
Abstract
The ever-increasing number of chemicals has raised public concerns due to their adverse effects on human health and the environment. To protect public health and the environment, it is critical to assess the toxicity of these chemicals. Traditional in vitro and in vivo toxicity assays are complicated, costly, and time-consuming and may face ethical issues. These constraints raise the need for alternative methods for assessing the toxicity of chemicals. Recently, due to the advancement of machine learning algorithms and the increase in computational power, many toxicity prediction models have been developed using various machine learning and deep learning algorithms such as support vector machine, random forest, k-nearest neighbors, ensemble learning, and deep neural network. This review summarizes the machine learning- and deep learning-based toxicity prediction models developed in recent years. Support vector machine and random forest are the most popular machine learning algorithms, and hepatotoxicity, cardiotoxicity, and carcinogenicity are the most frequently modeled toxicity endpoints in predictive toxicology. Datasets are known to affect model performance, and the quality of the datasets used to develop machine learning and deep learning toxicity prediction models is vital to the performance of those models. Different toxicity assignments for the same chemicals have been observed among datasets for the same type of toxicity, indicating that benchmark datasets are needed for developing reliable toxicity prediction models. This review provides insights into current machine learning models in predictive toxicology, which are expected to promote the development and application of toxicity prediction models in the future.
Affiliation(s)
- Wenjing Guo
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Jie Liu
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Fan Dong
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Meng Song
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Zoe Li
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Md Kamrul Hasan Khan
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Tucker A Patterson
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Huixiao Hong
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA

6
Togo MV, Mastrolorito F, Ciriaco F, Trisciuzzi D, Tondo AR, Gambacorta N, Bellantuono L, Monaco A, Leonetti F, Bellotti R, Altomare CD, Amoroso N, Nicolotti O. TIRESIA: An eXplainable Artificial Intelligence Platform for Predicting Developmental Toxicity. J Chem Inf Model 2023; 63:56-66. [PMID: 36520016 DOI: 10.1021/acs.jcim.2c01126] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Indexed: 12/23/2022]
Abstract
Herein, a robust and reproducible eXplainable Artificial Intelligence (XAI) approach is presented, which allows prediction of developmental toxicity, a challenging human-health endpoint in toxicology. The application of XAI as an alternative method is of the utmost importance, with developmental toxicity being one of the most animal-intensive areas of regulatory toxicology. In this work, the established CAESAR (Computer Assisted Evaluation of industrial chemical Substances According to Regulations) training set of 234 chemicals is employed for model learning. Two test sets, together comprising 585 chemicals, were instead used for validation and generalization purposes. The proposed framework compares favorably with state-of-the-art approaches in terms of accuracy, sensitivity, and specificity, thus resulting in a reliable support system for developmental toxicity that ensures informativeness, uncertainty estimation, generalization, and transparency. Based on the eXtreme Gradient Boosting (XGB) algorithm, our predictive model provides easy interpretative keys based on specific molecular descriptors and structural alerts, enabling one to distinguish toxic and nontoxic chemicals. Inspired by the Organisation for Economic Co-operation and Development (OECD) principles for the validation of Quantitative Structure-Activity Relationships (QSARs) for regulatory purposes, the results are summarized in a standard report in portable document format, which also includes details on a density-based model applicability domain and on SHAP (SHapley Additive exPlanations) explainability, the latter being particularly useful for understanding the roles played by molecular features. Notably, our model has been implemented in TIRESIA (Toxicology Intelligence and Regulatory Evaluations for Scientific and Industry Applications), a free-of-charge web platform available at http://tiresia.uniba.it.
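SHAP attributions are grounded in Shapley values, which assign each feature its average marginal contribution over all orderings in which features can be added. The sketch below computes exact Shapley values by brute-force enumeration for a toy scoring function; it does not use the SHAP library, and the feature names and scores are purely hypothetical illustrations, not results from TIRESIA.

```python
from itertools import permutations

def shapley_values(value_fn, features):
    """Exact Shapley values: average marginal contribution over all orderings.

    value_fn takes a frozenset of present features and returns a model score.
    Cost grows factorially, so this only suits a handful of features.
    """
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = set()
        for f in order:
            before = value_fn(frozenset(present))
            present.add(f)
            totals[f] += value_fn(frozenset(present)) - before
    return {f: total / len(orderings) for f, total in totals.items()}

def toy_model_output(present):
    """Hypothetical toxicity score with one interaction term (illustration only)."""
    score = 0.0
    if "aromatic_ring" in present:
        score += 0.3
    if "halogen" in present:
        score += 0.1
    if "aromatic_ring" in present and "halogen" in present:
        score += 0.2  # interaction is shared equally between the two features
    return score
```

For this toy function the attributions sum to the score obtained with all features present, which is the additivity property that makes such explanations easy to read in a report.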
Affiliation(s)
- Maria Vittoria Togo
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, 70125 Bari, Italy
- Fabrizio Mastrolorito
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, 70125 Bari, Italy
- Fulvio Ciriaco
  - Dipartimento di Chimica, Università degli Studi di Bari Aldo Moro, 70125 Bari, Italy
- Daniela Trisciuzzi
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, 70125 Bari, Italy
- Anna Rita Tondo
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, 70125 Bari, Italy
- Nicola Gambacorta
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, 70125 Bari, Italy
- Loredana Bellantuono
  - Dipartimento di Biomedicina Traslazionale e Neuroscienze (DiBraiN), Università degli Studi di Bari Aldo Moro, 70124 Bari, Italy
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, 70125 Bari, Italy
- Alfonso Monaco
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, 70125 Bari, Italy
  - Dipartimento Interateneo di Fisica M. Merlin, Università degli Studi di Bari Aldo Moro, 70125 Bari, Italy
- Francesco Leonetti
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, 70125 Bari, Italy
- Roberto Bellotti
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, 70125 Bari, Italy
  - Dipartimento Interateneo di Fisica M. Merlin, Università degli Studi di Bari Aldo Moro, 70125 Bari, Italy
- Cosimo Damiano Altomare
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, 70125 Bari, Italy
- Nicola Amoroso
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, 70125 Bari, Italy
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, 70125 Bari, Italy
- Orazio Nicolotti
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, 70125 Bari, Italy

7
Manggara AB, Ohkawa K, Sugimoto M. Classifying Modes of Toxic Action of Molecules with Electronic-structure Informatics. Application to Imbalanced Toxicity Data of Phenol Derivatives to Tetrahymena pyriformis. Chem Lett 2021. [DOI: 10.1246/cl.210453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Affiliation(s)
- Algafari Bakti Manggara
  - Graduate School of Science and Technology, Kumamoto University, 2-39-1 Kurokami, Chuo-ku, Kumamoto 860-8555, Japan
- Kazufumi Ohkawa
  - Graduate School of Science and Technology, Kumamoto University, 2-39-1 Kurokami, Chuo-ku, Kumamoto 860-8555, Japan
- Manabu Sugimoto
  - Graduate School of Science and Technology, Kumamoto University, 2-39-1 Kurokami, Chuo-ku, Kumamoto 860-8555, Japan
  - Faculty of Advanced Science and Technology, Kumamoto University, 2-39-1 Kurokami, Chuo-ku, Kumamoto 860-8555, Japan
  - Institute of Industrial Nanomaterials, Kumamoto University, 2-39-1 Kurokami, Chuo-ku, Kumamoto 860-8555, Japan

8
Rácz A, Bajusz D, Héberger K. Effect of Dataset Size and Train/Test Split Ratios in QSAR/QSPR Multiclass Classification. Molecules 2021; 26:1111. [PMID: 33669834 PMCID: PMC7922354 DOI: 10.3390/molecules26041111] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 12/23/2020] [Revised: 02/04/2021] [Accepted: 02/16/2021] [Indexed: 01/04/2023] [Open Access]
Abstract
Applied datasets can vary from a few hundred to thousands of samples in typical quantitative structure-activity/property (QSAR/QSPR) relationships and classification. However, the size of the datasets and the train/test split ratios can greatly affect the outcome of the models, and thus the classification performance itself. We compared several combinations of dataset sizes and split ratios with five different machine learning algorithms to find the differences or similarities and to select the best parameter settings in nonbinary (multiclass) classification. It is also known that the models are ranked differently according to the performance merit(s) used. Here, 25 performance parameters were calculated for each model, then factorial ANOVA was applied to compare the results. The results clearly show the differences not just between the applied machine learning algorithms but also between the dataset sizes and to a lesser extent the train/test split ratios. The XGBoost algorithm could outperform the others, even in multiclass modeling. The performance parameters reacted differently to the change of the sample set size; some of them were much more sensitive to this factor than the others. Moreover, significant differences could be detected between train/test split ratios as well, exerting a great effect on the test validation of our models.
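To make the notion of a train/test split ratio concrete for multiclass data, the following minimal sketch in plain Python splits sample indices so that each class keeps roughly the same proportion in both sets. Stratification is an assumption made here for illustration; the abstract does not say how the splits were drawn, and the function name is hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction, seed=0):
    """Split sample indices into train/test, preserving class proportions.

    labels: list of class labels, one per sample.
    test_fraction: share of each class assigned to the test set (e.g. 0.2).
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)
```

Calling the function with different `test_fraction` values on subsets of different sizes reproduces the kind of grid (dataset size by split ratio) that the study compares.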
Affiliation(s)
- Anita Rácz
  - Department of Plasma Chemistry, Institute of Materials and Environmental Chemistry, ELKH Research Centre for Natural Sciences, Magyar Tudósok krt. 2, H-1117 Budapest, Hungary
- Dávid Bajusz
  - Medicinal Chemistry Research Group, ELKH Research Centre for Natural Sciences, Magyar Tudósok krt. 2, H-1117 Budapest, Hungary
- Károly Héberger
  - Department of Plasma Chemistry, Institute of Materials and Environmental Chemistry, ELKH Research Centre for Natural Sciences, Magyar Tudósok krt. 2, H-1117 Budapest, Hungary

9
Feng H, Zhang L, Li S, Liu L, Yang T, Yang P, Zhao J, Arkin IT, Liu H. Predicting the reproductive toxicity of chemicals using ensemble learning methods and molecular fingerprints. Toxicol Lett 2021; 340:4-14. [PMID: 33421549 DOI: 10.1016/j.toxlet.2021.01.002] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 07/11/2020] [Revised: 10/29/2020] [Accepted: 01/03/2021] [Indexed: 12/20/2022]
Abstract
Reproductive toxicity endpoints are a significant safety concern in the assessment of the adverse effects of chemicals in drug discovery. Computational models that can accurately predict a chemical's toxic potential are increasingly pursued to replace traditional animal experiments. Thus, ensemble learning models were built to predict the reproductive toxicity of compounds. Our ensemble models were developed using support vector machine, random forest, and extreme gradient boosting methods and 9 molecular fingerprints calculated for a dataset containing 1823 chemicals. The best prediction performance was achieved by the Ensemble-Top12 model, with an accuracy (ACC) of 86.33%, a sensitivity (SEN) of 82.02%, a specificity (SPE) of 90.19%, and an area under the receiver operating characteristic curve (AUC) of 0.937 in 5-fold cross-validation, and ACC, SEN, SPE, and AUC values of 84.38%, 86.90%, 90.67%, and 0.920, respectively, in external validation. We also defined the applicability domain (AD) of the ensemble model by calculating the Tanimoto distance of the training set. Compared with models in the existing literature, our ensemble model achieves relatively high ACC, SPE, and AUC values. We also identified several fingerprint features related to chemical reproductive toxicity. Considering the performance of the model, we recommend using the Ensemble-Top12 model to predict reproductive toxicity in early drug development.
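One common way to turn Tanimoto distances into an applicability domain check is to ask whether a query compound has at least one sufficiently close neighbor in the training set. The sketch below is a minimal version of that idea in plain Python; the nearest-neighbor rule, the 0.7 cutoff, and the set-of-bits fingerprint representation are all assumptions for illustration, since the abstract does not give these details.

```python
def tanimoto_distance(a, b):
    """1 - Tanimoto similarity for fingerprints given as sets of on-bits."""
    union = len(a | b)
    similarity = len(a & b) / union if union else 0.0
    return 1.0 - similarity

def in_applicability_domain(query, training_set, cutoff=0.7):
    """True if the nearest training compound lies within `cutoff` of the query.

    The cutoff value is a hypothetical default, not one reported by the study.
    """
    nearest = min(tanimoto_distance(query, fp) for fp in training_set)
    return nearest <= cutoff
```

Predictions for compounds that fall outside the domain are usually reported with a warning rather than suppressed, so that users can judge their reliability.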
Affiliation(s)
- Huawei Feng
  - School of Life Science, Liaoning University, Shenyang, 110036, China
- Li Zhang
  - School of Life Science, Liaoning University, Shenyang, 110036, China
  - Technology Innovation Center for Computer Simulating and Information Processing of Bio-macromolecules of Shenyang, Shenyang, 110036, China
  - Engineering Laboratory for Molecular Simulation and Designing of Drug Molecules of Liaoning, Liaoning University, Shenyang, 110036, China
- Shimeng Li
  - School of Life Science, Liaoning University, Shenyang, 110036, China
- Lili Liu
  - School of Life Science, Liaoning University, Shenyang, 110036, China
- Tianzhou Yang
  - School of Life Science, Liaoning University, Shenyang, 110036, China
- Pengyu Yang
  - School of Information, Liaoning University, Shenyang, 110036, China
- Jian Zhao
  - School of Life Science, Liaoning University, Shenyang, 110036, China
- Isaiah Tuvia Arkin
  - Department of Biological Chemistry, The Hebrew University of Jerusalem, Edmond J. Safra Campus, Givat-Ram, Jerusalem, 91904, Israel
- Hongsheng Liu
  - Technology Innovation Center for Computer Simulating and Information Processing of Bio-macromolecules of Shenyang, Shenyang, 110036, China
  - Engineering Laboratory for Molecular Simulation and Designing of Drug Molecules of Liaoning, Liaoning University, Shenyang, 110036, China
  - School of Pharmaceutical Science, Liaoning University, Shenyang, 110036, China

10
Zhang H, Mao J, Qi HZ, Ding L. In silico prediction of drug-induced developmental toxicity by using machine learning approaches. Mol Divers 2019; 24:1281-1290. [PMID: 31486961 DOI: 10.1007/s11030-019-09991-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 06/20/2019] [Accepted: 08/28/2019] [Indexed: 02/05/2023]
Abstract
Some drugs and xenobiotics have the potential to disturb homeostasis, normal growth, differentiation, development or behavior during prenatal development or postnatally until puberty. Assessment of developmental toxicity is one of the important safety considerations incorporated by international regulatory agencies. In this investigation, seven machine learning methods, including naïve Bayes, support vector machine, recursive partitioning, k-nearest neighbor, C4.5 decision tree, random forest and AdaBoost, were used to build binary classification models for developmental toxicity. Among these models, the naïve Bayes classifier showed the best predictive performance and stability, giving 91.11% overall prediction accuracy, 91.50% balanced accuracy and 0.818 MCC for the training set, and 83.93% concordance, 81.85% balanced accuracy and 0.627 MCC for the test set. The application domains were analyzed, and only one chemical in the test set was identified as outside the application domain. In addition, 10 important molecular descriptors related to developmental toxicity were selected by the genetic algorithm, which may help explain the mechanisms of developmental toxicants. The best naïve Bayes classification model could be employed as an alternative method for qualitative prediction of chemical-induced developmental toxicity in the early stages of drug development.
Affiliation(s)
- Hui Zhang
  - College of Life Science, Northwest Normal University, Lanzhou, 730070, Gansu, People's Republic of China
  - State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, West China Medical School, Sichuan University, Chengdu, 610041, Sichuan, People's Republic of China
- Jun Mao
  - College of Life Science, Northwest Normal University, Lanzhou, 730070, Gansu, People's Republic of China
- Hua-Zhao Qi
  - College of Life Science, Northwest Normal University, Lanzhou, 730070, Gansu, People's Republic of China
- Lan Ding
  - College of Life Science, Northwest Normal University, Lanzhou, 730070, Gansu, People's Republic of China