1. Lovrić M, Wang T, Staffe MR, Šunić I, Časni K, Lasky-Su J, Chawes B, Rasmussen MA. A Chemical Structure and Machine Learning Approach to Assess the Potential Bioactivity of Endogenous Metabolites and Their Association with Early Childhood Systemic Inflammation. Metabolites 2024; 14:278. [PMID: 38786755 PMCID: PMC11122766 DOI: 10.3390/metabo14050278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2024] [Revised: 04/29/2024] [Accepted: 05/08/2024] [Indexed: 05/25/2024] [Open Access]
Abstract
Metabolomics has gained much attention due to its potential to reveal molecular disease mechanisms and present viable biomarkers. This work uses a panel of untargeted serum metabolomes from 602 children from the COPSAC2010 mother-child cohort. The annotated part of the metabolome consists of 517 chemical compounds curated using automated procedures. We created a filtering method for the quantified metabolites using predicted quantitative structure-bioactivity relationships for the Tox21 database on nuclear receptors and stress response in cell lines. The metabolites measured in the children's serums are predicted to affect specific targeted models, known for their significance in inflammation, immune function, and health outcomes. The targets from Tox21 have been used as targets with quantitative structure-activity relationships (QSARs). They were trained for ~7000 structures, saved as models, and then applied to the annotated metabolites to predict their potential bioactivities. The models were selected based on strict accuracy criteria surpassing random effects. After application, 52 metabolites showed potential bioactivity based on structural similarity with known active compounds from the Tox21 set. The filtered compounds were subsequently used and weighted by their bioactive potential to show an association with early childhood hs-CRP levels at six months in a linear model supporting a physiological adverse effect on systemic low-grade inflammation.
Affiliation(s)
- Mario Lovrić
  - COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, 2820 Gentofte, Denmark
  - Centre for Applied Bioanthropology, Institute for Anthropological Research, 10000 Zagreb, Croatia
  - The Lisbon Council, 1040 Brussels, Belgium
- Tingting Wang
  - COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, 2820 Gentofte, Denmark
- Mads Rønnow Staffe
  - Department of Food Science, University of Copenhagen, 1958 Frederiksberg, Denmark
- Iva Šunić
  - Centre for Applied Bioanthropology, Institute for Anthropological Research, 10000 Zagreb, Croatia
- Jessica Lasky-Su
  - Department of Medicine, Boston, MA 02115, USA
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
- Bo Chawes
  - COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, 2820 Gentofte, Denmark
  - Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, 2300 Copenhagen, Denmark
- Morten Arendt Rasmussen
  - COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, 2820 Gentofte, Denmark
  - Department of Food Science, University of Copenhagen, 1958 Frederiksberg, Denmark
2. Nikolov NG, Nissen ACVE, Wedebye EB. A method for in vitro data and structure curation to optimize for QSAR modelling of minimum absolute potency levels and a comparative use case. Environmental Toxicology and Pharmacology 2023; 98:104069. [PMID: 36702390 DOI: 10.1016/j.etap.2023.104069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Revised: 01/16/2023] [Accepted: 01/18/2023] [Indexed: 06/18/2023]
Abstract
Large screening programs such as the US Tox21 are releasing experimental in vitro results for many endpoints of relevance for human health. In (Q)SAR modelling, it is essential to clearly define the endpoint (OECD QSAR Validation Principle 1) and extract the most robust data points according to the definition. We have developed a comprehensive data curation procedure to interpret in vitro experimental data sets for (Q)SAR development, with modules for selecting actives according to quality of curve fittings, magnitude of activity and 'absolute' potency cut-offs, requiring non-cytotoxicity at activity concentration; extracting only very robust inactives; selecting only substances tested in high purity; and accounting for assay signal interference. A structure curation procedure with uniform representation of tautomeric classes of substances is also developed. The detailed method and a use case of modelling Tox21 data for an estrogen receptor α agonism assay with and without use of the method are presented.
Affiliation(s)
- Nikolai G Nikolov
  - National Food Institute, Technical University of Denmark, Kemitorvet 2, 2800 Kgs. Lyngby, Denmark
- Ana C V E Nissen
  - National Food Institute, Technical University of Denmark, Kemitorvet 2, 2800 Kgs. Lyngby, Denmark
- Eva B Wedebye
  - National Food Institute, Technical University of Denmark, Kemitorvet 2, 2800 Kgs. Lyngby, Denmark
3. Huang R. A Quantitative High-Throughput Screening Data Analysis Pipeline for Activity Profiling. Methods in Molecular Biology (Clifton, N.J.) 2022; 2474:133-145. [PMID: 35294762 DOI: 10.1007/978-1-0716-2213-1_13] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 10/18/2022]
Abstract
The U.S. Tox21 program has developed in vitro assays to test large collections of environmental chemicals in a quantitative high-throughput screening (qHTS) format, using triplicate 15-dose titrations to generate over 100 million data points to date. Counterscreens are also employed to minimize interferences from non-target-specific assay artifacts, such as compound autofluorescence and cytotoxicity. New data analysis approaches are needed to integrate these data and characterize the activities observed from these assays. Here, we describe a complete analysis pipeline that evaluates these qHTS data for technical quality in terms of signal reproducibility. We integrate signals from repeated assay runs, primary readouts and counterscreens to produce a final call on on-target compound activity.
Affiliation(s)
- Ruili Huang
  - National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD, USA
4. Dusza HM, Manz KE, Pennell KD, Kanda R, Legler J. Identification of known and novel nonpolar endocrine disruptors in human amniotic fluid. Environment International 2022; 158:106904. [PMID: 34607043 DOI: 10.1016/j.envint.2021.106904] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 06/25/2021] [Revised: 09/22/2021] [Accepted: 09/23/2021] [Indexed: 05/25/2023]
Abstract
BACKGROUND: Prenatal exposure to endocrine-disrupting compounds (EDCs) may contribute to endocrine-related diseases and disorders later in life. Nevertheless, data on in utero exposure to these compounds are still scarce.
OBJECTIVES: We investigated a wide range of known and novel nonpolar EDCs in full-term human amniotic fluid (AF), a representative matrix of direct fetal exposure.
METHODS: Gas chromatography high-resolution mass spectrometry (GC-HRMS) was used for the targeted and non-targeted analysis of chemicals present in nonpolar AF fractions with dioxin-like, (anti-)androgenic, and (anti-)estrogenic activity. The contribution of detected EDCs to the observed activity was determined based on their relative potencies. The multitude of features detected by non-targeted analysis was tentatively identified through spectra matching and data filtering, and further investigated using curated and freely available sources to predict endocrine activity. Prioritized suspects were purchased and their presence in AF was chemically and biologically confirmed with GC-HRMS and bioassay analysis.
RESULTS: Targeted analysis revealed 42 known EDCs in AF including dioxins and furans, polybrominated diphenyl ethers, pesticides, polychlorinated biphenyls, and polycyclic aromatic hydrocarbons. Only 30% of dioxin activity and <1% estrogenic and (anti-)androgenic activity was explained by the detected compounds. Non-targeted analysis revealed 14,110 features of which 3,243 matched with library spectra. Our data filtering strategy tentatively identified 121 compounds. Further data mining and in silico predictions revealed in total 69 suspected EDCs. We selected 14 chemicals for confirmation, of which 12 were biologically active and 9 were chemically confirmed in AF, including the plasticizer diphenyl isophthalate and industrial chemical p,p'-ditolylamine.
CONCLUSIONS: This study reveals the presence of a wide variety of nonpolar EDCs in direct fetal environment and for the first time identifies novel EDCs in human AF. Further assessment of the source and extent of human fetal exposure to these compounds is warranted.
Affiliation(s)
- Hanna M Dusza
  - Division of Toxicology, Institute for Risk Assessment Sciences, Department of Population Health Sciences, Faculty of Veterinary Medicine, Utrecht University, 3584 CM Utrecht, the Netherlands
- Katherine E Manz
  - School of Engineering, Brown University, Providence, RI 02912, United States
- Kurt D Pennell
  - School of Engineering, Brown University, Providence, RI 02912, United States
- Rakesh Kanda
  - Institute of Environment, Health and Societies, Brunel University London, Uxbridge, UB8 3PH, Middlesex, United Kingdom
- Juliette Legler
  - Division of Toxicology, Institute for Risk Assessment Sciences, Department of Population Health Sciences, Faculty of Veterinary Medicine, Utrecht University, 3584 CM Utrecht, the Netherlands
5. Lovrić M, Đuričić T, Tran HTN, Hussain H, Lacić E, Rasmussen MA, Kern R. Should We Embed in Chemistry? A Comparison of Unsupervised Transfer Learning with PCA, UMAP, and VAE on Molecular Fingerprints. Pharmaceuticals (Basel) 2021; 14:758. [PMID: 34451855 PMCID: PMC8400160 DOI: 10.3390/ph14080758] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/30/2021] [Revised: 07/21/2021] [Accepted: 07/22/2021] [Indexed: 02/07/2023] [Open Access]
Abstract
Methods for dimensionality reduction are showing significant contributions to knowledge generation in high-dimensional modeling scenarios throughout many disciplines. By achieving a lower dimensional representation (also called embedding), fewer computing resources are needed in downstream machine learning tasks, thus leading to a faster training time, lower complexity, and statistical flexibility. In this work, we investigate the utility of three prominent unsupervised embedding techniques (principal component analysis-PCA, uniform manifold approximation and projection-UMAP, and variational autoencoders-VAEs) for solving classification tasks in the domain of toxicology. To this end, we compare these embedding techniques against a set of molecular fingerprint-based models that do not utilize additional preprocessing of features. Inspired by the success of transfer learning in several fields, we further study the performance of embedders when trained on an external dataset of chemical compounds. To gain a better understanding of their characteristics, we evaluate the embedders with different embedding dimensionalities, and with different sizes of the external dataset. Our findings show that the recently popularized UMAP approach can be utilized alongside known techniques such as PCA and VAE as a pre-compression technique in the toxicology domain. Nevertheless, the generative model of VAE shows an advantage in pre-compressing the data with respect to classification accuracy.
Affiliation(s)
- Mario Lovrić
  - Know-Center, Inffeldgasse 13, 8010 Graz, Austria
  - Centre for Applied Bioanthropology, Institute for Anthropological Research, 10000 Zagreb, Croatia
- Tomislav Đuričić
  - Know-Center, Inffeldgasse 13, 8010 Graz, Austria
  - Institute of Interactive Systems and Data Science, Graz University of Technology, Inffeldgasse 16C, 8010 Graz, Austria
- Han T. N. Tran
  - Know-Center, Inffeldgasse 13, 8010 Graz, Austria
- Hussain Hussain
  - Know-Center, Inffeldgasse 13, 8010 Graz, Austria
  - Institute of Interactive Systems and Data Science, Graz University of Technology, Inffeldgasse 16C, 8010 Graz, Austria
- Emanuel Lacić
  - Know-Center, Inffeldgasse 13, 8010 Graz, Austria
- Morten A. Rasmussen
  - Copenhagen Studies on Asthma in Childhood, Herlev-Gentofte Hospital, University of Copenhagen, Ledreborg Alle 34, 2820 Gentofte, Denmark
  - Department of Food Science, University of Copenhagen, Rolighedsvej 26, 1958 Frederiksberg, Denmark
- Roman Kern
  - Know-Center, Inffeldgasse 13, 8010 Graz, Austria
  - Institute of Interactive Systems and Data Science, Graz University of Technology, Inffeldgasse 16C, 8010 Graz, Austria
6. Morger A, Svensson F, Arvidsson McShane S, Gauraha N, Norinder U, Spjuth O, Volkamer A. Assessing the calibration in toxicological in vitro models with conformal prediction. J Cheminform 2021; 13:35. [PMID: 33926567 PMCID: PMC8082859 DOI: 10.1186/s13321-021-00511-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/07/2021] [Accepted: 04/10/2021] [Indexed: 11/11/2022] [Open Access]
Abstract
Machine learning methods are widely used in drug discovery and toxicity prediction. While showing overall good performance in cross-validation studies, their predictive power (often) drops in cases where the query samples have drifted from the training data’s descriptor space. Thus, the assumption for applying machine learning algorithms, that training and test data stem from the same distribution, might not always be fulfilled. In this work, conformal prediction is used to assess the calibration of the models. Deviations from the expected error may indicate that training and test data originate from different distributions. Exemplified on the Tox21 datasets, composed of chronologically released Tox21Train, Tox21Test and Tox21Score subsets, we observed that while internally valid models could be trained using cross-validation on Tox21Train, predictions on the external Tox21Score data resulted in higher error rates than expected. To improve the prediction on the external sets, a strategy exchanging the calibration set with more recent data, such as Tox21Test, has successfully been introduced. We conclude that conformal prediction can be used to diagnose data drifts and other issues related to model calibration. The proposed improvement strategy—exchanging the calibration data only—is convenient as it does not require retraining of the underlying model.
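The calibration check described in this abstract can be illustrated with a minimal sketch of an inductive, class-conditional (Mondrian) conformal classifier. Everything here is an assumption made for illustration: the function names are hypothetical, the nonconformity measure (one minus the predicted probability of the class) is a common choice rather than necessarily the one the authors used, and the probabilities are toy values standing in for the output of an already trained model.

```python
from typing import Dict, List, Sequence, Set


def calibration_scores(probs: Sequence[Sequence[float]],
                       labels: Sequence[int]) -> Dict[int, List[float]]:
    """Group nonconformity scores (1 - probability of the true class) by class."""
    scores: Dict[int, List[float]] = {}
    for p, y in zip(probs, labels):
        scores.setdefault(y, []).append(1.0 - p[y])
    return scores


def p_values(prob: Sequence[float],
             cal_scores: Dict[int, List[float]]) -> Dict[int, float]:
    """Class-conditional conformal p-value for every candidate label."""
    out: Dict[int, float] = {}
    for label, scores in cal_scores.items():
        alpha = 1.0 - prob[label]  # nonconformity of the query for this label
        at_least = sum(1 for s in scores if s >= alpha)
        out[label] = (at_least + 1) / (len(scores) + 1)
    return out


def prediction_set(prob: Sequence[float],
                   cal_scores: Dict[int, List[float]],
                   significance: float) -> Set[int]:
    """Labels whose p-value exceeds the chosen significance level."""
    return {lab for lab, p in p_values(prob, cal_scores).items() if p > significance}


def error_rate(probs: Sequence[Sequence[float]],
               labels: Sequence[int],
               cal_scores: Dict[int, List[float]],
               significance: float) -> float:
    """Fraction of samples whose true label is missing from the prediction set."""
    misses = sum(1 for p, y in zip(probs, labels)
                 if y not in prediction_set(p, cal_scores, significance))
    return misses / len(labels)


# Toy calibration data: (P(class 0), P(class 1)) from a hypothetical trained model.
cal_probs = [(0.9, 0.1), (0.8, 0.2), (0.6, 0.4), (0.3, 0.7), (0.2, 0.8), (0.4, 0.6)]
cal_labels = [0, 0, 0, 1, 1, 1]
cal = calibration_scores(cal_probs, cal_labels)

# Toy external data: both samples are truly class 0.
ext_probs = [(0.7, 0.3), (0.1, 0.9)]
ext_labels = [0, 0]
observed = error_rate(ext_probs, ext_labels, cal, significance=0.3)
```

For a well-calibrated model the observed error rate should stay close to the significance level; a much larger value, as in this toy case, is the kind of deviation the abstract interprets as a sign of data drift. Under the same assumptions, exchanging the calibration set amounts to calling `calibration_scores` on newer data and reusing the unchanged model outputs.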
Affiliation(s)
- Andrea Morger
  - In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin, Berlin, Germany
- Fredrik Svensson
  - Alzheimer's Research UK UCL Drug Discovery Institute, London, WC1E 6BT, UK
- Staffan Arvidsson McShane
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, 751 24, Uppsala, Sweden
- Niharika Gauraha
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, 751 24, Uppsala, Sweden
  - Division of Computational Science and Technology, KTH, 100 44, Stockholm, Sweden
- Ulf Norinder
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, 751 24, Uppsala, Sweden
  - Dept. Computer and Systems Sciences, Stockholm University, Box 7003, 164 07, Kista, Sweden
  - MTM Research Centre, School of Science and Technology, Örebro University, 70 182, Örebro, Sweden
- Ola Spjuth
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, 751 24, Uppsala, Sweden
- Andrea Volkamer
  - In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin, Berlin, Germany
7. Antelo-Collado A, Carrasco-Velar R, García-Pedrajas N, Cerruela-García G. Effective Feature Selection Method for Class-Imbalance Datasets Applied to Chemical Toxicity Prediction. J Chem Inf Model 2020; 61:76-94. [PMID: 33350301 DOI: 10.1021/acs.jcim.0c00908] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Abstract
During the drug development process, it is common to carry out toxicity tests and adverse effect studies, which are essential to guarantee patient safety and the success of the research. The use of in silico quantitative structure-activity relationship (QSAR) approaches for this task involves processing a huge amount of data that, in many cases, have an imbalanced distribution of active and inactive samples. This is usually termed the class-imbalance problem and may have a significant negative effect on the performance of the learned models. The performance of feature selection (FS) for QSAR models is usually damaged by the class-imbalance nature of the involved datasets. This paper proposes the use of an FS method focused on dealing with the class-imbalance problems. The method is based on the use of FS ensembles constructed by boosting and using two well-known FS methods, fast clustering-based FS and the fast correlation-based filter. The experimental results demonstrate the efficiency of the proposal in terms of the classification performance compared to standard methods. The proposal can be extended to other FS methods and applied to other problems in cheminformatics.
Affiliation(s)
- Nicolás García-Pedrajas
  - Department of Computing and Numerical Analysis, University of Córdoba, Campus de Rabanales, Albert Einstein Building, E-14071 Córdoba, Spain
- Gonzalo Cerruela-García
  - Department of Computing and Numerical Analysis, University of Córdoba, Campus de Rabanales, Albert Einstein Building, E-14071 Córdoba, Spain
8. Goya-Jorge E, Giner RM, Sylla-Iyarreta Veitía M, Gozalbes R, Barigye SJ. Predictive modeling of aryl hydrocarbon receptor (AhR) agonism. Chemosphere 2020; 256:127068. [PMID: 32447110 DOI: 10.1016/j.chemosphere.2020.127068] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/10/2020] [Revised: 05/09/2020] [Accepted: 05/12/2020] [Indexed: 06/11/2023]
Abstract
The aryl hydrocarbon receptor (AhR) plays a key role in the regulation of gene expression in metabolic machinery and detoxification systems. In recent years, this receptor has attracted interest as a therapeutic target for immunological, oncogenic and inflammatory conditions. In the present report, in silico and in vitro approaches were combined to study the activation of the AhR. To this end, a large database of chemical compounds with known AhR agonistic activity was employed to build 5 classifiers based on the Adaboost (AdB), Gradient Boosting (GB), Random Forest (RF), Multilayer Perceptron (MLP) and Support Vector Machine (SVM) algorithms, respectively. The built classifiers were examined, following a 10-fold external validation procedure, demonstrating adequate robustness and predictivity. These models were integrated into a majority-vote-based ensemble, subsequently used to screen an in-house library of compounds from which 40 compounds were selected for prospective in vitro experimental validation. The general correspondence between the ensemble predictions and the in vitro results suggests that the constructed ensemble may be useful in predicting the AhR agonistic activity, both in a toxicological and pharmacological context. A preliminary structure-activity analysis of the evaluated compounds revealed that all structures bearing a benzothiazole moiety induced AhR expression while diverse activity profiles were exhibited by phenolic derivatives.
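The majority-vote step mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: `majority_vote` and `toy_classifiers` are hypothetical names, and the five threshold rules are toy stand-ins for the trained AdB, GB, RF, MLP and SVM models, which the abstract does not describe in detail.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A classifier is modelled as any callable mapping a descriptor vector
# to a binary label (1 = predicted agonist, 0 = predicted inactive).
Classifier = Callable[[Sequence[float]], int]


def majority_vote(classifiers: Sequence[Classifier], x: Sequence[float]) -> int:
    """Return the label predicted by most classifiers.

    An odd number of binary classifiers is required so that ties cannot occur.
    """
    if len(classifiers) % 2 == 0:
        raise ValueError("use an odd number of binary classifiers to avoid ties")
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins: each rule thresholds the descriptor sum differently.
toy_classifiers: List[Classifier] = [
    (lambda x, t=t: int(sum(x) > t)) for t in (0.5, 1.0, 1.5, 2.0, 2.5)
]

active_call = majority_vote(toy_classifiers, [1.0, 0.8])    # three of five vote 1
inactive_call = majority_vote(toy_classifiers, [0.6, 0.6])  # three of five vote 0
```

With five members, a compound is called active only when at least three models agree, which is one plausible reading of a majority-vote ensemble of this size.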
Affiliation(s)
- Elizabeth Goya-Jorge
  - ProtoQSAR SL. CEEI (Centro Europeo de Empresas Innovadoras) Parque Tecnológico de Valencia, Av. Benjamin Franklin 12, 46980, Paterna, Valencia, Spain
  - Departament de Farmacologia, Facultat de Farmàcia, Universitat de València, Av. Vicente Andrés Estellés s/n, 46100, Burjassot, Valencia, Spain
- Rosa M Giner
  - Departament de Farmacologia, Facultat de Farmàcia, Universitat de València, Av. Vicente Andrés Estellés s/n, 46100, Burjassot, Valencia, Spain
- Maité Sylla-Iyarreta Veitía
  - Equipe de Chimie Moléculaire du Laboratoire Génomique, Bioinformatique et Chimie Moléculaire (EA 7528), Conservatoire National des Arts et Métiers (Cnam), 2 Rue Conté, HESAM Université, 75003, Paris, France
- Rafael Gozalbes
  - ProtoQSAR SL. CEEI (Centro Europeo de Empresas Innovadoras) Parque Tecnológico de Valencia, Av. Benjamin Franklin 12, 46980, Paterna, Valencia, Spain
- Stephen J Barigye
  - ProtoQSAR SL. CEEI (Centro Europeo de Empresas Innovadoras) Parque Tecnológico de Valencia, Av. Benjamin Franklin 12, 46980, Paterna, Valencia, Spain
9. Matsuzaka Y, Hosaka T, Ogaito A, Yoshinari K, Uesawa Y. Prediction Model of Aryl Hydrocarbon Receptor Activation by a Novel QSAR Approach, DeepSnap-Deep Learning. Molecules 2020; 25:1317. [PMID: 32183141 PMCID: PMC7144728 DOI: 10.3390/molecules25061317] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/17/2020] [Revised: 03/05/2020] [Accepted: 03/09/2020] [Indexed: 12/31/2022] [Open Access]
Abstract
The aryl hydrocarbon receptor (AhR) is a ligand-dependent transcription factor that senses environmental exogenous and endogenous ligands or xenobiotic chemicals. In particular, exposure of the liver to environmental metabolism-disrupting chemicals contributes to the development and propagation of steatosis and hepatotoxicity. However, the mechanisms for AhR-induced hepatotoxicity and tumor propagation in the liver remain to be revealed, due to the wide variety of AhR ligands. Recently, quantitative structure–activity relationship (QSAR) analysis using deep neural network (DNN) has shown superior performance for the prediction of chemical compounds. Therefore, this study proposes a novel QSAR analysis using deep learning (DL), called the DeepSnap–DL method, to construct prediction models of chemical activation of AhR. Compared with conventional machine learning (ML) techniques, such as the random forest, XGBoost, LightGBM, and CatBoost, the proposed method achieves high-performance prediction of AhR activation. Thus, the DeepSnap–DL method may be considered a useful tool for achieving high-throughput in silico evaluation of AhR-induced hepatotoxicity.
Affiliation(s)
- Yasunari Matsuzaka
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, 204-8588 Tokyo, Japan
- Takuomi Hosaka
  - Laboratory of Molecular Toxicology, School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka 422-8529, Japan
- Anna Ogaito
  - Laboratory of Molecular Toxicology, School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka 422-8529, Japan
- Kouichi Yoshinari
  - Laboratory of Molecular Toxicology, School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka 422-8529, Japan
- Yoshihiro Uesawa
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, 204-8588 Tokyo, Japan
10. Cravero F, Schustik SA, Martínez MJ, Vázquez GE, Díaz MF, Ponzoni I. Feature Selection for Polymer Informatics: Evaluating Scalability and Robustness of the FS4RVDD Algorithm Using Synthetic Polydisperse Data Sets. J Chem Inf Model 2020; 60:592-603. [PMID: 31790226 DOI: 10.1021/acs.jcim.9b00867] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 02/08/2023]
Abstract
The feature selection (FS) process is a key step in the Quantitative Structure-Property Relationship (QSPR) modeling of physicochemical properties in cheminformatics. In particular, the inference of QSPR models for polymeric material properties constitutes a complex problem because of the uncertainty introduced by the polydispersity of these materials. The main challenge is how to capture the polydispersity information from the molecular weight distribution (MWD) curve to achieve a more effective computational representation of polymeric materials. To date, most of the existing QSPR techniques use only a single molecule to represent each of these materials, but polydispersity is not considered. Consequently, QSPR models obtained by these approaches are oversimplified. For this reason, we introduced in a previous work a new FS algorithm called Feature Selection for Random Variables with Discrete Distribution (FS4RVDD), which allows dealing with polydisperse data. In the present paper, we evaluate both the scalability and the robustness of the FS4RVDD algorithm. In this sense, we generated synthetic data by varying and combining different parameters: the size of the database, the cardinality of the selected feature subsets, the presence of noise in the data, and the type of correlation (linear and nonlinear). Moreover, the performances obtained by FS4RVDD were contrasted with traditional FS techniques applied to different simplified representations of polymeric materials. The obtained results show that the FS4RVDD algorithm outperformed the traditional FS methods in all proposed scenarios, which suggests the need for an algorithm such as FS4RVDD to deal with the uncertainty that polydispersity introduces in human-made polymers.
Affiliation(s)
- Fiorella Cravero
  - Planta Piloto de Ingeniería Química, Universidad Nacional del Sur - CONICET, Camino La Carrindanga 7000, CP 8000 Bahía Blanca, Argentina
- Santiago A Schustik
  - Planta Piloto de Ingeniería Química, Universidad Nacional del Sur - CONICET, Camino La Carrindanga 7000, CP 8000 Bahía Blanca, Argentina
  - Comisión de Investigaciones Científicas de la Provincia de Buenos Aires (CIC), CP 1900 La Plata, Argentina
- M Jimena Martínez
  - Instituto de Ciencias e Ingeniería de la Computación (UNS-CONICET), San Andrés 800, Campus de Palihue, CP 8000 Bahía Blanca, Argentina
- Gustavo E Vázquez
  - Facultad de Ingeniería y Tecnologías, Universidad Católica del Uruguay, Av. 8 de Octubre 2788, CP 11600 Montevideo, Uruguay
- Mónica F Díaz
  - Planta Piloto de Ingeniería Química, Universidad Nacional del Sur - CONICET, Camino La Carrindanga 7000, CP 8000 Bahía Blanca, Argentina
  - Departamento de Ingeniería Química (DIQ-UNS), CP 8000 Bahía Blanca, Argentina
- Ignacio Ponzoni
  - Instituto de Ciencias e Ingeniería de la Computación (UNS-CONICET), San Andrés 800, Campus de Palihue, CP 8000 Bahía Blanca, Argentina
  - Departamento de Ciencias e Ingeniería de la Computación (DCIC-UNS), CP 8000 Bahía Blanca, Argentina
11. Shin HK, Kang MG, Park D, Park T, Yoon S. Development of Prediction Models for Drug-Induced Cholestasis, Cirrhosis, Hepatitis, and Steatosis Based on Drug and Drug Metabolite Structures. Front Pharmacol 2020; 11:67. [PMID: 32116729 PMCID: PMC7034408 DOI: 10.3389/fphar.2020.00067] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/08/2019] [Accepted: 01/23/2020] [Indexed: 12/18/2022] [Open Access]
Abstract
Drug-induced liver injury (DILI) is one of the major reasons for termination of drug development. Due to the importance of predicting DILI in early phases of drug development, diverse in silico models have been developed to filter out DILI-causing candidates before clinical study. However, no computational models have achieved sufficient prediction power for screening DILI in early phases because 1) drugs often cause liver injury through reactive metabolites, 2) different clinical outcomes of DILI have different mechanisms, and 3) the DILI label on drugs is not clearly defined. In this study, we developed binary classification models to predict drug-induced cholestasis, cirrhosis, hepatitis, and steatosis based on the structure of drugs and their metabolites. DILI-positive data was obtained from post-market reports of drugs and DILI-negative data from DILIrank, a database curated by the Food and Drug Administration (FDA). Support vector machine (SVM) and random forest (RF) were used in developing models with nine fingerprints and one 2D molecular descriptor calculated from drug (152 DILI-positives and 102 DILI-negatives) and drug metabolite (192 DILI-positives and 126 DILI-negatives) structures. Models were developed according to Organisation for Economic Co-operation and Development (OECD) guidelines for quantitative structure-activity relationship (QSAR) validation. Internal and external validation was performed with a randomization test in order to thoroughly examine model predictability and avoid random correlation between structural features and adverse outcomes. The applicability domain was defined with a leverage method for reliable prediction of new chemicals. The best models for each liver disease were selected based on external validation results from drugs (cholestasis: 70%, cirrhosis: 90%, hepatitis: 83%, and steatosis: 85%) and drug metabolites (cholestasis: 86%, cirrhosis: 88%, hepatitis: 86%, and steatosis: 83%) with applicability domain analysis. Compiled data sets were further exploited to derive privileged substructures that were more frequent in DILI-positive sets compared to DILI-negative sets and in drug metabolite structures compared to drug structures with a Morgan fingerprint level 2.
Affiliation(s)
- Hyun Kil Shin
  - Toxicoinformatics Group, Department of Predictive Toxicology, Korea Institute of Toxicology, Daejeon, South Korea
- Myung-Gyun Kang
  - Toxicoinformatics Group, Department of Predictive Toxicology, Korea Institute of Toxicology, Daejeon, South Korea
- Daeui Park
  - Toxicoinformatics Group, Department of Predictive Toxicology, Korea Institute of Toxicology, Daejeon, South Korea
  - Department of Human and Environmental Toxicology, University of Science and Technology, Daejeon, South Korea
- Tamina Park
  - Toxicoinformatics Group, Department of Predictive Toxicology, Korea Institute of Toxicology, Daejeon, South Korea
  - Department of Human and Environmental Toxicology, University of Science and Technology, Daejeon, South Korea
- Seokjoo Yoon
  - Toxicoinformatics Group, Department of Predictive Toxicology, Korea Institute of Toxicology, Daejeon, South Korea
  - Department of Human and Environmental Toxicology, University of Science and Technology, Daejeon, South Korea
12. Kato H. Computational prediction of cytochrome P450 inhibition and induction. Drug Metab Pharmacokinet 2019; 35:30-44. [PMID: 31902468 DOI: 10.1016/j.dmpk.2019.11.006] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 07/23/2019] [Revised: 10/27/2019] [Accepted: 11/17/2019] [Indexed: 12/14/2022]
Abstract
Cytochrome P450 (CYP) enzymes play an important role in the phase I metabolism of many xenobiotics. Most drug-drug interactions (DDIs) associated with CYP are caused by either CYP inhibition or induction. The early detection of potential DDIs is highly desirable in the pharmaceutical industry because DDIs can cause serious adverse events, which can lead to poor patient health and drug development failures. Recently, many computational studies predicting CYP inhibition and induction have been reported. The current computational modeling approaches for CYP metabolism are classified as ligand- and structure-based; various techniques, such as quantitative structure-activity relationships, machine learning, docking, and molecular dynamics simulation, are involved in both approaches. Recently, combining these two approaches has resulted in improvements in the prediction accuracy of DDIs. In this review, we present important, recent developments in the computational prediction of the inhibition of four clinically crucial CYP isoforms (CYP1A2, 2C9, 2D6, and 3A4) and three nuclear receptors (aryl hydrocarbon receptor, constitutive androstane receptor, and pregnane X receptor) involved in the induction of CYP1A2, 2B6, and 3A4, respectively.
Affiliation(s)
- Harutoshi Kato
  - DMPK Research Laboratories, Mitsubishi Tanabe Pharma Corporation, Aoba-ku, Yokohama-shi, 227-0033, Japan