1
Bandini E, Castellano Ontiveros R, Kajtazi A, Eghbali H, Lynen F. Physicochemical modelling of the retention mechanism of temperature-responsive polymeric columns for HPLC through machine learning algorithms. J Cheminform 2024; 16:72. [PMID: 38907264 PMCID: PMC11193285 DOI: 10.1186/s13321-024-00873-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 06/14/2024] [Indexed: 06/23/2024] Open
Abstract
Temperature-responsive liquid chromatography (TRLC) offers a promising alternative to reversed-phase liquid chromatography (RPLC) for environmentally friendly analytical techniques by utilizing pure water as a mobile phase, eliminating the need for harmful organic solvents. TRLC columns, packed with temperature-responsive polymers coupled to silica particles, exhibit a unique retention mechanism influenced by temperature-induced polymer hydration. An investigation of the physicochemical parameters driving separation at high and low temperatures is crucial for better column manufacturing and selectivity control. Assessment of predictability using a dataset of 139 molecules analyzed at different temperatures elucidated the molecular descriptors (MDs) relevant to retention mechanisms. Linear regression, support vector regression (SVR), and tree-based ensemble models were evaluated, with no standout performer. The precision, accuracy, and robustness of models were validated through metrics, such as r and mean absolute error (MAE), and statistical analysis. At 45 °C, logP predominantly influenced retention, akin to reversed-phase columns, while at 5 °C, complex interactions with lipophilic and negative MDs, along with specific functional groups, dictated retention. These findings provide deeper insights into TRLC mechanisms, facilitating method development and maximizing column potential.
Affiliation(s)
- Elena Bandini
  - Separation Science Group, Department of Organic and Macromolecular Chemistry, University of Ghent, Krijgslaan 281 S4bis, Ghent, 9000, Belgium
- Rodrigo Castellano Ontiveros
  - School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, 11428, Sweden
- Ardiana Kajtazi
  - Separation Science Group, Department of Organic and Macromolecular Chemistry, University of Ghent, Krijgslaan 281 S4bis, Ghent, 9000, Belgium
- Hamed Eghbali
  - Packaging and Specialty Plastics R&D, Dow Benelux B.V., Terneuzen, 4530 AA, the Netherlands
- Frédéric Lynen
  - Separation Science Group, Department of Organic and Macromolecular Chemistry, University of Ghent, Krijgslaan 281 S4bis, Ghent, 9000, Belgium
2
Chatterjee M, Roy K. Predictive binary mixture toxicity modeling of fluoroquinolones (FQs) and the projection of toxicity of hypothetical binary FQ mixtures: a combination of 2D-QSAR and machine-learning approaches. ENVIRONMENTAL SCIENCE. PROCESSES & IMPACTS 2024; 26:105-118. [PMID: 38073518 DOI: 10.1039/d3em00445g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/25/2024]
Abstract
Chemicals of all kinds are degraded under various environmental stresses, and the degradates coexist with the parent compounds as mixtures in the environment. Antibiotics are an additional concern because both the parent compounds and their degradation products are bioactive and are released into the environment together. Environmental risk assessment of antibiotics and their degradation products is therefore necessary. To this end, we made use of in silico new approach methodologies (NAMs) and machine-learning algorithms. In this study, we have developed a robust and predictive mixture quantitative structure-activity relationship (QSAR) model with promising quality and predictability (internal: MAE_Train = 0.085, Q²_LOO = 0.849; external: MAE_Test = 0.090, Q²_F1 = 0.859) for predicting the toxicity of mixtures of a class of antibiotics and their degradation products. To obtain the predictive model, toxicity data of 78 binary fluoroquinolone mixtures in E. coli (endpoint: log 1/IC50 in molar) have been utilized. We have used only 0D-2D descriptors to efficiently encode the structural features of mixture components without any additional complexities. The class of mixture descriptors has been optimized in this study by using three different mixing rules (linear combination of molecular contributions, the squared molecular contributions, and the norm of molecular contributions). Different machine-learning approaches, namely random forest (RF), AdaBoost, gradient boost (GB), extreme gradient boost (XGB), support vector machine (SVM), linear support vector machine (LSVM), and ridge regression (RR), have been employed here in addition to conventional partial least squares (PLS) regression to optimize the modeling approach. A rigorous validation protocol has been used for assessing the goodness-of-fit, robustness, and external predictability of the models. Finally, the toxicity of possible untested mixtures of different photodegradation products of fluoroquinolones has been predicted using the best model reported in this study.
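The three mixing rules named in the abstract above can be illustrated with a minimal sketch. The abstract does not state the formulas, so the definitions used here (mole-fraction-weighted contributions combined as a sum, a sum of squares, or a Euclidean norm) are an assumption based on how such rules are commonly written, and `mix_descriptor` with its parameters is a hypothetical helper, not the authors' code.

```python
import math

def mix_descriptor(d1, d2, x1, x2, rule="linear"):
    """Combine one descriptor value of two components into a mixture descriptor.

    d1, d2: descriptor values of components 1 and 2 (hypothetical inputs)
    x1, x2: mole fractions of the components (assumed weighting scheme)
    """
    c1, c2 = x1 * d1, x2 * d2  # weighted molecular contributions
    if rule == "linear":
        return c1 + c2  # linear combination of contributions
    if rule == "squared":
        return c1 ** 2 + c2 ** 2  # sum of squared contributions
    if rule == "norm":
        return math.sqrt(c1 ** 2 + c2 ** 2)  # Euclidean norm of contributions
    raise ValueError(f"unknown mixing rule: {rule}")

# Example: an equimolar binary mixture with descriptor values 1.0 and 3.0
for rule in ("linear", "squared", "norm"):
    print(rule, mix_descriptor(1.0, 3.0, 0.5, 0.5, rule))
```

In a mixture-QSAR workflow of this kind, each rule would be applied to every descriptor column, and the resulting descriptor sets compared by model quality.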
Affiliation(s)
- Mainak Chatterjee
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
- Kunal Roy
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
3
Shahini E, Chaulagain N, Shankar K, Tang T. Predicting Free Energies of Exfoliation and Solvation for Graphitic Carbon Nitrides Using Machine Learning. ACS APPLIED MATERIALS & INTERFACES 2023; 15:53786-53801. [PMID: 37938813 DOI: 10.1021/acsami.3c09347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2023]
Abstract
As a metal-free and visible-light-responsive photocatalyst, graphitic carbon nitride (g-C3N4) has emerged as a new research hotspot and has attracted broad attention in the field of solar energy conversion and thin-film transistors. Liquid-phase exfoliation (LPE) is the best-known method for the synthesis of 2D g-C3N4 nanosheets. In LPE, bulk g-C3N4 is exfoliated in a solvent via high-shear mixing or sonication in order to produce a stable suspension of individual nanosheets. Two parameters of importance in gauging the performance of a solvent in LPE are the free energy required to exfoliate a unit area of layered materials into individual sheets in the solvent (ΔGexf) and the solvation free energy per unit area of a nanosheet (ΔGsol). While approximations for the free energies exist, they are shown in our previous work to be inaccurate and incapable of capturing the experimentally observed efficacy of LPE. Molecular dynamics (MD) simulations can provide accurate free-energy calculations, but doing so for every single solvent is time- and resource-consuming. Herein, machine learning (ML) algorithms are used to predict ΔGexf and ΔGsol for g-C3N4. First, a database for ΔGexf and ΔGsol is created based on a series of MD simulations involving 49 different solvents with distinct chemical structures and properties. The data set also includes values of critical descriptors for the solvents, including density, surface tension, dielectric constant, etc. Different ML methods are compared, accompanied by descriptor selection, to develop the most accurate model for predicting ΔGexf and ΔGsol. The extra tree regressor is shown to be the best performer among the six ML methods studied. Experimental validation of the model is conducted by performing dispersibility tests in several solvents for which the free energies are predicted. Finally, the influence of the selected descriptors on the free energies is analyzed, and strategies for solvent selection in LPE are proposed.
Affiliation(s)
- Ehsan Shahini
  - Department of Mechanical Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada
- Narendra Chaulagain
  - Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada
- Karthik Shankar
  - Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada
- Tian Tang
  - Department of Mechanical Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada
4
Mullowney MW, Duncan KR, Elsayed SS, Garg N, van der Hooft JJJ, Martin NI, Meijer D, Terlouw BR, Biermann F, Blin K, Durairaj J, Gorostiola González M, Helfrich EJN, Huber F, Leopold-Messer S, Rajan K, de Rond T, van Santen JA, Sorokina M, Balunas MJ, Beniddir MA, van Bergeijk DA, Carroll LM, Clark CM, Clevert DA, Dejong CA, Du C, Ferrinho S, Grisoni F, Hofstetter A, Jespers W, Kalinina OV, Kautsar SA, Kim H, Leao TF, Masschelein J, Rees ER, Reher R, Reker D, Schwaller P, Segler M, Skinnider MA, Walker AS, Willighagen EL, Zdrazil B, Ziemert N, Goss RJM, Guyomard P, Volkamer A, Gerwick WH, Kim HU, Müller R, van Wezel GP, van Westen GJP, Hirsch AKH, Linington RG, Robinson SL, Medema MH. Artificial intelligence for natural product drug discovery. Nat Rev Drug Discov 2023; 22:895-916. [PMID: 37697042 DOI: 10.1038/s41573-023-00774-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Accepted: 07/20/2023] [Indexed: 09/13/2023]
Abstract
Developments in computational omics technologies have provided new means to access the hidden diversity of natural products, unearthing new potential for drug discovery. In parallel, artificial intelligence approaches such as machine learning have led to exciting developments in the computational drug design field, facilitating biological activity prediction and de novo drug design for molecular targets of interest. Here, we describe current and future synergies between these developments to effectively identify drug candidates from the plethora of molecules produced by nature. We also discuss how to address key challenges in realizing the potential of these synergies, such as the need for high-quality datasets to train deep learning algorithms and appropriate strategies for algorithm validation.
Affiliation(s)
- Katherine R Duncan
  - Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow, UK
- Somayah S Elsayed
  - Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Neha Garg
  - School of Chemistry and Biochemistry, Center for Microbial Dynamics and Infection, Georgia Institute of Technology, Atlanta, GA, USA
- Justin J J van der Hooft
  - Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
  - Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa
- Nathaniel I Martin
  - Biological Chemistry Group, Institute of Biology, Leiden University, Leiden, The Netherlands
- David Meijer
  - Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Barbara R Terlouw
  - Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Friederike Biermann
  - Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
  - Institute of Molecular Bio Science, Goethe-University Frankfurt, Frankfurt am Main, Germany
  - LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
- Kai Blin
  - The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
- Marina Gorostiola González
  - Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
  - ONCODE institute, Leiden, The Netherlands
- Eric J N Helfrich
  - Institute of Molecular Bio Science, Goethe-University Frankfurt, Frankfurt am Main, Germany
  - LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
- Florian Huber
  - Center for Digitalization and Digitality, Hochschule Düsseldorf, Düsseldorf, Germany
- Stefan Leopold-Messer
  - Institut für Mikrobiologie, Eidgenössische Technische Hochschule (ETH) Zürich, Zürich, Switzerland
- Kohulan Rajan
  - Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University Jena, Jena, Germany
- Tristan de Rond
  - School of Chemical Sciences, University of Auckland, Auckland, New Zealand
- Jeffrey A van Santen
  - Department of Chemistry, Simon Fraser University, Burnaby, British Columbia, Canada
- Maria Sorokina
  - Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller University, Jena, Germany
  - Pharmaceuticals R&D, Bayer AG, Berlin, Germany
- Marcy J Balunas
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA
  - Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- Mehdi A Beniddir
  - Équipe "Chimie des Substances Naturelles", Université Paris-Saclay, CNRS, BioCIS, Orsay, France
- Doris A van Bergeijk
  - Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Laura M Carroll
  - Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
- Chase M Clark
  - Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
- Chao Du
  - Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Francesca Grisoni
  - Institute for Complex Molecular Systems, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
  - Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Utrecht, The Netherlands
- Willem Jespers
  - Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- Olga V Kalinina
  - Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
  - Drug Bioinformatics, Medical Faculty, Saarland University, Homburg, Germany
  - Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- Hyunwoo Kim
  - College of Pharmacy and Integrated Research Institute for Drug Development, Dongguk University Seoul, Goyang-si, Republic of Korea
- Tiago F Leao
  - Center for Nuclear Energy in Agriculture, University of São Paulo, Piracicaba, Brazil
- Joleen Masschelein
  - Center for Microbiology, VIB-KU Leuven, Heverlee, Belgium
  - Department of Biology, KU Leuven, Heverlee, Belgium
- Evan R Rees
  - Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
- Raphael Reher
  - Institute of Pharmaceutical Biology and Biotechnology, University of Marburg, Marburg, Germany
  - Institute of Pharmacy, Martin-Luther-University Halle-Wittenberg, Halle (Saale), Germany
- Daniel Reker
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
  - Duke Microbiome Center, Duke University, Durham, NC, USA
- Philippe Schwaller
  - Laboratory of Artificial Chemical Intelligence, Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Michael A Skinnider
  - Adapsyn Bioscience, Hamilton, Ontario, Canada
  - Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Allison S Walker
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
  - Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
- Egon L Willighagen
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
- Barbara Zdrazil
  - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, UK
- Nadine Ziemert
  - Interfaculty Institute for Microbiology and Infection Medicine Tuebingen (IMIT), Institute for Bioinformatics and Medical Informatics (IBMI), University of Tuebingen, Tuebingen, Germany
- Pierre Guyomard
  - Bonsai team, CRIStAL - Centre de Recherche en Informatique Signal et Automatique de Lille, Université de Lille, Villeneuve d'Ascq Cedex, France
- Andrea Volkamer
  - Center for Bioinformatics, Saarland University, Saarbrücken, Germany
  - In silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité - Universitätsmedizin Berlin, Berlin, Germany
- William H Gerwick
  - Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
- Hyun Uk Kim
  - Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
- Rolf Müller
  - Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
  - Department of Pharmacy, Saarland University, Saarbrücken, Germany
  - German Center for infection research (DZIF), Braunschweig, Germany
  - Helmholtz International Lab for Anti-Infectives, Saarbrücken, Germany
- Gilles P van Wezel
  - Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
  - Netherlands Institute of Ecology, NIOO-KNAW, Wageningen, The Netherlands
- Gerard J P van Westen
  - Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- Anna K H Hirsch
  - Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
  - Department of Pharmacy, Saarland University, Saarbrücken, Germany
  - German Center for infection research (DZIF), Braunschweig, Germany
  - Helmholtz International Lab for Anti-Infectives, Saarbrücken, Germany
- Roger G Linington
  - Department of Chemistry, Simon Fraser University, Burnaby, British Columbia, Canada
- Serina L Robinson
  - Department of Environmental Microbiology, Eawag: Swiss Federal Institute for Aquatic Science and Technology, Dübendorf, Switzerland
- Marnix H Medema
  - Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
  - Institute of Biology, Leiden University, Leiden, The Netherlands
5
Viesi E, Sardina DS, Perricone U, Giugno R. APDB: a database on air pollutant characterization and similarity prediction. Database (Oxford) 2023; 2023:baad046. [PMID: 37450416 PMCID: PMC10348400 DOI: 10.1093/database/baad046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Revised: 05/12/2023] [Accepted: 06/16/2023] [Indexed: 07/18/2023]
Abstract
The World Health Organization estimates that 9 out of 10 people worldwide breathe air containing high levels of pollutants. Long-term and chronic exposure to high concentrations of air pollutants is associated with deleterious effects on vital organs, including increased inflammation in the lungs, oxidative stress in the heart and disruption of the blood-brain barrier. For this reason, in an effort to find an association between exposure to pollutants and the toxicological effects observable on human health, an online resource that collects and characterizes pollutant molecules in detail could help to investigate their properties and mechanisms of action. We developed a database, APDB, collecting air-pollutant-related data from different online resources, in particular, molecules from the US Environmental Protection Agency, their associated targets and bioassays found in the PubChem chemical repository and their computed molecular descriptors and quantum mechanics properties. A web interface allows users (i) to browse data by category, (ii) to navigate the database by querying molecules and targets and (iii) to visualize and download molecule and target structures as well as computed descriptors and similarities. The desired data can be freely exported in textual/tabular format and the whole database in SQL format. Database URL: http://apdb.di.univr.it.
Affiliation(s)
- Eva Viesi
  - Department of Computer Science, University of Verona, Strada le Grazie 15, Verona 37134, Italy
- Davide Stefano Sardina
  - Molecular Informatics Unit, Ri.MED Foundation, Via Filippo Marini 14, Palermo 90128, Italy
- Ugo Perricone
  - Molecular Informatics Unit, Ri.MED Foundation, Via Filippo Marini 14, Palermo 90128, Italy
  - National Biodiversity Future Center (NBFC), Piazza Marina 61, Palermo 90133, Italy
- Rosalba Giugno
  - Department of Computer Science, University of Verona, Strada le Grazie 15, Verona 37134, Italy
  - National Biodiversity Future Center (NBFC), Piazza Marina 61, Palermo 90133, Italy
6
Dutschmann TM, Kinzel L, Ter Laak A, Baumann K. Large-scale evaluation of k-fold cross-validation ensembles for uncertainty estimation. J Cheminform 2023; 15:49. [PMID: 37118768 PMCID: PMC10142532 DOI: 10.1186/s13321-023-00709-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/03/2022] [Accepted: 03/10/2023] [Indexed: 04/30/2023] Open
Abstract
It is insightful to report, in addition to a prediction itself, an estimator that describes how certain the model is about that prediction. For regression tasks, most approaches implement a variation of the ensemble method, with few exceptions. Instead of a single estimator, a group of estimators yields several predictions for an input. The uncertainty can then be quantified by measuring the disagreement between the predictions, for example by the standard deviation. In theory, ensembles should not only provide uncertainties but also boost predictive performance by reducing errors arising from variance. Despite the development of novel methods, ensembles are still considered the "gold standard" for quantifying the uncertainty of regression models. Subsampling-based methods to obtain ensembles can be applied to all models, regardless of whether they are related to deep learning or traditional machine learning. However, little attention has been given to the question of whether the ensemble method is applicable to virtually all scenarios occurring in the field of cheminformatics. In a broad and diversified study, ensembles are evaluated for 32 datasets of different sizes and modeling difficulty, ranging from physicochemical properties to biological activities. For increasing ensemble sizes with up to 200 members, the predictive performance as well as the applicability as uncertainty estimator are shown for all combinations of five modeling techniques and four molecular featurizations. Useful recommendations were derived for practitioners regarding the success and minimum size of ensembles, depending on whether predictive performance or uncertainty quantification is of more importance for the task at hand.
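The ensemble idea described in this abstract (several estimators, with uncertainty read off their disagreement) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' protocol: the base model is a simple one-dimensional least-squares line, each ensemble member is trained with one of k folds left out, and the helper names `fit_line`, `kfold_ensemble`, and `predict_with_uncertainty` are hypothetical.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on 1-D data."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx

def kfold_ensemble(xs, ys, k=5, seed=0):
    """Train k models, each on the data with one fold left out."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds
    models = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        models.append(fit_line([xs[i] for i in train], [ys[i] for i in train]))
    return models

def predict_with_uncertainty(models, x):
    """Ensemble mean as prediction, standard deviation as uncertainty."""
    preds = [a * x + b for a, b in models]
    return statistics.fmean(preds), statistics.stdev(preds)

# Example on synthetic noisy data (values are illustrative only)
rng = random.Random(1)
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in xs]
mean, std = predict_with_uncertainty(kfold_ensemble(xs, ys), 3.0)
print(mean, std)
```

Repeating the k-fold split with different seeds would give larger ensembles; the standard deviation across members is only one possible disagreement measure.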
Affiliation(s)
- Thomas-Martin Dutschmann
  - Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, Beethovenstrasse 55, 38106, Brunswick, Germany
- Lennart Kinzel
  - Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, Beethovenstrasse 55, 38106, Brunswick, Germany
- Antonius Ter Laak
  - Bayer AG, Research & Development, Pharmaceuticals, Muellerstrasse 178, 13353, Berlin, Germany
- Knut Baumann
  - Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, Beethovenstrasse 55, 38106, Brunswick, Germany
7
How the Structure of Per- and Polyfluoroalkyl Substances (PFAS) Influences Their Binding Potency to the Peroxisome Proliferator-Activated and Thyroid Hormone Receptors-An In Silico Screening Study. MOLECULES (BASEL, SWITZERLAND) 2023; 28:molecules28020479. [PMID: 36677537 PMCID: PMC9866891 DOI: 10.3390/molecules28020479] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 12/08/2022] [Revised: 12/22/2022] [Accepted: 12/23/2022] [Indexed: 01/06/2023]
Abstract
In this study, we investigated the binding potencies of PFAS (per- and polyfluoroalkyl substances) to nuclear hormone receptors (NHRs): peroxisome proliferator-activated receptors (PPARs) α, β, and γ and thyroid hormone receptors (TRs) α and β. We simulated the docking scores of 43 perfluoroalkyl compounds and, based on these data, developed QSAR (Quantitative Structure-Activity Relationship) models for predicting the probability of binding to the five receptors. In the next step, we applied the developed QSAR models to screen a large group of compounds (4464) from the NORMAN Database. The in silico analyses indicated that the probability of PFAS binding to the receptors depends on the chain length, the number of fluorine atoms, and the number of branches in the molecule. According to the findings, the PFAS considered bind to PPARα, β, and γ with only low or moderate probability; the same holds for TRα and β, except that chemicals with longer chains show a moderately high probability of binding.
8
Didachos C, Kintos DP, Fousteris M, Mylonas P, Kanavos A. An Optimized Cloud Computing Method for Extracting Molecular Descriptors. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2023; 1424:247-254. [PMID: 37486501 DOI: 10.1007/978-3-031-31982-2_28] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/25/2023]
Abstract
Extracting molecular descriptors from chemical compounds is an essential preprocessing phase for developing accurate classification models. Supervised machine learning algorithms offer the capability to detect "hidden" patterns that may exist in a large dataset of compounds, which are represented by their molecular descriptors. Assuming that molecules with similar structure tend to share similar physicochemical properties, large chemical libraries can be screened by applying similarity sourcing techniques in order to detect potential bioactive compounds against a molecular target. However, the process of generating these compound features is time-consuming. Our proposed methodology not only employs cloud computing to accelerate the process of extracting molecular descriptors but also introduces an optimized approach to utilize the computational resources in the most efficient way.
Affiliation(s)
- Christos Didachos
  - Computer Engineering and Informatics Department, University of Patras, Patras, Greece
- Phivos Mylonas
  - Department of Informatics, Ionian University, Corfu, Greece
- Andreas Kanavos
  - Department of Informatics, Ionian University, Corfu, Greece
9
Metwally AA, Nayel AA, Hathout RM. In silico prediction of siRNA ionizable-lipid nanoparticles In vivo efficacy: Machine learning modeling based on formulation and molecular descriptors. Front Mol Biosci 2022; 9:1042720. [PMID: 36619167 PMCID: PMC9811823 DOI: 10.3389/fmolb.2022.1042720] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/12/2022] [Accepted: 12/07/2022] [Indexed: 12/24/2022] Open
Abstract
In silico prediction of the in vivo efficacy of siRNA ionizable-lipid nanoparticles is desirable as it can save time and resources dedicated to wet-lab experimentation. This study aims to computationally predict the in vivo efficacy of siRNA nanoparticles. A data set containing 120 entries was prepared by combining molecular descriptors of the ionizable lipids with two nanoparticle formulation characteristics. Input descriptor combinations were selected by an evolutionary algorithm. Artificial neural networks, support vector machines and partial least squares regression were used for QSAR modeling. The data set was split in two different ways, giving two training sets and two external validation sets. Training and validation sets contained 90 and 30 entries, respectively. The results showed successful predictions of validation set log (siRNA dose), with R²_val = 0.86-0.89 and 0.75-0.80 for validation sets one and two, respectively. Artificial neural networks gave the best R²_val for both validation sets. For predictions that have high bias, R²_val improved from 0.47 to 0.96 when the training set was restricted to lipids lying within the applicability domain. In conclusion, in vivo performance of siRNA nanoparticles was successfully predicted by combining cheminformatics with machine learning techniques.
Affiliation(s)
- Abdelkader A. Metwally (corresponding author)
  - Department of Pharmaceutics, Faculty of Pharmacy, Health Sciences Center, Kuwait University, Kuwait City, Kuwait
  - Department of Pharmaceutics and Industrial Pharmacy, Faculty of Pharmacy, Ain Shams University, Cairo, Egypt
- Amira A. Nayel
  - Clinical Pharmacy Department, Alexandria Ophthalmology Hospital, Alexandria, Egypt
  - Department of Clinical Pharmacy and Pharmacy Practice, Faculty of Pharmacy, Alexandria University, Alexandria, Egypt
- Rania M. Hathout
  - Department of Pharmaceutics and Industrial Pharmacy, Faculty of Pharmacy, Ain Shams University, Cairo, Egypt
10
Cheng XR, Yu BT, Song J, Ma JH, Chen YY, Zhang CX, Tu PH, Muskat MN, Zhu ZG. The Alleviation of Dextran Sulfate Sodium (DSS)-Induced Colitis Correlate with the log P Values of Food-Derived Electrophilic Compounds. Antioxidants (Basel) 2022; 11:antiox11122406. [PMID: 36552614 PMCID: PMC9774124 DOI: 10.3390/antiox11122406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Revised: 11/29/2022] [Accepted: 12/02/2022] [Indexed: 12/07/2022] Open
Abstract
Food-derived electrophilic compounds (FECs) are small molecules with electrophilic groups and potential cytoprotective effects. This study investigated the differential effects of six prevalent FECs on dextran sulfate sodium (DSS)-induced colitis in mice and the underlying relationship with molecular characteristics. Fumaric acid (FMA), isoliquiritigenin (ISO), cinnamaldehyde (CA), ferulic acid (FA), sulforaphane (SFN), and chlorogenic acid (CGA) improved colitis to varying degrees in terms of clinical signs, colonic histopathology, inflammatory and oxidative indicators, and the Nrf2 pathway, in the order SFN, ISO > FA, CA > FMA, CGA. Representative molecular characteristics of the “penetration-affinity-covalent binding” procedure (the logP value, Keap1 affinity energy, and electrophilic index of the FECs) were theoretically calculated. Among these, the logP value showed a strong correlation with colitis improvement, which was related to the expression of Nrf2 and its downstream proteins. Above all, SFN and ISO possessed high logP values and effectively improved DSS-induced colitis by activating the Keap1-Nrf2 pathway to alleviate oxidative stress and inflammatory responses.
Affiliation(s)
- Xiang-Rong Cheng
- School of Food Science and Technology, Jiangnan University, Wuxi 214122, China
- National Engineering Research Center for Functional Food, Jiangnan University, Wuxi 214122, China
| | - Bu-Tao Yu
- School of Food Science and Technology, Jiangnan University, Wuxi 214122, China
- National Engineering Research Center for Functional Food, Jiangnan University, Wuxi 214122, China
| | - Jie Song
- School of Food Science and Technology, Jiangnan University, Wuxi 214122, China
- National Engineering Research Center for Functional Food, Jiangnan University, Wuxi 214122, China
| | - Jia-Hui Ma
- School of Food Science and Technology, Jiangnan University, Wuxi 214122, China
- National Engineering Research Center for Functional Food, Jiangnan University, Wuxi 214122, China
| | - Yu-Yao Chen
- School of Food Science and Technology, Jiangnan University, Wuxi 214122, China
- National Engineering Research Center for Functional Food, Jiangnan University, Wuxi 214122, China
| | - Chen-Xi Zhang
- School of Food Science and Technology, Jiangnan University, Wuxi 214122, China
- National Engineering Research Center for Functional Food, Jiangnan University, Wuxi 214122, China
| | - Piao-Han Tu
- School of Food Science and Technology, Jiangnan University, Wuxi 214122, China
- National Engineering Research Center for Functional Food, Jiangnan University, Wuxi 214122, China
| | - Mitchell N Muskat
- School of Pharmacy, University of California San Francisco, San Francisco, CA 94143, USA
| | - Ze-Gang Zhu
- Jinhua Academy of Agricultural Sciences, Jinhua 321000, China
|
11
|
Noise-robust optimization of quantum machine learning models for polymer properties using a simulator and validated on the IonQ quantum computer. Sci Rep 2022; 12:19003. [PMID: 36347908 PMCID: PMC9643424 DOI: 10.1038/s41598-022-22940-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2022] [Accepted: 10/21/2022] [Indexed: 11/09/2022] Open
Abstract
Quantum machine learning for predicting the physical properties of polymer materials based on the molecular descriptors of monomers was investigated. Under the stochastic variation of the expected predicted values obtained from quantum circuits due to finite sampling, the methods proposed in previous works did not make sufficient progress in optimizing the parameters. To enable parameter optimization despite the presence of stochastic variations in the expected values, quantum circuits that improve prediction accuracy without increasing the number of parameters and parameter optimization methods that are robust to stochastic variations in the expected predicted values, were investigated. The multi-scale entanglement renormalization ansatz circuit improved the prediction accuracy without increasing the number of parameters. The stochastic gradient descent method using the parameter-shift rule for gradient calculation was shown to be robust to sampling variability in the expected value. Finally, the quantum machine learning model was trained on an actual ion-trap quantum computer. At each optimization step, the coefficient of determination $R^{2}$ improved equally on the actual machine and simulator, indicating that our findings enable the training of quantum circuits on the actual quantum computer to the same extent as on the simulator.
|
12
|
Asahara R, Miyao T. Extended Connectivity Fingerprints as a Chemical Reaction Representation for Enantioselective Organophosphorus-Catalyzed Asymmetric Reaction Prediction. ACS OMEGA 2022; 7:26952-26964. [PMID: 35936487 PMCID: PMC9352214 DOI: 10.1021/acsomega.2c03812] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/18/2022] [Accepted: 07/07/2022] [Indexed: 06/15/2023]
Abstract
Predicting the outcomes of organic reactions using data-driven approaches aids in the acceleration of research. In laboratory-scale experiments, only a small number of reaction data can be accessed for machine learning model construction, where reaction representations play a pivotal role in the success of model construction. Nevertheless, representation comparison for a small data set is not adequate. Herein, focusing on the enantioselectivity of phosphoric-acid-catalyzed reactions, various two-dimensional and three-dimensional reaction representations (descriptors) were compared. Overall, the concatenated form of the extended connectivity fingerprints showed the best predictive capability for the two types of data sets: high-throughput experimental data and manually collected literature data sets. Furthermore, highlighting the substructure contribution to the prediction outcome was shown to be informative for guiding catalyst development.
Affiliation(s)
- Ryosuke Asahara
- Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
| | - Tomoyuki Miyao
- Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
- Data Science Center, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
|
13
|
Nkulikiyinka P, Wagland ST, Manovic V, Clough PT. Prediction of Combined Sorbent and Catalyst Materials for SE-SMR, Using QSPR and Multitask Learning. Ind Eng Chem Res 2022; 61:9218-9233. [PMID: 35818477 PMCID: PMC9264356 DOI: 10.1021/acs.iecr.2c00971] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
The process of sorption enhanced steam methane reforming (SE-SMR) is an emerging technology for the production of low carbon hydrogen. The development of a suitable catalytic material, as well as a CO₂ adsorbent with high capture capacity, has slowed the upscaling of this process to date. In this study, to aid the development of a combined sorbent catalyst material (CSCM) for SE-SMR, a novel approach involving quantitative structure–property relationship analysis (QSPR) has been proposed. Through data-mining, two databases have been developed for the prediction of the last cycle capacity (g CO₂/g sorbent) and methane conversion (%). Multitask learning (MTL) was applied for the prediction of CSCM properties. Patterns in the data of this study have also yielded further insights; colored scatter plots were able to show certain patterns in the input data, as well as suggestions on how to develop an optimal material. With the results from the actual vs predicted plots collated, raw materials and synthesis conditions were proposed that could lead to the development of a CSCM that has good performance with respect to both the last cycle capacity and the methane conversion.
Affiliation(s)
- Paula Nkulikiyinka
- Energy and Power Theme, School of Water, Energy and Environment, Cranfield University, Cranfield, Bedfordshire MK43 0AL, U.K
| | - Stuart T. Wagland
- Energy and Power Theme, School of Water, Energy and Environment, Cranfield University, Cranfield, Bedfordshire MK43 0AL, U.K
| | - Vasilije Manovic
- Energy and Power Theme, School of Water, Energy and Environment, Cranfield University, Cranfield, Bedfordshire MK43 0AL, U.K
| | - Peter T. Clough
- Energy and Power Theme, School of Water, Energy and Environment, Cranfield University, Cranfield, Bedfordshire MK43 0AL, U.K
|
14
|
De P, Kar S, Ambure P, Roy K. Prediction reliability of QSAR models: an overview of various validation tools. Arch Toxicol 2022; 96:1279-1295. [PMID: 35267067 DOI: 10.1007/s00204-022-03252-y] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Received: 12/28/2021] [Accepted: 02/14/2022] [Indexed: 01/20/2023]
Abstract
The reliability of any quantitative structure-activity relationship (QSAR) model depends on multiple aspects such as the accuracy of the input dataset, selection of significant descriptors, the appropriate splitting process of the dataset, statistical tools used, and most notably on the measures of validation. Validation, the most crucial step in QSAR model development, confirms the reliability of the developed QSAR models and the acceptability of each step in the model development. The present review deals with various validation tools that involve multiple techniques that improve the model quality and robustness. The double cross-validation tool helps in building improved quality models using different combinations of the same training set in an inner cross-validation loop. This exhaustive method is also integrated for small datasets (< 40 compounds) in another tool, namely the small dataset modeler tool. The main aim of QSAR researchers is to improve prediction quality by lowering the prediction errors for the query compounds. 'Intelligent' selection of multiple models and consensus predictions integrated in the intelligent consensus predictor tool were found to be more externally predictive than individual models. Furthermore, another tool called Prediction Reliability Indicator was explained to understand the quality of predictions for a true external set. This tool uses a composite scoring technique to identify query compounds as 'good' or 'moderate' or 'bad' predictions. We have also discussed a quantitative read-across tool which predicts a chemical response based on the similarity with structural analogues. The discussed tools are freely available from https://dtclab.webs.com/software-tools or http://teqip.jdvu.ac.in/QSAR_Tools/DTCLab/ and https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home (for read-across).
Affiliation(s)
- Priyanka De
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, 700032, India
| | - Supratik Kar
- Interdisciplinary Center for Nanotoxicity, Department of Chemistry, Physics and Atmospheric Sciences, Jackson State University, Jackson, MS, 39217, USA
| | | | - Kunal Roy
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, 700032, India.
|
15
|
Karthikeyan A, Priyakumar UD. Artificial intelligence: machine learning for chemical sciences. J CHEM SCI 2021; 134:2. [PMID: 34955617 PMCID: PMC8691161 DOI: 10.1007/s12039-021-01995-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 07/09/2021] [Revised: 09/08/2021] [Accepted: 09/14/2021] [Indexed: 12/05/2022]
Abstract
Research in molecular sciences witnessed the rise and fall of Artificial Intelligence (AI)/Machine Learning (ML) methods, especially artificial neural networks, a few decades ago. However, we see a major resurgence in the use of modern ML methods in scientific research during the last few years. These methods have had phenomenal success in the areas of computer vision, speech recognition, natural language processing (NLP), etc. This has inspired chemists and biologists to apply these algorithms to problems in natural sciences. Availability of high performance Graphics Processing Unit (GPU) accelerators, large datasets, new algorithms, and libraries has enabled this surge. ML algorithms have successfully been applied to various domains in molecular sciences by providing much faster and sometimes more accurate solutions compared to traditional methods like Quantum Mechanical (QM) calculations, Density Functional Theory (DFT) or Molecular Mechanics (MM) based methods, etc. Some of the areas where ML methods have been shown to be effective are drug design, prediction of high-level quantum mechanical energies, molecular design, molecular dynamics materials, and retrosynthesis of organic compounds, etc. This article intends to conceptually introduce various modern ML methods and their relevance and applications in computational natural sciences. Synopsis: The recent surge in the application of machine learning (ML) methods in fundamental sciences has led to a perspective that these methods may become important tools in chemical science. This perspective provides an overview of the modern ML methods and their successful applications in chemistry during the last few years.
Affiliation(s)
- Akshaya Karthikeyan
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500 032 India
| | - U Deva Priyakumar
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500 032 India
|
16
|
Sleight TW, Sexton CN, Mpourmpakis G, Gilbertson LM, Ng CA. A Classification Model to Identify Direct-Acting Mutagenic Polycyclic Aromatic Hydrocarbon Transformation Products. Chem Res Toxicol 2021; 34:2273-2286. [PMID: 34662518 DOI: 10.1021/acs.chemrestox.1c00187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Polycyclic aromatic hydrocarbons (PAHs) are a complex group of environmental contaminants, many having long environmental half-lives. As these compounds degrade, the changes in their structure can result in a substantial increase in mutagenicity compared to the parent compound. Over time, each individual PAH can potentially degrade into several thousand unique transformation products, creating a complex, constantly evolving set of intermediates. Microbial degradation is the primary mechanism of their transformation and ultimate removal from the environment, and this process can result in mutagenic activation similar to the metabolic activation that can occur in multicellular organisms. The diversity of the potential intermediate structures in PAH-contaminated environments renders hazard assessment difficult for both remediation professionals and regulators. A mixture of structural and energetic descriptors has proven effective in existing studies for classifying which PAH transformation products will be mutagenic. However, most existing studies of environmental PAH mutagens primarily focus on nitrogenated derivatives, which are prevalent in the atmosphere and not as relevant in soil. Additionally, PAH products commonly found in the environment can range from as large as five rings to as small as a single ring, requiring a broadly inclusive methodology to comprehensively evaluate mutagenic potential. We developed a combination of supervised and unsupervised machine learning methods to predict environmentally induced PAH mutagenicity with improved performance over currently available tools. K-means clustering with principal component analysis allows us to identify molecular clusters that we hypothesize to have similar mechanisms of action. Recursive feature elimination identifies the most influential descriptors. 
The cluster-specific regression outperforms available classifiers in predicting direct-acting mutagens resulting from the microbial biodegradation of PAHs and provides direction for future studies evaluating the environmental hazards resulting from PAH biodegradation.
Affiliation(s)
- Trevor W Sleight
- Civil & Environmental Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
| | - Caitlin N Sexton
- Chemical and Petroleum Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
| | - Giannis Mpourmpakis
- Chemical and Petroleum Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
| | - Leanne M Gilbertson
- Civil & Environmental Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States.,Chemical and Petroleum Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
| | - Carla A Ng
- Civil & Environmental Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States.,Environmental and Occupational Health, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
|
17
|
Zhang XC, Wu CK, Yang ZJ, Wu ZX, Yi JC, Hsieh CY, Hou TJ, Cao DS. MG-BERT: leveraging unsupervised atomic representation learning for molecular property prediction. Brief Bioinform 2021; 22:6265201. [PMID: 33951729 DOI: 10.1093/bib/bbab152] [Citation(s) in RCA: 48] [Impact Index Per Article: 16.0] [Received: 01/27/2021] [Revised: 03/11/2021] [Accepted: 04/01/2021] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Accurate and efficient prediction of molecular properties is one of the fundamental issues in drug design and discovery pipelines. Traditional feature engineering-based approaches require extensive expertise in the feature design and selection process. With the development of artificial intelligence (AI) technologies, data-driven methods exhibit unparalleled advantages over the feature engineering-based methods in various domains. Nevertheless, when applied to molecular property prediction, AI models usually suffer from the scarcity of labeled data and show poor generalization ability. RESULTS In this study, we proposed molecular graph BERT (MG-BERT), which integrates the local message passing mechanism of graph neural networks (GNNs) into the powerful BERT model to facilitate learning from molecular graphs. Furthermore, an effective self-supervised learning strategy named masked atoms prediction was proposed to pretrain the MG-BERT model on a large amount of unlabeled data to mine context information in molecules. We found the MG-BERT model can generate context-sensitive atomic representations after pretraining and transfer the learned knowledge to the prediction of a variety of molecular properties. The experimental results show that the pretrained MG-BERT model with a little extra fine-tuning can consistently outperform the state-of-the-art methods on all 11 ADMET datasets. Moreover, the MG-BERT model leverages attention mechanisms to focus on atomic features essential to the target property, providing excellent interpretability for the trained model. The MG-BERT model does not require any hand-crafted feature as input and is more reliable due to its excellent interpretability, providing a novel framework to develop state-of-the-art models for a wide range of drug discovery tasks.
Affiliation(s)
- Xiao-Chen Zhang
- State Key Laboratory of High-Performance Computing, School of Computer Science, National University of Defense Technology, China
| | - Cheng-Kun Wu
- State Key Laboratory of High-Performance Computing, School of Computer Science, National University of Defense Technology, China
| | - Zhi-Jiang Yang
- Xiangya School of Pharmaceutical Sciences, Central South University, China
| | - Zhen-Xing Wu
- College of Pharmaceutical Sciences, Zhejiang University, China
| | - Jia-Cai Yi
- State Key Laboratory of High-Performance Computing, School of Computer Science, National University of Defense Technology, China
| | - Chang-Yu Hsieh
- Tencent Quantum Laboratory since 2018. He received his PhD degree in Physics from the University of Ottawa in 2012 and worked as a postdoctoral researcher at the University of Toronto (2012-2013) and Massachusetts Institute of Technology (2013-2016), respectively. Before joining Tencent, he worked as a senior researcher at Singapore-MIT Alliance for Science and Technology (2017-2018)
| | - Ting-Jun Hou
- College of Pharmaceutical Sciences, Zhejiang University, China
| | - Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, China
|
18
|
Khan PM, Lombardo A, Benfenati E, Roy K. First report on chemometric modeling of hydrolysis half-lives of organic chemicals. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2021; 28:1627-1642. [PMID: 32844343 DOI: 10.1007/s11356-020-10500-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/04/2020] [Accepted: 08/12/2020] [Indexed: 06/11/2023]
Abstract
Hydrolysis is one of the most important processes of transformation of organic chemicals in water. The rates of reactions, final chemical entities of these processes, and half-lives of organic chemicals are of considerable interest to environmental chemists as well as authorities involved in controlling the processing and disposal of such organic chemicals. In this study, we have proposed QSPR models for the prediction of hydrolysis half-life of organic chemicals as a function of different pH and temperature conditions using only two-dimensional molecular descriptors with definite physicochemical significance. For each model, suitable subsets of variables were selected using a genetic algorithm method; next, the selected subsets of variables were subjected to the best subset selection with a key objective to determine the best combination of descriptors for model generation. Finally, QSPR models were constructed using the best combination of variables employing the partial least squares (PLS) regression technique. Next, every final model was subjected to strict validation employing the internationally accepted internal and external validation parameters. The proposed models could be applicable for data gap filling to determine hydrolysis half-lives of organic chemicals at different environmental conditions. Generally, the presence of aliphatic ether and ether functional groups, a high percentage of oxygen content in the molecule, and the presence of O-Si pairs of atoms at topological distance one result in a shorter hydrolysis half-life of organic chemicals. On the other hand, higher unsaturation content and a high percentage of nitrogen content in molecules lead to a higher hydrolysis half-life. It is also found that branched and compact molecules will have a lower half-life while straight chain analogues will have a higher half-life. To the best of our knowledge, the presented models are the first reported QSPR models for hydrolysis half-lives of organic chemicals at different pH values.
Affiliation(s)
- Pathan Mohsin Khan
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Educational and Research (NIPER), Chunilal Bhawan, 168, Manikatala Main Road, Kolkata, 700054, India
| | - Anna Lombardo
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health, Istituto Di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri, 2, 20156, Milano, Italy
| | - Emilio Benfenati
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health, Istituto Di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri, 2, 20156, Milano, Italy
| | - Kunal Roy
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, 188 Raja S C Mullick Road, Kolkata, 700032, India.
|
19
|
Hu Y, Zhou G, Zhang C, Zhang M, Chen Q, Zheng L, Niu B. Identify Compounds' Target Against Alzheimer's Disease Based on In-Silico Approach. Curr Alzheimer Res 2020; 16:193-208. [PMID: 30605059 DOI: 10.2174/1567205016666190103154855] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/14/2018] [Revised: 12/20/2018] [Accepted: 01/03/2019] [Indexed: 11/22/2022]
Abstract
BACKGROUND Alzheimer's disease swept every corner of the globe and the number of patients worldwide has been rising. At present, there are as many as 30 million people with Alzheimer's disease in the world, and it is expected to exceed 80 million people by 2050. Consequently, the study of Alzheimer's drugs has become one of the most popular medical topics. METHODS In this study, in order to build a predicting model for Alzheimer's drugs and targets, the attribute discriminators CfsSubsetEval, ConsistencySubsetEval and FilteredSubsetEval are combined with search methods such as BestFirst, GeneticSearch and Greedystepwise to filter the molecular descriptors. Then the machine learning algorithms such as BayesNet, SVM, KNN and C4.5 are used to construct the 2D-Structure Activity Relationship(2D-SAR) model. Its modeling results are utilized for Receiver Operating Characteristic curve(ROC) analysis. RESULTS The prediction rates of correctness using Randomforest for AChE, BChE, MAO-B, BACE1, Tau protein and Non-inhibitor are 77.0%, 79.1%, 100.0%, 94.2%, 93.2% and 94.9%, respectively, which are overwhelming as compared to those of BayesNet, BP, SVM, KNN, AdaBoost and C4.5. CONCLUSION In this paper, we conclude that Random Forest is the best learner model for the prediction of Alzheimer's drugs and targets. Besides, we set up an online server to predict whether a small molecule is the inhibitor of Alzheimer's target at http://47.106.158.30:8080/AD/. Furthermore, it can distinguish the target protein of a small molecule.
Affiliation(s)
- Yan Hu
- School of Life Sciences, Shanghai University, Shanghai 200444, China
| | - Guangya Zhou
- School of Life Sciences, Shanghai University, Shanghai 200444, China
| | - Chi Zhang
- Huaxia Eye Hospital of Foshan, Huaxia Eye Hospital Group, Foshan, Guangdong, China
| | - Mengying Zhang
- School of Life Sciences, Shanghai University, Shanghai 200444, China
| | - Qin Chen
- School of Life Sciences, Shanghai University, Shanghai 200444, China
| | - Linfeng Zheng
- Department of Radiology, Shanghai General Hospital, Shanghai Jiao Tong University, Shanghai 200080, China.,Department of Radiology, Shanghai First People's Hospital, Baoshan Branch, Shanghai 200940, China
| | - Bing Niu
- School of Life Sciences, Shanghai University, Shanghai 200444, China
|
20
|
Zhang Y, Han Z, Gao Q, Bai X, Zhang C, Hou H. Prediction of K562 Cells Functional Inhibitors Based on Machine Learning Approaches. Curr Pharm Des 2019; 25:4296-4302. [PMID: 31696803 DOI: 10.2174/1381612825666191107092214] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/18/2019] [Accepted: 11/04/2019] [Indexed: 12/14/2022]
Abstract
BACKGROUND β thalassemia is a common monogenic genetic disease that is very harmful to human health. The disease arises from the deletion of or defects in β-globin, which reduces synthesis of the β-globin chain, resulting in a relative excess of α-chains. The formation of inclusion bodies deposited on the cell membrane causes a decrease in the ability of red blood cells to deform and a group of hereditary haemolytic diseases caused by massive destruction in the spleen. METHODS In this work, machine learning algorithms were employed to build a prediction model for inhibitors against K562 based on 117 inhibitors and 190 non-inhibitors. RESULTS The overall accuracy (ACC) of a 10-fold cross-validation test and an independent set test using Adaboost were 83.1% and 78.0%, respectively, surpassing Bayes Net, Random Forest, Random Tree, C4.5, SVM, KNN and Bagging. CONCLUSION This study indicated that Adaboost could be applied to build a learning model in the prediction of inhibitors against K562 cells.
Affiliation(s)
- Yuan Zhang
- Department of Obstetrics, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, 510630, China
| | - Zhenyan Han
- Department of Obstetrics, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, 510630, China
| | - Qian Gao
- Department of Obstetrics, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, 510630, China
| | - Xiaoyi Bai
- Department of Obstetrics, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, 510630, China
| | - Chi Zhang
- Huaxia Eye Hospital of Foshan, Huaxia Eye Hospital Group, Foshan, Guangdong, China.,University of Auckland, Auckland, New Zealand
| | - Hongying Hou
- Department of Obstetrics, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, 510630, China
|
21
|
Quantitative structure-property relationship modeling of polar analytes lacking UV chromophores to charged aerosol detector response. Anal Bioanal Chem 2019; 411:2945-2959. [DOI: 10.1007/s00216-019-01744-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 12/19/2018] [Revised: 02/26/2019] [Accepted: 03/01/2019] [Indexed: 11/27/2022]
|
22
|
Classification of thyroid hormone receptor agonists and antagonists using statistical learning approaches. Mol Divers 2018; 23:85-92. [DOI: 10.1007/s11030-018-9857-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 06/07/2018] [Accepted: 07/09/2018] [Indexed: 02/06/2023]
|
23
|
Quantitative structure-retention relationship modeling of selected antipsychotics and their impurities in green liquid chromatography using cyclodextrin mobile phases. Anal Bioanal Chem 2018; 410:2533-2550. [DOI: 10.1007/s00216-018-0911-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 10/12/2017] [Revised: 12/15/2017] [Accepted: 01/23/2018] [Indexed: 11/25/2022]
|
24
|
Sizochenko N, Gajewicz A, Leszczynski J, Puzyn T. Causation or only correlation? Application of causal inference graphs for evaluating causality in nano-QSAR models. NANOSCALE 2016; 8:7203-8. [PMID: 26972917 DOI: 10.1039/c5nr08279j] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 05/03/2023]
Abstract
In this paper, we suggest that causal inference methods could be efficiently used in Quantitative Structure-Activity Relationships (QSAR) modeling as additional validation criteria within quality evaluation of the model. Verification of the relationships between descriptors and toxicity or other activity in the QSAR model has a vital role in understanding the mechanisms of action. The well-known phrase "correlation does not imply causation" reflects the insight that a descriptor statistically correlated with the endpoint may not cause the emergence of this endpoint. Hence, paradigmatic shifts must be undertaken when moving from traditional statistical correlation analysis to causal analysis of multivariate data. Methods of causal discovery have been applied for broader physical insight into mechanisms of action and interpretation of the developed nano-QSAR models. Previously developed nano-QSAR models for toxicity of 17 nano-sized metal oxides towards E. coli bacteria have been validated by means of the causality criteria. Using the descriptors confirmed by the causal technique, we have developed new models consistent with the straightforward causal-reasoning account. It was proven that causal inference methods are able to provide a more robust mechanistic interpretation of the developed nano-QSAR models.
Affiliation(s)
- Natalia Sizochenko
- Laboratory of Environmental Chemometrics, Faculty of Chemistry, University of Gdansk, Wita Stwosza 63, 80-308, Gdansk, Poland. and Interdisciplinary Center for Nanotoxicity, Department of Chemistry, Jackson State University, 1400 J. R. Lynch Street, P. O. Box 17910, Jackson, MS 39217, USA
| | - Agnieszka Gajewicz
- Laboratory of Environmental Chemometrics, Faculty of Chemistry, University of Gdansk, Wita Stwosza 63, 80-308, Gdansk, Poland.
| | - Jerzy Leszczynski
- Interdisciplinary Center for Nanotoxicity, Department of Chemistry, Jackson State University, 1400 J. R. Lynch Street, P. O. Box 17910, Jackson, MS 39217, USA
| | - Tomasz Puzyn
- Laboratory of Environmental Chemometrics, Faculty of Chemistry, University of Gdansk, Wita Stwosza 63, 80-308, Gdansk, Poland.
| |
Collapse
|
25
|
Glaab E. Building a virtual ligand screening pipeline using free software: a survey. Brief Bioinform 2016; 17:352-66. [PMID: 26094053 PMCID: PMC4793892 DOI: 10.1093/bib/bbv037] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.9] [Received: 03/05/2015] [Revised: 05/20/2015] [Indexed: 12/17/2022] Open
Abstract
Virtual screening, the search for bioactive compounds via computational methods, provides a wide range of opportunities to speed up drug development and reduce the associated risks and costs. While virtual screening is already a standard practice in pharmaceutical companies, its applications in preclinical academic research still remain under-exploited, in spite of an increasing availability of dedicated free databases and software tools. In this survey, an overview of recent developments in this field is presented, focusing on free software and data repositories for screening as alternatives to their commercial counterparts, and outlining how available resources can be interlinked into a comprehensive virtual screening pipeline using typical academic computing facilities. Finally, to facilitate the set-up of corresponding pipelines, a downloadable software system is provided, using platform virtualization to integrate pre-installed screening tools and scripts for reproducible application across different operating systems.
|
26
|
Machine Learning Strategy for Accelerated Design of Polymer Dielectrics. Sci Rep 2016; 6:20952. [PMID: 26876223 PMCID: PMC4753456 DOI: 10.1038/srep20952] [Citation(s) in RCA: 111] [Impact Index Per Article: 13.9] [Received: 09/30/2015] [Accepted: 01/13/2016] [Indexed: 01/28/2023] Open
Abstract
The ability to efficiently design new and advanced dielectric polymers is hampered by the lack of sufficient, reliable data on wide polymer chemical spaces, and the difficulty of generating such data given time and computational/experimental constraints. Here, we address the issue of accelerating polymer dielectrics design by extracting learning models from data generated by accurate state-of-the-art first principles computations for polymers occupying an important part of the chemical subspace. The polymers are 'fingerprinted' as simple, easily attainable numerical representations, which are mapped to the properties of interest using a machine learning algorithm to develop an on-demand property prediction model. Further, a genetic algorithm is utilised to optimise polymer constituent blocks in an evolutionary manner, thus directly leading to the design of polymers with given target properties. While this philosophy of learning to make instant predictions and design is demonstrated here for the example of polymer dielectrics, it is equally applicable to other classes of materials as well.
|
27
|
Mamy L, Patureau D, Barriuso E, Bedos C, Bessac F, Louchart X, Martin-Laurent F, Miege C, Benoit P. Prediction of the Fate of Organic Compounds in the Environment From Their Molecular Properties: A Review. Critical Reviews in Environmental Science and Technology 2015; 45:1277-1377. [PMID: 25866458 PMCID: PMC4376206 DOI: 10.1080/10643389.2014.955627] [Citation(s) in RCA: 76] [Impact Index Per Article: 8.4] [Indexed: 05/22/2023]
Abstract
This comprehensive review covers quantitative structure-activity relationships (QSARs) that predict the fate of organic compounds in the environment from their molecular properties. The processes considered were water dissolution, dissociation, volatilization, retention on soils and sediments (mainly adsorption and desorption), degradation (biotic and abiotic), and absorption by plants. A total of 790 equations involving 686 structural molecular descriptors are reported to estimate 90 environmental parameters related to these processes. A significant number of equations were found for the dissociation process (pKa), water dissolution or hydrophobic behavior (especially through the KOW parameter), adsorption to soils, and biodegradation. QSARs were lacking for estimating desorption or the potential for transfer to water. Among the 686 molecular descriptors, five were found to be dominant in the 790 collected equations and the most generic: four quantum-chemical descriptors, namely the energy of the highest occupied molecular orbital (EHOMO), the energy of the lowest unoccupied molecular orbital (ELUMO), polarizability (α), and dipole moment (μ), and one constitutional descriptor, the molecular weight. Because combining descriptors from different categories (constitutional, topological, quantum-chemical) improved QSAR performance, these descriptors should be considered when developing new QSARs for further predictions of environmental parameters. This review also helps locate the relevant QSAR equations for predicting the fate of a wide diversity of compounds in the environment.
Affiliation(s)
- Laure Mamy
  - INRA-AgroParisTech, UMR 1402 ECOSYS (Ecologie Fonctionnelle et Ecotoxicologie des Agroécosystèmes), Versailles, France
- Dominique Patureau
  - INRA, UR 0050 LBE (Laboratoire de Biotechnologie de l’Environnement), Narbonne, France
- Enrique Barriuso
  - INRA-AgroParisTech, UMR 1402 ECOSYS (Ecologie Fonctionnelle et Ecotoxicologie des Agroécosystèmes), Thiverval-Grignon, France
- Carole Bedos
  - INRA-AgroParisTech, UMR 1402 ECOSYS (Ecologie Fonctionnelle et Ecotoxicologie des Agroécosystèmes), Thiverval-Grignon, France
- Fabienne Bessac
  - Université de Toulouse – INPT, Ecole d’Ingénieurs de Purpan – UPS, IRSAMC, Laboratoire de Chimie et Physique Quantiques – CNRS, UMR 5626, Toulouse, France
- Xavier Louchart
  - INRA, UMR 1221 LISAH (Laboratoire d’étude des Interactions Sol - Agrosystème – Hydrosystème), Montpellier, France
- Pierre Benoit
  - INRA-AgroParisTech, UMR 1402 ECOSYS (Ecologie Fonctionnelle et Ecotoxicologie des Agroécosystèmes), Thiverval-Grignon, France
|
28
|
Kumar SP, Jha PC, Jasrai YT, Pandya HA. The effect of various atomic partial charge schemes to elucidate consensus activity-correlating molecular regions: a test case of diverse QSAR models. J Biomol Struct Dyn 2015; 34:540-59. [PMID: 25997097 DOI: 10.1080/07391102.2015.1044474] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 01/30/2023]
Abstract
The estimation of atomic partial charges of small molecules to calculate molecular interaction fields (MIFs) is an important step in field-based quantitative structure-activity relationship (QSAR) modeling. Several studies have shown that the choice of partial charge scheme drastically affects the prediction accuracy of the QSAR model, and have focused on selecting the charge models that provide the highest cross-validated correlation coefficient (q²) to explain the variation in chemical structures against biological endpoints. This study shifts the focus toward understanding the molecular regions deemed to explain SAR in various charge models and recognizing a consensus picture of activity-correlating molecular regions. We selected eleven diverse datasets and developed MIF-based QSAR models using various charge schemes, including Gasteiger-Marsili, Del Re, Merck Molecular Force Field, Hückel, Gasteiger-Hückel, and Pullman. The generalized resultant QSAR models were then compared with the Open3DQSAR model to interpret the MIF descriptors decisively. We suggest that the regions of activity contribution or optimization can be effectively determined by studying various charge-based models to understand SAR precisely.
Affiliation(s)
- Sivakumar Prasanth Kumar
  - Department of Bioinformatics, Applied Botany Centre (ABC), University School of Sciences, Gujarat University, Ahmedabad 380 009, India
- Prakash C Jha
  - School of Chemical Sciences, Central University of Gujarat, Sector-30, Gandhinagar 382030, India
- Yogesh T Jasrai
  - Department of Bioinformatics, Applied Botany Centre (ABC), University School of Sciences, Gujarat University, Ahmedabad 380 009, India
- Himanshu A Pandya
  - Department of Bioinformatics, Applied Botany Centre (ABC), University School of Sciences, Gujarat University, Ahmedabad 380 009, India
|
29
|
Kandel DD, Raychaudhury C, Pal D. Two new atom centered fragment descriptors and scoring function enhance classification of antibacterial activity. J Mol Model 2014; 20:2164. [PMID: 24664120 DOI: 10.1007/s00894-014-2164-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/08/2013] [Accepted: 01/30/2014] [Indexed: 11/26/2022]
Abstract
Classification of the pharmacologic activity of a chemical compound is an essential step in any drug discovery process. We develop two new atom-centered fragment descriptors (vertex indices): one based solely on topological considerations without discriminating atom or bond types, and another based on topological and electronic features. We also assess their usefulness by devising a method to rank and classify molecules with regard to their antibacterial activity. The classification performance of our method is found to be superior to that of two previous studies on large heterogeneous data sets for hit finding and hit-to-lead studies, even though we use far fewer parameters. It is found that for hit-finding studies topological features (simple graph) alone provide significant discriminating power, and for the hit-to-lead process a small but consistent improvement can be made by additionally including electronic features (colored graph). Our approach is simple, interpretable, and suitable for the design of molecules, as we do not use any physicochemical properties. The singular use of the vertex index as descriptor, novel range-based feature extraction, and rigorous statistical validation are the key elements of this study.
|
30
|
Duardo-Sánchez A, Munteanu CR, Riera-Fernández P, López-Díaz A, Pazos A, González-Díaz H. Modeling Complex Metabolic Reactions, Ecological Systems, and Financial and Legal Networks with MIANN Models Based on Markov-Wiener Node Descriptors. J Chem Inf Model 2013; 54:16-29. [DOI: 10.1021/ci400280n] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 01/28/2023]
Affiliation(s)
- Aliuska Duardo-Sánchez
  - Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, Campus de Elviña, 15071, A Coruña, A Coruña, Spain
  - Department of Special Public Law, Financial and Tributary Law Area, Faculty of Law, University of Santiago de Compostela (USC), 15782, Santiago de Compostela, A Coruña, Spain
- Cristian R. Munteanu
  - Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, Campus de Elviña, 15071, A Coruña, A Coruña, Spain
- Pablo Riera-Fernández
  - Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, Campus de Elviña, 15071, A Coruña, A Coruña, Spain
- Antonio López-Díaz
  - Department of Special Public Law, Financial and Tributary Law Area, Faculty of Law, University of Santiago de Compostela (USC), 15782, Santiago de Compostela, A Coruña, Spain
- Alejandro Pazos
  - Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, Campus de Elviña, 15071, A Coruña, A Coruña, Spain
- Humberto González-Díaz
  - Department of Organic Chemistry II, Faculty of Science and Technology, University of the Basque Country (UPV/EHU), 48940, Leioa, Bizkaia, Spain
  - IKERBASQUE, Basque Foundation for Science, 48011, Bilbao, Biscay, Spain
|
31
|
Tian D, Choi KP. Sharp bounds and normalization of Wiener-type indices. PLoS One 2013; 8:e78448. [PMID: 24260118 PMCID: PMC3832646 DOI: 10.1371/journal.pone.0078448] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/03/2013] [Accepted: 09/11/2013] [Indexed: 11/21/2022] Open
Abstract
Complex networks abound in the physical, biological and social sciences. Quantifying a network’s topological structure facilitates network exploration and analysis, and network comparison, clustering and classification. A number of Wiener type indices have recently been incorporated as distance-based descriptors of complex networks, for example in the R package QuACN. Wiener type indices are known to depend both on the network’s number of nodes and on its topology. To apply these indices to measure the similarity of networks with different numbers of nodes, the indices must be normalized to correct for the effect of the number of nodes in a network. This paper aims to fill this gap. Moreover, we introduce an f-Wiener index of a network G, denoted by W_f(G). This notion generalizes the Wiener index to a very wide class of Wiener type indices including all known Wiener type indices. We identify the maximum and minimum of W_f(G) over a set of networks with n nodes. We then introduce our normalized version of the f-Wiener index. The normalized f-Wiener indices were demonstrated, in a number of experiments, to significantly improve hierarchical clustering over their non-normalized counterparts.
Affiliation(s)
- Dechao Tian
  - Department of Statistics and Applied Probability, National University of Singapore, Singapore, Singapore
- Kwok Pui Choi
  - Department of Statistics and Applied Probability, National University of Singapore, Singapore, Singapore
  - Department of Mathematics, National University of Singapore, Singapore, Singapore
|
32
|
Haranczyk M, Urbaszek P, Ng EG, Puzyn T. Combinatorial × Computational × Cheminformatics (C3) Approach to Characterization of Congeneric Libraries of Organic Pollutants. J Chem Inf Model 2012; 52:2902-9. [DOI: 10.1021/ci300289b] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 01/16/2023]
Affiliation(s)
- Maciej Haranczyk
  - Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Mail Stop 50F-1650, Berkeley, California 94720-8139, United States
- Piotr Urbaszek
  - Laboratory of Environmental Chemometrics, Department of Chemistry, University of Gdańsk, Sobieskiego 18/19, 80-952 Gdańsk, Poland
- Esmond G. Ng
  - Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Mail Stop 50F-1650, Berkeley, California 94720-8139, United States
- Tomasz Puzyn
  - Laboratory of Environmental Chemometrics, Department of Chemistry, University of Gdańsk, Sobieskiego 18/19, 80-952 Gdańsk, Poland
|
33
|
García GC, Palacios-Bejarano B, Ruiz IL, Gómez-Nieto MÁ. Comparison of representational spaces based on structural information in the development of QSAR models for benzylamino enaminone derivatives. SAR and QSAR in Environmental Research 2012; 23:751-774. [PMID: 22988909 DOI: 10.1080/1062936x.2012.719543] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/01/2023]
Abstract
In this paper we study different representational spaces of molecule data sets based on 2D representation models for the building of QSAR models for the prediction of the activity of 37 benzylamino enaminone derivatives. Approximations based on classical similarity calculated from fingerprint representation of molecules and isomorphism obtained using sub-graph matching algorithms are compared to fragmentation-based approximations using partial least squares and genetic algorithms. The influence of the anchored position of a non-common moiety and the kind of substituents in the common core structure of the data set are analysed, demonstrating the anomalous behaviour of some molecules and therefore the difficulty in building prediction models. These problems are solved by considering approximate similarity models. These models tune the prediction equations based on the size of the substituent and the anchored position, by adjusting the contribution of each substituent in similarity measurements calculated between the molecule data sets.
Affiliation(s)
- G Cerruela García
  - Department of Computing and Numerical Analysis, University of Córdoba, Spain
|
34
|
Natarajan R. New topological indices with very high discriminatory power. SAR and QSAR in Environmental Research 2011; 22:1-20. [PMID: 21391138 DOI: 10.1080/1062936x.2010.528611] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/30/2023]
Abstract
Several molecular descriptors are used in developing quantitative structure-activity relationships (QSARs). A large number of them are already in use and new descriptors are added every year. Two new topological indices with very high discriminatory power are reported in this paper. The two indices ranked all planar graphs of alkanes C₄ to C₆ uniquely and were found to have non-degenerate values for all the 7668 constitutional isomers (alkane trees) from C₄ to C₁₅. Low intercorrelation with several of the commonly used topological indices was studied using a diverse data set of 820 chemicals and the new indices proposed in the study were found to cluster in different nodes. This further confirmed their low intercorrelation with other molecular descriptors used in the study. The new descriptors were found to be useful in QSAR modelling.
Affiliation(s)
- R Natarajan
  - Centre for Mathematical Sciences Pala Campus, Arunapuram, Kerala, India
|