1
Tullius Scotti M, Herrera-Acevedo C, Barros de Menezes RP, Martin HJ, Muratov EN, Ítalo de Souza Silva Á, Faustino Albuquerque E, Ferreira Calado L, Coy-Barrera E, Scotti L. MolPredictX: Online Biological Activity Predictions by Machine Learning Models. Mol Inform 2022; 41:e2200133. [PMID: 35961924 DOI: 10.1002/minf.202200133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2022] [Accepted: 08/12/2022] [Indexed: 01/05/2023]
Abstract
Here we report the development of MolPredictX, an innovative and freely accessible web interface for biological activity predictions of query molecules. MolPredictX utilizes in-house QSAR models to provide 27 qualitative predictions (active or inactive) and quantitative probabilities for bioactivity against parasitic (Trypanosoma and Leishmania), viral (Dengue, SARS-CoV and Hepatitis C), pathogenic yeast (Candida albicans), bacterial (Salmonella enterica and Escherichia coli), and Alzheimer disease enzymes. In this article, we introduce the methodology and usability of this web tool, highlighting its potential role in the development of new drugs against a variety of diseases. MolPredictX is undergoing continuous development and is freely available at https://www.molpredictx.ufpb.br/.
Affiliation(s)
- Marcus Tullius Scotti
  - Programa de Pós-Graduação de Produtos Naturais e Sintéticos Bioativos, Universidade Federal da Paraíba, 58051-900, João Pessoa-PB, Brazil
- Chonny Herrera-Acevedo
  - Programa de Pós-Graduação de Produtos Naturais e Sintéticos Bioativos, Universidade Federal da Paraíba, 58051-900, João Pessoa-PB, Brazil
  - Department of Chemical Engineering, Universidad ECCI, Carrera 19 # 49-20, 111311, Bogotá D.C., Colombia
- Renata Priscila Barros de Menezes
  - Programa de Pós-Graduação de Produtos Naturais e Sintéticos Bioativos, Universidade Federal da Paraíba, 58051-900, João Pessoa-PB, Brazil
- Holli-Joi Martin
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
- Eugene N Muratov
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
- Ávilla Ítalo de Souza Silva
  - Programa de Pós-Graduação de Produtos Naturais e Sintéticos Bioativos, Universidade Federal da Paraíba, 58051-900, João Pessoa-PB, Brazil
- Emmanuella Faustino Albuquerque
  - Programa de Pós-Graduação de Produtos Naturais e Sintéticos Bioativos, Universidade Federal da Paraíba, 58051-900, João Pessoa-PB, Brazil
- Lucas Ferreira Calado
  - Programa de Pós-Graduação de Produtos Naturais e Sintéticos Bioativos, Universidade Federal da Paraíba, 58051-900, João Pessoa-PB, Brazil
- Ericsson Coy-Barrera
  - Bioorganic Chemistry Laboratory, Facultad de Ciencias Básicas y Aplicadas, Universidad Militar Nueva Granada, Cajicá, 250247, Colombia
- Luciana Scotti
  - Programa de Pós-Graduação de Produtos Naturais e Sintéticos Bioativos, Universidade Federal da Paraíba, 58051-900, João Pessoa-PB, Brazil
2
Aliagas I, Gobbi A, Lee ML, Sellers BD. Comparison of logP and logD correction models trained with public and proprietary data sets. J Comput Aided Mol Des 2022; 36:253-262. [PMID: 35359246 DOI: 10.1007/s10822-022-00450-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2021] [Accepted: 03/15/2022] [Indexed: 10/18/2022]
Abstract
In drug discovery, partition and distribution coefficients, logP and logD for octanol/water, are widely used as metrics of the lipophilicity of molecules, which in turn has a strong influence on the bioactivity and bioavailability of potential drugs. There are a variety of established methods, mostly fragment or atom-based, to calculate logP, while logD prediction generally relies on calculated logP and pKa for the estimation of neutral and ionized populations at a given pH. Algorithms such as ClogP have limitations generally leading to systematic errors for chemically related molecules, while pKa estimation is generally more difficult due to the interplay of electronic, inductive and conjugation effects for ionizable moieties. We propose an integrated machine learning QSAR modeling approach to predict logD by training the model with experimental data while using ClogP and pKa predicted by commercial software as model descriptors. By optimizing the loss function for the ClogD calculated by the software, we build a correction model that incorporates both descriptors from the software and available experimental logD data. Additionally, we calculate logP from the logD model using the software-predicted pKa values. Here, we have trained models using publicly or commercially available logD data to show that this approach can improve on commercial software predictions of lipophilicity. When applied to other logD data sets, this approach extends the domain of applicability of logD and logP predictions over commercial software. The performance of these models compares favorably with that of models built with a larger set of proprietary logD data.
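The abstract above notes that logD is generally estimated from a calculated logP and pKa. A minimal sketch of that textbook relationship is given here for illustration only. It assumes a single ionizable group and that only the neutral species partitions into octanol; the function names are hypothetical, and this is not the correction model described in the article.

```python
import math


def log_d(log_p: float, pka: float, ph: float = 7.4, kind: str = "acid") -> float:
    """Estimate logD at a given pH from logP and one pKa.

    Assumption: a single ionizable group, and only the neutral
    species partitions into the octanol phase.
    """
    if kind == "acid":
        exponent = ph - pka      # acids ionize as pH rises above pKa
    elif kind == "base":
        exponent = pka - ph      # bases ionize as pH falls below pKa
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    return log_p - math.log10(1.0 + 10.0 ** exponent)


def log_p_from_log_d(log_d_value: float, pka: float, ph: float = 7.4,
                     kind: str = "acid") -> float:
    """Invert the same relationship to recover logP from logD."""
    correction = log_d(0.0, pka, ph, kind)  # equals -log10(1 + 10**exponent)
    return log_d_value - correction


# Illustrative values only (not taken from the article).
acid_log_d = log_d(log_p=3.0, pka=4.4, ph=7.4, kind="acid")
recovered_log_p = log_p_from_log_d(acid_log_d, pka=4.4, ph=7.4, kind="acid")
```

For an acid with a pKa three units below the pH, the ionized fraction dominates and logD falls roughly three log units below logP, which is why errors in the predicted pKa propagate directly into logD.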
Affiliation(s)
- Ignacio Aliagas
  - Discovery Chemistry, Genentech Inc, 1 DNA Way, South San Francisco, CA, 94080, USA
- Alberto Gobbi
  - Discovery Chemistry, Genentech Inc, 1 DNA Way, South San Francisco, CA, 94080, USA
- Man-Ling Lee
  - Discovery Chemistry, Genentech Inc, 1 DNA Way, South San Francisco, CA, 94080, USA
- Benjamin D Sellers
  - Discovery Chemistry, Genentech Inc, 1 DNA Way, South San Francisco, CA, 94080, USA
3
Ksenofontov AA, Lukanov MM, Bocharov PS, Berezin MB, Tetko IV. Deep neural network model for highly accurate prediction of BODIPYs absorption. Spectrochimica Acta. Part A, Molecular and Biomolecular Spectroscopy 2022; 267:120577. [PMID: 34776377 DOI: 10.1016/j.saa.2021.120577] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 06/11/2021] [Revised: 10/12/2021] [Accepted: 10/31/2021] [Indexed: 06/13/2023]
Abstract
The possibility of accurately predicting the absorption maximum wavelength of BODIPYs was investigated. We found that previously reported models had low accuracy (40-57 nm) when predicting BODIPYs due to limited dataset sizes and/or numbers of BODIPYs (a few hundred). New models developed in this study were based on data for 6000-plus fluorescent dyes (including 4000-plus BODIPYs) and a deep neural network architecture. High prediction accuracy (five-fold cross-validation root mean squared error (RMSE) of 18.4 nm) was obtained using a consensus model, which was more accurate than the individual models. This model provided excellent accuracy (RMSE of 8 nm) for molecules previously synthesized in our laboratory as well as for the prospective validation of three new BODIPYs. We found that solvent properties did not significantly influence the model accuracy since only a few BODIPYs exhibited solvatochromism. The analysis of large prediction errors suggested that compounds able to have intermolecular interactions with solvent or salts were likely to be incorrectly predicted. The consensus model is freely available at https://ochem.eu/article/134921 and can help other researchers accelerate the design of new dyes with desired properties.
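The abstract above reports accuracy as RMSE and attributes the best result to a consensus model. The sketch below shows one common way such a consensus is formed, namely an unweighted average of the individual models' predictions; whether the article's consensus uses exactly this averaging is an assumption. The wavelengths are made-up toy values, not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def consensus(predictions):
    """Unweighted mean of per-compound predictions across models."""
    return [sum(values) / len(values) for values in zip(*predictions)]


# Toy absorption maxima in nm (hypothetical values for illustration).
observed = [500.0, 520.0, 540.0]
model_a = [510.0, 510.0, 550.0]   # errors: +10, -10, +10
model_b = [490.0, 530.0, 530.0]   # errors: -10, +10, -10

combined = consensus([model_a, model_b])
rmse_a = rmse(observed, model_a)
rmse_b = rmse(observed, model_b)
rmse_consensus = rmse(observed, combined)
```

The toy errors were chosen to cancel exactly, so the averaged prediction is perfect here. In practice averaging helps only to the extent that the individual models' errors are not strongly correlated.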
Affiliation(s)
- Alexander A Ksenofontov
  - G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia
- Michail M Lukanov
  - G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia; Ivanovo State University of Chemistry and Technology, 7, Sheremetevskiy Avenue, Ivanovo 153000, Russia
- Pavel S Bocharov
  - G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia; Ivanovo State University of Chemistry and Technology, 7, Sheremetevskiy Avenue, Ivanovo 153000, Russia
- Michail B Berezin
  - G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia
- Igor V Tetko
  - G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia; Helmholtz Zentrum München‑German Research Center for Environmental Health (GmbH), Institute of Structural Biology, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany; BIGCHEM GmbH, Valerystr. 49, 85716 Unterschleißheim, Germany
4
Web-Based Quantitative Structure-Activity Relationship Resources Facilitate Effective Drug Discovery. Top Curr Chem (Cham) 2021; 379:37. [PMID: 34554348 DOI: 10.1007/s41061-021-00349-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/12/2021] [Accepted: 08/17/2021] [Indexed: 12/28/2022]
Abstract
Traditional drug discovery effectively contributes to the treatment of many diseases but is limited by high costs and long cycles. Quantitative structure-activity relationship (QSAR) methods were introduced to evaluate the activity of compounds virtually, which saves the significant cost of determining the activities of the compounds experimentally. Over the past two decades, many web tools for QSAR modeling with various features have been developed to facilitate the usage of QSAR methods. These web tools significantly reduce the difficulty of using QSAR and indirectly promote drug discovery. However, there are few comprehensive summaries of these QSAR tools, and researchers may have difficulty determining which tool to use. Hence, we systematically surveyed the mainstream web tools for QSAR modeling. This work may guide researchers in choosing appropriate web tools for developing QSAR models, and may also help develop more bioinformatics tools based on these existing resources. For nonprofessionals, we also hope to make more people aware of QSAR methods and expand their use.
5
Keith JA, Vassilev-Galindo V, Cheng B, Chmiela S, Gastegger M, Müller KR, Tkatchenko A. Combining Machine Learning and Computational Chemistry for Predictive Insights Into Chemical Systems. Chem Rev 2021; 121:9816-9872. [PMID: 34232033 PMCID: PMC8391798 DOI: 10.1021/acs.chemrev.1c00107] [Citation(s) in RCA: 190] [Impact Index Per Article: 63.3] [Received: 02/04/2021] [Indexed: 12/23/2022]
Abstract
Machine learning models are poised to make a transformative impact on chemical sciences by dramatically accelerating computational algorithms and amplifying insights available from computational chemistry methods. However, achieving this requires a confluence and coaction of expertise in computer science and physical sciences. This Review is written for new and experienced researchers working at the intersection of both fields. We first provide concise tutorials of computational chemistry and machine learning methods, showing how insights involving both can be achieved. We follow with a critical review of noteworthy applications that demonstrate how computational chemistry and machine learning can be used together to provide insightful (and useful) predictions in molecular and materials modeling, retrosyntheses, catalysis, and drug design.
Affiliation(s)
- John A. Keith
  - Department of Chemical and Petroleum Engineering Swanson School of Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
- Valentin Vassilev-Galindo
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Bingqing Cheng
  - Accelerate Programme for Scientific Discovery, Department of Computer Science and Technology, 15 J. J. Thomson Avenue, Cambridge CB3 0FD, United Kingdom
- Stefan Chmiela
  - Department of Software Engineering and Theoretical Computer Science, Technische Universität Berlin, 10587, Berlin, Germany
- Michael Gastegger
  - Department of Software Engineering and Theoretical Computer Science, Technische Universität Berlin, 10587, Berlin, Germany
- Klaus-Robert Müller
  - Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
  - Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul, 02841, Korea
  - Max-Planck-Institut für Informatik, 66123 Saarbrücken, Germany
  - Google Research, Brain Team, 10117 Berlin, Germany
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
6
EGFRisopred: a machine learning-based classification model for identifying isoform-specific inhibitors against EGFR and HER2. Mol Divers 2021; 26:1531-1543. [PMID: 34345964 DOI: 10.1007/s11030-021-10284-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2021] [Accepted: 07/21/2021] [Indexed: 10/20/2022]
Abstract
The EGFR kinase pathway is one of the most frequently activated signaling pathways in human cancers. EGFR and HER2 are the two significant members of this pathway, which are attractive drug targets of clinical relevance in lung and breast cancer. Therefore, identifying EGFR- and HER2-specific inhibitors is one of the important challenges in cancer drug discovery. To address this issue, a dataset of 519 compounds having inhibitory activity against both isoforms, i.e., EGFR and HER2, was collected from the literature and used to develop a knowledge-based computational classification model for predicting the specificity of a molecule for an isoform (EGFR/HER2) with precision. A total of seventy-two classification models were developed using nine fingerprint types, four classifiers (IBK, NB, SMO and RF) and two different datasets (EGFR and HER2 isoform specific). The models developed using random forest and IBK performed better for the EGFR- and HER2-specific datasets, respectively. Scaffold and functional group analysis led to the identification of prevalent cores and fragments in each of the datasets. The accuracy of the selected best performing models was also evaluated using a decoy dataset. We have also developed an application, EGFRisopred, which integrates the best performing models and permits the user to predict the specificity of a compound as an EGFR-/HER2-specific anticancer agent. It is expected that the tool's availability as a free utility will allow researchers to identify new inhibitors against these targets important in cancer.
7
Pastor M, Gómez-Tamayo JC, Sanz F. Flame: an open source framework for model development, hosting, and usage in production environments. J Cheminform 2021; 13:31. [PMID: 33875019 PMCID: PMC8054391 DOI: 10.1186/s13321-021-00509-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/10/2020] [Accepted: 04/08/2021] [Indexed: 01/17/2023] [Open Access]
Abstract
This article describes Flame, open source software for building predictive models and supporting their use in production environments. Flame is a web application with a web-based graphic interface, which can be used as a desktop application or installed on a server receiving requests from multiple users. Models can be built starting from any collection of biologically annotated chemical structures since the software supports structural normalization, molecular descriptor calculation, and machine learning model generation using predefined workflows. The model building workflow can be customized from the graphic interface, selecting the type of normalization, molecular descriptors, and machine learning algorithm to be used from a panel of state-of-the-art methods implemented natively. Moreover, Flame implements a mechanism for extending its source code, allowing unlimited model customization. Models generated with Flame can be easily exported, facilitating collaborative model development. All models are stored in a model repository supporting model versioning. Models are identified by unique model IDs and include detailed documentation formatted using widely accepted standards. The current version is the result of nearly 3 years of development in collaboration with users from the pharmaceutical industry within the IMI eTRANSAFE project, which aims, among other objectives, to develop high-quality predictive models based on shared legacy data for assessing the safety of drug candidates.
Affiliation(s)
- Manuel Pastor
  - Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences, Hospital del Mar Medical Research Institute (IMIM), Universitat Pompeu Fabra, Barcelona, Spain
- José Carlos Gómez-Tamayo
  - Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences, Hospital del Mar Medical Research Institute (IMIM), Universitat Pompeu Fabra, Barcelona, Spain
- Ferran Sanz
  - Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences, Hospital del Mar Medical Research Institute (IMIM), Universitat Pompeu Fabra, Barcelona, Spain
8
Scior T, Abdallah HH, Salvador-Atonal K, Laufer S. Dapsone is not a Pharmacodynamic Lead Compound for its Aryl Derivatives. Curr Comput Aided Drug Des 2021; 16:327-339. [PMID: 32507104 DOI: 10.2174/1573409915666191010104527] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/14/2019] [Revised: 09/11/2019] [Accepted: 09/16/2019] [Indexed: 02/07/2023]
Abstract
BACKGROUND: The relatedness between the linear equations of thermodynamics and QSAR was studied thanks to the recently elucidated crystal structure complexes between sulfonamide pterin conjugates and dihydropteroate synthase (DHPS), together with a published set of thirty-six synthetic dapsone derivatives with their reported entropy-driven activity data. Only a few congeners were slightly better than dapsone. OBJECTIVE: Our study aimed to demonstrate the applicability of thermodynamic QSAR and to shed light on the mechanistic aspects of sulfone binding to DHPS. METHODS: To this end, ligand docking to DHPS, quantum mechanical property calculations, 2D- and 3D-QSAR, and Principal Component Analysis (PCA) were carried out. RESULTS: The short aryl substituents of the docked pterin-sulfa conjugates were oriented outward into the solvent space without interacting with target residues, which explains why binding enthalpy (ΔH) did not correlate with potency. PCA revealed how chemically informative descriptors are evenly loaded on the first three PCs (interpreted as ΔG, ΔH and ΔS), while chemically cryptic ones reflected higher dimensional (complex) loadings. CONCLUSION: It is safe to say that synthesis efforts to introduce short side chains for aryl derivatization of the dapsone scaffold have failed in the past. On theoretical grounds, we provide computed evidence of why dapsone is not a pharmacodynamic lead for drug profiling: enthalpic terms do not change significantly at the moment of ligand binding to the target.
Affiliation(s)
- Thomas Scior
  - Chemical Science Faculty, Benemerita Universidad Autonoma de Puebla, C.P. 72570, Puebla, Mexico
- Hassan H Abdallah
  - Chemistry Department, College of Education, Salahaddin University, Erbil, Iraq; Pharmacy School, University Sains Malaysia, USM, 11800, Penang, Malaysia
- Kenia Salvador-Atonal
  - Chemical Science Faculty, Benemerita Universidad Autonoma de Puebla, C.P. 72570, Puebla, Mexico
- Stefan Laufer
  - Pharmazeutisches Institut, Eberhard Karls Universität Tübingen, Tübingen, Germany
9
Banjare P, Matore B, Singh J, Roy PP. In silico local QSAR modeling of bioconcentration factor of organophosphate pesticides. In Silico Pharmacol 2021; 9:28. [PMID: 33868896 PMCID: PMC8019672 DOI: 10.1007/s40203-021-00087-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/18/2021] [Accepted: 03/15/2021] [Indexed: 11/30/2022] [Open Access]
Abstract
The persistent and accumulative nature of indiscriminately used pesticides has emerged as an ecotoxicological hazard. The bioconcentration factor (BCF) is one of the key elements for environmental assessments of the aquatic compartment. The limited prediction accuracy of global models motivates the use of local predictive models in toxicity modeling of emerging compounds. BCF data for diverse organophosphates (n = 55) were collected from the Pesticide Properties Database and used as the model data set in the present study to explore physicochemical properties and structural alerts concerning BCF. The structures were downloaded from the PubChem and ChemSpider databases. Two splitting techniques (biological sorting and structure-based) were used to divide the whole dataset into training and test set compounds. The QSAR study was carried out with two-dimensional (2D) descriptors calculated with PaDEL, applying a genetic algorithm (GA) as the chemometric tool in QSARINS software. The models were statistically robust both internally and externally (Q2: 0.709-0.722, Q2 Ext: 0.717-0.903, CCC: 0.857-0.880). Overall molecular mass and the presence of fused and heterocyclic rings with electron-withdrawing groups affect the BCF value. The developed models showed a broader applicability domain (AD) and more reliable predictions than previously reported models for the studied chemical class. Finally, predictions for unknown organophosphate pesticides and their toxic nature are discussed. These findings may be useful for the scientific community in prioritizing high-potential pesticides of the organophosphate class.
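The abstract above reports external validation through Q2 Ext and CCC without defining them. As a hedged illustration, the sketch below computes Lin's concordance correlation coefficient and one commonly used external Q2 variant (often written Q2 F1, which references the training-set mean). Which Q2 variant the article actually reports is an assumption, and the function names are hypothetical.

```python
def concordance_ccc(observed, predicted):
    """Lin's concordance correlation coefficient (population moments)."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted)) / n
    # Penalizes both scatter and systematic shift between the two series.
    return 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


def q2_ext_f1(train_observed, test_observed, test_predicted):
    """External Q2 (F1 variant): 1 - PRESS / SS around the training mean."""
    mean_train = sum(train_observed) / len(train_observed)
    press = sum((o - p) ** 2 for o, p in zip(test_observed, test_predicted))
    ss = sum((o - mean_train) ** 2 for o in test_observed)
    return 1.0 - press / ss


# Toy values for illustration only (not data from the study).
shifted_ccc = concordance_ccc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
perfect_q2 = q2_ext_f1([1.0, 2.0, 3.0], [1.0, 3.0], [1.0, 3.0])
```

A constant offset of one unit leaves the Pearson correlation at 1 but lowers the CCC to 4/7, which is why CCC is often preferred over a plain correlation coefficient for judging external predictions.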
Affiliation(s)
- Purusottam Banjare
  - Department of Pharmacy, Guru Ghasidas Vishwavidyalaya (A Central University), Bilaspur, 495009 India
- Balaji Matore
  - Department of Pharmacy, Guru Ghasidas Vishwavidyalaya (A Central University), Bilaspur, 495009 India
- Jagadish Singh
  - Department of Pharmacy, Guru Ghasidas Vishwavidyalaya (A Central University), Bilaspur, 495009 India
- Partha Pratim Roy
  - Department of Pharmacy, Guru Ghasidas Vishwavidyalaya (A Central University), Bilaspur, 495009 India
10
Slavov S, Beger RD. Quantitative structure–toxicity relationships in translational toxicology. Current Opinion in Toxicology 2020. [DOI: 10.1016/j.cotox.2020.04.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/25/2022]
11
Škuta C, Cortés-Ciriano I, Dehaen W, Kříž P, van Westen GJP, Tetko IV, Bender A, Svozil D. QSAR-derived affinity fingerprints (part 1): fingerprint construction and modeling performance for similarity searching, bioactivity classification and scaffold hopping. J Cheminform 2020; 12:39. [PMID: 33431038 PMCID: PMC7260783 DOI: 10.1186/s13321-020-00443-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 08/07/2019] [Accepted: 05/16/2020] [Indexed: 02/11/2023] [Open Access]
Abstract
An affinity fingerprint is a vector consisting of a compound's affinity or potency against a reference panel of protein targets. Here, we present the QAFFP fingerprint, a 440-element in silico QSAR-based affinity fingerprint whose components are predicted by Random Forest regression models trained on bioactivity data from the ChEMBL database. Both real-valued (rv-QAFFP) and binary (b-QAFFP) versions of the QAFFP fingerprint were implemented, and their performance in similarity searching, biological activity classification and scaffold hopping was assessed and compared to that of the 1024-bit Morgan2 fingerprint (the RDKit implementation of the ECFP4 fingerprint). In both similarity searching and biological activity classification, the QAFFP fingerprint yields retrieval rates, measured by AUC (~ 0.65 and ~ 0.70 for similarity searching depending on data sets, and ~ 0.85 for classification) and EF5 (~ 4.67 and ~ 5.82 for similarity searching depending on data sets, and ~ 2.10 for classification), comparable to those of the Morgan2 fingerprint (similarity searching AUC of ~ 0.57 and ~ 0.66, and EF5 of ~ 4.09 and ~ 6.41, depending on data sets, classification AUC of ~ 0.87, and EF5 of ~ 2.16). However, the QAFFP fingerprint outperforms the Morgan2 fingerprint in scaffold hopping, as it is able to retrieve 1146 of the 1749 existing scaffolds, while the Morgan2 fingerprint retrieves only 864 scaffolds.
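The abstract above reports retrieval performance as EF5, the enrichment factor in the top 5% of a ranked list. A minimal sketch of the standard enrichment factor calculation follows. The function name, the tie handling and the rounding of the cutoff are assumptions made for illustration, and the scores are toy values rather than results from the article.

```python
import math


def enrichment_factor(scores, is_active, fraction=0.05):
    """Enrichment factor: active rate in the top fraction of the ranking
    divided by the active rate in the whole list."""
    n = len(scores)
    n_actives = sum(is_active)
    if n == 0 or n != len(is_active) or n_actives == 0:
        raise ValueError("need equal-length inputs with at least one active")
    # Assumption: round the cutoff up and keep at least one compound.
    n_top = max(1, math.ceil(fraction * n))
    ranked = sorted(zip(scores, is_active),
                    key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(active for _, active in ranked[:n_top])
    return (hits_in_top / n_top) / (n_actives / n)


# Toy ranking of 20 compounds; higher score means ranked earlier.
toy_scores = [float(20 - i) for i in range(20)]
toy_labels = [1 if i in (0, 5, 10, 15) else 0 for i in range(20)]
ef5 = enrichment_factor(toy_scores, toy_labels, fraction=0.05)
```

With 4 actives among 20 compounds and the single top-ranked compound active, the top 5% is five times richer in actives than the whole list. An EF5 of 1 would correspond to random ranking.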
Affiliation(s)
- C Škuta
  - CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR, v. v. i., Vídeňská 1083, 142 20, Prague 4, Czech Republic
- I Cortés-Ciriano
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
- W Dehaen
  - CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR, v. v. i., Vídeňská 1083, 142 20, Prague 4, Czech Republic; CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague, Czech Republic
- P Kříž
  - Department of Mathematics, Faculty of Chemical Engineering, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague, Czech Republic
- G J P van Westen
  - Computational Drug Discovery, Drug Discovery and Safety, LACDR, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
- I V Tetko
  - Helmholtz Zentrum Muenchen - German Research Center for Environmental Health (GmbH) and BIGCHEM GmbH, Ingolstaedter Landstrasse 1, 85764, Neuherberg, Germany
- A Bender
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
- D Svozil
  - CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR, v. v. i., Vídeňská 1083, 142 20, Prague 4, Czech Republic; CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague, Czech Republic
12
Trush MM, Kovalishyn V, Hodyna D, Golovchenko OV, Chumachenko S, Tetko IV, Brovarets VS, Metelytsia L. In silico and in vitro studies of a number PILs as new antibacterials against MDR clinical isolate Acinetobacter baumannii. Chem Biol Drug Des 2020; 95:624-630. [DOI: 10.1111/cbdd.13678] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/18/2019] [Revised: 01/27/2020] [Accepted: 03/03/2020] [Indexed: 12/30/2022]
Affiliation(s)
- Maria M. Trush
  - V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, National Academy of Science of Ukraine, Kyiv, Ukraine
- Vasyl Kovalishyn
  - V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, National Academy of Science of Ukraine, Kyiv, Ukraine
- Diana Hodyna
  - V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, National Academy of Science of Ukraine, Kyiv, Ukraine
- Olexandr V. Golovchenko
  - V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, National Academy of Science of Ukraine, Kyiv, Ukraine
- Svitlana Chumachenko
  - V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, National Academy of Science of Ukraine, Kyiv, Ukraine
- Igor V. Tetko
  - Helmholtz Zentrum München ‐ German Research Center for Environmental Health (GmbH), Neuherberg, Germany
  - BIGCHEM GmbH, Unterschleißheim, Germany
- Volodymyr S. Brovarets
  - V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, National Academy of Science of Ukraine, Kyiv, Ukraine
- Larysa Metelytsia
  - V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, National Academy of Science of Ukraine, Kyiv, Ukraine
13
Singh N, Chaput L, Villoutreix BO. Virtual screening web servers: designing chemical probes and drug candidates in the cyberspace. Brief Bioinform 2020; 22:1790-1818. [PMID: 32187356 PMCID: PMC7986591 DOI: 10.1093/bib/bbaa034] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Indexed: 12/13/2022] [Open Access]
Abstract
The interplay between life sciences and advancing technology drives a continuous cycle of chemical data growth; these data are most often stored in open or partially open databases. In parallel, many different types of algorithms are being developed to manipulate these chemical objects and associated bioactivity data. Virtual screening methods are among the most popular computational approaches in pharmaceutical research. Today, user-friendly web-based tools are available to help scientists perform virtual screening experiments. This article provides an overview of internet resources enabling and supporting chemical biology and early drug discovery with a main emphasis on web servers dedicated to virtual ligand screening and small-molecule docking. This survey first introduces some key concepts and then presents recent and easily accessible virtual screening and related target-fishing tools as well as briefly discusses case studies enabled by some of these web services. Notwithstanding further improvements, already available web-based tools not only contribute to the design of bioactive molecules and assist drug repositioning but also help to generate new ideas and explore different hypotheses in a timely fashion while contributing to teaching in the field of drug development.
Affiliation(s)
- Natesh Singh
  - Univ. Lille, Inserm, Institut Pasteur de Lille, U1177 Drugs and Molecules for Living Systems, F-59000 Lille, France
- Ludovic Chaput
  - Univ. Lille, Inserm, Institut Pasteur de Lille, U1177 Drugs and Molecules for Living Systems, F-59000 Lille, France
- Bruno O Villoutreix
  - Univ. Lille, Inserm, Institut Pasteur de Lille, U1177 Drugs and Molecules for Living Systems, F-59000 Lille, France
14
Martinez-Mayorga K, Madariaga-Mazon A, Medina-Franco JL, Maggiora G. The impact of chemoinformatics on drug discovery in the pharmaceutical industry. Expert Opin Drug Discov 2020; 15:293-306. [PMID: 31965870 DOI: 10.1080/17460441.2020.1696307] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Indexed: 01/18/2023]
Abstract
Introduction: Even though there have been substantial advances in our understanding of biological systems, research in drug discovery is only just now beginning to utilize this type of information. The single-target paradigm, which exemplifies the reductionist approach, remains a mainstay of drug research today. A deeper view of the complexity involved in drug discovery is necessary to advance this field. Areas covered: This perspective provides a summary of research areas where cheminformatics has played a key role in drug discovery, including the available resources, as well as a personal perspective on the challenges still faced in the field. Expert opinion: Although great strides have been made in the handling and analysis of biological and pharmacological data, more must be done to link the data to biological pathways. This is crucial if one is to understand how drugs modify disease phenotypes, although this will involve a shift from the single drug/single target paradigm that remains a mainstay of drug research. Moreover, such a shift would require an increased awareness of the role of physiology in the mechanism of drug action, which will require the introduction of new mathematical, computer, and biological methods for chemoinformaticians to be trained in.
Affiliation(s)
| | | | - José L Medina-Franco
- Facultad de Química, Universidad Nacional Autónoma de México, Mexico City, Mexico
| | | |
|
15
|
Fu L, Liu L, Yang ZJ, Li P, Ding JJ, Yun YH, Lu AP, Hou TJ, Cao DS. Systematic Modeling of log D7.4 Based on Ensemble Machine Learning, Group Contribution, and Matched Molecular Pair Analysis. J Chem Inf Model 2019; 60:63-76. [DOI: 10.1021/acs.jcim.9b00718] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 12/18/2022]
Affiliation(s)
- Li Fu
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
| | - Lu Liu
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
| | - Zhi-Jiang Yang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
| | - Pan Li
- Beijing Institute of Pharmaceutical Chemistry, Beijing 102205, P. R. China
| | - Jun-Jie Ding
- Beijing Institute of Pharmaceutical Chemistry, Beijing 102205, P. R. China
| | - Yong-Huan Yun
- College of Food Science and Engineering, Hainan University, Haikou 570228, P. R. China
| | - Ai-Ping Lu
- Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR 999077, P. R. China
| | - Ting-Jun Hou
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR 999077, P. R. China
| |
|
16
|
Lamon L, Asturiol D, Vilchez A, Ruperez-Illescas R, Cabellos J, Richarz A, Worth A. Computational models for the assessment of manufactured nanomaterials: Development of model reporting standards and mapping of the model landscape. Comput Toxicol 2019; 9:143-151. [PMID: 31008416 PMCID: PMC6472618 DOI: 10.1016/j.comtox.2018.12.002] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 07/26/2018] [Revised: 12/05/2018] [Accepted: 12/11/2018] [Indexed: 01/31/2023]
Abstract
Different types of computational models have been developed for predicting the biokinetics, environmental fate, exposure levels and toxicological effects of chemicals and manufactured nanomaterials (MNs). However, these models are not described in a consistent manner in the scientific literature, which is one of the barriers to their broader use and acceptance, especially for regulatory purposes. Quantitative structure-activity relationships (QSARs) are in silico models based on the assumption that the activity of a substance is related to its chemical structure. These models can be used to provide information on (eco)toxicological effects in hazard assessment. In an environmental risk assessment, environmental exposure models can be used to estimate the predicted environmental concentration (PEC). In addition, physiologically based kinetic (PBK) models can be used in various ways to support a human health risk assessment. In this paper, we first propose model reporting templates for systematically and transparently describing models that could potentially be used to support regulatory risk assessments of MNs, for example under the REACH regulation. The model reporting templates include (a) the adaptation of the QSAR Model Reporting Format (QMRF) to report models for MNs, and (b) the development of a model reporting template for PBK and environmental exposure models applicable to MNs. Second, we show the usefulness of these templates to report different models, resulting in an overview of the landscape of available computational models for MNs.
Affiliation(s)
- L. Lamon
- European Commission, Joint Research Centre (JRC), Ispra, Italy
| | - D. Asturiol
- European Commission, Joint Research Centre (JRC), Ispra, Italy
| | - A. Vilchez
- Leitat Technological Center, c/de la Innovació 2, Terrassa, Barcelona, Spain
| | - R. Ruperez-Illescas
- Leitat Technological Center, c/de la Innovació 2, Terrassa, Barcelona, Spain
| | - J. Cabellos
- Leitat Technological Center, c/de la Innovació 2, Terrassa, Barcelona, Spain
| | - A. Richarz
- European Commission, Joint Research Centre (JRC), Ispra, Italy
| | - A. Worth
- European Commission, Joint Research Centre (JRC), Ispra, Italy
| |
|
17
|
Banegas-Luna AJ, Imbernón B, Llanes Castro A, Pérez-Garrido A, Cerón-Carrasco JP, Gesing S, Merelli I, D'Agostino D, Pérez-Sánchez H. Advances in distributed computing with modern drug discovery. Expert Opin Drug Discov 2018; 14:9-22. [PMID: 30484337 DOI: 10.1080/17460441.2019.1552936] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/17/2022]
Abstract
Introduction: Computational chemistry dramatically accelerates the drug discovery process and high-performance computing (HPC) can be used to speed up the most expensive calculations. Supporting a local HPC infrastructure is both costly and time-consuming, and, therefore, many research groups are moving from in-house solutions to remote-distributed computing platforms. Areas covered: The authors focus on the use of distributed technologies, solutions, and infrastructures to gain access to HPC capabilities, software tools, and datasets to run the complex simulations required in computational drug discovery (CDD). Expert opinion: The use of computational tools can decrease the time to market of new drugs. HPC has a crucial role in handling the complex algorithms and large volumes of data required to achieve specificity and avoid undesirable side-effects. Distributed computing environments have clear advantages over in-house solutions in terms of cost and sustainability. The use of infrastructures relying on virtualization reduces set-up costs. Distributed computing resources can be difficult to access, although web-based solutions are becoming increasingly available. There is a trade-off between cost-effectiveness and accessibility in using on-demand computing resources rather than free/academic resources. Graphics processing unit computing, with its outstanding parallel computing power, is becoming increasingly important.
Affiliation(s)
- Antonio Jesús Banegas-Luna
- Bioinformatics and High Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), Murcia, Spain
| | - Baldomero Imbernón
- Bioinformatics and High Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), Murcia, Spain
| | - Antonio Llanes Castro
- Bioinformatics and High Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), Murcia, Spain
| | - Alfonso Pérez-Garrido
- Bioinformatics and High Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), Murcia, Spain
| | - José Pedro Cerón-Carrasco
- Bioinformatics and High Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), Murcia, Spain
| | - Sandra Gesing
- Center for Research Computing, University of Notre Dame, Notre Dame, IN, USA
| | - Ivan Merelli
- Institute for Biomedical Technologies, National Research Council of Italy, Segrate (Milan), Italy
| | - Daniele D'Agostino
- Institute for Applied Mathematics and Information Technologies "E. Magenes", National Research Council of Italy, Genoa, Italy
| | - Horacio Pérez-Sánchez
- Bioinformatics and High Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), Murcia, Spain
| |
|
18
|
Piir G, Kahn I, García-Sosa AT, Sild S, Ahte P, Maran U. Best Practices for QSAR Model Reporting: Physical and Chemical Properties, Ecotoxicity, Environmental Fate, Human Health, and Toxicokinetics Endpoints. Environ Health Perspect 2018; 126:126001. [PMID: 30561225 PMCID: PMC6371683 DOI: 10.1289/ehp3264] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 12/19/2017] [Revised: 10/19/2018] [Accepted: 11/07/2018] [Indexed: 05/31/2023]
Abstract
Background: Quantitative and qualitative structure–activity relationships (QSARs) have been used to understand chemical behavior for almost a century. The main source of QSAR models is the scientific literature, but the open question is how well these models are documented. Objectives: The main aim of this study was to critically analyze the publication practices of QSARs with regard to transparency, potential reproducibility, and independent verification. The focus was on the level of technical completeness of the published QSARs. Methods: A total of 1,533 QSAR articles reporting 79 individual endpoints, mostly in environmental and health science, were reviewed. The QSAR parameters required for technical completeness were grouped into five categories: chemical structures, experimental endpoint values, descriptor values, mathematical representation of the model, and predicted endpoint values. The data were summarized and discussed using Circos plots. Results: Altogether, 42.5% of the reviewed articles were found to be potentially reproducible. The potential reproducibility for different endpoint groups varied; the respective rates were 39% for physical and chemical properties, 52% for ecotoxicity, 56% for environmental fate, 30% for human health, and 32% for toxicokinetics. The reproducibility of QSARs is discussed and placed in the context of the reproducibility of the experimental methods. Included are 65 references to open QSAR datasets as examples of models restored from scientific articles. Discussion: Strikingly poor documentation of QSARs was observed, which reduces the transparency, availability, and consequently, the application of research results in scientific, industrial, and regulatory areas. A list of the components needed to ensure the best practices for QSAR reporting is provided, allowing long-term use and preservation of the models. This list also allows an assessment of the reproducibility of models by interested parties such as journal editors, reviewers, regulators, evaluators, and potential users. https://doi.org/10.1289/EHP3264.
Affiliation(s)
- Geven Piir
- Institute of Chemistry, University of Tartu, Tartu, Estonia
| | - Iiris Kahn
- Department of Chemistry and Biotechnology, Tallinn University of Technology, Tallinn, Estonia
| | | | - Sulev Sild
- Institute of Chemistry, University of Tartu, Tartu, Estonia
| | - Priit Ahte
- Department of Chemistry and Biotechnology, Tallinn University of Technology, Tallinn, Estonia
| | - Uko Maran
- Institute of Chemistry, University of Tartu, Tartu, Estonia
| |
|
19
|
Butler KT, Davies DW, Cartwright H, Isayev O, Walsh A. Machine learning for molecular and materials science. Nature 2018; 559:547-555. [PMID: 30046072 DOI: 10.1038/s41586-018-0337-2] [Citation(s) in RCA: 1141] [Impact Index Per Article: 190.2] [Received: 10/20/2017] [Accepted: 05/09/2018] [Indexed: 02/06/2023]
Abstract
Here we summarize recent progress in machine learning for the chemical sciences. We outline machine-learning techniques that are suitable for addressing research questions in this domain, as well as future directions for the field. We envisage a future in which the design, synthesis, characterization and application of molecules and materials is accelerated by artificial intelligence.
Affiliation(s)
- Keith T Butler
- ISIS Facility, Rutherford Appleton Laboratory, Harwell Campus, Harwell, UK
| | | | | | - Olexandr Isayev
- Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
| | - Aron Walsh
- Department of Materials Science and Engineering, Yonsei University, Seoul, South Korea.,Department of Materials, Imperial College London, London, UK.
| |
|
20
|
Ghosh D, Koch U, Hadian K, Sattler M, Tetko IV. Luciferase Advisor: High-Accuracy Model To Flag False Positive Hits in Luciferase HTS Assays. J Chem Inf Model 2018; 58:933-942. [PMID: 29667823 DOI: 10.1021/acs.jcim.7b00574] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 01/26/2023]
Abstract
Firefly luciferase is an enzyme that has found ubiquitous use in biological assays in high-throughput screening (HTS) campaigns. The inhibition of luciferase in such assays could lead to a false positive result. This issue has been known for a long time, and there have been significant efforts to identify luciferase inhibitors in order to enhance recognition of false positives in screening assays. However, although a large amount of publicly accessible luciferase counterscreen data is available, to date little effort has been devoted to building a chemoinformatic model that can identify such molecules in a given data set. In this study we developed models to identify these molecules using various methods, such as molecular docking, SMARTS screening, pharmacophores, and machine learning methods. Among the structure-based methods, the pharmacophore-based method showed promising results, with a balanced accuracy of 74.2%. However, machine-learning approaches using associative neural networks outperformed all of the other methods explored, producing a final model with a balanced accuracy of 89.7%. The high predictive accuracy of this model is expected to be useful for advising which compounds are potential luciferase inhibitors present in luciferase HTS assays. The models developed in this work are freely available at the OCHEM platform at http://ochem.eu.
Affiliation(s)
- Dipan Ghosh
- Institute of Structural Biology, Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Ingolstaedter Landstrasse 1, 85764 Neuherberg, Germany
| | - Uwe Koch
- Lead Discovery Center GmbH, Otto-Hahn-Straße 15, 44227 Dortmund, Germany
| | - Kamyar Hadian
- Assay Development and Screening Platform, Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Ingolstaedter Landstrasse 1, 85764 Neuherberg, Germany
| | - Michael Sattler
- Bayerisches NMR-Zentrum, Department of Chemistry, Technical University of Munich, Ernst-Otto-Fischer-Straße 2, 85747 Garching, Germany
| | - Igor V Tetko
- Institute of Structural Biology, Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Ingolstaedter Landstrasse 1, 85764 Neuherberg, Germany.,BIGCHEM GmbH, Ingolstaedter Landstrasse 1 b. 60w, 85764 Neuherberg, Germany
| |
|
21
|
Druzhilovskiy DS, Rudik AV, Filimonov DA, Gloriozova TA, Lagunin AA, Dmitriev AV, Pogodin PV, Dubovskaya VI, Ivanov SM, Tarasova OA, Bezhentsev VM, Murtazalieva KA, Semin MI, Maiorov IS, Gaur AS, Sastry GN, Poroikov VV. Computational platform Way2Drug: from the prediction of biological activity to drug repurposing. Russ Chem Bull 2018. [DOI: 10.1007/s11172-017-1954-x] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Indexed: 12/23/2022]
|
22
|
Perspectives from the NanoSafety Modelling Cluster on the validation criteria for (Q)SAR models used in nanotechnology. Food Chem Toxicol 2017; 112:478-494. [PMID: 28943385 DOI: 10.1016/j.fct.2017.09.037] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 11/23/2016] [Revised: 08/31/2017] [Accepted: 09/19/2017] [Indexed: 11/20/2022]
Abstract
Nanotechnology and the production of nanomaterials have been expanding rapidly in recent years. Since many types of engineered nanoparticles are suspected to be toxic to living organisms and to have a negative impact on the environment, the process of designing new nanoparticles and their applications must be accompanied by a thorough risk analysis. (Quantitative) Structure-Activity Relationship ([Q]SAR) modelling creates promising options among the available methods for the risk assessment. These in silico models can be used to predict a variety of properties, including the toxicity of newly designed nanoparticles. However, (Q)SAR models must be appropriately validated to ensure the clarity, consistency and reliability of predictions. This paper is a joint initiative from recently completed European research projects focused on developing (Q)SAR methodology for nanomaterials. The aim was to interpret and expand the guidance for the well-known "OECD Principles for the Validation, for Regulatory Purposes, of (Q)SAR Models", with reference to nano-(Q)SAR, and present our opinions on the criteria to be fulfilled for models developed for nanoparticles.
|
23
|
Kovalishyn V, Abramenko N, Kopernyk I, Charochkina L, Metelytsia L, Tetko IV, Peijnenburg W, Kustov L. Modelling the toxicity of a large set of metal and metal oxide nanoparticles using the OCHEM platform. Food Chem Toxicol 2017; 112:507-517. [PMID: 28802948 DOI: 10.1016/j.fct.2017.08.008] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 03/09/2017] [Revised: 06/22/2017] [Accepted: 08/08/2017] [Indexed: 10/19/2022]
Abstract
Inorganic nanomaterials have become one of the new areas of modern knowledge and technology and have already found an increasing number of applications. However, some nanoparticles show toxicity to living organisms, and can potentially have a negative influence on environmental ecosystems. While toxicity can be determined experimentally, such studies are time consuming and costly. Computational toxicology can provide an alternative approach and there is a need to develop methods to reliably assess Quantitative Structure-Property Relationships for nanomaterials (nano-QSPRs). Importantly, development of such models requires careful collection and curation of data. This article overviews freely available nano-QSPR models, which were developed using the Online Chemical Modeling Environment (OCHEM). Multiple data on toxicity of nanoparticles to different living organisms were collected from the literature and uploaded in the OCHEM database. The main characteristics of nanoparticles such as chemical composition of nanoparticles, average particle size, shape, surface charge and information about the biological test species were used as descriptors for developing QSPR models. QSPR methodologies used Random Forests (WEKA-RF), k-Nearest Neighbors and Associative Neural Networks. The predictive ability of the models was tested through cross-validation, giving cross-validated coefficients q² = 0.58-0.80 for regression models and balanced accuracies of 65-88% for classification models. These results matched the predictions for the test sets used to develop the models. The proposed nano-QSPR models and uploaded data are freely available online at http://ochem.eu/article/103451 and can be used for estimation of toxicity of new and emerging nanoparticles at the early stages of nanomaterial development.
Affiliation(s)
- Vasyl Kovalishyn
- Institute of Bioorganic Chemistry & Petrochemistry, National Academy of Science of Ukraine, 1 Murmanska Street, 02660, Kyiv, Ukraine
| | - Natalia Abramenko
- Moscow State University, Chemistry Department, 1 Leninskie Gory, bldg. 3, 119991, Moscow, Russia
| | - Iryna Kopernyk
- Institute of Bioorganic Chemistry & Petrochemistry, National Academy of Science of Ukraine, 1 Murmanska Street, 02660, Kyiv, Ukraine
| | - Larysa Charochkina
- Institute of Bioorganic Chemistry & Petrochemistry, National Academy of Science of Ukraine, 1 Murmanska Street, 02660, Kyiv, Ukraine
| | - Larysa Metelytsia
- Institute of Bioorganic Chemistry & Petrochemistry, National Academy of Science of Ukraine, 1 Murmanska Street, 02660, Kyiv, Ukraine
| | - Igor V Tetko
- Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Institute of Structural Biology, Ingolstädter Landstraße 1, D-85764 Neuherberg, Germany; BIGCHEM, GmbH, Ingolstädter Landstraße 1, b. 60w, D-85764, Neuherberg, Germany
| | - Willie Peijnenburg
- Institute of Environmental Sciences (CML), Leiden University, PO Box 9518, 2300, RA Leiden, The Netherlands; National Institute of Public Health and the Environment, Center for Safety of Substances and Products, PO Box 1, 3720, BA Bilthoven, The Netherlands.
| | - Leonid Kustov
- Moscow State University, Chemistry Department, 1 Leninskie Gory, bldg. 3, 119991, Moscow, Russia; N.D. Zelinsky Institute of Organic Chemistry, RAS, 47 Leninsky Prospect, 119991, Moscow, Russia
| |
|
24
|
González-Medina M, Medina-Franco JL. Platform for Unified Molecular Analysis: PUMA. J Chem Inf Model 2017; 57:1735-1740. [PMID: 28737911 DOI: 10.1021/acs.jcim.7b00253] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Indexed: 12/21/2022]
Abstract
We introduce a free platform for chemoinformatic-based diversity analysis and visualization of chemical space of user-supplied data sets. Platform for Unified Molecular Analysis (PUMA) integrates metrics used to characterize compound databases including visualization of chemical space, scaffold content, and analysis of chemical diversity. The user's input is a file with SMILES, database names, and compound IDs. PUMA computes molecular properties of pharmaceutical relevance, Murcko scaffolds, and diversity analysis. The user can interactively navigate through the graphs and export image files and the raw data of the diversity calculations. The platform links two public online resources: Consensus Diversity Plots for the assessment of global diversity and Activity Landscape Plotter to analyze structure-activity relationships. Herein, we describe the functionalities of PUMA and exemplify its use through the analysis of compound databases of general interest. PUMA is freely accessible at the authors' website https://www.difacquim.com/d-tools/.
Affiliation(s)
- Mariana González-Medina
- School of Chemistry, Department of Pharmacy, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico
| | - José L Medina-Franco
- School of Chemistry, Department of Pharmacy, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico
| |
|
25
|
|
26
|
Dong J, Yao ZJ, Zhu MF, Wang NN, Lu B, Chen AF, Lu AP, Miao H, Zeng WB, Cao DS. ChemSAR: an online pipelining platform for molecular SAR modeling. J Cheminform 2017; 9:27. [PMID: 29086046 PMCID: PMC5418185 DOI: 10.1186/s13321-017-0215-1] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 12/03/2016] [Accepted: 04/24/2017] [Indexed: 12/31/2022]
Abstract
Background: In recent years, predictive models based on machine learning techniques have proven to be feasible and effective in drug discovery. However, to develop such a model, researchers usually have to combine multiple tools and undergo several different steps (e.g., RDKit or ChemoPy package for molecular descriptor calculation, ChemAxon Standardizer for structure preprocessing, scikit-learn package for model building, and ggplot2 package for statistical analysis and visualization, etc.). In addition, it may require strong programming skills to accomplish these jobs, which poses severe challenges for users without advanced training in computer programming. Therefore, an online pipelining platform that integrates a number of selected tools is a valuable and efficient solution that can meet the needs of related researchers. Results: This work presents a web-based pipelining platform, called ChemSAR, for generating SAR classification models of small molecules. The capabilities of ChemSAR include the validation and standardization of chemical structure representation, the computation of 783 1D/2D molecular descriptors and ten types of widely-used fingerprints for small molecules, the filtering methods for feature selection, the generation of predictive models via a step-by-step job submission process, model interpretation in terms of feature importance and tree visualization, as well as a helpful report generation system. The results can be visualized as high-quality plots and downloaded as local files. Conclusion: ChemSAR provides an integrated web-based platform for generating SAR classification models that will benefit cheminformatics and other biomedical users. It is freely available at: http://chemsar.scbdd.com. Electronic supplementary material: The online version of this article (doi:10.1186/s13321-017-0215-1) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Jie Dong
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Yuelu District, Changsha, People's Republic of China
| | - Zhi-Jiang Yao
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Yuelu District, Changsha, People's Republic of China.,The Third Xiangya Hospital, Central South University, Changsha, People's Republic of China
| | - Min-Feng Zhu
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Yuelu District, Changsha, People's Republic of China.,The Third Xiangya Hospital, Central South University, Changsha, People's Republic of China
| | - Ning-Ning Wang
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Yuelu District, Changsha, People's Republic of China
| | - Ben Lu
- The Third Xiangya Hospital, Central South University, Changsha, People's Republic of China
| | - Alex F Chen
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Yuelu District, Changsha, People's Republic of China.,The Third Xiangya Hospital, Central South University, Changsha, People's Republic of China
| | - Ai-Ping Lu
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Kowloon Tong, Hong Kong SAR, People's Republic of China
| | - Hongyu Miao
- Department of Biostatistics, School of Public Health, University of Texas Health Science Center, Houston, TX, 77030, USA
| | - Wen-Bin Zeng
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Yuelu District, Changsha, People's Republic of China
| | - Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Yuelu District, Changsha, People's Republic of China.,Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Kowloon Tong, Hong Kong SAR, People's Republic of China.
| |
|
27
|
Capuzzi SJ, Kim ISJ, Lam WI, Thornton TE, Muratov EN, Pozefsky D, Tropsha A. Chembench: A Publicly Accessible, Integrated Cheminformatics Portal. J Chem Inf Model 2017; 57:105-108. [PMID: 28045544 DOI: 10.1021/acs.jcim.6b00462] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Indexed: 12/14/2022]
Abstract
The enormous increase in the amount of publicly available chemical genomics data and the growing emphasis on data sharing and open science mandates that cheminformaticians also make their models publicly available for broad use by the scientific community. Chembench is one of the first publicly accessible, integrated cheminformatics Web portals. It has been extensively used by researchers from different fields for curation, visualization, analysis, and modeling of chemogenomics data. Since its launch in 2008, Chembench has been accessed more than 1 million times by more than 5000 users from a total of 98 countries. We report on the recent updates and improvements that increase the simplicity of use, computational efficiency, accuracy, and accessibility of a broad range of tools and services for computer-assisted drug design and computational toxicology available on Chembench. Chembench remains freely accessible at https://chembench.mml.unc.edu.
Affiliation(s)
- Stephen J Capuzzi
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, and Department of Computer Science, University of North Carolina, Chapel Hill, North Carolina 27599, United States
| | - Ian Sang-June Kim
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, and Department of Computer Science, University of North Carolina, Chapel Hill, North Carolina 27599, United States
| | - Wai In Lam
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, and Department of Computer Science, University of North Carolina, Chapel Hill, North Carolina 27599, United States
| | - Thomas E Thornton
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, and Department of Computer Science, University of North Carolina, Chapel Hill, North Carolina 27599, United States
| | - Eugene N Muratov
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, and Department of Computer Science, University of North Carolina, Chapel Hill, North Carolina 27599, United States
| | - Diane Pozefsky
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, and Department of Computer Science, University of North Carolina, Chapel Hill, North Carolina 27599, United States
| | - Alexander Tropsha
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, and Department of Computer Science, University of North Carolina, Chapel Hill, North Carolina 27599, United States
| |
|