1. Engler Hart C, Preto AJ, Chanana S, Healey D, Kind T, Domingo-Fernández D. Evaluating the generalizability of graph neural networks for predicting collision cross section. J Cheminform 2024; 16:105. [PMID: 39210378 PMCID: PMC11363525 DOI: 10.1186/s13321-024-00899-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2024] [Accepted: 08/19/2024] [Indexed: 09/04/2024]
Abstract
Ion Mobility coupled with Mass Spectrometry (IM-MS) is a promising analytical technique that enhances molecular characterization by measuring collision cross-section (CCS) values, which are indicative of the molecular size and shape. However, the effective application of CCS values in structural analysis is still constrained by the limited availability of experimental data, necessitating the development of accurate machine learning (ML) models for in silico predictions. In this study, we evaluated state-of-the-art Graph Neural Networks (GNNs), trained to predict CCS values using the largest publicly available dataset to date. Although our results confirm the high accuracy of these models within chemical spaces similar to their training environments, their performance significantly declines when applied to structurally novel regions. This discrepancy raises concerns about the reliability of in silico CCS predictions and underscores the need for releasing further publicly available CCS datasets. To mitigate this, we introduce Mol2CCS, which demonstrates how generalization can be partially improved by extending models to account for additional features such as molecular fingerprints, descriptors, and molecule types. Lastly, we also show how confidence models can enhance the reliability of the CCS estimates.
Scientific contribution
We have benchmarked state-of-the-art graph neural networks for predicting collision cross section. Our work highlights the accuracy of these models when trained and predicted in similar chemical spaces, but also how their accuracy drops when evaluated in structurally novel regions. Lastly, we conclude by presenting potential approaches to mitigate this issue.
Affiliation(s)
- Chloe Engler Hart
  - Enveda Biosciences, Inc., 5700 Flatiron Pkwy, Boulder, CO, 80301, USA
- Shaurya Chanana
  - Enveda Biosciences, Inc., 5700 Flatiron Pkwy, Boulder, CO, 80301, USA
- David Healey
  - Enveda Biosciences, Inc., 5700 Flatiron Pkwy, Boulder, CO, 80301, USA
- Tobias Kind
  - Enveda Biosciences, Inc., 5700 Flatiron Pkwy, Boulder, CO, 80301, USA
2. Heyndrickx W, Mervin L, Morawietz T, Sturm N, Friedrich L, Zalewski A, Pentina A, Humbeck L, Oldenhof M, Niwayama R, Schmidtke P, Fechner N, Simm J, Arany A, Drizard N, Jabal R, Afanasyeva A, Loeb R, Verma S, Harnqvist S, Holmes M, Pejo B, Telenczuk M, Holway N, Dieckmann A, Rieke N, Zumsande F, Clevert DA, Krug M, Luscombe C, Green D, Ertl P, Antal P, Marcus D, Do Huu N, Fuji H, Pickett S, Acs G, Boniface E, Beck B, Sun Y, Gohier A, Rippmann F, Engkvist O, Göller AH, Moreau Y, Galtier MN, Schuffenhauer A, Ceulemans H. MELLODDY: Cross-pharma Federated Learning at Unprecedented Scale Unlocks Benefits in QSAR without Compromising Proprietary Information. J Chem Inf Model 2024; 64:2331-2344. [PMID: 37642660 PMCID: PMC11005050 DOI: 10.1021/acs.jcim.3c00799] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 05/25/2023] [Indexed: 08/31/2023]
Abstract
Federated multipartner machine learning has been touted as an appealing and efficient method to increase the effective training data volume and thereby the predictivity of models, particularly when the generation of training data is resource-intensive. In the landmark MELLODDY project, indeed, each of ten pharmaceutical companies realized aggregated improvements on its own classification or regression models through federated learning. To this end, they leveraged a novel implementation extending multitask learning across partners, on a platform audited for privacy and security. The experiments involved an unprecedented cross-pharma data set of 2.6+ billion confidential experimental activity data points, documenting 21+ million physical small molecules and 40+ thousand assays in on-target and secondary pharmacodynamics and pharmacokinetics. Appropriate complementary metrics were developed to evaluate the predictive performance in the federated setting. In addition to predictive performance increases in labeled space, the results point toward an extended applicability domain in federated learning. Increases in collective training data volume, including by means of auxiliary data resulting from single concentration high-throughput and imaging assays, continued to boost predictive performance, albeit with a saturating return. Markedly higher improvements were observed for the pharmacokinetics and safety panel assay-based task subsets.
Affiliation(s)
- Lewis Mervin
  - AstraZeneca R&D, Biomedical Campus, 1 Francis Crick Ave, Cambridge CB2 0SL, U.K.
- Tobias Morawietz
  - Bayer Pharma AG, Global Drug Discovery, Chemical Research, Computational Chemistry, Aprather Weg 18 a, Wuppertal 42096, Germany
- Noé Sturm
  - Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
- Lukas Friedrich
  - Merck KGaA, Global Research & Development, Frankfurter Strasse 250, Darmstadt 64293, Germany
- Adam Zalewski
  - Amgen Research (Munich) GmbH, Staffelseestraße 2, Munich 81477, Germany
- Anastasia Pentina
  - Bayer AG, Machine Learning Research, Research & Development, Pharmaceuticals, Berlin 10117, Germany
- Lina Humbeck
  - BI Medicinal Chemistry Department, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, Biberach an der Riss 88397, Germany
- Martijn Oldenhof
  - KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
- Ritsuya Niwayama
  - Institut de recherches Servier, 125 chemin de ronde Croissy-sur-Seine, Île-de-France 78290, France
- Nikolas Fechner
  - Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
- Jaak Simm
  - KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
- Adam Arany
  - KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
- Rama Jabal
  - Iktos, 65 rue de Prony, Paris 75017, France
- Arina Afanasyeva
  - Modality Informatics Group, Digital Research Solutions, Advanced Informatics & Analytics, Astellas Pharma Inc., 21 Miyukigaoka, Tsukuba-shi, Ibaraki 305-8585, Japan
- Regis Loeb
  - KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
- Shlok Verma
  - GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Simon Harnqvist
  - GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Matthew Holmes
  - GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Balazs Pejo
  - Budapest University of Technology and Economics, Department of Networked Systems and Services, Műegyetem rkp. 3, Budapest 1111, Hungary
- Nicholas Holway
  - Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
- Arne Dieckmann
  - Bayer AG, API Production, Product Supply, Pharmaceuticals, Ernst-Schering-Straße 14, Bergkamen 59192, Germany
- Nicola Rieke
  - NVIDIA GmbH, Floessergasse 2, Munich 81369, Germany
- Djork-Arné Clevert
  - Bayer AG, Machine Learning Research, Research & Development, Pharmaceuticals, Berlin 10117, Germany
- Michael Krug
  - Merck KGaA, Global Research & Development, Frankfurter Strasse 250, Darmstadt 64293, Germany
- Christopher Luscombe
  - GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Darren Green
  - GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Peter Ertl
  - Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
- Peter Antal
  - Budapest University of Technology and Economics, Department of Measurement and Information Systems, Műegyetem rkp. 3, Budapest 1111, Hungary
- David Marcus
  - GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Hideyoshi Fuji
  - Modality Informatics Group, Digital Research Solutions, Advanced Informatics & Analytics, Astellas Pharma Inc., 21 Miyukigaoka, Tsukuba-shi, Ibaraki 305-8585, Japan
- Stephen Pickett
  - GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Gergely Acs
  - Budapest University of Technology and Economics, Department of Networked Systems and Services, Műegyetem rkp. 3, Budapest 1111, Hungary
- Eric Boniface
  - Substra Foundation - Labelia Labs, 4 rue Voltaire, Nantes 44000, France
- Bernd Beck
  - BI Medicinal Chemistry Department, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, Biberach an der Riss 88397, Germany
- Yax Sun
  - Amgen Research, 1 Amgen Center Drive, Thousand Oaks, California 92130, United States
- Arnaud Gohier
  - Institut de recherches Servier, 125 chemin de ronde Croissy-sur-Seine, Île-de-France 78290, France
- Friedrich Rippmann
  - Merck KGaA, Global Research & Development, Frankfurter Strasse 250, Darmstadt 64293, Germany
- Ola Engkvist
  - AstraZeneca, Molecular AI, Discovery Sciences, R&D, Pepparedsleden 1, Mölndal 431 50, Sweden
- Andreas H. Göller
  - Bayer Pharma AG, Global Drug Discovery, Chemical Research, Computational Chemistry, Aprather Weg 18 a, Wuppertal 42096, Germany
- Yves Moreau
  - KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
- Ansgar Schuffenhauer
  - Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
- Hugo Ceulemans
  - Janssen Pharmaceutica NV, Turnhoutseweg 30, Beerse 2340, Belgium
3. Samizo S, Kaneko H. Predictive Modeling of HMG-CoA Reductase Inhibitory Activity and Design of New HMG-CoA Reductase Inhibitors. ACS Omega 2023; 8:27247-27255. [PMID: 37546661 PMCID: PMC10399166 DOI: 10.1021/acsomega.3c02567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Accepted: 06/30/2023] [Indexed: 08/08/2023]
Abstract
As blood cholesterol increases, it accumulates in the intima of blood vessels, elevating the risk of atherosclerosis and coronary artery disease. Drugs that inhibit enzymes essential for cholesterol synthesis are effective in improving blood cholesterol levels. Statins are used to treat hypercholesterolemia as they inhibit 3-hydroxy-3-methylglutaryl coenzyme A (HMG-CoA) reductase (HMGR), the rate-limiting enzyme in cholesterol synthesis. Statins are known to exert their effects by translocating to the liver, where they are taken up by the organic anion transporting polypeptide 1B1 (OATP1B1). Therefore, we hypothesized that a compound with high HMGR inhibitory activity and high affinity for OATP1B1 would be an excellent new therapeutic agent for hypercholesterolemia with increased liver selectivity and fewer side effects. In this study, we developed two models for predicting HMGR inhibitory activity and OATP1B1 affinity to propose the chemical structure of a new therapeutic agent for hypercholesterolemia with both high inhibitory activity and high liver selectivity. HMGR inhibitory activity and OATP1B1 affinity prediction models were constructed with high prediction accuracy for the test data: r² = 0.772 and 0.768, respectively. New chemical structures were then input into these models to search for candidate compounds. We found compounds with higher HMGR inhibitory activity and OATP1B1 affinity than rosuvastatin, the most recently developed statin drug, and compounds that did not have a common structure of statins with high HMGR inhibitory activity.
4. Gui C, Li Y, Peng T. Development of predictive QSAR models for the substrates/inhibitors of OATP1B1 by deep neural networks. Toxicol Lett 2023; 376:20-25. [PMID: 36649904 DOI: 10.1016/j.toxlet.2023.01.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Revised: 01/10/2023] [Accepted: 01/12/2023] [Indexed: 01/15/2023]
Abstract
The organic anion transporting polypeptide 1B1 (OATP1B1) is an important hepatic uptake transporter. Inhibition of its normal function could lead to drug-drug interactions. In silico prediction is an effective means to identify potential OATP1B1 inhibitors, and quantitative structure-activity relationship (QSAR) modeling is extensively used. As the structures of OATP1B1 substrates/inhibitors are quite diverse, machine learning based methods should be a good option for their QSAR analysis. In the present study, deep neural networks (DNNs) were employed to develop QSAR models for the substrates/inhibitors of OATP1B1 with different molecular fingerprints. Our results showed that QSAR models based on 4-hidden layer DNNs and ECFP4/FCFP4 fingerprints had the best generalization performance. The correlation coefficients (R²) of the test set for the ECFP4 and FCFP4 models were 0.641 and 0.653, respectively. The model application domain (AD) was calculated with a Euclidean distance-based method, and the AD could improve the performance of the ECFP4 model but had little effect on the FCFP4 model. Finally, the prediction of 8 additional compounds that were not included in the data set further demonstrated that our QSAR models had a good predictive ability (averaged prediction accuracy >92%). The developed QSAR models could be used to screen large data sets and discover novel inhibitors for OATP1B1.
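The abstract above names a Euclidean distance-based applicability domain but does not give its exact formulation. The following is a minimal sketch of one common variant, assuming a k-nearest-neighbour distance with a mean-plus-z-standard-deviations cutoff; the function names, `k`, and `z` are illustrative assumptions, not the authors' settings.

```python
import math
from statistics import mean, pstdev


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_knn_distance(point, references, k):
    """Mean distance from `point` to its k nearest vectors in `references`."""
    distances = sorted(euclidean(point, r) for r in references)
    return mean(distances[:k])


def fit_ad_threshold(train, k=3, z=0.5):
    """Assumed cutoff: mean + z * std of the training set's own kNN distances."""
    scores = []
    for i, p in enumerate(train):
        others = train[:i] + train[i + 1:]  # leave the point itself out
        scores.append(mean_knn_distance(p, others, k))
    return mean(scores) + z * pstdev(scores)


def in_domain(query, train, threshold, k=3):
    """A query is inside the AD if it lies close enough to the training data."""
    return mean_knn_distance(query, train, k) <= threshold


# Toy usage with 2-D descriptor vectors (hypothetical data).
train = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
threshold = fit_ad_threshold(train, k=2, z=0.5)
near = in_domain((0.5, 0.5), train, threshold, k=2)
far = in_domain((5.0, 5.0), train, threshold, k=2)
```

Predictions for compounds flagged as outside the domain would then be reported separately or discarded, which is how an AD can raise the measured accuracy of the remaining predictions.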
Affiliation(s)
- Chunshan Gui
  - College of Pharmaceutical Sciences, Soochow University, 199 Renai Road, Suzhou Industrial Park, Suzhou 215123, China
- Ying Li
  - College of Pharmaceutical Sciences, Soochow University, 199 Renai Road, Suzhou Industrial Park, Suzhou 215123, China
- Taotao Peng
  - College of Pharmaceutical Sciences, Soochow University, 199 Renai Road, Suzhou Industrial Park, Suzhou 215123, China
5. In Silico Identification of Anti-SARS-CoV-2 Medicinal Plants Using Cheminformatics and Machine Learning. Molecules (Basel, Switzerland) 2022; 28:molecules28010208. [PMID: 36615401 PMCID: PMC9821958 DOI: 10.3390/molecules28010208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2022] [Revised: 12/17/2022] [Accepted: 12/23/2022] [Indexed: 12/28/2022]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative pathogen of COVID-19, is spreading rapidly and has caused hundreds of millions of infections and millions of deaths worldwide. Due to the lack of specific vaccines and effective treatments for COVID-19, there is an urgent need to identify effective drugs. Traditional Chinese medicine (TCM) is a valuable resource for identifying novel anti-SARS-CoV-2 drugs based on the important contribution of TCM and its potential benefits in COVID-19 treatment. Herein, we aimed to discover novel anti-SARS-CoV-2 compounds and medicinal plants from TCM by establishing a prediction method of anti-SARS-CoV-2 activity using machine learning methods. We first constructed a benchmark dataset from anti-SARS-CoV-2 bioactivity data collected from the ChEMBL database. Then, we established random forest (RF) and support vector machine (SVM) models that both achieved satisfactory predictive performance with AUC values of 0.90. By using this method, a total of 1011 active anti-SARS-CoV-2 compounds were predicted from the TCMSP database. Among these compounds, six compounds with highly potent activity were confirmed in the anti-SARS-CoV-2 experiments. The molecular fingerprint similarity analysis revealed that only 24 of the 1011 compounds have high similarity to the FDA-approved antiviral drugs, indicating that most of the compounds were structurally novel. Based on the predicted anti-SARS-CoV-2 compounds, we identified 74 anti-SARS-CoV-2 medicinal plants through enrichment analysis. The 74 plants are widely distributed in 68 genera and 43 families, 14 of which belong to antipyretic detoxicate plants. In summary, this study provided several medicinal plants with potential anti-SARS-CoV-2 activity, which offer an attractive starting point and a broader scope to mine for potentially novel anti-SARS-CoV-2 drugs.
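The fingerprint similarity analysis mentioned above can be illustrated with a small sketch. It assumes fingerprints are represented as sets of on-bit indices and compared with the Tanimoto coefficient; the abstract does not state the fingerprint type or the cutoff for "high similarity", so the 0.7 threshold and all names below are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as on-bit sets."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_similarity(query, references):
    """Highest Tanimoto similarity of `query` to any reference fingerprint."""
    return max(tanimoto(query, r) for r in references)


def count_similar(queries, references, threshold=0.7):
    """Number of query compounds with a close analogue among the references."""
    return sum(1 for q in queries if max_similarity(q, references) >= threshold)


# Toy usage: two predicted actives against one reference drug (made-up bits).
predicted = [{1, 2, 3}, {7, 8}]
approved = [{1, 2, 3, 4}]
n_similar = count_similar(predicted, approved, threshold=0.7)
```

Compounds that fall below the cutoff against every reference drug are the ones a study of this kind would describe as structurally novel.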
6. Zhao Q, Yu Y, Gao Y, Shen L, Cui S, Gou Y, Zhang C, Zhuang S, Jiang G. Machine Learning-Based Models with High Accuracy and Broad Applicability Domains for Screening PMT/vPvM Substances. Environmental Science & Technology 2022; 56:17880-17889. [PMID: 36475377 DOI: 10.1021/acs.est.2c06155] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Indexed: 06/17/2023]
Abstract
Persistent, mobile, and toxic (PMT) substances and very persistent and very mobile (vPvM) substances can be transported over long distances from various sources, increasing the public health risk. Rapid, high-throughput screening of PMT/vPvM substances is thus warranted to support risk prevention and mitigation measures. Herein, we construct a machine learning-based screening system integrated with five models for high-throughput classification of PMT/vPvM substances. The models are constructed with 44 971 substances by conventional learning, deep learning, and ensemble learning algorithms, among which LightGBM and XGBoost outperform other algorithms with metrics exceeding 0.900. Good model interpretability is achieved through the number of free halogen atoms (fr_halogen) and the logarithm of partition coefficient (MolLogP) as the two most critical molecular descriptors representing the persistence and mobility of substances, respectively. Our screening system exhibits a great generalization capability with area under the receiver operating characteristic curve (AUROC) above 0.951 and is successfully applied to the persistent organic pollutants (POPs), prioritized PMT/vPvM substances, and pesticides. The screening system constructed in this study can serve as an efficient and reliable tool for high-throughput risk assessment and the prioritization of managing emerging contaminants.
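For readers unfamiliar with the AUROC metric reported above, a minimal sketch of its pairwise definition follows: the fraction of (positive, negative) pairs in which the positive compound receives the higher score, with ties counted as one half. The function name and toy data are illustrative and not taken from the study.

```python
def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy usage: 1 = PMT/vPvM, 0 = not (hypothetical labels and model scores).
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
value = auroc(labels, scores)
```

This quadratic-time form is only meant to make the definition concrete; library implementations compute the same quantity from sorted ranks.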
Affiliation(s)
- Qiming Zhao
  - Key Laboratory of Environment Remediation and Ecological Health, Ministry of Education, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou 310058, China
- Yang Yu
  - Solid Waste and Chemicals Management Center, Ministry of Ecology and Environment of the People's Republic of China, Beijing 100029, China
- Yuchen Gao
  - Key Laboratory of Environment Remediation and Ecological Health, Ministry of Education, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou 310058, China
- Lilai Shen
  - Key Laboratory of Environment Remediation and Ecological Health, Ministry of Education, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou 310058, China
- Shixuan Cui
  - Key Laboratory of Environment Remediation and Ecological Health, Ministry of Education, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou 310058, China
  - Women's Reproductive Health Key Laboratory of Zhejiang Province, Women's Hospital, School of Medicine, Zhejiang University, Hangzhou 310006, China
- Yiyuan Gou
  - Key Laboratory of Environment Remediation and Ecological Health, Ministry of Education, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou 310058, China
- Chunlong Zhang
  - Department of Environmental Sciences, University of Houston-Clear Lake, 2700 Bay Area Blvd., Houston, Texas 77058, United States
- Shulin Zhuang
  - Key Laboratory of Environment Remediation and Ecological Health, Ministry of Education, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou 310058, China
  - Women's Reproductive Health Key Laboratory of Zhejiang Province, Women's Hospital, School of Medicine, Zhejiang University, Hangzhou 310006, China
- Guibin Jiang
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
7. Hao Y, Fan T, Sun G, Li F, Zhang N, Zhao L, Zhong R. Environmental toxicity risk evaluation of nitroaromatic compounds: Machine learning driven binary/multiple classification and design of safe alternatives. Food Chem Toxicol 2022; 170:113461. [PMID: 36243219 DOI: 10.1016/j.fct.2022.113461] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/30/2022] [Revised: 09/11/2022] [Accepted: 10/04/2022] [Indexed: 11/06/2022]
8. Bort W, Mazitov D, Horvath D, Bonachera F, Lin A, Marcou G, Baskin I, Madzhidov T, Varnek A. Inverse QSAR: Reversing Descriptor-Driven Prediction Pipeline Using Attention-Based Conditional Variational Autoencoder. J Chem Inf Model 2022; 62:5471-5484. [PMID: 36332178 DOI: 10.1021/acs.jcim.2c01086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
In order to better formalize it, the notorious inverse-QSAR problem (finding structures of given QSAR-predicted properties) is considered in this paper as a two-step process including (i) finding "seed" descriptor vectors corresponding to user-constrained QSAR model output values and (ii) identifying the chemical structures best matching the "seed" vectors. The main development effort here was focused on the latter stage, proposing a new attention-based conditional variational autoencoder neural-network architecture based on recent developments in attention-based methods. The obtained results show that this workflow was capable of generating compounds predicted to display desired activity while being completely novel compared to the training database (ChEMBL). Moreover, the generated compounds show acceptable druglikeness and synthetic accessibility. Both pharmacophore and docking studies were carried out as "orthogonal" in silico validation methods, proving that some of the de novo structures are, beyond being predicted active by 2D-QSAR models, clearly able to match binding 3D pharmacophores and bind the protein pocket.
Affiliation(s)
- William Bort
  - Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Daniyar Mazitov
  - Laboratory of Chemoinformatics and Molecular Modeling, A. M. Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008 Kazan, Russia
- Dragos Horvath
  - Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Fanny Bonachera
  - Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Arkadii Lin
  - Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Gilles Marcou
  - Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Igor Baskin
  - Department of Material Science and Engineering, Technion-Israel Institute of Technology, 3200003 Haifa, Israel
- Timur Madzhidov
  - Laboratory of Chemoinformatics and Molecular Modeling, A. M. Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008 Kazan, Russia
- Alexandre Varnek
  - Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
9. Rayevsky AV, Poturai AS, Kravets IO, Pashenko AE, Borisova TA, Tolstanova GM, Volochnyuk DM, Borysko PO, Vadzyuk OB, Alieksieieva DO, Zabolotna Y, Klimchuk O, Horvath D, Marcou G, Ryabukhin SV, Varnek A. In Vitro Evaluation of In Silico Screening Approaches in Search for Selective ACE2 Binding Chemical Probes. Molecules 2022; 27:molecules27175400. [PMID: 36080168 PMCID: PMC9458095 DOI: 10.3390/molecules27175400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2022] [Revised: 08/16/2022] [Accepted: 08/18/2022] [Indexed: 11/16/2022]
Abstract
New models for ACE2 receptor binding, based on QSAR and docking algorithms were developed, using XRD structural data and ChEMBL 26 database hits as training sets. The selectivity of the potential ACE2-binding ligands towards Neprilysin (NEP) and ACE was evaluated. The Enamine screening collection (3.2 million compounds) was virtually screened according to the above models, in order to find possible ACE2-chemical probes, useful for the study of SARS-CoV2-induced neurological disorders. An enzymology inhibition assay for ACE2 was optimized, and the combined diversified set of predicted selective ACE2-binding molecules from QSAR modeling, docking, and ultrafast docking was screened in vitro. The in vitro hits included two novel chemotypes suitable for further optimization.
Affiliation(s)
- Alexey V. Rayevsky
  - Enamine Ltd., 78 Chervonotkatska Street, 02660 Kyiv, Ukraine
  - Institute of Food Biotechnology and Genomics, National Academy of Sciences of Ukraine, 2a Osipovskogo Street, 04123 Kyiv, Ukraine
- Iryna O. Kravets
  - Enamine Ltd., 78 Chervonotkatska Street, 02660 Kyiv, Ukraine
  - Chemspace LLC, 85 Chervonotkatska Street, 02094 Kyiv, Ukraine
- Alexander E. Pashenko
  - Enamine Ltd., 78 Chervonotkatska Street, 02660 Kyiv, Ukraine
  - Educational and Scientific Institute of High Technologies, Taras Shevchenko National University of Kyiv, 60 Volodymyrska Street, 01033 Kyiv, Ukraine
  - Institute of Organic Chemistry, National Academy of Sciences of Ukraine, 5 Murmanska Street, 03028 Kyiv, Ukraine
- Tatiana A. Borisova
  - Palladin Institute of Biochemistry of the National Academy of Sciences of Ukraine, 9 Leontovitcha Street, 01054 Kyiv, Ukraine
- Ganna M. Tolstanova
  - Educational and Scientific Institute of High Technologies, Taras Shevchenko National University of Kyiv, 60 Volodymyrska Street, 01033 Kyiv, Ukraine
- Dmitriy M. Volochnyuk
  - Enamine Ltd., 78 Chervonotkatska Street, 02660 Kyiv, Ukraine
  - Chemspace LLC, 85 Chervonotkatska Street, 02094 Kyiv, Ukraine
  - Educational and Scientific Institute of High Technologies, Taras Shevchenko National University of Kyiv, 60 Volodymyrska Street, 01033 Kyiv, Ukraine
- Olga B. Vadzyuk
  - Enamine Ltd., 78 Chervonotkatska Street, 02660 Kyiv, Ukraine
- Yuliana Zabolotna
  - Laboratory of Chemoinformatics, University of Strasbourg, 4, rue B. Pascal, 67081 Strasbourg, France
- Olga Klimchuk
  - Laboratory of Chemoinformatics, University of Strasbourg, 4, rue B. Pascal, 67081 Strasbourg, France
- Dragos Horvath
  - Laboratory of Chemoinformatics, University of Strasbourg, 4, rue B. Pascal, 67081 Strasbourg, France
- Gilles Marcou
  - Laboratory of Chemoinformatics, University of Strasbourg, 4, rue B. Pascal, 67081 Strasbourg, France
- Sergey V. Ryabukhin
  - Enamine Ltd., 78 Chervonotkatska Street, 02660 Kyiv, Ukraine
  - Educational and Scientific Institute of High Technologies, Taras Shevchenko National University of Kyiv, 60 Volodymyrska Street, 01033 Kyiv, Ukraine
  - Institute of Organic Chemistry, National Academy of Sciences of Ukraine, 5 Murmanska Street, 03028 Kyiv, Ukraine
  - Correspondence: (S.V.R.); (A.V.)
- Alexandre Varnek
  - Laboratory of Chemoinformatics, University of Strasbourg, 4, rue B. Pascal, 67081 Strasbourg, France
  - Correspondence: (S.V.R.); (A.V.)
10. Grebner C, Matter H, Hessler G. Artificial Intelligence in Compound Design. Methods Mol Biol 2021; 2390:349-382. [PMID: 34731477 DOI: 10.1007/978-1-0716-1787-8_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2023]
Abstract
Artificial intelligence has seen an incredibly fast development in recent years. Many novel technologies for property prediction of drug molecules as well as for the design of novel molecules were introduced by different research groups. These artificial intelligence-based design methods can be applied for suggesting novel chemical motifs in lead generation or scaffold hopping as well as for optimization of desired property profiles during lead optimization. In lead generation, broad sampling of the chemical space for identification of novel motifs is required, while in the lead optimization phase, a detailed exploration of the chemical neighborhood of a current lead series is advantageous. These different requirements for successful design outcomes render different combinations of artificial intelligence technologies useful. Overall, we observe that a combination of different approaches with tailored scoring and evaluation schemes appears beneficial for efficient artificial intelligence-based compound design.
Affiliation(s)
- Christoph Grebner
  - Sanofi-Aventis Deutschland GmbH, R&D, Integrated Drug Discovery, Frankfurt am Main, Germany
- Hans Matter
  - Sanofi-Aventis Deutschland GmbH, R&D, Integrated Drug Discovery, Frankfurt am Main, Germany
- Gerhard Hessler
  - Sanofi-Aventis Deutschland GmbH, R&D, Integrated Drug Discovery, Frankfurt am Main, Germany
11. Gantzer P, Creton B, Nieto-Draghi C. Comparisons of Molecular Structure Generation Methods Based on Fragment Assemblies and Genetic Graphs. J Chem Inf Model 2021; 61:4245-4258. [PMID: 34405674 DOI: 10.1021/acs.jcim.1c00803] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
Abstract
The use of quantitative structure-property relationships (QSPRs) has helped in predicting molecular properties for several decades, while the automatic design of new molecular structures is still emerging. The choice of algorithms to generate molecules is not obvious and is related to several factors such as the desired chemical diversity (according to an initial dataset's content) and the level of construction (the use of atoms, fragments, pattern-based methods). In this paper, we address the problem of molecular structure generation by revisiting two approaches: fragment-based methods (FMs) and genetic-based methods (GMs). We define a set of indices to compare generation methods on a specific task. New indices inform about the explored data space (coverage), compare how the data space is explored (representativeness), and quantify the ratio of molecules satisfying requirements (generation specificity) without the use of a database composed of real chemicals as a reference. These indices were employed to compare generations of molecules fulfilling the desired property criterion, evaluated by QSPR.
Affiliation(s)
- Philippe Gantzer
  - IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, 92852 Rueil-Malmaison, France
- Benoit Creton
  - IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, 92852 Rueil-Malmaison, France
- Carlos Nieto-Draghi
  - IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, 92852 Rueil-Malmaison, France
|
12
|
Zhang X, Zhao P, Wang Z, Xu X, Liu G, Tang Y, Li W. In Silico Prediction of CYP2C8 Inhibition with Machine-Learning Methods. Chem Res Toxicol 2021; 34:1850-1859. [PMID: 34255486 DOI: 10.1021/acs.chemrestox.1c00078] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/01/2023]
Abstract
Cytochrome P450 2C8 (CYP2C8) is a major drug-metabolizing enzyme in humans and is responsible for the metabolism of ∼5% of drugs in clinical use. Thus, inhibition of CYP2C8, which can cause adverse drug events, cannot be neglected. The FDA's industry guidance on in vitro drug interaction studies also states that investigational drugs should be assessed as potential CYP2C8 inhibitors before clinical trials. However, current studies mainly focus on predicting inhibitors of other major P450 enzymes, and the importance of CYP2C8 inhibition has been overlooked. Therefore, there is a need to develop models for identifying potential CYP2C8 inhibition. In this study, in silico classification models for predicting CYP2C8 inhibition were built using five machine-learning methods combined with nine molecular fingerprints. The models were evaluated on test and external validation sets. The best model had AUC values of 0.85 and 0.90 for the test and external validation sets, respectively. The applicability domain was analyzed based on molecular similarity and was found to improve prediction accuracy. Furthermore, several representative privileged substructures, such as 1H-benzo[d]imidazole, 1-phenyl-1H-pyrazole, and quinoline, were identified by information gain and substructure frequency analysis. Overall, our results should be helpful for the prediction of CYP2C8 inhibition.
Affiliation(s)
- Xiaoxiao Zhang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Piaopiao Zhao
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Zhiyuan Wang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Xuan Xu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Guixia Liu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Yun Tang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Weihua Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| |
|
13
|
Baybekov S, Marcou G, Ramos P, Saurel O, Galzi JL, Varnek A. DMSO Solubility Assessment for Fragment-Based Screening. Molecules 2021; 26:3950. [PMID: 34203441 PMCID: PMC8271413 DOI: 10.3390/molecules26133950] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/04/2021] [Revised: 06/23/2021] [Accepted: 06/23/2021] [Indexed: 11/16/2022] Open
Abstract
In this paper, we report comprehensive experimental and chemoinformatics analyses of the solubility of small organic molecules ("fragments") in dimethyl sulfoxide (DMSO) in the context of their ability to be tested in screening experiments. Here, DMSO solubility of 939 fragments has been measured experimentally using an NMR technique. A Support Vector Classification model was built on the obtained data using the ISIDA fragment descriptors. The analysis revealed 34 outliers: experimental issues were retrospectively identified for 28 of them. The updated model performs well in 5-fold cross-validation (balanced accuracy = 0.78). The datasets are available on the Zenodo platform (DOI:10.5281/zenodo.4767511) and the model is available on the website of the Laboratory of Chemoinformatics.
Affiliation(s)
- Shamkhal Baybekov
- Laboratoire de Chémoinformatique UMR 7140 CNRS, Institut Le Bel, University of Strasbourg, 4 Rue Blaise Pascal, 67081 Strasbourg, France; (S.B.); (G.M.)
| | - Gilles Marcou
- Laboratoire de Chémoinformatique UMR 7140 CNRS, Institut Le Bel, University of Strasbourg, 4 Rue Blaise Pascal, 67081 Strasbourg, France; (S.B.); (G.M.)
| | - Pascal Ramos
- Institut de Pharmacologie et de Biologie Structurale, Université de Toulouse CNRS, UPS, 205 Route de Narbonne, 31077 Toulouse, France; (P.R.); (O.S.)
| | - Olivier Saurel
- Institut de Pharmacologie et de Biologie Structurale, Université de Toulouse CNRS, UPS, 205 Route de Narbonne, 31077 Toulouse, France; (P.R.); (O.S.)
| | - Jean-Luc Galzi
- Biotechnologie et Signalisation Cellulaire UMR 7242 CNRS, École Supérieure de Biotechnologie de Strasbourg, University of Strasbourg, 300 Boulevard Sébastien Brant, 67412 Illkirch, France;
- ChemBioFrance—Chimiothèque Nationale UAR3035, 8 Rue de L’école Normale, CEDEX 05, 34296 Montpellier, France
| | - Alexandre Varnek
- Laboratoire de Chémoinformatique UMR 7140 CNRS, Institut Le Bel, University of Strasbourg, 4 Rue Blaise Pascal, 67081 Strasbourg, France; (S.B.); (G.M.)
| |
|
14
|
Yoshihama H, Kaneko H. Design of thermoelectric materials with high electrical conductivity, high Seebeck coefficient, and low thermal conductivity. Analytical Science Advances 2021; 2:289-294. [PMID: 38716157 PMCID: PMC10989581 DOI: 10.1002/ansa.202000114] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/16/2020] [Revised: 11/25/2020] [Accepted: 11/26/2020] [Indexed: 08/18/2024]
Abstract
Thermoelectric materials with a high Seebeck coefficient, high electrical conductivity, and low thermal conductivity are required to directly and efficiently convert unused heat into electricity. In this study, we construct models predicting the Seebeck coefficient, electrical conductivity, and thermal conductivity using existing material databases. In addition to the ratios of atoms in the crystals and temperature at which the materials are used, the values from the X-ray diffraction (XRD) spectra were used as inputs to represent the crystal structure of the materials. It was confirmed that the constructed models could predict the properties with high accuracy using the X-ray diffraction values. Additionally, using the constructed models, we succeeded in proposing promising new candidate materials with high Seebeck coefficients, high electric conductivities, and low thermal conductivities.
Affiliation(s)
- Hiroki Yoshihama
- Department of Applied Chemistry, School of Science and Technology, Meiji University, Kawasaki, Kanagawa, Japan
| | - Hiromasa Kaneko
- Department of Applied Chemistry, School of Science and Technology, Meiji University, Kawasaki, Kanagawa, Japan
| |
|
15
|
Matsumoto K, Miyao T, Funatsu K. Ranking-Oriented Quantitative Structure-Activity Relationship Modeling Combined with Assay-Wise Data Integration. ACS Omega 2021; 6:11964-11973. [PMID: 34056351 PMCID: PMC8154010 DOI: 10.1021/acsomega.1c00463] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/26/2021] [Accepted: 04/21/2021] [Indexed: 05/15/2023]
Abstract
In ligand-based drug design, quantitative structure-activity relationship (QSAR) models play an important role in activity prediction. One of the major end points of QSAR models is half-maximal inhibitory concentration (IC50). Experimental IC50 data from various research groups have been accumulated in publicly accessible databases, providing an opportunity for us to use such data in predictive QSAR models. In this study, we focused on using a ranking-oriented QSAR model as a predictive model because relative potency strength within the same assay is solid information that is not based on any mechanical assumptions. We conducted rigorous validation using the ChEMBL database and previously reported data sets. Ranking support vector machine (ranking-SVM) models trained on compounds from similar assays were as good as support vector regression (SVR) with the Tanimoto kernel trained on compounds from all the assays. As effective ways of data integration, for ranking-SVM, integrated compounds should be selected from only similar assays in terms of compounds. For SVR with the Tanimoto kernel, entire compounds from different assays can be incorporated.
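As a rough illustration of the Tanimoto kernel mentioned in this abstract (a sketch, not the authors' implementation), the code below computes Tanimoto similarity between binary fingerprints represented as sets of on-bit indices and builds the kernel (Gram) matrix that a kernel method such as SVR could take as a precomputed kernel. The function names and the toy fingerprints are hypothetical.

```python
from typing import FrozenSet, List, Sequence


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity |A & B| / |A | B| between two sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        # Assumed convention: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / union


def tanimoto_kernel_matrix(fps: Sequence[FrozenSet[int]]) -> List[List[float]]:
    """Pairwise Tanimoto similarities, usable as a precomputed kernel matrix."""
    return [[tanimoto(x, y) for y in fps] for x in fps]


# Toy fingerprints (hypothetical on-bit indices, not real molecules).
fingerprints = [
    frozenset({1, 2, 3}),
    frozenset({2, 3, 4}),
    frozenset({7, 8}),
]
kernel = tanimoto_kernel_matrix(fingerprints)
```

The resulting matrix is symmetric with ones on the diagonal, which is what a precomputed-kernel interface typically expects.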
Affiliation(s)
- Katsuhisa Matsumoto
- Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
| | - Tomoyuki Miyao
- Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
- Data Science Center, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara, 630-0192, Japan
| | - Kimito Funatsu
- Data Science Center, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara, 630-0192, Japan
- Department of Chemical System Engineering, School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
- Phone: +81-3-5841-7751. Fax: +81-3-5841-7771
| |
|
16
|
Morger A, Svensson F, Arvidsson McShane S, Gauraha N, Norinder U, Spjuth O, Volkamer A. Assessing the calibration in toxicological in vitro models with conformal prediction. J Cheminform 2021; 13:35. [PMID: 33926567 PMCID: PMC8082859 DOI: 10.1186/s13321-021-00511-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/07/2021] [Accepted: 04/10/2021] [Indexed: 11/11/2022] Open
Abstract
Machine learning methods are widely used in drug discovery and toxicity prediction. While showing overall good performance in cross-validation studies, their predictive power (often) drops in cases where the query samples have drifted from the training data’s descriptor space. Thus, the assumption for applying machine learning algorithms, that training and test data stem from the same distribution, might not always be fulfilled. In this work, conformal prediction is used to assess the calibration of the models. Deviations from the expected error may indicate that training and test data originate from different distributions. Exemplified on the Tox21 datasets, composed of chronologically released Tox21Train, Tox21Test and Tox21Score subsets, we observed that while internally valid models could be trained using cross-validation on Tox21Train, predictions on the external Tox21Score data resulted in higher error rates than expected. To improve the prediction on the external sets, a strategy exchanging the calibration set with more recent data, such as Tox21Test, has successfully been introduced. We conclude that conformal prediction can be used to diagnose data drifts and other issues related to model calibration. The proposed improvement strategy—exchanging the calibration data only—is convenient as it does not require retraining of the underlying model.
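The idea of exchanging only the calibration set can be sketched with a minimal inductive conformal classifier. The code below is an illustrative assumption, not the authors' pipeline: nonconformity scores are taken as given (for example, one minus the predicted probability of a label from an already trained model), calibration is done per label (class-conditional), and all names and numbers are hypothetical. Because the underlying model is never touched, swapping in scores from a more recent calibration set changes the p-values without retraining.

```python
from typing import Dict, Sequence, Set


def conformal_p_value(calibration_scores: Sequence[float], test_score: float) -> float:
    """Fraction of calibration scores at least as nonconforming as the test score.

    The +1 terms count the test example itself, as in standard inductive
    conformal prediction.
    """
    n_at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (n_at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(
    calibration_by_label: Dict[str, Sequence[float]],
    test_scores_by_label: Dict[str, float],
    significance: float,
) -> Set[str]:
    """Labels whose p-value exceeds the chosen significance level."""
    return {
        label
        for label, score in test_scores_by_label.items()
        if conformal_p_value(calibration_by_label[label], score) > significance
    }


# Hypothetical nonconformity scores from an older and a more recent calibration set.
old_calibration = {"active": [0.1, 0.2, 0.3, 0.4], "inactive": [0.1, 0.2, 0.3, 0.4]}
new_calibration = {"active": [0.5, 0.6, 0.7, 0.8], "inactive": [0.1, 0.2, 0.3, 0.4]}
query = {"active": 0.45, "inactive": 0.9}

labels_old = prediction_set(old_calibration, query, significance=0.25)
labels_new = prediction_set(new_calibration, query, significance=0.25)
```

With the older calibration scores the query gets an empty prediction set, while the updated calibration scores admit the "active" label, which is the kind of shift the recalibration strategy relies on.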
Affiliation(s)
- Andrea Morger
- In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin, Berlin, Germany
| | - Fredrik Svensson
- Alzheimer's Research UK UCL Drug Discovery Institute, London, WC1E 6BT, UK
| | - Staffan Arvidsson McShane
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, 751 24, Uppsala, Sweden
| | - Niharika Gauraha
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, 751 24, Uppsala, Sweden.,Division of Computational Science and Technology, KTH, 100 44, Stockholm, Sweden
| | - Ulf Norinder
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, 751 24, Uppsala, Sweden.,Dept. Computer and Systems Sciences, Stockholm University, Box 7003, 164 07, Kista, Sweden.,MTM Research Centre, School of Science and Technology, Örebro University, 70 182, Örebro, Sweden
| | - Ola Spjuth
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, 751 24, Uppsala, Sweden
| | - Andrea Volkamer
- In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin, Berlin, Germany.
| |
|
17
|
Liu X, Zhang H, Xue Q, Pan W, Zhang A. In silico health effect prioritization of environmental chemicals through transcriptomics data exploration from a chemo-centric view. The Science of the Total Environment 2021; 762:143082. [PMID: 33143927 DOI: 10.1016/j.scitotenv.2020.143082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2020] [Revised: 10/11/2020] [Accepted: 10/11/2020] [Indexed: 06/11/2023]
Abstract
With the explosive growth in the number of synthetic compounds, the health effects caused by exogenous chemical exposure have attracted increasing public attention. Predicting health effects remains an open-ended task. The collective resource of transcriptomics data offers an opportunity to understand and identify the multiple health effects of small molecules. Inspired by the fact that environmental chemicals of high health risk frequently share both similar gene expression profiles and common structural features with certain drugs, we propose a novel computational effect prioritization method for environmental chemicals through transcriptomics data exploration from a chemo-centric view. Specifically, non-negative matrix factorization (NMF) was adopted to obtain the association network linking structural features with the transcriptomics characteristics of drugs with specific effects. The model yields 13 pivotal types of effects, so-called components, that represent drug categories with common chemo- and geno-type features. Moreover, the established model effectively prioritizes potential toxic effects for external chemicals from the Endocrine Disruptor Screening Program (EDSP) for their potential estrogenicity and other verified risks. Even if only the highest priority is set for the estrogenic effect, the precision and recall reach 0.76 and 0.77, respectively, for these chemicals. Our work shows that potential toxic effects of environmental chemicals can be profiled simultaneously using both chemical and omics data.
Affiliation(s)
- Xian Liu
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100190, PR China.
| | - Huazhou Zhang
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100190, PR China.
| | - Qiao Xue
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China.
| | - Wenxiao Pan
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China.
| | - Aiqian Zhang
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100190, PR China; Institute of Environment and Health, Jianghan University, Wuhan 430056, PR China.
| |
|
18
|
Horvath D, Marcou G, Varnek A. Trustworthiness, the Key to Grid-Based Map-Driven Predictive Model Enhancement and Applicability Domain Control. J Chem Inf Model 2020; 60:6020-6032. [DOI: 10.1021/acs.jcim.0c00998] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- Dragos Horvath
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
| | - Gilles Marcou
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
| | - Alexandre Varnek
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
| |
|
19
|
Kurosaki K, Wu R, Uesawa Y. A Toxicity Prediction Tool for Potential Agonist/Antagonist Activities in Molecular Initiating Events Based on Chemical Structures. Int J Mol Sci 2020; 21:7853. [PMID: 33113912 PMCID: PMC7660166 DOI: 10.3390/ijms21217853] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 07/31/2020] [Revised: 10/07/2020] [Accepted: 10/21/2020] [Indexed: 12/15/2022] Open
Abstract
Because the health effects of many compounds are unknown, regulatory toxicology must often rely on the development of quantitative structure-activity relationship (QSAR) models to efficiently discover molecular initiating events (MIEs) in the adverse-outcome pathway (AOP) framework. However, the QSAR models used in numerous toxicity prediction studies are publicly unavailable, and thus, they are challenging to use in practical applications. Approaches that simultaneously identify the various toxic responses induced by a compound are also scarce. The present study develops Toxicity Predictor, a web application tool that comprehensively identifies potential MIEs. Using various chemicals in the Toxicology in the 21st Century (Tox21) 10K library, we identified potential endocrine-disrupting chemicals (EDCs) using a machine-learning approach. Based on the optimized three-dimensional (3D) molecular structures and XGBoost algorithm, we established molecular descriptors for QSAR models. Their predictive performances and applicability domain were evaluated and applied to Toxicity Predictor. The prediction performance of the constructed models matched that of the top model in the Tox21 Data Challenge 2014. These advanced prediction results for MIEs are freely available on the Internet.
|
20
|
Rakhimbekova A, Madzhidov TI, Nugmanov RI, Gimadiev TR, Baskin II, Varnek A. Comprehensive Analysis of Applicability Domains of QSPR Models for Chemical Reactions. Int J Mol Sci 2020; 21:E5542. [PMID: 32756326 PMCID: PMC7432167 DOI: 10.3390/ijms21155542] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 07/13/2020] [Revised: 07/27/2020] [Accepted: 07/30/2020] [Indexed: 01/28/2023] Open
Abstract
The definition of a model's applicability domain (AD) is an active research topic in chemoinformatics. Although many AD definitions for models predicting properties of molecules (Quantitative Structure-Activity/Property Relationship (QSAR/QSPR) models) have been described in the literature, none for chemical reactions (Quantitative Reaction-Property Relationships (QRPR)) has been reported to date. A chemical reaction is a much more complex object than an individual molecule, and its yield and its thermodynamic and kinetic characteristics depend not only on the structures of reactants and products but also on experimental conditions. The performance of QRPR models largely depends on the way the chemical transformation is encoded. In this study, various AD definition methods extensively used in QSAR/QSPR studies of individual molecules, as well as several novel approaches suggested in this work for reactions, were benchmarked on several reaction datasets. Their ability to exclude wrong reaction types, increase coverage, improve model performance, and detect Y-outliers was tested. As a result, several "best" AD definitions for QRPR models predicting reaction characteristics have been identified and tested on a previously published external dataset with a clear AD definition problem.
Affiliation(s)
- Assima Rakhimbekova
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, 420008 Kazan, Russia; (A.R.); (R.I.N.); (I.I.B.)
| | - Timur I. Madzhidov
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, 420008 Kazan, Russia; (A.R.); (R.I.N.); (I.I.B.)
| | - Ramil I. Nugmanov
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, 420008 Kazan, Russia; (A.R.); (R.I.N.); (I.I.B.)
| | - Timur R. Gimadiev
- Institute for Chemical Reaction Design and Discovery, Hokkaido University, Sapporo 001-0021, Japan;
| | - Igor I. Baskin
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, 420008 Kazan, Russia; (A.R.); (R.I.N.); (I.I.B.)
- Faculty of Physics, Moscow State University, 119234 Moscow, Russia
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 67000 Strasbourg, France
| | - Alexandre Varnek
- Institute for Chemical Reaction Design and Discovery, Hokkaido University, Sapporo 001-0021, Japan;
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 67000 Strasbourg, France
| |
|
21
|
Yan X, Sedykh A, Wang W, Yan B, Zhu H. Construction of a web-based nanomaterial database by big data curation and modeling friendly nanostructure annotations. Nat Commun 2020; 11:2519. [PMID: 32433469 PMCID: PMC7239871 DOI: 10.1038/s41467-020-16413-3] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 01/09/2020] [Accepted: 04/22/2020] [Indexed: 12/27/2022] Open
Abstract
Modern nanotechnology research has generated numerous experimental data for various nanomaterials. However, the few nanomaterial databases available are not suitable for modeling studies due to the way they are curated. Here, we report the construction of a large nanomaterial database containing annotated nanostructures suited for modeling research. The database, which is publicly available through http://www.pubvinas.com/, contains 705 unique nanomaterials covering 11 material types. Each nanomaterial has up to six physicochemical properties and/or bioactivities, resulting in more than ten endpoints in the database. All the nanostructures are annotated and transformed into protein data bank files, which are downloadable by researchers worldwide. Furthermore, the nanostructure annotation procedure generates 2142 nanodescriptors for all nanomaterials for machine learning purposes, which are also available through the portal. This database provides a public resource for data-driven nanoinformatics modeling research aimed at rational nanomaterial design and other areas of modern computational nanotechnology.
Affiliation(s)
- Xiliang Yan
- Institute of Environmental Research at Greater Bay, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou, 510006, China.,The Rutgers Center for Computational and Integrative Biology, Camden, NJ, 08102, USA
| | - Alexander Sedykh
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ, 08102, USA.,Sciome, Research Triangle Park, North Carolina, 27709, USA
| | - Wenyi Wang
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ, 08102, USA
| | - Bing Yan
- Institute of Environmental Research at Greater Bay, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou, 510006, China.,School of Environmental Science and Engineering, Shandong University, Jinan, 250100, China.
| | - Hao Zhu
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ, 08102, USA.,Department of Chemistry, Rutgers University, Camden, NJ, 08102, USA.
| |
|
22
|
Mora JR, Marrero-Ponce Y, García-Jacas CR, Suarez Causado A. Ensemble Models Based on QuBiLS-MAS Features and Shallow Learning for the Prediction of Drug-Induced Liver Toxicity: Improving Deep Learning and Traditional Approaches. Chem Res Toxicol 2020; 33:1855-1873. [PMID: 32406679 DOI: 10.1021/acs.chemrestox.0c00030] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 12/16/2022]
Abstract
Drug-induced liver injury (DILI) is a key safety issue in the drug discovery pipeline and a regulatory concern. Thus, many in silico tools have been proposed to improve the hepatotoxicity prediction of organic-type chemicals. Here, classifiers for the prediction of DILI were developed by using QuBiLS-MAS 0-2.5D molecular descriptors and shallow machine learning techniques, on a training set composed of 1075 molecules. The best ensemble model built, E13, showed good statistical parameters for the learning series: accuracy = 0.840, sensitivity = 0.890, specificity = 0.761, Matthews correlation coefficient = 0.660, and area under the ROC curve = 0.904. The model was also satisfactorily evaluated with a Y-scrambling test, repeated k-fold cross-validation, and repeated k-holdout validation. In addition, an exhaustive external validation was carried out by using two test sets and five external test sets, with an average accuracy of 0.854 (±0.062) and a coverage of 98.4% according to its applicability domain. A statistical comparison of the performance of the E13 model against results and tools reported in the literature (e.g., Padel DDPredictor Software, Deep Learning DILIserver, and Vslead) was also performed. In general, E13 presented the best global performance in all experiments. The sum of ranking differences procedure provided a grouping pattern very similar to that of the M-ANOVA statistical analysis, where E13 was identified as the best model for DILI predictions. A noncommercial and fully cross-platform software for DILI prediction was also developed, which is freely available at http://tomocomd.com/apps/ptoxra.
This software was used for the screening of seven data sets, containing natural products, leads, toxic materials, and FDA approved drugs, to assess the usefulness of the QSAR models in the DILI labeling of organic substances; it was found that 50-92% of the evaluated molecules are positive-DILI compounds. All in all, it can be stated that the E13 model is a relevant method for the prediction of DILI risk in humans, as it shows the best results among all of the methods analyzed.
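For readers unfamiliar with the classification statistics quoted in this abstract, the sketch below shows how accuracy, sensitivity, specificity, balanced accuracy, and the Matthews correlation coefficient follow from a binary confusion matrix. The function name and the counts are hypothetical and are not taken from the paper; the code assumes both classes are present so that no ratio has a zero denominator.

```python
import math
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Common binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    balanced_accuracy = (sensitivity + specificity) / 2
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy for imbalanced toxicity datasets.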
Affiliation(s)
- Jose R Mora
- Grupo de Química Computacional y Teórica (QCT-USFQ), Departamento de Ingeniería Química, Universidad San Francisco de Quito (USFQ), Quito 17-1200-841, Ecuador.,Instituto de Simulación Computacional (ISC-USFQ), Universidad San Francisco de Quito (USFQ), Diego de Robles y Vía Interoceánica, Quito 17-1200-841, Ecuador
| | - Yovani Marrero-Ponce
- Instituto de Simulación Computacional (ISC-USFQ), Universidad San Francisco de Quito (USFQ), Diego de Robles y Vía Interoceánica, Quito 17-1200-841, Ecuador.,Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas, and Instituto de Simulación Computacional (ISC-USFQ), Universidad San Francisco de Quito (USFQ), Diego de Robles y vía Interoceánica, Quito, Pichincha 170157, Ecuador
| | - César R García-Jacas
- Cátedras Conacyt-Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California 22860, México
| | - Amileth Suarez Causado
- Grupo de Investigación Prometeus & Biomedicina Aplicada a las Ciencias Clínicas, Área de Bioquímica, Campus de Zaragocilla, Facultad de Medicina, Universidad de Cartagena, Cartagena de Indias 130001, Colombia
| |
|
23
|
Toropov AA, Toropova AP, Marzo M, Carnesecchi E, Selvestrel G, Benfenati E. Pesticides, cosmetics, drugs: identical and opposite influences of various molecular features as measures of endpoints similarity and dissimilarity. Mol Divers 2020; 25:1137-1144. [PMID: 32323128 DOI: 10.1007/s11030-020-10085-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/20/2020] [Accepted: 04/06/2020] [Indexed: 11/26/2022]
Abstract
Similarity is an important concept in the natural sciences. A measure of similarity for a group of various biochemical endpoints is suggested. The list of examined endpoints contains (1) toxicity of pesticides towards rainbow trout; (2) human skin sensitization; (3) mutagenicity; (4) toxicity of psychotropic drugs; and (5) anti-HIV activity. Further application and development of the suggested approach are discussed. In particular, the concept of the similarity (dissimilarity) of endpoints can act as a "useful bridge" between quantitative structure-property/activity relationships (QSPRs/QSARs) and the read-across technique.
Affiliation(s)
- Andrey A Toropov
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Science, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156, Milan, Italy
| | - Alla P Toropova
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Science, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156, Milan, Italy.
| | - Marco Marzo
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Science, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156, Milan, Italy
| | - Edoardo Carnesecchi
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Science, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156, Milan, Italy
- Institute for Risk Assessment Sciences (IRAS), Utrecht University, P.O. Box 80177, 3508 TD, Utrecht, The Netherlands
| | - Gianluca Selvestrel
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Science, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156, Milan, Italy
| | - Emilio Benfenati
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Science, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156, Milan, Italy
| |
|
24
|
Kato H. Computational prediction of cytochrome P450 inhibition and induction. Drug Metab Pharmacokinet 2019; 35:30-44. [PMID: 31902468 DOI: 10.1016/j.dmpk.2019.11.006] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 07/23/2019] [Revised: 10/27/2019] [Accepted: 11/17/2019] [Indexed: 12/14/2022]
Abstract
Cytochrome P450 (CYP) enzymes play an important role in the phase I metabolism of many xenobiotics. Most drug-drug interactions (DDIs) associated with CYP are caused by either CYP inhibition or induction. The early detection of potential DDIs is highly desirable in the pharmaceutical industry because DDIs can cause serious adverse events, which can lead to poor patient health and drug development failures. Recently, many computational studies predicting CYP inhibition and induction have been reported. The current computational modeling approaches for CYP metabolism are classified as ligand- and structure-based; various techniques, such as quantitative structure-activity relationships, machine learning, docking, and molecular dynamics simulation, are involved in both approaches. Recently, combining these two approaches has resulted in improvements in the prediction accuracy of DDIs. In this review, we present important, recent developments in the computational prediction of the inhibition of four clinically crucial CYP isoforms (CYP1A2, 2C9, 2D6, and 3A4) and three nuclear receptors (aryl hydrocarbon receptor, constitutive androstane receptor, and pregnane X receptor) involved in the induction of CYP1A2, 2B6, and 3A4, respectively.
Affiliation(s)
- Harutoshi Kato
- DMPK Research Laboratories, Mitsubishi Tanabe Pharma Corporation, Aoba-ku, Yokohama-shi, 227-0033, Japan.
|
25
|
Shiri F, Bakhshayesh S, Ghasemi JB. Computer-aided molecular design of (E)-N-Aryl-2-ethene-sulfonamide analogues as microtubule targeted agents in prostate cancer. Arab J Chem 2019. [DOI: 10.1016/j.arabjc.2014.11.063] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 01/24/2023] Open
|
26
|
Toropova AP, Toropov AA. Whether the Validation of the Predictive Potential of Toxicity Models is a Solved Task? Curr Top Med Chem 2019; 19:2643-2657. [PMID: 31702504 DOI: 10.2174/1568026619666191105111817] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/18/2019] [Revised: 09/02/2019] [Accepted: 09/04/2019] [Indexed: 12/23/2022]
Abstract
Different kinds of biological activity are determined by complex biochemical interactions, which can be regarded as a "mathematical function" not only of the molecular structure but also of additional circumstances, such as physicochemical conditions and interactions via energy and information effects between a substance and organisms, organs, or cells. These circumstances make the prediction of biochemical endpoints highly complex, since the "details" of the corresponding phenomena are practically unavailable for accurate registration and analysis. Researchers cannot carry out and analyse, by direct experiment, all possible biochemical interactions that define toxicological or therapeutically attractive effects. Consequently, a compromise, i.e. the development of predictive models of the above phenomena, becomes necessary. However, the estimation of the predictive potential of these models remains a task that is solved only partially. This mini-review presents a collection of approaches to this task. Two special statistical indices are proposed that may serve as measures of the predictive potential of models: (i) the Index of Ideality of Correlation and (ii) the Correlation Contradiction Index.
Affiliation(s)
- Alla P Toropova
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via La Masa 19, 20156 Milano, Italy
- Andrey A Toropov
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via La Masa 19, 20156 Milano, Italy
|
27
|
García-Jacas CR, Marrero-Ponce Y, Cortés-Guzmán F, Suárez-Lezcano J, Martinez-Rios FO, García-González LA, Pupo-Meriño M, Martinez-Mayorga K. Enhancing Acute Oral Toxicity Predictions by using Consensus Modeling and Algebraic Form-Based 0D-to-2D Molecular Encodes. Chem Res Toxicol 2019; 32:1178-1192. [PMID: 31066547 DOI: 10.1021/acs.chemrestox.9b00011] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 12/13/2022]
Abstract
Quantitative structure-activity relationships (QSAR) are introduced to predict acute oral toxicity (AOT), using the QuBiLS-MAS (acronym for quadratic, bilinear and N-Linear maps based on graph-theoretic electronic-density matrices and atomic weightings) framework for the molecular encoding. Three training sets were employed to build the models: the EPA training set (5931 compounds), the EPA-full training set (7413 compounds), and the Zhu training set (10 152 compounds). Additionally, the EPA test set (1482 compounds) was used to validate the QSAR models built on the EPA training set, while the ProTox (425 compounds) and T3DB (284 compounds) external sets were employed to assess all the models. The k-nearest neighbor, multilayer perceptron, random forest, and support vector machine procedures were employed to build several base (individual) models. The base models with R_EPA-training ≥ 0.75 (R = correlation coefficient) and MAE_EPA-training ≤ 0.5 (MAE = mean absolute error) were retained to build consensus models. As a result, two consensus models based on the minimum operator, denoted M19 and M22, and a consensus model based on the weighted average operator, denoted M24, were selected as the best ones for each training set considered. According to the applicability domain (AD) analysis performed, model M19 (built on the EPA training set) has MAE_test-AD = 0.4044, MAE_ProTox-AD = 0.4067 and MAE_T3DB-AD = 0.2586 on the EPA test set, ProTox external set, and T3DB external set, respectively; whereas model M22 (built on the EPA-full set) and model M24 (built on the Zhu set) present MAE_ProTox-AD = 0.3992 and MAE_T3DB-AD = 0.2286, and MAE_ProTox-AD = 0.3773 and MAE_T3DB-AD = 0.2471 on the two external sets, respectively. These outcomes were compared and statistically validated with respect to 14 QSAR methods (e.g., admetSAR, ProTox-II) from the literature. As a result, model M22 presents the best overall performance. In addition, a retrospective study on 261 drugs withdrawn due to their toxic/side effects was performed to assess the usefulness of prospectively using the proposed QSAR models in the labeling of chemicals. A comparison with the methods from the literature was also made. As a result, model M22 has the best ability to label a compound as toxic according to the Globally Harmonized System of Classification and Labeling of Chemicals. Therefore, it can be concluded that the proposed models, especially model M22, constitute prominent tools for studying AOT, providing the best results among all the methods examined. Freely available software was also developed for use in virtual screening tasks (http://tomocomd.com/apps/ptoxra).
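The consensus step described in this abstract can be illustrated with a minimal sketch: base models are filtered by the stated training thresholds (R ≥ 0.75, MAE ≤ 0.5) and their predictions are combined with either a minimum operator or a weighted average. The function names, the example statistics, the predictions, and the weights below are all hypothetical; the abstract does not say how the weights were chosen.

```python
def select_base_models(stats, r_min=0.75, mae_max=0.5):
    """Keep base models whose training R >= r_min and training MAE <= mae_max."""
    return [name for name, (r, mae) in stats.items() if r >= r_min and mae <= mae_max]


def consensus_min(preds):
    """Minimum operator: per compound, take the smallest base-model prediction."""
    return [min(values) for values in zip(*preds)]


def consensus_weighted_average(preds, weights):
    """Weighted average operator: per compound, average predictions with given weights."""
    total = sum(weights)
    return [sum(w * v for w, v in zip(weights, values)) / total for values in zip(*preds)]


# Hypothetical training statistics (R, MAE) for three base models.
stats = {"knn": (0.80, 0.45), "mlp": (0.70, 0.40), "rf": (0.85, 0.42)}
# Hypothetical predictions for three compounds.
predictions = {
    "knn": [2.0, 3.0, 1.5],
    "mlp": [2.5, 2.0, 1.0],
    "rf": [1.8, 3.2, 1.6],
}

kept = select_base_models(stats)              # "mlp" fails the R threshold
kept_preds = [predictions[name] for name in kept]
min_consensus = consensus_min(kept_preds)
avg_consensus = consensus_weighted_average(kept_preds, weights=[1.0, 3.0])
```

For a toxicity endpoint where lower values mean higher toxicity, a minimum operator is the conservative choice, which is one plausible reading of why it appears here; the abstract itself does not state the rationale.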
Affiliation(s)
- César R García-Jacas
- Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada, Ensenada, Baja California, México
- Yovani Marrero-Ponce
- Universidad San Francisco de Quito, Grupo de Medicina Molecular y Traslacional, Colegio de Ciencias de la Salud, Escuela de Medicina, Edificio de Especialidades Médicas, Quito, Pichincha, Ecuador
- Grupo de Investigación Ambiental, Programas Ambientales, Facultad de Ingenierías, Fundacion Universitaria Tecnologico Comfenalco-Cartagena, Cr44 DN 30 A, 91, Cartagena, Bolívar, Colombia
- Fernando Cortés-Guzmán
- Instituto de Química, Universidad Nacional Autónoma de México, Ciudad de México, México
- José Suárez-Lezcano
- Pontificia Universidad Católica del Ecuador Sede Esmeraldas, Esmeraldas, Ecuador
- Luis A García-González
- Grupo de Investigación de Bioinformática, Universidad de las Ciencias Informáticas, La Habana, Cuba
- Mario Pupo-Meriño
- Grupo de Investigación de Bioinformática, Universidad de las Ciencias Informáticas, La Habana, Cuba
|
28
|
Toropov AA, Raška I, Toropova AP, Raškova M, Veselinović AM, Veselinović JB. The study of the index of ideality of correlation as a new criterion of predictive potential of QSPR/QSAR-models. The Science of the Total Environment 2019; 659:1387-1394. [PMID: 31096349 DOI: 10.1016/j.scitotenv.2018.12.439] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 10/25/2018] [Revised: 12/14/2018] [Accepted: 12/28/2018] [Indexed: 06/09/2023]
Abstract
Acetylcholinesterase (AChE) inhibitors, dihydrofolate reductase (DHFR) inhibitors, toxicity in Tetrahymena pyriformis (TP), acute toxicity in fathead minnow (TFat), water solubility (WS), and acute aquatic toxicity in Daphnia magna (DM) are examined as endpoints to establish quantitative structure-property/activity relationships (QSPRs/QSARs). The Index of Ideality of Correlation (IIC) is a measure of predictive potential that has been studied in a few recent works. The comparison of models for the six endpoints above confirms that the index can be a useful tool for building and validating QSPR/QSAR models. All examined endpoints are important from an ecological point of view. The diversity of examined endpoints confirms that the IIC is a real criterion of the predictive potential of a model.
Affiliation(s)
- Andrey A Toropov
- Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milano, Italy
- Ivan Raška
- 3rd Medical Department, 1st Faculty of Medicine, Charles University in Prague, U Nemocnice 1, 12808 Prague 2, Czech Republic
- Alla P Toropova
- Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milano, Italy.
- Maria Raškova
- 3rd Medical Department, 1st Faculty of Medicine, Charles University in Prague, U Nemocnice 1, 12808 Prague 2, Czech Republic
|
29
|
Berenger F, Yamanishi Y. A Distance-Based Boolean Applicability Domain for Classification of High Throughput Screening Data. J Chem Inf Model 2019; 59:463-476. [PMID: 30567434 DOI: 10.1021/acs.jcim.8b00499] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 01/19/2023]
Abstract
In Quantitative Structure-Activity Relationship (QSAR) modeling, one must come up not only with an activity model but also with an applicability domain for that model. Some existing methods to create an applicability domain are complex, hard to implement, and/or difficult to interpret. Also, they often require the user to select a threshold value, or they embed an empirical constant. In this work, we propose a trivial-to-interpret and fully automatic Distance-Based Boolean Applicability Domain (DBBAD) algorithm for category QSAR. In retrospective experiments on High Throughput Screening data sets, this applicability domain improves the classification performance and early retrieval of support vector machine and random forest-based classifiers, while improving the scaffold diversity among top-ranked active molecules.
Affiliation(s)
- Francois Berenger
- Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Japan
- Yoshihiro Yamanishi
- Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Japan
- PRESTO, Japan Science and Technology Agency, Kawaguchi, Saitama 332-0012, Japan
|
30
|
Hanser T, Barber C, Guesné S, Marchaland JF, Werner S. Applicability Domain: Towards a More Formal Framework to Express the Applicability of a Model and the Confidence in Individual Predictions. Challenges and Advances in Computational Chemistry and Physics 2019. [DOI: 10.1007/978-3-030-16443-0_11] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
|
31
|
Chen Y, Yang H, Wu Z, Liu G, Tang Y, Li W. Prediction of Farnesoid X Receptor Disruptors with Machine Learning Methods. Chem Res Toxicol 2018; 31:1128-1137. [DOI: 10.1021/acs.chemrestox.8b00162] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 12/12/2022]
Affiliation(s)
- Yue Chen
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Hongbin Yang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Zengrui Wu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Guixia Liu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Yun Tang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Weihua Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
|
32
|
Ruiz IL, Gómez-Nieto MÁ. Study of the Applicability Domain of the QSAR Classification Models by Means of the Rivality and Modelability Indexes. Molecules 2018; 23:2756. [PMID: 30356020 PMCID: PMC6278359 DOI: 10.3390/molecules23112756] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 09/26/2018] [Revised: 10/14/2018] [Accepted: 10/22/2018] [Indexed: 11/30/2022] Open
Abstract
The reliability of a QSAR classification model depends on its capacity to make confident predictions for new compounds not considered in the building of the model. The results of this external validation process show the applicability domain (AD) of the QSAR model and, therefore, the robustness of the model in predicting the property/activity of new molecules. In this paper we propose the use of the rivality and modelability indexes to study whether a dataset can be correctly modeled by a QSAR algorithm and to predict the reliability of the built model when predicting the property/activity of new molecules. The calculation of these indexes has a very low computational cost and does not require building a model, which makes them good tools for analyzing datasets in the first stages of building QSAR classification models. In our study, we selected two benchmark datasets with a similar number of molecules but very different modelability, and we corroborated the predictive capacity of the rivality and modelability indexes for classification models built using Support Vector Machine and Random Forest algorithms with 5-fold cross-validation and leave-one-out techniques. The results show the excellent ability of both indexes to predict outliers and the applicability domain of the QSAR classification models. In all cases, these values accurately predicted the statistical parameters of the QSAR models generated by the algorithms.
Affiliation(s)
- Irene Luque Ruiz
- Department of Computing and Numerical Analysis, Campus Universitario de Rabanales, Albert Einstein Building, University of Córdoba, E-14071 Córdoba, Spain.
- Miguel Ángel Gómez-Nieto
- Department of Computing and Numerical Analysis, Campus Universitario de Rabanales, Albert Einstein Building, University of Córdoba, E-14071 Córdoba, Spain.
|
33
|
Roy K, Ambure P, Kar S. How Precise Are Our Quantitative Structure-Activity Relationship Derived Predictions for New Query Chemicals? ACS Omega 2018; 3:11392-11406. [PMID: 31459245 PMCID: PMC6645132 DOI: 10.1021/acsomega.8b01647] [Citation(s) in RCA: 71] [Impact Index Per Article: 11.8] [Received: 07/13/2018] [Accepted: 09/06/2018] [Indexed: 05/03/2023]
Abstract
Quantitative structure-activity relationship (QSAR) models have long been used for making predictions and data gap filling in diverse fields including medicinal chemistry, predictive toxicology, environmental fate modeling, materials science, agricultural science, nanoscience, food science, and so forth. Usually a QSAR model is developed based on chemical information of a properly designed training set and corresponding experimental response data, while the model is validated using one or more test set(s) for which the experimental response data are available. However, it is interesting to estimate the reliability of predictions when the model is applied to a completely new data set (true external set), even when the new data points are within the applicability domain (AD) of the developed model. In the present study, we have categorized the quality of predictions for the test set or true external set into three groups (good, moderate, and bad) based on absolute prediction errors. Then, we have used three criteria [(a) mean absolute error of leave-one-out predictions for the 10 closest training compounds for each query molecule; (b) AD in terms of similarity based on the standardization approach; and (c) proximity of the predicted value of the query compound to the mean training response] in different weighting schemes for making a composite score of predictions. It was found that using the most frequently appearing weighting scheme 0.5-0-0.5, the composite score-based categorization showed concordance with absolute prediction error-based categorization for more than 80% of test data points while working with 5 different datasets with 15 models for each set derived using three different splitting techniques. These observations were also confirmed with true external sets for another four endpoints, suggesting applicability of the scheme to judge the reliability of predictions for new datasets. The scheme has been implemented in a tool "Prediction Reliability Indicator" available at http://dtclab.webs.com/software-tools and http://teqip.jdvu.ac.in/QSAR_Tools/DTCLab/, and the tool is presently valid for multiple linear regression models only.
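As a rough illustration of how the three criteria could be combined under the 0.5-0-0.5 weighting, the sketch below computes criterion (a) as the mean absolute leave-one-out error of the nearest training compounds and then forms a weighted composite score. The 3/2/1 grading scale, the thresholds, the Euclidean distance, and all function names are assumptions made for illustration; the abstract does not specify them, and this is not the published tool's implementation.

```python
import math


def knn_loo_mae(query, train_X, loo_abs_errors, k=10):
    """Criterion (a): mean absolute LOO error of the k training compounds closest to the query."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(query, train_X[i]))
    nearest = order[:k]
    return sum(loo_abs_errors[i] for i in nearest) / len(nearest)


def grade(value, good_max, moderate_max):
    """Map a 'smaller is better' quantity to 3 (good), 2 (moderate) or 1 (bad). Assumed scale."""
    if value <= good_max:
        return 3
    if value <= moderate_max:
        return 2
    return 1


def composite_score(grades, weights=(0.5, 0.0, 0.5)):
    """Weighted sum of the grades for criteria (a), (b), (c)."""
    return sum(w * g for w, g in zip(weights, grades))


# Hypothetical one-descriptor training set with absolute LOO errors.
train_X = [[0.1], [0.2], [5.0], [6.0]]
loo_abs_errors = [0.2, 0.4, 2.0, 3.0]

mae_a = knn_loo_mae([0.0], train_X, loo_abs_errors, k=2)  # nearest two: errors 0.2 and 0.4
grade_a = grade(mae_a, good_max=0.5, moderate_max=1.0)    # hypothetical thresholds
grade_b = 2                                               # AD grade; weight 0 in this scheme
grade_c = grade(1.5, good_max=1.0, moderate_max=2.0)      # hypothetical distance from mean response
score = composite_score([grade_a, grade_b, grade_c])
```

With the 0.5-0-0.5 weights, criterion (b) contributes nothing to the score, so the composite depends only on local training error and on how far the prediction sits from the mean training response.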
Affiliation(s)
- Kunal Roy
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700 032, India
- Phone: +91 98315 94140. Fax: +91-33-2837-1078. URL: http://sites.google.com/site/kunalroyindia/
- Pravin Ambure
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700 032, India
- Supratik Kar
- Interdisciplinary Center for Nanotoxicity, Department of Chemistry, Physics and Atmospheric Sciences, Jackson State University, Jackson, Mississippi 39217, United States
|
34
|
Kaneko H. Data Visualization, Regression, Applicability Domains and Inverse Analysis Based on Generative Topographic Mapping. Mol Inform 2018; 38:e1800088. [PMID: 30259699 DOI: 10.1002/minf.201800088] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 07/27/2018] [Accepted: 08/30/2018] [Indexed: 01/11/2023]
Abstract
This paper introduces two generative topographic mapping (GTM) methods that can be used for data visualization, regression analysis, inverse analysis, and the determination of applicability domains (ADs). In GTM-multiple linear regression (GTM-MLR), the prior probability distribution of the descriptors or explanatory variables (X) is calculated with GTM, and the posterior probability distribution of the property/activity or objective variable (y) given X is calculated with MLR; inverse analysis is then performed using the product rule and Bayes' theorem. In GTM-regression (GTMR), X and y are combined and GTM is performed to obtain the joint probability distribution of X and y; this leads to the posterior probability distributions of y given X and of X given y, which are used for regression and inverse analysis, respectively. Simulations using linear and nonlinear datasets and quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) datasets confirm that GTM-MLR and GTMR enable data visualization, regression analysis, and inverse analysis considering appropriate ADs. Python and MATLAB codes for the proposed algorithms are available at https://github.com/hkaneko1985/gtm-generativetopographicmapping.
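The probabilistic relationships named in this abstract can be written out as follows. This is a generic statement of the product rule and Bayes' theorem applied to the quantities the abstract mentions, not a reproduction of the paper's own equations; the integration variables X' and y' are introduced here for illustration.

```latex
% GTM-MLR: prior p(X) from GTM, conditional p(y | X) from MLR
\begin{align*}
p(X, y) &= p(y \mid X)\, p(X) && \text{(product rule)} \\
p(X \mid y) &= \frac{p(y \mid X)\, p(X)}{\int p(y \mid X')\, p(X')\, dX'} && \text{(Bayes' theorem, inverse analysis)}
\end{align*}

% GTMR: joint p(X, y) from a single GTM on the combined variables
\begin{align*}
p(y \mid X) &= \frac{p(X, y)}{\int p(X, y')\, dy'} && \text{(regression)} \\
p(X \mid y) &= \frac{p(X, y)}{\int p(X', y)\, dX'} && \text{(inverse analysis)}
\end{align*}
```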
Affiliation(s)
- Hiromasa Kaneko
- Department of Applied Chemistry, Meiji University 1-1-1 Higashi-Mita, Tama-ku, Kawasaki, Kanagawa, 214-8571, Japan
|
35
|
McGuinness KN, Pan W, Sheridan RP, Murphy G, Crespo A. Role of simple descriptors and applicability domain in predicting change in protein thermostability. PLoS One 2018; 13:e0203819. [PMID: 30192891 PMCID: PMC6128648 DOI: 10.1371/journal.pone.0203819] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 04/27/2018] [Accepted: 08/28/2018] [Indexed: 01/07/2023] Open
Abstract
The melting temperature (Tm) of a protein is the temperature at which half of the protein population is in a folded state. Therefore, Tm is a measure of the thermostability of a protein. Increasing the Tm of a protein is a critical goal in biotechnology and biomedicine. However, predicting the change in melting temperature (dTm) due to mutations at a single residue is difficult because it depends on an intricate balance of forces. Existing methods for predicting dTm have had similar levels of success using generally complex models. We find that training a machine learning model with a simple set of easy to calculate physicochemical descriptors describing the local environment of the mutation performed as well as more complicated machine learning models and is 2-6 orders of magnitude faster. Importantly, unlike in most previous publications, we perform a blind prospective test on our simple model by designing 96 variants of a protein not in the training set. Results from retrospective and prospective predictions reveal the limited applicability domain of each model. This study highlights the current deficiencies in the available dTm dataset and is a call to the community to systematically design a larger and more diverse experimental dataset of mutants to prospectively predict dTm with greater certainty.
Affiliation(s)
- Kenneth N. McGuinness
- Modeling and Informatics, Merck & Co., Inc., Kenilworth, New Jersey, United States of America
- Weilan Pan
- Biochemical Engineering and Structure, Merck & Co., Inc., Rahway, New Jersey, United States of America
- Robert P. Sheridan
- Modeling and Informatics, Merck & Co., Inc., Kenilworth, New Jersey, United States of America
- Grant Murphy
- Biochemical Engineering and Structure, Merck & Co., Inc., Rahway, New Jersey, United States of America
- Alejandro Crespo
- Modeling and Informatics, Merck & Co., Inc., Kenilworth, New Jersey, United States of America
|
36
|
Zhenin M, Bahia MS, Marcou G, Varnek A, Senderowitz H, Horvath D. Rescoring of docking poses under Occam’s Razor: are there simpler solutions? J Comput Aided Mol Des 2018; 32:877-888. [DOI: 10.1007/s10822-018-0155-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 06/12/2018] [Accepted: 08/26/2018] [Indexed: 01/04/2023]
|
37
|
Liu R, Glover KP, Feasel MG, Wallqvist A. General Approach to Estimate Error Bars for Quantitative Structure–Activity Relationship Predictions of Molecular Activity. J Chem Inf Model 2018; 58:1561-1575. [DOI: 10.1021/acs.jcim.8b00114] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 11/28/2022]
Affiliation(s)
- Ruifeng Liu
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland 21702, United States
- Kyle P. Glover
- Defense Threat Reduction Agency, Aberdeen Proving Ground, Maryland 21010, United States
- Michael G. Feasel
- U.S. Army—Edgewood Chemical Biological Center, Operational Toxicology, Aberdeen Proving Ground, Maryland 21010, United States
- Anders Wallqvist
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland 21702, United States
|
38
|
Liang Y, Torralba-Sanchez TL, Di Toro DM. Estimating system parameters for solvent-water and plant cuticle-water using quantum chemically estimated Abraham solute parameters. Environmental Science. Processes & Impacts 2018; 20:813-821. [PMID: 29667991 DOI: 10.1039/c7em00601b] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/08/2023]
Abstract
Polyparameter Linear Free Energy Relationships (pp-LFERs) using Abraham system parameters have many useful applications. However, developing the Abraham system parameters depends on the availability and quality of the Abraham solute parameters. Using Quantum Chemically estimated Abraham solute Parameters (QCAP) is shown to produce pp-LFERs that have lower root mean square errors (RMSEs) of predictions for solvent-water partition coefficients than parameters that are estimated using other presently available methods. pp-LFERs system parameters are estimated for solvent-water, plant cuticle-water systems, and for novel compounds using QCAP solute parameters and experimental partition coefficients. Refitting the system parameter improves the calculation accuracy and eliminates the bias. Refitted models for solvent-water partition coefficients using QCAP solute parameters give better results (RMSE = 0.278 to 0.506 log units for 24 systems) than those based on ABSOLV (0.326 to 0.618) and QSPR (0.294 to 0.700) solute parameters. For munition constituents and munition-like compounds not included in the calibration of the refitted model, QCAP solute parameters produce pp-LFER models with much lower RMSEs for solvent-water partition coefficients (RMSE = 0.734 and 0.664 for original and refitted model, respectively) than ABSOLV (4.46 and 5.98) and QSPR (2.838 and 2.723). Refitting plant cuticle-water pp-LFER including munition constituents using QCAP solute parameters also results in lower RMSE (RMSE = 0.386) than that using ABSOLV (0.778) and QSPR (0.512) solute parameters. Therefore, for fitting a model in situations for which experimental data exist and system parameters can be re-estimated, or for which system parameters do not exist and need to be developed, QCAP is the quantum chemical method of choice.
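For orientation, the general form of an Abraham-type pp-LFER and the RMSE used to compare parameter sets can be written as below. The abstract does not print the equation, so treating this commonly used form as the one applied in the paper is an assumption. Uppercase letters are solute parameters (the quantities estimated by QCAP, ABSOLV, or QSPR) and lowercase letters are the system parameters obtained by fitting.

```latex
\begin{align*}
\log K &= c + eE + sS + aA + bB + vV \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n} \sum_{i=1}^{n}
  \left( \log K_i^{\mathrm{pred}} - \log K_i^{\mathrm{obs}} \right)^{2}}
\end{align*}
```

Refitting, in this reading, means re-estimating c, e, s, a, b, and v by regression against experimental partition coefficients while holding a given set of solute parameters fixed.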
Affiliation(s)
- Yuzhen Liang
- School of Environment and Energy, South China University of Technology, Guangzhou, Guangdong 51006, China.
|
39
|
Svensson F, Aniceto N, Norinder U, Cortes-Ciriano I, Spjuth O, Carlsson L, Bender A. Conformal Regression for Quantitative Structure–Activity Relationship Modeling—Quantifying Prediction Uncertainty. J Chem Inf Model 2018; 58:1132-1140. [DOI: 10.1021/acs.jcim.8b00054] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Indexed: 12/21/2022]
Affiliation(s)
- Fredrik Svensson
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
- IOTA Pharmaceuticals, St Johns Innovation Centre, Cowley Road, Cambridge CB4 0WS, U.K
- Natalia Aniceto
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
- Ulf Norinder
- Swetox, Unit of Toxicology Sciences, Karolinska Institutet, Forskargatan 20, SE-151 36 Södertälje, Sweden
- Department of Computer and Systems Sciences, Stockholm University, Box 7003, SE-164 07 Kista, Sweden
- Isidro Cortes-Ciriano
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
- Ola Spjuth
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, SE-75124 Uppsala, Sweden
- Lars Carlsson
- Quantitative Biology, Discovery Sciences, IMED Biotech Unit, AstraZeneca, SE-43183, Mölndal, Sweden
- Department of Computer Science, Royal Holloway, University of London, Egham Hill, Surrey, U.K
- Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
|
40
|
Mansouri K, Grulke CM, Judson RS, Williams AJ. OPERA models for predicting physicochemical properties and environmental fate endpoints. J Cheminform 2018. [PMID: 29520515 PMCID: PMC5843579 DOI: 10.1186/s13321-018-0263-1] [Citation(s) in RCA: 271] [Impact Index Per Article: 45.2] [Indexed: 12/24/2022] Open
Abstract
The collection of chemical structure information and associated experimental data for quantitative structure–activity/property relationship (QSAR/QSPR) modeling is facilitated by an increasing number of public databases containing large amounts of useful data. However, the performance of QSAR models highly depends on the quality of the data and modeling methodology used. This study aims to develop robust QSAR/QSPR models for chemical properties of environmental interest that can be used for regulatory purposes. This study primarily uses data from the publicly available PHYSPROP database consisting of a set of 13 common physicochemical and environmental fate properties. These datasets have undergone extensive curation using an automated workflow to select only high-quality data, and the chemical structures were standardized prior to calculation of the molecular descriptors. The modeling procedure was developed based on the five Organization for Economic Cooperation and Development (OECD) principles for QSAR models. A weighted k-nearest neighbor approach was adopted using a minimum number of required descriptors calculated using PaDEL, an open-source software. The genetic algorithms selected only the most pertinent and mechanistically interpretable descriptors (2–15, with an average of 11 descriptors). The sizes of the modeled datasets varied from 150 chemicals for biodegradability half-life to 14,050 chemicals for logP, with an average of 3222 chemicals across all endpoints. The optimal models were built on randomly selected training sets (75%) and validated using fivefold cross-validation (CV) and test sets (25%). The CV Q2 of the models varied from 0.72 to 0.95, with an average of 0.86 and an R2 test value from 0.71 to 0.96, with an average of 0.82. Modeling and performance details are described in QSAR model reporting format and were validated by the European Commission’s Joint Research Center to be OECD compliant. 
All models are freely available as an open-source, command-line application called OPEn structure–activity/property Relationship App (OPERA). OPERA models were applied to more than 750,000 chemicals to produce freely available predicted data on the U.S. Environmental Protection Agency’s CompTox Chemistry Dashboard.
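The weighted k-nearest neighbor approach mentioned in this abstract can be illustrated with a minimal sketch. It is not the OPERA implementation: the function name `weighted_knn_predict`, the Euclidean metric, and the inverse-distance weighting are assumptions chosen for illustration, and the descriptor vectors stand in for whatever descriptors a real model would select.

```python
import math

def weighted_knn_predict(train_X, train_y, query, k=5, eps=1e-12):
    """Inverse-distance-weighted kNN regression (illustrative sketch only)."""
    # Euclidean distance from the query to every training chemical
    dists = [math.dist(x, query) for x in train_X]
    # Indices of the k closest training chemicals
    nearest = sorted(range(len(train_X)), key=lambda i: dists[i])[:k]
    # Closer neighbors get larger weights; eps avoids division by zero
    weights = [1.0 / (dists[i] + eps) for i in nearest]
    weighted_sum = sum(w * train_y[i] for w, i in zip(weights, nearest))
    return weighted_sum / sum(weights)

# Hypothetical one-descriptor training set and query
descriptors = [[0.0], [1.0], [10.0]]
property_values = [0.0, 2.0, 100.0]
print(weighted_knn_predict(descriptors, property_values, [0.5], k=2))
```

Weighting by inverse distance lets the closest analogues dominate the prediction, which is one common way to realize a "weighted" kNN; other weighting schemes are equally plausible.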
Affiliation(s)
- Kamel Mansouri
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA; Oak Ridge Institute for Science and Education, 1299 Bethel Valley Road, Oak Ridge, TN, 37830, USA; ScitoVation LLC, 6 Davis Drive, Research Triangle Park, NC, 27709, USA
- Chris M Grulke
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
- Richard S Judson
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
- Antony J Williams
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
|
41
|
Kaneko H. Discussion on Regression Methods Based on Ensemble Learning and Applicability Domains of Linear Submodels. J Chem Inf Model 2018; 58:480-489. [PMID: 29425038 DOI: 10.1021/acs.jcim.7b00649] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/10/2023]
Abstract
To develop a new ensemble learning method and construct highly predictive regression models in chemoinformatics and chemometrics, applicability domains (ADs) are introduced into the ensemble learning process of prediction. When estimating values of an objective variable using subregression models, only the submodels with ADs that cover a query sample, i.e., the sample is inside the model's AD, are used. By constructing submodels and changing a list of selected explanatory variables, the union of the submodels' ADs, which defines the overall AD, becomes large, and the prediction performance is enhanced for diverse compounds. By analyzing a quantitative structure-activity relationship data set and a quantitative structure-property relationship data set, it is confirmed that the ADs can be enlarged and the estimation performance of regression models is improved compared with traditional methods.
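The idea of using only those submodels whose applicability domain (AD) covers the query sample can be sketched as follows. This is an illustration under stated assumptions, not the method from the paper: the `Submodel` class, the AD test (mean distance to the k nearest training samples compared against a threshold), and the simple averaging of predictions are all hypothetical choices.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

@dataclass
class Submodel:
    features: List[int]                          # indices of the explanatory variables used
    predict: Callable[[Sequence[float]], float]  # fitted regressor on those variables
    train_X: List[List[float]]                   # non-empty training samples, projected onto `features`
    ad_threshold: float                          # assumed AD rule: max mean distance to k nearest samples

    def project(self, x: Sequence[float]) -> List[float]:
        return [x[i] for i in self.features]

    def covers(self, x: Sequence[float], k: int = 3) -> bool:
        # The query is inside the AD if it lies close enough to the training data
        z = self.project(x)
        nearest = sorted(math.dist(z, t) for t in self.train_X)[:k]
        return sum(nearest) / len(nearest) <= self.ad_threshold

def ad_ensemble_predict(submodels: List[Submodel], x: Sequence[float]) -> Optional[float]:
    # Average only the submodels whose AD covers the query; None means "outside the overall AD"
    preds = [m.predict(m.project(x)) for m in submodels if m.covers(x)]
    return sum(preds) / len(preds) if preds else None
```

Because each submodel sees a different subset of variables, the union of the individual ADs can be larger than the AD of any single model, which is the effect the abstract describes.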
Affiliation(s)
- Hiromasa Kaneko
- Department of Applied Chemistry, School of Science and Technology, Meiji University, 1-1-1 Higashi-Mita, Tama-ku, Kawasaki, Kanagawa 214-8571, Japan
|
42
|
Gramatica P, Papa E, Sangion A. QSAR modeling of cumulative environmental end-points for the prioritization of hazardous chemicals. ENVIRONMENTAL SCIENCE. PROCESSES & IMPACTS 2018; 20:38-47. [PMID: 29226926 DOI: 10.1039/c7em00519a] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
The hazard of chemicals in the environment is inherently related to the molecular structure and derives simultaneously from various chemical properties/activities/reactivities. Models based on Quantitative Structure Activity Relationships (QSARs) are useful to screen, rank and prioritize chemicals that may have an adverse impact on humans and the environment. This paper reviews a selection of QSAR models (based on theoretical molecular descriptors) developed for cumulative multivariate endpoints, which were derived by mathematical combination of multiple effects and properties. The cumulative end-points provide an integrated holistic point of view to address environmentally relevant properties of chemicals.
Affiliation(s)
- Paola Gramatica
- QSAR Research Unit on Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences (DiSTA), University of Insubria, Varese, Italy
|
43
|
Grisoni F, Ballabio D, Todeschini R, Consonni V. Molecular Descriptors for Structure-Activity Applications: A Hands-On Approach. Methods Mol Biol 2018; 1800:3-53. [PMID: 29934886 DOI: 10.1007/978-1-4939-7899-1_1] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Molecular descriptors capture diverse parts of the structural information of molecules and they are the support of many contemporary computer-assisted toxicological and chemical applications. After briefly introducing some fundamental concepts of structure-activity applications (e.g., molecular descriptor dimensionality, classical vs. fingerprint description, and activity landscapes), this chapter guides the readers through a step-by-step explanation of molecular descriptors rationale and application. To this end, the chapter illustrates a case study of a recently published application of molecular descriptors for modeling the activity on cytochrome P450.
Affiliation(s)
- Francesca Grisoni
- Department of Earth and Environmental Sciences, Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, Italy
- Davide Ballabio
- Department of Earth and Environmental Sciences, Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, Italy
- Roberto Todeschini
- Department of Earth and Environmental Sciences, Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, Italy
- Viviana Consonni
- Department of Earth and Environmental Sciences, Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, Italy
|
44
|
Marcou G, Delouis G, Mokshyna O, Horvath D, Lachiche N, Varnek A. Transductive Ridge Regression in Structure-activity Modeling. Mol Inform 2017; 37. [PMID: 29095574 DOI: 10.1002/minf.201700112] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2017] [Accepted: 10/08/2017] [Indexed: 11/06/2022]
Abstract
In this article we consider the application of the Transductive Ridge Regression (TRR) approach to structure-activity modeling. An original procedure of the TRR parameters optimization is suggested. Calculations performed on 3 different datasets involving two types of descriptors demonstrated that TRR outperforms its non-transductive analogue (Ridge Regression) in more than 90 % of cases. The most significant transductive effect was observed for small datasets. This suggests that transduction may be particularly useful when the data are expensive or difficult to collect.
Affiliation(s)
- Gilles Marcou
- Université de Strasbourg, Faculté de Chimie, 4 rue Blaise Pascal, BP 20296, 67008, Strasbourg Cedex, France
- Grace Delouis
- Université de Strasbourg, Faculté de Chimie, 4 rue Blaise Pascal, BP 20296, 67008, Strasbourg Cedex, France
- Olena Mokshyna
- Université de Strasbourg, Faculté de Chimie, 4 rue Blaise Pascal, BP 20296, 67008, Strasbourg Cedex, France; Physico-Chemical Institute, National Academy of Science of Ukraine, 86, Lustdorfskaja doroga, Odessa, 65080
- Dragos Horvath
- Université de Strasbourg, Faculté de Chimie, 4 rue Blaise Pascal, BP 20296, 67008, Strasbourg Cedex, France
- Nicolas Lachiche
- ICube UMR 7357, 300 bd Sébastien Brant - CS 10413 -, F-67412, Illkirch Cedex
- Alexandre Varnek
- Université de Strasbourg, Faculté de Chimie, 4 rue Blaise Pascal, BP 20296, 67008, Strasbourg Cedex, France
|
45
|
Perspectives from the NanoSafety Modelling Cluster on the validation criteria for (Q)SAR models used in nanotechnology. Food Chem Toxicol 2017; 112:478-494. [PMID: 28943385 DOI: 10.1016/j.fct.2017.09.037] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2016] [Revised: 08/31/2017] [Accepted: 09/19/2017] [Indexed: 11/20/2022]
Abstract
Nanotechnology and the production of nanomaterials have been expanding rapidly in recent years. Since many types of engineered nanoparticles are suspected to be toxic to living organisms and to have a negative impact on the environment, the process of designing new nanoparticles and their applications must be accompanied by a thorough risk analysis. (Quantitative) Structure-Activity Relationship ([Q]SAR) modelling creates promising options among the available methods for the risk assessment. These in silico models can be used to predict a variety of properties, including the toxicity of newly designed nanoparticles. However, (Q)SAR models must be appropriately validated to ensure the clarity, consistency and reliability of predictions. This paper is a joint initiative from recently completed European research projects focused on developing (Q)SAR methodology for nanomaterials. The aim was to interpret and expand the guidance for the well-known "OECD Principles for the Validation, for Regulatory Purposes, of (Q)SAR Models", with reference to nano-(Q)SAR, and present our opinions on the criteria to be fulfilled for models developed for nanoparticles.
|
46
|
Liang Y, Xiong R, Sandler SI, Di Toro DM. Quantum Chemically Estimated Abraham Solute Parameters Using Multiple Solvent-Water Partition Coefficients and Molecular Polarizability. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2017; 51:9887-9898. [PMID: 28742336 DOI: 10.1021/acs.est.7b01737] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Polyparameter Linear Free Energy Relationships (pp-LFERs), also called Linear Solvation Energy Relationships (LSERs), are used to predict many environmentally significant properties of chemicals. A method is presented for computing the necessary chemical parameters, the Abraham parameters (AP), used by many pp-LFERs. It employs quantum chemical calculations and uses only the chemical's molecular structure. The method computes the Abraham E parameter using density functional theory computed molecular polarizability and the Clausius-Mossotti equation relating the index of refraction to the molecular polarizability, estimates the Abraham V as the COSMO calculated molecular volume, and computes the remaining AP S, A, and B jointly with a multiple linear regression using sixty-five solvent-water partition coefficients computed using the quantum mechanical COSMO-SAC solvation model. These solute parameters, referred to as Quantum Chemically estimated Abraham Parameters (QCAP), are further adjusted by fitting to experimentally based APs using QCAP parameters as the independent variables so that they are compatible with existing Abraham pp-LFERs. QCAP and adjusted QCAP for 1827 neutral chemicals are included. For 24 solvent-water systems including octanol-water, predicted log solvent-water partition coefficients using adjusted QCAP have the smallest root-mean-square errors (RMSEs, 0.314-0.602) compared to predictions made using APs estimated using the molecular fragment based method ABSOLV (0.45-0.716). For munition and munition-like compounds, adjusted QCAP has much lower RMSE (0.860) than does ABSOLV (4.45) which essentially fails for these compounds.
Affiliation(s)
- Yuzhen Liang
- School of Environment and Energy, South China University of Technology, Guangzhou, Guangdong 510006, China
- Department of Civil and Environmental Engineering, University of Delaware, Newark, Delaware 19716, United States
- Ruichang Xiong
- Department of Chemical and Biomolecular Engineering, University of Delaware, Newark, Delaware 19716, United States
- Stanley I Sandler
- Department of Chemical and Biomolecular Engineering, University of Delaware, Newark, Delaware 19716, United States
- Dominic M Di Toro
- Department of Civil and Environmental Engineering, University of Delaware, Newark, Delaware 19716, United States
|
47
|
Verras A, Waller CL, Gedeck P, Green DVS, Kogej T, Raichurkar A, Panda M, Shelat AA, Clark J, Guy RK, Papadatos G, Burrows J. Shared Consensus Machine Learning Models for Predicting Blood Stage Malaria Inhibition. J Chem Inf Model 2017; 57:445-453. [DOI: 10.1021/acs.jcim.6b00572] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Affiliation(s)
- Andreas Verras
- Merck & Co., Inc., Kenilworth, New Jersey 07033, United States
- Chris L. Waller
- Merck & Co., Inc., Boston, Massachusetts 02210, United States
- Peter Gedeck
- Novartis Institute for Tropical Diseases Pte. Ltd., Singapore 138670, Singapore
- Anang A. Shelat
- Chemical Biology and Therapeutics Department, St. Jude Children’s Research Hospital, Memphis, Tennessee 38105, United States
- Julie Clark
- Chemical Biology and Therapeutics Department, St. Jude Children’s Research Hospital, Memphis, Tennessee 38105, United States
- R. Kiplin Guy
- Chemical Biology and Therapeutics Department, St. Jude Children’s Research Hospital, Memphis, Tennessee 38105, United States
- George Papadatos
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, United Kingdom
- Jeremy Burrows
- Medicines for Malaria Ventures Discovery, Geneva 1215, Switzerland
|
48
|
Horvath D, Marcou G, Varnek A. Generative Topographic Mapping Approach to Chemical Space Analysis. CHALLENGES AND ADVANCES IN COMPUTATIONAL CHEMISTRY AND PHYSICS 2017. [DOI: 10.1007/978-3-319-56850-8_6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/05/2022]
|
49
|
Kleandrova VV, Luan F, Speck-Planche A, Cordeiro MNDS. QSAR-Based Studies of Nanomaterials in the Environment. PHARMACEUTICAL SCIENCES 2017. [DOI: 10.4018/978-1-5225-1762-7.ch051] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/02/2023] Open
Abstract
Nanotechnology is a newly emerging field, posing substantial impacts on society, economy, and the environment. In recent years, the development of nanotechnology has led to the design and large-scale production of many new materials and devices with a vast range of applications. However, along with the benefits, the use of nanomaterials raises many questions and generates concerns due to the possible health-risks and environmental impacts. This chapter provides an overview of the Quantitative Structure-Activity Relationships (QSAR) studies performed so far towards predicting nanoparticles' environmental toxicity. Recent progresses on the application of these modeling studies are additionally pointed out. Special emphasis is given to the setup of a QSAR perturbation-based model for the assessment of ecotoxic effects of nanoparticles in diverse conditions. Finally, ongoing challenges that may lead to new and exciting directions for QSAR modeling are discussed.
Affiliation(s)
- Feng Luan
- Yantai University, China & University of Porto, Portugal
|
50
|
Aniceto N, Freitas AA, Bender A, Ghafourian T. A novel applicability domain technique for mapping predictive reliability across the chemical space of a QSAR: reliability-density neighbourhood. J Cheminform 2016. [PMCID: PMC5395519 DOI: 10.1186/s13321-016-0182-y] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/20/2023] Open
Abstract
The ability to define the regions of chemical space where a predictive model can be safely used is a necessary condition to assure the reliability of new predictions. This implies that reliability must be determined across chemical space in the attempt to localize “safe” and “unsafe” regions for prediction. As a result we devised an applicability domain technique that addresses the data locally instead of handling it as a whole—the reliability-density neighbourhood (RDN). The main novelty aspect of this method is that it characterizes each single training instance according to the density of its neighbourhood in the training set, as well as its individual bias and precision. By scanning through the chemical space (by iteratively increasing the applicability domain area), it was observed that new test compounds are successively included into the applicability domain region in such a manner that strongly correlates to their predictive performance. This allows the mapping of local reliability across different locations in the training set space, and thus allows identifying regions where the model has low reliability. This method also showed matching profiles between two external sets, which is an indication that it performs robustly with new data. Another novel aspect in this technique is that it is paired with a specific feature selection algorithm. As a result, the impact of the feature set used was studied from which the top 20 features selected by ReliefF yielded the best results, as opposed to using the model’s features or the entire feature set as commonly done. As the third novel aspect, in this work we propose a new scoring function to help evaluate the quality of an applicability domain profile (i.e., the curve of accuracy vs the applicability domain measure in question). Overall, the RDN showed to be a promising method that can correctly sort new instances according to predictive performance. 
As a result, this technique can be received by an end-user as proof of concept for the performance of a QSAR model in new data, thus promoting the user’s trust in the QSAR output.
|