1
Walter M, Webb SJ, Gillet VJ. Interpreting Neural Network Models for Toxicity Prediction by Extracting Learned Chemical Features. J Chem Inf Model 2024; 64:3670-3688. [PMID: 38686880 PMCID: PMC11094726 DOI: 10.1021/acs.jcim.4c00127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 04/15/2024] [Accepted: 04/15/2024] [Indexed: 05/02/2024]
Abstract
Neural network models have become a popular machine-learning technique for the toxicity prediction of chemicals. However, due to their complex structure, it is difficult to understand predictions made by these models which limits confidence. Current techniques to tackle this problem such as SHAP or integrated gradients provide insights by attributing importance to the input features of individual compounds. While these methods have produced promising results in some cases, they do not shed light on how representations of compounds are transformed in hidden layers, which constitute how neural networks learn. We present a novel technique to interpret neural networks which identifies chemical substructures in training data found to be responsible for the activation of hidden neurons. For individual test compounds, the importance of hidden neurons is determined, and the associated substructures are leveraged to explain the model prediction. Using structural alerts for mutagenicity from the Derek Nexus expert system as ground truth, we demonstrate the validity of the approach and show that model explanations are competitive with and complementary to explanations obtained from an established feature attribution method.
Affiliation(s)
- Moritz Walter
  - Information School, University of Sheffield, The Wave, 2 Whitham Road, Sheffield S10 2AH, U.K.
- Samuel J. Webb
  - Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds LS11 5PY, U.K.
- Valerie J. Gillet
  - Information School, University of Sheffield, The Wave, 2 Whitham Road, Sheffield S10 2AH, U.K.
2
Tian T, Li S, Fang M, Zhao D, Zeng J. MolSHAP: Interpreting Quantitative Structure-Activity Relationships Using Shapley Values of R-Groups. J Chem Inf Model 2024; 64:2236-2249. [PMID: 37584270 DOI: 10.1021/acs.jcim.3c00465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/17/2023]
Abstract
Optimizing the activities and properties of lead compounds is an essential step in the drug discovery process. Despite recent advances in machine learning-aided drug discovery, most of the existing methods focus on making predictions for the desired objectives directly while ignoring the explanations for predictions. Although several techniques can provide interpretations for machine learning-based methods such as feature attribution, there are still gaps between these interpretations and the principles commonly adopted by medicinal chemists when designing and optimizing molecules. Here, we propose an interpretation framework, named MolSHAP, for quantitative structure-activity relationship analysis by estimating the contributions of R-groups. Instead of attributing the activities to individual input features, MolSHAP regards the R-group fragments as the basic units of interpretation, which is in accordance with the fragment-based modifications in molecule optimization. MolSHAP is a model-agnostic method that can interpret activity regression models with arbitrary input formats and model architectures. Based on the evaluations of numerous representative activity regression models on a specially designed R-group ranking task, MolSHAP achieved significantly better interpretation power compared with other methods. In addition, we developed a compound optimization algorithm based on MolSHAP and illustrated the reliability of the optimized compounds using an independent case study. These results demonstrated that MolSHAP can provide a useful tool for accurately interpreting the quantitative structure-activity relationships and rationally optimizing the compound activities in drug discovery.
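The idea of attributing activity to R-groups with Shapley values can be illustrated with a small sketch. This is not the MolSHAP algorithm itself, whose details the abstract does not give; it is an exact Shapley computation over three hypothetical R-groups, and the `toy_activity` function, its coefficients, and the R-group names are assumptions made purely for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value of each player for a set function `value`."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of coalitions of size k that do not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical activity model: a scaffold baseline, additive R-group terms,
# and one interaction term between R1 and R2 (all numbers are made up).
BASE = 5.0
ADDITIVE = {"R1": 1.0, "R2": 0.5, "R3": -0.25}


def toy_activity(present):
    score = BASE + sum(ADDITIVE[r] for r in present)
    if "R1" in present and "R2" in present:
        score += 0.5  # assumed synergy, shared equally by R1 and R2
    return score


phi = shapley_values(list(ADDITIVE), toy_activity)
```

In this toy setting the contributions sum to the difference between the fully substituted molecule and the bare scaffold, and the interaction term is split evenly between the two groups involved. Exact enumeration scales exponentially with the number of groups, so any practical method would need sampling or another approximation.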
Affiliation(s)
- Tingzhong Tian
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Shuya Li
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Meng Fang
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Dan Zhao
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Jianyang Zeng
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
3
Wu Z, Wang J, Du H, Jiang D, Kang Y, Li D, Pan P, Deng Y, Cao D, Hsieh CY, Hou T. Chemistry-intuitive explanation of graph neural networks for molecular property prediction with substructure masking. Nat Commun 2023; 14:2585. [PMID: 37142585 PMCID: PMC10160109 DOI: 10.1038/s41467-023-38192-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 09/17/2022] [Accepted: 04/12/2023] [Indexed: 05/06/2023] [Open Access]
Abstract
Graph neural networks (GNNs) have been widely used in molecular property prediction, but explaining their black-box predictions is still a challenge. Most existing explanation methods for GNNs in chemistry focus on attributing model predictions to individual nodes, edges or fragments that are not necessarily derived from a chemically meaningful segmentation of molecules. To address this challenge, we propose a method named substructure mask explanation (SME). SME is based on well-established molecular segmentation methods and provides an interpretation that aligns with the understanding of chemists. We apply SME to elucidate how GNNs learn to predict aqueous solubility, genotoxicity, cardiotoxicity and blood-brain barrier permeation for small molecules. SME provides interpretation that is consistent with the understanding of chemists, alerts them to unreliable performance, and guides them in structural optimization for target properties. Hence, we believe that SME empowers chemists to confidently mine structure-activity relationship (SAR) from reliable GNNs through a transparent inspection on how GNNs pick up useful signals when learning from data.
Affiliation(s)
- Zhenxing Wu
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, P.R. China
- Jike Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, P.R. China
  - National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, 430072, Hubei, P.R. China
- Hongyan Du
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, P.R. China
- Dejun Jiang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, P.R. China
- Yu Kang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
- Dan Li
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
- Peichen Pan
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, P.R. China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410004, Hunan, P.R. China
- Chang-Yu Hsieh
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
4
Lameiro RF, Montanari CA. Investigating the Lack of Translation from Cruzain Inhibition to Trypanosoma cruzi Activity with Machine Learning and Chemical Space Analyses. ChemMedChem 2023; 18:e202200434. [PMID: 36692246 DOI: 10.1002/cmdc.202200434] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2022] [Revised: 01/17/2023] [Accepted: 01/17/2023] [Indexed: 01/25/2023]
Abstract
Chagas disease is a neglected tropical disease caused by the protozoa Trypanosoma cruzi. Cruzain, its main cysteine protease, is commonly targeted in drug discovery efforts to find new treatments for this disease. Even though the essentiality of this enzyme for the parasite has been established, many cruzain inhibitors fail as trypanocidal agents. This lack of translation from biochemical to biological assays can involve several factors, including suboptimal physicochemical properties. In this work, we aim to rationalize this phenomenon through chemical space analyses of calculated molecular descriptors. These include statistical tests, visualization of projections, scaffold analysis, and creation of machine learning models coupled with interpretability methods. Our results demonstrate a significant difference between the chemical spaces of cruzain and T. cruzi inhibitors, with compounds with more hydrogen bond donors and rotatable bonds being more likely to be good cruzain inhibitors, but less likely to be active on T. cruzi. In addition, cruzain inhibitors seem to occupy specific regions of the chemical space that cannot be easily correlated with T. cruzi activity, which means that using predictive modeling to determine whether cruzain inhibitors will be trypanocidal is not a straightforward task. We believe that the conclusions from this work might be of interest for future projects that aim to develop novel trypanocidal compounds.
Affiliation(s)
- Rafael F Lameiro
  - Medicinal and Biological Chemistry Group, São Carlos Institute of Chemistry, University of São Paulo, Trabalhador São-Carlense Avenue 400, São Carlos, Brazil
- Carlos A Montanari
  - Medicinal and Biological Chemistry Group, São Carlos Institute of Chemistry, University of São Paulo, Trabalhador São-Carlense Avenue 400, São Carlos, Brazil
5
Rodríguez-Pérez R, Miljković F, Bajorath J. Machine Learning in Chemoinformatics and Medicinal Chemistry. Annu Rev Biomed Data Sci 2022; 5:43-65. [PMID: 35440144 DOI: 10.1146/annurev-biodatasci-122120-124216] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/09/2022]
Abstract
In chemoinformatics and medicinal chemistry, machine learning has evolved into an important approach. In recent years, increasing computational resources and new deep learning algorithms have put machine learning onto a new level, addressing previously unmet challenges in pharmaceutical research. In silico approaches for compound activity predictions, de novo design, and reaction modeling have been further advanced by new algorithmic developments and the emergence of big data in the field. Herein, novel applications of machine learning and deep learning in chemoinformatics and medicinal chemistry are reviewed. Opportunities and challenges for new methods and applications are discussed, placing emphasis on proper baseline comparisons, robust validation methodologies, and new applicability domains.
Affiliation(s)
- Raquel Rodríguez-Pérez
  - Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
  - Current affiliation: Novartis Institutes for Biomedical Research, Novartis Campus, Basel, Switzerland
- Filip Miljković
  - Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
  - Current affiliation: Data Science and AI, Imaging and Data Analytics, Clinical Pharmacology and Safety Sciences, R&D AstraZeneca, Gothenburg, Sweden
- Jürgen Bajorath
  - Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
6
Shulga DA, Ivanov NN, Palyulin VA. In Silico Structure-Based Approach for Group Efficiency Estimation in Fragment-Based Drug Design Using Evaluation of Fragment Contributions. Molecules 2022; 27:1985. [PMID: 35335347 PMCID: PMC8951103 DOI: 10.3390/molecules27061985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2022] [Revised: 03/10/2022] [Accepted: 03/15/2022] [Indexed: 12/10/2022] [Open Access]
Abstract
The notion of a contribution of a specific group to an organic molecule's property and/or activity is common in our thinking, yet it is not strictly correct because of the inherent non-additivity of free energy with respect to the molecular fragments composing a molecule. The fragment-based drug discovery (FBDD) approach has proven to be fruitful in addressing the above notions. The main difficulty of FBDD, however, is its reliance on low-throughput and expensive experimental means of determining the binding of fragment-sized molecules. In this article we propose a way to enhance the throughput and availability of FBDD methods by judiciously using in silico means of assessing the contribution of fragments of the molecule in question to the ligand-receptor binding energy, using a previously developed in silico Reverse Fragment Based Drug Discovery (R-FBDD) approach. It has been shown that the proposed structure-based drug discovery (SBDD) type of approach fills a vacant niche among the existing in silico approaches, which mainly stem from their ligand-based drug discovery (LBDD) counterparts. In order to illustrate the applicability of the approach, our work retrospectively repeats the findings of the use case of an FBDD hit-to-lead project devoted to the experimentally based determination of additive group efficiency (GE), an analog of ligand efficiency (LE) for a group in the molecule, using the Free-Wilson (FW) decomposition. It is shown that by using our in silico approach to evaluate the fragment contributions of a ligand and to estimate GE, one can arrive at decisions similar to those made using the experimentally determined activity-based FW decomposition. It is also shown that the approach is rather robust to the choice of the scoring function, provided the latter demonstrates a decent scoring power. We argue that the proposed approach of in silico assessment of GE has a wider applicability domain and expect that it will be widely applicable to enhance the net throughput of drug discovery based on the FBDD paradigm.
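For readers unfamiliar with the two metrics named in this abstract, the sketch below uses the definitions commonly found in the literature: LE as binding free energy per heavy atom, and GE as the change in binding free energy per heavy atom added. The function names and all numeric values are hypothetical, and the abstract does not state which exact variants the authors used.

```python
def ligand_efficiency(delta_g, heavy_atoms):
    """LE = -dG / N_heavy, e.g. in kcal/mol per heavy atom."""
    return -delta_g / heavy_atoms


def group_efficiency(delta_g_with, delta_g_without, atoms_added):
    """GE = -(dG_with - dG_without) / number of heavy atoms in the added group."""
    return -(delta_g_with - delta_g_without) / atoms_added


# Hypothetical numbers: a 20-heavy-atom parent binding at -8.0 kcal/mol,
# and an analog carrying one extra 4-heavy-atom group binding at -9.2 kcal/mol.
le_parent = ligand_efficiency(-8.0, 20)
ge_group = group_efficiency(-9.2, -8.0, 4)
```

With these made-up inputs the parent has an LE of about 0.4 and the added group a GE of about 0.3 kcal/mol per heavy atom, so the group binds somewhat less efficiently than the parent does on average.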
7
Zhu T, Tao C. Prediction models with multiple machine learning algorithms for POPs: The calculation of PDMS-air partition coefficient from molecular descriptor. J Hazard Mater 2022; 423:127037. [PMID: 34530267 DOI: 10.1016/j.jhazmat.2021.127037] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 04/16/2021] [Revised: 08/21/2021] [Accepted: 08/23/2021] [Indexed: 06/13/2023]
Abstract
The polydimethylsiloxane-air partition coefficient (K_PDMS-air) is a key parameter for passive sampling to measure POPs concentrations. In this study, 13 QSPR models were developed to predict K_PDMS-air, with two descriptor selection methods (MLR and RF) and seven algorithms (MLR, LASSO, ANN, SVM, kNN, RF and GBDT). All models were based on a data set of 244 POPs from 13 different categories. The diverse model evaluation parameters calculated from the training and test sets were used for internal and external verification. Notably, R²adj, Q²BOOT and Q²ext are 0.995, 0.980 and 0.951, respectively, for the GBDT model, showing remarkable superiority in fitting, robustness and predictability compared with the other models. Mechanism explanation revealed that molecular size, branching and bond types were the main internal factors affecting the partition process. Unlike existing QSPR models based on a single category of compounds, the models developed herein considered multiple classes of compounds, so that their application domain was more comprehensive. Therefore, the obtained models can fill the data gap of missing experimental K_PDMS-air values for compounds in the application range, and help researchers better understand the distribution behavior of POPs from the perspective of molecular structure.
Affiliation(s)
- Tengyi Zhu
  - School of Environmental Science and Engineering, Yangzhou University, Yangzhou 225127, Jiangsu, China
- Cuicui Tao
  - School of Environmental Science and Engineering, Yangzhou University, Yangzhou 225127, Jiangsu, China
8
Harren T, Matter H, Hessler G, Rarey M, Grebner C. Interpretation of Structure-Activity Relationships in Real-World Drug Design Data Sets Using Explainable Artificial Intelligence. J Chem Inf Model 2022; 62:447-462. [PMID: 35080887 DOI: 10.1021/acs.jcim.1c01263] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/20/2022]
Abstract
In silico models based on Deep Neural Networks (DNNs) are promising for predicting activities and properties of new molecules. Unfortunately, their inherent black-box character hinders our understanding, as to which structural features are important for activity. However, this information is crucial for capturing the underlying structure-activity relationships (SARs) to guide further optimization. To address this interpretation gap, "Explainable Artificial Intelligence" (XAI) methods recently became popular. Herein, we apply and compare multiple XAI methods to projects of lead optimization data sets with well-established SARs and available X-ray crystal structures. As we can show, easily understandable and comprehensive interpretations are obtained by combining DNN models with some powerful interpretation methods. In particular, SHAP-based methods are promising for this task. A novel visualization scheme using atom-based heatmaps provides useful insights into the underlying SAR. It is important to note that all interpretations are only meaningful in the context of the underlying models and associated data.
Affiliation(s)
- Tobias Harren
  - Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany
- Hans Matter
  - Synthetic Molecular Design, Integrated Drug Discovery, Sanofi-Aventis Deutschland GmbH, Industriepark Höchst, D-65926 Frankfurt am Main, Germany
- Gerhard Hessler
  - Synthetic Molecular Design, Integrated Drug Discovery, Sanofi-Aventis Deutschland GmbH, Industriepark Höchst, D-65926 Frankfurt am Main, Germany
- Matthias Rarey
  - Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany
- Christoph Grebner
  - Synthetic Molecular Design, Integrated Drug Discovery, Sanofi-Aventis Deutschland GmbH, Industriepark Höchst, D-65926 Frankfurt am Main, Germany
9
Jiménez-Luna J, Skalic M, Weskamp N. Benchmarking Molecular Feature Attribution Methods with Activity Cliffs. J Chem Inf Model 2022; 62:274-283. [PMID: 35019265 DOI: 10.1021/acs.jcim.1c01163] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/04/2023]
Abstract
Feature attribution techniques are popular choices within the explainable artificial intelligence toolbox, as they can help elucidate which parts of the provided inputs used by an underlying supervised-learning method are considered relevant for a specific prediction. In the context of molecular design, these approaches typically involve the coloring of molecular graphs, whose presentation to medicinal chemists can be useful for making a decision of which compounds to synthesize or prioritize. The consistency of the highlighted moieties alongside expert background knowledge is expected to contribute to the understanding of machine-learning models in drug design. Quantitative evaluation of such coloring approaches, however, has so far been limited to substructure identification tasks. We here present an approach that is based on maximum common substructure algorithms applied to experimentally-determined activity cliffs. Using the proposed benchmark, we found that molecule coloring approaches in conjunction with classical machine-learning models tend to outperform more modern, graph-neural-network alternatives. The provided benchmark data are fully open sourced, which we hope will facilitate the testing of newly developed molecular feature attribution techniques.
Affiliation(s)
- José Jiménez-Luna
  - Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, 8093 Zurich, Switzerland
  - Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riss, Germany
- Miha Skalic
  - Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riss, Germany
- Nils Weskamp
  - Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riss, Germany
10
Rodríguez-Pérez R, Bajorath J. Explainable Machine Learning for Property Predictions in Compound Optimization. J Med Chem 2021; 64:17744-17752. [PMID: 34902252 DOI: 10.1021/acs.jmedchem.1c01789] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 11/28/2022]
Abstract
The prediction of compound properties from chemical structure is a main task for machine learning (ML) in medicinal chemistry. ML is often applied to large data sets in applications such as compound screening, virtual library enumeration, or generative chemistry. Albeit desirable, a detailed understanding of ML model decisions is typically not required in these cases. By contrast, compound optimization efforts rely on small data sets to identify structural modifications leading to desired property profiles. In this situation, if ML is applied, one usually is reluctant to make decisions based on predictions that cannot be rationalized. Only few ML methods are interpretable. However, to yield insights into complex ML model decisions, explanatory approaches can be applied. Herein, methodologies for better understanding of ML models or explaining individual predictions are reviewed and current challenges in integrating ML into medicinal chemistry programs as well as future opportunities are discussed.
Affiliation(s)
- Raquel Rodríguez-Pérez
  - Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany
  - Novartis Institutes for Biomedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Jürgen Bajorath
  - Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany
11
Nikonenko A, Zankov D, Baskin I, Madzhidov T, Polishchuk P. Multiple Conformer Descriptors for QSAR Modeling. Mol Inform 2021; 40:e2060030. [PMID: 34342944 DOI: 10.1002/minf.202060030] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/14/2021] [Accepted: 07/19/2021] [Indexed: 12/11/2022]
Abstract
The most widely used QSAR approaches are mainly based on 2D molecular representation which ignores stereoconfiguration and conformational flexibility of compounds. 3D QSAR uses a single conformer of each compound which is difficult to choose reasonably. 4D QSAR uses multiple conformers to overcome the issues of 2D and 3D methods. However, many of existing 4D QSAR models suffer from the necessity to pre-align conformers, while alignment-independent approaches often ignore stereoconfiguration of compounds. In this study we propose a QSAR modeling approach based on transforming chirality-aware 3D pharmacophore descriptors of individual conformers into a set of latent variables representing the whole conformer set of a molecule. This is achieved by clustering together all conformers of all training set compounds. The final representation of a compound is a bit string encoding cluster membership of its conformers. In our study we used Random Forest, but this representation can be used in combination with any machine learning method. We compared this approach with conventional 2D and 3D approaches using multiple data sets and investigated the sensitivity of the approach proposed to tuning parameters: number of conformers and clusters.
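The final representation described here, a bit string marking which clusters a molecule's conformers fall into, can be sketched as follows. The abstract does not say which clustering algorithm or distance was used, so this example assumes that cluster centroids are already available and assigns each conformer to its nearest centroid by squared Euclidean distance; the two-dimensional descriptors and all values are hypothetical stand-ins for the 3D pharmacophore descriptors.

```python
def nearest_cluster(descriptor, centroids):
    """Index of the centroid closest (squared Euclidean) to a descriptor."""
    dists = [sum((a - b) ** 2 for a, b in zip(descriptor, c)) for c in centroids]
    return dists.index(min(dists))


def conformer_bitstring(conformers, centroids):
    """One bit per cluster, set if any conformer of the molecule falls in it."""
    bits = [0] * len(centroids)
    for descriptor in conformers:
        bits[nearest_cluster(descriptor, centroids)] = 1
    return bits


# Hypothetical centroids, as if obtained by clustering all training conformers.
centroids = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (9.0, 0.0)]

# Hypothetical descriptors for three conformers of one molecule.
molecule = [(0.1, -0.2), (0.9, 1.2), (0.2, 0.1)]

bits = conformer_bitstring(molecule, centroids)  # [1, 1, 0, 0]
```

The resulting fixed-length vector does not depend on how many conformers were generated or on their order, which is what allows it to be passed to any standard learner such as the Random Forest mentioned in the abstract.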
Affiliation(s)
- Aleksandra Nikonenko
  - Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University and University Hospital in Olomouc, Hnevotinska 5, 77900, Olomouc, Czech Republic
- Dmitry Zankov
  - A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kremlevskaya Str. 18, 420008, Kazan, Russia
- Igor Baskin
  - Department of Materials Science and Engineering, Technion-Israel Institute of Technology, 3200003, Haifa, Israel
- Timur Madzhidov
  - A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kremlevskaya Str. 18, 420008, Kazan, Russia
- Pavel Polishchuk
  - Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University and University Hospital in Olomouc, Hnevotinska 5, 77900, Olomouc, Czech Republic
12
Tinkov OV, Grigorev VY, Grigoreva LD. QSAR analysis of the acute toxicity of avermectins towards Tetrahymena pyriformis. SAR QSAR Environ Res 2021; 32:541-571. [PMID: 34157880 DOI: 10.1080/1062936x.2021.1932583] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/24/2021] [Accepted: 05/17/2021] [Indexed: 06/13/2023]
Abstract
Avermectins have been effectively used in medicine, veterinary medicine, and agriculture as antiparasitic agents for many years. However, there are still no reliable data on the main ecotoxicological characteristics of most individual avermectins. Although many QSAR models have been proposed to describe the acute toxicity of organic compounds towards Tetrahymena pyriformis (T. pyriformis), avermectins are outside the applicability domain of these models. The influence of the molecular structures of various organic compounds on the acute toxicity towards T. pyriformis was studied using the OCHEM web platform (https://ochem.eu). A data set of 1792 toxicants was used to create models. The QSAR (Quantitative Structure-Activity Relationship) models were developed using the molecular descriptors Dragon, ISIDA, CDK, PyDescriptor, alvaDesc, and SIRMS and machine learning methods, such as Least Squares Support Vector Machine and Transformer Convolutional Neural Network. The HYBOT descriptors and Random Forest were used for a comparative QSAR investigation. Since the best predictive ability was demonstrated by the Transformer Convolutional Neural Network model, it was used to predict the toxicity of individual avermectins towards T. pyriformis. During a structural interpretation of the developed QSAR model, we determined the significant molecular transformations that increase and decrease the acute toxicity of organic compounds.
Affiliation(s)
- O V Tinkov
  - Department of Pharmacology and Pharmaceutical Chemistry, Medical Faculty, Shevchenko Transnistria State University, Tiraspol, Moldova
  - Department of Computer Science, Military Institute of the Ministry of Defense, Tiraspol, Moldova
- V Y Grigorev
  - Department of Computer-aided Molecular Design, Institute of Physiologically Active Compounds of the Russian Academy of Science, Chernogolovka, Russia
- L D Grigoreva
  - Department of Fundamental Physicochemical Engineering, Moscow State University, Moscow, Russia
13
Matveieva M, Polishchuk P. Benchmarks for interpretation of QSAR models. J Cheminform 2021; 13:41. [PMID: 34039411 PMCID: PMC8157407 DOI: 10.1186/s13321-021-00519-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 02/23/2021] [Accepted: 05/15/2021] [Indexed: 01/06/2023] [Open Access]
Abstract
Interpretation of QSAR models is useful to understand the complex nature of biological or physicochemical processes, guide structural optimization or perform knowledge-based validation of QSAR models. Highly predictive models are usually complex and their interpretation is non-trivial. This is particularly true for modern neural networks. Various approaches to interpretation of these models exist. However, it is difficult to evaluate and compare performance and applicability of these ever-emerging methods. Herein, we developed several benchmark data sets with end-points determined by pre-defined patterns. These data sets are purposed for evaluation of the ability of interpretation approaches to retrieve these patterns. They represent tasks with different complexity levels: from simple atom-based additive properties to pharmacophore hypothesis. We proposed several quantitative metrics of interpretation performance. Applicability of benchmarks and metrics was demonstrated on a set of conventional models and end-to-end graph convolutional neural networks, interpreted by the previously suggested universal ML-agnostic approach for structural interpretation. We anticipate these benchmarks to be useful in evaluation of new interpretation approaches and investigation of decision making of complex "black box" models.
Affiliation(s)
- Mariia Matveieva
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University, University Hospital in Olomouc, Hnevotinska 5, 77900, Olomouc, Czech Republic
| | - Pavel Polishchuk
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University, University Hospital in Olomouc, Hnevotinska 5, 77900, Olomouc, Czech Republic.
14
Jiménez-Luna J, Skalic M, Weskamp N, Schneider G. Coloring Molecules with Explainable Artificial Intelligence for Preclinical Relevance Assessment. J Chem Inf Model 2021; 61:1083-1094. [PMID: 33629843 DOI: 10.1021/acs.jcim.0c01344] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 01/06/2023]
Abstract
Graph neural networks are able to solve certain drug discovery tasks such as molecular property prediction and de novo molecule generation. However, these models are considered "black-box" and "hard-to-debug". This study aimed to improve modeling transparency for rational molecular design by applying the integrated gradients explainable artificial intelligence (XAI) approach for graph neural network models. Models were trained for predicting plasma protein binding, hERG channel inhibition, passive permeability, and cytochrome P450 inhibition. The proposed methodology highlighted molecular features and structural elements that are in agreement with known pharmacophore motifs, correctly identified property cliffs, and provided insights into unspecific ligand-target interactions. The developed XAI approach is fully open-sourced and can be used by practitioners to train new models on other clinically relevant endpoints.
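To make the integrated gradients idea named in this abstract concrete, the following minimal sketch approximates the attribution integral with a midpoint Riemann sum for a toy differentiable function. The toy model, its weights, the all-zero baseline, and the step count are illustrative assumptions; this is not the graph neural network or the open-source code described by the authors.

```python
import math

WEIGHTS = [2.0, -1.0, 0.5]  # hypothetical weights of the toy model


def model(x):
    """Toy differentiable model (assumption): sigmoid of a weighted sum."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-z))


def model_grad(x):
    """Analytical gradient of the toy model with respect to each input."""
    s = model(x)
    return [w * s * (1.0 - s) for w in WEIGHTS]


def integrated_gradients(x, baseline, steps=200):
    """Approximate IG_i = (x_i - b_i) * integral over a in [0, 1] of
    dF/dx_i evaluated at b + a * (x - b), using the midpoint rule."""
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(model_grad(point)):
            avg_grad[i] += g / steps
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline)

# Completeness property: the attributions should sum (approximately)
# to the difference between the output at x and at the baseline.
completeness_gap = abs(sum(attributions) - (model(x) - model(baseline)))
```

In a molecular setting the inputs would be atom or bond features rather than three scalars, and the per-feature attributions are what gets mapped back onto the structure as a coloring.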
Affiliation(s)
- José Jiménez-Luna
- Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, 8049 Zurich, Switzerland
| | - Miha Skalic
- Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riss, Germany
| | - Nils Weskamp
- Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riss, Germany
| | - Gisbert Schneider
- Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, 8049 Zurich, Switzerland
15
Tinkov O, Polishchuk P, Matveieva M, Grigorev V, Grigoreva L, Porozov Y. The Influence of Structural Patterns on Acute Aquatic Toxicity of Organic Compounds. Mol Inform 2020; 40:e2000209. [PMID: 33029954 DOI: 10.1002/minf.202000209] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/14/2020] [Accepted: 10/01/2020] [Indexed: 12/28/2022]
Abstract
The influence of the molecular structure of organic compounds on acute toxicity towards Fathead minnow, Daphnia magna, and Tetrahymena pyriformis was investigated using the 2D simplex representation of molecular structure and two modelling methods: Random Forest (RF) and Gradient Boosting Machine (GBM). Suitable QSAR (Quantitative Structure-Activity Relationship) models were obtained. The study focused on the interpretation of the QSAR models. The aim was to develop a set of structural fragments that consistently increase toxicity toward all three organisms: Fathead minnow, Daphnia magna, and Tetrahymena pyriformis. The interpretation provided more detail about known toxicophores and suggested new fragments. The results made it possible to rank the contributions of molecular fragments to various types of toxicity to aquatic organisms. This information can be used for molecular optimization of chemicals. According to the results of structural interpretation, the most significant common mechanisms of the toxic effect of organic compounds on Fathead minnow, Daphnia magna, and Tetrahymena pyriformis are nucleophilic substitution reactions and inhibition of oxidative phosphorylation in mitochondria. In addition, acetylcholinesterase and voltage-gated ion channels of Fathead minnow and Daphnia magna are important targets for toxicants. The on-line version of the OCHEM expert system (https://ochem.eu) was used for a comparative QSAR investigation. The proposed QSAR models comply with the OECD principles and can be used to reliably predict acute toxicity of organic compounds towards Fathead minnow, Daphnia magna, and Tetrahymena pyriformis with allowance for applicability domain estimation.
Affiliation(s)
- Oleg Tinkov
- Department of Computer Science, Military Institute of the Ministry of Defense, 3300, Gogol str. 2"B", Tiraspol, Transdniestria, Moldova; Department of Pharmacology and Pharmaceutical Chemistry, Medical Faculty, Transnistrian State University, 3300, October 25 str. 128, Tiraspol, Transdniestria, Moldova
| | - Pavel Polishchuk
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University and University Hospital in Olomouc, Hnevotinska 5, 77900, Olomouc, Czech Republic
| | - Mariia Matveieva
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University and University Hospital in Olomouc, Hnevotinska 5, 77900, Olomouc, Czech Republic
| | - Veniamin Grigorev
- Institute of Physiologically Active Compounds, Russian Academy of Sciences, 142432, Severniy proezd 1, Chernogolovka, Moscow region, Russia
| | - Ludmila Grigoreva
- Department of Fundamental Physical and Chemical Engineering, Moscow State University, 119991, Leninskiye Gory 1/51, Moscow, Russia
| | - Yuri Porozov
- World-Class Research Center "Digital biodesign and personalized healthcare", I.M. Sechenov First Moscow State Medical University, Moscow, Russia; Department of Computational Biology, Sirius University of Science and Technology, 354340, Olympic Ave 1, Sochi, Russia
16
Yuan R, Xue D, Xue D, Li J, Ding X, Sun J, Lookman T. Knowledge-Based Descriptor for the Compositional Dependence of the Phase Transition in BaTiO3-Based Ferroelectrics. ACS Appl Mater Interfaces 2020; 12:44970-44980. [PMID: 32924419 DOI: 10.1021/acsami.0c12763] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
Descriptors play a central role in constructing composition-structure-property relationships to guide materials design. We propose a material descriptor, δτ, for the composition dependence of the Curie temperature (Tc) on single doping elements in BaTiO3 ferroelectrics, which is then generalized to a linear combination of multiple dopants in the solid solutions. The descriptor δτ depends linearly on the Curie temperature and also serves to separate the ferroelectric phase from the relaxor phase. We compare δτ to other commonly used descriptors such as the tolerance factor, electronegativity, and ionic displacement. By using regression analysis on our assembled experimental data, we show how it outperforms other descriptors. We use the trained machine-learned models to predict compositions in our search space with the largest ferroelectric, dielectric, and piezoelectric properties, namely, d33, electrostrain, and recoverable energy storage density. We experimentally verify our predictions for Tc and classification into ferroelectrics and relaxors by synthesizing and characterizing six solid solutions in BaTiO3 ferroelectrics. Our definition of δτ can shed light on the design of knowledge-based descriptors in other systems such as Pb-based and Bi-based solid solutions.
Affiliation(s)
- Ruihao Yuan
- State Key Laboratory of Solidification Processing, Northwestern Polytechnical University, Xi'an 710072, China
- State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong University, Xi'an 710049, China
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Deqing Xue
- School of Materials Science and Engineering, Xi'an University of Technology, Xi'an 710048, China
| | - Dezhen Xue
- State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong University, Xi'an 710049, China
| | - Jinshan Li
- State Key Laboratory of Solidification Processing, Northwestern Polytechnical University, Xi'an 710072, China
| | - Xiangdong Ding
- State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong University, Xi'an 710049, China
| | - Jun Sun
- State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong University, Xi'an 710049, China
| | - Turab Lookman
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
17
Coley CW, Eyke NS, Jensen KF. Autonomous Discovery in the Chemical Sciences Part I: Progress. Angew Chem Int Ed Engl 2020; 59:22858-22893. [DOI: 10.1002/anie.201909987] [Citation(s) in RCA: 100] [Impact Index Per Article: 25.0] [Received: 08/06/2019] [Indexed: 01/05/2023]
Affiliation(s)
- Connor W. Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Natalie S. Eyke
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Klavs F. Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
18
Coley CW, Eyke NS, Jensen KF. Autonome Entdeckung in den chemischen Wissenschaften, Teil I: Fortschritt [Autonomous Discovery in the Chemical Sciences, Part I: Progress]. Angew Chem Int Ed Engl 2020. [DOI: 10.1002/ange.201909987] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/14/2022]
Affiliation(s)
- Connor W. Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Natalie S. Eyke
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Klavs F. Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
19
Muratov EN, Bajorath J, Sheridan RP, Tetko IV, Filimonov D, Poroikov V, Oprea TI, Baskin II, Varnek A, Roitberg A, Isayev O, Curtarolo S, Fourches D, Cohen Y, Aspuru-Guzik A, Winkler DA, Agrafiotis D, Cherkasov A, Tropsha A. QSAR without borders. Chem Soc Rev 2020; 49:3525-3564. [PMID: 32356548 PMCID: PMC8008490 DOI: 10.1039/d0cs00098a] [Citation(s) in RCA: 312] [Impact Index Per Article: 78.0] [Indexed: 12/15/2022]
Abstract
Prediction of chemical bioactivity and physical properties has been one of the most important applications of statistical and, more recently, machine learning and artificial intelligence methods in the chemical sciences. This field of research, broadly known as quantitative structure-activity relationship (QSAR) modeling, has developed many important algorithms and has found a broad range of applications in physical organic and medicinal chemistry in the past 55+ years. This Perspective summarizes recent technological advances in QSAR modeling, but it also highlights the applicability of algorithms, modeling methods, and validation practices developed in QSAR to a wide range of research areas outside of traditional QSAR boundaries, including synthesis planning, nanotechnology, materials science, biomaterials, and clinical informatics. As modern research methods generate rapidly increasing amounts of data, the knowledge of robust data-driven modeling methods professed within the QSAR field can become essential for scientists working both within and outside of chemical research. We hope that this contribution highlighting the generalizable components of QSAR modeling will serve to address this challenge.
Affiliation(s)
- Eugene N Muratov
- UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA.
20
Wedlake AJ, Folia M, Piechota S, Allen TEH, Goodman JM, Gutsell S, Russell PJ. Structural Alerts and Random Forest Models in a Consensus Approach for Receptor Binding Molecular Initiating Events. Chem Res Toxicol 2020; 33:388-401. [PMID: 31850746 DOI: 10.1021/acs.chemrestox.9b00325] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 01/22/2023]
Abstract
A molecular initiating event (MIE) is the gateway to an adverse outcome pathway (AOP), a sequence of events ending in an adverse effect. In silico predictions of MIEs are a vital tool in a modern, mechanism-focused approach to chemical risk assessment. For 90 biological targets representing important human MIEs, structural alert-based models have been constructed with an automated procedure that uses Bayesian statistics to iteratively select substructures. These models give impressive average performance statistics (an average of 92% correct predictions across targets), significantly improving on previous models. Random Forest models have been constructed from physicochemical features for the same targets, giving similarly impressive performance statistics (93% correct predictions). A key difference between the models is the interpretation of predictions: the structural alert models are transparent and easy to interpret, while Random Forest models can only identify the most important physicochemical features for making predictions. The two complementary models have been combined in a consensus model, improving performance compared to each individual model (94% correct predictions) and increasing confidence in predictions. Variation in model performance has been explained by calculating a modelability index (MODI), using the Tanimoto coefficient between Morgan fingerprints to identify nearest neighbor chemicals. This work is an important step toward building confidence in the use of in silico tools for assessment of toxicity.
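The nearest-neighbor calculation behind a modelability index can be sketched as follows. This is a minimal illustration, assuming MODI is taken as the class-averaged fraction of compounds whose most similar neighbor (by Tanimoto coefficient) carries the same activity label. Fingerprints are represented as plain Python sets of "on" bit indices instead of real Morgan fingerprints, and the toy data are invented for the example; none of this is the authors' code.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def nearest_neighbor(index, fingerprints):
    """Index of the most similar other compound (first one wins on ties)."""
    best_j, best_sim = None, -1.0
    for j, fp in enumerate(fingerprints):
        if j == index:
            continue
        sim = tanimoto(fingerprints[index], fp)
        if sim > best_sim:
            best_j, best_sim = j, sim
    return best_j


def modi(fingerprints, labels):
    """Average, over classes, of the fraction of compounds whose
    nearest neighbor has the same class label."""
    per_class = []
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        same = sum(labels[nearest_neighbor(i, fingerprints)] == cls for i in members)
        per_class.append(same / len(members))
    return sum(per_class) / len(per_class)


# Toy data: hypothetical bit sets, not real Morgan fingerprints.
fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {7, 8, 10}, {1, 2, 3, 9}]
labels = [1, 1, 0, 0, 0]
score = modi(fps, labels)  # 7/12 for this toy set
```

A score near 1 means that similar compounds mostly share an activity class, which is the situation in which similarity-based models are expected to perform well.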
Affiliation(s)
- Andrew J Wedlake
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, United Kingdom
| | - Maria Folia
- Unilever Safety and Environmental Assurance Centre, Colworth Science Park, Sharnbrook, Bedfordshire, MK44 1LQ, United Kingdom
| | - Sam Piechota
- Unilever Safety and Environmental Assurance Centre, Colworth Science Park, Sharnbrook, Bedfordshire, MK44 1LQ, United Kingdom
| | - Timothy E H Allen
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, United Kingdom; MRC Toxicology Unit, University of Cambridge, Lancaster Road, Leicester LE1 9HN, United Kingdom
| | - Jonathan M Goodman
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, United Kingdom
| | - Steve Gutsell
- Unilever Safety and Environmental Assurance Centre, Colworth Science Park, Sharnbrook, Bedfordshire, MK44 1LQ, United Kingdom
| | - Paul J Russell
- Unilever Safety and Environmental Assurance Centre, Colworth Science Park, Sharnbrook, Bedfordshire, MK44 1LQ, United Kingdom
21
Ma XY, Lewis JP, Yan QB, Su G. Accelerated Discovery of Two-Dimensional Optoelectronic Octahedral Oxyhalides via High-Throughput Ab Initio Calculations and Machine Learning. J Phys Chem Lett 2019; 10:6734-6740. [PMID: 31621332 DOI: 10.1021/acs.jpclett.9b02420] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 06/10/2023]
Abstract
Traditional trial-and-error methods are obstacles for large-scale searching of new optoelectronic materials. Here, we introduce a method combining high-throughput ab initio calculations and machine-learning approaches to predict two-dimensional octahedral oxyhalides with improved optoelectronic properties. We develop an effective machine-learning model based on an expansive data set generated from density functional calculations including the geometric and electronic properties of 300 two-dimensional octahedral oxyhalides. Our model accelerates the screening of potential optoelectronic materials of 5000 two-dimensional octahedral oxyhalides. The distorted stacked octahedral factors proposed in our model play essential roles in the machine-learning prediction. Several potential two-dimensional optoelectronic octahedral oxyhalides with moderate band gaps, high electron mobilities, and ultrahigh absorbance coefficients are successfully hypothesized.
Affiliation(s)
- Xing-Yu Ma
- School of Physical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| | - James P Lewis
- Department of Physics and Astronomy, West Virginia University, Morgantown, West Virginia 26506-6315, United States
- State Key Laboratory of Coal Conversion, Institute of Coal Chemistry, Chinese Academy of Sciences, Taiyuan, Shanxi 030001, China
- Beijing Advanced Innovation Center for Materials Genome Engineering, Beijing Information S & T University, Beijing 101400, China
| | - Qing-Bo Yan
- Center of Materials Science and Optoelectronics Engineering, College of Materials Science and Optoelectronic Technology, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Gang Su
- School of Physical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Kavli Institute for Theoretical Sciences, and CAS Center of Excellence in Topological Quantum Computation, University of Chinese Academy of Sciences, Beijing 100190, China
22
Guo Y, Zhao L, Zhang X, Zhu H. Using a hybrid read-across method to evaluate chemical toxicity based on chemical structure and biological data. Ecotoxicol Environ Saf 2019; 178:178-187. [PMID: 31004930 PMCID: PMC6508079 DOI: 10.1016/j.ecoenv.2019.04.019] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 02/08/2019] [Revised: 04/05/2019] [Accepted: 04/07/2019] [Indexed: 05/08/2023]
Abstract
Read-across has become a primary approach to fill data gaps for chemical safety assessments. Chemical similarity based on structure, reactivity, and physicochemical property information is the traditional approach applied in read-across toxicity studies. However, toxicity mechanisms are usually complicated in a biological system, so using chemical similarity alone to perform read-across for new compounds is not satisfactory for most toxicity endpoints, especially when chemically similar compounds show dissimilar toxicities. This study aims to develop an enhanced read-across method for chemical toxicity predictions. To this end, we used two large toxicity datasets for read-across purposes. One consists of 3979 compounds with Ames mutagenicity data, and the other contains 7332 compounds with rat acute oral toxicity data. First, biological data for all compounds in these two datasets were obtained by querying thousands of PubChem bioassays. The PubChem bioassays with at least five compounds from either of these two datasets showing active responses were selected to generate comprehensive bioprofiles. The read-across studies were performed by using chemical similarity search only and also by using a hybrid similarity search based on both chemical descriptors and bioprofiles. Compared to traditional read-across based on chemical similarity, the hybrid read-across approach showed improved accuracy of predictions for both Ames mutagenicity and acute oral toxicity. Furthermore, we could illustrate potential toxicity mechanisms by analyzing the bioprofiles used for this hybrid read-across study. The results of this study indicate that the new hybrid read-across approach could be an applicable computational tool for chemical toxicity predictions. In this way, the bottleneck of traditional read-across studies can be overcome by introducing public biological data into the traditional process. The incorporation of bioprofiles generated from the additional biological data for compounds can partially solve the "activity cliff" issue and reveal their potential toxicity mechanisms. This study leads to a promising direction to utilize data-driven approaches for computational toxicology studies in the big data era.
Affiliation(s)
- Yajie Guo
- College of Life Science and Bioengineering, Beijing University of Technology, Beijing, China
| | - Linlin Zhao
- Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA
| | - Xiaoyi Zhang
- College of Life Science and Bioengineering, Beijing University of Technology, Beijing, China.
| | - Hao Zhu
- Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA; Department of Chemistry, Rutgers University, Camden, NJ, USA.
23
Sheridan RP. Interpretation of QSAR Models by Coloring Atoms According to Changes in Predicted Activity: How Robust Is It? J Chem Inf Model 2019; 59:1324-1337. [DOI: 10.1021/acs.jcim.8b00825] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 11/28/2022]
Affiliation(s)
- Robert P. Sheridan
- Modeling and Informatics, Merck & Co. Inc., Kenilworth, New Jersey 07065, United States
24
Bosc N, Atkinson F, Felix E, Gaulton A, Hersey A, Leach AR. Large scale comparison of QSAR and conformal prediction methods and their applications in drug discovery. J Cheminform 2019; 11:4. [PMID: 30631996 PMCID: PMC6690068 DOI: 10.1186/s13321-018-0325-4] [Citation(s) in RCA: 70] [Impact Index Per Article: 14.0] [Received: 09/14/2018] [Accepted: 12/24/2018] [Indexed: 12/22/2022] [Open Access]
Abstract
Structure–activity relationship modelling is frequently used in the early stage of drug discovery to assess the activity of a compound on one or several targets, and can also be used to assess the interaction of compounds with liability targets. QSAR models have been used for these and related applications over many years, with good success. Conformal prediction is a relatively new QSAR approach that provides information on the certainty of a prediction, and so helps in decision-making. However, it is not always clear how best to make use of this additional information. In this article, we describe a case study that directly compares conformal prediction with traditional QSAR methods for large-scale predictions of target-ligand binding. The ChEMBL database was used to extract a data set comprising data from 550 human protein targets with different bioactivity profiles. For each target, a QSAR model and a conformal predictor were trained and their results compared. The models were then evaluated on new data published since the original models were built to simulate a “real world” application. The comparative study highlights the similarities between the two techniques but also some differences that it is important to bear in mind when the methods are used in practical drug discovery applications.
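How a conformal predictor turns model outputs into statements about certainty can be sketched as follows. This is a minimal illustration of class-conditional inductive conformal classification, assuming a nonconformity score of one minus the predicted class probability. The calibration scores, probabilities, and significance level are invented for the example, and the exact setup used in the study is not given in the abstract.

```python
def p_value(calibration_scores, test_score):
    """Share of calibration nonconformity scores at least as large as the
    test score, with the usual +1 correction for the test compound itself."""
    n_greater_equal = sum(1 for s in calibration_scores if s >= test_score)
    return (n_greater_equal + 1) / (len(calibration_scores) + 1)


def prediction_set(class_probs, calibration, significance):
    """Return every label whose p-value exceeds the significance level.

    class_probs: dict label -> probability from any underlying model.
    calibration: dict label -> nonconformity scores of calibration
                 compounds whose true label is that class.
    """
    labels = []
    for label, prob in class_probs.items():
        score = 1.0 - prob  # low probability for a label = high nonconformity
        if p_value(calibration[label], score) > significance:
            labels.append(label)
    return labels


# Hypothetical calibration scores (1 - probability given to the true class).
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70]
calibration = {"active": scores, "inactive": scores}

# A confident model output yields a single-label prediction set.
confident = prediction_set({"active": 0.90, "inactive": 0.10}, calibration, 0.2)

# A borderline output yields both labels, flagging the prediction as uncertain.
uncertain = prediction_set({"active": 0.55, "inactive": 0.45}, calibration, 0.2)
```

The size of the returned set is the extra information compared with a plain QSAR prediction: one label indicates a prediction at the chosen confidence level, two labels or none indicate that the model cannot separate the classes for that compound.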
Affiliation(s)
- Nicolas Bosc
- Chemogenomics Team, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
| | - Francis Atkinson
- Chemogenomics Team, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Eloy Felix
- Chemogenomics Team, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Anna Gaulton
- Chemogenomics Team, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Anne Hersey
- Chemogenomics Team, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Andrew R Leach
- Chemogenomics Team, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
25
Wenzel J, Matter H, Schmidt F. Predictive Multitask Deep Neural Network Models for ADME-Tox Properties: Learning from Large Data Sets. J Chem Inf Model 2019; 59:1253-1268. [DOI: 10.1021/acs.jcim.8b00785] [Citation(s) in RCA: 96] [Impact Index Per Article: 19.2] [Indexed: 01/31/2023]
26
Tinkov O, Grigorev V, Polishchuk P, Yarkov A, Raevsky O. QSAR investigation of acute toxicity of organic compounds during oral administration to mice. Biomed Khim 2019; 65:123-132. [DOI: 10.18097/pbmc20196502123] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/23/2022]
Abstract
The effect of the structure of organic compounds on acute toxicity upon oral administration to mice was studied using the 2D simplex representation of molecular structure and the Random Forest (RF) method. Satisfactory quantitative structure-activity relationship (QSAR) models were constructed (R² (test) = 0.61–0.62). The obtained QSAR models were then interpreted. The contributions of known toxicophores with established mechanisms of action were calculated in order to confirm the ability of the interpretation approach to correctly rank them relative to other structural fragments. The influence of the molecular surroundings of some toxicophores was analyzed. We analyzed the contributions of other highly ranked fragments from the list of common functional groups and ring systems in order to find new potential toxicophores. The on-line version of the expert system “OCHEM” (https://ochem.eu) and the Arithmetic Mean Toxicity (AMT) approach were used for a comparative QSAR study.
Affiliation(s)
- O.V. Tinkov
- Military Institute of the Ministry of Defense, Tiraspol, Moldova
| | - V.Yu. Grigorev
- Institute of Physiologically Active Compounds, Russian Academy of Sciences, Chernogolovka, Russia
| | - P.G. Polishchuk
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University, Olomouc, Czech Republic
| | - A.V. Yarkov
- Institute of Physiologically Active Compounds, Russian Academy of Sciences, Chernogolovka, Russia
| | - O.A. Raevsky
- Institute of Physiologically Active Compounds, Russian Academy of Sciences, Chernogolovka, Russia
27
Low YS, Alves VM, Fourches D, Sedykh A, Andrade CH, Muratov EN, Rusyn I, Tropsha A. Chemistry-Wide Association Studies (CWAS): A Novel Framework for Identifying and Interpreting Structure-Activity Relationships. J Chem Inf Model 2018; 58:2203-2213. [PMID: 30376324 DOI: 10.1021/acs.jcim.8b00450] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 02/04/2023]
Abstract
Quantitative structure-activity relationship (QSAR) models are often seen as a "black box" because they are considered difficult to interpret. Meanwhile, qualitative approaches, e.g., structural alerts (SA) or read-across, provide mechanistic insight, which is preferred for regulatory purposes, but the predictive accuracy of such approaches is often low. Herein, we introduce the chemistry-wide association study (CWAS) approach, a novel framework that both addresses such deficiencies and combines the advantages of statistical QSAR and alert-based approaches. The CWAS framework consists of the following steps: (i) QSAR model building for an end point of interest, (ii) identification of key chemical features, (iii) determination of communities of such features that co-occur disproportionately more often in the active than in the inactive class, and (iv) assembling these communities to form larger (and not necessarily chemically connected) novel structural alerts with high specificity. As a proof-of-concept, we have applied CWAS to model Ames mutagenicity and Stevens-Johnson Syndrome (SJS). For the well-studied Ames mutagenicity data set, we identified 76 important individual fragments and assembled co-occurring fragments into SA that both replicate known mutagenicity alerts and represent novel ones. For the SJS data set, we identified 29 important fragments and assembled co-occurring communities into SA including both known and novel alerts. In summary, we demonstrate that CWAS provides a new framework to interpret predictive QSAR models and derive refined structural alerts for more effective design and safety assessment of drugs and drug candidates.
Affiliation(s)
- Yen S Low
- Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
| | - Vinicius M Alves
- Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States; Laboratory for Molecular Modeling and Design, Department of Pharmacy, Federal University of Goias, Goiania, Goias 74605-170, Brazil
| | - Denis Fourches
- Department of Chemistry and Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina 27695, United States
| | - Alexander Sedykh
- Sciome LLC, Research Triangle Park, North Carolina 27709, United States
| | - Carolina Horta Andrade
- Laboratory for Molecular Modeling and Design, Department of Pharmacy, Federal University of Goias, Goiania, Goias 74605-170, Brazil
| | - Eugene N Muratov
- Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States; Department of Chemical Technology, Odessa National Polytechnic University, Odessa 65000, Ukraine
| | - Ivan Rusyn
- Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, Texas 77843, United States
| | - Alexander Tropsha
- Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
28
Matveieva M, Cronin MTD, Polishchuk P. Interpretation of QSAR Models: Mining Structural Patterns Taking into Account Molecular Context. Mol Inform 2018; 38:e1800084. [DOI: 10.1002/minf.201800084] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 07/08/2018] [Accepted: 09/27/2018] [Indexed: 01/22/2023]
Affiliation(s)
- Mariia Matveieva
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University and University Hospital in Olomouc, Hnevotinska 5, 77900 Olomouc, Czech Republic
| | - Mark T. D. Cronin
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF, United Kingdom
| | - Pavel Polishchuk
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University and University Hospital in Olomouc, Hnevotinska 5, 77900 Olomouc, Czech Republic
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kremlevskaya Str. 10, Kazan, Russia
|
29
|
Cardoso‐Silva J, Papadatos G, Papageorgiou LG, Tsoka S. Optimal Piecewise Linear Regression Algorithm for QSAR Modelling. Mol Inform 2018; 38:e1800028. [DOI: 10.1002/minf.201800028] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 03/06/2018] [Accepted: 08/02/2018] [Indexed: 12/20/2022]
Affiliation(s)
- Jonathan Cardoso‐Silva
- Department of Informatics, Faculty of Natural and Mathematical Sciences, King's College London, Bush House, London WC2B 4BG, UK
| | - George Papadatos
- European Molecular Biology Laboratory – European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- GlaxoSmithKline, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, UK
| | - Lazaros G. Papageorgiou
- Centre for Process Systems Engineering, Department of Chemical Engineering, University College London, Torrington Place, London WC1E 7JE, UK
| | - Sophia Tsoka
- Department of Informatics, Faculty of Natural and Mathematical Sciences, King's College London, Bush House, London WC2B 4BG, UK
|
30
|
Polishchuk P. Interpretation of Quantitative Structure–Activity Relationship Models: Past, Present, and Future. J Chem Inf Model 2017; 57:2618-2639. [DOI: 10.1021/acs.jcim.7b00274] [Citation(s) in RCA: 120] [Impact Index Per Article: 17.1] [Indexed: 02/01/2023]
Affiliation(s)
- Pavel Polishchuk
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University and University Hospital in Olomouc, Hněvotínská 1333/5, 779 00 Olomouc, Czech Republic
|
31
|
Marchese Robinson RL, Palczewska A, Palczewski J, Kidley N. Comparison of the Predictive Performance and Interpretability of Random Forest and Linear Models on Benchmark Data Sets. J Chem Inf Model 2017; 57:1773-1792. [PMID: 28715209 DOI: 10.1021/acs.jcim.6b00753] [Citation(s) in RCA: 59] [Impact Index Per Article: 8.4] [Indexed: 11/29/2022]
Abstract
The ability to interpret the predictions made by quantitative structure-activity relationships (QSARs) offers a number of advantages. While QSARs built using nonlinear modeling approaches, such as the popular Random Forest algorithm, might sometimes be more predictive than those built using linear modeling approaches, their predictions have been perceived as difficult to interpret. However, a growing number of approaches have been proposed for interpreting nonlinear QSAR models in general and Random Forest in particular. In the current work, we compare the performance of Random Forest to those of two widely used linear modeling approaches: linear Support Vector Machines (SVMs) (or Support Vector Regression (SVR)) and partial least-squares (PLS). We compare their performance in terms of their predictivity as well as the chemical interpretability of the predictions using novel scoring schemes for assessing heat map images of substructural contributions. We critically assess different approaches for interpreting Random Forest models as well as for obtaining predictions from the forest. We assess the models on a large number of widely employed public-domain benchmark data sets corresponding to regression and binary classification problems of relevance to hit identification and toxicology. We conclude that Random Forest typically yields comparable or possibly better predictive performance than the linear modeling approaches and that its predictions may also be interpreted in a chemically and biologically meaningful way. In contrast to earlier work looking at interpretation of nonlinear QSAR models, we directly compare two methodologically distinct approaches for interpreting Random Forest models. The approaches for interpreting Random Forest assessed in our article were implemented using open-source programs that we have made available to the community. 
These programs are the rfFC package ( https://r-forge.r-project.org/R/?group_id=1725 ) for the R statistical programming language and the Python program HeatMapWrapper [ https://doi.org/10.5281/zenodo.495163 ] for heat map generation.
Affiliation(s)
- Richard L Marchese Robinson
- Syngenta Ltd., Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, United Kingdom
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, James Parsons Building, Byrom Street, Liverpool L3 3AF, United Kingdom
| | - Anna Palczewska
- Department of Computing, University of Bradford, Bradford BD7 1DP, United Kingdom
| | - Jan Palczewski
- School of Mathematics, University of Leeds, Leeds LS2 9JT, United Kingdom
| | - Nathan Kidley
- Syngenta Ltd., Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, United Kingdom
|
32
|
Coley CW, Barzilay R, Green WH, Jaakkola TS, Jensen KF. Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction. J Chem Inf Model 2017; 57:1757-1772. [PMID: 28696688 DOI: 10.1021/acs.jcim.6b00601] [Citation(s) in RCA: 220] [Impact Index Per Article: 31.4] [Indexed: 01/29/2023]
Abstract
The task of learning an expressive molecular representation is central to developing quantitative structure-activity and property relationships. Traditional approaches rely on group additivity rules, empirical measurements or parameters, or generation of thousands of descriptors. In this paper, we employ a convolutional neural network for this embedding task by treating molecules as undirected graphs with attributed nodes and edges. Simple atom and bond attributes are used to construct atom-specific feature vectors that take into account the local chemical environment using different neighborhood radii. By working directly with the full molecular graph, there is a greater opportunity for models to identify important features relevant to a prediction task. Unlike other graph-based approaches, our atom featurization preserves molecule-level spatial information that significantly enhances model performance. Our models learn to identify important features of atom clusters for the prediction of aqueous solubility, octanol solubility, melting point, and toxicity. Extensions and limitations of this strategy are discussed.
Affiliation(s)
- Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - Regina Barzilay
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - Tommi S Jaakkola
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - Klavs F Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
|
33
|
Isayev O, Oses C, Toher C, Gossett E, Curtarolo S, Tropsha A. Universal fragment descriptors for predicting properties of inorganic crystals. Nat Commun 2017; 8:15679. [PMID: 28580961 PMCID: PMC5465371 DOI: 10.1038/ncomms15679] [Citation(s) in RCA: 171] [Impact Index Per Article: 24.4] [Received: 02/14/2017] [Accepted: 04/11/2017] [Indexed: 12/23/2022]
Abstract
Although historically materials discovery has been driven by a laborious trial-and-error process, knowledge-driven materials design can now be enabled by the rational combination of Machine Learning methods and materials databases. Here, data from the AFLOW repository for ab initio calculations is combined with Quantitative Materials Structure-Property Relationship models to predict important properties: metal/insulator classification, band gap energy, bulk/shear moduli, Debye temperature and heat capacities. The prediction's accuracy compares well with the quality of the training data for virtually any stoichiometric inorganic crystalline material, reciprocating the available thermomechanical experimental data. The universality of the approach is attributed to the construction of the descriptors: Property-Labelled Materials Fragments. The representations require only minimal structural input allowing straightforward implementations of simple heuristic design rules.
Affiliation(s)
- Olexandr Isayev
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, USA
| | - Corey Oses
- Center for Materials Genomics, Duke University, Durham, North Carolina 27708, USA
| | - Cormac Toher
- Center for Materials Genomics, Duke University, Durham, North Carolina 27708, USA
| | - Eric Gossett
- Center for Materials Genomics, Duke University, Durham, North Carolina 27708, USA
| | - Stefano Curtarolo
- Center for Materials Genomics, Duke University, Durham, North Carolina 27708, USA
- Materials Science, Electrical Engineering, Physics and Chemistry, Duke University, Durham, North Carolina 27708, USA
| | - Alexander Tropsha
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, USA
|
34
|
A Ranged Series of Drug Molecule Fragments Defining Their Neuroavailability. Pharm Chem J 2017. [DOI: 10.1007/s11094-017-1553-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|
35
|
Structural, Physicochemical and Stereochemical Interpretation of QSAR Models Based on Simplex Representation of Molecular Structure. Challenges and Advances in Computational Chemistry and Physics 2017. [DOI: 10.1007/978-3-319-56850-8_4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/23/2022]
|
36
|
Alves V, Muratov E, Capuzzi S, Politi R, Low Y, Braga R, Zakharov AV, Sedykh A, Mokshyna E, Farag S, Andrade C, Kuz'min V, Fourches D, Tropsha A. Alarms about structural alerts. Green Chem 2016; 18:4348-4360. [PMID: 28503093 PMCID: PMC5423727 DOI: 10.1039/c6gc01492e] [Citation(s) in RCA: 62] [Impact Index Per Article: 7.8] [Indexed: 05/30/2023]
Abstract
Structural alerts are widely accepted in chemical toxicology and regulatory decision support as a simple and transparent means to flag potential chemical hazards or group compounds into categories for read-across. However, there has been a growing concern that alerts disproportionally flag too many chemicals as toxic, which questions their reliability as toxicity markers. Conversely, the rigorously developed and properly validated statistical QSAR models can accurately and reliably predict the toxicity of a chemical; however, their use in regulatory toxicology has been hampered by the lack of transparency and interpretability. We demonstrate that contrary to the common perception of QSAR models as "black boxes" they can be used to identify statistically significant chemical substructures (QSAR-based alerts) that influence toxicity. We show through several case studies, however, that the mere presence of structural alerts in a chemical, irrespective of the derivation method (expert-based or QSAR-based), should be perceived only as hypotheses of possible toxicological effect. We propose a new approach that synergistically integrates structural alerts and rigorously validated QSAR models for a more transparent and accurate safety assessment of new chemicals.
Affiliation(s)
- Vinicius Alves
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
- Laboratory for Molecular Modeling and Design, Department of Pharmacy, Federal University of Goias, Goiania, GO, 74605-170, Brazil
| | - Eugene Muratov
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
- Department of Chemical Technology, Odessa National Polytechnic University, Odessa, 65000, Ukraine
| | - Stephen Capuzzi
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
| | - Regina Politi
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
| | - Yen Low
- Netflix, San Francisco, CA 94123, USA
| | - Rodolpho Braga
- Laboratory for Molecular Modeling and Design, Department of Pharmacy, Federal University of Goias, Goiania, GO, 74605-170, Brazil
| | - Alexey V. Zakharov
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Rockville, MD 20850, USA
| | | | - Elena Mokshyna
- Laboratory of Theoretical Chemistry, A.V. Bogatsky Physical-Chemical Institute NAS of Ukraine, Odessa, 65080, Ukraine
| | - Sherif Farag
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
| | - Carolina Andrade
- Laboratory for Molecular Modeling and Design, Department of Pharmacy, Federal University of Goias, Goiania, GO, 74605-170, Brazil
| | - Victor Kuz'min
- Laboratory of Theoretical Chemistry, A.V. Bogatsky Physical-Chemical Institute NAS of Ukraine, Odessa, 65080, Ukraine
| | - Denis Fourches
- Department of Chemistry and Bioinformatics Research Center, North Carolina State University, Raleigh, NC, 27695, USA
| | - Alexander Tropsha
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
|
37
|
Polishchuk P, Tinkov O, Khristova T, Ognichenko L, Kosinskaya A, Varnek A, Kuz’min V. Structural and Physico-Chemical Interpretation (SPCI) of QSAR Models and Its Comparison with Matched Molecular Pair Analysis. J Chem Inf Model 2016; 56:1455-69. [DOI: 10.1021/acs.jcim.6b00371] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Indexed: 02/02/2023]
Affiliation(s)
- Pavel Polishchuk
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University and University Hospital in Olomouc, Hněvotínská 1333/5, 779 00 Olomouc, Czech Republic
- A. V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine, Lustdorfskaya doroga 86, 65080 Odessa, Ukraine
| | - Oleg Tinkov
- T. G. Shevchenko Transdniestria State University, ul. 25 Oktyabrya 107, 3300 Tiraspol, Transdniestria, Republic of Moldova
| | - Tatiana Khristova
- A. V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine, Lustdorfskaya doroga 86, 65080 Odessa, Ukraine
- Laboratoire de Chémoinformatique, UMR 7140 CNRS, Université de Strasbourg, 1 rue Blaise Pascal, 67000 Strasbourg, France
| | - Ludmila Ognichenko
- A. V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine, Lustdorfskaya doroga 86, 65080 Odessa, Ukraine
| | - Anna Kosinskaya
- A. V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine, Lustdorfskaya doroga 86, 65080 Odessa, Ukraine
| | - Alexandre Varnek
- Laboratoire de Chémoinformatique, UMR 7140 CNRS, Université de Strasbourg, 1 rue Blaise Pascal, 67000 Strasbourg, France
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlevskaya 18, Kazan, Russia
| | - Victor Kuz’min
- A. V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine, Lustdorfskaya doroga 86, 65080 Odessa, Ukraine
|
38
|
Zhang YY, Liu H, Summerfield SG, Luscombe CN, Sahi J. Integrating in Silico and in Vitro Approaches To Predict Drug Accessibility to the Central Nervous System. Mol Pharm 2016; 13:1540-50. [DOI: 10.1021/acs.molpharmaceut.6b00031] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Indexed: 01/16/2023]
Affiliation(s)
- Yan-Yan Zhang
- Drug Metabolism and Pharmacokinetics, Platform Technology and Science China, GlaxoSmithKline R&D, Shanghai, China
| | - Houfu Liu
- Drug Metabolism and Pharmacokinetics, Platform Technology and Science China, GlaxoSmithKline R&D, Shanghai, China
| | - Scott G. Summerfield
- David Jack Centre for R&D, GlaxoSmithKline R&D, Park Road, Ware, Hertfordshire, SG12 0DP, U.K
| | - Christopher N. Luscombe
- Computational and Structural Chemistry, GlaxoSmithKline Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, U.K
| | - Jasminder Sahi
- Drug Metabolism and Pharmacokinetics, Platform Technology and Science China, GlaxoSmithKline R&D, Shanghai, China
|
39
|
Pérez-Garrido A, Rivero-Buceta V, Cano G, Kumar S, Pérez-Sánchez H, Bautista MT. Latest QSAR study of adenosine A2B receptor affinity of xanthines and deazaxanthines. Mol Divers 2015; 19:975-89. [DOI: 10.1007/s11030-015-9608-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 02/25/2015] [Accepted: 06/24/2015] [Indexed: 12/24/2022]
|
40
|
Balfer J, Bajorath J. Visualization and Interpretation of Support Vector Machine Activity Predictions. J Chem Inf Model 2015; 55:1136-47. [DOI: 10.1021/acs.jcim.5b00175] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Indexed: 01/28/2023]
Affiliation(s)
- Jenny Balfer
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany
| | - Jürgen Bajorath
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany
|
41
|
Cortes-Ciriano I, Murrell DS, van Westen GJ, Bender A, Malliavin TE. Prediction of the potency of mammalian cyclooxygenase inhibitors with ensemble proteochemometric modeling. J Cheminform 2015; 7:1. [PMID: 25705261 PMCID: PMC4335128 DOI: 10.1186/s13321-014-0049-z] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Received: 08/01/2014] [Accepted: 11/21/2014] [Indexed: 12/16/2022]
Abstract
Cyclooxygenases (COX) are present in the body in two isoforms, namely: COX-1, constitutively expressed, and COX-2, induced in physiopathological conditions such as cancer or chronic inflammation. The inhibition of COX with non-steroidal anti-inflammatory drugs (NSAIDs) is the most widely used treatment for chronic inflammation despite the adverse effects associated with prolonged NSAID intake. Although selective COX-2 inhibition has been shown not to palliate all adverse effects (e.g. cardiotoxicity), there are still niche populations which can benefit from selective COX-2 inhibition. Thus, capitalizing on bioactivity data from both isoforms simultaneously would contribute to developing COX inhibitors with better safety profiles. We applied ensemble proteochemometric modeling (PCM) for the prediction of the potency of 3,228 distinct COX inhibitors on 11 mammalian cyclooxygenases. Ensemble PCM models ([Formula: see text], and RMSEtest = 0.71) outperformed models exclusively trained on compound ([Formula: see text], and RMSEtest = 1.09) or protein descriptors ([Formula: see text] and RMSEtest = 1.10) on the test set. Moreover, PCM predicted COX potency for 1,086 selective and non-selective COX inhibitors with [Formula: see text] and RMSEtest = 0.76. These values are in agreement with the maximum and minimum achievable [Formula: see text] and RMSEtest values of approximately 0.68 for both metrics. Confidence intervals for individual predictions were calculated from the standard deviation of the predictions from the individual models composing the ensembles. Finally, two substructure analysis pipelines singled out chemical substructures implicated in both potency and selectivity in agreement with the literature. Graphical Abstract: Prediction of uncorrelated bioactivity profiles for mammalian COX inhibitors with Ensemble Proteochemometric Modeling.
Affiliation(s)
- Isidro Cortes-Ciriano
- Département de Biologie Structurale et Chimie, Institut Pasteur, Unité de Bioinformatique Structurale; CNRS UMR 3825, 25, rue du Dr Roux, Paris, 75015 France
| | - Daniel S Murrell
- Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | - Gerard Jp van Westen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Andreas Bender
- Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | - Thérèse E Malliavin
- Département de Biologie Structurale et Chimie, Institut Pasteur, Unité de Bioinformatique Structurale; CNRS UMR 3825, 25, rue du Dr Roux, Paris, 75015 France
|
42
|
Sushko Y, Novotarskyi S, Körner R, Vogt J, Abdelaziz A, Tetko IV. Prediction-driven matched molecular pairs to interpret QSARs and aid the molecular optimization process. J Cheminform 2014; 6:48. [PMID: 25544551 PMCID: PMC4272757 DOI: 10.1186/s13321-014-0048-0] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 09/02/2014] [Accepted: 11/07/2014] [Indexed: 11/24/2022]
Abstract
Background: QSAR is an established and powerful method for cheap in silico assessment of physicochemical properties and biological activities of chemical compounds. However, QSAR models are rather complex mathematical constructs that cannot easily be interpreted. Medicinal chemists would benefit from practical guidance regarding which molecules to synthesize. Another possible approach is analysis of pairs of very similar molecules, so-called matched molecular pairs (MMPs). Such an approach allows identification of molecular transformations that affect particular activities (e.g. toxicity). In contrast to QSAR, chemical interpretation of these transformations is straightforward. Furthermore, such transformations can give medicinal chemists useful hints for the hit-to-lead optimization process. Results: The current study suggests a combination of QSAR and MMP approaches by finding MMP transformations based on QSAR predictions for large chemical datasets. The study shows that such an approach, referred to as prediction-driven MMP analysis, is a useful tool for medicinal chemists, allowing identification of large numbers of “interesting” transformations that can be used to drive the molecular optimization process. All the methodological developments have been implemented as software products available online as part of OCHEM (http://ochem.eu/). Conclusions: The prediction-driven MMPs methodology was exemplified by two use cases: modelling of aquatic toxicity and CYP3A4 inhibition. This approach helped us to interpret QSAR models and allowed identification of a number of “significant” molecular transformations that affect the desired properties. This can facilitate drug design as part of the molecular optimization process. Graphical abstract: Molecular matched pairs and transformation graphs facilitate an interpretable molecular optimisation process.
Affiliation(s)
- Yurii Sushko
- eADMET GmbH, Lichtenbergstraße 8, D-85748 Garching, Munich Germany
| | | | - Robert Körner
- eADMET GmbH, Lichtenbergstraße 8, D-85748 Garching, Munich Germany
| | - Joachim Vogt
- eADMET GmbH, Lichtenbergstraße 8, D-85748 Garching, Munich Germany
| | - Ahmed Abdelaziz
- eADMET GmbH, Lichtenbergstraße 8, D-85748 Garching, Munich Germany
| | - Igor V Tetko
- eADMET GmbH, Lichtenbergstraße 8, D-85748 Garching, Munich Germany
- Helmholtz-Zentrum München - German Research Centre for Environmental Health (GmbH), Institute of Structural Biology, Ingolstädter Landstraße 1, D-85764 Neuherberg, Germany
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya St. 18, 420008 Kazan, Russia
|
43
|
Cherkasov A, Muratov EN, Fourches D, Varnek A, Baskin II, Cronin M, Dearden J, Gramatica P, Martin YC, Todeschini R, Consonni V, Kuz'min VE, Cramer R, Benigni R, Yang C, Rathman J, Terfloth L, Gasteiger J, Richard A, Tropsha A. QSAR modeling: where have you been? Where are you going to? J Med Chem 2014; 57:4977-5010. [PMID: 24351051 PMCID: PMC4074254 DOI: 10.1021/jm4004285] [Citation(s) in RCA: 1023] [Impact Index Per Article: 102.3] [Indexed: 02/08/2023]
Abstract
Quantitative structure-activity relationship modeling is one of the major computational tools employed in medicinal chemistry. However, throughout its entire history it has drawn both praise and criticism concerning its reliability, limitations, successes, and failures. In this paper, we discuss (i) the development and evolution of QSAR; (ii) the current trends, unsolved problems, and pressing challenges; and (iii) several novel and emerging applications of QSAR modeling. Throughout this discussion, we provide guidelines for QSAR development, validation, and application, which are summarized in best practices for building rigorously validated and externally predictive QSAR models. We hope that this Perspective will help communications between computational and experimental chemists toward collaborative development and use of QSAR models. We also believe that the guidelines presented here will help journal editors and reviewers apply more stringent scientific standards to manuscripts reporting new QSAR studies, as well as encourage the use of high quality, validated QSARs for regulatory decision making.
Affiliation(s)
- Artem Cherkasov
- Vancouver Prostate Centre, University of British Columbia, Vancouver, BC, V6H3Z6, Canada
| | - Eugene N. Muratov
- Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
- Department of Molecular Structure and Cheminformatics, A.V. Bogatsky Physical-Chemical Institute National Academy of Sciences of Ukraine, Odessa, 65080, Ukraine
| | - Denis Fourches
- Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
| | - Alexandre Varnek
- Department of Chemistry, L. Pasteur University of Strasbourg, Strasbourg, 67000, France
| | - Igor I. Baskin
- Department of Physics, Lomonosov Moscow State University, Moscow, 119991, Russia
| | - Mark Cronin
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool L33AF, UK
| | - John Dearden
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool L33AF, UK
| | - Paola Gramatica
- Department of Structural and Functional Biology, University of Insubria, Varese, 21100, Italy
| | | | - Roberto Todeschini
- Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, 20126, Italy
| | - Viviana Consonni
- Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, 20126, Italy
| | - Victor E. Kuz'min
- Department of Molecular Structure and Cheminformatics, A.V. Bogatsky Physical-Chemical Institute National Academy of Sciences of Ukraine, Odessa, 65080, Ukraine
| | | | - Romualdo Benigni
- Environment and Health Department, Istituto Superiore di Sanita’, Rome, 00161, Italy
| | | | - James Rathman
- Altamira LLC, Columbus OH 43235, USA
- Department of Chemical and Biomolecular Engineering, the Ohio State University, Columbus, OH 43215, USA
| | | | | | - Ann Richard
- National Center for Computational Toxicology, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27519, USA
| | - Alexander Tropsha
- Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
|
44
|
Webb SJ, Hanser T, Howlin B, Krause P, Vessey JD. Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity. J Cheminform 2014; 6:8. [PMID: 24661325 PMCID: PMC3997921 DOI: 10.1186/1758-2946-6-8] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 11/25/2013] [Accepted: 03/18/2014] [Indexed: 01/14/2023]
Abstract
Background: A new algorithm has been developed to enable the interpretation of black box models. The developed algorithm is agnostic to learning algorithm and open to all structural based descriptors such as fragments, keys and hashed fingerprints. The algorithm has provided meaningful interpretation of Ames mutagenicity predictions from both random forest and support vector machine models built on a variety of structural fingerprints. A fragmentation algorithm is utilised to investigate the model’s behaviour on specific substructures present in the query. An output is formulated summarising causes of activation and deactivation. The algorithm is able to identify multiple causes of activation or deactivation in addition to identifying localised deactivations where the prediction for the query is active overall. No loss in performance is seen as there is no change in the prediction; the interpretation is produced directly on the model’s behaviour for the specific query. Results: Models have been built using multiple learning algorithms including support vector machine and random forest. The models were built on public Ames mutagenicity data and a variety of fingerprint descriptors were used. These models produced a good performance in both internal and external validation with accuracies around 82%. The models were used to evaluate the interpretation algorithm. The algorithm revealed interpretations that link closely with understood mechanisms for Ames mutagenicity. Conclusion: This methodology allows for a greater utilisation of the predictions made by black box models and can expedite further study based on the output for a (quantitative) structure activity model. Additionally the algorithm could be utilised for chemical dataset investigation and knowledge extraction/human SAR development.
Affiliation(s)
- Samuel J Webb
- Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Holbeck, Leeds LS11 5PY UK.
|