Reference Citation Analysis

For: Guha R. On the interpretation and interpretability of quantitative structure–activity relationship models. J Comput Aided Mol Des 2008;22:857-71. [DOI: 10.1007/s10822-008-9240-5] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.6] [Received: 05/13/2008] [Accepted: 08/14/2008] [Indexed: 01/28/2023]
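
The Impact Index Per Article values on this page are consistent with citations in RCA divided by the years elapsed between publication and 2024. For this entry, 58 / (2024 - 2008) = 3.625, listed as 3.6. For Hanser et al. (2016), 67 / 8 = 8.375, listed as 8.4. The sketch below reproduces that arithmetic. It rests on an inferred assumption: RCA does not state its formula here, and both the 2024 reference year and the half-up rounding are guesses fitted to the listed values.

```python
from decimal import Decimal, ROUND_HALF_UP


def impact_index_per_article(citations: int, pub_year: int, ref_year: int = 2024) -> float:
    """Citations per year since publication, rounded half-up to one decimal.

    Assumed formula inferred from the values on this page; not documented by RCA.
    """
    # Assumption: an article published in the reference year counts as one year,
    # which avoids division by zero (such entries on this page show 0 citations).
    years = max(ref_year - pub_year, 1)
    value = Decimal(citations) / Decimal(years)
    # Python's round() rounds half to even (round(1.25, 1) == 1.2), whereas the
    # page lists 1.3 for 5 citations over 4 years, so round half-up instead.
    return float(value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))


# (citations in RCA, publication year) pairs taken from entries on this page.
for cites, year in [(58, 2008), (67, 2016), (5, 2020), (138, 2019)]:
    print(cites, year, impact_index_per_article(cites, year))
```

Decimal arithmetic is used so that exact halves such as 1.25 and 8.375 round the way the page appears to round them.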

Cited by Other Article(s)

Shah SK, Chaple DD, Masand VH, Jawarkar RD, Chaudhari S, Abiramasundari A, Zaki MEA, Al-Hussain SA. Multi-Target In-Silico modeling strategies to discover novel angiotensin converting enzyme and neprilysin dual inhibitors. Sci Rep 2024;14:15991. [PMID: 38987327 PMCID: PMC11237057 DOI: 10.1038/s41598-024-66230-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2024] [Accepted: 06/28/2024] [Indexed: 07/12/2024] Open

Ancajas CMF, Oyedele AS, Butt CM, Walker AS. Advances, opportunities, and challenges in methods for interrogating the structure activity relationships of natural products. Nat Prod Rep 2024. [PMID: 38912779 DOI: 10.1039/d4np00009a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/25/2024]

Abstract

Time span in literature: 1985 to early 2024.

Natural products play a key role in drug discovery, both as a direct source of drugs and as a starting point for the development of synthetic compounds. Most natural products are not suitable for use as drugs without further modification, because of insufficient activity or poor pharmacokinetic properties. Choosing what modifications to make requires an understanding of the compound's structure-activity relationships. Use of structure-activity relationships is commonplace and essential in medicinal chemistry campaigns applied to human-designed synthetic compounds. Structure-activity relationships have also been used to improve the properties of natural products, but several challenges still limit these efforts. Here, we review methods for studying the structure-activity relationships of natural products and their limitations. Specifically, we will discuss how synthesis (including total synthesis, late-stage derivatization, and chemoenzymatic synthetic pathways) and the engineering and genome mining of biosynthetic pathways can be used to produce natural product analogs, and we discuss the challenges of each of these approaches. Finally, we will discuss computational methods, including machine learning methods for analyzing the relationship between biosynthetic genes and product activity, computer-aided drug design techniques, and interpretable artificial intelligence approaches for elucidating structure-activity relationships from models trained to predict bioactivity from chemical structure. Our focus will be on these latter topics, as their applications to natural products have not been extensively reviewed. We suggest that these methods are all complementary to each other, and that only collaborative efforts using a combination of these techniques will result in a full understanding of the structure-activity relationships of natural products.

Shirasawa R, Takaki K, Miyao T. Generalizability Improvement of Interpretable Symbolic Regression Models for Quantitative Structure-Activity Relationships. ACS OMEGA 2024;9:9463-9474. [PMID: 38434845 PMCID: PMC10905595 DOI: 10.1021/acsomega.3c09047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Revised: 01/20/2024] [Accepted: 01/26/2024] [Indexed: 03/05/2024]

Lei L, Zhang L, Han Z, Chen Q, Liao P, Wu D, Tai J, Xie B, Su Y. Advancing chronic toxicity risk assessment in freshwater ecology by molecular characterization-based machine learning. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2024;342:123093. [PMID: 38072027 DOI: 10.1016/j.envpol.2023.123093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Revised: 11/30/2023] [Accepted: 12/02/2023] [Indexed: 01/26/2024]

Abstract

The continuously increasing production of various chemicals and their release into the environment have raised concerns about potential negative effects on ecological health. However, traditional labor-intensive assessment methods cannot evaluate these hazards effectively and rapidly, especially chronic risk. In this study, machine learning (ML) was employed to construct quantitative structure-activity relationship (QSAR) models that predict chronic toxicity to aquatic organisms from the molecular characteristics of pollutants, namely molecular descriptors, fingerprints, and graphs. The limited dataset size prevented the graph attention network (GAT) model built on the molecular graphs from showing notable advantages. Considering computational efficiency and performance (R² = 0.78; RMSE = 0.77), XGBoost (XGB) was used to build reliable QSAR-ML models that predict chronic toxicity from small- or medium-sized tabular data and molecular descriptors. Further kernel density estimation analysis confirmed the high accuracy of the model for pollutant concentrations ranging from 10⁻³ to 10² mg/L, a range that covers most environmental scenarios. Model interpretation identified SlogP and exposure duration as the primary influential factors. SlogP, which represents the partitioning of a molecule between lipophilic and hydrophilic environments, had a negative effect on the toxicity outcomes. Additionally, exposure duration played a crucial role in determining chronic toxicity. Finally, chronic toxicity data for bisphenol A validated the robustness and reliability of the model established in this research. Our study provides a robust and feasible methodology for chronic ecological risk evaluation of various types of pollutants and could facilitate wider use of ML applications in environmental fields.
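
The abstract reports R² = 0.78 and RMSE = 0.77 without defining them. The sketch below uses the usual definitions. It assumes R² is the coefficient of determination, 1 - SS_res/SS_tot; some QSAR studies instead report a squared correlation coefficient. It takes RMSE as the square root of the mean squared residual, expressed in the units of the modelled endpoint. The function name is illustrative.

```python
import math
from typing import Sequence, Tuple


def r2_and_rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> Tuple[float, float]:
    """Coefficient of determination and root-mean-square error of predictions."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)


print(r2_and_rmse([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

For the toy data, SS_tot = 2 and SS_res = 1, so R² = 0.5 and RMSE = sqrt(1/3).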

Affiliation(s)

Lang Lei Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
Liangmao Zhang Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
Zhibang Han Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
Qirui Chen Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
Pengcheng Liao Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
Dong Wu Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China; Chongqing Key Laboratory of Precision Optics, Chongqing Institute of East China Normal University, Chongqing, 401120, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai, 200092, China
Jun Tai Shanghai Environmental Sanitation Engineering Design Institute Co., Ltd., Shanghai, 200232, China
Bing Xie Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai, 200092, China
Yinglong Su Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China; Chongqing Key Laboratory of Precision Optics, Chongqing Institute of East China Normal University, Chongqing, 401120, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai, 200092, China.

Kotli M, Piir G, Maran U. Pesticide effect on earthworm lethality via interpretable machine learning. JOURNAL OF HAZARDOUS MATERIALS 2024;461:132577. [PMID: 37793249 DOI: 10.1016/j.jhazmat.2023.132577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Revised: 09/15/2023] [Accepted: 09/16/2023] [Indexed: 10/06/2023]

Takkar P, Singh B, Pani B, Kumar R. Design, synthesis and in silico evaluation of newer 1,4-dihydropyridine based amlodipine bio-isosteres as promising antihypertensive agents. RSC Adv 2023;13:34239-34248. [PMID: 38020040 PMCID: PMC10664005 DOI: 10.1039/d3ra06387a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Accepted: 11/13/2023] [Indexed: 12/01/2023] Open

Ida T, Kojima H, Hori Y. Predicting and analyzing organic reaction pathways by combining machine learning and reaction network approaches. Chem Commun (Camb) 2023;59:12439-12442. [PMID: 37773321 DOI: 10.1039/d3cc03890d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/01/2023]

Tamang JSD, Banerjee S, Baidya SK, Ghosh B, Adhikari N, Jha T. Employing comparative QSAR techniques for the recognition of dibenzofuran and dibenzothiophene derivatives toward MMP-12 inhibition. J Biomol Struct Dyn 2023:1-17. [PMID: 37498149 DOI: 10.1080/07391102.2023.2239923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Accepted: 07/17/2023] [Indexed: 07/28/2023]

Akinola LK, Uzairu A, Shallangwa GA, Abechi SE. Development and Validation of Predictive Quantitative Structure-Activity Relationship Models for Estrogenic Activities of Hydroxylated Polychlorinated Biphenyls. ENVIRONMENTAL TOXICOLOGY AND CHEMISTRY 2023;42:823-834. [PMID: 36692119 DOI: 10.1002/etc.5566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Revised: 11/17/2022] [Accepted: 01/18/2023] [Indexed: 06/17/2023]

Belfield SJ, Cronin MTD, Enoch SJ, Firman JW. Guidance for good practice in the application of machine learning in development of toxicological quantitative structure-activity relationships (QSARs). PLoS One 2023;18:e0282924. [PMID: 37163504 PMCID: PMC10171609 DOI: 10.1371/journal.pone.0282924] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/11/2022] [Accepted: 02/26/2023] [Indexed: 05/12/2023] Open

Abstract

Recent years have seen substantial growth in the adoption of machine learning approaches for the purposes of quantitative structure-activity relationship (QSAR) development. This trend has coincided with a desire to shift the focus of methodology employed within chemical safety assessment: away from traditional reliance upon animal-intensive in vivo protocols, and towards increased application of in silico (or computational) predictive toxicology. With QSAR central amongst the techniques applied in this area, algorithms trained through machine learning with the objective of toxicity estimation have, quite naturally, emerged. On account of the pattern-recognition capabilities of the underlying methods, the statistical power of the ensuing models is potentially considerable, sufficient even for the handling of vast, heterogeneous datasets. However, such potency comes at a price, manifesting as general practical deficits in the reproducibility, interpretability and generalisability of the resulting tools. Unsurprisingly, these deficits have hindered broader uptake (most notably within a regulatory setting). Areas of uncertainty liable to accompany (and hence detract from the applicability of) toxicological QSAR have previously been highlighted, together with suggestions for "best practice" aimed at mitigating their influence. However, the scope of such exercises has remained limited to "classical" QSAR, that is, QSAR conducted through linear regression and related techniques using comparatively few features or descriptors. Accordingly, the intention of this study has been to extend the remit of best practice guidance so as to address concerns specific to the employment of machine learning within the field.
In doing so, the impact of strategies aimed at enhancing the transparency (feature importance, feature reduction), generalisability (cross-validation) and predictive power (hyperparameter optimisation) of algorithms, trained upon real toxicity data through six common learning approaches, is evaluated.
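
The abstract names cross-validation as the strategy for generalisability. The sketch below shows only the generic k-fold idea in plain Python: shuffle the sample indices once, deal them into k folds, and hold each fold out in turn. The fold count, the seed, and the function name are illustrative assumptions, not the study's protocol.

```python
import random
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducible folds
    folds = [indices[i::k] for i in range(k)]  # deal indices round-robin into k folds
    splits = []
    for i, test in enumerate(folds):
        # Every index outside the held-out fold goes to training.
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((sorted(train), sorted(test)))
    return splits


for train, test in k_fold_indices(10, k=5):
    print(len(train), test)
```

In practice, feature reduction and hyperparameter optimisation would be repeated inside each training split, so that the held-out fold never influences model selection.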

Lin Z, Chou WC. Machine learning and artificial intelligence in toxicological sciences. Toxicol Sci 2022;189:7-19. [PMID: 35861448 PMCID: PMC9609874 DOI: 10.1093/toxsci/kfac075] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Indexed: 11/12/2022] Open

Abstract

Machine learning and artificial intelligence approaches have revolutionized multiple disciplines, including toxicology. This review summarizes representative recent applications of machine learning and artificial intelligence approaches in different areas of toxicology, including physiologically based pharmacokinetic (PBPK) modeling, quantitative structure-activity relationship modeling for toxicity prediction, adverse outcome pathway analysis, high-throughput screening, toxicogenomics, big data and toxicological databases. By leveraging these approaches, it is now possible to develop PBPK models for hundreds of chemicals efficiently, to create in silico models that predict toxicity for a large number of chemicals with accuracy similar to that of in vivo animal experiments, and to rapidly analyze large amounts of different types of data (toxicogenomics, high-content image data, etc.) to generate new insights into toxicity mechanisms, which was not possible with manual approaches in the past. To continue advancing the field of toxicological sciences, several challenges should be considered: (1) not all machine learning models are equally useful for a particular type of toxicology data, so it is important to test different methods to determine the optimal approach; (2) current toxicity prediction focuses mainly on bioactivity classification (yes/no), so additional studies are needed to predict the intensity of effect or the dose-response relationship; (3) as more data become available, it is crucial to perform rigorous data quality checks and to develop infrastructure to store, share, analyze, evaluate, and manage big data; and (4) it is important to convert machine learning models into user-friendly interfaces to facilitate their use by both computational and bench scientists.

Akinola LK, Uzairu A, Shallangwa GA, Abechi SE. Quantitative structure–activity relationship modeling of hydroxylated polychlorinated biphenyls as constitutive androstane receptor agonists. Struct Chem 2022. [DOI: 10.1007/s11224-022-01992-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]

Kaboudi N, Shayanfar A. Predicting the Drug Clearance Pathway with Structural Descriptors. Eur J Drug Metab Pharmacokinet 2022;47:363-369. [PMID: 35147854 DOI: 10.1007/s13318-021-00748-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Accepted: 12/12/2021] [Indexed: 11/30/2022]

Abstract

BACKGROUND AND OBJECTIVE

Clearance, by renal elimination or hepatic metabolism, is one of the most important pharmacokinetic parameters of a drug. It allows the half-life, bioavailability, and drug-drug interactions to be predicted, and it can also affect the dose regimen of a drug. Predicting the clearance pathways of new chemical candidates during drug development is vital in order to minimize the risks of possible side effects and drug interactions. Many in vivo methods have been established to predict drug clearance in humans, and these mainly rely on data from in vivo studies in preclinical species (chiefly rats, dogs, and monkeys). These methods are also time consuming and expensive. The aim of this study was to find the relationship between structural parameters of drugs and their clearance pathways.

METHODS

The clearance pathway of each drug was obtained from the literature. Various structural descriptors [Abraham solvation parameters, topological polar surface area, numbers of hydrogen-bond donors and acceptors, number of rotatable bonds, molecular weight, logarithm of the partition coefficient (logP), and logarithm of the distribution coefficient at pH 7.4 (logD_7.4)] were applied to develop a mechanistic model for predicting clearance pathways.

RESULTS

The results of this study indicate that compounds with logD_7.4 > 1 or with zero or one hydrogen-bond donor undergo hepatic metabolism, whereas the clearance pathway for chemicals with logD_7.4 < -2 is renal elimination. Furthermore, models established using logistic regression based on five structural parameters for compounds with -2 < logD_7.4 < 1 could be used in a clearance pathway prediction tool. The overall prediction accuracies of the first and second models were 84.8% and 84.4%, respectively.
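
The threshold rules above can be written as a small triage function. This is a sketch under stated assumptions. The abstract does not say what happens when the rules conflict (for example logD_7.4 < -2 with a single hydrogen-bond donor), so the hepatic rules are checked first, in the order the abstract lists them. Boundary values are not covered by the strict inequalities. The study's five-descriptor logistic-regression model is not reproduced, because its coefficients are not given here, so those compounds are returned as None. The function name is hypothetical.

```python
from typing import Optional


def clearance_pathway(logd74: float, n_hbd: int) -> Optional[str]:
    """Triage by the abstract's thresholds: "hepatic", "renal", or None (undecided)."""
    # Assumption: hepatic rules take precedence when rules overlap.
    if logd74 > 1 or n_hbd <= 1:
        return "hepatic"
    if logd74 < -2:
        return "renal"
    # Remaining compounds would go to the study's logistic-regression model,
    # whose coefficients are not reported in the abstract.
    return None


print(clearance_pathway(2.0, 3), clearance_pathway(-3.0, 4), clearance_pathway(0.0, 3))
```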

CONCLUSION

The developed model can be used to find the clearance pathways of new drug candidates with acceptable accuracy. The main descriptors that are used to evaluate this parameter are the hydrophobicity and the number of hydrogen-bonding functional groups of the compound.

Rodríguez-Pérez R, Bajorath J. Explainable Machine Learning for Property Predictions in Compound Optimization. J Med Chem 2021;64:17744-17752. [PMID: 34902252 DOI: 10.1021/acs.jmedchem.1c01789] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 11/28/2022]

De Jesus Silva J, Bartalucci N, Jelier B, Grosslight S, Gensch T, Schünemann C, Müller B, Kamer PCJ, Copéret C, Sigman MS, Togni A. Development and Molecular Understanding of a Pd‐Catalyzed Cyanation of Aryl Boronic Acids Enabled by High‐Throughput Experimentation and Data Analysis. Helv Chim Acta 2021. [DOI: 10.1002/hlca.202100200] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/04/2023]

Medina-Franco JL, Sánchez-Cruz N, López-López E, Díaz-Eufracio BI. Progress on open chemoinformatic tools for expanding and exploring the chemical space. J Comput Aided Mol Des 2021;36:341-354. [PMID: 34143323 PMCID: PMC8211976 DOI: 10.1007/s10822-021-00399-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 05/10/2021] [Accepted: 06/14/2021] [Indexed: 01/10/2023]

Gajewicz-Skretna A, Kar S, Piotrowska M, Leszczynski J. The kernel-weighted local polynomial regression (KwLPR) approach: an efficient, novel tool for development of QSAR/QSAAR toxicity extrapolation models. J Cheminform 2021;13:9. [PMID: 33579384 PMCID: PMC7881668 DOI: 10.1186/s13321-021-00484-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/24/2020] [Accepted: 01/11/2021] [Indexed: 11/10/2022] Open

Abstract

The ability to accurately predict the biological response (biological activity/property/toxicity) of a given chemical makes quantitative structure‐activity/property/toxicity relationship (QSAR/QSPR/QSTR) models unique among in silico tools. In addition, experimental data for selected species can be used as an independent variable, along with other structural and physicochemical variables, to predict the response for other species, following the quantitative activity–activity relationship (QAAR)/quantitative structure–activity–activity relationship (QSAAR) approach. Irrespective of the model type, the quality and reliability of the developed model need to be checked through multiple classical, stringent validation metrics. Among the validation metrics, error-based metrics are more significant, as the basic idea of a good predictive model is to improve prediction quality by lowering the prediction residuals for new query compounds. Following this concept, we have checked the predictive quality of QSAR and QSAAR models built with the kernel-weighted local polynomial regression (KwLPR) approach against traditional linear and non-linear regression-based tools such as multiple linear regression (MLR) and k nearest neighbors (kNN). Five datasets previously modeled using linear and non-linear regression methods were used to implement the KwLPR approach, followed by a comparison of their validation metrics. In all five cases, the KwLPR-based models gave better results than the traditional approaches. The present study's focus is not to develop a better or improved QSAR/QSAAR model over the previous ones, but to demonstrate the advantage, prediction power, and reliability of the KwLPR algorithm and to establish it as a novel, powerful cheminformatic tool.
To facilitate the use of the KwLPR algorithm for QSAR/QSPR/QSTR/QSAAR modeling, the authors provide an in-house developed KwLPR.RMD script under the open-source R programming language.
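
The abstract does not spell out the estimator. The sketch below shows a common textbook form of kernel-weighted local polynomial regression, restricted to a single descriptor and degree 1 (local linear) with a Gaussian kernel. For each query point, it fits a weighted straight line in which training points near the query receive higher weight, and it reports that line's value at the query. The one-descriptor setting, the kernel, the bandwidth, and the function name are all assumptions for illustration. This is not the authors' KwLPR.RMD script, which is written in R.

```python
import math
from typing import Sequence


def local_linear_predict(x: Sequence[float], y: Sequence[float], x0: float,
                         bandwidth: float = 1.0) -> float:
    """Gaussian-kernel-weighted local linear (degree-1 polynomial) estimate at x0."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, yi in zip(x, y):
        d = xi - x0
        w = math.exp(-0.5 * (d / bandwidth) ** 2)  # Gaussian kernel weight
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * yi
        t1 += w * d * yi
    # Weighted least-squares normal equations for y ~ a + b*(x - x0):
    #   s0*a + s1*b = t0,  s1*a + s2*b = t1.  The fitted value at x0 is a.
    det = s0 * s2 - s1 * s1
    if abs(det) < 1e-12:
        raise ValueError("weights are too concentrated; try a larger bandwidth")
    return (s2 * t0 - s1 * t1) / det


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * v + 1.0 for v in xs]
print(local_linear_predict(xs, ys, 2.5))
```

A local linear fit reproduces an exactly linear trend at any query point, so the toy call returns approximately 6.0. The bandwidth controls how local the fit is: a very large bandwidth approaches ordinary least squares, while a small one follows the nearest training compounds.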

Antiplasmodial activity of sulfonylhydrazones: in vitro and in silico approaches. Future Med Chem 2020;13:233-250. [PMID: 33295837 DOI: 10.4155/fmc-2020-0229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022] Open

Russo DP, Yan X, Shende S, Huang H, Yan B, Zhu H. Virtual Molecular Projections and Convolutional Neural Networks for the End-to-End Modeling of Nanoparticle Activities and Properties. Anal Chem 2020;92:13971-13979. [PMID: 32970421 DOI: 10.1021/acs.analchem.0c02878] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 12/17/2022]

Structural analysis of arylsulfonamide-based carboxylic acid derivatives: a QSAR study to identify the structural contributors toward their MMP-9 inhibition. Struct Chem 2020. [DOI: 10.1007/s11224-020-01635-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/18/2022]

Zhang X, Xu J, Yang J, Chen L, Zhou H, Liu X, Li H, Lin T, Ying Y. Understanding the learning mechanism of convolutional neural networks in spectral analysis. Anal Chim Acta 2020;1119:41-51. [DOI: 10.1016/j.aca.2020.03.055] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 11/11/2019] [Revised: 02/27/2020] [Accepted: 03/29/2020] [Indexed: 11/16/2022]

Luo Y, Gopaluni B, Xu Y, Cao L, Zhu QX. A Novel Approach to Alarm Causality Analysis Using Active Dynamic Transfer Entropy. Ind Eng Chem Res 2020. [DOI: 10.1021/acs.iecr.9b06262] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 11/29/2022]

Barnard AS, Motevalli B, Parker AJ, Fischer JM, Feigl CA, Opletal G. Nanoinformatics, and the big challenges for the science of small things. NANOSCALE 2019;11:19190-19201. [PMID: 31397835 DOI: 10.1039/c9nr05912a] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 06/10/2023]

Halder AK, Giri AK, Cordeiro MNDS. Multi-Target Chemometric Modelling, Fragment Analysis and Virtual Screening with ERK Inhibitors as Potential Anticancer Agents. Molecules 2019;24:molecules24213909. [PMID: 31671605 PMCID: PMC6864583 DOI: 10.3390/molecules24213909] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 09/25/2019] [Revised: 10/21/2019] [Accepted: 10/25/2019] [Indexed: 02/07/2023] Open

Rodríguez-Pérez R, Bajorath J. Interpretation of Compound Activity Predictions from Complex Machine Learning Models Using Local Approximations and Shapley Values. J Med Chem 2019;63:8761-8777. [PMID: 31512867 DOI: 10.1021/acs.jmedchem.9b01101] [Citation(s) in RCA: 138] [Impact Index Per Article: 27.6] [Indexed: 01/27/2023]

Halder AK, Cordeiro MNDS. Development of Multi-Target Chemometric Models for the Inhibition of Class I PI3K Enzyme Isoforms: A Case Study Using QSAR-Co Tool. Int J Mol Sci 2019;20:ijms20174191. [PMID: 31461863 PMCID: PMC6747073 DOI: 10.3390/ijms20174191] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 07/14/2019] [Revised: 08/23/2019] [Accepted: 08/24/2019] [Indexed: 12/12/2022] Open

Ciallella HL, Zhu H. Advancing Computational Toxicology in the Big Data Era by Artificial Intelligence: Data-Driven and Mechanism-Driven Modeling for Chemical Toxicity. Chem Res Toxicol 2019;32:536-547. [PMID: 30907586 DOI: 10.1021/acs.chemrestox.8b00393] [Citation(s) in RCA: 76] [Impact Index Per Article: 15.2] [Indexed: 12/21/2022]

The current limits in virtual screening and property prediction. Future Med Chem 2018;10:1623-1635. [PMID: 29953247 DOI: 10.4155/fmc-2017-0303] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 12/12/2022] Open

Polishchuk P. Interpretation of Quantitative Structure–Activity Relationship Models: Past, Present, and Future. J Chem Inf Model 2017;57:2618-2639. [DOI: 10.1021/acs.jcim.7b00274] [Citation(s) in RCA: 120] [Impact Index Per Article: 17.1] [Indexed: 02/01/2023]

Marchese Robinson RL, Palczewska A, Palczewski J, Kidley N. Comparison of the Predictive Performance and Interpretability of Random Forest and Linear Models on Benchmark Data Sets. J Chem Inf Model 2017;57:1773-1792. [PMID: 28715209 DOI: 10.1021/acs.jcim.6b00753] [Citation(s) in RCA: 59] [Impact Index Per Article: 8.4] [Indexed: 11/29/2022]

Abstract

The ability to interpret the predictions made by quantitative structure-activity relationships (QSARs) offers a number of advantages. While QSARs built using nonlinear modeling approaches, such as the popular Random Forest algorithm, might sometimes be more predictive than those built using linear modeling approaches, their predictions have been perceived as difficult to interpret. However, a growing number of approaches have been proposed for interpreting nonlinear QSAR models in general and Random Forest in particular. In the current work, we compare the performance of Random Forest to those of two widely used linear modeling approaches: linear Support Vector Machines (SVMs) (or Support Vector Regression (SVR)) and partial least-squares (PLS). We compare their performance in terms of their predictivity as well as the chemical interpretability of the predictions using novel scoring schemes for assessing heat map images of substructural contributions. We critically assess different approaches for interpreting Random Forest models as well as for obtaining predictions from the forest. We assess the models on a large number of widely employed public-domain benchmark data sets corresponding to regression and binary classification problems of relevance to hit identification and toxicology. We conclude that Random Forest typically yields comparable or possibly better predictive performance than the linear modeling approaches and that its predictions may also be interpreted in a chemically and biologically meaningful way. In contrast to earlier work looking at interpretation of nonlinear QSAR models, we directly compare two methodologically distinct approaches for interpreting Random Forest models. The approaches for interpreting Random Forest assessed in our article were implemented using open-source programs that we have made available to the community. 
These programs are the rfFC package ( https://r-forge.r-project.org/R/?group_id=1725 ) for the R statistical programming language and the Python program HeatMapWrapper [ https://doi.org/10.5281/zenodo.495163 ] for heat map generation.
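
One widely described way to interpret tree ensembles is the "feature contribution" decomposition, which the rfFC name suggests. Each tree prediction is split into a bias term (the mean response at the root) plus, for every split on the decision path, the change in node mean credited to the feature used at that split. The ensemble then averages these terms over its trees. The sketch below applies the idea to hand-built toy trees. The dict-based tree format, the descriptor names (logP, MW), and the numbers are hypothetical. This is not the rfFC package's API or the article's exact procedure.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# Hypothetical tree format: a leaf is {"value": v}; an internal node also has
# "feature", "threshold", "left" (taken when x[feature] <= threshold) and "right".
Node = Dict[str, Any]


def tree_contributions(node: Node, x: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Split one tree's prediction into a bias (root mean) plus per-feature terms."""
    bias = node["value"]
    contrib: Dict[str, float] = defaultdict(float)
    while "feature" in node:
        feat = node["feature"]
        child = node["left"] if x[feat] <= node["threshold"] else node["right"]
        contrib[feat] += child["value"] - node["value"]  # credit the split feature
        node = child
    return bias, dict(contrib)


def forest_contributions(trees: List[Node], x: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Average bias and contributions over trees, mirroring the averaged prediction."""
    bias_total = 0.0
    totals: Dict[str, float] = defaultdict(float)
    for tree in trees:
        bias, contrib = tree_contributions(tree, x)
        bias_total += bias
        for feat, value in contrib.items():
            totals[feat] += value
    n = len(trees)
    return bias_total / n, {feat: value / n for feat, value in totals.items()}


# Toy regression tree with made-up descriptor names and node means.
tree = {"value": 5.0, "feature": "logP", "threshold": 2.0,
        "left": {"value": 3.0},
        "right": {"value": 7.0, "feature": "MW", "threshold": 300.0,
                  "left": {"value": 6.0}, "right": {"value": 8.0}}}
bias, contrib = tree_contributions(tree, {"logP": 3.0, "MW": 350.0})
print(bias, contrib, bias + sum(contrib.values()))  # 5.0 {'logP': 2.0, 'MW': 1.0} 8.0
```

By construction, the bias plus the contributions equals the leaf value reached, so each prediction is fully accounted for by the descriptors along its path.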

Koutsoukas A, Monaghan KJ, Li X, Huan J. Deep-learning: investigating deep neural networks hyper-parameters and comparison of performance to shallow methods for modeling bioactivity data. J Cheminform 2017;9:42. [PMID: 29086090 PMCID: PMC5489441 DOI: 10.1186/s13321-017-0226-y] [Citation(s) in RCA: 112] [Impact Index Per Article: 16.0] [Received: 09/30/2016] [Accepted: 05/27/2017] [Indexed: 01/03/2023] Open

Abstract

Background

In recent years, research in artificial neural networks has resurged under the deep-learning (DL) umbrella and has grown extremely popular. The recently reported success of DL techniques in crowd-sourced QSAR and predictive toxicology competitions has showcased these methods as powerful tools in drug-discovery and toxicology research. The aim of this work was twofold. First, a large number of hyper-parameter configurations were explored to investigate how they affect the performance of deep neural networks (DNNs) and to provide starting points for tuning DNNs. Second, DNN performance was compared with that of popular methods widely employed in cheminformatics, namely Naïve Bayes, k-nearest neighbor, random forest and support vector machines. Moreover, the robustness of the machine learning methods to different levels of artificially introduced noise was assessed. The open-source Caffe deep-learning framework and modern NVIDIA GPUs were used to carry out this study, allowing a large number of DNN configurations to be explored.

Results

We show that feed-forward deep neural networks are capable of achieving strong classification performance and, when optimized, outperform shallow methods across diverse activity classes. The hyper-parameters found to play a critical role are the activation function, dropout regularization, the number of hidden layers and the number of neurons. Compared with the remaining methods, tuned DNNs performed statistically better (p < 0.01, Wilcoxon test). On average, DNNs achieved a Matthews correlation coefficient (MCC) 0.149 units higher than NB, 0.092 higher than kNN, 0.052 higher than SVM with a linear kernel, 0.021 higher than RF and 0.009 higher than SVM with a radial basis function kernel. When robustness to noise was explored, the non-linear methods performed well at low noise levels (up to 20%); at higher noise levels (above 30%), however, Naïve Bayes performed well and, at the highest noise level of 50%, even outperformed the more sophisticated methods across several datasets.
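The comparisons above are expressed in units of the Matthews correlation coefficient, which for a binary confusion matrix is MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)) and ranges from -1 (total disagreement) through 0 (chance level) to 1 (perfect prediction). The sketch below computes it from 0/1 labels; the function names and toy labels are illustrative, and returning 0 when any marginal count is zero is a common convention assumed here.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: MCC is undefined when a row or column is empty; report 0.
        return 0.0
    return (tp * tn - fp * fn) / denominator


def mcc_from_labels(y_true, y_pred):
    """Count the confusion matrix for 0/1 labels, then compute MCC."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return mcc(tp, tn, fp, fn)


# Toy labels: 3 true positives, 3 true negatives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(mcc_from_labels(y_true, y_pred))  # 0.5
```

For the toy labels, MCC = (3*3 - 1*1) / sqrt(4*4*4*4) = 8/16 = 0.5. Unlike plain accuracy, MCC uses all four cells of the confusion matrix, so it stays informative on the imbalanced active/inactive splits typical of bioactivity data.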

Electronic supplementary material

The online version of this article (doi:10.1186/s13321-017-0226-y) contains supplementary material, which is available to authorized users.

Structural, Physicochemical and Stereochemical Interpretation of QSAR Models Based on Simplex Representation of Molecular Structure. Challenges and Advances in Computational Chemistry and Physics 2017. [DOI: 10.1007/978-3-319-56850-8_4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/23/2022]

Hanser T, Barber C, Marchaland JF, Werner S. Applicability domain: towards a more formal definition. SAR QSAR Environ Res 2016;27:893-909. [PMID: 27827546 DOI: 10.1080/1062936x.2016.1250229] [Citation(s) in RCA: 67] [Impact Index Per Article: 8.4] [Received: 08/15/2016] [Accepted: 10/16/2016] [Indexed: 06/06/2023]

Falchi F, Bertozzi SM, Ottonello G, Ruda GF, Colombano G, Fiorelli C, Martucci C, Bertorelli R, Scarpelli R, Cavalli A, Bandiera T, Armirotti A. Kernel-Based, Partial Least Squares Quantitative Structure-Retention Relationship Model for UPLC Retention Time Prediction: A Useful Tool for Metabolite Identification. Anal Chem 2016;88:9510-7. [DOI: 10.1021/acs.analchem.6b02075] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Indexed: 12/18/2022]

Hong H, Shen J, Ng HW, Sakkiah S, Ye H, Ge W, Gong P, Xiao W, Tong W. A Rat α-Fetoprotein Binding Activity Prediction Model to Facilitate Assessment of the Endocrine Disruption Potential of Environmental Chemicals. Int J Environ Res Public Health 2016;13:372. [PMID: 27023588 PMCID: PMC4847034 DOI: 10.3390/ijerph13040372] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 01/27/2016] [Revised: 03/10/2016] [Accepted: 03/22/2016] [Indexed: 11/21/2022]

Cherkasov A, Muratov EN, Fourches D, Varnek A, Baskin II, Cronin M, Dearden J, Gramatica P, Martin YC, Todeschini R, Consonni V, Kuz'min VE, Cramer R, Benigni R, Yang C, Rathman J, Terfloth L, Gasteiger J, Richard A, Tropsha A. QSAR modeling: where have you been? Where are you going to? J Med Chem 2014;57:4977-5010. [PMID: 24351051 PMCID: PMC4074254 DOI: 10.1021/jm4004285] [Citation(s) in RCA: 1040] [Impact Index Per Article: 104.0] [Indexed: 02/08/2023]

Affiliation(s)

Artem Cherkasov Vancouver Prostate Centre, University of British Columbia, Vancouver, BC, V6H 3Z6, Canada
Eugene N. Muratov Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA; Department of Molecular Structure and Cheminformatics, A.V. Bogatsky Physical-Chemical Institute National Academy of Sciences of Ukraine, Odessa, 65080, Ukraine
Denis Fourches Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
Alexandre Varnek Department of Chemistry, L. Pasteur University of Strasbourg, Strasbourg, 67000, France
Igor I. Baskin Department of Physics, Lomonosov Moscow State University, Moscow, 119991, Russia
Mark Cronin School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool L3 3AF, UK
John Dearden School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool L3 3AF, UK
Paola Gramatica Department of Structural and Functional Biology, University of Insubria, Varese, 21100, Italy
Yvonne C. Martin Martin Consulting, Waukegan, IL, 60079, USA
Roberto Todeschini Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, 20126, Italy
Viviana Consonni Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, 20126, Italy
Victor E. Kuz'min Department of Molecular Structure and Cheminformatics, A.V. Bogatsky Physical-Chemical Institute National Academy of Sciences of Ukraine, Odessa, 65080, Ukraine
Richard Cramer Tripos, Inc., St. Louis, MO, 63144, USA
Romualdo Benigni Environment and Health Department, Istituto Superiore di Sanità, Rome, 00161, Italy
Chihae Yang Altamira LLC, Columbus, OH, 43235, USA
James Rathman Altamira LLC, Columbus, OH, 43235, USA; Department of Chemical and Biomolecular Engineering, The Ohio State University, Columbus, OH, 43215, USA
Lothar Terfloth Molecular Networks GmbH, 91052 Erlangen, Germany
Johann Gasteiger Molecular Networks GmbH, 91052 Erlangen, Germany
Ann Richard National Center for Computational Toxicology, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27519, USA
Alexander Tropsha Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA

Impact of distance-based metric learning on classification and visualization model performance and structure–activity landscapes. J Comput Aided Mol Des 2014;28:61-73. [DOI: 10.1007/s10822-014-9719-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 03/15/2013] [Accepted: 01/24/2014] [Indexed: 10/25/2022]

Cumming JG, Davis AM, Muresan S, Haeberlein M, Chen H. Chemical predictive modelling to improve compound quality. Nat Rev Drug Discov 2014;12:948-62. [PMID: 24287782 DOI: 10.1038/nrd4128] [Citation(s) in RCA: 156] [Impact Index Per Article: 15.6] [Indexed: 11/09/2022]

Dander A, Mueller LA, Gallasch R, Pabinger S, Emmert-Streib F, Graber A, Dehmer M. COMMODE: a large-scale database of molecular descriptors using compounds from PubChem. Source Code Biol Med 2013;8:22. [PMID: 24225386 PMCID: PMC3831596 DOI: 10.1186/1751-0473-8-22] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/30/2013] [Accepted: 10/29/2013] [Indexed: 11/11/2022]

Polishchuk PG, Kuz'min VE, Artemenko AG, Muratov EN. Universal Approach for Structural Interpretation of QSAR/QSPR Models. Mol Inform 2013;32:843-53. [DOI: 10.1002/minf.201300029] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Received: 02/20/2013] [Accepted: 07/29/2013] [Indexed: 11/07/2022]

Phatak SS, Stephan CC, Cavasotto CN. High-throughput and in silico screenings in drug discovery. Expert Opin Drug Discov 2013;4:947-59. [PMID: 23480542 DOI: 10.1517/17460440903190961] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Indexed: 11/05/2022]

Guha R. On exploring structure-activity relationships. Methods Mol Biol 2013;993:81-94. [PMID: 23568465 PMCID: PMC4852705 DOI: 10.1007/978-1-62703-342-8_6] [Citation(s) in RCA: 72] [Impact Index Per Article: 6.5] [Indexed: 12/20/2022]

Lamchouri F, Toufik H, Elmalki Z, Bouzzine SM, Ait Malek H, Hamidi M, Bouachrine M. Quantitative structure–activity relationship of antitumor and neurotoxic β-carbolines alkaloids: nine harmine derivatives. Res Chem Intermed 2012. [DOI: 10.1007/s11164-012-0752-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 11/27/2022]

Varnek A, Baskin I. Machine learning methods for property prediction in chemoinformatics: Quo Vadis? J Chem Inf Model 2012;52:1413-37. [PMID: 22582859 DOI: 10.1021/ci200409x] [Citation(s) in RCA: 148] [Impact Index Per Article: 12.3] [Indexed: 01/19/2023]

Powerful Integrative Tool Combining Structure Generator and Chemical Space Visualization. J Comput Aided Chem 2012. [DOI: 10.2751/jcac.13.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/08/2022]

Sukumar N, Krein MP, Embrechts MJ. Predictive cheminformatics in drug discovery: statistical modeling for analysis of micro-array and gene expression data. Methods Mol Biol 2012;910:165-94. [PMID: 22821597 DOI: 10.1007/978-1-61779-965-5_9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/23/2023]

Hutter MC. Determining the Degree of Randomness of Descriptors in Linear Regression Equations with Respect to the Data Size. J Chem Inf Model 2011;51:3099-104. [DOI: 10.1021/ci200403j] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]

Carbon-Mangels M, Hutter MC. Selecting Relevant Descriptors for Classification by Bayesian Estimates: A Comparison with Decision Trees and Support Vector Machines Approaches for Disparate Data Sets. Mol Inform 2011;30:885-95. [PMID: 27468108 DOI: 10.1002/minf.201100069] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 04/28/2011] [Accepted: 08/19/2011] [Indexed: 11/12/2022]

Soto AJ, Vazquez GE, Strickert M, Ponzoni I. Target-Driven Subspace Mapping Methods and Their Applicability Domain Estimation. Mol Inform 2011;30:779-89. [DOI: 10.1002/minf.201100053] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 04/05/2011] [Accepted: 05/26/2011] [Indexed: 11/06/2022]

From known knowns to known unknowns: predicting in vivo drug metabolites. Bioanalysis 2011;1:393-414. [PMID: 21083174 DOI: 10.4155/bio.09.32] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Indexed: 11/17/2022]

Abstract

'It is better to be useful than perfect'. This review attempts to critically cover and assess the currently available approaches and tools to answer the crucial question: Is it possible (and if it is, to what extent is it possible) to predict in vivo metabolites and their abundances on the basis of in vitro and preclinical animal studies? In preclinical drug development, it is possible to produce metabolite patterns from a candidate drug by virtual means (i.e., in silico models), but these are not yet validated. However, they may be useful to cover the potential range of metabolites. In vitro metabolite patterns and apparent relative abundances are produced by various in vitro systems employing tissue preparations (mainly liver) and in most cases using liquid chromatography-mass spectrometry analytical techniques for tentative identification. The pattern of the metabolites produced depends on the enzyme source; the most comprehensive source of drug-metabolizing enzymes is cultured human hepatocytes, followed by liver homogenate fortified with appropriate cofactors. For specific purposes, such as the identification of metabolizing enzyme(s), recombinant enzymes can be used. Metabolite data from animal in vitro and in vivo experiments, despite known species differences, may help pinpoint metabolites that are not apparently produced in in vitro human systems, or suggest alternative experimental approaches. The range of metabolites detected provides clues regarding the enzymes attacking the molecule under study. We also discuss established approaches to identify the major enzymes. The last question, regarding reliability and robustness of metabolite extrapolations from in vitro to in vivo, both qualitatively and quantitatively, cannot be easily answered. There are a number of examples in the literature suggesting that extrapolations are generally useful, but there are only a few systematic and comprehensive studies to validate in vitro-in vivo extrapolations. 
In conclusion, extrapolation from preclinical metabolite data to the in vivo situation is certainly useful, but it is not known to what extent.
