Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Rosenbaum L, Hinselmann G, Jahn A, Zell A. Interpreting linear support vector machine models with heat map molecule coloring. J Cheminform 2011;3:11. [PMID: 21439031 PMCID: PMC3076244 DOI: 10.1186/1758-2946-3-11] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Received: 12/31/2010] [Accepted: 03/25/2011] [Indexed: 11/17/2022] Open

Cited by Other Article(s)

Ancajas CMF, Oyedele AS, Butt CM, Walker AS. Advances, opportunities, and challenges in methods for interrogating the structure activity relationships of natural products. Nat Prod Rep 2024. [PMID: 38912779 DOI: 10.1039/d4np00009a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/25/2024]

Abstract

Time span in literature: 1985-early 2024. Natural products play a key role in drug discovery, both as a direct source of drugs and as a starting point for the development of synthetic compounds. Most natural products are not suitable to be used as drugs without further modification due to insufficient activity or poor pharmacokinetic properties. Choosing what modifications to make requires an understanding of the compound's structure-activity relationships. Use of structure-activity relationships is commonplace and essential in medicinal chemistry campaigns applied to human-designed synthetic compounds. Structure-activity relationships have also been used to improve the properties of natural products, but several challenges still limit these efforts. Here, we review methods for studying the structure-activity relationships of natural products and their limitations. Specifically, we will discuss how synthesis, including total synthesis, late-stage derivatization, chemoenzymatic synthetic pathways, and engineering and genome mining of biosynthetic pathways can be used to produce natural product analogs and discuss the challenges of each of these approaches. Finally, we will discuss computational methods including machine learning methods for analyzing the relationship between biosynthetic genes and product activity, computer aided drug design techniques, and interpretable artificial intelligence approaches towards elucidating structure-activity relationships from models trained to predict bioactivity from chemical structure. Our focus will be on these latter topics as their applications for natural products have not been extensively reviewed. We suggest that these methods are all complementary to each other, and that only collaborative efforts using a combination of these techniques will result in a full understanding of the structure-activity relationships of natural products.

Jia X, Wang T, Zhu H. Advancing Computational Toxicology by Interpretable Machine Learning. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2023;57:17690-17706. [PMID: 37224004 PMCID: PMC10666545 DOI: 10.1021/acs.est.3c00653] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 01/25/2023] [Revised: 05/05/2023] [Accepted: 05/05/2023] [Indexed: 05/26/2023]

Evaluating eXplainable artificial intelligence tools for hard disk drive predictive maintenance. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10354-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/15/2022]

Jiménez-Luna J, Skalic M, Weskamp N. Benchmarking Molecular Feature Attribution Methods with Activity Cliffs. J Chem Inf Model 2022;62:274-283. [PMID: 35019265 DOI: 10.1021/acs.jcim.1c01163] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/04/2023]

Rodríguez-Pérez R, Bajorath J. Explainable Machine Learning for Property Predictions in Compound Optimization. J Med Chem 2021;64:17744-17752. [PMID: 34902252 DOI: 10.1021/acs.jmedchem.1c01789] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 11/28/2022]

Heat Maps: Perfect Maps for Quick Reading? Comparing Usability of Heat Maps with Different Levels of Generalization. ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION 2021. [DOI: 10.3390/ijgi10080562] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/29/2023]

Ye Z, Yang W, Yang Y, Ouyang D. Interpretable machine learning methods for in vitro pharmaceutical formulation development. FOOD FRONTIERS 2021. [DOI: 10.1002/fft2.78] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/21/2022] Open

Wang Z, Dreyer F, Pulvermüller F, Ntemou E, Vajkoczy P, Fekonja LS, Picht T. Support vector machine based aphasia classification of transcranial magnetic stimulation language mapping in brain tumor patients. Neuroimage Clin 2020;29:102536. [PMID: 33360768 PMCID: PMC7772815 DOI: 10.1016/j.nicl.2020.102536] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/13/2020] [Revised: 11/30/2020] [Accepted: 12/12/2020] [Indexed: 12/03/2022]

Abstract

Repetitive TMS (rTMS) allows for non-invasive and transient disruption of local neuronal functioning. We used machine learning approaches to assess whether brain tumor patients can be accurately classified into aphasic and non-aphasic groups using their rTMS language mapping results as input features. Given that each tumor affects the subject-specific language networks differently, resulting in heterogeneous rTMS functional mappings, we propose the use of machine learning strategies to classify potential patterns of rTMS language mapping results. We retrospectively included 90 patients with left perisylvian World Health Organization (WHO) grade II-IV gliomas that underwent presurgical navigated rTMS language mapping. Within our cohort, 29 of 90 (32.2%) patients suffered from at least mild aphasia as shown in the Aachen Aphasia Test based Berlin Aphasia Score (BAS). After spatial normalization to MNI 152 of all rTMS spots, we calculated the error rate (ER) in each stimulated cortical area (28 regions of interest, ROI) by automated anatomical labeling parcellation (AAL3) and IIT. We used a support vector machine (SVM) to classify significant areas in relation to aphasia. After feeding the ROIs into the SVM model, it revealed that in addition to age (w = 2.98), the ERs of the left supramarginal gyrus (w = 3.64), left inferior parietal gyrus (w = 2.28) and right pars triangularis (w = 1.34) contributed more than other features to the model. The model's sensitivity was 86.2%, the specificity was 82.0%, the overall accuracy was 85.5% and the AUC was 89.3%. Our results demonstrate an increased vulnerability of right inferior pars triangularis to rTMS in aphasic patients due to left perisylvian gliomas. This finding points towards a functionally relevant involvement of the right pars triangularis in response to aphasia. The tumor location feature, specified by calculating overlaps with white and grey matter atlases, did not affect the SVM model. The left supramarginal gyrus as a feature improved our SVM model the most. Additionally, our results could point towards a decreasing potential for neuroplasticity with age.

Tinkov O, Polishchuk P, Matveieva M, Grigorev V, Grigoreva L, Porozov Y. The Influence of Structural Patterns on Acute Aquatic Toxicity of Organic Compounds. Mol Inform 2020;40:e2000209. [PMID: 33029954 DOI: 10.1002/minf.202000209] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/14/2020] [Accepted: 10/01/2020] [Indexed: 12/28/2022]

Sheridan RP. Interpretation of QSAR Models by Coloring Atoms According to Changes in Predicted Activity: How Robust Is It? J Chem Inf Model 2019;59:1324-1337. [DOI: 10.1021/acs.jcim.8b00825] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 11/28/2022]

Mellor C, Marchese Robinson R, Benigni R, Ebbrell D, Enoch S, Firman J, Madden J, Pawar G, Yang C, Cronin M. Molecular fingerprint-derived similarity measures for toxicological read-across: Recommendations for optimal use. Regul Toxicol Pharmacol 2019;101:121-134. [DOI: 10.1016/j.yrtph.2018.11.002] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 08/23/2018] [Revised: 10/09/2018] [Accepted: 11/12/2018] [Indexed: 12/20/2022]

Pu L, Naderi M, Liu T, Wu HC, Mukhopadhyay S, Brylinski M. eToxPred: a machine learning-based approach to estimate the toxicity of drug candidates. BMC Pharmacol Toxicol 2019;20:2. [PMID: 30621790 PMCID: PMC6325674 DOI: 10.1186/s40360-018-0282-6] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Received: 08/14/2018] [Accepted: 12/26/2018] [Indexed: 12/20/2022] Open

Helal S, Li J, Liu L, Ebrahimie E, Dawson S, Murray DJ, Long Q. Predicting academic performance by considering student heterogeneity. Knowl Based Syst 2018. [DOI: 10.1016/j.knosys.2018.07.042] [Citation(s) in RCA: 55] [Impact Index Per Article: 9.2] [Indexed: 10/28/2022]

Mayr A, Klambauer G, Unterthiner T, Steijaert M, Wegner JK, Ceulemans H, Clevert DA, Hochreiter S. Large-scale comparison of machine learning methods for drug target prediction on ChEMBL. Chem Sci 2018;9:5441-5451. [PMID: 30155234 PMCID: PMC6011237 DOI: 10.1039/c8sc00148k] [Citation(s) in RCA: 252] [Impact Index Per Article: 42.0] [Received: 01/10/2018] [Accepted: 05/16/2018] [Indexed: 12/24/2022] Open

Marchese Robinson RL, Palczewska A, Palczewski J, Kidley N. Comparison of the Predictive Performance and Interpretability of Random Forest and Linear Models on Benchmark Data Sets. J Chem Inf Model 2017;57:1773-1792. [PMID: 28715209 DOI: 10.1021/acs.jcim.6b00753] [Citation(s) in RCA: 59] [Impact Index Per Article: 8.4] [Indexed: 11/29/2022]

Abstract

The ability to interpret the predictions made by quantitative structure-activity relationships (QSARs) offers a number of advantages. While QSARs built using nonlinear modeling approaches, such as the popular Random Forest algorithm, might sometimes be more predictive than those built using linear modeling approaches, their predictions have been perceived as difficult to interpret. However, a growing number of approaches have been proposed for interpreting nonlinear QSAR models in general and Random Forest in particular. In the current work, we compare the performance of Random Forest to those of two widely used linear modeling approaches: linear Support Vector Machines (SVMs) (or Support Vector Regression (SVR)) and partial least-squares (PLS). We compare their performance in terms of their predictivity as well as the chemical interpretability of the predictions using novel scoring schemes for assessing heat map images of substructural contributions. We critically assess different approaches for interpreting Random Forest models as well as for obtaining predictions from the forest. We assess the models on a large number of widely employed public-domain benchmark data sets corresponding to regression and binary classification problems of relevance to hit identification and toxicology. We conclude that Random Forest typically yields comparable or possibly better predictive performance than the linear modeling approaches and that its predictions may also be interpreted in a chemically and biologically meaningful way. In contrast to earlier work looking at interpretation of nonlinear QSAR models, we directly compare two methodologically distinct approaches for interpreting Random Forest models. The approaches for interpreting Random Forest assessed in our article were implemented using open-source programs that we have made available to the community. 
These programs are the rfFC package ( https://r-forge.r-project.org/R/?group_id=1725 ) for the R statistical programming language and the Python program HeatMapWrapper [ https://doi.org/10.5281/zenodo.495163 ] for heat map generation.

Improving the expressiveness of black-box models for predicting student performance. COMPUTERS IN HUMAN BEHAVIOR 2017. [DOI: 10.1016/j.chb.2016.09.001] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Indexed: 11/18/2022]

Koutsoukas A, Monaghan KJ, Li X, Huan J. Deep-learning: investigating deep neural networks hyper-parameters and comparison of performance to shallow methods for modeling bioactivity data. J Cheminform 2017;9:42. [PMID: 29086090 PMCID: PMC5489441 DOI: 10.1186/s13321-017-0226-y] [Citation(s) in RCA: 112] [Impact Index Per Article: 16.0] [Received: 09/30/2016] [Accepted: 05/27/2017] [Indexed: 01/03/2023] Open

Abstract

Background

In recent years, research in artificial neural networks has resurged, now under the deep-learning umbrella, and grown extremely popular. Recently reported success of DL techniques in crowd-sourced QSAR and predictive toxicology competitions has showcased these methods as powerful tools in drug-discovery and toxicology research. The aim of this work was twofold. First, a large number of hyper-parameter configurations were explored to investigate how they affect the performance of DNNs and whether they could act as starting points when tuning DNNs. Second, their performance was compared to popular methods widely employed in the field of cheminformatics, namely Naïve Bayes, k-nearest neighbor, random forest and support vector machines. Moreover, the robustness of machine learning methods to different levels of artificially introduced noise was assessed. The open-source Caffe deep-learning framework and modern NVidia GPU units were utilized to carry out this study, allowing a large number of DNN configurations to be explored.

Results

We show that feed-forward deep neural networks are capable of achieving strong classification performance and outperform shallow methods across diverse activity classes when optimized. Hyper-parameters that were found to play a critical role are the activation function, dropout regularization, number of hidden layers and number of neurons. Compared to the other methods, tuned DNNs were found to perform statistically significantly better, with p value <0.01 based on the Wilcoxon statistical test. On average, DNNs achieved an MCC 0.149 higher than NB, 0.092 higher than kNN, 0.052 higher than SVM with a linear kernel, 0.021 higher than RF and 0.009 higher than SVM with a radial basis function kernel. When exploring robustness to noise, non-linear methods were found to perform well at low levels of noise (20% or lower); at higher levels of noise (above 30%), however, the Naïve Bayes method performed well and, at the highest noise level of 50%, even outperformed more sophisticated methods across several datasets.
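
For readers unfamiliar with the MCC (Matthews correlation coefficient) differences reported above, the following is a minimal sketch of how the metric is computed from binary confusion-matrix counts. The function name and the example counts are hypothetical and are not taken from the cited study; returning 0.0 when the denominator is zero is a common convention assumed here.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when the denominator is zero (an assumed convention for
    the undefined case, e.g. when only one class is ever predicted).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for two classifiers evaluated on the same 200 compounds.
mcc_a = matthews_corrcoef(tp=80, tn=85, fp=15, fn=20)
mcc_b = matthews_corrcoef(tp=70, tn=80, fp=20, fn=30)
print(f"MCC difference: {mcc_a - mcc_b:.3f}")
```

The MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), so a difference of a few hundredths, as in the comparison between DNN and RF, is a small but measurable gap.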

Electronic supplementary material

The online version of this article (doi:10.1186/s13321-017-0226-y) contains supplementary material, which is available to authorized users.

Shoombuatong W, Prathipati P, Owasirikul W, Worachartcheewan A, Simeon S, Anuwongcharoen N, Wikberg JES, Nantasenamat C. Towards the Revival of Interpretable QSAR Models. CHALLENGES AND ADVANCES IN COMPUTATIONAL CHEMISTRY AND PHYSICS 2017. [DOI: 10.1007/978-3-319-56850-8_1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 12/20/2022]

Gütlein M, Kramer S. Filtered circular fingerprints improve either prediction or runtime performance while retaining interpretability. J Cheminform 2016;8:60. [PMID: 27853484 PMCID: PMC5088672 DOI: 10.1186/s13321-016-0173-z] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 05/19/2016] [Accepted: 10/18/2016] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Even though circular fingerprints were first introduced more than 50 years ago, they are still widely used for building highly predictive, state-of-the-art (Q)SAR models. Historically, these structural fragments were designed to search large molecular databases. Hence, to derive a compact representation, circular fingerprint fragments are often folded to comparatively short bit-strings. However, folding fingerprints introduces bit collisions, and therefore adds noise to the encoded structural information and removes its interpretability. Both representations, folded as well as unprocessed fingerprints, are often used for (Q)SAR modeling.

RESULTS

We show that it can be preferable to build (Q)SAR models with circular fingerprint fragments that have been filtered by supervised feature selection, instead of applying folded or all fragments. Compared to folded fingerprints, filtered fingerprints significantly increase predictive performance and remain unambiguous and interpretable. Compared to unprocessed fingerprints, filtered fingerprints reduce the computational effort and are a more compact and less redundant feature representation. Depending on the selected learning algorithm filtering yields about equally predictive (Q)SAR models. We demonstrate the suitability of filtered fingerprints for (Q)SAR modeling by presenting our freely available web service Collision-free Filtered Circular Fingerprints that provides rationales for predictions by highlighting important structural features in the query compound (see http://coffer.informatik.uni-mainz.de).

CONCLUSIONS

Circular fingerprints are potent structural features that yield highly predictive models and encode interpretable structural information. However, to not lose interpretability, circular fingerprints should not be folded when building prediction models. Our experiments show that filtering is a suitable option to reduce the high computational effort when working with all fingerprint fragments. Additionally, our experiments suggest that the area under precision recall curve is a more sensible statistic for validating (Q)SAR models for virtual screening than the area under ROC or other measures for early recognition.
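
The bit collisions that this abstract attributes to folding can be illustrated with a small sketch. It assumes that fragments are represented by integer identifiers and that folding is a simple modulo operation; actual cheminformatics toolkits may hash and fold differently, and the identifiers below are hypothetical.

```python
from collections import defaultdict


def fold_fingerprint(fragment_ids, n_bits):
    """Map each integer fragment identifier to a bit position by modulo folding."""
    bit_to_fragments = defaultdict(set)
    for frag in fragment_ids:
        bit_to_fragments[frag % n_bits].add(frag)
    return dict(bit_to_fragments)


def collisions(bit_to_fragments):
    """Return only the bits that more than one distinct fragment maps to."""
    return {bit: frags for bit, frags in bit_to_fragments.items() if len(frags) > 1}


# Hypothetical fragment identifiers (real ones would be hashes of substructures).
fragments = [3, 17, 1027, 2051, 40]

folded = fold_fingerprint(fragments, n_bits=1024)
clashes = collisions(folded)
print(clashes)  # bits shared by several fragments are no longer interpretable
```

A set bit that several fragments share cannot be traced back to one substructure, which is the loss of interpretability the authors describe. Keeping the unfolded fragments, optionally filtered, avoids this ambiguity.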

Polishchuk P, Tinkov O, Khristova T, Ognichenko L, Kosinskaya A, Varnek A, Kuz’min V. Structural and Physico-Chemical Interpretation (SPCI) of QSAR Models and Its Comparison with Matched Molecular Pair Analysis. J Chem Inf Model 2016;56:1455-69. [DOI: 10.1021/acs.jcim.6b00371] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Indexed: 02/02/2023]

Rivera-Borroto OM, García-de la Vega JM, Marrero-Ponce Y, Grau R. Relational Agreement Measures for Similarity Searching of Cheminformatic Data Sets. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2016;13:158-67. [PMID: 26886740 DOI: 10.1109/tcbb.2015.2424435] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 05/27/2023]

Gagliano SA, Ravji R, Barnes MR, Weale ME, Knight J. Smoking Gun or Circumstantial Evidence? Comparison of Statistical Learning Methods using Functional Annotations for Prioritizing Risk Variants. Sci Rep 2015;5:13373. [PMID: 26300220 PMCID: PMC4642511 DOI: 10.1038/srep13373] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 01/15/2015] [Accepted: 07/24/2015] [Indexed: 11/09/2022] Open

Balfer J, Bajorath J. Visualization and Interpretation of Support Vector Machine Activity Predictions. J Chem Inf Model 2015;55:1136-47. [DOI: 10.1021/acs.jcim.5b00175] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Indexed: 01/28/2023]

Cortes-Ciriano I, Murrell DS, van Westen GJ, Bender A, Malliavin TE. Prediction of the potency of mammalian cyclooxygenase inhibitors with ensemble proteochemometric modeling. J Cheminform 2015;7:1. [PMID: 25705261 PMCID: PMC4335128 DOI: 10.1186/s13321-014-0049-z] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.0] [Received: 08/01/2014] [Accepted: 11/21/2014] [Indexed: 12/16/2022] Open

Abstract

Cyclooxygenases (COX) are present in the body in two isoforms, namely: COX-1, constitutively expressed, and COX-2, induced in physiopathological conditions such as cancer or chronic inflammation. The inhibition of COX with non-steroidal anti-inflammatory drugs (NSAIDs) is the most widely used treatment for chronic inflammation despite the adverse effects associated with prolonged NSAIDs intake. Although selective COX-2 inhibition has been shown not to palliate all adverse effects (e.g. cardiotoxicity), there are still niche populations which can benefit from selective COX-2 inhibition. Thus, capitalizing on bioactivity data from both isoforms simultaneously would contribute to develop COX inhibitors with better safety profiles. We applied ensemble proteochemometric modeling (PCM) for the prediction of the potency of 3,228 distinct COX inhibitors on 11 mammalian cyclooxygenases. Ensemble PCM models ([Formula: see text], and RMSEtest = 0.71) outperformed models exclusively trained on compound ([Formula: see text], and RMSEtest = 1.09) or protein descriptors ([Formula: see text] and RMSEtest = 1.10) on the test set. Moreover, PCM predicted COX potency for 1,086 selective and non-selective COX inhibitors with [Formula: see text] and RMSEtest = 0.76. These values are in agreement with the maximum and minimum achievable [Formula: see text] and RMSEtest values of approximately 0.68 for both metrics. Confidence intervals for individual predictions were calculated from the standard deviation of the predictions from the individual models composing the ensembles. Finally, two substructure analysis pipelines singled out chemical substructures implicated in both potency and selectivity in agreement with the literature. Graphical Abstract: Prediction of uncorrelated bioactivity profiles for mammalian COX inhibitors with Ensemble Proteochemometric Modeling.
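
The abstract states that confidence intervals were derived from the standard deviation of the individual model predictions within an ensemble. One way such an interval could be formed is sketched below. The function name, the use of the mean as the ensemble prediction, the normal-approximation multiplier z = 1.96 and the example values are all assumptions made for illustration, not details confirmed by the cited article.

```python
import statistics


def ensemble_interval(model_predictions, z=1.96):
    """Ensemble mean and an interval of +/- z sample standard deviations.

    z = 1.96 corresponds to an approximate 95% interval under a normality
    assumption; this choice is illustrative only.
    """
    mean = statistics.mean(model_predictions)
    sd = statistics.stdev(model_predictions)  # sample standard deviation
    return mean, (mean - z * sd, mean + z * sd)


# Hypothetical potency predictions for one compound from three ensemble members.
prediction, (lower, upper) = ensemble_interval([6.0, 7.0, 8.0])
print(f"{prediction:.2f} [{lower:.2f}, {upper:.2f}]")
```

A wide interval signals that the ensemble members disagree about a compound, which can be used to flag predictions that deserve less trust.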

Dörr A, Rosenbaum L, Zell A. A ranking method for the concurrent learning of compounds with various activity profiles. J Cheminform 2015;7:2. [PMID: 25643067 PMCID: PMC4306736 DOI: 10.1186/s13321-014-0050-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 04/23/2014] [Accepted: 12/11/2014] [Indexed: 11/30/2022] Open

Carroll G, Slip D, Jonsen I, Harcourt R. Supervised accelerometry analysis can identify prey capture by penguins at sea. J Exp Biol 2014;217:4295-302. [PMID: 25394635 DOI: 10.1242/jeb.113076] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Indexed: 11/20/2022]

Balfer J, Bajorath J. Introduction of a methodology for visualization and graphical interpretation of Bayesian classification models. J Chem Inf Model 2014;54:2451-68. [PMID: 25137527 DOI: 10.1021/ci500410g] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 01/14/2023]

Cherkasov A, Muratov EN, Fourches D, Varnek A, Baskin II, Cronin M, Dearden J, Gramatica P, Martin YC, Todeschini R, Consonni V, Kuz'min VE, Cramer R, Benigni R, Yang C, Rathman J, Terfloth L, Gasteiger J, Richard A, Tropsha A. QSAR modeling: where have you been? Where are you going to? J Med Chem 2014;57:4977-5010. [PMID: 24351051 PMCID: PMC4074254 DOI: 10.1021/jm4004285] [Citation(s) in RCA: 1040] [Impact Index Per Article: 104.0] [Indexed: 02/08/2023]

Affiliation(s)

Artem Cherkasov: Vancouver Prostate Centre, University of British Columbia, Vancouver, BC, V6H3Z6, Canada
Eugene N. Muratov: Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA; Department of Molecular Structure and Cheminformatics, A.V. Bogatsky Physical-Chemical Institute National Academy of Sciences of Ukraine, Odessa, 65080, Ukraine
Denis Fourches: Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
Alexandre Varnek: Department of Chemistry, L. Pasteur University of Strasbourg, Strasbourg, 67000, France
Igor I. Baskin: Department of Physics, Lomonosov Moscow State University, Moscow, 119991, Russia
Mark Cronin: School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool L33AF, UK
John Dearden: School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool L33AF, UK
Paola Gramatica: Department of Structural and Functional Biology, University of Insubria, Varese, 21100, Italy
Yvonne C. Martin: Martin Consulting, Waukegan, IL, 60079, USA
Roberto Todeschini: Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, 20126, Italy
Viviana Consonni: Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, 20126, Italy
Victor E. Kuz'min: Department of Molecular Structure and Cheminformatics, A.V. Bogatsky Physical-Chemical Institute National Academy of Sciences of Ukraine, Odessa, 65080, Ukraine
Richard Cramer: Tripos, Inc., St. Louis, MO, 63144, USA
Romualdo Benigni: Environment and Health Department, Istituto Superiore di Sanità, Rome, 00161, Italy
Chihae Yang: Altamira LLC, Columbus OH 43235, USA
James Rathman: Altamira LLC, Columbus OH 43235, USA; Department of Chemical and Biomolecular Engineering, the Ohio State University, Columbus, OH 43215, USA
Lothar Terfloth: Molecular Networks GmbH, 91052 Erlangen, Germany
Johann Gasteiger: Molecular Networks GmbH, 91052 Erlangen, Germany
Ann Richard: National Center for Computational Toxicology, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27519, USA
Alexander Tropsha: Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA

Hanser T, Barber C, Rosser E, Vessey JD, Webb SJ, Werner S. Self organising hypothesis networks: a new approach for representing and structuring SAR knowledge. J Cheminform 2014;6:21. [PMID: 24959206 PMCID: PMC4048587 DOI: 10.1186/1758-2946-6-21] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 11/29/2013] [Accepted: 03/28/2014] [Indexed: 12/01/2022] Open

Abstract

Background

Combining different sources of knowledge to build improved structure activity relationship models is not easy owing to the variety of knowledge formats and the absence of a common framework to interoperate between learning techniques. Most of the current approaches address this problem by using consensus models that operate at the prediction level. We explore the possibility to directly combine these sources at the knowledge level, with the aim to harvest potentially increased synergy at an earlier stage. Our goal is to design a general methodology to facilitate knowledge discovery and produce accurate and interpretable models.

Results

To combine models at the knowledge level, we propose to decouple the learning phase from the knowledge application phase using a pivot representation (lingua franca) based on the concept of hypothesis. A hypothesis is a simple and interpretable knowledge unit. Regardless of its origin, knowledge is broken down into a collection of hypotheses. These hypotheses are subsequently organised into a hierarchical network. This unification makes it possible to combine different sources of knowledge into a common formalised framework. The approach allows us to create a synergistic system between different forms of knowledge, and new algorithms can be applied to leverage this unified model. This first article focuses on the general principle of the Self Organising Hypothesis Network (SOHN) approach in the context of binary classification problems along with an illustrative application to the prediction of mutagenicity.

Conclusion

It is possible to represent knowledge in the unified form of a hypothesis network allowing interpretable predictions with performances comparable to mainstream machine learning techniques. This new approach offers the potential to combine knowledge from different sources into a common framework in which high level reasoning and meta-learning can be applied; these latter perspectives will be explored in future work.

Palczewska A, Palczewski J, Marchese Robinson R, Neagu D. Interpreting Random Forest Classification Models Using a Feature Contribution Method. INTEGRATION OF REUSABLE SYSTEMS 2014. [DOI: 10.1007/978-3-319-04717-1_9] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.7] [Indexed: 12/13/2022]

Riniker S, Landrum GA. Similarity maps - a visualization strategy for molecular fingerprints and machine-learning methods. J Cheminform 2013;5:43. [PMID: 24063533 PMCID: PMC3852750 DOI: 10.1186/1758-2946-5-43] [Citation(s) in RCA: 75] [Impact Index Per Article: 6.8] [Received: 05/23/2013] [Accepted: 07/23/2013] [Indexed: 02/03/2023] Open

Polishchuk PG, Kuz'min VE, Artemenko AG, Muratov EN. Universal Approach for Structural Interpretation of QSAR/QSPR Models. Mol Inform 2013;32:843-53. [DOI: 10.1002/minf.201300029] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Received: 02/20/2013] [Accepted: 07/29/2013] [Indexed: 11/07/2022]

Rosenbaum L, Dörr A, Bauer MR, Boeckler FM, Zell A. Inferring multi-target QSAR models with taxonomy-based multi-task learning. J Cheminform 2013;5:33. [PMID: 23842210 PMCID: PMC4104930 DOI: 10.1186/1758-2946-5-33] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 05/06/2013] [Accepted: 07/03/2013] [Indexed: 11/13/2022] Open

Chen H, Carlsson L, Eriksson M, Varkonyi P, Norinder U, Nilsson I. Beyond the Scope of Free-Wilson Analysis: Building Interpretable QSAR Models with Machine Learning Algorithms. J Chem Inf Model 2013;53:1324-36. [DOI: 10.1021/ci4001376] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Indexed: 11/29/2022]

Vlachakis D, Tsiliki G, Pavlopoulou A, Roubelakis MG, Tsaniras SC, Kossida S. Antiviral Stratagems Against HIV-1 Using RNA Interference (RNAi) Technology. Evol Bioinform Online 2013;9:203-13. [PMID: 23761954 PMCID: PMC3662398 DOI: 10.4137/ebo.s11412] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 12/31/2022] Open

Reutlinger M, Schneider G. Nonlinear dimensionality reduction and mapping of compound libraries for drug discovery. J Mol Graph Model 2012;34:108-17. [PMID: 22326864 DOI: 10.1016/j.jmgm.2011.12.006] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.2] [Received: 09/14/2011] [Revised: 12/13/2011] [Accepted: 12/14/2011] [Indexed: 01/29/2023]