1
Kaya I, Colmenarejo G. Analysis of Nuisance Substructures and Aggregators in a Comprehensive Database of Food Chemical Compounds. J Agric Food Chem 2020; 68:8812-8824. [PMID: 32687707 DOI: 10.1021/acs.jafc.0c02521] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 06/11/2023]
Abstract
The mechanistic understanding of the biological effects of foods involves the testing of food compounds in biochemical and biological assays. Positive results in these assays can be artifactual due to some properties of the compound: namely chemical reactivity, membrane disruption, redox cycling, etc., or through the formation of colloidal aggregates. Within the drug discovery field, a wide set of so-called "nuisance" filters have been developed to identify substructures prone to assay artifacts and/or promiscuity, e.g., the pan-assay interference compounds (PAINS) and others. In the subarea of natural products, a similar concept is the so-called invalid metabolic panaceas (IMPs). Finally, tools to identify putative aggregators have also been developed. Here, we analyzed the presence of nuisance substructures, IMPs, and aggregators in a large database of food compounds (the FooDB), which should be useful to the researchers working in the field, in order to be aware of possible artifact/promiscuity issues in their assays.
Affiliation(s)
- Irem Kaya
  - Biostatistics and Bioinformatics Unit, IMDEA Food CEI UAM+CSIC, E28049 Madrid, Spain
- Gonzalo Colmenarejo
  - Biostatistics and Bioinformatics Unit, IMDEA Food CEI UAM+CSIC, E28049 Madrid, Spain
2
Golbraikh A. Value of p-Value. Mol Inform 2019; 38:e1800152. [PMID: 31188542 DOI: 10.1002/minf.201800152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2018] [Accepted: 05/07/2019] [Indexed: 11/09/2022]
Abstract
The goal of this manuscript is to discuss important aspects of external validation of classification and category Quantitative Structure-Activity/Property/Toxicity Relationship (QSAR/QSPR/QSTR) models that, to the best of the author's knowledge, are not addressed in publications. Statistical significance (in terms of p-value) and accuracy of prediction (in terms of Correct Classification Rate (CCR)) of external validation set compounds are among the most important characteristics of the models. We assert that in most cases the models built for a classification or category response variable should be statistically significant and predictive for each class or category. We show that three thresholds of the number of compounds in each class or category of the external validation sets should be satisfied. 1) The p-value criterion can never be satisfied if the number of compounds is below the first threshold. 2) If the number of compounds is between the first and the second thresholds, the p-value criterion should be used. 3) If it is higher than the third threshold, the classification or category accuracy criterion should be used. 4) If the number of compounds is between the second and third thresholds, either one or the other criterion should be used, depending on the p-value. 5) When the number of compounds in the class approaches infinity, the maximum relative error of prediction approaches the relative expected error. The results are of interest in other areas of multidimensional data analysis.
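The idea that a p-value criterion "can never be satisfied" below some class size can be illustrated with a small sketch. The abstract does not state the test or the thresholds the author derives, so everything here is an assumption: a one-sided binomial test against random assignment (`p_random = 0.5`) at significance level `alpha = 0.05`. The function names are hypothetical.

```python
from math import comb


def binomial_p_value(n_correct: int, n_total: int, p_random: float = 0.5) -> float:
    """One-sided p-value: P(X >= n_correct) for X ~ Binomial(n_total, p_random).

    Assumption: the null model assigns each compound to the right class
    with probability p_random, independently of the others.
    """
    return sum(
        comb(n_total, k) * p_random**k * (1 - p_random) ** (n_total - k)
        for k in range(n_correct, n_total + 1)
    )


def min_class_size(alpha: float = 0.05, p_random: float = 0.5) -> int:
    """Smallest class size at which even a perfect prediction gives p < alpha."""
    n = 1
    while binomial_p_value(n, n, p_random) >= alpha:
        n += 1
    return n


if __name__ == "__main__":
    # With 4 compounds, a perfect prediction still has p = 0.5**4 = 0.0625.
    print(binomial_p_value(4, 4))
    # Under these assumptions the first class size that can reach p < 0.05 is 5.
    print(min_class_size())
```

Under this assumed null model, a class with fewer than five external compounds cannot reach significance no matter how well the model predicts, which is the kind of lower bound the abstract describes.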
Affiliation(s)
- Alexander Golbraikh
  - Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, CB #7360, Chapel Hill, NC 27599
3
Yan L, Zhang Q, Huang F, Nie WW, Hu CQ, Ying HZ, Dong XW, Zhao MR. Ternary classification models for predicting hormonal activities of chemicals via nuclear receptors. Chem Phys Lett 2018. [DOI: 10.1016/j.cplett.2018.06.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
4
Danishuddin, Kumar A, Mobeen F, Khan AU. Development of Ligand and Structure-based classification models to design novel inhibitors against antibiotic hydrolyzing enzymes: Integration of web server. J Biomol Struct Dyn 2017; 36:2966-2975. [PMID: 28849700 DOI: 10.1080/07391102.2017.1373034] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 10/19/2022]
Affiliation(s)
- Danishuddin
  - Medical Microbiology and Molecular Biology Laboratory, Interdisciplinary Biotechnology Unit, Aligarh Muslim University, Aligarh, UP 202002, India
- Amit Kumar
  - Medical Microbiology and Molecular Biology Laboratory, Interdisciplinary Biotechnology Unit, Aligarh Muslim University, Aligarh, UP 202002, India
- Fauzul Mobeen
  - Medical Microbiology and Molecular Biology Laboratory, Interdisciplinary Biotechnology Unit, Aligarh Muslim University, Aligarh, UP 202002, India
- Asad U Khan
  - Medical Microbiology and Molecular Biology Laboratory, Interdisciplinary Biotechnology Unit, Aligarh Muslim University, Aligarh, UP 202002, India
5
Abstract
INTRODUCTION: With the emergence of the 'big data' era, the biomedical research community has great interest in exploiting publicly available chemical information for drug discovery. PubChem is an example of a public database that provides a large amount of chemical information free of charge. AREAS COVERED: This article provides an overview of how PubChem's data, tools, and services can be used for virtual screening and reviews recent publications that discuss important aspects of exploiting PubChem for drug discovery. EXPERT OPINION: PubChem offers comprehensive chemical information useful for drug discovery. It also provides multiple programmatic access routes, which are essential to build automated virtual screening pipelines that exploit PubChem data. In addition, PubChemRDF allows users to download PubChem data and load them into a local computing facility, facilitating data integration between PubChem and other resources. PubChem resources have been used in many studies for developing bioactivity and toxicity prediction models, discovering polypharmacologic (multi-target) ligands, and identifying new macromolecule targets of compounds (for drug repurposing or off-target side effect prediction). These studies demonstrate the usefulness of PubChem as a key resource for computer-aided drug discovery and related areas.
Affiliation(s)
- Sunghwan Kim
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, USA
6
Luo M, Reid TE, Wang XS. Discovery of Natural Product-Derived 5-HT1A Receptor Binders by Cheminfomatics Modeling of Known Binders, High Throughput Screening and Experimental Validation. Comb Chem High Throughput Screen 2016; 18:685-92. [PMID: 26138565 DOI: 10.2174/1386207318666150703113948] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 03/13/2014] [Revised: 06/16/2014] [Accepted: 06/30/2015] [Indexed: 11/22/2022]
Abstract
The human 5-hydroxytryptamine receptor subtype 1A (5-HT1A) is highly expressed in the raphe nuclei region and limbic structures; for that reason 5-HT1A has served as a promising target for treating human mood disorders and neurodegenerative diseases. We have developed binary quantitative structure-activity relationship (QSAR) models for 5-HT1A binding using data retrieved from the WOMBAT database and the k-Nearest Neighbor (kNN) machine learning method. A rigorous QSAR modeling and screening workflow was followed, with extensive internal and external validation processes. The models' classification accuracies in discriminating 5-HT1A binders from non-binders are as high as 96% for the external validation. These models were employed further to mine two major natural product screening libraries, i.e., the TimTec Natural Product Library (NPL) and Natural Derivatives Library (NDL). In the end, five screening hits were tested by radioligand binding assays with a success rate of 40%, and two library compounds were confirmed to be binders at μM concentrations against the human 5-HT1A receptor. The combined application of rigorous QSAR modeling and model-based virtual screening presents a powerful means for profiling natural product compounds with important biomedical activities.
Affiliation(s)
- Xiang Simon Wang
  - Department of Pharmaceutical Sciences, College of Pharmacy, Howard University, 2300 4th St. NW, Washington, DC 20059, USA.
7
Irwin JJ, Duan D, Torosyan H, Doak AK, Ziebart KT, Sterling T, Tumanian G, Shoichet BK. An Aggregation Advisor for Ligand Discovery. J Med Chem 2015; 58:7076-87. [PMID: 26295373 DOI: 10.1021/acs.jmedchem.5b01105] [Citation(s) in RCA: 301] [Impact Index Per Article: 33.4] [Indexed: 01/08/2023]
Abstract
Colloidal aggregation of organic molecules is the dominant mechanism for artifactual inhibition of proteins, and controls against it are widely deployed. Notwithstanding an increasingly detailed understanding of this phenomenon, a method to reliably predict aggregation has remained elusive. Correspondingly, active molecules that act via aggregation continue to be found in early discovery campaigns and remain common in the literature. Over the past decade, over 12 thousand aggregating organic molecules have been identified, potentially enabling a precedent-based approach to match known aggregators with new molecules that may be expected to aggregate and lead to artifacts. We investigate an approach that uses lipophilicity, affinity, and similarity to known aggregators to advise on the likelihood that a candidate compound is an aggregator. In prospective experimental testing, five of seven new molecules with Tanimoto coefficients (Tc's) between 0.95 and 0.99 to known aggregators aggregated at relevant concentrations. Ten of 19 with Tc's between 0.94 and 0.90 and three of seven with Tc's between 0.89 and 0.85 also aggregated. Another three of the predicted compounds aggregated at higher concentrations. This method finds that 61 827 or 5.1% of the ligands acting in the 0.1 to 10 μM range in the medicinal chemistry literature are at least 85% similar to a known aggregator with these physical properties and may aggregate at relevant concentrations. Intriguingly, only 0.73% of all drug-like commercially available compounds resemble the known aggregators, suggesting that colloidal aggregators are enriched in the literature. As a percentage of the literature, aggregator-like compounds have increased 9-fold since 1995, partly reflecting the advent of high-throughput and virtual screens against molecular targets. 
Emerging from this study is an aggregator advisor database and tool ( http://advisor.bkslab.org ), free to the community, that may help distinguish between fruitful and artifactual screening hits acting by this mechanism.
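The precedent-based check described above rests on the Tanimoto coefficient between a query molecule and known aggregators. A minimal sketch follows, assuming fingerprints are represented as plain Python sets of "on" bit indices; the function names are hypothetical, the lipophilicity and affinity criteria mentioned in the abstract are omitted, and this is not the Aggregator Advisor implementation. Only the 0.85 similarity cutoff is taken from the abstract.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| for two fingerprint bit sets."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def flag_aggregator_like(query_fp: set, known_fps: list, threshold: float = 0.85):
    """Return (is_flagged, best_similarity) for a query fingerprint.

    A query is flagged when its highest Tanimoto coefficient to any known
    aggregator reaches the threshold (0.85 mirrors the "85% similar" cutoff
    quoted in the abstract).
    """
    best = max((tanimoto(query_fp, fp) for fp in known_fps), default=0.0)
    return best >= threshold, best


if __name__ == "__main__":
    known = [set(range(19)), {100, 101, 102}]  # toy "known aggregator" fingerprints
    query = set(range(20))                     # shares 19 of 20 bits with the first
    print(flag_aggregator_like(query, known))
```

In practice the fingerprints would come from a cheminformatics toolkit; the set-based form is used here only to keep the sketch self-contained.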
Affiliation(s)
- John J Irwin
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, Byers Hall, 1700 4th St, San Francisco, California 94158-2550, United States
- Da Duan
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, Byers Hall, 1700 4th St, San Francisco, California 94158-2550, United States
- Hayarpi Torosyan
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, Byers Hall, 1700 4th St, San Francisco, California 94158-2550, United States
- Allison K Doak
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, Byers Hall, 1700 4th St, San Francisco, California 94158-2550, United States
- Kristin T Ziebart
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, Byers Hall, 1700 4th St, San Francisco, California 94158-2550, United States
- Teague Sterling
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, Byers Hall, 1700 4th St, San Francisco, California 94158-2550, United States
- Gurgen Tumanian
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, Byers Hall, 1700 4th St, San Francisco, California 94158-2550, United States
- Brian K Shoichet
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, Byers Hall, 1700 4th St, San Francisco, California 94158-2550, United States
8
Baig MH, Balaramnavar VM, Wadhwa G, Khan AU. Homology modeling and virtual screening of inhibitors against TEM- and SHV-type-resistant mutants: A multilayer filtering approach. Biotechnol Appl Biochem 2015; 62:669-80. [PMID: 25779642 DOI: 10.1002/bab.1370] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 11/28/2014] [Accepted: 03/12/2015] [Indexed: 11/09/2022]
Abstract
TEM and SHV are class-A-type β-lactamases commonly found in Escherichia coli and Klebsiella pneumoniae. Previous studies reported S130G and K234R mutations in SHVs to be 41- and 10-fold more resistant toward clavulanic acid than SHV-1, respectively, whereas TEM S130G and R244S also showed the same level of resistance. These selected mutants confer higher level of resistance against clavulanic acid. They also show little susceptibility against other commercially available β-lactamase inhibitors. In this study, we have used docking-based virtual screening approach in order to screen potential inhibitors against some of the major resistant mutants of SHV and TEM types β-lactamase. Two different inhibitor-resistant mutants from SHV and TEM were selected. Moreover, we have retained the active site water molecules within each enzyme. Active site water molecules were placed within modeled structure of the mutant whose structure was unavailable with protein databank. The novelty of this work lies in the use of multilayer virtual screening approach for the prediction of best and accurate results. We are reporting five inhibitors on the basis of their efficacy against all the selected resistant mutants. These inhibitors were selected on the basis of their binding efficacies and pharmacophore features.
Affiliation(s)
- Mohammad H Baig
  - Interdisciplinary Biotechnology Unit, Aligarh Muslim University, Aligarh, India
  - School of Biotechnology, Yeungnam University, Gyeongsan, Republic of Korea
- Vishal M Balaramnavar
  - Division of Medicinal Chemistry and Drug Discovery, Global Institute of Pharmaceutical Education and Research, Kashipur, Udham Singh Nagar, Uttarakhand, India
- Gulshan Wadhwa
  - Department of Biotechnology, Government of India, New Delhi, India
- Asad U Khan
  - Interdisciplinary Biotechnology Unit, Aligarh Muslim University, Aligarh, India
9
Cao GP, Arooj M, Thangapandian S, Park C, Arulalapperumal V, Kim Y, Kwon YJ, Kim HH, Suh JK, Lee KW. A lazy learning-based QSAR classification study for screening potential histone deacetylase 8 (HDAC8) inhibitors. SAR QSAR Environ Res 2015; 26:397-420. [PMID: 25986171 DOI: 10.1080/1062936x.2015.1040453] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 06/04/2023]
Abstract
Histone deacetylase 8 (HDAC8) is an enzyme that represses the transcription of various genes, including tumour suppressor genes, and has already become a target for human cancer treatment. In an effort to facilitate the discovery of HDAC8 inhibitors, two quantitative structure-activity relationship (QSAR) classification models were developed using K nearest neighbours (KNN) and neighbourhood classifier (NEC). Molecular descriptors were calculated for the data set and database compounds using ADRIANA.Code of Molecular Networks. Principal components analysis (PCA) was used to select the descriptors. The developed models were validated by leave-one-out cross validation (LOO CV). The performances of the developed models were evaluated with an external test set. Highly predictive models were used for database virtual screening. Furthermore, hit compounds were subsequently subjected to molecular docking. Five hits were obtained based on consensus scoring function and binding affinity as potential HDAC8 inhibitors. Finally, HDAC8 structures in complex with the five hits were also subjected to 5 ns molecular dynamics (MD) simulations to evaluate the stability of the complex structures. To the best of our knowledge, the NEC classification model used in this study is the first application of NEC to virtual screening for drug discovery.
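The validation scheme named above, a K-nearest-neighbour classifier scored by leave-one-out cross-validation, can be sketched in a few lines. This is an illustration only, assuming Euclidean distance on numeric descriptor vectors and majority voting; the function names and toy data are hypothetical and do not reproduce the paper's descriptors, PCA step, or NEC model.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def loo_accuracy(X, y, k=3):
    """Leave-one-out CV: hold out each compound once, train on the rest."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)


if __name__ == "__main__":
    # Toy two-descriptor data: four "inactive" (0) and four "active" (1) compounds.
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    print(loo_accuracy(X, y, k=3))
```

An odd `k` avoids tied votes in the binary case; with more classes or even `k`, a tie-breaking rule would need to be chosen explicitly.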
Affiliation(s)
- G P Cao
  - Department of Biochemistry, Division of Applied Life Science (BK21 Plus Program), Systems and Synthetic Agrobiotech Centre (SSAC), Plant Molecular Biology and Biotechnology Research Centre (PMBBRC), Research Institute of Natural Science (RINS), Gyeongsang National University, Jinju, Republic of Korea
10
Fourches D, Tropsha A. Using Graph Indices for the Analysis and Comparison of Chemical Datasets. Mol Inform 2013; 32:827-42. [PMID: 27480235 DOI: 10.1002/minf.201300076] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 04/26/2013] [Accepted: 08/05/2013] [Indexed: 12/13/2022]
Abstract
In cheminformatics, compounds are represented as points in multidimensional space of chemical descriptors. When all pairs of points found within certain distance threshold in the original high dimensional chemistry space are connected by distance-labeled edges, the resulting data structure can be defined as Dataset Graph (DG). We show that, similarly to the conventional description of organic molecules, many graph indices can be computed for DGs as well. We demonstrate that chemical datasets can be effectively characterized and compared by computing simple graph indices such as the average vertex degree or Randic connectivity index. This approach is used to characterize and quantify the similarity between different datasets or subsets of the same dataset (e.g., training, test, and external validation sets used in QSAR modeling). The freely available ADDAGRA program has been implemented to build and visualize DGs. The approach proposed and discussed in this report could be further explored and utilized for different cheminformatics applications such as dataset diversification by acquiring external compounds, dataset processing prior to QSAR modeling, or (dis)similarity modeling of multiple datasets studied in chemical genomics applications.
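The Dataset Graph construction described above can be sketched as follows: connect every pair of compounds whose descriptor-space distance falls within a threshold, label each edge with that distance, then compute a simple index such as the average vertex degree. This is a hedged illustration, not the ADDAGRA program; Euclidean distance, the inclusive `<=` comparison, and the function names are assumptions.

```python
from itertools import combinations
from math import dist


def dataset_graph(points, threshold):
    """Build a Dataset Graph as a dict {(i, j): distance} of distance-labeled edges.

    Two compounds (points in descriptor space) are connected when their
    Euclidean distance is at most `threshold`.
    """
    edges = {}
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        d = dist(p, q)
        if d <= threshold:
            edges[(i, j)] = d
    return edges


def average_vertex_degree(n_vertices, edges):
    """Average degree of an undirected graph: each edge adds 2 to the degree sum."""
    return 2 * len(edges) / n_vertices if n_vertices else 0.0


if __name__ == "__main__":
    # Three close compounds and one outlier in a toy 2D descriptor space.
    points = [(0, 0), (1, 0), (0, 1), (5, 5)]
    edges = dataset_graph(points, threshold=1.5)
    print(sorted(edges))                              # the three edges among the close compounds
    print(average_vertex_degree(len(points), edges))  # 2 * 3 / 4
```

Comparing this index across a training set, test set, and external set at the same threshold gives a rough sense of whether they are similarly dense in descriptor space.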
Affiliation(s)
- Denis Fourches
  - Laboratory for Molecular Modeling, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill NC 27599, USA
- Alexander Tropsha
  - Laboratory for Molecular Modeling, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill NC 27599, USA.
11
In silico classification and virtual screening of maleimide derivatives using projection to latent structures discriminant analysis (PLS-DA) and hybrid docking. Monatsh Chem 2012. [DOI: 10.1007/s00706-012-0816-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/24/2022]
12
Berhanu WM, Pillai GG, Oliferenko AA, Katritzky AR. Quantitative Structure-Activity/Property Relationships: The Ubiquitous Links between Cause and Effect. Chempluschem 2012. [DOI: 10.1002/cplu.201200038] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Indexed: 12/17/2022]
13
Chan FY, Neves MAC, Sun N, Tsang MW, Leung YC, Chan TH, Abagyan R, Wong KY. Validation of the AmpC β-Lactamase Binding Site and Identification of Inhibitors with Novel Scaffolds. J Chem Inf Model 2012; 52:1367-75. [DOI: 10.1021/ci300068m] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Affiliation(s)
- Fung-Yi Chan
  - Department of Applied Biology and Chemical Technology and State Key Laboratory of Chirosciences, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, P. R. China
- Marco A. C. Neves
  - Skaggs School of Pharmacy & Pharmaceutical Sciences, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093, United States
  - Centro de Neurociências, Lab. Química Farmacêutica, Faculdade de Farmácia, Universidade de Coimbra, Pólo das Ciências da Saúde, 3000-548 Coimbra, Portugal
- Ning Sun
  - Department of Applied Biology and Chemical Technology and State Key Laboratory of Chirosciences, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, P. R. China
- Man-Wah Tsang
  - Department of Applied Biology and Chemical Technology and State Key Laboratory of Chirosciences, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, P. R. China
- Yun-Chung Leung
  - Department of Applied Biology and Chemical Technology and State Key Laboratory of Chirosciences, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, P. R. China
- Tak-Hang Chan
  - Department of Applied Biology and Chemical Technology and State Key Laboratory of Chirosciences, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, P. R. China
- Ruben Abagyan
  - Skaggs School of Pharmacy & Pharmaceutical Sciences, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093, United States
- Kwok-Yin Wong
  - Department of Applied Biology and Chemical Technology and State Key Laboratory of Chirosciences, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, P. R. China
14
Muratov EN, Varlamova EV, Artemenko AG, Polishchuk PG, Kuz'min VE. Existing and Developing Approaches for QSAR Analysis of Mixtures. Mol Inform 2012; 31:202-21. [PMID: 27477092 DOI: 10.1002/minf.201100129] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.1] [Received: 09/13/2011] [Accepted: 02/04/2012] [Indexed: 11/10/2022]
Abstract
This review is devoted to the critical analysis of advantages and disadvantages of existing mixture descriptors and their usage in various QSAR/QSPR tasks. We describe good practices for the QSAR modeling of mixtures, data sources for mixtures, a discussion of various mixture descriptors and their application, recommendations about proper external validation specific for mixture QSAR modeling, and future perspectives of this field. The biggest problem in QSAR of mixtures is the lack of reliable data about the mixtures' properties. Various mixture descriptors are used for the modeling of different endpoints. However, these descriptors have certain disadvantages, such as applicability only to 1 : 1 binary mixtures, and additive nature. The field of QSAR of mixtures is still under development, and existing efforts could be considered as a foundation for future approaches and studies. The usage of non-additive mixture descriptors, which are sensitive to interaction effects, in combination with best practices of QSAR model development (e.g., thorough data collection and curation, rigorous external validation, etc.) will significantly improve the quality of QSAR studies of mixtures.
Affiliation(s)
- Eugene N Muratov
  - Laboratory of Theoretical Chemistry, Department of Molecular Structure, A. V. Bogatsky Physical Chemical Institute, National Academy of Sciences of Ukraine, Lustdorfskaya Doroga 86, Odessa 65080, Ukraine, tel: +380487662394, fax: +380487662394
  - Laboratory for Molecular Modeling, Division of Medicinal Chemistry and Natural Products, Eshelman School of Pharmacy, University of North Carolina, Beard Hall 301, CB#7568, Chapel Hill, NC, 27599, USA, tel: +19199663459, fax: +19199660204
- Ekaterina V Varlamova
  - Laboratory of Theoretical Chemistry, Department of Molecular Structure, A. V. Bogatsky Physical Chemical Institute, National Academy of Sciences of Ukraine, Lustdorfskaya Doroga 86, Odessa 65080, Ukraine, tel: +380487662394, fax: +380487662394
- Anatoly G Artemenko
  - Laboratory of Theoretical Chemistry, Department of Molecular Structure, A. V. Bogatsky Physical Chemical Institute, National Academy of Sciences of Ukraine, Lustdorfskaya Doroga 86, Odessa 65080, Ukraine, tel: +380487662394, fax: +380487662394
- Pavel G Polishchuk
  - Laboratory of Theoretical Chemistry, Department of Molecular Structure, A. V. Bogatsky Physical Chemical Institute, National Academy of Sciences of Ukraine, Lustdorfskaya Doroga 86, Odessa 65080, Ukraine, tel: +380487662394, fax: +380487662394
- Victor E Kuz'min
  - Laboratory of Theoretical Chemistry, Department of Molecular Structure, A. V. Bogatsky Physical Chemical Institute, National Academy of Sciences of Ukraine, Lustdorfskaya Doroga 86, Odessa 65080, Ukraine, tel: +380487662394, fax: +380487662394
15
Schattel V, Hinselmann G, Jahn A, Zell A, Laufer S. Modeling and benchmark data set for the inhibition of c-Jun N-terminal kinase-3. J Chem Inf Model 2011; 51:670-9. [PMID: 21280627 DOI: 10.1021/ci100410h] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
Abstract
The goal of this paper is to present and describe a novel 2D- and 3D-QSAR (quantitative structure-activity relationship) binary classification data set for the inhibition of c-Jun N-terminal kinase-3 (JNK3) with previously unpublished activities for a diverse set of compounds. JNK3 is an important pharmaceutical target because it is involved in many neurological disorders. Accordingly, the development of JNK3 inhibitors has gained increasing interest. 2D and 3D versions of the data set were used, consisting of 313 (70 actives) and 249 (60 actives) compounds, respectively. All compounds for which activity was only determined for the racemate were removed from the 3D data set. We investigated the diversity of the data sets by agglomerative clustering with feature trees and show that the data set contains several different scaffolds. Furthermore, we show that the benchmarks can be tackled with standard supervised learning algorithms with a convincing performance. For the 2D problem, a random decision forest classifier achieves a Matthews correlation coefficient of 0.744; the 3D problem could be modeled with a Matthews correlation coefficient of 0.524 using 3D pharmacophores and a support vector machine. The performance on both data sets was evaluated within a nested 10-fold cross-validation. We therefore suggest that the data set is a reasonable basis for generating QSAR models for JNK3 because of its diverse composition and the performance of the classifiers presented in this study.
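The Matthews correlation coefficient quoted above is computed from the four cells of a binary confusion matrix, which makes it better suited than plain accuracy to imbalanced sets such as 70 actives out of 313 compounds. A minimal sketch follows; the function name and the confusion-matrix counts in the example are hypothetical and are not the paper's results.

```python
from math import sqrt


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when any marginal sum is zero, a common convention for the
    otherwise undefined case.
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical outcome on a 313-compound set with 70 actives.
    print(matthews_corrcoef(tp=55, tn=230, fp=13, fn=15))
    # Predicting "inactive" for everything is about 78% accurate here, yet MCC is 0.
    print(matthews_corrcoef(tp=0, tn=243, fp=0, fn=70))
```

The second call shows why the metric is preferred on skewed data: a trivial majority-class predictor scores well on accuracy but carries no correlation with the true labels.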
Affiliation(s)
- Verena Schattel
  - Department of Pharmaceutical and Medicinal Chemistry, Eberhard Karls University of Tübingen, Tübingen, Germany
16
Hinselmann G, Rosenbaum L, Jahn A, Fechner N, Ostermann C, Zell A. Large-Scale Learning of Structure−Activity Relationships Using a Linear Support Vector Machine and Problem-Specific Metrics. J Chem Inf Model 2011; 51:203-13. [DOI: 10.1021/ci100073w] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Indexed: 02/07/2023]
Affiliation(s)
- Georg Hinselmann
  - Center for Bioinformatics (ZBIT), University of Tübingen, Tübingen, Germany
- Lars Rosenbaum
  - Center for Bioinformatics (ZBIT), University of Tübingen, Tübingen, Germany
- Andreas Jahn
  - Center for Bioinformatics (ZBIT), University of Tübingen, Tübingen, Germany
- Nikolas Fechner
  - Center for Bioinformatics (ZBIT), University of Tübingen, Tübingen, Germany
- Andreas Zell
  - Center for Bioinformatics (ZBIT), University of Tübingen, Tübingen, Germany
17
Ebalunode JO, Zheng W, Tropsha A. Application of QSAR and shape pharmacophore modeling approaches for targeted chemical library design. Methods Mol Biol 2011; 685:111-33. [PMID: 20981521 DOI: 10.1007/978-1-60761-931-4_6] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Indexed: 12/15/2022]
Abstract
Optimization of chemical library composition affords more efficient identification of hits from biological screening experiments. The optimization could be achieved through rational selection of reagents used in combinatorial library synthesis. However, with a rapid advent of parallel synthesis methods and availability of millions of compounds synthesized by many vendors, it may be more efficient to design targeted libraries by means of virtual screening of commercial compound collections. This chapter reviews the application of advanced cheminformatics approaches such as quantitative structure-activity relationships (QSAR) and pharmacophore modeling (both ligand and structure based) for virtual screening. Both approaches rely on empirical SAR data to build models; thus, the emphasis is placed on achieving models of the highest rigor and external predictive power. We present several examples of successful applications of both approaches for virtual screening to illustrate their utility. We suggest that the expert use of both QSAR and pharmacophore models, either independently or in combination, enables users to achieve targeted libraries enriched with experimentally confirmed hit compounds.
Affiliation(s)
- Jerry O Ebalunode
  - Department of Pharmaceutical Sciences, BRITE Institute, North Carolina Central University, Durham, NC, USA.
18
Tropsha A. Best Practices for QSAR Model Development, Validation, and Exploitation. Mol Inform 2010; 29:476-88. [DOI: 10.1002/minf.201000061] [Citation(s) in RCA: 1086] [Impact Index Per Article: 77.6] [Received: 05/28/2010] [Accepted: 06/08/2010] [Indexed: 11/11/2022]
19
Fechner N, Jahn A, Hinselmann G, Zell A. Estimation of the applicability domain of kernel-based machine learning models for virtual screening. J Cheminform 2010; 2:2. [PMID: 20222949 PMCID: PMC2851576 DOI: 10.1186/1758-2946-2-2] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 12/15/2009] [Accepted: 03/11/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND The virtual screening of large compound databases is an important application of structural-activity relationship models. Due to the high structural diversity of these data sets, it is impossible for machine learning based QSAR models, which rely on a specific training set, to give reliable results for all compounds. Thus, it is important to consider the subset of the chemical space in which the model is applicable. The approaches to this problem that have been published so far mostly use vectorial descriptor representations to define this domain of applicability of the model. Unfortunately, these cannot be extended easily to structured kernel-based machine learning models. For this reason, we propose three approaches to estimate the domain of applicability of a kernel-based QSAR model. RESULTS We evaluated three kernel-based applicability domain estimations using three different structured kernels on three virtual screening tasks. Each experiment consisted of the training of a kernel-based QSAR model using support vector regression and the ranking of a disjoint screening data set according to the predicted activity. For each prediction, the applicability of the model for the respective compound is quantitatively described using a score obtained by an applicability domain formulation. The suitability of the applicability domain estimation is evaluated by comparing the model performance on the subsets of the screening data sets obtained by different thresholds for the applicability scores. This comparison indicates that it is possible to separate the part of the chemspace, in which the model gives reliable predictions, from the part consisting of structures too dissimilar to the training set to apply the model successfully. A closer inspection reveals that the virtual screening performance of the model is considerably improved if half of the molecules, those with the lowest applicability scores, are omitted from the screening. 
CONCLUSION The proposed applicability domain formulations for kernel-based QSAR models can successfully identify compounds for which no reliable predictions can be expected from the model. The resulting reduction of the search space and the elimination of some of the active compounds should not be considered as a drawback, because the results indicate that, in most cases, these omitted ligands would not be found by the model anyway.
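The thresholding idea described in this abstract can be illustrated with a minimal sketch. It is not one of the three formulations proposed by the authors: the score used here (mean Tanimoto similarity to the `k` most similar training compounds) and the names `tanimoto`, `applicability_score`, and `keep_most_applicable` are assumptions chosen for illustration, and compounds are represented as plain sets of feature identifiers rather than structured kernels.

```python
from typing import FrozenSet, List, Sequence

Fingerprint = FrozenSet[int]  # hypothetical representation: set of feature ids


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity between two feature sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def applicability_score(query: Fingerprint,
                        training: Sequence[Fingerprint],
                        k: int = 3) -> float:
    """Assumed score: mean similarity to the k most similar training compounds."""
    if not training:
        raise ValueError("training set must not be empty")
    sims = sorted((tanimoto(query, t) for t in training), reverse=True)
    top = sims[:k]
    return sum(top) / len(top)


def keep_most_applicable(screening: Sequence[Fingerprint],
                         training: Sequence[Fingerprint],
                         fraction: float = 0.5,
                         k: int = 3) -> List[int]:
    """Return indices of the screening compounds with the highest scores.

    With fraction=0.5 the half with the lowest applicability scores is dropped.
    """
    order = sorted(range(len(screening)),
                   key=lambda i: applicability_score(screening[i], training, k),
                   reverse=True)
    n_keep = max(1, int(len(screening) * fraction))
    return sorted(order[:n_keep])


training = [frozenset({1, 2, 3}), frozenset({2, 3, 4})]
screening = [frozenset({1, 2, 3}), frozenset({9, 10}),
             frozenset({2, 3}), frozenset({7, 8, 9})]
kept = keep_most_applicable(screening, training, fraction=0.5, k=2)
```

In this toy example the two screening compounds that share no features with the training set receive a score of zero and are the ones omitted.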
Affiliation(s)
- Nikolas Fechner, Andreas Jahn, Georg Hinselmann, Andreas Zell
- Center for Bioinformatics Tübingen (ZBIT), University of Tübingen, Sand 1, 72076 Tübingen, Germany
|
20
|
Li Q, Wang Y, Bryant SH. A novel method for mining highly imbalanced high-throughput screening data in PubChem. Bioinformatics 2009; 25:3310-6. [PMID: 19825798 PMCID: PMC2788930 DOI: 10.1093/bioinformatics/btp589] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.2] [Indexed: 12/30/2022]
Abstract
Motivation: The comprehensive information on small molecules and their biological activities in PubChem brings great opportunities for academic researchers. However, mining high-throughput screening (HTS) assay data remains a great challenge given the very large data volume and the highly imbalanced nature of the data, with only a small number of active compounds compared to inactive compounds. Therefore, there is currently a need for better strategies to work with HTS assay data. Moreover, as luciferase-based HTS technology is frequently exploited in the assays deposited in PubChem, constructing a computational model to distinguish and filter out potential interference compounds for these assays is another motivation.
Results: We used the granular support vector machines (SVMs) repetitive under sampling method (GSVM-RU) to construct an SVM from luciferase inhibition bioassay data in which the active/inactive imbalance ratio is high (1/377). The best model recognized the active and inactive compounds at accuracies of 86.60% and 88.89%, with a total accuracy of 87.74%, by cross-validation test and blind test. These results demonstrate the robustness of the model in handling the intrinsic imbalance problem in HTS data, and it can be used as a virtual screening tool to identify potential interference compounds in luciferase-based HTS experiments. Additionally, this method has also proved computationally efficient by greatly reducing the computational cost and can be easily adopted in the analysis of HTS data for other biological systems.
Availability: Data are publicly available in PubChem with AIDs of 773, 1006 and 1379.
Contact: ywang@ncbi.nlm.nih.gov; bryant@ncbi.nlm.nih.gov
Supplementary information: Supplementary data are available at Bioinformatics online.
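Why the abstract reports accuracy separately for active and inactive compounds can be seen from a short sketch. It does not implement GSVM-RU; it only computes per-class and overall accuracy for a trivial classifier at the 1/377 imbalance ratio quoted above. The function name `per_class_accuracy` and the label encoding (1 = active, 0 = inactive) are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def per_class_accuracy(y_true: Sequence[int],
                       y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (accuracy on actives, accuracy on inactives, overall accuracy).

    Labels are assumed to be 1 for active and 0 for inactive.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n_active = sum(1 for t in y_true if t == 1)
    n_inactive = len(y_true) - n_active
    if n_active == 0 or n_inactive == 0:
        raise ValueError("both classes must be present")
    hit_active = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    hit_inactive = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return (hit_active / n_active,
            hit_inactive / n_inactive,
            (hit_active + hit_inactive) / len(y_true))


# One active per 377 inactives; the classifier labels everything inactive.
y_true = [1] + [0] * 377
y_pred = [0] * 378
acc_active, acc_inactive, acc_overall = per_class_accuracy(y_true, y_pred)
```

Here the overall accuracy is 377/378 even though no active compound is recognized, which is why a single accuracy figure is uninformative on data this imbalanced.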
Affiliation(s)
- Qingliang Li
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
|
21
|
Peterson YK, Wang XS, Casey PJ, Tropsha A. Discovery of geranylgeranyltransferase-I inhibitors with novel scaffolds by the means of quantitative structure-activity relationship modeling, virtual screening, and experimental validation. J Med Chem 2009; 52:4210-20. [PMID: 19537691 DOI: 10.1021/jm8013772] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Indexed: 11/28/2022]
Abstract
Geranylgeranylation is critical to the function of several proteins including Rho, Rap1, Rac, Cdc42, and G-protein gamma subunits. Geranylgeranyltransferase type I (GGTase-I) inhibitors (GGTIs) have therapeutic potential to treat inflammation, multiple sclerosis, atherosclerosis, and many other diseases. Following our standard workflow, we have developed and rigorously validated quantitative structure-activity relationship (QSAR) models for 48 GGTIs using variable selection k nearest neighbor (kNN), automated lazy learning (ALL), and partial least squares (PLS) methods. The QSAR models were employed for virtual screening of 9.5 million commercially available chemicals, yielding 47 diverse computational hits. Seven of these compounds with novel scaffolds and high predicted GGTase-I inhibitory activities were tested in vitro, and all were found to be bona fide and selective micromolar inhibitors. Notably, these novel hits could not be identified using traditional similarity search. These data demonstrate that rigorously developed QSAR models can serve as reliable virtual screening tools, leading to the discovery of structurally novel bioactive compounds.
Affiliation(s)
- Yuri K Peterson
- Department of Pharmacology, Duke University Medical Center, Durham, North Carolina 27710, USA
|
22
|
Rohrer SG, Baumann K. Maximum unbiased validation (MUV) data sets for virtual screening based on PubChem bioactivity data. J Chem Inf Model 2009; 49:169-84. [PMID: 19434821 DOI: 10.1021/ci8002649] [Citation(s) in RCA: 223] [Impact Index Per Article: 14.9] [Indexed: 11/30/2022]
Abstract
Refined nearest neighbor analysis was recently introduced for the analysis of virtual screening benchmark data sets. It constitutes a technique from the field of spatial statistics and provides a mathematical framework for the nonparametric analysis of mapped point patterns. Here, refined nearest neighbor analysis is used to design benchmark data sets for virtual screening based on PubChem bioactivity data. A workflow is devised that purges data sets of compounds active against pharmaceutically relevant targets from unselective hits. Topological optimization using experimental design strategies monitored by refined nearest neighbor analysis functions is applied to generate corresponding data sets of actives and decoys that are unbiased with regard to analogue bias and artificial enrichment. These data sets provide a tool for Maximum Unbiased Validation (MUV) of virtual screening methods. The data sets and a software package implementing the MUV design workflow are freely available at http://www.pharmchem.tu-bs.de/lehre/baumann/MUV.html.
Affiliation(s)
- Sebastian G Rohrer
- Institute of Pharmaceutical Chemistry, Beethovenstrasse 55, Braunschweig University of Technology, 38106 Braunschweig, Germany
|
23
|
Tang H, Wang XS, Huang XP, Roth BL, Butler KV, Kozikowski AP, Jung M, Tropsha A. Novel Inhibitors of Human Histone Deacetylase (HDAC) Identified by QSAR Modeling of Known Inhibitors, Virtual Screening, and Experimental Validation. J Chem Inf Model 2009; 49:461-76. [DOI: 10.1021/ci800366f] [Citation(s) in RCA: 84] [Impact Index Per Article: 5.6] [Indexed: 12/20/2022]
Affiliation(s)
- Hao Tang, Xiang S. Wang, Xi-Ping Huang, Bryan L. Roth, Kyle V. Butler, Alan P. Kozikowski, Mira Jung, Alexander Tropsha
- Laboratory for Molecular Modeling, and Carolina Exploratory Center for Cheminformatics Research, Division of Medicinal Chemistry and Natural Products, School of Pharmacy, Biophysics Training Program, Department of Pharmacology, School of Medicine, University of North Carolina, Chapel Hill, North Carolina 27599, Department of Medicinal Chemistry and Pharmacognosy, University of Illinois, 833 South Wood Street, Chicago, Illinois 60612, and Department of Radiation Medicine, Georgetown University Medical
|