1
|
Cereto-Massagué A, Ojeda MJ, Valls C, Mulero M, Garcia-Vallvé S, Pujadas G. Molecular fingerprint similarity search in virtual screening. Methods 2014; 71:58-63. [PMID: 25132639 DOI: 10.1016/j.ymeth.2014.08.005] [Citation(s) in RCA: 420] [Impact Index Per Article: 38.2] [Received: 06/30/2014] [Revised: 08/04/2014] [Accepted: 08/08/2014] [Indexed: 11/18/2022] Open
Abstract
Molecular fingerprints have been used for a long time now in drug discovery and virtual screening. Their ease of use (requiring little to no configuration) and the speed at which substructure and similarity searches can be performed with them - paired with a virtual screening performance similar to other more complex methods - is the reason for their popularity. However, there are many types of fingerprints, each representing a different aspect of the molecule, which can greatly affect search performance. This review focuses on commonly used fingerprint algorithms, their usage in virtual screening, and the software packages and online tools that provide these algorithms.
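The similarity search described in this abstract can be illustrated with a minimal sketch. It assumes fingerprints are stored as sets of on-bit indices and uses the Tanimoto coefficient, a common choice for binary fingerprints; the library contents and the names `tanimoto` and `similarity_search` are hypothetical and not taken from the review.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def similarity_search(query, library, top_n=3):
    """Rank library entries by decreasing Tanimoto similarity to the query."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Toy data: made-up bit indices, not real molecular fingerprints.
LIBRARY = {
    "mol_a": {1, 4, 9, 16, 25},
    "mol_b": {1, 4, 9, 30},
    "mol_c": {2, 3, 5, 7},
}
QUERY = {1, 4, 9, 16}

if __name__ == "__main__":
    for name, score in similarity_search(QUERY, LIBRARY):
        print(f"{name}: {score:.2f}")
```

In practice the on-bit sets would come from a fingerprinting toolkit; the ranking step itself stays the same.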
|
Review |
11 |
420 |
2
|
Salentin S, Haupt VJ, Daminelli S, Schroeder M. Polypharmacology rescored: protein-ligand interaction profiles for remote binding site similarity assessment. Prog Biophys Mol Biol 2014; 116:174-86. [PMID: 24923864 DOI: 10.1016/j.pbiomolbio.2014.05.006] [Citation(s) in RCA: 80] [Impact Index Per Article: 7.3] [Received: 01/06/2014] [Revised: 05/20/2014] [Accepted: 05/26/2014] [Indexed: 11/27/2022]
Abstract
Detection of remote binding site similarity in proteins plays an important role for drug repositioning and off-target effect prediction. Various non-covalent interactions such as hydrogen bonds and van-der-Waals forces drive ligands' molecular recognition by binding sites in proteins. The increasing amount of available structures of protein-small molecule complexes enabled the development of comparative approaches. Several methods have been developed to characterize and compare protein-ligand interaction patterns. Usually implemented as fingerprints, these are mainly used for post processing docking scores and (off-)target prediction. In the latter application, interaction profiles detect similarities in the bound interactions of different ligands and thus identify essential interactions between a protein and its small molecule ligands. Interaction pattern similarity correlates with binding site similarity and is thus contributing to a higher precision in binding site similarity assessment of proteins with distinct global structure. This renders it valuable for existing drug repositioning approaches in structural bioinformatics. Current methods to characterize and compare structure-based interaction patterns - both for protein-small-molecule and protein-protein interactions - as well as their potential in target prediction will be reviewed in this article. The question of how the set of interaction types, flexibility or water-mediated interactions, influence the comparison of interaction patterns will be discussed. Due to the wealth of protein-ligand structures available today, predicted targets can be ranked by comparing their ligand interaction pattern to patterns of the known target. Such knowledge-based methods offer high precision in comparison to methods comparing whole binding sites based on shape and amino acid physicochemical similarity.
|
Review |
11 |
80 |
3
|
Probst D, Reymond JL. A probabilistic molecular fingerprint for big data settings. J Cheminform 2018; 10:66. [PMID: 30564943 PMCID: PMC6755601 DOI: 10.1186/s13321-018-0321-8] [Citation(s) in RCA: 68] [Impact Index Per Article: 9.7] [Received: 10/06/2018] [Accepted: 12/13/2018] [Indexed: 11/10/2022] Open
Abstract
Background Among the various molecular fingerprints available to describe small organic molecules, extended connectivity fingerprint, up to four bonds (ECFP4) performs best in benchmarking drug analog recovery studies as it encodes substructures with a high level of detail. Unfortunately, ECFP4 requires high dimensional representations (≥ 1024D) to perform well, resulting in ECFP4 nearest neighbor searches in very large databases such as GDB, PubChem or ZINC to perform very slowly due to the curse of dimensionality. Results Herein we report a new fingerprint, called MinHash fingerprint, up to six bonds (MHFP6), which encodes detailed substructures using the extended connectivity principle of ECFP in a fundamentally different manner, increasing the performance of exact nearest neighbor searches in benchmarking studies and enabling the application of locality sensitive hashing (LSH) approximate nearest neighbor search algorithms. To describe a molecule, MHFP6 extracts the SMILES of all circular substructures around each atom up to a diameter of six bonds and applies the MinHash method to the resulting set. MHFP6 outperforms ECFP4 in benchmarking analog recovery studies. By leveraging locality sensitive hashing, LSH approximate nearest neighbor search methods perform as well on unfolded MHFP6 as comparable methods do on folded ECFP4 fingerprints in terms of speed and relative recovery rate, while operating in very sparse and high-dimensional binary chemical space. Conclusion MHFP6 is a new molecular fingerprint, encoding circular substructures, which outperforms ECFP4 for analog searches while allowing the direct application of locality sensitive hashing algorithms. It should be well suited for the analysis of large databases. 
The source code for MHFP6 is available on GitHub (https://github.com/reymond-group/mhfp). Electronic supplementary material: The online version of this article (10.1186/s13321-018-0321-8) contains supplementary material, which is available to authorized users.
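The MinHash step mentioned in this abstract can be sketched in a few lines. This is not the MHFP6 implementation: it assumes the circular substructures have already been extracted as strings, and it swaps in salted BLAKE2b hashes from the standard library in place of whatever hashing scheme the authors use. The names `minhash` and `estimate_jaccard` and the default of 256 hash functions are illustrative assumptions.

```python
import hashlib


def _salted_hash(token, seed):
    """64-bit hash of a token under one of several salted hash functions."""
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash(tokens, num_hashes=256):
    """MinHash signature: the minimum hash of the set under each hash function."""
    if not tokens:
        raise ValueError("cannot compute a MinHash signature of an empty set")
    return [min(_salted_hash(t, seed) for t in tokens) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions, which estimates the Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


if __name__ == "__main__":
    # Hypothetical substructure strings standing in for extracted SMILES fragments.
    mol_1 = {"c1ccccc1", "CC(=O)O", "CO", "CN"}
    mol_2 = {"c1ccccc1", "CC(=O)O", "CCl"}
    print(estimate_jaccard(minhash(mol_1), minhash(mol_2)))
```

Because two signatures agree at a given position with probability equal to the Jaccard similarity of the underlying sets, the signatures can be compared, or bucketed for locality sensitive hashing, without materializing a high-dimensional bit vector.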
|
Journal Article |
7 |
68 |
4
|
Zhao J, Du X, Cheng N, Chen L, Xue X, Zhao J, Wu L, Cao W. Identification of monofloral honeys using HPLC-ECD and chemometrics. Food Chem 2015; 194:167-74. [PMID: 26471540 DOI: 10.1016/j.foodchem.2015.08.010] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.6] [Received: 03/27/2015] [Revised: 07/31/2015] [Accepted: 08/04/2015] [Indexed: 10/23/2022]
Abstract
A total of 77 jujube, longan and chaste honey samples were collected from 18 different areas of China. Thirteen types of phenolic acids in the honey samples were analysed using high-performance liquid chromatography with electrochemical detection (HPLC-ECD). Moreover, HPLC-ECD fingerprints of the monofloral honey samples were established. From the analysis of the HPLC-ECD fingerprints, common chromatography peak information was obtained, and principal component analysis and discriminant analysis were performed using selected common chromatography peak areas as variables. By comparing with phenolic acids as variables, using a chemometric analysis which is based on the use of common chromatography peaks as variables, 36 honey samples and 41 test samples could be correctly identified according to their floral origin.
|
Research Support, Non-U.S. Gov't |
10 |
56 |
5
|
Rodríguez-Vidal FJ, García-Valverde M, Ortega-Azabache B, González-Martínez Á, Bellido-Fernández A. Characterization of urban and industrial wastewaters using excitation-emission matrix (EEM) fluorescence: Searching for specific fingerprints. J Environ Manage 2020; 263:110396. [PMID: 32174533 DOI: 10.1016/j.jenvman.2020.110396] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 09/30/2019] [Revised: 02/14/2020] [Accepted: 03/03/2020] [Indexed: 06/10/2023]
Abstract
Excitation-emission matrix (EEM) fluorescence spectroscopy has been applied to characterize several urban and industrial wastewaters (effluents from different types of industries: brewery, winery, dairy, biscuit, tinned fish industry, slaughterhouse, pulp mill, textile dyeing and landfill leachates), searching for specific fluorescence fingerprints. Tryptophan protein-like peaks (T1 and T2) are the predominant fluorescence in urban and food industry wastewaters (brewery, winery, dairy/milk, biscuit and fish farm industries) but no special fingerprint has been found to discriminate among them. Protein-like fluorescence also dominates the spectra of meat/fish industries (effluents from a tinned fish industry and a slaughterhouse), but in this case tyrosine protein-like peaks (B1 and B2) also appear in the spectra in addition to tryptophan-like peaks. This fact might constitute a specific feature to differentiate these wastewaters from others, since the appearance of peaks B is quite uncommon in wastewaters. The textile dyeing effluent shows a characteristic triple humic-like fluorescence (peaks A, C1 and C2) that may represent a specific fingerprint for this kind of effluent. Leachates from medium-aged and old landfills might also show a specific fingerprint in their EEM spectra: the sole presence of the humic-like peak C with very high fluorescence intensity. This feature also allows differentiating them from young landfill leachates, which show predominance of protein-like peaks. The fluorescence index (FI) does not seem to be very appropriate to characterize wastewaters and its usefulness might be limited to the study of humic substances in natural waters, although further studies are needed on this topic. However, the humification index (HIX) and the biological index (BIX) do seem to be useful for studying wastewaters, since they have provided consistent results in the present work. 
This study shows the potential of EEM fluorescence to identify the origin of some industrial effluents, although more research is needed to check these preliminary results.
|
|
5 |
54 |
6
|
Awale M, Jin X, Reymond JL. Stereoselective virtual screening of the ZINC database using atom pair 3D-fingerprints. J Cheminform 2015; 7:3. [PMID: 25750664 PMCID: PMC4352573 DOI: 10.1186/s13321-014-0051-5] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Received: 11/20/2014] [Accepted: 12/19/2014] [Indexed: 01/15/2023] Open
Abstract
BACKGROUND Tools to explore large compound databases in search for analogs of query molecules provide a strategically important support in drug discovery to help identify available analogs of any given reference or hit compound by ligand based virtual screening (LBVS). We recently showed that large databases can be formatted for very fast searching with various 2D-fingerprints using the city-block distance as similarity measure, in particular a 2D-atom pair fingerprint (APfp) and the related category extended atom pair fingerprint (Xfp) which efficiently encode molecular shape and pharmacophores, but do not perceive stereochemistry. Here we investigated related 3D-atom pair fingerprints to enable rapid stereoselective searches in the ZINC database (23.2 million 3D structures). RESULTS Molecular fingerprints counting atom pairs at increasing through-space distance intervals were designed using either all atoms (16-bit 3DAPfp) or different atom categories (80-bit 3DXfp). These 3D-fingerprints retrieved molecular shape and pharmacophore analogs (defined by OpenEye ROCS scoring functions) of 110,000 compounds from the Cambridge Structural Database with equal or better accuracy than the 2D-fingerprints APfp and Xfp, and showed comparable performance in recovering actives from decoys in the DUD database. LBVS by 3DXfp or 3DAPfp similarity was stereoselective and gave very different analogs when starting from different diastereomers of the same chiral drug. Results were also different from LBVS with the parent 2D-fingerprints Xfp or APfp. 3D- and 2D-fingerprints also gave very different results in LBVS of folded molecules where through-space distances between atom pairs are much shorter than topological distances. CONCLUSIONS 3DAPfp and 3DXfp are suitable for stereoselective searches for shape and pharmacophore analogs of query molecules in large databases. 
Web-browsers for searching ZINC by 3DAPfp and 3DXfp similarity are accessible at www.gdb.unibe.ch and should provide useful assistance to drug discovery projects. Graphical abstract: Atom pair fingerprints based on through-space distances (3DAPfp) provide better shape encoding than atom pair fingerprints based on topological distances (APfp), as measured by the recovery of ROCS shape analogs by fp similarity.
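The idea of counting atom pairs in through-space distance intervals and comparing the counts by city-block distance can be sketched as follows. This is a toy illustration, not the 3DAPfp or 3DXfp definition: the three bin edges (in Å), the coordinates, and the function names are assumptions made for the example, and no atom categories are used.

```python
import math
from itertools import combinations

# Hypothetical bin edges in angstroms, giving four distance intervals.
BIN_EDGES = (2.0, 4.0, 6.0)


def atom_pair_histogram(coords, bin_edges=BIN_EDGES):
    """Count atom pairs falling into each through-space distance interval."""
    counts = [0] * (len(bin_edges) + 1)
    for a, b in combinations(coords, 2):
        distance = math.dist(a, b)
        # The bin index is the number of edges the distance reaches or exceeds.
        counts[sum(distance >= edge for edge in bin_edges)] += 1
    return counts


def city_block(fp_a, fp_b):
    """City-block (Manhattan) distance between two count fingerprints."""
    return sum(abs(x - y) for x, y in zip(fp_a, fp_b))


# Two conformations of the same four-atom chain: stretched out and folded back.
EXTENDED = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
FOLDED = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 0.0)]

if __name__ == "__main__":
    fp_extended = atom_pair_histogram(EXTENDED)  # [3, 2, 1, 0]
    fp_folded = atom_pair_histogram(FOLDED)      # [4, 2, 0, 0]
    print(city_block(fp_extended, fp_folded))    # 2
```

The two conformations have identical topological distances, so a 2D atom pair count would not separate them; the through-space histogram does, which mirrors the folded-molecule observation in the abstract.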
|
Journal Article |
10 |
45 |
7
|
Castillo-Peinado LS, Luque de Castro MD. Present and foreseeable future of metabolomics in forensic analysis. Anal Chim Acta 2016; 925:1-15. [PMID: 27188312 DOI: 10.1016/j.aca.2016.04.040] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Received: 02/24/2016] [Revised: 04/12/2016] [Accepted: 04/17/2016] [Indexed: 01/24/2023]
Abstract
Recent provocative publications on the precariousness of forensic sciences worldwide have prompted major steps towards improving the field. One of these steps (viz. a greater involvement of metabolomics in the new era of forensic analysis) deserves discussion from different angles. This review therefore critically discusses the characteristics of metabolomics that make it a useful tool in forensic analysis, the aspects in which this omics is so far implicit but not mentioned in forensic analyses, and how typical forensic parameters such as the post-mortem interval or fingerprints benefit from metabolomics. It highlights how the combination of metabolomics and forensics succeeds with both conventional and less common samples. Finally, it critically discusses the pillars that should support future developments involving metabolomics and forensic analysis, and the research required for a fruitful, in-depth involvement of metabolomics in forensic analysis.
|
Review |
9 |
41 |
8
|
Custers D, Canfyn M, Courselle P, De Beer JO, Apers S, Deconinck E. Headspace-gas chromatographic fingerprints to discriminate and classify counterfeit medicines. Talanta 2014; 123:78-88. [PMID: 24725867 DOI: 10.1016/j.talanta.2014.01.020] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 10/31/2013] [Revised: 01/15/2014] [Accepted: 01/20/2014] [Indexed: 10/25/2022]
Abstract
Counterfeit medicines are a global threat to public health. These pharmaceuticals are not subjected to quality control and therefore their safety, quality and efficacy cannot be guaranteed. Today, the safety evaluation of counterfeit medicines is mainly based on the identification and quantification of the active substances present. However, the analysis of potentially toxic secondary components, like residual solvents, is becoming more important. Assessment of residual solvent content and chemometric analysis of fingerprints might be useful in the discrimination between genuine and counterfeit pharmaceuticals. Moreover, the fingerprint approach might also contribute to the evaluation of the health risks that different types of counterfeit medicines pose. In this study a number of genuine and counterfeit Viagra(®) and Cialis(®) samples were analyzed for residual solvent content using headspace-GC-MS. The obtained chromatograms were used as fingerprints and analyzed using different chemometric techniques: Principal Component Analysis (PCA), Projection Pursuit, Classification and Regression Trees (CART) and Soft Independent Modelling of Class Analogy (SIMCA). It was tested whether these techniques can distinguish genuine pharmaceuticals from counterfeit ones and whether distinct types of counterfeits could be differentiated based on health risks. This chemometric analysis showed that for both data sets PCA clearly discriminated between genuine and counterfeit drugs, and SIMCA generated the best predictive models. SIMCA not only resulted in a 100% correct classification rate for the discrimination between genuine and counterfeit medicines, but also classified the counterfeit samples better than CART. This study shows that chemometric analysis of headspace-GC impurity fingerprints makes it possible to distinguish between genuine and counterfeit medicines and to differentiate between groups of counterfeit products based on the public health risks they pose.
|
Journal Article |
11 |
28 |
9
|
Swofford HJ, Koertner AJ, Zemp F, Ausdemore M, Liu A, Salyards MJ. A method for the statistical interpretation of friction ridge skin impression evidence: Method development and validation. Forensic Sci Int 2018; 287:113-126. [PMID: 29655097 DOI: 10.1016/j.forsciint.2018.03.043] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 12/03/2017] [Revised: 03/08/2018] [Accepted: 03/27/2018] [Indexed: 11/16/2022]
Abstract
The forensic fingerprint community has faced increasing amounts of criticism by scientific and legal commentators, challenging the validity and reliability of fingerprint evidence due to the lack of an empirically demonstrable basis to evaluate and report the strength of the evidence in a given case. This paper presents a method, developed as a stand-alone software application, FRStat, which provides a statistical assessment of the strength of fingerprint evidence. The performance was evaluated using a variety of mated and non-mated datasets. The results show strong performance characteristics, often with values supporting specificity rates greater than 99%. This method provides fingerprint experts the capability to demonstrate the validity and reliability of fingerprint evidence in a given case and report the findings in a more transparent and standardized fashion with clearly defined criteria for conclusions and known error rate information thereby responding to concerns raised by the scientific and legal communities.
|
Validation Study |
7 |
26 |
10
|
Chromatographic fingerprinting as a strategy to identify regulated plants in illegal herbal supplements. Talanta 2016; 164:490-502. [PMID: 28107963 DOI: 10.1016/j.talanta.2016.12.008] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 09/16/2016] [Revised: 11/30/2016] [Accepted: 12/04/2016] [Indexed: 11/21/2022]
Abstract
Erectile dysfunction (ED) is a sexual disorder characterized by the inability to achieve or maintain a sufficiently rigid erection. Despite the availability of non-invasive oral treatment options, many patients turn to herbal alternatives. Furthermore, herbal supplements are increasingly gaining popularity in industrialized countries and, as a consequence, quality control is a highly important issue. Unfortunately, this is not a simple task since plants are often crushed and mixed with other plants, which complicates their identification by usage of classical approaches such as microscopy. The aim of this study was to explore the potential use of chromatographic fingerprinting to identify plants present in herbal preparations intended for the treatment of ED. To achieve this goal, a HPLC-PDA and a HPLC-MS method were developed, using a full factorial experimental design in order to acquire characteristic fingerprints of three plants which are potentially beneficial for treating ED: Epimedium spp., Pausinystalia yohimbe and Tribulus terrestris. The full factorial design demonstrated that for all three plant references a C8 column (250mm×4.6mm; 5µm particle size) is best suited; methanol and an ammonium formate buffer (pH 3) were found to be the best constituents for the mobile phase. The suitability of this strategy was demonstrated by analysing several self-made triturations in three different botanical matrices, which mimic the influential effects that could be expected when analysing herbal supplements. To conclude, this study demonstrates that chromatographic fingerprinting could provide a useful means to identify plants in a complex herbal mixture.
|
Journal Article |
9 |
25 |
11
|
Silvis ICJ, Luning PA, Klose N, Jansen M, van Ruth SM. Similarities and differences of the volatile profiles of six spices explored by Proton Transfer Reaction Mass Spectrometry. Food Chem 2018; 271:318-327. [PMID: 30236683 DOI: 10.1016/j.foodchem.2018.07.021] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 03/06/2018] [Revised: 06/29/2018] [Accepted: 07/02/2018] [Indexed: 10/28/2022]
Abstract
Aroma properties of spices are related to the volatile organic compounds (VOCs) present, which can provide distinct analytical signatures. The aim of the study was to examine similarity and diversity of VOC profiles of six common market spices (black/white pepper, chili paprika, cinnamon, nutmeg and saffron). The key volatiles were identified by PTR-TOFMS. Twelve samples per spice were subjected to PTR-Quadrupole MS (PTR-QMS) and Principal Component Analysis to compare the groups and examine diversity. With PTR-TOFMS, 101 volatile compounds were identified as total sum across all samples by mass and comparing them with literature data. Some spices comprised key character aroma compounds, e.g. cinnamaldehyde in cinnamon. For others, VOC groups, such as terpenes, acids and aldehydes topped the list. The PTR-QMS in combination with variables selection resulted in distinct PCA patterns for each spice. Variation within the spice groups was observed, but varied with the kind of spice. The results are valuable for future authentication studies.
|
Journal Article |
7 |
24 |
12
|
Chemometrics and the identification of counterfeit medicines-A review. J Pharm Biomed Anal 2016; 127:112-22. [PMID: 27133184 DOI: 10.1016/j.jpba.2016.04.016] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 12/31/2015] [Revised: 03/31/2016] [Accepted: 04/14/2016] [Indexed: 11/20/2022]
Abstract
This review article provides readers with a number of actual case studies dealing with verifying the authenticity of selected medicines supported by different chemometric approaches. In particular, a general data processing workflow is discussed with the major emphasis on the most frequently selected instrumental techniques to characterize drug samples and the chemometric methods being used to explore and/or model the analytical data. However, further discussion is limited to a situation in which the collected data describes two groups of drug samples - authentic ones and counterfeits.
|
Review |
9 |
23 |
13
|
Abstract
The assessment of small molecule similarity is a central task in chemoinformatics and medicinal chemistry. A variety of molecular representations and metrics are applied to computationally evaluate and quantify molecular similarity. A critically important aspect of molecular similarity analysis in chemoinformatics and pharmaceutical research is that one is typically not interested in quantifying the degree of structural or chemical similarity between compounds per se, but rather in extrapolating from molecular similarity to property similarity. In other words, one assumes that there is a correlation between calculated similarity and specific properties of small molecules including, first and foremost, biological activities. Although similarity is a priori a subjective concept, and difficult to quantify, it must computationally be assessed in a formally consistent manner. Otherwise, there is little utility of similarity calculations. Consistent treatment requires approximations to be made and the consideration of alternative computational similarity concepts, as discussed herein.
|
|
8 |
22 |
14
|
Vieira AR, Manton DJ. On the Variable Clinical Presentation of Molar-Incisor Hypomineralization. Caries Res 2019; 53:482-488. [PMID: 30943522 DOI: 10.1159/000496542] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 10/10/2018] [Accepted: 12/15/2018] [Indexed: 11/19/2022] Open
Abstract
Molar-incisor hypomineralization (MIH) is a condition that is defined based on its peculiar clinical presentation. Original reports on the etiology of the condition and possible risk factors were inconclusive, and we refuted the original suggestion that MIH is an idiopathic condition and suggested that MIH has complex inheritance and is due to the interaction of more than one gene and the environment. Our group was the first to suggest MIH has a genetic component that involves genetic variation in genes expressed during dental enamel formation. Later we expanded this work to include genes related to the immune response. In this report, we provide a rationale to explain the variation seen in the clinical presentation of MIH, which can affect just one molar out of the four or just a portion of a particular molar.
|
Journal Article |
6 |
21 |
15
|
Accurate and efficient target prediction using a potency-sensitive influence-relevance voter. J Cheminform 2015; 7:63. [PMID: 26719774 PMCID: PMC4696267 DOI: 10.1186/s13321-015-0110-6] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 08/11/2015] [Accepted: 12/02/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND A number of algorithms have been proposed to predict the biological targets of diverse molecules. Some are structure-based, but the most common are ligand-based and use chemical fingerprints and the notion of chemical similarity. These methods tend to be computationally faster than others, making them particularly attractive tools as the amount of available data grows. RESULTS Using a ChEMBL-derived database covering 490,760 molecule-protein interactions and 3236 protein targets, we conduct a large-scale assessment of the performance of several target-prediction algorithms at predicting drug-target activity. We assess algorithm performance using three validation procedures: standard tenfold cross-validation, tenfold cross-validation in a simulated screen that includes random inactive molecules, and validation on an external test set composed of molecules not present in our database. CONCLUSIONS We present two improvements over current practice. First, using a modified version of the influence-relevance voter (IRV), we show that using molecule potency data can improve target prediction. Second, we demonstrate that random inactive molecules added during training can boost the accuracy of several algorithms in realistic target-prediction experiments. Our potency-sensitive version of the IRV (PS-IRV) obtains the best results on large test sets in most of the experiments. Models and software are publicly accessible through the chemoinformatics portal at http://chemdb.ics.uci.edu/.
|
Journal Article |
10 |
21 |
16
|
Gilad Y, Nadassy K, Senderowitz H. A reliable computational workflow for the selection of optimal screening libraries. J Cheminform 2015; 7:61. [PMID: 26692904 PMCID: PMC4676138 DOI: 10.1186/s13321-015-0108-0] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 09/06/2015] [Accepted: 11/24/2015] [Indexed: 11/10/2022] Open
Abstract
Background The experimental screening of compound collections is a common starting point in many drug discovery projects. Successes of such screening campaigns critically depend on the quality of the screened library. Many libraries are currently available from different vendors yet the selection of the optimal screening library for a specific project is challenging. We have devised a novel workflow for the rational selection of project-specific screening libraries. Results The workflow accepts as input a set of virtual candidate libraries and applies the following steps to each library: (1) data curation; (2) assessment of ADME/T profile; (3) assessment of the number of promiscuous binders/frequent HTS hitters; (4) assessment of internal diversity; (5) assessment of similarity to known active compound(s) (optional); (6) assessment of similarity to in-house or otherwise accessible compound collections (optional). For ADME/T profiling, Lipinski’s and Veber’s rule-based filters were implemented and a new blood brain barrier permeation model was developed and validated (85 and 74 % success rate for training set and test set, respectively). Diversity and similarity descriptors which demonstrated best performances in terms of their ability to select either diverse or focused sets of compounds from three databases (Drug Bank, CMC and CHEMBL) were identified and used for diversity and similarity assessments. The workflow was used to analyze nine common screening libraries available from six vendors. The results of this analysis are reported for each library providing an assessment of its quality. Furthermore, a consensus approach was developed to combine the results of these analyses into a single score for selecting the optimal library under different scenarios. Conclusions We have devised and tested a new workflow for the rational selection of screening libraries under different scenarios. 
The current workflow was implemented using the Pipeline Pilot software, yet owing to its use of generic components it can be easily adapted and reproduced by computational groups interested in the rational selection of screening libraries. Furthermore, the workflow could be readily modified to include additional components. This workflow has been routinely used in our laboratory for the selection of libraries in multiple projects and consistently selects libraries which are well balanced across multiple parameters.
|
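The rule-based ADME/T step named in this abstract (Lipinski's and Veber's filters) can be sketched as a simple predicate over precomputed descriptors. The thresholds below are the commonly quoted ones (molecular weight ≤ 500, logP ≤ 5, at most 5 hydrogen-bond donors and 10 acceptors; at most 10 rotatable bonds and polar surface area ≤ 140 Å²). Whether the authors used exactly these cut-offs, or tolerated one Lipinski violation as assumed here, is not stated in the abstract. The `Descriptors` class and its example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed properties of one compound (values supplied by the caller)."""
    mol_weight: float      # Da
    logp: float            # octanol-water partition coefficient
    h_donors: int          # hydrogen-bond donors
    h_acceptors: int       # hydrogen-bond acceptors
    rotatable_bonds: int
    tpsa: float            # polar surface area in square angstroms


def lipinski_violations(d):
    """Number of rule-of-five criteria the compound breaks."""
    return sum((d.mol_weight > 500, d.logp > 5, d.h_donors > 5, d.h_acceptors > 10))


def passes_veber(d):
    """Veber criteria: limited flexibility and limited polar surface area."""
    return d.rotatable_bonds <= 10 and d.tpsa <= 140


def passes_filters(d, max_lipinski_violations=1):
    """Assumed policy: tolerate one Lipinski violation, require both Veber criteria."""
    return lipinski_violations(d) <= max_lipinski_violations and passes_veber(d)


if __name__ == "__main__":
    # Made-up descriptor values for two hypothetical compounds.
    library = {
        "small_polar": Descriptors(180.2, 1.2, 1, 4, 3, 63.6),
        "large_greasy": Descriptors(950.0, 7.5, 8, 15, 18, 260.0),
    }
    kept = [name for name, d in library.items() if passes_filters(d)]
    print(kept)
```

Applied to every compound in a candidate library, the fraction that passes gives one simple per-library number of the kind a consensus score could combine with diversity and similarity assessments.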
Journal Article |
10 |
19 |
17
|
Gütlein M, Kramer S. Filtered circular fingerprints improve either prediction or runtime performance while retaining interpretability. J Cheminform 2016; 8:60. [PMID: 27853484 PMCID: PMC5088672 DOI: 10.1186/s13321-016-0173-z] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 05/19/2016] [Accepted: 10/18/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Even though circular fingerprints were first introduced more than 50 years ago, they are still widely used for building highly predictive, state-of-the-art (Q)SAR models. Historically, these structural fragments were designed to search large molecular databases. Hence, to derive a compact representation, circular fingerprint fragments are often folded to comparatively short bit-strings. However, folding fingerprints introduces bit collisions, and therefore adds noise to the encoded structural information and removes its interpretability. Both representations, folded as well as unprocessed fingerprints, are often used for (Q)SAR modeling. RESULTS We show that it can be preferable to build (Q)SAR models with circular fingerprint fragments that have been filtered by supervised feature selection, instead of applying folded or all fragments. Compared to folded fingerprints, filtered fingerprints significantly increase predictive performance and remain unambiguous and interpretable. Compared to unprocessed fingerprints, filtered fingerprints reduce the computational effort and are a more compact and less redundant feature representation. Depending on the selected learning algorithm, filtering yields about equally predictive (Q)SAR models. We demonstrate the suitability of filtered fingerprints for (Q)SAR modeling by presenting our freely available web service Collision-free Filtered Circular Fingerprints, which provides rationales for predictions by highlighting important structural features in the query compound (see http://coffer.informatik.uni-mainz.de). CONCLUSIONS Circular fingerprints are potent structural features that yield highly predictive models and encode interpretable structural information. However, to preserve interpretability, circular fingerprints should not be folded when building prediction models. Our experiments show that filtering is a suitable option to reduce the high computational effort when working with all fingerprint fragments. Additionally, our experiments suggest that the area under the precision-recall curve is a more sensible statistic for validating (Q)SAR models for virtual screening than the area under the ROC curve or other measures for early recognition.
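The bit-collision effect described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the fragment strings, the SHA-256 based `fragment_id` helper, and the modulo folding rule are all assumptions chosen for illustration. Real toolkits use their own fragment hashing.

```python
import hashlib


def fragment_id(fragment: str) -> int:
    """Stable 32-bit id for a fragment string (hypothetical stand-in for a fingerprint hash)."""
    digest = hashlib.sha256(fragment.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def fold(fragments, n_bits):
    """Fold fragments into n_bits positions via id % n_bits.

    Returns a dict mapping each set bit to the list of fragments that landed on it.
    """
    bits = {}
    for frag in fragments:
        bits.setdefault(fragment_id(frag) % n_bits, []).append(frag)
    return bits


def collisions(bits):
    """Return only the bit positions shared by more than one fragment."""
    return {bit: frags for bit, frags in bits.items() if len(frags) > 1}


if __name__ == "__main__":
    # Hypothetical fragment labels; any distinct strings would do.
    fragments = ["c1ccccc1", "C(=O)O", "CN", "CCl", "C=C", "CO", "CS", "C#N", "CF", "CBr"]
    folded = fold(fragments, n_bits=4)
    # With 10 fragments and 4 bits, at least one bit must be shared,
    # so that bit no longer identifies a single substructure.
    for bit, frags in sorted(collisions(folded).items()):
        print(bit, frags)
```

Once two fragments share a bit, a model weight on that bit cannot be attributed to either substructure alone, which is the loss of interpretability the abstract refers to.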
|
research-article |
9 |
17 |
18
|
Sharma N, Patiyal S, Dhall A, Devi NL, Raghava GPS. ChAlPred: A web server for prediction of allergenicity of chemical compounds. Comput Biol Med 2021; 136:104746. [PMID: 34388468 DOI: 10.1016/j.compbiomed.2021.104746] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 05/26/2021] [Revised: 08/04/2021] [Accepted: 08/04/2021] [Indexed: 11/28/2022]
Abstract
BACKGROUND Allergy is an abrupt reaction of the immune system that may occur after exposure to allergens such as proteins, peptides, or chemicals. In the past, various methods have been developed for predicting the allergenicity of proteins and peptides. In contrast, there is no method that can predict the allergenic potential of chemicals. In this paper, we describe ChAlPred, a method developed for predicting chemical allergens as well as for designing chemical analogs with desired allergenicity. METHOD In this study, we used 403 allergenic and 1074 non-allergenic chemical compounds obtained from the IEDB database. The PaDEL software was used to compute the molecular descriptors of the chemical compounds to develop different prediction models. All the models were trained and tested on the 80% training data and evaluated on the 20% validation data using the 2D, 3D and FP descriptors. RESULTS In this study, we developed different prediction models using several machine learning approaches. The Random Forest based model developed using hybrid descriptors performed best, achieving a maximum accuracy of 83.39% and an AUC of 0.93 on the validation dataset. The fingerprint analysis of the dataset indicates that certain chemical fingerprints, including PubChemFP129 and GraphFP1014, are more abundant in allergens. We also predicted the allergenicity potential of FDA-approved drugs using our best model and identified drugs causing allergic symptoms (e.g., Cefuroxime, Spironolactone, Tioconazole). Our results agreed with the allergenicity of these drugs reported in the literature. CONCLUSIONS To aid the research community, we developed a smart-device compatible web server, ChAlPred (https://webs.iiitd.edu.in/raghava/chalpred/), that allows users to predict and design chemicals with allergenic properties.
|
Journal Article |
4 |
16 |
19
|
Vogelsang MD, Palmeri TJ, Busey TA. Holistic processing of fingerprints by expert forensic examiners. Cogn Res Princ Implic 2017; 2:15. [PMID: 28275708 PMCID: PMC5318483 DOI: 10.1186/s41235-017-0051-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 04/23/2016] [Accepted: 01/17/2017] [Indexed: 11/10/2022] Open
Abstract
Holistic processing is often characterized as a process by which objects are perceived as a whole rather than a compilation of individual features. This mechanism may play an important role in the development of perceptual expertise because it allows for rapid integration across image regions. The present work explores whether holistic processing is present in latent fingerprint examiners, who compare fingerprints collected from crime scenes against a set of standards taken from a suspect. We adapted a composite task widely used in the face recognition and perceptual expertise literatures, in which participants were asked to match only a particular half of a fingerprint with a previous image while ignoring the other half. We tested both experts and novices, using both upright and inverted fingerprints. For upright fingerprints, we found weak evidence for holistic processing, but with no differences between experts and novices with respect to holistic processing. For inverted fingerprints, we found stronger evidence of holistic processing, with weak evidence for differences between experts and novices. These relatively weak holistic processing effects contrast with robust evidence for holistic processing with faces and with objects in other domains of perceptual expertise. The data constrain models of holistic processing by demonstrating that latent fingerprint experts and novices may not substantively differ in terms of the amount of holistic processing and that inverted stimuli actually produced more evidence for holistic processing than upright stimuli. Important differences between the present fingerprint stimuli and those in the literature include the lack of verbal labels for experts and the absence of strong vertical asymmetries, both of which might contribute to stronger holistic processing signatures in other stimulus domains.
|
research-article |
8 |
16 |
20
|
Algarra M, Bartolić D, Radotić K, Mutavdžić D, Pino-González MS, Rodríguez-Castellón E, Lázaro-Martínez JM, Guerrero-González JJ, Esteves da Silva JC, Jiménez-Jiménez J. P-doped carbon nano-powders for fingerprint imaging. Talanta 2018; 194:150-157. [PMID: 30609515 DOI: 10.1016/j.talanta.2018.10.033] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 06/07/2018] [Revised: 10/01/2018] [Accepted: 10/08/2018] [Indexed: 01/09/2023]
Abstract
A simple, fast, and laboratory-efficient synthesis of P-doped carbon nanoparticles is developed for fingerprint imaging, using 1,3-dihydroxyacetone and diphosphorus pentoxide. Fluorescent nanoparticles with an average size of 230 nm were obtained without additional energy input or external heating. ATR, solid-state NMR, XPS and fluorescence spectroscopy revealed their surface functionalization; a reaction mechanism is proposed. Fluorescence measurements exhibited a maximum emission band at ca. 495 nm when excited at 385 nm. The images obtained on different surfaces treated with the nano-powders, such as a mobile telephone screen, the magnetic band and metallic surface of a credit card, and a Euro banknote, allow positive matches to be recorded, confirming the effectiveness of the proposed method.
|
Journal Article |
7 |
15 |
21
|
Zhao H, Lai CJS, Yu Y, Wang YN, Zhao YJ, Ma F, Hu M, Guo J, Wang X, Guo L. Acidic hydrolysate fingerprints based on HILIC-ELSD/MS combined with multivariate analysis for investigating the quality of Ganoderma lucidum polysaccharides. Int J Biol Macromol 2020; 163:476-484. [PMID: 32593759 DOI: 10.1016/j.ijbiomac.2020.06.206] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 01/08/2020] [Revised: 06/04/2020] [Accepted: 06/22/2020] [Indexed: 02/06/2023]
Abstract
In this preliminary study, acidic hydrolysate fingerprints of polysaccharides based on hydrophilic-interaction chromatography-evaporative light scattering detection-electrospray time-of-flight mass spectrometry (HILIC-ELSD/ESI-TOF/MS), combined with multivariate statistical analysis, were developed and applied to investigate the quality of Ganoderma lucidum from different regions. Projection-to-latent-structure discrimination analysis (PLS-DA) could distinguish samples from Zhejiang from those of other regions. Orthogonal-projection-to-latent-structure discrimination analysis (OPLS-DA) provided clear discrimination between G. lucidum samples cultivated in Zhejiang and those from other regions, in which polysaccharides and D-galactose could be considered candidate biomarkers. In addition, the intraspecific differentiation of G. lucidum was preliminarily investigated with samples from the Shaanxi region. They were classified into four groups by PCA and PLS-DA, in which L-rhamnose, D-xylose, L-arabinose, and mannose were considered potential chemical markers. These preliminary results contribute to our understanding of the variance of polysaccharides in Ganoderma spp. from different geographic origins and of the intraspecific differentiation within the same region, and suggest great potential for the quality control of Ganoderma spp.
|
Journal Article |
5 |
15 |
22
|
Johannessen H, Gill P, Roseth A, Fonneløp AE. Determination of shedder status: A comparison of two methods involving cell counting in fingerprints and the DNA analysis of handheld tubes. Forensic Sci Int Genet 2021; 53:102541. [PMID: 34090062 DOI: 10.1016/j.fsigen.2021.102541] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 04/07/2021] [Revised: 05/25/2021] [Accepted: 05/26/2021] [Indexed: 01/13/2023]
Abstract
The shedder status of an individual may be important to consider in the context of DNA transfer, persistence and recovery, and in Bayesian networks where a person's shedder status may have an impact on the outcome. In this study we compared two methods to determine shedder status: the handheld tube (HH) method and a fluorescent cell count (CC) method. A poor association was observed between the number of detected cells in a fingerprint using the CC method and the strength of the DNA result with the HH method. The 20 participants were classified into low (25%), medium (50%) and high (25%) shedders based on the HH method. While the low and high shedders showed good consistency between replicates, the medium shedders varied more and have to be considered more carefully, as they may act as either a high or a low shedder in the event of DNA transfer.
|
Research Support, Non-U.S. Gov't |
4 |
15 |
23
|
Kristensen TG, Nielsen J, Pedersen CNS. Methods for Similarity-based Virtual Screening. Comput Struct Biotechnol J 2013; 5:e201302009. [PMID: 24688702 PMCID: PMC3962175 DOI: 10.5936/csbj.201302009] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 10/24/2012] [Revised: 01/30/2013] [Accepted: 02/08/2013] [Indexed: 11/22/2022] Open
Abstract
Developing new medical drugs is expensive. Among the first steps is a screening process, in which molecules in existing chemical libraries are tested for activity against a given target. This requires a lot of resources and manpower. Therefore, it has become common to perform virtual screening, where computers are used to predict the activity of very large libraries of molecules and to identify the most promising leads for further laboratory experiments. Since computer simulations generally require fewer resources than physical experimentation, this can lower the cost of medical and biological research significantly. In this paper we review algorithms that are fast in practice for screening databases of molecules in order to find molecules that are sufficiently similar to a query molecule.
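As a rough illustration of what "sufficiently similar to a query molecule" can mean in practice, the sketch below runs a brute-force threshold search over fingerprints stored as sets of on-bit indices. The Tanimoto coefficient is assumed here as the similarity measure because it is a common choice for fingerprints; the abstract itself does not name one. The molecule names and bit sets are hypothetical, and the algorithms reviewed in the paper are intended to be faster than this linear scan.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient |a & b| / |a | b| for two sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_search(query: set, database: dict, threshold: float):
    """Return (name, score) pairs with score >= threshold, best match first."""
    scored = ((name, tanimoto(query, fp)) for name, fp in database.items())
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical fingerprints: each set lists the indices of the bits that are on.
    database = {
        "mol_a": {1, 2, 3, 4},
        "mol_b": {2, 3, 4, 5},
        "mol_c": {7, 8, 9},
    }
    query = {1, 2, 3, 4}
    for name, score in similarity_search(query, database, threshold=0.5):
        print(name, round(score, 2))
```

The threshold trades recall for precision: a lower value returns more candidates for follow-up, at the cost of more dissimilar molecules among the hits.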
|
Review |
12 |
14 |
24
|
Awale M, Reymond JL. Web-based 3D-visualization of the DrugBank chemical space. J Cheminform 2016; 8:25. [PMID: 27148409 PMCID: PMC4855437 DOI: 10.1186/s13321-016-0138-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 02/18/2016] [Accepted: 04/27/2016] [Indexed: 12/14/2022] Open
Abstract
Background Similarly to the periodic table for elements, chemical space offers an organizing principle for representing the diversity of organic molecules, usually in the form of multi-dimensional property spaces that are subjected to dimensionality reduction methods to obtain 3D-spaces or 2D-maps suitable for visual inspection. Unfortunately, tools to look at chemical space on the internet are currently very limited. Results Herein we present webDrugCS, a web application freely available at www.gdb.unibe.ch to visualize DrugBank (www.drugbank.ca, containing over 6000 investigational and approved drugs) in five different property spaces. WebDrugCS displays 3D-clouds of color-coded grid points representing molecules, whose structural formula is displayed on mouse over with an option to link to the corresponding molecule page at the DrugBank website. The 3D-clouds are obtained by principal component analysis of high dimensional property spaces describing constitution and topology (42D molecular quantum numbers MQN), structural features (34D SMILES fingerprint SMIfp), molecular shape (20D atom pair fingerprint APfp), pharmacophores (55D atom category extended atom pair fingerprint Xfp) and substructures (1024D binary substructure fingerprint Sfp). User defined molecules can be uploaded as SMILES lists and displayed together with DrugBank. In contrast to 2D-maps where many compounds fold onto each other, these 3D-spaces have a comparable resolution to their parent high-dimensional chemical space. Conclusion To the best of our knowledge webDrugCS is the first publicly available web tool for interactive visualization and exploration of the DrugBank chemical space in 3D. WebDrugCS works on computers, tablets and phones, and facilitates the visual exploration of DrugBank to rapidly learn about the structural diversity of small molecule drugs. |