1
Mahjour BA, Coley CW. RDCanon: A Python Package for Canonicalizing the Order of Tokens in SMARTS Queries. J Chem Inf Model 2024; 64:2948-2954. [PMID: 38488634 DOI: 10.1021/acs.jcim.4c00138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
SMARTS is a widely used language in cheminformatics for defining substructural queries for database lookups, reaction templates for chemical transformations, and other applications. Because SMARTS is an extension of SMILES, many SMARTS patterns can represent the same query. Despite this, no canonicalization algorithm that is invariant to the line notation sequence or atomic numbering is publicly available. Here, we introduce RDCanon, an open-source Python package that can be used to standardize SMARTS queries. RDCanon is designed to ensure that the sequence of atomic queries remains consistent for all graphs representing the same substructure query and to ensure a canonical sequence of primitives within each individual atom query; furthermore, the algorithm can be applied to canonicalize the order of reactants, agents, and products and their atom map numbers in reaction SMARTS templates. As part of its canonicalization algorithm, RDCanon provides a mechanism by which the canonicalized SMARTS is optimized for speed against specific molecular databases. Several case studies are provided to showcase improved efficiency in substructure matching and retrosynthetic analysis.
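The idea of a canonical sequence of primitives inside one atom query can be illustrated with a toy sketch. This is not RDCanon's algorithm or API: the function name `sort_atom_primitives` is hypothetical, and the sketch only reorders primitives joined by `;` (the SMARTS low-precedence AND, which is commutative) using plain string sorting. It ignores atom ordering, atom maps, and recursive SMARTS.

```python
import re


def sort_atom_primitives(smarts: str) -> str:
    """Sort the ';'-separated primitives inside each bracket atom.

    Toy illustration only (hypothetical helper, not part of RDCanon).
    ';' is the SMARTS low-precedence AND, so reordering its operands
    keeps the meaning of the atom query. Atom-map suffixes, recursive
    SMARTS ('$(...)') and the order of the atoms are NOT handled.
    """

    def reorder(match: re.Match) -> str:
        primitives = match.group(1).split(";")
        return "[" + ";".join(sorted(primitives)) + "]"

    # Match bracket atoms that contain no nested brackets.
    return re.sub(r"\[([^\[\]]*)\]", reorder, smarts)


# Two spellings of the same query collapse to one string.
first = sort_atom_primitives("[N;H2;+0]-[C;R]")
second = sort_atom_primitives("[+0;N;H2]-[R;C]")
print(first, second, first == second)
```

A real canonicalizer also has to fix the order of the atoms in the graph; the abstract states that RDCanon additionally tunes the ordering for matching speed against a given database, which plain alphabetical sorting does not attempt.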
Affiliation(s)
- Babak A Mahjour
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Connor W Coley
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
2
Wang L, An Y, Wei X, Huang X, Tu Y, Qiao L, Zhu W. In silico screening combined with bioactivity evaluation to identify AMI-1 as a novel anticancer compound by targeting AXL. J Biomol Struct Dyn 2023:1-13. [PMID: 37691424 DOI: 10.1080/07391102.2023.2255654] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/19/2023] [Accepted: 07/20/2023] [Indexed: 09/12/2023]
Abstract
Recent studies have shown that AXL plays a crucial role in the drug resistance of tumors. At present, there are no AXL inhibitors on the market, and it is essential to discover novel compounds targeting AXL to overcome resistance. In this work, based on the anchor structure, 21,313 compounds were obtained by substructure search from more than 400,000 compounds. Then, Qvina and Ledock were used for virtual screening to obtain 17 compounds. Next, four compounds (ARRY614, AMI-1, NG25, and Butein) were selected for bioactivity evaluation after hydrogen bond and cluster analysis. Further activity evaluation suggested that the compound AMI-1 is a novel AXL inhibitor with an IC50 value of 1.13 μM. In addition, molecular dynamics simulation demonstrated that compound AMI-1 had a lower binding energy and interacted with more key residues than the other three compounds, showing the best inhibitory activity against AXL. Finally, further MM/PBSA prediction showed that AMI-1 is more sensitive to mutant protein 3IKA than wildtype protein 1M17, which means that AMI-1 may help to overcome the resistance conferred by EGFR T790M mutations. In conclusion, this work successfully discovered a novel compound with moderate inhibitory activity against AXL by a drug discovery workflow, which could also be applied to discover active compounds for other targets quickly. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Linxiao Wang
  - Jiangxi Provincial Key Laboratory of Drug Design and Evaluation, School of Pharmacy, Jiangxi Science & Technology Normal University, Nanchang, China
- Yufeng An
  - Jiangxi Provincial Key Laboratory of Drug Design and Evaluation, School of Pharmacy, Jiangxi Science & Technology Normal University, Nanchang, China
- Xiongpiao Wei
  - Jiangxi Provincial Key Laboratory of Drug Design and Evaluation, School of Pharmacy, Jiangxi Science & Technology Normal University, Nanchang, China
- Xiaoling Huang
  - Jiangxi Provincial Key Laboratory of Drug Design and Evaluation, School of Pharmacy, Jiangxi Science & Technology Normal University, Nanchang, China
- Yuanbiao Tu
  - Cancer Research Center, Jiangxi University of Traditional Chinese Medicine, Nanchang, China
- Lukai Qiao
  - Jiangxi Provincial Key Laboratory of Drug Design and Evaluation, School of Pharmacy, Jiangxi Science & Technology Normal University, Nanchang, China
- Wufu Zhu
  - Jiangxi Provincial Key Laboratory of Drug Design and Evaluation, School of Pharmacy, Jiangxi Science & Technology Normal University, Nanchang, China
3
Fang L, Li J, Zhao M, Tan L, Lou JG. Single-step retrosynthesis prediction by leveraging commonly preserved substructures. Nat Commun 2023; 14:2446. [PMID: 37117216 PMCID: PMC10147675 DOI: 10.1038/s41467-023-37969-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2022] [Accepted: 03/31/2023] [Indexed: 04/30/2023] Open
Abstract
Retrosynthesis analysis is an important task in organic chemistry with numerous industrial applications. Previously, machine learning approaches employing natural language processing techniques achieved promising results in this task by first representing reactant molecules as strings and subsequently predicting reactant molecules using text generation or machine translation models. Chemists cannot readily derive useful insights from traditional approaches that rely largely on atom-level decoding in the string representations, because human experts tend to interpret reactions by analyzing substructures that comprise a molecule. It is well-established that some substructures are stable and remain unchanged in reactions. In this paper, we developed a substructure-level decoding model, where commonly preserved portions of product molecules were automatically extracted with a fully data-driven approach. Our model achieves improvement over previously reported models, and we demonstrate that its performance can be boosted further by enhancing the accuracy of these substructures. Analyzing substructures extracted from our machine learning model can provide human experts with additional insights to assist decision-making in retrosynthesis analysis.
Affiliation(s)
- Lei Fang
  - Microsoft Research Asia, No.5 Dan Ling Street, Beijing, China
- Junren Li
  - College of Chemistry and Molecular Engineering, Peking University, No.5 Yiheyuan Road, Beijing, China
- Ming Zhao
  - IPS, Waseda University, 2-7 Hibikino, Wakamatsu-ku, Kitakyushu-shi, Fukuoka, 808-0135, Japan
- Li Tan
  - Mincui Therapeutix, No.1 Yongtaizhuang North Road, Beijing, China
- Jian-Guang Lou
  - Microsoft Research Asia, No.5 Dan Ling Street, Beijing, China
4
Boulaamane Y, Ibrahim MAA, Britel MR, Maurady A. In silico studies of natural product-like caffeine derivatives as potential MAO-B inhibitors/AA2AR antagonists for the treatment of Parkinson's disease. J Integr Bioinform 2022; 19:jib-2021-0027. [PMID: 36112816 PMCID: PMC9800045 DOI: 10.1515/jib-2021-0027] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/28/2021] [Accepted: 06/24/2022] [Indexed: 01/09/2023] Open
Abstract
Parkinson's disease is considered the second most frequent neurodegenerative disease. It is characterized by the loss of dopaminergic neurons in the mid-brain. For many decades, L-DOPA has been considered the gold standard for treating the motor symptoms of Parkinson's disease; however, because its efficacy decreases in the long run, there is an urgent need for novel antiparkinsonian drugs. Caffeine derivatives have been reported several times for their neuroprotective properties and dual blockade of monoamine oxidase (MAO) and adenosine A2A receptors (AA2AR). Natural products are currently attracting more focus due to their structural diversity and safety in contrast to synthetic drugs. In the present work, computational studies were conducted on natural product-like caffeine derivatives to search for novel potent candidates acting as dual MAO-B inhibitors/AA2AR antagonists for Parkinson's disease. Our findings revealed two natural products among the top hits, CNP0202316 and CNP0365210, that fulfill the requirements of drugs acting on the brain. The selected lead compounds were further studied using molecular dynamics simulation to assess their stability with MAO-B. Current findings might shift the interest towards natural-based compounds and could be exploited to further optimize caffeine derivatives into a successful dual-target-directed drug for managing and halting the neuronal damage in Parkinson's disease patients.
Affiliation(s)
- Yassir Boulaamane
  - Laboratory of Innovative Technologies, National School of Applied Sciences of Tangier, Abdelmalek Essaadi University, Tetouan, Morocco
- Mahmoud A. A. Ibrahim
  - Computational Chemistry Laboratory, Chemistry Department, Faculty of Science, Minia University, Minia, 61519, Egypt
- Mohammed Reda Britel
  - Laboratory of Innovative Technologies, National School of Applied Sciences of Tangier, Abdelmalek Essaadi University, Tetouan, Morocco
- Amal Maurady
  - Laboratory of Innovative Technologies, National School of Applied Sciences of Tangier, Abdelmalek Essaadi University, Tetouan, Morocco
  - Faculty of Sciences and Techniques of Tangier, Abdelmalek Essaadi University, Tetouan, Morocco
5
Mihelič J, Čibej U. An experimental evaluation of refinement techniques for the subgraph isomorphism backtracking algorithms. Open Computer Science 2020. [DOI: 10.1515/comp-2020-0149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022] Open
Abstract
In this paper, we study a well-known computationally hard problem called the subgraph isomorphism problem, where the goal, given a pattern graph and a target graph, is to determine whether the pattern is a subgraph of the target graph. Numerous algorithms for solving the problem exist in the literature, and most of them are based on the backtracking approach. Since straightforward backtracking is usually slow, many algorithmic refinement techniques are used in practical algorithms. The main goal of this paper is to study such refinement techniques and to determine their ability to speed up backtracking algorithms. To do this, we use a methodology of experimental algorithmics. We perform an experimental evaluation of the techniques and their combinations and, hence, demonstrate their usefulness in practice.
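The backtracking approach with a refinement step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not one of the algorithms evaluated in the paper: graphs are plain adjacency dictionaries, the function name `is_subgraph` is hypothetical, the match is non-induced (extra target edges are allowed), and the only refinement shown is a simple degree check.

```python
def is_subgraph(pattern: dict, target: dict) -> bool:
    """Return True if `pattern` embeds in `target`.

    Graphs are adjacency dicts: {node: set_of_neighbours}. The mapping
    must be injective and send every pattern edge to a target edge
    (non-induced subgraph isomorphism). Hypothetical sketch only.
    """
    # Heuristic ordering: place high-degree pattern nodes first.
    order = sorted(pattern, key=lambda node: -len(pattern[node]))
    mapping = {}

    def extend(index: int) -> bool:
        if index == len(order):
            return True  # every pattern node has been placed
        u = order[index]
        for v in target:
            if v in mapping.values():
                continue  # injectivity: v is already used
            # Refinement: v needs at least as many neighbours as u.
            if len(target[v]) < len(pattern[u]):
                continue
            # Consistency: mapped neighbours of u must be adjacent to v.
            if all(mapping[w] in target[v] for w in pattern[u] if w in mapping):
                mapping[u] = v
                if extend(index + 1):
                    return True
                del mapping[u]  # backtrack
        return False

    return extend(0)


triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path3 = {0: {1}, 1: {0, 2}, 2: {1}}
path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(is_subgraph(path3, triangle), is_subgraph(triangle, path4))
```

The degree check discards candidates before any recursion happens, which is the general role refinement plays: spend a little work per node to cut whole branches of the search tree.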
Affiliation(s)
- Jurij Mihelič
  - Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, Ljubljana, Slovenia
- Uroš Čibej
  - Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, Ljubljana, Slovenia
6
Kratochvíl M, Vondrášek J, Galgonek J. Sachem: a chemical cartridge for high-performance substructure search. J Cheminform 2018; 10:27. [PMID: 29797000 PMCID: PMC5966370 DOI: 10.1186/s13321-018-0282-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 11/27/2017] [Accepted: 05/16/2018] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND Structure search is one of the valuable capabilities of small-molecule databases. Fingerprint-based screening methods are usually employed to enhance the search performance by reducing the number of calls to the verification procedure. In substructure search, fingerprints are designed to capture important structural aspects of the molecule to aid the decision about whether the molecule contains a given substructure. Currently available cartridges typically provide acceptable search performance for processing user queries, but do not scale satisfactorily with dataset size. RESULTS We present Sachem, a new open-source chemical cartridge that implements two substructure search methods: The first is a performance-oriented reimplementation of substructure indexing based on the OrChem fingerprint, and the second is a novel method that employs newly designed fingerprints stored in inverted indices. We assessed the performance of both methods on small, medium, and large datasets containing 1, 10, and 94 million compounds, respectively. Comparison of Sachem with other freely available cartridges revealed improvements in overall performance, scaling potential and screen-out efficiency. CONCLUSIONS The Sachem cartridge allows efficient substructure searches in databases of all sizes. The sublinear performance scaling of the second method and the ability to efficiently query large amounts of pre-extracted information may together open the door to new applications for substructure searches.
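The screening step described in the abstract, where fingerprints reduce the number of calls to the verification procedure, can be sketched as a bitwise subset test. This is a generic illustration, not Sachem's or OrChem's fingerprint: the 64-bit width, the CRC32 hashing, the string "features", and the names `fingerprint` and `screen` are all assumptions made for the example.

```python
import zlib

N_BITS = 64  # assumed fingerprint width, chosen only for this sketch


def fingerprint(features) -> int:
    """Fold a collection of feature strings into an N_BITS-wide bitmask."""
    bits = 0
    for feature in features:
        bits |= 1 << (zlib.crc32(feature.encode()) % N_BITS)
    return bits


def screen(query_features, database: dict) -> list:
    """Return names of molecules that survive the fingerprint screen.

    A molecule can contain the query only if every query bit is set in
    its fingerprint. Survivors are candidates, not confirmed hits: bit
    collisions cause false positives, so a full substructure
    verification step would still have to run on them.
    """
    query_bits = fingerprint(query_features)
    return [
        name
        for name, features in database.items()
        if (fingerprint(features) & query_bits) == query_bits
    ]


# Hypothetical feature sets standing in for real structural features.
database = {
    "ethanol": {"C-C", "C-O", "O-H"},
    "methanol": {"C-O", "O-H"},
    "ethane": {"C-C"},
}
print(screen({"C-O", "O-H"}, database))
```

The screen never rejects a true match, because a superset of features always produces a superset of bits; its value lies in how many non-matching molecules it rejects cheaply, which is what the abstract calls screen-out efficiency.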
Affiliation(s)
- Miroslav Kratochvíl
  - Institute of Organic Chemistry and Biochemistry of the CAS, Flemingovo náměstí 2, Prague 6, 166 10, Czech Republic
  - Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Malostranské náměstí 25, Prague 1, 118 00, Czech Republic
- Jiří Vondrášek
  - Institute of Organic Chemistry and Biochemistry of the CAS, Flemingovo náměstí 2, Prague 6, 166 10, Czech Republic
- Jakub Galgonek
  - Institute of Organic Chemistry and Biochemistry of the CAS, Flemingovo náměstí 2, Prague 6, 166 10, Czech Republic
7
Krallinger M, Rabal O, Lourenço A, Oyarzabal J, Valencia A. Information Retrieval and Text Mining Technologies for Chemistry. Chem Rev 2017; 117:7673-7761. [PMID: 28475312 DOI: 10.1021/acs.chemrev.6b00851] [Citation(s) in RCA: 111] [Impact Index Per Article: 15.9] [Indexed: 01/03/2023]
Abstract
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
Affiliation(s)
- Martin Krallinger
  - Structural Computational Biology Group, Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre, C/Melchor Fernández Almagro 3, Madrid E-28029, Spain
- Obdulia Rabal
  - Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Avenida Pio XII 55, Pamplona E-31008, Spain
- Anália Lourenço
  - ESEI - Department of Computer Science, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, Ourense E-32004, Spain
  - Centro de Investigaciones Biomédicas (Centro Singular de Investigación de Galicia), Campus Universitario Lagoas-Marcosende, Vigo E-36310, Spain
  - CEB-Centre of Biological Engineering, University of Minho, Campus de Gualtar, Braga 4710-057, Portugal
- Julen Oyarzabal
  - Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Avenida Pio XII 55, Pamplona E-31008, Spain
- Alfonso Valencia
  - Life Science Department, Barcelona Supercomputing Centre (BSC-CNS), C/Jordi Girona, 29-31, Barcelona E-08034, Spain
  - Joint BSC-IRB-CRG Program in Computational Biology, Parc Científic de Barcelona, C/ Baldiri Reixac 10, Barcelona E-08028, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig de Lluís Companys 23, Barcelona E-08010, Spain
8
Pottel J, Moitessier N. Customizable Generation of Synthetically Accessible, Local Chemical Subspaces. J Chem Inf Model 2017; 57:454-467. [DOI: 10.1021/acs.jcim.6b00648] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 12/16/2022]
Affiliation(s)
- Joshua Pottel
  - Department of Chemistry, McGill University, 801 Sherbrooke Street W., Montréal, Québec, Canada H3A 0B8
- Nicolas Moitessier
  - Department of Chemistry, McGill University, 801 Sherbrooke Street W., Montréal, Québec, Canada H3A 0B8
9
García-Sánchez MO, Cruz-Monteagudo M, Medina-Franco JL. Quantitative Structure-Epigenetic Activity Relationships. Challenges and Advances in Computational Chemistry and Physics 2017. [DOI: 10.1007/978-3-319-56850-8_8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 12/14/2022]
10
Weskamp N. Guided Iterative Substructure Search (GI-SSS) - A New Trick for an Old Dog. Mol Inform 2016; 35:286-92. [PMID: 27492243 DOI: 10.1002/minf.201600063] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/27/2016] [Accepted: 06/09/2016] [Indexed: 11/10/2022]
Abstract
Substructure search (SSS) is a fundamental technique supported by various chemical information systems. Many users apply it in an iterative manner: they modify their queries to shape the composition of the retrieved hit sets according to their needs. We propose and evaluate two heuristic extensions of SSS aimed at simplifying these iterative query modifications by collecting additional information during query processing and visualizing this information in an intuitive way. This gives the user convenient feedback on how certain changes to the query would affect the retrieved hit set and reduces the number of trial-and-error cycles needed to generate an optimal search result. The proposed heuristics are simple yet surprisingly effective, and can be easily added to existing SSS implementations.
Affiliation(s)
- Nils Weskamp
  - Boehringer Ingelheim Pharma GmbH & Co. KG, Discovery Research, Lead Identification and Optimization Support, Computational Chemistry, Birkendorfer Straße 65, 88397, Biberach an der Riss, Germany
11
Čibej U, Mihelič J. Improvements to Ullmann's Algorithm for the Subgraph Isomorphism Problem. Int J Pattern Recogn 2015. [DOI: 10.1142/s0218001415500251] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 11/18/2022]
Abstract
The subgraph isomorphism problem is one of the most important problems for pattern recognition in graphs. Its applications are found in many different disciplines, including chemistry, medicine, and social network analysis. Because of the NP-completeness of the problem, the existing exact algorithms exhibit an exponential worst-case running time. In this paper, we propose several improvements to the well-known Ullmann's algorithm for the problem. The improvements lower the time consumption as well as the space requirements of the algorithm. We experimentally demonstrate the efficiency of our improvements by comparing them to another set of improvements called FocusSearch, as well as to other state-of-the-art algorithms, namely VF2 and LAD.
Affiliation(s)
- Uroš Čibej
  - Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, 1000, Ljubljana, Slovenia
- Jurij Mihelič
  - Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, 1000, Ljubljana, Slovenia
12
Bajorath J. Molecular crime scene investigation - dusting for fingerprints. Drug Discovery Today. Technologies 2013; 10:e491-e498. [PMID: 24451639 DOI: 10.1016/j.ddtec.2012.06.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/03/2023]
Abstract
In chemoinformatics and drug design, fingerprints (FPs) are defined as string representations of molecular structure and properties and are popular descriptors for similarity searching. FPs are generally characterized by the simplicity of their design and ease of use. Despite a long history in chemoinformatics, the potential and limitations of FP searching are often not well understood. Standard FPs can also be subjected to engineering techniques to tune them for specific search applications.
13
Chemoinformatic characterization of activity and selectivity switches of antiprotozoal compounds. Future Med Chem 2013; 6:281-94. [PMID: 24279680 DOI: 10.4155/fmc.13.173] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 01/06/2023] Open
Abstract
BACKGROUND Benzimidazole derivatives are promising compounds for the treatment of parasitic infections. The structure-activity relationships of 91 benzimidazoles with activity against Trichomonas vaginalis and Giardia intestinalis were analyzed using a novel activity landscape modeling approach. RESULTS We identified two prominent cases of 'activity switches' and 'selectivity switches' where two R group substitutions in the benzimidazole scaffold completely invert the activity and selectivity pattern for T. vaginalis and G. intestinalis. CONCLUSION A chemoinformatic methodology was used to rapidly identify discrete structural changes around the central scaffold that are associated with large changes in biological activity for each parasite. The structure-activity relationships for the benzimidazole derivatives are smooth for both protozoa, with few but markedly important activity cliffs.
14
Stumpfe D, Hu Y, Dimova D, Bajorath J. Recent progress in understanding activity cliffs and their utility in medicinal chemistry. J Med Chem 2013; 57:18-28. [PMID: 23981118 DOI: 10.1021/jm401120g] [Citation(s) in RCA: 151] [Impact Index Per Article: 13.7] [Indexed: 01/18/2023]
Abstract
The activity cliff concept is of high relevance for medicinal chemistry. Recent studies are discussed that have further refined our understanding of activity cliffs and suggested different ways of exploiting activity cliff information. These include alternative approaches to define and classify activity cliffs in two and three dimensions, data mining investigations to systematically detect all possible activity cliffs, the introduction of computational methods to predict activity cliffs, and studies designed to explore activity cliff progression in medicinal chemistry. The discussion of these studies is complemented with new findings revealing the frequency of activity cliff formation when different molecular representations are used and the distribution of activity cliffs across different targets. Taken together, the results have a number of implications for the practice of medicinal chemistry.
Affiliation(s)
- Dagmar Stumpfe
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstrasse 2, D-53113 Bonn, Germany
15
Schomburg KT, Wetzer L, Rarey M. Interactive design of generic chemical patterns. Drug Discov Today 2013; 18:651-8. [DOI: 10.1016/j.drudis.2013.02.001] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 09/07/2012] [Revised: 12/19/2012] [Accepted: 02/01/2013] [Indexed: 11/17/2022]
16
Fingerprint design and engineering strategies: rationalizing and improving similarity search performance. Future Med Chem 2013; 4:1945-59. [PMID: 23088275 DOI: 10.4155/fmc.12.126] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 01/05/2023] Open
Abstract
Fingerprints (FPs) are bit or integer string representations of molecular structure and properties, and are popular descriptors for chemical similarity searching. A major goal of similarity searching is the identification of novel active compounds on the basis of known reference molecules. In this review, recent FP design and engineering strategies are discussed. New types of FPs continue to be introduced, often applying different design principles. FP engineering techniques have recently been introduced to further improve search performance and computational efficiency and to elucidate mechanisms by which FPs recognize active compounds. In addition, through feature selection and hybridization techniques, standard FPs have been transformed into compound class-specific versions with further increased search performance. Moreover, scaffold hopping mechanisms have been explored. FPs will continue to play an important role in the search for novel active compounds.
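Similarity searching with bit-string fingerprints can be illustrated with a short sketch. It assumes the Tanimoto coefficient as the similarity measure, which the abstract does not name, and represents each fingerprint as a Python integer bitmask; the function names and the toy library are hypothetical.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two bitmask fingerprints.

    Shared set bits divided by bits set in either fingerprint; defined
    here as 0.0 when both fingerprints are empty.
    """
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0
    return bin(a & b).count("1") / union


def rank_by_similarity(reference: int, library: dict) -> list:
    """Return library names ordered from most to least similar."""
    return sorted(
        library,
        key=lambda name: tanimoto(reference, library[name]),
        reverse=True,
    )


# Toy fingerprints (assumed values, not derived from real molecules).
library = {"a": 0b1111, "b": 0b1100, "c": 0b0001}
print(rank_by_similarity(0b1100, library))
```

In a search for novel actives, the reference would be the fingerprint of a known active compound and the top of the ranking would be inspected first.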
17
Vogt M, Bajorath J. Chemoinformatics: A view of the field and current trends in method development. Bioorg Med Chem 2012; 20:5317-23. [DOI: 10.1016/j.bmc.2012.03.030] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Received: 01/23/2012] [Revised: 03/09/2012] [Accepted: 03/12/2012] [Indexed: 12/18/2022]
18
Ehrlich HC, Rarey M. Systematic benchmark of substructure search in molecular graphs - From Ullmann to VF2. J Cheminform 2012; 4:13. [PMID: 22849361 PMCID: PMC3586954 DOI: 10.1186/1758-2946-4-13] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 02/17/2012] [Accepted: 04/27/2012] [Indexed: 11/24/2022] Open
Abstract
Background Searching for substructures in molecules is among the most elementary tasks in cheminformatics and is nowadays part of virtually every cheminformatics software package. The underlying algorithms, used over several decades, are designed for application to general graphs. Little effort has been spent on characterizing their performance when applied to molecular graphs. Therefore, it is not clear how current substructure search algorithms behave on such special graphs. One of the main reasons why such an evaluation was not performed in the past was the absence of appropriate data sets. Results In this paper, we present a systematic evaluation of Ullmann’s and the VF2 subgraph isomorphism algorithms on molecular data. The benchmark set consists of a collection of 1235 SMARTS substructure expressions and selected molecules from the ZINC database. The benchmark evaluates substructure search times for complete database scans as well as individual substructure-molecule pairs. In detail, we focus on the influence of substructure formulation and size, the impact of molecule size, and the ability of both algorithms to be used on multiple cores. Conclusions The results show a clear superiority of the VF2 algorithm in all test scenarios. In general, both algorithms solve most instances in less than one millisecond, which we consider to be acceptable. Still, in direct comparison, VF2 is most often several-fold faster than Ullmann’s algorithm. Additionally, Ullmann’s algorithm shows a surprising number of run time outliers.
Affiliation(s)
- Hans-Christian Ehrlich
  - Center for Bioinformatics, University of Hamburg, Bundestraße 43, 20146 Hamburg, Germany