1
Whitehouse AJ, Sanchez-Martinez M, Salehi SM, Kurbatova N, Dean E. Open-Source Approach to GPU-Accelerated Substructure Search. J Chem Inf Model 2024. [PMID: 39225069 DOI: 10.1021/acs.jcim.4c00679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/04/2024]
Abstract
Chemical substructure search is a critical task in medicinal chemistry and small-molecule drug discovery, enabling the retrieval of molecules from databases based on specific chemical features. While systems exist for this purpose, the challenge of efficient and swift searching persists, particularly as data storage migrates to the cloud, introducing new complexities. This study provides a comprehensive analysis of chemical substructure searches, showcasing the benefits of graphics processing unit-accelerated fingerprint screening. The research highlights strategies for optimizing performance, making significant advancements in substructure searching, a pivotal aspect of drug discovery and molecular research. The accessible and scalable nature of the proposed approach makes it a valuable resource for scientists aiming to enhance their substructure search capabilities.
Affiliation(s)
- Andrew J Whitehouse
- Zifo Technologies Ltd, Office 7, 37-39 Shakespeare Street, Southport, Merseyside PR8 5AB, U.K.
- Seyedeh Maryam Salehi
- Zifo Technologies Ltd, Office 7, 37-39 Shakespeare Street, Southport, Merseyside PR8 5AB, U.K.
- Natalja Kurbatova
- Zifo Technologies Ltd, Office 7, 37-39 Shakespeare Street, Southport, Merseyside PR8 5AB, U.K.
- Euan Dean
- Zifo Technologies Ltd, Office 7, 37-39 Shakespeare Street, Southport, Merseyside PR8 5AB, U.K.
2
Mahjour BA, Coley CW. RDCanon: A Python Package for Canonicalizing the Order of Tokens in SMARTS Queries. J Chem Inf Model 2024; 64:2948-2954. [PMID: 38488634 DOI: 10.1021/acs.jcim.4c00138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
SMARTS is a widely used language in cheminformatics for defining substructural queries for database lookups, reaction templates for chemical transformations, and other applications. As an extension to SMILES, many SMARTS patterns can represent the same query. Despite this, no canonicalization algorithm invariant of the line notation sequence or atomic numbering is publicly available. Here, we introduce RDCanon, an open-source Python package that can be used to standardize SMARTS queries. RDCanon is designed to ensure that the sequence of atomic queries remains consistent for all graphs representing the same substructure query and to ensure a canonical sequence of primitives within each individual atom query; furthermore, the algorithm can be applied to canonicalize the order of reactants, agents, and products and their atom map numbers in reaction SMARTS templates. As part of its canonicalization algorithm, RDCanon provides a mechanism in which the canonicalized SMARTS is optimized for speed against specific molecular databases. Several case studies are provided to showcase improved efficiency in substructure matching and retrosynthetic analysis.
Affiliation(s)
- Babak A Mahjour
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
3
Rieder SR, Oliveira MP, Riniker S, Hünenberger PH. Development of an open-source software for isomer enumeration. J Cheminform 2023; 15:10. [PMID: 36683047 PMCID: PMC9867865 DOI: 10.1186/s13321-022-00677-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Accepted: 12/28/2022] [Indexed: 01/23/2023] [Open Access]
Abstract
This article documents enu, a freely-downloadable, open-source and stand-alone program written in C++ for the enumeration of the constitutional isomers and stereoisomers of a molecular formula. The program relies on graph theory to enumerate all the constitutional isomers of a given formula on the basis of their canonical adjacency matrix. The stereoisomers of a given constitutional isomer are enumerated as well, on the basis of the automorphism group of this matrix. The isomer list is then reported in the form of canonical SMILES strings within files in XML format. The specification of the molecule family of interest is very flexible and the code is optimized for computational efficiency. The algorithms and implementations underlying enu are described, and simple illustrative applications are presented. The enu code is freely available on GitHub at https://github.com/csms-ethz/CombiFF .
Affiliation(s)
- Salomé R. Rieder
- Laboratorium für Physikalische Chemie, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Marina P. Oliveira
- Laboratorium für Physikalische Chemie, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Sereina Riniker
- Laboratorium für Physikalische Chemie, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Philippe H. Hünenberger
- Laboratorium für Physikalische Chemie, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
4
Luaces D, Viqueira JR, Cotos JM, Flores JC. Efficient access methods for very large distributed graph databases. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.05.047] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/17/2022]
5
Design of a small molecule that stimulates vascular endothelial growth factor A enabled by screening RNA fold-small molecule interactions. Nat Chem 2020; 12:952-961. [PMID: 32839603 PMCID: PMC7571259 DOI: 10.1038/s41557-020-0514-4] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 04/01/2019] [Accepted: 06/24/2020] [Indexed: 12/20/2022]
Abstract
Vascular endothelial growth factor A (VEGFA) stimulates angiogenesis in human endothelial cells, and increasing its expression is a potential treatment for heart failure. Here, we report the design of a small molecule (TGP-377) that specifically and potently enhances VEGFA expression by the targeting of a non-coding microRNA that regulates its expression. A selection-based screen, named two-dimensional combinatorial screening, revealed preferences in small-molecule chemotypes that bind RNA and preferences in the RNA motifs that bind small molecules. The screening program increased the dataset of known RNA motif–small molecule binding partners by 20-fold. Analysis of this dataset against the RNA-mediated pathways that regulate VEGFA defined that the microRNA-377 precursor, which represses Vegfa messenger RNA translation, is druggable in a selective manner. We designed TGP-377 to potently and specifically upregulate VEGFA in human umbilical vein endothelial cells. These studies illustrate the power of two-dimensional combinatorial screening to define molecular recognition events between ‘undruggable’ biomolecules and small molecules, and the ability of sequence-based design to deliver efficacious structure-specific compounds.
6
Ehmki ESR, Schmidt R, Ohm F, Rarey M. Comparing Molecular Patterns Using the Example of SMARTS: Applications and Filter Collection Analysis. J Chem Inf Model 2019; 59:2572-2586. [DOI: 10.1021/acs.jcim.9b00249] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022]
Affiliation(s)
- Robert Schmidt
- ZBH - Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany
- Farina Ohm
- ZBH - Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany
- Matthias Rarey
- ZBH - Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany
7
Schmidt R, Ehmki ESR, Ohm F, Ehrlich HC, Mashychev A, Rarey M. Comparing Molecular Patterns Using the Example of SMARTS: Theory and Algorithms. J Chem Inf Model 2019; 59:2560-2571. [DOI: 10.1021/acs.jcim.9b00250] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 11/28/2022]
Affiliation(s)
- Robert Schmidt
- ZBH - Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany
- Farina Ohm
- ZBH - Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany
- Andriy Mashychev
- ZBH - Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany
- Matthias Rarey
- ZBH - Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany
8
Shahare HV, Talele GS. Designing of benzothiazole derivatives as promising EGFR tyrosine kinase inhibitors: a pharmacoinformatics study. J Biomol Struct Dyn 2019; 38:1365-1374. [DOI: 10.1080/07391102.2019.1604264] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/20/2022]
Affiliation(s)
- Hitesh V. Shahare
- Department of Chemistry, SNJBs Shriman Sureshdada Jain College of Pharmacy, Chandwad, Nasik, Maharashtra, India
- Gokul S. Talele
- Department of Chemistry, SNJBs Shriman Sureshdada Jain College of Pharmacy, Chandwad, Nasik, Maharashtra, India
- NGSPM College of Pharmacy, Brahmavalley Educational Campus, Anjaneri, Nashik, Maharashtra, India
9
Zanette C, Bannan CC, Bayly CI, Fass J, Gilson MK, Shirts MR, Chodera JD, Mobley DL. Toward Learned Chemical Perception of Force Field Typing Rules. J Chem Theory Comput 2019; 15:402-423. [PMID: 30512951 PMCID: PMC6467725 DOI: 10.1021/acs.jctc.8b00821] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Indexed: 11/29/2022]
Abstract
Molecular mechanics force fields define how the energy and forces in a molecular system are computed from its atomic positions, thus enabling the study of such systems through computational methods like molecular dynamics and Monte Carlo simulations. Despite progress toward automated force field parametrization, considerable human expertise is required to develop or extend force fields. In particular, human input has long been required to define atom types, which encode chemically unique environments that determine which parameters will be assigned. However, relying on humans to establish atom types is suboptimal. Human-created atom types are often developed without statistical justification, leading to over- or under-fitting of data. Human-created types are also difficult to extend in a systematic and consistent manner when new chemistries must be modeled or new data becomes available. Finally, human effort is not scalable when force fields must be generated for new (bio)polymers, compound classes, or materials. To remedy these deficiencies, our long-term goal is to replace human specification of atom types with an automated approach, based on rigorous statistics and driven by experimental and/or quantum chemical reference data. In this work, we describe novel methods that automate the discovery of appropriate chemical perception: SMARTY allows for the creation of atom types, while SMIRKY goes further by automating the creation of fragment (nonbonded, bonds, angles, and torsions) types. These approaches enable the creation of move sets in atom or fragment type space, which are used within a Monte Carlo optimization approach. We demonstrate the power of these new methods by automating the rediscovery of human defined atom types (SMARTY) or fragment types (SMIRKY) in existing small molecule force fields. We assess these approaches using several molecular data sets, including one which covers a diverse subset of the DrugBank database.
Affiliation(s)
- Camila Zanette
- Department of Pharmaceutical Sciences, University of California, Irvine
- Josh Fass
- Tri-Institutional Training Program in Computational Biology and Medicine, New York, NY 10065
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065
- Michael K. Gilson
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego
- Michael R. Shirts
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, CO 80309
- John D. Chodera
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065
- David L. Mobley
- Department of Pharmaceutical Sciences, University of California, Irvine
- Department of Chemistry, University of California, Irvine
10
Chang HJ, Fischer T, Petit M, Zambelli M, Demiris Y. Learning Kinematic Structure Correspondences Using Multi-Order Similarities. IEEE Trans Pattern Anal Mach Intell 2018; 40:2920-2934. [PMID: 29989982 DOI: 10.1109/tpami.2017.2777486] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/08/2023]
Abstract
In this paper, we present a novel framework for finding the kinematic structure correspondences between two articulated objects in videos via hypergraph matching. In contrast to appearance and graph alignment based matching methods, which have been applied among two similar static images, the proposed method finds correspondences between two dynamic kinematic structures of heterogeneous objects in videos. Thus our method allows matching the structure of objects which have similar topologies or motions, or a combination of the two. Our main contributions can be summarised as follows: (i) casting the kinematic structure correspondence problem into a hypergraph matching problem by incorporating multi-order similarities with normalising weights, (ii) introducing a structural topology similarity measure by aggregating topology constrained subgraph isomorphisms, (iii) measuring kinematic correlations between pairwise nodes, and (iv) proposing a combinatorial local motion similarity measure using geodesic distance on the Riemannian manifold. We demonstrate the robustness and accuracy of our method through a number of experiments on synthetic and real data, outperforming various other state of the art methods. Our method is not limited to a specific application nor sensor, and can be used as building block in applications such as action recognition, human motion retargeting to robots, and articulated object manipulation amongst others.
11
Hähnke VD, Kim S, Bolton EE. PubChem chemical structure standardization. J Cheminform 2018; 10:36. [PMID: 30097821 PMCID: PMC6086778 DOI: 10.1186/s13321-018-0293-8] [Citation(s) in RCA: 62] [Impact Index Per Article: 10.3] [Received: 04/18/2018] [Accepted: 08/01/2018] [Indexed: 11/15/2022] [Open Access]
Abstract
BACKGROUND PubChem is a chemical information repository, consisting of three primary databases: Substance, Compound, and BioAssay. When individual data contributors submit chemical substance descriptions to Substance, the unique chemical structures are extracted and stored into Compound through an automated process called structure standardization. The present study describes the PubChem standardization approaches and analyzes them for their success rates, reasons that cause structures to be rejected, and modifications applied to structures during the standardization process. Furthermore, the PubChem standardization is compared to the structure normalization of the IUPAC International Chemical Identifier (InChI) software, as manifested by conversion of the InChI back into a chemical structure. RESULTS The observed rejection rate for substances processed by PubChem standardization was 0.36%, which is predominantly attributed to structures with invalid atom valences that cannot be readily corrected without additional information from contributors. Of all structures that pass standardization, 44% are modified in the process, reducing the count of unique structures from 53,574,724 in substance to 45,808,881 in compound as identified by de-aromatized canonical isomeric SMILES. Even though the processing time is very low on average (only 0.4% of structures have individual standardization time above 0.1 s), total standardization time is completely dominated by edge cases: 90% of the time to standardize all structures in PubChem substance is spent on the 2.05% of structures with the highest individual standardization time. It is worth noting that 60% of the structures obtained from PubChem structure standardization are not identical to the chemical structure resulting from the InChI (primarily due to preferences for a different tautomeric form). 
CONCLUSIONS Standardization of chemical structures is complicated by the diversity of chemical information and their representations approaches. The PubChem standardization is an effective and efficient tool to account for molecular diversity and to eliminate invalid/incomplete structures. Further development will concentrate on improved tautomer consideration and an expanded stereocenter definition. Modifications are difficult to thoroughly validate, with slight changes often affecting many thousands of structures and various edge cases. The PubChem structure standardization service is accessible as a public resource ( https://pubchem.ncbi.nlm.nih.gov/standardize ), and via programmatic interfaces.
Affiliation(s)
- Volker D. Hähnke
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, 8600 Rockville Pike, Bethesda, MD 20894 USA
- Present Address: European Patent Office, Patentlaan 2, 2288 EE Rijswijk, The Netherlands
- Sunghwan Kim
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, 8600 Rockville Pike, Bethesda, MD 20894 USA
- Evan E. Bolton
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, 8600 Rockville Pike, Bethesda, MD 20894 USA
12
Kratochvíl M, Vondrášek J, Galgonek J. Sachem: a chemical cartridge for high-performance substructure search. J Cheminform 2018; 10:27. [PMID: 29797000 PMCID: PMC5966370 DOI: 10.1186/s13321-018-0282-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 11/27/2017] [Accepted: 05/16/2018] [Indexed: 12/19/2022] [Open Access]
Abstract
BACKGROUND Structure search is one of the valuable capabilities of small-molecule databases. Fingerprint-based screening methods are usually employed to enhance the search performance by reducing the number of calls to the verification procedure. In substructure search, fingerprints are designed to capture important structural aspects of the molecule to aid the decision about whether the molecule contains a given substructure. Currently available cartridges typically provide acceptable search performance for processing user queries, but do not scale satisfactorily with dataset size. RESULTS We present Sachem, a new open-source chemical cartridge that implements two substructure search methods: The first is a performance-oriented reimplementation of substructure indexing based on the OrChem fingerprint, and the second is a novel method that employs newly designed fingerprints stored in inverted indices. We assessed the performance of both methods on small, medium, and large datasets containing 1, 10, and 94 million compounds, respectively. Comparison of Sachem with other freely available cartridges revealed improvements in overall performance, scaling potential and screen-out efficiency. CONCLUSIONS The Sachem cartridge allows efficient substructure searches in databases of all sizes. The sublinear performance scaling of the second method and the ability to efficiently query large amounts of pre-extracted information may together open the door to new applications for substructure searches.
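The screening idea this abstract describes can be sketched in a few lines. This is an illustration only, not Sachem's implementation: the feature-to-bit table, the `fingerprint` and `screened_search` helpers, the toy database, and the caller-supplied `verify` function are all assumptions made for the example. The point it shows is that a molecule can contain the query only if its fingerprint has every bit the query's fingerprint sets, so the costly verification runs only on the survivors.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical feature-to-bit table; real fingerprints encode far richer patterns.
FEATURE_BITS = {"C": 1 << 0, "N": 1 << 1, "O": 1 << 2, "S": 1 << 3, "ring": 1 << 4}


def fingerprint(features: Iterable[str]) -> int:
    """Fold a collection of feature names into an integer bitmask."""
    fp = 0
    for feature in features:
        fp |= FEATURE_BITS[feature]
    return fp


def passes_screen(query_fp: int, mol_fp: int) -> bool:
    """A molecule can match only if it has every bit that the query sets."""
    return (query_fp & mol_fp) == query_fp


def screened_search(
    query_fp: int,
    database: Dict[str, int],
    verify: Callable[[str], bool],
) -> Tuple[List[str], int]:
    """Return matching molecule ids and the number of verification calls made."""
    hits: List[str] = []
    verify_calls = 0
    for mol_id, mol_fp in database.items():
        if not passes_screen(query_fp, mol_fp):
            continue  # screened out without running the expensive check
        verify_calls += 1
        if verify(mol_id):  # stand-in for a real subgraph-isomorphism test
            hits.append(mol_id)
    return hits, verify_calls


# Toy database: molecule name -> fingerprint (assumed data for illustration).
DATABASE = {
    "ethanol": fingerprint(["C", "O"]),
    "pyridine": fingerprint(["C", "N", "ring"]),
    "thiophene": fingerprint(["C", "S", "ring"]),
    "morpholine": fingerprint(["C", "N", "O", "ring"]),
}
```

With a query fingerprint built from `["N", "ring"]`, only pyridine and morpholine pass the screen, so the verifier is called twice instead of four times. The screen can produce false positives (morpholine passes but may still fail verification) but never false negatives, which is why a verification step is still required.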
Affiliation(s)
- Miroslav Kratochvíl
- Institute of Organic Chemistry and Biochemistry of the CAS, Flemingovo náměstí 2, Prague 6, 166 10, Czech Republic
- Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Malostranské náměstí 25, Prague 1, 118 00, Czech Republic
- Jiří Vondrášek
- Institute of Organic Chemistry and Biochemistry of the CAS, Flemingovo náměstí 2, Prague 6, 166 10, Czech Republic
- Jakub Galgonek
- Institute of Organic Chemistry and Biochemistry of the CAS, Flemingovo náměstí 2, Prague 6, 166 10, Czech Republic
13
Inhester T, Nittinger E, Sommer K, Schmidt P, Bietz S, Rarey M. NAOMInova: Interactive Geometric Analysis of Noncovalent Interactions in Macromolecular Structures. J Chem Inf Model 2017; 57:2132-2142. [DOI: 10.1021/acs.jcim.7b00291] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/29/2022]
Affiliation(s)
- Therese Inhester
- ZBH - Center for Bioinformatics, Universität Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany
- Eva Nittinger
- ZBH - Center for Bioinformatics, Universität Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany
- Kai Sommer
- ZBH - Center for Bioinformatics, Universität Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany
- Pascal Schmidt
- ZBH - Center for Bioinformatics, Universität Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany
- Stefan Bietz
- ZBH - Center for Bioinformatics, Universität Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany
- Matthias Rarey
- ZBH - Center for Bioinformatics, Universität Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany
14
Nastasi G, Miceli C, Pittalà V, Modica MN, Prezzavento O, Romeo G, Rescifina A, Marrazzo A, Amata E. S2RSLDB: a comprehensive manually curated, internet-accessible database of the sigma-2 receptor selective ligands. J Cheminform 2017; 9:3. [PMID: 28123452 PMCID: PMC5250622 DOI: 10.1186/s13321-017-0191-5] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 06/06/2016] [Accepted: 01/16/2017] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND Sigma (σ) receptors are accepted as a particular receptor class consisting of two subtypes: sigma-1 (σ1) and sigma-2 (σ2). The two receptor subtypes have specific drug actions, pharmacological profiles and molecular characteristics. The σ2 receptor is overexpressed in several tumor cell lines, and its ligands are currently under investigation for their role in tumor diagnosis and treatment. The σ2 receptor structure has not been disclosed, and researchers rely on σ2 receptor radioligand binding assay to understand the receptor’s pharmacological behavior and design new lead compounds.
DESCRIPTION Here we present the sigma-2 Receptor Selective Ligands Database (S2RSLDB) a manually curated database of the σ2 receptor selective ligands containing more than 650 compounds. The database is built with chemical structure information, radioligand binding affinity data, computed physicochemical properties, and experimental radioligand binding procedures. The S2RSLDB is freely available online without account login and having a powerful search engine the user may build complex queries, sort tabulated results, generate color coded 2D and 3D graphs and download the data for additional screening.
CONCLUSION The collection here reported is extremely useful for the development of new ligands endowed of σ2 receptor affinity, selectivity, and appropriate physicochemical properties. The database will be updated yearly and in the near future, an online submission form will be available to help with keeping the database widely spread in the research community and continually updated. The database is available at http://www.researchdsf.unict.it/S2RSLDB.
Affiliation(s)
- Giovanni Nastasi
- Department of Drug Sciences, Medicinal Chemistry Section, University of Catania, Viale A. Doria 6, 95125 Catania, Italy
- Carla Miceli
- Department of Drug Sciences, Medicinal Chemistry Section, University of Catania, Viale A. Doria 6, 95125 Catania, Italy
- Valeria Pittalà
- Department of Drug Sciences, Medicinal Chemistry Section, University of Catania, Viale A. Doria 6, 95125 Catania, Italy
- Maria N Modica
- Department of Drug Sciences, Medicinal Chemistry Section, University of Catania, Viale A. Doria 6, 95125 Catania, Italy
- Orazio Prezzavento
- Department of Drug Sciences, Medicinal Chemistry Section, University of Catania, Viale A. Doria 6, 95125 Catania, Italy
- Giuseppe Romeo
- Department of Drug Sciences, Medicinal Chemistry Section, University of Catania, Viale A. Doria 6, 95125 Catania, Italy
- Antonio Rescifina
- Department of Drug Sciences, Medicinal Chemistry Section, University of Catania, Viale A. Doria 6, 95125 Catania, Italy
- Agostino Marrazzo
- Department of Drug Sciences, Medicinal Chemistry Section, University of Catania, Viale A. Doria 6, 95125 Catania, Italy
- Emanuele Amata
- Department of Drug Sciences, Medicinal Chemistry Section, University of Catania, Viale A. Doria 6, 95125 Catania, Italy
15
Weskamp N. Guided Iterative Substructure Search (GI-SSS) - A New Trick for an Old Dog. Mol Inform 2016; 35:286-92. [PMID: 27492243 DOI: 10.1002/minf.201600063] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/27/2016] [Accepted: 06/09/2016] [Indexed: 11/10/2022]
Abstract
Substructure search (SSS) is a fundamental technique supported by various chemical information systems. Many users apply it in an iterative manner: they modify their queries to shape the composition of the retrieved hit sets according to their needs. We propose and evaluate two heuristic extensions of SSS aimed at simplifying these iterative query modifications by collecting additional information during query processing and visualizing this information in an intuitive way. This gives the user a convenient feedback on how certain changes to the query would affect the retrieved hit set and reduces the number of trial-and-error cycles needed to generate an optimal search result. The proposed heuristics are simple, yet surprisingly effective and can be easily added to existing SSS implementations.
Affiliation(s)
- Nils Weskamp
- Boehringer Ingelheim Pharma GmbH & Co. KG, Discovery Research, Lead Identification and Optimization Support, Computational Chemistry, Birkendorfer Straße 65, 88397, Biberach an der Riss, Germany
16
Nettleton DF, Salas J. Approximate Matching of Neighborhood Subgraphs — An Ordered String Graph Levenshtein Method. Int J Uncertain Fuzz 2016. [DOI: 10.1142/s0218488516500215] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]
Abstract
Given that exact pair-wise graph matching has a high computational cost, different representational schemes and matching methods have been devised in order to make matching more efficient. Such methods include representing the graphs as tree structures, transforming the structures into strings and then calculating the edit distance between those strings. However many coding schemes are complex and are computationally expensive. In this paper, we present a novel coding scheme for unlabeled graphs and perform some empirical experiments to evaluate its precision and cost for the matching of neighborhood subgraphs in online social networks. We call our method OSG-L (Ordered String Graph-Levenshtein). Some key advantages of the pre-processing phase are its simplicity, compactness and lower execution time. Furthermore, our method is able to match both non-isomorphisms (near matches) and isomorphisms (exact matches), also taking into account the degrees of the neighbors, which is adequate for social network graphs.
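The general "encode graphs as strings, then compare by edit distance" approach mentioned in this abstract can be sketched as follows. This is not the OSG-L coding scheme, which the abstract does not specify: the `degree_string` encoding (node degrees sorted in descending order, one digit per node, so it assumes every degree is below 10) is a hypothetical stand-in chosen only to make the example concrete. The `levenshtein` function is the standard dynamic-programming edit distance.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Standard edit distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string costs i deletions
        for j, char_b in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if char_a == char_b else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]


def degree_string(adjacency: Dict[int, List[int]]) -> str:
    """Hypothetical encoding: node degrees in descending order, one digit each.

    Assumes every node has fewer than 10 neighbours.
    """
    degrees = sorted((len(neighbours) for neighbours in adjacency.values()), reverse=True)
    return "".join(str(degree) for degree in degrees)


# Two small unlabeled graphs (assumed data): a 4-node star and a 4-node path.
STAR = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
PATH = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

Here `degree_string(STAR)` is `"3111"` and `degree_string(PATH)` is `"2211"`, and their edit distance is 2. A distance of 0 under this toy encoding only means the degree sequences agree, not that the graphs are isomorphic, so it behaves as an approximate similarity measure rather than an exact test.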
Affiliation(s)
- David F. Nettleton
- Departament de Tecnologia i de les Comunicacions, Universitat Pompeu Fabra, C. Roc Boronat, 122-140, 08018 Barcelona, Spain
- IIIA-CSIC, Campus de la UAB, s/n, 08193 Cerdanyola del Vallès, Spain
- Julian Salas
- IIIA-CSIC, Campus de la UAB, s/n, 08193 Cerdanyola del Vallès, Spain
- Departament d'Enginyeria Informàtica i Matemàtiques, Universitat Rovira i Virgili, Av. Països Catalans, 26, Campus Sescelades, 43007 Tarragona, Spain
17
Schärfer C, Schulz-Gasch T, Hert J, Heinzerling L, Schulz B, Inhester T, Stahl M, Rarey M. CONFECT: Conformations from an Expert Collection of Torsion Patterns. ChemMedChem 2013; 8:1690-700. [DOI: 10.1002/cmdc.201300242] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 05/29/2013] [Revised: 07/04/2013] [Indexed: 11/09/2022]
18
Ehrlich HC, Henzler AM, Rarey M. Searching for recursively defined generic chemical patterns in nonenumerated fragment spaces. J Chem Inf Model 2013; 53:1676-88. [PMID: 23751070 DOI: 10.1021/ci400107k] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 11/30/2022]
Abstract
Retrieving molecules with specific structural features is a fundamental requirement of today's molecular database technologies. Estimates claim the chemical space relevant for drug discovery to be around 10⁶⁰ molecules. This figure is many orders of magnitude larger than the amount of molecules conventional databases retain today and will store in the future. An elegant description of such a large chemical space is provided by the concept of fragment spaces. A fragment space comprises fragments that are molecules with open valences and describes rules how to connect these fragments to products. Due to the combinatorial nature of fragment spaces, a complete enumeration of its products is intractable. We present an algorithm to search fragment spaces for generic chemical patterns as present in the SMARTS chemical pattern language. Our method allows specification of the chemical surrounding of an atom in a query and, therefore, enables a chemically intuitive search. During the search, the costly enumeration of products is avoided. The result is a fragment space that exactly describes all possible molecules that contain the user-defined pattern. We evaluated the algorithm in three different drug development use-cases and performed a large scale statistical analysis with 738 SMARTS patterns on three public available fragment spaces. Our results show the ability of the algorithm to explore the chemical space around known active molecules, to analyze fragment spaces for the presence of likely toxic molecules, and to identify complex macromolecular structures under additional structural constraints. By searching the fragment space in its nonenumerated form, spaces covering up to 10¹⁹ molecules can be examined in times ranging between 47 s and 19 min depending on the complexity of the query pattern.
19
Schärfer C, Schulz-Gasch T, Ehrlich HC, Guba W, Rarey M, Stahl M. Torsion angle preferences in druglike chemical space: a comprehensive guide. J Med Chem 2013; 56:2016-28. [PMID: 23379567 DOI: 10.1021/jm3016816] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.8] [Indexed: 01/28/2023]
Abstract
Crystal structure databases offer ample opportunities to derive small molecule conformation preferences, but the derived knowledge is not systematically applied in drug discovery research. We address this gap by a comprehensive and extendable expert system enabling quick assessment of the probability of a given conformation to occur. It is based on a hierarchical system of torsion patterns that cover a large part of druglike chemical space. Each torsion pattern has associated frequency histograms generated from CSD and PDB data and, derived from the histograms, traffic-light rules for frequently observed, rare, and highly unlikely torsion ranges. Structures imported into the corresponding software are annotated according to these rules. We present the concept behind the tree of torsion patterns, the design of an intuitive user interface for the management and usage of the torsion library, and we illustrate how the system helps analyze and understand conformation properties of substructures widely used in medicinal chemistry.
Affiliation(s)
- Christin Schärfer
- Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, D-20146 Hamburg, Germany