Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Ehrlich HC, Rarey M. Systematic benchmark of substructure search in molecular graphs - From Ullmann to VF2. J Cheminform 2012;4:13. [PMID: 22849361 PMCID: PMC3586954 DOI: 10.1186/1758-2946-4-13] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 02/17/2012] [Accepted: 04/27/2012] [Indexed: 11/24/2022] Open

Cited by Other Article(s)

Whitehouse AJ, Sanchez-Martinez M, Salehi SM, Kurbatova N, Dean E. Open-Source Approach to GPU-Accelerated Substructure Search. J Chem Inf Model 2024. [PMID: 39225069 DOI: 10.1021/acs.jcim.4c00679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/04/2024]

Mahjour BA, Coley CW. RDCanon: A Python Package for Canonicalizing the Order of Tokens in SMARTS Queries. J Chem Inf Model 2024;64:2948-2954. [PMID: 38488634 DOI: 10.1021/acs.jcim.4c00138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]

Rieder SR, Oliveira MP, Riniker S, Hünenberger PH. Development of an open-source software for isomer enumeration. J Cheminform 2023;15:10. [PMID: 36683047 PMCID: PMC9867865 DOI: 10.1186/s13321-022-00677-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Accepted: 12/28/2022] [Indexed: 01/23/2023] Open

Luaces D, Viqueira JR, Cotos JM, Flores JC. Efficient access methods for very large distributed graph databases. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.05.047] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/17/2022]

Design of a small molecule that stimulates vascular endothelial growth factor A enabled by screening RNA fold-small molecule interactions. Nat Chem 2020;12:952-961. [PMID: 32839603 PMCID: PMC7571259 DOI: 10.1038/s41557-020-0514-4] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 04/01/2019] [Accepted: 06/24/2020] [Indexed: 12/20/2022]

Ehmki ESR, Schmidt R, Ohm F, Rarey M. Comparing Molecular Patterns Using the Example of SMARTS: Applications and Filter Collection Analysis. J Chem Inf Model 2019;59:2572-2586. [DOI: 10.1021/acs.jcim.9b00249] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022]

Schmidt R, Ehmki ESR, Ohm F, Ehrlich HC, Mashychev A, Rarey M. Comparing Molecular Patterns Using the Example of SMARTS: Theory and Algorithms. J Chem Inf Model 2019;59:2560-2571. [DOI: 10.1021/acs.jcim.9b00250] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 11/28/2022]

Shahare HV, Talele GS. Designing of benzothiazole derivatives as promising EGFR tyrosine kinase inhibitors: a pharmacoinformatics study. J Biomol Struct Dyn 2019;38:1365-1374. [DOI: 10.1080/07391102.2019.1604264] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/20/2022]

Zanette C, Bannan CC, Bayly CI, Fass J, Gilson MK, Shirts MR, Chodera JD, Mobley DL. Toward Learned Chemical Perception of Force Field Typing Rules. J Chem Theory Comput 2019;15:402-423. [PMID: 30512951 PMCID: PMC6467725 DOI: 10.1021/acs.jctc.8b00821] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Indexed: 11/29/2022]

Abstract

Molecular mechanics force fields define how the energy and forces in a molecular system are computed from its atomic positions, thus enabling the study of such systems through computational methods like molecular dynamics and Monte Carlo simulations. Despite progress toward automated force field parametrization, considerable human expertise is required to develop or extend force fields. In particular, human input has long been required to define atom types, which encode chemically unique environments that determine which parameters will be assigned. However, relying on humans to establish atom types is suboptimal. Human-created atom types are often developed without statistical justification, leading to over- or under-fitting of data. Human-created types are also difficult to extend in a systematic and consistent manner when new chemistries must be modeled or new data becomes available. Finally, human effort is not scalable when force fields must be generated for new (bio)polymers, compound classes, or materials. To remedy these deficiencies, our long-term goal is to replace human specification of atom types with an automated approach, based on rigorous statistics and driven by experimental and/or quantum chemical reference data. In this work, we describe novel methods that automate the discovery of appropriate chemical perception: SMARTY allows for the creation of atom types, while SMIRKY goes further by automating the creation of fragment (nonbonded, bonds, angles, and torsions) types. These approaches enable the creation of move sets in atom or fragment type space, which are used within a Monte Carlo optimization approach. We demonstrate the power of these new methods by automating the rediscovery of human defined atom types (SMARTY) or fragment types (SMIRKY) in existing small molecule force fields. We assess these approaches using several molecular data sets, including one which covers a diverse subset of the DrugBank database.

Chang HJ, Fischer T, Petit M, Zambelli M, Demiris Y. Learning Kinematic Structure Correspondences Using Multi-Order Similarities. IEEE Trans Pattern Anal Mach Intell 2018;40:2920-2934. [PMID: 29989982 DOI: 10.1109/tpami.2017.2777486] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/08/2023]

Hähnke VD, Kim S, Bolton EE. PubChem chemical structure standardization. J Cheminform 2018;10:36. [PMID: 30097821 PMCID: PMC6086778 DOI: 10.1186/s13321-018-0293-8] [Citation(s) in RCA: 62] [Impact Index Per Article: 10.3] [Received: 04/18/2018] [Accepted: 08/01/2018] [Indexed: 11/15/2022] Open

Abstract

BACKGROUND

PubChem is a chemical information repository, consisting of three primary databases: Substance, Compound, and BioAssay. When individual data contributors submit chemical substance descriptions to Substance, the unique chemical structures are extracted and stored into Compound through an automated process called structure standardization. The present study describes the PubChem standardization approaches and analyzes them for their success rates, reasons that cause structures to be rejected, and modifications applied to structures during the standardization process. Furthermore, the PubChem standardization is compared to the structure normalization of the IUPAC International Chemical Identifier (InChI) software, as manifested by conversion of the InChI back into a chemical structure.

RESULTS

The observed rejection rate for substances processed by PubChem standardization was 0.36%, which is predominantly attributed to structures with invalid atom valences that cannot be readily corrected without additional information from contributors. Of all structures that pass standardization, 44% are modified in the process, reducing the count of unique structures from 53,574,724 in substance to 45,808,881 in compound as identified by de-aromatized canonical isomeric SMILES. Even though the processing time is very low on average (only 0.4% of structures have individual standardization time above 0.1 s), total standardization time is completely dominated by edge cases: 90% of the time to standardize all structures in PubChem substance is spent on the 2.05% of structures with the highest individual standardization time. It is worth noting that 60% of the structures obtained from PubChem structure standardization are not identical to the chemical structure resulting from the InChI (primarily due to preferences for a different tautomeric form).

CONCLUSIONS

Standardization of chemical structures is complicated by the diversity of chemical information and its representation approaches. The PubChem standardization is an effective and efficient tool to account for molecular diversity and to eliminate invalid/incomplete structures. Further development will concentrate on improved tautomer consideration and an expanded stereocenter definition. Modifications are difficult to thoroughly validate, with slight changes often affecting many thousands of structures and various edge cases. The PubChem structure standardization service is accessible as a public resource (https://pubchem.ncbi.nlm.nih.gov/standardize), and via programmatic interfaces.
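The headline figures in the RESULTS paragraph of this abstract can be turned into relative numbers with a few lines of arithmetic. The sketch below is illustrative only: the counts and percentages are the ones quoted in the abstract, while the variable names and the derived quantities (relative reduction, per-structure time ratio) are not taken from the article itself.

```python
# Figures quoted in the abstract's RESULTS paragraph.
unique_in_substance = 53_574_724   # unique structures before standardization
unique_in_compound = 45_808_881    # unique structures after standardization

# How many distinct input structures collapse onto an existing one.
merged = unique_in_substance - unique_in_compound
reduction_pct = 100 * merged / unique_in_substance

# "90% of the time ... is spent on the 2.05% of structures": compare the
# average per-structure time of the slow tail with that of the remainder.
slow_share_of_time = 0.90
slow_share_of_structures = 0.0205
slow_mean = slow_share_of_time / slow_share_of_structures
fast_mean = (1 - slow_share_of_time) / (1 - slow_share_of_structures)
tail_ratio = slow_mean / fast_mean

print(f"structures merged: {merged:,}")
print(f"relative reduction: {reduction_pct:.1f}%")
print(f"slow tail vs. rest, mean time ratio: {tail_ratio:.0f}x")
```

Read this way, standardization removes roughly one in seven unique structures, and the slowest structures cost on average a few hundred times more to process than the rest, which is the sense in which edge cases dominate total run time.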

Kratochvíl M, Vondrášek J, Galgonek J. Sachem: a chemical cartridge for high-performance substructure search. J Cheminform 2018;10:27. [PMID: 29797000 PMCID: PMC5966370 DOI: 10.1186/s13321-018-0282-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 11/27/2017] [Accepted: 05/16/2018] [Indexed: 12/19/2022] Open

Inhester T, Nittinger E, Sommer K, Schmidt P, Bietz S, Rarey M. NAOMInova: Interactive Geometric Analysis of Noncovalent Interactions in Macromolecular Structures. J Chem Inf Model 2017;57:2132-2142. [DOI: 10.1021/acs.jcim.7b00291] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/29/2022]

Nastasi G, Miceli C, Pittalà V, Modica MN, Prezzavento O, Romeo G, Rescifina A, Marrazzo A, Amata E. S2RSLDB: a comprehensive manually curated, internet-accessible database of the sigma-2 receptor selective ligands. J Cheminform 2017;9:3. [PMID: 28123452 PMCID: PMC5250622 DOI: 10.1186/s13321-017-0191-5] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 06/06/2016] [Accepted: 01/16/2017] [Indexed: 11/10/2022] Open

Weskamp N. Guided Iterative Substructure Search (GI-SSS) - A New Trick for an Old Dog. Mol Inform 2016;35:286-92. [PMID: 27492243 DOI: 10.1002/minf.201600063] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/27/2016] [Accepted: 06/09/2016] [Indexed: 11/10/2022]

Nettleton DF, Salas J. Approximate Matching of Neighborhood Subgraphs - An Ordered String Graph Levenshtein Method. Int J Uncertain Fuzz 2016. [DOI: 10.1142/s0218488516500215] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]

Schärfer C, Schulz-Gasch T, Hert J, Heinzerling L, Schulz B, Inhester T, Stahl M, Rarey M. CONFECT: Conformations from an Expert Collection of Torsion Patterns. ChemMedChem 2013;8:1690-700. [DOI: 10.1002/cmdc.201300242] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 05/29/2013] [Revised: 07/04/2013] [Indexed: 11/09/2022]

Ehrlich HC, Henzler AM, Rarey M. Searching for recursively defined generic chemical patterns in nonenumerated fragment spaces. J Chem Inf Model 2013;53:1676-88. [PMID: 23751070 DOI: 10.1021/ci400107k] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 11/30/2022]

Abstract

Retrieving molecules with specific structural features is a fundamental requirement of today's molecular database technologies. Estimates claim the chemical space relevant for drug discovery to be around 10⁶⁰ molecules. This figure is many orders of magnitude larger than the number of molecules conventional databases retain today and will store in the future. An elegant description of such a large chemical space is provided by the concept of fragment spaces. A fragment space comprises fragments, which are molecules with open valences, and describes rules for how to connect these fragments into products. Due to the combinatorial nature of fragment spaces, a complete enumeration of their products is intractable. We present an algorithm to search fragment spaces for generic chemical patterns as present in the SMARTS chemical pattern language. Our method allows specification of the chemical surrounding of an atom in a query and, therefore, enables a chemically intuitive search. During the search, the costly enumeration of products is avoided. The result is a fragment space that exactly describes all possible molecules that contain the user-defined pattern. We evaluated the algorithm in three different drug development use-cases and performed a large-scale statistical analysis with 738 SMARTS patterns on three publicly available fragment spaces. Our results show the ability of the algorithm to explore the chemical space around known active molecules, to analyze fragment spaces for the presence of likely toxic molecules, and to identify complex macromolecular structures under additional structural constraints. By searching the fragment space in its nonenumerated form, spaces covering up to 10¹⁹ molecules can be examined in times ranging between 47 s and 19 min depending on the complexity of the query pattern.

Schärfer C, Schulz-Gasch T, Ehrlich HC, Guba W, Rarey M, Stahl M. Torsion angle preferences in druglike chemical space: a comprehensive guide. J Med Chem 2013;56:2016-28. [PMID: 23379567 DOI: 10.1021/jm3016816] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.8] [Indexed: 01/28/2023]