Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Rajan K, Zielesny A, Steinbeck C. DECIMER 1.0: deep learning for chemical image recognition using transformers. J Cheminform 2021;13:61. [PMID: 34404468 PMCID: PMC8369700 DOI: 10.1186/s13321-021-00538-8] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 04/29/2021] [Accepted: 07/25/2021] [Indexed: 11/29/2022] Open

Cited by Other Article(s)

Saifi I, Bhat BA, Hamdani SS, Bhat UY, Lobato-Tapia CA, Mir MA, Dar TUH, Ganie SA. Artificial intelligence and cheminformatics tools: a contribution to the drug development and chemical science. J Biomol Struct Dyn 2024;42:6523-6541. [PMID: 37434311 DOI: 10.1080/07391102.2023.2234039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2023] [Accepted: 07/03/2023] [Indexed: 07/13/2023]

Ouyang H, Liu W, Tao J, Luo Y, Zhang W, Zhou J, Geng S, Zhang C. ChemReco: automated recognition of hand-drawn carbon-hydrogen-oxygen structures using deep learning. Sci Rep 2024;14:17126. [PMID: 39054356 PMCID: PMC11272916 DOI: 10.1038/s41598-024-67496-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2024] [Accepted: 07/11/2024] [Indexed: 07/27/2024] Open

Qu G, Song Q, Fang T. The artistic image processing for visual healing in smart city. Sci Rep 2024;14:16846. [PMID: 39039163 PMCID: PMC11263401 DOI: 10.1038/s41598-024-68082-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Accepted: 07/19/2024] [Indexed: 07/24/2024] Open

Fan V, Qian Y, Wang A, Wang A, Coley CW, Barzilay R. OpenChemIE: An Information Extraction Toolkit for Chemistry Literature. J Chem Inf Model 2024;64:5521-5534. [PMID: 38950894 DOI: 10.1021/acs.jcim.4c00572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/03/2024]

Borup RM, Ree N, Jensen JH. pKalculator: A pKa predictor for C-H bonds. Beilstein J Org Chem 2024;20:1614-1622. [PMID: 39076289 PMCID: PMC11285060 DOI: 10.3762/bjoc.20.144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Accepted: 07/02/2024] [Indexed: 07/31/2024] Open

Rajan K, Brinkhaus HO, Zielesny A, Steinbeck C. Advancements in hand-drawn chemical structure recognition through an enhanced DECIMER architecture. J Cheminform 2024;16:78. [PMID: 38970120 PMCID: PMC11227129 DOI: 10.1186/s13321-024-00872-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Accepted: 06/12/2024] [Indexed: 07/07/2024] Open

Abstract

Accurate recognition of hand-drawn chemical structures is crucial for digitising hand-written chemical information in traditional laboratory notebooks or facilitating stylus-based structure entry on tablets or smartphones. However, the inherent variability in hand-drawn structures poses challenges for existing Optical Chemical Structure Recognition (OCSR) software. To address this, we present an enhanced Deep lEarning for Chemical ImagE Recognition (DECIMER) architecture that leverages a combination of Convolutional Neural Networks (CNNs) and Transformers to improve the recognition of hand-drawn chemical structures. The model incorporates an EfficientNetV2 CNN encoder that extracts features from hand-drawn images, followed by a Transformer decoder that converts the extracted features into Simplified Molecular Input Line Entry System (SMILES) strings. Our models were trained using synthetic hand-drawn images generated by RanDepict, a tool for depicting chemical structures with different style elements. A benchmark was performed using a real-world dataset of hand-drawn chemical structures to evaluate the model's performance. The results indicate that our improved DECIMER architecture exhibits a significantly enhanced recognition accuracy compared to other approaches. SCIENTIFIC CONTRIBUTION: The new DECIMER model presented here refines our previous research efforts and is currently the only open-source model tailored specifically for the recognition of hand-drawn chemical structures. The enhanced model performs better in handling variations in handwriting styles, line thicknesses, and background noise, making it suitable for real-world applications. The DECIMER hand-drawn structure recognition model and its source code have been made available as an open-source package under a permissive license.

Luong KD, Singh A. Application of Transformers in Cheminformatics. J Chem Inf Model 2024;64:4392-4409. [PMID: 38815246 PMCID: PMC11167597 DOI: 10.1021/acs.jcim.3c02070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Revised: 04/05/2024] [Accepted: 05/06/2024] [Indexed: 06/01/2024]

Meewan I, Panmanee J, Petchyam N, Lertvilai P. HBCVTr: an end-to-end transformer with a deep neural network hybrid model for anti-HBV and HCV activity predictor from SMILES. Sci Rep 2024;14:9262. [PMID: 38649402 PMCID: PMC11035669 DOI: 10.1038/s41598-024-59933-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Accepted: 04/16/2024] [Indexed: 04/25/2024] Open

Liu P, Ren Y, Tao J, Ren Z. GIT-Mol: A multi-modal large language model for molecular science with graph, image, and text. Comput Biol Med 2024;171:108073. [PMID: 38359660 DOI: 10.1016/j.compbiomed.2024.108073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Revised: 12/25/2023] [Accepted: 01/27/2024] [Indexed: 02/17/2024]

Kunnakkattu IR, Choudhary P, Pravda L, Nadzirin N, Smart OS, Yuan Q, Anyango S, Nair S, Varadi M, Velankar S. PDBe CCDUtils: an RDKit-based toolkit for handling and analysing small molecules in the Protein Data Bank. J Cheminform 2023;15:117. [PMID: 38042830 PMCID: PMC10693035 DOI: 10.1186/s13321-023-00786-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Accepted: 11/17/2023] [Indexed: 12/04/2023] Open

Affiliation(s)

Ibrahim Roshan Kunnakkattu, Preeti Choudhary, Lukas Pravda, Nurul Nadzirin, Oliver S Smart, Qi Yuan, Stephen Anyango, Sreenath Nair, Mihaly Varadi, Sameer Velankar: Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK

Zhou C, Liu W, Song X, Yang M, Peng X. YoDe-Segmentation: automated noise-free retrieval of molecular structures from scientific publications. J Cheminform 2023;15:111. [PMID: 37986007 PMCID: PMC10662772 DOI: 10.1186/s13321-023-00783-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 11/11/2023] [Indexed: 11/22/2023] Open

Abstract

In chemistry-related disciplines, a vast repository of molecular structural data has been documented in scientific publications but remains inaccessible to computational analyses owing to its non-machine-readable format. Optical chemical structure recognition (OCSR) addresses this gap by converting images of chemical molecular structures into a format accessible to computers and convenient for storage, paving the way for further analyses and studies on chemical information. A pivotal initial step in OCSR is automating the noise-free extraction of molecular descriptions from literature. Despite efforts utilising rule-based and deep learning approaches for the extraction process, the accuracy achieved to date is unsatisfactory. To address this issue, we introduce a deep learning model named YoDe-Segmentation in this study, engineered for the automated retrieval of molecular structures from scientific documents. This model operates via a three-stage process encompassing detection, mask generation, and calculation. Initially, it identifies and isolates molecular structures during the detection phase. Subsequently, mask maps are created based on these isolated structures in the mask generation stage. In the final calculation stage, refined and separated mask maps are combined with the isolated molecular structure images, resulting in the acquisition of pure molecular structures. Our model underwent rigorous testing using texts from multiple chemistry-centric journals, with the outcomes subjected to manual validation. The results revealed the superior performance of YoDe-Segmentation compared to alternative algorithms, documenting an average extraction efficiency of 97.62%. This outcome not only highlights the robustness and reliability of the model but also suggests its applicability on a broad scale.

Mullowney MW, Duncan KR, Elsayed SS, Garg N, van der Hooft JJJ, Martin NI, Meijer D, Terlouw BR, Biermann F, Blin K, Durairaj J, Gorostiola González M, Helfrich EJN, Huber F, Leopold-Messer S, Rajan K, de Rond T, van Santen JA, Sorokina M, Balunas MJ, Beniddir MA, van Bergeijk DA, Carroll LM, Clark CM, Clevert DA, Dejong CA, Du C, Ferrinho S, Grisoni F, Hofstetter A, Jespers W, Kalinina OV, Kautsar SA, Kim H, Leao TF, Masschelein J, Rees ER, Reher R, Reker D, Schwaller P, Segler M, Skinnider MA, Walker AS, Willighagen EL, Zdrazil B, Ziemert N, Goss RJM, Guyomard P, Volkamer A, Gerwick WH, Kim HU, Müller R, van Wezel GP, van Westen GJP, Hirsch AKH, Linington RG, Robinson SL, Medema MH. Artificial intelligence for natural product drug discovery. Nat Rev Drug Discov 2023;22:895-916. [PMID: 37697042 DOI: 10.1038/s41573-023-00774-7] [Citation(s) in RCA: 33] [Impact Index Per Article: 33.0] [Accepted: 07/20/2023] [Indexed: 09/13/2023]

Affiliation(s)

Michael W Mullowney: Duchossois Family Institute, The University of Chicago, Chicago, IL, USA
Katherine R Duncan: Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow, UK
Somayah S Elsayed: Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
Neha Garg: School of Chemistry and Biochemistry, Center for Microbial Dynamics and Infection, Georgia Institute of Technology, Atlanta, GA, USA
Justin J J van der Hooft: Bioinformatics Group, Wageningen University, Wageningen, The Netherlands; Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa
Nathaniel I Martin: Biological Chemistry Group, Institute of Biology, Leiden University, Leiden, The Netherlands
David Meijer: Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
Barbara R Terlouw: Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
Friederike Biermann: Bioinformatics Group, Wageningen University, Wageningen, The Netherlands; Institute of Molecular Bio Science, Goethe-University Frankfurt, Frankfurt am Main, Germany; LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
Kai Blin: The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
Janani Durairaj: Biozentrum, University of Basel, Basel, Switzerland
Marina Gorostiola González: Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands; ONCODE institute, Leiden, The Netherlands
Eric J N Helfrich: Institute of Molecular Bio Science, Goethe-University Frankfurt, Frankfurt am Main, Germany; LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
Florian Huber: Center for Digitalization and Digitality, Hochschule Düsseldorf, Düsseldorf, Germany
Stefan Leopold-Messer: Institut für Mikrobiologie, Eidgenössische Technische Hochschule (ETH) Zürich, Zürich, Switzerland
Kohulan Rajan: Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University Jena, Jena, Germany
Tristan de Rond: School of Chemical Sciences, University of Auckland, Auckland, New Zealand
Jeffrey A van Santen: Department of Chemistry, Simon Fraser University, Burnaby, British Columbia, Canada
Maria Sorokina: Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller University, Jena, Germany; Pharmaceuticals R&D, Bayer AG, Berlin, Germany
Marcy J Balunas: Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA; Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
Mehdi A Beniddir: Équipe "Chimie des Substances Naturelles", Université Paris-Saclay, CNRS, BioCIS, Orsay, France
Doris A van Bergeijk: Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
Laura M Carroll: Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
Chase M Clark: Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
Djork-Arné Clevert: WRDM - Machine Learning Research, Pfizer, Berlin, Germany
Chris A Dejong: Adapsyn Bioscience, Hamilton, Ontario, Canada
Chao Du: Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
Scarlet Ferrinho: Chemistry Department, University of St Andrews, St Andrews, UK
Francesca Grisoni: Institute for Complex Molecular Systems, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands; Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Utrecht, The Netherlands
Albert Hofstetter: Laboratory of Physical Chemistry, ETH Zürich, Zürich, Switzerland
Willem Jespers: Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
Olga V Kalinina: Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany; Drug Bioinformatics, Medical Faculty, Saarland University, Homburg, Germany; Center for Bioinformatics, Saarland University, Saarbrücken, Germany
Satria A Kautsar: Department of Chemistry, Scripps Research, FL, USA
Hyunwoo Kim: College of Pharmacy and Integrated Research Institute for Drug Development, Dongguk University Seoul, Goyang-si, Republic of Korea
Tiago F Leao: Center for Nuclear Energy in Agriculture, University of São Paulo, Piracicaba, Brazil
Joleen Masschelein: Center for Microbiology, VIB-KU Leuven, Heverlee, Belgium; Department of Biology, KU Leuven, Heverlee, Belgium
Evan R Rees: Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
Raphael Reher: Institute of Pharmaceutical Biology and Biotechnology, University of Marburg, Marburg, Germany; Institute of Pharmacy, Martin-Luther-University Halle-Wittenberg, Halle (Saale), Germany
Daniel Reker: Department of Biomedical Engineering, Duke University, Durham, NC, USA; Duke Microbiome Center, Duke University, Durham, NC, USA
Philippe Schwaller: Laboratory of Artificial Chemical Intelligence, Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Marwin Segler: Microsoft Research, Cambridge, UK
Michael A Skinnider: Adapsyn Bioscience, Hamilton, Ontario, Canada; Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
Allison S Walker: Department of Chemistry, Vanderbilt University, Nashville, TN, USA; Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
Egon L Willighagen: Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
Barbara Zdrazil: European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, UK
Nadine Ziemert: Interfaculty Institute for Microbiology and Infection Medicine Tuebingen (IMIT), Institute for Bioinformatics and Medical Informatics (IBMI), University of Tuebingen, Tuebingen, Germany
Rebecca J M Goss: Chemistry Department, University of St Andrews, St Andrews, UK
Pierre Guyomard: Bonsai team, CRIStAL - Centre de Recherche en Informatique Signal et Automatique de Lille, Université de Lille, Villeneuve d'Ascq Cedex, France
Andrea Volkamer: Center for Bioinformatics, Saarland University, Saarbrücken, Germany; In silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité - Universitätsmedizin Berlin, Berlin, Germany
William H Gerwick: Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
Hyun Uk Kim: Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
Rolf Müller: Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany; Department of Pharmacy, Saarland University, Saarbrücken, Germany; German Center for infection research (DZIF), Braunschweig, Germany; Helmholtz International Lab for Anti-Infectives, Saarbrücken, Germany
Gilles P van Wezel: Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands; Netherlands Institute of Ecology, NIOO-KNAW, Wageningen, The Netherlands
Gerard J P van Westen: Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
Anna K H Hirsch: Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany; Department of Pharmacy, Saarland University, Saarbrücken, Germany; German Center for infection research (DZIF), Braunschweig, Germany; Helmholtz International Lab for Anti-Infectives, Saarbrücken, Germany
Roger G Linington: Department of Chemistry, Simon Fraser University, Burnaby, British Columbia, Canada
Serina L Robinson: Department of Environmental Microbiology, Eawag: Swiss Federal Institute for Aquatic Science and Technology, Dübendorf, Switzerland
Marnix H Medema: Bioinformatics Group, Wageningen University, Wageningen, The Netherlands; Institute of Biology, Leiden University, Leiden, The Netherlands

Scholz VA, Stork C, Frericks M, Kirchmair J. Computational prediction of the metabolites of agrochemicals formed in rats. Sci Total Environ 2023;895:165039. [PMID: 37355108 DOI: 10.1016/j.scitotenv.2023.165039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Revised: 06/17/2023] [Accepted: 06/19/2023] [Indexed: 06/26/2023]

Abstract

Today, computational tools for the prediction of the metabolite structures of xenobiotics are widely available and employed in small-molecule research. Reflecting the availability of measured data, these in silico tools are trained and validated primarily on drug metabolism data. In this work, we assessed the capacity of five leading metabolite structure predictors to represent the metabolism of agrochemicals observed in rats. More specifically, we tested the ability of SyGMa, GLORY, GLORYx, BioTransformer 3.0, and MetaTrans to correctly predict and rank the experimentally observed metabolites of a set of 85 parent compounds. We found that the models were able to recover about one to two-thirds of the experimentally observed first-generation, second-generation and third-generation metabolites, confirming their value in applications such as metabolite identification. However, precision was low for all investigated tools and did not exceed approximately 18 % for the pool of first-generation metabolites and 2 % for the pool of compounds representing the first three generations of metabolites. The variance in prediction success rates was high across the individual metabolic maps, meaning that outcomes depend strongly on the specific compound under investigation. We also found that the predictions for individual parent compounds differed strongly between the tools, particularly between those built on orthogonal technologies (e.g., rule-based and end-to-end machine learning approaches). This renders ensemble model strategies promising for improving success rates. Overall, the results of this benchmark study show that there is still considerable room for the improvement of metabolite structure predictors left. Our discussion points out several avenues to progress. The bottleneck in method development certainly has been, and will remain, for the foreseeable future, the limited quantity and quality of available measured data on small-molecule metabolism.
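As an illustration of the two figures this abstract reports, the share of observed metabolites a tool recovers corresponds to recall, and the share of its predictions that are actually observed corresponds to precision. The following is a minimal sketch only: the function name and the SMILES strings are hypothetical and are not taken from the study, and a real comparison would first need structures in a canonical form.

```python
def precision_recall(predicted, observed):
    """Return (precision, recall) for two collections of structure identifiers.

    Assumes identifiers are already canonicalised so that equal structures
    compare equal as strings (an assumption of this sketch).
    """
    predicted, observed = set(predicted), set(observed)
    hits = predicted & observed  # predictions that were experimentally observed
    precision = len(hits) / len(predicted) if predicted else 0.0
    recall = len(hits) / len(observed) if observed else 0.0
    return precision, recall


# Hypothetical data for one parent compound, for illustration only.
observed = {"CCO", "CC=O", "CC(=O)O"}
predicted = {"CCO", "CC=O", "CO", "C=O", "OCCO", "CCN", "CCCl", "CCBr", "CCC", "CCCC"}

precision, recall = precision_recall(predicted, observed)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A tool that proposes many candidates can recover most observed metabolites while still showing low precision, which is the pattern the abstract describes.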

Wilary D, Cole JM. ReactionDataExtractor 2.0: A Deep Learning Approach for Data Extraction from Chemical Reaction Schemes. J Chem Inf Model 2023;63:6053-6067. [PMID: 37729111 PMCID: PMC10565829 DOI: 10.1021/acs.jcim.3c00422] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/16/2023] [Indexed: 09/22/2023]

Rajan K, Brinkhaus HO, Agea MI, Zielesny A, Steinbeck C. DECIMER.ai: an open platform for automated optical chemical structure identification, segmentation and recognition in scientific publications. Nat Commun 2023;14:5045. [PMID: 37598180 PMCID: PMC10439916 DOI: 10.1038/s41467-023-40782-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 03/07/2023] [Accepted: 08/09/2023] [Indexed: 08/21/2023] Open

Carracedo-Cosme J, Romero-Muñiz C, Pou P, Pérez R. Molecular Identification from AFM Images Using the IUPAC Nomenclature and Attribute Multimodal Recurrent Neural Networks. ACS Appl Mater Interfaces 2023;15:22692-22704. [PMID: 37126486 PMCID: PMC10176476 DOI: 10.1021/acsami.3c01550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]

Qian Y, Guo J, Tu Z, Li Z, Coley CW, Barzilay R. MolScribe: Robust Molecular Structure Recognition with Image-to-Graph Generation. J Chem Inf Model 2023;63:1925-1934. [PMID: 36971363 DOI: 10.1021/acs.jcim.2c01480] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Indexed: 03/29/2023]

Two years of explicit CiTO annotations. J Cheminform 2023;15:14. [PMID: 36737837 PMCID: PMC9897605 DOI: 10.1186/s13321-023-00683-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/13/2022] [Accepted: 01/17/2023] [Indexed: 02/05/2023] Open

Lai A, Schaub J, Steinbeck C, Schymanski EL. An algorithm to classify homologous series within compound datasets. J Cheminform 2022;14:85. [PMID: 36510332 PMCID: PMC9746203 DOI: 10.1186/s13321-022-00663-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Accepted: 11/27/2022] [Indexed: 12/15/2022] Open

Xu Y, Xiao J, Chou CH, Zhang J, Zhu J, Hu Q, Li H, Han N, Liu B, Zhang S, Han J, Zhang Z, Zhang S, Zhang W, Lai L, Pei J. MolMiner: You Only Look Once for Chemical Structure Recognition. J Chem Inf Model 2022;62:5321-5328. [PMID: 36108142 PMCID: PMC9710516 DOI: 10.1021/acs.jcim.2c00733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Indexed: 12/13/2022]

Wang J, Shen Z, Liao Y, Yuan Z, Li S, He G, Lan M, Qian X, Zhang K, Li H. Multi-modal chemical information reconstruction from images and texts for exploring the near-drug space. Brief Bioinform 2022;23:6761958. [PMID: 36252922 PMCID: PMC9677486 DOI: 10.1093/bib/bbac461] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/31/2022] [Revised: 09/21/2022] [Accepted: 09/26/2022] [Indexed: 12/14/2022] Open

Abstract

Identification of new chemical compounds with desired structural diversity and biological properties plays an essential role in drug discovery, yet the construction of such a potential space with elements of 'near-drug' properties is still a challenging task. In this work, we proposed a multimodal chemical information reconstruction system to automatically process, extract and align heterogeneous information from the text descriptions and structural images of chemical patents. Our key innovation lies in a heterogeneous data generator that produces cross-modality training data in the form of text descriptions and Markush structure images, from which a two-branch model with image- and text-processing units can then learn to both recognize heterogeneous chemical entities and simultaneously capture their correspondence. In particular, we have collected chemical structures from ChEMBL database and chemical patents from the European Patent Office and the US Patent and Trademark Office using keywords 'A61P, compound, structure' in the years from 2010 to 2020, and generated heterogeneous chemical information datasets with 210K structural images and 7818 annotated text snippets. Based on the reconstructed results and substituent replacement rules, structural libraries of a huge number of near-drug compounds can be generated automatically. In quantitative evaluations, our model can correctly reconstruct 97% of the molecular images into structured format and achieve an F1-score around 97-98% in the recognition of chemical entities, which demonstrated the effectiveness of our model in automatic information extraction from chemical patents, and hopefully transforming them to a user-friendly, structured molecular database enriching the near-drug space to realize the intelligent retrieval technology of chemical knowledge.

Musazade F, Jamalova N, Hasanov J. Review of techniques and models used in optical chemical structure recognition in images and scanned documents. J Cheminform 2022;14:61. [PMID: 36076301 PMCID: PMC9461257 DOI: 10.1186/s13321-022-00642-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2022] [Accepted: 08/20/2022] [Indexed: 11/10/2022] Open

Xu Z, Li J, Yang Z, Li S, Li H. SwinOCSR: end-to-end optical chemical structure recognition using a Swin Transformer. J Cheminform 2022;14:41. [PMID: 35778754 PMCID: PMC9248127 DOI: 10.1186/s13321-022-00624-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 02/09/2022] [Accepted: 06/12/2022] [Indexed: 11/26/2022] Open

Brinkhaus HO, Zielesny A, Steinbeck C, Rajan K. DECIMER-hand-drawn molecule images dataset. J Cheminform 2022;14:36. [PMID: 35681226 PMCID: PMC9185882 DOI: 10.1186/s13321-022-00620-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2022] [Accepted: 05/25/2022] [Indexed: 12/01/2022] Open

Brinkhaus HO, Rajan K, Zielesny A, Steinbeck C. RanDepict: Random chemical structure depiction generator. J Cheminform 2022;14:31. [PMID: 35668480 PMCID: PMC9169273 DOI: 10.1186/s13321-022-00609-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2022] [Accepted: 05/09/2022] [Indexed: 11/10/2022] Open

Petmezas G, Stefanopoulos L, Kilintzis V, Tzavelis A, Rogers JA, Katsaggelos AK, Maglaveras N. State-of-the-art Deep Learning Methods on Electrocardiogram Data: A Systematic Review (Preprint). JMIR Med Inform 2022;10:e38454. [PMID: 35969441 PMCID: PMC9425174 DOI: 10.2196/38454] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 04/03/2022] [Revised: 06/03/2022] [Accepted: 07/03/2022] [Indexed: 11/13/2022] Open

Abstract

Background

Electrocardiogram (ECG) is one of the most common noninvasive diagnostic tools that can provide useful information regarding a patient’s health status. Deep learning (DL) is an area of intense exploration that leads the way in most attempts to create powerful diagnostic models based on physiological signals.

Objective

This study aimed to provide a systematic review of DL methods applied to ECG data for various clinical applications.

Methods

The PubMed search engine was systematically searched by combining “deep learning” and keywords such as “ecg,” “ekg,” “electrocardiogram,” “electrocardiography,” and “electrocardiology.” Irrelevant articles were excluded from the study after screening titles and abstracts, and the remaining articles were further reviewed. The reasons for article exclusion were manuscripts written in any language other than English, absence of ECG data or DL methods involved in the study, and absence of a quantitative evaluation of the proposed approaches.

Results

We identified 230 relevant articles published between January 2020 and December 2021 and grouped them into 6 distinct medical applications, namely, blood pressure estimation, cardiovascular disease diagnosis, ECG analysis, biometric recognition, sleep analysis, and other clinical analyses. We provide a complete account of the state-of-the-art DL strategies per the field of application, as well as major ECG data sources. We also present open research problems, such as the lack of attempts to address the issue of blood pressure variability in training data sets, and point out potential gaps in the design and implementation of DL models.

Conclusions

We expect that this review will provide insights into state-of-the-art DL methods applied to ECG data and point to future directions for research on DL to create robust models that can assist medical experts in clinical decision-making.

Stocker M, Heger T, Schweidtmann A, Ćwiek-Kupczyńska H, Penev L, Dojchinovski M, Willighagen E, Vidal ME, Turki H, Balliet D, Tiddi I, Kuhn T, Mietchen D, Karras O, Vogt L, Hellmann S, Jeschke J, Krajewski P, Auer S. SKG4EOSC - Scholarly Knowledge Graphs for EOSC: Establishing a backbone of knowledge graphs for FAIR Scholarly Information in EOSC. Research Ideas and Outcomes 2022. [DOI: 10.3897/rio.8.e83789] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open

Abstract

In the age of advanced information systems powering fast-paced knowledge economies that face global societal challenges, it is no longer adequate to express scholarly information - an essential resource for modern economies - primarily as article narratives in document form. Despite being a well-established tradition in scholarly communication, PDF-based text publishing is hindering scientific progress as it buries scholarly information into non-machine-readable formats. The key objective of SKG4EOSC is to improve science productivity through development and implementation of services for text and data conversion, and production, curation, and re-use of FAIR scholarly information. This will be achieved by (1) establishing the Open Research Knowledge Graph (ORKG, orkg.org), a service operated by the SKG4EOSC coordinator, as a Hub for access to FAIR scholarly information in the EOSC; (2) lifting to EOSC of numerous and heterogeneous domain-specific research infrastructures through the ORKG Hub’s harmonized access facilities; and (3) leverage the Hub to support cross-disciplinary research and policy decisions addressing societal challenges. SKG4EOSC will pilot the devised approaches and technologies in four research domains: biodiversity crisis, precision oncology, circular processes, and human cooperation. With the aim to improve machine-based scholarly information use, SKG4EOSC addresses an important current and future need of researchers. It extends the application of the FAIR data principles to scholarly communication practices, hence a more comprehensive coverage of the entire research lifecycle. Through explicit, machine actionable provenance links between FAIR scholarly information, primary data and contextual entities, it will substantially contribute to reproducibility, validation and trust in science. The resulting advanced machine support will catalyse new discoveries in basic research and solutions in key application areas.