1
|
Krallinger M, Rabal O, Lourenço A, Oyarzabal J, Valencia A. Information Retrieval and Text Mining Technologies for Chemistry. Chem Rev 2017; 117:7673-7761. [PMID: 28475312 DOI: 10.1021/acs.chemrev.6b00851] [Citation(s) in RCA: 111] [Impact Index Per Article: 15.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023]
Abstract
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
Collapse
Affiliation(s)
- Martin Krallinger
- Structural Computational Biology Group, Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre , C/Melchor Fernández Almagro 3, Madrid E-28029, Spain
| | - Obdulia Rabal
- Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra , Avenida Pio XII 55, Pamplona E-31008, Spain
| | - Anália Lourenço
- ESEI - Department of Computer Science, University of Vigo , Edificio Politécnico, Campus Universitario As Lagoas s/n, Ourense E-32004, Spain.,Centro de Investigaciones Biomédicas (Centro Singular de Investigación de Galicia) , Campus Universitario Lagoas-Marcosende, Vigo E-36310, Spain.,CEB-Centre of Biological Engineering, University of Minho , Campus de Gualtar, Braga 4710-057, Portugal
| | - Julen Oyarzabal
- Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra , Avenida Pio XII 55, Pamplona E-31008, Spain
| | - Alfonso Valencia
- Life Science Department, Barcelona Supercomputing Centre (BSC-CNS) , C/Jordi Girona, 29-31, Barcelona E-08034, Spain.,Joint BSC-IRB-CRG Program in Computational Biology, Parc Científic de Barcelona , C/ Baldiri Reixac 10, Barcelona E-08028, Spain.,Institució Catalana de Recerca i Estudis Avançats (ICREA) , Passeig de Lluís Companys 23, Barcelona E-08010, Spain
| |
Collapse
|
2
|
Wedler HB, Pemberton RP, Lounnas V, Vriend G, Tantillo DJ, Wang SC. Quantum chemical study of the isomerization of 24-methylenecycloartanol, a potential marker of olive oil refining. J Mol Model 2015; 21:111. [PMID: 25860110 DOI: 10.1007/s00894-015-2652-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2014] [Accepted: 03/16/2015] [Indexed: 11/29/2022]
Abstract
Quantum chemical calculations on the isomerization of 24-methylenecycloartanol are described. An energetically viable mechanism, with a rate-determining protonation step, is proposed. This rearrangement may find applicability in tests for determining if an olive oil has been refined.
Collapse
Affiliation(s)
- Henry B Wedler
- Department of Chemistry, University of California-Davis, Davis, CA, 95616, USA
| | | | | | | | | | | |
Collapse
|
3
|
Lounnas V, Wedler HB, Newman T, Schaftenaar G, Harrison JG, Nepomuceno G, Pemberton R, Tantillo DJ, Vriend G. Visually impaired researchers get their hands on quantum chemistry: application to a computational study on the isomerization of a sterol. J Comput Aided Mol Des 2014; 28:1057-67. [PMID: 25091066 DOI: 10.1007/s10822-014-9782-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2014] [Accepted: 07/04/2014] [Indexed: 11/29/2022]
Abstract
In molecular sciences, articles tend to revolve around 2D representations of 3D molecules, and sighted scientists often resort to 3D virtual reality software to study these molecules in detail. Blind and visually impaired (BVI) molecular scientists have access to a series of audio devices that can help them read the text in articles and work with computers. Reading articles published in this journal, though, is nearly impossible for them because they need to generate mental 3D images of molecules, but the article-reading software cannot do that for them. We have previously designed AsteriX, a web server that fully automatically decomposes articles, detects 2D plots of low molecular weight molecules, removes meta data and annotations from these plots, and converts them into 3D atomic coordinates. AsteriX-BVI goes one step further and converts the 3D representation into a 3D printable, haptic-enhanced format that includes Braille annotations. These Braille-annotated physical 3D models allow BVI scientists to generate a complete mental model of the molecule. AsteriX-BVI uses Molden to convert the meta data of quantum chemistry experiments into BVI friendly formats so that the entire line of scientific information that sighted people take for granted-from published articles, via printed results of computational chemistry experiments, to 3D models-is now available to BVI scientists too. The possibilities offered by AsteriX-BVI are illustrated by a project on the isomerization of a sterol, executed by the blind co-author of this article (HBW).
Collapse
Affiliation(s)
- Valère Lounnas
- CMBI Radboudumc, Geert Grooteplein 26-28, 6525 GA, Nijmegen, The Netherlands,
| | | | | | | | | | | | | | | | | |
Collapse
|
4
|
Frasconi P, Gabbrielli F, Lippi M, Marinai S. Markov logic networks for optical chemical structure recognition. J Chem Inf Model 2014; 54:2380-90. [PMID: 25068386 DOI: 10.1021/ci5002197] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Optical chemical structure recognition is the problem of converting a bitmap image containing a chemical structure formula into a standard structured representation of the molecule. We introduce a novel approach to this problem based on the pipelined integration of pattern recognition techniques with probabilistic knowledge representation and reasoning. Basic entities and relations (such as textual elements, points, lines, etc.) are first extracted by a low-level processing module. A probabilistic reasoning engine based on Markov logic, embodying chemical and graphical knowledge, is subsequently used to refine these pieces of information. An annotated connection table of atoms and bonds is finally assembled and converted into a standard chemical exchange format. We report a successful evaluation on two large image data sets, showing that the method compares favorably with the current state-of-the-art, especially on degraded low-resolution images. The system is available as a web server at http://mlocsr.dinfo.unifi.it.
Collapse
Affiliation(s)
- Paolo Frasconi
- Dipartimento di Ingegneria dell'Informazione, Università degli Studi di Firenze , Via di Santa Marta, 3, 50139 Firenze, Italy
| | | | | | | |
Collapse
|
5
|
Gurulingappa H, Mudi A, Toldo L, Hofmann-Apitius M, Bhate J. Challenges in mining the literature for chemical information. RSC Adv 2013. [DOI: 10.1039/c3ra40787j] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
|