Reference Citation Analysis

For: Vroling B, Thorne D, McDermott P, Attwood TK, Vriend G, Pettifer S. Integrating GPCR-specific information with full text articles. BMC Bioinformatics 2011;12:362. [PMID: 21910883 PMCID: PMC3179973 DOI: 10.1186/1471-2105-12-362] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 06/28/2011] [Accepted: 09/12/2011] [Indexed: 11/29/2022] [Open Access]

Cited by Other Article(s)

van den Bergh T, Tamo G, Nobili A, Tao Y, Tan T, Bornscheuer UT, Kuipers RKP, Vroling B, de Jong RM, Subramanian K, Schaap PJ, Desmet T, Nidetzky B, Vriend G, Joosten HJ. CorNet: Assigning function to networks of co-evolving residues by automated literature mining. PLoS One 2017;12:e0176427. [PMID: 28545124 PMCID: PMC5436653 DOI: 10.1371/journal.pone.0176427] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 05/15/2016] [Accepted: 12/12/2016] [Indexed: 12/30/2022] [Open Access]

Isberg V, de Graaf C, Bortolato A, Cherezov V, Katritch V, Marshall FH, Mordalski S, Pin JP, Stevens RC, Vriend G, Gloriam DE. Generic GPCR residue numbers - aligning topology maps while minding the gaps. Trends Pharmacol Sci 2014;36:22-31. [PMID: 25541108 DOI: 10.1016/j.tips.2014.11.001] [Citation(s) in RCA: 326] [Impact Index Per Article: 32.6] [Received: 09/26/2014] [Revised: 11/05/2014] [Accepted: 11/07/2014] [Indexed: 12/31/2022]

van der Kant R, Vriend G. Alpha-bulges in G protein-coupled receptors. Int J Mol Sci 2014;15:7841-64. [PMID: 24806342 PMCID: PMC4057707 DOI: 10.3390/ijms15057841] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 01/20/2014] [Revised: 04/02/2014] [Accepted: 04/09/2014] [Indexed: 12/31/2022] [Open Access]

Isberg V, Vroling B, van der Kant R, Li K, Vriend G, Gloriam D. GPCRDB: an information system for G protein-coupled receptors. Nucleic Acids Res 2013;42:D422-5. [PMID: 24304901 PMCID: PMC3965068 DOI: 10.1093/nar/gkt1255] [Citation(s) in RCA: 87] [Impact Index Per Article: 7.9] [Indexed: 11/25/2022] [Open Access]

Eriksson M, Nilsson I, Kogej T, Southan C, Johansson M, Tyrchan C, Muresan S, Blomberg N, Bjäreland M. SARConnect: A Tool to Interrogate the Connectivity Between Proteins, Chemical Structures and Activity Data. Mol Inform 2012;31:555-568. [PMID: 23308082 PMCID: PMC3535785 DOI: 10.1002/minf.201200030] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 03/29/2012] [Accepted: 04/14/2012] [Indexed: 11/21/2022]

Layout-aware text extraction from full-text PDF of scientific articles. Source Code Biol Med 2012;7:7. [PMID: 22640904 PMCID: PMC3441580 DOI: 10.1186/1751-0473-7-7] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.3] [Received: 04/24/2012] [Accepted: 05/28/2012] [Indexed: 11/17/2022]

Abstract

Background

The Portable Document Format (PDF) is the most commonly used file format for online scientific publications. The absence of effective means to extract text from these PDF files in a layout-aware manner presents a significant challenge for developers of biomedical text mining or biocuration informatics systems that use published literature as an information source. In this paper we introduce the ‘Layout-Aware PDF Text Extraction’ (LA-PDFText) system to facilitate accurate extraction of text from PDF files of research articles for use in text mining applications.

Results

Our paper describes the construction and performance of an open source system that extracts text blocks from PDF-formatted full-text research articles and classifies them into logical units based on rules that characterize specific sections. The LA-PDFText system focuses only on the textual content of the research articles and is meant as a baseline for further experiments into more advanced extraction methods that handle multi-modal content, such as images and graphs. The system works in a three-stage process: (1) detecting contiguous text blocks using spatial layout processing, (2) classifying text blocks into rhetorical categories using a rule-based method, and (3) stitching classified text blocks together in the correct order, so that text is extracted from blocks grouped by section. We show that our system can identify text blocks and classify them into rhetorical categories with precision = 0.96, recall = 0.89 and F1 = 0.91. We also present an evaluation of the accuracy of the block detection algorithm used in step 1. Additionally, we have compared the accuracy of the text extracted by LA-PDFText to the text from the Open Access subset of PubMed Central. We then compared this accuracy with that of the text extracted by the PDF2Text system, which is commonly used to extract text from PDF files. Finally, we discuss preliminary error analysis for our system and identify further areas of improvement.
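The three stages described above can be illustrated with a minimal Python sketch. This is not the LA-PDFText implementation: the names Word, Block, detect_blocks, classify_blocks and stitch, the max_gap threshold, and the heading list are all hypothetical, and the sketch assumes a single-column page where each word already has known coordinates.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    x: float  # left edge of the word
    y: float  # top edge of its line (values grow down the page)

@dataclass
class Block:
    words: list
    label: str = "body"
    section: str = "front matter"

    @property
    def text(self):
        return " ".join(w.text for w in self.words)

def detect_blocks(words, max_gap=15.0):
    """Stage 1: group words into contiguous blocks.

    Words are read top to bottom, left to right. A vertical jump larger
    than max_gap (an assumed threshold) starts a new block.
    """
    blocks = []
    for w in sorted(words, key=lambda w: (w.y, w.x)):
        if blocks and w.y - blocks[-1].words[-1].y <= max_gap:
            blocks[-1].words.append(w)
        else:
            blocks.append(Block([w]))
    return blocks

# Assumed heading vocabulary; a real rule set would be much richer.
SECTION_HEADINGS = {"background", "results", "conclusions"}

def classify_blocks(blocks):
    """Stage 2: rule-based labelling of each block.

    A block whose text is exactly a known heading is labelled "heading";
    every other block is "body" and inherits the most recent heading.
    """
    current = "front matter"
    for b in blocks:
        key = b.text.strip().lower()
        if key in SECTION_HEADINGS:
            b.label = "heading"
            current = key
        else:
            b.label = "body"
        b.section = current
    return blocks

def stitch(blocks):
    """Stage 3: join body blocks in reading order, grouped by section."""
    sections = {}
    for b in blocks:
        if b.label == "body":
            sections.setdefault(b.section, []).append(b.text)
    return {name: " ".join(parts) for name, parts in sections.items()}

if __name__ == "__main__":
    page = [
        Word("Background", 0, 0),
        Word("PDF", 0, 30), Word("is", 20, 30), Word("common.", 30, 30),
        Word("Results", 0, 60),
        Word("It", 0, 90), Word("works.", 15, 90),
    ]
    print(stitch(classify_blocks(detect_blocks(page))))
```

Keeping the stages as separate functions mirrors the description: block detection depends only on geometry, classification only on block text, and stitching only on the assigned labels, so each stage can be evaluated on its own.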

Conclusions

LA-PDFText is an open-source tool for accurately extracting text from full-text scientific articles. The release of the system is available at http://code.google.com/p/lapdftext/.

Seddon G, Lounnas V, McGuire R, van den Bergh T, Bywater RP, Oliveira L, Vriend G. Drug design for ever, from hype to hope. J Comput Aided Mol Des 2012;26:137-50. [PMID: 22252446 PMCID: PMC3268973 DOI: 10.1007/s10822-011-9519-9] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 11/22/2011] [Accepted: 12/05/2011] [Indexed: 01/28/2023]