1. Chaki J. An automatic system for extracting figure-caption pair from medical documents: a six-fold approach. PeerJ Comput Sci 2023; 9:e1452. [PMID: 37547417; PMCID: PMC10403167; DOI: 10.7717/peerj-cs.1452]
Abstract
Background Figures and captions in medical documentation contain important information. As a result, researchers are becoming more interested in obtaining published medical figures from medical papers and utilizing the captions as a knowledge source. Methods This work introduces a unique and successful six-fold methodology for extracting figure-caption pairs. The A-torus wavelet transform is first used to retrieve edges from the scanned page. Then, using the maximally stable extremal regions (MSER) connected-component feature, text and graphical contents are isolated from the edge document, and a multi-layer perceptron is used to detect and retrieve figures and captions from medical records. The figure-caption pair is then extracted using the bounding-box approach. The files containing the figures and captions are saved separately and supplied to the end user as the output. The proposed approach is evaluated using a self-created database based on pages collected from five open-access books: "Brain and Human Body Modelling 2021" by Sergey Makarov, Gregory Noetscher and Aapo Nummenmaa; "Healthcare and Disease Burden in Africa" by Ilha Niohuru; "All-Optical Methods to Study Neuronal Function" by Eirini Papagiakoumou; "RNA, the Epicenter of Genetic Information" by John Mattick and Paulo Amaral; and "Illustrated Manual of Pediatric Dermatology" by Susan Bayliss Mallory, Alanna Bree and Peggy Chern. Results Experiments comparing the new method to earlier systems reveal a significant increase in performance, demonstrating the suggested technique's robustness and efficiency.
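A minimal sketch of the text/graphics separation step is shown below, using OpenCV's MSER detector as a stand-in for the paper's MSER connected-component feature; the wavelet edge extraction and multi-layer perceptron stages are omitted, and the size/aspect-ratio heuristics are illustrative assumptions rather than the paper's trained rules.

```python
# Sketch: separate text-like regions from graphical regions with MSER.
# The thresholds below are illustrative assumptions, not the paper's rules.
import cv2

def split_text_and_graphics(page_image_path):
    gray = cv2.imread(page_image_path, cv2.IMREAD_GRAYSCALE)
    mser = cv2.MSER_create()
    _, bboxes = mser.detectRegions(gray)  # bounding boxes of stable regions
    text_boxes, graphic_boxes = [], []
    for (x, y, w, h) in bboxes:
        aspect = w / float(h)
        # Hypothetical heuristic: small, word-shaped regions are text;
        # everything else is treated as graphical content.
        if h < 40 and 0.2 < aspect < 15:
            text_boxes.append((x, y, w, h))
        else:
            graphic_boxes.append((x, y, w, h))
    return text_boxes, graphic_boxes
```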
Affiliation(s)
- Jyotismita Chaki
- Department of Computational Intelligence, School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
2. Yamamoto S, Lauscher A, Ponzetto SP, Glavaš G, Morishima S. Visual Summary Identification From Scientific Publications via Self-Supervised Learning. Front Res Metr Anal 2021; 6:719004. [PMID: 34490413; PMCID: PMC8418328; DOI: 10.3389/frma.2021.719004]
Abstract
The exponential growth of scientific literature yields the need to support users to both effectively and efficiently analyze and understand the ever-growing body of research work. This exploratory process can be facilitated by providing graphical abstracts, i.e., visual summaries of scientific publications. Accordingly, previous work recently presented an initial study on automatically identifying a central figure in a scientific publication, to be used as the publication's visual summary. That study, however, was limited to a single (biomedical) domain. This is primarily because the current state of the art relies on supervised machine learning and thus on the existence of large amounts of labeled data: until now, the only annotated data set covered biomedical publications. In this work, we build a novel benchmark data set for visual summary identification from scientific publications, consisting of papers presented at conferences from several areas of computer science. We couple this contribution with a new self-supervised learning approach that learns a heuristic matching of in-text references to figures with figure captions. Our self-supervised pre-training, executed on a large unlabeled collection of publications, attenuates the need for large annotated data sets for visual summary identification and facilitates domain transfer for this task. We evaluate our self-supervised pre-training for visual summary identification on both the existing biomedical data set and our newly presented computer science data set. The experimental results suggest that the proposed method is able to outperform the previous state of the art without any task-specific annotations.
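The heuristic-matching idea lends itself to a compact sketch. Below, in-text references such as "Figure 3" are matched to figure captions, and the most frequently referenced figure serves as a noisy pseudo-label for visual summary identification; the regex and the most-referenced rule are simplifying assumptions, not the authors' exact heuristic.

```python
# Sketch: derive a noisy "central figure" pseudo-label from in-text
# figure references, as a stand-in for the paper's heuristic matching.
import re
from collections import Counter

FIG_REF = re.compile(r"\b(?:Figure|Fig\.?)\s*(\d+)", re.IGNORECASE)

def pseudo_label_central_figure(body_text, captions):
    """captions: dict mapping figure number -> caption text."""
    counts = Counter(int(n) for n in FIG_REF.findall(body_text))
    # Keep only references that resolve to an actual caption.
    counts = Counter({n: c for n, c in counts.items() if n in captions})
    if not counts:
        return None
    central, _ = counts.most_common(1)[0]
    return central, captions[central]
```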
Affiliation(s)
- Shintaro Yamamoto
- Department of Pure and Applied Physics, Waseda University, Tokyo, Japan
- Anne Lauscher
- Data and Web Science Group, University of Mannheim, Mannheim, Germany
- Goran Glavaš
- Data and Web Science Group, University of Mannheim, Mannheim, Germany
- Shigeo Morishima
- Waseda Research Institute for Science and Engineering, Tokyo, Japan
3. Li P, Jiang X, Shatkay H. Figure and caption extraction from biomedical documents. Bioinformatics 2019; 35:4381-4388. [PMID: 30949681; PMCID: PMC6821181; DOI: 10.1093/bioinformatics/btz228]
Abstract
Motivation Figures and captions convey essential information in biomedical documents. As such, there is a growing interest in mining published biomedical figures and in utilizing their respective captions as a source of knowledge. Notably, an essential step underlying such mining is the extraction of figures and captions from publications. While several PDF parsing tools that extract information from such documents are publicly available, they attempt to identify images by analyzing the PDF encoding and structure and the complex graphical objects embedded within. As such, they often incorrectly identify figures and captions in scientific publications, whose structure is often non-trivial. The extraction of figures, captions and figure-caption pairs from biomedical publications is thus neither well-studied nor yet well-addressed. Results We introduce a new and effective system for figure and caption extraction, PDFigCapX. Unlike existing methods, we first separate text from graphical contents, and then utilize layout information to effectively detect and extract figures and captions. We generate files containing the figures and their associated captions and provide those as output to the end-user. We test our system both over a public dataset of computer science documents previously used by others, and over two newly collected sets of publications focusing on the biomedical domain. Our experiments and results comparing PDFigCapX to other state-of-the-art systems show a significant improvement in performance, and demonstrate the effectiveness and robustness of our approach. Availability and implementation Our system is publicly available for use at: https://www.eecis.udel.edu/~compbio/PDFigCapX. The two new datasets are available at: https://www.eecis.udel.edu/~compbio/PDFigCapX/Downloads
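A minimal sketch of the layout-driven idea is given below, using PyMuPDF as a stand-in for PDFigCapX's own parsing stack: text blocks are separated from image blocks on each page, and each image is paired with the nearest caption-like text block beneath it. The proximity rule is an illustrative assumption.

```python
# Sketch: layout-based figure-caption pairing with PyMuPDF,
# not the PDFigCapX implementation itself.
import fitz  # PyMuPDF

def extract_figure_caption_pairs(pdf_path):
    pairs = []
    for page in fitz.open(pdf_path):
        blocks = page.get_text("blocks")  # (x0, y0, x1, y1, text, no, type)
        images = [b for b in blocks if b[6] == 1]      # image blocks
        captions = [b for b in blocks                  # caption-like text
                    if b[6] == 0 and b[4].lstrip().startswith(("Figure", "Fig."))]
        for img in images:
            below = [c for c in captions if c[1] >= img[3]]  # below the figure
            if below:
                cap = min(below, key=lambda c: c[1] - img[3])  # nearest one
                pairs.append((page.number, img[:4], cap[4].strip()))
    return pairs
```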
Affiliation(s)
- Pengyuan Li
- Department of Computer and Information Sciences, University of Delaware, Newark, DE, USA
- Xiangying Jiang
- Department of Computer and Information Sciences, University of Delaware, Newark, DE, USA
- Hagit Shatkay
- Department of Computer and Information Sciences, University of Delaware, Newark, DE, USA
4. Brown P, Zhou Y. Large expert-curated database for benchmarking document similarity detection in biomedical literature search. Database (Oxford) 2019; 2019:baz085. [PMID: 33326193; PMCID: PMC7291946; DOI: 10.1093/database/baz085]
Abstract
Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.
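One of the baselines above, TF-IDF similarity, can be sketched in a few lines with scikit-learn. This is a generic reimplementation for illustration, not the consortium's evaluation code.

```python
# Sketch: rank candidate abstracts against a seed abstract by
# TF-IDF cosine similarity, one of the baseline methods named above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_by_tfidf(seed_abstract, candidate_abstracts):
    vec = TfidfVectorizer(stop_words="english")
    matrix = vec.fit_transform([seed_abstract] + candidate_abstracts)
    sims = cosine_similarity(matrix[0], matrix[1:]).ravel()
    order = sims.argsort()[::-1]  # most similar first
    return [(candidate_abstracts[i], float(sims[i])) for i in order]
```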
Affiliation(s)
- Peter Brown
- School of Information and Communication Technology, Griffith University, Gold Coast, QLD 4222, Australia
- Yaoqi Zhou
- School of Information and Communication Technology, Griffith University, Gold Coast, QLD 4222, Australia
- Institute for Glycomics, Griffith University, Gold Coast, QLD 4222, Australia
5. DeTEXT: A Database for Evaluating Text Extraction from Biomedical Literature Figures. PLoS One 2015; 10:e0126200. [PMID: 25951377; PMCID: PMC4423993; DOI: 10.1371/journal.pone.0126200]
Abstract
Hundreds of millions of figures are available in biomedical literature, representing important biomedical experimental evidence. Since text is a rich source of information in figures, automatically extracting such text may assist in the task of mining figure information. A high-quality ground truth standard can greatly facilitate the development of an automated system. This article describes DeTEXT: a database for evaluating text extraction from biomedical literature figures. It is the first publicly available, human-annotated, high-quality, and large-scale figure-text dataset, with 288 full-text articles, 500 biomedical figures, and 9308 text regions. This article describes how figures were selected from open-access full-text biomedical articles and how annotation guidelines and annotation tools were developed. We also discuss the inter-annotator agreement and the reliability of the annotations. We summarize the statistics of the DeTEXT data and make available evaluation protocols for DeTEXT. Finally, we lay out challenges we observed in the automated detection and recognition of figure text and discuss research directions in this area. DeTEXT is publicly available for downloading at http://prir.ustb.edu.cn/DeTEXT/.
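A sketch of the kind of detection evaluation such a dataset enables is shown below: predicted text regions are matched to ground-truth boxes by intersection-over-union (IoU). The 0.5 threshold is a common convention assumed here, not necessarily the DeTEXT protocol.

```python
# Sketch: IoU-based matching of predicted text regions to ground truth.
def iou(a, b):
    """Boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def region_recall(pred_boxes, gt_boxes, threshold=0.5):
    # A ground-truth region counts as detected if any prediction
    # overlaps it with IoU at or above the threshold.
    hits = sum(any(iou(g, p) >= threshold for p in pred_boxes) for g in gt_boxes)
    return hits / len(gt_boxes) if gt_boxes else 0.0
```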
6.
Abstract
Hundreds of millions of figures are available in the biomedical literature, representing important experimental evidence. This ever-increasing volume has made it difficult for scientists to effectively and accurately access figures of interest, a process crucial for validating research facts and for formulating or testing novel research hypotheses. Current figure search applications cannot fully meet this challenge, as the "bag of figures" assumption does not take into account the relationships among figures. In our previous study, hundreds of biomedical researchers annotated articles for which they served as corresponding authors, ranking each figure in their paper by its importance at their discretion, a task referred to as "figure ranking". Using this collection of annotated data, we investigated computational approaches to automatically rank figures. We exploited and extended state-of-the-art listwise learning-to-rank algorithms and developed a new supervised-learning model, BioFigRank. Cross-validation results show that BioFigRank yielded the best performance compared with other state-of-the-art computational models, and that greedy feature selection can further boost ranking performance significantly. Furthermore, we evaluated BioFigRank against three levels of domain-specific human experts: (1) First Author; (2) Non-Author In-Domain Expert, who is neither the author nor a co-author of an article but works in the same field as its corresponding author; and (3) Non-Author Out-Domain Expert, who is neither the author nor a co-author of an article and may or may not work in the same field as its corresponding author. Our results show that BioFigRank outperforms the Non-Author Out-Domain Expert and performs as well as the Non-Author In-Domain Expert. Although BioFigRank underperforms the First Author, since most biomedical researchers are either in- or out-domain experts for a given article, we conclude that BioFigRank represents an artificial intelligence system that offers expert-level intelligence to help biomedical researchers navigate increasingly proliferating big data efficiently.
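A minimal sketch of the listwise learning-to-rank idea underlying models like BioFigRank is given below: a ListNet-style top-one cross-entropy between the softmax of ground-truth relevance and the softmax of predicted scores for the figures of one article. Feature extraction and the training loop are omitted; this is not the authors' implementation.

```python
# Sketch: ListNet-style top-one listwise loss over one article's figures.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def listnet_loss(true_relevance, predicted_scores):
    """Both arrays score all figures of a single article."""
    p_true = softmax(np.asarray(true_relevance, dtype=float))
    p_pred = softmax(np.asarray(predicted_scores, dtype=float))
    return -np.sum(p_true * np.log(p_pred + 1e-12))
```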
Affiliation(s)
- Feifan Liu
- Department of Computer Science, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, United States of America
- Hong Yu
- Department of Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
- VA Central Western Massachusetts, Northampton, Massachusetts, United States of America
7. Bockhorst JP, Conroy JM, Agarwal S, O’Leary DP, Yu H. Beyond captions: linking figures with abstract sentences in biomedical articles. PLoS One 2012; 7:e39618. [PMID: 22815711; PMCID: PMC3399876; DOI: 10.1371/journal.pone.0039618]
Abstract
Although figures in scientific articles have high information content and concisely communicate many key research findings, they are currently underutilized by literature search and retrieval systems. Many systems ignore figures, and those that do not typically consider only caption text. This study describes and evaluates a fully automated approach for associating figures in the body of a biomedical article with sentences in its abstract. We use supervised methods to learn probabilistic language models, hidden Markov models, and conditional random fields for predicting associations between abstract sentences and figures. Three kinds of evidence are used: text in abstract sentences and figures, relative positions of sentences and figures, and the patterns of sentence/figure associations across an article. Each information source is shown to have predictive value, and models that use all kinds of evidence are more accurate than models that do not. Our most accurate method has an F1-score of 69% in a cross-validation experiment, is competitive with the accuracy of human experts, has significantly better predictive accuracy than state-of-the-art methods, and enables users to access figures associated with an abstract sentence with an average of 1.82 fewer mouse clicks. A user evaluation shows that human users find our system beneficial. The system is available at http://FigureItOut.askHERMES.org.
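Two of the evidence types above, text overlap and relative position, can be sketched as a simple pairwise scorer. The linear mix with weight 0.5 is an illustrative assumption, and the paper's sequence models (HMMs, conditional random fields) are omitted.

```python
# Sketch: link each abstract sentence to the best-scoring figure using
# caption text overlap plus relative-position agreement.
def jaccard(a, b):
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def link_sentences_to_figures(sentences, captions, w=0.5):
    links = []
    for i, sent in enumerate(sentences):
        rel_s = i / max(len(sentences) - 1, 1)  # position in abstract
        def score(j_cap):
            j, cap = j_cap
            rel_f = j / max(len(captions) - 1, 1)  # position in article
            return w * jaccard(sent, cap) + (1 - w) * (1 - abs(rel_s - rel_f))
        best_j, _ = max(enumerate(captions), key=score)
        links.append((i, best_j))
    return links
```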
Affiliation(s)
- Joseph P. Bockhorst
- Department of Computer Science, University of Wisconsin–Milwaukee, Milwaukee, Wisconsin, United States of America
- John M. Conroy
- IDA/Center for Computing Sciences, Bowie, Maryland, United States of America
- Shashank Agarwal
- Department of Health Sciences, University of Wisconsin–Milwaukee, Milwaukee, Wisconsin, United States of America
- Dianne P. O’Leary
- Computer Science Department and UMIACS, University of Maryland, College Park, Maryland, United States of America
- Hong Yu
- Department of Computer Science, University of Wisconsin–Milwaukee, Milwaukee, Wisconsin, United States of America
- Department of Health Sciences, University of Wisconsin–Milwaukee, Milwaukee, Wisconsin, United States of America
8. Automatic figure classification in bioscience literature. J Biomed Inform 2011; 44:848-58. [PMID: 21645638; DOI: 10.1016/j.jbi.2011.05.003]
Abstract
Millions of figures appear in biomedical articles, and it is important to develop an intelligent figure search engine to return relevant figures based on user entries. In this study we report a figure classifier that automatically classifies biomedical figures into five predefined figure types: Gel-image, Image-of-thing, Graph, Model, and Mix. The classifier explored rich image features and integrated them with text features. We performed feature selection and explored different classification models, including a rule-based figure classifier, a supervised machine-learning classifier, and a multi-model classifier that integrates the first two. Our results show that feature selection improved figure classification and that the novel image features we introduced performed best among the image features we examined. Our results also show that integrating text and image features achieved better performance than using either individually. The best system is a multi-model classifier combining the rule-based hierarchical classifier and a support vector machine (SVM) based classifier, achieving a 76.7% F1-score for five-type classification. We demonstrated our system at http://figureclassification.askhermes.org/.
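The multi-model idea can be sketched as a high-precision rule applied first, with a trained SVM over combined text and image features as the fallback. The single rule shown is a hypothetical stand-in for the paper's rule hierarchy.

```python
# Sketch: rule-first, SVM-fallback figure classification.
import numpy as np
from sklearn.svm import SVC

def make_classifier(X_train, y_train):
    svm = SVC(kernel="rbf").fit(X_train, y_train)

    def classify(features, caption):
        # Hypothetical high-precision rule; the paper's actual
        # rule hierarchy is not reproduced here.
        if "gel" in caption.lower():
            return "Gel-image"
        return svm.predict(np.asarray(features).reshape(1, -1))[0]

    return classify
```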
9. Prasad R, McRoy S, Frid N, Joshi A, Yu H. The biomedical discourse relation bank. BMC Bioinformatics 2011; 12:188. [PMID: 21605399; PMCID: PMC3130691; DOI: 10.1186/1471-2105-12-188]
Abstract
Background Identification of discourse relations, such as causal and contrastive relations, between situations mentioned in text is an important task for biomedical text-mining. A biomedical text corpus annotated with discourse relations would be very useful for developing and evaluating methods for biomedical discourse processing. However, little effort has been made to develop such an annotated resource. Results We have developed the Biomedical Discourse Relation Bank (BioDRB), in which we have annotated explicit and implicit discourse relations in 24 open-access full-text biomedical articles from the GENIA corpus. Guidelines for the annotation were adapted from the Penn Discourse TreeBank (PDTB), which has discourse relations annotated over open-domain news articles. We introduced new conventions and modifications to the sense classification. We report reliable inter-annotator agreement of over 80% for all sub-tasks. Experiments for identifying the sense of explicit discourse connectives show the connective itself as a highly reliable indicator for coarse sense classification (accuracy 90.9% and F1 score 0.89). These results are comparable to results obtained with the same classifier on the PDTB data. With more refined sense classification, there is degradation in performance (accuracy 69.2% and F1 score 0.28), mainly due to sparsity in the data. The size of the corpus was found to be sufficient for identifying the sense of explicit connectives, with classifier performance stabilizing at about 1900 training instances. Finally, the classifier performs poorly when trained on PDTB and tested on BioDRB (accuracy 54.5% and F1 score 0.57). Conclusion Our work shows that discourse relations can be reliably annotated in biomedical text. Coarse sense disambiguation of explicit connectives can be done with high reliability by using just the connective as a feature, but more refined sense classification requires either richer features or more annotated data. The poor performance of a classifier trained in the open domain and tested in the biomedical domain suggests significant differences in the semantic usage of connectives across these domains, and provides robust evidence for a biomedical sublanguage for discourse and the need to develop a specialized biomedical discourse annotated corpus. The results of our cross-domain experiments are consistent with related work on identifying connectives in BioDRB.
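The finding that the connective alone supports coarse sense classification suggests a very small model; the sketch below uses the connective string as the only feature. The label names and training interface are illustrative assumptions, not the paper's setup.

```python
# Sketch: coarse discourse-sense classifier with the connective
# string as the sole feature, per the finding reported above.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_sense_classifier(connectives, senses):
    """connectives: e.g. ['because', 'but']; senses: coarse labels."""
    model = make_pipeline(
        DictVectorizer(),
        LogisticRegression(max_iter=1000),
    )
    model.fit([{"connective": c.lower()} for c in connectives], senses)
    return model

# Usage (labels are PDTB-style coarse senses, assumed for illustration):
# clf = train_sense_classifier(["because", "but"], ["Contingency", "Comparison"])
```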
Affiliation(s)
- Rashmi Prasad
- Institute for Research in Cognitive Science, University of Pennsylvania, 3401 Walnut Street, Philadelphia, PA 19104, USA
10.
Abstract
BACKGROUND Figures are ubiquitous in biomedical full-text articles, and they represent important biomedical knowledge. However, the sheer volume of biomedical publications has made it necessary to develop computational approaches for accessing figures. Therefore, we are developing the Biomedical Figure Search engine (http://figuresearch.askHERMES.org) to allow bioscientists to access figures efficiently. Since text frequently appears in figures, automatically extracting such text may assist the task of mining information from figures. Little research, however, has explored text extraction from biomedical figures. METHODOLOGY We first evaluated an off-the-shelf Optical Character Recognition (OCR) tool on its ability to extract text from figures appearing in biomedical full-text articles. We then developed a Figure Text Extraction Tool (FigTExT) to improve the performance of the OCR tool through three components: image preprocessing, character recognition, and text correction. We first developed image preprocessing to enhance image quality and improve text localization. We then applied the off-the-shelf OCR tool to the improved text localizations for character recognition. Finally, we developed and evaluated a novel text correction framework that takes advantage of figure-specific lexicons. RESULTS/CONCLUSIONS Evaluation on 382 figures (9,643 figure texts in total) randomly selected from PubMed Central full-text articles shows that FigTExT performed with 84% precision, 98% recall, and 90% F1-score for text localization, and with 62.5% precision, 51.0% recall and 56.2% F1-score for figure text extraction. When limiting figure texts to those judged by domain experts to be important content, FigTExT performed with 87.3% precision, 68.8% recall, and 77% F1-score. FigTExT significantly improved the performance of the off-the-shelf OCR tool we used, which on its own performed with 36.6% precision, 19.3% recall, and 25.3% F1-score for text extraction. In addition, our results show that FigTExT can extract texts that do not appear in figure captions or other associated text, further suggesting the potential utility of FigTExT for improving figure search.
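The three-stage pipeline can be sketched with off-the-shelf components: Tesseract (via pytesseract) standing in for the OCR tool, and difflib fuzzy matching standing in for the figure-specific lexicon correction. Both substitutions are assumptions, not the FigTExT implementation.

```python
# Sketch: preprocess -> OCR -> lexicon-based correction.
import difflib
import cv2
import pytesseract

def extract_figure_text(figure_path, lexicon):
    gray = cv2.imread(figure_path, cv2.IMREAD_GRAYSCALE)
    # Preprocessing: upscale and binarize to improve OCR on small labels.
    gray = cv2.resize(gray, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    raw = pytesseract.image_to_string(binary)
    # Correction: snap each token to the closest lexicon entry, if any.
    corrected = []
    for token in raw.split():
        match = difflib.get_close_matches(token, lexicon, n=1, cutoff=0.8)
        corrected.append(match[0] if match else token)
    return " ".join(corrected)
```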
Affiliation(s)
- Daehyun Kim
- Department of Health Sciences, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, United States of America