1
Oliveira Dos Santos Á, Sergio da Silva E, Machado Couto L, Valadares Labanca Reis G, Silva Belo V. The use of artificial intelligence for automating or semi-automating biomedical literature analyses: a scoping review. J Biomed Inform 2023;142:104389. PMID: 37187321. DOI: 10.1016/j.jbi.2023.104389.
Abstract
OBJECTIVE Evidence-based medicine (EBM) is a decision-making process based on the conscious and judicious use of the best available scientific evidence. However, the exponential increase in the amount of information currently available likely exceeds the capacity of human-only analysis. In this context, artificial intelligence (AI) and its branches such as machine learning (ML) can be used to facilitate human efforts in analyzing the literature to foster EBM. The present scoping review aimed to examine the use of AI in the automation of biomedical literature survey and analysis with a view to establishing the state-of-the-art and identifying knowledge gaps. MATERIALS AND METHODS Comprehensive searches of the main databases were performed for articles published up to June 2022 and studies were selected according to inclusion and exclusion criteria. Data were extracted from the included articles and the findings categorized. RESULTS The total number of records retrieved from the databases was 12,145, of which 273 were included in the review. Classification of the studies according to the use of AI in evaluating the biomedical literature revealed three main application groups, namely assembly of scientific evidence (n=127; 47%), mining the biomedical literature (n=112; 41%) and quality analysis (n=34; 12%). Most studies addressed the preparation of systematic reviews, while articles focusing on the development of guidelines and evidence synthesis were the least frequent. The biggest knowledge gap was identified within the quality analysis group, particularly regarding methods and tools that assess the strength of recommendation and consistency of evidence. 
CONCLUSION Our review shows that, despite significant progress in the automation of biomedical literature surveys and analyses in recent years, intense research is needed to fill knowledge gaps on more difficult aspects of ML, deep learning and natural language processing, and to consolidate the use of automation by end-users (biomedical researchers and healthcare professionals).
Affiliation(s)
- Eduardo Sergio da Silva
- Federal University of São João del-Rei, Campus Centro-Oeste Dona Lindu, Divinópolis, Minas Gerais, Brazil.
- Letícia Machado Couto
- Federal University of São João del-Rei, Campus Centro-Oeste Dona Lindu, Divinópolis, Minas Gerais, Brazil.
- Vinícius Silva Belo
- Federal University of São João del-Rei, Campus Centro-Oeste Dona Lindu, Divinópolis, Minas Gerais, Brazil.
2
Mallick C, Das AK, Nayak J, Pelusi D, Vimal S. Evolutionary Algorithm based Ensemble Extractive Summarization for Developing Smart Medical System. Interdiscip Sci 2021;13:229-259. PMID: 33576956. DOI: 10.1007/s12539-020-00412-5.
Abstract
The amount of information in the scientific literature of the biomedical domain is growing exponentially, which makes developing a smart medical system difficult. Summarization techniques help with efficient searching and understanding of relevant information in medical documents. In this paper, an evolutionary algorithm based ensemble extractive summarization technique is devised as a smart medical application, combining hybrid artificial intelligence with natural language processing. We consider the abstracts of the target article and its cited articles as the base summaries, and a multi-objective evolutionary algorithm is applied to generate the ensemble summary of the target article. Each sentence of the base summaries is represented by a concept vector of the medical terms it contains, obtained with the Unified Medical Language System (UMLS), which is widely used in smart medical applications. These terms carry the key information of the sentence, which is very useful for finding the semantic similarity among sentences. The fitness functions of the evolutionary algorithm are mainly defined using the clustering coefficient and the sparsity index, two concepts from graph theory. After the algorithm converges, the best solution in the final population gives the ensemble summary. Next, the semantic similarity of each sentence in the target article with the ensemble summary is calculated, and the sentences most similar to the ensemble summary are taken as the summary of the target article. The method is applied to articles available in the PubMed MEDLINE database, and experimental results are compared with some state-of-the-art methods used in the biomedical domain.
Experimental results and a comparative performance evaluation show that the method competes with some recently proposed summarization methods and outperforms others, demonstrating the effectiveness of the proposed methodology. Statistical tests also show that the improvements are statistically significant.
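The final selection step described above, ranking the target article's sentences by similarity to the ensemble summary, can be sketched with cosine similarity over concept vectors. This is a toy illustration: the binary vectors and the concept vocabulary are invented, not produced by UMLS.

```python
# Sketch: score sentences against an ensemble summary by concept-vector
# cosine similarity, then pick the most similar sentence.
from math import sqrt

def cosine(u, v):
    # Cosine similarity between two equal-length numeric vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Binary concept vectors over a toy vocabulary of five medical terms.
ensemble = [1, 1, 0, 1, 0]            # concepts present in the ensemble summary
sentences = {
    "s1": [1, 1, 0, 0, 0],
    "s2": [0, 0, 1, 0, 1],
    "s3": [1, 1, 0, 1, 0],
}
ranked = sorted(sentences, key=lambda s: cosine(sentences[s], ensemble), reverse=True)
print(ranked[0])  # → s3 (shares all three ensemble concepts)
```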
Affiliation(s)
- Chirantana Mallick
- Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, Howrah, 711103, India.
- Asit Kumar Das
- Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, Howrah, 711103, India.
- Janmenjoy Nayak
- Department of Computer Science and Engineering, Aditya Institute of Technology and Management (AITAM), Tekkali, Andhra Pradesh, 532201, India.
- Danilo Pelusi
- Department of Communications Sciences, University of Teramo, Teramo, Italy.
- S Vimal
- Department of Information Technology, National Engineering College, K.R.Nagar, Kovilpatti, Thoothukudi District, Tamilnadu, 628503, India.
3
Dexter PR, Grout RW, Embi PJ. Transforming primary medical research knowledge into clinical decision. AMIA Annu Symp Proc 2021;2020:358-362. PMID: 33936408. PMCID: PMC8075430.
Abstract
While the utility of computerized clinical decision support (CCDS) for multiple select clinical domains has been clearly demonstrated, much less is known about the full breadth of domains to which CCDS approaches could be productively applied. To explore the applicability of CCDS to general medical knowledge, we sampled a total of 500 primary research articles from 4 high-impact medical journals. Employing rule-based templates, we created high-level CCDS rules for 72% (361/500) of primary medical research articles. We subsequently identified data sources needed to implement those rules. Our findings suggest that CCDS approaches, perhaps in the form of non-interruptive infobuttons, could be much more broadly applied. In addition, our analytic methods appear to provide a means of prioritizing and quantitating the relative utility of available data sources for purposes of CCDS.
Affiliation(s)
- Paul R Dexter
- Regenstrief Institute, Inc., Indianapolis, IN
- Indiana University School of Medicine, Indianapolis, IN
- Randall W Grout
- Regenstrief Institute, Inc., Indianapolis, IN
- Indiana University School of Medicine, Indianapolis, IN
- Eskenazi Health, Indianapolis, IN
- Peter J Embi
- Regenstrief Institute, Inc., Indianapolis, IN
- Indiana University School of Medicine, Indianapolis, IN
4
Lee EK, Uppal K. CERC: an interactive content extraction, recognition, and construction tool for clinical and biomedical text. BMC Med Inform Decis Mak 2020;20:306. PMID: 33323109. PMCID: PMC7739454. DOI: 10.1186/s12911-020-01330-8.
Abstract
BACKGROUND Automated summarization of scientific literature and patient records is essential for enhancing clinical decision-making and facilitating precision medicine. Most existing summarization methods are based on single indicators of relevance, offer limited capabilities for information visualization, and do not account for user specific interests. In this work, we develop an interactive content extraction, recognition, and construction system (CERC) that combines machine learning and visualization techniques with domain knowledge for highlighting and extracting salient information from clinical and biomedical text. METHODS A novel sentence-ranking framework multi indicator text summarization, MINTS, is developed for extractive summarization. MINTS uses random forests and multiple indicators of importance for relevance evaluation and ranking of sentences. Indicative summarization is performed using weighted term frequency-inverse document frequency scores of over-represented domain-specific terms. A controlled vocabulary dictionary generated using MeSH, SNOMED-CT, and PubTator is used for determining relevant terms. 35 full-text CRAFT articles were used as the training set. The performance of the MINTS algorithm is evaluated on a test set consisting of the remaining 32 full-text CRAFT articles and 30 clinical case reports using the ROUGE toolkit. RESULTS The random forests model classified sentences as "good" or "bad" with 87.5% accuracy on the test set. Summarization results from the MINTS algorithm achieved higher ROUGE-1, ROUGE-2, and ROUGE-SU4 scores when compared to methods based on single indicators such as term frequency distribution, position, eigenvector centrality (LexRank), and random selection, p < 0.01. 
The automatic language translator and the customizable information extraction and pre-processing pipeline for EHR demonstrate that CERC can readily be incorporated within clinical decision support systems to improve quality of care and assist in data-driven and evidence-based informed decision making for direct patient care. CONCLUSIONS We have developed a web-based summarization and visualization tool, CERC ( https://newton.isye.gatech.edu/CERC1/ ), for extracting salient information from clinical and biomedical text. The system ranks sentences by relevance and includes features that can facilitate early detection of medical risks in a clinical setting. The interactive interface allows users to filter content and edit/save summaries. The evaluation results on two test corpuses show that the newly developed MINTS algorithm outperforms methods based on single characteristics of importance.
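The ROUGE scores reported above are based on n-gram overlap between a candidate summary and a reference. A minimal sketch of ROUGE-1 recall follows; the actual evaluation used the full ROUGE toolkit, which also handles stemming, stop-word options, and multiple references. The example sentences are invented.

```python
# Minimal ROUGE-1 recall: fraction of reference unigrams that also
# appear in the candidate (with clipped counts).
from collections import Counter

def rouge1_recall(candidate, reference):
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / sum(ref.values())

ref = "the model ranks sentences by relevance"
cand = "sentences are ranked by relevance"
# 3 of the 6 reference unigrams appear in the candidate.
print(rouge1_recall(cand, ref))  # → 0.5
```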
Affiliation(s)
- Eva K Lee
- Center for Operations Research in Medicine and HealthCare, School of Industrial and Systems Engineering, School of Biological Sciences, Georgia Institute of Technology, Atlanta, USA.
- Karan Uppal
- School of Medicine, Emory University, Atlanta, GA, USA.
5
Zhitomirsky-Geffet M, Bergman O, Hilel S. Towards a wider perspective in the social sciences using a network of variables based on thousands of results. Scientometrics 2020. DOI: 10.1007/s11192-020-03446-0.
6
Kilicoglu H, Rosemblat G, Fiszman M, Shin D. Broad-coverage biomedical relation extraction with SemRep. BMC Bioinformatics 2020;21:188. PMID: 32410573. PMCID: PMC7222583. DOI: 10.1186/s12859-020-3517-7.
Abstract
BACKGROUND In the era of information overload, natural language processing (NLP) techniques are increasingly needed to support advanced biomedical information management and discovery applications. In this paper, we present an in-depth description of SemRep, an NLP system that extracts semantic relations from PubMed abstracts using linguistic principles and UMLS domain knowledge. We also evaluate SemRep on two datasets. In one evaluation, we use a manually annotated test collection and perform a comprehensive error analysis. In another evaluation, we assess SemRep's performance on the CDR dataset, a standard benchmark corpus annotated with causal chemical-disease relationships. RESULTS A strict evaluation of SemRep on our manually annotated dataset yields 0.55 precision, 0.34 recall, and 0.42 F1 score. A relaxed evaluation, which more accurately characterizes SemRep performance, yields 0.69 precision, 0.42 recall, and 0.52 F1 score. An error analysis reveals named entity recognition/normalization as the largest source of errors (26.9%), followed by argument identification (14%) and trigger detection errors (12.5%). The evaluation on the CDR corpus yields 0.90 precision, 0.24 recall, and 0.38 F1 score. The recall and the F1 score increase to 0.35 and 0.50, respectively, when the evaluation on this corpus is limited to sentence-bound relationships, which represents a fairer evaluation, as SemRep operates at the sentence level. CONCLUSIONS SemRep is a broad-coverage, interpretable, strong baseline system for extracting semantic relations from biomedical text. It also underpins SemMedDB, a literature-scale knowledge graph based on semantic relations. Through SemMedDB, SemRep has had significant impact in the scientific community, supporting a variety of clinical and translational applications, including clinical decision making, medical diagnosis, drug repurposing, literature-based discovery and hypothesis generation, and contributing to improved health outcomes.
In ongoing development, we are redesigning SemRep to increase its modularity and flexibility, and addressing weaknesses identified in the error analysis.
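The F1 values reported above are the standard harmonic mean of precision and recall; a quick sanity check against the strict and relaxed evaluation numbers:

```python
# F1 as the harmonic mean of precision (p) and recall (r).
def f1(p, r):
    return 2 * p * r / (p + r)

print(round(f1(0.55, 0.34), 2))  # strict evaluation → 0.42
print(round(f1(0.69, 0.42), 2))  # relaxed evaluation → 0.52
```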
Affiliation(s)
- Halil Kilicoglu
- Lister Hill National Center for Biomedical Communications, National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA
- University of Illinois at Urbana-Champaign, School of Information Sciences, 501 E Daniel Street, Champaign, IL 61820, USA
- Graciela Rosemblat
- Lister Hill National Center for Biomedical Communications, National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Dongwook Shin
- Lister Hill National Center for Biomedical Communications, National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA
7
Chen YP, Chen YY, Lin JJ, Huang CH, Lai F. Modified Bidirectional Encoder Representations From Transformers Extractive Summarization Model for Hospital Information Systems Based on Character-Level Tokens (AlphaBERT): Development and Performance Evaluation. JMIR Med Inform 2020;8:e17787. PMID: 32347806. PMCID: PMC7221648. DOI: 10.2196/17787.
Abstract
Background Doctors must care for many patients simultaneously, and it is time-consuming to find and examine all patients’ medical histories. Discharge diagnoses provide hospital staff with sufficient information to enable handling multiple patients; however, the excessive amount of words in the diagnostic sentences poses problems. Deep learning may be an effective solution to overcome this problem, but the use of such a heavy model may also add another obstacle to systems with limited computing resources. Objective We aimed to build a diagnoses-extractive summarization model for hospital information systems and provide a service that can be operated even with limited computing resources. Methods We used a Bidirectional Encoder Representations from Transformers (BERT)-based structure with a two-stage training method based on 258,050 discharge diagnoses obtained from the National Taiwan University Hospital Integrated Medical Database, and the highlighted extractive summaries written by experienced doctors were labeled. The model size was reduced using a character-level token, the number of parameters was decreased from 108,523,714 to 963,496, and the model was pretrained using random mask characters in the discharge diagnoses and International Statistical Classification of Diseases and Related Health Problems sets. We then fine-tuned the model using summary labels and cleaned up the prediction results by averaging all probabilities for entire words to prevent character level–induced fragment words. Model performance was evaluated against existing models BERT, BioBERT, and Long Short-Term Memory (LSTM) using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) L score, and a questionnaire website was built to collect feedback from more doctors for each summary proposal. 
Results The area under the receiver operating characteristic curve values of the summary proposals were 0.928, 0.941, 0.899, and 0.947 for BERT, BioBERT, LSTM, and the proposed model (AlphaBERT), respectively. The ROUGE-L scores were 0.697, 0.711, 0.648, and 0.693 for BERT, BioBERT, LSTM, and AlphaBERT, respectively. The mean (SD) critique scores from doctors were 2.232 (0.832), 2.134 (0.877), 2.207 (0.844), 1.927 (0.910), and 2.126 (0.874) for reference-by-doctor labels, BERT, BioBERT, LSTM, and AlphaBERT, respectively. Based on the paired t test, there was a statistically significant difference in LSTM compared to the reference (P<.001), BERT (P=.001), BioBERT (P<.001), and AlphaBERT (P=.002), but not in the other models. Conclusions Use of character-level tokens in a BERT model can greatly decrease the model size without significantly reducing performance for diagnoses summarization. A well-developed deep-learning model will enhance doctors’ abilities to manage patients and promote medical studies by providing the capability to use extensive unstructured free-text notes.
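Much of the parameter reduction described above comes from shrinking the embedding table, which scales with vocabulary size times hidden dimension. The sketch below illustrates that scaling only; aside from BERT-base's well-known ~30K WordPiece vocabulary and 768-dimensional hidden size, the numbers are illustrative, not AlphaBERT's actual configuration.

```python
# Token-embedding parameter count: one hidden-dim vector per vocabulary entry.
def embedding_params(vocab_size, hidden_dim):
    return vocab_size * hidden_dim

wordpiece = embedding_params(30522, 768)  # BERT-base WordPiece embedding table
chars = embedding_params(500, 64)         # small character vocabulary, narrow model
print(wordpiece, chars)  # → 23440896 32000
```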
Affiliation(s)
- Yen-Pin Chen
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei City, Taiwan; Department of Emergency Medicine, National Taiwan University Hospital Chu-Tung Branch, Hsinchu County, Taiwan; Department of Emergency Medicine, National Taiwan University Hospital, Taipei City, Taiwan
- Yi-Ying Chen
- Department of Emergency Medicine, National Taiwan University Hospital, Taipei City, Taiwan
- Jr-Jiun Lin
- Department of Emergency Medicine, National Taiwan University Hospital, Taipei City, Taiwan
- Chien-Hua Huang
- Department of Emergency Medicine, National Taiwan University Hospital, Taipei City, Taiwan; Department of Emergency Medicine, College of Medicine, National Taiwan University, Taipei City, Taiwan
- Feipei Lai
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei City, Taiwan; Department of Computer Science & Information Engineering, National Taiwan University, Taipei City, Taiwan; Department of Electrical Engineering, National Taiwan University, Taipei City, Taiwan
8
Natural language processing applications in library and information science. Online Information Review 2019. DOI: 10.1108/oir-07-2018-0217.
Abstract
Purpose
With the recent developments in information technologies, natural language processing (NLP) practices have made tasks in many areas easier and more practical. Nowadays, especially when big data are used in most research, NLP provides fast and easy methods for processing these data. The purpose of this paper is to identify subfields of library and information science (LIS) where NLP can be used and to provide a guide based on bibliometrics and social network analyses for researchers who intend to study this subject.
Design/methodology/approach
Within the scope of this study, 6,607 publications, including NLP methods published in the field of LIS, are examined and visualized by social network analysis methods.
Findings
After evaluating the obtained results, the subject categories of publications, frequently used keywords in these publications and the relationships between these words are revealed. Finally, the core journals and articles are classified thematically for researchers working in the field of LIS and planning to apply NLP in their research.
Originality/value
The results of this paper draw a general framework for LIS field and guides researchers on new techniques that may be useful in the field.
9
Guo J, Blake C, Guan Y. Evaluating automated entity extraction with respect to drug and non-drug treatment strategies. J Biomed Inform 2019;94:103177. PMID: 30986506. DOI: 10.1016/j.jbi.2019.103177.
Abstract
OBJECTIVES Treatment used in a randomized clinical trial is a critical data element both for physicians at the point of care and reviewers who are evaluating different interventions. Much of existing work on treatment extraction from the biomedical literature has focused on the extraction of pharmacological interventions. However, non-pharmacological interventions (e.g., exercise, diet, etc.) that are frequently used to address chronic conditions are less well studied. The goal of this study is to compare knowledge-based and machine learning strategies for the extraction of both drug and non-drug treatments. METHODS We collected 800 randomized clinical trial abstracts each for breast cancer and diabetes from PubMed. The treatments in the result/conclusion sentences of the abstracts were manually annotated and marked as drug/non-drug treatments. We then designed three methods to identify the treatments and evaluated the systems with respect to drug/non-drug treatments. The first method is solely based on knowledge base (here we used MetaMap). The second method is based on a machine learning model trained mainly on contextual features (ML_only). The third method is a combination approach that integrates the previous two approaches. RESULTS/DISCUSSION Results show that MetaMap, when used with high precision semantic types, has better performance for drug compared to non-drug treatments (F1 = 0.77 vs. 0.64). The ML_only approach has smaller performance difference between drug and non-drug treatments compared with the KB-based approach (F1 = 0.02 vs. 0.05, 0.07, and 0.13). The combination approach achieves significantly better performance than all MetaMap approaches alone for total treatments (F1 = 0.76 vs. 0.72, p < 0.001). The performance gain mainly comes from the non-drug treatments (0.03-0.08 improvement in F1), while the drug treatments do not benefit much from the combination approach (0-0.03 improvement in F1). 
CONCLUSION These results suggest that a knowledge-based approach should be employed for medical conditions that are primarily treated with drugs whereas conditions that are treated with either a combination of drug and non-drug interventions or primarily non-drug interventions should use automated tools that combine machine learning and a knowledge-based approach to achieve optimal performance.
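The combination approach can be sketched as a disjunction of a lexicon lookup and a classifier score: accept a candidate treatment mention if either component flags it. The lexicon, scores, and threshold below are invented stand-ins, not MetaMap or the paper's trained model.

```python
# Sketch: combine a knowledge-based matcher with an ML classifier score.
def kb_match(term):
    # Stand-in for a knowledge-base lookup (e.g. a drug lexicon).
    drug_lexicon = {"tamoxifen", "metformin"}
    return term in drug_lexicon

def ml_score(term):
    # Stand-in for a trained contextual classifier's confidence.
    scores = {"tamoxifen": 0.9, "exercise": 0.8, "placebo": 0.2}
    return scores.get(term, 0.0)

def is_treatment(term, threshold=0.5):
    # The KB catches lexicon terms (drugs); the ML model catches
    # contextual mentions (non-drug interventions like exercise).
    return kb_match(term) or ml_score(term) >= threshold

print([t for t in ["tamoxifen", "exercise", "placebo"] if is_treatment(t)])
# → ['tamoxifen', 'exercise']
```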
Affiliation(s)
- Jinlong Guo
- School of Information Sciences, University of Illinois at Urbana-Champaign, USA.
- Catherine Blake
- School of Information Sciences, University of Illinois at Urbana-Champaign, USA; Department of Computer Science, University of Illinois at Urbana-Champaign, USA.
- Yingjun Guan
- School of Information Sciences, University of Illinois at Urbana-Champaign, USA.
10
Moradi M. CIBS: A biomedical text summarizer using topic-based sentence clustering. J Biomed Inform 2018;88:53-61. DOI: 10.1016/j.jbi.2018.11.006.
11
Scarton LA, Wang L, Kilicoglu H, Jahries M, Del Fiol G. Expanding vocabularies for complementary and alternative medicine therapies. Int J Med Inform 2018;121:64-74. PMID: 30545491. DOI: 10.1016/j.ijmedinf.2018.11.009.
Abstract
OBJECTIVE There is a significant consumer demand for complementary and alternative medicine (CAM) therapies as possible alternatives to drugs in the treatment and prevention of chronic diseases. Expanding controlled vocabularies to include CAM treatment relations could help meet those needs by facilitating information retrieval from the published literature. The purpose of this study is to design and evaluate two methods to semi-automatically extract CAM treatment-related semantic predications (subject-predicate-object triplets) from the biomedical literature using the Semantic Medline database (SemMedDB). METHODS Predications were retrieved from SemMedDB, a database of semantic predications extracted from article abstracts available in Medline. Predications were retrieved for 20 biologically-based and 3 mind-body CAM therapies. The first method (allMedline) retrieved predications from any Medline citation, while the second method (soundStudies) only retrieved predications from scientifically sound clinical studies. Filtering criteria were applied to identify the predications focusing on the treatment and prevention of medical disorders using various CAM modalities. The disorders were extracted for each CAM therapy and ranked by occurrence. A reference vocabulary, composed of 20 biologically-based and 3 mind-body CAM therapies, was developed to evaluate the performance of each method according to precision and recall of the top 100 ranked concepts as well as average precision and recall. RESULTS The difference between allMedline and soundStudies in terms of median precision for the top 100 concepts ranked by occurrence was significant (21.0% versus 27.0%, p < .001). The soundStudies method had significantly higher precision (7.0% vs 11.5%, p < .001) and the allMedline had significantly higher recall (37.1% vs 25.6%, p < .001). 
CONCLUSION The soundStudies method may be useful for extracting treatment-related predications from the biomedical literature for the highest ranked concepts. Additional work is needed to improve the algorithm as well as identify and report shortcomings for future enhancements of the tools used to populate SemMedDB.
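The top-100 evaluation above uses precision at a rank cutoff k; a minimal sketch follows, with k=3 and an invented ranking and reference set for brevity (the paper uses k=100 against its reference vocabulary).

```python
# Precision at k: fraction of the top-k ranked concepts that are relevant.
def precision_at_k(ranked, relevant, k):
    top = ranked[:k]
    return sum(1 for c in top if c in relevant) / k

ranked = ["echinacea", "aspirin", "yoga", "kale"]   # concepts by occurrence
relevant = {"echinacea", "yoga", "meditation"}      # reference vocabulary
print(precision_at_k(ranked, relevant, 3))  # 2 of the top 3 are relevant
```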
Affiliation(s)
- Lou Ann Scarton
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT 84108, USA.
- Liqin Wang
- Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, 1620 Tremont St, Boston, MA 02120, USA; Harvard Medical School, A-111, 25 Shattuck Street, Boston, MA 02115, USA.
- Halil Kilicoglu
- Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA.
- Margaret Jahries
- Gateway Emerging Technology Wellness Center, 440 West 200 South, Suite 250, Salt Lake City, Utah 84101, USA.
- Guilherme Del Fiol
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT 84108, USA.
12
Nasr Azadani M, Ghadiri N, Davoodijam E. Graph-based biomedical text summarization: An itemset mining and sentence clustering approach. J Biomed Inform 2018;84:42-58. DOI: 10.1016/j.jbi.2018.06.005.
13
14
Garcia-Gathright JI, Matiasz NJ, Adame C, Sarma KV, Sauer L, Smedley NF, Spiegel ML, Strunck J, Garon EB, Taira RK, Aberle DR, Bui AAT. Evaluating Casama: Contextualized semantic maps for summarization of lung cancer studies. Comput Biol Med 2018;92:55-63. PMID: 29149658. PMCID: PMC5762403. DOI: 10.1016/j.compbiomed.2017.10.034.
Abstract
OBJECTIVE It is crucial for clinicians to stay up to date on current literature in order to apply recent evidence to clinical decision making. Automatic summarization systems can help clinicians quickly view an aggregated summary of literature on a topic. Casama, a representation and summarization system based on "contextualized semantic maps," captures the findings of biomedical studies as well as the contexts associated with patient population and study design. This paper presents a user-oriented evaluation of Casama in comparison to a context-free representation, SemRep. MATERIALS AND METHODS The effectiveness of the representation was evaluated by presenting users with manually annotated Casama and SemRep summaries of ten articles on driver mutations in cancer. Automatic annotations were evaluated on a collection of articles on EGFR mutation in lung cancer. Seven users completed a questionnaire rating the summarization quality for various topics and applications. RESULTS Casama had higher median scores than SemRep for the majority of the topics (p ≤ 0.00032), all of the applications (p ≤ 0.00089), and in overall summarization quality (p ≤ 1.5e-05). Casama's manual annotations outperformed Casama's automatic annotations (p = 0.00061). DISCUSSION Casama performed particularly well in the representation of strength of evidence, which was highly rated both quantitatively and qualitatively. Users noted that Casama's less granular, more targeted representation improved usability compared to SemRep. CONCLUSION This evaluation demonstrated the benefits of a contextualized representation for summarizing biomedical literature on cancer. Iteration on specific areas of Casama's representation, further development of its algorithms, and a clinically-oriented evaluation are warranted.
Affiliation(s)
- Jean I Garcia-Gathright: University of California, Los Angeles, Department of Bioengineering, 924 Westwood Boulevard, Suite 420, Los Angeles, CA, 90024, USA
- Nicholas J Matiasz: University of California, Los Angeles, Department of Bioengineering, 924 Westwood Boulevard, Suite 420, Los Angeles, CA, 90024, USA
- Carlos Adame: University of California, Los Angeles, Department of Medicine - Division of Hematology-Oncology, 924 Westwood Boulevard, Suite 200, Los Angeles, CA, 90024, USA
- Karthik V Sarma: University of California, Los Angeles, Department of Bioengineering, 924 Westwood Boulevard, Suite 420, Los Angeles, CA, 90024, USA
- Lauren Sauer: University of California, Los Angeles, Department of Medicine - Division of Hematology-Oncology, 924 Westwood Boulevard, Suite 200, Los Angeles, CA, 90024, USA
- Nova F Smedley: University of California, Los Angeles, Department of Bioengineering, 924 Westwood Boulevard, Suite 420, Los Angeles, CA, 90024, USA
- Marshall L Spiegel: University of California, Los Angeles, Department of Medicine - Division of Hematology-Oncology, 924 Westwood Boulevard, Suite 200, Los Angeles, CA, 90024, USA
- Jennifer Strunck: University of California, Los Angeles, Department of Medicine - Division of Hematology-Oncology, 924 Westwood Boulevard, Suite 200, Los Angeles, CA, 90024, USA
- Edward B Garon: University of California, Los Angeles, Department of Medicine - Division of Hematology-Oncology, 924 Westwood Boulevard, Suite 200, Los Angeles, CA, 90024, USA
- Ricky K Taira: University of California, Los Angeles, Department of Bioengineering, 924 Westwood Boulevard, Suite 420, Los Angeles, CA, 90024, USA; University of California, Los Angeles, Department of Radiological Sciences, 924 Westwood Boulevard, Suite 420, Los Angeles, CA, 90024, USA
- Denise R Aberle: University of California, Los Angeles, Department of Bioengineering, 924 Westwood Boulevard, Suite 420, Los Angeles, CA, 90024, USA; University of California, Los Angeles, Department of Radiological Sciences, 924 Westwood Boulevard, Suite 420, Los Angeles, CA, 90024, USA
- Alex A T Bui: University of California, Los Angeles, Department of Bioengineering, 924 Westwood Boulevard, Suite 420, Los Angeles, CA, 90024, USA; University of California, Los Angeles, Department of Radiological Sciences, 924 Westwood Boulevard, Suite 420, Los Angeles, CA, 90024, USA
15
Moradi M, Ghadiri N. Different approaches for identifying important concepts in probabilistic biomedical text summarization. Artif Intell Med 2018; 84:101-116. [DOI: 10.1016/j.artmed.2017.11.004] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2016] [Revised: 08/25/2017] [Accepted: 11/28/2017] [Indexed: 10/18/2022]
16
17
Krallinger M, Rabal O, Lourenço A, Oyarzabal J, Valencia A. Information Retrieval and Text Mining Technologies for Chemistry. Chem Rev 2017; 117:7673-7761. [PMID: 28475312 DOI: 10.1021/acs.chemrev.6b00851] [Citation(s) in RCA: 111] [Impact Index Per Article: 15.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023]
Abstract
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
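As a rough, hypothetical illustration of the chemical entity recognition step discussed in this entry (real CHEMDNER-style systems rely on trained sequence models and curated lexicons), a minimal dictionary-plus-suffix tagger might look like the following; the lexicon and suffix list here are invented for the example:

```python
import re

# Tiny illustrative dictionary of trivial chemical names (not a real lexicon).
LEXICON = {"aspirin", "ibuprofen", "benzene"}
# Crude suffix heuristic for systematic-looking names (illustrative only).
SUFFIX = re.compile(r"\w+(?:ol|ene|ide|ate|ine)", re.IGNORECASE)

def tag_chemicals(text):
    """Return candidate chemical mentions in order of appearance."""
    found = []
    for token in re.findall(r"\b[\w-]+\b", text):
        if token.lower() in LEXICON or SUFFIX.fullmatch(token):
            found.append(token)
    return found
```

A trained named-entity recognizer would replace both the dictionary lookup and the suffix heuristic; this sketch only shows where such a component sits in a retrieval pipeline.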
Affiliation(s)
- Martin Krallinger: Structural Computational Biology Group, Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre, C/Melchor Fernández Almagro 3, Madrid E-28029, Spain
- Obdulia Rabal: Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Avenida Pio XII 55, Pamplona E-31008, Spain
- Anália Lourenço: ESEI - Department of Computer Science, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, Ourense E-32004, Spain; Centro de Investigaciones Biomédicas (Centro Singular de Investigación de Galicia), Campus Universitario Lagoas-Marcosende, Vigo E-36310, Spain; CEB-Centre of Biological Engineering, University of Minho, Campus de Gualtar, Braga 4710-057, Portugal
- Julen Oyarzabal: Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Avenida Pio XII 55, Pamplona E-31008, Spain
- Alfonso Valencia: Life Science Department, Barcelona Supercomputing Centre (BSC-CNS), C/Jordi Girona, 29-31, Barcelona E-08034, Spain; Joint BSC-IRB-CRG Program in Computational Biology, Parc Científic de Barcelona, C/Baldiri Reixac 10, Barcelona E-08028, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig de Lluís Companys 23, Barcelona E-08010, Spain
18
Karaa WBA, Ashour AS, Sassi DB, Roy P, Kausar N, Dey N. MEDLINE Text Mining: An Enhancement Genetic Algorithm Based Approach for Document Clustering. INTELLIGENT SYSTEMS REFERENCE LIBRARY 2016. [DOI: 10.1007/978-3-319-21212-8_12] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/11/2023]
19
Bravo À, Piñero J, Queralt-Rosinach N, Rautschka M, Furlong LI. Extraction of relations between genes and diseases from text and large-scale data analysis: implications for translational research. BMC Bioinformatics 2015; 16:55. [PMID: 25886734 PMCID: PMC4466840 DOI: 10.1186/s12859-015-0472-9] [Citation(s) in RCA: 116] [Impact Index Per Article: 12.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2014] [Accepted: 01/19/2015] [Indexed: 11/23/2022] Open
Abstract
Background Current biomedical research needs to leverage and exploit the large amount of information reported in scientific publications. Automated text mining approaches, in particular those aimed at finding relationships between entities, are key for identification of actionable knowledge from free text repositories. We present the BeFree system aimed at identifying relationships between biomedical entities with a special focus on genes and their associated diseases. Results By exploiting morpho-syntactic information of the text, BeFree is able to identify gene-disease, drug-disease and drug-target associations with state-of-the-art performance. The application of BeFree to real-case scenarios shows its effectiveness in extracting information relevant for translational research. We show the value of the gene-disease associations extracted by BeFree through a number of analyses and integration with other data sources. BeFree succeeds in identifying genes associated to a major cause of morbidity worldwide, depression, which are not present in other public resources. Moreover, large-scale extraction and analysis of gene-disease associations, and integration with current biomedical knowledge, provided interesting insights on the kind of information that can be found in the literature, and raised challenges regarding data prioritization and curation. We found that only a small proportion of the gene-disease associations discovered by using BeFree is collected in expert-curated databases. Thus, there is a pressing need to find alternative strategies to manual curation, in order to review, prioritize and curate text-mining data and incorporate it into domain-specific databases. We present our strategy for data prioritization and discuss its implications for supporting biomedical research and applications. Conclusions BeFree is a novel text mining system that performs competitively for the identification of gene-disease, drug-disease and drug-target associations. 
Our analyses show that mining only a small fraction of MEDLINE results in a large dataset of gene-disease associations, and only a small proportion of this dataset is actually recorded in curated resources (2%), raising several issues on data prioritization and curation. We propose that joint analysis of text mined data with data curated by experts appears as a suitable approach to both assess data quality and highlight novel and interesting information. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0472-9) contains supplementary material, which is available to authorized users.
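A minimal sketch of the co-occurrence idea that underlies relation extraction at this scale follows; note that BeFree itself exploits morpho-syntactic analysis rather than plain co-occurrence, and the gene and disease dictionaries below are invented for the example:

```python
import re

# Hypothetical mini-dictionaries; real systems use large curated resources.
GENES = {"BDNF", "SLC6A4", "EGFR"}
DISEASES = {"depression", "lung cancer"}

def cooccurring_pairs(abstract):
    """Return (gene, disease) pairs mentioned in the same sentence.

    Naive substring matching on sentence splits; a real extractor would
    use entity recognition plus syntactic analysis of each sentence.
    """
    pairs = set()
    for sentence in re.split(r"(?<=[.!?])\s+", abstract):
        genes = {g for g in GENES if g in sentence}
        diseases = {d for d in DISEASES if d.lower() in sentence.lower()}
        pairs.update((g, d) for g in genes for d in diseases)
    return pairs
```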
Affiliation(s)
- Àlex Bravo: Research Programme on Biomedical Informatics (GRIB), IMIM, DCEXS, Universitat Pompeu Fabra, Barcelona, Spain
- Janet Piñero: Research Programme on Biomedical Informatics (GRIB), IMIM, DCEXS, Universitat Pompeu Fabra, Barcelona, Spain
- Núria Queralt-Rosinach: Research Programme on Biomedical Informatics (GRIB), IMIM, DCEXS, Universitat Pompeu Fabra, Barcelona, Spain
- Michael Rautschka: Research Programme on Biomedical Informatics (GRIB), IMIM, DCEXS, Universitat Pompeu Fabra, Barcelona, Spain
- Laura I Furlong: Research Programme on Biomedical Informatics (GRIB), IMIM, DCEXS, Universitat Pompeu Fabra, Barcelona, Spain
20
Alhindi A, Kruschwitz U, Fox C, Albakour MD. Profile-Based Summarisation for Web Site Navigation. ACM T INFORM SYST 2015. [DOI: 10.1145/2699661] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
Abstract
Information systems that utilise contextual information have the potential of helping a user identify relevant information more quickly and more accurately than systems that work the same for all users and contexts. Contextual information comes in a variety of types, often derived from records of past interactions between a user and the information system. It can be individual or group based. We are focusing on the latter, harnessing the search behaviour of cohorts of users, turning it into a domain model that can then be used to assist other users of the same cohort. More specifically, we aim to explore how such a domain model is best utilised for profile-biased summarisation of documents in a navigation scenario in which such summaries can be displayed as hover text as a user moves the mouse over a link. The main motivation is to help a user find relevant documents more quickly. Given the fact that the Web in general has been studied extensively already, we focus our attention on Web sites and similar document collections. Such collections can be notoriously difficult to search or explore. The process of acquiring the domain model is not a research interest here; we simply adopt a biologically inspired method that resembles the idea of ant colony optimisation. This has been shown to work well in a variety of application areas. The model can be built in a continuous learning cycle that exploits search patterns as recorded in typical query log files. Our research explores different summarisation techniques, some of which use the domain model and some that do not. We perform task-based evaluations of these different techniques (and thus of the impact of the domain model and profile-biased summarisation) in the context of Web site navigation.
Affiliation(s)
- Chris Fox: University of Essex, Colchester, United Kingdom
21
Cohen AM, Smalheiser NR, McDonagh MS, Yu C, Adams CE, Davis JM, Yu PS. Automated confidence ranked classification of randomized controlled trial articles: an aid to evidence-based medicine. J Am Med Inform Assoc 2015; 22:707-17. [PMID: 25656516 PMCID: PMC4457112 DOI: 10.1093/jamia/ocu025] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2014] [Accepted: 11/15/2014] [Indexed: 11/24/2022] Open
Abstract
Objective: For many literature review tasks, including systematic review (SR) and other aspects of evidence-based medicine, it is important to know whether an article describes a randomized controlled trial (RCT). Current manual annotation is not complete or flexible enough for the SR process. In this work, highly accurate machine learning predictive models were built that include confidence predictions of whether an article is an RCT. Materials and Methods: The LibSVM classifier was used with forward selection of potential feature sets on a large human-related subset of MEDLINE to create a classification model requiring only the citation, abstract, and MeSH terms for each article. Results: The model achieved an area under the receiver operating characteristic curve of 0.973 and mean squared error of 0.013 on the held out year 2011 data. Accurate confidence estimates were confirmed on a manually reviewed set of test articles. A second model not requiring MeSH terms was also created, and performs almost as well. Discussion: Both models accurately rank and predict article RCT confidence. Using the model and the manually reviewed samples, it is estimated that about 8000 (3%) additional RCTs can be identified in MEDLINE, and that 5% of articles tagged as RCTs in Medline may not be identified. Conclusion: Retagging human-related studies with a continuously valued RCT confidence is potentially more useful for article ranking and review than a simple yes/no prediction. The automated RCT tagging tool should offer significant savings of time and effort during the process of writing SRs, and is a key component of a multistep text mining pipeline that we are building to streamline SR workflow. In addition, the model may be useful for identifying errors in MEDLINE publication types. 
The RCT confidence predictions described here have been made available to users as a web service with a user query form front end at: http://arrowsmith.psych.uic.edu/cgi-bin/arrowsmith_uic/RCT_Tagger.cgi.
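The confidence-ranked classification idea can be sketched with a simple multinomial naive Bayes scorer standing in for the authors' LibSVM model; the training snippets below are invented, and a real model would be trained on full MEDLINE citations, abstracts, and MeSH terms:

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Fit a multinomial naive Bayes model; labels are 1 (RCT) or 0 (non-RCT)."""
    counts = {0: Counter(), 1: Counter()}
    for doc, y in zip(docs, labels):
        counts[y].update(doc.lower().split())
    n = Counter(labels)
    priors = {y: n[y] / len(labels) for y in (0, 1)}
    return priors, counts

def rct_confidence(model, doc):
    """Return a P(RCT | doc)-style confidence score, with Laplace smoothing."""
    priors, counts = model
    vocab = set(counts[0]) | set(counts[1])
    logp = {}
    for y in (0, 1):
        total = sum(counts[y].values())
        logp[y] = math.log(priors[y])
        for w in doc.lower().split():
            logp[y] += math.log((counts[y][w] + 1) / (total + len(vocab)))
    m = max(logp.values())  # shift for numerical stability before exponentiating
    pos, neg = math.exp(logp[1] - m), math.exp(logp[0] - m)
    return pos / (pos + neg)
```

The continuous score, rather than a yes/no label, is what enables the confidence ranking the paper argues for.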
Affiliation(s)
- Aaron M Cohen: Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR 97239, USA
- Neil R Smalheiser: Department of Psychiatry, University of Illinois at Chicago, Chicago, IL 60612, USA
- Marian S McDonagh: Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR 97239, USA
- Clement Yu: Department of Computer Science, University of Illinois at Chicago, Chicago, IL 60612, USA
- Clive E Adams: Division of Psychiatry, University of Nottingham, Nottingham, UK
- John M Davis: Department of Psychiatry, University of Illinois at Chicago, Chicago, IL 60612, USA
- Philip S Yu: Department of Computer Science, University of Illinois at Chicago, Chicago, IL 60612, USA
22
Mishra R, Bian J, Fiszman M, Weir CR, Jonnalagadda S, Mostafa J, Del Fiol G. Text summarization in the biomedical domain: a systematic review of recent research. J Biomed Inform 2014; 52:457-67. [PMID: 25016293 DOI: 10.1016/j.jbi.2014.06.009] [Citation(s) in RCA: 83] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2014] [Revised: 06/04/2014] [Accepted: 06/23/2014] [Indexed: 11/27/2022]
Abstract
OBJECTIVE The amount of information for clinicians and clinical researchers is growing exponentially. Text summarization reduces information as an attempt to enable users to find and understand relevant source texts more quickly and effortlessly. In recent years, substantial research has been conducted to develop and evaluate various summarization techniques in the biomedical domain. The goal of this study was to systematically review recently published research on summarization of textual documents in the biomedical domain. MATERIALS AND METHODS MEDLINE (2000 to October 2013), IEEE Digital Library, and the ACM digital library were searched. Investigators independently screened and abstracted studies that examined text summarization techniques in the biomedical domain. Information was extracted from selected articles along five dimensions: input, purpose, output, method and evaluation. RESULTS Of 10,786 studies retrieved, 34 (0.3%) met the inclusion criteria. Natural language processing (17; 50%) and a hybrid technique comprising statistical, natural language processing and machine learning methods (15; 44%) were the most common summarization approaches. Most studies (28; 82%) conducted an intrinsic evaluation. DISCUSSION This is the first systematic review of text summarization in the biomedical domain. The study identified research gaps and provides recommendations for guiding future research on biomedical text summarization. CONCLUSION Recent research has focused on a hybrid technique comprising statistical, language processing and machine learning techniques. Further research is needed on the application and evaluation of text summarization in real research or patient care settings.
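For orientation, the "statistical" family of extractive techniques surveyed in reviews like this one reduces to a few lines: score each sentence by the corpus frequency of its content words and keep the top-scoring ones. The stop-word list below is illustrative and not taken from any reviewed system:

```python
import re
from collections import Counter

STOP = {"the", "a", "of", "and", "in", "is", "to", "for"}  # illustrative only

def extractive_summary(text, k=1):
    """Score each sentence by summed corpus term frequency; return top-k."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP]
    freq = Counter(words)
    def score(s):
        return sum(freq[w] for w in re.findall(r"[a-z]+", s.lower()) if w not in STOP)
    return sorted(sentences, key=score, reverse=True)[:k]
```

The hybrid systems the review describes layer NLP (e.g. semantic predications) and machine-learned weights on top of this basic scheme.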
Affiliation(s)
- Rashmi Mishra: Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Jiantao Bian: Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA; Clinical Modeling Team, Intermountain Healthcare, Salt Lake City, UT, USA
- Marcelo Fiszman: Lister Hill Center, National Library of Medicine, Bethesda, MD, USA
- Charlene R Weir: Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA; VA Medical Center, Salt Lake City, UT, USA
- Siddhartha Jonnalagadda: Department of Preventive Medicine-Health and Biomedical Informatics, Northwestern University, Chicago, IL, USA
- Javed Mostafa: School of Information and Library Science (SILS), University of North Carolina, Chapel Hill, NC, USA
- Guilherme Del Fiol: Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
23
Demner-Fushman D, Mork JG, Aronson AR. Mining MEDLINE for problems associated with vitamin D. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2013; 2013:300-308. [PMID: 24551339 PMCID: PMC3900180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
This paper presents a two-step approach to generating comprehensive abstractive overviews for biomedical topics. It starts with a sensitivity-maximizing search of MEDLINE/PubMed and MeSH-based filtering of the results that are then processed using NLP methods to extract relations between entities of interest. We evaluate this approach in a case study based on the IOM report on the role of vitamin D in human health. The report defines disorders that serve as health indicators for the role of vitamin D. We evaluate the abstractive overviews generated using MeSH indexing and the extracted relations using the disorders listed in the IOM report as reference standard. We conclude that MeSH-based aggregation and filtering of the results is a useful and easy step in the generation of abstractive overviews. Although our relation extraction achieved 83.6% recall and 92.8% precision, only half of the disorders of interest participated in these relations.
Affiliation(s)
- Dina Demner-Fushman: Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, National Institutes of Health, DHHS, Bethesda, MD
- James G Mork: Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, National Institutes of Health, DHHS, Bethesda, MD
- Alan R Aronson: Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, National Institutes of Health, DHHS, Bethesda, MD
24
Mishra R, Del Fiol G, Kilicoglu H, Jonnalagadda S, Fiszman M. Automatically extracting clinically useful sentences from UpToDate to support clinicians' information needs. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2013; 2013:987-992. [PMID: 24551389 PMCID: PMC3900230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
UNLABELLED Clinicians raise several information needs in the course of care. Most of these needs can be met by online health knowledge resources such as UpToDate. However, finding relevant information in these resources often requires significant time and cognitive effort. OBJECTIVE To design and assess algorithms for extracting from UpToDate the sentences that represent the most clinically useful information for patient care decision making. METHODS We developed algorithms based on semantic predications extracted with SemRep, a semantic natural language processing parser. Two algorithms were compared against a gold standard composed of UpToDate sentences rated in terms of clinical usefulness. RESULTS Clinically useful sentences were strongly correlated with predication frequency (correlation= 0.95). The two algorithms did not differ in terms of top ten precision (53% vs. 49%; p=0.06). CONCLUSIONS Semantic predications may serve as the basis for extracting clinically useful sentences. Future research is needed to improve the algorithms.
Affiliation(s)
- Rashmi Mishra: Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Guilherme Del Fiol: Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Halil Kilicoglu: Lister Hill Center, National Library of Medicine, Bethesda, MD, USA
- Marcelo Fiszman: Lister Hill Center, National Library of Medicine, Bethesda, MD, USA
25
Development and evaluation of a biomedical search engine using a predicate-based vector space model. J Biomed Inform 2013; 46:929-39. [PMID: 23892296 DOI: 10.1016/j.jbi.2013.07.006] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2013] [Revised: 06/18/2013] [Accepted: 07/19/2013] [Indexed: 11/21/2022]
Abstract
Although biomedical information available in articles and patents is increasing exponentially, we continue to rely on the same information retrieval methods and use very few keywords to search millions of documents. We are developing a fundamentally different approach for finding much more precise and complete information with a single query using predicates instead of keywords for both query and document representation. Predicates are triples that are more complex data structures than keywords and contain more structured information. To make optimal use of them, we developed a new predicate-based vector space model and query-document similarity function with adjusted tf-idf and boost function. Using a test bed of 107,367 PubMed abstracts, we evaluated the first essential function: retrieving information. Cancer researchers provided 20 realistic queries, for which the top 15 abstracts were retrieved using a predicate-based (new) and keyword-based (baseline) approach. Each abstract was evaluated, double-blind, by cancer researchers on a 0-5 point scale to calculate precision (0 versus higher) and relevance (0-5 score). Precision was significantly higher (p<.001) for the predicate-based (80%) than for the keyword-based (71%) approach. Relevance was almost doubled with the predicate-based approach: 2.1 versus 1.6 without rank order adjustment (p<.001) and 1.34 versus 0.98 with rank order adjustment (p<.001) for the predicate- versus keyword-based approach, respectively. Predicates can support more precise searching than keywords, laying the foundation for rich and sophisticated information search.
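The core of a predicate-based vector space model can be sketched by treating each (subject, relation, object) triple as an indexing unit with plain tf-idf weighting; the adjusted tf-idf and boost function of the paper are omitted, and any triples shown are invented:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of documents, each a list of (subject, relation, object)
    triples. Returns one sparse tf-idf vector (dict) per document."""
    df = Counter(t for d in docs for t in set(d))  # document frequency per triple
    n = len(docs)
    vecs = []
    for d in docs:
        tf = Counter(d)
        vecs.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Because whole triples are the vocabulary, two documents only match when they assert the same structured fact, which is what drives the precision gain over bag-of-words keywords.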
26
Zhang H, Fiszman M, Shin D, Wilkowski B, Rindflesch TC. Clustering cliques for graph-based summarization of the biomedical research literature. BMC Bioinformatics 2013; 14:182. [PMID: 23742159 PMCID: PMC3682874 DOI: 10.1186/1471-2105-14-182] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2012] [Accepted: 05/29/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Graph-based notions are increasingly used in biomedical data mining and knowledge discovery tasks. In this paper, we present a clique-clustering method to automatically summarize graphs of semantic predications produced from PubMed citations (titles and abstracts). RESULTS SemRep is used to extract semantic predications from the citations returned by a PubMed search. Cliques were identified from frequently occurring predications with highly connected arguments filtered by degree centrality. Themes contained in the summary were identified with a hierarchical clustering algorithm based on common arguments shared among cliques. The validity of the clusters in the summaries produced was compared to the Silhouette-generated baseline for cohesion, separation and overall validity. The theme labels were also compared to a reference standard produced with major MeSH headings. CONCLUSIONS For 11 topics in the testing data set, the overall validity of clusters from the system summary was 10% better than the baseline (43% versus 33%). While compared to the reference standard from MeSH headings, the results for recall, precision and F-score were 0.64, 0.65, and 0.65 respectively.
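The graph-side mechanics can be sketched in two steps: maximal-clique enumeration, then a greedy merge of cliques that share an argument. SemRep extraction, degree-centrality filtering, and the hierarchical clustering used in the paper are omitted; the graph in any example is invented:

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques of an undirected graph (Bron-Kerbosch).

    adj maps each node to the set of its neighbours."""
    cliques = []
    def bk(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
        for v in list(p):
            bk(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    bk(set(), set(adj), set())
    return cliques

def cluster_by_shared_argument(cliques):
    """Greedy one-pass merge: cliques sharing any argument join one theme."""
    themes = []
    for clique in cliques:
        for theme in themes:
            if theme & clique:
                theme |= clique
                break
        else:
            themes.append(set(clique))
    return themes
```

Nodes would be predication arguments and edges frequently co-occurring argument pairs; each resulting theme corresponds to one cluster in the summary.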
Affiliation(s)
- Han Zhang: Department of Medical Informatics, China Medical University, Shenyang, Liaoning 110001, China; National Library of Medicine, Bethesda, MD 20894, USA
- Dongwook Shin: National Library of Medicine, Bethesda, MD 20894, USA
- Bartlomiej Wilkowski: DTU Informatics, Technical University of Denmark, Kongens Lyngby, Denmark; Danish National Biobank, National Health Surveillance & Research, Statens Serum Institut, Copenhagen, Denmark
27
Jonnalagadda SR, Del Fiol G, Medlin R, Weir C, Fiszman M, Mostafa J, Liu H. Automatically extracting sentences from Medline citations to support clinicians' information needs. J Am Med Inform Assoc 2012; 20:995-1000. [PMID: 23100128 DOI: 10.1136/amiajnl-2012-001347] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022] Open
Abstract
OBJECTIVE Online health knowledge resources contain answers to most of the information needs raised by clinicians in the course of care. However, significant barriers limit the use of these resources for decision-making, especially clinicians' lack of time. In this study we assessed the feasibility of automatically generating knowledge summaries for a particular clinical topic composed of relevant sentences extracted from Medline citations. METHODS The proposed approach combines information retrieval and semantic information extraction techniques to identify relevant sentences from Medline abstracts. We assessed this approach in two case studies on the treatment alternatives for depression and Alzheimer's disease. RESULTS A total of 515 of 564 (91.3%) sentences retrieved in the two case studies were relevant to the topic of interest. About one-third of the relevant sentences described factual knowledge or a study conclusion that can be used for supporting information needs at the point of care. CONCLUSIONS The high rate of relevant sentences is desirable, given that clinicians' lack of time is one of the main barriers to using knowledge resources at the point of care. Sentence rank was not significantly associated with relevancy, possibly due to most sentences being highly relevant. Sentences located closer to the end of the abstract and sentences with treatment and comparative predications were likely to be conclusive sentences. Our proposed technical approach to helping clinicians meet their information needs is promising. The approach can be extended for other knowledge resources and information need types.
28
Wu JA, Hsu W, Bui AAT. An Approach for Incorporating Context in Building Probabilistic Predictive Models. PROCEEDINGS. IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS, IMAGING AND SYSTEMS BIOLOGY 2012; 2012:96-105. [PMID: 27617299 PMCID: PMC5017790 DOI: 10.1109/hisb.2012.30] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
With the increasing amount of information collected through clinical practice and scientific experimentation, a growing challenge is how to utilize available resources to construct predictive models to facilitate clinical decision making. Clinicians often have questions related to the treatment and outcome of a medical problem for individual patients; however, few tools exist that leverage the large collection of patient data and scientific knowledge to answer these questions. Without appropriate context, existing data that have been collected for a specific task may not be suitable for creating new models that answer different questions. This paper presents an approach that leverages available structured or unstructured data to build a probabilistic predictive model that assists physicians with answering clinical questions on individual patients. Various challenges related to transforming available data to an end-user application are addressed: problem decomposition, variable selection, context representation, automated extraction of information from unstructured data sources, model generation, and development of an intuitive application to query the model and present the results. We describe our efforts towards building a model that predicts the risk of vasospasm in aneurysm patients.
Affiliation(s)
- Juan Anna Wu: Biomedical Engineering IDP, Medical Imaging Informatics Group, University of California, Los Angeles, USA
- William Hsu: Department of Radiological Sciences, Medical Imaging Informatics Group, University of California, Los Angeles, USA
- Alex AT Bui: Department of Radiological Sciences, Medical Imaging Informatics Group, University of California, Los Angeles, USA
29
Workman TE, Fiszman M, Hurdle JF. Text summarization as a decision support aid. BMC Med Inform Decis Mak 2012; 12:41. [PMID: 22621674 PMCID: PMC3461485 DOI: 10.1186/1472-6947-12-41] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2011] [Accepted: 04/18/2012] [Indexed: 11/18/2022] Open
Abstract
Background PubMed data potentially can provide decision support information, but PubMed was not exclusively designed to be a point-of-care tool. Natural language processing applications that summarize PubMed citations hold promise for extracting decision support information. The objective of this study was to evaluate the efficiency of a text summarization application called Semantic MEDLINE, enhanced with a novel dynamic summarization method, in identifying decision support data. Methods We downloaded PubMed citations addressing the prevention and drug treatment of four disease topics. We then processed the citations with Semantic MEDLINE, enhanced with the dynamic summarization method. We also processed the citations with a conventional summarization method, as well as with a baseline procedure. We evaluated the results using clinician-vetted reference standards built from recommendations in a commercial decision support product, DynaMed. Results For the drug treatment data, Semantic MEDLINE enhanced with dynamic summarization achieved average recall and precision scores of 0.848 and 0.377, while conventional summarization produced 0.583 average recall and 0.712 average precision, and the baseline method yielded average recall and precision values of 0.252 and 0.277. For the prevention data, Semantic MEDLINE enhanced with dynamic summarization achieved average recall and precision scores of 0.655 and 0.329. The baseline technique resulted in recall and precision scores of 0.269 and 0.247. No conventional Semantic MEDLINE method accommodating summarization for prevention exists. Conclusion Semantic MEDLINE with dynamic summarization outperformed conventional summarization in terms of recall, and outperformed the baseline method in both recall and precision. This new approach to text summarization demonstrates potential in identifying decision support data for multiple needs.
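The recall and precision figures reported above follow the standard set-based definitions against a clinician-vetted reference standard. A minimal sketch (the finding sets below are hypothetical, not drawn from the study):

```python
def evaluate(system_findings, reference_standard):
    """Recall, precision, and F-score of system-extracted findings
    against a reference standard; both arguments are sets."""
    true_pos = len(system_findings & reference_standard)
    recall = true_pos / len(reference_standard) if reference_standard else 0.0
    precision = true_pos / len(system_findings) if system_findings else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return recall, precision, f_score

# Hypothetical drug-treatment findings vs. a DynaMed-style reference.
recall, precision, f_score = evaluate(
    {"metformin", "insulin", "statin"},
    {"metformin", "insulin", "aspirin", "lisinopril"},
)
```

With two of three system findings in the four-item reference, recall is 0.50 and precision about 0.67, the same trade-off pattern the study observed between the dynamic and conventional summarizers.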
Affiliation(s)
- T Elizabeth Workman
- Department of Biomedical Informatics, University of Utah, HSEB 5775, Salt Lake City, UT 84112, USA.
30
Workman TE, Stoddart JM. Rethinking information delivery: using a natural language processing application for point-of-care data discovery. J Med Libr Assoc 2012; 100:113-20. [PMID: 22514507 DOI: 10.3163/1536-5050.100.2.009] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/03/2022] Open
Abstract
OBJECTIVE This paper examines the use of Semantic MEDLINE, a natural language processing application enhanced with a statistical algorithm known as Combo, as a potential decision support tool for clinicians. Semantic MEDLINE summarizes text in PubMed citations, transforming it into compact declarations that are filtered according to a user's information need that can be displayed in a graphic interface. Integration of the Combo algorithm enables Semantic MEDLINE to deliver information salient to many diverse needs. METHODS The authors selected three disease topics and crafted PubMed search queries to retrieve citations addressing the prevention of these diseases. They then processed the citations with Semantic MEDLINE, with the Combo algorithm enhancement. To evaluate the results, they constructed a reference standard for each disease topic consisting of preventive interventions recommended by a commercial decision support tool. RESULTS Semantic MEDLINE with Combo produced an average recall of 79% in primary and secondary analyses, an average precision of 45%, and a final average F-score of 0.57. CONCLUSION This new approach to point-of-care information delivery holds promise as a decision support tool for clinicians. Health sciences libraries could implement such technologies to deliver tailored information to their users.
Affiliation(s)
- T Elizabeth Workman
- Postdoctoral Research Associate, Department of Biomedical Informatics, University of Utah, Salt Lake City, UT 84112, USA.
31
Kilicoglu H, Rosemblat G, Fiszman M, Rindflesch TC. Constructing a semantic predication gold standard from the biomedical literature. BMC Bioinformatics 2011; 12:486. [PMID: 22185221 PMCID: PMC3281188 DOI: 10.1186/1471-2105-12-486] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2011] [Accepted: 12/20/2011] [Indexed: 11/30/2022] Open
Abstract
Background Semantic relations increasingly underpin biomedical text mining and knowledge discovery applications. The success of such practical applications crucially depends on the quality of extracted relations, which can be assessed against a gold standard reference. Most such references in biomedical text mining focus on narrow subdomains and adopt different semantic representations, rendering them difficult to use for benchmarking independently developed relation extraction systems. In this article, we present a multi-phase gold standard annotation study, in which we annotated 500 sentences randomly selected from MEDLINE abstracts on a wide range of biomedical topics with 1371 semantic predications. The UMLS Metathesaurus served as the main source for conceptual information and the UMLS Semantic Network for relational information. We measured interannotator agreement and analyzed the annotations closely to identify some of the challenges in annotating biomedical text with relations based on an ontology or a terminology. Results We obtain fair to moderate interannotator agreement in the practice phase (0.378-0.475). With improved guidelines and additional semantic equivalence criteria, the agreement increases by 12% (0.415 to 0.536) in the main annotation phase. In addition, we find that agreement increases to 0.688 when the agreement calculation is limited to those predications that are based only on the explicitly provided UMLS concepts and relations. Conclusions While interannotator agreement in the practice phase confirms that conceptual annotation is a challenging task, the increasing agreement in the main annotation phase points out that an acceptable level of agreement can be achieved in multiple iterations, by setting stricter guidelines and establishing semantic equivalence criteria. Mapping text to ontological concepts emerges as the main challenge in conceptual annotation. Annotating predications involving biomolecular entities and processes is particularly challenging. While the resulting gold standard is mainly intended to serve as a test collection for our semantic interpreter, we believe that the lessons learned are applicable generally.
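Interannotator agreement values in the 0.3-0.7 range like those reported above are typically chance-corrected statistics; the paper does not show its formula here, but Cohen's kappa is the standard instance. A minimal sketch with made-up annotator judgments:

```python
def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the same
    items: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement rate and p_e the agreement expected from each annotator's
    marginal label frequencies."""
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    p_e = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
              for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Two annotators judging five candidate predications (1 = correct).
kappa = cohens_kappa([1, 1, 1, 0, 0], [1, 1, 0, 0, 0])
```

Here observed agreement is 0.8 but expected chance agreement is 0.48, giving kappa of about 0.62 -- in the "moderate to substantial" band the study reached in its main annotation phase.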
Affiliation(s)
- Halil Kilicoglu
- Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, MD, USA.
32
Li Y, Salmasian H, Harpaz R, Chase H, Friedman C. Determining the reasons for medication prescriptions in the EHR using knowledge and natural language processing. AMIA Annu Symp Proc 2011; 2011:768-776. [PMID: 22195134 PMCID: PMC3243251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 05/31/2023]
Abstract
Knowledge of medication indications is significant for automatic applications aimed at improving patient safety, such as computerized physician order entry and clinical decision support systems. The Electronic Health Record (EHR) contains pertinent information related to patient safety such as information related to appropriate prescribing. However, the reasons for medication prescriptions are usually not explicitly documented in the patient record. This paper describes a method that determines the reasons for medication uses based on information occurring in outpatient notes. The method utilizes drug-indication knowledge that we acquired, and natural language processing. Evaluation showed the method obtained a sensitivity of 62.8%, specificity of 93.9%, precision of 90% and F-measure of 73.9%. This pilot study demonstrated that linking external drug indication knowledge to the EHR for determining the reasons for medication use was promising, but also revealed some challenges. Future work will focus on increasing the accuracy and coverage of the indication knowledge and evaluating its performance using a much larger set of drugs frequently used in the outpatient population.
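The reported F-measure can be checked directly against the reported precision and sensitivity (recall), since F = 2PR / (P + R):

```python
# Reported figures from the abstract above.
precision = 0.900
sensitivity = 0.628  # recall

# Harmonic mean of precision and recall.
f_measure = 2 * precision * sensitivity / (precision + sensitivity)
```

This gives 2 x 0.900 x 0.628 / 1.528, approximately 0.740, consistent with the reported 73.9%.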
Affiliation(s)
- Ying Li
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
33
Zhang H, Fiszman M, Shin D, Miller CM, Rosemblat G, Rindflesch TC. Degree centrality for semantic abstraction summarization of therapeutic studies. J Biomed Inform 2011; 44:830-8. [PMID: 21575741 DOI: 10.1016/j.jbi.2011.05.001] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2010] [Revised: 04/25/2011] [Accepted: 05/02/2011] [Indexed: 11/28/2022]
Abstract
Automatic summarization has been proposed to help manage the results of biomedical information retrieval systems. Semantic MEDLINE, for example, summarizes semantic predications representing assertions in MEDLINE citations. Results are presented as a graph which maintains links to the original citations. Graphs summarizing more than 500 citations are hard to read and navigate, however. We exploit graph theory for focusing these large graphs. The method is based on degree centrality, which measures connectedness in a graph. Four categories of clinical concepts related to treatment of disease were identified and presented as a summary of input text. A baseline was created using term frequency of occurrence. The system was evaluated on summaries for treatment of five diseases compared to a reference standard produced manually by two physicians. The results showed that recall for system results was 72%, precision was 73%, and F-score was 0.72. The system F-score was considerably higher than that for the baseline (0.47).
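Degree centrality, the graph-theoretic measure the study uses to focus large predication graphs, simply counts the edges incident to each concept node. A minimal sketch over hypothetical Semantic MEDLINE-style triples (the concepts below are illustrative, not from the paper):

```python
from collections import defaultdict

def degree_centrality(predications):
    """Degree of each concept node in a predication graph, where each
    (subject, predicate, object) triple contributes one edge between
    its subject and object concepts."""
    degree = defaultdict(int)
    for subject, _predicate, obj in predications:
        degree[subject] += 1
        degree[obj] += 1
    return dict(degree)

# Hypothetical treatment predications extracted from citations.
degrees = degree_centrality([
    ("Aspirin", "TREATS", "Headache"),
    ("Aspirin", "TREATS", "Fever"),
    ("Ibuprofen", "TREATS", "Headache"),
])
```

Concepts with high degree ("Aspirin" here, with two edges) are the most connected and are kept in the focused summary, while low-degree nodes are pruned.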
Affiliation(s)
- Han Zhang
- Department of Medical Informatics, China Medical University, Shenyang, China.
34
Workman TE, Hurdle JF. Dynamic summarization of bibliographic-based data. BMC Med Inform Decis Mak 2011; 11:6. [PMID: 21284871 PMCID: PMC3042900 DOI: 10.1186/1472-6947-11-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2010] [Accepted: 02/01/2011] [Indexed: 11/15/2022] Open
Abstract
Background Traditional information retrieval techniques typically return excessive output when directed at large bibliographic databases. Natural Language Processing applications strive to extract salient content from the excessive data. Semantic MEDLINE, a National Library of Medicine (NLM) natural language processing application, highlights relevant information in PubMed data. However, Semantic MEDLINE implements manually coded schemas, accommodating few information needs. Currently, there are only five such schemas, while many more would be needed to realistically accommodate all potential users. The aim of this project was to develop and evaluate a statistical algorithm that automatically identifies relevant bibliographic data; the new algorithm could be incorporated into a dynamic schema to accommodate various information needs in Semantic MEDLINE, and eliminate the need for multiple schemas. Methods We developed a flexible algorithm named Combo that combines three statistical metrics, the Kullback-Leibler Divergence (KLD), Riloff's RlogF metric (RlogF), and a new metric called PredScal, to automatically identify salient data in bibliographic text. We downloaded citations from a PubMed search query addressing the genetic etiology of bladder cancer. The citations were processed with SemRep, an NLM rule-based application that produces semantic predications. SemRep output was processed by Combo, in addition to the standard Semantic MEDLINE genetics schema and independently by the two individual KLD and RlogF metrics. We evaluated each summarization method using an existing reference standard within the task-based context of genetic database curation. Results Combo asserted 74 genetic entities implicated in bladder cancer development, whereas the traditional schema asserted 10 genetic entities; the KLD and RlogF metrics individually asserted 77 and 69 genetic entities, respectively. Combo achieved 61% recall and 81% precision, with an F-score of 0.69. The traditional schema achieved 23% recall and 100% precision, with an F-score of 0.37. The KLD metric achieved 61% recall, 70% precision, with an F-score of 0.65. The RlogF metric achieved 61% recall, 72% precision, with an F-score of 0.66. Conclusions Semantic MEDLINE summarization using the new Combo algorithm outperformed a conventional summarization schema in a genetic database curation task. It potentially could streamline information acquisition for other needs without having to hand-build multiple saliency schemas.
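Combo's exact combination of KLD, RlogF, and PredScal is defined in the paper; as an illustration of just the KLD component, each term can be scored by its pointwise contribution to the divergence between topic and background distributions. The smoothing scheme and term sets below are assumptions for the sketch, not the study's implementation:

```python
import math
from collections import Counter

def kld_saliency(topic_terms, background_terms):
    """Per-term contribution to KL divergence, p * log2(p / q), where
    p is a term's relative frequency in the topic citations and q its
    add-one-smoothed relative frequency in a background set. High
    positive scores mark terms over-represented in the topic."""
    p_counts = Counter(topic_terms)
    q_counts = Counter(background_terms)
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    vocab = set(p_counts) | set(q_counts)
    scores = {}
    for term, count in p_counts.items():
        p = count / p_total
        q = (q_counts[term] + 1) / (q_total + len(vocab))
        scores[term] = p * math.log2(p / q)
    return scores

# Toy topic set (bladder-cancer genetics) vs. a generic background.
scores = kld_saliency(
    ["FGFR3", "TP53", "FGFR3", "cancer"],
    ["cancer", "cell", "cell", "protein", "cancer"],
)
```

Terms frequent in the topic set but rare in the background ("FGFR3" here) score highest, while terms common in both ("cancer") score near or below zero, which is how a dynamic schema can surface salient entities without hand-coded rules.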
Affiliation(s)
- T Elizabeth Workman
- Department of Biomedical Informatics, University of Utah, HSEB 5775, Salt Lake City, UT, USA.
35
Workman TE, Fiszman M, Hurdle JF, Rindflesch TC. Biomedical text summarization to support genetic database curation: using Semantic MEDLINE to create a secondary database of genetic information. J Med Libr Assoc 2011; 98:273-81. [PMID: 20936065 DOI: 10.3163/1536-5050.98.4.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/03/2022] Open
Abstract
OBJECTIVE This paper examines the development and evaluation of an automatic summarization system in the domain of molecular genetics. The system is a potential component of an advanced biomedical information management application called Semantic MEDLINE and could assist librarians in developing secondary databases of genetic information extracted from the primary literature. METHODS An existing summarization system was modified for identifying biomedical text relevant to the genetic etiology of disease. The summarization system was evaluated on the task of identifying data describing genes associated with bladder cancer in MEDLINE citations. A gold standard was produced using records from Genetics Home Reference and Online Mendelian Inheritance in Man. Genes in text found by the system were compared to the gold standard. Recall, precision, and F-measure were calculated. RESULTS The system achieved recall of 46%, and precision of 88% (F-measure=0.61) by taking Gene References into Function (GeneRIFs) into account. CONCLUSION The new summarization schema for genetic etiology has potential as a component in Semantic MEDLINE to support the work of data curators.
Affiliation(s)
- T Elizabeth Workman
- Department of Biomedical Informatics, University of Utah, 26 S 2000 E, HSEB 5700, Salt Lake City, UT 84112, USA.
36
Keselman A, Rosemblat G, Kilicoglu H, Fiszman M, Jin H, Shin D, Rindflesch TC. Adapting Semantic Natural Language Processing Technology to Address Information Overload in Influenza Epidemic Management. JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY : JASIST 2010; 61:10.1002/asi.21414. [PMID: 24311971 PMCID: PMC3847910 DOI: 10.1002/asi.21414] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
Explosion of disaster health information results in information overload among response professionals. The objective of this project was to determine the feasibility of applying semantic natural language processing (NLP) technology to addressing this overload. The project characterizes concepts and relationships commonly used in disaster health-related documents on influenza pandemics, as the basis for adapting an existing semantic summarizer to the domain. Methods include human review and semantic NLP analysis of a set of relevant documents. This is followed by a pilot-test in which two information specialists use the adapted application for a realistic information seeking task. According to the results, the ontology of influenza epidemics management can be described via a manageable number of semantic relationships that involve concepts from a limited number of semantic types. Test users demonstrate several ways to engage with the application to obtain useful information. This suggests that existing semantic NLP algorithms can be adapted to support information summarization and visualization in influenza epidemics and other disaster health areas. However, additional research is needed in the areas of terminology development (as many relevant relationships and terms are not part of existing standardized vocabularies), NLP, and user interface design.
37
Wang X, Chase HS, Li J, Hripcsak G, Friedman C. Integrating heterogeneous knowledge sources to acquire executable drug-related knowledge. AMIA Annu Symp Proc 2010; 2010:852-856. [PMID: 21347099 PMCID: PMC3041361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
Knowledge of medical entities, such as drug-related information is critical for many automated biomedical applications, such as decision support and pharmacovigilance. In this work, heterogeneous information sources were integrated automatically to obtain drug-related knowledge. We focus on one type of knowledge, drug-treats-condition, in the study and propose a framework for integrating disparate knowledge sources. Evaluation based on a random sample of drug-condition pairs indicated an overall coverage of 96%, recall of 98% and a precision of 87%. In conclusion, the preliminary study demonstrated that the knowledge generated from this study was comparable to the manually curated gold standard and that this method of automatically integrating knowledge sources is effective. The automated method should also be applicable to integrate other clinical knowledge, such as drug-related knowledge with omics information.
Affiliation(s)
- Xiaoyan Wang
- Department of Biomedical Informatics, Columbia University, New York, NY
38
The structure and dynamics of cocitation clusters: A multiple-perspective cocitation analysis. J Am Soc Inf Sci Technol 2010. [DOI: 10.1002/asi.21309] [Citation(s) in RCA: 714] [Impact Index Per Article: 51.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
39
Demner-Fushman D, Chapman WW, McDonald CJ. What can natural language processing do for clinical decision support? J Biomed Inform 2009; 42:760-72. [PMID: 19683066 PMCID: PMC2757540 DOI: 10.1016/j.jbi.2009.08.007] [Citation(s) in RCA: 274] [Impact Index Per Article: 18.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2008] [Revised: 08/10/2009] [Accepted: 08/11/2009] [Indexed: 11/29/2022]
Abstract
Computerized clinical decision support (CDS) aims to aid decision making of health care providers and the public by providing easily accessible health-related information at the point and time it is needed. Natural language processing (NLP) is instrumental in using free-text information to drive CDS, representing clinical knowledge and CDS interventions in standardized formats, and leveraging clinical narrative. The early innovative NLP research of clinical narrative was followed by a period of stable research conducted at the major clinical centers and a shift of mainstream interest to biomedical NLP. This review primarily focuses on the recently renewed interest in development of fundamental NLP methods and advances in the NLP systems for CDS. The current solutions to challenges posed by distinct sublanguages, intended user groups, and support goals are discussed.
Affiliation(s)
- Dina Demner-Fushman
- U.S. National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
40
Chapman WW, Cohen KB. Current issues in biomedical text mining and natural language processing. J Biomed Inform 2009; 42:757-9. [PMID: 19735740 DOI: 10.1016/j.jbi.2009.09.001] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2009] [Revised: 08/31/2009] [Accepted: 09/01/2009] [Indexed: 11/29/2022]