1
Oliveira Dos Santos Á, Sergio da Silva E, Machado Couto L, Valadares Labanca Reis G, Silva Belo V. The use of artificial intelligence for automating or semi-automating biomedical literature analyses: a scoping review. J Biomed Inform 2023;142:104389. PMID: 37187321. DOI: 10.1016/j.jbi.2023.104389.
Abstract
OBJECTIVE Evidence-based medicine (EBM) is a decision-making process based on the conscious and judicious use of the best available scientific evidence. However, the exponential increase in the amount of information currently available likely exceeds the capacity of human-only analysis. In this context, artificial intelligence (AI) and its branches such as machine learning (ML) can be used to facilitate human efforts in analyzing the literature to foster EBM. The present scoping review aimed to examine the use of AI in the automation of biomedical literature survey and analysis with a view to establishing the state-of-the-art and identifying knowledge gaps. MATERIALS AND METHODS Comprehensive searches of the main databases were performed for articles published up to June 2022 and studies were selected according to inclusion and exclusion criteria. Data were extracted from the included articles and the findings categorized. RESULTS The total number of records retrieved from the databases was 12,145, of which 273 were included in the review. Classification of the studies according to the use of AI in evaluating the biomedical literature revealed three main application groups, namely assembly of scientific evidence (n=127; 47%), mining the biomedical literature (n=112; 41%) and quality analysis (n=34; 12%). Most studies addressed the preparation of systematic reviews, while articles focusing on the development of guidelines and evidence synthesis were the least frequent. The biggest knowledge gap was identified within the quality analysis group, particularly regarding methods and tools that assess the strength of recommendation and consistency of evidence. 
CONCLUSION Our review shows that, despite significant progress in the automation of biomedical literature surveys and analyses in recent years, intense research is needed to fill knowledge gaps on more difficult aspects of ML, deep learning and natural language processing, and to consolidate the use of automation by end-users (biomedical researchers and healthcare professionals).
Affiliation(s)
- Eduardo Sergio da Silva
- Federal University of São João del-Rei, Campus Centro-Oeste Dona Lindu, Divinópolis, Minas Gerais, Brazil.
- Letícia Machado Couto
- Federal University of São João del-Rei, Campus Centro-Oeste Dona Lindu, Divinópolis, Minas Gerais, Brazil.
- Vinícius Silva Belo
- Federal University of São João del-Rei, Campus Centro-Oeste Dona Lindu, Divinópolis, Minas Gerais, Brazil.
2
Mallick C, Das AK, Nayak J, Pelusi D, Vimal S. Evolutionary Algorithm based Ensemble Extractive Summarization for Developing Smart Medical System. Interdiscip Sci 2021;13:229-259. PMID: 33576956. DOI: 10.1007/s12539-020-00412-5.
Abstract
The amount of information in the scientific literature of the biomedical domain is growing exponentially, which makes developing a smart medical system difficult. Summarization techniques help with efficient searching and understanding of relevant information in medical documents. In this paper, an evolutionary algorithm based ensemble extractive summarization technique is devised as a smart medical application, combining hybrid artificial intelligence with natural language processing. We consider the abstracts of the target article and its cited articles as the base summaries, and a multi-objective evolutionary algorithm is applied to generate the ensemble summary of the target article. Each sentence of the base summaries is represented by a concept vector of the medical terms it contains, obtained with the Unified Medical Language System (UMLS), which is widely used in smart medical applications. These terms carry the key information of the sentence, which is very useful for finding the semantic similarity among sentences. The fitness functions of the evolutionary algorithm are mainly defined using the clustering coefficient and the sparsity index, two concepts from graph theory. After the algorithm converges, the best solution in the final population gives the ensemble summary. Next, the semantic similarity of each sentence in the target article with the ensemble summary is calculated, and the sentences most similar to the ensemble summary are taken as the summary of the target article. The method is applied to articles available in the PubMed MEDLINE database, and experimental results are compared with some state-of-the-art methods used in the biomedical domain.
Experimental results and a comparative performance evaluation show that the method competes with some recently proposed summarization methods and outperforms others, demonstrating the effectiveness of the proposed methodology. Statistical tests also show that the improvements are statistically significant.
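The final selection step described above, ranking the target article's sentences by similarity to the ensemble summary, can be sketched with cosine similarity over concept vectors. This is a toy illustration: the binary vectors and the concept vocabulary are invented, not produced by UMLS.

```python
# Sketch: score sentences against an ensemble summary by concept-vector
# cosine similarity, then pick the most similar sentence.
from math import sqrt

def cosine(u, v):
    # Cosine similarity between two equal-length numeric vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Binary concept vectors over a toy vocabulary of five medical terms.
ensemble = [1, 1, 0, 1, 0]            # concepts present in the ensemble summary
sentences = {
    "s1": [1, 1, 0, 0, 0],
    "s2": [0, 0, 1, 0, 1],
    "s3": [1, 1, 0, 1, 0],
}
ranked = sorted(sentences, key=lambda s: cosine(sentences[s], ensemble), reverse=True)
print(ranked[0])  # → s3 (shares all three ensemble concepts)
```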
Affiliation(s)
- Chirantana Mallick
- Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, Howrah, 711103, India.
- Asit Kumar Das
- Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, Howrah, 711103, India.
- Janmenjoy Nayak
- Department of Computer Science and Engineering, Aditya Institute of Technology and Management (AITAM), Tekkali, Andhra Pradesh, 532201, India.
- Danilo Pelusi
- Department of Communications Sciences, University of Teramo, Teramo, Italy.
- S Vimal
- Department of Information Technology, National Engineering College, K.R.Nagar, Kovilpatti, Thoothukudi District, Tamilnadu, 628503, India.
3
Dexter PR, Grout RW, Embi PJ. Transforming primary medical research knowledge into clinical decision. AMIA Annu Symp Proc 2021;2020:358-362. PMID: 33936408. PMCID: PMC8075430.
Abstract
While the utility of computerized clinical decision support (CCDS) for multiple select clinical domains has been clearly demonstrated, much less is known about the full breadth of domains to which CCDS approaches could be productively applied. To explore the applicability of CCDS to general medical knowledge, we sampled a total of 500 primary research articles from 4 high-impact medical journals. Employing rule-based templates, we created high-level CCDS rules for 72% (361/500) of primary medical research articles. We subsequently identified data sources needed to implement those rules. Our findings suggest that CCDS approaches, perhaps in the form of non-interruptive infobuttons, could be much more broadly applied. In addition, our analytic methods appear to provide a means of prioritizing and quantitating the relative utility of available data sources for purposes of CCDS.
Affiliation(s)
- Paul R Dexter
- Regenstrief Institute, Inc., Indianapolis, IN
- Indiana University School of Medicine, Indianapolis, IN
- Randall W Grout
- Regenstrief Institute, Inc., Indianapolis, IN
- Indiana University School of Medicine, Indianapolis, IN
- Eskenazi Health, Indianapolis, IN
- Peter J Embi
- Regenstrief Institute, Inc., Indianapolis, IN
- Indiana University School of Medicine, Indianapolis, IN
4
Lee EK, Uppal K. CERC: an interactive content extraction, recognition, and construction tool for clinical and biomedical text. BMC Med Inform Decis Mak 2020;20:306. PMID: 33323109. PMCID: PMC7739454. DOI: 10.1186/s12911-020-01330-8.
Abstract
BACKGROUND Automated summarization of scientific literature and patient records is essential for enhancing clinical decision-making and facilitating precision medicine. Most existing summarization methods are based on single indicators of relevance, offer limited capabilities for information visualization, and do not account for user specific interests. In this work, we develop an interactive content extraction, recognition, and construction system (CERC) that combines machine learning and visualization techniques with domain knowledge for highlighting and extracting salient information from clinical and biomedical text. METHODS A novel sentence-ranking framework multi indicator text summarization, MINTS, is developed for extractive summarization. MINTS uses random forests and multiple indicators of importance for relevance evaluation and ranking of sentences. Indicative summarization is performed using weighted term frequency-inverse document frequency scores of over-represented domain-specific terms. A controlled vocabulary dictionary generated using MeSH, SNOMED-CT, and PubTator is used for determining relevant terms. 35 full-text CRAFT articles were used as the training set. The performance of the MINTS algorithm is evaluated on a test set consisting of the remaining 32 full-text CRAFT articles and 30 clinical case reports using the ROUGE toolkit. RESULTS The random forests model classified sentences as "good" or "bad" with 87.5% accuracy on the test set. Summarization results from the MINTS algorithm achieved higher ROUGE-1, ROUGE-2, and ROUGE-SU4 scores when compared to methods based on single indicators such as term frequency distribution, position, eigenvector centrality (LexRank), and random selection, p < 0.01. 
The automatic language translator and the customizable information extraction and pre-processing pipeline for EHR demonstrate that CERC can readily be incorporated within clinical decision support systems to improve quality of care and assist in data-driven and evidence-based informed decision making for direct patient care. CONCLUSIONS We have developed a web-based summarization and visualization tool, CERC ( https://newton.isye.gatech.edu/CERC1/ ), for extracting salient information from clinical and biomedical text. The system ranks sentences by relevance and includes features that can facilitate early detection of medical risks in a clinical setting. The interactive interface allows users to filter content and edit/save summaries. The evaluation results on two test corpuses show that the newly developed MINTS algorithm outperforms methods based on single characteristics of importance.
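The ROUGE scores reported above are based on n-gram overlap between a candidate summary and a reference. A minimal sketch of ROUGE-1 recall follows; the actual evaluation used the full ROUGE toolkit, which also handles stemming, stop-word options, and multiple references. The example sentences are invented.

```python
# Minimal ROUGE-1 recall: fraction of reference unigrams that also
# appear in the candidate (with clipped counts).
from collections import Counter

def rouge1_recall(candidate, reference):
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / sum(ref.values())

ref = "the model ranks sentences by relevance"
cand = "sentences are ranked by relevance"
# 3 of the 6 reference unigrams appear in the candidate.
print(rouge1_recall(cand, ref))  # → 0.5
```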
Affiliation(s)
- Eva K Lee
- Center for Operations Research in Medicine and HealthCare, School of Industrial and Systems Engineering, School of Biological Sciences, Georgia Institute of Technology, Atlanta, USA.
- Karan Uppal
- School of Medicine, Emory University, Atlanta, GA, USA.
5
Zhitomirsky-Geffet M, Bergman O, Hilel S. Towards a wider perspective in the social sciences using a network of variables based on thousands of results. Scientometrics 2020. DOI: 10.1007/s11192-020-03446-0.
6
Kilicoglu H, Rosemblat G, Fiszman M, Shin D. Broad-coverage biomedical relation extraction with SemRep. BMC Bioinformatics 2020;21:188. PMID: 32410573. PMCID: PMC7222583. DOI: 10.1186/s12859-020-3517-7.
Abstract
BACKGROUND In the era of information overload, natural language processing (NLP) techniques are increasingly needed to support advanced biomedical information management and discovery applications. In this paper, we present an in-depth description of SemRep, an NLP system that extracts semantic relations from PubMed abstracts using linguistic principles and UMLS domain knowledge. We also evaluate SemRep on two datasets. In one evaluation, we use a manually annotated test collection and perform a comprehensive error analysis. In another evaluation, we assess SemRep's performance on the CDR dataset, a standard benchmark corpus annotated with causal chemical-disease relationships. RESULTS A strict evaluation of SemRep on our manually annotated dataset yields 0.55 precision, 0.34 recall, and 0.42 F1 score. A relaxed evaluation, which more accurately characterizes SemRep performance, yields 0.69 precision, 0.42 recall, and 0.52 F1 score. An error analysis reveals named entity recognition/normalization as the largest source of errors (26.9%), followed by argument identification (14%) and trigger detection errors (12.5%). The evaluation on the CDR corpus yields 0.90 precision, 0.24 recall, and 0.38 F1 score. The recall and the F1 score increase to 0.35 and 0.50, respectively, when the evaluation on this corpus is limited to sentence-bound relationships, which represents a fairer evaluation, as SemRep operates at the sentence level. CONCLUSIONS SemRep is a broad-coverage, interpretable, strong baseline system for extracting semantic relations from biomedical text. It also underpins SemMedDB, a literature-scale knowledge graph based on semantic relations. Through SemMedDB, SemRep has had significant impact in the scientific community, supporting a variety of clinical and translational applications, including clinical decision making, medical diagnosis, drug repurposing, literature-based discovery and hypothesis generation, and contributing to improved health outcomes.
In ongoing development, we are redesigning SemRep to increase its modularity and flexibility, and addressing weaknesses identified in the error analysis.
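The F1 values reported above are the standard harmonic mean of precision and recall; a quick sanity check against the strict and relaxed evaluation numbers:

```python
# F1 as the harmonic mean of precision (p) and recall (r).
def f1(p, r):
    return 2 * p * r / (p + r)

print(round(f1(0.55, 0.34), 2))  # strict evaluation → 0.42
print(round(f1(0.69, 0.42), 2))  # relaxed evaluation → 0.52
```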
Affiliation(s)
- Halil Kilicoglu
- Lister Hill National Center for Biomedical Communications, National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA
- University of Illinois at Urbana-Champaign, School of Information Sciences, 501 E Daniel Street, Champaign, IL 61820, USA
- Graciela Rosemblat
- Lister Hill National Center for Biomedical Communications, National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Dongwook Shin
- Lister Hill National Center for Biomedical Communications, National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA
7
Chen YP, Chen YY, Lin JJ, Huang CH, Lai F. Modified Bidirectional Encoder Representations From Transformers Extractive Summarization Model for Hospital Information Systems Based on Character-Level Tokens (AlphaBERT): Development and Performance Evaluation. JMIR Med Inform 2020;8:e17787. PMID: 32347806. PMCID: PMC7221648. DOI: 10.2196/17787.
Abstract
Background Doctors must care for many patients simultaneously, and it is time-consuming to find and examine all patients’ medical histories. Discharge diagnoses provide hospital staff with sufficient information to enable handling multiple patients; however, the excessive amount of words in the diagnostic sentences poses problems. Deep learning may be an effective solution to overcome this problem, but the use of such a heavy model may also add another obstacle to systems with limited computing resources. Objective We aimed to build a diagnoses-extractive summarization model for hospital information systems and provide a service that can be operated even with limited computing resources. Methods We used a Bidirectional Encoder Representations from Transformers (BERT)-based structure with a two-stage training method based on 258,050 discharge diagnoses obtained from the National Taiwan University Hospital Integrated Medical Database, and the highlighted extractive summaries written by experienced doctors were labeled. The model size was reduced using a character-level token, the number of parameters was decreased from 108,523,714 to 963,496, and the model was pretrained using random mask characters in the discharge diagnoses and International Statistical Classification of Diseases and Related Health Problems sets. We then fine-tuned the model using summary labels and cleaned up the prediction results by averaging all probabilities for entire words to prevent character level–induced fragment words. Model performance was evaluated against existing models BERT, BioBERT, and Long Short-Term Memory (LSTM) using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) L score, and a questionnaire website was built to collect feedback from more doctors for each summary proposal. 
Results The area under the receiver operating characteristic curve values of the summary proposals were 0.928, 0.941, 0.899, and 0.947 for BERT, BioBERT, LSTM, and the proposed model (AlphaBERT), respectively. The ROUGE-L scores were 0.697, 0.711, 0.648, and 0.693 for BERT, BioBERT, LSTM, and AlphaBERT, respectively. The mean (SD) critique scores from doctors were 2.232 (0.832), 2.134 (0.877), 2.207 (0.844), 1.927 (0.910), and 2.126 (0.874) for reference-by-doctor labels, BERT, BioBERT, LSTM, and AlphaBERT, respectively. Based on the paired t test, there was a statistically significant difference in LSTM compared to the reference (P<.001), BERT (P=.001), BioBERT (P<.001), and AlphaBERT (P=.002), but not in the other models. Conclusions Use of character-level tokens in a BERT model can greatly decrease the model size without significantly reducing performance for diagnoses summarization. A well-developed deep-learning model will enhance doctors’ abilities to manage patients and promote medical studies by providing the capability to use extensive unstructured free-text notes.
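Much of the parameter reduction described above comes from shrinking the embedding table, which scales with vocabulary size times hidden dimension. The sketch below illustrates that scaling only; aside from BERT-base's well-known ~30K WordPiece vocabulary and 768-dimensional hidden size, the numbers are illustrative, not AlphaBERT's actual configuration.

```python
# Token-embedding parameter count: one hidden-dim vector per vocabulary entry.
def embedding_params(vocab_size, hidden_dim):
    return vocab_size * hidden_dim

wordpiece = embedding_params(30522, 768)  # BERT-base WordPiece embedding table
chars = embedding_params(500, 64)         # small character vocabulary, narrow model
print(wordpiece, chars)  # → 23440896 32000
```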
Affiliation(s)
- Yen-Pin Chen
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei City, Taiwan; Department of Emergency Medicine, National Taiwan University Hospital Chu-Tung Branch, Hsinchu County, Taiwan; Department of Emergency Medicine, National Taiwan University Hospital, Taipei City, Taiwan
- Yi-Ying Chen
- Department of Emergency Medicine, National Taiwan University Hospital, Taipei City, Taiwan
- Jr-Jiun Lin
- Department of Emergency Medicine, National Taiwan University Hospital, Taipei City, Taiwan
- Chien-Hua Huang
- Department of Emergency Medicine, National Taiwan University Hospital, Taipei City, Taiwan; Department of Emergency Medicine, College of Medicine, National Taiwan University, Taipei City, Taiwan
- Feipei Lai
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei City, Taiwan; Department of Computer Science & Information Engineering, National Taiwan University, Taipei City, Taiwan; Department of Electrical Engineering, National Taiwan University, Taipei City, Taiwan
8
Natural language processing applications in library and information science. Online Information Review 2019. DOI: 10.1108/oir-07-2018-0217.
Abstract
Purpose
With the recent developments in information technologies, natural language processing (NLP) practices have made tasks in many areas easier and more practical. Nowadays, especially when big data are used in most research, NLP provides fast and easy methods for processing these data. The purpose of this paper is to identify subfields of library and information science (LIS) where NLP can be used and to provide a guide based on bibliometrics and social network analyses for researchers who intend to study this subject.
Design/methodology/approach
Within the scope of this study, 6,607 publications, including NLP methods published in the field of LIS, are examined and visualized by social network analysis methods.
Findings
After evaluating the obtained results, the subject categories of publications, frequently used keywords in these publications and the relationships between these words are revealed. Finally, the core journals and articles are classified thematically for researchers working in the field of LIS and planning to apply NLP in their research.
Originality/value
The results of this paper draw a general framework for LIS field and guides researchers on new techniques that may be useful in the field.
9
Guo J, Blake C, Guan Y. Evaluating automated entity extraction with respect to drug and non-drug treatment strategies. J Biomed Inform 2019;94:103177. PMID: 30986506. DOI: 10.1016/j.jbi.2019.103177.
Abstract
OBJECTIVES Treatment used in a randomized clinical trial is a critical data element both for physicians at the point of care and reviewers who are evaluating different interventions. Much of existing work on treatment extraction from the biomedical literature has focused on the extraction of pharmacological interventions. However, non-pharmacological interventions (e.g., exercise, diet, etc.) that are frequently used to address chronic conditions are less well studied. The goal of this study is to compare knowledge-based and machine learning strategies for the extraction of both drug and non-drug treatments. METHODS We collected 800 randomized clinical trial abstracts each for breast cancer and diabetes from PubMed. The treatments in the result/conclusion sentences of the abstracts were manually annotated and marked as drug/non-drug treatments. We then designed three methods to identify the treatments and evaluated the systems with respect to drug/non-drug treatments. The first method is solely based on knowledge base (here we used MetaMap). The second method is based on a machine learning model trained mainly on contextual features (ML_only). The third method is a combination approach that integrates the previous two approaches. RESULTS/DISCUSSION Results show that MetaMap, when used with high precision semantic types, has better performance for drug compared to non-drug treatments (F1 = 0.77 vs. 0.64). The ML_only approach has smaller performance difference between drug and non-drug treatments compared with the KB-based approach (F1 = 0.02 vs. 0.05, 0.07, and 0.13). The combination approach achieves significantly better performance than all MetaMap approaches alone for total treatments (F1 = 0.76 vs. 0.72, p < 0.001). The performance gain mainly comes from the non-drug treatments (0.03-0.08 improvement in F1), while the drug treatments do not benefit much from the combination approach (0-0.03 improvement in F1). 
CONCLUSION These results suggest that a knowledge-based approach should be employed for medical conditions that are primarily treated with drugs whereas conditions that are treated with either a combination of drug and non-drug interventions or primarily non-drug interventions should use automated tools that combine machine learning and a knowledge-based approach to achieve optimal performance.
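The combination approach can be sketched as a disjunction of a lexicon lookup and a classifier score: accept a candidate treatment mention if either component flags it. The lexicon, scores, and threshold below are invented stand-ins, not MetaMap or the paper's trained model.

```python
# Sketch: combine a knowledge-based matcher with an ML classifier score.
def kb_match(term):
    # Stand-in for a knowledge-base lookup (e.g. a drug lexicon).
    drug_lexicon = {"tamoxifen", "metformin"}
    return term in drug_lexicon

def ml_score(term):
    # Stand-in for a trained contextual classifier's confidence.
    scores = {"tamoxifen": 0.9, "exercise": 0.8, "placebo": 0.2}
    return scores.get(term, 0.0)

def is_treatment(term, threshold=0.5):
    # The KB catches lexicon terms (drugs); the ML model catches
    # contextual mentions (non-drug interventions like exercise).
    return kb_match(term) or ml_score(term) >= threshold

print([t for t in ["tamoxifen", "exercise", "placebo"] if is_treatment(t)])
# → ['tamoxifen', 'exercise']
```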
Affiliation(s)
- Jinlong Guo
- School of Information Sciences, University of Illinois at Urbana-Champaign, USA.
- Catherine Blake
- School of Information Sciences, University of Illinois at Urbana-Champaign, USA; Department of Computer Science, University of Illinois at Urbana-Champaign, USA.
- Yingjun Guan
- School of Information Sciences, University of Illinois at Urbana-Champaign, USA.
10
Moradi M. CIBS: A biomedical text summarizer using topic-based sentence clustering. J Biomed Inform 2018;88:53-61. DOI: 10.1016/j.jbi.2018.11.006.
11
Scarton LA, Wang L, Kilicoglu H, Jahries M, Del Fiol G. Expanding vocabularies for complementary and alternative medicine therapies. Int J Med Inform 2018;121:64-74. PMID: 30545491. DOI: 10.1016/j.ijmedinf.2018.11.009.
Abstract
OBJECTIVE There is a significant consumer demand for complementary and alternative medicine (CAM) therapies as possible alternatives to drugs in the treatment and prevention of chronic diseases. Expanding controlled vocabularies to include CAM treatment relations could help meet those needs by facilitating information retrieval from the published literature. The purpose of this study is to design and evaluate two methods to semi-automatically extract CAM treatment-related semantic predications (subject-predicate-object triplets) from the biomedical literature using the Semantic Medline database (SemMedDB). METHODS Predications were retrieved from SemMedDB, a database of semantic predications extracted from article abstracts available in Medline. Predications were retrieved for 20 biologically-based and 3 mind-body CAM therapies. The first method (allMedline) retrieved predications from any Medline citation, while the second method (soundStudies) only retrieved predications from scientifically sound clinical studies. Filtering criteria were applied to identify the predications focusing on the treatment and prevention of medical disorders using various CAM modalities. The disorders were extracted for each CAM therapy and ranked by occurrence. A reference vocabulary, composed of 20 biologically-based and 3 mind-body CAM therapies, was developed to evaluate the performance of each method according to precision and recall of the top 100 ranked concepts as well as average precision and recall. RESULTS The difference between allMedline and soundStudies in terms of median precision for the top 100 concepts ranked by occurrence was significant (21.0% versus 27.0%, p < .001). The soundStudies method had significantly higher precision (7.0% vs 11.5%, p < .001) and the allMedline had significantly higher recall (37.1% vs 25.6%, p < .001). 
CONCLUSION The soundStudies method may be useful for extracting treatment-related predications from the biomedical literature for the highest ranked concepts. Additional work is needed to improve the algorithm as well as identify and report shortcomings for future enhancements of the tools used to populate SemMedDB.
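The top-100 evaluation above uses precision at a rank cutoff k; a minimal sketch follows, with k=3 and an invented ranking and reference set for brevity (the paper uses k=100 against its reference vocabulary).

```python
# Precision at k: fraction of the top-k ranked concepts that are relevant.
def precision_at_k(ranked, relevant, k):
    top = ranked[:k]
    return sum(1 for c in top if c in relevant) / k

ranked = ["echinacea", "aspirin", "yoga", "kale"]   # concepts by occurrence
relevant = {"echinacea", "yoga", "meditation"}      # reference vocabulary
print(precision_at_k(ranked, relevant, 3))  # 2 of the top 3 are relevant
```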
Affiliation(s)
- Lou Ann Scarton
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT 84108, USA.
- Liqin Wang
- Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, 1620 Tremont St, Boston, MA 02120, USA; Harvard Medical School, A-111, 25 Shattuck Street, Boston, MA 02115, USA.
- Halil Kilicoglu
- Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA.
- Margaret Jahries
- Gateway Emerging Technology Wellness Center, 440 West 200 South, Suite 250, Salt Lake City, Utah 84101, USA.
- Guilherme Del Fiol
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT 84108, USA.
12
Nasr Azadani M, Ghadiri N, Davoodijam E. Graph-based biomedical text summarization: An itemset mining and sentence clustering approach. J Biomed Inform 2018;84:42-58. DOI: 10.1016/j.jbi.2018.06.005.
13
14
Garcia-Gathright JI, Matiasz NJ, Adame C, Sarma KV, Sauer L, Smedley NF, Spiegel ML, Strunck J, Garon EB, Taira RK, Aberle DR, Bui AAT. Evaluating Casama: Contextualized semantic maps for summarization of lung cancer studies. Comput Biol Med 2018;92:55-63. PMID: 29149658. PMCID: PMC5762403. DOI: 10.1016/j.compbiomed.2017.10.034.
Abstract
OBJECTIVE It is crucial for clinicians to stay up to date on current literature in order to apply recent evidence to clinical decision making. Automatic summarization systems can help clinicians quickly view an aggregated summary of literature on a topic. Casama, a representation and summarization system based on "contextualized semantic maps," captures the findings of biomedical studies as well as the contexts associated with patient population and study design. This paper presents a user-oriented evaluation of Casama in comparison to a context-free representation, SemRep. MATERIALS AND METHODS The effectiveness of the representation was evaluated by presenting users with manually annotated Casama and SemRep summaries of ten articles on driver mutations in cancer. Automatic annotations were evaluated on a collection of articles on EGFR mutation in lung cancer. Seven users completed a questionnaire rating the summarization quality for various topics and applications. RESULTS Casama had higher median scores than SemRep for the majority of the topics (p ≤ 0.00032), all of the applications (p ≤ 0.00089), and in overall summarization quality (p ≤ 1.5e-05). Casama's manual annotations outperformed Casama's automatic annotations (p = 0.00061). DISCUSSION Casama performed particularly well in the representation of strength of evidence, which was highly rated both quantitatively and qualitatively. Users noted that Casama's less granular, more targeted representation improved usability compared to SemRep. CONCLUSION This evaluation demonstrated the benefits of a contextualized representation for summarizing biomedical literature on cancer. Iteration on specific areas of Casama's representation, further development of its algorithms, and a clinically-oriented evaluation are warranted.
Affiliation(s)
- Jean I Garcia-Gathright: University of California, Los Angeles, Department of Bioengineering, 924 Westwood Boulevard, Suite 420, Los Angeles, CA, 90024, USA
- Nicholas J Matiasz: University of California, Los Angeles, Department of Bioengineering, 924 Westwood Boulevard, Suite 420, Los Angeles, CA, 90024, USA
- Carlos Adame: University of California, Los Angeles, Department of Medicine - Division of Hematology-Oncology, 924 Westwood Boulevard, Suite 200, Los Angeles, CA, 90024, USA
- Karthik V Sarma: University of California, Los Angeles, Department of Bioengineering, 924 Westwood Boulevard, Suite 420, Los Angeles, CA, 90024, USA
- Lauren Sauer: University of California, Los Angeles, Department of Medicine - Division of Hematology-Oncology, 924 Westwood Boulevard, Suite 200, Los Angeles, CA, 90024, USA
- Nova F Smedley: University of California, Los Angeles, Department of Bioengineering, 924 Westwood Boulevard, Suite 420, Los Angeles, CA, 90024, USA
- Marshall L Spiegel: University of California, Los Angeles, Department of Medicine - Division of Hematology-Oncology, 924 Westwood Boulevard, Suite 200, Los Angeles, CA, 90024, USA
- Jennifer Strunck: University of California, Los Angeles, Department of Medicine - Division of Hematology-Oncology, 924 Westwood Boulevard, Suite 200, Los Angeles, CA, 90024, USA
- Edward B Garon: University of California, Los Angeles, Department of Medicine - Division of Hematology-Oncology, 924 Westwood Boulevard, Suite 200, Los Angeles, CA, 90024, USA
- Ricky K Taira: University of California, Los Angeles, Department of Bioengineering, 924 Westwood Boulevard, Suite 420, Los Angeles, CA, 90024, USA; University of California, Los Angeles, Department of Radiological Sciences, 924 Westwood Boulevard, Suite 420, Los Angeles, CA, 90024, USA
- Denise R Aberle: University of California, Los Angeles, Department of Bioengineering, 924 Westwood Boulevard, Suite 420, Los Angeles, CA, 90024, USA; University of California, Los Angeles, Department of Radiological Sciences, 924 Westwood Boulevard, Suite 420, Los Angeles, CA, 90024, USA
- Alex A T Bui: University of California, Los Angeles, Department of Bioengineering, 924 Westwood Boulevard, Suite 420, Los Angeles, CA, 90024, USA; University of California, Los Angeles, Department of Radiological Sciences, 924 Westwood Boulevard, Suite 420, Los Angeles, CA, 90024, USA
15
Moradi M, Ghadiri N. Different approaches for identifying important concepts in probabilistic biomedical text summarization. Artif Intell Med 2018; 84:101-116. [DOI: 10.1016/j.artmed.2017.11.004] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2016] [Revised: 08/25/2017] [Accepted: 11/28/2017] [Indexed: 10/18/2022]
16
17
Krallinger M, Rabal O, Lourenço A, Oyarzabal J, Valencia A. Information Retrieval and Text Mining Technologies for Chemistry. Chem Rev 2017; 117:7673-7761. [PMID: 28475312 DOI: 10.1021/acs.chemrev.6b00851] [Citation(s) in RCA: 111] [Impact Index Per Article: 15.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023]
Abstract
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
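As a rough, hypothetical illustration of the chemical entity recognition step discussed in this entry (real CHEMDNER-style systems rely on trained sequence models and curated lexicons), a minimal dictionary-plus-suffix tagger might look like the following; the lexicon and suffix list here are invented for the example:

```python
import re

# Tiny illustrative dictionary of trivial chemical names (not a real lexicon).
LEXICON = {"aspirin", "ibuprofen", "benzene"}
# Crude suffix heuristic for systematic-looking names (illustrative only).
SUFFIX = re.compile(r"\w+(?:ol|ene|ide|ate|ine)", re.IGNORECASE)

def tag_chemicals(text):
    """Return candidate chemical mentions in order of appearance."""
    found = []
    for token in re.findall(r"\b[\w-]+\b", text):
        if token.lower() in LEXICON or SUFFIX.fullmatch(token):
            found.append(token)
    return found
```

A trained named-entity recognizer would replace both the dictionary lookup and the suffix heuristic; this sketch only shows where such a component sits in a retrieval pipeline.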
Affiliation(s)
- Martin Krallinger: Structural Computational Biology Group, Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre, C/Melchor Fernández Almagro 3, Madrid E-28029, Spain
- Obdulia Rabal: Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Avenida Pio XII 55, Pamplona E-31008, Spain
- Anália Lourenço: ESEI - Department of Computer Science, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, Ourense E-32004, Spain; Centro de Investigaciones Biomédicas (Centro Singular de Investigación de Galicia), Campus Universitario Lagoas-Marcosende, Vigo E-36310, Spain; CEB-Centre of Biological Engineering, University of Minho, Campus de Gualtar, Braga 4710-057, Portugal
- Julen Oyarzabal: Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Avenida Pio XII 55, Pamplona E-31008, Spain
- Alfonso Valencia: Life Science Department, Barcelona Supercomputing Centre (BSC-CNS), C/Jordi Girona, 29-31, Barcelona E-08034, Spain; Joint BSC-IRB-CRG Program in Computational Biology, Parc Científic de Barcelona, C/Baldiri Reixac 10, Barcelona E-08028, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig de Lluís Companys 23, Barcelona E-08010, Spain
18
Karaa WBA, Ashour AS, Sassi DB, Roy P, Kausar N, Dey N. MEDLINE Text Mining: An Enhancement Genetic Algorithm Based Approach for Document Clustering. INTELLIGENT SYSTEMS REFERENCE LIBRARY 2016. [DOI: 10.1007/978-3-319-21212-8_12] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/11/2023]
19
Bravo À, Piñero J, Queralt-Rosinach N, Rautschka M, Furlong LI. Extraction of relations between genes and diseases from text and large-scale data analysis: implications for translational research. BMC Bioinformatics 2015; 16:55. [PMID: 25886734 PMCID: PMC4466840 DOI: 10.1186/s12859-015-0472-9] [Citation(s) in RCA: 116] [Impact Index Per Article: 12.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2014] [Accepted: 01/19/2015] [Indexed: 11/23/2022] Open
Abstract
Background Current biomedical research needs to leverage and exploit the large amount of information reported in scientific publications. Automated text mining approaches, in particular those aimed at finding relationships between entities, are key for identification of actionable knowledge from free text repositories. We present the BeFree system aimed at identifying relationships between biomedical entities with a special focus on genes and their associated diseases. Results By exploiting morpho-syntactic information of the text, BeFree is able to identify gene-disease, drug-disease and drug-target associations with state-of-the-art performance. The application of BeFree to real-case scenarios shows its effectiveness in extracting information relevant for translational research. We show the value of the gene-disease associations extracted by BeFree through a number of analyses and integration with other data sources. BeFree succeeds in identifying genes associated to a major cause of morbidity worldwide, depression, which are not present in other public resources. Moreover, large-scale extraction and analysis of gene-disease associations, and integration with current biomedical knowledge, provided interesting insights on the kind of information that can be found in the literature, and raised challenges regarding data prioritization and curation. We found that only a small proportion of the gene-disease associations discovered by using BeFree is collected in expert-curated databases. Thus, there is a pressing need to find alternative strategies to manual curation, in order to review, prioritize and curate text-mining data and incorporate it into domain-specific databases. We present our strategy for data prioritization and discuss its implications for supporting biomedical research and applications. Conclusions BeFree is a novel text mining system that performs competitively for the identification of gene-disease, drug-disease and drug-target associations. 
Our analyses show that mining only a small fraction of MEDLINE results in a large dataset of gene-disease associations, and only a small proportion of this dataset is actually recorded in curated resources (2%), raising several issues on data prioritization and curation. We propose that joint analysis of text mined data with data curated by experts appears as a suitable approach to both assess data quality and highlight novel and interesting information. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0472-9) contains supplementary material, which is available to authorized users.
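A minimal sketch of the co-occurrence idea that underlies relation extraction at this scale follows; note that BeFree itself exploits morpho-syntactic analysis rather than plain co-occurrence, and the gene and disease dictionaries below are invented for the example:

```python
import re

# Hypothetical mini-dictionaries; real systems use large curated resources.
GENES = {"BDNF", "SLC6A4", "EGFR"}
DISEASES = {"depression", "lung cancer"}

def cooccurring_pairs(abstract):
    """Return (gene, disease) pairs mentioned in the same sentence.

    Naive substring matching on sentence splits; a real extractor would
    use entity recognition plus syntactic analysis of each sentence.
    """
    pairs = set()
    for sentence in re.split(r"(?<=[.!?])\s+", abstract):
        genes = {g for g in GENES if g in sentence}
        diseases = {d for d in DISEASES if d.lower() in sentence.lower()}
        pairs.update((g, d) for g in genes for d in diseases)
    return pairs
```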
Affiliation(s)
- Àlex Bravo: Research Programme on Biomedical Informatics (GRIB), IMIM, DCEXS, Universitat Pompeu Fabra, Barcelona, Spain
- Janet Piñero: Research Programme on Biomedical Informatics (GRIB), IMIM, DCEXS, Universitat Pompeu Fabra, Barcelona, Spain
- Núria Queralt-Rosinach: Research Programme on Biomedical Informatics (GRIB), IMIM, DCEXS, Universitat Pompeu Fabra, Barcelona, Spain
- Michael Rautschka: Research Programme on Biomedical Informatics (GRIB), IMIM, DCEXS, Universitat Pompeu Fabra, Barcelona, Spain
- Laura I Furlong: Research Programme on Biomedical Informatics (GRIB), IMIM, DCEXS, Universitat Pompeu Fabra, Barcelona, Spain
20
Alhindi A, Kruschwitz U, Fox C, Albakour MD. Profile-Based Summarisation for Web Site Navigation. ACM T INFORM SYST 2015. [DOI: 10.1145/2699661] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
Abstract
Information systems that utilise contextual information have the potential of helping a user identify relevant information more quickly and more accurately than systems that work the same for all users and contexts. Contextual information comes in a variety of types, often derived from records of past interactions between a user and the information system. It can be individual or group based. We are focusing on the latter, harnessing the search behaviour of cohorts of users, turning it into a domain model that can then be used to assist other users of the same cohort. More specifically, we aim to explore how such a domain model is best utilised for profile-biased summarisation of documents in a navigation scenario in which such summaries can be displayed as hover text as a user moves the mouse over a link. The main motivation is to help a user find relevant documents more quickly. Given the fact that the Web in general has been studied extensively already, we focus our attention on Web sites and similar document collections. Such collections can be notoriously difficult to search or explore. The process of acquiring the domain model is not a research interest here; we simply adopt a biologically inspired method that resembles the idea of ant colony optimisation. This has been shown to work well in a variety of application areas. The model can be built in a continuous learning cycle that exploits search patterns as recorded in typical query log files. Our research explores different summarisation techniques, some of which use the domain model and some that do not. We perform task-based evaluations of these different techniques (and thus of the impact of the domain model and profile-biased summarisation) in the context of Web site navigation.
Affiliation(s)
- Chris Fox: University of Essex, Colchester, United Kingdom
21
Cohen AM, Smalheiser NR, McDonagh MS, Yu C, Adams CE, Davis JM, Yu PS. Automated confidence ranked classification of randomized controlled trial articles: an aid to evidence-based medicine. J Am Med Inform Assoc 2015; 22:707-17. [PMID: 25656516 PMCID: PMC4457112 DOI: 10.1093/jamia/ocu025] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2014] [Accepted: 11/15/2014] [Indexed: 11/24/2022] Open
Abstract
Objective: For many literature review tasks, including systematic review (SR) and other aspects of evidence-based medicine, it is important to know whether an article describes a randomized controlled trial (RCT). Current manual annotation is not complete or flexible enough for the SR process. In this work, highly accurate machine learning predictive models were built that include confidence predictions of whether an article is an RCT. Materials and Methods: The LibSVM classifier was used with forward selection of potential feature sets on a large human-related subset of MEDLINE to create a classification model requiring only the citation, abstract, and MeSH terms for each article. Results: The model achieved an area under the receiver operating characteristic curve of 0.973 and mean squared error of 0.013 on the held out year 2011 data. Accurate confidence estimates were confirmed on a manually reviewed set of test articles. A second model not requiring MeSH terms was also created, and performs almost as well. Discussion: Both models accurately rank and predict article RCT confidence. Using the model and the manually reviewed samples, it is estimated that about 8000 (3%) additional RCTs can be identified in MEDLINE, and that 5% of articles tagged as RCTs in Medline may not be identified. Conclusion: Retagging human-related studies with a continuously valued RCT confidence is potentially more useful for article ranking and review than a simple yes/no prediction. The automated RCT tagging tool should offer significant savings of time and effort during the process of writing SRs, and is a key component of a multistep text mining pipeline that we are building to streamline SR workflow. In addition, the model may be useful for identifying errors in MEDLINE publication types. 
The RCT confidence predictions described here have been made available to users as a web service with a user query form front end at: http://arrowsmith.psych.uic.edu/cgi-bin/arrowsmith_uic/RCT_Tagger.cgi.
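The confidence-ranked classification idea can be sketched with a simple multinomial naive Bayes scorer standing in for the authors' LibSVM model; the training snippets below are invented, and a real model would be trained on full MEDLINE citations, abstracts, and MeSH terms:

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Fit a multinomial naive Bayes model; labels are 1 (RCT) or 0 (non-RCT)."""
    counts = {0: Counter(), 1: Counter()}
    for doc, y in zip(docs, labels):
        counts[y].update(doc.lower().split())
    n = Counter(labels)
    priors = {y: n[y] / len(labels) for y in (0, 1)}
    return priors, counts

def rct_confidence(model, doc):
    """Return a P(RCT | doc)-style confidence score, with Laplace smoothing."""
    priors, counts = model
    vocab = set(counts[0]) | set(counts[1])
    logp = {}
    for y in (0, 1):
        total = sum(counts[y].values())
        logp[y] = math.log(priors[y])
        for w in doc.lower().split():
            logp[y] += math.log((counts[y][w] + 1) / (total + len(vocab)))
    m = max(logp.values())  # shift for numerical stability before exponentiating
    pos, neg = math.exp(logp[1] - m), math.exp(logp[0] - m)
    return pos / (pos + neg)
```

The continuous score, rather than a yes/no label, is what enables the confidence ranking the paper argues for.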
Affiliation(s)
- Aaron M Cohen: Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR 97239, USA
- Neil R Smalheiser: Department of Psychiatry, University of Illinois at Chicago, Chicago, IL 60612, USA
- Marian S McDonagh: Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR 97239, USA
- Clement Yu: Department of Computer Science, University of Illinois at Chicago, Chicago, IL 60612, USA
- Clive E Adams: Division of Psychiatry, University of Nottingham, Nottingham, UK
- John M Davis: Department of Psychiatry, University of Illinois at Chicago, Chicago, IL 60612, USA
- Philip S Yu: Department of Computer Science, University of Illinois at Chicago, Chicago, IL 60612, USA
22
Mishra R, Bian J, Fiszman M, Weir CR, Jonnalagadda S, Mostafa J, Del Fiol G. Text summarization in the biomedical domain: a systematic review of recent research. J Biomed Inform 2014; 52:457-67. [PMID: 25016293 DOI: 10.1016/j.jbi.2014.06.009] [Citation(s) in RCA: 83] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2014] [Revised: 06/04/2014] [Accepted: 06/23/2014] [Indexed: 11/27/2022]
Abstract
OBJECTIVE The amount of information for clinicians and clinical researchers is growing exponentially. Text summarization reduces information as an attempt to enable users to find and understand relevant source texts more quickly and effortlessly. In recent years, substantial research has been conducted to develop and evaluate various summarization techniques in the biomedical domain. The goal of this study was to systematically review recently published research on summarization of textual documents in the biomedical domain. MATERIALS AND METHODS MEDLINE (2000 to October 2013), IEEE Digital Library, and the ACM digital library were searched. Investigators independently screened and abstracted studies that examined text summarization techniques in the biomedical domain. Information was extracted from selected articles along five dimensions: input, purpose, output, method and evaluation. RESULTS Of 10,786 studies retrieved, 34 (0.3%) met the inclusion criteria. Natural language processing (17; 50%) and a hybrid technique comprising statistical, natural language processing and machine learning methods (15; 44%) were the most common summarization approaches. Most studies (28; 82%) conducted an intrinsic evaluation. DISCUSSION This is the first systematic review of text summarization in the biomedical domain. The study identified research gaps and provides recommendations for guiding future research on biomedical text summarization. CONCLUSION Recent research has focused on a hybrid technique comprising statistical, language processing and machine learning techniques. Further research is needed on the application and evaluation of text summarization in real research or patient care settings.
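For orientation, the "statistical" family of extractive techniques surveyed in reviews like this one reduces to a few lines: score each sentence by the corpus frequency of its content words and keep the top-scoring ones. The stop-word list below is illustrative and not taken from any reviewed system:

```python
import re
from collections import Counter

STOP = {"the", "a", "of", "and", "in", "is", "to", "for"}  # illustrative only

def extractive_summary(text, k=1):
    """Score each sentence by summed corpus term frequency; return top-k."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP]
    freq = Counter(words)
    def score(s):
        return sum(freq[w] for w in re.findall(r"[a-z]+", s.lower()) if w not in STOP)
    return sorted(sentences, key=score, reverse=True)[:k]
```

The hybrid systems the review describes layer NLP (e.g. semantic predications) and machine-learned weights on top of this basic scheme.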
Affiliation(s)
- Rashmi Mishra: Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Jiantao Bian: Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA; Clinical Modeling Team, Intermountain Healthcare, Salt Lake City, UT, USA
- Marcelo Fiszman: Lister Hill Center, National Library of Medicine, Bethesda, MD, USA
- Charlene R Weir: Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA; VA Medical Center, Salt Lake City, UT, USA
- Siddhartha Jonnalagadda: Department of Preventive Medicine-Health and Biomedical Informatics, Northwestern University, Chicago, IL, USA
- Javed Mostafa: School of Information and Library Science (SILS), University of North Carolina, Chapel Hill, NC, USA
- Guilherme Del Fiol: Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
23
Demner-Fushman D, Mork JG, Aronson AR. Mining MEDLINE for problems associated with vitamin D. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2013; 2013:300-308. [PMID: 24551339 PMCID: PMC3900180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
This paper presents a two-step approach to generating comprehensive abstractive overviews for biomedical topics. It starts with a sensitivity-maximizing search of MEDLINE/PubMed and MeSH-based filtering of the results that are then processed using NLP methods to extract relations between entities of interest. We evaluate this approach in a case study based on the IOM report on the role of vitamin D in human health. The report defines disorders that serve as health indicators for the role of vitamin D. We evaluate the abstractive overviews generated using MeSH indexing and the extracted relations using the disorders listed in the IOM report as reference standard. We conclude that MeSH-based aggregation and filtering of the results is a useful and easy step in the generation of abstractive overviews. Although our relation extraction achieved 83.6% recall and 92.8% precision, only half of the disorders of interest participated in these relations.
Affiliation(s)
- Dina Demner-Fushman: Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, National Institutes of Health, DHHS, Bethesda, MD
- James G Mork: Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, National Institutes of Health, DHHS, Bethesda, MD
- Alan R Aronson: Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, National Institutes of Health, DHHS, Bethesda, MD
24
Mishra R, Del Fiol G, Kilicoglu H, Jonnalagadda S, Fiszman M. Automatically extracting clinically useful sentences from UpToDate to support clinicians' information needs. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2013; 2013:987-992. [PMID: 24551389 PMCID: PMC3900230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
UNLABELLED Clinicians raise several information needs in the course of care. Most of these needs can be met by online health knowledge resources such as UpToDate. However, finding relevant information in these resources often requires significant time and cognitive effort. OBJECTIVE To design and assess algorithms for extracting from UpToDate the sentences that represent the most clinically useful information for patient care decision making. METHODS We developed algorithms based on semantic predications extracted with SemRep, a semantic natural language processing parser. Two algorithms were compared against a gold standard composed of UpToDate sentences rated in terms of clinical usefulness. RESULTS Clinically useful sentences were strongly correlated with predication frequency (correlation= 0.95). The two algorithms did not differ in terms of top ten precision (53% vs. 49%; p=0.06). CONCLUSIONS Semantic predications may serve as the basis for extracting clinically useful sentences. Future research is needed to improve the algorithms.
Affiliation(s)
- Rashmi Mishra: Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Guilherme Del Fiol: Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Halil Kilicoglu: Lister Hill Center, National Library of Medicine, Bethesda, MD, USA
- Marcelo Fiszman: Lister Hill Center, National Library of Medicine, Bethesda, MD, USA
25
Development and evaluation of a biomedical search engine using a predicate-based vector space model. J Biomed Inform 2013; 46:929-39. [PMID: 23892296 DOI: 10.1016/j.jbi.2013.07.006] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2013] [Revised: 06/18/2013] [Accepted: 07/19/2013] [Indexed: 11/21/2022]
Abstract
Although biomedical information available in articles and patents is increasing exponentially, we continue to rely on the same information retrieval methods and use very few keywords to search millions of documents. We are developing a fundamentally different approach for finding much more precise and complete information with a single query using predicates instead of keywords for both query and document representation. Predicates are triples that are more complex data structures than keywords and contain more structured information. To make optimal use of them, we developed a new predicate-based vector space model and query-document similarity function with adjusted tf-idf and boost function. Using a test bed of 107,367 PubMed abstracts, we evaluated the first essential function: retrieving information. Cancer researchers provided 20 realistic queries, for which the top 15 abstracts were retrieved using a predicate-based (new) and keyword-based (baseline) approach. Each abstract was evaluated, double-blind, by cancer researchers on a 0-5 point scale to calculate precision (0 versus higher) and relevance (0-5 score). Precision was significantly higher (p<.001) for the predicate-based (80%) than for the keyword-based (71%) approach. Relevance was almost doubled with the predicate-based approach: 2.1 versus 1.6 without rank order adjustment (p<.001) and 1.34 versus 0.98 with rank order adjustment (p<.001) for the predicate- versus keyword-based approach, respectively. Predicates can support more precise searching than keywords, laying the foundation for rich and sophisticated information search.
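The core of a predicate-based vector space model can be sketched by treating each (subject, relation, object) triple as an indexing unit with plain tf-idf weighting; the adjusted tf-idf and boost function of the paper are omitted, and any triples shown are invented:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of documents, each a list of (subject, relation, object)
    triples. Returns one sparse tf-idf vector (dict) per document."""
    df = Counter(t for d in docs for t in set(d))  # document frequency per triple
    n = len(docs)
    vecs = []
    for d in docs:
        tf = Counter(d)
        vecs.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Because whole triples are the vocabulary, two documents only match when they assert the same structured fact, which is what drives the precision gain over bag-of-words keywords.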
26
Zhang H, Fiszman M, Shin D, Wilkowski B, Rindflesch TC. Clustering cliques for graph-based summarization of the biomedical research literature. BMC Bioinformatics 2013; 14:182. [PMID: 23742159 PMCID: PMC3682874 DOI: 10.1186/1471-2105-14-182] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2012] [Accepted: 05/29/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Graph-based notions are increasingly used in biomedical data mining and knowledge discovery tasks. In this paper, we present a clique-clustering method to automatically summarize graphs of semantic predications produced from PubMed citations (titles and abstracts). RESULTS SemRep is used to extract semantic predications from the citations returned by a PubMed search. Cliques were identified from frequently occurring predications with highly connected arguments filtered by degree centrality. Themes contained in the summary were identified with a hierarchical clustering algorithm based on common arguments shared among cliques. The validity of the clusters in the summaries produced was compared to the Silhouette-generated baseline for cohesion, separation and overall validity. The theme labels were also compared to a reference standard produced with major MeSH headings. CONCLUSIONS For 11 topics in the testing data set, the overall validity of clusters from the system summary was 10% better than the baseline (43% versus 33%). While compared to the reference standard from MeSH headings, the results for recall, precision and F-score were 0.64, 0.65, and 0.65 respectively.
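The graph-side mechanics can be sketched in two steps: maximal-clique enumeration, then a greedy merge of cliques that share an argument. SemRep extraction, degree-centrality filtering, and the hierarchical clustering used in the paper are omitted; the graph in any example is invented:

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques of an undirected graph (Bron-Kerbosch).

    adj maps each node to the set of its neighbours."""
    cliques = []
    def bk(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
        for v in list(p):
            bk(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    bk(set(), set(adj), set())
    return cliques

def cluster_by_shared_argument(cliques):
    """Greedy one-pass merge: cliques sharing any argument join one theme."""
    themes = []
    for clique in cliques:
        for theme in themes:
            if theme & clique:
                theme |= clique
                break
        else:
            themes.append(set(clique))
    return themes
```

Nodes would be predication arguments and edges frequently co-occurring argument pairs; each resulting theme corresponds to one cluster in the summary.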
Affiliation(s)
- Han Zhang: Department of Medical Informatics, China Medical University, Shenyang, Liaoning 110001, China; National Library of Medicine, Bethesda, MD 20894, USA
- Dongwook Shin: National Library of Medicine, Bethesda, MD 20894, USA
- Bartlomiej Wilkowski: DTU Informatics, Technical University of Denmark, Kongens Lyngby, Denmark; Danish National Biobank, National Health Surveillance & Research, Statens Serum Institut, Copenhagen, Denmark
27
Jonnalagadda SR, Del Fiol G, Medlin R, Weir C, Fiszman M, Mostafa J, Liu H. Automatically extracting sentences from Medline citations to support clinicians' information needs. J Am Med Inform Assoc 2012; 20:995-1000. [PMID: 23100128 DOI: 10.1136/amiajnl-2012-001347] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022] Open
Abstract
OBJECTIVE Online health knowledge resources contain answers to most of the information needs raised by clinicians in the course of care. However, significant barriers limit the use of these resources for decision-making, especially clinicians' lack of time. In this study we assessed the feasibility of automatically generating knowledge summaries for a particular clinical topic composed of relevant sentences extracted from Medline citations. METHODS The proposed approach combines information retrieval and semantic information extraction techniques to identify relevant sentences from Medline abstracts. We assessed this approach in two case studies on the treatment alternatives for depression and Alzheimer's disease. RESULTS A total of 515 of 564 (91.3%) sentences retrieved in the two case studies were relevant to the topic of interest. About one-third of the relevant sentences described factual knowledge or a study conclusion that can be used for supporting information needs at the point of care. CONCLUSIONS The high rate of relevant sentences is desirable, given that clinicians' lack of time is one of the main barriers to using knowledge resources at the point of care. Sentence rank was not significantly associated with relevancy, possibly due to most sentences being highly relevant. Sentences located closer to the end of the abstract and sentences with treatment and comparative predications were likely to be conclusive sentences. Our proposed technical approach to helping clinicians meet their information needs is promising. The approach can be extended for other knowledge resources and information need types.
28
Wu JA, Hsu W, Bui AAT. An Approach for Incorporating Context in Building Probabilistic Predictive Models. PROCEEDINGS. IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS, IMAGING AND SYSTEMS BIOLOGY 2012; 2012:96-105. [PMID: 27617299 PMCID: PMC5017790 DOI: 10.1109/hisb.2012.30] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
With the increasing amount of information collected through clinical practice and scientific experimentation, a growing challenge is how to utilize available resources to construct predictive models to facilitate clinical decision making. Clinicians often have questions related to the treatment and outcome of a medical problem for individual patients; however, few tools exist that leverage the large collection of patient data and scientific knowledge to answer these questions. Without appropriate context, existing data that have been collected for a specific task may not be suitable for creating new models that answer different questions. This paper presents an approach that leverages available structured or unstructured data to build a probabilistic predictive model that assists physicians with answering clinical questions on individual patients. Various challenges related to transforming available data to an end-user application are addressed: problem decomposition, variable selection, context representation, automated extraction of information from unstructured data sources, model generation, and development of an intuitive application to query the model and present the results. We describe our efforts towards building a model that predicts the risk of vasospasm in aneurysm patients.
Affiliation(s)
- Juan Anna Wu: Biomedical Engineering IDP, Medical Imaging Informatics Group, University of California, Los Angeles, USA
- William Hsu: Department of Radiological Sciences, Medical Imaging Informatics Group, University of California, Los Angeles, USA
- Alex AT Bui: Department of Radiological Sciences, Medical Imaging Informatics Group, University of California, Los Angeles, USA
29
Workman TE, Fiszman M, Hurdle JF. Text summarization as a decision support aid. BMC Med Inform Decis Mak 2012; 12:41. [PMID: 22621674 PMCID: PMC3461485 DOI: 10.1186/1472-6947-12-41] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2011] [Accepted: 04/18/2012] [Indexed: 11/18/2022] Open
Abstract
Background PubMed data potentially can provide decision support information, but PubMed was not exclusively designed to be a point-of-care tool. Natural language processing applications that summarize PubMed citations hold promise for extracting decision support information. The objective of this study was to evaluate the efficiency of a text summarization application called Semantic MEDLINE, enhanced with a novel dynamic summarization method, in identifying decision support data. Methods We downloaded PubMed citations addressing the prevention and drug treatment of four disease topics. We then processed the citations with Semantic MEDLINE, enhanced with the dynamic summarization method. We also processed the citations with a conventional summarization method, as well as with a baseline procedure. We evaluated the results using clinician-vetted reference standards built from recommendations in a commercial decision support product, DynaMed. Results For the drug treatment data, Semantic MEDLINE enhanced with dynamic summarization achieved average recall and precision scores of 0.848 and 0.377, while conventional summarization produced 0.583 average recall and 0.712 average precision, and the baseline method yielded average recall and precision values of 0.252 and 0.277. For the prevention data, Semantic MEDLINE enhanced with dynamic summarization achieved average recall and precision scores of 0.655 and 0.329. The baseline technique resulted in recall and precision scores of 0.269 and 0.247. No conventional Semantic MEDLINE method accommodating summarization for prevention exists. Conclusion Semantic MEDLINE with dynamic summarization outperformed conventional summarization in terms of recall, and outperformed the baseline method in both recall and precision. This new approach to text summarization demonstrates potential in identifying decision support data for multiple needs.
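The recall and precision figures reported above follow the standard set-based definitions against a clinician-vetted reference standard. A minimal sketch (the finding sets below are hypothetical, not drawn from the study):

```python
def evaluate(system_findings, reference_standard):
    """Recall, precision, and F-score of system-extracted findings
    against a reference standard; both arguments are sets."""
    true_pos = len(system_findings & reference_standard)
    recall = true_pos / len(reference_standard) if reference_standard else 0.0
    precision = true_pos / len(system_findings) if system_findings else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return recall, precision, f_score

# Hypothetical drug-treatment findings vs. a DynaMed-style reference.
recall, precision, f_score = evaluate(
    {"metformin", "insulin", "statin"},
    {"metformin", "insulin", "aspirin", "lisinopril"},
)
```

With two of three system findings in the four-item reference, recall is 0.50 and precision about 0.67, the same trade-off pattern the study observed between the dynamic and conventional summarizers.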
Affiliation(s)
- T Elizabeth Workman
- Department of Biomedical Informatics, University of Utah, HSEB 5775, Salt Lake City, UT 84112, USA.
30
Workman TE, Stoddart JM. Rethinking information delivery: using a natural language processing application for point-of-care data discovery. J Med Libr Assoc 2012; 100:113-20. [PMID: 22514507 DOI: 10.3163/1536-5050.100.2.009] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/03/2022] Open
Abstract
OBJECTIVE This paper examines the use of Semantic MEDLINE, a natural language processing application enhanced with a statistical algorithm known as Combo, as a potential decision support tool for clinicians. Semantic MEDLINE summarizes text in PubMed citations, transforming it into compact declarations that are filtered according to a user's information need that can be displayed in a graphic interface. Integration of the Combo algorithm enables Semantic MEDLINE to deliver information salient to many diverse needs. METHODS The authors selected three disease topics and crafted PubMed search queries to retrieve citations addressing the prevention of these diseases. They then processed the citations with Semantic MEDLINE, with the Combo algorithm enhancement. To evaluate the results, they constructed a reference standard for each disease topic consisting of preventive interventions recommended by a commercial decision support tool. RESULTS Semantic MEDLINE with Combo produced an average recall of 79% in primary and secondary analyses, an average precision of 45%, and a final average F-score of 0.57. CONCLUSION This new approach to point-of-care information delivery holds promise as a decision support tool for clinicians. Health sciences libraries could implement such technologies to deliver tailored information to their users.
Affiliation(s)
- T Elizabeth Workman
- Postdoctoral Research Associate, Department of Biomedical Informatics, University of Utah, Salt Lake City, UT 84112, USA.
31
Kilicoglu H, Rosemblat G, Fiszman M, Rindflesch TC. Constructing a semantic predication gold standard from the biomedical literature. BMC Bioinformatics 2011; 12:486. [PMID: 22185221 PMCID: PMC3281188 DOI: 10.1186/1471-2105-12-486] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2011] [Accepted: 12/20/2011] [Indexed: 11/30/2022] Open
Abstract
Background Semantic relations increasingly underpin biomedical text mining and knowledge discovery applications. The success of such practical applications crucially depends on the quality of extracted relations, which can be assessed against a gold standard reference. Most such references in biomedical text mining focus on narrow subdomains and adopt different semantic representations, rendering them difficult to use for benchmarking independently developed relation extraction systems. In this article, we present a multi-phase gold standard annotation study, in which we annotated 500 sentences randomly selected from MEDLINE abstracts on a wide range of biomedical topics with 1371 semantic predications. The UMLS Metathesaurus served as the main source for conceptual information and the UMLS Semantic Network for relational information. We measured interannotator agreement and analyzed the annotations closely to identify some of the challenges in annotating biomedical text with relations based on an ontology or a terminology. Results We obtain fair to moderate interannotator agreement in the practice phase (0.378-0.475). With improved guidelines and additional semantic equivalence criteria, the agreement increases by 12% (0.415 to 0.536) in the main annotation phase. In addition, we find that agreement increases to 0.688 when the agreement calculation is limited to those predications that are based only on the explicitly provided UMLS concepts and relations. Conclusions While interannotator agreement in the practice phase confirms that conceptual annotation is a challenging task, the increasing agreement in the main annotation phase points out that an acceptable level of agreement can be achieved in multiple iterations, by setting stricter guidelines and establishing semantic equivalence criteria. Mapping text to ontological concepts emerges as the main challenge in conceptual annotation. Annotating predications involving biomolecular entities and processes is particularly challenging. While the resulting gold standard is mainly intended to serve as a test collection for our semantic interpreter, we believe that the lessons learned are applicable generally.
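Interannotator agreement values in the 0.3-0.7 range like those reported above are typically chance-corrected statistics; the paper does not show its formula here, but Cohen's kappa is the standard instance. A minimal sketch with made-up annotator judgments:

```python
def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the same
    items: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement rate and p_e the agreement expected from each annotator's
    marginal label frequencies."""
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    p_e = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
              for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Two annotators judging five candidate predications (1 = correct).
kappa = cohens_kappa([1, 1, 1, 0, 0], [1, 1, 0, 0, 0])
```

Here observed agreement is 0.8 but expected chance agreement is 0.48, giving kappa of about 0.62 -- in the "moderate to substantial" band the study reached in its main annotation phase.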
Affiliation(s)
- Halil Kilicoglu
- Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, MD, USA.
32
Li Y, Salmasian H, Harpaz R, Chase H, Friedman C. Determining the reasons for medication prescriptions in the EHR using knowledge and natural language processing. AMIA Annu Symp Proc 2011; 2011:768-776. [PMID: 22195134 PMCID: PMC3243251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 05/31/2023]
Abstract
Knowledge of medication indications is significant for automatic applications aimed at improving patient safety, such as computerized physician order entry and clinical decision support systems. The Electronic Health Record (EHR) contains pertinent information related to patient safety such as information related to appropriate prescribing. However, the reasons for medication prescriptions are usually not explicitly documented in the patient record. This paper describes a method that determines the reasons for medication uses based on information occurring in outpatient notes. The method utilizes drug-indication knowledge that we acquired, and natural language processing. Evaluation showed the method obtained a sensitivity of 62.8%, specificity of 93.9%, precision of 90% and F-measure of 73.9%. This pilot study demonstrated that linking external drug indication knowledge to the EHR for determining the reasons for medication use was promising, but also revealed some challenges. Future work will focus on increasing the accuracy and coverage of the indication knowledge and evaluating its performance using a much larger set of drugs frequently used in the outpatient population.
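The reported F-measure can be checked directly against the reported precision and sensitivity (recall), since F = 2PR / (P + R):

```python
# Reported figures from the abstract above.
precision = 0.900
sensitivity = 0.628  # recall

# Harmonic mean of precision and recall.
f_measure = 2 * precision * sensitivity / (precision + sensitivity)
```

This gives 2 x 0.900 x 0.628 / 1.528, approximately 0.740, consistent with the reported 73.9%.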
Affiliation(s)
- Ying Li
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
33
Zhang H, Fiszman M, Shin D, Miller CM, Rosemblat G, Rindflesch TC. Degree centrality for semantic abstraction summarization of therapeutic studies. J Biomed Inform 2011; 44:830-8. [PMID: 21575741 DOI: 10.1016/j.jbi.2011.05.001] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2010] [Revised: 04/25/2011] [Accepted: 05/02/2011] [Indexed: 11/28/2022]
Abstract
Automatic summarization has been proposed to help manage the results of biomedical information retrieval systems. Semantic MEDLINE, for example, summarizes semantic predications representing assertions in MEDLINE citations. Results are presented as a graph which maintains links to the original citations. Graphs summarizing more than 500 citations are hard to read and navigate, however. We exploit graph theory for focusing these large graphs. The method is based on degree centrality, which measures connectedness in a graph. Four categories of clinical concepts related to treatment of disease were identified and presented as a summary of input text. A baseline was created using term frequency of occurrence. The system was evaluated on summaries for treatment of five diseases compared to a reference standard produced manually by two physicians. The results showed that recall for system results was 72%, precision was 73%, and F-score was 0.72. The system F-score was considerably higher than that for the baseline (0.47).
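Degree centrality, the graph-theoretic measure the study uses to focus large predication graphs, simply counts the edges incident to each concept node. A minimal sketch over hypothetical Semantic MEDLINE-style triples (the concepts below are illustrative, not from the paper):

```python
from collections import defaultdict

def degree_centrality(predications):
    """Degree of each concept node in a predication graph, where each
    (subject, predicate, object) triple contributes one edge between
    its subject and object concepts."""
    degree = defaultdict(int)
    for subject, _predicate, obj in predications:
        degree[subject] += 1
        degree[obj] += 1
    return dict(degree)

# Hypothetical treatment predications extracted from citations.
degrees = degree_centrality([
    ("Aspirin", "TREATS", "Headache"),
    ("Aspirin", "TREATS", "Fever"),
    ("Ibuprofen", "TREATS", "Headache"),
])
```

Concepts with high degree ("Aspirin" here, with two edges) are the most connected and are kept in the focused summary, while low-degree nodes are pruned.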
Affiliation(s)
- Han Zhang
- Department of Medical Informatics, China Medical University, Shenyang, China.
34
Workman TE, Hurdle JF. Dynamic summarization of bibliographic-based data. BMC Med Inform Decis Mak 2011; 11:6. [PMID: 21284871 PMCID: PMC3042900 DOI: 10.1186/1472-6947-11-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2010] [Accepted: 02/01/2011] [Indexed: 11/15/2022] Open
Abstract
Background Traditional information retrieval techniques typically return excessive output when directed at large bibliographic databases. Natural Language Processing applications strive to extract salient content from the excessive data. Semantic MEDLINE, a National Library of Medicine (NLM) natural language processing application, highlights relevant information in PubMed data. However, Semantic MEDLINE implements manually coded schemas, accommodating few information needs. Currently, there are only five such schemas, while many more would be needed to realistically accommodate all potential users. The aim of this project was to develop and evaluate a statistical algorithm that automatically identifies relevant bibliographic data; the new algorithm could be incorporated into a dynamic schema to accommodate various information needs in Semantic MEDLINE, and eliminate the need for multiple schemas. Methods We developed a flexible algorithm named Combo that combines three statistical metrics, the Kullback-Leibler Divergence (KLD), Riloff's RlogF metric (RlogF), and a new metric called PredScal, to automatically identify salient data in bibliographic text. We downloaded citations from a PubMed search query addressing the genetic etiology of bladder cancer. The citations were processed with SemRep, an NLM rule-based application that produces semantic predications. SemRep output was processed by Combo, in addition to the standard Semantic MEDLINE genetics schema and independently by the two individual KLD and RlogF metrics. We evaluated each summarization method using an existing reference standard within the task-based context of genetic database curation. Results Combo asserted 74 genetic entities implicated in bladder cancer development, whereas the traditional schema asserted 10 genetic entities; the KLD and RlogF metrics individually asserted 77 and 69 genetic entities, respectively. Combo achieved 61% recall and 81% precision, with an F-score of 0.69. The traditional schema achieved 23% recall and 100% precision, with an F-score of 0.37. The KLD metric achieved 61% recall, 70% precision, with an F-score of 0.65. The RlogF metric achieved 61% recall, 72% precision, with an F-score of 0.66. Conclusions Semantic MEDLINE summarization using the new Combo algorithm outperformed a conventional summarization schema in a genetic database curation task. It potentially could streamline information acquisition for other needs without having to hand-build multiple saliency schemas.
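Combo's exact combination of KLD, RlogF, and PredScal is defined in the paper; as an illustration of just the KLD component, each term can be scored by its pointwise contribution to the divergence between topic and background distributions. The smoothing scheme and term sets below are assumptions for the sketch, not the study's implementation:

```python
import math
from collections import Counter

def kld_saliency(topic_terms, background_terms):
    """Per-term contribution to KL divergence, p * log2(p / q), where
    p is a term's relative frequency in the topic citations and q its
    add-one-smoothed relative frequency in a background set. High
    positive scores mark terms over-represented in the topic."""
    p_counts = Counter(topic_terms)
    q_counts = Counter(background_terms)
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    vocab = set(p_counts) | set(q_counts)
    scores = {}
    for term, count in p_counts.items():
        p = count / p_total
        q = (q_counts[term] + 1) / (q_total + len(vocab))
        scores[term] = p * math.log2(p / q)
    return scores

# Toy topic set (bladder-cancer genetics) vs. a generic background.
scores = kld_saliency(
    ["FGFR3", "TP53", "FGFR3", "cancer"],
    ["cancer", "cell", "cell", "protein", "cancer"],
)
```

Terms frequent in the topic set but rare in the background ("FGFR3" here) score highest, while terms common in both ("cancer") score near or below zero, which is how a dynamic schema can surface salient entities without hand-coded rules.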
Affiliation(s)
- T Elizabeth Workman
- Department of Biomedical Informatics, University of Utah, HSEB 5775, Salt Lake City, UT, USA.
35
Workman TE, Fiszman M, Hurdle JF, Rindflesch TC. Biomedical text summarization to support genetic database curation: using Semantic MEDLINE to create a secondary database of genetic information. J Med Libr Assoc 2011; 98:273-81. [PMID: 20936065 DOI: 10.3163/1536-5050.98.4.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/03/2022] Open
Abstract
OBJECTIVE This paper examines the development and evaluation of an automatic summarization system in the domain of molecular genetics. The system is a potential component of an advanced biomedical information management application called Semantic MEDLINE and could assist librarians in developing secondary databases of genetic information extracted from the primary literature. METHODS An existing summarization system was modified for identifying biomedical text relevant to the genetic etiology of disease. The summarization system was evaluated on the task of identifying data describing genes associated with bladder cancer in MEDLINE citations. A gold standard was produced using records from Genetics Home Reference and Online Mendelian Inheritance in Man. Genes in text found by the system were compared to the gold standard. Recall, precision, and F-measure were calculated. RESULTS The system achieved recall of 46%, and precision of 88% (F-measure=0.61) by taking Gene References into Function (GeneRIFs) into account. CONCLUSION The new summarization schema for genetic etiology has potential as a component in Semantic MEDLINE to support the work of data curators.
Affiliation(s)
- T Elizabeth Workman
- Department of Biomedical Informatics, University of Utah, 26 S 2000 E, HSEB 5700, Salt Lake City, UT 84112, USA.
36
Keselman A, Rosemblat G, Kilicoglu H, Fiszman M, Jin H, Shin D, Rindflesch TC. Adapting Semantic Natural Language Processing Technology to Address Information Overload in Influenza Epidemic Management. JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY : JASIST 2010; 61:10.1002/asi.21414. [PMID: 24311971 PMCID: PMC3847910 DOI: 10.1002/asi.21414] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
Explosion of disaster health information results in information overload among response professionals. The objective of this project was to determine the feasibility of applying semantic natural language processing (NLP) technology to addressing this overload. The project characterizes concepts and relationships commonly used in disaster health-related documents on influenza pandemics, as the basis for adapting an existing semantic summarizer to the domain. Methods include human review and semantic NLP analysis of a set of relevant documents. This is followed by a pilot-test in which two information specialists use the adapted application for a realistic information seeking task. According to the results, the ontology of influenza epidemics management can be described via a manageable number of semantic relationships that involve concepts from a limited number of semantic types. Test users demonstrate several ways to engage with the application to obtain useful information. This suggests that existing semantic NLP algorithms can be adapted to support information summarization and visualization in influenza epidemics and other disaster health areas. However, additional research is needed in the areas of terminology development (as many relevant relationships and terms are not part of existing standardized vocabularies), NLP, and user interface design.
37
Wang X, Chase HS, Li J, Hripcsak G, Friedman C. Integrating heterogeneous knowledge sources to acquire executable drug-related knowledge. AMIA Annu Symp Proc 2010; 2010:852-856. [PMID: 21347099 PMCID: PMC3041361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
Knowledge of medical entities, such as drug-related information is critical for many automated biomedical applications, such as decision support and pharmacovigilance. In this work, heterogeneous information sources were integrated automatically to obtain drug-related knowledge. We focus on one type of knowledge, drug-treats-condition, in the study and propose a framework for integrating disparate knowledge sources. Evaluation based on a random sample of drug-condition pairs indicated an overall coverage of 96%, recall of 98% and a precision of 87%. In conclusion, the preliminary study demonstrated that the knowledge generated from this study was comparable to the manually curated gold standard and that this method of automatically integrating knowledge sources is effective. The automated method should also be applicable to integrate other clinical knowledge, such as drug-related knowledge with omics information.
Affiliation(s)
- Xiaoyan Wang
- Department of Biomedical Informatics, Columbia University, New York, NY
38
The structure and dynamics of cocitation clusters: A multiple-perspective cocitation analysis. J Am Soc Inf Sci Technol 2010. [DOI: 10.1002/asi.21309] [Citation(s) in RCA: 714] [Impact Index Per Article: 51.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
39
Demner-Fushman D, Chapman WW, McDonald CJ. What can natural language processing do for clinical decision support? J Biomed Inform 2009; 42:760-72. [PMID: 19683066 PMCID: PMC2757540 DOI: 10.1016/j.jbi.2009.08.007] [Citation(s) in RCA: 274] [Impact Index Per Article: 18.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2008] [Revised: 08/10/2009] [Accepted: 08/11/2009] [Indexed: 11/29/2022]
Abstract
Computerized clinical decision support (CDS) aims to aid decision making of health care providers and the public by providing easily accessible health-related information at the point and time it is needed. Natural language processing (NLP) is instrumental in using free-text information to drive CDS, representing clinical knowledge and CDS interventions in standardized formats, and leveraging clinical narrative. The early innovative NLP research of clinical narrative was followed by a period of stable research conducted at the major clinical centers and a shift of mainstream interest to biomedical NLP. This review primarily focuses on the recently renewed interest in development of fundamental NLP methods and advances in the NLP systems for CDS. The current solutions to challenges posed by distinct sublanguages, intended user groups, and support goals are discussed.
Affiliation(s)
- Dina Demner-Fushman
- U.S. National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
40
Chapman WW, Cohen KB. Current issues in biomedical text mining and natural language processing. J Biomed Inform 2009; 42:757-9. [PMID: 19735740 DOI: 10.1016/j.jbi.2009.09.001] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2009] [Revised: 08/31/2009] [Accepted: 09/01/2009] [Indexed: 11/29/2022]