1
Eghbali N, Klochko C, Razoky P, Chintalapati P, Jawad E, Mahdi Z, Craig J, Ghassemi MM. Improving Automating Quality Control in Radiology: Leveraging Large Language Models to Extract Correlative Findings in Radiology and Operative Reports. AMIA Joint Summits on Translational Science Proceedings 2024; 2024:135-144. [PMID: 38827099 PMCID: PMC11141845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
Radiology imaging plays a pivotal role in medical diagnostics, providing clinicians with insights into patient health and guiding the next steps in treatment. The true value of a radiological image lies in the accuracy of its accompanying report. To ensure the reliability of these reports, they are often cross-referenced with operative findings. The conventional method of manually comparing radiology and operative reports is labor-intensive and demands specialized knowledge. This study explores the potential of a Large Language Model (LLM) to simplify the radiology evaluation process by automatically extracting pertinent details from these reports, focusing especially on the shoulder's primary anatomical structures. A fine-tuned LLM identifies mentions of the supraspinatus tendon, infraspinatus tendon, subscapularis tendon, biceps tendon, and glenoid labrum in lengthy radiology and operative documents. Initial findings emphasize the model's capability to pinpoint relevant data, suggesting a transformative approach to the typical evaluation methods in radiology.
2
Automated Radiology-Arthroscopy Correlation of Knee Meniscal Tears Using Natural Language Processing Algorithms. Acad Radiol 2022; 29:479-487. [PMID: 33583713 DOI: 10.1016/j.acra.2021.01.017] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/30/2020] [Revised: 01/19/2021] [Accepted: 01/21/2021] [Indexed: 12/29/2022]
Abstract
RATIONALE AND OBJECTIVES Train and apply natural language processing (NLP) algorithms for automated radiology-arthroscopy correlation of meniscal tears. MATERIALS AND METHODS In this retrospective single-institution study, we trained supervised machine learning models (logistic regression, support vector machine, and random forest) to detect medial or lateral meniscus tears on free-text MRI reports. We trained and evaluated model performances with cross-validation using 3593 manually annotated knee MRI reports. To assess radiology-arthroscopy correlation, we then randomly partitioned this dataset 80:20 for training and testing, where 108 test set MRIs were followed by knee arthroscopy within 1 year. These free-text arthroscopy reports were also manually annotated. The NLP algorithms trained on the knee MRI training dataset were then evaluated on the MRI and arthroscopy report test datasets. We assessed radiology-arthroscopy agreement using the ensembled NLP-extracted findings versus manually annotated findings. RESULTS The NLP models showed high cross-validation performance for meniscal tear detection on knee MRI reports (medial meniscus F1 scores 0.93-0.94, lateral meniscus F1 scores 0.86-0.88). When these algorithms were evaluated on arthroscopy reports, despite never training on arthroscopy reports, performance was similar, though higher with model ensembling (medial meniscus F1 score 0.97, lateral meniscus F1 score 0.99). However, ensembling did not improve performance on knee MRI reports. In the radiology-arthroscopy test set, the ensembled NLP models were able to detect mismatches between MRI and arthroscopy reports with sensitivity 79% and specificity 87%. CONCLUSION Radiology-arthroscopy correlation can be automated for knee meniscal tears using NLP algorithms, which shows promise for education and quality improvement.
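The mismatch-detection evaluation in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: a report pair counts as a mismatch when the MRI and arthroscopy findings disagree, and NLP-derived mismatches are scored against manually annotated ones. The function names and the toy labels are hypothetical and do not come from the study.

```python
def is_mismatch(mri_tear: bool, arthroscopy_tear: bool) -> bool:
    """A report pair is a mismatch when the two findings disagree."""
    return mri_tear != arthroscopy_tear


def sensitivity_specificity(truth, predicted):
    """Score predicted mismatch flags against reference mismatch flags."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy (MRI finding, arthroscopy finding) pairs; invented for illustration only.
manual_pairs = [(True, True), (True, False), (False, False), (False, True), (True, True)]
nlp_pairs = [(True, True), (True, False), (False, False), (False, False), (True, False)]

truth = [is_mismatch(m, a) for m, a in manual_pairs]
predicted = [is_mismatch(m, a) for m, a in nlp_pairs]
sens, spec = sensitivity_specificity(truth, predicted)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With real data, the manual annotations would supply `truth` and the ensembled NLP outputs would supply `predicted`.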
3
Tsuji S, Wen A, Takahashi N, Zhang H, Ogasawara K, Jiang G. Developing a RadLex-Based Named Entity Recognition Tool for Mining Textual Radiology Reports: Development and Performance Evaluation Study. J Med Internet Res 2021; 23:e25378. [PMID: 34714247 PMCID: PMC8590187 DOI: 10.2196/25378] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/10/2020] [Revised: 07/06/2021] [Accepted: 07/27/2021] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Named entity recognition (NER) plays an important role in extracting the features of descriptions such as the name and location of a disease for mining free-text radiology reports. However, the performance of existing NER tools is limited because the number of entities that can be extracted depends on the dictionary lookup. In particular, the recognition of compound terms is very complicated because of the variety of patterns. OBJECTIVE The aim of this study is to develop and evaluate an NER tool concerned with compound terms using RadLex for mining free-text radiology reports. METHODS We leveraged the clinical Text Analysis and Knowledge Extraction System (cTAKES) to develop customized pipelines using both RadLex and SentiWordNet (a general purpose dictionary). We manually annotated 400 radiology reports for compound terms in noun phrases and used them as the gold standard for performance evaluation (precision, recall, and F-measure). In addition, we created a compound terms-enhanced dictionary (CtED) by analyzing false negatives and false positives and applied it to another 100 radiology reports for validation. We also evaluated the stem terms of compound terms by defining two measures: occurrence ratio (OR) and matching ratio (MR). RESULTS The F-measure of cTAKES+RadLex+general purpose dictionary was 30.9% (precision 73.3% and recall 19.6%) and that of the combined CtED was 63.1% (precision 82.8% and recall 51%). The OR indicated that the stem terms of effusion, node, tube, and disease were used frequently, but it still lacks capturing compound terms. The MR showed that 71.85% (9411/13,098) of the stem terms matched with that of the ontologies, and RadLex improved approximately 22% of the MR from the cTAKES default dictionary. The OR and MR revealed that the characteristics of stem terms would have the potential to help generate synonymous phrases using the ontologies. 
CONCLUSIONS We developed a RadLex-based customized pipeline for parsing radiology reports and demonstrated that CtED and stem term analysis have the potential to improve dictionary-based NER performance with regard to expanding vocabularies.
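The F-measures reported in this abstract are consistent with the usual harmonic mean of precision and recall. The short check below is a sketch that assumes this standard balanced (F1) definition; the function and variable names are illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
baseline = f_measure(0.733, 0.196)   # cTAKES + RadLex + general purpose dictionary
with_cted = f_measure(0.828, 0.51)   # combined CtED

print(f"{baseline:.1%}")   # 30.9%
print(f"{with_cted:.1%}")  # 63.1%
```

The large gap between precision and recall in the baseline explains why its F-measure sits close to the lower of the two values.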
Affiliation(s)
- Shintaro Tsuji
  - Department of Health Sciences Research, Department of Radiology, Rochester, MN, United States; Graduate School of Health Sciences, Hokkaido University, Sapporo, Japan
- Andrew Wen
  - Department of Health Sciences Research, Department of Radiology, Rochester, MN, United States
- Naoki Takahashi
  - Department of Radiology, Mayo Clinic, Rochester, MN, United States
- Hongjian Zhang
  - Graduate School of Health Sciences, Hokkaido University, Sapporo, Japan
- Guoqian Jiang
  - Department of Health Sciences Research, Department of Radiology, Rochester, MN, United States
4
Huang X, Chen H, Yan JD. Study on structured method of Chinese MRI report of nasopharyngeal carcinoma. BMC Med Inform Decis Mak 2021; 21:203. [PMID: 34330269 PMCID: PMC8323197 DOI: 10.1186/s12911-021-01547-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/25/2021] [Accepted: 06/02/2021] [Indexed: 12/02/2022] Open
Abstract
Background Image text is an important type of text data in the medical field, as it can assist clinicians in making a diagnosis. However, due to the diversity of languages, most descriptions in the image text are unstructured data. The same medical phenomenon may also be described in various ways, such that it remains challenging to conduct text structure analysis. The aim of this research is to develop a feasible approach that can automatically convert nasopharyngeal cancer reports into structured text and build a knowledge network. Methods In this work, we compare commonly used named entity recognition (NER) models, choose the optimal model as our triplet extraction model, and present a Chinese structuring algorithm. Finally, we visualize the results of the algorithm in the form of a knowledge network of nasopharyngeal cancer. Results In NER, both accuracy and recall of the BERT-CRF model reached 99%. The structured extraction rate is 84.74%, and the accuracy is 89.39%. The architecture based on recurrent neural network does not rely on medical dictionaries or word segmentation tools and can realize triplet recognition. Conclusions The BERT-CRF model has high performance in NER, and the triplet can reflect the content of the image report. This work can provide technical support for the construction of a nasopharyngeal cancer database.
Affiliation(s)
- Xin Huang
  - School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, Guangdong, China; Nanfang Hospital, Southern Medical University, Guangzhou, 510515, Guangdong, China
- Hui Chen
  - Shuguang Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China
- Jing-Dong Yan
  - Nanfang Hospital, Southern Medical University, Guangzhou, 510515, Guangdong, China.
5
Kersloot MG, van Putten FJP, Abu-Hanna A, Cornet R, Arts DL. Natural language processing algorithms for mapping clinical text fragments onto ontology concepts: a systematic review and recommendations for future studies. J Biomed Semantics 2020; 11:14. [PMID: 33198814 PMCID: PMC7670625 DOI: 10.1186/s13326-020-00231-z] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 07/17/2020] [Accepted: 11/03/2020] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND Free-text descriptions in electronic health records (EHRs) can be of interest for clinical research and care optimization. However, free text cannot be readily interpreted by a computer and, therefore, has limited value. Natural Language Processing (NLP) algorithms can make free text machine-interpretable by attaching ontology concepts to it. However, implementations of NLP algorithms are not evaluated consistently. Therefore, the objective of this study was to review the current methods used for developing and evaluating NLP algorithms that map clinical text fragments onto ontology concepts. To standardize the evaluation of algorithms and reduce heterogeneity between studies, we propose a list of recommendations. METHODS Two reviewers examined publications indexed by Scopus, IEEE, MEDLINE, EMBASE, the ACM Digital Library, and the ACL Anthology. Publications reporting on NLP for mapping clinical text from EHRs to ontology concepts were included. Year, country, setting, objective, evaluation and validation methods, NLP algorithms, terminology systems, dataset size and language, performance measures, reference standard, generalizability, operational use, and source code availability were extracted. The studies' objectives were categorized by way of induction. These results were used to define recommendations. RESULTS Two thousand three hundred fifty five unique studies were identified. Two hundred fifty six studies reported on the development of NLP algorithms for mapping free text to ontology concepts. Seventy-seven described development and evaluation. Twenty-two studies did not perform a validation on unseen data and 68 studies did not perform external validation. Of 23 studies that claimed that their algorithm was generalizable, 5 tested this by external validation. 
A list of sixteen recommendations regarding the usage of NLP systems and algorithms, usage of data, evaluation and validation, presentation of results, and generalizability of results was developed. CONCLUSION We found many heterogeneous approaches to the reporting on the development and evaluation of NLP algorithms that map clinical text to ontology concepts. Over one-fourth of the identified publications did not perform an evaluation. In addition, over one-fourth of the included studies did not perform a validation, and 88% did not perform external validation. We believe that our recommendations, alongside an existing reporting standard, will increase the reproducibility and reusability of future studies and NLP algorithms in medicine.
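The proportions quoted in the conclusion can be traced back to the counts in the results. The sketch below assumes the 77 studies that described development and evaluation are the denominator for the validation figures; the variable names are illustrative.

```python
def share(part: int, whole: int) -> float:
    """Fraction of `whole` represented by `part`."""
    return part / whole


included = 77               # studies describing development and evaluation
no_validation = 22          # no validation on unseen data
no_external = 68            # no external validation
claimed_generalizable = 23  # studies claiming generalizability
externally_tested = 5       # of those, tested by external validation

print(f"no validation:      {share(no_validation, included):.1%}")   # 28.6%
print(f"no external valid.: {share(no_external, included):.1%}")     # 88.3%
print(f"claims tested:      {share(externally_tested, claimed_generalizable):.1%}")  # 21.7%
```

Under that assumption, 22 of 77 is the "over one-fourth" figure and 68 of 77 rounds to the reported 88%.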
Affiliation(s)
- Martijn G. Kersloot
  - Amsterdam UMC, University of Amsterdam, Department of Medical Informatics, Amsterdam Public Health Research Institute Castor EDC, Room J1B-109, PO Box 22700, 1100 DE Amsterdam, The Netherlands
  - Castor EDC, Amsterdam, The Netherlands
- Florentien J. P. van Putten
  - Amsterdam UMC, University of Amsterdam, Department of Medical Informatics, Amsterdam Public Health Research Institute Castor EDC, Room J1B-109, PO Box 22700, 1100 DE Amsterdam, The Netherlands
- Ameen Abu-Hanna
  - Amsterdam UMC, University of Amsterdam, Department of Medical Informatics, Amsterdam Public Health Research Institute Castor EDC, Room J1B-109, PO Box 22700, 1100 DE Amsterdam, The Netherlands
- Ronald Cornet
  - Amsterdam UMC, University of Amsterdam, Department of Medical Informatics, Amsterdam Public Health Research Institute Castor EDC, Room J1B-109, PO Box 22700, 1100 DE Amsterdam, The Netherlands
- Derk L. Arts
  - Amsterdam UMC, University of Amsterdam, Department of Medical Informatics, Amsterdam Public Health Research Institute Castor EDC, Room J1B-109, PO Box 22700, 1100 DE Amsterdam, The Netherlands
  - Castor EDC, Amsterdam, The Netherlands
6
Spasic I, Button K. Patient Triage by Topic Modeling of Referral Letters: Feasibility Study. JMIR Med Inform 2020; 8:e21252. [PMID: 33155985 PMCID: PMC7679210 DOI: 10.2196/21252] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/15/2020] [Revised: 09/17/2020] [Accepted: 10/05/2020] [Indexed: 01/22/2023] Open
Abstract
Background Musculoskeletal conditions are managed within primary care, but patients can be referred to secondary care if a specialist opinion is required. The ever-increasing demand for health care resources emphasizes the need to streamline care pathways with the ultimate aim of ensuring that patients receive timely and optimal care. Information contained in referral letters underpins the referral decision-making process but is yet to be explored systematically for the purposes of treatment prioritization for musculoskeletal conditions. Objective This study aims to explore the feasibility of using natural language processing and machine learning to automate the triage of patients with musculoskeletal conditions by analyzing information from referral letters. Specifically, we aim to determine whether referral letters can be automatically assorted into latent topics that are clinically relevant, that is, considered relevant when prescribing treatments. Here, clinical relevance is assessed by posing 2 research questions. Can latent topics be used to automatically predict treatment? Can clinicians interpret latent topics as cohorts of patients who share common characteristics or experiences such as medical history, demographics, and possible treatments? Methods We used latent Dirichlet allocation to model each referral letter as a finite mixture over an underlying set of topics and model each topic as an infinite mixture over an underlying set of topic probabilities. The topic model was evaluated in the context of automating patient triage. Given a set of treatment outcomes, a binary classifier was trained for each outcome using previously extracted topics as the input features of the machine learning algorithm. In addition, a qualitative evaluation was performed to assess the human interpretability of topics. 
Results The prediction accuracy of binary classifiers outperformed the stratified random classifier by a large margin, indicating that topic modeling could be used to predict the treatment, thus effectively supporting patient triage. The qualitative evaluation confirmed the high clinical interpretability of the topic model. Conclusions The results established the feasibility of using natural language processing and machine learning to automate triage of patients with knee or hip pain by analyzing information from their referral letters.
Affiliation(s)
- Irena Spasic
  - School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom
- Kate Button
  - School of Healthcare Sciences, Cardiff University, Cardiff, United Kingdom
7
Button K, Spasić I, Playle R, Owen D, Lau M, Hannaway L, Jones S. Using routine referral data for patients with knee and hip pain to improve access to specialist care. BMC Musculoskelet Disord 2020; 21:66. [PMID: 32013997 PMCID: PMC6998102 DOI: 10.1186/s12891-020-3087-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/13/2019] [Accepted: 01/22/2020] [Indexed: 11/29/2022] Open
Abstract
Background Referral letters from primary care contain a large amount of information that could be used to improve the appropriateness of the referral pathway for individuals seeking specialist opinion for knee or hip pain. The primary aim of this study was to evaluate the content of the referral letters to identify information that can independently predict an optimal care pathway. Methods Using a prospective longitudinal design, a convenience sample of patients with hip or knee pain were recruited from orthopaedic, specialist general practice and advanced physiotherapy practitioner clinics. Individuals completed a Knee or hip Osteoarthritis Outcome Score at initial consultation and after 6 months. Participant demographics, body mass index, medication and co-morbidity data were extracted from the referral letters. Free text of the referral letters was mapped automatically onto the Unified Medical Language System to identify relevant clinical variables. Treatment outcomes were extracted from the consultation letters. Each outcome was classified as being an optimal or sub-optimal pathway, where an optimal pathway was defined as the one that results in the right treatment at the right time. Logistic regression was used to identify variables that were independently associated with an optimal pathway. Results A total of 643 participants were recruited, 419 (66.7%) were classified as having an optimal pathway. Variables independently associated with having an optimal care pathway were lower body mass index (OR 1.0, 95% CI 0.9 to 1.0 p = 0.004), named disease or syndromes (OR 1.8, 95% CI 1.1 to 2.8, p = 0.02) and taking pharmacologic substances (OR 1.8, 95% CI 1.0 to 3.3, p = 0.02). Having a single diagnostic procedure was associated with a suboptimal pathway (OR 0.5, 95% CI 0.3 to 0.9 p < 0.001). Neither Knee nor Hip Osteoarthritis Outcome scores were associated with an optimal pathway. 
Body mass index was found to be a good predictor of patient rated function (coefficient −0.8, 95% CI −1.1 to −0.4, p < 0.001). Conclusion Over 30% of patients followed a sub-optimal care pathway, which represents potential inefficiency and wasted healthcare resources. A core data set including body mass index should be considered, as this was a predictor of optimal care and patient rated pain and function.
Affiliation(s)
- Kate Button
  - School of Healthcare Sciences, Cardiff University, Eastgate House, Newport Road, Cardiff, CF24 0AB, UK; Physiotherapy Department, Cardiff and Vale University Health Board, Cardiff, UK.
- Irena Spasić
  - School of Computer Science & Informatics, Cardiff University, Cardiff, UK
- Rebecca Playle
  - Centre for Trials Research, Cardiff University, Cardiff, UK
- David Owen
  - School of Computer Science & Informatics, Cardiff University, Cardiff, UK
- Mandy Lau
  - Centre for Trials Research, Cardiff University, Cardiff, UK
- Stephen Jones
  - Trauma and Orthopaedics, Cardiff and Vale Orthopaedic Centre, University Hospital Llandough, Cardiff and Vale UHB, Cardiff, UK
8
Wang Y, Wang L, Rastegar-Mojarad M, Moon S, Shen F, Afzal N, Liu S, Zeng Y, Mehrabi S, Sohn S, Liu H. Clinical information extraction applications: A literature review. J Biomed Inform 2018; 77:34-49. [PMID: 29162496 PMCID: PMC5771858 DOI: 10.1016/j.jbi.2017.11.011] [Citation(s) in RCA: 340] [Impact Index Per Article: 56.7] [Received: 06/30/2017] [Revised: 11/01/2017] [Accepted: 11/17/2017] [Indexed: 12/24/2022]
Abstract
BACKGROUND With the rapid adoption of electronic health records (EHRs), it is desirable to harvest information and knowledge from EHRs to support automated systems at the point of care and to enable secondary use of EHRs for clinical and translational research. One critical component used to facilitate the secondary use of EHR data is the information extraction (IE) task, which automatically extracts and encodes clinical information from text. OBJECTIVES In this literature review, we present a review of recent published research on clinical information extraction (IE) applications. METHODS A literature search was conducted for articles published from January 2009 to September 2016 based on Ovid MEDLINE In-Process & Other Non-Indexed Citations, Ovid MEDLINE, Ovid EMBASE, Scopus, Web of Science, and ACM Digital Library. RESULTS A total of 1917 publications were identified for title and abstract screening. Of these publications, 263 articles were selected and discussed in this review in terms of publication venues and data sources, clinical IE tools, methods, and applications in the areas of disease- and drug-related studies, and clinical workflow optimizations. CONCLUSIONS Clinical IE has been used for a wide range of applications, however, there is a considerable gap between clinical studies using EHR data and studies using clinical IE. This study enabled us to gain a more concrete understanding of the gap and to provide potential solutions to bridge this gap.
Affiliation(s)
- Yanshan Wang
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Liwei Wang
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Majid Rastegar-Mojarad
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Sungrim Moon
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Feichen Shen
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Naveed Afzal
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Sijia Liu
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Yuqun Zeng
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Saeed Mehrabi
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Sunghwan Sohn
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Hongfang Liu
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States.
9
Viani N, Larizza C, Tibollo V, Napolitano C, Priori SG, Bellazzi R, Sacchi L. Information extraction from Italian medical reports: An ontology-driven approach. Int J Med Inform 2017; 111:140-148. [PMID: 29425625 DOI: 10.1016/j.ijmedinf.2017.12.013] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 04/11/2017] [Revised: 11/30/2017] [Accepted: 12/16/2017] [Indexed: 12/15/2022]
Abstract
OBJECTIVE In this work, we propose an ontology-driven approach to identify events and their attributes from episodes of care included in medical reports written in Italian. For this language, shared resources for clinical information extraction are not easily accessible. MATERIALS AND METHODS The corpus considered in this work includes 5432 non-annotated medical reports belonging to patients with rare arrhythmias. To guide the information extraction process, we built a domain-specific ontology that includes the events and the attributes to be extracted, with related regular expressions. The ontology and the annotation system were constructed on a development set, while the performance was evaluated on an independent test set. As a gold standard, we considered a manually curated hospital database named TRIAD, which stores most of the information written in reports. RESULTS The proposed approach performs well on the considered Italian medical corpus, with a percentage of correct annotations above 90% for most considered clinical events. We also assessed the possibility to adapt the system to the analysis of another language (i.e., English), with promising results. DISCUSSION AND CONCLUSION Our annotation system relies on a domain ontology to extract and link information in clinical text. We developed an ontology that can be easily enriched and translated, and the system performs well on the considered task. In the future, it could be successfully used to automatically populate the TRIAD database.
Affiliation(s)
- Natalia Viani
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Ferrata 5, 27100, Pavia, PV, Italy.
- Cristiana Larizza
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Ferrata 5, 27100, Pavia, PV, Italy
- Valentina Tibollo
  - IRCCS Istituti Clinici Scientifici Maugeri, Via Salvatore Maugeri 10, 27100, Pavia, PV, Italy
- Carlo Napolitano
  - IRCCS Istituti Clinici Scientifici Maugeri, Via Salvatore Maugeri 10, 27100, Pavia, PV, Italy
- Silvia G Priori
  - IRCCS Istituti Clinici Scientifici Maugeri, Via Salvatore Maugeri 10, 27100, Pavia, PV, Italy; Department of Molecular Medicine, University of Pavia, Via Forlanini, 27100, Pavia, PV, Italy
- Riccardo Bellazzi
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Ferrata 5, 27100, Pavia, PV, Italy; IRCCS Istituti Clinici Scientifici Maugeri, Via Salvatore Maugeri 10, 27100, Pavia, PV, Italy
- Lucia Sacchi
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Ferrata 5, 27100, Pavia, PV, Italy
10
Dongliang X, Jingchang P, Bailing W. Multiple kernels learning-based biological entity relationship extraction method. J Biomed Semantics 2017; 8:38. [PMID: 29297359 PMCID: PMC5763518 DOI: 10.1186/s13326-017-0138-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022] Open
Abstract
Background Automatically extracting protein entity interaction information from biomedical literature can help to build protein relation networks and design new drugs. More than 20 million literature abstracts are included in MEDLINE, the most authoritative textual database in the field of biomedicine, and their number grows exponentially over time. This frantic expansion of the biomedical literature can often be difficult to absorb or manually analyze. Thus, efficient and automated search engines are necessary to explore the biomedical literature using text mining techniques. Results The P, R, and F values of the tag graph method on the AIMed corpus are 50.82, 69.76, and 58.61%, respectively. The P, R, and F values of the tag graph kernel method on the other four evaluation corpora are 2–5% higher than those of the all-paths graph kernel. The P, R, and F values of the feature kernel and tag graph kernel fusion method are 53.43, 71.62, and 61.30%, respectively. The P, R, and F values of the feature kernel and tag graph kernel fusion method are 55.47, 70.29, and 60.37%, respectively. This indicates that the two kinds of kernel fusion methods perform better than a single kernel. Conclusion In comparison with the all-paths graph kernel method, the tag graph kernel method is superior in terms of overall performance. Experiments show that the multi-kernel method performs better than the three separate single-kernel methods and the dual mutually fused kernel methods used here on the five corpus sets.
Affiliation(s)
- Xu Dongliang
  - School of Mechanical, Electrical and Information Engineering, ShanDong University, WenHua West Road, WeiHai, 264209, China
- Pan Jingchang
  - School of Mechanical, Electrical and Information Engineering, ShanDong University, WenHua West Road, WeiHai, 264209, China.
- Wang Bailing
  - School of Computer Science and Technology, Harbin Institute of Technology, WenHua West Road, WeiHai, 264209, China
11
Min H, Mobahi H, Irvin K, Avramovic S, Wojtusiak J. Predicting activities of daily living for cancer patients using an ontology-guided machine learning methodology. J Biomed Semantics 2017; 8:39. [PMID: 28915930 PMCID: PMC5603095 DOI: 10.1186/s13326-017-0149-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 10/23/2016] [Accepted: 09/06/2017] [Indexed: 01/29/2023] Open
Abstract
Background Bio-ontologies are becoming increasingly important in knowledge representation and in the machine learning (ML) fields. This paper presents a ML approach that incorporates bio-ontologies and its application to the SEER-MHOS dataset to discover patterns of patient characteristics that impact the ability to perform activities of daily living (ADLs). Bio-ontologies are used to provide computable knowledge for ML methods to “understand” biomedical data. Results This retrospective study included 723 cancer patients from the SEER-MHOS dataset. Two ML methods were applied to create predictive models for ADL disabilities for the first year after a patient’s cancer diagnosis. The first method is a standard rule learning algorithm; the second is that same algorithm additionally equipped with methods for reasoning with ontologies. The models showed that a patient’s race, ethnicity, smoking preference, treatment plan and tumor characteristics including histology, staging, cancer site, and morphology were predictors for ADL performance levels one year after cancer diagnosis. The ontology-guided ML method was more accurate at predicting ADL performance levels (P < 0.1) than methods without ontologies. Conclusions This study demonstrated that bio-ontologies can be harnessed to provide medical knowledge for ML algorithms. The presented method demonstrates that encoding specific types of hierarchical relationships to guide rule learning is possible, and can be extended to other types of semantic relationships present in biomedical ontologies. The ontology-guided ML method achieved better performance than the method without ontologies. The presented method can also be used to promote the effectiveness and efficiency of ML in healthcare, in which use of background knowledge and consistency with existing clinical expertise is critical.
Affiliation(s)
- Hua Min
  - Department of Health Administration and Policy, College of Health and Human Services, George Mason University, MS: 1J3, 4400 University Drive, Fairfax, VA, 22030-4444, USA
- Hedyeh Mobahi
  - Department of Health Administration and Policy, College of Health and Human Services, George Mason University, MS: 1J3, 4400 University Drive, Fairfax, VA, 22030-4444, USA
- Katherine Irvin
  - Department of Health Administration and Policy, College of Health and Human Services, George Mason University, MS: 1J3, 4400 University Drive, Fairfax, VA, 22030-4444, USA
- Sanja Avramovic
  - Department of Health Administration and Policy, College of Health and Human Services, George Mason University, MS: 1J3, 4400 University Drive, Fairfax, VA, 22030-4444, USA
- Janusz Wojtusiak
  - Department of Health Administration and Policy, College of Health and Human Services, George Mason University, MS: 1J3, 4400 University Drive, Fairfax, VA, 22030-4444, USA
|
12
|
Yim WW, Kwan SW, Yetisgen M. Classifying tumor event attributes in radiology reports. J Assoc Inf Sci Technol 2017. [DOI: 10.1002/asi.23937] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 11/12/2022]
Affiliation(s)
- Wen-wai Yim
  - Palo Alto Veterans Affairs, Biomedical Informatics Research, Stanford University, 1265 Welch Road; Stanford CA 94305
- Sharon W. Kwan
  - Department of Radiology; Interventional Radiology Section, University of Washington Medical Center, 1959 NE Pacific Street; Seattle WA 98195 USA
- Meliha Yetisgen
  - Biomedical and Health Informatics, Linguistics; University of Washington, Box 358047; Seattle WA 98195 USA
|
13
|
Deléger L, Campillos L, Ligozat AL, Névéol A. Design of an extensive information representation scheme for clinical narratives. J Biomed Semantics 2017; 8:37. [PMID: 28893314 PMCID: PMC5594525 DOI: 10.1186/s13326-017-0135-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/10/2016] [Accepted: 07/26/2017] [Indexed: 12/26/2022] Open
Abstract
BACKGROUND: Knowledge representation frameworks are essential to the understanding of complex biomedical processes, and to the analysis of biomedical texts that describe them. Combined with natural language processing (NLP), they have the potential to contribute to retrospective studies by unlocking important phenotyping information contained in the narrative content of electronic health records (EHRs). This work aims to develop an extensive information representation scheme for clinical information contained in EHR narratives, and to support secondary use of EHR narrative data to answer clinical questions. METHODS: We review recent work that proposed information representation schemes and applied them to the analysis of clinical narratives. We then propose a unifying scheme that supports the extraction of information to address a large variety of clinical questions. RESULTS: We devised a new information representation scheme for clinical narratives that comprises 13 entities, 11 attributes and 37 relations. The associated annotation guidelines can be used to consistently apply the scheme to clinical narratives and are available at https://cabernet.limsi.fr/annotation_guide_for_the_merlot_french_clinical_corpus-Sept2016.pdf. CONCLUSION: The information scheme includes many elements of the major schemes described in the clinical natural language processing literature, as well as a uniquely detailed set of relations.
Affiliation(s)
- Louise Deléger
  - French National Institute for Agricultural Research (INRA), Domaine de Vilvert, Jouy en Josas, Paris, 78352, France; LIMSI, CNRS, Université Paris-Saclay, Rue John von Neumann, Orsay, 91405, France
- Leonardo Campillos
  - LIMSI, CNRS, Université Paris-Saclay, Rue John von Neumann, Orsay, 91405, France
- Anne-Laure Ligozat
  - LIMSI, CNRS, Université Paris-Saclay, Rue John von Neumann, Orsay, 91405, France; ENSIIE, 1 square de la résistance, Évry Cedex, 91025, France
- Aurélie Névéol
  - LIMSI, CNRS, Université Paris-Saclay, Rue John von Neumann, Orsay, 91405, France
|
14
|
Scuba W, Tharp M, Mowery D, Tseytlin E, Liu Y, Drews FA, Chapman WW. Knowledge Author: facilitating user-driven, domain content development to support clinical information extraction. J Biomed Semantics 2016; 7:42. [PMID: 27338146 PMCID: PMC4919842 DOI: 10.1186/s13326-016-0086-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 02/28/2015] [Accepted: 06/01/2016] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND: Clinical Natural Language Processing (NLP) systems require a semantic schema comprised of domain-specific concepts, their lexical variants, and associated modifiers to accurately extract information from clinical texts. An NLP system leverages this schema to structure concepts and extract meaning from the free texts. In the clinical domain, creating a semantic schema typically requires input from both a domain expert, such as a clinician, and an NLP expert who will represent clinical concepts created from the clinician's domain expertise in a computable format usable by an NLP system. The goal of this work is to develop a web-based tool, Knowledge Author, that bridges the gap between the clinical domain expert and NLP system development by facilitating the development of domain content represented in a semantic schema for extracting information from clinical free-text. RESULTS: Knowledge Author is a web-based recommendation system that supports users in developing domain content necessary for clinical NLP applications. Knowledge Author's schematic model leverages a set of semantic types derived from the Secondary Use Clinical Element Models and the Common Type System to allow the user to quickly create and modify domain-related concepts. Features such as collaborative development and providing domain content suggestions through the mapping of concepts to the Unified Medical Language System Metathesaurus database further support the domain content creation process. Two proof of concept studies were performed to evaluate the system's performance. The first study evaluated Knowledge Author's flexibility to create a broad range of concepts. A dataset of 115 concepts was created, of which 87 (76%) could be created using Knowledge Author. The second study evaluated the effectiveness of Knowledge Author's output in an NLP system by extracting concepts and associated modifiers representing a clinical element, carotid stenosis, from 34 clinical free-text radiology reports using Knowledge Author and an NLP system, pyConText. Knowledge Author's domain content produced high recall for concepts (targeted findings: 86%) and varied recall for modifiers (certainty: 91%, sidedness: 80%, neurovascular anatomy: 46%). CONCLUSION: Knowledge Author can support clinical domain content development for information extraction by supporting semantic schema creation by domain experts.
Affiliation(s)
- William Scuba
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, 84108, USA
- Melissa Tharp
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, 84108, USA
- Danielle Mowery
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, 84108, USA
- Eugene Tseytlin
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, 15206, USA
- Yang Liu
  - University of California, San Diego, CA, 92093, USA
- Frank A Drews
  - Department of Psychology, University of Utah, Salt Lake City, UT, 84108, USA
- Wendy W Chapman
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, 84108, USA
|