Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Yang X, Zhang H, He X, Bian J, Wu Y. Extracting Family History of Patients From Clinical Narratives: Exploring an End-to-End Solution With Deep Learning Models. JMIR Med Inform 2020;8:e22982. [PMID: 33320104 PMCID: PMC7772072 DOI: 10.2196/22982] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 07/28/2020] [Revised: 10/05/2020] [Accepted: 11/20/2020] [Indexed: 12/16/2022]

Cited by Other Article(s)

Kozik R, Mazurczyk W, Cabaj K, Pawlicka A, Pawlicki M, Choraś M. Deep Learning for Combating Misinformation in Multicategorical Text Contents. SENSORS (BASEL, SWITZERLAND) 2023;23:9666. [PMID: 38139513 PMCID: PMC10747375 DOI: 10.3390/s23249666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 11/24/2023] [Accepted: 12/05/2023] [Indexed: 12/24/2023]

Mithun S, Jha AK, Sherkhane UB, Jaiswar V, Purandare NC, Dekker A, Puts S, Bermejo I, Rangarajan V, Zegers CML, Wee L. Clinical Concept-Based Radiology Reports Classification Pipeline for Lung Carcinoma. J Digit Imaging 2023;36:812-826. [PMID: 36788196 PMCID: PMC10287609 DOI: 10.1007/s10278-023-00787-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Revised: 01/23/2023] [Accepted: 01/24/2023] [Indexed: 02/16/2023]

Abstract

Rising incidence and mortality of cancer have led to a growing amount of research in the field. To learn from preexisting data, it has become important to capture maximum information related to disease type, stage, treatment, and outcomes. Medical imaging reports are rich in this kind of information but are only present as free text. The extraction of information from such unstructured text reports is labor-intensive. The use of Natural Language Processing (NLP) tools to extract information from radiology reports can make it less time-consuming as well as more effective. In this study, we have developed and compared different models for the classification of lung carcinoma reports using clinical concepts. This study was approved by the institutional ethics committee as a retrospective study with a waiver of informed consent. A clinical concept-based classification pipeline for lung carcinoma radiology reports was developed using rule-based as well as machine learning models and compared. The machine learning models used were XGBoost and two deep learning model architectures with bidirectional long short-term memory (Bi-LSTM) neural networks. A corpus of 1700 radiology reports, including computed tomography (CT) and positron emission tomography/computed tomography (PET/CT) reports, was used for development and testing. Five hundred one radiology reports from MIMIC-III Clinical Database version 1.4 were used for external validation. The pipeline achieved an overall F1 score of 0.94 on the internal set and 0.74 on external validation, with the rule-based algorithm using expert input giving the best performance. Among the machine learning models, the Bi-LSTM_dropout model performed better than the ML model using XGBoost and the Bi-LSTM_simple model on the internal set, whereas on external validation, the Bi-LSTM_simple model performed relatively better than the other two. This pipeline can be used for clinical concept-based classification of radiology reports related to lung carcinoma from a huge corpus and also for automated annotation of these reports.

Affiliation(s)

Sneha Mithun: Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre+, 6229 ET, Maastricht, The Netherlands; Department of Nuclear Medicine and Molecular Imaging, Tata Memorial Hospital, Mumbai, India; Homi Bhabha National Institute (HBNI), Deemed University, Mumbai, India
Ashish Kumar Jha: Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre+, 6229 ET, Maastricht, The Netherlands; Department of Nuclear Medicine and Molecular Imaging, Tata Memorial Hospital, Mumbai, India; Homi Bhabha National Institute (HBNI), Deemed University, Mumbai, India
Umesh B Sherkhane: Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre+, 6229 ET, Maastricht, The Netherlands; Department of Nuclear Medicine and Molecular Imaging, Tata Memorial Hospital, Mumbai, India
Vinay Jaiswar: Department of Nuclear Medicine and Molecular Imaging, Tata Memorial Hospital, Mumbai, India
Nilendu C Purandare: Department of Nuclear Medicine and Molecular Imaging, Tata Memorial Hospital, Mumbai, India; Homi Bhabha National Institute (HBNI), Deemed University, Mumbai, India
Andre Dekker: Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre+, 6229 ET, Maastricht, The Netherlands
Sander Puts: Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre+, 6229 ET, Maastricht, The Netherlands
Inigo Bermejo: Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre+, 6229 ET, Maastricht, The Netherlands
V Rangarajan: Department of Nuclear Medicine and Molecular Imaging, Tata Memorial Hospital, Mumbai, India; Homi Bhabha National Institute (HBNI), Deemed University, Mumbai, India
Catharina M L Zegers: Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre+, 6229 ET, Maastricht, The Netherlands
Leonard Wee: Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre+, 6229 ET, Maastricht, The Netherlands

Landolsi MY, Hlaoua L, Ben Romdhane L. Information extraction from electronic medical documents: state of the art and future research directions. Knowl Inf Syst 2023;65:463-516. [PMID: 36405956 PMCID: PMC9640816 DOI: 10.1007/s10115-022-01779-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2022] [Revised: 05/04/2022] [Accepted: 10/17/2022] [Indexed: 11/10/2022]

Yang X, Chen A, PourNejatian N, Shin HC, Smith KE, Parisien C, Compas C, Martin C, Costa AB, Flores MG, Zhang Y, Magoc T, Harle CA, Lipori G, Mitchell DA, Hogan WR, Shenkman EA, Bian J, Wu Y. A large language model for electronic health records. NPJ Digit Med 2022;5:194. [PMID: 36572766 PMCID: PMC9792464 DOI: 10.1038/s41746-022-00742-2] [Citation(s) in RCA: 100] [Impact Index Per Article: 50.0] [Received: 06/21/2022] [Accepted: 12/13/2022] [Indexed: 12/27/2022]

Affiliation(s)

Xi Yang: Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics and eHealth core, University of Florida Health Cancer Center, Gainesville, FL, USA
Aokun Chen: Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics and eHealth core, University of Florida Health Cancer Center, Gainesville, FL, USA
Nima PourNejatian: NVIDIA, Santa Clara, CA, USA
Hoo Chang Shin: NVIDIA, Santa Clara, CA, USA
Kaleb E Smith: NVIDIA, Santa Clara, CA, USA
Christopher Parisien: NVIDIA, Santa Clara, CA, USA
Colin Compas: NVIDIA, Santa Clara, CA, USA
Cheryl Martin: NVIDIA, Santa Clara, CA, USA
Anthony B Costa: NVIDIA, Santa Clara, CA, USA
Mona G Flores: NVIDIA, Santa Clara, CA, USA
Ying Zhang: Research Computing, University of Florida, Gainesville, FL, USA
Tanja Magoc: Integrated Data Repository Research Services, University of Florida, Gainesville, FL, USA
Christopher A Harle: Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Integrated Data Repository Research Services, University of Florida, Gainesville, FL, USA
Gloria Lipori: Integrated Data Repository Research Services, University of Florida, Gainesville, FL, USA; Lillian S. Wells Department of Neurosurgery, UF Clinical and Translational Science Institute, University of Florida, Gainesville, FL, USA
Duane A Mitchell: Lillian S. Wells Department of Neurosurgery, UF Clinical and Translational Science Institute, University of Florida, Gainesville, FL, USA
William R Hogan: Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
Elizabeth A Shenkman: Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
Jiang Bian: Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics and eHealth core, University of Florida Health Cancer Center, Gainesville, FL, USA
Yonghui Wu: Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics and eHealth core, University of Florida Health Cancer Center, Gainesville, FL, USA

Yu Z, Yang X, Sweeting GL, Ma Y, Stolte SE, Fang R, Wu Y. Identify diabetic retinopathy-related clinical concepts and their attributes using transformer-based natural language processing methods. BMC Med Inform Decis Mak 2022;22:255. [PMID: 36167551 PMCID: PMC9513862 DOI: 10.1186/s12911-022-01996-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Accepted: 09/14/2022] [Indexed: 11/16/2022]

Abstract

BACKGROUND

Diabetic retinopathy (DR) is a leading cause of blindness in American adults. If detected, DR can be treated to prevent further damage causing blindness. There is an increasing interest in developing artificial intelligence (AI) technologies to help detect DR using electronic health records. The lesion-related information documented in fundus image reports is a valuable resource that could help diagnose DR in clinical decision support systems. However, most studies of AI-based DR diagnosis are based on medical images; few studies have explored the lesion-related information captured in free-text image reports.

METHODS

In this study, we examined two state-of-the-art transformer-based natural language processing (NLP) models, BERT and RoBERTa, and compared them with a recurrent neural network implemented using long short-term memory (LSTM) to extract DR-related concepts from clinical narratives. We identified four different categories of DR-related clinical concepts (lesions, eye parts, laterality, and severity), developed annotation guidelines, annotated a DR corpus of 536 image reports, and developed transformer-based NLP models for clinical concept extraction and relation extraction. We also examined relation extraction under two settings: a 'gold-standard' setting, in which gold-standard concepts were used, and an end-to-end setting.

RESULTS

For concept extraction, the BERT model pretrained with the MIMIC III dataset achieved the best performance (0.9503 and 0.9645 for strict/lenient evaluation). For relation extraction, the BERT model pretrained using general English text achieved the best strict/lenient F1-score of 0.9316. The end-to-end system, BERT_general_e2e, achieved the best strict/lenient F1-scores of 0.8578 and 0.8881, respectively. Another end-to-end system based on the RoBERTa architecture, RoBERTa_general_e2e, achieved the same performance as BERT_general_e2e in strict scores.

CONCLUSIONS

This study demonstrated the efficiency of transformer-based NLP models for clinical concept extraction and relation extraction. Our results show that it is necessary to pretrain transformer models using clinical text to optimize the performance for clinical concept extraction. For relation extraction, however, transformers pretrained using general English text perform better.

Shi J, Morgan KL, Bradshaw RL, Jung SH, Kohlmann W, Kaphingst KA, Kawamoto K, Fiol GD. Identifying Patients Who Meet Criteria for Genetic Testing of Hereditary Cancers Based on Structured and Unstructured Family Health History Data in the Electronic Health Record: Natural Language Processing Approach. JMIR Med Inform 2022;10:e37842. [PMID: 35969459 PMCID: PMC9412758 DOI: 10.2196/37842] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/09/2022] [Revised: 06/29/2022] [Accepted: 07/06/2022] [Indexed: 11/13/2022]

Abstract

BACKGROUND

Family health history has been recognized as an essential factor for cancer risk assessment and is an integral part of many cancer screening guidelines, including genetic testing for personalized clinical management strategies. However, manually identifying eligible candidates for genetic testing is labor intensive.

OBJECTIVE

The aim of this study was to develop a natural language processing (NLP) pipeline and assess its contribution to identifying patients who meet genetic testing criteria for hereditary cancers based on family health history data in the electronic health record (EHR). We compared an algorithm that uses structured data alone with structured data augmented using NLP.

METHODS

Algorithms were developed based on the National Comprehensive Cancer Network (NCCN) guidelines for genetic testing for hereditary breast, ovarian, pancreatic, and colorectal cancers. The NLP-augmented algorithm uses both structured family health history data and the associated unstructured free-text comments. The algorithms were compared with a reference standard of 100 patients with a family health history in the EHR.

RESULTS

Regarding identifying the reference standard patients meeting the NCCN criteria, the NLP-augmented algorithm compared with the structured data algorithm yielded a significantly higher recall of 0.95 (95% CI 0.9-0.99) versus 0.29 (95% CI 0.19-0.40) and a precision of 0.99 (95% CI 0.96-1.00) versus 0.81 (95% CI 0.65-0.95). On the whole data set, the NLP-augmented algorithm extracted 33.6% more entities, resulting in 53.8% more patients meeting the NCCN criteria.

CONCLUSIONS

Compared with the structured data algorithm, the NLP-augmented algorithm based on both structured and unstructured family health history data in the EHR increased the number of patients identified as meeting the NCCN criteria for genetic testing for hereditary breast or ovarian and colorectal cancers.

Panaite V, Devendorf AR, Finch D, Bouayad L, Luther SL, Schultz SK. The Value of Extracting Clinician-Recorded Affect for Advancing Clinical Research on Depression: Proof-of-Concept Study Applying Natural Language Processing to Electronic Health Records. JMIR Form Res 2022;6:e34436. [PMID: 35551066 PMCID: PMC9136653 DOI: 10.2196/34436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2021] [Revised: 01/14/2022] [Accepted: 01/26/2022] [Indexed: 11/18/2022]

Chiavi D, Haag C, Chan A, Kamm CP, Sieber C, Stanikić M, Rodgers S, Pot C, Kesselring J, Salmen A, Rapold I, Calabrese P, Manjaly ZM, Gobbi C, Zecca C, Walther S, Stegmayer K, Hoepner R, Puhan M, von Wyl V. Studying Real-World Experiences of Persons with Multiple Sclerosis during the first Covid-19 Lockdown: An Application of Natural Language Processing (Preprint). JMIR Med Inform 2022;10:e37945. [DOI: 10.2196/37945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2022] [Accepted: 10/02/2022] [Indexed: 11/06/2022]

Dedhia PH, Chen K, Song Y, LaRose E, Imbus JR, Peissig PL, Mendonca EA, Schneider DF. Ambiguous and Incomplete: Natural Language Processing Reveals Problematic Reporting Styles in Thyroid Ultrasound Reports. Methods Inf Med 2022;61:11-18. [PMID: 34991173 DOI: 10.1055/s-0041-1740493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]

Abstract

OBJECTIVE

Natural language processing (NLP) systems convert unstructured text into analyzable data. Here, we describe the performance measures of NLP to capture granular details on nodules from thyroid ultrasound (US) reports and reveal critical issues with reporting language.

METHODS

We iteratively developed NLP tools using clinical Text Analysis and Knowledge Extraction System (cTAKES) and thyroid US reports from 2007 to 2013. We incorporated nine nodule features for NLP extraction. Next, we evaluated the precision, recall, and accuracy of our NLP tools using a separate set of US reports from an academic medical center (A) and a regional health care system (B) during the same period. Two physicians manually annotated each test-set report. A third physician then adjudicated discrepancies. The adjudicated "gold standard" was then used to evaluate NLP performance on the test-set.

RESULTS

A total of 243 thyroid US reports contained 6,405 data elements. Inter-annotator agreement for all elements was 91.3%. Compared with the gold standard, overall recall of the NLP tool was 90%. NLP recall for thyroid lobe or isthmus characteristics was: laterality 96% and size 95%. NLP accuracy for nodule characteristics was: laterality 92%, size 92%, calcifications 76%, vascularity 65%, echogenicity 62%, contents 76%, and borders 40%. NLP recall for presence or absence of lymphadenopathy was 61%. Reporting style accounted for 18% of errors. For example, the word "heterogeneous" interchangeably referred to nodule contents or echogenicity. While nodule dimensions and laterality were often described, US reports described contents, echogenicity, vascularity, calcifications, borders, and lymphadenopathy only 46, 41, 17, 15, 9, and 41% of the time, respectively. Most nodule characteristics were equally likely to be described at hospital A compared with hospital B.

CONCLUSIONS

NLP can automate extraction of critical information from thyroid US reports. However, ambiguous and incomplete reporting language hinders performance of NLP systems regardless of institutional setting. Standardized or synoptic thyroid US reports could improve NLP performance.

A Survey on Recent Named Entity Recognition and Relationship Extraction Techniques on Clinical Texts. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11188319] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 12/31/2022]

Yu Z, Yang X, Dang C, Wu S, Adekkanattu P, Pathak J, George TJ, Hogan WR, Guo Y, Bian J, Wu Y. A Study of Social and Behavioral Determinants of Health in Lung Cancer Patients Using Transformers-based Natural Language Processing Models. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2021;2021:1225-1233. [PMID: 35309014 PMCID: PMC8861705] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/14/2023]

Abstract

Social and behavioral determinants of health (SBDoH) have important roles in shaping people's health. In clinical research studies, especially comparative effectiveness studies, failure to adjust for SBDoH factors can cause confounding issues and misclassification errors in both statistical analyses and machine learning-based models. However, few studies have examined SBDoH factors in clinical outcomes because current electronic health record (EHR) systems lack structured SBDoH information, while much of the SBDoH information is documented in clinical narratives. Natural language processing (NLP) is thus the key technology to extract such information from unstructured clinical text. However, there is not a mature clinical NLP system focusing on SBDoH. In this study, we examined two state-of-the-art transformer-based NLP models, BERT and RoBERTa, to extract SBDoH concepts from clinical narratives, applied the best performing model to extract SBDoH concepts on a lung cancer screening patient cohort, and examined the difference in SBDoH information between NLP-extracted results and structured EHRs (SBDoH information captured in standard vocabularies such as the International Classification of Diseases codes). The experimental results show that the BERT-based NLP model achieved the best strict/lenient F1-scores of 0.8791 and 0.8999, respectively. The comparison between NLP-extracted SBDoH information and structured EHRs in the lung cancer patient cohort of 864 patients with 161,933 clinical notes of various types showed that much more detailed information about smoking, education, and employment was captured only in clinical narratives and that it is necessary to use both clinical narratives and structured EHRs to construct a more complete picture of patients' SBDoH factors.
