1
Albashayreh A, Bandyopadhyay A, Zeinali N, Zhang M, Fan W, Gilbertson White S. Natural Language Processing Accurately Differentiates Cancer Symptom Information in Electronic Health Record Narratives. JCO Clin Cancer Inform 2024; 8:e2300235. [PMID: 39116379 DOI: 10.1200/cci.23.00235]
Abstract
PURPOSE Identifying cancer symptoms in electronic health record (EHR) narratives is feasible with natural language processing (NLP). However, more efficient NLP systems are needed to detect various symptoms and distinguish observed symptoms from negated symptoms and medication-related side effects. We evaluated the accuracy of NLP in (1) detecting 14 symptom groups (ie, pain, fatigue, swelling, depressed mood, anxiety, nausea/vomiting, pruritus, headache, shortness of breath, constipation, numbness/tingling, decreased appetite, impaired memory, disturbed sleep) and (2) distinguishing observed symptoms in EHR narratives among patients with cancer. METHODS We extracted 902,508 notes for 11,784 unique patients diagnosed with cancer and developed a gold standard corpus of 1,112 notes labeled for presence or absence of 14 symptom groups. We trained an embeddings-augmented NLP system integrating human and machine intelligence and conventional machine learning algorithms. NLP metrics were calculated on a gold standard corpus subset for testing. RESULTS The interannotator agreement for labeling the gold standard corpus was excellent at 92%. The embeddings-augmented NLP model achieved the best performance (F1 score = 0.877). The highest NLP accuracy was observed in pruritus (F1 score = 0.937) while the lowest accuracy was in swelling (F1 score = 0.787). After classifying the entire data set with embeddings-augmented NLP, we found that 41% of the notes included symptom documentation. Pain was the most documented symptom (29% of all notes) while impaired memory was the least documented (0.7% of all notes). CONCLUSION We illustrated the feasibility of detecting 14 symptom groups in EHR narratives and showed that an embeddings-augmented NLP system outperforms conventional machine learning algorithms in detecting symptom information and differentiating observed symptoms from negated symptoms and medication-related side effects.
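Distinguishing observed symptoms from negated mentions, as the abstract above describes, is commonly handled with trigger-term rules applied before or alongside a learned model. A minimal NegEx-style sketch in Python (the trigger list and four-token lookback window are illustrative assumptions, not the authors' system):

```python
import re

# Illustrative negation trigger terms (a real lexicon is much larger).
NEGATION_TRIGGERS = {"no", "denies", "without", "not"}

def symptom_status(sentence: str, symptom: str) -> str:
    """Label a symptom mention as 'negated' or 'observed' depending on
    whether a negation trigger appears shortly before it; 'absent' if
    the symptom term does not occur at all."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    if symptom not in tokens:
        return "absent"
    idx = tokens.index(symptom)
    window = tokens[max(0, idx - 4):idx]  # look back up to 4 tokens
    return "negated" if any(t in NEGATION_TRIGGERS for t in window) else "observed"
```

For example, `symptom_status("Patient denies nausea and vomiting.", "nausea")` returns `"negated"`, while the same symptom in "reports nausea overnight" would be `"observed"`.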
Affiliation(s)
- Min Zhang
- School of Economics and Management, Communication University of China, Beijing, China
- Weiguo Fan
- Tippie College of Business, University of Iowa, Iowa City, IA
2
Petmezas G, Papageorgiou VE, Vassilikos V, Pagourelias E, Tsaklidis G, Katsaggelos AK, Maglaveras N. Recent advancements and applications of deep learning in heart failure: A systematic review. Comput Biol Med 2024; 176:108557. [PMID: 38728995 DOI: 10.1016/j.compbiomed.2024.108557]
Abstract
BACKGROUND Heart failure (HF), a global health challenge, requires innovative diagnostic and management approaches. The rapid evolution of deep learning (DL) in healthcare necessitates a comprehensive review to evaluate these developments and their potential to enhance HF evaluation, aligning clinical practices with technological advancements. OBJECTIVE This review aims to systematically explore the contributions of DL technologies in the assessment of HF, focusing on their potential to improve diagnostic accuracy, personalize treatment strategies, and address the impact of comorbidities. METHODS A thorough literature search was conducted across four major electronic databases: PubMed, Scopus, Web of Science and IEEE Xplore, yielding 137 articles that were subsequently categorized into five primary application areas: cardiovascular disease (CVD) classification, HF detection, image analysis, risk assessment, and other clinical analyses. The selection criteria focused on studies utilizing DL algorithms for HF assessment, not limited to HF detection but extending to any attempt in analyzing and interpreting HF-related data. RESULTS The analysis revealed a notable emphasis on CVD classification and HF detection, with DL algorithms showing significant promise in distinguishing between affected individuals and healthy subjects. Furthermore, the review highlights DL's capacity to identify underlying cardiomyopathies and other comorbidities, underscoring its utility in refining diagnostic processes and tailoring treatment plans to individual patient needs. CONCLUSIONS This review establishes DL as a key innovation in HF management, highlighting its role in advancing diagnostic accuracy and personalized care. The insights provided advocate for the integration of DL in clinical settings and suggest directions for future research to enhance patient outcomes in HF care.
Affiliation(s)
- Georgios Petmezas
- 2nd Department of Obstetrics and Gynecology, Medical School, Aristotle University of Thessaloniki, Thessaloniki, Greece; Centre for Research and Technology Hellas, Thessaloniki, Greece
- Vasileios Vassilikos
- 3rd Department of Cardiology, Medical School, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Efstathios Pagourelias
- 3rd Department of Cardiology, Medical School, Aristotle University of Thessaloniki, Thessaloniki, Greece
- George Tsaklidis
- Department of Mathematics, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Aggelos K Katsaggelos
- Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL, USA
- Nicos Maglaveras
- 2nd Department of Obstetrics and Gynecology, Medical School, Aristotle University of Thessaloniki, Thessaloniki, Greece
3
Zhang H, Jethani N, Jones S, Genes N, Major VJ, Jaffe IS, Cardillo AB, Heilenbach N, Ali NF, Bonanni LJ, Clayburn AJ, Khera Z, Sadler EC, Prasad J, Schlacter J, Liu K, Silva B, Montgomery S, Kim EJ, Lester J, Hill TM, Avoricani A, Chervonski E, Davydov J, Small W, Chakravartty E, Grover H, Dodson JA, Brody AA, Aphinyanaphongs Y, Masurkar A, Razavian N. Evaluating Large Language Models in Extracting Cognitive Exam Dates and Scores. medRxiv 2024:2023.07.10.23292373. [PMID: 38405784 PMCID: PMC10888985 DOI: 10.1101/2023.07.10.23292373]
Abstract
Importance Large language models (LLMs) are increasingly applied to medical tasks, and ensuring their reliability is vital to avoid false results. Our study assesses two state-of-the-art LLMs (ChatGPT and LlaMA-2) for extracting clinical information, focusing on cognitive tests such as the MMSE and CDR. Objective To evaluate ChatGPT and LlaMA-2 performance in extracting MMSE and CDR scores, including their associated dates. Methods Our data consisted of 135,307 clinical notes (January 12, 2010 to May 24, 2023) mentioning MMSE, CDR, or MoCA. After applying inclusion criteria, 34,465 notes remained, of which 765 were processed with both ChatGPT (GPT-4) and LlaMA-2, and 22 experts reviewed the responses. ChatGPT successfully extracted MMSE and CDR instances with dates from 742 notes. We used 20 notes for fine-tuning and training the reviewers. The remaining 722 were assigned to reviewers, 309 of which were assigned to two reviewers simultaneously. Inter-rater agreement (Fleiss' kappa), precision, recall, true/false negative rates, and accuracy were calculated. Our study follows TRIPOD reporting guidelines for model validation. Results For MMSE information extraction, ChatGPT (vs. LlaMA-2) achieved accuracy of 83% (vs. 66.4%), sensitivity of 89.7% (vs. 69.9%), a true-negative rate of 96% (vs. 60.0%), and precision of 82.7% (vs. 62.2%). For CDR the results were lower overall, with accuracy of 87.1% (vs. 74.5%), sensitivity of 84.3% (vs. 39.7%), a true-negative rate of 99.8% (vs. 98.4%), and precision of 48.3% (vs. 16.1%). We qualitatively evaluated the MMSE errors of ChatGPT and LlaMA-2 on double-reviewed notes. LlaMA-2's errors included 27 cases of total hallucination, 19 cases of reporting other scores instead of MMSE, 25 missed scores, and 23 cases of reporting only the wrong date. In comparison, ChatGPT's errors included only 3 cases of total hallucination, 17 cases of reporting the wrong test instead of MMSE, and 19 cases of reporting a wrong date.
Conclusions In this diagnostic/prognostic study of ChatGPT and LlaMA-2 for extracting cognitive exam dates and scores from clinical notes, ChatGPT exhibited high accuracy and outperformed LlaMA-2. LLMs could benefit dementia research and clinical care by identifying patients eligible for treatment initiation or clinical trial enrollment. Rigorous evaluation of LLMs is crucial to understanding their capabilities and limitations.
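The accuracy, sensitivity, true-negative rate, and precision figures reported above all derive from the same four confusion-matrix counts. A small sketch of the standard definitions (the counts in the usage example are invented for illustration, not taken from the study):

```python
def extraction_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics used to score an
    information-extraction system against expert review."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),         # recall / true-positive rate
        "true_negative_rate": tn / (tn + fp),  # specificity
        "precision": tp / (tp + fp),           # positive predictive value
    }
```

With hypothetical counts `extraction_metrics(tp=80, fp=20, tn=90, fn=10)`, precision is 0.80 and accuracy 0.85, which shows why precision can diverge sharply from accuracy when, as for CDR above, true positives are rare relative to true negatives.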
Affiliation(s)
- Abraham A Brody
- NYU Rory Meyers College of Nursing, NYU Grossman School of Medicine
4
Hossain E, Rana R, Higgins N, Soar J, Barua PD, Pisani AR, Turner K. Natural Language Processing in Electronic Health Records in relation to healthcare decision-making: A systematic review. Comput Biol Med 2023; 155:106649. [PMID: 36805219 DOI: 10.1016/j.compbiomed.2023.106649]
Abstract
BACKGROUND Natural Language Processing (NLP) is widely used to extract clinical insights from Electronic Health Records (EHRs). However, the lack of annotated data, automated tools, and other challenges hinder the full utilisation of NLP for EHRs. Various Machine Learning (ML), Deep Learning (DL) and NLP techniques are studied and compared to understand the limitations and opportunities in this space comprehensively. METHODOLOGY After screening 261 articles from 11 databases, we included 127 papers for full-text review covering seven categories of articles: (1) medical note classification, (2) clinical entity recognition, (3) text summarisation, (4) deep learning (DL) and transfer learning architecture, (5) information extraction, (6) medical language translation and (7) other NLP applications. This study follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. RESULT AND DISCUSSION EHR was the most commonly used data type among the selected articles, and the datasets were primarily unstructured. Various ML and DL methods were used, with prediction or classification being the most common application of ML or DL. The most common use cases were International Classification of Diseases, Ninth Revision (ICD-9) classification, clinical note analysis, and named entity recognition (NER) for clinical descriptions and research on psychiatric disorders. CONCLUSION We find that the adopted ML models were not adequately assessed. In addition, class imbalance is an important yet under-addressed problem in this space. Future studies should address key limitations, particularly in identifying lupus nephritis, suicide attempts, perinatal self-harm, and ICD-9 classification.
Affiliation(s)
- Elias Hossain
- School of Engineering & Physical Sciences, North South University, Dhaka 1229, Bangladesh
- Rajib Rana
- School of Mathematics, Physics and Computing, University of Southern Queensland, Springfield Central QLD 4300, Australia
- Niall Higgins
- School of Management and Enterprise, University of Southern Queensland, Darling Heights QLD 4350, Australia; School of Nursing, Queensland University of Technology, Kelvin Grove, Brisbane, QLD 4000, Australia; Metro North Mental Health, Herston QLD 4029, Australia
- Jeffrey Soar
- School of Business, University of Southern Queensland, Springfield Central QLD 4300, Australia
- Prabal Datta Barua
- School of Business, University of Southern Queensland, Springfield Central QLD 4300, Australia
- Anthony R Pisani
- Center for the Study and Prevention of Suicide, University of Rochester, Rochester, NY, United States
- Kathryn Turner
- School of Nursing, Queensland University of Technology, Kelvin Grove, Brisbane, QLD 4000, Australia
5
Masukawa K, Aoyama M, Yokota S, Nakamura J, Ishida R, Nakayama M, Miyashita M. Machine learning models to detect social distress, spiritual pain, and severe physical psychological symptoms in terminally ill patients with cancer from unstructured text data in electronic medical records. Palliat Med 2022; 36:1207-1216. [PMID: 35773973 DOI: 10.1177/02692163221105595]
Abstract
BACKGROUND Few studies have developed automatic systems for identifying social distress, spiritual pain, and severe physical and psychological symptoms from text data in electronic medical records. AIM To develop models to detect social distress, spiritual pain, and severe physical and psychological symptoms in terminally ill patients with cancer from unstructured text data contained in electronic medical records. DESIGN A retrospective study analyzed 1,554,736 narrative clinical records from the month before patients died. Supervised machine learning models were trained to detect comprehensive symptoms, and model performance was tested using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). SETTING/PARTICIPANTS A total of 808 patients were included, using records obtained from a university hospital in Japan between January 1, 2018 and December 31, 2019. As training data, we used medical records labeled for detecting social distress (n = 10,000) and spiritual pain (n = 10,000), and records that could be matched by date with the Support Team Assessment Schedule for detecting severe physical/psychological symptoms (n = 5,409). RESULTS Machine learning models for detecting social distress had AUROC and AUPRC values of 0.98 and 0.61, respectively; values for spiritual pain were 0.90 and 0.58, respectively. The machine learning models accurately identified severe symptoms (pain, dyspnea, nausea, insomnia, and anxiety) with a high level of discrimination (AUROC > 0.8). CONCLUSION The machine learning models could detect social distress, spiritual pain, and severe symptoms in terminally ill patients with cancer from text data contained in electronic medical records.
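The AUROC reported above can be read as the probability that a randomly chosen positive record is scored higher than a randomly chosen negative one. A dependency-free sketch of that rank (Mann-Whitney) formulation, with invented toy scores:

```python
def auroc(scores, labels):
    """AUROC via the rank formulation: the fraction of (positive, negative)
    pairs in which the positive record receives the higher score, with
    ties counting as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A perfectly ranked toy set, `auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])`, gives 1.0, while random-looking scores hover near 0.5; values above 0.8, as for the severe-symptom models here, indicate strong discrimination.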
Affiliation(s)
- Kento Masukawa
- Department of Palliative Nursing, Health Sciences, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
- Maho Aoyama
- Department of Palliative Nursing, Health Sciences, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
- Shinichiroh Yokota
- Faculty of Medicine, The University of Tokyo, Hongo, Tokyo, Japan; Department of Healthcare Information Management, The University of Tokyo Hospital, Hongo, Tokyo, Japan
- Jyunya Nakamura
- Department of Palliative Nursing, Health Sciences, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
- Ryoka Ishida
- Department of Palliative Nursing, Health Sciences, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
- Masaharu Nakayama
- Department of Medical Informatics, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
- Mitsunori Miyashita
- Department of Palliative Nursing, Health Sciences, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
6
Chaichulee S, Promchai C, Kaewkomon T, Kongkamol C, Ingviya T, Sangsupawanich P. Multi-label classification of symptom terms from free-text bilingual adverse drug reaction reports using natural language processing. PLoS One 2022; 17:e0270595. [PMID: 35925971 PMCID: PMC9352066 DOI: 10.1371/journal.pone.0270595]
Abstract
Allergic reactions to medication range from mild to severe or even life-threatening. Proper documentation of patient allergy information is critical for safe prescription, avoiding drug interactions, and reducing healthcare costs. Allergy information is regularly obtained during the medical interview, but is often poorly documented in electronic health records (EHRs). While many EHRs allow for structured adverse drug reaction (ADR) reporting, free-text entry is still common. The resulting information is neither interoperable nor easily reusable for other applications, such as clinical decision support systems and prescription alerts. Current approaches require pharmacists to review and code ADRs documented by healthcare professionals. Recently, the effectiveness of machine learning algorithms in natural language processing (NLP) has been widely demonstrated. Our study aims to develop and evaluate different NLP algorithms that can encode unstructured ADRs stored in EHRs into institutional symptom terms. Our dataset consists of 79,712 pharmacist-reviewed drug allergy records. We evaluated three NLP techniques: Naive Bayes-Support Vector Machine (NB-SVM), Universal Language Model Fine-tuning (ULMFiT), and Bidirectional Encoder Representations from Transformers (BERT). We tested different general-domain pre-trained BERT models, including mBERT, XLM-RoBERTa, and WanchanBERTa, as well as our domain-specific AllergyRoBERTa, which was pre-trained from scratch on our corpus. Overall, BERT models had the highest performance, although NB-SVM outperformed ULMFiT and BERT for several symptom terms that are not frequently coded. The ensemble model achieved an exact match ratio of 95.33%, an F1 score of 98.88%, and a mean average precision of 97.07% for the 36 most frequently coded symptom terms. The model was then further developed into a symptom term suggestion system and achieved a Krippendorff's alpha agreement coefficient of 0.7081 in prospective testing with pharmacists. Some degree of automation could both accelerate the availability of allergy information and reduce the effort required for human coding.
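The exact match ratio quoted for the ensemble is the strictest multi-label metric: a record counts as correct only if its entire predicted set of symptom terms equals the gold set. A brief sketch (the label sets in the usage example are hypothetical):

```python
def exact_match_ratio(pred_sets, gold_sets):
    """Fraction of records whose predicted label set equals the gold
    label set exactly; any missing or extra symptom term makes the
    whole record count as a miss."""
    matches = sum(p == g for p, g in zip(pred_sets, gold_sets))
    return matches / len(gold_sets)
```

For instance, predicting `{"rash", "itching"}` where the gold set is `{"rash"}` scores zero for that record, even though per-label F1 would still credit the correct "rash" label; this is why exact match (95.33%) sits below the F1 score (98.88%) reported above.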
Affiliation(s)
- Sitthichok Chaichulee
- Department of Biomedical Sciences and Biomedical Engineering, Faculty of Medicine, Prince of Songkla University, Hatyai, Songkhla, Thailand
- Division of Digital Innovation and Data Analytics, Faculty of Medicine, Prince of Songkla University, Hatyai, Songkhla, Thailand
- Chissanupong Promchai
- Department of Pharmacy, Songklanagarind Hospital, Faculty of Medicine, Prince of Songkla University, Hatyai, Songkhla, Thailand
- Tanyamai Kaewkomon
- Department of Pharmacy, Songklanagarind Hospital, Faculty of Medicine, Prince of Songkla University, Hatyai, Songkhla, Thailand
- Chanon Kongkamol
- Department of Family and Preventive Medicine, Faculty of Medicine, Prince of Songkla University, Hatyai, Songkhla, Thailand
- Division of Digital Innovation and Data Analytics, Faculty of Medicine, Prince of Songkla University, Hatyai, Songkhla, Thailand
- Thammasin Ingviya
- Department of Family and Preventive Medicine, Faculty of Medicine, Prince of Songkla University, Hatyai, Songkhla, Thailand
- Division of Digital Innovation and Data Analytics, Faculty of Medicine, Prince of Songkla University, Hatyai, Songkhla, Thailand
- Pasuree Sangsupawanich
- Department of Pediatrics, Faculty of Medicine, Prince of Songkla University, Hatyai, Songkhla, Thailand
7
Faris H, Faris M, Habib M, Alomari A. Automatic symptoms identification from a massive volume of unstructured medical consultations using deep neural and BERT models. Heliyon 2022; 8:e09683. [PMID: 35761935 PMCID: PMC9233221 DOI: 10.1016/j.heliyon.2022.e09683]
Abstract
Automatic symptom identification plays a crucial role in assisting doctors during the diagnosis process in telemedicine. In general, physicians spend considerable time on clinical documentation and symptom identification, which is difficult to sustain given their full schedules. With text-based consultation services in telemedicine, identifying symptoms from a user's consultation is a sophisticated and time-consuming process. Moreover, at Altibbi, an Arabic telemedicine platform and the context of this work, users consult doctors and describe their conditions in different Arabic dialects, which makes the problem more complex and challenging. Therefore, in this work, an advanced deep learning approach is developed for consultations written in multiple Arabic dialects. The approach is formulated as a multi-label, multi-class classification task using features extracted with AraBERT and fine-tuned on a bidirectional long short-term memory (BiLSTM) network. The fine-tuning of the BiLSTM relies on features engineered from different variants of the bidirectional encoder representations from transformers (BERT). Evaluating the models based on precision, recall, and a customized hit rate showed successful identification of symptoms from Arabic texts with promising accuracy. This paves the way toward deploying an automated symptom identification model in production at Altibbi, helping general practitioners in telemedicine provide more efficient and accurate consultations.
Affiliation(s)
- Hossam Faris
- King Abdullah II School for Information Technology, The University of Jordan, 11942, Jordan; Research Centre for Information and Communications Technologies of the University of Granada (CITIC-UGR), University of Granada, Granada, Spain; Altibbi (https://altibbi.com), Amman, Jordan
- Alaa Alomari
- Altibbi (https://altibbi.com), Amman, Jordan; School of Informatics and Telecommunications Engineering, University of Granada, Granada, Spain
8
DiMartino L, Miano T, Wessell K, Bohac B, Hanson LC. Identification of Uncontrolled Symptoms in Cancer Patients Using Natural Language Processing. J Pain Symptom Manage 2022; 63:610-617. [PMID: 34743011 PMCID: PMC8930509 DOI: 10.1016/j.jpainsymman.2021.10.014]
Abstract
CONTEXT For patients with cancer, uncontrolled pain and other symptoms are the leading cause of unplanned hospitalizations. Early access to specialty palliative care (PC) is effective to reduce symptom burden, but more efficient approaches are needed for rapid identification and referral. Information on symptom burden largely exists in free-text notes, limiting its utility as a trigger for best practice alerts or automated referrals. OBJECTIVES To evaluate whether natural language processing (NLP) can be used to identify uncontrolled symptoms (pain, dyspnea, or nausea/vomiting) in the electronic health record (EHR) among hospitalized cancer patients with advanced disease. METHODS The dataset included 1,644 hospitalization encounters for cancer patients admitted from 1/2017 to 6/2019. We randomly sampled 296 encounters, which included 15,580 clinical notes. We manually reviewed the notes and recorded symptom severity. The primary endpoint was an indicator for whether a symptom was labeled as "controlled" (none, mild, not reported) or as "uncontrolled" (moderate or severe). We randomly split the data into training and test sets and used the Random Forest algorithm to evaluate final model performance. RESULTS Our models predicted presence of an uncontrolled symptom with the following performance: pain with 61% accuracy, 69% sensitivity, and 46% specificity (F1: 69.5); nausea/vomiting with 68% accuracy, 21% sensitivity, and 90% specificity (F1: 29.4); and dyspnea with 80% accuracy, 22% sensitivity, and 88% specificity (F1: 21.1). CONCLUSION This study demonstrated initial feasibility of using NLP to identify hospitalized cancer patients with uncontrolled symptoms. Further model development is needed before these algorithms could be implemented to trigger early access to PC.
Affiliation(s)
- Lisa DiMartino
- RTI International, Translational Health Sciences Division (L.D.), Research Triangle Park, NC, USA
- Thomas Miano
- RTI International, Center for Data Science (T.M.), Research Triangle Park, NC, USA
- Kathryn Wessell
- Cecil G. Sheps Center for Health Services Research, University of North Carolina at Chapel Hill (K.W., L.C.H.), Chapel Hill, NC, USA
- Buck Bohac
- North Carolina Translational and Clinical Sciences Institute, University of North Carolina at Chapel Hill (B.B.), Chapel Hill, NC, USA
- Laura C Hanson
- Cecil G. Sheps Center for Health Services Research, University of North Carolina at Chapel Hill (K.W., L.C.H.), Chapel Hill, NC, USA; Division of Geriatric Medicine, University of North Carolina at Chapel Hill (L.C.H.), Chapel Hill, NC, USA
9
Agaronnik ND, Kwok A, Schoenfeld AJ, Lindvall C. Natural language processing for automated surveillance of intraoperative neuromonitoring in spine surgery. J Clin Neurosci 2022; 97:121-126. [PMID: 35093791 DOI: 10.1016/j.jocn.2022.01.015]
Abstract
We sought to develop natural language processing (NLP) methods for automated detection and characterization of neuromonitoring documentation from free-text operative reports in patients undergoing spine surgery. We included 13,718 patients who received spine surgery at two tertiary academic medical centers between December 2000 and December 2020. We first validated a rule-based NLP method for identifying operative reports containing neuromonitoring documentation, comparing performance to standard administrative codes. We then trained a deep learning model in a subset of 993 patients to characterize neuromonitoring documentation and identify events indicating change in status or difficulty establishing baseline signals. Performance of the deep learning model was compared to gold-standard manual chart review. In our patient population, 3,606 (26.3%) patients had neuromonitoring documentation identified using NLP. Our NLP method identified notes containing neuromonitoring documentation with an F1-score of 1.0, surpassing performance of standard administrative codes, which had an F1-score of 0.64. In the subset of 993 patients used for training, validation, and testing a deep learning model, the prevalence of change in status was 6.5% and difficulty establishing neuromonitoring baseline signals was 6.6%. The deep learning model had an F1-score of 0.80 and AUC-ROC of 1.0 for identifying change in status, and an F1-score of 0.80 and AUC-ROC of 0.97 for identifying difficulty establishing baseline signals. Compared to gold standard manual chart review, our methodology has greater efficiency for identifying infrequent yet important types of neuromonitoring documentation. This method may facilitate large-scale quality improvement initiatives that require timely analysis of a large volume of EHRs.
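Rule-based identification of notes containing neuromonitoring documentation, of the kind described above, typically reduces to a keyword lexicon applied to the report text. An illustrative sketch (the pattern terms are assumptions for demonstration, not the authors' lexicon):

```python
import re

# Illustrative trigger terms for neuromonitoring documentation;
# a production lexicon would be curated and validated by clinicians.
NEUROMONITORING_PATTERN = re.compile(
    r"\b(neuromonitoring|neurophysiologic monitoring|SSEP|MEP|EMG)\b",
    re.IGNORECASE,
)

def has_neuromonitoring(note: str) -> bool:
    """Flag an operative report that documents intraoperative neuromonitoring."""
    return bool(NEUROMONITORING_PATTERN.search(note))
```

Because such rules run in one pass over the text, they scale to tens of thousands of reports far more cheaply than manual chart review, which is the efficiency argument the abstract makes.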
Affiliation(s)
- Nicole D Agaronnik
- Harvard Medical School, Artificial Intelligence Operations and Data Science, Dana-Farber Cancer Institute, 25 Shattuck Street, Boston, MA 02115, United States
- Anne Kwok
- Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA 02115, United States
- Andrew J Schoenfeld
- Department of Orthopaedic Surgery, Brigham and Women's Hospital, Harvard Medical School, 75 Francis Street, Boston, MA 02115, United States
- Charlotta Lindvall
- Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute; Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, 450 Brookline Ave, Boston, MA 02115, United States
10
Zhu M, Fan X, Liu W, Shen J, Chen W, Xu Y, Yu X. Artificial Intelligence-Based Echocardiographic Left Atrial Volume Measurement with Pulmonary Vein Comparison. J Healthc Eng 2021; 2021:1336762. [PMID: 34912531 PMCID: PMC8668302 DOI: 10.1155/2021/1336762]
Abstract
This paper combines echocardiographic signal processing with artificial intelligence to propose a deep neural network model adapted to echocardiographic signals, achieving left atrial volume measurement and automatic assessment of pulmonary veins efficiently and quickly. Based on the echocardiographic signal generation mechanism and detection method, an experimental scheme for echocardiographic signal acquisition was designed. Echocardiographic signals from healthy subjects were measured in four different experimental states, and a database of left atrial volume measurements and pulmonary veins was constructed. Using the correspondence between ECG and echocardiographic signals in the time domain, preprocessing steps such as denoising, feature-point localization, and segmentation of the cardiac cycle were realized with wavelet transforms and thresholding to complete data collection. The paper proposes an artificial intelligence-based comparative model that adapts to the characteristics of one-dimensional time-series echocardiographic signals, automatically extracts their deep features, reduces the subjective influence of manual feature selection, and realizes automatic classification and evaluation of left atrial volume measurement and pulmonary veins under different states. Experimental results show that the proposed BP neural network model has good adaptability and classification performance for left atrial volume measurement and automatic pulmonary vein classification, achieving an average test accuracy above 96.58%. By extracting coding features of the original echocardiographic signal with a convolutional autoencoder, signal compression is completed with low loss at an average root-mean-square error percentage of only 0.65%. Comparing the training time and classification accuracy of an LSTM network on the original signal versus the encoded features, the AI model greatly reduces training time, achieves an average test accuracy of 97.97%, and improves both the real-time performance of left atrial volume measurement and pulmonary vein evaluation and the security of data transmission, which is of practical importance for comparing left atrial volume measurements with pulmonary veins.
Affiliation(s)
- Mengyun Zhu
- Department of Cardiology, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Shanghai 200072, China
- Ximin Fan
- Department of Cardiology, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Shanghai 200072, China
- Weijing Liu
- Department of Cardiology, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Shanghai 200072, China
- Jianying Shen
- Department of Cardiology, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Shanghai 200072, China
- Wei Chen
- Department of Cardiology, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Shanghai 200072, China
- Yawei Xu
- Department of Cardiology, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Shanghai 200072, China
- Xuejing Yu
- Department of Cardiology, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Shanghai 200072, China
|
11
|
de Oliveira JM, da Costa CA, Antunes RS. Data structuring of electronic health records: a systematic review. HEALTH AND TECHNOLOGY 2021. [DOI: 10.1007/s12553-021-00607-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/10/2023]
|
12
|
Reading Turchioe M, Volodarskiy A, Pathak J, Wright DN, Tcheng JE, Slotwiner D. Systematic review of current natural language processing methods and applications in cardiology. Heart 2021; 108:909-916. [PMID: 34711662 DOI: 10.1136/heartjnl-2021-319769] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/28/2021] [Accepted: 09/29/2021] [Indexed: 01/16/2023] Open
Abstract
Natural language processing (NLP) is a set of automated methods to organise and evaluate the information contained in unstructured clinical notes, which are a rich source of real-world data from clinical care that may be used to improve outcomes and understanding of disease in cardiology. The purpose of this systematic review is to provide an understanding of NLP, review how it has been used to date within cardiology and illustrate the opportunities that this approach provides for both research and clinical care. We systematically searched six scholarly databases (ACM Digital Library, Arxiv, Embase, IEEE Xplore, PubMed and Scopus) for studies published in 2015-2020 describing the development or application of NLP methods for clinical text focused on cardiac disease. Studies not published in English, lacking a description of NLP methods, non-cardiac focused and duplicates were excluded. Two independent reviewers extracted general study information, clinical details and NLP details and appraised quality using a checklist of quality indicators for NLP studies. We identified 37 studies developing and applying NLP in heart failure, imaging, coronary artery disease, electrophysiology, general cardiology and valvular heart disease. Most studies used NLP to identify patients with a specific diagnosis and extract disease severity using rule-based NLP methods. Some used NLP algorithms to predict clinical outcomes. A major limitation is the inability to aggregate findings across studies due to vastly different NLP methods, evaluation and reporting. This review reveals numerous opportunities for future NLP work in cardiology with more diverse patient samples, cardiac diseases, datasets, methods and applications.
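The rule-based pattern that most of the reviewed studies rely on, keyword matching over clinical notes with simple negation handling, can be sketched as below. The concept lexicon, cue list, and 20-character negation window are illustrative assumptions for this example, not drawn from any reviewed study.

```python
import re

# Hypothetical concept lexicon: regex patterns per cardiac concept.
CONCEPTS = {
    "heart_failure": [r"heart failure", r"\bhf\b", r"cardiomyopathy"],
    "coronary_artery_disease": [r"coronary artery disease", r"\bcad\b"],
}

# NegEx-style cues that negate a mention when they appear shortly before it.
NEGATION_CUES = [r"no evidence of", r"denies", r"\bno\b", r"negative for"]

def extract_concepts(note):
    """Return {concept: 'present' | 'negated'} for concepts found in the note."""
    text = note.lower()
    found = {}
    for concept, patterns in CONCEPTS.items():
        for pat in patterns:
            for match in re.finditer(pat, text):
                # Look for a negation cue in a short window before the mention.
                window = text[max(0, match.start() - 20):match.start()]
                negated = any(re.search(cue, window) for cue in NEGATION_CUES)
                # A negated mention overrides; otherwise keep any prior status.
                found[concept] = "negated" if negated else found.get(concept, "present")
    return found
```

Production systems such as cTAKES or MetaMap replace the lexicon with UMLS concept lookup and use full NegEx/ConText negation scoping, but the matching-plus-negation skeleton is the same.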
Affiliation(s)
- Meghan Reading Turchioe
- Department of Population Health Sciences, Division of Health Informatics, Weill Cornell Medicine, New York, New York, USA
- Alexander Volodarskiy
- Department of Medicine, Division of Cardiology, NewYork-Presbyterian Hospital, New York, New York, USA
- Jyotishman Pathak
- Department of Population Health Sciences, Division of Health Informatics, Weill Cornell Medicine, New York, New York, USA
- Drew N Wright
- Samuel J. Wood Library & C.V. Starr Biomedical Information Center, Weill Cornell Medical College, New York, New York, USA
- James Enlou Tcheng
- Department of Medicine, Duke University School of Medicine, Durham, North Carolina, USA
- David Slotwiner
- Department of Population Health Sciences, Division of Health Informatics, Weill Cornell Medicine, New York, New York, USA; Department of Medicine, Division of Cardiology, NewYork-Presbyterian Hospital, New York, New York, USA
|